Recommended reading: Dive into Deep Learning — D2L: Chapter 11.
Research paper: Vaswani et al., "Attention Is All You Need" (NeurIPS 2017).