Recommended reading: Dive into Deep Learning (D2L), Chapter 11 through Section 11.5.
Homework 4: Extend your previous GRU-based seq2seq model with cross-attention so that, at every decoding step, the decoder attends over all of the encoder's hidden states (same translation task as before). Report accuracy/BLEU and compare against your previous best. A sketch of one way to wire this up follows below.
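A minimal sketch of the idea, assuming PyTorch (which D2L uses) and additive (Bahdanau-style) attention; all class names, sizes, and the single-layer/unidirectional setup are illustrative, not a prescribed solution:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Scores every encoder hidden state against the current decoder state."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_q = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_k = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, hidden)   keys: (batch, src_len, hidden)
        scores = self.v(torch.tanh(self.W_q(query).unsqueeze(1) + self.W_k(keys)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)   # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), keys)       # (batch, 1, hidden)
        return context.squeeze(1), weights

class AttnGRUDecoder(nn.Module):
    """GRU decoder whose input at each step is [token embedding; context]."""
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.attention = AdditiveAttention(hidden_size)
        self.gru = nn.GRU(embed_size + hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden, enc_states):
        # tokens: (batch, tgt_len)   hidden: (1, batch, hidden)
        # enc_states: (batch, src_len, hidden) -- all encoder hidden states
        logits = []
        for t in range(tokens.size(1)):            # one decoding step at a time
            emb = self.embedding(tokens[:, t])     # (batch, embed)
            context, _ = self.attention(hidden[-1], enc_states)
            step_in = torch.cat([emb, context], dim=-1).unsqueeze(1)
            output, hidden = self.gru(step_in, hidden)
            logits.append(self.out(output.squeeze(1)))
        return torch.stack(logits, dim=1), hidden  # (batch, tgt_len, vocab)

# Smoke test with fake encoder outputs (shapes only):
dec = AttnGRUDecoder(vocab_size=1000, embed_size=64, hidden_size=128)
enc_states = torch.randn(2, 7, 128)   # batch=2, src_len=7
hidden = torch.zeros(1, 2, 128)       # e.g., initialize from the encoder's final state
logits, _ = dec(torch.zeros(2, 5, dtype=torch.long), hidden, enc_states)
print(logits.shape)                   # torch.Size([2, 5, 1000])
```

The key change from a plain GRU decoder is that the step-by-step loop recomputes the attention context from the current hidden state before each GRU step, so the decoder conditions on a different summary of the source sentence at every position. If your previous model fed the decoder only the encoder's final state, masking the attention weights over padded source positions is a detail you will likely also need.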
Cross-attention diagram preview: [diagram not included in this text version]