Lecture 8.1 – Attention Mechanism (Part 1)
Lecture 8.2 – Attention Mechanism (Part 2)
Lecture 8.3 – Implementing Attention in seq2seq Decoder
📚 Resources & Live Coding

Recommended reading: Dive into Deep Learning (D2L), Chapter 11 up through Section 11.5.

Homework note: Submit with Module 9 — add cross-attention to your earlier GRU-based seq2seq model so that the decoder attends over all encoder hidden states at each decoding step (same translation task). Report accuracy/BLEU and compare against your previous best.
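
A minimal sketch of one way the homework could look, assuming a single-layer GRU decoder with dot-product cross-attention (class and argument names such as `AttnGRUDecoder`, `embed_size`, and `hidden_size` are illustrative, not part of the assignment spec):

```python
import torch
import torch.nn as nn

class AttnGRUDecoder(nn.Module):
    """One decoding step: attend over all encoder states, then update the GRU."""

    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # GRU input = previous-token embedding concatenated with the attention context
        self.gru = nn.GRU(embed_size + hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, y_prev, dec_hidden, enc_outputs):
        # y_prev:      (batch,)                 previous target token ids
        # dec_hidden:  (1, batch, hidden)       current decoder state (the query)
        # enc_outputs: (batch, src_len, hidden) all encoder hidden states (keys/values)
        query = dec_hidden.permute(1, 0, 2)                     # (batch, 1, hidden)
        scores = torch.bmm(query, enc_outputs.transpose(1, 2))  # (batch, 1, src_len)
        weights = torch.softmax(scores, dim=-1)                 # attention distribution
        context = torch.bmm(weights, enc_outputs)               # (batch, 1, hidden)
        emb = self.embedding(y_prev).unsqueeze(1)               # (batch, 1, embed)
        rnn_in = torch.cat([emb, context], dim=-1)              # feed context + embedding
        output, dec_hidden = self.gru(rnn_in, dec_hidden)
        logits = self.out(output.squeeze(1))                    # (batch, vocab)
        return logits, dec_hidden, weights

# Quick shape check with random tensors (hypothetical sizes)
dec = AttnGRUDecoder(vocab_size=1000, embed_size=32, hidden_size=64)
y_prev = torch.randint(0, 1000, (4,))
dec_hidden = torch.zeros(1, 4, 64)
enc_outputs = torch.randn(4, 7, 64)
logits, dec_hidden, weights = dec(y_prev, dec_hidden, enc_outputs)
print(logits.shape, weights.shape)  # torch.Size([4, 1000]) torch.Size([4, 1, 7])
```

Keeping the attention weights as a return value makes it easy to plot alignment heatmaps when comparing against your previous model.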

Cross-Attention diagram preview:
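
Alongside the diagram, a compact formulation of the cross-attention step it depicts (dot-product scoring assumed; notation here is mine, not D2L's): the decoder state $s_t$ queries the encoder states $h_1, \dots, h_S$.

```latex
e_{t,i} = s_t^\top h_i, \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j=1}^{S} \exp(e_{t,j})}, \qquad
c_t = \sum_{i=1}^{S} \alpha_{t,i}\, h_i
```

The context vector $c_t$ is then combined with the previous target-token embedding as input to the GRU step, as in the sketch above.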