Before diving into deep multilayer perceptrons, we add two ingredients to the training pipeline: regularization (weight decay) and a data split (train/validation). We minimize MSE(y, ŷ) + λ‖θ‖² on the training split and evaluate generalization on the validation split.
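As a minimal sketch of this objective (using synthetic data and a linear model for illustration; the lecture notebook uses its own dataset and hyperparameters), we can minimize MSE plus an L2 penalty by gradient descent on a training split and report the error on a held-out validation split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical, for illustration only)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)

# Train/validation split
X_tr, X_val = X[:150], X[150:]
y_tr, y_val = y[:150], y[150:]

def mse(y_true, y_hat):
    return np.mean((y_true - y_hat) ** 2)

# Gradient descent on MSE(y, Xw) + lam * ||w||^2 (weight decay)
lam, lr = 1e-2, 0.1
w = np.zeros(5)
for _ in range(500):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + 2 * lam * w
    w -= lr * grad

print("train MSE:", mse(y_tr, X_tr @ w))
print("val MSE:  ", mse(y_val, X_val @ w))  # generalization estimate
```

Only the training split ever appears in the gradient; the validation MSE is computed once, after training, so it remains an honest estimate of generalization.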
Recommended reading: Dive into Deep Learning — D2L: 3.6–3.7; 12; 19.
The Colab notebook contains the lecture code for Module 3 (optimization and training MLPs). Run the cells sequentially as demonstrated in the lecture.