Lecture 3 – Optimization Foundations & Ablation Methodology

Before diving into deep multilayer perceptrons, we introduce two tools: regularization (weight decay) and data splits (train/validation). We minimize MSE(y, ŷ) + λ‖θ‖² on the training split, where the first term measures fit and the second penalizes large weights with strength λ, and we evaluate generalization on the held-out validation split.
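The objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's actual code: the synthetic data, the 80/20 split, and the values of λ and the learning rate are all assumptions chosen for the example. A linear model stands in for the MLP so the gradient of MSE + λ‖θ‖² can be written by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): y = X @ w_true + noise
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)

# Train/validation split (assumed 80/20)
X_tr, X_val = X[:160], X[160:]
y_tr, y_val = y[:160], y[160:]

lam = 1e-2  # weight-decay strength λ (assumed value)
lr = 0.1    # learning rate (assumed value)
w = np.zeros(5)

def mse(w, X, y):
    """Mean squared error of predictions X @ w against targets y."""
    r = X @ w - y
    return (r @ r) / len(y)

# Gradient descent on MSE(y, ŷ) + λ‖w‖²; the gradient is
# (2/n) Xᵀ(Xw − y) + 2λw, so the second term shrinks w each step.
for step in range(500):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + 2 * lam * w
    w -= lr * grad

print(f"train MSE: {mse(w, X_tr, y_tr):.4f}")
print(f"val   MSE: {mse(w, X_val, y_val):.4f}")
```

Note that only the training split appears in the update loop; the validation split is touched solely to report generalization error, which is the protocol described above.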

3.2: Training MLP I
3.3: Training MLP II
📚 Resources & Lecture Code

Recommended reading: Dive into Deep Learning — D2L: 3.6–3.7; 12; 19.

The Colab notebook contains the lecture code for Module 3 (optimization and training MLPs). Run the cells sequentially as demonstrated in the lecture.