Lecture 1 – From Linear Regression to Neural Networks
Lecture Notes
Key definitions & takeaway points.
  • Historical milestones of neural networks and deep learning.
  • Difference between traditional machine learning and representation learning.
  • Definition of a perceptron and the concept of activation functions.
  • Training via empirical risk minimisation.
  • Preview of optimisation algorithms (GD, SGD).
📖 Further Reading

For a deeper treatment, read the D2L online book (Dive into Deep Learning) up to the end of Section 5.2, Implementation of Multilayer Perceptrons.

1 – From Maximum Likelihood to Cross-Entropy Loss

6 Parts + Bonus (complete sequentially).

  1. Derive binary cross-entropy from maximum likelihood, including the gradient (a derivation sketch follows this list).
  2. Extend derivation to multi-class softmax.
  3. Code binary & multi-class cross-entropy from scratch (see the code sketch after this list).
  4. Verify numerically vs. scikit-learn logistic regression.
  5. Explore effect of label smoothing.
  6. Analyse gradient behaviour near saturation.
  7. Bonus: Implement focal loss and compare on an imbalanced dataset.
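
A sketch of the derivation for parts 1–2, assuming i.i.d. labels $y_i \in \{0,1\}$ and a sigmoid output $p_i = \sigma(z_i)$ (the multi-class case replaces the sigmoid with a softmax):

$$
\mathcal{L}(\theta) = \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i}
\quad\Longrightarrow\quad
-\log \mathcal{L}(\theta) = -\sum_{i=1}^{n}\bigl[y_i \log p_i + (1-y_i)\log(1-p_i)\bigr],
$$

which is the binary cross-entropy; using $\sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr)$, its gradient with respect to the logit is $\partial(-\log\mathcal{L})/\partial z_i = p_i - y_i$. For $K$ classes with softmax probabilities $p_{ik}$, the same argument yields $-\sum_i\sum_k y_{ik}\log p_{ik}$ with logit gradient $p_{ik} - y_{ik}$.

For parts 3–4, a minimal sketch (the function names are illustrative, and the numerical check uses scikit-learn's log_loss rather than a full logistic-regression fit):

```python
import numpy as np
from sklearn.metrics import log_loss

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Mean cross-entropy for one-hot targets and an (n, K) matrix of probabilities."""
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

# part 4: numerical check against scikit-learn
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
p = rng.uniform(0.01, 0.99, size=100)
assert np.isclose(binary_cross_entropy(y, p), log_loss(y, p))
```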

2 – Normal Equations vs. Gradient Descent

Understand the computational trade-offs between analytical and iterative solutions. Implement both methods; measure runtime, accuracy, and memory usage; and examine conditioning effects. See the template function compare_methods in the starter notebook.
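
A minimal sketch of the two solvers for ordinary least squares (the helper names below are illustrative and are not the compare_methods template; the memory and conditioning measurements are left to the notebook):

```python
import time
import numpy as np

def normal_equation(X, y):
    # closed form: w = (X^T X)^(-1) X^T y, computed via a linear solve rather than an explicit inverse
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    # iterative minimisation of the mean squared error
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iters):
        w -= lr * (2.0 / n) * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=5000)

for solver in (normal_equation, gradient_descent):
    t0 = time.perf_counter()
    w_hat = solver(X, y)
    print(f"{solver.__name__:17s} time {time.perf_counter() - t0:.4f}s "
          f"error {np.linalg.norm(w_hat - w_true):.2e}")
```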

3 – SGD Exploration: Escaping Local Minima

Reproduce the two-hole landscape, perform a systematic hyper-parameter study and design your own complex loss landscape.
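
A minimal 1-D sketch of the idea, assuming an illustrative double-well function rather than the exact landscape from the lecture; the annealed-noise schedule is one possible choice:

```python
import numpy as np

def two_hole(x):
    # illustrative double well: shallow minimum near x ≈ -0.6, deeper minimum near x ≈ +0.8
    return x**4 - x**2 - 0.3 * x

def grad(x):
    return 4 * x**3 - 2 * x - 0.3

def sgd_1d(x0, lr=0.05, noise=1.5, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    for t in range(steps):
        sigma = noise * (1 - t / steps)              # annealed gradient noise, mimicking mini-batch noise
        x -= lr * (grad(x) + sigma * rng.normal())
    return x

# plain GD stays in whichever basin it starts in; noisy SGD can hop over the barrier
print("GD  from x0 = -0.6:", round(sgd_1d(-0.6, noise=0.0), 3))
print("SGD from x0 = -0.6:", [round(sgd_1d(-0.6, seed=s), 2) for s in range(5)])
```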

4 – Modern Optimizers Showdown (PyTorch)

Compare SGD, Momentum, Adam, AdaGrad and RMSProp on challenging optimisation problems including the Rosenbrock function and your two-hole landscape.
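
A minimal PyTorch sketch, assuming the 2-D Rosenbrock function f(x, y) = (1 − x)² + 100 (y − x²)²; the starting point, step counts, and learning rates below are illustrative rather than tuned:

```python
import torch

def rosenbrock(p):
    x, y = p
    return (1 - x)**2 + 100 * (y - x**2)**2

def run(optim_cls, steps=2000, **kwargs):
    p = torch.tensor([-1.5, 1.5], requires_grad=True)  # start away from the minimum at (1, 1)
    opt = optim_cls([p], **kwargs)
    for _ in range(steps):
        opt.zero_grad()
        loss = rosenbrock(p)
        loss.backward()
        opt.step()
    return p.detach(), rosenbrock(p).item()

for name, cls, kw in [
    ("SGD",      torch.optim.SGD,     dict(lr=1e-3)),
    ("Momentum", torch.optim.SGD,     dict(lr=1e-3, momentum=0.9)),
    ("AdaGrad",  torch.optim.Adagrad, dict(lr=0.5)),
    ("RMSProp",  torch.optim.RMSprop, dict(lr=1e-2)),
    ("Adam",     torch.optim.Adam,    dict(lr=1e-2)),
]:
    p, f = run(cls, **kw)
    print(f"{name:8s} final point {p.numpy().round(3)}, loss {f:.2e}")
```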

5 – Hebbian Learning: "Neurons That Fire Together, Wire Together"

Implement pure Hebbian learning and Oja's rule, and analyse their limitations using a pattern association task.
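
A minimal sketch of the two update rules for a single linear neuron y = wᵀx, assuming zero-mean correlated inputs; the data and names are illustrative, and the pattern-association task is left to the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)
# zero-mean, correlated 2-D inputs; Oja's rule should recover their first principal component
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])

def train(rule, lr=0.005, epochs=2):
    w = 0.1 * rng.normal(size=2)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            if rule == "hebb":
                w = w + lr * y * x              # pure Hebbian: Δw = η·y·x, the norm grows without bound
            else:
                w = w + lr * y * (x - y * w)    # Oja's rule: Δw = η·y·(x − y·w), self-normalising
    return w

print("pure Hebbian |w| =", np.linalg.norm(train("hebb")))   # explodes
print("Oja's rule    w  =", train("oja"))                    # ≈ unit-norm first principal component
```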

6 – The XOR Challenge

Build and train a minimal neural network that solves the XOR problem to solidify your understanding of non-linear activation functions.
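
A minimal PyTorch sketch, assuming a 2-hidden-unit tanh MLP; with so few units some random seeds get stuck, which is itself worth observing:

```python
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(2, 2),   # two hidden units suffice for XOR
    nn.Tanh(),         # without this non-linearity the model reduces to logistic regression and fails
    nn.Linear(2, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# ideally tensor([0, 1, 1, 0]); re-seed and re-run if this initialisation got stuck
print((torch.sigmoid(model(X)) > 0.5).int().flatten())
```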

Bonus – Bias-Variance Decomposition in Practice

Conduct an empirical study of the bias-variance trade-off across model complexities using bootstrap sampling.
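
A minimal sketch, assuming polynomial regression on a noisy sine curve and the usual decomposition of expected squared error into bias² + variance + irreducible noise; the degrees and sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)                      # true function (illustrative)
n_train = 40
x_train = np.sort(rng.uniform(0, 1, n_train))
y_train = f(x_train) + 0.3 * rng.normal(size=n_train)    # noisy observations
x_test = np.linspace(0.05, 0.95, 100)                    # avoid the extrapolation-heavy edges

def bias_variance(degree, n_boot=200):
    """Bootstrap estimate of bias^2 and variance for a polynomial fit of a given degree."""
    preds = np.empty((n_boot, len(x_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n_train, n_train)          # resample the training set with replacement
        coef = np.polyfit(x_train[idx], y_train[idx], degree)
        preds[b] = np.polyval(coef, x_test)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test))**2)
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

for d in (1, 3, 9):
    b2, v = bias_variance(d)
    print(f"degree {d}: bias^2 = {b2:.3f}  variance = {v:.3f}")
```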