12.1 – Multimodal Learning

Vision-language alignment, contrastive objectives (e.g., CLIP), and fusion strategies for building multimodal systems.
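
As a rough illustration of the contrastive objective mentioned above, the following NumPy sketch computes a CLIP-style symmetric cross-entropy loss over a batch of paired image and text embeddings. The function names, the temperature value of 0.07, and the use of plain NumPy arrays in place of real encoders are assumptions made for this example, not part of the course materials.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # logits[i, j] = similarity of image i and text j, scaled by temperature.
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])
    # Matching pairs sit on the diagonal; score them in both directions.
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
aligned = clip_style_loss(emb, emb)                           # every pair matches
misaligned = clip_style_loss(emb, np.roll(emb, 1, axis=0))    # pairs shifted by one
```

Here `aligned` should come out much smaller than `misaligned`, since the loss rewards batches in which each image is most similar to its own caption.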

12.2 – Diffusion Models

Noise schedules, forward/reverse processes, and sampling recipes that power modern generative diffusion pipelines.
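
To make the forward process concrete, here is a minimal sketch that builds a linear noise schedule and draws a noised sample x_t directly from x_0 using the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The schedule endpoints (1e-4 to 0.02 over 1000 steps) are a commonly used choice assumed here for illustration, and the helper names are hypothetical. The learned reverse process is not shown.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Per-step noise variances beta_t, increasing linearly.
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    # Sample x_t ~ q(x_t | x_0) in one shot, without iterating over steps.
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal retention

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))      # stand-in for a batch of data
xt, noise = q_sample(x0, 500, alpha_bar, rng)
```

Because `alpha_bar` decays toward zero, samples at late timesteps are close to pure Gaussian noise, which is what lets sampling start from noise and run the process in reverse.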

12.3 – Variational Autoencoders (VAE)

Latent variable modeling with encoder/decoder pairs, ELBO optimization, and practical VAE architectures.
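
The ELBO for a VAE with a diagonal Gaussian encoder and a standard normal prior can be sketched as below. This example assumes a squared-error reconstruction term (a unit-variance Gaussian likelihood, up to a constant) and uses plain arrays in place of actual encoder and decoder networks; all function names are illustrative.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per example.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)

def negative_elbo(x, x_recon, mu, log_var):
    # Reconstruction error plus KL regularizer, averaged over the batch.
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=1)
    return np.mean(recon + gaussian_kl(mu, log_var))

rng = np.random.default_rng(0)
mu = np.zeros((3, 2))
log_var = np.zeros((3, 2))
z = reparameterize(mu, log_var, rng)   # latent samples, shape (3, 2)
```

Minimizing `negative_elbo` trades off reconstruction quality against keeping the approximate posterior close to the prior.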

12.4 – Generative Adversarial Networks (GAN)

Adversarial training, loss variants, and practical tips for stabilizing GAN training in image generation.
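
As one example of the loss variants referred to above, this sketch writes the standard discriminator loss and the non-saturating generator loss in terms of raw discriminator logits. Working with logits and softplus rather than probabilities is a design choice assumed here for numerical stability; the function names are illustrative.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably.
    return np.logaddexp(0.0, x)

def discriminator_loss(real_logits, fake_logits):
    # -log sigmoid(D(x)) - log(1 - sigmoid(D(G(z)))), averaged over the batch.
    return np.mean(softplus(-real_logits)) + np.mean(softplus(fake_logits))

def generator_loss_nonsaturating(fake_logits):
    # -log sigmoid(D(G(z))): strong gradients even when D rejects fakes.
    return np.mean(softplus(-fake_logits))

undecided = np.zeros(4)             # discriminator outputs probability 0.5
d_loss = discriminator_loss(undecided, undecided)
g_loss = generator_loss_nonsaturating(undecided)
```

With an undecided discriminator, `d_loss` equals 2 log 2 and `g_loss` equals log 2. The non-saturating form is often preferred over the original minimax generator loss because the latter yields vanishing gradients early in training, when the discriminator easily rejects generated samples.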

End of course