Readings labeled [Vision] are from Foundations of Computer Vision by Antonio Torralba, Phillip Isola, and William T. Freeman. (MIT Press, 2024. ISBN: 9780262048972.) The book is available online under a CC BY-NC-ND license.
Session 1: Introduction to Deep Learning
Required readings:
Optional readings:
Session 2: How to Train a Neural Net
Required readings:
- [Vision] Chapter 10: Gradient-Based Learning Algorithms
- [Vision] Chapter 14: Backpropagation
Session 3: Approximation Theory
No required readings.
Optional readings:
- Deep learning theory lecture notes, sections 2 and 5
Session 4: Architectures: Grids
Required readings:
Session 5: Architectures: Graphs
Required readings:
- Hamilton, William. Graph Representation Learning, chapter 5 (focus mainly on section 5.1).
Optional readings:
- Xu, Keyulu, Weihua Hu, et al. “How Powerful Are Graph Neural Networks?” arXiv preprint arXiv:1810.00826 (2018).
- Sanchez-Lengeling, Benjamin, Emily Reif, et al. “A Gentle Introduction to Graph Neural Networks.” Distill, 2021.
Session 6: Generalization Theory
No required readings.
Optional readings:
- Zhang, Chiyuan, Samy Bengio, et al. “Understanding Deep Learning Requires Rethinking Generalization.” arXiv preprint arXiv:1611.03530 (2016).
- Belkin, Mikhail, Daniel Hsu, et al. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116, no. 32 (2019): 15849–15854.
Session 7: Scaling Rules for Optimization
Required readings:
Session 8: Architectures: Transformers
Required readings:
- [Vision] Chapter 26: Transformers (Note that this reading focuses on examples from vision, but you can apply the same architecture to any kind of data.)
Session 9: Hacker’s Guide to Deep Learning
No required readings.
Optional readings:
- A Recipe for Training Neural Networks
- Rules of Machine Learning: Best Practices for ML Engineering (PDF)
Session 10: Architectures: Memory
Required readings:
Optional readings:
Session 11: Representation Learning: Reconstruction-Based
Required readings:
- [Vision] Chapter 30: Representation Learning
Optional readings:
- Bengio, Yoshua, Aaron Courville, and Pascal Vincent. “Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 8 (2013): 1798–1828.
Session 12: Representation Learning: Similarity-Based
Required readings:
- Continue with session 11 readings.
Optional readings:
- Wang, Tongzhou, and Phillip Isola. “Understanding Contrastive Representation Learning Through Alignment and Uniformity on the Hypersphere.” In International Conference on Machine Learning, pp. 9929–9939. PMLR, 2020.
Session 13: Representation Learning: Theory
No required readings.
Optional readings:
- Cho, Youngmin, and Lawrence Saul. “Kernel Methods for Deep Learning.” Advances in Neural Information Processing Systems 22 (2009).
- Lee, Jaehoon, Yasaman Bahri, et al. “Deep Neural Networks as Gaussian Processes.” arXiv preprint arXiv:1711.00165 (2017).
Session 14: Generative Models: Basics
Required readings:
- [Vision] Chapter 32: Generative Models
Session 15: Generative Models: Representation Learning Meets Generative Modeling
Required readings:
Optional readings:
- Kingma, Diederik P., and Max Welling. “Auto-Encoding Variational Bayes.” arXiv preprint arXiv:1312.6114 (2013).
Session 16: Generative Models: Conditional Models
No required readings.
Optional readings:
Session 17: Generalization: Out-of-Distribution (OOD)
Required readings:
- Mądry, Aleksander, and Ludwig Schmidt. “A Brief Introduction to Adversarial Examples.” Gradient Science, July 6, 2018.
- Mądry, Aleksander, Ludwig Schmidt, and Dimitris Tsipras. “Training Robust Classifiers (Part 1).” Gradient Science, July 11, 2018.
Optional readings:
- Koh, Pang Wei, Shiori Sagawa, et al. “WILDS: A Benchmark of In-the-Wild Distribution Shifts.” In International Conference on Machine Learning, pp. 5637–5664. PMLR, 2021.
- Geirhos, Robert, Jörn-Henrik Jacobsen, et al. “Shortcut Learning in Deep Neural Networks.” Nature Machine Intelligence 2, no. 11 (2020): 665–673.
- Tsipras, Dimitris, Shibani Santurkar, et al. “From ImageNet to Image Classification: Contextualizing Progress on Benchmarks.” In International Conference on Machine Learning, pp. 9625–9635. PMLR, 2020.
- Xiao, Kai, Logan Engstrom, et al. “Noise or Signal: The Role of Image Backgrounds in Object Recognition.” arXiv preprint arXiv:2006.09994 (2020).
- Xu, Keyulu, Mozhi Zhang, et al. “How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks.” arXiv preprint arXiv:2009.11848 (2020).
Session 18: Transfer Learning: Models
Required readings:
Optional readings:
- Farahani, Abolfazl, Sahar Voghoei, et al. “A Brief Review of Domain Adaptation.” Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020 (2021): 877–894.
- Kay, Justin, Timm Haucke, et al. “Align and Distill: Unifying and Improving Domain Adaptive Object Detection.” arXiv preprint arXiv:2403.12029 (2024).
Session 19: Transfer Learning: Data
Required readings:
- Continue with session 18 readings.
Session 20: Scaling Laws
Required readings:
- Kaplan, Jared, Sam McCandlish, et al. “Scaling Laws for Neural Language Models.” arXiv preprint arXiv:2001.08361 (2020).
Optional readings:
- Hoffmann, Jordan, Sebastian Borgeaud, et al. “Training Compute-Optimal Large Language Models.” arXiv preprint arXiv:2203.15556 (2022).
- Sharma, Utkarsh, and Jared Kaplan. “A Neural Scaling Law from the Dimension of the Data Manifold.” arXiv preprint arXiv:2004.10802 (2020).
- Sorscher, Ben, Robert Geirhos, et al. “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning.” Advances in Neural Information Processing Systems 35 (2022): 19523–19536.
- McCandlish, Sam, Jared Kaplan, et al. “An Empirical Model of Large-Batch Training.” arXiv preprint arXiv:1812.06162 (2018).
Session 21: Large Language Models
No required readings.
Optional readings:
- Kojima, Takeshi, Shixiang Shane Gu, et al. “Large Language Models Are Zero-Shot Reasoners.” Advances in Neural Information Processing Systems 35 (2022): 22199–22213.
Session 22: AI for Musical Creativity
No required readings.
Session 23: Metrized Deep Learning
No required readings.
Optional readings:
- Bernstein, Jeremy, and Laker Newhouse. “Modular Duality in Deep Learning.” arXiv preprint arXiv:2410.21265 (2024).
- Flynn, Thomas. “The Duality Structure Gradient Descent Algorithm: Analysis and Applications to Neural Networks.” arXiv preprint arXiv:1708.00523 (2017).
- Large, Tim, Yang Liu, et al. “Scalable Optimization in the Modular Norm.” Advances in Neural Information Processing Systems 37 (2024): 73501–73548.
Session 24: Inference Methods for Deep Learning
No required readings.
Optional readings:
- Sun, Yu, Xiaolong Wang, et al. “Test-Time Training with Self-Supervision for Generalization Under Distribution Shifts.” In International Conference on Machine Learning, pp. 9229–9248. PMLR, 2020.
- Zelikman, Eric, Yuhuai Wu, et al. “STaR: Bootstrapping Reasoning with Reasoning.” Advances in Neural Information Processing Systems 35 (2022): 15476–15488.
Session 25: Efficient Policy Optimization Techniques for LLMs
No required readings.