Readings labeled [Vision] are from Foundations of Computer Vision by Antonio Torralba, Phillip Isola, and William T. Freeman. (MIT Press, 2024. ISBN: 9780262048972.) The book is available online under a CC BY-NC-ND license.
Session 1: Introduction to Deep Learning
Required readings:
Optional readings:
Session 2: How to Train a Neural Net
Required readings:
- [Vision] Chapter 10: Gradient-Based Learning Algorithms
- [Vision] Chapter 14: Backpropagation
Session 3: Approximation Theory
No required readings.
Optional readings:
- Deep learning theory lecture notes, sections 2 and 5
Session 4: Architectures: Grids
Required readings:
Session 5: Architectures: Graphs
Required readings:
- Hamilton, William. Graph Representation Learning, chapter 5 (focus mainly on section 5.1).
Optional readings:
- Xu, Keyulu, Weihua Hu, et al. “How Powerful Are Graph Neural Networks?” arXiv preprint arXiv:1810.00826 (2018).
- Sanchez-Lengeling, Benjamin, Emily Reif, et al. “A Gentle Introduction to Graph Neural Networks.” Distill, 2021.
Session 6: Generalization Theory
No required readings.
Optional readings:
- Zhang, Chiyuan, Samy Bengio, et al. “Understanding Deep Learning Requires Rethinking Generalization.” arXiv preprint arXiv:1611.03530 (2016).
- Belkin, Mikhail, Daniel Hsu, et al. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116, no. 32 (2019): 15849–15854.
Session 7: Scaling Rules for Optimization
Required readings:
Session 8: Architectures: Transformers
Required readings:
- [Vision] Chapter 26: Transformers (Note that this reading focuses on examples from vision, but you can apply the same architecture to any kind of data.)
Session 9: Hacker’s Guide to Deep Learning
No required readings.
Optional readings:
- A Recipe for Training Neural Networks
- Rules of Machine Learning: Best Practices for ML Engineering (PDF)
Session 10: Architectures: Memory
Required readings:
Optional readings:
Session 11: Representation Learning: Reconstruction-Based
Required readings:
- [Vision] Chapter 30: Representation Learning
Optional readings:
- Bengio, Yoshua, Aaron Courville, and Pascal Vincent. “Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 8 (2013): 1798–1828.
Session 12: Representation Learning: Similarity-Based
Required readings:
- Continue with session 11 readings.
Optional readings:
- Wang, Tongzhou, and Phillip Isola. “Understanding Contrastive Representation Learning Through Alignment and Uniformity on the Hypersphere.” In International Conference on Machine Learning, pp. 9929–9939. PMLR, 2020.
Session 13: Representation Learning: Theory
No required readings.
Optional readings:
- Cho, Youngmin, and Lawrence Saul. “Kernel Methods for Deep Learning.” Advances in Neural Information Processing Systems 22 (2009).
- Lee, Jaehoon, Yasaman Bahri, et al. “Deep Neural Networks as Gaussian Processes.” arXiv preprint arXiv:1711.00165 (2017).
Session 14: Generative Models: Basics
Required readings:
- [Vision] Chapter 32: Generative Models
Session 15: Generative Models: Representation Learning Meets Generative Modeling
Required readings:
Optional readings:
- Kingma, Diederik P., and Max Welling. “Auto-Encoding Variational Bayes.” arXiv preprint arXiv:1312.6114 (2013).
Session 16: Generative Models: Conditional Models
No required readings.
Optional readings:
Session 17: Generalization: Out-of-Distribution (OOD)
Required readings:
- Mądry, Aleksander, and Ludwig Schmidt. “A Brief Introduction to Adversarial Examples.” Gradient Science, July 6, 2018.
- Mądry, Aleksander, Ludwig Schmidt, and Dimitris Tsipras. “Training Robust Classifiers (Part 1).” Gradient Science, July 11, 2018.
Optional readings:
- Koh, Pang Wei, Shiori Sagawa, et al. “WILDS: A Benchmark of In-the-Wild Distribution Shifts.” In International Conference on Machine Learning, pp. 5637–5664. PMLR, 2021.
- Geirhos, Robert, Jörn-Henrik Jacobsen, et al. “Shortcut Learning in Deep Neural Networks.” Nature Machine Intelligence 2, no. 11 (2020): 665–673.
- Tsipras, Dimitris, Shibani Santurkar, et al. “From ImageNet to Image Classification: Contextualizing Progress on Benchmarks.” In International Conference on Machine Learning, pp. 9625–9635. PMLR, 2020.
- Xiao, Kai, Logan Engstrom, et al. “Noise or Signal: The Role of Image Backgrounds in Object Recognition.” arXiv preprint arXiv:2006.09994 (2020).
- Xu, Keyulu, Mozhi Zhang, et al. “How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks.” arXiv preprint arXiv:2009.11848 (2020).
Session 18: Transfer Learning: Models
Required readings:
Optional readings:
- Farahani, Abolfazl, Sahar Voghoei, et al. “A Brief Review of Domain Adaptation.” Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020 (2021): 877–894.
- Kay, Justin, Timm Haucke, et al. “Align and Distill: Unifying and Improving Domain Adaptive Object Detection.” arXiv preprint arXiv:2403.12029 (2024).
Session 19: Transfer Learning: Data
Required readings:
- Continue with session 18 readings.
Session 20: Scaling Laws
Required readings:
- Kaplan, Jared, Sam McCandlish, et al. “Scaling Laws for Neural Language Models.” arXiv preprint arXiv:2001.08361 (2020).
Optional readings:
- Hoffmann, Jordan, Sebastian Borgeaud, et al. “Training Compute-Optimal Large Language Models.” arXiv preprint arXiv:2203.15556 (2022).
- Sharma, Utkarsh, and Jared Kaplan. “A Neural Scaling Law from the Dimension of the Data Manifold.” arXiv preprint arXiv:2004.10802 (2020).
- Sorscher, Ben, Robert Geirhos, et al. “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning.” Advances in Neural Information Processing Systems 35 (2022): 19523–19536.
- McCandlish, Sam, Jared Kaplan, et al. “An Empirical Model of Large-Batch Training.” arXiv preprint arXiv:1812.06162 (2018).
Session 21: Large Language Models
No required readings.
Optional readings:
- Kojima, Takeshi, Shixiang Shane Gu, et al. “Large Language Models Are Zero-Shot Reasoners.” Advances in Neural Information Processing Systems 35 (2022): 22199–22213.
Session 22: AI for Musical Creativity
No required readings.
Session 23: Metrized Deep Learning
No required readings.
Optional readings:
- Bernstein, Jeremy, and Laker Newhouse. “Modular Duality in Deep Learning.” arXiv preprint arXiv:2410.21265 (2024).
- Flynn, Thomas. “The Duality Structure Gradient Descent Algorithm: Analysis and Applications to Neural Networks.” arXiv preprint arXiv:1708.00523 (2017).
- Large, Tim, Yang Liu, et al. “Scalable Optimization in the Modular Norm.” Advances in Neural Information Processing Systems 37 (2024): 73501–73548.
Session 24: Inference Methods for Deep Learning
No required readings.
Optional readings:
- Sun, Yu, Xiaolong Wang, et al. “Test-Time Training with Self-Supervision for Generalization Under Distribution Shifts.” In International Conference on Machine Learning, pp. 9229–9248. PMLR, 2020.
- Zelikman, Eric, Yuhuai Wu, et al. “STaR: Bootstrapping Reasoning with Reasoning.” Advances in Neural Information Processing Systems 35 (2022): 15476–15488.
Session 25: Efficient Policy Optimization Techniques for LLMs
No required readings.