18.S096 | January IAP 2022 | Undergraduate
Matrix Calculus for Machine Learning and Beyond


Course Meeting Times

Lectures: 3 sessions / week, 2 hours / session


Prerequisites

Courses in linear algebra (such as 18.06 Linear Algebra) and multivariate calculus (such as 18.02 Multivariable Calculus)

Course Description

We all know that calculus courses such as 18.01 Single Variable Calculus and 18.02 Multivariable Calculus cover univariate and vector calculus, respectively. Modern applications such as machine learning require the next big step, matrix calculus.

This class covers a coherent approach to matrix calculus, showing techniques that allow you to think of a matrix holistically (not just as an array of scalars), compute derivatives of important matrix factorizations, and really understand forward and reverse modes of differentiation. We will discuss adjoint methods, custom Jacobian matrix-vector products, and how modern automatic differentiation is more computer science than mathematics in that it is neither symbolic nor based on finite differences.
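To make the last point concrete, here is a minimal sketch of forward-mode automatic differentiation using dual numbers. It is an illustration only, not the course's implementation: the names `Dual`, `dsin`, and `derivative` are hypothetical helpers introduced for this example. Each value carries its derivative along with it, so the result is exact up to floating-point rounding, with no symbolic expression and no finite-difference step size.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (hypothetical helper for this sketch)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def dsin(x):
    """Sine on dual numbers, using the chain rule: sin(u)' = cos(u) u'."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).der


def f(x):
    return x * x + 3 * x + dsin(x)


# Analytically, f'(x) = 2x + 3 + cos(x).
print(derivative(f, 2.0), 2 * 2.0 + 3 + math.cos(2.0))
```

The same propagation rule, applied to vectors and matrices instead of scalars, is what a Jacobian-vector product computes in forward mode.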


Here are some of the topics covered:

  • Derivatives as linear operators and linear approximation on arbitrary vector spaces: beyond gradients and Jacobians.
  • Derivatives of functions with matrix inputs and/or outputs (e.g. matrix inverses and determinants). Kronecker products and matrix “vectorization.”
  • Derivatives of matrix factorizations (e.g. eigenvalues/SVD) and derivatives with constraints (e.g. orthogonal matrices).
  • Multidimensional chain rules, and the significance of right-to-left (“forward”) vs. left-to-right (“reverse”) composition. Chain rules on computational graphs (e.g. neural networks).
  • Forward- and reverse-mode manual and automatic multivariate differentiation.
  • Adjoint methods (vJp/pullback rules) for derivatives of solutions of linear, nonlinear, and differential equations.
  • Application to nonlinear root-finding and optimization. Multidimensional Newton and steepest-descent methods.
  • Applications in engineering/scientific optimization and machine learning.
  • Second derivatives, Hessian matrices, quadratic approximations, and quasi-Newton methods.
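As a small illustration of a derivative with a matrix input, the sketch below checks the standard identity d(det A) = tr(adj(A) dA) for a 2×2 matrix against a finite-difference estimate. It is a sketch under stated assumptions, not course material: the function names and the test matrices are hypothetical, and the 2×2 case is used only so that the example needs no external libraries.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    return a * d - b * c


def det_gradient2(A):
    """Gradient of det with respect to A under the Frobenius inner product.

    For an invertible A this equals det(A) times the inverse transpose of A,
    i.e. the cofactor matrix.
    """
    (a, b), (c, d) = A
    return [[d, -c], [-b, a]]


def directional_derivative(A, dA):
    """Linear operator dA -> <grad, dA>, the derivative of det at A."""
    G = det_gradient2(A)
    return sum(G[i][j] * dA[i][j] for i in range(2) for j in range(2))


def finite_difference(A, dA, h=1e-6):
    """Forward-difference estimate of the same directional derivative."""
    A_step = [[A[i][j] + h * dA[i][j] for j in range(2)] for i in range(2)]
    return (det2(A_step) - det2(A)) / h


# Arbitrary example data (assumed for illustration).
A = [[2.0, 1.0], [0.5, 3.0]]
dA = [[0.3, -0.2], [0.7, 0.1]]

exact = directional_derivative(A, dA)
approx = finite_difference(A, dA)
print(exact, approx)
```

The two numbers agree only up to an error proportional to the step size `h`, which is one reason finite differences are useful as a check rather than as the main way to compute derivatives.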


Lecture # and Topics, with Key Dates

Lecture 1 Part 1: Introduction
Lecture 1 Part 2: Derivatives as Linear Operators
Lecture 2 Part 1: Derivatives as Linear Operators (cont.)
Lecture 2 Part 2: Two by Two Matrix Jacobians
  Key date: Problem Set 1 out
Lecture 3 Part 1: The Gradient of a Scalar Function of a Vector: Column Vector or Row Vector?
Lecture 3 Part 2: Finite Difference
Lecture 4 Part 1: The Gradient of the Determinant
Lecture 4 Part 2: Nonlinear Root-Finding, Optimization, and Adjoint-Method Differentiation
  Key date: Problem Set 1 due
  Key date: Problem Set 2 out
Lecture 5: Forward and Reverse Automatic Differentiation in a Nutshell
Lecture 6 Part 1: Derivatives of Eigenproblems
Lecture 6 Part 2: Second Derivatives, Bilinear Forms, and Hessian Matrices
Lecture 7 Part 1: Hessian Matrices (cont.)
Lecture 7 Part 2: Backpropagation through Back Substitution with a Backslash
  Key date: Problem Set 2 due
Lecture 8 Part 1: Hessian Matrices (cont.)
Lecture 8 Part 2: Differentiable Programming and Neural Differential Equations

Course Info
As Taught In
January IAP 2022
Learning Resource Types
Lecture Notes
Problem Sets with Solutions