Lecture 22: Gradient Descent: Downhill to a Minimum
Description
Gradient descent is the most common optimization algorithm in deep learning and machine learning. It uses only the first derivative (the gradient) when updating parameters, taking successive steps downhill toward a local minimum.
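As a minimal sketch of that update rule in Python (the function, step size, and iteration count below are illustrative choices, not from the lecture):

```python
import numpy as np

def gradient_descent(grad_f, x0, s=0.1, steps=100):
    """Take repeated downhill steps: x <- x - s * grad_f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - s * grad_f(x)
    return x

# Illustrative use: minimize f(x) = x^2, whose derivative is 2x.
print(gradient_descent(lambda x: 2 * x, x0=5.0))  # approaches the minimizer 0
```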
Summary
Gradient descent: Downhill from \(x\) to new \(X = x - s\,(\partial F/\partial x)\)
Excellent example: \(F(x, y) = \tfrac{1}{2}(x^2 + by^2)\)
If \(b\) is small we take a zig-zag path toward \((0, 0)\).
Each step multiplies by \((b - 1)/(b + 1)\)
Remarkable function: logarithm of determinant of \(X\)
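To see the zig-zag concretely, here is a small numeric sketch (my own, not from the lecture) of gradient descent on \(F(x, y) = \tfrac{1}{2}(x^2 + by^2)\) from the starting point \((b, 1)\), using the exact line-search step \(s = g^T g / g^T A g\) for this quadratic with \(A = \mathrm{diag}(1, b)\); the printed per-step ratio matches \((b - 1)/(b + 1)\):

```python
import numpy as np

b = 0.1                          # small b exaggerates the zig-zag
pt = np.array([b, 1.0])          # starting point (b, 1)
print("predicted ratio:", (b - 1) / (b + 1))

for k in range(5):
    g = np.array([pt[0], b * pt[1]])       # gradient of F = (x^2 + b*y^2)/2
    s = (g @ g) / (g[0]**2 + b * g[1]**2)  # exact line-search step
    new = pt - s * g
    print(k, new, "x-ratio:", new[0] / pt[0])
    pt = new
```

Each step flips the sign of \(y\) while shrinking both coordinates by the same factor, which is exactly the zig-zag path toward \((0, 0)\).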
Related section in textbook: VI.4
Instructor: Prof. Gilbert Strang
Problems for Lecture 22
From textbook Section VI.4
1. For a 1 by 1 matrix in Example 3, the determinant is just \(\det X = x_{11}\). Find the first and second derivatives of \(F(X) = -\log(\det X) = -\log x_{11}\) for \(x_{11} > 0\). Sketch the graph of \(F = -\log x\) to see that this function \(F\) is convex.
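As a quick check on the calculus involved (a hint, not the full exercise):

\[
F(x) = -\log x \quad\Longrightarrow\quad F'(x) = -\frac{1}{x}, \qquad F''(x) = \frac{1}{x^2} > 0 \quad (x > 0),
\]

and \(F'' > 0\) is what confirms convexity on \(x > 0\).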
6. What is the gradient descent equation \(x_{k+1} = x_k - s_k \nabla f(x_k)\) for the least squares problem of minimizing \(\tfrac{1}{2}\|Ax - b\|^2\)?
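For experimenting with that update, a minimal sketch using the standard least squares gradient \(\nabla f = A^{T}(Ax - b)\); the matrix \(A\), vector \(b\), step size, and iteration count are made-up test values, not from the textbook:

```python
import numpy as np

# Hypothetical test data (not from the textbook).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(2)
s = 0.01                             # fixed step size, kept small for stability
for _ in range(5000):
    x = x - s * A.T @ (A @ x - b)    # x_{k+1} = x_k - s * A^T (A x_k - b)

print(x)                                      # approaches the least squares solution
print(np.linalg.lstsq(A, b, rcond=None)[0])   # reference solution for comparison
```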