Video Lectures

Lecture 23: Accelerating Gradient Descent (Use Momentum)

Description

In this lecture, Professor Strang explains both momentum-based gradient descent and Nesterov’s accelerated gradient descent.

Summary

Study the zig-zag example: Minimize \(F = \frac{1}{2} (x^2 + by^2)\)
Add a momentum term: the heavy ball remembers its direction.
New point \(k+1\) comes from TWO old points \(k\) and \(k-1\).
"1st order" becomes "2nd order", or a "1st order system", as in ODEs.
Convergence rate improves: from \(1 - b\) to \(1 - \sqrt{b}\)! (See the sketch below.)
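As a check on that last point, here is a minimal numerical sketch (not code from the lecture) that runs plain gradient descent, heavy-ball momentum, and Nesterov's method on \(F = \frac{1}{2} (x^2 + by^2)\). The step sizes and momentum coefficients are the classical optimal choices for this quadratic, and the starting point and iteration counts are illustrative assumptions.

```python
# Sketch: compare three iterations on F = (x^2 + b*y^2)/2, whose gradient
# is (x, b*y). Parameter choices are the classical optimal ones for a
# quadratic with Hessian eigenvalues 1 and b; they are standard results,
# not taken verbatim from the lecture.
import numpy as np

b = 0.01                      # small b makes the problem ill-conditioned (zig-zag)

def grad(z):
    return np.array([z[0], b * z[1]])   # gradient of F

def gradient_descent(z0, steps):
    s = 2.0 / (1.0 + b)       # optimal fixed step; error shrinks like (1-b)/(1+b)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - s * grad(z)
    return z

def heavy_ball(z0, steps):
    # Polyak's optimal parameters: rate roughly 1 - sqrt(b) per step
    s = (2.0 / (1.0 + np.sqrt(b))) ** 2
    beta = ((1.0 - np.sqrt(b)) / (1.0 + np.sqrt(b))) ** 2
    z_prev = np.array(z0, dtype=float)
    z = z_prev - s * grad(z_prev)         # first step has no history yet
    for _ in range(steps - 1):
        # new point k+1 uses TWO old points, k and k-1
        z, z_prev = z - s * grad(z) + beta * (z - z_prev), z
    return z

def nesterov(z0, steps):
    s = 1.0                                # 1/L with L = 1 (largest eigenvalue)
    beta = (1.0 - np.sqrt(b)) / (1.0 + np.sqrt(b))
    z_prev = np.array(z0, dtype=float)
    z = z_prev - s * grad(z_prev)
    for _ in range(steps - 1):
        y = z + beta * (z - z_prev)        # look-ahead point before the gradient step
        z, z_prev = y - s * grad(y), z
    return z

z0 = [1.0, 1.0]                # an arbitrary start away from the minimum at (0, 0)
for k in (10, 50, 200):
    print(k,
          np.linalg.norm(gradient_descent(z0, k)),
          np.linalg.norm(heavy_ball(z0, k)),
          np.linalg.norm(nesterov(z0, k)))
```

With \(b = 0.01\), plain gradient descent contracts by about \(0.98\) per step while the momentum methods contract by about \(0.82\), so after 50 steps the accelerated iterates are several orders of magnitude closer to the minimum.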

Related section in textbook: VI.4

Instructor: Prof. Gilbert Strang

Problem for Lecture 23
From textbook Section VI.4

5. Explain why projection onto a convex set \(K\) is a contraction in equation (24). Why is the distance \(||\boldsymbol{x}-\boldsymbol{y}||\) never increased when \(\boldsymbol{x}\) and \(\boldsymbol{y}\) are projected onto \(K\)?
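One standard route to this contraction property, sketched here under the usual assumption that \(K\) is closed and convex (the textbook's equation (24) may phrase it differently): since \(P\boldsymbol{x}\) is the closest point of \(K\) to \(\boldsymbol{x}\), every \(\boldsymbol{z} \in K\) satisfies the variational inequality

\[ (\boldsymbol{x}-P\boldsymbol{x})^{\mathsf{T}}(\boldsymbol{z}-P\boldsymbol{x}) \le 0. \]

Taking \(\boldsymbol{z}=P\boldsymbol{y}\) here, taking \(\boldsymbol{z}=P\boldsymbol{x}\) in the same inequality written for \(\boldsymbol{y}\), and adding the two gives

\[ \|P\boldsymbol{x}-P\boldsymbol{y}\|^2 \le (P\boldsymbol{x}-P\boldsymbol{y})^{\mathsf{T}}(\boldsymbol{x}-\boldsymbol{y}) \le \|P\boldsymbol{x}-P\boldsymbol{y}\|\,\|\boldsymbol{x}-\boldsymbol{y}\| \]

by Cauchy-Schwarz, so \(\|P\boldsymbol{x}-P\boldsymbol{y}\| \le \|\boldsymbol{x}-\boldsymbol{y}\|\): projection never increases the distance.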
