Lecture 27: Backpropagation: Find Partial Derivatives

Course Info

Instructor: Prof. Gilbert Strang
Department: Mathematics
As Taught In: Spring 2018
Level: Undergraduate

Description

In this lecture, Professor Strang presents Professor Sra’s theorem, which proves the convergence of stochastic gradient descent (SGD). He then reviews backpropagation, a method that uses the chain rule to compute derivatives quickly.

Summary

Computational graph: records each step in computing \(F(x)\) from the weights.
The derivative of each step, combined with the chain rule, gives the gradient of \(F\).
Reverse mode: the chain rule is applied backwards, from output to input.
The key step in optimizing the weights is backpropagation combined with stochastic gradient descent.
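
The summary points above can be illustrated with a minimal Python sketch. It is not taken from the lecture: the one-unit tanh network, the squared loss, and the names `forward_backward`, `mean_loss`, and `sgd` are all hypothetical choices made for illustration. Each line of the forward pass is one node of a small computational graph, and the reverse pass multiplies the local derivatives from the output back to the weights.

```python
import math
import random

# Hypothetical toy model (an assumption, not from the lecture):
#   F(w1, b1, w2) = (w2 * tanh(w1 * x + b1) - y)^2

def forward_backward(w1, b1, w2, x, y):
    """Return the loss F and its gradient with respect to (w1, b1, w2)."""
    # Forward pass: each line is one node of the computational graph.
    z = w1 * x + b1      # linear step
    a = math.tanh(z)     # nonlinear activation
    r = w2 * a - y       # residual
    F = r * r            # squared loss

    # Reverse pass: start at the output and apply the chain rule backwards.
    dF_dr = 2.0 * r
    dF_dw2 = dF_dr * a
    dF_da = dF_dr * w2
    dF_dz = dF_da * (1.0 - a * a)   # d tanh(z)/dz = 1 - tanh(z)^2
    dF_dw1 = dF_dz * x
    dF_db1 = dF_dz
    return F, (dF_dw1, dF_db1, dF_dw2)

def mean_loss(params, samples):
    """Average loss over all samples for the given (w1, b1, w2)."""
    w1, b1, w2 = params
    return sum(forward_backward(w1, b1, w2, x, y)[0] for x, y in samples) / len(samples)

def sgd(samples, params=(0.5, 0.0, 0.5), steps=2000, lr=0.05, seed=0):
    """Stochastic gradient descent: one randomly chosen sample per step."""
    rng = random.Random(seed)
    w1, b1, w2 = params
    for _ in range(steps):
        x, y = rng.choice(samples)
        _, (g1, gb, g2) = forward_backward(w1, b1, w2, x, y)
        w1 -= lr * g1
        b1 -= lr * gb
        w2 -= lr * g2
    return w1, b1, w2
```

In this sketch one reverse pass yields all three partial derivatives for roughly the cost of one extra forward pass, which is the reason for going backwards from output to input. A quick sanity check is to compare the returned gradient against centered finite differences of `forward_backward`.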

Related section in textbook: VII.3
