
Lecture 27: Backpropagation: Find Partial Derivatives

Description

In this lecture, Professor Strang presents Professor Sra's theorem proving the convergence of stochastic gradient descent (SGD). He then reviews backpropagation, a method for computing derivatives quickly using the chain rule.
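As a rough illustration of the SGD update that the convergence result concerns, here is a minimal Python sketch; the least-squares objective, data, and step size are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Minimal stochastic gradient descent sketch for a least-squares objective
# F(x) = (1/2N) * sum_i (a_i . x - b_i)^2.  The data and step size below are
# illustrative assumptions, not values from the lecture.
rng = np.random.default_rng(0)
N, d = 200, 5
A = rng.standard_normal((N, d))
x_true = rng.standard_normal(d)
b = A @ x_true

x = np.zeros(d)
step = 0.01
for t in range(5000):
    i = rng.integers(N)                  # pick one random sample
    grad_i = (A[i] @ x - b[i]) * A[i]    # gradient of the i-th term only
    x -= step * grad_i                   # SGD update

print("distance to x_true:", np.linalg.norm(x - x_true))
```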

Summary

Computational graph: Each step in computing \(F(x)\) from the weights
Derivative of each step + chain rule gives gradient of \(F\).
Reverse mode: Backwards from output to input
The key step in optimizing the weights is backpropagation + stochastic gradient descent (a small reverse-mode sketch follows below).
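To illustrate these summary points, the minimal Python sketch below evaluates a small computational graph forward and then applies the chain rule in reverse, one step at a time; the particular function and variable names are assumptions for illustration, not the lecture's example.

```python
import math

# Reverse-mode sketch for a one-node-per-step computational graph:
# F(w1, w2) = sigmoid(w2 * relu(w1 * x)) for a single input x.
# The function and names are illustrative assumptions, not the lecture's example.
def forward_backward(w1, w2, x):
    # Forward pass: compute and record every intermediate value in the graph.
    a = w1 * x
    h = max(a, 0.0)                      # ReLU
    z = w2 * h
    F = 1.0 / (1.0 + math.exp(-z))       # sigmoid

    # Reverse mode: walk backwards from the output, applying the chain rule
    # one step at a time to get the partial derivatives of F.
    dF_dz = F * (1.0 - F)                        # sigmoid'(z)
    dF_dw2 = dF_dz * h
    dF_dh = dF_dz * w2
    dF_da = dF_dh * (1.0 if a > 0 else 0.0)      # ReLU'(a)
    dF_dw1 = dF_da * x
    return F, dF_dw1, dF_dw2

print(forward_backward(0.5, -1.2, 2.0))
```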

Related section in textbook: VII.3

Instructor: Prof. Gilbert Strang

Problems for Lecture 27
From textbook Section VII.2

2. If \(A\) is an \(m\) by \(n\) matrix with \(m>n\), is it faster to multiply \(A(A^{\mathtt{T}} A)\) or \((AA^{\mathtt{T}})A\)?
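One way to sanity-check an answer here is to count scalar multiplications for each parenthesization: a \(p\times q\) times \(q\times r\) product takes \(pqr\) multiplications. The short Python sketch below does this count; the helper and the specific values of \(m\) and \(n\) are illustrative assumptions, not part of the problem.

```python
# Count scalar multiplications for each parenthesization.
# Multiplying a p-by-q matrix by a q-by-r matrix takes p*q*r multiplications.
def flops(p, q, r):
    return p * q * r

m, n = 1000, 100                                    # illustrative, with m > n
cost_AtA_first = flops(n, m, n) + flops(m, n, n)    # A (A^T A)
cost_AAt_first = flops(m, n, m) + flops(m, m, n)    # (A A^T) A
print(cost_AtA_first, cost_AAt_first)
```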

5. Draw a computational graph to compute the function \(f(x,y)=x^3(x-y)\). Use the graph to compute \(f(2,3)\).
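As a companion to drawing the graph, here is a minimal Python sketch of the node-by-node evaluation such a graph represents; the node names are illustrative assumptions, and the problem itself asks for a drawn graph.

```python
# Step-by-step evaluation of f(x, y) = x^3 * (x - y), mirroring the nodes
# of a computational graph (node names are illustrative assumptions).
def f_graph(x, y):
    c1 = x * x          # node 1: x^2
    c2 = c1 * x         # node 2: x^3
    c3 = x - y          # node 3: x - y
    return c2 * c3      # output node: x^3 * (x - y)

print(f_graph(2, 3))    # evaluates the graph at (2, 3)
```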

Course Info

As Taught In: Spring 2018
Learning Resource Types: Lecture Videos, Problem Sets, Instructor Insights