Lecture 33: Neural Nets and the Learning Function

Course Info

Instructor: Prof. Gilbert Strang

Departments: Mathematics

As Taught In: Spring 2018

Level: Undergraduate

Description

This lecture focuses on the construction of the learning function \(F\), which is optimized by stochastic gradient descent and applied to the training data to minimize the loss. Professor Strang also begins his review of distance matrices.
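As a rough illustration of what "optimized by stochastic gradient descent" means, the sketch below fits a deliberately simplified learning function \(F(v) = av + b\) (one weight, one bias, no ReLU) to a few training pairs. The squared-error loss, the step size, the number of passes, and the training data are all assumptions made for this example; they are not taken from the lecture.

```python
import random

def sgd_fit(samples, step=0.1, epochs=200, seed=0):
    """Fit F(v) = a*v + b to (v, y) pairs by stochastic gradient descent."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0                      # the weights x = (a, b), started at zero
    for _ in range(epochs):
        order = samples[:]               # visit the samples in a fresh random order
        rng.shuffle(order)
        for v, y in order:
            error = (a * v + b) - y      # F(v) - y for this one sample
            # Gradient of the squared error (F(v) - y)^2 with respect to a and b
            a -= step * 2.0 * error * v
            b -= step * 2.0 * error
    return a, b

def mean_loss(samples, a, b):
    """Average squared error of F(v) = a*v + b over the samples."""
    return sum(((a * v + b) - y) ** 2 for v, y in samples) / len(samples)

# Hypothetical training data generated from y = 2v + 1 (assumed, noise-free)
training = [(k / 4.0, 2.0 * (k / 4.0) + 1.0) for k in range(-4, 5)]
a, b = sgd_fit(training)
```

Each update uses the gradient from a single sample rather than from the whole training set, which is the "stochastic" part. On this noise-free data the weights should approach \(a = 2\) and \(b = 1\).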

Summary

Each training sample is given by a vector \(v\).
The next layer of the net is \(F_1(v) = \hbox{ReLU}(A_1 v + b_1)\).
\(w_1 = A_1 v + b_1\), with optimized weights in \(A_1\) and \(b_1\).
\(\hbox{ReLU}(w) = \max(0, w)\) is the nonlinear activation function, applied to each component of \(w\).
Minimize the loss function by optimizing the weights \(x\): the \(A\)’s and \(b\)’s.
Given the matrix of distances between points: Find the points!
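The first layer described in the summary can be sketched in a few lines of plain Python. The matrix `A1`, the bias `b1`, and the sample `v` below are made-up values chosen only to show one negative component being clipped to zero; in a real network they would come from training.

```python
def relu(w):
    """Apply max(0, w) to each component of the vector w."""
    return [max(0.0, wi) for wi in w]

def layer(A, b, v):
    """Return ReLU(A v + b) for a matrix A (list of rows), bias b, and input v."""
    w = [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) + b_i
         for row, b_i in zip(A, b)]
    return relu(w)

# Hypothetical weights: 2 inputs feeding 3 neurons
A1 = [[1.0, -1.0],
      [0.5,  0.5],
      [-2.0, 1.0]]
b1 = [0.0, -1.0, 0.5]
v = [2.0, 1.0]

F1 = layer(A1, b1, v)   # w1 = [1.0, 0.5, -2.5], so F1 = [1.0, 0.5, 0.0]
```

Deeper layers repeat the same step: the output `F1` becomes the input to `layer(A2, b2, F1)`, and so on.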

Related sections in textbook: VII.1 and IV.10

Problems for Lecture 33
From textbook Section VII.1

5. How many weights and biases are in a network with \(m=N_0=4\) inputs in each feature vector \(\boldsymbol{v}_0\) and \(N=6\) neurons on each of the 3 hidden layers? How many activation functions \((\hbox{ReLU})\) are in this network, before the final output?
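One way to organize the count is sketched below. It assumes a fully connected network in which every layer after the input has one weight per incoming connection and one bias per neuron. The problem does not state the size of the output layer, so the single output used here is an assumption, as is the choice to give that output a bias.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected net with these layer sizes."""
    # One weight matrix of shape (n_out, n_in) between each pair of adjacent layers
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer after the input
    biases = sum(layer_sizes[1:])
    return weights, biases

# 4 inputs, three hidden layers of 6 neurons; the output size 1 is an assumption
sizes = [4, 6, 6, 6, 1]
weights, biases = count_parameters(sizes)

# One ReLU per hidden neuron, none on the input or the final output
relu_count = sum(sizes[1:-1])
```

Changing the last entry of `sizes` shows how the totals depend on the assumed output layer, while `relu_count` depends only on the hidden layers.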

From textbook Section IV.10

2. \(||\boldsymbol{x}_1-\boldsymbol{x}_2||^2=9\) and \(||\boldsymbol{x}_2-\boldsymbol{x}_3||^2=16\) and \(||\boldsymbol{x}_1-\boldsymbol{x}_3||^2=25\) do satisfy the triangle inequality \(3+4>5\). Construct \(G\) and find points \(\boldsymbol{x}_1,\boldsymbol{x}_2,\boldsymbol{x}_3\) that match these distances.
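A minimal sketch of one route from squared distances to points follows. It assumes the first point is placed at the origin, so that \(G_{ij} = \frac{1}{2}(D_{1i} + D_{1j} - D_{ij})\), and it factors \(G = X^{\mathrm{T}} X\) with a Cholesky-style elimination rather than with eigenvalues. The function names are hypothetical, and the factorization assumes \(G\) is positive semidefinite.

```python
import math

def gram_from_distances(D):
    """Build G = X^T X from squared distances D, with the first point at the origin."""
    n = len(D)
    return [[(D[0][i] + D[0][j] - D[i][j]) / 2.0 for j in range(n)]
            for i in range(n)]

def points_from_gram(G, tol=1e-12):
    """Factor G = L L^T and return the rows of L as the points.

    Assumes G is positive semidefinite; a zero pivot leaves a zero column in L.
    """
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = G[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot > tol:
            L[j][j] = math.sqrt(pivot)
            for i in range(j + 1, n):
                L[i][j] = (G[i][j]
                           - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    # X = L^T, so point j (column j of X) is row j of L
    return [row[:] for row in L]

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Squared distances from the problem: 9, 16, 25
D = [[0.0, 9.0, 25.0],
     [9.0, 0.0, 16.0],
     [25.0, 16.0, 0.0]]
G = gram_from_distances(D)
points = points_from_gram(G)
```

Any rotation, reflection, or shift of the returned points matches the same distances, so the answer is not unique; placing \(\boldsymbol{x}_1\) at the origin is just one convenient normalization.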