WEBVTT
00:00:00.000 --> 00:00:01.992
[MUSIC PLAYING]
00:00:15.960 --> 00:00:18.210
GILBERT STRANG: So,
I'm Gilbert Strang.
00:00:18.210 --> 00:00:26.430
And this is about my new course
18.065 and the new textbook
00:00:26.430 --> 00:00:28.860
Linear Algebra and
Learning from Data,
00:00:28.860 --> 00:00:30.750
and what's in those subjects.
00:00:30.750 --> 00:00:36.360
So there are really
two essential topics
00:00:36.360 --> 00:00:42.690
and two supplementary, but
all very important subjects.
00:00:42.690 --> 00:00:47.250
So if I tell you about those
four parts of mathematics that
00:00:47.250 --> 00:00:49.680
are in the course,
that will give you
00:00:49.680 --> 00:00:53.230
an idea if you're interested
to follow through.
00:00:53.230 --> 00:00:56.730
So the first big subject
is linear algebra.
00:00:56.730 --> 00:01:01.140
That subject has just surged,
exploded in importance
00:01:01.140 --> 00:01:02.310
in practice.
00:01:02.310 --> 00:01:07.380
What I want to focus on is
some of the best matrices, say,
00:01:07.380 --> 00:01:11.880
symmetric matrices, orthogonal
matrices, and their relation.
00:01:15.270 --> 00:01:19.080
Those are the stars
of linear algebra.
00:01:19.080 --> 00:01:24.690
And the key step is
to factor a matrix
00:01:24.690 --> 00:01:28.820
into maybe symmetric
times orthogonal matrix,
00:01:28.820 --> 00:01:34.950
maybe orthogonal times diagonal
times orthogonal matrix--
00:01:34.950 --> 00:01:38.700
that's a very important
factorization called
00:01:38.700 --> 00:01:41.700
the singular value
decomposition.
00:01:41.700 --> 00:01:46.380
That doesn't get into a lot
of linear algebra courses,
00:01:46.380 --> 00:01:49.350
but it's so critical.
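[The factorization described here can be sketched in a few lines of NumPy; the matrix below is made-up illustrative data, not an example from the lecture.]

```python
import numpy as np

# The singular value decomposition: any matrix A factors as
# orthogonal * diagonal * orthogonal, A = U S V^T.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])  # arbitrary 3x2 example matrix

# U and V have orthonormal columns; s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Check that the columns of U are orthonormal: U^T U = I.
print(np.allclose(U.T @ U, np.eye(2)))   # True

# Reassemble orthogonal * diagonal * orthogonal and recover A.
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True
```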
00:01:49.350 --> 00:01:54.340
So can I speak now about
the second important topic,
00:01:54.340 --> 00:01:55.780
which is deep learning?
00:01:55.780 --> 00:01:57.830
So what is deep learning?
00:01:57.830 --> 00:02:02.370
The job is to create a function.
00:02:02.370 --> 00:02:06.030
Your inputs vary. For
driverless cars,
00:02:06.030 --> 00:02:08.820
the input would be an
image that's a telephone
00:02:08.820 --> 00:02:10.440
pole or a pedestrian.
00:02:10.440 --> 00:02:17.460
And the system has to learn
to recognize which it is.
00:02:17.460 --> 00:02:23.730
Or the input from
handwriting on addresses
00:02:23.730 --> 00:02:25.650
would be a zip code.
00:02:25.650 --> 00:02:31.830
So the system has to learn how
to recognize 0, 1, 2, 3 up to 9
00:02:31.830 --> 00:02:34.380
from handwriting of all kinds.
00:02:34.380 --> 00:02:38.580
Or another one is speech,
like what Siri has to do.
00:02:38.580 --> 00:02:42.630
So my speech has to be
taken as input, interpreted,
00:02:42.630 --> 00:02:45.690
and output by the
process of deep learning.
00:02:45.690 --> 00:02:49.080
So it involves creating
a learning function.
00:02:49.080 --> 00:02:54.120
The function takes
the input, the data,
00:02:54.120 --> 00:02:59.750
and produces the output,
the meaning of that data.
00:02:59.750 --> 00:03:03.120
And so what's the function like?
00:03:03.120 --> 00:03:07.080
That's what mathematics
is about, functions.
00:03:07.080 --> 00:03:11.070
So it involves matrix
multiplication.
00:03:11.070 --> 00:03:14.910
Part of the function
is multiplying vectors
00:03:14.910 --> 00:03:16.650
by matrices.
00:03:16.650 --> 00:03:18.720
So that's a bunch of steps.
00:03:18.720 --> 00:03:22.440
But if there was only that,
if it was all linear algebra,
00:03:22.440 --> 00:03:25.660
the thing would
fail and has failed.
00:03:25.660 --> 00:03:32.040
What makes it work now so much
that companies are investing
00:03:32.040 --> 00:03:36.330
enormously in the technology
is that there is now
00:03:36.330 --> 00:03:39.810
a nonlinear function, a very
simple one in the middle
00:03:39.810 --> 00:03:42.870
between every pair of matrices.
00:03:42.870 --> 00:03:46.510
And that nonlinear function, I
can even tell you what it is.
00:03:46.510 --> 00:03:49.500
It's a function f,
let's call it f,
00:03:49.500 --> 00:03:55.230
f of x is equal to
x if x is positive.
00:03:55.230 --> 00:03:59.010
And f of x is 0
if x is negative.
00:03:59.010 --> 00:04:00.690
So you can imagine its graph.
00:04:00.690 --> 00:04:02.790
It's a flat line
where it's negative.
00:04:02.790 --> 00:04:06.660
And then it's a 45-degree
slope where it's positive.
00:04:06.660 --> 00:04:09.030
So putting that
nonlinear function
00:04:09.030 --> 00:04:11.760
in between the matrix
multiplications
00:04:11.760 --> 00:04:18.300
is the way to construct
a successful learning function.
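[The construction described here — matrix multiplications with that nonlinear function between every pair of matrices — can be sketched as follows; the weight matrices are arbitrary untrained numbers chosen for illustration.]

```python
import numpy as np

def relu(x):
    # The nonlinear function: f(x) = x if x is positive, 0 if x is negative.
    return np.maximum(0.0, x)

# Two made-up weight matrices (a real system learns these entries).
A1 = np.array([[1.0, -2.0],
               [0.5,  1.0]])   # first matrix
A2 = np.array([[2.0,  1.0]])   # second matrix

def learning_function(v):
    # Matrix multiply, then the nonlinearity, then matrix multiply again.
    return A2 @ relu(A1 @ v)

v = np.array([1.0, 1.0])
print(learning_function(v))    # [1.5]
```

Without `relu` in the middle, the whole composition would collapse to a single matrix `A2 @ A1`, which is why a purely linear construction fails.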
00:04:18.300 --> 00:04:20.790
But you have to
find those matrices.
00:04:20.790 --> 00:04:23.490
I mentioned two
supporting subjects.
00:04:23.490 --> 00:04:26.850
The first is-- optimization
would be the word.
00:04:26.850 --> 00:04:31.260
We have to find the entries
in those matrices that
00:04:31.260 --> 00:04:34.200
go into the learning function.
00:04:34.200 --> 00:04:36.030
That's a crucial step.
00:04:36.030 --> 00:04:39.840
So this is a problem
of minimizing the error
00:04:39.840 --> 00:04:44.520
with all those matrix
entries as variables.
00:04:44.520 --> 00:04:50.880
So this is multivariable
calculus, like 100,000
00:04:50.880 --> 00:04:57.210
or 500,000 variables. It's just
unthinkable in a basic calculus
00:04:57.210 --> 00:05:02.370
course, but it's happening
in a company that's
00:05:02.370 --> 00:05:04.380
working with deep learning.
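[The minimization described here can be sketched with gradient descent on a tiny made-up problem — three variables instead of hundreds of thousands, and a single matrix W whose entries are the unknowns; this is an illustration, not the course's code.]

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))       # 50 input vectors, 3 features each
W_true = np.array([[1.0, -2.0, 0.5]])  # matrix that generated the targets
Y = X @ W_true.T                       # the desired outputs

W = np.zeros((1, 3))                   # initial guess for the entries
step = 0.05                            # learning rate
for _ in range(500):
    error = X @ W.T - Y                # residual for the current W
    grad = 2 * error.T @ X / len(X)    # gradient of the mean squared error
    W -= step * grad                   # move the entries downhill

print(np.allclose(W, W_true, atol=1e-6))   # True: W has converged to W_true
```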
00:05:04.380 --> 00:05:10.370
And so that's the giant
calculation of deep learning.
00:05:10.370 --> 00:05:16.130
That's what keeps
GPUs going for a week.
00:05:16.130 --> 00:05:19.970
But it gives amazing
results that could never
00:05:19.970 --> 00:05:22.610
have been achieved in the past.
00:05:22.610 --> 00:05:27.060
So then the other key
subject is statistics.
00:05:27.060 --> 00:05:29.570
And the basic
ideas of statistics
00:05:29.570 --> 00:05:32.510
play a role here,
because when you're
00:05:32.510 --> 00:05:37.190
multiplying a whole sequence
of matrices in deep learning,
00:05:37.190 --> 00:05:41.480
it's very possible
for the numbers
00:05:41.480 --> 00:05:45.290
to grow out of
sight exponentially
00:05:45.290 --> 00:05:47.510
or to drop to zero.
00:05:47.510 --> 00:05:52.220
And both of those are bad news
for the learning function.
00:05:52.220 --> 00:05:56.870
So you need to keep the mean
and variance at the right spot
00:05:56.870 --> 00:06:02.510
to keep those numbers in
the learning function,
00:06:02.510 --> 00:06:08.330
those matrices in a good range.
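[The effect described here can be demonstrated with random matrices standing in for the layers — an illustration of the idea, not real network weights: without control, repeated multiplication blows the numbers up, while resetting the mean and variance after each step keeps them in a good range.]

```python
import numpy as np

rng = np.random.default_rng(0)
start = rng.standard_normal(100)

# Multiply by 50 random matrices with no control over the sizes.
x = start.copy()
for _ in range(50):
    M = rng.standard_normal((100, 100))
    x = M @ x                          # each step grows the size ~10x
big = np.linalg.norm(x)
print(big > 1e20)                      # True: the numbers have exploded

# Same multiplications, but normalize after each step.
x = start.copy()
for _ in range(50):
    M = rng.standard_normal((100, 100))
    x = M @ x
    x = (x - x.mean()) / x.std()       # reset to mean 0, variance 1
good = np.linalg.norm(x)
print(good)                            # stays near sqrt(100) = 10
```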
00:06:08.330 --> 00:06:12.500
So this course won't
be a statistics course,
00:06:12.500 --> 00:06:16.820
but it will use statistics
as deep learning does.
00:06:16.820 --> 00:06:19.700
So those are the four subjects.
00:06:19.700 --> 00:06:24.080
Linear algebra and deep
learning, two big ones.
00:06:24.080 --> 00:06:29.240
Optimization and
statistics, essential also.
00:06:29.240 --> 00:06:35.720
So I hope you'll enjoy the
videos and enjoy the textbook.
00:06:35.720 --> 00:06:41.660
And go to the OpenCourseWare
site ocw.mit.edu
00:06:41.660 --> 00:06:43.850
for the full picture.
00:06:43.850 --> 00:06:47.930
Beyond the videos, there
are exercises, problems,
00:06:47.930 --> 00:06:53.210
discussion, lots more toward
making a complete presentation,
00:06:53.210 --> 00:06:55.420
which I hope you like.