WEBVTT
00:00:13.972 --> 00:00:16.440
GILBERT STRANG: OK,
so I was speaking
00:00:16.440 --> 00:00:19.650
about eigenvalues
and eigenvectors
00:00:19.650 --> 00:00:21.690
for a square matrix.
00:00:21.690 --> 00:00:25.760
And then I said for data
for many other applications,
00:00:25.760 --> 00:00:27.390
the matrices are not square.
00:00:27.390 --> 00:00:31.560
We need something that replaces
eigenvalues and eigenvectors.
00:00:31.560 --> 00:00:32.640
And what they are--
00:00:32.640 --> 00:00:37.530
and it's perfect-- is singular
values and singular vectors.
00:00:37.530 --> 00:00:41.620
So may I explain singular
values and singular vectors?
00:00:41.620 --> 00:00:45.930
This slide shows a lot of them.
00:00:45.930 --> 00:00:51.010
The point is that
there will be--
00:00:51.010 --> 00:00:54.430
now I don't say eigenvectors--
two different sets-- left singular
00:00:54.430 --> 00:00:55.380
vectors.
00:00:55.380 --> 00:00:58.270
They will go into this matrix U.
00:00:58.270 --> 00:01:01.740
Right singular vectors
will go into V.
00:01:01.740 --> 00:01:03.880
It was the other case
that was so special.
00:01:03.880 --> 00:01:06.820
When the matrix was
symmetric, then the left
00:01:06.820 --> 00:01:08.250
eigenvectors are the same
00:01:08.250 --> 00:01:10.720
as the right ones.
00:01:10.720 --> 00:01:12.430
That's sort of sensible.
00:01:12.430 --> 00:01:14.740
But a general
matrix and certainly
00:01:14.740 --> 00:01:16.135
a rectangular matrix--
00:01:19.270 --> 00:01:21.100
well, we don't call
them eigenvectors,
00:01:21.100 --> 00:01:22.600
because that would
be confusing-- we
00:01:22.600 --> 00:01:24.520
call them singular vectors.
00:01:24.520 --> 00:01:28.750
And then, in between
are not eigenvalues,
00:01:28.750 --> 00:01:31.450
but singular values.
00:01:31.450 --> 00:01:32.130
Oh, right.
00:01:32.130 --> 00:01:34.700
Oh, hiding over here is a key.
00:01:34.700 --> 00:01:40.120
A times the v's gives
sigma times the u's.
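That key relation, A times each v giving sigma times the corresponding u, can be checked in a few lines of numpy. The matrix here is an illustrative choice, not one from the slides.

```python
import numpy as np

# A rectangular matrix (3x2), so eigenvalues/eigenvectors don't apply,
# but singular values and singular vectors do.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# numpy returns U (left singular vectors), the sigmas, and V transpose.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A v_i = sigma_i u_i for every pair -- the replacement for Ax = lambda x.
for i in range(len(s)):
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])
```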
00:01:40.120 --> 00:01:43.210
So that's the replacement
for Ax equals lambda
00:01:43.210 --> 00:01:45.510
x, which had x on both sides.
00:01:45.510 --> 00:01:48.330
OK, now we've got two.
00:01:48.330 --> 00:01:51.820
But the beauty is now we've
got two of those to work with.
00:01:51.820 --> 00:01:57.280
We can make all the u's
orthogonal to each other--
00:01:57.280 --> 00:02:01.000
all the v's orthogonal
to each other.
00:02:01.000 --> 00:02:03.910
We can do what only
symmetric matrices
00:02:03.910 --> 00:02:06.160
could do for eigenvectors.
00:02:06.160 --> 00:02:08.289
We can do it now
for all matrices,
00:02:08.289 --> 00:02:13.150
not just square ones--
this is where life is, OK.
00:02:13.150 --> 00:02:15.340
And these numbers
instead of the lambdas
00:02:15.340 --> 00:02:17.350
are called singular values.
00:02:17.350 --> 00:02:20.290
And we use the letter
sigma for those.
00:02:20.290 --> 00:02:24.110
And here is a picture of
the geometry in 2 by 2
00:02:24.110 --> 00:02:26.600
if we had a 2 by 2 matrix.
00:02:26.600 --> 00:02:29.860
So you remember,
factorization breaks
00:02:29.860 --> 00:02:33.960
up a matrix into
separate small parts,
00:02:33.960 --> 00:02:36.160
each doing its own thing.
00:02:36.160 --> 00:02:39.630
So if I multiply by
a vector x, the first thing
00:02:39.630 --> 00:02:42.270
that's going to hit
it is v transpose.
00:02:42.270 --> 00:02:45.660
V transpose is an
orthogonal matrix.
00:02:45.660 --> 00:02:48.930
Remember, I said we can
make these singular vectors
00:02:48.930 --> 00:02:50.160
perpendicular.
00:02:50.160 --> 00:02:52.200
That's what an orthogonal
matrix does-- it's just
00:02:52.200 --> 00:02:54.390
like a rotation, as you see.
00:02:54.390 --> 00:03:00.290
So V transpose just
turns the vector, to get from here
00:03:00.290 --> 00:03:01.890
to the second one.
00:03:01.890 --> 00:03:04.200
Then I'm multiplying
by the lambdas.
00:03:04.200 --> 00:03:05.490
But they're not lambdas now.
00:03:05.490 --> 00:03:07.040
They're sigmas.
00:03:07.040 --> 00:03:10.360
The matrix-- so
that's capital Sigma.
00:03:10.360 --> 00:03:12.940
So there is sigma 1 and sigma 2.
00:03:12.940 --> 00:03:16.140
What they do is
stretch the circle.
00:03:16.140 --> 00:03:17.470
It's a diagonal matrix.
00:03:17.470 --> 00:03:20.020
So it doesn't turn things.
00:03:20.020 --> 00:03:23.590
But it stretches the
circle to an ellipse
00:03:23.590 --> 00:03:26.710
because it gets the two
different singular values
00:03:26.710 --> 00:03:28.480
in-- sigma 1 and sigma 2.
00:03:28.480 --> 00:03:34.480
And then the last guy, the
U is going to hit last.
00:03:34.480 --> 00:03:36.820
It takes the ellipse
and turns it again.
00:03:36.820 --> 00:03:38.350
It's again a rotation--
00:03:38.350 --> 00:03:41.758
rotation, stretch, rotation.
00:03:41.758 --> 00:03:42.550
I'll say it again--
00:03:42.550 --> 00:03:45.140
rotation, stretch, rotation.
00:03:45.140 --> 00:03:47.860
That's what singular
values and singular
00:03:47.860 --> 00:03:51.430
vectors do, the singular
value decomposition.
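That rotation, stretch, rotation pipeline can be traced step by step in numpy, applying V transpose, then Sigma, then U to a vector. The 2 by 2 matrix is a hypothetical example, not the one in the slide's picture.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # hypothetical 2x2 example
U, s, Vt = np.linalg.svd(A)

x = np.array([1.0, 0.0])            # a point on the unit circle
step1 = Vt @ x                      # rotation: orthogonal V transpose
step2 = np.diag(s) @ step1          # stretch: diagonal Sigma, circle to ellipse
step3 = U @ step2                   # rotation again: orthogonal U

# The three stages together are exactly A times x.
assert np.allclose(step3, A @ x)
```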
00:03:51.430 --> 00:03:56.740
And it's got the best
of all worlds here.
00:03:56.740 --> 00:04:01.750
It's got the rotations,
the orthogonal matrices.
00:04:01.750 --> 00:04:06.003
And it's got the stretches,
the diagonal matrices.
00:04:06.003 --> 00:04:07.920
Of all the matrices to work with,
those two are the greatest.
00:04:07.920 --> 00:04:13.290
Triangular matrices were good
when we were young an hour ago.
00:04:13.290 --> 00:04:15.670
Now, we're seeing the best.
00:04:15.670 --> 00:04:20.014
OK, now let me just show
you where they come from.
00:04:20.014 --> 00:04:23.780
Oh, so how to find these v's.
00:04:23.780 --> 00:04:27.970
Well, the answer is, if I'm
looking for orthogonal vectors,
00:04:27.970 --> 00:04:32.540
the great idea is to find
a symmetric matrix
00:04:32.540 --> 00:04:35.000
and use its eigenvectors.
00:04:35.000 --> 00:04:40.280
So these v's that I want for
A are actually eigenvectors
00:04:40.280 --> 00:04:43.100
of this symmetric
matrix A transpose times
00:04:43.100 --> 00:04:45.560
A. That's just nice.
00:04:45.560 --> 00:04:48.350
So we can find those
singular vectors
00:04:48.350 --> 00:04:51.110
just as fast as we
can find eigenvectors
00:04:51.110 --> 00:04:52.700
for a symmetric matrix.
00:04:52.700 --> 00:04:56.510
And there, because
A transpose A is symmetric,
00:04:56.510 --> 00:04:59.540
we know the eigenvectors
are perpendicular
00:04:59.540 --> 00:05:01.580
to each other-- orthonormal.
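That recipe -- the v's are eigenvectors of the symmetric matrix A transpose A -- can be verified directly. The 2 by 2 matrix below is illustrative, not the one on the slide.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Eigenvalues/eigenvectors of the symmetric matrix A^T A.
evals, V = np.linalg.eigh(A.T @ A)   # eigh: solver for symmetric matrices

# Eigenvalues of A^T A are the sigmas squared; eigh sorts ascending.
sigmas = np.sqrt(evals[::-1])

# They match the singular values that numpy's SVD reports.
_, s, _ = np.linalg.svd(A)
assert np.allclose(sigmas, s)
```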
00:05:01.580 --> 00:05:04.910
OK, and now what about the
other ones? Because remember,
00:05:04.910 --> 00:05:06.160
we have two sets.
00:05:06.160 --> 00:05:13.100
The u's-- well, we just multiply
by A. And we've got the u's.
00:05:13.100 --> 00:05:16.280
Well, and divide by the sigmas,
because these vectors, the u's
00:05:16.280 --> 00:05:19.700
and v's, are unit
vectors, length one.
00:05:19.700 --> 00:05:22.100
So we have to scale
them properly.
00:05:22.100 --> 00:05:27.380
And this was a little key
bit of algebra to check that,
00:05:27.380 --> 00:05:29.660
not only are the v's
orthogonal,
00:05:29.660 --> 00:05:31.880
but the u's are orthogonal too.
00:05:31.880 --> 00:05:33.320
Yeah, it just comes out--
00:05:33.320 --> 00:05:34.590
comes out.
00:05:34.590 --> 00:05:36.870
So this singular
value decomposition
00:05:36.870 --> 00:05:41.750
is maybe, well, say 100
years old, maybe a bit more.
00:05:41.750 --> 00:05:48.410
But it's really in the last 20,
30 years that singular values
00:05:48.410 --> 00:05:50.160
have become so important.
00:05:50.160 --> 00:05:54.212
This is the best
factorization of them all.
00:05:54.212 --> 00:05:57.620
And that's not always reflected
in linear algebra courses.
00:05:57.620 --> 00:06:04.540
So part of my goal today is
to say: get to singular values.
00:06:04.540 --> 00:06:08.780
If you've done symmetric
matrices and their eigenvalues,
00:06:08.780 --> 00:06:11.220
then you can do singular values.
00:06:11.220 --> 00:06:17.850
And I think that's absolutely
worth doing, OK. Yeah,
00:06:17.850 --> 00:06:22.850
so remember down here
that capital Sigma stands
00:06:22.850 --> 00:06:27.500
for the diagonal matrix of
these positive numbers, sigma 1,
00:06:27.500 --> 00:06:30.500
sigma 2 down to sigma r there.
00:06:30.500 --> 00:06:34.820
The rank, which came way
back in the first slides,
00:06:34.820 --> 00:06:36.830
tells you how many there are.
00:06:36.830 --> 00:06:40.700
Good, good.
00:06:40.700 --> 00:06:43.230
Oh, here's an example.
00:06:43.230 --> 00:06:45.740
So I took a small
matrix because I'm
00:06:45.740 --> 00:06:49.430
doing this by pencil and
paper and actually showing you
00:06:49.430 --> 00:06:52.590
the singular value decomposition.
00:06:52.590 --> 00:06:55.100
So there is my matrix, 2 by 2.
00:06:55.100 --> 00:06:56.310
Here are the u's.
00:06:56.310 --> 00:06:58.130
Do you see that those
are orthogonal--
00:06:58.130 --> 00:07:01.070
1, 3 against minus 3, 1?
00:07:01.070 --> 00:07:03.150
Take the dot product,
and you get 0.
00:07:03.150 --> 00:07:04.960
The v's are orthogonal.
00:07:04.960 --> 00:07:06.800
The sigma is diagonal.
00:07:06.800 --> 00:07:11.890
And then the pieces from
that add back to the matrix.
00:07:11.890 --> 00:07:14.190
So it's really, it's
broken my matrix
00:07:14.190 --> 00:07:16.560
into a couple of pieces--
00:07:16.560 --> 00:07:20.720
one for the first
singular value and vector,
00:07:20.720 --> 00:07:23.980
and the other for the second
singular value and vector.
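The pieces adding back to the matrix is the statement that A is the sum of sigma_i times u_i times v_i transpose, one rank-one piece per singular pair. A numpy sketch with an illustrative matrix (the slide's own 2 by 2 is not reproduced here):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

# One rank-one piece sigma_i * u_i * v_i^T per singular pair.
pieces = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]

# The pieces add back to the matrix.
assert np.allclose(sum(pieces), A)
```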
00:07:23.980 --> 00:07:26.430
And that's what
data science wants.
00:07:26.430 --> 00:07:30.310
Data science wants to know:
what's important in the matrix?
00:07:30.310 --> 00:07:34.610
Well, what's important
is sigma 1, the big guy.
00:07:34.610 --> 00:07:36.400
Sigma 2, you see.
00:07:36.400 --> 00:07:38.020
Well, it was 3 times smaller--
00:07:38.020 --> 00:07:40.450
3/2 versus 1/2.
00:07:40.450 --> 00:07:45.760
So if I had a 100 by 100
matrix or 100 by 1,000,
00:07:45.760 --> 00:07:51.560
I'd have 100 singular values and
maybe the first five I'd keep.
00:07:51.560 --> 00:07:54.502
If I'm in the financial
market, those guys,
00:07:54.502 --> 00:07:57.490
those first numbers
are telling me
00:07:57.490 --> 00:08:00.790
maybe what bond prices
are going to do over time.
00:08:00.790 --> 00:08:06.110
And it's a mixture
of a few features,
00:08:06.110 --> 00:08:09.930
but not all 1,000
features, right.
00:08:09.930 --> 00:08:13.890
So the singular
value decomposition
00:08:13.890 --> 00:08:17.450
picks out the important
part of a data matrix.
00:08:17.450 --> 00:08:19.490
And you cannot ask
for more than that.
00:08:22.250 --> 00:08:25.700
Here's what you do when a matrix
is just totally enormous--
00:08:25.700 --> 00:08:27.730
too big to multiply--
00:08:27.730 --> 00:08:29.560
too big to compute.
00:08:29.560 --> 00:08:37.220
Then you randomly sample it.
00:08:37.220 --> 00:08:39.799
Yeah, maybe the next
slide even mentions
00:08:39.799 --> 00:08:42.690
that word randomized
numerical linear algebra.
00:08:42.690 --> 00:08:46.170
So this, I'll go back to this.
00:08:46.170 --> 00:08:48.980
So the singular
value decomposition--
00:08:48.980 --> 00:08:52.410
this, what we just talked
about with the u's and the v's
00:08:52.410 --> 00:08:54.350
and the sigmas.
00:08:54.350 --> 00:08:56.670
Sigma 1 is the biggest.
00:08:56.670 --> 00:08:59.400
Sigma r is the smallest.
00:08:59.400 --> 00:09:01.640
So in data science,
you very often
00:09:01.640 --> 00:09:06.410
keep just these first ones,
maybe the first k, the k
00:09:06.410 --> 00:09:08.150
largest ones.
00:09:08.150 --> 00:09:10.250
And then you've
got the matrix that
00:09:10.250 --> 00:09:14.900
has rank only k, because you're
only working with k vectors.
00:09:14.900 --> 00:09:19.130
And it turns out that's the
closest one to the big matrix
00:09:19.130 --> 00:09:23.610
A. So this singular value decomposition
is, among other things,
00:09:23.610 --> 00:09:27.930
picking out, putting
in order of importance,
00:09:27.930 --> 00:09:30.940
the little pieces of the matrix.
00:09:30.940 --> 00:09:34.700
And then you can just pick
a few pieces to work with.
00:09:34.700 --> 00:09:36.090
Yeah, yeah.
00:09:36.090 --> 00:09:40.620
And the idea of norms is how to
measure the size of a matrix.
00:09:40.620 --> 00:09:46.060
Yeah, but I'll leave
that for the future.
00:09:46.060 --> 00:09:50.230
And randomized linear algebra
I just want to mention.
00:09:50.230 --> 00:09:53.950
It seems a little crazy that
by just randomly sampling
00:09:53.950 --> 00:09:58.580
a matrix, we could
learn anything about it.
00:09:58.580 --> 00:10:02.560
But typically data
is sort of organized.
00:10:02.560 --> 00:10:05.270
It's not just
totally random stuff.
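One way that random sampling can work is the randomized range finder -- a standard technique, assumed here rather than taken from the lecture: multiply A by a few random vectors, orthogonalize, and do a small SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
# "Data is sort of organized": an exactly low-rank matrix for illustration.
A = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 300))

k = 10
G = rng.standard_normal((300, k + 5))   # random test vectors, slightly oversampled
Q, _ = np.linalg.qr(A @ G)              # orthonormal basis for the sampled range
B = Q.T @ A                             # small matrix: (k+5) x 300

_, s_small, _ = np.linalg.svd(B, full_matrices=False)
_, s_full, _ = np.linalg.svd(A, full_matrices=False)

# The small problem recovers the top singular values of the big one.
assert np.allclose(s_small[:k], s_full[:k])
```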
00:10:05.270 --> 00:10:09.820
So, for example, my
friend at the Broad Institute
00:10:09.820 --> 00:10:14.410
was studying the ancient
history of man.
00:10:14.410 --> 00:10:18.470
So data from thousands
of years ago.
00:10:18.470 --> 00:10:20.090
So he had a giant matrix--
00:10:20.090 --> 00:10:22.580
a lot of data-- too much data.
00:10:22.580 --> 00:10:27.880
And he said, how can we find the
singular value decomposition?
00:10:27.880 --> 00:10:29.660
Pick out the important thing.
00:10:29.660 --> 00:10:33.210
So he had to sample the data.
00:10:33.210 --> 00:10:36.990
Statistics is a beautiful,
important subject.
00:10:36.990 --> 00:10:40.590
And it leans on linear algebra.
00:10:40.590 --> 00:10:44.140
Data science leans
on linear algebra.
00:10:44.140 --> 00:10:46.650
You are seeing the tool.
00:10:46.650 --> 00:10:52.080
Calculus would be functions--
continuous curves.
00:10:52.080 --> 00:10:54.735
Linear algebra is about vectors.
00:10:54.735 --> 00:10:57.300
A vector is just n components.
00:10:57.300 --> 00:10:59.010
And that's where you compute.
00:10:59.010 --> 00:11:01.350
And that's where you understand.
00:11:01.350 --> 00:11:03.060
OK.
00:11:03.060 --> 00:11:07.120
Oh, this is maybe the
last slide to just
00:11:07.120 --> 00:11:10.570
help orient you in the courses.
00:11:10.570 --> 00:11:14.830
So at MIT, 18.06 is the
Linear Algebra Course.
00:11:14.830 --> 00:11:17.950
And maybe you know
18.06 and also
00:11:17.950 --> 00:11:23.390
18.06 Scholar, SC,
on OpenCourseWare.
00:11:23.390 --> 00:11:30.770
And then this is the new course
with the new book, 18.065.
00:11:30.770 --> 00:11:34.430
So its number is sort of
indicating a second course
00:11:34.430 --> 00:11:35.180
in linear algebra.
00:11:35.180 --> 00:11:36.790
That's what I'm
actually teaching now,
00:11:36.790 --> 00:11:40.050
Monday, Wednesday, Friday.
00:11:40.050 --> 00:11:42.300
And so that starts
with linear algebra,
00:11:42.300 --> 00:11:44.880
but it's mostly
about deep learning--
00:11:44.880 --> 00:11:45.960
learning from data.
00:11:45.960 --> 00:11:47.280
So you need statistics.
00:11:47.280 --> 00:11:50.520
You need optimization--
minimizing
00:11:50.520 --> 00:11:53.910
big functions-- so
calculus comes into it.
00:11:53.910 --> 00:11:58.170
So that's a lot of fun
to teach and to learn.
00:11:58.170 --> 00:12:01.470
And, of course,
it's tremendously
00:12:01.470 --> 00:12:03.510
important in industry now.
00:12:03.510 --> 00:12:06.570
And Google and Facebook
and ever so many companies
00:12:06.570 --> 00:12:09.300
need people who understand this.
00:12:09.300 --> 00:12:12.960
And, oh, and then I
am repeating 18.06
00:12:12.960 --> 00:12:16.170
because there is this
new book coming, I hope.
00:12:16.170 --> 00:12:19.140
Did some more this morning.
00:12:19.140 --> 00:12:20.640
Linear Algebra for Everyone.
00:12:20.640 --> 00:12:23.580
So I have
optimistically put 2021.
00:12:23.580 --> 00:12:27.140
And you're the first
people that know about it.
00:12:27.140 --> 00:12:30.850
So these are the websites
for the two that we have.
00:12:30.850 --> 00:12:32.800
That's the website
for the linear algebra
00:12:32.800 --> 00:12:35.390
book, math.mit.edu.
00:12:35.390 --> 00:12:39.520
And this is the website for
the Learning from Data book.
00:12:39.520 --> 00:12:43.990
So you see there the table of
contents and all and solutions
00:12:43.990 --> 00:12:47.830
to problems-- lots of things.
00:12:47.830 --> 00:12:50.650
Thanks for listening
to this, one of--
00:12:50.650 --> 00:12:56.470
what-- maybe four or five
pieces in this 2020 vision
00:12:56.470 --> 00:13:03.850
to update the videos
that have been watched
00:13:03.850 --> 00:13:07.480
so much on OpenCourseWare.
00:13:07.480 --> 00:13:09.330
Thank you.