1
00:00:13,972 --> 00:00:16,440
GILBERT STRANG: OK,
so I was speaking
2
00:00:16,440 --> 00:00:19,650
about eigenvalues
and eigenvectors
3
00:00:19,650 --> 00:00:21,690
for a square matrix.
4
00:00:21,690 --> 00:00:25,760
And then I said for data
for many other applications,
5
00:00:25,760 --> 00:00:27,390
the matrices are not square.
6
00:00:27,390 --> 00:00:31,560
We need something that replaces
eigenvalues and eigenvectors.
7
00:00:31,560 --> 00:00:32,640
And what they are--
8
00:00:32,640 --> 00:00:37,530
and it's perfect-- is singular
values and singular vectors.
9
00:00:37,530 --> 00:00:41,620
So may I explain singular
values and singular vectors?
10
00:00:41,620 --> 00:00:45,930
This slide shows a lot of them.
11
00:00:45,930 --> 00:00:51,010
The point is that
there will be--
12
00:00:51,010 --> 00:00:54,430
now I don't say eigenvectors--
two-- different left singular
13
00:00:54,430 --> 00:00:55,380
vectors.
14
00:00:55,380 --> 00:00:58,270
They will go into this matrix u.
15
00:00:58,270 --> 00:01:01,740
Right singular vectors
will go into v.
16
00:01:01,740 --> 00:01:03,880
It was the other case
that was so special.
17
00:01:03,880 --> 00:01:06,820
When the matrix was
symmetric, then the left
18
00:01:06,820 --> 00:01:08,250
equals left eigenvector.
19
00:01:08,250 --> 00:01:10,720
They're the same
as the right one.
20
00:01:10,720 --> 00:01:12,430
That's sort of sensible.
21
00:01:12,430 --> 00:01:14,740
But a general
matrix and certainly
22
00:01:14,740 --> 00:01:16,135
a rectangular matrix--
23
00:01:19,270 --> 00:01:21,100
well, we don't call
them eigenvectors,
24
00:01:21,100 --> 00:01:22,600
because that would
be confusing-- we
25
00:01:22,600 --> 00:01:24,520
call them singular vectors.
26
00:01:24,520 --> 00:01:28,750
And then, inbetween
are not eigenvalues,
27
00:01:28,750 --> 00:01:31,450
but singular values.
28
00:01:31,450 --> 00:01:32,130
Oh, right.
29
00:01:32,130 --> 00:01:34,700
Oh, hiding over here is a key.
30
00:01:34,700 --> 00:01:40,120
A times the v's gives
sigma times the u's.
31
00:01:40,120 --> 00:01:43,210
So that's the replacement
for ax equal lambda
32
00:01:43,210 --> 00:01:45,510
x, which had x on both sides.
33
00:01:45,510 --> 00:01:48,330
OK, now we've got two.
34
00:01:48,330 --> 00:01:51,820
But the beauty is now we've
got two of those to work with.
35
00:01:51,820 --> 00:01:57,280
We can make all the u's
orthogonal to each other--
36
00:01:57,280 --> 00:02:01,000
all the v's orthogonal
to each other.
37
00:02:01,000 --> 00:02:03,910
We can do what only
symmetric matrices
38
00:02:03,910 --> 00:02:06,160
could do for eigenvectors.
39
00:02:06,160 --> 00:02:08,289
We can do it now
for all matrices,
40
00:02:08,289 --> 00:02:13,150
not even squares, just
this is where life is, OK.
41
00:02:13,150 --> 00:02:15,340
And these numbers
instead of the lambdas
42
00:02:15,340 --> 00:02:17,350
are called singular values.
43
00:02:17,350 --> 00:02:20,290
And we use the letter
sigma for those.
44
00:02:20,290 --> 00:02:24,110
And here is a picture of
the geometry in 2 by 2
45
00:02:24,110 --> 00:02:26,600
if we had a 2 by 2 matrix.
46
00:02:26,600 --> 00:02:29,860
So you remember,
factorization breaks
47
00:02:29,860 --> 00:02:33,960
up a matrix into
separate small parts,
48
00:02:33,960 --> 00:02:36,160
each doing its own thing.
49
00:02:36,160 --> 00:02:39,630
So if I multiply by
vector x, the first thing
50
00:02:39,630 --> 00:02:42,270
that's going to hit
it is v transpose.
51
00:02:42,270 --> 00:02:45,660
V transpose is an
orthogonal matrix.
52
00:02:45,660 --> 00:02:48,930
Remember, I said we can
make these singular vectors
53
00:02:48,930 --> 00:02:50,160
perpendicular.
54
00:02:50,160 --> 00:02:52,200
That's what an orthogonal
matrix-- so it's just
55
00:02:52,200 --> 00:02:54,390
like a rotation that you see.
56
00:02:54,390 --> 00:03:00,290
So the v transpose is just
turns the vector to get here
57
00:03:00,290 --> 00:03:01,890
to get to the second one.
58
00:03:01,890 --> 00:03:04,200
Then I'm multiplying
by the lambdas.
59
00:03:04,200 --> 00:03:05,490
But they're not lambdas now.
60
00:03:05,490 --> 00:03:07,040
They're sigma.
61
00:03:07,040 --> 00:03:10,360
The matrix, so
that's capital sigma.
62
00:03:10,360 --> 00:03:12,940
So there is sigma 1 and sigma 2.
63
00:03:12,940 --> 00:03:16,140
What they do is
stretch the circle.
64
00:03:16,140 --> 00:03:17,470
It's a diagonal matrix.
65
00:03:17,470 --> 00:03:20,020
So it doesn't turn things.
66
00:03:20,020 --> 00:03:23,590
But it stretches the
circle to an ellipse
67
00:03:23,590 --> 00:03:26,710
because it gets the two
different singular values
68
00:03:26,710 --> 00:03:28,480
in-- sigma 1 and sigma 2.
69
00:03:28,480 --> 00:03:34,480
And then the last guy, the
u is going to hit last.
70
00:03:34,480 --> 00:03:36,820
It takes the ellipse
and turns out again.
71
00:03:36,820 --> 00:03:38,350
It's again a rotation--
72
00:03:38,350 --> 00:03:41,758
rotation, stretch, rotation.
73
00:03:41,758 --> 00:03:42,550
I'll say it again--
74
00:03:42,550 --> 00:03:45,140
rotation, stretch, rotation.
75
00:03:45,140 --> 00:03:47,860
That's what singular
values and singular
76
00:03:47,860 --> 00:03:51,430
vectors do, the singular
value decomposition.
77
00:03:51,430 --> 00:03:56,740
And it's got the best
of all worlds here.
78
00:03:56,740 --> 00:04:01,750
It's got the rotations,
the orthogonal matrices.
79
00:04:01,750 --> 00:04:06,003
And it's got the stretches,
the diagonal matrices.
80
00:04:06,003 --> 00:04:07,920
Compared to those two,
those are the greatest.
81
00:04:07,920 --> 00:04:13,290
Triangular matrices were good
when we were young an hour ago.
82
00:04:13,290 --> 00:04:15,670
Now, we're seeing the best.
83
00:04:15,670 --> 00:04:20,014
OK, now let me just show
you where they come from.
84
00:04:20,014 --> 00:04:23,780
Oh, so how to find these v's.
85
00:04:23,780 --> 00:04:27,970
Well, the answer is, if I'm
looking for orthogonal vectors,
86
00:04:27,970 --> 00:04:32,540
the great idea is find
a symmetric matrix
87
00:04:32,540 --> 00:04:35,000
and with those eigenvectors.
88
00:04:35,000 --> 00:04:40,280
So these v's that I want for
A are actually eigenvectors
89
00:04:40,280 --> 00:04:43,100
of this symmetric
matrix A transpose times
90
00:04:43,100 --> 00:04:45,560
A. That's just nice.
91
00:04:45,560 --> 00:04:48,350
So we can find those
singular vectors
92
00:04:48,350 --> 00:04:51,110
just as fast as we
can find eigenvectors
93
00:04:51,110 --> 00:04:52,700
for a symmetric matrix.
94
00:04:52,700 --> 00:04:56,510
And we know there, because
A transpose A is symmetric.
95
00:04:56,510 --> 00:04:59,540
We know the eigenvectors
are perpendicular
96
00:04:59,540 --> 00:05:01,580
to each other orthonormal.
97
00:05:01,580 --> 00:05:04,910
OK, and now what about the
other ones because remember,
98
00:05:04,910 --> 00:05:06,160
we have two sets.
99
00:05:06,160 --> 00:05:13,100
The u's-- well, we just multiply
by A. And we've got the u's.
100
00:05:13,100 --> 00:05:16,280
Well, and divide by sigmas,
because these vectors u's
101
00:05:16,280 --> 00:05:19,700
and v's are unit
vectors, length one.
102
00:05:19,700 --> 00:05:22,100
So we have to scale
them properly.
103
00:05:22,100 --> 00:05:27,380
And this was a little key
bit of algebra to check that,
104
00:05:27,380 --> 00:05:29,660
not only the v's
were orthogonal,
105
00:05:29,660 --> 00:05:31,880
but the u's are orthogonal.
106
00:05:31,880 --> 00:05:33,320
Yeah, it just comes out--
107
00:05:33,320 --> 00:05:34,590
comes out.
108
00:05:34,590 --> 00:05:36,870
So this singular
value decomposition,
109
00:05:36,870 --> 00:05:41,750
which is maybe, well, say 100
years old, maybe a bit more.
110
00:05:41,750 --> 00:05:48,410
But it's really in the last 20,
30 years that singular values
111
00:05:48,410 --> 00:05:50,160
have become so important.
112
00:05:50,160 --> 00:05:54,212
This is the best
factorization of them all.
113
00:05:54,212 --> 00:05:57,620
And that's not always reflected
in linear algebra courses.
114
00:05:57,620 --> 00:06:04,540
So part of my goal today is
to say get to singular values.
115
00:06:04,540 --> 00:06:08,780
If you've done symmetric
matrices and their eigenvalues,
116
00:06:08,780 --> 00:06:11,220
then you can do singular values.
117
00:06:11,220 --> 00:06:17,850
And I think that's absolutely
worth doing, OK, yeah,
118
00:06:17,850 --> 00:06:22,850
so and remembering down here
that capital Sigma stands
119
00:06:22,850 --> 00:06:27,500
for the diagonal matrix of
these positive numbers, sigma 1,
120
00:06:27,500 --> 00:06:30,500
sigma 2 down to sigma r there.
121
00:06:30,500 --> 00:06:34,820
The rank, which came way
back in the first slides,
122
00:06:34,820 --> 00:06:36,830
tells you how many there are.
123
00:06:36,830 --> 00:06:40,700
Good, good.
124
00:06:40,700 --> 00:06:43,230
Oh, here's an example.
125
00:06:43,230 --> 00:06:45,740
So I took a small
matrix because I'm
126
00:06:45,740 --> 00:06:49,430
doing this by pencil and
paper and actually showing you
127
00:06:49,430 --> 00:06:52,590
that the singular value.
128
00:06:52,590 --> 00:06:55,100
So there is my matrix, 2 by 2.
129
00:06:55,100 --> 00:06:56,310
Here are the u's.
130
00:06:56,310 --> 00:06:58,130
Do you see that those
are orthogonal--
131
00:06:58,130 --> 00:07:01,070
1, 3 against minus 3, 1?
132
00:07:01,070 --> 00:07:03,150
Take the dot product,
and you get 0.
133
00:07:03,150 --> 00:07:04,960
The v's are orthogonal.
134
00:07:04,960 --> 00:07:06,800
The sigma is diagonal.
135
00:07:06,800 --> 00:07:11,890
And then the pieces from
that add back to the matrix.
136
00:07:11,890 --> 00:07:14,190
So it's really, it's
broken my matrix
137
00:07:14,190 --> 00:07:16,560
into a couple of pieces--
138
00:07:16,560 --> 00:07:20,720
one for the first
singular value in vector,
139
00:07:20,720 --> 00:07:23,980
and the other for the second
singular value in vector.
140
00:07:23,980 --> 00:07:26,430
And that's what
data science wants.
141
00:07:26,430 --> 00:07:30,310
Data science wants to know
what's important in the matrix?
142
00:07:30,310 --> 00:07:34,610
Well, what's important
is sigma 1, the big guy.
143
00:07:34,610 --> 00:07:36,400
Sigma 2, you see.
144
00:07:36,400 --> 00:07:38,020
Well, it was 3 times smaller--
145
00:07:38,020 --> 00:07:40,450
3/2 versus 1/2.
146
00:07:40,450 --> 00:07:45,760
So if I had a 100 by 100
matrix or 100 by 1,000,
147
00:07:45,760 --> 00:07:51,560
I'd have 100 singular values and
maybe the first five I'd keep.
148
00:07:51,560 --> 00:07:54,502
If I'm in the financial
market, those guys,
149
00:07:54,502 --> 00:07:57,490
those first numbers
are telling me
150
00:07:57,490 --> 00:08:00,790
maybe what bond prices
are going to do over time.
151
00:08:00,790 --> 00:08:06,110
And it's a mixture
of a few features,
152
00:08:06,110 --> 00:08:09,930
but not all 1,000
features, right.
153
00:08:09,930 --> 00:08:13,890
So this is singular
value decomposition
154
00:08:13,890 --> 00:08:17,450
picks out the important
part of a data matrix.
155
00:08:17,450 --> 00:08:19,490
And you cannot ask
for a more than that.
156
00:08:22,250 --> 00:08:25,700
Here's what you do with a matrix
is just totally enormous--
157
00:08:25,700 --> 00:08:27,730
too big to multiply--
158
00:08:27,730 --> 00:08:29,560
too big to compute.
159
00:08:29,560 --> 00:08:37,220
Then you randomly sample it.
160
00:08:37,220 --> 00:08:39,799
Yeah, maybe the next
slide even mentions
161
00:08:39,799 --> 00:08:42,690
that word randomized
numerical linear algebra.
162
00:08:42,690 --> 00:08:46,170
So this, I'll go back to this.
163
00:08:46,170 --> 00:08:48,980
So the singular
value decomposition--
164
00:08:48,980 --> 00:08:52,410
this, what we just talked
about with the u's and the v's
165
00:08:52,410 --> 00:08:54,350
and the sigmas.
166
00:08:54,350 --> 00:08:56,670
Sigma 1 is the biggest.
167
00:08:56,670 --> 00:08:59,400
Sigma r is the smallest.
168
00:08:59,400 --> 00:09:01,640
So in data science,
you very often
169
00:09:01,640 --> 00:09:06,410
keep just these first ones,
maybe the first k, the k
170
00:09:06,410 --> 00:09:08,150
largest ones.
171
00:09:08,150 --> 00:09:10,250
And then you've
got the matrix that
172
00:09:10,250 --> 00:09:14,900
has rank only k, because you're
only working with k vectors.
173
00:09:14,900 --> 00:09:19,130
And it turns out that's the
closest one to the big matrix
174
00:09:19,130 --> 00:09:23,610
A. So this singular value
is among other things
175
00:09:23,610 --> 00:09:27,930
is picking out, putting
in order of importance
176
00:09:27,930 --> 00:09:30,940
the little pieces of the matrix.
177
00:09:30,940 --> 00:09:34,700
And then you can just pick
a few pieces to work with.
178
00:09:34,700 --> 00:09:36,090
Yeah, yeah.
179
00:09:36,090 --> 00:09:40,620
And the idea of norms is how to
measure the size of a matrix.
180
00:09:40,620 --> 00:09:46,060
Yeah, but I'll leave
that for the future.
181
00:09:46,060 --> 00:09:50,230
And randomized linear algebra
I just want to mention.
182
00:09:50,230 --> 00:09:53,950
Seems a little crazy that
by just randomly sampling
183
00:09:53,950 --> 00:09:58,580
a matrix, we could
learn anything about it.
184
00:09:58,580 --> 00:10:02,560
But typically data
is sort of organized.
185
00:10:02,560 --> 00:10:05,270
It's not just
totally random stuff.
186
00:10:05,270 --> 00:10:09,820
So if we want to know like, my
friend in the Broad Institute
187
00:10:09,820 --> 00:10:14,410
was doing the ancient
history of man.
188
00:10:14,410 --> 00:10:18,470
So data from thousands
of years ago.
189
00:10:18,470 --> 00:10:20,090
So he had a giant matrix--
190
00:10:20,090 --> 00:10:22,580
a lot of data-- too much data.
191
00:10:22,580 --> 00:10:27,880
And he said, how can we find the
singular value decomposition?
192
00:10:27,880 --> 00:10:29,660
Pick out the important thing.
193
00:10:29,660 --> 00:10:33,210
So you had to sample the data.
194
00:10:33,210 --> 00:10:36,990
Statistics is a beautiful
important subject.
195
00:10:36,990 --> 00:10:40,590
And it leans on linear algebra.
196
00:10:40,590 --> 00:10:44,140
Data science leans
on linear algebra.
197
00:10:44,140 --> 00:10:46,650
You are seeing the tool.
198
00:10:46,650 --> 00:10:52,080
Calculus would be functions
would be continuous curves.
199
00:10:52,080 --> 00:10:54,735
Linear algebra is about vectors.
200
00:10:54,735 --> 00:10:57,300
This is just n components.
201
00:10:57,300 --> 00:10:59,010
And that's where you compute.
202
00:10:59,010 --> 00:11:01,350
And that's where you understand.
203
00:11:01,350 --> 00:11:03,060
OK.
204
00:11:03,060 --> 00:11:07,120
Oh, this is maybe the
last slide to just
205
00:11:07,120 --> 00:11:10,570
help orient you in the courses.
206
00:11:10,570 --> 00:11:14,830
So at MIT 18.06 is the
Linear Algebra Course.
207
00:11:14,830 --> 00:11:17,950
And maybe you know
18.06 and also
208
00:11:17,950 --> 00:11:23,390
18.06 Scholar, SC,
on OpenCourseWare.
209
00:11:23,390 --> 00:11:30,770
And then this is the new course
with the new book, 18.065.
210
00:11:30,770 --> 00:11:34,430
So its numbers sort of
indicating a second course
211
00:11:34,430 --> 00:11:35,180
in linear algebra.
212
00:11:35,180 --> 00:11:36,790
That's when I'm
actually teaching now,
213
00:11:36,790 --> 00:11:40,050
Monday, Wednesday, Friday.
214
00:11:40,050 --> 00:11:42,300
And so that starts
with linear algebra,
215
00:11:42,300 --> 00:11:44,880
but it's mostly
about deep learning--
216
00:11:44,880 --> 00:11:45,960
learning from data.
217
00:11:45,960 --> 00:11:47,280
So you need statistics.
218
00:11:47,280 --> 00:11:50,520
You need optimization,
minimizing.
219
00:11:50,520 --> 00:11:53,910
Big functions of
calculus comes into it.
220
00:11:53,910 --> 00:11:58,170
So that's a lot of fun
to teach and to learn.
221
00:11:58,170 --> 00:12:01,470
And, of course,
it's tremendously
222
00:12:01,470 --> 00:12:03,510
important in industry now.
223
00:12:03,510 --> 00:12:06,570
And Google and Facebook
and ever so many companies
224
00:12:06,570 --> 00:12:09,300
need people who understand this.
225
00:12:09,300 --> 00:12:12,960
And, oh, and then I
am repeating 18.06
226
00:12:12,960 --> 00:12:16,170
because there is this
new book coming, I hope.
227
00:12:16,170 --> 00:12:19,140
Did some more this morning.
228
00:12:19,140 --> 00:12:20,640
Linear Algebra for Everyone.
229
00:12:20,640 --> 00:12:23,580
So I have
optimistically put 2021.
230
00:12:23,580 --> 00:12:27,140
And you're the first
people that know about it.
231
00:12:27,140 --> 00:12:30,850
So these are the websites
for the two that we have.
232
00:12:30,850 --> 00:12:32,800
That's the website
for the linear algebra
233
00:12:32,800 --> 00:12:35,390
book, math.mit.edu.
234
00:12:35,390 --> 00:12:39,520
And this is the website for
the Learning from Data book.
235
00:12:39,520 --> 00:12:43,990
So you see there the table of
contents and all and solutions
236
00:12:43,990 --> 00:12:47,830
to problems-- lots of things.
237
00:12:47,830 --> 00:12:50,650
Thanks for listening
to this is--
238
00:12:50,650 --> 00:12:56,470
what-- maybe four or five
pieces in this 2020 vision
239
00:12:56,470 --> 00:13:03,850
to update the videos
that have been watched
240
00:13:03,850 --> 00:13:07,480
so much on OpenCourseWare.
241
00:13:07,480 --> 00:13:09,330
Thank you.