WEBVTT

00:00:01.550 --> 00:00:03.920
The following content is
provided under a Creative

00:00:03.920 --> 00:00:05.310
Commons license.

00:00:05.310 --> 00:00:07.520
Your support will help
MIT OpenCourseWare

00:00:07.520 --> 00:00:11.610
continue to offer high-quality
educational resources for free.

00:00:11.610 --> 00:00:14.180
To make a donation or to
view additional materials

00:00:14.180 --> 00:00:18.140
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:18.140 --> 00:00:19.026
at ocw.mit.edu.

00:00:24.235 --> 00:00:28.470
GILBERT STRANG: OK,
let me make a start.

00:00:28.470 --> 00:00:33.390
On the left, you see
the topic for today.

00:00:33.390 --> 00:00:34.470
We're doing pretty well.

00:00:34.470 --> 00:00:40.200
This completes my review of the
highlights of linear algebra,

00:00:40.200 --> 00:00:41.550
so that's five lectures.

00:00:44.550 --> 00:00:47.550
I'll follow up on
those five points,

00:00:47.550 --> 00:00:51.510
because the neat part is
it really ties together

00:00:51.510 --> 00:00:52.860
the whole subject.

00:00:52.860 --> 00:00:59.485
Eigenvalues, energy, A transpose
A, determinants, pivots--

00:01:02.010 --> 00:01:03.760
they all come together.

00:01:03.760 --> 00:01:08.280
Each one gives a test for
positive definite matrices.

00:01:08.280 --> 00:01:10.470
That's where I'm going.

00:01:10.470 --> 00:01:16.770
Claire is hoping to come in
for a little bit of the class

00:01:16.770 --> 00:01:23.080
to ask if anybody has
started on the homework.

00:01:23.080 --> 00:01:31.110
And got Julia rolling, and got
a yes from the auto grader.

00:01:31.110 --> 00:01:37.190
Is anybody like-- no.

00:01:37.190 --> 00:01:40.760
You're taking a chance, right?

00:01:40.760 --> 00:01:45.110
Julia, in principle, works,
but in practice, it's

00:01:45.110 --> 00:01:47.960
always an adventure
the first time.

00:01:47.960 --> 00:01:52.160
So we chose this
lab on convolution,

00:01:52.160 --> 00:01:55.580
because it was the
first lab last year,

00:01:55.580 --> 00:02:00.020
and it doesn't ask
for much math at all.

00:02:00.020 --> 00:02:02.300
Really, you're just
creating a matrix

00:02:02.300 --> 00:02:04.770
and getting the auto
grader to say, yes,

00:02:04.770 --> 00:02:05.865
that's the right matrix.

00:02:10.288 --> 00:02:12.190
And we'll see that matrix.

00:02:12.190 --> 00:02:15.620
We'll see this
idea of convolution

00:02:15.620 --> 00:02:18.440
at the right time, which
is not that far off.

00:02:18.440 --> 00:02:24.170
It's signal processing,
and it's early in part

00:02:24.170 --> 00:02:25.100
three of the book.

00:02:27.880 --> 00:02:30.990
If Claire comes in,
she'll answer questions.

00:02:30.990 --> 00:02:36.440
Otherwise, I guess it would
be emailing questions to--

00:02:36.440 --> 00:02:40.070
I realize that the deadline
is not on top of you,

00:02:40.070 --> 00:02:44.360
and you've got a whole
weekend to make Julia fly.

00:02:48.170 --> 00:02:51.260
I'll start on the math then.

00:02:51.260 --> 00:02:55.070
We had symmetric-- eigenvalues
of matrices, and especially

00:02:55.070 --> 00:02:58.730
symmetric matrices, and
those have real eigenvalues,

00:02:58.730 --> 00:03:01.730
and I'll quickly show why.

00:03:01.730 --> 00:03:05.780
And orthogonal eigenvectors,
and I'll quickly show why.

00:03:05.780 --> 00:03:10.850
But I want to move
to the new idea--

00:03:10.850 --> 00:03:13.580
positive definite matrices.

00:03:13.580 --> 00:03:17.120
These are the best of
the symmetric matrices.

00:03:17.120 --> 00:03:21.320
They are symmetric matrices
that have positive eigenvalues.

00:03:21.320 --> 00:03:25.250
That's the easy way to remember
positive definite matrices.

00:03:25.250 --> 00:03:28.610
They have positive eigenvalues,
but it's certainly not

00:03:28.610 --> 00:03:30.200
the easy way to test.

00:03:30.200 --> 00:03:34.160
If I give you a matrix like
that, that's only two by two.

00:03:34.160 --> 00:03:36.680
We could actually
find the eigenvalues,

00:03:36.680 --> 00:03:41.600
but we would like to have other
tests, easier tests, which

00:03:41.600 --> 00:03:47.420
would be equivalent to
positive eigenvalues.

00:03:47.420 --> 00:03:50.830
Every one of those five tests--
any one of those five tests

00:03:50.830 --> 00:03:53.810
is all you need.

00:03:53.810 --> 00:03:57.950
Let me start with that
example and ask you to look,

00:03:57.950 --> 00:04:01.665
and then I'm going to discuss
those five separate points.

00:04:04.610 --> 00:04:08.680
My question is
about that matrix S.

00:04:08.680 --> 00:04:10.610
It's obviously symmetric.

00:04:10.610 --> 00:04:14.690
Is it positive
definite or not?

00:04:14.690 --> 00:04:16.399
You could compute
its eigenvalues

00:04:16.399 --> 00:04:17.779
since it's two by two.

00:04:17.779 --> 00:04:20.130
Its energy-- I'll come
back to that, because that's

00:04:20.130 --> 00:04:21.529
the most important one.

00:04:21.529 --> 00:04:24.320
Number two is
really fundamental.

00:04:24.320 --> 00:04:26.870
Number three would ask
you to factor that.

00:04:26.870 --> 00:04:30.020
Well, you don't want
to take time with that.

00:04:30.020 --> 00:04:31.610
Well, what do you think?

00:04:31.610 --> 00:04:33.410
Is it positive
definite or not?

00:04:33.410 --> 00:04:39.110
I see an expert in the
front row saying no.

00:04:39.110 --> 00:04:40.610
Why is it no?

00:04:40.610 --> 00:04:41.690
The answer is no.

00:04:41.690 --> 00:04:43.790
That's not a positive
definite matrix.

00:04:43.790 --> 00:04:46.170
Where does it let us down?

00:04:46.170 --> 00:04:48.170
It's got all positive
numbers, but that's not

00:04:48.170 --> 00:04:49.580
what we're asking.

00:04:49.580 --> 00:04:51.560
We're asking
positive eigenvalues,

00:04:51.560 --> 00:04:53.670
positive determinants,
positive pivots.

00:04:56.630 --> 00:04:59.270
How does it let us down?

00:04:59.270 --> 00:05:02.510
Which is the easy test
to see that it fails?

00:05:02.510 --> 00:05:03.803
AUDIENCE: Maybe determinant?

00:05:03.803 --> 00:05:04.970
GILBERT STRANG: Determinant.

00:05:04.970 --> 00:05:12.120
The determinant is 15
minus 16, so negative.

00:05:12.120 --> 00:05:17.520
So how is the determinant
connected to the eigenvalues?

00:05:17.520 --> 00:05:18.320
Everybody?

00:05:18.320 --> 00:05:18.820
Yep.

00:05:18.820 --> 00:05:19.280
AUDIENCE: [INAUDIBLE]

00:05:19.280 --> 00:05:20.700
GILBERT STRANG:
It's the product.

00:05:20.700 --> 00:05:24.880
So the two eigenvalues of
S, they're real, of course,

00:05:24.880 --> 00:05:28.600
and they multiply to give the
determinant, which is minus 1.

00:05:28.600 --> 00:05:31.670
So one of them is negative,
and one of them is positive.
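
NOTE
Editor's sketch, not from the lecture: a quick numerical check of the claim above.
It assumes the matrix on the board is S = [[3, 4], [4, 5]], which matches the
determinant "15 minus 16". The helper name is hypothetical; it only applies the
quadratic formula to the characteristic polynomial of a 2 by 2 symmetric matrix.
```python
import math
def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] from the characteristic polynomial."""
    trace = a + c
    det = a * c - b * b
    # Roots of lambda^2 - trace*lambda + det = 0. The discriminant equals
    # (a - c)^2 + 4*b^2, so it is never negative for a symmetric matrix.
    root = math.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2
lam_small, lam_large = eigenvalues_2x2_symmetric(3, 4, 5)
print(lam_small, lam_large)     # one negative, one positive
print(lam_small * lam_large)    # approximately -1, the determinant
```
The product of the two printed eigenvalues reproduces the determinant, so a
negative determinant forces one eigenvalue of each sign in the 2 by 2 case.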

00:05:31.670 --> 00:05:35.300
This matrix is an
indefinite matrix--

00:05:35.300 --> 00:05:36.650
indefinite.

00:05:36.650 --> 00:05:41.000
So how could I make
it positive definite?

00:05:41.000 --> 00:05:42.080
OK.

00:05:42.080 --> 00:05:44.000
We can just play
with an example,

00:05:44.000 --> 00:05:48.840
and then we see these
things happening.

00:05:48.840 --> 00:05:50.540
Let's see.

00:05:50.540 --> 00:05:55.850
OK, what shall I put in
place of the 5, for example?

00:05:55.850 --> 00:06:00.500
I could lower the 4, or I
can up the 5, or up the 3.

00:06:00.500 --> 00:06:02.450
I can raise the diagonal entries.

00:06:02.450 --> 00:06:05.180
If I add stuff to
the main diagonal,

00:06:05.180 --> 00:06:08.870
I'm making it more positive.

00:06:08.870 --> 00:06:12.230
So that's the
straightforward way.

00:06:12.230 --> 00:06:15.420
So what number in
there would be safe?

00:06:15.420 --> 00:06:16.130
AUDIENCE: 6.

00:06:16.130 --> 00:06:17.400
GILBERT STRANG: 6.

00:06:17.400 --> 00:06:17.900
OK.

00:06:17.900 --> 00:06:19.880
6 would be safe.

00:06:19.880 --> 00:06:23.300
If I go up from 5 to
6, I've got a de--

00:06:23.300 --> 00:06:26.540
so when I say here
"leading determinants,"

00:06:26.540 --> 00:06:29.240
what does that mean?

00:06:29.240 --> 00:06:31.520
That word leading
means something.

00:06:31.520 --> 00:06:34.910
It means that I take
that 1 by 1 determinant--

00:06:34.910 --> 00:06:36.920
it would have to pass that.

00:06:36.920 --> 00:06:39.360
Just the determinant
itself would not do it.

00:06:39.360 --> 00:06:41.480
Let me give you an example.

00:06:41.480 --> 00:06:48.470
No for-- let me take
minus 3 and minus 6.

00:06:48.470 --> 00:06:50.510
That would have the
same determinant.

00:06:55.010 --> 00:06:57.890
The determinant would
still be 18 minus 16--

00:06:57.890 --> 00:06:58.730
2.

00:06:58.730 --> 00:07:04.550
But it fails the
test on the 1 by 1.

00:07:04.550 --> 00:07:05.550
And this passes.

00:07:05.550 --> 00:07:09.280
This passes the 1 by 1
test and the 2 by 2 test.

00:07:09.280 --> 00:07:11.890
So that's what this means here.

00:07:11.890 --> 00:07:15.480
Leading determinants
are from the upper left.

00:07:15.480 --> 00:07:17.460
You have to check n
things because you've

00:07:17.460 --> 00:07:19.260
got n eigenvalues.

00:07:19.260 --> 00:07:22.620
And those are the n tests.
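
NOTE
Editor's sketch, not from the lecture: the leading-determinant test written out
for small matrices. The function names are hypothetical, and the determinant is
computed by plain cofactor expansion, which is only sensible for small n.
The three matrices are the ones discussed above, assuming off-diagonal entries 4.
```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total
def leading_determinants(s):
    """Determinants of the upper-left 1x1, 2x2, ..., nxn blocks of s."""
    return [det([row[:k] for row in s[:k]]) for k in range(1, len(s) + 1)]
def passes_determinant_test(s):
    """True when every leading determinant is positive."""
    return all(d > 0 for d in leading_determinants(s))
print(leading_determinants([[3, 4], [4, 6]]))      # [3, 2]
print(leading_determinants([[-3, 4], [4, -6]]))    # [-3, 2]
print(leading_determinants([[3, 4], [4, 5]]))      # [3, -1]
```
Only the first matrix passes all n tests; the second has the same full
determinant, 2, but fails on the 1 by 1 block.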

00:07:22.620 --> 00:07:25.710
And have you noticed the
connection to pivots?

00:07:25.710 --> 00:07:30.900
So let's just remember
that small item.

00:07:30.900 --> 00:07:35.340
What would be the pivots
because we didn't take

00:07:35.340 --> 00:07:37.660
a long time on elimination?

00:07:37.660 --> 00:07:43.380
So what would be the pivots
for that matrix, 3-4-4-6?

00:07:43.380 --> 00:07:45.870
Well, what's the first pivot?

00:07:45.870 --> 00:07:50.250
3, sitting there-- the 1-1
entry would be the first pivot.

00:07:50.250 --> 00:07:56.520
So the pivots would be 3,
and what's the second pivot?

00:07:56.520 --> 00:07:59.070
Well, maybe to
see it clearly you

00:07:59.070 --> 00:08:01.950
want me to take that
elimination step.

00:08:01.950 --> 00:08:05.280
Why don't I do it just
so you'll see it here?

00:08:05.280 --> 00:08:09.870
So elimination would subtract
some multiple of row 1

00:08:09.870 --> 00:08:11.290
from row 2.

00:08:11.290 --> 00:08:13.740
I would leave row 1 alone.

00:08:13.740 --> 00:08:16.530
I would subtract some
multiple to get a 0 there.

00:08:16.530 --> 00:08:18.270
And what's the multiple?

00:08:18.270 --> 00:08:20.550
What's the multiplier?

00:08:20.550 --> 00:08:21.570
AUDIENCE: In that much--

00:08:21.570 --> 00:08:22.940
GILBERT STRANG: 4/3.

00:08:22.940 --> 00:08:29.940
4/3 times row 1, away from
row 2, would produce that 0.

00:08:29.940 --> 00:08:35.190
But 4/3 times the 4,
that would be 16/3.

00:08:35.190 --> 00:08:38.250
And we're subtracting
it from 18/3.

00:08:38.250 --> 00:08:39.990
I think we've got 2/3 left.

00:08:43.960 --> 00:08:48.880
So the pivots, which is
this, in elimination,

00:08:48.880 --> 00:08:50.710
are the 3 and the 2/3.

00:08:50.710 --> 00:08:52.210
And of course, they're positive.

00:08:52.210 --> 00:08:55.300
And actually, you see
the immediate connection.

00:08:55.300 --> 00:09:01.750
This pivot is the 2 by 2
determinant divided by the 1

00:09:01.750 --> 00:09:04.690
by 1 determinant.

00:09:04.690 --> 00:09:07.000
The 2 by 2 determinant,
we figured out--

00:09:07.000 --> 00:09:10.460
18 minus 16 was 2.

00:09:10.460 --> 00:09:13.510
The 1 by 1 determinant is 3.

00:09:13.510 --> 00:09:18.970
And sure enough, that
second pivot is 2/3.
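
NOTE
Editor's sketch, not from the lecture: the elimination step above done in exact
fractions. The function name is hypothetical, and the code assumes no row
exchanges are needed (no zero pivot appears), which holds for this example.
```python
from fractions import Fraction
def pivots(s):
    """Pivots from elimination without row exchanges (assumes no zero pivot)."""
    a = [[Fraction(v) for v in row] for row in s]
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            multiplier = a[i][k] / a[k][k]    # 4/3 for the lecture's matrix
            a[i] = [a[i][j] - multiplier * a[k][j] for j in range(n)]
    return [a[k][k] for k in range(n)]
print(pivots([[3, 4], [4, 6]]))    # [Fraction(3, 1), Fraction(2, 3)]
```
The second pivot, 2/3, is the 2 by 2 determinant (2) divided by the 1 by 1
determinant (3), as stated in the lecture.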

00:09:18.970 --> 00:09:32.200
This is not-- so by example,
I'm illustrating what these

00:09:32.200 --> 00:09:33.970
different tests--

00:09:33.970 --> 00:09:37.030
and again, each test
is all you need.

00:09:37.030 --> 00:09:39.790
If it passes one test,
it passes them all.

00:09:39.790 --> 00:09:42.370
And we haven't found
the eigenvalues.

00:09:42.370 --> 00:09:44.200
Let me do the energy.

00:09:44.200 --> 00:09:46.110
Can I do energy here?

00:09:46.110 --> 00:09:46.610
OK.

00:09:46.610 --> 00:09:48.190
So what's this--

00:09:48.190 --> 00:09:54.620
I am saying that this is
really the great test.

00:09:54.620 --> 00:09:59.860
That, for me, is the definition
of a positive definite matrix.

00:09:59.860 --> 00:10:03.610
And the word "energy"
comes in because it's

00:10:03.610 --> 00:10:08.090
quadratic, [INAUDIBLE] kinetic
energy or potential energy.

00:10:08.090 --> 00:10:13.060
So that's the energy in the
vector x for this matrix.

00:10:13.060 --> 00:10:15.640
So let me compute
it, x transpose Sx.

00:10:15.640 --> 00:10:23.170
So let me put in S
here, the original S.

00:10:23.170 --> 00:10:28.580
And let me put in any
vector x, so, say, x, y or x1.

00:10:28.580 --> 00:10:30.625
Maybe-- do you like x--

00:10:30.625 --> 00:10:32.620
x, y is easier.

00:10:32.620 --> 00:10:36.130
So that's our
vector x transposed.

00:10:36.130 --> 00:10:41.500
This is our matrix S.
And here's our vector x.

00:10:41.500 --> 00:10:45.100
So it's a function of x and y.

00:10:45.100 --> 00:10:48.400
It's a pure quadratic function.

00:10:48.400 --> 00:10:51.730
Do you know what I get
when I multiply that out?

00:10:51.730 --> 00:10:55.590
I get a very simple,
important type of function.

00:10:55.590 --> 00:10:58.880
Shall we multiply it out?

00:10:58.880 --> 00:10:59.380
Let's see.

00:10:59.380 --> 00:11:05.380
Shall I multiply that by that
first, so I get 3x plus 4y?

00:11:05.380 --> 00:11:11.890
And 4x plus 6y is what I'm
getting from these two.

00:11:11.890 --> 00:11:16.100
And now I'm hitting
that with the x, y.

00:11:16.100 --> 00:11:18.950
And now I'm going
to see the energy.

00:11:18.950 --> 00:11:20.870
And you'll see the pattern.

00:11:20.870 --> 00:11:22.670
That's always what
math is about.

00:11:22.670 --> 00:11:23.990
What's the pattern?

00:11:23.990 --> 00:11:27.920
So I've x times 3x, 3x squared.

00:11:27.920 --> 00:11:29.960
And I have y times 6y.

00:11:29.960 --> 00:11:32.510
That's 6y squared.

00:11:32.510 --> 00:11:34.700
And I have x times 4y.

00:11:34.700 --> 00:11:36.860
That's 4xy.

00:11:36.860 --> 00:11:38.690
And I have y times 4x.

00:11:38.690 --> 00:11:39.920
That's 4 more xy.

00:11:44.060 --> 00:11:45.920
So I've got all those terms.

00:11:45.920 --> 00:11:50.720
Every term, every
number in the matrix

00:11:50.720 --> 00:11:54.620
gives me a piece of the energy.

00:11:54.620 --> 00:11:58.730
And you see that the diagonal
numbers, 3 and 6, those

00:11:58.730 --> 00:12:03.830
give me the diagonal pieces,
3x squared and 6y squared.

00:12:03.830 --> 00:12:05.760
And then the cross--

00:12:05.760 --> 00:12:08.540
or I maybe call them
the cross terms.

00:12:08.540 --> 00:12:13.160
Those give me 4xy and
4xy, so, really, 8xy.

00:12:13.160 --> 00:12:16.040
So you could call
this thing 8xy.

00:12:20.190 --> 00:12:22.470
So that's my function.

00:12:22.470 --> 00:12:23.730
That's my quadratic.

00:12:23.730 --> 00:12:25.680
That's my energy.

00:12:25.680 --> 00:12:31.120
And I believe that
is greater than 0.
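
NOTE
Editor's sketch, not from the lecture: one way to see why this energy is
positive, assuming S = [[3, 4], [4, 6]] as in the multiplication above.
Completing the square writes the quadratic as a sum of two squares whose
coefficients are the pivots 3 and 2/3, with the multiplier 4/3 inside.
```latex
x^{\mathsf T} S x = 3x^2 + 8xy + 6y^2
  = 3\left(x + \tfrac{4}{3}y\right)^2 + \tfrac{2}{3}y^2 > 0
  \quad \text{for } (x, y) \neq (0, 0)
```
Expanding the first square gives 3x^2 + 8xy + (16/3)y^2, and adding (2/3)y^2
restores the 6y^2 term, so the two forms agree.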

00:12:31.120 --> 00:12:33.130
Let me graph the thing.

00:12:33.130 --> 00:12:34.510
Let me graph that energy.

00:12:38.560 --> 00:12:39.160
OK.

00:12:39.160 --> 00:12:42.670
So here's a graph of my
function, f of x and y.

00:12:45.340 --> 00:12:48.460
Here is x, and here's y.

00:12:48.460 --> 00:12:52.400
And of course, that's
on the graph, 0-0.

00:12:52.400 --> 00:12:57.000
At x equals 0, y equals 0,
the function is clearly 0.

00:12:57.000 --> 00:12:59.710
Everybody's got his eye-- let
me write that function again

00:12:59.710 --> 00:13:00.420
here--

00:13:00.420 --> 00:13:04.975
3x squared, 6y squared, 8xy.

00:13:09.460 --> 00:13:11.650
Actually, you can see--

00:13:11.650 --> 00:13:15.700
this is how I think
about that function.

00:13:15.700 --> 00:13:20.020
So 3x squared is obviously
carrying me upwards.

00:13:20.020 --> 00:13:21.950
It will never go negative.

00:13:21.950 --> 00:13:24.760
6y squared will
never go negative.

00:13:24.760 --> 00:13:28.030
8xy can go negative, right?

00:13:28.030 --> 00:13:31.930
If x and y have opposite
signs, that'll go negative.

00:13:31.930 --> 00:13:38.260
But the question is, do
these positive pieces

00:13:38.260 --> 00:13:45.065
overwhelm it and make the
graph go up like a bowl?

00:13:49.890 --> 00:13:53.610
And the answer is yes, for
a positive definite matrix.

00:13:53.610 --> 00:13:57.120
So this is a graph of a
positive definite matrix,

00:13:57.120 --> 00:14:02.160
of positive energy, the energy
of a positive definite matrix.

00:14:02.160 --> 00:14:08.410
So this is the energy x
transpose Sx that I'm graphing.

00:14:08.410 --> 00:14:11.460
And there it is.

00:14:11.460 --> 00:14:13.680
This is important.

00:14:13.680 --> 00:14:14.590
This is important.

00:14:14.590 --> 00:14:18.930
This is the kind of function
we like, x transpose Sx,

00:14:18.930 --> 00:14:25.960
where S is positive definite, so
the function goes up like that.

00:14:25.960 --> 00:14:28.650
This is what deep
learning is about.

00:14:28.650 --> 00:14:32.730
This could be a loss
function that you minimize.

00:14:32.730 --> 00:14:36.870
It could depend on
100,000 variables or more.

00:14:36.870 --> 00:14:43.890
And it could come from the
error in the difference

00:14:43.890 --> 00:14:53.070
between training data and
the number you get.

00:14:53.070 --> 00:14:57.080
The loss would be some
expression like that.

00:14:57.080 --> 00:15:02.250
Well, I'll make sense of
those words as soon as I can.

00:15:02.250 --> 00:15:11.500
What I want to say is deep
learning, neural nets, machine

00:15:11.500 --> 00:15:14.530
learning, the big computation--

00:15:14.530 --> 00:15:17.260
is to minimize an energy--

00:15:17.260 --> 00:15:18.850
is to minimize an energy.

00:15:18.850 --> 00:15:20.800
Now of course, I
made the minimum

00:15:20.800 --> 00:15:25.420
easy to find because
I have pure squares.

00:15:25.420 --> 00:15:28.630
Well, that doesn't happen
in practice, of course.

00:15:28.630 --> 00:15:35.110
In practice, we have linear
terms, x transpose b,

00:15:35.110 --> 00:15:38.080
or nonlinear.

00:15:38.080 --> 00:15:42.430
Yeah, the loss function doesn't
have to be a [INAUDIBLE] cross

00:15:42.430 --> 00:15:44.330
entropy, all kinds of things.

00:15:44.330 --> 00:15:48.490
There is a whole dictionary
of possible loss functions.

00:15:48.490 --> 00:15:51.710
But this is the model.

00:15:51.710 --> 00:15:53.030
This is the model.

00:15:53.030 --> 00:15:55.210
And I'll make it
the perfect model

00:15:55.210 --> 00:16:01.020
by just focusing on that part.

00:16:01.020 --> 00:16:09.200
Well, by the way, what would
happen if that was in there?

00:16:09.200 --> 00:16:12.410
I shouldn't have X'd
it out so quickly

00:16:12.410 --> 00:16:14.030
since I just put it up there.

00:16:14.030 --> 00:16:16.580
Let me put it back up.

00:16:16.580 --> 00:16:18.940
I thought better of it.

00:16:18.940 --> 00:16:20.260
OK.

00:16:20.260 --> 00:16:27.350
This is a kind of least squares
problem with some data, b.

00:16:27.350 --> 00:16:28.970
Minimize that.

00:16:28.970 --> 00:16:31.250
So what would be the
graph of this guy?

00:16:31.250 --> 00:16:37.460
Can I just draw the same sort
of picture for that function?

00:16:37.460 --> 00:16:40.720
Will it be a bowl?

00:16:40.720 --> 00:16:42.820
Yes.

00:16:42.820 --> 00:16:44.980
If I have this
term, all that does

00:16:44.980 --> 00:16:51.600
is move it off center
here, at x equals 0.

00:16:51.600 --> 00:16:52.640
Well, I still get 0.

00:16:52.640 --> 00:16:53.140
Sorry.

00:16:53.140 --> 00:16:54.520
I still go through that point.

00:16:54.520 --> 00:16:57.540
If this is the 0 vector,
I'm still getting 0.

00:16:57.540 --> 00:16:59.100
But this will bring it below.

00:16:59.100 --> 00:17:02.130
That would produce
a bowl like that.

00:17:02.130 --> 00:17:05.460
Actually, it would
just be the same bowl.

00:17:05.460 --> 00:17:08.040
The bowl would just be shifted.

00:17:08.040 --> 00:17:12.599
I could write that to
show how that happens.

00:17:12.599 --> 00:17:17.040
So this is now below 0.
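
NOTE
Editor's sketch, not from the lecture: the board formula is not in the
transcript, so this assumes the common form f(x) = (1/2) x^T S x - x^T b
with S symmetric positive definite. Completing the square shows the same
bowl, shifted to a new center and lowered.
```latex
f(x) = \tfrac{1}{2} x^{\mathsf T} S x - x^{\mathsf T} b
     = \tfrac{1}{2} \left(x - S^{-1} b\right)^{\mathsf T} S \left(x - S^{-1} b\right)
       - \tfrac{1}{2} b^{\mathsf T} S^{-1} b
```
Under that assumption the minimum sits at x = S^{-1} b, where the value is
-(1/2) b^T S^{-1} b, which is at or below 0, and f(0) is still 0.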

00:17:17.040 --> 00:17:22.290
That's the solution
we're after that tells us

00:17:22.290 --> 00:17:25.680
the weights in the
neural network.

00:17:25.680 --> 00:17:28.380
I'm just using these
words, but we'll soon

00:17:28.380 --> 00:17:31.350
have a meaning to them.

00:17:31.350 --> 00:17:34.040
I want to find that
minimum, in other words.

00:17:34.040 --> 00:17:36.930
And I want to find it for much
more complicated functions

00:17:36.930 --> 00:17:37.740
than that.

00:17:37.740 --> 00:17:40.410
Of course, if I
minimize the quadratic,

00:17:40.410 --> 00:17:42.660
that means setting
derivatives to 0.

00:17:42.660 --> 00:17:44.790
I just have linear equations.

00:17:44.790 --> 00:17:49.160
Probably, I could write
everything down for that thing.

00:17:49.160 --> 00:17:51.930
So let's put in some
nonlinear stuff,

00:17:51.930 --> 00:17:55.790
which wiggles the
bowl and makes it not so easy.

00:17:59.880 --> 00:18:02.020
Can I look a month ahead?

00:18:02.020 --> 00:18:03.190
How do you find--

00:18:03.190 --> 00:18:05.890
so this is a big
part of mathematics--

00:18:05.890 --> 00:18:10.190
applied math,
optimization, minimization

00:18:10.190 --> 00:18:15.470
of a complicated function
of 100,000 variables.

00:18:15.470 --> 00:18:17.180
That's the biggest computation.

00:18:17.180 --> 00:18:20.180
That's the reason machine
learning on big problems

00:18:20.180 --> 00:18:24.980
takes a week on a
GPU or multiple GPUs,

00:18:24.980 --> 00:18:28.340
because you have
so many unknowns.

00:18:28.340 --> 00:18:32.240
More than 100,000
would be quite normal.

00:18:32.240 --> 00:18:35.960
In general, let's just have
the pleasure of looking ahead

00:18:35.960 --> 00:18:40.130
for one minute, and then
I'll come back to real life

00:18:40.130 --> 00:18:42.560
here, linear algebra.

00:18:42.560 --> 00:18:51.050
I can't resist thinking aloud,
how do you find the minimum?

00:18:51.050 --> 00:18:58.110
By the way, these functions,
both of them, are convex.

00:18:58.110 --> 00:18:59.100
So that is convex.

00:19:04.940 --> 00:19:09.290
So I want to connect
convex functions, f--

00:19:09.290 --> 00:19:11.350
and what does convex mean?

00:19:11.350 --> 00:19:17.460
It means, well, that
the graph is like that.

00:19:17.460 --> 00:19:19.660
[LAUGHTER]

00:19:19.660 --> 00:19:22.660
Not perfect, it could--

00:19:22.660 --> 00:19:27.280
but if it's a
quadratic, then convex

00:19:27.280 --> 00:19:32.410
means positive
definite, or maybe

00:19:32.410 --> 00:19:35.810
in the extreme,
positive semidefinite.

00:19:35.810 --> 00:19:39.070
I'll have to mention that.

00:19:39.070 --> 00:19:42.250
But convex means it goes up.

00:19:42.250 --> 00:19:43.630
But it could have wiggles.

00:19:43.630 --> 00:19:47.020
It doesn't have to be
just perfect squares

00:19:47.020 --> 00:19:49.720
and linear terms,
but general things.

00:19:49.720 --> 00:19:55.060
And for deep learning,
it will include non--

00:19:55.060 --> 00:19:58.750
it will go far
beyond quadratics.

00:19:58.750 --> 00:20:00.580
Well, it may not be convex.

00:20:00.580 --> 00:20:02.530
I guess that's also true.

00:20:02.530 --> 00:20:04.120
Yeah.

00:20:04.120 --> 00:20:08.230
So deep learning has
got serious problems

00:20:08.230 --> 00:20:10.570
because those
functions, they may

00:20:10.570 --> 00:20:16.150
look like this but then over
here they could go nonconvex.

00:20:16.150 --> 00:20:18.520
They could dip
down a little more.

00:20:18.520 --> 00:20:21.580
And you're looking for this
point or for this point.

00:20:24.820 --> 00:20:30.520
Still, I'm determined to tell
you how to find it or a start

00:20:30.520 --> 00:20:31.780
on how you find it.

00:20:31.780 --> 00:20:32.980
So you're at some point.

00:20:35.950 --> 00:20:41.810
Start there, somewhere
on the surface.

00:20:41.810 --> 00:20:45.900
Some x, some vector
x is your start, x0--

00:20:49.890 --> 00:20:51.030
starting point.

00:20:51.030 --> 00:20:58.020
And we're going to just take a
step, hopefully down the bowl.

00:20:58.020 --> 00:21:00.900
Well of course, it would
be fantastic to get there

00:21:00.900 --> 00:21:04.950
in one step, but that's
not going to happen.

00:21:04.950 --> 00:21:09.570
That would be solving a big
linear system, very expensive,

00:21:09.570 --> 00:21:11.190
and a big nonlinear system.

00:21:11.190 --> 00:21:13.320
So really, that's what
we're trying to solve--

00:21:13.320 --> 00:21:15.300
a big nonlinear system.

00:21:15.300 --> 00:21:19.050
And I should be on this
picture because here we

00:21:19.050 --> 00:21:20.790
can see where the minimum is.

00:21:20.790 --> 00:21:22.980
But they just shift.

00:21:22.980 --> 00:21:27.420
So what would you do if
you had a starting point

00:21:27.420 --> 00:21:32.240
and you wanted to go
look for the minimum?

00:21:32.240 --> 00:21:34.730
What's the natural idea?

00:21:34.730 --> 00:21:37.010
Compute derivatives.

00:21:37.010 --> 00:21:38.810
You've got calculus
on your side.

00:21:38.810 --> 00:21:42.300
Compute the first derivatives.

00:21:42.300 --> 00:21:46.550
So the first derivatives
with respect to x--

00:21:46.550 --> 00:21:50.210
so I would compute the
derivative with respect

00:21:50.210 --> 00:21:53.990
to x, and the derivative
of f with respect to y,

00:21:53.990 --> 00:21:56.990
and 100,000 more.

00:21:56.990 --> 00:21:59.750
And that takes a little while.

00:21:59.750 --> 00:22:01.760
And now I've got
the derivatives.

00:22:01.760 --> 00:22:02.955
What do I do?

00:22:02.955 --> 00:22:03.830
AUDIENCE: [INAUDIBLE]

00:22:03.830 --> 00:22:05.600
GILBERT STRANG: I
go-- that tells me

00:22:05.600 --> 00:22:07.170
the steepest direction.

00:22:07.170 --> 00:22:09.500
That tells me, at
that point, which

00:22:09.500 --> 00:22:12.930
way is the fastest way down.

00:22:12.930 --> 00:22:14.090
So I would follow--

00:22:14.090 --> 00:22:15.890
I would do a gradient descent.

00:22:15.890 --> 00:22:17.720
I would follow that gradient.

00:22:17.720 --> 00:22:21.920
This is called the gradient,
all the first derivatives.

00:22:21.920 --> 00:22:24.528
It's called the gradient of f--

00:22:24.528 --> 00:22:25.070
the gradient.

00:22:29.950 --> 00:22:32.030
Gradient vector-- it's
a vector, of course,

00:22:32.030 --> 00:22:35.440
because f is a function
of lots of variables.

00:22:35.440 --> 00:22:39.970
I would start down
in that direction.

00:22:39.970 --> 00:22:45.230
And how far to go, that's
the million dollar question

00:22:45.230 --> 00:22:48.480
in deep learning.

00:22:48.480 --> 00:22:52.210
Is it going to hit 0?

00:22:52.210 --> 00:22:52.870
Nope.

00:22:52.870 --> 00:22:54.400
It's not.

00:22:54.400 --> 00:22:55.120
It's not.

00:22:58.060 --> 00:23:02.040
So basically, you
go down until it--

00:23:04.720 --> 00:23:09.430
so you're traveling here in
the x, along the gradient.

00:23:09.430 --> 00:23:13.330
And you're not going to hit 0.

00:23:13.330 --> 00:23:17.200
You're all going here
in some direction.

00:23:17.200 --> 00:23:21.940
So you keep going down
this thing until it--

00:23:21.940 --> 00:23:26.290
oh, I'm not Rembrandt here.

00:23:26.290 --> 00:23:31.830
Your path down-- think of
yourself on a mountain.

00:23:31.830 --> 00:23:34.460
You're trying to go downhill.

00:23:34.460 --> 00:23:36.960
So you take-- as
fast as you can.

00:23:36.960 --> 00:23:40.190
So you take the steepest
route down until--

00:23:40.190 --> 00:23:42.830
but you have blinkers.

00:23:42.830 --> 00:23:47.330
Once you decide on a direction,
you go in that direction.

00:23:47.330 --> 00:23:50.900
Of course-- so what will happen?

00:23:50.900 --> 00:23:52.610
You'll go down for
a while and then

00:23:52.610 --> 00:23:57.980
it will turn up again when
you get to, maybe, close

00:23:57.980 --> 00:23:59.940
to the bottom or maybe not.

00:23:59.940 --> 00:24:02.270
You're not going to hit here.

00:24:02.270 --> 00:24:04.760
And it's going to
miss that and come up.

00:24:04.760 --> 00:24:08.250
Maybe I should draw it
over here, whatever.

00:24:08.250 --> 00:24:16.540
So it's called a line search,
to decide how far to go there.

00:24:16.540 --> 00:24:17.655
And then say, OK stop.

00:24:20.440 --> 00:24:24.190
And you can invest a lot
of time or a little time

00:24:24.190 --> 00:24:27.880
to decide on that
first stopping point.

00:24:27.880 --> 00:24:31.220
And now just tell me,
what do you do next?

00:24:31.220 --> 00:24:34.070
So now you're here.

00:24:34.070 --> 00:24:36.580
What now?

00:24:36.580 --> 00:24:39.520
Recalculate the gradient.

00:24:39.520 --> 00:24:43.840
Find the steepest way
down from that point,

00:24:43.840 --> 00:24:47.830
follow it until it turns
up or approximately,

00:24:47.830 --> 00:24:49.160
then you're at a new point.

00:24:49.160 --> 00:24:51.330
So this is gradient descent.

00:24:51.330 --> 00:24:54.550
That's gradient descent,
the big algorithm

00:24:54.550 --> 00:25:00.110
of deep learning, of neural
nets, of machine learning--

00:25:00.110 --> 00:25:02.440
of optimization, you could say.
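
NOTE
Editor's sketch, not from the lecture: gradient descent with an exact line
search on a small quadratic. It assumes the form f(x) = (1/2) x'Sx - x'b,
whose gradient is Sx - b; the data vector b and all function names are
hypothetical. For a quadratic, the best step along the negative gradient g
has the closed form alpha = (g'g) / (g'Sg), so no search loop is needed.
```python
def matvec(s, x):
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in s]
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
def gradient_descent(s, b, x, steps):
    """Steepest descent with exact line search on f(x) = 1/2 x'Sx - x'b."""
    for _ in range(steps):
        g = [sx_i - b_i for sx_i, b_i in zip(matvec(s, x), b)]   # gradient Sx - b
        gg = dot(g, g)
        if gg == 0:
            break                                 # gradient vanished: at the minimum
        alpha = gg / dot(g, matvec(s, g))         # exact minimizer along -g
        x = [x_i - alpha * g_i for x_i, g_i in zip(x, g)]
    return x
S = [[3.0, 4.0], [4.0, 6.0]]
b = [1.0, 0.0]                                    # hypothetical data
x_min = gradient_descent(S, b, [0.0, 0.0], 2000)
print(x_min)                                      # close to the solution of Sx = b
```
For this S and b the exact solution of Sx = b is (3, -2), so the printed
point should land very near it. Each pass recomputes the gradient, steps
until the function would turn up again, and repeats.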

00:25:02.440 --> 00:25:06.150
Notice that we didn't
compute second derivatives.

00:25:06.150 --> 00:25:08.680
If we computed
second derivatives,

00:25:08.680 --> 00:25:12.640
we could have a fancier
formula that could

00:25:12.640 --> 00:25:17.410
account for the curve here.

00:25:17.410 --> 00:25:19.390
But to compute
second derivatives

00:25:19.390 --> 00:25:22.380
when you've got hundreds
of thousands of variables

00:25:22.380 --> 00:25:24.620
is not a lot of fun.

00:25:24.620 --> 00:25:30.010
So most effectively,
machine learning

00:25:30.010 --> 00:25:33.910
is limited to first
derivatives, the gradient.

00:25:37.150 --> 00:25:37.940
OK.

00:25:37.940 --> 00:25:40.580
So that's the general idea.

00:25:40.580 --> 00:25:48.850
But there are lots and
lots of decisions and--

00:25:48.850 --> 00:25:52.930
why doesn't that-- how
well does that work,

00:25:52.930 --> 00:25:56.200
maybe, is a good
question to ask.

00:25:56.200 --> 00:26:04.860
Does this work pretty well or
do we have to add more ideas?

00:26:04.860 --> 00:26:09.020
Well, it doesn't
always work well.

00:26:09.020 --> 00:26:13.370
Let me tell you
what the trouble is.

00:26:13.370 --> 00:26:20.380
I'm way off-- this is
March or something.

00:26:20.380 --> 00:26:23.840
But anyway, I'll
finish this sentence.

00:26:23.840 --> 00:26:30.590
So what's the problem with
this gradient descent idea?

00:26:30.590 --> 00:26:33.680
It turns out, if you're
going down a narrow valley--

00:26:33.680 --> 00:26:37.400
I don't know, if you can sort
of imagine a narrow valley

00:26:37.400 --> 00:26:38.900
toward the bottom.

00:26:38.900 --> 00:26:42.980
So here's the bottom.

00:26:42.980 --> 00:26:44.880
Here's your starting point.

00:26:44.880 --> 00:26:50.510
And this is-- you have to
think of this as a bowl.

00:26:50.510 --> 00:26:54.050
So the bowl is--

00:26:54.050 --> 00:26:56.540
or the two eigenvalues,
you could say--

00:26:56.540 --> 00:26:58.700
are 1 and a very small number.

00:26:58.700 --> 00:27:01.610
The bowl is long and thin.

00:27:01.610 --> 00:27:02.750
Are you with me?

00:27:02.750 --> 00:27:05.240
Imagine a long, thin bowl.

00:27:05.240 --> 00:27:08.230
Then what happens for that case?

00:27:08.230 --> 00:27:11.120
You take the steepest descent.

00:27:11.120 --> 00:27:14.210
But you cross the valley,
and very soon, you're

00:27:14.210 --> 00:27:15.800
climbing again.

00:27:15.800 --> 00:27:18.890
So you take very,
very small steps,

00:27:18.890 --> 00:27:24.200
just staggering back
and forth across this

00:27:24.200 --> 00:27:29.920
and getting slowly, but too
slowly, toward the bottom.

00:27:29.920 --> 00:27:35.740
So that's why things
have got to be improved.

00:27:35.740 --> 00:27:41.320
If you have a very small
eigenvalue and a very large

00:27:41.320 --> 00:27:48.160
eigenvalue, those tell you the
shape of the bowl, of course.

00:27:48.160 --> 00:27:53.440
And many cases will
be like that-- have

00:27:53.440 --> 00:27:55.510
a small and a large eigenvalue.

00:27:55.510 --> 00:27:57.640
And then you're
spending all your time.

00:27:57.640 --> 00:28:01.870
You're quickly going up the
other side, down, up, down, up,

00:28:01.870 --> 00:28:02.590
down.

00:28:02.590 --> 00:28:06.450
And you need a new idea.
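
NOTE
Editor's sketch, not part of the lecture: a minimal illustration of the zigzag described above.
It assumes the bowl f(x, y) = (x^2 + b*y^2)/2, whose eigenvalues are 1 and b, and steepest
descent with an exact line search. The function name, starting points, and tolerance are hypothetical.
```python
def descent_steps(b, x, y, tol=1e-12, max_steps=100000):
    """Count exact-line-search gradient steps on f(x, y) = (x*x + b*y*y) / 2."""
    steps = 0
    while 0.5 * (x * x + b * y * y) > tol and steps < max_steps:
        gx, gy = x, b * y  # gradient of f at the current point
        # For a quadratic bowl the best step along -gradient has a closed form.
        t = (gx * gx + gy * gy) / (gx * gx + b * gy * gy)
        x, y = x - t * gx, y - t * gy
        steps += 1
    return steps
round_bowl = descent_steps(1.0, 1.0, 1.0)    # eigenvalues 1 and 1
thin_bowl = descent_steps(0.01, 0.01, 1.0)   # eigenvalues 1 and 0.01
print(round_bowl, thin_bowl)
```
With equal eigenvalues a single step should reach the bottom. With b = 0.01 the x coordinate flips sign on every step and the count runs into the hundreds.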

00:28:06.450 --> 00:28:10.420
OK, so that's really--

00:28:10.420 --> 00:28:13.520
so this is one major reason
why positive definite

00:28:13.520 --> 00:28:16.460
is so important because
positive definite gives

00:28:16.460 --> 00:28:18.600
pictures like that.

00:28:18.600 --> 00:28:20.690
But then, we have
this question of,

00:28:20.690 --> 00:28:23.630
are the eigenvalues
sort of the same size?

00:28:23.630 --> 00:28:25.790
Of course, if the
eigenvalues are all equal,

00:28:25.790 --> 00:28:28.040
what's my bowl like?

00:28:28.040 --> 00:28:32.170
Suppose I have the identity.

00:28:32.170 --> 00:28:36.210
So then x squared plus y
squared is my function.

00:28:36.210 --> 00:28:38.930
Then it's a perfectly
circular bowl.

00:28:38.930 --> 00:28:40.160
What will happen?

00:28:40.160 --> 00:28:42.320
Can you imagine a
perfectly circular--

00:28:42.320 --> 00:28:48.710
like any bowl in the kitchen is
probably, most likely circular.

00:28:48.710 --> 00:28:51.590
And suppose I do
gradient descent there.

00:28:51.590 --> 00:28:56.600
I start at some point on
this perfectly circular bowl.

00:28:56.600 --> 00:28:57.980
I start down.

00:28:57.980 --> 00:28:59.690
And where do I
stop in that case?

00:29:02.960 --> 00:29:05.630
Do I hit bottom?

00:29:05.630 --> 00:29:07.205
I do, by symmetry.

00:29:11.520 --> 00:29:16.900
So if I take x squared plus
y squared as my function

00:29:16.900 --> 00:29:22.250
and I start somewhere, I
figure out the gradient.

00:29:22.250 --> 00:29:22.970
Yeah.

00:29:22.970 --> 00:29:25.460
The answer is I'll go
right through the center.
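
NOTE
Editor's sketch, not part of the lecture: one steepest-descent step on f = a*x^2 + c*y^2,
assuming an exact line search along the negative gradient. With a = c = 1 this is the
circular bowl x^2 + y^2. The stretched values and the starting point are assumptions.
```python
def exact_step(a, c, x, y):
    """One steepest-descent step with exact line search on f = a*x*x + c*y*y."""
    gx, gy = 2 * a * x, 2 * c * y  # gradient of f
    t = (gx * gx + gy * gy) / (2 * (a * gx * gx + c * gy * gy))  # best step length
    return x - t * gx, y - t * gy
circular = exact_step(1.0, 1.0, 3.0, -2.0)   # identity matrix: x^2 + y^2
stretched = exact_step(1.0, 0.1, 3.0, -2.0)  # eigenvalues far apart (assumed values)
print(circular, stretched)
```
On the circular bowl the gradient points straight at the center, so the step lands on the bottom. On the stretched bowl it does not.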

00:29:25.460 --> 00:29:32.960
So really positive eigenvalues,
positive definite matrices

00:29:32.960 --> 00:29:34.370
give us a bowl.

00:29:34.370 --> 00:29:39.470
But if the eigenvalues
are far apart,

00:29:39.470 --> 00:29:42.050
that's when we have problems.

00:29:42.050 --> 00:29:44.810
OK.

00:29:44.810 --> 00:29:51.840
I'm going back to my
job, which is this--

00:29:51.840 --> 00:29:56.730
because this is so nice.

00:29:56.730 --> 00:29:57.540
Right.

00:29:57.540 --> 00:30:01.970
Could you-- well,
the homework that's

00:30:01.970 --> 00:30:09.350
maybe going out this minute
for middle of next week

00:30:09.350 --> 00:30:11.540
gives you some
exercises with this.

00:30:11.540 --> 00:30:20.730
Let me do a couple of things,
a couple of exercises here.

00:30:20.730 --> 00:30:25.890
For example, suppose I have a
positive definite matrix, S,

00:30:25.890 --> 00:30:31.650
and a positive definite matrix,
T. If I add those matrices,

00:30:31.650 --> 00:30:33.600
is the result positive definite?

00:30:33.600 --> 00:30:37.140
So there is a perfect
math question,

00:30:37.140 --> 00:30:39.208
and we hope to answer it.

00:30:41.960 --> 00:30:44.660
So S and T--

00:30:44.660 --> 00:30:48.470
positive definite.

00:30:48.470 --> 00:30:50.180
What about S plus T?

00:30:53.720 --> 00:30:56.470
Is that matrix
positive definite?

00:30:56.470 --> 00:30:58.130
OK.

00:30:58.130 --> 00:31:00.320
How do I answer such a question?

00:31:00.320 --> 00:31:03.920
I look at my five tests
and I think, can I use it?

00:31:03.920 --> 00:31:06.210
Which one will be good?

00:31:06.210 --> 00:31:11.150
And one that won't tell
me much is the eigenvalues

00:31:11.150 --> 00:31:14.960
because the
eigenvalues of S plus T

00:31:14.960 --> 00:31:19.970
are not immediately clear from
the eigenvalues of S and T

00:31:19.970 --> 00:31:21.260
separately.

00:31:21.260 --> 00:31:23.120
I don't want to use that test.

00:31:23.120 --> 00:31:26.540
This is my favorite test,
so I'm going to use that.

00:31:26.540 --> 00:31:28.790
What about the energy in--

00:31:28.790 --> 00:31:30.140
so look at the energy.

00:31:33.590 --> 00:31:39.860
So I look at x
transpose (S plus T) x.

00:31:39.860 --> 00:31:43.200
And what's my question
in my mind here?

00:31:43.200 --> 00:31:48.300
Is that a positive number
or not, for every x?

00:31:48.300 --> 00:31:50.340
And how am I going to
answer that question?

00:31:53.200 --> 00:31:56.320
Just separate those
into two pieces, right?

00:31:56.320 --> 00:31:58.270
It's there in front of me.

00:31:58.270 --> 00:32:00.880
It's this one plus this one.

00:32:04.630 --> 00:32:08.380
And both of those are positive,
so the answer is yes, it

00:32:08.380 --> 00:32:09.530
is positive definite.

00:32:09.530 --> 00:32:10.030
Yes.
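
NOTE
Editor's sketch, not part of the lecture: a numerical spot check of the energy argument.
The matrix S is an assumed example, and T is the 3, 4, 4, 6 matrix. Sampling directions
is only an illustration. The proof is the splitting of the energy into two positive pieces.
```python
import math
def energy(M, x):
    """Return x^T M x for a 2x2 matrix M and a 2-vector x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))
S = [[2.0, 1.0], [1.0, 2.0]]  # assumed example, eigenvalues 1 and 3
T = [[3.0, 4.0], [4.0, 6.0]]  # the 3, 4, 4, 6 matrix, positive definite
S_plus_T = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]
# Sample unit vectors all the way around the circle.
directions = [(math.cos(math.radians(d)), math.sin(math.radians(d))) for d in range(360)]
energies_add = all(abs(energy(S_plus_T, x) - energy(S, x) - energy(T, x)) < 1e-12 for x in directions)
sum_is_positive = all(energy(S_plus_T, x) > 0 for x in directions)
print(energies_add, sum_is_positive)
```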

00:32:15.110 --> 00:32:17.870
You see how the
energy was right.

00:32:17.870 --> 00:32:21.530
I don't want to compute the
pivots or any determinants.

00:32:21.530 --> 00:32:26.180
That would be a nightmare trying
to find the determinants for S

00:32:26.180 --> 00:32:32.690
plus T. But this one
just does it immediately.

00:32:32.690 --> 00:32:36.380
What else would be a good
example to start with?

00:32:36.380 --> 00:32:38.240
What about S inverse?

00:32:38.240 --> 00:32:40.260
Is that positive definite?

00:32:40.260 --> 00:32:44.660
So let me ask S
positive definite,

00:32:44.660 --> 00:32:47.310
and I want to ask
about its inverse.

00:32:47.310 --> 00:32:49.175
So its inverse is
a symmetric matrix.

00:32:51.770 --> 00:32:56.990
And is it positive definite?

00:32:56.990 --> 00:33:00.170
And the answer-- yes.

00:33:00.170 --> 00:33:01.790
Yes.

00:33:01.790 --> 00:33:07.630
I've got five tests, 20% chance
at picking the right one.

00:33:07.630 --> 00:33:11.220
Determinants is not good.

00:33:11.220 --> 00:33:13.520
The first one is great.

00:33:13.520 --> 00:33:16.970
The first one is the good
one for this question

00:33:16.970 --> 00:33:18.650
because the eigenvalues.

00:33:18.650 --> 00:33:21.200
So the answer is yes.

00:33:21.200 --> 00:33:26.890
Yes, this has-- eigenvalues.

00:33:26.890 --> 00:33:30.520
So what are the
eigenvalues of S inverse?

00:33:30.520 --> 00:33:32.810
1 over lambda?

00:33:32.810 --> 00:33:37.946
So-- yes, positive
definite, positive definite.
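
NOTE
Editor's sketch, not part of the lecture: checking that the eigenvalues of S inverse are
1 over lambda, using the 3, 4, 4, 6 matrix as an assumed example. The helper eig_sym2 is
hypothetical and uses the closed form for a 2x2 symmetric matrix.
```python
import math
def eig_sym2(a, b, c):
    """Eigenvalues (small, large) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return mean - radius, mean + radius
a, b, c = 3.0, 4.0, 6.0
det = a * c - b * b  # 2.0 for this matrix
lo, hi = eig_sym2(a, b, c)
# The inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] divided by det.
inv_lo, inv_hi = eig_sym2(c / det, -b / det, a / det)
print(lo, hi, inv_lo, inv_hi)
```
The smallest eigenvalue of the inverse is 1 over the largest eigenvalue of S, and the other way around, so positive stays positive.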

00:33:45.400 --> 00:33:47.680
Yep.

00:33:47.680 --> 00:33:51.070
What about-- let me
ask you just one more

00:33:51.070 --> 00:33:54.000
question of the same sort.

00:33:54.000 --> 00:34:00.250
Suppose I have a matrix, S,
and suppose I multiply it

00:34:00.250 --> 00:34:03.440
by another matrix.

00:34:03.440 --> 00:34:03.940
Oh, well.

00:34:03.940 --> 00:34:05.510
OK.

00:34:05.510 --> 00:34:12.860
Suppose-- do I want
to ask you this?

00:34:12.860 --> 00:34:19.060
Suppose I asked you about
S times another matrix,

00:34:19.060 --> 00:34:29.010
M. Would that be
positive definite or not?

00:34:29.010 --> 00:34:31.980
Now I'm going to
tell you the answer

00:34:31.980 --> 00:34:35.880
is that the question wasn't
any good because that matrix is

00:34:35.880 --> 00:34:39.000
probably not symmetric,
and I'm only dealing

00:34:39.000 --> 00:34:40.860
with symmetric matrices.

00:34:40.860 --> 00:34:45.090
Matrices have to be
symmetric before I

00:34:45.090 --> 00:34:50.400
know they have real eigenvalues
and I can ask these questions.

00:34:50.400 --> 00:34:52.050
So that's not good.

00:34:52.050 --> 00:34:55.664
But I could-- oh, let's see.

00:34:58.830 --> 00:35:02.010
Let me put it in
an orthogonal guy.

00:35:02.010 --> 00:35:03.810
Well, still that's
not symmetric.

00:35:03.810 --> 00:35:06.470
But if I put the--

00:35:06.470 --> 00:35:07.950
it's transpose over there.

00:35:07.950 --> 00:35:10.530
Then I made it symmetric.

00:35:10.530 --> 00:35:14.370
Oh, dear, I may be getting
myself in trouble here.

00:35:14.370 --> 00:35:17.490
So I'm starting with
a positive definite S.

00:35:17.490 --> 00:35:19.620
I'm hitting it with
an orthogonal matrix

00:35:19.620 --> 00:35:21.120
and its transpose.

00:35:21.120 --> 00:35:25.080
And my instinct carried
me here because I know

00:35:25.080 --> 00:35:26.560
that that's still symmetric.

00:35:26.560 --> 00:35:27.060
Right?

00:35:27.060 --> 00:35:28.470
Everybody sees that?

00:35:28.470 --> 00:35:32.430
If I transpose this, Q
transpose will come here,

00:35:32.430 --> 00:35:34.300
S, Q will go there.

00:35:34.300 --> 00:35:37.390
It'll be symmetric.

00:35:37.390 --> 00:35:41.880
Now is that positive definite?

00:35:41.880 --> 00:35:42.960
Ah, yes.

00:35:42.960 --> 00:35:46.335
We can answer that.

00:35:46.335 --> 00:35:47.920
Can we?

00:35:47.920 --> 00:35:49.760
Is that positive definite?

00:35:49.760 --> 00:35:52.430
So remember that this
is an orthogonal matrix,

00:35:52.430 --> 00:35:55.930
so also, if you wanted me to
write it that way, I could.

00:35:59.150 --> 00:36:02.060
And what about
positive-definiteness

00:36:02.060 --> 00:36:02.970
of that thing?

00:36:08.420 --> 00:36:10.100
Answer, I think, is yes.

00:36:10.100 --> 00:36:12.090
Do you agree?

00:36:12.090 --> 00:36:14.070
It is positive definite?

00:36:14.070 --> 00:36:16.050
Give me a reason, though.

00:36:16.050 --> 00:36:18.530
Why is this positive definite?

00:36:21.190 --> 00:36:27.710
So that word similar, this
is a similar matrix to S?

00:36:27.710 --> 00:36:30.700
Do you remember what similar
means from last time?

00:36:30.700 --> 00:36:33.250
It means that some
M and its inverse

00:36:33.250 --> 00:36:35.860
are here, which they are.

00:36:35.860 --> 00:36:41.080
And so what's the
consequence of being similar?

00:36:41.080 --> 00:36:44.470
What do I know about a
matrix that's similar to S?

00:36:44.470 --> 00:36:44.977
It has--

00:36:44.977 --> 00:36:46.060
AUDIENCE: Same [INAUDIBLE]

00:36:46.060 --> 00:36:47.435
GILBERT STRANG:
Same eigenvalues.

00:36:47.435 --> 00:36:49.990
And therefore, we're good.

00:36:49.990 --> 00:36:50.800
Right?

00:36:50.800 --> 00:36:54.580
Or I could go this way.

00:36:54.580 --> 00:36:57.400
I like energy, so
let me try that one.

00:36:57.400 --> 00:37:04.060
x transpose, Q transpose, SQx--

00:37:04.060 --> 00:37:06.310
that would be the energy.

00:37:06.310 --> 00:37:08.020
And what am I trying to show?

00:37:08.020 --> 00:37:10.270
I'm trying to show
it's positive.

00:37:10.270 --> 00:37:16.180
So, of course, as
soon as I see that,

00:37:16.180 --> 00:37:20.500
it's just waiting for me to--

00:37:20.500 --> 00:37:24.760
let Qx be something
called y, maybe.

00:37:24.760 --> 00:37:26.092
And then what will this be?

00:37:26.092 --> 00:37:27.050
AUDIENCE: y [INAUDIBLE]

00:37:27.050 --> 00:37:29.800
GILBERT STRANG: y transpose.

00:37:29.800 --> 00:37:36.850
So this energy would be the
same as y transpose, Sy.

00:37:36.850 --> 00:37:39.400
And what do I know about that?

00:37:39.400 --> 00:37:43.530
It's positive because
that's an energy in the y,

00:37:43.530 --> 00:37:45.390
for the y vector.

00:37:45.390 --> 00:37:49.740
So one way or another,
we get the answer yes

00:37:49.740 --> 00:37:51.774
to that question.
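
NOTE
Editor's sketch, not part of the lecture: both arguments for Q transpose S Q, checked on
numbers. The rotation angle, the vector x, and the choice of S are assumptions. For a 2x2
matrix, equal trace and equal determinant mean equal eigenvalues.
```python
import math
def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]
def energy(M, v):
    """Return v^T M v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))
theta = 0.7  # arbitrary rotation angle (assumption)
Q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
Qt = [list(col) for col in zip(*Q)]
S = [[3.0, 4.0], [4.0, 6.0]]
B = matmul(Qt, matmul(S, Q))  # Q^T S Q
x = [1.0, -2.0]
y = [Q[0][0] * x[0] + Q[0][1] * x[1], Q[1][0] * x[0] + Q[1][1] * x[1]]  # y = Qx
same_energy = abs(energy(B, x) - energy(S, y)) < 1e-9
same_trace = abs((B[0][0] + B[1][1]) - (S[0][0] + S[1][1])) < 1e-9
same_det = abs((B[0][0] * B[1][1] - B[0][1] * B[1][0]) - 2.0) < 1e-9
print(same_energy, same_trace, same_det)
```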

00:37:51.774 --> 00:37:53.659
OK.

00:37:53.659 --> 00:37:54.159
OK.

00:37:57.980 --> 00:38:06.350
Let me introduce the
idea of semidefinite.

00:38:06.350 --> 00:38:09.450
Semidefinite is the borderline.

00:38:09.450 --> 00:38:10.820
So what did we have?

00:38:10.820 --> 00:38:13.400
We had 3, 4, 4.

00:38:13.400 --> 00:38:18.120
And then when it was 5,
you told me indefinite,

00:38:18.120 --> 00:38:19.960
a negative eigenvalue.

00:38:19.960 --> 00:38:24.460
When it was 6, you told me
2 positive eigenvalues--

00:38:24.460 --> 00:38:25.510
definite.

00:38:25.510 --> 00:38:28.510
What's the borderline?

00:38:28.510 --> 00:38:29.880
What's the borderline there?

00:38:32.680 --> 00:38:35.440
It's not going to be an integer.

00:38:35.440 --> 00:38:36.410
What do I mean?

00:38:36.410 --> 00:38:38.222
What am I looking
for, the borderline?

00:38:43.110 --> 00:38:44.220
So tell me again?

00:38:44.220 --> 00:38:45.120
AUDIENCE: 16 over--

00:38:45.120 --> 00:38:48.960
GILBERT STRANG: 16/3,
that sounds right.

00:38:48.960 --> 00:38:50.760
Why is that the borderline?

00:38:50.760 --> 00:38:52.200
AUDIENCE: [INAUDIBLE]

00:38:52.200 --> 00:38:54.540
GILBERT STRANG: Because
now the determinant is--

00:38:54.540 --> 00:38:55.280
AUDIENCE: 0.

00:38:55.280 --> 00:38:56.030
GILBERT STRANG: 0.

00:38:56.030 --> 00:38:56.760
It's singular.

00:38:56.760 --> 00:38:59.550
It has a 0 eigenvalue.

00:38:59.550 --> 00:39:00.870
There's a 0 eigenvalue.

00:39:00.870 --> 00:39:03.270
So that's what
semidefinite means.

00:39:03.270 --> 00:39:05.730
Lambdas are equal to 0.

00:39:05.730 --> 00:39:06.780
Wait a minute.

00:39:06.780 --> 00:39:10.620
That has a 0 eigenvalue
because its determinant is 0.

00:39:10.620 --> 00:39:15.120
How do I know that the other
eigenvalue is positive?

00:39:15.120 --> 00:39:17.730
Could it be that the
other ei-- so this

00:39:17.730 --> 00:39:23.380
is the semidefinite
case we hope.

00:39:23.380 --> 00:39:27.710
But we'd better
finish that reasoning.

00:39:27.710 --> 00:39:31.040
How do I know that the other
eigenvalue is positive?

00:39:31.040 --> 00:39:32.340
AUDIENCE: Trace.

00:39:32.340 --> 00:39:35.330
GILBERT STRANG: The
trace, because adding

00:39:35.330 --> 00:39:38.900
3 plus 16/3, whatever
the heck that might give,

00:39:38.900 --> 00:39:41.480
it certainly gives
a positive number.

00:39:41.480 --> 00:39:44.650
And that will be
lambda 1 plus lambda 2.

00:39:44.650 --> 00:39:45.980
That's the trace.

00:39:45.980 --> 00:39:48.200
But lambda 2 is 0.

00:39:48.200 --> 00:39:50.210
We know from this it's singular.

00:39:50.210 --> 00:39:51.830
So we know lambda 2 is 0.

00:39:51.830 --> 00:39:55.870
So lambda 1 must be 3 plus 5--

00:39:55.870 --> 00:39:57.910
5 and 1/3.

00:39:57.910 --> 00:40:06.070
The lambdas must be 8 and
1/3, 3 plus 5 and 1/3, and 0.
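
NOTE
Editor's sketch, not part of the lecture: the borderline for the matrix with entries
3, 4, 4, c, using exact fractions. The classify helper is hypothetical and relies on the
2x2 facts used above: the determinant is the product of the eigenvalues and the trace is their sum.
```python
from fractions import Fraction
def classify(c):
    """Classify [[3, 4], [4, c]] from its determinant and trace (2x2 symmetric case)."""
    det = 3 * c - 16   # product of the two eigenvalues
    trace = 3 + c      # sum of the two eigenvalues
    if det < 0:
        return "indefinite"
    if det == 0:
        return "positive semidefinite" if trace > 0 else "negative semidefinite"
    return "positive definite" if trace > 0 else "negative definite"
border = Fraction(16, 3)
labels = [classify(c) for c in (5, border, 6)]
lambda_1 = 3 + border  # the other eigenvalue is 0, so the trace is lambda_1
print(labels, lambda_1)
```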

00:40:06.070 --> 00:40:11.350
So that's a positive
semidefinite.

00:40:11.350 --> 00:40:14.050
If you think of the
positive definite matrices

00:40:14.050 --> 00:40:20.710
as some clump in matrix space,
then the positive semidefinite

00:40:20.710 --> 00:40:23.320
ones are sort of
the edge of that clump.

00:40:23.320 --> 00:40:25.390
They're the boundary of
the clump, the ones

00:40:25.390 --> 00:40:31.210
that are not quite inside
but not outside either.

00:40:31.210 --> 00:40:37.600
They're lying right on the edge
of positive definite matrices.

00:40:37.600 --> 00:40:38.800
Let me just take a--

00:40:41.420 --> 00:40:45.510
so what about a
matrix of all 1s?

00:40:49.200 --> 00:40:54.060
What's the story on that
one-- positive definite, all

00:40:54.060 --> 00:40:58.470
the numbers are positive,
or positive semidefinite,

00:40:58.470 --> 00:40:59.840
or indefinite?

00:40:59.840 --> 00:41:02.070
What do you think here?

00:41:02.070 --> 00:41:05.270
1-1, all 1.

00:41:05.270 --> 00:41:06.150
AUDIENCE: Semi--

00:41:06.150 --> 00:41:09.790
GILBERT STRANG: Semidefinite
sounds like a good guess.

00:41:09.790 --> 00:41:13.620
Do you know what the eigenvalues
of this matrix would be?

00:41:13.620 --> 00:41:16.110
AUDIENCE: 0 [INAUDIBLE]

00:41:16.110 --> 00:41:20.280
GILBERT STRANG: 3, 0, and
0-- why did you say that?

00:41:20.280 --> 00:41:22.110
AUDIENCE: Because 2 [INAUDIBLE]

00:41:22.110 --> 00:41:24.360
GILBERT STRANG: Because we
only have-- the rank is?

00:41:24.360 --> 00:41:24.867
AUDIENCE: 1.

00:41:24.867 --> 00:41:26.700
GILBERT STRANG: Yeah,
we introduced that key

00:41:26.700 --> 00:41:29.190
where the rank is 1.

00:41:29.190 --> 00:41:32.750
So there's only one
nonzero eigenvalue.

00:41:32.750 --> 00:41:37.320
And then the trace tells
me that number is 3.

00:41:37.320 --> 00:41:43.700
So this is a positive
semidefinite matrix.
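
NOTE
Editor's sketch, not part of the lecture: checking the eigenvalues 3, 0, 0 of the 3 by 3
all-ones matrix directly. The two null vectors are one assumed choice of basis for the
eigenvalue 0. Any vector whose entries sum to zero would do.
```python
def apply(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]
ones = [[1, 1, 1] for _ in range(3)]
trace = sum(ones[i][i] for i in range(3))
top = apply(ones, [1, 1, 1])       # 3 times (1, 1, 1): eigenvalue 3
null_a = apply(ones, [1, -1, 0])   # zero vector: eigenvalue 0
null_b = apply(ones, [1, 0, -1])   # zero vector: eigenvalue 0
print(trace, top, null_a, null_b)
```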

00:41:43.700 --> 00:41:51.650
So all these tests change
a little for semidefinite.

00:41:51.650 --> 00:41:55.310
The eigenvalues are
greater than or equal to 0.

00:41:55.310 --> 00:41:58.520
The energy is greater
than or equal to 0.

00:41:58.520 --> 00:42:01.320
The A transpose A-- but
now I don't require--

00:42:01.320 --> 00:42:03.710
oh, I didn't discuss this.

00:42:03.710 --> 00:42:07.010
But semidefinite would
allow dependent columns.

00:42:07.010 --> 00:42:10.430
By the way, you've
got to do this for me.

00:42:10.430 --> 00:42:14.000
Write that matrix as A
transpose times A just

00:42:14.000 --> 00:42:19.275
to see that it's
semidefinite because--

00:42:22.720 --> 00:42:29.090
so write that as A
transpose A. Yeah.

00:42:29.090 --> 00:42:32.840
If it's a rank 1 matrix, you
know what it must look like.

00:42:37.280 --> 00:42:41.460
A transpose A, how many terms
am I going to have in this?

00:42:41.460 --> 00:42:45.590
And now I'm thinking back to the
very beginning of this course

00:42:45.590 --> 00:42:49.350
if I pulled off the pieces.

00:42:49.350 --> 00:42:55.740
In general, this is lambda 1
times the first eigenvector,

00:42:55.740 --> 00:42:58.140
times the first
eigenvector transposed.

00:42:58.140 --> 00:43:00.690
AUDIENCE: Would it just
be a vector of three 1s?

00:43:00.690 --> 00:43:03.350
GILBERT STRANG: Yeah, it would
just be a vector of three 1s.

00:43:03.350 --> 00:43:03.850
Yeah.

00:43:03.850 --> 00:43:10.380
So this would be
the usual picture.

00:43:10.380 --> 00:43:15.810
This is the same as the
Q lambda, Q transpose.

00:43:15.810 --> 00:43:20.220
This is the big fact for
any symmetric matrix.

00:43:20.220 --> 00:43:27.390
And this is symmetric,
but its rank is only 1,

00:43:27.390 --> 00:43:33.570
so that lambda 2 is
0 for that matrix.

00:43:33.570 --> 00:43:35.820
Lambda 3 is 0 for that matrix.

00:43:35.820 --> 00:43:40.200
And the one eigenvector
is the vector 1-1-1.

00:43:40.200 --> 00:43:45.090
And the eigen-- so this
would be 3 times 1-1-1.

00:43:45.090 --> 00:43:48.250
Oh, I have to do--

00:43:48.250 --> 00:43:49.270
yeah.

00:43:49.270 --> 00:43:54.130
So I was going to do 3
times 1-1-1, times 1-1-1.

00:43:57.450 --> 00:44:00.950
But that gives me 3-3-3.

00:44:00.950 --> 00:44:02.405
That's not right.

00:44:02.405 --> 00:44:03.850
AUDIENCE: Normalize them.

00:44:03.850 --> 00:44:05.558
GILBERT STRANG: I have
to normalize them.

00:44:05.558 --> 00:44:06.210
That's right.

00:44:06.210 --> 00:44:06.710
Yeah.

00:44:06.710 --> 00:44:09.510
So that's a vector whose
length is the square root of 3.

00:44:09.510 --> 00:44:13.800
So I have to divide by
that, and divide by it.

00:44:13.800 --> 00:44:16.500
And then the 3 cancels
the square root of 3s,

00:44:16.500 --> 00:44:20.110
and I'm just left
with 1-1-1, 1-1-1.
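
NOTE
Editor's sketch, not part of the lecture: the normalization step on numbers, plus the
A transpose A form. Taking A to be the single row 1, 1, 1 is an assumption that matches
the rank 1 picture. Variable names are hypothetical.
```python
import math
v = [1.0, 1.0, 1.0]
unnormalized = [[3 * vi * vj for vj in v] for vi in v]  # every entry is 3, too big
length = math.sqrt(sum(vi * vi for vi in v))            # square root of 3
q = [vi / length for vi in v]                           # unit eigenvector
rebuilt = [[3 * qi * qj for qj in q] for qi in q]       # lambda_1 * q * q^T
A = [v]                                                 # 1 by 3 matrix, so A^T A is 3 by 3
AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
print(unnormalized[0], rebuilt[0], AtA[0])
```
Dividing by the square root of 3 twice cancels the eigenvalue 3, which leaves the all-ones matrix up to rounding.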

00:44:20.110 --> 00:44:20.610
Yeah.

00:44:20.610 --> 00:44:23.040
AUDIENCE: [INAUDIBLE]

00:44:23.040 --> 00:44:25.810
GILBERT STRANG: So
there is a matrix--

00:44:25.810 --> 00:44:29.260
one of our building-block
type matrices because it only

00:44:29.260 --> 00:44:32.910
has one nonzero eigenvalue.

00:44:32.910 --> 00:44:36.780
Its rank is 1, so it could
not be positive definite.

00:44:36.780 --> 00:44:39.310
It's singular.

00:44:39.310 --> 00:44:44.110
But it is positive semidefinite
because that eigenvalue

00:44:44.110 --> 00:44:46.380
is positive.

00:44:46.380 --> 00:44:48.360
OK.

00:44:48.360 --> 00:44:54.090
So you've got the idea of
positive definite matrices.

00:44:54.090 --> 00:44:57.780
Again, any one of
those five tests

00:44:57.780 --> 00:45:01.260
is enough to show that
it's positive definite.

00:45:01.260 --> 00:45:05.580
And so what's my goal next week?

00:45:05.580 --> 00:45:08.970
It's the singular value
decomposition and all

00:45:08.970 --> 00:45:11.520
that that leads us to.

00:45:11.520 --> 00:45:17.880
We're there now,
ready for the SVD.

00:45:17.880 --> 00:45:18.480
OK.

00:45:18.480 --> 00:45:20.700
Have a good weekend,
and see you--

00:45:20.700 --> 00:45:22.680
oh, I see you on
Tuesday, I guess.

00:45:22.680 --> 00:45:27.440
Right-- not Monday
but Tuesday next week.