WEBVTT

00:00:01.730 --> 00:00:03.700
PROFESSOR: The
previous video was

00:00:03.700 --> 00:00:06.480
about positive
definite matrices.

00:00:06.480 --> 00:00:12.420
This video is also linear
algebra, a very interesting way

00:00:12.420 --> 00:00:17.660
to break up a matrix called the
singular value decomposition.

00:00:17.660 --> 00:00:23.760
And everybody says SVD for
singular value decomposition.

00:00:23.760 --> 00:00:25.560
And what is that factoring?

00:00:25.560 --> 00:00:28.790
What are the three
pieces of the SVD?

00:00:28.790 --> 00:00:34.600
So this is the fact is
every matrix, rectangular,

00:00:34.600 --> 00:00:40.970
every matrix factors into--
these are the three pieces.

00:00:40.970 --> 00:00:43.580
U sigma V transpose.

00:00:43.580 --> 00:00:48.150
People use those letters
for the three factors.

00:00:48.150 --> 00:00:54.430
The factor U is an orthogonal
matrix, an orthogonal matrix.

00:00:54.430 --> 00:00:58.780
The factor sigma in the
middle is a diagonal matrix.

00:00:58.780 --> 00:01:01.020
The factor V
transpose on the right

00:01:01.020 --> 00:01:03.610
is also an orthogonal matrix.

00:01:03.610 --> 00:01:09.890
So I have orthogonal, diagonal,
orthogonal, or physically,

00:01:09.890 --> 00:01:13.770
rotation, stretching, rotation.

00:01:13.770 --> 00:01:18.015
Now we have seen
three factors for

00:01:18.015 --> 00:01:22.030
a matrix, V, lambda, V inverse.

00:01:22.030 --> 00:01:23.740
What's the difference?

00:01:23.740 --> 00:01:30.750
What's the difference between
this SVD, this, and the V,

00:01:30.750 --> 00:01:35.120
lambda, V transpose,
V inverse, V lambda,

00:01:35.120 --> 00:01:39.610
V inverse for diagonalizing
other matrices?

00:01:39.610 --> 00:01:43.050
So the lambda is diagonal
and the sigma is diagonal,

00:01:43.050 --> 00:01:44.460
but they're different.

00:01:44.460 --> 00:01:49.970
The key point is I now have two
different matrices, not just

00:01:49.970 --> 00:01:53.210
V and V inverse, but
two different matrices.

00:01:53.210 --> 00:01:56.810
But the new great
advantage is they

00:01:56.810 --> 00:02:00.900
are orthogonal
matrices, both of them.

00:02:00.900 --> 00:02:09.229
So by going to-- and I can do it
for rectangular matrices also.

00:02:09.229 --> 00:02:13.230
Eigenvalues really worked
for square matrices.

00:02:13.230 --> 00:02:15.160
Now we really are-- we have two.

00:02:15.160 --> 00:02:19.780
We have an input matrix
and an output matrix.

00:02:19.780 --> 00:02:25.150
In those spaces m and n can
have different dimensions.

00:02:25.150 --> 00:02:29.090
So by allowing two
separate bases,

00:02:29.090 --> 00:02:35.630
we get rectangular matrices,
and we get orthogonal factors

00:02:35.630 --> 00:02:37.240
with, again, a diagonal.

00:02:37.240 --> 00:02:39.120
And this is called--
these numbers

00:02:39.120 --> 00:02:44.560
sigma instead of eigenvalues,
are called singular values.

00:02:44.560 --> 00:02:46.850
So these are the
singular values.

00:02:46.850 --> 00:02:50.470
These are the singular vectors,
the right singular vectors

00:02:50.470 --> 00:02:52.910
and the left singular vectors.

00:02:52.910 --> 00:02:55.430
That's the statement
of the factorization.
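
NOTE
A sketch in symbols of the factorization just stated. The sizes (A is m by n, U is m by m, Sigma is m by n, V is n by n) and the largest-first ordering of the sigmas are the usual conventions and are assumed here; the lecture itself only names the three factors.
```latex
A = U \Sigma V^{T}, \qquad U^{T} U = I_m, \qquad V^{T} V = I_n,
\qquad \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots), \qquad
\sigma_1 \ge \sigma_2 \ge \cdots \ge 0
```
The columns of U are the left singular vectors and the columns of V are the right singular vectors.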

00:02:55.430 --> 00:03:01.820
But we have to think a little
bit, what are those factors?

00:03:01.820 --> 00:03:06.100
What are the-- can we
see why this works?

00:03:06.100 --> 00:03:07.980
So I want that.

00:03:07.980 --> 00:03:12.450
And let me do, as
you see this coming,

00:03:12.450 --> 00:03:18.310
I'll look at A transpose
A. I like A transpose A.

00:03:18.310 --> 00:03:21.500
So A transpose will
be, I transpose this.

00:03:21.500 --> 00:03:27.980
V sigma transpose
U transpose, right?

00:03:27.980 --> 00:03:29.160
That's A transpose.

00:03:29.160 --> 00:03:34.980
Then I multiply by A
U sigma V transpose.

00:03:34.980 --> 00:03:38.410
And what do I have?

00:03:38.410 --> 00:03:40.900
Well, I've got six matrices.

00:03:40.900 --> 00:03:45.140
But U transpose U in
here is the identity,

00:03:45.140 --> 00:03:47.720
because U is an
orthogonal matrix.

00:03:47.720 --> 00:03:52.330
So I really have just the V
on one side, a sigma transpose

00:03:52.330 --> 00:03:59.420
sigma, that'll be diagonal,
and a V transpose on the right.

00:03:59.420 --> 00:04:01.070
This I recognize.

00:04:01.070 --> 00:04:02.120
This I recognize.

00:04:02.120 --> 00:04:07.990
Here is a single V, a diagonal
matrix, a V transpose.
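
NOTE
The six-matrix product described here can be written out as a short derivation. This is a sketch that uses only the lecture's own symbols and the fact that U transpose U is the identity.
```latex
A^{T} A = (U \Sigma V^{T})^{T} (U \Sigma V^{T})
        = V \Sigma^{T} U^{T} U \Sigma V^{T}
        = V (\Sigma^{T} \Sigma) V^{T}
```
The middle factor Sigma transpose Sigma is diagonal, so this has the form of a diagonalization of the symmetric matrix A transpose A.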

00:04:07.990 --> 00:04:10.350
What I'm showing
you here, what we

00:04:10.350 --> 00:04:14.240
reached is the eigenvalue,
the diagonalization,

00:04:14.240 --> 00:04:17.810
the usual eigenvalues
are in here

00:04:17.810 --> 00:04:21.029
and the eigenvectors
are in here.

00:04:21.029 --> 00:04:24.740
But the matrix is A transpose A.

00:04:24.740 --> 00:04:28.610
Once again, A was rectangular
and completely general

00:04:28.610 --> 00:04:32.710
and we couldn't see
perfect results.

00:04:32.710 --> 00:04:34.510
But when we went
to A transpose A,

00:04:34.510 --> 00:04:37.970
that gave us a positive
semidefinite matrix,

00:04:37.970 --> 00:04:39.860
symmetric for sure.

00:04:39.860 --> 00:04:42.930
Its eigenvectors
will be orthogonal.

00:04:42.930 --> 00:04:46.840
That's how I know this V
matrix, the eigenvectors

00:04:46.840 --> 00:04:49.960
for this symmetric
matrix, are orthogonal

00:04:49.960 --> 00:04:53.030
and the eigenvalues
are positive or zero.

00:04:53.030 --> 00:04:55.830
And they're the squares
of the singular values.

00:04:55.830 --> 00:04:58.450
So this is telling
me the lambdas

00:04:58.450 --> 00:05:07.620
for A transpose A are the
sigma squareds for s-- for A.

00:05:07.620 --> 00:05:08.205
For A itself.

00:05:12.860 --> 00:05:14.790
Lambda is the same.

00:05:14.790 --> 00:05:21.210
Lambda for A transpose A is
sigma squared for the matrix A.
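
NOTE
A minimal sketch of this relation for a 2 by 2 matrix, using only the Python standard library. The matrix A below and the helper names gram, eigenvalues_symmetric_2x2 and singular_values_2x2 are hypothetical, chosen for illustration; they do not come from the lecture.
```python
import math
#
def gram(A):
    """Return A^T A for a 2x2 matrix A given as nested lists."""
    (a, b), (c, d) = A
    return [[a * a + c * c, a * b + c * d],
            [a * b + c * d, b * b + d * d]]
#
def eigenvalues_symmetric_2x2(S):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    p, q, r = S[0][0], S[0][1], S[1][1]
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    return mean + radius, mean - radius
#
def singular_values_2x2(A):
    """Singular values of A as square roots of the eigenvalues of A^T A."""
    lam1, lam2 = eigenvalues_symmetric_2x2(gram(A))
    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    return math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))
#
A = [[3.0, 0.0], [4.0, 5.0]]  # arbitrary illustrative matrix, not from the lecture
lambdas = eigenvalues_symmetric_2x2(gram(A))
sigmas = singular_values_2x2(A)
print("eigenvalues of A^T A:", lambdas)
print("singular values of A:", sigmas)
```
For this A the eigenvalues of A transpose A are 45 and 5, so the singular values are the square roots of 45 and 5, and their product equals the absolute value of the determinant, 15.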

00:05:21.210 --> 00:05:25.410
Well that tells me V,
that tells me sigma,

00:05:25.410 --> 00:05:31.850
and U disappeared here because
U transpose U was the identity.

00:05:31.850 --> 00:05:33.250
It just went away.

00:05:33.250 --> 00:05:36.570
How would I get hold of U?

00:05:36.570 --> 00:05:39.330
Well, here's one way to see it.

00:05:39.330 --> 00:05:44.730
I multiply A times A transpose
in that order, in that order.

00:05:44.730 --> 00:05:48.790
So now I have U
sigma V transpose

00:05:48.790 --> 00:05:51.890
times the transpose,
which is the V sigma

00:05:51.890 --> 00:05:55.260
transpose U transpose--
I'm having a lot of fun

00:05:55.260 --> 00:05:56.570
here with transposes.

00:05:56.570 --> 00:06:00.970
But V transpose V is now
the identity in the middle.

00:06:00.970 --> 00:06:03.190
So what do I learn here?

00:06:03.190 --> 00:06:07.440
I learn that U is
the eigenvector

00:06:07.440 --> 00:06:12.330
matrix for AA transpose.

00:06:12.330 --> 00:06:16.670
So these have the
same eigenvalues,

00:06:16.670 --> 00:06:18.610
A times B has the
same eigenvalues

00:06:18.610 --> 00:06:22.680
as B times A in this
case, it comes out here.

00:06:22.680 --> 00:06:23.940
Same eigenvalues.

00:06:23.940 --> 00:06:28.120
This has eigenvectors V,
this has eigenvectors U,

00:06:28.120 --> 00:06:32.580
and those are the V and
the U in the singular value

00:06:32.580 --> 00:06:33.580
decomposition.
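
NOTE
The same computation in the other order, as a sketch in the lecture's symbols. It uses only the fact that V transpose V is the identity.
```latex
A A^{T} = (U \Sigma V^{T}) (U \Sigma V^{T})^{T}
        = U \Sigma V^{T} V \Sigma^{T} U^{T}
        = U (\Sigma \Sigma^{T}) U^{T}
```
The nonzero diagonal entries of Sigma Sigma transpose and of Sigma transpose Sigma are the same numbers, the squared singular values, which is why the two products share their nonzero eigenvalues.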

00:06:33.580 --> 00:06:36.130
Well, I have to
show you an example

00:06:36.130 --> 00:06:38.710
I have to show you an
example and an application,

00:06:38.710 --> 00:06:40.390
and that's it.

00:06:40.390 --> 00:06:41.440
So here's an example.

00:06:44.760 --> 00:06:50.060
Suppose A, I'll make it a
square matrix, 2, 2, minus 1,

00:06:50.060 --> 00:06:53.590
1, not symmetric.

00:06:53.590 --> 00:06:55.090
Certainly not positive definite.

00:06:55.090 --> 00:06:58.360
I wouldn't use the word because
that matrix is not symmetric.

00:06:58.360 --> 00:07:04.010
But it's got an
SVD, three factors.

00:07:08.890 --> 00:07:12.190
And I work them out.

00:07:12.190 --> 00:07:16.960
This is the orthogonal matrix.

00:07:16.960 --> 00:07:22.810
I have to divide by square root
of 5 to make it unit vectors.

00:07:22.810 --> 00:07:25.680
Oops, that's not going to work.

00:07:25.680 --> 00:07:27.250
How about that?

00:07:27.250 --> 00:07:32.140
The two columns are orthogonal
and that's a perfectly good U.

00:07:32.140 --> 00:07:36.360
And then in the sigma, I
got, well that's a-- oh,

00:07:36.360 --> 00:07:37.930
I did want 1 and 1.

00:07:37.930 --> 00:07:40.880
I did want 1 and 1, yes.

00:07:40.880 --> 00:07:46.970
So I have a singular matrix,
determinant 0, singular matrix.

00:07:46.970 --> 00:07:52.300
So my eigenvalues will be 0 and
it turns out square root of 10

00:07:52.300 --> 00:07:57.020
is the other eigenvalue for
that-- other singular value

00:07:57.020 --> 00:07:58.160
for this guy.

00:07:58.160 --> 00:08:01.920
And now I'll put in the
V transpose matrix, which

00:08:01.920 --> 00:08:09.160
is 1, 1, and 1, minus 1 is it?

00:08:09.160 --> 00:08:12.160
And those have length
square root of 2,

00:08:12.160 --> 00:08:13.725
which I have to divide by.

00:08:17.290 --> 00:08:20.500
Well, I didn't do
that so smoothly,

00:08:20.500 --> 00:08:22.880
but the result is clear.

00:08:22.880 --> 00:08:27.010
U, sigma, V transpose,
so here's the sigma.

00:08:29.660 --> 00:08:35.020
And the singular values of this
matrix are square root of 10

00:08:35.020 --> 00:08:39.559
and then 0 because
it's a singular matrix.

00:08:39.559 --> 00:08:45.080
And the eigenvectors, well the
singular vectors of the matrix

00:08:45.080 --> 00:08:50.830
are the left singular vectors
and the right singular vectors.

00:08:50.830 --> 00:08:53.510
That looks good to me.
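
NOTE
A sketch that multiplies the three factors of this example back together, using only the Python standard library. The board is not visible in the transcript, so the entries of U and V transpose below are an assumption reconstructed from the spoken hints (divide by square root of 5, divide by square root of 2, singular values square root of 10 and 0, and the corrected second row 1, 1); the signs are one valid choice. The helper names matmul and transpose are hypothetical.
```python
import math
#
def matmul(X, Y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]
#
def transpose(X):
    """Transpose a matrix stored as nested lists."""
    return [list(col) for col in zip(*X)]
#
s5, s2 = math.sqrt(5.0), math.sqrt(2.0)
# Assumed factors: signs and column order are one valid choice, not read off the board.
U = [[2 / s5, -1 / s5], [1 / s5, 2 / s5]]
Sigma = [[math.sqrt(10.0), 0.0], [0.0, 0.0]]
Vt = [[1 / s2, 1 / s2], [1 / s2, -1 / s2]]
#
A = matmul(matmul(U, Sigma), Vt)   # should rebuild the matrix with rows 2, 2 and 1, 1
UtU = matmul(transpose(U), U)      # should be the identity
VVt = matmul(transpose(Vt), Vt)    # V times V transpose, should be the identity
print("U Sigma V^T =", A)
print("U^T U =", UtU)
print("V V^T =", VVt)
```
Up to rounding, the product is the singular matrix with rows 2, 2 and 1, 1, and both orthogonality checks give the identity.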

00:08:53.510 --> 00:08:56.320
And now the
application to finish.

00:08:56.320 --> 00:09:01.750
A first application is,
well, very important.

00:09:01.750 --> 00:09:04.090
All the time in
this century, we're

00:09:04.090 --> 00:09:08.580
getting matrices
with data in them.

00:09:08.580 --> 00:09:13.150
Maybe in life sciences,
we test a bunch

00:09:13.150 --> 00:09:19.100
of sample people for genes.

00:09:19.100 --> 00:09:23.890
So I have a-- my data
comes somehow-- I

00:09:23.890 --> 00:09:27.460
have a gene expression matrix.

00:09:27.460 --> 00:09:38.320
I have samples, people, people
1, 2, 3 in those columns.

00:09:38.320 --> 00:09:46.200
And I have in the rows,
let me say four rows,

00:09:46.200 --> 00:09:53.008
I have genes, gene expressions.

00:09:53.008 --> 00:09:54.950
That would be completely normal.

00:09:54.950 --> 00:09:58.210
A rectangular matrix,
because the number of people

00:09:58.210 --> 00:10:00.970
and the number of
genes is not the same.

00:10:00.970 --> 00:10:04.980
And in reality, those are
both very, very big numbers,

00:10:04.980 --> 00:10:06.850
so I have a large matrix.

00:10:06.850 --> 00:10:10.370
And out of it, I want to--
and each number in the matrix

00:10:10.370 --> 00:10:17.190
is telling me how much the gene
is expressed by that person.

00:10:17.190 --> 00:10:21.610
We may be searching for
genes causing some disease.

00:10:21.610 --> 00:10:25.470
So we take several people, some
well, some with the disease,

00:10:25.470 --> 00:10:27.220
we check on the genes.

00:10:27.220 --> 00:10:30.850
We get a big matrix and
we look to understand

00:10:30.850 --> 00:10:32.140
something out of it.

00:10:32.140 --> 00:10:33.660
What can we understand?

00:10:33.660 --> 00:10:35.050
What are we looking for?

00:10:35.050 --> 00:10:39.630
We're looking for the
correlation, the connection,

00:10:39.630 --> 00:10:45.590
between some combination
maybe of genes and some--

00:10:45.590 --> 00:10:49.650
we're looking for a gene
people connection here.

00:10:49.650 --> 00:10:53.480
But it's not going to
be person number one.

00:10:53.480 --> 00:10:55.660
We're not looking
for one person.

00:10:55.660 --> 00:10:58.350
We're going to find a
mixture of those people,

00:10:58.350 --> 00:11:05.060
so we're going to have sort of
an eigensample, eigenpeople.

00:11:05.060 --> 00:11:09.130
Oh, that's a terrible--
eigenperson would be better.

00:11:09.130 --> 00:11:12.530
So I think I'm seeing
an eigenperson.

00:11:12.530 --> 00:11:15.260
Let me see where I'm
going to put this.

00:11:15.260 --> 00:11:21.380
So yeah, I think my matrix
would be written-- oh, here

00:11:21.380 --> 00:11:23.840
is the main point.

00:11:23.840 --> 00:11:26.600
That just as I see
in this example,

00:11:26.600 --> 00:11:30.970
it's the first vector
and the first vector

00:11:30.970 --> 00:11:35.290
and the biggest sigma
that are all important.

00:11:35.290 --> 00:11:39.060
Well, in that example the
other sigma was 0, nothing.

00:11:39.060 --> 00:11:40.990
But in this example,
I'll probably

00:11:40.990 --> 00:11:43.270
have three different sigmas.

00:11:43.270 --> 00:11:50.600
But the largest sigma, the
first, the U1 and the V1, it's

00:11:50.600 --> 00:11:52.230
that combination that I want.

00:11:52.230 --> 00:12:03.700
I want U1 sigma 1 V1 transpose,
the first eigenvector

00:12:03.700 --> 00:12:06.680
of A transpose A
and of AA transpose.

00:12:06.680 --> 00:12:09.690
And the first singular,
the biggest singular value,

00:12:09.690 --> 00:12:11.830
that's the information.

00:12:11.830 --> 00:12:17.560
That's the best
sort of put together

00:12:17.560 --> 00:12:21.710
person, eigenperson,
combination of these people

00:12:21.710 --> 00:12:24.190
and the best
combination of genes.

00:12:24.190 --> 00:12:26.660
It has the-- in
statistics, I would

00:12:26.660 --> 00:12:28.460
say the greatest variance.

00:12:28.460 --> 00:12:31.880
In ordinary English, I would
say the most information.

00:12:31.880 --> 00:12:35.620
The most information
in this big matrix

00:12:35.620 --> 00:12:40.620
is in this very special
matrix with only rank one,

00:12:40.620 --> 00:12:43.630
only a single column repeated.

00:12:43.630 --> 00:12:47.030
A single row
repeated, and a number

00:12:47.030 --> 00:12:50.080
sigma 1, the number
that tells me that.

00:12:50.080 --> 00:12:53.120
Because remember,
U is a unit vector.

00:12:53.120 --> 00:12:54.870
V is a unit vector.

00:12:54.870 --> 00:12:57.110
It's that number sigma
1 that's telling me.

00:12:57.110 --> 00:13:02.830
So it's like that unit vector
times that number, key number,

00:13:02.830 --> 00:13:07.170
times that unit
vector, that's this.

00:13:07.170 --> 00:13:12.280
I'm talking here about
principal component analysis.

00:13:12.280 --> 00:13:16.420
I'm looking for the principal
component, this part.

00:13:16.420 --> 00:13:20.100
Principal component analysis.

00:13:20.100 --> 00:13:26.580
A big application in
applied statistics.

00:13:26.580 --> 00:13:32.640
You know, in large
scale drug tests,

00:13:32.640 --> 00:13:38.268
statisticians really have
a central place here.

00:13:38.268 --> 00:13:41.450
And this is on
the research side,

00:13:41.450 --> 00:13:44.870
to find the-- get
the information out

00:13:44.870 --> 00:13:48.150
of a big sample.

00:13:48.150 --> 00:13:52.330
So U1 is sort of a
combination of genes.

00:13:52.330 --> 00:13:54.920
V1 is a combination of people.

00:13:54.920 --> 00:13:57.610
Sigma 1 is the biggest
number I can get.

00:13:57.610 --> 00:14:02.760
So that's PCA, all coming
from the singular value

00:14:02.760 --> 00:14:04.370
decomposition.
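
NOTE
A sketch of pulling out the leading piece, U1 sigma 1 V1 transpose, from a small data matrix. The lecture does not say how to compute it; power iteration on A transpose A is swapped in here as one simple method, using only the Python standard library. The expression numbers are made up, the helper names are hypothetical, and the data is not mean-centered (a step PCA usually includes) because the lecture applies the SVD to the matrix directly.
```python
import math
#
def matvec(M, x):
    """Matrix times vector, with M stored as nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]
#
def transpose(M):
    """Transpose a matrix stored as nested lists."""
    return [list(col) for col in zip(*M)]
#
def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))
#
def leading_singular_triple(A, iterations=200):
    """Estimate (u1, sigma1, v1) by power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] * len(A[0])  # starting guess, assumed not orthogonal to v1
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))  # one multiplication by A^T A
        length = norm(w)
        v = [wi / length for wi in w]
    Av = matvec(A, v)
    sigma = norm(Av)             # sigma1 is the length of A v1
    u = [x / sigma for x in Av]  # u1 = A v1 / sigma1
    return u, sigma, v
#
# Made-up expression levels: 4 genes (rows) by 3 people (columns).
A = [[5.0, 4.8, 1.0],
     [4.9, 5.1, 1.2],
     [1.0, 0.9, 0.2],
     [0.5, 0.7, 3.0]]
u1, sigma1, v1 = leading_singular_triple(A)
# Rank-one piece sigma1 * u1 * v1^T, the principal component in the lecture's sense.
A1 = [[sigma1 * ui * vj for vj in v1] for ui in u1]
print("sigma1 =", sigma1)
print("u1 (one weight per row, i.e. per gene) =", u1)
print("v1 (one weight per column, i.e. per person) =", v1)
```
With genes in the rows and people in the columns, u1 has one weight per gene and v1 has one weight per person. The squared error left after subtracting A1 from A equals the sum of the squared entries of A minus sigma1 squared.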

00:14:04.370 --> 00:14:06.090
Thank you.