WEBVTT
00:00:01.580 --> 00:00:03.920
The following content is
provided under a Creative
00:00:03.920 --> 00:00:05.340
Commons license.
00:00:05.340 --> 00:00:07.550
Your support will help
MIT OpenCourseWare
00:00:07.550 --> 00:00:11.640
continue to offer high quality
educational resources for free.
00:00:11.640 --> 00:00:14.180
To make a donation or to
view additional materials
00:00:14.180 --> 00:00:18.110
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:18.110 --> 00:00:19.130
at ocw.mit.edu.
00:00:23.390 --> 00:00:26.150
JAMES W. SWAN: So this is
going to be our last lecture
00:00:26.150 --> 00:00:27.960
on linear algebra.
00:00:27.960 --> 00:00:30.417
The first three
lectures covered basics.
00:00:30.417 --> 00:00:32.750
The next three lectures, we
talked about different sorts
00:00:32.750 --> 00:00:34.730
of transformations of matrices.
00:00:34.730 --> 00:00:36.742
This final lecture is
the last of those three.
00:00:36.742 --> 00:00:39.200
We're going to talk about
another sort of transformation
00:00:39.200 --> 00:00:42.820
called the singular
value decomposition.
00:00:42.820 --> 00:00:46.610
OK, before we jump in, I'd like
to do the usual recap business.
00:00:46.610 --> 00:00:49.010
I think it's always helpful
to recap or look at things
00:00:49.010 --> 00:00:51.170
from a different perspective.
00:00:51.170 --> 00:00:55.130
Early on, I told you that the
infinite dimensional equivalent
00:00:55.130 --> 00:00:57.740
of vectors would be something
like a function, which
00:00:57.740 --> 00:01:02.510
is a map, a unique map maybe
from a point x to some value
00:01:02.510 --> 00:01:04.150
f of x.
00:01:04.150 --> 00:01:06.080
And there is an
equivalent representation
00:01:06.080 --> 00:01:08.780
of the eigenvalue eigenvector
problem in function space.
00:01:08.780 --> 00:01:11.330
We call these eigenvalues
and eigenfunctions.
00:01:11.330 --> 00:01:15.074
Here's a classic one where
the function is y of x, OK?
00:01:15.074 --> 00:01:17.240
This is the equivalent of
the vector, and equivalent
00:01:17.240 --> 00:01:18.890
of the transformation
or the matrix
00:01:18.890 --> 00:01:21.230
that's this differential
operator this time,
00:01:21.230 --> 00:01:22.850
the second derivative.
00:01:22.850 --> 00:01:26.340
So I take the second derivative
of this particular function,
00:01:26.340 --> 00:01:28.040
and the function is stretched.
00:01:28.040 --> 00:01:30.980
It's multiplied by some
fixed value at all points.
00:01:30.980 --> 00:01:33.830
And it becomes lambda times y.
00:01:33.830 --> 00:01:37.100
And that operator has to be
closed with some boundary
00:01:37.100 --> 00:01:37.920
conditions as well.
00:01:37.920 --> 00:01:39.830
We have to say
what the value of y
00:01:39.830 --> 00:01:43.640
is at the edges
of some boundary.
00:01:43.640 --> 00:01:45.590
So there's a one-to-one
correspondence
00:01:45.590 --> 00:01:48.470
between these things.
00:01:48.470 --> 00:01:54.730
What is the eigenfunction here,
or what are the eigenfunctions?
00:01:54.730 --> 00:01:57.160
And what are the
eigenvalues associated
00:01:57.160 --> 00:01:59.590
with this transformation
or this operator?
00:01:59.590 --> 00:02:01.667
Can you work those
out really quickly?
00:02:01.667 --> 00:02:03.250
You learned this at
some point, right?
00:02:03.250 --> 00:02:05.995
Somebody taught you
differential equations
00:02:05.995 --> 00:02:07.360
and you calculated these things.
00:02:07.360 --> 00:02:08.410
Take about 90 seconds.
00:02:08.410 --> 00:02:09.370
Work with the people around you.
00:02:09.370 --> 00:02:11.286
See if you can come to
a conclusion about what
00:02:11.286 --> 00:02:14.980
the eigenfunction
and eigenvalues are.
00:02:26.699 --> 00:02:27.490
That's enough time.
00:02:27.490 --> 00:02:28.870
You can work on this
on your own later
00:02:28.870 --> 00:02:29.953
if you've run out of time.
00:02:29.953 --> 00:02:30.976
Don't worry about it.
00:02:30.976 --> 00:02:32.600
Does somebody want
to volunteer a guess
00:02:32.600 --> 00:02:36.340
for what the eigenfunctions
are in this case?
00:02:36.340 --> 00:02:38.350
What are they?
00:02:38.350 --> 00:02:39.300
Yeah?
00:02:39.300 --> 00:02:42.846
AUDIENCE: [INAUDIBLE]
00:02:42.846 --> 00:02:44.720
JAMES W. SWAN: OK, so
you chose exponentials.
00:02:44.720 --> 00:02:45.950
That's an interesting choice.
00:02:45.950 --> 00:02:47.880
That's one possible
choice you can make.
00:02:47.880 --> 00:02:48.994
OK, so we could say--
00:02:48.994 --> 00:02:50.660
this is sort of a
classical one that you
00:02:50.660 --> 00:02:55.274
think about when you first
learn differential equations.
00:02:55.274 --> 00:02:56.690
They say, an
equation of this sort
00:02:56.690 --> 00:03:01.544
has solutions that look like
exponentials, and that's true.
00:03:01.544 --> 00:03:03.210
There's another
representation for this,
00:03:03.210 --> 00:03:05.760
which is as trigonometric
functions instead, right?
00:03:09.110 --> 00:03:10.470
Either of those is acceptable.
00:03:10.470 --> 00:03:12.550
[INAUDIBLE] the
trigonometric functions,
00:03:12.550 --> 00:03:17.417
that representation is a
little more useful for us here.
00:03:17.417 --> 00:03:19.750
We know that the boundary
conditions tell us that y of 0
00:03:19.750 --> 00:03:22.430
is supposed to be 0.
00:03:22.430 --> 00:03:26.590
That means that the C1 has to
be 0, because cosine of 0 is 1.
00:03:26.590 --> 00:03:29.550
So C1 is 0 in this case.
00:03:29.550 --> 00:03:33.030
So that fixes one of
these coefficients.
00:03:33.030 --> 00:03:35.730
And now we're left
with a problem, right?
00:03:35.730 --> 00:03:38.310
Our solutions, our
eigenfunctions,
00:03:38.310 --> 00:03:39.220
cannot be unique.
00:03:39.220 --> 00:03:41.760
So we don't get to
specify C2, right?
00:03:41.760 --> 00:03:44.537
Any function that's a
multiple of this sine
00:03:44.537 --> 00:03:45.870
should also be an eigenfunction.
00:03:45.870 --> 00:03:47.640
So instead the other
boundary condition,
00:03:47.640 --> 00:03:51.030
this y of l equals 0, needs
to be used to pin down
00:03:51.030 --> 00:03:52.920
what the eigenvalue is.
00:03:52.920 --> 00:03:54.870
So the second
equation, y of l equals
00:03:54.870 --> 00:03:59.560
0, which implies that the
square root of minus lambda
00:03:59.560 --> 00:04:01.660
has to be equal
to n pi over l, it
00:04:01.660 --> 00:04:04.232
has to be all the
nodes of the sine
00:04:04.232 --> 00:04:05.440
where the sine is equal to 0.
00:04:05.440 --> 00:04:07.565
That's the equivalent of
our secular characteristic
00:04:07.565 --> 00:04:10.990
polynomial that prescribes what
the eigenvalues are associated
00:04:10.990 --> 00:04:13.524
with each of the eigenfunctions.
00:04:13.524 --> 00:04:15.190
So now we know what
the eigenvalues are.
00:04:15.190 --> 00:04:17.320
The eigenvalues are
the set of numbers
00:04:17.320 --> 00:04:21.700
minus n pi over l, all squared.
00:04:21.700 --> 00:04:23.490
There's an infinite
number of eigenvalues.
00:04:23.490 --> 00:04:26.920
It's an infinite dimensional
space that we're in,
00:04:26.920 --> 00:04:29.230
so it's not a big surprise
that it works out that way.
00:04:29.230 --> 00:04:33.100
And the eigenfunctions then are
different scalar multiples
00:04:33.100 --> 00:04:36.220
of sine of the square root
of minus the eigenvalue
00:04:36.220 --> 00:04:37.672
times x.
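[The eigenvalue calculation above can be checked numerically. A small NumPy sketch, not part of the lecture materials; the grid size and interval length are illustrative choices:]

```python
import numpy as np

# Numerical sanity check: discretize y'' = lambda * y with y(0) = y(L) = 0
# on a uniform interior grid and compare the leading eigenvalues
# against the analytical result -(n*pi/L)^2.
L = 1.0
N = 500                                   # number of interior grid points
h = L / (N + 1)

# Standard tridiagonal approximation of the second derivative,
# with the homogeneous Dirichlet boundary conditions baked in.
D2 = (np.diag(-2.0 * np.ones(N))
      + np.diag(np.ones(N - 1), 1)
      + np.diag(np.ones(N - 1), -1)) / h**2

evals = np.sort(np.linalg.eigvalsh(D2))[::-1]     # least negative first
exact = -np.array([(n * np.pi / L) ** 2 for n in range(1, 4)])

print(evals[:3])   # close to [-pi^2, -4 pi^2, -9 pi^2]
```

[As the grid is refined, the discrete eigenvalues converge to the infinite-dimensional ones, which is the one-to-one correspondence being described.]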
00:04:37.672 --> 00:04:39.130
There's a one-to-one
correspondence
00:04:39.130 --> 00:04:41.110
between all the linear
algebra we've done
00:04:41.110 --> 00:04:42.670
and linear
differential equations
00:04:42.670 --> 00:04:44.560
or linear partial
differential equations.
00:04:44.560 --> 00:04:47.030
You can think about these
things in exactly the same way.
00:04:47.030 --> 00:04:53.080
I'm sure in 10.50, you started to
talk about orthogonal functions
00:04:53.080 --> 00:04:55.270
to represent solutions of
differential equations.
00:04:55.270 --> 00:04:57.190
Or if you haven't, you're
going to very soon.
00:04:57.190 --> 00:04:58.690
This is the part of
the course where you get
00:04:58.690 --> 00:05:01.180
to look at the analytical side
of some of these things as
00:05:01.180 --> 00:05:02.230
opposed to the numerical side.
00:05:02.230 --> 00:05:03.771
But there's a
one-to-one relationship
00:05:03.771 --> 00:05:04.730
between those things.
00:05:04.730 --> 00:05:06.855
So if you understand one,
you understand the other,
00:05:06.855 --> 00:05:10.770
and you can come at them
from either perspective.
00:05:10.770 --> 00:05:11.979
This sort of stuff is useful.
00:05:11.979 --> 00:05:14.144
Actually, the classical
chemical engineering example
00:05:14.144 --> 00:05:15.920
comes from quantum
mechanics where
00:05:15.920 --> 00:05:18.780
you think about wave functions
and different energy levels
00:05:18.780 --> 00:05:21.870
corresponding to eigenvalues.
00:05:21.870 --> 00:05:23.220
That's cool.
00:05:23.220 --> 00:05:25.460
Sometimes, I like to think
about a mechanical analog
00:05:25.460 --> 00:05:28.420
to that, which is the
buckling of an elastic column.
00:05:28.420 --> 00:05:29.670
So you should do this at home.
00:05:29.670 --> 00:05:31.800
You should go get a
piece of spaghetti
00:05:31.800 --> 00:05:34.620
and push on the ends of
the piece of the spaghetti.
00:05:34.620 --> 00:05:36.000
And the spaghetti will buckle.
00:05:36.000 --> 00:05:38.300
Eventually it'll break,
but it'll buckle first.
00:05:38.300 --> 00:05:39.520
It'll bend.
00:05:39.520 --> 00:05:41.070
And how does it bend?
00:05:41.070 --> 00:05:45.330
Well, a balance of linear
momentum on this bar
00:05:45.330 --> 00:05:48.780
would tell you
that the deflection
00:05:48.780 --> 00:05:52.420
in the bar at different
points x along the bar
00:05:52.420 --> 00:05:57.600
multiplied by the pressure has
to balance the bending moment
00:05:57.600 --> 00:05:58.690
in the bar itself.
00:05:58.690 --> 00:06:00.840
So this e is some
elastic constant.
00:06:00.840 --> 00:06:02.640
I is the moment of inertia.
00:06:02.640 --> 00:06:04.650
And D squared y dx
squared is something
00:06:04.650 --> 00:06:05.980
like the curvature of the bar.
00:06:05.980 --> 00:06:07.590
So it's the bending
moments of the bar
00:06:07.590 --> 00:06:09.510
that balances the
pressure that's
00:06:09.510 --> 00:06:11.250
being exerted on the bar.
00:06:11.250 --> 00:06:13.800
And sure enough,
this bar will buckle
00:06:13.800 --> 00:06:16.860
when the pressure
applied exceeds
00:06:16.860 --> 00:06:20.330
the first eigenvalue associated
with this differential
00:06:20.330 --> 00:06:20.830
equation.
00:06:20.830 --> 00:06:24.390
We just worked that
eigenvalue out.
00:06:24.390 --> 00:06:29.950
We said that that eigenvalue
had to be the square of
00:06:29.950 --> 00:06:31.407
pi over l.
00:06:31.407 --> 00:06:33.240
And so when the pressure
exceeds the square
00:06:33.240 --> 00:06:36.810
of pi over l
times the elastic modulus,
00:06:36.810 --> 00:06:40.464
this column will bend
and deform continuously
00:06:40.464 --> 00:06:41.880
until it eventually
breaks, right?
00:06:41.880 --> 00:06:44.710
It will undergo this
linear elastic deformation,
00:06:44.710 --> 00:06:48.560
then plastic deformation
later, and it will break.
00:06:48.560 --> 00:06:50.360
The Eiffel Tower,
actually, is one
00:06:50.360 --> 00:06:51.860
of the first
structures in the world
00:06:51.860 --> 00:06:54.270
to utilize this
principle, right?
00:06:54.270 --> 00:06:56.000
It's got very
narrow beams in it.
00:06:56.000 --> 00:06:59.614
The beams are engineered so
that their elastic modulus
00:06:59.614 --> 00:07:01.280
is strong enough that
they won't buckle.
00:07:01.280 --> 00:07:04.060
Gustave Eiffel is one of the
first applied physicists,
00:07:04.060 --> 00:07:07.700
somebody who took the
physics of elastic bars
00:07:07.700 --> 00:07:10.460
and applied them to
building structures that
00:07:10.460 --> 00:07:14.570
weren't big and blocky, but used
a minimal amount of material.
00:07:14.570 --> 00:07:16.880
Cool, right?
00:07:16.880 --> 00:07:18.505
OK, so that's recap.
00:07:21.580 --> 00:07:23.980
Any questions about that?
00:07:23.980 --> 00:07:26.410
You've seen these things before.
00:07:26.410 --> 00:07:29.530
You understood them
well before too maybe?
00:07:29.530 --> 00:07:33.430
Give some thought to this, OK?
00:07:33.430 --> 00:07:37.090
We talked about
eigendecomposition last time
00:07:37.090 --> 00:07:40.050
that, associated with
the square matrix,
00:07:40.050 --> 00:07:44.050
was a particular eigenvalue or
particular set of eigenvalues,
00:07:44.050 --> 00:07:48.250
stretches and corresponding
eigenvectors directions.
00:07:48.250 --> 00:07:51.490
These were special solutions to
the system of linear equations
00:07:51.490 --> 00:07:52.540
based on a matrix.
00:07:52.540 --> 00:07:53.815
It was a square matrix.
00:07:53.815 --> 00:07:55.540
And you might ask,
well, what happens
00:07:55.540 --> 00:07:57.850
if the matrix isn't square?
00:07:57.850 --> 00:08:03.010
What if A is in the space of
real matrices that are n by m,
00:08:03.010 --> 00:08:04.630
where n and m maybe
aren't the same?
00:08:04.630 --> 00:08:06.910
Maybe they are the same,
but maybe they're not.
00:08:06.910 --> 00:08:10.046
And there is an
equivalent decomposition.
00:08:10.046 --> 00:08:11.920
It's called the singular
value decomposition.
00:08:11.920 --> 00:08:16.850
It's like an eigendecomposition
for non-square matrices.
00:08:16.850 --> 00:08:21.150
So rather than writing our
matrix as some w lambda w
00:08:21.150 --> 00:08:24.720
inverse, we're going to
write it as some product
00:08:24.720 --> 00:08:29.490
U times sigma times
V with this dagger.
00:08:29.490 --> 00:08:31.940
The dagger here is
conjugate transpose.
00:08:31.940 --> 00:08:34.400
Transpose the matrix, and
take the complex conjugate
00:08:34.400 --> 00:08:36.382
of all the elements, OK?
00:08:36.382 --> 00:08:39.630
I mentioned last time that
eigenvalues and eigenvectors
00:08:39.630 --> 00:08:42.820
could be complex,
potentially, right?
00:08:42.820 --> 00:08:45.960
So whenever we have that case
where things can be complex,
00:08:45.960 --> 00:08:47.510
usually the
transposition operation
00:08:47.510 --> 00:08:49.510
is replaced with the
conjugate transpose.
00:08:52.224 --> 00:08:53.640
What are these
different matrices?
00:08:53.640 --> 00:08:55.620
Well, let me tell you.
00:08:55.620 --> 00:08:58.140
U is a complex matrix.
00:08:58.140 --> 00:09:01.530
It maps from the
space R n to R n,
00:09:01.530 --> 00:09:04.350
so it's an n by n square matrix.
00:09:04.350 --> 00:09:09.240
Sigma is a real valued matrix,
and it lives in the space of n
00:09:09.240 --> 00:09:10.830
by m matrices.
00:09:10.830 --> 00:09:16.520
V is a square matrix again,
but it has dimensions m by m.
00:09:16.520 --> 00:09:19.770
Remember, A maps
from R M to R N,
00:09:19.770 --> 00:09:22.110
so that's what the
sequence of products says.
00:09:22.110 --> 00:09:23.820
V maps from m to m.
00:09:23.820 --> 00:09:27.320
Sigma maps from m to n.
00:09:27.320 --> 00:09:28.720
U maps from n to n.
00:09:28.720 --> 00:09:31.350
So this maps from
m to n as well.
00:09:34.060 --> 00:09:36.760
Sigma is like
lambda from before.
00:09:36.760 --> 00:09:38.660
It's a diagonal matrix.
00:09:38.660 --> 00:09:40.040
It only has diagonal elements.
00:09:40.040 --> 00:09:41.415
It's just not
square, but it only
00:09:41.415 --> 00:09:45.540
has diagonal elements, all
of which will be positive.
00:09:48.100 --> 00:09:51.730
And then U and V are called
the left and right singular
00:09:51.730 --> 00:09:52.310
vectors.
00:09:52.310 --> 00:09:54.560
And they have special
properties associated with them,
00:09:54.560 --> 00:09:56.290
which I'll show you right now.
00:09:56.290 --> 00:09:59.600
Any questions about
how this decomposition
00:09:59.600 --> 00:10:02.690
is composed or made up?
00:10:02.690 --> 00:10:04.790
It looks just like the
eigendecomposition,
00:10:04.790 --> 00:10:06.360
but it can be applied
to any matrix.
00:10:06.360 --> 00:10:06.860
Yes?
00:10:06.860 --> 00:10:07.910
AUDIENCE: Quick question.
00:10:07.910 --> 00:10:08.170
JAMES W. SWAN: Sure.
00:10:08.170 --> 00:10:09.920
AUDIENCE: Do all
matrices have this thing,
00:10:09.920 --> 00:10:12.440
or is it like the eigenvalues
where some do and some don't?
00:10:12.440 --> 00:10:13.170
JAMES W. SWAN: This
is a great question.
00:10:13.170 --> 00:10:15.253
So all matrices are going
to have a singular value
00:10:15.253 --> 00:10:16.710
decomposition.
00:10:16.710 --> 00:10:18.410
We saw with the
eigenvalue decomposition
00:10:18.410 --> 00:10:22.520
that there could be a case
where the eigenvectors are
00:10:22.520 --> 00:10:25.190
degenerate, and we can't
write that full decomposition.
00:10:25.190 --> 00:10:28.214
All matrices are going to
have this decomposition.
00:10:34.380 --> 00:10:41.690
So for some properties
of this decomposition,
00:10:41.690 --> 00:10:44.330
U and V are what we
call unitary matrices.
00:10:44.330 --> 00:10:45.850
I talked about these before.
00:10:45.850 --> 00:10:49.430
Unitary matrices are ones for
whom, if they're real valued,
00:10:49.430 --> 00:10:51.830
their transpose is
also their inverse.
00:10:51.830 --> 00:10:54.490
If they're complex
valued, their
00:10:54.490 --> 00:10:57.620
conjugate transpose is the
equivalent of their inverse.
00:10:57.620 --> 00:11:01.930
So U times U conjugate
transpose will be identity.
00:11:01.930 --> 00:11:05.800
V times V conjugate
transpose will be identity.
00:11:05.800 --> 00:11:08.510
Unitary matrices also
have the property
00:11:08.510 --> 00:11:12.400
that they impart no
stretch to a matrix--
00:11:12.400 --> 00:11:13.820
or to vectors.
00:11:13.820 --> 00:11:16.460
So their maps don't stretch.
00:11:16.460 --> 00:11:19.104
They're kind of like
rotational matrices, right?
00:11:19.104 --> 00:11:21.520
They change directions, but
they don't stretch things out.
00:11:25.190 --> 00:11:30.890
If I were to take A conjugate
transpose and multiply it by A,
00:11:30.890 --> 00:11:36.680
that would be the same as taking
U sigma V conjugate transpose,
00:11:36.680 --> 00:11:38.780
and multiplying it
by U sigma V. If I
00:11:38.780 --> 00:11:43.010
use the properties of
matrix multiplications
00:11:43.010 --> 00:11:46.340
and complex
conjugate transposes,
00:11:46.340 --> 00:11:48.500
and work out what
this expression is,
00:11:48.500 --> 00:11:52.280
I'll find out that it's
equivalent to V sigma conjugate
00:11:52.280 --> 00:11:56.660
transpose sigma V
conjugate transpose.
00:11:56.660 --> 00:12:03.050
Well this has exactly the same
form as an eigendecomposition.
00:12:03.050 --> 00:12:07.400
An eigendecomposition
of A conjugate transpose times A instead
00:12:07.400 --> 00:12:11.600
of an eigendecomposition
of A. So V
00:12:11.600 --> 00:12:16.160
is the set of eigenvectors
of A conjugate transpose A,
00:12:16.160 --> 00:12:19.700
and sigma squared
are the eigenvalues
00:12:19.700 --> 00:12:24.260
of A conjugate
transpose times A.
00:12:24.260 --> 00:12:26.610
And if I reverse the order
of this multiplication--
00:12:26.610 --> 00:12:30.530
so I do A times A conjugate
transpose-- and work it out,
00:12:30.530 --> 00:12:34.010
that would be U sigma sigma
conjugate transpose U conjugate
transpose. And so U
00:12:34.010 --> 00:12:38.060
are the eigenvectors of
A A conjugate transpose,
00:12:38.060 --> 00:12:42.320
and sigma squared are still the
eigenvalues of A A conjugate
00:12:42.320 --> 00:12:45.070
transpose.
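[This connection between singular values and eigenvalues can be checked directly. A NumPy sketch, not part of the lecture; real matrices are used so the conjugate transpose is just the transpose:]

```python
import numpy as np

# The squared singular values of A are the eigenvalues of A^T A
# (and of A A^T), and the rows of Vh are eigenvectors of A^T A.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # descending

assert np.allclose(eig_AtA, s**2)
# Each right singular vector v satisfies (A^T A) v = sigma^2 v.
for i in range(3):
    v = Vh[i]
    assert np.allclose(A.T @ A @ v, s[i]**2 * v)
```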
00:12:45.070 --> 00:12:47.830
So what are these
things U and V?
00:12:47.830 --> 00:12:51.690
They relate to the
eigenvectors of the product
00:12:51.690 --> 00:12:54.550
of A with itself, this
particular product of A
00:12:54.550 --> 00:12:58.480
with itself, or this particular
product of A with itself.
00:12:58.480 --> 00:13:01.310
Sigma are the singular values.
00:13:01.310 --> 00:13:03.750
And all matrices possess
this sort of a decomposition.
00:13:03.750 --> 00:13:06.530
They all have a set of singular
values and singular vectors.
00:13:06.530 --> 00:13:08.994
These sigmas are called the
singular values of A.
00:13:08.994 --> 00:13:10.160
They have a particular name.
00:13:10.160 --> 00:13:12.110
I'm going to show you how you
can use this decomposition
00:13:12.110 --> 00:13:14.000
to do something you
already know how to do,
00:13:14.000 --> 00:13:15.083
but how to do it formally.
00:13:18.200 --> 00:13:21.738
What are some properties of the
singular value decomposition?
00:13:24.810 --> 00:13:27.510
So if we take a matrix A and
we compute its singular value
00:13:27.510 --> 00:13:29.470
decomposition, this is
how you do it in Matlab.
00:13:33.030 --> 00:13:36.560
We'll find out, for this
matrix, U is identity.
00:13:36.560 --> 00:13:39.720
Sigma is identity with an
extra column pasted on it.
00:13:39.720 --> 00:13:40.920
And V is also identity.
00:13:40.920 --> 00:13:44.130
I mean, this is the simplest
possible four by three matrix
00:13:44.130 --> 00:13:45.517
I can write down.
00:13:45.517 --> 00:13:47.850
You don't have to know how
to compute the singular value
00:13:47.850 --> 00:13:50.160
decomposition, you just
need to know that it
00:13:50.160 --> 00:13:51.420
can be computed in this way.
00:13:51.420 --> 00:13:53.160
You might be able to
guess how to compute it
00:13:53.160 --> 00:13:55.035
based on what we did
with eigenvalues earlier
00:13:55.035 --> 00:13:57.240
and eigenvectors.
00:13:57.240 --> 00:13:59.670
It'll turn out some of
the columns of sigma
00:13:59.670 --> 00:14:00.950
will be non-zero right?
00:14:00.950 --> 00:14:04.390
There are three non-zero
columns of sigma.
00:14:04.390 --> 00:14:07.780
And the columns of
V that correspond
00:14:07.780 --> 00:14:11.800
to the zero columns of sigma
span the null space
00:14:11.800 --> 00:14:16.620
of the matrix A.
00:14:16.620 --> 00:14:20.220
So the first three
columns here are non-zero,
00:14:20.220 --> 00:14:24.221
the first three columns
of V. I'm sorry,
00:14:24.221 --> 00:14:25.970
the first three columns
here are non-zero.
00:14:25.970 --> 00:14:27.590
The last column is 0.
00:14:27.590 --> 00:14:30.440
The columns of sigma
which are 0 correspond
00:14:30.440 --> 00:14:33.920
to a particular column in V,
this last column here, which
00:14:33.920 --> 00:14:36.017
lives in the null
space of A. So you
00:14:36.017 --> 00:14:37.600
can see, if I take
A and I multiply it
00:14:37.600 --> 00:14:41.970
by any vector that's
proportional to 0, 0, 0, 1,
00:14:41.970 --> 00:14:43.340
I'll get back 0.
00:14:43.340 --> 00:14:46.640
So the null space
of A is spanned
00:14:46.640 --> 00:14:48.260
by all these vectors
corresponding
00:14:48.260 --> 00:14:49.850
to the 0 columns of sigma.
00:14:52.980 --> 00:14:54.807
Some of the columns
of sigma are non-zero.
00:14:54.807 --> 00:14:55.890
These first three columns.
00:14:55.890 --> 00:15:02.000
And the columns of U corresponding
to those three columns
00:15:02.000 --> 00:15:04.550
span the range of A. So
if I do the singular value
00:15:04.550 --> 00:15:09.410
decomposition of a matrix,
and I look at U, V, and sigma
00:15:09.410 --> 00:15:11.060
and what they're composed of--
00:15:11.060 --> 00:15:15.050
where sigma is 0 and non-zero,
and the corresponding columns
00:15:15.050 --> 00:15:17.330
or rows of U and V--
then I can figure out
00:15:17.330 --> 00:15:25.110
what vectors span the range
and null space of the matrix A.
00:15:25.110 --> 00:15:26.710
Here's another example.
00:15:26.710 --> 00:15:29.130
So here I have A.
Now instead of being
00:15:29.130 --> 00:15:32.850
three rows by four columns,
it's four rows by three columns.
00:15:32.850 --> 00:15:34.660
And here's the singular
value decomposition
00:15:34.660 --> 00:15:37.140
that comes out of Matlab.
00:15:37.140 --> 00:15:41.580
There are no vectors that
live in the null space of A,
00:15:41.580 --> 00:15:44.160
and there are no 0
columns in sigma.
00:15:44.160 --> 00:15:46.975
There are no corresponding
columns in V.
00:15:46.975 --> 00:15:50.920
There are no vectors
in the null space of A.
00:15:50.920 --> 00:15:55.480
The range of A is spanned
by the columns corresponding
00:15:55.480 --> 00:15:58.420
to the non-zero-- the
columns of U corresponding
00:15:58.420 --> 00:15:59.950
to the non-zero
columns of sigma.
00:15:59.950 --> 00:16:02.890
So it's these three columns
in the first three rows.
00:16:02.890 --> 00:16:07.000
And these first three
columns, clearly they span--
00:16:07.000 --> 00:16:12.040
they describe the same range
as the three columns in A.
00:16:12.040 --> 00:16:13.540
So the singular
value decomposition
00:16:13.540 --> 00:16:17.290
gives us direct access
to the null space
00:16:17.290 --> 00:16:18.810
and the range of a matrix.
00:16:18.810 --> 00:16:21.570
That's handy.
00:16:21.570 --> 00:16:24.170
And it can be used
in various ways.
00:16:24.170 --> 00:16:26.140
So here's one example
where it can be used.
00:16:26.140 --> 00:16:29.830
Here I have a fingerprint.
00:16:29.830 --> 00:16:31.150
It's a bitmap.
00:16:31.150 --> 00:16:33.767
It's a square bit of
data, like a matrix,
00:16:33.767 --> 00:16:35.350
and each of the
elements of the matrix
00:16:35.350 --> 00:16:39.430
takes on a value describing
how dark or light that pixel is.
00:16:39.430 --> 00:16:43.840
Let's say it's grayscale, and
its values are between 0 and 255.
00:16:43.840 --> 00:16:45.820
That's pretty typical.
00:16:45.820 --> 00:16:48.490
So I have this matrix, and
each element of the matrix
00:16:48.490 --> 00:16:50.410
corresponds to a pixel.
00:16:50.410 --> 00:16:53.470
And I do a singular
value decomposition.
00:16:53.470 --> 00:16:55.750
Some of the singular
values, the values of sigma,
00:16:55.750 --> 00:16:57.760
are bigger than others.
00:16:57.760 --> 00:17:00.770
They're all positive, but
some are bigger than others.
00:17:00.770 --> 00:17:02.440
The ones that are
biggest in magnitude
00:17:02.440 --> 00:17:06.770
carry the most information
content about the matrix.
00:17:06.770 --> 00:17:11.470
So we can do data compression by
neglecting singular values that
00:17:11.470 --> 00:17:14.980
are smaller than some
threshold, and also neglecting
00:17:14.980 --> 00:17:17.439
the corresponding
singular vectors.
00:17:17.439 --> 00:17:18.730
And that's what I've done here.
00:17:18.730 --> 00:17:21.400
So here's the original
bitmap of the fingerprint.
00:17:21.400 --> 00:17:23.680
I did the singular
value decomposition,
00:17:23.680 --> 00:17:27.819
and then I retained only the 50
biggest singular values and I
00:17:27.819 --> 00:17:29.830
left all the other
singular values out.
00:17:29.830 --> 00:17:32.290
This bitmap was something
like, I don't know,
00:17:32.290 --> 00:17:34.450
300 pixels by 300
pixels, so there's
00:17:34.450 --> 00:17:37.210
like 300 singular
values, but I got rid
00:17:37.210 --> 00:17:40.480
of 5/6 of the
information content.
00:17:40.480 --> 00:17:43.120
I dropped 5/6 of the
singular vectors,
00:17:43.120 --> 00:17:44.800
and then I
reconstructed the matrix
00:17:44.800 --> 00:17:47.320
from the singular values
and those singular vectors,
00:17:47.320 --> 00:17:48.910
and you get a faithful
representation
00:17:48.910 --> 00:17:51.142
of the original fingerprint.
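[The compression step just described, keep the k largest singular values and drop the rest, can be sketched in NumPy (not part of the lecture; a random matrix stands in for the fingerprint bitmap):]

```python
import numpy as np

# Low-rank "compression": keep only the k largest singular values
# and the corresponding singular vectors. The 2-norm reconstruction
# error equals the first dropped singular value (Eckart-Young).
rng = np.random.default_rng(3)
A = rng.standard_normal((300, 300))       # stand-in for a 300 x 300 bitmap

U, s, Vh = np.linalg.svd(A)
k = 50                                    # retain the 50 biggest values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]  # rank-k reconstruction

err = np.linalg.norm(A - A_k, 2)
assert np.isclose(err, s[k])              # error = (k+1)-th singular value
```

[Storing U[:, :k], s[:k], and Vh[:k] takes roughly 2*300*k + k numbers instead of 300*300, which is the 5/6 saving mentioned above.]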
00:17:51.142 --> 00:17:52.600
So the singular
value decomposition
00:17:52.600 --> 00:17:54.433
says something about
the information content
00:17:54.433 --> 00:17:57.100
in the transformation
that is the matrix, right?
00:17:57.100 --> 00:17:58.720
There are some
transformations that
00:17:58.720 --> 00:18:03.172
are of lower power or
importance than others.
00:18:03.172 --> 00:18:04.630
And the magnitude
of these singular
00:18:04.630 --> 00:18:06.300
values tell you what they are.
00:18:06.300 --> 00:18:09.680
Does that make sense?
00:18:09.680 --> 00:18:10.680
How else can it be used?
00:18:10.680 --> 00:18:14.370
Well, one way it can be
used is finding the least
00:18:14.370 --> 00:18:17.820
square solution
to the equation Ax
00:18:17.820 --> 00:18:22.420
equals b, where A is no
longer a square matrix, OK?
00:18:26.410 --> 00:18:28.030
You've done this
in other contexts
00:18:28.030 --> 00:18:32.860
before where the equations
are overspecified.
00:18:32.860 --> 00:18:36.090
We have more equations than
unknowns, like data fitting.
00:18:36.090 --> 00:18:39.820
You form the normal equations,
you multiply both sides of Ax
00:18:39.820 --> 00:18:45.910
equals b by A transpose, and
then invert A transpose A.
00:18:45.910 --> 00:18:47.890
You might not be
too surprised, then,
00:18:47.890 --> 00:18:50.380
to think that singular value
decomposition could be useful
00:18:50.380 --> 00:18:50.920
here too.
00:18:50.920 --> 00:18:54.790
Since we already saw that the data in
a singular value decomposition
00:18:54.790 --> 00:18:58.630
corresponds to eigenvectors and
eigenvalues of this A transpose
00:18:58.630 --> 00:19:00.130
A, right?
00:19:00.130 --> 00:19:02.950
But there's a way to use
this sort of decomposition
00:19:02.950 --> 00:19:05.140
formally to solve
problems that are
00:19:05.140 --> 00:19:09.730
both overspecified
and underspecified.
00:19:09.730 --> 00:19:14.620
Least squares means find
the vector of solutions
00:19:14.620 --> 00:19:21.860
x that minimizes
this function phi.
00:19:21.860 --> 00:19:24.560
Phi is the length of the
vector given by the difference
00:19:24.560 --> 00:19:26.450
between Ax and b.
00:19:26.450 --> 00:19:29.900
It's one measure of how far
in error our solution x is.
00:19:29.900 --> 00:19:34.220
So let's define the value
x which is least in error.
00:19:34.220 --> 00:19:36.065
This is one definition
of least squares.
00:19:40.490 --> 00:19:43.950
And I know the singular value
decomposition of A. So A
00:19:43.950 --> 00:19:48.685
is U sigma times V conjugate
transpose. So I have U sigma
V conjugate transpose times x.
00:19:48.685 --> 00:19:52.140
I can factor out U, and I've
got a factor of U transpose,
00:19:52.140 --> 00:19:54.960
or U conjugate transpose
multiplying by b.
00:19:54.960 --> 00:20:00.000
So Ax minus b is the same as
U times the quantity sigma V
00:20:00.000 --> 00:20:05.240
conjugate transpose x minus
U conjugate transpose b.
00:20:05.240 --> 00:20:08.480
We want to know the x
that minimizes this phi.
00:20:08.480 --> 00:20:10.130
It's an optimization problem.
00:20:10.130 --> 00:20:12.990
We'll talk in great detail about
these sorts of problems later.
00:20:12.990 --> 00:20:15.350
This one is so easy to do,
we can just work it out
00:20:15.350 --> 00:20:18.152
in a couple lines of text.
00:20:18.152 --> 00:20:19.610
We'll define a new
set of unknowns,
00:20:19.610 --> 00:20:25.130
y, which is V transpose times
x, and a new right-hand side
00:20:25.130 --> 00:20:29.270
for a system of equations p,
which is U transpose times b.
00:20:29.270 --> 00:20:31.220
And then we can rewrite
our function phi
00:20:31.220 --> 00:20:32.600
that we're trying to minimize.
00:20:32.600 --> 00:20:37.720
So phi then becomes
U sigma y minus p.
00:20:37.720 --> 00:20:39.080
U is a unitary matrix.
00:20:39.080 --> 00:20:45.440
It imparts no stretch in the two
norms, so this sigma y minus p
00:20:45.440 --> 00:20:49.850
doesn't get elongated by
multiplication with U.
00:20:49.850 --> 00:20:52.280
So its length,
the length of this,
00:20:52.280 --> 00:20:55.660
is the same as the length
of sigma y minus p.
00:20:55.660 --> 00:20:56.915
You can prove this.
00:20:56.915 --> 00:20:58.540
It's not very difficult
to show at all.
00:20:58.540 --> 00:21:02.430
You use the definition of
the two norm to prove it.
00:21:02.430 --> 00:21:10.150
So phi is minimized by y's
which make this norm smallest,
00:21:10.150 --> 00:21:11.170
make it closest to 0.
00:21:14.240 --> 00:21:17.060
Let r be the number
of non-zero singular
00:21:17.060 --> 00:21:19.190
values, the number
of those sigmas
00:21:19.190 --> 00:21:22.440
which are not equal to 0.
00:21:22.440 --> 00:21:24.360
That's also the rank of A.
00:21:24.360 --> 00:21:29.330
Then I can rewrite
phi as the sum from i
00:21:29.330 --> 00:21:36.060
equals 1 to r of sigma i i
times y i minus p i squared.
00:21:36.060 --> 00:21:40.160
That's parts of this length,
this Euclidean length,
00:21:40.160 --> 00:21:42.320
for which sigma is non-zero.
00:21:42.320 --> 00:21:46.730
Plus the sum from r
plus 1 to n, the sum
00:21:46.730 --> 00:21:50.240
over the rest of
the values of p,
00:21:50.240 --> 00:21:52.590
for which the
corresponding sigmas are 0.
00:21:59.710 --> 00:22:02.830
I want to minimize
phi, and the only thing
00:22:02.830 --> 00:22:05.337
that I can change to
minimize it is what?
00:22:08.139 --> 00:22:10.530
What am I free to
pick in this equation
00:22:10.530 --> 00:22:13.930
in order to make phi
as small as possible?
00:22:13.930 --> 00:22:14.570
Yeah?
00:22:14.570 --> 00:22:15.070
AUDIENCE: y.
00:22:15.070 --> 00:22:17.290
JAMES W. SWAN: y,
so I need to choose
00:22:17.290 --> 00:22:21.110
the y's that make this
phi as small as possible.
00:22:21.110 --> 00:22:22.955
What value should I
choose for the y's?
00:22:26.678 --> 00:22:27.666
What do you think?
00:22:30.630 --> 00:22:34.380
AUDIENCE: [INAUDIBLE]
00:22:34.380 --> 00:22:35.630
JAMES W. SWAN: Perfect, right?
00:22:35.630 --> 00:22:40.000
Choose y equals p
i over sigma i i.
00:22:40.000 --> 00:22:42.610
Right, y i is p
i over sigma i i.
00:22:42.610 --> 00:22:46.150
Then all of these terms are 0.
00:22:46.150 --> 00:22:48.490
I can't make this sum
any smaller than that.
00:22:48.490 --> 00:22:53.980
That fixes the value
of y i up to r.
00:22:53.980 --> 00:22:57.280
I can't do anything about
this left over bit here.
00:22:57.280 --> 00:23:00.340
There's no choice of
y that's going to make
00:23:00.340 --> 00:23:01.657
this part any smaller.
00:23:01.657 --> 00:23:02.490
It's just left over.
00:23:02.490 --> 00:23:04.275
It's some remainder that
we can't make any smaller
00:23:04.275 --> 00:23:05.410
or minimize any further.
00:23:05.410 --> 00:23:08.125
There isn't an exact solution
to this problem, in many cases.
00:23:10.690 --> 00:23:17.190
But one way this could be
0 is if r is equal to n.
00:23:17.190 --> 00:23:19.570
Then there are no leftover
unspecified terms,
00:23:19.570 --> 00:23:22.375
and then this y i
equals p i over sigma i
00:23:22.375 --> 00:23:23.920
is the exact solution
to the problem.
00:23:28.280 --> 00:23:29.690
So this is what you told me.
00:23:29.690 --> 00:23:36.230
Choose y i equal to p i over sigma i i
for i from 1 up
00:23:36.230 --> 00:23:38.650
to r.
00:23:38.650 --> 00:23:40.710
There are going to
be values of y i
00:23:40.710 --> 00:23:46.070
that go between r plus 1 and
m, because A was a matrix that
00:23:46.070 --> 00:23:47.990
mapped from m to n, right?
00:23:47.990 --> 00:23:52.160
So I have extra values of y that
could be specified potentially.
00:23:52.160 --> 00:23:56.279
If that's true, if r
plus 1 is smaller than m,
00:23:56.279 --> 00:23:58.570
then there's some components
of y that I don't get to--
00:23:58.570 --> 00:23:59.770
I can't specify, right?
00:23:59.770 --> 00:24:02.930
My system of equations is
somehow underdetermined.
00:24:02.930 --> 00:24:05.540
I need some external
information to show me what
00:24:05.540 --> 00:24:07.740
values to pick for those y i.
00:24:07.740 --> 00:24:08.900
I don't know.
00:24:08.900 --> 00:24:10.400
I can't use them.
00:24:10.400 --> 00:24:13.420
Sometimes people just
set y i equal to 0.
00:24:13.420 --> 00:24:16.220
That's sort of silly,
but that's what's done.
00:24:16.220 --> 00:24:20.150
It's called the minimum
norm least square solution.
00:24:20.150 --> 00:24:23.090
y has minimum length, when you
set all these other components
00:24:23.090 --> 00:24:24.050
to 0.
00:24:24.050 --> 00:24:27.140
But the truth is, we can't
specify those components,
00:24:27.140 --> 00:24:27.716
right?
00:24:27.716 --> 00:24:29.090
We need some
external information
00:24:29.090 --> 00:24:31.850
in order to specify them.
00:24:31.850 --> 00:24:34.680
Once we know y, we
can find x going back
00:24:34.680 --> 00:24:36.260
to our definition of what y is.
00:24:36.260 --> 00:24:38.840
So I multiply this equation
by V on both sides,
00:24:38.840 --> 00:24:42.350
and I'll get V y equals x.
00:24:42.350 --> 00:24:44.180
So I can find my
least square solution
00:24:44.180 --> 00:24:47.596
to the problem from the
singular value decomposition.
00:24:47.596 --> 00:24:49.220
So I can find the
least square solution
00:24:49.220 --> 00:24:52.850
to both overdetermined and
underdetermined problems using
00:24:52.850 --> 00:24:55.694
singular value decomposition.
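The recipe just described, divide by the non-zero singular values and zero out the free components, can be sketched in a few lines of Python. This assumes we already have the singular values and p = U transpose b in hand; the function name and the example numbers are invented for illustration:

```python
def min_norm_solution(sigma, p, m, tol=1e-12):
    """Given singular values sigma (possibly containing zeros) and
    p = U^T b, choose y_i = p_i / sigma_i where sigma_i is non-zero,
    and y_i = 0 for the unconstrained components (the minimum-norm
    least-squares choice). x is then recovered as V y."""
    y = []
    for i in range(m):
        if i < len(sigma) and abs(sigma[i]) > tol:
            y.append(p[i] / sigma[i])
        else:
            y.append(0.0)  # free component: set to zero (minimum norm)
    return y

# Example: rank r = 2, but y lives in a 3-dimensional space (m = 3).
sigma = [2.0, 0.5, 0.0]
p = [4.0, 1.0, 0.3]  # the last term is leftover residual we cannot remove
print(min_norm_solution(sigma, p, 3))  # [2.0, 2.0, 0.0]
```

Setting the free components to zero is exactly the "silly but standard" convention mentioned above; with extra information you would pick those components differently.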
00:24:55.694 --> 00:24:57.110
It inherits all
the properties you
00:24:57.110 --> 00:24:58.670
know of solving the
normal equations,
00:24:58.670 --> 00:25:01.430
multiplying by A transpose
the entire equation,
00:25:01.430 --> 00:25:04.722
and solving for a least
square solution that way.
00:25:04.722 --> 00:25:06.680
But that's only good for
overdetermined systems
00:25:06.680 --> 00:25:07.250
of equations.
00:25:07.250 --> 00:25:09.041
This can work for
underdetermined equations
00:25:09.041 --> 00:25:09.770
as well.
00:25:09.770 --> 00:25:12.450
And maybe we do have
extraneous information
00:25:12.450 --> 00:25:15.080
that lets us specify these
other components somehow.
00:25:15.080 --> 00:25:17.420
Maybe we do a
separate optimization
00:25:17.420 --> 00:25:20.110
that chooses from all
possible solutions
00:25:20.110 --> 00:25:23.070
where these y i's are free,
and picks the best one subject
00:25:23.070 --> 00:25:25.890
to some other constraint.
00:25:25.890 --> 00:25:27.550
Does it make sense?
00:25:27.550 --> 00:25:29.010
OK, that's the
last decomposition
00:25:29.010 --> 00:25:32.070
we're going to talk about.
00:25:32.070 --> 00:25:34.890
It's as expensive to compute
the singular value decomposition
00:25:34.890 --> 00:25:36.810
as it is to solve a
system of equations.
00:25:36.810 --> 00:25:38.310
You might have
guessed that it's got
00:25:38.310 --> 00:25:40.470
an order N cubed flavor to it.
00:25:40.470 --> 00:25:42.660
It's kind of inescapable
that we run up
00:25:42.660 --> 00:25:45.920
against those
computational difficulties,
00:25:45.920 --> 00:25:48.530
order N cubed
computational complexity.
00:25:48.530 --> 00:25:50.730
And there are many problems
of practical interest,
00:25:50.730 --> 00:25:54.180
particularly solutions of
PDEs, for which that's not
00:25:54.180 --> 00:25:56.000
going to cut it.
00:25:56.000 --> 00:25:59.970
Where you couldn't solve
the problem with that sort
00:25:59.970 --> 00:26:01.330
of scaling in time.
00:26:01.330 --> 00:26:04.710
You couldn't compute the
Gaussian elimination,
00:26:04.710 --> 00:26:06.540
or the singular
value decomposition,
00:26:06.540 --> 00:26:08.970
or an eigenvalue decomposition.
00:26:08.970 --> 00:26:09.960
It won't work.
00:26:09.960 --> 00:26:14.520
And in those cases, we appeal
not to exact solution methods,
00:26:14.520 --> 00:26:17.010
but approximate
solution methods.
00:26:17.010 --> 00:26:20.120
So instead of trying to
get an exact solution,
00:26:20.120 --> 00:26:22.080
we'll try to formulate
one that's good enough.
00:26:22.080 --> 00:26:24.390
We already know the computer
introduces numerical error
00:26:24.390 --> 00:26:26.070
anyways.
00:26:26.070 --> 00:26:28.440
Maybe we don't need machine
precision in our solution
00:26:28.440 --> 00:26:30.600
or something close to machine
precision in our solution.
00:26:30.600 --> 00:26:32.266
Maybe we're solving
an engineering problem,
00:26:32.266 --> 00:26:34.230
and we're willing to
accept relative errors
00:26:34.230 --> 00:26:37.770
on the order of 10 to the
minus 3 or 10 to the minus 5,
00:26:37.770 --> 00:26:41.550
some specified tolerance
that we apply to the problem.
00:26:41.550 --> 00:26:44.250
And in those circumstances,
we use iterative methods
00:26:44.250 --> 00:26:46.095
to solve systems of
equations instead
00:26:46.095 --> 00:26:49.020
of exact methods,
elimination methods,
00:26:49.020 --> 00:26:50.640
or matrix
decomposition methods.
00:26:53.830 --> 00:26:56.560
These algorithms are all
based on iterative refinement
00:26:56.560 --> 00:26:57.580
of an initial guess.
00:26:57.580 --> 00:26:59.260
So if we have some
system of equations
00:26:59.260 --> 00:27:02.350
we're trying to
solve, Ax equals b,
00:27:02.350 --> 00:27:05.410
we'll formulate some
linear map, right?
00:27:05.410 --> 00:27:09.910
xi plus 1 will be some
matrix C times x i
00:27:09.910 --> 00:27:13.540
plus some little vector c
where x i is my last best
00:27:13.540 --> 00:27:17.080
guess for the solution to
this problem, and x i plus 1
00:27:17.080 --> 00:27:20.320
is my next best guess for
the solution to this problem.
00:27:20.320 --> 00:27:24.700
And I'm hoping, as I apply
this map more and more times,
00:27:24.700 --> 00:27:27.010
I'm creeping closer
to the exact solution
00:27:27.010 --> 00:27:29.590
to the original
system of equations.
00:27:29.590 --> 00:27:33.610
The map will converge
when x i plus 1
00:27:33.610 --> 00:27:36.190
approaches x i, when the
map isn't making any changes
00:27:36.190 --> 00:27:39.250
to the vector anymore.
00:27:39.250 --> 00:27:48.490
And the converged value will
be a solution when x i--
00:27:48.490 --> 00:27:51.910
which is equal to I minus
C, quantity inverse, times c,
00:27:51.910 --> 00:27:54.370
if I replace x i plus
1 with x i up here,
00:27:54.370 --> 00:27:58.120
so I say that my map has
converged-- when this value is
00:27:58.120 --> 00:28:00.160
equivalent to A
inverse times b, when
00:28:00.160 --> 00:28:03.880
it's a solution to the
original problem, right?
00:28:03.880 --> 00:28:05.480
So my map may converge.
00:28:05.480 --> 00:28:08.320
It may not converge to a
solution of the problem I like,
00:28:08.320 --> 00:28:09.790
but if it satisfies
this condition,
00:28:09.790 --> 00:28:12.730
then it has converged to
a solution of the problem
00:28:12.730 --> 00:28:14.270
that I like as well.
00:28:14.270 --> 00:28:18.400
And so it's all about using this
C here and this little c here
00:28:18.400 --> 00:28:22.000
so that this map converges
to a solution of the problem
00:28:22.000 --> 00:28:22.720
I'm after.
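A linear iterative map like the one above takes only a few lines of Python to write down. The matrix C below is a made-up contraction (its norm is below 1), so the iterates approach the fixed point x = (I minus C) inverse times c:

```python
def iterate_map(C, c, x0, steps=50):
    """Repeatedly apply the linear map x_{i+1} = C x_i + c."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(C[i][j] * x[j] for j in range(len(x))) + c[i]
             for i in range(len(x))]
    return x

# Toy map with norm of C below 1, so iterates contract to the fixed point.
C = [[0.5, 0.0],
     [0.0, 0.25]]
c = [1.0, 3.0]
x = iterate_map(C, c, [0.0, 0.0])
print(x)  # close to [2.0, 4.0]: 2 = 0.5*2 + 1 and 4 = 0.25*4 + 3
```

Whether such a map converges, and whether its fixed point solves the system you care about, depends entirely on how C and c are built from A and b, which is what the splitting schemes below are for.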
00:28:22.720 --> 00:28:25.000
And there are lots of
schemes for doing this.
00:28:25.000 --> 00:28:26.680
Some of them are kind of ad hoc.
00:28:26.680 --> 00:28:28.230
I'm going to show
you one right now.
00:28:28.230 --> 00:28:30.490
And then when we
do optimization,
00:28:30.490 --> 00:28:32.230
we'll talk about
a more formal way
00:28:32.230 --> 00:28:35.210
of doing this for
which you can guarantee
00:28:35.210 --> 00:28:37.937
very rapid convergence
to a solution.
00:28:37.937 --> 00:28:40.020
So here's a system of
equations I'd like to solve.
00:28:40.020 --> 00:28:41.510
It's not a very big one.
00:28:41.510 --> 00:28:44.050
It doesn't really make sense
to solve this one iteratively,
00:28:44.050 --> 00:28:45.940
but it's a nice illustration.
00:28:45.940 --> 00:28:49.570
One way to go about
formulating this map
00:28:49.570 --> 00:28:53.290
is to split this
matrix into two parts.
00:28:53.290 --> 00:28:57.160
So I'll split it into a diagonal
part and an off diagonal part.
00:28:57.160 --> 00:29:00.070
So I haven't changed the
problem at all by doing that.
00:29:00.070 --> 00:29:03.740
And then I'm going to
rename this x x i plus 1,
00:29:03.740 --> 00:29:07.060
and I'm going to
rename this x x i.
00:29:07.060 --> 00:29:09.454
And then move this
matrix vector product
00:29:09.454 --> 00:29:10.870
to the other side
of the equation.
00:29:10.870 --> 00:29:12.376
And here's my map.
00:29:12.376 --> 00:29:13.750
Of course, this
matrix multiplied
00:29:13.750 --> 00:29:16.210
doesn't make any-- it's
not useful to write it out
00:29:16.210 --> 00:29:16.780
explicitly.
00:29:16.780 --> 00:29:19.070
This is just identity.
00:29:19.070 --> 00:29:20.840
So I can drop this entirely.
00:29:20.840 --> 00:29:22.090
This is just x i plus 1.
00:29:22.090 --> 00:29:23.320
So here's my map.
00:29:23.320 --> 00:29:27.430
Take an initial guess,
multiply it by this matrix,
00:29:27.430 --> 00:29:31.600
add the vector 1, 0, and repeat
over and over and over again.
00:29:31.600 --> 00:29:33.130
Hopefully-- we
don't really know--
00:29:33.130 --> 00:29:34.671
but hopefully, it's
going to converge
00:29:34.671 --> 00:29:38.502
to a solution of the
original linear equations.
00:29:38.502 --> 00:29:39.710
I didn't make up that method.
00:29:39.710 --> 00:29:42.490
That's a method called
Jacobi iteration.
00:29:42.490 --> 00:29:46.280
And the strategy is to split
the matrix A into two parts--
00:29:46.280 --> 00:29:49.010
a sum of its diagonal
elements, and its off-diagonal
00:29:49.010 --> 00:29:51.050
elements--
00:29:51.050 --> 00:29:54.410
and rewrite the original
equations as an iterative map.
00:29:54.410 --> 00:30:00.650
So D times x i plus 1 is equal
to minus r times x i plus b.
00:30:00.650 --> 00:30:07.600
Or x i plus 1 is D inverse
times minus r x i plus b.
00:30:07.600 --> 00:30:11.100
If the equations converge,
then D plus r times x i
00:30:11.100 --> 00:30:13.780
has to be equal to b, and we
will have found a solution.
00:30:13.780 --> 00:30:15.040
If it converges, right?
00:30:15.040 --> 00:30:18.729
If these iterations
approach a steady value.
00:30:18.729 --> 00:30:20.770
If they don't change from
iteration to iteration.
00:30:21.940 --> 00:30:23.796
The nice thing about
the Jacobi method is it
00:30:23.796 --> 00:30:25.170
turns the hard
problem, the order
00:30:25.170 --> 00:30:29.920
N cubed problem of
computing A inverse b,
00:30:29.920 --> 00:30:31.750
into a succession
of easy problems,
00:30:31.750 --> 00:30:38.710
D inverse times some vector c.
How many calculations does it
00:30:38.710 --> 00:30:40.150
take to compute that D inverse?
00:30:44.710 --> 00:30:46.454
N, that's right, order N.
00:30:46.454 --> 00:30:47.620
It's just a diagonal matrix.
00:30:47.620 --> 00:30:49.880
I invert each of its diagonal
elements, and I'm done.
00:30:49.880 --> 00:30:53.790
So I went from order N cubed,
which was going to be hard,
00:30:53.790 --> 00:30:57.320
into a succession
of order N problems.
00:30:57.320 --> 00:30:59.730
So as long as it doesn't
take me order N squared
00:30:59.730 --> 00:31:02.879
iterations to get to the
solution that I want,
00:31:02.879 --> 00:31:03.670
I'm going to be OK.
00:31:03.670 --> 00:31:05.378
This is going to be
a viable way to solve
00:31:05.378 --> 00:31:07.690
this problem faster than
finding the exact solution.
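The Jacobi splitting just described can be sketched in Python. The 3-by-3 diagonally dominant system below is invented purely for illustration, and the dense sweep here still costs order N squared per iteration; only the D inverse step is order N:

```python
def jacobi(A, b, x0, iters=100):
    """Jacobi iteration: split A = D + R (diagonal plus off-diagonal)
    and iterate x_{i+1} = D^{-1} (b - R x_i). Inverting D is O(N),
    since it just means dividing by each diagonal entry."""
    n = len(b)
    x = list(x0)
    for _ in range(iters):
        x_new = []
        for i in range(n):
            # off-diagonal part: R x_i
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - s) / A[i][i])
        x = x_new
    return x

# A made-up diagonally dominant system, so Jacobi is guaranteed to converge.
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
print(jacobi(A, b, [0.0, 0.0, 0.0]))  # converges to [1.0, 1.0, 1.0]
```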
00:31:13.240 --> 00:31:15.640
How do you know
that it converges?
00:31:15.640 --> 00:31:17.020
That's the question.
00:31:17.020 --> 00:31:20.290
Is this thing actually
going to converge or not,
00:31:20.290 --> 00:31:23.980
or are these iterations just
going to run on and on forever?
00:31:23.980 --> 00:31:26.620
Well, one way to check whether
it will converge or not
00:31:26.620 --> 00:31:31.220
is to go back up to this
equation here, and substitute b
00:31:31.220 --> 00:31:34.890
equals Ax, where x is the
exact solution to the problem.
00:31:37.420 --> 00:31:40.810
And you can transform,
then, this equation into one
00:31:40.810 --> 00:31:45.640
that looks like x i plus
1 minus x equal to minus D
00:31:45.640 --> 00:31:49.930
inverse times r x i minus x.
00:31:49.930 --> 00:31:52.780
And if I take the
norm of both sides
00:31:52.780 --> 00:31:55.270
and I apply our
norm inequality--
00:31:55.270 --> 00:31:58.000
where the norm of a
matrix-vector product
00:31:58.000 --> 00:32:01.460
is smaller than the product
of the norms of the matrix
00:32:01.460 --> 00:32:02.860
and the vector--
00:32:02.860 --> 00:32:06.070
then I can get a
ratio like this.
00:32:06.070 --> 00:32:10.060
That the absolute error
in iteration i plus 1
00:32:10.060 --> 00:32:13.090
divided by the absolute
error in iteration i
00:32:13.090 --> 00:32:17.410
is smaller than the
norm of this matrix.
00:32:17.410 --> 00:32:21.810
So if I'm converging,
then what I expect
00:32:21.810 --> 00:32:25.320
is this ratio should
be smaller than 1.
00:32:25.320 --> 00:32:27.240
The error in my
next approximation
00:32:27.240 --> 00:32:30.080
should be smaller than the error
in my current approximation.
00:32:30.080 --> 00:32:31.450
That makes sense?
00:32:31.450 --> 00:32:35.100
So that means that I would hope
that the norm of this matrix
00:32:35.100 --> 00:32:36.690
is also smaller than 1.
00:32:36.690 --> 00:32:40.290
If it is, then I'm going to
be guaranteed to converge.
00:32:40.290 --> 00:32:42.912
So for a particular
coefficient matrix,
00:32:42.912 --> 00:32:45.120
for a system of linear
equations I'm trying to solve,
00:32:45.120 --> 00:32:47.010
I may be able to find--
00:32:47.010 --> 00:32:50.430
I may find that this is true.
00:32:50.430 --> 00:32:51.960
And then I can
apply this method,
00:32:51.960 --> 00:32:54.980
and I'll converge to a solution.
00:32:54.980 --> 00:32:58.100
We call this sort of
convergence linear.
00:32:58.100 --> 00:32:59.990
Whatever this number
is, it tells me
00:32:59.990 --> 00:33:03.890
the fraction by which the
error is reduced from iteration
00:33:03.890 --> 00:33:04.940
to iteration.
00:33:04.940 --> 00:33:07.810
So suppose this is 1/10.
00:33:07.810 --> 00:33:11.510
Then the absolute error is going
to be reduced by a factor of 10
00:33:11.510 --> 00:33:14.300
in each iteration.
00:33:14.300 --> 00:33:15.720
It's not going to
be 1/10 usually.
00:33:15.720 --> 00:33:17.095
It's going to be
something that's
00:33:17.095 --> 00:33:20.201
a little bit bigger than that
typically, but that's the idea.
00:33:20.201 --> 00:33:22.700
You can show-- I would encourage
you to try to work this out
00:33:22.700 --> 00:33:25.560
on your own-- but you can show
that the infinity norm of this
00:33:25.560 --> 00:33:26.060
product--
00:33:29.060 --> 00:33:33.050
infinity norm of this
product is equal to this.
00:33:33.050 --> 00:33:36.040
And if I ask that the
infinity norm of this product
00:33:36.040 --> 00:33:38.540
be smaller than 1,
that's guaranteed
00:33:38.540 --> 00:33:41.450
when the diagonal values of
the matrix in absolute value
00:33:41.450 --> 00:33:43.850
are bigger than
the sum of the off
00:33:43.850 --> 00:33:46.670
diagonal values in a particular
row or a particular column.
00:33:46.670 --> 00:33:49.187
And that kind of matrix we
call diagonally dominant.
00:33:49.187 --> 00:33:51.770
The diagonal values are bigger
than the sum in absolute value
00:33:51.770 --> 00:33:53.570
of the off diagonal pieces.
00:33:53.570 --> 00:33:57.290
So diagonally dominant matrices,
which come up quite often,
00:33:57.290 --> 00:33:59.240
can be-- those linear
equations based
00:33:59.240 --> 00:34:02.660
on those matrices can be solved
with reasonable efficiency using
00:34:02.660 --> 00:34:04.262
the Jacobi method.
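That diagonal-dominance test is simple to automate. Here is a sketch in Python of the row version of the check (the function name is made up); when it passes, the infinity norm of D inverse times R is below 1 and Jacobi is guaranteed to converge:

```python
def is_diagonally_dominant(A):
    """Row diagonal dominance: |a_ii| > sum of |a_ij| over j != i
    for every row i. This guarantees the infinity norm of D^{-1} R
    is below 1, so Jacobi iteration converges."""
    for i, row in enumerate(A):
        off_diag = sum(abs(v) for j, v in enumerate(row) if j != i)
        if abs(row[i]) <= off_diag:
            return False
    return True

print(is_diagonally_dominant([[4, 1, 0], [1, 4, 1], [0, 1, 4]]))  # True
print(is_diagonally_dominant([[1, 2], [3, 1]]))                   # False
```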
00:34:04.262 --> 00:34:05.720
There are better
methods to choose.
00:34:05.720 --> 00:34:07.350
I'll show you one in a second.
00:34:07.350 --> 00:34:08.900
But you can guarantee
that this is
00:34:08.900 --> 00:34:11.360
going to converge to a solution,
and that the solution will
00:34:11.360 --> 00:34:13.219
be the right solution to
the linear equations you
00:34:13.219 --> 00:34:14.094
were trying to solve.
00:34:19.440 --> 00:34:23.010
So if the goal is just to
turn hard problems into easier
00:34:23.010 --> 00:34:25.500
to solve problems, then
there are other natural ways
00:34:25.500 --> 00:34:27.540
to want to split a matrix.
00:34:27.540 --> 00:34:32.100
So maybe you want to split into
a lower triangular part, which
00:34:32.100 --> 00:34:34.829
contains the diagonal
elements of A,
00:34:34.829 --> 00:34:36.449
and an upper
triangular part which
00:34:36.449 --> 00:34:41.130
has no diagonal elements of A.
We just split this thing apart.
00:34:41.130 --> 00:34:43.370
And then we could rewrite
our system of equations
00:34:43.370 --> 00:34:45.750
as an iterative map
like this, L times x i
00:34:45.750 --> 00:34:50.429
plus 1 is minus U
times x i plus b.
00:34:50.429 --> 00:34:53.550
All I have to do is invert
L to find my next iteration.
00:34:53.550 --> 00:34:55.590
And how expensive
computationally
00:34:55.590 --> 00:35:00.360
is it to solve a system of
equations which is triangular?
00:35:00.360 --> 00:35:02.630
This is a process we
call forward substitution.
00:35:02.630 --> 00:35:03.540
Its order--
00:35:03.540 --> 00:35:04.470
AUDIENCE: N squared.
00:35:04.470 --> 00:35:05.820
JAMES W. SWAN: --N squared.
00:35:05.820 --> 00:35:08.160
So we still beat N cubed.
00:35:08.160 --> 00:35:11.640
One would hope that it doesn't
require too many iterations
00:35:11.640 --> 00:35:12.420
to do this.
00:35:12.420 --> 00:35:15.160
But in principle, we can do
this order N squared operations
00:35:15.160 --> 00:35:16.500
many times.
00:35:16.500 --> 00:35:18.840
And it'll turn out
that this sort of a map
00:35:18.840 --> 00:35:22.080
converges to the solution
that we're after.
00:35:22.080 --> 00:35:24.750
It converges when matrices
are either diagonally dominant
00:35:24.750 --> 00:35:28.470
as before, or they're symmetric
and they're positive definite.
00:35:28.470 --> 00:35:31.680
Positive definite means all
the eigenvalues of the matrix
00:35:31.680 --> 00:35:33.240
are bigger than 0.
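In code, this L-and-U splitting amounts to sweeping through the rows and using each updated component as soon as it is available, which is the forward substitution with L happening implicitly. A hedged Python sketch, again on an invented diagonally dominant system:

```python
def gauss_seidel(A, b, x0, iters=100):
    """Gauss-Seidel: split A = L + U (L keeps the diagonal) and solve
    L x_{i+1} = b - U x_i each sweep. Updating x in place means
    x[j] for j < i is already the new iterate (the L part), while
    x[j] for j > i is still the previous iterate (the U part)."""
    n = len(b)
    x = list(x0)
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# A made-up diagonally dominant system (illustration only).
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
print(gauss_seidel(A, b, [0.0, 0.0, 0.0]))  # converges to [1.0, 1.0, 1.0]
```

On systems like this, the in-place updates typically shrink the error faster per sweep than Jacobi does, which matches the iteration counts quoted below.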
00:35:40.680 --> 00:35:43.292
So let's try these iterative methods,
solving some equations,
00:35:43.292 --> 00:35:44.250
and see how they converge.
00:35:44.250 --> 00:35:44.749
Yes?
00:35:44.749 --> 00:35:47.460
AUDIENCE: How do you justify
ignoring the diagonal elements
00:35:47.460 --> 00:35:48.400
in that method?
00:35:50.862 --> 00:35:52.320
JAMES W. SWAN: So
the question was,
00:35:52.320 --> 00:35:54.960
how do you justify ignoring
the diagonal elements
00:35:54.960 --> 00:35:56.102
in this method.
00:35:56.102 --> 00:35:57.810
Maybe I was going too
fast or I misspoke.
00:35:57.810 --> 00:36:01.980
So I'm going to split A into
a lower triangular matrix that
00:36:01.980 --> 00:36:05.250
has all the diagonal
elements, and U
00:36:05.250 --> 00:36:08.125
is the upper parts with none of
those diagonal elements on it.
00:36:08.125 --> 00:36:09.000
Does that make sense?
00:36:09.000 --> 00:36:09.750
AUDIENCE: Yeah.
00:36:09.750 --> 00:36:11.180
JAMES W. SWAN: Thank you
for asking that question.
00:36:11.180 --> 00:36:12.430
I hope that's clear.
00:36:12.430 --> 00:36:16.660
L holds onto the diagonal
pieces and U takes those away.
00:36:19.790 --> 00:36:20.490
So let's try it.
00:36:20.490 --> 00:36:23.310
On a matrix like this,
the exact solution
00:36:23.310 --> 00:36:27.210
to this system of equations
is 3/4, 1/2, and 1/4.
00:36:27.210 --> 00:36:28.980
All right, we'll
try Jacobi, we'll
00:36:28.980 --> 00:36:31.530
have to give it some initial
guess for the solution, right?
00:36:31.530 --> 00:36:33.450
We'll talk about
places where you
00:36:33.450 --> 00:36:37.160
can derive those initial guesses
later on in the course,
00:36:37.160 --> 00:36:39.300
but we have to start
the iterative process
00:36:39.300 --> 00:36:41.730
with some guess
at the solutions.
00:36:41.730 --> 00:36:43.380
So here's an initial guess.
00:36:43.380 --> 00:36:44.550
We'll apply this map.
00:36:44.550 --> 00:36:47.010
Here's Gauss-Seidel with
the same initial guess,
00:36:47.010 --> 00:36:48.670
and we'll apply this map.
00:36:48.670 --> 00:36:52.120
They're both
linearly convergent,
00:36:52.120 --> 00:36:53.700
so the relative
error will go down
00:36:53.700 --> 00:36:57.660
by a fixed factor
after each iteration.
00:36:57.660 --> 00:37:00.470
Iteration one, the relative
error in Jacobi will be 38%.
00:37:00.470 --> 00:37:02.910
In Gauss-Seidel, it'll be 40%.
00:37:02.910 --> 00:37:05.190
If we apply this all the
way down to 10 iterations,
00:37:05.190 --> 00:37:09.020
the relative error Jacobi will
be 1.7%, and the relative error
00:37:09.020 --> 00:37:11.360
in Gauss-Seidel 0.08%.
00:37:11.360 --> 00:37:13.320
And we can go on and on
with these iterations
00:37:13.320 --> 00:37:16.740
if we want until we get
sufficiently converged, we
00:37:16.740 --> 00:37:19.320
get to a point where the
relative error is small enough
00:37:19.320 --> 00:37:22.590
that we're happy to accept
this answer as a solution
00:37:22.590 --> 00:37:24.660
to our system of equations.
00:37:24.660 --> 00:37:28.450
So we traded the burden of doing
all these calculations to do
00:37:28.450 --> 00:37:34.080
elimination for a faster,
less computationally complex
00:37:34.080 --> 00:37:35.250
methodology.
00:37:35.250 --> 00:37:38.620
But the trade off was we don't
get an exact solution anymore.
00:37:38.620 --> 00:37:40.890
We're going to have finite
precision in the result,
00:37:40.890 --> 00:37:42.750
and we have to
specify the tolerance
00:37:42.750 --> 00:37:44.541
that we want to converge to.
00:37:44.541 --> 00:37:47.040
We're going to see now-- this
is the hook into the next part
00:37:47.040 --> 00:37:47.860
of that class--
00:37:47.860 --> 00:37:50.070
we're going to talk about
solutions of nonlinear
00:37:50.070 --> 00:37:52.470
equations next for
which there are
00:37:52.470 --> 00:37:55.140
almost no non-linear equations
that we can solve exactly.
00:37:55.140 --> 00:37:58.945
They all have to be solved
using these iterative methods.
00:37:58.945 --> 00:38:01.320
You can use these iterative
methods for linear equations.
00:38:01.320 --> 00:38:03.079
It's very common
to do it this way.
00:38:03.079 --> 00:38:04.620
In my group, we
solve lots of systems
00:38:04.620 --> 00:38:07.860
of linear equations associated
with hydrodynamic problems.
00:38:07.860 --> 00:38:10.670
These come up when
you're talking about,
00:38:10.670 --> 00:38:12.510
say, low Reynolds
number flows, which
00:38:12.510 --> 00:38:15.420
are linear sorts of
fluid flow problems.
00:38:15.420 --> 00:38:15.990
They're big.
00:38:15.990 --> 00:38:18.050
It's really hard to do
Gaussian elimination,
00:38:18.050 --> 00:38:19.910
so you apply different
iterative methods.
00:38:19.910 --> 00:38:20.910
You can do Gauss-Seidel.
00:38:20.910 --> 00:38:22.120
You can do Jacobi.
00:38:22.120 --> 00:38:23.790
We'll learn about
more advanced ones
00:38:23.790 --> 00:38:26.880
like PCG, which you're
applying on your homework now,
00:38:26.880 --> 00:38:29.970
and you should be seeing that
it converges relatively quickly
00:38:29.970 --> 00:38:32.176
in cases where exact
elimination doesn't work.
00:38:32.176 --> 00:38:34.050
We'll learn, actually,
how to do that method.
00:38:34.050 --> 00:38:35.758
That's one that we
apply in my own group.
00:38:35.758 --> 00:38:37.870
It's pretty common
to use out there.
00:38:37.870 --> 00:38:38.370
Yes?
00:38:38.370 --> 00:38:43.440
AUDIENCE: One question, is
that that Gauss, [INAUDIBLE]
00:38:43.440 --> 00:38:45.930
JAMES W. SWAN: Order N squared.
00:38:45.930 --> 00:38:47.708
AUDIENCE: Yeah,
that's what I meant.
00:38:47.708 --> 00:38:50.148
So now we've got an [INAUDIBLE].
00:38:50.148 --> 00:38:54.407
So we basically have
[INAUDIBLE] iterations, right?
00:38:54.407 --> 00:38:56.240
JAMES W. SWAN: This is
a wonderful question.
00:38:56.240 --> 00:39:00.120
So this is a pathological
problem in the sense
00:39:00.120 --> 00:39:03.300
that it requires a
lot of calculations
00:39:03.300 --> 00:39:05.550
to get an iterative
solution here.
00:39:05.550 --> 00:39:07.110
We haven't gotten
to an N that's
00:39:07.110 --> 00:39:10.290
big enough that the
computational complexities
00:39:10.290 --> 00:39:11.700
cross over.
00:39:11.700 --> 00:39:15.810
So for small Ns, probably
the factor in front of N--
00:39:15.810 --> 00:39:17.200
whatever number that is--
00:39:17.200 --> 00:39:19.000
and maybe even the
smaller factors,
00:39:19.000 --> 00:39:20.940
order N squared
factors on that order
00:39:20.940 --> 00:39:22.440
N cubed, play a big
role in how long
00:39:22.440 --> 00:39:24.660
it takes to actually
complete this thing.
00:39:24.660 --> 00:39:28.140
But modern problems are so
big that we almost always
00:39:28.140 --> 00:39:30.480
are running out to Ns
that are large enough
00:39:30.480 --> 00:39:31.590
that we see a crossover.
00:39:31.590 --> 00:39:34.110
You'll see this in your
homework this week.
00:39:34.110 --> 00:39:36.030
You won't see this
crossover at N equals 3.
00:39:36.030 --> 00:39:37.613
You're going to see
it out at N equals
00:39:37.613 --> 00:39:41.336
500 or 1,200, big problems.
00:39:41.336 --> 00:39:43.210
Then we're going to
encounter this crossover.
00:39:43.210 --> 00:39:44.376
That's a wonderful question.
00:39:44.376 --> 00:39:49.110
So for small system
sizes, iterative methods
00:39:49.110 --> 00:39:50.750
maybe don't buy you much.
00:39:50.750 --> 00:39:53.000
I suppose it depends on the
application though, right?
00:39:53.000 --> 00:39:56.430
If you're doing something
that involves solving problems
00:39:56.430 --> 00:40:01.420
on embedded hardware, in some
sort of sensor or control
00:40:01.420 --> 00:40:04.230
valve, there may be
very limited memory
00:40:04.230 --> 00:40:06.290
or computational capacity
available to you.
00:40:06.290 --> 00:40:08.490
And you may actually
apply an iterative method
00:40:08.490 --> 00:40:12.120
like this to a problem
that that controller
00:40:12.120 --> 00:40:15.330
needs to solve, for example.
00:40:15.330 --> 00:40:17.370
It just may not
have the capability
00:40:17.370 --> 00:40:20.880
of storing and inverting what
we would consider, today,
00:40:20.880 --> 00:40:25.350
a relatively small matrix
because the hardware doesn't
00:40:25.350 --> 00:40:26.670
have that sort of capability.
00:40:26.670 --> 00:40:28.890
So there could be
cases where you
00:40:28.890 --> 00:40:32.250
might choose something
that's slower but feasible,
00:40:32.250 --> 00:40:34.650
versus something that's
faster and exact,
00:40:34.650 --> 00:40:36.990
because there are
other constraints.
00:40:36.990 --> 00:40:40.560
They do exist, but modern
computers are pretty efficient.
00:40:40.560 --> 00:40:43.590
Your cell phone is faster
than the fastest computers
00:40:43.590 --> 00:40:46.410
in the world 20 years ago.
00:40:46.410 --> 00:40:47.550
We're doing OK.
00:40:47.550 --> 00:40:49.890
So we've got to get out to
big system sizes, big problem
00:40:49.890 --> 00:40:52.060
sizes, before this
starts to pay off.
00:40:52.060 --> 00:40:55.210
But it does for many
practical problems.
00:40:55.210 --> 00:40:59.410
OK I'll close with this, because
this is the hook into solving
00:40:59.410 --> 00:41:00.520
nonlinear equations.
00:41:03.537 --> 00:41:05.370
So I showed you these
two iterative methods,
00:41:05.370 --> 00:41:07.830
and they kind of had
stringent requirements
00:41:07.830 --> 00:41:10.770
for when they were actually
going to converge, right?
00:41:10.770 --> 00:41:13.350
I had to have a diagonally
dominant system of equations
00:41:13.350 --> 00:41:15.460
for Jacobi to converge.
00:41:15.460 --> 00:41:18.720
I had to have diagonal dominance
or symmetric positive definite
00:41:18.720 --> 00:41:19.260
matrices.
00:41:19.260 --> 00:41:20.730
These things exist
and they come up
00:41:20.730 --> 00:41:22.188
in lots of physical
problems, but I
00:41:22.188 --> 00:41:25.050
had to have it in order for
Gauss-Seidel to converge.
00:41:25.050 --> 00:41:26.700
What if I have a
system of equations
00:41:26.700 --> 00:41:28.194
that doesn't work that way?
00:41:28.194 --> 00:41:29.610
Or what if I have
an iterative map
00:41:29.610 --> 00:41:33.900
that I like for some reason, but
it doesn't appear to converge?
00:41:33.900 --> 00:41:37.270
Maybe it converges under some
circumstances, but not others.
00:41:37.270 --> 00:41:40.140
Well, there's a way to modify
these iterative maps, called
00:41:40.140 --> 00:41:43.380
successive
over-relaxation, which
00:41:43.380 --> 00:41:45.977
can help promote convergence.
00:41:45.977 --> 00:41:48.060
So suppose we have an
iterative map like this, x i
00:41:48.060 --> 00:41:51.897
plus 1 is some function of
the previous iteration value.
00:41:51.897 --> 00:41:52.980
Doesn't matter what it is.
00:41:52.980 --> 00:41:54.646
It could be linear,
could be non-linear.
00:41:54.646 --> 00:41:57.760
We don't actually care.
00:41:57.760 --> 00:42:00.610
The sought after solution
is found when x i plus 1
00:42:00.610 --> 00:42:01.630
is equal to x i.
00:42:01.630 --> 00:42:03.540
So this map is one
the convergence
00:42:03.540 --> 00:42:06.250
to the exact solution of
the problem that we want.
00:42:06.250 --> 00:42:08.650
We've somehow guaranteed
that that's the case,
00:42:08.650 --> 00:42:10.990
but it has to converge.
00:42:10.990 --> 00:42:14.890
One way to modify that
map is to say x i plus 1
00:42:14.890 --> 00:42:17.890
is 1 minus some
scalar value omega
00:42:17.890 --> 00:42:23.020
times x i plus omega times f.
00:42:23.020 --> 00:42:26.350
You can confirm that if you
substitute x i plus 1 equals
00:42:26.350 --> 00:42:28.900
x i into this equation,
you'll come up
00:42:28.900 --> 00:42:33.940
with the same fixed points
of this iterative map
00:42:33.940 --> 00:42:36.500
x i is equal to f of x i.
00:42:36.500 --> 00:42:40.210
So you haven't changed what
value will converge here,
00:42:40.210 --> 00:42:42.460
but you've affected the
rate at which it converges.
00:42:42.460 --> 00:42:45.640
Here you're saying x i
plus 1 is some fraction
00:42:45.640 --> 00:42:50.290
of my previous solution plus
some fraction of this f.
00:42:50.290 --> 00:42:53.890
And I get to control how big
those different fractions are.
00:42:53.890 --> 00:42:58.410
So if things aren't converging
well for a map like this,
00:42:58.410 --> 00:43:00.700
then I could try
successive over-relaxation,
00:43:00.700 --> 00:43:03.400
and I could adjust this
relaxation parameter
00:43:03.400 --> 00:43:07.890
to be some fraction, some
number between 0 and 1,
00:43:07.890 --> 00:43:10.050
until I start to
observe convergence.
00:43:10.050 --> 00:43:11.550
And there are some
rules one can use
00:43:11.550 --> 00:43:13.383
to try to promote
convergence with this kind
00:43:13.383 --> 00:43:15.204
of successive over-relaxation.
00:43:15.204 --> 00:43:17.370
This is a very generic
technique that one can apply.
00:43:17.370 --> 00:43:20.700
If you have any iterative
map you're trying to apply,
00:43:20.700 --> 00:43:22.840
that should go to the
solution you want
00:43:22.840 --> 00:43:24.570
but doesn't converge
for some reason,
00:43:24.570 --> 00:43:27.630
then you can use this
relaxation technique to promote
00:43:27.630 --> 00:43:29.250
convergence to the solution.
00:43:29.250 --> 00:43:31.770
You may slow the
convergence way down.
00:43:31.770 --> 00:43:34.680
It may be very slow to
converge, but it will converge.
00:43:34.680 --> 00:43:36.690
And after all, an
answer is better
00:43:36.690 --> 00:43:38.940
than no answer, no matter
how long it takes to get it.
00:43:38.940 --> 00:43:40.740
So sometimes you've
got to get these things
00:43:40.740 --> 00:43:43.670
by hook or by crook.
00:43:43.670 --> 00:43:45.710
So for example, you can
apply this to Jacobi.
00:43:45.710 --> 00:43:48.514
This was the
original Jacobi map.
00:43:48.514 --> 00:43:49.430
And we just take that.
00:43:49.430 --> 00:43:53.640
We add 1 minus omega
times x i plus omega
00:43:53.640 --> 00:43:55.680
times this factor over here.
00:43:55.680 --> 00:43:59.300
And now we can choose omega so
that this solution converges.
00:43:59.300 --> 00:44:02.360
We always make omega
small enough so
00:44:02.360 --> 00:44:04.550
that the diagonal
values of our matrix
00:44:04.550 --> 00:44:07.220
appear big enough
that the matrix looks
00:44:07.220 --> 00:44:09.770
like it's diagonally dominant.
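A sketch of Jacobi with this relaxation parameter folded in (the test matrix, right-hand side, and omega below are illustrative assumptions, not from the lecture):

```python
import numpy as np

def damped_jacobi(A, b, omega=0.8, tol=1e-10, max_iter=10_000):
    """Iterate x_{i+1} = (1 - omega) x_i + omega D^{-1} (b - R x_i),
    where D is the diagonal of A and R = A - D."""
    D = np.diag(A)          # diagonal entries of A (1-D array)
    R = A - np.diag(D)      # off-diagonal part of A
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_jacobi = (b - R @ x) / D            # one plain Jacobi step
        x_new = (1.0 - omega) * x + omega * x_jacobi
        if np.linalg.norm(x_new - x, ord=np.inf) < tol:
            return x_new
        x = x_new
    raise RuntimeError("did not converge")

# Illustrative diagonally dominant test system.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
b = np.array([6.0, 5.0, 7.0])
x = damped_jacobi(A, b)
```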
00:44:09.770 --> 00:44:12.230
You could go back to that
same convergence analysis
00:44:12.230 --> 00:44:14.510
that I showed you before
and try to apply it
00:44:14.510 --> 00:44:19.252
to this over-relaxation
form of Jacobi and see that
00:44:14.510 --> 00:44:19.252
there's always going to
be some value of omega that's
00:44:19.252 --> 00:44:21.710
small enough that this
thing will converge.
00:44:25.420 --> 00:44:27.920
It will look effectively
diagonally dominant,
00:44:27.920 --> 00:44:32.150
because omega inverse
times D will be big enough,
00:44:32.150 --> 00:44:34.865
or omega times D inverse
will be small enough.
00:44:34.865 --> 00:44:36.619
Does that make sense?
00:44:36.619 --> 00:44:39.160
You can apply the same sort of
damping method to Gauss-Seidel
00:44:39.160 --> 00:44:39.660
as well.
00:44:39.660 --> 00:44:42.360
It's very common to do this.
00:44:42.360 --> 00:44:45.560
The relaxation parameter acts
like an effective increase
00:44:45.560 --> 00:44:47.692
in the eigenvalues
of the matrix.
00:44:47.692 --> 00:44:50.150
So you can think about L. That's
a lower triangular matrix.
00:44:50.150 --> 00:44:54.310
Its diagonal values
are its eigenvalues.
00:44:54.310 --> 00:44:56.190
The diagonal values
of L inverse--
00:44:56.190 --> 00:44:59.000
well, 1 over those
diagonal values
00:44:59.000 --> 00:45:00.820
are the eigenvalues
of L inverse.
00:45:00.820 --> 00:45:03.560
And so if we make
omega very small,
00:45:03.560 --> 00:45:06.630
then we make the eigenvalues
of L inverse very small,
00:45:06.630 --> 00:45:08.510
or the eigenvalues
of L very big.
00:45:08.510 --> 00:45:12.170
And again, the matrix starts
to look diagonally dominant.
00:45:12.170 --> 00:45:15.330
And you can promote
convergence in this way.
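Applying the same relaxation to Gauss-Seidel gives the classic SOR update; a sketch, reusing an illustrative symmetric positive definite test system (the matrix, b, and omega are assumptions, not from the lecture):

```python
import numpy as np

def sor(A, b, omega=0.9, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel with relaxation: each component is updated using
    the newest available values, then blended with its old value
    through omega. omega = 1 recovers plain Gauss-Seidel; for a
    symmetric positive definite A, any omega in (0, 2) converges."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for j in range(n):
            # Sum of off-diagonal terms, using updated x[:j] values.
            sigma = A[j, :j] @ x[:j] + A[j, j + 1:] @ x[j + 1:]
            x_gs = (b[j] - sigma) / A[j, j]   # Gauss-Seidel component
            x[j] = (1.0 - omega) * x[j] + omega * x_gs
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            return x
    raise RuntimeError("did not converge")

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
b = np.array([6.0, 5.0, 7.0])
x = sor(A, b)
```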
00:45:15.330 --> 00:45:17.960
So even though this
may be slow, you
00:45:17.960 --> 00:45:19.640
can use it to
guarantee convergence
00:45:19.640 --> 00:45:22.299
of some iterative procedures,
not just for linear equations,
00:45:22.299 --> 00:45:23.840
but for non-linear
equations as well.
00:45:23.840 --> 00:45:25.370
And we'll see,
there are good ways
00:45:25.370 --> 00:45:27.230
of choosing omega
for certain classes
00:45:27.230 --> 00:45:29.010
of non-linear equations.
00:45:29.010 --> 00:45:30.440
We'll apply the
Newton-Raphson method,
00:45:30.440 --> 00:45:33.590
and then we'll damp it using
exactly this sort of procedure.
00:45:33.590 --> 00:45:37.190
And I'll show you
how you can choose
00:45:37.190 --> 00:45:41.860
a nearly optimal value for
omega to promote convergence
00:45:41.860 --> 00:45:43.610
to the solution.
00:45:43.610 --> 00:45:46.480
Any questions?
00:45:46.480 --> 00:45:50.140
No, let me address one
more thing before you go.
00:45:50.140 --> 00:45:53.140
We've scheduled times
for the quizzes.
00:45:53.140 --> 00:45:55.180
They are going to
be in the evenings
00:45:55.180 --> 00:45:58.180
on the dates that are
specified on the syllabus.
00:45:58.180 --> 00:46:01.060
We wanted to do them
during the daytime.
00:46:01.060 --> 00:46:02.890
It was really
difficult to schedule
00:46:02.890 --> 00:46:04.970
a room that was big
enough for this class,
00:46:04.970 --> 00:46:07.990
so they have to be from 7:00
to 9:00 in the gymnasium.
00:46:07.990 --> 00:46:09.709
I apologize for that.
00:46:09.709 --> 00:46:11.500
We spent several days
looking around trying
00:46:11.500 --> 00:46:14.050
to find a place where
we could put everybody
00:46:14.050 --> 00:46:17.680
so you would all get the
same experience in the quiz.
00:46:17.680 --> 00:46:20.590
I know that the November
quiz comes back to back
00:46:20.590 --> 00:46:24.130
with the thermodynamics
exam as well.
00:46:24.130 --> 00:46:26.030
That's frustrating.
00:46:26.030 --> 00:46:27.850
Thermodynamics is the next day.
00:46:27.850 --> 00:46:29.170
That week is tricky.
00:46:29.170 --> 00:46:33.004
That's AIChE, so most of
the faculty have to travel.
00:46:33.004 --> 00:46:34.420
We won't be able
to teach, but you
00:46:34.420 --> 00:46:36.350
won't have classes
one of those days
00:46:36.350 --> 00:46:39.910
so you have extra time to study.
00:46:39.910 --> 00:46:41.750
And Columbus Day also
falls in that week,
00:46:41.750 --> 00:46:44.590
so there's no way to put
three exams in four days
00:46:44.590 --> 00:46:46.830
without having them
come right back to back.
00:46:46.830 --> 00:46:48.460
Believe me, we
thought about this
00:46:48.460 --> 00:46:51.004
and tried to get things
scheduled as efficiently as we
00:46:51.004 --> 00:46:52.420
could for you, but
sometimes there
00:46:52.420 --> 00:46:54.420
are constraints that are
outside of our control.
00:46:54.420 --> 00:46:55.927
But the quiz times are set.
00:46:55.927 --> 00:46:58.260
There's going to be one in
October and one in November.
00:46:58.260 --> 00:47:00.790
They'll be in the evening, and
they'll be in the gymnasium.
00:47:00.790 --> 00:47:03.790
I'll give you directions
to it before the exam, just
00:47:03.790 --> 00:47:06.280
so you know exactly
where to go, OK?
00:47:06.280 --> 00:47:08.210
Thank you, guys.