WEBVTT

00:00:01.580 --> 00:00:03.920
The following content is
provided under a Creative

00:00:03.920 --> 00:00:05.340
Commons license.

00:00:05.340 --> 00:00:07.550
Your support will help
MIT OpenCourseWare

00:00:07.550 --> 00:00:11.640
continue to offer high quality
educational resources for free.

00:00:11.640 --> 00:00:14.180
To make a donation or to
view additional materials

00:00:14.180 --> 00:00:18.110
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:18.110 --> 00:00:19.130
at ocw.mit.edu.

00:00:23.390 --> 00:00:26.150
JAMES W. SWAN: So this is
going to be our last lecture

00:00:26.150 --> 00:00:27.960
on linear algebra.

00:00:27.960 --> 00:00:30.417
The first three
lectures covered basics.

00:00:30.417 --> 00:00:32.750
The next three lectures, we
talked about different sorts

00:00:32.750 --> 00:00:34.730
of transformations of matrices.

00:00:34.730 --> 00:00:36.742
This final lecture is
the last of those three.

00:00:36.742 --> 00:00:39.200
We're going to talk about
another sort of transformation

00:00:39.200 --> 00:00:42.820
called the singular
value decomposition.

00:00:42.820 --> 00:00:46.610
OK, before we jump in, I'd like
to do the usual recap business.

00:00:46.610 --> 00:00:49.010
I think it's always helpful
to recap or look at things

00:00:49.010 --> 00:00:51.170
from a different perspective.

00:00:51.170 --> 00:00:55.130
Early on, I told you that the
infinite dimensional equivalent

00:00:55.130 --> 00:00:57.740
of vectors would be something
like a function, which

00:00:57.740 --> 00:01:02.510
is a map, a unique map maybe
from a point x to some value

00:01:02.510 --> 00:01:04.150
f of x.

00:01:04.150 --> 00:01:06.080
And there is an
equivalent representation

00:01:06.080 --> 00:01:08.780
of the eigenvalue eigenvector
problem in function space.

00:01:08.780 --> 00:01:11.330
We call these eigenvalues
and eigenfunctions.

00:01:11.330 --> 00:01:15.074
Here's a classic one where
the function is y of x, OK?

00:01:15.074 --> 00:01:17.240
This is the equivalent of
the vector, and the equivalent

00:01:17.240 --> 00:01:18.890
of the transformation,
or the matrix,

00:01:18.890 --> 00:01:21.230
is this differential
operator this time,

00:01:21.230 --> 00:01:22.850
the second derivative.

00:01:22.850 --> 00:01:26.340
So I take the second derivative
of this particular function,

00:01:26.340 --> 00:01:28.040
and the function is stretched.

00:01:28.040 --> 00:01:30.980
It's multiplied by some
fixed value at all points.

00:01:30.980 --> 00:01:33.830
And it becomes lambda times y.

00:01:33.830 --> 00:01:37.100
And that operator has to be
closed with some boundary

00:01:37.100 --> 00:01:37.920
conditions as well.

00:01:37.920 --> 00:01:39.830
We have to say
what the value of y

00:01:39.830 --> 00:01:43.640
is at the edges
of some boundary.

00:01:43.640 --> 00:01:45.590
So there's a one-to-one
correspondence

00:01:45.590 --> 00:01:48.470
between these things.

00:01:48.470 --> 00:01:54.730
What is the eigenfunction here,
or what are the eigenfunctions?

00:01:54.730 --> 00:01:57.160
And what are the
eigenvalues associated

00:01:57.160 --> 00:01:59.590
with this transformation
or this operator?

00:01:59.590 --> 00:02:01.667
Can you work those
out really quickly?

00:02:01.667 --> 00:02:03.250
You learned this at
some point, right?

00:02:03.250 --> 00:02:05.995
Somebody taught you
differential equations

00:02:05.995 --> 00:02:07.360
and you calculated these things.

00:02:07.360 --> 00:02:08.410
Take about 90 seconds.

00:02:08.410 --> 00:02:09.370
Work with the people around you.

00:02:09.370 --> 00:02:11.286
See if you can come to
a conclusion about what

00:02:11.286 --> 00:02:14.980
the eigenfunction
and eigenvalues are.

00:02:26.699 --> 00:02:27.490
That's enough time.

00:02:27.490 --> 00:02:28.870
You can work on this
on your own later

00:02:28.870 --> 00:02:29.953
if you've run out of time.

00:02:29.953 --> 00:02:30.976
Don't worry about it.

00:02:30.976 --> 00:02:32.600
Does somebody want
to volunteer a guess

00:02:32.600 --> 00:02:36.340
for what the eigenfunctions
are in this case?

00:02:36.340 --> 00:02:38.350
What are they?

00:02:38.350 --> 00:02:39.300
Yeah?

00:02:39.300 --> 00:02:42.846
AUDIENCE: [INAUDIBLE]

00:02:42.846 --> 00:02:44.720
JAMES W. SWAN: OK, so
you chose exponentials.

00:02:44.720 --> 00:02:45.950
That's an interesting choice.

00:02:45.950 --> 00:02:47.880
That's one possible
choice you can make.

00:02:47.880 --> 00:02:48.994
OK, so we could say--

00:02:48.994 --> 00:02:50.660
this is sort of a
classical one that you

00:02:50.660 --> 00:02:55.274
think about when you first
learn differential equations.

00:02:55.274 --> 00:02:56.690
They say, an
equation of this sort

00:02:56.690 --> 00:03:01.544
has solutions that look like
exponentials, and that's true.

00:03:01.544 --> 00:03:03.210
There's another
representation for this,

00:03:03.210 --> 00:03:05.760
which is as trigonometric
functions instead, right?

00:03:09.110 --> 00:03:10.470
Either of those is acceptable.

00:03:10.470 --> 00:03:12.550
[INAUDIBLE] the
trigonometric functions,

00:03:12.550 --> 00:03:17.417
that representation is a
little more useful for us here.

00:03:17.417 --> 00:03:19.750
We know that the boundary
conditions tell us that y of 0

00:03:19.750 --> 00:03:22.430
is supposed to be 0.

00:03:22.430 --> 00:03:26.590
That means that C1 has to
be 0, because cosine of 0 is 1.

00:03:26.590 --> 00:03:29.550
So C1 is 0 in this case.

00:03:29.550 --> 00:03:33.030
So that fixes one of
these coefficients.

00:03:33.030 --> 00:03:35.730
And now we're left
with a problem, right?

00:03:35.730 --> 00:03:38.310
Our solutions, our
eigenfunctions,

00:03:38.310 --> 00:03:39.220
cannot be unique.

00:03:39.220 --> 00:03:41.760
So we don't get to
specify C2, right?

00:03:41.760 --> 00:03:44.537
Any function that's a
multiple of this sine

00:03:44.537 --> 00:03:45.870
should also be an eigenfunction.

00:03:45.870 --> 00:03:47.640
So instead the other
boundary condition,

00:03:47.640 --> 00:03:51.030
this y of l equals 0, needs
to be used to pin down

00:03:51.030 --> 00:03:52.920
what the eigenvalue is.

00:03:52.920 --> 00:03:54.870
So the second
equation, y of l equals

00:03:54.870 --> 00:03:59.560
0, which implies that the
square root of minus lambda

00:03:59.560 --> 00:04:01.660
has to be equal
to n pi over l, for integer n; it

00:04:01.660 --> 00:04:04.232
has to place x equals l on one of the
nodes of the sine,

00:04:04.232 --> 00:04:05.440
where the sine is equal to 0.

00:04:05.440 --> 00:04:07.565
That's the equivalent of
our secular characteristic

00:04:07.565 --> 00:04:10.990
polynomial, which prescribes the
eigenvalues associated

00:04:10.990 --> 00:04:13.524
with each of the eigenfunctions.

00:04:13.524 --> 00:04:15.190
So now we know what
the eigenvalues are.

00:04:15.190 --> 00:04:17.320
The eigenvalues are
the set of numbers

00:04:17.320 --> 00:04:21.700
minus (n pi over l) squared.

00:04:21.700 --> 00:04:23.490
There's an infinite
number of eigenvalues.

00:04:23.490 --> 00:04:26.920
It's an infinite dimensional
space that we're in,

00:04:26.920 --> 00:04:29.230
so it's not a big surprise
that it works out that way.

00:04:29.230 --> 00:04:33.100
And the eigenfunctions then are
different scalar multiples

00:04:33.100 --> 00:04:36.220
of sine of the square root
of minus the eigenvalue,

00:04:36.220 --> 00:04:37.672
times x.

00:04:37.672 --> 00:04:39.130
There's a one-to-one
correspondence

00:04:39.130 --> 00:04:41.110
between all the linear
algebra we've done

00:04:41.110 --> 00:04:42.670
and linear
differential equations

00:04:42.670 --> 00:04:44.560
or linear partial
differential equations.

00:04:44.560 --> 00:04:47.030
You can think about these
things in exactly the same way.
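
NOTE
A minimal numerical check of the eigenvalue result above, sketched in Python with NumPy (an assumption; the lecture itself uses Matlab). It discretizes d^2/dx^2 on (0, L) with y(0) = y(L) = 0 using central differences. The grid size N and L = 1 are hypothetical choices, and L plays the role of l in the lecture. The least negative eigenvalues of the matrix should approach -(n pi / L)^2.
```python
import numpy as np
L, N = 1.0, 200                      # domain length and number of interior points (hypothetical)
h = L / (N + 1)                      # grid spacing
# Central-difference matrix for d^2/dx^2 on the interior points;
# the boundary values y(0) = y(L) = 0 are built in by omitting them.
D2 = (np.diag(-2.0 * np.ones(N))
      + np.diag(np.ones(N - 1), 1)
      + np.diag(np.ones(N - 1), -1)) / h**2
lam = np.sort(np.linalg.eigvalsh(D2))[::-1]   # least negative eigenvalue first
n = np.arange(1, 6)
exact = -(n * np.pi / L) ** 2                 # continuous eigenvalues -(n pi / L)^2
print(lam[:5])
print(exact)
```
The matrix is the linear-algebra side of the one-to-one correspondence: its eigenvectors are exactly the sine functions sampled on the grid.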

00:04:47.030 --> 00:04:53.080
I'm sure in 10.50, you started to
talk about orthogonal functions

00:04:53.080 --> 00:04:55.270
to represent solutions of
differential equations.

00:04:55.270 --> 00:04:57.190
Or if you haven't, you're
going to very soon.

00:04:57.190 --> 00:04:58.690
This is a part of
the course where you get

00:04:58.690 --> 00:05:01.180
to look at the analytical side
of some of these things as

00:05:01.180 --> 00:05:02.230
opposed to the numerical side.

00:05:02.230 --> 00:05:03.771
But there's a
one-to-one relationship

00:05:03.771 --> 00:05:04.730
between those things.

00:05:04.730 --> 00:05:06.855
So if you understand one,
you understand the other,

00:05:06.855 --> 00:05:10.770
and you can come at them
from either perspective.

00:05:10.770 --> 00:05:11.979
This sort of stuff is useful.

00:05:11.979 --> 00:05:14.144
Actually, the classical
chemical engineering example

00:05:14.144 --> 00:05:15.920
comes from quantum
mechanics where

00:05:15.920 --> 00:05:18.780
you think about wave functions
and different energy levels

00:05:18.780 --> 00:05:21.870
corresponding to eigenvalues.

00:05:21.870 --> 00:05:23.220
That's cool.

00:05:23.220 --> 00:05:25.460
Sometimes, I like to think
about a mechanical analog

00:05:25.460 --> 00:05:28.420
to that, which is the
buckling of an elastic column.

00:05:28.420 --> 00:05:29.670
So you should do this at home.

00:05:29.670 --> 00:05:31.800
You should go get a
piece of spaghetti

00:05:31.800 --> 00:05:34.620
and push on the ends of
the piece of the spaghetti.

00:05:34.620 --> 00:05:36.000
And the spaghetti will buckle.

00:05:36.000 --> 00:05:38.300
Eventually it'll break,
but it'll buckle first.

00:05:38.300 --> 00:05:39.520
It'll bend.

00:05:39.520 --> 00:05:41.070
And how does it bend?

00:05:41.070 --> 00:05:45.330
Well, a balance of linear
momentum on this bar

00:05:45.330 --> 00:05:48.780
would tell you
that the deflection

00:05:48.780 --> 00:05:52.420
in the bar at different
points x along the bar

00:05:52.420 --> 00:05:57.600
multiplied by the pressure has
to balance the bending moment

00:05:57.600 --> 00:05:58.690
in the bar itself.

00:05:58.690 --> 00:06:00.840
So this E is some
elastic constant.

00:06:00.840 --> 00:06:02.640
I is the moment of inertia.

00:06:02.640 --> 00:06:04.650
And d squared y dx
squared is something

00:06:04.650 --> 00:06:05.980
like the curvature of the bar.

00:06:05.980 --> 00:06:07.590
So it's the bending
moments of the bar

00:06:07.590 --> 00:06:09.510
that balance the
pressure that's

00:06:09.510 --> 00:06:11.250
being exerted on the bar.

00:06:11.250 --> 00:06:13.800
And sure enough,
this bar will buckle

00:06:13.800 --> 00:06:16.860
when the pressure
applied exceeds

00:06:16.860 --> 00:06:20.330
the first eigenvalue associated
with this differential

00:06:20.330 --> 00:06:20.830
equation.

00:06:20.830 --> 00:06:24.390
We just worked that
eigenvalue out.

00:06:24.390 --> 00:06:29.950
We said that the magnitude of that eigenvalue
had to be pi over l,

00:06:29.950 --> 00:06:31.407
squared.

00:06:31.407 --> 00:06:33.240
And so when the pressure
exceeds pi over l,

00:06:33.240 --> 00:06:36.810
squared, times E times I,
the elastic modulus times the moment of inertia,

00:06:36.810 --> 00:06:40.464
this column will bend
and deform continuously

00:06:40.464 --> 00:06:41.880
until it eventually
breaks, right?

00:06:41.880 --> 00:06:44.710
It will undergo this
linear elastic deformation,

00:06:44.710 --> 00:06:48.560
then plastic deformation
later, and it will break.
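
NOTE
The buckling criterion can be sketched the same way, in Python with NumPy (an assumption). critical_load is a hypothetical helper, and E, I, L are placeholder nondimensional values. Writing the balance as E I y'' = -P y with pinned ends y(0) = y(L) = 0 gives y'' = -(P / (E I)) y. The first buckling load is therefore -E I times the least negative eigenvalue of d^2/dx^2, which should approach E I (pi / L)^2.
```python
import numpy as np
def critical_load(E, I, L, N=400):
    """First buckling load of a pinned column (hypothetical helper, finite differences)."""
    h = L / (N + 1)                                  # interior grid spacing
    D2 = (np.diag(-2.0 * np.ones(N))
          + np.diag(np.ones(N - 1), 1)
          + np.diag(np.ones(N - 1), -1)) / h**2      # d^2/dx^2 with y(0) = y(L) = 0
    lam_first = np.linalg.eigvalsh(D2).max()         # least negative eigenvalue: first mode
    return -E * I * lam_first                        # from y'' = -(P / (E I)) y
E, I, L = 1.0, 1.0, 1.0                              # placeholder nondimensional values
print(critical_load(E, I, L), E * I * (np.pi / L) ** 2)
```
The load scales as 1 / L^2, which is why long, slender members buckle at lower loads.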

00:06:48.560 --> 00:06:50.360
The Eiffel Tower,
actually, is one

00:06:50.360 --> 00:06:51.860
of the first
structures in the world

00:06:51.860 --> 00:06:54.270
to utilize this
principle, right?

00:06:54.270 --> 00:06:56.000
It's got very
narrow beams in it.

00:06:56.000 --> 00:06:59.614
The beams are engineered so
that their elastic modulus

00:06:59.614 --> 00:07:01.280
is strong enough that
they won't buckle.

00:07:01.280 --> 00:07:04.060
Gustave Eiffel was one of the
first applied physicists,

00:07:04.060 --> 00:07:07.700
somebody who took the
physics of elastic bars

00:07:07.700 --> 00:07:10.460
and applied them to
building structures that

00:07:10.460 --> 00:07:14.570
weren't big and blocky, but used
a minimal amount of material.

00:07:14.570 --> 00:07:16.880
Cool, right?

00:07:16.880 --> 00:07:18.505
OK, so that's recap.

00:07:21.580 --> 00:07:23.980
Any questions about that?

00:07:23.980 --> 00:07:26.410
You've seen these things before.

00:07:26.410 --> 00:07:29.530
You understood them
well before too maybe?

00:07:29.530 --> 00:07:33.430
Give some thought to this, OK?

00:07:33.430 --> 00:07:37.090
We talked about
eigendecomposition last time:

00:07:37.090 --> 00:07:40.050
associated with
a square matrix

00:07:40.050 --> 00:07:44.050
was a particular eigenvalue or
particular set of eigenvalues,

00:07:44.050 --> 00:07:48.250
stretches, and corresponding
eigenvector directions.

00:07:48.250 --> 00:07:51.490
These were special solutions to
the system of linear equations

00:07:51.490 --> 00:07:52.540
based on a matrix.

00:07:52.540 --> 00:07:53.815
It was a square matrix.

00:07:53.815 --> 00:07:55.540
And you might ask,
well, what happens

00:07:55.540 --> 00:07:57.850
if the matrix isn't square?

00:07:57.850 --> 00:08:03.010
What if A is in the space of
real matrices that are n by m,

00:08:03.010 --> 00:08:04.630
where n and m maybe
aren't the same?

00:08:04.630 --> 00:08:06.910
Maybe they are the same,
but maybe they're not.

00:08:06.910 --> 00:08:10.046
And there is an
equivalent decomposition.

00:08:10.046 --> 00:08:11.920
It's called the singular
value decomposition.

00:08:11.920 --> 00:08:16.850
It's like an eigendecomposition
for non-square matrices.

00:08:16.850 --> 00:08:21.150
So rather than writing our
matrix as W lambda W

00:08:21.150 --> 00:08:24.720
inverse, we're going to
write it as some product

00:08:24.720 --> 00:08:29.490
U times sigma times
V with this dagger.

00:08:29.490 --> 00:08:31.940
The dagger here is
conjugate transpose.

00:08:31.940 --> 00:08:34.400
Transpose the matrix, and
take the complex conjugate

00:08:34.400 --> 00:08:36.382
of all the elements, OK?

00:08:36.382 --> 00:08:39.630
I mentioned last time that
eigenvalues and eigenvectors

00:08:39.630 --> 00:08:42.820
could be complex,
potentially, right?

00:08:42.820 --> 00:08:45.960
So whenever we have that case
where things can be complex,

00:08:45.960 --> 00:08:47.510
usually the
transposition operation

00:08:47.510 --> 00:08:49.510
is replaced with the
conjugate transpose.

00:08:52.224 --> 00:08:53.640
What are these
different matrices?

00:08:53.640 --> 00:08:55.620
Well, let me tell you.

00:08:55.620 --> 00:08:58.140
U is a complex matrix.

00:08:58.140 --> 00:09:01.530
It maps from the
space R N to R N,

00:09:01.530 --> 00:09:04.350
so it's an n by n square matrix.

00:09:04.350 --> 00:09:09.240
Sigma is a real valued matrix,
and it lives in the space of n

00:09:09.240 --> 00:09:10.830
by m matrices.

00:09:10.830 --> 00:09:16.520
V is a square matrix again,
but it has dimensions m by m.

00:09:16.520 --> 00:09:19.770
Remember, A maps
from R M to R N,

00:09:19.770 --> 00:09:22.110
so that's what the
sequence of products says.

00:09:22.110 --> 00:09:23.820
V maps from m to m.

00:09:23.820 --> 00:09:27.320
Sigma maps from m to n.

00:09:27.320 --> 00:09:28.720
U maps from n to n.

00:09:28.720 --> 00:09:31.350
So this maps from
m to n as well.

00:09:34.060 --> 00:09:36.760
Sigma is like
lambda from before.

00:09:36.760 --> 00:09:38.660
It's a diagonal matrix.

00:09:38.660 --> 00:09:40.040
It only has diagonal elements.

00:09:40.040 --> 00:09:41.415
It's just not
square, but it only

00:09:41.415 --> 00:09:45.540
has diagonal elements, all
of which will be non-negative.

00:09:48.100 --> 00:09:51.730
And then the columns of U and V are called
the left and right singular

00:09:51.730 --> 00:09:52.310
vectors.

00:09:52.310 --> 00:09:54.560
And they have special
properties associated with them,

00:09:54.560 --> 00:09:56.290
which I'll show you right now.

00:09:56.290 --> 00:09:59.600
Any questions about
how this decomposition

00:09:59.600 --> 00:10:02.690
is composed or made up?

00:10:02.690 --> 00:10:04.790
It looks just like the
eigendecomposition,

00:10:04.790 --> 00:10:06.360
but it can be applied
to any matrix.

00:10:06.360 --> 00:10:06.860
Yes?

00:10:06.860 --> 00:10:07.910
AUDIENCE: Quick question.

00:10:07.910 --> 00:10:08.170
JAMES W. SWAN: Sure.

00:10:08.170 --> 00:10:09.920
AUDIENCE: Do all
matrices have this thing,

00:10:09.920 --> 00:10:12.440
or is it like the eigenvalues
where some do and some don't?

00:10:12.440 --> 00:10:13.170
JAMES W. SWAN: This
is a great question.

00:10:13.170 --> 00:10:15.253
So all matrices are going
to have a singular value

00:10:15.253 --> 00:10:16.710
decomposition.

00:10:16.710 --> 00:10:18.410
We saw with the
eigenvalue decomposition

00:10:18.410 --> 00:10:22.520
that there could be a case
where the eigenvectors are

00:10:22.520 --> 00:10:25.190
degenerate, and we can't
write that full decomposition.

00:10:25.190 --> 00:10:28.214
All matrices are going to
have this decomposition.
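
NOTE
A minimal sketch of computing the decomposition and checking the shapes described above, assuming Python with NumPy (the lecture uses Matlab). np.linalg.svd returns U, the singular values as a 1-D array s, and Vh, which is V conjugate transpose. Matlab's [U, S, V] = svd(A) returns V itself. The 4 x 3 test matrix is random, not from the lecture.
```python
import numpy as np
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, m))        # arbitrary real n x m test matrix
U, s, Vh = np.linalg.svd(A)            # full decomposition: U is n x n, Vh is m x m
Sigma = np.zeros((n, m))               # n x m, non-zero only on its diagonal
Sigma[:s.size, :s.size] = np.diag(s)
print(U.shape, Sigma.shape, Vh.shape)  # (4, 4) (4, 3) (3, 3)
print(np.all(s >= 0), np.allclose(U @ Sigma @ Vh, A))  # True True
```
NumPy also sorts the singular values in decreasing order, which is convenient when truncating.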

00:10:34.380 --> 00:10:41.690
So for some properties
of this decomposition,

00:10:41.690 --> 00:10:44.330
U and V are what we
call unitary matrices.

00:10:44.330 --> 00:10:45.850
I talked about these before.

00:10:45.850 --> 00:10:49.430
Unitary matrices are ones for
whom, if they're real valued,

00:10:49.430 --> 00:10:51.830
their transpose is
also their inverse.

00:10:51.830 --> 00:10:54.490
If they're complex
valued, then their

00:10:54.490 --> 00:10:57.620
conjugate transpose is the
equivalent of their inverse.

00:10:57.620 --> 00:11:01.930
So U times U conjugate
transpose will be identity.

00:11:01.930 --> 00:11:05.800
V times V conjugate
transpose will be identity.

00:11:05.800 --> 00:11:08.510
Unitary matrices also
have the property

00:11:08.510 --> 00:11:12.400
that they impart no
stretch to a matrix--

00:11:12.400 --> 00:11:13.820
or to vectors.

00:11:13.820 --> 00:11:16.460
So their maps don't stretch.

00:11:16.460 --> 00:11:19.104
They're kind of like
rotational matrices, right?

00:11:19.104 --> 00:11:21.520
They change directions, but
they don't stretch things out.
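
NOTE
A quick check of the two unitary properties, assuming Python with NumPy and a random complex test matrix (not from the lecture). First, U^H U = I, where .conj().T is the conjugate transpose, the dagger. Second, multiplying by U leaves the 2-norm of a vector unchanged.
```python
import numpy as np
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))  # complex 5 x 3 test matrix
U, s, Vh = np.linalg.svd(A)
print(np.allclose(U.conj().T @ U, np.eye(5)), np.allclose(Vh @ Vh.conj().T, np.eye(3)))  # True True
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # True: no stretch
```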

00:11:25.190 --> 00:11:30.890
If I were to take A conjugate
transpose and multiply it by A,

00:11:30.890 --> 00:11:36.680
that would be the same as taking the conjugate transpose of
U sigma V conjugate transpose,

00:11:36.680 --> 00:11:38.780
and multiplying it
by U sigma V conjugate transpose. If I

00:11:38.780 --> 00:11:43.010
use the properties of
matrix multiplications

00:11:43.010 --> 00:11:46.340
and complex
conjugate transposes,

00:11:46.340 --> 00:11:48.500
and work out what
this expression is,

00:11:48.500 --> 00:11:52.280
I'll find out that it's
equivalent to V sigma conjugate

00:11:52.280 --> 00:11:56.660
transpose sigma V
conjugate transpose.

00:11:56.660 --> 00:12:03.050
Well this has exactly the same
form as an eigendecomposition.

00:12:03.050 --> 00:12:07.400
An eigendecomposition
of A conjugate transpose times A instead

00:12:07.400 --> 00:12:11.600
of an eigendecomposition
of A. So V

00:12:11.600 --> 00:12:16.160
is the set of eigenvectors
of A conjugate transpose A,

00:12:16.160 --> 00:12:19.700
and sigma squared
are the eigenvalues

00:12:19.700 --> 00:12:24.260
of A conjugate
transpose times A.

00:12:24.260 --> 00:12:26.610
And if I reverse the order
of this multiplication--

00:12:26.610 --> 00:12:30.530
so I do A times A conjugate
transpose-- and work it out,

00:12:30.530 --> 00:12:34.010
that would be U sigma sigma conjugate
transpose U conjugate transpose. And so U

00:12:34.010 --> 00:12:38.060
are the eigenvectors of
A A conjugate transpose,

00:12:38.060 --> 00:12:42.320
and sigma squared are still the
eigenvalues of A A conjugate

00:12:42.320 --> 00:12:45.070
transpose.
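
NOTE
A numerical check of these relations, assuming Python with NumPy and a random real 4 x 3 test matrix. The check covers three things:
- The eigenvalues of A^H A are the squared singular values.
- A A^H has the same non-zero eigenvalues plus one zero.
- The columns of V are eigenvectors of A^H A.
np.linalg.eigvalsh returns the eigenvalues of a Hermitian matrix in ascending order.
```python
import numpy as np
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))              # arbitrary real 4 x 3 test matrix
U, s, Vh = np.linalg.svd(A)
V = Vh.conj().T                              # NumPy returns V^H; recover V
ev_AhA = np.linalg.eigvalsh(A.conj().T @ A)  # 3 eigenvalues, ascending
ev_AAh = np.linalg.eigvalsh(A @ A.conj().T)  # 4 eigenvalues, ascending; the first is ~0
print(np.allclose(np.sort(s**2), ev_AhA))    # True: eigenvalues of A^H A are sigma^2
print(np.allclose(np.sort(s**2), ev_AAh[1:]), abs(ev_AAh[0]) < 1e-10)  # True True
print(np.allclose(A.conj().T @ A @ V, V * s**2))  # True: columns of V are eigenvectors
```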

00:12:45.070 --> 00:12:47.830
So what are these
things U and V?

00:12:47.830 --> 00:12:51.690
They relate to the
eigenvectors of the product

00:12:51.690 --> 00:12:54.550
of A with itself: this
particular product of A

00:12:54.550 --> 00:12:58.480
with itself, A conjugate transpose A, or this particular
product, A A conjugate transpose.

00:12:58.480 --> 00:13:01.310
Sigma are the singular values.

00:13:01.310 --> 00:13:03.750
And all matrices possess
this sort of a decomposition.

00:13:03.750 --> 00:13:06.530
They all have a set of singular
values and singular vectors.

00:13:06.530 --> 00:13:08.994
These sigmas are called the
singular values of A.

00:13:08.994 --> 00:13:10.160
They have a particular name.

00:13:10.160 --> 00:13:12.110
I'm going to show you how you
can use this decomposition

00:13:12.110 --> 00:13:14.000
to do something you
already know how to do,

00:13:14.000 --> 00:13:15.083
but how to do it formally.

00:13:18.200 --> 00:13:21.738
What are some properties of the
singular value decomposition?

00:13:24.810 --> 00:13:27.510
So if we take a matrix A and
we compute its singular value

00:13:27.510 --> 00:13:29.470
decomposition, this is
how you do it in Matlab.

00:13:33.030 --> 00:13:36.560
We'll find out, for this
matrix, U is identity.

00:13:36.560 --> 00:13:39.720
Sigma is identity with an
extra column pasted on it.

00:13:39.720 --> 00:13:40.920
And V is also identity.

00:13:40.920 --> 00:13:44.130
I mean, this is the simplest
possible three by four matrix

00:13:44.130 --> 00:13:45.517
I can write down.

00:13:45.517 --> 00:13:47.850
You don't have to know how
to compute the singular value

00:13:47.850 --> 00:13:50.160
decomposition, you just
need to know that it

00:13:50.160 --> 00:13:51.420
can be computed in this way.

00:13:51.420 --> 00:13:53.160
You might be able to
guess how to compute it

00:13:53.160 --> 00:13:55.035
based on what we did
with eigenvalues earlier

00:13:55.035 --> 00:13:57.240
and eigenvectors.

00:13:57.240 --> 00:13:59.670
It'll turn out some of
the columns of sigma

00:13:59.670 --> 00:14:00.950
will be non-zero, right?

00:14:00.950 --> 00:14:04.390
There are three non-zero
columns of sigma, and one zero column.

00:14:04.390 --> 00:14:07.780
And the columns of
V that correspond

00:14:07.780 --> 00:14:11.800
to the zero columns of sigma
span the null space

00:14:11.800 --> 00:14:16.620
of the matrix A.

00:14:16.620 --> 00:14:20.220
So the first three
columns here are non-zero,

00:14:20.220 --> 00:14:24.221
the first three columns
of V. I'm sorry,

00:14:24.221 --> 00:14:25.970
the first three columns
here are non-zero.

00:14:25.970 --> 00:14:27.590
The last column is 0.

00:14:27.590 --> 00:14:30.440
The columns of sigma
which are 0 correspond

00:14:30.440 --> 00:14:33.920
to a particular column in V,
this last column here, which

00:14:33.920 --> 00:14:36.017
lives in the null
space of A. So you

00:14:36.017 --> 00:14:37.600
can see, if I take
A and I multiply it

00:14:37.600 --> 00:14:41.970
by any vector that's
proportional to 0, 0, 0, 1,

00:14:41.970 --> 00:14:43.340
I'll get back 0.

00:14:43.340 --> 00:14:46.640
So the null space
of A is spanned

00:14:46.640 --> 00:14:48.260
by all these vectors
corresponding

00:14:48.260 --> 00:14:49.850
to the 0 columns of sigma.

00:14:52.980 --> 00:14:54.807
Some of the columns
of sigma are non-zero.

00:14:54.807 --> 00:14:55.890
These first three columns.

00:14:55.890 --> 00:15:02.000
And the columns of U corresponding
to those three columns

00:15:02.000 --> 00:15:04.550
span the range of A. So
if I do the singular value

00:15:04.550 --> 00:15:09.410
decomposition of a matrix,
and I look at U, V, and sigma

00:15:09.410 --> 00:15:11.060
and what they're composed of--

00:15:11.060 --> 00:15:15.050
where sigma is 0 and non-zero,
and the corresponding columns

00:15:15.050 --> 00:15:17.330
of U and V--
then I can figure out

00:15:17.330 --> 00:15:25.110
what vectors span the range
and null space of the matrix A.

00:15:25.110 --> 00:15:26.710
Here's another example.

00:15:26.710 --> 00:15:29.130
So here I have A.
Now instead of being

00:15:29.130 --> 00:15:32.850
three rows by four columns,
it's four rows by three columns.

00:15:32.850 --> 00:15:34.660
And here's the singular
value decomposition

00:15:34.660 --> 00:15:37.140
that comes out of Matlab.

00:15:37.140 --> 00:15:41.580
There are no vectors that
live in the null space of A,

00:15:41.580 --> 00:15:44.160
and there are no 0
columns in sigma.

00:15:44.160 --> 00:15:46.975
There are no corresponding
columns in V.

00:15:46.975 --> 00:15:50.920
There are no vectors
in the null space of A.

00:15:50.920 --> 00:15:55.480
The range of A is spanned
by the columns corresponding

00:15:55.480 --> 00:15:58.420
to the non-zero-- the
columns of U corresponding

00:15:58.420 --> 00:15:59.950
to the non-zero
columns of sigma.

00:15:59.950 --> 00:16:02.890
So it's these first three
columns of U.

00:16:02.890 --> 00:16:07.000
And these first three
columns, clearly they span--

00:16:07.000 --> 00:16:12.040
they describe the same range
as the three columns in A.

00:16:12.040 --> 00:16:13.540
So the singular
value decomposition

00:16:13.540 --> 00:16:17.290
gives us direct access
to the null space

00:16:17.290 --> 00:16:18.810
and the range of a matrix.
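
NOTE
A minimal sketch of reading the rank, range, and null space off the SVD, assuming Python with NumPy. It uses the lecture's first example: the 3 x 4 matrix made of the identity with a zero column appended. The threshold for a "non-zero" singular value is an assumption, modeled on the documented default tolerance of np.linalg.matrix_rank.
```python
import numpy as np
A = np.hstack([np.eye(3), np.zeros((3, 1))])          # 3 x 4: identity with an extra zero column
U, s, Vh = np.linalg.svd(A)
tol = s.max() * max(A.shape) * np.finfo(float).eps    # threshold for a "non-zero" singular value
r = int(np.sum(s > tol))          # rank = number of non-zero singular values
range_basis = U[:, :r]            # columns of U paired with non-zero singular values
null_basis = Vh[r:].conj().T      # columns of V paired with the zero columns of Sigma
print(r, null_basis.shape)                                        # 3 (4, 1)
print(np.allclose(A @ null_basis, 0))                             # True
print(np.allclose(range_basis @ range_basis.conj().T @ A, A))     # True: columns of A lie in the range
```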

00:16:18.810 --> 00:16:21.570
That's handy.

00:16:21.570 --> 00:16:24.170
And it can be used
in various ways.

00:16:24.170 --> 00:16:26.140
So here's one example
where it can be used.

00:16:26.140 --> 00:16:29.830
Here I have a fingerprint.

00:16:29.830 --> 00:16:31.150
It's a bitmap.

00:16:31.150 --> 00:16:33.767
It's a square bit of
data, like a matrix,

00:16:33.767 --> 00:16:35.350
and each of the
elements of the matrix

00:16:35.350 --> 00:16:39.430
takes on a value describing
how dark or light that pixel is.

00:16:39.430 --> 00:16:43.840
Let's say it's grayscale, and
its values are between 0 and 255.

00:16:43.840 --> 00:16:45.820
That's pretty typical.

00:16:45.820 --> 00:16:48.490
So I have this matrix, and
each element of the matrix

00:16:48.490 --> 00:16:50.410
corresponds to a pixel.

00:16:50.410 --> 00:16:53.470
And I do a singular
value decomposition.

00:16:53.470 --> 00:16:55.750
Some of the singular
values, the values of sigma,

00:16:55.750 --> 00:16:57.760
are bigger than others.

00:16:57.760 --> 00:17:00.770
They're all non-negative, but
some are bigger than others.

00:17:00.770 --> 00:17:02.440
The ones that are
biggest in magnitude

00:17:02.440 --> 00:17:06.770
carry the most information
content about the matrix.

00:17:06.770 --> 00:17:11.470
So we can do data compression by
neglecting singular values that

00:17:11.470 --> 00:17:14.980
are smaller than some
threshold, and also neglecting

00:17:14.980 --> 00:17:17.439
the corresponding
singular vectors.

00:17:17.439 --> 00:17:18.730
And that's what I've done here.

00:17:18.730 --> 00:17:21.400
So here's the original
bitmap of the fingerprint.

00:17:21.400 --> 00:17:23.680
I did the singular
value decomposition,

00:17:23.680 --> 00:17:27.819
and then I retained only the 50
biggest singular values and I

00:17:27.819 --> 00:17:29.830
left all the other
singular values out.

00:17:29.830 --> 00:17:32.290
This bitmap was something
like, I don't know,

00:17:32.290 --> 00:17:34.450
300 pixels by 300
pixels, so there's

00:17:34.450 --> 00:17:37.210
like 300 singular
values, but I got rid

00:17:37.210 --> 00:17:40.480
of 5/6 of them, the ones
carrying the least information.

00:17:40.480 --> 00:17:43.120
I dropped 5/6 of the
singular values and vectors,

00:17:43.120 --> 00:17:44.800
and then I
reconstructed the matrix

00:17:44.800 --> 00:17:47.320
from the singular values
and those singular vectors,

00:17:47.320 --> 00:17:48.910
and you get a faithful
representation

00:17:48.910 --> 00:17:51.142
of the original fingerprint.
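
NOTE
The compression step can be sketched as a truncated SVD, assuming Python with NumPy. The fingerprint bitmap is not available here, so the "image" below is a hypothetical synthetic 300 x 300 array with values roughly between 0 and 255. Keeping the k = 50 largest singular values and their singular vectors gives the best rank-50 approximation in the 2-norm. The 2-norm error then equals the largest discarded singular value.
```python
import numpy as np
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 300)
# Hypothetical stand-in for a grayscale bitmap: smooth pattern plus a little noise.
img = 127.5 * (1 + np.sin(8 * np.pi * np.outer(x, x))) + rng.uniform(-5, 5, (300, 300))
U, s, Vh = np.linalg.svd(img)
k = 50                                       # keep 50 of 300 singular values
img_k = (U[:, :k] * s[:k]) @ Vh[:k]          # rank-k reconstruction
err = np.linalg.norm(img - img_k, 2)         # 2-norm (largest singular value) of the error
print(err / np.linalg.norm(img, 2))          # small relative error
print(np.isclose(err, s[k]))                 # True: error equals the first dropped singular value
```
Storing U[:, :k], s[:k], and Vh[:k] takes k(300 + 300 + 1) = 30,050 numbers instead of 90,000.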

00:17:51.142 --> 00:17:52.600
So the singular
value decomposition

00:17:52.600 --> 00:17:54.433
says something about
the information content

00:17:54.433 --> 00:17:57.100
in the transformation
that is the matrix, right?

00:17:57.100 --> 00:17:58.720
There are some
transformations that

00:17:58.720 --> 00:18:03.172
are of lower power or
importance than others.

00:18:03.172 --> 00:18:04.630
And the magnitude
of these singular

00:18:04.630 --> 00:18:06.300
values tells you what they are.

00:18:06.300 --> 00:18:09.680
Does that make sense?

00:18:09.680 --> 00:18:10.680
How else can it be used?

00:18:10.680 --> 00:18:14.370
Well, one way it can be
used is finding the least

00:18:14.370 --> 00:18:17.820
squares solution
to the equation Ax

00:18:17.820 --> 00:18:22.420
equals b, where A is no
longer a square matrix, OK?

00:18:26.410 --> 00:18:28.030
You've done this
in other contexts

00:18:28.030 --> 00:18:32.860
before where the equations
are overspecified.

00:18:32.860 --> 00:18:36.090
We have more equations than
unknowns, like data fitting.

00:18:36.090 --> 00:18:39.820
You form the normal equations,
you multiply both sides of Ax

00:18:39.820 --> 00:18:45.910
equals b by A transpose, and
then invert A transpose A.
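
NOTE
For comparison, the normal-equations route can be sketched as follows, assuming Python with NumPy and hypothetical straight-line data (20 equations, 2 unknowns). NumPy's np.linalg.lstsq calls an SVD-based LAPACK least-squares solver. It should agree with solving A^T A x = A^T b directly.
```python
import numpy as np
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 20)
b = 2.0 + 3.0 * t + 0.01 * rng.standard_normal(20)   # hypothetical noisy line data
A = np.column_stack([np.ones_like(t), t])            # 20 x 2: more equations than unknowns
x_normal = np.linalg.solve(A.T @ A, A.T @ b)         # normal equations A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)      # library least-squares solver
print(x_normal, x_lstsq)                             # both close to [2, 3]
print(np.allclose(x_normal, x_lstsq))                # True
```
Forming A^T A squares the condition number of A. That is one reason SVD-based solvers are preferred when A is ill-conditioned.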

00:18:45.910 --> 00:18:47.890
You might not be
too surprised, then,

00:18:47.890 --> 00:18:50.380
to think that singular value
decomposition could be useful

00:18:50.380 --> 00:18:50.920
here too.

00:18:50.920 --> 00:18:54.790
We already saw that the data in
a singular value decomposition

00:18:54.790 --> 00:18:58.630
correspond to eigenvectors and
eigenvalues of this A transpose

00:18:58.630 --> 00:19:00.130
A, right?

00:19:00.130 --> 00:19:02.950
But there's a way to use
this sort of decomposition

00:19:02.950 --> 00:19:05.140
formally to solve
problems that are

00:19:05.140 --> 00:19:09.730
both overspecified
and underspecified.

00:19:09.730 --> 00:19:14.620
Least squares means find
the vector of solutions

00:19:14.620 --> 00:19:21.860
x that minimizes
this function phi.

00:19:21.860 --> 00:19:24.560
Phi is the length of the
vector given by the difference

00:19:24.560 --> 00:19:26.450
between Ax and b.

00:19:26.450 --> 00:19:29.900
It's one measure of how far
in error our solution x is.

00:19:29.900 --> 00:19:34.220
So let's define the value
x which is least in error.

00:19:34.220 --> 00:19:36.065
This is one definition
of least squares.

00:19:40.490 --> 00:19:43.950
And I know the singular value
decomposition of A. So A

00:19:43.950 --> 00:19:48.685
is U sigma V conjugate transpose. So I
have U sigma V conjugate transpose times x.

00:19:48.685 --> 00:19:52.140
I can factor out U, and I've
got a factor of U transpose,

00:19:52.140 --> 00:19:54.960
or U conjugate transpose
multiplying by b.

00:19:54.960 --> 00:20:00.000
So Ax minus b is the same as
U times the quantity sigma V

00:20:00.000 --> 00:20:05.240
conjugate transpose x minus
U conjugate transpose b.

00:20:05.240 --> 00:20:08.480
We want to know the x
that minimizes this phi.

00:20:08.480 --> 00:20:10.130
It's an optimization problem.

00:20:10.130 --> 00:20:12.990
We'll talk in great detail about
these sorts of problems later.

00:20:12.990 --> 00:20:15.350
This one is so easy to do,
we can just work it out

00:20:15.350 --> 00:20:18.152
in a couple lines of text.

00:20:18.152 --> 00:20:19.610
We'll define a new
set of unknowns,

00:20:19.610 --> 00:20:25.130
y, which is V transpose times
x, and a new right-hand side

00:20:25.130 --> 00:20:29.270
for a system of equations p,
which is U transpose times b.

00:20:29.270 --> 00:20:31.220
And then we can rewrite
our function phi

00:20:31.220 --> 00:20:32.600
that we're trying to minimize.

00:20:32.600 --> 00:20:37.720
So phi then becomes
U sigma y minus p.

00:20:37.720 --> 00:20:39.080
U is a unitary matrix.

00:20:39.080 --> 00:20:45.440
It imparts no stretch in the two
norm, so this sigma y minus p

00:20:45.440 --> 00:20:49.850
doesn't get elongated by
multiplication with U.

00:20:49.850 --> 00:20:52.280
So its length,
the length of this,

00:20:52.280 --> 00:20:55.660
is the same as the length
of sigma y minus p.

00:20:55.660 --> 00:20:56.915
You can prove this.

00:20:56.915 --> 00:20:58.540
It's not very difficult
to show at all.

00:20:58.540 --> 00:21:02.430
You use the definition of
the two norm to prove it.

00:21:02.430 --> 00:21:10.150
So phi is minimized by the y's
which make this norm smallest,

00:21:10.150 --> 00:21:11.170
making it closest to 0.

00:21:14.240 --> 00:21:17.060
Let r be the number
of non-zero singular

00:21:17.060 --> 00:21:19.190
values, the number
of those sigmas

00:21:19.190 --> 00:21:22.440
which are not equal to 0.

00:21:22.440 --> 00:21:24.360
That's also the rank of A.

00:21:24.360 --> 00:21:29.330
Then I can rewrite
phi as the sum from i

00:21:29.330 --> 00:21:36.060
equals 1 to r of sigma i i
times y i minus p i, squared.

00:21:36.060 --> 00:21:40.160
That's the part of this length,
this Euclidean length,

00:21:40.160 --> 00:21:42.320
for which sigma is non-zero.

00:21:42.320 --> 00:21:46.730
Plus the sum from r
plus 1 to n, the sum

00:21:46.730 --> 00:21:50.240
over the rest of
the values of p i squared,

00:21:50.240 --> 00:21:52.590
for which the
corresponding sigmas are 0.

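NOTE The split of phi just described, written as an equation. This sketch assumes a real-valued A with n rows and m columns, r non-zero singular values, y = V^T x, and p = U^T b, matching the definitions used in the lecture:
```latex
\phi = \lVert \Sigma y - p \rVert_2^2
     = \sum_{i=1}^{r} \left( \sigma_{ii}\, y_i - p_i \right)^2
     + \sum_{i=r+1}^{n} p_i^2 .
```
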
00:21:59.710 --> 00:22:02.830
I want to minimize
phi, and the only thing

00:22:02.830 --> 00:22:05.337
that I can change to
minimize it is what?

00:22:08.139 --> 00:22:10.530
What am I free to
pick in this equation

00:22:10.530 --> 00:22:13.930
in order to make phi
as small as possible?

00:22:13.930 --> 00:22:14.570
Yeah?

00:22:14.570 --> 00:22:15.070
AUDIENCE: y.

00:22:15.070 --> 00:22:17.290
JAMES W. SWAN: y,
so I need to choose

00:22:17.290 --> 00:22:21.110
the y's that make this
phi as small as possible.

00:22:21.110 --> 00:22:22.955
What value should I
choose for the y's?

00:22:26.678 --> 00:22:27.666
What do you think?

00:22:30.630 --> 00:22:34.380
AUDIENCE: [INAUDIBLE]

00:22:34.380 --> 00:22:35.630
JAMES W. SWAN: Perfect, right?

00:22:35.630 --> 00:22:40.000
Choose y i equals p
i over sigma i i.

00:22:40.000 --> 00:22:42.610
Right, y i is p
i over sigma i i.

00:22:42.610 --> 00:22:46.150
Then all of these terms are 0.

00:22:46.150 --> 00:22:48.490
I can't make this sum
any smaller than that.

00:22:48.490 --> 00:22:53.980
That fixes the values
of y i for i up to r.

00:22:53.980 --> 00:22:57.280
I can't do anything about
this leftover bit here.

00:22:57.280 --> 00:23:00.340
There's no choice of
y that's going to make

00:23:00.340 --> 00:23:01.657
this part any smaller.

00:23:01.657 --> 00:23:02.490
It's just left over.

00:23:02.490 --> 00:23:04.275
It's some remainder that
we can't make any smaller

00:23:04.275 --> 00:23:05.410
or minimize any further.

00:23:05.410 --> 00:23:08.125
There isn't an exact solution
to this problem, in many cases.

00:23:10.690 --> 00:23:17.190
But one way this could be
0 is if r is equal to n.

00:23:17.190 --> 00:23:19.570
Then there are no
leftover terms,

00:23:19.570 --> 00:23:22.375
and then this y i
equals p i over sigma i i

00:23:22.375 --> 00:23:23.920
is the exact solution
to the problem.

00:23:28.280 --> 00:23:29.690
So this is what you told me.

00:23:29.690 --> 00:23:36.230
Choose y i equals p i over sigma i i
for i from 1 up to

00:23:36.230 --> 00:23:38.650
r.

00:23:38.650 --> 00:23:40.710
There are going to
be values of y i

00:23:40.710 --> 00:23:46.070
that go between r plus 1 and
m, because A was a matrix that

00:23:46.070 --> 00:23:47.990
mapped from m dimensions to n dimensions, right?

00:23:47.990 --> 00:23:52.160
So I have extra values of y that
could be specified potentially.

00:23:52.160 --> 00:23:56.279
If that's true, if r
is smaller than m,

00:23:56.279 --> 00:23:58.570
then there are some components
of y that I don't get to--

00:23:58.570 --> 00:23:59.770
I can't specify, right?

00:23:59.770 --> 00:24:02.930
My system of equations is
somehow underdetermined.

00:24:02.930 --> 00:24:05.540
I need some external
information to tell me what

00:24:05.540 --> 00:24:07.740
values to pick for those y i.

00:24:07.740 --> 00:24:08.900
I don't know.

00:24:08.900 --> 00:24:10.400
I can't use them.

00:24:10.400 --> 00:24:13.420
Sometimes people just
set y i equal to 0.

00:24:13.420 --> 00:24:16.220
That's sort of silly,
but that's what's done.

00:24:16.220 --> 00:24:20.150
It's called the minimum
norm least squares solution.

00:24:20.150 --> 00:24:23.090
y has minimum length when you
set all these other components

00:24:23.090 --> 00:24:24.050
to 0.

00:24:24.050 --> 00:24:27.140
But the truth is, we can't
specify those components,

00:24:27.140 --> 00:24:27.716
right?

00:24:27.716 --> 00:24:29.090
We need some
external information

00:24:29.090 --> 00:24:31.850
in order to specify them.

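NOTE A minimal numpy sketch of the point about unspecified components, using a hypothetical 2-by-3 matrix that is not from the lecture. Changing a free component of y moves x along the null space of A, so the residual stays the same while x gets longer; setting the free components to 0 gives the minimum-norm solution.
```python
import numpy as np
# Hypothetical example: 2 equations (n = 2), 3 unknowns (m = 3), rank r = 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
U, s, Vh = np.linalg.svd(A)           # A = U diag(s) Vh; rows of Vh are the columns of V (real case)
p = U.T @ b                           # new right-hand side p = U^T b
r = int(np.sum(s > 1e-12 * s[0]))     # number of non-zero singular values
y = np.zeros(A.shape[1])
y[:r] = p[:r] / s[:r]                 # determined components: y_i = p_i / sigma_ii
x_min = Vh.T @ y                      # free component left at 0 (minimum-norm choice)
y_other = y.copy()
y_other[r:] = 5.0                     # an arbitrary value for the free component y_3
x_other = Vh.T @ y_other
print(np.linalg.norm(A @ x_min - b), np.linalg.norm(A @ x_other - b))  # both essentially 0
print(np.linalg.norm(x_min), np.linalg.norm(x_other))                  # x_min is shorter
```
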
00:24:31.850 --> 00:24:34.680
Once we know y, we
can find x going back

00:24:34.680 --> 00:24:36.260
to our definition of what y is.

00:24:36.260 --> 00:24:38.840
So I multiply this equation
by V on both sides,

00:24:38.840 --> 00:24:42.350
and I'll get V y equals x.

00:24:42.350 --> 00:24:44.180
So I can find my
least squares solution

00:24:44.180 --> 00:24:47.596
to the problem from the
singular value decomposition.

00:24:47.596 --> 00:24:49.220
So I can find the
least squares solution

00:24:49.220 --> 00:24:52.850
to both overdetermined and
underdetermined problems using

00:24:52.850 --> 00:24:55.694
singular value decomposition.

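NOTE The whole recipe (y = V^H x, p = U^H b, y_i = p_i / sigma_ii for i up to r, free y_i set to 0, then x = V y) as a short numpy sketch. The function name svd_lstsq and the relative tolerance used to decide which singular values count as zero are illustrative assumptions; numpy.linalg.lstsq and numpy.linalg.pinv are library routines that also return the minimum-norm least squares solution.
```python
import numpy as np
def svd_lstsq(A, b, rel_tol=1e-12):
    """Minimum-norm least squares solution of A x = b via A = U Sigma V^H.
    Returns x and the leftover residual, the sum of |p_i|^2 for i > r."""
    A = np.asarray(A)
    b = np.asarray(b)
    U, s, Vh = np.linalg.svd(A)                # full SVD: U is n-by-n, Vh is m-by-m
    p = U.conj().T @ b                         # new right-hand side p = U^H b
    r = int(np.sum(s > rel_tol * s[0]))        # numerical rank: count of non-zero sigmas
    y = np.zeros(A.shape[1], dtype=np.result_type(A, b, 1.0))
    y[:r] = p[:r] / s[:r]                      # y_i = p_i / sigma_ii for i = 1..r; others stay 0
    x = Vh.conj().T @ y                        # back to the original unknowns: x = V y
    residual = float(np.sum(np.abs(p[r:]) ** 2))
    return x, residual
```
For example, svd_lstsq(np.array([[1.0, 1.0]]), np.array([2.0])) should return x close to [1, 1] with zero residual, the minimum-norm solution of x_1 + x_2 = 2.
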
00:24:55.694 --> 00:24:57.110
It inherits all
the properties you

00:24:57.110 --> 00:24:58.670
know of solving the
normal equations,

00:24:58.670 --> 00:25:01.430
multiplying the entire
equation by A transpose,

00:25:01.430 --> 00:25:04.722
and solving for a least
squares solution that way.

00:25:04.722 --> 00:25:06.680
But that's only good for
overdetermined systems

00:25:06.680 --> 00:25:07.250
of equations.

00:25:07.250 --> 00:25:09.041
This can work for
underdetermined equations

00:25:09.041 --> 00:25:09.770
as well.

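NOTE A small numpy comparison, with hypothetical data, of the normal equations and an SVD-based solve. numpy.linalg.pinv computes the pseudoinverse from the SVD; for a full-column-rank overdetermined system both routes agree, while for an underdetermined system A^T A is singular and only the SVD route still gives a (minimum-norm) answer.
```python
import numpy as np
# Hypothetical overdetermined system: fit c0 + c1*t to four data points.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])     # 4 equations, 2 unknowns
b = np.array([1.0, 2.9, 5.1, 7.0])
x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations: A^T A x = A^T b
x_svd = np.linalg.pinv(A) @ b                 # SVD-based pseudoinverse solution
print(np.allclose(x_normal, x_svd))           # the two agree for this full-rank case
# Hypothetical underdetermined system: one equation, three unknowns.
A_under = np.array([[1.0, 1.0, 0.0]])
b_under = np.array([2.0])
print(np.linalg.matrix_rank(A_under.T @ A_under))  # rank 1 < 3, so A^T A is singular
print(np.linalg.pinv(A_under) @ b_under)           # minimum-norm solution, close to [1, 1, 0]
```
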
00:25:09.770 --> 00:25:12.450
And maybe we do have
external information

00:25:12.450 --> 00:25:15.080
that lets us specify these
other components somehow.

00:25:15.080 --> 00:25:17.420
Maybe we do a
separate optimization

00:25:17.420 --> 00:25:20.110
that chooses from all
possible solutions

00:25:20.110 --> 00:25:23.070
where these y i's are free,
and picks the best one subject

00:25:23.070 --> 00:25:25.890
to some other constraint.

00:25:25.890 --> 00:25:27.550
Does that make sense?

00:25:27.550 --> 00:25:29.010
OK, that's the
last decomposition

00:25:29.010 --> 00:25:32.070
we're going to talk about.

00:25:32.070 --> 00:25:34.890
It's as expensive to compute
the singular value decomposition

00:25:34.890 --> 00:25:36.810
as it is to solve a
system of equations.

00:25:36.810 --> 00:25:38.310
You might have
guessed that it's got

00:25:38.310 --> 00:25:40.470
an order N cubed flavor to it.

00:25:40.470 --> 00:25:42.660
It's kind of inescapable
that we run up

00:25:42.660 --> 00:25:45.920
against those
computational difficulties,

00:25:45.920 --> 00:25:48.530
order N cubed
computational complexity.

00:25:48.530 --> 00:25:50.730
And there are many problems
of practical interest,

00:25:50.730 --> 00:25:54.180
particularly solutions of
PDEs, for which that's not

00:25:54.180 --> 00:25:56.000
going to cut it.

00:25:56.000 --> 00:25:59.970
You couldn't solve
the problem with that sort

00:25:59.970 --> 00:26:01.330
of scaling in time.

00:26:01.330 --> 00:26:04.710
You couldn't compute the
Gaussian elimination,

00:26:04.710 --> 00:26:06.540
or the singular
value decomposition,

00:26:06.540 --> 00:26:08.970
or an eigenvalue decomposition.

00:26:08.970 --> 00:26:09.960
It won't work.

00:26:09.960 --> 00:26:14.520
And in those cases, we appeal
not to exact solution methods,

00:26:14.520 --> 00:26:17.010
but to approximate
solution methods.

00:26:17.010 --> 00:26:20.120
So instead of trying to
get an exact solution,

00:26:20.120 --> 00:26:22.080
we'll try to formulate
one that's good enough.

00:26:22.080 --> 00:26:24.390
We already know the computer
introduces numerical error

00:26:24.390 --> 00:26:26.070
anyway.

00:26:26.070 --> 00:26:28.440
Maybe we don't need machine
precision in our solution

00:26:28.440 --> 00:26:30.600
or something close to machine
precision in our solution.

00:26:30.600 --> 00:26:32.266
Maybe we're solving
an engineering problem,

00:26:32.266 --> 00:26:34.230
and we're willing to
accept relative errors

00:26:34.230 --> 00:26:37.770
on the order of 10 to the
minus 3 or 10 to the minus 5,

00:26:37.770 --> 00:26:41.550
some specified tolerance
that we apply to the problem.

00:26:41.550 --> 00:26:44.250
And in those circumstances,
we use iterative methods

00:26:44.250 --> 00:26:46.095
to solve systems of
equations instead

00:26:46.095 --> 00:26:49.020
of exact methods,
elimination methods,

00:26:49.020 --> 00:26:50.640
or matrix
decomposition methods.

00:26:53.830 --> 00:26:56.560
These algorithms are all
based on iterative refinement

00:26:56.560 --> 00:26:57.580
of an initial guess.

00:26:57.580 --> 00:26:59.260
So if we have some
system of equations

00:26:59.260 --> 00:27:02.350
we're trying to
solve, Ax equals b,

00:27:02.350 --> 00:27:05.410
we'll formulate some
linear map, right?

00:27:05.410 --> 00:27:09.910
x i plus 1 will be some
matrix C times x i

00:27:09.910 --> 00:27:13.540
plus some little vector c
where x i is my last best

00:27:13.540 --> 00:27:17.080
guess for the solution to
this problem, and x i plus 1

00:27:17.080 --> 00:27:20.320
is my next best guess for
the solution to this problem.

00:27:20.320 --> 00:27:24.700
And I'm hoping, as I apply
this map more and more times,

00:27:24.700 --> 00:27:27.010
I'm creeping closer
to the exact solution

00:27:27.010 --> 00:27:29.590
to the original
system of equations.

00:27:29.590 --> 00:27:33.610
The map will converge
when x i plus 1

00:27:33.610 --> 00:27:36.190
approaches x i, when the
map isn't making any changes

00:27:36.190 --> 00:27:39.250
to the vector anymore.

00:27:39.250 --> 00:27:48.490
And the converged value will
be a solution when x i--

00:27:48.490 --> 00:27:51.910
which is equal to I minus C,
inverse, times little c,

00:27:51.910 --> 00:27:54.370
if I replace x i plus
1 with x i up here,

00:27:54.370 --> 00:27:58.120
so I say that my map has
converged-- when this value is

00:27:58.120 --> 00:28:00.160
equal to A
inverse times b, when

00:28:00.160 --> 00:28:03.880
it's a solution to the
original problem, right?

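NOTE The convergence condition just described, written out as equations. This is a sketch assuming I - C and A are both invertible:
```latex
x_{i+1} = C\,x_i + c, \qquad
x_{i+1} = x_i = x^{\ast} \;\Rightarrow\; x^{\ast} = (I - C)^{-1} c,
\qquad
x^{\ast} \text{ solves } A x = b \iff (I - C)^{-1} c = A^{-1} b .
```
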
00:28:03.880 --> 00:28:05.480
So my map may converge.

00:28:05.480 --> 00:28:08.320
It may not converge to a
solution of the problem I like,

00:28:08.320 --> 00:28:09.790
but if it satisfies
this condition,

00:28:09.790 --> 00:28:12.730
then it has converged to
a solution of the problem

00:28:12.730 --> 00:28:14.270
that I like as well.

00:28:14.270 --> 00:28:18.400
And so it's all about choosing this
C here and this little c here

00:28:18.400 --> 00:28:22.000
so that this map converges
to a solution of the problem

00:28:22.000 --> 00:28:22.720
I'm after.

00:28:22.720 --> 00:28:25.000
And there are lots of
schemes for doing this.

00:28:25.000 --> 00:28:26.680
Some of them are kind of ad hoc.

00:28:26.680 --> 00:28:28.230
I'm going to show
you one right now.

00:28:28.230 --> 00:28:30.490
And then when we
do optimization,

00:28:30.490 --> 00:28:32.230
we'll talk about
a more formal way

00:28:32.230 --> 00:28:35.210
of doing this for
which you can guarantee

00:28:35.210 --> 00:28:37.937
very rapid convergence
to a solution.

00:28:37.937 --> 00:28:40.020
So here's a system of
equations I'd like to solve.

00:28:40.020 --> 00:28:41.510
It's not a very big one.

00:28:41.510 --> 00:28:44.050
It doesn't really make sense
to solve this one iteratively,

00:28:44.050 --> 00:28:45.940
but it's a nice illustration.

00:28:45.940 --> 00:28:49.570
One way to go about
formulating this map

00:28:49.570 --> 00:28:53.290
is to split this
matrix into two parts.

00:28:53.290 --> 00:28:57.160
So I'll split it into a diagonal
part and an off diagonal part.

00:28:57.160 --> 00:29:00.070
So I haven't changed the
problem at all by doing that.

00:29:00.070 --> 00:29:03.740
And then I'm going to
rename this x as x i plus 1,

00:29:03.740 --> 00:29:07.060
and I'm going to
rename this x as x i.

00:29:07.060 --> 00:29:09.454
And then move this
matrix vector product

00:29:09.454 --> 00:29:10.870
to the other side
of the equation.

00:29:10.870 --> 00:29:12.376
And here's my map.

00:29:12.376 --> 00:29:13.750
Of course, this
matrix multiplied

00:29:13.750 --> 00:29:16.210
doesn't make any-- it's
not useful to write it out

00:29:16.210 --> 00:29:16.780
explicitly.

00:29:16.780 --> 00:29:19.070
This is just identity.

00:29:19.070 --> 00:29:20.840
So I can drop this entirely.

00:29:20.840 --> 00:29:22.090
This is just x i plus one.

00:29:22.090 --> 00:29:23.320
So here's my map.

00:29:23.320 --> 00:29:27.430
Take an initial guess,
multiply it by this matrix,

00:29:27.430 --> 00:29:31.600
add the vector 1, 0, and repeat
over and over and over again.

00:29:31.600 --> 00:29:33.130
Hopefully-- we
don't really know--

00:29:33.130 --> 00:29:34.671
but hopefully, it's
going to converge

00:29:34.671 --> 00:29:38.502
to a solution of the
original linear equations.

00:29:38.502 --> 00:29:39.710
I didn't make up that method.

00:29:39.710 --> 00:29:42.490
That's a method called
Jacobi Iteration.

00:29:42.490 --> 00:29:46.280
And the strategy is to split
the matrix A into two parts--

00:29:46.280 --> 00:29:49.010
a sum of its diagonal
elements and its off-diagonal

00:29:49.010 --> 00:29:51.050
elements--

00:29:51.050 --> 00:29:54.410
and rewrite the original
equations as an iterative map.

00:29:54.410 --> 00:30:00.650
So D times x i plus 1 is equal
to minus r times x i plus b.

00:30:00.650 --> 00:30:07.600
Or x i plus 1 is D inverse
times the quantity minus r x i plus b.

00:30:07.600 --> 00:30:11.100
If the equations converge,
then D plus r times x i

00:30:11.100 --> 00:30:13.780
is equal to b, and we
will have found a solution.

00:30:13.780 --> 00:30:15.040
If it converges, right?

00:30:15.040 --> 00:30:18.729
If these iterations
approach a steady value.

00:30:18.729 --> 00:30:20.770
If they don't change from
iteration to iteration.

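NOTE A minimal numpy sketch of the Jacobi map x_{i+1} = D^{-1}(-R x_i + b). The function name, the fixed iteration count, and the 3-by-3 test matrix below are hypothetical choices, not the lecture's example; a practical solver would stop on a tolerance instead of a fixed count.
```python
import numpy as np
def jacobi(A, b, x0, n_iter):
    """Jacobi iteration for A x = b with the splitting A = D + R."""
    A = np.asarray(A, dtype=float)
    d = np.diag(A)                 # diagonal part D, stored as a vector
    R = A - np.diag(d)             # off-diagonal part R
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        # R @ x costs O(N^2) for a dense R (less if R is sparse);
        # applying D^{-1} is just an O(N) elementwise division.
        x = (b - R @ x) / d
    return x
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([6.0, 12.0, 14.0])    # chosen so that the exact solution is [1, 2, 3]
print(jacobi(A, b, np.zeros(3), 50))
```
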
00:30:21.940 --> 00:30:23.796
The nice thing about
the Jacobi method is it

00:30:23.796 --> 00:30:25.170
turns the hard
problem, the order

00:30:25.170 --> 00:30:29.920
N cubed problem of
computing A inverse B,

00:30:29.920 --> 00:30:31.750
into a succession
of easy problems,

00:30:31.750 --> 00:30:38.710
D inverse times some vector c.
How many calculations does it

00:30:38.710 --> 00:30:40.150
take to compute that D inverse?

00:30:44.710 --> 00:30:46.454
N, that's right, order N.

00:30:46.454 --> 00:30:47.620
It's just a diagonal matrix.

00:30:47.620 --> 00:30:49.880
I invert each of its diagonal
elements, and I'm done.

00:30:49.880 --> 00:30:53.790
So I went from order N cubed,
which was going to be hard,

00:30:53.790 --> 00:30:57.320
into a succession
of order N problems.

00:30:57.320 --> 00:30:59.730
So as long as it doesn't
take me order N squared

00:30:59.730 --> 00:31:02.879
iterations to get to the
solution that I want,

00:31:02.879 --> 00:31:03.670
I'm going to be OK.

00:31:03.670 --> 00:31:05.378
This is going to be
a viable way to solve

00:31:05.378 --> 00:31:07.690
this problem faster than
finding the exact solution.

00:31:13.240 --> 00:31:15.640
How do you know
that it converges?

00:31:15.640 --> 00:31:17.020
That's the question.

00:31:17.020 --> 00:31:20.290
Is this thing actually
going to converge or not,

00:31:20.290 --> 00:31:23.980
or are these iterations just
going to run on and on forever?

00:31:23.980 --> 00:31:26.620
Well, one way to check whether
it will converge or not

00:31:26.620 --> 00:31:31.220
is to go back up to this
equation here, and substitute b

00:31:31.220 --> 00:31:34.890
equals Ax, where x is the
exact solution to the problem.

00:31:37.420 --> 00:31:40.810
And you can transform,
then, this equation into one

00:31:40.810 --> 00:31:45.640
that looks like x i plus
1 minus x equals minus D

00:31:45.640 --> 00:31:49.930
inverse r times the quantity x i minus x.

00:31:49.930 --> 00:31:52.780
And if I take the
norm of both sides

00:31:52.780 --> 00:31:55.270
and I apply our
norm inequality--

00:31:55.270 --> 00:31:58.000
where the norm of a
matrix-vector product

00:31:58.000 --> 00:32:01.460
is no bigger than the product
of the norm of the matrix

00:32:01.460 --> 00:32:02.860
and the norm of the vector--

00:32:02.860 --> 00:32:06.070
then I can get a
ratio like this.

00:32:06.070 --> 00:32:10.060
So the absolute error
in iteration i plus 1

00:32:10.060 --> 00:32:13.090
divided by the absolute
error in iteration i

00:32:13.090 --> 00:32:17.410
is no bigger than the
norm of this matrix.

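NOTE The error argument above as a worked equation, a sketch that uses b = A x = (D + R) x for the exact solution x:
```latex
D\,x_{i+1} = -R\,x_i + (D + R)\,x
\;\Rightarrow\;
x_{i+1} - x = -D^{-1} R\,(x_i - x)
\;\Rightarrow\;
\frac{\lVert x_{i+1} - x \rVert}{\lVert x_i - x \rVert} \le \lVert D^{-1} R \rVert .
```
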
00:32:17.410 --> 00:32:21.810
So if I'm converging,
then what I expect

00:32:21.810 --> 00:32:25.320
is this ratio should
be smaller than 1.

00:32:25.320 --> 00:32:27.240
The error in my
next approximation

00:32:27.240 --> 00:32:30.080
should be smaller than the error
in my current approximation.

00:32:30.080 --> 00:32:31.450
That makes sense?

00:32:31.450 --> 00:32:35.100
So that means that I would hope
that the norm of this matrix

00:32:35.100 --> 00:32:36.690
is also smaller than 1.

00:32:36.690 --> 00:32:40.290
If it is, then I'm going to
be guaranteed to converge.

00:32:40.290 --> 00:32:42.912
So for a particular
coefficient matrix,

00:32:42.912 --> 00:32:45.120
for a system of linear
equations I'm trying to solve,

00:32:45.120 --> 00:32:47.010
I may be able to find--

00:32:47.010 --> 00:32:50.430
I may find that this is true.

00:32:50.430 --> 00:32:51.960
And then I can
apply this method,

00:32:51.960 --> 00:32:54.980
and I'll converge to a solution.

00:32:54.980 --> 00:32:58.100
We call this sort of
convergence linear.

00:32:58.100 --> 00:32:59.990
Whatever this number
is, it tells me

00:32:59.990 --> 00:33:03.890
the fraction by which the
error is reduced from iteration

00:33:03.890 --> 00:33:04.940
to iteration.

00:33:04.940 --> 00:33:07.810
So suppose this is 1/10.

00:33:07.810 --> 00:33:11.510
Then the absolute error is going
to be reduced by a factor of 10

00:33:11.510 --> 00:33:14.300
in each iteration.

00:33:14.300 --> 00:33:15.720
It's not going to
be 1/10 usually.

00:33:15.720 --> 00:33:17.095
It's going to be
something that's

00:33:17.095 --> 00:33:20.201
a little bit bigger than that
typically, but that's the idea.

00:33:20.201 --> 00:33:22.700
You can show-- I would encourage
you to try to work this out

00:33:22.700 --> 00:33:25.560
on your own-- but you can show
that the infinity norm of this

00:33:25.560 --> 00:33:26.060
product--

00:33:29.060 --> 00:33:33.050
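infinity norm of this
product is equal to this: the largest, over all rows, of the sum of the absolute values of the off-diagonal elements divided by the absolute value of the diagonal element.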

00:33:33.050 --> 00:33:36.040
And if I ask that the
infinity norm of this product

00:33:36.040 --> 00:33:38.540
be smaller than 1,
that's guaranteed

00:33:38.540 --> 00:33:41.450
when the diagonal values of
the matrix in absolute value

00:33:41.450 --> 00:33:43.850
are bigger than
the sum of the off

00:33:43.850 --> 00:33:46.670
diagonal values in every row
(or, by a similar argument, in every column).

00:33:46.670 --> 00:33:49.187
And that kind of matrix we
call diagonally dominant.

00:33:49.187 --> 00:33:51.770
The diagonal values are bigger
than the sum in absolute value

00:33:51.770 --> 00:33:53.570
of the off diagonal pieces.

00:33:53.570 --> 00:33:57.290
So diagonally dominant matrices,
which come up quite often,

00:33:57.290 --> 00:33:59.240
can be-- those linear
equations based

00:33:59.240 --> 00:34:02.660
on those matrices can be solved
with reasonable efficiency using

00:34:02.660 --> 00:34:04.262
the Jacobi method.

00:34:04.262 --> 00:34:05.720
There are better
methods to choose.

00:34:05.720 --> 00:34:07.350
I'll show you one in a second.

00:34:07.350 --> 00:34:08.900
But you can guarantee
that this is

00:34:08.900 --> 00:34:11.360
going to converge to a solution,
and that the solution will

00:34:11.360 --> 00:34:13.219
be the right solution to
the linear equations you

00:34:13.219 --> 00:34:14.094
were trying to solve.

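NOTE A short numpy sketch of the infinity-norm check for Jacobi, computed row by row. The helper names and the test matrix are hypothetical. A value below 1 is sufficient for convergence but not necessary: some matrices with a value of exactly 1 still converge.
```python
import numpy as np
def jacobi_inf_norm(A):
    """Infinity norm of D^{-1} R: the largest, over rows i, of sum_{j != i} |a_ij| / |a_ii|."""
    A = np.asarray(A, dtype=float)
    d = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - d          # row sums of |off-diagonal| entries
    return float(np.max(off / d))
def is_row_diagonally_dominant(A):
    """True when |a_ii| > sum_{j != i} |a_ij| in every row."""
    A = np.asarray(A, dtype=float)
    d = np.abs(np.diag(A))
    return bool(np.all(d > np.abs(A).sum(axis=1) - d))
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])   # hypothetical matrix
print(is_row_diagonally_dominant(A), jacobi_inf_norm(A))           # True 0.5
```
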
00:34:19.440 --> 00:34:23.010
So if the goal is just to
turn hard problems into easier

00:34:23.010 --> 00:34:25.500
to solve problems, then
there are other natural ways

00:34:25.500 --> 00:34:27.540
to want to split a matrix.

00:34:27.540 --> 00:34:32.100
So maybe you want to split into
a lower triangular part which

00:34:32.100 --> 00:34:34.829
contains the diagonal
elements of A,

00:34:34.829 --> 00:34:36.449
and an upper
triangular part which

00:34:36.449 --> 00:34:41.130
has no diagonal elements of A.
We just split this thing apart.

00:34:41.130 --> 00:34:43.370
And then we could rewrite
our system of equations

00:34:43.370 --> 00:34:45.750
as an iterative map
like this, L times x i

00:34:45.750 --> 00:34:50.429
plus 1 is minus U
times x i plus b.

00:34:50.429 --> 00:34:53.550
All I have to do is invert
L to find my next iteration.

00:34:53.550 --> 00:34:55.590
And how expensive
computationally

00:34:55.590 --> 00:35:00.360
is it to solve a system of
equations which is triangular?

00:35:00.360 --> 00:35:02.630
This is a process we
call forward substitution, the lower triangular counterpart of back substitution.

00:35:02.630 --> 00:35:03.540
It's order--

00:35:03.540 --> 00:35:04.470
AUDIENCE: N squared.

00:35:04.470 --> 00:35:05.820
JAMES W. SWAN: --N squared.

00:35:05.820 --> 00:35:08.160
So we still beat N cubed.

00:35:08.160 --> 00:35:11.640
One would hope that it doesn't
require too many iterations

00:35:11.640 --> 00:35:12.420
to do this.

00:35:12.420 --> 00:35:15.160
But in principle, we can do
these order N squared operations

00:35:15.160 --> 00:35:16.500
many times.

00:35:16.500 --> 00:35:18.840
And it'll turn out
that this sort of a map

00:35:18.840 --> 00:35:22.080
converges to the solution
that we're after.

00:35:22.080 --> 00:35:24.750
It converges when matrices
are either diagonally dominant

00:35:24.750 --> 00:35:28.470
as before, or they're symmetric
and they're positive definite.

00:35:28.470 --> 00:35:31.680
Positive definite means all
the eigenvalues of the matrix

00:35:31.680 --> 00:35:33.240
are bigger than 0.

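NOTE A minimal numpy sketch of the Gauss-Seidel map L x_{i+1} = -U x_i + b, with L the lower triangle of A including the diagonal and U the strictly upper triangle. The triangular solve is written out as forward substitution to show the O(N^2) cost per iteration; the function names and the symmetric positive definite test matrix are illustrative assumptions.
```python
import numpy as np
def forward_substitution(L, v):
    """Solve L z = v for lower triangular L, row by row, in O(N^2) operations."""
    n = len(v)
    z = np.zeros(n)
    for k in range(n):
        z[k] = (v[k] - L[k, :k] @ z[:k]) / L[k, k]
    return z
def gauss_seidel(A, b, x0, n_iter):
    """Gauss-Seidel iteration for A x = b with the splitting A = L + U."""
    A = np.asarray(A, dtype=float)
    L = np.tril(A)          # lower triangle, diagonal included
    U = np.triu(A, k=1)     # strictly upper triangle, no diagonal
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = forward_substitution(L, b - U @ x)
    return x
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])  # symmetric positive definite
b = np.array([6.0, 12.0, 14.0])                                    # exact solution [1, 2, 3]
print(gauss_seidel(A, b, np.zeros(3), 30))
```
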
00:35:40.680 --> 00:35:43.292
So let's try these iterative
methods on some equations

00:35:43.292 --> 00:35:44.250
and see how they converge.

00:35:44.250 --> 00:35:44.749
Yes?

00:35:44.749 --> 00:35:47.460
AUDIENCE: How do you justify
ignoring the diagonal elements

00:35:47.460 --> 00:35:48.400
in that method?

00:35:50.862 --> 00:35:52.320
JAMES W. SWAN: So
the question was,

00:35:52.320 --> 00:35:54.960
how do you justify ignoring
the diagonal elements

00:35:54.960 --> 00:35:56.102
in this method.

00:35:56.102 --> 00:35:57.810
Maybe I was going too
fast or I misspoke.

00:35:57.810 --> 00:36:01.980
So I'm going to split A into
a lower triangular matrix that

00:36:01.980 --> 00:36:05.250
has all the diagonal
elements, and U

00:36:05.250 --> 00:36:08.125
is the upper part with none of
those diagonal elements on it.

00:36:08.125 --> 00:36:09.000
Does that make sense?

00:36:09.000 --> 00:36:09.750
AUDIENCE: Yeah.

00:36:09.750 --> 00:36:11.180
JAMES W. SWAN: Thank you
for asking that question.

00:36:11.180 --> 00:36:12.430
I hope that's clear.

00:36:12.430 --> 00:36:16.660
L holds onto the diagonal
pieces and U takes those away.

00:36:19.790 --> 00:36:20.490
So let's try it.

00:36:20.490 --> 00:36:23.310
On a matrix like this,
the exact solution

00:36:23.310 --> 00:36:27.210
to this system of equations
is 3/4, 1/2, and 1/4.

00:36:27.210 --> 00:36:28.980
All right, we'll
try Jacobi, we'll

00:36:28.980 --> 00:36:31.530
have to give it some initial
guess for the solution, right?

00:36:31.530 --> 00:36:33.450
We'll talk about
places where you

00:36:33.450 --> 00:36:37.160
can get those initial guesses
later on in the course,

00:36:37.160 --> 00:36:39.300
but we have to start
the iterative process

00:36:39.300 --> 00:36:41.730
with some guess
at the solutions.

00:36:41.730 --> 00:36:43.380
So here's an initial guess.

00:36:43.380 --> 00:36:44.550
We'll apply this map.

00:36:44.550 --> 00:36:47.010
Here's Gauss-Seidel with
the same initial guess,

00:36:47.010 --> 00:36:48.670
and we'll apply this map.

00:36:48.670 --> 00:36:52.120
They're both
linearly convergent,

00:36:52.120 --> 00:36:53.700
so the relative
error will go down

00:36:53.700 --> 00:36:57.660
by a fixed factor
after each iteration.

00:36:57.660 --> 00:37:00.470
Iteration one, the relative
error in Jacobi will be 38%.

00:37:00.470 --> 00:37:02.910
In Gauss-Seidel, it'll be 40%.

00:37:02.910 --> 00:37:05.190
If we apply this all the
way down to 10 iterations,

00:37:05.190 --> 00:37:09.020
the relative error in Jacobi will
be 1.7%, and the relative error

00:37:09.020 --> 00:37:11.360
in Gauss-Seidel will be 0.08%.

00:37:11.360 --> 00:37:13.320
And we can go on and on
with these iterations

00:37:13.320 --> 00:37:16.740
if we want until we get
sufficiently converged, we

00:37:16.740 --> 00:37:19.320
get to a point where the
relative error is small enough

00:37:19.320 --> 00:37:22.590
that we're happy to accept
this answer as a solution

00:37:22.590 --> 00:37:24.660
to our system of equations.

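NOTE A sketch comparing the two methods side by side. The lecture's matrix and initial guess are not in the transcript, so this uses a hypothetical tridiagonal matrix whose exact solution happens to be (3/4, 1/2, 1/4) and an assumed initial guess of zero; the printed percentages need not match the lecture's numbers. np.linalg.solve is used for brevity and does not exploit the diagonal or triangular structure.
```python
import numpy as np
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])  # hypothetical matrix
b = np.array([1.0, 0.0, 0.0])
x_exact = np.array([0.75, 0.5, 0.25])
def rel_err(x):
    """Relative error in the two-norm against the known exact solution."""
    return np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)
D = np.diag(np.diag(A)); R = A - D          # Jacobi splitting A = D + R
L = np.tril(A); U = np.triu(A, k=1)         # Gauss-Seidel splitting A = L + U
x_j = np.zeros(3); x_gs = np.zeros(3)       # assumed initial guess
for i in range(1, 11):
    x_j = np.linalg.solve(D, b - R @ x_j)   # Jacobi step: D x_{i+1} = -R x_i + b
    x_gs = np.linalg.solve(L, b - U @ x_gs) # Gauss-Seidel step: L x_{i+1} = -U x_i + b
    print(f"iteration {i:2d}: Jacobi {rel_err(x_j):.2%}, Gauss-Seidel {rel_err(x_gs):.2%}")
```
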
00:37:24.660 --> 00:37:28.450
So we traded the burden of doing
all these calculations to do

00:37:28.450 --> 00:37:34.080
elimination for a faster,
less computationally complex

00:37:34.080 --> 00:37:35.250
methodology.

00:37:35.250 --> 00:37:38.620
But the trade off was we don't
get an exact solution anymore.

00:37:38.620 --> 00:37:40.890
We're going to have finite
precision in the result,

00:37:40.890 --> 00:37:42.750
and we have to
specify the tolerance

00:37:42.750 --> 00:37:44.541
that we want to converge to.

00:37:44.541 --> 00:37:47.040
We're going to see now-- this
is the hook into the next part

00:37:47.040 --> 00:37:47.860
of the class--

00:37:47.860 --> 00:37:50.070
we're going to talk about
solutions of nonlinear

00:37:50.070 --> 00:37:52.470
equations next, and
there are

00:37:52.470 --> 00:37:55.140
almost no nonlinear equations
that we can solve exactly.

00:37:55.140 --> 00:37:58.945
They all have to be solved
using these iterative methods.

00:37:58.945 --> 00:38:01.320
You can use these iterative
methods for linear equations.

00:38:01.320 --> 00:38:03.079
It's very common
to do it this way.

00:38:03.079 --> 00:38:04.620
In my group, we
solve lots of systems

00:38:04.620 --> 00:38:07.860
of linear equations associated
with hydrodynamic problems.

00:38:07.860 --> 00:38:10.670
These come up when
you're talking about,

00:38:10.670 --> 00:38:12.510
say, low Reynolds
number flows, which

00:38:12.510 --> 00:38:15.420
are linear sorts of
fluid flow problems.

00:38:15.420 --> 00:38:15.990
They're big.

00:38:15.990 --> 00:38:18.050
It's really hard to do
Gaussian elimination,

00:38:18.050 --> 00:38:19.910
so you apply different
iterative methods.

00:38:19.910 --> 00:38:20.910
You can do Gauss-Seidel.

00:38:20.910 --> 00:38:22.120
You can do Jacobi.

00:38:22.120 --> 00:38:23.790
We'll learn about
more advanced ones

00:38:23.790 --> 00:38:26.880
like PCG, which you're
applying on your homework now,

00:38:26.880 --> 00:38:29.970
and you should be seeing that
it converges relatively quickly

00:38:29.970 --> 00:38:32.176
in cases where exact
elimination doesn't work.

00:38:32.176 --> 00:38:34.050
We'll learn, actually,
how to do that method.

00:38:34.050 --> 00:38:35.758
That's one that we
apply in my own group.

00:38:35.758 --> 00:38:37.870
It's pretty common
to use out there.

00:38:37.870 --> 00:38:38.370
Yes?

00:38:38.370 --> 00:38:43.440
AUDIENCE: One question, is
that that Gauss, [INAUDIBLE]

00:38:43.440 --> 00:38:45.930
JAMES W. SWAN: Order N squared.

00:38:45.930 --> 00:38:47.708
AUDIENCE: Yeah,
that's what I meant.

00:38:47.708 --> 00:38:50.148
So now we've got an [INAUDIBLE].

00:38:50.148 --> 00:38:54.407
So we basically have
[INAUDIBLE] iterations, right?

00:38:54.407 --> 00:38:56.240
JAMES W. SWAN: This is
a wonderful question.

00:38:56.240 --> 00:39:00.120
So this is a pathological
problem in the sense

00:39:00.120 --> 00:39:03.300
that it requires a
lot of calculations

00:39:03.300 --> 00:39:05.550
to get an iterative
solution here.

00:39:05.550 --> 00:39:07.110
We haven't gotten
to an N that's

00:39:07.110 --> 00:39:10.290
big enough that the
computational complexities

00:39:10.290 --> 00:39:11.700
cross over.

00:39:11.700 --> 00:39:15.810
So for small Ns, probably
the factor in front of N--

00:39:15.810 --> 00:39:17.200
whatever number that is--

00:39:17.200 --> 00:39:19.000
and maybe even the
smaller factors,

00:39:19.000 --> 00:39:20.940
order N squared
factors on that order

00:39:20.940 --> 00:39:22.440
N cubed, play a big
role in how long

00:39:22.440 --> 00:39:24.660
it takes to actually
complete this thing.

00:39:24.660 --> 00:39:28.140
But modern problems are so
big that we almost always

00:39:28.140 --> 00:39:30.480
are running out to Ns
that are large enough

00:39:30.480 --> 00:39:31.590
that we see a crossover.

00:39:31.590 --> 00:39:34.110
You'll see this in your
homework this week.

00:39:34.110 --> 00:39:36.030
You won't see this
crossover at N equals 3.

00:39:36.030 --> 00:39:37.613
You're going to see
it out at N equals

00:39:37.613 --> 00:39:41.336
500 or 1,200, big problems.

00:39:41.336 --> 00:39:43.210
Then we're going to
encounter this crossover.

00:39:43.210 --> 00:39:44.376
That's a wonderful question.

00:39:44.376 --> 00:39:49.110
So for small system
sizes, iterative methods

00:39:49.110 --> 00:39:50.750
maybe don't buy you much.

00:39:50.750 --> 00:39:53.000
I suppose it depends on the
application though, right?

00:39:53.000 --> 00:39:56.430
If you're doing something
that involves solving problems

00:39:56.430 --> 00:40:01.420
on embedded hardware, in some
sort of sensor or control

00:40:01.420 --> 00:40:04.230
valve, there may be
very limited memory

00:40:04.230 --> 00:40:06.290
or computational capacity
available to you.

00:40:06.290 --> 00:40:08.490
And you may actually
apply an iterative method

00:40:08.490 --> 00:40:12.120
like this to a problem
that that controller

00:40:12.120 --> 00:40:15.330
needs to solve, for example.

00:40:15.330 --> 00:40:17.370
It just may not
have the capability

00:40:17.370 --> 00:40:20.880
of storing and inverting what
we would consider, today,

00:40:20.880 --> 00:40:25.350
a relatively small matrix
because the hardware doesn't

00:40:25.350 --> 00:40:26.670
have that sort of capability.

00:40:26.670 --> 00:40:28.890
So there could be
cases where you

00:40:28.890 --> 00:40:32.250
might choose something
that's slower but feasible,

00:40:32.250 --> 00:40:34.650
versus something that's
faster and exact,

00:40:34.650 --> 00:40:36.990
because there are
other constraints.

00:40:36.990 --> 00:40:40.560
They do exist, but modern
computers are pretty efficient.

00:40:40.560 --> 00:40:43.590
Your cell phone is faster
than the fastest computers

00:40:43.590 --> 00:40:46.410
in the world 20 years ago.

00:40:46.410 --> 00:40:47.550
We're doing OK.

00:40:47.550 --> 00:40:49.890
So we've got to get out to
big system sizes, big problem

00:40:49.890 --> 00:40:52.060
sizes, before this
starts to pay off.

00:40:52.060 --> 00:40:55.210
But it does for many
practical problems.

00:40:55.210 --> 00:40:59.410
OK I'll close with this, because
this is the hook into solving

00:40:59.410 --> 00:41:00.520
nonlinear equations.

00:41:03.537 --> 00:41:05.370
So I showed you these
two iterative methods,

00:41:05.370 --> 00:41:07.830
and they kind of had
stringent requirements

00:41:07.830 --> 00:41:10.770
for when they were actually
going to converge, right?

00:41:10.770 --> 00:41:13.350
I had to have a diagonally
dominant system of equations

00:41:13.350 --> 00:41:15.460
for Jacobi to converge.

00:41:15.460 --> 00:41:18.720
I had to have diagonal dominance
or symmetric positive definite

00:41:18.720 --> 00:41:19.260
matrices.

00:41:19.260 --> 00:41:20.730
These things exist
and they come up

00:41:20.730 --> 00:41:22.188
in lots of physical
problems, but I

00:41:22.188 --> 00:41:25.050
had to have it in order for
Gauss-Seidel to converge.

00:41:25.050 --> 00:41:26.700
What if I have a
system of equations

00:41:26.700 --> 00:41:28.194
that doesn't work that way?

00:41:28.194 --> 00:41:29.610
Or what if I have
an iterative map

00:41:29.610 --> 00:41:33.900
that I like for some reason, but
it doesn't appear to converge?

00:41:33.900 --> 00:41:37.270
Maybe it converges under some
circumstances, but not others.

00:41:37.270 --> 00:41:40.140
Well, there's a way to modify
these iterative maps, called

00:41:40.140 --> 00:41:43.380
successive
over-relaxation, which

00:41:43.380 --> 00:41:45.977
can help promote convergence.

00:41:45.977 --> 00:41:48.060
So suppose we have an
iterative map like this, x i

00:41:48.060 --> 00:41:51.897
plus 1 is some function of
the previous iteration value.

00:41:51.897 --> 00:41:52.980
Doesn't matter what it is.

00:41:52.980 --> 00:41:54.646
It could be linear,
could be non-linear.

00:41:54.646 --> 00:41:57.760
We don't actually care.

00:41:57.760 --> 00:42:00.610
The sought after solution
is found when x i plus 1

00:42:00.610 --> 00:42:01.630
is equal to x i.

00:42:01.630 --> 00:42:03.540
So this map is one that
converges

00:42:03.540 --> 00:42:06.250
to the exact solution of
the problem that we want.

00:42:06.250 --> 00:42:08.650
We've somehow guaranteed
that that's the case,

00:42:08.650 --> 00:42:10.990
but it has to converge.

00:42:10.990 --> 00:42:14.890
One way to modify that
map is to say x i plus 1

00:42:14.890 --> 00:42:17.890
is 1 minus some
scalar value omega

00:42:17.890 --> 00:42:23.020
times x i plus omega times f.

00:42:23.020 --> 00:42:26.350
You can confirm that if you
substitute x i plus 1 equals

00:42:26.350 --> 00:42:28.900
x i into this equation,
you'll come up

00:42:28.900 --> 00:42:33.940
with the same fixed points
as this iterative map:

00:42:33.940 --> 00:42:36.500
x i is equal to f of x i.

00:42:36.500 --> 00:42:40.210
So you haven't changed what
value it will converge to,

00:42:40.210 --> 00:42:42.460
but you've affected the
rate at which it converges.
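
NOTE
The two claims just made can be checked directly. Writing g for the relaxed map and x^* for a fixed point (notation introduced here, not the lecture's), and assuming a scalar, differentiable f, substitution shows the fixed points are unchanged for any nonzero omega, while the derivative of g, which sets the local convergence rate, does depend on omega:
```latex
g(x) = (1-\omega)\,x + \omega\,f(x), \qquad x^* = g(x^*) \iff \omega\,x^* = \omega\,f(x^*) \iff x^* = f(x^*) \quad (\omega \neq 0),
g'(x^*) = 1 - \omega + \omega\,f'(x^*), \qquad |g'(x^*)| < 1 \;\Rightarrow\; x_i \to x^* \text{ for } x_0 \text{ close enough to } x^*.
```
For example, if f'(x^*) = -2, plain iteration (omega = 1) has |g'| = 2 and diverges, while omega = 0.5 gives g' = -0.5 and converges.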

00:42:42.460 --> 00:42:45.640
Here you're saying x i
plus 1 is some fraction

00:42:45.640 --> 00:42:50.290
of my previous solution plus
some fraction of this f.

00:42:50.290 --> 00:42:53.890
And I get to control how big
those different fractions are.

00:42:53.890 --> 00:42:58.410
So if things aren't converging
well for a map like this,

00:42:58.410 --> 00:43:00.700
then I could try
successive over-relaxation,

00:43:00.700 --> 00:43:03.400
and I could adjust this
relaxation parameter

00:43:03.400 --> 00:43:07.890
to be some fraction, some
number between 0 and 1,

00:43:07.890 --> 00:43:10.050
until I start to
observe convergence.
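
NOTE
A minimal Python sketch of this relaxed iteration, assuming a hypothetical scalar map f(x) = 3 - 2x whose fixed point is x = 1 but whose slope of -2 makes plain iteration diverge. The function name relaxed_iteration and the tolerances are illustrative choices, not from the lecture.
```python
def relaxed_iteration(f, x0, omega, tol=1e-12, max_iter=1000):
    """Iterate x <- (1 - omega) * x + omega * f(x); return (x, iterations, converged)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = (1.0 - omega) * x + omega * f(x)
        if abs(x_new - x) < tol:
            return x_new, k, True
        x = x_new
    return x, max_iter, False
def f(x):
    # Hypothetical map: fixed point x = 1, but |f'(x)| = 2 > 1, so plain iteration diverges
    return 3.0 - 2.0 * x
x_plain, _, ok_plain = relaxed_iteration(f, x0=0.0, omega=1.0, max_iter=50)
x_relax, n_relax, ok_relax = relaxed_iteration(f, x0=0.0, omega=0.5)
print(ok_plain, abs(x_plain) > 1e10)  # prints: False True
print(ok_relax, round(x_relax, 10))   # prints: True 1.0
```
With omega = 0.5 the relaxed map becomes x <- 1.5 - 0.5 x, which contracts toward 1 by a factor of 0.5 per step.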

00:43:10.050 --> 00:43:11.550
And there are some
rules one can use

00:43:11.550 --> 00:43:13.383
to try to promote
convergence with this kind

00:43:13.383 --> 00:43:15.204
of successive over-relaxation.

00:43:15.204 --> 00:43:17.370
This is a very generic
technique that one can apply.

00:43:17.370 --> 00:43:20.700
If you have any iterative
map you're trying to apply

00:43:20.700 --> 00:43:22.840
that should go to the
solution you want

00:43:22.840 --> 00:43:24.570
but doesn't converge
for some reason,

00:43:24.570 --> 00:43:27.630
then you can use this
relaxation technique to promote

00:43:27.630 --> 00:43:29.250
convergence to the solution.

00:43:29.250 --> 00:43:31.770
You may slow the
convergence way down.

00:43:31.770 --> 00:43:34.680
It may be very slow to
converge, but it will converge.

00:43:34.680 --> 00:43:36.690
And after all, an
answer is better

00:43:36.690 --> 00:43:38.940
than no answer, no matter
how long it takes to get it.

00:43:38.940 --> 00:43:40.740
So sometimes you've
got to get these things

00:43:40.740 --> 00:43:43.670
by hook or by crook.

00:43:43.670 --> 00:43:45.710
So for example, you can
apply this to Jacobi.

00:43:45.710 --> 00:43:48.514
This was the
original Jacobi map.

00:43:48.514 --> 00:43:49.430
And we just take that.

00:43:49.430 --> 00:43:53.640
We add 1 minus omega
times x i plus omega

00:43:53.640 --> 00:43:55.680
times this factor over here.

00:43:55.680 --> 00:43:59.300
And now we can choose omega so
that this solution converges.

00:43:59.300 --> 00:44:02.360
We can always make omega
small enough so

00:44:02.360 --> 00:44:04.550
that the diagonal
values of our matrix

00:44:04.550 --> 00:44:07.220
appear big enough
that the matrix looks

00:44:07.220 --> 00:44:09.770
like it's diagonally dominant.

00:44:09.770 --> 00:44:12.230
You could go back to that
same convergence analysis

00:44:12.230 --> 00:44:14.510
that I showed you before
and try to apply it

00:44:14.510 --> 00:44:19.252
to this over-relaxation
form of Jacobi and see that

00:44:19.252 --> 00:44:21.710
there's always going to
be some value of omega that's

00:44:21.710 --> 00:44:25.420
small enough that this
thing will converge.

00:44:25.420 --> 00:44:27.920
It will look effectively
diagonally dominant,

00:44:27.920 --> 00:44:32.150
because omega inverse
times D will be big enough,

00:44:32.150 --> 00:44:34.865
or omega times D inverse
will be small enough.
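
NOTE
A pure-Python sketch of relaxed (weighted) Jacobi. The 3x3 test matrix is a hypothetical example: symmetric positive definite with unit diagonal and off-diagonal entries 0.8, so it is not diagonally dominant and plain Jacobi diverges. Because its diagonal is the identity, the relaxed iteration matrix I - omega D^{-1} A has eigenvalues 1 - 2.6 omega and 1 - 0.2 omega, so any 0 < omega < 2/2.6 converges, and omega = 0.5 gives a slow convergence factor of 0.9. The small-omega argument relies on the eigenvalues of D^{-1} A having positive real parts, as they do for symmetric positive definite matrices; this sketch assumes that case.
```python
def weighted_jacobi(A, b, omega, tol=1e-10, max_iter=10_000):
    """Relaxed Jacobi: x <- (1 - omega) * x + omega * D^{-1} (b - (L + U) x)."""
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        # Plain Jacobi update: every component uses only the previous iterate
        x_jac = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        # Relaxation: keep a fraction (1 - omega) of the old iterate
        x_new = [(1.0 - omega) * xo + omega * xj for xo, xj in zip(x, x_jac)]
        step = max(abs(p - q) for p, q in zip(x_new, x))
        x = x_new
        if step < tol:
            return x, k, True
    return x, max_iter, False
A = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.8],
     [0.8, 0.8, 1.0]]  # hypothetical SPD matrix, not diagonally dominant
b = [1.0, 2.0, 3.0]
x_plain, _, ok_plain = weighted_jacobi(A, b, omega=1.0, max_iter=60)
x_relax, n_relax, ok_relax = weighted_jacobi(A, b, omega=0.5)
residual = max(abs(sum(A[i][j] * x_relax[j] for j in range(3)) - b[i]) for i in range(3))
print(ok_plain, max(abs(v) for v in x_plain) > 1e6)  # prints: False True
print(ok_relax, residual < 1e-6)                     # prints: True True
```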

00:44:34.865 --> 00:44:36.619
Does that make sense?

00:44:36.619 --> 00:44:39.160
You can apply the same sort of
damping method to Gauss-Seidel

00:44:39.160 --> 00:44:39.660
as well.

00:44:39.660 --> 00:44:42.360
It's very common to do this.

00:44:42.360 --> 00:44:45.560
The relaxation parameter acts
like an effective increase

00:44:45.560 --> 00:44:47.692
in the eigenvalues
of the matrix.

00:44:47.692 --> 00:44:50.150
So you can think about L. That's
a lower triangular matrix.

00:44:50.150 --> 00:44:54.310
Its diagonal values
are its eigenvalues.

00:44:54.310 --> 00:44:56.190
The diagonal values
of L inverse--

00:44:56.190 --> 00:44:59.000
well, 1 over those
diagonal values

00:44:59.000 --> 00:45:00.820
are the eigenvalues
of L inverse.

00:45:00.820 --> 00:45:03.560
And so if we make
omega very small,

00:45:03.560 --> 00:45:06.630
then we make the eigenvalues
of L inverse very small,

00:45:06.630 --> 00:45:08.510
or the eigenvalues
of L very big.

00:45:08.510 --> 00:45:12.170
And again, the matrix starts
to look diagonally dominant.

00:45:12.170 --> 00:45:15.330
And you can promote
convergence in this way.
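
NOTE
A pure-Python sketch of the same damping applied to a whole Gauss-Seidel sweep, x <- (1 - omega) x + omega GS(x). The 2x2 system is a hypothetical example whose Gauss-Seidel iteration matrix has eigenvalues 0 and -2, so plain Gauss-Seidel diverges; the damped iteration matrix has eigenvalues 1 - omega and 1 - 3 omega, so omega = 0.5 converges to the solution (1, 1). Two caveats: damping helps here because the offending eigenvalue is negative (if it were real and greater than 1, no omega between 0 and 1 would fix it), and textbook SOR instead relaxes each component inside the sweep, which is a related but different iteration.
```python
def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep: x[i] is computed from the newest values of x[0..i-1]."""
    n = len(b)
    x = list(x)  # copy, so the caller's iterate is not modified
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x
def damped_gauss_seidel(A, b, omega, tol=1e-10, max_iter=10_000):
    """Relax the whole sweep: x <- (1 - omega) * x + omega * GS(x)."""
    x = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        x_gs = gauss_seidel_sweep(A, b, x)
        x_new = [(1.0 - omega) * xo + omega * xg for xo, xg in zip(x, x_gs)]
        step = max(abs(p - q) for p, q in zip(x_new, x))
        x = x_new
        if step < tol:
            return x, k, True
    return x, max_iter, False
A = [[1.0, 2.0],
     [-1.0, 1.0]]  # hypothetical system with solution x = (1, 1)
b = [3.0, 0.0]
x_plain, _, ok_plain = damped_gauss_seidel(A, b, omega=1.0, max_iter=40)
x_damp, n_damp, ok_damp = damped_gauss_seidel(A, b, omega=0.5)
print(ok_plain, max(abs(v) for v in x_plain) > 1e6)  # prints: False True
print(ok_damp, [round(v, 6) for v in x_damp])        # prints: True [1.0, 1.0]
```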

00:45:15.330 --> 00:45:17.960
So even though this
may be slow, you

00:45:17.960 --> 00:45:19.640
can use it to
guarantee convergence

00:45:19.640 --> 00:45:22.299
of some iterative procedures,
not just for linear equations,

00:45:22.299 --> 00:45:23.840
but for non-linear
equations as well.

00:45:23.840 --> 00:45:25.370
And we'll see,
there are good ways

00:45:25.370 --> 00:45:27.230
of choosing omega
for certain classes

00:45:27.230 --> 00:45:29.010
of non-linear equations.

00:45:29.010 --> 00:45:30.440
We'll apply
Newton-Raphson method,

00:45:30.440 --> 00:45:33.590
and then we'll damp it using
exactly this sort of procedure.

00:45:33.590 --> 00:45:37.190
And I'll show you
how you can choose

00:45:37.190 --> 00:45:41.860
a nearly optimal value for
omega to promote convergence

00:45:41.860 --> 00:45:43.610
to the solution.
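
NOTE
Ahead of that discussion, a minimal sketch of damped Newton-Raphson with a fixed omega, x <- x - omega F(x)/F'(x). The test function F(x) = arctan(x), with root x = 0, and the value omega = 0.5 are hypothetical choices: from x0 = 2, full Newton steps overshoot and grow, while half steps converge. With a fixed omega < 1, convergence near the root is only linear (the error shrinks by roughly a factor 1 - omega per step), which is one reason a good choice of omega matters.
```python
import math
def damped_newton(F, dF, x0, omega, tol=1e-12, max_iter=200):
    """Relaxed Newton-Raphson: x <- x - omega * F(x) / F'(x)."""
    x = x0
    for k in range(1, max_iter + 1):
        step = F(x) / dF(x)   # full Newton step
        x = x - omega * step  # take only a fraction omega of it
        if abs(omega * step) < tol:
            return x, k, True
    return x, max_iter, False
def dF(x):
    # Derivative of arctan(x)
    return 1.0 / (1.0 + x * x)
x_full, _, ok_full = damped_newton(math.atan, dF, x0=2.0, omega=1.0, max_iter=3)
x_damp, n_damp, ok_damp = damped_newton(math.atan, dF, x0=2.0, omega=0.5)
print(ok_full, abs(x_full) > 100)    # prints: False True
print(ok_damp, abs(x_damp) < 1e-10)  # prints: True True
```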

00:45:43.610 --> 00:45:46.480
Any questions?

00:45:46.480 --> 00:45:50.140
No? Let me address one
more thing before you go.

00:45:50.140 --> 00:45:53.140
We've scheduled times
for the quizzes.

00:45:53.140 --> 00:45:55.180
They are going to
be in the evenings

00:45:55.180 --> 00:45:58.180
on the dates that are
specified on the syllabus.

00:45:58.180 --> 00:46:01.060
We wanted to do them
during the daytime.

00:46:01.060 --> 00:46:02.890
It was really
difficult to schedule

00:46:02.890 --> 00:46:04.970
a room that was big
enough for this class,

00:46:04.970 --> 00:46:07.990
so they have to be from 7:00
to 9:00 in the gymnasium.

00:46:07.990 --> 00:46:09.709
I apologize for that.

00:46:09.709 --> 00:46:11.500
We spent several days
looking around trying

00:46:11.500 --> 00:46:14.050
to find a place where
we could put everybody

00:46:14.050 --> 00:46:17.680
so you would all get the
same experience in the quiz.

00:46:17.680 --> 00:46:20.590
I know that the November
quiz comes back to back

00:46:20.590 --> 00:46:24.130
with the thermodynamics
exam as well.

00:46:24.130 --> 00:46:26.030
That's frustrating.

00:46:26.030 --> 00:46:27.850
Thermodynamics is the next day.

00:46:27.850 --> 00:46:29.170
That week is tricky.

00:46:29.170 --> 00:46:33.004
That's AIChE, so most of
the faculty have to travel.

00:46:33.004 --> 00:46:34.420
We won't be able
to teach, but you

00:46:34.420 --> 00:46:36.350
won't have classes
one of those days

00:46:36.350 --> 00:46:39.910
so you have extra time to study.

00:46:39.910 --> 00:46:41.750
And Columbus Day also
falls in that week,

00:46:41.750 --> 00:46:44.590
so there's no way to put
three exams in four days

00:46:44.590 --> 00:46:46.830
without having them
come right back to back.

00:46:46.830 --> 00:46:48.460
Believe me, we
thought about this

00:46:48.460 --> 00:46:51.004
and tried to get things
scheduled as efficiently as we

00:46:51.004 --> 00:46:52.420
could for you, but
sometimes there

00:46:52.420 --> 00:46:54.420
are constraints that are
outside of our control.

00:46:54.420 --> 00:46:55.927
But the quiz times are set.

00:46:55.927 --> 00:46:58.260
There's going to be one in
October and one in November.

00:46:58.260 --> 00:47:00.790
They'll be in the evening, and
they'll be in the gymnasium.

00:47:00.790 --> 00:47:03.790
I'll give you directions
to it before the exam, just

00:47:03.790 --> 00:47:06.280
so you know exactly
where to go, OK?

00:47:06.280 --> 00:47:08.210
Thank you, guys.