WEBVTT

00:00:00.500 --> 00:00:01.950
The following
content is provided

00:00:01.950 --> 00:00:06.090
by MIT OpenCourseWare under
a Creative Commons license.

00:00:06.090 --> 00:00:08.330
Additional information
about our license

00:00:08.330 --> 00:00:10.470
and MIT OpenCourseWare
in general,

00:00:10.470 --> 00:00:11.930
is available at ocw.mit.edu.

00:00:17.100 --> 00:00:20.310
PROFESSOR: Up on the web
now are lots of sections,

00:00:20.310 --> 00:00:22.280
including this
conjugate gradients,

00:00:22.280 --> 00:00:27.960
so today is the end of the
conjugate gradients discussion,

00:00:27.960 --> 00:00:33.000
and it's kind of a
difficult algorithm.

00:00:33.000 --> 00:00:37.590
It's beautiful and
slightly tricky,

00:00:37.590 --> 00:00:42.610
but if I just see what the
point is of conjugate gradients,

00:00:42.610 --> 00:00:47.590
we don't have to check that the
exact formulas that I've listed

00:00:47.590 --> 00:00:52.440
here -- so that's one step
of conjugate gradients:

00:00:52.440 --> 00:01:00.480
going from k minus 1 to k, and
we'll look at each of the five

00:01:00.480 --> 00:01:06.650
steps in the cycle, without
patiently checking all those

00:01:06.650 --> 00:01:07.590
formulas.

00:01:07.590 --> 00:01:09.780
We'll just, sort of, see
how much work is involved.

00:01:09.780 --> 00:01:17.040
But let me begin where we were
last time, which was Arnoldi.

00:01:17.040 --> 00:01:20.250
So Arnoldi was a
Gram-Schmidt-like algorithm,

00:01:20.250 --> 00:01:26.350
not exactly Gram-Schmidt, that
produced an orthogonal basis,

00:01:26.350 --> 00:01:30.430
and that basis is the basis
for the Krylov subspace.

00:01:30.430 --> 00:01:38.620
So each new basis vector
in the Krylov subspace

00:01:38.620 --> 00:01:42.170
is found by multiplying
by another power of A,

00:01:42.170 --> 00:01:44.620
but that's a terrible basis.

00:01:44.620 --> 00:01:50.920
So the basis that we get from
these guys has to be improved,

00:01:50.920 --> 00:01:54.320
orthogonalized, and
that's what Arnoldi does.

00:01:54.320 --> 00:01:58.790
And if we look to see
what it does matrix-wise,

00:01:58.790 --> 00:02:02.900
it turns out to be exactly
-- have that nice equation,

00:02:02.900 --> 00:02:08.490
that we're creating a matrix
Q with the orthogonal vectors,

00:02:08.490 --> 00:02:14.320
and a matrix H that tells
us how they were connected

00:02:14.320 --> 00:02:19.880
to the matrix A. OK.

00:02:19.880 --> 00:02:33.270
So that's the central equation,
and the properties of H

00:02:33.270 --> 00:02:35.310
are important.

00:02:35.310 --> 00:02:42.900
So we saw that the key property
was, in general, we've got --

00:02:42.900 --> 00:02:45.620
it's full up in
that upper corner,

00:02:45.620 --> 00:02:49.320
just the way
Gram-Schmidt would be.

00:02:49.320 --> 00:02:53.830
But in case A is a
symmetric matrix,

00:02:53.830 --> 00:02:58.180
then we can check that H will
be symmetric from this formula.

00:02:58.180 --> 00:03:00.040
And the fact that
it's symmetric means

00:03:00.040 --> 00:03:03.290
that those are zeros
up there, because we

00:03:03.290 --> 00:03:05.750
know there are zeros down here.

00:03:05.750 --> 00:03:07.940
We know there's
only one diagonal

00:03:07.940 --> 00:03:09.570
below the main diagonal.

00:03:12.290 --> 00:03:14.050
And if it's symmetric,
then there's

00:03:14.050 --> 00:03:19.740
only one diagonal above,
and that's the great case,

00:03:19.740 --> 00:03:24.540
the symmetric case.

00:03:24.540 --> 00:03:29.120
So I'm going to take
two seconds just to say,

00:03:29.120 --> 00:03:31.820
here is the best way
to find eigenvalues.

00:03:31.820 --> 00:03:37.210
We haven't spoken one word
about computing eigenvalues,

00:03:37.210 --> 00:03:42.170
but this is the moment
to say that word.

00:03:42.170 --> 00:03:45.710
If I have a symmetric matrix,
eigenvalues are way, way easier

00:03:45.710 --> 00:03:48.320
for symmetric matrices.

00:03:48.320 --> 00:03:51.940
Linear systems are also
easier, but eigenvalues

00:03:51.940 --> 00:03:56.700
are an order of magnitude
easier for symmetric matrices.

00:03:56.700 --> 00:04:00.300
And here is a really good
way to find the eigenvalues

00:04:00.300 --> 00:04:02.420
of large matrices.

00:04:02.420 --> 00:04:05.440
I assume that you don't
want all the eigenvalues.

00:04:05.440 --> 00:04:12.160
I mean, that is very unusual
to need many, many eigenvalues

00:04:12.160 --> 00:04:14.090
of a big matrix.

00:04:14.090 --> 00:04:20.500
You're looking -- probably those
high eigenvalues are not really

00:04:20.500 --> 00:04:25.300
physically close, good
representations of the actual

00:04:25.300 --> 00:04:28.330
eigenvalues of the
continuous problem.

00:04:28.330 --> 00:04:30.480
So, we have continuous
and discrete

00:04:30.480 --> 00:04:34.650
that are close for
the low eigenvalues,

00:04:34.650 --> 00:04:40.810
because that's where
the error is smooth.

00:04:40.810 --> 00:04:45.690
High eigenvalues
are not so reliable.

00:04:45.690 --> 00:04:51.450
So for the low eigenvalues,
here is a good way to do them.

00:04:51.450 --> 00:04:59.540
Do Arnoldi, and it gets named
also, now, after Lanczos,

00:04:59.540 --> 00:05:03.210
because he had this
eigenvalue idea.

00:05:03.210 --> 00:05:07.560
Run that for a while.

00:05:07.560 --> 00:05:13.430
Say if you want 10 eigenvalues,
maybe go up to j equal to 20.

00:05:13.430 --> 00:05:16.490
So 20 Arnoldi steps,
quite reasonable.

00:05:16.490 --> 00:05:22.150
And then look at the 20
by 20 submatrix of H,

00:05:22.150 --> 00:05:25.910
so you're not going to
go to the end, to H_n.

00:05:25.910 --> 00:05:28.600
You're going to
stop at some point,

00:05:28.600 --> 00:05:31.180
and you have this nice matrix.

00:05:34.770 --> 00:05:38.060
It'll be tri-diagonal,
symmetric tri-diagonal,

00:05:38.060 --> 00:05:41.500
and that's the kind of
matrix we know how to find

00:05:41.500 --> 00:05:46.920
the eigenvalues of; so this
is symmetric and tri-diagonal,

00:05:46.920 --> 00:05:56.050
and eigenvalues of that class
of matrices are very nice,

00:05:56.050 --> 00:06:00.110
and they're very meaningful,
and they're very close to --

00:06:00.110 --> 00:06:04.800
in many cases; I won't try
to give a theory here --

00:06:04.800 --> 00:06:07.330
in many cases, you get
very good approximations.

00:06:07.330 --> 00:06:10.030
The eigenvalues
of this submatrix

00:06:10.030 --> 00:06:12.630
are sometimes
called Ritz values,

00:06:12.630 --> 00:06:18.550
and they use the eigenvalues
of this submatrix,

00:06:18.550 --> 00:06:24.430
and those are often
called Ritz values.

00:06:24.430 --> 00:06:29.170
And so the first 10 eigenvalues
of that 20 by 20 matrix,

00:06:29.170 --> 00:06:35.990
would be in many, many
problems very fast, very

00:06:35.990 --> 00:06:40.850
efficient approximations
to the low eigenvalues

00:06:40.850 --> 00:06:44.270
of the big matrix A
that you start with.

00:06:44.270 --> 00:06:48.790
OK, the reason is
that eigenvalues --

00:06:48.790 --> 00:06:51.380
do you all recognize
that eigenvalues --

00:06:51.380 --> 00:06:55.940
that this is the operation, when
I bring Q inverse over here,

00:06:55.940 --> 00:07:00.890
that I would call these
matrices similar --

00:07:00.890 --> 00:07:08.200
two matrices are similar if
there's a Q that connects them

00:07:08.200 --> 00:07:09.270
this way.

00:07:09.270 --> 00:07:14.860
And similar matrices
have the same lambdas,

00:07:14.860 --> 00:07:16.120
the same eigenvalues.

00:07:18.710 --> 00:07:22.790
So the full matrix Q, if we
went all the way to the end,

00:07:22.790 --> 00:07:26.070
we could find its eigenvalues,
but the whole point

00:07:26.070 --> 00:07:30.760
of all this, of this
month, is algorithms

00:07:30.760 --> 00:07:32.900
that you can stop
partway, and you still

00:07:32.900 --> 00:07:35.180
have something really good.

00:07:35.180 --> 00:07:42.560
OK, so that's -- I think
we can't do everything,

00:07:42.560 --> 00:07:45.570
so I won't try to do more with
eigenvalues except to mention

00:07:45.570 --> 00:07:46.070
that.

00:07:46.070 --> 00:07:50.370
Someday you may want the
eigenvalues of a large matrix,

00:07:50.370 --> 00:08:00.340
and I guess ARPACK would
be the Arnoldi pack, that

00:08:00.340 --> 00:08:06.030
would be the package to go
to for the Arnoldi iteration,

00:08:06.030 --> 00:08:09.640
and then to use for
the eigenvalues.

00:08:09.640 --> 00:08:17.930
OK, so that's a side point,
just since linear algebra is --

00:08:17.930 --> 00:08:20.990
half of it's linear systems
and half of it's eigenvalue

00:08:20.990 --> 00:08:26.490
problems, I didn't think I
could miss the chance to do that

00:08:26.490 --> 00:08:28.420
second half, the
eigenvalue part.

00:08:28.420 --> 00:08:33.750
OK, now I'm going to
tackle conjugate gradients,

00:08:33.750 --> 00:08:35.330
to get the idea.

00:08:35.330 --> 00:08:38.035
OK, so the idea,
well, of course, A

00:08:38.035 --> 00:08:42.900
is symmetric positive definite.

00:08:42.900 --> 00:08:47.460
If it's not, you could
try these formulas,

00:08:47.460 --> 00:08:52.780
but you'd be taking
a pretty big chance.

00:08:52.780 --> 00:08:56.530
If it is, these are just right.

00:08:56.530 --> 00:08:59.850
OK, so what's the
main point about --

00:08:59.850 --> 00:09:08.300
here's the rule that
decides what x_k will be.

00:09:08.300 --> 00:09:15.490
x_k will be the vector in
the Krylov -- of course,

00:09:15.490 --> 00:09:22.440
x_k is in this Krylov space K_k.

00:09:25.040 --> 00:09:32.490
So it'll be -- we can
create x_k recursively,

00:09:32.490 --> 00:09:35.290
and we should need,
just, if we do it right,

00:09:35.290 --> 00:09:38.710
should need just one
multiplication by A at every

00:09:38.710 --> 00:09:42.970
step, and then
some inner product.

00:09:42.970 --> 00:09:47.140
OK, so now, what do
we know from this?

00:09:47.140 --> 00:09:53.090
This vector, since it has
this, it starts with --

00:09:53.090 --> 00:09:56.900
it has an x_k that's in
the k-th Krylov subspace,

00:09:56.900 --> 00:10:01.950
but now it's multiplied by A, so
that's up to the k plus first.

00:10:01.950 --> 00:10:08.520
And b is in K_1, it's in
all the Krylov subspaces,

00:10:08.520 --> 00:10:18.980
so this is in -- and therefore
r_k is in -- K_(k+1), right?

00:10:18.980 --> 00:10:22.070
This is in the next space
up, because there's a

00:10:22.070 --> 00:10:25.660
multiplication by
A. And therefore,

00:10:25.660 --> 00:10:30.580
if we choose to determine
it by this rule,

00:10:30.580 --> 00:10:36.120
we learn that this
vector -- I mean,

00:10:36.120 --> 00:10:44.340
we already know from Arnoldi
that q_(k+1) is orthogonal

00:10:44.340 --> 00:10:46.470
to this space.

00:10:46.470 --> 00:10:53.680
This is the new k plus first
vector that's in K_(k+1),

00:10:53.680 --> 00:10:56.710
so this is just where
this is to be found.

00:10:56.710 --> 00:11:02.160
So this vector r_k is found
in the next Krylov subspace;

00:11:02.160 --> 00:11:05.240
this one is in the
next Krylov subspace.

00:11:05.240 --> 00:11:09.060
They're both orthogonal to all
the previous Krylov subspaces,

00:11:09.060 --> 00:11:11.150
so one's a multiple
of the other.

00:11:14.180 --> 00:11:17.820
So, that will mean
that automatically,

00:11:17.820 --> 00:11:22.430
that the r's have the
same property that Arnoldi

00:11:22.430 --> 00:11:25.550
created for the q's:
they're orthogonal.

00:11:25.550 --> 00:11:29.920
So the residuals are orthogonal.

00:11:29.920 --> 00:11:32.540
So, orthogonal residuals.

00:11:32.540 --> 00:11:39.850
Orthogonal residuals, good.

00:11:43.970 --> 00:11:48.050
I better say, by the way, when
we do conjugate gradients,

00:11:48.050 --> 00:11:52.590
we don't also do Arnoldi, so
Arnoldi's finding these q's;

00:11:52.590 --> 00:11:55.740
we're leaving him
behind for a while,

00:11:55.740 --> 00:11:57.590
because we're going
to go directly

00:11:57.590 --> 00:12:00.430
to the x's and r's here.

00:12:00.430 --> 00:12:05.420
So, I won't know the q's but
I know a lot about the q,

00:12:05.420 --> 00:12:10.720
so it's natural to write this
simple fact: that each residual

00:12:10.720 --> 00:12:13.750
is a multiple of the new
q, and therefore, it's

00:12:13.750 --> 00:12:16.770
orthogonal to all the old ones.

00:12:16.770 --> 00:12:23.410
And now I want to -- this is the
other property that determines

00:12:23.410 --> 00:12:25.190
the new x.

00:12:25.190 --> 00:12:29.670
So the new x should be
-- what do I see here,

00:12:29.670 --> 00:12:36.620
I see a sort of delta x, the
change in x, the correction --

00:12:36.620 --> 00:12:40.620
which is what I want to
compute -- at step k,

00:12:40.620 --> 00:12:42.750
that's what I have to find.

00:12:42.750 --> 00:12:46.640
And I could look at the
corrections at the old steps,

00:12:46.640 --> 00:12:50.010
and notice there's
an A in the middle.

00:12:50.010 --> 00:12:52.270
So the corrections
to the x's are

00:12:52.270 --> 00:12:58.410
orthogonal in the
A inner products,

00:12:58.410 --> 00:13:01.680
and that's where the name
conjugate comes from.

00:13:01.680 --> 00:13:11.190
So this is conjugate directions.

00:13:11.190 --> 00:13:15.650
Why do I say directions,
and why do I say conjugate?

00:13:15.650 --> 00:13:17.650
Conjugate gradients,
even, I could say.

00:13:17.650 --> 00:13:19.790
Why did I --
conjugate directions,

00:13:19.790 --> 00:13:22.430
I maybe shouldn't
have erased that,

00:13:22.430 --> 00:13:28.390
but I'll also say
conjugate gradients.

00:13:28.390 --> 00:13:30.690
OK.

00:13:30.690 --> 00:13:34.920
So conjugate, all I
mean by conjugate,

00:13:34.920 --> 00:13:42.060
is orthogonal in the natural
inner product that goes with

00:13:42.060 --> 00:13:44.370
the problem, and
that's given by A.

00:13:44.370 --> 00:13:47.900
So the inner product given
by the problem is the inner

00:13:47.900 --> 00:13:54.070
product of x and y,
is x transposed A*y.

00:13:54.070 --> 00:14:00.080
And that's a good inner product,
and it's the one that comes

00:14:00.080 --> 00:14:04.750
with the problem and
so, natural to look

00:14:04.750 --> 00:14:09.570
at orthogonality in that
sense, and that's what we have.

00:14:09.570 --> 00:14:12.750
So these are orthogonal
in the usual sense,

00:14:12.750 --> 00:14:15.580
because they have an A
already built into them,

00:14:15.580 --> 00:14:19.160
and these are orthogonal
in the A inner product.

00:14:19.160 --> 00:14:21.510
OK.

00:14:21.510 --> 00:14:26.720
So I wrote down two
requirements on the new x.

00:14:26.720 --> 00:14:29.060
Well, actually you might
say, I wrote down a whole lot

00:14:29.060 --> 00:14:32.010
of requirements, because I'm
saying that this should hold

00:14:32.010 --> 00:14:33.960
for all the previous i's.

00:14:33.960 --> 00:14:37.420
And this should hold for
all the previous i's.

00:14:37.420 --> 00:14:39.200
And you see indices
are getting in here,

00:14:39.200 --> 00:14:45.310
and I'm asking
for your patience.

00:14:45.310 --> 00:14:47.890
What's the point that
I want to make here?

00:14:47.890 --> 00:14:52.470
It's this short recurrence
point, this point that we had,

00:14:52.470 --> 00:14:56.760
over here, when h was
just tri-diagonal,

00:14:56.760 --> 00:14:59.220
so we only needed
-- at the new step,

00:14:59.220 --> 00:15:06.080
the only things we had
to find were the --

00:15:06.080 --> 00:15:10.650
say at the second step, we just
had to find that h and that h,

00:15:10.650 --> 00:15:14.200
because this one was
the same as that.

00:15:14.200 --> 00:15:21.410
And then the next one, we --
maybe I didn't say that right.

00:15:21.410 --> 00:15:24.870
I guess when we --
let me start again.

00:15:24.870 --> 00:15:29.540
At the first step I need
to find two numbers.

00:15:29.540 --> 00:15:31.430
Now because H is
symmetric, that's

00:15:31.430 --> 00:15:32.900
already the same as that.

00:15:32.900 --> 00:15:35.560
So at the next step I
need these two numbers,

00:15:35.560 --> 00:15:37.600
and then that number
is already that.

00:15:37.600 --> 00:15:42.560
So, two numbers at every
step and big zeros up here.

00:15:42.560 --> 00:15:46.490
That's that's the key
point when A is symmetric.

00:15:46.490 --> 00:15:49.480
So the same will
be true over here.

00:15:49.480 --> 00:15:58.190
We have two numbers, and
they're called alpha and beta,

00:15:58.190 --> 00:15:59.820
so they're a little
different numbers,

00:15:59.820 --> 00:16:04.020
because we're working with the
x's now and not with the q's,

00:16:04.020 --> 00:16:09.330
so you don't see any q's written
down for conjugate gradients.

00:16:09.330 --> 00:16:11.060
OK.

00:16:11.060 --> 00:16:15.940
All right, let me just look
at the cycle of five steps.

00:16:18.660 --> 00:16:28.170
So I have to give it a start,
so I start with d_0 as b and x_0

00:16:28.170 --> 00:16:36.940
as 0, and I guess, r_0 which,
of course, is b minus A*x_0.

00:16:36.940 --> 00:16:40.300
If that's 0, it's also b.

00:16:40.300 --> 00:16:41.460
So those are the starts.

00:16:44.630 --> 00:16:47.460
So I'm ready to take a step.

00:16:47.460 --> 00:16:51.570
I know the ones
with subscript zero,

00:16:51.570 --> 00:16:54.110
and I'm ready for the next step.

00:16:54.110 --> 00:16:56.540
I know the ones with
subscript k minus 1,

00:16:56.540 --> 00:17:01.050
and I'm ready for this
step, this cycle, k.

00:17:04.760 --> 00:17:06.370
So, here's a number.

00:17:09.610 --> 00:17:12.900
So I'm coming into this -- well
let me say maybe how I should

00:17:12.900 --> 00:17:13.490
say it.

00:17:13.490 --> 00:17:18.460
I'm coming into that cycle
with a new d, now what is a d?

00:17:18.460 --> 00:17:20.650
It's a search direction.

00:17:20.650 --> 00:17:22.570
It's the way I'm going to go.

00:17:22.570 --> 00:17:26.260
It's the direction in which
this delta x is going to go.

00:17:26.260 --> 00:17:29.640
This delta x is going to be,
probably I'll see it here.

00:17:29.640 --> 00:17:34.950
Yeah, there is delta x,
is a multiple of d, right?

00:17:34.950 --> 00:17:39.650
If I put this on this
side, that's delta x,

00:17:39.650 --> 00:17:42.520
the change in x is
a multiple of d,

00:17:42.520 --> 00:17:46.130
so d is the direction
in n-dimensional space

00:17:46.130 --> 00:17:49.840
that I'm looking
for the correction,

00:17:49.840 --> 00:17:53.210
and alpha is the
number, is how far

00:17:53.210 --> 00:17:58.650
I go along d for the best,
for the right, delta x.

00:17:58.650 --> 00:18:06.190
So alpha will be determined
by, probably by that.

00:18:06.190 --> 00:18:11.740
Probably by that, yeah, it
involves an inner product,

00:18:11.740 --> 00:18:13.950
picking off a component.

00:18:13.950 --> 00:18:14.730
OK?

00:18:14.730 --> 00:18:18.000
So the first step is
to find the number;

00:18:18.000 --> 00:18:22.170
the next step is to take
that update, delta x is

00:18:22.170 --> 00:18:24.740
the right multiple of
the search direction

00:18:24.740 --> 00:18:26.460
that I'm coming in with.

00:18:26.460 --> 00:18:30.020
Then correcting r is easy;
I'll come back to that;

00:18:30.020 --> 00:18:31.810
if I know the
change in x, I know

00:18:31.810 --> 00:18:35.390
the change in r right away.

00:18:35.390 --> 00:18:39.600
So that's given me,
I'm now updated.

00:18:39.600 --> 00:18:43.040
But now I have to look ahead.

00:18:43.040 --> 00:18:47.390
What direction am I
going to travel next?

00:18:47.390 --> 00:18:54.390
So the next steps are to find a
number beta and then the search

00:18:54.390 --> 00:19:01.620
direction, which isn't r; it's r
again with just one correction.

00:19:01.620 --> 00:19:05.310
So it's a beautiful set
of formulas, beautiful.

00:19:05.310 --> 00:19:09.610
And I could write -- you know,
because I know x's and r's, I

00:19:09.610 --> 00:19:16.250
could write the formulas --
there are different expressions

00:19:16.250 --> 00:19:20.895
for the same alpha and beta,
but this is a very satisfactory

00:19:20.895 --> 00:19:21.550
one.

00:19:21.550 --> 00:19:22.730
OK.

00:19:22.730 --> 00:19:26.820
Have a look at that
cycle just to see what

00:19:26.820 --> 00:19:28.840
how much work is involved.

00:19:28.840 --> 00:19:31.380
How much work is
involved in that cycle?

00:19:31.380 --> 00:19:35.490
OK, well, there's
a multiplication

00:19:35.490 --> 00:19:38.340
by A, no surprise.

00:19:38.340 --> 00:19:40.050
It's that
multiplication by A that

00:19:40.050 --> 00:19:43.480
takes us to the
next Krylov space,

00:19:43.480 --> 00:19:46.600
takes us to the next update.

00:19:46.600 --> 00:19:50.030
So I see a multiplication by
A, and also here, but it's

00:19:50.030 --> 00:19:51.940
the same multiplication.

00:19:51.940 --> 00:19:59.090
So I see one multiplication
by A. So, easy to list.

00:19:59.090 --> 00:20:05.500
So each cycle, there's
one multiplication A

00:20:05.500 --> 00:20:12.460
times the search direction, and
because A is a sparse matrix,

00:20:12.460 --> 00:20:18.450
the cost there is the number
of non-zeros in the matrix.

00:20:18.450 --> 00:20:20.260
OK.

00:20:20.260 --> 00:20:24.520
Then, what other computations
do you have to do?

00:20:24.520 --> 00:20:32.020
I see an inner product there,
and I'm going to use it again.

00:20:32.020 --> 00:20:35.560
I guess, and here it
is at the next step,

00:20:35.560 --> 00:20:42.040
so I guess per cycle, I have
to do one inner product of r

00:20:42.040 --> 00:20:47.030
with itself, and I have to do
one of these inner products.

00:20:47.030 --> 00:20:53.270
So I see two inner products,
and of course, these

00:20:53.270 --> 00:20:54.720
are vectors of length n.

00:20:54.720 --> 00:21:02.550
so that's 2n multiplications,
2n additions.

00:21:02.550 --> 00:21:08.930
OK, this was a multiple of
n, depending how sparse A is,

00:21:08.930 --> 00:21:14.326
this is 2n adds and 2n
multiplies for these two

00:21:14.326 --> 00:21:14.950
inner products.

00:21:14.950 --> 00:21:18.050
OK.

00:21:18.050 --> 00:21:18.550
Yeah.

00:21:18.550 --> 00:21:23.450
I guess you could
say take them there.

00:21:23.450 --> 00:21:25.400
And then what else
do we have to do?

00:21:25.400 --> 00:21:27.980
Oh, we have these
updates, so I have

00:21:27.980 --> 00:21:32.230
to multiply a vector
by that scalar and add.

00:21:32.230 --> 00:21:35.480
And again here, I multiply
that vector by that scalar

00:21:35.480 --> 00:21:37.610
and subtract.

00:21:37.610 --> 00:21:40.310
Oh, and do I have
to do it again here?

00:21:40.310 --> 00:21:41.550
Have I got three?

00:21:41.550 --> 00:21:44.560
Somehow, I thought I
might only have two.

00:21:44.560 --> 00:21:51.670
Oh, yeah, let me say two
or three vector updates.

00:21:58.410 --> 00:22:07.120
People are much better than I
am at spotting the right way

00:22:07.120 --> 00:22:09.890
to organize those calculations.

00:22:09.890 --> 00:22:17.470
But that's really a very
small price to pay per cycle,

00:22:17.470 --> 00:22:24.980
and so if it converges
quickly, as it normally does,

00:22:24.980 --> 00:22:26.630
this is going to be
a good algorithm.

00:22:26.630 --> 00:22:30.020
Maybe I'll -- maybe having
introduced the question of how

00:22:30.020 --> 00:22:33.130
quickly does it converge, I
should maybe say something

00:22:33.130 --> 00:22:34.950
about that.

00:22:34.950 --> 00:22:40.520
So what's the error,
after k steps?

00:22:40.520 --> 00:22:43.900
How is it related to
the error at the start?

00:22:46.830 --> 00:22:49.810
So, I have to first say, what
do I mean by this measure

00:22:49.810 --> 00:22:57.150
of error, and then I
have to say what's the --

00:22:57.150 --> 00:23:01.620
I hope I see a factor, less
than 1, to the k power.

00:23:01.620 --> 00:23:05.780
Yeah, so I'm hoping some factor
to the k power will be there,

00:23:05.780 --> 00:23:09.090
and I sure hope
it's less than 1.

00:23:09.090 --> 00:23:10.770
Let me say what it is.

00:23:10.770 --> 00:23:15.080
I think -- so this is maybe
not the very best estimate,

00:23:15.080 --> 00:23:17.080
but it shows the main point.

00:23:17.080 --> 00:23:21.060
It's the square
root of lambda_max

00:23:21.060 --> 00:23:24.920
minus the square
root of lambda_min,

00:23:24.920 --> 00:23:26.980
divided by the
square root of that

00:23:26.980 --> 00:23:30.110
plus the square root of that.

00:23:30.110 --> 00:23:34.260
And maybe there's a
factor 2 somewhere.

00:23:34.260 --> 00:23:39.130
So this is one estimate
for the convergence factor,

00:23:39.130 --> 00:23:44.060
and you see what it depends
on, I could divide everything

00:23:44.060 --> 00:23:48.940
by lambda_min there, so that I
could write that as the square

00:23:48.940 --> 00:23:54.120
root of the condition number --
I'll divide by that -- minus 1,

00:23:54.120 --> 00:23:58.100
divided by the square root of
the condition number plus 1,

00:23:58.100 --> 00:23:59.150
to the k power.

00:24:01.950 --> 00:24:08.660
So the condition number has a
part in this error estimate,

00:24:08.660 --> 00:24:14.620
and I won't derive that,
even in the notes, I won't.

00:24:14.620 --> 00:24:18.450
And other people work out
other estimates for the error,

00:24:18.450 --> 00:24:20.590
it's an interesting problem.

00:24:20.590 --> 00:24:24.680
And I do have to say
that this norm is

00:24:24.680 --> 00:24:33.380
the natural inner
product, e_k squared

00:24:33.380 --> 00:24:37.110
is the inner product
of e_k with itself,

00:24:37.110 --> 00:24:39.970
but again with A in the middle.

00:24:39.970 --> 00:24:44.470
I suppose that a little
part of the message here,

00:24:44.470 --> 00:24:51.250
and on that middle board, is
that it's just natural to see

00:24:51.250 --> 00:24:54.620
the matrix A playing
a part in the --

00:24:54.620 --> 00:24:57.370
it just comes in naturally.

00:24:57.370 --> 00:25:00.810
Conjugate gradients
brings it in naturally.

00:25:00.810 --> 00:25:07.650
These formulas are just
created so that this will work.

00:25:07.650 --> 00:25:09.990
OK.

00:25:09.990 --> 00:25:13.150
So I suppose I'm
thinking there's

00:25:13.150 --> 00:25:17.160
one word I haven't
really justified,

00:25:17.160 --> 00:25:18.690
and that's the word gradient.

00:25:18.690 --> 00:25:21.590
Why do we call it
conjugate gradient?

00:25:21.590 --> 00:25:25.480
Here I erased --
"conjugate directions"

00:25:25.480 --> 00:25:28.720
I was quite happy
with, the directions

00:25:28.720 --> 00:25:38.450
or the d's, because this is
the direction in which we moved

00:25:38.450 --> 00:25:43.100
from x, so conjugate
directions would be just fine,

00:25:43.100 --> 00:25:49.830
and those words come into this
part of numerical mathematics,

00:25:49.830 --> 00:25:51.990
but where do gradients come in?

00:25:51.990 --> 00:25:53.430
OK.

00:25:53.430 --> 00:25:54.880
So, let me say,
gradient of what?

00:25:57.860 --> 00:26:02.570
Well, what is --
so first of all,

00:26:02.570 --> 00:26:15.580
gradient of what is in these
linear problems where A*x equal

00:26:15.580 --> 00:26:16.780
b?

00:26:16.780 --> 00:26:19.380
Those come from the
gradient of the energy.

00:26:19.380 --> 00:26:22.470
So can I say E of
x for the energy?

00:26:22.470 --> 00:26:27.940
As 1/2 x transpose A*x
minus b transpose x?

00:26:35.030 --> 00:26:36.700
You might say, where
did that come from?

00:26:41.240 --> 00:26:43.860
Normally, we've been talking
about linear systems,

00:26:43.860 --> 00:26:47.530
everything's been in terms of
A*x equal b, and now suddenly,

00:26:47.530 --> 00:26:52.220
I'm changing to a different
part of mathematics.

00:26:52.220 --> 00:26:56.520
I'm looking at minimizing
-- so optimization,

00:26:56.520 --> 00:27:02.440
the part that's coming in
April, is minimizing energy,

00:27:02.440 --> 00:27:07.040
minimizing function,
and what's the link?

00:27:07.040 --> 00:27:13.120
Well, if I minimize that,
the gradient of this is --

00:27:13.120 --> 00:27:18.560
how shall I write? grad
E, the gradient of E,

00:27:18.560 --> 00:27:28.030
the list of the derivatives
of E with respect to the x_i,

00:27:28.030 --> 00:27:33.650
and if we work that out -- well,
why not pretend it's scalar

00:27:33.650 --> 00:27:36.310
for the moment, so
these are just numbers.

00:27:36.310 --> 00:27:43.250
Then this ordinary derivative
of 1/2 A x squared is A*x,

00:27:43.250 --> 00:27:51.950
and the derivative of b*x is
b, and that rule continues

00:27:51.950 --> 00:27:53.740
into the vector case.

00:27:53.740 --> 00:27:57.910
So what have we got here?

00:27:57.910 --> 00:28:02.190
We've computed, we've
introduced the natural energy,

00:28:02.190 --> 00:28:05.650
so this is a natural
energy for the problem,

00:28:05.650 --> 00:28:12.470
and the reason it's natural
is when we minimize it,

00:28:12.470 --> 00:28:19.240
set the derivatives to zero, we
got the equation A*x equal b,

00:28:19.240 --> 00:28:21.820
so that would be
zero at a minimum.

00:28:21.820 --> 00:28:27.190
This would equal zero at a min,
so what I'm doing right now

00:28:27.190 --> 00:28:29.350
is pretty serious.

00:28:29.350 --> 00:28:35.860
On the other hand, I'll repeat
it again in April when we do

00:28:35.860 --> 00:28:39.500
minimizations,
but this is the --

00:28:39.500 --> 00:28:45.160
this quadratic energy
has a linear gradient,

00:28:45.160 --> 00:28:49.440
and its minimum is
exactly A*x equal b.

00:28:49.440 --> 00:28:52.340
The minimum point
is A*x equal b.

00:28:52.340 --> 00:28:55.570
In other words, solving
the linear equation

00:28:55.570 --> 00:28:59.260
and minimizing the energy
are one and the same, one

00:28:59.260 --> 00:29:00.740
and the same.

00:29:00.740 --> 00:29:04.940
And I'm allowed to
use the word min,

00:29:04.940 --> 00:29:11.420
because A is positive definite,
so the positive definiteness

00:29:11.420 --> 00:29:15.930
of A is what means that
this is really a minimum

00:29:15.930 --> 00:29:18.790
and not a maximum
or a saddle point.

00:29:18.790 --> 00:29:19.560
OK.

00:29:19.560 --> 00:29:25.280
So I was just -- I wanted
just to think about how do you

00:29:25.280 --> 00:29:26.700
minimize a function?

00:29:29.250 --> 00:29:34.230
I guess any sensible
person, given a function

00:29:34.230 --> 00:29:39.310
and looking for the minimum,
would figure out the gradient

00:29:39.310 --> 00:29:43.380
and move that way, move,
well, not up the gradient,

00:29:43.380 --> 00:29:47.090
move down the gradient, so
I imagine this minimization

00:29:47.090 --> 00:29:54.170
problem, I imagine here I'm
in x, this is the x_1, x_2,

00:29:54.170 --> 00:30:00.920
the x's; here's the E, E of
x is something like that,

00:30:00.920 --> 00:30:03.240
maybe its minimum is there.

00:30:03.240 --> 00:30:09.410
This is the graph of E of x, and
I'm trying to find that point,

00:30:09.410 --> 00:30:14.600
and I make as first
guess, somewhere there.

00:30:14.600 --> 00:30:16.980
Somewhere on that surface.

00:30:16.980 --> 00:30:25.580
Maybe, so to help your
eye, this is like a bowl,

00:30:25.580 --> 00:30:28.610
and maybe there's
a point, so that's

00:30:28.610 --> 00:30:31.480
on the surface of the bowl.

00:30:31.480 --> 00:30:34.920
That's x_0, let's say.

00:30:34.920 --> 00:30:41.640
I'm trying to get to the bottom
of the bowl, and as I said,

00:30:41.640 --> 00:30:43.900
any sensible person
would figure out, well,

00:30:43.900 --> 00:30:46.340
what's the steepest way down.

00:30:46.340 --> 00:30:48.960
The steepest way down is the
direction of the gradient.

00:30:51.670 --> 00:30:59.470
So I could call this gradient
g, if I wanted, and then I

00:30:59.470 --> 00:31:01.380
would go in the
direction of minus g,

00:31:01.380 --> 00:31:03.420
because I want to
go down and not up.

00:31:03.420 --> 00:31:07.310
I mean, we're climbing,
I shouldn't say climbing,

00:31:07.310 --> 00:31:14.150
we're descending a mountain,
and trying to get to the --

00:31:14.150 --> 00:31:15.870
or descending a valley, sorry.

00:31:15.870 --> 00:31:17.790
We're descending into
a valley and trying

00:31:17.790 --> 00:31:20.400
to get to the bottom.

00:31:20.400 --> 00:31:22.430
And I'm assuming
that this valley

00:31:22.430 --> 00:31:33.310
is this nice shape given by
just a second-degree polynomial.

00:31:33.310 --> 00:31:38.220
Of course, when we
get to optimization,

00:31:38.220 --> 00:31:40.670
this will be the simplest
problem, but not the only one.

00:31:40.670 --> 00:31:42.560
Here it's our problem.

00:31:42.560 --> 00:31:47.400
Now, of course -- so
what happens if I move

00:31:47.400 --> 00:31:54.570
in the direction of -- the
direction water would flow.

00:31:54.570 --> 00:31:58.270
If I tip a flask of
water on the ground,

00:31:58.270 --> 00:32:01.730
it's going to flow in
the steepest direction,

00:32:01.730 --> 00:32:05.860
and that would be the
natural direction to go.

00:32:09.490 --> 00:32:12.430
But it's not the best direction.

00:32:12.430 --> 00:32:18.960
Maybe the first step is, but
-- so I'm talking now about

00:32:18.960 --> 00:32:23.940
conjugate gradient seen
as a minimization problem,

00:32:23.940 --> 00:32:26.780
because that's where the
word gradient is justified.

00:32:26.780 --> 00:32:31.310
This is the gradient,
and of course, this

00:32:31.310 --> 00:32:35.890
is just minus the
residual, so this is

00:32:35.890 --> 00:32:37.340
the direction of the residual.

00:32:37.340 --> 00:32:39.410
So that's the
question, should I just

00:32:39.410 --> 00:32:42.980
go in the direction of
the residual all the time?

00:32:42.980 --> 00:32:46.600
At every step -- and this is
what the method of steepest

00:32:46.600 --> 00:32:50.900
descent will do, so let
me make the contrast.

00:32:50.900 --> 00:32:55.790
Steepest descent
is the first thing

00:32:55.790 --> 00:33:01.300
you would think
of, direction is r.

00:33:05.630 --> 00:33:08.820
That's the gradient direction,
or the negative gradient

00:33:08.820 --> 00:33:09.720
direction.

00:33:09.720 --> 00:33:16.130
Follow r until, in that
direction, you've hit bottom.

00:33:16.130 --> 00:33:18.900
So you'll go down this,
you see what happens,

00:33:18.900 --> 00:33:22.920
you'll be going down one
side, and you'll hit bottom,

00:33:22.920 --> 00:33:28.040
and then you will come up
again on a plane that's

00:33:28.040 --> 00:33:30.370
cutting through the surface.

00:33:30.370 --> 00:33:34.000
But you're not going to
hit that, first shot.

00:33:34.000 --> 00:33:36.110
When you start here,
see water would --

00:33:36.110 --> 00:33:40.710
the actual steepest direction,
the gradient direction,

00:33:40.710 --> 00:33:44.210
changes as the water
flows down to the bottom,

00:33:44.210 --> 00:33:48.610
but here we've chosen
a fixed direction;

00:33:48.610 --> 00:33:52.170
we're following it as
long as we're going down,

00:33:52.170 --> 00:33:54.060
but then it'll start up again.

00:33:54.060 --> 00:34:02.920
So we stop there and find
the gradient again there.

00:34:02.920 --> 00:34:05.520
Follow that down,
until it goes up again;

00:34:05.520 --> 00:34:07.650
follow that down,
until it goes up again,

00:34:07.650 --> 00:34:14.260
and the trouble is
convergence isn't great.

00:34:14.260 --> 00:34:16.800
The trouble is that
we end up -- even

00:34:16.800 --> 00:34:19.400
if we're in
100-dimensional space,

00:34:19.400 --> 00:34:29.080
we end up sort of just -- our
repeated gradients are not

00:34:29.080 --> 00:34:31.260
really in good, new directions.

00:34:31.260 --> 00:34:38.110
We find ourselves doing a lot of
work to make a little progress,

00:34:38.110 --> 00:34:46.670
and the reason is that these
r's don't have the property

00:34:46.670 --> 00:34:52.300
that you're always looking for
in computations, orthogonality.

00:34:52.300 --> 00:34:56.010
Somehow orthogonality tells
you that you're really

00:34:56.010 --> 00:35:01.450
getting something new, if it's
orthogonal to all the old ones.

00:35:01.450 --> 00:35:02.360
OK.

00:35:02.360 --> 00:35:09.570
Now, so this, the
direction r is not great.

00:35:14.870 --> 00:35:17.780
The better one is to
have some orthogonality.

00:35:17.780 --> 00:35:31.410
Take a direction d that
-- well, here's the point.

00:35:34.590 --> 00:35:36.810
This is the right direction;
this is the direction

00:35:36.810 --> 00:35:38.420
conjugate gradients choose.

00:35:38.420 --> 00:35:41.550
This is where --
so it's a gradient,

00:35:41.550 --> 00:35:47.340
but then it removes the
component in the direction we

00:35:47.340 --> 00:35:50.010
just took.

00:35:50.010 --> 00:35:52.120
That's what that beta will be.

00:35:52.120 --> 00:35:58.170
When we compute that beta,
and I'm a little surprised

00:35:58.170 --> 00:36:00.840
that it isn't a minus sign.

00:36:00.840 --> 00:36:04.360
No, it's OK.

00:36:04.360 --> 00:36:07.780
It turns out that
way, let me just

00:36:07.780 --> 00:36:10.860
be sure that I wrote
the formula down right,

00:36:10.860 --> 00:36:16.600
at least wrote it down
as I have it here, yep.

00:36:16.600 --> 00:36:19.410
Again, I'm not
checking the formulas,

00:36:19.410 --> 00:36:22.910
but just describing the ideas.

00:36:22.910 --> 00:36:25.980
So the idea is that
this formula gives us

00:36:25.980 --> 00:36:33.400
a new direction, which has this
property two that we're after.

00:36:33.400 --> 00:36:36.440
That the direction, it's
not actually orthogonal,

00:36:36.440 --> 00:36:40.520
it's A-orthogonal to
the previous direction.

00:36:40.520 --> 00:36:44.230
OK, I'm going to draw one
picture of what -- whoops,

00:36:44.230 --> 00:36:47.740
not on that -- of
these directions,

00:36:47.740 --> 00:36:54.170
and then that's my best I can
do with conjugate gradients.

00:36:54.170 --> 00:37:00.940
I want to try to
draw this search

00:37:00.940 --> 00:37:05.330
for the bottom of the valley.

00:37:05.330 --> 00:37:08.000
So I'm going to draw
that search by drawing

00:37:08.000 --> 00:37:11.250
the level sets of the function.

00:37:11.250 --> 00:37:17.450
I'm going to -- yeah, somehow
the contours of my function

00:37:17.450 --> 00:37:26.320
are, since my function here is
quadratic, all the contours,

00:37:26.320 --> 00:37:28.660
the level sets will be ellipses.

00:37:28.660 --> 00:37:31.070
They'll all be, sort of,
like similar ellipses,

00:37:31.070 --> 00:37:33.360
and there is the one I want.

00:37:33.360 --> 00:37:35.530
That's the bottom of the valley.

00:37:35.530 --> 00:37:38.150
So this is the contour
where energy three,

00:37:38.150 --> 00:37:42.120
this is the contour energy two,
this is the contour energy one,

00:37:42.120 --> 00:37:45.650
and there's the point where
the energy is a minimum.

00:37:45.650 --> 00:37:50.280
Well, it might be
negative, whatever,

00:37:50.280 --> 00:37:51.960
but it's the smallest.

00:37:51.960 --> 00:37:52.810
OK.

00:37:52.810 --> 00:37:59.850
So what does -- can I compare
steepest descent with conjugate

00:37:59.850 --> 00:38:00.730
gradient?

00:38:00.730 --> 00:38:03.110
Let me draw one more contour,
just so I have enough

00:38:03.110 --> 00:38:06.880
to make this point.

00:38:06.880 --> 00:38:14.000
OK, so steepest descent starts
somewhere, OK, starts there,

00:38:14.000 --> 00:38:15.890
let's say.

00:38:15.890 --> 00:38:17.610
Suppose that's my
starting point.

00:38:17.610 --> 00:38:22.230
Steepest descent goes,
if I draw it right,

00:38:22.230 --> 00:38:25.640
it'll go perpendicular, right?

00:38:25.640 --> 00:38:30.350
It goes perpendicular
to the level contour.

00:38:30.350 --> 00:38:35.340
You carry along until, as long
as the contours are going down,

00:38:35.340 --> 00:38:40.510
and then if I continue here,
I'm climbing back up the valley.

00:38:40.510 --> 00:38:41.430
So I'll stop here.

00:38:41.430 --> 00:38:47.200
Let's suppose that that happened
to be tangent right there, OK.

00:38:47.200 --> 00:38:52.800
Now, that was the first
step of steepest descent.

00:38:52.800 --> 00:38:56.090
Now I'm at this point,
I look at this contour,

00:38:56.090 --> 00:39:00.130
and well, if I've drawn
it reasonably well,

00:39:00.130 --> 00:39:02.650
I'm following maybe
perpendicular,

00:39:02.650 --> 00:39:06.600
and now again I'm perpendicular,
steepest descent says

00:39:06.600 --> 00:39:11.650
go perpendicular to the contour,
until you've got to the bottom

00:39:11.650 --> 00:39:13.230
there.

00:39:13.230 --> 00:39:23.110
Then down, then up, then do
you see this frustrating crawl

00:39:23.110 --> 00:39:26.220
across the valley.

00:39:26.220 --> 00:39:29.380
It wouldn't happen if these
contours were circles.

00:39:29.380 --> 00:39:33.260
If these contours -- if A
was the identity matrix,

00:39:33.260 --> 00:39:37.000
if A was the identity matrix,
these would be perfect circles,

00:39:37.000 --> 00:39:39.450
and you'd go to the
center right away.

00:39:39.450 --> 00:39:44.800
Well that only says that A*x
equal b is easy to solve when A

00:39:44.800 --> 00:39:46.970
is the identity matrix.

00:39:46.970 --> 00:39:55.070
So this is steepest descent,
and its rate of convergence

00:39:55.070 --> 00:40:00.800
is slow, because the longer
and thinner the valley,

00:40:00.800 --> 00:40:06.260
the worse it is here, OK.

00:40:06.260 --> 00:40:08.280
What about conjugate gradients?

00:40:08.280 --> 00:40:18.800
Well, I'm just going
to draw magic --

00:40:18.800 --> 00:40:20.580
I'm going to make
it look like magic.

00:40:20.580 --> 00:40:25.640
I'm going to start at the
same place, the first step,

00:40:25.640 --> 00:40:29.090
because I'm starting with
zero, the best thing I can do

00:40:29.090 --> 00:40:33.180
is follow the
gradients to there,

00:40:33.180 --> 00:40:39.810
and now, so that was the
first step, and now what?

00:40:39.810 --> 00:40:46.230
Well, the next direction
-- so these are the search

00:40:46.230 --> 00:40:47.550
directions.

00:40:47.550 --> 00:40:51.790
That's the search direction
d_1, d_2, d_3, d_4.

00:40:51.790 --> 00:40:56.060
Here is d_1, and what
does d_2 look like?

00:40:56.060 --> 00:40:59.580
May I cheat and go
straight to the center?

00:40:59.580 --> 00:41:03.690
It's a little bit of a cheat,
I'll go near the center.

00:41:03.690 --> 00:41:09.280
The next direction is
much more like that

00:41:09.280 --> 00:41:12.480
and then goes to
the bottom, which

00:41:12.480 --> 00:41:16.250
would be somewhere like
there inside that level set,

00:41:16.250 --> 00:41:20.490
and then the next direction
would probably be very close.

00:41:20.490 --> 00:41:27.520
So this is not 90 degrees unless
you're in the A inner product.

00:41:27.520 --> 00:41:37.350
So this is a 90 degree angle
in the A inner product.

00:41:37.350 --> 00:41:40.040
And what does the
A inner product do?

00:41:40.040 --> 00:41:49.080
In the A inner product,
in which level sets

00:41:49.080 --> 00:41:58.040
are circles, or spheres or
whatever, in whatever dimension

00:41:58.040 --> 00:41:58.540
we want.

00:41:58.540 --> 00:42:02.630
Well, I'm not going
to be, I don't want

00:42:02.630 --> 00:42:06.100
to be held to everything here.

00:42:06.100 --> 00:42:11.700
If you see this
picture, then we've

00:42:11.700 --> 00:42:15.090
not only made headway with
the conjugate gradient

00:42:15.090 --> 00:42:19.150
method, which is a big deal
for solving linear systems,

00:42:19.150 --> 00:42:22.170
but also we've made headway with
the conjugate gradient method

00:42:22.170 --> 00:42:25.290
for minimizing functions.

00:42:25.290 --> 00:42:28.590
And if the function
wasn't quadratic,

00:42:28.590 --> 00:42:31.330
and our equations
weren't linear,

00:42:31.330 --> 00:42:34.880
the conjugate gradient idea
would still be available.

00:42:34.880 --> 00:42:39.760
Non-linear conjugate
gradients would somehow

00:42:39.760 --> 00:42:48.310
follow the same idea of
A-orthogonal search directions.

00:42:48.310 --> 00:42:51.750
So these are not the gradients,
they're the search directions

00:42:51.750 --> 00:42:53.120
here.

00:42:53.120 --> 00:42:55.940
OK.

00:42:55.940 --> 00:43:03.530
That's sort of my speech about
the conjugate gradient method.

00:43:03.530 --> 00:43:07.460
Without having checked all
the formulas in the cycle,

00:43:07.460 --> 00:43:12.020
we know how much work is
involved, we know what we get,

00:43:12.020 --> 00:43:15.950
and we see a bit
of the geometry.

00:43:15.950 --> 00:43:17.130
OK.

00:43:17.130 --> 00:43:21.590
And we see it as a
linear systems solver

00:43:21.590 --> 00:43:26.660
and also as an energy
minimizer, and those

00:43:26.660 --> 00:43:31.330
are both important ways to
think of conjugate gradients.

00:43:31.330 --> 00:43:32.660
OK.

00:43:32.660 --> 00:43:40.340
I'll just take a few final
minutes to say a word about

00:43:40.340 --> 00:43:44.730
other -- what if our problem
is not symmetric positive

00:43:44.730 --> 00:43:46.460
definite?

00:43:46.460 --> 00:43:50.220
What if we have an
un-symmetric problem,

00:43:50.220 --> 00:43:53.460
or a symmetric, but not
positive definite problem?

00:43:53.460 --> 00:44:01.870
Well, in that case, these
denominators could be zero.

00:44:01.870 --> 00:44:05.500
I mean, if we don't know that
A is positive definite, that

00:44:05.500 --> 00:44:08.040
could be a zero, it
could be anything,

00:44:08.040 --> 00:44:11.500
and that method is broken.

00:44:11.500 --> 00:44:18.030
So I want to suggest a safer,
but more expensive method now

00:44:18.030 --> 00:44:24.850
for -- so this would be
MINRES, minimum residual.

00:44:24.850 --> 00:44:29.230
So the idea is
minimize, choose x_k

00:44:29.230 --> 00:44:35.080
to minimize the norm of
r_k, that's the residual.

00:44:35.080 --> 00:44:44.230
So minimize the norm
of b minus A*x_k.

00:44:44.230 --> 00:44:47.380
So that's my choice of
x_k, and the question is,

00:44:47.380 --> 00:44:51.830
how quickly can I do it?

00:44:51.830 --> 00:44:55.400
So this is MINRES, OK.

00:44:55.400 --> 00:44:58.690
And there are other
related methods,

00:44:58.690 --> 00:45:00.430
there's a whole
family of methods.

00:45:00.430 --> 00:45:03.750
What do I want to
say about MINRES?

00:45:03.750 --> 00:45:08.430
Let me just find MINRES.

00:45:08.430 --> 00:45:10.440
Well, yeah.

00:45:14.650 --> 00:45:18.840
I guess, for this I
will start with Arnoldi.

00:45:28.980 --> 00:45:34.180
So for this different algorithm,
which comes up with a different

00:45:34.180 --> 00:45:37.090
x_k, I'm going to
-- and of course,

00:45:37.090 --> 00:45:43.600
the x_k is going to be
in the Krylov space K_k,

00:45:43.600 --> 00:45:46.870
and I'm going to
use this good basis.

00:45:46.870 --> 00:45:52.890
This gives me the
great basis of q's.

00:45:52.890 --> 00:45:56.710
So I turn my problem,
I convert the problem.

00:45:56.710 --> 00:46:10.790
I convert this minimization --
I look for x as a combination

00:46:10.790 --> 00:46:15.190
of the of the q's, right?

00:46:15.190 --> 00:46:18.980
Is that how you
see x equals Q*y?

00:46:18.980 --> 00:46:22.200
So instead of looking
for the vector x,

00:46:22.200 --> 00:46:24.100
I'm looking for
the vector y, which

00:46:24.100 --> 00:46:30.110
tells me what combination of
these columns, the q's, x is.

00:46:30.110 --> 00:46:34.760
In other words, I'm working
in the q system, the q basis.

00:46:34.760 --> 00:46:36.330
So I'm working with y.

00:46:36.330 --> 00:46:43.890
So I would, naturally, write
this as a minimum of b minus

00:46:43.890 --> 00:46:44.610
A*Q*y_k.

00:46:47.360 --> 00:46:49.120
Same problem.

00:46:49.120 --> 00:46:51.620
Same problem, I'm
just deciding, OK, I'm

00:46:51.620 --> 00:46:55.930
going to find this x as
a combination of the q's,

00:46:55.930 --> 00:46:59.200
because those q's
are orthogonal,

00:46:59.200 --> 00:47:02.810
and that will make this
calculation a lot nicer.

00:47:02.810 --> 00:47:04.660
OK, what's going to happen here?

00:47:04.660 --> 00:47:10.680
You know that I'm going to use
the property of the q's that

00:47:10.680 --> 00:47:15.730
A*Q is Q*H.

00:47:15.730 --> 00:47:19.970
That was the key point
that Arnoldi achieved.

00:47:19.970 --> 00:47:24.070
And the net result is
going to be, of course,

00:47:24.070 --> 00:47:28.400
I have to do this at level
k, so I'm chopping it off,

00:47:28.400 --> 00:47:31.380
and I have to
patiently, and the notes

00:47:31.380 --> 00:47:33.940
do that, patiently
see, OK, what's

00:47:33.940 --> 00:47:38.580
happening when we chop it
off, chop off these matrices.

00:47:38.580 --> 00:47:41.340
We have to watch where
the non-zeros appear.

00:47:41.340 --> 00:47:45.930
In the end, I get to a
minimization problem for y,

00:47:45.930 --> 00:47:52.310
so we get to a minimum
problem for the matrix

00:47:52.310 --> 00:48:01.920
H that leads me to to
the y that I'm after.

00:48:01.920 --> 00:48:05.810
So it's a minimum problem
for a piece, sorry,

00:48:05.810 --> 00:48:15.400
I should really say H_k, some
piece of it, like this piece.

00:48:15.400 --> 00:48:22.770
In fact, exactly that piece.

00:48:22.770 --> 00:48:27.960
So if I ask -- all right,
suppose I have a least squares

00:48:27.960 --> 00:48:29.520
problem.

00:48:29.520 --> 00:48:34.200
This is least squares because
my norm is just the square,

00:48:34.200 --> 00:48:36.660
the norm squared.

00:48:36.660 --> 00:48:39.420
I'm just working with l^2 norm.

00:48:39.420 --> 00:48:41.740
And here's my matrix.

00:48:45.820 --> 00:48:52.210
It's 3 by 2, so I'm looking
at least squares problems,

00:48:52.210 --> 00:48:57.900
in that -- matrices
of that type,

00:48:57.900 --> 00:49:00.650
and they're easy to solve.

00:49:00.650 --> 00:49:04.540
The notes will give the
algorithm that solves the least

00:49:04.540 --> 00:49:05.540
squares problem.

00:49:05.540 --> 00:49:09.410
I guess what I'm interested in
here is just is it expensive

00:49:09.410 --> 00:49:10.780
or not?

00:49:10.780 --> 00:49:15.910
Well, the least squares
problem will not be expensive;

00:49:15.910 --> 00:49:18.480
it's only got one extra
row, compared to the number

00:49:18.480 --> 00:49:20.850
of columns, that's not bad.

00:49:20.850 --> 00:49:27.690
What's expensive is the fact
that our H is not tri-diagonal

00:49:27.690 --> 00:49:31.510
anymore, these are
not zeros anymore.

00:49:31.510 --> 00:49:37.050
I have to compute all
these H's, so the hard work

00:49:37.050 --> 00:49:40.330
is back in Arnoldi.

00:49:40.330 --> 00:49:46.650
So it's Arnoldi
leading to MINRES,

00:49:46.650 --> 00:49:50.640
and we have serious
work already in Arnoldi.

00:49:50.640 --> 00:49:54.170
We don't have short recurrences,
we've got all these h's.

00:49:54.170 --> 00:49:56.630
OK, enough.

00:49:56.630 --> 00:50:08.200
Because ARPACK would be a good
code to call to do MINRES.

00:50:08.200 --> 00:50:15.140
OK that's my shot
at the key Krylov

00:50:15.140 --> 00:50:17.440
algorithms of numerical
linear algebra,

00:50:17.440 --> 00:50:24.270
and you see that they're
more subtle than Jacobi.

00:50:24.270 --> 00:50:26.530
Jacobi and
Gauss-Seidel were just

00:50:26.530 --> 00:50:29.060
straight simple
iterations; these

00:50:29.060 --> 00:50:35.500
have recurrence that
produces orthogonality,

00:50:35.500 --> 00:50:39.510
and you pay very little,
and you get so much.

00:50:39.510 --> 00:50:45.570
OK, so that's Krylov methods,
and that section of notes

00:50:45.570 --> 00:50:51.640
is up on the web, and of
course, at some point,

00:50:51.640 --> 00:50:58.330
numerical comparisons of how
fast is conjugate gradients,

00:50:58.330 --> 00:51:02.970
and is it faster than
direct elimination?

00:51:02.970 --> 00:51:06.000
You know, because that's a
big choice you have to make.

00:51:06.000 --> 00:51:09.820
Suppose you do have a symmetric
positive definite matrix.

00:51:09.820 --> 00:51:17.150
Do you do Gauss elimination with
re-ordering, minimum degree,

00:51:17.150 --> 00:51:20.440
or do you make the
completely different choice

00:51:20.440 --> 00:51:24.600
of iterative methods
like conjugate gradients?

00:51:24.600 --> 00:51:29.210
And that's a decision you
have to make when you're

00:51:29.210 --> 00:51:31.540
faced with a large
matrix problem,

00:51:31.540 --> 00:51:39.510
and only experiments can compare
two such widely different

00:51:39.510 --> 00:51:41.290
directions to take.

00:51:41.290 --> 00:51:50.890
So I like to end up by
emphasizing that there's still

00:51:50.890 --> 00:51:53.640
a big question.

00:51:53.640 --> 00:51:58.010
Do this or do minimum degree?

00:51:58.010 --> 00:52:04.860
And only some experiments
will show which is the better.

00:52:04.860 --> 00:52:08.780
OK, so that's good for
today, and next time

00:52:08.780 --> 00:52:13.620
will be direct
solution, using the FFT.

00:52:13.620 --> 00:52:16.310
Good, thanks.