1
00:00:01,580 --> 00:00:03,920
The following content is
provided under a Creative
2
00:00:03,920 --> 00:00:05,340
Commons license.
3
00:00:05,340 --> 00:00:07,550
Your support will help
MIT OpenCourseWare
4
00:00:07,550 --> 00:00:11,640
continue to offer high quality
educational resources for free.
5
00:00:11,640 --> 00:00:14,180
To make a donation or to
view additional materials
6
00:00:14,180 --> 00:00:18,110
from hundreds of MIT courses,
visit MIT OpenCourseWare
7
00:00:18,110 --> 00:00:19,130
at ocw.mit.edu.
8
00:00:23,390 --> 00:00:26,150
JAMES W. SWAN: So this is
going to be our last lecture
9
00:00:26,150 --> 00:00:27,960
on linear algebra.
10
00:00:27,960 --> 00:00:30,417
The first three
lectures covered basics.
11
00:00:30,417 --> 00:00:32,750
The next three lectures, we
talked about different sorts
12
00:00:32,750 --> 00:00:34,730
of transformations of matrices.
13
00:00:34,730 --> 00:00:36,742
This final lecture is
the last of those three.
14
00:00:36,742 --> 00:00:39,200
We're going to talk about
another sort of transformation
15
00:00:39,200 --> 00:00:42,820
called the singular
value decomposition.
16
00:00:42,820 --> 00:00:46,610
OK, before we jump in, I'd like
to do the usual recap business.
17
00:00:46,610 --> 00:00:49,010
I think it's always helpful
to recap or look at things
18
00:00:49,010 --> 00:00:51,170
from a different perspective.
19
00:00:51,170 --> 00:00:55,130
Early on, I told you that the
infinite dimensional equivalent
20
00:00:55,130 --> 00:00:57,740
of vectors would be something
like a function, which
21
00:00:57,740 --> 00:01:02,510
is a map, a unique map maybe
from a point x to some value
22
00:01:02,510 --> 00:01:04,150
f of x.
23
00:01:04,150 --> 00:01:06,080
And there is an
equivalent representation
24
00:01:06,080 --> 00:01:08,780
of the eigenvalue eigenvector
problem in function space.
25
00:01:08,780 --> 00:01:11,330
We call these eigenvalues
and eigenfunctions.
26
00:01:11,330 --> 00:01:15,074
Here's a classic one where
the function is y of x, OK?
27
00:01:15,074 --> 00:01:17,240
This is the equivalent of
the vector, and the equivalent
28
00:01:17,240 --> 00:01:18,890
of the transformation
or the matrix
29
00:01:18,890 --> 00:01:21,230
that's this differential
operator this time,
30
00:01:21,230 --> 00:01:22,850
the second derivative.
31
00:01:22,850 --> 00:01:26,340
So I take the second derivative
of this particular function,
32
00:01:26,340 --> 00:01:28,040
and the function is stretched.
33
00:01:28,040 --> 00:01:30,980
It's multiplied by some
fixed value at all points.
34
00:01:30,980 --> 00:01:33,830
And it becomes lambda times y.
35
00:01:33,830 --> 00:01:37,100
And that operator has to be
closed with some boundary
36
00:01:37,100 --> 00:01:37,920
conditions as well.
37
00:01:37,920 --> 00:01:39,830
We have to say
what the value of y
38
00:01:39,830 --> 00:01:43,640
is at the edges
of some boundary.
39
00:01:43,640 --> 00:01:45,590
So there's a one-to-one
correspondence
40
00:01:45,590 --> 00:01:48,470
between these things.
41
00:01:48,470 --> 00:01:54,730
What is the eigenfunction here,
or what are the eigenfunctions?
42
00:01:54,730 --> 00:01:57,160
And what are the
eigenvalues associated
43
00:01:57,160 --> 00:01:59,590
with this transformation
or this operator?
44
00:01:59,590 --> 00:02:01,667
Can you work those
out really quickly?
45
00:02:01,667 --> 00:02:03,250
You learned this at
some point, right?
46
00:02:03,250 --> 00:02:05,995
Somebody taught you
differential equations
47
00:02:05,995 --> 00:02:07,360
and you calculated these things.
48
00:02:07,360 --> 00:02:08,410
Take about 90 seconds.
49
00:02:08,410 --> 00:02:09,370
Work with the people around you.
50
00:02:09,370 --> 00:02:11,286
See if you can come to
a conclusion about what
51
00:02:11,286 --> 00:02:14,980
the eigenfunction
and eigenvalues are.
52
00:02:26,699 --> 00:02:27,490
That's enough time.
53
00:02:27,490 --> 00:02:28,870
You can work on this
on your own later
54
00:02:28,870 --> 00:02:29,953
if you've run out of time.
55
00:02:29,953 --> 00:02:30,976
Don't worry about it.
56
00:02:30,976 --> 00:02:32,600
Does somebody want
to volunteer a guess
57
00:02:32,600 --> 00:02:36,340
for what the eigenfunctions
are in this case?
58
00:02:36,340 --> 00:02:38,350
What are they?
59
00:02:38,350 --> 00:02:39,300
Yeah?
60
00:02:39,300 --> 00:02:42,846
AUDIENCE: [INAUDIBLE]
61
00:02:42,846 --> 00:02:44,720
JAMES W. SWAN: OK, so
you chose exponentials.
62
00:02:44,720 --> 00:02:45,950
That's an interesting choice.
63
00:02:45,950 --> 00:02:47,880
That's one possible
choice you can make.
64
00:02:47,880 --> 00:02:48,994
OK, so we could say--
65
00:02:48,994 --> 00:02:50,660
this is sort of a
classical one that you
66
00:02:50,660 --> 00:02:55,274
think about when you first
learn differential equations.
67
00:02:55,274 --> 00:02:56,690
They say, an
equation of this sort
68
00:02:56,690 --> 00:03:01,544
has solutions that look like
exponentials, and that's true.
69
00:03:01,544 --> 00:03:03,210
There's another
representation for this,
70
00:03:03,210 --> 00:03:05,760
which is as trigonometric
functions instead, right?
71
00:03:09,110 --> 00:03:10,470
Either of those is acceptable.
72
00:03:10,470 --> 00:03:12,550
[INAUDIBLE] the
trigonometric functions,
73
00:03:12,550 --> 00:03:17,417
that representation is a
little more useful for us here.
74
00:03:17,417 --> 00:03:19,750
We know that the boundary
conditions tell us that y of 0
75
00:03:19,750 --> 00:03:22,430
is supposed to be 0.
76
00:03:22,430 --> 00:03:26,590
That means that C1 has to
be 0, because cosine of 0 is 1.
77
00:03:26,590 --> 00:03:29,550
So C1 is 0 in this case.
78
00:03:29,550 --> 00:03:33,030
So that fixes one of
these coefficients.
79
00:03:33,030 --> 00:03:35,730
And now we're left
with a problem, right?
80
00:03:35,730 --> 00:03:38,310
Our solutions, our
eigenfunctions,
81
00:03:38,310 --> 00:03:39,220
cannot be unique.
82
00:03:39,220 --> 00:03:41,760
So we don't get to
specify C2, right?
83
00:03:41,760 --> 00:03:44,537
Any function that's a
multiple of this sine
84
00:03:44,537 --> 00:03:45,870
should also be an eigenfunction.
85
00:03:45,870 --> 00:03:47,640
So instead the other
boundary condition,
86
00:03:47,640 --> 00:03:51,030
this y of l equals 0, needs
to be used to pin down
87
00:03:51,030 --> 00:03:52,920
what the eigenvalue is.
88
00:03:52,920 --> 00:03:54,870
So the second
equation, y of l equals
89
00:03:54,870 --> 00:03:59,560
0, which implies that the
square root of minus lambda
90
00:03:59,560 --> 00:04:01,660
has to be equal
to n pi over l, it
91
00:04:01,660 --> 00:04:04,232
has to be all the
nodes of the sine
92
00:04:04,232 --> 00:04:05,440
where the sine is equal to 0.
93
00:04:05,440 --> 00:04:07,565
That's the equivalent of
our secular characteristic
94
00:04:07,565 --> 00:04:10,990
polynomial that prescribes what
the eigenvalues are associated
95
00:04:10,990 --> 00:04:13,524
with each of the eigenfunctions.
96
00:04:13,524 --> 00:04:15,190
So now we know what
the eigenvalues are.
97
00:04:15,190 --> 00:04:17,320
The eigenvalues are
the set of numbers
98
00:04:17,320 --> 00:04:21,700
minus n pi over l, all squared.
99
00:04:21,700 --> 00:04:23,490
There's an infinite
number of eigenvalues.
100
00:04:23,490 --> 00:04:26,920
It's an infinite dimensional
space that we're in,
101
00:04:26,920 --> 00:04:29,230
so it's not a big surprise
that it works out that way.
102
00:04:29,230 --> 00:04:33,100
And the eigenfunctions then are
different scalar multiples
103
00:04:33,100 --> 00:04:36,220
of sine of the square root
of minus the eigenvalues
104
00:04:36,220 --> 00:04:37,672
times x.
105
00:04:37,672 --> 00:04:39,130
There's a one-to-one
correspondence
106
00:04:39,130 --> 00:04:41,110
between all the linear
algebra we've done
107
00:04:41,110 --> 00:04:42,670
and linear
differential equations
108
00:04:42,670 --> 00:04:44,560
or linear partial
differential equations.
109
00:04:44,560 --> 00:04:47,030
You can think about these
things in exactly the same way.
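A minimal MATLAB sketch of that correspondence, assuming a unit interval and a grid size of my own choosing (none of this is from the slides): discretize the second derivative with zero boundary values and compare its eigenvalues to minus (n pi over l) squared.

% Discretize d^2/dx^2 on (0, l) with y(0) = y(l) = 0 and compare
% its eigenvalues to the analytical values -(n*pi/l)^2.
l = 1; N = 200; h = l/(N+1);
e = ones(N,1);
D2 = (diag(-2*e) + diag(e(1:N-1),1) + diag(e(1:N-1),-1))/h^2;
lam = sort(eig(D2), 'descend');        % least negative eigenvalues first
n = (1:5)';
[lam(1:5), -(n*pi/l).^2]               % the two columns agree to O(h^2)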
110
00:04:47,030 --> 00:04:53,080
I'm sure in 10.50, you started to
talk about orthogonal functions
111
00:04:53,080 --> 00:04:55,270
to represent solutions of
differential equations.
112
00:04:55,270 --> 00:04:57,190
Or if you haven't, you're
going to very soon.
113
00:04:57,190 --> 00:04:58,690
This is a part of
the course where you get
114
00:04:58,690 --> 00:05:01,180
to look at the analytical side
of some of these things as
115
00:05:01,180 --> 00:05:02,230
opposed to the numerical side.
116
00:05:02,230 --> 00:05:03,771
But there's a
one-to-one relationship
117
00:05:03,771 --> 00:05:04,730
between those things.
118
00:05:04,730 --> 00:05:06,855
So if you understand one,
you understand the other,
119
00:05:06,855 --> 00:05:10,770
and you can come at them
from either perspective.
120
00:05:10,770 --> 00:05:11,979
This sort of stuff is useful.
121
00:05:11,979 --> 00:05:14,144
Actually, the classical
chemical engineering example
122
00:05:14,144 --> 00:05:15,920
comes from quantum
mechanics where
123
00:05:15,920 --> 00:05:18,780
you think about wave functions
and different energy levels
124
00:05:18,780 --> 00:05:21,870
corresponding to eigenvalues.
125
00:05:21,870 --> 00:05:23,220
That's cool.
126
00:05:23,220 --> 00:05:25,460
Sometimes, I like to think
about a mechanical analog
127
00:05:25,460 --> 00:05:28,420
to that, which is the
buckling of an elastic column.
128
00:05:28,420 --> 00:05:29,670
So you should do this at home.
129
00:05:29,670 --> 00:05:31,800
You should go get a
piece of spaghetti
130
00:05:31,800 --> 00:05:34,620
and push on the ends of
the piece of the spaghetti.
131
00:05:34,620 --> 00:05:36,000
And the spaghetti will buckle.
132
00:05:36,000 --> 00:05:38,300
Eventually it'll break,
but it'll buckle first.
133
00:05:38,300 --> 00:05:39,520
It'll bend.
134
00:05:39,520 --> 00:05:41,070
And how does it bend?
135
00:05:41,070 --> 00:05:45,330
Well, a balance of linear
momentum on this bar
136
00:05:45,330 --> 00:05:48,780
would tell you
that the deflection
137
00:05:48,780 --> 00:05:52,420
in the bar at different
points x along the bar
138
00:05:52,420 --> 00:05:57,600
multiplied by the pressure has
to balance the bending moment
139
00:05:57,600 --> 00:05:58,690
in the bar itself.
140
00:05:58,690 --> 00:06:00,840
So this E is some
elastic constant.
141
00:06:00,840 --> 00:06:02,640
I is a moment of inertia.
142
00:06:02,640 --> 00:06:04,650
And d squared y dx
squared is something
143
00:06:04,650 --> 00:06:05,980
like the curvature of the bar.
144
00:06:05,980 --> 00:06:07,590
So it's the bending
moment of the bar
145
00:06:07,590 --> 00:06:09,510
that balances the
pressure that's
146
00:06:09,510 --> 00:06:11,250
being exerted on the bar.
147
00:06:11,250 --> 00:06:13,800
And sure enough,
this bar will buckle
148
00:06:13,800 --> 00:06:16,860
when the pressure
applied exceeds
149
00:06:16,860 --> 00:06:20,330
the first eigenvalue associated
with this differential
150
00:06:20,330 --> 00:06:20,830
equation.
151
00:06:20,830 --> 00:06:24,390
We just worked that
eigenvalue out.
152
00:06:24,390 --> 00:06:29,950
We said that that eigenvalue
had to be the square of
153
00:06:29,950 --> 00:06:31,407
pi over l.
154
00:06:31,407 --> 00:06:33,240
And so when the pressure
exceeds the square
155
00:06:33,240 --> 00:06:36,810
of pi over l
times the elastic modulus,
156
00:06:36,810 --> 00:06:40,464
this column will bend
and deform continuously
157
00:06:40,464 --> 00:06:41,880
until it eventually
breaks, right?
158
00:06:41,880 --> 00:06:44,710
It will undergo this
linear elastic deformation,
159
00:06:44,710 --> 00:06:48,560
then plastic deformation
later, and it will break.
160
00:06:48,560 --> 00:06:50,360
The Eiffel Tower,
actually, is one
161
00:06:50,360 --> 00:06:51,860
of the first
structures in the world
162
00:06:51,860 --> 00:06:54,270
to utilize this
principle, right?
163
00:06:54,270 --> 00:06:56,000
It's got very
narrow beams in it.
164
00:06:56,000 --> 00:06:59,614
The beams are engineered so
that their elastic modulus
165
00:06:59,614 --> 00:07:01,280
is strong enough that
they won't buckle.
166
00:07:01,280 --> 00:07:04,060
Gustave Eiffel is one of the
first applied physicists,
167
00:07:04,060 --> 00:07:07,700
somebody who took the
physics of elastic bars
168
00:07:07,700 --> 00:07:10,460
and applied them to
building structures that
169
00:07:10,460 --> 00:07:14,570
weren't big and blocky, but used
a minimal amount of material.
170
00:07:14,570 --> 00:07:16,880
Cool, right?
171
00:07:16,880 --> 00:07:18,505
OK, so that's recap.
172
00:07:21,580 --> 00:07:23,980
Any questions about that?
173
00:07:23,980 --> 00:07:26,410
You've seen these things before.
174
00:07:26,410 --> 00:07:29,530
You understood them
well before too maybe?
175
00:07:29,530 --> 00:07:33,430
Give some thought to this, OK?
176
00:07:33,430 --> 00:07:37,090
We talked about
eigendecomposition last time
177
00:07:37,090 --> 00:07:40,050
that, associated with
a square matrix,
178
00:07:40,050 --> 00:07:44,050
was a particular eigenvalue or
particular set of eigenvalues,
179
00:07:44,050 --> 00:07:48,250
stretches, and corresponding
eigenvectors, directions.
180
00:07:48,250 --> 00:07:51,490
These were special solutions to
the system of linear equations
181
00:07:51,490 --> 00:07:52,540
based on a matrix.
182
00:07:52,540 --> 00:07:53,815
It was a square matrix.
183
00:07:53,815 --> 00:07:55,540
And you might ask,
well, what happens
184
00:07:55,540 --> 00:07:57,850
if the matrix isn't square?
185
00:07:57,850 --> 00:08:03,010
What if A is in the space of
real matrices that are n by m,
186
00:08:03,010 --> 00:08:04,630
where n and m maybe
aren't the same?
187
00:08:04,630 --> 00:08:06,910
Maybe they are the same,
but maybe they're not.
188
00:08:06,910 --> 00:08:10,046
And there is an
equivalent decomposition.
189
00:08:10,046 --> 00:08:11,920
It's called the singular
value decomposition.
190
00:08:11,920 --> 00:08:16,850
It's like an eigendecomposition
for non-square matrices.
191
00:08:16,850 --> 00:08:21,150
So rather than writing our
matrix as some W lambda W
192
00:08:21,150 --> 00:08:24,720
inverse, we're going to
write it as some product
193
00:08:24,720 --> 00:08:29,490
U times sigma times
V with this dagger.
194
00:08:29,490 --> 00:08:31,940
The dagger here is
conjugate transpose.
195
00:08:31,940 --> 00:08:34,400
Transpose the matrix, and
take the complex conjugate
196
00:08:34,400 --> 00:08:36,382
of all the elements, OK?
197
00:08:36,382 --> 00:08:39,630
I mentioned last time that
eigenvalues and eigenvectors
198
00:08:39,630 --> 00:08:42,820
could be complex,
potentially, right?
199
00:08:42,820 --> 00:08:45,960
So whenever we have that case
where things can be complex,
200
00:08:45,960 --> 00:08:47,510
usually the
transposition operation
201
00:08:47,510 --> 00:08:49,510
is replaced with the
conjugate transpose.
202
00:08:52,224 --> 00:08:53,640
What are these
different matrices?
203
00:08:53,640 --> 00:08:55,620
Well, let me tell you.
204
00:08:55,620 --> 00:08:58,140
U is a complex matrix.
205
00:08:58,140 --> 00:09:01,530
It maps from the
space C N to C N,
206
00:09:01,530 --> 00:09:04,350
so it's an n by n square matrix.
207
00:09:04,350 --> 00:09:09,240
Sigma is a real valued matrix,
and it lives in the space of n
208
00:09:09,240 --> 00:09:10,830
by m matrices.
209
00:09:10,830 --> 00:09:16,520
V is a square matrix again,
but it has dimensions m by m.
210
00:09:16,520 --> 00:09:19,770
Remember, A maps
from R M to R N,
211
00:09:19,770 --> 00:09:22,110
so that's what the
sequence of products says.
212
00:09:22,110 --> 00:09:23,820
V maps from m to m.
213
00:09:23,820 --> 00:09:27,320
Sigma maps from m to n.
214
00:09:27,320 --> 00:09:28,720
U maps from n to n.
215
00:09:28,720 --> 00:09:31,350
So this maps from
m to n as well.
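To make those shapes concrete, here is a small sketch with an assumed random 3 by 4 real matrix; svd is the MATLAB call the lecture refers to later.

% Check the factor shapes and the unitary property of U and V.
A = randn(3,4);                 % n = 3, m = 4, so A maps from R^4 to R^3
[U,S,V] = svd(A);               % U is 3x3, S is 3x4, V is 4x4
norm(U*U' - eye(3))             % ~ 0: U is unitary
norm(V*V' - eye(4))             % ~ 0: V is unitary
norm(U*S*V' - A)                % ~ 0: the product reconstructs A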
216
00:09:34,060 --> 00:09:36,760
Sigma is like
lambda from before.
217
00:09:36,760 --> 00:09:38,660
It's a diagonal matrix.
218
00:09:38,660 --> 00:09:40,040
It only has diagonal elements.
219
00:09:40,040 --> 00:09:41,415
It's just not
square, but it only
220
00:09:41,415 --> 00:09:45,540
has diagonal elements, all
of which will be non-negative.
221
00:09:48,100 --> 00:09:51,730
And then U and V are called
the left and right singular
222
00:09:51,730 --> 00:09:52,310
vectors.
223
00:09:52,310 --> 00:09:54,560
And they have special
properties associated with them,
224
00:09:54,560 --> 00:09:56,290
which I'll show you right now.
225
00:09:56,290 --> 00:09:59,600
Any questions about
how this decomposition
226
00:09:59,600 --> 00:10:02,690
is composed or made up?
227
00:10:02,690 --> 00:10:04,790
It looks just like the
eigendecomposition,
228
00:10:04,790 --> 00:10:06,360
but it can be applied
to any matrix.
229
00:10:06,360 --> 00:10:06,860
Yes?
230
00:10:06,860 --> 00:10:07,910
AUDIENCE: Quick question.
231
00:10:07,910 --> 00:10:08,170
JAMES W. SWAN: Sure.
232
00:10:08,170 --> 00:10:09,920
AUDIENCE: Do all
matrices have this thing,
233
00:10:09,920 --> 00:10:12,440
or is it like the eigenvalues
where some do and some don't?
234
00:10:12,440 --> 00:10:13,170
JAMES W. SWAN: This
is a great question.
235
00:10:13,170 --> 00:10:15,253
So all matrices are going
to have a singular value
236
00:10:15,253 --> 00:10:16,710
decomposition.
237
00:10:16,710 --> 00:10:18,410
We saw with the
eigenvalue decomposition
238
00:10:18,410 --> 00:10:22,520
that there could be a case
where the eigenvectors are
239
00:10:22,520 --> 00:10:25,190
degenerate, and we can't
write that full decomposition.
240
00:10:25,190 --> 00:10:28,214
All matrices are going to
have this decomposition.
241
00:10:34,380 --> 00:10:41,690
So for some properties
of this decomposition,
242
00:10:41,690 --> 00:10:44,330
U and V are what we
call unitary matrices.
243
00:10:44,330 --> 00:10:45,850
I talked about these before.
244
00:10:45,850 --> 00:10:49,430
Unitary matrices are ones for
whom, if they're real valued,
245
00:10:49,430 --> 00:10:51,830
their transpose is
also their inverse.
246
00:10:51,830 --> 00:10:54,490
If they're complex
valued, their
247
00:10:54,490 --> 00:10:57,620
conjugate transpose is the
equivalent of their inverse.
248
00:10:57,620 --> 00:11:01,930
So U times U conjugate
transpose will be identity.
249
00:11:01,930 --> 00:11:05,800
V times V conjugate
transpose will be identity.
250
00:11:05,800 --> 00:11:08,510
Unitary matrices also
have the property
251
00:11:08,510 --> 00:11:12,400
that they impart no
stretch to a matrix--
252
00:11:12,400 --> 00:11:13,820
or to vectors.
253
00:11:13,820 --> 00:11:16,460
So their maps don't stretch.
254
00:11:16,460 --> 00:11:19,104
They're kind of like
rotational matrices, right?
255
00:11:19,104 --> 00:11:21,520
They change directions, but
they don't stretch things out.
256
00:11:25,190 --> 00:11:30,890
If I were to take A conjugate
transpose and multiply it by A,
257
00:11:30,890 --> 00:11:36,680
that would be the same as taking
U sigma V conjugate transpose,
258
00:11:36,680 --> 00:11:38,780
and multiplying it
by U sigma V. If I
259
00:11:38,780 --> 00:11:43,010
use the properties of
matrix multiplications
260
00:11:43,010 --> 00:11:46,340
and complex
conjugate transposes,
261
00:11:46,340 --> 00:11:48,500
and work out what
this expression is,
262
00:11:48,500 --> 00:11:52,280
I'll find out that it's
equivalent to V sigma conjugate
263
00:11:52,280 --> 00:11:56,660
transpose sigma V
conjugate transpose.
264
00:11:56,660 --> 00:12:03,050
Well this has exactly the same
form as an eigendecomposition.
265
00:12:03,050 --> 00:12:07,400
An eigendecomposition
of A conjugate transpose times A instead
266
00:12:07,400 --> 00:12:11,600
of an eigendecomposition
of A. So V
267
00:12:11,600 --> 00:12:16,160
is the set of eigenvectors
of A conjugate transpose A,
268
00:12:16,160 --> 00:12:19,700
and sigma squared
are the eigenvalues
269
00:12:19,700 --> 00:12:24,260
of A conjugate
transpose times A.
270
00:12:24,260 --> 00:12:26,610
And if I reverse the order
of this multiplication--
271
00:12:26,610 --> 00:12:30,530
so I do A times A conjugate
transpose-- and work it out,
272
00:12:30,530 --> 00:12:34,010
that would be U sigma
sigma conjugate transpose U conjugate transpose. And so U
273
00:12:34,010 --> 00:12:38,060
are the eigenvectors of
A A conjugate transpose,
274
00:12:38,060 --> 00:12:42,320
and sigma squared are still the
eigenvalues of A A conjugate
275
00:12:42,320 --> 00:12:45,070
transpose.
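A sketch of that correspondence, again with an assumed random real matrix (for real A, the conjugate transpose is just the transpose):

% The eigenvalues of A'*A are the squared singular values of A,
% padded with zeros, and its eigenvectors are the columns of V.
A = randn(3,4);
[U,S,V] = svd(A);
lam = sort(eig(A'*A), 'descend');   % 4 eigenvalues, one of them ~ 0
sqrt(lam(1:3))                      % matches diag(S), the singular values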
276
00:12:45,070 --> 00:12:47,830
So what are these
things U and V?
277
00:12:47,830 --> 00:12:51,690
They relate to the
eigenvectors of the product
278
00:12:51,690 --> 00:12:54,550
of A with itself, this
particular product of A
279
00:12:54,550 --> 00:12:58,480
with itself, or this particular
product of A with itself.
280
00:12:58,480 --> 00:13:01,310
Sigma are the singular values.
281
00:13:01,310 --> 00:13:03,750
And all matrices possess
this sort of a decomposition.
282
00:13:03,750 --> 00:13:06,530
They all have a set of singular
values and singular vectors.
283
00:13:06,530 --> 00:13:08,994
These sigmas are called the
singular values of A.
284
00:13:08,994 --> 00:13:10,160
They have a particular name.
285
00:13:10,160 --> 00:13:12,110
I'm going to show you how you
can use this decomposition
286
00:13:12,110 --> 00:13:14,000
to do something you
already know how to do,
287
00:13:14,000 --> 00:13:15,083
but how to do it formally.
288
00:13:18,200 --> 00:13:21,738
What are some properties of the
singular value decomposition?
289
00:13:24,810 --> 00:13:27,510
So if we take a matrix A and
we compute its singular value
290
00:13:27,510 --> 00:13:29,470
decomposition, this is
how you do it in Matlab.
291
00:13:33,030 --> 00:13:36,560
We'll find out, for this
matrix, U is identity.
292
00:13:36,560 --> 00:13:39,720
Sigma is identity with an
extra column pasted on it.
293
00:13:39,720 --> 00:13:40,920
And V is also identity.
294
00:13:40,920 --> 00:13:44,130
I mean, this is the simplest
possible three by four matrix
295
00:13:44,130 --> 00:13:45,517
I can write down.
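The computation itself is one line; here is my reading of that first example, the identity with a zero column appended:

% My reconstruction of the matrix on the slide: a 3x4 matrix.
A = [eye(3), zeros(3,1)];
[U, S, V] = svd(A)      % returns U = eye(3), S = [eye(3), zeros(3,1)],
                        % and V = eye(4)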
296
00:13:45,517 --> 00:13:47,850
You don't have to know how
to compute the singular value
297
00:13:47,850 --> 00:13:50,160
decomposition, you just
need to know that it
298
00:13:50,160 --> 00:13:51,420
can be computed in this way.
299
00:13:51,420 --> 00:13:53,160
You might be able to
guess how to compute it
300
00:13:53,160 --> 00:13:55,035
based on what we did
with eigenvalues earlier
301
00:13:55,035 --> 00:13:57,240
and eigenvectors.
302
00:13:57,240 --> 00:13:59,670
It'll turn out some of
the columns of sigma
303
00:13:59,670 --> 00:14:00,950
will be non-zero, right?
304
00:14:00,950 --> 00:14:04,390
There are three non-zero
columns of sigma.
305
00:14:04,390 --> 00:14:07,780
And the columns of
V that correspond
306
00:14:07,780 --> 00:14:11,800
to the zero columns of sigma
span the null space
307
00:14:11,800 --> 00:14:16,620
of the matrix A.
308
00:14:16,620 --> 00:14:20,220
So the first three
columns here are non-zero,
309
00:14:20,220 --> 00:14:24,221
the first three columns
of V. I'm sorry,
310
00:14:24,221 --> 00:14:25,970
the first three columns
here are non-zero.
311
00:14:25,970 --> 00:14:27,590
The last column is 0.
312
00:14:27,590 --> 00:14:30,440
The columns of sigma
which are 0 correspond
313
00:14:30,440 --> 00:14:33,920
to a particular column in V,
this last column here, which
314
00:14:33,920 --> 00:14:36,017
lives in the null
space of A. So you
315
00:14:36,017 --> 00:14:37,600
can see, if I take
A and I multiply it
316
00:14:37,600 --> 00:14:41,970
by any vector that's
proportional to 0, 0, 0, 1,
317
00:14:41,970 --> 00:14:43,340
I'll get back 0.
318
00:14:43,340 --> 00:14:46,640
So the null space
of A is spanned
319
00:14:46,640 --> 00:14:48,260
by all these vectors
corresponding
320
00:14:48,260 --> 00:14:49,850
to the 0 columns of sigma.
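Read off in code, that looks something like this sketch (the rank tolerance is a common choice, not something given in the lecture):

% Pick off the range and null space from the SVD factors.
A = [eye(3), zeros(3,1)];                  % the example matrix, as I read it
[U, S, V] = svd(A);
s = diag(S);                               % the singular values
r = sum(s > max(size(A))*eps(max(s)));     % numerical rank of A
rangeA = U(:, 1:r);                        % columns spanning the range
nullA  = V(:, r+1:end);                    % columns spanning the null space
% Here nullA is proportional to [0; 0; 0; 1], matching the discussion above.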
321
00:14:52,980 --> 00:14:54,807
Some of the columns
of sigma are non-zero.
322
00:14:54,807 --> 00:14:55,890
These first three columns.
323
00:14:55,890 --> 00:15:02,000
And the columns of U corresponding
to those three columns
324
00:15:02,000 --> 00:15:04,550
span the range of A. So
if I do the singular value
325
00:15:04,550 --> 00:15:09,410
decomposition of a matrix,
and I look at U, V, and sigma
326
00:15:09,410 --> 00:15:11,060
and what they're composed of--
327
00:15:11,060 --> 00:15:15,050
where sigma is 0 and non-zero,
and the corresponding columns
328
00:15:15,050 --> 00:15:17,330
or rows of U and V--
then I can figure out
329
00:15:17,330 --> 00:15:25,110
what vectors span the range
and null space of the matrix A.
330
00:15:25,110 --> 00:15:26,710
Here's another example.
331
00:15:26,710 --> 00:15:29,130
So here I have A.
Now instead of being
332
00:15:29,130 --> 00:15:32,850
three rows by four columns,
it's four rows by three columns.
333
00:15:32,850 --> 00:15:34,660
And here's the singular
value decomposition
334
00:15:34,660 --> 00:15:37,140
that comes out of Matlab.
335
00:15:37,140 --> 00:15:41,580
There are no vectors that
live in the null space of A,
336
00:15:41,580 --> 00:15:44,160
and there are no 0
columns in sigma.
337
00:15:44,160 --> 00:15:46,975
There's no corresponding
columns in V.
338
00:15:46,975 --> 00:15:50,920
There are no vectors
in the null space of A.
339
00:15:50,920 --> 00:15:55,480
The range of A is spanned
by the columns corresponding
340
00:15:55,480 --> 00:15:58,420
to the non-zero-- the
columns of U corresponding
341
00:15:58,420 --> 00:15:59,950
to the non-zero
columns of sigma.
342
00:15:59,950 --> 00:16:02,890
So it's these three columns
in the first three rows.
343
00:16:02,890 --> 00:16:07,000
And these first three
columns, clearly they span--
344
00:16:07,000 --> 00:16:12,040
they describe the same range
as the three columns in A.
345
00:16:12,040 --> 00:16:13,540
So the singular
value decomposition
346
00:16:13,540 --> 00:16:17,290
gives us direct access
to the null space
347
00:16:17,290 --> 00:16:18,810
and the range of a matrix.
348
00:16:18,810 --> 00:16:21,570
That's handy.
349
00:16:21,570 --> 00:16:24,170
And it can be used
in various ways.
350
00:16:24,170 --> 00:16:26,140
So here's one example
where it can be used.
351
00:16:26,140 --> 00:16:29,830
Here I have a fingerprint.
352
00:16:29,830 --> 00:16:31,150
It's a bitmap.
353
00:16:31,150 --> 00:16:33,767
It's a square bit of
data, like a matrix,
354
00:16:33,767 --> 00:16:35,350
and each of the
elements of the matrix
355
00:16:35,350 --> 00:16:39,430
takes on a value describing
how dark or light that pixel is.
356
00:16:39,430 --> 00:16:43,840
Let's say it's grayscale, and
its values are between 0 and 255.
357
00:16:43,840 --> 00:16:45,820
That's pretty typical.
358
00:16:45,820 --> 00:16:48,490
So I have this matrix, and
each element of the matrix
359
00:16:48,490 --> 00:16:50,410
corresponds to a pixel.
360
00:16:50,410 --> 00:16:53,470
And I do a singular
value decomposition.
361
00:16:53,470 --> 00:16:55,750
Some of the singular
values, the values of sigma,
362
00:16:55,750 --> 00:16:57,760
are bigger than others.
363
00:16:57,760 --> 00:17:00,770
They're all positive, but
some are bigger than others.
364
00:17:00,770 --> 00:17:02,440
The ones that are
biggest in magnitude
365
00:17:02,440 --> 00:17:06,770
carry the most information
content about the matrix.
366
00:17:06,770 --> 00:17:11,470
So we can do data compression by
neglecting singular values that
367
00:17:11,470 --> 00:17:14,980
are smaller than some
threshold, and also neglecting
368
00:17:14,980 --> 00:17:17,439
the corresponding
singular vectors.
369
00:17:17,439 --> 00:17:18,730
And that's what I've done here.
370
00:17:18,730 --> 00:17:21,400
So here's the original
bitmap of the fingerprint.
371
00:17:21,400 --> 00:17:23,680
I did the singular
value decomposition,
372
00:17:23,680 --> 00:17:27,819
and then I retained only the 50
biggest singular values and I
373
00:17:27,819 --> 00:17:29,830
left all the other
singular values out.
374
00:17:29,830 --> 00:17:32,290
This bitmap was something
like, I don't know,
375
00:17:32,290 --> 00:17:34,450
300 pixels by 300
pixels, so there's
376
00:17:34,450 --> 00:17:37,210
like 300 singular
values, but I got rid
377
00:17:37,210 --> 00:17:40,480
of 5/6 of the
information content.
378
00:17:40,480 --> 00:17:43,120
I dropped 5/6 of the
singular vectors,
379
00:17:43,120 --> 00:17:44,800
and then I
reconstructed the matrix
380
00:17:44,800 --> 00:17:47,320
from the singular values
and those singular vectors,
381
00:17:47,320 --> 00:17:48,910
and you get a faithful
representation
382
00:17:48,910 --> 00:17:51,142
of the original fingerprint.
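A sketch of the compression step, with X standing in for the grayscale bitmap (the names and the random stand-in are mine; 50 is the retained count mentioned above):

% Keep only the k largest singular values and their singular vectors.
X = randn(300);                 % stand-in for the 300 x 300 fingerprint
[U, S, V] = svd(X);
k = 50;
Xk = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % rank-k approximation of X
% Storage drops from 300*300 numbers to k*(300 + 300 + 1) numbers.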
383
00:17:51,142 --> 00:17:52,600
So the singular
value decomposition
384
00:17:52,600 --> 00:17:54,433
says something about
the information content
385
00:17:54,433 --> 00:17:57,100
in the transformation
that is the matrix, right?
386
00:17:57,100 --> 00:17:58,720
There are some
transformations that
387
00:17:58,720 --> 00:18:03,172
are of lower power or
importance than others.
388
00:18:03,172 --> 00:18:04,630
And the magnitude
of these singular
389
00:18:04,630 --> 00:18:06,300
values tells you what they are.
390
00:18:06,300 --> 00:18:09,680
Does that make sense?
391
00:18:09,680 --> 00:18:10,680
How else can it be used?
392
00:18:10,680 --> 00:18:14,370
Well, one way it can be
used is finding the least
393
00:18:14,370 --> 00:18:17,820
squares solution
to the equation Ax
394
00:18:17,820 --> 00:18:22,420
equals b, where A is no
longer a square matrix, OK?
395
00:18:26,410 --> 00:18:28,030
You've done this
in other contexts
396
00:18:28,030 --> 00:18:32,860
before where the equations
are overspecified.
397
00:18:32,860 --> 00:18:36,090
We have more equations than
unknowns, like data fitting.
398
00:18:36,090 --> 00:18:39,820
You form the normal equations,
you multiply both sides of Ax
399
00:18:39,820 --> 00:18:45,910
equals b by A transpose, and
then invert A transpose A.
400
00:18:45,910 --> 00:18:47,890
You might not be
too surprised, then,
401
00:18:47,890 --> 00:18:50,380
to think that singular value
decomposition could be useful
402
00:18:50,380 --> 00:18:50,920
here too.
403
00:18:50,920 --> 00:18:54,790
Since we already saw the data in
a singular value decomposition
404
00:18:54,790 --> 00:18:58,630
corresponds to eigenvectors and
eigenvalues of this A transpose
405
00:18:58,630 --> 00:19:00,130
A, right?
406
00:19:00,130 --> 00:19:02,950
But there's a way to use
this sort of decomposition
407
00:19:02,950 --> 00:19:05,140
formally to solve
problems that are
408
00:19:05,140 --> 00:19:09,730
both overspecified
and underspecified.
409
00:19:09,730 --> 00:19:14,620
Least squares means find
the vector of solutions
410
00:19:14,620 --> 00:19:21,860
x that minimizes
this function phi.
411
00:19:21,860 --> 00:19:24,560
Phi is the length of the
vector given by the difference
412
00:19:24,560 --> 00:19:26,450
between Ax and b.
413
00:19:26,450 --> 00:19:29,900
It's one measure of how far
in error our solution x is.
414
00:19:29,900 --> 00:19:34,220
So let's define the value
x which is least in error.
415
00:19:34,220 --> 00:19:36,065
This is one definition
of least squares.
416
00:19:40,490 --> 00:19:43,950
And I know the singular value
decomposition of A. So A
417
00:19:43,950 --> 00:19:48,685
is U sigma times V conjugate transpose. So I
have U sigma V conjugate transpose times x.
418
00:19:48,685 --> 00:19:52,140
I can factor out U, and I've
got a factor of U transpose,
419
00:19:52,140 --> 00:19:54,960
or U conjugate transpose
multiplying by b.
420
00:19:54,960 --> 00:20:00,000
So Ax minus b is the same as
U times the quantity sigma V
421
00:20:00,000 --> 00:20:05,240
conjugate transpose x minus
U conjugate transpose b.
422
00:20:05,240 --> 00:20:08,480
We want to know the x
that minimizes this phi.
423
00:20:08,480 --> 00:20:10,130
It's an optimization problem.
424
00:20:10,130 --> 00:20:12,990
We'll talk in great detail about
these sorts of problems later.
425
00:20:12,990 --> 00:20:15,350
This one is so easy to do,
we can just work it out
426
00:20:15,350 --> 00:20:18,152
in a couple lines of text.
427
00:20:18,152 --> 00:20:19,610
We'll define a new
set of unknowns,
428
00:20:19,610 --> 00:20:25,130
y, which is V transpose times
x, and a new right-hand side
429
00:20:25,130 --> 00:20:29,270
for a system of equations p,
which is U transpose times b.
430
00:20:29,270 --> 00:20:31,220
And then we can rewrite
our function phi
431
00:20:31,220 --> 00:20:32,600
that we're trying to minimize.
432
00:20:32,600 --> 00:20:37,720
So phi then becomes
U sigma y minus p.
433
00:20:37,720 --> 00:20:39,080
U is a unitary matrix.
434
00:20:39,080 --> 00:20:45,440
It imparts no stretch in the two
norm, so this sigma y minus p
435
00:20:45,440 --> 00:20:49,850
doesn't get elongated by
multiplication with U.
436
00:20:49,850 --> 00:20:52,280
So its length,
the length of this,
437
00:20:52,280 --> 00:20:55,660
is the same as the length
of sigma y minus p.
438
00:20:55,660 --> 00:20:56,915
You can prove this.
439
00:20:56,915 --> 00:20:58,540
It's not very difficult
to show at all.
440
00:20:58,540 --> 00:21:02,430
You use the definition of
the two norm to prove it.
441
00:21:02,430 --> 00:21:10,150
So phi is minimized by y's,
which make this norm smallest,
442
00:21:10,150 --> 00:21:11,170
make it closest to 0.
443
00:21:14,240 --> 00:21:17,060
Let r be the number
of non-zero singular
444
00:21:17,060 --> 00:21:19,190
values, the number
of those sigmas
445
00:21:19,190 --> 00:21:22,440
which are not equal to 0.
446
00:21:22,440 --> 00:21:24,360
That's also the rank of A.
447
00:21:24,360 --> 00:21:29,330
Then I can rewrite
phi as the sum from i
448
00:21:29,330 --> 00:21:36,060
equals 1 to r of sigma i i
times y i minus p i squared.
449
00:21:36,060 --> 00:21:40,160
That's the part of this length,
this Euclidean length,
450
00:21:40,160 --> 00:21:42,320
for which sigma is non-zero.
451
00:21:42,320 --> 00:21:46,730
Plus the sum from r
plus 1 to n, the sum
452
00:21:46,730 --> 00:21:50,240
over the rest of
the values of p,
453
00:21:50,240 --> 00:21:52,590
for which the
corresponding sigmas are 0.
454
00:21:59,710 --> 00:22:02,830
I want to minimize
phi, and the only thing
455
00:22:02,830 --> 00:22:05,337
that I can change to
minimize it is what?
456
00:22:08,139 --> 00:22:10,530
What am I free to
pick in this equation
457
00:22:10,530 --> 00:22:13,930
in order to make phi
as small as possible?
458
00:22:13,930 --> 00:22:14,570
Yeah?
459
00:22:14,570 --> 00:22:15,070
AUDIENCE: y.
460
00:22:15,070 --> 00:22:17,290
JAMES W. SWAN: y,
so I need to choose
461
00:22:17,290 --> 00:22:21,110
the y's that make this
phi as small as possible.
462
00:22:21,110 --> 00:22:22,955
What value should I
choose for the y's?
463
00:22:26,678 --> 00:22:27,666
What do you think?
464
00:22:30,630 --> 00:22:34,380
AUDIENCE: [INAUDIBLE]
465
00:22:34,380 --> 00:22:35,630
JAMES W. SWAN: Perfect, right?
466
00:22:35,630 --> 00:22:40,000
Choose y equals p
i over sigma i i.
467
00:22:40,000 --> 00:22:42,610
Right, y i is p
i over sigma i i.
468
00:22:42,610 --> 00:22:46,150
Then all of these terms is 0.
469
00:22:46,150 --> 00:22:48,490
I can't make this sum
any smaller than that.
470
00:22:48,490 --> 00:22:53,980
That fixes the value
of y i up to r.
471
00:22:53,980 --> 00:22:57,280
I can't do anything about
this left over bit here.
472
00:22:57,280 --> 00:23:00,340
There's no choice of
y that's going to make
473
00:23:00,340 --> 00:23:01,657
this part any smaller.
474
00:23:01,657 --> 00:23:02,490
It's just left over.
475
00:23:02,490 --> 00:23:04,275
It's some remainder that
we can't make any smaller
476
00:23:04,275 --> 00:23:05,410
or minimize any further.
477
00:23:05,410 --> 00:23:08,125
There isn't an exact solution
to this problem, in many cases.
478
00:23:10,690 --> 00:23:17,190
But one way this could be
0 is if r is equal to n.
479
00:23:17,190 --> 00:23:19,570
Then there are no left
over unspecified terms,
480
00:23:19,570 --> 00:23:22,375
and then this y i
equals p i over sigma i i
481
00:23:22,375 --> 00:23:23,920
is the exact solution
to the problem.
482
00:23:28,280 --> 00:23:29,690
So this is what you told me.
483
00:23:29,690 --> 00:23:36,230
Choose y i is p i over sigma i i
for i between 1
484
00:23:36,230 --> 00:23:38,650
and r.
485
00:23:38,650 --> 00:23:40,710
There are going to
be values of y i
486
00:23:40,710 --> 00:23:46,070
that go between r plus 1 and
m, because A was a matrix that
487
00:23:46,070 --> 00:23:47,990
mapped from m to n, right?
488
00:23:47,990 --> 00:23:52,160
So I have extra values of y that
could be specified potentially.
489
00:23:52,160 --> 00:23:56,279
If that's true, if r
plus 1 is smaller than m,
490
00:23:56,279 --> 00:23:58,570
then there's some components
of y that I don't get to--
491
00:23:58,570 --> 00:23:59,770
I can't specify, right?
492
00:23:59,770 --> 00:24:02,930
My system of equations is
somehow underdetermined.
493
00:24:02,930 --> 00:24:05,540
I need some external
information to show me what
494
00:24:05,540 --> 00:24:07,740
values to pick for those y i.
495
00:24:07,740 --> 00:24:08,900
I don't know.
496
00:24:08,900 --> 00:24:10,400
I can't use them.
497
00:24:10,400 --> 00:24:13,420
Sometimes people just
set y i equal to 0.
498
00:24:13,420 --> 00:24:16,220
That's sort of silly,
but that's what's done.
499
00:24:16,220 --> 00:24:20,150
It's called the minimum
norm least squares solution.
500
00:24:20,150 --> 00:24:23,090
y has minimum length, when you
set all these other components
501
00:24:23,090 --> 00:24:24,050
to 0.
502
00:24:24,050 --> 00:24:27,140
But the truth is, we can't
specify those components,
503
00:24:27,140 --> 00:24:27,716
right?
504
00:24:27,716 --> 00:24:29,090
We need some
external information
505
00:24:29,090 --> 00:24:31,850
in order to specify them.
506
00:24:31,850 --> 00:24:34,680
Once we know y, we
can find x going back
507
00:24:34,680 --> 00:24:36,260
to our definition of what y is.
508
00:24:36,260 --> 00:24:38,840
So I multiply this equation
by V on both sides,
509
00:24:38,840 --> 00:24:42,350
and I'll get V y equals x.
510
00:24:42,350 --> 00:24:44,180
So I can find my
least squares solution
511
00:24:44,180 --> 00:24:47,596
to the problem from the
singular value decomposition.
512
00:24:47,596 --> 00:24:49,220
So I can find the
least squares solution
513
00:24:49,220 --> 00:24:52,850
to both overdetermined and
underdetermined problems using
514
00:24:52,850 --> 00:24:55,694
singular value decomposition.
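The whole recipe, collected as a sketch (the example system and the tolerance for "non-zero" are my own assumptions; for real A the daggers are plain transposes):

% Minimum norm least squares solution of A*x = b via the SVD.
A = randn(5,3); b = randn(5,1);            % assumed overdetermined example
[U, S, V] = svd(A);
p = U' * b;                                % the new right-hand side
s = diag(S);
r = sum(s > max(size(A))*eps(max(s)));     % rank: number of non-zero sigmas
y = zeros(size(A,2), 1);                   % unspecified components set to 0
y(1:r) = p(1:r) ./ s(1:r);                 % y_i = p_i / sigma_ii, i = 1..r
x = V * y;                                 % back to the original unknowns
norm(x - pinv(A)*b)                        % ~ 0: agrees with the pseudoinverse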
515
00:24:55,694 --> 00:24:57,110
It inherits all
the properties you
516
00:24:57,110 --> 00:24:58,670
know of solving the
normal equations,
517
00:24:58,670 --> 00:25:01,430
multiplying by A transpose
the entire equation,
518
00:25:01,430 --> 00:25:04,722
and solving for a least
squares solution that way.
519
00:25:04,722 --> 00:25:06,680
But that's only good for
overdetermined systems
520
00:25:06,680 --> 00:25:07,250
of equations.
521
00:25:07,250 --> 00:25:09,041
This can work for
underdetermined equations
522
00:25:09,041 --> 00:25:09,770
as well.
523
00:25:09,770 --> 00:25:12,450
And maybe we do have
extra information
524
00:25:12,450 --> 00:25:15,080
that lets us specify these
other components somehow.
525
00:25:15,080 --> 00:25:17,420
Maybe we do a
separate optimization
526
00:25:17,420 --> 00:25:20,110
that chooses from all
possible solutions
527
00:25:20,110 --> 00:25:23,070
where these y i's are free,
and picks the best one subject
528
00:25:23,070 --> 00:25:25,890
to some other constraint.
529
00:25:25,890 --> 00:25:27,550
Does it make sense?
530
00:25:27,550 --> 00:25:29,010
OK, that's the
last decomposition
531
00:25:29,010 --> 00:25:32,070
we're going to talk about.
532
00:25:32,070 --> 00:25:34,890
It's as expensive to compute
the singular value decomposition
533
00:25:34,890 --> 00:25:36,810
as it is to solve a
system of equations.
534
00:25:36,810 --> 00:25:38,310
You might have
guessed that it's got
535
00:25:38,310 --> 00:25:40,470
an order N cubed flavor to it.
536
00:25:40,470 --> 00:25:42,660
It's kind of inescapable
that we run up
537
00:25:42,660 --> 00:25:45,920
against those
computational difficulties,
538
00:25:45,920 --> 00:25:48,530
order N cubed
computational complexity.
539
00:25:48,530 --> 00:25:50,730
And there are many problems
of practical interest,
540
00:25:50,730 --> 00:25:54,180
particularly solutions of
PDEs, for which that's not
541
00:25:54,180 --> 00:25:56,000
going to cut it.
542
00:25:56,000 --> 00:25:59,970
Where you couldn't solve
the problem with that sort
543
00:25:59,970 --> 00:26:01,330
of scaling in time.
544
00:26:01,330 --> 00:26:04,710
You couldn't compute the
Gaussian elimination,
545
00:26:04,710 --> 00:26:06,540
or the singular
value decomposition,
546
00:26:06,540 --> 00:26:08,970
or an eigenvalue decomposition.
547
00:26:08,970 --> 00:26:09,960
It won't work.
548
00:26:09,960 --> 00:26:14,520
And in those cases, we appeal
to not exact solution methods,
549
00:26:14,520 --> 00:26:17,010
but approximate
solution methods.
550
00:26:17,010 --> 00:26:20,120
So instead of trying to
get an exact solution,
551
00:26:20,120 --> 00:26:22,080
we'll try to formulate
one that's good enough.
552
00:26:22,080 --> 00:26:24,390
We already know the computer
introduces numerical error
553
00:26:24,390 --> 00:26:26,070
anyways.
554
00:26:26,070 --> 00:26:28,440
Maybe we don't need machine
precision in our solution
555
00:26:28,440 --> 00:26:30,600
or something close to machine
precision in our solution.
556
00:26:30,600 --> 00:26:32,266
Maybe we're solving
an engineering problem,
557
00:26:32,266 --> 00:26:34,230
and we're willing to
accept relative errors
558
00:26:34,230 --> 00:26:37,770
on the order of 10 to the
minus 3 or 10 to the minus 5,
559
00:26:37,770 --> 00:26:41,550
some specified tolerance
that we apply to the problem.
560
00:26:41,550 --> 00:26:44,250
And in those circumstances,
we use iterative methods
561
00:26:44,250 --> 00:26:46,095
to solve systems of
equations instead
562
00:26:46,095 --> 00:26:49,020
of exact methods,
elimination methods,
563
00:26:49,020 --> 00:26:50,640
or matrix
decomposition methods.
564
00:26:53,830 --> 00:26:56,560
These algorithms are all
based on iterative refinement
565
00:26:56,560 --> 00:26:57,580
of an initial guess.
566
00:26:57,580 --> 00:26:59,260
So if we have some
system of equations
567
00:26:59,260 --> 00:27:02,350
we're trying to
solve, Ax equals b,
568
00:27:02,350 --> 00:27:05,410
we'll formulate some
linear map, right?
569
00:27:05,410 --> 00:27:09,910
x i plus 1 will be some
matrix C times x i
570
00:27:09,910 --> 00:27:13,540
plus some little vector c
where x i is my last best
571
00:27:13,540 --> 00:27:17,080
guess for the solution to
this problem, and x i plus 1
572
00:27:17,080 --> 00:27:20,320
is my next best guess for
the solution to this problem.
573
00:27:20,320 --> 00:27:24,700
And I'm hoping, as I apply
this map more and more times,
574
00:27:24,700 --> 00:27:27,010
I'm creeping closer
to the exact solution
575
00:27:27,010 --> 00:27:29,590
to the original
system of equations.
576
00:27:29,590 --> 00:27:33,610
The map will converge
when x i plus 1
577
00:27:33,610 --> 00:27:36,190
approaches x i, when the
map isn't making any changes
578
00:27:36,190 --> 00:27:39,250
to the vector anymore.
579
00:27:39,250 --> 00:27:48,490
And the converged value will
be a solution when x i--
580
00:27:48,490 --> 00:27:51,910
which is equal to I
minus C, inverse, times little c,
581
00:27:51,910 --> 00:27:54,370
if I replace x i plus
1 with x i up here,
582
00:27:54,370 --> 00:27:58,120
so I say that my map has
converged-- when this value is
583
00:27:58,120 --> 00:28:00,160
equivalent to A
inverse times b, when
584
00:28:00,160 --> 00:28:03,880
it's a solution to the
original problem, right?
585
00:28:03,880 --> 00:28:05,480
So my map may converge.
586
00:28:05,480 --> 00:28:08,320
It may not converge to a
solution of the problem I like,
587
00:28:08,320 --> 00:28:09,790
but if it satisfies
this condition,
588
00:28:09,790 --> 00:28:12,730
then it has converged to
a solution of the problem
589
00:28:12,730 --> 00:28:14,270
that I like as well.
590
00:28:14,270 --> 00:28:18,400
And so it's all about using this
C here and this little c here
591
00:28:18,400 --> 00:28:22,000
so that this map converges
to a solution of the problem
592
00:28:22,000 --> 00:28:22,720
I'm after.
593
00:28:22,720 --> 00:28:25,000
And there are lots of
schemes for doing this.
594
00:28:25,000 --> 00:28:26,680
Some of them are kind of ad hoc.
595
00:28:26,680 --> 00:28:28,230
I'm going to show
you one right now.
596
00:28:28,230 --> 00:28:30,490
And then when we
do optimization,
597
00:28:30,490 --> 00:28:32,230
we'll talk about
a more formal way
598
00:28:32,230 --> 00:28:35,210
of doing this for
which you can guarantee
599
00:28:35,210 --> 00:28:37,937
very rapid convergence
to a solution.
600
00:28:37,937 --> 00:28:40,020
So here's a system of
equations I'd like to solve.
601
00:28:40,020 --> 00:28:41,510
It's not a very big one.
602
00:28:41,510 --> 00:28:44,050
It doesn't really make sense
to solve this one iteratively,
603
00:28:44,050 --> 00:28:45,940
but it's a nice illustration.
604
00:28:45,940 --> 00:28:49,570
One way to go about
formulating this map
605
00:28:49,570 --> 00:28:53,290
is to split this
matrix into two parts.
606
00:28:53,290 --> 00:28:57,160
So I'll split it into a diagonal
part and an off diagonal part.
607
00:28:57,160 --> 00:29:00,070
So I haven't changed the
problem at all by doing that.
608
00:29:00,070 --> 00:29:03,740
And then I'm going to
rename this x x i plus 1,
609
00:29:03,740 --> 00:29:07,060
and I'm going to
rename this x x i.
610
00:29:07,060 --> 00:29:09,454
And then move this
matrix vector product
611
00:29:09,454 --> 00:29:10,870
to the other side
of the equation.
612
00:29:10,870 --> 00:29:12,376
And here's my map.
613
00:29:12,376 --> 00:29:13,750
Of course, this
matrix multiplication
614
00:29:13,750 --> 00:29:16,210
doesn't make any-- it's
not useful to write it out
615
00:29:16,210 --> 00:29:16,780
explicitly.
616
00:29:16,780 --> 00:29:19,070
This is just identity.
617
00:29:19,070 --> 00:29:20,840
So I can drop this entirely.
618
00:29:20,840 --> 00:29:22,090
This is just x i plus one.
619
00:29:22,090 --> 00:29:23,320
So here's my map.
620
00:29:23,320 --> 00:29:27,430
Take an initial guess,
multiply it by this matrix,
621
00:29:27,430 --> 00:29:31,600
add the vector 1, 0, and repeat
over and over and over again.
622
00:29:31,600 --> 00:29:33,130
Hopefully-- we
don't really know--
623
00:29:33,130 --> 00:29:34,671
but hopefully, it's
going to converge
624
00:29:34,671 --> 00:29:38,502
to a solution of the
original linear equations.
625
00:29:38,502 --> 00:29:39,710
I didn't make up that method.
626
00:29:39,710 --> 00:29:42,490
That's a method called
Jacobi Iteration.
627
00:29:42,490 --> 00:29:46,280
And the strategy is to split
the matrix A into two parts--
628
00:29:46,280 --> 00:29:49,010
a sum of its diagonal
elements, and its off diagonal
629
00:29:49,010 --> 00:29:51,050
elements--
630
00:29:51,050 --> 00:29:54,410
and rewrite the original
equations as an iterative map.
631
00:29:54,410 --> 00:30:00,650
So D times x i plus 1 is equal
to minus R times x i plus b.
632
00:30:00,650 --> 00:30:07,600
Or x i plus 1 is D inverse
times minus R x i plus b.
633
00:30:07,600 --> 00:30:11,100
If the equations converge,
then D plus R times x i
634
00:30:11,100 --> 00:30:13,780
has to be equal to b, we
will have found a solution.
635
00:30:13,780 --> 00:30:15,040
If it converges, right?
636
00:30:15,040 --> 00:30:18,729
If these iterations
approach a steady value.
637
00:30:18,729 --> 00:30:20,770
If they don't change from
iteration to iteration.
639
00:30:21,940 --> 00:30:23,796
The nice thing about
the Jacobi method is it
640
00:30:23,796 --> 00:30:25,170
turns the hard
problem, the order
641
00:30:25,170 --> 00:30:29,920
N cubed problem of
computing A inverse b,
642
00:30:29,920 --> 00:30:31,750
into a succession
of easy problems,
643
00:30:31,750 --> 00:30:38,710
D inverse times some vector c.
How many calculations does it
644
00:30:38,710 --> 00:30:40,150
take to compute that D inverse?
645
00:30:44,710 --> 00:30:46,454
N, that's right, order N.
646
00:30:46,454 --> 00:30:47,620
It's just a diagonal matrix.
647
00:30:47,620 --> 00:30:49,880
I invert each of its diagonal
elements, and I'm done.
648
00:30:49,880 --> 00:30:53,790
So I went from order N cubed,
which was going to be hard,
649
00:30:53,790 --> 00:30:57,320
into a succession
of order N problems.
650
00:30:57,320 --> 00:30:59,730
So as long as it doesn't
take me order N squared
651
00:30:59,730 --> 00:31:02,879
iterations to get to the
solution that I want,
652
00:31:02,879 --> 00:31:03,670
I'm going to be OK.
653
00:31:03,670 --> 00:31:05,378
This is going to be
a viable way to solve
654
00:31:05,378 --> 00:31:07,690
this problem faster than
finding the exact solution.
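A minimal Jacobi sketch; the small diagonally dominant system and the stopping test are stand-ins of mine, not the example on the slides:

% Jacobi iteration: split A into D (diagonal) and R (off-diagonal).
A = [4 -1 0; -1 4 -1; 0 -1 4]; b = [2; 4; 10];
D = diag(diag(A));
R = A - D;
x = zeros(3,1);                        % initial guess
for it = 1:100
    xnew = D \ (b - R*x);              % only order N work per iteration
    done = norm(xnew - x) <= 1e-8*norm(b);
    x = xnew;
    if done, break, end
end
x                                      % approaches A \ b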
655
00:31:13,240 --> 00:31:15,640
How do you know
that it converges?
656
00:31:15,640 --> 00:31:17,020
That's the question.
657
00:31:17,020 --> 00:31:20,290
Is this thing actually
going to converge or not,
658
00:31:20,290 --> 00:31:23,980
or are these iterations just
going to run on and on forever?
659
00:31:23,980 --> 00:31:26,620
Well, one way to check whether
it will converge or not
660
00:31:26,620 --> 00:31:31,220
is to go back up to this
equation here, and substitute b
661
00:31:31,220 --> 00:31:34,890
equals Ax, where x is the
exact solution to the problem.
662
00:31:37,420 --> 00:31:40,810
And you can transform,
then, this equation into one
663
00:31:40,810 --> 00:31:45,640
that looks like x i plus
1 minus x equal to minus D
664
00:31:45,640 --> 00:31:49,930
inverse times R x i minus x.
665
00:31:49,930 --> 00:31:52,780
And if I take the
norm of both sides
666
00:31:52,780 --> 00:31:55,270
and I apply our
norm inequality--
667
00:31:55,270 --> 00:31:58,000
where the norm of a
matrix vector product
668
00:31:58,000 --> 00:32:01,460
is smaller than the product
of the norms of the matrix
669
00:32:01,460 --> 00:32:02,860
and the vector--
670
00:32:02,860 --> 00:32:06,070
then I can get a
ratio like this.
671
00:32:06,070 --> 00:32:10,060
That the absolute error
in iteration i plus 1
672
00:32:10,060 --> 00:32:13,090
divided by the absolute
error in iteration i
673
00:32:13,090 --> 00:32:17,410
is smaller than the
norm of this matrix.
674
00:32:17,410 --> 00:32:21,810
So if I'm converging,
then what I expect
675
00:32:21,810 --> 00:32:25,320
is this ratio should
be smaller than 1.
676
00:32:25,320 --> 00:32:27,240
The error in my
next approximation
677
00:32:27,240 --> 00:32:30,080
should be smaller than the error
in my current approximation.
678
00:32:30,080 --> 00:32:31,450
That makes sense?
679
00:32:31,450 --> 00:32:35,100
So that means that I would hope
that the norm of this matrix
680
00:32:35,100 --> 00:32:36,690
is also smaller than 1.
681
00:32:36,690 --> 00:32:40,290
If it is, then I'm going to
be guaranteed to converge.
682
00:32:40,290 --> 00:32:42,912
So for a particular
coefficient matrix,
683
00:32:42,912 --> 00:32:45,120
for a system of linear
equations I'm trying to solve,
684
00:32:45,120 --> 00:32:47,010
I may be able to find--
685
00:32:47,010 --> 00:32:50,430
I may find that this is true.
686
00:32:50,430 --> 00:32:51,960
And then I can
apply this method,
687
00:32:51,960 --> 00:32:54,980
and I'll converge to a solution.
688
00:32:54,980 --> 00:32:58,100
We call this sort of
convergence linear.
689
00:32:58,100 --> 00:32:59,990
Whatever this number
is, it tells me
690
00:32:59,990 --> 00:33:03,890
the fraction by which the
error is reduced from iteration
691
00:33:03,890 --> 00:33:04,940
to iteration.
692
00:33:04,940 --> 00:33:07,810
So suppose this is 1/10.
693
00:33:07,810 --> 00:33:11,510
Then the absolute error is going
to be reduced by a factor of 10
694
00:33:11,510 --> 00:33:14,300
in each iteration.
695
00:33:14,300 --> 00:33:15,720
It's not going to
be 1/10 usually.
696
00:33:15,720 --> 00:33:17,095
It's going to be
something that's
697
00:33:17,095 --> 00:33:20,201
a little bit bigger than that
typically, but that's the idea.
698
00:33:20,201 --> 00:33:22,700
You can show-- I would encourage
you to try to work this out
699
00:33:22,700 --> 00:33:25,560
on your own-- but you can show
that the infinity norm of this
700
00:33:25,560 --> 00:33:26,060
product--
701
00:33:29,060 --> 00:33:33,050
infinity norm of this
product is equal to this.
702
00:33:33,050 --> 00:33:36,040
And if I ask that the
infinity norm of this product
703
00:33:36,040 --> 00:33:38,540
be smaller than 1,
that's guaranteed
704
00:33:38,540 --> 00:33:41,450
when the diagonal values of
the matrix in absolute value
705
00:33:41,450 --> 00:33:43,850
are bigger than
the sum of the off
706
00:33:43,850 --> 00:33:46,670
diagonal values in a particular
row or a particular column.
707
00:33:46,670 --> 00:33:49,187
And that kind of matrix we
call diagonally dominant.
708
00:33:49,187 --> 00:33:51,770
The diagonal values are bigger
than the sum in absolute value
709
00:33:51,770 --> 00:33:53,570
of the off diagonal pieces.
710
00:33:53,570 --> 00:33:57,290
So diagonally dominant matrices,
which come up quite often,
711
00:33:57,290 --> 00:33:59,240
can be-- those linear
equations based
712
00:33:59,240 --> 00:34:02,660
on those matrices can be solved
with reasonable efficiency using
713
00:34:02,660 --> 00:34:04,262
the Jacobi method.
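That criterion is a one-line check; for the stand-in matrix from the Jacobi sketch above:

% Strict diagonal dominance by rows: |a_ii| > sum over j ~= i of |a_ij|,
% which is the same as 2*|a_ii| > sum over all j of |a_ij|.
A = [4 -1 0; -1 4 -1; 0 -1 4];
all( 2*abs(diag(A)) > sum(abs(A), 2) )   % returns 1 (true): Jacobi converges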
714
00:34:04,262 --> 00:34:05,720
There are better
methods to choose.
715
00:34:05,720 --> 00:34:07,350
I'll show you one in a second.
716
00:34:07,350 --> 00:34:08,900
But you can guarantee
that this is
717
00:34:08,900 --> 00:34:11,360
going to converge to a solution,
and that the solution will
718
00:34:11,360 --> 00:34:13,219
be the right solution to
the linear equations you
719
00:34:13,219 --> 00:34:14,094
were trying to solve.
720
00:34:19,440 --> 00:34:23,010
So if the goal is just to
turn hard problems into easier
721
00:34:23,010 --> 00:34:25,500
to solve problems, then
there are other natural ways
722
00:34:25,500 --> 00:34:27,540
to want to split a matrix.
723
00:34:27,540 --> 00:34:32,100
So maybe you want to split into
a lower triangular part which
724
00:34:32,100 --> 00:34:34,829
contains the diagonal
elements of A,
725
00:34:34,829 --> 00:34:36,449
and an upper
triangular part which
726
00:34:36,449 --> 00:34:41,130
has no diagonal elements of A.
We just split this thing apart.
727
00:34:41,130 --> 00:34:43,370
And then we could rewrite
our system of equations
728
00:34:43,370 --> 00:34:45,750
as an iterative map
like this, L times x i
729
00:34:45,750 --> 00:34:50,429
plus 1 is minus U
times x i plus b.
730
00:34:50,429 --> 00:34:53,550
All I have to do is invert
L to find my next iteration.
731
00:34:53,550 --> 00:34:55,590
And how expensive
computationally
732
00:34:55,590 --> 00:35:00,360
is it to solve a system of
equations which is triangular?
733
00:35:00,360 --> 00:35:02,630
This is a process we
call forward substitution.
734
00:35:02,630 --> 00:35:03,540
Its order--
735
00:35:03,540 --> 00:35:04,470
AUDIENCE: N squared.
736
00:35:04,470 --> 00:35:05,820
JAMES W. SWAN: --N squared.
737
00:35:05,820 --> 00:35:08,160
So we still beat N cubed.
738
00:35:08,160 --> 00:35:11,640
One would hope that it doesn't
require too many iterations
739
00:35:11,640 --> 00:35:12,420
to do this.
740
00:35:12,420 --> 00:35:15,160
But in principle, we can do
this order N squared operations
741
00:35:15,160 --> 00:35:16,500
many times.
742
00:35:16,500 --> 00:35:18,840
And it'll turn out
that this sort of a map
743
00:35:18,840 --> 00:35:22,080
converges to the solution
that we're after.
744
00:35:22,080 --> 00:35:24,750
It converges when matrices
are either diagonally dominant
745
00:35:24,750 --> 00:35:28,470
as before, or they're symmetric
and they're positive definite.
746
00:35:28,470 --> 00:35:31,680
Positive definite means all
the eigenvalues of the matrix
747
00:35:31,680 --> 00:35:33,240
are bigger than 0.
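A matching sketch of that lower and upper splitting, reusing the stand-in system from the Jacobi sketch above; this is the Gauss-Seidel iteration compared below:

% Split A into L (lower triangle, keeping the diagonal) and U (strictly
% upper triangle), then iterate with one triangular solve per step.
A = [4 -1 0; -1 4 -1; 0 -1 4]; b = [2; 4; 10];
Lmat = tril(A);                 % includes the diagonal elements
Umat = triu(A, 1);              % no diagonal elements
x = zeros(3,1);                 % initial guess
for it = 1:100
    xnew = Lmat \ (b - Umat*x); % triangular solve, order N^2 work
    done = norm(xnew - x) <= 1e-8*norm(b);
    x = xnew;
    if done, break, end
end
x                               % approaches A \ b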
748
00:35:40,680 --> 00:35:43,292
So try the iterative method
solving some equations
749
00:35:43,292 --> 00:35:44,250
and see how they converge.
750
00:35:44,250 --> 00:35:44,749
Yes?
751
00:35:44,749 --> 00:35:47,460
AUDIENCE: How do you justify
ignoring the diagonal elements
752
00:35:47,460 --> 00:35:48,400
in that method?
753
00:35:50,862 --> 00:35:52,320
JAMES W. SWAN: So
the question was,
754
00:35:52,320 --> 00:35:54,960
how do you justify ignoring
the diagonal elements
755
00:35:54,960 --> 00:35:56,102
in this method.
756
00:35:56,102 --> 00:35:57,810
Maybe I was going too
fast or I misspoke.
757
00:35:57,810 --> 00:36:01,980
So I'm going to split A into
a lower triangular matrix that
758
00:36:01,980 --> 00:36:05,250
has all the diagonal
elements, and U
759
00:36:05,250 --> 00:36:08,125
is the upper part with none of
those diagonal elements on it.
760
00:36:08,125 --> 00:36:09,000
Does that make sense?
761
00:36:09,000 --> 00:36:09,750
AUDIENCE: Yeah.
762
00:36:09,750 --> 00:36:11,180
JAMES W. SWAN: Thank you
for asking that question.
763
00:36:11,180 --> 00:36:12,430
I hope that's clear.
764
00:36:12,430 --> 00:36:16,660
L holds onto the diagonal
pieces, and U leaves them out.
765
00:36:19,790 --> 00:36:20,490
So let's try it.
766
00:36:20,490 --> 00:36:23,310
On a matrix like this,
the exact solution
767
00:36:23,310 --> 00:36:27,210
to this system of equations
is 3/4, 1/2, and 1/4.
768
00:36:27,210 --> 00:36:28,980
All right, we'll
try Jacobi, we'll
769
00:36:28,980 --> 00:36:31,530
have to give it some initial
guess for the solution, right?
770
00:36:31,530 --> 00:36:33,450
We'll talk about
ways that you
771
00:36:33,450 --> 00:36:37,160
can derive those initial guesses
later on in the course,
772
00:36:37,160 --> 00:36:39,300
but we have to start
the iterative process
773
00:36:39,300 --> 00:36:41,730
with some guess
at the solutions.
774
00:36:41,730 --> 00:36:43,380
So here's an initial guess.
775
00:36:43,380 --> 00:36:44,550
We'll apply this map.
776
00:36:44,550 --> 00:36:47,010
Here's Gauss-Seidel with
the same initial guess,
777
00:36:47,010 --> 00:36:48,670
and we'll apply this map.
778
00:36:48,670 --> 00:36:52,120
They're both
linearly convergent,
779
00:36:52,120 --> 00:36:53,700
so the relative
error will go down
780
00:36:53,700 --> 00:36:57,660
by a fixed factor
after each iteration.
781
00:36:57,660 --> 00:37:00,470
Iteration one, the relative
error in Jacobi will be 38%.
782
00:37:00,470 --> 00:37:02,910
In Gauss-Seidel, it'll be 40%.
783
00:37:02,910 --> 00:37:05,190
If we apply this all the
way down to 10 iterations,
784
00:37:05,190 --> 00:37:09,020
the relative error in Jacobi will
be 1.7%, and the relative error
785
00:37:09,020 --> 00:37:11,360
in Gauss-Seidel 0.08%.
786
00:37:11,360 --> 00:37:13,320
And we can go on and on
with these iterations
787
00:37:13,320 --> 00:37:16,740
if we want, until we're
sufficiently converged-- until we
788
00:37:16,740 --> 00:37:19,320
get to a point where the
relative error is small enough
789
00:37:19,320 --> 00:37:22,590
that we're happy to accept
this answer as a solution
790
00:37:22,590 --> 00:37:24,660
to our system of equations.
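The example matrix itself lives on the slide; as a stand-in, here's a hypothetical diagonally dominant system you can use to watch the Jacobi map converge linearly (swap in the gauss_seidel sketch above to compare the two):

    import numpy as np

    # hypothetical 3x3 diagonally dominant system, not the slide's matrix
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 4.0, 1.0],
                  [0.0, 1.0, 4.0]])
    b = np.array([1.0, 1.5, 1.0])
    x_exact = np.linalg.solve(A, b)

    d = np.diag(A)        # Jacobi keeps only the diagonal of A
    R = A - np.diag(d)    # everything off the diagonal
    x = np.zeros(3)       # initial guess
    for i in range(10):
        x = (b - R @ x) / d   # the Jacobi map
        err = np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)
        print(i + 1, err)     # drops by a roughly fixed factor each pass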
791
00:37:24,660 --> 00:37:28,450
So we traded the burden of doing
all the calculations needed for
792
00:37:28,450 --> 00:37:34,080
elimination in exchange for a faster,
less computationally complex
793
00:37:34,080 --> 00:37:35,250
methodology.
794
00:37:35,250 --> 00:37:38,620
But the trade off was we don't
get an exact solution anymore.
795
00:37:38,620 --> 00:37:40,890
We're going to have finite
precision in the result,
796
00:37:40,890 --> 00:37:42,750
and we have to
specify the tolerance
797
00:37:42,750 --> 00:37:44,541
that we want to converge to.
798
00:37:44,541 --> 00:37:47,040
We're going to see now-- this
is the hook into the next part
799
00:37:47,040 --> 00:37:47,860
of the class--
800
00:37:47,860 --> 00:37:50,070
we're going to talk about
solutions of nonlinear
801
00:37:50,070 --> 00:37:52,470
equations next, and
there are
802
00:37:52,470 --> 00:37:55,140
almost no nonlinear equations
that we can solve exactly.
803
00:37:55,140 --> 00:37:58,945
They all have to be solved
using these iterative methods.
804
00:37:58,945 --> 00:38:01,320
You can use these iterative
methods for linear equations.
805
00:38:01,320 --> 00:38:03,079
It's very common
to do it this way.
806
00:38:03,079 --> 00:38:04,620
In my group, we
solve lots of systems
807
00:38:04,620 --> 00:38:07,860
of linear equations associated
with hydrodynamic problems.
808
00:38:07,860 --> 00:38:10,670
These come up when
you're talking about,
809
00:38:10,670 --> 00:38:12,510
say, low Reynolds
number flows, which
810
00:38:12,510 --> 00:38:15,420
are linear sorts of
fluid flow problems.
811
00:38:15,420 --> 00:38:15,990
They're big.
812
00:38:15,990 --> 00:38:18,050
It's really hard to do
Gaussian elimination,
813
00:38:18,050 --> 00:38:19,910
so you apply different
iterative methods.
814
00:38:19,910 --> 00:38:20,910
You can do Gauss-Seidel.
815
00:38:20,910 --> 00:38:22,120
You can do Jacobi.
816
00:38:22,120 --> 00:38:23,790
We'll learn about
more advanced ones
817
00:38:23,790 --> 00:38:26,880
like PCG-- preconditioned conjugate
gradient-- which you're
applying on your homework now,
818
00:38:26,880 --> 00:38:29,970
and you should be seeing that
it converges relatively quickly
819
00:38:29,970 --> 00:38:32,176
in cases where exact
elimination doesn't work.
820
00:38:32,176 --> 00:38:34,050
We'll learn, actually,
how to do that method.
821
00:38:34,050 --> 00:38:35,758
That's one that we
apply in my own group.
822
00:38:35,758 --> 00:38:37,870
It's pretty common
to use out there.
823
00:38:37,870 --> 00:38:38,370
Yes?
824
00:38:38,370 --> 00:38:43,440
AUDIENCE: One question, is
that that Gauss, [INAUDIBLE]
825
00:38:43,440 --> 00:38:45,930
JAMES W. SWAN: Order N squared.
826
00:38:45,930 --> 00:38:47,708
AUDIENCE: Yeah,
that's what I meant.
827
00:38:47,708 --> 00:38:50,148
So now we've got an [INAUDIBLE].
828
00:38:50,148 --> 00:38:54,407
So we basically have
[INAUDIBLE] iterations, right?
829
00:38:54,407 --> 00:38:56,240
JAMES W. SWAN: This is
a wonderful question.
830
00:38:56,240 --> 00:39:00,120
So this is a pathological
problem in the sense
831
00:39:00,120 --> 00:39:03,300
that it requires a
lot of calculations
832
00:39:03,300 --> 00:39:05,550
to get an iterative
solution here.
833
00:39:05,550 --> 00:39:07,110
We haven't gotten
to an N that's
834
00:39:07,110 --> 00:39:10,290
big enough that the
computational complexities
835
00:39:10,290 --> 00:39:11,700
cross over.
836
00:39:11,700 --> 00:39:15,810
So for small Ns, probably
the prefactor in front of N cubed--
837
00:39:15,810 --> 00:39:17,200
whatever number that is--
838
00:39:17,200 --> 00:39:19,000
and maybe even the
lower-order terms,
839
00:39:19,000 --> 00:39:20,940
the order N squared
terms riding on that order
840
00:39:20,940 --> 00:39:22,440
N cubed, play a big
role in how long
841
00:39:22,440 --> 00:39:24,660
it takes to actually
complete this thing.
842
00:39:24,660 --> 00:39:28,140
But modern problems are so
big that we almost always
843
00:39:28,140 --> 00:39:30,480
are running out to Ns
that are large enough
844
00:39:30,480 --> 00:39:31,590
that we see a crossover.
845
00:39:31,590 --> 00:39:34,110
You'll see this in your
homework this week.
846
00:39:34,110 --> 00:39:36,030
You won't see this
crossover at N equals 3.
847
00:39:36,030 --> 00:39:37,613
You're going to see
it out at N equals
848
00:39:37,613 --> 00:39:41,336
500 or 1,200, big problems.
849
00:39:41,336 --> 00:39:43,210
Then we're going to
encounter this crossover.
850
00:39:43,210 --> 00:39:44,376
That's a wonderful question.
851
00:39:44,376 --> 00:39:49,110
So for small system
sizes, iterative methods
852
00:39:49,110 --> 00:39:50,750
maybe don't buy you much.
853
00:39:50,750 --> 00:39:53,000
I suppose it depends on the
application though, right?
854
00:39:53,000 --> 00:39:56,430
If you're doing something
that involves solving problems
855
00:39:56,430 --> 00:40:01,420
on embedded hardware, in some
sort of sensor or control
856
00:40:01,420 --> 00:40:04,230
valve, there may be
very limited memory
857
00:40:04,230 --> 00:40:06,290
or computational capacity
available to you.
858
00:40:06,290 --> 00:40:08,490
And you may actually
apply an iterative method
859
00:40:08,490 --> 00:40:12,120
like this to a problem
that that controller
860
00:40:12,120 --> 00:40:15,330
needs to solve, for example.
861
00:40:15,330 --> 00:40:17,370
It just may not
have the capability
862
00:40:17,370 --> 00:40:20,880
of storing and inverting what
we would consider, today,
863
00:40:20,880 --> 00:40:25,350
a relatively small matrix
because the hardware doesn't
864
00:40:25,350 --> 00:40:26,670
have that sort of capability.
865
00:40:26,670 --> 00:40:28,890
So there could be
cases where you
866
00:40:28,890 --> 00:40:32,250
might choose something
that's slower but feasible,
867
00:40:32,250 --> 00:40:34,650
versus something that's
faster and exact,
868
00:40:34,650 --> 00:40:36,990
because there are
other constraints.
869
00:40:36,990 --> 00:40:40,560
They do exist, but modern
computers are pretty efficient.
870
00:40:40,560 --> 00:40:43,590
Your cell phone is faster
than the fastest computers
871
00:40:43,590 --> 00:40:46,410
in the world 20 years ago.
872
00:40:46,410 --> 00:40:47,550
We're doing OK.
873
00:40:47,550 --> 00:40:49,890
So we've got to get out to
big system sizes, big problem
874
00:40:49,890 --> 00:40:52,060
sizes, before this
starts to pay off.
875
00:40:52,060 --> 00:40:55,210
But it does for many
practical problems.
876
00:40:55,210 --> 00:40:59,410
OK, I'll close with this, because
this is the hook into solving
877
00:40:59,410 --> 00:41:00,520
nonlinear equations.
878
00:41:03,537 --> 00:41:05,370
So I showed you these
two iterative methods,
879
00:41:05,370 --> 00:41:07,830
and they kind of had
stringent requirements
880
00:41:07,830 --> 00:41:10,770
for when they were actually
going to converge, right?
881
00:41:10,770 --> 00:41:13,350
I had to have a diagonally
dominant system of equations
882
00:41:13,350 --> 00:41:15,460
for Jacobi to converge.
883
00:41:15,460 --> 00:41:18,720
I had to have diagonal dominance
or symmetric positive definite
884
00:41:18,720 --> 00:41:19,260
matrices.
885
00:41:19,260 --> 00:41:20,730
These things exist
and they come up
886
00:41:20,730 --> 00:41:22,188
in lots of physical
problems, but I
887
00:41:22,188 --> 00:41:25,050
had to have it in order for
Gauss-Seidel to converge.
888
00:41:25,050 --> 00:41:26,700
What if I have a
system of equations
889
00:41:26,700 --> 00:41:28,194
that doesn't work that way?
890
00:41:28,194 --> 00:41:29,610
Or what if I have
an iterative map
891
00:41:29,610 --> 00:41:33,900
that I like for some reason, but
it doesn't appear to converge?
892
00:41:33,900 --> 00:41:37,270
Maybe it converges under some
circumstances, but not others.
893
00:41:37,270 --> 00:41:40,140
Well, there's a way to modify
these iterative maps, called
894
00:41:40,140 --> 00:41:43,380
successive
over-relaxation, which
895
00:41:43,380 --> 00:41:45,977
can help promote convergence.
896
00:41:45,977 --> 00:41:48,060
So suppose we have an
iterative map like this, x i
897
00:41:48,060 --> 00:41:51,897
plus 1 is some function of
the previous iteration value.
898
00:41:51,897 --> 00:41:52,980
Doesn't matter what it is.
899
00:41:52,980 --> 00:41:54,646
It could be linear,
could be non-linear.
900
00:41:54,646 --> 00:41:57,760
We don't actually care.
901
00:41:57,760 --> 00:42:00,610
The sought-after solution
is found when x i plus 1
902
00:42:00,610 --> 00:42:01,630
is equal to x i.
903
00:42:01,630 --> 00:42:03,540
So this map is one
that converges
904
00:42:03,540 --> 00:42:06,250
to the exact solution of
the problem that we want.
905
00:42:06,250 --> 00:42:08,650
We've somehow guaranteed
that that's the case,
906
00:42:08,650 --> 00:42:10,990
but it still has to actually converge.
907
00:42:10,990 --> 00:42:14,890
One way to modify that
map is to say x i plus 1
908
00:42:14,890 --> 00:42:17,890
is 1 minus some
scalar value omega
909
00:42:17,890 --> 00:42:23,020
times x i plus omega times f.
910
00:42:23,020 --> 00:42:26,350
You can confirm that if you
substitute x i plus 1 equals
911
00:42:26,350 --> 00:42:28,900
x i into this equation,
you'll come up
912
00:42:28,900 --> 00:42:33,940
with the same fixed points
of this iterative map
913
00:42:33,940 --> 00:42:36,500
x i is equal to f of x i.
914
00:42:36,500 --> 00:42:40,210
So you haven't changed what
value it will converge to here,
915
00:42:40,210 --> 00:42:42,460
but you've affected the
rate at which it converges.
916
00:42:42,460 --> 00:42:45,640
Here you're saying x i
plus 1 is some fraction
917
00:42:45,640 --> 00:42:50,290
of my previous solution plus
some fraction of this f.
918
00:42:50,290 --> 00:42:53,890
And I get to control how big
those different fractions are.
919
00:42:53,890 --> 00:42:58,410
So if things aren't converging
well for a map like this,
920
00:42:58,410 --> 00:43:00,700
then I could try
successive over-relaxation,
921
00:43:00,700 --> 00:43:03,400
and I could adjust this
relaxation parameter
922
00:43:03,400 --> 00:43:07,890
to be some fraction, some
number between 0 and 1,
923
00:43:07,890 --> 00:43:10,050
until I start to
observe convergence.
924
00:43:10,050 --> 00:43:11,550
And there are some
rules one can use
925
00:43:11,550 --> 00:43:13,383
to try to promote
convergence with this kind
926
00:43:13,383 --> 00:43:15,204
of successive over-relaxation.
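A generic sketch of that relaxed map-- here f, the initial guess, and omega are whatever your problem supplies:

    import numpy as np

    def relaxed_fixed_point(f, x0, omega, tol=1e-10, max_iter=10000):
        # Iterate x <- (1 - omega) * x + omega * f(x). A fixed point
        # still satisfies x = f(x) no matter what omega is; omega
        # only changes whether, and how fast, we get there.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            x_new = (1.0 - omega) * x + omega * np.asarray(f(x))
            if np.linalg.norm(x_new - x) <= tol:
                return x_new
            x = x_new
        return x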
927
00:43:15,204 --> 00:43:17,370
This is a very generic
technique that one can apply.
928
00:43:17,370 --> 00:43:20,700
If you have any iterative
map you're trying to apply,
929
00:43:20,700 --> 00:43:22,840
it should go to the
solution you want
930
00:43:22,840 --> 00:43:24,570
but it doesn't converge
for some reason,
931
00:43:24,570 --> 00:43:27,630
then you can use this
relaxation technique to promote
932
00:43:27,630 --> 00:43:29,250
convergence to the solution.
933
00:43:29,250 --> 00:43:31,770
You may slow the
convergence way down.
934
00:43:31,770 --> 00:43:34,680
It may be very slow to
converge, but it will converge.
935
00:43:34,680 --> 00:43:36,690
And after all, an
answer is better
936
00:43:36,690 --> 00:43:38,940
than no answer, no matter
how long it takes to get it.
937
00:43:38,940 --> 00:43:40,740
So sometimes you've
got to get these things
938
00:43:40,740 --> 00:43:43,670
by hook or by crook.
939
00:43:43,670 --> 00:43:45,710
So for example, you can
apply this to Jacobi.
940
00:43:45,710 --> 00:43:48,514
This was the
original Jacobi map.
941
00:43:48,514 --> 00:43:49,430
And we just take that.
942
00:43:49,430 --> 00:43:53,640
We add 1 minus omega
times x i plus omega
943
00:43:53,640 --> 00:43:55,680
times this factor over here.
944
00:43:55,680 --> 00:43:59,300
And now we can choose omega so
that this solution converges.
945
00:43:59,300 --> 00:44:02,360
We always make omega
small enough so
946
00:44:02,360 --> 00:44:04,550
that the diagonal
values of our matrix
947
00:44:04,550 --> 00:44:07,220
appear big enough
that the matrix looks
948
00:44:07,220 --> 00:44:09,770
like it's diagonally dominant.
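A minimal sketch of this over-relaxed Jacobi map; the fixed iteration count is just for illustration:

    import numpy as np

    def damped_jacobi(A, b, x0, omega, n_iter=100):
        # x_{i+1} = (1 - omega) x_i + omega D^{-1} (b - R x_i),
        # with D the diagonal of A and R = A - D. Shrinking omega
        # makes the effective diagonal D / omega look more dominant.
        d = np.diag(A)
        R = A - np.diag(d)
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            x = (1.0 - omega) * x + omega * (b - R @ x) / d
        return x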
949
00:44:09,770 --> 00:44:12,230
You could go back to that
same convergence analysis
950
00:44:12,230 --> 00:44:14,510
that I showed you before
and try to apply it
951
00:44:14,510 --> 00:44:19,252
to this over-relaxation
form of Jacobi and see
952
00:44:19,252 --> 00:44:21,710
that there's always going to
be some value of omega that's
953
00:44:21,710 --> 00:44:25,420
small enough that this
thing will converge.
954
00:44:25,420 --> 00:44:27,920
It will look effectively
diagonally dominant,
955
00:44:27,920 --> 00:44:32,150
because omega inverse
times D will be big enough,
956
00:44:32,150 --> 00:44:34,865
or omega times D inverse
will be small enough.
957
00:44:34,865 --> 00:44:36,619
Does that make sense?
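Written out-- this is the standard convergence analysis, not anything worked on the slide-- the over-relaxed Jacobi map is

    x_{i+1} = (1 - \omega)\, x_i + \omega\, D^{-1}\bigl(b - (A - D)\, x_i\bigr)
            = \bigl(I - \omega\, D^{-1} A\bigr)\, x_i + \omega\, D^{-1} b,

so it converges exactly when every eigenvalue \mu_k of D^{-1} A satisfies |1 - \omega \mu_k| < 1. If the \mu_k have positive real parts-- and they do for symmetric positive definite A-- then some small enough \omega > 0 always works.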
958
00:44:36,619 --> 00:44:39,160
You can apply the same sort of
damping method to Gauss-Seidel
959
00:44:39,160 --> 00:44:39,660
as well.
960
00:44:39,660 --> 00:44:42,360
It's very common to do this.
961
00:44:42,360 --> 00:44:45,560
The relaxation parameter acts
like an effective increase
962
00:44:45,560 --> 00:44:47,692
in the eigenvalues
of the matrix.
963
00:44:47,692 --> 00:44:50,150
So you can think about L. That's
a lower triangular matrix.
964
00:44:50,150 --> 00:44:54,310
Its diagonal values
are its eigenvalues.
965
00:44:54,310 --> 00:44:56,190
The diagonal values
of L inverse--
966
00:44:56,190 --> 00:44:59,000
well, 1 over those
diagonal values
967
00:44:59,000 --> 00:45:00,820
are the eigenvalues
of L inverse.
968
00:45:00,820 --> 00:45:03,560
And so if we make
omega very small,
969
00:45:03,560 --> 00:45:06,630
then we make the eigenvalues
of L inverse very small,
970
00:45:06,630 --> 00:45:08,510
or the eigenvalues
of L very big.
971
00:45:08,510 --> 00:45:12,170
And again, the matrix starts
to look diagonally dominant.
972
00:45:12,170 --> 00:45:15,330
And you can promote
convergence in this way.
973
00:45:15,330 --> 00:45:17,960
So even though this
may be slow, you
974
00:45:17,960 --> 00:45:19,640
can use it to
guarantee convergence
975
00:45:19,640 --> 00:45:22,299
of some iterative procedures,
not just for linear equations,
976
00:45:22,299 --> 00:45:23,840
but for non-linear
equations as well.
977
00:45:23,840 --> 00:45:25,370
And we'll see,
there are good ways
978
00:45:25,370 --> 00:45:27,230
of choosing omega
for certain classes
979
00:45:27,230 --> 00:45:29,010
of non-linear equations.
980
00:45:29,010 --> 00:45:30,440
We'll apply
the Newton-Raphson method,
981
00:45:30,440 --> 00:45:33,590
and then we'll damp it using
exactly this sort of procedure.
982
00:45:33,590 --> 00:45:37,190
And I'll show you
how you can choose
983
00:45:37,190 --> 00:45:41,860
a nearly optimal value for
omega to promote convergence
984
00:45:41,860 --> 00:45:43,610
to the solution.
985
00:45:43,610 --> 00:45:46,480
Any questions?
986
00:45:46,480 --> 00:45:50,140
No? Let me address one
more thing before you go.
987
00:45:50,140 --> 00:45:53,140
We've scheduled times
for the quizzes.
988
00:45:53,140 --> 00:45:55,180
They are going to
be in the evenings
989
00:45:55,180 --> 00:45:58,180
on the dates that are
specified on the syllabus.
990
00:45:58,180 --> 00:46:01,060
We wanted to do them
during the daytime.
991
00:46:01,060 --> 00:46:02,890
It was really
difficult to schedule
992
00:46:02,890 --> 00:46:04,970
a room that was big
enough for this class,
993
00:46:04,970 --> 00:46:07,990
so they have to be from 7:00
to 9:00 in the gymnasium.
994
00:46:07,990 --> 00:46:09,709
I apologize for that.
995
00:46:09,709 --> 00:46:11,500
We spent several days
looking around trying
996
00:46:11,500 --> 00:46:14,050
to find a place where
we could put everybody
997
00:46:14,050 --> 00:46:17,680
so you would all get the
same experience in the quiz.
998
00:46:17,680 --> 00:46:20,590
I know that the November
quiz comes back to back
999
00:46:20,590 --> 00:46:24,130
with the thermodynamics
exam as well.
1000
00:46:24,130 --> 00:46:26,030
That's frustrating.
1001
00:46:26,030 --> 00:46:27,850
Thermodynamics is the next day.
1002
00:46:27,850 --> 00:46:29,170
That week is tricky.
1003
00:46:29,170 --> 00:46:33,004
That's AIChE, so most of
the faculty have to travel.
1004
00:46:33,004 --> 00:46:34,420
We won't be able
to teach, but you
1005
00:46:34,420 --> 00:46:36,350
won't have classes
one of those days
1006
00:46:36,350 --> 00:46:39,910
so you have extra time to study.
1007
00:46:39,910 --> 00:46:41,750
And Columbus Day also
falls in that week,
1008
00:46:41,750 --> 00:46:44,590
so there's no way to put
three exams in four days
1009
00:46:44,590 --> 00:46:46,830
without having them
come right back to back.
1010
00:46:46,830 --> 00:46:48,460
Believe me, we
thought about this
1011
00:46:48,460 --> 00:46:51,004
and tried to get things
scheduled as efficiently as we
1012
00:46:51,004 --> 00:46:52,420
could for you, but
sometimes there
1013
00:46:52,420 --> 00:46:54,420
are constraints that are
outside of our control.
1014
00:46:54,420 --> 00:46:55,927
But the quiz times are set.
1015
00:46:55,927 --> 00:46:58,260
There's going to be one in
October and one in November.
1016
00:46:58,260 --> 00:47:00,790
They'll be in the evening, and
they'll be in the gymnasium.
1017
00:47:00,790 --> 00:47:03,790
I'll give you directions
to it before the exam, just
1018
00:47:03,790 --> 00:47:06,280
so you know exactly
where to go, OK?
1019
00:47:06,280 --> 00:47:08,210
Thank you, guys.