The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JAMES W. SWAN: So this is going to be our last lecture on linear algebra. The first three lectures covered basics. In the next three lectures, we talked about different sorts of transformations of matrices. This final lecture is the last of those three. We're going to talk about another sort of transformation called the singular value decomposition.

OK, before we jump in, I'd like to do the usual recap business. I think it's always helpful to recap or look at things from a different perspective. Early on, I told you that the infinite-dimensional equivalent of a vector would be something like a function, which is a unique map from a point x to some value f of x. And there is an equivalent representation of the eigenvalue-eigenvector problem in function space. We call these eigenvalues and eigenfunctions.

Here's a classic one, where the function is y of x. This is the equivalent of the vector, and the equivalent of the transformation or the matrix is this differential operator, the second derivative. So I take the second derivative of this particular function, and the function is stretched: it's multiplied by some fixed value at all points, and it becomes lambda times y. And that operator has to be closed with some boundary conditions as well. We have to say what the value of y is at the edges of some domain. So there's a one-to-one correspondence between these things.

What is the eigenfunction here, or what are the eigenfunctions? And what are the eigenvalues associated with this transformation or this operator? Can you work those out really quickly? You learned this at some point, right?
Somebody taught you differential equations, and you calculated these things. Take about 90 seconds. Work with the people around you. See if you can come to a conclusion about what the eigenfunctions and eigenvalues are.

That's enough time. You can work on this on your own later if you've run out of time. Don't worry about it. Does somebody want to volunteer a guess for what the eigenfunctions are in this case? What are they? Yeah?

AUDIENCE: [INAUDIBLE]

JAMES W. SWAN: OK, so you chose exponentials. That's an interesting choice. That's one possible choice you can make. This is sort of a classical one that you think about when you first learn differential equations. They say an equation of this sort has solutions that look like exponentials, and that's true. There's another representation for this, which is as trigonometric functions instead: y equals C1 times cosine of the square root of minus lambda times x, plus C2 times sine of the square root of minus lambda times x. Either of those is acceptable. For us here, the trigonometric representation is a little more useful.

We know that the boundary conditions tell us that y of 0 is supposed to be 0. That means that C1 has to be 0, because cosine of 0 is 1. So C1 is 0 in this case. That fixes one of these coefficients. And now we're left with a problem: our solutions, our eigenfunctions, cannot be unique. We don't get to specify C2, because any function that's a multiple of this sine should also be an eigenfunction. So instead, the other boundary condition, y of L equals 0, needs to be used to pin down what the eigenvalue is. The second equation, y of L equals 0, implies that the square root of minus lambda times L has to land on the nodes of the sine, the places where the sine is equal to 0, so the square root of minus lambda has to be equal to n pi over L. That's the equivalent of our secular or characteristic polynomial; it prescribes the eigenvalues associated with each of the eigenfunctions.
So now we know what the eigenvalues are. The eigenvalues are the set of numbers minus n pi over L, quantity squared. There's an infinite number of eigenvalues. It's an infinite-dimensional space that we're in, so it's not a big surprise that it works out that way. And the eigenfunctions, then, are different scalar multiples of sine of the square root of minus the eigenvalue times x.

There's a one-to-one correspondence between all the linear algebra we've done and linear differential equations or linear partial differential equations. You can think about these things in exactly the same way. I'm sure in 10.50 you've started to talk about orthogonal functions to represent solutions of differential equations. Or if you haven't, you're going to very soon. That's the part of the course where you get to look at the analytical side of some of these things, as opposed to the numerical side. But there's a one-to-one relationship between those things. So if you understand one, you understand the other, and you can come at them from either perspective.

This sort of stuff is useful. Actually, the classical chemical engineering example comes from quantum mechanics, where you think about wave functions and different energy levels corresponding to eigenvalues. That's cool.
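As a quick numerical check of that correspondence, here is a small MATLAB sketch; the interval length and grid size are arbitrary choices, not from the lecture. It discretizes the second-derivative operator on (0, L) with homogeneous boundary conditions and compares its eigenvalues against the analytical values minus (n pi over L) squared:

    % Second-difference approximation of d^2/dx^2 on (0,L) with y(0) = y(L) = 0.
    L = 1; n = 400; h = L/(n+1);
    A = (diag(-2*ones(n,1)) + diag(ones(n-1,1), 1) + diag(ones(n-1,1), -1)) / h^2;
    lam = sort(eig(A), 'descend');    % all eigenvalues are negative; nearest zero first
    [lam(1:4), -((1:4)'*pi/L).^2]     % numerical vs. analytical eigenvalues

The agreement improves as the grid is refined, and the corresponding eigenvectors sample the sine eigenfunctions at the grid points.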
Sometimes I like to think about a mechanical analog to that, which is the buckling of an elastic column. You should try this at home. Go get a piece of spaghetti and push on the ends of the piece of spaghetti, and the spaghetti will buckle. Eventually it'll break, but it'll buckle first. It'll bend. And how does it bend? Well, a balance of linear momentum on this bar would tell you that the deflection in the bar at different points x along the bar, multiplied by the pressure, has to balance the bending moment in the bar itself. This E is some elastic constant, I is the moment of inertia, and d squared y dx squared is something like the curvature of the bar. So it's the bending moment of the bar that balances the pressure being exerted on the bar.

And sure enough, this bar will buckle when the pressure applied exceeds the value set by the first eigenvalue associated with this differential equation. We just worked that eigenvalue out: its magnitude is pi over L, quantity squared. And so when the pressure exceeds pi over L squared times the elastic modulus EI, this column will bend and deform continuously until it eventually breaks. It will undergo this linear elastic deformation, then plastic deformation later, and then it will break.

The Eiffel Tower, actually, is one of the first structures in the world to utilize this principle. It's got very narrow beams in it. The beams are engineered so that their elastic modulus is strong enough that they won't buckle. Gustave Eiffel was one of the first applied physicists, somebody who took the physics of elastic bars and applied it to building structures that weren't big and blocky, but used a minimal amount of material. Cool, right?

OK, so that's recap. Any questions about that? You've seen these things before. You understood them well before, too, maybe? Give some thought to this, OK?

We talked about eigendecomposition last time: associated with a square matrix was a particular set of eigenvalues, stretches, and corresponding eigenvectors, directions. These were special solutions to the system of linear equations based on a matrix. It was a square matrix. And you might ask, well, what happens if the matrix isn't square? What if A is in the space of real matrices that are n by m, where n and m maybe aren't the same? Maybe they are the same, but maybe they're not. There is an equivalent decomposition. It's called the singular value decomposition.
It's like an eigendecomposition for non-square matrices. So rather than writing our matrix as some W lambda W inverse, we're going to write it as some product U times sigma times V with this dagger. The dagger here is the conjugate transpose: transpose the matrix, and take the complex conjugate of all the elements. I mentioned last time that eigenvalues and eigenvectors could potentially be complex. So whenever we have that case where things can be complex, usually the transposition operation is replaced with the conjugate transpose.

What are these different matrices? Well, let me tell you. U is a complex matrix. It maps from R n to R n, so it's an n by n square matrix. Sigma is a real-valued matrix, and it lives in the space of n by m matrices. V is a square matrix again, but it has dimensions m by m. Remember, A maps from R m to R n, and that's what the sequence of products says: V dagger maps from m to m, sigma maps from m to n, and U maps from n to n. So the product maps from m to n as well.

Sigma is like lambda from before. It's a diagonal matrix: it only has diagonal elements. It's just not square, but it only has diagonal elements, all of which will be non-negative. And then U and V are called the left and right singular vectors. They have special properties associated with them, which I'll show you right now.

Any questions about how this decomposition is composed or made up? It looks just like the eigendecomposition, but it can be applied to any matrix. Yes?

AUDIENCE: Quick question.

JAMES W. SWAN: Sure.

AUDIENCE: Do all matrices have this thing, or is it like the eigenvalues, where some do and some don't?

JAMES W. SWAN: This is a great question. So all matrices are going to have a singular value decomposition.
We saw with the eigenvalue decomposition that there could be a case where the eigenvectors are degenerate, and we can't write that full decomposition. All matrices are going to have this decomposition.

So, some properties of this decomposition. U and V are what we call unitary matrices. I talked about these before. Unitary matrices are ones for whom, if they're real-valued, their transpose is also their inverse. If they're complex-valued, their conjugate transpose is the equivalent of their inverse. So U times U conjugate transpose will be identity, and V times V conjugate transpose will be identity. Unitary matrices also have the property that they impart no stretch to vectors. Their maps don't stretch. They're kind of like rotation matrices: they change directions, but they don't stretch things out.

If I were to take A conjugate transpose and multiply it by A, that would be the same as taking the conjugate transpose of U sigma V dagger and multiplying it by U sigma V dagger. If I use the properties of matrix multiplication and complex conjugate transposes, and work out what this expression is, I'll find out that it's equivalent to V times sigma conjugate transpose sigma times V conjugate transpose. Well, this has exactly the same form as an eigendecomposition, an eigendecomposition of A conjugate transpose A instead of an eigendecomposition of A. So V is the set of eigenvectors of A conjugate transpose A, and sigma squared are the eigenvalues of A conjugate transpose times A.

And if I reverse the order of this multiplication, so I do A times A conjugate transpose, and work it out, that would be U times sigma sigma conjugate transpose times U conjugate transpose. And so U are the eigenvectors of A A conjugate transpose, and sigma squared are still the eigenvalues of A A conjugate transpose.

So what are these things U and V? They relate to the eigenvectors of the product of A with itself: this particular product of A with itself, or this particular product of A with itself. Sigma are the singular values.
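A quick way to convince yourself of these relationships is to test them numerically. Here's a small MATLAB sketch (the random test matrix is just an illustrative choice); each printed quantity should come out near machine zero:

    % Verify the connection between the SVD of A and the
    % eigendecompositions of A'*A and A*A' for a real matrix.
    A = randn(4, 3);
    [U, S, V] = svd(A);
    norm(U'*U - eye(4))        % U is unitary
    norm(V'*V - eye(3))        % V is unitary
    norm(A'*A - V*(S'*S)*V')   % V and sigma^2 diagonalize A'*A
    norm(A*A' - U*(S*S')*U')   % U and sigma^2 diagonalize A*A'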
All matrices possess this sort of a decomposition. They all have a set of singular values and singular vectors. These sigmas have a particular name: they're called the singular values of A. I'm going to show you how you can use this decomposition to do something you already know how to do, but how to do it formally.

What are some properties of the singular value decomposition? So if we take a matrix A and we compute its singular value decomposition (this is how you do it in MATLAB), we'll find out, for this matrix, U is identity, sigma is identity with an extra column pasted on it, and V is also identity. I mean, this is the simplest possible three-by-four matrix I can write down. You don't have to know how to compute the singular value decomposition; you just need to know that it can be computed in this way. You might be able to guess how to compute it based on what we did with eigenvalues and eigenvectors earlier.

It'll turn out that some of the columns of sigma will be non-zero; here there are three non-zero columns of sigma. And the columns of V corresponding to the zero columns of sigma span the null space of the matrix A. So the first three columns of sigma here are non-zero, and the last column is 0. The zero column of sigma corresponds to a particular column in V, this last column here, which lives in the null space of A. You can see, if I take A and I multiply it by any vector that's proportional to (0, 0, 0, 1), I'll get back 0. So the null space of A is spanned by all the columns of V corresponding to the zero columns of sigma.

Some of the columns of sigma are non-zero, these first three columns, and the columns of U corresponding to those three columns span the range of A.
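In MATLAB, that example looks something like the sketch below. The matrix is a guess reconstructed from the description of the slide (identity with an extra zero column), and svd may return the singular vectors with signs or orderings that differ from the clean identity form:

    A = [1 0 0 0;
         0 1 0 0;
         0 0 1 0];
    [U, S, V] = svd(A)   % here U = eye(3), S = [eye(3), zeros(3,1)], V = eye(4)
    A * V(:, 4)          % the zero vector: V's last column lies in the null space
    null(A)              % MATLAB's basis for null(A), for comparison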
So if I do the singular value decomposition of a matrix, and I look at U, V, and sigma and what they're composed of (where sigma is zero and non-zero, and the corresponding columns of U and V), then I can figure out what vectors span the range and null space of the matrix A.

Here's another example. So here I have A. Now, instead of being three rows by four columns, it's four rows by three columns. And here's the singular value decomposition that comes out of MATLAB. There are no vectors that live in the null space of A: there are no zero columns in sigma, so there are no corresponding columns in V. The range of A is spanned by the columns of U corresponding to the non-zero columns of sigma, so it's these first three columns of U. And these three columns clearly describe the same range as the three columns of A.

So the singular value decomposition gives us direct access to the null space and the range of a matrix. That's handy. And it can be used in various ways.

Here's one example where it can be used. Here I have a fingerprint. It's a bitmap: a square bit of data, like a matrix, and each of the elements of the matrix takes on a value describing how dark or light that pixel is. Let's say it's grayscale, and its values are between 0 and 255. That's pretty typical. So I have this matrix, each element of the matrix corresponds to a pixel, and I do a singular value decomposition.

Some of the singular values, the values of sigma, are bigger than others. They're all positive, but some are bigger than others. The ones that are biggest in magnitude carry the most information content about the matrix. So we can do data compression by neglecting singular values that are smaller than some threshold, and also neglecting the corresponding singular vectors. And that's what I've done here. So here's the original bitmap of the fingerprint.
I did the singular value decomposition, and then I retained only the 50 biggest singular values; I left all the other singular values out. This bitmap was something like, I don't know, 300 pixels by 300 pixels, so there are something like 300 singular values. But I got rid of 5/6 of the information content. I dropped 5/6 of the singular vectors, and then I reconstructed the matrix from the retained singular values and those singular vectors, and you get a faithful representation of the original fingerprint.

So the singular value decomposition says something about the information content in the transformation that is the matrix. There are some transformations that are of lower power or importance than others, and the magnitude of these singular values tells you which they are. Does that make sense?

How else can it be used? Well, one way it can be used is finding the least-squares solution to the equation Ax equals b, where A is no longer a square matrix. You've done this in other contexts before, where the equations are overspecified: we have more equations than unknowns, like data fitting. You form the normal equations: you multiply both sides of Ax equals b by A transpose, and then invert A transpose A. You might not be too surprised, then, to think that the singular value decomposition could be useful here too, since we already saw that the data in a singular value decomposition correspond to eigenvectors and eigenvalues of this A transpose A. But there's a way to use this sort of decomposition formally to solve problems that are both overspecified and underspecified.

Least squares means: find the vector of solutions x that minimizes this function phi, where phi is the length of the vector given by the difference between Ax and b. It's one measure of how far in error our solution x is. So let's find the value x which is least in error. This is one definition of least squares. And I know the singular value decomposition of A.
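As an aside before the derivation: in MATLAB, the fingerprint compression experiment described above looks something like this sketch. The filename is a placeholder, and the image is assumed to be grayscale; neither detail comes from the lecture.

    % Rank-50 compression of a grayscale image via the SVD.
    X = double(imread('fingerprint.png'));    % placeholder file, values 0 to 255
    [U, S, V] = svd(X);
    k = 50;                                   % keep the 50 largest singular values
    Xk = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % rank-k reconstruction of X
    imagesc(Xk); colormap(gray); axis image   % compare with imagesc(X)

With that aside done, back to least squares.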
So A is U sigma V dagger, and I have U sigma V dagger times x. I can factor out U, and I've got a factor of U conjugate transpose multiplying b. So Ax minus b is the same as U times the quantity sigma V conjugate transpose x minus U conjugate transpose b.

We want to know the x that minimizes this phi. It's an optimization problem. We'll talk in great detail about these sorts of problems later; this one is so easy that we can just work it out in a couple lines of text.

We'll define a new set of unknowns y, which is V conjugate transpose times x, and a new right-hand side for the system of equations, p, which is U conjugate transpose times b. And then we can rewrite the function phi that we're trying to minimize: phi becomes the norm of U times the quantity sigma y minus p. U is a unitary matrix. It imparts no stretch in the 2-norm, so sigma y minus p doesn't get elongated by multiplication with U. Its length is the same as the length of sigma y minus p. You can prove this; it's not very difficult to show at all. You use the definition of the 2-norm to prove it.

So phi is minimized by the y's which make this norm smallest, which make it closest to 0. Let r be the number of non-zero singular values, the number of those sigmas which are not equal to 0. That's also the rank of A. Then I can rewrite phi as the sum from i equals 1 to r of sigma i i times y i minus p i, quantity squared (the part of this Euclidean length for which sigma is non-zero), plus the sum from r plus 1 to n of p i squared, the sum over the rest of the values of p, for which the corresponding sigmas are 0.

I want to minimize phi, and the only thing that I can change to minimize it is what? What am I free to pick in this equation in order to make phi as small as possible? Yeah?

AUDIENCE: y.

JAMES W. SWAN: y. So I need to choose the y's that make this phi as small as possible. What value should I choose for the y's? What do you think?
AUDIENCE: [INAUDIBLE]

JAMES W. SWAN: Perfect, right? Choose y i equal to p i over sigma i i. Then all of these terms are 0. I can't make this sum any smaller than that. That fixes the values of y i up to r. I can't do anything about this leftover bit here; there's no choice of y that's going to make this part any smaller. It's just left over, some remainder that we can't minimize away. There isn't an exact solution to this problem, in many cases. But one way this leftover part could be 0 is if r is equal to n. Then there are no leftover terms, and this y i equals p i over sigma i i is the exact solution to the problem.

So this is what you told me: choose y i equal to p i over sigma i i for i between 1 and r. There may also be values of y i with index running from r plus 1 to m, because A was a matrix that mapped from R m to R n, so I have extra values of y that could be specified, potentially. If r is smaller than m, then there are some components of y that I can't specify. My system of equations is somehow underdetermined. I need some external information to tell me what values to pick for those y i. I don't know them; I can't use them. Sometimes people just set those y i equal to 0. That's sort of silly, but that's what's done. It's called the minimum-norm least-squares solution: y has minimum length when you set all these other components to 0. But the truth is, we can't specify those components. We need some external information in order to specify them.

Once we know y, we can find x by going back to our definition of what y is. I multiply that equation by V on both sides, and I'll get x equals V times y. So I can find my least-squares solution to the problem from the singular value decomposition.
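Stitched together, that recipe is only a few lines of MATLAB. This is a sketch: the overdetermined test system is a random, illustrative one, and the rank tolerance is a common but assumed choice. It reproduces what MATLAB's pinv gives:

    % Minimum-norm least squares via the SVD.
    A = randn(5, 3); b = randn(5, 1);          % an overdetermined test system
    [U, S, V] = svd(A);
    s = diag(S);
    r = sum(s > max(size(A)) * eps(max(s)));   % numerical rank: the "non-zero" sigmas
    p = U' * b;
    y = zeros(size(A, 2), 1);                  % unspecifiable components set to 0
    y(1:r) = p(1:r) ./ s(1:r);                 % y_i = p_i / sigma_ii for i = 1..r
    x = V * y;
    norm(x - pinv(A)*b)                        % ~0: agrees with the pseudoinverse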
So I can find the least-squares solution to both overdetermined and underdetermined problems using the singular value decomposition. It inherits all the properties you know from solving the normal equations, from multiplying the entire equation by A transpose and solving for a least-squares solution that way. But that approach is only good for overdetermined systems of equations; this one can work for underdetermined equations as well. And maybe we do have external information that lets us specify those other components somehow. Maybe we do a separate optimization that chooses from all possible solutions, where these y i's are free, and picks the best one subject to some other constraint. Does it make sense?

OK, that's the last decomposition we're going to talk about. It's as expensive to compute the singular value decomposition as it is to solve a system of equations. You might have guessed that it's got an order N cubed flavor to it. It's kind of inescapable that we run up against those computational difficulties, order N cubed computational complexity. And there are many problems of practical interest, particularly solutions of PDEs, for which that's not going to cut it, where you couldn't solve the problem with that sort of scaling in time. You couldn't compute the Gaussian elimination, or the singular value decomposition, or an eigenvalue decomposition. It won't work.

And in those cases, we appeal not to exact solution methods but to approximate solution methods. So instead of trying to get an exact solution, we'll try to formulate one that's good enough. We already know the computer introduces numerical error anyway. Maybe we don't need machine precision in our solution, or something close to machine precision in our solution.
Maybe we're solving an engineering problem, and we're willing to accept relative errors on the order of 10 to the minus 3 or 10 to the minus 5, some specified tolerance that we apply to the problem. In those circumstances, we use iterative methods to solve systems of equations instead of exact methods, elimination methods, or matrix decomposition methods.

These algorithms are all based on iterative refinement of an initial guess. So if we have some system of equations we're trying to solve, Ax equals b, we'll formulate some linear map: x i plus 1 will be some matrix C times x i plus some little vector c, where x i is my last best guess for the solution to this problem, and x i plus 1 is my next best guess. And I'm hoping that as I apply this map more and more times, I'm creeping closer to the exact solution of the original system of equations.

The map will have converged when x i plus 1 approaches x i, when the map isn't making any changes to the vector anymore. And the converged value, which is x equals the quantity identity minus C, inverse, times little c (that's what you get if you replace x i plus 1 with x i up here), will be a solution when it's equivalent to A inverse times b, when it's a solution to the original problem.

So my map may converge. It may not converge to a solution of the problem I like, but if it satisfies this condition, then it has converged to a solution of the problem that I like as well. So it's all about choosing this big C here and this little c here so that the map converges to the solution of the problem I'm after. There are lots of schemes for doing this. Some of them are kind of ad hoc. I'm going to show you one right now. And then, when we do optimization, we'll talk about a more formal way of doing this for which you can guarantee very rapid convergence to a solution.

So here's a system of equations I'd like to solve. It's not a very big one.
It doesn't really make sense to solve this one iteratively, but it's a nice illustration. One way to go about formulating this map is to split the matrix into two parts. So I'll split it into a diagonal part and an off-diagonal part; I haven't changed the problem at all by doing that. Then I'm going to rename this x as x i plus 1, rename this x as x i, and move this matrix-vector product to the other side of the equation. And here's my map. Of course, this matrix multiplying x i plus 1 is just the identity, so it's not useful to write it out explicitly; I can drop it entirely. This is just x i plus 1. So here's my map: take an initial guess, multiply it by this matrix, add the vector (1, 0), and repeat over and over and over again. Hopefully (we don't really know yet) it's going to converge to a solution of the original linear equations.

I didn't make up that method. It's a method called Jacobi iteration. The strategy is to split the matrix A into two parts, a sum of its diagonal elements D and its off-diagonal elements R, and rewrite the original equations as an iterative map: D times x i plus 1 is equal to minus R times x i plus b. Or, x i plus 1 is D inverse times the quantity minus R x i plus b. If the iterations converge, then D plus R times x i has to be equal to b, and we will have found a solution. If it converges, right? If these iterations approach a steady value, if they don't change from iteration to iteration.

The nice thing about the Jacobi method is that it turns the hard problem, the order N cubed problem of computing A inverse b, into a succession of easy problems: D inverse times some vector. How many calculations does it take to compute that D inverse? N, that's right, order N. It's just a diagonal matrix: I invert each of its diagonal elements, and I'm done.
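Here's Jacobi iteration as a MATLAB sketch. The system below is an assumption: a tridiagonal matrix and right-hand side consistent with the exact solution (3/4, 1/2, 1/4) quoted later in the lecture, not necessarily the slide's actual example.

    % Jacobi iteration with the splitting A = D + R.
    A = [2 -1 0; -1 2 -1; 0 -1 2];   % assumed example system
    b = [1; 0; 0];                   % chosen so that the solution is (3/4, 1/2, 1/4)
    D = diag(diag(A));  R = A - D;
    x = zeros(3, 1);                 % initial guess
    for i = 1:25
        x = D \ (b - R*x);           % x_{i+1} = D^{-1}(b - R*x_i), order N per sweep
    end
    norm(A*x - b)                    % small residual once the iteration has converged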
So I went from order N cubed, which was going to be hard, to a succession of order N problems. As long as it doesn't take me order N squared iterations to get to the solution that I want, I'm going to be OK. This is going to be a viable way to solve this problem faster than finding the exact solution.

How do you know that it converges? That's the question. Is this thing actually going to converge, or are these iterations just going to run on and on forever? Well, one way to check whether it will converge or not is to go back up to this equation here and substitute b equals Ax, where x is the exact solution to the problem. You can transform this equation into one that looks like: x i plus 1 minus x equals minus D inverse R times the quantity x i minus x.

And if I take the norm of both sides and apply our norm inequality (the norm of a matrix-vector product is smaller than the product of the norm of the matrix and the norm of the vector), then I get a ratio like this: the absolute error in iteration i plus 1, divided by the absolute error in iteration i, is smaller than the norm of this matrix, D inverse R.

So if I'm converging, then what I expect is that this ratio should be smaller than 1: the error in my next approximation should be smaller than the error in my current approximation. That makes sense? So I would hope that the norm of this matrix is also smaller than 1. If it is, then I'm guaranteed to converge.

So for a particular coefficient matrix, for a system of linear equations I'm trying to solve, I may find that this is true. And then I can apply this method, and I'll converge to a solution. We call this sort of convergence linear. Whatever this number is, it tells me the fraction by which the error is reduced from iteration to iteration. So suppose this is 1/10. Then the absolute error is going to be reduced by a factor of 10 in each iteration.
It's not going to be 1/10 usually. It's going to be something a little bit bigger than that, typically, but that's the idea.

You can show (I would encourage you to try to work this out on your own) that the infinity norm of this product, D inverse R, is equal to this expression. And if I ask that the infinity norm of this product be smaller than 1, that's guaranteed when the diagonal values of the matrix, in absolute value, are bigger than the sum of the absolute values of the off-diagonal values in a particular row. That kind of matrix we call diagonally dominant: the diagonal values are bigger than the sum, in absolute value, of the off-diagonal pieces. So diagonally dominant matrices come up quite often, and linear equations based on those matrices can be solved with reasonable efficiency using the Jacobi method. There are better methods to choose (I'll show you one in a second), but you can guarantee that this is going to converge, and that the solution will be the right solution to the linear equations you were trying to solve.

So if the goal is just to turn hard problems into easier-to-solve problems, then there are other natural ways to want to split a matrix. Maybe you want to split it into a lower triangular part L, which contains the diagonal elements of A, and an upper triangular part U, which has none of the diagonal elements of A. We just split this thing apart. And then we can rewrite our system of equations as an iterative map like this: L times x i plus 1 is minus U times x i plus b. All I have to do is invert L to find my next iterate. And how expensive computationally is it to solve a system of equations which is triangular? This is a process we call forward substitution. It's order--

AUDIENCE: N squared.

JAMES W. SWAN: N squared. So we still beat N cubed. One would hope that it doesn't require too many iterations to do this.
But in principle, we can do this order N squared operation many times. And it'll turn out that this sort of map converges to the solution that we're after. It converges when matrices are either diagonally dominant, as before, or when they're symmetric and positive definite. Positive definite means all the eigenvalues of the matrix are bigger than 0.

So let's try these iterative methods on some equations and see how they converge. Yes?

AUDIENCE: How do you justify ignoring the diagonal elements in that method?

JAMES W. SWAN: So the question was, how do you justify ignoring the diagonal elements in this method. Maybe I was going too fast, or I misspoke. I'm going to split A into a lower triangular matrix L that has all the diagonal elements, and U is the upper part with none of those diagonal elements in it. Does that make sense?

AUDIENCE: Yeah.

JAMES W. SWAN: Thank you for asking that question. I hope that's clear. L holds onto the diagonal pieces, and U takes those away.

So let's try it. On a matrix like this, the exact solution to the system of equations is 3/4, 1/2, and 1/4. All right, we'll try Jacobi. We have to give it some initial guess for the solution. We'll talk about places where you can derive those initial guesses later on in the course, but we have to start the iterative process with some guess at the solution. So here's an initial guess, and we'll apply this map. Here's Gauss-Seidel with the same initial guess, and we'll apply this map. They're both linearly convergent, so the relative error will go down by a fixed factor after each iteration. At iteration one, the relative error in Jacobi will be 38%; in Gauss-Seidel, it'll be 40%. If we carry this all the way down to 10 iterations, the relative error in Jacobi will be 1.7%, and the relative error in Gauss-Seidel 0.08%.
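Here's that comparison as a MATLAB sketch, again assuming the tridiagonal system reconstructed from the quoted exact solution. The error percentages depend on the initial guess, so these numbers should land near, not exactly on, the ones above:

    % Jacobi vs. Gauss-Seidel on the assumed example system.
    A = [2 -1 0; -1 2 -1; 0 -1 2];  b = [1; 0; 0];
    xtrue = [0.75; 0.5; 0.25];
    D  = diag(diag(A));  R  = A - D;       % Jacobi splitting
    Lo = tril(A);        Up = triu(A, 1);  % Gauss-Seidel splitting; Lo keeps the diagonal
    xj = zeros(3, 1);  xg = zeros(3, 1);   % the same initial guess for both
    for i = 1:10
        xj = D  \ (b - R*xj);              % Jacobi update
        xg = Lo \ (b - Up*xg);             % Gauss-Seidel update, a triangular solve
    end
    norm(xj - xtrue) / norm(xtrue)         % relative error after 10 Jacobi sweeps
    norm(xg - xtrue) / norm(xtrue)         % Gauss-Seidel shrinks the error roughly twice as fast here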
786 00:37:11,360 --> 00:37:13,320 And we can go on and on with these iterations 787 00:37:13,320 --> 00:37:16,740 if we want until we get sufficiently converged-- we 788 00:37:16,740 --> 00:37:19,320 get to a point where the relative error is small enough 789 00:37:19,320 --> 00:37:22,590 that we're happy to accept this answer as a solution 790 00:37:22,590 --> 00:37:24,660 to our system of equations. 791 00:37:24,660 --> 00:37:28,450 So we traded the burden of doing all these calculations to do 792 00:37:28,450 --> 00:37:34,080 elimination for a faster, less computationally complex 793 00:37:34,080 --> 00:37:35,250 methodology. 794 00:37:35,250 --> 00:37:38,620 But the trade-off was we don't get an exact solution anymore. 795 00:37:38,620 --> 00:37:40,890 We're going to have finite precision in the result, 796 00:37:40,890 --> 00:37:42,750 and we have to specify the tolerance 797 00:37:42,750 --> 00:37:44,541 that we want to converge to. 798 00:37:44,541 --> 00:37:47,040 We're going to see now-- this is the hook into the next part 799 00:37:47,040 --> 00:37:47,860 of the class-- 800 00:37:47,860 --> 00:37:50,070 we're going to talk about solutions of nonlinear 801 00:37:50,070 --> 00:37:52,470 equations next, and there are 802 00:37:52,470 --> 00:37:55,140 almost no nonlinear equations that we can solve exactly. 803 00:37:55,140 --> 00:37:58,945 They all have to be solved using these iterative methods. 804 00:37:58,945 --> 00:38:01,320 You can use these iterative methods for linear equations. 805 00:38:01,320 --> 00:38:03,079 It's very common to do it this way. 806 00:38:03,079 --> 00:38:04,620 In my group, we solve lots of systems 807 00:38:04,620 --> 00:38:07,860 of linear equations associated with hydrodynamic problems. 808 00:38:07,860 --> 00:38:10,670 These come up when you're talking about, 809 00:38:10,670 --> 00:38:12,510 say, low Reynolds number flows, which 810 00:38:12,510 --> 00:38:15,420 are linear sorts of fluid flow problems. 811 00:38:15,420 --> 00:38:15,990 They're big. 812 00:38:15,990 --> 00:38:18,050 It's really hard to do Gaussian elimination, 813 00:38:18,050 --> 00:38:19,910 so you apply different iterative methods. 814 00:38:19,910 --> 00:38:20,910 You can do Gauss-Seidel. 815 00:38:20,910 --> 00:38:22,120 You can do Jacobi. 816 00:38:22,120 --> 00:38:23,790 We'll learn about more advanced ones 817 00:38:23,790 --> 00:38:26,880 like PCG, which you're applying on your homework now, 818 00:38:26,880 --> 00:38:29,970 and you should be seeing that it converges relatively quickly 819 00:38:29,970 --> 00:38:32,176 in cases where exact elimination doesn't work. 820 00:38:32,176 --> 00:38:34,050 We'll learn, actually, how to do that method. 821 00:38:34,050 --> 00:38:35,758 That's one that we apply in my own group. 822 00:38:35,758 --> 00:38:37,870 It's pretty common to use out there. 823 00:38:37,870 --> 00:38:38,370 Yes? 824 00:38:38,370 --> 00:38:43,440 AUDIENCE: One question, is that that Gauss, [INAUDIBLE] 825 00:38:43,440 --> 00:38:45,930 JAMES W. SWAN: Order N squared. 826 00:38:45,930 --> 00:38:47,708 AUDIENCE: Yeah, that's what I meant. 827 00:38:47,708 --> 00:38:50,148 So now we've got an [INAUDIBLE]. 828 00:38:50,148 --> 00:38:54,407 So we basically have [INAUDIBLE] iterations, right? 829 00:38:54,407 --> 00:38:56,240 JAMES W. SWAN: This is a wonderful question.
830 00:38:56,240 --> 00:39:00,120 So this is a pathological problem in the sense 831 00:39:00,120 --> 00:39:03,300 that it requires a lot of calculations 832 00:39:03,300 --> 00:39:05,550 to get an iterative solution here. 833 00:39:05,550 --> 00:39:07,110 We haven't gotten to an N that's 834 00:39:07,110 --> 00:39:10,290 big enough that the computational complexities 835 00:39:10,290 --> 00:39:11,700 cross over. 836 00:39:11,700 --> 00:39:15,810 So for small Ns, probably the factor in front of N cubed-- 837 00:39:15,810 --> 00:39:17,200 whatever number that is-- 838 00:39:17,200 --> 00:39:19,000 and maybe even the smaller factors, 839 00:39:19,000 --> 00:39:20,940 the order N squared factors on that order 840 00:39:20,940 --> 00:39:22,440 N cubed, play a big role in how long 841 00:39:22,440 --> 00:39:24,660 it takes to actually complete this thing. 842 00:39:24,660 --> 00:39:28,140 But modern problems are so big that we almost always 843 00:39:28,140 --> 00:39:30,480 are running out to Ns that are large enough 844 00:39:30,480 --> 00:39:31,590 that we see a crossover. 845 00:39:31,590 --> 00:39:34,110 You'll see this in your homework this week. 846 00:39:34,110 --> 00:39:36,030 You won't see this crossover at N equals 3. 847 00:39:36,030 --> 00:39:37,613 You're going to see it out at N equals 848 00:39:37,613 --> 00:39:41,336 500 or 1,200, big problems. 849 00:39:41,336 --> 00:39:43,210 Then we're going to encounter this crossover. 850 00:39:43,210 --> 00:39:44,376 That's a wonderful question. 851 00:39:44,376 --> 00:39:49,110 So for small system sizes, iterative methods 852 00:39:49,110 --> 00:39:50,750 maybe don't buy you much. 853 00:39:50,750 --> 00:39:53,000 I suppose it depends on the application though, right? 854 00:39:53,000 --> 00:39:56,430 If you're doing something that involves solving problems 855 00:39:56,430 --> 00:40:01,420 on embedded hardware, in some sort of sensor or control 856 00:40:01,420 --> 00:40:04,230 valve, there may be very limited memory 857 00:40:04,230 --> 00:40:06,290 or computational capacity available to you. 858 00:40:06,290 --> 00:40:08,490 And you may actually apply an iterative method 859 00:40:08,490 --> 00:40:12,120 like this to a problem that that controller 860 00:40:12,120 --> 00:40:15,330 needs to solve, for example. 861 00:40:15,330 --> 00:40:17,370 It just may not have the capability 862 00:40:17,370 --> 00:40:20,880 of storing and inverting what we would consider, today, 863 00:40:20,880 --> 00:40:25,350 a relatively small matrix, because the hardware doesn't 864 00:40:25,350 --> 00:40:26,670 have that sort of capability. 865 00:40:26,670 --> 00:40:28,890 So there could be cases where you 866 00:40:28,890 --> 00:40:32,250 might choose something that's slower but feasible, 867 00:40:32,250 --> 00:40:34,650 versus something that's faster and exact, 868 00:40:34,650 --> 00:40:36,990 because there are other constraints. 869 00:40:36,990 --> 00:40:40,560 They do exist, but modern computers are pretty efficient. 870 00:40:40,560 --> 00:40:43,590 Your cell phone is faster than the fastest computers 871 00:40:43,590 --> 00:40:46,410 in the world 20 years ago. 872 00:40:46,410 --> 00:40:47,550 We're doing OK. 873 00:40:47,550 --> 00:40:49,890 So we've got to get out to big system sizes, big problem 874 00:40:49,890 --> 00:40:52,060 sizes, before this starts to pay off. 875 00:40:52,060 --> 00:40:55,210 But it does for many practical problems.
876 00:40:55,210 --> 00:40:59,410 OK, I'll close with this, because this is the hook into solving 877 00:40:59,410 --> 00:41:00,520 nonlinear equations. 878 00:41:03,537 --> 00:41:05,370 So I showed you these two iterative methods, 879 00:41:05,370 --> 00:41:07,830 and they kind of had stringent requirements 880 00:41:07,830 --> 00:41:10,770 for when they were actually going to converge, right? 881 00:41:10,770 --> 00:41:13,350 I had to have a diagonally dominant system of equations 882 00:41:13,350 --> 00:41:15,460 for Jacobi to converge. 883 00:41:15,460 --> 00:41:18,720 I had to have diagonal dominance or symmetric positive definite 884 00:41:18,720 --> 00:41:19,260 matrices. 885 00:41:19,260 --> 00:41:20,730 These things exist and they come up 886 00:41:20,730 --> 00:41:22,188 in lots of physical problems, but I 887 00:41:22,188 --> 00:41:25,050 had to have it in order for Gauss-Seidel to converge. 888 00:41:25,050 --> 00:41:26,700 What if I have a system of equations 889 00:41:26,700 --> 00:41:28,194 that doesn't work that way? 890 00:41:28,194 --> 00:41:29,610 Or what if I have an iterative map 891 00:41:29,610 --> 00:41:33,900 that I like for some reason, but it doesn't appear to converge? 892 00:41:33,900 --> 00:41:37,270 Maybe it converges under some circumstances, but not others. 893 00:41:37,270 --> 00:41:40,140 Well, there's a way to modify these iterative maps, called 894 00:41:40,140 --> 00:41:43,380 successive over-relaxation, which 895 00:41:43,380 --> 00:41:45,977 can help promote convergence. 896 00:41:45,977 --> 00:41:48,060 So suppose we have an iterative map like this, x i 897 00:41:48,060 --> 00:41:51,897 plus 1 is some function of the previous iteration value. 898 00:41:51,897 --> 00:41:52,980 Doesn't matter what it is. 899 00:41:52,980 --> 00:41:54,646 It could be linear, could be non-linear. 900 00:41:54,646 --> 00:41:57,760 We don't actually care. 901 00:41:57,760 --> 00:42:00,610 The sought-after solution is found when x i plus 1 902 00:42:00,610 --> 00:42:01,630 is equal to x i. 903 00:42:01,630 --> 00:42:03,540 So this map is one that converges 904 00:42:03,540 --> 00:42:06,250 to the exact solution of the problem that we want. 905 00:42:06,250 --> 00:42:08,650 We've somehow guaranteed that that's the case, 906 00:42:08,650 --> 00:42:10,990 but it has to converge. 907 00:42:10,990 --> 00:42:14,890 One way to modify that map is to say x i plus 1 908 00:42:14,890 --> 00:42:17,890 is 1 minus some scalar value omega 909 00:42:17,890 --> 00:42:23,020 times x i plus omega times f. 910 00:42:23,020 --> 00:42:26,350 You can confirm that if you substitute x i plus 1 equals 911 00:42:26,350 --> 00:42:28,900 x i into this equation, you'll come up 912 00:42:28,900 --> 00:42:33,940 with the same fixed points of this iterative map, 913 00:42:33,940 --> 00:42:36,500 x i is equal to f of x i. 914 00:42:36,500 --> 00:42:40,210 So you haven't changed the value it will converge to here, 915 00:42:40,210 --> 00:42:42,460 but you've affected the rate at which it converges. 916 00:42:42,460 --> 00:42:45,640 Here you're saying x i plus 1 is some fraction 917 00:42:45,640 --> 00:42:50,290 of my previous solution plus some fraction of this f. 918 00:42:50,290 --> 00:42:53,890 And I get to control how big those different fractions are.
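[A sketch of that relaxation trick, for any fixed-point map f. Substituting x_(i+1) = x_i = x* into the blended update gives x* = (1 - omega) x* + omega f(x*), which reduces to x* = f(x*), so the fixed point is preserved and only the rate changes. The wrapper below uses an illustrative stopping rule and defaults not prescribed in the lecture, and the last line anticipates the damped Jacobi application discussed next, reusing the hypothetical jacobi_step, A, and b from the earlier sketches.]

import numpy as np

def relaxed_fixed_point(f, x0, omega=0.5, tol=1e-10, max_iter=10000):
    # x_{i+1} = (1 - omega) x_i + omega f(x_i): blend the old iterate with
    # the raw update; smaller omega damps the map more strongly.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = (1.0 - omega) * x + omega * f(x)
        if np.linalg.norm(x_new - x) < tol:   # converged to our tolerance
            return x_new
        x = x_new
    return x                                  # ran out of iterations

# Damped Jacobi, for example: under-relax the raw Jacobi update.
x = relaxed_fixed_point(lambda x: jacobi_step(A, b, x), np.zeros(3), omega=0.8)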
919 00:42:53,890 --> 00:42:58,410 So if things aren't converging well for a map like this, 920 00:42:58,410 --> 00:43:00,700 then I could try successive over-relaxation, 921 00:43:00,700 --> 00:43:03,400 and I could adjust this relaxation parameter 922 00:43:03,400 --> 00:43:07,890 to be some fraction, some number between 0 and 1, 923 00:43:07,890 --> 00:43:10,050 until I start to observe convergence. 924 00:43:10,050 --> 00:43:11,550 And there are some rules one can use 925 00:43:11,550 --> 00:43:13,383 to try to promote convergence with this kind 926 00:43:13,383 --> 00:43:15,204 of successive over-relaxation. 927 00:43:15,204 --> 00:43:17,370 This is a very generic technique that one can apply. 928 00:43:17,370 --> 00:43:20,700 If you have any iterative map you're trying to apply 929 00:43:20,700 --> 00:43:22,840 that should go to the solution you want 930 00:43:22,840 --> 00:43:24,570 but doesn't converge for some reason, 931 00:43:24,570 --> 00:43:27,630 then you can use this relaxation technique to promote 932 00:43:27,630 --> 00:43:29,250 convergence to the solution. 933 00:43:29,250 --> 00:43:31,770 You may slow the convergence way down. 934 00:43:31,770 --> 00:43:34,680 It may be very slow to converge, but it will converge. 935 00:43:34,680 --> 00:43:36,690 And after all, an answer is better 936 00:43:36,690 --> 00:43:38,940 than no answer, no matter how long it takes to get it. 937 00:43:38,940 --> 00:43:40,740 So sometimes you've got to get these things 938 00:43:40,740 --> 00:43:43,670 by hook or by crook. 939 00:43:43,670 --> 00:43:45,710 So for example, you can apply this to Jacobi. 940 00:43:45,710 --> 00:43:48,514 This was the original Jacobi map. 941 00:43:48,514 --> 00:43:49,430 And we just take that. 942 00:43:49,430 --> 00:43:53,640 We add 1 minus omega times x i plus omega 943 00:43:53,640 --> 00:43:55,680 times this factor over here. 944 00:43:55,680 --> 00:43:59,300 And now we can choose omega so that this solution converges. 945 00:43:59,300 --> 00:44:02,360 We always make omega small enough so 946 00:44:02,360 --> 00:44:04,550 that the diagonal values of our matrix 947 00:44:04,550 --> 00:44:07,220 appear big enough that the matrix looks 948 00:44:07,220 --> 00:44:09,770 like it's diagonally dominant. 949 00:44:09,770 --> 00:44:12,230 You could go back to that same convergence analysis 950 00:44:12,230 --> 00:44:14,510 that I showed you before and try to apply it 951 00:44:14,510 --> 00:44:19,252 to this over-relaxation form of Jacobi and see that 952 00:44:19,252 --> 00:44:21,710 there's always going to be some value of omega that's 953 00:44:21,710 --> 00:44:25,420 small enough that this thing will converge. 954 00:44:25,420 --> 00:44:27,920 It will look effectively diagonally dominant, 955 00:44:27,920 --> 00:44:32,150 because omega inverse times D will be big enough, 956 00:44:32,150 --> 00:44:34,865 or omega times D inverse will be small enough. 957 00:44:34,865 --> 00:44:36,619 Does that make sense? 958 00:44:36,619 --> 00:44:39,160 You can apply the same sort of damping method to Gauss-Seidel 959 00:44:39,160 --> 00:44:39,660 as well. 960 00:44:39,660 --> 00:44:42,360 It's very common to do this. 961 00:44:42,360 --> 00:44:45,560 The relaxation parameter acts like an effective increase 962 00:44:45,560 --> 00:44:47,692 in the eigenvalues of the matrix. 963 00:44:47,692 --> 00:44:50,150 So you can think about L. That's a lower triangular matrix. 964 00:44:50,150 --> 00:44:54,310 Its diagonal values are its eigenvalues.
965 00:44:54,310 --> 00:44:56,190 The diagonal values of L inverse-- 966 00:44:56,190 --> 00:44:59,000 well, 1 over those diagonal values 967 00:44:59,000 --> 00:45:00,820 are the eigenvalues of L inverse. 968 00:45:00,820 --> 00:45:03,560 And so if we make omega very small, 969 00:45:03,560 --> 00:45:06,630 then we make the eigenvalues of L inverse very small, 970 00:45:06,630 --> 00:45:08,510 or the eigenvalues of L very big. 971 00:45:08,510 --> 00:45:12,170 And again, the matrix starts to look diagonally dominant. 972 00:45:12,170 --> 00:45:15,330 And you can promote convergence in this way. 973 00:45:15,330 --> 00:45:17,960 So even though this may be slow, you 974 00:45:17,960 --> 00:45:19,640 can use it to guarantee convergence 975 00:45:19,640 --> 00:45:22,299 of some iterative procedures, not just for linear equations, 976 00:45:22,299 --> 00:45:23,840 but for non-linear equations as well. 977 00:45:23,840 --> 00:45:25,370 And we'll see, there are good ways 978 00:45:25,370 --> 00:45:27,230 of choosing omega for certain classes 979 00:45:27,230 --> 00:45:29,010 of non-linear equations. 980 00:45:29,010 --> 00:45:30,440 We'll apply the Newton-Raphson method, 981 00:45:30,440 --> 00:45:33,590 and then we'll damp it using exactly this sort of procedure. 982 00:45:33,590 --> 00:45:37,190 And I'll show you how you can choose 983 00:45:37,190 --> 00:45:41,860 a nearly optimal value for omega to promote convergence 984 00:45:41,860 --> 00:45:43,610 to the solution. 985 00:45:43,610 --> 00:45:46,480 Any questions? 986 00:45:46,480 --> 00:45:50,140 No? Let me address one more thing before you go. 987 00:45:50,140 --> 00:45:53,140 We've scheduled times for the quizzes. 988 00:45:53,140 --> 00:45:55,180 They are going to be in the evenings 989 00:45:55,180 --> 00:45:58,180 on the dates that are specified on the syllabus. 990 00:45:58,180 --> 00:46:01,060 We wanted to do them during the daytime. 991 00:46:01,060 --> 00:46:02,890 It was really difficult to schedule 992 00:46:02,890 --> 00:46:04,970 a room that was big enough for this class, 993 00:46:04,970 --> 00:46:07,990 so they have to be from 7:00 to 9:00 in the gymnasium. 994 00:46:07,990 --> 00:46:09,709 I apologize for that. 995 00:46:09,709 --> 00:46:11,500 We spent several days looking around trying 996 00:46:11,500 --> 00:46:14,050 to find a place where we could put everybody 997 00:46:14,050 --> 00:46:17,680 so you would all get the same experience in the quiz. 998 00:46:17,680 --> 00:46:20,590 I know that the November quiz comes back to back 999 00:46:20,590 --> 00:46:24,130 with the thermodynamics exam as well. 1000 00:46:24,130 --> 00:46:26,030 That's frustrating. 1001 00:46:26,030 --> 00:46:27,850 Thermodynamics is the next day. 1002 00:46:27,850 --> 00:46:29,170 That week is tricky. 1003 00:46:29,170 --> 00:46:33,004 That's AIChE, so most of the faculty have to travel. 1004 00:46:33,004 --> 00:46:34,420 We won't be able to teach, but you 1005 00:46:34,420 --> 00:46:36,350 won't have classes one of those days, 1006 00:46:36,350 --> 00:46:39,910 so you have extra time to study. 1007 00:46:39,910 --> 00:46:41,750 And Columbus Day also falls in that week, 1008 00:46:41,750 --> 00:46:44,590 so there's no way to put three exams in four days 1009 00:46:44,590 --> 00:46:46,830 without having them come right back to back.
1010 00:46:46,830 --> 00:46:48,460 Believe me, we thought about this 1011 00:46:48,460 --> 00:46:51,004 and tried to get things scheduled as efficiently as we 1012 00:46:51,004 --> 00:46:52,420 could for you, but sometimes there 1013 00:46:52,420 --> 00:46:54,420 are constraints that are outside of our control. 1014 00:46:54,420 --> 00:46:55,927 But the quiz times are set. 1015 00:46:55,927 --> 00:46:58,260 There's going to be one in October and one in November. 1016 00:46:58,260 --> 00:47:00,790 They'll be in the evening, and they'll be in the gymnasium. 1017 00:47:00,790 --> 00:47:03,790 I'll give you directions to it before the exam, just 1018 00:47:03,790 --> 00:47:06,280 so you know exactly where to go, OK? 1019 00:47:06,280 --> 00:47:08,210 Thank you, guys.