The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JAMES W. SWAN: So this is going to be our last lecture on linear algebra. The first three lectures covered basics. In the next three lectures, we talked about different sorts of transformations of matrices. This final lecture is the last of those three. We're going to talk about another sort of transformation called the singular value decomposition.

OK, before we jump in, I'd like to do the usual recap business. I think it's always helpful to recap or look at things from a different perspective. Early on, I told you that the infinite-dimensional equivalent of a vector would be something like a function, which is a unique map from a point x to some value f of x. And there is an equivalent representation of the eigenvalue-eigenvector problem in function space. We call these eigenvalues and eigenfunctions.

Here's a classic one, where the function is y of x. This is the equivalent of the vector, and the equivalent of the transformation or the matrix is this differential operator, the second derivative. So I take the second derivative of this particular function, and the function is stretched: it's multiplied by some fixed value at all points, and it becomes lambda times y. And that operator has to be closed with some boundary conditions as well. We have to say what the value of y is at the edges of some domain. So there's a one-to-one correspondence between these things.

What is the eigenfunction here, or what are the eigenfunctions? And what are the eigenvalues associated with this transformation or this operator? Can you work those out really quickly? You learned this at some point, right?
Somebody taught you differential equations, and you calculated these things. Take about 90 seconds. Work with the people around you. See if you can come to a conclusion about what the eigenfunctions and eigenvalues are.

That's enough time. You can work on this on your own later if you've run out of time. Don't worry about it. Does somebody want to volunteer a guess for what the eigenfunctions are in this case? What are they? Yeah?

AUDIENCE: [INAUDIBLE]

JAMES W. SWAN: OK, so you chose exponentials. That's an interesting choice. That's one possible choice you can make. This is sort of a classical one that you think about when you first learn differential equations. They say an equation of this sort has solutions that look like exponentials, and that's true. There's another representation for this, which is as trigonometric functions instead: y equals C1 times cosine of the square root of minus lambda times x, plus C2 times sine of the square root of minus lambda times x. Either of those is acceptable. For us here, the trigonometric representation is a little more useful.

We know that the boundary conditions tell us that y of 0 is supposed to be 0. That means that C1 has to be 0, because cosine of 0 is 1. So C1 is 0 in this case. That fixes one of these coefficients. And now we're left with a problem: our solutions, our eigenfunctions, cannot be unique. We don't get to specify C2, because any function that's a multiple of this sine should also be an eigenfunction. So instead, the other boundary condition, y of L equals 0, needs to be used to pin down what the eigenvalue is. The second equation, y of L equals 0, implies that the square root of minus lambda times L has to land on the nodes of the sine, the places where the sine is equal to 0, so the square root of minus lambda has to be equal to n pi over L. That's the equivalent of our secular or characteristic polynomial; it prescribes the eigenvalues associated with each of the eigenfunctions.
So now we know what the eigenvalues are. The eigenvalues are the set of numbers minus n pi over L, quantity squared. There's an infinite number of eigenvalues. It's an infinite-dimensional space that we're in, so it's not a big surprise that it works out that way. And the eigenfunctions, then, are different scalar multiples of sine of the square root of minus the eigenvalue times x.

There's a one-to-one correspondence between all the linear algebra we've done and linear differential equations or linear partial differential equations. You can think about these things in exactly the same way. I'm sure in 10.50 you've started to talk about orthogonal functions to represent solutions of differential equations. Or if you haven't, you're going to very soon. That's the part of the course where you get to look at the analytical side of some of these things, as opposed to the numerical side. But there's a one-to-one relationship between those things. So if you understand one, you understand the other, and you can come at them from either perspective.

This sort of stuff is useful. Actually, the classical chemical engineering example comes from quantum mechanics, where you think about wave functions and different energy levels corresponding to eigenvalues. That's cool.
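As a quick numerical check of that correspondence, here is a small MATLAB sketch; the interval length and grid size are arbitrary choices, not from the lecture. It discretizes the second-derivative operator on (0, L) with homogeneous boundary conditions and compares its eigenvalues against the analytical values minus (n pi over L) squared:

    % Second-difference approximation of d^2/dx^2 on (0,L) with y(0) = y(L) = 0.
    L = 1; n = 400; h = L/(n+1);
    A = (diag(-2*ones(n,1)) + diag(ones(n-1,1), 1) + diag(ones(n-1,1), -1)) / h^2;
    lam = sort(eig(A), 'descend');    % all eigenvalues are negative; nearest zero first
    [lam(1:4), -((1:4)'*pi/L).^2]     % numerical vs. analytical eigenvalues

The agreement improves as the grid is refined, and the corresponding eigenvectors sample the sine eigenfunctions at the grid points.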
Sometimes I like to think about a mechanical analog to that, which is the buckling of an elastic column. You should try this at home. Go get a piece of spaghetti and push on the ends of the piece of spaghetti, and the spaghetti will buckle. Eventually it'll break, but it'll buckle first. It'll bend. And how does it bend? Well, a balance of linear momentum on this bar would tell you that the deflection in the bar at different points x along the bar, multiplied by the pressure, has to balance the bending moment in the bar itself. This E is some elastic constant, I is the moment of inertia, and d squared y dx squared is something like the curvature of the bar. So it's the bending moment of the bar that balances the pressure being exerted on the bar.

And sure enough, this bar will buckle when the pressure applied exceeds the value set by the first eigenvalue associated with this differential equation. We just worked that eigenvalue out: its magnitude is pi over L, quantity squared. And so when the pressure exceeds pi over L squared times the elastic modulus EI, this column will bend and deform continuously until it eventually breaks. It will undergo this linear elastic deformation, then plastic deformation later, and then it will break.

The Eiffel Tower, actually, is one of the first structures in the world to utilize this principle. It's got very narrow beams in it. The beams are engineered so that their elastic modulus is strong enough that they won't buckle. Gustave Eiffel was one of the first applied physicists, somebody who took the physics of elastic bars and applied it to building structures that weren't big and blocky, but used a minimal amount of material. Cool, right?

OK, so that's recap. Any questions about that? You've seen these things before. You understood them well before, too, maybe? Give some thought to this, OK?

We talked about eigendecomposition last time: associated with a square matrix was a particular set of eigenvalues, stretches, and corresponding eigenvectors, directions. These were special solutions to the system of linear equations based on a matrix. It was a square matrix. And you might ask, well, what happens if the matrix isn't square? What if A is in the space of real matrices that are n by m, where n and m maybe aren't the same? Maybe they are the same, but maybe they're not. There is an equivalent decomposition. It's called the singular value decomposition.
It's like an eigendecomposition for non-square matrices. So rather than writing our matrix as some W lambda W inverse, we're going to write it as some product U times sigma times V with this dagger. The dagger here is the conjugate transpose: transpose the matrix, and take the complex conjugate of all the elements. I mentioned last time that eigenvalues and eigenvectors could potentially be complex. So whenever we have that case where things can be complex, usually the transposition operation is replaced with the conjugate transpose.

What are these different matrices? Well, let me tell you. U is a complex matrix. It maps from R n to R n, so it's an n by n square matrix. Sigma is a real-valued matrix, and it lives in the space of n by m matrices. V is a square matrix again, but it has dimensions m by m. Remember, A maps from R m to R n, and that's what the sequence of products says: V dagger maps from m to m, sigma maps from m to n, and U maps from n to n. So the product maps from m to n as well.

Sigma is like lambda from before. It's a diagonal matrix: it only has diagonal elements. It's just not square, but it only has diagonal elements, all of which will be non-negative. And then U and V are called the left and right singular vectors. They have special properties associated with them, which I'll show you right now.

Any questions about how this decomposition is composed or made up? It looks just like the eigendecomposition, but it can be applied to any matrix. Yes?

AUDIENCE: Quick question.

JAMES W. SWAN: Sure.

AUDIENCE: Do all matrices have this thing, or is it like the eigenvalues, where some do and some don't?

JAMES W. SWAN: This is a great question. So all matrices are going to have a singular value decomposition.
We saw with the eigenvalue decomposition that there could be a case where the eigenvectors are degenerate, and we can't write that full decomposition. All matrices are going to have this decomposition.

So, some properties of this decomposition. U and V are what we call unitary matrices. I talked about these before. Unitary matrices are ones for whom, if they're real-valued, their transpose is also their inverse. If they're complex-valued, their conjugate transpose is the equivalent of their inverse. So U times U conjugate transpose will be identity, and V times V conjugate transpose will be identity. Unitary matrices also have the property that they impart no stretch to vectors. Their maps don't stretch. They're kind of like rotation matrices: they change directions, but they don't stretch things out.

If I were to take A conjugate transpose and multiply it by A, that would be the same as taking the conjugate transpose of U sigma V dagger and multiplying it by U sigma V dagger. If I use the properties of matrix multiplication and complex conjugate transposes, and work out what this expression is, I'll find out that it's equivalent to V times sigma conjugate transpose sigma times V conjugate transpose. Well, this has exactly the same form as an eigendecomposition, an eigendecomposition of A conjugate transpose A instead of an eigendecomposition of A. So V is the set of eigenvectors of A conjugate transpose A, and sigma squared are the eigenvalues of A conjugate transpose times A.

And if I reverse the order of this multiplication, so I do A times A conjugate transpose, and work it out, that would be U times sigma sigma conjugate transpose times U conjugate transpose. And so U are the eigenvectors of A A conjugate transpose, and sigma squared are still the eigenvalues of A A conjugate transpose.

So what are these things U and V? They relate to the eigenvectors of the product of A with itself: this particular product of A with itself, or this particular product of A with itself. Sigma are the singular values.
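A quick way to convince yourself of these relationships is to test them numerically. Here's a small MATLAB sketch (the random test matrix is just an illustrative choice); each printed quantity should come out near machine zero:

    % Verify the connection between the SVD of A and the
    % eigendecompositions of A'*A and A*A' for a real matrix.
    A = randn(4, 3);
    [U, S, V] = svd(A);
    norm(U'*U - eye(4))        % U is unitary
    norm(V'*V - eye(3))        % V is unitary
    norm(A'*A - V*(S'*S)*V')   % V and sigma^2 diagonalize A'*A
    norm(A*A' - U*(S*S')*U')   % U and sigma^2 diagonalize A*A'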
All matrices possess this sort of a decomposition. They all have a set of singular values and singular vectors. These sigmas have a particular name: they're called the singular values of A. I'm going to show you how you can use this decomposition to do something you already know how to do, but how to do it formally.

What are some properties of the singular value decomposition? So if we take a matrix A and we compute its singular value decomposition (this is how you do it in MATLAB), we'll find out, for this matrix, U is identity, sigma is identity with an extra column pasted on it, and V is also identity. I mean, this is the simplest possible three-by-four matrix I can write down. You don't have to know how to compute the singular value decomposition; you just need to know that it can be computed in this way. You might be able to guess how to compute it based on what we did with eigenvalues and eigenvectors earlier.

It'll turn out that some of the columns of sigma will be non-zero; here there are three non-zero columns of sigma. And the columns of V corresponding to the zero columns of sigma span the null space of the matrix A. So the first three columns of sigma here are non-zero, and the last column is 0. The zero column of sigma corresponds to a particular column in V, this last column here, which lives in the null space of A. You can see, if I take A and I multiply it by any vector that's proportional to (0, 0, 0, 1), I'll get back 0. So the null space of A is spanned by all the columns of V corresponding to the zero columns of sigma.

Some of the columns of sigma are non-zero, these first three columns, and the columns of U corresponding to those three columns span the range of A.
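In MATLAB, that example looks something like the sketch below. The matrix is a guess reconstructed from the description of the slide (identity with an extra zero column), and svd may return the singular vectors with signs or orderings that differ from the clean identity form:

    A = [1 0 0 0;
         0 1 0 0;
         0 0 1 0];
    [U, S, V] = svd(A)   % here U = eye(3), S = [eye(3), zeros(3,1)], V = eye(4)
    A * V(:, 4)          % the zero vector: V's last column lies in the null space
    null(A)              % MATLAB's basis for null(A), for comparison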
So if I do the singular value decomposition of a matrix, and I look at U, V, and sigma and what they're composed of (where sigma is zero and non-zero, and the corresponding columns of U and V), then I can figure out what vectors span the range and null space of the matrix A.

Here's another example. So here I have A. Now, instead of being three rows by four columns, it's four rows by three columns. And here's the singular value decomposition that comes out of MATLAB. There are no vectors that live in the null space of A: there are no zero columns in sigma, so there are no corresponding columns in V. The range of A is spanned by the columns of U corresponding to the non-zero columns of sigma, so it's these first three columns of U. And these three columns clearly describe the same range as the three columns of A.

So the singular value decomposition gives us direct access to the null space and the range of a matrix. That's handy. And it can be used in various ways.

Here's one example where it can be used. Here I have a fingerprint. It's a bitmap: a square bit of data, like a matrix, and each of the elements of the matrix takes on a value describing how dark or light that pixel is. Let's say it's grayscale, and its values are between 0 and 255. That's pretty typical. So I have this matrix, each element of the matrix corresponds to a pixel, and I do a singular value decomposition.

Some of the singular values, the values of sigma, are bigger than others. They're all positive, but some are bigger than others. The ones that are biggest in magnitude carry the most information content about the matrix. So we can do data compression by neglecting singular values that are smaller than some threshold, and also neglecting the corresponding singular vectors. And that's what I've done here. So here's the original bitmap of the fingerprint.
I did the singular value decomposition, and then I retained only the 50 biggest singular values; I left all the other singular values out. This bitmap was something like, I don't know, 300 pixels by 300 pixels, so there are something like 300 singular values. But I got rid of 5/6 of the information content. I dropped 5/6 of the singular vectors, and then I reconstructed the matrix from the retained singular values and those singular vectors, and you get a faithful representation of the original fingerprint.

So the singular value decomposition says something about the information content in the transformation that is the matrix. There are some transformations that are of lower power or importance than others, and the magnitude of these singular values tells you which they are. Does that make sense?

How else can it be used? Well, one way it can be used is finding the least-squares solution to the equation Ax equals b, where A is no longer a square matrix. You've done this in other contexts before, where the equations are overspecified: we have more equations than unknowns, like data fitting. You form the normal equations: you multiply both sides of Ax equals b by A transpose, and then invert A transpose A. You might not be too surprised, then, to think that the singular value decomposition could be useful here too, since we already saw that the data in a singular value decomposition correspond to eigenvectors and eigenvalues of this A transpose A. But there's a way to use this sort of decomposition formally to solve problems that are both overspecified and underspecified.

Least squares means: find the vector of solutions x that minimizes this function phi, where phi is the length of the vector given by the difference between Ax and b. It's one measure of how far in error our solution x is. So let's find the value x which is least in error. This is one definition of least squares. And I know the singular value decomposition of A.
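As an aside before the derivation: in MATLAB, the fingerprint compression experiment described above looks something like this sketch. The filename is a placeholder, and the image is assumed to be grayscale; neither detail comes from the lecture.

    % Rank-50 compression of a grayscale image via the SVD.
    X = double(imread('fingerprint.png'));    % placeholder file, values 0 to 255
    [U, S, V] = svd(X);
    k = 50;                                   % keep the 50 largest singular values
    Xk = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % rank-k reconstruction of X
    imagesc(Xk); colormap(gray); axis image   % compare with imagesc(X)

With that aside done, back to least squares.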
So A is U sigma V dagger, and I have U sigma V dagger times x. I can factor out U, and I've got a factor of U conjugate transpose multiplying b. So Ax minus b is the same as U times the quantity sigma V conjugate transpose x minus U conjugate transpose b.

We want to know the x that minimizes this phi. It's an optimization problem. We'll talk in great detail about these sorts of problems later; this one is so easy that we can just work it out in a couple lines of text.

We'll define a new set of unknowns y, which is V conjugate transpose times x, and a new right-hand side for the system of equations, p, which is U conjugate transpose times b. And then we can rewrite the function phi that we're trying to minimize: phi becomes the norm of U times the quantity sigma y minus p. U is a unitary matrix. It imparts no stretch in the 2-norm, so sigma y minus p doesn't get elongated by multiplication with U. Its length is the same as the length of sigma y minus p. You can prove this; it's not very difficult to show at all. You use the definition of the 2-norm to prove it.

So phi is minimized by the y's which make this norm smallest, which make it closest to 0. Let r be the number of non-zero singular values, the number of those sigmas which are not equal to 0. That's also the rank of A. Then I can rewrite phi as the sum from i equals 1 to r of sigma i i times y i minus p i, quantity squared (the part of this Euclidean length for which sigma is non-zero), plus the sum from r plus 1 to n of p i squared, the sum over the rest of the values of p, for which the corresponding sigmas are 0.

I want to minimize phi, and the only thing that I can change to minimize it is what? What am I free to pick in this equation in order to make phi as small as possible? Yeah?

AUDIENCE: y.

JAMES W. SWAN: y. So I need to choose the y's that make this phi as small as possible. What value should I choose for the y's? What do you think?
AUDIENCE: [INAUDIBLE]

JAMES W. SWAN: Perfect, right? Choose y i equal to p i over sigma i i. Then all of these terms are 0. I can't make this sum any smaller than that. That fixes the values of y i up to r. I can't do anything about this leftover bit here; there's no choice of y that's going to make this part any smaller. It's just left over, some remainder that we can't minimize away. There isn't an exact solution to this problem, in many cases. But one way this leftover part could be 0 is if r is equal to n. Then there are no leftover terms, and this y i equals p i over sigma i i is the exact solution to the problem.

So this is what you told me: choose y i equal to p i over sigma i i for i between 1 and r. There may also be values of y i with index running from r plus 1 to m, because A was a matrix that mapped from R m to R n, so I have extra values of y that could be specified, potentially. If r is smaller than m, then there are some components of y that I can't specify. My system of equations is somehow underdetermined. I need some external information to tell me what values to pick for those y i. I don't know them; I can't use them. Sometimes people just set those y i equal to 0. That's sort of silly, but that's what's done. It's called the minimum-norm least-squares solution: y has minimum length when you set all these other components to 0. But the truth is, we can't specify those components. We need some external information in order to specify them.

Once we know y, we can find x by going back to our definition of what y is. I multiply that equation by V on both sides, and I'll get x equals V times y. So I can find my least-squares solution to the problem from the singular value decomposition.
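Stitched together, that recipe is only a few lines of MATLAB. This is a sketch: the overdetermined test system is a random, illustrative one, and the rank tolerance is a common but assumed choice. It reproduces what MATLAB's pinv gives:

    % Minimum-norm least squares via the SVD.
    A = randn(5, 3); b = randn(5, 1);          % an overdetermined test system
    [U, S, V] = svd(A);
    s = diag(S);
    r = sum(s > max(size(A)) * eps(max(s)));   % numerical rank: the "non-zero" sigmas
    p = U' * b;
    y = zeros(size(A, 2), 1);                  % unspecifiable components set to 0
    y(1:r) = p(1:r) ./ s(1:r);                 % y_i = p_i / sigma_ii for i = 1..r
    x = V * y;
    norm(x - pinv(A)*b)                        % ~0: agrees with the pseudoinverse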
So I can find the least-squares solution to both overdetermined and underdetermined problems using the singular value decomposition. It inherits all the properties you know from solving the normal equations, from multiplying the entire equation by A transpose and solving for a least-squares solution that way. But that approach is only good for overdetermined systems of equations; this one can work for underdetermined equations as well. And maybe we do have external information that lets us specify those other components somehow. Maybe we do a separate optimization that chooses from all possible solutions, where these y i's are free, and picks the best one subject to some other constraint. Does it make sense?

OK, that's the last decomposition we're going to talk about. It's as expensive to compute the singular value decomposition as it is to solve a system of equations. You might have guessed that it's got an order N cubed flavor to it. It's kind of inescapable that we run up against those computational difficulties, order N cubed computational complexity. And there are many problems of practical interest, particularly solutions of PDEs, for which that's not going to cut it, where you couldn't solve the problem with that sort of scaling in time. You couldn't compute the Gaussian elimination, or the singular value decomposition, or an eigenvalue decomposition. It won't work.

And in those cases, we appeal not to exact solution methods but to approximate solution methods. So instead of trying to get an exact solution, we'll try to formulate one that's good enough. We already know the computer introduces numerical error anyway. Maybe we don't need machine precision in our solution, or something close to machine precision in our solution.
Maybe we're solving an engineering problem, and we're willing to accept relative errors on the order of 10 to the minus 3 or 10 to the minus 5, some specified tolerance that we apply to the problem. In those circumstances, we use iterative methods to solve systems of equations instead of exact methods, elimination methods, or matrix decomposition methods.

These algorithms are all based on iterative refinement of an initial guess. So if we have some system of equations we're trying to solve, Ax equals b, we'll formulate some linear map: x i plus 1 will be some matrix C times x i plus some little vector c, where x i is my last best guess for the solution to this problem, and x i plus 1 is my next best guess. And I'm hoping that as I apply this map more and more times, I'm creeping closer to the exact solution of the original system of equations.

The map will have converged when x i plus 1 approaches x i, when the map isn't making any changes to the vector anymore. And the converged value, which is x equals the quantity identity minus C, inverse, times little c (that's what you get if you replace x i plus 1 with x i up here), will be a solution when it's equivalent to A inverse times b, when it's a solution to the original problem.

So my map may converge. It may not converge to a solution of the problem I like, but if it satisfies this condition, then it has converged to a solution of the problem that I like as well. So it's all about choosing this big C here and this little c here so that the map converges to the solution of the problem I'm after. There are lots of schemes for doing this. Some of them are kind of ad hoc. I'm going to show you one right now. And then, when we do optimization, we'll talk about a more formal way of doing this for which you can guarantee very rapid convergence to a solution.

So here's a system of equations I'd like to solve. It's not a very big one.
It doesn't really make sense to solve this one iteratively, but it's a nice illustration. One way to go about formulating this map is to split the matrix into two parts. So I'll split it into a diagonal part and an off-diagonal part; I haven't changed the problem at all by doing that. Then I'm going to rename this x as x i plus 1, rename this x as x i, and move this matrix-vector product to the other side of the equation. And here's my map. Of course, this matrix multiplying x i plus 1 is just the identity, so it's not useful to write it out explicitly; I can drop it entirely. This is just x i plus 1. So here's my map: take an initial guess, multiply it by this matrix, add the vector (1, 0), and repeat over and over and over again. Hopefully (we don't really know yet) it's going to converge to a solution of the original linear equations.

I didn't make up that method. It's a method called Jacobi iteration. The strategy is to split the matrix A into two parts, a sum of its diagonal elements D and its off-diagonal elements R, and rewrite the original equations as an iterative map: D times x i plus 1 is equal to minus R times x i plus b. Or, x i plus 1 is D inverse times the quantity minus R x i plus b. If the iterations converge, then D plus R times x i has to be equal to b, and we will have found a solution. If it converges, right? If these iterations approach a steady value, if they don't change from iteration to iteration.

The nice thing about the Jacobi method is that it turns the hard problem, the order N cubed problem of computing A inverse b, into a succession of easy problems: D inverse times some vector. How many calculations does it take to compute that D inverse? N, that's right, order N. It's just a diagonal matrix: I invert each of its diagonal elements, and I'm done.
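Here's Jacobi iteration as a MATLAB sketch. The system below is an assumption: a tridiagonal matrix and right-hand side consistent with the exact solution (3/4, 1/2, 1/4) quoted later in the lecture, not necessarily the slide's actual example.

    % Jacobi iteration with the splitting A = D + R.
    A = [2 -1 0; -1 2 -1; 0 -1 2];   % assumed example system
    b = [1; 0; 0];                   % chosen so that the solution is (3/4, 1/2, 1/4)
    D = diag(diag(A));  R = A - D;
    x = zeros(3, 1);                 % initial guess
    for i = 1:25
        x = D \ (b - R*x);           % x_{i+1} = D^{-1}(b - R*x_i), order N per sweep
    end
    norm(A*x - b)                    % small residual once the iteration has converged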
So I went from order N cubed, which was going to be hard, to a succession of order N problems. As long as it doesn't take me order N squared iterations to get to the solution that I want, I'm going to be OK. This is going to be a viable way to solve this problem faster than finding the exact solution.

How do you know that it converges? That's the question. Is this thing actually going to converge, or are these iterations just going to run on and on forever? Well, one way to check whether it will converge or not is to go back up to this equation here and substitute b equals Ax, where x is the exact solution to the problem. You can transform this equation into one that looks like: x i plus 1 minus x equals minus D inverse R times the quantity x i minus x.

And if I take the norm of both sides and apply our norm inequality (the norm of a matrix-vector product is smaller than the product of the norm of the matrix and the norm of the vector), then I get a ratio like this: the absolute error in iteration i plus 1, divided by the absolute error in iteration i, is smaller than the norm of this matrix, D inverse R.

So if I'm converging, then what I expect is that this ratio should be smaller than 1: the error in my next approximation should be smaller than the error in my current approximation. That makes sense? So I would hope that the norm of this matrix is also smaller than 1. If it is, then I'm guaranteed to converge.

So for a particular coefficient matrix, for a system of linear equations I'm trying to solve, I may find that this is true. And then I can apply this method, and I'll converge to a solution. We call this sort of convergence linear. Whatever this number is, it tells me the fraction by which the error is reduced from iteration to iteration. So suppose this is 1/10. Then the absolute error is going to be reduced by a factor of 10 in each iteration.
It's not going to be 1/10 usually. It's going to be something a little bit bigger than that, typically, but that's the idea.

You can show (I would encourage you to try to work this out on your own) that the infinity norm of this product, D inverse R, is equal to this expression. And if I ask that the infinity norm of this product be smaller than 1, that's guaranteed when the diagonal values of the matrix, in absolute value, are bigger than the sum of the absolute values of the off-diagonal values in a particular row. That kind of matrix we call diagonally dominant: the diagonal values are bigger than the sum, in absolute value, of the off-diagonal pieces. So diagonally dominant matrices come up quite often, and linear equations based on those matrices can be solved with reasonable efficiency using the Jacobi method. There are better methods to choose (I'll show you one in a second), but you can guarantee that this is going to converge, and that the solution will be the right solution to the linear equations you were trying to solve.

So if the goal is just to turn hard problems into easier-to-solve problems, then there are other natural ways to want to split a matrix. Maybe you want to split it into a lower triangular part L, which contains the diagonal elements of A, and an upper triangular part U, which has none of the diagonal elements of A. We just split this thing apart. And then we can rewrite our system of equations as an iterative map like this: L times x i plus 1 is minus U times x i plus b. All I have to do is invert L to find my next iterate. And how expensive computationally is it to solve a system of equations which is triangular? This is a process we call forward substitution. It's order--

AUDIENCE: N squared.

JAMES W. SWAN: N squared. So we still beat N cubed. One would hope that it doesn't require too many iterations to do this.
But in principle, we can do this order N squared operation many times. And it'll turn out that this sort of map converges to the solution that we're after. It converges when matrices are either diagonally dominant, as before, or when they're symmetric and positive definite. Positive definite means all the eigenvalues of the matrix are bigger than 0.

So let's try these iterative methods on some equations and see how they converge. Yes?

AUDIENCE: How do you justify ignoring the diagonal elements in that method?

JAMES W. SWAN: So the question was, how do you justify ignoring the diagonal elements in this method. Maybe I was going too fast, or I misspoke. I'm going to split A into a lower triangular matrix L that has all the diagonal elements, and U is the upper part with none of those diagonal elements in it. Does that make sense?

AUDIENCE: Yeah.

JAMES W. SWAN: Thank you for asking that question. I hope that's clear. L holds onto the diagonal pieces, and U takes those away.

So let's try it. On a matrix like this, the exact solution to the system of equations is 3/4, 1/2, and 1/4. All right, we'll try Jacobi. We have to give it some initial guess for the solution. We'll talk about places where you can derive those initial guesses later on in the course, but we have to start the iterative process with some guess at the solution. So here's an initial guess, and we'll apply this map. Here's Gauss-Seidel with the same initial guess, and we'll apply this map. They're both linearly convergent, so the relative error will go down by a fixed factor after each iteration. At iteration one, the relative error in Jacobi will be 38%; in Gauss-Seidel, it'll be 40%. If we carry this all the way down to 10 iterations, the relative error in Jacobi will be 1.7%, and the relative error in Gauss-Seidel 0.08%.
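Here's that comparison as a MATLAB sketch, again assuming the tridiagonal system reconstructed from the quoted exact solution. The error percentages depend on the initial guess, so these numbers should land near, not exactly on, the ones above:

    % Jacobi vs. Gauss-Seidel on the assumed example system.
    A = [2 -1 0; -1 2 -1; 0 -1 2];  b = [1; 0; 0];
    xtrue = [0.75; 0.5; 0.25];
    D  = diag(diag(A));  R  = A - D;       % Jacobi splitting
    Lo = tril(A);        Up = triu(A, 1);  % Gauss-Seidel splitting; Lo keeps the diagonal
    xj = zeros(3, 1);  xg = zeros(3, 1);   % the same initial guess for both
    for i = 1:10
        xj = D  \ (b - R*xj);              % Jacobi update
        xg = Lo \ (b - Up*xg);             % Gauss-Seidel update, a triangular solve
    end
    norm(xj - xtrue) / norm(xtrue)         % relative error after 10 Jacobi sweeps
    norm(xg - xtrue) / norm(xtrue)         % Gauss-Seidel shrinks the error roughly twice as fast here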
786 00:37:11,360 --> 00:37:13,320 And we can go on and on with these iterations 787 00:37:13,320 --> 00:37:16,740 if we want until we get sufficiently converged-- we 788 00:37:16,740 --> 00:37:19,320 get to a point where the relative error is small enough 789 00:37:19,320 --> 00:37:22,590 that we're happy to accept this answer as a solution 790 00:37:22,590 --> 00:37:24,660 to our system of equations. 791 00:37:24,660 --> 00:37:28,450 So we traded the burden of doing all these calculations to do 792 00:37:28,450 --> 00:37:34,080 elimination for a faster, less computationally complex 793 00:37:34,080 --> 00:37:35,250 methodology. 794 00:37:35,250 --> 00:37:38,620 But the trade-off was we don't get an exact solution anymore. 795 00:37:38,620 --> 00:37:40,890 We're going to have finite precision in the result, 796 00:37:40,890 --> 00:37:42,750 and we have to specify the tolerance 797 00:37:42,750 --> 00:37:44,541 that we want to converge to. 798 00:37:44,541 --> 00:37:47,040 We're going to see now-- this is the hook into the next part 799 00:37:47,040 --> 00:37:47,860 of the class-- 800 00:37:47,860 --> 00:37:50,070 we're going to talk about solutions of nonlinear 801 00:37:50,070 --> 00:37:52,470 equations next, and there are 802 00:37:52,470 --> 00:37:55,140 almost no nonlinear equations that we can solve exactly. 803 00:37:55,140 --> 00:37:58,945 They all have to be solved using these iterative methods. 804 00:37:58,945 --> 00:38:01,320 You can use these iterative methods for linear equations. 805 00:38:01,320 --> 00:38:03,079 It's very common to do it this way. 806 00:38:03,079 --> 00:38:04,620 In my group, we solve lots of systems 807 00:38:04,620 --> 00:38:07,860 of linear equations associated with hydrodynamic problems. 808 00:38:07,860 --> 00:38:10,670 These come up when you're talking about, 809 00:38:10,670 --> 00:38:12,510 say, low Reynolds number flows, which 810 00:38:12,510 --> 00:38:15,420 are linear sorts of fluid flow problems. 811 00:38:15,420 --> 00:38:15,990 They're big. 812 00:38:15,990 --> 00:38:18,050 It's really hard to do Gaussian elimination, 813 00:38:18,050 --> 00:38:19,910 so you apply different iterative methods. 814 00:38:19,910 --> 00:38:20,910 You can do Gauss-Seidel. 815 00:38:20,910 --> 00:38:22,120 You can do Jacobi. 816 00:38:22,120 --> 00:38:23,790 We'll learn about more advanced ones 817 00:38:23,790 --> 00:38:26,880 like PCG, which you're applying on your homework now, 818 00:38:26,880 --> 00:38:29,970 and you should be seeing that it converges relatively quickly 819 00:38:29,970 --> 00:38:32,176 in cases where exact elimination doesn't work. 820 00:38:32,176 --> 00:38:34,050 We'll learn, actually, how to do that method. 821 00:38:34,050 --> 00:38:35,758 That's one that we apply in my own group. 822 00:38:35,758 --> 00:38:37,870 It's pretty common to use out there. 823 00:38:37,870 --> 00:38:38,370 Yes? 824 00:38:38,370 --> 00:38:43,440 AUDIENCE: One question, is that that Gauss, [INAUDIBLE] 825 00:38:43,440 --> 00:38:45,930 JAMES W. SWAN: Order N squared. 826 00:38:45,930 --> 00:38:47,708 AUDIENCE: Yeah, that's what I meant. 827 00:38:47,708 --> 00:38:50,148 So now we've got an [INAUDIBLE]. 828 00:38:50,148 --> 00:38:54,407 So we basically have [INAUDIBLE] iterations, right? 829 00:38:54,407 --> 00:38:56,240 JAMES W. SWAN: This is a wonderful question.
830 00:38:56,240 --> 00:39:00,120 So this is a pathological problem in the sense 831 00:39:00,120 --> 00:39:03,300 that it requires a lot of calculations 832 00:39:03,300 --> 00:39:05,550 to get an iterative solution here. 833 00:39:05,550 --> 00:39:07,110 We haven't gotten to an N that's 834 00:39:07,110 --> 00:39:10,290 big enough that the computational complexities 835 00:39:10,290 --> 00:39:11,700 cross over. 836 00:39:11,700 --> 00:39:15,810 So for small Ns, probably the factor in front of N cubed-- 837 00:39:15,810 --> 00:39:17,200 whatever number that is-- 838 00:39:17,200 --> 00:39:19,000 and maybe even the smaller factors, 839 00:39:19,000 --> 00:39:20,940 the order N squared factors on that order 840 00:39:20,940 --> 00:39:22,440 N cubed, play a big role in how long 841 00:39:22,440 --> 00:39:24,660 it takes to actually complete this thing. 842 00:39:24,660 --> 00:39:28,140 But modern problems are so big that we almost always 843 00:39:28,140 --> 00:39:30,480 are running out to Ns that are large enough 844 00:39:30,480 --> 00:39:31,590 that we see a crossover. 845 00:39:31,590 --> 00:39:34,110 You'll see this in your homework this week. 846 00:39:34,110 --> 00:39:36,030 You won't see this crossover at N equals 3. 847 00:39:36,030 --> 00:39:37,613 You're going to see it out at N equals 848 00:39:37,613 --> 00:39:41,336 500 or 1,200, big problems. 849 00:39:41,336 --> 00:39:43,210 Then we're going to encounter this crossover. 850 00:39:43,210 --> 00:39:44,376 That's a wonderful question. 851 00:39:44,376 --> 00:39:49,110 So for small system sizes, iterative methods 852 00:39:49,110 --> 00:39:50,750 maybe don't buy you much. 853 00:39:50,750 --> 00:39:53,000 I suppose it depends on the application though, right? 854 00:39:53,000 --> 00:39:56,430 If you're doing something that involves solving problems 855 00:39:56,430 --> 00:40:01,420 on embedded hardware, in some sort of sensor or control 856 00:40:01,420 --> 00:40:04,230 valve, there may be very limited memory 857 00:40:04,230 --> 00:40:06,290 or computational capacity available to you. 858 00:40:06,290 --> 00:40:08,490 And you may actually apply an iterative method 859 00:40:08,490 --> 00:40:12,120 like this to a problem that that controller 860 00:40:12,120 --> 00:40:15,330 needs to solve, for example. 861 00:40:15,330 --> 00:40:17,370 It just may not have the capability 862 00:40:17,370 --> 00:40:20,880 of storing and inverting what we would consider, today, 863 00:40:20,880 --> 00:40:25,350 a relatively small matrix, because the hardware doesn't 864 00:40:25,350 --> 00:40:26,670 have that sort of capability. 865 00:40:26,670 --> 00:40:28,890 So there could be cases where you 866 00:40:28,890 --> 00:40:32,250 might choose something that's slower but feasible, 867 00:40:32,250 --> 00:40:34,650 versus something that's faster and exact, 868 00:40:34,650 --> 00:40:36,990 because there are other constraints. 869 00:40:36,990 --> 00:40:40,560 They do exist, but modern computers are pretty efficient. 870 00:40:40,560 --> 00:40:43,590 Your cell phone is faster than the fastest computers 871 00:40:43,590 --> 00:40:46,410 in the world 20 years ago. 872 00:40:46,410 --> 00:40:47,550 We're doing OK. 873 00:40:47,550 --> 00:40:49,890 So we've got to get out to big system sizes, big problem 874 00:40:49,890 --> 00:40:52,060 sizes, before this starts to pay off. 875 00:40:52,060 --> 00:40:55,210 But it does for many practical problems.
876 00:40:55,210 --> 00:40:59,410 OK, I'll close with this, because this is the hook into solving 877 00:40:59,410 --> 00:41:00,520 nonlinear equations. 878 00:41:03,537 --> 00:41:05,370 So I showed you these two iterative methods, 879 00:41:05,370 --> 00:41:07,830 and they kind of had stringent requirements 880 00:41:07,830 --> 00:41:10,770 for when they were actually going to converge, right? 881 00:41:10,770 --> 00:41:13,350 I had to have a diagonally dominant system of equations 882 00:41:13,350 --> 00:41:15,460 for Jacobi to converge. 883 00:41:15,460 --> 00:41:18,720 I had to have diagonal dominance or symmetric positive definite 884 00:41:18,720 --> 00:41:19,260 matrices. 885 00:41:19,260 --> 00:41:20,730 These things exist and they come up 886 00:41:20,730 --> 00:41:22,188 in lots of physical problems, but I 887 00:41:22,188 --> 00:41:25,050 had to have it in order for Gauss-Seidel to converge. 888 00:41:25,050 --> 00:41:26,700 What if I have a system of equations 889 00:41:26,700 --> 00:41:28,194 that doesn't work that way? 890 00:41:28,194 --> 00:41:29,610 Or what if I have an iterative map 891 00:41:29,610 --> 00:41:33,900 that I like for some reason, but it doesn't appear to converge? 892 00:41:33,900 --> 00:41:37,270 Maybe it converges under some circumstances, but not others. 893 00:41:37,270 --> 00:41:40,140 Well, there's a way to modify these iterative maps, called 894 00:41:40,140 --> 00:41:43,380 successive over-relaxation, which 895 00:41:43,380 --> 00:41:45,977 can help promote convergence. 896 00:41:45,977 --> 00:41:48,060 So suppose we have an iterative map like this, x i 897 00:41:48,060 --> 00:41:51,897 plus 1 is some function of the previous iteration value. 898 00:41:51,897 --> 00:41:52,980 Doesn't matter what it is. 899 00:41:52,980 --> 00:41:54,646 It could be linear, could be non-linear. 900 00:41:54,646 --> 00:41:57,760 We don't actually care. 901 00:41:57,760 --> 00:42:00,610 The sought-after solution is found when x i plus 1 902 00:42:00,610 --> 00:42:01,630 is equal to x i. 903 00:42:01,630 --> 00:42:03,540 So this map is one that converges 904 00:42:03,540 --> 00:42:06,250 to the exact solution of the problem that we want. 905 00:42:06,250 --> 00:42:08,650 We've somehow guaranteed that that's the case, 906 00:42:08,650 --> 00:42:10,990 but it has to converge. 907 00:42:10,990 --> 00:42:14,890 One way to modify that map is to say x i plus 1 908 00:42:14,890 --> 00:42:17,890 is 1 minus some scalar value omega 909 00:42:17,890 --> 00:42:23,020 times x i plus omega times f. 910 00:42:23,020 --> 00:42:26,350 You can confirm that if you substitute x i plus 1 equals 911 00:42:26,350 --> 00:42:28,900 x i into this equation, you'll come up 912 00:42:28,900 --> 00:42:33,940 with the same fixed points of this iterative map, 913 00:42:33,940 --> 00:42:36,500 x i is equal to f of x i. 914 00:42:36,500 --> 00:42:40,210 So you haven't changed the value it will converge to here, 915 00:42:40,210 --> 00:42:42,460 but you've affected the rate at which it converges. 916 00:42:42,460 --> 00:42:45,640 Here you're saying x i plus 1 is some fraction 917 00:42:45,640 --> 00:42:50,290 of my previous solution plus some fraction of this f. 918 00:42:50,290 --> 00:42:53,890 And I get to control how big those different fractions are.
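[A sketch of that relaxation trick, for any fixed-point map f. Substituting x_(i+1) = x_i = x* into the blended update gives x* = (1 - omega) x* + omega f(x*), which reduces to x* = f(x*), so the fixed point is preserved and only the rate changes. The wrapper below uses an illustrative stopping rule and defaults not prescribed in the lecture, and the last line anticipates the damped Jacobi application discussed next, reusing the hypothetical jacobi_step, A, and b from the earlier sketches.]

import numpy as np

def relaxed_fixed_point(f, x0, omega=0.5, tol=1e-10, max_iter=10000):
    # x_{i+1} = (1 - omega) x_i + omega f(x_i): blend the old iterate with
    # the raw update; smaller omega damps the map more strongly.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = (1.0 - omega) * x + omega * f(x)
        if np.linalg.norm(x_new - x) < tol:   # converged to our tolerance
            return x_new
        x = x_new
    return x                                  # ran out of iterations

# Damped Jacobi, for example: under-relax the raw Jacobi update.
x = relaxed_fixed_point(lambda x: jacobi_step(A, b, x), np.zeros(3), omega=0.8)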
919 00:42:53,890 --> 00:42:58,410 So if things aren't converging well for a map like this, 920 00:42:58,410 --> 00:43:00,700 then I could try successive over-relaxation, 921 00:43:00,700 --> 00:43:03,400 and I could adjust this relaxation parameter 922 00:43:03,400 --> 00:43:07,890 to be some fraction, some number between 0 and 1, 923 00:43:07,890 --> 00:43:10,050 until I start to observe convergence. 924 00:43:10,050 --> 00:43:11,550 And there are some rules one can use 925 00:43:11,550 --> 00:43:13,383 to try to promote convergence with this kind 926 00:43:13,383 --> 00:43:15,204 of successive over-relaxation. 927 00:43:15,204 --> 00:43:17,370 This is a very generic technique that one can apply. 928 00:43:17,370 --> 00:43:20,700 If you have any iterative map you're trying to apply 929 00:43:20,700 --> 00:43:22,840 that should go to the solution you want 930 00:43:22,840 --> 00:43:24,570 but doesn't converge for some reason, 931 00:43:24,570 --> 00:43:27,630 then you can use this relaxation technique to promote 932 00:43:27,630 --> 00:43:29,250 convergence to the solution. 933 00:43:29,250 --> 00:43:31,770 You may slow the convergence way down. 934 00:43:31,770 --> 00:43:34,680 It may be very slow to converge, but it will converge. 935 00:43:34,680 --> 00:43:36,690 And after all, an answer is better 936 00:43:36,690 --> 00:43:38,940 than no answer, no matter how long it takes to get it. 937 00:43:38,940 --> 00:43:40,740 So sometimes you've got to get these things 938 00:43:40,740 --> 00:43:43,670 by hook or by crook. 939 00:43:43,670 --> 00:43:45,710 So for example, you can apply this to Jacobi. 940 00:43:45,710 --> 00:43:48,514 This was the original Jacobi map. 941 00:43:48,514 --> 00:43:49,430 And we just take that. 942 00:43:49,430 --> 00:43:53,640 We add 1 minus omega times x i plus omega 943 00:43:53,640 --> 00:43:55,680 times this factor over here. 944 00:43:55,680 --> 00:43:59,300 And now we can choose omega so that this solution converges. 945 00:43:59,300 --> 00:44:02,360 We always make omega small enough so 946 00:44:02,360 --> 00:44:04,550 that the diagonal values of our matrix 947 00:44:04,550 --> 00:44:07,220 appear big enough that the matrix looks 948 00:44:07,220 --> 00:44:09,770 like it's diagonally dominant. 949 00:44:09,770 --> 00:44:12,230 You could go back to that same convergence analysis 950 00:44:12,230 --> 00:44:14,510 that I showed you before and try to apply it 951 00:44:14,510 --> 00:44:19,252 to this over-relaxation form of Jacobi and see that 952 00:44:19,252 --> 00:44:21,710 there's always going to be some value of omega that's 953 00:44:21,710 --> 00:44:25,420 small enough that this thing will converge. 954 00:44:25,420 --> 00:44:27,920 It will look effectively diagonally dominant, 955 00:44:27,920 --> 00:44:32,150 because omega inverse times D will be big enough, 956 00:44:32,150 --> 00:44:34,865 or omega times D inverse will be small enough. 957 00:44:34,865 --> 00:44:36,619 Does that make sense? 958 00:44:36,619 --> 00:44:39,160 You can apply the same sort of damping method to Gauss-Seidel 959 00:44:39,160 --> 00:44:39,660 as well. 960 00:44:39,660 --> 00:44:42,360 It's very common to do this. 961 00:44:42,360 --> 00:44:45,560 The relaxation parameter acts like an effective increase 962 00:44:45,560 --> 00:44:47,692 in the eigenvalues of the matrix. 963 00:44:47,692 --> 00:44:50,150 So you can think about L. That's a lower triangular matrix. 964 00:44:50,150 --> 00:44:54,310 Its diagonal values are its eigenvalues.
965 00:44:54,310 --> 00:44:56,190 The diagonal values of L inverse-- 966 00:44:56,190 --> 00:44:59,000 well, 1 over those diagonal values 967 00:44:59,000 --> 00:45:00,820 are the eigenvalues of L inverse. 968 00:45:00,820 --> 00:45:03,560 And so if we make omega very small, 969 00:45:03,560 --> 00:45:06,630 then we make the eigenvalues of L inverse very small, 970 00:45:06,630 --> 00:45:08,510 or the eigenvalues of L very big. 971 00:45:08,510 --> 00:45:12,170 And again, the matrix starts to look diagonally dominant. 972 00:45:12,170 --> 00:45:15,330 And you can promote convergence in this way. 973 00:45:15,330 --> 00:45:17,960 So even though this may be slow, you 974 00:45:17,960 --> 00:45:19,640 can use it to guarantee convergence 975 00:45:19,640 --> 00:45:22,299 of some iterative procedures, not just for linear equations, 976 00:45:22,299 --> 00:45:23,840 but for non-linear equations as well. 977 00:45:23,840 --> 00:45:25,370 And we'll see, there are good ways 978 00:45:25,370 --> 00:45:27,230 of choosing omega for certain classes 979 00:45:27,230 --> 00:45:29,010 of non-linear equations. 980 00:45:29,010 --> 00:45:30,440 We'll apply the Newton-Raphson method, 981 00:45:30,440 --> 00:45:33,590 and then we'll damp it using exactly this sort of procedure. 982 00:45:33,590 --> 00:45:37,190 And I'll show you how you can choose 983 00:45:37,190 --> 00:45:41,860 a nearly optimal value for omega to promote convergence 984 00:45:41,860 --> 00:45:43,610 to the solution. 985 00:45:43,610 --> 00:45:46,480 Any questions? 986 00:45:46,480 --> 00:45:50,140 No? Let me address one more thing before you go. 987 00:45:50,140 --> 00:45:53,140 We've scheduled times for the quizzes. 988 00:45:53,140 --> 00:45:55,180 They are going to be in the evenings 989 00:45:55,180 --> 00:45:58,180 on the dates that are specified on the syllabus. 990 00:45:58,180 --> 00:46:01,060 We wanted to do them during the daytime. 991 00:46:01,060 --> 00:46:02,890 It was really difficult to schedule 992 00:46:02,890 --> 00:46:04,970 a room that was big enough for this class, 993 00:46:04,970 --> 00:46:07,990 so they have to be from 7:00 to 9:00 in the gymnasium. 994 00:46:07,990 --> 00:46:09,709 I apologize for that. 995 00:46:09,709 --> 00:46:11,500 We spent several days looking around trying 996 00:46:11,500 --> 00:46:14,050 to find a place where we could put everybody 997 00:46:14,050 --> 00:46:17,680 so you would all get the same experience in the quiz. 998 00:46:17,680 --> 00:46:20,590 I know that the November quiz comes back to back 999 00:46:20,590 --> 00:46:24,130 with the thermodynamics exam as well. 1000 00:46:24,130 --> 00:46:26,030 That's frustrating. 1001 00:46:26,030 --> 00:46:27,850 Thermodynamics is the next day. 1002 00:46:27,850 --> 00:46:29,170 That week is tricky. 1003 00:46:29,170 --> 00:46:33,004 That's AIChE, so most of the faculty have to travel. 1004 00:46:33,004 --> 00:46:34,420 We won't be able to teach, but you 1005 00:46:34,420 --> 00:46:36,350 won't have classes one of those days, 1006 00:46:36,350 --> 00:46:39,910 so you have extra time to study. 1007 00:46:39,910 --> 00:46:41,750 And Columbus Day also falls in that week, 1008 00:46:41,750 --> 00:46:44,590 so there's no way to put three exams in four days 1009 00:46:44,590 --> 00:46:46,830 without having them come right back to back.
1010 00:46:46,830 --> 00:46:48,460 Believe me, we thought about this 1011 00:46:48,460 --> 00:46:51,004 and tried to get things scheduled as efficiently as we 1012 00:46:51,004 --> 00:46:52,420 could for you, but sometimes there 1013 00:46:52,420 --> 00:46:54,420 are constraints that are outside of our control. 1014 00:46:54,420 --> 00:46:55,927 But the quiz times are set. 1015 00:46:55,927 --> 00:46:58,260 There's going to be one in October and one in November. 1016 00:46:58,260 --> 00:47:00,790 They'll be in the evening, and they'll be in the gymnasium. 1017 00:47:00,790 --> 00:47:03,790 I'll give you directions to it before the exam, just 1018 00:47:03,790 --> 00:47:06,280 so you know exactly where to go, OK? 1019 00:47:06,280 --> 00:47:08,210 Thank you, guys.