The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JAMES SWAN: OK. Well, everyone's quieted down, so that means we have to get started. So let me say something here. This will be our last conversation about optimization. So we've discussed unconstrained optimization. And now we're going to discuss a slightly more complicated problem -- but you're going to see it's really not that much more complicated -- constrained optimization.

These are the things we discussed before. I don't want to spend much time recapping because I want to take a minute and talk about the midterm exam. So we have a quiz. It's next Wednesday. Here's where it's going to be located. Here's 66. Head down Ames. You're looking for Walker Memorial on the third floor. Unfortunately, the time for the quiz is 7:00 to 9:00 PM. We really did try hard to get the scheduling office to give us something better, but the only way to get a room that would fit everybody in was to do it at this time in Walker. I really don't understand, because I actually requested locations for the quizzes back in April. And somehow I was too early, maybe, and got buried under a pile. Maybe not important enough, I don't know. But it's got to be from seven to nine next Wednesday. Third floor.

There's not going to be any class next Wednesday because you have a quiz instead. So you get a little extra time to relax or study, prepare, calm yourself before you go into the exam. There's no homework this week. So you can just use this time to focus on the material we've discussed. There's a practice exam from last year posted on the Stellar site, which you can utilize and study from.
I'll tell you this. That practice exam is skewed a little more towards some chemical engineering problems that motivate the numerics. I've found in the past that when problems like that are given on the exam, sometimes there's a lot of reading that goes into understanding the engineering problem. And that tends to set back the problem-solving. So I'll tell you that the quiz that you'll take on Wednesday will have less of the engineering associated with it, and focus more on the numerical or computational science. The underlying sorts of questions, the way the questions are asked, the kinds of responses you're expected to give I'd say are very similar. But we've tried to tune the exam so that it'll be less of a burden to understand the structure of the problem before describing how you'd solve it. So I think that's good.

It's comprehensive up to today. So linear algebra, systems of nonlinear equations, and optimization are the quiz topics. We're going to switch on Friday to ordinary differential equations and initial value problems. So you have two lectures on that, but you won't have done any homework. You probably don't know enough or aren't practiced enough to answer any questions intelligently on the quiz. So don't expect that material to be on there. It's not. It's going to be these three topics.

Are there any questions about this that I can answer? Kristin has a question.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: OK. So yeah, come prepared. It might be cold. It might be hot. It leaks when it rains a little bit. Yeah, it's not the greatest spot. So come prepared. That's true. Other questions? Things you want to know?

AUDIENCE: What can we take to the exam?

JAMES SWAN: Ooh, good question. So you can bring the book recommended for the course. You can bring your notes. You can bring a calculator. You need to bring some pencils.
We'll provide blue books for you to write your solutions to the exam in. So those are the materials. Good. What else? Same question. OK. Other questions? No? OK.

So then let's jump into the topic of the day, which is constrained optimization. So these are problems of the sort: minimize an objective function f of x subject to the constraint that x belongs to some set D, or find the argument x that minimizes this function. These are equivalent sorts of problems. Sometimes we want to know one or the other or both. That's not a problem.

And graphically, it looks like this. Here's f, our objective function. It's a nice convex bowl-shaped function here. And we want to know the values of x1 and x2, let's say, that minimize this function subject to some constraint. That constraint could be that x1 and x2 live inside this little blue circle. That could be D. It could be that x1 and x2 live on the surface of this circle, right, on the circumference of this circle. That could be the constraint.

So these are the sorts of problems we want to solve. D is called the feasible set, and can be described in terms of really two types of constraints. One is what we call equality constraints. So D can be the set of values x such that some nonlinear function c of x is equal to zero. So it's the set of points that satisfy this nonlinear equation. And among those points, we want to know which one produces the minimum in the objective function. Or it could be an inequality constraint. So D could be the set of points such that some nonlinear function h of x is, by convention, positive. So h of x could represent, for example, the interior of a circle, and c of x could represent the circumference of a circle. And we would have nonlinear equations that reflect those values of x that satisfy those sorts of geometries.
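To make the circle example concrete (the radius r here is just an illustrative choice, not something specified in the lecture), the two kinds of feasible sets can be written as

\[
D_{\text{eq}} = \{\, x : c(x) = x_1^2 + x_2^2 - r^2 = 0 \,\} \ \ \text{(the circumference)}, \qquad
D_{\text{ineq}} = \{\, x : h(x) = r^2 - x_1^2 - x_2^2 \ge 0 \,\} \ \ \text{(the disk it encloses)}.
\]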
So equality constrained: points that lie on this circle. Inequality constrained: points that lie within this circle. The shape of the feasible set is constrained by the problem that you're actually interested in. So it's easy for me to draw circles in the plane because that's a shape you're familiar with. But actually, it'll come from some sort of physical constraint on the engineering problem you're looking at, like mole fractions need to be bigger than zero and smaller than one, and temperatures in absolute value have to be bigger than zero and smaller than some value because that's a safety factor on the process. So these set up the constraints on various sorts of optimization problems that we're interested in. It could also be true that we're interested in, say, optimization in the domain outside of this circle, too. It could be on the inside, could be on the outside. That's also an inequality constrained sort of problem.

You know some of these already. They're familiar to you. So here's a classic one from mechanics. Here's the total energy in a system for, say, a pendulum. So x is like the position of the tip of this pendulum and v is the velocity that it moves with. This is the kinetic energy. This is the potential energy. And we know the pendulum will come to rest in a place where the energy is minimized. Well, the energy can only be minimized when the velocity here is zero, because any non-zero velocity will always push the energy content up. So it comes to rest. It doesn't move. And then there's some value of x at which the energy is minimized. If there is no constraint that says that the pendulum is attached to some central axis, then I can always make the energy smaller by making x more and more negative. It just keeps falling. There is no stopping point. But there's a constraint. The distance between the tip of the pendulum and this central point is some fixed distance out.
So this is an equality constrained sort of problem, and we have to choose from the set of v and x the values, subject to this constraint, that minimize the total energy. And that's this configuration of the pendulum here. So you know these sorts of problems already.

We talked about this one, linear sorts of programs. These are optimization problems where the objective function is linear in the design variables. So it's just the dot product between x and some vector c that weights the different design options against each other. So we talked about ice cream. Yes, this is all premium ice cream because it comes in the small containers, subject to different constraints. So those constraints can be things like, oh, x has to be positive because we can't make negative amounts of ice cream. And maybe we've done market research that tells us that the market can only tolerate certain ratios of different types of ice cream. And that may be some set of linear equations that describe that market research, that sort of bound the upper values of how much ice cream we can put out on the market. And then we try to choose the optimal blend of pina colada and strawberry to sell. So those are linear programs. This is an inequality constrained optimization.

In general, we might write these problems like this. We might say minimize f of x subject to the constraint that c of x is 0 and h of x is positive. So minimize it over the values of x that satisfy these two constraints.

There's an old approach that's discussed in the literature. And it's not used. I'm going to describe it to you, and then I want you to try to figure out why it's not used. And it's called the penalty method. And the penalty method works this way. It says define a new objective function, which is our old objective function plus some penalty for violating the constraints. How does that penalty work? So we know that we want values of x for which c of x is equal to 0.
So if we add to our objective function the norm of c of x -- this is a positive quantity whenever x doesn't satisfy the constraint -- this positive quantity will give us a bigger value for this new objective function than if c of x were equal to 0. So we penalize points which don't satisfy the constraint. And in the limit that this penalty factor mu here goes to zero, the penalties get large, so large that our solution will have to prefer satisfying the constraints.

There's another penalty factor over here, which is identical to this one but for the inequality constraint. It says take a Heaviside step function, which is equal to 1 when the value of its argument is positive, and is equal to zero when the value of its argument is negative. So whenever I violate each of my inequality constraints, h_i of x, turn on this Heaviside step function, make it equal to 1, and then multiply it by the value of the constraint squared, a positive number. So this is the inequality constraint penalty, and this is the equality constraint penalty.

People don't use this, though. It makes sense: I take the limit that mu goes to zero. I'm going to have to prefer solutions that satisfy these constraints. Otherwise, if I don't satisfy these constraints, I could always move closer to a solution that satisfies the constraint, and I'll bring down the value of the objective function. I'll make it lower. So I'll always prefer these lower value solutions. But can you guys take a second and sort of talk to each other? See if you can figure out why one doesn't use this method. Why is this method a problem?

OK, I heard the volume go up at some point, which means either you switched topics and felt more comfortable talking about that than this, or maybe you guys were coming to some conclusions, or had some ideas about why this might be a bad idea. Do you want to volunteer some of what you were talking about? Yeah, Hersh.
AUDIENCE: Could it be that [INAUDIBLE]?

JAMES SWAN: Well, that's an interesting idea. So yeah, if we have a non-convex optimization problem, there could be some issues with f of x, and maybe f of x runs away so fast that I can never make the penalty big enough to enforce the constraint. That's actually a really interesting idea. And I like the idea of comparing the magnitude of these two terms. I think that's on the right track. Were there some other ideas about why you might not do this? Different ideas? Yeah.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Well, you know, that's an interesting idea, but actually the two terms in the parentheses here are both positive. So they're only going to be minimized when I satisfy the constraints. So the local minima of the terms in parentheses sit on or within the boundaries of the feasible set that we're looking at. So by construction, actually, we're going to be able to satisfy them, because the local minima of these terms sit on those boundaries. These terms are minimized by satisfying the constraints. Other ideas? Yeah.

AUDIENCE: Do your iterates have to be feasible?

JAMES SWAN: What's that?

AUDIENCE: Your iterates don't have to be feasible?

JAMES SWAN: Ooh, this is a good point. The iterates -- this is an unconstrained optimization problem. I'm just going to minimize this objective function. It's like what Hersh said, I can go anywhere I want in the domain. I'm going to minimize this objective function, and then I'm going to try to take the limit as mu goes to zero. The iterates don't have to be feasible. Maybe I can't even evaluate f of x if the iterates aren't feasible. That's an excellent point. That could be an issue. Anything else? Are there some other ideas? Sure.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: I think that's a good point.
AUDIENCE: --boundary from outside without knowing what's inside.

JAMES SWAN: Sure. So you'll see, actually, the right way to do this is to use what are called interior point methods, which live inside the domain. This is an excellent point. There's another issue with this that I think is actually less subtle than some of these ideas, which are all correct, actually. These can be problems with this sort of penalty method. As I take the limit that mu goes to zero, the penalty function becomes large for all points outside the domain. They can become larger than f for those points. And so there are some practical issues about comparing these two terms against each other. I may not have sufficient accuracy, a sufficient number of digits, to accurately add these two terms together. So I may prefer to find some point that lives on the boundary of the domain as mu goes to zero. But I can't guarantee that it was a minimum of f on that domain, or within that feasible set.

So a lot of practical issues suggest this is a bad idea. This is an old idea. People knew this was bad for a long time. It seems natural, though. It seems like a good way to transform from these constrained optimization problems to something we know how to solve, an unconstrained optimization. But actually, it turns out not to be such a great way to do it.

So let's talk about separating out these two different methods from each other, or these two different problems. Let's talk first about equality constraints, and then we'll talk about inequality constraints. So equality constrained optimization problems look like this: minimize f of x subject to c of x equals zero. And let's make it even easier. Rather than having some vector of equality constraints, let's just have a single equation that we have to satisfy for that equality constraint, like the equation for a circle. Solutions have to sit on the circumference of a circle.
So one equation that we have to satisfy. You might ask again, what are the necessary conditions for defining a minimum? That's what we used when we had equality -- or when we had unconstrained optimization. First we had to define what a minimum was, and we found that minima were critical points, places where the gradient of the objective function was zero. That doesn't have to be true anymore. Now the minimum has to live on this boundary of some domain. It has to live in this set of points c of x equals zero. And the gradient of f is not necessarily zero at that minimal point.

But you might guess that Taylor expansions are the way to figure out what the appropriate conditions for a minimum are. So let's take f of x, and let's expand it, do a Taylor expansion in some direction, d. So we'll take a step away from x, which is small, in some direction, d. So f of x plus d is f of x plus g dot d, the dot product between the gradient of f and d. And at a minimum, either the gradient is zero or the gradient is perpendicular to this direction we moved in, d. We know that because this term is going to increase -- well, will change the value of f of x. It will either make it bigger or smaller depending on whether it's positive or negative. In either case, it will say that this point x can't be a minimum unless this term is exactly equal to zero in the limit that d becomes small. So either the gradient is zero or the gradient is orthogonal to this direction d we stepped in. And d was arbitrary. We just said take a step in a direction, d.

Let's take our equality constraint and do the same sort of Taylor expansion, because we know if we're searching for a minimum along this curve, c of x better be equal to zero. It better satisfy the constraint. And also, c of x plus d, that little step in the direction d, should also satisfy the constraint. We want to study only the feasible set of values. So actually, d wasn't arbitrary.
d had to satisfy this constraint that, when I took this little step, c of x plus d had to be equal to zero. So again, we'll take now a Taylor expansion of c of x plus d, which is c of x plus grad of c of x dotted with d. And that implies that d must be perpendicular to the gradient of c of x, because c of x plus d has to be zero and c of x has to be zero. So the gradient of c of x dot d -- at leading order -- has also got to be equal to zero. So d and the gradient of c are perpendicular, and d and the gradient g have to be perpendicular at a minimum. That's going to define the minimum on this equality constrained set.

Does that make sense? c of x satisfies the constraint, c of x plus d satisfies the constraint. If this is true, d has to be perpendicular to the gradient of c, and g has to be perpendicular to d. d is, in some sense, arbitrary still. d has to satisfy the condition that it's perpendicular to the gradient of c, but who knows, there could be lots of vectors that are perpendicular to the gradient of c.

So the only generic relationship between these two we can formulate is that g must be parallel to the gradient of c. g is perpendicular to d, the gradient of c is perpendicular to d. In the most generic way, g and the gradient of c should be parallel to each other, because d I can select arbitrarily from all the vectors of the same dimension as x.

If g is parallel to the gradient of c, then I can write that g minus some scalar multiplied by the gradient of c is equal to zero. That's an equivalent statement, that g is parallel to the gradient of c. So that's a condition associated with points x that solve this equality constrained problem. The other condition is that the point x still has to satisfy the equality constraint. But I introduced a new unknown, this lambda, which is called the Lagrange multiplier. So now I have one extra unknown, but I have one extra equation.
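Written out, the two conditions just derived for a single equality constraint are (using g for the gradient of f, as in the lecture)

\[
\nabla f(x) - \lambda\,\nabla c(x) = 0, \qquad c(x) = 0,
\]

a system of n + 1 equations in the n components of x plus the one Lagrange multiplier lambda.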
Let me give you a graphical depiction of this, and then I'll write down the formal equations again. So let's suppose we want to minimize this parabolic function subject to the constraint that the solution lives on the line. So here are the contours of the function, and the solution has to live on this line. So I get to stand on this line, and I get to walk and walk and walk until I can't walk downhill anymore, and I've got to turn and walk uphill again. And you can see the point where I can't walk downhill anymore is the place where this constraint is parallel to the contour, or where the gradient of the objective function is parallel to the gradient of the constraint. So you can actually find this point by imagining yourself moving along this landscape. After I get to this point, I start going uphill again.

So that's the method of Lagrange multipliers. Minimize f of x subject to this constraint. The solution is given by the point x at which the gradient is parallel to the gradient of c, and at which c is equal to zero. And you solve this system of nonlinear equations for two unknowns. One is x, and the other is this unknown lambda: how far stretched is the gradient of f relative to the gradient of c?

So again, we've turned the minimization problem into a system of nonlinear equations. In order to satisfy the equality constraint, we've had to introduce another unknown, the Lagrange multiplier. It turns out this solution set, x and lambda, is a critical point of something called the Lagrangian. It's a function: f of x minus lambda times c. It's a critical point in x and lambda of this nonlinear function called the Lagrangian. It's not a minimum of this function, unfortunately. It's a saddle point of the Lagrangian, it turns out. So we're trying to find a saddle point of the Lagrangian. Does this make sense? Yes? OK. We've got to be careful, of course.
Just like with unconstrained optimization, we actually have to check that our solution is a minimum. We can't take for granted, we can't suppose, that our nonlinear solver found a minimum when it solved this equation. Other critical points can satisfy this equation, too. So we've got to go back and try to check robustly whether it's actually a minimum. But this is the method: introduce an additional unknown, the Lagrange multiplier, because you can show geometrically that the gradient of the objective function should be parallel to the gradient of the constraint at the minimum. Does that make sense? Does this picture make sense? OK. So you know how to solve systems of nonlinear equations; you know how to solve constrained optimization problems.

So here's f. Here's c. We can actually write out what these equations are. So you can show that the gradient of f minus lambda gradient of c, that's a vector, 2x1 minus lambda and 20x2 plus lambda. And c is the equation for this line down here, so x1 minus x2 minus 3. And that's all got to be equal to zero. In this case, this is just a system of linear equations. So you can actually solve directly for x1, x2, and lambda. And it's not too difficult to find the solution for all three of these things by hand. But in general, these constraints can be nonlinear. The objective function doesn't have to be quadratic. Those are the easiest cases to look at. And the same methodology applies. And so you should check that you're able to do this. This is the simplest possible equality constraint problem. You could do it by hand. You should check that you're actually able to do it, that you understand the steps that go into writing out these equations.

Let's just take one step forward and look at a more general case, one in which we have a vector-valued function that gives the equality constraints instead.
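Before moving on to multiple constraints, here is a quick check of the example above. The objective isn't written out in the transcript; its gradient (2x1, 20x2) is consistent with f(x) = x1^2 + 10*x2^2, so take that as an assumption. A minimal NumPy sketch of the resulting 3-by-3 linear system:

    import numpy as np

    # Stationarity: 2*x1 - lam = 0 and 20*x2 + lam = 0, plus the constraint
    # x1 - x2 - 3 = 0, assembled as A z = b with unknowns z = (x1, x2, lam).
    A = np.array([[2.0,  0.0, -1.0],
                  [0.0, 20.0,  1.0],
                  [1.0, -1.0,  0.0]])
    b = np.array([0.0, 0.0, 3.0])
    x1, x2, lam = np.linalg.solve(A, b)
    print(x1, x2, lam)  # about 2.727, -0.273, 5.455

Substituting back, both gradient conditions and the line constraint are satisfied, which is the by-hand check the lecture recommends.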
So rather than one equation we have to satisfy, there may be many. It's possible that the feasible set doesn't have any solutions in it. It's possible that there is no x that satisfies all of these constraints simultaneously. That's a bad problem to have. You wouldn't like to have that problem very much. But it's possible that that's the case. But let's assume that there are solutions for the time being. So there are x's that satisfy the equality constraint.

Let's see if we can figure out again what the necessary conditions for defining a minimum are. So same as before, let's Taylor expand f of x, going in some direction, d. And let's make d a nice small step so we can just treat f of x plus d as a linearized function. So we can see again that g has to be perpendicular to this direction, d, if we're going to have a minimum. Otherwise, I could step in some direction, d, and I'll find either a smaller value of f of x plus d or a bigger value of f of x plus d. So g has to be perpendicular to d.

And for the equality constraints, again, they all have to satisfy this equality constraint up there. So c of x has to be equal to zero, and c of x plus d also has to be equal to zero. And so if we take a Taylor expansion of c of x plus d about x, you'll get c of x plus d equals c of x plus the Jacobian of c, all the partial derivatives of c with respect to x, multiplied by d. We know that c of x plus d is zero, and c of x is zero, so the directions, d, belong to what set of vectors? The null space. So these directions have to live in the null space of the Jacobian of c. So I can't step in any direction; I have to step in directions that are in the null space of this Jacobian.

g is perpendicular to d, as well. And d belongs to the null space of the Jacobian. In fact, you know that d is perpendicular to each of the rows of the Jacobian. Right? You know that?
I just do the matrix-vector product, right? And so each element of this matrix-vector product is the dot product of d with a different row of the Jacobian. So those rows are a set of vectors. Those rows describe the range of J transpose, or the row space of J. Remember we talked about the four fundamental subspaces, and I said we almost never use those other ones, but this is one time when we will. So those rows belong to the range of J transpose, which is the row space of J.

I need to find a g, a gradient, which is always perpendicular to d. And I know d is always perpendicular to the rows of J. So I can write g as a linear superposition of the rows of J. As long as g is a linear superposition of the rows, it'll always be perpendicular to d. Vectors from the null space of a matrix are orthogonal to vectors from the row space of that matrix, it turns out. And they're orthogonal for this reason.

So it tells us, if J d is zero, then d belongs to the null space. g is perpendicular to d. That means I could write g as a linear superposition of the rows of J. So g belongs to the range of J transpose, or it belongs to the row space of J. Those are equivalent statements. And therefore, I should be able to write g as a linear superposition of the rows of J. And one way to say that is I should be able to write g as J transpose times some other vector, lambda. That's an equivalent way of saying that g is a linear superposition of the rows of J. I don't know the values of lambda.

So I introduced a new set of unknowns, a set of Lagrange multipliers. My minimum is going to be found when I satisfy this equation, just like before, and when I'm able to satisfy all of the equality constraints. How many Lagrange multipliers do I have here? Can you figure that out? You can talk with your neighbors if you want. Take a couple minutes. Tell me how many Lagrange multipliers, how many elements are in this vector lambda.
How many elements are in lambda? Can you tell me? Sam.

AUDIENCE: Same as the number of equality constraints.

JAMES SWAN: Yes. It's the same as the number of equality constraints. J came from the gradient of c. It's the Jacobian of c. So it has a number of columns equal to the number of elements in x, because I'm taking partial derivatives with respect to each element of x, and it has a number of rows equal to the number of elements of c. So for J transpose, I just transpose those dimensions. And lambda must have the same number of elements as c does in order to make this product make sense.

So I introduce a number of new unknowns equal to exactly the number of equality constraints that I had, which is good, because I'm going to make a system of equations that says g of x minus J transpose lambda equals 0 and c of x equals 0. And the number of equations here is the number of elements in x for this gradient, and the number of elements in c for c. And the number of unknowns is the number of elements in x, and the number of elements in c associated with the Lagrange multiplier. So I have enough equations and unknowns to determine all of these things.

So whether I have one equality constraint or a million equality constraints, the problem is identical. We use the method of Lagrange multipliers. We have to solve an augmented system of equations for x and this projection on the row space of J, which tells us how the gradient is stretched or made up, composed of elements of the row space of J. These are the conditions associated with a minimum in our objective function on this boundary dictated by the equality constraint. And of course, the solution set is a critical point of a Lagrangian, which is f of x minus c dot lambda. And it's not a minimum of it, it's a critical point. It's a saddle point, it turns out, of this Lagrangian. So we've got to check, did we find a saddle point or not, when we find a solution to this equation here.
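In symbols, with m equality constraints and J the m-by-n Jacobian of c, the conditions just described are

\[
\nabla f(x) - J(x)^{\mathsf{T}}\lambda = 0, \qquad c(x) = 0,
\]

which is n + m equations for the n components of x and the m Lagrange multipliers, and whose solutions are critical (saddle) points of the Lagrangian \( \mathcal{L}(x,\lambda) = f(x) - \lambda^{\mathsf{T}} c(x) \).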
But it's just a system of nonlinear equations. If we have some good initial guess, what do we apply? Newton-Raphson, and we converge right towards the solution. If we don't have a good initial guess, we've discussed lots of methods we could employ, like homotopy or continuation, to try to develop good initial guesses for what the solution should be. Are there any questions about this? Good. OK.

So you go to Matlab and you call fmincon, do a minimization problem, and you give it some constraints. Linear constraints, nonlinear constraints, it doesn't matter, actually. The problem is the same for both of them. It's just a little bit easier if I have linear constraints. If this constraining function is a linear function, then the Jacobian I know. It's the coefficient matrix of this linear problem. Now I only have to solve linear equations down here. So the problem is a little bit simpler to solve. So Matlab sort of breaks these apart so it can use different techniques depending on which sort of problem is posed. But the solution method is the same. It does the method of Lagrange multipliers to find the solution. OK?

Inequality constraints. So interior point methods were mentioned. And it turns out this is really the best way to go about solving generic inequality constrained problems. So these are problems of the sort: minimize f of x subject to h of x is positive, or at least not negative. This is some nonlinear inequality that describes some domain and its boundary in which the solution has to live. And what's done is to rewrite it as an unconstrained optimization problem with a barrier that's incorporated. This looks a lot like the penalty method, but it's very different. And I'll explain how. So instead, we want to minimize f of x minus mu times the sum of the log of h, each of these constraints. If h were negative, we'd be taking the log of a negative argument. That's a problem computationally.
So the best we could do is approach the boundary where h is equal to zero. And as h goes to zero, the log goes to minus infinity. So this term tends to blow up, because I've got a minus sign in front of it. So this is sort of like a penalty, but it's a little different, because for the factor in front, I'm actually going to take the limit as mu goes to zero. I'm going to take the limit as this factor gets small, rather than gets big. The log will always get big as I approach the boundary of the domain. It'll blow up. So that's not a problem. But I can take the limit that mu gets smaller and smaller. And this quantity here will have less and less of an impact on the shape of this new objective function as mu gets smaller and smaller. The impact will only be nearest the boundary. Does that make sense? So you take the limit that mu approaches zero. It's got to approach it from the positive side, not the negative side, so everything behaves well. And this is called an interior point method.

So we have to determine the minimum of this new objective function for progressively weaker barriers. So we might start with some value of mu, and we might reduce mu progressively until we get mu down small enough that we think we've converged to a solution. So how do you do that reliably? What's the procedure one uses to solve a problem successively for different parameter values?

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Yeah, it's a homotopy, right? You're just going to change the value of this barrier parameter. And you're going to find a minimum. And if you make a small change in the barrier parameter, that minimum is going to serve as an excellent initial guess for the next value. And so you're just going to take these small steps. And the optimization routine is going to carry you towards the minimum in the limit that mu goes to zero. So you do this with homotopy.
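Writing the reduction factor as theta (notation introduced here, not in the lecture), the sequence of barrier subproblems looks like

\[
\min_x \; \phi_\mu(x) = f(x) - \mu \sum_i \ln h_i(x), \qquad \mu_{k+1} = \theta\,\mu_k, \quad 0 < \theta < 1, \quad \mu_k \to 0^+,
\]

with each subproblem warm-started from the minimizer found at the previous, slightly larger mu.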
808 00:40:58,340 --> 00:41:01,700 Here's an example of this sort of interior point 809 00:41:01,700 --> 00:41:03,200 method, a trivial example. 810 00:41:03,200 --> 00:41:06,180 Minimize x subject to x being positive. 811 00:41:06,180 --> 00:41:09,695 So we know the solution lives where x equals zero. 812 00:41:09,695 --> 00:41:13,040 But let's write this as an unconstrained optimization 813 00:41:13,040 --> 00:41:13,760 using a barrier. 814 00:41:13,760 --> 00:41:18,850 So minimize x minus mu times log x. 815 00:41:18,850 --> 00:41:21,140 Here's x minus mu times log x. 816 00:41:21,140 --> 00:41:26,070 So out here, where x is big, x wins over log x, 817 00:41:26,070 --> 00:41:28,120 so everything starts to look linear. 818 00:41:28,120 --> 00:41:30,470 But as x becomes smaller and smaller, 819 00:41:30,470 --> 00:41:34,010 log x gets very negative, so minus log x gets very positive. 820 00:41:34,010 --> 00:41:36,210 And here's the log creeping back up. 821 00:41:36,210 --> 00:41:38,206 And as I make mu smaller and smaller, 822 00:41:38,206 --> 00:41:39,830 you can see the minimum of this function 823 00:41:39,830 --> 00:41:44,840 is moving closer and closer and closer to zero. 824 00:41:44,840 --> 00:41:47,240 So if I take the limit that mu decreases 825 00:41:47,240 --> 00:41:49,520 from some positive number towards zero, 826 00:41:49,520 --> 00:41:52,410 eventually this minimum is going to converge 827 00:41:52,410 --> 00:41:55,340 to the minimum of the inequality 828 00:41:55,340 --> 00:41:57,580 constrained optimization problem. 829 00:41:57,580 --> 00:41:58,900 Make sense? 830 00:41:58,900 --> 00:42:01,350 OK. 831 00:42:01,350 --> 00:42:01,850 OK. 832 00:42:01,850 --> 00:42:04,740 So we want to do this. 833 00:42:04,740 --> 00:42:07,130 You can use any barrier function you want. 834 00:42:07,130 --> 00:42:09,650 Any thoughts on why a logarithmic barrier is used? 835 00:42:18,710 --> 00:42:19,210 No. 836 00:42:19,210 --> 00:42:21,100 OK, that's OK. 837 00:42:21,100 --> 00:42:23,910 So minus log is going to be convex. 838 00:42:23,910 --> 00:42:26,720 Log isn't convex, but minus log is going to be convex. 839 00:42:26,720 --> 00:42:27,795 So that's good. 840 00:42:27,795 --> 00:42:29,920 If this function's convex, then their combination's 841 00:42:29,920 --> 00:42:32,020 going to be convex, and we'll be OK. 842 00:42:32,020 --> 00:42:34,300 And the gradient of the log is easy to compute. 843 00:42:34,300 --> 00:42:37,690 Grad log h is 1 over h times grad h. 844 00:42:37,690 --> 00:42:40,180 So if I know h and I know grad h, it's easy for me 845 00:42:40,180 --> 00:42:42,130 to compute the gradient of log h. 846 00:42:42,130 --> 00:42:46,000 We know we're going to solve this unconstrained optimization 847 00:42:46,000 --> 00:42:48,690 problem where we need to set the grad of this objective function 848 00:42:48,690 --> 00:42:49,210 equal to zero. 849 00:42:49,210 --> 00:42:50,987 So the calculations are easy. 850 00:42:50,987 --> 00:42:52,320 The log makes it easy like that. 851 00:42:52,320 --> 00:42:55,870 The log is also like the most weakly singular 852 00:42:55,870 --> 00:42:57,520 function available to us. 853 00:42:57,520 --> 00:43:00,070 Out of the whole toolbox of functions we could reach for, 854 00:43:00,070 --> 00:43:02,835 the log has the mildest sort of singularities.
855 00:43:02,835 --> 00:43:04,960 It has singularities at both ends, which is sort of funny, 856 00:43:04,960 --> 00:43:06,520 but they're the mildest sort of singularities 857 00:43:06,520 --> 00:43:08,782 you have to cope with. 858 00:43:08,782 --> 00:43:10,990 So we want to find the minimum of these unconstrained 859 00:43:10,990 --> 00:43:15,310 optimization problems where the gradient of f minus mu times the sum of 1 860 00:43:15,310 --> 00:43:18,400 over h times grad h is equal to zero. 861 00:43:18,400 --> 00:43:20,700 And we just do that for progressively smaller values 862 00:43:20,700 --> 00:43:23,410 of mu, and we'll converge to a solution. 863 00:43:23,410 --> 00:43:25,720 That's the interior point method. 864 00:43:25,720 --> 00:43:31,360 You use homotopy to study a sequence of barrier parameters, 865 00:43:31,360 --> 00:43:36,760 or continuation to study a sequence of barrier parameters. 866 00:43:36,760 --> 00:43:41,580 You stop the homotopy or continuation when what? 867 00:43:41,580 --> 00:43:42,806 How are you going to stop? 868 00:43:47,000 --> 00:43:49,234 I've got to make mu small, right? 869 00:43:49,234 --> 00:43:51,150 I want to go towards the limit mu equals zero. 870 00:43:51,150 --> 00:43:52,860 I can't actually get to mu equals zero, 871 00:43:52,860 --> 00:43:54,450 I've just got to approach it. 872 00:43:54,450 --> 00:43:57,930 So how small do I need to make mu before I quit? 873 00:43:57,930 --> 00:43:59,160 It's an interesting question. 874 00:43:59,160 --> 00:43:59,910 What do you think? 875 00:44:02,274 --> 00:44:03,440 I'll take this answer first. 876 00:44:03,440 --> 00:44:06,512 AUDIENCE: So it doesn't affect the limitation. 877 00:44:06,512 --> 00:44:07,220 JAMES SWAN: Good. 878 00:44:07,220 --> 00:44:09,027 So we might look at the solution and see 879 00:44:09,027 --> 00:44:10,610 whether the solution is becoming less and less 880 00:44:10,610 --> 00:44:12,430 sensitive to the choice of mu. 881 00:44:12,430 --> 00:44:14,252 Did you have another suggestion? 882 00:44:14,252 --> 00:44:15,668 AUDIENCE: [INAUDIBLE]. 883 00:44:17,930 --> 00:44:19,180 JAMES SWAN: Set the tolerance. 884 00:44:19,180 --> 00:44:20,766 Right, OK. 885 00:44:20,766 --> 00:44:23,484 AUDIENCE: [INAUDIBLE]. 886 00:44:23,484 --> 00:44:24,150 JAMES SWAN: Mhm. 887 00:44:24,150 --> 00:44:25,050 Right, right, right, right. 888 00:44:25,050 --> 00:44:25,550 So you-- 889 00:44:25,550 --> 00:44:28,842 AUDIENCE: [INAUDIBLE]. 890 00:44:28,842 --> 00:44:29,550 JAMES SWAN: Good. 891 00:44:29,550 --> 00:44:31,210 So there were two suggestions here. 892 00:44:31,210 --> 00:44:34,100 One is along the lines of a step-norm criterion, 893 00:44:34,100 --> 00:44:36,520 like I check my solution as I change mu, 894 00:44:36,520 --> 00:44:39,700 and I ask when does my solution seem 895 00:44:39,700 --> 00:44:42,700 relatively insensitive to mu. 896 00:44:42,700 --> 00:44:45,370 When the changes in these steps relative to mu 897 00:44:45,370 --> 00:44:48,010 get sufficiently small, I might be 898 00:44:48,010 --> 00:44:49,810 willing to accept these solutions 899 00:44:49,810 --> 00:44:53,230 as reasonable solutions for the constrained optimization. 900 00:44:53,230 --> 00:44:55,180 I can also go back and I can check 901 00:44:55,180 --> 00:44:58,240 a sort of function-norm criterion. 902 00:44:58,240 --> 00:45:00,650 I can take the value of x I found as the minimum, 903 00:45:00,650 --> 00:45:02,920 and I can ask how good a job does 904 00:45:02,920 --> 00:45:08,140 it do satisfying the original equations.
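For reference, the stationarity condition being solved at each value of mu is

\[ \nabla f(x) \;-\; \mu \sum_i \frac{\nabla h_i(x)}{h_i(x)} \;=\; 0, \]

and on the trivial example from before (minimize x subject to x not negative, so h(x) = x) this reads 1 - mu/x = 0, giving the barrier minimum x*(mu) = mu. In that example mu is literally the distance between the barrier solution and the true constrained minimum at x = 0, which is one concrete way to think about how small mu has to be before you quit.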
905 00:45:08,140 --> 00:45:11,180 How far away am I from satisfying the inequality 906 00:45:11,180 --> 00:45:11,680 constraint? 907 00:45:11,680 --> 00:45:14,740 How close am I to actually minimizing the function 908 00:45:14,740 --> 00:45:15,625 within that domain? 909 00:45:20,245 --> 00:45:21,344 OK. 910 00:45:21,344 --> 00:45:22,760 So we're running out of time here. 911 00:45:22,760 --> 00:45:26,300 Let me provide you with an example. 912 00:45:26,300 --> 00:45:27,350 So let's minimize again-- 913 00:45:27,350 --> 00:45:29,808 I always pick this function because it's easy to visualize, 914 00:45:29,808 --> 00:45:32,770 a nice parabolic function that opens upwards. 915 00:45:32,770 --> 00:45:36,340 And let's minimize it subject to the constraint 916 00:45:36,340 --> 00:45:42,620 that h of x1 and x2 is equal to 1 minus-- 917 00:45:42,620 --> 00:45:45,730 well, the equation for a circle of radius 1, essentially. 918 00:45:45,730 --> 00:45:49,240 The interior of that circle. 919 00:45:49,240 --> 00:45:51,450 So here are the contours of the function, 920 00:45:51,450 --> 00:45:53,055 and this red domain is the constraint. 921 00:45:53,055 --> 00:45:54,840 And we want to know the smallest value 922 00:45:54,840 --> 00:45:58,440 of f that lives in this domain. 923 00:45:58,440 --> 00:45:59,440 So here's a Matlab code. 924 00:45:59,440 --> 00:46:00,800 You can try it out. 925 00:46:00,800 --> 00:46:03,960 And I make a function, the objective function, f, 926 00:46:03,960 --> 00:46:06,650 it's x squared plus 10x-- 927 00:46:06,650 --> 00:46:08,900 x1 squared plus 10x2 squared. 928 00:46:08,900 --> 00:46:10,240 Here's the gradient. 929 00:46:10,240 --> 00:46:13,210 Here's the Hessian. 930 00:46:13,210 --> 00:46:15,700 Here, I calculate h. 931 00:46:15,700 --> 00:46:17,080 Here's the gradient of h. 932 00:46:17,080 --> 00:46:19,810 Here's the Hessian of h. 933 00:46:19,810 --> 00:46:22,810 I've got to define a new objective function, phi, 934 00:46:22,810 --> 00:46:26,470 which is f minus mu log h. 935 00:46:26,470 --> 00:46:29,320 This is the gradient of phi and this is the Hessian of phi. 936 00:46:29,320 --> 00:46:30,760 Oh, man, what a mess. 937 00:46:30,760 --> 00:46:33,310 But actually, not such a mess, because the log 938 00:46:33,310 --> 00:46:35,785 makes it really easy to take these derivatives. 939 00:46:35,785 --> 00:46:40,810 So there's just a lot of differential calculus 940 00:46:40,810 --> 00:46:43,960 involved in working this out, but this is the Hessian of phi. 941 00:46:43,960 --> 00:46:46,270 And then I need some initial guess. 942 00:46:46,270 --> 00:46:48,940 So I pick the center of my circle 943 00:46:48,940 --> 00:46:50,737 as an initial guess for the solution. 944 00:46:50,737 --> 00:46:52,570 And I'm going to loop over values of mu that 945 00:46:52,570 --> 00:46:53,707 get progressively smaller. 946 00:46:53,707 --> 00:46:55,290 I'll just go down to 10 to the minus 2 947 00:46:55,290 --> 00:46:57,719 and stop for illustration purposes here. 948 00:46:57,719 --> 00:47:00,010 But really, we should be checking the solution as we go 949 00:47:00,010 --> 00:47:04,360 and deciding what values we want to stop with. 950 00:47:04,360 --> 00:47:06,800 And then this loop here, what's this do? 951 00:47:09,880 --> 00:47:12,222 What's it do? 952 00:47:12,222 --> 00:47:15,030 Can you tell? 953 00:47:15,030 --> 00:47:16,440 AUDIENCE: Is it Newton? 954 00:47:16,440 --> 00:47:16,770 JAMES SWAN: What's that? 955 00:47:16,770 --> 00:47:17,596 AUDIENCE: Newton?
956 00:47:17,596 --> 00:47:19,470 JAMES SWAN: Yeah, it's Newton-Raphson, right? 957 00:47:19,470 --> 00:47:25,050 x is x minus Hessian inverse times grad phi, right? 958 00:47:25,050 --> 00:47:26,462 So I just do Newton-Raphson. 959 00:47:26,462 --> 00:47:28,170 I take my initial guess and I loop around 960 00:47:28,170 --> 00:47:30,630 with Newton-Raphson, and when this loop finishes, 961 00:47:30,630 --> 00:47:32,910 I reduce mu, and it'll just use my previous guess 962 00:47:32,910 --> 00:47:35,580 as the initial guess for the next value of the loop, 963 00:47:35,580 --> 00:47:38,187 until mu is sufficiently small. 964 00:47:38,187 --> 00:47:39,680 OK? 965 00:47:39,680 --> 00:47:41,310 Interior point method. 966 00:47:41,310 --> 00:47:43,370 Here's what that solution path looks like. 967 00:47:43,370 --> 00:47:46,520 So mu started at 1, and the barrier was here. 968 00:47:46,520 --> 00:47:49,370 It was close to the edge of the circle, but not quite on it. 969 00:47:49,370 --> 00:47:51,200 But as I reduced mu further and further 970 00:47:51,200 --> 00:47:53,030 and further, you can see the path, 971 00:47:53,030 --> 00:47:54,530 the solution path, that was followed 972 00:47:54,530 --> 00:47:56,930 works its way closer to the boundary of the circle. 973 00:47:56,930 --> 00:47:59,027 And the minimum is found right here. 974 00:47:59,027 --> 00:48:00,860 So it turns out the minimum of this function 975 00:48:00,860 --> 00:48:02,720 doesn't live in the interior of the domain, it lives 976 00:48:02,720 --> 00:48:04,960 on the boundary of the domain. 977 00:48:04,960 --> 00:48:08,830 Recall that this point should be a point where 978 00:48:08,830 --> 00:48:11,290 the boundary of the domain is parallel 979 00:48:11,290 --> 00:48:15,699 to the contours of the function. So actually we didn't need 980 00:48:15,699 --> 00:48:16,990 the inequality constraint here. 981 00:48:16,990 --> 00:48:18,630 We could have used the equality constraint. 982 00:48:18,630 --> 00:48:20,840 The equality constrained problem has the same solution 983 00:48:20,840 --> 00:48:22,423 as the inequality constrained problem. 984 00:48:22,423 --> 00:48:23,800 And look, that actually happened. 985 00:48:23,800 --> 00:48:25,677 Here are the contours of the function. 986 00:48:25,677 --> 00:48:27,760 The contour of the function runs right along here, 987 00:48:27,760 --> 00:48:29,200 and you can see it looks like it's 988 00:48:29,200 --> 00:48:31,360 going to be tangent to the circle at this point. 989 00:48:31,360 --> 00:48:34,450 So the interior point method actually solved 990 00:48:34,450 --> 00:48:37,450 an equality constrained problem in addition to an inequality 991 00:48:37,450 --> 00:48:40,589 constrained problem, which is-- that's sort of cool that you 992 00:48:40,589 --> 00:48:41,380 can do it that way. 993 00:48:44,010 --> 00:48:46,620 How about if I want to do a combination of equality 994 00:48:46,620 --> 00:48:48,135 and inequality constraints? 995 00:48:48,135 --> 00:48:50,010 Then what do I do? 996 00:48:57,285 --> 00:48:58,255 Yeah. 997 00:48:58,255 --> 00:49:02,150 AUDIENCE: [INAUDIBLE]. 998 00:49:02,150 --> 00:49:03,020 JAMES SWAN: Perfect. 999 00:49:03,020 --> 00:49:07,280 Convert the equality constraint into unknowns, 1000 00:49:07,280 --> 00:49:09,909 Lagrange multipliers, instead. 1001 00:49:09,909 --> 00:49:11,450 And then do the interior point method 1002 00:49:11,450 --> 00:49:13,449 on the Lagrange multiplier problem.
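Concretely, if we write g(x) = 0 for the equality constraints, lambda for their Lagrange multipliers, and J_g for the Jacobian of g (notation introduced here, not the lecture's), the system being driven to zero at each value of mu is

\[ \nabla f(x) \;-\; J_g(x)^{T}\lambda \;-\; \mu \sum_i \frac{\nabla h_i(x)}{h_i(x)} \;=\; 0, \qquad g(x) \;=\; 0, \]

solved for the combined unknowns (x, lambda) with Newton-Raphson, while mu is reduced by continuation exactly as before.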
1003 00:49:13,449 --> 00:49:15,740 Now you've got a combination of equality and inequality 1004 00:49:15,740 --> 00:49:16,460 constraints. 1005 00:49:16,460 --> 00:49:18,885 This is exactly what Matlab does. 1006 00:49:18,885 --> 00:49:20,840 So it converts equality constraints 1007 00:49:20,840 --> 00:49:22,610 into Lagrange multipliers. 1008 00:49:22,610 --> 00:49:24,500 Inequality constraints it actually solves 1009 00:49:24,500 --> 00:49:26,630 using interior point methods. 1010 00:49:26,630 --> 00:49:28,790 Buried in that interior point method 1011 00:49:28,790 --> 00:49:32,450 is some form of Newton-Raphson and steepest descent combined 1012 00:49:32,450 --> 00:49:34,580 together, like the dogleg method we talked about 1013 00:49:34,580 --> 00:49:36,500 for unconstrained problems. 1014 00:49:36,500 --> 00:49:38,120 And it's going to do a continuation. 1015 00:49:38,120 --> 00:49:40,470 As it reduces the values of mu, it'll 1016 00:49:40,470 --> 00:49:43,000 have some heuristic for how it does that. 1017 00:49:43,000 --> 00:49:46,010 It's going to use its previous solutions as initial guesses 1018 00:49:46,010 --> 00:49:47,910 for the next iteration. 1019 00:49:47,910 --> 00:49:49,700 So these are very complicated problems, 1020 00:49:49,700 --> 00:49:52,310 but if you understand how to solve systems of nonlinear 1021 00:49:52,310 --> 00:49:55,010 equations, and you think carefully 1022 00:49:55,010 --> 00:49:57,350 about how to control numerical error in your algorithm, 1023 00:49:57,350 --> 00:49:58,808 you come to a conclusion like this, 1024 00:49:58,808 --> 00:50:04,100 that you can do these sorts of Lagrange multiplier interior 1025 00:50:04,100 --> 00:50:06,710 point methods to solve a wide variety of problems 1026 00:50:06,710 --> 00:50:09,820 with reasonable reliability. 1027 00:50:09,820 --> 00:50:10,625 OK? 1028 00:50:10,625 --> 00:50:12,156 Any more questions? 1029 00:50:15,030 --> 00:50:16,140 No? 1030 00:50:16,140 --> 00:50:17,200 Good. 1031 00:50:17,200 --> 00:50:20,830 Well, thank you, and we'll see you on Friday.
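To make the circle example concrete, here is a minimal MATLAB sketch along the lines of the code described in the lecture. It is a reconstruction, not the original file: the circle's center xc is a hypothetical choice (the actual center wasn't stated, and it has to sit more than a unit from the origin for the minimum to land on the boundary), the mu schedule and iteration caps are illustrative, and a simple step-halving safeguard has been added to keep the Newton iterates strictly inside the circle, which the lecture's plain loop may not have needed for its particular geometry.

% Log-barrier interior point sketch for:
%   minimize f(x) = x1^2 + 10*x2^2
%   subject to h(x) = 1 - (x1-xc1)^2 - (x2-xc2)^2 >= 0
% Reconstruction of the approach described in lecture; xc and the schedules are assumptions.

xc = [1.5; 0];                          % hypothetical circle center (not given in the lecture)

f     = @(x) x(1)^2 + 10*x(2)^2;        % objective
gradf = @(x) [2*x(1); 20*x(2)];
Hf    = [2 0; 0 20];                    % constant Hessian of f

h     = @(x) 1 - (x(1)-xc(1))^2 - (x(2)-xc(2))^2;   % h(x) >= 0 inside the circle
gradh = @(x) [-2*(x(1)-xc(1)); -2*(x(2)-xc(2))];
Hh    = [-2 0; 0 -2];                   % constant Hessian of h

% barrier objective phi = f - mu*log(h), its gradient, and its Hessian
phi     = @(x,mu) f(x) - mu*log(h(x));
gradphi = @(x,mu) gradf(x) - mu*gradh(x)/h(x);
Hphi    = @(x,mu) Hf - mu*Hh/h(x) + mu*(gradh(x)*gradh(x)')/h(x)^2;

x = xc;                                 % start at the circle's center (strictly feasible)
for mu = 10.^(0:-0.25:-2)               % continuation: reduce mu from 1 down to 1e-2
    for k = 1:25                        % Newton-Raphson on gradphi = 0, warm-started
        dx = -Hphi(x,mu)\gradphi(x,mu); % Newton step
        s = 1;                          % halve the step until it stays inside and descends
        while h(x + s*dx) <= 0 || phi(x + s*dx, mu) > phi(x, mu)
            s = s/2;
        end
        x = x + s*dx;
        if norm(gradphi(x,mu)) < 1e-10, break; end
    end
    fprintf('mu = %7.4f   x = (%7.4f, %7.4f)   h(x) = %7.4f\n', mu, x(1), x(2), h(x));
end

% Cross-check against MATLAB's own interior point solver (requires the Optimization Toolbox).
% fmincon expects nonlinear inequalities as c(x) <= 0, so pass c = -h.
nonlcon = @(x) deal(-h(x), []);
xstar = fmincon(f, xc, [], [], [], [], [], [], nonlcon, ...
                optimoptions('fmincon','Algorithm','interior-point','Display','off'));
fprintf('fmincon finds x = (%7.4f, %7.4f)\n', xstar(1), xstar(2));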