1 00:00:00,000 --> 00:00:02,520 The following content is provided under a Creative 2 00:00:02,520 --> 00:00:03,970 Commons license. 3 00:00:03,970 --> 00:00:06,330 Your support will help MIT OpenCourseWare 4 00:00:06,330 --> 00:00:10,660 continue to offer high-quality educational resources for free. 5 00:00:10,660 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,170 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,170 --> 00:00:18,370 at ocw.mit.edu. 8 00:00:21,900 --> 00:00:23,735 RUSS TEDRAKE: OK, so last time we 9 00:00:23,735 --> 00:00:25,110 talked about non-linear dynamics. 10 00:00:25,110 --> 00:00:26,130 We drew phase plots. 11 00:00:26,130 --> 00:00:28,260 We talked about basins of attraction. 12 00:00:28,260 --> 00:00:31,380 We talked about fixed points. 13 00:00:31,380 --> 00:00:34,980 And we hinted at control, and I tried to motivate control 14 00:00:34,980 --> 00:00:39,780 not as some nice matrix manipulation of equations, 15 00:00:39,780 --> 00:00:42,797 but actually by thinking about phase plots and saying, 16 00:00:42,797 --> 00:00:44,880 you're going to move that phase plot a little bit. 17 00:00:44,880 --> 00:00:46,338 You're going to reshape it in order 18 00:00:46,338 --> 00:00:48,390 to bend the system to your will right-- 19 00:00:48,390 --> 00:00:49,380 but just a little bend. 20 00:00:49,380 --> 00:00:51,600 You're only allowed a little building in this class. 21 00:00:51,600 --> 00:00:55,377 OK, so today we're going to make good on that idea, 22 00:00:55,377 --> 00:00:57,960 but we're going to do it on an even simpler system first, just 23 00:00:57,960 --> 00:00:58,830 today. 24 00:00:58,830 --> 00:01:00,997 We're going to do it on the double integrator, which 25 00:01:00,997 --> 00:01:02,750 is q double dot equals u-- 26 00:01:09,710 --> 00:01:14,120 because here I can do everything analytically on the board. 
27 00:01:14,120 --> 00:01:16,910 If you want a physical interpretation of that-- 28 00:01:16,910 --> 00:01:18,320 which I always like-- 29 00:01:18,320 --> 00:01:25,790 you can think of this as a brick of unit mass 30 00:01:25,790 --> 00:01:29,300 on ice, where you provide as a control input 31 00:01:29,300 --> 00:01:31,970 a force, like this. 32 00:01:31,970 --> 00:01:34,670 [INAUDIBLE] force equals u, and there's no friction, 33 00:01:34,670 --> 00:01:35,990 and mass equals 1. 34 00:01:38,840 --> 00:01:43,250 What we're going to try to do with this double integrator is 35 00:01:43,250 --> 00:01:45,920 roughly, we're going to try to drive it to some-- 36 00:01:45,920 --> 00:01:46,650 to the origin. 37 00:01:46,650 --> 00:01:50,243 We're going to try to drive it to zero position-- 38 00:01:53,458 --> 00:01:55,250 I guess that's negative x in this picture-- 39 00:01:55,250 --> 00:01:58,760 and with 0 velocity. 40 00:01:58,760 --> 00:02:03,860 It turns out there's lots of ways to do that. 41 00:02:03,860 --> 00:02:06,830 And the goal here is to make you think about ways 42 00:02:06,830 --> 00:02:09,500 to do that that involve invoking optimality, 43 00:02:09,500 --> 00:02:12,050 because that's going to be our computational crutch 44 00:02:12,050 --> 00:02:14,520 for the rest of the term. 45 00:02:14,520 --> 00:02:15,020 OK. 46 00:02:19,040 --> 00:02:22,220 I've been trying to bring the tools 47 00:02:22,220 --> 00:02:24,240 from the different disciplines all together. 48 00:02:24,240 --> 00:02:28,610 So let me start by doing just a quick pole placement 49 00:02:28,610 --> 00:02:31,400 analysis, for those of you that don't think about poles 50 00:02:31,400 --> 00:02:35,790 and linear systems that much. 
51 00:02:35,790 --> 00:02:39,320 So if I want to write the-- 52 00:02:39,320 --> 00:02:42,710 a state space form of this equation-- again, 53 00:02:42,710 --> 00:02:45,470 I've always tried to use q just to be my coordinates, 54 00:02:45,470 --> 00:02:49,140 and I'll use x to be my state vector. 55 00:02:49,140 --> 00:02:58,667 So a state space form of this is going to use vector x to be, 56 00:02:58,667 --> 00:02:59,750 in this case, q and q dot. 57 00:03:02,930 --> 00:03:06,680 And the dynamics there are the simplest state space 58 00:03:06,680 --> 00:03:13,380 form you're going to see, but a state space linear equation 59 00:03:13,380 --> 00:03:17,780 will have the form x dot equals Ax plus Bu. 60 00:03:17,780 --> 00:03:28,760 In our case, it's going to be the trivial 0, 1; 0, 0; times x 61 00:03:28,760 --> 00:03:35,432 plus 0, 1 times u. 62 00:03:35,432 --> 00:03:37,310 OK, it's not going to get easier than that, 63 00:03:37,310 --> 00:03:38,900 but we're going to use that form, 64 00:03:38,900 --> 00:03:40,108 because that's going to help. 65 00:03:43,250 --> 00:03:48,190 OK, our goal now is to design u. 66 00:03:48,190 --> 00:03:50,692 We want to come up with a control action u-- 67 00:03:50,692 --> 00:03:52,900 which you can think of as being a force on the brick, 68 00:03:52,900 --> 00:03:54,040 let's say-- 69 00:03:54,040 --> 00:03:57,640 which drives the system to 0. 70 00:03:57,640 --> 00:04:05,380 So in general, our goal is to design some feedback law-- 71 00:04:05,380 --> 00:04:09,340 I use pi for my control policies-- 72 00:04:09,340 --> 00:04:10,540 which is a function of x. 73 00:04:15,320 --> 00:04:17,430 Let's start by doing the linear thing. 74 00:04:17,430 --> 00:04:23,480 Let's start with considering [INAUDIBLE] 75 00:04:23,480 --> 00:04:29,735 of the form negative kx, where k is a matrix. 76 00:04:29,735 --> 00:04:31,360 Well, actually, what is k in this case?
77 00:04:34,720 --> 00:04:36,160 AUDIENCE: [INAUDIBLE] 78 00:04:36,160 --> 00:04:37,660 RUSS TEDRAKE: 1 by 2, right? 79 00:04:37,660 --> 00:04:53,020 So it's going to be negative k1, k2 times x, which is my q, q dot-- 80 00:04:53,020 --> 00:04:57,993 equivalent of saying negative k1 q minus k2 q dot. 81 00:04:57,993 --> 00:04:59,410 So many of you will recognize this 82 00:04:59,410 --> 00:05:02,980 as a proportional derivative controller form. 83 00:05:07,060 --> 00:05:09,700 OK, so if I take this u equals negative kx 84 00:05:09,700 --> 00:05:12,910 and I start thinking about what that-- if I change k, 85 00:05:12,910 --> 00:05:15,490 what happens to my control system? 86 00:05:15,490 --> 00:05:18,170 That's easy to do in linear systems. 87 00:05:18,170 --> 00:05:23,290 So if I stick that gain matrix in, then what I get 88 00:05:23,290 --> 00:05:26,560 is a closed loop system, which is x dot equals A minus-- 89 00:05:26,560 --> 00:05:43,910 sorry-- A minus Bk, times x, which is just the system 0, 1; 90 00:05:43,910 --> 00:05:47,321 negative k1, negative k2; times x. 91 00:05:50,040 --> 00:05:53,450 OK, and if you've had a class on differential equations, 92 00:05:53,450 --> 00:05:54,830 you know how to solve that. 93 00:05:58,790 --> 00:06:02,128 The solution uses the eigenvalues of the system. 94 00:06:02,128 --> 00:06:04,295 You can quickly take the eigenvalues of that matrix. 95 00:06:11,754 --> 00:06:23,500 The characteristic equation gives negative k2 plus or minus the square root of k2 squared minus 4k1, over 2, 96 00:06:23,500 --> 00:06:27,490 with eigenvectors-- 97 00:06:27,490 --> 00:06:34,330 v1 is this, v2 is this. 98 00:06:40,220 --> 00:06:45,770 That's just the eigenvalues and eigenvectors of this matrix. 99 00:06:54,100 --> 00:06:56,590 So what are the conditions on the eigenvalues 100 00:06:56,590 --> 00:06:59,344 to make sure the system's stable? 101 00:06:59,344 --> 00:07:00,720 AUDIENCE: [INAUDIBLE] 102 00:07:00,720 --> 00:07:05,610 RUSS TEDRAKE: [INAUDIBLE] both negative.
103 00:07:05,610 --> 00:07:08,113 Potentially, we care about whether the system has any 104 00:07:08,113 --> 00:07:10,530 oscillations or not, which manifest themselves in whether 105 00:07:10,530 --> 00:07:11,340 that's-- 106 00:07:11,340 --> 00:07:12,824 whether the thing's complex-- 107 00:07:12,824 --> 00:07:14,490 [INAUDIBLE] complex eigenvalues. 108 00:07:14,490 --> 00:07:17,910 These are all things you've seen in plenty of classes, 109 00:07:17,910 --> 00:07:19,770 but the only way it's going to be complex 110 00:07:19,770 --> 00:07:22,030 is if the term under the square root goes negative, right? 111 00:07:24,570 --> 00:07:25,320 OK. 112 00:07:25,320 --> 00:07:27,210 So we want a couple of things. 113 00:07:27,210 --> 00:07:29,640 We want both of them to be-- 114 00:07:29,640 --> 00:07:35,460 both of these to be less than 0, which we can get pretty easily. 115 00:07:35,460 --> 00:07:38,910 And we want k2 squared to be bigger than 4k1. 116 00:07:43,170 --> 00:07:46,800 If k2 squared is bigger than 4k1-- 117 00:07:46,800 --> 00:07:48,750 then the system is actually overdamped. 118 00:07:52,590 --> 00:07:55,560 If it equals 4k1, it's critically damped. 119 00:08:01,660 --> 00:08:03,720 And it's underdamped if it's less than 4k1. 120 00:08:11,550 --> 00:08:19,310 For stability, we want lambda 1 and lambda 2 121 00:08:19,310 --> 00:08:20,469 to be less than 0. 122 00:08:29,950 --> 00:08:33,970 OK, so just to connect this to the phase plots we were talking 123 00:08:33,970 --> 00:08:35,380 about yesterday, you've seen-- 124 00:08:35,380 --> 00:08:39,309 you might have seen phase plots first in this context, 125 00:08:39,309 --> 00:08:42,010 in the linear systems context. 126 00:08:42,010 --> 00:08:44,920 The reason to do this eigenvalue decomposition 127 00:08:44,920 --> 00:08:47,920 is that you have these beautiful graphical interpretations 128 00:08:47,920 --> 00:08:50,390 of the solutions to the system. 129 00:08:50,390 --> 00:08:54,670 Let's choose a particular case.
130 00:08:54,670 --> 00:08:56,810 What did I pick? 131 00:08:56,810 --> 00:08:59,788 So let's say k1 is 1. 132 00:08:59,788 --> 00:09:01,330 That means I'm going to want to think 133 00:09:01,330 --> 00:09:05,620 about an overdamped system. 134 00:09:05,620 --> 00:09:09,590 I want k2 to be at least greater than 2. 135 00:09:19,390 --> 00:09:22,780 So I'm going to choose k2 equals 4. 136 00:09:22,780 --> 00:09:29,770 If I do that, then my eigenvalues 137 00:09:29,770 --> 00:09:38,650 work out to be negative 4 plus or minus 16 minus 4 138 00:09:38,650 --> 00:09:44,620 over 2, which is negative 2 plus or minus the square root of 3. 139 00:09:44,620 --> 00:09:48,910 Square root of 3 is about 1.75. 140 00:09:48,910 --> 00:09:55,120 So I get negative 0.25 and negative 3.75 141 00:09:55,120 --> 00:09:57,040 for my eigenvalues. 142 00:10:02,860 --> 00:10:05,770 OK. 143 00:10:05,770 --> 00:10:09,797 And the eigenvectors are going to be just this form. 144 00:10:09,797 --> 00:10:12,130 So what that allows me to do is make my same state space 145 00:10:12,130 --> 00:10:19,530 plots we were making yesterday where now I have q and q dot. 146 00:10:33,850 --> 00:10:45,940 And my first eigenvector is going to be 1, negative 0.25. 147 00:10:45,940 --> 00:10:47,800 I'll use these as quarters. 148 00:10:47,800 --> 00:10:51,520 So I go minus 0.25 [INAUDIBLE] 1 here, 149 00:10:51,520 --> 00:10:54,760 so I get a line that looks like this. 150 00:10:58,600 --> 00:10:59,110 That's v1. 151 00:11:04,690 --> 00:11:10,330 And I get a line that goes almost 1, negative 4-- 152 00:11:10,330 --> 00:11:13,795 so a little bit under that here. 153 00:11:20,140 --> 00:11:21,370 v2 is over here. 154 00:11:25,510 --> 00:11:27,310 OK. 
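As a numerical check of the board arithmetic above, here is a minimal Python sketch that is not part of the lecture. It assumes the closed-loop matrix 0, 1; negative k1, negative k2 from the pole placement discussion, and the function names are hypothetical. The board rounds the square root of 3 to quarters, so the exact eigenvalues differ slightly from negative 0.25 and negative 3.75.

```python
import cmath


def closed_loop_eigenvalues(k1, k2):
    """Eigenvalues of [[0, 1], [-k1, -k2]]: roots of s^2 + k2*s + k1 = 0."""
    root = cmath.sqrt(k2 * k2 - 4.0 * k1)  # complex sqrt also covers the underdamped case
    return (-k2 + root) / 2.0, (-k2 - root) / 2.0


def eigenvector(lam):
    """An eigenvector for eigenvalue lam; the first matrix row gives q_dot = lam * q."""
    return (1.0, lam)


def damping_regime(k1, k2):
    """Classify by the sign of k2^2 - 4*k1, as in the lecture."""
    d = k2 * k2 - 4.0 * k1
    if d > 0:
        return "overdamped"
    if d == 0:
        return "critically damped"
    return "underdamped"


lam_slow, lam_fast = closed_loop_eigenvalues(1.0, 4.0)
print(damping_regime(1.0, 4.0))
print(lam_slow.real, lam_fast.real)  # close to -0.27 and -3.73
print(eigenvector(lam_slow.real), eigenvector(lam_fast.real))
```

Each eigenvector has the form 1, lambda, which matches the lines drawn through 1, negative 0.25 and roughly 1, negative 4 in the phase plot.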
155 00:11:27,310 --> 00:11:29,978 And the eigenvalues on this-- if you've 156 00:11:29,978 --> 00:11:32,020 seen these plots before, we typically [INAUDIBLE] 157 00:11:32,020 --> 00:11:34,000 They're both negative, so we're going 158 00:11:34,000 --> 00:11:36,250 to draw an arrow like this. 159 00:11:36,250 --> 00:11:38,980 Systems-- initial conditions that start on this 160 00:11:38,980 --> 00:11:41,160 will just get smaller. 161 00:11:41,160 --> 00:11:43,660 Initial conditions that start on this will also get smaller, 162 00:11:43,660 --> 00:11:45,660 but they can actually get smaller a lot quicker. 163 00:11:54,400 --> 00:11:56,080 Just to say a few of the subtleties, 164 00:11:56,080 --> 00:11:58,570 I chose an overdamped system so I don't 165 00:11:58,570 --> 00:12:00,730 have repeated eigenvalues. 166 00:12:00,730 --> 00:12:03,220 I chose an overdamped system so I don't have oscillations, 167 00:12:03,220 --> 00:12:04,900 because I can make the same plots. 168 00:12:04,900 --> 00:12:05,970 But for the overdamped system, this 169 00:12:05,970 --> 00:12:07,512 is a great way to think about things. 170 00:12:09,670 --> 00:12:11,800 When I don't have repeated eigenvalues, the-- 171 00:12:14,530 --> 00:12:16,630 any initial condition of the system 172 00:12:16,630 --> 00:12:18,880 can be written as a linear combination. 173 00:12:18,880 --> 00:12:20,680 That means that, when the system doesn't 174 00:12:20,680 --> 00:12:26,530 have repeated eigenvalues, the eigenvectors span the space. 175 00:12:26,530 --> 00:12:29,500 Any initial condition can be written as a linear combination 176 00:12:29,500 --> 00:12:30,580 of the two eigenvectors. 177 00:12:33,250 --> 00:12:37,030 And the dynamics from this point are just 178 00:12:37,030 --> 00:12:41,510 the exponential dynamics on the-- 179 00:12:41,510 --> 00:12:43,930 of the two components. 180 00:12:43,930 --> 00:12:44,880 I don't know.
181 00:12:44,880 --> 00:12:47,130 Tell me if you want me to say that again in a more-- 182 00:12:51,700 --> 00:12:55,600 it's not really the focus, but if you understand that, 183 00:12:55,600 --> 00:12:56,970 you got the whole thing here. 184 00:12:56,970 --> 00:13:01,360 So what it means is you could take any initial condition-- it 185 00:13:01,360 --> 00:13:04,570 turns out, any initial condition when I have eigen vectors which 186 00:13:04,570 --> 00:13:06,070 span the space. 187 00:13:06,070 --> 00:13:09,460 I'm guaranteed to have that, if I have unique eigenvalues. 188 00:13:12,070 --> 00:13:15,700 Then I can write this as a combination of these vectors. 189 00:13:20,840 --> 00:13:24,140 I've got one component like this and one component like this. 190 00:13:24,140 --> 00:13:26,170 This initial condition-- if I say this is x0-- 191 00:13:30,970 --> 00:13:36,610 so I can say x0 is alpha 1 times v1 plus alpha 2 times v2. 192 00:13:41,110 --> 00:13:47,950 We know that initial conditions that are just on the line v1 193 00:13:47,950 --> 00:13:52,420 go e to the negative lambda-- 194 00:13:52,420 --> 00:13:58,270 or e to the lambda t v1, so the whole system 195 00:13:58,270 --> 00:14:02,680 goes alpha 1 e to the negative lamb-- oh, sorry. 196 00:14:02,680 --> 00:14:07,690 I've got negative eigenvalues-- to the lambda t v1 plus alpha 2 197 00:14:07,690 --> 00:14:12,340 e to the lambda 2 t v2. 198 00:14:15,460 --> 00:14:21,110 That's the great thing about all these linear systems. 199 00:14:21,110 --> 00:14:24,280 So what that means is, if I've drawn the eigenvectors, 200 00:14:24,280 --> 00:14:28,030 then I know exactly the entire phase plot of the system. 201 00:14:28,030 --> 00:14:30,550 So we're connecting back to the pendulum. 202 00:14:30,550 --> 00:14:33,190 I went through in all the different places. 203 00:14:33,190 --> 00:14:35,080 I thought about what the contributions 204 00:14:35,080 --> 00:14:36,452 were and mapped it out. 
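The decomposition just described, x0 equals alpha 1 v1 plus alpha 2 v2 with each component decaying as e to the lambda t, can be checked numerically. The sketch below is an illustration rather than lecture material: it assumes the lecture's gains k1 equals 1 and k2 equals 4, eigenvectors of the form 1, lambda, and a hand-written fixed-step RK4 integrator as the reference. All function names are hypothetical.

```python
import math

K1, K2 = 1.0, 4.0  # the overdamped gains chosen in the lecture


def modal_solution(q0, qd0, t):
    """x(t) = a1*exp(lam1*t)*v1 + a2*exp(lam2*t)*v2 with v_i = (1, lam_i)."""
    root = math.sqrt(K2 * K2 - 4.0 * K1)  # real because the system is overdamped
    lam1, lam2 = (-K2 + root) / 2.0, (-K2 - root) / 2.0
    # Solve x0 = a1*v1 + a2*v2 for the two coefficients.
    a1 = (qd0 - lam2 * q0) / (lam1 - lam2)
    a2 = q0 - a1
    e1, e2 = math.exp(lam1 * t), math.exp(lam2 * t)
    return a1 * e1 + a2 * e2, a1 * lam1 * e1 + a2 * lam2 * e2


def deriv(q, qd):
    """Closed-loop dynamics: q_ddot = -K1*q - K2*q_dot."""
    return qd, -K1 * q - K2 * qd


def rk4_solution(q, qd, t_end, dt=1e-3):
    """Integrate the closed-loop dynamics with fixed-step RK4."""
    for _ in range(int(round(t_end / dt))):
        a1, b1 = deriv(q, qd)
        a2, b2 = deriv(q + 0.5 * dt * a1, qd + 0.5 * dt * b1)
        a3, b3 = deriv(q + 0.5 * dt * a2, qd + 0.5 * dt * b2)
        a4, b4 = deriv(q + dt * a3, qd + dt * b3)
        q += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        qd += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return q, qd


# The two results should agree to many digits.
print(modal_solution(1.0, 2.0, 1.5))
print(rk4_solution(1.0, 2.0, 1.5))
```

The closed form needs distinct eigenvalues (lam1 minus lam2 appears in a denominator), which is the same condition the lecture gives for the eigenvectors to span the space.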
205 00:14:36,452 --> 00:14:37,660 Here I don't have to do that. 206 00:14:37,660 --> 00:14:41,380 I know that this system is going to-- the component-- 207 00:14:41,380 --> 00:14:44,260 the space that has-- in this eigenvalue 208 00:14:44,260 --> 00:14:46,425 is going to decay quickly towards 0. 209 00:14:46,425 --> 00:14:48,550 And it's going to decay faster than this one, which 210 00:14:48,550 --> 00:14:52,240 means an initial condition like this is going to go like this. 211 00:14:52,240 --> 00:14:55,360 I should have used one of my many other colors to do that. 212 00:14:59,080 --> 00:15:01,270 Looks like a blue-- 213 00:15:01,270 --> 00:15:05,860 so trajectories from that initial condition 214 00:15:05,860 --> 00:15:08,710 are going to do that. 215 00:15:08,710 --> 00:15:13,210 And trajectories from this initial condition 216 00:15:13,210 --> 00:15:15,515 are going to-- what are they going to do, 217 00:15:15,515 --> 00:15:16,390 if I start over here? 218 00:15:19,810 --> 00:15:22,120 They're going to go mostly down, and eventually we 219 00:15:22,120 --> 00:15:24,580 think it's stable, so it's going to get to the origin. 220 00:15:24,580 --> 00:15:29,777 I can even do my filled in circle there. 221 00:15:29,777 --> 00:15:32,110 When are they going to start bending towards the origin? 222 00:15:35,098 --> 00:15:36,100 AUDIENCE: [INAUDIBLE] 223 00:15:36,100 --> 00:15:40,900 RUSS TEDRAKE: Later-- they have to pass this before-- 224 00:15:40,900 --> 00:15:43,240 they have to pass that and get the negative velocities 225 00:15:43,240 --> 00:15:45,370 before they can hook back in. 226 00:15:45,370 --> 00:15:51,400 So the trajectories look like this, and go in. 227 00:15:51,400 --> 00:15:53,920 Yeah? 
228 00:15:53,920 --> 00:15:57,310 And likewise, so all these trajectories-- 229 00:15:57,310 --> 00:16:05,140 you can map out the entire phase portrait of the space 230 00:16:05,140 --> 00:16:09,610 pretty quickly just by understanding the eigenvalues 231 00:16:09,610 --> 00:16:10,745 and eigenvectors. 232 00:16:14,553 --> 00:16:15,720 Same thing's true over here. 233 00:16:24,566 --> 00:16:27,500 OK, so there's another example of a phase plot that we have. 234 00:16:27,500 --> 00:16:29,840 In the linear systems, it works out to be clean. 235 00:16:29,840 --> 00:16:32,600 You can just do these eigenvalues, eigenvectors. 236 00:16:32,600 --> 00:16:40,340 OK, now, control-- we're allowed to change k1 and k2. 237 00:16:40,340 --> 00:16:45,287 Changing k1 and k2 is going to change the phase portrait. 238 00:16:45,287 --> 00:16:46,745 It's going to change those vectors. 239 00:16:49,880 --> 00:16:55,148 I want to change them to make it do whatever I want. 240 00:16:55,148 --> 00:16:57,440 The first discussion is, what do we want to make it do? 241 00:17:03,200 --> 00:17:05,750 Maybe even before that, I should observe that, 242 00:17:05,750 --> 00:17:10,880 without thinking about optimality at all, 243 00:17:10,880 --> 00:17:15,349 it would be easy to stop here, because I-- 244 00:17:15,349 --> 00:17:20,480 if I look at this carefully, as long as I choose k squared-- 245 00:17:20,480 --> 00:17:22,880 k2 squared greater than 4k1, 246 00:17:22,880 --> 00:17:26,420 I know I'm going to not oscillate. 247 00:17:26,420 --> 00:17:29,630 And I can just start driving k1-- 248 00:17:29,630 --> 00:17:34,700 and correspondingly, k2-- up as high as I like, 249 00:17:34,700 --> 00:17:37,310 and make the system get to the origin as fast as I want, 250 00:17:37,310 --> 00:17:38,360 and it won't oscillate. 251 00:17:41,330 --> 00:17:45,490 Why not just drive them all the way to infinity?
252 00:17:45,490 --> 00:17:47,450 AUDIENCE: [INAUDIBLE] 253 00:17:47,450 --> 00:17:48,740 RUSS TEDRAKE: You can't-- you don't have a motor that can do that. 254 00:17:48,740 --> 00:17:50,930 That's the first unsatisfying thing-- 255 00:17:50,930 --> 00:17:52,257 absolutely. 256 00:17:52,257 --> 00:17:53,840 Probably there's some unmodeled thing. 257 00:17:53,840 --> 00:17:55,290 Even if I did have a motor that could do that, 258 00:17:55,290 --> 00:17:56,990 there's probably some unmodeled things 259 00:17:56,990 --> 00:18:00,310 that I might excite, and cause bad things to happen anyways. 260 00:18:00,310 --> 00:18:01,190 AUDIENCE: [INAUDIBLE] 261 00:18:01,190 --> 00:18:02,480 RUSS TEDRAKE: You could melt the ice and it'll break. 262 00:18:02,480 --> 00:18:03,860 That's right. 263 00:18:03,860 --> 00:18:04,460 That's right. 264 00:18:04,460 --> 00:18:05,835 I guess I could have said wheels, 265 00:18:05,835 --> 00:18:10,100 and then maybe they'd melt the tires. 266 00:18:10,100 --> 00:18:12,500 OK. 267 00:18:12,500 --> 00:18:14,060 And you can see that here, actually. 268 00:18:14,060 --> 00:18:14,560 Remember? 269 00:18:17,390 --> 00:18:21,290 What does the unactuated phase plot of the system look like? 270 00:18:21,290 --> 00:18:23,450 I can just draw that. 271 00:18:23,450 --> 00:18:27,230 If u was just uniformly 0, if k1 and k2 were 0, 272 00:18:27,230 --> 00:18:31,270 what would the phase plot have looked like there? 273 00:18:31,270 --> 00:18:35,808 AUDIENCE: [INAUDIBLE] 274 00:18:35,808 --> 00:18:37,350 RUSS TEDRAKE: It would have just been 275 00:18:37,350 --> 00:18:42,090 x dot equals Ax, where A is this guy. 276 00:18:46,410 --> 00:18:49,350 So it wouldn't have been as interesting. 277 00:18:49,350 --> 00:18:54,140 Every point would have just had a vector like this.
278 00:18:54,140 --> 00:18:57,960 It would have been a little bigger with bigger velocities, 279 00:18:57,960 --> 00:18:58,860 but it's just-- 280 00:18:58,860 --> 00:19:03,180 it would just be a flow like that, 281 00:19:03,180 --> 00:19:06,030 which I hope is what you'd expect it to do, 282 00:19:06,030 --> 00:19:08,040 since it's an integrator. 283 00:19:08,040 --> 00:19:09,450 Things are just going to-- 284 00:19:09,450 --> 00:19:13,200 off into the ether. 285 00:19:13,200 --> 00:19:15,540 OK. 286 00:19:15,540 --> 00:19:16,380 So if I consider-- 287 00:19:16,380 --> 00:19:19,680 I started with this, and I'm getting out things that 288 00:19:19,680 --> 00:19:22,320 look like this, I'm already-- 289 00:19:22,320 --> 00:19:26,160 in my unitless cartoon here, it's sort of already 290 00:19:26,160 --> 00:19:28,710 looks like I'm using a lot of torque to do what I'm doing. 291 00:19:28,710 --> 00:19:31,440 I'm using a lot of force. 292 00:19:31,440 --> 00:19:33,150 I'm really significantly changing 293 00:19:33,150 --> 00:19:35,430 those dynamics in order to bend this thing 294 00:19:35,430 --> 00:19:37,050 to come around like that. 295 00:19:37,050 --> 00:19:38,850 That's OK, but we can do better. 296 00:19:42,720 --> 00:19:45,960 So today I want to use this system, which 297 00:19:45,960 --> 00:19:51,780 I think it's quite easy to have strong intuition for, 298 00:19:51,780 --> 00:19:54,868 to start designing optimal feedback controllers. 299 00:19:59,240 --> 00:20:02,270 So let's address the we don't have infinite torque 300 00:20:02,270 --> 00:20:02,990 problem first. 301 00:20:23,770 --> 00:20:26,215 One more comment on this-- 302 00:20:26,215 --> 00:20:28,390 I didn't actually call them poles-- 303 00:20:28,390 --> 00:20:30,460 there's a pole placement version of this too. 304 00:20:30,460 --> 00:20:32,025 It's exactly the same thing. 
305 00:20:32,025 --> 00:20:33,400 If you were to draw a root locus, 306 00:20:33,400 --> 00:20:35,440 what would the system look like? 307 00:20:39,280 --> 00:20:41,110 The typical root locus would be you're 308 00:20:41,110 --> 00:20:46,300 multiplying the entire feedback by some linear term. 309 00:20:46,300 --> 00:20:49,850 You're not scaling them in some squared law, 310 00:20:49,850 --> 00:20:52,840 so what you get is, for very small feedback gains, 311 00:20:52,840 --> 00:20:54,580 you get oscillations. 312 00:20:54,580 --> 00:20:59,290 As you crank it up, the poles connect in the left half plane, 313 00:20:59,290 --> 00:21:01,600 and then they separate. 314 00:21:01,600 --> 00:21:03,670 And as I keep turning up my gains, one of them 315 00:21:03,670 --> 00:21:04,753 creeps towards the origin. 316 00:21:04,753 --> 00:21:07,508 The other one goes off really far to infinity. 317 00:21:07,508 --> 00:21:09,550 So just those of you think about poles and zeros, 318 00:21:09,550 --> 00:21:12,220 this is exactly the same way to say that. 319 00:21:12,220 --> 00:21:14,920 I didn't do a root locus because I was changing two parameters, 320 00:21:14,920 --> 00:21:16,360 but it all connects. 321 00:21:21,810 --> 00:21:26,100 OK, so now, let's say I have a hard constraint 322 00:21:26,100 --> 00:21:27,825 on what u I can provide. 323 00:21:31,470 --> 00:21:44,580 Let's just say that I have an additional constraint that's, 324 00:21:44,580 --> 00:21:48,790 let's say, the absolute value of u has got to be less than 1. 325 00:21:54,580 --> 00:21:59,870 Well, that changes a lot of things. 326 00:21:59,870 --> 00:22:06,100 My linear system analysis is impoverished now. 327 00:22:06,100 --> 00:22:10,540 If you want a graphical version of what that's doing, that's-- 328 00:22:10,540 --> 00:22:14,070 my zero input looked like that. 
329 00:22:14,070 --> 00:22:16,210 I wanted to go like this with my linear controller, 330 00:22:16,210 --> 00:22:18,085 but maybe it's capped at something like this. 331 00:22:21,073 --> 00:22:22,990 OK, so what's that going to do to your system? 332 00:22:29,505 --> 00:22:35,730 If I just ran the policy u is some saturation, say, 333 00:22:35,730 --> 00:22:39,630 on negative kx, I took my same feedback-- linear feedback 334 00:22:39,630 --> 00:22:43,440 controller and I just said, if it's greater than 1 it's 1; 335 00:22:43,440 --> 00:22:48,470 if it's less than negative 1, it's negative 1, 336 00:22:48,470 --> 00:22:50,810 I think you're still OK. 337 00:22:50,810 --> 00:22:53,120 Trajectories are still going to get to the origin. 338 00:22:53,120 --> 00:22:57,163 They might take fairly long routes to the origin. 339 00:22:57,163 --> 00:22:58,580 You're not going to lose stability 340 00:22:58,580 --> 00:23:02,420 in this case because of that, but it 341 00:23:02,420 --> 00:23:04,503 starts to feel like, man, I should really 342 00:23:04,503 --> 00:23:06,170 be thinking about those hard constraints 343 00:23:06,170 --> 00:23:09,120 when I design my controller. 344 00:23:09,120 --> 00:23:11,030 All right. 345 00:23:11,030 --> 00:23:12,357 So how do we do that? 346 00:23:12,357 --> 00:23:13,940 One way to do that is optimal control. 347 00:23:13,940 --> 00:23:15,232 It's not the only way to do it. 348 00:23:17,508 --> 00:23:19,300 Let's formulate an optimal control problem. 349 00:23:22,580 --> 00:23:26,360 Let me sync up with my notes so I don't go too far afield. 350 00:23:30,350 --> 00:23:33,230 OK. 351 00:23:33,230 --> 00:23:37,340 Let's say my goal in life is to get to the origin as fast 352 00:23:37,340 --> 00:23:40,910 as possible in minimum time. 353 00:23:40,910 --> 00:23:44,248 But I'm subject to this constraint. 354 00:23:44,248 --> 00:23:46,040 So that's the famous minimum time problem-- 355 00:24:15,240 --> 00:24:18,661 subject to that constraint. 
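The saturated policy described above, u equals the saturation of negative kx clipped to plus or minus 1, can be tried out in a few lines. This is a minimal sketch and not the lecture's own code: it assumes the gains k1 equals 1 and k2 equals 4 from the earlier example, a forward-Euler integrator, and an arbitrary initial state of q equals 5 at rest. The function names are hypothetical.

```python
def saturate(value, limit=1.0):
    """Clip value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def simulate_saturated_pd(q, qd, k1=1.0, k2=4.0, t_end=60.0, dt=1e-3):
    """Forward-Euler rollout of q_ddot = sat(-k1*q - k2*q_dot) with |u| <= 1."""
    peak_u = 0.0
    for _ in range(int(round(t_end / dt))):
        u = saturate(-k1 * q - k2 * qd)
        peak_u = max(peak_u, abs(u))
        q, qd = q + dt * qd, qd + dt * u
    return q, qd, peak_u


# Start far enough out that the linear law initially asks for |u| > 1.
q_final, qd_final, peak_u = simulate_saturated_pd(5.0, 0.0)
print(q_final, qd_final, peak_u)
```

In this rollout the state still settles near the origin while the input never exceeds the limit, consistent with the remark that stability is not lost in this case. A single simulation is only an illustration, not a proof.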
356 00:24:21,750 --> 00:24:22,605 OK. 357 00:24:22,605 --> 00:24:23,970 AUDIENCE: [INAUDIBLE] 358 00:24:23,970 --> 00:24:24,720 RUSS TEDRAKE: Yes. 359 00:24:24,720 --> 00:24:25,230 What do we want? 360 00:24:25,230 --> 00:24:27,022 Both the position and the velocity to be 0. 361 00:24:30,512 --> 00:24:32,220 Turns out you need this constraint for it 362 00:24:32,220 --> 00:24:34,590 to be a well-posed problem. 363 00:24:34,590 --> 00:24:36,810 If I didn't have constraints on u, then, like I said, 364 00:24:36,810 --> 00:24:38,813 I would just use as much u as possible. 365 00:24:38,813 --> 00:24:40,230 I would get there infinitely fast, 366 00:24:40,230 --> 00:24:41,800 and we haven't learned a whole lot. 367 00:24:44,430 --> 00:24:49,960 There are other ways to penalize u or something like that, 368 00:24:49,960 --> 00:24:52,420 but we're going to put a hard constraint on it here. 369 00:24:52,420 --> 00:24:56,670 OK, now, muster all your intuition about bricks and ice 370 00:24:56,670 --> 00:24:57,720 and tell me-- 371 00:24:57,720 --> 00:25:00,900 if I've got limited force to give 372 00:25:00,900 --> 00:25:05,100 and I want to get to the origin as fast as possible, 373 00:25:05,100 --> 00:25:07,995 what should I do? 374 00:25:07,995 --> 00:25:09,360 AUDIENCE: Bang-bang. 375 00:25:09,360 --> 00:25:10,610 RUSS TEDRAKE: Bang-bang. 376 00:25:10,610 --> 00:25:11,260 Good. 377 00:25:11,260 --> 00:25:12,430 He knows the answer. 378 00:25:12,430 --> 00:25:13,560 What should I do? 379 00:25:13,560 --> 00:25:16,660 People who haven't thought about it and don't know what bang-bang is? 380 00:25:16,660 --> 00:25:21,660 AUDIENCE: [INAUDIBLE] 381 00:25:21,660 --> 00:25:22,710 RUSS TEDRAKE: Right. 382 00:25:22,710 --> 00:25:23,520 Right.
383 00:25:23,520 --> 00:25:26,250 So if I want to get there as fast as possible, 384 00:25:26,250 --> 00:25:28,770 I'm going to hit the accelerator, 385 00:25:28,770 --> 00:25:32,753 go as fast as I can until some critical point, 386 00:25:32,753 --> 00:25:34,170 where I'm going to hit the brakes. 387 00:25:34,170 --> 00:25:37,040 And I'm going to skid stop right into the goal. 388 00:25:37,040 --> 00:25:38,790 There's nothing better I can do than that. 389 00:25:38,790 --> 00:25:41,910 We're going to prove that, but I want 390 00:25:41,910 --> 00:25:45,318 to see-- that's a fairly complicated thing. 391 00:25:45,318 --> 00:25:47,610 It's something you can guess for the double integrator. 392 00:25:47,610 --> 00:25:52,043 You can't guess for a walking robot, for instance. 393 00:25:52,043 --> 00:25:54,210 But we want to get that out of some machinery that's 394 00:25:54,210 --> 00:25:58,380 going to be more general than double integrators. 395 00:25:58,380 --> 00:26:12,320 OK, so the proposition was bang-bang control. 396 00:26:21,300 --> 00:26:23,790 You might hear people casually say, 397 00:26:23,790 --> 00:26:28,230 bang-bang control's optimal, and that is-- 398 00:26:28,230 --> 00:26:32,340 if you have hard limits on your actuators, 399 00:26:32,340 --> 00:26:34,950 it's very common that the best thing to do 400 00:26:34,950 --> 00:26:36,795 is to be at those limits all the time. 401 00:26:36,795 --> 00:26:38,670 If that's the way you've defined the problem, 402 00:26:38,670 --> 00:26:42,970 bang-bang control solutions are pretty ubiquitous. 403 00:26:46,780 --> 00:26:48,780 They don't always work that well in real robots, 404 00:26:48,780 --> 00:26:51,990 because actuators don't like to produce zero-- 405 00:26:51,990 --> 00:26:53,070 infinite force and then-- 406 00:26:53,070 --> 00:26:55,110 or max force and then negative max 407 00:26:55,110 --> 00:26:58,440 force within a single time step. 
408 00:27:14,410 --> 00:27:29,370 Good-- OK, so I think the only subtle part about it 409 00:27:29,370 --> 00:27:33,720 is figuring out when I need to switch from hitting 410 00:27:33,720 --> 00:27:36,405 the gas to hitting the brakes. 411 00:27:36,405 --> 00:27:38,280 So let's see if we can figure that out first. 412 00:27:42,683 --> 00:27:44,100 I think a pretty good way to do it 413 00:27:44,100 --> 00:27:46,830 is to think about what happens if you hit the brakes. 414 00:27:49,650 --> 00:27:51,150 And then you want to hit the brakes 415 00:27:51,150 --> 00:27:53,730 and arrive directly at the goal. 416 00:27:53,730 --> 00:27:56,340 There's only going to be a handful of states 417 00:27:56,340 --> 00:27:58,660 from which, if I was going at some-- if I was 418 00:27:58,660 --> 00:28:02,640 some position and some velocity and I hit the brakes full now, 419 00:28:02,640 --> 00:28:05,420 I'm going to land exactly at the goal. 420 00:28:05,420 --> 00:28:08,070 Let's see if we can figure out that set of states first. 421 00:28:15,120 --> 00:28:20,040 Let's think about the case where q is greater than 0 first. 422 00:28:20,040 --> 00:28:21,870 Just pick a side. 423 00:28:21,870 --> 00:28:30,615 So in that case, hitting the brakes-- 424 00:28:34,560 --> 00:28:37,885 is that positive 1 or negative 1? 425 00:28:37,885 --> 00:28:38,760 AUDIENCE: [INAUDIBLE] 426 00:28:38,760 --> 00:28:40,110 RUSS TEDRAKE: Negative 1-- 427 00:28:40,110 --> 00:28:42,200 no, it's positive 1. 428 00:28:42,200 --> 00:28:43,152 You almost got me. 429 00:28:47,440 --> 00:28:52,690 If q is greater than 0, it's positive. 430 00:28:52,690 --> 00:28:54,640 q is greater than 0-- 431 00:28:54,640 --> 00:28:57,640 then u is positive. 432 00:28:57,640 --> 00:29:00,520 I want to be pushing back in the direction 433 00:29:00,520 --> 00:29:03,010 I'm already coming from, so u is positive 1. 434 00:29:05,550 --> 00:29:09,977 All right, so now, we're going to have some math ahead of us. 
435 00:29:09,977 --> 00:29:11,560 See if we can integrate this equation. 436 00:29:15,130 --> 00:29:18,500 I can do that on the board for you. 437 00:29:18,500 --> 00:29:20,853 q dot of t-- 438 00:29:20,853 --> 00:29:21,770 I better get it right. 439 00:29:21,770 --> 00:29:23,786 [LAUGHTER] 440 00:29:26,528 --> 00:29:28,900 ut-- so in this case, it was 1-- 441 00:29:28,900 --> 00:29:30,400 plus q dot of 0. 442 00:29:34,850 --> 00:29:38,120 I'll just make it a little bit more [INAUDIBLE] This case, 443 00:29:38,120 --> 00:29:43,870 it was 1, and q double dot of t is-- 444 00:29:43,870 --> 00:29:51,760 sorry-- switch orders-- q0 plus q dot 0 t plus 1/2 ut squared. 445 00:29:57,730 --> 00:29:59,357 OK, I want to figure out-- 446 00:29:59,357 --> 00:30:01,967 AUDIENCE: [INAUDIBLE] 447 00:30:01,967 --> 00:30:03,300 RUSS TEDRAKE: Did I screw it up? 448 00:30:03,300 --> 00:30:04,090 What? 449 00:30:04,090 --> 00:30:05,860 AUDIENCE: [INAUDIBLE] 450 00:30:05,860 --> 00:30:07,560 AUDIENCE: [INAUDIBLE] 451 00:30:07,560 --> 00:30:08,560 RUSS TEDRAKE: Oh, sorry. 452 00:30:08,560 --> 00:30:09,060 Sorry. 453 00:30:09,060 --> 00:30:11,740 Thank you-- good. 454 00:30:11,740 --> 00:30:12,240 Thank you. 455 00:30:16,510 --> 00:30:21,220 OK, so let's figure out, if u is 1, what trajectories 456 00:30:21,220 --> 00:30:23,500 are going to get me so that q-- 457 00:30:23,500 --> 00:30:28,600 at some t final, qt and q dot of t are 0-- 458 00:30:28,600 --> 00:30:30,730 simple enough-- little manipulation. 459 00:30:41,530 --> 00:30:47,965 So it turns out I'm going to solve for q0 and q dot 0. 460 00:31:01,850 --> 00:31:03,710 So q dot 0-- 461 00:31:03,710 --> 00:31:05,990 looks like that's going to be negative u of t. 462 00:31:13,270 --> 00:31:15,020 It's a little bit weird, my notation here. 463 00:31:15,020 --> 00:31:16,603 I'm saying that the initial conditions 464 00:31:16,603 --> 00:31:20,277 are moving backwards. 465 00:31:20,277 --> 00:31:21,610 The equations are simple enough. 
466 00:31:21,610 --> 00:31:23,530 I hope it's OK. 467 00:31:23,530 --> 00:31:26,350 And q0 t had better be-- 468 00:31:28,970 --> 00:31:31,760 it turns out to be 1/2 ut squared. 469 00:31:31,760 --> 00:31:37,210 So q dot of t is negative ut. 470 00:31:37,210 --> 00:31:38,680 Add those together. 471 00:31:38,680 --> 00:31:42,760 q0 is going to be 1/2 ut squared. 472 00:31:46,240 --> 00:31:48,220 If I solve for t-- solve out t-- 473 00:31:51,766 --> 00:31:59,800 in this case, u is 1 so t, say, is just negative q dot. 474 00:32:04,960 --> 00:32:11,373 So q of 0 is just 1/2-- 475 00:32:11,373 --> 00:32:12,040 let's keep that. 476 00:32:12,040 --> 00:32:18,121 This is just 1 t squared q dot 0 squared. 477 00:32:18,121 --> 00:32:27,760 If I plot that, what I've got in my state space-- 478 00:32:27,760 --> 00:32:38,200 q, q dot-- is a manifold of solutions, which starts at 0. 479 00:32:38,200 --> 00:32:40,990 And then I said that I did this for u is positive. 480 00:32:46,810 --> 00:32:48,040 And it goes like this. 481 00:32:54,960 --> 00:32:56,280 This one's not a solution. 482 00:32:56,280 --> 00:32:59,772 Where did I get that out? 483 00:32:59,772 --> 00:33:01,230 And one of my assumptions here when 484 00:33:01,230 --> 00:33:05,662 I inverted t or something that I-- that solution disappears. 485 00:33:05,662 --> 00:33:06,870 You can't have negative time. 486 00:33:12,690 --> 00:33:14,310 In fact, in my notes, I did it. 487 00:33:14,310 --> 00:33:16,620 I solved for the other t, which would have been better. 488 00:33:16,620 --> 00:33:17,400 Sorry. 489 00:33:17,400 --> 00:33:23,440 OK, so there's a line of solutions here-- 490 00:33:23,440 --> 00:33:26,530 which, if I started this q-- this is actually 491 00:33:26,530 --> 00:33:30,030 the positive q, negative q velocities. 492 00:33:30,030 --> 00:33:31,710 I hit the brakes. 493 00:33:31,710 --> 00:33:34,770 I go coasting into the stop at the origin.
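The braking-curve algebra worked on the board above can be written out as follows. This is a sketch for the q greater than 0 case with u equal to positive 1; the symbol T for the (non-negative) braking duration is introduced here for clarity and is not the lecture's notation.

```latex
\begin{aligned}
\dot q(t) &= \dot q(0) + u\,t, \qquad
q(t) = q(0) + \dot q(0)\,t + \tfrac{1}{2}\,u\,t^{2} \\[4pt]
\text{with } u = 1,\ \dot q(T) = 0:\quad & \dot q(0) = -T \\
\text{with } u = 1,\ q(T) = 0:\quad & q(0) = -\dot q(0)\,T - \tfrac{1}{2}T^{2} = \tfrac{1}{2}T^{2} \\[4pt]
\text{eliminating } T = -\dot q(0) \ge 0:\quad & q(0) = \tfrac{1}{2}\,\dot q(0)^{2}, \qquad \dot q(0) \le 0
\end{aligned}
```

The restriction to non-positive initial velocity is the "you can't have negative time" remark: the other branch of the parabola would need T less than 0.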
494 00:33:37,830 --> 00:33:41,310 Turns out, if I do-- if I think about the negative q case, 495 00:33:41,310 --> 00:33:43,110 I get a similar line-- 496 00:33:43,110 --> 00:33:43,920 similar curve. 497 00:33:43,920 --> 00:33:46,440 [INAUDIBLE] quadratic curve over here. 498 00:33:50,123 --> 00:33:52,290 You know what-- let me be a little bit more careful. 499 00:33:52,290 --> 00:33:57,465 Let me make that one pink, because this is the now u 500 00:33:57,465 --> 00:33:58,680 is negative 1 case. 501 00:34:09,489 --> 00:34:09,989 Good. 502 00:34:15,120 --> 00:34:17,280 We figured out the line of solutions 503 00:34:17,280 --> 00:34:21,480 where, if I hit the brakes, they get to the origin. 504 00:34:21,480 --> 00:34:22,889 Harness your intuition again. 505 00:34:22,889 --> 00:34:25,409 What do I do if I'm here? 506 00:34:29,353 --> 00:34:31,330 AUDIENCE: [INAUDIBLE] 507 00:34:31,330 --> 00:34:32,290 RUSS TEDRAKE: Right. 508 00:34:32,290 --> 00:34:35,300 This was the stopping all the way to the goal. 509 00:34:35,300 --> 00:34:38,170 So pretty much, from anywhere else, I want to accelerate. 510 00:34:38,170 --> 00:34:42,082 So what does accelerating look like when I'm here? 511 00:34:42,082 --> 00:34:43,020 AUDIENCE: [INAUDIBLE] 512 00:34:43,020 --> 00:34:44,853 RUSS TEDRAKE: It's going to put me going up. 513 00:34:47,215 --> 00:34:55,210 And what happens is, any time I'm below this curve, 514 00:34:55,210 --> 00:34:59,620 I'm going to drive myself up. 515 00:34:59,620 --> 00:35:03,160 I can't go backwards like that and drive myself up, hit that, 516 00:35:03,160 --> 00:35:06,310 and ride it in. 517 00:35:06,310 --> 00:35:08,350 And if I'm above the curve, what do I do? 
518 00:35:12,142 --> 00:35:13,033 AUDIENCE: [INAUDIBLE] 519 00:35:13,033 --> 00:35:14,950 RUSS TEDRAKE: Have to overshoot a little bit-- 520 00:35:14,950 --> 00:35:18,090 I can't bend down more than this, 521 00:35:18,090 --> 00:35:20,650 so I'm going to ride it all the way over to here, 522 00:35:20,650 --> 00:35:25,120 connect up to this surface, and ride it in. 523 00:35:25,120 --> 00:35:29,380 And it turns out, any time I'm over here, the best thing to do 524 00:35:29,380 --> 00:35:29,890 is to-- 525 00:35:32,440 --> 00:35:33,760 did I get my colors wrong? 526 00:35:33,760 --> 00:35:36,730 Got my colors wrong-- 527 00:35:36,730 --> 00:35:37,460 let me fix that. 528 00:35:37,460 --> 00:35:37,960 Sorry. 529 00:35:40,750 --> 00:35:43,870 It's confusing. 530 00:35:43,870 --> 00:35:47,650 This is the accelerate, and then brake. 531 00:35:47,650 --> 00:35:49,450 And this is the brake. 532 00:36:01,440 --> 00:36:05,340 Let me just recolor it for you to make it a little more clear. 533 00:36:05,340 --> 00:36:06,080 Sorry about that. 534 00:36:19,465 --> 00:36:25,685 So let's say I'm pink over here, blue over here. 535 00:36:25,685 --> 00:36:26,185 OK. 536 00:36:28,730 --> 00:36:29,540 I want to be pink. 537 00:36:29,540 --> 00:36:31,440 I want to decelerate just like this 538 00:36:31,440 --> 00:36:33,773 if I'm above it because I want to take these curves that 539 00:36:33,773 --> 00:36:34,670 are almost there. 540 00:36:42,035 --> 00:36:43,410 If I've got extra time, I'm going 541 00:36:43,410 --> 00:36:45,702 to accelerate to the point where I decelerate again, so 542 00:36:45,702 --> 00:36:47,435 down here should be blue. 543 00:36:47,435 --> 00:36:48,810 And then this is, again, the case 544 00:36:48,810 --> 00:36:51,900 where I decelerate as much as I can until I take the pink line. 545 00:36:56,550 --> 00:36:59,740 This was the u equals negative 1, 546 00:36:59,740 --> 00:37:02,690 and this was the u equals positive 1. 547 00:37:08,990 --> 00:37:10,040 OK.
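The two-color picture above can be sketched in code. This is a minimal illustration, not the lecture's own implementation: the function names `bang_bang` and `time_to_origin`, the step size, and the stopping tolerance are all assumptions made here. The switching function s combines the two braking parabolas (q = 1/2 q dot squared for u = +1, and its mirror image for u = -1) into one expression.

```python
import math


def bang_bang(q, qd):
    """Switching-curve policy for q'' = u with |u| <= 1 (names are illustrative)."""
    # s == 0 exactly on the two braking parabolas through the origin.
    s = q + 0.5 * qd * abs(qd)
    if s > 0.0:
        return -1.0   # one side of the curve: push with u = -1
    if s < 0.0:
        return 1.0    # other side of the curve: push with u = +1
    # Exactly on the curve: brake, i.e. push against the current velocity.
    if qd > 0.0:
        return -1.0
    if qd < 0.0:
        return 1.0
    return 0.0        # already at the origin


def time_to_origin(q, qd, dt=1e-3, tol=1e-2, t_max=20.0):
    """Simulate the closed loop until the state is within tol of the origin."""
    t = 0.0
    while math.hypot(q, qd) > tol and t < t_max:
        u = bang_bang(q, qd)
        # Exact update for a constant input held over one step.
        q += qd * dt + 0.5 * u * dt * dt
        qd += u * dt
        t += dt
    return t


if __name__ == "__main__":
    # Brick at rest at q = 1: accelerate for about 1 s, then brake for about 1 s.
    print(time_to_origin(1.0, 0.0))
```

Because the input only changes once per step, the simulated state chatters slightly around the curve instead of riding it exactly, which is a small preview of why the hard switch is awkward numerically.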
548 00:37:10,040 --> 00:37:12,260 Is that at all satisfying? 549 00:37:12,260 --> 00:37:17,480 We can now connect this back again to your phase plot 550 00:37:17,480 --> 00:37:18,830 pictures. 551 00:37:18,830 --> 00:37:23,000 We had our initial lines that looked like this. 552 00:37:23,000 --> 00:37:28,370 [INAUDIBLE] allowed to apply a bounded amount of torque. 553 00:37:28,370 --> 00:37:30,350 So the best thing I can do, if I'm right here, 554 00:37:30,350 --> 00:37:32,990 is I can warp this thing down to the point where 555 00:37:32,990 --> 00:37:34,790 I get right there. 556 00:37:40,310 --> 00:37:41,990 And if I'm here, I can warp it up 557 00:37:41,990 --> 00:37:44,380 to push me here, and then ride it down. 558 00:37:47,520 --> 00:37:49,920 The hard part is actually showing that that's optimal. 559 00:37:49,920 --> 00:37:52,560 And the reason I'm going to go through it is because it's-- it 560 00:37:52,560 --> 00:37:55,050 forms the basis for all the algorithms that are going to be 561 00:37:55,050 --> 00:37:55,592 more general. 562 00:37:59,040 --> 00:38:00,840 So let me show you that that's optimal. 563 00:38:07,560 --> 00:38:12,900 To do that, I need to introduce our first optimality 564 00:38:12,900 --> 00:38:14,550 ideas for the course. 565 00:38:17,196 --> 00:38:19,200 Are people OK with the-- 566 00:38:19,200 --> 00:38:20,760 that picture? 567 00:38:20,760 --> 00:38:23,880 AUDIENCE: [INAUDIBLE] below the line. 568 00:38:23,880 --> 00:38:24,777 RUSS TEDRAKE: Mm-hmm. 569 00:38:24,777 --> 00:38:28,600 AUDIENCE: [INAUDIBLE] 570 00:38:28,600 --> 00:38:30,300 RUSS TEDRAKE: So tell me where. 571 00:38:30,300 --> 00:38:32,110 Bottom right? 572 00:38:32,110 --> 00:38:34,120 OK. 573 00:38:34,120 --> 00:38:36,180 So this is the place where, if I decelerate, 574 00:38:36,180 --> 00:38:38,310 I get to the origin. 575 00:38:38,310 --> 00:38:41,250 If I'm here, then I have a little bit more velocity 576 00:38:41,250 --> 00:38:43,860 in the same position. 
577 00:38:43,860 --> 00:38:47,070 So if I hit the brakes, I'm not going to stop in time. 578 00:38:51,810 --> 00:38:53,490 I don't quite decelerate fast enough 579 00:38:53,490 --> 00:38:55,770 to get here, because there's limited torque, 580 00:38:55,770 --> 00:38:59,880 so I just slip past it until my chance to come in the other way 581 00:38:59,880 --> 00:39:00,380 again. 582 00:39:08,320 --> 00:39:12,810 The only separation is that curve. 583 00:39:12,810 --> 00:39:15,837 AUDIENCE: [INAUDIBLE] 584 00:39:15,837 --> 00:39:17,920 RUSS TEDRAKE: Everywhere up here, you want to be-- 585 00:39:17,920 --> 00:39:19,132 top left you said? 586 00:39:19,132 --> 00:39:23,260 AUDIENCE: [INAUDIBLE] 587 00:39:23,260 --> 00:39:26,110 RUSS TEDRAKE: Here, you're blue. 588 00:39:26,110 --> 00:39:28,870 The same way, you want to accelerate as much as you can, 589 00:39:28,870 --> 00:39:30,370 because you want to get to the place 590 00:39:30,370 --> 00:39:33,280 where you have to hit the brakes. 591 00:39:33,280 --> 00:39:34,030 This is me. 592 00:39:34,030 --> 00:39:35,752 I'm at some position. 593 00:39:35,752 --> 00:39:37,210 I don't have enough velocity that I 594 00:39:37,210 --> 00:39:38,950 have to just hit my brakes, so I'm 595 00:39:38,950 --> 00:39:42,010 going to gun it until I'm at the velocity where I just 596 00:39:42,010 --> 00:39:44,920 have to hit my brakes, and then ride it in. 597 00:39:44,920 --> 00:39:48,760 Up here, I'm at the point where I have too much velocity. 598 00:39:48,760 --> 00:39:50,207 Even if I hit my brakes, I'm still 599 00:39:50,207 --> 00:39:51,790 going to overshoot a little bit, which 600 00:39:51,790 --> 00:39:53,240 means I'm going to have to-- 601 00:39:53,240 --> 00:39:55,090 and so you could think of it as now-- 602 00:39:55,090 --> 00:39:57,812 the word brakes maybe flips when you flip that cross across. 603 00:39:57,812 --> 00:39:59,020 Maybe that's the right thing. 
604 00:39:59,020 --> 00:40:03,100 But the action I take is only changing based 605 00:40:03,100 --> 00:40:05,900 on that switching surface-- 606 00:40:08,700 --> 00:40:11,010 which, as you know, will be a nightmare 607 00:40:11,010 --> 00:40:13,447 for a lot of our reinforcement learning algorithms. 608 00:40:13,447 --> 00:40:14,280 This is a hard cusp. 609 00:40:19,830 --> 00:40:22,710 So if you have a control system that has a hard non-linearity 610 00:40:22,710 --> 00:40:24,650 like this, which is-- 611 00:40:24,650 --> 00:40:27,150 I'm doing one thing here, and I'm immediately, 612 00:40:27,150 --> 00:40:30,720 at some discrete place, doing a different thing-- 613 00:40:30,720 --> 00:40:34,860 that's a very non-linear event. 614 00:40:34,860 --> 00:40:36,927 And it's hard to get analytically 615 00:40:36,927 --> 00:40:38,760 when you're doing something more complicated 616 00:40:38,760 --> 00:40:41,130 than a double integrator, and it can 617 00:40:41,130 --> 00:40:43,200 be hard to get computationally. 618 00:40:43,200 --> 00:40:46,170 But we'll talk about that when the time comes. 619 00:40:46,170 --> 00:40:53,410 OK, so how the heck do I make an optimality argument about this? 620 00:40:53,410 --> 00:40:57,178 I want to introduce Pontryagin's minimum principle. 621 00:41:15,010 --> 00:41:15,510 OK. 622 00:41:34,070 --> 00:41:39,020 This is going to be a load of equations real quick, 623 00:41:39,020 --> 00:41:43,130 and we're going to tease them out. 624 00:41:43,130 --> 00:41:47,330 In general, optimal control problems are going to go-- 625 00:41:47,330 --> 00:41:51,050 are going to be formulated by designing a cost function. 626 00:41:51,050 --> 00:41:58,430 That cost function is some scalar quantity 627 00:41:58,430 --> 00:42:00,260 that I want to minimize. 628 00:42:12,300 --> 00:42:15,590 I'm going to use the symbols g of x, u 629 00:42:15,590 --> 00:42:17,430 as an instantaneous cost function.
630 00:42:25,850 --> 00:42:32,540 I'll use h of x to mean final costs. 631 00:42:32,540 --> 00:42:35,080 I'm going to show you what this means in a second. 632 00:42:35,080 --> 00:42:40,240 And I'm going to use J of x to be a cost-to-go. 633 00:42:48,050 --> 00:42:52,130 It's very important-- all of these are scalars-- 634 00:42:52,130 --> 00:42:55,160 not vectors at all, just a scalar quantity. 635 00:42:58,010 --> 00:43:03,200 So a typical optimal control problem 636 00:43:03,200 --> 00:43:09,320 will be formulated as something like h of x capital T 637 00:43:09,320 --> 00:43:21,650 plus integral from 0 to T g of x u dt, subject to x dot of t 638 00:43:21,650 --> 00:43:22,580 is f of xu. 639 00:43:22,580 --> 00:43:29,220 Let's say the dynamics and x0 t is some-- 640 00:43:29,220 --> 00:43:31,830 let me call it x0 here-- 641 00:43:31,830 --> 00:43:32,330 x0. 642 00:43:40,440 --> 00:43:44,010 This is a general cost function-- 643 00:43:44,010 --> 00:43:47,160 cost-to-go function form for optimal control. 644 00:43:47,160 --> 00:43:50,460 We're going to use it a lot. 645 00:43:50,460 --> 00:43:52,970 There's just a couple of things to note about it. 646 00:43:52,970 --> 00:43:55,638 So just to get some intuition, so g of xu-- 647 00:43:55,638 --> 00:43:57,930 that's things I'm penalizing throughout the trajectory. 648 00:43:57,930 --> 00:44:08,310 So maybe I want to do small actions in general, 649 00:44:08,310 --> 00:44:10,630 in which case, I could put some term in here, 650 00:44:10,630 --> 00:44:13,770 which penalizes me for having a u. 651 00:44:13,770 --> 00:44:17,280 I could put a u squared in here or something like that. 652 00:44:17,280 --> 00:44:21,630 Or maybe I want to just worry about being 653 00:44:21,630 --> 00:44:22,830 a long way from the origin. 654 00:44:22,830 --> 00:44:26,000 Maybe I'll put an x squared in here or something like this. 655 00:44:28,710 --> 00:44:32,310 h is a final cost.
656 00:44:32,310 --> 00:44:34,050 It's some function that only penalizes 657 00:44:34,050 --> 00:44:36,100 the final state of the system. 658 00:44:36,100 --> 00:44:39,390 So maybe I don't care what I'm doing for the first capital T 659 00:44:39,390 --> 00:44:44,670 seconds, but at time capital T, I 660 00:44:44,670 --> 00:44:46,890 want to penalize it for being away from 0-- 661 00:44:46,890 --> 00:44:49,170 x squared here or something like that. 662 00:44:49,170 --> 00:44:51,277 There's lots of different forms. 663 00:44:51,277 --> 00:44:52,860 The only thing that's really important 664 00:44:52,860 --> 00:44:55,360 to note about this, the only real restriction in the forms 665 00:44:55,360 --> 00:44:56,940 that you can play with, is that we 666 00:44:56,940 --> 00:45:01,140 do tend to assume this form, which is additive. 667 00:45:01,140 --> 00:45:06,270 It's integrable-- integrates some scalar cost g. 668 00:45:06,270 --> 00:45:09,460 So I don't look at multiplicative contributions 669 00:45:09,460 --> 00:45:09,960 of-- 670 00:45:09,960 --> 00:45:13,170 from x at time 1 and x at time 4 or something like that. 671 00:45:13,170 --> 00:45:17,190 I'm only looking at additive cost functions. 672 00:45:17,190 --> 00:45:20,880 Assuming that form-- that additive cost form 673 00:45:20,880 --> 00:45:25,380 will make all the derivations work, roughly. 674 00:45:25,380 --> 00:45:30,930 OK, so for the minimum time problem, what is that form? 675 00:45:34,907 --> 00:45:36,990 You could formulate it a couple of different ways. 676 00:46:01,565 --> 00:46:08,510 In this case, I could actually have g of xu equals 1, 677 00:46:08,510 --> 00:46:23,720 and have capital T defined as the time, 678 00:46:23,720 --> 00:46:25,490 and have h of x equal 0. 679 00:46:28,490 --> 00:46:34,550 That's a perfectly reasonable optimal control formulation. 680 00:46:34,550 --> 00:46:39,860 So it certainly fits in this general optimal control form.
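Written out, the general additive cost and its minimum-time special case look like this. The reading of capital T as the first time the state reaches the goal is an interpretation of "capital T defined as the time" in the spoken text, and the brick dynamics line assumes the state ordering x = (q, q dot).

```latex
\begin{aligned}
J(x_0) &= h\big(x(T)\big) + \int_0^T g\big(x(t),u(t)\big)\,dt,
  \qquad \dot{x}(t) = f\big(x(t),u(t)\big), \quad x(0) = x_0 \\[4pt]
\text{minimum time:}\quad g(x,u) &= 1, \quad h(x) = 0
  \;\;\Longrightarrow\;\; J(x_0) = \int_0^T 1\,dt = T \\[4pt]
\text{brick on ice:}\quad x &= \begin{bmatrix} q \\ \dot q \end{bmatrix}, \qquad
  f(x,u) = \begin{bmatrix} \dot q \\ u \end{bmatrix}, \qquad |u| \le 1
\end{aligned}
```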
681 00:46:44,030 --> 00:46:45,980 OK, so now we need to know how to-- 682 00:46:45,980 --> 00:46:47,090 we've got this guess. 683 00:46:47,090 --> 00:46:50,150 I'm going to leave that hard-earned picture up there. 684 00:46:53,570 --> 00:46:54,980 I like this one too, but-- 685 00:47:18,110 --> 00:47:20,240 let me just say what Pontryagin's minimum principle 686 00:47:20,240 --> 00:47:23,600 is first, and then we'll make sure it makes sense. 687 00:47:23,600 --> 00:47:35,150 So for this general form, J of x is h of xT plus integral 688 00:47:35,150 --> 00:47:40,610 from 0 to T g x, u dt, subject to-- 689 00:47:40,610 --> 00:47:45,200 and I'm going to try to be very careful about writing 690 00:47:45,200 --> 00:47:48,500 these all every time. 691 00:47:58,780 --> 00:48:03,317 Let's assume for-- to begin with, capital T is fixed, 692 00:48:03,317 --> 00:48:04,650 just a parameter somebody chose. 693 00:48:09,176 --> 00:48:12,217 Let's say u is bounded to some set u. 694 00:48:12,217 --> 00:48:14,550 In our problem right now, it was negative 1 to 1, right? 695 00:48:22,800 --> 00:48:24,420 The minimum principle goes like this. 696 00:48:56,010 --> 00:48:58,350 We're going to define this new auxiliary function, 697 00:48:58,350 --> 00:49:33,830 the Hamiltonian, capital H. If I have found some optimal control 698 00:49:33,830 --> 00:49:34,871 solution-- 699 00:49:42,410 --> 00:49:44,330 I'll think of it in terms of-- the solution 700 00:49:44,330 --> 00:49:48,800 right now in terms of a trajectory, which 701 00:49:48,800 --> 00:49:54,546 is some sequence x star of t, u star of t. 702 00:49:59,210 --> 00:50:04,384 Then it must satisfy the following conditions. 703 00:50:10,320 --> 00:50:15,950 First of all, we know it must satisfy f of x star u star. 704 00:50:15,950 --> 00:50:21,380 That was already one of our conditions. 705 00:50:21,380 --> 00:50:22,730 And it has to satisfy the-- 706 00:50:30,470 --> 00:50:31,430 OK. 
707 00:50:31,430 --> 00:50:34,040 There's a significantly less trivial one, 708 00:50:34,040 --> 00:50:42,080 which is that p dot of t has got to be equal to negative partial H, 709 00:50:42,080 --> 00:50:49,970 partial x evaluated at x star, u star, p-- 710 00:50:54,640 --> 00:50:57,600 which, if H is what I had up there, 711 00:50:57,600 --> 00:51:02,850 works out to be partial g, partial x, 712 00:51:02,850 --> 00:51:06,180 plus partial f, partial x. 713 00:51:06,180 --> 00:51:07,650 Transpose p. 714 00:51:21,390 --> 00:51:24,000 And this auxiliary variable that we 715 00:51:24,000 --> 00:51:31,140 had has to be the gradient of partial h evaluated x star t. 716 00:51:36,990 --> 00:51:44,340 One last condition-- this is 1, 2, 3. 717 00:51:47,100 --> 00:51:52,260 u star t had better be the argmin 718 00:51:52,260 --> 00:52:00,030 over u of H of x star, u, p. 719 00:52:08,200 --> 00:52:08,860 OK. 720 00:52:08,860 --> 00:52:09,910 Sorry. 721 00:52:09,910 --> 00:52:10,510 Cut that out. 722 00:52:10,510 --> 00:52:12,010 We're going to make sense of it now. 723 00:52:19,180 --> 00:52:20,515 OK, before we derive it-- 724 00:52:20,515 --> 00:52:22,390 and I'll just do a sketch of the derivation-- 725 00:52:22,390 --> 00:52:24,100 but before we derive it, let's just 726 00:52:24,100 --> 00:52:26,123 think about the implications. 727 00:52:29,470 --> 00:52:34,060 First of all, this says the optimal control trajectory must 728 00:52:34,060 --> 00:52:40,630 satisfy, which means it's a necessary condition 729 00:52:40,630 --> 00:52:42,176 for optimality. 730 00:52:48,130 --> 00:52:51,430 If I found some optimal trajectory x 731 00:52:51,430 --> 00:52:57,430 star, u star, some trajectory x, u, I can verify that-- 732 00:52:57,430 --> 00:52:59,863 a necessary condition is that all these things hold, 733 00:52:59,863 --> 00:53:01,780 but that's actually not a sufficient condition 734 00:53:01,780 --> 00:53:04,420 in general.
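For reference, the three conditions just listed can be collected in one place. The Hamiltonian itself was written on the board and does not appear in the spoken text, so the definition below assumes the standard form, which matches the H = 1 + p1 q dot + p2 u used for the brick later in the lecture. The minus sign on the costate equation follows the spoken "negative partial H, partial x".

```latex
\begin{aligned}
H(x,u,p) &= g(x,u) + p^{T} f(x,u) \\[4pt]
\text{(1)}\quad \dot{x}^{*}(t) &= f\big(x^{*}(t),u^{*}(t)\big), \qquad x^{*}(0) = x_0 \\
\text{(2)}\quad \dot{p}(t) &= -\left.\frac{\partial H}{\partial x}\right|_{x^{*},u^{*},p}^{T}
  = -\frac{\partial g}{\partial x}^{T} - \frac{\partial f}{\partial x}^{T} p(t),
  \qquad p(T) = \left.\frac{\partial h}{\partial x}\right|_{x^{*}(T)}^{T} \\
\text{(3)}\quad u^{*}(t) &= \operatorname*{argmin}_{u \in U}\; H\big(x^{*}(t),u,p(t)\big)
\end{aligned}
```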
735 00:53:04,420 --> 00:53:07,120 For linear systems that are convex-- 736 00:53:07,120 --> 00:53:10,160 linear dynamics that are convex and the cost function, 737 00:53:10,160 --> 00:53:12,400 it turns out it's OK, but in general, it's 738 00:53:12,400 --> 00:53:14,055 not always sufficient. 739 00:53:22,970 --> 00:53:26,200 And it says that, if I take my x and I integrate it forward 740 00:53:26,200 --> 00:53:29,440 in time, solving x by integrating my dynamics 741 00:53:29,440 --> 00:53:33,370 forward, and then I take this other function-- 742 00:53:33,370 --> 00:53:36,340 this new set of variables p, which happens to have the same 743 00:53:36,340 --> 00:53:37,470 size as x-- 744 00:53:37,470 --> 00:53:40,960 we'll see that-- and integrate it effectively backwards 745 00:53:40,960 --> 00:53:46,560 in time, because I have final condition on p-- 746 00:53:46,560 --> 00:53:50,620 if I do both of those things, and I can write down u as being 747 00:53:50,620 --> 00:53:53,890 the argmin of what's left-- H, x-- 748 00:53:53,890 --> 00:53:59,247 then I've satisfied a necessary condition for optimality. 749 00:53:59,247 --> 00:54:00,580 Let's try to make sense of that. 750 00:54:24,810 --> 00:54:27,680 How many people have done optimization before at all? 751 00:54:30,740 --> 00:54:32,990 How many people have seen Lagrange multipliers before? 752 00:54:35,960 --> 00:54:42,800 OK, good-- so let me say a few things but not dwell. 753 00:54:42,800 --> 00:54:47,043 And there's a lot of information in the notes-- 754 00:54:47,043 --> 00:54:47,960 as fast as I can type. 755 00:55:07,330 --> 00:55:09,190 All right, in general, what I'm trying to do 756 00:55:09,190 --> 00:55:11,000 is I'm trying to minimize some function. 757 00:55:11,000 --> 00:55:12,500 In this case, I'm trying to minimize 758 00:55:12,500 --> 00:55:18,010 J. I'm trying to minimize this J of x 759 00:55:18,010 --> 00:55:23,793 by finding the u [INAUDIBLE] which minimizes that.
760 00:55:23,793 --> 00:55:25,210 But let's make it a little simpler 761 00:55:25,210 --> 00:55:27,160 just to make sure we get the basic idea. 762 00:55:27,160 --> 00:55:30,580 Let me just say J is some function of some parameter 763 00:55:30,580 --> 00:55:32,080 alpha. 764 00:55:32,080 --> 00:55:33,360 I'm trying to minimize J-- 765 00:55:37,060 --> 00:55:38,820 I can even do it even simpler. 766 00:55:38,820 --> 00:55:41,380 Let's just say minimize over x J of x. 767 00:55:44,320 --> 00:55:49,720 So if I have some function of x-- 768 00:55:49,720 --> 00:55:53,590 J of x looks like this-- 769 00:55:53,590 --> 00:55:55,390 I want to find the minimum. 770 00:55:55,390 --> 00:55:58,810 The first condition, the necessary condition, 771 00:55:58,810 --> 00:56:01,390 is that, at the minimum, the derivative of that thing 772 00:56:01,390 --> 00:56:01,990 better be 0. 773 00:56:07,460 --> 00:56:12,530 So I can check by just checking if partial J, partial x equals 774 00:56:12,530 --> 00:56:18,860 0, that I've got a necessary condition for a minimum. 775 00:56:31,340 --> 00:56:33,300 That's actually a lot of it. 776 00:56:33,300 --> 00:56:35,740 The second part is the Lagrange multiplier part. 777 00:56:40,610 --> 00:56:42,260 Let's say x is a vector now-- 778 00:56:42,260 --> 00:56:44,310 a two-dimensional vector. 779 00:56:44,310 --> 00:56:48,470 Let's say I want to do the optimization 780 00:56:48,470 --> 00:56:58,820 min J of x, subject to the constraint that x1 equals x2-- 781 00:56:58,820 --> 00:57:02,620 or let's do something slightly more interesting. 782 00:57:02,620 --> 00:57:05,340 x1 plus x2 is 3. 783 00:57:12,750 --> 00:57:15,870 Turns out, thanks to the method of Lagrange-- 784 00:57:15,870 --> 00:57:17,220 one of his many methods-- 785 00:57:20,220 --> 00:57:23,730 solving this problem is no more difficult than solving 786 00:57:23,730 --> 00:57:27,990 this problem-- finding necessary conditions for this problem. 
787 00:57:27,990 --> 00:57:34,710 By just making an augmented function, 788 00:57:34,710 --> 00:57:45,090 you can now minimize x and lambda of J of x plus lambda 789 00:57:45,090 --> 00:57:48,540 times this constraint-- which, in this case, 790 00:57:48,540 --> 00:57:54,732 is x1 plus x2 minus 3-- 791 00:57:54,732 --> 00:57:55,980 has to equal 0. 792 00:58:01,540 --> 00:58:08,333 It turns out, if partial J, partial lambda equals 0, 793 00:58:08,333 --> 00:58:10,125 then that means the constraint is enforced. 794 00:58:17,120 --> 00:58:24,560 Partial J, partial lambda in this is x1 plus x2 minus 3. 795 00:58:24,560 --> 00:58:26,810 If that equals 0, which is the condition I'm 796 00:58:26,810 --> 00:58:31,880 looking for anyways for the minimum, then I've now-- 797 00:58:31,880 --> 00:58:36,620 not only have I satisfied my constraint, but the remaining 798 00:58:36,620 --> 00:58:38,540 minima-- 799 00:58:38,540 --> 00:58:41,150 the minimization of this is this constrained solution 800 00:58:41,150 --> 00:58:43,820 to that optimization. 801 00:58:43,820 --> 00:58:46,315 The Lagrange multiplier method is very, very useful. 802 00:58:46,315 --> 00:58:47,690 If you don't know it, look it up. 803 00:58:47,690 --> 00:58:48,410 It's very good. 804 00:58:57,750 --> 00:58:58,250 Yeah? 805 00:58:58,250 --> 00:59:00,500 AUDIENCE: So in the partial J, partial lambda, 806 00:59:00,500 --> 00:59:02,650 that J [INAUDIBLE] partial is this new J-- 807 00:59:02,650 --> 00:59:03,650 RUSS TEDRAKE: Oh, sorry. 808 00:59:03,650 --> 00:59:04,130 Thank you. 809 00:59:04,130 --> 00:59:04,630 Thank you. 810 00:59:04,630 --> 00:59:05,360 Yep. 811 00:59:05,360 --> 00:59:07,430 Good catch, good catch-- thank you. 812 00:59:07,430 --> 00:59:10,233 Partial of-- I don't know-- that whole thing-- 813 00:59:10,233 --> 00:59:11,900 partial lambda-- thank you-- good catch. 
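The Lagrange multiplier recipe above can be tried on a concrete case. The lecture leaves J unspecified, so the quadratic J(x) = x1 squared + x2 squared below is purely an assumed example, and `solve_linear` is a small helper written here for illustration. Setting all partials of the augmented function J(x) + lambda (x1 + x2 - 3) to zero gives three linear equations in x1, x2, and lambda.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic (illustrative helper)."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Assumed cost: J(x) = x1^2 + x2^2, constraint: x1 + x2 = 3.
# Augmented function: L = x1^2 + x2^2 + lam * (x1 + x2 - 3).
# dL/dx1  = 2*x1 + lam   = 0
# dL/dx2  = 2*x2 + lam   = 0
# dL/dlam = x1 + x2 - 3  = 0   (this row is just the constraint)
A = [[2, 0, 1],
     [0, 2, 1],
     [1, 1, 0]]
b = [0, 0, 3]
x1, x2, lam = solve_linear(A, b)

if __name__ == "__main__":
    print(x1, x2, lam)
```

The third equation is exactly the point made in the lecture: stationarity with respect to lambda is the constraint itself, so solving for a stationary point of the augmented function enforces it automatically.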
814 00:59:19,270 --> 00:59:23,410 And in the method of Lagrange multipliers, 815 00:59:23,410 --> 00:59:26,350 lambda has an interpretation of a constraint force. 816 00:59:29,680 --> 00:59:34,570 What you're about to see is that all I'm saying 817 00:59:34,570 --> 00:59:36,670 in Pontryagin's minimum principle-- 818 00:59:36,670 --> 00:59:42,940 which is an absolute staple in optimal control-- 819 00:59:42,940 --> 00:59:47,140 is all I'm saying is that J of x is-- 820 00:59:47,140 --> 00:59:50,470 which is my cost function-- my cost-to-go function-- 821 00:59:50,470 --> 00:59:54,610 is at a stationary point with a Lagrange multiplier 822 00:59:54,610 --> 00:59:58,150 which enforces this dynamic. 823 00:59:58,150 --> 01:00:03,180 And that Lagrange multiplier happens to be p. 824 01:00:03,180 --> 01:00:03,720 OK? 825 01:00:03,720 --> 01:00:05,262 So let's just see how that plays out. 826 01:00:13,320 --> 01:00:19,710 OK, so this is a sketch of the derivation 827 01:00:19,710 --> 01:00:21,953 of Pontryagin's minimum principle, which, I think-- 828 01:00:21,953 --> 01:00:23,370 I'm just going to do enough so you 829 01:00:23,370 --> 01:00:24,330 see where those things are and have 830 01:00:24,330 --> 01:00:25,655 some intuition about them-- 831 01:00:25,655 --> 01:00:27,780 a sketch of it based on the calculus of variations. 832 01:00:27,780 --> 01:00:30,020 So there's many other ways to do it-- 833 01:00:35,790 --> 01:00:41,580 calculus of variations, which is a scary name 834 01:00:41,580 --> 01:00:42,770 for a very simple thing. 835 01:00:54,150 --> 01:00:55,710 This is the problem solving-- 836 01:00:55,710 --> 01:01:00,180 J of x0 is h of x plus integral over g, 837 01:01:00,180 --> 01:01:02,910 subject to those constraints. 838 01:01:02,910 --> 01:01:06,480 So how do I write that in terms of a Lagrange multiplier?
839 01:01:06,480 --> 01:01:08,250 I'm going to do a second function, which 840 01:01:08,250 --> 01:01:10,292 I won't make the mistake of calling J again here. 841 01:01:10,292 --> 01:01:19,500 Let's call it S. Some function S is going to be h of xT 842 01:01:19,500 --> 01:01:26,520 plus the integral from 0 to T of g of xt, 843 01:01:26,520 --> 01:01:31,710 ut plus some Lagrange multipliers-- 844 01:01:31,710 --> 01:01:34,660 p in this case-- 845 01:01:34,660 --> 01:01:40,550 times my constraint, which is x dot minus f of xu. 846 01:01:40,550 --> 01:01:42,540 I was trying to use T's everywhere. 847 01:01:42,540 --> 01:02:07,580 [INAUDIBLE] 848 01:02:07,580 --> 01:02:13,520 OK, so now I can explicitly try to find the place where S-- 849 01:02:13,520 --> 01:02:17,780 which is my Lagrange multiplier version of my problem, 850 01:02:17,780 --> 01:02:21,590 which has the explicit cost that I'm trying to minimize-- 851 01:02:21,590 --> 01:02:24,410 subject to the constraints that x better be-- 852 01:02:24,410 --> 01:02:27,440 satisfy my dynamics-- exactly the same 853 01:02:27,440 --> 01:02:30,373 as that two-second Lagrange multiplier introduction. 854 01:02:34,650 --> 01:02:39,360 Now, getting it right is a little bit funny. 855 01:02:39,360 --> 01:02:41,070 So S is now-- 856 01:02:41,070 --> 01:02:43,320 you could think of S as a functional [INAUDIBLE] 857 01:02:43,320 --> 01:02:47,040 a function of functions. 858 01:02:47,040 --> 01:02:49,320 If I take a variation, this is just-- 859 01:02:49,320 --> 01:02:52,710 it's going to be exactly like your basic calculus, 860 01:02:52,710 --> 01:02:56,430 but the calculus of variations uses these symbols 861 01:02:56,430 --> 01:02:59,760 for a variation on a function is just 862 01:02:59,760 --> 01:03:06,760 going to be partial h, partial x times the variation in x 863 01:03:06,760 --> 01:03:16,850 of T plus the integral 0 dt 864 01:03:16,850 --> 01:03:20,650 Notice quickly that this thing inside here is just H.
865 01:03:23,600 --> 01:03:26,360 That's my Hamiltonian. 866 01:03:26,360 --> 01:03:28,880 So that thing inside there I can just-- 867 01:03:54,568 --> 01:03:58,830 OK, this says this is a variational analysis of S that 868 01:03:58,830 --> 01:04:03,360 says, if my function changes by some small amount in x tilde, 869 01:04:03,360 --> 01:04:06,900 this is the result of-- in changing S-- in S. 870 01:04:06,900 --> 01:04:10,650 Similarly, if my thing changes by a little bit in xt, 871 01:04:10,650 --> 01:04:12,960 or in ut, or pt, or in all of them 872 01:04:12,960 --> 01:04:15,210 simultaneously, this tells me what the variation's 873 01:04:15,210 --> 01:04:19,680 going to be in S. 874 01:04:19,680 --> 01:04:22,740 The stationary conditions then-- if I'm at an optimum, 875 01:04:22,740 --> 01:04:24,390 what I care about is that-- 876 01:04:24,390 --> 01:04:27,990 if I change u a little bit, if I change x a little bit, 877 01:04:27,990 --> 01:04:30,040 if I change p a little bit, S better not change. 878 01:04:30,040 --> 01:04:31,920 That's my condition-- my necessary condition 879 01:04:31,920 --> 01:04:32,953 for optimality. 880 01:04:40,350 --> 01:04:45,780 If partial H, partial p is 0, then I 881 01:04:45,780 --> 01:04:48,760 know that changing p isn't going to change the solution, 882 01:04:48,760 --> 01:04:52,020 so I can look for stationary points with respect to p. 883 01:04:54,870 --> 01:04:57,870 Partial H, partial p better be 0. 884 01:04:57,870 --> 01:05:00,690 What's partial H, partial p? 885 01:05:00,690 --> 01:05:05,880 Well, it turns out it's just x dot minus f of xu, 886 01:05:05,880 --> 01:05:07,220 which is my forward dynamics. 887 01:05:10,140 --> 01:05:14,550 So if I've integrated my system forward in time, 888 01:05:14,550 --> 01:05:16,470 then this thing's going to be true, 889 01:05:16,470 --> 01:05:19,680 and [INAUDIBLE] steady state with respect to changes in p.
890 01:05:43,610 --> 01:05:48,380 OK, let's look at the changes with respect to x. 891 01:05:48,380 --> 01:05:53,000 So to get the contributions from x correct, 892 01:05:53,000 --> 01:05:55,133 we first need to worry about this x dot. 893 01:05:55,133 --> 01:05:57,550 We don't want to have that x dot floating around in there, 894 01:05:57,550 --> 01:05:59,758 so let's integrate by parts to get that out of there. 895 01:06:04,400 --> 01:06:09,230 We're going to look at this partial H, partial x-- 896 01:06:09,230 --> 01:06:12,140 the variations-- but first, integrate by parts. 897 01:06:30,890 --> 01:06:39,260 The integral of my p of t x dot t dt from 0 to T 898 01:06:39,260 --> 01:06:47,600 is just going to be p of T, x of T minus p0, 899 01:06:47,600 --> 01:06:58,520 x0 minus the integral capital T of p dot t x of t-- 900 01:06:58,520 --> 01:06:59,600 I forgot my transpose-- 901 01:07:03,120 --> 01:07:07,710 dt-- basic integration by parts. 902 01:07:07,710 --> 01:07:14,840 OK, if I now take my variation in partial x, having used that, 903 01:07:14,840 --> 01:07:16,370 then what I get is-- 904 01:07:41,080 --> 01:07:51,190 which gives me-- which, in this case, was partial g, 905 01:07:51,190 --> 01:07:56,665 partial x transpose plus partial f, partial x transpose p of t. 906 01:08:07,772 --> 01:08:09,730 My goal is to show you enough of the derivation 907 01:08:09,730 --> 01:08:12,093 that you understand what these terms are, 908 01:08:12,093 --> 01:08:14,260 and not so much to get completely bogged down in it. 909 01:08:14,260 --> 01:08:17,529 If you want a good treatment, a careful treatment, 910 01:08:17,529 --> 01:08:26,479 you should see Bertsekas' optimal control book. 911 01:08:35,702 --> 01:08:37,160 When I say careful treatment there, 912 01:08:37,160 --> 01:08:39,930 it's going to be 5 pages or more at least.
913 01:08:43,430 --> 01:08:50,420 OK, so Pontryagin's minimum principle says that, 914 01:08:50,420 --> 01:08:54,290 if my constraint is satisfied to x dot, 915 01:08:54,290 --> 01:08:59,600 and if I can just integrate back to this p dot backwards in time 916 01:08:59,600 --> 01:09:02,300 from some final conditions-- which are the same basic 917 01:09:02,300 --> 01:09:05,660 variation argument that drives this-- 918 01:09:05,660 --> 01:09:08,000 then I found Lagrange multipliers which 919 01:09:08,000 --> 01:09:11,740 satisfy that constraint. 920 01:09:11,740 --> 01:09:14,710 And the final variation says that u had better 921 01:09:14,710 --> 01:09:17,770 be the minimum of h of x. 922 01:09:17,770 --> 01:09:20,770 That puts me at a local minimum in 923 01:09:20,770 --> 01:09:22,452 my constrained optimization-- 924 01:09:26,000 --> 01:09:27,830 big pill to swallow, but this is the way 925 01:09:27,830 --> 01:09:36,535 we're going to show that the brick solution is optimal. 926 01:09:36,535 --> 01:09:37,910 People are much more enthusiastic 927 01:09:37,910 --> 01:09:42,643 when there's bricks. 928 01:09:42,643 --> 01:09:43,550 It's OK. 929 01:09:43,550 --> 01:09:44,608 I understand. 930 01:10:06,100 --> 01:10:12,010 OK, so let's turn the crank and use this tool now to see what 931 01:10:12,010 --> 01:10:13,510 the heck-- 932 01:10:13,510 --> 01:10:18,520 see if we can verify our original bang-bang policy 933 01:10:18,520 --> 01:10:19,210 is optimal. 934 01:10:32,060 --> 01:10:32,920 So for the bang-- 935 01:10:37,030 --> 01:10:44,932 Pontryagin bang-bang double integrator-- 936 01:11:00,550 --> 01:11:02,550 what's the [INAUDIBLE] look like for this thing? 937 01:11:05,840 --> 01:11:08,240 g of x, u we said was just 1-- 938 01:11:14,040 --> 01:11:19,050 and p transpose times x dot minus f of x, u. 939 01:11:26,112 --> 01:11:27,820 I'll just write it out in element form. 940 01:11:27,820 --> 01:11:29,130 It's so simple.
941 01:11:29,130 --> 01:11:37,850 It's p1 times q dot plus p2 times u. 942 01:11:41,210 --> 01:11:41,710 OK? 943 01:11:58,110 --> 01:12:01,420 If we had derived our bang-bang controller just like this, 944 01:12:01,420 --> 01:12:03,810 then we could actually immediately say, 945 01:12:03,810 --> 01:12:06,360 what's the optimal control solution? 946 01:12:06,360 --> 01:12:21,020 If I want to take u star as the argmin, u in negative 1 947 01:12:21,020 --> 01:12:31,020 to 1, of h of x, u, p-- 948 01:12:31,020 --> 01:12:33,035 and what's it going to be, just looking at what 949 01:12:33,035 --> 01:12:34,160 I've got on the board here? 950 01:12:40,600 --> 01:12:44,800 This is a good time to make sure you get it. 951 01:12:53,215 --> 01:12:55,210 AUDIENCE: [INAUDIBLE] 952 01:12:55,210 --> 01:12:57,290 RUSS TEDRAKE: Yeah-- good. 953 01:12:57,290 --> 01:13:01,411 So these terms are-- 954 01:13:01,411 --> 01:13:03,910 have no impact. 955 01:13:03,910 --> 01:13:07,550 If p2 is positive, and I want to minimize this thing, 956 01:13:07,550 --> 01:13:09,280 then u better be negative-- 957 01:13:09,280 --> 01:13:12,130 as negative as possible. 958 01:13:12,130 --> 01:13:15,250 Negative as possible means negative 1. 959 01:13:15,250 --> 01:13:19,580 And if p2 is negative, then u should 960 01:13:19,580 --> 01:13:23,740 be as positive as possible to minimize that thing. 961 01:13:23,740 --> 01:13:26,260 So it turns out our same policy that we worked hard 962 01:13:26,260 --> 01:13:31,570 for in the Pontryagin, in terms of the Lagrange multipliers, 963 01:13:31,570 --> 01:13:35,320 works out to just be p2-- 964 01:13:35,320 --> 01:13:37,675 sign of p2 of t-- 965 01:13:37,675 --> 01:13:38,550 negative sign, sorry. 966 01:13:41,190 --> 01:13:45,128 So the sign function is just 1 if it's greater than 0, 967 01:13:45,128 --> 01:13:46,420 negative 1 if it's less than 0.
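A minimal Python sketch of that minimization, assuming the Hamiltonian written on the board is 1 + p1 times q dot + p2 times u. The function names and the grid search are illustrative assumptions, not something from the lecture. The grid search only confirms that minimizing over u in [-1, 1] lands on the bound picked out by negative sign of p2.

```python
def hamiltonian(x, u, p):
    """H = 1 + p1*qdot + p2*u for the minimum-time double integrator."""
    q, qdot = x
    p1, p2 = p
    return 1.0 + p1 * qdot + p2 * u


def u_star(p2):
    """Closed-form minimizer of H over u in [-1, 1]: u = -sign(p2)."""
    if p2 > 0:
        return -1.0
    if p2 < 0:
        return 1.0
    # p2 == 0: H does not depend on u, so any u minimizes it.
    # Returning 0.0 here is an arbitrary choice (assumption).
    return 0.0


def u_brute_force(x, p, n=201):
    """Grid search over u in [-1, 1], used only as a cross-check."""
    candidates = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return min(candidates, key=lambda u: hamiltonian(x, u, p))
```

Because H is linear in u, the minimum always sits on one of the two bounds unless p2 is exactly 0, which is why only the sign of p2 matters.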
968 01:13:51,310 --> 01:13:53,630 My equations for p-- 969 01:13:53,630 --> 01:13:55,880 which, if I didn't use the word adjoint equations yet, 970 01:13:55,880 --> 01:13:57,970 I should have-- 971 01:13:57,970 --> 01:14:00,923 these equations for p are called the adjoint equations. 972 01:14:17,200 --> 01:14:20,620 My equations for p are pretty painless. 973 01:14:20,620 --> 01:14:28,792 So p1 dot is negative partial h, partial x1. 974 01:14:31,750 --> 01:14:34,293 x1 is q, so-- 975 01:14:34,293 --> 01:14:35,210 doesn't appear at all. 976 01:14:35,210 --> 01:14:36,965 That's 0. 977 01:14:36,965 --> 01:14:39,340 So that Lagrange multiplier isn't going to change at all. 978 01:14:39,340 --> 01:14:42,370 That's pretty painless. 979 01:14:42,370 --> 01:14:52,120 And p2 dot is negative partial h, partial x2, 980 01:14:52,120 --> 01:14:55,600 and that's negative p1 t. 981 01:15:10,440 --> 01:15:18,870 OK, so it turns out that p1-- 982 01:15:18,870 --> 01:15:21,750 my Lagrange multiplier's just going to be some constant. 983 01:15:21,750 --> 01:15:28,770 And p2 of t is just going to be the integral of that constant-- 984 01:15:28,770 --> 01:15:30,740 c2 plus c1 times t. 985 01:15:37,820 --> 01:15:42,010 Try and debate how much to squeeze 986 01:15:42,010 --> 01:15:43,200 in the next few minutes-- 987 01:16:05,540 --> 01:16:07,400 you know what, let's-- 988 01:16:07,400 --> 01:16:09,830 let me do it tomorrow for real-- or on Thursday for real, 989 01:16:09,830 --> 01:16:11,930 because I don't-- because it's going to take 10 minutes 990 01:16:11,930 --> 01:16:13,760 to finish, but it's worth doing it right. 991 01:16:13,760 --> 01:16:19,170 So for homework, to yourself, see if you can work it through.
992 01:16:19,170 --> 01:16:21,290 Take the equations that we had and show 993 01:16:21,290 --> 01:16:24,050 that these-- those four conditions are satisfied, 994 01:16:24,050 --> 01:16:26,270 and I'll spend the first 10 minutes of class doing it 995 01:16:26,270 --> 01:16:27,210 properly on Thursday. 996 01:16:27,210 --> 01:16:29,750 I don't want to rush through it and have it mean nothing. 997 01:16:33,410 --> 01:16:34,280 Sorry. 998 01:16:34,280 --> 01:16:35,172 I wrote 3 here. 999 01:16:35,172 --> 01:16:37,130 There's a condition-- a final condition on p2-- 1000 01:16:37,130 --> 01:16:40,880 so three conditions by this number. 1001 01:16:40,880 --> 01:16:42,500 Awesome.
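One way to sanity-check the exercise numerically is sketched below in Python. Everything specific here is an assumption made for illustration: the start state (q = 1, q dot = 0), the horizon of 2 seconds, and the hand-picked costate constants c1 = c2 = 1. The costate is written as p2(t) = c2 - c1 t, following p2 dot = negative p1 with p1 = c1. Since p2 is a straight line in t, it changes sign at most once, so the control u = negative sign of p2 switches at most once.

```python
def simulate(c1=1.0, c2=1.0, q0=1.0, qdot0=0.0, t_final=2.0, dt=1e-3):
    """Roll out q'' = u with u = -sign(p2(t)), where p2(t) = c2 - c1*t.

    All default values are hypothetical, chosen by hand for this one
    start state. Returns the final q, the final qdot, and the number
    of times the control switched.
    """
    steps = int(round(t_final / dt))
    q, qdot = q0, qdot0
    switches = 0
    prev_u = None
    for k in range(steps):
        t = k * dt
        p2 = c2 - c1 * t                  # affine costate
        u = -1.0 if p2 > 0 else 1.0       # bang-bang; a tie at p2 == 0 goes to +1
        if prev_u is not None and u != prev_u:
            switches += 1
        prev_u = u
        # Exact update for a constant u over one step of the double integrator.
        q += qdot * dt + 0.5 * u * dt * dt
        qdot += u * dt
    return q, qdot, switches


q_final, qdot_final, n_switches = simulate()
```

With these hand-picked constants the brick decelerates for the first second, accelerates for the second, and should end close to the origin with a single switch. Other start states would need different constants and a different final time.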