1 00:00:00,000 --> 00:00:02,520 The following content is provided under a Creative 2 00:00:02,520 --> 00:00:03,970 Commons license. 3 00:00:03,970 --> 00:00:06,330 Your support will help MIT OpenCourseWare 4 00:00:06,330 --> 00:00:10,660 continue to offer high-quality educational resources for free. 5 00:00:10,660 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,170 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,170 --> 00:00:18,370 at ocw.mit.edu. 8 00:00:21,672 --> 00:00:22,380 RUSS TEDRAKE: OK. 9 00:00:22,380 --> 00:00:23,100 Welcome back. 10 00:00:26,010 --> 00:00:27,750 Since we ended abruptly, I want to start 11 00:00:27,750 --> 00:00:30,990 with a recap of last time. 12 00:00:30,990 --> 00:00:34,020 And then we've got a lot of new ground to cover. 13 00:00:34,020 --> 00:00:43,350 So remember last time, we considered 14 00:00:43,350 --> 00:00:50,550 the system q double dot equals u, which is of a general form, 15 00:00:50,550 --> 00:00:54,720 just a linear feedback system, which is state space form 16 00:00:54,720 --> 00:01:00,060 looks like this, where it happens that a and b are 17 00:01:00,060 --> 00:01:01,050 particularly simple. 18 00:01:03,780 --> 00:01:06,520 And we looked at designing-- 19 00:01:06,520 --> 00:01:08,020 let's not say designing controller-- 20 00:01:08,020 --> 00:01:10,890 we looked at reshaping the phase space 21 00:01:10,890 --> 00:01:12,850 a couple of different ways. 22 00:01:12,850 --> 00:01:17,700 The first way, which is the sort of 6.302 way, maybe, 23 00:01:17,700 --> 00:01:21,660 would be designing sort of by pole placement, 24 00:01:21,660 --> 00:01:26,610 by designing feedback gains possibly by hand, possibly 25 00:01:26,610 --> 00:01:28,960 by a root locus analysis. 26 00:01:28,960 --> 00:01:44,700 So we looked at manually designing some linear feedback 27 00:01:44,700 --> 00:01:48,750 law, u equals negative Kx. 28 00:01:48,750 --> 00:02:00,040 And we did things like plotting the phase portrait, which 29 00:02:00,040 --> 00:02:16,810 gave us for q, q dot a phase portrait that 30 00:02:16,810 --> 00:02:33,090 looked like this, where this has an eigenvalue of negative 3.75 31 00:02:33,090 --> 00:02:39,330 approximately, and this one had an eigenvalue of negative 0.25. 32 00:02:39,330 --> 00:02:43,815 This was all for K equals 1, 4. 33 00:02:46,440 --> 00:02:47,700 OK. 34 00:02:47,700 --> 00:02:53,130 And we ended up seeing that just from that quick analysis 35 00:02:53,130 --> 00:02:56,220 we could see phase portraits which looked like this. 36 00:02:56,220 --> 00:02:58,680 They come across the origin, and then they'd 37 00:02:58,680 --> 00:03:01,120 hook in towards the goal. 38 00:03:04,350 --> 00:03:05,622 And similarly here. 39 00:03:05,622 --> 00:03:07,830 This one's so much faster that it would go like this. 40 00:03:12,340 --> 00:03:15,570 Then we looked at an optimal control 41 00:03:15,570 --> 00:03:18,930 way of solving the same thing. 
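For readers following along offline, here is a small numerical check (not from the lecture itself) of the closed-loop eigenvalues quoted above for the gain K = [1, 4]; the state ordering x = [q, q dot] and the use of NumPy are my assumptions.

```python
# Minimal check of the closed-loop eigenvalues quoted above: double integrator
# q'' = u with linear feedback u = -K x, state x = [q, qdot], K = [1, 4].
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])        # q'' = u written in state-space form
B = np.array([[0.0],
              [1.0]])
K = np.array([[1.0, 4.0]])

A_cl = A - B @ K                  # closed-loop dynamics x' = (A - B K) x
print(np.linalg.eigvals(A_cl))    # roughly -3.73 and -0.27, the values quoted above
```

The slow eigenvalue (around -0.27) and the fast one (around -3.73) are the two directions visible in the phase portrait.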
42 00:03:18,930 --> 00:03:38,220 We looked at doing a minimum time optimal control approach, 43 00:03:38,220 --> 00:03:41,430 not specifically so that we could get there faster, 44 00:03:41,430 --> 00:03:45,135 even though "minimum time" is in the name, because here 45 00:03:45,135 --> 00:03:47,010 remember, we could get there arbitrarily fast 46 00:03:47,010 --> 00:03:50,190 by just cranking K as high as we wanted, but actually 47 00:03:50,190 --> 00:03:53,250 for trying to do something a little bit smarter, which 48 00:03:53,250 --> 00:03:57,630 is get there in minimum time when I have an extra constraint 49 00:03:57,630 --> 00:04:00,720 that u was bounded, in the case we looked at yesterday 50 00:04:00,720 --> 00:04:04,680 was bounded by negative 1, 1. 51 00:04:04,680 --> 00:04:08,250 And in that case when u was bounded, now 52 00:04:08,250 --> 00:04:10,270 the minimum time problem becomes nontrivial. 53 00:04:10,270 --> 00:04:13,380 It's not just crank the gains to infinity. 54 00:04:13,380 --> 00:04:19,079 And we had to use some better thinking about it. 55 00:04:19,079 --> 00:04:28,980 And the result was a phase portrait which actually, I 56 00:04:28,980 --> 00:04:30,990 don't know if you left realizing, 57 00:04:30,990 --> 00:04:36,880 it didn't look that different in some ways. 58 00:04:36,880 --> 00:04:42,840 Remember, we had these switching surfaces defined here. 59 00:04:46,620 --> 00:04:56,820 And above this, we'd execute one policy, one bang-bang solution. 60 00:04:56,820 --> 00:05:01,500 And then below it we'd execute another one. 61 00:05:01,500 --> 00:05:03,900 And the resulting system trajectories-- remember, 62 00:05:03,900 --> 00:05:08,280 this one hooked down across the origin and went into the goal 63 00:05:08,280 --> 00:05:09,600 like that. 64 00:05:09,600 --> 00:05:13,770 This one really did exactly the same thing, right? 65 00:05:13,770 --> 00:05:15,600 They would start over here. 66 00:05:15,600 --> 00:05:18,030 They'd hook down here with-- 67 00:05:18,030 --> 00:05:21,935 this time they'd explicitly hit that switching surface 68 00:05:21,935 --> 00:05:23,310 and then ride that into the goal. 69 00:05:26,780 --> 00:05:31,230 So it's a little bit of a sharper result, possibly, 70 00:05:31,230 --> 00:05:32,160 than the other one. 71 00:05:32,160 --> 00:05:38,520 And that final surface was a curve instead of this line. 72 00:05:38,520 --> 00:05:43,470 And for that, we got to have good performance 73 00:05:43,470 --> 00:05:45,150 with bounded torques. 74 00:05:48,810 --> 00:05:56,640 Now, we also did the first of two ways 75 00:05:56,640 --> 00:05:59,610 that we're going to use to sort of analytically 76 00:05:59,610 --> 00:06:01,230 investigate optimality. 77 00:06:07,005 --> 00:06:08,255 AUDIENCE: Can I interrupt you? 78 00:06:08,255 --> 00:06:10,410 RUSS TEDRAKE: Anytime, yeah. 79 00:06:10,410 --> 00:06:13,980 AUDIENCE: Was there a good reason we just-- basically said 80 00:06:13,980 --> 00:06:17,544 we want to do linear feedback there? 81 00:06:17,544 --> 00:06:20,730 Could we have done like x1 times x2? 82 00:06:20,730 --> 00:06:22,440 RUSS TEDRAKE: Good, yeah. 83 00:06:22,440 --> 00:06:24,490 Because-- well, there's a lot of good reasons. 84 00:06:24,490 --> 00:06:28,570 So it's because then the closed loop dynamics are linear, 85 00:06:28,570 --> 00:06:32,340 and we can analyze them in every way, including 86 00:06:32,340 --> 00:06:34,500 making these plots in ways that I couldn't have 87 00:06:34,500 --> 00:06:35,970 done if this was nonlinear. 
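As a companion to that picture, here is a minimal sketch of the standard minimum-time switching-curve policy for the double integrator with u bounded in [-1, 1]. The switching surface s = q + (1/2) q dot |q dot| is the textbook result being described; the rollout details (Euler integration, step size, initial condition) are my own choices.

```python
# Sketch of the minimum-time bang-bang policy for q'' = u, |u| <= 1
# (the standard switching-curve result; helper names and rollout are my own).
import numpy as np

def min_time_u(q, qdot):
    """Bang-bang action from the switching surface s = q + 0.5*qdot*|qdot|."""
    s = q + 0.5 * qdot * abs(qdot)
    if s > 0:
        return -1.0               # above the curve: push left
    if s < 0:
        return +1.0               # below the curve: push right
    return -np.sign(qdot)         # on the curve: ride it into the origin

# Forward-Euler rollout from q = -2, qdot = 0: it hooks onto the switching
# curve and then slides along it to the goal.
x = np.array([-2.0, 0.0])
dt = 1e-3
for _ in range(int(5 / dt)):
    u = min_time_u(*x)
    x = x + dt * np.array([x[1], u])
print(x)                          # ends up near the origin
```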
88 00:06:39,523 --> 00:06:41,940 Another answer would be that this is what 90% of the world 89 00:06:41,940 --> 00:06:45,210 would have done, if that's satisfying at all. 90 00:06:45,210 --> 00:06:49,200 I think that's the dominant way of sort 91 00:06:49,200 --> 00:06:51,540 of thinking about these things. 92 00:06:51,540 --> 00:06:55,410 x1 times x2 is comparably much harder to reason about, 93 00:06:55,410 --> 00:06:56,163 actually. 94 00:06:56,163 --> 00:06:57,371 AUDIENCE: I totally get that. 95 00:06:57,371 --> 00:07:00,090 But is there like a system that the optimal control that 96 00:07:00,090 --> 00:07:04,095 lies in the space that you have to take into 97 00:07:04,095 --> 00:07:06,398 account these different approximations. 98 00:07:06,398 --> 00:07:07,190 RUSS TEDRAKE: Good. 99 00:07:07,190 --> 00:07:12,500 So this is an example of a nonlinear controller. 100 00:07:12,500 --> 00:07:14,840 It happens that the actual control action 101 00:07:14,840 --> 00:07:17,810 is either 1 or negative 1. 102 00:07:17,810 --> 00:07:21,500 But the decision plane is very nonlinear. 103 00:07:21,500 --> 00:07:25,620 So that's absolutely a nonlinear controller. 104 00:07:25,620 --> 00:07:26,660 It came out of linear-- 105 00:07:26,660 --> 00:07:29,450 out of optimal control on a linear system. 106 00:07:29,450 --> 00:07:32,040 But the result is a nonlinear controller. 107 00:07:32,040 --> 00:07:32,540 OK. 108 00:07:35,130 --> 00:07:37,048 Now, certain classes of nonlinear controllers 109 00:07:37,048 --> 00:07:39,090 are going to pop out and be easier to think about 110 00:07:39,090 --> 00:07:41,610 than the broad class. 111 00:07:41,610 --> 00:07:44,910 But we're going to see lots of instances as quickly as we can. 112 00:07:50,020 --> 00:07:50,520 OK. 113 00:07:50,520 --> 00:07:55,740 So we did-- we actually got that curve by thinking just about-- 114 00:07:55,740 --> 00:08:00,155 just using our intuition to reason about bang-bang control. 115 00:08:00,155 --> 00:08:01,530 At the end, I started to show you 116 00:08:01,530 --> 00:08:06,360 that the same thing comes out of what I call solution technique 117 00:08:06,360 --> 00:08:09,360 1 here. 118 00:08:13,830 --> 00:08:16,590 I wouldn't call it that outside of the room. 119 00:08:16,590 --> 00:08:19,770 That's just me being clear here, which 120 00:08:19,770 --> 00:08:23,388 was based on Pontryagin's minimum principle. 121 00:08:37,320 --> 00:08:41,674 Which in this case, is nothing more than just-- 122 00:08:41,674 --> 00:08:43,049 let's write it down, exactly what 123 00:08:43,049 --> 00:08:44,299 we mean by this cost function. 124 00:08:47,240 --> 00:08:50,370 We have some-- let me be a little bit more loose. 125 00:08:50,370 --> 00:08:54,000 We have J, some cost function we want to optimize, 126 00:08:54,000 --> 00:09:00,780 which is a finite time integral of 1 dt. 127 00:09:00,780 --> 00:09:06,180 That sounds ridiculous, but we're just optimizing time. 128 00:09:06,180 --> 00:09:11,010 But we want to optimize that subject to the constraints 129 00:09:11,010 --> 00:09:17,520 that x dot equals f of x u, which in this case 130 00:09:17,520 --> 00:09:28,580 is our linear system; and the constraint that u 131 00:09:28,580 --> 00:09:38,240 was negative 1 in that regime; and the constraint that at time 132 00:09:38,240 --> 00:09:40,880 t, x t had better be at the origin. 133 00:09:44,150 --> 00:09:45,950 Given those constraints, we can say 134 00:09:45,950 --> 00:09:54,275 let's minimize T. We're going to minimize that J, sorry. 
135 00:09:54,275 --> 00:09:55,940 I already got the t in there, so. 136 00:09:55,940 --> 00:10:04,490 Minimize with respect to the trajectory in x, u, 137 00:10:04,490 --> 00:10:06,140 that cost function. 138 00:10:06,140 --> 00:10:11,060 I use this overbar to denote the entire time 139 00:10:11,060 --> 00:10:18,680 history of a variable like x t1 to t final, or something 140 00:10:18,680 --> 00:10:20,090 like this-- time t0 to t final. 141 00:10:24,700 --> 00:10:25,270 OK. 142 00:10:25,270 --> 00:10:26,645 That's how we set up the problem. 143 00:10:26,645 --> 00:10:28,680 It's just optimizing some function 144 00:10:28,680 --> 00:10:33,040 but subject to a handful of constraints. 145 00:10:33,040 --> 00:10:36,250 Pontryagin's minimum principle is nothing more 146 00:10:36,250 --> 00:10:39,340 than putting Lagrange multipliers to work 147 00:10:39,340 --> 00:10:41,800 to turn that constrained optimization 148 00:10:41,800 --> 00:10:46,180 into unconstrained optimization. 149 00:10:46,180 --> 00:10:58,270 And for this problem, we can build our augmented system 150 00:10:58,270 --> 00:11:03,540 I'll call J prime here, which just is the same thing 151 00:11:03,540 --> 00:11:05,320 but taking in the constraints. 152 00:11:05,320 --> 00:11:08,670 So first of all, we've got a constraint on x T equaling 0. 153 00:11:08,670 --> 00:11:11,100 So I can put that in as a Lagrange multiplier, 154 00:11:11,100 --> 00:11:14,370 let's say lambda times something that better equal 0, which 155 00:11:14,370 --> 00:11:17,940 in this case was just x t And then 156 00:11:17,940 --> 00:11:26,760 plus 0 to t1 plus the constraint on the dynamics, 157 00:11:26,760 --> 00:11:29,940 which I'll call it a different Lagrange multiplier p, 158 00:11:29,940 --> 00:11:37,517 times f of x, u minus x dot, this whole thing dt. 159 00:11:42,190 --> 00:11:42,760 Yes? 160 00:11:42,760 --> 00:11:43,760 AUDIENCE: How do you impose the constraint 161 00:11:43,760 --> 00:11:44,680 that u is [INAUDIBLE]? 162 00:11:44,680 --> 00:11:45,597 RUSS TEDRAKE: Awesome. 163 00:11:45,597 --> 00:11:46,910 Good question. 164 00:11:46,910 --> 00:11:51,020 So it turns out what we're going to look at-- 165 00:11:51,020 --> 00:11:54,160 we want to verify that this thing is optimal. 166 00:11:54,160 --> 00:11:56,620 So you might want to put that constraint right in here. 167 00:11:56,620 --> 00:11:59,870 But it actually is more natural-- 168 00:11:59,870 --> 00:12:02,570 here, let me finish my statement here. 169 00:12:02,570 --> 00:12:06,920 The way we're going to verify optimality of this policy 170 00:12:06,920 --> 00:12:12,380 is by verifying that we're at a local minimum in J prime. 171 00:12:12,380 --> 00:12:17,090 I want to say that if I change x, If I change u, 172 00:12:17,090 --> 00:12:24,757 if I change p in any admissible way, then J is going to change. 173 00:12:24,757 --> 00:12:26,840 Small changes in here is not going to change this. 174 00:12:26,840 --> 00:12:28,215 I'm at a local minima in J prime. 175 00:12:31,195 --> 00:12:34,398 That's the minimum principle idea, right? 176 00:12:34,398 --> 00:12:36,440 I just want my-- if I'm at a minimum of function, 177 00:12:36,440 --> 00:12:37,820 the gradient is 0. 178 00:12:37,820 --> 00:12:40,220 In the Lagrange multiplier, the minimum 179 00:12:40,220 --> 00:12:43,430 of this augmented function, the gradient had to be 0. 180 00:12:43,430 --> 00:12:46,400 So if I change any of these, I want that to be-- 181 00:12:46,400 --> 00:12:50,420 that change to be 0. 
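Collecting the pieces just described, one way to write the augmented cost is

\[
J' = \int_0^T 1\,dt \;+\; \lambda^T x(T) \;+\; \int_0^T p^T(t)\left[f(x(t),u(t)) - \dot x(t)\right]dt,
\]

and the minimum principle asks that J' be stationary with respect to admissible variations of x(t), u(t) (with u kept inside its bounds), and p(t).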
182 00:12:50,420 --> 00:12:53,660 So it turns out that the more natural way 183 00:12:53,660 --> 00:12:59,352 to look at this bound in u is by not 184 00:12:59,352 --> 00:13:01,810 changing-- not allowing u to change outside of that regime. 185 00:13:07,610 --> 00:13:09,960 This is actually fairly procedural. 186 00:13:09,960 --> 00:13:12,290 So you end up doing this calculus 187 00:13:12,290 --> 00:13:15,530 of variations on J prime. 188 00:13:15,530 --> 00:13:17,998 But I actually-- I made a call earlier today. 189 00:13:17,998 --> 00:13:20,540 I think it's going to-- if I do it right now in the beginning 190 00:13:20,540 --> 00:13:22,040 in class, I'm going to lose you to-- 191 00:13:22,040 --> 00:13:25,280 I mean, I'm going to bore you and lose you. 192 00:13:25,280 --> 00:13:28,160 But it's in the notes, and it's clean. 193 00:13:28,160 --> 00:13:30,380 So I'm going to leave that hanging 194 00:13:30,380 --> 00:13:32,630 and let you look at it in the notes 195 00:13:32,630 --> 00:13:36,770 without typos that I might put up on the board, OK? 196 00:13:36,770 --> 00:13:40,340 Because I want to move on to the dynamic programming 197 00:13:40,340 --> 00:13:43,250 view of the world, sort of the other possible solution 198 00:13:43,250 --> 00:13:44,134 technique. 199 00:13:50,070 --> 00:13:51,500 OK. 200 00:13:51,500 --> 00:13:56,330 So today, we're going to do-- 201 00:13:56,330 --> 00:13:58,580 you can think of it as just solution technique 2 here. 202 00:14:04,550 --> 00:14:08,465 And it's based on dynamic programming. 203 00:14:20,597 --> 00:14:22,430 Now, the computer scientists in the audience 204 00:14:22,430 --> 00:14:24,742 say, I know dynamic programming. 205 00:14:24,742 --> 00:14:27,200 It's how I find the shortest path between point A and point 206 00:14:27,200 --> 00:14:30,200 B without reusing memory, and things like that. 207 00:14:30,200 --> 00:14:31,740 And you're exactly right. 208 00:14:31,740 --> 00:14:33,225 That's exactly what it is. 209 00:14:33,225 --> 00:14:34,850 It happens that the dynamic programming 210 00:14:34,850 --> 00:14:40,580 has a slightly bigger footprint in the world. 211 00:14:40,580 --> 00:14:43,340 There's a continuous form of dynamic programming. 212 00:14:43,340 --> 00:14:44,470 OK. 213 00:14:44,470 --> 00:14:46,490 So a graph search is a very discrete form 214 00:14:46,490 --> 00:14:48,110 of dynamic programming. 215 00:14:48,110 --> 00:14:49,610 So I'm going to start with sort of-- 216 00:14:49,610 --> 00:14:52,152 I'm actually going to work from the graph search sort of view 217 00:14:52,152 --> 00:14:54,350 of the world, but to make the continuous form that 218 00:14:54,350 --> 00:14:58,490 works for these continuous dynamical systems. 219 00:14:58,490 --> 00:15:03,530 And we're going to use this to investigate a different cost 220 00:15:03,530 --> 00:15:30,620 function, which is just this-- 221 00:15:30,620 --> 00:15:42,860 still subject to the dynamics, which in this case 222 00:15:42,860 --> 00:15:44,495 was the linear dynamics. 223 00:15:51,420 --> 00:15:51,920 OK. 224 00:15:56,900 --> 00:15:59,858 So before we worry about solving it, 225 00:15:59,858 --> 00:16:02,150 let's take a minute to decide if it's a reasonable cost 226 00:16:02,150 --> 00:16:02,650 function. 227 00:16:07,040 --> 00:16:10,380 It's different in a couple of ways. 228 00:16:10,380 --> 00:16:14,000 First of all, there's no hard limit on u. 229 00:16:14,000 --> 00:16:17,450 But I do penalize for u being away from 0. 
230 00:16:20,930 --> 00:16:25,760 So it's sort of a softer penalty on u, not a hard limit. 231 00:16:25,760 --> 00:16:28,557 And then these terms are penalizing it 232 00:16:28,557 --> 00:16:30,890 from being-- the system from being away from the origin. 233 00:16:34,160 --> 00:16:36,140 And instead of going for some finite time 234 00:16:36,140 --> 00:16:42,180 and minimizing time, I'm going to go for an infinite horizon. 235 00:16:42,180 --> 00:16:47,270 So the only way to drive this thing, the only way, actually, 236 00:16:47,270 --> 00:16:53,510 for J to be a finite cost over this infinite integral, 237 00:16:53,510 --> 00:16:59,862 is if q and q dot get to 0, and you do u of 0 at 0. 238 00:16:59,862 --> 00:17:01,570 Otherwise, this thing's going to blow up. 239 00:17:01,570 --> 00:17:03,690 It's going to be an infinite integral. 240 00:17:03,690 --> 00:17:05,569 So the solution had better result 241 00:17:05,569 --> 00:17:08,630 in us getting to the origin, it turns out. 242 00:17:08,630 --> 00:17:11,480 But I'm not trying to explicitly minimize the time. 243 00:17:11,480 --> 00:17:13,160 I'm just penalizing it for being away, 244 00:17:13,160 --> 00:17:16,910 and I'm penalizing it for taking action. 245 00:17:16,910 --> 00:17:23,900 Now, what's the name of this type of control? 246 00:17:23,900 --> 00:17:26,180 Who knows is? 247 00:17:26,180 --> 00:17:27,650 I think-- yeah, LQR, right? 248 00:17:27,650 --> 00:17:29,872 So this is a Linear Quadratic Regulator. 249 00:17:40,970 --> 00:17:42,410 OK. 250 00:17:42,410 --> 00:17:44,210 It's a staple of-- 251 00:17:44,210 --> 00:17:49,040 it's sort of the best, most used result from optimal control. 252 00:17:49,040 --> 00:17:52,873 Everybody opens up Matlab and calls lqr. 253 00:17:52,873 --> 00:17:54,290 But you're going to understand it. 254 00:17:59,380 --> 00:17:59,880 Good. 255 00:17:59,880 --> 00:18:05,310 But to do LQR, to understand how that derivation works, 256 00:18:05,310 --> 00:18:07,977 we've got to do-- we're going to go through dynamic programming. 257 00:18:11,386 --> 00:18:14,795 AUDIENCE: Couldn't we use the same cost function 258 00:18:14,795 --> 00:18:15,378 there as well? 259 00:18:15,378 --> 00:18:16,295 RUSS TEDRAKE: Awesome. 260 00:18:16,295 --> 00:18:16,920 OK. 261 00:18:16,920 --> 00:18:19,170 So why don't I put that cost function down and just do 262 00:18:19,170 --> 00:18:22,220 Pontryagin's minimum principal? 263 00:18:22,220 --> 00:18:24,940 There's only one sort of subtle reason, 264 00:18:24,940 --> 00:18:27,930 which is that that's an infinite horizon cost. 265 00:18:31,337 --> 00:18:32,920 So I was going to say this at the end, 266 00:18:32,920 --> 00:18:34,890 but let's have this discussion now. 267 00:18:34,890 --> 00:18:37,470 So this is an infinite horizon. 268 00:18:37,470 --> 00:18:42,360 Pontryagin's is used to verify the optimality 269 00:18:42,360 --> 00:18:44,580 of some finite integral. 270 00:18:48,000 --> 00:18:53,520 So let's compare-- well, I know you know value-- 271 00:18:53,520 --> 00:18:54,850 the dynamic programming. 272 00:18:54,850 --> 00:18:56,400 So maybe let me say what dynamic programming is, 273 00:18:56,400 --> 00:18:57,510 and then I'll contrast them. 274 00:18:57,510 --> 00:18:58,010 Yeah. 275 00:19:07,070 --> 00:19:10,213 But the people sort of-- 276 00:19:10,213 --> 00:19:11,880 I just want to understand what happened. 277 00:19:11,880 --> 00:19:14,000 We got two different cost functions, 278 00:19:14,000 --> 00:19:15,898 two different solution techniques for now. 
279 00:19:15,898 --> 00:19:17,690 And we're going to address in a few minutes 280 00:19:17,690 --> 00:19:20,330 why I did different solution techniques 281 00:19:20,330 --> 00:19:21,840 for the different cost functions. 282 00:19:21,840 --> 00:19:24,260 But I hope they both seem like sort of reasonable cost 283 00:19:24,260 --> 00:19:28,220 functions if I want to get my system to the origin. 284 00:19:28,220 --> 00:19:30,470 Different-- we're going to look at what the result is, 285 00:19:30,470 --> 00:19:31,430 the different results. 286 00:19:31,430 --> 00:19:33,430 And actually, something I want to leave you with 287 00:19:33,430 --> 00:19:36,733 is that you can, in fact, do lots of different combinations 288 00:19:36,733 --> 00:19:37,400 of these things. 289 00:19:37,400 --> 00:19:42,761 You could do quadratic costs and try to have some minimum time. 290 00:19:42,761 --> 00:19:46,740 There's lots and lots of ways to formulate these cost functions. 291 00:19:46,740 --> 00:19:51,950 These are two sort of examples, but you 292 00:19:51,950 --> 00:19:55,480 can do minimum time LQR, you can do all these things. 293 00:19:55,480 --> 00:19:56,078 OK. 294 00:19:56,078 --> 00:19:57,620 But with the way we're going to drive 295 00:19:57,620 --> 00:20:00,380 the LQR controller is by thinking 296 00:20:00,380 --> 00:20:02,060 about dynamic programming. 297 00:20:02,060 --> 00:20:04,370 And to do that, let me start with the discrete world, 298 00:20:04,370 --> 00:20:05,870 where people-- where it makes sense. 299 00:20:08,772 --> 00:20:10,730 So let's imagine I have a discrete time system. 300 00:20:23,120 --> 00:20:30,420 So x of n plus 1 is f of x n u n. 301 00:20:37,626 --> 00:20:39,125 And I have some cost function. 302 00:20:42,180 --> 00:20:44,718 Now remember, in the Pontryagin minimum principle, 303 00:20:44,718 --> 00:20:46,760 which shows that there's a sort of a general form 304 00:20:46,760 --> 00:20:48,320 that a lot of these cost functions 305 00:20:48,320 --> 00:20:56,270 take in the discrete form, it's h of x at capital 306 00:20:56,270 --> 00:21:03,660 N plus a sum instead of an integral of n 307 00:21:03,660 --> 00:21:12,220 equals 0 to N minus 1 g of x n u n. 308 00:21:21,270 --> 00:21:21,770 OK. 309 00:21:24,940 --> 00:21:33,760 Now, again, I said this sort of additive form of cost functions 310 00:21:33,760 --> 00:21:34,745 is pretty common. 311 00:21:34,745 --> 00:21:37,120 And you're going to see right now one of the reasons why. 312 00:21:37,120 --> 00:21:40,150 The great thing about having these costs that 313 00:21:40,150 --> 00:21:43,120 accumulate additively over the trajectory 314 00:21:43,120 --> 00:21:49,600 is that I can make a recursive form of this equation. 315 00:21:49,600 --> 00:21:52,080 So in particular, if I-- 316 00:21:52,080 --> 00:21:54,250 so I should call this, really, what 317 00:21:54,250 --> 00:22:01,390 I've been calling J, that's really the J of being at x 0 318 00:22:01,390 --> 00:22:02,380 at time 0. 
319 00:22:06,850 --> 00:22:12,490 And I can compute J of being at x 0 at time 0 320 00:22:12,490 --> 00:22:16,990 and incurring the rest of the cost recursively 321 00:22:16,990 --> 00:22:20,620 by looking at what it would be like to be at some state 322 00:22:20,620 --> 00:22:21,610 x at time N-- 323 00:22:26,340 --> 00:22:32,267 and that in this case is just h of x of n-- 324 00:22:32,267 --> 00:22:34,350 and then thinking about what it would be like at-- 325 00:22:34,350 --> 00:22:38,070 to be at some J of x N minus 1-- 326 00:22:42,030 --> 00:22:51,210 and that's going to be g of x n minus 1 u of n minus 1 327 00:22:51,210 --> 00:22:53,820 plus h of x n. 328 00:23:03,900 --> 00:23:06,000 Let me be even more careful. 329 00:23:06,000 --> 00:23:07,980 And I'm going to say, let's evaluate 330 00:23:07,980 --> 00:23:17,022 the cost of running a particular policy, 331 00:23:17,022 --> 00:23:21,976 u n is just some pi of J of x n. 332 00:23:21,976 --> 00:23:24,416 AUDIENCE: Sorry, why is the first x a 0, 333 00:23:24,416 --> 00:23:26,856 and then the rest of the x's [INAUDIBLE]?? 334 00:23:30,512 --> 00:23:31,220 RUSS TEDRAKE: OK. 335 00:23:31,220 --> 00:23:34,100 So why did I put x 0 here? 336 00:23:34,100 --> 00:23:35,220 That was intentional. 337 00:23:35,220 --> 00:23:38,512 I'm trying to make x 0 the variable that fits in here. 338 00:23:38,512 --> 00:23:40,220 Here x is the variable that fits in here. 339 00:23:40,220 --> 00:23:42,220 But you're right, I could be a little bit more-- 340 00:23:42,220 --> 00:23:44,110 I should be more careful. 341 00:23:44,110 --> 00:23:48,830 So now J, a function of this variable x at time N 342 00:23:48,830 --> 00:23:50,390 should really just be h of x. 343 00:23:50,390 --> 00:23:51,910 Yeah, good. 344 00:23:51,910 --> 00:23:54,320 So then this is-- 345 00:23:54,320 --> 00:23:56,580 I could say it this way. 346 00:23:56,580 --> 00:24:01,400 The other way I could say it is J x minus 1 equals x. 347 00:24:01,400 --> 00:24:03,320 Maybe that's the best way to rectify it. 348 00:24:11,960 --> 00:24:13,030 OK. 349 00:24:13,030 --> 00:24:20,410 And when I'm evaluating the cost of a particular policy, 350 00:24:20,410 --> 00:24:27,220 I'm going to use the notation J pi here, 351 00:24:27,220 --> 00:24:30,190 say this is the cost I should expect to receive 352 00:24:30,190 --> 00:24:33,370 given I'm in some state x. 353 00:24:33,370 --> 00:24:36,162 To make it even more satisfying, let's just 354 00:24:36,162 --> 00:24:37,120 be the same everywhere. 355 00:24:37,120 --> 00:24:43,675 This is x 0, and here I'll say x 0 equals my x. 356 00:24:46,360 --> 00:24:51,340 If I'm in some state x at time 0 executing policy pi, 357 00:24:51,340 --> 00:24:53,770 I'm going to incur this cost. 358 00:24:53,770 --> 00:24:59,770 If I'm at some state x at time N incurring this-- 359 00:24:59,770 --> 00:25:03,970 taking this policy, I'm going to get this. 360 00:25:03,970 --> 00:25:06,040 Here I'm going to get this. 361 00:25:06,040 --> 00:25:10,420 And even when I'm executing policy pi, 362 00:25:10,420 --> 00:25:13,780 I can even furthermore say that x n 363 00:25:13,780 --> 00:25:21,850 is f of x n minus 1 pi of x n minus 1. 364 00:25:24,980 --> 00:25:26,980 It's probably impossible to read in that corner. 365 00:25:35,570 --> 00:25:36,070 OK. 366 00:25:44,045 --> 00:25:45,670 So you can see where I'm going with it. 
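To keep the time indices straight, the recursion being set up on the board can be written as

\[
J^\pi(x, N) = h(x), \qquad
J^\pi(x, n) = g\big(x, \pi(x, n)\big) + J^\pi\big(f(x, \pi(x, n)),\, n+1\big), \quad n = N-1, \dots, 0,
\]

so that J^\pi(x, 0) is the total cost of executing the policy pi from the initial state x.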
367 00:25:50,740 --> 00:26:00,640 It's pretty easy to see that J pi of x at some N 368 00:26:00,640 --> 00:26:07,330 is just the one-step cost g of x n u 369 00:26:07,330 --> 00:26:24,980 n plus the cost I expect to see given that x n plus 1 at time 370 00:26:24,980 --> 00:26:25,520 equals 1. 371 00:26:41,730 --> 00:26:42,230 OK. 372 00:26:48,630 --> 00:26:54,570 So the reason we like these integral costs or the sum 373 00:26:54,570 --> 00:26:58,560 of costs in the discrete time case 374 00:26:58,560 --> 00:27:03,780 is because I can do these recursive computations. 375 00:27:03,780 --> 00:27:06,150 And the same thing true if I look at-- 376 00:27:06,150 --> 00:27:09,420 if I define what the optimal cost is. 377 00:27:09,420 --> 00:27:15,720 So let's now define J star to be the cost I 378 00:27:15,720 --> 00:27:34,430 incur if I follow the optimal policy, which is pi star. 379 00:27:40,330 --> 00:27:43,060 Well, it turns out the same thing works. 380 00:27:56,440 --> 00:28:00,920 But now, there's an extra term here. 381 00:28:31,870 --> 00:28:32,370 OK. 382 00:28:43,150 --> 00:28:48,250 So it's easy to see that the cost of following 383 00:28:48,250 --> 00:28:52,750 a particular policy is recursive. 384 00:28:52,750 --> 00:28:55,780 It's more surprising that the cost 385 00:28:55,780 --> 00:28:58,930 to go of the optimal policy is equally 386 00:28:58,930 --> 00:29:02,080 recursive with a simple form like this, min over u. 387 00:29:05,500 --> 00:29:07,480 And this actually follows from something called 388 00:29:07,480 --> 00:29:09,234 the principle of optimality. 389 00:29:12,020 --> 00:29:14,500 Anybody see the principle of optimality before? 390 00:29:14,500 --> 00:29:16,700 OK. 391 00:29:16,700 --> 00:29:22,580 It says that if I want to be optimal over some trajectory, 392 00:29:22,580 --> 00:29:25,397 I'd better be optimal over-- 393 00:29:25,397 --> 00:29:26,730 from the end of that trajectory. 394 00:29:26,730 --> 00:29:32,588 So if I want to be optimal for the last-- 395 00:29:32,588 --> 00:29:35,720 it's from n minus 2 to the end, then 396 00:29:35,720 --> 00:29:38,930 I'd better be optimal from n minus 1 to the end. 397 00:29:38,930 --> 00:29:43,340 So it turns out if I act optimally in one step 398 00:29:43,340 --> 00:29:45,500 by doing this min over u, and then follow 399 00:29:45,500 --> 00:29:50,840 the policy of acting optimally for the rest of time, then 400 00:29:50,840 --> 00:29:55,310 that's optimal for the entire function, OK? 401 00:30:17,290 --> 00:30:17,790 OK. 402 00:30:21,490 --> 00:30:24,520 OK, good. 403 00:30:24,520 --> 00:30:27,400 So we've got a recursive form of this cost-to-go function 404 00:30:27,400 --> 00:30:32,920 that we exploited with the additive thing, 405 00:30:32,920 --> 00:30:34,450 the additive form. 406 00:30:34,450 --> 00:30:41,170 And now, the optimal policy comes straight out. 407 00:30:52,540 --> 00:31:03,360 The best thing to do, if you're in state x and a time n. 408 00:31:03,360 --> 00:31:18,510 is just the arg min over u of g x, u plus J star 409 00:31:18,510 --> 00:31:30,270 x, n plus 1 n plus 1 with that same x, n plus 1 defined by-- 410 00:31:44,090 --> 00:31:46,430 So in discrete time, optimal control is trivial. 
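Here is a crude numerical sketch of that backward recursion (this is not from the lecture): value iteration on a coarse grid for the brick q double dot = u with a quadratic one-step cost. The grid sizes, horizon, cost weights, and the snap-to-grid shortcut are all my own choices, just to make the idea concrete.

```python
# Crude backward dynamic programming for the brick q'' = u on a coarse grid.
# Illustration only: grid, horizon, cost weights, and nearest-grid lookup are
# arbitrary choices, not anything from the lecture.
import numpy as np

qs    = np.linspace(-2.0, 2.0, 41)    # grid over q
qds   = np.linspace(-2.0, 2.0, 41)    # grid over q dot
us    = np.linspace(-1.0, 1.0, 9)     # candidate actions
dt, N = 0.1, 50                       # time step and horizon

def snap(grid, v):
    """Index of the nearest grid point (uniform grid assumed)."""
    step = grid[1] - grid[0]
    return np.clip(np.round((v - grid[0]) / step).astype(int), 0, len(grid) - 1)

q_g, qd_g = np.meshgrid(qs, qds, indexing="ij")
J      = np.zeros_like(q_g)           # terminal cost h(x) = 0 here
policy = np.zeros_like(q_g)

for n in range(N - 1, -1, -1):        # march backwards from the final time
    J_new = np.full_like(J, np.inf)
    for u in us:
        q_next  = q_g + dt * qd_g     # one Euler step of x[n+1] = f(x[n], u[n])
        qd_next = qd_g + dt * u
        J_next  = J[snap(qs, q_next), snap(qds, qd_next)]
        cost    = dt * (q_g**2 + qd_g**2 + u**2) + J_next   # one-step cost + cost-to-go
        better  = cost < J_new
        J_new   = np.where(better, cost, J_new)
        policy  = np.where(better, u, policy)
    J = J_new

print(policy[snap(qs, 1.0), snap(qds, 0.0)])   # action at q = 1, q dot = 0 (pushes back toward 0)
```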
411 00:31:46,430 --> 00:31:49,040 If you have an additive cost function, all you have to do 412 00:31:49,040 --> 00:31:52,430 is figure out what your cost is at the end, 413 00:31:52,430 --> 00:31:56,420 and then go back one step, do the thing that acts-- 414 00:31:56,420 --> 00:31:59,060 that in one step minimizes the cost 415 00:31:59,060 --> 00:32:02,130 and gets me to the lowest possible cost in the future. 416 00:32:02,130 --> 00:32:04,130 And if I just do that recursively backwards, 417 00:32:04,130 --> 00:32:05,750 I come up with the optimal policy 418 00:32:05,750 --> 00:32:09,860 that gets me from any x in n steps to the end. 419 00:32:18,590 --> 00:32:21,310 Does that make sense? 420 00:32:21,310 --> 00:32:22,460 Ask questions. 421 00:32:31,100 --> 00:32:31,950 Do people buy that? 422 00:32:31,950 --> 00:32:34,638 Is that obvious, or does that need more explanation? 423 00:32:46,600 --> 00:32:47,900 OK. 424 00:32:47,900 --> 00:32:49,670 Ask questions if you have them. 425 00:32:49,670 --> 00:32:50,170 All right. 426 00:32:50,170 --> 00:32:58,720 So we're going to use the discrete time form again 427 00:32:58,720 --> 00:33:00,130 when we get to the algorithms. 428 00:33:00,130 --> 00:33:02,290 But I'm trying to use it today to leapfrog 429 00:33:02,290 --> 00:33:07,030 into the continuous time conditions for optimality. 430 00:33:10,330 --> 00:33:13,180 So what happens if we now do the same sort of discrete time 431 00:33:13,180 --> 00:33:17,650 thinking, but do it in the limit where the time between steps 432 00:33:17,650 --> 00:33:18,970 goes to 0? 433 00:33:22,270 --> 00:33:24,760 So let me try to do the limiting argument to get us back 434 00:33:24,760 --> 00:33:25,600 to continuous time. 435 00:33:47,512 --> 00:33:48,050 OK. 436 00:33:48,050 --> 00:33:56,860 Now we've got our cost function, again, is h of x at capital T 437 00:33:56,860 --> 00:34:04,575 plus the integral from 0 to T of g x, u dt. 438 00:34:12,199 --> 00:34:14,840 The analogous statement from this recursion 439 00:34:14,840 --> 00:34:23,510 in the discrete time is that J x at t 440 00:34:23,510 --> 00:34:28,520 is going to be a limiting argument as dt 441 00:34:28,520 --> 00:34:45,679 goes to 0 of the min over u of g x, u dt plus J 442 00:34:45,679 --> 00:34:51,590 x of t plus dt t plus dt. 443 00:35:03,240 --> 00:35:04,560 OK. 444 00:35:04,560 --> 00:35:08,220 This is now-- that's just a limiting argument 445 00:35:08,220 --> 00:35:14,970 as dt goes to 0 of the same recursive statement. 446 00:35:14,970 --> 00:35:27,270 I'm going to approximate J x of t plus dt as-- 447 00:35:27,270 --> 00:35:30,870 this is J star let me not forget my stars-- 448 00:35:30,870 --> 00:35:41,850 as J star at x t plus partial J star partial x 449 00:35:41,850 --> 00:35:50,462 x dot dt plus partial J star partial t dt. 450 00:35:56,240 --> 00:35:59,180 It's a Taylor expansion of that term. 451 00:36:18,240 --> 00:36:19,590 OK. 452 00:36:19,590 --> 00:36:26,100 If I insert that back in, then I have J star x of t 453 00:36:26,100 --> 00:36:36,375 equals the limit as dt goes to 0 min over u g 454 00:36:36,375 --> 00:36:46,350 x, u dt plus partial J star partial x-- 455 00:36:46,350 --> 00:36:49,770 x dot is just f of x, u, remember-- 456 00:36:49,770 --> 00:36:59,785 dt plus partial J partial t dt. 457 00:36:59,785 --> 00:37:03,400 And I left off that J x there, because that actually 458 00:37:03,400 --> 00:37:04,330 doesn't depend on u. 459 00:37:04,330 --> 00:37:09,730 So I'm going to put that outside here, plus J x and t. 
460 00:37:22,770 --> 00:37:23,760 Those guys cancel. 461 00:37:26,650 --> 00:37:29,970 And now I've got a dt everywhere. 462 00:37:29,970 --> 00:37:33,420 So I can actually take that out, and my limiting argument 463 00:37:33,420 --> 00:37:35,850 goes away. 464 00:37:35,850 --> 00:37:39,615 And what I'm left with, 0 equals min 465 00:37:39,615 --> 00:37:48,540 over u g of x, u plus partial J partial x star 466 00:37:48,540 --> 00:37:50,790 plus partial J partial t. 467 00:37:55,080 --> 00:37:58,470 This is a very famous equation, will be used a lot. 468 00:38:13,255 --> 00:38:14,880 It's called the Hamilton-Jacobi-Bellman 469 00:38:14,880 --> 00:38:15,653 equation. 470 00:38:15,653 --> 00:38:16,278 AUDIENCE: Russ. 471 00:38:16,278 --> 00:38:17,028 RUSS TEDRAKE: Yes? 472 00:38:17,028 --> 00:38:17,550 Did I miss-- 473 00:38:17,550 --> 00:38:20,250 AUDIENCE: x dot in the middle term there. 474 00:38:20,250 --> 00:38:21,486 RUSS TEDRAKE: Here? 475 00:38:21,486 --> 00:38:23,160 AUDIENCE: Last equation. 476 00:38:23,160 --> 00:38:24,808 That x dot [INAUDIBLE]. 477 00:38:24,808 --> 00:38:25,850 RUSS TEDRAKE: Oh, thanks. 478 00:38:25,850 --> 00:38:26,850 Good. 479 00:38:26,850 --> 00:38:29,700 This is f of x, u. 480 00:38:29,700 --> 00:38:30,200 Good. 481 00:38:30,200 --> 00:38:30,700 Thank you. 482 00:38:42,750 --> 00:38:43,860 Good, thank you. 483 00:38:43,860 --> 00:38:45,390 That is the Hamilton-Jacobi-Bellman 484 00:38:45,390 --> 00:38:48,840 equation, often known as the HJB. 485 00:38:52,200 --> 00:38:57,163 So Hamilton and Jacobi are really old guys. 486 00:38:57,163 --> 00:38:58,080 Bellman's a newer guy. 487 00:38:58,080 --> 00:39:00,010 He was in the '60s or something. 488 00:39:00,010 --> 00:39:02,700 A lot of people say Hamilton-Bellman-Jacobi. 489 00:39:02,700 --> 00:39:04,260 That doesn't seem quite right to me. 490 00:39:04,260 --> 00:39:06,780 That's some guy in the '60s sticking his name 491 00:39:06,780 --> 00:39:08,100 in between Hamilton and Jacobi. 492 00:39:08,100 --> 00:39:10,130 So I try to-- 493 00:39:10,130 --> 00:39:13,110 I will probably say HBJ a couple of times in the class, 494 00:39:13,110 --> 00:39:15,780 but whenever I'm thinking about it I say HJB, OK? 495 00:39:18,640 --> 00:39:19,140 OK. 496 00:39:19,140 --> 00:39:21,140 So we did a little bit of work in discrete time. 497 00:39:21,140 --> 00:39:24,300 But the absolute output of that thinking, 498 00:39:24,300 --> 00:39:26,790 the thing you need to remember, is 499 00:39:26,790 --> 00:39:33,330 this Hamilton-Jacobi-Bellman equation, OK? 500 00:39:33,330 --> 00:39:37,320 These turn out to be the conditions of optimality 501 00:39:37,320 --> 00:39:38,350 for continuous time. 502 00:39:41,410 --> 00:39:42,960 Let's think about what it means. 503 00:39:48,570 --> 00:39:54,540 So do you have yet a picture of this sort of what J is. 504 00:39:54,540 --> 00:39:56,010 J is a cost-to-go. 505 00:39:56,010 --> 00:39:58,170 It's a function over the entire landscape. 506 00:39:58,170 --> 00:40:01,110 It tells me if I'm in some state, how much cost 507 00:40:01,110 --> 00:40:03,120 am I going to incur with my cost function 508 00:40:03,120 --> 00:40:05,910 as it runs off into time. 509 00:40:05,910 --> 00:40:08,490 In the finite horizon case, it's just an integral 510 00:40:08,490 --> 00:40:10,290 to the end of time. 511 00:40:10,290 --> 00:40:12,000 In the infinite horizon case, I've 512 00:40:12,000 --> 00:40:14,417 started this initial condition, and I run my cost function 513 00:40:14,417 --> 00:40:16,290 forever. 
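For reference, the equation just derived, with the correction noted a moment ago so that the middle term carries f(x, u), reads

\[
0 = \min_u \left[\, g(x, u) + \frac{\partial J^*}{\partial x}\, f(x, u) + \frac{\partial J^*}{\partial t} \,\right].
\]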
514 00:40:16,290 --> 00:40:22,330 So j is a cost landscape, a cost-to-go landscape. 515 00:40:22,330 --> 00:40:26,340 This statement here says that, if I 516 00:40:26,340 --> 00:40:33,030 move a little bit in that landscape in x, scale 517 00:40:33,030 --> 00:40:35,175 by this x dot, then the thing I should incur 518 00:40:35,175 --> 00:40:38,190 is that is my instantaneous cost. 519 00:40:38,190 --> 00:40:38,690 OK. 520 00:40:42,300 --> 00:40:44,337 The way my cost landscape-- the difference 521 00:40:44,337 --> 00:40:46,170 of being in initial condition 1 versus being 522 00:40:46,170 --> 00:40:49,740 in initial condition 2, if they're neighboring, 523 00:40:49,740 --> 00:40:53,230 goes like the cost function. 524 00:40:53,230 --> 00:40:57,000 And there's the cost function-- the cost-to-go function lives 525 00:40:57,000 --> 00:41:00,040 in x, and it lives in time. 526 00:41:00,040 --> 00:41:00,540 OK. 527 00:41:04,800 --> 00:41:08,520 It's one of the most important equations we'll have-- 528 00:41:08,520 --> 00:41:11,470 Hamilton-Bellman-Jacobi equation. 529 00:41:11,470 --> 00:41:16,976 AUDIENCE: So we can take out the partial case that [INAUDIBLE]?? 530 00:41:19,910 --> 00:41:22,270 Because that one Is independent of u, the last term. 531 00:41:24,990 --> 00:41:30,770 So if we take that out, basically, 532 00:41:30,770 --> 00:41:34,340 the difference between the value to [INAUDIBLE] with respect 533 00:41:34,340 --> 00:41:38,090 to time, in this time and going to the next time that sort 534 00:41:38,090 --> 00:41:41,022 of seems like a TD error squared-- 535 00:41:41,022 --> 00:41:41,980 RUSS TEDRAKE: Oh, yeah. 536 00:41:41,980 --> 00:41:42,480 Yeah. 537 00:41:42,480 --> 00:41:43,670 Good. 538 00:41:43,670 --> 00:41:46,360 There's absolutely-- this is exactly the source of the TD 539 00:41:46,360 --> 00:41:47,690 error and the Bell-- yeah. 540 00:41:47,690 --> 00:41:49,470 It's exactly the Bellman equation. 541 00:41:49,470 --> 00:41:49,970 So yeah. 542 00:41:49,970 --> 00:41:50,637 So you're right. 543 00:41:50,637 --> 00:41:55,355 Partial J partial t could have been outside the min over u. 544 00:41:55,355 --> 00:41:57,110 It doesn't actually have u. 545 00:41:57,110 --> 00:42:01,490 But we're going to see all those connections 546 00:42:01,490 --> 00:42:03,260 as we get into the algorithms. 547 00:42:03,260 --> 00:42:07,640 But for-- this now is a tool for proving analytically 548 00:42:07,640 --> 00:42:10,370 and driving analytically some optimal controllers. 549 00:42:14,000 --> 00:42:14,990 We need one more-- 550 00:42:17,780 --> 00:42:20,750 we need to say something stronger about how useful 551 00:42:20,750 --> 00:42:21,650 that tool is. 552 00:42:34,290 --> 00:42:45,470 So there's the sufficiency theorem 553 00:42:45,470 --> 00:42:48,000 is what gives this guy teeth, OK? 554 00:42:48,000 --> 00:42:51,360 So I told you that the Pontryagin's minimum principle 555 00:42:51,360 --> 00:42:54,615 was a necessary condition for optimality. 556 00:42:54,615 --> 00:42:56,220 It wasn't necessarily sufficient. 557 00:42:56,220 --> 00:43:00,630 If you show that the system satisfies the Pontryagin's 558 00:43:00,630 --> 00:43:05,383 minimum principle, then you're close, 559 00:43:05,383 --> 00:43:07,800 but you actually also have to say it uniquely solves that, 560 00:43:07,800 --> 00:43:10,350 it's the only solution to that, solves the Pontryagin's 561 00:43:10,350 --> 00:43:11,100 minimum principle. 562 00:43:11,100 --> 00:43:13,637 So there's extra work needed. 
563 00:43:13,637 --> 00:43:15,720 The theorem we're putting up here is this saying-- 564 00:43:15,720 --> 00:43:18,402 is going to say that if this equation is satisfied, 565 00:43:18,402 --> 00:43:19,860 then that's sufficient to guarantee 566 00:43:19,860 --> 00:43:25,690 that the policy is optimal. 567 00:43:25,690 --> 00:43:27,040 OK. 568 00:43:27,040 --> 00:43:50,990 So given a policy pi x of t, and a cost-to go function, 569 00:43:50,990 --> 00:44:07,370 J pi x of t, if pi is the argument of this, 570 00:44:07,370 --> 00:44:37,535 if pi is the policy which minimizes that for all x 571 00:44:37,535 --> 00:45:32,760 and all t, and that condition is met, then we can-- 572 00:45:32,760 --> 00:45:41,160 that's sufficient to give that J pi x of t 573 00:45:41,160 --> 00:45:53,190 equals J pi of x of t and pi x of t pi star x of t. 574 00:46:01,950 --> 00:46:02,450 OK. 575 00:46:10,227 --> 00:46:12,060 The proof of that I'm not even going to try. 576 00:46:12,060 --> 00:46:15,450 It's sort of tedious. 577 00:46:15,450 --> 00:46:17,970 It's in Bertsekas, if you like-- 578 00:46:17,970 --> 00:46:19,680 Bertsekas' book. 579 00:46:19,680 --> 00:46:22,260 But we're going to use this a lot. 580 00:46:26,160 --> 00:46:31,170 So if I can find some combination of J, pi, and pi 581 00:46:31,170 --> 00:46:33,960 that match that condition, then I've found an optimal policy. 582 00:46:40,790 --> 00:46:41,290 OK. 583 00:46:46,000 --> 00:46:49,300 Let's use this to solve the problem we want-- 584 00:47:09,900 --> 00:47:12,330 the linear quadratic regulator in its general form. 585 00:47:26,580 --> 00:47:30,680 So they've got a system x equals Ax plus Bu. 586 00:47:49,340 --> 00:47:57,440 And let's say I have a cost function J of x 0 587 00:47:57,440 --> 00:48:00,230 is h of x, t-- 588 00:48:00,230 --> 00:48:03,570 the same thing I've been writing all day here-- 589 00:48:03,570 --> 00:48:17,270 g of x, u dt, where x 0 equals x, where h in general 590 00:48:17,270 --> 00:48:23,060 takes the form x transpose Qfx, and g 591 00:48:23,060 --> 00:48:29,360 takes the form x transpose Qx plus u transpose Ru. 592 00:48:34,690 --> 00:48:39,550 To make things-- to be careful, we're going to assume that-- 593 00:48:42,070 --> 00:48:44,980 we're going to enforce-- we're choosing the cost function. 594 00:48:44,980 --> 00:48:48,310 We're going to enforce that this is positive definite, making 595 00:48:48,310 --> 00:48:51,580 sure we don't get any negative cost here. 596 00:48:51,580 --> 00:48:59,770 And similarly-- actually, it only has to be semi-definite. 597 00:48:59,770 --> 00:49:03,460 Q transpose equals Q greater than 598 00:49:03,460 --> 00:49:07,930 or equal to 0 and R transpose equals R. 599 00:49:07,930 --> 00:49:10,450 That one does have to be positive. 600 00:49:10,450 --> 00:49:17,690 Definite 601 00:49:17,690 --> 00:49:19,070 OK. 602 00:49:19,070 --> 00:49:24,530 Here's a pretty general linear dynamical system, 603 00:49:24,530 --> 00:49:28,070 quadratic regulator cost. 604 00:49:28,070 --> 00:49:31,970 To satisfy the HBJ, we simply have 605 00:49:31,970 --> 00:49:35,600 to have that this condition-- 606 00:50:00,040 --> 00:50:10,570 so 0 equals min over u x transpose Qx plus u transpose 607 00:50:10,570 --> 00:50:24,340 Ru plus partial J partial x star times Ax plus Bu 608 00:50:24,340 --> 00:50:31,480 plus partial J star partial t, that had better equal 0. 609 00:50:31,480 --> 00:50:35,980 So I need to find that cost-to-go function which 610 00:50:35,980 --> 00:50:37,350 makes this thing 0. 
611 00:50:44,394 --> 00:50:49,460 It turns out the solution to these things, 612 00:50:49,460 --> 00:50:57,260 we can just guess a form for J. Let's guess that J star x of t 613 00:50:57,260 --> 00:51:09,280 is also quadratic, again with a positive-- 614 00:51:09,280 --> 00:51:11,046 it's going to have to be positive. 615 00:51:22,026 --> 00:51:32,090 It could be-- in that case, partial J partial 616 00:51:32,090 --> 00:51:39,860 x is 2x transpose S of t. 617 00:51:43,100 --> 00:51:51,265 Partial J partial t is x transpose s dot t x. 618 00:51:56,540 --> 00:51:57,040 OK. 619 00:52:00,197 --> 00:52:01,250 Let's pop this guy in. 620 00:52:33,053 --> 00:52:34,470 I want to just crank through here. 621 00:52:34,470 --> 00:52:40,080 So does it make sense at all, that the J of x, t 622 00:52:40,080 --> 00:52:42,598 would be a quadratic form like that? 623 00:52:42,598 --> 00:52:43,890 Why is that a reasonable guess? 624 00:52:48,710 --> 00:52:49,760 Yeah. 625 00:52:49,760 --> 00:52:52,256 AUDIENCE: Because the final time [INAUDIBLE] 626 00:52:52,256 --> 00:52:53,612 match the [INAUDIBLE]. 627 00:52:53,612 --> 00:52:54,320 RUSS TEDRAKE: OK. 628 00:52:54,320 --> 00:52:57,080 So in the final time, that's a reasonable guess, 629 00:52:57,080 --> 00:52:58,940 because it started like this. 630 00:53:03,680 --> 00:53:04,590 Yeah. 631 00:53:04,590 --> 00:53:05,340 And it turns out-- 632 00:53:05,340 --> 00:53:08,850 I mean, we're actually going to see it by verification. 633 00:53:08,850 --> 00:53:13,140 But for the linear system, when I pump the cost backwards 634 00:53:13,140 --> 00:53:14,940 in time, this quadratic cost, it's 635 00:53:14,940 --> 00:53:16,426 going to have to stay quadratic. 636 00:53:23,210 --> 00:53:23,720 OK. 637 00:53:23,720 --> 00:53:31,850 So I've got 0 equals min over u x transpose Qx plus u transpose 638 00:53:31,850 --> 00:53:37,700 Ru plus 2x transpose S of t-- 639 00:53:37,700 --> 00:53:51,564 bless you-- times Ax plus Bu plus x transpose S t x. 640 00:53:54,902 --> 00:53:56,360 I need that whole thing to work out 641 00:53:56,360 --> 00:54:02,100 to be 0 for the minimizing u. 642 00:54:02,100 --> 00:54:06,380 So let's figure out what the minimizing u is now. 643 00:54:06,380 --> 00:54:08,450 Is it OK if I just sort of shorthand? 644 00:54:08,450 --> 00:54:11,870 I'll say the gradient of that whole thing 645 00:54:11,870 --> 00:54:16,760 in square brackets with respect to u here is going to be, 646 00:54:16,760 --> 00:54:19,070 what, 2Ru-- 647 00:54:19,070 --> 00:54:20,876 or u transpose R, I guess? 648 00:54:26,330 --> 00:54:29,690 We're going to try to be careful that this whole thing is 649 00:54:29,690 --> 00:54:31,148 a scalar. 650 00:54:31,148 --> 00:54:32,815 We're always talking about scalar costs. 651 00:54:32,815 --> 00:54:35,240 So I've got vectors and matrices going around, 652 00:54:35,240 --> 00:54:38,870 but the whole thing has to collapse to be a scalar. 653 00:54:38,870 --> 00:54:44,870 The gradient of a scalar with respect to a vector, 654 00:54:44,870 --> 00:54:47,090 I want it to always be a vector. 655 00:54:47,090 --> 00:54:50,870 The gradient of a vector with respect to a vector 656 00:54:50,870 --> 00:54:52,220 is going to be a matrix. 657 00:54:52,220 --> 00:54:55,550 So try to be careful about making-- 658 00:54:55,550 --> 00:55:03,020 that gradient better be a vector plus what's left here? 659 00:55:03,020 --> 00:55:07,640 2x transpose S that guy there, right? 660 00:55:13,880 --> 00:55:17,390 But I have to take the transpose of that. 
661 00:55:17,390 --> 00:55:23,480 So it's 2B transpose S of t. 662 00:55:23,480 --> 00:55:27,625 The S t transpose is not x-- 663 00:55:27,625 --> 00:55:29,150 I screwed up, sorry. 664 00:55:29,150 --> 00:55:30,320 It's still x transpose. 665 00:55:30,320 --> 00:55:38,570 I'm trying to-- x transpose S t B. That thing has to equal 0. 666 00:55:41,360 --> 00:55:44,540 And that's where I get my transpose back. 667 00:55:44,540 --> 00:55:50,760 So u star, the u that makes this gradient 0, is going to be-- 668 00:55:50,760 --> 00:55:53,300 those 2's cancel. 669 00:55:53,300 --> 00:55:59,750 It's going to be negative R inverse B transpose 670 00:55:59,750 --> 00:56:02,720 S transpose x. 671 00:56:11,070 --> 00:56:18,170 Which is important to realize that was actually-- 672 00:56:18,170 --> 00:56:22,450 it's equivalent to writing negative 1/2 R inverse 673 00:56:22,450 --> 00:56:28,630 B transpose partial J partial x transpose. 674 00:56:42,050 --> 00:56:43,550 OK. 675 00:56:43,550 --> 00:56:44,900 So what does this mean? 676 00:56:47,760 --> 00:56:52,880 So I've got some quadratic approximation 677 00:56:52,880 --> 00:56:54,890 of my value function. 678 00:56:54,890 --> 00:56:57,530 It's 0 at the origin always and forever. 679 00:56:57,530 --> 00:57:00,170 If I'm at the origin, I'm going to stay at the origin, 680 00:57:00,170 --> 00:57:02,060 my cost-to-go is 0. 681 00:57:02,060 --> 00:57:06,170 The exact shape of the quadratic bowl changes over time. 682 00:57:06,170 --> 00:57:11,150 The best thing to do is to go down 683 00:57:11,150 --> 00:57:13,550 to negative of the partial J partial x 684 00:57:13,550 --> 00:57:16,790 is trying to go down the cost-to-go function. 685 00:57:16,790 --> 00:57:19,250 I want to go down the cost-to-go function as fast as I can. 686 00:57:22,040 --> 00:57:24,560 But I'm going to wait-- 687 00:57:24,560 --> 00:57:27,560 I'm going to change, possibly, the exact direction. 688 00:57:27,560 --> 00:57:31,040 Rather than going straight down the cost-to-go function in x, 689 00:57:31,040 --> 00:57:32,815 I might orient myself a little bit 690 00:57:32,815 --> 00:57:34,190 depending on the weightings I put 691 00:57:34,190 --> 00:57:37,700 on-- the cost I put on the different u's. So I'm 692 00:57:37,700 --> 00:57:41,870 going to rotate that vector a little bit. 693 00:57:41,870 --> 00:57:45,770 This is what I can do, and this is the weighting I've done. 694 00:57:45,770 --> 00:57:48,575 So the best thing to do is to go down your cost-to-go function, 695 00:57:48,575 --> 00:57:50,450 get to the point where my cost-to-go is going 696 00:57:50,450 --> 00:57:54,290 to be as small as possible, filtered 697 00:57:54,290 --> 00:57:56,810 by the direction I can actually go 698 00:57:56,810 --> 00:58:00,650 and twisted by the way I penalize actions. 699 00:58:00,650 --> 00:58:01,150 OK. 700 00:58:04,420 --> 00:58:06,130 And it's sort of amazing, I think, 701 00:58:06,130 --> 00:58:13,180 that the whole thing works out to be just some linear feedback 702 00:58:13,180 --> 00:58:15,340 law negative Kx-- 703 00:58:15,340 --> 00:58:20,230 yet another reason [INAUDIBLE] to use that form. 704 00:58:28,900 --> 00:58:30,250 OK. 705 00:58:30,250 --> 00:58:31,750 Sorry, I should be a little careful. 706 00:58:31,750 --> 00:58:34,210 This is-- it depends on time. 707 00:58:34,210 --> 00:58:36,077 So it's K of t x. 708 00:58:39,732 --> 00:58:40,940 Why should it depend on time? 709 00:58:48,240 --> 00:58:49,745 This is a-- what's that? 710 00:58:49,745 --> 00:58:50,580 AUDIENCE: We switch. 
711 00:58:50,580 --> 00:58:52,200 RUSS TEDRAKE: Because we switch what? 712 00:58:52,200 --> 00:58:53,712 AUDIENCE: The actuation. 713 00:58:53,712 --> 00:58:56,170 RUSS TEDRAKE: There's no hard switch in the actuation here. 714 00:58:56,170 --> 00:58:58,870 This is saying, I'm going to smoothly go down 715 00:58:58,870 --> 00:59:01,853 a value function. 716 00:59:01,853 --> 00:59:03,520 This one isn't the bang-bang controller. 717 00:59:03,520 --> 00:59:06,190 This turns out to be a smooth descent 718 00:59:06,190 --> 00:59:07,780 of some cost-to-go function. 719 00:59:10,530 --> 00:59:11,030 Yeah? 720 00:59:11,030 --> 00:59:15,897 AUDIENCE: The S t equals partial [INAUDIBLE].. 721 00:59:15,897 --> 00:59:18,230 RUSS TEDRAKE: I mean, S of t is time [INAUDIBLE] itself. 722 00:59:18,230 --> 00:59:20,190 AUDIENCE: Yeah, so it [INAUDIBLE].. 723 00:59:20,190 --> 00:59:22,040 RUSS TEDRAKE: So intuitively, why should I 724 00:59:22,040 --> 00:59:25,430 take a different linear control action if I'm at a time 725 00:59:25,430 --> 00:59:28,080 1 versus time 2? 726 00:59:28,080 --> 00:59:30,330 AUDIENCE: Because you're time dependent. 727 00:59:30,330 --> 00:59:32,250 So if you're very close to the final time, 728 00:59:32,250 --> 00:59:34,355 you want to [INAUDIBLE] lots of control, 729 00:59:34,355 --> 00:59:36,480 because you don't have that much time [INAUDIBLE].. 730 00:59:36,480 --> 00:59:38,480 RUSS TEDRAKE: Awesome, yeah. 731 00:59:38,480 --> 00:59:43,020 This is a quirk of having a finite horizon cost function. 732 00:59:43,020 --> 00:59:45,150 In the infinite horizon case, it turns out 733 00:59:45,150 --> 00:59:48,525 you're going to just get a u equals negative Kx, 734 00:59:48,525 --> 00:59:51,360 where K is a variant of time. 735 00:59:51,360 --> 00:59:52,620 But in the time-- 736 00:59:52,620 --> 00:59:56,010 finite horizon problem, there's this quirk, 737 00:59:56,010 --> 00:59:58,170 which is the time ends at some point, 738 00:59:58,170 --> 01:00:00,480 and I have to deal with it. 739 01:00:00,480 --> 01:00:06,000 If the bank closes at 5:00, if I'm here and it's 4:50, 740 01:00:06,000 --> 01:00:07,830 and the bank closes at 5:00, I'm going to-- 741 01:00:07,830 --> 01:00:10,533 I'd better get over there faster than if it was 4:30 742 01:00:10,533 --> 01:00:11,700 and the bank closes at 5:00. 743 01:00:14,970 --> 01:00:17,610 In my mind, actually, there's a lot of problems that are-- 744 01:00:17,610 --> 01:00:19,800 bank closing is a weird one, but there 745 01:00:19,800 --> 01:00:22,440 are a lot of problems that are naturally formulated 746 01:00:22,440 --> 01:00:24,900 as finite horizon problems. 747 01:00:24,900 --> 01:00:26,520 Things-- maybe a pick-and-place. 748 01:00:26,520 --> 01:00:29,392 The minimum time problem was a finite horizon, pick-and-place. 749 01:00:29,392 --> 01:00:31,350 There are a lot of problems which are naturally 750 01:00:31,350 --> 01:00:33,660 formulated as infinite horizon. 751 01:00:33,660 --> 01:00:36,900 I just want to walk as well as I possibly can 752 01:00:36,900 --> 01:00:37,930 for a very long time. 753 01:00:37,930 --> 01:00:40,830 I don't need to get to some place at a certain time. 754 01:00:40,830 --> 01:00:43,170 OK. 755 01:00:43,170 --> 01:00:45,900 But in many ways, the finite horizon time ones 756 01:00:45,900 --> 01:00:47,970 are the weird ones, because you always 757 01:00:47,970 --> 01:00:49,990 have to worry about the end of time approaching. 758 01:00:53,850 --> 01:00:54,682 OK. 
759 01:00:54,682 --> 01:00:56,357 AUDIENCE: How do we get S t? 760 01:00:56,357 --> 01:00:57,690 RUSS TEDRAKE: How do we get S t? 761 01:00:57,690 --> 01:00:58,190 OK. 762 01:01:00,660 --> 01:01:03,930 Well, it's the thing that makes this equation 0. 763 01:01:08,930 --> 01:01:10,010 So what is that thing? 764 01:01:22,150 --> 01:01:24,270 I figured out what the minimizing u is. 765 01:01:27,180 --> 01:01:29,580 I can insert that back in. 766 01:01:29,580 --> 01:01:39,930 So I get now 0 equals Q plus x transpose-- 767 01:01:39,930 --> 01:01:42,420 I'm going to insert u in-- 768 01:01:42,420 --> 01:01:46,200 K-- or I'll do the whole thing, actually-- 769 01:01:46,200 --> 01:01:56,050 S of t B R inverse times R times R inverse. 770 01:01:56,050 --> 01:02:00,090 So I'm going to go ahead and cancel those out. 771 01:02:00,090 --> 01:02:03,846 B transpose S of t x. 772 01:02:07,140 --> 01:02:09,780 And the negative signs, because there's two u's there. 773 01:02:09,780 --> 01:02:13,260 The negative sign didn't get me. 774 01:02:13,260 --> 01:02:22,410 And then plus 2x transpose S of t Ax plus-- 775 01:02:22,410 --> 01:02:42,430 so minus B R inverse B transpose S of t x plus x transpose S dot 776 01:02:42,430 --> 01:02:42,970 of x. 777 01:02:50,420 --> 01:02:54,850 It turns out that this term here should 778 01:02:54,850 --> 01:02:58,220 be the same as that term there, modulo of factor 2. 779 01:02:58,220 --> 01:03:04,000 If you look, it's S, B, R inverse, B transpose, S. 780 01:03:04,000 --> 01:03:12,850 So this one actually, I can just turn that into a minus. 781 01:03:12,850 --> 01:03:13,350 OK. 782 01:03:16,510 --> 01:03:22,010 And it turns out that everything has this x transpose matrix x 783 01:03:22,010 --> 01:03:22,510 form. 784 01:03:25,430 --> 01:03:26,900 So I can actually-- 785 01:03:26,900 --> 01:03:30,350 in order for this thing to be true for all x, 786 01:03:30,350 --> 01:03:35,450 it must be that the matrix inside had better be 0. 787 01:03:35,450 --> 01:03:45,900 So it turns out to be 0 equals Q minus S t B R 788 01:03:45,900 --> 01:04:00,950 inverse B transpose S t plus 2 S t A plus S dot t 789 01:04:00,950 --> 01:04:04,190 had better be equal to 0. 790 01:04:04,190 --> 01:04:06,440 OK. 791 01:04:06,440 --> 01:04:09,560 Now, I made some assumptions to get here. 792 01:04:09,560 --> 01:04:12,200 Know what assumptions I made? 793 01:04:12,200 --> 01:04:15,050 The big one is that I guessed that form of the value 794 01:04:15,050 --> 01:04:16,203 function. 795 01:04:16,203 --> 01:04:17,870 And one of the things I guessed about it 796 01:04:17,870 --> 01:04:20,570 was that it was symmetric. 797 01:04:20,570 --> 01:04:22,640 So let's see if we're looking symmetric. 798 01:04:22,640 --> 01:04:25,400 So Q, we already said, was symmetric. 799 01:04:25,400 --> 01:04:27,200 That's all good. 800 01:04:27,200 --> 01:04:29,780 That guy's nice and symmetric. 801 01:04:29,780 --> 01:04:31,820 That's all good. 802 01:04:31,820 --> 01:04:35,360 So this is the one we have to worry about. 803 01:04:35,360 --> 01:04:36,604 Is that guy symmetric? 804 01:04:42,040 --> 01:04:44,290 It's actually not symmetric like that. 805 01:04:44,290 --> 01:04:54,430 But I can equivalently write it as S t A plus A transpose S t, 806 01:04:54,430 --> 01:04:55,992 since S is symmetric. 807 01:04:59,370 --> 01:05:00,911 And that guy is symmetric. 808 01:05:14,133 --> 01:05:15,300 I said a very strange thing. 
809 01:05:15,300 --> 01:05:17,202 I just said that the matrices are-- 810 01:05:17,202 --> 01:05:19,410 this one is not symmetric, I can write the same thing 811 01:05:19,410 --> 01:05:20,460 as-- it's this. 812 01:05:20,460 --> 01:05:36,540 So what I mean to say is that these are equivalent for all x. 813 01:05:47,750 --> 01:05:53,390 Because this has got to equal this. 814 01:06:02,310 --> 01:06:02,810 OK. 815 01:06:02,810 --> 01:06:13,970 So, good. 816 01:06:13,970 --> 01:06:15,590 OK. 817 01:06:15,590 --> 01:06:16,990 So this equation, which I'm going 818 01:06:16,990 --> 01:06:19,100 to write one more time since it's 819 01:06:19,100 --> 01:06:20,690 an equation that has a name associated 820 01:06:20,690 --> 01:06:22,250 with someone famous-- 821 01:06:22,250 --> 01:06:25,220 deserves a box around it, I guess. 822 01:06:25,220 --> 01:06:28,820 So this is the Riccati equation. 823 01:06:28,820 --> 01:06:33,816 I'm going to move the S over to this side. 824 01:06:33,816 --> 01:06:34,910 It's a Riccati equation. 825 01:06:58,020 --> 01:06:59,520 And I also have that final condition 826 01:06:59,520 --> 01:07:02,820 that you rightly pointed out, where S of capital T 827 01:07:02,820 --> 01:07:03,930 had better equal Qf. 828 01:07:12,390 --> 01:07:17,880 So by direct application of the Hamilton-Jacobi-Bellman 829 01:07:17,880 --> 01:07:27,330 equation, I was able to derive this Riccati equation, which 830 01:07:27,330 --> 01:07:30,720 gives me a solution for the value function. 831 01:07:30,720 --> 01:07:34,590 Because it gives me a final condition on S 832 01:07:34,590 --> 01:07:37,020 and then the governing equation which 833 01:07:37,020 --> 01:07:40,050 integrates the equation backwards from capital T to 0. 834 01:07:45,990 --> 01:07:48,150 And once I have S, remember, we said 835 01:07:48,150 --> 01:07:58,180 that the u was just negative R inverse B transpose S of t x. 836 01:07:58,180 --> 01:07:59,160 So I've got everything. 837 01:07:59,160 --> 01:08:00,955 Once I have S, I have everything. 838 01:08:09,700 --> 01:08:10,200 OK. 839 01:08:10,200 --> 01:08:13,200 So this is one of the absolute fundamental results 840 01:08:13,200 --> 01:08:15,225 in optimal control. 841 01:08:18,930 --> 01:08:26,279 It turns out that if you want to know the infinite horizon 842 01:08:26,279 --> 01:08:28,829 solution to the-- 843 01:08:28,829 --> 01:08:32,100 if you look at the solution as time goes to infinity-- 844 01:08:32,100 --> 01:08:35,460 remember, I wrote my cost function initially was-- 845 01:08:35,460 --> 01:08:41,040 the problem we're trying to solve is an infinite integral. 846 01:08:41,040 --> 01:08:43,140 It turns out that the infinite horizon solution 847 01:08:43,140 --> 01:08:47,910 is the steady-state solution of this equation. 848 01:08:47,910 --> 01:08:51,210 So if you integrate this equation back enough, 849 01:08:51,210 --> 01:08:51,930 it's stable. 850 01:08:51,930 --> 01:08:56,819 It finds a steady state where S dot is 0. 851 01:08:56,819 --> 01:08:59,220 And that solution when S dot equals 852 01:08:59,220 --> 01:09:18,310 0, the S which solves this, that whole thing equal to minus Q, 853 01:09:18,310 --> 01:09:19,705 is the infinite horizon solution. 854 01:09:31,770 --> 01:09:34,260 OK. 855 01:09:34,260 --> 01:09:47,340 If you open up Matlab, and you type lqr A, B, Q, R, 856 01:09:47,340 --> 01:09:50,340 then it's going to output two things. 857 01:09:50,340 --> 01:09:57,060 It outputs K, and it outputs S. Solving this thing is actually 858 01:09:57,060 --> 01:09:57,990 not trivial.
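[A cleaned-up LaTeX statement of what was just derived, assuming the quadratic value-function guess J(x, t) = x^T S(t) x with S(t) symmetric; the same equations follow if the 1/2 convention is used on both the cost and the value function.]

u^*(x,t) = -R^{-1} B^T S(t)\,x

-\dot S(t) = Q + S(t) A + A^T S(t) - S(t) B R^{-1} B^T S(t), \qquad S(T) = Q_f

\text{infinite horizon (steady state, } \dot S = 0\text{):}\quad 0 = Q + S A + A^T S - S B R^{-1} B^T S, \qquad u^* = -R^{-1} B^T S\,x = -Kx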
859 01:09:57,990 --> 01:10:03,060 So how do you solve that for S? 860 01:10:03,060 --> 01:10:09,120 The hard part is it's got this S in both places. 861 01:10:09,120 --> 01:10:13,230 But this is the Lyapunov equation again. 862 01:10:13,230 --> 01:10:15,870 It's so famous, it comes up so pervasively, 863 01:10:15,870 --> 01:10:18,870 that people have really good tools for solving it, 864 01:10:18,870 --> 01:10:20,300 numerical tools for solving it. 865 01:10:20,300 --> 01:10:22,050 So Matlab's got some nice routine in there 866 01:10:22,050 --> 01:10:26,280 to solve, to find S. 867 01:10:26,280 --> 01:10:30,660 And when I call lqr with the dynamics 868 01:10:30,660 --> 01:10:36,960 and the Q, R, it gives me exactly the infinite horizon S 869 01:10:36,960 --> 01:10:41,220 and the infinite horizon, time-invariant K. 870 01:10:41,220 --> 01:10:43,895 If you need to do a finite horizon quadratic regulator, 871 01:10:43,895 --> 01:10:46,020 then you actually need to integrate these equations 872 01:10:46,020 --> 01:10:48,530 yourself. 873 01:10:48,530 --> 01:10:49,590 OK. 874 01:10:49,590 --> 01:10:53,220 I hate going that long with just equations and not intuition. 875 01:10:53,220 --> 01:10:56,175 So let me connect it back to the brick now. 876 01:10:56,175 --> 01:10:59,620 That was the point of doing everything in the brick world 877 01:10:59,620 --> 01:11:00,120 here. 878 01:11:03,630 --> 01:11:05,580 OK. 879 01:11:05,580 --> 01:11:10,140 So we've got Q double dot equals u. 880 01:11:10,140 --> 01:11:16,980 We've got now infinite horizon J of x 881 01:11:16,980 --> 01:11:25,590 is the infinite integral of g of x, u dt, where I said g x, 882 01:11:25,590 --> 01:11:33,015 u was 1/2 Q squared plus 1/2 Q dot squared plus 1/2 u squared. 883 01:11:36,330 --> 01:11:44,235 So now that's exactly in the LQR form. A is 0, 1, 0, 0. 884 01:11:46,830 --> 01:11:50,670 B is 0, 1. 885 01:11:50,670 --> 01:11:56,850 Q is the identity matrix. 886 01:11:56,850 --> 01:11:58,530 And R is 1. 887 01:12:02,550 --> 01:12:04,140 It turns out I can actually solve 888 01:12:04,140 --> 01:12:08,640 that one algebraically for S. If you plug all the symbols in-- 889 01:12:08,640 --> 01:12:11,460 I won't do it because there's a lot of symbols-- 890 01:12:11,460 --> 01:12:15,270 but in a few lines of algebra, you can figure out what S has 891 01:12:15,270 --> 01:12:19,440 to be, just because so many terms drop out with those 892 01:12:19,440 --> 01:12:21,400 0's that actually there's-- 893 01:12:24,130 --> 01:12:28,080 there are three equations and three unknowns. 894 01:12:28,080 --> 01:12:38,070 And it turns out that S has to be square root of 2, 1, 1, 895 01:12:38,070 --> 01:12:38,970 square root of 2. 896 01:12:42,610 --> 01:12:43,110 OK. 897 01:13:03,540 --> 01:13:14,040 The u, remember, was negative R inverse B transpose 898 01:13:14,040 --> 01:13:22,620 S x, which, if I punch those in, 899 01:13:22,620 --> 01:13:28,440 gives me negative 1, square root of 2 times 900 01:13:28,440 --> 01:13:52,370 x, which gives me closed loop dynamics of x dot equals Ax 901 01:13:52,370 --> 01:14:02,150 minus BKx is equal to 0, 1, negative 1, 902 01:14:02,150 --> 01:14:06,520 negative square root of 2 times x. 903 01:14:12,150 --> 01:14:13,410 OK. 904 01:14:13,410 --> 01:14:15,030 Now I'm going to plot two things here. 905 01:14:20,030 --> 01:14:30,790 First thing I'm going to plot is J of x. 906 01:14:30,790 --> 01:14:35,530 J of x is square root of 2, 1, 1, square root of 2.
907 01:14:38,150 --> 01:14:39,650 A little thinking about that, you'll 908 01:14:39,650 --> 01:14:50,490 see that it comes out to be an ellipsoid that is-- 909 01:14:50,490 --> 01:14:58,400 [INAUDIBLE]-- sort of shaped like this. 910 01:15:01,180 --> 01:15:03,670 I draw contours of that function, 911 01:15:03,670 --> 01:15:08,260 of that x transpose S x. 912 01:15:14,305 --> 01:15:17,720 And the cost-to-go is 0 here. 913 01:15:17,720 --> 01:15:25,950 And it's a bowl that comes up in this sort of elliptic shape. 914 01:15:36,110 --> 01:15:36,610 All right. 915 01:15:36,610 --> 01:15:39,460 So what is the optimal policy going to look like, 916 01:15:39,460 --> 01:15:40,780 given that that's my bowl? 917 01:15:45,550 --> 01:15:48,350 We said the best thing to do is go down the steepest descent 918 01:15:48,350 --> 01:15:48,850 of the bowl. 919 01:15:52,070 --> 01:15:53,200 I want to go down-- 920 01:15:53,200 --> 01:15:55,990 wherever I am, I want to go down as fast as I can. 921 01:15:59,730 --> 01:16:01,470 But I can't do it exactly. 922 01:16:01,470 --> 01:16:03,180 That was actually sort of a-- 923 01:16:03,180 --> 01:16:03,720 that's OK. 924 01:16:03,720 --> 01:16:07,110 I mean, I can't do it exactly, because all I'm allowed to do 925 01:16:07,110 --> 01:16:07,980 is change-- 926 01:16:07,980 --> 01:16:11,950 I have one component that I'm not allowed to change, right? 927 01:16:11,950 --> 01:16:19,470 I have that my Q is going to go forward independent of u 928 01:16:19,470 --> 01:16:20,640 directly. 929 01:16:20,640 --> 01:16:25,170 So B transpose S x is going to give me a projection 930 01:16:25,170 --> 01:16:27,810 of that gradient onto this-- 931 01:16:27,810 --> 01:16:30,090 the thing I can actually control, 932 01:16:30,090 --> 01:16:32,400 which way I can point my phase portrait, 933 01:16:32,400 --> 01:16:33,510 given my control. 934 01:16:36,690 --> 01:16:39,480 And then R is going to scale it again. 935 01:16:39,480 --> 01:16:44,080 And the resulting closed loop dynamics, 936 01:16:44,080 --> 01:16:45,580 let's see if we can figure that out. 937 01:16:49,210 --> 01:16:52,500 So if I take the eigenvectors and eigenvalues of that, well, 938 01:16:52,500 --> 01:16:55,290 it turns out I'm not going to make the plot. 939 01:16:55,290 --> 01:17:03,270 My eigenvalues were negative 1 over square root of 2 plus or minus i 1 940 01:17:03,270 --> 01:17:12,280 over square root of 2, with v being 1 over square root of 2. 941 01:17:24,980 --> 01:17:28,750 So the best thing I can possibly do is to go down that-- 942 01:17:28,750 --> 01:17:30,920 if I didn't care about-- 943 01:17:30,920 --> 01:17:32,560 if I didn't worry about penalizing R, 944 01:17:32,560 --> 01:17:34,310 I didn't worry about my control actuation, 945 01:17:34,310 --> 01:17:36,680 would be to go straight down that bowl. 946 01:17:36,680 --> 01:17:38,680 But because I'm scaling things by-- 947 01:17:38,680 --> 01:17:41,180 I'm filtering things by what I can actually control, 948 01:17:41,180 --> 01:17:43,610 and I'm penalizing things by R, the actual response 949 01:17:43,610 --> 01:17:47,280 is a complex response which goes down-- 950 01:17:47,280 --> 01:17:50,000 goes down this bowl and oscillates its way 951 01:17:50,000 --> 01:17:51,230 into the origin. 952 01:18:12,610 --> 01:18:13,860 OK, good. 953 01:18:13,860 --> 01:18:14,860 It was a little painful. 954 01:18:14,860 --> 01:18:21,190 But that is a set of tools that we're 955 01:18:21,190 --> 01:18:24,220 going to lean on when we're making all our algorithms.
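[To connect the board work to something executable: a minimal sketch in Python with NumPy/SciPy, an assumption on my part since the lecture only mentions MATLAB's lqr. Note that the Q below, which penalizes only position and control, is the one that reproduces the S = [sqrt(2), 1; 1, sqrt(2)] and K = [1, sqrt(2)] written on the board; with Q equal to the identity matrix the same computation returns sqrt(3) in place of sqrt(2).]

import numpy as np
from scipy.linalg import solve_continuous_are

# Double integrator ("brick on ice"): q double dot = u, in state-space form.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.0])   # state cost (see the note above about this choice)
R = np.array([[1.0]])     # control cost

# Steady-state (infinite horizon) solution of the Riccati equation:
#   0 = Q + S A + A' S - S B R^-1 B' S
S = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ S)   # K = R^-1 B' S, so the policy is u = -K x

print("S =\n", S)                                    # ~ [[1.414, 1.0], [1.0, 1.414]]
print("K =", K)                                      # ~ [[1.0, 1.414]]
print("eig(A - B K) =", np.linalg.eigvals(A - B @ K))
# ~ -0.707 +/- 0.707j: stable complex poles, the "spiral down the bowl" behavior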
956 01:18:24,220 --> 01:18:28,120 You've now seen a pretty representative sampling 957 01:18:28,120 --> 01:18:30,250 of what people can do analytically 958 01:18:30,250 --> 01:18:32,770 with optimal control. 959 01:18:32,770 --> 01:18:36,040 When you have a linear dynamical system, 960 01:18:36,040 --> 01:18:40,090 and there's a handful of cost functions which you can-- 961 01:18:40,090 --> 01:18:43,480 either by Pontryagin or dynamic programming, 962 01:18:43,480 --> 01:18:46,240 the Hamilton-Jacobi-Bellman sufficiency theorem, 963 01:18:46,240 --> 01:18:50,770 those are really the two big tools that are out there. 964 01:18:50,770 --> 01:18:53,080 In cases, especially for linear systems, 965 01:18:53,080 --> 01:18:56,680 you can analytically come up with optimal control policies 966 01:18:56,680 --> 01:19:00,520 and value functions. 967 01:19:00,520 --> 01:19:01,870 Why did we distinguish the two? 968 01:19:01,870 --> 01:19:06,580 Why did I use one in one place and the other 969 01:19:06,580 --> 01:19:07,720 in the other place? 970 01:19:07,720 --> 01:19:12,100 Well, it turns out the Hamilton-Jacobi-Bellman 971 01:19:12,100 --> 01:19:18,130 sufficiency theorem has in it these partial J 972 01:19:18,130 --> 01:19:21,910 partial x, partial J partial t. 973 01:19:21,910 --> 01:19:27,040 So it's only valid, actually, if partial J partial x is smooth. 974 01:19:29,820 --> 01:19:36,720 The policy we got from minimum time 975 01:19:36,720 --> 01:19:40,336 has this hard nonlinearity in the middle of it. 976 01:19:40,336 --> 01:19:42,630 It turns out that the value function 977 01:19:42,630 --> 01:19:46,560 that you have in the minimum time problem 978 01:19:46,560 --> 01:19:49,140 also has a hard nonlinearity in it. 979 01:19:49,140 --> 01:19:52,260 If I'm here versus here, it's continuous, 980 01:19:52,260 --> 01:19:54,690 but the gradients are not smooth. 981 01:19:54,690 --> 01:19:56,920 The gradient is discontinuous. 982 01:19:56,920 --> 01:20:02,580 So on this cusp, partial J partial x is undefined. 983 01:20:02,580 --> 01:20:04,290 So that's the only reason why I didn't 984 01:20:04,290 --> 01:20:08,640 lean on the sufficiency theorem completely. 985 01:20:08,640 --> 01:20:12,300 How did Pontryagin get around that? 986 01:20:12,300 --> 01:20:15,480 The sufficiency theorem is talking about-- 987 01:20:19,398 --> 01:20:20,730 it's looking at over-- 988 01:20:20,730 --> 01:20:22,860 roughly over the entire state space. 989 01:20:22,860 --> 01:20:27,900 It's looking at variations in the cost-to-go function 990 01:20:27,900 --> 01:20:31,530 as I move in x and in time. 991 01:20:31,530 --> 01:20:35,310 Pontryagin, if you remember, was along a particular trajectory. 992 01:20:35,310 --> 01:20:37,290 It was verifying that a particular trajectory 993 01:20:37,290 --> 01:20:40,380 was locally optimal. 994 01:20:40,380 --> 01:20:42,660 And it turns out in problems like this, 995 01:20:42,660 --> 01:20:48,660 in these bang-bang problems, along a particular trajectory, 996 01:20:48,660 --> 01:20:53,830 my cost-to-go is smooth. 997 01:20:53,830 --> 01:20:55,720 The cost-to-go in the minimum time problem 998 01:20:55,720 --> 01:20:59,560 was just time, right? 999 01:20:59,560 --> 01:21:02,440 So the time I get-- 1000 01:21:02,440 --> 01:21:04,630 the time it takes for me to go from here to here 1001 01:21:04,630 --> 01:21:08,950 is just smoothly decreasing as I get closer, like time.
1002 01:21:08,950 --> 01:21:13,690 Along any trajectory, with these additive costs, 1003 01:21:13,690 --> 01:21:16,180 the value function is going to be smooth. 1004 01:21:16,180 --> 01:21:19,300 But along a non-system trajectory, 1005 01:21:19,300 --> 01:21:21,130 some line like this, partial-- 1006 01:21:21,130 --> 01:21:26,140 if I just look at J, how J varies over x, it's not smooth. 1007 01:21:26,140 --> 01:21:28,180 So Pontryagin is a weaker statement. 1008 01:21:28,180 --> 01:21:31,600 It's a statement about local optimality along a trajectory. 1009 01:21:31,600 --> 01:21:34,120 But it's valid in slightly larger domains, 1010 01:21:34,120 --> 01:21:36,850 because it doesn't rely on value functions 1011 01:21:36,850 --> 01:21:38,678 being smoothly differentiable. 1012 01:21:44,420 --> 01:21:50,660 Now, for the first-order-- 1013 01:21:50,660 --> 01:21:53,090 sorry, for the double integrator, the brick on ice, 1014 01:21:53,090 --> 01:21:55,940 we could have just chosen our K's by hand 1015 01:21:55,940 --> 01:21:58,340 and pushed them higher or lower. 1016 01:21:58,340 --> 01:21:59,150 We could do root locus. 1017 01:21:59,150 --> 01:22:01,070 We could figure out a pretty reasonable set 1018 01:22:01,070 --> 01:22:06,160 of K's, of feedback gains, to make it stabilize to the goal. 1019 01:22:06,160 --> 01:22:09,980 LQR gives us a different set of knobs that we could tune. 1020 01:22:09,980 --> 01:22:12,860 Now we could more explicitly say what 1021 01:22:12,860 --> 01:22:16,303 our concern is for getting to the goal by the Q matrix, 1022 01:22:16,303 --> 01:22:18,470 versus what our concern is about using a lot of control effort 1023 01:22:18,470 --> 01:22:19,310 in the R matrix. 1024 01:22:21,860 --> 01:22:24,050 So maybe that's not very compelling. 1025 01:22:24,050 --> 01:22:25,910 Maybe we just did a lot of work to just 1026 01:22:25,910 --> 01:22:27,493 have a slightly different set of knobs 1027 01:22:27,493 --> 01:22:29,810 to turn when I'm designing my feedback controller. 1028 01:22:29,810 --> 01:22:31,430 But what you're going to see is that, 1029 01:22:31,430 --> 01:22:35,000 for much more complicated systems that are still linear-- 1030 01:22:35,000 --> 01:22:38,660 or linearizations about very complicated systems, 1031 01:22:38,660 --> 01:22:40,490 LQR is going to give you an explicit way 1032 01:22:40,490 --> 01:22:45,140 to design these linear feedback controllers in a way that's 1033 01:22:45,140 --> 01:22:47,060 optimal. 1034 01:22:47,060 --> 01:22:50,570 So we're actually doing a variation of LQR 1035 01:22:50,570 --> 01:22:54,958 now to make an airplane land on a perch, for instance. 1036 01:22:54,958 --> 01:22:56,750 We can-- we're going to use it to stabilize 1037 01:22:56,750 --> 01:23:00,840 the double-inverted pendulum, the Acrobot, around the top. 1038 01:23:00,840 --> 01:23:04,340 So it's going to be a generally more useful tool. 1039 01:23:04,340 --> 01:23:06,860 Down at the brick, double integrator level, 1040 01:23:06,860 --> 01:23:09,110 you can think it's almost just a different set of ways 1041 01:23:09,110 --> 01:23:10,010 to do your root locus. 1042 01:23:12,570 --> 01:23:13,070 OK. 1043 01:23:13,070 --> 01:23:16,580 You have now, through two sort of dry lectures 1044 01:23:16,580 --> 01:23:18,290 relative to the rest of the class, 1045 01:23:18,290 --> 01:23:25,567 learned two ways to do analytical optimal control.
1046 01:23:25,567 --> 01:23:27,650 One is by means of Pontryagin's minimum principle, 1047 01:23:27,650 --> 01:23:29,990 one is by means of dynamic programming, which 1048 01:23:29,990 --> 01:23:34,880 is through the HJB sufficiency theorem. 1049 01:23:34,880 --> 01:23:36,410 And you've seen some representatives 1050 01:23:36,410 --> 01:23:39,860 of what people can do with those analytical optimal control tools. 1051 01:23:39,860 --> 01:23:45,620 And it got us far enough to make a brick go to the origin. 1052 01:23:45,620 --> 01:23:46,310 Right. 1053 01:23:46,310 --> 01:23:48,410 And it'll do a few more things, but. 1054 01:23:48,410 --> 01:23:52,670 OK, so that's about as far as we get with analytics. 1055 01:23:52,670 --> 01:23:56,330 We're going to use this in places to start algorithms up. 1056 01:23:56,330 --> 01:24:02,840 But if we want to, for instance, solve the minimum time problem 1057 01:24:02,840 --> 01:24:05,390 or the quadratic regulator problem, 1058 01:24:05,390 --> 01:24:09,610 for the nonlinear dynamics of the pendulum, 1059 01:24:09,610 --> 01:24:13,540 if I take my x dot equals Ax plus Bu away 1060 01:24:13,540 --> 01:24:18,970 and give it the mgL sine theta, then most of these tools 1061 01:24:18,970 --> 01:24:19,540 break down. 1062 01:24:22,210 --> 01:24:25,780 Next Tuesday happens to be a holiday, virtual Monday. 1063 01:24:25,780 --> 01:24:27,830 So we won't do it on next Tuesday. 1064 01:24:27,830 --> 01:24:30,760 But next Thursday, I'm going to show you algorithms 1065 01:24:30,760 --> 01:24:31,840 that are based on these. 1066 01:24:31,840 --> 01:24:36,400 This is the important foundation for algorithms that are going to solve 1067 01:24:36,400 --> 01:24:40,570 algorithmically the same optimal control problems that we're-- 1068 01:24:40,570 --> 01:24:46,180 more optimal control problems than we can solve analytically. 1069 01:24:46,180 --> 01:24:49,600 And then the-- we'll go on from there 1070 01:24:49,600 --> 01:24:51,990 to more and more complicated systems.
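[Looking ahead to the use of LQR on linearizations mentioned above: a hedged sketch, with illustrative parameters m, l, g, b that are not from the lecture, of linearizing the simple pendulum about its upright fixed point and running LQR on that linearization. This only stabilizes the linearized system near the top; handling the full nonlinear problem is what the upcoming algorithmic lectures address.]

import numpy as np
from scipy.linalg import solve_continuous_are

# Simple pendulum: m l^2 theta_ddot = u - b theta_dot - m g l sin(theta).
# Linearize about the upright fixed point theta = pi, where sin(theta) ~ -(theta - pi).
m, l, g, b = 1.0, 1.0, 9.8, 0.1   # illustrative values only

A = np.array([[0.0, 1.0],
              [g / l, -b / (m * l**2)]])
B = np.array([[0.0],
              [1.0 / (m * l**2)]])
Q = np.eye(2)
R = np.array([[1.0]])

S = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ S)   # policy about the fixed point: u = -K (x - x_upright)

print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
# Both eigenvalues have negative real part, so the linearized upright is stabilized;
# how large a region of the true nonlinear pendulum this covers is a later topic.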