1 00:00:00,000 --> 00:00:02,520 The following content is provided under a Creative 2 00:00:02,520 --> 00:00:03,970 Commons license. 3 00:00:03,970 --> 00:00:06,330 Your support will help MIT OpenCourseWare 4 00:00:06,330 --> 00:00:10,660 continue to offer high-quality educational resources for free. 5 00:00:10,660 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,160 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,160 --> 00:00:18,370 at ocw.mit.edu. 8 00:00:22,060 --> 00:00:23,800 RUSS TEDRAKE: OK, welcome back. 9 00:00:27,020 --> 00:00:31,510 So last week, we spent the week talking about policy search 10 00:00:31,510 --> 00:00:36,130 methods, and trying to make a distinction between those 11 00:00:36,130 --> 00:00:39,040 and the value-based methods we started with. 12 00:00:39,040 --> 00:00:42,100 And by the end of the week, we had a couple 13 00:00:42,100 --> 00:00:46,660 pretty slick methods for optimizing 14 00:00:46,660 --> 00:00:50,390 an open-loop trajectory of the system. 15 00:00:50,390 --> 00:00:52,360 So we talked about at least two ways. 16 00:01:15,790 --> 00:01:17,790 So by open-loop, I mean it's a function of time, 17 00:01:17,790 --> 00:01:20,880 not a function of state. 18 00:01:20,880 --> 00:01:30,750 We talked about the shooting methods, 19 00:01:30,750 --> 00:01:47,070 where we evaluated J of alpha x0 times 0 just by simulation. 20 00:01:47,070 --> 00:01:55,495 And we evaluated-- explicitly evaluated the gradients by-- 21 00:01:55,495 --> 00:01:57,120 well, I gave you two algorithms for it. 
22 00:01:57,120 --> 00:02:01,750 I gave you one that I called back prop through time-- 23 00:02:01,750 --> 00:02:03,570 which was an adjoint method-- 24 00:02:03,570 --> 00:02:06,990 and another one that I called RTRL-- 25 00:02:06,990 --> 00:02:08,889 real-time recurrent learning, which 26 00:02:08,889 --> 00:02:10,889 are the names from the neural network community, 27 00:02:10,889 --> 00:02:14,490 but perfectly good names for those methods. 28 00:02:26,292 --> 00:02:27,750 And then the claim was that, if you 29 00:02:27,750 --> 00:02:33,780 can compute those two things by simulation or-- 30 00:02:33,780 --> 00:02:37,470 forward simulation and then a back propagation pass, 31 00:02:37,470 --> 00:02:40,890 or a simulation, which carried also the derivatives forward 32 00:02:40,890 --> 00:02:44,340 in time, then we could hand those gradients 33 00:02:44,340 --> 00:02:50,365 to SNOPT or some other non-linear optimization 34 00:02:50,365 --> 00:02:50,865 package. 35 00:02:54,400 --> 00:02:56,400 And if we're good, we can also lean on SNOPT 36 00:02:56,400 --> 00:02:58,650 to handle things like final value constraints. 37 00:02:58,650 --> 00:03:00,900 If you want to make sure the trajectory succeeds 38 00:03:00,900 --> 00:03:02,610 in getting you exactly to the goal 39 00:03:02,610 --> 00:03:05,193 or if you want to make sure that your torques are never bigger 40 00:03:05,193 --> 00:03:07,500 than some maximum torque allowed, then you 41 00:03:07,500 --> 00:03:10,770 can take advantage of that. 42 00:03:10,770 --> 00:03:14,190 And the second method, remember, was the direct collocation method, 43 00:03:14,190 --> 00:03:17,400 which we often abbreviate as DIRCOL. 44 00:03:21,660 --> 00:03:26,290 And the big idea there was to over-parameterize 45 00:03:26,290 --> 00:03:38,100 our optimization with the open-loop trajectory, but also 46 00:03:38,100 --> 00:03:44,780 the state trajectory, which makes coming up 47 00:03:44,780 --> 00:03:51,485 with gradients simple.
48 00:03:55,610 --> 00:04:05,455 And then I have to enforce the constraint that x of-- 49 00:04:05,455 --> 00:04:06,830 let's say, in discrete time here, 50 00:04:06,830 --> 00:04:11,301 n plus 1 had better be subject to the dynamics-- 51 00:04:17,560 --> 00:04:23,230 so two very similar methods of trying 52 00:04:23,230 --> 00:04:28,673 to compute some open-loop trajectory 53 00:04:28,673 --> 00:04:29,590 as a function of time. 54 00:04:29,590 --> 00:04:33,610 Ultimately, what I care about is a set of actions 55 00:04:33,610 --> 00:04:37,540 that I apply over time that will get me to the goal 56 00:04:37,540 --> 00:04:41,320 or minimize my cost function. 57 00:04:41,320 --> 00:04:44,020 In the case where I explicitly parameterized 58 00:04:44,020 --> 00:04:50,770 an open-loop trajectory, both of these results 59 00:04:50,770 --> 00:04:53,980 in a solution which satisfies the Pontryagin minimum 60 00:04:53,980 --> 00:04:59,323 principle, subject to discretization 61 00:04:59,323 --> 00:05:00,490 errors and things like that. 62 00:05:29,636 --> 00:05:34,053 AUDIENCE: [INAUDIBLE] 63 00:05:34,053 --> 00:05:35,220 RUSS TEDRAKE: We did, right. 64 00:05:35,220 --> 00:05:40,540 So I should say, subject to time discretization. 65 00:05:40,540 --> 00:05:45,130 That's the one place where technically, it 66 00:05:45,130 --> 00:05:47,980 would satisfy a discrete time version of Pontryagin's 67 00:05:47,980 --> 00:05:48,730 minimum principle. 
68 00:05:48,730 --> 00:06:01,895 AUDIENCE: [INAUDIBLE] 69 00:06:01,895 --> 00:06:03,270 RUSS TEDRAKE: You can think of it 70 00:06:03,270 --> 00:06:06,060 whichever way it makes you happier-- 71 00:06:06,060 --> 00:06:09,900 so in fact, the parameters that you hand in-- 72 00:06:09,900 --> 00:06:12,710 maybe it's easier to think of it as a function-- 73 00:06:12,710 --> 00:06:15,210 a discrete function of time, because you're going to hand it 74 00:06:15,210 --> 00:06:20,310 u at certain points in time, and you're going to handle x-- 75 00:06:20,310 --> 00:06:23,760 hand it x at certain points in time. 76 00:06:23,760 --> 00:06:27,300 And this discrete time update can be an Euler integration 77 00:06:27,300 --> 00:06:31,260 or a higher order integration of your continuous dynamics, 78 00:06:31,260 --> 00:06:32,760 but you only satisfy the constraints 79 00:06:32,760 --> 00:06:33,960 of discrete intervals of time. 80 00:06:33,960 --> 00:06:34,460 Yeah. 81 00:06:39,660 --> 00:06:42,940 OK, I did give you a slightly more general-- 82 00:06:42,940 --> 00:06:47,470 I tried to point out that these methods could equally well 83 00:06:47,470 --> 00:06:52,390 compute, find good parameters of a feedback control or something 84 00:06:52,390 --> 00:06:53,800 too. 85 00:06:53,800 --> 00:06:56,770 The simple case was when my parameters alpha 86 00:06:56,770 --> 00:07:00,880 were explicitly my control tape, but more 87 00:07:00,880 --> 00:07:03,460 generally, if you wanted to tune a feedback 88 00:07:03,460 --> 00:07:05,320 controller-- a linear feedback controller, 89 00:07:05,320 --> 00:07:06,580 or a non-linear feedback controller, 90 00:07:06,580 --> 00:07:08,290 or a neural network, or whatever it is, 91 00:07:08,290 --> 00:07:11,500 you can use the same methods to do that. 
92 00:07:11,500 --> 00:07:13,600 I would only make this statement in the case 93 00:07:13,600 --> 00:07:17,740 where the controller specifically 94 00:07:17,740 --> 00:07:26,267 is the open-loop tape, because if I parameterized 95 00:07:26,267 --> 00:07:28,600 my trajectory by some feedback controller, for instance, 96 00:07:28,600 --> 00:07:31,120 then that's going to restrict the policy class. 97 00:07:31,120 --> 00:07:33,250 That's going to restrict the class of tapes 98 00:07:33,250 --> 00:07:36,430 that I can look over, which makes it a more compact, more 99 00:07:36,430 --> 00:07:38,890 efficient way to solve your optimization, 100 00:07:38,890 --> 00:07:41,290 but potentially prevents you from achieving 101 00:07:41,290 --> 00:07:43,150 a perfect minimum. 102 00:07:46,998 --> 00:07:50,110 AUDIENCE: [INAUDIBLE] 103 00:07:50,110 --> 00:07:50,860 RUSS TEDRAKE: Yep. 104 00:07:50,860 --> 00:07:53,163 So by virtue of saying that they satisfy 105 00:07:53,163 --> 00:07:54,580 Pontryagin's minimum principle, we 106 00:07:54,580 --> 00:07:56,860 know that that's only a local optima. 107 00:07:56,860 --> 00:08:01,480 This says that I can't make a small change in u 108 00:08:01,480 --> 00:08:04,000 to get better performance. 109 00:08:04,000 --> 00:08:04,960 Yep. 110 00:08:04,960 --> 00:08:08,692 But it's only a necessary condition for optimality, not 111 00:08:08,692 --> 00:08:09,400 a sufficient one. 112 00:08:11,980 --> 00:08:16,220 But there's a bigger problem with it-- with both of those. 113 00:08:16,220 --> 00:08:19,750 And that's the fact that they're completely 114 00:08:19,750 --> 00:08:23,320 useless in real life, unless I do 115 00:08:23,320 --> 00:08:26,350 one more step, which is to stabilize the trajectory as I 116 00:08:26,350 --> 00:08:28,210 get out. 117 00:08:28,210 --> 00:08:35,530 So finding some open-loop trajectory by these methods, 118 00:08:35,530 --> 00:08:38,740 satisfying Pontryagin's minimum principle-- fine. 
119 00:08:38,740 --> 00:08:42,760 But there's nothing in this process that says, 120 00:08:42,760 --> 00:08:46,750 if I don't-- if I changed my initial conditions by epsilon, 121 00:08:46,750 --> 00:08:48,970 I could completely diverge when I follow-- 122 00:08:48,970 --> 00:08:52,180 when I execute that open-loop trajectory. 123 00:08:52,180 --> 00:08:55,330 If I change my simulation time step by a little bit, 124 00:08:55,330 --> 00:08:56,320 I might diverge. 125 00:08:56,320 --> 00:08:58,610 If I have modeling errors, I might diverge. 126 00:09:01,120 --> 00:09:04,695 So in order to make these useful for a real system, 127 00:09:04,695 --> 00:09:06,070 we have to do another step, which 128 00:09:06,070 --> 00:09:08,350 is trajectory stabilization. 129 00:09:08,350 --> 00:09:11,020 And it actually follows quite naturally from the things 130 00:09:11,020 --> 00:09:13,530 we've already talked about. 131 00:09:13,530 --> 00:09:17,080 OK, so today we're going to give these guys teeth 132 00:09:17,080 --> 00:09:18,896 with a trajectory optimization. 133 00:09:52,160 --> 00:09:54,650 And I'll show you examples of a trajectory 134 00:09:54,650 --> 00:09:56,870 that's optimized beautifully for the pendulum even, 135 00:09:56,870 --> 00:09:59,060 and if I simulate it a little differently back-- 136 00:09:59,060 --> 00:10:00,143 just does the wrong thing. 137 00:10:00,143 --> 00:10:01,550 It never gets to the top. 138 00:10:01,550 --> 00:10:06,460 So we want to get rid of that problem. 139 00:10:09,190 --> 00:10:12,675 OK, so the solution is to design a trajectory stabilization. 140 00:10:19,330 --> 00:10:27,040 Now, for those of you that have been 141 00:10:27,040 --> 00:10:28,893 playing with robots for many years, when 142 00:10:28,893 --> 00:10:30,310 you hear trajectory stabilization, 143 00:10:30,310 --> 00:10:31,360 what do you think of? 
144 00:10:31,360 --> 00:10:33,730 What kind of tricks do people use 145 00:10:33,730 --> 00:10:35,368 for trajectory stabilization? 146 00:10:37,735 --> 00:10:38,860 AUDIENCE: Sliding surfaces. 147 00:10:38,860 --> 00:10:41,830 RUSS TEDRAKE: Sliding surfaces-- that's a good one for-- 148 00:10:41,830 --> 00:10:45,820 [INAUDIBLE] often will design a sliding surface 149 00:10:45,820 --> 00:10:47,470 and squish the error dynamics. 150 00:10:50,800 --> 00:10:53,320 That's actually pretty encompassing. 151 00:10:53,320 --> 00:10:56,380 I think a lot of the trajectory stabilizers 152 00:10:56,380 --> 00:10:59,410 are based on sliding modes or feedback 153 00:10:59,410 --> 00:11:02,440 linearization in some form. 154 00:11:02,440 --> 00:11:06,388 And all I'll say about it is that the story's sort 155 00:11:06,388 --> 00:11:07,930 of the same as everything we've said. 156 00:11:07,930 --> 00:11:10,090 If you have a fully actuated system, 157 00:11:10,090 --> 00:11:14,980 it's not hard to design a trajectory stabilizer. 158 00:11:14,980 --> 00:11:16,930 A good sliding mode controller could take-- 159 00:11:16,930 --> 00:11:21,190 could work even for an underactuated system, 160 00:11:21,190 --> 00:11:22,580 but I think there's a-- 161 00:11:22,580 --> 00:11:25,570 I prefer the linear quadratic form 162 00:11:25,570 --> 00:11:27,610 of these trajectory stabilizers. 163 00:11:27,610 --> 00:11:31,690 OK, so we want to do a trajectory stabilization that's 164 00:11:31,690 --> 00:11:33,410 suitable for underactuated systems. 165 00:11:45,070 --> 00:11:47,200 And the approach is going to be with LQR. 166 00:12:02,490 --> 00:12:03,870 OK, so if we're going to use LQR, 167 00:12:03,870 --> 00:12:06,285 we better be able to linearize our system. 168 00:12:08,920 --> 00:12:11,670 So far, when we've done the linearizations, 169 00:12:11,670 --> 00:12:15,875 we've only done them at fixed points.
170 00:12:15,875 --> 00:12:17,250 So the first thing we have to ask 171 00:12:17,250 --> 00:12:21,060 ourselves is, what happens if we try to linearize at a more 172 00:12:21,060 --> 00:12:23,440 arbitrary point in state space? 173 00:12:23,440 --> 00:12:23,940 Yeah. 174 00:12:31,860 --> 00:12:38,890 So let's say I've got the system x dot equals f of xu, 175 00:12:38,890 --> 00:12:46,200 and now I want to linearize around some x0, u0, but not 176 00:12:46,200 --> 00:12:48,990 necessarily a carefully chosen x0, u0-- 177 00:12:48,990 --> 00:12:51,270 just something random in state space. 178 00:12:54,480 --> 00:12:57,810 The Taylor expansion of this says 179 00:12:57,810 --> 00:13:01,980 that this thing's going to be approximately 180 00:13:01,980 --> 00:13:09,579 f of x0, u0, partial f, partial x evaluated at x. 181 00:13:27,210 --> 00:13:30,720 OK, and we called this before A and this B, 182 00:13:30,720 --> 00:13:49,510 and so that thing we can actually write as, in general, 183 00:13:49,510 --> 00:13:53,332 in the case where f of x0, u0-- if x0, 184 00:13:53,332 --> 00:13:54,790 u0 was a fixed point of the system, 185 00:13:54,790 --> 00:13:59,620 that term disappears, but be careful. 186 00:13:59,620 --> 00:14:02,860 If you're doing your linearization out here, 187 00:14:02,860 --> 00:14:05,230 if you're at-- not at a fixed point, 188 00:14:05,230 --> 00:14:07,180 if you have any velocity, for instance, 189 00:14:07,180 --> 00:14:11,915 then, in the original x-coordinates, 190 00:14:11,915 --> 00:14:14,290 it's not actually-- the Taylor expansion doesn't give you 191 00:14:14,290 --> 00:14:15,370 a linear system. 192 00:14:15,370 --> 00:14:16,810 It gives you some affine system. 193 00:14:16,810 --> 00:14:18,768 This thing is harder to work with-- not 194 00:14:18,768 --> 00:14:20,560 incredibly harder, but harder to work with. 
195 00:14:23,820 --> 00:14:26,230 The solution is quite simple, but I just 196 00:14:26,230 --> 00:14:30,820 wanted to say it the bad way first 197 00:14:30,820 --> 00:14:34,420 so that you appreciate the good way. 198 00:14:34,420 --> 00:14:47,780 If we change coordinates and we use instead 199 00:14:47,780 --> 00:14:57,530 for our coordinates the difference between x and x0 200 00:14:57,530 --> 00:15:27,080 of t, then x bar dot is going to be x dot minus x0 dot 201 00:15:27,080 --> 00:15:37,730 equals x dot minus f of x0, u0, which is that C. 202 00:15:37,730 --> 00:15:45,150 This guy here is taken care of in this new coordinate system, 203 00:15:45,150 --> 00:15:47,900 which allows me to write the whole thing as x 204 00:15:47,900 --> 00:15:51,680 bar dot equals A of x bar. 205 00:16:03,220 --> 00:16:04,740 You with me on that? 206 00:16:04,740 --> 00:16:11,290 Linearizing a system at a more arbitrary point-- 207 00:16:11,290 --> 00:16:14,100 doing a Taylor expansion results in a linear system 208 00:16:14,100 --> 00:16:17,640 only if you change coordinates to lie 209 00:16:17,640 --> 00:16:21,810 on some system trajectory. 210 00:16:21,810 --> 00:16:40,650 So x0, u0 must be a solution of x dot equals f of x, u-- of that equation. 211 00:16:40,650 --> 00:16:46,830 And then the system reduces to a linear system description. 212 00:16:46,830 --> 00:16:51,670 But the cost you pay for this beautiful, simple-- 213 00:16:51,670 --> 00:16:54,600 well, let me be even a little bit more careful. 214 00:16:54,600 --> 00:17:01,690 So A here, this partial f, partial x, 215 00:17:01,690 --> 00:17:07,030 is evaluated at x0 of t, u0 of t. 216 00:17:07,030 --> 00:17:09,160 And in general, A and B in this-- 217 00:17:09,160 --> 00:17:15,603 when I do this are functions of time, as well as x and t. 218 00:17:18,407 --> 00:17:19,740 That's a pretty important point.
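As a sketch, the coordinate change described here can be written out as equations. The board work is not captured in the transcript, so the symbol \bar{u} = u - u_0(t) and the explicit B(t)\bar{u} term are assumptions filled in from the later LQR discussion, where "u bar" and "B" both appear.

```latex
\begin{align*}
\bar{x}(t) &= x(t) - x_0(t), \qquad \bar{u}(t) = u(t) - u_0(t),
  \qquad \dot{x}_0 = f(x_0, u_0) \\
\dot{\bar{x}} &= \dot{x} - \dot{x}_0 = f(x, u) - f(x_0, u_0)
  \approx A(t)\,\bar{x} + B(t)\,\bar{u} \\
A(t) &= \left.\frac{\partial f}{\partial x}\right|_{x_0(t),\,u_0(t)}, \qquad
B(t) = \left.\frac{\partial f}{\partial u}\right|_{x_0(t),\,u_0(t)}
\end{align*}
```

The constant term f(x_0, u_0) from the Taylor expansion cancels against \dot{x}_0 only because (x_0, u_0) is itself a solution of the dynamics, which is why the result is linear rather than affine.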
219 00:17:22,730 --> 00:17:25,760 So if I'm willing to change coordinates to live along 220 00:17:25,760 --> 00:17:29,420 the trajectory, then the result is I can get this linear 221 00:17:29,420 --> 00:17:34,310 time-varying model of the dynamics along feasible 222 00:17:34,310 --> 00:17:35,540 trajectories-- 223 00:17:35,540 --> 00:17:37,460 system trajectories. 224 00:17:37,460 --> 00:17:39,380 The cost is that you have to work 225 00:17:39,380 --> 00:17:43,340 in a coordinate system that moves along your trajectory. 226 00:17:43,340 --> 00:17:46,438 So we'll see where that comes in in a little bit. 227 00:17:46,438 --> 00:17:48,980 But the first question is, OK, let's say I've got this linear 228 00:17:48,980 --> 00:17:51,080 time-varying-- 229 00:17:51,080 --> 00:17:52,550 time-varying linear system. 230 00:17:52,550 --> 00:17:55,940 Can I do all the things I want to do with that? 231 00:17:55,940 --> 00:17:59,690 In most of our control classes, we end up doing LTI systems. 232 00:18:02,780 --> 00:18:05,907 LTV systems-- linear time-varying-- 233 00:18:19,070 --> 00:18:24,680 are actually a fantastically rich class of systems 234 00:18:24,680 --> 00:18:27,470 that we don't talk about enough, I think, in life. 235 00:18:27,470 --> 00:18:29,600 They're still linear systems. 236 00:18:29,600 --> 00:18:33,320 Superposition still holds. 
237 00:18:33,320 --> 00:18:42,210 If I have initial condition 1 and some u 238 00:18:42,210 --> 00:18:48,650 trajectory 1 for t greater than equal to t0, 239 00:18:48,650 --> 00:18:52,850 and that gives me some resulting x trajectory out 240 00:18:52,850 --> 00:19:00,710 for t greater than t0, and I have another solution 241 00:19:00,710 --> 00:19:05,270 with a different initial condition 242 00:19:05,270 --> 00:19:14,760 and a different control, and that gives me a different-- 243 00:19:14,760 --> 00:19:20,240 I call this x1, x2 for t greater than or equal to t0-- 244 00:19:25,420 --> 00:19:28,120 if I have that, then it better be 245 00:19:28,120 --> 00:19:37,960 the case that alpha 1 x1 of t0 plus alpha 2 x2 of t0 246 00:19:37,960 --> 00:19:48,520 plus alpha 1 u1 tape plus alpha 2 u2 tape 247 00:19:48,520 --> 00:19:50,320 is going to result in a trajectory which 248 00:19:50,320 --> 00:19:55,780 is alpha 1 x1 plus alpha 2 x2. 249 00:20:00,040 --> 00:20:01,330 That's superposition. 250 00:20:01,330 --> 00:20:08,470 That's the defining characteristic of linearity. 251 00:20:08,470 --> 00:20:11,290 And even though this is a richer class of systems-- 252 00:20:11,290 --> 00:20:17,200 these A of t, x of t, B of t, u of t-- 253 00:20:17,200 --> 00:20:18,670 superposition still holds. 254 00:20:22,710 --> 00:20:25,560 And in fact, a lot of our derivations 255 00:20:25,560 --> 00:20:28,140 that we've done that are for linear systems still hold. 256 00:20:36,660 --> 00:20:39,600 OK, so now the question is, how do we design-- 257 00:20:39,600 --> 00:20:42,950 how do we work with the fact that this thing is still easy, 258 00:20:42,950 --> 00:20:45,050 and design a controller that works 259 00:20:45,050 --> 00:20:47,320 with this new linearized system? 260 00:21:13,118 --> 00:21:15,660 Maybe first I should break out my colored chalk and make sure 261 00:21:15,660 --> 00:21:16,868 we have intuition about this. 
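A small numerical check of the superposition property stated above may help. This is a minimal sketch, assuming a scalar time-varying system x_dot = a(t) x + b(t) u with made-up coefficient functions; the names `simulate`, `a`, `b`, `c1`, and `c2` are hypothetical (c1 and c2 play the role of the weights "alpha 1" and "alpha 2"). Because a forward-Euler step is itself linear in the state and the input, the gap should be zero up to rounding error.

```python
import math

def simulate(x0, u, a, b, t0=0.0, tf=2.0, n=2000):
    """Forward-Euler rollout of x_dot = a(t) x + b(t) u(t); returns x(tf)."""
    dt = (tf - t0) / n
    x = x0
    for k in range(n):
        t = t0 + k * dt
        x = x + dt * (a(t) * x + b(t) * u(t))
    return x

# Hypothetical time-varying coefficients, chosen only for illustration.
a = lambda t: math.sin(3.0 * t) - 0.5
b = lambda t: 1.0 + 0.5 * math.cos(t)

# Two different initial conditions and two different control tapes.
x1_0, x2_0 = 1.0, -2.0
u1 = lambda t: math.cos(2.0 * t)
u2 = lambda t: t * t
c1, c2 = 0.7, -1.3  # the weights called alpha 1 and alpha 2 in the lecture

x1 = simulate(x1_0, u1, a, b)
x2 = simulate(x2_0, u2, a, b)

# Weighted initial condition and weighted control tape.
u_mix = lambda t: c1 * u1(t) + c2 * u2(t)
x_mix = simulate(c1 * x1_0 + c2 * x2_0, u_mix, a, b)

# Superposition: the mixed rollout matches the same weighting of the rollouts.
superposition_gap = abs(x_mix - (c1 * x1 + c2 * x2))
print(superposition_gap)
```

Replacing the right-hand side with something non-linear in x, for example adding a sin(x) term, makes the gap clearly non-zero.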
262 00:21:21,278 --> 00:21:22,820 Do you understand what this is doing, 263 00:21:22,820 --> 00:21:26,770 if I do this time-varying linearization? 264 00:21:26,770 --> 00:21:29,020 Let me do an example with the pendulum 265 00:21:29,020 --> 00:21:34,480 here, our favorite theta, theta dot. 266 00:21:34,480 --> 00:21:37,390 And let's say we carve up-- 267 00:21:37,390 --> 00:21:43,420 we find some nice solution which gets me from my one fixed point 268 00:21:43,420 --> 00:21:45,425 to the other fixed point. 269 00:21:45,425 --> 00:21:47,800 The ones we were getting were these pump-up trajectories, 270 00:21:47,800 --> 00:21:49,570 which looked something like this. 271 00:21:56,030 --> 00:21:59,990 I'm moving through state space here, 272 00:21:59,990 --> 00:22:05,660 and the dynamics here vary with state in a non-linear way. 273 00:22:05,660 --> 00:22:08,660 But if I have a trajectory, a feasible trajectory 274 00:22:08,660 --> 00:22:12,230 that goes through the relevant parts of state space, 275 00:22:12,230 --> 00:22:15,410 then this time-varying linearization takes 276 00:22:15,410 --> 00:22:20,570 my non-linear system, and makes it parameterized 277 00:22:20,570 --> 00:22:23,883 only-- instead of by being parameterized by state, 278 00:22:23,883 --> 00:22:25,550 it's going to make it parameterized only 279 00:22:25,550 --> 00:22:28,483 by time along the trajectory. 280 00:22:33,060 --> 00:22:41,910 The trick is the trajectory allows 281 00:22:41,910 --> 00:23:01,720 me to reparameterize my non-linearity in terms of time, 282 00:23:01,720 --> 00:23:03,548 instead of state. 283 00:23:03,548 --> 00:23:04,840 It sounds like a simple thing-- 284 00:23:04,840 --> 00:23:06,130 I'm just reparameterizing it-- 285 00:23:06,130 --> 00:23:08,170 but it makes all the difference in the world. 
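To make the pendulum picture concrete, here is a minimal sketch of the linearization evaluated at a few states. It assumes the usual damped pendulum model m l^2 theta_ddot + b theta_dot + m g l sin(theta) = u, which is not written out in this part of the lecture, and the parameter values and the names `f` and `linearize` are hypothetical. The sample angles are illustrative points, not a real swing-up solution.

```python
import math

# Hypothetical pendulum parameters (mass, length, damping, gravity).
MASS, LENGTH, DAMPING, GRAVITY = 1.0, 0.5, 0.1, 9.81

def f(x, u):
    """Assumed pendulum dynamics; x = [theta, theta_dot]."""
    theta, theta_dot = x
    inertia = MASS * LENGTH * LENGTH
    theta_ddot = (u - DAMPING * theta_dot
                  - MASS * GRAVITY * LENGTH * math.sin(theta)) / inertia
    return [theta_dot, theta_ddot]

def linearize(x0, u0):
    """Return A = df/dx and B = df/du evaluated at (x0, u0)."""
    theta = x0[0]
    inertia = MASS * LENGTH * LENGTH
    A = [[0.0, 1.0],
         [-(GRAVITY / LENGTH) * math.cos(theta), -DAMPING / inertia]]
    B = [[0.0], [1.0 / inertia]]
    return A, B

# A varies with where the nominal trajectory is; once the trajectory is
# fixed as a function of time, A becomes a function of time alone.
for theta in (0.0, math.pi / 2.0, math.pi):
    A, B = linearize([theta, 0.0], 0.0)
    print(theta, A[1][0])
```

The lower-left entry of A changes sign between the hanging and upright configurations, which is the state dependence that the trajectory turns into time dependence.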
286 00:23:08,170 --> 00:23:11,110 If things are parameterized as a function of time, 287 00:23:11,110 --> 00:23:13,902 and are otherwise linear, then I could do all kinds 288 00:23:13,902 --> 00:23:14,860 of computation on them. 289 00:23:14,860 --> 00:23:17,290 I can integrate the equations. 290 00:23:17,290 --> 00:23:20,832 I can design quadratic regulators on it. 291 00:23:20,832 --> 00:23:22,540 It makes all the difference in the world. 292 00:23:26,650 --> 00:23:29,530 So what I'm effectively doing is coming up 293 00:23:29,530 --> 00:23:34,360 with local linear representations of the dynamics 294 00:23:34,360 --> 00:23:35,290 along the trajectory. 295 00:23:35,290 --> 00:23:37,630 I'm not sure if this is a helpful way for me to draw it, 296 00:23:37,630 --> 00:23:41,710 but you can think of this thing as approximating the dynamics 297 00:23:41,710 --> 00:23:43,840 along that trajectory. 298 00:23:43,840 --> 00:23:45,760 At every given instant in time, I'm 299 00:23:45,760 --> 00:23:49,600 going to use one of these linear models. 300 00:23:49,600 --> 00:23:51,730 This is supposed to be some plane 301 00:23:51,730 --> 00:23:53,122 that you're driving through-- 302 00:23:53,122 --> 00:23:54,580 not sure if that's actually helpful 303 00:23:54,580 --> 00:23:57,920 graphic, but it's the way I think of it. 304 00:24:01,190 --> 00:24:04,280 And by virtue of taking a particular path through, 305 00:24:04,280 --> 00:24:07,670 I can make locally linear models on which these things 306 00:24:07,670 --> 00:24:09,320 have eigenvectors, and eigenvalues, 307 00:24:09,320 --> 00:24:14,480 or whatever that are valid in the neighborhood 308 00:24:14,480 --> 00:24:18,210 of the trajectory. 309 00:24:18,210 --> 00:24:22,970 So if you can imagine, even without any stabilization, 310 00:24:22,970 --> 00:24:26,000 it could be that I could quickly assess 311 00:24:26,000 --> 00:24:30,170 the stability of my time-varying linear model. 
312 00:24:30,170 --> 00:24:32,150 And trajectories in this linear model 313 00:24:32,150 --> 00:24:35,900 may converge to the nominal limit cycle, 314 00:24:35,900 --> 00:24:40,150 or they may diverge, depending on A and B. 315 00:24:40,150 --> 00:24:42,380 Or they may blow up. 316 00:24:46,250 --> 00:24:48,470 This is by far the more common case, unfortunately. 317 00:24:48,470 --> 00:24:51,020 You'd be very lucky to come out of a shooting method 318 00:24:51,020 --> 00:24:53,570 or a direct collocation method, and end up with a system 319 00:24:53,570 --> 00:24:55,403 where if you played it out, it just happened 320 00:24:55,403 --> 00:24:58,280 to be a stable trajectory. 321 00:24:58,280 --> 00:25:00,260 But we can assess all that quickly 322 00:25:00,260 --> 00:25:05,810 with these time-varying linearizations found locally. 323 00:25:05,810 --> 00:25:06,660 Make sense? 324 00:25:06,660 --> 00:25:07,404 Yeah? 325 00:25:07,404 --> 00:25:12,330 AUDIENCE: [INAUDIBLE] talk about that there 326 00:25:12,330 --> 00:25:13,790 is a bad way of doing this. 327 00:25:13,790 --> 00:25:16,130 This is not a bad way of doing this, right? 328 00:25:16,130 --> 00:25:17,246 We were talking about it. 329 00:25:21,560 --> 00:25:24,560 RUSS TEDRAKE: If I do a Taylor expansion of my system 330 00:25:24,560 --> 00:25:28,040 in the original coordinate system, which is x, 331 00:25:28,040 --> 00:25:30,500 then it's not linear. 332 00:25:30,500 --> 00:25:33,170 End parentheses, that was the bad way to do it. 333 00:25:33,170 --> 00:25:34,610 Yeah? 334 00:25:34,610 --> 00:25:35,750 Good way to do it-- 335 00:25:35,750 --> 00:25:39,500 change the coordinates to a coordinate system, which 336 00:25:39,500 --> 00:25:43,070 moves with the trajectory. 337 00:25:43,070 --> 00:25:46,523 If you do that, things become time-varying linear. 338 00:25:46,523 --> 00:25:48,440 That was a good way to do it, and that's still 339 00:25:48,440 --> 00:25:49,200 in open parentheses.
340 00:25:49,200 --> 00:25:49,970 We're still going. 341 00:25:49,970 --> 00:25:50,470 Yeah. 342 00:26:00,210 --> 00:26:06,360 OK, so our task now is to design a time-varying feedback 343 00:26:06,360 --> 00:26:08,250 controller-- since our model is time-varying, 344 00:26:08,250 --> 00:26:11,970 you'd expect our solution to also be time-varying-- 345 00:26:11,970 --> 00:26:15,715 which takes these bad, unstable trajectories of the system-- 346 00:26:15,715 --> 00:26:16,590 and they really are-- 347 00:26:16,590 --> 00:26:19,350 I'll show you simple pendulum. 348 00:26:19,350 --> 00:26:20,825 This trajectory comes out. 349 00:26:20,825 --> 00:26:22,950 Actually, if you just integrate in a different way, 350 00:26:22,950 --> 00:26:24,930 it'll go off and do the wrong thing. 351 00:26:24,930 --> 00:26:27,520 It typically doesn't go off and add energy to the system 352 00:26:27,520 --> 00:26:28,193 so much. 353 00:26:28,193 --> 00:26:28,860 The ones I get-- 354 00:26:28,860 --> 00:26:31,230 I see, I'll show you, are more-- 355 00:26:31,230 --> 00:26:33,930 they diverge and the other way, and end up just floating around 356 00:26:33,930 --> 00:26:37,260 here, for instance. 357 00:26:37,260 --> 00:26:40,860 But they're not going to get you up here. 358 00:26:40,860 --> 00:26:45,150 So can we design a time-varying stabilizer 359 00:26:45,150 --> 00:26:46,958 that regulates that trajectory? 360 00:26:56,230 --> 00:27:04,570 OK, I did actually do the original finite horizon LQR 361 00:27:04,570 --> 00:27:10,420 derivation on the board that day-- 362 00:27:10,420 --> 00:27:12,910 definitely won't write all that again, but let 363 00:27:12,910 --> 00:27:22,790 me say that roughly nothing in that derivation breaks-- 364 00:27:22,790 --> 00:27:24,880 I'm going to show you the important pieces-- 365 00:27:24,880 --> 00:27:28,260 nothing in that derivation breaks, surprisingly, 366 00:27:28,260 --> 00:27:32,670 if A and B are now a function of time. 
367 00:27:32,670 --> 00:27:33,670 So let's remember that-- 368 00:27:42,480 --> 00:27:43,860 the LQR derivation. 369 00:27:56,860 --> 00:28:01,040 Now I'm working with this x bar coordinate system. 370 00:28:12,150 --> 00:28:13,860 And I want to design a cost function 371 00:28:13,860 --> 00:28:17,460 to minimize here, which lives in this coordinate system 372 00:28:17,460 --> 00:28:18,900 again here. 373 00:28:18,900 --> 00:28:24,795 Let's say it's the final horizon times Qf-- 374 00:28:29,100 --> 00:28:30,600 I've been trying to use t little f, 375 00:28:30,600 --> 00:28:34,590 since my transposes look like the final horizon time 376 00:28:34,590 --> 00:28:36,180 otherwise-- 377 00:28:36,180 --> 00:28:40,350 0 to tf dt x bar-- 378 00:28:40,350 --> 00:28:46,380 again, transpose Q plus u bar Ru. 379 00:28:57,800 --> 00:29:00,050 OK, in the original LQR derivation, 380 00:29:00,050 --> 00:29:03,590 we guessed that the form-- 381 00:29:03,590 --> 00:29:13,850 that the optimal policy had the form x bar S of t x bar. 382 00:29:13,850 --> 00:29:15,080 That's still intact. 383 00:29:15,080 --> 00:29:16,850 That's still a good assumption. 384 00:29:16,850 --> 00:29:19,108 This thing's linear. 385 00:29:19,108 --> 00:29:20,900 It's just in a different coordinate system. 386 00:29:25,620 --> 00:29:30,350 And we started cranking through the sufficiency theorem, 387 00:29:30,350 --> 00:29:32,900 the Hamilton-Jacobi-Bellman equation. 388 00:29:41,590 --> 00:29:44,740 And we found that our optimal feedback policy-- 389 00:29:44,740 --> 00:29:47,080 first of all, our optimal cost-to-go 390 00:29:47,080 --> 00:29:51,790 was described by this Riccati equation, which 391 00:29:51,790 --> 00:30:00,670 was negative S dot of t is Q minus S of t B 392 00:30:00,670 --> 00:30:14,500 R inverse B transpose S of t plus S of t A plus A transpose 393 00:30:14,500 --> 00:30:16,180 S of t.
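Written out as equations, the cost function, the quadratic cost-to-go guess, and the Riccati equation described in words here would read roughly as follows. This is a sketch reconstructed from the spoken description, not a copy of the board; the terminal condition S(t_f) = Q_f is the "final value condition" mentioned in the lecture.

```latex
\begin{align*}
J &= \bar{x}^T(t_f)\, Q_f\, \bar{x}(t_f)
   + \int_0^{t_f} \left[ \bar{x}^T Q\, \bar{x} + \bar{u}^T R\, \bar{u} \right] dt \\
J^*(\bar{x}, t) &= \bar{x}^T S(t)\, \bar{x} \\
-\dot{S}(t) &= Q - S(t) B(t) R^{-1} B^T(t) S(t) + S(t) A(t) + A^T(t) S(t),
   \qquad S(t_f) = Q_f
\end{align*}
```

The only difference from the time-invariant finite-horizon case is that A and B carry a time argument.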
394 00:30:16,180 --> 00:30:18,610 And it turns out that, with the-- if you have 395 00:30:18,610 --> 00:30:21,490 a time-varying A and B, that it's-- 396 00:30:21,490 --> 00:30:24,890 exact same dynamics govern it. 397 00:30:24,890 --> 00:30:30,100 You just have your time dependence also in A and B. 398 00:30:30,100 --> 00:30:35,190 And that exact same Riccati equation works, 399 00:30:35,190 --> 00:30:40,220 and our final value condition was just Qf. 400 00:30:49,813 --> 00:30:52,980 And you can see from this, if it didn't make a difference for me 401 00:30:52,980 --> 00:30:56,160 when A and B became functions of time, 402 00:30:56,160 --> 00:30:59,650 it's pretty simple-- although less interesting, I guess. 403 00:30:59,650 --> 00:31:02,160 If Q were to be a function of time-- no problem. 404 00:31:02,160 --> 00:31:03,960 If R was a function of time-- no problem. 405 00:31:08,040 --> 00:31:14,580 They still have to be positive definite and symmetric. 406 00:31:24,450 --> 00:31:27,510 Oops-- I did it the wrong way. 407 00:31:27,510 --> 00:31:29,520 Q can be 0, but R can not be 0. 408 00:31:38,910 --> 00:31:41,280 OK, so the LQR you know and love, 409 00:31:41,280 --> 00:31:48,120 that you've used in Matlab, is the time invariant infinite 410 00:31:48,120 --> 00:31:50,813 horizon LQR. 411 00:31:50,813 --> 00:31:52,980 I told you that, if you cared about a finite horizon 412 00:31:52,980 --> 00:31:55,530 and you had a time invariant linear system, then 413 00:31:55,530 --> 00:31:56,640 suddenly you had to-- 414 00:31:56,640 --> 00:31:59,070 you couldn't just find the stationary points in this. 415 00:31:59,070 --> 00:32:01,590 Remember, Matlab's solution just tells you the long-term 416 00:32:01,590 --> 00:32:06,780 behavior of S. 
In the time-- 417 00:32:06,780 --> 00:32:10,180 finite horizon time, even the LTI case, 418 00:32:10,180 --> 00:32:13,480 which is the A and B do not depend on time-- 419 00:32:13,480 --> 00:32:15,420 the linear time invariant case-- 420 00:32:15,420 --> 00:32:18,120 I still had to integrate back this Riccati equation in order 421 00:32:18,120 --> 00:32:20,730 to get my LQR controller. 422 00:32:20,730 --> 00:32:22,710 It's no more expensive to do the same thing 423 00:32:22,710 --> 00:32:25,890 in the linear time-varying feedback case. 424 00:32:30,720 --> 00:32:32,400 And the resulting controller is-- 425 00:32:36,140 --> 00:32:45,650 u star is my nominal controller minus my R inverse B transpose 426 00:32:45,650 --> 00:32:47,210 S of t x bar. 427 00:32:59,810 --> 00:33:03,890 These equations come up enough that these 428 00:33:03,890 --> 00:33:07,220 are pretty famous, pretty important equations, and so I-- 429 00:33:07,220 --> 00:33:10,097 those I know off the top of my head. 430 00:33:10,097 --> 00:33:11,180 They come up all the time. 431 00:33:15,320 --> 00:33:18,410 And this is the resulting optimal trajectory, 432 00:33:18,410 --> 00:33:22,840 which is my nominal trajectory plus my feedback gain, which 433 00:33:22,840 --> 00:33:24,650 came out of my original LQR controller, 434 00:33:24,650 --> 00:33:25,525 if you remember that. 435 00:33:28,872 --> 00:33:32,978 AUDIENCE: [INAUDIBLE] 436 00:33:32,978 --> 00:33:34,020 RUSS TEDRAKE: Yes-- good. 437 00:33:34,020 --> 00:33:39,372 I should definitely put a T under B. Thank you. 438 00:33:39,372 --> 00:33:41,580 I haven't written that case, but R could equally well 439 00:33:41,580 --> 00:33:42,780 be time-dependent. 440 00:33:49,020 --> 00:33:51,870 OK, so something big just happened. 
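The backward Riccati integration and the resulting feedback law can be sketched for the simplest possible case. This is a minimal illustration, assuming a scalar system x_bar_dot = a(t) x_bar + b(t) u_bar with a made-up unstable a(t), plain Euler integration, and hypothetical names `tvlqr_scalar` and `rollout`. In the scalar case the Riccati equation reduces to -s_dot = q - s b (1/r) b s + 2 a s, and the feedback term is u_bar = -(1/r) b s(t) x_bar.

```python
import math

def tvlqr_scalar(a, b, q, r, qf, tf, n):
    """Integrate the scalar Riccati equation backward from s(tf) = qf.

    Returns s[k] and the gains (1/r) b s on the grid t_k = k * tf / n.
    """
    dt = tf / n
    s = [0.0] * (n + 1)
    s[n] = qf
    for k in range(n, 0, -1):
        t = k * dt
        minus_s_dot = q - s[k] * b(t) * b(t) * s[k] / r + 2.0 * a(t) * s[k]
        s[k - 1] = s[k] + dt * minus_s_dot  # Euler step backward in time
    gains = [b(k * dt) * s[k] / r for k in range(n + 1)]
    return s, gains

def rollout(a, b, gains, xbar0, tf, n):
    """Simulate the error dynamics under u_bar = -gain(t) * x_bar."""
    dt = tf / n
    xbar = xbar0
    for k in range(n):
        t = k * dt
        ubar = -gains[k] * xbar
        xbar = xbar + dt * (a(t) * xbar + b(t) * ubar)
    return xbar

# Hypothetical unstable time-varying error dynamics, for illustration only.
a = lambda t: 1.0 + 0.5 * math.sin(t)
b = lambda t: 1.0
tf, n = 5.0, 5000

s, gains = tvlqr_scalar(a, b, q=1.0, r=1.0, qf=1.0, tf=tf, n=n)
x_closed = rollout(a, b, gains, xbar0=1.0, tf=tf, n=n)
x_open = rollout(a, b, [0.0] * (n + 1), xbar0=1.0, tf=tf, n=n)
print(x_open, x_closed)
```

With zero gains the error grows along the trajectory, while the time-varying gain drives it toward zero. For a matrix-valued S the same loop applies with matrix products, and a higher-order integrator would normally replace the Euler step.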
441 00:33:54,660 --> 00:33:59,880 I can take a really, really complicated non-linear system 442 00:33:59,880 --> 00:34:04,020 along some trajectory-- if I find a good trajectory, 443 00:34:04,020 --> 00:34:06,210 then I can actually linearize that system 444 00:34:06,210 --> 00:34:09,955 along this trajectory and stabilize it. 445 00:34:09,955 --> 00:34:12,330 The thing I haven't convinced you of yet-- because I only 446 00:34:12,330 --> 00:34:14,790 know how to do it from showing examples, 447 00:34:14,790 --> 00:34:17,590 but it really works well. 448 00:34:17,590 --> 00:34:20,460 So even though it's a linear system-- 449 00:34:20,460 --> 00:34:23,610 it's a linear approximation of the non-linear system, 450 00:34:23,610 --> 00:34:27,690 something like the [INAUDIBLE] or the cartpole swing-up. 451 00:34:27,690 --> 00:34:29,699 It's got a huge basin of attraction. 452 00:34:29,699 --> 00:34:31,440 Lots and lots of initial conditions 453 00:34:31,440 --> 00:34:35,340 will find their way to the trajectory and get to the goal. 454 00:34:35,340 --> 00:34:38,969 If you want to do non-linear control of a humanoid robot 455 00:34:38,969 --> 00:34:45,449 or something like this, this actually scales pretty nicely. 456 00:34:45,449 --> 00:34:47,520 I just have to solve this equation. 457 00:34:47,520 --> 00:34:48,840 S is the size-- 458 00:34:48,840 --> 00:34:54,570 is a matrix that's by number of states. 459 00:34:54,570 --> 00:34:56,560 But I could do that in 30 dimensions. 460 00:34:56,560 --> 00:34:59,160 That's no problem. 
461 00:34:59,160 --> 00:35:01,500 And even for very non-linear systems, 462 00:35:01,500 --> 00:35:06,000 local linear feedback works very, very well-- 463 00:35:06,000 --> 00:35:08,740 so well, in fact, that I think that, if you ask-- 464 00:35:08,740 --> 00:35:11,550 and when I did ask the [INAUDIBLE] guys, 465 00:35:11,550 --> 00:35:13,743 Sasha Megretski says, this is definitely 466 00:35:13,743 --> 00:35:15,660 what I would do if I was controlling a walking 467 00:35:15,660 --> 00:35:19,770 robot or something like that. 468 00:35:19,770 --> 00:35:21,750 We're trying to do the same thing to control 469 00:35:21,750 --> 00:35:23,645 neurons in a dish now. 470 00:35:23,645 --> 00:35:25,020 We're trying to build good models 471 00:35:25,020 --> 00:35:27,810 of the dynamics-- time-varying models, for instance-- 472 00:35:27,810 --> 00:35:30,530 and then doing this kind of control. 473 00:35:30,530 --> 00:35:32,010 Yeah. 474 00:35:32,010 --> 00:35:34,860 It works really, really well. 475 00:35:34,860 --> 00:35:37,500 The only complaint about it is that it's 476 00:35:37,500 --> 00:35:41,680 going to have-- it's based on this linear approximation, 477 00:35:41,680 --> 00:35:43,560 so it will have a finite basin of attraction. 478 00:35:43,560 --> 00:35:46,680 For some systems, it can be quite big. 479 00:35:46,680 --> 00:35:48,810 If you have systems with hard non-linearities, 480 00:35:48,810 --> 00:35:51,113 it won't be as big. 481 00:35:51,113 --> 00:35:52,530 Later in the course, I'll show you 482 00:35:52,530 --> 00:35:55,320 ways to explicitly reason about the size of those basins 483 00:35:55,320 --> 00:35:57,540 of attraction, but today let's just 484 00:35:57,540 --> 00:36:00,523 say this is a good thing to know, 485 00:36:00,523 --> 00:36:01,940 good thing to have in your pocket. 486 00:36:06,430 --> 00:36:08,510 Let me show you a working-- 487 00:36:08,510 --> 00:36:10,810 try to convince you that it's pretty good. 
488 00:36:43,910 --> 00:36:47,720 OK, so let's see where I've left myself here. 489 00:36:47,720 --> 00:36:50,020 I took this-- the pendulum-- 490 00:36:50,020 --> 00:36:51,270 let's do the shooting version. 491 00:36:51,270 --> 00:36:54,080 They both work fine, but let's do the shooting version. 492 00:37:00,130 --> 00:37:02,350 Is that bigger than I did last time? 493 00:37:02,350 --> 00:37:03,390 That's pretty obnoxious. 494 00:37:03,390 --> 00:37:04,923 Maybe it's always been obnoxious. 495 00:37:09,660 --> 00:37:11,550 Can we get away with that? 496 00:37:11,550 --> 00:37:13,080 Yeah. 497 00:37:13,080 --> 00:37:16,300 You guys are like, I'm not blind. 498 00:37:16,300 --> 00:37:16,800 OK. 499 00:37:22,090 --> 00:37:24,910 So I showed you last time the shooting code. 500 00:37:24,910 --> 00:37:31,810 It comes out with a resulting tape x, t, and u. 501 00:37:31,810 --> 00:37:34,090 After the result of these trajectory optimizers, 502 00:37:34,090 --> 00:37:37,180 whether it's shooting or whether it's 503 00:37:37,180 --> 00:37:38,980 direct co-location-- whatever it is-- 504 00:37:38,980 --> 00:37:40,720 it comes up with some open-loop tape. 505 00:37:40,720 --> 00:37:43,720 I put x in there too just to-- as the reference trajectory 506 00:37:43,720 --> 00:37:45,700 that results, but what really matters 507 00:37:45,700 --> 00:37:50,140 is the time stamps and u command, the open-loop tape. 508 00:38:18,120 --> 00:38:19,370 Why don't I save it this time? 509 00:38:34,260 --> 00:38:35,870 OK, so it comes up-- in this case, 510 00:38:35,870 --> 00:38:37,470 with these parameters I've chosen, 511 00:38:37,470 --> 00:38:39,170 comes up with some one-pump policy. 512 00:38:39,170 --> 00:38:41,750 With the torque limits I have, the [INAUDIBLE] I have, 513 00:38:41,750 --> 00:38:44,390 it comes up with a one-pump policy that gets me 514 00:38:44,390 --> 00:38:47,510 to the top in four seconds. 
515 00:38:47,510 --> 00:38:53,421 OK, let me now just simulate that a little bit differently. 516 00:39:06,190 --> 00:39:09,390 So the only thing I'm going to do here now is-- 517 00:39:09,390 --> 00:39:15,600 this control_ode is just a simulation which plays back 518 00:39:15,600 --> 00:39:19,303 exactly the same open-loop tape, but it plays it back 519 00:39:19,303 --> 00:39:20,970 with a little more careful integration-- 520 00:39:20,970 --> 00:39:23,137 because in the actual-- in the shooting code I used, 521 00:39:23,137 --> 00:39:26,010 I used the big time step just so I don't waste time computing 522 00:39:26,010 --> 00:39:27,810 gradients to the n-th degree of accuracy. 523 00:39:27,810 --> 00:39:29,070 That's not worthwhile. 524 00:39:29,070 --> 00:39:31,140 If I simulate the exact same thing back 525 00:39:31,140 --> 00:39:32,782 with a more careful ode integration, 526 00:39:32,782 --> 00:39:33,740 let's see what happens. 527 00:39:43,690 --> 00:39:47,090 So that was that same trajectory that-- 528 00:39:47,090 --> 00:39:49,980 exact same control inputs, just simulated more carefully. 529 00:39:49,980 --> 00:39:52,505 It made its honest effort to get up there, 530 00:39:52,505 --> 00:39:53,880 but it didn't quite get up there, 531 00:39:53,880 --> 00:39:56,550 turned around, and came back down. 532 00:39:56,550 --> 00:39:59,820 I'm trying to show it also in just-- 533 00:39:59,820 --> 00:40:03,640 this is the different state trajectories over time. 534 00:40:03,640 --> 00:40:05,760 You can see that the red and blue lines 535 00:40:05,760 --> 00:40:09,570 are the desired versus actual in the-- in theta, in this case. 536 00:40:09,570 --> 00:40:13,260 And these two lines are the desired versus 537 00:40:13,260 --> 00:40:15,075 actual in theta dot. 
538 00:40:15,075 --> 00:40:17,640 They start off exactly on top of each other, 539 00:40:17,640 --> 00:40:20,070 but just little differences in the numerics 540 00:40:20,070 --> 00:40:23,020 cause them to go in different directions-- 541 00:40:23,020 --> 00:40:23,700 part ways. 542 00:40:27,680 --> 00:40:32,240 OK, so now I've got this LTV LQR solution, 543 00:40:32,240 --> 00:40:34,210 which is exactly what I just showed you. 544 00:40:40,020 --> 00:40:43,070 So I was just simulating just now with just u 545 00:40:43,070 --> 00:40:44,360 being the nominal u. 546 00:40:44,360 --> 00:40:47,810 Now I'm going to add this time-varying feedback term, 547 00:40:47,810 --> 00:40:48,950 x minus x desired. 548 00:40:55,030 --> 00:40:57,880 And now my more careful integration 549 00:40:57,880 --> 00:41:00,830 results in a closed-loop system, which not only got to the goal, 550 00:41:00,830 --> 00:41:02,705 but actually stayed up at the goal, because I 551 00:41:02,705 --> 00:41:06,590 have a stable system all the way to the top. 552 00:41:06,590 --> 00:41:07,450 OK? 553 00:41:07,450 --> 00:41:09,970 All right, so what I just said was very unimpressive. 554 00:41:09,970 --> 00:41:15,580 I said I computed an open-loop policy with my methods 555 00:41:15,580 --> 00:41:16,510 from Thursday. 556 00:41:16,510 --> 00:41:17,530 I simulated them back. 557 00:41:17,530 --> 00:41:19,450 They didn't work. 558 00:41:19,450 --> 00:41:21,212 But I then put a feedback controller 559 00:41:21,212 --> 00:41:23,170 on, and from the exact same initial conditions, 560 00:41:23,170 --> 00:41:25,930 I now can simulate them, and they work. 561 00:41:25,930 --> 00:41:29,510 So it's disappointing that we had to do that at all, 562 00:41:29,510 --> 00:41:32,590 but I can now-- 563 00:41:32,590 --> 00:41:34,870 the stability is more than just stabilizing 564 00:41:34,870 --> 00:41:36,310 the initial conditions.
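The open-loop versus closed-loop comparison above can be reproduced on a toy problem. This sketch assumes a hypothetical unstable scalar system xdot = x + u tracking x_d(t) = t, for which the open-loop tape u0(t) = 1 - t is exact. The feedback gain is hand-picked rather than taken from a Riccati solution, and none of this is the pendulum code from the lecture.

```python
def simulate(x_init, gain, t_final=4.0, dt=0.001):
    """Euler-simulate xdot = x + u while tracking x_d(t) = t.

    The open-loop tape is u0(t) = 1 - t, which satisfies the dynamics
    exactly along x_d. The applied input is u = u0(t) - gain * (x - x_d(t)).
    Returns the final tracking error x(T) - x_d(T).
    """
    n_steps = int(round(t_final / dt))
    x = x_init
    for i in range(n_steps):
        t = i * dt
        u = (1.0 - t) - gain * (x - t)  # nominal tape plus feedback term
        x = x + dt * (x + u)
    return x - t_final


if __name__ == "__main__":
    print("open loop  :", simulate(0.1, gain=0.0))  # small offset grows
    print("closed loop:", simulate(0.1, gain=3.0))  # same offset decays
```

With gain 0 the tape is played back blindly and the initial offset grows along the unstable dynamics. With a stabilizing gain the same tape plus the feedback term pulls the state back onto the nominal trajectory.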
565 00:41:36,310 --> 00:41:39,632 Let's add some fairly big random numbers 566 00:41:39,632 --> 00:41:41,590 to that initial condition and see what happens. 567 00:41:45,850 --> 00:41:49,387 It's recomputing the policy every time, 568 00:41:49,387 --> 00:41:50,970 just because it was fast enough that I 569 00:41:50,970 --> 00:41:53,800 didn't bother to change it. 570 00:41:53,800 --> 00:41:55,710 OK, so that actually started with pretty big 571 00:41:55,710 --> 00:41:57,900 different initial conditions. 572 00:41:57,900 --> 00:42:00,756 So theta was off by-- 573 00:42:00,756 --> 00:42:03,480 I don't know-- 2/10 of a radian or something like this. 574 00:42:03,480 --> 00:42:06,212 The velocities were off by 1/2 a radian per second. 575 00:42:06,212 --> 00:42:07,170 We could crank that up. 576 00:42:07,170 --> 00:42:08,970 I bet it does a lot better than that. 577 00:42:08,970 --> 00:42:11,610 But if you watch these things, they 578 00:42:11,610 --> 00:42:16,260 converge quite nicely together at the end there. 579 00:42:22,580 --> 00:42:24,440 And what matters is they get up to the top. 580 00:42:32,070 --> 00:42:34,590 So again, these things come together, find their way up 581 00:42:34,590 --> 00:42:37,770 to the top, and live. 582 00:42:37,770 --> 00:42:44,280 I bet, if I put it a lot bigger, it'll still work. 583 00:42:44,280 --> 00:42:46,110 I normally do an order of magnitude, 584 00:42:46,110 --> 00:42:50,470 but let's not be silly. 585 00:42:50,470 --> 00:42:53,963 Oh-- didn't make it. 586 00:42:53,963 --> 00:42:56,130 There's only one reason it didn't make it, actually. 587 00:42:56,130 --> 00:42:58,860 It's because, if you look in here, 588 00:42:58,860 --> 00:43:02,830 I'm actually honest about implementing the max torques. 589 00:43:02,830 --> 00:43:03,330 Yeah. 590 00:43:03,330 --> 00:43:05,340 So I actually have a torque limit, I impose it, 591 00:43:05,340 --> 00:43:06,870 and it lives on there.
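One plausible way to picture the torque limit mentioned above is a clip applied after the feedback term. This is only a sketch under that assumption; the function name and the scalar form are hypothetical, and the lecture code may impose the limit differently.

```python
def saturated_feedback(u_nominal, gain, x_error, u_max):
    """Compute u = u0 - K * x_bar, then clip it to [-u_max, u_max]."""
    u = u_nominal - gain * x_error
    return max(-u_max, min(u_max, u))


if __name__ == "__main__":
    # A large state error asks for more torque than the actuator allows.
    print(saturated_feedback(0.0, 10.0, 1.0, 3.0))
    # A small error stays inside the limit and passes through unchanged.
    print(saturated_feedback(1.0, 2.0, 0.25, 3.0))
```

When the requested correction exceeds u_max, the applied input is no longer the linear feedback law, which is one reason a large enough initial error can fail to converge.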
592 00:43:06,870 --> 00:43:08,532 If I didn't, I bet I could convince you 593 00:43:08,532 --> 00:43:09,990 it works for any initial condition. 594 00:43:09,990 --> 00:43:13,170 But let's try it one more time-- get a little more lucky 595 00:43:13,170 --> 00:43:14,561 with the initial conditions. 596 00:43:17,950 --> 00:43:18,450 Oh, come on. 597 00:43:18,450 --> 00:43:19,450 Come on. 598 00:43:19,450 --> 00:43:21,420 Yes. 599 00:43:21,420 --> 00:43:24,330 OK, that was pretty far off, and it's still found its way back 600 00:43:24,330 --> 00:43:26,100 to the trajectory. 601 00:43:26,100 --> 00:43:27,780 Good-- yeah? 602 00:43:27,780 --> 00:43:29,970 Look at how big those initial conditions are. 603 00:43:29,970 --> 00:43:34,680 There and there versus-- wow, that's really good. 604 00:43:34,680 --> 00:43:35,610 OK. 605 00:43:35,610 --> 00:43:36,710 Did I see a question? 606 00:43:36,710 --> 00:43:38,950 No? 607 00:43:38,950 --> 00:43:42,690 All right, so this stuff works for pendulum. 608 00:43:42,690 --> 00:43:44,520 It works for more interesting systems too. 609 00:43:44,520 --> 00:43:47,330 I'll just show you the cartpole real quick here. 610 00:43:53,140 --> 00:43:54,097 I won't do the-- 611 00:43:54,097 --> 00:43:55,930 here is what it looks like without feedback. 612 00:43:55,930 --> 00:44:02,680 I'll just do the initial conditions corrupted solution, 613 00:44:02,680 --> 00:44:04,610 pump up-- 614 00:44:04,610 --> 00:44:07,640 OK, so if you remember my solutions from last time, 615 00:44:07,640 --> 00:44:09,140 I never drove off the screen before, 616 00:44:09,140 --> 00:44:11,557 so that it was actually it catching it by deviating enough 617 00:44:11,557 --> 00:44:13,098 that it came off the screen, and then 618 00:44:13,098 --> 00:44:14,720 slowly coming back to the top. 619 00:44:21,000 --> 00:44:24,250 It must be its x position or something going way off. 620 00:44:24,250 --> 00:44:25,810 No, not x position-- what is that? 
621 00:44:25,810 --> 00:44:26,560 That's my control. 622 00:44:26,560 --> 00:44:27,460 Yeah. 623 00:44:27,460 --> 00:44:28,918 Did I do torque limits on that one? 624 00:44:28,918 --> 00:44:30,130 I still did torque limits. 625 00:44:30,130 --> 00:44:33,590 I just set them high, I guess. 626 00:44:33,590 --> 00:44:34,550 Yeah. 627 00:44:34,550 --> 00:44:36,440 So it really works. 628 00:44:36,440 --> 00:44:39,320 And the cool thing is the cost of implementing 629 00:44:39,320 --> 00:44:43,760 that LQR LTV stabilizer was negligibly 630 00:44:43,760 --> 00:44:46,040 more than implementing the-- 631 00:44:46,040 --> 00:44:48,709 most of that time was the shooting optimization. 632 00:44:55,575 --> 00:44:56,440 Yes? 633 00:44:56,440 --> 00:45:00,083 AUDIENCE: Why do you always start at the 0 time? 634 00:45:00,083 --> 00:45:01,750 You could look at the initial conditions 635 00:45:01,750 --> 00:45:03,880 and look where is the closest point 636 00:45:03,880 --> 00:45:08,110 on my nominal trajectories and then do your control policy 637 00:45:08,110 --> 00:45:10,180 from that moment in time. 638 00:45:10,180 --> 00:45:11,197 RUSS TEDRAKE: OK. 639 00:45:11,197 --> 00:45:12,530 So that's a really, really good. 640 00:45:12,530 --> 00:45:15,530 OK, that's exactly what I want to talk about next, actually. 641 00:45:23,740 --> 00:45:31,480 I designed a time-varying feedback controller, 642 00:45:31,480 --> 00:45:36,580 is negative K of t x bar of t. 643 00:45:36,580 --> 00:45:39,230 I designed that ahead of time. 644 00:45:39,230 --> 00:45:41,200 And then, from the initial conditions, 645 00:45:41,200 --> 00:45:45,970 I started simulating from 0, and I just played out the-- 646 00:45:45,970 --> 00:45:48,460 my nominal trajectory just marched forward with time, 647 00:45:48,460 --> 00:45:50,710 my feedback controller just marched forward with time, 648 00:45:50,710 --> 00:45:54,070 and my aerodynamics just marched forward with time. 
649 00:45:54,070 --> 00:45:56,770 OK, so before I explicitly address your question, 650 00:45:56,770 --> 00:45:57,760 let me point out-- 651 00:45:57,760 --> 00:46:02,320 let me ask even a simpler question here. 652 00:46:08,187 --> 00:46:10,770 If I had plotted that in state space, what you would have seen 653 00:46:10,770 --> 00:46:14,040 is that the trajectory starts off somewhere in state space 654 00:46:14,040 --> 00:46:14,950 and comes together. 655 00:46:14,950 --> 00:46:15,850 That would have a good idea. 656 00:46:15,850 --> 00:46:17,308 Maybe I should do that in a minute. 657 00:46:17,308 --> 00:46:21,390 But it comes together and finds its way onto that trajectory. 658 00:46:21,390 --> 00:46:21,890 Yeah? 659 00:46:24,880 --> 00:46:26,220 OK, so here's the question. 660 00:46:28,950 --> 00:46:31,080 Instead of just changes in initial conditions, 661 00:46:31,080 --> 00:46:34,110 what happens if I have disturbances 662 00:46:34,110 --> 00:46:35,783 that push me off the trajectory? 663 00:46:35,783 --> 00:46:36,450 Well, that's OK. 664 00:46:36,450 --> 00:46:39,900 That's no different really than a different initial condition. 665 00:46:39,900 --> 00:46:41,612 They'll come back on here. 666 00:46:41,612 --> 00:46:43,320 What happens if I have a disturbance that 667 00:46:43,320 --> 00:46:49,140 pushes me along the trajectory with this controller? 668 00:46:49,140 --> 00:46:54,330 Let's say I've got the helpful disturbance, which, 669 00:46:54,330 --> 00:46:57,690 when I was right here, just happened 670 00:46:57,690 --> 00:46:59,610 to push me right to there. 671 00:47:03,090 --> 00:47:06,030 What's my feedback controller going to do? 672 00:47:06,030 --> 00:47:07,320 AUDIENCE: Slow it down. 673 00:47:07,320 --> 00:47:12,910 RUSS TEDRAKE: Yeah-- probably in a dramatic fashion. 674 00:47:12,910 --> 00:47:17,760 It's the same way-- it tries to quickly converge from here. 
675 00:47:17,760 --> 00:47:20,850 It's going to push itself back towards that point, possibly. 676 00:47:20,850 --> 00:47:22,830 Slowing down doesn't-- makes it sound-- 677 00:47:22,830 --> 00:47:24,402 no big deal. 678 00:47:24,402 --> 00:47:25,860 It can't go backwards, but it might 679 00:47:25,860 --> 00:47:29,880 try to do something more severe to try to catch up 680 00:47:29,880 --> 00:47:33,510 with that old trajectory. 681 00:47:33,510 --> 00:47:37,680 So the major limitation of this is that it's blindly-- 682 00:47:37,680 --> 00:47:39,870 in order to have the strong convergence properties 683 00:47:39,870 --> 00:47:43,710 that we have, the controller is blindly marching forward 684 00:47:43,710 --> 00:47:45,330 in time. 685 00:47:45,330 --> 00:47:48,870 The great thing about switching to a time parameterization 686 00:47:48,870 --> 00:47:52,080 is I can compute everything-- everything's linear again. 687 00:47:52,080 --> 00:47:56,730 The bad thing is you're a slave to time. 688 00:47:56,730 --> 00:47:59,100 So Phillip asked the next question. 689 00:47:59,100 --> 00:48:00,600 He says, so why not-- 690 00:48:00,600 --> 00:48:05,010 why do I just blindly start marching forward from time 0? 691 00:48:05,010 --> 00:48:08,700 Maybe, if I have a controller, I should just 692 00:48:08,700 --> 00:48:12,030 look for the closest point in my trajectory, 693 00:48:12,030 --> 00:48:13,920 and then, instead of indexing off time, 694 00:48:13,920 --> 00:48:18,780 index off some sort of phase, some fraction of my trajectory, 695 00:48:18,780 --> 00:48:21,900 and then execute that controller. 696 00:48:21,900 --> 00:48:25,710 And you can do that.
697 00:48:25,710 --> 00:48:27,600 I wish you the best if you do that, 698 00:48:27,600 --> 00:48:31,950 but my suspicion is that, if on every dt, 699 00:48:31,950 --> 00:48:34,220 you pick the closest point in the trajectory, 700 00:48:34,220 --> 00:48:36,720 then the result is you're going to chatter like you wouldn't 701 00:48:36,720 --> 00:48:38,370 believe. 702 00:48:38,370 --> 00:48:41,550 So there's a lot of protection you get when you-- 703 00:48:41,550 --> 00:48:42,960 you could think of this very much 704 00:48:42,960 --> 00:48:46,980 as a gain-scheduled linear controller. 705 00:48:46,980 --> 00:48:50,220 This is a time-varying gain scheduling, 706 00:48:50,220 --> 00:48:52,900 and the problem is if I switch gain quickly, 707 00:48:52,900 --> 00:48:54,540 then you're going to get chattering. 708 00:48:54,540 --> 00:48:57,310 So it might make a lot of sense, for instance, 709 00:48:57,310 --> 00:49:00,743 if you were to get a big disturbance, to re-evaluate, 710 00:49:00,743 --> 00:49:02,160 and try to find the closest point, 711 00:49:02,160 --> 00:49:06,690 and start executing that new policy with time re-indexed. 712 00:49:06,690 --> 00:49:10,380 But it's probably a bad idea, in my experience, 713 00:49:10,380 --> 00:49:12,450 to decide which part of the trajectory 714 00:49:12,450 --> 00:49:14,340 you're closest to on every-- 715 00:49:14,340 --> 00:49:16,385 every dt. 716 00:49:16,385 --> 00:49:17,510 That's probably a bad idea. 717 00:49:23,250 --> 00:49:24,840 Yes? 718 00:49:24,840 --> 00:49:26,760 AUDIENCE: Could you maybe play some tricks 719 00:49:26,760 --> 00:49:28,815 if you had some idea of the basin of attraction 720 00:49:28,815 --> 00:49:31,630 of the current point you're trying to get to? 721 00:49:31,630 --> 00:49:33,530 And if you know that you're outside of it, 722 00:49:33,530 --> 00:49:36,540 then work around it, [INAUDIBLE]?? 723 00:49:36,540 --> 00:49:37,290 RUSS TEDRAKE: Yes. 
724 00:49:40,930 --> 00:49:45,270 So I have a particular trick that does that in-- 725 00:49:45,270 --> 00:49:48,240 we'll talk about it in the motion planning, but-- 726 00:49:48,240 --> 00:49:50,400 yeah, so Mark knows about these tricks 727 00:49:50,400 --> 00:49:53,170 for computing basins of attraction pretty efficiently. 728 00:49:53,170 --> 00:49:57,150 And so these days what we do is we actually 729 00:49:57,150 --> 00:49:59,060 try to compute the funnel-- 730 00:49:59,060 --> 00:50:00,810 the basin of attraction of this trajectory 731 00:50:00,810 --> 00:50:05,610 around the trajectory, and you could know discretely 732 00:50:05,610 --> 00:50:07,698 if you left that basin of attraction. 733 00:50:07,698 --> 00:50:09,240 So I'll give you the recipe for that, 734 00:50:09,240 --> 00:50:11,010 but it actually makes more sense, 735 00:50:11,010 --> 00:50:13,860 I think, in the motion planning context, where we actually 736 00:50:13,860 --> 00:50:16,713 will design trajectories that fill 737 00:50:16,713 --> 00:50:17,880 the space with these basins. 738 00:50:21,660 --> 00:50:24,190 This is very similar to the concept of flow tubes. 739 00:50:24,190 --> 00:50:24,690 Yes. 740 00:50:32,390 --> 00:50:36,170 OK, so big idea-- 741 00:50:36,170 --> 00:50:40,160 turn my non-linear system into a linear time-varying system, 742 00:50:40,160 --> 00:50:43,730 because I've re-parameterized it along the trajectory. 743 00:50:43,730 --> 00:50:48,770 Do linear time-varying control, and even really complicated 744 00:50:48,770 --> 00:50:51,230 systems-- it'll work well. 745 00:50:51,230 --> 00:50:53,655 We're doing it on our [INAUDIBLE] plane. 746 00:50:53,655 --> 00:50:56,510 I mean, it's really a pretty good idea.
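The re-indexing idea raised in the earlier question (find the closest point on the nominal trajectory and restart the time index from there) can be sketched as a nearest-sample lookup. The function name, the Euclidean distance metric, and the sample data are assumptions for illustration only.

```python
def closest_index(x, nominal_states):
    """Return the index of the nominal sample nearest to x (Euclidean)."""
    def sq_dist(p):
        return sum((xi - pi) ** 2 for xi, pi in zip(x, p))
    return min(range(len(nominal_states)),
               key=lambda i: sq_dist(nominal_states[i]))


if __name__ == "__main__":
    # Hypothetical (theta, theta_dot) samples along a nominal trajectory.
    nominal = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
    k = closest_index((1.1, 0.4), nominal)
    print("restart the time-indexed controller from sample", k)
```

Following the caution in the lecture, a lookup like this is better suited to a one-time re-index after a large disturbance than to being called on every dt, where rapid switching between gains can cause chattering.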
747 00:50:59,168 --> 00:51:00,710 When I first started working with it, 748 00:51:00,710 --> 00:51:05,600 I thought that it would have the problem that-- 749 00:51:09,380 --> 00:51:12,500 it would have the property that it uses a lot of control 750 00:51:12,500 --> 00:51:14,120 to force itself back to the trajectory 751 00:51:14,120 --> 00:51:16,550 and rigidly follow the trajectory. 752 00:51:16,550 --> 00:51:19,130 It's easy to equate linear control 753 00:51:19,130 --> 00:51:22,673 with high-gain linear feedback, which people do a lot of, 754 00:51:22,673 --> 00:51:24,590 but it doesn't necessarily need that property. 755 00:51:24,590 --> 00:51:26,780 If R is small in this derivation, 756 00:51:26,780 --> 00:51:30,440 it can actually take very subtle approaches back 757 00:51:30,440 --> 00:51:31,190 to the trajectory. 758 00:51:31,190 --> 00:51:34,470 Your system might come in and do whatever 759 00:51:34,470 --> 00:51:36,470 it needs to get back on the trajectory with very 760 00:51:36,470 --> 00:51:37,053 little torque. 761 00:51:41,880 --> 00:51:44,720 The only price you pay is, if your torque is smaller, 762 00:51:44,720 --> 00:51:48,650 if you're penalizing torque use higher, 763 00:51:48,650 --> 00:51:50,938 then you might restrict your-- that might shrink 764 00:51:50,938 --> 00:51:51,980 your basin of attraction. 765 00:51:51,980 --> 00:51:56,450 It might be that, because it's trying to use less torque, 766 00:51:56,450 --> 00:51:59,140 it will not overcome the non-linearities. 767 00:51:59,140 --> 00:52:00,890 But in the neighborhood of the trajectory, 768 00:52:00,890 --> 00:52:05,090 you can get these very elegant solutions 769 00:52:05,090 --> 00:52:07,370 which look like minimal energy kind of solutions 770 00:52:07,370 --> 00:52:09,380 for the non-linear problem in the vicinity 771 00:52:09,380 --> 00:52:10,830 of these trajectories. 
772 00:52:10,830 --> 00:52:12,560 So one of the ideas we'll talk about later is how do you 773 00:52:12,560 --> 00:52:14,518 design the minimal set of trajectories-- which, 774 00:52:14,518 --> 00:52:19,077 if you use these controllers, which do the right thing 775 00:52:19,077 --> 00:52:19,910 in a lot of places-- 776 00:52:25,187 --> 00:52:27,270 if you walked away from this class knowing nothing 777 00:52:27,270 --> 00:52:31,350 but direct co-location and linear time-varying feedback 778 00:52:31,350 --> 00:52:34,930 control, I bet you could control a lot of cool systems. 779 00:52:34,930 --> 00:52:35,430 Yeah. 780 00:52:35,430 --> 00:52:37,555 I guess you also have to know sys id, which I'm not 781 00:52:37,555 --> 00:52:39,750 going to tell you about. 782 00:52:39,750 --> 00:52:40,800 That's the gotcha. 783 00:52:40,800 --> 00:52:42,633 You have to have a model for all this stuff. 784 00:52:45,295 --> 00:52:46,920 If someone gives you a model, if you're 785 00:52:46,920 --> 00:52:50,130 willing to construct a model, then you can do a lot of things 786 00:52:50,130 --> 00:52:50,880 with this. 787 00:52:58,570 --> 00:53:05,770 OK, I want to give you one more mental picture 788 00:53:05,770 --> 00:53:09,070 to think about what this is doing so it launches 789 00:53:09,070 --> 00:53:13,340 into the next thing here. 790 00:53:13,340 --> 00:53:20,140 So my cost-to-go function, which I just erased, is, remember-- 791 00:53:20,140 --> 00:53:29,890 my cost-to-go function, J of x bar t, is x bar S of t x bar. 792 00:53:34,600 --> 00:53:35,680 This is a quadratic form. 793 00:53:35,680 --> 00:53:37,990 Just like the original LQR, you can think 794 00:53:37,990 --> 00:53:39,850 of this as a quadratic bowl. 795 00:53:39,850 --> 00:53:42,340 In the LTI LQR case-- 796 00:53:42,340 --> 00:53:45,880 am I OK throwing around these three-letter acronyms? 
797 00:53:45,880 --> 00:53:51,340 In the LTI LQR case, it was a static quadratic bowl 798 00:53:51,340 --> 00:53:54,220 centered around the point I'm trying to stabilize-- 799 00:53:54,220 --> 00:53:55,210 so my cost-to-go. 800 00:53:55,210 --> 00:53:59,260 It said-- says, as I move away from the point I'm trying 801 00:53:59,260 --> 00:54:04,380 to regulate, I'm going to incur more cost in the direction-- 802 00:54:04,380 --> 00:54:10,810 the rate it grows depends on the variables inside S. 803 00:54:10,810 --> 00:54:15,910 Now, in this picture, I have still a time-varying-- 804 00:54:15,910 --> 00:54:18,850 I have a time-varying quadratic bowl, 805 00:54:18,850 --> 00:54:22,630 but it's also moving through time, 806 00:54:22,630 --> 00:54:27,940 because it's based on x bar. 807 00:54:27,940 --> 00:54:36,100 So in my pendulum world, if I have this nominal trajectory, 808 00:54:36,100 --> 00:54:39,010 you can think of it as having some quadratic bowl here. 809 00:54:41,950 --> 00:54:45,490 And the LTI stabilizer that we did come up 810 00:54:45,490 --> 00:54:47,080 with that was based on LQR did have 811 00:54:47,080 --> 00:54:50,320 some sort of quadratic bowl shape that looked like that. 812 00:54:53,377 --> 00:54:55,960 Backwards in time, there's going to be another quadratic bowl. 813 00:54:55,960 --> 00:54:58,450 Can I draw it very badly like this? 814 00:55:04,010 --> 00:55:07,190 If I can just draw coming off the board a little bit-- so 815 00:55:07,190 --> 00:55:11,940 there's some quadratic bowl centered around this point, 816 00:55:11,940 --> 00:55:13,220 which is my costs-to-go. 817 00:55:13,220 --> 00:55:15,540 At that point, if I marched further backwards in time, 818 00:55:15,540 --> 00:55:18,740 I've got some other quadratic bowl around this point. 
819 00:55:18,740 --> 00:55:20,775 That makes the point, again, that-- 820 00:55:24,590 --> 00:55:27,050 if my quadratic bowl is currently 821 00:55:27,050 --> 00:55:29,720 this because time is 5-- 822 00:55:29,720 --> 00:55:32,930 or I had a 4-second trajectory-- maybe time's 3 here-- 823 00:55:32,930 --> 00:55:34,430 and I'm pushed along the trajectory, 824 00:55:34,430 --> 00:55:37,520 it's actually going to incur just as much cost, 825 00:55:37,520 --> 00:55:39,680 roughly, as if I'm pushed in another direction. 826 00:55:39,680 --> 00:55:43,700 There's a quadratic bowl literally centered around x0 827 00:55:43,700 --> 00:55:46,310 at time t. 828 00:55:46,310 --> 00:55:49,070 That's what this equation says. 829 00:55:51,810 --> 00:55:56,070 And this quadratic bowl is the cost-to-go estimate. 830 00:55:56,070 --> 00:56:00,780 It says, if I'm away from the trajectory, 831 00:56:00,780 --> 00:56:04,350 I should expect the cost I incur in getting back 832 00:56:04,350 --> 00:56:10,080 towards that trajectory to be this quadratic form. 833 00:56:10,080 --> 00:56:10,580 Is that OK? 834 00:56:15,810 --> 00:56:18,270 And the key point is, because I've re-parameterized 835 00:56:18,270 --> 00:56:23,670 my equations in terms of x bar, this quadratic bowl always 836 00:56:23,670 --> 00:56:26,718 lives on that trajectory. 837 00:56:30,570 --> 00:56:34,620 My cost function was x bar Q x bar. 838 00:56:34,620 --> 00:56:36,195 My best thing to do is to drive x 839 00:56:36,195 --> 00:56:38,940 bar to 0, which means to drive my system back 840 00:56:38,940 --> 00:56:40,132 to the trajectory. 841 00:56:42,970 --> 00:56:45,170 People OK with that imagery? 842 00:56:45,170 --> 00:56:46,420 It doesn't look like they are. 843 00:56:46,420 --> 00:56:47,050 Everybody's OK. 844 00:56:50,870 --> 00:56:52,660 Are we OK here with the LTI stabilizer 845 00:56:52,660 --> 00:56:55,570 being an LQR bowl-- or a quadratic bowl?
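The moving quadratic bowl can be written out directly as J = x_bar^T S(t) x_bar with x_bar = x - x0(t). The sketch below evaluates that form for a given S. The matrix values are made up for illustration, and S would in practice come from the backward Riccati integration.

```python
def cost_to_go(x, x_nominal, S):
    """Evaluate J = x_bar^T S x_bar, where x_bar = x - x_nominal."""
    x_bar = [xi - ni for xi, ni in zip(x, x_nominal)]
    n = len(x_bar)
    return sum(x_bar[i] * S[i][j] * x_bar[j]
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    S = [[2.0, 0.0], [0.0, 1.0]]  # hypothetical S(t) at one time instant
    print(cost_to_go([1.0, 1.0], [0.0, 0.0], S))   # off the trajectory
    print(cost_to_go([0.5, -0.2], [0.5, -0.2], S))  # on it, so zero cost
```

Because the bowl is centered on x0(t) at the current time t, a state that sits exactly on a later point of the nominal trajectory still has a nonzero x_bar and therefore a nonzero cost-to-go.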
846 00:56:55,570 --> 00:56:59,440 So the farther I am away in the directions defined by S, 847 00:56:59,440 --> 00:57:02,620 I'm going to incur some cost getting back. 848 00:57:02,620 --> 00:57:05,470 This is just the same thing that says, 849 00:57:05,470 --> 00:57:08,920 if I'm at this point in the trajectory, 850 00:57:08,920 --> 00:57:12,010 I'm going to incur this cost-to-go. 851 00:57:12,010 --> 00:57:14,770 And the best thing to do, the minimal cost-to-go 852 00:57:14,770 --> 00:57:18,670 is living right on that trajectory. 853 00:57:18,670 --> 00:57:21,190 As a consequence, the optimal controller, 854 00:57:21,190 --> 00:57:24,610 which tries to go down the landscape of the cost-to-go, 855 00:57:24,610 --> 00:57:28,210 is going to drive you back to the trajectory. 856 00:57:28,210 --> 00:57:30,460 Now, I said all that because I'm about to do something 857 00:57:30,460 --> 00:57:31,600 that sounds totally wacky. 858 00:57:40,580 --> 00:57:44,420 Would it ever make sense for me to design a slightly different 859 00:57:44,420 --> 00:57:48,200 cost function, which, when I linearize and design 860 00:57:48,200 --> 00:57:52,730 the feedback controller, I end up with a cost-to-go over here? 861 00:57:56,930 --> 00:57:58,940 Let's say I have some nominal trajectory. 862 00:57:58,940 --> 00:58:01,970 I found, through whatever method, 863 00:58:01,970 --> 00:58:06,230 some reasonable system trajectory, but I really-- 864 00:58:06,230 --> 00:58:07,700 I'm still not happy with that. 865 00:58:07,700 --> 00:58:13,260 The trajectory I really wanted was something like this, 866 00:58:13,260 --> 00:58:15,620 let's say. 867 00:58:15,620 --> 00:58:18,880 Would it make any sense to do my linearization 868 00:58:18,880 --> 00:58:23,840 around this trajectory, and try to drive the system 869 00:58:23,840 --> 00:58:25,327 to this other trajectory? 870 00:58:29,303 --> 00:58:32,648 AUDIENCE: You mean like scaling your optimal trajectory?
871 00:58:32,648 --> 00:58:34,315 RUSS TEDRAKE: I don't even mean scaling. 872 00:58:34,315 --> 00:58:35,170 They could cross. 873 00:58:35,170 --> 00:58:36,202 They could do whatever. 874 00:58:36,202 --> 00:58:37,285 It's not a simple scaling. 875 00:58:39,830 --> 00:58:42,210 Let me give you a simpler version of the problem. 876 00:58:50,511 --> 00:58:54,792 AUDIENCE: [INAUDIBLE] 877 00:58:54,792 --> 00:58:56,000 RUSS TEDRAKE: Say that again. 878 00:58:56,000 --> 00:59:05,520 AUDIENCE: [INAUDIBLE] 879 00:59:05,520 --> 00:59:06,390 RUSS TEDRAKE: Yes. 880 00:59:06,390 --> 00:59:10,440 I'm going to define a cost function, which 881 00:59:10,440 --> 00:59:15,210 would have it so I prefer to live on that trajectory. 882 00:59:15,210 --> 00:59:17,430 Let me do it in the time invariant case 883 00:59:17,430 --> 00:59:18,480 just so it's clear. 884 00:59:32,700 --> 00:59:35,880 Let's say my coordinate system's back and simple. 885 00:59:35,880 --> 00:59:36,735 It lives around 0. 886 00:59:39,820 --> 00:59:42,870 Let's say I have that cost function, or actually 887 00:59:42,870 --> 00:59:44,520 that dynamics. 888 00:59:44,520 --> 00:59:47,460 And instead of-- my original cost function was just x 889 00:59:47,460 --> 00:59:49,050 transpose Qx-- 890 00:59:49,050 --> 00:59:53,775 let's say my cost function now is-- 891 01:00:39,687 --> 01:00:41,520 let's think about this problem for a second. 892 01:00:49,610 --> 01:00:51,910 So let's say I have a linear system. 893 01:00:51,910 --> 01:00:53,980 Now, the LQR controller we did initially-- 894 01:00:57,950 --> 01:00:59,939 little sloppy with that. 895 01:01:04,630 --> 01:01:06,820 The LQR controller I did initially 896 01:01:06,820 --> 01:01:09,410 always assumed that the desired place you wanted to be in life 897 01:01:09,410 --> 01:01:09,910 was 0.
898 01:01:12,760 --> 01:01:15,640 If the desired place you want to be in life is a constant-- 899 01:01:15,640 --> 01:01:17,560 it's 3, let's say-- 900 01:01:17,560 --> 01:01:21,670 then you can still do your linear quadratic regulator. 901 01:01:21,670 --> 01:01:25,390 Just move your coordinate system so the 3 is 0. 902 01:01:25,390 --> 01:01:27,100 But let's say I've got a linear system, 903 01:01:27,100 --> 01:01:30,250 but I want to drive it through some trajectory-- 904 01:01:30,250 --> 01:01:34,570 time-varying trajectory-- x desired as a function of time. 905 01:01:34,570 --> 01:01:37,905 Then I can't quite just recenter the origin. 906 01:01:37,905 --> 01:01:39,280 I've got to think about, how do I 907 01:01:39,280 --> 01:01:42,312 drive my linear system through some other trajectories? 908 01:01:48,590 --> 01:01:51,440 Now the-- it's actually-- 909 01:01:55,040 --> 01:02:00,900 LTI system, but my cost function is time-varying, because my-- 910 01:02:00,900 --> 01:02:05,120 I have the desired trajectory that varies with time. 911 01:02:05,120 --> 01:02:07,790 The result-- I won't write it down again-- 912 01:02:07,790 --> 01:02:12,180 again, I can do this Riccati equation. 913 01:02:12,180 --> 01:02:12,680 Back up. 914 01:02:15,410 --> 01:02:19,460 The only difference is that the quadratic bowl 915 01:02:19,460 --> 01:02:23,682 is no longer going to be centered on the origin. 916 01:02:23,682 --> 01:02:25,890 The quadratic bowl is going to move with that desired 917 01:02:25,890 --> 01:02:26,920 trajectory. 918 01:02:29,860 --> 01:02:30,520 OK? 919 01:02:30,520 --> 01:02:31,048 Yeah? 920 01:02:31,048 --> 01:02:33,340 AUDIENCE: If that's far away from where you linearized, 921 01:02:33,340 --> 01:02:34,330 could you-- 922 01:02:34,330 --> 01:02:37,542 RUSS TEDRAKE: That's an excellent question. 
923 01:02:37,542 --> 01:02:39,250 But this is a linear system, so first, we 924 01:02:39,250 --> 01:02:41,230 don't have to worry about that, but don't let 925 01:02:41,230 --> 01:02:42,438 me forget to go back to that. 926 01:02:46,330 --> 01:02:50,020 So I can drive my linear system through some trajectory that's 927 01:02:50,020 --> 01:02:54,940 non-zero beautifully with an LQR controller. 928 01:02:54,940 --> 01:02:57,250 The only problem is that my LQR controller 929 01:02:57,250 --> 01:03:01,720 has to have a cost-to-go function and a controller which 930 01:03:01,720 --> 01:03:03,940 is not pointing me always at the origin. 931 01:03:03,940 --> 01:03:06,763 You wouldn't want that. 932 01:03:06,763 --> 01:03:08,930 So in fact, the way it looks-- there's a lot of ways 933 01:03:08,930 --> 01:03:10,030 that people derive it. 934 01:03:10,030 --> 01:03:13,360 With Pontryagin, it's not too hard to derive. 935 01:03:13,360 --> 01:03:16,285 I prefer to derive it with the HJB. 936 01:03:25,850 --> 01:03:29,855 I'm not going to do the derivation, but-- 937 01:03:29,855 --> 01:03:31,730 I don't mean to bore you, but what you end up 938 01:03:31,730 --> 01:03:43,730 with is J of x of t has a form x transpose S of t x plus x 939 01:03:43,730 --> 01:03:44,510 transpose-- 940 01:03:44,510 --> 01:03:46,610 I call this S2-- 941 01:03:46,610 --> 01:03:51,035 S1 of t plus S0 of t. 942 01:03:51,035 --> 01:03:52,160 It's a full quadratic form. 943 01:03:55,350 --> 01:03:58,500 When I just have this, it's always a quadratic bowl. 944 01:03:58,500 --> 01:04:00,325 It's always centered around 0. 945 01:04:00,325 --> 01:04:02,700 If you want it, in general, to be a quadratic bowl that's 946 01:04:02,700 --> 01:04:06,270 not necessarily at 0, you need the full quadratic form. 947 01:04:06,270 --> 01:04:07,860 I could equally well have written this 948 01:04:07,860 --> 01:04:14,550 as x minus x something desired, S of t. 
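The two ways of writing the cost-to-go mentioned here can be related by expanding the square. This is only a sketch: it assumes S(t) is symmetric and writes x_d(t) for the "x something desired" term, a name introduced here for illustration. In the general tracking problem the constant term can carry an extra offset, so this covers only the pure shifted-bowl case.

```latex
\begin{aligned}
J(x,t) &= x^\top S_2(t)\,x + x^\top s_1(t) + s_0(t), \\
\bigl(x - x_d(t)\bigr)^\top S(t)\,\bigl(x - x_d(t)\bigr)
  &= x^\top S(t)\,x \;-\; 2\,x^\top S(t)\,x_d(t) \;+\; x_d(t)^\top S(t)\,x_d(t), \\
\text{so}\quad S_2(t) &= S(t), \qquad s_1(t) = -2\,S(t)\,x_d(t), \qquad s_0(t) = x_d(t)^\top S(t)\,x_d(t).
\end{aligned}
```

Setting the gradient of the full quadratic form to zero gives the bottom of the bowl at x = -(1/2) S_2^{-1} s_1 (assuming S_2 is invertible), which is x_d in this case.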
949 01:04:14,550 --> 01:04:15,990 But let's work with this form. 950 01:04:15,990 --> 01:04:16,490 Yeah? 951 01:04:23,260 --> 01:04:26,590 So this is just an equation of a quadratic bowl, not necessarily 952 01:04:26,590 --> 01:04:29,620 centered on the origin. 953 01:04:29,620 --> 01:04:35,920 And the LQR derivation gives me my backwards dynamics for S2. 954 01:04:35,920 --> 01:04:40,060 It gives me the backwards dynamics for S1 and for S0. 955 01:04:45,440 --> 01:04:46,940 And it's in the notes. 956 01:04:46,940 --> 01:04:49,010 It's actually already in your notes. 957 01:04:49,010 --> 01:04:53,330 It's in the HJB chapter that has been up there for a while. 958 01:04:57,170 --> 01:05:03,770 OK, now, the reason I'm on about all this 959 01:05:03,770 --> 01:05:06,410 is that there's another way-- 960 01:05:06,410 --> 01:05:08,180 I told you about shooting methods. 961 01:05:08,180 --> 01:05:10,170 I told you about direct collocation. 962 01:05:10,170 --> 01:05:11,810 There is yet another way that people 963 01:05:11,810 --> 01:05:19,998 like to design trajectories, which uses LQR directly. 964 01:05:25,250 --> 01:05:27,578 And that's this iterative LQR procedure. 965 01:05:42,910 --> 01:05:43,410 OK. 966 01:06:06,090 --> 01:06:09,900 So let's say I have some trajectory that I've already 967 01:06:09,900 --> 01:06:16,440 found, x 0 of t, and I have some different trajectory, which 968 01:06:16,440 --> 01:06:20,460 is my desired trajectory, x desired of t. 969 01:06:24,710 --> 01:06:27,260 Then, using this optimal tracking-- 970 01:06:27,260 --> 01:06:30,020 if you stick back in the time-varying components, 971 01:06:30,020 --> 01:06:32,990 using this optimal tracking, I can 972 01:06:32,990 --> 01:06:35,420 linearize my dynamical system around that. 973 01:06:35,420 --> 01:06:37,108 So I have no guarantees that x desired 974 01:06:37,108 --> 01:06:38,150 is a feasible trajectory. 975 01:06:38,150 --> 01:06:40,250 In fact, many cases-- it's not. 
976 01:06:40,250 --> 01:06:44,030 For instance, x desired might be at the goal at all times. 977 01:06:44,030 --> 01:06:47,840 If I came up with a perfectly feasible x desired trajectory, 978 01:06:47,840 --> 01:06:52,280 I probably wouldn't be running an open-loop solver. 979 01:06:52,280 --> 01:06:54,650 I want to get as close 980 01:06:54,650 --> 01:06:57,710 as possible to the x desired while potentially 981 01:06:57,710 --> 01:07:00,220 minimizing cost and respecting the dynamics of the system. 982 01:07:02,880 --> 01:07:04,820 Here's one way to do it-- 983 01:07:04,820 --> 01:07:10,100 linearize my system around my initial guess, x0 of t, 984 01:07:10,100 --> 01:07:12,860 then design a linear optimal tracking-- 985 01:07:12,860 --> 01:07:15,020 linear time-varying optimal tracking which 986 01:07:15,020 --> 01:07:17,690 tries to regulate my system as close as 987 01:07:17,690 --> 01:07:21,240 possible to that trajectory. 988 01:07:21,240 --> 01:07:24,600 Now, what Steven said was exactly on point. 989 01:07:24,600 --> 01:07:30,510 If I drive my system away from where I linearized, 990 01:07:30,510 --> 01:07:32,940 there's no guarantee that my linear model 991 01:07:32,940 --> 01:07:36,660 is going to be any good here. 992 01:07:36,660 --> 01:07:40,310 But the hope is that this trajectory is better-- 993 01:07:40,310 --> 01:07:44,480 a better guess than the one before. 994 01:07:44,480 --> 01:07:52,310 And you iterate, make another approximation around there, 995 01:07:52,310 --> 01:07:55,550 design the LQR controller, run the LQR controller that drives 996 01:07:55,550 --> 01:07:58,040 me here to find the new u tape. 997 01:07:58,040 --> 01:08:01,100 That defines my new trajectory-- repeat. 998 01:08:01,100 --> 01:08:02,270 OK? 999 01:08:02,270 --> 01:08:05,120 That's called iterative LQR. 1000 01:08:05,120 --> 01:08:06,760 What else is it called? 1001 01:08:06,760 --> 01:08:07,760 Do you know? 
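The loop just described (linearize around the current guess, design a time-varying tracking controller, run it on the real system to get a new u tape, repeat) can be sketched in discrete time roughly as follows. This is a minimal illustration, not the version in the course notes: the pendulum model, the Euler step, the finite-difference Jacobians, the weights Q, R, Qf, the step-size line search, and every function name are assumptions made here.

```python
import numpy as np

def pendulum_step(x, u, dt=0.05, g=9.81, length=1.0, damping=0.1):
    """One Euler step of a damped pendulum (assumed model); x = [theta, theta_dot]."""
    theta, theta_dot = x
    theta_ddot = u[0] - damping * theta_dot - (g / length) * np.sin(theta)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

def linearize(f, x, u, eps=1e-5):
    """Central-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    n, m = x.size, u.size
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

def rollout(f, x_init, us):
    """Simulate the nonlinear system forward under the control tape us."""
    xs = [x_init]
    for u in us:
        xs.append(f(xs[-1], u))
    return np.array(xs)

def total_cost(xs, us, x_des, Q, R, Qf):
    """Quadratic tracking cost around x_des plus a penalty on u."""
    e = xs - x_des
    J = 0.5 * e[-1] @ Qf @ e[-1]
    for k in range(len(us)):
        J += 0.5 * e[k] @ Q @ e[k] + 0.5 * us[k] @ R @ us[k]
    return J

def ilqr(f, x_init, us, x_des, Q, R, Qf, iters=20):
    N = len(us)
    xs = rollout(f, x_init, us)
    J = total_cost(xs, us, x_des, Q, R, Qf)
    for _ in range(iters):
        # Backward pass: cost-to-go 0.5*dx'P dx + p'dx around the current guess.
        P, p = Qf.copy(), Qf @ (xs[-1] - x_des[-1])
        Ks, ds = [None] * N, [None] * N
        for k in reversed(range(N)):
            A, B = linearize(f, xs[k], us[k])
            Qx = Q @ (xs[k] - x_des[k]) + A.T @ p
            Qu = R @ us[k] + B.T @ p
            Qxx = Q + A.T @ P @ A
            Quu = R + B.T @ P @ B
            Qux = B.T @ P @ A
            K = -np.linalg.solve(Quu, Qux)   # feedback gain
            d = -np.linalg.solve(Quu, Qu)    # feedforward change to the u tape
            p = Qx + K.T @ Quu @ d + K.T @ Qu + Qux.T @ d
            P = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
            Ks[k], ds[k] = K, d
        # Forward pass on the nonlinear system; shrink the step if cost goes up.
        for alpha in (1.0, 0.5, 0.25, 0.1, 0.01, 0.001):
            new_xs, new_us = [x_init], []
            for k in range(N):
                u = us[k] + alpha * ds[k] + Ks[k] @ (new_xs[k] - xs[k])
                new_us.append(u)
                new_xs.append(f(new_xs[k], u))
            new_xs, new_us = np.array(new_xs), np.array(new_us)
            new_J = total_cost(new_xs, new_us, x_des, Q, R, Qf)
            if new_J < J:
                xs, us, J = new_xs, new_us, new_J  # replace the old u tape
                break
    return xs, us, J
```

A typical call would start from a zero u tape and an x desired that sits at the goal at all times, for example `x_des = np.tile([np.pi, 0.0], (N + 1, 1))`. That x desired is not feasible, which is the situation described above. The line search is one simple way to guard against the linear model being poor far from where it was computed.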
1002 01:08:19,728 --> 01:08:20,840 Yeah. 1003 01:08:20,840 --> 01:08:22,092 Do you see that? 1004 01:08:22,092 --> 01:08:24,145 It's differential dynamic programming-- 1005 01:08:37,740 --> 01:08:38,327 almost. 1006 01:08:38,327 --> 01:08:40,410 There's a subtle difference, which I can tell you, 1007 01:08:40,410 --> 01:08:42,640 if you want. 1008 01:08:42,640 --> 01:08:44,069 There's a lot of names for it. 1009 01:08:44,069 --> 01:08:48,149 There's another guy, Bobrow-- some of you know Jim Bobrow-- 1010 01:08:48,149 --> 01:08:50,609 he wrote this up called the sequential linear quadratic 1011 01:08:50,609 --> 01:08:53,979 regulators. 1012 01:08:53,979 --> 01:08:57,423 Any four-letter acronym that ends in LQR-- 1013 01:08:57,423 --> 01:08:59,340 if you put it in Google, you'll find something 1014 01:08:59,340 --> 01:09:00,423 that's probably this idea. 1015 01:09:00,423 --> 01:09:01,350 Yeah. 1016 01:09:01,350 --> 01:09:04,050 If you put in whatever arbitrary constant in front of it, 1017 01:09:04,050 --> 01:09:06,558 you'll probably get this idea out. 1018 01:09:06,558 --> 01:09:10,645 AUDIENCE: What prevents your actuator costs 1019 01:09:10,645 --> 01:09:14,978 from accumulating from one iteration to another? 1020 01:09:14,978 --> 01:09:16,520 RUSS TEDRAKE: Every iteration, you're 1021 01:09:16,520 --> 01:09:19,140 trying to minimize your actuator cost. 1022 01:09:19,140 --> 01:09:23,051 AUDIENCE: Right, but I mean, if you have a lot of iterations, 1023 01:09:23,051 --> 01:09:25,540 couldn't that potentially grow? 1024 01:09:25,540 --> 01:09:28,950 RUSS TEDRAKE: I don't actually add to my old u tape. 1025 01:09:28,950 --> 01:09:31,740 I actually completely replace my old u tape 1026 01:09:31,740 --> 01:09:34,350 with a new controller which drives me to the system. 1027 01:09:34,350 --> 01:09:35,170 AUDIENCE: Oh, OK. 1028 01:09:35,170 --> 01:09:39,510 RUSS TEDRAKE: So there's no worries about additive actions. 
1029 01:09:39,510 --> 01:09:42,060 It actually tells me in my original non-linear system 1030 01:09:42,060 --> 01:09:44,700 what's my best guess as a u tape that goes there. 1031 01:09:47,990 --> 01:09:50,520 AUDIENCE: Is this basically a trick 1032 01:09:50,520 --> 01:09:53,882 to get rid of the slow [INAUDIBLE]?? 1033 01:09:53,882 --> 01:09:55,840 RUSS TEDRAKE: So very, very good-- so why would 1034 01:09:55,840 --> 01:09:56,590 I want to do this? 1035 01:09:56,590 --> 01:09:58,648 Why didn't I tell you about this first, or why-- 1036 01:09:58,648 --> 01:10:00,440 how does this compare to the other methods? 1037 01:10:03,220 --> 01:10:06,170 There is a sense by which-- and I thought about doing the whole 1038 01:10:06,170 --> 01:10:07,420 derivation, but I think this-- 1039 01:10:07,420 --> 01:10:12,130 I hope that this short discussion is sufficient. 1040 01:10:12,130 --> 01:10:14,230 So what I'm roughly doing is I'm using 1041 01:10:14,230 --> 01:10:19,660 LQR to come up with a quadratic approximation of where 1042 01:10:19,660 --> 01:10:20,500 my cost-- 1043 01:10:20,500 --> 01:10:21,730 where my minimum is. 1044 01:10:24,760 --> 01:10:27,970 This is very much in the spirit of those SQP 1045 01:10:27,970 --> 01:10:31,150 methods, the sequential quadratic methods. 1046 01:10:31,150 --> 01:10:33,970 I'm using computation on this line 1047 01:10:33,970 --> 01:10:37,447 to come up with a quadratic approximation of where I think 1048 01:10:37,447 --> 01:10:38,530 the new minimum should be. 1049 01:10:41,110 --> 01:11:01,400 So as such, it's a relatively cheap way with SQP properties, 1050 01:11:01,400 --> 01:11:02,935 convergence properties. 1051 01:11:12,640 --> 01:11:15,030 OK. 1052 01:11:15,030 --> 01:11:18,180 The methods I told you about on Thursday-- 1053 01:11:18,180 --> 01:11:19,890 the backprop through time, the RTRL-- 1054 01:11:22,530 --> 01:11:25,770 they computed J over my trajectory. 
1055 01:11:25,770 --> 01:11:30,000 They computed partial J, partial alpha over my trajectory. 1056 01:11:30,000 --> 01:11:34,830 They did not ever explicitly compute the second derivative. 1057 01:11:34,830 --> 01:11:42,450 I never computed partial J, partial alpha, partial alpha. 1058 01:11:42,450 --> 01:11:45,030 To explicitly do an SQP update, somebody 1059 01:11:45,030 --> 01:11:50,280 needs to compute the Hessian of that optimization. 1060 01:11:50,280 --> 01:11:53,160 I'm relying on SNOPT to do some bookkeeping 1061 01:11:53,160 --> 01:11:57,660 to estimate the Hessian to do the second-order update. 1062 01:11:57,660 --> 01:12:00,300 I would do better if I had an efficient way 1063 01:12:00,300 --> 01:12:02,370 to compute the second derivatives, 1064 01:12:02,370 --> 01:12:04,830 and I could hand that directly to SNOPT or whatever, 1065 01:12:04,830 --> 01:12:09,270 and we'd get-- expect faster convergence. 1066 01:12:09,270 --> 01:12:12,760 This isn't quite the gradients that I want, 1067 01:12:12,760 --> 01:12:15,810 but it has that feel to it, and it has similar convergence 1068 01:12:15,810 --> 01:12:16,715 properties. 1069 01:12:16,715 --> 01:12:18,090 So what you should think about is 1070 01:12:18,090 --> 01:12:19,632 you should think about this as a more 1071 01:12:19,632 --> 01:12:22,020 explicit second-order method for making 1072 01:12:22,020 --> 01:12:29,287 a large jump in my trajectories with sequential quadratic 1073 01:12:29,287 --> 01:12:30,120 convergence results. 1074 01:12:32,820 --> 01:12:36,270 I feel like I've lost everybody now, but ask questions, 1075 01:12:36,270 --> 01:12:38,820 if you need to. 1076 01:12:38,820 --> 01:12:43,680 The advantage of it is that it's fast. 1077 01:12:43,680 --> 01:12:45,870 It could potentially require very few iterations 1078 01:12:45,870 --> 01:12:48,465 to converge. 
1079 01:12:48,465 --> 01:12:51,090 One of the strongest disadvantages is that there's no explicit way 1080 01:12:51,090 --> 01:12:53,108 to do constraints. 1081 01:12:53,108 --> 01:12:55,650 You have to think harder about how to do constraints in this. 1082 01:12:58,170 --> 01:12:59,700 And I know of fewer formal guarantees 1083 01:12:59,700 --> 01:13:01,920 that it will succeed, because it's an approximation 1084 01:13:01,920 --> 01:13:03,645 of that quadratic. 1085 01:13:06,930 --> 01:13:10,860 So the RL community uses DDP a lot, 1086 01:13:10,860 --> 01:13:12,900 and actually, a lot of people who do DDP do 1087 01:13:12,900 --> 01:13:14,180 iterative LQR. 1088 01:13:14,180 --> 01:13:16,180 For instance, Peter [INAUDIBLE] and those guys-- 1089 01:13:16,180 --> 01:13:17,010 they always call it DDP. 1090 01:13:17,010 --> 01:13:18,552 They're actually doing iterative LQR. 1091 01:13:18,552 --> 01:13:21,930 DDP explicitly actually has-- 1092 01:13:21,930 --> 01:13:25,800 you have to do a second-order expansion of your dynamics, 1093 01:13:25,800 --> 01:13:27,840 so you don't just get A of t x. 1094 01:13:27,840 --> 01:13:30,930 You actually go to second-order expansion of your dynamics. 1095 01:13:30,930 --> 01:13:33,750 So it's a little bit more expensive of an update, 1096 01:13:33,750 --> 01:13:37,440 but most people equate it almost exactly to iterative LQR. 1097 01:13:40,570 --> 01:13:44,970 AUDIENCE: So this x0 trajectory, this 1098 01:13:44,970 --> 01:13:47,460 isn't a trajectory you found by doing 1099 01:13:47,460 --> 01:13:49,010 RTRL or something like that? 1100 01:13:49,010 --> 01:13:50,318 This is something different? 1101 01:13:50,318 --> 01:13:51,860 RUSS TEDRAKE: Good-- so this could be 1102 01:13:51,860 --> 01:13:54,350 a standard replacement to RTRL. 1103 01:13:54,350 --> 01:13:57,240 I could start with a random x0 trajectory. 
1104 01:13:57,240 --> 01:14:00,560 So maybe it's better to start with a random u trajectory, 1105 01:14:00,560 --> 01:14:03,530 simulate it, and get an x0 trajectory. 1106 01:14:03,530 --> 01:14:06,500 And then it will quickly reshape until it 1107 01:14:06,500 --> 01:14:09,869 gets as close as possible to this x desired trajectory. 1108 01:14:09,869 --> 01:14:12,202 AUDIENCE: But you're reshaping your control actions that 1109 01:14:12,202 --> 01:14:13,340 get you to the x trajectory? 1110 01:14:13,340 --> 01:14:14,090 RUSS TEDRAKE: Yes. 1111 01:14:14,090 --> 01:14:17,840 So I'm reshaping u, resimulating to get the new x. 1112 01:14:17,840 --> 01:14:18,440 Yeah. 1113 01:14:18,440 --> 01:14:20,590 I wrote it more carefully in the notes, and-- 1114 01:14:20,590 --> 01:14:24,560 but I hope this is the right level to do the class. 1115 01:14:24,560 --> 01:14:26,900 And there's one extra thing that-- 1116 01:14:26,900 --> 01:14:29,930 so I say this works if you have a desired x-- 1117 01:14:29,930 --> 01:14:36,180 desired trajectory, which means your cost function 1118 01:14:36,180 --> 01:14:37,410 has this sort of a form. 1119 01:14:40,170 --> 01:14:43,590 The advocates of iterative LQR and DDP 1120 01:14:43,590 --> 01:14:47,530 say that every cost function has this form. 1121 01:14:47,530 --> 01:14:50,590 This is just a second-order Taylor expansion 1122 01:14:50,590 --> 01:14:54,842 of whatever non-linear cost function you want. 1123 01:14:54,842 --> 01:14:56,800 So write down whatever non-linear cost function 1124 01:14:56,800 --> 01:15:00,070 you have, do a second-order expansion on it, 1125 01:15:00,070 --> 01:15:05,160 and you end up with a quadratic cost function like this. 1126 01:15:05,160 --> 01:15:08,540 And you can then approximate that solution 1127 01:15:08,540 --> 01:15:10,970 with an iterative LQR scheme-- 1128 01:15:10,970 --> 01:15:13,760 or RTRL, or backprop through time. 
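The second-order expansion described here can be written out as follows. This is a sketch: the running cost g(x, u), the nominal point (x_0, u_0), and the deviations are notation introduced here for illustration, not taken from the lecture.

```latex
\begin{aligned}
\delta x &= x - x_0, \qquad \delta u = u - u_0, \\
g(x,u) &\approx g(x_0,u_0)
  + g_x^\top\,\delta x + g_u^\top\,\delta u
  + \tfrac{1}{2}
  \begin{bmatrix}\delta x \\ \delta u\end{bmatrix}^{\!\top}
  \begin{bmatrix} g_{xx} & g_{xu} \\ g_{ux} & g_{uu} \end{bmatrix}
  \begin{bmatrix}\delta x \\ \delta u\end{bmatrix},
\end{aligned}
```

where all partial derivatives are evaluated at (x_0, u_0). The result is a full quadratic form in the deviations (quadratic, linear, and constant terms), which is the kind of cost the tracking version of LQR handles.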
1129 01:15:13,760 --> 01:15:18,020 This is the third out of our list of methods. 1130 01:15:18,020 --> 01:15:19,087 My goal is only to know-- 1131 01:15:19,087 --> 01:15:20,420 so that you know that it exists. 1132 01:15:20,420 --> 01:15:22,253 And you can read the notes if you want more, 1133 01:15:22,253 --> 01:15:24,450 and you can read the papers if you want more. 1134 01:15:24,450 --> 01:15:25,957 OK? 1135 01:15:25,957 --> 01:15:26,540 Yeah, Michael? 1136 01:15:26,540 --> 01:15:28,145 AUDIENCE: So I think last time you 1137 01:15:28,145 --> 01:15:31,130 talked about penalizing the deviation 1138 01:15:31,130 --> 01:15:33,020 from your nominal control input. 1139 01:15:33,020 --> 01:15:35,780 So what if you were-- like as you iterate 1140 01:15:35,780 --> 01:15:37,580 the controller, [INAUDIBLE]? 1141 01:15:41,660 --> 01:15:48,620 RUSS TEDRAKE: Good-- the total cost is actually the cost 1142 01:15:48,620 --> 01:15:52,760 with respect to some u desired. 1143 01:15:52,760 --> 01:15:55,460 So I end up trying to optimize that in a coordinate system 1144 01:15:55,460 --> 01:16:00,950 based on u0, but the cost I'm trying to minimize is the u 1145 01:16:00,950 --> 01:16:03,258 in the original coordinate system minus u desired-- 1146 01:16:03,258 --> 01:16:04,550 which, in a lot of cases, is 0. 1147 01:16:07,850 --> 01:16:09,860 Although I do it in a weird coordinate system, 1148 01:16:09,860 --> 01:16:12,410 and it actually eventually subtracts itself out 1149 01:16:12,410 --> 01:16:16,010 because I add it back in at the end, and-- 1150 01:16:16,010 --> 01:16:18,170 it's quite easy to, for instance, 1151 01:16:18,170 --> 01:16:21,560 minimize u squared in the original coordinate system. 1152 01:16:21,560 --> 01:16:22,060 OK? 1153 01:16:27,217 --> 01:16:29,050 So on Thursday, we get to do walking robots. 1154 01:16:29,050 --> 01:16:31,008 We're going to move on to the next major thing. 
1155 01:16:31,008 --> 01:16:37,120 But you've now learned three of the open-loop trajectory 1156 01:16:37,120 --> 01:16:39,510 optimizers that people really use-- 1157 01:16:39,510 --> 01:16:45,010 iterative LQR very quickly, RTRL backprop through time-- 1158 01:16:45,010 --> 01:16:46,540 I grouped as one-- 1159 01:16:46,540 --> 01:16:50,800 the shooting methods and direct collocation. 1160 01:16:50,800 --> 01:16:53,173 There's another one that's a recent addition 1161 01:16:53,173 --> 01:16:55,090 to the scene, which is this discrete mechanics 1162 01:16:55,090 --> 01:16:57,765 and optimal control, this DMOC. 1163 01:16:57,765 --> 01:16:59,140 If anybody was excited about that 1164 01:16:59,140 --> 01:17:00,310 and wanted to do a class project on that, 1165 01:17:00,310 --> 01:17:01,780 that would be a perfect thing. 1166 01:17:01,780 --> 01:17:03,630 Grab that paper. 1167 01:17:03,630 --> 01:17:06,047 Show us that it works on the [INAUDIBLE] cart-pole. 1168 01:17:06,047 --> 01:17:06,880 That'd be beautiful. 1169 01:17:06,880 --> 01:17:09,920 I'd love to have that-- 1170 01:17:09,920 --> 01:17:13,840 have us try that and see how it compares to the other methods. 1171 01:17:18,083 --> 01:17:20,500 You've got a pretty good toolkit for optimal control now-- 1172 01:17:20,500 --> 01:17:23,650 practical optimal control. 1173 01:17:23,650 --> 01:17:26,950 And it works for flying robots, but it also 1174 01:17:26,950 --> 01:17:30,070 worked for your wheeled robots, if you want to control them 1175 01:17:30,070 --> 01:17:31,210 with better control. 1176 01:17:35,470 --> 01:17:38,260 You could do a drop-in replacement LTI optimal 1177 01:17:38,260 --> 01:17:40,450 tracking controller, and it would be better-- 1178 01:17:40,450 --> 01:17:42,880 assuming your model's better. 1179 01:17:42,880 --> 01:17:46,300 So you have these tools. 1180 01:17:46,300 --> 01:17:49,190 Quick procedural things-- I know we're out of time. 
1181 01:17:49,190 --> 01:17:52,150 So next Thursday-- well, so let me say the good thing first. 1182 01:17:52,150 --> 01:17:54,640 In two weeks, you're on spring break. 1183 01:17:54,640 --> 01:17:56,560 Yeah. 1184 01:17:56,560 --> 01:17:58,945 The Thursday preceding that is our midterm. 1185 01:18:02,270 --> 01:18:04,458 We haven't had a midterm in the class before, 1186 01:18:04,458 --> 01:18:06,250 so there's no old exams for me to give you, 1187 01:18:06,250 --> 01:18:08,000 but John and I are going to try to come up 1188 01:18:08,000 --> 01:18:10,810 with some representative problems for you 1189 01:18:10,810 --> 01:18:13,570 to take home for Thursday of this week 1190 01:18:13,570 --> 01:18:17,350 so you can have some problems to munch on over the weekend. 1191 01:18:17,350 --> 01:18:21,170 It'll be an in-class exam Thursday before spring break, 1192 01:18:21,170 --> 01:18:23,800 which is a week from Thursday. 1193 01:18:23,800 --> 01:18:25,690 OK? 1194 01:18:25,690 --> 01:18:27,280 AUDIENCE: [INAUDIBLE] 1195 01:18:27,280 --> 01:18:28,030 RUSS TEDRAKE: Yes. 1196 01:18:28,030 --> 01:18:32,410 So open-book-- well, you can grab whatever notes-- 1197 01:18:32,410 --> 01:18:36,760 open-note exam-- absolutely. 1198 01:18:36,760 --> 01:18:38,950 Well, I'll say it more in the preparation package, 1199 01:18:38,950 --> 01:18:40,782 but roughly, we're going to-- 1200 01:18:40,782 --> 01:18:42,490 I think, if you have your notes with you, 1201 01:18:42,490 --> 01:18:44,350 if you've done the problem set-- 1202 01:18:44,350 --> 01:18:47,470 and most importantly, if you know how these algorithms-- 1203 01:18:47,470 --> 01:18:51,190 where the algorithms relate to each other and where they'd be 1204 01:18:51,190 --> 01:18:52,390 used in different systems-- 1205 01:18:52,390 --> 01:18:56,020 I can guarantee I'm going to ask you something about that-- 1206 01:18:56,020 --> 01:18:58,540 then it's not designed to be a killer. 
1207 01:19:02,560 --> 01:19:06,385 Good-- and I hope you start thinking about projects. 1208 01:19:09,400 --> 01:19:12,830 Just out of being a fairly nice person, 1209 01:19:12,830 --> 01:19:16,627 I wasn't going to ask you to do projects before your midterm. 1210 01:19:16,627 --> 01:19:18,460 But this time last year, I was asking people 1211 01:19:18,460 --> 01:19:21,310 to submit project proposals. 1212 01:19:21,310 --> 01:19:23,950 We're going to do that immediately after the midterm. 1213 01:19:23,950 --> 01:19:26,518 If you've been chewing on, this method 1214 01:19:26,518 --> 01:19:28,810 looked like a really good match to my research problem, 1215 01:19:28,810 --> 01:19:32,390 or I've never actually thought about juggling robots before, 1216 01:19:32,390 --> 01:19:35,722 or something like this, you can imagine-- 1217 01:19:35,722 --> 01:19:37,180 so in the fairly near future, we're 1218 01:19:37,180 --> 01:19:41,290 going to ask you for a half-page project proposal 1219 01:19:41,290 --> 01:19:43,450 that we can iterate with you on to get going 1220 01:19:43,450 --> 01:19:46,120 on a world-class final project. 1221 01:19:46,120 --> 01:19:48,070 Yeah? 1222 01:19:48,070 --> 01:19:49,860 See you Thursday.