1
00:00:00,000 --> 00:00:02,520
The following content is
provided under a Creative

2
00:00:02,520 --> 00:00:03,970
Commons license.

3
00:00:03,970 --> 00:00:06,330
Your support will help
MIT OpenCourseWare

4
00:00:06,330 --> 00:00:10,660
continue to offer high-quality
educational resources for free.

5
00:00:10,660 --> 00:00:13,320
To make a donation or
view additional materials

6
00:00:13,320 --> 00:00:17,160
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,160 --> 00:00:18,370
at ocw.mit.edu.

8
00:00:22,060 --> 00:00:23,800
RUSS TEDRAKE: OK, welcome back.

9
00:00:27,020 --> 00:00:31,510
So last week, we spent the week
talking about policy search

10
00:00:31,510 --> 00:00:36,130
methods, and trying to make
a distinction between those

11
00:00:36,130 --> 00:00:39,040
and the value-based
methods we started with.

12
00:00:39,040 --> 00:00:42,100
And by the end of the
week, we had a couple

13
00:00:42,100 --> 00:00:46,660
pretty slick methods
for optimizing

14
00:00:46,660 --> 00:00:50,390
an open-loop trajectory
of the system.

15
00:00:50,390 --> 00:00:52,360
So we talked about
at least two ways.

16
00:01:15,790 --> 00:01:17,790
So by open-loop, I mean
it's a function of time,

17
00:01:17,790 --> 00:01:20,880
not a function of state.

18
00:01:20,880 --> 00:01:30,750
We talked about the
shooting methods,

19
00:01:30,750 --> 00:01:47,070
where we evaluated J alpha of x0
at time 0 just by simulation.

20
00:01:47,070 --> 00:01:55,495
And we evaluated-- explicitly
evaluated the gradients by--

21
00:01:55,495 --> 00:01:57,120
well, I gave you two
algorithms for it.

22
00:01:57,120 --> 00:02:01,750
I gave you one that I called
back prop through time--

23
00:02:01,750 --> 00:02:03,570
which was an adjoint method--

24
00:02:03,570 --> 00:02:06,990
and another one
that I called RTRL--

25
00:02:06,990 --> 00:02:08,889
real-time recurrent
learning, which

26
00:02:08,889 --> 00:02:10,889
are the names from the
neural network community,

27
00:02:10,889 --> 00:02:14,490
but perfectly good
names for those methods.

28
00:02:26,292 --> 00:02:27,750
And then the claim
was that, if you

29
00:02:27,750 --> 00:02:33,780
can compute those two
things by simulation or--

30
00:02:33,780 --> 00:02:37,470
forward simulation and then
a back propagation pass,

31
00:02:37,470 --> 00:02:40,890
or a simulation, which carried
also the derivatives forward

32
00:02:40,890 --> 00:02:44,340
in time, then we could
hand those gradients

33
00:02:44,340 --> 00:02:50,365
to SNOPT or some other
non-linear optimization

34
00:02:50,365 --> 00:02:50,865
package.
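Here is a minimal sketch of the shooting computation described above. It rests on several assumptions that are not the lecture's exact setup: a damped pendulum with m = l = 1, a forward-Euler discretization, and a cost made of control effort plus a quadratic final-state penalty. The forward pass simulates the control tape. The backward adjoint pass (back prop through time) then accumulates dJ/du, and that gradient is what would be handed to a nonlinear optimizer.

```python
import numpy as np

# Assumed model: damped pendulum, m = l = 1, forward-Euler time step DT.
G, DAMP, DT = 9.81, 0.1, 0.01

def step(x, u):
    """Discrete dynamics x[n+1] = x[n] + DT * f(x[n], u[n])."""
    th, thd = x
    return x + DT * np.array([thd, u - DAMP * thd - G * np.sin(th)])

def step_jacobians(x):
    """Partial derivatives of step() with respect to x and u."""
    A = np.eye(2) + DT * np.array([[0.0, 1.0], [-G * np.cos(x[0]), -DAMP]])
    B = DT * np.array([0.0, 1.0])
    return A, B

def cost_and_grad(u_tape, x0, x_goal, Qf):
    """J = DT * sum(u^2) + (x[N] - x_goal)' Qf (x[N] - x_goal), plus dJ/du."""
    # Forward pass: simulate the open-loop tape and store the states.
    xs = [np.asarray(x0, dtype=float)]
    for u in u_tape:
        xs.append(step(xs[-1], u))
    err = xs[-1] - x_goal
    J = DT * np.sum(u_tape ** 2) + err @ Qf @ err
    # Backward (adjoint) pass: lam holds dJ/dx[n+1] at the top of each iteration.
    lam = 2.0 * Qf @ err
    grad = np.zeros_like(u_tape)
    for n in reversed(range(len(u_tape))):
        A, B = step_jacobians(xs[n])
        grad[n] = 2.0 * DT * u_tape[n] + B @ lam
        lam = A.T @ lam  # no running state cost, so dJ/dx[n] = A' dJ/dx[n+1]
    return J, grad
```

The adjoint pass costs about one extra simulation, however long the tape is. A finite-difference gradient, by contrast, needs two simulations per tape entry.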

35
00:02:54,400 --> 00:02:56,400
And if we're good, we
can also lean on SNOPT

36
00:02:56,400 --> 00:02:58,650
to handle things like
final value constraints.

37
00:02:58,650 --> 00:03:00,900
If you want to make sure
the trajectory succeeds

38
00:03:00,900 --> 00:03:02,610
in getting you
exactly to the goal

39
00:03:02,610 --> 00:03:05,193
or if you want to make sure that
your torques are never bigger

40
00:03:05,193 --> 00:03:07,500
than some maximum
torque allowed, then you

41
00:03:07,500 --> 00:03:10,770
can take advantage of that.

42
00:03:10,770 --> 00:03:14,190
And the second method, remember,
was the direct collocation method,

43
00:03:14,190 --> 00:03:17,400
which we often
abbreviate as DIRCOL.

44
00:03:21,660 --> 00:03:26,290
And the big idea there
was to over-parameterize

45
00:03:26,290 --> 00:03:38,100
our optimization with the
open-loop trajectory, but also

46
00:03:38,100 --> 00:03:44,780
the state trajectory,
which makes coming up

47
00:03:44,780 --> 00:03:51,485
with gradients simple.

48
00:03:55,610 --> 00:04:05,455
And then I have to enforce
the constraint that x of--

49
00:04:05,455 --> 00:04:06,830
let's say, in
discrete time here,

50
00:04:06,830 --> 00:04:11,301
n plus 1 had better be
subject to the dynamics--
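This sketch makes the direct collocation idea concrete by writing out the defect constraints in the discrete-time (forward-Euler) form mentioned here. DIRCOL implementations commonly use a higher-order collocation rule instead, so treat this as a simplified stand-in. The pendulum model and the layout of the decision vector `z`, which stacks every state and then every control, are assumptions. An optimizer would drive every defect to zero. Each defect touches only x[n], u[n], and x[n+1], so the constraint gradients are sparse and easy to write down.

```python
import numpy as np

# Assumed model: damped pendulum, m = l = 1, forward-Euler step DT.
G, DAMP, DT = 9.81, 0.1, 0.01

def f(x, u):
    return np.array([x[1], u - DAMP * x[1] - G * np.sin(x[0])])

def unpack(z, N):
    """z = [x_0, ..., x_N (2 entries each), u_0, ..., u_{N-1}] (assumed layout)."""
    xs = z[: 2 * (N + 1)].reshape(N + 1, 2)
    us = z[2 * (N + 1):]
    return xs, us

def defects(z, N):
    """Stacked equality constraints x[n+1] - (x[n] + DT f(x[n], u[n])) = 0."""
    xs, us = unpack(z, N)
    return np.concatenate(
        [xs[n + 1] - (xs[n] + DT * f(xs[n], us[n])) for n in range(N)]
    )
```

If `z` is built by simulating a control tape, every defect is zero. Perturbing a single state x[k] changes only the two defects that contain it.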

51
00:04:17,560 --> 00:04:23,230
so two very similar
methods of trying

52
00:04:23,230 --> 00:04:28,673
to compute some
open-loop trajectory

53
00:04:28,673 --> 00:04:29,590
as a function of time.

54
00:04:29,590 --> 00:04:33,610
Ultimately, what I care
about is a set of actions

55
00:04:33,610 --> 00:04:37,540
that I apply over time that
will get me to the goal

56
00:04:37,540 --> 00:04:41,320
or minimize my cost function.

57
00:04:41,320 --> 00:04:44,020
In the case where I
explicitly parameterized

58
00:04:44,020 --> 00:04:50,770
an open-loop trajectory,
both of these result

59
00:04:50,770 --> 00:04:53,980
in a solution which satisfies
the Pontryagin minimum

60
00:04:53,980 --> 00:04:59,323
principle, subject
to discretization

61
00:04:59,323 --> 00:05:00,490
errors and things like that.

62
00:05:29,636 --> 00:05:34,053
AUDIENCE: [INAUDIBLE]

63
00:05:34,053 --> 00:05:35,220
RUSS TEDRAKE: We did, right.

64
00:05:35,220 --> 00:05:40,540
So I should say, subject
to time discretization.

65
00:05:40,540 --> 00:05:45,130
That's the one place
where technically, it

66
00:05:45,130 --> 00:05:47,980
would satisfy a discrete
time version of Pontryagin's

67
00:05:47,980 --> 00:05:48,730
minimum principle.

68
00:05:48,730 --> 00:06:01,895
AUDIENCE: [INAUDIBLE]

69
00:06:01,895 --> 00:06:03,270
RUSS TEDRAKE: You
can think of it

70
00:06:03,270 --> 00:06:06,060
whichever way it
makes you happier--

71
00:06:06,060 --> 00:06:09,900
so in fact, the parameters
that you hand in--

72
00:06:09,900 --> 00:06:12,710
maybe it's easier to think
of it as a function--

73
00:06:12,710 --> 00:06:15,210
a discrete function of time,
because you're going to hand it

74
00:06:15,210 --> 00:06:20,310
u at certain points in time,
and you're going to handle x--

75
00:06:20,310 --> 00:06:23,760
hand it x at certain
points in time.

76
00:06:23,760 --> 00:06:27,300
And this discrete time update
can be an Euler integration

77
00:06:27,300 --> 00:06:31,260
or a higher order integration
of your continuous dynamics,

78
00:06:31,260 --> 00:06:32,760
but you only satisfy
the constraints

79
00:06:32,760 --> 00:06:33,960
at discrete intervals of time.

80
00:06:33,960 --> 00:06:34,460
Yeah.

81
00:06:39,660 --> 00:06:42,940
OK, I did give you a
slightly more general--

82
00:06:42,940 --> 00:06:47,470
I tried to point out that these
methods could equally well

83
00:06:47,470 --> 00:06:52,390
find good parameters of
a feedback controller or something

84
00:06:52,390 --> 00:06:53,800
too.

85
00:06:53,800 --> 00:06:56,770
The simple case was
when my parameters alpha

86
00:06:56,770 --> 00:07:00,880
were explicitly my
control tape, but more

87
00:07:00,880 --> 00:07:03,460
generally, if you wanted
to tune a feedback

88
00:07:03,460 --> 00:07:05,320
controller-- a linear
feedback controller,

89
00:07:05,320 --> 00:07:06,580
or a non-linear
feedback controller,

90
00:07:06,580 --> 00:07:08,290
or a neural network,
or whatever it is,

91
00:07:08,290 --> 00:07:11,500
you can use the same
methods to do that.

92
00:07:11,500 --> 00:07:13,600
I would only make this
statement in the case

93
00:07:13,600 --> 00:07:17,740
where the controller
specifically

94
00:07:17,740 --> 00:07:26,267
is the open-loop tape,
because if I parameterized

95
00:07:26,267 --> 00:07:28,600
my trajectory by some feedback
controller, for instance,

96
00:07:28,600 --> 00:07:31,120
then that's going to
restrict the policy class.

97
00:07:31,120 --> 00:07:33,250
That's going to restrict
the class of tapes

98
00:07:33,250 --> 00:07:36,430
that I can look over, which
makes it a more compact, more

99
00:07:36,430 --> 00:07:38,890
efficient way to solve
your optimization,

100
00:07:38,890 --> 00:07:41,290
but potentially prevents
you from achieving

101
00:07:41,290 --> 00:07:43,150
a perfect minimum.

102
00:07:46,998 --> 00:07:50,110
AUDIENCE: [INAUDIBLE]

103
00:07:50,110 --> 00:07:50,860
RUSS TEDRAKE: Yep.

104
00:07:50,860 --> 00:07:53,163
So by virtue of saying
that they satisfy

105
00:07:53,163 --> 00:07:54,580
Pontryagin's minimum
principle, we

106
00:07:54,580 --> 00:07:56,860
know that that's
only a local optimum.

107
00:07:56,860 --> 00:08:01,480
This says that I can't
make a small change in u

108
00:08:01,480 --> 00:08:04,000
to get better performance.

109
00:08:04,000 --> 00:08:04,960
Yep.

110
00:08:04,960 --> 00:08:08,692
But it's only a necessary
condition for optimality, not

111
00:08:08,692 --> 00:08:09,400
a sufficient one.

112
00:08:11,980 --> 00:08:16,220
But there's a bigger problem
with it-- with both of those.

113
00:08:16,220 --> 00:08:19,750
And that's the fact
that they're completely

114
00:08:19,750 --> 00:08:23,320
useless in real
life, unless I do

115
00:08:23,320 --> 00:08:26,350
one more step, which is to
stabilize the trajectory that I

116
00:08:26,350 --> 00:08:28,210
get out.

117
00:08:28,210 --> 00:08:35,530
So finding some open-loop
trajectory by these methods,

118
00:08:35,530 --> 00:08:38,740
satisfying Pontryagin's
minimum principle-- fine.

119
00:08:38,740 --> 00:08:42,760
But there's nothing in
this process that guarantees robustness.

120
00:08:42,760 --> 00:08:46,750
If I change my
initial conditions by epsilon,

121
00:08:46,750 --> 00:08:48,970
I could completely
diverge when I follow--

122
00:08:48,970 --> 00:08:52,180
when I execute that
open-loop trajectory.

123
00:08:52,180 --> 00:08:55,330
If I change my simulation
time step by a little bit,

124
00:08:55,330 --> 00:08:56,320
I might diverge.

125
00:08:56,320 --> 00:08:58,610
If I have modeling
errors, I might diverge.

126
00:09:01,120 --> 00:09:04,695
So in order to make these
useful for a real system,

127
00:09:04,695 --> 00:09:06,070
we have to do
another step, which

128
00:09:06,070 --> 00:09:08,350
is trajectory stabilization.

129
00:09:08,350 --> 00:09:11,020
And it actually follows quite
naturally from the things

130
00:09:11,020 --> 00:09:13,530
we've already talked about.

131
00:09:13,530 --> 00:09:17,080
OK, so today we're going
to give these guys teeth

132
00:09:17,080 --> 00:09:18,896
with trajectory stabilization.

133
00:09:52,160 --> 00:09:54,650
And I'll show you
examples of a trajectory

134
00:09:54,650 --> 00:09:56,870
that's optimized beautifully
for the pendulum even,

135
00:09:56,870 --> 00:09:59,060
and if I simulate it back a
little differently, it

136
00:09:59,060 --> 00:10:00,143
just does the wrong thing.

137
00:10:00,143 --> 00:10:01,550
It never gets to the top.

138
00:10:01,550 --> 00:10:06,460
So we want to get
rid of that problem.

139
00:10:09,190 --> 00:10:12,675
OK, so the solution is to design
a trajectory stabilizer.

140
00:10:19,330 --> 00:10:27,040
Now, for those of
you that have been

141
00:10:27,040 --> 00:10:28,893
playing with robots
for many years, when

142
00:10:28,893 --> 00:10:30,310
you hear trajectory
stabilization,

143
00:10:30,310 --> 00:10:31,360
what do you think of?

144
00:10:31,360 --> 00:10:33,730
What kind of tricks
do people use

145
00:10:33,730 --> 00:10:35,368
for trajectory stabilization?

146
00:10:37,735 --> 00:10:38,860
AUDIENCE: Sliding surfaces.

147
00:10:38,860 --> 00:10:41,830
RUSS TEDRAKE: Sliding surfaces--
that's a good one for--

148
00:10:41,830 --> 00:10:45,820
[INAUDIBLE] often will
design a sliding surface

149
00:10:45,820 --> 00:10:47,470
and squish the error dynamics.

150
00:10:50,800 --> 00:10:53,320
That's actually
pretty encompassing.

151
00:10:53,320 --> 00:10:56,380
I think a lot of the
trajectory stabilizers

152
00:10:56,380 --> 00:10:59,410
are based on sliding
modes or feedback

153
00:10:59,410 --> 00:11:02,440
linearization in some form.

154
00:11:02,440 --> 00:11:06,388
And all I'll say about it
is that the story's sort

155
00:11:06,388 --> 00:11:07,930
of the same as
everything we've said.

156
00:11:07,930 --> 00:11:10,090
If you have a fully
actuated system,

157
00:11:10,090 --> 00:11:14,980
it's not hard to design
a trajectory stabilizer.

158
00:11:14,980 --> 00:11:16,930
A good sliding mode
controller could take--

159
00:11:16,930 --> 00:11:21,190
could work even for an
underactuated system,

160
00:11:21,190 --> 00:11:22,580
but I think there's a--

161
00:11:22,580 --> 00:11:25,570
I prefer the linear
quadratic form

162
00:11:25,570 --> 00:11:27,610
of these trajectory
stabilizers.

163
00:11:27,610 --> 00:11:31,690
OK, so we want to do a
trajectory stabilization that's

164
00:11:31,690 --> 00:11:33,410
suitable for
underactuated systems.

165
00:11:45,070 --> 00:11:47,200
And the approach is
going to be with LQR.

166
00:12:02,490 --> 00:12:03,870
OK, so if we're
going to use LQR,

167
00:12:03,870 --> 00:12:06,285
we better be able to
linearize our system.

168
00:12:08,920 --> 00:12:11,670
So far, when we've done
the linearizations,

169
00:12:11,670 --> 00:12:15,875
we've only done them
at fixed points.

170
00:12:15,875 --> 00:12:17,250
So the first thing
we have to ask

171
00:12:17,250 --> 00:12:21,060
ourselves is, what happens if
we try to linearize at a more

172
00:12:21,060 --> 00:12:23,440
arbitrary point in state space?

173
00:12:23,440 --> 00:12:23,940
Yeah.

174
00:12:31,860 --> 00:12:38,890
So let's say I've got the
system x dot equals f of x, u,

175
00:12:38,890 --> 00:12:46,200
and now I want to linearize
around some x0, u0, but not

176
00:12:46,200 --> 00:12:48,990
necessarily a carefully
chosen x0, u0--

177
00:12:48,990 --> 00:12:51,270
just something random
in state space.

178
00:12:54,480 --> 00:12:57,810
The Taylor expansion
of this says

179
00:12:57,810 --> 00:13:01,980
that this thing's going
to be approximately

180
00:13:01,980 --> 00:13:09,579
f of x0, u0 plus partial f, partial x
evaluated at x0, u0 times x minus x0, plus partial f, partial u times u minus u0.

181
00:13:27,210 --> 00:13:30,720
OK, and we called this
before A and this B,

182
00:13:30,720 --> 00:13:49,510
and so that thing we can
actually write, in general, as x dot is approximately c plus A times x minus x0 plus B times u minus u0.

183
00:13:49,510 --> 00:13:53,332
In the case where f
of x0, u0-- if x0,

184
00:13:53,332 --> 00:13:54,790
u0 was a fixed
point of the system,

185
00:13:54,790 --> 00:13:59,620
that term disappears,
but be careful.

186
00:13:59,620 --> 00:14:02,860
If you're doing your
linearization out here,

187
00:14:02,860 --> 00:14:05,230
if you're at-- not
at a fixed point,

188
00:14:05,230 --> 00:14:07,180
if you have any
velocity, for instance,

189
00:14:07,180 --> 00:14:11,915
then, in the original
x-coordinates,

190
00:14:11,915 --> 00:14:14,290
it's not actually-- the Taylor
expansion doesn't give you

191
00:14:14,290 --> 00:14:15,370
a linear system.

192
00:14:15,370 --> 00:14:16,810
It gives you some affine system.

193
00:14:16,810 --> 00:14:18,768
This thing is harder
to work with-- not

194
00:14:18,768 --> 00:14:20,560
incredibly harder, but
harder to work with.
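To see the affine term numerically, this sketch returns c = f(x0, u0), A, and B at an arbitrary point. It assumes a damped pendulum model and uses finite-difference Jacobians, neither of which comes from the lecture. At the upright fixed point, c vanishes. At a point with nonzero velocity it does not, so the Taylor expansion in the original x coordinates is affine rather than linear.

```python
import numpy as np

G, DAMP = 9.81, 0.1  # assumed pendulum parameters (m = l = 1)

def f(x, u):
    return np.array([x[1], u - DAMP * x[1] - G * np.sin(x[0])])

def taylor_terms(x0, u0, eps=1e-6):
    """Return (c, A, B) with f(x, u) ~= c + A (x - x0) + B (u - u0)."""
    c = f(x0, u0)
    # Central differences, one column of A per state coordinate.
    A = np.column_stack(
        [(f(x0 + eps * e, u0) - f(x0 - eps * e, u0)) / (2 * eps) for e in np.eye(2)]
    )
    B = ((f(x0, u0 + eps) - f(x0, u0 - eps)) / (2 * eps)).reshape(2, 1)
    return c, A, B

c_top, _, _ = taylor_terms(np.array([np.pi, 0.0]), 0.0)    # fixed point: c ~ 0
c_moving, _, _ = taylor_terms(np.array([1.0, 2.0]), 0.0)   # moving: c != 0
```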

195
00:14:23,820 --> 00:14:26,230
The solution is quite
simple, but I just

196
00:14:26,230 --> 00:14:30,820
wanted to say it
the bad way first

197
00:14:30,820 --> 00:14:34,420
so that you appreciate
the good way.

198
00:14:34,420 --> 00:14:47,780
If we change coordinates
and we use instead

199
00:14:47,780 --> 00:14:57,530
for our coordinates the
difference between x and x0

200
00:14:57,530 --> 00:15:27,080
of t, then x bar dot is going
to be x dot minus x0 dot

201
00:15:27,080 --> 00:15:37,730
equals x dot minus f of
x0, u0, which is that C.

202
00:15:37,730 --> 00:15:45,150
This guy here is taken care of
in this new coordinate system,

203
00:15:45,150 --> 00:15:47,900
which allows me to write
the whole thing as x

204
00:15:47,900 --> 00:15:51,680
bar dot equals A x bar plus B u bar, where u bar is u minus u0 of t.
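Written out, the change of coordinates described here is the standard first-order Taylor expansion about a nominal trajectory. The notation follows the spoken x0, u0, and x bar.

```latex
\begin{aligned}
\bar{x}(t) &= x(t) - x_0(t), \qquad \bar{u}(t) = u(t) - u_0(t), \qquad \dot{x}_0(t) = f\big(x_0(t), u_0(t)\big), \\
\dot{\bar{x}} &= \dot{x} - \dot{x}_0 = f(x, u) - f\big(x_0(t), u_0(t)\big) \approx A(t)\,\bar{x} + B(t)\,\bar{u}, \\
A(t) &= \left.\frac{\partial f}{\partial x}\right|_{x_0(t),\,u_0(t)}, \qquad
B(t) = \left.\frac{\partial f}{\partial u}\right|_{x_0(t),\,u_0(t)}.
\end{aligned}
```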

205
00:16:03,220 --> 00:16:04,740
You with me on that?

206
00:16:04,740 --> 00:16:11,290
Linearizing a system at
a more arbitrary point--

207
00:16:11,290 --> 00:16:14,100
doing a Taylor expansion
results in a linear system

208
00:16:14,100 --> 00:16:17,640
only if you change
coordinates to lie

209
00:16:17,640 --> 00:16:21,810
on some system trajectory.

210
00:16:21,810 --> 00:16:40,650
So x0, u0 must be a solution of
x dot equals f of x, u-- of that equation.

211
00:16:40,650 --> 00:16:46,830
And then the system reduces to
a linear system description.

212
00:16:46,830 --> 00:16:51,670
But the cost you pay for
this beautiful, simple--

213
00:16:51,670 --> 00:16:54,600
well, let me be even a
little bit more careful.

214
00:16:54,600 --> 00:17:01,690
So A here, this
partial f, partial x,

215
00:17:01,690 --> 00:17:07,030
is evaluated at x0 of t, u0 of t.

216
00:17:07,030 --> 00:17:09,160
And in general,
A and B in this--

217
00:17:09,160 --> 00:17:15,603
when I do this are functions
of time, rather than of x and u.

218
00:17:18,407 --> 00:17:19,740
That's a pretty important point.

219
00:17:22,730 --> 00:17:25,760
So if I'm willing to change
coordinates to live along

220
00:17:25,760 --> 00:17:29,420
the trajectory, then the
result is I can get this linear

221
00:17:29,420 --> 00:17:34,310
time-varying model of the
dynamics along feasible

222
00:17:34,310 --> 00:17:35,540
trajectories--

223
00:17:35,540 --> 00:17:37,460
system trajectories.

224
00:17:37,460 --> 00:17:39,380
The cost is that
you have to work

225
00:17:39,380 --> 00:17:43,340
in a coordinate system that
moves along your trajectory.

226
00:17:43,340 --> 00:17:46,438
So we'll see where that
comes in in a little bit.

227
00:17:46,438 --> 00:17:48,980
But the first question is, OK,
let's say I've got this linear

228
00:17:48,980 --> 00:17:51,080
time-varying--

229
00:17:51,080 --> 00:17:52,550
time-varying linear system.

230
00:17:52,550 --> 00:17:55,940
Can I do all the things
I want to do with that?

231
00:17:55,940 --> 00:17:59,690
In most of our control classes,
we end up doing LTI systems.

232
00:18:02,780 --> 00:18:05,907
LTV systems-- linear
time-varying--

233
00:18:19,070 --> 00:18:24,680
are actually a fantastically
rich class of systems

234
00:18:24,680 --> 00:18:27,470
that we don't talk about
enough, I think, in life.

235
00:18:27,470 --> 00:18:29,600
They're still linear systems.

236
00:18:29,600 --> 00:18:33,320
Superposition still holds.

237
00:18:33,320 --> 00:18:42,210
If I have initial
condition 1 and some u

238
00:18:42,210 --> 00:18:48,650
trajectory 1 for t
greater than or equal to t0,

239
00:18:48,650 --> 00:18:52,850
and that gives me some
resulting x trajectory out

240
00:18:52,850 --> 00:19:00,710
for t greater than t0, and
I have another solution

241
00:19:00,710 --> 00:19:05,270
with a different
initial condition

242
00:19:05,270 --> 00:19:14,760
and a different control, and
that gives me a different--

243
00:19:14,760 --> 00:19:20,240
I call this x1, x2 for t
greater than or equal to t0--

244
00:19:25,420 --> 00:19:28,120
if I have that,
then it had better be

245
00:19:28,120 --> 00:19:37,960
the case that starting from alpha 1 x1
of t0 plus alpha 2 x2 of t0,

246
00:19:37,960 --> 00:19:48,520
with the control tape alpha 1 u1
plus alpha 2 u2,

247
00:19:48,520 --> 00:19:50,320
is going to result
in a trajectory which

248
00:19:50,320 --> 00:19:55,780
is alpha 1 x1 plus alpha 2 x2.

249
00:20:00,040 --> 00:20:01,330
That's superposition.

250
00:20:01,330 --> 00:20:08,470
That's the defining
characteristic of linearity.

251
00:20:08,470 --> 00:20:11,290
And even though this is a
richer class of systems--

252
00:20:11,290 --> 00:20:17,200
these A of t times x of
t, plus B of t times u of t--

253
00:20:17,200 --> 00:20:18,670
superposition still holds.

254
00:20:22,710 --> 00:20:25,560
And in fact, a lot
of our derivations

255
00:20:25,560 --> 00:20:28,140
that we've done that are for
linear systems still hold.
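Here is a quick numerical check of superposition for a linear time-varying system. The A(t) and B(t) below are arbitrary illustrative choices, not taken from the lecture, and the integration is forward Euler. Scaling and adding two (initial condition, control tape) pairs should scale and add the resulting final states.

```python
import numpy as np

DT, N = 1e-3, 2000  # assumed Euler step and number of steps

def A_of_t(t):
    return np.array([[0.0, 1.0], [-9.81 * np.cos(t), -0.1]])  # arbitrary A(t)

def B_of_t(t):
    return np.array([[0.0], [1.0 + 0.5 * np.sin(t)]])         # arbitrary B(t)

def simulate_ltv(x_init, u_tape):
    """Forward-Euler integration of x dot = A(t) x + B(t) u; returns the final state."""
    x = np.asarray(x_init, dtype=float)
    for n, u in enumerate(u_tape):
        t = n * DT
        x = x + DT * (A_of_t(t) @ x + B_of_t(t)[:, 0] * u)
    return x
```

For any scalars a1 and a2, `simulate_ltv(a1*x1 + a2*x2, a1*u1 + a2*u2)` should match `a1*simulate_ltv(x1, u1) + a2*simulate_ltv(x2, u2)` up to round-off, even though A and B change with time.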

256
00:20:36,660 --> 00:20:39,600
OK, so now the question
is, how do we design--

257
00:20:39,600 --> 00:20:42,950
how do we work with the fact
that this thing is still easy,

258
00:20:42,950 --> 00:20:45,050
and design a
controller that works

259
00:20:45,050 --> 00:20:47,320
with this new linearized system?

260
00:21:13,118 --> 00:21:15,660
Maybe first I should break out
my colored chalk and make sure

261
00:21:15,660 --> 00:21:16,868
we have intuition about this.

262
00:21:21,278 --> 00:21:22,820
Do you understand
what this is doing,

263
00:21:22,820 --> 00:21:26,770
if I do this time-varying
linearization?

264
00:21:26,770 --> 00:21:29,020
Let me do an example
with the pendulum

265
00:21:29,020 --> 00:21:34,480
here, our favorite
theta, theta dot.

266
00:21:34,480 --> 00:21:37,390
And let's say we carve up--

267
00:21:37,390 --> 00:21:43,420
we find some nice solution which
gets me from my one fixed point

268
00:21:43,420 --> 00:21:45,425
to the other fixed point.

269
00:21:45,425 --> 00:21:47,800
The ones we were getting were
these pump-up trajectories,

270
00:21:47,800 --> 00:21:49,570
which looked
something like this.

271
00:21:56,030 --> 00:21:59,990
I'm moving through
state space here,

272
00:21:59,990 --> 00:22:05,660
and the dynamics here vary
with state in a non-linear way.

273
00:22:05,660 --> 00:22:08,660
But if I have a trajectory,
a feasible trajectory

274
00:22:08,660 --> 00:22:12,230
that goes through the
relevant parts of state space,

275
00:22:12,230 --> 00:22:15,410
then this time-varying
linearization takes

276
00:22:15,410 --> 00:22:20,570
my non-linear system, and
makes it parameterized

277
00:22:20,570 --> 00:22:23,883
only-- instead of being
parameterized by state,

278
00:22:23,883 --> 00:22:25,550
it's going to make
it parameterized only

279
00:22:25,550 --> 00:22:28,483
by time along the trajectory.

280
00:22:33,060 --> 00:22:41,910
The trick is the
trajectory allows

281
00:22:41,910 --> 00:23:01,720
me to reparameterize my
non-linearity in terms of time,

282
00:23:01,720 --> 00:23:03,548
instead of state.

283
00:23:03,548 --> 00:23:04,840
It sounds like a simple thing--

284
00:23:04,840 --> 00:23:06,130
I'm just reparameterizing it--

285
00:23:06,130 --> 00:23:08,170
but it makes all the
difference in the world.

286
00:23:08,170 --> 00:23:11,110
If things are parameterized
as a function of time,

287
00:23:11,110 --> 00:23:13,902
and are otherwise linear,
then I could do all kinds

288
00:23:13,902 --> 00:23:14,860
of computation on them.

289
00:23:14,860 --> 00:23:17,290
I can integrate the equations.

290
00:23:17,290 --> 00:23:20,832
I can design quadratic
regulators on it.

291
00:23:20,832 --> 00:23:22,540
It makes all the
difference in the world.

292
00:23:26,650 --> 00:23:29,530
So what I'm effectively
doing is coming up

293
00:23:29,530 --> 00:23:34,360
with local linear
representations of the dynamics

294
00:23:34,360 --> 00:23:35,290
along the trajectory.

295
00:23:35,290 --> 00:23:37,630
I'm not sure if this is a
helpful way for me to draw it,

296
00:23:37,630 --> 00:23:41,710
but you can think of this thing
as approximating the dynamics

297
00:23:41,710 --> 00:23:43,840
along that trajectory.

298
00:23:43,840 --> 00:23:45,760
At every given
instant in time, I'm

299
00:23:45,760 --> 00:23:49,600
going to use one of
these linear models.

300
00:23:49,600 --> 00:23:51,730
This is supposed
to be some plane

301
00:23:51,730 --> 00:23:53,122
that you're driving through--

302
00:23:53,122 --> 00:23:54,580
not sure if that's
actually a helpful

303
00:23:54,580 --> 00:23:57,920
graphic, but it's the
way I think of it.

304
00:24:01,190 --> 00:24:04,280
And by virtue of taking a
particular path through,

305
00:24:04,280 --> 00:24:07,670
I can make locally linear
models on which these things

306
00:24:07,670 --> 00:24:09,320
have eigenvectors,
and eigenvalues,

307
00:24:09,320 --> 00:24:14,480
or whatever that are
valid in the neighborhood

308
00:24:14,480 --> 00:24:18,210
of the trajectory.

309
00:24:18,210 --> 00:24:22,970
So you can imagine, even
without any stabilization,

310
00:24:22,970 --> 00:24:26,000
that I
could quickly assess

311
00:24:26,000 --> 00:24:30,170
the stability of my
time-varying linear model.

312
00:24:30,170 --> 00:24:32,150
And trajectories in
this linear model

313
00:24:32,150 --> 00:24:35,900
may converge to the
nominal trajectory,

314
00:24:35,900 --> 00:24:40,150
or they may diverge,
depending on A and B.

315
00:24:40,150 --> 00:24:42,380
Or they may blow up.

316
00:24:46,250 --> 00:24:48,470
This is by far the more
common case, unfortunately.

317
00:24:48,470 --> 00:24:51,020
You'd be very lucky to come
out of a shooting method

318
00:24:51,020 --> 00:24:53,570
or a direct collocation method,
and end up with a system

319
00:24:53,570 --> 00:24:55,403
where if you played it
out, it just happened

320
00:24:55,403 --> 00:24:58,280
to be a stable trajectory.

321
00:24:58,280 --> 00:25:00,260
But we can assess
all that quickly

322
00:25:00,260 --> 00:25:05,810
with these time-varying
linearizations found locally.
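One way to make that quick assessment concrete is to multiply the discrete LTV transition matrices along the nominal trajectory. This sketch assumes a pendulum model, forward-Euler integration, and a zero control tape started near the upright. The singular values of the product indicate how much a small initial deviation can grow by the end of the trajectory. For a small perturbation, Phi times the initial deviation should match the difference between two nonlinear simulations.

```python
import numpy as np

# Assumed model: damped pendulum, m = l = 1, forward-Euler step DT.
G, DAMP, DT = 9.81, 0.1, 1e-3

def f(x, u):
    return np.array([x[1], u - DAMP * x[1] - G * np.sin(x[0])])

def simulate(x_init, u_tape):
    xs = [np.asarray(x_init, dtype=float)]
    for u in u_tape:
        xs.append(xs[-1] + DT * f(xs[-1], u))
    return np.array(xs)

def transition_matrix(xs):
    """Phi with x_bar[N] ~= Phi @ x_bar[0] for the time-varying linearization along xs."""
    Phi = np.eye(2)
    for x in xs[:-1]:
        A = np.array([[0.0, 1.0], [-G * np.cos(x[0]), -DAMP]])
        Phi = (np.eye(2) + DT * A) @ Phi  # discrete-time A_d(t_n) = I + DT * A(t_n)
    return Phi

u_tape = np.zeros(2000)
nominal = simulate([np.pi - 0.2, 0.0], u_tape)
Phi = transition_matrix(nominal)
print(np.linalg.svd(Phi, compute_uv=False))  # a singular value > 1 means some deviations grow
```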

323
00:25:05,810 --> 00:25:06,660
Make sense?

324
00:25:06,660 --> 00:25:07,404
Yeah?

325
00:25:07,404 --> 00:25:12,330
AUDIENCE: [INAUDIBLE]
talk about that there

326
00:25:12,330 --> 00:25:13,790
is a bad way of doing this.

327
00:25:13,790 --> 00:25:16,130
This is not a bad way
of doing this, right?

328
00:25:16,130 --> 00:25:17,246
We were talking about it.

329
00:25:21,560 --> 00:25:24,560
RUSS TEDRAKE: If I do a
Taylor expansion of my system

330
00:25:24,560 --> 00:25:28,040
in the original coordinate
system, which is x,

331
00:25:28,040 --> 00:25:30,500
then it's not linear.

332
00:25:30,500 --> 00:25:33,170
End parentheses, that
was the bad way to do it.

333
00:25:33,170 --> 00:25:34,610
Yeah?

334
00:25:34,610 --> 00:25:35,750
Good way to do it--

335
00:25:35,750 --> 00:25:39,500
change the coordinates to
a coordinate system, which

336
00:25:39,500 --> 00:25:43,070
moves with the trajectory.

337
00:25:43,070 --> 00:25:46,523
If you do that, things
become time-varying linear.

338
00:25:46,523 --> 00:25:48,440
That was a good way to
do it, and that parenthesis

340
00:25:49,200 --> 00:25:49,970
is still open.

340
00:25:49,200 --> 00:25:49,970
We're still going.

341
00:25:49,970 --> 00:25:50,470
Yeah.

342
00:26:00,210 --> 00:26:06,360
OK, so our task now is to
design a time-varying feedback

343
00:26:06,360 --> 00:26:08,250
controller-- since our
model is time-varying,

344
00:26:08,250 --> 00:26:11,970
you'd expect our solution
to also be time-varying--

345
00:26:11,970 --> 00:26:15,715
which takes these bad, unstable
trajectories of the system--

346
00:26:15,715 --> 00:26:16,590
and they really are--

347
00:26:16,590 --> 00:26:19,350
I'll show you the simple pendulum.

348
00:26:19,350 --> 00:26:20,825
This trajectory comes out.

349
00:26:20,825 --> 00:26:22,950
Actually, if you just
integrate in a different way,

350
00:26:22,950 --> 00:26:24,930
it'll go off and
do the wrong thing.

351
00:26:24,930 --> 00:26:27,520
It typically doesn't go off
and add energy to the system

352
00:26:27,520 --> 00:26:28,193
so much.

353
00:26:28,193 --> 00:26:28,860
The ones I get--

354
00:26:28,860 --> 00:26:31,230
I see, I'll show you, are more--

355
00:26:31,230 --> 00:26:33,930
they diverge the other way,
and end up just floating around

356
00:26:33,930 --> 00:26:37,260
here, for instance.

357
00:26:37,260 --> 00:26:40,860
But they're not going
to get you up here.

358
00:26:40,860 --> 00:26:45,150
So can we design a
time-varying stabilizer

359
00:26:45,150 --> 00:26:46,958
that regulates that trajectory?

360
00:26:56,230 --> 00:27:04,570
OK, I did actually do the
original finite horizon LQR

361
00:27:04,570 --> 00:27:10,420
derivation on the
board that day--

362
00:27:10,420 --> 00:27:12,910
definitely won't write
all that again, but let

363
00:27:12,910 --> 00:27:22,790
me say that roughly nothing
in that derivation breaks--

364
00:27:22,790 --> 00:27:24,880
I'm going to show you
the important pieces--

365
00:27:24,880 --> 00:27:28,260
nothing in that derivation
breaks, surprisingly,

366
00:27:28,260 --> 00:27:32,670
if A and B are now
a function of time.

367
00:27:32,670 --> 00:27:33,670
So let's remember that--

368
00:27:42,480 --> 00:27:43,860
the LQR derivation.

369
00:27:56,860 --> 00:28:01,040
Now I'm working with this
x bar coordinate system.

370
00:28:12,150 --> 00:28:13,860
And I want to design
a cost function

371
00:28:13,860 --> 00:28:17,460
to minimize here, which lives
in this coordinate system

372
00:28:17,460 --> 00:28:18,900
again here.

373
00:28:18,900 --> 00:28:24,795
Let's say it's x bar at the
final time transposed, times Qf, times x bar at the final time--

374
00:28:29,100 --> 00:28:30,600
I've been trying
to use t little f,

375
00:28:30,600 --> 00:28:34,590
since my transposes look
like the final horizon time

376
00:28:34,590 --> 00:28:36,180
otherwise--

377
00:28:36,180 --> 00:28:40,350
plus the integral from 0 to tf dt of x bar--

378
00:28:40,350 --> 00:28:46,380
again, transpose, Q x bar
plus u bar transpose R u bar.

379
00:28:57,800 --> 00:29:00,050
OK, in the original
LQR derivation,

380
00:29:00,050 --> 00:29:03,590
we guessed that the form--

381
00:29:03,590 --> 00:29:13,850
that the optimal cost-to-go had
the form x bar transpose S of t x bar.

382
00:29:13,850 --> 00:29:15,080
That's still intact.

383
00:29:15,080 --> 00:29:16,850
That's still a good assumption.

384
00:29:16,850 --> 00:29:19,108
This thing's linear.

385
00:29:19,108 --> 00:29:20,900
It's just in a different
coordinate system.

386
00:29:25,620 --> 00:29:30,350
And we started cranking through
the sufficiency theorem,

387
00:29:30,350 --> 00:29:32,900
the Hamilton-Jacobi-Bellman
equation.

388
00:29:41,590 --> 00:29:44,740
And we found that our
optimal feedback policy--

389
00:29:44,740 --> 00:29:47,080
first of all, our
optimal cost-to-go

390
00:29:47,080 --> 00:29:51,790
was described by this
Riccati equation, which

391
00:29:51,790 --> 00:30:00,670
was negative S dot of t
equals Q minus S of t B

392
00:30:00,670 --> 00:30:14,500
R inverse B transpose S of t
plus S of t A plus A transpose

393
00:30:14,500 --> 00:30:16,180
S of t.

394
00:30:16,180 --> 00:30:18,610
And it turns out that,
with the-- if you have

395
00:30:18,610 --> 00:30:21,490
a time-varying A
and B, that it's--

396
00:30:21,490 --> 00:30:24,890
the exact same dynamics govern it.

397
00:30:24,890 --> 00:30:30,100
You just have your time
dependence also in A and B.

398
00:30:30,100 --> 00:30:35,190
And that exact same
Riccati equation works,

399
00:30:35,190 --> 00:30:40,220
and our final value
condition was just Qf.
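Here are the pieces collected in the x bar coordinate system. This is the standard finite-horizon LQR result. The only change from the time-invariant case is that A and B carry a time argument.

```latex
\begin{aligned}
J &= \bar{x}^T(t_f)\,Q_f\,\bar{x}(t_f) + \int_0^{t_f} \left[\bar{x}^T(t)\,Q\,\bar{x}(t) + \bar{u}^T(t)\,R\,\bar{u}(t)\right] dt, \\
J^*(\bar{x}, t) &= \bar{x}^T S(t)\,\bar{x}, \\
-\dot{S}(t) &= Q - S(t)\,B(t)\,R^{-1}B^T(t)\,S(t) + S(t)\,A(t) + A^T(t)\,S(t), \qquad S(t_f) = Q_f.
\end{aligned}
```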

400
00:30:49,813 --> 00:30:52,980
And you can see from this, if it
didn't make a difference for me

401
00:30:52,980 --> 00:30:56,160
when A and B became
functions of time,

402
00:30:56,160 --> 00:30:59,650
it's pretty simple-- although
less interesting, I guess.

403
00:30:59,650 --> 00:31:02,160
If Q were to be a function
of time-- no problem.

404
00:31:02,160 --> 00:31:03,960
If R was a function
of time-- no problem.

405
00:31:08,040 --> 00:31:14,580
They still have to be positive
definite and symmetric.

406
00:31:24,450 --> 00:31:27,510
Oops-- I did it the wrong way.

407
00:31:27,510 --> 00:31:29,520
Q only has to be positive semidefinite, so Q can be 0, but R has to be positive definite, so R cannot be 0.

408
00:31:38,910 --> 00:31:41,280
OK, so the LQR
you know and love,

409
00:31:41,280 --> 00:31:48,120
that you've used in Matlab,
is the time invariant infinite

410
00:31:48,120 --> 00:31:50,813
horizon LQR.

411
00:31:50,813 --> 00:31:52,980
I told you that, if you
cared about a finite horizon

412
00:31:52,980 --> 00:31:55,530
and you had a time invariant
linear system, then

413
00:31:55,530 --> 00:31:56,640
suddenly you had to--

414
00:31:56,640 --> 00:31:59,070
you couldn't just find the
stationary points in this.

415
00:31:59,070 --> 00:32:01,590
Remember, Matlab's solution
just tells you the long-term

416
00:32:01,590 --> 00:32:06,780
behavior of S. In the time--

417
00:32:06,780 --> 00:32:10,180
finite horizon time,
even the LTI case,

418
00:32:10,180 --> 00:32:13,480
which is where A and B
do not depend on time--

419
00:32:13,480 --> 00:32:15,420
the linear time invariant case--

420
00:32:15,420 --> 00:32:18,120
I still had to integrate back
this Riccati equation in order

421
00:32:18,120 --> 00:32:20,730
to get my LQR controller.
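Here is a minimal sketch of that backward integration for an LTI example. The double integrator with Q = I, R = 1, and Qf = 0 is an illustrative choice. If you integrate far enough backward, S(0) settles to the stationary solution that an infinite-horizon solver such as MATLAB's `lqr` reports. For this system that solution is [[sqrt(3), 1], [1, sqrt(3)]].

```python
import numpy as np

def riccati_rhs(S, A, B, Q, Rinv):
    """Right-hand side of -dS/dt = Q - S B R^-1 B' S + S A + A' S."""
    return Q - S @ B @ Rinv @ B.T @ S + S @ A + A.T @ S

def integrate_riccati_backward(A, B, Q, R, Qf, tf, dt=1e-3):
    """Forward Euler in reverse time, starting from S(tf) = Qf; returns S(0)."""
    Rinv = np.linalg.inv(R)
    S = np.array(Qf, dtype=float)
    for _ in range(int(round(tf / dt))):
        S = S + dt * riccati_rhs(S, A, B, Q, Rinv)  # S(t - dt) = S(t) - dt * dS/dt
        S = 0.5 * (S + S.T)                         # guard against round-off asymmetry
    return S

# Double integrator: x = [position, velocity], u = acceleration (illustrative).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
S0 = integrate_riccati_backward(A, B, np.eye(2), np.eye(1), np.zeros((2, 2)), tf=20.0)
```

With a short horizon, S(0) would still depend on tf and Qf. That dependence is why the finite-horizon controller cannot simply reuse the stationary solution.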

422
00:32:20,730 --> 00:32:22,710
It's no more expensive
to do the same thing

423
00:32:22,710 --> 00:32:25,890
in the linear time-varying
feedback case.

424
00:32:30,720 --> 00:32:32,400
And the resulting
controller is--

425
00:32:36,140 --> 00:32:45,650
u star is my nominal controller
minus my R inverse B transpose

426
00:32:45,650 --> 00:32:47,210
S of t x bar.

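The backward Riccati sweep and the resulting feedback law can be sketched in Python. This is a minimal sketch, not the course code: it assumes NumPy, constant Q and R, a fixed-step RK4 integrator, and hypothetical helper names (`tvlqr_backward`, `tvlqr_control`). It integrates -dS/dt = Q - S B R^-1 B^T S + S A + A^T S backward from S(T) = Qf and then applies u* = u0(t) - R^-1 B^T S(t) x̄. The usage runs the LTI double integrator with Q = I, R = 1, Qf = 0; over a long horizon, S(0) should approach the infinite-horizon (algebraic Riccati) solution [[√3, 1], [1, √3]], while S stays time-varying near the final time.

```python
import numpy as np

def riccati_rhs(S, A, B, Q, R_inv):
    """dS/dt from the Riccati equation -dS/dt = Q - S B R^-1 B^T S + S A + A^T S."""
    return -(Q - S @ B @ R_inv @ B.T @ S + S @ A + A.T @ S)

def tvlqr_backward(A_of_t, B_of_t, Q, R, Qf, T, dt):
    """Hypothetical helper: integrate S(t) backward from S(T) = Qf with fixed-step RK4."""
    R_inv = np.linalg.inv(R)
    n = int(round(T / dt))
    ts = np.linspace(0.0, T, n + 1)
    S = np.empty((n + 1,) + Qf.shape)
    S[n] = Qf

    def f(t, X):
        return riccati_rhs(X, A_of_t(t), B_of_t(t), Q, R_inv)

    h = -dt  # negative step: we march backward in time
    for k in range(n, 0, -1):
        t, X = ts[k], S[k]
        k1 = f(t, X)
        k2 = f(t + h / 2, X + h / 2 * k1)
        k3 = f(t + h / 2, X + h / 2 * k2)
        k4 = f(t + h, X + h * k3)
        X_prev = X + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        S[k - 1] = 0.5 * (X_prev + X_prev.T)  # keep S symmetric
    return ts, S

def tvlqr_control(u0, B, R, S, x, x0):
    """u*(t) = u0(t) - R^-1 B(t)^T S(t) x_bar, with x_bar = x - x0(t)."""
    return u0 - np.linalg.solve(R, B.T @ S @ (x - x0))

# Usage: LTI double integrator, Q = I, R = 1, Qf = 0, long finite horizon.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
R = np.eye(1)
ts, S = tvlqr_backward(lambda t: A, lambda t: B, np.eye(2), R,
                       np.zeros((2, 2)), T=20.0, dt=0.01)
print(S[0])   # close to [[sqrt(3), 1], [1, sqrt(3)]]
print(S[-1])  # equals Qf = 0 at the final time
u = tvlqr_control(np.zeros(1), B, R, S[0], np.array([1.0, 0.0]), np.zeros(2))
```

Swapping the lambdas for functions that return A(t) and B(t) along a nominal trajectory gives the time-varying case with no extra cost per step, which is the point made above.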
427
00:32:59,810 --> 00:33:03,890
These equations come
up enough that these

428
00:33:03,890 --> 00:33:07,220
are pretty famous, pretty
important equations, and so I--

429
00:33:07,220 --> 00:33:10,097
those I know off
the top of my head.

430
00:33:10,097 --> 00:33:11,180
They come up all the time.

431
00:33:15,320 --> 00:33:18,410
And this is the resulting
optimal trajectory,

432
00:33:18,410 --> 00:33:22,840
which is my nominal trajectory
plus my feedback gain, which

433
00:33:22,840 --> 00:33:24,650
came out of my original
LQR controller,

434
00:33:24,650 --> 00:33:25,525
if you remember that.

435
00:33:28,872 --> 00:33:32,978
AUDIENCE: [INAUDIBLE]

436
00:33:32,978 --> 00:33:34,020
RUSS TEDRAKE: Yes-- good.

437
00:33:34,020 --> 00:33:39,372
I should definitely put
a T under B. Thank you.

438
00:33:39,372 --> 00:33:41,580
I haven't written that case,
but R could equally well

439
00:33:41,580 --> 00:33:42,780
be time-dependent.

440
00:33:49,020 --> 00:33:51,870
OK, so something
big just happened.

441
00:33:54,660 --> 00:33:59,880
I can take a really, really
complicated non-linear system

442
00:33:59,880 --> 00:34:04,020
along some trajectory-- if
I find a good trajectory,

443
00:34:04,020 --> 00:34:06,210
then I can actually
linearize that system

444
00:34:06,210 --> 00:34:09,955
along this trajectory
and stabilize it.

445
00:34:09,955 --> 00:34:12,330
The thing I haven't convinced
you of yet-- because I only

446
00:34:12,330 --> 00:34:14,790
know how to do it
from showing examples,

447
00:34:14,790 --> 00:34:17,590
but it really works well.

448
00:34:17,590 --> 00:34:20,460
So even though it's
a linear system--

449
00:34:20,460 --> 00:34:23,610
it's a linear approximation
of the non-linear system,

450
00:34:23,610 --> 00:34:27,690
something like the [INAUDIBLE]
or the cartpole swing-up.

451
00:34:27,690 --> 00:34:29,699
It's got a huge
basin of attraction.

452
00:34:29,699 --> 00:34:31,440
Lots and lots of
initial conditions

453
00:34:31,440 --> 00:34:35,340
will find their way to the
trajectory and get to the goal.

454
00:34:35,340 --> 00:34:38,969
If you want to do non-linear
control of a humanoid robot

455
00:34:38,969 --> 00:34:45,449
or something like this, this
actually scales pretty nicely.

456
00:34:45,449 --> 00:34:47,520
I just have to
solve this equation.

457
00:34:47,520 --> 00:34:48,840
S is the size--

458
00:34:48,840 --> 00:34:54,570
is a matrix that's number of states
by number of states.

459
00:34:54,570 --> 00:34:56,560
But I could do that
in 30 dimensions.

460
00:34:56,560 --> 00:34:59,160
That's no problem.

461
00:34:59,160 --> 00:35:01,500
And even for very
non-linear systems,

462
00:35:01,500 --> 00:35:06,000
local linear feedback
works very, very well--

463
00:35:06,000 --> 00:35:08,740
so well, in fact, that I
think that, if you ask--

464
00:35:08,740 --> 00:35:11,550
and when I did ask
the [INAUDIBLE] guys,

465
00:35:11,550 --> 00:35:13,743
Sasha Megretski says,
this is definitely

466
00:35:13,743 --> 00:35:15,660
what I would do if I was
controlling a walking

467
00:35:15,660 --> 00:35:19,770
robot or something like that.

468
00:35:19,770 --> 00:35:21,750
We're trying to do the
same thing to control

469
00:35:21,750 --> 00:35:23,645
neurons in a dish now.

470
00:35:23,645 --> 00:35:25,020
We're trying to
build good models

471
00:35:25,020 --> 00:35:27,810
of the dynamics-- time-varying
models, for instance--

472
00:35:27,810 --> 00:35:30,530
and then do this
kind of control.

473
00:35:30,530 --> 00:35:32,010
Yeah.

474
00:35:32,010 --> 00:35:34,860
It works really, really well.

475
00:35:34,860 --> 00:35:37,500
The only complaint
about it is that it's

476
00:35:37,500 --> 00:35:41,680
going to have-- it's based
on this linear approximation,

477
00:35:41,680 --> 00:35:43,560
so it will have a finite
basin of attraction.

478
00:35:43,560 --> 00:35:46,680
For some systems,
it can be quite big.

479
00:35:46,680 --> 00:35:48,810
If you have systems with
hard non-linearities,

480
00:35:48,810 --> 00:35:51,113
it won't be as big.

481
00:35:51,113 --> 00:35:52,530
Later in the course,
I'll show you

482
00:35:52,530 --> 00:35:55,320
ways to explicitly reason
about the size of those basins

483
00:35:55,320 --> 00:35:57,540
of attraction, but
today let's just

484
00:35:57,540 --> 00:36:00,523
say this is a good
thing to know,

485
00:36:00,523 --> 00:36:01,940
good thing to have
in your pocket.

486
00:36:06,430 --> 00:36:08,510
Let me show you a working--

487
00:36:08,510 --> 00:36:10,810
try to convince you
that it's pretty good.

488
00:36:43,910 --> 00:36:47,720
OK, so let's see where
I've left myself here.

489
00:36:47,720 --> 00:36:50,020
I took this-- the pendulum--

490
00:36:50,020 --> 00:36:51,270
let's do the shooting version.

491
00:36:51,270 --> 00:36:54,080
They both work fine, but
let's do the shooting version.

492
00:37:00,130 --> 00:37:02,350
Is that bigger than
I did last time?

493
00:37:02,350 --> 00:37:03,390
That's pretty obnoxious.

494
00:37:03,390 --> 00:37:04,923
Maybe it's always
been obnoxious.

495
00:37:09,660 --> 00:37:11,550
Can we get away with that?

496
00:37:11,550 --> 00:37:13,080
Yeah.

497
00:37:13,080 --> 00:37:16,300
You guys are like,
I'm not blind.

498
00:37:16,300 --> 00:37:16,800
OK.

499
00:37:22,090 --> 00:37:24,910
So I showed you last
time the shooting code.

500
00:37:24,910 --> 00:37:31,810
It comes out with a
resulting tape x, t, and u.

501
00:37:31,810 --> 00:37:34,090
After the result of these
trajectory optimizers,

502
00:37:34,090 --> 00:37:37,180
whether it's shooting
or whether it's

503
00:37:37,180 --> 00:37:38,980
direct collocation--
whatever it is--

504
00:37:38,980 --> 00:37:40,720
it comes up with
some open-loop tape.

505
00:37:40,720 --> 00:37:43,720
I put x in there too just to--
as the reference trajectory

506
00:37:43,720 --> 00:37:45,700
that results, but
what really matters

507
00:37:45,700 --> 00:37:50,140
is the time stamps and u
command, the open-loop tape.

508
00:38:18,120 --> 00:38:19,370
Why don't I save it this time?

509
00:38:34,260 --> 00:38:35,870
OK, so it comes
up-- in this case,

510
00:38:35,870 --> 00:38:37,470
with these parameters
I've chosen,

511
00:38:37,470 --> 00:38:39,170
comes up with some
one-pump policy.

512
00:38:39,170 --> 00:38:41,750
With the torque limits I
have, the [INAUDIBLE] I have,

513
00:38:41,750 --> 00:38:44,390
it comes up with a one-pump
policy that gets me

514
00:38:44,390 --> 00:38:47,510
to the top in four seconds.

515
00:38:47,510 --> 00:38:53,421
OK, let me now just simulate
that a little bit differently.

516
00:39:06,190 --> 00:39:09,390
So the only thing I'm
going to do here now is--

517
00:39:09,390 --> 00:39:15,600
this control_ode is just a
simulation which plays back

518
00:39:15,600 --> 00:39:19,303
exactly the same open-loop
tape, but it plays it back

519
00:39:19,303 --> 00:39:20,970
with a little more
careful integration--

520
00:39:20,970 --> 00:39:23,137
because in the actual-- in
the shooting code I used,

521
00:39:23,137 --> 00:39:26,010
I used a big time step just
so I don't waste time computing

522
00:39:26,010 --> 00:39:27,810
gradients to the n-th
degree of accuracy.

523
00:39:27,810 --> 00:39:29,070
That's not worthwhile.

524
00:39:29,070 --> 00:39:31,140
If I simulate the
exact same thing back

525
00:39:31,140 --> 00:39:32,782
with a more careful
ODE integration,

526
00:39:32,782 --> 00:39:33,740
let's see what happens.

527
00:39:43,690 --> 00:39:47,090
So that was that same
trajectory that--

528
00:39:47,090 --> 00:39:49,980
exact same control inputs,
just simulated more carefully.

529
00:39:49,980 --> 00:39:52,505
It made its honest
effort to get up there,

530
00:39:52,505 --> 00:39:53,880
but it didn't
quite get up there,

531
00:39:53,880 --> 00:39:56,550
turned around, and
came back down.

532
00:39:56,550 --> 00:39:59,820
I'm trying to show
it also in just--

533
00:39:59,820 --> 00:40:03,640
this is the different state
trajectories over time.

534
00:40:03,640 --> 00:40:05,760
You can see that the
red and blue lines

535
00:40:05,760 --> 00:40:09,570
are the desired versus actual
in the-- in theta, in this case.

536
00:40:09,570 --> 00:40:13,260
And these two lines
are the desired versus

537
00:40:13,260 --> 00:40:15,075
actual in theta dot.

538
00:40:15,075 --> 00:40:17,640
They start off exactly
on top of each other,

539
00:40:17,640 --> 00:40:20,070
but just little
differences in the numerics

540
00:40:20,070 --> 00:40:23,020
cause them to go in
different directions--

541
00:40:23,020 --> 00:40:23,700
part ways.

542
00:40:27,680 --> 00:40:32,240
OK, so now I've got
this LTV LQR solution,

543
00:40:32,240 --> 00:40:34,210
which is exactly what
I just showed you.

544
00:40:40,020 --> 00:40:43,070
So I was just simulating
it just now with just u

545
00:40:43,070 --> 00:40:44,360
being the nominal u.

546
00:40:44,360 --> 00:40:47,810
Now I'm going to add this
time-varying feedback term,

547
00:40:47,810 --> 00:40:48,950
x minus x desired.

548
00:40:55,030 --> 00:40:57,880
And now my more
careful integration

549
00:40:57,880 --> 00:41:00,830
results in a closed-loop system,
which not only got to the goal,

550
00:41:00,830 --> 00:41:02,705
but actually stayed up
at the goal, because I

551
00:41:02,705 --> 00:41:06,590
have a stable system
all the way to the top.

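The experiment just described, replaying the same tape open-loop versus adding the time-varying feedback term on x minus x desired, can be sketched for a damped pendulum. This is a minimal sketch under loudly stated assumptions: the pendulum parameters, the free-swing "tape" that stands in for the shooting solution, the Q, R, Qf weights, the backward-Euler Riccati sweep, and every function name are hypothetical, not the course code. The gains K(t) = R^-1 B^T S(t) come from linearizing along the nominal trajectory, they are indexed purely by time, and the commanded torque is clipped to an optional limit `u_max`.

```python
import numpy as np

G, M, L, B_DAMP = 9.81, 1.0, 1.0, 0.01  # assumed pendulum parameters

def f(x, u):
    """Pendulum dynamics, x = [theta, theta_dot], torque u."""
    theta, theta_dot = x
    theta_ddot = (u - B_DAMP * theta_dot - M * G * L * np.sin(theta)) / (M * L**2)
    return np.array([theta_dot, theta_ddot])

def rk4_step(x, u, dt):
    """One RK4 step with the torque held constant over the step."""
    k1 = f(x, u)
    k2 = f(x + dt / 2 * k1, u)
    k3 = f(x + dt / 2 * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def linearize(x):
    """A = df/dx and B = df/du evaluated at a nominal state."""
    A = np.array([[0.0, 1.0],
                  [-G / L * np.cos(x[0]), -B_DAMP / (M * L**2)]])
    B = np.array([[0.0], [1.0 / (M * L**2)]])
    return A, B

def tvlqr_gains(x_nom, Q, R, Qf, dt):
    """Backward Euler sweep of the Riccati equation; K[k] = R^-1 B^T S(t_k)."""
    S = Qf.copy()
    R_inv = np.linalg.inv(R)
    K = np.zeros((len(x_nom), 1, 2))
    for k in range(len(x_nom) - 1, -1, -1):
        A, B = linearize(x_nom[k])
        K[k] = R_inv @ B.T @ S
        S_dot = -(Q + S @ A + A.T @ S - S @ B @ R_inv @ B.T @ S)
        S = S - dt * S_dot  # step backward in time
    return K

def simulate(x_init, x_nom, u_nom, K, dt, u_max=np.inf, feedback=True):
    """Replay the tape; with feedback, u = u0(t) - K(t)(x - x0(t)), then saturate."""
    x = np.array(x_init, dtype=float)
    for k in range(len(u_nom)):
        u = u_nom[k]
        if feedback:
            u = u - (K[k] @ (x - x_nom[k]))[0]
        u = float(np.clip(u, -u_max, u_max))  # honest torque limit
        x = rk4_step(x, u, dt)
    return x

# Hypothetical nominal tape: a free swing from theta = 2 rad (u0 = 0).
dt, T = 1e-3, 4.0
n = int(round(T / dt))
u_nom = np.zeros(n)
x_nom = np.zeros((n + 1, 2))
x_nom[0] = [2.0, 0.0]
for k in range(n):
    x_nom[k + 1] = rk4_step(x_nom[k], u_nom[k], dt)

Q = np.diag([10.0, 1.0])
K = tvlqr_gains(x_nom, Q, R=np.array([[0.1]]), Qf=Q, dt=dt)
x_perturbed = x_nom[0] + np.array([0.2, 0.3])  # corrupted initial condition
open_err = np.linalg.norm(simulate(x_perturbed, x_nom, u_nom, K, dt, feedback=False) - x_nom[-1])
closed_err = np.linalg.norm(simulate(x_perturbed, x_nom, u_nom, K, dt) - x_nom[-1])
print(open_err, closed_err)
```

Passing a finite `u_max` (for example `u_max=2.0`) lets you reproduce the failure mode mentioned in the lecture, where larger initial errors are no longer recovered once the feedback saturates.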
552
00:41:06,590 --> 00:41:07,450
OK?

553
00:41:07,450 --> 00:41:09,970
All right, so what I just
said was very unimpressive.

554
00:41:09,970 --> 00:41:15,580
I said I computed an open-loop
policy with my methods

555
00:41:15,580 --> 00:41:16,510
from Thursday.

556
00:41:16,510 --> 00:41:17,530
I simulated them back.

557
00:41:17,530 --> 00:41:19,450
They didn't work.

558
00:41:19,450 --> 00:41:21,212
But I then put a
feedback controller

559
00:41:21,212 --> 00:41:23,170
on, and from the exact
same initial conditions,

560
00:41:23,170 --> 00:41:25,930
I now can simulate
them, and they work.

561
00:41:25,930 --> 00:41:29,510
So it's disappointing that
we had to do that at all,

562
00:41:29,510 --> 00:41:32,590
but I can now--

563
00:41:32,590 --> 00:41:34,870
the stability is more
than just stabilizing

564
00:41:34,870 --> 00:41:36,310
the initial conditions.

565
00:41:36,310 --> 00:41:39,632
Let's add some fairly
big random numbers

566
00:41:39,632 --> 00:41:41,590
to that initial condition
and see what happens.

567
00:41:45,850 --> 00:41:49,387
It's recomputing the
policy every time,

568
00:41:49,387 --> 00:41:50,970
just because it was
fast enough that I

569
00:41:50,970 --> 00:41:53,800
didn't bother to change it.

570
00:41:53,800 --> 00:41:55,710
OK, so that actually
started with pretty big

571
00:41:55,710 --> 00:41:57,900
different initial conditions.

572
00:41:57,900 --> 00:42:00,756
So theta was off by--

573
00:42:00,756 --> 00:42:03,480
I don't know-- 2/10 of a
radian or something like this.

574
00:42:03,480 --> 00:42:06,212
The velocities were off by
1/2 a radian per second.

575
00:42:06,212 --> 00:42:07,170
We could crank that up.

576
00:42:07,170 --> 00:42:08,970
I bet it does a lot
better than that.

577
00:42:08,970 --> 00:42:11,610
But if you watch
these things, they

578
00:42:11,610 --> 00:42:16,260
converge quite nicely
together at the end there.

579
00:42:22,580 --> 00:42:24,440
And what matters is
they get up to the top.

580
00:42:32,070 --> 00:42:34,590
So again, these things come
together, find their way up

581
00:42:34,590 --> 00:42:37,770
to the top, and live.

582
00:42:37,770 --> 00:42:44,280
I bet, if I put it a lot
bigger, it'll still work.

583
00:42:44,280 --> 00:42:46,110
I normally do an
order of magnitude,

584
00:42:46,110 --> 00:42:50,470
but let's not be silly.

585
00:42:50,470 --> 00:42:53,963
Oh-- didn't make it.

586
00:42:53,963 --> 00:42:56,130
There's only one reason it
didn't make it, actually.

587
00:42:56,130 --> 00:42:58,860
It's because, if
you look in here,

588
00:42:58,860 --> 00:43:02,830
I'm actually honest about
implementing the max torques.

589
00:43:02,830 --> 00:43:03,330
Yeah.

590
00:43:03,330 --> 00:43:05,340
So I actually have a
torque limit, I impose it,

591
00:43:05,340 --> 00:43:06,870
and it lives on there.

592
00:43:06,870 --> 00:43:08,532
If I didn't, I bet
I could convince you

593
00:43:08,532 --> 00:43:09,990
it works for any
initial condition.

594
00:43:09,990 --> 00:43:13,170
But let's try it one more
time-- get a little more lucky

595
00:43:13,170 --> 00:43:14,561
with the initial conditions.

596
00:43:17,950 --> 00:43:18,450
Oh, come on.

597
00:43:18,450 --> 00:43:19,450
Come on.

598
00:43:19,450 --> 00:43:21,420
Yes.

599
00:43:21,420 --> 00:43:24,330
OK, that was pretty far off, and
it still found its way back

600
00:43:24,330 --> 00:43:26,100
to the trajectory.

601
00:43:26,100 --> 00:43:27,780
Good-- yeah?

602
00:43:27,780 --> 00:43:29,970
Look at how big those
initial conditions are.

603
00:43:29,970 --> 00:43:34,680
There and there versus--
wow, that's really good.

604
00:43:34,680 --> 00:43:35,610
OK.

605
00:43:35,610 --> 00:43:36,710
Did I see a question?

606
00:43:36,710 --> 00:43:38,950
No?

607
00:43:38,950 --> 00:43:42,690
All right, so this stuff
works for pendulum.

608
00:43:42,690 --> 00:43:44,520
It works for more
interesting systems too.

609
00:43:44,520 --> 00:43:47,330
I'll just show you the
cartpole real quick here.

610
00:43:53,140 --> 00:43:54,097
I won't do the--

611
00:43:54,097 --> 00:43:55,930
here is what it looks
like without feedback.

612
00:43:55,930 --> 00:44:02,680
I'll just do the initial
conditions corrupted solution,

613
00:44:02,680 --> 00:44:04,610
pump up--

614
00:44:04,610 --> 00:44:07,640
OK, so if you remember my
solutions from last time,

615
00:44:07,640 --> 00:44:09,140
I never drove off
the screen before,

616
00:44:09,140 --> 00:44:11,557
so that it was actually it
catching it by deviating enough

617
00:44:11,557 --> 00:44:13,098
that it came off
the screen, and then

618
00:44:13,098 --> 00:44:14,720
slowly coming back to the top.

619
00:44:21,000 --> 00:44:24,250
It must be its x position
or something going way off.

620
00:44:24,250 --> 00:44:25,810
No, not x position--
what is that?

621
00:44:25,810 --> 00:44:26,560
That's my control.

622
00:44:26,560 --> 00:44:27,460
Yeah.

623
00:44:27,460 --> 00:44:28,918
Did I do torque
limits on that one?

624
00:44:28,918 --> 00:44:30,130
I still did torque limits.

625
00:44:30,130 --> 00:44:33,590
I just set them high, I guess.

626
00:44:33,590 --> 00:44:34,550
Yeah.

627
00:44:34,550 --> 00:44:36,440
So it really works.

628
00:44:36,440 --> 00:44:39,320
And the cool thing is
the cost of implementing

629
00:44:39,320 --> 00:44:43,760
that LQR LTV stabilizer
was negligibly

630
00:44:43,760 --> 00:44:46,040
more than implementing the--

631
00:44:46,040 --> 00:44:48,709
most of that time was the
shooting optimization.

632
00:44:55,575 --> 00:44:56,440
Yes?

633
00:44:56,440 --> 00:45:00,083
AUDIENCE: Why do you
always start at the 0 time?

634
00:45:00,083 --> 00:45:01,750
You could look at the
initial conditions

635
00:45:01,750 --> 00:45:03,880
and look where is
the closest point

636
00:45:03,880 --> 00:45:08,110
on my nominal trajectories and
then do your control policy

637
00:45:08,110 --> 00:45:10,180
from that moment in time.

638
00:45:10,180 --> 00:45:11,197
RUSS TEDRAKE: OK.

639
00:45:11,197 --> 00:45:12,530
So that's a really, really good question.

640
00:45:12,530 --> 00:45:15,530
OK, that's exactly what I want
to talk about next, actually.

641
00:45:23,740 --> 00:45:31,480
I designed a time-varying
feedback controller,

642
00:45:31,480 --> 00:45:36,580
which is negative K of t x bar of t.

643
00:45:36,580 --> 00:45:39,230
I designed that ahead of time.

644
00:45:39,230 --> 00:45:41,200
And then, from the
initial conditions,

645
00:45:41,200 --> 00:45:45,970
I started simulating from 0,
and I just played out the--

646
00:45:45,970 --> 00:45:48,460
my nominal trajectory just
marched forward with time,

647
00:45:48,460 --> 00:45:50,710
my feedback controller just
marched forward with time,

648
00:45:50,710 --> 00:45:54,070
and my dynamics just
marched forward with time.

649
00:45:54,070 --> 00:45:56,770
OK, so before I explicitly
address your question,

650
00:45:56,770 --> 00:45:57,760
let me point out--

651
00:45:57,760 --> 00:46:02,320
let me ask even a
simpler question here.

652
00:46:08,187 --> 00:46:10,770
If I had plotted that in state
space, what you would have seen

653
00:46:10,770 --> 00:46:14,040
is that the trajectory starts
off somewhere in state space

654
00:46:14,040 --> 00:46:14,950
and comes together.

655
00:46:14,950 --> 00:46:15,850
That would have been a good idea.

656
00:46:15,850 --> 00:46:17,308
Maybe I should do
that in a minute.

657
00:46:17,308 --> 00:46:21,390
But it comes together and finds
its way onto that trajectory.

658
00:46:21,390 --> 00:46:21,890
Yeah?

659
00:46:24,880 --> 00:46:26,220
OK, so here's the question.

660
00:46:28,950 --> 00:46:31,080
Instead of just changes
in initial conditions,

661
00:46:31,080 --> 00:46:34,110
what happens if I
have disturbances

662
00:46:34,110 --> 00:46:35,783
that push me off the trajectory?

663
00:46:35,783 --> 00:46:36,450
Well, that's OK.

664
00:46:36,450 --> 00:46:39,900
That's no different really than
a different initial condition.

665
00:46:39,900 --> 00:46:41,612
They'll come back on here.

666
00:46:41,612 --> 00:46:43,320
What happens if I have
a disturbance that

667
00:46:43,320 --> 00:46:49,140
pushes me along the trajectory
with this controller?

668
00:46:49,140 --> 00:46:54,330
Let's say I've got the
helpful disturbance, which,

669
00:46:54,330 --> 00:46:57,690
when I was right
here, just happened

670
00:46:57,690 --> 00:46:59,610
to push me right to there.

671
00:47:03,090 --> 00:47:06,030
What's my feedback
controller going to do?

672
00:47:06,030 --> 00:47:07,320
AUDIENCE: Slow it down.

673
00:47:07,320 --> 00:47:12,910
RUSS TEDRAKE: Yeah-- probably
in a dramatic fashion.

674
00:47:12,910 --> 00:47:17,760
It's the same way-- it tries
to quickly converge from here.

675
00:47:17,760 --> 00:47:20,850
It's going to push itself back
towards that point, possibly.

676
00:47:20,850 --> 00:47:22,830
Slowing down doesn't--
makes it sound--

677
00:47:22,830 --> 00:47:24,402
no big deal.

678
00:47:24,402 --> 00:47:25,860
It can't go backwards,
but it might

679
00:47:25,860 --> 00:47:29,880
try to do something more
severe to try to catch up

680
00:47:29,880 --> 00:47:33,510
with that old trajectory.

681
00:47:33,510 --> 00:47:37,680
So the major limitation of
this is that it's blindly--

682
00:47:37,680 --> 00:47:39,870
in order to have the strong
convergence properties

683
00:47:39,870 --> 00:47:43,710
that we have, the controller
is blindly marching forward

684
00:47:43,710 --> 00:47:45,330
in time.

685
00:47:45,330 --> 00:47:48,870
The great thing about switching
to a time parameterization

686
00:47:48,870 --> 00:47:52,080
is that I can compute everything--
everything's linear again.

687
00:47:52,080 --> 00:47:56,730
The bad thing is
you're a slave to time.

688
00:47:56,730 --> 00:47:59,100
So Phillip asked
a next question.

689
00:47:59,100 --> 00:48:00,600
He says, so why not--

690
00:48:00,600 --> 00:48:05,010
why do I just blindly start
marching forward from time 0?

691
00:48:05,010 --> 00:48:08,700
Maybe, if I have a
controller, I should just

692
00:48:08,700 --> 00:48:12,030
look for the closest
point in my trajectory,

693
00:48:12,030 --> 00:48:13,920
and then, instead of
indexing off time,

694
00:48:13,920 --> 00:48:18,780
index off some sort of phase,
some fraction of my trajectory,

695
00:48:18,780 --> 00:48:21,900
and then execute
that controller.

696
00:48:21,900 --> 00:48:25,710
And you can do that.

697
00:48:25,710 --> 00:48:27,600
I wish you the best
if you do that,

698
00:48:27,600 --> 00:48:31,950
but my suspicion is
that, if on every dt,

699
00:48:31,950 --> 00:48:34,220
you pick the closest
point in the trajectory,

700
00:48:34,220 --> 00:48:36,720
then the result is you're going
to chatter like you wouldn't

701
00:48:36,720 --> 00:48:38,370
believe.

702
00:48:38,370 --> 00:48:41,550
So there's a lot of
protection you get when you--

703
00:48:41,550 --> 00:48:42,960
you could think
of this very much

704
00:48:42,960 --> 00:48:46,980
as a gain-scheduled
linear controller.

705
00:48:46,980 --> 00:48:50,220
This is a time-varying
gain scheduling,

706
00:48:50,220 --> 00:48:52,900
and the problem is if
I switch gains quickly,

707
00:48:52,900 --> 00:48:54,540
then you're going
to get chattering.

708
00:48:54,540 --> 00:48:57,310
So it might make a lot
of sense, for instance,

709
00:48:57,310 --> 00:49:00,743
if you were to get a big
disturbance, to re-evaluate,

710
00:49:00,743 --> 00:49:02,160
and try to find
the closest point,

711
00:49:02,160 --> 00:49:06,690
and start executing that new
policy with time re-indexed.

712
00:49:06,690 --> 00:49:10,380
But it's probably a bad
idea, in my experience,

713
00:49:10,380 --> 00:49:12,450
to decide which part
of the trajectory

714
00:49:12,450 --> 00:49:14,340
you're closest to on every--

715
00:49:14,340 --> 00:49:16,385
every dt.

716
00:49:16,385 --> 00:49:17,510
That's probably a bad idea.

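The compromise suggested here (keep marching forward in time, but find the closest point on the nominal trajectory and re-index time only after a big disturbance, not on every dt) might be sketched as a small clock object. This is a hypothetical sketch: the class and function names, the plain Euclidean distance, and the fixed `threshold` are all assumptions; a metric weighted by S(t) would be one plausible alternative.

```python
import numpy as np

def closest_index(x, x_nom):
    """Index of the nominal state nearest to x, using plain Euclidean distance."""
    return int(np.argmin(np.linalg.norm(x_nom - x, axis=1)))

class ReindexingClock:
    """Hypothetical time index for a tape of nominal states.

    Normally the index just marches forward one sample per dt, exactly like
    the time-indexed controller. Only when the tracking error to the current
    sample exceeds `threshold` does it search for the closest sample and
    re-index time from there, instead of searching on every dt.
    """

    def __init__(self, x_nom, threshold):
        self.x_nom = np.asarray(x_nom, dtype=float)
        self.threshold = threshold
        self.k = 0

    def step(self, x):
        x = np.asarray(x, dtype=float)
        k = min(self.k, len(self.x_nom) - 1)  # hold the last sample at the end
        if np.linalg.norm(x - self.x_nom[k]) > self.threshold:
            k = closest_index(x, self.x_nom)  # re-index after a big disturbance
        self.k = k + 1
        return k  # use x_nom[k], u_nom[k], K[k] at this control step

# Usage: a disturbance pushes the state along the trajectory at the third step.
x_nom = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
clock = ReindexingClock(x_nom, threshold=0.5)
indices = [clock.step(x) for x in ([0.0, 0.0], [1.1, 0.0], [3.0, 0.0], [3.0, 0.0])]
print(indices)  # [0, 1, 3, 3]
```

The threshold acts as the guard against chattering: small tracking errors never trigger a search, so the gains are not switched back and forth from one dt to the next.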
717
00:49:23,250 --> 00:49:24,840
Yes?

718
00:49:24,840 --> 00:49:26,760
AUDIENCE: Could you
maybe play some tricks

719
00:49:26,760 --> 00:49:28,815
if you had some idea of
the basin of attraction

720
00:49:28,815 --> 00:49:31,630
of the current point
you're trying to get to?

721
00:49:31,630 --> 00:49:33,530
And if you know that
you're outside of it,

722
00:49:33,530 --> 00:49:36,540
then work around
it, [INAUDIBLE]?

723
00:49:36,540 --> 00:49:37,290
RUSS TEDRAKE: Yes.

724
00:49:40,930 --> 00:49:45,270
So I have a particular trick
that does that in--

725
00:49:45,270 --> 00:49:48,240
we'll talk about it in
the motion planning, but--

726
00:49:48,240 --> 00:49:50,400
yeah, so Mark knows
about these tricks

727
00:49:50,400 --> 00:49:53,170
for computing basins of
attraction pretty efficiently.

728
00:49:53,170 --> 00:49:57,150
And so these days what
we do is we actually

729
00:49:57,150 --> 00:49:59,060
try to compute the funnel--

730
00:49:59,060 --> 00:50:00,810
the basin of attraction
of this trajectory

731
00:50:00,810 --> 00:50:05,610
around the trajectory, and
you could know discretely

732
00:50:05,610 --> 00:50:07,698
if you left that
basin of attraction.

733
00:50:07,698 --> 00:50:09,240
So I'll give you
the recipe for that,

734
00:50:09,240 --> 00:50:11,010
but it actually
makes more sense,

735
00:50:11,010 --> 00:50:13,860
I think, in the motion planning
context, where we actually

736
00:50:13,860 --> 00:50:16,713
will design
trajectories that fill

737
00:50:16,713 --> 00:50:17,880
the space with these basins.

738
00:50:21,660 --> 00:50:24,190
This is very similar to
the concept of flow tubes.

739
00:50:24,190 --> 00:50:24,690
Yes.

740
00:50:32,390 --> 00:50:36,170
OK, so big idea--

741
00:50:36,170 --> 00:50:40,160
turn my non-linear system into
a linear time-varying system,

742
00:50:40,160 --> 00:50:43,730
because I've re-parameterized
it along the trajectory.

743
00:50:43,730 --> 00:50:48,770
Do linear time-varying control,
and even really complicated

744
00:50:48,770 --> 00:50:51,230
systems-- it'll work well.

745
00:50:51,230 --> 00:50:53,655
We're doing it on our
[INAUDIBLE] plane.

746
00:50:53,655 --> 00:50:56,510
I mean, it's really
a pretty good idea.

747
00:50:59,168 --> 00:51:00,710
When I first started
working with it,

748
00:51:00,710 --> 00:51:05,600
I thought that it would
have the problem that--

749
00:51:09,380 --> 00:51:12,500
it would have the property
that it uses a lot of control

750
00:51:12,500 --> 00:51:14,120
to force itself back
to the trajectory

751
00:51:14,120 --> 00:51:16,550
and rigidly follow
the trajectory.

752
00:51:16,550 --> 00:51:19,130
It's easy to equate
linear control

753
00:51:19,130 --> 00:51:22,673
with high-gain linear feedback,
which people do a lot of,

754
00:51:22,673 --> 00:51:24,590
but it doesn't necessarily
need that property.

755
00:51:24,590 --> 00:51:26,780
If R is large in
this derivation,

756
00:51:26,780 --> 00:51:30,440
it can actually take very
subtle approaches back

757
00:51:30,440 --> 00:51:31,190
to the trajectory.

758
00:51:31,190 --> 00:51:34,470
Your system might come
in and do whatever

759
00:51:34,470 --> 00:51:36,470
it needs to get back on
the trajectory with very

760
00:51:36,470 --> 00:51:37,053
little torque.

761
00:51:41,880 --> 00:51:44,720
The only price you pay is,
if your torque is smaller,

762
00:51:44,720 --> 00:51:48,650
if you're penalizing
torque use higher,

763
00:51:48,650 --> 00:51:50,938
then you might restrict
your-- that might shrink

764
00:51:50,938 --> 00:51:51,980
your basin of attraction.

765
00:51:51,980 --> 00:51:56,450
It might be that, because it's
trying to use less torque,

766
00:51:56,450 --> 00:51:59,140
it will not overcome
the non-linearities.

767
00:51:59,140 --> 00:52:00,890
But in the neighborhood
of the trajectory,

768
00:52:00,890 --> 00:52:05,090
you can get these
very elegant solutions

769
00:52:05,090 --> 00:52:07,370
which look like minimal
energy kind of solutions

770
00:52:07,370 --> 00:52:09,380
for the non-linear
problem in the vicinity

771
00:52:09,380 --> 00:52:10,830
of these trajectories.

772
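The trade-off between penalizing torque and how gently the controller returns to the trajectory already shows up in a scalar, time-invariant toy problem (not the time-varying controller from the lecture). For xdot = a x + b u with cost q x² + r u², the algebraic Riccati equation 2aS - S²b²/r + q = 0 has the positive root S = r(a + √(a² + b²q/r))/b², so K = bS/r = (a + √(a² + b²q/r))/b. The sketch below, with a hypothetical helper `scalar_lqr_gain`, shows K shrinking as r grows; for an unstable a > 0 it approaches 2a/b, which places the closed-loop pole at -a, the smallest-effort way to stabilize that plant.

```python
import numpy as np

def scalar_lqr_gain(a, b, q, r):
    """LQR gain K (u = -K x) for xdot = a x + b u with cost integral of q x^2 + r u^2."""
    S = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2  # positive root of the scalar ARE
    return b * S / r

a, b, q = 1.0, 1.0, 1.0  # an assumed unstable toy plant
for r in [0.01, 1.0, 100.0, 1e6]:
    K = scalar_lqr_gain(a, b, q, r)
    # The closed-loop pole a - b K equals -sqrt(a^2 + b^2 q / r), which tends to -|a|.
    print(f"r = {r:g}: K = {K:.4f}, closed-loop pole = {a - b * K:.4f}")
```

As r grows the controller spends less torque and converges more slowly, which matches the caveat in the lecture: a cheaper controller may no longer overcome the non-linearities far from the trajectory.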
00:52:10,830 --> 00:52:12,560
So one of the ideas we'll
talk about later is how do you

773
00:52:12,560 --> 00:52:14,518
design the minimal set
of trajectories-- which,

774
00:52:14,518 --> 00:52:19,077
if you use these controllers,
which do the right thing

775
00:52:19,077 --> 00:52:19,910
in a lot of places--

776
00:52:25,187 --> 00:52:27,270
if you walked away from
this class knowing nothing

777
00:52:27,270 --> 00:52:31,350
but direct collocation and
linear time-varying feedback

778
00:52:31,350 --> 00:52:34,930
control, I bet you could
control a lot of cool systems.

779
00:52:34,930 --> 00:52:35,430
Yeah.

780
00:52:35,430 --> 00:52:37,555
I guess you also have to
know sys id, which I'm not

781
00:52:37,555 --> 00:52:39,750
going to tell you about.

782
00:52:39,750 --> 00:52:40,800
That's the gotcha.

783
00:52:40,800 --> 00:52:42,633
You have to have a model
for all this stuff.

784
00:52:45,295 --> 00:52:46,920
If someone gives you
a model, if you're

785
00:52:46,920 --> 00:52:50,130
willing to construct a model,
then you can do a lot of things

786
00:52:50,130 --> 00:52:50,880
with this.

787
00:52:58,570 --> 00:53:05,770
OK, I want to give you
one more mental picture

788
00:53:05,770 --> 00:53:09,070
to think about what this
is doing so it launches

789
00:53:09,070 --> 00:53:13,340
into the next thing here.

790
00:53:13,340 --> 00:53:20,140
So my cost-to-go function, which
I just erased, is, remember--

791
00:53:20,140 --> 00:53:29,890
my cost-to-go function, J of x
bar t, is x bar transpose S of t x bar.

792
00:53:34,600 --> 00:53:35,680
This is a quadratic form.

793
00:53:35,680 --> 00:53:37,990
Just like the original
LQR, you can think

794
00:53:37,990 --> 00:53:39,850
of this as a quadratic bowl.

795
00:53:39,850 --> 00:53:42,340
In the LTI LQR case--

796
00:53:42,340 --> 00:53:45,880
am I OK throwing around
these three-letter acronyms?

797
00:53:45,880 --> 00:53:51,340
In the LTI LQR case, it
was a static quadratic bowl

798
00:53:51,340 --> 00:53:54,220
centered around the point
I'm trying to stabilize--

799
00:53:54,220 --> 00:53:55,210
so my cost-to-go.

800
00:53:55,210 --> 00:53:59,260
It said-- says, as I move
away from the point I'm trying

801
00:53:59,260 --> 00:54:04,380
to regulate, I'm going to incur
more cost in the direction--

802
00:54:04,380 --> 00:54:10,810
the rate it grows depends
on the variables inside S.

803
00:54:10,810 --> 00:54:15,910
Now, in this picture, I
have still a time-varying--

804
00:54:15,910 --> 00:54:18,850
I have a time-varying
quadratic bowl,

805
00:54:18,850 --> 00:54:22,630
but it's also
moving through time,

806
00:54:22,630 --> 00:54:27,940
because it's based on x bar.

807
00:54:27,940 --> 00:54:36,100
So in my pendulum world, if I
have this nominal trajectory,

808
00:54:36,100 --> 00:54:39,010
you can think of it as having
some quadratic bowl here.

809
00:54:41,950 --> 00:54:45,490
And the LTI stabilizer
that we did come up

810
00:54:45,490 --> 00:54:47,080
with that was based
on LQR did have

811
00:54:47,080 --> 00:54:50,320
some sort of quadratic bowl
shape that looked like that.

812
00:54:53,377 --> 00:54:55,960
Backwards in time, there's going
to be another quadratic bowl.

813
00:54:55,960 --> 00:54:58,450
Can I draw it very
badly like this?

814
00:55:04,010 --> 00:55:07,190
If I can just draw coming off
the board a little bit-- so

815
00:55:07,190 --> 00:55:11,940
there's some quadratic bowl
centered around this point,

816
00:55:11,940 --> 00:55:13,220
which is my cost-to-go.

817
00:55:13,220 --> 00:55:15,540
At that point, if I marched
further backwards in time,

818
00:55:15,540 --> 00:55:18,740
I've got some other quadratic
bowl around this point.

819
00:55:18,740 --> 00:55:20,775
That makes the
point, again, that--

820
00:55:24,590 --> 00:55:27,050
if my quadratic
bowl is currently

821
00:55:27,050 --> 00:55:29,720
this because time is 5--

822
00:55:29,720 --> 00:55:32,930
or I had a 4-second trajectory--
maybe time 3 here--

823
00:55:32,930 --> 00:55:34,430
and I'm pushed along
the trajectory,

824
00:55:34,430 --> 00:55:37,520
it's actually going to
incur just as much cost,

825
00:55:37,520 --> 00:55:39,680
roughly, as I'm pushed
another direction.

826
00:55:39,680 --> 00:55:43,700
There's a quadratic bowl
literally centered around x0

827
00:55:43,700 --> 00:55:46,310
at time t.

828
00:55:46,310 --> 00:55:49,070
That's what this equation says.

829
00:55:51,810 --> 00:55:56,070
And this quadratic bowl is
the cost-to-go estimate.

830
00:55:56,070 --> 00:56:00,780
It says, if I'm away
from the trajectory,

831
00:56:00,780 --> 00:56:04,350
I should expect the cost
I incur in getting back

832
00:56:04,350 --> 00:56:10,080
towards that trajectory
to be this quadratic form.

833
00:56:10,080 --> 00:56:10,580
Is that OK?

834
00:56:15,810 --> 00:56:18,270
And the key point is,
because I've re-parameterized

835
00:56:18,270 --> 00:56:23,670
my equations in terms of x
bar, this quadratic bowl always

836
00:56:23,670 --> 00:56:26,718
lives on that trajectory.

837
00:56:30,570 --> 00:56:34,620
My cost function
was x bar transpose Q x bar.

838
00:56:34,620 --> 00:56:36,195
My best thing to
do is to drive x

839
00:56:36,195 --> 00:56:38,940
bar to 0, which means
to drive my system back

840
00:56:38,940 --> 00:56:40,132
to the trajectory.

841
00:56:42,970 --> 00:56:45,170
People OK with that imagery?

842
00:56:45,170 --> 00:56:46,420
It doesn't look like they are.

843
00:56:46,420 --> 00:56:47,050
Everybody's OK.

844
00:56:50,870 --> 00:56:52,660
Are we OK here with
the LTI stabilizer

845
00:56:52,660 --> 00:56:55,570
being an LQR bowl--
or a quadratic bowl?

846
00:56:55,570 --> 00:56:59,440
So the farther I am away in
the directions defined by S,

847
00:56:59,440 --> 00:57:02,620
I'm going to incur some
cost getting back.

848
00:57:02,620 --> 00:57:05,470
This is just the
same thing that says,

849
00:57:05,470 --> 00:57:08,920
if I'm at this point
in the trajectory,

850
00:57:08,920 --> 00:57:12,010
I'm going to incur
this cost-to-go.

851
00:57:12,010 --> 00:57:14,770
And the best thing to do,
the minimal cost-to-go

852
00:57:14,770 --> 00:57:18,670
is living right on
that trajectory.

853
00:57:18,670 --> 00:57:21,190
As a consequence, the
optimal controller,

854
00:57:21,190 --> 00:57:24,610
which tries to go down the
landscape of the cost-to-go,

855
00:57:24,610 --> 00:57:28,210
is going to drive you
back to the trajectory.

856
00:57:28,210 --> 00:57:30,460
Now, I said all that because
I'm about to do something

857
00:57:30,460 --> 00:57:31,600
that sounds totally wacky.

858
00:57:40,580 --> 00:57:44,420
Would it ever make sense for me
to design a slightly different

859
00:57:44,420 --> 00:57:48,200
cost function such that, when
I linearize and design

860
00:57:48,200 --> 00:57:52,730
the feedback controller, I end
up with a cost-to-go over here?

861
00:57:56,930 --> 00:57:58,940
Let's say I have some
nominal trajectory.

862
00:57:58,940 --> 00:58:01,970
I found, through
whatever method,

863
00:58:01,970 --> 00:58:06,230
some reasonable system
trajectory, but I really--

864
00:58:06,230 --> 00:58:07,700
I'm still not happy with that.

865
00:58:07,700 --> 00:58:13,260
The trajectory I really wanted
was something like this,

866
00:58:13,260 --> 00:58:15,620
let's say.

867
00:58:15,620 --> 00:58:18,880
Would it make any sense
to do my linearization

868
00:58:18,880 --> 00:58:23,840
around this trajectory,
and try to drive the system

869
00:58:23,840 --> 00:58:25,327
to this other trajectory?

870
00:58:29,303 --> 00:58:32,648
AUDIENCE: You mean like scaling
your optimal trajectory?

871
00:58:32,648 --> 00:58:34,315
RUSS TEDRAKE: I don't
even mean scaling.

872
00:58:34,315 --> 00:58:35,170
They could cross.

873
00:58:35,170 --> 00:58:36,202
They could do whatever.

874
00:58:36,202 --> 00:58:37,285
It's not a simple scaling.

875
00:58:39,830 --> 00:58:42,210
Let me give you a simpler
version of the problem.

876
00:58:50,511 --> 00:58:54,792
AUDIENCE: [INAUDIBLE]

877
00:58:54,792 --> 00:58:56,000
RUSS TEDRAKE: Say that again.

878
00:58:56,000 --> 00:59:05,520
AUDIENCE: [INAUDIBLE]

879
00:59:05,520 --> 00:59:06,390
RUSS TEDRAKE: Yes.

880
00:59:06,390 --> 00:59:10,440
I'm going to define a
cost function, which

881
00:59:10,440 --> 00:59:15,210
would have it so I prefer
to live on that trajectory.

882
00:59:15,210 --> 00:59:17,430
Let me do it in the
time invariant case

883
00:59:17,430 --> 00:59:18,480
just so it's clear.

884
00:59:32,700 --> 00:59:35,880
Let's say my coordinate
system's back to simple.

885
00:59:35,880 --> 00:59:36,735
It lives around 0.

886
00:59:39,820 --> 00:59:42,870
Let's say I have that
cost function, or actually

887
00:59:42,870 --> 00:59:44,520
those dynamics.

888
00:59:44,520 --> 00:59:47,460
And instead of-- my original
cost function was just x

889
00:59:47,460 --> 00:59:49,050
transpose Qx--

890
00:59:49,050 --> 00:59:53,775
let's say my cost
function now is--

891
01:00:39,687 --> 01:00:41,520
let's think about this
problem for a second.

892
01:00:49,610 --> 01:00:51,910
So let's say I have
a linear system.

893
01:00:51,910 --> 01:00:53,980
Now, the LQR controller
we did initially--

894
01:00:57,950 --> 01:00:59,939
little sloppy with that.

895
01:01:04,630 --> 01:01:06,820
The LQR controller
I did initially

896
01:01:06,820 --> 01:01:09,410
always assumed that the desired
place you wanted to be in life

897
01:01:09,410 --> 01:01:09,910
was 0.

898
01:01:12,760 --> 01:01:15,640
If the desired place you want
to be in life is a constant--

899
01:01:15,640 --> 01:01:17,560
it's 3, let's say--

900
01:01:17,560 --> 01:01:21,670
then you can still do your
linear quadratic regulator.

901
01:01:21,670 --> 01:01:25,390
Just move your coordinate
system so the 3 is 0.

902
01:01:25,390 --> 01:01:27,100
But let's say I've
got a linear system,

903
01:01:27,100 --> 01:01:30,250
but I want to drive it
through some trajectory--

904
01:01:30,250 --> 01:01:34,570
time-varying trajectory-- x
desired as a function of time.

905
01:01:34,570 --> 01:01:37,905
Then I can't quite just
recenter the origin.

906
01:01:37,905 --> 01:01:39,280
I've got to think
about, how do I

907
01:01:39,280 --> 01:01:42,312
drive my linear system through
some other trajectories?

908
01:01:48,590 --> 01:01:51,440
Now the-- it's actually--

909
01:01:55,040 --> 01:02:00,900
LTI system, but my cost function
is time-varying, because my--

910
01:02:00,900 --> 01:02:05,120
I have the desired trajectory
that varies with time.

911
01:02:05,120 --> 01:02:07,790
The result-- I won't
write it down again--

912
01:02:07,790 --> 01:02:12,180
again, I can do this
Riccati equation.

913
01:02:12,180 --> 01:02:12,680
Back up.

914
01:02:15,410 --> 01:02:19,460
The only difference is
that the quadratic bowl

915
01:02:19,460 --> 01:02:23,682
is no longer going to be
centered on the origin.

916
01:02:23,682 --> 01:02:25,890
The quadratic bowl is going
to move with that desired

917
01:02:25,890 --> 01:02:26,920
trajectory.

918
01:02:29,860 --> 01:02:30,520
OK?

919
01:02:30,520 --> 01:02:31,048
Yeah?

920
01:02:31,048 --> 01:02:33,340
AUDIENCE: If that's far away
from where you linearized,

921
01:02:33,340 --> 01:02:34,330
could you--

922
01:02:34,330 --> 01:02:37,542
RUSS TEDRAKE: That's
an excellent question.

923
01:02:37,542 --> 01:02:39,250
But this is a linear
system, so first, we

924
01:02:39,250 --> 01:02:41,230
don't have to worry
about that, but don't let

925
01:02:41,230 --> 01:02:42,438
me forget to go back to that.

926
01:02:46,330 --> 01:02:50,020
So I can drive my linear system
through some trajectory that's

927
01:02:50,020 --> 01:02:54,940
non-zero beautifully
with an LQR controller.

928
01:02:54,940 --> 01:02:57,250
The only problem is
that my LQR controller

929
01:02:57,250 --> 01:03:01,720
has to have a cost-to-go
function and a controller which

930
01:03:01,720 --> 01:03:03,940
is not pointing me
always at the origin.

931
01:03:03,940 --> 01:03:06,763
You wouldn't want that.

932
01:03:06,763 --> 01:03:08,930
So in fact, the way it
looks-- there's a lot of ways

933
01:03:08,930 --> 01:03:10,030
that people derive it.

934
01:03:10,030 --> 01:03:13,360
With Pontryagin, it's
not too hard to derive.

935
01:03:13,360 --> 01:03:16,285
I prefer to derive
it with the HJB.

936
01:03:25,850 --> 01:03:29,855
I'm not going to do
the derivation, but--

937
01:03:29,855 --> 01:03:31,730
I don't mean to bore
you, but what you end up

938
01:03:31,730 --> 01:03:43,730
with is J of x of t has a form
x transpose S of t x plus x

939
01:03:43,730 --> 01:03:44,510
transpose--

940
01:03:44,510 --> 01:03:46,610
I call this S2--

941
01:03:46,610 --> 01:03:51,035
S1 of t plus S0 of t.

942
01:03:51,035 --> 01:03:52,160
It's a full quadratic form.

943
01:03:55,350 --> 01:03:58,500
When I just have this, it's
always a quadratic bowl.

944
01:03:58,500 --> 01:04:00,325
It's always centered around 0.

945
01:04:00,325 --> 01:04:02,700
If you want it, in general,
to be a quadratic bowl that's

946
01:04:02,700 --> 01:04:06,270
not necessarily at 0, you
need the full quadratic form.

947
01:04:06,270 --> 01:04:07,860
I could equally well
have written this

948
01:04:07,860 --> 01:04:14,550
as x minus some x desired, transposed,
times S of t, times x minus x desired.

949
01:04:14,550 --> 01:04:15,990
But let's work with this form.

950
01:04:15,990 --> 01:04:16,490
Yeah?

951
01:04:23,260 --> 01:04:26,590
So this is just an equation of a
quadratic bowl, not necessarily

952
01:04:26,590 --> 01:04:29,620
centered on the origin.
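
A short worked identity makes this concrete. It uses the lecture's notation and assumes S2(t) is symmetric positive definite. Completing the square shows where the bowl's minimum sits once the linear term S1 and the constant S0 are present:

$$
x^\top S_2(t)\,x + x^\top S_1(t) + S_0(t)
= \big(x - x^*(t)\big)^\top S_2(t)\,\big(x - x^*(t)\big) + S_0(t) - x^*(t)^\top S_2(t)\,x^*(t),
\qquad
x^*(t) = -\tfrac{1}{2}\,S_2(t)^{-1} S_1(t).
$$

With S1 = 0 the center is the origin. A nonzero S1(t) shifts the center to x*(t), which is how the bowl can move with the desired trajectory.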

953
01:04:29,620 --> 01:04:35,920
And the LQR derivation gives me
my backwards dynamics for S2.

954
01:04:35,920 --> 01:04:40,060
It gives me the backwards
dynamics for S1 and for S0.

955
01:04:45,440 --> 01:04:46,940
And it's in the notes.

956
01:04:46,940 --> 01:04:49,010
It's actually already
in your notes.

957
01:04:49,010 --> 01:04:53,330
It's in the HJB chapter that
has been up there for a while.
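
The notes derive these backward equations in continuous time from the HJB. The sketch below is a discrete-time analogue of that result, not the notes' derivation, and it rests on several assumptions that are not from the lecture. The dynamics are taken to be x[k+1] = A x[k] + B u[k]. The stage cost is (x - xd[k])^T Q (x - xd[k]) + u^T R u, and the terminal weight is Qf. The code runs the backward recursion for S2, S1, and S0 and returns the time-varying gains. The names lqr_tracking and rollout_cost are hypothetical.

```python
import numpy as np


def lqr_tracking(A, B, Q, R, Qf, xd):
    """Discrete-time LQR tracking: backward recursion for S2, S1, S0.

    Dynamics x[k+1] = A x[k] + B u[k]; stage cost (x - xd[k])'Q(x - xd[k]) + u'R u;
    terminal cost (x - xd[N])'Qf(x - xd[N]).
    Cost-to-go J_k(x) = x'S2[k]x + x'S1[k] + S0[k]; policy u = -K[k]x - kff[k].
    """
    N = len(xd) - 1
    S2, S1, S0 = [None] * (N + 1), [None] * (N + 1), [0.0] * (N + 1)
    K, kff = [None] * N, [None] * N
    # Terminal cost expanded into the full (non-centered) quadratic form.
    S2[N] = Qf
    S1[N] = -2.0 * Qf @ xd[N]
    S0[N] = float(xd[N] @ Qf @ xd[N])
    for k in range(N - 1, -1, -1):
        H = R + B.T @ S2[k + 1] @ B   # curvature of the cost-to-go in u
        c = B.T @ S1[k + 1]           # constant part of the gradient in u
        K[k] = np.linalg.solve(H, B.T @ S2[k + 1] @ A)
        kff[k] = 0.5 * np.linalg.solve(H, c)
        S2[k] = Q + A.T @ S2[k + 1] @ A - A.T @ S2[k + 1] @ B @ K[k]
        S1[k] = -2.0 * Q @ xd[k] + A.T @ S1[k + 1] - A.T @ S2[k + 1] @ B @ (2.0 * kff[k])
        S0[k] = float(xd[k] @ Q @ xd[k]) + S0[k + 1] - 0.25 * float(c @ np.linalg.solve(H, c))
    return S2, S1, S0, K, kff


def rollout_cost(A, B, Q, R, Qf, xd, K, kff, x0):
    """Simulate u = -K x - kff from x0 and add up the cost actually incurred."""
    x, total = np.asarray(x0, dtype=float), 0.0
    for k in range(len(K)):
        u = -K[k] @ x - kff[k]
        e = x - xd[k]
        total += e @ Q @ e + u @ R @ u
        x = A @ x + B @ u
    e = x - xd[-1]
    return float(total + e @ Qf @ e)
```

You can use this as a quick consistency check. Take any start state x0 and a desired trajectory xd of shape (N + 1, n). Then x0 @ S2[0] @ x0 + x0 @ S1[0] + S0[0] should match rollout_cost(...) up to rounding.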

958
01:04:57,170 --> 01:05:03,770
OK, now, the reason
I'm on about all this

959
01:05:03,770 --> 01:05:06,410
is that there's another way--

960
01:05:06,410 --> 01:05:08,180
I told you about
shooting methods.

961
01:05:08,180 --> 01:05:10,170
I told you about
direct collocation.

962
01:05:10,170 --> 01:05:11,810
There is yet another
way that people

963
01:05:11,810 --> 01:05:19,998
like to design trajectories,
which uses LQR directly.

964
01:05:25,250 --> 01:05:27,578
And that's this
iterative LQR procedure.

965
01:05:42,910 --> 01:05:43,410
OK.

966
01:06:06,090 --> 01:06:09,900
So let's say I have some
trajectory that I've already

967
01:06:09,900 --> 01:06:16,440
found, x0 of t, and I have
some different trajectory, which

968
01:06:16,440 --> 01:06:20,460
is my desired trajectory,
x desired of t.

969
01:06:24,710 --> 01:06:27,260
Then, using this
optimal tracking--

970
01:06:27,260 --> 01:06:30,020
if you stick back in the
time-varying components,

971
01:06:30,020 --> 01:06:32,990
using this optimal
tracking, I can

972
01:06:32,990 --> 01:06:35,420
linearize my dynamical
system around that.

973
01:06:35,420 --> 01:06:37,108
So I have no guarantees
that x desired

974
01:06:37,108 --> 01:06:38,150
is a feasible trajectory.

975
01:06:38,150 --> 01:06:40,250
In fact, in many cases, it's not.

976
01:06:40,250 --> 01:06:44,030
For instance, x desired might
be at the goal at all times.

977
01:06:44,030 --> 01:06:47,840
If I came up with a perfectly
feasible x desired trajectory,

978
01:06:47,840 --> 01:06:52,280
I probably wouldn't be
running an open-loop solver.

979
01:06:52,280 --> 01:06:54,650
I want to get
as close

980
01:06:54,650 --> 01:06:57,710
as possible to the x
desired while potentially

981
01:06:57,710 --> 01:07:00,220
minimizing cost and respecting
the dynamics of the system.

982
01:07:02,880 --> 01:07:04,820
Here's one way to do it--

983
01:07:04,820 --> 01:07:10,100
linearize my system around
my initial guess, x0 of t,

984
01:07:10,100 --> 01:07:12,860
then design a linear
optimal tracking--

985
01:07:12,860 --> 01:07:15,020
linear time-varying
optimal tracking which

986
01:07:15,020 --> 01:07:17,690
tries to regulate my
system as close as

987
01:07:17,690 --> 01:07:21,240
possible to that trajectory.

988
01:07:21,240 --> 01:07:24,600
Now, what Steven said
was exactly on point.

989
01:07:24,600 --> 01:07:30,510
If I drive my system away
from where I linearized,

990
01:07:30,510 --> 01:07:32,940
there's no guarantee
that my linear model

991
01:07:32,940 --> 01:07:36,660
is going to be any good here.

992
01:07:36,660 --> 01:07:40,310
But the hope is that this
trajectory is better--

993
01:07:40,310 --> 01:07:44,480
a better guess than
the one before.

994
01:07:44,480 --> 01:07:52,310
And you iterate, make another
approximation around there,

995
01:07:52,310 --> 01:07:55,550
design the LQR controller, run
the LQR controller that drives

996
01:07:55,550 --> 01:07:58,040
me here to find the new u tape.

997
01:07:58,040 --> 01:08:01,100
That defines my new
trajectory-- repeat.

998
01:08:01,100 --> 01:08:02,270
OK?

999
01:08:02,270 --> 01:08:05,120
That's called iterative LQR.
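
The loop just described can be sketched in code. This is one possible reading of it, and it makes several assumptions. The model is a discrete-time f(x, u), which is hypothetical. The Jacobians come from central finite differences. The stage cost is (x - xd)^T Q (x - xd) + u^T R u.

Each pass does four things:
- It simulates the current u tape to get x0(t).
- It linearizes the model around that trajectory.
- It designs a time-varying LQR tracking controller in the deviation coordinates x - x0(t) and u - u0(t). In those coordinates the u cost pulls toward -u0.
- It runs that controller on the nonlinear model to produce the new u tape.

All function names are hypothetical. This is the plain version, with no line search.

```python
import numpy as np


def tvlqr_tracking(As, Bs, Q, R, Qf, xd, ud):
    """Backward pass for time-varying LQR tracking.

    Cost: sum_k (x-xd[k])'Q(x-xd[k]) + (u-ud[k])'R(u-ud[k]) + (x-xd[N])'Qf(x-xd[N]),
    cost-to-go x'S2 x + x'S1 + S0 (S0 is not needed for the gains).
    Returns K, kff for the policy u[k] = -K[k] x - kff[k].
    """
    N = len(As)
    S2, S1 = Qf, -2.0 * Qf @ xd[N]
    K, kff = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):
        A, B = As[k], Bs[k]
        H = R + B.T @ S2 @ B
        c = B.T @ S1 - 2.0 * R @ ud[k]
        K[k] = np.linalg.solve(H, B.T @ S2 @ A)
        kff[k] = 0.5 * np.linalg.solve(H, c)
        # Update S1 before S2: both right-hand sides use S2 at step k+1.
        S1 = -2.0 * Q @ xd[k] + A.T @ S1 - A.T @ S2 @ B @ (2.0 * kff[k])
        S2 = Q + A.T @ S2 @ A - A.T @ S2 @ B @ K[k]
    return K, kff


def linearize(f, x, u, eps=1e-5):
    """Central-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    A = np.column_stack([(f(x + eps * e, u) - f(x - eps * e, u)) / (2 * eps)
                         for e in np.eye(len(x))])
    B = np.column_stack([(f(x, u + eps * e) - f(x, u - eps * e)) / (2 * eps)
                         for e in np.eye(len(u))])
    return A, B


def simulate(f, x_init, us):
    xs = [np.asarray(x_init, dtype=float)]
    for u in us:
        xs.append(f(xs[-1], u))
    return np.array(xs)


def total_cost(xs, us, Q, R, Qf, xd):
    c = sum((x - d) @ Q @ (x - d) for x, d in zip(xs[:-1], xd[:-1]))
    c += sum(u @ R @ u for u in us)
    e = xs[-1] - xd[-1]
    return float(c + e @ Qf @ e)


def ilqr(f, x_init, us, Q, R, Qf, xd, iters=10):
    us = np.array(us, dtype=float)
    for _ in range(iters):
        xs = simulate(f, x_init, us)  # nominal trajectory x0(t) from the u tape
        AB = [linearize(f, xs[k], us[k]) for k in range(len(us))]
        # Deviation coordinates: xbar = x - x0, ubar = u - u0, so the targets
        # become xd - x0 and -u0 (the original cost penalizes u itself).
        K, kff = tvlqr_tracking([a for a, _ in AB], [b for _, b in AB],
                                Q, R, Qf, xd - xs, -us)
        # Run the new controller on the nonlinear model; record the new u tape.
        x, new_us = np.asarray(x_init, dtype=float), []
        for k in range(len(us)):
            u = us[k] - K[k] @ (x - xs[k]) - kff[k]
            new_us.append(u)
            x = f(x, u)
        us = np.array(new_us)
    return us
```

The new u tape replaces the old one rather than being added to it. On a linear model the first pass already returns the optimal u tape, and a second pass leaves it unchanged. On strongly nonlinear models, practical implementations usually add a line search or regularization, and this sketch omits both.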

1000
01:08:05,120 --> 01:08:06,760
What else is it called?

1001
01:08:06,760 --> 01:08:07,760
Do you know?

1002
01:08:19,728 --> 01:08:20,840
Yeah.

1003
01:08:20,840 --> 01:08:22,092
Do you see that?

1004
01:08:22,092 --> 01:08:24,145
It's differential
dynamic programming--

1005
01:08:37,740 --> 01:08:38,327
almost.

1006
01:08:38,327 --> 01:08:40,410
There's a subtle difference,
which I can tell you,

1007
01:08:40,410 --> 01:08:42,640
if you want.

1008
01:08:42,640 --> 01:08:44,069
There's a lot of names for it.

1009
01:08:44,069 --> 01:08:48,149
There's another guy, Bobrow--
some of you know Jim Bobrow--

1010
01:08:48,149 --> 01:08:50,609
he wrote this up, calling it
sequential linear quadratic

1011
01:08:50,609 --> 01:08:53,979
regulators.

1012
01:08:53,979 --> 01:08:57,423
Any four-letter acronym
that ends in LQR--

1013
01:08:57,423 --> 01:08:59,340
if you put it in Google,
you'll find something

1014
01:08:59,340 --> 01:09:00,423
that's probably this idea.

1015
01:09:00,423 --> 01:09:01,350
Yeah.

1016
01:09:01,350 --> 01:09:04,050
If you put whatever arbitrary
consonant in front of it,

1017
01:09:04,050 --> 01:09:06,558
you'll probably
get this idea out.

1018
01:09:06,558 --> 01:09:10,645
AUDIENCE: What prevents
your actuator costs

1019
01:09:10,645 --> 01:09:14,978
from accumulating from
one iteration to another?

1020
01:09:14,978 --> 01:09:16,520
RUSS TEDRAKE: Every
iteration, you're

1021
01:09:16,520 --> 01:09:19,140
trying to minimize
your actuator cost.

1022
01:09:19,140 --> 01:09:23,051
AUDIENCE: Right, but I mean, if
you have a lot of iterations,

1023
01:09:23,051 --> 01:09:25,540
couldn't that potentially grow?

1024
01:09:25,540 --> 01:09:28,950
RUSS TEDRAKE: I don't
actually add to my old u tape.

1025
01:09:28,950 --> 01:09:31,740
I actually completely
replace my old u tape

1026
01:09:31,740 --> 01:09:34,350
with a new controller which
drives the system toward x desired.

1027
01:09:34,350 --> 01:09:35,170
AUDIENCE: Oh, OK.

1028
01:09:35,170 --> 01:09:39,510
RUSS TEDRAKE: So there's no
worries about additive actions.

1029
01:09:39,510 --> 01:09:42,060
It actually tells me in my
original non-linear system

1030
01:09:42,060 --> 01:09:44,700
what's my best guess as
a u tape that goes there.

1031
01:09:47,990 --> 01:09:50,520
AUDIENCE: Is this
basically a trick

1032
01:09:50,520 --> 01:09:53,882
to get rid of the
slow [INAUDIBLE]?

1033
01:09:53,882 --> 01:09:55,840
RUSS TEDRAKE: So very,
very good-- so why would

1034
01:09:55,840 --> 01:09:56,590
I want to do this?

1035
01:09:56,590 --> 01:09:58,648
Why didn't I tell you
about this first, or why--

1036
01:09:58,648 --> 01:10:00,440
how does this compare
to the other methods?

1037
01:10:03,220 --> 01:10:06,170
There is a sense by which-- and
I thought about doing the whole

1038
01:10:06,170 --> 01:10:07,420
derivation, but I think this--

1039
01:10:07,420 --> 01:10:12,130
I hope that this short
discussion is sufficient.

1040
01:10:12,130 --> 01:10:14,230
So what I'm roughly
doing is I'm using

1041
01:10:14,230 --> 01:10:19,660
LQR to come up with a quadratic
approximation of where

1042
01:10:19,660 --> 01:10:20,500
my cost--

1043
01:10:20,500 --> 01:10:21,730
where my minimum is.

1044
01:10:24,760 --> 01:10:27,970
This is very much in
the spirit of those SQP

1045
01:10:27,970 --> 01:10:31,150
methods, the sequential
quadratic programming methods.

1046
01:10:31,150 --> 01:10:33,970
I'm using computation
on this line

1047
01:10:33,970 --> 01:10:37,447
to come up with a quadratic
approximation of where I think

1048
01:10:37,447 --> 01:10:38,530
the new minimum should be.

1049
01:10:41,110 --> 01:11:01,400
So as such, it's a relatively
cheap method with SQP-like

1050
01:11:01,400 --> 01:11:02,935
convergence properties.

1051
01:11:12,640 --> 01:11:15,030
OK.

1052
01:11:15,030 --> 01:11:18,180
The methods I told you
about on Thursday--

1053
01:11:18,180 --> 01:11:19,890
the backprop through
time, the RTRL--

1054
01:11:22,530 --> 01:11:25,770
they computed J
over my trajectory.

1055
01:11:25,770 --> 01:11:30,000
They computed partial J, partial
alpha over my trajectory.

1056
01:11:30,000 --> 01:11:34,830
They did not ever explicitly
compute the second derivative.

1057
01:11:34,830 --> 01:11:42,450
I never computed partial J,
partial alpha, partial alpha.

1058
01:11:42,450 --> 01:11:45,030
To explicitly do an
SQP update, somebody

1059
01:11:45,030 --> 01:11:50,280
needs to compute the Hessian
of that optimization.

1060
01:11:50,280 --> 01:11:53,160
I'm relying on SNOPT
to do some bookkeeping

1061
01:11:53,160 --> 01:11:57,660
to estimate the Hessian to
do the second-order update.

1062
01:11:57,660 --> 01:12:00,300
I would do better if
I had an efficient way

1063
01:12:00,300 --> 01:12:02,370
to compute the
second derivatives,

1064
01:12:02,370 --> 01:12:04,830
and I could hand that
directly to SNOPT or whatever,

1065
01:12:04,830 --> 01:12:09,270
and we'd get-- expect
faster convergence.

1066
01:12:09,270 --> 01:12:12,760
These aren't quite the
gradients that I want,

1067
01:12:12,760 --> 01:12:15,810
but it has that feel to it,
and it has similar convergence

1068
01:12:15,810 --> 01:12:16,715
properties.

1069
01:12:16,715 --> 01:12:18,090
So what you should
think about is

1070
01:12:18,090 --> 01:12:19,632
you should think
about this as a more

1071
01:12:19,632 --> 01:12:22,020
explicit second-order
method for making

1072
01:12:22,020 --> 01:12:29,287
a large jump in my trajectories
with sequential quadratic

1073
01:12:29,287 --> 01:12:30,120
convergence results.

1074
01:12:32,820 --> 01:12:36,270
I feel like I've lost everybody
now, but ask questions,

1075
01:12:36,270 --> 01:12:38,820
if you need to.

1076
01:12:38,820 --> 01:12:43,680
The advantage of it
is that it's fast.

1077
01:12:43,680 --> 01:12:45,870
It could potentially
require very few iterations

1078
01:12:45,870 --> 01:12:48,465
to converge.

1079
01:12:48,465 --> 01:12:51,090
One of the strongest disadvantages
is that there's no explicit way

1080
01:12:51,090 --> 01:12:53,108
to do constraints.

1081
01:12:53,108 --> 01:12:55,650
You have to think harder about
how to do constraints in this.

1082
01:12:58,170 --> 01:12:59,700
And I know of fewer
formal guarantees

1083
01:12:59,700 --> 01:13:01,920
that it will succeed,
because it's an approximation

1084
01:13:01,920 --> 01:13:03,645
of that quadratic.

1085
01:13:06,930 --> 01:13:10,860
So the RL community
uses DDP a lot,

1086
01:13:10,860 --> 01:13:12,900
and actually, a lot of
people who do DDP do

1087
01:13:12,900 --> 01:13:14,180
iterative LQR.

1088
01:13:14,180 --> 01:13:16,180
For instance, Peter
[INAUDIBLE] and those guys--

1089
01:13:16,180 --> 01:13:17,010
they always call it DDP.

1090
01:13:17,010 --> 01:13:18,552
They're actually
doing iterative LQR.

1091
01:13:18,552 --> 01:13:21,930
DDP explicitly actually has--

1092
01:13:21,930 --> 01:13:25,800
you have to do a second-order
expansion of your dynamics,

1093
01:13:25,800 --> 01:13:27,840
so you don't just get A of t x.

1094
01:13:27,840 --> 01:13:30,930
You actually go to second-order
expansion of your dynamics.

1095
01:13:30,930 --> 01:13:33,750
So it's a little bit more
expensive of an update,

1096
01:13:33,750 --> 01:13:37,440
but most people equate it
almost exactly to iterative LQR.

1097
01:13:40,570 --> 01:13:44,970
AUDIENCE: So this
x0 trajectory, this

1098
01:13:44,970 --> 01:13:47,460
isn't a trajectory
you found by doing

1099
01:13:47,460 --> 01:13:49,010
RTRL or something like that?

1100
01:13:49,010 --> 01:13:50,318
This is something different?

1101
01:13:50,318 --> 01:13:51,860
RUSS TEDRAKE: Good--
so this could be

1102
01:13:51,860 --> 01:13:54,350
a standard replacement for RTRL.

1103
01:13:54,350 --> 01:13:57,240
I could start with a
random x0 trajectory.

1104
01:13:57,240 --> 01:14:00,560
So maybe it's better to start
with a random u trajectory,

1105
01:14:00,560 --> 01:14:03,530
simulate it, and get
an x0 trajectory.

1106
01:14:03,530 --> 01:14:06,500
And then it will
quickly reshape until it

1107
01:14:06,500 --> 01:14:09,869
gets as close as possible to
this x desired trajectory.

1108
01:14:09,869 --> 01:14:12,202
AUDIENCE: But you're reshaping
your control actions that

1109
01:14:12,202 --> 01:14:13,340
get you to the x trajectory?

1110
01:14:13,340 --> 01:14:14,090
RUSS TEDRAKE: Yes.

1111
01:14:14,090 --> 01:14:17,840
So I'm reshaping u,
resimulating to get the new x.

1112
01:14:17,840 --> 01:14:18,440
Yeah.

1113
01:14:18,440 --> 01:14:20,590
I wrote it more carefully
in the notes, and--

1114
01:14:20,590 --> 01:14:24,560
but I hope this is the
right level to do the class.

1115
01:14:24,560 --> 01:14:26,900
And there's one
extra thing that--

1116
01:14:26,900 --> 01:14:29,930
so I say this works if
you have a desired x--

1117
01:14:29,930 --> 01:14:36,180
desired trajectory, which
means your cost function

1118
01:14:36,180 --> 01:14:37,410
has this sort of a form.

1119
01:14:40,170 --> 01:14:43,590
The advocates of
iterative LQR and DDP

1120
01:14:43,590 --> 01:14:47,530
say that every cost
function has this form.

1121
01:14:47,530 --> 01:14:50,590
This is just a second-order
Taylor expansion

1122
01:14:50,590 --> 01:14:54,842
of whatever non-linear
cost function you want.

1123
01:14:54,842 --> 01:14:56,800
So write down whatever
non-linear cost function

1124
01:14:56,800 --> 01:15:00,070
you have, do a second-order
expansion on it,

1125
01:15:00,070 --> 01:15:05,160
and you end up with a quadratic
cost function like this.
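
The advocates' claim can be made concrete with a small sketch. It assumes a scalar cost of the state only, finite-difference derivatives, and a positive definite Hessian. The helper names quadratize and as_tracking_cost are hypothetical. The first helper builds the second-order Taylor expansion around a nominal point. The second rewrites that expansion in the tracking form (dx - dx_d)^T Qt (dx - dx_d) + const, so in deviation coordinates it reads as a desired-trajectory cost.

```python
import numpy as np


def quadratize(cost, x0, eps=1e-3):
    """Second-order Taylor expansion of a scalar cost around x0.

    Returns (c0, g, H) with cost(x0 + dx) ~= c0 + g'dx + 0.5 dx'H dx,
    using central finite differences for the gradient and Hessian.
    """
    n = len(x0)
    I = np.eye(n)
    c0 = cost(x0)
    g = np.array([(cost(x0 + eps * I[i]) - cost(x0 - eps * I[i])) / (2 * eps)
                  for i in range(n)])
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (cost(x0 + eps * (I[i] + I[j])) - cost(x0 + eps * (I[i] - I[j]))
                       - cost(x0 - eps * (I[i] - I[j])) + cost(x0 - eps * (I[i] + I[j]))
                       ) / (4 * eps ** 2)
    return c0, g, 0.5 * (H + H.T)  # symmetrize away round-off


def as_tracking_cost(c0, g, H):
    """Rewrite c0 + g'dx + 0.5 dx'H dx as (dx - dx_d)'Qt(dx - dx_d) + const.

    Assumes H is positive definite, so the quadratic is a bowl.
    """
    dx_d = -np.linalg.solve(H, g)                    # center of the bowl
    Qt = 0.5 * H
    const = c0 - 0.5 * g @ np.linalg.solve(H, g)
    return Qt, dx_d, const
```

If the Hessian is not positive definite, the rewrite into a bowl fails. This sketch does not handle that case.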

1126
01:15:05,160 --> 01:15:08,540
And you can then
approximate that solution

1127
01:15:08,540 --> 01:15:10,970
with an iterative LQR scheme--

1128
01:15:10,970 --> 01:15:13,760
or RTRL, or backprop
through time.

1129
01:15:13,760 --> 01:15:18,020
This is the third out
of our list of methods.

1130
01:15:18,020 --> 01:15:19,087
My goal is only to know--

1131
01:15:19,087 --> 01:15:20,420
so that you know that it exists.

1132
01:15:20,420 --> 01:15:22,253
And you can read the
notes if you want more,

1133
01:15:22,253 --> 01:15:24,450
and you can read the
papers if you want more.

1134
01:15:24,450 --> 01:15:25,957
OK?

1135
01:15:25,957 --> 01:15:26,540
Yeah, Michael?

1136
01:15:26,540 --> 01:15:28,145
AUDIENCE: So I
think last time you

1137
01:15:28,145 --> 01:15:31,130
talked about you're
parameterizing the deviation

1138
01:15:31,130 --> 01:15:33,020
from your nominal control input.

1139
01:15:33,020 --> 01:15:35,780
So what if you were--
like as you iterate

1140
01:15:35,780 --> 01:15:37,580
the controller, [INAUDIBLE]?

1141
01:15:41,660 --> 01:15:48,620
RUSS TEDRAKE: Good-- the total
cost is actually the cost

1142
01:15:48,620 --> 01:15:52,760
with respect to some u desired.

1143
01:15:52,760 --> 01:15:55,460
So I end up trying to optimize
that in a coordinate system

1144
01:15:55,460 --> 01:16:00,950
based on u0, but the cost I'm
trying to minimize is the u

1145
01:16:00,950 --> 01:16:03,258
in the original coordinate
system minus u desired--

1146
01:16:03,258 --> 01:16:04,550
which, in a lot of cases, is 0.

1147
01:16:07,850 --> 01:16:09,860
Although I do it in a
weird coordinate system,

1148
01:16:09,860 --> 01:16:12,410
and it actually eventually
subtracts itself out

1149
01:16:12,410 --> 01:16:16,010
because I add it back
in at the end, and--

1150
01:16:16,010 --> 01:16:18,170
it's quite easy
to, for instance,

1151
01:16:18,170 --> 01:16:21,560
minimize u squared in the
original coordinate system.

1152
01:16:21,560 --> 01:16:22,060
OK?

1153
01:16:27,217 --> 01:16:29,050
So on Thursday, we get
to do walking robots.

1154
01:16:29,050 --> 01:16:31,008
We're going to move on
to the next major thing.

1155
01:16:31,008 --> 01:16:37,120
But you've now learned three
of the open-loop trajectory

1156
01:16:37,120 --> 01:16:39,510
optimizers that
people really use--

1157
01:16:39,510 --> 01:16:45,010
iterative LQR very quickly,
RTRL and backprop through time--

1158
01:16:45,010 --> 01:16:46,540
I grouped as one--

1159
01:16:46,540 --> 01:16:50,800
the shooting methods
and direct collocation.

1160
01:16:50,800 --> 01:16:53,173
There's another one
that's a recent addition

1161
01:16:53,173 --> 01:16:55,090
to the scene, which is
this discrete mechanics

1162
01:16:55,090 --> 01:16:57,765
and optimal control, this DMOC.

1163
01:16:57,765 --> 01:16:59,140
If anybody was
excited about that

1164
01:16:59,140 --> 01:17:00,310
and wanted to do a
class project on that,

1165
01:17:00,310 --> 01:17:01,780
that would be a perfect thing.

1166
01:17:01,780 --> 01:17:03,630
Grab that paper.

1167
01:17:03,630 --> 01:17:06,047
Show us that it works on
the [INAUDIBLE] cart-pole.

1168
01:17:06,047 --> 01:17:06,880
That'd be beautiful.

1169
01:17:06,880 --> 01:17:09,920
I'd love to have that--

1170
01:17:09,920 --> 01:17:13,840
have us try that and see how it
compares to the other methods.

1171
01:17:18,083 --> 01:17:20,500
You've got a pretty good toolkit
for optimal control now--

1172
01:17:20,500 --> 01:17:23,650
practical optimal control.

1173
01:17:23,650 --> 01:17:26,950
And it works for flying
robots, but it also

1174
01:17:26,950 --> 01:17:30,070
works for your wheeled robots,
if you want to control them

1175
01:17:30,070 --> 01:17:31,210
with better control.

1176
01:17:35,470 --> 01:17:38,260
You could do a drop-in
replacement LTI optimal

1177
01:17:38,260 --> 01:17:40,450
tracking controller,
and it would be better--

1178
01:17:40,450 --> 01:17:42,880
assuming your model's better.

1179
01:17:42,880 --> 01:17:46,300
So you have these tools.

1180
01:17:46,300 --> 01:17:49,190
Quick procedural things--
I know we're out of time.

1181
01:17:49,190 --> 01:17:52,150
So next Thursday-- well, so let
me say the good thing first.

1182
01:17:52,150 --> 01:17:54,640
In two weeks, you're
on spring break.

1183
01:17:54,640 --> 01:17:56,560
Yeah.

1184
01:17:56,560 --> 01:17:58,945
The Thursday preceding
that is our midterm.

1185
01:18:02,270 --> 01:18:04,458
We haven't had a midterm
in the class before,

1186
01:18:04,458 --> 01:18:06,250
so there's no old exams
for me to give you,

1187
01:18:06,250 --> 01:18:08,000
but John and I are
going to try to come up

1188
01:18:08,000 --> 01:18:10,810
with some representative
problems for you

1189
01:18:10,810 --> 01:18:13,570
to take home for
Thursday of this week

1190
01:18:13,570 --> 01:18:17,350
so you can have some problems
to munch on over the weekend.

1191
01:18:17,350 --> 01:18:21,170
It'll be an in-class exam
Thursday before spring break,

1192
01:18:21,170 --> 01:18:23,800
which is a week from Thursday.

1193
01:18:23,800 --> 01:18:25,690
OK?

1194
01:18:25,690 --> 01:18:27,280
AUDIENCE: [INAUDIBLE]

1195
01:18:27,280 --> 01:18:28,030
RUSS TEDRAKE: Yes.

1196
01:18:28,030 --> 01:18:32,410
So open-book-- well, you
can grab whatever notes--

1197
01:18:32,410 --> 01:18:36,760
open-note exam-- absolutely.

1198
01:18:36,760 --> 01:18:38,950
Well, I'll say it more in
the preparation package,

1199
01:18:38,950 --> 01:18:40,782
but roughly, we're going to--

1200
01:18:40,782 --> 01:18:42,490
I think, if you have
your notes with you,

1201
01:18:42,490 --> 01:18:44,350
if you've done the problem set--

1202
01:18:44,350 --> 01:18:47,470
and most importantly, if you
know how these algorithms--

1203
01:18:47,470 --> 01:18:51,190
where the algorithms relate to
each other and where they'd be

1204
01:18:51,190 --> 01:18:52,390
used in different systems--

1205
01:18:52,390 --> 01:18:56,020
I can guarantee I'm going to
ask you something about that--

1206
01:18:56,020 --> 01:18:58,540
then it's not designed
to be a killer.

1207
01:19:02,560 --> 01:19:06,385
Good-- and I hope you start
thinking about projects.

1208
01:19:09,400 --> 01:19:12,830
Just out of being a
fairly nice person,

1209
01:19:12,830 --> 01:19:16,627
I wasn't going to ask you to do
projects before your midterm.

1210
01:19:16,627 --> 01:19:18,460
But this time last year,
I was asking people

1211
01:19:18,460 --> 01:19:21,310
to submit project proposals.

1212
01:19:21,310 --> 01:19:23,950
We're going to do that
immediately after the midterm.

1213
01:19:23,950 --> 01:19:26,518
If you've been chewing
on, this method

1214
01:19:26,518 --> 01:19:28,810
looked like a really good
match to my research problem,

1215
01:19:28,810 --> 01:19:32,390
or I've never actually thought
about juggling robots before,

1216
01:19:32,390 --> 01:19:35,722
or something like
this, you can imagine--

1217
01:19:35,722 --> 01:19:37,180
so in the fairly
near future, we're

1218
01:19:37,180 --> 01:19:41,290
going to ask you for a
half-page project proposal

1219
01:19:41,290 --> 01:19:43,450
that we can iterate
with you on to get going

1220
01:19:43,450 --> 01:19:46,120
on a world-class final project.

1221
01:19:46,120 --> 01:19:48,070
Yeah?

1222
01:19:48,070 --> 01:19:49,860
See you Thursday.