WEBVTT
00:00:00.000 --> 00:00:02.520
The following content is
provided under a Creative
00:00:02.520 --> 00:00:03.970
Commons license.
00:00:03.970 --> 00:00:06.330
Your support will help
MIT OpenCourseWare
00:00:06.330 --> 00:00:10.660
continue to offer high-quality
educational resources for free.
00:00:10.660 --> 00:00:13.320
To make a donation or
view additional materials
00:00:13.320 --> 00:00:17.170
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:17.170 --> 00:00:18.370
at ocw.mit.edu.
00:00:21.672 --> 00:00:22.380
RUSS TEDRAKE: OK.
00:00:22.380 --> 00:00:23.100
Welcome back.
00:00:26.010 --> 00:00:27.750
Since we ended abruptly,
I want to start
00:00:27.750 --> 00:00:30.990
with a recap of last time.
00:00:30.990 --> 00:00:34.020
And then we've got a lot
of new ground to cover.
00:00:34.020 --> 00:00:43.350
So remember last
time, we considered
00:00:43.350 --> 00:00:50.550
the system q double dot equals
u, which is of a general form,
00:00:50.550 --> 00:00:54.720
just a linear feedback system,
which in state space form
00:00:54.720 --> 00:01:00.060
looks like this, where it
happens that a and b are
00:01:00.060 --> 00:01:01.050
particularly simple.
00:01:03.780 --> 00:01:06.520
And we looked at designing--
00:01:06.520 --> 00:01:08.020
let's not say
designing controller--
00:01:08.020 --> 00:01:10.890
we looked at reshaping
the phase space
00:01:10.890 --> 00:01:12.850
a couple of different ways.
00:01:12.850 --> 00:01:17.700
The first way, which is the
sort of 6.302 way, maybe,
00:01:17.700 --> 00:01:21.660
would be designing sort
of by pole placement,
00:01:21.660 --> 00:01:26.610
by designing feedback gains
possibly by hand, possibly
00:01:26.610 --> 00:01:28.960
by a root locus analysis.
00:01:28.960 --> 00:01:44.700
So we looked at manually
designing some linear feedback
00:01:44.700 --> 00:01:48.750
law, u equals negative Kx.
00:01:48.750 --> 00:02:00.040
And we did things like plotting
the phase portrait, which
00:02:00.040 --> 00:02:16.810
gave us for q, q dot
a phase portrait that
00:02:16.810 --> 00:02:33.090
looked like this, where this has
an eigenvalue of negative 3.75
00:02:33.090 --> 00:02:39.330
approximately, and this one had
an eigenvalue of negative 0.25.
00:02:39.330 --> 00:02:43.815
This was all for K equals 1, 4.
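As a quick check on those two eigenvalues, here is a minimal sketch, assuming NumPy is available, that builds the closed-loop matrix for q double dot equals u under u equals negative Kx with K equals 1, 4. The variable names are illustrative and are not from the lecture.

```python
import numpy as np

# Double integrator q_ddot = u in state-space form, with x = [q, q_dot].
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Hand-picked feedback gains from the recap: u = -K x.
K = np.array([[1.0, 4.0]])

# The closed-loop dynamics are x_dot = (A - B K) x.
A_cl = A - B @ K
eigvals = np.sort(np.linalg.eigvals(A_cl).real)
print(eigvals)  # roughly -3.73 and -0.27
```

The exact values are negative 2 plus or minus root 3, which is consistent with the approximate numbers quoted for the fast and slow directions.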
00:02:46.440 --> 00:02:47.700
OK.
00:02:47.700 --> 00:02:53.130
And we ended up seeing that
just from that quick analysis
00:02:53.130 --> 00:02:56.220
we could see phase portraits
which looked like this.
00:02:56.220 --> 00:02:58.680
They come across the
origin, and then they'd
00:02:58.680 --> 00:03:01.120
hook in towards the goal.
00:03:04.350 --> 00:03:05.622
And similarly here.
00:03:05.622 --> 00:03:07.830
This one's so much faster
that it would go like this.
00:03:12.340 --> 00:03:15.570
Then we looked at
an optimal control
00:03:15.570 --> 00:03:18.930
way of solving the same thing.
00:03:18.930 --> 00:03:38.220
We looked at doing a minimum
time optimal control approach,
00:03:38.220 --> 00:03:41.430
not specifically so that
we could get there faster,
00:03:41.430 --> 00:03:45.135
even though "minimum time"
is in the name, because here
00:03:45.135 --> 00:03:47.010
remember, we could get
there arbitrarily fast
00:03:47.010 --> 00:03:50.190
by just cranking K as high
as we wanted, but actually
00:03:50.190 --> 00:03:53.250
for trying to do something
a little bit smarter, which
00:03:53.250 --> 00:03:57.630
is get there in minimum time
when I have an extra constraint
00:03:57.630 --> 00:04:00.720
that u was bounded, in the
case we looked at yesterday
00:04:00.720 --> 00:04:04.680
was bounded by negative 1, 1.
00:04:04.680 --> 00:04:08.250
And in that case when
u was bounded, now
00:04:08.250 --> 00:04:10.270
the minimum time problem
becomes nontrivial.
00:04:10.270 --> 00:04:13.380
It's not just crank
the gains to infinity.
00:04:13.380 --> 00:04:19.079
And we had to use some
better thinking about it.
00:04:19.079 --> 00:04:28.980
And the result was a phase
portrait which actually, I
00:04:28.980 --> 00:04:30.990
don't know if you
left realizing,
00:04:30.990 --> 00:04:36.880
it didn't look that
different in some ways.
00:04:36.880 --> 00:04:42.840
Remember, we had these
switching surfaces defined here.
00:04:46.620 --> 00:04:56.820
And above this, we'd execute one
policy, one bang-bang solution.
00:04:56.820 --> 00:05:01.500
And then below it we'd
execute another one.
00:05:01.500 --> 00:05:03.900
And the resulting system
trajectories-- remember,
00:05:03.900 --> 00:05:08.280
this one hooked down across the
origin and went into the goal
00:05:08.280 --> 00:05:09.600
like that.
00:05:09.600 --> 00:05:13.770
This one really did exactly
the same thing, right?
00:05:13.770 --> 00:05:15.600
They would start over here.
00:05:15.600 --> 00:05:18.030
They'd hook down here with--
00:05:18.030 --> 00:05:21.935
this time they'd explicitly
hit that switching surface
00:05:21.935 --> 00:05:23.310
and then ride that
into the goal.
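The hook-then-ride behavior can be reproduced with a small simulation. This is a sketch under stated assumptions: it uses the standard textbook switching curve for q double dot equals u with u bounded by negative 1, 1, namely q = -0.5 * q_dot * |q_dot|, which is not spelled out in this recap. The function names, step size, and tolerance are hypothetical choices.

```python
def bang_bang(q, qd):
    """Minimum-time policy for q_ddot = u with |u| <= 1 (assumed textbook form)."""
    s = q + 0.5 * qd * abs(qd)  # s = 0 is the switching curve
    if s > 0:
        return -1.0
    if s < 0:
        return 1.0
    # Exactly on the curve: decelerate along it toward the origin.
    return -1.0 if qd > 0 else 1.0


def time_to_goal(q, qd, dt=1e-3, tol=0.02, t_max=10.0):
    """Forward-Euler rollout; returns the first time both |q| and |q_dot| are below tol."""
    t = 0.0
    while t < t_max:
        if max(abs(q), abs(qd)) < tol:
            return t
        u = bang_bang(q, qd)
        q, qd = q + qd * dt, qd + u * dt
        t += dt
    return None


# Start at rest at q = 1. The analytic minimum time is 2 seconds:
# one second at u = -1, then one second at u = +1 along the curve.
t_arrive = time_to_goal(1.0, 0.0)
print(t_arrive)
```

The arrival time from this rollout should land close to 2 seconds, with a small error from the Euler step and the tolerance.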
00:05:26.780 --> 00:05:31.230
So it's a little bit of a
sharper result, possibly,
00:05:31.230 --> 00:05:32.160
than the other one.
00:05:32.160 --> 00:05:38.520
And that final surface was a
curve instead of this line.
00:05:38.520 --> 00:05:43.470
And for that, we got to
have good performance
00:05:43.470 --> 00:05:45.150
with bounded torques.
00:05:48.810 --> 00:05:56.640
Now, we also did the
first of two ways
00:05:56.640 --> 00:05:59.610
that we're going to use
to sort of analytically
00:05:59.610 --> 00:06:01.230
investigate optimality.
00:06:07.005 --> 00:06:08.255
AUDIENCE: Can I interrupt you?
00:06:08.255 --> 00:06:10.410
RUSS TEDRAKE: Anytime, yeah.
00:06:10.410 --> 00:06:13.980
AUDIENCE: Was there a good
reason we just-- basically said
00:06:13.980 --> 00:06:17.544
we want to do linear
feedback there?
00:06:17.544 --> 00:06:20.730
Could we have done
like x1 times x2?
00:06:20.730 --> 00:06:22.440
RUSS TEDRAKE: Good, yeah.
00:06:22.440 --> 00:06:24.490
Because-- well, there's
a lot of good reasons.
00:06:24.490 --> 00:06:28.570
So it's because then the closed
loop dynamics are linear,
00:06:28.570 --> 00:06:32.340
and we can analyze them
in every way, including
00:06:32.340 --> 00:06:34.500
making these plots in
ways that I couldn't have
00:06:34.500 --> 00:06:35.970
done if this was nonlinear.
00:06:39.523 --> 00:06:41.940
Another answer would be that
this is what 90% of the world
00:06:41.940 --> 00:06:45.210
would have done, if
that's satisfying at all.
00:06:45.210 --> 00:06:49.200
I think that's the
dominant way of sort
00:06:49.200 --> 00:06:51.540
of thinking about these things.
00:06:51.540 --> 00:06:55.410
x1 times x2 is comparably
much harder to reason about,
00:06:55.410 --> 00:06:56.163
actually.
00:06:56.163 --> 00:06:57.371
AUDIENCE: I totally get that.
00:06:57.371 --> 00:07:00.090
But is there like a system
that the optimal control that
00:07:00.090 --> 00:07:04.095
lies in the space that
you have to take into
00:07:04.095 --> 00:07:06.398
account these different
approximations.
00:07:06.398 --> 00:07:07.190
RUSS TEDRAKE: Good.
00:07:07.190 --> 00:07:12.500
So this is an example of
a nonlinear controller.
00:07:12.500 --> 00:07:14.840
It happens that the
actual control action
00:07:14.840 --> 00:07:17.810
is either 1 or negative 1.
00:07:17.810 --> 00:07:21.500
But the decision plane
is very nonlinear.
00:07:21.500 --> 00:07:25.620
So that's absolutely a
nonlinear controller.
00:07:25.620 --> 00:07:26.660
It came out of linear--
00:07:26.660 --> 00:07:29.450
out of optimal control
on a linear system.
00:07:29.450 --> 00:07:32.040
But the result is a
nonlinear controller.
00:07:32.040 --> 00:07:32.540
OK.
00:07:35.130 --> 00:07:37.048
Now, certain classes of
nonlinear controllers
00:07:37.048 --> 00:07:39.090
are going to pop out and
be easier to think about
00:07:39.090 --> 00:07:41.610
than the broad class.
00:07:41.610 --> 00:07:44.910
But we're going to see lots of
instances as quickly as we can.
00:07:50.020 --> 00:07:50.520
OK.
00:07:50.520 --> 00:07:55.740
So we did-- we actually got that
curve by thinking just about--
00:07:55.740 --> 00:08:00.155
just using our intuition to
reason about bang-bang control.
00:08:00.155 --> 00:08:01.530
At the end, I
started to show you
00:08:01.530 --> 00:08:06.360
that the same thing comes out of
what I call solution technique
00:08:06.360 --> 00:08:09.360
1 here.
00:08:13.830 --> 00:08:16.590
I wouldn't call it that
outside of the room.
00:08:16.590 --> 00:08:19.770
That's just me being
clear here, which
00:08:19.770 --> 00:08:23.388
was based on Pontryagin's
minimum principle.
00:08:37.320 --> 00:08:41.674
Which in this case, is
nothing more than just--
00:08:41.674 --> 00:08:43.049
let's write it
down, exactly what
00:08:43.049 --> 00:08:44.299
we mean by this cost function.
00:08:47.240 --> 00:08:50.370
We have some-- let me be
a little bit more loose.
00:08:50.370 --> 00:08:54.000
We have J, some cost
function we want to optimize,
00:08:54.000 --> 00:09:00.780
which is a finite
time integral of 1 dt.
00:09:00.780 --> 00:09:06.180
That sounds ridiculous, but
we're just optimizing time.
00:09:06.180 --> 00:09:11.010
But we want to optimize that
subject to the constraints
00:09:11.010 --> 00:09:17.520
that x dot equals f of
x u, which in this case
00:09:17.520 --> 00:09:28.580
is our linear system;
and the constraint that u
00:09:28.580 --> 00:09:38.240
was in negative 1, 1 in that regime;
and the constraint that at time
00:09:38.240 --> 00:09:40.880
T, x of T had better
be at the origin.
00:09:44.150 --> 00:09:45.950
Given those
constraints, we can say
00:09:45.950 --> 00:09:54.275
let's minimize T. We're going
to minimize that J, sorry.
00:09:54.275 --> 00:09:55.940
I already got the
t in there, so.
00:09:55.940 --> 00:10:04.490
Minimize with respect to
the trajectory in x, u,
00:10:04.490 --> 00:10:06.140
that cost function.
00:10:06.140 --> 00:10:11.060
I use this overbar to
denote the entire time
00:10:11.060 --> 00:10:18.680
history of a variable like x
t1 to t final, or something
00:10:18.680 --> 00:10:20.090
like this-- time t0 to t final.
00:10:24.700 --> 00:10:25.270
OK.
00:10:25.270 --> 00:10:26.645
That's how we set
up the problem.
00:10:26.645 --> 00:10:28.680
It's just optimizing
some function
00:10:28.680 --> 00:10:33.040
but subject to a
handful of constraints.
00:10:33.040 --> 00:10:36.250
Pontryagin's minimum
principle is nothing more
00:10:36.250 --> 00:10:39.340
than putting Lagrange
multipliers to work
00:10:39.340 --> 00:10:41.800
to turn that
constrained optimization
00:10:41.800 --> 00:10:46.180
into unconstrained optimization.
00:10:46.180 --> 00:10:58.270
And for this problem, we can
build our augmented system
00:10:58.270 --> 00:11:03.540
I'll call J prime here,
which just is the same thing
00:11:03.540 --> 00:11:05.320
but taking in the constraints.
00:11:05.320 --> 00:11:08.670
So first of all, we've got a
constraint on x T equaling 0.
00:11:08.670 --> 00:11:11.100
So I can put that in as
a Lagrange multiplier,
00:11:11.100 --> 00:11:14.370
let's say lambda times something
that better equal 0, which
00:11:14.370 --> 00:11:17.940
in this case was
just x of T. And then
00:11:17.940 --> 00:11:26.760
plus the integral from 0 to T of 1 plus the
constraint on the dynamics,
00:11:26.760 --> 00:11:29.940
which I'll call it a different
Lagrange multiplier p,
00:11:29.940 --> 00:11:37.517
times f of x, u minus x
dot, this whole thing dt.
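Written out, the augmented cost described here looks roughly like the following. This is a reconstruction from the spoken description, with lambda multiplying the terminal constraint and p multiplying the dynamics constraint; the transposes are an assumption for vector-valued x.

```latex
J' = \lambda^{T} x(T)
   + \int_{0}^{T} \Bigl[\, 1 + p^{T}(t)\,\bigl( f(x,u) - \dot{x} \bigr) \Bigr]\, dt
```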
00:11:42.190 --> 00:11:42.760
Yes?
00:11:42.760 --> 00:11:43.760
AUDIENCE: How do you
impose the constraint
00:11:43.760 --> 00:11:44.680
that u is [INAUDIBLE]?
00:11:44.680 --> 00:11:45.597
RUSS TEDRAKE: Awesome.
00:11:45.597 --> 00:11:46.910
Good question.
00:11:46.910 --> 00:11:51.020
So it turns out what
we're going to look at--
00:11:51.020 --> 00:11:54.160
we want to verify that
this thing is optimal.
00:11:54.160 --> 00:11:56.620
So you might want to put that
constraint right in here.
00:11:56.620 --> 00:11:59.870
But it actually
is more natural--
00:11:59.870 --> 00:12:02.570
here, let me finish
my statement here.
00:12:02.570 --> 00:12:06.920
The way we're going to verify
optimality of this policy
00:12:06.920 --> 00:12:12.380
is by verifying that we're at
a local minimum in J prime.
00:12:12.380 --> 00:12:17.090
I want to say that if I
change x, if I change u,
00:12:17.090 --> 00:12:24.757
if I change p in any admissible
way, then J prime is not going to change.
00:12:24.757 --> 00:12:26.840
Small changes in here are not
going to change this.
00:12:26.840 --> 00:12:28.215
I'm at a local
minimum in J prime.
00:12:31.195 --> 00:12:34.398
That's the minimum
principle idea, right?
00:12:34.398 --> 00:12:36.440
I just want my-- if I'm
at a minimum of a function,
00:12:36.440 --> 00:12:37.820
the gradient is 0.
00:12:37.820 --> 00:12:40.220
In the Lagrange
multiplier, the minimum
00:12:40.220 --> 00:12:43.430
of this augmented function,
the gradient had to be 0.
00:12:43.430 --> 00:12:46.400
So if I change any of
these, I want that to be--
00:12:46.400 --> 00:12:50.420
that change to be 0.
00:12:50.420 --> 00:12:53.660
So it turns out that
the more natural way
00:12:53.660 --> 00:12:59.352
to look at this
bound in u is by not
00:12:59.352 --> 00:13:01.810
changing-- not allowing u to
change outside of that regime.
00:13:07.610 --> 00:13:09.960
This is actually
fairly procedural.
00:13:09.960 --> 00:13:12.290
So you end up
doing this calculus
00:13:12.290 --> 00:13:15.530
of variations on J prime.
00:13:15.530 --> 00:13:17.998
But I actually-- I made
a call earlier today.
00:13:17.998 --> 00:13:20.540
I think it's going to-- if I do
it right now in the beginning
00:13:20.540 --> 00:13:22.040
in class, I'm going
to lose you to--
00:13:22.040 --> 00:13:25.280
I mean, I'm going to
bore you and lose you.
00:13:25.280 --> 00:13:28.160
But it's in the
notes, and it's clean.
00:13:28.160 --> 00:13:30.380
So I'm going to
leave that hanging
00:13:30.380 --> 00:13:32.630
and let you look
at it in the notes
00:13:32.630 --> 00:13:36.770
without typos that I might
put up on the board, OK?
00:13:36.770 --> 00:13:40.340
Because I want to move on
to the dynamic programming
00:13:40.340 --> 00:13:43.250
view of the world, sort of
the other possible solution
00:13:43.250 --> 00:13:44.134
technique.
00:13:50.070 --> 00:13:51.500
OK.
00:13:51.500 --> 00:13:56.330
So today, we're going to do--
00:13:56.330 --> 00:13:58.580
you can think of it as just
solution technique 2 here.
00:14:04.550 --> 00:14:08.465
And it's based on
dynamic programming.
00:14:20.597 --> 00:14:22.430
Now, the computer
scientists in the audience
00:14:22.430 --> 00:14:24.742
say, I know dynamic programming.
00:14:24.742 --> 00:14:27.200
It's how I find the shortest
path between point A and point
00:14:27.200 --> 00:14:30.200
B without reusing memory,
and things like that.
00:14:30.200 --> 00:14:31.740
And you're exactly right.
00:14:31.740 --> 00:14:33.225
That's exactly what it is.
00:14:33.225 --> 00:14:34.850
It happens that the
dynamic programming
00:14:34.850 --> 00:14:40.580
has a slightly bigger
footprint in the world.
00:14:40.580 --> 00:14:43.340
There's a continuous form
of dynamic programming.
00:14:43.340 --> 00:14:44.470
OK.
00:14:44.470 --> 00:14:46.490
So a graph search is
a very discrete form
00:14:46.490 --> 00:14:48.110
of dynamic programming.
00:14:48.110 --> 00:14:49.610
So I'm going to
start with sort of--
00:14:49.610 --> 00:14:52.152
I'm actually going to work from
the graph search sort of view
00:14:52.152 --> 00:14:54.350
of the world, but to make
the continuous form that
00:14:54.350 --> 00:14:58.490
works for these continuous
dynamical systems.
00:14:58.490 --> 00:15:03.530
And we're going to use this to
investigate a different cost
00:15:03.530 --> 00:15:30.620
function, which is just this--
00:15:30.620 --> 00:15:42.860
still subject to the
dynamics, which in this case
00:15:42.860 --> 00:15:44.495
was the linear dynamics.
00:15:51.420 --> 00:15:51.920
OK.
00:15:56.900 --> 00:15:59.858
So before we worry
about solving it,
00:15:59.858 --> 00:16:02.150
let's take a minute to decide
if it's a reasonable cost
00:16:02.150 --> 00:16:02.650
function.
00:16:07.040 --> 00:16:10.380
It's different in
a couple of ways.
00:16:10.380 --> 00:16:14.000
First of all, there's
no hard limit on u.
00:16:14.000 --> 00:16:17.450
But I do penalize for
u being away from 0.
00:16:20.930 --> 00:16:25.760
So it's sort of a softer
penalty on u, not a hard limit.
00:16:25.760 --> 00:16:28.557
And then these terms
are penalizing it
00:16:28.557 --> 00:16:30.890
from being-- the system from
being away from the origin.
00:16:34.160 --> 00:16:36.140
And instead of going
for some finite time
00:16:36.140 --> 00:16:42.180
and minimizing time, I'm going
to go for an infinite horizon.
00:16:42.180 --> 00:16:47.270
So the only way to drive this
thing, the only way, actually,
00:16:47.270 --> 00:16:53.510
for J to be a finite cost
over this infinite integral,
00:16:53.510 --> 00:16:59.862
is if q and q dot get to
0, and you do u of 0 at 0.
00:16:59.862 --> 00:17:01.570
Otherwise, this thing's
going to blow up.
00:17:01.570 --> 00:17:03.690
It's going to be an
infinite integral.
00:17:03.690 --> 00:17:05.569
So the solution
had better result
00:17:05.569 --> 00:17:08.630
in us getting to the
origin, it turns out.
00:17:08.630 --> 00:17:11.480
But I'm not trying to
explicitly minimize the time.
00:17:11.480 --> 00:17:13.160
I'm just penalizing
it for being away,
00:17:13.160 --> 00:17:16.910
and I'm penalizing
it for taking action.
00:17:16.910 --> 00:17:23.900
Now, what's the name of
this type of control?
00:17:23.900 --> 00:17:26.180
Who knows it?
00:17:26.180 --> 00:17:27.650
I think-- yeah, LQR, right?
00:17:27.650 --> 00:17:29.872
So this is a Linear
Quadratic Regulator.
00:17:40.970 --> 00:17:42.410
OK.
00:17:42.410 --> 00:17:44.210
It's a staple of--
00:17:44.210 --> 00:17:49.040
it's sort of the best, most used
result from optimal control.
00:17:49.040 --> 00:17:52.873
Everybody opens up
Matlab and calls lqr.
00:17:52.873 --> 00:17:54.290
But you're going
to understand it.
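For a sense of what that lqr call returns, here is a minimal sketch in Python with NumPy. It is not the Matlab routine; it solves the Riccati equation through the stable eigenvectors of the Hamiltonian matrix, which is one standard method. The weights Q equals identity and R equals 1 are assumptions, since the cost on the board is not in the transcript, and `lqr_gain` is a hypothetical name.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """Continuous-time LQR for cost integral of x'Qx + u'Ru, returning (K, S) with u = -K x."""
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    # Hamiltonian matrix of the state/costate system.
    H = np.block([[A, -B @ R_inv @ B.T],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]   # the n eigenvectors with negative real part
    X1, X2 = stable[:n, :], stable[n:, :]
    S = np.real(X2 @ np.linalg.inv(X1))     # stabilizing Riccati solution
    K = R_inv @ B.T @ S
    return K, S

# Double integrator q_ddot = u with assumed weights Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, S = lqr_gain(A, B, np.eye(2), np.array([[1.0]]))
print(K)
```

With these assumed weights the gain works out by hand to K equals 1, root 3, so the result is again a linear feedback law u equals negative Kx, just with gains chosen by the cost function instead of by hand.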
00:17:59.380 --> 00:17:59.880
Good.
00:17:59.880 --> 00:18:05.310
But to do LQR, to understand
how that derivation works,
00:18:05.310 --> 00:18:07.977
we've got to do-- we're going to
go through dynamic programming.
00:18:11.386 --> 00:18:14.795
AUDIENCE: Couldn't we use
the same cost function
00:18:14.795 --> 00:18:15.378
there as well?
00:18:15.378 --> 00:18:16.295
RUSS TEDRAKE: Awesome.
00:18:16.295 --> 00:18:16.920
OK.
00:18:16.920 --> 00:18:19.170
So why don't I put that cost
function down and just do
00:18:19.170 --> 00:18:22.220
Pontryagin's minimum principle?
00:18:22.220 --> 00:18:24.940
There's only one sort
of subtle reason,
00:18:24.940 --> 00:18:27.930
which is that that's an
infinite horizon cost.
00:18:31.337 --> 00:18:32.920
So I was going to
say this at the end,
00:18:32.920 --> 00:18:34.890
but let's have this
discussion now.
00:18:34.890 --> 00:18:37.470
So this is an infinite horizon.
00:18:37.470 --> 00:18:42.360
Pontryagin's is used to
verify the optimality
00:18:42.360 --> 00:18:44.580
of some finite integral.
00:18:48.000 --> 00:18:53.520
So let's compare-- well,
I know you know value--
00:18:53.520 --> 00:18:54.850
the dynamic programming.
00:18:54.850 --> 00:18:56.400
So maybe let me say what
dynamic programming is,
00:18:56.400 --> 00:18:57.510
and then I'll contrast them.
00:18:57.510 --> 00:18:58.010
Yeah.
00:19:07.070 --> 00:19:10.213
But the people sort of--
00:19:10.213 --> 00:19:11.880
I just want to
understand what happened.
00:19:11.880 --> 00:19:14.000
We got two different
cost functions,
00:19:14.000 --> 00:19:15.898
two different solution
techniques for now.
00:19:15.898 --> 00:19:17.690
And we're going to
address in a few minutes
00:19:17.690 --> 00:19:20.330
why I did different
solution techniques
00:19:20.330 --> 00:19:21.840
for the different
cost functions.
00:19:21.840 --> 00:19:24.260
But I hope they both seem
like sort of reasonable cost
00:19:24.260 --> 00:19:28.220
functions if I want to get
my system to the origin.
00:19:28.220 --> 00:19:30.470
Different-- we're going to
look at what the result is,
00:19:30.470 --> 00:19:31.430
the different results.
00:19:31.430 --> 00:19:33.430
And actually, something
I want to leave you with
00:19:33.430 --> 00:19:36.733
is that you can, in fact, do
lots of different combinations
00:19:36.733 --> 00:19:37.400
of these things.
00:19:37.400 --> 00:19:42.761
You could do quadratic costs and
try to have some minimum time.
00:19:42.761 --> 00:19:46.740
There's lots and lots of ways to
formulate these cost functions.
00:19:46.740 --> 00:19:51.950
These are two sort
of examples, but you
00:19:51.950 --> 00:19:55.480
can do minimum time LQR,
you can do all these things.
00:19:55.480 --> 00:19:56.078
OK.
00:19:56.078 --> 00:19:57.620
But the way
we're going to derive
00:19:57.620 --> 00:20:00.380
the LQR controller
is by thinking
00:20:00.380 --> 00:20:02.060
about dynamic programming.
00:20:02.060 --> 00:20:04.370
And to do that, let me start
with the discrete world,
00:20:04.370 --> 00:20:05.870
where people-- where
it makes sense.
00:20:08.772 --> 00:20:10.730
So let's imagine I have
a discrete time system.
00:20:23.120 --> 00:20:30.420
So x of n plus 1
is f of x n u n.
00:20:37.626 --> 00:20:39.125
And I have some cost function.
00:20:42.180 --> 00:20:44.718
Now remember, in the
Pontryagin minimum principle,
00:20:44.718 --> 00:20:46.760
which shows that there's
a sort of a general form
00:20:46.760 --> 00:20:48.320
that a lot of these
cost functions
00:20:48.320 --> 00:20:56.270
take in the discrete form,
it's h of x at capital
00:20:56.270 --> 00:21:03.660
N plus a sum instead
of an integral of n
00:21:03.660 --> 00:21:12.220
equals 0 to N minus
1 g of x n u n.
00:21:21.270 --> 00:21:21.770
OK.
00:21:24.940 --> 00:21:33.760
Now, again, I said this sort of
additive form of cost functions
00:21:33.760 --> 00:21:34.745
is pretty common.
00:21:34.745 --> 00:21:37.120
And you're going to see right
now one of the reasons why.
00:21:37.120 --> 00:21:40.150
The great thing about
having these costs that
00:21:40.150 --> 00:21:43.120
accumulate additively
over the trajectory
00:21:43.120 --> 00:21:49.600
is that I can make a recursive
form of this equation.
00:21:49.600 --> 00:21:52.080
So in particular, if I--
00:21:52.080 --> 00:21:54.250
so I should call
this, really, what
00:21:54.250 --> 00:22:01.390
I've been calling J, that's
really the J of being at x 0
00:22:01.390 --> 00:22:02.380
at time 0.
00:22:06.850 --> 00:22:12.490
And I can compute J of
being at x 0 at time 0
00:22:12.490 --> 00:22:16.990
and incurring the rest
of the cost recursively
00:22:16.990 --> 00:22:20.620
by looking at what it would
be like to be at some state
00:22:20.620 --> 00:22:21.610
x at time N--
00:22:26.340 --> 00:22:32.267
and that in this case
is just h of x of N--
00:22:32.267 --> 00:22:34.350
and then thinking about
what it would be like at--
00:22:34.350 --> 00:22:38.070
to be at some J of x N minus 1--
00:22:42.030 --> 00:22:51.210
and that's going to be g of
x N minus 1, u of N minus 1,
00:22:51.210 --> 00:22:53.820
plus h of x N.
00:23:03.900 --> 00:23:06.000
Let me be even more careful.
00:23:06.000 --> 00:23:07.980
And I'm going to
say, let's evaluate
00:23:07.980 --> 00:23:17.022
the cost of running
a particular policy,
00:23:17.022 --> 00:23:21.976
u n is just some pi of x n.
00:23:21.976 --> 00:23:24.416
AUDIENCE: Sorry, why
is the first x a 0,
00:23:24.416 --> 00:23:26.856
and then the rest of
the x's [INAUDIBLE]??
00:23:30.512 --> 00:23:31.220
RUSS TEDRAKE: OK.
00:23:31.220 --> 00:23:34.100
So why did I put x 0 here?
00:23:34.100 --> 00:23:35.220
That was intentional.
00:23:35.220 --> 00:23:38.512
I'm trying to make x 0 the
variable that fits in here.
00:23:38.512 --> 00:23:40.220
Here x is the variable
that fits in here.
00:23:40.220 --> 00:23:42.220
But you're right, I could
be a little bit more--
00:23:42.220 --> 00:23:44.110
I should be more careful.
00:23:44.110 --> 00:23:48.830
So now J, a function of
this variable x at time N
00:23:48.830 --> 00:23:50.390
should really just be h of x.
00:23:50.390 --> 00:23:51.910
Yeah, good.
00:23:51.910 --> 00:23:54.320
So then this is--
00:23:54.320 --> 00:23:56.580
I could say it this way.
00:23:56.580 --> 00:24:01.400
The other way I could say
it is x N minus 1 equals x.
00:24:01.400 --> 00:24:03.320
Maybe that's the best
way to rectify it.
00:24:11.960 --> 00:24:13.030
OK.
00:24:13.030 --> 00:24:20.410
And when I'm evaluating the
cost of a particular policy,
00:24:20.410 --> 00:24:27.220
I'm going to use the
notation J pi here,
00:24:27.220 --> 00:24:30.190
say this is the cost I
should expect to receive
00:24:30.190 --> 00:24:33.370
given I'm in some state x.
00:24:33.370 --> 00:24:36.162
To make it even more
satisfying, let's just
00:24:36.162 --> 00:24:37.120
be the same everywhere.
00:24:37.120 --> 00:24:43.675
This is x 0, and here
I'll say x 0 equals my x.
00:24:46.360 --> 00:24:51.340
If I'm in some state x at
time 0 executing policy pi,
00:24:51.340 --> 00:24:53.770
I'm going to incur this cost.
00:24:53.770 --> 00:24:59.770
If I'm at some state x at
time N incurring this--
00:24:59.770 --> 00:25:03.970
taking this policy,
I'm going to get this.
00:25:03.970 --> 00:25:06.040
Here I'm going to get this.
00:25:06.040 --> 00:25:10.420
And even when I'm
executing policy pi,
00:25:10.420 --> 00:25:13.780
I can even furthermore
say that x n
00:25:13.780 --> 00:25:21.850
is f of x n minus 1
pi of x n minus 1.
00:25:24.980 --> 00:25:26.980
It's probably impossible
to read in that corner.
00:25:35.570 --> 00:25:36.070
OK.
00:25:44.045 --> 00:25:45.670
So you can see where
I'm going with it.
00:25:50.740 --> 00:26:00.640
It's pretty easy to see
that J pi of x at some N
00:26:00.640 --> 00:26:07.330
is just the one-step
cost g of x n u
00:26:07.330 --> 00:26:24.980
n plus the cost I expect to see
given x n plus 1 at time
00:26:24.980 --> 00:26:25.520
n plus 1.
00:26:41.730 --> 00:26:42.230
OK.
00:26:48.630 --> 00:26:54.570
So the reason we like these
integral costs or the sum
00:26:54.570 --> 00:26:58.560
of costs in the
discrete time case
00:26:58.560 --> 00:27:03.780
is because I can do these
recursive computations.
00:27:03.780 --> 00:27:06.150
And the same thing
is true if I look at--
00:27:06.150 --> 00:27:09.420
if I define what
the optimal cost is.
00:27:09.420 --> 00:27:15.720
So let's now define J
star to be the cost I
00:27:15.720 --> 00:27:34.430
incur if I follow the optimal
policy, which is pi star.
00:27:40.330 --> 00:27:43.060
Well, it turns out
the same thing works.
00:27:56.440 --> 00:28:00.920
But now, there's
an extra term here.
00:28:31.870 --> 00:28:32.370
OK.
00:28:43.150 --> 00:28:48.250
So it's easy to see that
the cost of following
00:28:48.250 --> 00:28:52.750
a particular policy
is recursive.
00:28:52.750 --> 00:28:55.780
It's more surprising
that the cost
00:28:55.780 --> 00:28:58.930
to go of the optimal
policy is equally
00:28:58.930 --> 00:29:02.080
recursive with a simple
form like this, min over u.
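The two recursions being compared on the board are not spoken aloud, so here is a reconstruction from the surrounding definitions: f is the discrete dynamics, g the one-step cost, h the final cost, and N the final step. The second line is the one with the extra min over u.

```latex
\begin{align}
J^{\pi}(x, n) &= g\bigl(x, \pi(x)\bigr) + J^{\pi}\bigl(f(x, \pi(x)),\, n+1\bigr),
  & J^{\pi}(x, N) &= h(x) \\
J^{*}(x, n) &= \min_{u} \Bigl[\, g(x, u) + J^{*}\bigl(f(x, u),\, n+1\bigr) \Bigr],
  & J^{*}(x, N) &= h(x)
\end{align}
```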
00:29:05.500 --> 00:29:07.480
And this actually follows
from something called
00:29:07.480 --> 00:29:09.234
the principle of optimality.
00:29:12.020 --> 00:29:14.500
Anybody see the principle
of optimality before?
00:29:14.500 --> 00:29:16.700
OK.
00:29:16.700 --> 00:29:22.580
It says that if I want to be
optimal over some trajectory,
00:29:22.580 --> 00:29:25.397
I'd better be optimal over--
00:29:25.397 --> 00:29:26.730
from the end of that trajectory.
00:29:26.730 --> 00:29:32.588
So if I want to be
optimal for the last--
00:29:32.588 --> 00:29:35.720
it's from n minus
2 to the end, then
00:29:35.720 --> 00:29:38.930
I'd better be optimal
from n minus 1 to the end.
00:29:38.930 --> 00:29:43.340
So it turns out if I act
optimally in one step
00:29:43.340 --> 00:29:45.500
by doing this min over
u, and then follow
00:29:45.500 --> 00:29:50.840
the policy of acting optimally
for the rest of time, then
00:29:50.840 --> 00:29:55.310
that's optimal for the
entire function, OK?
00:30:17.290 --> 00:30:17.790
OK.
00:30:21.490 --> 00:30:24.520
OK, good.
00:30:24.520 --> 00:30:27.400
So we've got a recursive form
of this cost-to-go function
00:30:27.400 --> 00:30:32.920
that we exploited with
the additive thing,
00:30:32.920 --> 00:30:34.450
the additive form.
00:30:34.450 --> 00:30:41.170
And now, the optimal
policy comes straight out.
00:30:52.540 --> 00:31:03.360
The best thing to do, if
you're in state x at time n,
00:31:03.360 --> 00:31:18.510
is just the arg min over
u of g of x, u plus J star of
00:31:18.510 --> 00:31:30.270
x n plus 1, n plus 1, with that
same x n plus 1 defined by--
00:31:44.090 --> 00:31:46.430
So in discrete time,
optimal control is trivial.
00:31:46.430 --> 00:31:49.040
If you have an additive cost
function, all you have to do
00:31:49.040 --> 00:31:52.430
is figure out what your
cost is at the end,
00:31:52.430 --> 00:31:56.420
and then go back one step,
do the thing that acts--
00:31:56.420 --> 00:31:59.060
that in one step
minimizes the cost
00:31:59.060 --> 00:32:02.130
and gets me to the lowest
possible cost in the future.
00:32:02.130 --> 00:32:04.130
And if I just do that
recursively backwards,
00:32:04.130 --> 00:32:05.750
I come up with
the optimal policy
00:32:05.750 --> 00:32:09.860
that gets me from any x
in n steps to the end.
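That backward sweep is short enough to write down. The following is a minimal sketch on a made-up problem: integer states from negative 3 to 3, actions negative 1, 0, 1, and quadratic costs. The dynamics f, costs g and h, and the name `backward_dp` are all hypothetical choices for illustration, not something from the lecture.

```python
def backward_dp(states, actions, f, g, h, N):
    """Finite-horizon dynamic programming: returns cost-to-go J[n][x] and policy pi[n][x]."""
    J = [dict() for _ in range(N + 1)]
    pi = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = h(x)                      # cost at the end
    for n in range(N - 1, -1, -1):          # step backwards in time
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in actions:
                # One-step cost plus optimal cost-to-go from the next state.
                cost = g(x, u) + J[n + 1][f(x, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[n][x] = best_cost
            pi[n][x] = best_u
    return J, pi


states = range(-3, 4)
actions = (-1, 0, 1)
f = lambda x, u: max(-3, min(3, x + u))     # move by u, clipped to the grid
g = lambda x, u: x * x + u * u              # penalize distance and effort
h = lambda x: x * x                         # final cost
J, pi = backward_dp(states, actions, f, g, h, N=5)
print(J[0][3], pi[0][3])
```

Starting from x equals 3 with five steps available, the best plan is to step toward the origin three times and then sit still, which costs 10 plus 5 plus 2, or 17.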
00:32:18.590 --> 00:32:21.310
Does that make sense?
00:32:21.310 --> 00:32:22.460
Ask questions.
00:32:31.100 --> 00:32:31.950
Do people buy that?
00:32:31.950 --> 00:32:34.638
Is that obvious, or does
that need more explanation?
00:32:46.600 --> 00:32:47.900
OK.
00:32:47.900 --> 00:32:49.670
Ask questions if you have them.
00:32:49.670 --> 00:32:50.170
All right.
00:32:50.170 --> 00:32:58.720
So we're going to use the
discrete time form again
00:32:58.720 --> 00:33:00.130
when we get to the algorithms.
00:33:00.130 --> 00:33:02.290
But I'm trying to use
it today to leapfrog
00:33:02.290 --> 00:33:07.030
into the continuous time
conditions for optimality.
00:33:10.330 --> 00:33:13.180
So what happens if we now do
the same sort of discrete time
00:33:13.180 --> 00:33:17.650
thinking, but do it in the limit
where the time between steps
00:33:17.650 --> 00:33:18.970
goes to 0?
00:33:22.270 --> 00:33:24.760
So let me try to do the
limiting argument to get us back
00:33:24.760 --> 00:33:25.600
to continuous time.
00:33:47.512 --> 00:33:48.050
OK.
00:33:48.050 --> 00:33:56.860
Now we've got our cost function,
again, is h of x at capital T
00:33:56.860 --> 00:34:04.575
plus the integral from
0 to T of g x, u dt.
00:34:12.199 --> 00:34:14.840
The analogous statement
from this recursion
00:34:14.840 --> 00:34:23.510
in the discrete time
is that J x at t
00:34:23.510 --> 00:34:28.520
is going to be a
limiting argument as dt
00:34:28.520 --> 00:34:45.679
goes to 0 of the min over
u of g x, u dt plus J
00:34:45.679 --> 00:34:51.590
x of t plus dt, t plus dt.
00:35:03.240 --> 00:35:04.560
OK.
00:35:04.560 --> 00:35:08.220
This is now-- that's
just a limiting argument
00:35:08.220 --> 00:35:14.970
as dt goes to 0 of the
same recursive statement.
00:35:14.970 --> 00:35:27.270
I'm going to approximate
J x of t plus dt as--
00:35:27.270 --> 00:35:30.870
this is J star let me
not forget my stars--
00:35:30.870 --> 00:35:41.850
as J star at x t plus
partial J star partial x
00:35:41.850 --> 00:35:50.462
x dot dt plus partial
J star partial t dt.
00:35:56.240 --> 00:35:59.180
It's a Taylor
expansion of that term.
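In symbols, the first-order expansion just described reads as follows, with the stars included. This is a transcription of the spoken expression, not additional material.

```latex
J^{*}\bigl(x(t+dt),\, t+dt\bigr) \approx
  J^{*}(x, t)
  + \frac{\partial J^{*}}{\partial x}\, \dot{x}\, dt
  + \frac{\partial J^{*}}{\partial t}\, dt
```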
00:36:18.240 --> 00:36:19.590
OK.
00:36:19.590 --> 00:36:26.100
If I insert that back in,
then I have J star x of t
00:36:26.100 --> 00:36:36.375
equals the limit as dt
goes to 0 min over u g
00:36:36.375 --> 00:36:46.350
x, u dt plus partial
J star partial x--
00:36:46.350 --> 00:36:49.770
x dot is just f of
x, u, remember--
00:36:49.770 --> 00:36:59.785
dt plus partial J partial t dt.
00:36:59.785 --> 00:37:03.400
And I left off that J x
there, because that actually
00:37:03.400 --> 00:37:04.330
doesn't depend on u.
00:37:04.330 --> 00:37:09.730
So I'm going to put that
outside here, plus J x and t.
00:37:22.770 --> 00:37:23.760
Those guys cancel.
00:37:26.650 --> 00:37:29.970
And now I've got
a dt everywhere.
00:37:29.970 --> 00:37:33.420
So I can actually take that
out, and my limiting argument
00:37:33.420 --> 00:37:35.850
goes away.
00:37:35.850 --> 00:37:39.615
And what I'm left
with, 0 equals min
00:37:39.615 --> 00:37:48.540
over u g of x, u plus
partial J partial x star
00:37:48.540 --> 00:37:50.790
plus partial J partial t.
00:37:55.080 --> 00:37:58.470
This is a very famous
equation, will be used a lot.
00:38:13.255 --> 00:38:14.880
It's called the
Hamilton-Jacobi-Bellman
00:38:14.880 --> 00:38:15.653
equation.
00:38:15.653 --> 00:38:16.278
AUDIENCE: Russ.
00:38:16.278 --> 00:38:17.028
RUSS TEDRAKE: Yes?
00:38:17.028 --> 00:38:17.550
Did I miss--
00:38:17.550 --> 00:38:20.250
AUDIENCE: x dot in
the middle term there.
00:38:20.250 --> 00:38:21.486
RUSS TEDRAKE: Here?
00:38:21.486 --> 00:38:23.160
AUDIENCE: Last equation.
00:38:23.160 --> 00:38:24.808
That x dot [INAUDIBLE].
00:38:24.808 --> 00:38:25.850
RUSS TEDRAKE: Oh, thanks.
00:38:25.850 --> 00:38:26.850
Good.
00:38:26.850 --> 00:38:29.700
This is f of x, u.
00:38:29.700 --> 00:38:30.200
Good.
00:38:30.200 --> 00:38:30.700
Thank you.
00:38:42.750 --> 00:38:43.860
Good, thank you.
00:38:43.860 --> 00:38:45.390
That is the
Hamilton-Jacobi-Bellman
00:38:45.390 --> 00:38:48.840
equation, often
known as the HJB.
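Written out, the limiting argument above can be summarized as follows. This is a sketch in LaTeX, using $f(x,u)$ for $\dot{x}$ as in the lecture; the terminal condition $J^*(x,T) = h(x)$ is an addition here, taken from the definition of the cost function.

```latex
\begin{align*}
J^*(x,t) &= \lim_{dt \to 0} \min_u \left[ g(x,u)\,dt + J^*\big(x(t+dt),\, t+dt\big) \right] \\
J^*\big(x(t+dt),\, t+dt\big) &\approx J^*(x,t) + \frac{\partial J^*}{\partial x} f(x,u)\,dt + \frac{\partial J^*}{\partial t}\,dt \\
0 &= \min_u \left[ g(x,u) + \frac{\partial J^*}{\partial x} f(x,u) + \frac{\partial J^*}{\partial t} \right],
\qquad J^*(x,T) = h(x)
\end{align*}
```

The $J^*(x,t)$ terms cancel on both sides, and dividing by $dt$ removes the limit, which leaves the last line.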
00:38:52.200 --> 00:38:57.163
So Hamilton and Jacobi
are really old guys.
00:38:57.163 --> 00:38:58.080
Bellman's a newer guy.
00:38:58.080 --> 00:39:00.010
He was in the '60s or something.
00:39:00.010 --> 00:39:02.700
A lot of people say
Hamilton-Bellman-Jacobi.
00:39:02.700 --> 00:39:04.260
That doesn't seem
quite right to me.
00:39:04.260 --> 00:39:06.780
That's some guy in the
'60s sticking his name
00:39:06.780 --> 00:39:08.100
in between Hamilton and Jacobi.
00:39:08.100 --> 00:39:10.130
So I try to--
00:39:10.130 --> 00:39:13.110
I will probably say HBJ a
couple of times in the class,
00:39:13.110 --> 00:39:15.780
but whenever I'm thinking
about it I say HJB, OK?
00:39:18.640 --> 00:39:19.140
OK.
00:39:19.140 --> 00:39:21.140
So we did a little bit
of work in discrete time.
00:39:21.140 --> 00:39:24.300
But the absolute output
of that thinking,
00:39:24.300 --> 00:39:26.790
the thing you need
to remember, is
00:39:26.790 --> 00:39:33.330
this Hamilton-Jacobi-Bellman
equation, OK?
00:39:33.330 --> 00:39:37.320
These turn out to be the
conditions of optimality
00:39:37.320 --> 00:39:38.350
for continuous time.
00:39:41.410 --> 00:39:42.960
Let's think about what it means.
00:39:48.570 --> 00:39:54.540
So do you have a picture yet
of what J is?
00:39:54.540 --> 00:39:56.010
J is a cost-to-go.
00:39:56.010 --> 00:39:58.170
It's a function over
the entire landscape.
00:39:58.170 --> 00:40:01.110
It tells me if I'm in
some state, how much cost
00:40:01.110 --> 00:40:03.120
am I going to incur
with my cost function
00:40:03.120 --> 00:40:05.910
as it runs off into time.
00:40:05.910 --> 00:40:08.490
In the finite horizon
case, it's just an integral
00:40:08.490 --> 00:40:10.290
to the end of time.
00:40:10.290 --> 00:40:12.000
In the infinite
horizon case, I've
00:40:12.000 --> 00:40:14.417
started this initial condition,
and I run my cost function
00:40:14.417 --> 00:40:16.290
forever.
00:40:16.290 --> 00:40:22.330
So J is a cost landscape,
a cost-to-go landscape.
00:40:22.330 --> 00:40:26.340
This statement here
says that, if I
00:40:26.340 --> 00:40:33.030
move a little bit in that
landscape in x, scaled
00:40:33.030 --> 00:40:35.175
by this x dot, then the
thing I should incur
00:40:35.175 --> 00:40:38.190
is my
instantaneous cost.
00:40:38.190 --> 00:40:38.690
OK.
00:40:42.300 --> 00:40:44.337
The way my cost
landscape-- the difference
00:40:44.337 --> 00:40:46.170
of being in initial
condition 1 versus being
00:40:46.170 --> 00:40:49.740
in initial condition 2,
if they're neighboring,
00:40:49.740 --> 00:40:53.230
goes like the cost function.
00:40:53.230 --> 00:40:57.000
And there's the cost function--
the cost-to-go function lives
00:40:57.000 --> 00:41:00.040
in x, and it lives in time.
00:41:00.040 --> 00:41:00.540
OK.
00:41:04.800 --> 00:41:08.520
It's one of the most important
equations we'll have--
00:41:08.520 --> 00:41:11.470
Hamilton-Bellman-Jacobi
equation.
00:41:11.470 --> 00:41:16.976
AUDIENCE: So we can take out the
partial case that [INAUDIBLE]??
00:41:19.910 --> 00:41:22.270
Because that one is independent
of u, the last term.
00:41:24.990 --> 00:41:30.770
So if we take that
out, basically,
00:41:30.770 --> 00:41:34.340
the difference between the value
to [INAUDIBLE] with respect
00:41:34.340 --> 00:41:38.090
to time, in this time and going
to the next time that sort
00:41:38.090 --> 00:41:41.022
of seems like a
TD error squared--
00:41:41.022 --> 00:41:41.980
RUSS TEDRAKE: Oh, yeah.
00:41:41.980 --> 00:41:42.480
Yeah.
00:41:42.480 --> 00:41:43.670
Good.
00:41:43.670 --> 00:41:46.360
There's absolutely-- this is
exactly the source of the TD
00:41:46.360 --> 00:41:47.690
error and the Bell-- yeah.
00:41:47.690 --> 00:41:49.470
It's exactly the
Bellman equation.
00:41:49.470 --> 00:41:49.970
So yeah.
00:41:49.970 --> 00:41:50.637
So you're right.
00:41:50.637 --> 00:41:55.355
Partial J partial t could have
been outside the min over u.
00:41:55.355 --> 00:41:57.110
It doesn't actually have u.
00:41:57.110 --> 00:42:01.490
But we're going to see
all those connections
00:42:01.490 --> 00:42:03.260
as we get into the algorithms.
00:42:03.260 --> 00:42:07.640
But for-- this now is a tool
for proving analytically
00:42:07.640 --> 00:42:10.370
and deriving analytically
some optimal controllers.
00:42:14.000 --> 00:42:14.990
We need one more--
00:42:17.780 --> 00:42:20.750
we need to say something
stronger about how useful
00:42:20.750 --> 00:42:21.650
that tool is.
00:42:34.290 --> 00:42:45.470
So there's the
sufficiency theorem
00:42:45.470 --> 00:42:48.000
is what gives this
guy teeth, OK?
00:42:48.000 --> 00:42:51.360
So I told you that the
Pontryagin's minimum principle
00:42:51.360 --> 00:42:54.615
was a necessary
condition for optimality.
00:42:54.615 --> 00:42:56.220
It wasn't necessarily
sufficient.
00:42:56.220 --> 00:43:00.630
If you show that the system
satisfies the Pontryagin's
00:43:00.630 --> 00:43:05.383
minimum principle,
then you're close,
00:43:05.383 --> 00:43:07.800
but you actually also have to
say it uniquely solves that,
00:43:07.800 --> 00:43:10.350
it's the only solution to
that, solves the Pontryagin's
00:43:10.350 --> 00:43:11.100
minimum principle.
00:43:11.100 --> 00:43:13.637
So there's extra work needed.
00:43:13.637 --> 00:43:15.720
The theorem we're putting
up here is this saying--
00:43:15.720 --> 00:43:18.402
is going to say that if
this equation is satisfied,
00:43:18.402 --> 00:43:19.860
then that's sufficient
to guarantee
00:43:19.860 --> 00:43:25.690
that the policy is optimal.
00:43:25.690 --> 00:43:27.040
OK.
00:43:27.040 --> 00:43:50.990
So given a policy pi x of t,
and a cost-to-go function,
00:43:50.990 --> 00:44:07.370
J pi x of t, if pi is
the argument of this,
00:44:07.370 --> 00:44:37.535
if pi is the policy which
minimizes that for all x
00:44:37.535 --> 00:45:32.760
and all t, and that condition
is met, then we can--
00:45:32.760 --> 00:45:41.160
that's sufficient to
give that J pi x of t
00:45:41.160 --> 00:45:53.190
equals J star of x of t and
pi x of t equals pi star x of t.
00:46:01.950 --> 00:46:02.450
OK.
00:46:10.227 --> 00:46:12.060
The proof of that I'm
not even going to try.
00:46:12.060 --> 00:46:15.450
It's sort of tedious.
00:46:15.450 --> 00:46:17.970
It's in Bertsekas, if you like--
00:46:17.970 --> 00:46:19.680
Bertsekas' book.
00:46:19.680 --> 00:46:22.260
But we're going
to use this a lot.
00:46:26.160 --> 00:46:31.170
So if I can find some
combination of J pi and pi
00:46:31.170 --> 00:46:33.960
that match that condition, then
I've found an optimal policy.
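For reference, the sufficiency theorem as spoken above can be written compactly. This is a sketch of the statement only, with the notation ($J^\pi$, $\pi^*$, and $f(x,u)$ for $\dot{x}$) assumed from the spoken version; the proof is in Bertsekas' book, as mentioned.

```latex
\begin{gather*}
\text{Suppose that, for all } x \text{ and all } t, \\
0 = \min_u \left[ g(x,u) + \frac{\partial J^\pi}{\partial x} f(x,u) + \frac{\partial J^\pi}{\partial t} \right]
\quad\text{and}\quad
\pi(x,t) = \arg\min_u \left[ g(x,u) + \frac{\partial J^\pi}{\partial x} f(x,u) \right]. \\
\text{Then } J^\pi(x,t) = J^*(x,t) \text{ and } \pi(x,t) = \pi^*(x,t).
\end{gather*}
```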
00:46:40.790 --> 00:46:41.290
OK.
00:46:46.000 --> 00:46:49.300
Let's use this to solve
the problem we want--
00:47:09.900 --> 00:47:12.330
the linear quadratic
regulator in its general form.
00:47:26.580 --> 00:47:30.680
So we've got a system
x dot equals Ax plus Bu.
00:47:49.340 --> 00:47:57.440
And let's say I have a
cost function J of x 0
00:47:57.440 --> 00:48:00.230
is h of x at capital T--
00:48:00.230 --> 00:48:03.570
the same thing I've been
writing all day here--
00:48:03.570 --> 00:48:17.270
plus the integral of g of x, u dt, where x 0
equals x, where h in general
00:48:17.270 --> 00:48:23.060
takes the form x
transpose Qf x, and g
00:48:23.060 --> 00:48:29.360
takes the form x transpose
Qx plus u transpose Ru.
00:48:34.690 --> 00:48:39.550
To make things-- to be careful,
we're going to assume that--
00:48:42.070 --> 00:48:44.980
we're going to enforce-- we're
choosing the cost function.
00:48:44.980 --> 00:48:48.310
We're going to enforce that this
is positive definite, making
00:48:48.310 --> 00:48:51.580
sure we don't get any
negative cost here.
00:48:51.580 --> 00:48:59.770
And similarly-- actually, it
only has to be semi-definite.
00:48:59.770 --> 00:49:03.460
Q transpose equals
Q greater than
00:49:03.460 --> 00:49:07.930
or equal to 0 and R
transpose equals R.
00:49:07.930 --> 00:49:10.450
That one does have
to be positive
00:49:10.450 --> 00:49:17.690
definite.
00:49:17.690 --> 00:49:19.070
OK.
00:49:19.070 --> 00:49:24.530
Here's a pretty general
linear dynamical system,
00:49:24.530 --> 00:49:28.070
quadratic regulator cost.
00:49:28.070 --> 00:49:31.970
To satisfy the
HBJ, we simply have
00:49:31.970 --> 00:49:35.600
to have that this condition--
00:50:00.040 --> 00:50:10.570
so 0 equals min over u x
transpose Qx plus u transpose
00:50:10.570 --> 00:50:24.340
Ru plus partial J star partial
x times Ax plus Bu
00:50:24.340 --> 00:50:31.480
plus partial J star partial
t, that had better equal 0.
00:50:31.480 --> 00:50:35.980
So I need to find that
cost-to-go function which
00:50:35.980 --> 00:50:37.350
makes this thing 0.
00:50:44.394 --> 00:50:49.460
It turns out the
solution to these things,
00:50:49.460 --> 00:50:57.260
we can just guess a form for J.
Let's guess that J star x of t
00:50:57.260 --> 00:51:09.280
is also quadratic,
again with a positive--
00:51:09.280 --> 00:51:11.046
it's going to have
to be positive.
00:51:22.026 --> 00:51:32.090
It could be-- in that
case, partial J partial
00:51:32.090 --> 00:51:39.860
x is 2x transpose S of t.
00:51:43.100 --> 00:51:51.265
Partial J partial t is
x transpose S dot of t x.
00:51:56.540 --> 00:51:57.040
OK.
00:52:00.197 --> 00:52:01.250
Let's pop this guy in.
00:52:33.053 --> 00:52:34.470
I want to just
crank through here.
00:52:34.470 --> 00:52:40.080
So does it make sense at
all, that the J of x, t
00:52:40.080 --> 00:52:42.598
would be a quadratic
form like that?
00:52:42.598 --> 00:52:43.890
Why is that a reasonable guess?
00:52:48.710 --> 00:52:49.760
Yeah.
00:52:49.760 --> 00:52:52.256
AUDIENCE: Because the
final time [INAUDIBLE]
00:52:52.256 --> 00:52:53.612
match the [INAUDIBLE].
00:52:53.612 --> 00:52:54.320
RUSS TEDRAKE: OK.
00:52:54.320 --> 00:52:57.080
So in the final time,
that's a reasonable guess,
00:52:57.080 --> 00:52:58.940
because it started like this.
00:53:03.680 --> 00:53:04.590
Yeah.
00:53:04.590 --> 00:53:05.340
And it turns out--
00:53:05.340 --> 00:53:08.850
I mean, we're actually going
to see it by verification.
00:53:08.850 --> 00:53:13.140
But for the linear system,
when I pump the cost backwards
00:53:13.140 --> 00:53:14.940
in time, this
quadratic cost, it's
00:53:14.940 --> 00:53:16.426
going to have to stay quadratic.
00:53:23.210 --> 00:53:23.720
OK.
00:53:23.720 --> 00:53:31.850
So I've got 0 equals min over u
x transpose Qx plus u transpose
00:53:31.850 --> 00:53:37.700
Ru plus 2x transpose S of t--
00:53:37.700 --> 00:53:51.564
bless you-- times Ax plus
Bu plus x transpose S dot of t x.
00:53:54.902 --> 00:53:56.360
I need that whole
thing to work out
00:53:56.360 --> 00:54:02.100
to be 0 for the minimizing u.
00:54:02.100 --> 00:54:06.380
So let's figure out what
the minimizing u is now.
00:54:06.380 --> 00:54:08.450
Is it OK if I just
sort of shorthand?
00:54:08.450 --> 00:54:11.870
I'll say the gradient
of that whole thing
00:54:11.870 --> 00:54:16.760
in square brackets with respect
to u here is going to be,
00:54:16.760 --> 00:54:19.070
what, 2Ru--
00:54:19.070 --> 00:54:20.876
or u transpose R, I guess?
00:54:26.330 --> 00:54:29.690
We're going to try to be
careful that this whole thing is
00:54:29.690 --> 00:54:31.148
a scalar.
00:54:31.148 --> 00:54:32.815
We're always talking
about scalar costs.
00:54:32.815 --> 00:54:35.240
So I've got vectors and
matrices going around,
00:54:35.240 --> 00:54:38.870
but the whole thing has to
collapse to be a scalar.
00:54:38.870 --> 00:54:44.870
The gradient of a scalar
with respect to a vector,
00:54:44.870 --> 00:54:47.090
I want it to always be a vector.
00:54:47.090 --> 00:54:50.870
The gradient of a vector
with respect to a vector
00:54:50.870 --> 00:54:52.220
is going to be a matrix.
00:54:52.220 --> 00:54:55.550
So try to be careful
about making--
00:54:55.550 --> 00:55:03.020
that gradient better be a
vector plus what's left here?
00:55:03.020 --> 00:55:07.640
2x transpose S that
guy there, right?
00:55:13.880 --> 00:55:17.390
But I have to take
the transpose of that.
00:55:17.390 --> 00:55:23.480
So it's 2B transpose S of t.
00:55:23.480 --> 00:55:27.625
The S t transpose is not x--
00:55:27.625 --> 00:55:29.150
I screwed up, sorry.
00:55:29.150 --> 00:55:30.320
It's still x transpose.
00:55:30.320 --> 00:55:38.570
I'm trying to-- x transpose S
t B. That thing has to equal 0.
00:55:41.360 --> 00:55:44.540
And that's where I
get my transpose back.
00:55:44.540 --> 00:55:50.760
So u star, the u that makes this
gradient 0, is going to be--
00:55:50.760 --> 00:55:53.300
those 2's cancel.
00:55:53.300 --> 00:55:59.750
It's going to be negative
R inverse B transpose
00:55:59.750 --> 00:56:02.720
S transpose x.
00:56:11.070 --> 00:56:18.170
Which is important to
realize that was actually--
00:56:18.170 --> 00:56:22.450
it's equivalent to writing
negative 1/2 R inverse
00:56:22.450 --> 00:56:28.630
B transpose partial J
partial x transpose.
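The minimizing u can be checked numerically. The sketch below is illustrative only: it assumes the double integrator from earlier in the lecture with Q = identity and R = 1, picks an arbitrary symmetric S, and uses made-up helper names (`bracket`, `u_star`). It evaluates the u-dependent part of the HJB bracket and compares the closed-form u star against nearby values of u.

```python
# Double integrator: x = [q, q_dot], x_dot = A x + B u (assumed for illustration).
A = [[0.0, 1.0], [0.0, 0.0]]
B = [0.0, 1.0]            # single input, so B is one column stored as a list
Q = [[1.0, 0.0], [0.0, 1.0]]
R = 1.0                   # scalar because u is scalar


def quad(M, x):
    """Return x^T M x for a 2x2 matrix M."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))


def bracket(u, x, S):
    """x'Qx + u'Ru + 2 x'S (Ax + Bu), the part of the HJB bracket that involves u."""
    x_dot = [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u for i in range(2)]
    Sx = [S[i][0] * x[0] + S[i][1] * x[1] for i in range(2)]  # uses S symmetric
    return quad(Q, x) + R * u * u + 2.0 * (Sx[0] * x_dot[0] + Sx[1] * x_dot[1])


def u_star(x, S):
    """Closed-form minimizer u* = -R^{-1} B^T S x."""
    Sx = [S[i][0] * x[0] + S[i][1] * x[1] for i in range(2)]
    return -(B[0] * Sx[0] + B[1] * Sx[1]) / R


S_example = [[2.0, 0.5], [0.5, 1.0]]   # arbitrary symmetric matrix, not from the lecture
x_example = [1.0, -0.3]
u_opt = u_star(x_example, S_example)

# The bracket is a convex quadratic in u, so any perturbation should cost more.
for du in (-0.1, 0.0, 0.1):
    print(u_opt + du, bracket(u_opt + du, x_example, S_example))
```

Because R is positive definite, the bracket is a convex quadratic in u, so setting the gradient to zero gives the global minimizer rather than just a stationary point.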
00:56:42.050 --> 00:56:43.550
OK.
00:56:43.550 --> 00:56:44.900
So what does this mean?
00:56:47.760 --> 00:56:52.880
So I've got some
quadratic approximation
00:56:52.880 --> 00:56:54.890
of my value function.
00:56:54.890 --> 00:56:57.530
It's 0 at the origin
always and forever.
00:56:57.530 --> 00:57:00.170
If I'm at the origin, I'm
going to stay at the origin,
00:57:00.170 --> 00:57:02.060
my cost-to-go is 0.
00:57:02.060 --> 00:57:06.170
The exact shape of the quadratic
bowl changes over time.
00:57:06.170 --> 00:57:11.150
The best thing to
do is to go down
00:57:11.150 --> 00:57:13.550
to negative of the
partial J partial x
00:57:13.550 --> 00:57:16.790
is trying to go down
the cost-to-go function.
00:57:16.790 --> 00:57:19.250
I want to go down the cost-to-go
function as fast as I can.
00:57:22.040 --> 00:57:24.560
But I'm going to weight--
00:57:24.560 --> 00:57:27.560
I'm going to change,
possibly, the exact direction.
00:57:27.560 --> 00:57:31.040
Rather than going straight down
the cost-to-go function in x,
00:57:31.040 --> 00:57:32.815
I might orient
myself a little bit
00:57:32.815 --> 00:57:34.190
depending on the
weightings I put
00:57:34.190 --> 00:57:37.700
on-- the cost I put on
the different u's. So I'm
00:57:37.700 --> 00:57:41.870
going to rotate that
vector a little bit.
00:57:41.870 --> 00:57:45.770
This is what I can do, and this
is the weighting I've done.
00:57:45.770 --> 00:57:48.575
So the best thing to do is to go
down your cost-to-go function,
00:57:48.575 --> 00:57:50.450
get to the point where
my cost-to-go is going
00:57:50.450 --> 00:57:54.290
to be as small as
possible, filtered
00:57:54.290 --> 00:57:56.810
by the direction
I can actually go
00:57:56.810 --> 00:58:00.650
and twisted by the way
I penalize actions.
00:58:00.650 --> 00:58:01.150
OK.
00:58:04.420 --> 00:58:06.130
And it's sort of
amazing, I think,
00:58:06.130 --> 00:58:13.180
that the whole thing works out
to be just some linear feedback
00:58:13.180 --> 00:58:15.340
law negative Kx--
00:58:15.340 --> 00:58:20.230
yet another reason
[INAUDIBLE] to use that form.
00:58:28.900 --> 00:58:30.250
OK.
00:58:30.250 --> 00:58:31.750
Sorry, I should be
a little careful.
00:58:31.750 --> 00:58:34.210
This is-- it depends on time.
00:58:34.210 --> 00:58:36.077
So it's K of t x.
00:58:39.732 --> 00:58:40.940
Why should it depend on time?
00:58:48.240 --> 00:58:49.745
This is a-- what's that?
00:58:49.745 --> 00:58:50.580
AUDIENCE: We switch.
00:58:50.580 --> 00:58:52.200
RUSS TEDRAKE: Because
we switch what?
00:58:52.200 --> 00:58:53.712
AUDIENCE: The actuation.
00:58:53.712 --> 00:58:56.170
RUSS TEDRAKE: There's no hard
switch in the actuation here.
00:58:56.170 --> 00:58:58.870
This is saying, I'm
going to smoothly go down
00:58:58.870 --> 00:59:01.853
a value function.
00:59:01.853 --> 00:59:03.520
This one isn't the
bang-bang controller.
00:59:03.520 --> 00:59:06.190
This turns out to
be a smooth descent
00:59:06.190 --> 00:59:07.780
of some cost-to-go function.
00:59:10.530 --> 00:59:11.030
Yeah?
00:59:11.030 --> 00:59:15.897
AUDIENCE: The S t equals
partial [INAUDIBLE]..
00:59:15.897 --> 00:59:18.230
RUSS TEDRAKE: I mean, S of t
is time [INAUDIBLE] itself.
00:59:18.230 --> 00:59:20.190
AUDIENCE: Yeah,
so it [INAUDIBLE]..
00:59:20.190 --> 00:59:22.040
RUSS TEDRAKE: So
intuitively, why should I
00:59:22.040 --> 00:59:25.430
take a different linear
control action if I'm at a time
00:59:25.430 --> 00:59:28.080
1 versus time 2?
00:59:28.080 --> 00:59:30.330
AUDIENCE: Because
you're time dependent.
00:59:30.330 --> 00:59:32.250
So if you're very close
to the final time,
00:59:32.250 --> 00:59:34.355
you want to [INAUDIBLE]
lots of control,
00:59:34.355 --> 00:59:36.480
because you don't have
that much time [INAUDIBLE]..
00:59:36.480 --> 00:59:38.480
RUSS TEDRAKE: Awesome, yeah.
00:59:38.480 --> 00:59:43.020
This is a quirk of having a
finite horizon cost function.
00:59:43.020 --> 00:59:45.150
In the infinite horizon
case, it turns out
00:59:45.150 --> 00:59:48.525
you're going to just get
a u equals negative Kx,
00:59:48.525 --> 00:59:51.360
where K is invariant of time.
00:59:51.360 --> 00:59:52.620
But in the time--
00:59:52.620 --> 00:59:56.010
finite horizon problem,
there's this quirk,
00:59:56.010 --> 00:59:58.170
which is the time
ends at some point,
00:59:58.170 --> 01:00:00.480
and I have to deal with it.
01:00:00.480 --> 01:00:06.000
If the bank closes at 5:00,
if I'm here and it's 4:50,
01:00:06.000 --> 01:00:07.830
and the bank closes at
5:00, I'm going to--
01:00:07.830 --> 01:00:10.533
I'd better get over there
faster than if it was 4:30
01:00:10.533 --> 01:00:11.700
and the bank closes at 5:00.
01:00:14.970 --> 01:00:17.610
In my mind, actually, there's
a lot of problems that are--
01:00:17.610 --> 01:00:19.800
bank closing is a
weird one, but there
01:00:19.800 --> 01:00:22.440
are a lot of problems that
are naturally formulated
01:00:22.440 --> 01:00:24.900
as finite horizon problems.
01:00:24.900 --> 01:00:26.520
Things-- maybe a pick-and-place.
01:00:26.520 --> 01:00:29.392
The minimum time problem was a
finite horizon, pick-and-place.
01:00:29.392 --> 01:00:31.350
There are a lot of problems
which are naturally
01:00:31.350 --> 01:00:33.660
formulated as infinite horizon.
01:00:33.660 --> 01:00:36.900
I just want to walk as
well as I possibly can
01:00:36.900 --> 01:00:37.930
for a very long time.
01:00:37.930 --> 01:00:40.830
I don't need to get to some
place at a certain time.
01:00:40.830 --> 01:00:43.170
OK.
01:00:43.170 --> 01:00:45.900
But in many ways, the
finite horizon time ones
01:00:45.900 --> 01:00:47.970
are the weird ones,
because you always
01:00:47.970 --> 01:00:49.990
have to worry about the
end of time approaching.
01:00:53.850 --> 01:00:54.682
OK.
01:00:54.682 --> 01:00:56.357
AUDIENCE: How do we get S t?
01:00:56.357 --> 01:00:57.690
RUSS TEDRAKE: How do we get S t?
01:00:57.690 --> 01:00:58.190
OK.
01:01:00.660 --> 01:01:03.930
Well, it's the thing that
makes this equation 0.
01:01:08.930 --> 01:01:10.010
So what is that thing?
01:01:22.150 --> 01:01:24.270
I figured out what
the minimizing u is.
01:01:27.180 --> 01:01:29.580
I can insert that back in.
01:01:29.580 --> 01:01:39.930
So I get now 0 equals
x transpose Q x plus x transpose--
01:01:39.930 --> 01:01:42.420
I'm going to insert u in--
01:01:42.420 --> 01:01:46.200
K-- or I'll do the
whole thing, actually--
01:01:46.200 --> 01:01:56.050
S of t B R inverse
times R times R inverse.
01:01:56.050 --> 01:02:00.090
So I'm going to go ahead
and cancel those out.
01:02:00.090 --> 01:02:03.846
B transpose S of t x.
01:02:07.140 --> 01:02:09.780
And the negative signs,
because there's two u's there.
01:02:09.780 --> 01:02:13.260
The negative sign didn't get me.
01:02:13.260 --> 01:02:22.410
And then plus 2x
transpose S of t Ax plus--
01:02:22.410 --> 01:02:42.430
so minus B R inverse B transpose
S of t x plus x transpose S dot
01:02:42.430 --> 01:02:42.970
of t x.
01:02:50.420 --> 01:02:54.850
It turns out that
this term here should
01:02:54.850 --> 01:02:58.220
be the same as that term
there, modulo a factor of 2.
01:02:58.220 --> 01:03:04.000
If you look, it's S, B, R
inverse, B transpose, S.
01:03:04.000 --> 01:03:12.850
So this one actually, I can
just turn that into a minus.
01:03:12.850 --> 01:03:13.350
OK.
01:03:16.510 --> 01:03:22.010
And it turns out that everything
has this x transpose matrix x
01:03:22.010 --> 01:03:22.510
form.
01:03:25.430 --> 01:03:26.900
So I can actually--
01:03:26.900 --> 01:03:30.350
in order for this thing
to be true for all x,
01:03:30.350 --> 01:03:35.450
it must be that the matrix
inside had better be 0.
01:03:35.450 --> 01:03:45.900
So it turns out to be 0
equals Q minus S t B R
01:03:45.900 --> 01:04:00.950
inverse B transpose S t
plus 2 S t A plus S dot t
01:04:00.950 --> 01:04:04.190
had better be equal to 0.
01:04:04.190 --> 01:04:06.440
OK.
01:04:06.440 --> 01:04:09.560
Now, I made some
assumptions to get here.
01:04:09.560 --> 01:04:12.200
Know what assumptions I made?
01:04:12.200 --> 01:04:15.050
The big one is that I guessed
that form of the value
01:04:15.050 --> 01:04:16.203
function.
01:04:16.203 --> 01:04:17.870
And one of the things
I guessed about it
01:04:17.870 --> 01:04:20.570
was that it was symmetric.
01:04:20.570 --> 01:04:22.640
So let's see if we're
looking symmetric.
01:04:22.640 --> 01:04:25.400
So Q, we already
said, was symmetric.
01:04:25.400 --> 01:04:27.200
That's all good.
01:04:27.200 --> 01:04:29.780
That guy's nice and symmetric.
01:04:29.780 --> 01:04:31.820
That's all good.
01:04:31.820 --> 01:04:35.360
So this is the one we
have to worry about.
01:04:35.360 --> 01:04:36.604
Is that guy symmetric?
01:04:42.040 --> 01:04:44.290
It's actually not
symmetric like that.
01:04:44.290 --> 01:04:54.430
But I can equivalently write it
as S t A plus A transpose S t,
01:04:54.430 --> 01:04:55.992
since S is symmetric.
01:04:59.370 --> 01:05:00.911
And that guy is symmetric.
01:05:14.133 --> 01:05:15.300
I said a very strange thing.
01:05:15.300 --> 01:05:17.202
I just said that
the matrices are--
01:05:17.202 --> 01:05:19.410
this one is not symmetric,
I can write the same thing
01:05:19.410 --> 01:05:20.460
as-- it's this.
01:05:20.460 --> 01:05:36.540
So what I mean to say is that
these are equivalent for all x.
01:05:47.750 --> 01:05:53.390
Because this has
got to equal this.
01:06:02.310 --> 01:06:02.810
OK.
01:06:02.810 --> 01:06:13.970
So, good.
01:06:13.970 --> 01:06:15.590
OK.
01:06:15.590 --> 01:06:16.990
So this equation,
which I'm going
01:06:16.990 --> 01:06:19.100
to write one more
time since it's
01:06:19.100 --> 01:06:20.690
an equation that has
a name associated
01:06:20.690 --> 01:06:22.250
with someone famous--
01:06:22.250 --> 01:06:25.220
deserves a box
around it, I guess.
01:06:25.220 --> 01:06:28.820
So this is the Riccati equation.
01:06:28.820 --> 01:06:33.816
I'm going to move the
S over to this side.
01:06:33.816 --> 01:06:34.910
It's a Riccati equation.
01:06:58.020 --> 01:06:59.520
And I also have
that final condition
01:06:59.520 --> 01:07:02.820
that you rightly pointed
out, where S of capital T
01:07:02.820 --> 01:07:03.930
had better equal Qf.
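Collecting the pieces of the derivation above, the Riccati equation with S dot moved to the other side, its final condition, and the resulting control law can be written as follows. This is a LaTeX sketch reconstructed from the spoken derivation, using the symmetric form $S A + A^T S$ in place of $2 S A$.

```latex
\begin{align*}
-\dot{S}(t) &= Q - S(t)\, B R^{-1} B^T S(t) + S(t)\, A + A^T S(t), \qquad S(T) = Q_f \\
u^*(x,t) &= -R^{-1} B^T S(t)\, x = -K(t)\, x
\end{align*}
```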
01:07:12.390 --> 01:07:17.880
So direct application of
the Hamilton-Bellman-Jacobi
01:07:17.880 --> 01:07:27.330
equation, I was able to derive
this Riccati equation, which
01:07:27.330 --> 01:07:30.720
gives me a solution
for the value function.
01:07:30.720 --> 01:07:34.590
Because it gives me a
final condition on an S
01:07:34.590 --> 01:07:37.020
and then the governing
equation which
01:07:37.020 --> 01:07:40.050
integrates the equation
backwards from capital T to 0.
01:07:45.990 --> 01:07:48.150
And once I have S,
remember, we said
01:07:48.150 --> 01:07:58.180
that the u was just negative R
inverse B transpose S of t x.
01:07:58.180 --> 01:07:59.160
So I've got everything.
01:07:59.160 --> 01:08:00.955
Once I have S, I
have everything.
01:08:09.700 --> 01:08:10.200
OK.
01:08:10.200 --> 01:08:13.200
So this is one of the
absolute fundamental results
01:08:13.200 --> 01:08:15.225
in optimal control.
01:08:18.930 --> 01:08:26.279
It turns out that if you want
to know the infinite horizon
01:08:26.279 --> 01:08:28.829
solution to the--
01:08:28.829 --> 01:08:32.100
if you look at the solution
as time goes to infinity--
01:08:32.100 --> 01:08:35.460
remember, I wrote my cost
function initially was--
01:08:35.460 --> 01:08:41.040
the problem we're trying to
solve is an infinite integral.
01:08:41.040 --> 01:08:43.140
It turns out that
the infinite horizon solution
01:08:43.140 --> 01:08:47.910
is the steady-state
solution of this equation.
01:08:47.910 --> 01:08:51.210
So if you integrate this
equation back enough,
01:08:51.210 --> 01:08:51.930
it's stable.
01:08:51.930 --> 01:08:56.819
It finds a steady
state where S dot is 0.
01:08:56.819 --> 01:08:59.220
And that solution
when S dot equals
01:08:59.220 --> 01:09:18.310
0, the S which solves this,
that whole thing minus Q,
01:09:18.310 --> 01:09:19.705
is the infinite
horizon solution.
01:09:31.770 --> 01:09:34.260
OK.
01:09:34.260 --> 01:09:47.340
If you open up Matlab, and
you type lqr A, B, Q, R,
01:09:47.340 --> 01:09:50.340
then it's going to
output two things.
01:09:50.340 --> 01:09:57.060
It outputs K, and it outputs S.
Solving this thing is actually
01:09:57.060 --> 01:09:57.990
not trivial.
01:09:57.990 --> 01:10:03.060
So how do you solve that for S?
01:10:03.060 --> 01:10:09.120
The hard one is it's got
this S in both places.
01:10:09.120 --> 01:10:13.230
But this is the
Riccati equation again.
01:10:13.230 --> 01:10:15.870
It's so famous, it
comes up so pervasively,
01:10:15.870 --> 01:10:18.870
that people have really
good tools for solving it,
01:10:18.870 --> 01:10:20.300
numerical tools for solving it.
01:10:20.300 --> 01:10:22.050
So Matlab's got some
nice routine in there
01:10:22.050 --> 01:10:26.280
to solve, to find S.
01:10:26.280 --> 01:10:30.660
And when I call lqr
with the dynamics
01:10:30.660 --> 01:10:36.960
and the Q, R, it gives me exactly
the infinite horizon S
01:10:36.960 --> 01:10:41.220
and infinite horizon
non-time-variant K.
01:10:41.220 --> 01:10:43.895
If you need to do a finite
horizon quadratic regulator,
01:10:43.895 --> 01:10:46.020
then you actually need to
integrate these equations
01:10:46.020 --> 01:10:48.530
yourself.
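Integrating the equations yourself can be sketched in a few lines. The example below is a minimal illustration rather than a definitive implementation: it uses plain Python with explicit Euler steps taken backward in time, assumes the double integrator with Q = identity and R = 1, and the helper names (`integrate_backward`, `gain`) are made up for this sketch. Starting from S(T) = Qf, the gain K(t) = R inverse B transpose S(t) depends on the time remaining, and for long horizons it settles toward a constant.

```python
# Assumed problem data (double integrator, Q = I, R = 1), for illustration only.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0], [1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R_INV = 1.0  # inverse of the scalar R = 1


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(*mats):
    """Elementwise sum of equally sized matrices."""
    return [[sum(M[i][j] for M in mats) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def scale(c, X):
    return [[c * v for v in row] for row in X]


def riccati_rhs(S):
    """Return -S_dot = Q - S B R^-1 B^T S + S A + A^T S (S assumed symmetric)."""
    SB = matmul(S, B)
    quad_term = scale(R_INV, matmul(SB, transpose(SB)))
    return add(Q, scale(-1.0, quad_term), matmul(S, A), matmul(transpose(A), S))


def integrate_backward(Qf, time_to_go, dt=1e-3):
    """Explicit Euler from S(T) = Qf back to the time with `time_to_go` remaining."""
    S = [row[:] for row in Qf]
    for _ in range(int(round(time_to_go / dt))):
        # S(t - dt) is approximately S(t) - dt * S_dot(t) = S(t) + dt * (-S_dot(t)).
        S = add(S, scale(dt, riccati_rhs(S)))
    return S


def gain(S):
    """K = R^-1 B^T S, returned as a 1x2 row."""
    return scale(R_INV, matmul(transpose(B), S))


Qf = [[0.0, 0.0], [0.0, 0.0]]  # no terminal cost (an assumption for this sketch)
for time_to_go in (0.1, 1.0, 15.0):
    print(time_to_go, gain(integrate_backward(Qf, time_to_go)))
```

A fixed-step Euler scheme is the simplest choice here; a higher-order or adaptive integrator would be the more careful option for stiff or poorly scaled problems.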
01:10:48.530 --> 01:10:49.590
OK.
01:10:49.590 --> 01:10:53.220
I hate going that long with just
equations and not intuition.
01:10:53.220 --> 01:10:56.175
So let me connect it
back to the brick now.
01:10:56.175 --> 01:10:59.620
That was the point of doing
everything in the brick world
01:10:59.620 --> 01:11:00.120
here.
01:11:03.630 --> 01:11:05.580
OK.
01:11:05.580 --> 01:11:10.140
So we've got q
double dot equals u.
01:11:10.140 --> 01:11:16.980
We've got now
infinite horizon J x
01:11:16.980 --> 01:11:25.590
is the infinite horizon integral of g x,
u dt, where I said g x,
01:11:25.590 --> 01:11:33.015
u was 1/2 q squared plus 1/2 q
dot squared plus 1/2 u squared.
01:11:36.330 --> 01:11:44.235
So now that's exactly in
the LQR form. A is 0, 1, 0, 0.
01:11:46.830 --> 01:11:50.670
B is 0, 1.
01:11:50.670 --> 01:11:56.850
Q is the identity matrix.
01:11:56.850 --> 01:11:58.530
And R is 1.
01:12:02.550 --> 01:12:04.140
It turns out I
can actually solve
01:12:04.140 --> 01:12:08.640
that one algebraically for S.
If you pump all the symbols in--
01:12:08.640 --> 01:12:11.460
I won't do it because
there's a lot of symbols--
01:12:11.460 --> 01:12:15.270
but in a few lines of algebra,
you can figure out what S has
01:12:15.270 --> 01:12:19.440
to be, just because so many
terms drop out with those
01:12:19.440 --> 01:12:21.400
0's that actually there's--
01:12:24.130 --> 01:12:28.080
There's the three equations
and three unknowns.
01:12:28.080 --> 01:12:38.070
And it turns out that S has
to be square root of 2, 1, 1,
01:12:38.070 --> 01:12:38.970
square root of 2.
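The three equations can be checked by plugging a candidate S into the steady-state Riccati equation. The sketch below is a sanity check under stated assumptions: `are_residual` is a hypothetical helper that evaluates Q minus S B R inverse B transpose S plus S A plus A transpose S for this specific A and B. Working through the algebra, the matrix quoted above zeroes the residual when only position is penalized (Q = diag(1, 0)); with Q equal to the full identity and R = 1, the same three equations give square root of 3 on the diagonal instead, so it is worth confirming which cost is intended.

```python
import math


def are_residual(S, Q, r=1.0):
    """Residual of 0 = Q - S B R^-1 B^T S + S A + A^T S
    for A = [[0, 1], [0, 0]] and B = [0, 1]^T, with scalar R = r."""
    a, b, c = S[0][0], S[0][1], S[1][1]   # S = [[a, b], [b, c]], symmetric
    # For this A and B:
    #   S B B^T S   = [[b*b, b*c], [b*c, c*c]]
    #   S A + A^T S = [[0,   a  ], [a,   2*b]]
    return [
        [Q[0][0] - b * b / r, Q[0][1] + a - b * c / r],
        [Q[1][0] + a - b * c / r, Q[1][1] + 2.0 * b - c * c / r],
    ]


S_quoted = [[math.sqrt(2.0), 1.0], [1.0, math.sqrt(2.0)]]
S_identity_cost = [[math.sqrt(3.0), 1.0], [1.0, math.sqrt(3.0)]]

Q_position_only = [[1.0, 0.0], [0.0, 0.0]]
Q_identity = [[1.0, 0.0], [0.0, 1.0]]

print(are_residual(S_quoted, Q_position_only))
print(are_residual(S_quoted, Q_identity))
print(are_residual(S_identity_cost, Q_identity))
```

Either way, the off-diagonal entry is 1, so the position gain in K is 1 and only the velocity gain changes.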
01:12:42.610 --> 01:12:43.110
OK.
01:13:03.540 --> 01:13:14.040
The u, remember, was negative
R inverse B transpose
01:13:14.040 --> 01:13:22.620
S x, which,
if I punch those in,
01:13:22.620 --> 01:13:28.440
gives me negative 1, negative square
root of 2 times
01:13:28.440 --> 01:13:52.370
x, which gives me closed loop
dynamics of x dot equals Ax
01:13:52.370 --> 01:14:02.150
minus BKx is equal
to 0, 1, negative 1,
01:14:02.150 --> 01:14:06.520
negative square root of 2 times x.
01:14:12.150 --> 01:14:13.410
OK.
01:14:13.410 --> 01:14:15.030
Now I'm going to
plot two things here.
01:14:20.030 --> 01:14:30.790
First thing I'm going
to plot is J of x.
01:14:30.790 --> 01:14:35.530
J of x is square root of
2, 1, 1, square root of 2.
01:14:38.150 --> 01:14:39.650
A little thinking
about that, you'll
01:14:39.650 --> 01:14:50.490
see that it comes out to
be an ellipsoid that is--
01:14:50.490 --> 01:14:58.400
[INAUDIBLE]-- sort
of shaped like this.
01:15:01.180 --> 01:15:03.670
I draw contours
of that function,
01:15:03.670 --> 01:15:08.260
of that x transpose S x.
01:15:14.305 --> 01:15:17.720
And the cost-to-go is 0 here.
01:15:17.720 --> 01:15:25.950
And it's a bowl that comes up
in this sort of elliptic bowl.
01:15:36.110 --> 01:15:36.610
All right.
01:15:36.610 --> 01:15:39.460
So what is the optimal
policy going to look like,
01:15:39.460 --> 01:15:40.780
given that that's my bowl?
01:15:45.550 --> 01:15:48.350
We said the best thing to do
is go down the steepest descent
01:15:48.350 --> 01:15:48.850
of the bowl.
01:15:52.070 --> 01:15:53.200
I want to go down--
01:15:53.200 --> 01:15:55.990
wherever I am, I want to
go down as fast as I can.
01:15:59.730 --> 01:16:01.470
But I can't do it exactly.
01:16:01.470 --> 01:16:03.180
That was actually sort of a--
01:16:03.180 --> 01:16:03.720
that's OK.
01:16:03.720 --> 01:16:07.110
I mean, I can't do it exactly,
because all I'm allowed to do
01:16:07.110 --> 01:16:07.980
is change--
01:16:07.980 --> 01:16:11.950
I have one component that I'm
not allowed to change, right?
01:16:11.950 --> 01:16:19.470
I have that my q is going to
go forward independent of u
01:16:19.470 --> 01:16:20.640
directly.
01:16:20.640 --> 01:16:25.170
So B transpose S x is going
to give me a projection
01:16:25.170 --> 01:16:27.810
of that gradient onto this--
01:16:27.810 --> 01:16:30.090
the thing I can
actually control,
01:16:30.090 --> 01:16:32.400
which way I can point
my phase portrait
01:16:32.400 --> 01:16:33.510
in that given my control.
01:16:36.690 --> 01:16:39.480
And then R is going
to scale it again.
01:16:39.480 --> 01:16:44.080
And the resulting
closed loop dynamics,
01:16:44.080 --> 01:16:45.580
let's see if we can
figure that out.
01:16:49.210 --> 01:16:52.500
So if I take the eigenvectors
and eigenvalues of that, well,
01:16:52.500 --> 01:16:55.290
it turns out I'm not
going to make the plot.
01:16:55.290 --> 01:17:03.270
My eigenvalues were square
root of 2 plus or minus i 1
01:17:03.270 --> 01:17:12.280
over square root of 2 with v
being 1 over square root of 2.
01:17:24.980 --> 01:17:28.750
So the best thing I can
possibly do is to go down that--
01:17:28.750 --> 01:17:30.920
if I didn't care about--
01:17:30.920 --> 01:17:32.560
if I didn't worry
about penalizing R,
01:17:32.560 --> 01:17:34.310
I didn't worry about
my control actuation,
01:17:34.310 --> 01:17:36.680
would be to go straight
down that bowl.
01:17:36.680 --> 01:17:38.680
But because I'm
scaling things by--
01:17:38.680 --> 01:17:41.180
I'm filtering things by
what I can actually control,
01:17:41.180 --> 01:17:43.610
and I'm penalizing things
by R, the actual response
01:17:43.610 --> 01:17:47.280
is a complex response
which goes down--
01:17:47.280 --> 01:17:50.000
goes down this bowl
and oscillates its way
01:17:50.000 --> 01:17:51.230
into the origin.
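A minimal numerical sketch of this picture, assuming the double integrator with Q = I and R = 1 (these weights are an assumption for illustration, not necessarily the ones used on the board). It solves the Riccati equation through the stable eigenvectors of the Hamiltonian matrix, forms K = R^-1 B^T S, and checks that the closed-loop eigenvalues form a stable complex pair, which is the oscillation into the origin. The helper name `lqr` is hypothetical.

```python
import numpy as np

def lqr(A, B, Q, R):
    """Continuous-time LQR. Returns S (cost-to-go x^T S x) and K (u = -K x)."""
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    # Hamiltonian matrix associated with the algebraic Riccati equation.
    H = np.block([[A, -B @ R_inv @ B.T],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    # Keep the n eigenvectors that belong to eigenvalues with negative real part.
    stable = eigvecs[:, eigvals.real < 0]
    X1, X2 = stable[:n, :], stable[n:, :]
    S = np.real(X2 @ np.linalg.inv(X1))
    K = R_inv @ B.T @ S
    return S, K

# Double integrator: state x = [q, q_dot], dynamics q_ddot = u.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # assumed state cost
R = np.array([[1.0]])    # assumed control cost

S, K = lqr(A, B, Q, R)
closed_loop_eigs = np.linalg.eigvals(A - B @ K)

print("S =\n", S)
print("K =", K)
print("closed-loop eigenvalues =", closed_loop_eigs)
```

Under these assumed weights the eigenvalues have negative real parts and nonzero imaginary parts, so the state spirals down the bowl rather than sliding straight to the bottom.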
01:18:12.610 --> 01:18:13.860
OK, good.
01:18:13.860 --> 01:18:14.860
It was a little painful.
01:18:14.860 --> 01:18:21.190
But that is a set
of tools that we're
01:18:21.190 --> 01:18:24.220
going to lean on when we're
making all our algorithms.
01:18:24.220 --> 01:18:28.120
You've now seen a pretty
representative sampling
01:18:28.120 --> 01:18:30.250
of what people can
do analytically
01:18:30.250 --> 01:18:32.770
with optimal control.
01:18:32.770 --> 01:18:36.040
When you have a linear
dynamical system,
01:18:36.040 --> 01:18:40.090
and there's a handful of cost
functions which you can--
01:18:40.090 --> 01:18:43.480
either by Pontryagin
or dynamic programming,
01:18:43.480 --> 01:18:46.240
the Hamilton-Jacobi-Bellman
sufficiency theorem,
01:18:46.240 --> 01:18:50.770
those are really the two big
tools that are out there.
01:18:50.770 --> 01:18:53.080
In some cases, especially
for linear systems,
01:18:53.080 --> 01:18:56.680
you can analytically come up
with optimal control policies
01:18:56.680 --> 01:19:00.520
and value functions.
01:19:00.520 --> 01:19:01.870
Why did we distinguish the two?
01:19:01.870 --> 01:19:06.580
Why did I use one in
one place and the other
01:19:06.580 --> 01:19:07.720
in the other place?
01:19:07.720 --> 01:19:12.100
Well, it turns out the
Hamilton-Jacobi-Bellman
01:19:12.100 --> 01:19:18.130
sufficiency theorem has
in it these partial J
01:19:18.130 --> 01:19:21.910
partial x, partial J partial t.
01:19:21.910 --> 01:19:27.040
So it's only valid, actually, if
partial J partial x is smooth.
01:19:29.820 --> 01:19:36.720
The policy we got
from minimum time
01:19:36.720 --> 01:19:40.336
has this hard nonlinearity
in the middle of it.
01:19:40.336 --> 01:19:42.630
It turns out that
the value function
01:19:42.630 --> 01:19:46.560
that you have in the
minimum time problem
01:19:46.560 --> 01:19:49.140
also has a hard
nonlinearity in it.
01:19:49.140 --> 01:19:52.260
If I'm here versus
here, it's continuous,
01:19:52.260 --> 01:19:54.690
but the gradients
are not smooth.
01:19:54.690 --> 01:19:56.920
The gradient is discontinuous.
01:19:56.920 --> 01:20:02.580
So on this cusp, partial
J partial x is undefined.
01:20:02.580 --> 01:20:04.290
So that's the only
reason why I didn't
01:20:04.290 --> 01:20:08.640
lean on the sufficiency
theorem completely.
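A small sketch of that cusp, assuming the double integrator q'' = u with the bound |u| <= 1 and the goal at the origin. The closed-form cost-to-go below follows from the bang-bang solution under those assumptions; the function name `min_time` is hypothetical. The code compares one-sided finite-difference slopes in q at a point on the switching curve.

```python
import math

def min_time(q, qd):
    """Minimum time to the origin for q'' = u with |u| <= 1 (assumed bound)."""
    # Switching curve: q = -0.5 * qd * |qd|.
    if q >= -0.5 * qd * abs(qd):
        # Apply u = -1 first, then switch to u = +1.
        return qd + 2.0 * math.sqrt(max(0.0, 0.5 * qd * qd + q))
    # Otherwise apply u = +1 first, then switch to u = -1.
    return -qd + 2.0 * math.sqrt(max(0.0, 0.5 * qd * qd - q))

# A point on the switching curve (q = 0.5 * qd^2 with qd < 0).
q0, qd0 = 0.5, -1.0
h = 1e-4

# One-sided slopes of the cost-to-go with respect to q.
slope_right = (min_time(q0 + h, qd0) - min_time(q0, qd0)) / h
slope_left = (min_time(q0, qd0) - min_time(q0 - h, qd0)) / h

print("cost-to-go on the curve:", min_time(q0, qd0))
print("slope from the right   :", slope_right)
print("slope from the left    :", slope_left)
```

The cost-to-go itself matches on both sides of the curve, but the two slopes disagree sharply, and the left slope keeps growing in magnitude as h shrinks. That is the sense in which partial J partial x is undefined on the cusp.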
01:20:08.640 --> 01:20:12.300
How did Pontryagin
get around that?
01:20:12.300 --> 01:20:15.480
The sufficiency theorem
is talking about--
01:20:19.398 --> 01:20:20.730
it's looking at over--
01:20:20.730 --> 01:20:22.860
roughly over the
entire state space.
01:20:22.860 --> 01:20:27.900
It's looking at variations
in the cost-to-go function
01:20:27.900 --> 01:20:31.530
as I move in x and in time.
01:20:31.530 --> 01:20:35.310
Pontryagin, if you remember, was
along a particular trajectory.
01:20:35.310 --> 01:20:37.290
It was verifying that
a particular trajectory
01:20:37.290 --> 01:20:40.380
was locally optimal.
01:20:40.380 --> 01:20:42.660
And it turns out in
problems like this
01:20:42.660 --> 01:20:48.660
in these bang-bang problems,
along a particular trajectory,
01:20:48.660 --> 01:20:53.830
my cost-to-go is smooth.
01:20:53.830 --> 01:20:55.720
The cost-to-go in the
minimum time problem
01:20:55.720 --> 01:20:59.560
was just time, right?
01:20:59.560 --> 01:21:02.440
So the time I get--
01:21:02.440 --> 01:21:04.630
the time it takes for
me to go to here to here
01:21:04.630 --> 01:21:08.950
is just smoothly decreasing
as I get closer like time.
01:21:08.950 --> 01:21:13.690
Along any trajectory,
with these additive costs,
01:21:13.690 --> 01:21:16.180
the value function is
going to be smooth.
01:21:16.180 --> 01:21:19.300
But along a
non-system trajectory,
01:21:19.300 --> 01:21:21.130
some line like this, partial--
01:21:21.130 --> 01:21:26.140
if I just look at J, how J
varies over x, it's not smooth.
01:21:26.140 --> 01:21:28.180
So Pontryagin is a
weaker statement.
01:21:28.180 --> 01:21:31.600
It's a statement about local
optimality along a trajectory.
01:21:31.600 --> 01:21:34.120
But it's valid in
slightly larger domains,
01:21:34.120 --> 01:21:36.850
because it doesn't
rely on value functions
01:21:36.850 --> 01:21:38.678
being smoothly differentiable.
01:21:44.420 --> 01:21:50.660
Now, for the first-order--
01:21:50.660 --> 01:21:53.090
sorry, for the double
integrator, the brick on ice,
01:21:53.090 --> 01:21:55.940
we could have just
chosen our K's by hand
01:21:55.940 --> 01:21:58.340
and pushed them
higher or smaller.
01:21:58.340 --> 01:21:59.150
We could do root locus.
01:21:59.150 --> 01:22:01.070
We could figure out a
pretty reasonable set
01:22:01.070 --> 01:22:06.160
of K's, of feedback gains, to
make it stabilize to the goal.
01:22:06.160 --> 01:22:09.980
LQR gives us a different set
of knobs that we could tune.
01:22:09.980 --> 01:22:12.860
Now we could more
explicitly say what
01:22:12.860 --> 01:22:16.303
our concern is for getting
to the goal by the Q matrix,
01:22:16.303 --> 01:22:18.470
versus what our concern is
about using a lot of control effort
01:22:18.470 --> 01:22:19.310
in the R matrix.
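To make the knobs concrete, here is a hedged sketch for the double integrator, assuming Q = I and a scalar R = r. Under those assumptions the Riccati equation can be solved by hand, giving the closed-form gains used below; the function name `double_integrator_gains` is hypothetical.

```python
import math

def double_integrator_gains(r):
    """LQR gains (k_q, k_qdot) for q'' = u with Q = I and scalar R = r (assumed)."""
    # From the Riccati equation: S = [[a, b], [b, c]] with
    # b = sqrt(r) and c = sqrt(r * (2 * sqrt(r) + 1)); then K = (1 / r) * [b, c].
    k_q = 1.0 / math.sqrt(r)
    k_qdot = math.sqrt((2.0 * math.sqrt(r) + 1.0) / r)
    return k_q, k_qdot

# Sweep the control penalty: cheap control, balanced, expensive control.
for r in (0.01, 1.0, 100.0):
    k_q, k_qdot = double_integrator_gains(r)
    print(f"R = {r:6.2f}: u = -({k_q:.3f} * q + {k_qdot:.3f} * q_dot)")
```

Raising R makes both gains smaller, so the controller pushes less and takes longer to reach the goal. Lowering R does the opposite.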
01:22:21.860 --> 01:22:24.050
So maybe that's not
very compelling.
01:22:24.050 --> 01:22:25.910
Maybe we just did a
lot of work to just
01:22:25.910 --> 01:22:27.493
have a slightly
different set of knobs
01:22:27.493 --> 01:22:29.810
to turn when I'm designing
my feedback controller.
01:22:29.810 --> 01:22:31.430
But what you're
going to see is that,
01:22:31.430 --> 01:22:35.000
for much more complicated
systems that are still linear--
01:22:35.000 --> 01:22:38.660
or linearizations about
very complicated systems,
01:22:38.660 --> 01:22:40.490
LQR is going to give
you an explicit way
01:22:40.490 --> 01:22:45.140
to design these linear feedback
controllers in a way that's
01:22:45.140 --> 01:22:47.060
optimal.
01:22:47.060 --> 01:22:50.570
So we're actually doing
a variation of LQR
01:22:50.570 --> 01:22:54.958
now to make an airplane land
on a perch, for instance.
01:22:54.958 --> 01:22:56.750
We can-- we're going
to use it to stabilize
01:22:56.750 --> 01:23:00.840
the double-inverted pendulum,
the Acrobot, around the top.
01:23:00.840 --> 01:23:04.340
So it's going to be a
generally more useful tool.
01:23:04.340 --> 01:23:06.860
Down at the brick,
double integrator level,
01:23:06.860 --> 01:23:09.110
you can think it's almost
just a different set of ways
01:23:09.110 --> 01:23:10.010
to do your root locus.
01:23:12.570 --> 01:23:13.070
OK.
01:23:13.070 --> 01:23:16.580
You have now, through
two sort of dry lectures
01:23:16.580 --> 01:23:18.290
relative to the
rest of the class,
01:23:18.290 --> 01:23:25.567
learned two ways to do
analytical optimal control.
01:23:25.567 --> 01:23:27.650
One is by means of
Pontryagin's minimum principle,
01:23:27.650 --> 01:23:29.990
one is by means of
dynamic programming, which
01:23:29.990 --> 01:23:34.880
is through the HJB
sufficiency theorem.
01:23:34.880 --> 01:23:36.410
And you've seen
some representatives
01:23:36.410 --> 01:23:39.860
of what people can do with those
analytical optimal control tools.
01:23:39.860 --> 01:23:45.620
And it got us far enough to
make a brick go to the origin.
01:23:45.620 --> 01:23:46.310
Right.
01:23:46.310 --> 01:23:48.410
And it'll do a few
more things, but.
01:23:48.410 --> 01:23:52.670
OK, so that's about as far
as we get with analytics.
01:23:52.670 --> 01:23:56.330
We're going to use this in
places to start algorithms up.
01:23:56.330 --> 01:24:02.840
But if we want to, for instance,
solve the minimum time problem
01:24:02.840 --> 01:24:05.390
or the quadratic
regulator problem,
01:24:05.390 --> 01:24:09.610
for the nonlinear
dynamics of the pendulum,
01:24:09.610 --> 01:24:13.540
if I take my x dot
equals Ax plus Bu away
01:24:13.540 --> 01:24:18.970
and give it the mgL sine
theta, then most of these tools
01:24:18.970 --> 01:24:19.540
break down.
01:24:22.210 --> 01:24:25.780
Next Tuesday happens to be
a holiday, virtual Monday.
01:24:25.780 --> 01:24:27.830
So we won't do it
on next Tuesday.
01:24:27.830 --> 01:24:30.760
But next Thursday, I'm
going to show you algorithms
01:24:30.760 --> 01:24:31.840
that are based on these.
01:24:31.840 --> 01:24:36.400
This is the important foundation
that are going to solve
01:24:36.400 --> 01:24:40.570
algorithmically the same optimal
control problems that we're--
01:24:40.570 --> 01:24:46.180
more optimal control problems
than we can solve analytically.
01:24:46.180 --> 01:24:49.600
And then the-- we'll
go on from there
01:24:49.600 --> 01:24:51.990
to more and more
complicated systems.