WEBVTT

00:00:00.806 --> 00:00:04.200
Let us now come back to the
trajectory estimation problem

00:00:04.200 --> 00:00:06.250
that we introduced earlier.

00:00:06.250 --> 00:00:09.630
We have an object
that moves vertically.

00:00:09.630 --> 00:00:14.720
At any given time t, the height
at which the object is found

00:00:14.720 --> 00:00:17.350
is equal to this expression.

00:00:17.350 --> 00:00:19.950
It corresponds to the
following-- the object starts

00:00:19.950 --> 00:00:22.940
at time 0, at some
initial height Theta0,

00:00:22.940 --> 00:00:25.570
it has an initial
velocity of Theta1,

00:00:25.570 --> 00:00:27.980
but also has a
certain acceleration.

00:00:27.980 --> 00:00:29.890
And if Theta2 is
negative, this will

00:00:29.890 --> 00:00:31.750
be a downwards
acceleration, which

00:00:31.750 --> 00:00:36.360
means that the object eventually
will turn and start going down.

00:00:36.360 --> 00:00:39.850
So this is a typical trajectory
of such an object, where

00:00:39.850 --> 00:00:44.010
here we're plotting the
height as a function of time.

00:00:44.010 --> 00:00:47.810
However, the Thetas
are unknown and they

00:00:47.810 --> 00:00:51.120
are random-- we do not
know what they are.

00:00:51.120 --> 00:00:53.745
So this blue curve
is just a simulation

00:00:53.745 --> 00:00:57.910
where we drew values for those
random variables at random.

00:00:57.910 --> 00:00:59.640
But if we were to
simulate again,

00:00:59.640 --> 00:01:02.380
we might obtain a somewhat
different blue curve,

00:01:02.380 --> 00:01:06.250
because the values of the Thetas
might have been different.

00:01:06.250 --> 00:01:09.830
We do not observe the
true trajectory directly.

00:01:09.830 --> 00:01:13.610
What we do observe is
certain data points.

00:01:13.610 --> 00:01:14.820
What are they?

00:01:14.820 --> 00:01:19.000
At certain times ti
we make a measurement

00:01:19.000 --> 00:01:23.030
of the height of the object,
except that this measurement

00:01:23.030 --> 00:01:26.120
is corrupted by
some additive noise.

00:01:26.120 --> 00:01:28.440
This is the model that
we introduced earlier.

00:01:28.440 --> 00:01:31.340
And our assumptions were that
all of the random variables

00:01:31.340 --> 00:01:36.020
involved-- the Thetas and the
W's were normal with 0 mean

00:01:36.020 --> 00:01:38.539
and were also independent.

00:01:38.539 --> 00:01:43.390
In that case, we saw that
maximizing the posterior

00:01:43.390 --> 00:01:49.000
distribution of the Thetas
after taking logarithms

00:01:49.000 --> 00:01:52.690
amounted to minimizing
this quadratic function

00:01:52.690 --> 00:01:54.039
of the thetas.

00:01:54.039 --> 00:01:57.370
So once we have some data
available in our hands,

00:01:57.370 --> 00:02:00.500
we look at this expression
as a function of the thetas

00:02:00.500 --> 00:02:04.270
and find the thetas that
are as as good as possible

00:02:04.270 --> 00:02:06.180
in terms of this criterion.

00:02:06.180 --> 00:02:10.360
And this is the MAP methodology
for this particular example.

00:02:10.360 --> 00:02:12.490
Now, for the purposes
of this illustration,

00:02:12.490 --> 00:02:16.430
actually, we will change our
assumptions a little bit.

00:02:16.430 --> 00:02:17.850
They will be as follows.

00:02:17.850 --> 00:02:22.300
Regarding the acceleration, we
will take it to be a constant.

00:02:22.300 --> 00:02:24.240
The acceleration
term often has to do

00:02:24.240 --> 00:02:26.650
with gravitational
effects which are known,

00:02:26.650 --> 00:02:28.860
so we will treat
Theta2 as a constant.

00:02:28.860 --> 00:02:30.770
And that means that
there's no point

00:02:30.770 --> 00:02:34.060
in having a prior
distribution for Theta2.

00:02:34.060 --> 00:02:36.860
So this term here,
which originated

00:02:36.860 --> 00:02:41.570
from the prior distribution of
Theta2 is going to disappear.

00:02:41.570 --> 00:02:45.760
We will take the variances of
these basic random variables

00:02:45.760 --> 00:02:47.310
to be the same.

00:02:47.310 --> 00:02:49.490
And because of this,
these constants

00:02:49.490 --> 00:02:52.030
here will all be the same.

00:02:52.030 --> 00:02:55.560
Therefore, we can take them
outside of this expression,

00:02:55.560 --> 00:02:58.670
and outside the minimization
they will not matter.

00:02:58.670 --> 00:03:02.220
So we can remove them
from the picture.

00:03:02.220 --> 00:03:05.330
The factor of 1/2 can
also be removed similarly.

00:03:05.330 --> 00:03:08.510
It does not affect
the minimization.

00:03:08.510 --> 00:03:12.700
Finally, just in order to
get a nicer illustration,

00:03:12.700 --> 00:03:15.920
instead of taking 0
means, we're assuming

00:03:15.920 --> 00:03:19.620
that the initial position
has a mean of 200.

00:03:19.620 --> 00:03:22.560
So we're starting
somewhere around here.

00:03:22.560 --> 00:03:26.230
And furthermore, the initial
velocity has a mean of 50.

00:03:26.230 --> 00:03:29.920
So we expect the object
to start moving upwards.

00:03:29.920 --> 00:03:32.650
How does this change
the formulation?

00:03:32.650 --> 00:03:35.880
Well, remember, that
this term and this term

00:03:35.880 --> 00:03:41.480
originated from the priors
for Theta0 and Theta1.

00:03:41.480 --> 00:03:46.380
If we now change the means,
the priors will change.

00:03:46.380 --> 00:03:50.600
And what happens, if you recall
the formula for the normal PDF

00:03:50.600 --> 00:03:54.300
and how the mean enters,
after you take logarithms,

00:03:54.300 --> 00:03:57.010
you see that instead
of having here theta0,

00:03:57.010 --> 00:04:02.980
you should have theta0 minus
the mean of theta0 squared.

00:04:02.980 --> 00:04:06.640
And this leads us to the
following formulation.

00:04:06.640 --> 00:04:10.630
So this is the formulation
that we will consider.

00:04:10.630 --> 00:04:14.030
We obtain these data points,
and for these particular data

00:04:14.030 --> 00:04:17.680
points and for known times at
which the measurements were

00:04:17.680 --> 00:04:22.260
taken, we put these numbers
into this minimization, carried

00:04:22.260 --> 00:04:26.450
it out numerically, and
this is what we got.

00:04:26.450 --> 00:04:29.680
We got estimates for the
different parameters.

00:04:29.680 --> 00:04:33.640
And using this estimates,
we can use this expression

00:04:33.640 --> 00:04:36.240
to construct an
estimated trajectory.

00:04:36.240 --> 00:04:39.870
And the estimated trajectory
is given by the red curve.

00:04:39.870 --> 00:04:43.500
It seems to be doing
somewhat of a reasonable job,

00:04:43.500 --> 00:04:44.720
but not quite.

00:04:44.720 --> 00:04:49.120
The distance between these two
curves is quite substantial.

00:04:49.120 --> 00:04:50.970
How could we do a little better?

00:04:50.970 --> 00:04:54.659
Why is it that we're
not doing very well?

00:04:54.659 --> 00:04:57.210
Let's think intuitively.

00:04:57.210 --> 00:05:01.200
One of the parameters we
wish to estimate is Theta1.

00:05:01.200 --> 00:05:03.280
And Theta1 is a velocity.

00:05:03.280 --> 00:05:05.590
Now, all of our measurements
are concentrated

00:05:05.590 --> 00:05:07.830
at pretty much the same time.

00:05:07.830 --> 00:05:11.430
But if you measure an object
only at a certain time,

00:05:11.430 --> 00:05:14.490
it is very difficult to
estimate its velocity.

00:05:14.490 --> 00:05:16.820
A much better idea
would be to try

00:05:16.820 --> 00:05:19.900
to measure the position of
the object at different times

00:05:19.900 --> 00:05:23.700
and use that information
to estimate velocity.

00:05:23.700 --> 00:05:27.300
So let us instead of
taking all the measurements

00:05:27.300 --> 00:05:31.490
around the initial time,
have five measurements

00:05:31.490 --> 00:05:34.980
in the beginning and five
measurements towards the end.

00:05:34.980 --> 00:05:37.380
The total number of
measurements in this example

00:05:37.380 --> 00:05:40.280
is the same as in
the previous example.

00:05:40.280 --> 00:05:44.780
And once more, we generate
a simulated trajectory

00:05:44.780 --> 00:05:47.010
according to the
probability distributions

00:05:47.010 --> 00:05:48.409
that we are assuming.

00:05:48.409 --> 00:05:51.270
Then we generate data
according to this model

00:05:51.270 --> 00:05:54.909
and we wish to estimate
this trajectory.

00:05:54.909 --> 00:05:58.380
We take the data, plug them
into this minimization,

00:05:58.380 --> 00:06:02.390
carry it out numerically,
and this is what we obtain.

00:06:02.390 --> 00:06:06.160
So we see that here we
are doing a lot better.

00:06:06.160 --> 00:06:08.510
The estimated
trajectory is quite

00:06:08.510 --> 00:06:12.230
close to the unknown
blue trajectory,

00:06:12.230 --> 00:06:17.910
even though the data seems
to be scattered quite a bit.

00:06:17.910 --> 00:06:19.440
This is a very nice property.

00:06:19.440 --> 00:06:23.290
But is it just an accident
of this numerical experiment?

00:06:23.290 --> 00:06:26.730
Or, also, to put it
differently, once you

00:06:26.730 --> 00:06:29.510
get your estimated
trajectory, yes, it

00:06:29.510 --> 00:06:32.380
is true that it is close
to the blue trajectory,

00:06:32.380 --> 00:06:35.980
but you do not necessarily
know that fact.

00:06:35.980 --> 00:06:38.190
It is one thing to
have an estimate that

00:06:38.190 --> 00:06:43.310
is close to the true value,
and it's a different thing

00:06:43.310 --> 00:06:45.450
to have an estimate
that you know

00:06:45.450 --> 00:06:48.380
that it is close
to the true value.

00:06:48.380 --> 00:06:52.170
So how could we get some
guarantees that, indeed, this

00:06:52.170 --> 00:06:56.520
is the case, that we
have good estimates?

00:06:56.520 --> 00:06:59.070
Here's how it goes.

00:06:59.070 --> 00:07:02.420
As we discussed before,
the posterior distribution

00:07:02.420 --> 00:07:06.670
of the Thetas given
the data is normal.

00:07:06.670 --> 00:07:10.940
And for similar reasons,
the posterior distribution

00:07:10.940 --> 00:07:14.400
of this quantity, which
is the true position,

00:07:14.400 --> 00:07:18.650
it's what we denoted by X of t,
the posterior distribution of X

00:07:18.650 --> 00:07:21.890
of t is also normal.

00:07:21.890 --> 00:07:27.040
And in fact, what we obtain from
this diagram is at any given

00:07:27.040 --> 00:07:30.430
point it's the maximum of
posteriority probability

00:07:30.430 --> 00:07:35.530
estimate of the position
X of t at that time.

00:07:35.530 --> 00:07:39.270
However, besides just
this point estimate,

00:07:39.270 --> 00:07:41.050
we have additional information.

00:07:44.940 --> 00:07:47.950
We know that the posterior
distribution of X of t

00:07:47.950 --> 00:07:48.810
is normal.

00:07:48.810 --> 00:07:51.110
And so, for example,
at this time,

00:07:51.110 --> 00:07:54.900
this is the peak
of the posterior.

00:07:54.900 --> 00:07:58.430
This is the maximum a
posteriori probability estimate.

00:07:58.430 --> 00:08:04.720
By we are also able to calculate
the variance of this posterior

00:08:04.720 --> 00:08:06.180
distribution.

00:08:06.180 --> 00:08:08.650
This is a calculation
that's a bit complicated

00:08:08.650 --> 00:08:11.370
for the multivariate
case, for the case where

00:08:11.370 --> 00:08:13.390
you have multiple
unknown parameters.

00:08:13.390 --> 00:08:15.300
We will not get into it.

00:08:15.300 --> 00:08:17.580
But we did see
earlier an example

00:08:17.580 --> 00:08:19.950
where we had a single
unknown parameter,

00:08:19.950 --> 00:08:22.360
and in which we were
able to calculate

00:08:22.360 --> 00:08:24.620
the variance of the
posterior distribution.

00:08:24.620 --> 00:08:27.220
So the idea is somewhat similar.

00:08:27.220 --> 00:08:30.900
So not only we have an
estimate for the position

00:08:30.900 --> 00:08:33.409
of the object at
this particular time,

00:08:33.409 --> 00:08:36.230
but we also have a
probability distribution

00:08:36.230 --> 00:08:39.980
for what the true
position might be.

00:08:39.980 --> 00:08:43.240
And once we have such
a posterior probability

00:08:43.240 --> 00:08:48.340
distribution, we can find an
interval with the property

00:08:48.340 --> 00:08:55.020
that 95% of the probability
is inside that interval.

00:08:55.020 --> 00:08:58.860
In other words, we construct
an interval with the property

00:08:58.860 --> 00:09:04.235
that the probability that X
of t belongs to the interval.

00:09:06.790 --> 00:09:09.440
(Now, we're talking about
posterior probabilities.

00:09:09.440 --> 00:09:15.660
So it is a posterior
probability, given the data.)

00:09:15.660 --> 00:09:20.670
This probability
is, let's say, 0.95.

00:09:20.670 --> 00:09:23.970
Such an interval gives useful
information besides a point

00:09:23.970 --> 00:09:28.470
estimate, it also gives us
a range of possible values.

00:09:28.470 --> 00:09:31.760
And outside this range,
it is quite unlikely

00:09:31.760 --> 00:09:34.770
to have the true
trajectory be out there.

00:09:34.770 --> 00:09:38.800
So here we're showing some
confidence intervals that

00:09:38.800 --> 00:09:42.830
apply to different times,
and they're pretty narrow,

00:09:42.830 --> 00:09:44.260
they're pretty small.

00:09:44.260 --> 00:09:48.370
And they indicate,
they give us confidence

00:09:48.370 --> 00:09:51.230
that we have pretty
accurate estimates

00:09:51.230 --> 00:09:53.940
of the true trajectory.

00:09:53.940 --> 00:09:55.760
This kind of
confidence intervals

00:09:55.760 --> 00:09:59.060
that we have discussed in
the context of this examples

00:09:59.060 --> 00:10:03.580
are called Bayesian
confidence intervals.

00:10:03.580 --> 00:10:06.540
And they're very useful when
you report your results,

00:10:06.540 --> 00:10:08.560
to not just give
point estimates,

00:10:08.560 --> 00:10:13.540
but to also provide
confidence intervals.

00:10:13.540 --> 00:10:16.150
Coming back to the
bigger picture, what

00:10:16.150 --> 00:10:18.490
happened in this
particular example

00:10:18.490 --> 00:10:22.100
is quite indicative of many
real world applications.

00:10:22.100 --> 00:10:24.870
One starts with a
linear model, in which

00:10:24.870 --> 00:10:29.590
we have a linear relation
between the variables that

00:10:29.590 --> 00:10:32.180
are unknown and
the observations,

00:10:32.180 --> 00:10:35.180
but where also the observations
are corrupted by noise.

00:10:35.180 --> 00:10:38.495
One makes certain normality
and independence assumptions.

00:10:38.495 --> 00:10:41.500
And as long as the modeling
has been done carefully

00:10:41.500 --> 00:10:44.050
and the assumptions
are justified, then

00:10:44.050 --> 00:10:47.560
by carrying out this
procedure, one usually

00:10:47.560 --> 00:10:52.990
obtains estimates that are very
helpful and very informative.