WEBVTT
00:00:00.806 --> 00:00:04.200
Let us now come back to the
trajectory estimation problem
00:00:04.200 --> 00:00:06.250
that we introduced earlier.
00:00:06.250 --> 00:00:09.630
We have an object
that moves vertically.
00:00:09.630 --> 00:00:14.720
At any given time t, the height
at which the object is found
00:00:14.720 --> 00:00:17.350
is equal to this expression.
00:00:17.350 --> 00:00:19.950
It corresponds to the
following-- the object starts
00:00:19.950 --> 00:00:22.940
at time 0, at some
initial height Theta0,
00:00:22.940 --> 00:00:25.570
it has an initial
velocity of Theta1,
00:00:25.570 --> 00:00:27.980
but also has a
certain acceleration.
00:00:27.980 --> 00:00:29.890
And if Theta2 is
negative, this will
00:00:29.890 --> 00:00:31.750
be a downwards
acceleration, which
00:00:31.750 --> 00:00:36.360
means that the object eventually
will turn and start going down.
00:00:36.360 --> 00:00:39.850
So this is a typical trajectory
of such an object, where
00:00:39.850 --> 00:00:44.010
here we're plotting the
height as a function of time.
00:00:44.010 --> 00:00:47.810
However, the Thetas
are unknown and they
00:00:47.810 --> 00:00:51.120
are random-- we do not
know what they are.
00:00:51.120 --> 00:00:53.745
So this blue curve
is just a simulation
00:00:53.745 --> 00:00:57.910
where we drew values for those
random variables at random.
00:00:57.910 --> 00:00:59.640
But if we were to
simulate again,
00:00:59.640 --> 00:01:02.380
we might obtain a somewhat
different blue curve,
00:01:02.380 --> 00:01:06.250
because the values of the Thetas
might have been different.
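
NOTE
A minimal Python sketch of this simulation. It assumes the quadratic form Theta0 + Theta1*t + Theta2*t**2 for the height (the exact expression is on the slide; a t**2/2 convention would only rescale Theta2). It also assumes the Thetas are independent standard normals. The function names, seeds, and time grid are hypothetical.
```python
import random
def height(t, th0, th1, th2):
    # Assumed trajectory form (hypothetical): th0 + th1*t + th2*t**2
    return th0 + th1 * t + th2 * t ** 2
def simulate_trajectory(times, sd=1.0, rng=None):
    # Draw the Thetas as independent N(0, sd**2) values, then evaluate the curve
    rng = rng or random.Random()
    th0, th1, th2 = (rng.gauss(0.0, sd) for _ in range(3))
    return (th0, th1, th2), [height(t, th0, th1, th2) for t in times]
times = [0.1 * k for k in range(51)]  # hypothetical time grid on [0, 5]
thetas_a, curve_a = simulate_trajectory(times, rng=random.Random(1))
thetas_b, curve_b = simulate_trajectory(times, rng=random.Random(2))
# Two runs draw different Thetas, hence two different "blue curves"
```
Rerunning with a different seed plays the role of simulating again.
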
00:01:06.250 --> 00:01:09.830
We do not observe the
true trajectory directly.
00:01:09.830 --> 00:01:13.610
What we do observe is
certain data points.
00:01:13.610 --> 00:01:14.820
What are they?
00:01:14.820 --> 00:01:19.000
At certain times t_i
we make a measurement
00:01:19.000 --> 00:01:23.030
of the height of the object,
except that this measurement
00:01:23.030 --> 00:01:26.120
is corrupted by
some additive noise.
00:01:26.120 --> 00:01:28.440
This is the model that
we introduced earlier.
00:01:28.440 --> 00:01:31.340
And our assumptions were that
all of the random variables
00:01:31.340 --> 00:01:36.020
involved-- the Thetas and the
W's-- were normal with 0 mean
00:01:36.020 --> 00:01:38.539
and were also independent.
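
NOTE
A sketch of how such data could be generated. It assumes the same quadratic trajectory form as above, with each measurement X_i equal to the true height at t_i plus independent zero-mean normal noise W_i. The function name, seed, and measurement times are hypothetical.
```python
import random
def measure(times, th0, th1, th2, noise_sd=1.0, rng=None):
    # X_i = th0 + th1*t_i + th2*t_i**2 + W_i with W_i i.i.d. N(0, noise_sd**2)
    rng = rng or random.Random()
    return [th0 + th1 * t + th2 * t ** 2 + rng.gauss(0.0, noise_sd) for t in times]
rng = random.Random(0)
# Thetas: independent zero-mean normals, independent of the noise
th0, th1, th2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
t_obs = [0.5, 1.0, 1.5, 2.0]  # hypothetical measurement times t_i
x_obs = measure(t_obs, th0, th1, th2, rng=rng)
```
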
00:01:38.539 --> 00:01:43.390
In that case, we saw that
maximizing the posterior
00:01:43.390 --> 00:01:49.000
distribution of the Thetas
after taking logarithms
00:01:49.000 --> 00:01:52.690
amounted to minimizing
this quadratic function
00:01:52.690 --> 00:01:54.039
of the Thetas.
00:01:54.039 --> 00:01:57.370
So once we have some data
available in our hands,
00:01:57.370 --> 00:02:00.500
we look at this expression
as a function of the Thetas
00:02:00.500 --> 00:02:04.270
and find the Thetas that
are as good as possible
00:02:04.270 --> 00:02:06.180
in terms of this criterion.
00:02:06.180 --> 00:02:10.360
And this is the MAP methodology
for this particular example.
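
NOTE
For reference, the quadratic criterion can be written as a function. This sketch assumes the same quadratic trajectory form, independent zero-mean normal priors with standard deviations prior_sds for the three Thetas, and i.i.d. zero-mean normal noise with standard deviation noise_sd. It returns the negative log of the posterior up to an additive constant, so the MAP estimate is its minimizer. The names are hypothetical.
```python
def map_objective(theta, times, xs, prior_sds, noise_sd):
    # Negative log-posterior of (th0, th1, th2), dropping terms that do not depend on theta
    th0, th1, th2 = theta
    # Prior terms: theta_j**2 / (2 sigma_j**2) from the zero-mean normal priors
    prior = sum(th ** 2 / (2 * s ** 2) for th, s in zip(theta, prior_sds))
    # Likelihood terms: squared residuals of the (assumed) quadratic trajectory
    fit = sum((x - th0 - th1 * t - th2 * t ** 2) ** 2 for t, x in zip(times, xs))
    return prior + fit / (2 * noise_sd ** 2)
```
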
00:02:10.360 --> 00:02:12.490
Now, for the purposes
of this illustration,
00:02:12.490 --> 00:02:16.430
actually, we will change our
assumptions a little bit.
00:02:16.430 --> 00:02:17.850
They will be as follows.
00:02:17.850 --> 00:02:22.300
Regarding the acceleration, we
will take it to be a constant.
00:02:22.300 --> 00:02:24.240
The acceleration
term often has to do
00:02:24.240 --> 00:02:26.650
with gravitational
effects which are known,
00:02:26.650 --> 00:02:28.860
so we will treat
Theta2 as a constant.
00:02:28.860 --> 00:02:30.770
And that means that
there's no point
00:02:30.770 --> 00:02:34.060
in having a prior
distribution for Theta2.
00:02:34.060 --> 00:02:36.860
So this term here,
which originated
00:02:36.860 --> 00:02:41.570
from the prior distribution of
Theta2, is going to disappear.
00:02:41.570 --> 00:02:45.760
We will take the variances of
these basic random variables
00:02:45.760 --> 00:02:47.310
to be the same.
00:02:47.310 --> 00:02:49.490
And because of this,
these constants
00:02:49.490 --> 00:02:52.030
here will all be the same.
00:02:52.030 --> 00:02:55.560
Therefore, we can take them
outside of this expression,
00:02:55.560 --> 00:02:58.670
and as a common positive factor,
they do not affect the minimization.
00:02:58.670 --> 00:03:02.220
So we can remove them
from the picture.
00:03:02.220 --> 00:03:05.330
The factor of 1/2 can
also be removed similarly.
00:03:05.330 --> 00:03:08.510
It does not affect
the minimization.
00:03:08.510 --> 00:03:12.700
Finally, just in order to
get a nicer illustration,
00:03:12.700 --> 00:03:15.920
instead of taking 0
means, we're assuming
00:03:15.920 --> 00:03:19.620
that the initial position
has a mean of 200.
00:03:19.620 --> 00:03:22.560
So we're starting
somewhere around here.
00:03:22.560 --> 00:03:26.230
And furthermore, the initial
velocity has a mean of 50.
00:03:26.230 --> 00:03:29.920
So we expect the object
to start moving upwards.
00:03:29.920 --> 00:03:32.650
How does this change
the formulation?
00:03:32.650 --> 00:03:35.880
Well, remember that
this term and this term
00:03:35.880 --> 00:03:41.480
originated from the priors
for Theta0 and Theta1.
00:03:41.480 --> 00:03:46.380
If we now change the means,
the priors will change.
00:03:46.380 --> 00:03:50.600
And here is what happens: if you recall
the formula for the normal PDF
00:03:50.600 --> 00:03:54.300
and how the mean enters it,
then after you take logarithms,
00:03:54.300 --> 00:03:57.010
you see that instead
of theta0 inside this square,
00:03:57.010 --> 00:04:02.980
you should have theta0 minus
the mean of Theta0.
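
NOTE
A quick numerical check of this step. For a normal density with mean mu and standard deviation sd, minus the log of the PDF equals (theta - mu)**2 / (2 sd**2) plus a constant that does not depend on theta. The mean of 200 matches the prior mean used in this example; the standard deviation of 10 and the function names are hypothetical.
```python
import math
from statistics import NormalDist
def neg_log_pdf(theta, mean, sd):
    # -log f(theta), where f is the N(mean, sd**2) density
    return -math.log(NormalDist(mean, sd).pdf(theta))
def centered_square(theta, mean, sd):
    # The only theta-dependent part: (theta - mean)**2 / (2 sd**2)
    return (theta - mean) ** 2 / (2 * sd ** 2)
mean, sd = 200.0, 10.0  # prior mean from the example; sd is hypothetical
const = math.log(sd * math.sqrt(2 * math.pi))
gaps = [neg_log_pdf(th, mean, sd) - centered_square(th, mean, sd) for th in (150.0, 200.0, 260.0)]
# Every gap equals the same constant, so it drops out of the minimization
```
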
00:04:02.980 --> 00:04:06.640
And this leads us to the
following formulation.
00:04:06.640 --> 00:04:10.630
So this is the formulation
that we will consider.
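
NOTE
A sketch of this formulation. It assumes the same quadratic form with Theta2 fixed at a known value theta2, prior means 200 and 50 for Theta0 and Theta1, and one common variance sigma**2 for Theta0, Theta1, and the noise. It also checks the earlier claim that dropping the common factor 1/(2 sigma**2) only rescales the criterion, so the minimizer is unchanged. Names, data, and numbers other than 200 and 50 are hypothetical.
```python
def full_objective(th0, th1, times, xs, theta2, sigma, mu0=200.0, mu1=50.0):
    # Negative log-posterior (up to a constant) with a common variance sigma**2
    resid = sum((x - th0 - th1 * t - theta2 * t ** 2) ** 2 for t, x in zip(times, xs))
    return ((th0 - mu0) ** 2 + (th1 - mu1) ** 2 + resid) / (2 * sigma ** 2)
def simplified_objective(th0, th1, times, xs, theta2, mu0=200.0, mu1=50.0):
    # The same criterion with the positive factor 1/(2 sigma**2) removed
    resid = sum((x - th0 - th1 * t - theta2 * t ** 2) ** 2 for t, x in zip(times, xs))
    return (th0 - mu0) ** 2 + (th1 - mu1) ** 2 + resid
times, xs, theta2, sigma = [0.0, 1.0, 2.0], [205.0, 240.0, 280.0], -5.0, 3.0  # hypothetical data
ratio = full_objective(190.0, 45.0, times, xs, theta2, sigma) / simplified_objective(190.0, 45.0, times, xs, theta2)
```
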
00:04:10.630 --> 00:04:14.030
We obtain these data points,
and for these particular data
00:04:14.030 --> 00:04:17.680
points and for known times at
which the measurements were
00:04:17.680 --> 00:04:22.260
taken, we put these numbers
into this minimization, carried
00:04:22.260 --> 00:04:26.450
it out numerically, and
this is what we got.
00:04:26.450 --> 00:04:29.680
We got estimates for the
different parameters.
00:04:29.680 --> 00:04:33.640
And with these estimates,
we can use this expression
00:04:33.640 --> 00:04:36.240
to construct an
estimated trajectory.
00:04:36.240 --> 00:04:39.870
And the estimated trajectory
is given by the red curve.
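
NOTE
Because the criterion is quadratic in (theta0, theta1), the numerical minimization can be replaced by setting its gradient to zero, which gives a 2x2 linear system. A minimal sketch, under the same assumptions as before (quadratic form, known theta2, equal variances, prior means 200 and 50). The function names and data are hypothetical; the lecture's data are not reproduced here.
```python
def map_estimate(times, xs, theta2, mu0=200.0, mu1=50.0):
    # Remove the known acceleration term: y_i = x_i - theta2 * t_i**2
    ys = [x - theta2 * t ** 2 for t, x in zip(times, xs)]
    n, s1, s2 = len(times), sum(times), sum(t * t for t in times)
    # Zero gradient: [[1+n, s1], [s1, 1+s2]] @ [th0, th1] = [mu0 + sum(y), mu1 + sum(t*y)]
    a, b, d = 1.0 + n, s1, 1.0 + s2
    r0 = mu0 + sum(ys)
    r1 = mu1 + sum(t * y for t, y in zip(times, ys))
    det = a * d - b * b
    return (d * r0 - b * r1) / det, (a * r1 - b * r0) / det
def estimated_height(t, th0_hat, th1_hat, theta2):
    # Plug the estimates into the trajectory expression to get the "red curve"
    return th0_hat + th1_hat * t + theta2 * t ** 2
times, xs, theta2 = [0.0, 1.0, 2.0], [205.0, 240.0, 280.0], -5.0  # hypothetical data
th0_hat, th1_hat = map_estimate(times, xs, theta2)
red_curve = [estimated_height(0.5 * k, th0_hat, th1_hat, theta2) for k in range(21)]
```
With no data, the estimate falls back to the prior means (200, 50).
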
00:04:39.870 --> 00:04:43.500
It seems to be doing
a somewhat reasonable job,
00:04:43.500 --> 00:04:44.720
but not quite.
00:04:44.720 --> 00:04:49.120
The distance between these two
curves is quite substantial.
00:04:49.120 --> 00:04:50.970
How could we do a little better?
00:04:50.970 --> 00:04:54.659
Why is it that we're
not doing very well?
00:04:54.659 --> 00:04:57.210
Let's think intuitively.
00:04:57.210 --> 00:05:01.200
One of the parameters we
wish to estimate is Theta1.
00:05:01.200 --> 00:05:03.280
And Theta1 is a velocity.
00:05:03.280 --> 00:05:05.590
Now, all of our measurements
are concentrated
00:05:05.590 --> 00:05:07.830
at pretty much the same time.
00:05:07.830 --> 00:05:11.430
But if you measure an object
only at a certain time,
00:05:11.430 --> 00:05:14.490
it is very difficult to
estimate its velocity.
00:05:14.490 --> 00:05:16.820
A much better idea
would be to try
00:05:16.820 --> 00:05:19.900
to measure the position of
the object at different times
00:05:19.900 --> 00:05:23.700
and use that information
to estimate velocity.
00:05:23.700 --> 00:05:27.300
So instead of
taking all the measurements
00:05:27.300 --> 00:05:31.490
around the initial time,
let us have five measurements
00:05:31.490 --> 00:05:34.980
in the beginning and five
measurements towards the end.
00:05:34.980 --> 00:05:37.380
The total number of
measurements in this example
00:05:37.380 --> 00:05:40.280
is the same as in
the previous example.
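
NOTE
This intuition can be checked through the posterior variance of Theta1. Under the assumptions used in this example (Theta2 known; Theta0, Theta1, and the W_i all with a common variance sigma**2), the posterior covariance of (Theta0, Theta1) is sigma**2 times the inverse of I + A^T A, where A has rows [1, t_i]. This covariance does not depend on the measured values, only on the times. The two sets of ten measurement times below are hypothetical.
```python
def posterior_var_theta1(times, sigma=1.0):
    # Posterior covariance = sigma**2 * inv([[1+n, s1], [s1, 1+s2]]); return its (1,1) entry
    n, s1, s2 = len(times), sum(times), sum(t * t for t in times)
    det = (1.0 + n) * (1.0 + s2) - s1 * s1
    return sigma ** 2 * (1.0 + n) / det
clustered = [0.05 * k for k in range(10)]  # all ten times near t = 0
spread = [0.05 * k for k in range(5)] + [10.0 - 0.05 * k for k in range(5)]  # five early, five late
var_clustered = posterior_var_theta1(clustered)  # roughly 0.8: barely below the prior variance of 1
var_spread = posterior_var_theta1(spread)  # roughly 0.004
```
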
00:05:40.280 --> 00:05:44.780
And once more, we generate
a simulated trajectory
00:05:44.780 --> 00:05:47.010
according to the
probability distributions
00:05:47.010 --> 00:05:48.409
that we are assuming.
00:05:48.409 --> 00:05:51.270
Then we generate data
according to this model
00:05:51.270 --> 00:05:54.909
and we wish to estimate
this trajectory.
00:05:54.909 --> 00:05:58.380
We take the data, plug them
into this minimization,
00:05:58.380 --> 00:06:02.390
carry it out numerically,
and this is what we obtain.
00:06:02.390 --> 00:06:06.160
So we see that here we
are doing a lot better.
00:06:06.160 --> 00:06:08.510
The estimated
trajectory is quite
00:06:08.510 --> 00:06:12.230
close to the unknown
blue trajectory,
00:06:12.230 --> 00:06:17.910
even though the data seems
to be scattered quite a bit.
00:06:17.910 --> 00:06:19.440
This is a very nice property.
00:06:19.440 --> 00:06:23.290
But is it just an accident
of this numerical experiment?
00:06:23.290 --> 00:06:26.730
Or, to put it
differently, once you
00:06:26.730 --> 00:06:29.510
get your estimated
trajectory, yes, it
00:06:29.510 --> 00:06:32.380
is true that it is close
to the blue trajectory,
00:06:32.380 --> 00:06:35.980
but you do not necessarily
know that fact.
00:06:35.980 --> 00:06:38.190
It is one thing to
have an estimate that
00:06:38.190 --> 00:06:43.310
is close to the true value,
and it's a different thing
00:06:43.310 --> 00:06:45.450
to have an estimate
and to know
00:06:45.450 --> 00:06:48.380
that it is close
to the true value.
00:06:48.380 --> 00:06:52.170
So how could we get some
guarantees that, indeed, this
00:06:52.170 --> 00:06:56.520
is the case, that we
have good estimates?
00:06:56.520 --> 00:06:59.070
Here's how it goes.
00:06:59.070 --> 00:07:02.420
As we discussed before,
the posterior distribution
00:07:02.420 --> 00:07:06.670
of the Thetas given
the data is normal.
00:07:06.670 --> 00:07:10.940
And for similar reasons,
the posterior distribution
00:07:10.940 --> 00:07:14.400
of this quantity, which
is the true position--
00:07:14.400 --> 00:07:18.650
what we denoted by X of t--
the posterior distribution of X
00:07:18.650 --> 00:07:21.890
of t is also normal.
00:07:21.890 --> 00:07:27.040
And in fact, what we obtain from
this diagram at any given
00:07:27.040 --> 00:07:30.430
time is the maximum
a posteriori probability
00:07:30.430 --> 00:07:35.530
estimate of the position
X of t at that time.
00:07:35.530 --> 00:07:39.270
However, besides just
this point estimate,
00:07:39.270 --> 00:07:41.050
we have additional information.
00:07:44.940 --> 00:07:47.950
We know that the posterior
distribution of X of t
00:07:47.950 --> 00:07:48.810
is normal.
00:07:48.810 --> 00:07:51.110
And so, for example,
at this time,
00:07:51.110 --> 00:07:54.900
this is the peak
of the posterior.
00:07:54.900 --> 00:07:58.430
This is the maximum a
posteriori probability estimate.
00:07:58.430 --> 00:08:04.720
But we are also able to calculate
the variance of this posterior
00:08:04.720 --> 00:08:06.180
distribution.
00:08:06.180 --> 00:08:08.650
This is a calculation
that's a bit complicated
00:08:08.650 --> 00:08:11.370
for the multivariate
case, for the case where
00:08:11.370 --> 00:08:13.390
you have multiple
unknown parameters.
00:08:13.390 --> 00:08:15.300
We will not get into it.
00:08:15.300 --> 00:08:17.580
But we did see
earlier an example
00:08:17.580 --> 00:08:19.950
where we had a single
unknown parameter,
00:08:19.950 --> 00:08:22.360
and in which we were
able to calculate
00:08:22.360 --> 00:08:24.620
the variance of the
posterior distribution.
00:08:24.620 --> 00:08:27.220
So the idea is somewhat similar.
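
NOTE
In the simplified two-parameter setting used here, the calculation is short enough to sketch. Assume Theta2 is known, Theta0, Theta1, and the noise share a variance sigma**2, and the prior means are 200 and 50. Then (Theta0, Theta1) has a normal posterior with precision (I + A^T A)/sigma**2, and X(t) is a linear combination of them, so its posterior mean and variance follow directly. This is an illustration under those assumptions, not the computation behind the lecture's figures; the names and data are hypothetical.
```python
def posterior_of_position(t, times, xs, theta2, sigma=1.0, mu0=200.0, mu1=50.0):
    # Posterior of X(t) = Theta0 + Theta1*t + theta2*t**2, with theta2 known (assumed model)
    ys = [x - theta2 * s ** 2 for s, x in zip(times, xs)]
    n, s1, s2 = len(times), sum(times), sum(s * s for s in times)
    a, b, d = 1.0 + n, s1, 1.0 + s2  # M = I + A^T A = [[a, b], [b, d]]
    det = a * d - b * b
    r0 = mu0 + sum(ys)
    r1 = mu1 + sum(s * y for s, y in zip(times, ys))
    # Posterior mean of (Theta0, Theta1) is inv(M) @ r, which is also the MAP estimate
    m0, m1 = (d * r0 - b * r1) / det, (a * r1 - b * r0) / det
    # Posterior covariance is sigma**2 * inv(M)
    c00, c01, c11 = sigma ** 2 * d / det, -sigma ** 2 * b / det, sigma ** 2 * a / det
    mean = m0 + m1 * t + theta2 * t ** 2
    var = c00 + 2 * t * c01 + t * t * c11  # variance of [1, t] . (Theta0, Theta1)
    return mean, var
prior_mean, prior_var = posterior_of_position(2.0, [], [], -5.0)  # no data: just the prior
post_mean, post_var = posterior_of_position(2.0, [0.0, 1.0, 2.0], [205.0, 240.0, 280.0], -5.0)  # hypothetical data
```
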
00:08:27.220 --> 00:08:30.900
So not only do we have an
estimate for the position
00:08:30.900 --> 00:08:33.409
of the object at
this particular time,
00:08:33.409 --> 00:08:36.230
but we also have a
probability distribution
00:08:36.230 --> 00:08:39.980
for what the true
position might be.
00:08:39.980 --> 00:08:43.240
And once we have such
a posterior probability
00:08:43.240 --> 00:08:48.340
distribution, we can find an
interval with the property
00:08:48.340 --> 00:08:55.020
that 95% of the probability
is inside that interval.
00:08:55.020 --> 00:08:58.860
In other words, we construct
an interval with the property
00:08:58.860 --> 00:09:04.235
that the probability that X
of t belongs to the interval
00:09:06.790 --> 00:09:09.440
(and we're talking about
posterior probabilities here,
00:09:09.440 --> 00:09:15.660
that is, probabilities
given the data)
00:09:15.660 --> 00:09:20.670
is equal to some value
such as 0.95.
00:09:20.670 --> 00:09:23.970
Such an interval gives useful
information beyond a point
00:09:23.970 --> 00:09:28.470
estimate: it also gives us
a range of possible values.
00:09:28.470 --> 00:09:31.760
And it is quite unlikely
that the true
00:09:31.760 --> 00:09:34.770
trajectory lies
outside this range.
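
NOTE
For a normal posterior, the central interval holding 95% of the probability is the posterior mean plus or minus about 1.96 posterior standard deviations. A minimal sketch using Python's statistics.NormalDist. The function name and the example posterior mean and standard deviation are hypothetical.
```python
from statistics import NormalDist
def bayesian_interval(post_mean, post_sd, level=0.95):
    # Central interval of a N(post_mean, post_sd**2) posterior containing `level` of the probability
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return post_mean - z * post_sd, post_mean + z * post_sd
lo, hi = bayesian_interval(280.0, 2.0)  # hypothetical posterior mean and standard deviation
```
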
00:09:34.770 --> 00:09:38.800
So here we're showing some
confidence intervals that
00:09:38.800 --> 00:09:42.830
apply to different times,
and they're pretty narrow,
00:09:42.830 --> 00:09:44.260
they're pretty small.
00:09:44.260 --> 00:09:48.370
And they give us
confidence
00:09:48.370 --> 00:09:51.230
that we have pretty
accurate estimates
00:09:51.230 --> 00:09:53.940
of the true trajectory.
00:09:53.940 --> 00:09:55.760
Confidence intervals
of the kind
00:09:55.760 --> 00:09:59.060
that we have discussed in
the context of this example
00:09:59.060 --> 00:10:03.580
are called Bayesian
confidence intervals.
00:10:03.580 --> 00:10:06.540
And when you report your results,
it is very useful
00:10:06.540 --> 00:10:08.560
not just to give
point estimates,
00:10:08.560 --> 00:10:13.540
but also to provide
confidence intervals.
00:10:13.540 --> 00:10:16.150
Coming back to the
bigger picture, what
00:10:16.150 --> 00:10:18.490
happened in this
particular example
00:10:18.490 --> 00:10:22.100
is quite indicative of many
real-world applications.
00:10:22.100 --> 00:10:24.870
One starts with a
linear model, in which
00:10:24.870 --> 00:10:29.590
we have a linear relation
between the variables that
00:10:29.590 --> 00:10:32.180
are unknown and
the observations,
00:10:32.180 --> 00:10:35.180
and where the observations
are also corrupted by noise.
00:10:35.180 --> 00:10:38.495
One makes certain normality
and independence assumptions.
00:10:38.495 --> 00:10:41.500
And as long as the modeling
has been done carefully
00:10:41.500 --> 00:10:44.050
and the assumptions
are justified, then
00:10:44.050 --> 00:10:47.560
by carrying out this
procedure, one usually
00:10:47.560 --> 00:10:52.990
obtains estimates that are very
helpful and very informative.