WEBVTT
00:00:00.500 --> 00:00:04.240
As a preparation for more
complex and more difficult
00:00:04.240 --> 00:00:08.580
models, we will start by
looking at the simplest model
00:00:08.580 --> 00:00:12.330
that there is, that
involves a linear relation
00:00:12.330 --> 00:00:15.060
and normal random variables.
00:00:15.060 --> 00:00:17.930
The specifics of the
model are as follows.
00:00:17.930 --> 00:00:21.490
There's an unknown parameter
modeled as a random variable,
00:00:21.490 --> 00:00:23.730
Theta, that we wish to estimate.
00:00:23.730 --> 00:00:28.250
What we have in our hands is
Theta plus some additive noise,
00:00:28.250 --> 00:00:31.790
W. And this sum is
our observation,
00:00:31.790 --> 00:00:34.390
X. The assumptions
that we make are
00:00:34.390 --> 00:00:37.470
that Theta and W are
normal random variables.
00:00:37.470 --> 00:00:39.520
And to keep the
calculations simple,
00:00:39.520 --> 00:00:42.770
we assume that they're standard
normal random variables.
00:00:42.770 --> 00:00:45.130
Furthermore, we assume
that Theta and W
00:00:45.130 --> 00:00:48.380
are independent of each other.
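The model just described is easy to simulate; here is a minimal Python sketch (the variable names are my own, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Theta and W are independent standard normal random variables.
theta = rng.standard_normal(n)
w = rng.standard_normal(n)

# The observation is their sum, X = Theta + W.
x = theta + w

# Since Theta and W are independent, X has mean 0 and variance 1 + 1 = 2.
print(round(x.mean(), 2), round(x.var(), 2))
```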
00:00:48.380 --> 00:00:54.090
According to the Bayesian
program, inference about Theta
00:00:54.090 --> 00:00:58.450
is essentially the calculation
of the posterior distribution
00:00:58.450 --> 00:01:01.710
of Theta if I tell you that
the observation, capital
00:01:01.710 --> 00:01:05.209
X takes on a specific
value little x.
00:01:05.209 --> 00:01:07.620
To calculate this
posterior distribution,
00:01:07.620 --> 00:01:11.880
we invoke the appropriate
form of the Bayes rule.
00:01:11.880 --> 00:01:13.539
We have the prior of Theta.
00:01:13.539 --> 00:01:15.170
It's a standard normal.
00:01:15.170 --> 00:01:18.280
Now we need to figure out
the conditional distribution
00:01:18.280 --> 00:01:21.530
of X given Theta.
00:01:21.530 --> 00:01:24.010
What is it?
00:01:24.010 --> 00:01:28.060
If I tell you that the random
variable, capital Theta,
00:01:28.060 --> 00:01:32.190
takes on a specific
value, little theta, then
00:01:32.190 --> 00:01:35.650
in that conditional
universe, our observation
00:01:35.650 --> 00:01:40.890
is going to be that specific
value of Theta, which
00:01:40.890 --> 00:01:45.090
is our little theta,
plus the noise term,
00:01:45.090 --> 00:01:48.100
capital W. This is the
relation that holds
00:01:48.100 --> 00:01:49.860
in the conditional
universe, where
00:01:49.860 --> 00:01:52.970
we are told the value of Theta.
00:01:52.970 --> 00:01:55.710
Now W is independent of Theta.
00:01:55.710 --> 00:01:59.380
So even though I have told
you the value of Theta,
00:01:59.380 --> 00:02:01.780
the distribution of
W does not change.
00:02:01.780 --> 00:02:04.160
It is still a standard normal.
00:02:04.160 --> 00:02:09.139
So X is a standard normal
plus a constant theta.
00:02:09.139 --> 00:02:12.110
What that does is
that it changes
00:02:12.110 --> 00:02:14.920
the mean of the
normal distribution,
00:02:14.920 --> 00:02:18.420
but it will still be a
normal random variable.
00:02:18.420 --> 00:02:21.030
So in this conditional
universe, X
00:02:21.030 --> 00:02:23.790
is going to be a
normal random variable
00:02:23.790 --> 00:02:26.980
with mean equal to
theta, and with variance
00:02:26.980 --> 00:02:31.190
equal to the variance of
W, which is equal to 1.
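This conditional claim can be checked by simulation; a small Python sketch, with a hypothetical fixed value of theta:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_fixed = 1.7  # a hypothetical fixed value of little theta

# In the conditional universe where Theta = theta, X = theta + W,
# with W still a standard normal (by independence of Theta and W).
w = rng.standard_normal(200_000)
x_cond = theta_fixed + w

# So X given Theta = theta is normal with mean theta and variance 1.
print(round(x_cond.mean(), 2), round(x_cond.var(), 2))
```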
00:02:31.190 --> 00:02:34.550
So now, we know what
this distribution is,
00:02:34.550 --> 00:02:38.530
and we can move on with the
calculation of the posterior.
00:02:38.530 --> 00:02:44.690
So we have the denominator
term, which I'm writing here.
00:02:44.690 --> 00:02:47.860
And then we have the
density of Theta.
00:02:47.860 --> 00:02:50.079
Since it is a
standard normal, it
00:02:50.079 --> 00:02:54.300
takes the form of a constant.
00:02:54.300 --> 00:02:58.430
We do not really care about
the value of that constant.
00:02:58.430 --> 00:03:02.710
What we really care about is
the term in the exponent.
00:03:02.710 --> 00:03:07.010
And then we have the
conditional density of X
00:03:07.010 --> 00:03:11.180
given Theta, which is a
normal with these parameters.
00:03:11.180 --> 00:03:17.510
And therefore, it takes the
form c e to the minus 1/2.
00:03:17.510 --> 00:03:20.079
It's a density in x.
00:03:20.079 --> 00:03:24.100
And so, up here, we have x
minus the mean of that density.
00:03:24.100 --> 00:03:29.370
But the mean is equal to theta.
So we get x minus theta, squared.
00:03:29.370 --> 00:03:31.120
And this is the final form.
00:03:31.120 --> 00:03:36.440
Now what we notice here is that
we have a few constant terms.
00:03:36.440 --> 00:03:41.310
Another term that depends on x,
and then a quadratic in theta.
00:03:41.310 --> 00:03:46.360
So we can write all this as
some function of x, and then
00:03:46.360 --> 00:03:50.930
e to the negative of
some quadratic in theta.
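Completing the square makes that quadratic explicit (a step worth writing out, using the two standard-normal densities from above):

```latex
f_{\Theta\mid X}(\theta\mid x)
  \propto e^{-\theta^2/2}\, e^{-(x-\theta)^2/2}
  = e^{-\frac{1}{2}\left(2\theta^2 - 2x\theta + x^2\right)}
  \propto e^{-\left(\theta - x/2\right)^2}
```

which, as a function of theta, has the form of a normal density with mean x/2 and variance 1/2.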
00:03:54.640 --> 00:03:59.820
Now when we're doing inference,
we are given the value of X.
00:03:59.820 --> 00:04:04.980
So let us fix a particular
value of little x
00:04:04.980 --> 00:04:08.770
and concentrate on the
dependence on theta.
00:04:08.770 --> 00:04:12.500
So with x being fixed,
this is just a constant.
00:04:12.500 --> 00:04:16.000
And as a function of theta,
it's e to the minus something
00:04:16.000 --> 00:04:17.670
quadratic in theta.
00:04:17.670 --> 00:04:23.130
And we recognize that
this is a normal PDF.
00:04:23.130 --> 00:04:25.380
So we conclude that the
posterior distribution
00:04:25.380 --> 00:04:30.690
of Theta, given our
observation, is normal.
00:04:30.690 --> 00:04:37.260
Since it is normal, the expected
value of this conditional PDF
00:04:37.260 --> 00:04:40.620
will be the same as the
peak of that PDF.
00:04:40.620 --> 00:04:44.110
And this would be our
point estimate of Theta
00:04:44.110 --> 00:04:45.270
in particular,
00:04:45.270 --> 00:04:48.610
if we use either the
MAP-- Maximum A Posteriori
00:04:48.610 --> 00:04:52.090
Probability-- estimator or the
least mean squares estimator,
00:04:52.090 --> 00:04:54.240
which is defined as the
conditional expectation
00:04:54.240 --> 00:04:57.130
of Theta, given the
observation that we have made.
00:04:57.130 --> 00:04:59.480
So this conditional
expectation is just
00:04:59.480 --> 00:05:01.680
the mean of this
posterior distribution.
00:05:01.680 --> 00:05:05.490
It is also the peak of that
posterior distribution.
00:05:05.490 --> 00:05:08.370
So let us find what the peak is.
00:05:08.370 --> 00:05:12.490
To find the peak, we focus
on the exponent term, which
00:05:12.490 --> 00:05:18.420
is, ignoring the minus
sign, this one.
00:05:22.750 --> 00:05:25.420
And to find the peak
of the distribution,
00:05:25.420 --> 00:05:31.910
we need to find the place where
this exponent term is smallest.
00:05:31.910 --> 00:05:33.865
To find out when this
term is smallest,
00:05:33.865 --> 00:05:37.840
we take its derivative
with respect to theta
00:05:37.840 --> 00:05:39.390
and set it equal to 0.
00:05:39.390 --> 00:05:42.640
The derivative of
this term is theta.
00:05:42.640 --> 00:05:48.380
The derivative of this
term is theta minus x.
00:05:48.380 --> 00:05:51.409
We set this to 0.
00:05:51.409 --> 00:05:56.930
And when we solve this equation,
we find 2 theta equal to x.
00:05:56.930 --> 00:06:00.375
Therefore, theta
is equal to x/2.
00:06:00.375 --> 00:06:02.860
And so, we conclude
from here that the peak
00:06:02.860 --> 00:06:07.930
of the distribution occurs
when theta is equal to x/2.
00:06:07.930 --> 00:06:13.010
And this is our
estimate of theta.
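The location of the peak can also be confirmed numerically; here is a quick Python check over a grid of theta values (the observed value 1.2 is hypothetical):

```python
import numpy as np

x_obs = 1.2  # a hypothetical observed value of little x

# Log of the posterior, up to a constant: -(theta^2)/2 - (x - theta)^2/2.
theta_grid = np.linspace(-4.0, 4.0, 8001)
log_post = -0.5 * theta_grid**2 - 0.5 * (x_obs - theta_grid) ** 2

# The maximizer should be (approximately) x/2.
theta_map = theta_grid[np.argmax(log_post)]
print(round(theta_map, 3))
```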
00:06:13.010 --> 00:06:16.220
So our estimate
takes into account
00:06:16.220 --> 00:06:20.350
that we believe that
theta is 0 on the average.
00:06:20.350 --> 00:06:24.280
But also takes into account the
observation that we have made,
00:06:24.280 --> 00:06:28.890
and comes up with a value that's
in between our prior mean,
00:06:28.890 --> 00:06:33.870
which was 0, and the
observation, which is little x.
00:06:33.870 --> 00:06:36.800
So this is what
the estimates are.
00:06:36.800 --> 00:06:40.750
If we want to talk about
estimators, which are now
00:06:40.750 --> 00:06:43.520
random variables,
what would they be?
00:06:43.520 --> 00:06:46.030
The estimator is
a random variable
00:06:46.030 --> 00:06:49.420
that takes this value
whenever capital X takes
00:06:49.420 --> 00:06:51.540
the value of little x.
00:06:51.540 --> 00:06:53.530
Therefore, it's the
random variable,
00:06:53.530 --> 00:06:56.940
which is equal to capital X/2.
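One way to see what the estimator X/2 buys us is through its mean squared error; a Monte Carlo sketch (my own check, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

theta = rng.standard_normal(n)
w = rng.standard_normal(n)
x = theta + w

# The MAP/LMS estimator is the random variable X/2.
est = x / 2

# Its mean squared error, E[(Theta - X/2)^2], equals the posterior
# variance, which for this model is 1/2.
print(round(np.mean((est - theta) ** 2), 2))
```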
00:06:56.940 --> 00:06:59.680
This is a relation
between random variables.
00:06:59.680 --> 00:07:02.290
This is a corresponding
relation between numbers
00:07:02.290 --> 00:07:06.590
if you're given a specific
value for little x.
00:07:06.590 --> 00:07:10.020
How special is this example?
00:07:10.020 --> 00:07:14.900
It turns out that the same
structure of the solution
00:07:14.900 --> 00:07:19.000
shows up even if we assume
that Theta and W are
00:07:19.000 --> 00:07:21.300
independent normal
random variables,
00:07:21.300 --> 00:07:24.330
but with some general
means and variances.
00:07:24.330 --> 00:07:26.480
You should be able
to verify on your own
00:07:26.480 --> 00:07:30.150
by repeating the calculations
that we just carried out
00:07:30.150 --> 00:07:35.720
that the posterior distribution
of Theta will still be normal.
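For the general case, one can check that the posterior mean becomes a precision-weighted combination of the prior mean and the observation; a sketch under hypothetical parameter values (not the lecture's standard normals):

```python
import numpy as np

# Hypothetical general parameters:
mu0, var0 = 2.0, 4.0   # mean and variance of Theta
muw, varw = 0.0, 1.0   # mean and variance of the noise W
x_obs = 5.0            # a hypothetical observation

# Precision-weighted posterior mean, obtained by completing the square
# in the general version of the calculation above.
theta_hat = (mu0 / var0 + (x_obs - muw) / varw) / (1 / var0 + 1 / varw)

# Numerical check: maximize the log-posterior over a fine grid.
grid = np.linspace(-10.0, 10.0, 200_001)
log_post = (-((grid - mu0) ** 2) / (2 * var0)
            - ((x_obs - grid - muw) ** 2) / (2 * varw))
print(round(theta_hat, 3), round(grid[np.argmax(log_post)], 3))
```

Note that theta_hat is a linear function of x_obs, which is the linearity of the estimator mentioned at the end of the lecture.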
00:07:35.720 --> 00:07:39.390
And since it is normal, the
peak of the distribution
00:07:39.390 --> 00:07:41.550
is the same as the
expected value.
00:07:41.550 --> 00:07:45.320
So the expected value, or
least mean squares estimator,
00:07:45.320 --> 00:07:48.330
coincides with the maximum
a posteriori probability
00:07:48.330 --> 00:07:49.550
estimator.
00:07:49.550 --> 00:07:52.940
And finally, although
this formula will not
00:07:52.940 --> 00:07:56.760
be exactly true, there
will be a similar formula
00:07:56.760 --> 00:07:59.130
for the estimator,
namely the estimator
00:07:59.130 --> 00:08:04.200
will turn out to be a linear
function of the measurements.
00:08:04.200 --> 00:08:06.830
We will see that these
conclusions are actually
00:08:06.830 --> 00:08:09.230
even more general than that.
00:08:09.230 --> 00:08:12.730
And this is what makes
it very appealing
00:08:12.730 --> 00:08:17.840
to work with normal random
variables and linear relations.