WEBVTT

00:00:01.560 --> 00:00:04.260
As we discussed
earlier, even if we

00:00:04.260 --> 00:00:06.390
have multiple
observations, we can still

00:00:06.390 --> 00:00:11.050
find the structure of the best
linear estimator in a fairly

00:00:11.050 --> 00:00:13.650
simple computational
way, by solving

00:00:13.650 --> 00:00:15.470
a system of linear equations.

00:00:15.470 --> 00:00:19.880
But usually, we do not get
nice and simple formulas.

00:00:19.880 --> 00:00:22.010
Here is a nice
example, however, in which

00:00:22.010 --> 00:00:25.430
we will get a simple formula.

00:00:25.430 --> 00:00:27.550
The example is
something that we have

00:00:27.550 --> 00:00:30.930
seen before, in various guises.

00:00:30.930 --> 00:00:34.080
We're trying to estimate a
certain quantity, Theta.

00:00:34.080 --> 00:00:39.060
And what we obtain is multiple,
noisy observations of Theta.

00:00:39.060 --> 00:00:43.800
That is, at each observation,
we see Theta plus a noise term.

00:00:43.800 --> 00:00:46.340
The assumption that
we make is that Theta

00:00:46.340 --> 00:00:49.290
has a prior distribution
with a certain mean

00:00:49.290 --> 00:00:50.820
and a certain variance.

00:00:50.820 --> 00:00:53.950
And the noise terms
are zero mean,

00:00:53.950 --> 00:00:56.750
but they have some variance.

00:00:56.750 --> 00:00:59.060
And the additional
assumption that we make

00:00:59.060 --> 00:01:01.790
is that all of these
random variables

00:01:01.790 --> 00:01:03.860
are independent of each other.

00:01:03.860 --> 00:01:06.080
So the noise terms
are independent

00:01:06.080 --> 00:01:09.850
among themselves, and also,
the noise terms are independent

00:01:09.850 --> 00:01:11.360
of Theta.

00:01:11.360 --> 00:01:14.060
This is the usual
assumption, but actually,

00:01:14.060 --> 00:01:15.640
in the linear
estimation problem,

00:01:15.640 --> 00:01:19.010
we do not need to make an
independence assumption.

00:01:19.010 --> 00:01:21.560
It's enough for our
purposes to just assume

00:01:21.560 --> 00:01:23.890
that they are uncorrelated.

00:01:23.890 --> 00:01:26.320
So we will assume that the
correlation coefficient

00:01:26.320 --> 00:01:32.000
between any two of these random
variables is equal to zero.

00:01:32.000 --> 00:01:37.890
Now we can write down the form
of the mean squared estimation

00:01:37.890 --> 00:01:40.250
error criterion that
we have, and try

00:01:40.250 --> 00:01:42.759
to find good choices
for the coefficients

00:01:42.759 --> 00:01:46.590
to be attached to each
one of the observations.

00:01:46.590 --> 00:01:48.940
However, we're going
to find the solution

00:01:48.940 --> 00:01:51.789
to this problem using
a shortcut that's

00:01:51.789 --> 00:01:55.770
going to bypass all
kinds of computations.

00:01:55.770 --> 00:01:58.680
Here's the trick.

00:01:58.680 --> 00:02:03.840
Let us suppose, in addition,
that these random variables were

00:02:03.840 --> 00:02:06.740
not just uncorrelated,
but independent.

00:02:06.740 --> 00:02:09.628
And that they happen to be
normal random variables.

00:02:12.790 --> 00:02:16.760
This is a problem that
we did study before,

00:02:16.760 --> 00:02:20.590
and we did find the maximum
a posteriori probability

00:02:20.590 --> 00:02:22.990
estimate of Theta.

00:02:22.990 --> 00:02:25.040
Because the
posterior was normal,

00:02:25.040 --> 00:02:28.560
we found that this was also
the conditional expectation

00:02:28.560 --> 00:02:30.450
estimator of Theta.

00:02:30.450 --> 00:02:36.730
And we did find a formula for
it, which took this form.

00:02:36.730 --> 00:02:40.670
This was the form of the
optimal estimate of Theta

00:02:40.670 --> 00:02:44.490
when you obtain values for
the different observations,

00:02:44.490 --> 00:02:47.120
little x_i.
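
NOTE
The slide with the formula is not part of this transcript. As a sketch, the standard result for this model can be written as below.
Assumed notation: sigma_0^2 is the prior variance of Theta, sigma_i^2 is the variance of the i-th noise term,
x_0 is the prior mean, and x_1, ..., x_n are the observed values.
```latex
\hat{\theta} = \frac{\sum_{i=0}^{n} x_i / \sigma_i^2}{\sum_{i=0}^{n} 1 / \sigma_i^2}
```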

00:02:47.120 --> 00:02:49.230
On the other hand, if
you want to translate this

00:02:49.230 --> 00:02:54.350
into random variable notation,
then notice that this is going

00:02:54.350 --> 00:02:57.620
to be a random variable;
this is our estimator.

00:02:57.620 --> 00:03:00.890
It's the conditional
expectation of Theta given X,

00:03:00.890 --> 00:03:03.580
and it's random
because it depends

00:03:03.580 --> 00:03:07.430
on the values of the data that
we see, which are themselves

00:03:07.430 --> 00:03:08.920
random variables.

00:03:08.920 --> 00:03:11.950
On the other hand,
this x0 is actually

00:03:11.950 --> 00:03:14.220
the prior mean of Theta.

00:03:14.220 --> 00:03:16.210
So this is a constant;
it's not random,

00:03:16.210 --> 00:03:20.500
and that's why we keep it
with a lowercase notation.
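
NOTE
A sketch of the same formula in random variable notation, under the assumed symbols sigma_0^2 (prior variance) and sigma_i^2 (noise variances).
The observations X_i are uppercase because they are random; the prior mean x_0 stays lowercase because it is a constant.
```latex
\hat{\Theta} = \mathbf{E}[\Theta \mid X_1, \ldots, X_n] = \frac{x_0 / \sigma_0^2 + \sum_{i=1}^{n} X_i / \sigma_i^2}{\sum_{i=0}^{n} 1 / \sigma_i^2}
```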

00:03:20.500 --> 00:03:23.400
Now, what is interesting
about this form?

00:03:23.400 --> 00:03:27.550
It is a linear function
of the observations.

00:03:27.550 --> 00:03:31.800
And as we have discussed
earlier, if it turns out

00:03:31.800 --> 00:03:35.530
that the conditional expectation
is linear in the observations,

00:03:35.530 --> 00:03:41.079
then this is also the best
possible linear estimator.

00:03:41.079 --> 00:03:44.030
So for the special case
where our random variables are

00:03:44.030 --> 00:03:48.060
independent and normal,
we have a formula

00:03:48.060 --> 00:03:51.600
for the best linear estimator.

00:03:51.600 --> 00:03:55.070
Now what if they are not normal?

00:03:55.070 --> 00:03:57.550
Suppose that they're not normal,

00:03:57.550 --> 00:04:03.040
but that they have the same
means and variances as here,

00:04:03.040 --> 00:04:05.830
as in the normal example.

00:04:05.830 --> 00:04:08.870
Since these means and
variances are the same,

00:04:08.870 --> 00:04:12.330
and since these random
variables are uncorrelated,

00:04:12.330 --> 00:04:15.190
it follows that all
the covariances

00:04:15.190 --> 00:04:17.800
between the random
variables involved here

00:04:17.800 --> 00:04:22.670
are going to be the same
as for the normal example.

00:04:22.670 --> 00:04:27.150
Now, the optimal solution to
the linear estimation problem,

00:04:27.150 --> 00:04:31.100
as we discussed
earlier, only cares

00:04:31.100 --> 00:04:34.159
about the means,
variances, and covariances

00:04:34.159 --> 00:04:36.380
of the random
variables involved.

00:04:36.380 --> 00:04:39.690
The details of the
distribution do not matter.

00:04:39.690 --> 00:04:42.150
So whether we have
normal distributions

00:04:42.150 --> 00:04:45.890
or non-normal distributions,
as long as we're

00:04:45.890 --> 00:04:50.100
making enough assumptions that
fix all the means, variances,

00:04:50.100 --> 00:04:52.680
and covariances of
interest, we should

00:04:52.680 --> 00:04:56.130
be getting exactly
the same solution.

00:04:56.130 --> 00:04:59.560
Therefore, this
solution remains valid

00:04:59.560 --> 00:05:04.160
for the case of general
random variables, as well.
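
NOTE
A minimal numerical sketch of the argument above, not code from the lecture. It assumes the model X_i = Theta + W_i with uncorrelated terms,
solves the linear system for the best linear estimator using only variances and covariances, and compares the result with the weights
read off the normal-case formula. The names solve, llms_weights, formula_weights and estimate are hypothetical, chosen for illustration.
```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x
def llms_weights(prior_var, noise_vars):
    """Weights a_i from the system Cov(X) a = Cov(X, Theta); uses second moments only."""
    n = len(noise_vars)
    # Cov(X_i, X_j) = Var(Theta) + Var(W_i) if i == j, else Var(Theta) (uncorrelated terms).
    cov_x = [[prior_var + (noise_vars[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    cov_x_theta = [prior_var] * n  # Cov(X_i, Theta) = Var(Theta)
    return solve(cov_x, cov_x_theta)
def formula_weights(prior_var, noise_vars):
    """Weights on X_i in the closed form: (1/sigma_i^2) / sum over j = 0..n of 1/sigma_j^2."""
    total = 1.0 / prior_var + sum(1.0 / v for v in noise_vars)
    return [(1.0 / v) / total for v in noise_vars]
def estimate(x0, prior_var, noise_vars, xs):
    """Linear estimate x0 + sum_i a_i (x_i - x0) for observed values xs."""
    a = llms_weights(prior_var, noise_vars)
    return x0 + sum(ai * (x - x0) for ai, x in zip(a, xs))
```
Under these assumptions the two weight vectors should agree up to rounding, whatever distributions produced the given variances.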