WEBVTT
00:00:00.800 --> 00:00:03.050
In this segment,
we're going to go over
00:00:03.050 --> 00:00:06.800
a few theoretical properties of
the estimation error in least
00:00:06.800 --> 00:00:09.140
mean squares estimation.
00:00:09.140 --> 00:00:13.400
Recall that our least
mean squares estimator
00:00:13.400 --> 00:00:16.410
is the conditional expectation
of the unknown random variable,
00:00:16.410 --> 00:00:18.830
given our observations.
00:00:18.830 --> 00:00:20.680
Let us define the
error, which is
00:00:20.680 --> 00:00:22.620
the difference
between the estimator
00:00:22.620 --> 00:00:26.160
and the random variable that
we are trying to estimate.
00:00:26.160 --> 00:00:28.710
Let us start with
some observations.
00:00:28.710 --> 00:00:32.689
What is the expected
value of our estimator?
00:00:32.689 --> 00:00:36.100
Well, using the law of
iterated expectations,
00:00:36.100 --> 00:00:38.730
the expectation of a
conditional expectation
00:00:38.730 --> 00:00:41.355
is the same as the
unconditional expectation.
00:00:44.180 --> 00:00:47.650
And using this property,
by moving this Theta
00:00:47.650 --> 00:00:50.240
to the other side,
what we obtain
00:00:50.240 --> 00:00:55.950
is that the estimation error
has an expectation of 0.
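[A compact way to write this step, using the lecture's notation (the estimator is Theta hat = E[Theta | X], and the error is Theta tilde = Theta hat minus Theta):]

```latex
\mathbf{E}[\hat{\Theta}] \;=\; \mathbf{E}\big[\mathbf{E}[\Theta \mid X]\big] \;=\; \mathbf{E}[\Theta]
\quad\Longrightarrow\quad
\mathbf{E}[\tilde{\Theta}] \;=\; \mathbf{E}[\hat{\Theta}] - \mathbf{E}[\Theta] \;=\; 0 .
```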
00:00:55.950 --> 00:00:58.370
So this tells us that
the estimation error,
00:00:58.370 --> 00:01:02.570
on the average, is equal
to 0, which is good news.
00:01:02.570 --> 00:01:06.590
In fact, something
stronger is true.
00:01:06.590 --> 00:01:11.900
Not just the overall average
of the estimation error is 0,
00:01:11.900 --> 00:01:16.780
but even if you condition on a
particular measurement, still
00:01:16.780 --> 00:01:19.660
the conditional expectation
of your estimation error
00:01:19.660 --> 00:01:21.610
is going to be equal to 0.
00:01:21.610 --> 00:01:24.180
Let us derive this relation.
00:01:24.180 --> 00:01:28.740
We're looking at the expected
value of Theta tilde, which
00:01:28.740 --> 00:01:36.550
is Theta hat minus Theta,
conditional on a value of X.
00:01:36.550 --> 00:01:39.530
Now, if I tell you
the value of X,
00:01:39.530 --> 00:01:42.039
then the estimator is
completely determined--
00:01:42.039 --> 00:01:44.070
there's no
uncertainty about it--
00:01:44.070 --> 00:01:48.030
so the expectation of Theta hat,
in this conditional universe,
00:01:48.030 --> 00:01:51.740
is just Theta hat itself.
00:01:51.740 --> 00:01:54.990
And we're left with
the second term,
00:01:54.990 --> 00:01:59.280
but the second term
is also Theta hat,
00:01:59.280 --> 00:02:04.310
and therefore we obtain
a difference of 0.
00:02:04.310 --> 00:02:09.180
Let us now move to a slightly
more complicated question.
00:02:09.180 --> 00:02:11.610
What is the covariance
between the estimation
00:02:11.610 --> 00:02:15.570
error and the estimate?
00:02:15.570 --> 00:02:18.690
We will calculate the
covariance as follows.
00:02:18.690 --> 00:02:21.740
It is the expected
value of the product
00:02:21.740 --> 00:02:25.829
of the two random variables
that we are interested in,
00:02:25.829 --> 00:02:28.913
minus the product of
their expectations.
00:02:35.290 --> 00:02:38.290
Now, we already calculated
that the expected value
00:02:38.290 --> 00:02:42.130
of the estimation
error is equal to 0,
00:02:42.130 --> 00:02:46.760
and therefore, this
term here disappears.
00:02:46.760 --> 00:02:50.329
This term is equal to 0.
00:02:50.329 --> 00:02:54.700
So we now need to
calculate the first term.
00:02:54.700 --> 00:02:58.290
This may seem difficult,
but conditioning is always
00:02:58.290 --> 00:03:01.050
a great trick, so let's do that.
00:03:01.050 --> 00:03:05.904
Let us start by calculating
the conditional expectation
00:03:05.904 --> 00:03:06.570
of this product.
00:03:14.450 --> 00:03:17.180
As before, in the
conditional universe,
00:03:17.180 --> 00:03:21.710
where we're told the value
of X, the value of Theta hat
00:03:21.710 --> 00:03:22.960
is known.
00:03:22.960 --> 00:03:25.460
It becomes a
constant, so it can
00:03:25.460 --> 00:03:27.215
be pulled outside
the expectation.
00:03:34.150 --> 00:03:38.490
But then we can apply the fact
that we established earlier
00:03:38.490 --> 00:03:44.240
that this term is 0, and
therefore, we obtain a 0 here.
00:03:44.240 --> 00:03:49.800
Now, the expected value
of a random variable
00:03:49.800 --> 00:03:52.200
is the same as
the expected value
00:03:52.200 --> 00:03:54.320
of the conditional expectation.
00:03:54.320 --> 00:03:57.700
This is, again, the law
of iterated expectations.
00:03:57.700 --> 00:04:00.870
Since the conditional
expectation is 0,
00:04:00.870 --> 00:04:03.610
when we apply the law
of iterated expectations
00:04:03.610 --> 00:04:07.370
to this quantity,
we also obtain a 0.
00:04:07.370 --> 00:04:10.790
Therefore, this
term is 0 as well,
00:04:10.790 --> 00:04:13.250
and we have established
what we wanted to show.
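[The covariance argument, written out as one chain:]

```latex
\operatorname{cov}(\tilde{\Theta}, \hat{\Theta})
\;=\; \mathbf{E}[\tilde{\Theta}\,\hat{\Theta}] - \mathbf{E}[\tilde{\Theta}]\,\mathbf{E}[\hat{\Theta}]
\;=\; \mathbf{E}[\tilde{\Theta}\,\hat{\Theta}] ,
\qquad
\mathbf{E}[\tilde{\Theta}\,\hat{\Theta} \mid X] \;=\; \hat{\Theta}\,\mathbf{E}[\tilde{\Theta} \mid X] \;=\; 0
\;\;\Longrightarrow\;\;
\mathbf{E}[\tilde{\Theta}\,\hat{\Theta}] \;=\; \mathbf{E}\big[\mathbf{E}[\tilde{\Theta}\,\hat{\Theta} \mid X]\big] \;=\; 0 .
```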
00:04:16.269 --> 00:04:20.620
Using this fact, now
we can figure out
00:04:20.620 --> 00:04:23.300
that the following is true.
00:04:23.300 --> 00:04:26.270
We write the random
variable Theta
00:04:26.270 --> 00:04:33.110
as Theta hat
minus Theta tilde.
00:04:33.110 --> 00:04:36.420
This comes simply from
this definition here,
00:04:36.420 --> 00:04:38.659
by just moving
Theta to this side,
00:04:38.659 --> 00:04:41.080
and Theta tilde
to the other side.
00:04:41.080 --> 00:04:45.909
So Theta is the difference
of two random variables,
00:04:45.909 --> 00:04:49.890
and these two random
variables have 0 covariance.
00:04:49.890 --> 00:04:53.310
When two random variables
have 0 covariance,
00:04:53.310 --> 00:04:57.159
then the variance of their
sum, or of their difference,
00:04:57.159 --> 00:04:59.270
is the sum of the variances.
00:04:59.270 --> 00:05:01.640
And this leads us
to this relation--
00:05:01.640 --> 00:05:03.560
that the variance of
our random variable
00:05:03.560 --> 00:05:06.180
can be decomposed
into two pieces.
00:05:06.180 --> 00:05:10.100
One of them is the
variance of the estimator,
00:05:10.100 --> 00:05:13.390
and the other is the variance
of the estimation error.
00:05:15.930 --> 00:05:18.290
This is an interesting fact.
00:05:18.290 --> 00:05:21.360
It can actually be derived
in a different way, as well.
00:05:21.360 --> 00:05:25.240
It is just a manifestation of
the law of total variance,
00:05:25.240 --> 00:05:29.480
but hidden in somewhat
different notation.
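[All three properties are easy to check numerically in a toy model. A minimal sketch, where the specific model -- Theta and W independent standard normals, X = Theta + W, so that E[Theta | X] = X/2 -- is an illustrative assumption, not something from the lecture:]

```python
# Toy model (assumed for illustration): Theta ~ N(0,1), W ~ N(0,1) independent,
# X = Theta + W.  For this model the LMS estimator is E[Theta | X] = X / 2.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
theta = rng.standard_normal(n)        # the unknown random variable Theta
w = rng.standard_normal(n)            # observation noise W
x = theta + w                         # the observation X

theta_hat = x / 2                     # LMS estimator, E[Theta | X]
theta_tilde = theta_hat - theta       # estimation error, Theta tilde

print(theta_tilde.mean())                                  # close to 0
print(np.cov(theta_tilde, theta_hat)[0, 1])                # close to 0
print(theta.var(), theta_hat.var() + theta_tilde.var())    # nearly equal
```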
00:05:29.480 --> 00:05:31.060
And this concludes
our discussion
00:05:31.060 --> 00:05:34.280
of theoretical properties
of the estimation error.
00:05:34.280 --> 00:05:37.220
Unfortunately, we will
not have the opportunity
00:05:37.220 --> 00:05:40.180
to use them in any
interesting ways.
00:05:40.180 --> 00:05:43.120
On the other hand, they
are a foundational piece
00:05:43.120 --> 00:05:47.000
for the more general theory
of least-squares estimation.
00:05:47.000 --> 00:05:51.050
If you try to develop it in
a more sophisticated and more
00:05:51.050 --> 00:05:54.300
deep way, it turns out
that these properties
00:05:54.300 --> 00:05:57.504
are cornerstones of that theory.