1
00:00:00,800 --> 00:00:03,050
In this segment,
we're going to go over
2
00:00:03,050 --> 00:00:06,800
a few theoretical properties of
the estimation error in least
3
00:00:06,800 --> 00:00:09,140
mean squares estimation.
4
00:00:09,140 --> 00:00:13,400
Recall that our least
mean squares estimator
5
00:00:13,400 --> 00:00:16,410
is the conditional expectation
of the unknown random variable,
6
00:00:16,410 --> 00:00:18,830
given our observations.
7
00:00:18,830 --> 00:00:20,680
Let us define the
error, which is
8
00:00:20,680 --> 00:00:22,620
the difference
between the estimator
9
00:00:22,620 --> 00:00:26,160
and the random variable that
we are trying to estimate.
10
00:00:26,160 --> 00:00:28,710
Let us start with
some observations.
11
00:00:28,710 --> 00:00:32,689
What is the expected
value of our estimator?
12
00:00:32,689 --> 00:00:36,100
Well, using the law of
iterated expectations,
13
00:00:36,100 --> 00:00:38,730
the expectation of a
conditional expectation
14
00:00:38,730 --> 00:00:41,355
is the same as the
unconditional expectation.
15
00:00:44,180 --> 00:00:47,650
And using this property,
by moving this Theta
16
00:00:47,650 --> 00:00:50,240
to the other side,
what we obtain
17
00:00:50,240 --> 00:00:55,950
is that the estimation error
has an expectation of 0.
18
00:00:55,950 --> 00:00:58,370
So this tells us that
the estimation error,
19
00:00:58,370 --> 00:01:02,570
on the average, is equal
to 0, which is good news.
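As a numerical sketch of this zero-mean property (an illustration, not part of the lecture), we can pick a hypothetical jointly normal model, where the conditional expectation has a simple closed form, and check both the law of iterated expectations and the zero-mean error:

```python
import numpy as np

# Hypothetical model, assumed for illustration only:
# Theta ~ N(1, 1), observation X = Theta + W with W ~ N(0, 1) independent.
# For this jointly normal pair the LMS estimator has the closed form
# E[Theta | X] = 1 + (X - 1) / 2.
rng = np.random.default_rng(0)
n = 1_000_000
theta = rng.normal(1.0, 1.0, n)
x = theta + rng.normal(0.0, 1.0, n)
theta_hat = 1.0 + (x - 1.0) / 2.0      # LMS estimator E[Theta | X]
error = theta_hat - theta              # estimation error Theta tilde

# Law of iterated expectations: E[Theta hat] = E[Theta], hence E[error] = 0.
print(theta.mean())      # close to 1
print(theta_hat.mean())  # also close to 1
print(error.mean())      # close to 0
```

The same check works for any model in which E[Theta | X] can be computed; the Gaussian choice just makes the estimator explicit.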
20
00:01:02,570 --> 00:01:06,590
In fact, something
stronger is true.
21
00:01:06,590 --> 00:01:11,900
Not just the overall average
of the estimation error is 0,
22
00:01:11,900 --> 00:01:16,780
but even if you condition on a
particular measurement, still
23
00:01:16,780 --> 00:01:19,660
the conditional expectation
of your estimation error
24
00:01:19,660 --> 00:01:21,610
is going to be equal to 0.
25
00:01:21,610 --> 00:01:24,180
Let us derive this relation.
26
00:01:24,180 --> 00:01:28,740
We're looking at the expected
value of Theta tilde, which
27
00:01:28,740 --> 00:01:36,550
is Theta hat minus Theta,
conditional on a value of X.
28
00:01:36,550 --> 00:01:39,530
Now, if I tell you
the value of X,
29
00:01:39,530 --> 00:01:42,039
then the estimator is
completely determined--
30
00:01:42,039 --> 00:01:44,070
there's no
uncertainty about it--
31
00:01:44,070 --> 00:01:48,030
so the expectation of Theta hat,
in this conditional universe,
32
00:01:48,030 --> 00:01:51,740
is just Theta hat itself.
33
00:01:51,740 --> 00:01:54,990
And we're left with
the second term,
34
00:01:54,990 --> 00:01:59,280
but the second term
is also Theta hat,
35
00:01:59,280 --> 00:02:04,310
and therefore we obtain
a difference of 0.
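The stronger, conditional version of the statement can also be seen numerically. In the sketch below (a hypothetical Gaussian model, assumed for illustration), "conditioning on X" is approximated by grouping samples into narrow bins of x; within each bin the average error should be near 0:

```python
import numpy as np

# Hypothetical model for illustration: Theta ~ N(0, 1), X = Theta + W with
# W ~ N(0, 1) independent, so E[Theta | X] = X / 2 in closed form.
rng = np.random.default_rng(1)
n = 2_000_000
theta = rng.normal(size=n)
x = theta + rng.normal(size=n)
error = x / 2.0 - theta                # Theta tilde = Theta hat - Theta

# Approximate conditioning: bin the samples by the observed value of x and
# check that the average error within each bin is close to 0.
edges = np.linspace(-2.0, 2.0, 9)
which = np.digitize(x, edges)
for k in range(1, len(edges)):
    in_bin = which == k
    print(f"x in [{edges[k-1]:+.1f}, {edges[k]:+.1f}): "
          f"mean error = {error[in_bin].mean():+.4f}")
```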
36
00:02:04,310 --> 00:02:09,180
Let us now move to a slightly
more complicated question.
37
00:02:09,180 --> 00:02:11,610
What is the covariance
between the estimation
38
00:02:11,610 --> 00:02:15,570
error and the estimate?
39
00:02:15,570 --> 00:02:18,690
We will calculate the
covariance as follows.
40
00:02:18,690 --> 00:02:21,740
It is the expected
value of the product
41
00:02:21,740 --> 00:02:25,829
of the two random variables
that we are interested in,
42
00:02:25,829 --> 00:02:28,913
minus the product of
their expectations.
43
00:02:35,290 --> 00:02:38,290
Now, we already calculated
that the expected value
44
00:02:38,290 --> 00:02:42,130
of the estimation
error is equal to 0,
45
00:02:42,130 --> 00:02:46,760
and therefore, this
term here disappears.
46
00:02:46,760 --> 00:02:50,329
This term is equal to 0.
47
00:02:50,329 --> 00:02:54,700
So we now need to
calculate the first term.
48
00:02:54,700 --> 00:02:58,290
This may seem difficult,
but conditioning is always
49
00:02:58,290 --> 00:03:01,050
a great trick, so let's do that.
50
00:03:01,050 --> 00:03:05,904
Let us start by calculating
the conditional expectation
51
00:03:05,904 --> 00:03:06,570
of this product.
52
00:03:14,450 --> 00:03:17,180
As before, in the
conditional universe,
53
00:03:17,180 --> 00:03:21,710
where we're told the value
of X, the value of Theta hat
54
00:03:21,710 --> 00:03:22,960
is known.
55
00:03:22,960 --> 00:03:25,460
It becomes a
constant, so it can
56
00:03:25,460 --> 00:03:27,215
be pulled outside
the expectation.
57
00:03:34,150 --> 00:03:38,490
But then we can apply the fact
that we established earlier
58
00:03:38,490 --> 00:03:44,240
that this term is 0, and
therefore, we obtain a 0 here.
59
00:03:44,240 --> 00:03:49,800
Now, the expected value
of a random variable
60
00:03:49,800 --> 00:03:52,200
is the same as
the expected value
61
00:03:52,200 --> 00:03:54,320
of the conditional expectation.
62
00:03:54,320 --> 00:03:57,700
This is, again, the law
of iterated expectations.
63
00:03:57,700 --> 00:04:00,870
Since the conditional
expectation is 0,
64
00:04:00,870 --> 00:04:03,610
when we apply the law
of iterated expectations
65
00:04:03,610 --> 00:04:07,370
to this quantity,
we also obtain a 0.
66
00:04:07,370 --> 00:04:10,790
Therefore, this
term is 0 as well,
67
00:04:10,790 --> 00:04:13,250
and we have established
what we wanted to show.
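The zero-covariance conclusion is easy to verify by simulation. Here is a sketch under the same kind of assumed Gaussian model (an illustration, not from the lecture):

```python
import numpy as np

# Hypothetical model for illustration: Theta ~ N(0, 1), X = Theta + W with
# W ~ N(0, 1) independent, so the LMS estimator is E[Theta | X] = X / 2.
rng = np.random.default_rng(2)
n = 1_000_000
theta = rng.normal(size=n)
x = theta + rng.normal(size=n)
theta_hat = x / 2.0
error = theta_hat - theta

# Sample covariance of the estimation error and the estimator: close to 0.
print(np.cov(error, theta_hat)[0, 1])
```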
68
00:04:16,269 --> 00:04:20,620
Using this fact, now
we can figure out
69
00:04:20,620 --> 00:04:23,300
that the following is true.
70
00:04:23,300 --> 00:04:26,270
We write the random
variable Theta
71
00:04:26,270 --> 00:04:33,110
as Theta hat
minus Theta tilde.
72
00:04:33,110 --> 00:04:36,420
This comes simply from
this definition here,
73
00:04:36,420 --> 00:04:38,659
by just moving
Theta to this side,
74
00:04:38,659 --> 00:04:41,080
and Theta tilde
to the other side.
75
00:04:41,080 --> 00:04:45,909
So Theta is the difference
of two random variables,
76
00:04:45,909 --> 00:04:49,890
and these two random
variables have 0 covariance.
77
00:04:49,890 --> 00:04:53,310
When two random variables
have 0 covariance,
78
00:04:53,310 --> 00:04:57,159
then the variance of their
sum, or of their difference,
79
00:04:57,159 --> 00:04:59,270
is the sum of the variances.
80
00:04:59,270 --> 00:05:01,640
And this leads us
to this relation--
81
00:05:01,640 --> 00:05:03,560
that the variance of
our random variable
82
00:05:03,560 --> 00:05:06,180
can be decomposed
into two pieces.
83
00:05:06,180 --> 00:05:10,100
One of them is the
variance of the estimator,
84
00:05:10,100 --> 00:05:13,390
and the other is the variance
of the estimation error.
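The variance decomposition can be checked numerically as well. A sketch under an assumed Gaussian model (for illustration only):

```python
import numpy as np

# Hypothetical model for illustration: Theta ~ N(0, 1), X = Theta + W with
# W ~ N(0, 1) independent, LMS estimator E[Theta | X] = X / 2.
rng = np.random.default_rng(3)
n = 1_000_000
theta = rng.normal(size=n)
x = theta + rng.normal(size=n)
theta_hat = x / 2.0
error = theta_hat - theta

# var(Theta) splits into the variance of the estimator
# plus the variance of the estimation error.
print(theta.var())                     # close to 1
print(theta_hat.var() + error.var())   # also close to 1
```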
85
00:05:15,930 --> 00:05:18,290
This is an interesting fact.
86
00:05:18,290 --> 00:05:21,360
It can actually be derived
in a different way, as well.
87
00:05:21,360 --> 00:05:25,240
It is just a manifestation of
the law of total variance,
88
00:05:25,240 --> 00:05:29,480
but hidden in somewhat
different notation.
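To spell out the connection: in the law of total variance, var(Theta) = E[var(Theta | X)] + var(E[Theta | X]), the term var(Theta hat) is var(E[Theta | X]) and var(Theta tilde) equals E[var(Theta | X)]. A sketch under an assumed Gaussian model, where var(Theta | X) happens to be constant:

```python
import numpy as np

# Hypothetical Gaussian model for illustration: Theta ~ N(0, 1),
# X = Theta + W, W ~ N(0, 1) independent. Then E[Theta | X] = X / 2 and
# var(Theta | X) = 1/2 (a constant -- a special feature of the normal case).
rng = np.random.default_rng(4)
n = 1_000_000
theta = rng.normal(size=n)
x = theta + rng.normal(size=n)

lhs = theta.var()                      # var(Theta)
rhs = 0.5 + np.var(x / 2.0)            # E[var(Theta|X)] + var(E[Theta|X])
print(lhs, rhs)                        # both close to 1
```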
89
00:05:29,480 --> 00:05:31,060
And this concludes
our discussion
90
00:05:31,060 --> 00:05:34,280
of theoretical properties
of the estimation error.
91
00:05:34,280 --> 00:05:37,220
Unfortunately we will
not have the opportunity
92
00:05:37,220 --> 00:05:40,180
to use them in any
interesting ways.
93
00:05:40,180 --> 00:05:43,120
On the other hand, they
are a foundational piece
94
00:05:43,120 --> 00:05:47,000
for the more general theory
of least-squares estimation.
95
00:05:47,000 --> 00:05:51,050
If you try to develop it in
a more sophisticated and
96
00:05:51,050 --> 00:05:54,300
deeper way, it turns out
that these properties
97
00:05:54,300 --> 00:05:57,504
are cornerstones of that theory.