1
00:00:00,820 --> 00:00:03,610
Let's now go through
another example,
2
00:00:03,610 --> 00:00:06,770
which will be a little
more challenging.
3
00:00:06,770 --> 00:00:09,040
We're going to revisit
an old problem.
4
00:00:09,040 --> 00:00:12,535
We have a coin that has
an unknown bias, Theta.
5
00:00:12,535 --> 00:00:15,700
And we have a prior
distribution on this Theta.
6
00:00:15,700 --> 00:00:18,960
We fix some positive
integer, n, we flip a coin
7
00:00:18,960 --> 00:00:21,750
n times, that has
this unknown bias.
8
00:00:21,750 --> 00:00:24,180
And we record the
number of heads.
9
00:00:24,180 --> 00:00:27,090
On the basis of the number of
heads that have been observed,
10
00:00:27,090 --> 00:00:31,850
we wish to estimate the
bias, Theta, of the coin.
11
00:00:31,850 --> 00:00:33,360
To make things more
concrete, we're
12
00:00:33,360 --> 00:00:36,080
going to assume a prior
distribution on Theta that
13
00:00:36,080 --> 00:00:39,120
is uniform on the unit interval.
14
00:00:39,120 --> 00:00:42,500
Now, this is a problem we
have considered before.
15
00:00:42,500 --> 00:00:47,800
We have calculated the expected
value of Theta given X.
16
00:00:47,800 --> 00:00:51,310
And we did find that
the expected value takes
17
00:00:51,310 --> 00:00:52,920
this particular form.
18
00:00:52,920 --> 00:00:56,800
Now, notice that this is
a linear function of X.
19
00:00:56,800 --> 00:01:00,180
And if it turns out the
least mean squares estimator
20
00:01:00,180 --> 00:01:03,010
is a linear function
of X, then we're
21
00:01:03,010 --> 00:01:05,580
guaranteed, since
this is the best,
22
00:01:05,580 --> 00:01:08,020
that this is also the
best within the class
23
00:01:08,020 --> 00:01:10,620
of linear estimators.
24
00:01:10,620 --> 00:01:13,120
So we immediately
have the conclusion
25
00:01:13,120 --> 00:01:15,220
that the linear
least mean squares
26
00:01:15,220 --> 00:01:18,900
estimator is this
particular function of X.
27
00:01:18,900 --> 00:01:22,350
So there's not much left to do.
28
00:01:22,350 --> 00:01:24,700
On the other hand,
just for practice,
29
00:01:24,700 --> 00:01:29,030
let us derive this answer
directly from the formulas
30
00:01:29,030 --> 00:01:32,330
that we have for the linear
least mean squares estimator,
31
00:01:32,330 --> 00:01:35,910
and see whether we're going
to get the same answer.
32
00:01:35,910 --> 00:01:37,930
So we want to use this formula.
33
00:01:37,930 --> 00:01:41,259
And in order to apply this
formula, all that we have to do
34
00:01:41,259 --> 00:01:44,979
is to calculate these expected
values, this variance,
35
00:01:44,979 --> 00:01:47,110
and this covariance.
36
00:01:47,110 --> 00:01:50,539
So now let's move on to this
particular calculational
37
00:01:50,539 --> 00:01:53,030
exercise.
38
00:01:53,030 --> 00:01:55,190
Let's start by
writing down what we
39
00:01:55,190 --> 00:01:58,759
know about the random variables
involved in this problem.
40
00:01:58,759 --> 00:02:01,180
About Theta, we know
that it is uniform.
41
00:02:01,180 --> 00:02:06,470
And so it has a mean of
1/2 and a variance of 1/12.
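(A quick sanity check, not part of the lecture: the moments of the Uniform(0,1) prior can be computed exactly in a few lines of Python, since the k-th moment of a Uniform(0,1) random variable is 1/(k+1).)

```python
from fractions import Fraction

# For Theta ~ Uniform(0, 1): E[Theta^k] = integral of theta^k over [0, 1]
# = 1/(k+1), computed here as an exact rational number.
def uniform_moment(k):
    return Fraction(1, k + 1)

mean = uniform_moment(1)                 # E[Theta] = 1/2
variance = uniform_moment(2) - mean**2   # 1/3 - 1/4 = 1/12

print(mean, variance)  # 1/2 1/12
```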
42
00:02:06,470 --> 00:02:09,460
About X, what we know
is the following.
43
00:02:09,460 --> 00:02:13,760
If you fix the bias of
the coin, then the number
44
00:02:13,760 --> 00:02:16,400
of heads you're going
to obtain in n flips
45
00:02:16,400 --> 00:02:20,670
has a binomial distribution,
with parameters n and Theta.
46
00:02:20,670 --> 00:02:23,740
But of course, Theta itself
is a random variable.
47
00:02:23,740 --> 00:02:27,220
So for this reason, this is
a conditional distribution.
48
00:02:27,220 --> 00:02:29,440
But within the
conditional universe,
49
00:02:29,440 --> 00:02:32,460
we know the mean and the
variance of a binomial,
50
00:02:32,460 --> 00:02:34,079
and they are as follows.
51
00:02:34,079 --> 00:02:38,000
The mean of a binomial is n
times the bias of the coin.
52
00:02:38,000 --> 00:02:40,910
But because we're talking
about the conditional universe,
53
00:02:40,910 --> 00:02:42,960
this is a conditional
expectation.
54
00:02:42,960 --> 00:02:45,010
And it's a random
variable, because it's
55
00:02:45,010 --> 00:02:48,350
affected by the value of
the random variable Theta.
56
00:02:48,350 --> 00:02:50,180
And similarly, for
the variance, it's
57
00:02:50,180 --> 00:02:53,880
the usual formula for the
variance of a binomial,
58
00:02:53,880 --> 00:02:58,780
except that now the bias
itself is a random variable.
59
00:02:58,780 --> 00:03:02,250
So now let's continue with the
calculation of the quantities
60
00:03:02,250 --> 00:03:06,060
that we need for the
formula for our estimator.
61
00:03:06,060 --> 00:03:09,120
Let's start with the
expected value of X.
62
00:03:09,120 --> 00:03:12,050
Since we know the
conditional expectation of X,
63
00:03:12,050 --> 00:03:15,460
we can use the law of
iterated expectations.
64
00:03:15,460 --> 00:03:19,650
The unconditional expectation
is the expected value
65
00:03:19,650 --> 00:03:25,200
of the conditional expectation,
which is n times Theta.
66
00:03:25,200 --> 00:03:29,480
And since the mean of Theta
is 1/2, we obtain n/2.
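(A sketch, not from the lecture, that confirms E[X] = n/2 a second way: the marginal PMF of X can be obtained exactly by integrating the binomial PMF against the uniform prior, using the Beta integral B(k+1, n-k+1) = k!(n-k)!/(n+1)!, and then summing k against it.)

```python
from fractions import Fraction
from math import comb, factorial

# Marginal PMF of X under the uniform prior:
#   P(X = k) = C(n, k) * Integral_0^1 theta^k (1 - theta)^(n-k) dtheta,
# where the integral equals k!(n-k)!/(n+1)!  (a Beta integral).
def marginal_pmf(n, k):
    beta = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta

# E[X] computed directly from the marginal; it should equal n/2.
def expected_X(n):
    return sum(k * marginal_pmf(n, k) for k in range(n + 1))

print(expected_X(7))  # 7/2
```

Note that `marginal_pmf(n, k)` simplifies to 1/(n+1) for every k, i.e. X is marginally uniform on {0, ..., n}.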
67
00:03:32,630 --> 00:03:36,250
Let us now continue with the
calculation of the variance.
68
00:03:36,250 --> 00:03:38,720
There are different ways
that we can calculate it.
69
00:03:38,720 --> 00:03:41,850
One could be the law
of total variance.
70
00:03:41,850 --> 00:03:44,620
But we will take the
alternative approach,
71
00:03:44,620 --> 00:03:48,010
which is to use the general
formula for the variance,
72
00:03:48,010 --> 00:03:50,140
that the variance is
equal to the expected
73
00:03:50,140 --> 00:03:52,740
value of the square
of a random variable,
74
00:03:52,740 --> 00:03:57,900
minus the square of
the expected value.
75
00:03:57,900 --> 00:03:59,970
We know the expected value of X.
76
00:03:59,970 --> 00:04:01,870
So all that's left
is to calculate
77
00:04:01,870 --> 00:04:04,540
the expected value of X squared.
78
00:04:04,540 --> 00:04:07,480
How are we going to calculate it?
79
00:04:07,480 --> 00:04:11,110
Well, we know the conditional
distribution of X.
80
00:04:11,110 --> 00:04:13,800
So it should be
easy to calculate
81
00:04:13,800 --> 00:04:17,820
the conditional
expectation of X squared
82
00:04:17,820 --> 00:04:21,190
in the conditional
universe, and then
83
00:04:21,190 --> 00:04:24,120
use the law of
iterated expectations
84
00:04:24,120 --> 00:04:27,460
to obtain the
unconditional expectation.
85
00:04:27,460 --> 00:04:30,740
So now, we need to calculate
this conditional expectation
86
00:04:30,740 --> 00:04:32,330
here.
87
00:04:32,330 --> 00:04:34,230
How do we do it?
88
00:04:34,230 --> 00:04:36,780
The expected value of a
square of a random variable
89
00:04:36,780 --> 00:04:40,370
is always equal to the variance
of that random variable
90
00:04:40,370 --> 00:04:43,800
plus the square of
the expected value.
91
00:04:43,800 --> 00:04:45,530
We're going to
use this property,
92
00:04:45,530 --> 00:04:48,340
but we're going to use it
in the conditional universe.
93
00:04:48,340 --> 00:04:50,580
So in the conditional
universe, this
94
00:04:50,580 --> 00:04:52,900
is going to be equal
to the variance,
95
00:04:52,900 --> 00:04:57,670
in the conditional universe,
which is n times Theta times 1
96
00:04:57,670 --> 00:05:04,020
minus Theta, plus the square
of the expected value of X,
97
00:05:04,020 --> 00:05:07,300
but the expected value in the
conditional universe, which
98
00:05:07,300 --> 00:05:10,050
is this quantity--
n times Theta.
99
00:05:10,050 --> 00:05:16,210
So we obtain another term--
n squared, Theta squared.
100
00:05:16,210 --> 00:05:19,330
So now we can go back to
our previous calculation,
101
00:05:19,330 --> 00:05:22,430
and right here,
the expected value
102
00:05:22,430 --> 00:05:26,690
of this expression,
which is n times Theta.
103
00:05:31,210 --> 00:05:34,930
And then we have some
Theta squared terms.
104
00:05:34,930 --> 00:05:36,570
One is n squared.
105
00:05:36,570 --> 00:05:39,159
The other is a minus n.
106
00:05:39,159 --> 00:05:46,880
So we obtain plus (n squared
minus n) times Theta squared.
107
00:05:51,380 --> 00:05:54,409
The expected value
of n times Theta
108
00:05:54,409 --> 00:05:58,700
is n times the expected
value of Theta, which is 1/2.
109
00:05:58,700 --> 00:06:01,715
So we obtain a factor of n/2.
110
00:06:04,460 --> 00:06:08,270
But then we have this
additional term here.
111
00:06:08,270 --> 00:06:11,120
We need the expected
value of Theta squared.
112
00:06:11,120 --> 00:06:12,180
What is it?
113
00:06:12,180 --> 00:06:15,760
Well, since we know the mean
and the variance of Theta,
114
00:06:15,760 --> 00:06:18,470
we can calculate the expected
value of Theta squared.
115
00:06:18,470 --> 00:06:26,410
It is equal to the variance
plus the square of the mean.
116
00:06:26,410 --> 00:06:30,630
And this evaluates to 1/3.
117
00:06:30,630 --> 00:06:33,880
So from here, we're
going to obtain
118
00:06:33,880 --> 00:06:41,540
plus (n squared minus
n) divided by 3.
119
00:06:41,540 --> 00:06:45,450
And you can rewrite this
term in a different way
120
00:06:45,450 --> 00:06:52,380
by collecting first the
n terms, n/2 minus n/3.
121
00:06:52,380 --> 00:06:53,815
This gives us n/6.
122
00:06:56,510 --> 00:06:58,540
And then there's
the n squared term,
123
00:06:58,540 --> 00:07:01,170
which is n squared over 3.
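(Not part of the lecture: the chain of steps just carried out, from E[X^2 | Theta] = n*Theta + (n^2 - n)*Theta^2 to E[X^2] = n/6 + n^2/3, can be checked with exact arithmetic using E[Theta] = 1/2 and E[Theta^2] = 1/3.)

```python
from fractions import Fraction

# E[X^2 | Theta] = var(X | Theta) + (E[X | Theta])^2
#               = n*Theta*(1 - Theta) + n^2*Theta^2
#               = n*Theta + (n^2 - n)*Theta^2.
# Taking expectations, with E[Theta] = 1/2 and E[Theta^2] = 1/3:
def second_moment(n):
    return n * Fraction(1, 2) + (n * n - n) * Fraction(1, 3)

n = 9
print(second_moment(n))  # 57/2, which equals n/6 + n^2/3
```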
124
00:07:03,850 --> 00:07:06,590
Now that we found the
expected value of X squared,
125
00:07:06,590 --> 00:07:09,260
we can go back to
this calculation.
126
00:07:09,260 --> 00:07:17,850
We have n/6 plus
n squared over 3
127
00:07:17,850 --> 00:07:20,970
minus the square of the
expected value of X,
128
00:07:20,970 --> 00:07:22,600
which is this expression here.
129
00:07:22,600 --> 00:07:26,480
So we obtain minus
n squared over 4.
130
00:07:26,480 --> 00:07:31,130
1/3 minus 1/4-- that makes 1/12.
131
00:07:31,130 --> 00:07:42,170
So we obtain n/6 plus
n squared over 12.
132
00:07:42,170 --> 00:07:49,270
Or another way of writing this
is n times n plus 2 over 12.
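(A small check, not from the lecture: subtracting the square of the mean from E[X^2] should indeed give n(n+2)/12 for every n.)

```python
from fractions import Fraction

# var(X) = E[X^2] - (E[X])^2 = (n/6 + n^2/3) - (n/2)^2 = n(n+2)/12.
def var_X(n):
    EX2 = Fraction(n, 6) + Fraction(n * n, 3)
    EX = Fraction(n, 2)
    return EX2 - EX**2

for n in (1, 5, 20):
    assert var_X(n) == Fraction(n * (n + 2), 12)
print(var_X(10))  # 10
```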
133
00:07:51,880 --> 00:07:53,840
And this completes
the calculation
134
00:07:53,840 --> 00:07:56,470
of the variance of X.
135
00:07:56,470 --> 00:07:59,290
The last quantity that's
left for us to calculate
136
00:07:59,290 --> 00:08:03,080
is the covariance
of Theta with X.
137
00:08:03,080 --> 00:08:06,640
We're going to calculate it
using the alternative formula
138
00:08:06,640 --> 00:08:11,320
for the covariance, which is
the expectation of the product
139
00:08:11,320 --> 00:08:13,206
minus the product
of the expectations.
140
00:08:16,960 --> 00:08:19,300
We have the expectations,
but we do not
141
00:08:19,300 --> 00:08:21,540
have the expectation
of the product.
142
00:08:21,540 --> 00:08:23,950
So we need to calculate it.
143
00:08:23,950 --> 00:08:26,160
Once more, it's going
to be the same trick.
144
00:08:26,160 --> 00:08:28,800
We're going to
condition on Theta,
145
00:08:28,800 --> 00:08:33,070
and then use the law of
iterated expectations.
146
00:08:33,070 --> 00:08:35,350
So the law of
iterated expectations,
147
00:08:35,350 --> 00:08:38,789
when you condition on
Theta, takes this form.
148
00:08:38,789 --> 00:08:41,230
And to continue
here, we need to find
149
00:08:41,230 --> 00:08:45,220
this conditional
expectation in the inside.
150
00:08:45,220 --> 00:08:49,030
This conditional
expectation-- what is it?
151
00:08:49,030 --> 00:08:51,630
If I give you Theta,
then you know Theta.
152
00:08:51,630 --> 00:08:53,470
It becomes a constant.
153
00:08:53,470 --> 00:08:55,410
There's nothing
random to it, so it
154
00:08:55,410 --> 00:08:58,600
can be pulled outside
the expectation.
155
00:08:58,600 --> 00:09:03,780
And we obtain Theta times the
conditional expectation of X.
156
00:09:03,780 --> 00:09:06,740
We know what the conditional
expectation of X is.
157
00:09:06,740 --> 00:09:08,070
It's n times Theta.
158
00:09:08,070 --> 00:09:13,330
So from here, we obtain overall,
a term n times Theta squared.
159
00:09:13,330 --> 00:09:15,940
So now we can go back here.
160
00:09:15,940 --> 00:09:17,780
We have the expected value.
161
00:09:17,780 --> 00:09:20,890
And the term in the
inside-- we just found it.
162
00:09:20,890 --> 00:09:23,390
It's n times Theta squared.
163
00:09:23,390 --> 00:09:25,750
And since the expected
value of Theta squared
164
00:09:25,750 --> 00:09:28,695
is 1/3, from here,
we obtain n/3.
165
00:09:32,670 --> 00:09:36,040
And now we can go back,
finally, to the calculation
166
00:09:36,040 --> 00:09:37,330
of the covariance.
167
00:09:37,330 --> 00:09:43,160
It's going to be n/3, minus
the expected value of Theta,
168
00:09:43,160 --> 00:09:48,350
which is 1/2, times the expected
value of X, which is n/2.
169
00:09:48,350 --> 00:09:52,310
So it's minus n/4.
170
00:09:52,310 --> 00:09:54,330
And this evaluates to n/12.
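(Not from the lecture: the covariance computation just done, E[Theta*X] = n*E[Theta^2] = n/3 followed by subtracting E[Theta]*E[X], can be replayed exactly.)

```python
from fractions import Fraction

# E[Theta * X] = E[Theta * E[X | Theta]] = E[n * Theta^2] = n/3,
# using E[Theta^2] = 1/3 for the uniform prior.
# cov(Theta, X) = E[Theta * X] - E[Theta] * E[X] = n/3 - (1/2)(n/2).
def cov_theta_X(n):
    E_theta_X = n * Fraction(1, 3)
    return E_theta_X - Fraction(1, 2) * Fraction(n, 2)

print(cov_theta_X(6))  # 1/2, which is n/12 for n = 6
```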
171
00:09:57,880 --> 00:10:01,240
So we have succeeded in
calculating all the quantities
172
00:10:01,240 --> 00:10:04,830
that are needed in the
formula for the linear least
173
00:10:04,830 --> 00:10:06,045
mean squares estimator.
174
00:10:08,710 --> 00:10:12,540
We can now take those values
that we have just found
175
00:10:12,540 --> 00:10:15,620
and substitute them
into this formula.
176
00:10:15,620 --> 00:10:18,650
And after a little bit of
algebra and moving terms
177
00:10:18,650 --> 00:10:22,820
around, everything simplifies
to this expression.
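(The "little bit of algebra" can also be delegated to exact arithmetic. This sketch, not part of the lecture, plugs the four computed quantities into the LLMS formula Theta_hat = E[Theta] + cov(Theta, X)/var(X) * (X - E[X]) and confirms that it simplifies to (X + 1)/(n + 2) for every value of X.)

```python
from fractions import Fraction

# LLMS estimator: Theta_hat = E[Theta] + cov/var * (X - E[X]),
# with the quantities computed above for the uniform-prior coin.
def llms(n, x):
    E_theta = Fraction(1, 2)
    E_X = Fraction(n, 2)
    cov = Fraction(n, 12)
    var = Fraction(n * (n + 2), 12)
    return E_theta + cov / var * (x - E_X)

n = 10
for x in range(n + 1):
    assert llms(n, x) == Fraction(x + 1, n + 2)
print(llms(10, 7))  # 2/3
```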
178
00:10:22,820 --> 00:10:25,670
Just to verify that
this makes sense,
179
00:10:25,670 --> 00:10:28,122
what is the
coefficient next to X?
180
00:10:28,122 --> 00:10:30,820
It's the covariance
divided by the variance.
181
00:10:30,820 --> 00:10:34,320
n/12 divided by
this expression--
182
00:10:34,320 --> 00:10:36,050
this n cancels that n.
183
00:10:36,050 --> 00:10:37,410
This 12 cancels that 12.
184
00:10:37,410 --> 00:10:40,460
We're left with an n plus
2 in the denominator.
185
00:10:40,460 --> 00:10:43,520
And indeed, the coefficient
that multiplies X
186
00:10:43,520 --> 00:10:47,960
is the term n plus 2
in the denominator.
187
00:10:47,960 --> 00:10:51,280
And you can similarly verify
that the constant term as well
188
00:10:51,280 --> 00:10:53,870
is the correct one.
189
00:10:53,870 --> 00:10:58,400
So of course, this answer is
what we had found in the past
190
00:10:58,400 --> 00:11:02,160
to be the optimal,
the least mean squares
191
00:11:02,160 --> 00:11:08,310
estimator of Theta. As we discussed
earlier, when this is linear
192
00:11:08,310 --> 00:11:11,970
in X, it has to be the same as
the linear least mean squares
193
00:11:11,970 --> 00:11:13,050
estimator.
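(One last check, not from the lecture: the least mean squares estimator E[Theta | X = k] can be computed exactly from Bayes' rule as a ratio of Beta integrals, and it does come out to (k + 1)/(n + 2), matching the linear estimator above.)

```python
from fractions import Fraction
from math import factorial

# Exact Beta integral: Integral_0^1 theta^a (1-theta)^b dtheta = a! b! / (a+b+1)!.
def beta_int(a, b):
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

# Posterior mean E[Theta | X = k] for the uniform prior:
# ratio of the (k+1)-st to the k-th moment integrals against theta^k (1-theta)^(n-k).
def posterior_mean(n, k):
    return beta_int(k + 1, n - k) / beta_int(k, n - k)

n = 10
for k in range(n + 1):
    assert posterior_mean(n, k) == Fraction(k + 1, n + 2)
print(posterior_mean(10, 7))  # 2/3
```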
194
00:11:13,050 --> 00:11:15,530
So this answer is
not a surprise,
195
00:11:15,530 --> 00:11:17,930
but it was an interesting
and perhaps useful
196
00:11:17,930 --> 00:11:21,440
exercise to go through the
details of this calculation
197
00:11:21,440 --> 00:11:24,030
to see what it
takes to figure out
198
00:11:24,030 --> 00:11:27,360
the different terms
in this formula.