1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:22,648
10
00:00:22,648 --> 00:00:25,410
JOHN TSITSIKLIS: So today we're
going to finish with the
11
00:00:25,410 --> 00:00:28,240
core material of this class.
12
00:00:28,240 --> 00:00:30,980
That is the material that has to
do with probability theory
13
00:00:30,980 --> 00:00:31,690
in general.
14
00:00:31,690 --> 00:00:34,240
And then for the rest of the
semester we're going to look
15
00:00:34,240 --> 00:00:38,290
at some special types of models,
talk about inference.
16
00:00:38,290 --> 00:00:40,970
Well, there's also going to
be a small module of core
17
00:00:40,970 --> 00:00:42,840
material coming later.
18
00:00:42,840 --> 00:00:46,940
But today we're basically
finishing chapter four.
19
00:00:46,940 --> 00:00:50,800
And what we're going to do is
we're going to look at a
20
00:00:50,800 --> 00:00:53,720
somewhat familiar concept, the
concept of the conditional
21
00:00:53,720 --> 00:00:55,000
expectation.
22
00:00:55,000 --> 00:00:58,690
But we're going to look at it
from a slightly different
23
00:00:58,690 --> 00:01:02,840
angle, from a slightly more
sophisticated angle.
24
00:01:02,840 --> 00:01:05,370
And together with the
conditional expectation we
25
00:01:05,370 --> 00:01:08,445
will also talk about conditional
variances.
26
00:01:08,445 --> 00:01:11,840
It's something that we're going
to denote this way.
27
00:01:11,840 --> 00:01:15,180
And we're going to see what they
are, and there are some
28
00:01:15,180 --> 00:01:17,820
subtle concepts that
are involved here.
29
00:01:17,820 --> 00:01:20,780
And we're going to apply some
of the tools we're going to
30
00:01:20,780 --> 00:01:24,390
develop to deal with a special
type of situation in which
31
00:01:24,390 --> 00:01:26,660
we're adding random variables.
32
00:01:26,660 --> 00:01:31,860
But we're adding a random number
of random variables.
33
00:01:31,860 --> 00:01:34,720
OK, so let's start talking
about conditional
34
00:01:34,720 --> 00:01:37,410
expectations.
35
00:01:37,410 --> 00:01:39,970
I guess you know
what they are.
36
00:01:39,970 --> 00:01:43,660
Suppose we are in the discrete
world. X and Y are discrete
37
00:01:43,660 --> 00:01:45,590
random variables.
38
00:01:45,590 --> 00:01:49,340
We defined the conditional
expectation of x given that I
39
00:01:49,340 --> 00:01:52,480
told you the value of the
random variable y.
40
00:01:52,480 --> 00:01:56,800
And the way we define it is the
same way as an ordinary
41
00:01:56,800 --> 00:02:01,020
expectation, except that we're
using the conditional PMF.
42
00:02:01,020 --> 00:02:03,440
So we're using the probabilities
that apply to
43
00:02:03,440 --> 00:02:06,910
the new universe where we are
told the value of the random
44
00:02:06,910 --> 00:02:08,289
variable y.
45
00:02:08,289 --> 00:02:12,050
So this is still a familiar
concept so far.
46
00:02:12,050 --> 00:02:14,720
If we're dealing with the
continuous random variable x
47
00:02:14,720 --> 00:02:17,170
the formula is the same, except
that here we have an
48
00:02:17,170 --> 00:02:21,450
integral, and we have to use
the conditional density
49
00:02:21,450 --> 00:02:25,020
function of x.
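The two definitions just described can be written out as follows. This is only a sketch: the symbols p_{X|Y} and f_{X|Y} for the conditional PMF and conditional density are assumed notation, since the slide itself is not part of the transcript.

```latex
% Discrete case: an ordinary expectation, but using the conditional PMF
\mathbf{E}[X \mid Y = y] = \sum_{x} x \, p_{X \mid Y}(x \mid y)

% Continuous case: same idea, with an integral and the conditional density
\mathbf{E}[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X \mid Y}(x \mid y) \, dx
```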
50
00:02:25,020 --> 00:02:28,770
Now what I'm going to do, I want
to introduce it gently
51
00:02:28,770 --> 00:02:32,200
through the example that we
talked about last time.
52
00:02:32,200 --> 00:02:35,290
So last time we talked about
having a stick that has a
53
00:02:35,290 --> 00:02:36,770
certain length.
54
00:02:36,770 --> 00:02:41,950
And we take that stick, and we
break it at some point that we
55
00:02:41,950 --> 00:02:43,790
choose uniformly at random.
56
00:02:43,790 --> 00:02:49,390
And let's denote by Y the place
where we chose to break it.
57
00:02:49,390 --> 00:02:52,750
Having chosen y, then
we're left with a
58
00:02:52,750 --> 00:02:53,930
piece of the stick.
59
00:02:53,930 --> 00:02:57,750
And I'm going to choose a place
to break it once more
60
00:02:57,750 --> 00:03:01,330
uniformly at random
between 0 and y.
61
00:03:01,330 --> 00:03:04,170
So this is the second place at
which we are going to break
62
00:03:04,170 --> 00:03:07,900
it, and we call that place x.
63
00:03:07,900 --> 00:03:12,040
OK, so what's the conditional
expectation of x if I tell you
64
00:03:12,040 --> 00:03:13,630
the value of y?
65
00:03:13,630 --> 00:03:16,740
I tell you that capital Y
happens to take a specific
66
00:03:16,740 --> 00:03:18,800
numerical value.
67
00:03:18,800 --> 00:03:22,770
So this capital Y is now a
specific numerical value, x is
68
00:03:22,770 --> 00:03:25,280
chosen uniformly over
this range.
69
00:03:25,280 --> 00:03:29,780
So the expected value of x is
going to be half of this range
70
00:03:29,780 --> 00:03:30,810
between 0 and y.
71
00:03:30,810 --> 00:03:36,850
So the conditional expectation
is little y over 2.
72
00:03:36,850 --> 00:03:40,210
The important thing to realize
here is that this
73
00:03:40,210 --> 00:03:42,170
quantity is a number.
74
00:03:42,170 --> 00:03:45,570
I told you that the random
variable took a certain
75
00:03:45,570 --> 00:03:49,170
numerical value,
let's say 3.5.
76
00:03:49,170 --> 00:03:52,780
And then you tell me given that
the random variable took
77
00:03:52,780 --> 00:03:59,940
the numerical value 3.5 the
expected value of x is 1.75.
78
00:03:59,940 --> 00:04:04,080
So this is an equality
between numbers.
79
00:04:04,080 --> 00:04:08,110
On the other hand, before you
do the experiment you don't
80
00:04:08,110 --> 00:04:12,160
know what y is going
to turn out to be.
81
00:04:12,160 --> 00:04:15,680
So this little y is the
numerical value that has been
82
00:04:15,680 --> 00:04:18,990
observed when you start doing
the experiments and you
83
00:04:18,990 --> 00:04:22,700
observe the value of capital
Y. So in some sense this
84
00:04:22,700 --> 00:04:27,770
quantity is not known ahead of
time, it is random itself.
85
00:04:27,770 --> 00:04:33,670
So maybe we can start thinking
of it as a random variable.
86
00:04:33,670 --> 00:04:37,010
So to put it differently, before
we do the experiment I
87
00:04:37,010 --> 00:04:41,030
ask you what's the expected
value of x given y?
88
00:04:41,030 --> 00:04:44,740
You're going to answer me well
I don't know, it depends on
89
00:04:44,740 --> 00:04:47,580
what y is going to
turn out to be.
90
00:04:47,580 --> 00:04:52,690
So the expected value of x given
y itself can be viewed
91
00:04:52,690 --> 00:04:56,540
as a random variable, because
it depends on the random
92
00:04:56,540 --> 00:04:58,330
variable capital Y.
93
00:04:58,330 --> 00:05:02,080
So hidden here there's some kind
of statement about random
94
00:05:02,080 --> 00:05:04,810
variables instead of numbers.
95
00:05:04,810 --> 00:05:07,770
And that statement about
random variables, we
96
00:05:07,770 --> 00:05:09,660
write it this way.
97
00:05:09,660 --> 00:05:12,770
By thinking of the expected
value, the conditional
98
00:05:12,770 --> 00:05:17,330
expectation, as a random
variable instead of a number.
99
00:05:17,330 --> 00:05:20,380
It's a random variable when we
do not specify a specific
100
00:05:20,380 --> 00:05:23,410
number, but we think of it
as an abstract object.
101
00:05:23,410 --> 00:05:29,560
The expected value of x given
the random variable y is the
102
00:05:29,560 --> 00:05:34,390
random variable y over 2 no
matter what capital Y
103
00:05:34,390 --> 00:05:37,090
turns out to be.
104
00:05:37,090 --> 00:05:39,530
So we turn and take a statement
that deals with
105
00:05:39,530 --> 00:05:43,460
equality of two numbers, and we
make it a statement that's
106
00:05:43,460 --> 00:05:46,740
an equality between two
random variables.
107
00:05:46,740 --> 00:05:49,910
OK so this is clearly a random
variable because
108
00:05:49,910 --> 00:05:52,330
capital Y is random.
109
00:05:52,330 --> 00:05:54,170
What exactly is this object?
110
00:05:54,170 --> 00:05:57,130
I didn't yet define it
for you formally.
111
00:05:57,130 --> 00:06:02,150
So let's now give the formal
definition of this object
112
00:06:02,150 --> 00:06:04,570
that's going to be
denoted this way.
113
00:06:04,570 --> 00:06:09,400
The conditional expectation of
x given the random variable y
114
00:06:09,400 --> 00:06:12,830
is a random variable.
115
00:06:12,830 --> 00:06:14,900
Which random variable is it?
116
00:06:14,900 --> 00:06:19,670
It's the random variable that
takes this specific numerical
117
00:06:19,670 --> 00:06:24,330
value whenever capital Y happens
to take the specific
118
00:06:24,330 --> 00:06:26,480
numerical value little y.
119
00:06:26,480 --> 00:06:30,010
In particular, this is a random
variable, which is a
120
00:06:30,010 --> 00:06:33,840
function of the random variable
capital Y. In this
121
00:06:33,840 --> 00:06:36,680
instance, it's given by a simple
formula in terms of
122
00:06:36,680 --> 00:06:39,540
capital Y. In other situations
it might be a
123
00:06:39,540 --> 00:06:41,520
more complicated formula.
124
00:06:41,520 --> 00:06:44,680
So again, to summarize,
it's a random variable.
125
00:06:44,680 --> 00:06:48,530
The conditional expectation can
be thought of as a random
126
00:06:48,530 --> 00:06:53,040
variable instead of something
that's just a number.
127
00:06:53,040 --> 00:06:55,940
So in any specific context when
you're given the value of
128
00:06:55,940 --> 00:06:59,110
capital Y the conditional
expectation becomes a number.
129
00:06:59,110 --> 00:07:02,890
This is the realized value
of this random variable.
130
00:07:02,890 --> 00:07:06,260
But before the experiment
starts, before you know what
131
00:07:06,260 --> 00:07:10,140
capital Y is going to be, all
that you can say is that the
132
00:07:10,140 --> 00:07:14,320
conditional expectation is going
to be 1/2 of whatever
133
00:07:14,320 --> 00:07:16,840
capital Y turns out to be.
134
00:07:16,840 --> 00:07:20,270
This is a pretty subtle concept,
it's an abstraction,
135
00:07:20,270 --> 00:07:22,990
but it's a useful abstraction.
136
00:07:22,990 --> 00:07:29,440
And we're going to see
today how to use it.
137
00:07:29,440 --> 00:07:32,940
All right, I have made the point
that the conditional
138
00:07:32,940 --> 00:07:37,200
expectation, the random variable
that takes these
139
00:07:37,200 --> 00:07:40,490
numerical values is
a random variable.
140
00:07:40,490 --> 00:07:43,090
If it is a random variable
this means that it has an
141
00:07:43,090 --> 00:07:45,710
expectation of its own.
142
00:07:45,710 --> 00:07:48,590
So let's start thinking what
the expectation of the
143
00:07:48,590 --> 00:07:53,432
conditional expectation is
going to turn out to be.
144
00:07:53,432 --> 00:07:59,210
OK, so the conditional
expectation is a random
145
00:07:59,210 --> 00:08:03,030
variable, and in general it's
some function of the random
146
00:08:03,030 --> 00:08:05,465
variable y that we
are observing.
147
00:08:05,465 --> 00:08:07,970
148
00:08:07,970 --> 00:08:13,910
In terms of numerical values if
capital Y happens to take a
149
00:08:13,910 --> 00:08:17,490
specific numerical value then
the conditional expectation
150
00:08:17,490 --> 00:08:20,830
also takes a specific numerical
value, and we use
151
00:08:20,830 --> 00:08:22,630
the same function
to evaluate it.
152
00:08:22,630 --> 00:08:25,770
The difference here is that this
is an equality of random
153
00:08:25,770 --> 00:08:29,440
variables, this is an equality
between numbers.
154
00:08:29,440 --> 00:08:33,120
Now if we want to calculate
the expected value of the
155
00:08:33,120 --> 00:08:38,539
conditional expectation we're
basically talking about the
156
00:08:38,539 --> 00:08:44,080
expected value of a function
of a random variable.
157
00:08:44,080 --> 00:08:48,620
And we know how to calculate
expected values of a function.
158
00:08:48,620 --> 00:08:54,330
If we are in the discrete case,
for example, this would
159
00:08:54,330 --> 00:09:02,690
be a sum over all y's of the
function whose expected value
160
00:09:02,690 --> 00:09:09,580
we're taking times the
probability that y takes on a
161
00:09:09,580 --> 00:09:11,940
specific numerical value.
162
00:09:11,940 --> 00:09:16,360
OK, but let's remember
what g is.
163
00:09:16,360 --> 00:09:22,690
So g is the numerical value
of the conditional
164
00:09:22,690 --> 00:09:25,300
expectation of x given y.
165
00:09:25,300 --> 00:09:29,530
166
00:09:29,530 --> 00:09:33,450
And now when you see this
expression you recognize it.
167
00:09:33,450 --> 00:09:35,630
This is the expression
that we get in the
168
00:09:35,630 --> 00:09:37,190
total expectation theorem.
169
00:09:37,190 --> 00:09:41,300
170
00:09:41,300 --> 00:09:42,795
Did I miss something?
171
00:09:42,795 --> 00:09:45,570
172
00:09:45,570 --> 00:09:48,700
Yes, in the total expectation
theorem to find the expected
173
00:09:48,700 --> 00:09:52,720
value of x, we divide the world
into different scenarios
174
00:09:52,720 --> 00:09:55,970
depending on what y happens to be.
175
00:09:55,970 --> 00:09:59,110
We calculate the expectation
in each one of the possible
176
00:09:59,110 --> 00:10:01,750
worlds, and we take the
weighted average.
177
00:10:01,750 --> 00:10:04,770
So this is a formula that you
have seen before, and you
178
00:10:04,770 --> 00:10:08,610
recognize that this is the
expected value of x.
179
00:10:08,610 --> 00:10:13,280
So this is a longer, more
detailed derivation of what I
180
00:10:13,280 --> 00:10:17,770
had written up here, but the
important thing to keep in
181
00:10:17,770 --> 00:10:22,790
mind is the moral of the
story, the punchline.
182
00:10:22,790 --> 00:10:26,640
The expected value of the
conditional expectation is the
183
00:10:26,640 --> 00:10:27,890
expectation itself.
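For the discrete case, the chain of equalities behind this punchline might be written as below. The names g and p_Y follow the spoken description; the exact slide notation is an assumption.

```latex
% g(y) denotes the number E[X | Y = y]; p_Y is the PMF of Y
\mathbf{E}\big[\mathbf{E}[X \mid Y]\big]
  = \mathbf{E}[g(Y)]
  = \sum_{y} g(y) \, p_Y(y)
  = \sum_{y} \mathbf{E}[X \mid Y = y] \, p_Y(y)
  = \mathbf{E}[X]
% The last step is the total expectation theorem
```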
184
00:10:27,890 --> 00:10:30,710
185
00:10:30,710 --> 00:10:35,030
So this is just our total
expectation theorem, but
186
00:10:35,030 --> 00:10:37,700
written in more abstract
notation.
187
00:10:37,700 --> 00:10:40,035
And it comes in handy to have this
more abstract notation,
188
00:10:40,035 --> 00:10:43,570
as we're going to
see in a while.
189
00:10:43,570 --> 00:10:47,320
OK, we can apply this to
our stick example.
190
00:10:47,320 --> 00:10:50,220
If we want to find the expected
value of x how much
191
00:10:50,220 --> 00:10:53,110
of the stick is left
at the end?
192
00:10:53,110 --> 00:10:57,370
We can calculate it using this
law of iterated expectations.
193
00:10:57,370 --> 00:11:00,190
It's the expected value of the
conditional expectation.
194
00:11:00,190 --> 00:11:03,790
We know that the conditional
expectation is y over 2.
195
00:11:03,790 --> 00:11:10,730
So expected value of y is l over
2, because y is uniform
196
00:11:10,730 --> 00:11:12,830
so we get l over 4.
197
00:11:12,830 --> 00:11:15,440
So this gives us the same answer
that we derived last
198
00:11:15,440 --> 00:11:18,210
time in a rather long way.
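A quick numerical check of the stick calculation, as a sketch rather than anything shown in the lecture. The function name simulate_stick, the stick length of 1.0, the trial count, and the seed are all illustrative assumptions.

```python
import random


def simulate_stick(length=1.0, trials=200_000, seed=0):
    """Estimate E[X] and E[E[X | Y]] for the twice-broken stick by simulation."""
    rng = random.Random(seed)
    total_x = 0.0
    total_half_y = 0.0
    for _ in range(trials):
        y = rng.uniform(0.0, length)  # first break: Y uniform on [0, length]
        x = rng.uniform(0.0, y)       # second break: X uniform on [0, Y]
        total_x += x
        total_half_y += y / 2.0       # realized value of E[X | Y] = Y / 2
    return total_x / trials, total_half_y / trials


mean_x, mean_cond_exp = simulate_stick()
# Both estimates should be close to length / 4
print(mean_x, mean_cond_exp)
```

Averaging X directly and averaging the realized values of Y / 2 should agree up to sampling noise, which is what the law of iterated expectations says.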
199
00:11:18,210 --> 00:11:24,470
200
00:11:24,470 --> 00:11:27,750
All right, now that we have
mastered conditional
201
00:11:27,750 --> 00:11:33,100
expectations, let's raise the
bar a little more and talk
202
00:11:33,100 --> 00:11:35,590
about conditional variances.
203
00:11:35,590 --> 00:11:38,750
So the conditional expectation
is the mean value, or the
204
00:11:38,750 --> 00:11:41,380
expected value, in a conditional
universe where
205
00:11:41,380 --> 00:11:43,450
you're told the value of y.
206
00:11:43,450 --> 00:11:47,270
In that same conditional
universe you can talk about
207
00:11:47,270 --> 00:11:51,360
the conditional distribution
of x, which has a mean--
208
00:11:51,360 --> 00:11:52,810
the conditional expectation--
209
00:11:52,810 --> 00:11:54,140
but the conditional
distribution of
210
00:11:54,140 --> 00:11:56,130
x also has a variance.
211
00:11:56,130 --> 00:11:58,730
So we can talk about the
variance of x in that
212
00:11:58,730 --> 00:12:01,500
conditional universe.
213
00:12:01,500 --> 00:12:07,390
The conditional variance as a
number is the natural thing.
214
00:12:07,390 --> 00:12:11,680
It's the variance of x, except
that all the calculations are
215
00:12:11,680 --> 00:12:13,790
done in the conditional
universe.
216
00:12:13,790 --> 00:12:19,940
In the conditional universe the
expected value of x is the
217
00:12:19,940 --> 00:12:21,740
conditional expectation.
218
00:12:21,740 --> 00:12:24,530
This is the distance from the
mean in the conditional
219
00:12:24,530 --> 00:12:26,310
universe squared.
220
00:12:26,310 --> 00:12:30,080
And we take the average value
of the squared distance, but
221
00:12:30,080 --> 00:12:32,660
calculate it again using the
probabilities that apply in
222
00:12:32,660 --> 00:12:35,240
the conditional universe.
223
00:12:35,240 --> 00:12:38,020
This is an equality
between numbers.
224
00:12:38,020 --> 00:12:43,720
I tell you the value of y, once
you know that value for y
225
00:12:43,720 --> 00:12:47,730
you can go ahead and plot the
conditional distribution of x.
226
00:12:47,730 --> 00:12:50,090
And for that conditional
distribution you can calculate
227
00:12:50,090 --> 00:12:52,890
the number which is the
variance of x in that
228
00:12:52,890 --> 00:12:54,650
conditional universe.
229
00:12:54,650 --> 00:12:57,820
So now let's repeat the mental
gymnastics from the previous
230
00:12:57,820 --> 00:13:03,560
slide, and abstract things, and
define a random variable--
231
00:13:03,560 --> 00:13:06,080
the conditional variance.
232
00:13:06,080 --> 00:13:08,900
And it's going to be a random
variable because we leave the
233
00:13:08,900 --> 00:13:12,010
numerical value of capital
Y unspecified.
234
00:13:12,010 --> 00:13:15,670
So ahead of time we don't know
what capital Y is going to be,
235
00:13:15,670 --> 00:13:18,860
and because of that we don't
know ahead of time what the
236
00:13:18,860 --> 00:13:20,870
conditional variance
is going to be.
237
00:13:20,870 --> 00:13:24,500
So before the experiment starts
if I ask you what's the
238
00:13:24,500 --> 00:13:26,060
conditional variance of x?
239
00:13:26,060 --> 00:13:28,440
You're going to tell me well I
don't know, it depends on what
240
00:13:28,440 --> 00:13:30,300
y is going to turn out to be.
241
00:13:30,300 --> 00:13:32,770
It's going to be something
that depends on y.
242
00:13:32,770 --> 00:13:36,210
So it's a random variable,
which is a function of y.
243
00:13:36,210 --> 00:13:38,980
So more precisely, the
conditional variance when
244
00:13:38,980 --> 00:13:42,480
written in this notation just
with capital letters, is a
245
00:13:42,480 --> 00:13:43,730
random variable.
246
00:13:43,730 --> 00:13:47,560
It's a random variable whose
value is completely determined
247
00:13:47,560 --> 00:13:52,330
once you learned the value of
capital Y. And it takes a
248
00:13:52,330 --> 00:13:55,070
specific numerical value.
249
00:13:55,070 --> 00:13:58,700
If capital Y happens to get a
realization that's a specific
250
00:13:58,700 --> 00:14:03,130
number, then the variance also
becomes a specific number.
251
00:14:03,130 --> 00:14:05,390
And it's just the conditional
variance of x
252
00:14:05,390 --> 00:14:09,420
given y in that universe.
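In symbols, the number and the random variable described here might be written as follows. This is a sketch; the slide notation is not in the transcript, so the form below is assumed.

```latex
% A number, once the value y is given
\mathrm{var}(X \mid Y = y)
  = \mathbf{E}\big[(X - \mathbf{E}[X \mid Y = y])^2 \mid Y = y\big]

% A random variable: the function of Y that takes the value above whenever Y = y
\mathrm{var}(X \mid Y)
```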
253
00:14:09,420 --> 00:14:12,390
All right, OK, so let's continue
what we did in the
254
00:14:12,390 --> 00:14:13,620
previous slide.
255
00:14:13,620 --> 00:14:15,960
We had the law of iterated
expectations.
256
00:14:15,960 --> 00:14:18,350
That told us that expected
value of a conditional
257
00:14:18,350 --> 00:14:21,360
expectation is the unconditional
expectation.
258
00:14:21,360 --> 00:14:26,140
Is there a similar rule that
might apply in this context?
259
00:14:26,140 --> 00:14:29,810
So you might guess that the
variance of x could be found
260
00:14:29,810 --> 00:14:33,590
by taking the expected value of
the conditional variance.
261
00:14:33,590 --> 00:14:35,680
It turns out that this
is not true.
262
00:14:35,680 --> 00:14:38,480
There is a formula for the
variance in terms of
263
00:14:38,480 --> 00:14:40,060
conditional quantities.
264
00:14:40,060 --> 00:14:42,280
But the formula is a little
more complicated.
265
00:14:42,280 --> 00:14:46,200
It involves two terms
instead of one.
266
00:14:46,200 --> 00:14:50,010
So we're going to go
quickly through the
267
00:14:50,010 --> 00:14:52,470
derivation of this formula.
268
00:14:52,470 --> 00:14:55,260
And then, through examples
we'll try to get some
269
00:14:55,260 --> 00:14:58,480
interpretation of what the
different terms here
270
00:14:58,480 --> 00:15:01,440
correspond to.
271
00:15:01,440 --> 00:15:04,800
All right, so let's try
to prove this formula.
272
00:15:04,800 --> 00:15:08,940
And the proof is sort of a
useful exercise to make sure
273
00:15:08,940 --> 00:15:11,860
you understand all the symbols
that are involved in here.
274
00:15:11,860 --> 00:15:14,850
So the proof is not difficult,
it's 4 and 1/2 lines of
275
00:15:14,850 --> 00:15:18,220
algebra, of just writing
down formulas.
276
00:15:18,220 --> 00:15:21,710
But the challenge is to make
sure that at each point you
277
00:15:21,710 --> 00:15:25,070
understand what each one
of the objects is.
278
00:15:25,070 --> 00:15:27,880
So we go to the formula for
the variance of x.
279
00:15:27,880 --> 00:15:32,480
We know in general that the
variance of x has this nice
280
00:15:32,480 --> 00:15:34,590
expression that we often
use to calculate it.
281
00:15:34,590 --> 00:15:37,340
The expected value of the
square of the random variable
282
00:15:37,340 --> 00:15:41,220
minus the mean squared.
283
00:15:41,220 --> 00:15:45,290
This formula, for the variances,
of course it should
284
00:15:45,290 --> 00:15:48,380
apply to conditional
universes.
285
00:15:48,380 --> 00:15:50,430
I mean it's a general formula
about variances.
286
00:15:50,430 --> 00:15:53,650
If we put ourselves in a
conditional universe where the
287
00:15:53,650 --> 00:15:58,380
random variable y is given to us
the same math should work.
288
00:15:58,380 --> 00:16:01,220
So we should have a similar
formula for
289
00:16:01,220 --> 00:16:02,900
the conditional variances.
290
00:16:02,900 --> 00:16:05,430
It's just the same formula,
but applied to
291
00:16:05,430 --> 00:16:07,370
the conditional universe.
292
00:16:07,370 --> 00:16:10,130
The variance of x in the
conditional universe is the
293
00:16:10,130 --> 00:16:12,050
expected value of x squared--
294
00:16:12,050 --> 00:16:13,770
in the conditional universe--
295
00:16:13,770 --> 00:16:16,700
minus the mean of x-- in the
conditional universe--
296
00:16:16,700 --> 00:16:17,730
squared.
297
00:16:17,730 --> 00:16:20,350
So this formula looks fine.
298
00:16:20,350 --> 00:16:23,620
Now let's take expected
values of both sides.
299
00:16:23,620 --> 00:16:27,470
Remember the conditional
variance is a random variable,
300
00:16:27,470 --> 00:16:30,600
because its value depends on
whatever realization we get
301
00:16:30,600 --> 00:16:33,860
for capital Y. So we can
take expectations here.
302
00:16:33,860 --> 00:16:36,320
We get the expected value
of the variance.
303
00:16:36,320 --> 00:16:39,380
Then we have the expected
value of a conditional
304
00:16:39,380 --> 00:16:40,740
expectation.
305
00:16:40,740 --> 00:16:44,420
Here we use the fact that
we discussed before.
306
00:16:44,420 --> 00:16:48,020
The expected value of a
conditional expectation is the
307
00:16:48,020 --> 00:16:50,560
same as the unconditional
expectation.
308
00:16:50,560 --> 00:16:52,780
So this term becomes this.
309
00:16:52,780 --> 00:16:57,240
And finally, here we just have
some weird looking random
310
00:16:57,240 --> 00:17:02,360
variable, and we take the
expected value of it.
311
00:17:02,360 --> 00:17:06,210
All right, now we need to do
something about this term.
312
00:17:06,210 --> 00:17:10,130
Let's use the same
rule up here to
313
00:17:10,130 --> 00:17:14,030
write down this variance.
314
00:17:14,030 --> 00:17:17,810
So variance of an expectation,
that's kind of strange, but
315
00:17:17,810 --> 00:17:21,460
you remember that the
conditional expectation is
316
00:17:21,460 --> 00:17:23,790
random, because y is random.
317
00:17:23,790 --> 00:17:26,099
So this thing is a random
variable, so
318
00:17:26,099 --> 00:17:28,390
this thing has a variance.
319
00:17:28,390 --> 00:17:30,310
What is the variance
of this thing?
320
00:17:30,310 --> 00:17:37,740
It's the expected value of the
thing squared minus the square
321
00:17:37,740 --> 00:17:40,590
of the expected value
of the thing.
322
00:17:40,590 --> 00:17:43,340
Now what's the expected
value of that thing?
323
00:17:43,340 --> 00:17:47,230
By the law of iterated
expectations, once more, the
324
00:17:47,230 --> 00:17:49,990
expected value of this thing
is the unconditional
325
00:17:49,990 --> 00:17:51,090
expectation.
326
00:17:51,090 --> 00:17:54,560
And that's why here I put the
unconditional expectation.
327
00:17:54,560 --> 00:17:58,040
So I'm using again this general
rule about how to
328
00:17:58,040 --> 00:18:01,510
calculate variances, and I'm
applying it to calculate the
329
00:18:01,510 --> 00:18:05,680
variance of the conditional
expectation.
330
00:18:05,680 --> 00:18:10,030
And now you notice that if you
add these two expressions c
331
00:18:10,030 --> 00:18:15,040
and d we get this plus
that, which is this.
332
00:18:15,040 --> 00:18:17,220
It's equal to--
333
00:18:17,220 --> 00:18:22,360
these two terms cancel, we're
left with this minus that,
334
00:18:22,360 --> 00:18:24,810
which is the variance of x.
335
00:18:24,810 --> 00:18:27,430
And that's the end
of the proof.
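Since the spoken proof points at terms on a slide, here is one way the steps could be written out. It is a reconstruction from the description above, not a copy of the slide.

```latex
% General variance formula, applied in the conditional universe
\mathrm{var}(X \mid Y) = \mathbf{E}[X^2 \mid Y] - (\mathbf{E}[X \mid Y])^2

% Take expectations of both sides; iterated expectations gives E[E[X^2 | Y]] = E[X^2]
\mathbf{E}[\mathrm{var}(X \mid Y)]
  = \mathbf{E}[X^2] - \mathbf{E}\big[(\mathbf{E}[X \mid Y])^2\big]

% Variance of the random variable E[X | Y], whose mean is E[X]
\mathrm{var}(\mathbf{E}[X \mid Y])
  = \mathbf{E}\big[(\mathbf{E}[X \mid Y])^2\big] - (\mathbf{E}[X])^2

% Add the last two lines: the middle terms cancel
\mathbf{E}[\mathrm{var}(X \mid Y)] + \mathrm{var}(\mathbf{E}[X \mid Y])
  = \mathbf{E}[X^2] - (\mathbf{E}[X])^2
  = \mathrm{var}(X)
```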
336
00:18:27,430 --> 00:18:31,105
This is one of those proofs that
do not convey any intuition.
337
00:18:31,105 --> 00:18:34,310
338
00:18:34,310 --> 00:18:37,880
This, as I said, it's a useful
proof to go through just to
339
00:18:37,880 --> 00:18:40,250
make sure you understand
the symbols.
340
00:18:40,250 --> 00:18:44,020
It starts to get pretty
confusing, and a little bit on
341
00:18:44,020 --> 00:18:45,490
the abstract side.
342
00:18:45,490 --> 00:18:48,010
So it's good to understand
what's going on.
343
00:18:48,010 --> 00:18:52,610
Now there is intuition behind
this formula, some of which is
344
00:18:52,610 --> 00:18:54,780
better left for later
in the class when
345
00:18:54,780 --> 00:18:56,680
we talk about inference.
346
00:18:56,680 --> 00:19:01,380
The idea is that the conditional
expectation you
347
00:19:01,380 --> 00:19:04,110
can interpret it as an estimate
of the random
348
00:19:04,110 --> 00:19:06,700
variable that you
are trying to--
349
00:19:06,700 --> 00:19:10,240
an estimate of x based on
measurements of y, you can
350
00:19:10,240 --> 00:19:14,090
think of these variances as
having something to do with an
351
00:19:14,090 --> 00:19:15,650
estimation error.
352
00:19:15,650 --> 00:19:19,040
And once you start thinking in
those terms an interpretation
353
00:19:19,040 --> 00:19:20,060
will come about.
354
00:19:20,060 --> 00:19:23,750
But again as I said this is
better left for when we start
355
00:19:23,750 --> 00:19:25,320
talking about inference.
356
00:19:25,320 --> 00:19:28,080
Nevertheless, we're going to get
some intuition about all
357
00:19:28,080 --> 00:19:33,010
these formulas by considering
a baby example where we're
358
00:19:33,010 --> 00:19:35,900
going to apply the law of
iterated expectations, and the
359
00:19:35,900 --> 00:19:38,060
law of total variance.
360
00:19:38,060 --> 00:19:42,360
So the baby example is that we
do this beautiful experiment
361
00:19:42,360 --> 00:19:47,190
of giving a quiz to a class
consisting of many sections.
362
00:19:47,190 --> 00:19:49,325
And we're interested in
two random variables.
363
00:19:49,325 --> 00:19:52,440
364
00:19:52,440 --> 00:19:54,590
So we have a number of students,
and they're all
365
00:19:54,590 --> 00:19:55,980
allocated to sections.
366
00:19:55,980 --> 00:19:59,890
The experiment is that I pick
a student at random, and I
367
00:19:59,890 --> 00:20:01,180
look at two random variables.
368
00:20:01,180 --> 00:20:05,880
One is the quiz score of the
randomly selected student, and
369
00:20:05,880 --> 00:20:09,960
the other random variable is
the section number of the
370
00:20:09,960 --> 00:20:13,040
student that I have selected.
371
00:20:13,040 --> 00:20:17,010
We're given some statistics
about the two sections.
372
00:20:17,010 --> 00:20:19,960
Section one has 10 students,
section two has 20 students.
373
00:20:19,960 --> 00:20:22,430
The quiz average in section
one was 90.
374
00:20:22,430 --> 00:20:25,860
Quiz average in section
two was 60.
375
00:20:25,860 --> 00:20:28,320
What's the expected
value of x?
376
00:20:28,320 --> 00:20:32,990
What's the expected quiz score
if I pick a student at random?
377
00:20:32,990 --> 00:20:34,420
Well, each student has the same
378
00:20:34,420 --> 00:20:35,930
probability of being selected.
379
00:20:35,930 --> 00:20:38,740
I'm making that assumption
out of the 30 students.
380
00:20:38,740 --> 00:20:43,520
I need to add the quiz scores
of all of the students.
381
00:20:43,520 --> 00:20:47,210
So I need to add the quiz scores
in section one, which
382
00:20:47,210 --> 00:20:48,860
is 90 times 10.
383
00:20:48,860 --> 00:20:51,030
I need to add the quiz scores
in that section,
384
00:20:51,030 --> 00:20:52,720
which is 60 times 20.
385
00:20:52,720 --> 00:20:55,220
And we find that the overall
average was 70.
386
00:20:55,220 --> 00:20:58,310
So this is the usual
unconditional expectation.
387
00:20:58,310 --> 00:21:00,990
Let's look at the conditional
expectation, and let's look at
388
00:21:00,990 --> 00:21:03,000
the elementary version
where we're talking
389
00:21:03,000 --> 00:21:04,690
about numerical values.
390
00:21:04,690 --> 00:21:07,330
If I tell you that the randomly
selected student was
391
00:21:07,330 --> 00:21:10,780
in section one what's the
expected value of the quiz
392
00:21:10,780 --> 00:21:12,490
score of that student?
393
00:21:12,490 --> 00:21:16,900
Well, given this information,
we're picking a random student
394
00:21:16,900 --> 00:21:20,820
uniformly from that section in
which the average was 90.
395
00:21:20,820 --> 00:21:23,070
The expected value of the
score of that student
396
00:21:23,070 --> 00:21:24,580
is going to be 90.
397
00:21:24,580 --> 00:21:28,800
So given the specific value of
y, the specific section, the
398
00:21:28,800 --> 00:21:31,280
conditional expectation or the
expected value of the quiz
399
00:21:31,280 --> 00:21:34,470
score is a specific number,
the number 90.
400
00:21:34,470 --> 00:21:37,900
Similarly for the second section
the expected value is
401
00:21:37,900 --> 00:21:41,480
60, that's the average score
in the second section.
402
00:21:41,480 --> 00:21:42,940
This is the elementary
version.
403
00:21:42,940 --> 00:21:45,000
What about the abstract
version?
404
00:21:45,000 --> 00:21:48,350
In the abstract version the
conditional expectation is a
405
00:21:48,350 --> 00:21:52,540
random variable because
it depends.
406
00:21:52,540 --> 00:21:57,220
In which section is the
student that I picked?
407
00:21:57,220 --> 00:22:01,680
And with probability 1/3, I'm
going to pick a student in the
408
00:22:01,680 --> 00:22:04,890
first section, in which case
the conditional expectation
409
00:22:04,890 --> 00:22:08,180
will be 90, and with probability
2/3 I'm going to
410
00:22:08,180 --> 00:22:10,260
pick a student in the
second section.
411
00:22:10,260 --> 00:22:12,450
And in that case the conditional
expectation will
412
00:22:12,450 --> 00:22:14,220
take the value of 60.
413
00:22:14,220 --> 00:22:17,020
So this illustrates the idea
that the conditional
414
00:22:17,020 --> 00:22:19,300
expectation is a random
variable.
415
00:22:19,300 --> 00:22:21,760
Depending on what y is going
to be, the conditional
416
00:22:21,760 --> 00:22:25,320
expectation is going to be one
or the other value with
417
00:22:25,320 --> 00:22:27,260
certain probabilities.
418
00:22:27,260 --> 00:22:29,230
Now that we have the
distribution of the
419
00:22:29,230 --> 00:22:31,610
conditional expectation
we can calculate the
420
00:22:31,610 --> 00:22:33,560
expected value of it.
421
00:22:33,560 --> 00:22:37,220
And the expected value of such a
random variable is 1/3 times
422
00:22:37,220 --> 00:22:44,000
90, plus 2/3 times 60, and
it comes out to equal 70.
423
00:22:44,000 --> 00:22:49,020
Which miraculously is the same
number that we got up there.
424
00:22:49,020 --> 00:22:53,060
So this tells you that you can
calculate the overall average
425
00:22:53,060 --> 00:22:58,180
in a large class by taking the
averages in each one of the
426
00:22:58,180 --> 00:23:02,900
sections and weighing each one
of the sections according to
427
00:23:02,900 --> 00:23:06,320
the number of students
that it has.
428
00:23:06,320 --> 00:23:10,560
So this section had an average of 90
but only 1/3 of the
429
00:23:10,560 --> 00:23:13,850
students, so it gets
a weight of 1/3.
430
00:23:13,850 --> 00:23:16,520
So the law of iterated
expectations, once more, is
431
00:23:16,520 --> 00:23:18,540
nothing too complicated.
432
00:23:18,540 --> 00:23:20,770
It's just that you can calculate
overall class
433
00:23:20,770 --> 00:23:22,780
average by looking
at the section
434
00:23:22,780 --> 00:23:26,330
averages and combine them.
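As a quick numerical check of the law of iterated expectations on this example, here is a minimal Python sketch. The section probabilities (1/3 and 2/3) and the section averages (90 and 60) are the ones from the lecture; the variable names are only illustrative.

```python
from fractions import Fraction

# P(Y = y): probability that a randomly picked student is in section y.
p_section = {1: Fraction(1, 3), 2: Fraction(2, 3)}
# E[X | Y = y]: average quiz score inside section y.
section_mean = {1: 90, 2: 60}

# E[X] = E[E[X | Y]] = sum over y of P(Y = y) * E[X | Y = y]
overall_mean = sum(p_section[y] * section_mean[y] for y in p_section)
print(overall_mean)  # 70
```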
435
00:23:26,330 --> 00:23:28,680
Now since the conditional
expectation is a random
436
00:23:28,680 --> 00:23:31,860
variable, of course it has
a variance of its own.
437
00:23:31,860 --> 00:23:34,080
So let's calculate
the variance.
438
00:23:34,080 --> 00:23:36,060
How do we calculate variances?
439
00:23:36,060 --> 00:23:38,960
We look at all the possible
numerical values of this
440
00:23:38,960 --> 00:23:42,270
random variable, which
are 90 and 60.
441
00:23:42,270 --> 00:23:45,620
We look at the difference of
those possible numerical
442
00:23:45,620 --> 00:23:49,910
values from the mean of this
random variable, and the mean
443
00:23:49,910 --> 00:23:53,770
of that random variable, we
found that it's 70.
444
00:23:53,770 --> 00:23:57,480
And then we weight the different
possible numerical
445
00:23:57,480 --> 00:23:59,960
values according to their
probabilities.
446
00:23:59,960 --> 00:24:03,930
So with probability 1/3 the
conditional expectation is 90,
447
00:24:03,930 --> 00:24:06,940
which is 20 away
from the mean.
448
00:24:06,940 --> 00:24:08,470
And we get this squared
distance.
449
00:24:08,470 --> 00:24:11,750
With probability 2/3 the
conditional expectation is 60,
450
00:24:11,750 --> 00:24:14,400
which is 10 away from the
mean, has this squared
451
00:24:14,400 --> 00:24:16,910
distance and gets weighed
by 2/3, which is the
452
00:24:16,910 --> 00:24:18,470
probability of 60.
453
00:24:18,470 --> 00:24:21,130
So you do the numbers, and you
get the value for the variance
454
00:24:21,130 --> 00:24:26,800
equal to 200.
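Written out, the calculation just described is the following, with X the quiz score and Y the section:

```latex
\operatorname{var}\bigl(E[X \mid Y]\bigr)
  = \tfrac{1}{3}(90 - 70)^2 + \tfrac{2}{3}(60 - 70)^2
  = \tfrac{400}{3} + \tfrac{200}{3}
  = 200
```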
455
00:24:26,800 --> 00:24:30,250
All right, so now we want to
move towards using that more
456
00:24:30,250 --> 00:24:33,770
complicated formula involving
the conditional variances.
457
00:24:33,770 --> 00:24:36,650
458
00:24:36,650 --> 00:24:40,470
OK, suppose someone goes and
calculates the variance of the
459
00:24:40,470 --> 00:24:44,060
quiz scores inside each
one of the sections.
460
00:24:44,060 --> 00:24:47,680
So someone gives us these two
pieces of information.
461
00:24:47,680 --> 00:24:53,230
In section one we take the
differences from the mean in
462
00:24:53,230 --> 00:24:57,900
that section, and let's say that
the variance turns out to
463
00:24:57,900 --> 00:25:00,240
be a number equal
to 10. Similarly
464
00:25:00,240 --> 00:25:01,410
in the second section.
465
00:25:01,410 --> 00:25:05,280
So these are the variances
of the quiz scores inside
466
00:25:05,280 --> 00:25:07,520
individual sections.
467
00:25:07,520 --> 00:25:09,850
The variance in one conditional
universe, the
468
00:25:09,850 --> 00:25:13,290
variance in the other
conditional universe.
469
00:25:13,290 --> 00:25:18,860
So if I pick a student in
section one and I don't tell
470
00:25:18,860 --> 00:25:21,400
you anything more about the
student, what's the variance
471
00:25:21,400 --> 00:25:23,530
of the random score
of that student?
472
00:25:23,530 --> 00:25:25,810
The variance is 10.
473
00:25:25,810 --> 00:25:28,210
I know Y, but I don't
know the student.
474
00:25:28,210 --> 00:25:31,260
So the score is still a random
variable in that universe.
475
00:25:31,260 --> 00:25:33,860
It has a variance, and
that's the variance.
476
00:25:33,860 --> 00:25:36,330
Similarly, in the other
universe, the variance of the
477
00:25:36,330 --> 00:25:39,110
quiz scores is this
number, 20.
478
00:25:39,110 --> 00:25:42,650
Once more, this is an equality
between numbers.
479
00:25:42,650 --> 00:25:44,920
I have fixed the specific
value of y.
480
00:25:44,920 --> 00:25:48,440
So I put myself in a specific
universe, I can calculate the
481
00:25:48,440 --> 00:25:51,430
variance in that specific
universe.
482
00:25:51,430 --> 00:25:55,150
If I don't specify a numerical
value for capital Y, and say I
483
00:25:55,150 --> 00:25:58,390
don't know what Y is going to
be, it's going to be random.
484
00:25:58,390 --> 00:26:02,510
Then what kind of section
variance I'm going to get
485
00:26:02,510 --> 00:26:04,500
itself will be random.
486
00:26:04,500 --> 00:26:09,530
With probability 1/3, I pick a
student in the first section
487
00:26:09,530 --> 00:26:14,740
in which case the conditional
variance given what I have
488
00:26:14,740 --> 00:26:16,630
picked is going to be 10.
489
00:26:16,630 --> 00:26:20,990
Or with probability 2/3 I pick
y equal to 2, and I place
490
00:26:20,990 --> 00:26:22,690
myself in that universe.
491
00:26:22,690 --> 00:26:25,790
And in that universe the
conditional variance is 20.
492
00:26:25,790 --> 00:26:28,320
So you see again from here that
the conditional variance
493
00:26:28,320 --> 00:26:32,410
is a random variable that takes
different values with
494
00:26:32,410 --> 00:26:33,920
certain probabilities.
495
00:26:33,920 --> 00:26:37,830
And which value it takes depends
on the realization of
496
00:26:37,830 --> 00:26:41,670
the random variable capital Y.
So this happens if capital Y
497
00:26:41,670 --> 00:26:45,970
is one, this happens if capital
Y is equal to 2.
498
00:26:45,970 --> 00:26:50,000
Once you have something
of this form--
499
00:26:50,000 --> 00:26:52,040
a random variable that takes
values with certain
500
00:26:52,040 --> 00:26:53,150
probabilities--
501
00:26:53,150 --> 00:26:55,690
then you can certainly calculate
the expected value
502
00:26:55,690 --> 00:26:57,320
of that random variable.
503
00:26:57,320 --> 00:27:00,110
Don't get intimidated by the
fact that this random
504
00:27:00,110 --> 00:27:03,555
variable, it's something that's
described by a string
505
00:27:03,555 --> 00:27:07,850
of eight symbols, or
seven, instead of
506
00:27:07,850 --> 00:27:09,440
just a single letter.
507
00:27:09,440 --> 00:27:15,290
Think of this whole string of
symbols there as just being a
508
00:27:15,290 --> 00:27:16,940
random variable.
509
00:27:16,940 --> 00:27:21,790
You could call it z for example,
use one letter.
510
00:27:21,790 --> 00:27:25,990
So z is a random variable that
takes these two values with
511
00:27:25,990 --> 00:27:27,990
these corresponding
probabilities.
512
00:27:27,990 --> 00:27:31,210
So we can talk about the
expected value of Z, which is
513
00:27:31,210 --> 00:27:35,560
going to be 1/3 times 10, 2/3
times 20, and we get a certain
514
00:27:35,560 --> 00:27:38,260
number from here.
515
00:27:38,260 --> 00:27:41,620
And now we have all the pieces
to calculate the overall
516
00:27:41,620 --> 00:27:43,620
variance of x.
517
00:27:43,620 --> 00:27:49,330
The formula from the previous
slide tells us this.
518
00:27:49,330 --> 00:27:51,310
Do we have all the pieces?
519
00:27:51,310 --> 00:27:53,190
The expected value of
the variance, we
520
00:27:53,190 --> 00:27:55,160
just calculated it.
521
00:27:55,160 --> 00:27:58,710
The variance of the expected
value, this was the last
522
00:27:58,710 --> 00:28:00,410
calculation in the
previous slide.
523
00:28:00,410 --> 00:28:03,490
We did get a number for
it, it was 200.
524
00:28:03,490 --> 00:28:05,765
You add the two, you find
the total variance.
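The lecture leaves the final numbers unstated, so here is a small Python sketch that puts the pieces together. It uses the lecture's values (section probabilities 1/3 and 2/3, section means 90 and 60, section variances 10 and 20); the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

p = {1: Fraction(1, 3), 2: Fraction(2, 3)}   # P(Y = y)
cond_mean = {1: 90, 2: 60}                   # E[X | Y = y]
cond_var = {1: 10, 2: 20}                    # var(X | Y = y)

mean = sum(p[y] * cond_mean[y] for y in p)   # E[X] = 70

# E[var(X | Y)]: average variability inside the sections.
expected_cond_var = sum(p[y] * cond_var[y] for y in p)
# var(E[X | Y]): variability of the section means around the overall mean.
var_of_cond_mean = sum(p[y] * (cond_mean[y] - mean) ** 2 for y in p)

total_var = expected_cond_var + var_of_cond_mean
print(expected_cond_var, var_of_cond_mean, total_var)  # 50/3 200 650/3
```

So the total variance is 650/3, roughly 216.7, and most of it comes from the difference between the two section means.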
525
00:28:05,765 --> 00:28:09,050
526
00:28:09,050 --> 00:28:12,350
Now the useful piece of this
exercise is to try to
527
00:28:12,350 --> 00:28:16,490
interpret these two numbers,
and see what they mean.
528
00:28:16,490 --> 00:28:20,350
529
00:28:20,350 --> 00:28:26,670
The variance of x given y for
a specific y is the variance
530
00:28:26,670 --> 00:28:28,850
inside section one.
531
00:28:28,850 --> 00:28:31,820
This is the variance
inside section two.
532
00:28:31,820 --> 00:28:34,940
The expected value is some
kind of average of the
533
00:28:34,940 --> 00:28:38,440
variances inside individual
sections.
534
00:28:38,440 --> 00:28:41,770
So this term tells us
something about the
535
00:28:41,770 --> 00:28:46,010
variability of the scores,
how widely spread they are
536
00:28:46,010 --> 00:28:47,856
within individual sections.
537
00:28:47,856 --> 00:28:50,580
538
00:28:50,580 --> 00:28:57,870
So we have three sections, and
these scores happen to be--
539
00:28:57,870 --> 00:29:01,180
OK, let's say the sections
are really different.
540
00:29:01,180 --> 00:29:03,190
So here you have undergraduates
and here you
541
00:29:03,190 --> 00:29:05,860
have post-doctoral students.
542
00:29:05,860 --> 00:29:08,590
And these are the quiz scores,
that's section one, section
543
00:29:08,590 --> 00:29:09,960
two, section three.
544
00:29:09,960 --> 00:29:13,360
Here's the mean of the
first section.
545
00:29:13,360 --> 00:29:16,200
And the variance has something
to do with the spread.
546
00:29:16,200 --> 00:29:18,430
The variance in the second
section has something to do
547
00:29:18,430 --> 00:29:21,830
with the spread, similarly
with the third section.
548
00:29:21,830 --> 00:29:28,220
And the expected value of the
conditional variances is some
549
00:29:28,220 --> 00:29:31,690
weighted average of the three
variances that we get from
550
00:29:31,690 --> 00:29:33,720
individual sections.
551
00:29:33,720 --> 00:29:37,060
So variability within sections
definitely contributes
552
00:29:37,060 --> 00:29:40,000
something to the overall
variability of the scores.
553
00:29:40,000 --> 00:29:45,340
But if you ask me about the
variability over the entire
554
00:29:45,340 --> 00:29:47,740
class there's a second effect.
555
00:29:47,740 --> 00:29:50,470
That has to do with the fact
that different sections are
556
00:29:50,470 --> 00:29:52,660
very different from
each other.
557
00:29:52,660 --> 00:29:59,440
That these scores here are far
away from those scores.
558
00:29:59,440 --> 00:30:02,490
And this term is the one
that does the job.
559
00:30:02,490 --> 00:30:08,410
This one looks at the expected
values inside each section,
560
00:30:08,410 --> 00:30:12,840
and these expected values are
this, this, and that.
561
00:30:12,840 --> 00:30:18,230
And asks a question how widely
spread are they?
562
00:30:18,230 --> 00:30:23,000
It asks how different from
each other are the means
563
00:30:23,000 --> 00:30:25,400
inside individual sections?
564
00:30:25,400 --> 00:30:28,280
And in this picture it would be
a large number because the
565
00:30:28,280 --> 00:30:31,980
different section means
are quite different.
566
00:30:31,980 --> 00:30:35,890
So the story that this formula
is telling us is that the
567
00:30:35,890 --> 00:30:40,810
overall variability of the quiz
scores consists of two
568
00:30:40,810 --> 00:30:44,720
factors that can be quantified
and added.
569
00:30:44,720 --> 00:30:49,580
One factor is how much
variability is there inside
570
00:30:49,580 --> 00:30:51,420
individual sections?
571
00:30:51,420 --> 00:30:54,990
And the other factor is how
different are the sections
572
00:30:54,990 --> 00:30:56,100
from each other?
573
00:30:56,100 --> 00:30:58,620
Both effects contribute
to the overall
574
00:30:58,620 --> 00:30:59,885
variability of the scores.
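To see the two effects add up on actual data, here is a rough Python sketch. The scores below are hypothetical, invented only for illustration (they are not from the lecture); the point is that the within-section part plus the between-section part matches the variance computed directly over the whole class.

```python
from statistics import fmean, pvariance

# Hypothetical quiz scores, invented purely for illustration.
sections = {
    "section 1": [88, 92],
    "section 2": [55, 60, 60, 65],
}

all_scores = [s for scores in sections.values() for s in scores]
n_total = len(all_scores)
overall_mean = fmean(all_scores)

# Within-section part, E[var(X | Y)]: section variances weighted by section size.
within = sum(len(s) / n_total * pvariance(s) for s in sections.values())
# Between-section part, var(E[X | Y]): spread of section means around the overall mean.
between = sum(
    len(s) / n_total * (fmean(s) - overall_mean) ** 2 for s in sections.values()
)

# The two parts should sum to the variance of all scores taken together.
print(within, between, within + between, pvariance(all_scores))
```

Population variance (`pvariance`) is used rather than sample variance because the decomposition is stated for the distribution of a uniformly picked student.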
575
00:30:59,885 --> 00:31:03,920
576
00:31:03,920 --> 00:31:08,290
Let's continue with just one
more numerical example.
577
00:31:08,290 --> 00:31:11,730
Just to get the hang of doing
these kinds of calculations,
578
00:31:11,730 --> 00:31:15,810
and apply this formula to do a
divide and conquer calculation
579
00:31:15,810 --> 00:31:18,270
of the variance of a
random variable.
580
00:31:18,270 --> 00:31:20,830
Just for variety now we're going
to take a continuous
581
00:31:20,830 --> 00:31:22,140
random variable.
582
00:31:22,140 --> 00:31:25,890
Somebody gives you a PDF of this
form, and they ask you
583
00:31:25,890 --> 00:31:26,640
for the variance.
584
00:31:26,640 --> 00:31:29,490
And you say oh that's too
complicated, I don't want to
585
00:31:29,490 --> 00:31:30,350
do integrals.
586
00:31:30,350 --> 00:31:32,480
Can I divide and conquer?
587
00:31:32,480 --> 00:31:35,210
And you say OK, let me do
the following trick.
588
00:31:35,210 --> 00:31:37,830
Let me define a random
variable, y.
589
00:31:37,830 --> 00:31:43,450
Which takes the value 1 if x
falls in here, and takes the
590
00:31:43,450 --> 00:31:47,080
value 2 if x falls in
the second interval.
591
00:31:47,080 --> 00:31:51,340
And let me try to work in the
conditional world where things
592
00:31:51,340 --> 00:31:54,340
might be easier, and then
add things up to
593
00:31:54,340 --> 00:31:57,540
get the overall variance.
594
00:31:57,540 --> 00:32:01,500
So I have defined y this
particular way.
595
00:32:01,500 --> 00:32:04,562
In this example y becomes
a function of x.
596
00:32:04,562 --> 00:32:07,370
y is completely determined
by x.
597
00:32:07,370 --> 00:32:11,230
And I'm going to calculate the
overall variance by trying to
598
00:32:11,230 --> 00:32:14,420
calculate all of the terms
that are involved here.
599
00:32:14,420 --> 00:32:16,430
So let's start calculating.
600
00:32:16,430 --> 00:32:21,690
First observation is that this
event has probability 1/3, and
601
00:32:21,690 --> 00:32:24,390
this event has probability
2/3.
602
00:32:24,390 --> 00:32:28,480
The expected value of x given
that we are in this universe
603
00:32:28,480 --> 00:32:31,260
is 1/2, because we
have a uniform
604
00:32:31,260 --> 00:32:33,350
distribution from 0 to 1.
605
00:32:33,350 --> 00:32:36,630
Here we have a uniform
distribution from 1 to 2, so
606
00:32:36,630 --> 00:32:40,820
the conditional expectation of
x in that universe is 3/2.
607
00:32:40,820 --> 00:32:43,200
How about conditional
variances?
608
00:32:43,200 --> 00:32:48,920
In the world where y is equal
to 1, x has a uniform
609
00:32:48,920 --> 00:32:50,770
distribution on a
unit interval.
610
00:32:50,770 --> 00:32:53,090
What's the variance of x?
611
00:32:53,090 --> 00:32:57,480
By now you've probably seen that
formula, it's 1 over 12.
612
00:32:57,480 --> 00:33:00,580
1 over 12 is the variance of a
uniform distribution over a
613
00:33:00,580 --> 00:33:01,880
unit interval.
614
00:33:01,880 --> 00:33:07,120
When y is equal to 2 the
variance is again 1 over 12.
615
00:33:07,120 --> 00:33:10,850
Because in this instance again
x has a uniform distribution
616
00:33:10,850 --> 00:33:13,360
over an interval
of unit length.
617
00:33:13,360 --> 00:33:16,010
What's the overall expected
value of x?
618
00:33:16,010 --> 00:33:19,080
The way you find the overall
expected value is to consider
619
00:33:19,080 --> 00:33:21,370
the different numerical values
of the conditional
620
00:33:21,370 --> 00:33:22,450
expectation.
621
00:33:22,450 --> 00:33:25,570
And weigh them according
to their probabilities.
622
00:33:25,570 --> 00:33:28,770
So with probability 1/3
the conditional
623
00:33:28,770 --> 00:33:30,830
expectation is 1/2.
624
00:33:30,830 --> 00:33:34,170
And with probability
2/3 the conditional
625
00:33:34,170 --> 00:33:36,460
expectation is 3 over 2.
626
00:33:36,460 --> 00:33:39,555
And this turns out
to be 7 over 6.
627
00:33:39,555 --> 00:33:45,080
628
00:33:45,080 --> 00:33:48,450
So this is the advance work
we need to do, now let's
629
00:33:48,450 --> 00:33:50,660
calculate a few things here.
630
00:33:50,660 --> 00:33:56,660
What's the variance of the
expected value of x given y?
631
00:33:56,660 --> 00:34:00,800
Expected value of x given y is
a random variable that takes
632
00:34:00,800 --> 00:34:06,600
these two values with
these probabilities.
633
00:34:06,600 --> 00:34:10,610
So to find the variance we
consider the probability that
634
00:34:10,610 --> 00:34:18,730
the expected value takes the
numerical value of 1/2 minus
635
00:34:18,730 --> 00:34:23,659
the mean of the conditional
expectation.
636
00:34:23,659 --> 00:34:26,820
What's the mean of the
conditional expectation?
637
00:34:26,820 --> 00:34:28,560
It's the unconditional
expectation.
638
00:34:28,560 --> 00:34:30,980
So it's 7 over 6.
639
00:34:30,980 --> 00:34:32,889
We just did that calculation.
640
00:34:32,889 --> 00:34:38,050
So I'm putting here that number,
7 over 6 squared.
641
00:34:38,050 --> 00:34:41,830
And then there's a second term
with probability 2/3, the
642
00:34:41,830 --> 00:34:48,760
conditional expectation takes
this value of 3 over 2, which
643
00:34:48,760 --> 00:34:54,380
is so much away from the mean,
and we get this contribution.
644
00:34:54,380 --> 00:34:57,800
So this way we have calculated
the variance of the
645
00:34:57,800 --> 00:35:01,590
conditional expectation,
this is this term.
646
00:35:01,590 --> 00:35:04,000
What is this?
647
00:35:04,000 --> 00:35:05,940
Any guesses what
this number is?
648
00:35:05,940 --> 00:35:09,900
649
00:35:09,900 --> 00:35:11,740
It's 1 over 12, why?
650
00:35:11,740 --> 00:35:15,740
The conditional variance just
happened in this example to be
651
00:35:15,740 --> 00:35:18,550
1 over 12 no matter what.
652
00:35:18,550 --> 00:35:21,240
So the conditional variance
is a deterministic random
653
00:35:21,240 --> 00:35:23,530
variable that takes
a constant value.
654
00:35:23,530 --> 00:35:27,110
So the expected value of
this random variable
655
00:35:27,110 --> 00:35:29,490
is just 1 over 12.
656
00:35:29,490 --> 00:35:35,460
So we got the two pieces that we
need, and so we do have the
657
00:35:35,460 --> 00:35:39,515
overall variance of the
random variable x.
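The lecture stops short of the final number, so here is a small Python sketch that finishes the arithmetic with exact fractions. It assumes, as described above, that X is uniform on [0, 1] with probability 1/3 and uniform on [1, 2] with probability 2/3; the direct-integration cross-check at the end is an addition, not part of the lecture.

```python
from fractions import Fraction as F

p = {1: F(1, 3), 2: F(2, 3)}            # P(Y = 1), P(Y = 2)
cond_mean = {1: F(1, 2), 2: F(3, 2)}    # E[X | Y = y]: midpoints of [0, 1] and [1, 2]
cond_var = {1: F(1, 12), 2: F(1, 12)}   # var(X | Y = y): uniform on a unit interval

mean = sum(p[y] * cond_mean[y] for y in p)                            # 7/6
var_of_cond_mean = sum(p[y] * (cond_mean[y] - mean) ** 2 for y in p)  # 2/9
expected_cond_var = sum(p[y] * cond_var[y] for y in p)                # 1/12
total_var = var_of_cond_mean + expected_cond_var

# Cross-check by direct integration:
# E[X^2] = (1/3) * integral of x^2 over [0, 1] + (2/3) * integral of x^2 over [1, 2]
second_moment = F(1, 3) * F(1, 3) + F(2, 3) * F(7, 3)
print(total_var, second_moment - mean ** 2)  # 11/36 11/36
```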
658
00:35:39,515 --> 00:35:45,680
659
00:35:45,680 --> 00:35:50,750
So this was just an academic
example in order to get the
660
00:35:50,750 --> 00:35:56,660
hang of how to manipulate
various quantities.
661
00:35:56,660 --> 00:36:00,480
Now let's use what we have
learned and the tools that we
662
00:36:00,480 --> 00:36:04,410
have to do something a little
more interesting.
663
00:36:04,410 --> 00:36:07,820
OK, so by now you're all in
love with probabilities.
664
00:36:07,820 --> 00:36:11,590
So over the weekend you're going
to bookstores to buy
665
00:36:11,590 --> 00:36:13,540
probability books.
666
00:36:13,540 --> 00:36:19,110
So you're going to visit a
random number of bookstores, and
667
00:36:19,110 --> 00:36:23,900
at each one of the bookstores
you're going to spend a random
668
00:36:23,900 --> 00:36:26,420
amount of money.
669
00:36:26,420 --> 00:36:31,060
So let n be the number of stores
that you are visiting.
670
00:36:31,060 --> 00:36:32,890
So n is an integer--
671
00:36:32,890 --> 00:36:34,870
non-negative random variable--
672
00:36:34,870 --> 00:36:37,050
and perhaps you know
the distribution
673
00:36:37,050 --> 00:36:39,230
of that random variable.
674
00:36:39,230 --> 00:36:44,080
Each time that you walk into a
store your mind is clear from
675
00:36:44,080 --> 00:36:48,580
whatever you did before, and you
just buy a random number
676
00:36:48,580 --> 00:36:51,530
of books that has nothing to
do with how many books you
677
00:36:51,530 --> 00:36:53,650
bought earlier in the day.
678
00:36:53,650 --> 00:36:55,890
It has nothing to do with
how many stores you are
679
00:36:55,890 --> 00:36:57,490
visiting, and so on.
680
00:36:57,490 --> 00:37:00,760
So each time you enter as a
brand new person, and buy a
681
00:37:00,760 --> 00:37:02,180
random number of books,
and spend a
682
00:37:02,180 --> 00:37:03,580
random amount of money.
683
00:37:03,580 --> 00:37:07,160
So what I'm saying, more
precisely, is that I'm making
684
00:37:07,160 --> 00:37:08,760
the following assumptions.
685
00:37:08,760 --> 00:37:11,130
That for each store i--
686
00:37:11,130 --> 00:37:14,360
if you end up visiting
the i-th store--
687
00:37:14,360 --> 00:37:17,480
the amount of money that you
spend is a random variable
688
00:37:17,480 --> 00:37:19,090
that has a certain
distribution.
689
00:37:19,090 --> 00:37:23,410
That distribution is the same
for each store, and the xi's
690
00:37:23,410 --> 00:37:26,890
from store to store are
independent from each other.
691
00:37:26,890 --> 00:37:30,800
And furthermore, the xi's are
all independent of n.
692
00:37:30,800 --> 00:37:34,130
So how much I'm spending at the
store-- once I get in--
693
00:37:34,130 --> 00:37:37,280
has nothing to do with how
many stores I'm visiting.
694
00:37:37,280 --> 00:37:40,700
So this is the setting that
we're going to look at.
695
00:37:40,700 --> 00:37:45,470
y is the total amount of money
that you did spend.
696
00:37:45,470 --> 00:37:48,790
It's the sum of how much you
spent in the stores, but the
697
00:37:48,790 --> 00:37:53,980
index goes up to capital N.
And what's the twist here?
698
00:37:53,980 --> 00:37:57,460
It's that we're dealing with the
sum of independent random
699
00:37:57,460 --> 00:38:02,690
variables except that how many
random variables we have is
700
00:38:02,690 --> 00:38:07,470
not given to us ahead of time,
but it is chosen at random.
701
00:38:07,470 --> 00:38:12,480
So it's a sum of a random number
of random variables.
702
00:38:12,480 --> 00:38:15,360
We would like to calculate some
quantities that have to
703
00:38:15,360 --> 00:38:19,690
do with y, in particular the
expected value of y, or the
704
00:38:19,690 --> 00:38:21,930
variance of y.
705
00:38:21,930 --> 00:38:23,540
How do we go about it?
706
00:38:23,540 --> 00:38:26,950
OK, we know something about the
linearity of expectations.
707
00:38:26,950 --> 00:38:31,890
That expectation of a sum is the
sum of the expectations.
708
00:38:31,890 --> 00:38:37,180
But we have used that rule only
in the case where it's
709
00:38:37,180 --> 00:38:39,850
the sum of a fixed number
of random variables.
710
00:38:39,850 --> 00:38:43,670
So expected value of x plus y
plus z is expectation of x,
711
00:38:43,670 --> 00:38:46,390
plus expectation of y, plus
expectation of z.
712
00:38:46,390 --> 00:38:48,960
We know this for a fixed number
of random variables.
713
00:38:48,960 --> 00:38:53,140
We don't know it, or how it
would work for the case of a
714
00:38:53,140 --> 00:38:54,430
random number.
715
00:38:54,430 --> 00:38:57,870
Well, if we know something
about the case of a fixed number of
716
00:38:57,870 --> 00:39:01,730
random variables, let's transport
ourselves to a
717
00:39:01,730 --> 00:39:05,310
conditional universe where the
number of random variables
718
00:39:05,310 --> 00:39:07,570
we're summing is fixed.
719
00:39:07,570 --> 00:39:11,640
So let's try to break the
problem divide and conquer by
720
00:39:11,640 --> 00:39:15,300
conditioning on the different
possible values of the number
721
00:39:15,300 --> 00:39:17,290
of bookstores that
we're visiting.
722
00:39:17,290 --> 00:39:19,860
So let's work in the conditional
universe, find the
723
00:39:19,860 --> 00:39:24,950
conditional expectation in this
universe, and then use
724
00:39:24,950 --> 00:39:29,630
our law of iterated expectations
to see what
725
00:39:29,630 --> 00:39:32,840
happens more generally.
726
00:39:32,840 --> 00:39:37,120
If I told you that I visited
exactly little n stores, where
727
00:39:37,120 --> 00:39:40,420
little n now is a number,
let's say 10,
728
00:39:40,420 --> 00:39:44,840
then the amount of money you're
spending is x1 plus x2
729
00:39:44,840 --> 00:39:51,060
all the way up to x10 given
that we visited 10 stores.
730
00:39:51,060 --> 00:39:54,640
So what I have done here is that
I've replaced the capital
731
00:39:54,640 --> 00:39:59,370
N with little n, and I can do
this because I'm now in the
732
00:39:59,370 --> 00:40:01,160
conditional universe
where I know that
733
00:40:01,160 --> 00:40:04,160
capital N is little n.
734
00:40:04,160 --> 00:40:06,840
Now little n is fixed.
735
00:40:06,840 --> 00:40:10,810
We have assumed that n is
independent from the xi's.
736
00:40:10,810 --> 00:40:15,900
So in this universe of a fixed
n this information here
737
00:40:15,900 --> 00:40:20,400
doesn't tell me anything new
about the values of the x's.
738
00:40:20,400 --> 00:40:24,600
If you're conditioning on random
variables that are independent
739
00:40:24,600 --> 00:40:27,220
from the random variables you
are interested in, the
740
00:40:27,220 --> 00:40:30,630
conditioning has no effect,
and so it can be dropped.
741
00:40:30,630 --> 00:40:33,000
So in this conditional universe
where you visit
742
00:40:33,000 --> 00:40:35,720
exactly 10 stores the expected
amount of money you're
743
00:40:35,720 --> 00:40:40,840
spending is the expectation of
the amount of money spent in
744
00:40:40,840 --> 00:40:44,350
10 stores, which is the sum of
the expected amount of money
745
00:40:44,350 --> 00:40:45,880
in each store.
746
00:40:45,880 --> 00:40:48,760
Each one of these is the same
number, because the random
747
00:40:48,760 --> 00:40:50,960
variables have identical
distributions.
748
00:40:50,960 --> 00:40:54,130
So it's n times the expected
value of money you spent in a
749
00:40:54,130 --> 00:40:57,140
typical store.
750
00:40:57,140 --> 00:41:02,240
This is almost obvious without
doing it formally.
751
00:41:02,240 --> 00:41:05,010
If I'm telling you that you're
visiting 10 stores, what you
752
00:41:05,010 --> 00:41:09,220
expect to spend is 10 times the
amount you expect to spend
753
00:41:09,220 --> 00:41:12,180
in each store individually.
754
00:41:12,180 --> 00:41:16,480
Now let's take this equality
here and rewrite it in our
755
00:41:16,480 --> 00:41:20,030
abstract notation, in terms
of random variables.
756
00:41:20,030 --> 00:41:22,170
This is an equality
between numbers.
757
00:41:22,170 --> 00:41:25,440
Expected value of y given that
you visit 10 stores is 10
758
00:41:25,440 --> 00:41:28,220
times this particular number.
759
00:41:28,220 --> 00:41:30,345
Let's translate it into
random variables.
760
00:41:30,345 --> 00:41:36,290
In random variable notation,
the expected value of money
761
00:41:36,290 --> 00:41:39,610
you're spending given the
number of stores--
762
00:41:39,610 --> 00:41:42,480
but without telling you
a specific number--
763
00:41:42,480 --> 00:41:46,720
is whatever that number of
stores turns out to be times
764
00:41:46,720 --> 00:41:49,300
the expected value of x.
765
00:41:49,300 --> 00:41:55,110
So this is a random variable
that takes this as a numerical
766
00:41:55,110 --> 00:41:58,150
value whenever capital
N happens to be
767
00:41:58,150 --> 00:42:00,030
equal to little n.
768
00:42:00,030 --> 00:42:04,570
This is a random variable, which
by definition takes this
769
00:42:04,570 --> 00:42:07,450
numerical value whenever
capital N is
770
00:42:07,450 --> 00:42:09,520
equal to little n.
771
00:42:09,520 --> 00:42:14,960
So no matter what capital N
happens to be, what specific
772
00:42:14,960 --> 00:42:18,870
value, little n, it takes,
this is equal to that.
773
00:42:18,870 --> 00:42:21,590
Therefore the value of this
random variable is going to be
774
00:42:21,590 --> 00:42:23,350
equal to that random variable.
775
00:42:23,350 --> 00:42:26,750
So as random variables, these
two random variables are equal
776
00:42:26,750 --> 00:42:28,000
to each other.
777
00:42:28,000 --> 00:42:29,940
778
00:42:29,940 --> 00:42:33,200
And now we use the law of
iterated expectations.
779
00:42:33,200 --> 00:42:35,750
The law of iterated expectations
tells us that the
780
00:42:35,750 --> 00:42:39,530
overall expected value of y is
the expected value of the
781
00:42:39,530 --> 00:42:41,270
conditional expectation.
782
00:42:41,270 --> 00:42:43,650
We have a formula for the
conditional expectation.
783
00:42:43,650 --> 00:42:46,580
It's n times expected
value of x.
784
00:42:46,580 --> 00:42:50,390
Now the expected value
of x is a number.
785
00:42:50,390 --> 00:42:54,970
Expected value of something
random times a number is
786
00:42:54,970 --> 00:42:58,320
expected value of the
random variable
787
00:42:58,320 --> 00:42:59,820
times the number itself.
788
00:42:59,820 --> 00:43:02,880
We can take a number outside
the expectation.
789
00:43:02,880 --> 00:43:06,060
So expected value of
x gets pulled out.
790
00:43:06,060 --> 00:43:09,790
And that's the conclusion,
that overall the expected
791
00:43:09,790 --> 00:43:13,340
amount of money you're going to
spend is equal to how many
792
00:43:13,340 --> 00:43:16,670
stores you expect to visit on
the average, and how much
793
00:43:16,670 --> 00:43:22,050
money you expect to spend on
each one on the average.
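The conclusion E[Y] = E[N] E[X] can be checked by brute force on a toy case. The sketch below is only an illustration: the distributions for N and for the X_i are hypothetical, chosen to keep the enumeration small, and the helper names are made up for this example. It enumerates every value of N and every combination of per-store amounts, assuming the X_i are independent of each other and of N, as in the lecture.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions, chosen only for illustration.
pmf_n = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}   # number of stores N
pmf_x = {10: F(1, 2), 30: F(1, 2)}             # amount spent in one store, X_i

def mean(pmf):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(v * prob for v, prob in pmf.items())

def expected_total(pmf_n, pmf_x):
    """E[Y] by enumeration: for each N = n, sum over all tuples (x_1, ..., x_n)."""
    total = F(0)
    for n, p_n in pmf_n.items():
        for xs in product(pmf_x, repeat=n):
            prob = p_n
            for x in xs:
                prob *= pmf_x[x]       # independence: probabilities multiply
            total += prob * sum(xs)    # sum(()) is 0 when no store is visited
    return total

e_y = expected_total(pmf_n, pmf_x)
print(e_y, mean(pmf_n) * mean(pmf_x))  # 20 20
```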
794
00:43:22,050 --> 00:43:24,890
You might have guessed that
this is the answer.
795
00:43:24,890 --> 00:43:30,400
If you expect to visit 10
stores, and you expect to
796
00:43:30,400 --> 00:43:34,460
spend $100 on each store, then
yes, you expect to spend
797
00:43:34,460 --> 00:43:36,150
$1,000 today.
798
00:43:36,150 --> 00:43:39,050
You're not going to impress your
Harvard friends if you
799
00:43:39,050 --> 00:43:40,300
tell them that story.
800
00:43:40,300 --> 00:43:42,900
801
00:43:42,900 --> 00:43:46,410
It's one of the cases where
reasoning, on the average,
802
00:43:46,410 --> 00:43:50,160
does give you the plausible
answer.
803
00:43:50,160 --> 00:43:54,290
But you will be able to impress
your Harvard friends
804
00:43:54,290 --> 00:43:56,940
if you tell them that I can
actually calculate the
805
00:43:56,940 --> 00:44:01,510
variance of how much
I can spend.
806
00:44:01,510 --> 00:44:05,500
And we're going to work by
applying this formula that we
807
00:44:05,500 --> 00:44:09,710
have, and the difficulty is
basically sorting out all
808
00:44:09,710 --> 00:44:14,360
those terms here, and
what they mean.
809
00:44:14,360 --> 00:44:20,630
So let's start with this term.
810
00:44:20,630 --> 00:44:23,460
So the expected value of y given
that you're visiting n
811
00:44:23,460 --> 00:44:26,280
stores is n times the
expected value of x.
812
00:44:26,280 --> 00:44:28,250
That's what we did in
the previous slide.
813
00:44:28,250 --> 00:44:32,540
So this thing is a random
variable, it has a variance.
814
00:44:32,540 --> 00:44:34,300
What is the variance?
815
00:44:34,300 --> 00:44:39,240
It's the variance of n times
the expected value of x.
816
00:44:39,240 --> 00:44:42,010
Remember expected value
of x is a number.
817
00:44:42,010 --> 00:44:46,180
So we're dealing with the
variance of n times a number.
818
00:44:46,180 --> 00:44:48,330
What happens when you
multiply a random
819
00:44:48,330 --> 00:44:50,800
variable by a constant?
820
00:44:50,800 --> 00:44:55,020
The variance becomes the
previous variance times the
821
00:44:55,020 --> 00:44:56,650
constant squared.
822
00:44:56,650 --> 00:45:01,900
So the variance of this is the
variance of n times the square
823
00:45:01,900 --> 00:45:04,300
of that constant that
we had here.
824
00:45:04,300 --> 00:45:08,570
So this tells us the variance
of the expected
825
00:45:08,570 --> 00:45:10,290
value of y given n.
826
00:45:10,290 --> 00:45:13,380
This is the part of the
variability of how much money
827
00:45:13,380 --> 00:45:16,950
you're spending, which is
attributed to the randomness,
828
00:45:16,950 --> 00:45:19,650
or the variability, in
the number of stores
829
00:45:19,650 --> 00:45:21,380
that you are visiting.
830
00:45:21,380 --> 00:45:24,450
So the interpretation of the
two terms is there's
831
00:45:24,450 --> 00:45:27,760
randomness in how much you're
going to spend, and this is
832
00:45:27,760 --> 00:45:32,480
attributed to the randomness
in the number of stores
833
00:45:32,480 --> 00:45:36,660
together with the randomness
inside individual stores.
834
00:45:36,660 --> 00:45:40,110
Well, after I tell you how many
stores you're visiting.
835
00:45:40,110 --> 00:45:42,570
So now let's deal with this
term-- the variance inside
836
00:45:42,570 --> 00:45:45,020
individual stores.
837
00:45:45,020 --> 00:45:47,070
Let's take it slow.
838
00:45:47,070 --> 00:45:50,490
If I tell you that you're
visiting exactly little n
839
00:45:50,490 --> 00:45:54,220
stores, then y is how much
money you spent in those
840
00:45:54,220 --> 00:45:55,490
little n stores.
841
00:45:55,490 --> 00:45:59,480
You're dealing with the sum of
little n random variables.
842
00:45:59,480 --> 00:46:01,290
What is the variance
of the sum of
843
00:46:01,290 --> 00:46:03,120
little n random variables?
844
00:46:03,120 --> 00:46:05,880
It's the sum of their
variances.
845
00:46:05,880 --> 00:46:10,590
So each store contributes a
variance of x, and you're
846
00:46:10,590 --> 00:46:12,600
adding over little n stores.
847
00:46:12,600 --> 00:46:16,520
That's the variance of money
spent if I tell you
848
00:46:16,520 --> 00:46:18,040
the number of stores.
849
00:46:18,040 --> 00:46:26,430
Now let's translate this into
random variable notation.
850
00:46:26,430 --> 00:46:30,310
This is a random variable that
takes this numerical value
851
00:46:30,310 --> 00:46:33,630
whenever capital N is
equal to little n.
852
00:46:33,630 --> 00:46:37,020
This is a random variable that
takes this numerical value
853
00:46:37,020 --> 00:46:39,250
whenever capital N is
equal to little n.
854
00:46:39,250 --> 00:46:40,760
This is equal to that.
855
00:46:40,760 --> 00:46:43,960
Therefore, these two are always
equal, no matter what
856
00:46:43,960 --> 00:46:45,400
happens to y.
857
00:46:45,400 --> 00:46:49,100
So we have an equality here
between random variables.
858
00:46:49,100 --> 00:46:51,620
Now we take expectations
of both.
859
00:46:51,620 --> 00:46:56,160
Expected value of the variance
is expected value of this.
860
00:46:56,160 --> 00:46:59,890
OK, it may look confusing to
think of the expected value of
861
00:46:59,890 --> 00:47:05,740
the variance here, but the
variance of x is a number, not
862
00:47:05,740 --> 00:47:06,650
a random variable.
863
00:47:06,650 --> 00:47:08,480
You think of it as a constant.
864
00:47:08,480 --> 00:47:12,580
So the expected value of n times
a constant gives us the
865
00:47:12,580 --> 00:47:16,420
expected value of n times
the constant itself.
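[Editor's sketch of the second term, in symbols. It assumes the amounts spent in different stores are independent with common variance var(X) and independent of N, which is what lets the variance of the sum be the sum of the variances; the notation is ours.]
```latex
\operatorname{var}(Y \mid N = n) = n\,\operatorname{var}(X)
\;\Longrightarrow\;
\operatorname{var}(Y \mid N) = N\,\operatorname{var}(X)
\;\Longrightarrow\;
\mathbf{E}\bigl[\operatorname{var}(Y \mid N)\bigr] = \mathbf{E}[N]\,\operatorname{var}(X)
```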
866
00:47:16,420 --> 00:47:20,840
So now we got the second term
as well, and now we put
867
00:47:20,840 --> 00:47:24,900
everything together, this plus
that to get an expression for
868
00:47:24,900 --> 00:47:28,050
the overall variance of y.
869
00:47:28,050 --> 00:47:32,380
Which again, as I said before,
the overall variability in y
870
00:47:32,380 --> 00:47:36,790
has to do with the variability
of how much you spent inside
871
00:47:36,790 --> 00:47:39,210
the typical store.
872
00:47:39,210 --> 00:47:43,000
And the variability in
the number of stores
873
00:47:43,000 --> 00:47:45,510
that you are visiting.
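[Editor's sketch: a small simulation that checks the combined formula var(Y) = E[N] var(X) + (E[X])^2 var(N) numerically. The distributions are hypothetical choices made only for illustration and are not from the lecture: N uniform on {1, 2, 3, 4} and each X uniform on [0, 10], all independent.]
```python
import random
import statistics
def simulate_total_spent(num_trials=100_000, seed=0):
    """Monte Carlo estimate of the mean and variance of Y = X_1 + ... + X_N."""
    rng = random.Random(seed)
    totals = []
    for _ in range(num_trials):
        n = rng.randint(1, 4)  # hypothetical: number of stores, uniform on {1, 2, 3, 4}
        # hypothetical: amount spent per store, uniform on [0, 10], independent of n
        totals.append(sum(rng.uniform(0.0, 10.0) for _ in range(n)))
    return statistics.fmean(totals), statistics.pvariance(totals)
# Exact moments of the assumed distributions.
mean_n, var_n = 2.5, 15.0 / 12.0   # discrete uniform on {1, ..., 4}
mean_x, var_x = 5.0, 100.0 / 12.0  # continuous uniform on [0, 10]
# Law of iterated expectations and law of total variance.
predicted_mean = mean_n * mean_x
predicted_var = mean_n * var_x + mean_x ** 2 * var_n
sim_mean, sim_var = simulate_total_spent()
print(f"mean: predicted {predicted_mean:.3f}, simulated {sim_mean:.3f}")
print(f"variance: predicted {predicted_var:.3f}, simulated {sim_var:.3f}")
```
[With a large number of trials the simulated values should land close to the predicted ones; the first part of the predicted variance is the within-store variability and the second comes from the randomness in the number of stores.]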
874
00:47:45,510 --> 00:47:48,820
OK, so this is it for today.
875
00:47:48,820 --> 00:47:52,600
We'll change subjects quite
radically from next time.
876
00:47:52,600 --> 00:47:53,850