1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high-quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:22,420
10
00:00:22,420 --> 00:00:25,310
PROFESSOR: OK, good morning.
11
00:00:25,310 --> 00:00:30,930
So today, we're going to have
a fairly packed lecture.
12
00:00:30,930 --> 00:00:34,060
We are going to conclude
with chapter two,
13
00:00:34,060 --> 00:00:35,560
discrete random variables.
14
00:00:35,560 --> 00:00:37,140
And we will be talking
mostly about
15
00:00:37,140 --> 00:00:39,322
multiple random variables.
16
00:00:39,322 --> 00:00:43,060
And this is also the
last lecture as far
17
00:00:43,060 --> 00:00:44,720
as quiz one is concerned.
18
00:00:44,720 --> 00:00:48,350
So it's going to cover the
material until today, and of
19
00:00:48,350 --> 00:00:52,550
course the next recitation
and tutorial as well.
20
00:00:52,550 --> 00:00:57,170
OK, so we're going to review
quickly what we introduced at
21
00:00:57,170 --> 00:01:01,040
the end of last lecture, where
we talked about the joint PMF
22
00:01:01,040 --> 00:01:02,300
of two random variables.
23
00:01:02,300 --> 00:01:05,040
We're going to talk about the
case of more than two random
24
00:01:05,040 --> 00:01:07,440
variables as well.
25
00:01:07,440 --> 00:01:09,910
We're going to talk about
the familiar concepts of
26
00:01:09,910 --> 00:01:14,300
conditioning and independence,
but applied to random
27
00:01:14,300 --> 00:01:16,460
variables instead of events.
28
00:01:16,460 --> 00:01:19,320
We're going to look at the
expectations once more, talk
29
00:01:19,320 --> 00:01:22,720
about a few properties that they
have, and then solve a
30
00:01:22,720 --> 00:01:25,900
couple of problems and calculate
a few things in
31
00:01:25,900 --> 00:01:28,180
somewhat clever ways.
32
00:01:28,180 --> 00:01:31,790
So the first point I want to
make is that, to a large
33
00:01:31,790 --> 00:01:34,870
extent, whatever is happening
in our chapter on discrete
34
00:01:34,870 --> 00:01:39,160
random variables is just an
exercise in notation.
35
00:01:39,160 --> 00:01:42,850
There are concepts that
you are already familiar
36
00:01:42,850 --> 00:01:45,230
with-- probabilities,
probabilities of two things
37
00:01:45,230 --> 00:01:47,490
happening, conditional
probabilities.
38
00:01:47,490 --> 00:01:51,760
And all that we're doing, to
some extent, is rewriting
39
00:01:51,760 --> 00:01:54,840
those familiar concepts
in new notation.
40
00:01:54,840 --> 00:01:57,810
So for example, this
is the joint PMF
41
00:01:57,810 --> 00:01:59,020
of two random variables.
42
00:01:59,020 --> 00:02:02,080
It gives us, for any pair of
possible values of those
43
00:02:02,080 --> 00:02:05,510
random variables, the
probability that that pair
44
00:02:05,510 --> 00:02:07,270
occurs simultaneously.
45
00:02:07,270 --> 00:02:10,020
So it's the probability that
simultaneously x takes that
46
00:02:10,020 --> 00:02:13,580
value, and y takes
that other value.
47
00:02:13,580 --> 00:02:17,210
And similarly, we have the
notion of the conditional PMF,
48
00:02:17,210 --> 00:02:21,060
which is just a list
of the various
49
00:02:21,060 --> 00:02:23,750
conditional probabilities
of interest, conditional
50
00:02:23,750 --> 00:02:26,450
probability that one random
variable takes this value
51
00:02:26,450 --> 00:02:30,320
given that the other random
variable takes that value.
52
00:02:30,320 --> 00:02:33,640
Now, a remark about conditional
probabilities.
53
00:02:33,640 --> 00:02:36,640
Conditional probabilities
generally are like ordinary
54
00:02:36,640 --> 00:02:37,370
probabilities.
55
00:02:37,370 --> 00:02:40,170
You condition on something
particular.
56
00:02:40,170 --> 00:02:43,230
So here we condition
on a particular y.
57
00:02:43,230 --> 00:02:46,580
So think of little y as
a fixed quantity.
58
00:02:46,580 --> 00:02:49,800
And then look at this
as a function of x.
59
00:02:49,800 --> 00:02:54,430
So given that y, which we
condition on, given our new
60
00:02:54,430 --> 00:02:58,990
universe, we're considering the
various possibilities for
61
00:02:58,990 --> 00:03:01,290
x and the probabilities
that they have.
62
00:03:01,290 --> 00:03:04,000
Now, the probabilities over
all x's, of course,
63
00:03:04,000 --> 00:03:05,830
need to add to 1.
64
00:03:05,830 --> 00:03:11,530
So we should have a relation
of this kind.
65
00:03:11,530 --> 00:03:14,420
So they're just like ordinary
probabilities over the
66
00:03:14,420 --> 00:03:18,230
different x's in a universe
where we are told the value of
67
00:03:18,230 --> 00:03:20,940
the random variable y.
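The normalization just described can be sketched in Python; the joint PMF below and the helper name are illustrative assumptions, not from the lecture:

```python
# A hypothetical joint PMF p_{X,Y}(x, y), stored as a dict keyed by (x, y).
joint = {
    (1, 1): 0.10, (1, 2): 0.20,
    (2, 1): 0.30, (2, 2): 0.40,
}

def conditional_x_given_y(joint, y):
    """p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y); only defined when p_Y(y) > 0."""
    p_y = sum(p for (xv, yv), p in joint.items() if yv == y)
    return {xv: p / p_y for (xv, yv), p in joint.items() if yv == y}

# In the conditional universe where Y = 1, the probabilities over x sum to 1.
cond = conditional_x_given_y(joint, 1)
assert abs(sum(cond.values()) - 1.0) < 1e-12
```

Dividing by p_Y(y) is exactly what makes the conditional probabilities over the different x's add up to 1.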
68
00:03:20,940 --> 00:03:22,335
Now, how are these related?
69
00:03:22,335 --> 00:03:25,200
70
00:03:25,200 --> 00:03:28,190
So we call these the marginal,
these the joint, these the
71
00:03:28,190 --> 00:03:29,150
conditional.
72
00:03:29,150 --> 00:03:31,510
And there are some relations
between these.
73
00:03:31,510 --> 00:03:35,430
For example, to find the
marginal from the joint, it's
74
00:03:35,430 --> 00:03:37,730
pretty straightforward.
75
00:03:37,730 --> 00:03:41,680
The probability that x takes a
particular value is the sum of
76
00:03:41,680 --> 00:03:45,030
the probabilities of all of the
different ways that this
77
00:03:45,030 --> 00:03:47,190
particular value may occur.
78
00:03:47,190 --> 00:03:48,380
What are the different ways?
79
00:03:48,380 --> 00:03:51,910
Well, it may occur together with
a certain y, or together
80
00:03:51,910 --> 00:03:55,110
with some other y, or together
with some other y.
81
00:03:55,110 --> 00:03:58,030
So you look at all the possible
y's that can go
82
00:03:58,030 --> 00:04:01,750
together with this x, and add
the probabilities of all of
83
00:04:01,750 --> 00:04:07,220
those pairs for which we get
this particular value of x.
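The marginal-from-joint recipe can be written as a short sum; the numbers below are illustrative, not from the lecture:

```python
# Hypothetical joint PMF p_{X,Y}; the numbers are for illustration only.
joint = {
    (1, 1): 0.10, (1, 2): 0.20,
    (2, 1): 0.30, (2, 2): 0.40,
}

def marginal_x(joint, x):
    """p_X(x) = sum over all y of p_{X,Y}(x, y): add up every pair containing x."""
    return sum(p for (xv, _), p in joint.items() if xv == x)
```

Each pair (x, y) is one of the "different ways" that the value x can occur, and the marginal adds their probabilities.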
84
00:04:07,220 --> 00:04:13,120
And then there's a relation
that connects these
85
00:04:13,120 --> 00:04:16,230
two probabilities with the
conditional probability.
86
00:04:16,230 --> 00:04:18,630
And it's this relation.
87
00:04:18,630 --> 00:04:20,279
It's nothing new.
88
00:04:20,279 --> 00:04:25,160
It's just new notation for
writing what we already know,
89
00:04:25,160 --> 00:04:28,130
that the probability of two
things happening is the
90
00:04:28,130 --> 00:04:31,460
probability that the first thing
happens, and then given
91
00:04:31,460 --> 00:04:34,210
that the first thing happens,
the probability that the
92
00:04:34,210 --> 00:04:36,140
second one happens.
93
00:04:36,140 --> 00:04:39,050
So how do we go from
one to the other?
94
00:04:39,050 --> 00:04:42,960
Think of A as being the event
that X takes the value, little
95
00:04:42,960 --> 00:04:49,120
x, and B being the event that
Y takes the value, little y.
96
00:04:49,120 --> 00:04:52,230
So the joint probability is the
probability that these two
97
00:04:52,230 --> 00:04:54,220
things happen simultaneously.
98
00:04:54,220 --> 00:04:58,140
It's the probability that X
takes this value times the
99
00:04:58,140 --> 00:05:03,280
conditional probability that Y
takes this value, given that X
100
00:05:03,280 --> 00:05:04,670
took that first value.
101
00:05:04,670 --> 00:05:08,470
So it's the familiar
multiplication rule, but just
102
00:05:08,470 --> 00:05:11,030
transcribed in our
new notation.
103
00:05:11,030 --> 00:05:13,690
So nothing new so far.
104
00:05:13,690 --> 00:05:17,480
OK, why did we go through this
exercise and this notation?
105
00:05:17,480 --> 00:05:19,980
It's because in the experiments
where we're
106
00:05:19,980 --> 00:05:23,160
interested in the real world,
typically there's going to be
107
00:05:23,160 --> 00:05:24,630
lots of uncertain quantities.
108
00:05:24,630 --> 00:05:27,150
There's going to be multiple
random variables.
109
00:05:27,150 --> 00:05:31,520
And we want to be able to talk
about them simultaneously.
110
00:05:31,520 --> 00:05:31,665
OK.
111
00:05:31,665 --> 00:05:35,110
Why two and not more than two?
112
00:05:35,110 --> 00:05:37,620
How about three random
variables?
113
00:05:37,620 --> 00:05:41,290
Well, if you understand what's
going on in this slide, you
114
00:05:41,290 --> 00:05:45,720
should be able to kind of
automatically generalize this
115
00:05:45,720 --> 00:05:48,260
to the case of multiple
random variables.
116
00:05:48,260 --> 00:05:51,590
So for example, if we have three
random variables, X, Y,
117
00:05:51,590 --> 00:05:56,720
and Z, and you see an expression
like this, it
118
00:05:56,720 --> 00:05:58,670
should be clear what it means.
119
00:05:58,670 --> 00:06:02,070
It's the probability that
X takes this value and
120
00:06:02,070 --> 00:06:06,240
simultaneously Y takes that
value and simultaneously Z
121
00:06:06,240 --> 00:06:07,765
takes that value.
122
00:06:07,765 --> 00:06:13,280
I guess that's an uppercase Z
here, that's a lowercase z.
123
00:06:13,280 --> 00:06:20,500
And if I ask you to find the
marginal of X, if I tell you
124
00:06:20,500 --> 00:06:24,340
the joint PMF of the three
random variables and I ask you
125
00:06:24,340 --> 00:06:27,320
for this value, how
would you find it?
126
00:06:27,320 --> 00:06:31,350
Well, you will try to generalize
this relation here.
127
00:06:31,350 --> 00:06:35,250
The probability that x occurs
is the sum of the
128
00:06:35,250 --> 00:06:44,450
probabilities of all events
that make X take that
129
00:06:44,450 --> 00:06:45,870
particular value.
130
00:06:45,870 --> 00:06:47,400
So what are all the events?
131
00:06:47,400 --> 00:06:51,530
Well, this particular x can
happen together with some y
132
00:06:51,530 --> 00:06:52,790
and some z.
133
00:06:52,790 --> 00:06:55,150
We don't care which y and z.
134
00:06:55,150 --> 00:06:57,890
Any y and z will do.
135
00:06:57,890 --> 00:07:01,220
So when we consider all
possibilities, we need to add
136
00:07:01,220 --> 00:07:04,760
here over all possible values
of y's and z's.
137
00:07:04,760 --> 00:07:08,020
So consider all triples,
x, y, z.
138
00:07:08,020 --> 00:07:12,380
Fix x and consider all the
possibilities for the
139
00:07:12,380 --> 00:07:16,600
remaining variables, y and z,
add these up, and that gives
140
00:07:16,600 --> 00:07:24,740
you the marginal PMF of X. And
then there's other things that
141
00:07:24,740 --> 00:07:26,140
you can do.
142
00:07:26,140 --> 00:07:29,340
This is the multiplication
rule for two events.
143
00:07:29,340 --> 00:07:32,510
We saw back in chapter one that
there's a multiplication
144
00:07:32,510 --> 00:07:35,130
rule when you talk about
more than two events.
145
00:07:35,130 --> 00:07:38,860
And you can write a chain of
conditional probabilities.
146
00:07:38,860 --> 00:07:43,860
We can certainly do the same
in our new notation.
147
00:07:43,860 --> 00:07:45,810
So let's look at this
rule up here.
148
00:07:45,810 --> 00:07:48,700
149
00:07:48,700 --> 00:07:51,220
Multiplication rule for three
random variables,
150
00:07:51,220 --> 00:07:53,000
what does it say?
151
00:07:53,000 --> 00:07:55,280
The probability of three
things happening
152
00:07:55,280 --> 00:07:59,770
simultaneously, X, Y, Z taking
specific values, little x,
153
00:07:59,770 --> 00:08:03,110
little y, little z, that
probability is the probability
154
00:08:03,110 --> 00:08:07,210
that the first thing happens,
that X takes that value.
155
00:08:07,210 --> 00:08:09,880
Given that X takes that value,
we multiply it with the
156
00:08:09,880 --> 00:08:14,650
probability that Y takes
also a certain value.
157
00:08:14,650 --> 00:08:18,560
And now, given that X and Y have
taken those particular
158
00:08:18,560 --> 00:08:21,730
values, we multiply with a
conditional probability that
159
00:08:21,730 --> 00:08:24,380
the third thing happens,
given that the
160
00:08:24,380 --> 00:08:26,960
first two things happen.
161
00:08:26,960 --> 00:08:30,080
So this is just the
multiplication rule for three
162
00:08:30,080 --> 00:08:33,530
events, which would be
probability of A intersection
163
00:08:33,530 --> 00:08:35,669
B intersection C equals--
164
00:08:35,669 --> 00:08:37,909
you know the rest
of the formula.
165
00:08:37,909 --> 00:08:42,330
You just rewrite this formula
in PMF notation.
166
00:08:42,330 --> 00:08:45,310
Probability of A intersection
B intersection C is the
167
00:08:45,310 --> 00:08:49,450
probability of A, which
corresponds to this term,
168
00:08:49,450 --> 00:08:54,010
times the probability of B given
A, times the probability
169
00:08:54,010 --> 00:09:00,700
of C given A and B.
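The three-variable multiplication rule can be checked numerically; the joint PMF below is a made-up example, not one from the lecture:

```python
import itertools

# Hypothetical joint PMF over three binary random variables (X, Y, Z):
# weights 1..8 over the eight triples, normalized so they sum to 1.
vals = (1, 2)
weights = {t: i + 1 for i, t in enumerate(itertools.product(vals, repeat=3))}
total = sum(weights.values())
joint = {t: w / total for t, w in weights.items()}

def prob(pred):
    """Probability of the event {(x, y, z) : pred(x, y, z)}."""
    return sum(p for t, p in joint.items() if pred(*t))

# Multiplication rule: p(x,y,z) = p_X(x) * p_{Y|X}(y|x) * p_{Z|X,Y}(z|x,y)
x, y, z = 1, 2, 1
p_x = prob(lambda a, b, c: a == x)
p_y_given_x = prob(lambda a, b, c: (a, b) == (x, y)) / p_x
p_z_given_xy = joint[(x, y, z)] / prob(lambda a, b, c: (a, b) == (x, y))
assert abs(joint[(x, y, z)] - p_x * p_y_given_x * p_z_given_xy) < 1e-12
```

The three factors correspond exactly to P(A), P(B | A), and P(C | A and B) in event notation.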
170
00:09:00,700 --> 00:09:04,920
So what else is there that's
left from chapter one that we
171
00:09:04,920 --> 00:09:10,190
can or should generalize
to random variables?
172
00:09:10,190 --> 00:09:12,560
Well, there's the notion
of independence.
173
00:09:12,560 --> 00:09:16,720
So let's define what
independence means.
174
00:09:16,720 --> 00:09:19,970
Instead of talking about just
two random variables, let's go
175
00:09:19,970 --> 00:09:22,470
directly to the case of multiple
random variables.
176
00:09:22,470 --> 00:09:24,400
When we talked about events,
things were a little
177
00:09:24,400 --> 00:09:25,100
complicated.
178
00:09:25,100 --> 00:09:28,480
We had a simple definition for
independence of two events.
179
00:09:28,480 --> 00:09:31,950
Two events are independent if
the probability of both is
180
00:09:31,950 --> 00:09:33,740
equal to the product of
the probabilities.
181
00:09:33,740 --> 00:09:35,830
But for three events, it
was kind of messy.
182
00:09:35,830 --> 00:09:38,460
We needed to write down
lots of conditions.
183
00:09:38,460 --> 00:09:41,140
For random variables,
things in some sense
184
00:09:41,140 --> 00:09:42,060
are a little simpler.
185
00:09:42,060 --> 00:09:46,360
We only need to write down one
formula and take this as the
186
00:09:46,360 --> 00:09:49,020
definition of independence.
187
00:09:49,020 --> 00:09:53,630
Three random variables are
independent if and only if, by
188
00:09:53,630 --> 00:09:58,390
definition, their joint
probability mass function
189
00:09:58,390 --> 00:10:02,560
factors out into individual
probability mass functions.
190
00:10:02,560 --> 00:10:08,190
So the probability that all
three things happen is the
191
00:10:08,190 --> 00:10:11,840
product of the individual
probabilities that each one of
192
00:10:11,840 --> 00:10:14,170
these three things
is happening.
193
00:10:14,170 --> 00:10:17,580
So independence means
mathematically that you can
194
00:10:17,580 --> 00:10:21,030
just multiply probabilities to
get to the probability of
195
00:10:21,030 --> 00:10:22,706
several things happening
simultaneously.
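The factorization definition of independence can be turned into a direct check over every triple of values; the helper name and the example PMFs are assumptions for illustration:

```python
import itertools

def is_independent(joint):
    """Check p_{X,Y,Z}(x,y,z) == p_X(x) * p_Y(y) * p_Z(z) for every triple."""
    xs = sorted({t[0] for t in joint})
    ys = sorted({t[1] for t in joint})
    zs = sorted({t[2] for t in joint})
    pX = {x: sum(p for t, p in joint.items() if t[0] == x) for x in xs}
    pY = {y: sum(p for t, p in joint.items() if t[1] == y) for y in ys}
    pZ = {z: sum(p for t, p in joint.items() if t[2] == z) for z in zs}
    return all(
        abs(joint.get((x, y, z), 0.0) - pX[x] * pY[y] * pZ[z]) < 1e-12
        for x, y, z in itertools.product(xs, ys, zs)
    )

# Built as a product of (made-up) marginals, so the factorization holds:
pX = {1: 0.3, 2: 0.7}
pY = {1: 0.5, 2: 0.5}
pZ = {1: 0.9, 2: 0.1}
product_pmf = {(x, y, z): pX[x] * pY[y] * pZ[z]
               for x in pX for y in pY for z in pZ}
assert is_independent(product_pmf)
```

Note the `all(...)`: the single equality really is one condition per triple of values, which is why it carries so much force.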
196
00:10:22,706 --> 00:10:25,680
197
00:10:25,680 --> 00:10:31,040
So with three events, we have
to write a huge number of
198
00:10:31,040 --> 00:10:34,500
equalities
that have to hold.
199
00:10:34,500 --> 00:10:37,500
How can it be that with random
variables we can only manage
200
00:10:37,500 --> 00:10:39,370
with one equality?
201
00:10:39,370 --> 00:10:41,230
Well, the catch is
that this is not
202
00:10:41,230 --> 00:10:43,260
really just one equality.
203
00:10:43,260 --> 00:10:48,390
We require this to be true for
every little x, y, and z.
204
00:10:48,390 --> 00:10:52,600
So in some sense, this is a
bunch of conditions that are
205
00:10:52,600 --> 00:10:56,300
being put on the joint PMF, a
bunch of conditions that we
206
00:10:56,300 --> 00:10:58,130
need to check.
207
00:10:58,130 --> 00:11:01,040
So this is the mathematical
definition.
208
00:11:01,040 --> 00:11:05,400
What is the intuitive content
of this definition?
209
00:11:05,400 --> 00:11:11,130
The intuitive content is
the same as for events.
210
00:11:11,130 --> 00:11:15,020
Random variables are independent
if knowing
211
00:11:15,020 --> 00:11:19,490
something about the realized
values of some of these random
212
00:11:19,490 --> 00:11:25,510
variables does not change our
beliefs about the likelihood
213
00:11:25,510 --> 00:11:29,510
of various values for the
remaining random variables.
214
00:11:29,510 --> 00:11:34,250
So independence would translate,
for example, to a
215
00:11:34,250 --> 00:11:39,690
condition such as the
conditional PMF of X, given
216
00:11:39,690 --> 00:11:46,420
y, should be equal to the
marginal PMF of X. What is
217
00:11:46,420 --> 00:11:47,490
this saying?
218
00:11:47,490 --> 00:11:53,070
That you have some original
beliefs about how likely it is
219
00:11:53,070 --> 00:11:55,210
for X to take this value.
220
00:11:55,210 --> 00:11:58,350
Now, someone comes and
tells you that Y took
221
00:11:58,350 --> 00:12:00,140
on a certain value.
222
00:12:00,140 --> 00:12:03,470
This causes you, in principle,
to revise your beliefs.
223
00:12:03,470 --> 00:12:06,430
And your new beliefs will be
captured by the conditional
224
00:12:06,430 --> 00:12:08,750
PMF, or the conditional
probabilities.
225
00:12:08,750 --> 00:12:12,820
Independence means that your
revised beliefs actually will
226
00:12:12,820 --> 00:12:15,420
be the same as your
original beliefs.
227
00:12:15,420 --> 00:12:19,960
Telling you information about
the value of Y doesn't change
228
00:12:19,960 --> 00:12:24,400
what you expect for the
random variable X.
229
00:12:24,400 --> 00:12:28,750
Why didn't we use this
definition for independence?
230
00:12:28,750 --> 00:12:31,900
Well, because this definition
only makes sense when this
231
00:12:31,900 --> 00:12:34,330
conditional is well-defined.
232
00:12:34,330 --> 00:12:43,290
And this conditional is only
well-defined if the events
233
00:12:43,290 --> 00:12:46,130
that Y takes on that particular
value has positive
234
00:12:46,130 --> 00:12:47,220
probability.
235
00:12:47,220 --> 00:12:51,730
We cannot condition on events
that have zero probability, so
236
00:12:51,730 --> 00:12:55,460
conditional probabilities are
only defined for y's that are
237
00:12:55,460 --> 00:12:59,500
likely to occur, that have
a positive probability.
238
00:12:59,500 --> 00:13:03,640
Now, similarly, with multiple
random variables, if they're
239
00:13:03,640 --> 00:13:07,970
independent, you would have
relations such as the
240
00:13:07,970 --> 00:13:14,290
conditional of X, given y and
z, should be the same as the
241
00:13:14,290 --> 00:13:17,340
marginal of X. What
is this saying?
242
00:13:17,340 --> 00:13:21,220
Again, that if I tell you the
values, the realized values of
243
00:13:21,220 --> 00:13:25,900
random variables Y and Z, this
is not going to change your
244
00:13:25,900 --> 00:13:28,900
beliefs about how likely
x is to occur.
245
00:13:28,900 --> 00:13:30,900
Whatever you believed in the
beginning, you're going to
246
00:13:30,900 --> 00:13:33,000
believe the same thing
afterwards.
247
00:13:33,000 --> 00:13:36,130
So it's important to keep that
intuition in mind, because
248
00:13:36,130 --> 00:13:39,200
sometimes this way you can tell
whether random variables
249
00:13:39,200 --> 00:13:42,820
are independent without having
to do calculations and to
250
00:13:42,820 --> 00:13:44,930
check this formula.
251
00:13:44,930 --> 00:13:47,300
OK, so let's check our concepts
252
00:13:47,300 --> 00:13:49,250
with a simple example.
253
00:13:49,250 --> 00:13:52,220
Let's look at two random
variables that are discrete,
254
00:13:52,220 --> 00:13:55,100
take values between
one and four each.
255
00:13:55,100 --> 00:13:57,890
And this is a table that
gives us the joint PMF.
256
00:13:57,890 --> 00:14:05,720
So it tells us the probability
that X equals to 2 and Y
257
00:14:05,720 --> 00:14:08,040
equals to 1 happening
simultaneously.
258
00:14:08,040 --> 00:14:10,810
It's an event that has
probability 1/20.
259
00:14:10,810 --> 00:14:14,510
Are these two random variables
independent?
260
00:14:14,510 --> 00:14:17,610
You can try to check a
condition like this.
261
00:14:17,610 --> 00:14:21,940
But can we tell directly
from the table?
262
00:14:21,940 --> 00:14:28,470
If I tell you a value of Y,
could that give you useful
263
00:14:28,470 --> 00:14:29,720
information about X?
264
00:14:29,720 --> 00:14:32,180
265
00:14:32,180 --> 00:14:32,860
Certainly.
266
00:14:32,860 --> 00:14:38,680
If I tell you that Y is equal
to 1, this tells you that X
267
00:14:38,680 --> 00:14:40,990
must be equal to 2.
268
00:14:40,990 --> 00:14:44,870
But if I tell you that Y was
equal to 3, this tells you
269
00:14:44,870 --> 00:14:47,540
that, still, X could
be anything.
270
00:14:47,540 --> 00:14:52,220
So telling you the value of
Y kind of changes what you
271
00:14:52,220 --> 00:14:57,240
expect or what you consider
possible for the values of the
272
00:14:57,240 --> 00:14:59,020
other random variable.
273
00:14:59,020 --> 00:15:03,070
So by just inspecting here, we
can tell that the random
274
00:15:03,070 --> 00:15:04,860
variables are not independent.
275
00:15:04,860 --> 00:15:08,290
276
00:15:08,290 --> 00:15:08,470
OK.
277
00:15:08,470 --> 00:15:10,990
What's the other concept we
introduced in chapter one?
278
00:15:10,990 --> 00:15:14,060
We introduced the concept of
conditional independence.
279
00:15:14,060 --> 00:15:17,120
And conditional independence is
like ordinary independence
280
00:15:17,120 --> 00:15:20,420
but applied to a conditional
universe where we're given
281
00:15:20,420 --> 00:15:21,780
some information.
282
00:15:21,780 --> 00:15:24,610
So suppose someone tells you
that the outcome of the
283
00:15:24,610 --> 00:15:30,420
experiment is such that X is
less than or equal to 2 and Y
284
00:15:30,420 --> 00:15:33,920
is larger than or equal to 3.
285
00:15:33,920 --> 00:15:37,670
So we are given the information
that we now live
286
00:15:37,670 --> 00:15:40,010
inside this universe.
287
00:15:40,010 --> 00:15:42,080
So what happens inside
this universe?
288
00:15:42,080 --> 00:15:47,200
Inside this universe, our random
variables are going to
289
00:15:47,200 --> 00:15:55,140
have a new joint PMF which is
conditioned on the event that
290
00:15:55,140 --> 00:15:58,650
we were told that
it has occurred.
291
00:15:58,650 --> 00:16:04,780
So let A correspond to this
sort of event here.
292
00:16:04,780 --> 00:16:06,900
And now we're dealing with
conditional probabilities.
293
00:16:06,900 --> 00:16:09,490
What are those conditional
probabilities?
294
00:16:09,490 --> 00:16:11,490
We can put them in a table.
295
00:16:11,490 --> 00:16:14,220
So it's a two by two table,
since we only have two
296
00:16:14,220 --> 00:16:15,540
possible values.
297
00:16:15,540 --> 00:16:18,080
What are they going to be?
298
00:16:18,080 --> 00:16:20,740
Well, these probabilities
show up in the ratios
299
00:16:20,740 --> 00:16:22,910
1, 2, 2, and 4.
300
00:16:22,910 --> 00:16:25,480
Those ratios have to
stay the same.
301
00:16:25,480 --> 00:16:29,700
The probabilities need
to add up to one.
302
00:16:29,700 --> 00:16:34,030
So what should the denominators
be since these
303
00:16:34,030 --> 00:16:35,380
numbers add up to nine?
304
00:16:35,380 --> 00:16:37,820
These are the conditional
probabilities.
305
00:16:37,820 --> 00:16:40,575
So this is the conditional
PMF in this example.
306
00:16:40,575 --> 00:16:43,870
307
00:16:43,870 --> 00:16:46,990
Now, in this conditional
universe, is x
308
00:16:46,990 --> 00:16:48,255
independent from y?
309
00:16:48,255 --> 00:16:51,230
310
00:16:51,230 --> 00:17:01,450
If I tell you that y takes this
value, so we live in this
311
00:17:01,450 --> 00:17:04,980
universe, what do you
know about x?
312
00:17:04,980 --> 00:17:08,109
What you know about x is that this
value is twice as likely
313
00:17:08,109 --> 00:17:09,930
as that value.
314
00:17:09,930 --> 00:17:13,859
If I condition on y taking this
value, so we're living
315
00:17:13,859 --> 00:17:16,450
here, what do you
know about x?
316
00:17:16,450 --> 00:17:21,660
What you know about x is that
this value is twice as likely
317
00:17:21,660 --> 00:17:23,240
as that value.
318
00:17:23,240 --> 00:17:24,500
So it's the same.
319
00:17:24,500 --> 00:17:30,250
Whether we live here or we live
there, this x is twice as
320
00:17:30,250 --> 00:17:33,670
likely as that x.
321
00:17:33,670 --> 00:17:41,560
So the conditional PMF in this
new universe, the conditional
322
00:17:41,560 --> 00:17:55,970
PMF of X given y, in the new
universe is the same as the
323
00:17:55,970 --> 00:18:01,250
marginal PMF of X, but of course
in the new universe.
324
00:18:01,250 --> 00:18:04,370
So no matter what y is,
the conditional
325
00:18:04,370 --> 00:18:06,860
PMF of X is the same.
326
00:18:06,860 --> 00:18:12,150
And that conditional
PMF is 1/3 and 2/3.
327
00:18:12,150 --> 00:18:15,150
This is the conditional PMF of
X in the new universe no
328
00:18:15,150 --> 00:18:17,000
matter what y occurs.
329
00:18:17,000 --> 00:18:20,330
So Y does not give us any
information about X, doesn't
330
00:18:20,330 --> 00:18:25,620
cause us to change our beliefs
inside this little universe.
331
00:18:25,620 --> 00:18:28,440
And therefore the two random
variables are independent.
332
00:18:28,440 --> 00:18:31,180
Now, the other way that you
can verify that we have
333
00:18:31,180 --> 00:18:34,960
independence is to find the
marginal PMFs of the two
334
00:18:34,960 --> 00:18:36,250
random variables.
335
00:18:36,250 --> 00:18:39,650
The marginal PMF of
X, you find it by
336
00:18:39,650 --> 00:18:41,100
adding those two terms.
337
00:18:41,100 --> 00:18:42,720
You get 1/3.
338
00:18:42,720 --> 00:18:44,620
Adding those two terms,
you get 2/3.
339
00:18:44,620 --> 00:18:48,530
Marginal PMF of Y, you find it,
you add these two terms,
340
00:18:48,530 --> 00:18:51,410
and you get 1/3.
341
00:18:51,410 --> 00:18:56,470
And the marginal PMF of Y
here is going to be 2/3.
342
00:18:56,470 --> 00:18:59,700
And then you ask the question,
is the joint the product of
343
00:18:59,700 --> 00:19:00,860
the marginals?
344
00:19:00,860 --> 00:19:02,630
And indeed it is.
345
00:19:02,630 --> 00:19:05,330
This times this gives you 1/9.
346
00:19:05,330 --> 00:19:08,050
This times this gives you 2/9.
347
00:19:08,050 --> 00:19:12,180
So each value in the table of
the joint PMF is the
348
00:19:12,180 --> 00:19:17,220
product of the marginal PMFs of
X and Y in this universe,
349
00:19:17,220 --> 00:19:19,090
so the two random variables are
350
00:19:19,090 --> 00:19:21,850
independent inside this universe.
351
00:19:21,850 --> 00:19:26,704
So we say that they're
conditionally independent.
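The little two-by-two check from this example can be done with exact fractions. The entries are in the ratio 1 : 2 : 2 : 4 divided by 9, as stated in the lecture; which (x, y) cell gets which entry is an assumption about the table's layout:

```python
from fractions import Fraction as F

# Conditional joint PMF on the universe X <= 2, Y >= 3: entries proportional
# to 1, 2, 2, 4, divided by 9 so they sum to 1.  The cell layout is assumed.
cond_joint = {(1, 3): F(1, 9), (1, 4): F(2, 9),
              (2, 3): F(2, 9), (2, 4): F(4, 9)}

pX = {x: sum(p for (xv, _), p in cond_joint.items() if xv == x) for x in (1, 2)}
pY = {y: sum(p for (_, yv), p in cond_joint.items() if yv == y) for y in (3, 4)}

# Marginals come out 1/3 and 2/3, and every joint entry factors, so X and Y
# are (conditionally) independent inside this universe.
assert pX == {1: F(1, 3), 2: F(2, 3)}
assert all(cond_joint[(x, y)] == pX[x] * pY[y] for x in (1, 2) for y in (3, 4))
```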
352
00:19:26,704 --> 00:19:28,500
All right.
353
00:19:28,500 --> 00:19:32,720
Now let's move to the new topic,
to the new concept that
354
00:19:32,720 --> 00:19:35,170
we introduce in this chapter,
which is the concept of
355
00:19:35,170 --> 00:19:36,440
expectations.
356
00:19:36,440 --> 00:19:38,200
So what are the things
to know here?
357
00:19:38,200 --> 00:19:40,150
One is the general idea.
358
00:19:40,150 --> 00:19:43,140
The way to think about
expectations is that it's
359
00:19:43,140 --> 00:19:46,080
something like the average value
for a random variable if
360
00:19:46,080 --> 00:19:49,590
you do an experiment over and
over, and if you interpret
361
00:19:49,590 --> 00:19:51,550
probabilities as frequencies.
362
00:19:51,550 --> 00:19:57,030
So you get x's over and over
with a certain frequency --
363
00:19:57,030 --> 00:19:58,670
P(x) --
364
00:19:58,670 --> 00:20:01,160
a particular value, little
x, gets realized.
365
00:20:01,160 --> 00:20:03,960
And each time that this happens,
you get x dollars.
366
00:20:03,960 --> 00:20:06,040
How many dollars do you
get on the average?
367
00:20:06,040 --> 00:20:09,330
Well, this formula gives you
that particular average.
368
00:20:09,330 --> 00:20:13,190
So first thing we do is to write
down a definition for
369
00:20:13,190 --> 00:20:15,420
this sort of concept.
370
00:20:15,420 --> 00:20:19,810
But then the other things you
need to know is how to
371
00:20:19,810 --> 00:20:23,990
calculate expectations using
shortcuts sometimes, and what
372
00:20:23,990 --> 00:20:25,440
properties they have.
373
00:20:25,440 --> 00:20:28,500
The most important shortcut
is that, if you want
374
00:20:28,500 --> 00:20:31,250
to calculate the expected value,
the average value for a
375
00:20:31,250 --> 00:20:36,380
random variable, you do not need
to find the PMF of that
376
00:20:36,380 --> 00:20:37,530
random variable.
377
00:20:37,530 --> 00:20:41,180
But you can work directly with
the x's and the y's.
378
00:20:41,180 --> 00:20:44,210
So you do the experiment
over and over.
379
00:20:44,210 --> 00:20:46,670
The outcome of the experiment
is a pair (x,y).
380
00:20:46,670 --> 00:20:49,400
And each time that a certain
(x,y) happens,
381
00:20:49,400 --> 00:20:51,280
you get so many dollars.
382
00:20:51,280 --> 00:20:54,990
So this fraction of the time,
a certain (x,y) happens.
383
00:20:54,990 --> 00:20:58,050
And that fraction of the time,
you get so many dollars, so
384
00:20:58,050 --> 00:21:00,860
this is the average number
of dollars that you get.
385
00:21:00,860 --> 00:21:05,230
So what you end up with is
the average, and that
386
00:21:05,230 --> 00:21:07,830
means that it corresponds
to the expected value.
387
00:21:07,830 --> 00:21:09,820
Now, this is something that, of
course, needs a little bit
388
00:21:09,820 --> 00:21:10,850
of mathematical proof.
389
00:21:10,850 --> 00:21:13,880
But this is just a different
way of accounting.
390
00:21:13,880 --> 00:21:16,510
And it turns out to give
you the right answer.
391
00:21:16,510 --> 00:21:19,420
And it's a very useful
shortcut.
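This shortcut, the expected value rule, can be sketched directly; the joint PMF below is made up for illustration:

```python
# Hypothetical joint PMF; the expected value rule says we can average g(x, y)
# directly against p(x, y), without first finding the PMF of g(X, Y).
joint = {
    (1, 1): 0.10, (1, 2): 0.20,
    (2, 1): 0.30, (2, 2): 0.40,
}

def expected(g, joint):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p_{X,Y}(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_sum = expected(lambda x, y: x + y, joint)
```

Each term is "fraction of the time that (x, y) happens" times "dollars you collect when it does".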
392
00:21:19,420 --> 00:21:22,070
Now, when we're talking about
functions of random variables,
393
00:21:22,070 --> 00:21:26,620
in general, we cannot speak
just about averages.
394
00:21:26,620 --> 00:21:29,690
That is, the expected value
of a function of a random
395
00:21:29,690 --> 00:21:31,860
variable is not the same
as the function of
396
00:21:31,860 --> 00:21:33,320
the expected values.
397
00:21:33,320 --> 00:21:36,120
A function of averages is
not the same as the
398
00:21:36,120 --> 00:21:38,380
average of a function.
399
00:21:38,380 --> 00:21:40,510
So in general, this
is not true.
400
00:21:40,510 --> 00:21:43,960
But what it's important to know
is to know the exceptions
401
00:21:43,960 --> 00:21:45,370
to this rule.
402
00:21:45,370 --> 00:21:48,620
And the important exceptions
are mainly two.
403
00:21:48,620 --> 00:21:51,560
One is the case of linear
404
00:21:51,560 --> 00:21:53,040
functions of a random variable.
405
00:21:53,040 --> 00:21:54,800
We discussed this last time.
406
00:21:54,800 --> 00:21:59,810
So the expected value of
temperature in Celsius is, you
407
00:21:59,810 --> 00:22:03,340
first find the expected value of
temperature in Fahrenheit,
408
00:22:03,340 --> 00:22:05,810
and then you do the conversion
to Celsius.
409
00:22:05,810 --> 00:22:08,600
So whether you first average and
then do the conversion to
410
00:22:08,600 --> 00:22:11,730
the new units or not, it
shouldn't matter; you get
411
00:22:11,730 --> 00:22:13,740
the same result.
412
00:22:13,740 --> 00:22:16,740
The other property that turns
out to be true when you talk
413
00:22:16,740 --> 00:22:19,280
about multiple random variables
is that expectation
414
00:22:19,280 --> 00:22:21,070
still behaves linearly.
415
00:22:21,070 --> 00:22:26,600
So let X, Y, and Z be the score
of a random student at
416
00:22:26,600 --> 00:22:29,940
each one of the three
sections of the SAT.
417
00:22:29,940 --> 00:22:36,310
So the overall SAT score is X
plus Y plus Z. This is the
418
00:22:36,310 --> 00:22:40,940
average score, the average
total SAT score.
419
00:22:40,940 --> 00:22:43,790
Another way to calculate that
average is to look at the
420
00:22:43,790 --> 00:22:47,480
first section of the SAT and
see what was the average.
421
00:22:47,480 --> 00:22:50,710
Look at the second section, look
at what was the average,
422
00:22:50,710 --> 00:22:53,470
the same for the third, and
add the averages.
423
00:22:53,470 --> 00:22:56,910
So you can do the averages for
each section separately, add
424
00:22:56,910 --> 00:23:00,500
the averages, or you can find
total scores for each student
425
00:23:00,500 --> 00:23:01,710
and average them.
426
00:23:01,710 --> 00:23:05,690
So I guess you probably believe
that this is correct
427
00:23:05,690 --> 00:23:09,030
if you talk just about
averaging scores.
428
00:23:09,030 --> 00:23:12,580
Since expectations are just the
variation of averages, it
429
00:23:12,580 --> 00:23:16,010
turns out that this is
also true in general.
430
00:23:16,010 --> 00:23:19,760
And the derivation of this is
very simple, based on the
431
00:23:19,760 --> 00:23:21,320
expected value rule.
432
00:23:21,320 --> 00:23:24,450
And you can look at
it in the notes.
433
00:23:24,450 --> 00:23:27,740
So this is one exception,
which is linearity.
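The linearity claim here is easy to check numerically. A minimal sketch in Python; the three section-score PMFs are made-up numbers, and independence is used only as a convenient way to build a joint PMF (linearity itself needs no independence):

```python
# Hypothetical PMFs for the three SAT section scores (made-up numbers).
pmf_x = {400: 0.2, 500: 0.5, 600: 0.3}
pmf_y = {300: 0.3, 500: 0.4, 700: 0.3}
pmf_z = {200: 0.1, 600: 0.6, 800: 0.3}

def expectation(pmf):
    return sum(v * p for v, p in pmf.items())

# E[X + Y + Z] computed directly from a joint PMF (built here by assuming
# independence, though linearity holds for any joint PMF)...
e_total = sum((x + y + z) * px * py * pz
              for x, px in pmf_x.items()
              for y, py in pmf_y.items()
              for z, pz in pmf_z.items())

# ...equals the sum of the per-section expectations.
e_by_parts = expectation(pmf_x) + expectation(pmf_y) + expectation(pmf_z)
```

Both quantities come out to 510 + 500 + 620 = 1630: averaging the totals and totaling the averages agree.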
434
00:23:27,740 --> 00:23:31,540
The second important exception
is the case of independent
435
00:23:31,540 --> 00:23:34,520
random variables, that the
product of two random
436
00:23:34,520 --> 00:23:37,830
variables has an expectation
which is the product of the
437
00:23:37,830 --> 00:23:38,980
expectations.
438
00:23:38,980 --> 00:23:41,400
In general, this is not true.
439
00:23:41,400 --> 00:23:47,010
But for the case where we have
independence, the expectation
440
00:23:47,010 --> 00:23:48,080
works out as follows.
441
00:23:48,080 --> 00:23:55,130
Using the expected value rule,
this is how you calculate the
442
00:23:55,130 --> 00:23:59,170
expected value of a function
of a random variable.
443
00:23:59,170 --> 00:24:04,810
So think of this as being your
g(X, Y) and this being your
444
00:24:04,810 --> 00:24:06,160
g(little x, y).
445
00:24:06,160 --> 00:24:08,760
So this is something that's
generally true.
446
00:24:08,760 --> 00:24:20,350
Now, if we have independence,
then the PMFs factor out, and
447
00:24:20,350 --> 00:24:25,660
then you can separate this sum
by bringing together the x
448
00:24:25,660 --> 00:24:30,130
terms, bring them outside
the y summation.
449
00:24:30,130 --> 00:24:34,370
And you find that this is the
same as expected value of X
450
00:24:34,370 --> 00:24:38,890
times the expected value of Y.
So independence is used in
451
00:24:38,890 --> 00:24:40,140
this step here.
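The factor-and-separate step looks like this in code; the two PMFs are hypothetical:

```python
# Two hypothetical PMFs for independent random variables X and Y.
pmf_x = {1: 0.5, 2: 0.3, 4: 0.2}
pmf_y = {0: 0.4, 3: 0.6}

def expectation(pmf):
    return sum(v * p for v, p in pmf.items())

# Expected value rule for g(X, Y) = X * Y: because X and Y are independent,
# the joint PMF factors as p_X(x) * p_Y(y).
e_product = sum(x * y * px * py
                for x, px in pmf_x.items()
                for y, py in pmf_y.items())

# The double sum separates into E[X] * E[Y].
e_x_times_e_y = expectation(pmf_x) * expectation(pmf_y)
```

Here E[X] = 1.9 and E[Y] = 1.8, and both routes give 3.42.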
452
00:24:40,140 --> 00:24:44,020
453
00:24:44,020 --> 00:24:48,640
OK, now what if X and Y are
independent, but instead of
454
00:24:48,640 --> 00:24:51,020
taking the expectation of
X times Y, we take the
455
00:24:51,020 --> 00:24:56,600
expectation of the product of
two functions of X and Y?
456
00:24:56,600 --> 00:24:59,560
I claim that the expected value
of the product is still
457
00:24:59,560 --> 00:25:02,630
going to be the product of
the expected values.
458
00:25:02,630 --> 00:25:04,180
How do we show that?
459
00:25:04,180 --> 00:25:09,230
We could show it by just redoing
this derivation here.
460
00:25:09,230 --> 00:25:13,500
Instead of X and Y, we would
have g(X) and h(Y), so the
461
00:25:13,500 --> 00:25:14,850
algebra goes through.
462
00:25:14,850 --> 00:25:17,720
But there's a better way to
think about it which is more
463
00:25:17,720 --> 00:25:18,960
conceptual.
464
00:25:18,960 --> 00:25:20,886
And here's the idea.
465
00:25:20,886 --> 00:25:25,750
If X and Y are independent,
what does it mean?
466
00:25:25,750 --> 00:25:31,180
X does not convey any
information about Y. If X
467
00:25:31,180 --> 00:25:36,350
conveys no information about Y,
does X convey information
468
00:25:36,350 --> 00:25:40,500
about h(Y)?
469
00:25:40,500 --> 00:25:41,940
No.
470
00:25:41,940 --> 00:25:46,160
If X tells me nothing about Y,
nothing new, it shouldn't tell
471
00:25:46,160 --> 00:25:50,580
me anything about h(Y).
472
00:25:50,580 --> 00:25:59,270
Now, if X tells me nothing about
h(Y), could g(X)
473
00:25:59,270 --> 00:26:01,470
tell me something about h(Y)?
474
00:26:01,470 --> 00:26:02,250
No.
475
00:26:02,250 --> 00:26:06,780
So the idea is that, if X is
unrelated to Y, doesn't have
476
00:26:06,780 --> 00:26:11,080
any useful information, then
g(X) could not have any useful
477
00:26:11,080 --> 00:26:13,250
information for h(Y).
478
00:26:13,250 --> 00:26:21,030
So if X and Y are independent,
then g(X) and h(Y) are also
479
00:26:21,030 --> 00:26:22,280
independent.
480
00:26:22,280 --> 00:26:27,150
481
00:26:27,150 --> 00:26:29,430
So this is something that
one can try to prove
482
00:26:29,430 --> 00:26:31,500
mathematically, but it's more
important to understand
483
00:26:31,500 --> 00:26:34,530
conceptually why this is so.
484
00:26:34,530 --> 00:26:38,220
It's in terms of conveying
information.
485
00:26:38,220 --> 00:26:44,950
So if X tells me nothing about
Y, X cannot tell me anything
486
00:26:44,950 --> 00:26:48,490
about Y cubed, or X cannot
tell me anything about Y
487
00:26:48,490 --> 00:26:51,030
squared, and so on.
488
00:26:51,030 --> 00:26:52,260
That's the idea.
489
00:26:52,260 --> 00:26:57,180
And once we are convinced that
g(X) and h(Y) are independent,
490
00:26:57,180 --> 00:27:00,550
then we can apply our previous
rule, that for independent
491
00:27:00,550 --> 00:27:04,390
random variables, expectations
multiply the right way.
492
00:27:04,390 --> 00:27:08,660
Apply the previous rule, but
apply it now to these two
493
00:27:08,660 --> 00:27:10,490
independent random variables.
494
00:27:10,490 --> 00:27:12,785
And we get the conclusion
that we wanted.
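The same check goes through with functions of X and Y; the particular g and h below are arbitrary choices for illustration:

```python
pmf_x = {1: 0.5, 2: 0.5}
pmf_y = {1: 0.25, 3: 0.75}

def expectation(pmf):
    return sum(v * p for v, p in pmf.items())

def pushforward(pmf, f):
    # PMF of f(X) when X is distributed according to pmf.
    out = {}
    for v, p in pmf.items():
        out[f(v)] = out.get(f(v), 0.0) + p
    return out

def g(x):
    return x ** 3

def h(y):
    return y ** 2

# E[g(X) h(Y)] via the expected value rule on the independent joint PMF...
lhs = sum(g(x) * h(y) * px * py
          for x, px in pmf_x.items()
          for y, py in pmf_y.items())

# ...equals E[g(X)] * E[h(Y)], since g(X) and h(Y) are also independent.
rhs = expectation(pushforward(pmf_x, g)) * expectation(pushforward(pmf_y, h))
```

With these numbers, E[g(X)] = 4.5 and E[h(Y)] = 7.0, and both sides equal 31.5.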
495
00:27:12,785 --> 00:27:15,500
496
00:27:15,500 --> 00:27:19,050
Now, besides expectations, we
also introduced the concept of
497
00:27:19,050 --> 00:27:20,300
the variance.
498
00:27:20,300 --> 00:27:23,560
499
00:27:23,560 --> 00:27:27,450
And if you remember the
definition of the variance,
500
00:27:27,450 --> 00:27:31,100
let me write down the formula
for the variance of aX.
501
00:27:31,100 --> 00:27:34,920
It's the expected value of the
random variable that we're
502
00:27:34,920 --> 00:27:39,630
looking at minus the expected
value of the random variable
503
00:27:39,630 --> 00:27:42,050
that we're looking at.
504
00:27:42,050 --> 00:27:44,780
So this is the difference
of the random
505
00:27:44,780 --> 00:27:47,850
variable from its mean.
506
00:27:47,850 --> 00:27:50,880
And we take that difference
and square it, so it's the
507
00:27:50,880 --> 00:27:53,070
squared distance from the
mean, and then take
508
00:27:53,070 --> 00:27:55,250
expectations of the
whole thing.
509
00:27:55,250 --> 00:27:59,570
So when you look at that
expression, you realize that a
510
00:27:59,570 --> 00:28:01,780
can be pulled out of
those expressions.
511
00:28:01,780 --> 00:28:04,540
512
00:28:04,540 --> 00:28:10,340
And because there is a squared,
when you pull out the
513
00:28:10,340 --> 00:28:12,980
a, it's going to come
out as an a-squared.
514
00:28:12,980 --> 00:28:16,050
So that gives us the rule for
finding the variance of a
515
00:28:16,050 --> 00:28:18,990
scalar multiple of
a random variable.
516
00:28:18,990 --> 00:28:22,370
The variance captures the idea
of how wide, how spread out a
517
00:28:22,370 --> 00:28:24,210
certain distribution is.
518
00:28:24,210 --> 00:28:26,600
Bigger variance means it's
more spread out.
519
00:28:26,600 --> 00:28:29,360
Now, if you take a random
variable and add a constant to
520
00:28:29,360 --> 00:28:31,960
it, what does it do to
its distribution?
521
00:28:31,960 --> 00:28:35,480
It just shifts it, but it
doesn't change its width.
522
00:28:35,480 --> 00:28:37,140
So intuitively it
means that the
523
00:28:37,140 --> 00:28:39,030
variance should not change.
524
00:28:39,030 --> 00:28:42,360
You can check that
mathematically, but it should
525
00:28:42,360 --> 00:28:44,290
also make sense intuitively.
526
00:28:44,290 --> 00:28:47,710
So the variance, when you add
the constant, does not change.
527
00:28:47,710 --> 00:28:51,680
Now, can you add variances
the way we added expectations?
528
00:28:51,680 --> 00:28:54,760
Does variance behave linearly?
529
00:28:54,760 --> 00:28:57,810
It turns out that not always.
530
00:28:57,810 --> 00:28:59,270
Here, we need a condition.
531
00:28:59,270 --> 00:29:03,880
It's only in special cases--
532
00:29:03,880 --> 00:29:06,210
for example, when the two
random variables are
533
00:29:06,210 --> 00:29:07,190
independent--
534
00:29:07,190 --> 00:29:09,300
that you can add variances.
535
00:29:09,300 --> 00:29:13,300
The variance of the sum is the
sum of the variances if X and
536
00:29:13,300 --> 00:29:15,370
Y are independent.
537
00:29:15,370 --> 00:29:18,880
The derivation of this is,
again, very short and simple.
538
00:29:18,880 --> 00:29:22,590
We'll skip it, but it's an
important fact to remember.
539
00:29:22,590 --> 00:29:26,140
Now, to appreciate why this
equality is not true always,
540
00:29:26,140 --> 00:29:28,980
we can think of some
extreme examples.
541
00:29:28,980 --> 00:29:32,250
Suppose that X is the same as
Y. What's going to be the
542
00:29:32,250 --> 00:29:34,520
variance of X plus Y?
543
00:29:34,520 --> 00:29:39,810
Well, X plus Y, in this case,
is the same as 2X, so we're
544
00:29:39,810 --> 00:29:44,620
going to get 4 times the
variance of X, which is
545
00:29:44,620 --> 00:29:49,770
different than the variance of
X plus the variance of X.
546
00:29:49,770 --> 00:29:52,920
So that expression would give
us twice the variance of X.
547
00:29:52,920 --> 00:29:56,460
But actually now it's 4 times
the variance of X. The other
548
00:29:56,460 --> 00:30:01,990
extreme would be if X is equal
to -Y. Then the variance is
549
00:30:01,990 --> 00:30:05,390
the variance of the random
variable, which is always
550
00:30:05,390 --> 00:30:07,020
equal to 0.
551
00:30:07,020 --> 00:30:09,980
Now, a random variable which
is always equal to 0 has no
552
00:30:09,980 --> 00:30:10,700
uncertainty.
553
00:30:10,700 --> 00:30:14,570
It is always equal to its mean
value, so the variance, in
554
00:30:14,570 --> 00:30:17,090
this case, turns out to be 0.
555
00:30:17,090 --> 00:30:19,940
So in both of these cases,
of course we have random
556
00:30:19,940 --> 00:30:23,020
variables that are extremely
dependent.
557
00:30:23,020 --> 00:30:24,740
Why are they dependent?
558
00:30:24,740 --> 00:30:27,940
Because if I tell you something
about Y, it tells
559
00:30:27,940 --> 00:30:32,020
you an awful lot about the value
of X. There's a lot of
560
00:30:32,020 --> 00:30:34,910
information about X if
I tell you Y, in this
561
00:30:34,910 --> 00:30:37,050
case or in that case.
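All three cases, the independent one and the two extremes, can be checked exactly with a small PMF; the uniform distribution on {0, 1, 2} is an arbitrary choice:

```python
def expectation(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

pmf = {0: 1/3, 1: 1/3, 2: 1/3}  # shared distribution of X and of Y

# Independent X, Y: build the PMF of X + Y from the product joint PMF.
pmf_sum = {}
for x, px in pmf.items():
    for y, py in pmf.items():
        pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + px * py
var_indep = variance(pmf_sum)       # equals Var(X) + Var(Y) = 4/3

# Extreme dependence X = Y: X + Y = 2X, so the variance quadruples to 8/3.
var_same = variance({2 * x: p for x, p in pmf.items()})

# Extreme dependence X = -Y: X + Y is always 0, so the variance is 0.
var_opposite = variance({0: 1.0})
```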
562
00:30:37,050 --> 00:30:39,940
And finally, a short drill.
563
00:30:39,940 --> 00:30:42,570
If I tell you that the random
variables are independent and
564
00:30:42,570 --> 00:30:44,840
you want to calculate the
variance of a linear
565
00:30:44,840 --> 00:30:48,330
combination of this kind,
then how do you argue?
566
00:30:48,330 --> 00:30:51,940
You argue that, since X and Y
are independent, this means
567
00:30:51,940 --> 00:30:55,660
that X and -3Y are also
independent.
568
00:30:55,660 --> 00:30:59,610
X has no information about Y, so
X has no information about
569
00:30:59,610 --> 00:31:05,000
-Y. X has no information about
-Y, so X should not have any
570
00:31:05,000 --> 00:31:10,270
information about -3Y.
571
00:31:10,270 --> 00:31:14,400
So X and -3Y are independent.
572
00:31:14,400 --> 00:31:18,480
So the variance of Z should be
the variance of X plus the
573
00:31:18,480 --> 00:31:26,910
variance of -3Y, which is the
variance of X plus 9 times the
574
00:31:26,910 --> 00:31:31,760
variance of Y. The important
thing to note here is that no
575
00:31:31,760 --> 00:31:34,080
matter what happens, you
end up getting a
576
00:31:34,080 --> 00:31:37,000
plus here, not a minus.
577
00:31:37,000 --> 00:31:41,160
So that's the sort of important
thing to remember in
578
00:31:41,160 --> 00:31:42,410
this type of calculation.
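The drill just done can be verified exactly; the two PMFs below are hypothetical:

```python
def expectation(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

# Hypothetical PMFs for independent X and Y.
pmf_x = {0: 0.5, 4: 0.5}
pmf_y = {1: 0.2, 2: 0.8}

# PMF of Z = X - 3Y, built from the product joint PMF (independence).
pmf_z = {}
for x, px in pmf_x.items():
    for y, py in pmf_y.items():
        z = x - 3 * y
        pmf_z[z] = pmf_z.get(z, 0.0) + px * py

var_z = variance(pmf_z)
# The minus sign disappears: Var(Z) = Var(X) + 9 * Var(Y), with a plus.
var_rule = variance(pmf_x) + 9 * variance(pmf_y)
```

Here Var(X) = 4 and Var(Y) = 0.16, so both computations give 4 + 1.44 = 5.44.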
579
00:31:42,410 --> 00:31:44,820
580
00:31:44,820 --> 00:31:48,890
So this has been all concepts:
reviews, new
581
00:31:48,890 --> 00:31:50,390
concepts, and all that.
582
00:31:50,390 --> 00:31:52,720
It's the usual fire hose.
583
00:31:52,720 --> 00:31:56,680
Now let's use them to do
something useful finally.
584
00:31:56,680 --> 00:31:59,220
So let's revisit our old
example, the binomial
585
00:31:59,220 --> 00:32:03,350
distribution, which counts the
number of successes in
586
00:32:03,350 --> 00:32:06,230
independent trials of a coin.
587
00:32:06,230 --> 00:32:09,030
It's a biased coin that has
a probability of heads, or
588
00:32:09,030 --> 00:32:13,000
probability of success, equal
to p at each trial.
589
00:32:13,000 --> 00:32:16,160
Finally, we can go through the
exercise of calculating the
590
00:32:16,160 --> 00:32:18,820
expected value of this
random variable.
591
00:32:18,820 --> 00:32:21,790
And there's the way of
calculating that expectation
592
00:32:21,790 --> 00:32:24,260
that would be the favorite
of those people who enjoy
593
00:32:24,260 --> 00:32:27,500
algebra, which is to write down
the definition of the
594
00:32:27,500 --> 00:32:28,740
expected value.
595
00:32:28,740 --> 00:32:31,980
We add over all possible values
of the random variable,
596
00:32:31,980 --> 00:32:35,580
over all the possible k's, and
weigh them according to the
597
00:32:35,580 --> 00:32:38,440
probabilities that this
particular k occurs.
598
00:32:38,440 --> 00:32:42,250
The probability that X takes on
a particular value k is, of
599
00:32:42,250 --> 00:32:44,820
course, the binomial
PMF, which is
600
00:32:44,820 --> 00:32:47,560
this familiar formula.
601
00:32:47,560 --> 00:32:50,480
Clearly, that would be a messy
and challenging calculation.
602
00:32:50,480 --> 00:32:52,490
Can we find a shortcut?
603
00:32:52,490 --> 00:32:54,010
There's a very clever trick.
604
00:32:54,010 --> 00:32:56,690
There's lots of problems in
probability that you can
605
00:32:56,690 --> 00:33:00,000
approach really nicely by
breaking up the random
606
00:33:00,000 --> 00:33:03,830
variable of interest into a
sum of simpler and more
607
00:33:03,830 --> 00:33:06,010
manageable random variables.
608
00:33:06,010 --> 00:33:09,700
And if you can make it to be a
sum of random variables that
609
00:33:09,700 --> 00:33:12,590
are just 0's or 1's,
so much the better.
610
00:33:12,590 --> 00:33:13,990
Life is easier.
611
00:33:13,990 --> 00:33:16,850
Random variables that take
values 0 or 1, we call them
612
00:33:16,850 --> 00:33:18,380
indicator variables.
613
00:33:18,380 --> 00:33:21,700
They indicate whether an event
has occurred or not.
614
00:33:21,700 --> 00:33:25,600
In this case, we look at each
coin flip one at a time.
615
00:33:25,600 --> 00:33:29,710
For the i-th flip, if it
resulted in heads or a
616
00:33:29,710 --> 00:33:32,110
success, we record a 1.
617
00:33:32,110 --> 00:33:34,220
If not, we record a 0.
618
00:33:34,220 --> 00:33:37,540
And then we look at the
random variable.
619
00:33:37,540 --> 00:33:42,580
If we take the sum of the Xi's,
what is it going to be?
620
00:33:42,580 --> 00:33:48,030
We add one each time that we get
a success, so the sum is
621
00:33:48,030 --> 00:33:50,820
going to be the total
number of successes.
622
00:33:50,820 --> 00:33:53,900
So we break up the random
variable of interest as a sum
623
00:33:53,900 --> 00:33:57,610
of really nice and simple
random variables.
624
00:33:57,610 --> 00:34:00,380
And now we can use the linearity
of expectations.
625
00:34:00,380 --> 00:34:02,800
We're going to find the
expectation of X by finding
626
00:34:02,800 --> 00:34:05,700
the expectation of the Xi's
and then adding the
627
00:34:05,700 --> 00:34:06,770
expectations.
628
00:34:06,770 --> 00:34:09,520
What's the expected
value of Xi?
629
00:34:09,520 --> 00:34:13,050
Well, Xi takes the value 1 with
probability p, and takes
630
00:34:13,050 --> 00:34:15,610
the value 0 with probability
1-p.
631
00:34:15,610 --> 00:34:19,070
So the expected value
of Xi is just p.
632
00:34:19,070 --> 00:34:24,889
So the expected value of X is
going to be just n times p.
633
00:34:24,889 --> 00:34:29,560
Because X is the sum of n terms,
each one of which has
634
00:34:29,560 --> 00:34:33,050
expectation p, the expected
value of the sum is the sum of
635
00:34:33,050 --> 00:34:34,600
the expected values.
636
00:34:34,600 --> 00:34:38,440
So I guess that's a pretty good
shortcut for doing this
637
00:34:38,440 --> 00:34:40,790
horrendous calculation
up there.
638
00:34:40,790 --> 00:34:47,210
So in case you didn't realize
it, that's what we just
639
00:34:47,210 --> 00:34:51,940
established without
doing any algebra.
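To see that the indicator shortcut really does replace that horrendous sum, here is the direct calculation next to the answer n times p; the choice n = 20, p = 0.3 is arbitrary:

```python
from math import comb

n, p = 20, 0.3

# The direct, algebra-heavy route: sum over k of
#     k * (n choose k) * p^k * (1-p)^(n-k).
e_direct = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1))

# The shortcut from linearity: n indicator variables, each with mean p.
e_shortcut = n * p
```

Both give 6, the expected number of successes in 20 trials at p = 0.3.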
640
00:34:51,940 --> 00:34:52,219
Good.
641
00:34:52,219 --> 00:34:56,150
How about the variance
of X? Let's start with Xi.
642
00:34:56,150 --> 00:34:57,570
Two ways to calculate it.
643
00:34:57,570 --> 00:35:01,160
One is by using directly the
formula for the variance,
644
00:35:01,160 --> 00:35:02,370
which would be --
645
00:35:02,370 --> 00:35:03,900
let's see what it would be.
646
00:35:03,900 --> 00:35:06,800
With probability
p, you get a 1.
647
00:35:06,800 --> 00:35:11,270
And in this case, you are
this far from the mean.
648
00:35:11,270 --> 00:35:13,950
That's your squared distance
from the mean.
649
00:35:13,950 --> 00:35:18,750
With probability 1-p, you
get a 0, which is this far
650
00:35:18,750 --> 00:35:20,380
away from the mean.
651
00:35:20,380 --> 00:35:24,380
And then you can simplify that
formula and get an answer.
652
00:35:24,380 --> 00:35:28,660
How about a slightly easier
way of doing it?
653
00:35:28,660 --> 00:35:31,360
Instead of doing the algebra
here, let me indicate the
654
00:35:31,360 --> 00:35:33,420
slightly easier way.
655
00:35:33,420 --> 00:35:36,070
We have a formula for the
variance that tells us that we
656
00:35:36,070 --> 00:35:42,290
can find the variance by
proceeding this way.
657
00:35:42,290 --> 00:35:45,980
That's a formula that's
generally true for variances.
658
00:35:45,980 --> 00:35:47,380
Why is this easier?
659
00:35:47,380 --> 00:35:49,560
What's the expected value
of Xi squared?
660
00:35:49,560 --> 00:35:52,240
661
00:35:52,240 --> 00:35:53,290
Backtrack.
662
00:35:53,290 --> 00:35:57,140
What is Xi squared, after all?
663
00:35:57,140 --> 00:35:59,510
It's the same thing as Xi.
664
00:35:59,510 --> 00:36:04,200
Since Xi takes values 0 and 1, Xi
squared also takes the same
665
00:36:04,200 --> 00:36:05,780
values, 0 and 1.
666
00:36:05,780 --> 00:36:09,050
So the expected value of Xi
squared is the same as the
667
00:36:09,050 --> 00:36:11,990
expected value of Xi,
which is equal to p.
668
00:36:11,990 --> 00:36:15,120
669
00:36:15,120 --> 00:36:20,530
And the expected value of Xi
squared is p squared, so we
670
00:36:20,530 --> 00:36:24,680
get the final answer,
p times (1-p).
671
00:36:24,680 --> 00:36:28,630
If you were to work through and
do the cancellations in
672
00:36:28,630 --> 00:36:32,400
this messy expression here,
after one line you would also
673
00:36:32,400 --> 00:36:34,050
get to the same formula.
674
00:36:34,050 --> 00:36:38,240
But this sort of illustrates
that working with this formula
675
00:36:38,240 --> 00:36:40,550
for the variance, sometimes
things work
676
00:36:40,550 --> 00:36:43,090
out a little faster.
677
00:36:43,090 --> 00:36:45,420
Finally, are we in business?
678
00:36:45,420 --> 00:36:47,820
Can we calculate the variance
of the random
679
00:36:47,820 --> 00:36:50,100
variable X as well?
680
00:36:50,100 --> 00:36:52,650
Well, we have the rule that
for independent random
681
00:36:52,650 --> 00:36:55,680
variables, the variance
of the sum is
682
00:36:55,680 --> 00:36:57,870
the sum of the variances.
683
00:36:57,870 --> 00:37:00,930
So to find the variance of X,
we just need to add the
684
00:37:00,930 --> 00:37:02,960
variances of the Xi's.
685
00:37:02,960 --> 00:37:07,140
We have n Xi's, and each
one of them has
686
00:37:07,140 --> 00:37:10,110
variance p times (1-p).
687
00:37:10,110 --> 00:37:12,290
And we are done.
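The variance shortcut can be checked the same way as the mean, again with the arbitrary choice n = 20, p = 0.3:

```python
from math import comb

n, p = 20, 0.3
mu = n * p

# Direct route: E[(X - mu)^2] summed over the binomial PMF...
var_direct = sum((k - mu) ** 2 * comb(n, k) * p**k * (1 - p) ** (n - k)
                 for k in range(n + 1))

# ...versus the shortcut: n independent indicators, each of variance p(1-p).
var_shortcut = n * p * (1 - p)
```

Both give 4.2, which is 20 times 0.3 times 0.7.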
688
00:37:12,290 --> 00:37:17,780
So this way, we have calculated
both the mean and
689
00:37:17,780 --> 00:37:21,550
the variance of the binomial
random variable.
690
00:37:21,550 --> 00:37:27,280
It's interesting to look at this
particular formula and
691
00:37:27,280 --> 00:37:29,180
see what it tells us.
692
00:37:29,180 --> 00:37:33,470
If you are to plot the variance
of X as a function of
693
00:37:33,470 --> 00:37:36,050
p, it has this shape.
694
00:37:36,050 --> 00:37:45,900
695
00:37:45,900 --> 00:37:51,310
And the maximum is
here at 1/2.
696
00:37:51,310 --> 00:37:55,150
p times (1-p) is 0 when
p is equal to 0
697
00:37:55,150 --> 00:37:58,570
and when p is equal to 1. It's a
quadratic, so it must have
698
00:37:58,570 --> 00:38:00,250
this particular shape.
699
00:38:00,250 --> 00:38:02,080
So what does it tell us?
700
00:38:02,080 --> 00:38:05,880
If you think about variance as
a measure of uncertainty, it
701
00:38:05,880 --> 00:38:10,290
tells you that coin flips
are most uncertain when
702
00:38:10,290 --> 00:38:12,620
your coin is fair.
703
00:38:12,620 --> 00:38:16,190
When p is equal to 1/2, that's
when you have the most
704
00:38:16,190 --> 00:38:17,050
randomness.
705
00:38:17,050 --> 00:38:18,790
And this is kind of intuitive.
706
00:38:18,790 --> 00:38:21,460
If, on the other hand, I tell you
that the coin is extremely
707
00:38:21,460 --> 00:38:26,490
biased, p very close to 1, which
means it almost always
708
00:38:26,490 --> 00:38:29,460
gives you heads, then
that would be
709
00:38:29,460 --> 00:38:30,630
a case of low variance.
710
00:38:30,630 --> 00:38:32,870
There's low variability
in the results.
711
00:38:32,870 --> 00:38:35,270
There's little uncertainty about
what's going to happen.
712
00:38:35,270 --> 00:38:39,570
It's going to be mostly heads
with some occasional tails.
713
00:38:39,570 --> 00:38:42,010
So p equals 1/2.
714
00:38:42,010 --> 00:38:45,350
Fair coin, that's the coin which
is the most uncertain of
715
00:38:45,350 --> 00:38:47,240
all coins, in some sense.
716
00:38:47,240 --> 00:38:49,240
And it corresponds to the
biggest variance.
717
00:38:49,240 --> 00:38:53,760
It corresponds to an X that has
the widest distribution.
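The shape claim is easy to confirm numerically: p(1-p) vanishes at both endpoints and peaks at the fair coin.

```python
# Evaluate p * (1 - p) on a grid of p values in [0, 1].
grid = [i / 100 for i in range(101)]
values = [p * (1 - p) for p in grid]

peak_p = grid[values.index(max(values))]   # where the variance is largest
```

The maximum, 0.25, occurs at p = 0.5.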
718
00:38:53,760 --> 00:38:57,680
Now that we're on a roll and we
can calculate such hugely
719
00:38:57,680 --> 00:39:01,400
complicated sums in simple ways,
let us try to push our
720
00:39:01,400 --> 00:39:05,100
luck and do a problem with
this flavor, but a little
721
00:39:05,100 --> 00:39:06,590
harder than that.
722
00:39:06,590 --> 00:39:07,960
So you go to one of those
723
00:39:07,960 --> 00:39:09,910
old-fashioned cocktail parties.
724
00:39:09,910 --> 00:39:16,010
At least all the males have
those standard big hats, which
725
00:39:16,010 --> 00:39:16,990
look identical.
726
00:39:16,990 --> 00:39:19,700
They check them in when
they walk in.
727
00:39:19,700 --> 00:39:23,390
And when they walk out, since
they look pretty identical,
728
00:39:23,390 --> 00:39:26,830
they just pick a random
hat and go home.
729
00:39:26,830 --> 00:39:31,080
So n people, they pick their
hats completely at random,
730
00:39:31,080 --> 00:39:33,950
quote, unquote, and
then leave.
731
00:39:33,950 --> 00:39:36,970
And the question is, to say
something about the number of
732
00:39:36,970 --> 00:39:42,070
people who end up, by accident
or by luck, to get back their
733
00:39:42,070 --> 00:39:45,170
own hat, the exact same hat
that they checked in.
734
00:39:45,170 --> 00:39:48,490
OK, first what do we mean
completely at random?
735
00:39:48,490 --> 00:39:51,060
Completely at random, we
basically mean that any
736
00:39:51,060 --> 00:39:54,180
permutation of the hats
is equally likely.
737
00:39:54,180 --> 00:39:58,520
Any way of distributing those
n hats to the n people, any
738
00:39:58,520 --> 00:40:01,350
particular way is as likely
as any other way.
739
00:40:01,350 --> 00:40:05,230
So there's complete symmetry
between hats and people.
740
00:40:05,230 --> 00:40:08,490
So what we want to do is to
calculate the expected value
741
00:40:08,490 --> 00:40:11,460
and the variance of this random
variable X. Let's start
742
00:40:11,460 --> 00:40:13,240
with the expected value.
743
00:40:13,240 --> 00:40:17,840
Let's reuse the trick from
the binomial case.
744
00:40:17,840 --> 00:40:21,110
So total number of hats picked,
we're going to think
745
00:40:21,110 --> 00:40:24,140
of total number of hats
picked as a sum of
746
00:40:24,140 --> 00:40:26,900
(0, 1) random variables.
747
00:40:26,900 --> 00:40:30,470
X1 tells us whether person
1 got their own hat back.
748
00:40:30,470 --> 00:40:32,920
If they did, we record a 1.
749
00:40:32,920 --> 00:40:34,960
X2, the same thing.
750
00:40:34,960 --> 00:40:40,910
Adding all the X's counts how many
1's we got, which tells us
751
00:40:40,910 --> 00:40:45,510
how many people selected
their own hats.
752
00:40:45,510 --> 00:40:48,100
So we broke down the random
variable of interest, the
753
00:40:48,100 --> 00:40:51,500
number of people who get their
own hats back, as a sum of
754
00:40:51,500 --> 00:40:53,570
random variables.
755
00:40:53,570 --> 00:40:56,200
And these random variables,
again, are easy to handle,
756
00:40:56,200 --> 00:40:58,010
because they're binary.
757
00:40:58,010 --> 00:40:59,250
They only take two values.
758
00:40:59,250 --> 00:41:03,500
What's the probability that Xi
is equal to 1, that is, that
759
00:41:03,500 --> 00:41:06,730
the i-th person gets
their own hat back?
760
00:41:06,730 --> 00:41:09,430
There are n hats, and by symmetry they're all alike.
761
00:41:09,430 --> 00:41:11,890
The chance that they end up
getting their own hat, as
762
00:41:11,890 --> 00:41:14,930
opposed to any one of the
other n - 1 hats,
763
00:41:14,930 --> 00:41:18,020
is going to be 1/n.
764
00:41:18,020 --> 00:41:20,710
So what's the expected
value of Xi?
765
00:41:20,710 --> 00:41:23,130
It's one times 1/n.
766
00:41:23,130 --> 00:41:26,510
With probability 1/n, you get
your own hat, or you get a
767
00:41:26,510 --> 00:41:30,960
value of 0 with probability
1-1/n. So the expectation is 1/n.
768
00:41:30,960 --> 00:41:34,660
769
00:41:34,660 --> 00:41:38,360
All right, so we got the
expected value of the Xi's.
770
00:41:38,360 --> 00:41:41,510
And remember, what we want to
do is to calculate the expected
771
00:41:41,510 --> 00:41:46,900
value of X by using this
decomposition.
772
00:41:46,900 --> 00:41:52,230
Are the random variables Xi
independent of each other?
773
00:41:52,230 --> 00:41:55,470
You can try to answer that
question by writing down a
774
00:41:55,470 --> 00:41:58,510
joint PMF for the X's,
but I'm sure that
775
00:41:58,510 --> 00:42:00,000
you will not succeed.
776
00:42:00,000 --> 00:42:02,740
But can you think intuitively?
777
00:42:02,740 --> 00:42:05,940
If I tell you information about
some of the Xi's, does
778
00:42:05,940 --> 00:42:08,920
it give you information about
the remaining ones?
779
00:42:08,920 --> 00:42:09,300
Yeah.
780
00:42:09,300 --> 00:42:13,950
If I tell you that out of 10
people, 9 of them got their
781
00:42:13,950 --> 00:42:16,710
own hat back, does that
tell you something
782
00:42:16,710 --> 00:42:18,330
about the 10th person?
783
00:42:18,330 --> 00:42:18,690
Yes.
784
00:42:18,690 --> 00:42:22,510
If 9 got their own hat, then the
10th must also have gotten
785
00:42:22,510 --> 00:42:24,170
their own hat back.
786
00:42:24,170 --> 00:42:27,170
So the first 9 random variables
tell you something
787
00:42:27,170 --> 00:42:28,790
about the 10th one.
788
00:42:28,790 --> 00:42:33,000
And conveying information of
this sort, that's the case of
789
00:42:33,000 --> 00:42:34,410
dependence.
790
00:42:34,410 --> 00:42:38,100
All right, so the random
variables are not independent.
791
00:42:38,100 --> 00:42:39,030
Are we stuck?
792
00:42:39,030 --> 00:42:43,240
Can we still calculate the
expected value of X?
793
00:42:43,240 --> 00:42:45,210
Yes, we can.
794
00:42:45,210 --> 00:42:50,710
And the reason we can is that
expectations are linear.
795
00:42:50,710 --> 00:42:53,940
Expectation of a sum of random
variables is the sum of the
796
00:42:53,940 --> 00:42:55,140
expectations.
797
00:42:55,140 --> 00:42:57,490
And that's always true.
798
00:42:57,490 --> 00:43:00,710
There's no independence
assumption that's being used
799
00:43:00,710 --> 00:43:02,540
to apply that rule.
800
00:43:02,540 --> 00:43:06,980
So we have that the expected
value of X is the sum of the
801
00:43:06,980 --> 00:43:09,580
expected value of the Xi's.
802
00:43:09,580 --> 00:43:12,970
And this is a property
that's always true.
803
00:43:12,970 --> 00:43:14,350
You don't need independence.
804
00:43:14,350 --> 00:43:15,590
You don't care.
805
00:43:15,590 --> 00:43:18,660
So we're adding n terms,
each one of which has
806
00:43:18,660 --> 00:43:20,430
expected value 1/n.
807
00:43:20,430 --> 00:43:22,670
And the final answer is 1.
808
00:43:22,670 --> 00:43:27,430
So out of, say, 100 people who
selected hats at random, on
809
00:43:27,430 --> 00:43:32,590
the average, you expect only one
of them to end up getting
810
00:43:32,590 --> 00:43:35,830
their own hat back.
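A quick simulation agrees with the answer of 1; "completely at random" is implemented as a uniformly random permutation of the hats:

```python
import random

random.seed(0)

def hats_returned(n):
    # Hand the n hats back as a uniformly random permutation and count
    # how many people receive their own hat.
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for person, hat in enumerate(perm) if person == hat)

n, trials = 10, 100_000
average = sum(hats_returned(n) for _ in range(trials)) / trials
# The sample average is close to E[X] = n * (1/n) = 1, whatever n is.
```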
811
00:43:35,830 --> 00:43:36,640
Very good.
812
00:43:36,640 --> 00:43:41,620
So since we are succeeding so
far, let's try to see if we
813
00:43:41,620 --> 00:43:44,620
can succeed in calculating
the variance as well.
814
00:43:44,620 --> 00:43:46,580
And of course, we will.
815
00:43:46,580 --> 00:43:50,160
But it's going to be a little
more complicated.
816
00:43:50,160 --> 00:43:52,760
The reason it's going to be a
little more complicated is
817
00:43:52,760 --> 00:43:56,500
because the Xi's are not
independent, so the variance
818
00:43:56,500 --> 00:44:00,280
of the sum is not the same as
the sum of the variances.
819
00:44:00,280 --> 00:44:04,320
So it's not enough to find the
variances of the Xi's.
820
00:44:04,320 --> 00:44:06,930
We'll have to do more work.
821
00:44:06,930 --> 00:44:08,550
And here's what's involved.
822
00:44:08,550 --> 00:44:12,320
Let's start with the general
formula for the variance,
823
00:44:12,320 --> 00:44:15,950
which, as I mentioned before,
it's usually the simpler way
824
00:44:15,950 --> 00:44:18,430
to go about calculating
variances.
825
00:44:18,430 --> 00:44:21,800
So we need to calculate the
expected value for X-squared,
826
00:44:21,800 --> 00:44:27,110
and subtract from it the
expectation squared.
827
00:44:27,110 --> 00:44:31,010
Well, we already found the
expected value of X. It's
828
00:44:31,010 --> 00:44:31,870
equal to 1.
829
00:44:31,870 --> 00:44:34,580
So 1-squared gives us just 1.
830
00:44:34,580 --> 00:44:37,980
So we're left with the task of
calculating the expected value
831
00:44:37,980 --> 00:44:43,440
of X-squared, the random
variable X-squared.
832
00:44:43,440 --> 00:44:45,610
Let's try to follow
the same idea.
833
00:44:45,610 --> 00:44:49,770
Write this messy random
variable, X-squared, as a sum
834
00:44:49,770 --> 00:44:54,440
of hopefully simpler
random variables.
835
00:44:54,440 --> 00:44:59,350
So X is the sum of the
Xi's, so you square
836
00:44:59,350 --> 00:45:01,560
both sides of this.
837
00:45:01,560 --> 00:45:05,150
And then you expand the
right-hand side.
838
00:45:05,150 --> 00:45:09,390
When you expand the right-hand
side, you get the squares of
839
00:45:09,390 --> 00:45:11,420
the terms that appear here.
840
00:45:11,420 --> 00:45:14,230
And then you get all
the cross-terms.
841
00:45:14,230 --> 00:45:19,100
For every pair of (i,j) that
are different, i different
842
00:45:19,100 --> 00:45:24,030
than j, you're going to have
a cross-term in the sum.
843
00:45:24,030 --> 00:45:29,230
So now, in order to calculate
the expected value of
844
00:45:29,230 --> 00:45:32,480
X-squared, what does
our task reduce to?
845
00:45:32,480 --> 00:45:36,230
It reduces to calculating the
expected value of this term
846
00:45:36,230 --> 00:45:38,690
and calculating the expected
value of that term.
847
00:45:38,690 --> 00:45:41,060
So let's do them
one at a time.
848
00:45:41,060 --> 00:45:47,040
Expected value of Xi squared,
what is it going to be?
849
00:45:47,040 --> 00:45:48,660
Same trick as before.
850
00:45:48,660 --> 00:45:53,350
Xi takes value 0 or 1, so Xi
squared takes just the same
851
00:45:53,350 --> 00:45:55,290
values, 0 or 1.
852
00:45:55,290 --> 00:45:57,010
So that's the easy one.
853
00:45:57,010 --> 00:46:00,680
That's the same as expected
value of Xi, which we already
854
00:46:00,680 --> 00:46:04,410
know to be 1/n.
855
00:46:04,410 --> 00:46:07,830
So this gives us a first
contribution down here.
856
00:46:07,830 --> 00:46:10,840
857
00:46:10,840 --> 00:46:14,220
The expected value of this
term is going to be what?
858
00:46:14,220 --> 00:46:17,210
We have n terms in
the summation.
859
00:46:17,210 --> 00:46:21,800
And each one of these terms
has an expectation of 1/n.
860
00:46:21,800 --> 00:46:24,710
So we did a piece
of the puzzle.
861
00:46:24,710 --> 00:46:28,480
So now let's deal with the
second piece of the puzzle.
862
00:46:28,480 --> 00:46:32,020
Let's find the expected
value of Xi times Xj.
863
00:46:32,020 --> 00:46:35,540
Now by symmetry, the expected
value of Xi times Xj is going
864
00:46:35,540 --> 00:46:39,900
to be the same no matter
what i and j you see.
865
00:46:39,900 --> 00:46:44,930
So let's just think about X1
and X2 and try to find the
866
00:46:44,930 --> 00:46:48,260
expected value of X1 times X2.
867
00:46:48,260 --> 00:46:51,710
X1 times X2 is a random
variable.
868
00:46:51,710 --> 00:46:53,960
What values does it take?
869
00:46:53,960 --> 00:46:56,570
Only 0 or 1?
870
00:46:56,570 --> 00:47:00,000
Since X1 and X2 are 0 or 1,
their product can only take
871
00:47:00,000 --> 00:47:02,010
the values of 0 or 1.
872
00:47:02,010 --> 00:47:04,990
So to find the probability
distribution of this random
873
00:47:04,990 --> 00:47:07,320
variable, it's just sufficient
to find the probability that
874
00:47:07,320 --> 00:47:09,530
it takes the value of 1.
875
00:47:09,530 --> 00:47:14,500
Now, what does X1 times
X2 equal to 1 mean?
876
00:47:14,500 --> 00:47:19,500
It means that X1 was
1 and X2 was 1.
877
00:47:19,500 --> 00:47:22,390
The only way that you can get
a product of 1 is if both of
878
00:47:22,390 --> 00:47:24,350
them turned out to be 1's.
879
00:47:24,350 --> 00:47:29,570
So that's the same as saying,
persons 1 and 2 both picked
880
00:47:29,570 --> 00:47:31,980
their own hats.
881
00:47:31,980 --> 00:47:35,510
The probability that person 1
and person 2 both pick their
882
00:47:35,510 --> 00:47:39,600
own hats is the probability of
two things happening, which is
883
00:47:39,600 --> 00:47:42,320
the product of the first thing
happening times the
884
00:47:42,320 --> 00:47:44,310
conditional probability
of the second, given
885
00:47:44,310 --> 00:47:46,160
that the first happened.
886
00:47:46,160 --> 00:47:48,690
And in words, this is the
probability that the first
887
00:47:48,690 --> 00:47:51,840
person picked their own hat
times the probability that the
888
00:47:51,840 --> 00:47:54,920
second person picks their own
hat, given that the first
889
00:47:54,920 --> 00:47:56,990
person already picked
their own.
890
00:47:56,990 --> 00:47:58,820
So what's the probability
that the first person
891
00:47:58,820 --> 00:48:00,760
picks their own hat?
892
00:48:00,760 --> 00:48:03,040
We know that it's 1/n.
893
00:48:03,040 --> 00:48:05,030
Now, how about the
second person?
894
00:48:05,030 --> 00:48:09,540
If I tell you that one person
has their own hat, and that
895
00:48:09,540 --> 00:48:13,240
person takes their hat and goes
away, from the point of
896
00:48:13,240 --> 00:48:17,250
view of the second person,
there's n - 1 people left
897
00:48:17,250 --> 00:48:19,770
looking at n - 1 hats.
898
00:48:19,770 --> 00:48:22,330
And they're getting just
hats at random.
899
00:48:22,330 --> 00:48:24,930
What's the chance that
I will get my own?
900
00:48:24,930 --> 00:48:26,180
It's 1/(n - 1).
901
00:48:26,180 --> 00:48:29,210
902
00:48:29,210 --> 00:48:33,700
So think of them as person 1
goes, picks a hat at random,
903
00:48:33,700 --> 00:48:36,850
it happens to be their
own, and they leave.
904
00:48:36,850 --> 00:48:40,120
You're left with n - 1 people,
and there are n
905
00:48:40,120 --> 00:48:41,250
- 1 hats out there.
906
00:48:41,250 --> 00:48:44,490
Person 2 goes and picks a hat
at random, with probability
907
00:48:44,490 --> 00:48:48,820
1/(n - 1), is going to
pick their own hat.
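[A quick way to check this value, not from the lecture itself: for a small n, brute-force enumeration of all n! hat assignments confirms that the probability both person 1 and person 2 get their own hats is 1/(n(n - 1)).]

```python
from itertools import permutations
from fractions import Fraction

n = 5
# Enumerate all n! ways to hand out hats; permutation p gives
# hat p[k] to person k (persons and hats share labels 0..n-1).
total = 0
both_own = 0
for p in permutations(range(n)):
    total += 1
    # Event {X1 * X2 = 1}: persons 1 and 2 (indices 0 and 1)
    # both receive their own hats.
    if p[0] == 0 and p[1] == 1:
        both_own += 1

prob = Fraction(both_own, total)
print(prob)                      # 1/20
print(Fraction(1, n * (n - 1)))  # 1/(n(n-1)) = 1/20, matching the lecture
```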
908
00:48:48,820 --> 00:48:52,400
So the expected value now of
this random variable is,
909
00:48:52,400 --> 00:48:54,520
again, that same number,
because this is
910
00:48:54,520 --> 00:48:57,500
a 0, 1 random variable.
911
00:48:57,500 --> 00:49:02,370
So this is the same as expected
value of Xi times Xj
912
00:49:02,370 --> 00:49:04,810
when i is different from j.
913
00:49:04,810 --> 00:49:09,830
So here, all that's left to do
is to add the expectations of
914
00:49:09,830 --> 00:49:10,540
these terms.
915
00:49:10,540 --> 00:49:14,480
Each one of these terms has an
expected value that's 1/n
916
00:49:14,480 --> 00:49:16,910
times 1/(n - 1).
917
00:49:16,910 --> 00:49:19,170
And how many terms do we have?
918
00:49:19,170 --> 00:49:21,410
How many of these are
we adding up?
919
00:49:21,410 --> 00:49:24,840
920
00:49:24,840 --> 00:49:28,950
It's n-squared - n.
921
00:49:28,950 --> 00:49:31,830
When you expand the quadratic,
there's a total
922
00:49:31,830 --> 00:49:33,890
of n-squared terms.
923
00:49:33,890 --> 00:49:37,860
Some are self-terms,
n of them.
924
00:49:37,860 --> 00:49:42,170
And the remaining number of
terms is n-squared - n.
925
00:49:42,170 --> 00:49:48,310
So here we got n-squared
- n terms.
926
00:49:48,310 --> 00:49:51,200
And so we need to multiply
here with n-squared - n.
927
00:49:51,200 --> 00:49:53,810
928
00:49:53,810 --> 00:49:59,980
And after you realize that this
number here is 1, and you
929
00:49:59,980 --> 00:50:03,490
realize that this is the same
as the denominator, you get
930
00:50:03,490 --> 00:50:06,750
the answer that the expected
value of X squared equals 2.
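[The arithmetic here can be double-checked exactly with rational numbers, a sketch that is not part of the lecture: n self-terms each contributing 1/n, plus n² - n cross-terms each contributing 1/(n(n - 1)), add up to 2 for any n.]

```python
from fractions import Fraction

n = 10  # any n >= 2 gives the same answer
# Self-terms: n terms, each with expectation E[Xi^2] = 1/n.
self_terms = n * Fraction(1, n)
# Cross-terms: n^2 - n terms, each with expectation 1/(n(n-1)).
cross_terms = (n * n - n) * Fraction(1, n * (n - 1))
e_x_squared = self_terms + cross_terms
print(e_x_squared)      # 2
print(e_x_squared - 1)  # variance = E[X^2] - (E[X])^2 = 2 - 1 = 1
```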
931
00:50:06,750 --> 00:50:10,120
And then, finally going up to
the top formula, we get the
932
00:50:10,120 --> 00:50:14,720
expected value of X squared
minus the mean squared, which
is 2 - 1, and the
933
00:50:14,720 --> 00:50:17,610
variance is just equal to 1.
934
00:50:17,610 --> 00:50:21,680
So the variance of this random
variable, number of people who
935
00:50:21,680 --> 00:50:25,130
get their own hats back,
is also equal to 1,
936
00:50:25,130 --> 00:50:26,540
equal to the mean.
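[A Monte Carlo check of this result, again not from the lecture: simulating many random hat assignments, the sample mean and the sample variance of the number of matches should both come out close to 1, independent of n.]

```python
import random
import statistics

def hat_matches(n, rng):
    """Simulate one random hat assignment; return how many people
    get their own hat back."""
    hats = list(range(n))
    rng.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

rng = random.Random(0)  # fixed seed so the run is reproducible
n, trials = 20, 200_000
samples = [hat_matches(n, rng) for _ in range(trials)]

mean_est = statistics.mean(samples)
var_est = statistics.pvariance(samples)
print(round(mean_est, 2), round(var_est, 2))  # both close to 1
```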
937
00:50:26,540 --> 00:50:27,690
Looks like magic.
938
00:50:27,690 --> 00:50:29,220
Why is this the case?
939
00:50:29,220 --> 00:50:31,550
Well, there's a deeper
explanation why these two
940
00:50:31,550 --> 00:50:33,630
numbers should come out
to be the same.
941
00:50:33,630 --> 00:50:35,980
But this is something that would
probably have to wait a
942
00:50:35,980 --> 00:50:39,420
couple of chapters before we
could actually explain it.
943
00:50:39,420 --> 00:50:40,730
And so I'll stop here.
944
00:50:40,730 --> 00:50:41,980