1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high-quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:22,460
10
00:00:22,460 --> 00:00:26,290
PROFESSOR: OK so let's start.
11
00:00:26,290 --> 00:00:28,120
So today, we're going
to continue the
12
00:00:28,120 --> 00:00:29,550
subject from last time.
13
00:00:29,550 --> 00:00:32,619
And the subject is
random variables.
14
00:00:32,619 --> 00:00:35,890
As we discussed, random
variables basically associate
15
00:00:35,890 --> 00:00:39,550
numerical values with the
outcomes of an experiment.
16
00:00:39,550 --> 00:00:42,690
And we want to learn how
to manipulate them.
17
00:00:42,690 --> 00:00:45,890
Now to a large extent, what's
going to happen, what's
18
00:00:45,890 --> 00:00:50,850
happening during this chapter,
is that we are revisiting the
19
00:00:50,850 --> 00:00:53,220
same concepts we have
seen in chapter one.
20
00:00:53,220 --> 00:00:57,340
But we're going to introduce
a lot of new notation, but
21
00:00:57,340 --> 00:01:00,210
really dealing with the
same kind of stuff.
22
00:01:00,210 --> 00:01:05,500
The only place where we
go beyond the new notation,
23
00:01:05,500 --> 00:01:08,470
the new concept in this chapter
is the concept of the
24
00:01:08,470 --> 00:01:10,470
expectation or expected
values.
25
00:01:10,470 --> 00:01:14,010
And we're going to learn how
to manipulate expectations.
26
00:01:14,010 --> 00:01:17,280
So let us start with a quick
review of what we
27
00:01:17,280 --> 00:01:19,610
discussed last time.
28
00:01:19,610 --> 00:01:22,830
We talked about random
variables.
29
00:01:22,830 --> 00:01:25,860
Loosely speaking, random
variables are random
30
00:01:25,860 --> 00:01:28,670
quantities that result
from an experiment.
31
00:01:28,670 --> 00:01:31,950
More precisely speaking,
mathematically speaking, a
32
00:01:31,950 --> 00:01:35,310
random variable is a function
from the sample space to the
33
00:01:35,310 --> 00:01:35,900
real numbers.
34
00:01:35,900 --> 00:01:39,370
That is, you give me an outcome,
and based on that
35
00:01:39,370 --> 00:01:42,940
outcome, I can tell you the
value of the random variable.
36
00:01:42,940 --> 00:01:45,750
So the value of the random
variable is a function of the
37
00:01:45,750 --> 00:01:47,870
outcome that we have.
38
00:01:47,870 --> 00:01:51,170
Now given a random variable,
some of the numerical outcomes
39
00:01:51,170 --> 00:01:52,850
are more likely than others.
40
00:01:52,850 --> 00:01:55,600
And we want to say which ones
are more likely and
41
00:01:55,600 --> 00:01:57,420
how likely they are.
42
00:01:57,420 --> 00:02:00,900
And the way we do that is by
writing down the probabilities
43
00:02:00,900 --> 00:02:04,030
of the different possible
numerical outcomes.
44
00:02:04,030 --> 00:02:05,530
Notice here, the notation.
45
00:02:05,530 --> 00:02:08,660
We use uppercase to denote
the random variable.
46
00:02:08,660 --> 00:02:11,840
We use lowercase to denote
real numbers.
47
00:02:11,840 --> 00:02:15,880
So the way you read this, this
is the probability that the
48
00:02:15,880 --> 00:02:20,160
random variable, capital X,
happens to take the numerical
49
00:02:20,160 --> 00:02:21,840
value, little x.
50
00:02:21,840 --> 00:02:25,780
This is a concept that's
familiar from chapter one.
51
00:02:25,780 --> 00:02:28,810
And this is just the new
notation we will be using for
52
00:02:28,810 --> 00:02:30,080
that concept.
53
00:02:30,080 --> 00:02:33,690
It's the Probability Mass
Function of the random
54
00:02:33,690 --> 00:02:37,620
variable, capital X. So the
subscript just indicates which
55
00:02:37,620 --> 00:02:40,190
random variable we're
talking about.
56
00:02:40,190 --> 00:02:44,070
And it's the probability
assigned to
57
00:02:44,070 --> 00:02:45,300
a particular outcome.
58
00:02:45,300 --> 00:02:48,060
And we want to assign such
probabilities for all possible
59
00:02:48,060 --> 00:02:49,400
numerical values.
60
00:02:49,400 --> 00:02:52,690
So you can think of this as
being a function of little x.
61
00:02:52,690 --> 00:02:56,910
And it tells you how likely
every little x is going to be.
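In code, the PMF the professor describes is just a mapping from each possible value little x to its probability (a small Python sketch; the particular values here are made up for illustration):

```python
# A PMF can be represented as a mapping from each possible numerical
# value little x to the probability P(X = x). These particular
# values are illustrative, not from the lecture.
pmf_x = {0: 0.1, 1: 0.6, 2: 0.3}

# A valid PMF is nonnegative and its entries sum to 1.
assert all(p >= 0 for p in pmf_x.values())
assert abs(sum(pmf_x.values()) - 1.0) < 1e-12

print(pmf_x[1])  # P(X = 1)
```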
62
00:02:56,910 --> 00:02:59,310
Now the new concept we
introduced last time is the
63
00:02:59,310 --> 00:03:02,520
concept of the expected value
for random variable, which is
64
00:03:02,520 --> 00:03:04,160
defined this way.
65
00:03:04,160 --> 00:03:06,870
You look at all the
possible outcomes.
66
00:03:06,870 --> 00:03:10,900
And you form some kind of
average of all the possible
67
00:03:10,900 --> 00:03:15,000
numerical values over the random
variable capital X. You
68
00:03:15,000 --> 00:03:17,830
consider all the possible
numerical values, and you form
69
00:03:17,830 --> 00:03:18,560
an average.
70
00:03:18,560 --> 00:03:23,160
In fact, it's a weighted average
where, to every little
71
00:03:23,160 --> 00:03:26,750
x, you assign a weight equal to
the probability that that
72
00:03:26,750 --> 00:03:30,190
particular little x is
going to be realized.
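The weighted average just described can be sketched in a couple of lines of Python (the PMF here is an assumed uniform one, for illustration):

```python
# E[X] as a weighted average: each value little x is weighted by
# the probability that it is realized. The PMF is illustrative.
pmf_x = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

expected_x = sum(x * p for x, p in pmf_x.items())
print(expected_x)  # 2.5, the center of gravity of this PMF
```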
73
00:03:30,190 --> 00:03:34,700
74
00:03:34,700 --> 00:03:38,100
Now, as we discussed last time,
if you have a random
75
00:03:38,100 --> 00:03:41,220
variable, you can take a
function of a random variable.
76
00:03:41,220 --> 00:03:43,860
And that's going to be a
new random variable.
77
00:03:43,860 --> 00:03:47,870
So if capital X is a random
variable and g is a function,
78
00:03:47,870 --> 00:03:51,920
g of X is a new random
variable.
79
00:03:51,920 --> 00:03:53,090
You do the experiment.
80
00:03:53,090 --> 00:03:54,310
You get an outcome.
81
00:03:54,310 --> 00:03:56,950
This determines the value of
X. And that determines the
82
00:03:56,950 --> 00:03:58,400
value of g of X.
83
00:03:58,400 --> 00:04:01,790
So the numerical value of g of
X is determined by whatever
84
00:04:01,790 --> 00:04:03,330
happens in the experiment.
85
00:04:03,330 --> 00:04:04,150
It's random.
86
00:04:04,150 --> 00:04:06,440
And that makes it a
random variable.
87
00:04:06,440 --> 00:04:09,040
Since it's a random variable,
it has an
88
00:04:09,040 --> 00:04:11,320
expectation of its own.
89
00:04:11,320 --> 00:04:14,860
So how would we calculate the
expectation of g of X?
90
00:04:14,860 --> 00:04:18,430
You could proceed by just using
the definition, which
91
00:04:18,430 --> 00:04:23,170
would require you to find the
PMF of the random variable g
92
00:04:23,170 --> 00:04:29,480
of X. So find the PMF of g of X,
and then apply the formula
93
00:04:29,480 --> 00:04:31,260
for the expected value
of a random
94
00:04:31,260 --> 00:04:33,360
variable with known PMF.
95
00:04:33,360 --> 00:04:36,820
But there is also a shortcut,
which is just a different way
96
00:04:36,820 --> 00:04:40,530
of doing the counting and the
calculations, in which we do
97
00:04:40,530 --> 00:04:44,580
not need to find the PMF of g
of X. We just work with the
98
00:04:44,580 --> 00:04:47,290
PMF of the original
random variable.
99
00:04:47,290 --> 00:04:50,010
And what this is saying is that
the average value of g of
100
00:04:50,010 --> 00:04:51,800
X is obtained as follows.
101
00:04:51,800 --> 00:04:55,020
You look at all the possible
results, the X's,
102
00:04:55,020 --> 00:04:56,460
how likely they are.
103
00:04:56,460 --> 00:04:59,740
And when that particular
X happens, this is
104
00:04:59,740 --> 00:05:01,470
how much you get.
105
00:05:01,470 --> 00:05:05,690
And so this way, you add
these things up.
106
00:05:05,690 --> 00:05:10,310
And you get the average amount
that you're going to get, the
107
00:05:10,310 --> 00:05:13,120
average value of g of X, where
you average over the
108
00:05:13,120 --> 00:05:16,130
likelihoods of the
different X's.
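The shortcut and the longer route through the PMF of g of X can both be sketched in Python; the function g(x) = x squared and the two-point PMF here are assumptions for the example, and both routes give the same answer:

```python
# Two ways to compute E[g(X)], here with g(x) = x**2 and an
# illustrative two-point PMF.
pmf_x = {-1: 0.5, 2: 0.5}

def g(x):
    return x ** 2

# Shortcut: average g(x) directly under the PMF of X.
e_g_shortcut = sum(g(x) * p for x, p in pmf_x.items())

# Definition: first build the PMF of Y = g(X), then take its mean.
pmf_y = {}
for x, p in pmf_x.items():
    pmf_y[g(x)] = pmf_y.get(g(x), 0.0) + p
e_g_definition = sum(y * p for y, p in pmf_y.items())

print(e_g_shortcut, e_g_definition)  # 2.5 2.5
```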
109
00:05:16,130 --> 00:05:19,730
Now expected values have some
properties that are always
110
00:05:19,730 --> 00:05:23,570
true and some properties that
sometimes are not true.
111
00:05:23,570 --> 00:05:28,160
So the property that is not
always true is that this would
112
00:05:28,160 --> 00:05:34,400
be the same as g of the expected
value of X. So in
113
00:05:34,400 --> 00:05:36,730
general, this is not true.
114
00:05:36,730 --> 00:05:40,260
You cannot interchange function
and expectation,
115
00:05:40,260 --> 00:05:44,310
which means you cannot reason
on the average, in general.
116
00:05:44,310 --> 00:05:45,780
But there are some exceptions.
117
00:05:45,780 --> 00:05:49,780
When g is a linear function,
then the expected value for a
118
00:05:49,780 --> 00:05:53,460
linear function is the same as
that same linear function of
119
00:05:53,460 --> 00:05:54,470
the expectation.
120
00:05:54,470 --> 00:05:57,470
So for linear functions of
a random variable, the
121
00:05:57,470 --> 00:06:00,010
expectation behaves nicely.
122
00:06:00,010 --> 00:06:05,320
So this is basically telling you
that, if X is degrees in
123
00:06:05,320 --> 00:06:09,440
Celsius, alpha X plus b is
degrees in Fahrenheit, you can
124
00:06:09,440 --> 00:06:12,030
first do the conversion
to Fahrenheit
125
00:06:12,030 --> 00:06:13,390
and take the average.
126
00:06:13,390 --> 00:06:16,870
Or you can find the average
temperature in Celsius, and
127
00:06:16,870 --> 00:06:21,270
then do the conversion
to Fahrenheit.
128
00:06:21,270 --> 00:06:23,630
Either is valid.
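The Celsius-to-Fahrenheit point can be checked numerically; the temperature PMF below is made up for the example, but the two routes agree for any PMF:

```python
# Linearity of expectation: E[aX + b] = a * E[X] + b, illustrated
# with the Celsius-to-Fahrenheit conversion. The temperature PMF
# is made up for the example.
pmf_c = {10: 0.3, 20: 0.5, 30: 0.2}
a, b = 9 / 5, 32

e_c = sum(x * p for x, p in pmf_c.items())

# Convert each outcome first, then average...
e_f_convert_first = sum((a * x + b) * p for x, p in pmf_c.items())
# ...or average first, then convert: the two agree.
e_f_average_first = a * e_c + b

print(e_f_convert_first, e_f_average_first)
```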
129
00:06:23,630 --> 00:06:27,370
So the expected value tells us
something about where is the
130
00:06:27,370 --> 00:06:31,320
center of the distribution, more
specifically, the center
131
00:06:31,320 --> 00:06:34,360
of mass or the center of gravity
of the PMF, when you
132
00:06:34,360 --> 00:06:36,170
plot it as a bar graph.
133
00:06:36,170 --> 00:06:39,880
Besides the average value, you
may be interested in knowing
134
00:06:39,880 --> 00:06:45,810
how far will you be from
the average, typically.
135
00:06:45,810 --> 00:06:48,880
So let's look at this quantity,
X minus expected
136
00:06:48,880 --> 00:06:50,270
value of X.
137
00:06:50,270 --> 00:06:53,620
This is the distance from
the average value.
138
00:06:53,620 --> 00:06:58,230
So for a random outcome of the
experiment, this quantity in
139
00:06:58,230 --> 00:07:01,260
here measures how far
away from the mean
140
00:07:01,260 --> 00:07:03,380
you happen to be.
141
00:07:03,380 --> 00:07:08,330
This quantity inside the
brackets is a random variable.
142
00:07:08,330 --> 00:07:09,620
Why?
143
00:07:09,620 --> 00:07:12,620
Because capital X is random.
144
00:07:12,620 --> 00:07:15,910
And what we have here is capital
X, which is random,
145
00:07:15,910 --> 00:07:17,620
minus a number.
146
00:07:17,620 --> 00:07:20,130
Remember, expected values
are numbers.
147
00:07:20,130 --> 00:07:22,340
Now a random variable
minus a number is
148
00:07:22,340 --> 00:07:23,470
a new random variable.
149
00:07:23,470 --> 00:07:26,340
It has an expectation
of its own.
150
00:07:26,340 --> 00:07:30,600
We can use the linearity rule,
expected value of something
151
00:07:30,600 --> 00:07:35,060
minus something else is just
the difference of their
152
00:07:35,060 --> 00:07:36,120
expected value.
153
00:07:36,120 --> 00:07:40,250
So it's going to be expected
value of X minus the expected
154
00:07:40,250 --> 00:07:42,290
value of this thing.
155
00:07:42,290 --> 00:07:44,180
Now this thing is a number.
156
00:07:44,180 --> 00:07:46,700
And the expected value
of a number is
157
00:07:46,700 --> 00:07:48,710
just the number itself.
158
00:07:48,710 --> 00:07:51,660
So we get from here that this
is expected value minus
159
00:07:51,660 --> 00:07:52,750
expected value.
160
00:07:52,750 --> 00:07:55,510
And we get zero.
161
00:07:55,510 --> 00:07:57,420
What is this telling us?
162
00:07:57,420 --> 00:08:02,690
That, on the average, the
signed difference from the
163
00:08:02,690 --> 00:08:05,080
mean is equal to zero.
164
00:08:05,080 --> 00:08:06,460
That is, the mean is here.
165
00:08:06,460 --> 00:08:08,850
Sometimes X will fall
to the right.
166
00:08:08,850 --> 00:08:11,770
Sometimes X will fall
to the left.
167
00:08:11,770 --> 00:08:16,170
On the average, the average
distance from the mean is
168
00:08:16,170 --> 00:08:19,390
going to be zero, because
sometimes the realized
169
00:08:19,390 --> 00:08:21,950
distance will be positive,
sometimes it will be negative.
170
00:08:21,950 --> 00:08:24,680
Positives and negatives
cancel out.
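The cancellation of positives and negatives is easy to verify on any PMF (an arbitrary illustrative one here):

```python
# The average signed deviation from the mean is always zero:
# E[X - E[X]] = E[X] - E[X] = 0. Checked on an arbitrary
# illustrative PMF.
pmf_x = {0: 0.2, 1: 0.5, 5: 0.3}
mean = sum(x * p for x, p in pmf_x.items())

mean_deviation = sum((x - mean) * p for x, p in pmf_x.items())
print(mean_deviation)  # zero, up to floating-point roundoff
```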
171
00:08:24,680 --> 00:08:28,140
So if we want to capture the
idea of how far are we from
172
00:08:28,140 --> 00:08:33,090
the mean, just looking at the
signed distance from the
173
00:08:33,090 --> 00:08:36,789
mean is not going to give us
any useful information.
174
00:08:36,789 --> 00:08:40,200
So if we want to say something
about how far we are,
175
00:08:40,200 --> 00:08:43,120
typically, we should do
something different.
176
00:08:43,120 --> 00:08:47,810
One possibility might be to take
the absolute values of
177
00:08:47,810 --> 00:08:49,520
the differences.
178
00:08:49,520 --> 00:08:51,780
And that's a quantity that
sometimes people are
179
00:08:51,780 --> 00:08:52,920
interested in.
180
00:08:52,920 --> 00:08:58,540
But it turns out that a more
useful quantity happens to be
181
00:08:58,540 --> 00:09:02,190
the variance of a random
variable, which actually
182
00:09:02,190 --> 00:09:07,030
measures the average squared
distance from the mean.
183
00:09:07,030 --> 00:09:11,230
So you have a random outcome,
random results, random
184
00:09:11,230 --> 00:09:14,130
numerical value of the
random variable.
185
00:09:14,130 --> 00:09:17,370
It is a certain distance
away from the mean.
186
00:09:17,370 --> 00:09:19,390
That certain distance
is random.
187
00:09:19,390 --> 00:09:21,210
We take the square of that.
188
00:09:21,210 --> 00:09:23,200
This is the squared distance
from the mean,
189
00:09:23,200 --> 00:09:24,750
which is again random.
190
00:09:24,750 --> 00:09:27,990
Since it's random, it has an
expected value of its own.
191
00:09:27,990 --> 00:09:32,000
And that expected value, we call
it the variance of X. And
192
00:09:32,000 --> 00:09:35,350
so we have this particular
definition.
193
00:09:35,350 --> 00:09:39,500
Using the rule that we have up
here for how to calculate
194
00:09:39,500 --> 00:09:42,870
expectations of functions
of a random variable,
195
00:09:42,870 --> 00:09:44,820
why does that apply?
196
00:09:44,820 --> 00:09:48,980
Well, what we have inside the
brackets here is a function of
197
00:09:48,980 --> 00:09:52,850
the random variable, capital X.
So we can apply this rule
198
00:09:52,850 --> 00:09:55,750
where g is this particular
function.
199
00:09:55,750 --> 00:09:58,540
And we can use that to calculate
the variance,
200
00:09:58,540 --> 00:10:02,160
starting with the PMF of the
random variable X. And then we
201
00:10:02,160 --> 00:10:05,200
have a useful formula that's a
nice shortcut, sometimes, if
202
00:10:05,200 --> 00:10:07,620
you want to do the
calculation.
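Both the definition of the variance and the shortcut formula E[X squared] minus the square of E[X] can be sketched on a small assumed PMF, and they agree:

```python
import math

# Variance computed two ways: the definition E[(X - E[X])**2] and
# the shortcut E[X**2] - (E[X])**2, on an illustrative PMF.
pmf_x = {1: 0.5, 3: 0.5}
mean = sum(x * p for x, p in pmf_x.items())

var_definition = sum((x - mean) ** 2 * p for x, p in pmf_x.items())
var_shortcut = sum(x ** 2 * p for x, p in pmf_x.items()) - mean ** 2

# The standard deviation is the square root, in the units of X.
std = math.sqrt(var_definition)

print(var_definition, var_shortcut, std)  # 1.0 1.0 1.0
```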
203
00:10:07,620 --> 00:10:11,340
Now one thing that's slightly
wrong with the variance is
204
00:10:11,340 --> 00:10:15,600
that the units are not right, if
you want to talk about the
205
00:10:15,600 --> 00:10:16,930
spread a of a distribution.
206
00:10:16,930 --> 00:10:20,830
Suppose that X is a random
variable measured in meters.
207
00:10:20,830 --> 00:10:26,170
The variance will have the
units of meters squared.
208
00:10:26,170 --> 00:10:28,390
So it's a kind of a
different thing.
209
00:10:28,390 --> 00:10:31,120
If you want to talk about the
spread of the distribution
210
00:10:31,120 --> 00:10:35,210
using the same units as you have
for X, it's convenient to
211
00:10:35,210 --> 00:10:37,880
take the square root
of the variance.
212
00:10:37,880 --> 00:10:39,580
And that's something
that we define.
213
00:10:39,580 --> 00:10:42,980
And we call it to the standard
deviation of X, or the
214
00:10:42,980 --> 00:10:46,140
standard deviation of the
distribution of X. So it tells
215
00:10:46,140 --> 00:10:49,510
you the amount of spread
in your distribution.
216
00:10:49,510 --> 00:10:52,980
And it is in the same units as
the random variable itself
217
00:10:52,980 --> 00:10:54,230
that you are dealing with.
218
00:10:54,230 --> 00:10:57,260
219
00:10:57,260 --> 00:11:02,500
And we can just illustrate
those quantities with an
220
00:11:02,500 --> 00:11:06,670
example that's about as
simple as it can be.
221
00:11:06,670 --> 00:11:08,570
So consider the following
experiment.
222
00:11:08,570 --> 00:11:11,060
You're going to go from
here to New York,
223
00:11:11,060 --> 00:11:13,140
let's say, 200 miles.
224
00:11:13,140 --> 00:11:15,500
And you have two alternatives.
225
00:11:15,500 --> 00:11:20,640
Either you'll get your private
plane and go at a speed of 200
226
00:11:20,640 --> 00:11:27,690
miles per hour, constant speed
during your trip, or
227
00:11:27,690 --> 00:11:32,010
otherwise, you'll decide to walk
really, really slowly, at
228
00:11:32,010 --> 00:11:35,120
the leisurely pace of
one mile per hour.
229
00:11:35,120 --> 00:11:38,150
So you pick the speed at
random by doing this
230
00:11:38,150 --> 00:11:39,820
experiment, by flipping
a coin.
231
00:11:39,820 --> 00:11:41,510
And with probability one-half,
you do one thing.
232
00:11:41,510 --> 00:11:44,230
With probably one-half, you
do the other thing.
233
00:11:44,230 --> 00:11:47,890
So your V is a random
variable.
234
00:11:47,890 --> 00:11:51,050
In case you're interested in
how much time it's going to
235
00:11:51,050 --> 00:11:54,970
take you to get there, well,
time is equal to distance
236
00:11:54,970 --> 00:11:56,920
divided by speed.
237
00:11:56,920 --> 00:11:58,480
So that's the formula.
238
00:11:58,480 --> 00:12:01,660
The time itself is a random
variable, because it's a
239
00:12:01,660 --> 00:12:03,850
function of V, which
is random.
240
00:12:03,850 --> 00:12:06,390
How much time it's going to take
you depends on the coin
241
00:12:06,390 --> 00:12:09,270
flip that you do in the
beginning to decide what speed
242
00:12:09,270 --> 00:12:11,920
you are going to have.
243
00:12:11,920 --> 00:12:15,110
OK, just as a warm up, the
trivial calculations.
244
00:12:15,110 --> 00:12:17,790
To find the expected value of
V, you argue as follows.
245
00:12:17,790 --> 00:12:21,730
With probability one-half,
V is going to be one.
246
00:12:21,730 --> 00:12:25,740
And with probability one-half,
V is going to be 200.
247
00:12:25,740 --> 00:12:31,410
And so the expected value
of your speed is 100.5.
248
00:12:31,410 --> 00:12:34,970
If you wish to calculate the
variance of V, then you argue
249
00:12:34,970 --> 00:12:36,420
as follows.
250
00:12:36,420 --> 00:12:40,650
With probability one-half, I'm
going to travel at the speed
251
00:12:40,650 --> 00:12:45,920
of one, whereas, the
mean is 100.5.
252
00:12:45,920 --> 00:12:49,090
So this is the distance from
the mean, if I decide to
253
00:12:49,090 --> 00:12:50,380
travel at the speed of one.
254
00:12:50,380 --> 00:12:52,920
We take that distance from
the mean squared.
255
00:12:52,920 --> 00:12:55,330
That's one contribution
to the variance.
256
00:12:55,330 --> 00:12:59,220
And with probability one-half,
you're going to travel at the
257
00:12:59,220 --> 00:13:05,610
speed of 200, which is this
much away from the mean.
258
00:13:05,610 --> 00:13:08,270
You take the square of that.
259
00:13:08,270 --> 00:13:12,360
OK, so approximately how
big is this number?
260
00:13:12,360 --> 00:13:14,370
Well, this is roughly
100 squared.
261
00:13:14,370 --> 00:13:16,130
That's also 100 squared.
262
00:13:16,130 --> 00:13:20,760
So approximately, the variance
of this random
263
00:13:20,760 --> 00:13:25,060
variable is 100 squared.
264
00:13:25,060 --> 00:13:28,090
Now if I tell you that the
variance of this distribution
265
00:13:28,090 --> 00:13:31,980
is 10,000, it doesn't
really help you to
266
00:13:31,980 --> 00:13:33,600
relate it to this diagram.
267
00:13:33,600 --> 00:13:35,950
Whereas, the standard deviation,
where you take the
268
00:13:35,950 --> 00:13:38,850
square root, is more
interesting.
269
00:13:38,850 --> 00:13:43,770
It's the square root of 100
squared, which is 100.
270
00:13:43,770 --> 00:13:48,320
And the standard deviation,
indeed, gives us a sense of
271
00:13:48,320 --> 00:13:53,280
how spread out this distribution
is from the mean.
272
00:13:53,280 --> 00:13:57,440
So the standard deviation
basically gives us some
273
00:13:57,440 --> 00:14:01,110
indication about this spacing
that we have here.
274
00:14:01,110 --> 00:14:03,950
It tells us the amount of spread
in our distribution.
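The speed example works out exactly as described: the variance comes to roughly 100 squared, and the standard deviation to roughly 100:

```python
import math

# The speed example from the lecture: V is 1 or 200 miles per hour,
# each with probability one-half.
pmf_v = {1: 0.5, 200: 0.5}
mean_v = sum(v * p for v, p in pmf_v.items())  # 100.5

var_v = sum((v - mean_v) ** 2 * p for v, p in pmf_v.items())
std_v = math.sqrt(var_v)

print(mean_v, var_v, std_v)  # 100.5 9900.25 99.5
```

The variance, 9900.25, is hard to relate to the picture; the standard deviation, 99.5, is in miles per hour and directly measures the spread.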
275
00:14:03,950 --> 00:14:07,320
276
00:14:07,320 --> 00:14:12,110
OK, now let's look at what
happens to time.
277
00:14:12,110 --> 00:14:14,970
V is a random variable.
278
00:14:14,970 --> 00:14:17,340
T is a random variable.
279
00:14:17,340 --> 00:14:19,820
So now let's look at the
expected values and all of
280
00:14:19,820 --> 00:14:23,730
that for the time.
281
00:14:23,730 --> 00:14:29,250
OK, so the time is a function
of a random variable.
282
00:14:29,250 --> 00:14:33,070
We can find the expected time
by looking at all possible
283
00:14:33,070 --> 00:14:37,030
outcomes of the experiment, the
V's, weigh them according
284
00:14:37,030 --> 00:14:39,820
to their probabilities, and for
each particular V, keep
285
00:14:39,820 --> 00:14:42,880
track of how much
time it took us.
286
00:14:42,880 --> 00:14:48,760
So if V is one, which happens
with probability one-half, the
287
00:14:48,760 --> 00:14:52,440
time it takes is going
to be 200.
288
00:14:52,440 --> 00:14:56,410
If we travel at speed of one,
it takes us 200 time units.
289
00:14:56,410 --> 00:15:01,560
And otherwise, if our speed is
equal to 200, the time is one.
290
00:15:01,560 --> 00:15:06,540
So the expected value of T is
once more the same as before.
291
00:15:06,540 --> 00:15:09,740
It's 100.5.
292
00:15:09,740 --> 00:15:13,840
So the expected speed
is 100.5.
293
00:15:13,840 --> 00:15:19,320
The expected time
is also 100.5.
294
00:15:19,320 --> 00:15:22,030
So the product of these
expectations is
295
00:15:22,030 --> 00:15:24,350
something like 10,000.
296
00:15:24,350 --> 00:15:29,230
How about the expected value
of the product of T and V?
297
00:15:29,230 --> 00:15:32,460
Well, T times V is 200.
298
00:15:32,460 --> 00:15:37,790
No matter what outcome you have
in the experiment, in
299
00:15:37,790 --> 00:15:41,320
that particular outcome, T
times V is total distance
300
00:15:41,320 --> 00:15:45,660
traveled, which is
exactly 200.
301
00:15:45,660 --> 00:15:49,890
And so what do we get in this
simple example is that the
302
00:15:49,890 --> 00:15:53,550
expected value of the product of
these two random variables
303
00:15:53,550 --> 00:15:57,630
is different than the product
of their expected values.
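The whole trip example fits in a few lines of Python, and the mismatch between the product of the expectations and the expectation of the product comes out immediately:

```python
# The trip example: distance 200 miles, speed V chosen by coin flip,
# time T = 200 / V. Compare E[T] * E[V] with E[T * V].
pmf_v = {1: 0.5, 200: 0.5}

e_v = sum(v * p for v, p in pmf_v.items())               # 100.5
e_t = sum((200 / v) * p for v, p in pmf_v.items())       # 100.5
e_tv = sum((200 / v) * v * p for v, p in pmf_v.items())  # exactly 200

print(e_t * e_v, e_tv)  # about 10,100 versus 200
```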
304
00:15:57,630 --> 00:16:01,120
This is one more instance
of where we cannot
305
00:16:01,120 --> 00:16:03,500
reason on the average.
306
00:16:03,500 --> 00:16:08,190
So on the average, over a large
number of trips, your
307
00:16:08,190 --> 00:16:10,120
average time would be 100.
308
00:16:10,120 --> 00:16:13,570
On the average, over a large
number of trips, your average
309
00:16:13,570 --> 00:16:15,820
speed would be 100.
310
00:16:15,820 --> 00:16:20,740
But your average distance
traveled is not 100 times 100.
311
00:16:20,740 --> 00:16:22,580
It's something else.
312
00:16:22,580 --> 00:16:26,850
So you cannot reason on the
average, whenever you're
313
00:16:26,850 --> 00:16:28,850
dealing with non-linear
things.
314
00:16:28,850 --> 00:16:31,410
And the non-linear thing here
is that you have a function
315
00:16:31,410 --> 00:16:34,740
which is a product of stuff,
as opposed to just
316
00:16:34,740 --> 00:16:37,330
linear sums of stuff.
317
00:16:37,330 --> 00:16:42,530
Another way to look at what's
happening here is the expected
318
00:16:42,530 --> 00:16:44,100
value of the time.
319
00:16:44,100 --> 00:16:47,460
Time, by definition, is
200 over the speed.
320
00:16:47,460 --> 00:16:52,100
Expected value of the time, we
found it to be about 100.
321
00:16:52,100 --> 00:17:02,800
And so expected value of 200
over V is about 100.
322
00:17:02,800 --> 00:17:07,280
But it's different from this
quantity here, which is
323
00:17:07,280 --> 00:17:13,119
roughly equal to
200 over 100, and so 2.
324
00:17:13,119 --> 00:17:15,430
Expected value of
V is about 100.
325
00:17:15,430 --> 00:17:19,030
So this quantity is about
equal to two.
326
00:17:19,030 --> 00:17:22,329
Whereas, this quantity
up here is about 100.
327
00:17:22,329 --> 00:17:23,609
So what do we have here?
328
00:17:23,609 --> 00:17:26,960
We have a non-linear function
of V. And we find that the
329
00:17:26,960 --> 00:17:31,130
expected value of this function
is not the same thing
330
00:17:31,130 --> 00:17:34,390
as the function of the
expected value.
331
00:17:34,390 --> 00:17:38,560
So again, that's an instance
where you cannot interchange
332
00:17:38,560 --> 00:17:40,820
expected values and functions.
333
00:17:40,820 --> 00:17:42,770
And that's because things
are non-linear.
334
00:17:42,770 --> 00:17:46,120
335
00:17:46,120 --> 00:17:51,730
OK, so now let us introduce
a new concept.
336
00:17:51,730 --> 00:17:56,030
Or maybe it's not quite
a new concept.
337
00:17:56,030 --> 00:17:58,570
So we discussed, in chapter
one, that we have
338
00:17:58,570 --> 00:17:59,470
probabilities.
339
00:17:59,470 --> 00:18:03,170
We also have conditional
probabilities.
340
00:18:03,170 --> 00:18:05,250
What's the difference
between them?
341
00:18:05,250 --> 00:18:06,410
Essentially, none.
342
00:18:06,410 --> 00:18:09,560
Probabilities are just an
assignment of probability
343
00:18:09,560 --> 00:18:11,650
values to the different
outcomes, given
344
00:18:11,650 --> 00:18:12,980
a particular model.
345
00:18:12,980 --> 00:18:16,180
Somebody comes and gives
you new information.
346
00:18:16,180 --> 00:18:18,370
So you come up with
a new model.
347
00:18:18,370 --> 00:18:20,180
And you have new
probabilities.
348
00:18:20,180 --> 00:18:23,120
We call these conditional
probabilities, but they taste
349
00:18:23,120 --> 00:18:27,230
and behave exactly the same
as ordinary probabilities.
350
00:18:27,230 --> 00:18:30,110
So since we can have conditional
probabilities, why
351
00:18:30,110 --> 00:18:35,020
not have conditional PMFs as
well, since PMFs deal with
352
00:18:35,020 --> 00:18:37,120
probabilities anyway.
353
00:18:37,120 --> 00:18:40,520
So we have a random variable,
capital X. It has
354
00:18:40,520 --> 00:18:42,420
a PMF of its own.
355
00:18:42,420 --> 00:18:46,810
For example, it could be the PMF
in this picture, which is
356
00:18:46,810 --> 00:18:52,160
a uniform PMF that takes four
possible different values.
357
00:18:52,160 --> 00:18:54,850
And we also have an event.
358
00:18:54,850 --> 00:18:57,240
And somebody comes
and tells us that
359
00:18:57,240 --> 00:18:59,920
this event has occurred.
360
00:18:59,920 --> 00:19:02,700
The PMF tells you the
probability that capital X
361
00:19:02,700 --> 00:19:04,730
equals to some little x.
362
00:19:04,730 --> 00:19:08,930
Somebody tells you that a
certain event has occurred
363
00:19:08,930 --> 00:19:12,320
that's going to make you change
the probabilities that
364
00:19:12,320 --> 00:19:14,690
you assign to the different
values.
365
00:19:14,690 --> 00:19:17,180
You are going to use conditional
probabilities.
366
00:19:17,180 --> 00:19:20,880
So this part, it's clear what
it means from chapter one.
367
00:19:20,880 --> 00:19:25,200
And this part is just the new
notation we're using in this
368
00:19:25,200 --> 00:19:28,380
chapter to talk about
conditional probabilities.
369
00:19:28,380 --> 00:19:31,200
So this is just a definition.
370
00:19:31,200 --> 00:19:35,010
So the conditional PMF
is an ordinary PMF.
371
00:19:35,010 --> 00:19:39,840
But it's the PMF that applies
to a new model in which we
372
00:19:39,840 --> 00:19:42,480
have been given some information
about the outcome
373
00:19:42,480 --> 00:19:43,840
of the experiment.
374
00:19:43,840 --> 00:19:47,560
So to make it concrete, consider
this event here.
375
00:19:47,560 --> 00:19:50,610
Take the event that capital
X is bigger
376
00:19:50,610 --> 00:19:52,290
than or equal to two.
377
00:19:52,290 --> 00:19:54,490
In the picture, what
is the event A?
378
00:19:54,490 --> 00:19:57,165
The event A consists of
these three outcomes.
379
00:19:57,165 --> 00:20:00,470
380
00:20:00,470 --> 00:20:04,900
OK, what is the conditional PMF,
given that we are told
381
00:20:04,900 --> 00:20:08,920
that event A has occurred?
382
00:20:08,920 --> 00:20:11,900
Given that the event A has
occurred, it basically tells
383
00:20:11,900 --> 00:20:15,390
us that this outcome
has not occurred.
384
00:20:15,390 --> 00:20:18,670
There's only three possible
outcomes now.
385
00:20:18,670 --> 00:20:21,350
In the new universe, in the new
model where we condition
386
00:20:21,350 --> 00:20:23,840
on A, there's only three
possible outcomes.
387
00:20:23,840 --> 00:20:25,900
Those three possible outcomes
were equally
388
00:20:25,900 --> 00:20:27,660
likely when we started.
389
00:20:27,660 --> 00:20:30,130
So in the conditional universe,
they will remain
390
00:20:30,130 --> 00:20:31,270
equally likely.
391
00:20:31,270 --> 00:20:34,230
Remember, whenever you
condition, the relative
392
00:20:34,230 --> 00:20:36,330
likelihoods remain the same.
393
00:20:36,330 --> 00:20:38,290
They keep the same
proportions.
394
00:20:38,290 --> 00:20:40,860
They just need to be
re-scaled, so that
395
00:20:40,860 --> 00:20:42,520
they add up to one.
396
00:20:42,520 --> 00:20:46,380
So each one of these will have
the same probability.
397
00:20:46,380 --> 00:20:48,310
Now in the new world,
probabilities need
398
00:20:48,310 --> 00:20:49,330
to add up to 1.
399
00:20:49,330 --> 00:20:54,130
So each one of them is going to
get a probability of 1/3 in
400
00:20:54,130 --> 00:20:55,385
the conditional universe.
401
00:20:55,385 --> 00:20:58,420
402
00:20:58,420 --> 00:21:00,650
So this is our conditional
model.
403
00:21:00,650 --> 00:21:08,276
So our PMF is equal to 1/3 for
X equals to 2, 3 and 4.
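The keep-and-re-scale recipe can be sketched in Python. The support {1, 2, 3, 4} is an assumption about the picture, chosen to be consistent with the conditional PMF coming out to 1/3 on each of 2, 3, and 4:

```python
# Conditioning a PMF on an event: keep only the outcomes in the
# event and re-scale so the probabilities add up to one. Assuming
# the four equally likely values are 1, 2, 3, 4.
pmf_x = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
event_a = lambda x: x >= 2

p_a = sum(p for x, p in pmf_x.items() if event_a(x))  # P(A) = 3/4
pmf_given_a = {x: p / p_a for x, p in pmf_x.items() if event_a(x)}

print(pmf_given_a)  # 1/3 each on 2, 3, and 4
```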
404
00:21:08,276 --> 00:21:10,270
All right.
405
00:21:10,270 --> 00:21:13,380
Now whenever you have a
probabilistic model involving
406
00:21:13,380 --> 00:21:16,550
a random variable and you have
a PMF for that random
407
00:21:16,550 --> 00:21:19,690
variable, you can talk about
the expected value of that
408
00:21:19,690 --> 00:21:20,990
random variable.
409
00:21:20,990 --> 00:21:25,340
We defined expected values
just a few minutes ago.
410
00:21:25,340 --> 00:21:28,800
Here, we're dealing with
a conditional model and
411
00:21:28,800 --> 00:21:30,680
conditional probabilities.
412
00:21:30,680 --> 00:21:33,680
And so we can also talk about
the expected value of the
413
00:21:33,680 --> 00:21:38,100
random variable X in this new
universe, in this new
414
00:21:38,100 --> 00:21:40,830
conditional model that
we're dealing with.
415
00:21:40,830 --> 00:21:43,680
And this leads us to the
definition of the notion of a
416
00:21:43,680 --> 00:21:45,780
conditional expectation.
417
00:21:45,780 --> 00:21:50,680
The conditional expectation
is nothing but an ordinary
418
00:21:50,680 --> 00:21:56,720
expectation, except that you
don't use the original PMF.
419
00:21:56,720 --> 00:21:58,600
You use the conditional PMF.
420
00:21:58,600 --> 00:22:00,620
You use the conditional
probabilities.
421
00:22:00,620 --> 00:22:05,550
It's just an ordinary
expectation, but applied to
422
00:22:05,550 --> 00:22:09,400
the new model that we have,
the conditional universe where
423
00:22:09,400 --> 00:22:13,270
we are told that a certain
event has occurred.
424
00:22:13,270 --> 00:22:17,310
So we can now calculate the
conditional expectation, which,
425
00:22:17,310 --> 00:22:19,890
in this particular example,
would be 1/3.
426
00:22:19,890 --> 00:22:24,150
That's the probability of a
2, plus 1/3 which is the
427
00:22:24,150 --> 00:22:29,280
probability of a 3 plus 1/3,
the probability of a 4.
428
00:22:29,280 --> 00:22:33,160
And then you can use your
calculator to find the answer,
429
00:22:33,160 --> 00:22:35,780
or you can just argue
by symmetry.
430
00:22:35,780 --> 00:22:39,360
The expected value has to be the
center of gravity of the
431
00:22:39,360 --> 00:22:45,040
PMF we're working with,
which is equal to 3.
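The conditioning-and-renormalizing step just described can be sketched in a few lines of Python. The uniform PMF on {1, 2, 3, 4} and the helper names are illustrative choices (the transcript only specifies the three equally likely conditional values), not the lecture's exact setup:

```python
# Sketch of conditioning a PMF on an event and taking the conditional
# expectation; the uniform prior here is an assumption for illustration.
def condition_pmf(pmf, event):
    """Restrict a PMF (dict: value -> prob) to an event and renormalize."""
    total = sum(p for x, p in pmf.items() if x in event)
    return {x: p / total for x, p in pmf.items() if x in event}

def expectation(pmf):
    """Ordinary expected value: each value weighed by its probability."""
    return sum(x * p for x, p in pmf.items())

pmf_x = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # uniform prior
cond = condition_pmf(pmf_x, {2, 3, 4})          # each point gets 1/3
e_cond = expectation(cond)                      # center of gravity: 3
```

The conditional expectation is just the ordinary `expectation` applied to the renormalized PMF, which is the point of the passage above.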
432
00:22:45,040 --> 00:22:49,880
So conditional expectations are
no different from ordinary
433
00:22:49,880 --> 00:22:51,230
expectations.
434
00:22:51,230 --> 00:22:54,500
They're just ordinary
expectations applied to a new
435
00:22:54,500 --> 00:22:57,600
type of situation or a
new type of model.
436
00:22:57,600 --> 00:23:03,010
Anything we might know about
expectations will remain valid
437
00:23:03,010 --> 00:23:04,930
about conditional
expectations.
438
00:23:04,930 --> 00:23:07,880
So for example, the conditional
expectation of a
439
00:23:07,880 --> 00:23:11,040
linear function of a random
variable is going to be the
440
00:23:11,040 --> 00:23:14,250
linear function of the
conditional expectations.
441
00:23:14,250 --> 00:23:18,200
Or you can take any formula that
you might know, such as
442
00:23:18,200 --> 00:23:23,970
the formula that expected value
of X is equal to the--
443
00:23:23,970 --> 00:23:24,310
sorry--
444
00:23:24,310 --> 00:23:31,030
expected value of g of X is the
sum over all X's of g of X
445
00:23:31,030 --> 00:23:37,700
times the PMF of X. So this is
the formula that we already
446
00:23:37,700 --> 00:23:41,790
know about how to calculate
expectations of a function of
447
00:23:41,790 --> 00:23:43,540
a random variable.
448
00:23:43,540 --> 00:23:47,190
If we move to the conditional
universe, what changes?
449
00:23:47,190 --> 00:23:51,170
In the conditional universe,
we're talking about the
450
00:23:51,170 --> 00:23:55,150
conditional expectation, given
that event A has occurred.
451
00:23:55,150 --> 00:23:59,390
And we use the conditional
probabilities, given that A
452
00:23:59,390 --> 00:24:00,650
has occurred.
453
00:24:00,650 --> 00:24:05,140
So any formula has a conditional
counterpart.
454
00:24:05,140 --> 00:24:07,790
In the conditional counterparts,
expectations get
455
00:24:07,790 --> 00:24:10,000
replaced by conditional
expectations.
456
00:24:10,000 --> 00:24:13,940
And probabilities get replaced
by conditional probabilities.
457
00:24:13,940 --> 00:24:17,190
So once you know the first
formula and you know the
458
00:24:17,190 --> 00:24:21,210
general idea, there's absolutely
no reason for you
459
00:24:21,210 --> 00:24:24,020
to memorize a formula
like this one.
460
00:24:24,020 --> 00:24:27,070
You shouldn't even have to write
it on your cheat sheet
461
00:24:27,070 --> 00:24:30,840
for the exam, OK?
462
00:24:30,840 --> 00:24:40,980
OK, all right, so now let's look
at an example of a random
463
00:24:40,980 --> 00:24:44,470
variable that we've seen before,
the geometric random
464
00:24:44,470 --> 00:24:47,910
variable, and this time do
something a little more
465
00:24:47,910 --> 00:24:51,660
interesting with it.
466
00:24:51,660 --> 00:24:54,510
Do you remember from last time
what the geometric random
467
00:24:54,510 --> 00:24:55,580
variable is?
468
00:24:55,580 --> 00:24:56,560
We do coin flips.
469
00:24:56,560 --> 00:24:59,580
Each time there's a
probability of P
470
00:24:59,580 --> 00:25:01,250
of obtaining heads.
471
00:25:01,250 --> 00:25:03,910
And we're interested in the
number of tosses we're going
472
00:25:03,910 --> 00:25:07,580
to need until we observe heads
for the first time.
473
00:25:07,580 --> 00:25:10,045
The probability that the random
variable takes the
474
00:25:10,045 --> 00:25:13,290
value K, this is the probability
that the first K
475
00:25:13,290 --> 00:25:15,620
appeared at the K-th toss.
476
00:25:15,620 --> 00:25:20,900
So this is the probability of
K minus 1 consecutive tails
477
00:25:20,900 --> 00:25:22,670
followed by a head.
478
00:25:22,670 --> 00:25:28,360
So this is the probability of
having to wait K tosses.
479
00:25:28,360 --> 00:25:32,280
And when we plot this PMF, it
has this kind of shape, which
480
00:25:32,280 --> 00:25:36,020
is the shape of a geometric
progression.
481
00:25:36,020 --> 00:25:40,550
It starts at 1, and it goes
all the way to infinity.
482
00:25:40,550 --> 00:25:43,700
So this is a discrete random
variable that takes values
483
00:25:43,700 --> 00:25:49,160
over an infinite set, the set
of the positive integers.
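As a quick numerical sanity check of the PMF just described, P(X = k) = (1 - p)^(k-1) p: the value p = 0.3 and the truncation point of the infinite sum are arbitrary choices for illustration.

```python
from math import isclose

def geometric_pmf(k, p):
    """P(X = k): k - 1 consecutive tails followed by a head."""
    return (1 - p) ** (k - 1) * p

p = 0.3
# The probabilities over the positive integers add up to 1
# (truncated here; the tail beyond k = 200 is negligible).
total = sum(geometric_pmf(k, p) for k in range(1, 200))
```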
484
00:25:49,160 --> 00:25:51,720
So it's a random variable,
therefore, it has an
485
00:25:51,720 --> 00:25:53,140
expectation.
486
00:25:53,140 --> 00:25:56,790
And the expected value is, by
definition, we'll consider all
487
00:25:56,790 --> 00:25:59,180
possible values of the
random variable.
488
00:25:59,180 --> 00:26:02,560
And we weigh them according to
their probabilities, which
489
00:26:02,560 --> 00:26:05,860
leads us to this expression.
490
00:26:05,860 --> 00:26:09,860
You may have evaluated that
expression some time in your
491
00:26:09,860 --> 00:26:11,400
previous life.
492
00:26:11,400 --> 00:26:15,730
And there are tricks for how
to evaluate this and get a
493
00:26:15,730 --> 00:26:16,770
closed-form answer.
494
00:26:16,770 --> 00:26:19,350
But it's sort of an
algebraic trick.
495
00:26:19,350 --> 00:26:20,710
You might not remember it.
496
00:26:20,710 --> 00:26:23,520
How do we go about doing
this summation?
497
00:26:23,520 --> 00:26:26,830
Well, we're going to use a
probabilistic trick and manage
498
00:26:26,830 --> 00:26:33,440
to evaluate the expectation of
X, essentially, without doing
499
00:26:33,440 --> 00:26:34,870
any algebra.
500
00:26:34,870 --> 00:26:38,600
And in the process of doing so,
we're going to get some
501
00:26:38,600 --> 00:26:43,080
intuition about what happens
in coin tosses and with
502
00:26:43,080 --> 00:26:45,750
geometric random variables.
503
00:26:45,750 --> 00:26:48,930
So we have two people who
are going to do the same
504
00:26:48,930 --> 00:26:53,870
experiment, flip a coin until
they obtain heads for the
505
00:26:53,870 --> 00:26:55,550
first time.
506
00:26:55,550 --> 00:27:00,170
One of these people is going to
use the letter Y to count
507
00:27:00,170 --> 00:27:02,760
how many tosses it took.
508
00:27:02,760 --> 00:27:06,860
So that person starts
flipping right now.
509
00:27:06,860 --> 00:27:08,710
This is the current time.
510
00:27:08,710 --> 00:27:11,440
And they are going to obtain
tails, tails, tails, until
511
00:27:11,440 --> 00:27:13,620
eventually they obtain heads.
512
00:27:13,620 --> 00:27:20,750
And this random variable Y is,
of course, geometric, so it
513
00:27:20,750 --> 00:27:22,560
has a PMF of this form.
514
00:27:22,560 --> 00:27:25,510
515
00:27:25,510 --> 00:27:32,090
OK, now there is a second person
who is doing that same
516
00:27:32,090 --> 00:27:33,410
experiment.
517
00:27:33,410 --> 00:27:37,400
That second person is going to
take, again, a random number,
518
00:27:37,400 --> 00:27:40,160
X, until they obtain heads
for the first time.
519
00:27:40,160 --> 00:27:44,560
And of course, X is going to
have the same PMF as Y.
520
00:27:44,560 --> 00:27:47,140
But that person was impatient.
521
00:27:47,140 --> 00:27:51,880
And they actually started
flipping earlier, before the Y
522
00:27:51,880 --> 00:27:53,490
person started flipping.
523
00:27:53,490 --> 00:27:55,400
They flipped the coin twice.
524
00:27:55,400 --> 00:27:57,640
And they were unlucky, and they
525
00:27:57,640 --> 00:27:59,655
obtained tails both times.
526
00:27:59,655 --> 00:28:02,370
527
00:28:02,370 --> 00:28:05,115
And so they have to continue.
528
00:28:05,115 --> 00:28:09,100
529
00:28:09,100 --> 00:28:13,300
Looking at the situation at this
time, how do these two
530
00:28:13,300 --> 00:28:14,690
people compare?
531
00:28:14,690 --> 00:28:20,260
Who do you think is going
to obtain heads first?
532
00:28:20,260 --> 00:28:22,610
Is one more likely
than the other?
533
00:28:22,610 --> 00:28:26,160
So if you play at the casino a
lot, you'll say, oh, there
534
00:28:26,160 --> 00:28:29,810
were two tails in a row, so
a head should be coming up
535
00:28:29,810 --> 00:28:31,350
sometime soon.
536
00:28:31,350 --> 00:28:35,600
But this is a wrong argument,
because coin flips, at least
537
00:28:35,600 --> 00:28:37,870
in our model, are independent.
538
00:28:37,870 --> 00:28:41,750
The fact that these two happened
to be tails doesn't
539
00:28:41,750 --> 00:28:45,230
change anything about our
beliefs about what's going to
540
00:28:45,230 --> 00:28:46,900
be happening here.
541
00:28:46,900 --> 00:28:49,900
So what's going to be happening
to that person is
542
00:28:49,900 --> 00:28:53,140
they will be flipping
independent coin flips.
543
00:28:53,140 --> 00:28:54,930
That person will also
be flipping
544
00:28:54,930 --> 00:28:56,600
independent coin flips.
545
00:28:56,600 --> 00:29:00,660
And both of them wait until
the first head occurs.
546
00:29:00,660 --> 00:29:04,050
They're facing an identical
situation,
547
00:29:04,050 --> 00:29:06,770
starting from this time.
548
00:29:06,770 --> 00:29:11,850
OK, now what's the probabilistic
model of what
549
00:29:11,850 --> 00:29:14,530
this person is facing?
550
00:29:14,530 --> 00:29:18,940
The time until that person
obtains heads for the first
551
00:29:18,940 --> 00:29:25,740
time is X. So this number of
flips until they obtain heads
552
00:29:25,740 --> 00:29:30,080
for the first time is going
to be X minus 2.
553
00:29:30,080 --> 00:29:35,810
So X is the total number
until the first head.
554
00:29:35,810 --> 00:29:41,280
X minus 2 is the number of
flips, starting from here.
555
00:29:41,280 --> 00:29:44,060
Now what information do we
have about that person?
556
00:29:44,060 --> 00:29:45,910
We have the information
that their first
557
00:29:45,910 --> 00:29:47,970
two flips were tails.
558
00:29:47,970 --> 00:29:52,650
So we're given the information
that X was bigger than 2.
559
00:29:52,650 --> 00:29:57,035
So the probabilistic model that
describes this piece of
560
00:29:57,035 --> 00:30:01,790
the experiment is that it's
going to take a random number
561
00:30:01,790 --> 00:30:04,270
of flips until the first head.
562
00:30:04,270 --> 00:30:08,420
That number of flips, starting
from here until the next head,
563
00:30:08,420 --> 00:30:10,980
is that number X minus 2.
564
00:30:10,980 --> 00:30:13,210
But we're given the information
that this person
565
00:30:13,210 --> 00:30:17,780
has already wasted
2 coin flips.
566
00:30:17,780 --> 00:30:20,330
Now we argued that
probabilistically, this
567
00:30:20,330 --> 00:30:24,780
person, this part of the
experiment here is identical
568
00:30:24,780 --> 00:30:26,710
with that part of
the experiment.
569
00:30:26,710 --> 00:30:31,650
So the PMF of this random
variable, which is X minus 2,
570
00:30:31,650 --> 00:30:33,980
conditioned on this information,
should be the
571
00:30:33,980 --> 00:30:39,150
same as that PMF that
we have down there.
572
00:30:39,150 --> 00:30:46,290
So the formal statement that
I'm making is that this PMF
573
00:30:46,290 --> 00:30:51,910
here of X minus 2, given that
X is bigger than 2, is the
574
00:30:51,910 --> 00:30:58,060
same as the PMF of X itself.
575
00:30:58,060 --> 00:31:00,280
What is this saying?
576
00:31:00,280 --> 00:31:04,220
Given that I tell you that you
already did a few flips and
577
00:31:04,220 --> 00:31:08,450
they were failures, the
remaining number of flips
578
00:31:08,450 --> 00:31:13,260
until the first head has the
same geometric distribution as
579
00:31:13,260 --> 00:31:16,130
if you were starting
from scratch.
580
00:31:16,130 --> 00:31:19,590
Whatever happened in the past,
it happened, but has no
581
00:31:19,590 --> 00:31:22,670
bearing what's going to
happen in the future.
582
00:31:22,670 --> 00:31:27,660
Remaining coin flips until
a head has the same
583
00:31:27,660 --> 00:31:32,220
distribution, whether you're
starting right now, or whether
584
00:31:32,220 --> 00:31:35,590
you had done some other
stuff in the past.
585
00:31:35,590 --> 00:31:38,860
So this is a property that we
call the memorylessness
586
00:31:38,860 --> 00:31:42,550
property of the geometric
distribution.
587
00:31:42,550 --> 00:31:45,560
Essentially, it says that
whatever happens in the future
588
00:31:45,560 --> 00:31:48,920
is independent from whatever
happened in the past.
589
00:31:48,920 --> 00:31:51,350
And that's true almost by
definition, because we're
590
00:31:51,350 --> 00:31:53,750
assuming independent
coin flips.
591
00:31:53,750 --> 00:31:56,750
Really, independence means that
information about one
592
00:31:56,750 --> 00:32:00,390
part of the experiment has no
bearing about what's going to
593
00:32:00,390 --> 00:32:04,280
happen in the other parts
of the experiment.
594
00:32:04,280 --> 00:32:09,010
The argument that I tried to
give using the intuition of
595
00:32:09,010 --> 00:32:14,240
coin flips, you can make it
formal by just manipulating
596
00:32:14,240 --> 00:32:16,110
PMFs formally.
597
00:32:16,110 --> 00:32:19,450
So this is the original
PMF of X.
598
00:32:19,450 --> 00:32:22,090
Suppose that you condition
on the event that X
599
00:32:22,090 --> 00:32:24,030
is bigger than 2.
600
00:32:24,030 --> 00:32:27,570
This conditioning information,
what it does is it tells you
601
00:32:27,570 --> 00:32:30,450
that this piece did
not happen.
602
00:32:30,450 --> 00:32:33,760
You're conditioning just
on this event.
603
00:32:33,760 --> 00:32:37,430
When you condition on that
event, what's left is the
604
00:32:37,430 --> 00:32:42,130
conditional PMF, which has the
same shape as this one, except
605
00:32:42,130 --> 00:32:45,010
that it needs to be
re-normalized up, so that the
606
00:32:45,010 --> 00:32:46,820
probabilities add up to one.
607
00:32:46,820 --> 00:32:52,460
So you take that picture, but
you need to change the height
608
00:32:52,460 --> 00:32:56,210
of it, so that these
terms add up to 1.
609
00:32:56,210 --> 00:32:59,730
And this is the conditional
PMF of X, given that X is
610
00:32:59,730 --> 00:33:01,310
bigger than 2.
611
00:33:01,310 --> 00:33:04,360
But we're talking here not about
X. We're talking about
612
00:33:04,360 --> 00:33:07,930
the remaining number of tosses.
613
00:33:07,930 --> 00:33:12,030
Remaining number of tosses
is X minus 2.
614
00:33:12,030 --> 00:33:17,120
If we have the PMF of X, can we
find the PMF of X minus 2?
615
00:33:17,120 --> 00:33:22,870
Well, if X is equal to 3, that
corresponds to X minus 2 being
616
00:33:22,870 --> 00:33:24,170
equal to 1.
617
00:33:24,170 --> 00:33:26,730
So this probability here
should be equal to that
618
00:33:26,730 --> 00:33:27,950
probability.
619
00:33:27,950 --> 00:33:31,400
The probability that X is equal
to 4 should be the same
620
00:33:31,400 --> 00:33:34,710
as the probability that X
minus 2 is equal to 2.
621
00:33:34,710 --> 00:33:38,980
So basically, the PMF of X minus
2 is the same as the PMF
622
00:33:38,980 --> 00:33:43,460
of X, except that it gets
shifted by these 2 units.
623
00:33:43,460 --> 00:33:47,340
So this way, we have formally
derived the conditional PMF of
624
00:33:47,340 --> 00:33:51,490
the remaining number of coin
tosses, given that the first
625
00:33:51,490 --> 00:33:55,230
two flips were tails.
626
00:33:55,230 --> 00:33:58,880
And we see that it's exactly
the same as the PMF that we
627
00:33:58,880 --> 00:34:00,230
started with.
628
00:34:00,230 --> 00:34:05,130
And so this is the formal proof
of this statement here.
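The shift-and-renormalize argument can also be checked numerically: divide the shifted geometric PMF by P(X > 2) = (1 - p)^2 and compare with the original PMF. The parameter value and the helper names here are illustrative choices:

```python
# Numerical check of memorylessness for the geometric PMF.
def geometric_pmf(k, p):
    """P(X = k): k - 1 tails followed by a head."""
    return (1 - p) ** (k - 1) * p

def shifted_conditional_pmf(k, p):
    """P(X - 2 = k | X > 2) = P(X = k + 2) / P(X > 2)."""
    return geometric_pmf(k + 2, p) / (1 - p) ** 2

p = 0.4
# Memorylessness: the shifted, renormalized PMF equals the original one.
gaps = [abs(shifted_conditional_pmf(k, p) - geometric_pmf(k, p))
        for k in range(1, 20)]
```

Algebraically, (1 - p)^(k+1) p divided by (1 - p)^2 is (1 - p)^(k-1) p, so the two PMFs agree term by term, which is exactly the boxed statement in the lecture.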
629
00:34:05,130 --> 00:34:10,010
So it's useful here to digest
both these formal statements
630
00:34:10,010 --> 00:34:13,290
and understand it and understand
the notation that
631
00:34:13,290 --> 00:34:17,050
is involved here, but also
to really appreciate the
632
00:34:17,050 --> 00:34:21,409
intuitive argument about what
this is really saying.
633
00:34:21,409 --> 00:34:28,980
OK, all right, so now we want to
use this observation, this
634
00:34:28,980 --> 00:34:32,389
memorylessness, to eventually
calculate the expected value
635
00:34:32,389 --> 00:34:34,679
for a geometric random
variable.
636
00:34:34,679 --> 00:34:38,150
And the way we're going to do
it is by using a divide and
637
00:34:38,150 --> 00:34:41,650
conquer tool, which is an analog
of what we have already
638
00:34:41,650 --> 00:34:44,489
seen sometime before.
639
00:34:44,489 --> 00:34:48,230
Remember our story that there's
a number of possible
640
00:34:48,230 --> 00:34:49,840
scenarios about the world?
641
00:34:49,840 --> 00:34:54,120
And there's a certain event, B,
that can happen under any
642
00:34:54,120 --> 00:34:55,889
of these possible scenarios.
643
00:34:55,889 --> 00:34:57,980
And we have the total
probability theory.
644
00:34:57,980 --> 00:35:00,970
And that tells us that, to find
the probability of this
645
00:35:00,970 --> 00:35:03,360
event, B, you consider
the probabilities of
646
00:35:03,360 --> 00:35:06,000
B under each scenario.
647
00:35:06,000 --> 00:35:09,180
And you weigh those
probabilities according to the
648
00:35:09,180 --> 00:35:12,190
probabilities of the different
scenarios that we have.
649
00:35:12,190 --> 00:35:14,520
So that's a formula that
we already know
650
00:35:14,520 --> 00:35:16,760
and have worked with.
651
00:35:16,760 --> 00:35:18,020
What's the next step?
652
00:35:18,020 --> 00:35:19,910
Is it something deep?
653
00:35:19,910 --> 00:35:24,280
No, it's just translation
in different notation.
654
00:35:24,280 --> 00:35:29,140
This is exactly the same
formula, but with PMFs.
655
00:35:29,140 --> 00:35:32,720
The event that capital X is
equal to little x can happen
656
00:35:32,720 --> 00:35:34,420
in many different ways.
657
00:35:34,420 --> 00:35:37,580
It can happen under
either scenario.
658
00:35:37,580 --> 00:35:40,910
And within each scenario, you
need to use the conditional
659
00:35:40,910 --> 00:35:44,140
probabilities of that event,
given that this
660
00:35:44,140 --> 00:35:46,270
scenario has occurred.
661
00:35:46,270 --> 00:35:49,640
So this formula is identical to
that one, except that we're
662
00:35:49,640 --> 00:35:53,440
using conditional PMFs,
instead of conditional
663
00:35:53,440 --> 00:35:54,290
probabilities.
664
00:35:54,290 --> 00:35:56,860
But conditional PMFs, of
course, are nothing but
665
00:35:56,860 --> 00:35:59,710
conditional probabilities
anyway.
666
00:35:59,710 --> 00:36:02,500
So nothing new so far.
667
00:36:02,500 --> 00:36:08,700
Then what I do is to take this
formula here and multiply both
668
00:36:08,700 --> 00:36:15,320
sides by X and take the
sum over all X's.
669
00:36:15,320 --> 00:36:17,270
What do we get on this side?
670
00:36:17,270 --> 00:36:19,430
We get the expected
value of X.
671
00:36:19,430 --> 00:36:22,830
What do we get on that side?
672
00:36:22,830 --> 00:36:24,290
Probability of A1.
673
00:36:24,290 --> 00:36:29,770
And then here, sum over all
X's of X times P. That's,
674
00:36:29,770 --> 00:36:33,030
again, the same calculation
we have when we deal with
675
00:36:33,030 --> 00:36:36,450
expectations, except that, since
here, we're dealing with
676
00:36:36,450 --> 00:36:39,010
conditional probabilities,
we're going to get the
677
00:36:39,010 --> 00:36:41,220
conditional expectation.
678
00:36:41,220 --> 00:36:44,160
And this is the total
expectation theorem.
679
00:36:44,160 --> 00:36:47,300
It's a very useful way for
calculating expectations using
680
00:36:47,300 --> 00:36:49,440
a divide and conquer method.
681
00:36:49,440 --> 00:36:53,730
We figure out the average value
of X under each one of
682
00:36:53,730 --> 00:36:55,590
the possible scenarios.
683
00:36:55,590 --> 00:37:01,330
The overall average value of
X is a weighted linear
684
00:37:01,330 --> 00:37:04,500
combination of the expected
values of X in the different
685
00:37:04,500 --> 00:37:07,960
scenarios where the weights are
chosen according to the
686
00:37:07,960 --> 00:37:09,210
different probabilities.
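Here is a minimal numerical sketch of the total expectation theorem on a made-up two-scenario example (the PMF values and the partition are not from the lecture):

```python
# A toy instance of E[X] = P(A1) E[X | A1] + P(A2) E[X | A2].
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def expectation(p):
    """Weigh each value by its probability and add."""
    return sum(x * q for x, q in p.items())

# Partition of the sample space: A1 = {X = 1}, A2 = {X >= 2}.
p_a1 = pmf[1]
p_a2 = pmf[2] + pmf[3]
e_given_a1 = 1.0                                 # X is constant on A1
e_given_a2 = (2 * pmf[2] + 3 * pmf[3]) / p_a2    # conditional expectation

lhs = expectation(pmf)                           # overall average
rhs = p_a1 * e_given_a1 + p_a2 * e_given_a2      # weighted combination
```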
687
00:37:09,210 --> 00:37:15,356
688
00:37:15,356 --> 00:37:21,040
OK, and now we're going to apply
this to the case of a
689
00:37:21,040 --> 00:37:23,070
geometric random variable.
690
00:37:23,070 --> 00:37:26,410
And we're going to divide and
conquer by considering
691
00:37:26,410 --> 00:37:31,350
separately the two cases where
the first toss was heads, and
692
00:37:31,350 --> 00:37:35,820
the other case where the
first toss was tails.
693
00:37:35,820 --> 00:37:40,160
So the expected value of X is
the probability that the first
694
00:37:40,160 --> 00:37:44,020
toss was heads, so that X is
equal to 1, and the expected
695
00:37:44,020 --> 00:37:46,520
value if that happened.
696
00:37:46,520 --> 00:37:51,530
What is the expected value of X,
given that X is equal to 1?
697
00:37:51,530 --> 00:37:55,770
If X is known to be
equal to 1, then X
698
00:37:55,770 --> 00:37:57,390
becomes just a number.
699
00:37:57,390 --> 00:38:01,100
And the expected value of a
number is the number itself.
700
00:38:01,100 --> 00:38:05,120
So this first line here is the
probability of heads in the
701
00:38:05,120 --> 00:38:07,660
first toss times the number 1.
702
00:38:07,660 --> 00:38:13,320
703
00:38:13,320 --> 00:38:21,000
So the probability that X is
bigger than 1 is 1 minus P.
704
00:38:21,000 --> 00:38:25,400
And then we need to do
something about this
705
00:38:25,400 --> 00:38:26,650
conditional expectation.
706
00:38:26,650 --> 00:38:29,830
707
00:38:29,830 --> 00:38:33,400
What is it?
708
00:38:33,400 --> 00:38:39,420
I can write it in, perhaps,
a more suggestive form, as
709
00:38:39,420 --> 00:38:51,360
the expected value of X minus
1, given that X minus 1 is
710
00:38:51,360 --> 00:38:52,610
bigger than 0.
711
00:38:52,610 --> 00:38:56,590
712
00:38:56,590 --> 00:38:57,840
Ah.
713
00:38:57,840 --> 00:39:02,420
714
00:39:02,420 --> 00:39:07,453
OK, X bigger than 1 is the
same as X minus 1 being
715
00:39:07,453 --> 00:39:10,770
positive, this way.
716
00:39:10,770 --> 00:39:15,500
X minus 1, given that X minus 1 is positive, plus 1.
717
00:39:15,500 --> 00:39:16,680
What did I do here?
718
00:39:16,680 --> 00:39:21,250
I added and subtracted 1.
719
00:39:21,250 --> 00:39:24,110
Now what is this?
720
00:39:24,110 --> 00:39:29,240
This is the expected value of
the remaining coin flips,
721
00:39:29,240 --> 00:39:34,660
until I obtain heads, given that
the first one was tails.
722
00:39:34,660 --> 00:39:38,690
It's the same story that we were
going through down there.
723
00:39:38,690 --> 00:39:41,860
Given that the first coin flip
was tails doesn't tell me
724
00:39:41,860 --> 00:39:43,790
anything about the
future, about the
725
00:39:43,790 --> 00:39:45,750
remaining coin flips.
726
00:39:45,750 --> 00:39:49,610
So this expectation should be
the same as the expectation
727
00:39:49,610 --> 00:39:53,740
faced by a person who was
starting just now.
728
00:39:53,740 --> 00:39:59,080
So this should be equal to the
expected value of X itself.
729
00:39:59,080 --> 00:40:04,120
And then we have the plus 1
that's come from there, OK?
730
00:40:04,120 --> 00:40:07,830
731
00:40:07,830 --> 00:40:11,140
Remaining coin flips until a
head, given that I had a tail
732
00:40:11,140 --> 00:40:15,710
yesterday, is the same as
expected number of flips until
733
00:40:15,710 --> 00:40:20,160
heads for a person just starting
now and wasn't doing
734
00:40:20,160 --> 00:40:21,280
anything yesterday.
735
00:40:21,280 --> 00:40:24,640
So the fact that I had a
coin flip yesterday doesn't
736
00:40:24,640 --> 00:40:28,990
change my beliefs about how
long it's going to take me
737
00:40:28,990 --> 00:40:31,170
until the first head.
738
00:40:31,170 --> 00:40:34,700
So once we believe that
relation, then
739
00:40:34,700 --> 00:40:37,750
we plug this here.
740
00:40:37,750 --> 00:40:42,346
And this red term becomes
expected value of X plus 1.
741
00:40:42,346 --> 00:40:46,000
742
00:40:46,000 --> 00:40:50,850
So now we didn't exactly get the
answer we wanted, but we
743
00:40:50,850 --> 00:40:56,110
got an equation that involves
the expected value of X. And
744
00:40:56,110 --> 00:40:58,230
it's the only unknown
in that equation.
745
00:40:58,230 --> 00:41:03,990
Expected value of X equals to P
plus (1 minus P) times this
746
00:41:03,990 --> 00:41:05,190
expression.
747
00:41:05,190 --> 00:41:08,480
You solve this equation for
expected value of X, and you
748
00:41:08,480 --> 00:41:12,990
get the value of 1/P.
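The 1/P answer can be cross-checked by summing the defining series directly; p = 0.25 and the truncation point below are arbitrary illustrative choices:

```python
# Cross-checking E[X] = 1/p for the geometric distribution.
p = 0.25

# Solving E[X] = p * 1 + (1 - p) * (E[X] + 1) for E[X] gives 1/p:
e_solved = 1 / p

# Direct (truncated) summation of the defining series, sum k * P(X = k):
e_direct = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 2000))
```

The divide-and-conquer route avoids the algebraic trick entirely; the brute-force sum is only here to confirm that both give the same number.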
749
00:41:12,990 --> 00:41:16,920
The final answer does make
intuitive sense.
750
00:41:16,920 --> 00:41:21,030
If P is small, heads are
difficult to obtain.
751
00:41:21,030 --> 00:41:24,050
So you expect that it's going to
take you a long time until
752
00:41:24,050 --> 00:41:26,310
you see heads for
the first time.
753
00:41:26,310 --> 00:41:29,243
So it is definitely a
reasonable answer.
754
00:41:29,243 --> 00:41:32,960
Now the trick that we used here,
the divide and conquer
755
00:41:32,960 --> 00:41:36,860
trick, is a really nice one.
756
00:41:36,860 --> 00:41:39,760
It gives us a very good shortcut
in this problem.
757
00:41:39,760 --> 00:41:44,230
But you must definitely spend
some time making sure you
758
00:41:44,230 --> 00:41:48,670
understand why this expression
here is the same as that
759
00:41:48,670 --> 00:41:50,020
expression there.
760
00:41:50,020 --> 00:41:53,790
Essentially, what it's saying is
that, if I tell you that X
761
00:41:53,790 --> 00:41:57,460
is bigger than 1, that the first
coin flip was tails, all
762
00:41:57,460 --> 00:42:02,040
I'm telling you is that that
person has wasted a coin flip,
763
00:42:02,040 --> 00:42:05,310
and they are starting
all over again.
764
00:42:05,310 --> 00:42:08,510
So they've wasted 1 coin flip.
765
00:42:08,510 --> 00:42:10,670
And they're starting
all over again.
766
00:42:10,670 --> 00:42:13,220
If I tell you that the first
flip was tails, that's the
767
00:42:13,220 --> 00:42:16,800
only information that I'm
basically giving you, a wasted
768
00:42:16,800 --> 00:42:19,970
flip, and then starts
all over again.
769
00:42:19,970 --> 00:42:23,180
All right, so in the few
remaining minutes now, we're
770
00:42:23,180 --> 00:42:26,970
going to quickly introduce a few
new concepts that we will
771
00:42:26,970 --> 00:42:31,050
be playing with in the
next ten days or so.
772
00:42:31,050 --> 00:42:33,300
And you will get plenty
of opportunities
773
00:42:33,300 --> 00:42:34,860
to manipulate them.
774
00:42:34,860 --> 00:42:37,180
So here's the idea.
775
00:42:37,180 --> 00:42:40,370
A typical experiment may have
several random variables
776
00:42:40,370 --> 00:42:43,310
associated with that
experiment.
777
00:42:43,310 --> 00:42:46,370
So a typical student has
height and weight.
778
00:42:46,370 --> 00:42:48,800
If I give you the PMF of
height, that tells me
779
00:42:48,800 --> 00:42:51,060
something about distribution
of heights in the class.
780
00:42:51,060 --> 00:42:57,110
I give you the PMF of weight,
it tells me something about
781
00:42:57,110 --> 00:42:58,990
the different weights
in this class.
782
00:42:58,990 --> 00:43:01,690
But if I want to ask a
question, is there an
783
00:43:01,690 --> 00:43:05,910
association between height and
weight, then I need to know a
784
00:43:05,910 --> 00:43:09,730
little more how height and
weight relate to each other.
785
00:43:09,730 --> 00:43:13,480
And the PMF of height
individually and the PMF of
786
00:43:13,480 --> 00:43:16,130
weight just by itself
do not tell me
787
00:43:16,130 --> 00:43:18,240
anything about those relations.
788
00:43:18,240 --> 00:43:21,730
To be able to say something
about those relations, I need
789
00:43:21,730 --> 00:43:27,230
to know something about joint
probabilities, how likely is
790
00:43:27,230 --> 00:43:31,500
it that certain X's go together
with certain Y's.
791
00:43:31,500 --> 00:43:34,180
So these probabilities,
essentially, capture
792
00:43:34,180 --> 00:43:37,930
associations between these
two random variables.
793
00:43:37,930 --> 00:43:40,910
And it's the information I would
need to have to do any
794
00:43:40,910 --> 00:43:44,900
kind of statistical study that
tries to relate the two random
795
00:43:44,900 --> 00:43:47,600
variables with each other.
796
00:43:47,600 --> 00:43:49,440
These are ordinary
probabilities.
797
00:43:49,440 --> 00:43:50,750
This is an event.
798
00:43:50,750 --> 00:43:52,630
It's the event that
this thing happens
799
00:43:52,630 --> 00:43:55,230
and that thing happens.
800
00:43:55,230 --> 00:43:58,460
This is just the notation
that we will be using.
801
00:43:58,460 --> 00:44:00,840
It's called the joint PMF.
802
00:44:00,840 --> 00:44:04,560
It's the joint Probability
Mass Function of the two
803
00:44:04,560 --> 00:44:09,170
random variables X and Y looked
at together, jointly.
804
00:44:09,170 --> 00:44:11,740
And it gives me the probability
that any
805
00:44:11,740 --> 00:44:17,100
particular numerical outcome
pair does happen.
806
00:44:17,100 --> 00:44:20,580
So in the finite case, you can
represent joint PMFs, for
807
00:44:20,580 --> 00:44:22,660
example, by a table.
808
00:44:22,660 --> 00:44:25,940
This particular table here would
give you information
809
00:44:25,940 --> 00:44:31,870
such as, let's see, the joint
PMF evaluated at 2, 3.
810
00:44:31,870 --> 00:44:35,240
This is the probability that
X is equal to 2 and,
811
00:44:35,240 --> 00:44:38,200
simultaneously, Y
is equal to 3.
812
00:44:38,200 --> 00:44:40,370
So it would be that
number here.
813
00:44:40,370 --> 00:44:41,620
It's 4/20.
814
00:44:41,620 --> 00:44:44,290
815
00:44:44,290 --> 00:44:47,330
OK, what is a basic
property of PMFs?
816
00:44:47,330 --> 00:44:49,920
First, these are probabilities,
so all of the
817
00:44:49,920 --> 00:44:52,240
entries have to be
non-negative.
818
00:44:52,240 --> 00:44:57,470
If you add up the probabilities
over all possible numerical
819
00:44:57,470 --> 00:45:01,070
pairs that you could get, of
course, the total probability
820
00:45:01,070 --> 00:45:03,050
must be equal to 1.
821
00:45:03,050 --> 00:45:06,070
So that's another thing
that we want.
822
00:45:06,070 --> 00:45:10,090
Now suppose somebody gives
me this model, but I
823
00:45:10,090 --> 00:45:12,410
don't care about Y's.
824
00:45:12,410 --> 00:45:15,760
All I care is the distribution
of the X's.
825
00:45:15,760 --> 00:45:18,550
So I'm going to find the
probability that X takes on a
826
00:45:18,550 --> 00:45:20,060
particular value.
827
00:45:20,060 --> 00:45:22,230
Can I find it from the table?
828
00:45:22,230 --> 00:45:23,190
Of course, I can.
829
00:45:23,190 --> 00:45:27,930
If you ask me what's the
probability that X is equal to
830
00:45:27,930 --> 00:45:31,890
3, what I'm going to do is
to add up those three
831
00:45:31,890 --> 00:45:33,790
probabilities together.
832
00:45:33,790 --> 00:45:37,680
And those probabilities, taken
all together, give me the
833
00:45:37,680 --> 00:45:40,400
probability that X
is equal to 3.
834
00:45:40,400 --> 00:45:43,950
These are all the possible ways
that the event X equals
835
00:45:43,950 --> 00:45:45,310
to 3 can happen.
836
00:45:45,310 --> 00:45:49,850
So we add these, and
we get the 6/20.
837
00:45:49,850 --> 00:45:53,180
What I just did, can we
translate it to a formula?
838
00:45:53,180 --> 00:45:55,510
What did I do?
839
00:45:55,510 --> 00:45:59,790
I fixed the particular value
of X. And I added up the
840
00:45:59,790 --> 00:46:05,100
values of the joint PMF over all
the possible values of Y.
841
00:46:05,100 --> 00:46:07,710
So that's how you do it.
842
00:46:07,710 --> 00:46:08,990
You take the joint.
843
00:46:08,990 --> 00:46:13,020
You take one slice of the joint,
keeping X fixed, and
844
00:46:13,020 --> 00:46:16,110
add up over the different
values of Y.
845
00:46:16,110 --> 00:46:18,930
The moral of this example is
that, if you know the joint
846
00:46:18,930 --> 00:46:22,590
PMF, then you can find the
individual PMFs of every
847
00:46:22,590 --> 00:46:24,310
individual random variable.
848
00:46:24,310 --> 00:46:25,980
And we have a name for these.
849
00:46:25,980 --> 00:46:28,840
We call them the
marginal PMFs.
850
00:46:28,840 --> 00:46:31,900
We have the joint that talks
about both together, and the
851
00:46:31,900 --> 00:46:35,170
marginal that talks about
them one at a time.
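The marginalization just described can be sketched in Python. The joint table below uses assumed illustrative values, not the exact numbers on the slide, except that the x = 3 column sums to 6/20 as in the lecture.

```python
from fractions import Fraction as F

# Illustrative joint PMF p_{X,Y}(x, y); values assumed, chosen so the
# x = 3 column sums to 6/20 as in the lecture, and the total is 1.
joint = {
    (1, 1): F(4, 20), (2, 1): F(2, 20), (3, 1): F(2, 20),
    (2, 2): F(1, 20), (3, 2): F(3, 20), (4, 2): F(1, 20),
    (1, 3): F(3, 20), (2, 3): F(2, 20), (3, 3): F(1, 20), (4, 3): F(1, 20),
}

# Sanity checks: entries are non-negative and sum to 1.
assert all(p >= 0 for p in joint.values())
assert sum(joint.values()) == 1

def marginal_x(joint, x):
    # Fix x, and add the joint PMF over all possible values of y.
    return sum(p for (xi, _), p in joint.items() if xi == x)

print(marginal_x(joint, 3))  # 6/20, i.e. 3/10
```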
852
00:46:35,170 --> 00:46:38,590
And finally, since we love
conditional probabilities, we
853
00:46:38,590 --> 00:46:41,150
will certainly want to define
an object called the
854
00:46:41,150 --> 00:46:44,160
conditional PMF.
855
00:46:44,160 --> 00:46:46,940
So this quantity here
is a familiar one.
856
00:46:46,940 --> 00:46:49,310
It's just a conditional
probability.
857
00:46:49,310 --> 00:46:54,940
It's the probability that X
takes on a particular value,
858
00:46:54,940 --> 00:46:58,210
given that Y takes
a certain value.
859
00:46:58,210 --> 00:47:02,060
860
00:47:02,060 --> 00:47:07,160
For our example, let's take
little y to be equal to 2,
861
00:47:07,160 --> 00:47:10,890
which means that we're
conditioning to live inside
862
00:47:10,890 --> 00:47:12,490
this universe.
863
00:47:12,490 --> 00:47:17,920
This red universe here is the
y equal to 2 universe.
864
00:47:17,920 --> 00:47:20,860
And these are the conditional
probabilities of the different
865
00:47:20,860 --> 00:47:22,935
X's inside that universe.
866
00:47:22,935 --> 00:47:26,020
867
00:47:26,020 --> 00:47:29,860
OK, once more, just an
exercise in notation.
868
00:47:29,860 --> 00:47:34,850
This is the chapter two version
of the notation of
869
00:47:34,850 --> 00:47:37,750
what we were denoting this
way in chapter one.
870
00:47:37,750 --> 00:47:43,000
The way to read this is that
it's a conditional PMF having
871
00:47:43,000 --> 00:47:46,450
to do with two random variables,
the PMF of X
872
00:47:46,450 --> 00:47:51,150
conditioned on information
about Y. We are fixing a
873
00:47:51,150 --> 00:47:54,830
particular value of capital Y,
that's the value on which we
874
00:47:54,830 --> 00:47:56,610
are conditioning.
875
00:47:56,610 --> 00:47:58,340
And we're looking at the
probabilities of
876
00:47:58,340 --> 00:48:00,190
the different X's.
877
00:48:00,190 --> 00:48:03,890
So it's really a function
of two arguments, little
878
00:48:03,890 --> 00:48:05,340
x and little y.
879
00:48:05,340 --> 00:48:10,490
But the best way to think about
it is to fix little y
880
00:48:10,490 --> 00:48:15,030
and think of it as a function
of X. So I'm fixing little y
881
00:48:15,030 --> 00:48:17,610
here, let's say, to
y equal to 2.
882
00:48:17,610 --> 00:48:20,400
So I'm considering only this.
883
00:48:20,400 --> 00:48:24,090
And now, this quantity becomes
a function of little x.
884
00:48:24,090 --> 00:48:27,340
For the different little x's,
we're going to have different
885
00:48:27,340 --> 00:48:29,290
conditional probabilities.
886
00:48:29,290 --> 00:48:31,040
What are those conditional
probabilities?
887
00:48:31,040 --> 00:48:36,230
888
00:48:36,230 --> 00:48:40,760
OK, conditional probabilities
are proportional to original
889
00:48:40,760 --> 00:48:41,940
probabilities.
890
00:48:41,940 --> 00:48:45,200
So it's going to be those
numbers, but scaled up.
891
00:48:45,200 --> 00:48:48,340
And they need to be scaled
so that they add up to 1.
892
00:48:48,340 --> 00:48:50,280
So we have 1, 3 and 1.
893
00:48:50,280 --> 00:48:51,970
That's a total of 5.
894
00:48:51,970 --> 00:48:56,220
So the conditional PMF would
have the shape zero,
895
00:48:56,220 --> 00:49:02,480
1/5, 3/5, and 1/5.
896
00:49:02,480 --> 00:49:07,540
This is the conditional PMF,
given a particular value of Y.
897
00:49:07,540 --> 00:49:13,180
It has the same shape as those
numbers, where by shape, I
898
00:49:13,180 --> 00:49:15,850
mean try to visualize
a bar graph.
899
00:49:15,850 --> 00:49:19,370
The bar graph associated with
those numbers has exactly the
900
00:49:19,370 --> 00:49:23,630
same shape as the bar graph
associated with those numbers.
901
00:49:23,630 --> 00:49:26,790
The only thing that has changed
is the scaling.
902
00:49:26,790 --> 00:49:29,630
Big moral, let me say in
different words, the
903
00:49:29,630 --> 00:49:34,250
conditional PMF, given a
particular value of Y, is just
904
00:49:34,250 --> 00:49:39,790
a slice of the joint PMF where
you maintain the same shape,
905
00:49:39,790 --> 00:49:44,320
but you rescale the numbers
so that they add up to 1.
906
00:49:44,320 --> 00:49:48,410
Now mathematically, of course,
what all of this is doing is
907
00:49:48,410 --> 00:49:54,750
it's taking the original joint
PMF and rescaling it by a
908
00:49:54,750 --> 00:49:56,540
certain factor.
909
00:49:56,540 --> 00:50:00,340
This does not involve X, so the
shape, as a function of X,
910
00:50:00,340 --> 00:50:01,720
has not changed.
911
00:50:01,720 --> 00:50:04,910
We're keeping the same shape
as a function of X, but we
912
00:50:04,910 --> 00:50:06,420
divide by a certain number.
913
00:50:06,420 --> 00:50:09,670
And that's the number that we
need, so that the conditional
914
00:50:09,670 --> 00:50:12,810
probabilities add up to 1.
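The slice-and-rescale recipe can be sketched as follows; the y = 2 slice uses the lecture's weights 1, 3, 1 in twentieths, while the particular x values 2, 3, 4 are assumed for illustration.

```python
from fractions import Fraction as F

# The y = 2 slice of an assumed joint PMF: weights 1, 3, 1 in twentieths.
slice_y2 = {2: F(1, 20), 3: F(3, 20), 4: F(1, 20)}

# Dividing by P(Y = 2) = 5/20 keeps the same shape as a function of x,
# but rescales the numbers so that they add up to 1.
p_y2 = sum(slice_y2.values())
conditional = {x: p / p_y2 for x, p in slice_y2.items()}

assert sum(conditional.values()) == 1
print(conditional[3])  # 3/5
```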
915
00:50:12,810 --> 00:50:15,850
Now where does this
formula come from?
916
00:50:15,850 --> 00:50:17,880
Well, this is just the
definition of conditional
917
00:50:17,880 --> 00:50:19,000
probabilities.
918
00:50:19,000 --> 00:50:21,870
Probability of something
conditioned on something else
919
00:50:21,870 --> 00:50:24,620
is the probability of both
things happening, the
920
00:50:24,620 --> 00:50:28,040
intersection of the two divided
by the probability of
921
00:50:28,040 --> 00:50:29,890
the conditioning event.
922
00:50:29,890 --> 00:50:33,100
And last remark is that, as
I just said, conditional
923
00:50:33,100 --> 00:50:35,930
probabilities are nothing
different than ordinary
924
00:50:35,930 --> 00:50:37,210
probabilities.
925
00:50:37,210 --> 00:50:42,390
So a conditional PMF must sum
to 1, no matter what you are
926
00:50:42,390 --> 00:50:44,360
conditioning on.
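The definition and the normalization property can be checked together in a short sketch; the joint table values are assumed for illustration, normalized so all entries sum to 1.

```python
from fractions import Fraction as F

# Illustrative joint PMF (assumed values, not the slide's exact table).
joint = {
    (1, 1): F(4, 20), (2, 1): F(2, 20), (3, 1): F(2, 20),
    (2, 2): F(1, 20), (3, 2): F(3, 20), (4, 2): F(1, 20),
    (1, 3): F(3, 20), (2, 3): F(2, 20), (3, 3): F(1, 20), (4, 3): F(1, 20),
}

def p_y(y):
    # Marginal of Y: sum the joint over all x.
    return sum(p for (_, yi), p in joint.items() if yi == y)

def p_x_given_y(x, y):
    # Definition of the conditional PMF:
    # p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y).
    return joint.get((x, y), F(0)) / p_y(y)

# A conditional PMF must sum to 1, no matter what we condition on.
xs = {xi for (xi, _) in joint}
for y in {yi for (_, yi) in joint}:
    assert sum(p_x_given_y(x, y) for x in xs) == 1
```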
927
00:50:44,360 --> 00:50:47,370
All right, so this was sort of
a quick introduction to our
928
00:50:47,370 --> 00:50:48,920
new notation.
929
00:50:48,920 --> 00:50:53,030
But you'll get a lot of practice
in the days to come.