1
00:00:00,499 --> 00:00:02,900
PROFESSOR: We ask about
averages all the time.
2
00:00:02,900 --> 00:00:05,280
And in the context
of random variables,
3
00:00:05,280 --> 00:00:07,870
averages get abstracted
into a lovely concept
4
00:00:07,870 --> 00:00:11,950
called the expectation
of the random variable.
5
00:00:11,950 --> 00:00:14,170
Let's begin with a
motivating example
6
00:00:14,170 --> 00:00:18,240
which, as is often the case,
will come from gambling.
7
00:00:18,240 --> 00:00:20,830
So there's a game
that's actually
8
00:00:20,830 --> 00:00:24,910
played in casinos called
Carnival Dice where you have
9
00:00:24,910 --> 00:00:28,760
three dice, and the
way you play is you
10
00:00:28,760 --> 00:00:30,970
pick your favorite
number from 1 to 6,
11
00:00:30,970 --> 00:00:32,720
whatever it happens to be.
12
00:00:32,720 --> 00:00:34,334
And then you roll
the three dice.
13
00:00:34,334 --> 00:00:36,500
The dice are assumed to be
fair, so each one of them
14
00:00:36,500 --> 00:00:38,670
has a one in six
chance of coming up
15
00:00:38,670 --> 00:00:40,140
with any given number.
16
00:00:40,140 --> 00:00:43,910
And then the payoff
goes as follows.
17
00:00:43,910 --> 00:00:48,630
For every match of your
favorite number, you get $1.00.
18
00:00:48,630 --> 00:00:51,870
And if none of your favorite--
if none of the dice show
19
00:00:51,870 --> 00:00:56,840
your favorite number,
then you lose $1.00.
20
00:00:56,840 --> 00:00:57,340
OK.
21
00:00:57,340 --> 00:00:58,180
Let's do an example.
22
00:00:58,180 --> 00:00:59,763
Suppose your favorite
number was five.
23
00:00:59,763 --> 00:01:02,440
You announce that to the
house, or the dealer,
24
00:01:02,440 --> 00:01:04,590
and then the dice start rolling.
25
00:01:04,590 --> 00:01:07,644
Now if your roll happened to
come up with the numbers two,
26
00:01:07,644 --> 00:01:09,560
three, and four, well,
there's no fives there,
27
00:01:09,560 --> 00:01:11,300
so you've lost $1.00.
28
00:01:11,300 --> 00:01:14,422
On the other hand, if your
rolls came out five, four, six,
29
00:01:14,422 --> 00:01:15,880
there's one five,
you've won $1.00.
30
00:01:15,880 --> 00:01:18,200
If it came out five, four,
five, there's two fives,
31
00:01:18,200 --> 00:01:19,000
you've won $2.00.
32
00:01:19,000 --> 00:01:22,080
And if it was all fives,
you've actually won $3.00.
33
00:01:22,080 --> 00:01:25,460
Now real carnival dice is often
played where you either win
34
00:01:25,460 --> 00:01:27,670
or lose $1.00 depending on
whether there's any match
35
00:01:27,670 --> 00:01:30,380
at all, but we're playing
a more generous game where,
36
00:01:30,380 --> 00:01:32,120
if you double match,
you get $2.00.
37
00:01:32,120 --> 00:01:34,410
If you triple match,
you get $3.00.
38
00:01:34,410 --> 00:01:38,290
So the basic question about
this is, is this a fair game.
39
00:01:38,290 --> 00:01:41,280
Is this worth playing, and
how can we think about that?
40
00:01:41,280 --> 00:01:43,960
Well, we're going to think
about it probabilistically.
41
00:01:43,960 --> 00:01:46,330
So let's think about
the probability
42
00:01:46,330 --> 00:01:48,955
of rolling no fives.
43
00:01:48,955 --> 00:01:51,040
If five is my favorite
number, what's
44
00:01:51,040 --> 00:01:52,910
the probability that
I roll none of them?
45
00:01:52,910 --> 00:01:55,370
Well, there's a five
out of six chance
46
00:01:55,370 --> 00:01:58,070
that I don't roll a
five on the first die,
47
00:01:58,070 --> 00:02:00,020
and on the second die
and on the third die.
48
00:02:00,020 --> 00:02:02,390
And since the die rolls are
assumed to be independent,
49
00:02:02,390 --> 00:02:05,890
the dice are independent,
the probability of no fives
50
00:02:05,890 --> 00:02:11,664
is 5/6 to the third, which
comes out to be 125/216.
51
00:02:11,664 --> 00:02:13,080
I'm writing this
out because we're
52
00:02:13,080 --> 00:02:15,240
going to put all
the numbers over 216
53
00:02:15,240 --> 00:02:16,680
to make them easier to compare.
54
00:02:16,680 --> 00:02:17,450
OK.
55
00:02:17,450 --> 00:02:19,880
What's the probability
of one five?
56
00:02:19,880 --> 00:02:23,320
Well, the probability
of any single sequence
57
00:02:23,320 --> 00:02:30,066
of die rolls with a single five
is 5/6 of no five times 5/6
58
00:02:30,066 --> 00:02:32,850
of no five times
1/6 of one five.
59
00:02:32,850 --> 00:02:37,900
And there are 3 choose 1
possible sequences of dice
60
00:02:37,900 --> 00:02:42,800
rolls with one five, and
the others non-fives.
61
00:02:42,800 --> 00:02:44,520
Likewise, for two
fives, there's 3
62
00:02:44,520 --> 00:02:49,130
choose 2 times 5/6
to the 1, which
63
00:02:49,130 --> 00:02:57,550
is one way of choosing the
place that does not have a five.
64
00:02:57,550 --> 00:03:01,630
And 1/6 times 1/6, which is
the probability of getting
65
00:03:01,630 --> 00:03:03,214
fives in the other places.
66
00:03:03,214 --> 00:03:05,380
I didn't say that well, but
you can get it straight.
67
00:03:05,380 --> 00:03:05,950
OK.
68
00:03:05,950 --> 00:03:10,490
The probability of three fives
is the probability of 1/6
69
00:03:10,490 --> 00:03:12,415
of getting a five on
the first die, 1/6
70
00:03:12,415 --> 00:03:14,540
of getting a five on the
second die, 1/6 of getting
71
00:03:14,540 --> 00:03:15,725
a five on the third die.
72
00:03:15,725 --> 00:03:17,690
It's simply 1/6 cubed.
73
00:03:17,690 --> 00:03:20,610
OK, so we can easily
calculate these probabilities.
74
00:03:20,610 --> 00:03:22,530
This is a familiar exercise.
75
00:03:22,530 --> 00:03:23,830
Let's put them in a chart.
76
00:03:23,830 --> 00:03:27,040
So what we've figured
out is that 0 matches has
77
00:03:27,040 --> 00:03:29,920
a probability of 125 over 216.
78
00:03:29,920 --> 00:03:33,360
And in that case, I lose $1.00.
79
00:03:33,360 --> 00:03:36,770
One match turns out to have a
probability of 75 out of 216,
80
00:03:36,770 --> 00:03:38,846
and I win $1.00.
81
00:03:38,846 --> 00:03:42,290
Two matches is 15 out
of 216, I win $2.00.
82
00:03:42,290 --> 00:03:46,340
And three matches, there's
one chance in 216 that I win
83
00:03:46,340 --> 00:03:48,520
the $3.00.
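The chart of probabilities can be reproduced in a few lines. This is a minimal sketch, assuming three fair, independent dice as in the lecture; the function name `match_pmf` is made up for illustration.

```python
from fractions import Fraction
from math import comb

def match_pmf(n_dice=3, p=Fraction(1, 6)):
    """Probability of exactly k matches among n_dice fair, independent dice."""
    return {k: comb(n_dice, k) * p**k * (1 - p)**(n_dice - k)
            for k in range(n_dice + 1)}

pmf = match_pmf()
for k, prob in pmf.items():
    # Rescale to the common denominator 216 used in the chart.
    print(k, "matches:", prob * 216, "out of 216")
```

Exact fractions are used instead of floats so the results can be compared directly with the numbers over 216.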
84
00:03:48,520 --> 00:03:53,130
So now I can ask about
what do I expect to win.
85
00:03:53,130 --> 00:03:55,940
Suppose I play 216
games, and the games
86
00:03:55,940 --> 00:03:58,670
split exactly according
to these probabilities.
87
00:03:58,670 --> 00:04:01,810
Then what I would expect
is that I would wind up
88
00:04:01,810 --> 00:04:05,470
with 0 matches about 125 times.
89
00:04:05,470 --> 00:04:08,000
That was the probability
of there being no matches.
90
00:04:08,000 --> 00:04:09,860
It was 125/216.
91
00:04:09,860 --> 00:04:13,604
So if I played 216
games, I expect about 125
92
00:04:13,604 --> 00:04:15,270
are going to-- I'm
going to win nothing.
93
00:04:15,270 --> 00:04:17,894
Or, I'm going to get no matches,
which actually means I'll lose
94
00:04:17,894 --> 00:04:19,240
$1.00 on each.
95
00:04:19,240 --> 00:04:21,820
One match I expect
about 75 times.
96
00:04:21,820 --> 00:04:23,120
2 matches, 15 times.
97
00:04:23,120 --> 00:04:25,190
3 matches, once.
98
00:04:25,190 --> 00:04:33,245
So my average win is going to be
125 times minus 1, 75 times 1,
99
00:04:33,245 --> 00:04:38,700
15 times 2 plus 1
times 3 divided by 216.
100
00:04:38,700 --> 00:04:43,460
So these numbers on the top were
how the 216 rolls split among
101
00:04:43,460 --> 00:04:46,610
my choices of losing $1.00,
winning $1.00, winning $2.00,
102
00:04:46,610 --> 00:04:47,660
and winning $3.00.
103
00:04:47,660 --> 00:04:49,540
And it comes out to
be slightly negative.
104
00:04:49,540 --> 00:04:55,100
It's actually minus $0.08--
minus 17/216 of $1.00,
105
00:04:55,100 --> 00:04:58,000
which is about minus $0.08.
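The same arithmetic can be checked mechanically. This is a small sketch that takes the probabilities and payoffs straight from the chart; the variable names are illustrative only.

```python
from fractions import Fraction

# Probability of each number of matches, as listed in the chart.
prob = {0: Fraction(125, 216), 1: Fraction(75, 216),
        2: Fraction(15, 216), 3: Fraction(1, 216)}
# Payoff in dollars for each number of matches.
payoff = {0: -1, 1: 1, 2: 2, 3: 3}

# Average win: each payoff weighted by how often it occurs.
expected_win = sum(payoff[k] * prob[k] for k in prob)
print(expected_win)          # -17/216
print(float(expected_win))   # roughly -0.0787
```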
106
00:04:58,000 --> 00:05:00,880
So I'm losing, on the
average, $0.08 per roll.
107
00:05:00,880 --> 00:05:02,800
This is not a fair game.
108
00:05:02,800 --> 00:05:04,800
It's really biased against me.
109
00:05:04,800 --> 00:05:06,260
And if I keep
playing long enough,
110
00:05:06,260 --> 00:05:07,890
I'm going to find
that I average out
111
00:05:07,890 --> 00:05:13,100
a kind of steady loss
of about $0.08 a play.
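The steady loss shows up in a quick simulation. This is only a sketch: the helper names `play_once` and `average_win`, the seed, and the number of plays are arbitrary choices for illustration.

```python
import random

def play_once(rng, favorite=5):
    """Roll three fair dice; win $1 per match, lose $1 if there is no match."""
    rolls = [rng.randint(1, 6) for _ in range(3)]
    matches = rolls.count(favorite)
    return matches if matches > 0 else -1

def average_win(n_plays, seed=0):
    """Average payoff per play over n_plays simulated games."""
    rng = random.Random(seed)
    return sum(play_once(rng) for _ in range(n_plays)) / n_plays

print(average_win(100_000))   # should land near -17/216, about -0.08
```

Any single play still pays -1, 1, 2, or 3 dollars; only the long-run average settles near minus eight cents.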
112
00:05:13,100 --> 00:05:15,700
So we would summarize
this by saying
113
00:05:15,700 --> 00:05:18,760
that you expect to
lose $0.08, meaning
114
00:05:18,760 --> 00:05:21,374
that your average loss is $0.08
and you expect that that's
115
00:05:21,374 --> 00:05:23,040
going to be the
phenomenon that comes up
116
00:05:23,040 --> 00:05:25,761
if you keep playing the game
repeatedly and repeatedly.
117
00:05:25,761 --> 00:05:27,260
It's important to
notice, of course,
118
00:05:27,260 --> 00:05:30,940
you never actually lose
$0.08 on any single play.
119
00:05:30,940 --> 00:05:34,040
So what you-- this notion of
your expecting to lose $0.08,
120
00:05:34,040 --> 00:05:35,570
it never happens.
121
00:05:35,570 --> 00:05:37,660
It's just your average loss.
122
00:05:37,660 --> 00:05:40,310
Notice every single play
you're either going to lose $1,
123
00:05:40,310 --> 00:05:42,050
win $1, win $2, win $3.
124
00:05:42,050 --> 00:05:44,770
There's no $0.08
at all showing up.
125
00:05:44,770 --> 00:05:46,280
OK.
126
00:05:46,280 --> 00:05:49,470
So now let's abstract
the expected value
127
00:05:49,470 --> 00:05:52,400
of a random variable
R. So a random variable
128
00:05:52,400 --> 00:05:54,440
is this thing that
probabilistically
129
00:05:54,440 --> 00:05:57,500
takes on different values
with different probabilities.
130
00:05:57,500 --> 00:05:59,810
And its expected
value is defined
131
00:05:59,810 --> 00:06:03,160
to be its average value where
the different values are
132
00:06:03,160 --> 00:06:05,780
weighted by their probabilities.
133
00:06:05,780 --> 00:06:08,090
We can write this out
as a precise formula.
134
00:06:08,090 --> 00:06:11,170
The expectation of
a random variable R
135
00:06:11,170 --> 00:06:16,426
is defined to be the sum over
all its possible values-- it
136
00:06:16,426 --> 00:06:18,050
doesn't indicate what
the summation is,
137
00:06:18,050 --> 00:06:22,140
but that's over all
possible values v-- of v
138
00:06:22,140 --> 00:06:24,890
times the probability
that v comes up,
139
00:06:24,890 --> 00:06:26,530
the probability
that R equals v. So
140
00:06:26,530 --> 00:06:29,200
this is the basic
definition of the expected
141
00:06:29,200 --> 00:06:31,520
value of a random variable.
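Written out, the definition reads as follows. The notation is assumed, since the slide itself is not part of the transcript: $\mathrm{E}[R]$ for the expectation and $\operatorname{range}(R)$ for the set of values that $R$ can take.

```latex
\mathrm{E}[R] \;=\; \sum_{v \,\in\, \operatorname{range}(R)} v \cdot \Pr[R = v]
```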
142
00:06:31,520 --> 00:06:35,140
Now let me mention here
that this sum works
143
00:06:35,140 --> 00:06:39,680
because since we're assuming
a countable sample space,
144
00:06:39,680 --> 00:06:43,080
R is defined on only
countably many outcomes,
145
00:06:43,080 --> 00:06:46,060
which means it can only
take countably many values.
146
00:06:46,060 --> 00:06:50,030
So this is a countable sum
over all the possible values
147
00:06:50,030 --> 00:06:56,090
that R takes, because there are
only countably many of them.
148
00:06:56,090 --> 00:06:58,400
And what we've just
concluded, then,
149
00:06:58,400 --> 00:07:02,240
is the expected win in
the carnival dice game
150
00:07:02,240 --> 00:07:06,010
is minus 17/216.
151
00:07:06,010 --> 00:07:08,430
Check this formal definition
of the expectation
152
00:07:08,430 --> 00:07:12,450
of a random variable versus
the random variable defined
153
00:07:12,450 --> 00:07:17,480
to be how much you win on a
given play of carnival dice,
154
00:07:17,480 --> 00:07:19,890
and it comes out
to be that average.
155
00:07:19,890 --> 00:07:23,799
Minus 17/216.
156
00:07:23,799 --> 00:07:25,340
Now there's a
technical result that's
157
00:07:25,340 --> 00:07:29,990
useful in some proofs that
says that there's another way
158
00:07:29,990 --> 00:07:31,290
to get the expectation.
159
00:07:31,290 --> 00:07:33,320
The expectation can
also be expressed
160
00:07:33,320 --> 00:07:36,810
by saying it's the sum over
all the possible outcomes
161
00:07:36,810 --> 00:07:38,710
in the sample space--
S is the sample
162
00:07:38,710 --> 00:07:44,060
space-- of the value of the
random variable at that outcome
163
00:07:44,060 --> 00:07:47,500
times the probability
of that outcome.
164
00:07:47,500 --> 00:07:51,610
So this is an
alternative definition,
165
00:07:51,610 --> 00:07:56,990
compared to saying
it's the sum over all
166
00:07:56,990 --> 00:08:00,160
the values times the
probability of that value.
167
00:08:00,160 --> 00:08:02,620
Here, it's the sum
over all the outcomes
168
00:08:02,620 --> 00:08:04,789
of the value of the
random variable,
169
00:08:04,789 --> 00:08:06,830
at the outcome, times the
probability of the outcome.
170
00:08:06,830 --> 00:08:10,060
It's not entirely obvious
that those two definitions
171
00:08:10,060 --> 00:08:10,820
are equivalent.
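Before proving the equivalence, it can be checked on the carnival dice game, where the sample space is the 216 equally likely rolls. This is a sketch for illustration; the names `win`, `by_value`, and `by_outcome` are made up here.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

FAVORITE = 5
outcomes = list(product(range(1, 7), repeat=3))   # 216 equally likely rolls
pr_outcome = Fraction(1, len(outcomes))

def win(outcome):
    """The random variable R: dollars won on a given roll of three dice."""
    matches = outcome.count(FAVORITE)
    return matches if matches > 0 else -1

# Sum over outcomes of R(omega) * Pr[omega].
by_outcome = sum(win(w) * pr_outcome for w in outcomes)

# Sum over values v of v * Pr[R = v], with Pr[R = v] built from the outcomes.
pr_value = defaultdict(Fraction)
for w in outcomes:
    pr_value[win(w)] += pr_outcome
by_value = sum(v * p for v, p in pr_value.items())

print(by_value, by_outcome)   # both -17/216
```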
172
00:08:10,820 --> 00:08:12,361
This form of the
definition turns out
173
00:08:12,361 --> 00:08:14,190
to be technically
helpful in some proofs,
174
00:08:14,190 --> 00:08:17,490
although outside of
proofs you don't use it
175
00:08:17,490 --> 00:08:18,670
so much in applications.
176
00:08:18,670 --> 00:08:21,550
But it's not a bad exercise
to prove this equivalence.
177
00:08:21,550 --> 00:08:23,050
So I'm going to
walk you through it.
178
00:08:23,050 --> 00:08:26,560
But if it's boring-- it's kind
of a boring series of equations
179
00:08:26,560 --> 00:08:29,180
on slides, and you're
welcome to skip past it.
180
00:08:29,180 --> 00:08:32,700
It is a derivation that I expect
you to be able to carry out.
181
00:08:32,700 --> 00:08:34,730
So let's just carry
out this derivation.
182
00:08:34,730 --> 00:08:36,900
I'm going to prove
that the expectation is
183
00:08:36,900 --> 00:08:40,282
equal to the sum over all
the outcomes of the value
184
00:08:40,282 --> 00:08:42,740
of the random variable at the
outcome times the probability
185
00:08:42,740 --> 00:08:44,080
of the outcome.
186
00:08:44,080 --> 00:08:46,227
And let's prove it.
187
00:08:46,227 --> 00:08:48,560
In order to prove it, let's
begin with one little remark
188
00:08:48,560 --> 00:08:49,870
that's useful.
189
00:08:49,870 --> 00:08:52,990
Remember that this
notation R equals
190
00:08:52,990 --> 00:08:56,550
v describes the event that the
random variable takes the value
191
00:08:56,550 --> 00:09:00,730
v, which, by the definition of an
event, is the set of outcomes
192
00:09:00,730 --> 00:09:02,310
where this property holds.
193
00:09:02,310 --> 00:09:06,490
So it's the set of outcomes
omega where R of omega
194
00:09:06,490 --> 00:09:11,090
is equal to v. So let's just
remember that, that brackets
195
00:09:11,090 --> 00:09:13,540
R equals v is the
event that R is
196
00:09:13,540 --> 00:09:17,090
equal to v, meaning the set
of outcomes where that's true.
197
00:09:17,090 --> 00:09:18,670
So what that tells
us in particular
198
00:09:18,670 --> 00:09:20,510
is that the
probability of R equals
199
00:09:20,510 --> 00:09:23,720
v is, by definition, the
sum of the probabilities
200
00:09:23,720 --> 00:09:26,680
of the outcomes in the event.
201
00:09:26,680 --> 00:09:29,485
So it's the sum over
all those outcomes.
202
00:09:32,110 --> 00:09:33,990
Now let's go back to
the original definition
203
00:09:33,990 --> 00:09:36,860
of the expectation of R.
The original definition is--
204
00:09:36,860 --> 00:09:41,240
and the standard one is-- it's
the sum over all the values
205
00:09:41,240 --> 00:09:43,740
of the value times
the probability
206
00:09:43,740 --> 00:09:46,550
that the random variable
is equal to the value.
207
00:09:46,550 --> 00:09:48,200
Now on the previous
slide, we just
208
00:09:48,200 --> 00:09:50,640
had a formula for
the probability
209
00:09:50,640 --> 00:09:53,370
that R is equal
to v. It's simply
210
00:09:53,370 --> 00:09:57,990
the sum over all the outcomes
of where R is equal to v,
211
00:09:57,990 --> 00:10:00,270
of the probability
of that outcome.
212
00:10:00,270 --> 00:10:03,100
So I can replace
that term by the sum
213
00:10:03,100 --> 00:10:06,970
over all the outcomes of the
probability of the outcome.
214
00:10:06,970 --> 00:10:07,576
OK.
215
00:10:07,576 --> 00:10:09,700
So I'm trying to head
towards an expression that's
216
00:10:09,700 --> 00:10:12,980
only outcomes, which is kind
of the top-level strategy here.
217
00:10:12,980 --> 00:10:14,680
So the first thing
I did was I got rid
218
00:10:14,680 --> 00:10:17,600
of that probability
of v and replaced it
219
00:10:17,600 --> 00:10:19,704
by the sum of all
these probabilities--
220
00:10:19,704 --> 00:10:21,370
of the probabilities
of all the outcomes
221
00:10:21,370 --> 00:10:25,935
where R is v. Well, next step
is I'm going to just distribute
222
00:10:25,935 --> 00:10:28,020
the v over the inner sum.
223
00:10:28,020 --> 00:10:32,350
And I get that this thing
is equal to the sum,
224
00:10:32,350 --> 00:10:36,470
again, over all those outcomes
in R equals v of v times
225
00:10:36,470 --> 00:10:38,740
the probability of the outcome.
226
00:10:38,740 --> 00:10:43,550
But look, these outcomes are the
outcomes where R is equal to v.
227
00:10:43,550 --> 00:10:50,170
So I could replace
that v by R of omega.
228
00:10:50,170 --> 00:10:52,330
That one slipped
sideways a little bit,
229
00:10:52,330 --> 00:10:53,240
so let's watch that.
230
00:10:53,240 --> 00:10:58,730
This v is simply going
to become an R of omega.
231
00:10:58,730 --> 00:11:01,340
I'm still summing over
the same set of omegas,
232
00:11:01,340 --> 00:11:04,770
but now I've gotten rid
of pretty much everything
233
00:11:04,770 --> 00:11:05,730
but the omegas.
234
00:11:05,730 --> 00:11:10,530
So I've got this inner sum
over all possible omegas in R
235
00:11:10,530 --> 00:11:14,320
equals v of R of omega times
the probability of omega.
236
00:11:14,320 --> 00:11:16,670
And I'm summing
over all possible v.
237
00:11:16,670 --> 00:11:19,000
But if I'm summing over
all possible v and then
238
00:11:19,000 --> 00:11:21,525
all possible outcomes
where R is equal to v,
239
00:11:21,525 --> 00:11:25,540
I wind up summing over
all possible outcomes.
240
00:11:25,540 --> 00:11:29,250
And so I've finished the proof
that the expectation of R
241
00:11:29,250 --> 00:11:33,770
is equal to the sum over all
the outcomes of R of omega times
242
00:11:33,770 --> 00:11:37,120
the probability of omega.
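For reference, the whole chain of equalities can be written in one place. The notation is assumed, since the slides are not part of the transcript: $[R = v]$ is the event, meaning the set of outcomes $\omega$ with $R(\omega) = v$, and $S$ is the sample space.

```latex
\begin{align*}
\mathrm{E}[R]
  &= \sum_{v} v \cdot \Pr[R = v]
     && \text{(definition)} \\
  &= \sum_{v} v \cdot \sum_{\omega \in [R = v]} \Pr[\omega]
     && \text{(probability of an event)} \\
  &= \sum_{v} \sum_{\omega \in [R = v]} v \cdot \Pr[\omega]
     && \text{(distribute $v$ over the inner sum)} \\
  &= \sum_{v} \sum_{\omega \in [R = v]} R(\omega) \cdot \Pr[\omega]
     && \text{(since $R(\omega) = v$ on $[R = v]$)} \\
  &= \sum_{\omega \in S} R(\omega) \cdot \Pr[\omega]
     && \text{(the events $[R = v]$ partition $S$)}
\end{align*}
```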
243
00:11:37,120 --> 00:11:39,140
Now I'd never do a proof
like this in a lecture,
244
00:11:39,140 --> 00:11:41,800
because I think watching
a lecturer write stuff
245
00:11:41,800 --> 00:11:43,500
on the board, a whole
bunch of symbols
246
00:11:43,500 --> 00:11:46,270
and manipulating equations,
is really insipid and boring.
247
00:11:46,270 --> 00:11:48,160
Most people can't
follow it anyway.
248
00:11:48,160 --> 00:11:50,530
I'm hoping that in the
video, where you can go back
249
00:11:50,530 --> 00:11:52,950
if you wish and replay it
and watch it more slowly,
250
00:11:52,950 --> 00:11:55,010
or at your own
speed, the derivation
251
00:11:55,010 --> 00:11:56,870
will be of some value to you.
252
00:11:56,870 --> 00:12:00,070
But let's step back a
little bit and notice
253
00:12:00,070 --> 00:12:02,910
some top-level technical
things that we never
254
00:12:02,910 --> 00:12:05,820
really paid attention to
in the process of doing
255
00:12:05,820 --> 00:12:07,490
this manipulative proof.
256
00:12:07,490 --> 00:12:09,860
So the top-level
observation, first of all,
257
00:12:09,860 --> 00:12:13,670
is that this proof, like many
proofs in basic foundations
258
00:12:13,670 --> 00:12:16,340
of probability theory
and random variables,
259
00:12:16,340 --> 00:12:19,120
in particular, involves
taking sums and rearranging
260
00:12:19,120 --> 00:12:21,440
the terms in the sums a lot.
261
00:12:21,440 --> 00:12:24,170
So the first question
is, why sums?
262
00:12:24,170 --> 00:12:26,640
Remember here we
were summing over all
263
00:12:26,640 --> 00:12:30,520
the possible variables,
all the possible values
264
00:12:30,520 --> 00:12:31,810
of some random variable.
265
00:12:31,810 --> 00:12:33,220
Why is that a sum?
266
00:12:33,220 --> 00:12:38,030
Well it's a sum because we
were assuming that the sample
267
00:12:38,030 --> 00:12:39,520
space was countable.
268
00:12:39,520 --> 00:12:41,640
There were only a
countable number
269
00:12:41,640 --> 00:12:45,430
of values R of omega 0, R
of omega 1, R of omega n,
270
00:12:45,430 --> 00:12:46,690
and so on.
271
00:12:46,690 --> 00:12:51,100
And so we can be
sure that the sum
272
00:12:51,100 --> 00:12:53,390
over all the possible values
of the random variable
273
00:12:53,390 --> 00:12:54,730
is a countable sum.
274
00:12:54,730 --> 00:12:58,620
It's a sum, and we don't have
to worry about integrals, which
275
00:12:58,620 --> 00:13:00,980
is the main technical
reason why we're
276
00:13:00,980 --> 00:13:03,790
doing discrete probability and
assuming that there are only
277
00:13:03,790 --> 00:13:06,130
a countable number of outcomes.
278
00:13:06,130 --> 00:13:08,790
There's a second very
important technicality
279
00:13:08,790 --> 00:13:10,360
that's worth mentioning.
280
00:13:10,360 --> 00:13:12,880
All the proofs involved
rearranging terms
281
00:13:12,880 --> 00:13:16,960
in sums freely and without care.
282
00:13:16,960 --> 00:13:19,780
But that means that
we're implicitly
283
00:13:19,780 --> 00:13:24,320
assuming that it's safe to do
that, and that, in particular,
284
00:13:24,320 --> 00:13:27,280
that the defining
sum for expectations
285
00:13:27,280 --> 00:13:29,490
needs to be
absolutely convergent.
286
00:13:29,490 --> 00:13:31,780
And all of these sums
need to be absolutely
287
00:13:31,780 --> 00:13:34,850
convergent in order for
that kind of rearrangement
288
00:13:34,850 --> 00:13:35,960
to make sense.
289
00:13:35,960 --> 00:13:37,910
So remember that
absolute convergence
290
00:13:37,910 --> 00:13:42,240
means that the sum of the
absolute values of all
291
00:13:42,240 --> 00:13:46,030
the terms in the sum converges.
292
00:13:46,030 --> 00:13:49,070
So if we look at this
definition of expectation,
293
00:13:49,070 --> 00:13:51,830
it said it was the sum over
all the values in the range.
294
00:13:51,830 --> 00:13:54,220
We know that's a
countable sum of the value
295
00:13:54,220 --> 00:13:57,350
times the probability that
R was equal to that value.
296
00:13:57,350 --> 00:14:00,730
But the very definition
never specified
297
00:14:00,730 --> 00:14:04,330
the order in which these terms,
v times probability R equals v,
298
00:14:04,330 --> 00:14:05,560
got added up.
299
00:14:05,560 --> 00:14:07,400
It better not make a difference.
300
00:14:07,400 --> 00:14:11,170
So we're implicitly assuming
absolute convergence
301
00:14:11,170 --> 00:14:14,300
of this sum in order
for the expectation
302
00:14:14,300 --> 00:14:15,720
to even be well-defined.
303
00:14:15,720 --> 00:14:17,730
As a matter of fact,
the terrible pathology
304
00:14:17,730 --> 00:14:19,355
that happens-- and
you may have learned
305
00:14:19,355 --> 00:14:20,820
about this in
first-term calculus,
306
00:14:20,820 --> 00:14:22,570
and we actually have
a problem in the text
307
00:14:22,570 --> 00:14:26,660
about it-- is that you can have
sums like this, that are not
308
00:14:26,660 --> 00:14:32,010
absolutely convergent, and then
you pick your favorite value
309
00:14:32,010 --> 00:14:34,520
and I can rearrange
the terms in the sum
310
00:14:34,520 --> 00:14:37,610
so that it converges
to that value.
311
00:14:37,610 --> 00:14:40,840
When you're dealing with
non-absolutely convergent sums,
312
00:14:40,840 --> 00:14:44,560
rearranging is a no-no.
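The pathology can be seen numerically. The sketch below uses the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ..., which converges to ln 2 but not absolutely. That series is an assumed example, not one taken from the lecture, and the function names are made up. The greedy rule adds unused positive terms while the running total is at or below the target, and unused negative terms otherwise.

```python
import math

def natural_order_sum(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... taken in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged_sum(target, n_terms):
    """Partial sum of the same terms, reordered greedily to chase `target`."""
    pos = 1   # next unused odd denominator: +1/1, +1/3, +1/5, ...
    neg = 2   # next unused even denominator: -1/2, -1/4, -1/6, ...
    total = 0.0
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
    return total

print(natural_order_sum(200_000), math.log(2))   # close to each other
print(rearranged_sum(2.0, 200_000))              # close to 2.0
print(rearranged_sum(-1.0, 200_000))             # close to -1.0
```

Every term of the series is used exactly once in each ordering, yet the totals differ, which is why the defining sum for an expectation has to be absolutely convergent.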
313
00:14:44,560 --> 00:14:47,210
The sum depends
crucially on the ordering
314
00:14:47,210 --> 00:14:49,590
in which the terms
appear, and all
315
00:14:49,590 --> 00:14:52,660
of the reasoning in probability
theory would be inapplicable.
316
00:14:52,660 --> 00:14:55,950
So we are implicitly assuming
that all of these sums
317
00:14:55,950 --> 00:14:57,545
are absolutely convergent.
318
00:15:01,060 --> 00:15:02,930
Just to get some
vocabulary in place,
319
00:15:02,930 --> 00:15:06,520
the expected value is also known
as the mean value, or the mean,
320
00:15:06,520 --> 00:15:11,380
or the expectation of
the random variable.
321
00:15:11,380 --> 00:15:14,680
Now let's connect up
expectations with averages
322
00:15:14,680 --> 00:15:16,020
in a more precise way.
323
00:15:16,020 --> 00:15:17,690
We said that the
expectation was kind
324
00:15:17,690 --> 00:15:20,240
of an abstraction of
averages, but it's more
325
00:15:20,240 --> 00:15:22,592
intimately connected to
averages than that, even.
326
00:15:22,592 --> 00:15:24,050
Let's take an
example where suppose
327
00:15:24,050 --> 00:15:27,820
you have a pile of graded exams,
and you pick one at random.
328
00:15:27,820 --> 00:15:32,330
Let's let S be the score of
the randomly picked exam.
329
00:15:32,330 --> 00:15:37,520
So I'm turning this process,
this random process of picking
330
00:15:37,520 --> 00:15:42,960
an exam from the pile, is
defining a random variable, S,
331
00:15:42,960 --> 00:15:45,380
where by definition of
picking one at random,
332
00:15:45,380 --> 00:15:46,450
I mean uniformly.
333
00:15:46,450 --> 00:15:49,610
So S is actually not a
uniform random variable,
334
00:15:49,610 --> 00:15:52,430
but I'm picking the exams
with equal probability.
335
00:15:52,430 --> 00:15:55,220
And then they have
different scores,
336
00:15:55,220 --> 00:15:58,460
so the outcomes are of
uniform probability.
337
00:15:58,460 --> 00:16:00,940
But S is not,
because there might
338
00:16:00,940 --> 00:16:03,650
be a lot of outcomes, a lot
of exams with the same score.
339
00:16:03,650 --> 00:16:04,370
All right.
340
00:16:04,370 --> 00:16:07,210
S is a random variable
defined by this process
341
00:16:07,210 --> 00:16:09,660
of picking a random exam.
342
00:16:09,660 --> 00:16:12,690
And then you can just check
that the expectation of S
343
00:16:12,690 --> 00:16:17,100
now exactly equals the
average exam score, which
344
00:16:17,100 --> 00:16:18,550
is the typical
thing that students
345
00:16:18,550 --> 00:16:21,891
want to know when the exam is
done, what's the average score.
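The claim is easy to check on a small pile. The scores below are made up for illustration; picking an exam uniformly makes the probability that S equals v the fraction of exams with score v.

```python
from collections import Counter
from fractions import Fraction

scores = [70, 85, 85, 90, 100]   # hypothetical pile of graded exams

# Distribution of S: note that it is not uniform, because 85 appears twice.
counts = Counter(scores)
pmf = {v: Fraction(c, len(scores)) for v, c in counts.items()}

expectation = sum(v * p for v, p in pmf.items())   # sum of v * Pr[S = v]
average = Fraction(sum(scores), len(scores))       # ordinary average score
print(expectation, average)                        # both 86
```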
346
00:16:21,891 --> 00:16:23,390
Actually, the average
score is often
347
00:16:23,390 --> 00:16:26,690
less informative than the
median score, the middle score,
348
00:16:26,690 --> 00:16:28,790
but people somehow
or other always want
349
00:16:28,790 --> 00:16:30,500
to know about the averages.
350
00:16:30,500 --> 00:16:33,940
The reason why the average
may not be so informative
351
00:16:33,940 --> 00:16:37,170
is because-- well, it has some
weird properties that I'll
352
00:16:37,170 --> 00:16:38,350
illustrate in a second.
353
00:16:38,350 --> 00:16:40,890
But the point here
of what we did
354
00:16:40,890 --> 00:16:47,530
where we took the-- we got at
the average score on the exam
355
00:16:47,530 --> 00:16:53,340
by defining a random variable
based on picking a random exam.
356
00:16:53,340 --> 00:16:54,940
So that's a general process.
357
00:16:54,940 --> 00:16:59,620
We can estimate averages in
some population of things
358
00:16:59,620 --> 00:17:04,810
by estimating the expectations
of random variables
359
00:17:04,810 --> 00:17:07,940
based on picking random
elements from the thing
360
00:17:07,940 --> 00:17:09,569
that we're averaging over.
361
00:17:09,569 --> 00:17:11,944
That's called sampling,
and it's a basic idea
362
00:17:11,944 --> 00:17:14,180
of probability theory
that we're going
363
00:17:14,180 --> 00:17:16,420
to be able to get
a hold of averages
364
00:17:16,420 --> 00:17:21,670
by abstracting the
calculation of an average
365
00:17:21,670 --> 00:17:25,650
into taking-- defining
a random variable
366
00:17:25,650 --> 00:17:28,260
and calculating its expectation.
367
00:17:28,260 --> 00:17:31,170
Let's look at an example.
368
00:17:31,170 --> 00:17:33,710
It's obviously impossible
for all the exams
369
00:17:33,710 --> 00:17:36,612
to be above average,
because then the average
370
00:17:36,612 --> 00:17:37,570
would be above average.
371
00:17:37,570 --> 00:17:38,480
That's absurd.
372
00:17:38,480 --> 00:17:41,210
So if you translate that
into a formal statement
373
00:17:41,210 --> 00:17:43,952
about expectations, it
translates directly--
374
00:17:43,952 --> 00:17:45,910
by the way, I don't know
how many of you listen
375
00:17:45,910 --> 00:17:49,780
to the Prairie Home Companion,
but one of the sign-offs
376
00:17:49,780 --> 00:17:53,120
there is at the town of
Lake Wobegon in Minnesota,
377
00:17:53,120 --> 00:17:55,140
where all the children
are above average.
378
00:17:55,140 --> 00:17:57,920
Well, t'ain't possible.
379
00:17:57,920 --> 00:18:00,910
That translates into
this technical statement
380
00:18:00,910 --> 00:18:04,060
that the probability
that a random variable is
381
00:18:04,060 --> 00:18:08,400
greater than its expected
value is less than 1.
382
00:18:08,400 --> 00:18:13,590
It can't always be greater
than its expected value.
383
00:18:13,590 --> 00:18:15,360
That's absurd.
384
00:18:15,360 --> 00:18:20,504
On the other hand, it's actually
possible for the probability
385
00:18:20,504 --> 00:18:22,920
that the random variable is
bigger than its expected value
386
00:18:22,920 --> 00:18:26,050
to be as close to 1 as you want.
387
00:18:26,050 --> 00:18:28,330
And one way to
think about that is
388
00:18:28,330 --> 00:18:31,180
that, for example,
almost everyone
389
00:18:31,180 --> 00:18:33,730
has an above average
number of fingers.
390
00:18:33,730 --> 00:18:34,980
Think about that for a second.
391
00:18:34,980 --> 00:18:38,690
Almost everyone has an above
average number of fingers.
392
00:18:38,690 --> 00:18:40,330
Well, the explanation
is really simple.
393
00:18:40,330 --> 00:18:45,020
It's simply because
amputation is much more
394
00:18:45,020 --> 00:18:47,790
common than polydactylism.
395
00:18:47,790 --> 00:18:50,410
And if you can't understand
what I just said,
396
00:18:50,410 --> 00:18:53,420
look it up and think about it.
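The finger example can be made concrete with a toy population. The numbers here are invented purely for illustration: 999 people with 10 fingers and one person with 9.

```python
from fractions import Fraction

# Hypothetical population: almost everyone has 10 fingers, one person has 9.
fingers = [10] * 999 + [9]

mean = Fraction(sum(fingers), len(fingers))        # 9999/1000
above = sum(1 for f in fingers if f > mean)        # people strictly above the mean
print(float(mean), Fraction(above, len(fingers)))  # 9.999 999/1000
```

Making the population larger while keeping a single exception pushes the fraction above the mean as close to 1 as desired, but it never reaches 1.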