1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:17,390
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:17,390 --> 00:00:18,640
ocw.mit.edu.
9
00:00:18,640 --> 00:00:21,860
10
00:00:21,860 --> 00:00:25,130
JOHN TSITSIKLIS: We're going
to start today a new unit.
11
00:00:25,130 --> 00:00:29,320
So we will be talking about
limit theorems.
12
00:00:29,320 --> 00:00:33,580
So just to introduce the topic,
let's think of the
13
00:00:33,580 --> 00:00:35,560
following situation.
14
00:00:35,560 --> 00:00:37,580
There's a population
of penguins down
15
00:00:37,580 --> 00:00:38,970
at the South Pole.
16
00:00:38,970 --> 00:00:42,740
And if you were to pick a
penguin at random and measure
17
00:00:42,740 --> 00:00:46,930
their height, the expected value
of their height would be
18
00:00:46,930 --> 00:00:50,020
the average of the heights of
the different penguins in the
19
00:00:50,020 --> 00:00:50,970
population.
20
00:00:50,970 --> 00:00:53,430
So suppose when you
pick one, every
21
00:00:53,430 --> 00:00:55,210
penguin is equally likely.
22
00:00:55,210 --> 00:00:58,020
Then the expected value is just
the average of all the
23
00:00:58,020 --> 00:00:59,340
penguins out there.
24
00:00:59,340 --> 00:01:01,650
So your boss asks you to
find out what the
25
00:01:01,650 --> 00:01:03,020
expected value is.
26
00:01:03,020 --> 00:01:04,980
One way would be to
go and measure
27
00:01:04,980 --> 00:01:06,540
each and every penguin.
28
00:01:06,540 --> 00:01:08,600
That might be a little
time consuming.
29
00:01:08,600 --> 00:01:13,120
So alternatively, what you can
do is to go and pick penguins
30
00:01:13,120 --> 00:01:17,450
at random, pick a few of them,
let's say a number n of them.
31
00:01:17,450 --> 00:01:20,420
So you measure the height
of each one.
32
00:01:20,420 --> 00:01:25,920
And then you calculate the
average of the heights of
33
00:01:25,920 --> 00:01:29,050
those penguins that you
have collected.
34
00:01:29,050 --> 00:01:33,100
So this is your estimate
of the expected value.
35
00:01:33,100 --> 00:01:41,010
Now, we call this the sample
mean, which is the mean value,
36
00:01:41,010 --> 00:01:44,430
but within the sample that
you have collected.
37
00:01:44,430 --> 00:01:48,090
This is something that sort
of feels the same as the
38
00:01:48,090 --> 00:01:52,140
expected value, which
is again, the mean.
39
00:01:52,140 --> 00:01:54,400
But the expected value's a
different kind of mean.
40
00:01:54,400 --> 00:01:57,870
The expected value is the mean
over the entire population,
41
00:01:57,870 --> 00:02:01,680
whereas the sample mean is the
average over the smaller
42
00:02:01,680 --> 00:02:03,940
sample that you have measured.
43
00:02:03,940 --> 00:02:06,330
The expected value
is a number.
44
00:02:06,330 --> 00:02:09,220
The sample mean is a
random variable.
45
00:02:09,220 --> 00:02:11,720
It's a random variable because
the sample you have
46
00:02:11,720 --> 00:02:15,010
collected is random.
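The distinction can be sketched in a quick simulation (not part of the lecture; the penguin heights below are made up for illustration):

```python
import random

# A sketch of the penguin example: the expected value (population mean)
# is a fixed number; the sample mean M_n depends on which penguins were
# picked, so it is a random variable.  All numbers here are made up.
random.seed(0)
population = [random.gauss(90, 10) for _ in range(10_000)]  # heights, cm
true_mean = sum(population) / len(population)

n = 100  # measure only n randomly chosen penguins
sample = random.sample(population, n)
sample_mean = sum(sample) / n  # M_n = (X_1 + ... + X_n) / n

print(true_mean, sample_mean)  # close, but generally not equal
```

Rerunning with a different seed gives a different sample mean, while the population mean never changes.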
47
00:02:15,010 --> 00:02:18,520
Now, we think that this is a
reasonable way of estimating
48
00:02:18,520 --> 00:02:19,900
the expectation.
49
00:02:19,900 --> 00:02:25,710
So in the limit as n goes to
infinity, it's plausible that
50
00:02:25,710 --> 00:02:29,170
the sample mean, the estimate
that we are constructing,
51
00:02:29,170 --> 00:02:33,790
should somehow get close
to the expected value.
52
00:02:33,790 --> 00:02:34,560
What does this mean?
53
00:02:34,560 --> 00:02:36,160
What does it mean
to get close?
54
00:02:36,160 --> 00:02:37,620
In what sense?
55
00:02:37,620 --> 00:02:39,440
And is this statement true?
56
00:02:39,440 --> 00:02:44,160
This is the kind of statement
that we deal with when dealing
57
00:02:44,160 --> 00:02:45,710
with limit theorems.
58
00:02:45,710 --> 00:02:49,500
That's the subject of limit
theorems: what happens if
59
00:02:49,500 --> 00:02:52,020
you're dealing with lots and
lots of random variables, and
60
00:02:52,020 --> 00:02:54,620
perhaps take averages
and so on.
61
00:02:54,620 --> 00:02:57,280
So why do we bother
about this?
62
00:02:57,280 --> 00:03:01,200
Well, if you're in the sampling
business, it would be
63
00:03:01,200 --> 00:03:04,870
reassuring to know that this
particular way of estimating
64
00:03:04,870 --> 00:03:06,880
the expected value
actually gets you
65
00:03:06,880 --> 00:03:08,850
close to the true answer.
66
00:03:08,850 --> 00:03:11,890
There's also a higher level
reason, which is a little more
67
00:03:11,890 --> 00:03:13,660
abstract and mathematical.
68
00:03:13,660 --> 00:03:17,110
So probability problems are easy
to deal with if you're
69
00:03:17,110 --> 00:03:20,040
having in your hands one or
two random variables.
70
00:03:20,040 --> 00:03:23,520
You can write down their mass
functions, joint density
71
00:03:23,520 --> 00:03:24,930
functions, and so on.
72
00:03:24,930 --> 00:03:27,500
You can calculate on paper
or on a computer,
73
00:03:27,500 --> 00:03:29,430
and you can get the answers.
74
00:03:29,430 --> 00:03:33,510
Probability problems become
computationally intractable if
75
00:03:33,510 --> 00:03:36,760
you're dealing, let's say, with
100 random variables and
76
00:03:36,760 --> 00:03:40,280
you're trying to get the exact
answers for anything.
77
00:03:40,280 --> 00:03:43,050
So in principle, the same
formulas that we have, they
78
00:03:43,050 --> 00:03:44,230
still apply.
79
00:03:44,230 --> 00:03:47,360
But they involve summations
over large ranges of
80
00:03:47,360 --> 00:03:48,830
combinations of indices.
81
00:03:48,830 --> 00:03:51,310
And that makes life extremely
difficult.
82
00:03:51,310 --> 00:03:55,100
But when you push the envelope
and you go to a situation
83
00:03:55,100 --> 00:03:58,480
where you're dealing with a
very, very large number of
84
00:03:58,480 --> 00:04:02,130
variables, then you can
start taking limits.
85
00:04:02,130 --> 00:04:05,200
And when you take limits,
wonderful things happen.
86
00:04:05,200 --> 00:04:08,030
Many formulas start simplifying,
and you can
87
00:04:08,030 --> 00:04:11,770
actually get useful answers by
considering those limits.
88
00:04:11,770 --> 00:04:15,450
And that's sort of the big
reason why looking at limit
89
00:04:15,450 --> 00:04:17,820
theorems is a useful
thing to do.
90
00:04:17,820 --> 00:04:20,990
So what we're going to do today,
first we're going to
91
00:04:20,990 --> 00:04:27,110
start with a useful, simple tool
that allows us to relate
92
00:04:27,110 --> 00:04:30,290
probabilities with
expected values.
93
00:04:30,290 --> 00:04:33,230
The Markov inequality is the
first inequality we're going
94
00:04:33,230 --> 00:04:33,840
to write down.
95
00:04:33,840 --> 00:04:37,650
And then using that, we're going
to get the Chebyshev's
96
00:04:37,650 --> 00:04:39,760
inequality, a related
inequality.
97
00:04:39,760 --> 00:04:43,760
Then we need to define what
we mean by convergence when we
98
00:04:43,760 --> 00:04:45,270
talk about random variables.
99
00:04:45,270 --> 00:04:48,310
It's a notion that's a
generalization of the notion
100
00:04:48,310 --> 00:04:51,000
of the usual convergence
of limits of
101
00:04:51,000 --> 00:04:52,690
a sequence of numbers.
102
00:04:52,690 --> 00:04:55,710
And once we have our notion of
convergence, we're going to
103
00:04:55,710 --> 00:05:00,860
see that, indeed, the sample
mean converges to the true
104
00:05:00,860 --> 00:05:04,380
mean, converges to the expected
value of the X's.
105
00:05:04,380 --> 00:05:08,840
And this statement is called the
weak law of large numbers.
106
00:05:08,840 --> 00:05:11,650
The reason it's called the weak
law is because there's
107
00:05:11,650 --> 00:05:14,640
also a strong law, which is
a statement with the same
108
00:05:14,640 --> 00:05:16,570
flavor, but with a somewhat
different
109
00:05:16,570 --> 00:05:18,410
mathematical content.
110
00:05:18,410 --> 00:05:20,790
But it's a little more abstract,
and we will not be
111
00:05:20,790 --> 00:05:21,680
getting into this.
112
00:05:21,680 --> 00:05:26,070
So the weak law is all that
you're going to get.
113
00:05:26,070 --> 00:05:28,570
All right.
114
00:05:28,570 --> 00:05:31,050
So now we start our
digression.
115
00:05:31,050 --> 00:05:38,220
And our first tool will be the
so-called Markov inequality.
116
00:05:38,220 --> 00:05:45,050
117
00:05:45,050 --> 00:05:48,040
So let's take a random variable
that's always
118
00:05:48,040 --> 00:05:48,870
non-negative.
119
00:05:48,870 --> 00:05:51,790
No matter what, it takes
no negative values.
120
00:05:51,790 --> 00:05:53,710
To keep things simple,
let's assume it's a
121
00:05:53,710 --> 00:05:55,500
discrete random variable.
122
00:05:55,500 --> 00:05:59,770
So the expected value is the sum
over all possible values
123
00:05:59,770 --> 00:06:01,460
that the random variable
can take.
124
00:06:01,460 --> 00:06:04,440
125
00:06:04,440 --> 00:06:06,600
The values that the random
variable can take,
126
00:06:06,600 --> 00:06:10,850
weighted according to their
corresponding probabilities.
127
00:06:10,850 --> 00:06:13,700
Now, this is a sum
over all x's.
128
00:06:13,700 --> 00:06:16,640
But x takes non-negative
values.
129
00:06:16,640 --> 00:06:19,780
And the PMF is also
non-negative.
130
00:06:19,780 --> 00:06:24,310
So if I take a sum over fewer
things, I'm going to get a
131
00:06:24,310 --> 00:06:25,550
smaller value.
132
00:06:25,550 --> 00:06:29,180
So the sum when I add over
everything is less than or
133
00:06:29,180 --> 00:06:33,255
equal to the sum that I will get
if I only add those terms
134
00:06:33,255 --> 00:06:35,620
that are bigger than
a certain constant.
135
00:06:35,620 --> 00:06:38,600
136
00:06:38,600 --> 00:06:45,140
Now, if I'm adding over x's that
are bigger than a, the x
137
00:06:45,140 --> 00:06:48,630
that shows up up there
will always be larger
138
00:06:48,630 --> 00:06:50,490
than or equal to a.
139
00:06:50,490 --> 00:06:52,370
So we get this inequality.
140
00:06:52,370 --> 00:06:58,170
141
00:06:58,170 --> 00:06:59,980
And now, a is a constant.
142
00:06:59,980 --> 00:07:02,870
I can pull it outside
the summation.
143
00:07:02,870 --> 00:07:05,320
And then I'm left with the
probabilities of all the x's
144
00:07:05,320 --> 00:07:06,990
that are bigger than a.
145
00:07:06,990 --> 00:07:08,850
And that's just the
probability of
146
00:07:08,850 --> 00:07:10,250
being bigger than a.
147
00:07:10,250 --> 00:07:15,540
148
00:07:15,540 --> 00:07:18,140
OK, so that's the Markov
inequality.
149
00:07:18,140 --> 00:07:23,800
Basically, it tells us that the
expected value is larger than
150
00:07:23,800 --> 00:07:26,240
or equal to this number.
151
00:07:26,240 --> 00:07:30,260
It relates expected values
to probabilities.
152
00:07:30,260 --> 00:07:34,660
It tells us that if the expected
value is small, then
153
00:07:34,660 --> 00:07:39,250
the probability that x is big
is also going to be small.
154
00:07:39,250 --> 00:07:42,240
So it translates a statement
about smallness of expected
155
00:07:42,240 --> 00:07:46,205
values to a statement about
smallness of probabilities.
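As a sanity check (my addition, not from the lecture), the bound P(X >= a) <= E[X]/a can be tested numerically on any non-negative random variable; exponential samples are just one convenient choice:

```python
import random

# Numerical check of the Markov inequality P(X >= a) <= E[X] / a
# for a nonnegative random variable.  Here X is exponential with
# mean 1, but any nonnegative distribution would work.
random.seed(1)
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for a in [1, 2, 5]:
    p = sum(x >= a for x in samples) / len(samples)
    assert p <= mean / a  # the empirical tail never beats the bound
    print(a, p, mean / a)
```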
156
00:07:46,205 --> 00:07:49,020
157
00:07:49,020 --> 00:07:49,930
OK.
158
00:07:49,930 --> 00:07:54,210
What we actually need is a
somewhat different version of
159
00:07:54,210 --> 00:07:57,240
this same statement.
160
00:07:57,240 --> 00:08:03,010
And what we're going to do is to
apply this inequality to a
161
00:08:03,010 --> 00:08:08,150
non-negative random variable
of a special type.
162
00:08:08,150 --> 00:08:13,330
And you can think of applying
this same calculation to a
163
00:08:13,330 --> 00:08:18,800
random variable of this form, (X
minus mu)-squared, where mu
164
00:08:18,800 --> 00:08:21,870
is the expected value of X.
165
00:08:21,870 --> 00:08:24,075
Now, this is a non-negative
random variable.
166
00:08:24,075 --> 00:08:35,419
167
00:08:35,419 --> 00:08:37,919
So, the expected value of this
random variable, which is the
168
00:08:37,919 --> 00:08:42,220
variance, by following the same
thinking as we had in
169
00:08:42,220 --> 00:08:52,880
that derivation up to there, is
bigger than the probability
170
00:08:52,880 --> 00:08:58,210
that this random variable
is bigger than some--
171
00:08:58,210 --> 00:09:04,760
let me use a-squared
instead of an a --
172
00:09:04,760 --> 00:09:06,585
times the value a-squared.
173
00:09:06,585 --> 00:09:12,420
174
00:09:12,420 --> 00:09:16,310
So now of course, this
probability is the same as the
175
00:09:16,310 --> 00:09:23,440
probability that the absolute
value of X minus mu is bigger
176
00:09:23,440 --> 00:09:27,190
than a, times a-squared.
177
00:09:27,190 --> 00:09:34,860
And this side is equal to the
variance of X. So this relates
178
00:09:34,860 --> 00:09:40,890
the variance of X to the
probability that our random
179
00:09:40,890 --> 00:09:45,200
variable is far away
from its mean.
180
00:09:45,200 --> 00:09:50,590
If the variance is small, then
it means that the probability
181
00:09:50,590 --> 00:09:54,635
of being far away from the
mean is also small.
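A quick numerical check of this Chebyshev statement, with a uniform random variable standing in for X (a sketch; any distribution with finite variance would do):

```python
import random

# Sanity check of the Chebyshev inequality
# P(|X - mu| >= c) <= Var(X) / c^2 on a uniform sample.
random.seed(2)
xs = [random.uniform(0, 1) for _ in range(100_000)]
mu = sum(xs) / len(xs)
var = sum((x - mu) ** 2 for x in xs) / len(xs)

for c in [0.2, 0.3, 0.4]:
    tail = sum(abs(x - mu) >= c for x in xs) / len(xs)
    assert tail <= var / c**2  # bound holds for every c
    print(c, tail, var / c**2)
```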
182
00:09:54,635 --> 00:09:57,240
183
00:09:57,240 --> 00:10:02,220
So I derived this by applying
the Markov inequality to this
184
00:10:02,220 --> 00:10:04,950
particular non-negative
random variable.
185
00:10:04,950 --> 00:10:09,500
Or just to reinforce, perhaps,
the message, and increase your
186
00:10:09,500 --> 00:10:13,450
confidence in this inequality,
let's just look at the
187
00:10:13,450 --> 00:10:16,980
derivation once more, where I'm
going, here, to start from
188
00:10:16,980 --> 00:10:20,890
first principles, but use the
same idea as the one that was
189
00:10:20,890 --> 00:10:23,480
used in the proof out here.
190
00:10:23,480 --> 00:10:23,685
OK.
191
00:10:23,685 --> 00:10:26,920
So just for variety, now let's
think of X as being a
192
00:10:26,920 --> 00:10:28,760
continuous random variable.
193
00:10:28,760 --> 00:10:31,520
The derivation is the same
whether it's discrete or
194
00:10:31,520 --> 00:10:32,510
continuous.
195
00:10:32,510 --> 00:10:35,990
So by definition, the variance
is the integral, is this
196
00:10:35,990 --> 00:10:38,130
particular integral.
197
00:10:38,130 --> 00:10:43,920
Now, the integral is going to
become smaller if I integrate,
198
00:10:43,920 --> 00:10:47,130
instead of integrating over
the full range, I only
199
00:10:47,130 --> 00:10:51,070
integrate over x's that are
far away from the mean.
200
00:10:51,070 --> 00:10:52,700
So mu is the mean.
201
00:10:52,700 --> 00:10:54,345
Think of c as some big number.
202
00:10:54,345 --> 00:10:59,670
203
00:10:59,670 --> 00:11:02,210
These are x's that are far
away from the mean to the
204
00:11:02,210 --> 00:11:05,410
left, from minus infinity
to mu minus c.
205
00:11:05,410 --> 00:11:09,030
And these are the x's that are
far away from the mean on the
206
00:11:09,030 --> 00:11:11,210
positive side.
207
00:11:11,210 --> 00:11:13,420
So by integrating over
fewer stuff, I'm
208
00:11:13,420 --> 00:11:15,580
getting a smaller integral.
209
00:11:15,580 --> 00:11:21,970
Now, for any x in this range,
this distance, x minus mu, is
210
00:11:21,970 --> 00:11:23,220
at least c.
211
00:11:23,220 --> 00:11:26,320
So that squared is at
least c squared.
212
00:11:26,320 --> 00:11:28,910
So this term over this
range of integration
213
00:11:28,910 --> 00:11:30,520
is at least c squared.
214
00:11:30,520 --> 00:11:33,250
So I can take it outside
the integral.
215
00:11:33,250 --> 00:11:36,400
And I'm left just with the
integral of the density.
216
00:11:36,400 --> 00:11:38,480
Same thing on the other side.
217
00:11:38,480 --> 00:11:41,770
And so what factors out is
this term c squared.
218
00:11:41,770 --> 00:11:45,360
And inside, we're left with the
probability of being to
219
00:11:45,360 --> 00:11:49,060
the left of mu minus c, and then
the probability of being
220
00:11:49,060 --> 00:11:52,310
to the right of mu plus c,
which is the same as the
221
00:11:52,310 --> 00:11:55,370
probability that the absolute
value of the distance from the
222
00:11:55,370 --> 00:11:58,770
mean is larger than
or equal to c.
223
00:11:58,770 --> 00:12:04,820
So that's the same inequality
that we proved there, except
224
00:12:04,820 --> 00:12:06,060
that here I'm using c.
225
00:12:06,060 --> 00:12:10,530
There I used a, but it's
exactly the same one.
226
00:12:10,530 --> 00:12:12,960
This inequality is maybe easier
to understand if you
227
00:12:12,960 --> 00:12:16,790
take that term and send it
to the other side and
228
00:12:16,790 --> 00:12:18,780
write it in this form.
229
00:12:18,780 --> 00:12:20,010
What does it tell us?
230
00:12:20,010 --> 00:12:25,750
It tells us that if c is a big
number, it tells us that the
231
00:12:25,750 --> 00:12:30,750
probability of being more than
c away from the mean is going
232
00:12:30,750 --> 00:12:32,330
to be a small number.
233
00:12:32,330 --> 00:12:34,780
When c is big, this is small.
234
00:12:34,780 --> 00:12:35,970
Now, this is intuitive.
235
00:12:35,970 --> 00:12:38,290
The variance is a measure
of the spread of the
236
00:12:38,290 --> 00:12:40,960
distribution, how wide it is.
237
00:12:40,960 --> 00:12:43,960
It tells us that if the
variance is small, the
238
00:12:43,960 --> 00:12:46,320
distribution is not very wide.
239
00:12:46,320 --> 00:12:49,020
And mathematically, this
translates to this statement
240
00:12:49,020 --> 00:12:52,360
that when the variance is small,
the probability of
241
00:12:52,360 --> 00:12:54,880
being far away is going
to be small.
242
00:12:54,880 --> 00:12:58,370
And the further away you're
looking, that is, if c is a
243
00:12:58,370 --> 00:13:00,330
bigger number, that probability
244
00:13:00,330 --> 00:13:01,765
also becomes small.
245
00:13:01,765 --> 00:13:04,930
246
00:13:04,930 --> 00:13:07,880
Maybe an even more intuitive way
to think about the content
247
00:13:07,880 --> 00:13:13,230
of this inequality is to,
instead of c, use the number
248
00:13:13,230 --> 00:13:16,910
k times sigma, where k is positive
and sigma is
249
00:13:16,910 --> 00:13:18,530
the standard deviation.
250
00:13:18,530 --> 00:13:22,670
So let's just plug k sigma
in the place of c.
251
00:13:22,670 --> 00:13:25,300
So this becomes k
sigma squared.
252
00:13:25,300 --> 00:13:27,130
These sigma-squareds cancel.
253
00:13:27,130 --> 00:13:29,770
We're left with 1
over k-squared.
254
00:13:29,770 --> 00:13:31,690
Now, what is this?
255
00:13:31,690 --> 00:13:36,260
This is the event that you are
k standard deviations away
256
00:13:36,260 --> 00:13:37,770
from the mean.
257
00:13:37,770 --> 00:13:40,600
So for example, this statement
here tells you that if you
258
00:13:40,600 --> 00:13:44,900
look at the test scores from a
quiz, what fraction of the
259
00:13:44,900 --> 00:13:49,900
class are 3 standard deviations
away from the mean?
260
00:13:49,900 --> 00:13:53,000
It's possible, but it's not
going to be a lot of people.
261
00:13:53,000 --> 00:13:57,930
It's going to be at most 1/9
of the class that can be 3
262
00:13:57,930 --> 00:14:02,190
standard deviations or more
away from the mean.
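The quiz-score illustration can be replayed numerically (the scores below are made up): whatever the distribution, the fraction of the class k or more standard deviations from the mean never exceeds 1 over k-squared.

```python
import random
import statistics

# Made-up quiz scores; Chebyshev guarantees at most 1/k^2 of any
# distribution lies k or more standard deviations from its mean.
random.seed(3)
scores = [random.gauss(70, 12) for _ in range(1_000)]
mu = statistics.fmean(scores)
sigma = statistics.pstdev(scores)

k = 3
frac = sum(abs(s - mu) >= k * sigma for s in scores) / len(scores)
print(frac, 1 / k**2)  # observed fraction vs. the 1/9 bound
```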
263
00:14:02,190 --> 00:14:05,250
So the Chebyshev inequality
is a really useful one.
264
00:14:05,250 --> 00:14:07,860
265
00:14:07,860 --> 00:14:11,300
It comes in handy whenever you
want to relate probabilities
266
00:14:11,300 --> 00:14:12,800
and expected values.
267
00:14:12,800 --> 00:14:16,390
So if you know that your
expected values or, in
268
00:14:16,390 --> 00:14:19,260
particular, that your variance
is small, this tells you
269
00:14:19,260 --> 00:14:23,080
something about tail
probabilities.
270
00:14:23,080 --> 00:14:25,530
So this is the end of our
first digression.
271
00:14:25,530 --> 00:14:28,320
We have this inequality
in our hands.
272
00:14:28,320 --> 00:14:31,170
Our second digression is to
talk about limits.
273
00:14:31,170 --> 00:14:34,680
274
00:14:34,680 --> 00:14:37,190
We want to eventually talk
about limits of random
275
00:14:37,190 --> 00:14:39,750
variables, but as a warm up,
we're going to start with
276
00:14:39,750 --> 00:14:42,440
limits of sequences.
277
00:14:42,440 --> 00:14:47,670
So you're given a sequence
of numbers, a1,
278
00:14:47,670 --> 00:14:50,500
a2, a3, and so on.
279
00:14:50,500 --> 00:14:54,160
And we want to define the
notion that a sequence
280
00:14:54,160 --> 00:14:56,470
converges to a number.
281
00:14:56,470 --> 00:15:04,710
You sort of know what this
means, but let's just go
282
00:15:04,710 --> 00:15:06,510
through it some more.
283
00:15:06,510 --> 00:15:09,890
So here's a.
284
00:15:09,890 --> 00:15:16,200
We have our sequence of
values as n increases.
285
00:15:16,200 --> 00:15:20,290
What we mean by the
converging to a is
286
00:15:20,290 --> 00:15:23,550
that when you look at those
values, they get closer and
287
00:15:23,550 --> 00:15:25,140
closer to a.
288
00:15:25,140 --> 00:15:29,570
So this value here is your
typical a sub n.
289
00:15:29,570 --> 00:15:33,880
They get closer and closer to
a, and they stay closer.
290
00:15:33,880 --> 00:15:36,860
So let's try to make
that more precise.
291
00:15:36,860 --> 00:15:40,750
What it means is let's
fix a sense of what
292
00:15:40,750 --> 00:15:42,250
it means to be close.
293
00:15:42,250 --> 00:15:47,540
Let me look at an interval that
goes from a - epsilon to
294
00:15:47,540 --> 00:15:50,340
a + epsilon.
295
00:15:50,340 --> 00:15:57,280
Then if my sequence converges
to a, this means that as n
296
00:15:57,280 --> 00:16:02,810
increases, eventually the values
of the sequence that I
297
00:16:02,810 --> 00:16:06,420
get stay inside this band.
298
00:16:06,420 --> 00:16:10,430
Since they converge to a, this
means that eventually they
299
00:16:10,430 --> 00:16:14,130
will be smaller than
a + epsilon and
300
00:16:14,130 --> 00:16:16,310
bigger than a - epsilon.
301
00:16:16,310 --> 00:16:21,320
So convergence means that
given a band of positive
302
00:16:21,320 --> 00:16:25,690
length around the number a,
the values of the sequence
303
00:16:25,690 --> 00:16:28,720
that you get eventually
get inside and
304
00:16:28,720 --> 00:16:31,300
stay inside that band.
305
00:16:31,300 --> 00:16:34,060
So that's sort of the picture
definition of
306
00:16:34,060 --> 00:16:35,840
what convergence means.
307
00:16:35,840 --> 00:16:40,460
So now let's translate this into
a mathematical statement.
308
00:16:40,460 --> 00:16:45,610
Given a band of positive length,
no matter how wide
309
00:16:45,610 --> 00:16:50,690
that band is or how narrow it
is, so for every epsilon
310
00:16:50,690 --> 00:16:56,500
positive, eventually the
sequence gets inside the band.
311
00:16:56,500 --> 00:16:58,460
What does eventually mean?
312
00:16:58,460 --> 00:17:01,410
There exists a time,
so that after that
313
00:17:01,410 --> 00:17:03,510
time something happens.
314
00:17:03,510 --> 00:17:07,230
And the something that happens
is that after that time, we
315
00:17:07,230 --> 00:17:09,520
are inside that band.
316
00:17:09,520 --> 00:17:12,060
So this is a formal mathematical
definition, which
317
00:17:12,060 --> 00:17:17,250
actually translates what I was
telling in the wordy way
318
00:17:17,250 --> 00:17:20,140
before, and showing in
terms of the picture.
319
00:17:20,140 --> 00:17:25,140
Given a certain band, even if
it's narrow, eventually, after
320
00:17:25,140 --> 00:17:28,520
a certain time n0, the values
of the sequence are going to
321
00:17:28,520 --> 00:17:30,240
stay inside this band.
322
00:17:30,240 --> 00:17:35,770
Now, if I were to take epsilon
to be very small, this thing
323
00:17:35,770 --> 00:17:38,130
would still be true that
eventually I'm going to get
324
00:17:38,130 --> 00:17:42,400
inside of the band, except that
I may have to wait longer
325
00:17:42,400 --> 00:17:45,770
for the values to
get inside here.
326
00:17:45,770 --> 00:17:48,400
All right, that's what it means
for a deterministic
327
00:17:48,400 --> 00:17:51,350
sequence to converge
to something.
328
00:17:51,350 --> 00:17:54,150
Now, how about random
variables.
329
00:17:54,150 --> 00:17:57,340
What does it mean for a sequence
of random variables
330
00:17:57,340 --> 00:18:00,280
to converge to a number?
331
00:18:00,280 --> 00:18:02,600
We're just going to twist
a little bit of the word
332
00:18:02,600 --> 00:18:03,310
definition.
333
00:18:03,310 --> 00:18:08,390
For numbers, we said that
eventually the numbers get
334
00:18:08,390 --> 00:18:10,180
inside that band.
335
00:18:10,180 --> 00:18:13,270
But if instead of numbers we
have random variables with a
336
00:18:13,270 --> 00:18:18,080
certain distribution, so here
instead of a_n we're dealing
337
00:18:18,080 --> 00:18:20,750
with a random variable that has
a distribution, let's say,
338
00:18:20,750 --> 00:18:26,650
of this kind, what we want is
that this distribution gets
339
00:18:26,650 --> 00:18:31,460
inside this band, so it gets
concentrated inside here.
340
00:18:31,460 --> 00:18:33,150
What does it mean that
the distribution
341
00:18:33,150 --> 00:18:34,850
gets inside this band?
342
00:18:34,850 --> 00:18:36,910
I mean a random variable
has a distribution.
343
00:18:36,910 --> 00:18:40,130
It may have some tails, so
maybe not the entire
344
00:18:40,130 --> 00:18:43,920
distribution gets concentrated
inside of the band.
345
00:18:43,920 --> 00:18:48,660
But we want that more and more
of this distribution is
346
00:18:48,660 --> 00:18:50,820
concentrated in this band.
347
00:18:50,820 --> 00:18:51,730
So that --
348
00:18:51,730 --> 00:18:53,130
in a sense that --
349
00:18:53,130 --> 00:18:57,070
the probability of falling
outside the band converges to
350
00:18:57,070 --> 00:19:00,410
0 -- becomes smaller
and smaller.
351
00:19:00,410 --> 00:19:05,660
So in words, we're going to say
that the sequence of random
352
00:19:05,660 --> 00:19:09,070
variables or a sequence of
probability distributions,
353
00:19:09,070 --> 00:19:12,060
that would be the same,
converges to a particular
354
00:19:12,060 --> 00:19:15,070
number a if the following
is true.
355
00:19:15,070 --> 00:19:22,320
If I consider a small band
around a, then the probability
356
00:19:22,320 --> 00:19:26,300
that my random variable falls
outside this band, which is
357
00:19:26,300 --> 00:19:29,530
the area under this curve,
this probability becomes
358
00:19:29,530 --> 00:19:32,620
smaller and smaller as
n goes to infinity.
359
00:19:32,620 --> 00:19:35,370
The probability of being
outside this band
360
00:19:35,370 --> 00:19:38,570
converges to 0.
361
00:19:38,570 --> 00:19:40,620
So that's the intuitive idea.
362
00:19:40,620 --> 00:19:45,080
So in the beginning, maybe our
distribution is sitting
363
00:19:45,080 --> 00:19:46,590
everywhere.
364
00:19:46,590 --> 00:19:49,490
As n increases, the distribution
starts to get
365
00:19:49,490 --> 00:19:51,560
concentrating inside the band.
366
00:19:51,560 --> 00:19:57,300
When n is even bigger, our
distribution is even more
367
00:19:57,300 --> 00:20:00,310
inside that band, so that these
outside probabilities
368
00:20:00,310 --> 00:20:02,460
become smaller and smaller.
369
00:20:02,460 --> 00:20:03,860
So the corresponding
mathematical
370
00:20:03,860 --> 00:20:06,760
statement is the following.
371
00:20:06,760 --> 00:20:13,730
I fix a band around
a, a +/- epsilon.
372
00:20:13,730 --> 00:20:18,170
Given that band, the probability
of falling outside
373
00:20:18,170 --> 00:20:21,350
this band, this probability
converges to 0.
374
00:20:21,350 --> 00:20:23,600
Or another way to say it is
that the limit of this
375
00:20:23,600 --> 00:20:26,560
probability is equal to 0.
376
00:20:26,560 --> 00:20:29,720
If you were to translate this
into a complete mathematical
377
00:20:29,720 --> 00:20:31,800
statement, you would have
to write down the
378
00:20:31,800 --> 00:20:34,150
following messy thing.
379
00:20:34,150 --> 00:20:37,220
For every epsilon positive --
380
00:20:37,220 --> 00:20:39,480
that's this statement --
381
00:20:39,480 --> 00:20:41,240
the limit is 0.
382
00:20:41,240 --> 00:20:44,610
What does it mean that the
limit of something is 0?
383
00:20:44,610 --> 00:20:47,670
We flip back to the
previous slide.
384
00:20:47,670 --> 00:20:48,110
Why?
385
00:20:48,110 --> 00:20:51,430
Because a probability
is a number.
386
00:20:51,430 --> 00:20:54,720
So here we're talking about
a sequence of numbers
387
00:20:54,720 --> 00:20:56,340
converging to 0.
388
00:20:56,340 --> 00:20:58,190
What does it mean for a
sequence of numbers to
389
00:20:58,190 --> 00:20:59,180
converge to 0?
390
00:20:59,180 --> 00:21:05,320
It means that for any epsilon
prime positive, there exists
391
00:21:05,320 --> 00:21:11,230
some n0 such that for every
n bigger than n0 the
392
00:21:11,230 --> 00:21:12,770
following is true --
393
00:21:12,770 --> 00:21:16,450
that this probability
is less than or
394
00:21:16,450 --> 00:21:17,860
equal to epsilon prime.
395
00:21:17,860 --> 00:21:20,610
396
00:21:20,610 --> 00:21:27,660
So the mathematical statement
is a little hard to parse.
397
00:21:27,660 --> 00:21:32,270
For every size of that band,
and then you take the
398
00:21:32,270 --> 00:21:34,990
definition of what it means for
the limit of a sequence of
399
00:21:34,990 --> 00:21:37,720
numbers to converge to 0.
400
00:21:37,720 --> 00:21:42,340
But it's a lot easier to
describe this in words and,
401
00:21:42,340 --> 00:21:45,010
basically, think in terms
of this picture.
402
00:21:45,010 --> 00:21:48,690
That as n increases, the
probability of falling outside
403
00:21:48,690 --> 00:21:51,305
those bands just become
smaller and smaller.
404
00:21:51,305 --> 00:21:56,590
So the statement is that our
distribution gets concentrated
405
00:21:56,590 --> 00:22:01,340
in arbitrarily narrow little
bands around that
406
00:22:01,340 --> 00:22:05,050
particular number a.
407
00:22:05,050 --> 00:22:05,350
OK.
408
00:22:05,350 --> 00:22:07,790
So let's look at an example.
409
00:22:07,790 --> 00:22:11,660
Suppose a random variable Yn has
a discrete distribution of
410
00:22:11,660 --> 00:22:13,720
this particular type.
411
00:22:13,720 --> 00:22:17,150
Does it converge to something?
412
00:22:17,150 --> 00:22:19,570
Well, the probability
distribution of this random
413
00:22:19,570 --> 00:22:22,370
variable gets concentrated
at 0 --
414
00:22:22,370 --> 00:22:26,520
there's more and more
probability of being at 0.
415
00:22:26,520 --> 00:22:29,710
If I fix a band around 0 --
416
00:22:29,710 --> 00:22:34,850
so if I take the band from minus
epsilon to epsilon and
417
00:22:34,850 --> 00:22:36,520
look at that band--
418
00:22:36,520 --> 00:22:42,350
the probability of falling
outside this band is 1/n.
419
00:22:42,350 --> 00:22:45,780
As n goes to infinity, that
probability goes to 0.
420
00:22:45,780 --> 00:22:50,550
So in this case, we do
have convergence.
421
00:22:50,550 --> 00:22:56,780
And Yn converges in probability
to the number 0.
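The example can also be simulated: the fraction of draws landing outside a band around 0 tracks 1/n (a sketch, with a hypothetical sampler for Y_n):

```python
import random

# The lecture's example: Y_n equals 0 with probability 1 - 1/n and
# equals n with probability 1/n, so P(|Y_n| > epsilon) = 1/n -> 0
# and Y_n converges to 0 in probability.
def sample_Y(n: int) -> float:
    return float(n) if random.random() < 1 / n else 0.0

random.seed(4)
eps = 0.5
for n in [10, 100, 1000]:
    trials = [sample_Y(n) for _ in range(100_000)]
    tail = sum(abs(y) > eps for y in trials) / len(trials)
    print(n, tail)  # roughly 1/n, shrinking toward 0
```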
422
00:22:56,780 --> 00:23:00,310
So this just captures the
fact, obvious from this
423
00:23:00,310 --> 00:23:03,680
picture, that more and more of
our probability distribution
424
00:23:03,680 --> 00:23:07,630
gets concentrated around 0,
as n goes to infinity.
425
00:23:07,630 --> 00:23:10,330
Now, an interesting thing to
notice is the following, that
426
00:23:10,330 --> 00:23:15,390
even though Yn converges to 0,
if you were to write down the
427
00:23:15,390 --> 00:23:20,440
expected value for Yn,
what would it be?
428
00:23:20,440 --> 00:23:24,410
It's going to be n times the
probability of this value,
429
00:23:24,410 --> 00:23:26,240
which is 1/n.
430
00:23:26,240 --> 00:23:29,230
So the expected value
turns out to be 1.
431
00:23:29,230 --> 00:23:34,300
And if you were to look at the
expected value of Yn-squared,
432
00:23:34,300 --> 00:23:38,190
this would be 0 squared
433
00:23:38,190 --> 00:23:41,770
times this probability, and
then n-squared times this
434
00:23:41,770 --> 00:23:45,720
probability, which
is equal to n.
435
00:23:45,720 --> 00:23:49,850
And this actually goes
to infinity.
436
00:23:49,850 --> 00:23:53,580
So we have this, perhaps,
strange situation where a
437
00:23:53,580 --> 00:23:58,030
random variable goes to 0, but
the expected value of this
438
00:23:58,030 --> 00:24:01,140
random variable does
not go to 0.
439
00:24:01,140 --> 00:24:04,570
And the second moment of that
random variable actually goes
440
00:24:04,570 --> 00:24:05,790
to infinity.
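To make this concrete, here is a small Python sketch (not part of the lecture; the function names are mine) of the random variable Yn that equals n with probability 1/n and 0 otherwise. The tail probability outside any fixed band shrinks like 1/n, yet the mean stays at 1 and the second moment grows like n.

```python
import random

def sample_Yn(n):
    """One draw of Yn: equal to n with probability 1/n, else 0."""
    return n if random.random() < 1.0 / n else 0

def tail_prob(n):
    """Exact P(|Yn| > epsilon) for any 0 < epsilon < n: just the atom at n."""
    return 1.0 / n

def moments(n):
    """Exact first and second moments of Yn."""
    mean = n * (1.0 / n)                # = 1 for every n
    second_moment = n ** 2 * (1.0 / n)  # = n, grows without bound
    return mean, second_moment

for n in (10, 100, 1000):
    print(n, tail_prob(n), moments(n))
```

So Yn converges to 0 in probability while its moments do not converge to the moments of the constant 0, exactly as argued above.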
441
00:24:05,790 --> 00:24:08,740
So this tells us that
convergence in probability
442
00:24:08,740 --> 00:24:11,380
tells you something,
but it doesn't tell
443
00:24:11,380 --> 00:24:13,310
you the whole story.
444
00:24:13,310 --> 00:24:17,260
Convergence to 0 of a random
variable doesn't imply
445
00:24:17,260 --> 00:24:20,630
anything about convergence
of expected values or of
446
00:24:20,630 --> 00:24:23,420
variances and so on.
447
00:24:23,420 --> 00:24:26,060
So the reason is that
convergence in probability
448
00:24:26,060 --> 00:24:28,470
tells you that this
tail probability
449
00:24:28,470 --> 00:24:30,400
here is very small.
450
00:24:30,400 --> 00:24:34,440
But it doesn't tell you how
far does this tail go.
451
00:24:34,440 --> 00:24:39,390
As in this example, the tail
probability is small, but that
452
00:24:39,390 --> 00:24:43,410
tail sits far away, so it
gives a disproportionate
453
00:24:43,410 --> 00:24:45,950
contribution to the expected
value or the
454
00:24:45,950 --> 00:24:47,200
expected value of the square.
455
00:24:47,200 --> 00:24:53,340
456
00:24:53,340 --> 00:24:53,650
OK.
457
00:24:53,650 --> 00:24:59,000
So now we've got everything that
we need to go back to the
458
00:24:59,000 --> 00:25:02,900
sample mean and study
its properties.
459
00:25:02,900 --> 00:25:05,460
So the setting is
that we have a
460
00:25:05,460 --> 00:25:07,320
sequence of random variables.
461
00:25:07,320 --> 00:25:08,350
They're independent.
462
00:25:08,350 --> 00:25:10,450
They have the same
distribution.
463
00:25:10,450 --> 00:25:12,790
And we assume that they
have a finite mean
464
00:25:12,790 --> 00:25:14,480
and a finite variance.
465
00:25:14,480 --> 00:25:18,430
We're looking at the
sample mean.
466
00:25:18,430 --> 00:25:21,670
Now in principle, you can
calculate the probability
467
00:25:21,670 --> 00:25:25,090
distribution of the sample mean,
because we know how to
468
00:25:25,090 --> 00:25:26,950
find the distributions
of sums of
469
00:25:26,950 --> 00:25:28,320
independent random variables.
470
00:25:28,320 --> 00:25:31,030
You use the convolution
formula over and over.
471
00:25:31,030 --> 00:25:32,870
But this is pretty
complicated, so
472
00:25:32,870 --> 00:25:34,730
let's not look at that.
473
00:25:34,730 --> 00:25:38,920
Let's just look at expected
values, variances, and the
474
00:25:38,920 --> 00:25:42,610
probabilities that the sample
mean is far away
475
00:25:42,610 --> 00:25:44,310
from the true mean.
476
00:25:44,310 --> 00:25:47,470
So what is the expected value
of this random variable?
477
00:25:47,470 --> 00:25:51,260
The expected value of a sum of
random variables is the sum of
478
00:25:51,260 --> 00:25:52,510
the expected values.
479
00:25:52,510 --> 00:25:56,320
480
00:25:56,320 --> 00:26:00,320
And then we have this factor
of n in the denominator.
481
00:26:00,320 --> 00:26:07,040
Each one of these expected
values is mu, so we get mu.
482
00:26:07,040 --> 00:26:13,960
So the sample mean, the average
value of this Mn in
483
00:26:13,960 --> 00:26:18,570
expectation is the same as
the true mean inside our
484
00:26:18,570 --> 00:26:20,620
population.
485
00:26:20,620 --> 00:26:26,560
Now here, this is a fine
conceptual point, there's two
486
00:26:26,560 --> 00:26:29,920
kinds of averages involved
when you write down this
487
00:26:29,920 --> 00:26:31,280
expression.
488
00:26:31,280 --> 00:26:33,310
We understand that
expectations are
489
00:26:33,310 --> 00:26:36,470
some kind of average.
490
00:26:36,470 --> 00:26:40,250
The sample mean is also an
average over the values that
491
00:26:40,250 --> 00:26:42,240
we have observed.
492
00:26:42,240 --> 00:26:45,220
But it's two different
kinds of averages.
493
00:26:45,220 --> 00:26:50,460
The sample mean is the average
of the heights of the penguins
494
00:26:50,460 --> 00:26:54,330
that we collected over
a single expedition.
495
00:26:54,330 --> 00:26:59,600
The expected value is to be
thought of as follows, my
496
00:26:59,600 --> 00:27:02,060
probabilistic experiment
is one expedition
497
00:27:02,060 --> 00:27:04,160
to the South Pole.
498
00:27:04,160 --> 00:27:09,760
Expected value here means
thinking on the average over a
499
00:27:09,760 --> 00:27:12,620
huge number of expeditions.
500
00:27:12,620 --> 00:27:16,270
So my expedition is a random
experiment, I collect random
501
00:27:16,270 --> 00:27:18,520
samples, and I record Mn.
502
00:27:18,520 --> 00:27:21,230
503
00:27:21,230 --> 00:27:27,170
The average result of an
expedition is what we would
504
00:27:27,170 --> 00:27:31,060
get if we were to carry out
a zillion expeditions and
505
00:27:31,060 --> 00:27:35,050
average the averages that we
get at each particular
506
00:27:35,050 --> 00:27:36,090
expedition.
507
00:27:36,090 --> 00:27:39,860
So this Mn is the average during
a single expedition.
508
00:27:39,860 --> 00:27:44,090
This expectation is the average
over an imagined
509
00:27:44,090 --> 00:27:46,125
infinite sequence
of expeditions.
510
00:27:46,125 --> 00:27:49,760
511
00:27:49,760 --> 00:27:52,830
And of course, the other thing
to always keep in mind is that
512
00:27:52,830 --> 00:27:56,910
expectations give you numbers,
whereas the sample mean is
513
00:27:56,910 --> 00:28:00,210
actually a random variable.
514
00:28:00,210 --> 00:28:00,486
All right.
515
00:28:00,486 --> 00:28:03,310
So this random variable,
how random is it?
516
00:28:03,310 --> 00:28:05,610
How big is its variance?
517
00:28:05,610 --> 00:28:10,040
So the variance of a sum of
random variables is the sum of
518
00:28:10,040 --> 00:28:12,370
the variances.
519
00:28:12,370 --> 00:28:16,610
But since we're dividing by n,
when you calculate variances
520
00:28:16,610 --> 00:28:19,580
this brings in a factor
of n-squared.
521
00:28:19,580 --> 00:28:21,215
So the variance is sigma-squared
over n.
522
00:28:21,215 --> 00:28:24,340
523
00:28:24,340 --> 00:28:26,870
And in particular, the variance
of the sample mean
524
00:28:26,870 --> 00:28:28,620
becomes smaller and smaller.
525
00:28:28,620 --> 00:28:31,170
It means that when you estimate
that average height
526
00:28:31,170 --> 00:28:34,570
of penguins, if you take a
large sample, then your
527
00:28:34,570 --> 00:28:37,530
estimate is not going
to be too random.
528
00:28:37,530 --> 00:28:41,120
The randomness in your estimates
becomes small if you
529
00:28:41,120 --> 00:28:43,250
have a large sample size.
530
00:28:43,250 --> 00:28:46,090
Having a large sample size kind
of removes the randomness
531
00:28:46,090 --> 00:28:47,930
from your experiment.
532
00:28:47,930 --> 00:28:52,690
Now let's apply the Chebyshev
inequality to say something
533
00:28:52,690 --> 00:28:56,020
about tail probabilities
for the sample mean.
534
00:28:56,020 --> 00:28:59,610
The probability that you are
more than epsilon away from
535
00:28:59,610 --> 00:29:03,650
the true mean is less than or
equal to the variance of this
536
00:29:03,650 --> 00:29:07,030
quantity divided by this
number squared.
537
00:29:07,030 --> 00:29:09,860
So that's just the translation
of the Chebyshev inequality to
538
00:29:09,860 --> 00:29:12,320
the particular context
we've got here.
539
00:29:12,320 --> 00:29:13,590
We found the variance.
540
00:29:13,590 --> 00:29:15,100
It's sigma-squared over n.
541
00:29:15,100 --> 00:29:18,340
So we end up with
this expression.
542
00:29:18,340 --> 00:29:20,490
So what does this
expression do?
543
00:29:20,490 --> 00:29:25,570
544
00:29:25,570 --> 00:29:32,370
For any given epsilon, if
I fix epsilon, then this
545
00:29:32,370 --> 00:29:36,630
probability, which is less
than sigma-squared over n
546
00:29:36,630 --> 00:29:40,550
epsilon-squared, converges to
0 as n goes to infinity.
547
00:29:40,550 --> 00:29:44,730
548
00:29:44,730 --> 00:29:48,050
And this is just the definition
of convergence in
549
00:29:48,050 --> 00:29:49,690
probability.
550
00:29:49,690 --> 00:29:54,310
If this happens, that the
probability of being more than
551
00:29:54,310 --> 00:29:57,590
epsilon away from the mean, that
probability goes to 0,
552
00:29:57,590 --> 00:30:01,510
and this is true no matter how
I choose my epsilon, then by
553
00:30:01,510 --> 00:30:04,490
definition we have convergence
in probability.
554
00:30:04,490 --> 00:30:08,050
So we have proved that the
sample mean converges in
555
00:30:08,050 --> 00:30:11,430
probability to the true mean.
556
00:30:11,430 --> 00:30:16,210
And this is what the weak law
of large numbers tells us.
557
00:30:16,210 --> 00:30:21,060
So in some vague sense, it
tells us that the sample
558
00:30:21,060 --> 00:30:24,350
means, when you take the
average of many, many
559
00:30:24,350 --> 00:30:28,150
measurements in your sample,
then the sample mean is a good
560
00:30:28,150 --> 00:30:31,870
estimate of the true mean in the
sense that it approaches
561
00:30:31,870 --> 00:30:36,380
the true mean as your sample
size increases.
562
00:30:36,380 --> 00:30:39,220
It approaches the true mean,
but of course in a very
563
00:30:39,220 --> 00:30:42,540
specific sense, in probability,
according to this
564
00:30:42,540 --> 00:30:46,550
notion of convergence
that we have used.
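A small Python sketch (my own illustration, not from the lecture) of the weak law of large numbers: the sample mean of n i.i.d. uniform(0,1) draws concentrates around the true mean 0.5, and the Chebyshev bound sigma^2 / (n * epsilon^2) on the deviation probability shrinks like 1/n.

```python
import random

def sample_mean(n):
    """Mn: the average of n i.i.d. uniform(0,1) draws (true mean mu = 0.5)."""
    return sum(random.random() for _ in range(n)) / n

def chebyshev_bound(sigma2, n, eps):
    """Chebyshev: P(|Mn - mu| >= eps) <= sigma^2 / (n * eps^2)."""
    return sigma2 / (n * eps ** 2)

# Uniform(0,1) has sigma^2 = 1/12; the bound shrinks like 1/n.
random.seed(0)
for n in (100, 10000):
    print(n, sample_mean(n), chebyshev_bound(1.0 / 12.0, n, 0.05))
```

Note that for small n the bound can exceed 1, in which case it is trivially true and says nothing; it only becomes informative once n is large.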
565
00:30:46,550 --> 00:30:51,060
So since we're talking about
sampling, let's go over an
566
00:30:51,060 --> 00:30:56,150
example, which is the typical
situation faced by someone
567
00:30:56,150 --> 00:30:58,110
who's constructing a poll.
568
00:30:58,110 --> 00:31:02,680
So you're interested in some
property of the population.
569
00:31:02,680 --> 00:31:05,590
So what fraction of
the population
570
00:31:05,590 --> 00:31:08,380
prefers Coke to Pepsi?
571
00:31:08,380 --> 00:31:11,080
So there's a number f, which
is that fraction of the
572
00:31:11,080 --> 00:31:12,460
population.
573
00:31:12,460 --> 00:31:16,260
And so this is an
exact number.
574
00:31:16,260 --> 00:31:20,250
So if, out of a population of 100
million, 20 million prefer
575
00:31:20,250 --> 00:31:25,590
Coke, then f would be 0.2.
576
00:31:25,590 --> 00:31:27,970
We want to find out what
that fraction is.
577
00:31:27,970 --> 00:31:30,590
We cannot ask everyone.
578
00:31:30,590 --> 00:31:34,250
What we're going to do is to
take a random sample of people
579
00:31:34,250 --> 00:31:37,300
and ask them for their
preferences.
580
00:31:37,300 --> 00:31:42,690
So the ith person either says
yes for Coke or no.
581
00:31:42,690 --> 00:31:46,430
And we record that by putting
a 1 each time that we get a
582
00:31:46,430 --> 00:31:49,160
yes answer.
583
00:31:49,160 --> 00:31:51,850
And then we form the average
of these x's.
584
00:31:51,850 --> 00:31:53,070
What is this average?
585
00:31:53,070 --> 00:31:57,000
It's the number of 1's that
we got divided by n.
586
00:31:57,000 --> 00:32:02,590
So this is a fraction, but
calculated only on the basis
587
00:32:02,590 --> 00:32:04,880
of the sample that we have.
588
00:32:04,880 --> 00:32:10,260
So you can think of this as
being an estimate, f_hat,
589
00:32:10,260 --> 00:32:13,120
based on the sample
that we have.
590
00:32:13,120 --> 00:32:17,155
Now, even though we used the
lower case letter here, this
591
00:32:17,155 --> 00:32:20,590
f_hat is, of course,
a random variable.
592
00:32:20,590 --> 00:32:23,300
f is a number.
593
00:32:23,300 --> 00:32:27,570
This is the true fraction in
the overall population.
594
00:32:27,570 --> 00:32:30,380
f_hat is the estimate
that we get by using
595
00:32:30,380 --> 00:32:32,300
our particular sample.
596
00:32:32,300 --> 00:32:32,410
OK.
597
00:32:32,410 --> 00:32:38,760
So your boss told you, I need to
know what f is, but go and
598
00:32:38,760 --> 00:32:40,150
do some sampling.
599
00:32:40,150 --> 00:32:42,720
What are you going to respond?
600
00:32:42,720 --> 00:32:46,360
Unless I ask everyone in the
whole population, there's no
601
00:32:46,360 --> 00:32:51,180
way for me to know f exactly.
602
00:32:51,180 --> 00:32:51,890
Right?
603
00:32:51,890 --> 00:32:54,560
There's no way.
604
00:32:54,560 --> 00:32:59,040
OK, so the boss tells you, well
OK, then tell me f
605
00:32:59,040 --> 00:33:00,860
within an accuracy.
606
00:33:00,860 --> 00:33:10,910
I want an answer from you,
your estimate, which is
607
00:33:10,910 --> 00:33:14,930
close to the correct answer
within 1 percentage point.
608
00:33:14,930 --> 00:33:20,260
So if the true f is 0.4, your
answer should be somewhere
609
00:33:20,260 --> 00:33:22,500
between 0.39 and 0.41.
610
00:33:22,500 --> 00:33:25,520
I want a really accurate
answer.
611
00:33:25,520 --> 00:33:27,580
What are you going to say?
612
00:33:27,580 --> 00:33:31,360
Well, there's no guarantee
that my answer
613
00:33:31,360 --> 00:33:33,230
will be within 1 percentage point.
614
00:33:33,230 --> 00:33:37,320
Maybe I'm unlucky and I just
happen to sample the wrong set
615
00:33:37,320 --> 00:33:40,450
of people and my answer
comes out to be wrong.
616
00:33:40,450 --> 00:33:45,800
So I cannot give you a hard
guarantee that this inequality
617
00:33:45,800 --> 00:33:47,240
will be satisfied.
618
00:33:47,240 --> 00:33:51,990
But perhaps, I can give you a
guarantee that this inequality
619
00:33:51,990 --> 00:33:55,680
will be satisfied, this accuracy
requirement will be
620
00:33:55,680 --> 00:33:59,340
satisfied, with high
confidence.
621
00:33:59,340 --> 00:34:02,520
That is, there's going to be
a small probability that
622
00:34:02,520 --> 00:34:04,420
things go wrong, that
I'm unlucky
623
00:34:04,420 --> 00:34:07,030
and I use a bad sample.
624
00:34:07,030 --> 00:34:10,750
But leaving aside that smaller
probability of being unlucky,
625
00:34:10,750 --> 00:34:13,989
my answer will be accurate
within the accuracy
626
00:34:13,989 --> 00:34:16,100
requirement that you have.
627
00:34:16,100 --> 00:34:20,500
So these two numbers are the
usual specs that one has when
628
00:34:20,500 --> 00:34:22,010
designing polls.
629
00:34:22,010 --> 00:34:27,370
So this number is the accuracy
that we want.
630
00:34:27,370 --> 00:34:29,300
It's the desired accuracy.
631
00:34:29,300 --> 00:34:35,239
And this number has to do with
the confidence that we want.
632
00:34:35,239 --> 00:34:40,210
So 1 minus that number, we could
call it the confidence
633
00:34:40,210 --> 00:34:43,500
that we want out
of our sample.
634
00:34:43,500 --> 00:34:47,820
So this is really 1
minus confidence.
635
00:34:47,820 --> 00:34:51,830
So now your job is to figure out
how large an n, how large
636
00:34:51,830 --> 00:34:56,219
a sample should you be using, in
order to satisfy the specs
637
00:34:56,219 --> 00:34:59,060
that your boss gave you.
638
00:34:59,060 --> 00:35:02,560
All you know at this stage is
the Chebyshev inequality.
639
00:35:02,560 --> 00:35:05,210
So you just try to use it.
640
00:35:05,210 --> 00:35:09,780
The probability of getting an
answer that's more than 0.01
641
00:35:09,780 --> 00:35:14,780
away from the true answer is, by
Chebyshev's inequality, the
642
00:35:14,780 --> 00:35:20,170
variance of this random variable
divided by this
643
00:35:20,170 --> 00:35:21,540
number squared.
644
00:35:21,540 --> 00:35:25,870
The variance, as we argued
a little earlier, is the
645
00:35:25,870 --> 00:35:29,190
variance of the x's
divided by n.
646
00:35:29,190 --> 00:35:31,830
So we get this expression.
647
00:35:31,830 --> 00:35:35,230
So we would like this
number to be less
648
00:35:35,230 --> 00:35:38,330
than or equal to 0.05.
649
00:35:38,330 --> 00:35:41,620
OK, here we hit a little
bit of a difficulty.
650
00:35:41,620 --> 00:35:49,040
The variance, (sigma_x)-squared,
what is it?
651
00:35:49,040 --> 00:35:54,010
(Sigma_x)-squared is, if you
remember the variance of a
652
00:35:54,010 --> 00:35:58,010
Bernoulli random variable,
is this quantity.
653
00:35:58,010 --> 00:35:59,730
But we don't know it.
654
00:35:59,730 --> 00:36:02,880
f is what we're trying to
estimate in the first place.
655
00:36:02,880 --> 00:36:06,790
So the variance is not known,
so I cannot plug in a number
656
00:36:06,790 --> 00:36:08,080
inside here.
657
00:36:08,080 --> 00:36:12,340
What I can do is to be
conservative and use an upper
658
00:36:12,340 --> 00:36:14,050
bound of the variance.
659
00:36:14,050 --> 00:36:17,280
How large can this number get?
660
00:36:17,280 --> 00:36:20,090
Well, you can plot
f times (1-f).
661
00:36:20,090 --> 00:36:25,950
662
00:36:25,950 --> 00:36:26,750
It's a parabola.
663
00:36:26,750 --> 00:36:29,420
It has a root at 0 and at 1.
664
00:36:29,420 --> 00:36:34,450
So the maximum value is going to
be, by symmetry, at 1/2 and
665
00:36:34,450 --> 00:36:39,350
when f is 1/2, then this
variance becomes 1/4.
666
00:36:39,350 --> 00:36:42,340
So I don't know
(sigma_x)-squared, but I'm
667
00:36:42,340 --> 00:36:45,480
going to use the worst case
value for (sigma_x)-squared,
668
00:36:45,480 --> 00:36:48,480
which is 1/4.
669
00:36:48,480 --> 00:36:53,320
And this is now an inequality
that I know to be always true.
670
00:36:53,320 --> 00:36:56,910
I've got my specs, and my specs
tell me that I want this
671
00:36:56,910 --> 00:36:59,800
number to be less than 0.05.
672
00:36:59,800 --> 00:37:04,980
And given what I know, the best
thing I can do is to say,
673
00:37:04,980 --> 00:37:07,860
OK, I'm going to take
this number and make
674
00:37:07,860 --> 00:37:14,070
it less than 0.05.
675
00:37:14,070 --> 00:37:20,860
If I choose my n so that this
is less than 0.05, then I'm
676
00:37:20,860 --> 00:37:24,890
certain that this probability
is also less than 0.05.
677
00:37:24,890 --> 00:37:28,720
What does it take for this
inequality to be true?
678
00:37:28,720 --> 00:37:36,370
You can solve for n here, and
you find that to satisfy this
679
00:37:36,370 --> 00:37:40,780
inequality, n should be larger
than or equal to 50,000.
680
00:37:40,780 --> 00:37:44,250
So you can just let n
be equal to 50,000.
681
00:37:44,250 --> 00:37:47,920
So the Chebyshev inequality
tells us that if you take n
682
00:37:47,920 --> 00:37:51,940
equal to 50,000, then by the
Chebyshev inequality, we're
683
00:37:51,940 --> 00:37:57,850
guaranteed to satisfy the specs
that we were given.
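The sample-size calculation above can be sketched in a few lines of Python (an illustration of mine, not from the lecture; I use exact fractions so the arithmetic comes out cleanly): we want 1/(4*n*epsilon^2) <= delta, with the worst-case Bernoulli variance f(1-f) <= 1/4.

```python
from fractions import Fraction
import math

def chebyshev_sample_size(eps, delta):
    """Smallest n with 1/(4*n*eps^2) <= delta, using the worst-case
    Bernoulli variance f*(1-f) <= 1/4.  Arguments are passed as strings
    so that Fraction keeps the arithmetic exact."""
    eps, delta = Fraction(eps), Fraction(delta)
    return math.ceil(1 / (4 * eps ** 2 * delta))

# The specs from the lecture: accuracy 0.01, error probability 0.05.
print(chebyshev_sample_size("0.01", "0.05"))  # 50000
# Relaxing the accuracy to 3 percentage points shrinks n by a factor of 9.
print(chebyshev_sample_size("0.03", "0.05"))
```

Because epsilon enters squared, loosening the accuracy requirement is by far the cheapest way to reduce the required sample size.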
684
00:37:57,850 --> 00:37:57,960
OK.
685
00:37:57,960 --> 00:38:03,950
Now, 50,000 is a bit of
a large sample size.
686
00:38:03,950 --> 00:38:05,980
Right?
687
00:38:05,980 --> 00:38:09,490
If you read anything in the
newspapers where they say so
688
00:38:09,490 --> 00:38:13,230
much of the voters think this
and that, this was determined
689
00:38:13,230 --> 00:38:19,830
on the basis of a sample of
1,200 likely voters or so.
690
00:38:19,830 --> 00:38:23,430
So the numbers that you will
typically see in these news
691
00:38:23,430 --> 00:38:27,590
items about polling, they
usually involve sample sizes
692
00:38:27,590 --> 00:38:30,080
of about 1,000 or so.
693
00:38:30,080 --> 00:38:35,250
You will never see a sample
size of 50,000.
694
00:38:35,250 --> 00:38:37,230
That's too much.
695
00:38:37,230 --> 00:38:41,670
So where can we cut
some corners?
696
00:38:41,670 --> 00:38:46,390
Well, we can cut corners
basically in three places.
697
00:38:46,390 --> 00:38:49,950
This requirement is a
little too tight.
698
00:38:49,950 --> 00:38:53,530
Newspaper stories will usually
tell you, we have an accuracy
699
00:38:53,530 --> 00:38:58,800
of +/- 3 percentage points,
instead of 1 percentage point.
700
00:38:58,800 --> 00:39:03,770
And because this number comes up
as a square, by making it 3
701
00:39:03,770 --> 00:39:09,000
percentage points instead of 1, saves
you a factor of 10.
702
00:39:09,000 --> 00:39:12,790
Then, the five percent
confidence, I guess that's
703
00:39:12,790 --> 00:39:15,180
usually OK.
704
00:39:15,180 --> 00:39:19,400
If we use that factor of 10
that we gained from here,
705
00:39:19,400 --> 00:39:23,730
then we get a sample
size of 5,000.
706
00:39:23,730 --> 00:39:25,980
And that's, again,
a little too big.
707
00:39:25,980 --> 00:39:28,140
So where can we fix things?
708
00:39:28,140 --> 00:39:31,140
Well, it turns out that this
inequality that we're using
709
00:39:31,140 --> 00:39:34,660
here, Chebyshev's inequality,
is just an inequality.
710
00:39:34,660 --> 00:39:36,890
It's not that tight.
711
00:39:36,890 --> 00:39:38,850
It's not very accurate.
712
00:39:38,850 --> 00:39:42,800
Maybe there's a better way of
calculating or estimating this
713
00:39:42,800 --> 00:39:46,760
quantity, which is smaller
than this.
714
00:39:46,760 --> 00:39:49,770
And using a more accurate
inequality or a more accurate
715
00:39:49,770 --> 00:39:55,320
bound, then we can convince
ourselves that we can settle
716
00:39:55,320 --> 00:39:57,800
with a smaller sample size.
717
00:39:57,800 --> 00:40:01,770
This more accurate kind of
inequality comes out of a
718
00:40:01,770 --> 00:40:04,140
different limit theorem,
which is the next limit
719
00:40:04,140 --> 00:40:06,030
theorem we're going
to consider.
720
00:40:06,030 --> 00:40:08,310
We're going to start the
discussion today, but we're
721
00:40:08,310 --> 00:40:12,150
going to continue with
it next week.
722
00:40:12,150 --> 00:40:18,750
Before I tell you exactly what
that other limit theorem says,
723
00:40:18,750 --> 00:40:20,800
let me give you the
big picture of
724
00:40:20,800 --> 00:40:24,760
what's involved here.
725
00:40:24,760 --> 00:40:29,170
We're dealing with sums of
i.i.d random variables.
726
00:40:29,170 --> 00:40:32,300
Each X has a distribution
of its own.
727
00:40:32,300 --> 00:40:34,840
728
00:40:34,840 --> 00:40:41,190
So suppose that X has a
distribution which is
729
00:40:41,190 --> 00:40:43,090
something like this.
730
00:40:43,090 --> 00:40:48,560
This is the density of X. If I
add lots of X's together, what
731
00:40:48,560 --> 00:40:51,460
kind of distribution
do I expect?
732
00:40:51,460 --> 00:40:55,170
The mean is going to be
n times the mean of an
733
00:40:55,170 --> 00:41:00,560
individual X. So if this is mu,
I'm going to get a mean of
734
00:41:00,560 --> 00:41:02,730
n times mu.
735
00:41:02,730 --> 00:41:06,620
But my variance will
also increase.
736
00:41:06,620 --> 00:41:08,050
When I add the random
variables,
737
00:41:08,050 --> 00:41:10,190
I'm adding the variances.
738
00:41:10,190 --> 00:41:13,370
So since the variance increases,
we're going to get
739
00:41:13,370 --> 00:41:17,610
a distribution that's
pretty wide.
740
00:41:17,610 --> 00:41:23,240
So this is the density of X1
plus all the way to Xn.
741
00:41:23,240 --> 00:41:27,640
So as n increases, my
distribution shifts, because
742
00:41:27,640 --> 00:41:28,770
the mean is positive.
743
00:41:28,770 --> 00:41:30,610
So I keep adding things.
744
00:41:30,610 --> 00:41:33,870
And also, my distribution
becomes wider and wider.
745
00:41:33,870 --> 00:41:36,080
The variance increases.
746
00:41:36,080 --> 00:41:39,260
Well, we considered a different
scaling.
747
00:41:39,260 --> 00:41:42,980
We considered a scaled version of
this quantity when we looked
748
00:41:42,980 --> 00:41:46,180
at the weak law of
large numbers.
749
00:41:46,180 --> 00:41:49,580
In the weak law of large
numbers, we take this random
750
00:41:49,580 --> 00:41:52,140
variable and divide it by n.
751
00:41:52,140 --> 00:41:56,300
And what the weak law tells us
is that we're going to get a
752
00:41:56,300 --> 00:42:01,050
distribution that's very highly
concentrated around the
753
00:42:01,050 --> 00:42:03,650
true mean, which is mu.
754
00:42:03,650 --> 00:42:07,520
So this here would be the
density of X1 plus
755
00:42:07,520 --> 00:42:12,630
Xn divided by n.
756
00:42:12,630 --> 00:42:16,660
Because I've divided by n, the
mean has become the original
757
00:42:16,660 --> 00:42:19,410
mean, which is mu.
758
00:42:19,410 --> 00:42:22,620
But the weak law of large
numbers tells us that the
759
00:42:22,620 --> 00:42:26,650
distribution of this random
variable is very concentrated
760
00:42:26,650 --> 00:42:27,810
around the mean.
761
00:42:27,810 --> 00:42:29,850
So we get a distribution
that's very
762
00:42:29,850 --> 00:42:31,520
narrow, like this.
763
00:42:31,520 --> 00:42:34,230
In the limit, this distribution
becomes one
764
00:42:34,230 --> 00:42:37,570
that's just concentrated
on top of mu.
765
00:42:37,570 --> 00:42:40,930
So it's sort of a degenerate
distribution.
766
00:42:40,930 --> 00:42:46,070
So these are two extremes, no
scaling for the sum, a scaling
767
00:42:46,070 --> 00:42:47,740
where we divide by n.
768
00:42:47,740 --> 00:42:50,680
In this extreme, we get the
trivial case of a distribution
769
00:42:50,680 --> 00:42:52,860
that flattens out completely.
770
00:42:52,860 --> 00:42:56,070
In this scaling, we get a
distribution that gets
771
00:42:56,070 --> 00:42:59,150
concentrated around
a single point.
772
00:42:59,150 --> 00:43:02,030
Again, we look at some
intermediate scaling that
773
00:43:02,030 --> 00:43:04,050
makes things more interesting.
774
00:43:04,050 --> 00:43:09,700
Things do become interesting
if we scale by dividing the
775
00:43:09,700 --> 00:43:14,520
sum by square root of n instead
of dividing by n.
776
00:43:14,520 --> 00:43:17,210
What effect does this have?
777
00:43:17,210 --> 00:43:22,510
When we scale by dividing by
square root of n, the variance
778
00:43:22,510 --> 00:43:28,050
of Sn over square root of n is
going to be the variance of Sn
779
00:43:28,050 --> 00:43:30,760
divided by n.
780
00:43:30,760 --> 00:43:32,780
That's how variances behave.
781
00:43:32,780 --> 00:43:37,370
The variance of Sn is n
sigma-squared, divide by n,
782
00:43:37,370 --> 00:43:41,330
which is sigma squared, which
means that when we scale in
783
00:43:41,330 --> 00:43:45,940
this particular way,
as n changes, the
784
00:43:45,940 --> 00:43:48,230
variance doesn't change.
785
00:43:48,230 --> 00:43:50,300
So the width of our
distribution
786
00:43:50,300 --> 00:43:52,190
will be sort of constant.
787
00:43:52,190 --> 00:43:56,360
The distribution changes shape,
but it doesn't become
788
00:43:56,360 --> 00:43:59,910
narrower as was the case here.
789
00:43:59,910 --> 00:44:04,550
It doesn't become wider, kind
of keeps the same width.
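The three scalings discussed here can be tabulated exactly in a short Python sketch (my own illustration; the function name is mine). For i.i.d. terms, variances add, and dividing a random variable by a constant c divides its variance by c squared.

```python
def scaled_variances(n, sigma2):
    """Exact variances of Sn, Sn/n, and Sn/sqrt(n) when X1,...,Xn are
    i.i.d. with variance sigma2 (variances of independent terms add)."""
    var_sum = n * sigma2        # Var(Sn) = n * sigma^2: grows with n
    var_mean = var_sum / n**2   # Var(Sn/n) = sigma^2 / n: shrinks to 0
    var_sqrt = var_sum / n      # Var(Sn/sqrt(n)) = sigma^2: constant in n
    return var_sum, var_mean, var_sqrt

for n in (10, 1000, 100000):
    print(n, scaled_variances(n, 1.0))
```

Only the square-root scaling keeps the width constant as n grows, which is why it is the interesting intermediate case.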
790
00:44:04,550 --> 00:44:09,260
So perhaps in the limit, this
distribution is going to take
791
00:44:09,260 --> 00:44:11,080
an interesting shape.
792
00:44:11,080 --> 00:44:14,170
And that's indeed the case.
793
00:44:14,170 --> 00:44:19,800
So let's do what
we did before.
794
00:44:19,800 --> 00:44:25,110
So we're looking at the sum, and
we want to divide the sum
795
00:44:25,110 --> 00:44:28,860
by something that goes like
square root of n.
796
00:44:28,860 --> 00:44:33,140
So the variance of Sn
is n sigma squared.
797
00:44:33,140 --> 00:44:38,240
The standard deviation of Sn
is the square root of that.
798
00:44:38,240 --> 00:44:39,570
It's this number.
799
00:44:39,570 --> 00:44:43,930
So effectively, we're scaling
by order of square root n.
800
00:44:43,930 --> 00:44:47,570
Now, I'm doing another
thing here.
801
00:44:47,570 --> 00:44:52,350
If my random variable has a
positive mean, then this
802
00:44:52,350 --> 00:44:55,470
quantity is going to
have a mean that's
803
00:44:55,470 --> 00:44:56,950
positive and growing.
804
00:44:56,950 --> 00:44:59,450
It's going to be shifting
to the right.
805
00:44:59,450 --> 00:45:01,350
Why is that?
806
00:45:01,350 --> 00:45:04,370
Sn has a mean that's
proportional to n.
807
00:45:04,370 --> 00:45:09,510
When I divide by square root n,
then it means that the mean
808
00:45:09,510 --> 00:45:11,990
scales like square root of n.
809
00:45:11,990 --> 00:45:14,740
So my distribution would
still keep shifting
810
00:45:14,740 --> 00:45:16,720
after I do this division.
811
00:45:16,720 --> 00:45:20,860
I want to keep my distribution
in place, so I subtract out
812
00:45:20,860 --> 00:45:23,920
the mean of Sn.
813
00:45:23,920 --> 00:45:29,580
So what we're doing here is
a standard technique or
814
00:45:29,580 --> 00:45:32,670
transformation where you take
a random variable and you
815
00:45:32,670 --> 00:45:34,890
so-called standardize it.
816
00:45:34,890 --> 00:45:38,500
I remove the mean of that random
variable and I divide
817
00:45:38,500 --> 00:45:40,100
by the standard deviation.
818
00:45:40,100 --> 00:45:43,030
This results in a random
variable that has 0 mean and
819
00:45:43,030 --> 00:45:44,960
unit variance.
820
00:45:44,960 --> 00:45:49,880
What Zn measures is the
following, Zn tells me how
821
00:45:49,880 --> 00:45:55,520
many standard deviations am
I away from the mean.
822
00:45:55,520 --> 00:45:59,380
Sn minus (n times expected value
of X) tells me how much
823
00:45:59,380 --> 00:46:02,980
is Sn away from the
mean value of Sn.
824
00:46:02,980 --> 00:46:06,250
And by dividing by the standard
deviation of Sn --
825
00:46:06,250 --> 00:46:09,830
this tells me how many standard
deviations away from
826
00:46:09,830 --> 00:46:12,550
the mean am I.
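Standardizing really does produce zero mean and unit variance by construction, as this small Python sketch checks symbolically (my own illustration, not from the lecture): subtracting the mean of Sn kills the mean, and dividing by the standard deviation of Sn normalizes the variance.

```python
import math

def standardize_params(n, mu, sigma):
    """Mean and variance of Zn = (Sn - n*mu) / (sqrt(n)*sigma), where Sn
    is a sum of n i.i.d. terms with mean mu and standard deviation sigma."""
    mean_Sn = n * mu
    std_Sn = math.sqrt(n) * sigma
    mean_Zn = (mean_Sn - n * mu) / std_Sn  # subtracting the mean gives 0
    var_Zn = (std_Sn / std_Sn) ** 2        # dividing by std(Sn) gives variance 1
    return mean_Zn, var_Zn

print(standardize_params(100, 0.5, math.sqrt(1.0 / 12.0)))  # (0.0, 1.0)
```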
827
00:46:12,550 --> 00:46:15,360
So we're going to look at this
random variable, which is just
828
00:46:15,360 --> 00:46:17,260
a transformation Zn.
829
00:46:17,260 --> 00:46:20,840
It's a linear transformation
of Sn.
830
00:46:20,840 --> 00:46:24,740
And we're going to compare
this random variable to a
831
00:46:24,740 --> 00:46:27,230
standard normal random
variable.
832
00:46:27,230 --> 00:46:30,610
So a standard normal is the
random variable that you are
833
00:46:30,610 --> 00:46:35,200
familiar with, given by the
usual formula, and for which
834
00:46:35,200 --> 00:46:37,400
we have tables for it.
835
00:46:37,400 --> 00:46:40,400
This Zn has 0 mean and
unit variance.
836
00:46:40,400 --> 00:46:44,220
So in that respect, it has the
same statistics as the
837
00:46:44,220 --> 00:46:45,655
standard normal.
838
00:46:45,655 --> 00:46:48,960
The distribution of Zn
could be anything --
839
00:46:48,960 --> 00:46:50,770
can be pretty messy.
840
00:46:50,770 --> 00:46:53,320
But there is this amazing
theorem called the central
841
00:46:53,320 --> 00:46:58,250
limit theorem that tells us that
the distribution of Zn
842
00:46:58,250 --> 00:47:01,930
approaches the distribution of
the standard normal in the
843
00:47:01,930 --> 00:47:06,270
following sense: the
probabilities that you can
844
00:47:06,270 --> 00:47:07,080
calculate --
845
00:47:07,080 --> 00:47:07,930
of this type --
846
00:47:07,930 --> 00:47:10,350
that you can calculate
for Zn --
847
00:47:10,350 --> 00:47:13,330
in the limit become the same as
the probabilities that you
848
00:47:13,330 --> 00:47:17,590
would get from the standard
normal tables for Z.
849
00:47:17,590 --> 00:47:19,750
It's a statement about
the cumulative
850
00:47:19,750 --> 00:47:21,960
distribution functions.
851
00:47:21,960 --> 00:47:25,060
This quantity, as a function
of c, is the cumulative
852
00:47:25,060 --> 00:47:27,920
distribution function of
the random variable Zn.
853
00:47:27,920 --> 00:47:30,860
This is the cumulative
distribution function of the
854
00:47:30,860 --> 00:47:32,190
standard normal.
855
00:47:32,190 --> 00:47:34,530
The central limit theorem tells
us that the cumulative
856
00:47:34,530 --> 00:47:39,340
distribution function of the
sum of a number of random
857
00:47:39,340 --> 00:47:43,040
variables, after they're
appropriately standardized,
858
00:47:43,040 --> 00:47:46,480
approaches the cumulative
distribution function over the
859
00:47:46,480 --> 00:47:50,580
standard normal distribution.
860
00:47:50,580 --> 00:47:53,620
In particular, this tells
us that we can calculate
861
00:47:53,620 --> 00:47:59,480
probabilities for Zn when n is
large by calculating instead
862
00:47:59,480 --> 00:48:02,800
probabilities for Z. And that's
going to be a good
863
00:48:02,800 --> 00:48:04,020
approximation.
864
00:48:04,020 --> 00:48:07,670
Probabilities for Z are easy to
calculate because they're
865
00:48:07,670 --> 00:48:09,250
well tabulated.
866
00:48:09,250 --> 00:48:12,820
So we get a very nice shortcut
for calculating
867
00:48:12,820 --> 00:48:14,990
probabilities for Zn.
868
00:48:14,990 --> 00:48:17,990
Now, it's not Zn that you're
interested in.
869
00:48:17,990 --> 00:48:20,890
What you're interested
in is Sn.
870
00:48:20,890 --> 00:48:23,820
And Sn --
871
00:48:23,820 --> 00:48:29,080
inverting this relation
here --
872
00:48:29,080 --> 00:48:38,330
Sn is square root n sigma
Zn plus n expected
873
00:48:38,330 --> 00:48:42,602
value of X. All right.
874
00:48:42,602 --> 00:48:46,620
Now, if you can calculate
probabilities for Zn, even
875
00:48:46,620 --> 00:48:49,380
approximately, then you can
certainly calculate
876
00:48:49,380 --> 00:48:53,290
probabilities for Sn, because
one is a linear
877
00:48:53,290 --> 00:48:55,206
function of the other.
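[The step just described, passing from probabilities for Zn back to probabilities for Sn through the linear relation, can be sketched as follows. This is an illustration, not from the lecture; the Bernoulli coin-flip example at the end is a hypothetical choice of X.]

```python
import math

def normal_approx_P_Sn_le(c, n, mean_x, sigma_x):
    # Standardize: Zn = (Sn - n*mean_x) / (sqrt(n)*sigma_x), so
    # P(Sn <= c) ~ Phi((c - n*mean_x) / (sqrt(n)*sigma_x)),
    # where Phi is the standard normal CDF.
    z = (c - n * mean_x) / (math.sqrt(n) * sigma_x)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example: Sn = number of heads in n = 100 fair coin flips
# (X is Bernoulli(1/2), so E[X] = 0.5 and sigma = 0.5).
# By symmetry, P(Sn <= 50) should come out to about 0.5.
print(normal_approx_P_Sn_le(50, 100, 0.5, 0.5))
```
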
878
00:48:55,206 --> 00:48:58,710
And we're going to do a little
bit of that next time.
879
00:48:58,710 --> 00:49:02,220
You're going to get, also, some
practice in recitation.
880
00:49:02,220 --> 00:49:04,975
At a vaguer level, you could
describe the central
881
00:49:04,975 --> 00:49:08,270
limit theorem as saying the
following, when n is large,
882
00:49:08,270 --> 00:49:12,160
you can pretend that Zn is
a standard normal random
883
00:49:12,160 --> 00:49:15,440
variable and do the calculations
as if Zn was
884
00:49:15,440 --> 00:49:16,680
standard normal.
885
00:49:16,680 --> 00:49:21,530
Now, pretending that Zn is
normal is the same as
886
00:49:21,530 --> 00:49:25,900
pretending that Sn is normal,
because Sn is a linear
887
00:49:25,900 --> 00:49:27,700
function of Zn.
888
00:49:27,700 --> 00:49:30,400
And we know that linear
functions of normal random
889
00:49:30,400 --> 00:49:32,140
variables are normal.
890
00:49:32,140 --> 00:49:36,290
So the central limit theorem
essentially tells us that we
891
00:49:36,290 --> 00:49:40,070
can pretend that Sn is a normal
random variable and do
892
00:49:40,070 --> 00:49:44,760
the calculations just as if it
were a normal random variable.
893
00:49:44,760 --> 00:49:47,020
Mathematically speaking though,
the central limit
894
00:49:47,020 --> 00:49:50,480
theorem does not talk about
the distribution of Sn,
895
00:49:50,480 --> 00:49:54,940
because the distribution of Sn
becomes degenerate in the
896
00:49:54,940 --> 00:49:57,650
limit, just a very flat
and long thing.
897
00:49:57,650 --> 00:49:59,810
So strictly speaking
mathematically, it's a
898
00:49:59,810 --> 00:50:03,060
statement about cumulative
distributions of Zn's.
899
00:50:03,060 --> 00:50:06,420
Practically, the way you use it
is by just pretending that
900
00:50:06,420 --> 00:50:08,415
Sn is normal.
901
00:50:08,415 --> 00:50:09,400
Very good.
902
00:50:09,400 --> 00:50:11,080
Enjoy the Thanksgiving Holiday.
903
00:50:11,080 --> 00:50:12,330