WEBVTT
00:00:00.160 --> 00:00:02.340
The weak law of large
numbers tells us
00:00:02.340 --> 00:00:03.640
that the sample mean--
00:00:03.640 --> 00:00:06.480
that is, the average of
independent identically
00:00:06.480 --> 00:00:09.260
distributed random
variables, Xi--
00:00:09.260 --> 00:00:12.960
converges, in a certain sense,
to a number, namely the
00:00:12.960 --> 00:00:16.700
expected value of the random
variables, Xi.
00:00:16.700 --> 00:00:19.750
But it does not tell us much
about the details of the
00:00:19.750 --> 00:00:23.260
distribution of the
sample mean.
00:00:23.260 --> 00:00:26.210
The central limit theorem
provides us exactly with this
00:00:26.210 --> 00:00:27.590
kind of detail.
00:00:27.590 --> 00:00:31.470
It tells us that the sum of many
independent identically
00:00:31.470 --> 00:00:35.260
distributed random variables
has approximately a normal
00:00:35.260 --> 00:00:36.840
distribution.
00:00:36.840 --> 00:00:40.300
The mean and variance of this
normal are easy to find if we
00:00:40.300 --> 00:00:43.820
know the mean and variance of
the original random variables.
00:00:43.820 --> 00:00:48.330
This enables us to carry out
approximate calculations
00:00:48.330 --> 00:00:52.650
rather quickly by using
the normal tables.
00:00:52.650 --> 00:00:55.630
We will start with a precise
statement of the central limit
00:00:55.630 --> 00:01:00.290
theorem, and we will emphasize
that it is a universal result.
00:01:00.290 --> 00:01:03.870
It holds no matter what the
distribution of the original
00:01:03.870 --> 00:01:08.970
random variables, and for this
reason, it is very useful.
00:01:08.970 --> 00:01:12.300
We will work through several
examples of the typical ways
00:01:12.300 --> 00:01:15.200
that the central limit
theorem is used.
00:01:15.200 --> 00:01:18.730
We will develop a refinement
that can be used when we are
00:01:18.730 --> 00:01:21.940
dealing with discrete
distributions, which provides
00:01:21.940 --> 00:01:25.300
us with even more accurate
approximations.
00:01:25.300 --> 00:01:29.190
And finally, we will revisit
the polling problem, and
00:01:29.190 --> 00:01:32.800
inquire again about the number
of samples that are needed to
00:01:32.800 --> 00:01:37.160
obtain a certain accuracy with
a certain confidence.
00:01:37.160 --> 00:01:40.140
We will see that the central
limit theorem is much more
00:01:40.140 --> 00:01:44.520
informative, much less
conservative, compared to the
00:01:44.520 --> 00:01:48.050
conclusions that we had gotten
before based on the Chebyshev
00:01:48.050 --> 00:01:49.300
inequality.