WEBVTT

00:00:00.160 --> 00:00:02.340
The weak law of large
numbers tells us

00:00:02.340 --> 00:00:03.640
that the sample mean--

00:00:03.640 --> 00:00:06.480
that is, the average of
independent identically

00:00:06.480 --> 00:00:09.260
distributed random
variables, Xi--

00:00:09.260 --> 00:00:12.960
converges, in a certain sense,
to a number, namely the

00:00:12.960 --> 00:00:16.700
expected value of the random
variables, Xi.

00:00:16.700 --> 00:00:19.750
But it does not tell us much
about the details of the

00:00:19.750 --> 00:00:23.260
distribution of the
sample mean.

00:00:23.260 --> 00:00:26.210
The central limit theorem
provides us exactly with this

00:00:26.210 --> 00:00:27.590
kind of detail.

00:00:27.590 --> 00:00:31.470
It tells us that the sum of many
independent identically

00:00:31.470 --> 00:00:35.260
distributed random variables
has approximately a normal

00:00:35.260 --> 00:00:36.840
distribution.

00:00:36.840 --> 00:00:40.300
The mean and variance of this
normal are easy to find if we

00:00:40.300 --> 00:00:43.820
know the mean and variance of
the original random variables.

00:00:43.820 --> 00:00:48.330
This enables us to carry out
approximate calculations

00:00:48.330 --> 00:00:52.650
rather quickly by using
the normal tables.

00:00:52.650 --> 00:00:55.630
We will start with a precise
statement of the central limit

00:00:55.630 --> 00:01:00.290
theorem, and we will emphasize
that it is a universal result.

00:01:00.290 --> 00:01:03.870
It holds no matter what the
distribution of the original

00:01:03.870 --> 00:01:08.970
random variables, and for this
reason, it is very useful.

00:01:08.970 --> 00:01:12.300
We will work through several
examples of the typical ways

00:01:12.300 --> 00:01:15.200
that the central limit
theorem is used.

00:01:15.200 --> 00:01:18.730
We will develop a refinement
that can be used when we are

00:01:18.730 --> 00:01:21.940
dealing with discrete
distributions, which provides

00:01:21.940 --> 00:01:25.300
us with even more accurate
approximations.

00:01:25.300 --> 00:01:29.190
And finally, we will revisit
the polling problem, and

00:01:29.190 --> 00:01:32.800
inquire again about the number
of samples that are needed to

00:01:32.800 --> 00:01:37.160
obtain a certain accuracy with
a certain confidence.

00:01:37.160 --> 00:01:40.140
We will see that the central
limit theorem is much more

00:01:40.140 --> 00:01:44.520
informative, much less
conservative, compared to the

00:01:44.520 --> 00:01:48.050
conclusions that we had gotten
before based on the Chebyshev

00:01:48.050 --> 00:01:49.300
inequality.