WEBVTT
00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative
00:00:02.960 --> 00:00:04.370
Commons license.
00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to
00:00:07.410 --> 00:00:11.060
offer high-quality educational
resources for free.
00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from
00:00:13.960 --> 00:00:19.790
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:19.790 --> 00:00:21.040
ocw.mit.edu.
00:00:23.830 --> 00:00:25.630
PROFESSOR: OK, let's
get started.
00:00:25.630 --> 00:00:28.430
We'll start a minute early and
then maybe we can finish a
00:00:28.430 --> 00:00:31.230
minute early if we're lucky.
00:00:31.230 --> 00:00:35.320
Today we want to talk about
the laws of large numbers.
00:00:35.320 --> 00:00:38.310
We want to talk about
convergence a little bit.
00:00:38.310 --> 00:00:41.170
We will not really get into
the strong law of large
00:00:41.170 --> 00:00:45.820
numbers, which we're going to
do later, because that's a
00:00:45.820 --> 00:00:48.590
kind of a mysterious,
difficult topic.
00:00:48.590 --> 00:00:53.120
And I wanted to put off really
talking about that until we
00:00:53.120 --> 00:00:55.480
got to the point where we could
make some use of it,
00:00:55.480 --> 00:00:57.810
which is not quite yet.
00:00:57.810 --> 00:00:59.970
So first, I want to review
what we've done
00:00:59.970 --> 00:01:01.220
just a little bit.
00:01:04.370 --> 00:01:08.890
We've said that probability
models are very natural things
00:01:08.890 --> 00:01:13.380
for real-world situations,
particularly those that are
00:01:13.380 --> 00:01:15.600
repeatable.
00:01:15.600 --> 00:01:22.940
And by repeatable, I mean they
use trials, which have
00:01:22.940 --> 00:01:25.980
essentially the same
initial conditions.
00:01:25.980 --> 00:01:28.130
They're essentially isolated
from each other.
00:01:28.130 --> 00:01:31.270
When I say they're isolated from
each other, I mean there
00:01:31.270 --> 00:01:33.470
isn't any apparent
contact that they
00:01:33.470 --> 00:01:35.340
have with each other.
00:01:35.340 --> 00:01:39.970
So for example, when you're
flipping coins, there's not
00:01:39.970 --> 00:01:42.260
one very unusual coin,
and that's the coin
00:01:42.260 --> 00:01:43.810
you use all the time.
00:01:43.810 --> 00:01:45.910
And then you try to use
those results to be
00:01:45.910 --> 00:01:48.700
typical of all coins.
00:01:48.700 --> 00:01:50.920
They have a fixed set of possible
outcomes for
00:01:50.920 --> 00:01:52.440
these multiple trials.
00:01:52.440 --> 00:01:56.430
And they have essentially
random individual outcomes.
00:01:56.430 --> 00:01:59.860
Now, you'll see there's a real
problem here when I use the
00:01:59.860 --> 00:02:04.270
word "random" there and
"probability models" there
00:02:04.270 --> 00:02:06.130
because there is something
inherently
00:02:06.130 --> 00:02:08.440
circular in this argument.
00:02:08.440 --> 00:02:11.190
It's something that always
happens when you get into
00:02:11.190 --> 00:02:14.760
modeling where you're trying to
take the messy real world
00:02:14.760 --> 00:02:19.090
and turn it into a nice, clean,
mathematical model.
00:02:19.090 --> 00:02:22.350
So that really, what we
all do, and we do this
00:02:22.350 --> 00:02:26.590
instinctively, is after we
start getting used to a
00:02:26.590 --> 00:02:30.570
particular model, we assume that
the real world is like
00:02:30.570 --> 00:02:31.840
that model.
00:02:31.840 --> 00:02:35.320
If you don't think you
do that, think again.
00:02:35.320 --> 00:02:38.390
Because I think everyone does.
00:02:38.390 --> 00:02:41.370
So you really have that problem
of trying to figure
00:02:41.370 --> 00:02:45.010
out what's wrong with models,
how to go to better models,
00:02:45.010 --> 00:02:46.910
and we do this all the time.
00:02:46.910 --> 00:02:50.030
OK, for any model, an extended
model-- in other words, an
00:02:50.030 --> 00:02:54.590
extended mathematical model, for
a sequence or an n-tuple
00:02:54.590 --> 00:03:00.030
of independent identically
distributed repetitions is
00:03:00.030 --> 00:03:02.660
always well-defined
mathematically.
00:03:02.660 --> 00:03:05.230
We haven't proven that,
it's not trivial.
00:03:05.230 --> 00:03:08.020
But in fact, it's true.
00:03:08.020 --> 00:03:12.050
Relative frequencies and
sample averages.
00:03:12.050 --> 00:03:17.240
Relative frequencies apply to
events, and you can represent
00:03:17.240 --> 00:03:20.180
events in terms of indicator
functions and then use
00:03:20.180 --> 00:03:22.240
everything you know about
random variables
00:03:22.240 --> 00:03:24.330
to deal with them.
00:03:24.330 --> 00:03:26.650
Therefore, you can use
sample averages.
00:03:26.650 --> 00:03:29.910
In this extended model, sample averages
essentially become
00:03:29.910 --> 00:03:31.330
deterministic.
00:03:31.330 --> 00:03:34.210
And that's what the laws of
large numbers say in various
00:03:34.210 --> 00:03:36.210
different ways.
00:03:36.210 --> 00:03:39.880
And beyond knowing that they
become deterministic, our
00:03:39.880 --> 00:03:43.700
problem today is to decide
exactly what that means.
00:03:48.010 --> 00:03:51.280
The laws of large numbers
specify what "become
00:03:51.280 --> 00:03:53.250
deterministic" means.
00:03:53.250 --> 00:03:56.170
They only operate within
the extended model.
00:03:56.170 --> 00:03:59.390
In other words, laws of large
numbers don't apply to the
00:03:59.390 --> 00:04:01.440
real world.
00:04:01.440 --> 00:04:04.260
Well, we hope they apply to the
real world, but they only
00:04:04.260 --> 00:04:07.140
apply to the real world when the
model is good because you
00:04:07.140 --> 00:04:10.100
can only prove the laws of
large numbers within this
00:04:10.100 --> 00:04:11.370
model domain.
00:04:11.370 --> 00:04:15.220
Probability theory provides an
awful lot of consistency
00:04:15.220 --> 00:04:18.730
checks and ways to avoid
experimentation.
00:04:18.730 --> 00:04:21.250
In other words, I'm not claiming
here that you have to
00:04:21.250 --> 00:04:25.600
do experimentation with a large
number of so-called
00:04:25.600 --> 00:04:30.150
independent trials very often
because you have so many ways
00:04:30.150 --> 00:04:31.520
of checking on things.
00:04:31.520 --> 00:04:34.890
But every once in a while, you
have to do experimentation.
00:04:34.890 --> 00:04:39.520
And when you do, somehow or
other the idea of this large
00:04:39.520 --> 00:04:45.300
number of trials, and either IID
trials or trials which are
00:04:45.300 --> 00:04:47.330
somehow isolated from
each other.
00:04:50.410 --> 00:04:53.720
And we will soon get to talk
about Markov models.
00:04:53.720 --> 00:04:57.730
We will see that with Markov
models, you don't have the IID
00:04:57.730 --> 00:05:02.320
property, but you have enough
independence over time that
00:05:02.320 --> 00:05:04.920
you can still get these
sorts of results.
00:05:04.920 --> 00:05:10.800
So anyway, the determinism in
this large number of trials
00:05:10.800 --> 00:05:14.650
really underlies much of the
value of probability.
00:05:14.650 --> 00:05:16.970
OK, in other words, you
don't need to use this
00:05:16.970 --> 00:05:19.120
experimentation very often.
00:05:19.120 --> 00:05:22.920
But when you do, you really need
it because that's what
00:05:22.920 --> 00:05:27.050
you use to resolve conflicts,
and to settle on things, and
00:05:27.050 --> 00:05:30.960
to have different people who are
all trying to understand
00:05:30.960 --> 00:05:34.510
what's going on have some
idea of something
00:05:34.510 --> 00:05:37.210
they can agree on.
00:05:37.210 --> 00:05:39.620
OK, so that's enough for
probability models.
00:05:39.620 --> 00:05:41.200
That's enough for philosophy.
00:05:41.200 --> 00:05:44.740
We will come back to this
with little bits and
00:05:44.740 --> 00:05:46.930
pieces now and then.
00:05:46.930 --> 00:05:50.670
But at this point, we're really
going into talking
00:05:50.670 --> 00:05:54.530
about the mathematical
models themselves.
00:05:54.530 --> 00:05:58.310
OK, so let's talk about the
Markov bound, the Chebyshev
00:05:58.310 --> 00:06:00.470
bound, and the Chernoff bound.
00:06:00.470 --> 00:06:02.670
You should be reading the notes,
so I hope you know what
00:06:02.670 --> 00:06:05.360
all these things are
so I can go through
00:06:05.360 --> 00:06:07.005
them relatively quickly.
00:06:10.730 --> 00:06:13.850
If you think that using these
lecture slides that I'm
00:06:13.850 --> 00:06:18.000
passing out plus doing the
problems is sufficient for
00:06:18.000 --> 00:06:20.040
understanding this
course, you're
00:06:20.040 --> 00:06:22.180
really kidding yourself.
00:06:22.180 --> 00:06:27.070
I mean, the course is based on
this text, which explains
00:06:27.070 --> 00:06:28.340
things much more fully.
00:06:28.340 --> 00:06:30.270
It still has errors in it.
00:06:30.270 --> 00:06:32.640
It still has typos in it.
00:06:32.640 --> 00:06:36.770
But a whole lot fewer than these
lecture slides, so you
00:06:36.770 --> 00:06:40.800
should be reading them and then
using them to try to get a
00:06:40.800 --> 00:06:45.810
better idea of what these
lectures mean, and using that
00:06:45.810 --> 00:06:49.450
to get a better idea
of what the exercises
00:06:49.450 --> 00:06:51.090
you're doing mean.
00:06:51.090 --> 00:06:56.560
Doing the exercises does not
do you any good whatsoever.
00:06:56.560 --> 00:07:00.250
The only thing that does you
some good is to do an exercise
00:07:00.250 --> 00:07:03.740
and then think about what it
has to do with anything.
00:07:03.740 --> 00:07:06.440
And if you don't do that second
part, then all you're
00:07:06.440 --> 00:07:09.290
doing is you're building
a very,
00:07:09.290 --> 00:07:11.810
very second rate computer.
00:07:11.810 --> 00:07:16.190
Your abilities as a computer
are about the same for the
00:07:16.190 --> 00:07:20.250
most part as the computer
in a coffee maker.
00:07:22.750 --> 00:07:26.730
You are really not up to what
a TV set does anymore.
00:07:26.730 --> 00:07:31.200
I mean, TV sets can do so much
computation that they're way
00:07:31.200 --> 00:07:33.610
beyond your abilities
at this point.
00:07:33.610 --> 00:07:38.700
So the only edge you have, the
only thing you can do to try
00:07:38.700 --> 00:07:42.850
to make yourself worthwhile is
to understand these things
00:07:42.850 --> 00:07:45.570
because computers cannot
do any of that
00:07:45.570 --> 00:07:47.580
understanding at all.
00:07:47.580 --> 00:07:49.800
So you're way ahead
of them there.
00:07:49.800 --> 00:07:51.735
OK, so what is the
Markov bound?
00:07:55.080 --> 00:07:59.220
What it says is, if Y is a
non-negative random variable--
00:07:59.220 --> 00:08:02.370
in other words, a random
variable which only takes on
00:08:02.370 --> 00:08:04.570
non-negative sample values--
00:08:04.570 --> 00:08:08.190
and if it has an expectation,
the expected value of Y,
00:08:08.190 --> 00:08:13.010
then for any real y greater than
0, the probability that Y
00:08:13.010 --> 00:08:16.910
is greater than or equal to
little y is less than or equal to
00:08:16.910 --> 00:08:19.820
the expected value of Y
divided by little y.
00:08:19.820 --> 00:08:22.260
The proof of it is by picture.
00:08:22.260 --> 00:08:25.980
If you don't like proofs by
pictures, you should get used
00:08:25.980 --> 00:08:28.560
to it because we will prove a
great number of things by
00:08:28.560 --> 00:08:29.880
pictures here.
00:08:29.880 --> 00:08:33.380
And I claim that a proof by
picture is better than a proof
00:08:33.380 --> 00:08:36.909
by algebra because if there's
anything wrong with it, you
00:08:36.909 --> 00:08:39.309
can see from looking
at it what it is.
00:08:39.309 --> 00:08:43.400
So we know that the expected
value of Y is the integral
00:08:43.400 --> 00:08:47.680
under the complementary
distribution function.
00:08:47.680 --> 00:08:52.400
This rectangle in here has area
little y times the probability of capital Y
00:08:52.400 --> 00:08:55.770
greater than or equal to little y.
The probability of capital Y,
00:08:55.770 --> 00:08:59.500
the random variable, being
greater than
00:08:59.500 --> 00:09:04.030
or equal to the number,
little y, is just that
00:09:04.030 --> 00:09:05.790
point right up there.
00:09:05.790 --> 00:09:07.330
This is the point y.
00:09:07.330 --> 00:09:10.570
This is the point probability
of capital Y
00:09:10.570 --> 00:09:12.750
greater than little y.
00:09:12.750 --> 00:09:15.510
It doesn't make any difference
when you're integrating
00:09:15.510 --> 00:09:20.840
whether you use a greater than
or equal to sign or a
00:09:20.840 --> 00:09:22.870
greater than sign.
00:09:22.870 --> 00:09:26.260
If you have a discontinuity,
the integral is the same no
00:09:26.260 --> 00:09:28.590
matter which way
you look at it.
00:09:28.590 --> 00:09:35.740
So this area here is y times
the probability that random
00:09:35.740 --> 00:09:39.790
variable Y is greater than
or equal to number y.
00:09:39.790 --> 00:09:43.380
And all that the Markov bound
says is that this little
00:09:43.380 --> 00:09:47.100
rectangle here is less than or
equal to the integral under
00:09:47.100 --> 00:09:48.520
that curve.
00:09:48.520 --> 00:09:50.220
That's a perfectly
rigorous proof.
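The picture argument can also be written as a short chain of inequalities. This is only a sketch of the same proof: the dummy variable t and the notation Pr{Y >= t} for the complementary distribution function are assumptions made here for illustration, not symbols taken from the slides.

```latex
\begin{align*}
\mathrm{E}[Y] &= \int_0^\infty \Pr\{Y \ge t\}\, dt
  && \text{(area under the complementary distribution function)} \\
&\ge \int_0^y \Pr\{Y \ge t\}\, dt
  && \text{(drop the area to the right of } y\text{)} \\
&\ge y \, \Pr\{Y \ge y\}
  && \text{(the curve is non-increasing, so it is at least } \Pr\{Y \ge y\} \text{ on } [0, y]\text{)}
\end{align*}
```

Dividing both sides by y, which is positive, gives the Markov bound.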
00:09:53.970 --> 00:09:58.020
We don't really care about
rigorous proofs here, anyway
00:09:58.020 --> 00:10:02.530
since we're trying to get at
the issue of how you use
00:10:02.530 --> 00:10:09.050
probability, but we don't want
to have proofs which mislead
00:10:09.050 --> 00:10:10.430
you about things.
00:10:10.430 --> 00:10:13.720
In other words, proofs
which aren't right.
00:10:13.720 --> 00:10:18.490
So we try to be right, and I
want you to learn to be right.
00:10:18.490 --> 00:10:23.320
But I don't want you to start
to worry too much about
00:10:23.320 --> 00:10:27.020
looking like a mathematician
when you prove things.
00:10:27.020 --> 00:10:29.770
OK, the Chebyshev inequality.
00:10:29.770 --> 00:10:32.570
If Z has a mean--
00:10:32.570 --> 00:10:38.690
and when you say Z has a mean,
what you really mean is the
00:10:38.690 --> 00:10:44.450
expected value of the absolute
value of Z is finite.
00:10:44.450 --> 00:10:48.900
And it has a variance sigma
squared of Z. That's saying a
00:10:48.900 --> 00:10:51.520
little more than just
having a mean.
00:10:51.520 --> 00:10:55.180
Then, for any epsilon
greater than 0.
00:10:55.180 --> 00:10:57.310
In other words, this bound
works for any epsilon.
00:10:57.310 --> 00:11:00.730
The probability that the
absolute value of Z
00:11:00.730 --> 00:11:01.920
minus the mean is at least epsilon--
00:11:01.920 --> 00:11:05.530
in other words, that Z is
away from the mean by
00:11:05.530 --> 00:11:09.535
epsilon or more-- is less than
or equal to the variance
00:11:09.535 --> 00:11:13.140
of Z divided by epsilon
squared.
00:11:13.140 --> 00:11:17.910
Again, this is a very weak
bound, but it's very general.
00:11:17.910 --> 00:11:20.220
And therefore, it's
very useful.
00:11:20.220 --> 00:11:24.020
And the proof is simplicity
itself.
00:11:24.020 --> 00:11:29.670
You define a new random variable
y, which is Z minus
00:11:29.670 --> 00:11:33.650
the expected value of
Z quantity squared.
00:11:33.650 --> 00:11:37.990
The expected value of Y then is
the expected value of this,
00:11:37.990 --> 00:11:41.580
which is just the
variance of Z.
00:11:41.580 --> 00:11:45.300
So for any y greater than 0,
we'll use the Markov bound,
00:11:45.300 --> 00:11:48.520
which says the probability
that random variable Y is
00:11:48.520 --> 00:11:52.090
greater than or equal to number
little y is less than
00:11:52.090 --> 00:11:54.260
or equal to sigma Z squared
over little y--
00:11:54.260 --> 00:11:58.740
namely, the expected value of
random variable Y divided by
00:11:58.740 --> 00:12:02.100
number little y. That's just
the Markov bound.
00:12:02.100 --> 00:12:06.560
And then, random variable Y is
greater than or equal to
00:12:06.560 --> 00:12:10.780
number y if and only if the
positive square root of
00:12:10.780 --> 00:12:14.140
capital Y-- we're dealing only
with positive non-negative
00:12:14.140 --> 00:12:15.120
things here--
00:12:15.120 --> 00:12:19.090
is greater than or equal to the
square root of number y.
00:12:19.090 --> 00:12:23.700
And that's less than or equal
to sigma Z squared over y.
00:12:23.700 --> 00:12:27.860
And square root of Y is
just the magnitude
00:12:27.860 --> 00:12:29.820
of Z minus Z bar.
00:12:29.820 --> 00:12:34.500
Setting epsilon equal to
square root of y yields the
00:12:34.500 --> 00:12:36.350
Chebyshev bound.
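Written out, the steps just described look roughly like this. It is a sketch that uses the same substitution as the spoken proof, with capital Y equal to (Z minus Z bar) squared and little y equal to epsilon squared; the typeset notation is an assumption, since the slide is not part of the transcript.

```latex
\begin{align*}
\Pr\{|Z - \bar{Z}| \ge \epsilon\}
  &= \Pr\{(Z - \bar{Z})^2 \ge \epsilon^2\} \\
  &= \Pr\{Y \ge y\}
  && \text{with } Y = (Z - \bar{Z})^2,\quad y = \epsilon^2 \\
  &\le \frac{\mathrm{E}[Y]}{y}
  && \text{(Markov bound)} \\
  &= \frac{\sigma_Z^2}{\epsilon^2}
  && \text{since } \mathrm{E}[Y] = \sigma_Z^2
\end{align*}
```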
00:12:36.350 --> 00:12:44.140
Now,
I don't believe
00:12:44.140 --> 00:12:46.380
in memorizing proofs.
00:12:46.380 --> 00:12:48.440
I think that's a
terrible idea.
00:12:48.440 --> 00:12:53.080
But that's something so simple
and so often used that you
00:12:53.080 --> 00:12:54.830
just ought to think
in those terms.
00:12:54.830 --> 00:12:58.070
You ought to be able to see
that diagram of the Markov
00:12:58.070 --> 00:12:58.800
inequality.
00:12:58.800 --> 00:13:01.580
You ought to be able to
see why it is true.
00:13:01.580 --> 00:13:03.980
And you ought to understand
it well enough
00:13:03.980 --> 00:13:05.710
that you can use it.
00:13:05.710 --> 00:13:09.480
In other words, there's a big
difference in mathematics
00:13:09.480 --> 00:13:13.780
between knowing what a theorem
says and knowing that the
00:13:13.780 --> 00:13:17.020
theorem is true, and
really having a gut
00:13:17.020 --> 00:13:19.080
feeling for that theorem.
00:13:19.080 --> 00:13:21.570
I mean, you know this when
you deal with numbers.
00:13:21.570 --> 00:13:23.030
You know it when you deal
with integration or
00:13:23.030 --> 00:13:25.880
differentiation, or any of
those things you've known
00:13:25.880 --> 00:13:28.040
about for a long time.
00:13:28.040 --> 00:13:30.500
There's a big difference between
the things that you
00:13:30.500 --> 00:13:32.910
can really work with because
you understand them and you
00:13:32.910 --> 00:13:39.490
see them and those things that
you just know as something you
00:13:39.490 --> 00:13:40.740
don't really understand.
00:13:44.140 --> 00:13:48.120
This is something that you
really ought to understand and
00:13:48.120 --> 00:13:50.310
be able to see it.
00:13:50.310 --> 00:13:52.650
OK, the Chernoff bound
is the last of these.
00:13:52.650 --> 00:13:54.690
We will use this a great deal.
00:13:54.690 --> 00:13:56.880
It's a generating
function bound.
00:13:56.880 --> 00:14:02.990
And it says, for any number,
positive number z, and any
00:14:02.990 --> 00:14:07.040
positive number r greater than
0, such that the moment
00:14:07.040 --> 00:14:09.620
generating function-- the moment
generating function of
00:14:09.620 --> 00:14:14.320
a random variable Z is a
function given the random
00:14:14.320 --> 00:14:15.600
variable Z.
00:14:15.600 --> 00:14:18.610
It's a function of
a real number r.
00:14:18.610 --> 00:14:24.710
And that function is the
expected value of e to the rZ.
00:14:24.710 --> 00:14:27.410
It's called the generating
function because if you start
00:14:27.410 --> 00:14:30.880
taking derivatives of this and
evaluate them at r equals
00:14:30.880 --> 00:14:34.140
0, what you get is the
various moments of Z.
00:14:34.140 --> 00:14:36.410
You've probably seen
that at some point.
00:14:36.410 --> 00:14:39.650
If you haven't seen it, it's not
important here because we
00:14:39.650 --> 00:14:41.580
don't use that at all.
00:14:41.580 --> 00:14:45.380
What we really use is the fact
that this is a function.
00:14:45.380 --> 00:14:50.130
It's a function, which is
increasing as r increases
00:14:50.130 --> 00:14:53.140
because you put--
00:14:53.140 --> 00:14:56.100
well, it just does.
00:14:56.100 --> 00:14:59.330
And what it says is the
probability that this random
00:14:59.330 --> 00:15:01.430
variable is greater than or
equal to the number z--
00:15:01.430 --> 00:15:03.410
I should really use different
letters for these things.
00:15:03.410 --> 00:15:05.480
It's hard to talk about them--
00:15:05.480 --> 00:15:08.560
is less than or equal to the
moment generating function
00:15:08.560 --> 00:15:11.030
times e to the minus rz.
00:15:11.030 --> 00:15:14.340
And the proof is exactly the
same as the proof before.
00:15:14.340 --> 00:15:16.730
You might get the picture that
you can prove many, many
00:15:16.730 --> 00:15:19.420
different things from the
Markov inequality.
00:15:19.420 --> 00:15:21.400
And in fact, you can.
00:15:21.400 --> 00:15:24.580
You just put in whatever you
want to and you get a new
00:15:24.580 --> 00:15:26.260
inequality.
00:15:26.260 --> 00:15:30.070
And you can call it after
yourself if you want.
00:15:30.070 --> 00:15:31.220
I mean, Chernoff.
00:15:31.220 --> 00:15:33.290
Chernoff is still alive.
00:15:33.290 --> 00:15:37.460
Chernoff is a faculty
member at Harvard.
00:15:37.460 --> 00:15:40.550
And this is kind of curious
because he sort of slipped
00:15:40.550 --> 00:15:43.090
this in in a paper that he wrote
where he was trying to
00:15:43.090 --> 00:15:45.860
prove something difficult.
00:15:45.860 --> 00:15:49.740
And this is a relationship that
many mathematicians have
00:15:49.740 --> 00:15:51.900
used over many, many years.
00:15:51.900 --> 00:15:55.400
And it's so simple that they
didn't make any fuss about it.
00:15:55.400 --> 00:15:57.560
And he didn't make any
fuss about it.
00:15:57.560 --> 00:16:01.800
And he was sort of embarrassed
that many engineers, starting
00:16:01.800 --> 00:16:04.690
with Claude Shannon, found this
to be extraordinarily
00:16:04.690 --> 00:16:07.610
useful, and started calling
it the Chernoff bound.
00:16:07.610 --> 00:16:10.800
He was slightly embarrassed of
having this totally trivial
00:16:10.800 --> 00:16:14.300
thing suddenly be
named after him.
00:16:14.300 --> 00:16:17.030
But anyway, that's the
way it happened.
00:16:17.030 --> 00:16:22.020
And now it's a widely used tool
that we use all the time.
00:16:22.020 --> 00:16:25.940
So it's the same proof that we
had before: for any y greater than
00:16:25.940 --> 00:16:28.740
0, Markov says this.
00:16:28.740 --> 00:16:31.450
And therefore, you get that.
00:16:31.450 --> 00:16:34.510
This decreases exponentially
with z, and
00:16:34.510 --> 00:16:36.910
that's why it's useful.
00:16:36.910 --> 00:16:40.730
I mean, the Markov inequality
only decays as
00:16:40.730 --> 00:16:44.500
1 over little y.
00:16:44.500 --> 00:16:48.880
The Chebyshev inequality decays
as 1 over y squared.
00:16:48.880 --> 00:16:51.610
This decays exponentially
with y.
00:16:51.610 --> 00:16:55.210
And therefore, when you start
dealing with large deviations,
00:16:55.210 --> 00:16:59.360
trying to talk about things that
are very, very unlikely
00:16:59.360 --> 00:17:03.420
when you get very, very far from
the mean, this is a very
00:17:03.420 --> 00:17:04.640
useful way to do it.
00:17:04.640 --> 00:17:08.310
And it's sort of the standard
way of doing it at this point.
00:17:08.310 --> 00:17:12.530
We won't use it right now, but
this is the right time to talk
00:17:12.530 --> 00:17:14.140
about it a little bit.
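To get a feel for how differently the three bounds behave far from the mean, here is a small numerical sketch. Everything specific in it is an assumption chosen for illustration: the function name tail_bounds, the choice of a sum of n IID binary variables with probability p of a 1, and the threshold a. The lecture states the Chernoff bound for any r greater than 0; picking the r that makes the bound smallest is a standard extra step that is not taken from the lecture.

```python
import math


def tail_bounds(n, p, a):
    """Bounds on P(S_n >= a), where S_n is a sum of n IID Bernoulli(p)
    random variables and a is an integer with n*p < a < n.

    Returns (exact, markov, chebyshev, chernoff).
    """
    mean = n * p
    var = n * p * (1 - p)

    # Markov: P(S_n >= a) <= E[S_n] / a
    markov = mean / a

    # Chebyshev bounds the two-sided deviation |S_n - mean| >= a - mean,
    # which in turn bounds the one-sided event S_n >= a.
    chebyshev = var / (a - mean) ** 2

    # Chernoff: P(S_n >= a) <= E[exp(r * S_n)] * exp(-r * a) for any r > 0.
    # For this sum, E[exp(r * S_n)] = (1 - p + p * e^r)^n, and setting the
    # derivative of the exponent to zero gives the minimizing r below.
    r = math.log(a * (1 - p) / (p * (n - a)))
    chernoff = (1 - p + p * math.exp(r)) ** n * math.exp(-r * a)

    # Exact tail probability from the binomial PMF, for comparison.
    exact = sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1)
    )
    return exact, markov, chebyshev, chernoff


exact, markov, chebyshev, chernoff = tail_bounds(n=50, p=0.25, a=25)
print(f"exact={exact:.3e} markov={markov:.3e} "
      f"chebyshev={chebyshev:.3e} chernoff={chernoff:.3e}")
```

With these example numbers the mean is 12.5, so a = 25 is well out in the tail. That is the large-deviation regime, where the exponential decay of the Chernoff bound is expected to pay off.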
00:17:14.140 --> 00:17:17.579
OK, next topic we want to take
up is really these laws of
00:17:17.579 --> 00:17:21.450
large numbers, and something
about convergence.
00:17:21.450 --> 00:17:25.579
We want to understand a
little bit about that.
00:17:25.579 --> 00:17:33.390
And this picture that we've seen
before, we take X1, up to
00:17:33.390 --> 00:17:37.710
Xn as n independent identically
00:17:37.710 --> 00:17:40.210
distributed random variables.
00:17:40.210 --> 00:17:43.570
They each have mean expected
value of X. They each have
00:17:43.570 --> 00:17:45.880
variance sigma squared.
00:17:45.880 --> 00:17:48.870
You let Sn be the sum
of all of them.
00:17:48.870 --> 00:17:53.400
What we want to understand is,
how does S sub n behave?
00:17:53.400 --> 00:17:57.040
And more particularly,
how does Sn over n--
00:17:57.040 --> 00:18:00.003
namely, the relative--
00:18:00.003 --> 00:18:05.230
not the relative frequency, but
the sample average of X
00:18:05.230 --> 00:18:07.710
behave when you take
n samples?
00:18:07.710 --> 00:18:13.520
So this curve shows the
distribution function of S4,
00:18:13.520 --> 00:18:20.700
of S20, and of S50 when you have
a binary random variable
00:18:20.700 --> 00:18:25.500
with probability of 1 equal to
a quarter, probability of 0
00:18:25.500 --> 00:18:26.970
equal to 3/4.
00:18:26.970 --> 00:18:30.140
And what you see graphically
is what you can see
00:18:30.140 --> 00:18:33.080
mathematically very
easily, too.
00:18:33.080 --> 00:18:38.570
The mean value of S sub
n is n times X bar.
00:18:38.570 --> 00:18:42.680
So the center point in these
curves is moving out with n.
00:18:42.680 --> 00:18:44.650
You see the center point here.
00:18:44.650 --> 00:18:48.060
Center point somewhere
around there.
00:18:48.060 --> 00:18:52.950
Actually, the center point is
at-- yeah, just about what it
00:18:52.950 --> 00:18:55.080
looks like.
00:18:55.080 --> 00:19:00.180
And the center point here is
out somewhere around there.
00:19:00.180 --> 00:19:04.540
And you see the variance, you
see the variance going up
00:19:04.540 --> 00:19:06.700
linearly with n.
00:19:06.700 --> 00:19:10.710
Not with n squared, but with
n, which means the standard
00:19:10.710 --> 00:19:14.750
deviation is going up with
the square root of n.
00:19:14.750 --> 00:19:18.510
That's sort of why the law
of large numbers works.
00:19:18.510 --> 00:19:22.220
It's because the standard
deviation of these random--
00:19:22.220 --> 00:19:26.070
of these sums only goes up with
the square root of n.
00:19:26.070 --> 00:19:31.520
So these curves, along with
moving out, become relatively
00:19:31.520 --> 00:19:35.335
more compressed relative to
how far out they are.
00:19:40.940 --> 00:19:46.520
This curve here is relatively
more compressed for its mean
00:19:46.520 --> 00:19:48.230
than this one is here.
00:19:48.230 --> 00:19:51.240
And that's more compressed
relative to this one.
00:19:51.240 --> 00:19:55.210
We get this a lot more
easily if we look
00:19:55.210 --> 00:19:56.430
at the sample average.
00:19:56.430 --> 00:19:58.810
Namely, S sub n over n.
00:19:58.810 --> 00:20:03.250
This is a random variable
of mean X bar.
00:20:03.250 --> 00:20:07.320
That's a random variable of
variance sigma squared over n.
00:20:07.320 --> 00:20:10.770
That's something that you ought
to just recognize and
00:20:10.770 --> 00:20:14.230
have very close to the top of
your consciousness because
00:20:14.230 --> 00:20:16.510
that again, is sort
of why the sample
00:20:16.510 --> 00:20:18.590
average starts to converge.
00:20:18.590 --> 00:20:24.880
So what happens then is for n
equals 4, you get this very
00:20:24.880 --> 00:20:26.780
"blech" curve.
00:20:26.780 --> 00:20:29.370
For n equals 20, it starts
looking a little more
00:20:29.370 --> 00:20:30.490
reasonable.
00:20:30.490 --> 00:20:35.150
For n equals 50, it's starting
to scrunch in and start to
00:20:35.150 --> 00:20:38.270
look like a unit step.
00:20:38.270 --> 00:20:43.250
And what we'll find is that the
intuitive way of looking
00:20:43.250 --> 00:20:48.090
at the law of large numbers, or
one of the more intuitive
00:20:48.090 --> 00:20:50.880
ways of looking at it,
is that the sample
00:20:50.880 --> 00:20:52.860
average starts to look--
00:20:52.860 --> 00:20:57.000
starts to have a distribution
function, which
00:20:57.000 --> 00:20:58.730
looks like a unit step.
00:20:58.730 --> 00:21:01.140
And that step occurs
at the mean.
00:21:01.140 --> 00:21:05.310
So this curve here keeps
scrunching in.
00:21:05.310 --> 00:21:08.350
This part down here is
moving over that way.
00:21:08.350 --> 00:21:11.790
This part over here is
moving over that way.
00:21:11.790 --> 00:21:16.620
And it all gets close
to a unit step.
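The claim that the sample average has mean X bar and variance sigma squared over n can be checked exactly for the binary example on the slide, with probability of 1 equal to a quarter. This is a minimal sketch; the function name sample_average_moments is hypothetical, and exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction
from math import comb


def sample_average_moments(n, p):
    """Exact mean and variance of S_n / n, where S_n is the sum of
    n IID binary random variables with P(X = 1) = p."""
    # PMF of the sample average: S_n / n takes the value k/n with the
    # binomial probability of exactly k ones in n trials.
    pmf = {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }
    mean = sum(value * prob for value, prob in pmf.items())
    var = sum((value - mean) ** 2 * prob for value, prob in pmf.items())
    return mean, var


p = Fraction(1, 4)
sigma_squared = p * (1 - p)  # variance of a single binary variable
for n in (4, 20, 50):
    mean, var = sample_average_moments(n, p)
    # The last two columns should agree: var equals sigma^2 / n.
    print(n, mean, var, sigma_squared / n)
```

The mean stays at X bar for every n while the variance shrinks as 1/n, which is why the distribution function of the sample average scrunches in toward a unit step at the mean.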
00:21:16.620 --> 00:21:20.350
OK, the variance of Sn over n,
as we've said, is equal to
00:21:20.350 --> 00:21:22.680
sigma squared over n.
00:21:22.680 --> 00:21:27.960
The limit of the variance as n
goes to infinity takes the
00:21:27.960 --> 00:21:29.440
limit of that.
00:21:29.440 --> 00:21:32.770
Don't even have to know the
definition of a limit, can see
00:21:32.770 --> 00:21:36.070
that when n gets large,
this gets small.
00:21:36.070 --> 00:21:37.340
And this goes to 0.
00:21:37.340 --> 00:21:41.400
So the limit of this
goes to 0.
00:21:41.400 --> 00:21:45.240
Now, here's the important
thing.
00:21:45.240 --> 00:21:52.240
This equation says a whole lot
more than this equation says.
00:21:52.240 --> 00:21:58.090
Because this equation says how
quickly that approaches 0.
00:21:58.090 --> 00:22:01.370
All this says is it
approaches 0.
00:22:01.370 --> 00:22:06.150
So we've thrown away a lot that
we know, and now all we
00:22:06.150 --> 00:22:09.350
know is this.
00:22:09.350 --> 00:22:15.720
This 3 says that the convergence
is as 1/n.
00:22:15.720 --> 00:22:16.890
This doesn't say that.
00:22:16.890 --> 00:22:19.920
This just says that
it converges.
00:22:19.920 --> 00:22:23.660
Why would anyone in their right
mind want to replace an
00:22:23.660 --> 00:22:28.030
informative statement like this
with a not informative
00:22:28.030 --> 00:22:30.240
statement like this?
00:22:30.240 --> 00:22:33.450
Any ideas of why you might
want to do that?
00:22:33.450 --> 00:22:34.700
Any suggestions?
00:22:38.790 --> 00:22:39.145
AUDIENCE: Convenience.
00:22:39.145 --> 00:22:39.500
PROFESSOR: What?
00:22:39.500 --> 00:22:41.720
AUDIENCE: Convenience.
00:22:41.720 --> 00:22:42.120
Convenience.
00:22:42.120 --> 00:22:44.130
Sometimes you don't need to--
00:22:44.130 --> 00:22:45.320
PROFESSOR: Well, yes,
convenience.
00:22:45.320 --> 00:22:47.260
But there's a much
stronger reason.
00:22:51.750 --> 00:22:55.740
This is a statement for
IID random variables.
00:22:55.740 --> 00:22:59.500
This law of large numbers, we
want it to apply to as many
00:22:59.500 --> 00:23:01.840
different situations
as possible.
00:23:01.840 --> 00:23:04.030
To things that aren't
quite IID.
00:23:04.030 --> 00:23:08.150
To things that don't
have a variance.
00:23:08.150 --> 00:23:15.030
And this statement here
is going to apply more
00:23:15.030 --> 00:23:16.450
generally than this.
00:23:16.450 --> 00:23:20.610
You can have situations where
the variance goes to 0 more
00:23:20.610 --> 00:23:24.780
slowly than 1/n if these random
variables are not
00:23:24.780 --> 00:23:25.990
independent.
00:23:25.990 --> 00:23:28.380
But you still have
this statement.
00:23:28.380 --> 00:23:35.170
And this statement is what we
really need, so this really
00:23:35.170 --> 00:23:37.770
says something, which is called
00:23:37.770 --> 00:23:40.230
convergence in mean square.
00:23:40.230 --> 00:23:41.400
Why mean square?
00:23:41.400 --> 00:23:44.560
Because this is the
mean squared.
00:23:44.560 --> 00:23:47.380
So obvious terminology.
00:23:47.380 --> 00:23:50.400
Mathematicians aren't always
very good at choosing
00:23:50.400 --> 00:23:54.240
terminology that makes sense
when you look at it,
00:23:54.240 --> 00:23:55.990
but this one does.
00:23:55.990 --> 00:24:00.800
The definition is that a sequence of
random variables Y1, Y2, Y3,
00:24:00.800 --> 00:24:01.830
and so forth
00:24:01.830 --> 00:24:06.560
converges in mean square to a
random variable Y if this
00:24:06.560 --> 00:24:10.100
limit here is equal to 0.
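The limit being pointed to is on the slide and not in the transcript, so here it is written in its standard form, followed by the special case under discussion. Treat the typesetting as a sketch; the symbols Y_n, Y, S_n, X bar, and sigma squared follow the spoken definitions.

```latex
% Convergence in mean square of Y_1, Y_2, ... to Y:
\[
\lim_{n \to \infty} \mathrm{E}\left[\, |Y_n - Y|^2 \,\right] = 0
\]
% Special case from the lecture: Y_n = S_n / n and Y = \bar{X}, a constant.
\[
\mathrm{E}\left[ \left( \frac{S_n}{n} - \bar{X} \right)^{2} \right]
  = \frac{\sigma^2}{n} \longrightarrow 0
  \quad \text{as } n \to \infty
\]
```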
00:24:10.100 --> 00:24:17.130
So in this case, Y, this random
variable Y, is really a
00:24:17.130 --> 00:24:20.360
deterministic random variable,
which is just the
00:24:20.360 --> 00:24:25.290
deterministic value, expected
value of X. This random
00:24:25.290 --> 00:24:30.290
variable here is this sample
average here.
00:24:30.290 --> 00:24:33.390
And this is saying that the
expected squared difference between the sample
00:24:33.390 --> 00:24:38.210
average and the
expected value of X--
00:24:38.210 --> 00:24:39.660
this is going to 0.
00:24:39.660 --> 00:24:41.910
This isn't saying
anything extra.
00:24:41.910 --> 00:24:44.670
This is just saying, if you're
not interested in the law of
00:24:44.670 --> 00:24:47.910
large numbers, you might be
interested in how a bunch of
00:24:47.910 --> 00:24:53.030
random variables approach some
other random variable.
00:24:53.030 --> 00:24:57.760
Now, if you look at a set of
real numbers and you say, does
00:24:57.760 --> 00:25:02.500
that set of real numbers
approach something?
00:25:02.500 --> 00:25:06.660
I mean, you have sort of a
complicated looking definition
00:25:06.660 --> 00:25:10.590
for that, which really
says that the numbers
00:25:10.590 --> 00:25:13.320
approach this constant.
00:25:13.320 --> 00:25:18.860
But a set of numbers is so much
more simple-minded than a
00:25:18.860 --> 00:25:20.960
set of random variables.
00:25:20.960 --> 00:25:23.250
I mean, a set of random
variables is--
00:25:23.250 --> 00:25:28.100
I mean, not even their
distribution functions really
00:25:28.100 --> 00:25:29.680
explain what they are.
00:25:29.680 --> 00:25:31.620
There's also the relationship
between the
00:25:31.620 --> 00:25:33.390
distribution functions.
00:25:33.390 --> 00:25:38.550
So you're not going to find
anything very easy that says
00:25:38.550 --> 00:25:40.680
random variables converge.
00:25:40.680 --> 00:25:43.530
And you can expect to find a
number of different kinds of
00:25:43.530 --> 00:25:47.020
statements about convergence.
00:25:47.020 --> 00:25:49.270
And this is just going
to be one of them.
00:25:49.270 --> 00:25:53.010
This is something called
convergence in mean square--
00:25:53.010 --> 00:25:53.736
yes?
00:25:53.736 --> 00:26:02.296
AUDIENCE: Going from 3 to 4,
we don't need IID anymore?
00:26:02.296 --> 00:26:06.000
So they can be just--
00:26:06.000 --> 00:26:08.790
PROFESSOR: You can certainly
find examples where it's not
00:26:08.790 --> 00:26:12.620
IID, and this doesn't hold
and this does hold.
00:26:12.620 --> 00:26:16.200
The most interesting case where
this doesn't hold and
00:26:16.200 --> 00:26:20.780
this does hold is where you--
00:26:20.780 --> 00:26:23.800
no, you still need a variance
for this to hold.
00:26:28.600 --> 00:26:30.930
Yeah, so I guess I can't really
construct any nice
00:26:30.930 --> 00:26:37.300
examples of n where this holds
and this doesn't hold.
00:26:37.300 --> 00:26:40.160
But there are some if you talk
about random variables that
00:26:40.160 --> 00:26:41.410
are not IID.
00:26:44.580 --> 00:26:46.406
I ought to have a problem
that does that.
00:26:46.406 --> 00:26:48.950
But so far, I don't.
00:26:52.690 --> 00:26:57.520
Now, the fact that this sample
average converges in mean
00:26:57.520 --> 00:27:00.640
square doesn't tell us directly
what might be more
00:27:00.640 --> 00:27:02.070
interesting.
00:27:02.070 --> 00:27:05.320
I mean, you look at that
statement and it doesn't
00:27:05.320 --> 00:27:09.550
really tell you what this
complementary distribution
00:27:09.550 --> 00:27:11.530
function looks like.
00:27:11.530 --> 00:27:16.360
I mean, to me the thing that
is closest to what I would
00:27:16.360 --> 00:27:22.430
think of as convergence is that
this sequence of random
00:27:22.430 --> 00:27:27.680
variables minus the random
variable, the convergence of
00:27:27.680 --> 00:27:33.250
that difference, approaches a
distribution function, which
00:27:33.250 --> 00:27:34.890
is the unit step.
00:27:34.890 --> 00:27:37.720
Which means that the probability
that you're
00:27:37.720 --> 00:27:44.380
anywhere off of that center
point is going to 0.
00:27:44.380 --> 00:27:49.070
I mean, that's a very easy
to interpret statement.
00:27:49.070 --> 00:27:53.550
The fact that the variance is
going to 0, I don't quite know
00:27:53.550 --> 00:27:56.180
how to interpret it, except
through Chebyshev's law, which
00:27:56.180 --> 00:27:59.520
gets me to the other
statement.
00:27:59.520 --> 00:28:05.680
So what I'm saying here is if
we apply Chebyshev to that
00:28:05.680 --> 00:28:08.690
statement before number 3--
00:28:08.690 --> 00:28:10.560
this one--
00:28:10.560 --> 00:28:12.540
which says what the
variance is.
00:28:12.540 --> 00:28:17.840
If we apply Chebyshev, then what
we get is the probability
00:28:17.840 --> 00:28:24.050
that the relative frequency
minus the mean, the absolute
00:28:24.050 --> 00:28:27.320
value of that is greater than
or equal to epsilon.
00:28:27.320 --> 00:28:31.530
That probability is less than or
equal to sigma squared over
00:28:31.530 --> 00:28:33.930
n times epsilon squared.
00:28:33.930 --> 00:28:37.030
You'll notice this is
a very peculiar
00:28:37.030 --> 00:28:39.480
statement in terms of epsilon.
00:28:39.480 --> 00:28:44.700
Because if you want to make
epsilon very small, so you get
00:28:44.700 --> 00:28:51.410
something strong here,
this term blows up.
00:28:51.410 --> 00:28:55.730
So the way you have to look at
this is pick some epsilon
00:28:55.730 --> 00:28:56.980
you're happy with.
00:29:00.830 --> 00:29:04.760
I mean, you might want these two
things to be within 1% of
00:29:04.760 --> 00:29:06.490
each other.
00:29:06.490 --> 00:29:14.020
Then, epsilon squared
here is 10,000.
00:29:14.020 --> 00:29:17.830
But by making n big enough,
that gets submerged.
00:29:17.830 --> 00:29:21.020
So excuse me.
00:29:21.020 --> 00:29:26.920
Epsilon squared is 1/10,000
so 1 over
00:29:26.920 --> 00:29:29.180
epsilon squared is 10,000.
00:29:29.180 --> 00:29:30.900
So you need to make
n very large.
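As a rough illustration of the arithmetic just described, here is a minimal Python sketch of the Chebyshev bound sigma squared over n epsilon squared. The function names and the specific numbers (unit variance, a 5% allowed failure probability) are assumptions chosen for the example, not values from the lecture.

```python
import math

def chebyshev_bound(variance, n, eps):
    """Upper bound sigma^2 / (n * eps^2) on P(|S_n/n - mean| >= eps)."""
    return variance / (n * eps ** 2)

def samples_needed(variance, eps, delta):
    """An n (smallest, up to float rounding) for which the bound is at most delta."""
    return math.ceil(variance / (delta * eps ** 2))

# Hypothetical numbers: unit variance, 1% accuracy, 5% allowed failure probability.
eps = 0.01
print(1 / eps ** 2)                     # the 1/epsilon^2 factor discussed above
print(samples_needed(1.0, eps, 0.05))   # n that pushes the bound down to 0.05
print(chebyshev_bound(1.0, 100, eps))   # n too small: the bound exceeds 1 and says nothing
```

The last line shows the situation raised in the question that follows: with n too small for the chosen epsilon, the bound is larger than 1 and so is weaker than the trivial statement that a probability is at most 1.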
00:29:30.900 --> 00:29:31.900
Yes?
00:29:31.900 --> 00:29:34.406
AUDIENCE: So that's why at times
when n is too small and
00:29:34.406 --> 00:29:36.130
epsilon is too small as well,
you can get obvious things,
00:29:36.130 --> 00:29:38.390
like it's less than or equal
to a number greater than 1?
00:29:38.390 --> 00:29:39.526
PROFESSOR: Yes.
00:29:39.526 --> 00:29:42.660
And this inequality is not much
good because there's a
00:29:42.660 --> 00:29:45.120
very obvious inequality
that works.
00:29:45.120 --> 00:29:46.810
Yes.
00:29:46.810 --> 00:29:50.580
But the other thing is this
is a very weak inequality.
00:29:50.580 --> 00:29:53.590
So all this is doing is
giving you a bound.
00:29:53.590 --> 00:29:59.230
All it's doing is saying that
when n gets big enough, this
00:29:59.230 --> 00:30:03.110
number gets as small as
you want it to be.
00:30:03.110 --> 00:30:06.040
So you can get an arbitrary
accuracy of epsilon between
00:30:06.040 --> 00:30:08.520
sample average and mean.
00:30:08.520 --> 00:30:10.780
You can get that with
a probability
00:30:10.780 --> 00:30:13.450
1 minus this quantity.
00:30:13.450 --> 00:30:17.110
You can make that as close
to 1 as you wish if
00:30:17.110 --> 00:30:18.680
you increase n.
00:30:18.680 --> 00:30:22.690
So that gives us the law of
large numbers, and I haven't
00:30:22.690 --> 00:30:27.450
stated it formally, all the
formal jazz is in the notes.
00:30:27.450 --> 00:30:31.700
But it says, if you have IID
random variables with a finite
00:30:31.700 --> 00:30:35.800
variance, the limit of the
probability that Sn over n
00:30:35.800 --> 00:30:38.970
minus x bar, the absolute value
of that is greater than
00:30:38.970 --> 00:30:43.360
or equal to epsilon is equal to
0 in the limit, no matter
00:30:43.360 --> 00:30:45.630
how you choose epsilon.
00:30:45.630 --> 00:30:47.610
Namely, this is one of those
peculiar things in
00:30:47.610 --> 00:30:49.760
mathematics.
00:30:49.760 --> 00:30:53.930
It depends on who gets
the first choice.
00:30:53.930 --> 00:30:56.420
If I get to choose epsilon
and you get to
00:30:56.420 --> 00:30:58.730
choose n, then you win.
00:30:58.730 --> 00:31:00.650
You can make this go to 0.
00:31:00.650 --> 00:31:04.430
If you choose n and then I
choose epsilon, you lose.
00:31:04.430 --> 00:31:06.840
So it's only when you choose
first that you win.
00:31:06.840 --> 00:31:08.630
But still, this statement
works.
00:31:08.630 --> 00:31:11.940
For every epsilon greater
than 0, this limit
00:31:11.940 --> 00:31:14.220
here is equal to 0.
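To see this limit numerically, the following Python sketch computes the exact probability for the Bernoulli case and compares it with the Chebyshev bound. The choice of Bernoulli with p = 1/2, epsilon = 1/10, and the function names are assumptions made for illustration; exact fractions are used so that the boundary case of being exactly epsilon away from the mean is counted correctly.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, p, eps):
    """Exact P(|S_n/n - p| >= eps) for S_n a sum of n IID Bernoulli(p) variables."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) >= eps:
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total

def chebyshev_bound(n, p, eps):
    """Chebyshev bound sigma^2 / (n * eps^2), with sigma^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps ** 2)

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    # Columns: n, exact tail probability, Chebyshev bound.
    print(n, float(tail_probability(n, p, eps)), float(chebyshev_bound(n, p, eps)))
```

With epsilon held fixed, the exact probability shrinks as n grows and always sits below the bound, which for small n is larger than 1.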
00:31:14.220 --> 00:31:22.266
Now, let's go immediately a
couple of pages beyond and
00:31:22.266 --> 00:31:25.200
look at this figure a little
bit because I think this
00:31:25.200 --> 00:31:29.290
figure tells what's going
on, I think better
00:31:29.290 --> 00:31:30.710
than anything else.
00:31:30.710 --> 00:31:34.070
You have the mean of x, which is
right in the center of this
00:31:34.070 --> 00:31:36.740
distribution function.
00:31:36.740 --> 00:31:40.800
As n gets larger and larger,
this distribution function
00:31:40.800 --> 00:31:44.060
here is going to be scrunching
in, which we sort of know
00:31:44.060 --> 00:31:46.720
because the variance
is going to 0.
00:31:46.720 --> 00:31:50.210
And we also sort of know it
because of what this weak law
00:31:50.210 --> 00:31:53.110
of large numbers tells us.
00:31:53.110 --> 00:31:55.390
And we have these.
00:31:58.140 --> 00:32:02.220
If we pick some given epsilon,
then we have--
00:32:02.220 --> 00:32:07.070
if we look at a range of two
epsilon, epsilon on one side
00:32:07.070 --> 00:32:11.030
of the mean, epsilon on the
other side of the mean, then
00:32:11.030 --> 00:32:16.140
we can ask the question, how
well does this distribution
00:32:16.140 --> 00:32:18.950
function conform
to a unit step?
00:32:18.950 --> 00:32:23.830
Well, one easy way of looking at
that is saying, if we draw
00:32:23.830 --> 00:32:29.150
a rectangle here of width 2
epsilon around X bar, when
00:32:29.150 --> 00:32:32.380
does this distribution function
get inside that
00:32:32.380 --> 00:32:37.280
rectangle and when does it
leave the rectangle?
00:32:37.280 --> 00:32:42.930
And what the weak law of large
numbers says is that if you
00:32:42.930 --> 00:32:47.900
pick epsilon and hold it fixed,
then delta 1 is going
00:32:47.900 --> 00:32:51.340
to 0 and delta 2
is going to 0.
00:32:51.340 --> 00:32:52.590
And eventually--
00:32:56.863 --> 00:33:00.030
I think this is dying out.
00:33:00.030 --> 00:33:01.570
Well, no problem.
00:33:04.510 --> 00:33:09.130
What this says is that as n gets
larger and larger, this
00:33:09.130 --> 00:33:11.660
quantity shrinks down to 0.
00:33:11.660 --> 00:33:14.970
That quantity up there
shrinks down to 0.
00:33:14.970 --> 00:33:17.950
And suddenly, you have something
which is, for all
00:33:17.950 --> 00:33:23.120
practical purposes,
a unit step.
00:33:23.120 --> 00:33:26.015
Namely, if you think about it a
little bit, how can you take
00:33:26.015 --> 00:33:30.450
an increasing curve, which
increases from 0 to 1, and say
00:33:30.450 --> 00:33:33.560
that's close to a unit step?
00:33:33.560 --> 00:33:35.200
Isn't this a nice
way of doing it?
00:33:38.590 --> 00:33:41.350
I mean, the function is
increasing so it can't do
00:33:41.350 --> 00:33:45.450
anything after it crosses
this point here.
00:33:45.450 --> 00:33:50.990
All it can do is increase, and
eventually it leaves again.
00:33:50.990 --> 00:33:53.780
Now, another thing.
00:33:53.780 --> 00:33:59.020
When you think about the weak
law of large numbers and you
00:33:59.020 --> 00:34:04.890
don't state it formally, one of
the important things is you
00:34:04.890 --> 00:34:10.909
can't make epsilon 0 here and
you can't make delta 0.
00:34:10.909 --> 00:34:14.389
You need both an epsilon and
a delta in this argument.
00:34:14.389 --> 00:34:16.790
And you can see that just by
looking at a reasonable
00:34:16.790 --> 00:34:19.030
distribution function.
00:34:19.030 --> 00:34:23.550
If you make epsilon equal to 0,
then you're asking, what's
00:34:23.550 --> 00:34:28.940
the probability that this sample
average is exactly
00:34:28.940 --> 00:34:31.190
equal to the mean?
00:34:31.190 --> 00:34:35.920
And in most cases, that's
equal to 0.
00:34:35.920 --> 00:34:39.020
Namely, you can't win
on that argument.
00:34:39.020 --> 00:34:45.800
And if you try to make
delta equal to 0.
00:34:45.800 --> 00:34:47.050
In other words, you ask--
00:34:50.850 --> 00:34:54.219
then suddenly you're stuck over
here, and you're stuck
00:34:54.219 --> 00:34:58.820
way over there, and you can't
make epsilon small.
00:34:58.820 --> 00:35:03.170
So trying to say that a curve
looks like a step function,
00:35:03.170 --> 00:35:07.290
you really need two fudge
factors to do that.
00:35:07.290 --> 00:35:09.410
So the weak law of
large numbers.
00:35:09.410 --> 00:35:18.120
In terms of dealing with how
close you are to a step
00:35:18.120 --> 00:35:21.960
function, the weak law of large
numbers says about the
00:35:21.960 --> 00:35:24.610
only thing you can
reasonably say.
00:35:24.610 --> 00:35:27.620
OK, now let's go back
to the slide before.
00:35:27.620 --> 00:35:31.290
The weak law of large numbers
says that the limit as n goes
00:35:31.290 --> 00:35:35.790
to infinity of the probability
that the sample average differs from the mean by an amount
00:35:35.790 --> 00:35:39.180
greater than or equal
to epsilon equals 0.
00:35:39.180 --> 00:35:41.820
And it says that for every
epsilon greater than 0.
00:35:45.160 --> 00:35:49.290
An equivalent statement is
this statement here.
00:35:52.350 --> 00:35:56.400
The probability that Sn over n
minus x bar is greater than or
00:35:56.400 --> 00:36:00.130
equal to epsilon is a
complicated looking animal,
00:36:00.130 --> 00:36:01.230
but it's just a number.
00:36:01.230 --> 00:36:03.500
It's just a number
between 0 and 1.
00:36:03.500 --> 00:36:05.060
It's a probability.
00:36:05.060 --> 00:36:08.460
So for every n, you get
a number up there.
00:36:08.460 --> 00:36:13.680
And what this is saying is that
that sequence of numbers
00:36:13.680 --> 00:36:16.640
is approaching 0.
00:36:16.640 --> 00:36:19.560
Another way to say that a
sequence of numbers approaches
00:36:19.560 --> 00:36:21.900
0 is this way down here.
00:36:21.900 --> 00:36:27.270
It says that for every epsilon
greater than 0 and every delta
00:36:27.270 --> 00:36:33.000
greater than 0, the probability
that this quantity
00:36:33.000 --> 00:36:38.450
is less than or equal
to delta is--
00:36:38.450 --> 00:36:42.950
this probability is less than
or equal to delta for all
00:36:42.950 --> 00:36:43.910
large enough n.
00:36:43.910 --> 00:36:48.040
In other words, these funny
little things on the edge here
00:36:48.040 --> 00:36:56.930
for this next slide, the delta
1 and delta 2 are going to 0.
00:36:56.930 --> 00:36:59.980
So it's important to understand
this both ways.
00:37:04.930 --> 00:37:10.580
And now again, this quantity
here looks very much like--
00:37:21.740 --> 00:37:24.920
these two equations look
very much alike.
00:37:24.920 --> 00:37:27.470
Except this one tells you
something more about
00:37:27.470 --> 00:37:29.060
convergence than
this one does.
00:37:29.060 --> 00:37:31.900
This says how this goes to 0.
00:37:31.900 --> 00:37:34.260
This only says that
it goes to 0.
00:37:34.260 --> 00:37:37.245
So again, we have
the same thing.
00:37:37.245 --> 00:37:41.800
The weak law of large numbers
says this weaker thing, and it
00:37:41.800 --> 00:37:44.690
says this weaker thing
because sometimes you
00:37:44.690 --> 00:37:46.660
need the weaker thing.
00:37:46.660 --> 00:37:49.200
And in this case, there
is a good example.
00:37:49.200 --> 00:37:52.960
The weak law of large numbers
is true, even if you don't
00:37:52.960 --> 00:37:54.490
have a variance.
00:37:54.490 --> 00:37:56.790
It's true under the
single condition
00:37:56.790 --> 00:37:58.890
that you have a mean.
00:37:58.890 --> 00:38:03.540
There's a nice proof in
the text about that.
00:38:03.540 --> 00:38:08.770
It's a proof that does
something, which we're going
00:38:08.770 --> 00:38:11.030
to do many, many times.
00:38:11.030 --> 00:38:12.880
You look at a random variable.
00:38:12.880 --> 00:38:15.990
You can't say what you want
to say about it, so
00:38:15.990 --> 00:38:18.930
you truncate it.
00:38:18.930 --> 00:38:21.910
You say, let me--
00:38:21.910 --> 00:38:27.520
I mean, if you think a problem
is too hard, you look at a
00:38:27.520 --> 00:38:29.120
simpler problem.
00:38:29.120 --> 00:38:32.050
If you're drunk and you drop
a coin, you look for it
00:38:32.050 --> 00:38:33.010
underneath a light.
00:38:33.010 --> 00:38:35.130
You don't look for it where
it's dark, even though you
00:38:35.130 --> 00:38:37.380
dropped it where it's dark.
00:38:37.380 --> 00:38:38.620
So all of us do that.
00:38:38.620 --> 00:38:41.830
If we can't solve a problem,
we try to pose a simpler,
00:38:41.830 --> 00:38:44.360
similar problem that
we can solve.
00:38:44.360 --> 00:38:47.220
So you truncate this
random variable.
00:38:47.220 --> 00:38:50.190
When you truncate a random
variable, I mean you just take
00:38:50.190 --> 00:38:53.050
its distribution function
and you chop it off
00:38:53.050 --> 00:38:55.470
to a certain point.
00:38:55.470 --> 00:38:58.040
And what happens then?
00:38:58.040 --> 00:38:59.370
Well, at that point you
have a variance.
00:38:59.370 --> 00:39:01.010
You have a moment generating
function.
00:39:01.010 --> 00:39:02.680
You have all the things
you want.
00:39:02.680 --> 00:39:07.160
Nothing peculiar can happen
because the thing is bounded.
00:39:07.160 --> 00:39:11.090
So then the trick in proving the
weak law of large numbers
00:39:11.090 --> 00:39:14.620
under these more general
circumstances is to first
00:39:14.620 --> 00:39:16.960
truncate the random variable.
00:39:16.960 --> 00:39:19.480
You then have the weak
law of large numbers.
00:39:19.480 --> 00:39:22.890
And then the thing that you do
is in a very ticklish way, you
00:39:22.890 --> 00:39:26.050
start increasing n,
and you increase
00:39:26.050 --> 00:39:27.850
the truncation parameter.
00:39:27.850 --> 00:39:31.550
And if you do this in just the
right way, you wind up proving
00:39:31.550 --> 00:39:33.620
the theorem you want to prove.
00:39:33.620 --> 00:39:36.000
Now, I'm not saying you ought
to read that proof now.
00:39:38.710 --> 00:39:42.020
If you're sailing along with no
problems at all, you ought
00:39:42.020 --> 00:39:43.990
to read that proof now.
00:39:43.990 --> 00:39:46.800
If you don't quite have the
kind of mathematical
00:39:46.800 --> 00:39:51.670
background that I seem to be
often assuming in this course,
00:39:51.670 --> 00:39:53.420
you ought to skip that.
00:39:53.420 --> 00:39:56.740
You will have many opportunities
to understand
00:39:56.740 --> 00:39:57.830
the technique later.
00:39:57.830 --> 00:40:01.900
It's the technique which is
important, it's not the--
00:40:01.900 --> 00:40:04.600
I mean, it's not so much
the actual proof.
00:40:09.530 --> 00:40:13.660
Now, the thing we didn't talk
about, about this picture
00:40:13.660 --> 00:40:18.770
here, is we say that a sequence
of random variables--
00:40:18.770 --> 00:40:20.520
Y1, Y2, et cetera--
00:40:20.520 --> 00:40:25.350
converges in probability to a
random variable y if for every
00:40:25.350 --> 00:40:30.740
epsilon greater than 0 and every
delta greater than 0,
00:40:30.740 --> 00:40:36.440
the probability that the n-th
random variable minus this
00:40:36.440 --> 00:40:39.370
funny random variable is greater
than or equal to
00:40:39.370 --> 00:40:41.330
epsilon, is less than
or equal to delta.
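In symbols, the definition just stated can be sketched as follows. The phrase "for all sufficiently large n" is spelled out here to match the earlier epsilon-delta form of the weak law; the notation is an assumption about how the slide writes it.

```latex
\text{For every } \varepsilon > 0 \text{ and } \delta > 0:\quad
\Pr\{\,|Y_n - Y| \ge \varepsilon\,\} \le \delta
\quad\text{for all sufficiently large } n,
\qquad\text{equivalently}\qquad
\lim_{n\to\infty} \Pr\{\,|Y_n - Y| \ge \varepsilon\,\} = 0
\ \text{ for every } \varepsilon > 0 .
```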
00:40:41.330 --> 00:40:45.460
That's saying the same thing
as this picture says.
00:40:45.460 --> 00:40:49.500
In this picture here, you can
draw each one of the Y sub n's
00:40:49.500 --> 00:40:52.840
minus Y. You think of
Y sub n minus Y as a
00:40:52.840 --> 00:40:55.710
single random variable.
00:40:55.710 --> 00:40:58.780
And then you get this kind of
curve here and the same
00:40:58.780 --> 00:41:00.580
interpretation works.
00:41:00.580 --> 00:41:04.160
So again, what you're saying
with convergence in
00:41:04.160 --> 00:41:09.380
probability is that the
distribution function of Yn
00:41:09.380 --> 00:41:15.710
minus Y is approaching a unit
step as n gets big.
00:41:15.710 --> 00:41:18.570
So that's really the meaning
of convergence in
00:41:18.570 --> 00:41:20.190
probability.
00:41:20.190 --> 00:41:23.720
I mean, you get this unit step
as n gets bigger and bigger.
00:41:27.164 --> 00:41:30.860
OK, so let's review all
of what we've done in
00:41:30.860 --> 00:41:34.070
the last half hour.
00:41:34.070 --> 00:41:38.470
If a random variable, generic
random variable x, has a
00:41:38.470 --> 00:41:39.640
standard deviation--
00:41:39.640 --> 00:41:42.570
in other words, if it has
a finite variance.
00:41:42.570 --> 00:41:49.580
And if X1, X2 are IID with that
standard deviation, then
00:41:49.580 --> 00:41:53.860
the standard deviation of the
relative frequency is equal to
00:41:53.860 --> 00:41:56.840
the standard deviation
of x divided by the
00:41:56.840 --> 00:41:58.290
square root of n.
00:41:58.290 --> 00:42:02.610
So the standard deviation
is the relative
00:42:02.610 --> 00:42:03.760
of the sample average.
00:42:03.760 --> 00:42:06.430
Excuse me, sample average,
not relative frequency.
00:42:06.430 --> 00:42:12.080
Of the sample average is going
to 0 as n gets big.
00:42:12.080 --> 00:42:17.530
In the same way, if you have a
sequence of arbitrary random
00:42:17.530 --> 00:42:21.720
variables, which are converging
to Y in mean
00:42:21.720 --> 00:42:25.820
square, then Chebyshev shows
that it converges in
00:42:25.820 --> 00:42:27.180
probability.
00:42:27.180 --> 00:42:31.530
OK so mean square convergence
implies convergence in
00:42:31.530 --> 00:42:32.810
probability.
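The one-line argument behind this implication can be sketched as below. It applies the Chebyshev (Markov) bound to the squared difference; the notation is assumed rather than taken from the slide.

```latex
\Pr\{\,|Y_n - Y| \ge \varepsilon\,\}
  = \Pr\{\,(Y_n - Y)^2 \ge \varepsilon^2\,\}
  \le \frac{\mathsf{E}\big[(Y_n - Y)^2\big]}{\varepsilon^2}
  \xrightarrow[n\to\infty]{} 0
  \quad\text{for every fixed } \varepsilon > 0 .
```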
00:42:32.810 --> 00:42:35.950
Mean square convergence is a
funny statement which says
00:42:35.950 --> 00:42:44.470
that this sequence of random
variables has a standard
00:42:44.470 --> 00:42:47.260
deviation, which
is going to 0.
00:42:47.260 --> 00:42:51.500
And it's hard to see exactly
what that means because that
00:42:51.500 --> 00:42:55.140
standard deviation is a
complicated integral.
00:42:55.140 --> 00:42:58.040
And I don't know
what it means.
00:42:58.040 --> 00:43:02.070
But if you use the Chebyshev
inequality, then it means this
00:43:02.070 --> 00:43:10.610
very simple statement, which
says that this sequence has to
00:43:10.610 --> 00:43:13.840
converge in probability to Y.
00:43:13.840 --> 00:43:16.600
Mean square convergence then
implies convergence in
00:43:16.600 --> 00:43:19.480
probability.
00:43:19.480 --> 00:43:22.420
Reverse isn't true because--
00:43:22.420 --> 00:43:26.010
and I can't give you an example
of it now, but I've
00:43:26.010 --> 00:43:28.450
already told you something
about it.
00:43:28.450 --> 00:43:31.650
Because I've said that the
weak law of large numbers
00:43:31.650 --> 00:43:39.280
continues to hold if the generic
random variable has a
00:43:39.280 --> 00:43:41.890
mean, but doesn't have a
variance because of this
00:43:41.890 --> 00:43:43.140
truncation argument.
00:43:48.440 --> 00:43:53.170
Well, I mean, what it says
then is a variance is not
00:43:53.170 --> 00:43:57.030
required for the weak law of
large numbers to hold.
00:43:57.030 --> 00:43:59.650
And if the variance doesn't
exist, then you certainly don't
00:43:59.650 --> 00:44:01.990
have convergence
in mean square.
00:44:01.990 --> 00:44:07.150
So we have an example even
though you haven't proven that
00:44:07.150 --> 00:44:08.700
that example works.
00:44:08.700 --> 00:44:11.780
You have an example where the
weak law of large numbers
00:44:11.780 --> 00:44:19.290
holds, but convergence in mean
square does not hold.
00:44:19.290 --> 00:44:24.510
OK, and the final thing is
convergence in probability
00:44:24.510 --> 00:44:29.730
really means that the
distribution of Yn minus Y
00:44:29.730 --> 00:44:31.330
approaches the unit step.
00:44:31.330 --> 00:44:31.750
Yes?
00:44:31.750 --> 00:44:35.117
AUDIENCE: So in general,
convergence in probability
00:44:35.117 --> 00:44:37.041
doesn't imply convergence
in distribution.
00:44:37.041 --> 00:44:38.970
But it holds in this special
case because--
00:44:38.970 --> 00:44:40.746
PROFESSOR: It does imply
convergence in distribution.
00:44:40.746 --> 00:44:42.560
We haven't talked about
convergence in
00:44:42.560 --> 00:44:43.810
distribution yet.
00:44:48.040 --> 00:44:52.060
Except it does not imply
convergence in mean square,
00:44:52.060 --> 00:44:55.190
which is a thing that
requires a variance.
00:44:55.190 --> 00:44:59.230
So you can have convergence
in probability without
00:44:59.230 --> 00:45:02.570
convergence in mean square,
but not the other way.
00:45:02.570 --> 00:45:04.860
I mean, convergence in mean
square, you just apply
00:45:04.860 --> 00:45:07.210
Chebyshev to it,
and suddenly--
00:45:07.210 --> 00:45:09.995
presto, you have convergence
in probability.
00:45:16.960 --> 00:45:18.530
And incidentally, I
wish all of you
00:45:18.530 --> 00:45:21.450
would ask more questions.
00:45:21.450 --> 00:45:24.720
Because we're taking this video,
which is going to be
00:45:24.720 --> 00:45:28.860
shown to many people in many
different countries.
00:45:28.860 --> 00:45:32.830
And they ask themselves, would
it be better if I came to MIT
00:45:32.830 --> 00:45:35.450
and then I could sit in class
and ask questions?
00:45:35.450 --> 00:45:37.760
And then they see these videos
and they say, ah, it doesn't
00:45:37.760 --> 00:45:39.590
make any difference, nobody
asks questions anyway.
00:45:42.820 --> 00:45:45.750
And because of that, MIT
will simply wither
00:45:45.750 --> 00:45:47.810
away at some point.
00:45:47.810 --> 00:45:50.720
So it's very important for you
to ask questions now and then.
00:45:55.470 --> 00:45:58.400
Now, let's go on to the
central limit theorem.
00:46:03.900 --> 00:46:08.110
This sum of n IID
random variables
00:46:08.110 --> 00:46:12.110
minus n times the mean--
00:46:12.110 --> 00:46:17.010
in other words, we just
normalized it to 0 mean.
00:46:17.010 --> 00:46:20.945
S sub n minus n x bar is a
0 mean random variable.
00:46:20.945 --> 00:46:23.660
And it has variance n
times sigma squared.
00:46:23.660 --> 00:46:28.140
It also has second moment
n times sigma squared.
00:46:28.140 --> 00:46:35.220
And what that means is that you
take Sn minus n times the
00:46:35.220 --> 00:46:39.590
mean of x and divide it by the
square root of n times sigma.
00:46:39.590 --> 00:46:41.490
What you get is something
which is 0
00:46:41.490 --> 00:46:44.640
mean and unit variance.
00:46:44.640 --> 00:46:49.070
So as you keep increasing n,
this random variable here, Sn
00:46:49.070 --> 00:46:53.550
minus n x bar over the square
root of n sigma, just sits
00:46:53.550 --> 00:46:57.720
there rock solid with the same
mean and the same variance,
00:46:57.720 --> 00:47:00.190
nothing ever happens to it.
00:47:00.190 --> 00:47:03.670
Except it has a distribution
function, and the distribution
00:47:03.670 --> 00:47:05.020
function changes.
00:47:05.020 --> 00:47:08.220
I mean, you see the distribution
function changing
00:47:08.220 --> 00:47:12.380
here as you let n get
larger and larger.
00:47:12.380 --> 00:47:16.310
In some sense, these steps are
getting smaller and smaller.
00:47:16.310 --> 00:47:19.680
So it looks like you're
approaching some particular
00:47:19.680 --> 00:47:26.470
curve and when we looked
at the Bernoulli case--
00:47:26.470 --> 00:47:28.290
I guess it was just last time.
00:47:28.290 --> 00:47:32.080
When we looked at the Bernoulli
case, what we saw is
00:47:32.080 --> 00:47:37.610
that these steps here were
going as e to the minus
00:47:37.610 --> 00:47:42.050
difference from the mean squared
divided by 2 times
00:47:42.050 --> 00:47:43.570
sigma squared.
00:47:43.570 --> 00:47:47.270
Bunch of stuff, but what we saw
was that these steps were
00:47:47.270 --> 00:47:52.790
proportional to the density
of a Gaussian.
00:47:52.790 --> 00:47:57.030
In other words, this curve that
we're converging to is
00:47:57.030 --> 00:48:00.170
proportional to the distribution
function of the
00:48:00.170 --> 00:48:02.050
Gaussian random variable.
00:48:02.050 --> 00:48:06.120
We didn't completely prove that
because all we did was to
00:48:06.120 --> 00:48:08.590
show what happened to the PMF.
00:48:08.590 --> 00:48:11.500
We didn't really integrate
these things.
00:48:11.500 --> 00:48:16.830
We didn't really deal with all
of the small quantities.
00:48:16.830 --> 00:48:19.100
We said they weren't
important.
00:48:19.100 --> 00:48:23.620
But you sort of got the picture
of exactly why this
00:48:23.620 --> 00:48:26.860
convergence to a normal
distribution
00:48:26.860 --> 00:48:29.900
function takes place.
00:48:29.900 --> 00:48:33.310
And the theorem says this more
general thing that this
00:48:33.310 --> 00:48:37.340
convergence does, in
fact, take place.
00:48:37.340 --> 00:48:41.310
And that is what the central
limit theorem says.
00:48:41.310 --> 00:48:45.460
It says that that happens not
only for the Bernoulli case,
00:48:45.460 --> 00:48:48.360
but it happens for all
random variables,
00:48:48.360 --> 00:48:51.210
which have a variance.
00:48:51.210 --> 00:48:54.890
And the convergence is
relatively good if the random
00:48:54.890 --> 00:48:57.110
variables have a certain
moment, and it
00:48:57.110 --> 00:48:58.575
can be awful otherwise.
00:49:08.450 --> 00:49:12.980
So this expression on top then
is really the expression of
00:49:12.980 --> 00:49:15.320
the central limit theorem.
00:49:15.320 --> 00:49:24.320
It says not only does the
normalized sample average--
00:49:24.320 --> 00:49:28.290
I'll call this whole thing the
normalized sample average
00:49:28.290 --> 00:49:33.700
because Sn minus n X bar has
standard deviation square root of n
00:49:33.700 --> 00:49:35.350
times sigma sub x.
00:49:35.350 --> 00:49:40.900
So this normalized sample
average has mean 0 and
00:49:40.900 --> 00:49:42.760
standard deviation 1.
00:49:42.760 --> 00:49:46.370
Not only does it have mean 0
and variance 1, but it also
00:49:46.370 --> 00:49:50.620
becomes closer and closer to
this Gaussian distribution.
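A small numerical check of this statement for the Bernoulli case, as a Python sketch. The choice p = 1/2, the grid of z values, and the function names are assumptions for illustration. The code compares the exact distribution function of (Sn - n X bar) / (sqrt(n) sigma) with the Gaussian distribution function at a few points.

```python
from math import comb, erf, sqrt

def gaussian_cdf(z):
    """Distribution function of a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normalized_sum_cdf(n, p, z):
    """Exact P((S_n - n*p) / (sqrt(n)*sigma) <= z) for S_n a sum of n IID Bernoulli(p)."""
    sigma = sqrt(p * (1 - p))
    threshold = n * p + z * sqrt(n) * sigma
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if k <= threshold)

def max_cdf_gap(n, p, zs=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest gap between the two distribution functions over the grid zs."""
    return max(abs(normalized_sum_cdf(n, p, z) - gaussian_cdf(z)) for z in zs)

for n in (10, 100, 400):
    print(n, max_cdf_gap(n, 0.5))
```

The mean and variance of the normalized sum stay at 0 and 1 for every n; what changes is the distribution function, whose steps shrink so that the gap to the Gaussian curve gets smaller as n grows.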
00:49:50.620 --> 00:49:51.870
Why is that important?
00:49:54.500 --> 00:49:57.020
Well, if you start studying
noise and things like that,
00:49:57.020 --> 00:49:58.870
it's very important.
00:49:58.870 --> 00:50:03.290
Because it says that if you have
the sum of lots and lots
00:50:03.290 --> 00:50:09.450
of very, very small, unimportant
things, then what
00:50:09.450 --> 00:50:12.830
those things add up to if
they're relatively independent
00:50:12.830 --> 00:50:15.310
is something which is
almost Gaussian.
00:50:15.310 --> 00:50:22.080
You pick up a book on noise
theory or you pick up a book
00:50:22.080 --> 00:50:26.360
which is on communication, or
which is on control, or
00:50:26.360 --> 00:50:30.430
something like that, and after
you read a few chapters, you
00:50:30.430 --> 00:50:35.550
get the idea that all random
variables are Gaussian.
00:50:35.550 --> 00:50:38.040
This is particularly true
if you look at books on
00:50:38.040 --> 00:50:39.630
statistics.
00:50:39.630 --> 00:50:42.160
Many, many books on statistics,
particularly for
00:50:42.160 --> 00:50:45.360
undergraduates, the only random
variable they ever talk
00:50:45.360 --> 00:50:48.610
about is the normal
random variable.
00:50:48.610 --> 00:50:51.160
For some reason or other, you're
led to believe that all
00:50:51.160 --> 00:50:53.860
random variables are Gaussian.
00:50:53.860 --> 00:50:55.300
Well, of course, they aren't.
00:50:55.300 --> 00:50:59.080
But this says that a lot of
random variables, which are
00:50:59.080 --> 00:51:03.210
sums of large numbers of little
things, in fact are
00:51:03.210 --> 00:51:04.540
close to Gaussian.
00:51:04.540 --> 00:51:06.925
But we're interested in it
here for another reason.
00:51:11.540 --> 00:51:15.530
And we'll come to that
in a little bit.
00:51:15.530 --> 00:51:19.160
But let me make the comment that
the proofs that I gave
00:51:19.160 --> 00:51:22.150
you about the central
limit theorem for
00:51:22.150 --> 00:51:24.650
the Bernoulli case--
00:51:24.650 --> 00:51:27.300
and if you fill in those
epsilons and deltas there,
00:51:27.300 --> 00:51:30.520
that really was a valid proof.
00:51:30.520 --> 00:51:34.270
That technique does not work
at all when you have a
00:51:34.270 --> 00:51:36.180
non-Bernoulli situation.
00:51:36.180 --> 00:51:38.760
Because the situation is
very, very complicated.
00:51:38.760 --> 00:51:41.400
You wind up--
00:51:41.400 --> 00:51:43.740
I mean, if you have a Bernoulli
case, you wind up
00:51:43.740 --> 00:51:46.800
with this nice, nice
distribution, which says that
00:51:46.800 --> 00:51:51.390
every step in the Bernoulli
distribution, you have terms
00:51:51.390 --> 00:51:54.420
that are increasing and then
terms that are decreasing.
00:51:54.420 --> 00:51:57.030
If you look at what happens
for a discrete random
00:51:57.030 --> 00:52:01.870
variable, which is not binary,
you have the most god-awful
00:52:01.870 --> 00:52:05.570
distribution if you
try to look at the
00:52:05.570 --> 00:52:07.130
probability mass function.
00:52:07.130 --> 00:52:09.500
It is just awful.
00:52:09.500 --> 00:52:12.140
And the only thing which
looks nice is the
00:52:12.140 --> 00:52:13.105
distribution function.
00:52:13.105 --> 00:52:16.110
The distribution function
looks relatively nice.
00:52:16.110 --> 00:52:18.713
And why is hard to tell.
00:52:18.713 --> 00:52:24.110
And if you look at proofs of
it, it goes through Fourier
00:52:24.110 --> 00:52:25.090
transforms.
00:52:25.090 --> 00:52:28.330
In probability theory, Fourier
transforms are called
00:52:28.330 --> 00:52:32.300
characteristic functions, but
it's really the same thing.
00:52:32.300 --> 00:52:35.610
And you go through this very
complicated argument.
00:52:35.610 --> 00:52:38.720
I've been through it
a number of times.
00:52:38.720 --> 00:52:41.200
And to me, it's all algebra.
00:52:41.200 --> 00:52:44.340
And I'm not a person that just
accepts the fact that
00:52:44.340 --> 00:52:46.690
something is all
algebra easily.
00:52:46.690 --> 00:52:50.170
I keep trying to find ways of
making sense out of it.
00:52:50.170 --> 00:52:52.700
And I've never been able to make
sense out of it, but I'm
00:52:52.700 --> 00:52:54.400
convinced that it's true.
00:52:54.400 --> 00:52:58.180
So you just have to sort
of live with that.
00:52:58.180 --> 00:53:06.550
So anyway, the central limit
theorem does apply to the
00:53:06.550 --> 00:53:08.420
distribution function.
00:53:08.420 --> 00:53:11.090
Namely, exactly what
that says.
00:53:11.090 --> 00:53:16.620
The distribution function of
this normalized sample average
00:53:16.620 --> 00:53:21.330
does go into the distribution
function of the Gaussian.
00:53:21.330 --> 00:53:26.240
The PMFs do not converge
at all.
00:53:26.240 --> 00:53:30.410
And nothing else converges,
but just that one thing.
00:53:30.410 --> 00:53:34.780
OK, a sequence of random
variables converges in
00:53:34.780 --> 00:53:36.060
distribution--
00:53:36.060 --> 00:53:38.450
this is what someone was asking
about just a second
00:53:38.450 --> 00:53:43.080
ago, but I don't think you were
really asking about that.
00:53:43.080 --> 00:53:46.980
But this is what convergence
in distribution means.
00:53:46.980 --> 00:53:51.610
It means that the limit of the
distribution functions of a
00:53:51.610 --> 00:53:54.240
sequence of random variables
00:53:54.240 --> 00:53:59.140
turns into the distribution
function of some
00:53:59.140 --> 00:54:00.390
other random variable.
00:54:06.530 --> 00:54:09.990
Then you say that these random
variables converge in
00:54:09.990 --> 00:54:14.930
distribution to Z. And that's
a nice, useful thing.
00:54:14.930 --> 00:54:18.920
And the CLT, the Central Limit
Theorem, then says that this
00:54:18.920 --> 00:54:24.990
normalized sample average
converges in distribution to
00:54:24.990 --> 00:54:28.440
the distribution of a normal
random variable.
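A minimal numerical sketch of the claim above, assuming a Bernoulli example with p = 1/4 (the helper names `binom_pmf`, `normal_cdf`, `max_cdf_gap` and `max_pmf` are illustrative, not from the lecture). It computes the exact distribution function of the normalized sum and its largest gap from the Gaussian distribution function, next to the largest single probability mass. The gap should shrink as n grows, while the individual masses only shrink toward zero and never take on the values of a density.

```python
import math

def binom_pmf(n, k, p):
    """Exact P(S_n = k) for a sum of n IID Bernoulli(p) variables."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(z):
    """Distribution function of a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_gap(n, p):
    """Largest |F_Zn(z) - Phi(z)| checked at the jump points of
    Z_n = (S_n - n p) / (sqrt(n) sigma)."""
    sigma = math.sqrt(p * (1 - p))
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (k - n * p) / (math.sqrt(n) * sigma)
        phi = normal_cdf(z)
        gap = max(gap, abs(cdf - phi))   # just below the jump at z
        cdf += binom_pmf(n, k, p)
        gap = max(gap, abs(cdf - phi))   # at the jump itself
    return gap

def max_pmf(n, p):
    """Largest single probability mass of S_n (and hence of Z_n)."""
    return max(binom_pmf(n, k, p) for k in range(n + 1))

for n in (4, 20, 50, 200):
    print(n, round(max_cdf_gap(n, 0.25), 4), round(max_pmf(n, 0.25), 4))
```
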
00:54:28.440 --> 00:54:33.280
Many people call that density
phi and call the normal
00:54:33.280 --> 00:54:34.650
distribution phi.
00:54:34.650 --> 00:54:35.280
I don't know why.
00:54:35.280 --> 00:54:39.600
I mean, you've got to call it
something, so many people call
00:54:39.600 --> 00:54:40.850
it the same thing.
00:54:43.110 --> 00:54:46.660
This convergence in
distribution is really almost
00:54:46.660 --> 00:54:48.450
a misnomer.
00:54:48.450 --> 00:54:52.240
Because when random variables
converge in distribution to
00:54:52.240 --> 00:54:56.740
another random variable, I mean,
if you say something
00:54:56.740 --> 00:55:03.290
converges, usually you have the
idea that the thing which
00:55:03.290 --> 00:55:06.710
is converging to something else
is getting close to it in
00:55:06.710 --> 00:55:08.190
some sense.
00:55:08.190 --> 00:55:11.240
And the random variables aren't
getting close at all,
00:55:11.240 --> 00:55:12.885
it's only the distribution
functions
00:55:12.885 --> 00:55:15.180
that are getting close.
00:55:15.180 --> 00:55:21.680
If I take a sequence of IID
random variables, all of them
00:55:21.680 --> 00:55:24.910
have the same distribution
function.
00:55:24.910 --> 00:55:27.430
And therefore, a
sequence of IID
00:55:27.430 --> 00:55:30.250
random variables converges.
00:55:30.250 --> 00:55:35.040
And in fact, it's converged
right from the beginning to
00:55:35.040 --> 00:55:37.100
the same random variable,
to the same
00:55:37.100 --> 00:55:39.140
generic random variable.
00:55:39.140 --> 00:55:41.330
But they're not at all
close to each other.
00:55:41.330 --> 00:55:46.130
But you still call this
convergence in distribution.
00:55:46.130 --> 00:55:48.820
Why do we make such a big fuss
about convergence in
00:55:48.820 --> 00:55:50.020
distribution?
00:55:50.020 --> 00:55:53.850
Well, primarily because of the
central limit theorem because
00:55:53.850 --> 00:55:57.740
you would like to see that a
sequence of random variables,
00:55:57.740 --> 00:56:01.600
in fact, starts to look like
something that is interesting,
00:56:01.600 --> 00:56:04.250
which is the Gaussian random
variable after a while.
00:56:04.250 --> 00:56:08.340
It says we can do these
crazy things that
00:56:08.340 --> 00:56:11.910
statisticians do, and that--
00:56:11.910 --> 00:56:14.390
well, fortunately, most
communication theorists are a
00:56:14.390 --> 00:56:18.160
little more careful than
statisticians.
00:56:18.160 --> 00:56:20.980
Somebody's going to hit me
for saying that, but
00:56:20.980 --> 00:56:22.860
I think it's true.
00:56:22.860 --> 00:56:29.420
But the central limit theorem,
in fact, does say that many of
00:56:29.420 --> 00:56:33.180
these sums of random variables
you look at can be reasonably
00:56:33.180 --> 00:56:35.135
approximated as being
Gaussian.
00:56:38.750 --> 00:56:44.730
So what we have now is
convergence in probability
00:56:44.730 --> 00:56:46.550
implies convergence
in distribution.
00:56:49.110 --> 00:56:51.620
And the proof, I will--
00:56:51.620 --> 00:56:56.680
on the slides, I always
abbreviate proof by Pf.
00:56:56.680 --> 00:57:00.190
And sometimes Pf is just what
it sounds like, it's
00:57:00.190 --> 00:57:05.430
"poof." It is not quite a proof,
and you have to look at
00:57:05.430 --> 00:57:07.790
those to get the actual proof.
00:57:07.790 --> 00:57:11.530
But this says that convergence
of a sequence of Yn's in
00:57:11.530 --> 00:57:14.970
probability means that it
converges to a unit step.
00:57:14.970 --> 00:57:17.360
That's exactly what convergence
00:57:17.360 --> 00:57:20.400
in probability means.
00:57:20.400 --> 00:57:24.860
It converges to a unit step,
and it converges everywhere
00:57:24.860 --> 00:57:27.580
but at the step itself.
00:57:27.580 --> 00:57:30.040
If you look at the definition
of convergence in
00:57:30.040 --> 00:57:33.970
distribution, and I might not
have said it carefully enough
00:57:33.970 --> 00:57:38.630
when I defined it back here.
00:57:38.630 --> 00:57:40.310
Oh, yes, I did.
00:57:40.310 --> 00:57:42.590
Remarkable.
00:57:42.590 --> 00:57:46.130
Often I make up these slides
when I'm half asleep, and they
00:57:46.130 --> 00:57:49.050
don't always say what I
intended them to say.
00:57:49.050 --> 00:57:53.290
And my evil twin brother comes
in and changes them later.
00:57:53.290 --> 00:57:54.430
But here I said it right.
00:57:54.430 --> 00:57:58.660
A sequence of random variables
converges in distribution to
00:57:58.660 --> 00:58:05.910
another random variable Z if the
limit of the distribution
00:58:05.910 --> 00:58:10.736
function is equal
to the limit--
00:58:10.736 --> 00:58:14.530
if the limit of the distribution
function of the Z
00:58:14.530 --> 00:58:18.780
sub n is equal to the
distribution function of Z.
00:58:18.780 --> 00:58:23.470
But it only says for all z
where this distribution
00:58:23.470 --> 00:58:26.550
function is continuous.
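Written out in symbols, as a sketch in which F denotes a distribution function and the other notation is assumed from the spoken description (X bar for the mean, sigma for the standard deviation, S_n for the sum of the first n variables), the definition and the central limit theorem statement read roughly as follows.

```latex
% Convergence in distribution of Z_1, Z_2, ... to Z:
\[
  \lim_{n\to\infty} F_{Z_n}(z) = F_Z(z)
  \quad \text{for all } z \text{ at which } F_Z \text{ is continuous.}
\]
% Central limit theorem for the normalized sum:
\[
  Z_n = \frac{S_n - n\overline{X}}{\sqrt{n}\,\sigma}, \qquad
  \lim_{n\to\infty} \Pr\{Z_n \le z\} = \Phi(z)
  = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy .
\]
```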
00:58:26.550 --> 00:58:33.840
You can't really expect much
more than that because if
00:58:33.840 --> 00:58:36.180
you're looking at
a distribution--
00:58:36.180 --> 00:58:38.380
if you're looking at a
limiting distribution
00:58:38.380 --> 00:58:43.990
function, which looks like this,
especially for the law
00:58:43.990 --> 00:58:50.080
of large numbers, all we've
been able to show is that
00:58:50.080 --> 00:58:56.930
these distributions come in down
here very close, go up
00:58:56.930 --> 00:58:59.660
and get out very
close up there.
00:58:59.660 --> 00:59:01.650
We haven't said anything
about where they
00:59:01.650 --> 00:59:03.680
cross this actual line.
00:59:03.680 --> 00:59:07.160
And there's nothing in the
argument about the weak law of
00:59:07.160 --> 00:59:11.970
large numbers, which says
anything about what happens
00:59:11.970 --> 00:59:15.640
right exactly at the mean.
00:59:15.640 --> 00:59:18.700
But that's something
that the central
00:59:18.700 --> 00:59:20.460
limit theorem says--
00:59:20.460 --> 00:59:21.632
Yes?
00:59:21.632 --> 00:59:22.574
AUDIENCE: What's the Zn?
00:59:22.574 --> 00:59:25.400
That's not the sample mean?
00:59:31.370 --> 00:59:37.530
PROFESSOR: When we use the Zn's
for the central limit
00:59:37.530 --> 00:59:42.510
theorem, then what I mean by
the Z sub n's here is those
00:59:42.510 --> 00:59:47.380
normalized random variables Sn
minus n x bar over square root
00:59:47.380 --> 00:59:48.630
of n times sigma.
00:59:50.760 --> 00:59:54.080
And in all of these definitions
of convergence,
00:59:54.080 --> 00:59:57.950
the random variables, which are
converging to something,
00:59:57.950 --> 00:59:59.890
are always rather peculiar.
00:59:59.890 --> 01:00:02.660
Sometimes they're the
sample averages.
01:00:02.660 --> 01:00:06.840
Sometimes they're the normalized
sample averages.
01:00:06.840 --> 01:00:08.090
God knows what they are.
01:00:10.980 --> 01:00:13.430
But what mathematicians
like to do--
01:00:13.430 --> 01:00:15.950
and there's a good reason for
what they like to do--
01:00:15.950 --> 01:00:20.340
is they like to define different
kinds of convergence
01:00:20.340 --> 01:00:25.620
in general terms, and then apply
them to the specific
01:00:25.620 --> 01:00:28.770
thing that you're
interested in.
01:00:28.770 --> 01:00:31.620
OK, so the central limit
theorem says that this
01:00:31.620 --> 01:00:38.610
normalized sum converges in
distribution to phi, but it
01:00:38.610 --> 01:00:42.180
only has to converge where the
distribution function is
01:00:42.180 --> 01:00:43.287
continuous.
01:00:43.287 --> 01:00:45.275
Yes?
01:00:45.275 --> 01:00:47.760
AUDIENCE: So the theorem applies
to the distribution.
01:00:47.760 --> 01:00:52.730
Why doesn't it apply to PMF?
01:00:52.730 --> 01:01:02.560
PROFESSOR: Well, if you look at
the example we have here,
01:01:02.560 --> 01:01:10.400
if you look at the distribution function for
this normalized random
01:01:10.400 --> 01:01:13.030
variable, you find something
which is
01:01:13.030 --> 01:01:15.550
jumping up, jumping up.
01:01:15.550 --> 01:01:18.780
If we look at it for n equals
50, it's still jumping up.
01:01:18.780 --> 01:01:20.760
The jumps are smaller.
01:01:20.760 --> 01:01:23.150
But if you look at the
PDF for this--
01:01:23.150 --> 01:01:28.810
well, if you look at the
distribution function for the
01:01:28.810 --> 01:01:30.530
normal, it has a density.
01:01:34.260 --> 01:01:41.130
The distribution functions for the
things which you want to
01:01:41.130 --> 01:01:45.690
approach a limit never
have a density.
01:01:45.690 --> 01:01:47.760
All the time they have a PMF.
01:01:47.760 --> 01:01:51.020
The steps are getting
smaller and smaller.
01:01:51.020 --> 01:01:54.390
And you can see that here as
you're up to n equals 50.
01:01:54.390 --> 01:01:57.740
You can see these little
tiny steps here.
01:01:57.740 --> 01:02:03.350
But you still have a PMF.
01:02:03.350 --> 01:02:06.850
If you want to look at it
in terms of density,
01:02:06.850 --> 01:02:09.980
you have to look at it in
terms of impulses.
01:02:09.980 --> 01:02:12.630
And there's no way you can say
an impulse is starting to
01:02:12.630 --> 01:02:14.765
approach a smooth curve.
01:02:36.640 --> 01:02:39.610
OK, so we have this proof that
convergence in probability
01:02:39.610 --> 01:02:43.770
implies convergence
in distribution.
01:02:43.770 --> 01:02:47.680
And since convergence in mean
square implies convergence in
01:02:47.680 --> 01:02:51.430
probability, and convergence
in probability implies
01:02:51.430 --> 01:02:54.760
convergence in distribution,
we suddenly have that
01:02:54.760 --> 01:02:58.030
convergence in mean square
implies convergence in
01:02:58.030 --> 01:02:59.425
distribution also.
01:02:59.425 --> 01:03:03.740
And you have this nice picture
in the book of all the things
01:03:03.740 --> 01:03:05.990
that converge in distribution.
01:03:05.990 --> 01:03:07.420
Inside of that--
01:03:07.420 --> 01:03:10.250
this is distribution.
01:03:10.250 --> 01:03:12.850
Inside of that is all the
things that converge in
01:03:12.850 --> 01:03:14.100
probability.
01:03:17.440 --> 01:03:20.490
And inside of that is
all the things that
01:03:20.490 --> 01:03:21.740
converge in mean square.
01:03:31.510 --> 01:03:34.200
Now, there's a paradox here.
01:03:34.200 --> 01:03:38.590
And what the paradox is, is that
the central limit theorem
01:03:38.590 --> 01:03:43.470
says something very, very strong
about how Sn over n--
01:03:43.470 --> 01:03:45.520
namely, the sample average--
01:03:45.520 --> 01:03:48.250
converges to the mean.
01:03:48.250 --> 01:03:50.640
The convergence in distribution
is a very weak
01:03:50.640 --> 01:03:52.570
form of convergence.
01:03:52.570 --> 01:03:55.610
So how is this weak form of
convergence telling you
01:03:55.610 --> 01:04:02.460
something so specific about
how a sample average
about how a sample average
01:04:02.460 --> 01:04:03.680
converges to the mean?
01:04:03.680 --> 01:04:05.750
It tells you much more
than the weak law of
01:04:05.750 --> 01:04:06.910
large numbers does.
01:04:06.910 --> 01:04:09.780
Because it tells you that if you look at
this thing, it's starting to
01:04:09.780 --> 01:04:13.340
approach a normal distribution
function.
01:04:13.340 --> 01:04:15.970
And the resolution
of that paradox--
01:04:15.970 --> 01:04:17.840
and this is important
I think--
01:04:17.840 --> 01:04:21.470
is that the random variables
that converge in distribution in
01:04:21.470 --> 01:04:23.950
the central limit theorem
are these
01:04:23.950 --> 01:04:28.860
normalized random variables.
01:04:28.860 --> 01:04:36.830
The ones that converge in
probability are the things
01:04:36.830 --> 01:04:40.030
which are normalizing in terms
of the mean, but you're not
01:04:40.030 --> 01:04:43.330
normalizing them in
terms of variance.
01:04:43.330 --> 01:04:47.700
So when you look at one curve
relative to the other curve,
01:04:47.700 --> 01:04:51.500
one curve is a squashed down
version of the other curve.
01:04:51.500 --> 01:04:53.740
I mean, look at those pictures
we have for that example.
01:04:59.230 --> 01:05:03.980
If you look at a sequence of
distribution functions for S
01:05:03.980 --> 01:05:09.230
sub n over n, what you find is
things which are squashing
01:05:09.230 --> 01:05:13.520
down into a unit step.
01:05:13.520 --> 01:05:17.100
If you look at what you have
for the normalized random
01:05:17.100 --> 01:05:24.990
variables, normalized to unit
variance, what you have is
01:05:24.990 --> 01:05:27.910
something which is not squashing
down at all.
01:05:27.910 --> 01:05:30.310
It gives the whole shape
of the thing.
01:05:30.310 --> 01:05:34.580
You can get from one curve to
the other just by squashing or
01:05:34.580 --> 01:05:37.290
expanding on the x-axis.
01:05:37.290 --> 01:05:39.790
That's the only difference
between them.
01:05:39.790 --> 01:05:44.750
So the central limit theorem
says when you don't squash,
01:05:44.750 --> 01:05:49.340
you get this nice Gaussian
distribution function.
01:05:49.340 --> 01:05:54.040
The weak law of large numbers
says when you do squash, you
01:05:54.040 --> 01:05:56.010
get a unit step.
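The squash versus no-squash contrast can be checked numerically. The sketch below assumes a Bernoulli example with p = 1/4 and uses illustrative helper names (`binom_cdf`, `sample_avg_cdf`, `normalized_cdf`) that are not from the lecture. The distribution function of the sample average, evaluated slightly below and slightly above the mean, heads toward 0 and 1 (a unit step), while the distribution function of the normalized sum, evaluated at a fixed z, stays near the Gaussian value.

```python
import math

P = 0.25                          # assumed Bernoulli parameter
SIGMA = math.sqrt(P * (1 - P))    # standard deviation of one trial

def binom_cdf(n, x, p):
    """P(S_n <= x) for S_n a sum of n IID Bernoulli(p) variables."""
    k_max = min(n, math.floor(x))
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_max + 1))

def sample_avg_cdf(n, a):
    """P(S_n / n <= a): the 'squashed' curve."""
    return binom_cdf(n, n * a, P)

def normalized_cdf(n, z):
    """P((S_n - n p) / (sqrt(n) sigma) <= z): the 'unsquashed' curve."""
    return binom_cdf(n, n * P + z * math.sqrt(n) * SIGMA, P)

for n in (25, 100, 400):
    print(n,
          round(sample_avg_cdf(n, 0.20), 4),   # heads toward 0
          round(sample_avg_cdf(n, 0.30), 4),   # heads toward 1
          round(normalized_cdf(n, 1.0), 4))    # stays near Phi(1), about 0.84
```
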
01:05:56.010 --> 01:05:57.960
Now, which tells you more?
01:05:57.960 --> 01:06:01.610
Well, if you have the central
limit theorem, it tells you a
01:06:01.610 --> 01:06:08.900
lot more because it says, if
you look at this unit step
01:06:08.900 --> 01:06:14.160
here, and you expand it out by
a factor of square root of n,
01:06:14.160 --> 01:06:19.340
what you're going to get is
something that goes like this.
01:06:19.340 --> 01:06:22.490
The central limit theorem tells
you exactly what the
01:06:22.490 --> 01:06:25.780
distribution function
is at x bar.
01:06:25.780 --> 01:06:30.960
It tells you that that's
converging to what?
01:06:30.960 --> 01:06:36.750
What's the probability that the
sum of a large number of
01:06:36.750 --> 01:06:39.500
random variables is greater
than n times the mean?
01:06:42.670 --> 01:06:44.790
What is it approximately?
01:06:44.790 --> 01:06:45.090
AUDIENCE: 1/2.
01:06:45.090 --> 01:06:46.340
PROFESSOR: 1/2.
01:06:47.880 --> 01:06:50.370
That's what this says.
01:06:50.370 --> 01:06:52.460
This is a distribution
function.
01:06:52.460 --> 01:06:54.400
It's converging to
the distribution
01:06:54.400 --> 01:06:57.240
function of the normal.
01:06:57.240 --> 01:07:00.500
It hits that point, the
normal is centered
01:07:00.500 --> 01:07:02.980
on this x bar here.
01:07:02.980 --> 01:07:05.280
And it hits that point
exactly at 1/2.
01:07:05.280 --> 01:07:08.800
This says the probability of
being on that side is 1/2.
01:07:08.800 --> 01:07:11.976
The probability of being
on this side is 1/2.
01:07:11.976 --> 01:07:15.500
So you see, the central limit
theorem is telling you a whole
01:07:15.500 --> 01:07:19.580
lot more about how this is
converging than the weak law
01:07:19.580 --> 01:07:21.980
of large numbers is.
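A small sketch of the "approximately 1/2" answer, again assuming Bernoulli trials with p = 1/4 and values of n for which n times p is an integer (the function names are illustrative). The probabilities are computed in log space so that large n does not underflow. Because the sum is integer valued, the probability of being strictly above the mean and the probability of being at or above it differ by the mass sitting exactly at the mean, and both drift toward 1/2 as n grows.

```python
import math

def binom_pmf(n, k, p):
    """P(S_n = k), computed through logarithms to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_above_mean(n, p):
    """P(S_n > n p), assuming n * p is an integer."""
    m = round(n * p)
    return sum(binom_pmf(n, k, p) for k in range(m + 1, n + 1))

def prob_at_least_mean(n, p):
    """P(S_n >= n p), assuming n * p is an integer."""
    m = round(n * p)
    return sum(binom_pmf(n, k, p) for k in range(m, n + 1))

for n in (16, 100, 400, 1600, 6400):
    print(n,
          round(prob_above_mean(n, 0.25), 4),
          round(prob_at_least_mean(n, 0.25), 4))
```
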
01:07:21.980 --> 01:07:23.440
Now, I come back to the
question I asked
01:07:23.440 --> 01:07:24.690
you a long time ago.
01:07:27.760 --> 01:07:31.080
Why is the weak law
of large numbers--
01:07:31.080 --> 01:07:35.170
why do you see it used more
often than the central limit
01:07:35.170 --> 01:07:37.005
theorem since it's so
much less powerful?
01:07:39.710 --> 01:07:42.010
Well, it's the same
answer as before.
01:07:42.010 --> 01:07:45.480
It's less powerful, but it
applies to a much larger
01:07:45.480 --> 01:07:47.390
number of cases.
01:07:47.390 --> 01:07:51.620
And in many situations, all
you want is that weaker
01:07:51.620 --> 01:07:54.210
statement that tells you
everything you want to know,
01:07:54.210 --> 01:07:57.670
but it tells you that weaker
statement for this enormous
01:07:57.670 --> 01:08:01.820
variety of different
situations.
01:08:01.820 --> 01:08:05.980
Mean square convergence applies
to fewer things.
01:08:05.980 --> 01:08:08.570
Well, of course, convergence
in distribution applies for
01:08:08.570 --> 01:08:09.990
even more things.
01:08:09.990 --> 01:08:13.080
But we saw that when you're
dealing with the central limit
01:08:13.080 --> 01:08:18.120
theorem, all bets are off on
that because it's talking
01:08:18.120 --> 01:08:21.109
about a different sequence of
random variables, which might
01:08:21.109 --> 01:08:23.430
or might not converge.
01:08:23.430 --> 01:08:29.520
OK, so finally, convergence
with probability 1.
01:08:32.109 --> 01:08:38.060
Many people call convergence
with probability 1 convergence
01:08:38.060 --> 01:08:42.560
almost surely, or convergence
almost everywhere.
01:08:42.560 --> 01:08:45.640
You will see this almost
everywhere.
01:08:45.640 --> 01:08:50.330
Now, why do I want to use
convergence with probability 1,
01:08:50.330 --> 01:08:54.109
and why is that a dangerous
thing to do?
01:08:54.109 --> 01:08:58.270
When you say things are
converging with probability 1,
01:08:58.270 --> 01:09:01.770
it sounds very much like you're
saying they converge in
01:09:01.770 --> 01:09:03.970
probability because you're
using the word
01:09:03.970 --> 01:09:05.439
"probability" in each.
01:09:05.439 --> 01:09:08.499
The two are very, very
different concepts.
01:09:11.790 --> 01:09:15.729
And therefore, it would seem
like you should avoid the word
01:09:15.729 --> 01:09:19.330
"probability" in this second one
and say convergence almost
01:09:19.330 --> 01:09:23.689
surely or convergence
almost everywhere.
01:09:23.689 --> 01:09:27.120
And the reason I don't like
those is they don't make any
01:09:27.120 --> 01:09:30.800
sense, unless you understand
measure theory.
01:09:30.800 --> 01:09:33.069
And we're not assuming
that you understand
01:09:33.069 --> 01:09:35.220
measure theory here.
01:09:35.220 --> 01:09:37.720
If you wanted to do that first
problem in the last problem
01:09:37.720 --> 01:09:40.899
set, you had to understand
measure theory.
01:09:40.899 --> 01:09:45.555
And I apologize for that, I
didn't mean to do that to you.
01:09:45.555 --> 01:09:52.390
But this notion of convergence
with probability 1, I think
01:09:52.390 --> 01:09:53.649
you can understand that.
01:09:53.649 --> 01:09:56.590
I think you can get a good sense
of what it means without
01:09:56.590 --> 01:09:57.950
knowing any measure theory.
01:09:57.950 --> 01:10:01.050
And at least that's what
we're trying to do.
01:10:01.050 --> 01:10:04.180
OK, so let's go on.
01:10:06.950 --> 01:10:09.410
We've already said that a random
variable is a lot more
01:10:09.410 --> 01:10:14.200
complicated thing than
a number is.
01:10:14.200 --> 01:10:16.060
I think those of you who
thought you understood
01:10:16.060 --> 01:10:19.410
probability theory pretty well,
I probably managed to
01:10:19.410 --> 01:10:23.790
confuse you enough to get you
to the point where you think
01:10:23.790 --> 01:10:26.690
you're not on totally
safe ground talking
01:10:26.690 --> 01:10:28.330
about random variables.
01:10:28.330 --> 01:10:31.310
And you're certainly not on very
safe ground talking about
01:10:31.310 --> 01:10:34.850
how random variables converge
to each other.
01:10:34.850 --> 01:10:38.960
And that's good because to reach
a greater understanding
01:10:38.960 --> 01:10:41.840
of something, you have to get
to the point where you're a
01:10:41.840 --> 01:10:44.090
little bit confused first.
01:10:44.090 --> 01:10:46.910
So I've intentionally
tried to--
01:10:46.910 --> 01:10:48.190
well, I haven't tried
to make this more
01:10:48.190 --> 01:10:51.140
confusing than necessary.
01:10:51.140 --> 01:10:56.490
But in fact, it's not as simple
as what elementary
01:10:56.490 --> 01:10:59.320
courses would make
you believe.
01:10:59.320 --> 01:11:04.140
OK, this notion of convergence
with probability 1, which we
01:11:04.140 --> 01:11:13.140
abbreviate WP1, is something
that we're not going to talk
01:11:13.140 --> 01:11:17.030
about a great deal until we
come to renewal processes.
01:11:17.030 --> 01:11:19.880
And the reason is we won't need
it a great deal until we
01:11:19.880 --> 01:11:21.960
come to renewal processes.
01:11:21.960 --> 01:11:25.150
But you ought to know that
there's something like that
01:11:25.150 --> 01:11:26.770
hanging out there.
01:11:26.770 --> 01:11:29.580
And you ought to have some
idea of what it is.
01:11:29.580 --> 01:11:31.890
So here's the definition
of it.
01:11:31.890 --> 01:11:37.120
A sequence of random variables
converges with probability 1
01:11:37.120 --> 01:11:40.600
to some other random
variable Z all in
01:11:40.600 --> 01:11:43.190
the same sample space,
01:11:43.190 --> 01:11:46.920
if the probability of sample
points in omega--
01:11:46.920 --> 01:11:53.750
now remember, that a sample
point implies a value for each
01:11:53.750 --> 01:11:56.560
one of these random variables.
01:11:56.560 --> 01:12:00.270
So in a sense, you can think of
a sample point as, more or
01:12:00.270 --> 01:12:04.630
less, equivalent to a sample
path of this sequence of
01:12:04.630 --> 01:12:06.340
random variables here.
01:12:06.340 --> 01:12:11.810
OK, so for omega in capital
Omega, it says the limit as n
01:12:11.810 --> 01:12:16.670
goes to infinity of these random
variables at the point
01:12:16.670 --> 01:12:25.690
omega is equal to what this
extra random variable is at
01:12:25.690 --> 01:12:26.210
the point--
01:12:26.210 --> 01:12:28.000
and it says that the probability
of that whole
01:12:28.000 --> 01:12:31.050
thing is equal to 1.
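In symbols, the spoken definition can be sketched as follows, with the random variables in the sequence written as Z_n and the sample space as capital Omega (notation assumed from the description above).

```latex
\[
  \Pr\Bigl\{\, \omega \in \Omega : \lim_{n\to\infty} Z_n(\omega) = Z(\omega) \,\Bigr\} = 1 .
\]
```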
01:12:31.050 --> 01:12:33.230
Now, how many of you can
look at that statement
01:12:33.230 --> 01:12:35.926
and see what it means?
01:12:35.926 --> 01:12:39.550
Well, I'm sure some of you can
because you've seen it before.
01:12:39.550 --> 01:12:43.580
But understanding what that
statement means, even though
01:12:43.580 --> 01:12:47.810
it's a very simple statement,
is not very easy.
01:12:47.810 --> 01:12:49.790
So there's the statement
up there.
01:12:49.790 --> 01:12:52.000
Let's try to parse it.
01:12:52.000 --> 01:12:56.470
In other words, break it down
into what it's talking about.
01:12:56.470 --> 01:13:01.210
For each sample point omega,
that sample point is going to
01:13:01.210 --> 01:13:04.960
map into a sequence of--
01:13:21.680 --> 01:13:29.390
so each sample point maps into
this sample path of values for
01:13:29.390 --> 01:13:31.210
this sequence of random
variables.
01:13:35.220 --> 01:13:37.590
Some of those sequences--
01:13:37.590 --> 01:13:40.985
OK, this now is a sequence
of numbers.
01:13:53.880 --> 01:13:58.790
So each omega goes into some
sequence of numbers.
01:13:58.790 --> 01:14:06.630
And that also is unquestionably
close to this
01:14:06.630 --> 01:14:13.460
final generic random variable,
capital Z evaluated at omega.
01:14:13.460 --> 01:14:18.370
Now, some of these sequences
here, sequences of real
01:14:18.370 --> 01:14:24.470
numbers, we all know what a
limit of real numbers is.
01:14:24.470 --> 01:14:26.280
I hope you do.
01:14:26.280 --> 01:14:28.690
I know that many of you don't.
01:14:28.690 --> 01:14:31.230
And we'll talk about it later
when we start talking about
01:14:31.230 --> 01:14:33.000
the strong law of
large numbers.
01:14:33.000 --> 01:14:38.600
But this does, perhaps,
have a limit.
01:14:38.600 --> 01:14:40.780
It perhaps doesn't
have a limit.
01:14:40.780 --> 01:14:46.280
If you look at a sequence 1, 2,
1, 2, 1, 2, 1, 2, forever,
01:14:46.280 --> 01:14:48.660
that doesn't have a limit
because it doesn't start to
01:14:48.660 --> 01:14:50.000
get close to anything.
01:14:50.000 --> 01:14:53.060
It keeps wandering
around forever.
01:14:53.060 --> 01:15:00.670
If you look at a sequence which
is 1 for 10 terms, then
01:15:00.670 --> 01:15:08.200
it's 2 the 11th term, and then
it's 1 for 100 terms, 2 for 1
01:15:08.200 --> 01:15:13.900
more term, 1 for 1,000 terms,
then 2 for the next term, and
01:15:13.900 --> 01:15:17.120
so forth, that's a much
more tricky case.
01:15:17.120 --> 01:15:21.130
Because in that sequence,
pretty soon
01:15:21.130 --> 01:15:22.210
all you see is 1's.
01:15:22.210 --> 01:15:24.950
You look for an awful long way
and you don't see any 2's.
01:15:24.950 --> 01:15:28.340
That does not converge, just
because of the definition of
01:15:28.340 --> 01:15:30.160
convergence.
01:15:30.160 --> 01:15:33.640
And when you work with
convergence for a long time,
01:15:33.640 --> 01:15:35.850
after a while you're very
happy that that doesn't
01:15:35.850 --> 01:15:39.230
converge because it would
play all sorts of
01:15:39.230 --> 01:15:42.300
havoc with all of analysis.
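The tricky sequence is easy to generate, which makes the failure of convergence concrete. In the sketch below (function names are illustrative), the 2's become rarer and rarer, but one always shows up again later. So for a tolerance such as 1/2, there is no point in the sequence beyond which every term stays within that tolerance of 1, which is what the definition of a limit requires.

```python
from itertools import islice

def mostly_ones():
    """Yield 1 for 10 terms, then one 2, then 1 for 100 terms, then one 2,
    then 1 for 1,000 terms, then one 2, and so on forever."""
    run = 10
    while True:
        for _ in range(run):
            yield 1
        yield 2
        run *= 10

def positions_of_twos(n_terms):
    """1-based positions of the 2's among the first n_terms terms."""
    return [i for i, x in enumerate(islice(mostly_ones(), n_terms), start=1)
            if x == 2]

# The 2's thin out quickly but never stop appearing.
print(positions_of_twos(20000))
```
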
01:15:42.300 --> 01:15:45.730
So anyway, there is this idea
that these numbers either
01:15:45.730 --> 01:15:47.710
converge or they
don't converge.
01:15:47.710 --> 01:15:51.600
When these numbers converge,
they might or might not
01:15:51.600 --> 01:15:53.500
converge to this.
01:15:53.500 --> 01:15:58.350
So for every omega in this
sample space, you have this
01:15:58.350 --> 01:16:00.130
sequence here.
01:16:00.130 --> 01:16:02.830
That sequence might converge.
01:16:02.830 --> 01:16:05.460
If it does converge, it might
converge to this or it might
01:16:05.460 --> 01:16:08.070
converge to something else.
01:16:08.070 --> 01:16:13.090
And what this is saying here is
you take this entire set of
01:16:13.090 --> 01:16:14.700
sequences here.
01:16:14.700 --> 01:16:17.960
Namely, you take the entire
set of omega.
01:16:17.960 --> 01:16:21.440
And for each one of those
omegas, this might or might
01:16:21.440 --> 01:16:22.800
not converge.
01:16:22.800 --> 01:16:26.700
You look at the set of
omega for which this
01:16:26.700 --> 01:16:28.590
sequence does converge.
01:16:28.590 --> 01:16:30.810
And for which it does converge
to Z of omega.
01:16:34.210 --> 01:16:41.950
And now you look at that set
and what convergence with
01:16:41.950 --> 01:16:47.010
probability 1 means is that that
set turns out to be an
01:16:47.010 --> 01:16:53.720
event, and that event turns
out to have probability 1.
01:16:53.720 --> 01:16:58.370
Which says that for almost
everything that happens, you
01:16:58.370 --> 01:17:01.500
look at this sample sequence
and it has a
01:17:01.500 --> 01:17:05.260
limit, which is this.
01:17:05.260 --> 01:17:08.050
And that's true in
probability.
01:17:08.050 --> 01:17:10.570
It's not true for
most sequences.
01:17:10.570 --> 01:17:15.180
Let me give you a very quick
and simple example.
01:17:15.180 --> 01:17:20.620
Look at this Bernoulli case, and
suppose the probability of
01:17:20.620 --> 01:17:24.830
a 1 is one quarter and the
probability of a 0 is 3/4.
01:17:24.830 --> 01:17:28.450
Look at what happens when you
take an extraordinarily large
01:17:28.450 --> 01:17:36.620
number of trials and you ask
for those sample sequences
01:17:36.620 --> 01:17:38.880
that you take.
01:17:38.880 --> 01:17:41.800
What's going to happen
to them?
01:17:41.800 --> 01:17:46.760
Well, this says that if you
look at the relative
01:17:46.760 --> 01:17:50.330
frequency of 1's--
01:17:50.330 --> 01:17:57.220
well, if they converge in this
sense, if the set of relative
01:17:57.220 --> 01:17:59.330
frequencies--
01:17:59.330 --> 01:18:00.440
excuse me.
01:18:00.440 --> 01:18:06.370
If the set of sample averages
converges in this sense, then
01:18:06.370 --> 01:18:12.960
it says with probability 1, that
sample average is going
01:18:12.960 --> 01:18:16.750
to converge to one quarter.
01:18:16.750 --> 01:18:19.340
Now, that doesn't mean that most
sequences are going to
01:18:19.340 --> 01:18:21.790
converge that way.
01:18:21.790 --> 01:18:26.810
Because most sequences are
going to converge to 1/2.
01:18:26.810 --> 01:18:31.010
There are many more sequences
with half 1's and half 0's
01:18:31.010 --> 01:18:33.370
than there are with
three quarter 1's
01:18:33.370 --> 01:18:37.690
and one quarter 0's--
01:18:37.690 --> 01:18:40.270
with three quarter 0's
and one quarter 1's.
01:18:40.270 --> 01:18:42.980
So there are many more of
one than the other.
01:18:42.980 --> 01:18:48.090
But those particular sequences,
which have
01:18:48.090 --> 01:18:49.640
probability--
01:18:49.640 --> 01:18:54.650
those particular sequences which
have relative frequency
01:18:54.650 --> 01:19:00.110
one quarter are much more likely
than those which have
01:19:00.110 --> 01:19:02.290
relative frequency one half.
01:19:02.290 --> 01:19:04.450
Because the ones with
relative frequency
01:19:04.450 --> 01:19:06.135
one half are so unlikely.
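The counting argument can be made concrete for a finite number of trials. This is a sketch, assuming n = 100 Bernoulli trials with probability 1/4 of a 1; the function names are illustrative. There are far more length-100 sequences with half 1's than with one quarter 1's, but each of those is so improbable that, taken together, the one-quarter sequences carry much more probability.

```python
import math

def count_sequences(n, k):
    """Number of binary sequences of length n with exactly k ones."""
    return math.comb(n, k)

def prob_of_one_sequence(n, k, p):
    """Probability of any single sequence with exactly k ones."""
    return p**k * (1 - p)**(n - k)

def prob_of_type(n, k, p):
    """Total probability of all sequences with exactly k ones."""
    return count_sequences(n, k) * prob_of_one_sequence(n, k, p)

n, p = 100, 0.25
for k in (25, 50):
    print(k, count_sequences(n, k), prob_of_one_sequence(n, k, p),
          prob_of_type(n, k, p))
```
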
01:19:09.180 --> 01:19:11.670
That's a complicated
set of ideas.
01:19:11.670 --> 01:19:14.860
You sort of know that it has
to be true because of these
01:19:14.860 --> 01:19:16.960
other laws of large numbers.
01:19:16.960 --> 01:19:20.630
And this is simply extending
those laws of large numbers
01:19:20.630 --> 01:19:25.870
one more step to say not only
does the distribution function
01:19:25.870 --> 01:19:28.420
of that sample average converge
to what it should
01:19:28.420 --> 01:19:33.750
converge to, but also for
these sequences with
01:19:33.750 --> 01:19:36.380
probability 1, they converge.
01:19:36.380 --> 01:19:40.450
If you look at the sequence for
long enough, the sample
01:19:40.450 --> 01:19:43.720
average is going to converge to
what it should for that one
01:19:43.720 --> 01:19:45.525
particular sample sequence.
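To make "one particular sample sequence" concrete, here is a small simulation sketch. It assumes IID binary trials with probability one quarter of a 1; the function name, the seed, and the sequence length are hypothetical choices for illustration. A finite simulation can only suggest the behavior; it does not prove convergence with probability 1.

```python
import random


def running_sample_averages(n, p, seed=0):
    """Generate one sample sequence of n IID binary trials (1 with probability p)
    and return the sample average after each trial."""
    rng = random.Random(seed)   # fixed seed so the same sample sequence is reproduced
    total = 0
    averages = []
    for i in range(1, n + 1):
        total += 1 if rng.random() < p else 0
        averages.append(total / i)
    return averages


averages = running_sample_averages(200_000, 0.25)

# Print the sample average of this one sequence at a few increasing lengths.
for n in (10, 100, 1_000, 10_000, 200_000):
    print(n, averages[n - 1])
```

Changing the seed gives a different sample sequence; the strong law says that, with probability 1, the sample average of the sequence you get settles at one quarter.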
01:19:50.030 --> 01:19:55.510
OK, the strong law of large
numbers then says that if X1,
01:19:55.510 --> 01:20:02.060
X2, and so forth are IID random
variables, and they
01:20:02.060 --> 01:20:06.210
have an expected value, which
is less than infinity, then
01:20:06.210 --> 01:20:11.950
the sample average converges
to the actual average with
01:20:11.950 --> 01:20:13.550
probability 1.
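Written out as a formula, the statement above looks roughly as follows. The symbols S_n for the sum of the first n variables, \overline{X} for the expected value, and \omega for a sample point are notational assumptions made here; the finiteness condition is written as E[|X|] < \infty.

```latex
\[
  S_n = X_1 + X_2 + \cdots + X_n, \qquad
  \overline{X} = \mathsf{E}[X], \qquad
  \mathsf{E}[|X|] < \infty
\]
\[
  \Pr\left\{ \omega : \lim_{n \to \infty} \frac{S_n(\omega)}{n} = \overline{X} \right\} = 1
\]
```

The limit is taken separately for each sample sequence \omega, and the probability is that of the set of sample sequences for which the limit exists and equals \overline{X}.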
01:20:13.550 --> 01:20:16.910
In other words, it says with
probability 1, you look at
01:20:16.910 --> 01:20:18.640
this sequence forever.
01:20:18.640 --> 01:20:20.740
I don't know how you look
at a sequence forever.
01:20:20.740 --> 01:20:22.270
I've never figured that out.
01:20:22.270 --> 01:20:25.930
But if you could look at it
forever, then with probability
01:20:25.930 --> 01:20:30.580
1, it would come out with the
right relative frequency.
01:20:30.580 --> 01:20:34.870
It'll take a lot of investment
of time when we get to chapter
01:20:34.870 --> 01:20:38.160
4 to try to sort that out.
01:20:38.160 --> 01:20:40.765
I wanted to tell you a little bit
about it. You can read about it in the
01:20:40.765 --> 01:20:43.920
notes a little bit in chapter
1, and we will
01:20:43.920 --> 01:20:45.590
come back to it.
01:20:45.590 --> 01:20:48.520
And I think you will
then understand it.
01:20:48.520 --> 01:20:53.770
OK, with that, we are
done with chapter 1.
01:20:53.770 --> 01:20:57.600
Next time we will go into
Poisson processes.
01:20:57.600 --> 01:21:03.230
If you're upset by all of the
abstraction in chapter 1, you
01:21:03.230 --> 01:21:06.780
will be very happy when we get
into Poisson processes because
01:21:06.780 --> 01:21:08.870
there's nothing abstract
there at all.
01:21:08.870 --> 01:21:11.340
Everything you could reasonably
say about Poisson
01:21:11.340 --> 01:21:16.790
processes is either obviously
true or obviously false.
01:21:16.790 --> 01:21:19.810
I mean, there's nothing that's
strange there at all.
01:21:19.810 --> 01:21:23.100
After you understand it,
everything works.
01:21:23.100 --> 01:21:24.740
So we'll do that next time.