WEBVTT
00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative
00:00:02.960 --> 00:00:04.370
Commons license.
00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to
00:00:07.410 --> 00:00:11.060
offer high quality educational
resources for free.
00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from
00:00:13.960 --> 00:00:19.790
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:19.790 --> 00:00:21.040
ocw.mit.edu.
00:00:25.210 --> 00:00:29.010
PROFESSOR: OK, we're ready for
the second lecture today.
00:00:29.010 --> 00:00:33.330
We will start to get into a
little technical material,
00:00:33.330 --> 00:00:36.890
which doesn't mean necessarily
that it's more important.
00:00:36.890 --> 00:00:40.630
It just means that it's easier
because it deals with
00:00:40.630 --> 00:00:42.470
mathematics.
00:00:42.470 --> 00:00:46.920
I'm going to spend a little
bit more time reviewing
00:00:46.920 --> 00:00:50.370
probability, as you
learned it before.
00:00:50.370 --> 00:00:53.800
I want to review it at a
slightly more fundamental
00:00:53.800 --> 00:00:56.600
level than what you're
used to seeing it as.
00:00:56.600 --> 00:01:00.900
You will understand why as we go
on because when we get into
00:01:00.900 --> 00:01:05.519
stochastic processes, we will
find that there are lots of
00:01:05.519 --> 00:01:08.700
very peculiar things
that happen.
00:01:08.700 --> 00:01:12.260
And when peculiar things happen,
the only thing you can
00:01:12.260 --> 00:01:15.160
do is go back to basic ideas.
00:01:15.160 --> 00:01:19.090
And if you don't understand what
those basic ideas are,
00:01:19.090 --> 00:01:20.640
then you're in real trouble.
00:01:20.640 --> 00:01:24.780
So we'll start out by talking
about expectations just a
00:01:24.780 --> 00:01:26.030
little bit.
00:01:28.860 --> 00:01:36.570
The distribution function of a
random variable often says
00:01:36.570 --> 00:01:38.600
more than you're really
interested in.
00:01:38.600 --> 00:01:41.540
In other words, a distribution
function is a function from
00:01:41.540 --> 00:01:45.590
the whole sample space into
the real numbers.
00:01:45.590 --> 00:01:48.730
And that's a very complicated
thing in general.
00:01:48.730 --> 00:01:51.990
And the expectation is just
one simple number.
00:01:51.990 --> 00:01:55.050
And with that one simple number,
you get an idea of
00:01:55.050 --> 00:01:58.090
what that random variable is all
about, whether it's big or
00:01:58.090 --> 00:02:01.090
it's little or what have you.
00:02:01.090 --> 00:02:04.740
There are a bunch of formulas that
you're familiar with for
00:02:04.740 --> 00:02:06.730
finding the expectation.
00:02:06.730 --> 00:02:11.140
If you have a discrete random
variable, the usual formula is
00:02:11.140 --> 00:02:14.840
you take all of the possible
sample values, multiply each
00:02:14.840 --> 00:02:18.100
of them by the probability
of that sample value,
00:02:18.100 --> 00:02:20.200
and you sum it up.
00:02:20.200 --> 00:02:23.300
This is what you learned right
at the beginning of taking
00:02:23.300 --> 00:02:24.230
probability.
00:02:24.230 --> 00:02:27.100
If you've never taken
probability, you learned it in
00:02:27.100 --> 00:02:30.120
statistics classes just as
something you don't know where
00:02:30.120 --> 00:02:30.850
it comes from.
00:02:30.850 --> 00:02:32.100
But it's there.
00:02:33.950 --> 00:02:37.710
If you have a continuous random
variable, a continuous
00:02:37.710 --> 00:02:40.270
random variable is one
that has a density.
00:02:40.270 --> 00:02:43.200
You can find the expectation
there.
00:02:43.200 --> 00:02:46.860
If you have an arbitrary random
variable and it's not
00:02:46.860 --> 00:02:52.520
negative, then there's this
peculiar formula here, which I
00:02:52.520 --> 00:02:54.080
will point to.
00:02:59.510 --> 00:03:01.085
I think I'll point to it.
00:03:05.710 --> 00:03:09.680
Ah yes, I will point to it.
00:03:09.680 --> 00:03:14.810
This formula down here, you
might or might not have seen.
00:03:14.810 --> 00:03:18.180
And I hope by the end of this
course, you'll realize that
00:03:18.180 --> 00:03:23.620
it's one, more fundamental and
two, probably more useful than
00:03:23.620 --> 00:03:25.600
either of these two.
00:03:25.600 --> 00:03:30.510
And then there's a final one,
which this final formula, I'll
00:03:30.510 --> 00:03:35.020
tell you a little bit about
that when we get to it.
00:03:35.020 --> 00:03:41.260
OK, the formula for the expected
value in terms of the
00:03:41.260 --> 00:03:45.640
integral of the complementary
distribution function.
00:03:45.640 --> 00:03:49.520
There's a picture here which
shows you how it corresponds
00:03:49.520 --> 00:03:51.930
to the usual thing you're
used to for a
00:03:51.930 --> 00:03:53.850
discrete random variable.
00:03:53.850 --> 00:03:57.100
Namely what you're doing is
you're integrating this
00:03:57.100 --> 00:03:59.570
complementary distribution
function, which is the
00:03:59.570 --> 00:04:05.060
probability that the random
variable x is greater than any
00:04:05.060 --> 00:04:07.430
particular x along
the axis here.
00:04:07.430 --> 00:04:10.880
So you integrate this
function along here.
00:04:10.880 --> 00:04:13.900
And according to what I'm trying
to convince you of,
00:04:13.900 --> 00:04:17.980
just integrating that function
gives you the expected value.
00:04:17.980 --> 00:04:21.510
And the reason is that this top
little square here is a1
00:04:21.510 --> 00:04:25.250
times the probability that
x is equal to a1.
00:04:25.250 --> 00:04:28.990
Next one is a2 times the
probability it's equal to a2
00:04:28.990 --> 00:04:31.160
and so forth down.
00:04:31.160 --> 00:04:34.020
And you can obviously generalize
this to any
00:04:34.020 --> 00:04:37.030
discrete random variable,
which is non-negative.
00:04:37.030 --> 00:04:40.490
And I'm just talking about
non-negative random variables
00:04:40.490 --> 00:04:42.500
for the moment.
00:04:42.500 --> 00:04:46.340
If x has a density, the same
argument applies to any
00:04:46.340 --> 00:04:48.580
Riemann sum for that integral.
00:04:48.580 --> 00:04:49.660
You can take integrals.
00:04:49.660 --> 00:04:52.020
You can break them up
into little slices.
00:04:52.020 --> 00:04:54.470
If you break them up into
little slices, you can
00:04:54.470 --> 00:04:56.490
represent it in this way.
00:04:56.490 --> 00:05:01.250
And presto, again, you get that
this integral is equal to
00:05:01.250 --> 00:05:03.000
the expectation.
00:05:03.000 --> 00:05:06.910
And if you have any other thing
at all, you can always
00:05:06.910 --> 00:05:10.670
represent it in terms
of this Riemann sum.
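A minimal sketch of the claim above for a non-negative discrete random variable: summing each sample value times its probability gives the same number as integrating the complementary distribution function. The function names and the example PMF are assumptions made only for illustration; they do not come from the lecture or the course notes.

```python
from fractions import Fraction


def expectation_from_pmf(pmf):
    """Usual formula: sum of each sample value times its probability."""
    return sum(a * p for a, p in pmf.items())


def expectation_from_complementary_cdf(pmf):
    """Integrate Pr{X > x} from 0 upward for a non-negative discrete X.

    Pr{X > x} is a step function, so the integral is a sum of
    rectangle areas between consecutive sample values.
    """
    total = Fraction(0)
    prev = Fraction(0)
    tail = Fraction(1)  # Pr{X > x} on the interval [prev, next sample value)
    for a in sorted(pmf):
        total += (a - prev) * tail  # rectangle of width (a - prev), height tail
        tail -= pmf[a]              # the step drops by Pr{X = a} at x = a
        prev = a
    return total


# Hypothetical PMF: X takes the values 1, 2, 4.
example_pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
print(expectation_from_pmf(example_pmf))
print(expectation_from_complementary_cdf(example_pmf))
```

Exact fractions are used so that the two results can be compared with equality rather than a tolerance.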
00:05:10.670 --> 00:05:13.360
Now why is it even more
powerful than that?
00:05:13.360 --> 00:05:17.100
Well, it's more powerful than
that because if you took
00:05:17.100 --> 00:05:18.190
measure theory--
00:05:18.190 --> 00:05:22.780
which most of you presumably
have not taken yet, and many
00:05:22.780 --> 00:05:24.720
of you might never take it--
00:05:24.720 --> 00:05:27.200
you will find out that this
is really the fundamental
00:05:27.200 --> 00:05:29.220
definition after all.
00:05:29.220 --> 00:05:33.180
And integration, when you look
at it in measure-theoretic terms,
00:05:33.180 --> 00:05:36.600
instead of taking little slices
that go this way, you
00:05:36.600 --> 00:05:41.400
wind up taking little slices
that go that way.
00:05:41.400 --> 00:05:44.830
So anyway, this is the
fundamental definition of
00:05:44.830 --> 00:05:46.010
expectation.
00:05:46.010 --> 00:05:48.960
If you're worried about whether
expectations exist or
00:05:48.960 --> 00:05:52.180
not, why is this much nicer?
00:05:52.180 --> 00:05:56.280
Because what you're integrating
here is simply a
00:05:56.280 --> 00:06:02.010
function, which is monotonic
decreasing with x.
00:06:02.010 --> 00:06:08.130
In other words, if you try to
integrate it by integrating
00:06:08.130 --> 00:06:13.900
this function out to some
largest value and then
00:06:13.900 --> 00:06:18.300
chopping it off there, what
you get is some number.
00:06:18.300 --> 00:06:23.490
If you extend this chopping off
point out, what you get is
00:06:23.490 --> 00:06:25.520
a number which keeps
increasing.
00:06:25.520 --> 00:06:27.060
What can happen?
00:06:27.060 --> 00:06:30.470
As you take a number which is
increasing, you either get to
00:06:30.470 --> 00:06:32.710
infinity or you get to
some finite limit.
00:06:32.710 --> 00:06:34.320
Nothing else can happen.
00:06:34.320 --> 00:06:37.160
So there aren't any limiting
problems here.
00:06:37.160 --> 00:06:40.710
And when you take expectations
in other ways, there are
00:06:40.710 --> 00:06:43.670
always questions that
you have to ask.
00:06:43.670 --> 00:06:45.850
And they're often serious.
00:06:45.850 --> 00:06:48.750
So this is just a much nicer
way of doing it.
00:06:48.750 --> 00:06:51.840
Anyway, that's the way
we're going to do it.
00:06:51.840 --> 00:06:55.020
And so now we go on.
00:06:55.020 --> 00:06:59.220
Oh, I should mention where the
other formula comes from.
00:06:59.220 --> 00:07:02.420
This formula back here.
00:07:06.580 --> 00:07:11.360
You get that by representing x
as both the positive part of x
00:07:11.360 --> 00:07:13.360
plus the negative part of x.
00:07:13.360 --> 00:07:16.610
And if you want to see how to
do that exactly, it's in the
00:07:16.610 --> 00:07:20.320
notes where it talks about
first this and then this.
00:07:20.320 --> 00:07:22.770
So you just put the
two together.
00:07:22.770 --> 00:07:25.470
And then you get an
expected value.
00:07:25.470 --> 00:07:28.950
A word about notation here, and
there's nothing I can do
00:07:28.950 --> 00:07:30.750
about this.
00:07:30.750 --> 00:07:32.590
It's an unfortunate thing.
00:07:32.590 --> 00:07:36.840
When somebody says that the
expected value of a random
00:07:36.840 --> 00:07:39.440
variable exists, what
do they mean?
00:07:43.830 --> 00:07:48.910
Any engineer would try to
integrate it and would either
00:07:48.910 --> 00:07:52.300
get something which was
undefined, because it was
00:07:52.300 --> 00:07:54.040
infinite going this way.
00:07:54.040 --> 00:07:56.090
It's minus infinity
going that way.
00:07:56.090 --> 00:07:59.510
And there's no way to put
the two together.
00:07:59.510 --> 00:08:03.260
If you get infinity going this
way, something finite going
00:08:03.260 --> 00:08:07.700
that way, like with a
non-negative random variable,
00:08:07.700 --> 00:08:11.480
it's kind of silly to say the
expectation doesn't exist.
00:08:11.480 --> 00:08:14.440
Because really what's happening
is the expectation
00:08:14.440 --> 00:08:15.720
is infinite.
00:08:15.720 --> 00:08:18.950
Now mathematicians and everybody
who writes books,
00:08:18.950 --> 00:08:21.840
everybody who writes
papers, everybody--
00:08:21.840 --> 00:08:23.510
I think--
00:08:23.510 --> 00:08:27.810
defines expected value as
existing only if it's finite.
00:08:27.810 --> 00:08:31.280
In other words, what you're
doing is taking this integral
00:08:31.280 --> 00:08:33.409
over the set of real values.
00:08:33.409 --> 00:08:37.130
And you don't allow plus
infinity or minus infinity.
00:08:37.130 --> 00:08:52.060
So you say that the expectation
does not exist if
00:08:52.060 --> 00:08:56.950
in fact it's infinite or it's
minus infinity or it is
00:08:56.950 --> 00:08:58.810
undefined completely.
00:08:58.810 --> 00:09:01.510
And you say it's undefined
in all of those cases.
00:09:01.510 --> 00:09:04.550
And that's just a convention
that everybody lives by.
00:09:04.550 --> 00:09:09.340
So the other way of saying this
is if the expected value
00:09:09.340 --> 00:09:13.840
of the magnitude of the random
variable is infinite, then the
00:09:13.840 --> 00:09:15.790
expectation doesn't exist.
00:09:15.790 --> 00:09:19.430
So we will try to say it
that way sometimes
00:09:19.430 --> 00:09:21.750
when it's really important.
00:09:21.750 --> 00:09:26.010
OK, let's go on to indicator
random variables.
00:09:26.010 --> 00:09:28.180
You're probably familiar
with these.
00:09:28.180 --> 00:09:33.900
For every event you can think
of, an event is something
00:09:33.900 --> 00:09:38.820
which is true, which occurs when
some set of the sample
00:09:38.820 --> 00:09:42.460
points occur and is not
true otherwise.
00:09:42.460 --> 00:09:46.570
So the definition of an
indicator random variable is
00:09:46.570 --> 00:09:49.520
that the indicator
for an event a--
00:09:49.520 --> 00:09:51.980
as a function of the
sample space--
00:09:51.980 --> 00:09:58.920
is equal to 1, if omega is in
the event a, and 0 otherwise.
00:09:58.920 --> 00:10:02.510
So if you draw the distribution
function of it,
00:10:02.510 --> 00:10:11.170
the distribution function of the
indicator function is 0 up
00:10:11.170 --> 00:10:13.270
until the point 0.
00:10:13.270 --> 00:10:16.165
Then it jumps up to 1 minus
the probability of a.
00:10:16.165 --> 00:10:19.000
At 1, it jumps all
the way up to 1.
00:10:19.000 --> 00:10:21.290
So it's simply a binary
random variable.
00:10:21.290 --> 00:10:25.600
So every event has an indicator
random variable.
00:10:25.600 --> 00:10:27.810
Every indicator random
variable is a
00:10:27.810 --> 00:10:29.640
binary random variable.
00:10:29.640 --> 00:10:33.250
So indicator random variables
are very simple.
00:10:33.250 --> 00:10:36.900
Events are very simple because
you can map any event into an
00:10:36.900 --> 00:10:39.400
indicator random variable
[INAUDIBLE].
00:10:39.400 --> 00:10:42.930
And this also says that since
we want to talk about events
00:10:42.930 --> 00:10:46.410
very often, binary random
variables are particularly
00:10:46.410 --> 00:10:49.040
important in this field.
00:10:49.040 --> 00:10:52.760
OK, but what this really says
now is that any theorem about
00:10:52.760 --> 00:10:55.990
random variables can be
applied to events.
00:10:55.990 --> 00:10:59.360
This is one of the few examples
I know where it's
00:10:59.360 --> 00:11:02.370
much harder to find the
expectation by taking the
00:11:02.370 --> 00:11:05.350
complementary distribution
function and integrating it.
00:11:05.350 --> 00:11:06.420
It's not hard.
00:11:06.420 --> 00:11:09.750
But it's far easier to take
the probability that the
00:11:09.750 --> 00:11:13.650
indicator random variable
is 0, which is 1 minus
00:11:13.650 --> 00:11:14.800
probability of a.
00:11:14.800 --> 00:11:18.600
The probability is equal to 1,
which is probability of a, and
00:11:18.600 --> 00:11:22.480
take the expectation, which is
the probability of a, and the
00:11:22.480 --> 00:11:25.410
standard deviation, which
is the square root of the
00:11:25.410 --> 00:11:29.580
probability of a times 1 minus
the probability of a.
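A small sketch of the indicator calculation just described, working directly from the two-point PMF. The function name `indicator_moments` and the value 0.25 for the probability of the event are assumptions chosen for illustration.

```python
import math


def indicator_moments(p_a):
    """Mean and standard deviation of the indicator of an event with probability p_a.

    The indicator is 0 with probability 1 - p_a and 1 with probability p_a.
    """
    mean = 0 * (1 - p_a) + 1 * p_a                   # E[I] = Pr{A}
    second_moment = 0**2 * (1 - p_a) + 1**2 * p_a    # E[I^2] = Pr{A}, since I^2 = I
    variance = second_moment - mean**2               # Pr{A} * (1 - Pr{A})
    return mean, math.sqrt(variance)


mean, std = indicator_moments(0.25)
print(mean, std)
```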
00:11:29.580 --> 00:11:34.170
So indicator random variables are sort
of trivial things in a way.
00:11:34.170 --> 00:11:36.940
OK, let's go on to multiple
random variables.
00:11:36.940 --> 00:11:40.720
Now here's something that's
a trick question in a way.
00:11:40.720 --> 00:11:43.110
But it's a very important
trick question.
00:11:43.110 --> 00:11:47.040
Is a random variable specified
by its distribution function?
00:11:47.040 --> 00:11:49.620
We've already seen that it's
not really specified by its
00:11:49.620 --> 00:11:54.410
density or by its probability
mass function.
00:11:54.410 --> 00:11:57.190
But we've said a distribution
function is a more general
00:11:57.190 --> 00:11:59.670
thing so that every random
variable has a
00:11:59.670 --> 00:12:01.500
distribution function.
00:12:01.500 --> 00:12:05.900
Does the distribution function
specify the random variable?
00:12:05.900 --> 00:12:09.910
No, that's the whole
reason for what
00:12:09.910 --> 00:12:14.610
Kolmogorov did back in 1933.
00:12:14.610 --> 00:12:17.250
Or at least it was one
of the main reasons
00:12:17.250 --> 00:12:18.130
for what he was doing.
00:12:18.130 --> 00:12:21.550
He wanted to straighten out
this ambiguity which runs
00:12:21.550 --> 00:12:25.690
through the field about
confusing random variables
00:12:25.690 --> 00:12:27.760
with their distribution
functions.
00:12:27.760 --> 00:12:32.600
Random variables are functions
from the sample space to the
00:12:32.600 --> 00:12:34.110
real numbers.
00:12:34.110 --> 00:12:36.340
And they're not anything else.
00:12:36.340 --> 00:12:40.550
So if you want to really define
a random variable, you
00:12:40.550 --> 00:12:43.290
not only have to know what that
random variable is but
00:12:43.290 --> 00:12:46.980
you also have to know what
its relationships are.
00:12:46.980 --> 00:12:49.360
It's like if you're trying
to understand the person.
00:12:49.360 --> 00:12:52.110
You can't understand the person
without understanding
00:12:52.110 --> 00:12:56.450
something about who they know,
how they know them, all those
00:12:56.450 --> 00:12:57.050
other things.
00:12:57.050 --> 00:12:59.360
All those relationships
are important.
00:12:59.360 --> 00:13:00.950
And it's the same with
random variables.
00:13:00.950 --> 00:13:03.520
You got to know about all
the relationships.
00:13:03.520 --> 00:13:05.890
Many problems you can solve
just in terms of
00:13:05.890 --> 00:13:07.470
distribution functions.
00:13:07.470 --> 00:13:10.120
But ultimately you have to--
00:13:10.120 --> 00:13:13.170
or ultimately in many cases,
you have to deal with these
00:13:13.170 --> 00:13:15.790
joint distribution functions.
00:13:15.790 --> 00:13:17.810
And random variables
are independent
00:13:17.810 --> 00:13:21.440
if the joint distribution
function is equal to the
00:13:21.440 --> 00:13:27.890
product of the distribution
functions for all x1 to xn,
00:13:27.890 --> 00:13:34.790
and that same form carries over
for density functions and
00:13:34.790 --> 00:13:36.425
for probability mass
functions.
00:13:41.757 --> 00:13:46.230
OK, if you have discrete random
variables, the idea of
00:13:46.230 --> 00:13:49.990
independence is a whole lot more
intuitive if you express
00:13:49.990 --> 00:13:54.280
it in terms of conditional
probabilities.
00:13:54.280 --> 00:13:58.620
The conditional probability
that the random variable x
00:13:58.620 --> 00:14:03.690
takes on some sample value x
given that the random variable
00:14:03.690 --> 00:14:06.210
y takes on a sample value y.
00:14:08.750 --> 00:14:14.300
Just as one side comment here,
when you're doing problems,
00:14:14.300 --> 00:14:18.700
you will very often want to
leave out the subscripts here
00:14:18.700 --> 00:14:21.980
saying what random variables
you're dealing with.
00:14:21.980 --> 00:14:26.680
And you will use either capital
or small letters here
00:14:26.680 --> 00:14:33.140
mixing up the argument and the
function itself, which
00:14:33.140 --> 00:14:34.110
everybody does.
00:14:34.110 --> 00:14:36.070
And it's perfectly all right.
00:14:36.070 --> 00:14:40.480
I suggest that you try not to do
it for a while because you
00:14:40.480 --> 00:14:43.960
get so confused doing this,
not being able to sort out
00:14:43.960 --> 00:14:46.960
what's a random variable and
what's a real number.
00:14:46.960 --> 00:14:51.860
A lot of wags say random
variables are neither random,
00:14:51.860 --> 00:14:54.080
because they're functions
of the sample space,
00:14:54.080 --> 00:14:56.300
nor are they variables.
00:14:56.300 --> 00:14:59.590
And both of those are true.
00:14:59.590 --> 00:15:01.840
That's immaterial here.
00:15:01.840 --> 00:15:04.140
It's just that when you start
getting confused about a
00:15:04.140 --> 00:15:07.840
problem, it's important to
sort out which things are
00:15:07.840 --> 00:15:10.610
random variables and which
things are arguments.
00:15:10.610 --> 00:15:13.700
Now this conditional probability
is something
00:15:13.700 --> 00:15:15.320
you're all familiar with.
00:15:15.320 --> 00:15:20.620
But x and y are independent then
if the probability of x
00:15:20.620 --> 00:15:24.780
conditional on y is the same
as the probability of x not
00:15:24.780 --> 00:15:26.000
conditional on y.
00:15:26.000 --> 00:15:29.830
In other words, if observing
what y is doesn't tell you
00:15:29.830 --> 00:15:32.970
anything about what x is, that's
really your intuitive
00:15:32.970 --> 00:15:36.000
definition of independence.
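A minimal sketch of this definition for two discrete random variables, assuming the joint PMF is given as a dictionary keyed by pairs of sample values. It checks the product form of the joint PMF and also computes the conditional probability of x given y, which should equal the unconditional probability of x when the two are independent. All names and both example PMFs are hypothetical.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Marginal PMFs of X and Y from a joint PMF {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y


def conditional(joint, x, y):
    """Pr{X = x | Y = y}, assuming Pr{Y = y} > 0."""
    _, p_y = marginals(joint)
    return joint.get((x, y), 0) / p_y[y]


def is_independent(joint):
    """True if the joint PMF equals the product of the marginals everywhere."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y]
               for x, y in product(p_x, p_y))


# Hypothetical independent pair: the joint PMF is built as a product.
px = {0: Fraction(1, 3), 1: Fraction(2, 3)}
py = {0: Fraction(1, 4), 1: Fraction(3, 4)}
independent_joint = {(x, y): px[x] * py[y] for x in px for y in py}

# Hypothetical dependent pair: Y always equals X.
dependent_joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(independent_joint), is_independent(dependent_joint))
print(conditional(independent_joint, 1, 0), px[1])
```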
00:15:36.000 --> 00:15:39.050
It's what you use if
you're dealing with
00:15:39.050 --> 00:15:41.580
some real world situation.
00:15:41.580 --> 00:15:44.280
And you're asking what does
this have to do with that?
00:15:44.280 --> 00:15:47.050
And if this has nothing to
do with that, the random
00:15:47.050 --> 00:15:49.950
variables over here have nothing
to do with the random
00:15:49.950 --> 00:15:54.790
variables over there, you would
say in the real world
00:15:54.790 --> 00:15:56.880
that these things are
independent of each other.
00:15:56.880 --> 00:15:59.540
When you have a probability
model, you say they're
00:15:59.540 --> 00:16:01.580
statistically independent
of each other.
00:16:11.580 --> 00:16:21.280
OK, so that's the relationship
between the real world and the
00:16:21.280 --> 00:16:23.730
models that we're dealing
with all the time.
00:16:23.730 --> 00:16:26.090
We call it independence
in both cases.
00:16:26.090 --> 00:16:28.980
But it means somewhat different
things in the two
00:16:28.980 --> 00:16:30.750
situations.
00:16:30.750 --> 00:16:34.270
OK, next about IID
random variables.
00:16:34.270 --> 00:16:35.670
What are they?
00:16:35.670 --> 00:16:41.480
Well, the joint distribution
function has to be equal to
00:16:41.480 --> 00:16:46.760
the product of the individual
distribution functions.
00:16:46.760 --> 00:16:50.440
You notice I've done something
funny here, which is a
00:16:50.440 --> 00:16:52.430
convention I always use.
00:16:52.430 --> 00:16:53.990
A lot of people use it.
00:16:53.990 --> 00:16:57.680
If you have a bunch of
independent random variables,
00:16:57.680 --> 00:17:00.370
they all have the same
distribution function.
00:17:00.370 --> 00:17:03.600
If they all have the same
distribution function, it gets
00:17:03.600 --> 00:17:06.859
confusing to refer to their
distribution functions as a
00:17:06.859 --> 00:17:09.750
distribution function of
the random variable x1,
00:17:09.750 --> 00:17:12.680
distribution function of
the random variable x2.
00:17:12.680 --> 00:17:17.150
It's nicer to just take a
generic random variable x,
00:17:17.150 --> 00:17:20.079
which has the same distribution
as all of these,
00:17:20.079 --> 00:17:23.930
and express this numerically
in this way.
00:17:23.930 --> 00:17:28.460
You have the same product form
for probability mass functions
00:17:28.460 --> 00:17:29.890
and for density functions.
00:17:29.890 --> 00:17:33.150
So this works throughout.
00:17:33.150 --> 00:17:39.430
OK next, think about a
probability model in which r,
00:17:39.430 --> 00:17:42.980
the set of real numbers,
is a sample space.
00:17:42.980 --> 00:17:46.420
And x is some random variable
on that sample space.
00:17:46.420 --> 00:17:50.310
Namely x then is a function from
the real numbers into
00:17:50.310 --> 00:17:53.445
the real numbers.
00:17:53.445 --> 00:17:56.660
The interesting thing
here, and what I'm
00:17:56.660 --> 00:17:59.380
saying here is obvious.
00:17:59.380 --> 00:18:01.350
It's something that
you all know.
00:18:01.350 --> 00:18:03.890
It's something that you've all
been using even before you
00:18:03.890 --> 00:18:07.010
started to learn about
probability theory.
00:18:07.010 --> 00:18:10.380
But at the same time, you
probably never thought about
00:18:10.380 --> 00:18:13.190
it in a serious enough way that
you would really make
00:18:13.190 --> 00:18:15.190
sense out of it.
00:18:15.190 --> 00:18:18.890
You can always create an
extended probability model in
00:18:18.890 --> 00:18:23.450
which the Cartesian space r to
the n-- in other words, the
00:18:23.450 --> 00:18:26.920
space of n real numbers--
00:18:26.920 --> 00:18:28.720
is the sample space.
00:18:28.720 --> 00:18:32.860
And x1 to xn are independent
identically
00:18:32.860 --> 00:18:34.640
distributed random variables.
00:18:34.640 --> 00:18:36.720
This is not obvious.
00:18:36.720 --> 00:18:38.730
And that's something
you have to prove.
00:18:38.730 --> 00:18:41.820
But it's not hard to prove.
00:18:41.820 --> 00:18:46.180
And all you have to do is start
out with a probability
00:18:46.180 --> 00:18:49.950
model for one random variable.
00:18:49.950 --> 00:18:53.260
And then just define all
products to be what they're
00:18:53.260 --> 00:18:56.960
supposed to be and go from the
products to all unions and all
00:18:56.960 --> 00:18:58.940
intersections.
00:18:58.940 --> 00:19:01.330
We're just going to assume that
that's true because we
00:19:01.330 --> 00:19:06.000
have to assume it's true if we
don't want to use any measure
00:19:06.000 --> 00:19:06.490
theory here.
00:19:06.490 --> 00:19:08.980
This is one of the easier
things to show
00:19:08.980 --> 00:19:10.380
using measure theory.
00:19:10.380 --> 00:19:13.110
But it's something you
are always used to.
00:19:13.110 --> 00:19:16.560
When you think of a random
experiment, when you think of
00:19:16.560 --> 00:19:19.030
playing dice with somebody
or playing cards with
00:19:19.030 --> 00:19:22.390
someone, you are--
00:19:22.390 --> 00:19:25.650
from the very beginning when you
started to talk about odds
00:19:25.650 --> 00:19:30.290
or anything of that sort, you
have always had the idea that
00:19:30.290 --> 00:19:33.790
this is a game which you
can play repeatedly.
00:19:33.790 --> 00:19:36.590
And each time you play it,
it's the same game.
00:19:36.590 --> 00:19:38.750
But the outcome is different.
00:19:38.750 --> 00:19:42.510
But all the probabilities
are exactly the same.
00:19:42.510 --> 00:19:46.150
What this says is you can always
make a probability
00:19:46.150 --> 00:19:46.990
model that way.
00:19:46.990 --> 00:19:49.590
You can always make a
probability model which
00:19:49.590 --> 00:19:52.670
corresponds to what you've
always believed deep in your
00:19:52.670 --> 00:19:54.700
hearts all your lives.
00:19:54.700 --> 00:19:56.300
And fortunately, that's true.
00:19:56.300 --> 00:19:59.110
Otherwise, you wouldn't
use probability.
00:19:59.110 --> 00:20:03.830
OK, so let's move
on from that.
00:20:03.830 --> 00:20:08.920
The page of philosophy, I will
stop doing that pretty soon.
00:20:08.920 --> 00:20:13.240
But I have to get across what
the relationship is between
00:20:13.240 --> 00:20:15.260
the real world and
these models that
00:20:15.260 --> 00:20:16.210
we're dealing with.
00:20:16.210 --> 00:20:20.700
Because otherwise, you as
engineers or business people
00:20:20.700 --> 00:20:23.760
or financial analysts or
whatever the heck you're going
00:20:23.760 --> 00:20:27.930
to become will start believing
in your probability models.
00:20:27.930 --> 00:20:33.000
And you will cause untold damage
by losing track of the
00:20:33.000 --> 00:20:36.640
fact that these are supposedly
models of something.
00:20:36.640 --> 00:20:38.160
And you better think of
what they're supposed
00:20:38.160 --> 00:20:40.081
to be models of.
00:20:40.081 --> 00:20:43.210
OK, in order to do that, we're
going to study the sample
00:20:43.210 --> 00:20:49.120
average, namely the sum of n
random variables divided by n.
00:20:49.120 --> 00:20:50.820
That's the way you take
sample averages.
00:20:50.820 --> 00:20:52.030
You add them all up.
00:20:52.030 --> 00:20:54.380
You divide by n.
00:20:54.380 --> 00:20:56.570
The law of large numbers, which
we're going to talk
00:20:56.570 --> 00:21:00.880
about very soon, says that s
sub n over n essentially
00:21:00.880 --> 00:21:06.210
becomes deterministic as
n becomes very large.
00:21:06.210 --> 00:21:09.460
What we mean by that-- and most
of you have seen that in
00:21:09.460 --> 00:21:10.780
various ways.
00:21:10.780 --> 00:21:14.180
We will review it later today.
00:21:14.180 --> 00:21:18.150
Well, we'll do it today
and on Wednesday.
00:21:18.150 --> 00:21:22.400
And there's a big question
about what becoming
00:21:22.400 --> 00:21:24.530
deterministic means.
00:21:24.530 --> 00:21:27.180
But there is an essential
idea there.
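A rough simulation sketch of that idea, assuming IID fair coin flips coded as 0 and 1: the sample average S_n / n tends to settle near the mean 1/2 as n grows. The function name, the seed, and the particular values of n are assumptions for illustration, and a simulation only suggests the behavior rather than proving it.

```python
import random


def sample_average(n, p_heads=0.5, seed=0):
    """Return S_n / n for n simulated IID flips that are 1 with probability p_heads."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same flips
    s_n = sum(1 if rng.random() < p_heads else 0 for _ in range(n))
    return s_n / n


for n in (10, 1000, 100000):
    print(n, sample_average(n))
```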
00:21:27.180 --> 00:21:30.130
The extended model, namely
when you have one random
00:21:30.130 --> 00:21:33.510
variable, you create a very
large number of them.
00:21:33.510 --> 00:21:37.380
If it corresponds to repeated
experiments in the real world,
00:21:37.380 --> 00:21:41.060
then s sub n over n corresponds
to the arithmetic
00:21:41.060 --> 00:21:42.600
average in the real world.
00:21:42.600 --> 00:21:46.120
In the real world, you do take
arithmetic averages.
00:21:46.120 --> 00:21:49.650
Whenever you open up a
newspaper, somebody is taking
00:21:49.650 --> 00:21:53.110
an arithmetic average of
something and says, gee, this
00:21:53.110 --> 00:21:53.920
is significant.
00:21:53.920 --> 00:21:58.550
This shows what's going
on someplace.
00:21:58.550 --> 00:22:00.930
Models can have two types
of difficulties.
00:22:00.930 --> 00:22:04.250
This paragraph is a little
different than what I wrote in
00:22:04.250 --> 00:22:07.445
the handout because I realized
what I wrote in the handout
00:22:07.445 --> 00:22:10.250
didn't make a whole
lot of sense.
00:22:10.250 --> 00:22:12.720
OK, the two types of
difficulties you have with
00:22:12.720 --> 00:22:17.290
models, especially when you're
trying to model things by IID
00:22:17.290 --> 00:22:18.720
random variables.
00:22:18.720 --> 00:22:23.390
In one, a sequence of real
world experiments is not
00:22:23.390 --> 00:22:26.920
sufficiently similar and
isolated from each other to
00:22:26.920 --> 00:22:30.820
correspond to the IID
extended model.
00:22:30.820 --> 00:22:33.750
In other words, you want to
model things so that each
00:22:33.750 --> 00:22:37.120
time, each trial of this
experiment, you do the same
00:22:37.120 --> 00:22:40.390
thing but get a potentially
different answer.
00:22:40.390 --> 00:22:46.535
Sometimes you rig things without
trying to do so in
00:22:46.535 --> 00:22:49.630
such a way that these
experiments are not
00:22:49.630 --> 00:22:52.420
independent of each other
and in fact are very,
00:22:52.420 --> 00:22:53.540
very heavily biased.
00:22:53.540 --> 00:22:57.420
You find people taking risk
models in the financial world
00:22:57.420 --> 00:22:59.840
where they take all sorts
of these things.
00:22:59.840 --> 00:23:02.110
And they say, oh, all right,
these things are all
00:23:02.110 --> 00:23:03.290
independent of each other.
00:23:03.290 --> 00:23:05.120
They look independent.
00:23:05.120 --> 00:23:08.230
And then suddenly a
scare comes along.
00:23:08.230 --> 00:23:10.580
And everybody sells
simultaneously.
00:23:10.580 --> 00:23:13.120
And you find out that all these
random variables were
00:23:13.120 --> 00:23:14.880
not independent at all.
00:23:14.880 --> 00:23:17.760
They were very closely related
to each other but in a way you
00:23:17.760 --> 00:23:19.460
never saw before.
00:23:19.460 --> 00:23:26.290
OK, the other way that these
models are not true or not
00:23:26.290 --> 00:23:30.560
valid is that the IID
extension is OK.
00:23:30.560 --> 00:23:33.340
But the basic model
is not right.
00:23:33.340 --> 00:23:35.050
OK, in other words,
you model a coin
00:23:35.050 --> 00:23:38.410
as coming up heads with
probability 1/2.
00:23:38.410 --> 00:23:42.640
And somebody has put
a loaded coin in.
00:23:42.640 --> 00:23:47.760
And the probability that it
comes up heads is 0.45.
00:23:47.760 --> 00:23:51.700
And the probability that it
comes up tails is 0.55.
00:23:51.700 --> 00:23:54.860
And you might guess that this
person always bets on tails
00:23:54.860 --> 00:23:58.030
and tries to get you
to bet on heads.
00:23:58.030 --> 00:24:01.410
So in that case, the basic
model that you're
00:24:01.410 --> 00:24:02.940
using is not OK.
00:24:02.940 --> 00:24:05.380
So you have both of these
kinds of problems.
00:24:05.380 --> 00:24:07.190
You should try to keep
them straight.
00:24:07.190 --> 00:24:09.620
But we'll learn about these
problems through
00:24:09.620 --> 00:24:10.750
study of the models.
00:24:10.750 --> 00:24:13.950
Namely, we're not going to go
through an enormous amount of
00:24:13.950 --> 00:24:19.834
study on how you can bias a coin
or things of this sort.
00:24:19.834 --> 00:24:23.200
OK, science, symmetry,
analogies, earlier models, all
00:24:23.200 --> 00:24:27.830
of these are used to model
real world situations.
00:24:27.830 --> 00:24:31.490
Let me again talk about an
example I talked about a
00:24:31.490 --> 00:24:37.730
little bit last time because the
model was so trivial that
00:24:37.730 --> 00:24:41.250
you probably understood
everything about the model in
00:24:41.250 --> 00:24:42.110
the situation.
00:24:42.110 --> 00:24:46.060
But you didn't understand what
it was illustrating.
00:24:46.060 --> 00:24:49.170
You have two dice.
00:24:49.170 --> 00:24:50.930
One of them is red.
00:24:50.930 --> 00:24:52.930
And one of them is white.
00:24:52.930 --> 00:24:54.225
You roll them.
00:24:54.225 --> 00:25:00.430
By symmetry, each one comes up
to be 1, 2, 3, up to 6, each
00:25:00.430 --> 00:25:02.170
with equal probability.
00:25:02.170 --> 00:25:05.040
If you roll them with two hands
or something, they're
00:25:05.040 --> 00:25:07.850
going to be independent
of each other.
00:25:07.850 --> 00:25:11.910
And therefore, the probability
of each pair of outcomes--
00:25:11.910 --> 00:25:13.480
red is equal to i.
00:25:13.480 --> 00:25:15.100
White is equal to j.
00:25:15.100 --> 00:25:17.990
Probability of each one of
those is going to be 1/36
00:25:17.990 --> 00:25:20.770
because that's the size
of the sample space.
00:25:20.770 --> 00:25:23.580
Now you take two white dice.
00:25:23.580 --> 00:25:25.810
And you roll them.
00:25:25.810 --> 00:25:27.870
What's the sample space?
00:25:27.870 --> 00:25:30.420
Well, as far as the real world
is concerned, you can't
00:25:30.420 --> 00:25:35.800
distinguish a red 1 from
a white 1 and a white
00:25:35.800 --> 00:25:37.390
2 from a red 2.
00:25:37.390 --> 00:25:40.470
In other words, those two
possibilities can't be
00:25:40.470 --> 00:25:41.730
distinguished.
00:25:41.730 --> 00:25:44.740
So you might say I want to
use a sample space which
00:25:44.740 --> 00:25:46.736
corresponds to the--
00:25:50.140 --> 00:25:53.880
what's the word I used here?--
finest grain possible outcome
00:25:53.880 --> 00:25:57.096
that you can observe.
00:25:57.096 --> 00:25:59.030
And who would do that?
00:25:59.030 --> 00:26:01.170
You'd be crazy to do that.
00:26:01.170 --> 00:26:04.270
I mean, you have a nice model
of rolling dice where each
00:26:04.270 --> 00:26:07.510
outcome has probability 1/36.
00:26:07.510 --> 00:26:10.120
And you would replace that
with something where the
00:26:10.120 --> 00:26:13.440
probability of a 1, 1 is 1/36.
00:26:13.440 --> 00:26:17.560
But the probability of a 1, 2 is
1/18 because you can get a
00:26:17.560 --> 00:26:20.190
2 in two different ways.
00:26:20.190 --> 00:26:23.470
You get a 2 in two different
ways, which says you're really
00:26:23.470 --> 00:26:26.920
thinking about a red die
and a white die.
00:26:26.920 --> 00:26:29.440
Otherwise, you wouldn't
be able to say that.
00:26:29.440 --> 00:26:33.010
So the appropriate model here is
certainly to think in terms
00:26:33.010 --> 00:26:35.050
of a red die and a white die.
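A minimal sketch of the two sample spaces being compared here, assuming fair dice. The names `ordered` and `unordered` are illustrative and are not from the lecture. Merging the red/white pairs that cannot be told apart leaves a smaller sample space whose points no longer have equal probability.

```python
from fractions import Fraction
from itertools import product

# Red/white model: 36 ordered pairs (red, white), each with probability 1/36.
ordered = {(r, w): Fraction(1, 36) for r, w in product(range(1, 7), repeat=2)}

# Two-white-dice model: (i, j) and (j, i) cannot be told apart, so merge them.
unordered = {}
for (r, w), prob in ordered.items():
    key = (min(r, w), max(r, w))
    unordered[key] = unordered.get(key, Fraction(0)) + prob

print(len(ordered), len(unordered))          # sizes of the two sample spaces
print(unordered[(1, 1)], unordered[(1, 2)])  # a double versus a mixed pair
```

The mixed pair gets twice the probability of the double only because the calculation still goes through the red/white model underneath.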
00:26:35.050 --> 00:26:36.850
It's what everybody does.
00:26:36.850 --> 00:26:38.830
They just don't talk about it.
00:26:38.830 --> 00:26:43.200
OK, so the point that I'm trying
to make here is that
00:26:43.200 --> 00:26:48.880
what you call a finest grain
model is not at all clear.
00:26:48.880 --> 00:26:52.450
And if it's not at all clear in
the case of dice, it sure
00:26:52.450 --> 00:26:55.740
as hell is not clear in most of
the kinds of problems you
00:26:55.740 --> 00:26:57.040
want to deal with.
00:26:57.040 --> 00:27:02.180
So you need something
considerably more than that.
00:27:02.180 --> 00:27:05.740
OK, so neither the axioms
nor experimentation
00:27:05.740 --> 00:27:06.970
motivate this model.
00:27:06.970 --> 00:27:11.940
In other words, you really
have to use common sense.
00:27:11.940 --> 00:27:14.610
You have to use judgment.
00:27:14.610 --> 00:27:16.480
And all of you have that.
00:27:16.480 --> 00:27:20.010
It's just that by learning
all this mathematics, you
00:27:20.010 --> 00:27:22.530
eventually start to think that
maybe you shouldn't use your
00:27:22.530 --> 00:27:24.170
common sense.
00:27:24.170 --> 00:27:27.830
So I have to keep saying that
no, you keep on using your
00:27:27.830 --> 00:27:28.850
common sense.
00:27:28.850 --> 00:27:32.680
You want to learn what these
models are about.
00:27:32.680 --> 00:27:34.475
You want to use your
common sense also.
00:27:34.475 --> 00:27:39.130
And you've got to go back and
forth between the two of them.
00:27:39.130 --> 00:27:42.410
OK, that's almost the end
of our philosophy.
00:27:42.410 --> 00:27:43.520
I guess one more slide.
00:27:43.520 --> 00:27:45.810
I'm getting tired
of this stuff.
00:27:45.810 --> 00:27:49.680
Comparing models for similar
situations and analyzing
00:27:49.680 --> 00:27:52.770
limited and defective
models helps a lot
00:27:52.770 --> 00:27:54.810
in clarifying fuzziness.
00:27:54.810 --> 00:27:57.260
But ultimately, as in
all of science, some
00:27:57.260 --> 00:27:58.840
experimentation is needed.
00:27:58.840 --> 00:28:01.240
This is like any other
branch of science.
00:28:01.240 --> 00:28:05.090
You need experimentation
sometimes.
00:28:05.090 --> 00:28:08.190
You don't want to do too much of
it because you'd always be
00:28:08.190 --> 00:28:10.040
doing experiments.
00:28:10.040 --> 00:28:13.600
But the important thing is
that the outcome of an
00:28:13.600 --> 00:28:17.140
experiment is a sample point.
00:28:17.140 --> 00:28:18.920
It's not a probability.
00:28:18.920 --> 00:28:19.725
You do an experiment.
00:28:19.725 --> 00:28:21.300
You get an outcome.
00:28:21.300 --> 00:28:24.050
And all you find is one sample
point, if you do the
00:28:24.050 --> 00:28:25.670
experiment once.
00:28:25.670 --> 00:28:27.800
And there's nothing that
lets you draw a
00:28:27.800 --> 00:28:29.790
probability from that.
00:28:29.790 --> 00:28:32.430
The only way you can get things
that you would call
00:28:32.430 --> 00:28:36.370
probabilities is to use an
extended model, hope the
00:28:36.370 --> 00:28:40.690
extended model corresponds to
the physical situation, and
00:28:40.690 --> 00:28:43.970
deal with these law of large
numbers kind of things.
00:28:43.970 --> 00:28:46.650
You don't necessarily need
IID random variables.
00:28:46.650 --> 00:28:50.200
But you need something that you
know about between a large
00:28:50.200 --> 00:28:54.060
number of random variables
to get from an outcome to
00:28:54.060 --> 00:28:57.700
something you could reasonably
call a probability.
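A small simulation sketch of that point, assuming the extended model is IID tosses of the loaded coin from the earlier example (heads with probability 0.45). The function name and the fixed seed are hypothetical choices made for reproducibility. One toss yields a sample point; only a long run under the extended model yields something that looks like a probability.

```python
import random

def relative_frequency_of_heads(num_tosses, prob_heads, seed=0):
    """Toss a simulated coin num_tosses times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < prob_heads for _ in range(num_tosses))
    return heads / num_tosses

one_toss = relative_frequency_of_heads(1, 0.45)          # either 0.0 or 1.0
many_tosses = relative_frequency_of_heads(100_000, 0.45)  # close to 0.45
print(one_toss, many_tosses)
```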
00:28:57.700 --> 00:29:00.560
OK, so that's enough.
00:29:00.560 --> 00:29:03.200
Let's go on to the law
of large numbers.
00:29:03.200 --> 00:29:06.270
Let's do it in pictures first.
00:29:06.270 --> 00:29:10.820
So you can lie back and relax
for a minute or stop being
00:29:10.820 --> 00:29:14.080
bored by all this stuff.
00:29:14.080 --> 00:29:17.670
What I've done here is to take
the simplest random variable I
00:29:17.670 --> 00:29:20.760
can think of, which as
you might guess is a
00:29:20.760 --> 00:29:22.230
binary random variable.
00:29:22.230 --> 00:29:24.360
It's either 0 or 1.
00:29:24.360 --> 00:29:28.050
Here it's 1 with probability
1/4 and 0
00:29:28.050 --> 00:29:30.260
with probability 3/4.
00:29:30.260 --> 00:29:32.290
I have actually calculated
these things.
00:29:32.290 --> 00:29:38.490
The distribution function of
x1 plus x2 plus x3 plus x4,
00:29:38.490 --> 00:29:45.390
this point down here,
is the probability
00:29:45.390 --> 00:29:47.890
of all 0's, I guess.
00:29:47.890 --> 00:29:51.060
And then you get the probability
of all 0's plus 1,
00:29:51.060 --> 00:29:54.210
1 and so forth.
00:29:54.210 --> 00:29:57.710
Here's where you take the sum
of 20 random variables.
00:29:57.710 --> 00:30:01.010
And you're looking at the
distribution function of the
00:30:01.010 --> 00:30:03.400
number of 1's that you get.
00:30:03.400 --> 00:30:05.410
And it comes out like this.
00:30:05.410 --> 00:30:07.090
Here you're looking at s50.
00:30:07.090 --> 00:30:09.650
You're adding up 50
random variables.
00:30:09.650 --> 00:30:14.110
And what's happening as far
as the gross picture
00:30:14.110 --> 00:30:15.990
is concerned here?
00:30:15.990 --> 00:30:21.470
Well, the mean value of s sub
n is the mean of a sum of
00:30:21.470 --> 00:30:22.940
random variables.
00:30:22.940 --> 00:30:26.820
And that's equal to n times
a mean of a single random
00:30:26.820 --> 00:30:30.530
variable when you have
identically distributed random
00:30:30.530 --> 00:30:33.600
variables or random variables
that have the same mean.
00:30:33.600 --> 00:30:38.330
The variance is equal to
n times sigma squared.
00:30:38.330 --> 00:30:46.270
Namely, when you take the
expected value of this
00:30:46.270 --> 00:30:51.010
quantity squared, all these
cross terms are going to
00:30:51.010 --> 00:30:54.360
balance out with the
mean when you do.
00:30:54.360 --> 00:30:57.600
I mean, all of you know how to
find the variance of s sub n.
00:30:57.600 --> 00:30:59.560
I hope you know how
to find that.
00:30:59.560 --> 00:31:03.710
And when you do that,
it increases with n.
00:31:03.710 --> 00:31:06.060
And the mean increases with n.
00:31:06.060 --> 00:31:08.870
The standard deviation, which
gives you a picture of how
00:31:08.870 --> 00:31:12.810
wide the distribution is,
only goes up as the
00:31:12.810 --> 00:31:14.820
square root of n.
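Written out as a sketch, assuming IID random variables with mean $\bar{X}$ and variance $\sigma^2$ (the subscript notation is an assumption, since the slide is not visible in the transcript). The cross terms have zero expectation because the variables are independent.

```latex
\begin{align*}
\mathsf{E}[S_n] &= \sum_{i=1}^{n} \mathsf{E}[X_i] = n\bar{X}, \\
\mathrm{Var}(S_n) &= \mathsf{E}\Big[\Big(\sum_{i=1}^{n} (X_i - \bar{X})\Big)^{2}\Big]
  = \sum_{i=1}^{n} \mathsf{E}\big[(X_i - \bar{X})^{2}\big]
  + \sum_{i \neq j} \mathsf{E}\big[(X_i - \bar{X})(X_j - \bar{X})\big]
  = n\sigma^{2}, \\
\sigma_{S_n} &= \sigma\sqrt{n}.
\end{align*}
```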
00:31:14.820 --> 00:31:21.010
This is really the essence of
the weak law of large numbers.
00:31:21.010 --> 00:31:25.520
I mean, everything else is
mathematical detail.
00:31:25.520 --> 00:31:32.740
And then if you go on beyond
this and you talk about the
00:31:32.740 --> 00:31:37.260
sample average, namely the sum
of these n random variables--
00:31:37.260 --> 00:31:38.970
assume them IID again.
00:31:38.970 --> 00:31:42.090
In fact, assume for this picture
that they're the same
00:31:42.090 --> 00:31:44.230
binary random variables.
00:31:44.230 --> 00:31:46.130
You look at the sample
average.
00:31:46.130 --> 00:31:48.890
You find the mean of
the sample average.
00:31:48.890 --> 00:31:52.400
And it's the mean of a single
random variable.
00:31:52.400 --> 00:31:54.460
You find the variance of it.
00:31:54.460 --> 00:31:57.960
Because of this n here and the
squaring that you're doing,
00:31:57.960 --> 00:32:01.990
the variance of the sum divided
by n, the sigma
00:32:01.990 --> 00:32:07.570
squared divided by n, what
happens as n gets large?
00:32:07.570 --> 00:32:10.080
This variance goes to 0.
00:32:10.080 --> 00:32:13.830
What happens when you have a
random variable, a sequence of
00:32:13.830 --> 00:32:17.850
random variables, all of which
have the same mean and whose
00:32:17.850 --> 00:32:21.020
standard deviation
is going to 0?
00:32:21.020 --> 00:32:24.550
Well, you might play around with
a lot of funny kinds of
00:32:24.550 --> 00:32:27.310
things that you might think
of as happening.
00:32:27.310 --> 00:32:32.810
But essentially what's going
on here is the nice feature
00:32:32.810 --> 00:32:36.360
that when you add all these
things up, the distribution
00:32:36.360 --> 00:32:42.250
function gets scrunched
down into a unit step.
00:32:42.250 --> 00:32:45.650
In other words, since the
standard deviation is going to
00:32:45.650 --> 00:32:49.810
0, the sequence of random
variables--
00:32:49.810 --> 00:32:52.260
since they all have
the same mean--
00:32:52.260 --> 00:32:55.690
they all have smaller and
smaller standard deviations.
00:32:55.690 --> 00:32:59.500
The only way you can do that is
to scrunch them down into a
00:32:59.500 --> 00:33:04.210
limiting random variable,
which is deterministic.
00:33:04.210 --> 00:33:07.460
And you can see that
happening here.
00:33:07.460 --> 00:33:12.670
Namely the largest value is
the black thing, which is
00:33:12.670 --> 00:33:14.540
getting smaller and smaller.
00:33:14.540 --> 00:33:17.250
And the left side is
going that way.
00:33:17.250 --> 00:33:19.830
On the right side, it's
going that way.
00:33:19.830 --> 00:33:24.010
So it looks like it's
approaching a unit step.
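A rough numerical check of that picture, assuming the same binary random variable as in the plots (1 with probability 1/4). The tolerance `eps` and the function name are illustrative choices. The sketch computes exactly how much probability the sample average leaves outside a small band around the mean.

```python
from fractions import Fraction
from math import comb

def prob_sample_average_far(n, p, eps):
    """Exact P(|S_n/n - p| > eps) for S_n the sum of n IID binary variables."""
    q = 1 - p
    return sum(
        comb(n, k) * p**k * q**(n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) > eps
    )

p, eps = Fraction(1, 4), Fraction(1, 10)
for n in (4, 20, 50, 200):
    print(n, float(prob_sample_average_far(n, p, eps)))
```

For these values of n the probability outside the band shrinks as n grows, which is what a distribution function closing in on a unit step at the mean looks like.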
00:33:24.010 --> 00:33:25.630
That has to be proven.
00:33:25.630 --> 00:33:26.910
And there's a simple
proof of it.
00:33:26.910 --> 00:33:27.830
And we'll see that.
00:33:27.830 --> 00:33:29.720
And you've all seen
that before.
00:33:29.720 --> 00:33:32.413
And you've all probably
said, ho-hum.
00:33:32.413 --> 00:33:35.050
But that's the way it is.
00:33:35.050 --> 00:33:39.000
Now the next thing to look at
for this same set of random
00:33:39.000 --> 00:33:43.610
variables, the same sum, is
you look at the normalized
00:33:43.610 --> 00:33:49.470
sum, namely sn minus
n times the mean.
00:33:49.470 --> 00:33:53.730
And you divide that by the
square root of n times sigma.
00:33:53.730 --> 00:33:55.520
And what do you get?
00:33:55.520 --> 00:33:57.820
Well, every one of these
random variables--
00:33:57.820 --> 00:34:03.190
for every n has mean 0, has mean
0 because the mean of sn
00:34:03.190 --> 00:34:04.570
is n times x bar.
00:34:04.570 --> 00:34:08.429
So you're subtracting off
of the mean essentially.
00:34:08.429 --> 00:34:11.960
And every one of them
has variance 1.
00:34:11.960 --> 00:34:16.590
So you've got a whole sequence
of random variables, which are
00:34:16.590 --> 00:34:20.469
just sticking there at
the same mean, 0,
00:34:20.469 --> 00:34:23.040
and at the same variance.
00:34:23.040 --> 00:34:26.110
What's extraordinary when you
do that, and you can sort of
00:34:26.110 --> 00:34:30.949
see this happening a little
bit, this curve looks like
00:34:30.949 --> 00:34:35.389
it's going into a fixed
curve, which starts
00:34:35.389 --> 00:34:39.250
out sticking to 0.
00:34:39.250 --> 00:34:41.270
And then it gradually
comes up.
00:34:41.270 --> 00:34:43.600
And it looks fairly smooth.
00:34:43.600 --> 00:34:45.946
It goes off this way.
00:34:45.946 --> 00:34:54.219
And if you read a lot about this
or if you think that all
00:34:54.219 --> 00:34:58.640
respectable random variables are
Gaussian random variables,
00:34:58.640 --> 00:35:02.020
and I hope at the end of this
course you will realize that
00:35:02.020 --> 00:35:04.780
only most respectable
random variables are
00:35:04.780 --> 00:35:06.660
Gaussian random variables.
00:35:06.660 --> 00:35:08.575
There are many very interesting
random variables
00:35:08.575 --> 00:35:09.650
that aren't.
00:35:09.650 --> 00:35:13.520
But what the central limit
theorem says is that as you
00:35:13.520 --> 00:35:17.040
add up more and more random
variables and you look at this
00:35:17.040 --> 00:35:22.850
normalized sum here, what you
get is in fact the normal
00:35:22.850 --> 00:35:26.700
distribution, which is this
strange integral here, that e
00:35:26.700 --> 00:35:30.470
to the minus x squared
over 2 times dx.
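A sketch comparing the distribution function of the normalized sum with the normal distribution function, again assuming the binary variable with p = 1/4. The grid of evaluation points and the helper names are assumptions made for illustration.

```python
from math import comb, erf, sqrt

def normalized_binomial_cdf(n, p, z):
    """P((S_n - n*p) / (sqrt(n)*sigma) <= z) for S_n ~ Binomial(n, p)."""
    q = 1.0 - p
    sigma = sqrt(p * q)
    threshold = n * p + z * sqrt(n) * sigma
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(n + 1) if k <= threshold)

def normal_cdf(z):
    """Integral of exp(-x^2/2)/sqrt(2*pi) from minus infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def max_gap(n, p):
    """Largest gap between the two distribution functions on a grid from -3 to 3."""
    grid = [i / 10 for i in range(-30, 31)]
    return max(abs(normalized_binomial_cdf(n, p, z) - normal_cdf(z)) for z in grid)

for n in (4, 20, 50, 400):
    print(n, round(max_gap(n, 0.25), 4))
```

The gap is measured only at grid points, so it is an indication of convergence rather than a proof of it.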
00:35:30.470 --> 00:35:35.900
Now what I want to do with the
rest of our time is to show
00:35:35.900 --> 00:35:40.490
you why in fact that happens.
00:35:40.490 --> 00:35:42.420
I've never seen this proof
of the central
00:35:42.420 --> 00:35:44.480
limit theorem before.
00:35:44.480 --> 00:35:47.040
I'm sure that some people
have done it.
00:35:47.040 --> 00:35:52.230
I'm only going to do it for
the case of a binomial
00:35:52.230 --> 00:35:55.780
distribution, which is the only
place where this works.
00:35:55.780 --> 00:36:01.280
But I think in doing this you
will see why in fact that
00:36:01.280 --> 00:36:04.990
strange e to the minus x squared
over 2 comes up.
00:36:04.990 --> 00:36:08.500
It sure is not obvious by
looking at this problem.
00:36:08.500 --> 00:36:11.370
OK, so that's what we're
going to do.
00:36:11.370 --> 00:36:15.840
And I'm hoping that after you
see this, you will in fact
00:36:15.840 --> 00:36:19.920
understand why the central limit
theorem is true as well
00:36:19.920 --> 00:36:23.440
as knowing that it's true.
00:36:23.440 --> 00:36:27.250
OK, so let's look at the
Bernoulli process.
00:36:27.250 --> 00:36:32.200
You have a sequence of binary
random variables, each of them
00:36:32.200 --> 00:36:37.220
is IID, each of them is
1 with probability p.
00:36:37.220 --> 00:36:41.780
And a 0 with probability
q equals 1 minus p.
00:36:41.780 --> 00:36:42.770
You add them all up.
00:36:42.770 --> 00:36:44.280
They're IID.
00:36:44.280 --> 00:36:49.890
And the question is, what
does the distribution of
00:36:49.890 --> 00:36:51.860
the sum look like?
00:36:51.860 --> 00:36:55.620
Well, it has a nice
formula to it.
00:36:55.620 --> 00:36:57.220
It's that formula down there.
00:36:57.220 --> 00:36:59.990
You've probably seen that
formula before.
00:36:59.990 --> 00:37:01.950
Let's get some idea
of where it comes
00:37:01.950 --> 00:37:04.670
from and what it means.
00:37:04.670 --> 00:37:10.240
Each n tuple that starts with
k 1's and then ends with n
00:37:10.240 --> 00:37:15.870
minus k 0's, each one of those
has the same probability.
00:37:15.870 --> 00:37:19.320
And it's p to the k times
q to the n minus k.
00:37:19.320 --> 00:37:22.030
In other words, the probability
you get a 1 on the
00:37:22.030 --> 00:37:24.030
first toss is p.
00:37:24.030 --> 00:37:28.740
The probability you get a 1 on
the second toss also, since
00:37:28.740 --> 00:37:31.710
those are independent,
probability you get two 1's in
00:37:31.710 --> 00:37:33.210
a row is p squared.
00:37:33.210 --> 00:37:36.460
Probability you get three 1's
in a row is p cubed and
00:37:36.460 --> 00:37:38.340
so forth up to k.
00:37:38.340 --> 00:37:42.280
Because we're looking at the
probability that the first k
00:37:42.280 --> 00:37:45.300
outputs are 1, so the
probability of
00:37:45.300 --> 00:37:47.630
that is p to the k.
00:37:47.630 --> 00:37:48.940
That's this term.
00:37:48.940 --> 00:37:53.690
And the probability that the
rest of them are all 0's is q
00:37:53.690 --> 00:37:54.940
to the n minus k.
00:37:58.210 --> 00:38:01.000
And this is sometimes confusing
to you because you
00:38:01.000 --> 00:38:05.350
often think that this is going
to be maximized when k is
00:38:05.350 --> 00:38:06.610
equal to p times n.
00:38:06.610 --> 00:38:09.880
You have some strange view of
the law of large numbers.
00:38:09.880 --> 00:38:12.020
Well no, this quantity--
00:38:12.020 --> 00:38:16.320
if p is less than 1/2, it's
going to be largest at k
00:38:16.320 --> 00:38:17.280
equals zero.
00:38:17.280 --> 00:38:21.970
The most probable single outcome
from n tosses of a
00:38:21.970 --> 00:38:26.830
coin, and it's a biased
coin, it comes out 0's
00:38:26.830 --> 00:38:28.080
more often than 1's.
00:38:31.400 --> 00:38:34.250
0's are more probable
than 1's.
00:38:34.250 --> 00:38:38.580
The most probable output
is all 0's.
00:38:38.580 --> 00:38:42.130
Very improbable, but that's the
most probable of all these
00:38:42.130 --> 00:38:43.570
improbable things.
00:38:43.570 --> 00:38:50.830
But as you probably know
already, there are n choose k
00:38:50.830 --> 00:38:56.470
different n tuples, all of which
have k 1's in them and n
00:38:56.470 --> 00:38:58.340
minus k 0's.
00:38:58.340 --> 00:39:01.650
If you don't know that,
I didn't even
00:39:01.650 --> 00:39:02.900
put that in the text.
00:39:02.900 --> 00:39:04.430
I put most things there.
00:39:04.430 --> 00:39:08.500
This is one of those basic
combinatorial facts.
00:39:08.500 --> 00:39:10.620
Look it up in Wikipedia.
00:39:10.620 --> 00:39:13.290
You'll probably get a cleaner
explanation of it there than
00:39:13.290 --> 00:39:14.000
anywhere else.
00:39:14.000 --> 00:39:17.790
But look it up in any elementary
probability book or
00:39:17.790 --> 00:39:20.780
in any elementary combinatorics
book.
00:39:20.780 --> 00:39:23.440
I'm sure that all of you
have seen this stuff.
00:39:23.440 --> 00:39:29.020
So when you put this together,
the probability that the sum
00:39:29.020 --> 00:39:33.250
of n random variables, all
of which are binary, the
00:39:33.250 --> 00:39:38.730
probability of getting k 1's is
n choose k times p to the k
00:39:38.730 --> 00:39:40.720
times q to the n minus k.
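A short sketch of the two quantities just described, assuming n = 20 and p = 1/4 purely as example values. The function names are illustrative. It separates the probability of one particular n-tuple from the probability of getting k 1's in any order.

```python
from math import comb

def tuple_probability(n, k, p):
    """Probability of one specific n-tuple containing k 1's and n-k 0's."""
    return p**k * (1 - p) ** (n - k)

def binomial_pmf(n, k, p):
    """P(S_n = k): the number of such n-tuples times the probability of each."""
    return comb(n, k) * tuple_probability(n, k, p)

n, p = 20, 0.25
best_tuple_k = max(range(n + 1), key=lambda k: tuple_probability(n, k, p))
best_count_k = max(range(n + 1), key=lambda k: binomial_pmf(n, k, p))
print(best_tuple_k, best_count_k)  # most likely single tuple vs most likely count
```

With p less than 1/2 the single most probable n-tuple is all 0's, while the most probable count sits at pn, because n choose k multiplies in the number of n-tuples that share that count.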
00:39:40.720 --> 00:39:44.390
Now you look at that.
00:39:44.390 --> 00:39:49.680
And if k is 1,000 and if n is
1,000, I mean, your eyes
00:39:49.680 --> 00:39:52.050
boggle because you can't
imagine what that
00:39:52.050 --> 00:39:53.800
number looks like.
00:39:53.800 --> 00:39:56.860
So we want to find out
what it looks like.
00:39:56.860 --> 00:39:58.460
And here's a tricky
way of doing it.
00:40:01.730 --> 00:40:05.550
What we want to do is to see
how this varies with k.
00:40:05.550 --> 00:40:08.820
And in particular, we want to
see how it varies with k when
00:40:08.820 --> 00:40:15.130
n is very large and when k is
relatively close to p times n.
00:40:15.130 --> 00:40:18.070
So what we're going to do
is take the ratio of the
00:40:18.070 --> 00:40:23.470
probability of k plus 1 1's
to the probability of k 1's.
00:40:23.470 --> 00:40:24.220
And what is that?
00:40:24.220 --> 00:40:26.780
I've written it out.
00:40:26.780 --> 00:40:30.730
n choose k, n choose
k plus 1--
00:40:30.730 --> 00:40:32.400
which is this term here--
00:40:32.400 --> 00:40:37.710
is n factorial divided by k plus
1 factorial times n minus
00:40:37.710 --> 00:40:40.420
k minus 1 factorial.
00:40:40.420 --> 00:40:42.220
This term here--
00:40:42.220 --> 00:40:46.050
you put the n factorial down
on the bottom, k factorial
00:40:46.050 --> 00:40:51.550
times n minus k quantity
factorial.
00:40:51.550 --> 00:40:55.770
And then you take the
p's and the q's.
00:40:55.770 --> 00:40:59.540
For this term here you have p
to the k plus 1 q to the n
00:40:59.540 --> 00:41:01.160
minus k minus 1.
00:41:01.160 --> 00:41:04.620
And for this one here you
have p to the k times q
00:41:04.620 --> 00:41:06.190
to the n minus k.
00:41:06.190 --> 00:41:10.120
All that stuff cancels out,
which is really cute.
00:41:10.120 --> 00:41:13.050
When you look at this term you
have p to the k plus 1
00:41:13.050 --> 00:41:13.890
over p to the k.
00:41:13.890 --> 00:41:15.400
That's just p.
00:41:15.400 --> 00:41:19.520
And here you have q to the
n minus k minus 1 over q
00:41:19.520 --> 00:41:21.090
to the n minus k.
00:41:21.090 --> 00:41:23.520
That's just q in the
denominator.
00:41:23.520 --> 00:41:26.960
So this goes into p over q.
00:41:26.960 --> 00:41:30.790
This quantity here is almost
as simple. n factorial over n
00:41:30.790 --> 00:41:32.680
factorial is 1.
00:41:32.680 --> 00:41:37.700
k plus 1 factorial divided by
k factorial is k plus 1.
00:41:37.700 --> 00:41:39.970
That's this term here.
00:41:39.970 --> 00:41:46.270
And n minus k factorial over n minus
k minus 1 factorial is n minus k.
00:41:46.270 --> 00:41:48.930
So this ratio here
is just that very
00:41:48.930 --> 00:41:52.289
simple expression there.
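The cancellation described above, written out as a sketch. The notation $p_{S_n}(k)$ for the probability of k 1's is an assumption, since the slide itself is not in the transcript.

```latex
\begin{align*}
\frac{p_{S_n}(k+1)}{p_{S_n}(k)}
  &= \frac{\dfrac{n!}{(k+1)!\,(n-k-1)!}\; p^{k+1} q^{\,n-k-1}}
          {\dfrac{n!}{k!\,(n-k)!}\; p^{k} q^{\,n-k}} \\
  &= \frac{k!}{(k+1)!}\cdot\frac{(n-k)!}{(n-k-1)!}\cdot\frac{p}{q}
   = \frac{n-k}{k+1}\cdot\frac{p}{q}.
\end{align*}
```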
00:41:52.289 --> 00:42:03.270
Now this ratio is strictly
decreasing in k.
00:42:03.270 --> 00:42:04.160
How do I see that?
00:42:04.160 --> 00:42:07.570
Well, as k gets bigger and
bigger, what happens?
00:42:07.570 --> 00:42:11.680
As k gets bigger, the numerator
gets larger.
00:42:11.680 --> 00:42:14.680
The denominator--
00:42:14.680 --> 00:42:19.560
excuse me, as k gets larger,
the numerator gets smaller.
00:42:19.560 --> 00:42:22.180
The denominator gets larger.
00:42:22.180 --> 00:42:28.030
So the ratio of the
two gets smaller.
00:42:28.030 --> 00:42:32.710
So this whole quantity here,
as k gets larger and larger
00:42:32.710 --> 00:42:36.810
for fixed n, is just decreasing
and decreasing and
00:42:36.810 --> 00:42:38.810
decreasing.
00:42:38.810 --> 00:42:44.670
Now let's look a little bit at
where this crosses 1, if it
00:42:44.670 --> 00:42:47.090
does cross 1.
00:42:47.090 --> 00:42:53.280
And what I claim here is that
when k plus 1 is less than or
00:42:53.280 --> 00:43:00.230
equal to pn, what
happens here?
00:43:00.230 --> 00:43:06.140
Well if I can do this--
00:43:06.140 --> 00:43:09.080
I usually get confused
doing these things.
00:43:09.080 --> 00:43:15.600
But if k plus 1 is less than
or equal to pn, this is the
00:43:15.600 --> 00:43:20.730
last of these choices here, then
k is also less than or
00:43:20.730 --> 00:43:22.530
equal to pn.
00:43:22.530 --> 00:43:27.320
And therefore, n minus
k is greater than--
00:43:27.320 --> 00:43:31.200
in fact, k is strictly
less than pn.
00:43:31.200 --> 00:43:40.970
And n minus k is strictly
greater than n minus pn.
00:43:40.970 --> 00:43:45.755
And since q is 1 minus
p, this is n times q.
00:43:45.755 --> 00:43:49.490
OK, so you take this
divided by this.
00:43:49.490 --> 00:43:52.130
And you take this
divided by this.
00:43:52.130 --> 00:43:56.920
And sure enough, this ratio
here is greater than 1 any
00:43:56.920 --> 00:44:03.710
time you have a k which is
smaller than what you think k
00:44:03.710 --> 00:44:07.876
ought to be, which
is p times n.
00:44:07.876 --> 00:44:10.060
OK, so you have these three
quantities here.
00:44:10.060 --> 00:44:12.530
Let me go on to the
next slide.
00:44:15.490 --> 00:44:21.620
Since these ratios are less
than 1 when k is large,
00:44:21.620 --> 00:44:27.790
approximately equal to 1 when k
is close to pn, and greater
00:44:27.790 --> 00:44:31.890
than 1 when k is smaller than
pn, if I plot these things,
00:44:31.890 --> 00:44:39.390
what I find is that as k is
increasing, getting closer and
00:44:39.390 --> 00:44:43.840
closer to pn, it's getting
larger and larger.
00:44:43.840 --> 00:44:49.560
As k is increasing further,
getting larger than pn, this
00:44:49.560 --> 00:44:52.470
ratio says that these
things have to be
00:44:52.470 --> 00:44:56.670
getting smaller and smaller.
00:44:56.670 --> 00:45:00.140
So just from looking at this, we
know that these terms have
00:45:00.140 --> 00:45:04.820
to be increasing for terms less
than pn and have to be
00:45:04.820 --> 00:45:08.360
decreasing for terms
greater than pn.
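A quick exact check of that argument, assuming n = 20 and p = 1/4 so that pn = 5 is an integer. The variable names are illustrative. It uses the ratio (n - k)/(k + 1) times p/q and confirms where the ratio sits relative to 1.

```python
from fractions import Fraction

def ratio(n, k, p):
    """p(k+1)/p(k) for the binomial PMF, using (n-k)/(k+1) * p/q."""
    return Fraction(n - k, k + 1) * p / (1 - p)

n, p = 20, Fraction(1, 4)      # chosen so that pn = 5 is an integer
pn = int(n * p)
ratios = [ratio(n, k, p) for k in range(n)]

strictly_decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))
rising_below_pn = all(ratios[k] > 1 for k in range(n) if k + 1 <= pn)
falling_from_pn = all(ratios[k] < 1 for k in range(n) if k >= pn)
print(strictly_decreasing, rising_below_pn, falling_from_pn)
```

All three checks hold for this example, so the terms rise up to k = pn and fall after it.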
00:45:08.360 --> 00:45:10.730
So this is a bell-shaped
curve.
00:45:10.730 --> 00:45:13.220
We've already seen that.
00:45:13.220 --> 00:45:16.790
It might not be quite clear that
it's bell-shaped in the
00:45:16.790 --> 00:45:21.530
sense that it kind of tapers
off as you get smaller.
00:45:21.530 --> 00:45:24.860
Because these ratios are getting
bigger and bigger, as
00:45:24.860 --> 00:45:28.680
k gets bigger and bigger, the
ratio of this term to this
00:45:28.680 --> 00:45:30.550
term gets bigger and bigger.
00:45:30.550 --> 00:45:32.750
So what's happening there?
00:45:32.750 --> 00:45:36.060
As this ratio gets bigger and
bigger, these terms get
00:45:36.060 --> 00:45:37.735
smaller and smaller.
00:45:37.735 --> 00:45:40.280
But as these terms get smaller
and smaller, they're getting
00:45:40.280 --> 00:45:42.210
closer and closer to 0.
00:45:42.210 --> 00:45:48.180
So even though they're going to
0 like a bat out of hell,
00:45:48.180 --> 00:45:50.260
they still can't get
any smaller than 0.
00:45:50.260 --> 00:45:55.330
So they just taper down and
start to get close to 0.
00:45:55.330 --> 00:46:03.910
So that is roughly how
this sum of binary
00:46:03.910 --> 00:46:06.240
random variables behaves.
00:46:06.240 --> 00:46:13.950
OK, so let's go on and show that
the central limit theorem
00:46:13.950 --> 00:46:17.670
holds for the Bernoulli
process.
00:46:17.670 --> 00:46:19.550
And that's just as
easy really.
00:46:19.550 --> 00:46:23.230
There's nothing more difficult
about it that we
00:46:23.230 --> 00:46:25.430
have to deal with.
00:46:25.430 --> 00:46:30.410
This ratio, as we've said, is
equal to n minus k over k plus
00:46:30.410 --> 00:46:32.850
1 times p over q.
00:46:32.850 --> 00:46:35.310
What we're interested
in here--
00:46:35.310 --> 00:46:40.730
I mean, we've already seen from
the last slide that the
00:46:40.730 --> 00:46:44.270
interesting thing here
is the big terms.
00:46:44.270 --> 00:46:47.880
And the big terms are the terms
which are close to pn.
00:46:47.880 --> 00:46:51.780
So what we'd like to do is look
at values of k which are
00:46:51.780 --> 00:46:53.120
close to pn.
00:46:53.120 --> 00:46:58.580
What I've done here is to plot
this as k minus the integer
00:46:58.580 --> 00:46:59.820
value of pn.
00:46:59.820 --> 00:47:01.270
So we get integers.
00:47:01.270 --> 00:47:05.030
What I'm going to assume now,
because this gets a little
00:47:05.030 --> 00:47:08.400
hairy if I don't do that, I'm
going to assume that pn is
00:47:08.400 --> 00:47:10.540
equal to an integer.
00:47:10.540 --> 00:47:14.000
It doesn't make a whole lot of
difference to the argument.
00:47:14.000 --> 00:47:17.040
It just leaves out a lot
of terms that you don't
00:47:17.040 --> 00:47:18.300
have to play with.
00:47:18.300 --> 00:47:22.050
So we'll assume that
pn is an integer.
00:47:22.050 --> 00:47:24.360
With this example, we're
looking at where
00:47:24.360 --> 00:47:26.310
p is equal to 1/4.
00:47:26.310 --> 00:47:30.890
Pn is going to be an integer
whenever n is a multiple of 4.
00:47:30.890 --> 00:47:33.310
So things are fine then.
00:47:33.310 --> 00:47:37.360
If I try to make p equal to 1
over pi, then that doesn't
00:47:37.360 --> 00:47:38.170
work so well.
00:47:38.170 --> 00:47:45.800
But after all, no reason to
choose p in such a strange way.
00:47:45.800 --> 00:47:50.520
OK, so I'm going to look at this
for a fixed value of n.
00:47:50.520 --> 00:47:56.270
I'm going to look at it as k
increases for k less than pn.
00:47:56.270 --> 00:47:58.800
I'm going to look at it
as it decreases for k
00:47:58.800 --> 00:48:00.360
greater than pn.
00:48:00.360 --> 00:48:08.415
And I'm going to define k to
be equal to i plus pn.
00:48:08.415 --> 00:48:11.070
So I'm going to put the whole
thing in terms of
00:48:11.070 --> 00:48:15.470
i instead of k.
00:48:15.470 --> 00:48:25.030
OK, so when I substitute i
equals k minus pn for k here,
00:48:25.030 --> 00:48:27.620
what I'm going to get
is this term.
00:48:27.620 --> 00:48:30.700
It's going to be the probability
of pn plus i plus
00:48:30.700 --> 00:48:33.960
1 over p of--
00:48:37.560 --> 00:48:40.530
fortunately when you're using
TeX, you can distinguish
00:48:40.530 --> 00:48:42.190
different kinds of p's.
00:48:42.190 --> 00:48:44.490
I have too many p's
in this equation.
00:48:44.490 --> 00:48:47.210
This is the probability
mass function.
00:48:47.210 --> 00:48:52.180
This is just my probability
of a 1.
00:48:52.180 --> 00:48:54.730
And p's are things that you
like to use a lot in
00:48:54.730 --> 00:48:55.470
probability.
00:48:55.470 --> 00:48:59.440
So it's nice to have that
separation there.
00:48:59.440 --> 00:49:04.900
OK, when I take this and I
substitute it into that with k
00:49:04.900 --> 00:49:10.770
equal to pn plus i, what I get is
n minus pn minus i.
00:49:10.770 --> 00:49:17.570
That's n minus k over pn plus
i plus 1 times p over q.
00:49:17.570 --> 00:49:24.750
Fair enough, OK, all I'm doing
is replacing k with pn plus i
00:49:24.750 --> 00:49:29.080
because I want i to be very
close to 0 in this argument.
00:49:29.080 --> 00:49:32.440
Because I've already seen that
these terms are only
00:49:32.440 --> 00:49:35.370
significant when i is relatively
close to 0.
00:49:35.370 --> 00:49:38.600
Because when I get away from 0,
these terms are going down
00:49:38.600 --> 00:49:41.500
very, very fast.
00:49:41.500 --> 00:49:45.260
So when I do that,
what do I get?
00:49:45.260 --> 00:49:51.040
I get n minus pn
is equal to qn.
00:49:51.040 --> 00:49:51.820
That's nice.
00:49:51.820 --> 00:49:53.850
So I have an nq here.
00:49:53.850 --> 00:49:55.270
I have a q here.
00:49:55.270 --> 00:49:56.870
I have a pn here.
00:49:56.870 --> 00:49:58.230
I have a p here.
00:49:58.230 --> 00:49:59.860
I'm going to multiply
this p by n.
00:49:59.860 --> 00:50:02.660
I'm going to multiply
this q by n.
00:50:02.660 --> 00:50:05.730
And I'm going to take a ratio
of this pair of things.
00:50:05.730 --> 00:50:11.170
So when I take this ratio,
I'm going to get nq over
00:50:11.170 --> 00:50:12.420
nq, which is 1.
00:50:19.590 --> 00:50:26.030
And the other terms there
become minus i over nq.
00:50:26.030 --> 00:50:33.460
In the denominator, I'm going to
divide pn plus i plus 1 by
00:50:33.460 --> 00:50:40.320
p, by pn, which gives me 1 plus
i plus 1 divided by np.
00:50:40.320 --> 00:50:45.300
So I get two terms, ratio of
two terms, which are both
00:50:45.300 --> 00:50:49.170
close to 1 at this point and
which are getting closer and
00:50:49.170 --> 00:50:54.910
closer to 1 as n gets
larger and larger.
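The ratio just described can be checked numerically. The following is a minimal sketch, assuming a binomial PMF for sn with p times n an integer; the function names and the values n = 1000, p = 0.3 are illustrative choices, not from the lecture.

```python
from math import lgamma, log, exp

def log_binom_pmf(n, k, p):
    """Log of the binomial PMF, via lgamma to avoid overflow for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def exact_step_ratio(n, p, i):
    """PMF at pn + i + 1 divided by PMF at pn + i, straight from the PMF."""
    pn = round(p * n)  # assumption: p * n is an integer
    return exp(log_binom_pmf(n, pn + i + 1, p) - log_binom_pmf(n, pn + i, p))

def formula_step_ratio(n, p, i):
    """The same ratio written as (1 - i/(nq)) / (1 + (i + 1)/(np))."""
    q = 1 - p
    return (1 - i / (n * q)) / (1 + (i + 1) / (n * p))

n, p = 1000, 0.3  # illustrative values
for i in (0, 5, 20):
    print(i, exact_step_ratio(n, p, i), formula_step_ratio(n, p, i))
```

Both columns should agree up to floating-point error, and both move toward 1 as n grows with i held fixed.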
00:50:54.910 --> 00:50:59.580
Now let's take the logarithm
of this.
00:50:59.580 --> 00:51:02.670
Let me justify taking the
logarithm of it in two
00:51:02.670 --> 00:51:04.160
different ways.
00:51:04.160 --> 00:51:09.360
One of them is that what
we're trying to prove--
00:51:09.360 --> 00:51:17.130
and I'm playing the game
that all of you
00:51:17.130 --> 00:51:18.560
always play in quizzes.
00:51:18.560 --> 00:51:20.720
When you're trying to prove
something, what do you do?
00:51:20.720 --> 00:51:21.980
You start at the beginning.
00:51:21.980 --> 00:51:22.870
You work this way.
00:51:22.870 --> 00:51:24.150
You start at the end.
00:51:24.150 --> 00:51:25.620
You work back this way.
00:51:25.620 --> 00:51:29.090
And you hope, at some point, the
two things come together.
00:51:29.090 --> 00:51:31.530
If they don't come together,
you get to this point.
00:51:31.530 --> 00:51:32.810
And you say, obviously.
00:51:32.810 --> 00:51:35.440
And then you go to that point
which leads to-- yeah.
00:51:35.440 --> 00:51:37.590
[LAUGHTER]
00:51:37.590 --> 00:51:42.510
PROFESSOR: OK, so I'm doing
the same thing here.
00:51:42.510 --> 00:51:46.620
This probability that we're
trying to calculate--
00:51:46.620 --> 00:51:50.860
well, I've listed it
here in terms of--
00:51:50.860 --> 00:51:56.600
I have put it here in terms of
a distribution function.
00:51:56.600 --> 00:52:01.170
I will do just as well if I can
do it in terms of a PMF.
00:52:01.170 --> 00:52:05.270
And what I'd like to show is
that the PMF of sn minus nX
00:52:05.270 --> 00:52:09.620
bar over square root of n
times sigma is somehow
00:52:09.620 --> 00:52:13.740
proportional to e to the
minus x squared over 2.
00:52:13.740 --> 00:52:18.690
Now if I want to do that, it
will be all right if I can
00:52:18.690 --> 00:52:21.700
take the logarithm of this
term and show that
00:52:21.700 --> 00:52:27.660
it's a square in x.
00:52:27.660 --> 00:52:32.240
And if I want to show that this
logarithm is a square in x,
00:52:32.240 --> 00:52:35.910
and I'm looking at the
differentials at each time,
00:52:35.910 --> 00:52:40.570
what are the differentials going
to be if the sum of the
00:52:40.570 --> 00:52:44.510
differentials is quadratic?
00:52:44.510 --> 00:52:48.300
If the sum of these
differentials is quadratic,
00:52:48.300 --> 00:52:50.340
then the individual terms
have to be linear.
00:52:53.510 --> 00:52:57.490
If I take a bunch of linear
terms, if I add up 1 plus 2
00:52:57.490 --> 00:53:03.900
plus 3 plus 4 plus 5, you've
all done this I'm sure.
00:53:10.080 --> 00:53:17.960
And down here you write n plus
n minus 1 plus, and so on, plus 1.
00:53:17.960 --> 00:53:19.240
And what do you get?
00:53:19.240 --> 00:53:25.246
You get n times n
plus 1 over 2.
00:53:25.246 --> 00:53:28.170
You can also approximate
that by integrating.
00:53:28.170 --> 00:53:31.560
Whenever you add up a sum
of linear terms, you
00:53:31.560 --> 00:53:33.690
get a square term.
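The pairing trick described above, writing the sum forward and backward, can be sketched in a few lines. This is only an illustration; the helper name `sum_first` is made up for the example.

```python
def sum_first(n):
    """Add up the linear terms 1 + 2 + ... + n directly."""
    return sum(range(1, n + 1))

for n in (5, 10, 100):
    # Closed form from pairing k with n + 1 - k: n * (n + 1) / 2
    print(n, sum_first(n), n * (n + 1) // 2)
```

The closed form grows like n squared over 2, which is the sense in which a sum of linear terms gives a square term.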
00:53:33.690 --> 00:53:35.660
And I'm just curious.
00:53:35.660 --> 00:53:39.570
How many of you have
seen that?
00:53:39.570 --> 00:53:40.920
Good, OK.
00:53:40.920 --> 00:53:42.630
Well, it's only about 1/2.
00:53:42.630 --> 00:53:47.330
So it's something you've
probably seen in high school.
00:53:47.330 --> 00:53:48.730
Or you haven't seen
it at all.
00:53:53.060 --> 00:53:56.450
So let's go on with
this argument.
00:54:02.250 --> 00:54:05.200
OK, so I'm going to take
the logarithm of
00:54:05.200 --> 00:54:08.120
this expression here.
00:54:08.120 --> 00:54:09.570
I'm going to take
the logarithm.
00:54:09.570 --> 00:54:14.070
I'm going to have the logarithm
of 1 minus i over nq
00:54:14.070 --> 00:54:20.690
minus the logarithm of 1
plus i plus 1 over np.
00:54:20.690 --> 00:54:26.410
And I'm going to use what I
think of as one of the most
00:54:26.410 --> 00:54:31.750
useful inequalities that you
will ever see, which is the
00:54:31.750 --> 00:54:36.130
natural log of 1 plus x.
00:54:36.130 --> 00:54:44.450
If we use a power expansion, we
get x minus x squared over
00:54:44.450 --> 00:54:50.160
2 plus x cubed over 3 minus--
00:54:50.160 --> 00:54:52.100
it's an alternating series.
00:54:52.100 --> 00:54:56.630
If x is negative, this
term is negative.
00:54:56.630 --> 00:54:57.820
This term is negative.
00:54:57.820 --> 00:54:59.650
This term is negative.
00:54:59.650 --> 00:55:04.060
And all this makes sense
because if I draw this
00:55:04.060 --> 00:55:10.290
function here, logarithm of
1 plus x at x equals 0.
00:55:10.290 --> 00:55:12.680
This is equal to 0.
00:55:12.680 --> 00:55:16.420
It comes up with a slope of 1.
00:55:16.420 --> 00:55:18.620
And it levels off.
00:55:18.620 --> 00:55:23.870
And here it's going
down very fast.
00:55:23.870 --> 00:55:28.600
So these terms, you get
these negative terms.
00:55:28.600 --> 00:55:32.890
And on the positive side, you
get these alternating terms.
00:55:32.890 --> 00:55:35.810
So this goes up slowly,
down fast.
00:55:35.810 --> 00:55:39.290
The slope here is x, which
is this term here.
00:55:39.290 --> 00:55:45.210
The curvature here gives you
the minus x squared over 2.
00:55:45.210 --> 00:55:49.380
And the approximation, which is
very useful here, is that
00:55:49.380 --> 00:55:56.310
the logarithm of 1 plus x, when
x is small, is equal to x
00:55:56.310 --> 00:55:59.910
plus what we call
little o of x.
00:55:59.910 --> 00:56:03.890
Namely something which goes
to 0 faster than x
00:56:03.890 --> 00:56:05.930
as x goes to 0.
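A quick numerical look at this approximation, as a sketch: the remainder, log of 1 plus x minus x, divided by x should shrink toward 0 as x does. The helper name `remainder_ratio` is an assumption for illustration.

```python
from math import log1p

def remainder_ratio(x):
    """(ln(1 + x) - x) / x, which should go to 0 as x goes to 0."""
    return (log1p(x) - x) / x

for x in (0.1, 0.01, 0.001, -0.001):
    print(x, log1p(x), remainder_ratio(x))
```

For small x the ratio is close to minus x over 2, reflecting the next term of the power series.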
00:56:05.930 --> 00:56:09.630
OK, all of you know
that, right?
00:56:09.630 --> 00:56:12.090
Well, if you don't know
it, now you know it.
00:56:12.090 --> 00:56:12.940
It's useful.
00:56:12.940 --> 00:56:16.040
You will use it again
and again.
00:56:16.040 --> 00:56:18.650
OK, so what we're
going to do is--
00:56:21.590 --> 00:56:26.160
well, that's pretty good.
00:56:26.160 --> 00:56:27.410
Where did I get to that point?
00:56:31.910 --> 00:56:34.630
I skipped something.
00:56:34.630 --> 00:56:42.240
What I have shown is that this
increment in the probability,
00:56:42.240 --> 00:56:46.190
in the PMF for s sub n, namely
the increment as you increase
00:56:46.190 --> 00:56:50.810
i by 1, is linear in i.
00:56:50.810 --> 00:56:54.940
And in fact, the logarithm of
this increment is linear in i.
00:56:54.940 --> 00:56:58.740
So therefore, by what I was
saying before, the logarithm
00:56:58.740 --> 00:57:02.760
of the actual terms should be
rather than linear in i, they
00:57:02.760 --> 00:57:04.680
should be quadratic in i.
00:57:04.680 --> 00:57:06.870
So that's what I'm trying
to do here.
00:57:06.870 --> 00:57:09.450
I just missed this
whole term here.
00:57:09.450 --> 00:57:14.490
What I'm interested in now is
getting a handle on the probability
of sn at pn plus
00:57:14.490 --> 00:57:17.960
some larger value,
j, divided by the
00:57:17.960 --> 00:57:21.000
probability of sn for pn.
00:57:21.000 --> 00:57:23.340
What am I trying to do here?
00:57:23.340 --> 00:57:27.390
I should've said what
I was trying to do.
00:57:27.390 --> 00:57:29.570
This term is just one term.
00:57:29.570 --> 00:57:31.370
It's fixed.
00:57:31.370 --> 00:57:35.120
Be nice if we knew what it was,
we don't at the moment.
00:57:35.120 --> 00:57:38.370
But I'm trying to express
everything else in terms of
00:57:38.370 --> 00:57:40.890
that one unknown term.
00:57:40.890 --> 00:57:44.580
And what I'm trying to do is to
show that the logarithm of
00:57:44.580 --> 00:57:50.970
this everything else is going
to be quadratic in j.
00:57:50.970 --> 00:57:54.320
And if I can do that, then I
only have one undetermined
00:57:54.320 --> 00:57:56.180
factor in this whole sum.
00:57:56.180 --> 00:57:59.345
And I can use the fact that the PMF
sums to 1 to solve the
00:57:59.345 --> 00:58:01.180
whole problem.
00:58:01.180 --> 00:58:07.650
So I'm going to express the
probability that we get pn
00:58:07.650 --> 00:58:13.070
plus j 1's divided by the
probability that we get pn 1's.
00:58:13.070 --> 00:58:24.440
It's the product of the probability
ratios, pn plus i plus 1
00:58:24.440 --> 00:58:26.250
over pn plus i.
00:58:26.250 --> 00:58:27.270
And we increase i.
00:58:27.270 --> 00:58:30.060
We start out at i equals 0.
00:58:30.060 --> 00:58:34.740
And then the denominator is
probability of pn plus 0,
00:58:34.740 --> 00:58:36.900
which is this term.
00:58:36.900 --> 00:58:42.110
And each time we increase i by
1, this term cancels out with
00:58:42.110 --> 00:58:45.320
the previous or the next
value of this term.
00:58:45.320 --> 00:58:51.300
And when I get all done, all I
have is this expression here.
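The cancellation described above can be sketched numerically: multiplying the step ratios for i from 0 to j minus 1 should reproduce the PMF at pn plus j divided by the PMF at pn. This assumes a binomial PMF with p times n an integer; the function names and parameter values are illustrative.

```python
from math import lgamma, log, exp

def log_binom_pmf(n, k, p):
    """Log of the binomial PMF, via lgamma to avoid overflow for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def telescoped_ratio(n, p, j):
    """Product over i = 0 .. j-1 of (1 - i/(nq)) / (1 + (i + 1)/(np))."""
    q = 1 - p
    product = 1.0
    for i in range(j):
        product *= (1 - i / (n * q)) / (1 + (i + 1) / (n * p))
    return product

def direct_ratio(n, p, j):
    """PMF at pn + j divided by PMF at pn, computed directly."""
    pn = round(p * n)  # assumption: p * n is an integer
    return exp(log_binom_pmf(n, pn + j, p) - log_binom_pmf(n, pn, p))

print(telescoped_ratio(1000, 0.3, 25), direct_ratio(1000, 0.3, 25))
```

Every intermediate numerator cancels the next denominator, so only the two end terms survive.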
00:58:51.300 --> 00:58:53.854
Everybody see that?
00:58:53.854 --> 00:58:56.130
OK, I see a lot of--
00:58:56.130 --> 00:58:59.470
if you don't see it,
just look at it.
00:58:59.470 --> 00:59:01.810
And you'll see that this--
00:59:01.810 --> 00:59:03.660
I think you'll see
that this works.
00:59:03.660 --> 00:59:07.700
OK, so now I take this
expression here.
00:59:07.700 --> 00:59:11.210
This logarithm is this
linear term here.
00:59:11.210 --> 00:59:12.680
What do I want to do?
00:59:12.680 --> 00:59:18.210
I want to sum i from
0 up to j minus 1.
00:59:18.210 --> 00:59:22.270
What do I get when I sum
i from 0 to j minus 1?
00:59:22.270 --> 00:59:24.330
I get this expression here.
00:59:24.330 --> 00:59:29.870
I get j times j minus
1 divided by 2n.
00:59:29.870 --> 00:59:31.120
Oh, I was--
00:59:33.420 --> 00:59:35.180
I skipped something.
00:59:35.180 --> 00:59:37.380
Let's go back a little bit.
00:59:37.380 --> 00:59:40.105
Because it'll look like
it was a typo.
00:59:43.800 --> 00:59:48.100
When I took this logarithm and
I applied this approximation
00:59:48.100 --> 00:59:55.160
to it, I got minus i over nq
minus i over np minus 1 over
00:59:55.160 --> 00:59:59.460
np plus square terms in n.
00:59:59.460 --> 01:00:05.020
When I take minus i over nq minus i
over np, I can combine those
01:00:05.020 --> 01:00:06.680
two things together.
01:00:06.680 --> 01:00:15.100
I can take minus ip over npq
minus iq over npq.
01:00:15.100 --> 01:00:17.430
And q plus p is equal to 1.
01:00:17.430 --> 01:00:20.310
So the numerator
all goes away.
01:00:20.310 --> 01:00:27.230
And these two terms combine to
be minus i over n times p
01:00:27.230 --> 01:00:29.720
times q.
01:00:29.720 --> 01:00:33.490
And I just have this one last
little term left here.
01:00:33.490 --> 01:00:35.190
Don't know what to
do with that.
01:00:35.190 --> 01:00:37.260
But then I add up
all these terms.
01:00:37.260 --> 01:00:39.690
This one is the one that
leads to j times j
01:00:39.690 --> 01:00:42.710
minus 1 over 2 npq.
01:00:42.710 --> 01:00:47.300
This one is the one that
leads to j over np.
01:00:47.300 --> 01:00:51.580
And I just neglect this term,
which is negligible compared
01:00:51.580 --> 01:00:52.920
to j squared.
01:00:52.920 --> 01:00:55.685
I get minus j squared
over 2 npq.
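Written out, the steps above look roughly as follows. This is a sketch for j greater than 0, using p_{S_n} as assumed notation for the PMF of sn and keeping only first-order terms from the logarithm approximation.

```latex
\begin{align*}
\ln\frac{p_{S_n}(pn+i+1)}{p_{S_n}(pn+i)}
  &= \ln\Bigl(1-\frac{i}{nq}\Bigr) - \ln\Bigl(1+\frac{i+1}{np}\Bigr) \\
  &\approx -\frac{i}{nq} - \frac{i}{np} - \frac{1}{np}
   = -\frac{i}{npq} - \frac{1}{np}, \\
\ln\frac{p_{S_n}(pn+j)}{p_{S_n}(pn)}
  &\approx \sum_{i=0}^{j-1}\Bigl(-\frac{i}{npq} - \frac{1}{np}\Bigr)
   = -\frac{j(j-1)}{2npq} - \frac{j}{np} \\
  &= -\frac{j^{2}}{2npq} + \frac{j\,(p-q)}{2npq}
   \approx -\frac{j^{2}}{2npq}.
\end{align*}
```

The leftover term is linear in j and vanishes when p equals q; for j on the order of the square root of n it is small compared with the quadratic term.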
01:00:59.440 --> 01:01:04.430
Let me come back later to say
why I'm so eager to neglect
01:01:04.430 --> 01:01:07.240
this term except that that's
what I have to do if I want to
01:01:07.240 --> 01:01:08.700
get the right answer.
01:01:08.700 --> 01:01:13.130
OK, so we'll see why that has
to be negligible in just a
01:01:13.130 --> 01:01:14.110
little bit.
01:01:14.110 --> 01:01:19.630
But now this logarithm is coming
out to be exactly the
01:01:19.630 --> 01:01:23.160
term that I want it to be.
01:01:23.160 --> 01:01:29.950
So finally, the logarithm of
the PMF of the sum of these random
01:01:29.950 --> 01:01:34.790
variables at pn plus j, namely j
off the mean, divided by that
01:01:34.790 --> 01:01:40.420
of pn is equal to minus j
squared over 2 npq plus some
01:01:40.420 --> 01:01:42.160
negligible terms.
01:01:42.160 --> 01:01:45.120
And this says when I
exponentiate things, that the
01:01:45.120 --> 01:01:51.300
probability that sn is j off
the mean is approximately
01:01:51.300 --> 01:01:54.720
equal to this term, the
probability that it's right at
01:01:54.720 --> 01:01:58.965
the mean, times e to the minus
j squared over 2 npq.
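A numerical sketch of this approximation: compare the exact ratio of the PMF at pn plus j to the PMF at pn against e to the minus j squared over 2 npq. The binomial setup, the function names, and the values n = 10000, p = 0.3 are assumptions chosen for illustration.

```python
from math import lgamma, log, exp

def log_binom_pmf(n, k, p):
    """Log of the binomial PMF, via lgamma to avoid overflow for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def exact_ratio(n, p, j):
    """PMF at pn + j divided by PMF at pn."""
    pn = round(p * n)  # assumption: p * n is an integer
    return exp(log_binom_pmf(n, pn + j, p) - log_binom_pmf(n, pn, p))

def gaussian_ratio(n, p, j):
    """The Gaussian-shaped approximation exp(-j^2 / (2npq))."""
    return exp(-j * j / (2 * n * p * (1 - p)))

n, p = 10000, 0.3  # standard deviation sqrt(npq) is about 46
for j in (-46, -10, 0, 10, 46):
    print(j, exact_ratio(n, p, j), gaussian_ratio(n, p, j))
```

With these values the two columns differ by well under a few percent out to about one standard deviation on either side of the mean.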
01:02:01.490 --> 01:02:06.020
What that is saying is that this
sum of terms that I was
01:02:06.020 --> 01:02:07.270
looking at before--
01:02:10.290 --> 01:02:20.070
this term here, this term, this
term, this term, this
01:02:20.070 --> 01:02:21.865
term, and so forth down--
01:02:30.280 --> 01:02:37.840
these terms here are actually
going as minus j squared over
01:02:37.840 --> 01:02:42.150
2 npq, which is what they should
be going as if you have
01:02:42.150 --> 01:02:44.810
a Gaussian curve here.
01:02:44.810 --> 01:02:50.140
OK, now there's one other thing
we have to do, which is
01:02:50.140 --> 01:02:54.340
figure out what this term is.
01:02:54.340 --> 01:03:02.470
And if you look at this as an
undetermined coefficient on
01:03:02.470 --> 01:03:06.270
these Gaussian-type terms and
you think of what happens if I
01:03:06.270 --> 01:03:11.710
sum this over all j, well, if
I sum it over all j what I'm
01:03:11.710 --> 01:03:17.490
going to get is the sum of all
of these terms here, which are
01:03:17.490 --> 01:03:24.360
negligible except where j
squared is proportional to n.
01:03:24.360 --> 01:03:27.640
So I don't have to sum them
beyond the point where this
01:03:27.640 --> 01:03:29.910
approximation makes sense.
01:03:29.910 --> 01:03:33.350
So I want to sum all
these terms.
01:03:33.350 --> 01:03:36.540
In summing these terms, when n
gets very, very large, these
01:03:36.540 --> 01:03:39.290
things are dropping off
very, very slowly.
01:03:39.290 --> 01:03:41.510
The curve is getting
very, very wide.
01:03:41.510 --> 01:03:46.940
If I scrunch the curve back in
again, what I get is a Riemann
01:03:46.940 --> 01:03:51.730
approximation to a normal
density curve.
01:03:51.730 --> 01:03:53.890
Therefore, I can integrate it.
01:03:53.890 --> 01:03:55.920
And believe me.
01:03:55.920 --> 01:03:57.620
If you don't believe me,
I'll go through it.
01:03:57.620 --> 01:03:59.720
And you won't like that.
01:03:59.720 --> 01:04:06.750
When you go through this, what
you get is in fact this
01:04:06.750 --> 01:04:11.930
expression right here, which
says that when n gets very,
01:04:11.930 --> 01:04:20.190
very large and j is the offset
from the mean and is
01:04:20.190 --> 01:04:23.420
proportional to--
01:04:23.420 --> 01:04:25.890
well, it's proportional to
the square root of n.
01:04:25.890 --> 01:04:30.260
Then what I get is this PMF
here, which is in fact what
01:04:30.260 --> 01:04:32.350
the central limit
theorem says.
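The normalization step can be sketched numerically. Summing the Gaussian-type terms over j approximates the integral, which equals the square root of 2 pi npq, so the undetermined center term should be close to 1 over that square root. The slide's expression is not in the transcript, so this resulting form is an assumption, as are the function names and parameter values.

```python
from math import lgamma, log, exp, sqrt, pi

def log_binom_pmf(n, k, p):
    """Log of the binomial PMF, via lgamma to avoid overflow for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def gaussian_term_sum(n, p, width=10):
    """Sum of exp(-j^2 / (2npq)) over j within `width` standard deviations."""
    var = n * p * (1 - p)
    half = int(width * sqrt(var))
    return sum(exp(-j * j / (2 * var)) for j in range(-half, half + 1))

def center_estimate(n, p):
    """Center term implied by normalization: 1 / sqrt(2 pi n p q)."""
    return 1 / sqrt(2 * pi * n * p * (1 - p))

def center_exact(n, p):
    """Exact binomial PMF at the mean pn."""
    pn = round(p * n)  # assumption: p * n is an integer
    return exp(log_binom_pmf(n, pn, p))

n, p = 10000, 0.3  # illustrative values
print(gaussian_term_sum(n, p), sqrt(2 * pi * n * p * (1 - p)))
print(center_exact(n, p), center_estimate(n, p))
```

The first line shows the Riemann-sum view of the integral; the second compares the exact PMF at the mean with the value forced by the PMF summing to 1.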
01:04:32.350 --> 01:04:36.380
And now if you go back and try
to think of exactly what we've
01:04:36.380 --> 01:04:40.110
done, what we've done is to
show that the logarithm of
01:04:40.110 --> 01:04:44.490
these differences here is
in fact linear in i.
01:04:44.490 --> 01:04:47.470
Therefore, when you sum them,
you get something which is
01:04:47.470 --> 01:04:50.640
quadratic in j.
01:04:50.640 --> 01:04:54.000
And because of that, all you
have to do is normalize with a
01:04:54.000 --> 01:04:55.340
center term.
01:04:55.340 --> 01:04:57.800
And you get this.
01:04:57.800 --> 01:05:01.690
The central limit theorem,
especially for the binary
01:05:01.690 --> 01:05:05.550
case, is almost always done
by using a Stirling
01:05:05.550 --> 01:05:07.040
approximation.
01:05:07.040 --> 01:05:10.460
And a Stirling approximation is
one of these things which
01:05:10.460 --> 01:05:12.740
is black magic.
01:05:12.740 --> 01:05:17.190
I don't know any place except in
William Feller's book where
01:05:17.190 --> 01:05:21.240
anyone talks about where this
formula comes from.
01:05:21.240 --> 01:05:23.650
If you now go back and look
very carefully at this
01:05:23.650 --> 01:05:27.390
derivation, this tells
you what the Stirling
01:05:27.390 --> 01:05:28.830
approximation is.
01:05:28.830 --> 01:05:34.210
Because if you do this for p
equals q, what you're doing is
01:05:34.210 --> 01:05:39.650
actually evaluating n choose
k where k is very
01:05:39.650 --> 01:05:42.560
close to n over 2.
01:05:42.560 --> 01:05:45.170
And that will tell you exactly
what Stirling's
01:05:45.170 --> 01:05:47.300
approximation has to be.
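As a sketch of that remark: with p equal to q equal to 1/2, the PMF at the mean is n choose n over 2 times 2 to the minus n, and the normalization argument gives roughly the square root of 2 over pi n for it. The helper names below are illustrative, and the approximation is the one implied by that argument rather than a formula quoted from the lecture.

```python
from math import comb, sqrt, pi

def central_binomial_exact(n):
    """n choose n/2 for even n, as an exact integer."""
    return comb(n, n // 2)

def central_binomial_approx(n):
    """2^n * sqrt(2 / (pi n)), from the p = q = 1/2 case of the argument."""
    return 2.0 ** n * sqrt(2 / (pi * n))

def ratio(n):
    """Exact value over approximation; should approach 1 as n grows."""
    return central_binomial_exact(n) / central_binomial_approx(n)

for n in (10, 100, 1000):
    print(n, ratio(n))
```

This matches what Stirling's formula, n factorial approximately the square root of 2 pi n times n over e to the n, gives for the same binomial coefficient.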
01:05:47.300 --> 01:05:49.700
In other words, that's a way
of deriving Stirling's
01:05:49.700 --> 01:05:50.960
approximation.
01:05:50.960 --> 01:05:54.220
The very backward way of
doing things it seems.
01:05:54.220 --> 01:05:57.330
But often backward ways
are the best ways
01:05:57.330 --> 01:05:59.480
of doing these things.
01:05:59.480 --> 01:06:05.210
OK, so I told you I
would stop at some
01:06:05.210 --> 01:06:06.840
point and ask for questions.
01:06:06.840 --> 01:06:07.570
Yes?
01:06:07.570 --> 01:06:09.820
AUDIENCE: Can you please go
back one slide before this
01:06:09.820 --> 01:06:15.695
slide, where you neglect
a term, which [INAUDIBLE],
01:06:15.695 --> 01:06:18.932
minus j over np.
01:06:18.932 --> 01:06:22.400
PROFESSOR: Why did I neglect
the j over np?
01:06:22.400 --> 01:06:25.830
OK, that's a good question.
01:06:25.830 --> 01:06:34.760
If you look at this curve here,
and I put the j in.
01:06:34.760 --> 01:06:39.470
I can put the j in by just
making this expression here
01:06:39.470 --> 01:06:44.360
look at one smaller value of
j or one larger value of j.
01:06:44.360 --> 01:06:45.870
And you get something different
whether you're
01:06:45.870 --> 01:06:48.880
looking at the minus side
or the plus side.
01:06:48.880 --> 01:06:53.850
In fact, if p is equal to q,
this term cancels out.
01:06:53.850 --> 01:06:57.400
If p is not equal to q, what
happens is that the central
01:06:57.400 --> 01:07:01.940
limit theorem is approximately
symmetric.
01:07:01.940 --> 01:07:04.130
But in this first-order term,
01:07:04.130 --> 01:07:05.760
it's not quite symmetric.
01:07:05.760 --> 01:07:10.090
It can't be symmetric because
this is p times n.
01:07:10.090 --> 01:07:12.500
And you have all these
terms out to 1.
01:07:12.500 --> 01:07:15.690
And you have many, many
fewer terms back to 0.
01:07:15.690 --> 01:07:19.290
So it has to be slightly
asymmetric.
01:07:19.290 --> 01:07:25.230
But it's only asymmetric over at
most a unit of value here,
01:07:25.230 --> 01:07:27.190
which is not significant.
01:07:27.190 --> 01:07:29.870
Because as n gets bigger,
these terms--
01:07:29.870 --> 01:07:32.060
well, as I've done
it, the terms do
01:07:32.060 --> 01:07:33.870
not get close together.
01:07:33.870 --> 01:07:37.820
But if I want to think of it
as a normalized Gaussian
01:07:37.820 --> 01:07:40.300
curve, I have to make the
terms close together.
01:07:40.300 --> 01:07:43.580
So that extra term is
not significant.
01:07:43.580 --> 01:07:46.630
I wish I had a nicer way of
taking care of all the
01:07:46.630 --> 01:07:48.740
approximations here.
01:07:48.740 --> 01:07:52.110
I haven't put this in the notes
because I still haven't
01:07:52.110 --> 01:07:54.640
figured out how to do that.
01:07:54.640 --> 01:08:00.310
But I still think you get more
insight from doing it this way
01:08:00.310 --> 01:08:04.130
than you do by going through
Stirling's approximation, all
01:08:04.130 --> 01:08:07.610
those pages and pages
of algebra.
01:08:07.610 --> 01:08:08.860
Anything else?
01:08:14.020 --> 01:08:16.080
OK, well, see you
Wednesday then.