WEBVTT
00:00:00.060 --> 00:00:02.500
The following content is
provided under a Creative
00:00:02.500 --> 00:00:04.019
Commons license.
00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare
00:00:06.360 --> 00:00:10.730
continue to offer high quality
educational resources for free.
00:00:10.730 --> 00:00:13.340
To make a donation or
view additional materials
00:00:13.340 --> 00:00:17.217
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:17.217 --> 00:00:17.842
at ocw.mit.edu.
00:00:21.520 --> 00:00:25.260
PROFESSOR: OK, so
good afternoon.
00:00:25.260 --> 00:00:30.820
Today, we will review
probability theory.
00:00:30.820 --> 00:00:36.090
So I will mostly focus on-- I'll
give you some distributions.
00:00:36.090 --> 00:00:38.830
So probability distributions
that will be of interest to us
00:00:38.830 --> 00:00:40.830
throughout the course.
00:00:40.830 --> 00:00:44.610
And I will talk about
moment-generating function
00:00:44.610 --> 00:00:46.120
a little bit.
00:00:46.120 --> 00:00:50.660
Afterwards, I will talk
about the law of large numbers
00:00:50.660 --> 00:00:52.210
and central limit theorem.
00:00:56.310 --> 00:01:00.680
Who has heard of all
of these topics before?
00:01:00.680 --> 00:01:02.150
OK.
00:01:02.150 --> 00:01:04.120
That's good.
00:01:04.120 --> 00:01:06.624
Then I'll try to focus
a little bit more
00:01:06.624 --> 00:01:07.540
on the advanced stuff.
00:01:10.890 --> 00:01:13.830
Then a big part of it
will be review for you.
00:01:13.830 --> 00:01:18.260
So first of all, just to
agree on terminology, let's
00:01:18.260 --> 00:01:21.490
review some definitions.
00:01:21.490 --> 00:01:32.670
So a random variable
X-- we will talk
00:01:32.670 --> 00:01:38.900
about discrete and
continuous random variables.
00:01:43.310 --> 00:01:47.240
Just to set up the notation,
I will write discrete as X
00:01:47.240 --> 00:01:50.130
and continuous random
variable as Y for now.
00:01:50.130 --> 00:01:52.820
So they are given by their
probability distributions--
00:01:52.820 --> 00:01:57.070
discrete random variable is
given by its probability mass
00:01:57.070 --> 00:02:02.490
function, f sub
X, I will denote.
00:02:02.490 --> 00:02:06.900
And continuous is given by
probability distribution
00:02:06.900 --> 00:02:07.399
function.
00:02:11.530 --> 00:02:17.745
I will denote by f
sub Y. So pmf and pdf.
00:02:22.210 --> 00:02:23.930
Here, I just use a
subscript because I
00:02:23.930 --> 00:02:26.030
wanted to distinguish
f sub x and f sub y.
00:02:26.030 --> 00:02:29.140
But when it's clear which random
variable we're talking about,
00:02:29.140 --> 00:02:32.190
I'll just say f.
00:02:32.190 --> 00:02:33.740
So what is this?
00:02:33.740 --> 00:02:42.980
A probability mass function is
a function from the sample space
00:02:42.980 --> 00:02:50.290
to non-negative reals such
that the sum over all points
00:02:50.290 --> 00:02:54.480
in the domain equals 1.
00:02:54.480 --> 00:02:57.110
The probability distribution
is very similar.
00:02:59.730 --> 00:03:02.890
A function from the
sample space to the non-negative
00:03:02.890 --> 00:03:07.500
reals, but now the
integration over the domain equals 1.
00:03:11.780 --> 00:03:16.650
So it's pretty much safe to
consider our sample space
00:03:16.650 --> 00:03:20.570
to be the real numbers for
continuous random variables.
00:03:20.570 --> 00:03:23.960
Later in the course, you
will see some examples where
00:03:23.960 --> 00:03:25.230
it's not the real numbers.
00:03:25.230 --> 00:03:29.217
But for now, just consider
it as real numbers.
00:03:34.840 --> 00:03:39.412
For example, probability
mass function.
00:03:39.412 --> 00:03:46.810
If X takes 1 with
probability 1/3,
00:03:46.810 --> 00:03:53.010
minus 1 with probability 1/3,
and 0 with probability 1/3.
00:03:56.070 --> 00:04:01.464
Then our probability mass
function is f_X(1) equals
00:04:01.464 --> 00:04:08.370
f_X(-1) equals f_X(0) equals 1/3, just like that.
00:04:08.370 --> 00:04:11.820
An example of a
continuous random variable
00:04:11.820 --> 00:04:17.470
is if-- let's say, for
example, if f sub Y is
00:04:17.470 --> 00:04:25.420
equal to 1 for all
y in [0,1], then
00:04:25.420 --> 00:04:36.305
this is pdf of uniform
random variable
00:04:36.305 --> 00:04:39.800
where the space is [0,1].
00:04:39.800 --> 00:04:41.850
So this random variable
just picks one out
00:04:41.850 --> 00:04:44.330
of the three numbers
with equal probability.
00:04:44.330 --> 00:04:47.450
This picks one out of this,
all the real numbers between 0
00:04:47.450 --> 00:04:51.600
and 1, with equal probability.
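These two examples are easy to check numerically. A minimal Python sketch (the sample size and tolerances are arbitrary choices, not from the lecture):

```python
import random
from collections import Counter

random.seed(0)
N = 100_000

# Discrete X: takes -1, 0, 1 each with probability 1/3 (the pmf above).
samples_X = [random.choice([-1, 0, 1]) for _ in range(N)]
freq = Counter(samples_X)
# Empirical frequencies should be near 1/3 each.
for v in (-1, 0, 1):
    assert abs(freq[v] / N - 1/3) < 0.01

# Continuous Y: uniform on [0, 1], with pdf f_Y(y) = 1 on [0, 1].
samples_Y = [random.random() for _ in range(N)]
# The probability of any subinterval [a, b] is its length b - a.
in_half = sum(1 for y in samples_Y if 0.25 <= y <= 0.75) / N
assert abs(in_half - 0.5) < 0.01
```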
00:04:51.600 --> 00:04:54.956
These are just some basic stuff.
00:04:54.956 --> 00:04:56.330
You should be
familiar with this,
00:04:56.330 --> 00:05:00.934
but I wrote it down just so
that we agree on the notation.
00:05:00.934 --> 00:05:01.858
OK.
00:05:01.858 --> 00:05:03.353
Both of the boards don't slide.
00:05:03.353 --> 00:05:06.311
That's good.
00:05:06.311 --> 00:05:08.490
A few more things.
00:05:08.490 --> 00:05:14.530
Expectation-- probability first.
00:05:14.530 --> 00:05:22.092
Probability of an event can be
computed as probability of A
00:05:22.092 --> 00:05:28.200
is equal to either the sum over all
points in A of the probability
00:05:28.200 --> 00:05:36.700
mass function-- or
integral over the set A
00:05:36.700 --> 00:05:39.540
depending on what you're using.
00:05:39.540 --> 00:05:50.050
And expectation, or mean
is-- expectation of X
00:05:50.050 --> 00:05:55.410
is equal to the sum over
all x of x times f sub X of x.
00:05:55.410 --> 00:06:01.110
And expectation of Y is
the integral over omega.
00:06:01.110 --> 00:06:02.580
Oh, sorry.
00:06:02.580 --> 00:06:04.540
The sample space.
00:06:04.540 --> 00:06:05.538
y times f sub Y of y, dy.
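Both expectations can be computed for the two examples above; a small Python sketch (the Monte Carlo sample size is an arbitrary choice):

```python
import random

random.seed(1)

# E[X] = sum over x of x * f_X(x) for the discrete example above:
E_X = sum(x * (1/3) for x in (-1, 0, 1))  # exactly 0

# E[Y] = integral of y * f_Y(y) dy over [0, 1] for the uniform example,
# approximated here by a Monte Carlo average:
N = 200_000
E_Y = sum(random.random() for _ in range(N)) / N  # should be close to 1/2
```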
00:06:11.016 --> 00:06:12.520
OK.
00:06:12.520 --> 00:06:16.850
And one more basic
concept I'd like to review
00:06:16.850 --> 00:06:32.150
is two random variables X_1, X_2
are independent if probability
00:06:32.150 --> 00:06:38.220
that X_1 is in A and
X_2 is in B equals
00:06:38.220 --> 00:06:48.898
the product of the
probabilities, for all events A
00:06:48.898 --> 00:06:54.222
and B. OK.
00:06:57.610 --> 00:06:59.570
All agreed?
00:06:59.570 --> 00:07:01.910
So for independence, I will
talk about independence
00:07:01.910 --> 00:07:04.570
of several random
variables as well.
00:07:04.570 --> 00:07:09.290
There are two concepts
of independence--
00:07:09.290 --> 00:07:10.760
not two, but several.
00:07:10.760 --> 00:07:17.220
The two most popular are
mutually independent events
00:07:17.220 --> 00:07:19.110
and pairwise independent events.
00:07:23.583 --> 00:07:27.060
Can somebody tell me the
difference between these two
00:07:27.060 --> 00:07:28.865
for several variables?
00:07:33.230 --> 00:07:34.200
Yes?
00:07:34.200 --> 00:07:35.655
AUDIENCE: So
usually, independent
00:07:35.655 --> 00:07:38.640
means all the random
variables are independent,
00:07:38.640 --> 00:07:42.550
like X_1 is independent
with every others.
00:07:42.550 --> 00:07:46.610
But pairwise means X_1
and X_2 are independent,
00:07:46.610 --> 00:07:51.677
but X_1, X_2, and x_3, they
may not be independent.
00:07:51.677 --> 00:07:52.260
PROFESSOR: OK.
00:07:52.260 --> 00:07:54.940
Maybe-- yeah.
00:07:54.940 --> 00:07:57.020
So that's good.
00:07:57.020 --> 00:08:04.420
So let's see-- for the example
of three random variables,
00:08:04.420 --> 00:08:07.770
it might be the case that
each pair are independent.
00:08:07.770 --> 00:08:10.110
X_1 is
independent of X_2,
00:08:10.110 --> 00:08:12.940
X_1 is independent of
X_3, and X_2 of X_3.
00:08:12.940 --> 00:08:15.290
But all together, they are
not independent.
00:08:15.290 --> 00:08:20.780
What that means is, this type
of statement is not true.
00:08:20.780 --> 00:08:25.200
So there are say A_1, A_2, A_3
for which this does not hold.
00:08:25.200 --> 00:08:28.150
But that's just some
technical detail.
00:08:28.150 --> 00:08:30.960
We will mostly just consider
mutually independent events.
00:08:30.960 --> 00:08:32.960
So when we say that several
random variables are
00:08:32.960 --> 00:08:36.630
independent, it just means
whatever collection you take,
00:08:36.630 --> 00:08:37.742
they're all independent.
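The three-variable situation described above (pairwise but not mutually independent) can be written out exactly. A Python sketch, with X_3 chosen as the product X_1 * X_2 -- a standard textbook choice, not one specified in the lecture:

```python
from itertools import product
from fractions import Fraction

# X_1, X_2 are independent fair +/-1 coin flips; X_3 = X_1 * X_2.
# Classic example: pairwise independent but not mutually independent.
outcomes = [(x1, x2, x1 * x2) for x1, x2 in product((-1, 1), repeat=2)]
p = Fraction(1, 4)  # each of the four outcomes is equally likely

def prob(event):
    return sum(p for o in outcomes if event(o))

# Every pair is independent: P(Xi = a, Xj = b) = P(Xi = a) P(Xj = b).
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in product((-1, 1), repeat=2):
        joint = prob(lambda o: o[i] == a and o[j] == b)
        assert joint == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)

# But mutual independence fails: take A_1 = A_2 = A_3 = {1}.
triple = prob(lambda o: o == (1, 1, 1))
product_of_probs = Fraction(1, 2) ** 3
assert triple != product_of_probs  # 1/4 versus 1/8
```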
00:08:43.995 --> 00:08:44.960
OK.
00:08:44.960 --> 00:08:47.780
So a little bit more fun
stuff [? in this ?] overview.
00:08:50.640 --> 00:08:54.275
So we defined random variables.
00:08:54.275 --> 00:08:59.060
And one of the most
universal random variable,
00:08:59.060 --> 00:09:02.310
or distribution, is a
normal distribution.
00:09:10.920 --> 00:09:14.450
It's a continuous
random variable.
00:09:14.450 --> 00:09:21.160
A continuous random variable
is said to have
00:09:21.160 --> 00:09:29.835
normal distribution
N(mu, sigma) if the probability
00:09:29.835 --> 00:09:40.380
distribution function is given
as 1 over sigma
00:09:40.380 --> 00:09:46.820
square root 2 pi, e to the
minus (x minus mu) squared
00:09:46.820 --> 00:09:50.830
over 2 sigma squared.
00:09:57.270 --> 00:10:01.194
For all reals.
00:10:01.194 --> 00:10:04.146
OK?
00:10:04.146 --> 00:10:12.500
So mu is the mean and sigma the standard deviation--
that's one of the most
00:10:12.500 --> 00:10:17.050
universal random variables--
distributions, the most
00:10:17.050 --> 00:10:18.100
important one as well.
00:10:28.990 --> 00:10:29.870
OK.
00:10:29.870 --> 00:10:33.150
So this distribution, how
it looks like-- I'm sure
00:10:33.150 --> 00:10:36.043
you saw this bell curve before.
00:10:36.043 --> 00:10:42.351
It looks like this if
it's N(0,1), let's say.
00:10:42.351 --> 00:10:45.420
And that's your y.
00:10:45.420 --> 00:10:48.360
So it's centered
around the origin,
00:10:48.360 --> 00:10:52.090
and it's symmetrical
on the origin.
00:10:52.090 --> 00:10:55.290
So now let's look
at our purpose.
00:10:55.290 --> 00:10:56.850
Let's think about our purpose.
00:10:56.850 --> 00:11:01.940
We want to model a financial
product or a stock,
00:11:01.940 --> 00:11:05.350
the price of the stock,
using some random variable.
00:11:05.350 --> 00:11:09.065
The first thing you can try
is to use normal distribution.
00:11:09.065 --> 00:11:10.690
Normal distribution
doesn't make sense,
00:11:10.690 --> 00:11:19.586
but we can say the price at
day n minus the price at day n
00:11:19.586 --> 00:11:21.615
minus 1 is normally distributed.
00:11:25.575 --> 00:11:29.440
Is this a sensible definition?
00:11:29.440 --> 00:11:30.637
Not really.
00:11:30.637 --> 00:11:31.720
So it's not a good choice.
00:11:31.720 --> 00:11:35.810
You can model it like this,
but it's not a good choice.
00:11:35.810 --> 00:11:38.050
There may be several
reasons, but one reason
00:11:38.050 --> 00:11:40.860
is that it doesn't take into
account the order of magnitude
00:11:40.860 --> 00:11:42.110
of the price itself.
00:11:42.110 --> 00:11:49.487
So the stock-- let's say
you have a stock price that
00:11:49.487 --> 00:11:52.730
goes something like that.
00:11:52.730 --> 00:11:58.620
And say it was $10
here, and $50 here.
00:11:58.620 --> 00:12:01.890
Regardless of where
your position is at,
00:12:01.890 --> 00:12:05.900
it says that the increment,
the absolute value of increment
00:12:05.900 --> 00:12:11.080
is identically distributed at
this point and at this point.
00:12:11.080 --> 00:12:14.770
But if you observed
how it works,
00:12:14.770 --> 00:12:18.040
usually that's not
normally distributed.
00:12:18.040 --> 00:12:21.800
What's normally distributed
is the percentage
00:12:21.800 --> 00:12:24.610
of how much it changes daily.
00:12:24.610 --> 00:12:32.125
So this is not a sensible
model, not a good model.
00:12:35.910 --> 00:12:41.200
But still, we can use
normal distribution
00:12:41.200 --> 00:12:42.830
to come up with a
pretty good model.
00:12:49.170 --> 00:13:06.130
So instead, what we want
is a relative difference
00:13:06.130 --> 00:13:07.892
to be normally distributed.
00:13:15.680 --> 00:13:16.720
That is the percent.
00:13:26.760 --> 00:13:33.150
The question is, what is
the distribution of price?
00:13:33.150 --> 00:13:34.826
What will the
distribution of the price be?
00:13:45.750 --> 00:13:48.660
So it's not a very
good explanation.
00:13:48.660 --> 00:13:52.860
Because I'm giving just
discrete increments while
00:13:52.860 --> 00:13:55.770
these are continuous
random variables and so on.
00:13:55.770 --> 00:13:59.030
But what I'm trying to say here
is that normal distribution
00:13:59.030 --> 00:14:00.500
is not good enough.
00:14:00.500 --> 00:14:03.360
Instead, we want the
percentage change
00:14:03.360 --> 00:14:05.450
to be normally distributed.
00:14:05.450 --> 00:14:11.300
And if that is the case,
what will be the distribution
00:14:11.300 --> 00:14:13.066
of the random variable?
00:14:13.066 --> 00:14:15.440
In this case, what will be
the distribution of the price?
00:14:27.420 --> 00:14:30.250
One thing I should
mention is, in this case,
00:14:30.250 --> 00:14:34.230
if each increment is
normally distributed,
00:14:34.230 --> 00:14:39.530
then the price at
day n will still
00:14:39.530 --> 00:14:44.270
be a normal random variable
distributed like that.
00:14:47.440 --> 00:14:53.900
So if there's no tendency-- if
the average daily increment is
00:14:53.900 --> 00:14:56.832
0, then no matter
how far you go,
00:14:56.832 --> 00:14:58.915
your random variable will
be normally distributed.
00:15:02.230 --> 00:15:06.110
But here, that will
not be the case.
00:15:06.110 --> 00:15:08.785
So we want to see what
the distribution of P_n
00:15:08.785 --> 00:15:11.981
will be in this case.
00:15:11.981 --> 00:15:12.480
OK.
00:15:17.820 --> 00:15:29.300
To do that-- let me formally
write down what I want to say.
00:15:29.300 --> 00:15:34.008
What I want to say is this.
00:15:34.008 --> 00:15:46.030
I want to define a
log-normal distribution Y,
00:15:46.030 --> 00:16:07.274
or log-normal random variable
Y, such that log of Y
00:16:07.274 --> 00:16:08.762
is normally distributed.
00:16:24.170 --> 00:16:26.670
So to derive the probability
distribution of this
00:16:26.670 --> 00:16:28.220
from the normal
distribution, we can
00:16:28.220 --> 00:16:40.010
use the change of
variable formula, which
00:16:40.010 --> 00:16:47.340
says the following:
suppose X and Y
00:16:47.340 --> 00:17:16.781
are random variables such
that the probability that X
00:17:16.781 --> 00:17:26.262
is at most x equals the probability
that Y is at most h(x), for all x.
00:17:32.250 --> 00:17:48.218
Then the density f
sub X of x is equal to f
00:17:48.218 --> 00:17:52.709
sub Y of h(x) times the derivative
00:17:58.198 --> 00:17:59.196
h prime of x.
00:18:07.200 --> 00:18:11.930
So let's try to fit
into this story.
00:18:11.930 --> 00:18:14.920
We want to have a
random variable Y such
00:18:14.920 --> 00:18:18.510
that log Y is
normally distributed.
00:18:18.510 --> 00:18:26.430
Here-- so you can
put log of x here.
00:18:26.430 --> 00:18:30.300
If Y is normally distributed,
X will be the distribution
00:18:30.300 --> 00:18:32.890
that we're interested in.
00:18:32.890 --> 00:18:37.870
So using this formula, we can
find probability distribution
00:18:37.870 --> 00:18:40.650
function of the log-normal
distribution using
00:18:40.650 --> 00:18:43.720
the probability
distribution of normal.
00:18:43.720 --> 00:18:44.810
So let's do that.
00:19:05.669 --> 00:19:10.659
AUDIENCE: [INAUDIBLE], right?
00:19:10.659 --> 00:19:12.910
PROFESSOR: Yes.
00:19:12.910 --> 00:19:15.006
So it's not a good choice.
00:19:15.006 --> 00:19:16.380
Locally, it might
be a good choice.
00:19:16.380 --> 00:19:20.357
But if it's taken
over a long time,
00:19:20.357 --> 00:19:21.440
it won't be a good choice.
00:19:21.440 --> 00:19:24.398
Because it will also take
negative values, for example.
00:19:28.517 --> 00:19:30.100
So if you just take
this model, what's
00:19:30.100 --> 00:19:31.849
going to happen over
a long period of time
00:19:31.849 --> 00:19:35.730
is it's going to hit
this square root of n,
00:19:35.730 --> 00:19:38.090
negative square root of
n line infinitely often.
00:19:42.050 --> 00:19:44.620
And then it can
go up to infinity,
00:19:44.620 --> 00:19:47.470
or it can go down to
infinity eventually.
00:19:47.470 --> 00:19:49.720
So it will take negative
values and positive values.
00:19:53.310 --> 00:19:55.460
That's one reason, but
there are several reasons
00:19:55.460 --> 00:19:57.970
why that's not a good choice.
00:19:57.970 --> 00:19:59.440
If you look at a
very small scale,
00:19:59.440 --> 00:20:03.610
it might be OK, because the base
price doesn't change that much.
00:20:03.610 --> 00:20:05.490
So if you model
in terms of ratio,
00:20:05.490 --> 00:20:07.930
or if you model it
in an absolute way,
00:20:07.930 --> 00:20:09.830
it doesn't matter that much.
00:20:09.830 --> 00:20:13.850
But if you want to do it a
little bit more large scale,
00:20:13.850 --> 00:20:17.890
then that's not a
very good choice.
00:20:17.890 --> 00:20:20.120
Other questions?
00:20:20.120 --> 00:20:21.745
Do you want me to
add some explanation?
00:20:25.322 --> 00:20:25.822
OK.
00:20:29.580 --> 00:20:32.720
So let me get this right.
00:20:37.120 --> 00:20:45.440
Y. I want X to be-- yes.
00:20:45.440 --> 00:20:49.950
I want X to be the log
normal distribution.
00:20:56.950 --> 00:21:04.580
And I want Y to be
normal distribution
00:21:04.580 --> 00:21:07.190
or a normal random variable.
00:21:07.190 --> 00:21:12.572
Then the probability
that X is at most x
00:21:12.572 --> 00:21:24.500
equals the probability
that Y is at most--
00:21:24.500 --> 00:21:29.070
Y is at most log x.
00:21:29.070 --> 00:21:33.160
That's the definition of
log-normal distribution.
00:21:33.160 --> 00:21:39.130
Then by using this change
of variable formula,
00:21:39.130 --> 00:21:41.780
probability density
function of X
00:21:41.780 --> 00:21:46.980
is equal to probability
density function of Y at log
00:21:46.980 --> 00:21:54.440
x times the derivative
of log x which is 1 over x.
00:21:54.440 --> 00:22:00.460
So it becomes 1 over
x sigma square root
00:22:00.460 --> 00:22:07.704
2 pi, e to the minus
(log x minus mu) squared over 2 sigma squared.
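The derived density can be checked against a numerical derivative of the CDF, using only the standard library (the choice mu = 0, sigma = 1 is arbitrary):

```python
import math

MU, SIGMA = 0.0, 1.0  # arbitrary illustrative parameters

def lognormal_cdf(x):
    # P(X <= x) = P(Y <= ln x) with Y ~ N(mu, sigma^2).
    return 0.5 * (1 + math.erf((math.log(x) - MU) / (SIGMA * math.sqrt(2))))

def lognormal_pdf(x):
    # The formula derived above: f_Y(log x) times 1/x.
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-z * z / 2) / (x * SIGMA * math.sqrt(2 * math.pi))

# The pdf should match the numerical derivative of the cdf.
for x in (0.3, 1.0, 2.5):
    h = 1e-6
    numeric = (lognormal_cdf(x + h) - lognormal_cdf(x - h)) / (2 * h)
    assert abs(numeric - lognormal_pdf(x)) < 1e-5
```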
00:22:11.610 --> 00:22:13.430
So log-normal
distribution can also
00:22:13.430 --> 00:22:15.380
be defined as the
distribution which has
00:22:15.380 --> 00:22:17.246
this probability density function.
00:22:22.650 --> 00:22:26.160
You can use either definition.
00:22:26.160 --> 00:22:29.391
Let me just make sure that I
didn't mess up in the middle.
00:22:32.800 --> 00:22:33.780
Yes.
00:22:33.780 --> 00:22:39.187
And that only works
for x greater than 0.
00:22:39.187 --> 00:22:39.687
Yes?
00:22:39.687 --> 00:22:41.714
AUDIENCE: [INAUDIBLE]?
00:22:41.714 --> 00:22:42.380
PROFESSOR: Yeah.
00:22:42.380 --> 00:22:43.940
So all logs are natural log.
00:22:43.940 --> 00:22:46.171
It should be ln.
00:22:46.171 --> 00:22:46.670
Yeah.
00:22:46.670 --> 00:22:48.320
Thank you.
00:22:48.320 --> 00:22:49.810
OK.
00:22:49.810 --> 00:22:58.370
So question-- what's the mean
of this distribution here?
00:22:58.370 --> 00:22:58.870
Yeah?
00:22:58.870 --> 00:23:00.970
AUDIENCE: 1?
00:23:00.970 --> 00:23:02.460
PROFESSOR: Not 1.
00:23:02.460 --> 00:23:04.820
It might be mu.
00:23:04.820 --> 00:23:07.500
Is it mu?
00:23:07.500 --> 00:23:08.260
Oh, sorry.
00:23:08.260 --> 00:23:09.850
It might be e to the mu.
00:23:09.850 --> 00:23:15.470
Because log X, the normal
distribution had mean mu.
00:23:15.470 --> 00:23:17.630
log x equals mu
might be the center.
00:23:17.630 --> 00:23:20.850
If that's the case, x is e
to the mu will be the mean.
00:23:20.850 --> 00:23:23.915
Is that the case?
00:23:23.915 --> 00:23:24.415
Yes?
00:23:24.415 --> 00:23:27.890
AUDIENCE: Can you get
the mu minus [INAUDIBLE]?
00:23:27.890 --> 00:23:29.760
PROFESSOR: Probably right.
00:23:29.760 --> 00:23:31.070
I don't remember what's there.
00:23:31.070 --> 00:23:32.490
There is a correcting factor.
00:23:32.490 --> 00:23:34.292
I don't remember
exactly what that is,
00:23:34.292 --> 00:23:37.210
but I think you're right.
00:23:37.210 --> 00:23:39.770
So one very important
thing to remember
00:23:39.770 --> 00:23:43.500
is log-normal
distribution are referred
00:23:43.500 --> 00:23:48.150
to in terms of the
parameters mu and sigma,
00:23:48.150 --> 00:23:50.510
because that's the mu and
sigma up here and here coming
00:23:50.510 --> 00:23:52.600
from the normal distribution.
00:23:52.600 --> 00:23:57.580
But those are not the
mean and variance anymore,
00:23:57.580 --> 00:24:01.900
because you skew
the distribution.
00:24:01.900 --> 00:24:03.700
It's no longer centered at mu.
00:24:03.700 --> 00:24:07.490
log X is centered at mu, but
when it takes exponential,
00:24:07.490 --> 00:24:08.590
it becomes skewed.
00:24:08.590 --> 00:24:12.630
And if you take the average,
you'll see that the mean
00:24:12.630 --> 00:24:13.930
is no longer e to the mu.
00:24:13.930 --> 00:24:16.365
So that doesn't give the mean.
00:24:16.365 --> 00:24:18.490
That doesn't imply that
the mean is e to the sigma.
00:24:18.490 --> 00:24:20.870
That doesn't imply
that the variance is
00:24:20.870 --> 00:24:23.242
something like e to the sigma.
00:24:23.242 --> 00:24:27.040
That's just totally nonsense.
00:24:27.040 --> 00:24:30.080
Just remember-- these are just
parameters, some parameters.
00:24:30.080 --> 00:24:32.450
It's no longer mean or variance.
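For reference, the standard closed form (the kind of computation the homework asks for) is E[X] = e^(mu + sigma^2/2), not e^mu. A Monte Carlo check, with arbitrary parameter values:

```python
import math
import random

random.seed(3)

MU, SIGMA = 0.5, 0.8  # arbitrary illustrative parameters
N = 400_000

# X = exp(Y) with Y ~ N(mu, sigma^2) is log-normal.
mean_est = sum(math.exp(random.gauss(MU, SIGMA)) for _ in range(N)) / N

# Standard fact: E[X] = exp(mu + sigma^2 / 2), NOT exp(mu).
exact = math.exp(MU + SIGMA**2 / 2)
assert abs(mean_est - exact) / exact < 0.02
assert abs(exact - math.exp(MU)) > 0.3  # e^mu is visibly wrong
```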
00:24:35.670 --> 00:24:39.794
And in your homework,
one exercise
00:24:39.794 --> 00:24:41.710
will ask you to compute
the mean and variance
00:24:41.710 --> 00:24:44.490
of the random variable.
00:24:44.490 --> 00:24:48.560
But really, just try to
have it stick in your mind
00:24:48.560 --> 00:24:53.160
that mu and sigma are no
longer the mean and variance.
00:24:53.160 --> 00:24:56.230
That's only the case for
normal random variables.
00:24:56.230 --> 00:24:58.380
And the reason we are
still using mu and sigma
00:24:58.380 --> 00:25:00.680
is because of this derivation.
00:25:00.680 --> 00:25:02.390
And it's easy to
describe it in those.
00:25:05.830 --> 00:25:07.940
OK.
00:25:07.940 --> 00:25:11.800
So the normal distribution
and log-normal distribution
00:25:11.800 --> 00:25:13.720
will probably be
the distributions
00:25:13.720 --> 00:25:15.742
that you'll see the most
throughout the course.
00:25:15.742 --> 00:25:17.325
But there are some
other distributions
00:25:17.325 --> 00:25:18.500
that you'll also see.
00:25:23.460 --> 00:25:24.948
I need this.
00:25:32.884 --> 00:25:35.650
I will not talk
about it in detail.
00:25:35.650 --> 00:25:38.540
It will be some
exercise questions.
00:25:38.540 --> 00:25:44.939
For example, you have Poisson
distribution or exponential
00:25:44.939 --> 00:25:45.522
distributions.
00:25:52.130 --> 00:25:56.550
These are some other
distributions that you'll see.
00:25:56.550 --> 00:25:59.060
And all of these-- normal,
log-normal, Poisson,
00:25:59.060 --> 00:26:01.060
and exponential,
and a lot more can
00:26:01.060 --> 00:26:04.400
be grouped into a
family of distributions
00:26:04.400 --> 00:26:05.798
called exponential family.
00:26:18.490 --> 00:26:24.026
So a distribution is said to
be in an exponential family--
00:26:24.026 --> 00:26:36.590
A distribution belongs
to exponential family
00:26:36.590 --> 00:26:50.890
if there exists a theta,
a vector that parametrizes
00:26:50.890 --> 00:27:05.520
the distribution such that
the probability density
00:27:05.520 --> 00:27:10.670
function for this choice
of parameter theta
00:27:10.670 --> 00:27:16.480
can be written as h
of x times c of theta
00:27:16.480 --> 00:27:22.498
times the exponent of the sum
from i equal 1 to k of w_i(theta) times t_i(x).
00:27:35.446 --> 00:27:35.970
Yes.
00:27:35.970 --> 00:27:40.100
So here, when I write
only x, h should only
00:27:40.100 --> 00:27:43.400
depend on x, not on theta.
00:27:43.400 --> 00:27:45.090
When I write some
function of theta,
00:27:45.090 --> 00:27:48.020
it should only depend
on theta, not on x.
00:27:48.020 --> 00:28:01.070
So h(x) and t_i(x) depend only
on x, and c(theta) and w_i of
00:28:01.070 --> 00:28:04.679
theta depend only on theta.
00:28:04.679 --> 00:28:05.720
That's an abstract thing.
00:28:05.720 --> 00:28:07.830
It's not clear why
this is so useful,
00:28:07.830 --> 00:28:10.140
at least from the definition.
00:28:10.140 --> 00:28:14.955
But you're going to talk
about some distribution
00:28:14.955 --> 00:28:16.650
for an exponential
family, right?
00:28:16.650 --> 00:28:17.150
Yeah.
00:28:17.150 --> 00:28:19.840
So you will see
something about this.
00:28:19.840 --> 00:28:21.770
But one good thing
is, they exhibit
00:28:21.770 --> 00:28:25.360
some good statistical
behavior-- when
00:28:25.360 --> 00:28:28.330
you group them like this,
all distributions
00:28:28.330 --> 00:28:31.460
in the exponential family
have some nice statistical
00:28:31.460 --> 00:28:35.590
properties, which makes them good.
00:28:35.590 --> 00:28:37.270
That's too abstract.
00:28:37.270 --> 00:28:42.140
Let's see how log-normal
distribution actually falls
00:28:42.140 --> 00:28:43.631
into the exponential family.
00:28:47.607 --> 00:28:49.444
AUDIENCE: So, let
me just comment.
00:28:49.444 --> 00:28:50.360
PROFESSOR: Yeah, sure.
00:28:50.360 --> 00:28:53.976
AUDIENCE: The notion of
independent random variables,
00:28:53.976 --> 00:28:58.687
you went over how the--
well, the probability density
00:28:58.687 --> 00:29:00.520
functions of collections
of random variables
00:29:00.520 --> 00:29:01.936
if they're mutually
independent is
00:29:01.936 --> 00:29:05.640
the product of the
probability densities
00:29:05.640 --> 00:29:07.132
of the individual variables.
00:29:07.132 --> 00:29:10.240
And so with this
exponential family,
00:29:10.240 --> 00:29:12.685
if you have random variables
from the same exponential
00:29:12.685 --> 00:29:18.380
family, products of this
density function factor out
00:29:18.380 --> 00:29:19.700
into a very simple form.
00:29:19.700 --> 00:29:21.360
It doesn't get more
complicated as you
00:29:21.360 --> 00:29:24.430
look at the joint density
of many variables,
00:29:24.430 --> 00:29:27.510
and in fact simplifies to
the same exponential family.
00:29:27.510 --> 00:29:30.210
So that's where that
becomes very useful.
00:29:30.210 --> 00:29:32.305
PROFESSOR: So it's designed
so that it factors out
00:29:32.305 --> 00:29:33.180
when it's multiplied.
00:29:33.180 --> 00:29:34.644
It factors out well.
00:29:37.990 --> 00:29:38.650
OK.
00:29:38.650 --> 00:29:43.000
So-- sorry about that.
00:29:43.000 --> 00:29:44.960
Yeah, log-normal distribution.
00:29:44.960 --> 00:29:49.970
So take h(x), 1 over x.
00:29:49.970 --> 00:29:52.350
Before that, let's just rewrite
that in a different way.
00:29:52.350 --> 00:29:58.804
So 1 over x sigma square
root 2 pi, e to the minus
00:29:58.804 --> 00:30:03.430
(log x minus mu) squared
00:30:03.430 --> 00:30:04.530
over 2 sigma squared.
00:30:04.530 --> 00:30:10.546
Can be rewritten as 1
over x, times 1 over sigma
00:30:10.546 --> 00:30:18.215
square root 2 pi, e to
the minus log x squared
00:30:18.215 --> 00:30:30.590
over 2 sigma squared plus
mu log x over sigma squared
00:30:30.590 --> 00:30:33.065
minus mu squared over 2 sigma squared.
00:30:37.050 --> 00:30:38.730
Let's write it like that.
00:30:38.730 --> 00:30:42.464
Set up h(x) equals 1 over x.
00:30:42.464 --> 00:30:51.422
c of theta-- sorry,
theta equals (mu, sigma).
00:30:51.422 --> 00:30:55.932
c(theta) is equal to 1 over
sigma square root 2 pi, e to the
00:30:55.932 --> 00:30:57.163
minus mu squared over 2 sigma squared.
00:31:01.510 --> 00:31:03.920
So you will
parametrize this family
00:31:03.920 --> 00:31:06.870
in terms of mu and sigma.
00:31:06.870 --> 00:31:09.490
Your h of x here
will be 1 over x.
00:31:09.490 --> 00:31:14.000
Your c(theta) will be this
term and the last term here,
00:31:14.000 --> 00:31:16.960
because this
doesn't depend on x.
00:31:16.960 --> 00:31:21.630
And then you have to
figure out what w and t is.
00:31:21.630 --> 00:31:24.970
You can let w_1 of
x be log x square.
00:31:29.180 --> 00:31:38.940
t_1-- no, t_1 of x be log x
square, w_1 of theta be minus 1
00:31:38.940 --> 00:31:41.392
over 2 sigma square.
00:31:41.392 --> 00:31:44.080
And similarly, you
can let t_2 equals log
00:31:44.080 --> 00:31:51.404
x and w_2 equals mu over sigma squared.
00:31:54.580 --> 00:31:56.570
It's just some technicality,
but at least you
00:31:56.570 --> 00:31:59.974
can see it really fits in.
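The fit can be verified numerically: the factored form h(x) c(theta) exp(w_1 t_1(x) + w_2 t_2(x)) should reproduce the log-normal density exactly. A Python sketch with arbitrary parameter values:

```python
import math

MU, SIGMA = 0.3, 1.2  # arbitrary illustrative parameters

def pdf_direct(x):
    # Log-normal density written directly.
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-z * z / 2) / (x * SIGMA * math.sqrt(2 * math.pi))

# Exponential-family pieces, as read off the board:
h = lambda x: 1 / x
c = math.exp(-MU**2 / (2 * SIGMA**2)) / (SIGMA * math.sqrt(2 * math.pi))
w1, t1 = -1 / (2 * SIGMA**2), lambda x: math.log(x) ** 2
w2, t2 = MU / SIGMA**2, lambda x: math.log(x)

def pdf_family(x):
    return h(x) * c * math.exp(w1 * t1(x) + w2 * t2(x))

# The two forms agree pointwise.
for x in (0.2, 1.0, 3.7):
    assert abs(pdf_direct(x) - pdf_family(x)) < 1e-12
```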
00:32:02.690 --> 00:32:05.200
OK.
00:32:05.200 --> 00:32:07.380
So that's all
about distributions
00:32:07.380 --> 00:32:10.080
that I want to talk about.
00:32:10.080 --> 00:32:12.640
And then let's talk
a little bit more
00:32:12.640 --> 00:32:15.340
about more interesting
stuff, in my opinion.
00:32:15.340 --> 00:32:16.705
I like this stuff better.
00:32:19.440 --> 00:32:23.340
There are two main things
that we're interested in.
00:32:23.340 --> 00:32:30.650
When we have a random variable,
at least for our purpose, what
00:32:30.650 --> 00:32:42.766
we want to study is given
a random variable, first,
00:32:42.766 --> 00:32:44.015
we want to study its statistics.
00:32:50.710 --> 00:32:54.826
So we want to study these
statistics, whatever
00:32:54.826 --> 00:32:55.798
that means.
00:32:59.690 --> 00:33:02.567
And that will be represented
by the k-th moments
00:33:02.567 --> 00:33:03.525
of the random variable.
00:33:10.340 --> 00:33:15.370
where the k-th moment is defined
as expectation of X to the k.
00:33:20.600 --> 00:33:24.000
And a good way to study
all the moments together
00:33:24.000 --> 00:33:26.855
in one function is a
moment-generating function.
00:33:34.300 --> 00:33:36.480
So this moment-generating
function
00:33:36.480 --> 00:33:40.340
encodes all the k-th moments
of a random variable.
00:33:40.340 --> 00:33:43.130
So it contains all the
statistical information
00:33:43.130 --> 00:33:45.339
of a random variable.
00:33:45.339 --> 00:33:46.880
That's why
moment-generating function
00:33:46.880 --> 00:33:48.060
will be interesting to us.
00:33:48.060 --> 00:33:50.050
Because when you
want to study it,
00:33:50.050 --> 00:33:52.760
you don't have to consider
each moment separately.
00:33:52.760 --> 00:33:54.090
It gives a unified way.
00:33:54.090 --> 00:33:58.050
It gives a very good
feeling about your function.
00:33:58.050 --> 00:33:59.560
That will be our first topic.
00:33:59.560 --> 00:34:02.200
Our second topic will
be we want to study
00:34:02.200 --> 00:34:10.140
its long-term or
large-scale behavior.
00:34:18.190 --> 00:34:21.199
So for example, assume that you
have a normal distribution--
00:34:21.199 --> 00:34:24.449
one random variable with
normal distribution.
00:34:24.449 --> 00:34:28.800
If we just have a
single random variable,
00:34:28.800 --> 00:34:30.760
you really have no control.
00:34:30.760 --> 00:34:31.870
It can be anywhere.
00:34:31.870 --> 00:34:39.260
The outcome can be anything
according to that distribution.
00:34:39.260 --> 00:34:41.429
But if you have several
independent random variables
00:34:41.429 --> 00:34:44.540
with the exact
same distribution,
00:34:44.540 --> 00:34:49.530
if the number is super large--
let's say 100 million--
00:34:49.530 --> 00:34:55.320
and you plot how many random
variables fall into each point
00:34:55.320 --> 00:34:58.150
into a graph,
you'll know that it
00:34:58.150 --> 00:35:01.672
has to look very
close to this curve.
00:35:01.672 --> 00:35:04.160
It will be more dense
here, sparser there,
00:35:04.160 --> 00:35:06.720
and sparser there.
00:35:06.720 --> 00:35:09.050
So you don't have
individual control on each
00:35:09.050 --> 00:35:10.150
of the random variables.
00:35:10.150 --> 00:35:12.185
But when you look
at large scale,
00:35:12.185 --> 00:35:16.860
you know, at least with
very high probability,
00:35:16.860 --> 00:35:19.990
it has to look like this curve.
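This large-scale regularity is easy to see in simulation. A Python sketch using standardized means of uniform random variables (the choices of n, the trial count, and the tolerance are arbitrary):

```python
import math
import random

random.seed(4)

# Average of n i.i.d. uniforms, standardized; for large n this should
# look like N(0, 1) -- the bell curve on the board.
n, trials = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and std of Uniform[0, 1]

def standardized_mean():
    s = sum(random.random() for _ in range(n))
    return (s / n - mu) / (sigma / math.sqrt(n))

zs = [standardized_mean() for _ in range(trials)]

def phi(z):
    # Standard normal cdf via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Empirical fractions track the normal cdf, as predicted.
for z in (-1.0, 0.0, 1.0):
    frac = sum(1 for v in zs if v <= z) / trials
    assert abs(frac - phi(z)) < 0.02
```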
00:35:19.990 --> 00:35:22.480
Those kind of things are
what we want to study.
00:35:22.480 --> 00:35:25.720
When we look at this long-term
behavior or large scale
00:35:25.720 --> 00:35:28.500
behavior, what can we say?
00:35:28.500 --> 00:35:30.130
What kind of events
are guaranteed
00:35:30.130 --> 00:35:35.110
to happen with probability,
let's say, 99.9%?
00:35:35.110 --> 00:35:38.680
And actually, some interesting
things are happening.
00:35:38.680 --> 00:35:44.800
As you might already know, two
typical theorems of this type
00:35:44.800 --> 00:35:46.850
will be the law
00:35:46.850 --> 00:35:53.282
of large numbers and
central limit theorem.
00:36:02.520 --> 00:36:04.590
So let's start with
our first topic--
00:36:04.590 --> 00:36:05.975
the moment-generating function.
00:36:26.310 --> 00:36:28.800
The moment-generating
function of a random variable
00:36:28.800 --> 00:36:31.540
is defined as-- I
write it as m sub
00:36:31.540 --> 00:36:39.330
X. It's defined as expectation
of e to the t times X,
00:36:39.330 --> 00:36:41.090
where t is some parameter.
00:36:41.090 --> 00:36:42.510
t can be any real.
00:36:47.372 --> 00:36:48.330
You have to be careful.
00:36:48.330 --> 00:36:51.680
It doesn't always converge.
00:36:51.680 --> 00:36:58.360
So remark: does not
necessarily exist.
00:37:09.900 --> 00:37:12.960
So for example, one of the
distributions you already saw
00:37:12.960 --> 00:37:15.010
does not have
moment-generating function.
00:37:15.010 --> 00:37:22.101
The log-normal
distribution does not
00:37:22.101 --> 00:37:23.600
have any moment-generating
function.
00:37:30.650 --> 00:37:33.720
And that's one thing
you have to be careful about.
00:37:33.720 --> 00:37:35.870
It's not just some
theoretical thing.
00:37:38.329 --> 00:37:40.120
The statement is not
something theoretical.
00:37:40.120 --> 00:37:42.670
It actually happens for
some random variables
00:37:42.670 --> 00:37:45.548
that you encounter in your life.
00:37:45.548 --> 00:37:48.190
So be careful.
00:37:48.190 --> 00:37:54.460
And that will actually show
some very interesting thing
00:37:54.460 --> 00:37:57.220
I will later explain.
00:37:57.220 --> 00:37:59.796
Some very interesting
facts arise from it.
00:38:03.900 --> 00:38:06.277
Before going into
that, first of all,
00:38:06.277 --> 00:38:08.110
why is it called
moment-generating function?
00:38:08.110 --> 00:38:14.540
It's because if you
take the k-th derivative
00:38:14.540 --> 00:38:26.280
of this function at t equals 0,
then it actually
00:38:26.280 --> 00:38:33.131
gives the k-th moment
of your random variable.
00:38:33.131 --> 00:38:34.505
That's where the
name comes from.
00:38:43.235 --> 00:38:45.225
This holds for all non-negative integers k.
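NOTE
This derivative-moment relationship can be sanity-checked numerically. A minimal sketch (not part of the lecture), assuming the standard normal, whose moment-generating function e to the t squared over 2 is stated later in this lecture:

```python
import math

def mgf(t):
    # MGF of the standard normal N(0, 1): M(t) = exp(t^2 / 2).
    return math.exp(t * t / 2)

def fourth_derivative_at_0(f, h=1e-2):
    # Central finite-difference stencil approximating f''''(0).
    return (f(2 * h) - 4 * f(h) + 6 * f(0) - 4 * f(-h) + f(-2 * h)) / h**4

# The 4th moment of N(0, 1) is 3; the 4th derivative of M at t = 0 agrees.
print(fourth_derivative_at_0(mgf))  # approximately 3
```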
00:38:58.320 --> 00:39:00.040
And that gives a
different way of writing
00:39:00.040 --> 00:39:01.248
a moment-generating function.
00:39:11.230 --> 00:39:18.090
Because of that, we may write
the moment-generating function
00:39:18.090 --> 00:39:24.992
as the sum from k equals
0 to infinity, t to the k,
00:39:24.992 --> 00:39:29.912
over k factorial, times
the k-th moment.
00:39:37.790 --> 00:39:40.469
That's like the
Taylor expansion.
00:39:40.469 --> 00:39:42.010
Because you know
all the derivatives,
00:39:42.010 --> 00:39:43.551
you know what the
functions would be.
00:39:43.551 --> 00:39:45.300
Of course, only if it exists.
00:39:45.300 --> 00:39:46.300
This might not converge.
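NOTE
When the moments grow slowly enough, the series does converge. A quick numerical check (a sketch, not from the lecture) using the standard normal, whose k-th moments are 0 for odd k and (k-1)!! for even k:

```python
import math

def normal_moment(k):
    # Moments of N(0, 1): 0 for odd k, (k-1)!! = 1*3*...*(k-1) for even k.
    if k % 2 == 1:
        return 0
    return math.prod(range(1, k, 2))

def mgf_series(t, terms=40):
    # Partial sum of M(t) = sum over k of t^k / k! times the k-th moment.
    return sum(t**k / math.factorial(k) * normal_moment(k) for k in range(terms))

# Should match the closed form exp(t^2 / 2).
print(mgf_series(1.0), math.exp(0.5))
```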
00:39:55.080 --> 00:39:58.360
So if moment-generating
function exists,
00:39:58.360 --> 00:40:01.120
it pretty much classifies
your random variable.
00:40:04.630 --> 00:40:09.020
So if two random
variables, X, Y,
00:40:09.020 --> 00:40:16.120
have the same
moment-generating function,
00:40:16.120 --> 00:40:24.835
then X and Y have the
same distribution.
00:40:30.020 --> 00:40:32.550
I will not prove this theorem.
00:40:32.550 --> 00:40:35.080
But it says that
moment-generating function,
00:40:35.080 --> 00:40:39.600
if it exists, encodes
really all the information
00:40:39.600 --> 00:40:41.516
about your random variables.
00:40:41.516 --> 00:40:42.990
You're not losing anything.
00:40:46.320 --> 00:40:50.540
However, be very careful when
you're applying this theorem.
00:40:50.540 --> 00:40:59.920
Because remark,
it does not imply
00:40:59.920 --> 00:41:20.740
that all random variables
with identical k-th moments
00:41:20.740 --> 00:41:26.790
for all k have the
same distribution.
00:41:37.418 --> 00:41:40.030
Do you see it?
00:41:40.030 --> 00:41:43.330
If X and Y have a
moment-generating function,
00:41:43.330 --> 00:41:49.210
and they're the same, then they
have the same distribution.
00:41:49.210 --> 00:41:52.710
This looks a little bit
contradictory to this theorem.
00:41:52.710 --> 00:41:56.890
It says that it's not
necessarily the case
00:41:56.890 --> 00:42:01.000
that two random variables, which
have identical moments-- so
00:42:01.000 --> 00:42:04.750
all k-th moments are the
same for two variables--
00:42:04.750 --> 00:42:06.710
even if that's the case,
they don't necessarily
00:42:06.710 --> 00:42:10.060
have to have the
same distribution.
00:42:10.060 --> 00:42:12.014
Which seems like it
doesn't make sense
00:42:12.014 --> 00:42:13.180
if you look at this theorem.
00:42:13.180 --> 00:42:14.596
Because moment-generating
function
00:42:14.596 --> 00:42:16.650
is defined in terms
of the moments.
00:42:16.650 --> 00:42:18.742
If two random variables
have the same moments,
00:42:18.742 --> 00:42:20.575
we have the same
moment-generating function.
00:42:20.575 --> 00:42:22.616
If they have the same
moment-generating function,
00:42:22.616 --> 00:42:24.970
they have the same distribution.
00:42:24.970 --> 00:42:28.450
There is a hole
in this argument.
00:42:28.450 --> 00:42:31.850
Even if they have
the same moments,
00:42:31.850 --> 00:42:33.792
it doesn't necessarily
imply that they
00:42:33.792 --> 00:42:35.500
have the same
moment-generating function.
00:42:35.500 --> 00:42:39.520
They might both not have
moment-generating functions.
00:42:39.520 --> 00:42:42.620
That's the glitch.
00:42:42.620 --> 00:42:44.040
Be careful.
00:42:44.040 --> 00:42:47.587
So just remember that even if
they have the same moments,
00:42:47.587 --> 00:42:49.670
they don't necessarily
have the same distribution.
00:42:49.670 --> 00:42:51.740
And the reason is
because-- one reason
00:42:51.740 --> 00:42:56.110
is because the moment-generating
function might not exist.
00:42:56.110 --> 00:42:57.930
And if you look in
to Wikipedia, you'll
00:42:57.930 --> 00:43:00.850
see an example of
when it happens,
00:43:00.850 --> 00:43:03.345
of two random variables
where this happens.
00:43:10.310 --> 00:43:13.380
So that's one thing
we will use later.
00:43:13.380 --> 00:43:17.660
Another thing that
we will use later,
00:43:17.660 --> 00:43:20.950
it's a statement
very similar to that,
00:43:20.950 --> 00:43:25.820
but it says something about a
sequence of random variables.
00:43:25.820 --> 00:43:39.406
So if X_1, X_2, up to X_n is
a sequence of random variables
00:43:39.406 --> 00:43:48.470
such that the moment-generating
function exists,
00:43:48.470 --> 00:43:52.580
and, as n goes to infinity,
the moment-generating function of X_n
00:43:57.542 --> 00:44:03.250
tends to the
moment-generating function
00:44:03.250 --> 00:44:05.380
of some random variable X
00:44:05.380 --> 00:44:13.091
for all t.
00:44:16.250 --> 00:44:18.970
Here, we're assuming that all
moment-generating function
00:44:18.970 --> 00:44:20.280
exists.
00:44:20.280 --> 00:44:22.050
So again, the
situation is, you have
00:44:22.050 --> 00:44:24.900
a sequence of random variables.
00:44:24.900 --> 00:44:27.600
Their moment-generating
function exists.
00:44:27.600 --> 00:44:31.790
And in each point
t, it converges
00:44:31.790 --> 00:44:33.967
to the value of the
moment-generating function
00:44:33.967 --> 00:44:35.300
of some other random variable x.
00:44:38.270 --> 00:44:41.310
And what should happen?
00:44:41.310 --> 00:44:43.880
In light of this theorem,
it should be the case
00:44:43.880 --> 00:44:47.490
that the distribution
of this sequence
00:44:47.490 --> 00:44:49.240
gets closer and closer
to the distribution
00:44:49.240 --> 00:44:53.360
of this random variable x.
00:44:53.360 --> 00:45:00.220
And to make it formal, to make
that information formal, what
00:45:00.220 --> 00:45:09.760
we can conclude is, for
all x, the probability
00:45:09.760 --> 00:45:15.440
X_i is less than or equal to
x tends to the probability
00:45:15.440 --> 00:45:17.300
that X is less than or equal to x.
00:45:20.090 --> 00:45:22.990
So in this sense,
the distributions
00:45:22.990 --> 00:45:25.940
of these random variables
converges to the distribution
00:45:25.940 --> 00:45:27.216
of that random variable.
00:45:30.090 --> 00:45:32.330
So it's just a technical issue.
00:45:32.330 --> 00:45:38.890
You can just think of it as
these random variables converge
00:45:38.890 --> 00:45:41.200
to that random variable.
00:45:41.200 --> 00:45:43.230
If you take some graduate
probability course,
00:45:43.230 --> 00:45:47.100
you'll see that there's
several possible ways
00:45:47.100 --> 00:45:48.730
to define convergence.
00:45:48.730 --> 00:45:50.740
But that's just
some technicality.
00:45:50.740 --> 00:45:53.397
And the spirit
here is just really
00:45:53.397 --> 00:45:55.730
the sequence converges if its
moment-generating function
00:45:55.730 --> 00:45:56.229
converges.
00:45:59.790 --> 00:46:02.470
So as you can see from
these two theorems,
00:46:02.470 --> 00:46:04.440
moment-generating
function, if it exists,
00:46:04.440 --> 00:46:08.270
is a really powerful
tool that allows you
00:46:08.270 --> 00:46:09.480
to control the distribution.
00:46:13.060 --> 00:46:16.407
You'll see some applications
later in central limit theorem.
00:46:16.407 --> 00:46:16.990
Any questions?
00:46:21.530 --> 00:46:22.446
AUDIENCE: [INAUDIBLE]?
00:46:28.557 --> 00:46:29.390
PROFESSOR: This one?
00:46:32.870 --> 00:46:34.154
Why?
00:46:34.154 --> 00:46:35.612
AUDIENCE: Because
it starts with t,
00:46:35.612 --> 00:46:38.162
and the right-hand side
has nothing general.
00:46:40.777 --> 00:46:41.360
PROFESSOR: Ah.
00:46:44.318 --> 00:46:47.180
Thank you.
00:46:47.180 --> 00:46:48.350
We evaluated at zero.
00:46:53.230 --> 00:46:54.694
Other questions?
00:46:54.694 --> 00:46:56.646
Other corrections?
00:46:56.646 --> 00:46:59.086
AUDIENCE: When you say the
moment-generating function
00:46:59.086 --> 00:47:01.526
doesn't exist, do you mean
that it isn't analytic
00:47:01.526 --> 00:47:03.010
or it doesn't converge?
00:47:03.010 --> 00:47:04.580
PROFESSOR: It
might not converge.
00:47:04.580 --> 00:47:08.130
So log-normal distribution,
it does not converge.
00:47:08.130 --> 00:47:10.412
So for all non-zero
t, it does not
00:47:10.412 --> 00:47:12.109
converge, for
log-normal distribution.
00:47:12.109 --> 00:47:13.025
AUDIENCE: [INAUDIBLE]?
00:47:16.350 --> 00:47:17.140
PROFESSOR: Here?
00:47:17.140 --> 00:47:17.640
Yes.
00:47:17.640 --> 00:47:19.822
Pointwise convergence [of the
moment-generating functions] implies
pointwise convergence [of the
distribution functions].
00:47:22.420 --> 00:47:22.945
No, no.
00:47:26.760 --> 00:47:30.474
Because it's pointwise, this
conclusion is also rather weak.
00:47:30.474 --> 00:47:32.640
It's almost the weakest
convergence in distribution.
00:48:01.024 --> 00:48:01.524
OK.
00:48:01.524 --> 00:48:12.480
The law of large numbers.
00:49:04.100 --> 00:49:06.940
So now we're talking about
large-scale behavior.
00:49:06.940 --> 00:49:09.630
Let X_1 up to X_n be
independent random variables
00:49:09.630 --> 00:49:11.334
with identical distribution.
00:49:11.334 --> 00:49:13.250
We don't really know
what the distribution is,
00:49:13.250 --> 00:49:15.270
but we know that
they're all the same.
00:49:15.270 --> 00:49:18.620
In short, I'll just refer
to this condition as i.i.d.
00:49:18.620 --> 00:49:21.990
random variables later.
00:49:21.990 --> 00:49:25.048
Independent, identically
distributed random variables.
00:49:29.040 --> 00:49:36.530
And let the mean be mu
and the variance be sigma squared.
00:49:44.470 --> 00:49:50.740
Let's also define X as the
average of n random variables.
00:49:54.590 --> 00:50:22.986
Then, for all positive epsilon, the
probability that the absolute value of X
minus mu exceeds epsilon tends to 0
as n goes to infinity--
00:50:22.986 --> 00:50:23.486
[INAUDIBLE].
00:50:31.590 --> 00:50:35.100
So whenever you have independent,
identically distributed random variables, when
00:50:35.100 --> 00:50:39.050
you take their average, if
you take a large enough number
00:50:39.050 --> 00:50:43.430
of samples, they will be
very close to the mean, which
00:50:43.430 --> 00:50:44.144
makes sense.
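NOTE
The statement is easy to see in simulation. A minimal sketch (not part of the lecture), using Uniform(0, 1) samples, whose mean is 0.5:

```python
import random

random.seed(0)

def sample_mean(n):
    # Average of n i.i.d. Uniform(0, 1) draws; the mean mu is 0.5.
    return sum(random.random() for _ in range(n)) / n

# The average concentrates around 0.5 as n grows.
for n in [10, 1000, 100000]:
    print(n, sample_mean(n))
```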
00:51:04.420 --> 00:51:06.270
So what's an example of this?
00:51:06.270 --> 00:51:14.010
Before proving it, example
of this theorem in practice
00:51:14.010 --> 00:51:16.605
can be seen in the casino.
00:51:22.530 --> 00:51:25.120
So for example, if
you're playing blackjack
00:51:25.120 --> 00:51:38.890
in a casino, when you're
playing against the casino,
00:51:38.890 --> 00:51:42.700
you have a very
small disadvantage.
00:51:42.700 --> 00:51:52.500
If you're playing at
the optimal strategy,
00:51:52.500 --> 00:51:56.380
you have-- does anybody
know the probability?
00:51:56.380 --> 00:52:00.460
It's about 48%, 49%.
00:52:00.460 --> 00:52:04.520
About 48% chance of winning.
00:52:09.160 --> 00:52:14.340
That means if you bet $1 at
the beginning of each round,
00:52:14.340 --> 00:52:22.605
the expected amount
you'll win is $0.48.
00:52:22.605 --> 00:52:28.060
The expected amount that the
casino will win is $0.52.
00:52:28.060 --> 00:52:30.760
But it's designed so
that the variance is
00:52:30.760 --> 00:52:37.030
so big that this expectation
is hidden, the mean is hidden.
00:52:37.030 --> 00:52:39.390
From the player's
point of view, you only
00:52:39.390 --> 00:52:41.390
have a very small sample.
00:52:41.390 --> 00:52:44.960
So it looks like the
mean doesn't matter,
00:52:44.960 --> 00:52:48.710
because the variance takes
over in a very short scale.
00:52:48.710 --> 00:52:50.730
But from the casino's
point of view,
00:52:50.730 --> 00:52:54.680
they're taking a
very large n there.
00:52:54.680 --> 00:53:02.720
So for each round, let's
say from the casino's
00:53:02.720 --> 00:53:13.500
point of view, it's
like taking, they
00:53:13.500 --> 00:53:20.520
are taking enormous value of n.
00:53:26.640 --> 00:53:27.660
n here.
00:53:27.660 --> 00:53:32.380
And that means as long as they
have the slightest advantage,
00:53:32.380 --> 00:53:34.993
they'll be winning money,
and a huge amount of money.
00:53:38.240 --> 00:53:41.690
And most games played in the
casinos are designed like this.
00:53:41.690 --> 00:53:45.730
It looks like the mean
is really close to 50%,
00:53:45.730 --> 00:53:47.840
but it's hidden,
because they designed it
00:53:47.840 --> 00:53:51.000
so the variance is big.
00:53:51.000 --> 00:53:53.180
But from the casino's
point of view,
00:53:53.180 --> 00:53:55.010
they have enough
players to play the game
00:53:55.010 --> 00:54:02.120
so that the law of large
numbers just makes them money.
00:54:07.770 --> 00:54:09.530
The moral is, don't
play blackjack.
00:54:12.240 --> 00:54:15.360
Play poker.
00:54:15.360 --> 00:54:19.790
The reason that the law
of large numbers
00:54:19.790 --> 00:54:23.010
doesn't apply, at least
in this sense, to poker--
00:54:23.010 --> 00:54:24.220
can anybody explain why?
00:54:27.100 --> 00:54:32.000
It's because poker, you're
playing against other players.
00:54:32.000 --> 00:54:36.500
If you have an advantage, if
your skill-- if you believe
00:54:36.500 --> 00:54:38.980
that there is skill in poker--
if your skill is better
00:54:38.980 --> 00:54:41.330
than the other
player by, let's say,
00:54:41.330 --> 00:54:47.010
5% chance, then you have
an edge over that player.
00:54:47.010 --> 00:54:48.010
So you can win money.
00:54:48.010 --> 00:54:53.870
The only problem is that
in poker, you're
00:54:53.870 --> 00:54:55.691
not playing against the casino.
00:55:00.390 --> 00:55:04.770
You don't play against the casino.
00:55:04.770 --> 00:55:06.530
But they still
have to make money.
00:55:06.530 --> 00:55:08.770
So what they do instead
is they take rake.
00:55:08.770 --> 00:55:12.350
So for each round
that the players play,
00:55:12.350 --> 00:55:15.740
they pay some fee to the casino.
00:55:15.740 --> 00:55:19.920
And how the casino makes
money at the poker table
00:55:19.920 --> 00:55:22.870
is by accumulating those fees.
00:55:22.870 --> 00:55:25.291
They're not taking
chances there.
00:55:25.291 --> 00:55:26.790
But from the player's
point of view,
00:55:26.790 --> 00:55:32.405
if you're better than the other
player, and the amount of edge
00:55:32.405 --> 00:55:35.630
you have over the other
player is larger than the fee
00:55:35.630 --> 00:55:38.000
that the casino
charges to you, then
00:55:38.000 --> 00:55:41.380
now you can apply law of large
numbers to yourself and win.
00:55:45.420 --> 00:55:50.360
And if you take an
example as poker,
00:55:50.360 --> 00:55:54.372
it looks like-- OK, I'm
not going to play poker.
00:55:54.372 --> 00:55:59.320
But if it's a hedge
fund, or if you're
00:55:59.320 --> 00:56:04.850
doing high-frequency trading,
that's the moral behind it.
00:56:04.850 --> 00:56:07.860
So that's the belief
you should have.
00:56:07.860 --> 00:56:10.760
You have to believe
that you have an edge.
00:56:10.760 --> 00:56:13.660
Even if you have a
tiny edge, if you
00:56:13.660 --> 00:56:16.400
can have enough
number of trials,
00:56:16.400 --> 00:56:21.000
if you can trade enough of times
using some strategy that you
00:56:21.000 --> 00:56:26.580
believe is winning over time,
then law of large numbers
00:56:26.580 --> 00:56:31.266
will take it from there and
will bring you money, profit.
00:56:34.920 --> 00:56:41.770
Of course, the problem is,
when the variance is big,
00:56:41.770 --> 00:56:45.210
your belief starts to fall.
00:56:45.210 --> 00:56:48.660
At least, that was the case for
me when I was playing poker.
00:56:48.660 --> 00:56:51.650
Because I believed
that I had an edge,
00:56:51.650 --> 00:56:55.520
but when there is
a really big swing, it
00:56:55.520 --> 00:56:59.680
looks like your
expectation is negative.
00:56:59.680 --> 00:57:01.885
And that's when you have
to believe in yourself.
00:57:05.590 --> 00:57:07.690
Yeah.
00:57:07.690 --> 00:57:09.480
That's when your
faith in mathematics
00:57:09.480 --> 00:57:11.929
is being challenged.
00:57:11.929 --> 00:57:12.720
It really happened.
00:57:15.290 --> 00:57:17.290
I hope it doesn't happen to you.
00:57:17.290 --> 00:57:22.730
Anyway, let's prove the
law of large numbers.
00:57:22.730 --> 00:57:23.690
How do you prove it?
00:57:23.690 --> 00:57:24.690
The proof is quite easy.
00:57:27.840 --> 00:57:32.940
First of all, one observation--
expectation of X is just
00:57:32.940 --> 00:57:37.640
expectation of 1 over
n times sum of X_i's.
00:57:41.400 --> 00:57:52.471
And that, by linearity,
just becomes 1 over n times the sum of the expectations--
00:57:52.471 --> 00:57:55.883
and that's mu.
00:57:55.883 --> 00:57:56.383
OK.
00:57:56.383 --> 00:57:59.317
That's good.
00:57:59.317 --> 00:58:01.610
And then the variance,
what's the variance of X?
00:58:04.430 --> 00:58:09.750
That's the expectation
of X minus mu
00:58:09.750 --> 00:58:20.976
square, which is the expectation
of 1 over n times the sum over all i's of X_i, minus mu,
00:58:20.976 --> 00:58:21.476
square.
00:58:24.344 --> 00:58:26.260
I'll group them.
00:58:26.260 --> 00:58:33.584
That's the expectation of 1 over
n sum of X_i minus mu square.
00:58:33.584 --> 00:58:35.580
i is from 1 to n.
00:58:43.570 --> 00:58:44.800
What did I do wrong?
00:58:44.800 --> 00:58:46.610
1 over n is inside the square.
00:58:46.610 --> 00:58:50.720
So I can take it out
and square, n square.
00:58:50.720 --> 00:58:53.660
And then, because the X_i are independent,
the variance of the sum is the sum of the
variances-- you're summing
n terms of sigma squared.
00:58:53.660 --> 00:58:57.145
So that is equal to
sigma square over n.
00:59:02.450 --> 00:59:04.110
That means the
effect of averaging
00:59:04.110 --> 00:59:08.600
n terms does not
change your mean,
00:59:08.600 --> 00:59:10.020
but it affects your variance.
00:59:13.510 --> 00:59:15.802
It divides your variance by n.
00:59:15.802 --> 00:59:18.890
If you take larger and
larger n, your variance
00:59:18.890 --> 00:59:20.080
gets smaller and smaller.
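NOTE
The sigma squared over n computation can be verified empirically; a sketch (not from the lecture), again with Uniform(0, 1) draws, where sigma squared is 1/12:

```python
import random
import statistics

random.seed(3)

def mean_of_n(n):
    # Average of n i.i.d. Uniform(0, 1) draws.
    return sum(random.random() for _ in range(n)) / n

# Empirical variance of the average of n = 50 draws versus the
# predicted sigma^2 / n = (1/12) / n.
n = 50
samples = [mean_of_n(n) for _ in range(50000)]
print(statistics.variance(samples), (1 / 12) / n)
```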
00:59:22.590 --> 00:59:25.970
And using that, we can
prove this statement.
00:59:25.970 --> 00:59:27.840
There's only one thing
you have to notice--
00:59:27.840 --> 00:59:30.510
that the probability
that the absolute value of x minus mu
00:59:30.510 --> 00:59:32.620
is greater than epsilon.
00:59:32.620 --> 00:59:35.840
when you multiply this
by epsilon squared,
00:59:35.840 --> 00:59:41.230
this will be less than or
equal to the variance of x.
00:59:41.230 --> 00:59:42.780
The reason this
inequality holds is
00:59:42.780 --> 00:59:46.290
because variance X is defined
as the expectation of X minus mu
00:59:46.290 --> 00:59:48.200
square.
00:59:48.200 --> 00:59:52.340
For all the events when you have
X minus mu at least epsilon,
00:59:52.340 --> 00:59:54.260
the multiplying
factor, X minus mu squared, will
00:59:54.260 --> 00:59:56.780
be at least epsilon square.
00:59:56.780 --> 01:00:00.350
This term will be at
least epsilon square
01:00:00.350 --> 01:00:03.520
when you fall into this event.
01:00:03.520 --> 01:00:07.100
So your variance has
to be at least that.
01:00:07.100 --> 01:00:11.971
And this is known to
be sigma square over n.
01:00:11.971 --> 01:00:15.704
So probability that
x minus mu is greater
01:00:15.704 --> 01:00:21.980
than epsilon is at most sigma
square over n epsilon squared.
01:00:21.980 --> 01:00:26.140
That means if you take n to go
to infinity, that goes to zero.
01:00:26.140 --> 01:00:29.590
So the probability that
you deviate from the mean
01:00:29.590 --> 01:00:33.187
by more than epsilon goes to 0.
01:00:33.187 --> 01:00:35.645
You can actually read out a
little bit more from the proof.
01:00:38.690 --> 01:00:41.635
It also tells a little bit
about the speed of convergence.
01:00:44.260 --> 01:00:50.230
So let's say you have a random
variable X. Your mean is 50.
01:00:50.230 --> 01:00:53.930
Your epsilon is 0.1.
01:00:53.930 --> 01:00:55.830
So you want to know
the probability
01:00:55.830 --> 01:01:00.480
that you deviate from your
mean by more than 0.1.
01:01:00.480 --> 01:01:06.010
Let's say you want
to be 99% sure.
01:01:06.010 --> 01:01:14.812
Want to be 99% sure that X
minus mu is less than 0.1,
01:01:14.812 --> 01:01:18.120
or X minus 50 is less than 0.1.
01:01:18.120 --> 01:01:23.060
In that case, what you can do
is-- you want this to be 0.01.
01:01:23.060 --> 01:01:26.360
It has to be 0.01.
01:01:26.360 --> 01:01:29.800
So plug in that, plug in your
variance, plug in your epsilon.
01:01:29.800 --> 01:01:32.230
That will give you
some bound on n.
01:01:32.230 --> 01:01:34.190
If you have more than
that number of trials,
01:01:34.190 --> 01:01:38.113
you can be 99% sure that you
don't deviate from your mean
01:01:38.113 --> 01:01:40.680
by more than epsilon.
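NOTE
The bound being described can be computed directly. A sketch with hypothetical numbers (the lecture does not fix sigma; a variance of 100 is assumed here for illustration):

```python
import math

def chebyshev_n(sigma2, eps, fail_prob):
    # Smallest n with sigma^2 / (n * eps^2) <= fail_prob,
    # i.e. n >= sigma^2 / (eps^2 * fail_prob).
    return math.ceil(sigma2 / (eps**2 * fail_prob))

# Hypothetical: variance 100, epsilon = 0.1, 99% confidence (fail_prob 0.01).
print(chebyshev_n(100.0, 0.1, 0.01))  # 1000000 -- on the order of a million
```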
01:01:40.680 --> 01:01:42.700
So that does give
some estimate, but I
01:01:42.700 --> 01:01:46.150
should mention that this
is a very bad estimate.
01:01:46.150 --> 01:01:47.990
There are much more
powerful estimates
01:01:47.990 --> 01:01:48.970
that can be done here.
01:01:48.970 --> 01:01:50.770
That will give the order of
magnitude-- I didn't really
01:01:50.770 --> 01:01:53.440
calculate here, but it looks
like it's close to millions.
01:01:53.440 --> 01:01:55.900
It has to be close to millions.
01:01:55.900 --> 01:02:00.360
But in practice, if you use
a lot more powerful tool
01:02:00.360 --> 01:02:05.008
of estimating it, it should
only be hundreds or at most
01:02:05.008 --> 01:02:05.508
thousands.
01:02:13.460 --> 01:02:15.960
So the tool you'll use there
is moment-generating functions,
01:02:15.960 --> 01:02:18.360
something similar to
moment-generating functions.
01:02:18.360 --> 01:02:20.412
But I will not go into it.
01:02:20.412 --> 01:02:20.995
Any questions?
01:02:23.610 --> 01:02:25.090
OK.
01:02:25.090 --> 01:02:28.552
For those who already saw
law of large numbers before,
01:02:28.552 --> 01:02:30.510
the name suggests there's
also something called
01:02:30.510 --> 01:02:32.250
strong law of large numbers.
01:02:35.982 --> 01:02:41.380
In that theorem, your
conclusion is stronger.
01:02:41.380 --> 01:02:45.005
So the convergence is stronger
than this type of convergence.
01:02:47.810 --> 01:02:51.610
And also, the
condition I gave here
01:02:51.610 --> 01:02:53.580
is a very strong condition.
01:02:53.580 --> 01:02:56.020
The same conclusion
is true even if you
01:02:56.020 --> 01:02:58.840
weaken some of the conditions.
01:02:58.840 --> 01:03:01.580
So for example, the variance
does not have to exist.
01:03:01.580 --> 01:03:06.480
It can be replaced by some
other condition, and so on.
01:03:06.480 --> 01:03:08.860
But here, I just want
it to be a simple form
01:03:08.860 --> 01:03:11.350
so that it's easy to prove.
01:03:11.350 --> 01:03:14.274
And you at least get the
spirit of what's happening.
01:03:20.480 --> 01:03:26.140
Now let's move on to the next
topic-- central limit theorem.
01:04:11.240 --> 01:04:16.880
So weak law of
large numbers says
01:04:16.880 --> 01:04:22.210
that if you have IID random
variables, 1 over n times
01:04:22.210 --> 01:04:27.400
sum over X_i's converges to mu,
the mean, in some weak sense.
01:04:31.210 --> 01:04:33.730
And the reason it happened
was because this had
01:04:33.730 --> 01:04:39.157
mean mu and variance
sigma square over n.
01:04:43.660 --> 01:04:49.730
We've exploited the fact that
variance vanishes to get this.
01:04:49.730 --> 01:04:53.560
So the question is, what
happens if you replace 1 over n
01:04:53.560 --> 01:04:54.903
by 1 over square root n?
01:04:59.250 --> 01:05:04.590
What happens if-- for
the random variable
01:05:04.590 --> 01:05:08.300
1 over square root n times the sum of the X_i's?
01:05:14.180 --> 01:05:16.990
The reason I'm making this
choice of 1 over square root n
01:05:16.990 --> 01:05:19.310
is because if you
make this choice,
01:05:19.310 --> 01:05:26.330
now the average has mean mu
and variance sigma square just
01:05:26.330 --> 01:05:28.770
as in X_i's.
01:05:28.770 --> 01:05:34.981
So this has the same mean and variance as the X_i.
01:05:40.910 --> 01:05:44.330
Then what should it look like?
01:05:44.330 --> 01:05:46.730
If the random variable is the
same mean and same variance
01:05:46.730 --> 01:05:52.120
as your original random
variable, the distribution
01:05:52.120 --> 01:05:54.795
of this, should it look like
the distribution of X_i?
01:06:00.530 --> 01:06:01.290
If mean is mu.
01:06:01.290 --> 01:06:04.170
Thank you very much.
01:06:04.170 --> 01:06:05.535
The case when mean is 0.
01:06:13.160 --> 01:06:13.660
OK.
01:06:13.660 --> 01:06:17.620
For this special case,
will it look like X_i,
01:06:17.620 --> 01:06:20.820
or will it not look like X_i?
01:06:20.820 --> 01:06:24.260
If it doesn't look like X_i,
can we say anything interesting
01:06:24.260 --> 01:06:27.590
about the distribution of this?
01:06:27.590 --> 01:06:31.480
And central limit theorem
answers this question.
01:06:31.480 --> 01:06:34.980
When I first saw it, I thought
it was really interesting.
01:06:34.980 --> 01:06:37.161
Because normal
distribution comes up here.
01:06:40.250 --> 01:06:42.050
And that's probably
one of the reasons
01:06:42.050 --> 01:06:45.010
that normal distribution
is so universal.
01:06:45.010 --> 01:06:50.310
Because when you take
many independent events
01:06:50.310 --> 01:06:53.270
and take the average
in this sense,
01:06:53.270 --> 01:06:56.765
their distribution converges
to a normal distribution.
01:06:56.765 --> 01:06:57.265
Yes?
01:06:57.265 --> 01:06:59.660
AUDIENCE: How did you get
mean equals [INAUDIBLE]?
01:06:59.660 --> 01:07:00.970
PROFESSOR: I didn't get it.
01:07:00.970 --> 01:07:02.678
I assumed it if X-- yeah.
01:07:29.600 --> 01:07:41.480
So theorem: let
X_1, X_2, to X_n be
01:07:41.480 --> 01:07:51.960
IID random variables with mean,
this time, mu and variance
01:07:51.960 --> 01:07:55.020
sigma squared.
01:07:55.020 --> 01:07:59.308
And let X-- or Y_n.
01:08:01.940 --> 01:08:10.023
Y_n be square root n times
1 over n, times the sum of X_i minus mu.
01:08:24.813 --> 01:08:41.080
Then the distribution
of Y_n converges
01:08:41.080 --> 01:08:50.056
to that of normal distribution
with mean 0 and variance sigma squared.
01:08:55.050 --> 01:08:57.350
What this means-- I'll
write it down again--
01:08:57.350 --> 01:09:01.790
it means for all x,
probability that Y_n
01:09:01.790 --> 01:09:03.790
is less than or
equal to x converges
01:09:03.790 --> 01:09:07.722
to the probability that the normal
distribution is less than
01:09:07.722 --> 01:09:08.910
or equal to x.
01:09:14.140 --> 01:09:16.220
What's really
interesting here is,
01:09:16.220 --> 01:09:20.340
no matter what distribution
you had in the beginning,
01:09:20.340 --> 01:09:24.090
if we average it
out in this sense,
01:09:24.090 --> 01:09:25.965
then you converge to
the normal distribution.
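NOTE
A simulation makes this concrete; here is a minimal sketch (not from the lecture) with Uniform(0, 1) variables, where mu = 0.5 and sigma squared = 1/12:

```python
import math
import random

random.seed(2)

def clt_sample(n):
    # Y_n = sqrt(n) * (sample mean - mu) for n Uniform(0, 1) draws;
    # approximately N(0, 1/12) by the central limit theorem.
    s = sum(random.random() for _ in range(n))
    return math.sqrt(n) * (s / n - 0.5)

sigma = math.sqrt(1 / 12)
ys = [clt_sample(100) for _ in range(20000)]
# For a normal distribution, about 68% of the mass lies within one sigma.
within_one_sigma = sum(abs(y) <= sigma for y in ys) / len(ys)
print(within_one_sigma)
```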
01:09:35.429 --> 01:09:37.720
Any questions about this
statement, or any corrections?
01:09:40.490 --> 01:09:43.545
Any mistakes that I made?
01:09:43.545 --> 01:09:46.015
OK.
01:09:46.015 --> 01:09:47.003
Here's the proof.
01:09:50.970 --> 01:09:54.400
I will prove it when the
moment-generating function
01:09:54.400 --> 01:09:54.900
exists.
01:09:54.900 --> 01:09:56.816
So assume that the
moment-generating functions
01:09:56.816 --> 01:09:58.010
exists.
01:09:58.010 --> 01:10:04.963
So proof assuming
m of X_i exists.
01:10:16.810 --> 01:10:19.860
So remember that theorem.
01:10:19.860 --> 01:10:22.160
Try to recall that
theorem where if you
01:10:22.160 --> 01:10:25.130
know that the moment-generating
function of Y_n's converges
01:10:25.130 --> 01:10:29.250
to the moment-generating
function of the normal, then
01:10:29.250 --> 01:10:30.210
we have the statement.
01:10:30.210 --> 01:10:31.400
The distribution converges.
01:10:31.400 --> 01:10:34.328
So that's the statement
we're going to use.
01:10:34.328 --> 01:10:37.100
That means our goal is to prove
that the moment-generating
01:10:37.100 --> 01:10:43.020
function of these Y_n's converge
to the moment-generating
01:10:43.020 --> 01:10:51.088
function of the normal for
all t, pointwise convergence.
01:10:56.360 --> 01:11:00.080
And this part is well known.
01:11:00.080 --> 01:11:01.455
I'll just write it down.
01:11:01.455 --> 01:11:06.094
It's known to be e to the t
square sigma square over 2.
01:11:08.818 --> 01:11:11.173
That just can be computed.
01:11:18.610 --> 01:11:21.270
So we want to somehow show that
the moment-generating function
01:11:21.270 --> 01:11:25.738
of this Y_n converges to that.
01:11:25.738 --> 01:11:29.440
The moment-generating
function of Y_n
01:11:29.440 --> 01:11:36.102
is equal to expectation
of e to t Y_n.
01:11:42.544 --> 01:11:50.496
e to the t, 1 over square
root n, sum of X_i minus mu.
01:11:54.490 --> 01:11:57.680
And then because each of
the X_i's are independent,
01:11:57.680 --> 01:11:59.403
this sum will split
into products.
01:12:02.650 --> 01:12:14.059
Product of-- let
me split it better.
01:12:14.059 --> 01:12:19.240
It's the expectation of a product-- we
didn't use independence yet.
01:12:19.240 --> 01:12:26.504
Sum becomes products of e to
the t, 1 over square root n, X_i
01:12:26.504 --> 01:12:27.462
minus mu.
01:12:34.650 --> 01:12:36.380
And then because
they're independent,
01:12:36.380 --> 01:12:37.530
this product can go out.
01:12:40.925 --> 01:12:49.996
Equal to the product from 1 to
n of the expectation of e to the t over
01:12:49.996 --> 01:12:50.984
square root n, times X_i minus mu--
01:12:56.160 --> 01:12:56.660
OK.
01:12:56.660 --> 01:12:58.159
Now they're identically
distributed,
01:12:58.159 --> 01:13:00.900
so you just have to take
the n-th power of that.
01:13:00.900 --> 01:13:03.923
That's equal to the
expectation of e
01:13:03.923 --> 01:13:11.920
to the t over square root n,
X_i minus mu, to the n-th power.
01:13:11.920 --> 01:13:15.420
Now we'll do some estimation.
01:13:15.420 --> 01:13:19.450
So use the Taylor
expansion of this.
01:13:19.450 --> 01:13:30.002
What we get is expectation of 1
plus that, t over square root n
01:13:30.002 --> 01:13:36.990
X_i minus mu, plus 1 over
2 factorial, that squared,
01:13:36.990 --> 01:13:43.760
t over square root n,
X_i minus mu, squared,
01:13:43.760 --> 01:13:48.748
plus 1 over 3 factorial,
that cubed plus so on.
01:13:55.050 --> 01:13:57.990
Then that's equal to 1--
Ah, to the n-th power.
01:14:02.920 --> 01:14:06.890
The linearity of
expectation, 1 comes out.
01:14:06.890 --> 01:14:12.830
Second term is 0,
because X_i have mean mu.
01:14:12.830 --> 01:14:15.020
So that disappears.
01:14:15.020 --> 01:14:26.930
This term-- we have 1 over 2,
t squared over n, X_i minus mu
01:14:26.930 --> 01:14:29.370
squared.
01:14:29.370 --> 01:14:31.590
X_i minus mu squared, when
you take expectation,
01:14:31.590 --> 01:14:35.550
that will be sigma squared.
01:14:35.550 --> 01:14:39.720
And then the terms after
that, because we're
01:14:39.720 --> 01:14:42.850
only interested in
proving that for fixed t,
01:14:42.850 --> 01:14:46.160
this converges-- so we're only
proving pointwise convergence.
01:14:46.160 --> 01:14:49.030
You may consider t
as a fixed number.
01:14:49.030 --> 01:14:52.540
So as n goes to infinity--
if n is really, really large,
01:14:52.540 --> 01:14:56.730
all these terms will be
smaller order of magnitude
01:14:56.730 --> 01:15:00.830
than 1 over n.
01:15:00.830 --> 01:15:02.270
Something like that happens.
01:15:08.530 --> 01:15:11.250
And that's happening
because t is fixed.
01:15:11.250 --> 01:15:14.260
For fixed t, we
have to prove it.
01:15:14.260 --> 01:15:16.292
So if we're saying
something uniformly about t,
01:15:16.292 --> 01:15:18.390
that's no longer true.
01:15:18.390 --> 01:15:21.060
Now we go back to
the exponential form.
01:15:21.060 --> 01:15:26.540
So this is pretty much
just e to that term,
01:15:26.540 --> 01:15:30.900
1 over 2 t squared
sigma squared over n
01:15:30.900 --> 01:15:37.370
plus little o of 1 over
n to the n-th power.
01:15:37.370 --> 01:15:42.980
Now, that n can be
multiplied to cancel out.
01:15:42.980 --> 01:15:46.640
And we see that it's e to t
squared sigma squared over 2
01:15:46.640 --> 01:15:48.342
plus the little o of 1.
01:15:48.342 --> 01:15:50.370
So if you take n
to go to infinity,
01:15:50.370 --> 01:15:55.840
that term disappears,
and we prove
01:15:55.840 --> 01:15:57.410
that it converges to that.
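As a numerical sanity check of the limit just proved (an illustration, not part of the proof): with i.i.d. X_i of mean mu and variance sigma squared, the MGF of Y_n should approach e to the t squared sigma squared over 2. The choice of exponential(1) for the X_i (so mu = 1, sigma = 1) is arbitrary, just for the simulation.

```python
import numpy as np

# Monte Carlo estimate of M_{Y_n}(t) = E[exp(t * Y_n)], where
# Y_n = (1/sqrt(n)) * sum_i (X_i - mu), compared with the CLT limit
# exp(t^2 * sigma^2 / 2). X_i ~ exponential(1) is an arbitrary choice.
rng = np.random.default_rng(0)
mu, sigma, t = 1.0, 1.0, 0.5
n, trials = 400, 20_000

x = rng.exponential(scale=1.0, size=(trials, n))    # each row is X_1 .. X_n
y_n = (x - mu).sum(axis=1) / np.sqrt(n)             # one sample of Y_n per row
mgf_empirical = np.exp(t * y_n).mean()              # estimate of E[e^{t Y_n}]
mgf_limit = np.exp(t**2 * sigma**2 / 2)             # e^{t^2 sigma^2 / 2}

print(mgf_empirical, mgf_limit)  # both close to 1.133
```

For moderate n the two numbers already agree to a couple of decimal places, which matches the little-o of 1 over n error term in the proof.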
01:16:00.100 --> 01:16:04.516
And then by the theorem that I
stated before, if we have this,
01:16:04.516 --> 01:16:06.182
we know that the
distribution converges.
01:16:09.880 --> 01:16:10.500
Any questions?
01:16:13.760 --> 01:16:14.260
OK.
01:16:14.260 --> 01:16:15.515
I'll make one final remark.
01:16:29.009 --> 01:16:42.640
So suppose there is a random
variable X whose mean we do not
01:16:42.640 --> 01:16:44.865
know, whose mean is unknown.
01:16:53.670 --> 01:16:55.710
Our goal is to
estimate the mean.
01:16:58.970 --> 01:17:02.730
And one way to do that is by
taking many independent trials
01:17:02.730 --> 01:17:05.220
of this random variable.
01:17:05.220 --> 01:17:21.680
So take independent trials X_1,
X_2, up to X_n, and use 1 over n times
01:17:21.680 --> 01:17:22.250
X_1 plus...
01:17:22.250 --> 01:17:23.565
plus X_n as our estimator.
01:17:32.960 --> 01:17:34.990
Then the law of large
numbers says that this
01:17:34.990 --> 01:17:36.750
will be very close to the mean.
01:17:36.750 --> 01:17:39.840
So if you take n
to be large enough,
01:17:39.840 --> 01:17:42.100
you will more than likely
have some value which
01:17:42.100 --> 01:17:44.190
is very close to the mean.
01:17:44.190 --> 01:17:47.050
And then the central
limit theorem
01:17:47.050 --> 01:17:53.530
tells you how the
distribution of this variable
01:17:53.530 --> 01:17:55.915
is around the mean.
01:17:55.915 --> 01:17:57.920
So we don't know what
the real value is,
01:17:57.920 --> 01:18:00.620
but we know that
the distribution
01:18:00.620 --> 01:18:02.980
of the value that
we will obtain here
01:18:02.980 --> 01:18:05.048
is something like
that around the mean.
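The setup just described can be sketched in a short simulation. The distribution of X here (uniform on [0, 4], so mu = 2 and sigma squared = 4/3) is an arbitrary choice, since in practice the mean would be unknown.

```python
import numpy as np

# Estimate an (in practice unknown) mean mu by the sample mean
# (X_1 + ... + X_n) / n over n independent trials, repeated many times
# to see the distribution of the estimator around mu.
rng = np.random.default_rng(1)
mu, var = 2.0, 4.0 / 3.0
n, trials = 10_000, 500

samples = rng.uniform(0.0, 4.0, size=(trials, n))
estimates = samples.mean(axis=1)        # one sample-mean estimate per trial

# Law of large numbers: each estimate lands very close to mu.
print(estimates.mean())                 # near 2.0
# Central limit theorem: the estimates fluctuate around mu with
# standard deviation close to sigma / sqrt(n).
print(estimates.std(), np.sqrt(var / n))
```

The second printed pair shows the CLT prediction: the spread of the estimator shrinks like sigma over square root n.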
01:18:09.340 --> 01:18:17.080
And because the normal distribution
has very small tails,
01:18:17.080 --> 01:18:21.900
the tail probabilities
are really small,
01:18:21.900 --> 01:18:23.950
we will get really
close really fast.
01:18:27.290 --> 01:18:34.387
And this is known as the maximum
likelihood estimator, isn't it?
01:18:37.670 --> 01:18:38.310
OK, yeah.
01:18:38.310 --> 01:18:39.980
For some distributions,
it's better
01:18:39.980 --> 01:18:44.080
to take some other estimator.
01:18:44.080 --> 01:18:47.280
Which is quite interesting.
01:18:47.280 --> 01:18:50.015
At least my intuition says to
take this in every single case,
01:18:50.015 --> 01:18:52.890
it looks like that would
be a good choice.
01:18:52.890 --> 01:18:54.680
But it turns out that
that's not the case;
01:18:54.680 --> 01:18:59.492
for some distributions there's
a better choice than this.
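One classical illustration of this point (my example, not one given in the lecture): for the standard Cauchy distribution the mean does not even exist, so the sample mean never settles down, while the sample median does concentrate around the center.

```python
import numpy as np

# Hypothetical illustration: standard Cauchy centered at 0.
# The sample mean of n Cauchy variables is again standard Cauchy,
# so it does not concentrate; the sample median does.
rng = np.random.default_rng(2)
n, trials = 10_000, 200

samples = rng.standard_cauchy(size=(trials, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print(np.median(np.abs(means)))    # typically of order 1: no concentration
print(np.median(np.abs(medians)))  # small: concentrates around 0
```

So for heavy-tailed distributions an estimator other than the sample mean can indeed be the better choice.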
01:18:59.492 --> 01:19:03.210
And Peter will
later talk about it.
01:19:06.340 --> 01:19:09.960
If you're interested
in it, come back.
01:19:09.960 --> 01:19:13.960
And that's it for
today, any questions?
01:19:13.960 --> 01:19:17.875
So next Tuesday we will
have an outside speaker,
01:19:17.875 --> 01:19:21.256
and it will be on bonds.
01:19:21.256 --> 01:19:24.883
And I don't think anything from
linear algebra will be needed there.