1
00:00:00,060 --> 00:00:02,500
The following content is
provided under a Creative
2
00:00:02,500 --> 00:00:04,019
Commons license.
3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare
4
00:00:06,360 --> 00:00:10,730
continue to offer high quality
educational resources for free.
5
00:00:10,730 --> 00:00:13,340
To make a donation or
view additional materials
6
00:00:13,340 --> 00:00:17,217
from hundreds of MIT courses,
visit MIT OpenCourseWare
7
00:00:17,217 --> 00:00:17,842
at ocw.mit.edu.
8
00:00:21,520 --> 00:00:25,260
PROFESSOR: OK, so
good afternoon.
9
00:00:25,260 --> 00:00:30,820
Today, we will review
probability theory.
10
00:00:30,820 --> 00:00:36,090
So I will mostly focus on-- I'll
give you some distributions.
11
00:00:36,090 --> 00:00:38,830
So probability distributions
that will be of interest to us
12
00:00:38,830 --> 00:00:40,830
throughout the course.
13
00:00:40,830 --> 00:00:44,610
And I will talk about
moment-generating function
14
00:00:44,610 --> 00:00:46,120
a little bit.
15
00:00:46,120 --> 00:00:50,660
Afterwards, I will talk
about law of large numbers
16
00:00:50,660 --> 00:00:52,210
and central limit theorem.
17
00:00:56,310 --> 00:01:00,680
Who has heard of all
of these topics before?
18
00:01:00,680 --> 00:01:02,150
OK.
19
00:01:02,150 --> 00:01:04,120
That's good.
20
00:01:04,120 --> 00:01:06,624
Then I'll try to focus
a little bit more
21
00:01:06,624 --> 00:01:07,540
on the advanced stuff.
22
00:01:10,890 --> 00:01:13,830
Then a big part of it
will be review for you.
23
00:01:13,830 --> 00:01:18,260
So first of all, just to
agree on terminology, let's
24
00:01:18,260 --> 00:01:21,490
review some definitions.
25
00:01:21,490 --> 00:01:32,670
So a random variable
X-- we will talk
26
00:01:32,670 --> 00:01:38,900
about discrete and
continuous random variables.
27
00:01:43,310 --> 00:01:47,240
Just to set up the notation,
I will write discrete as X
28
00:01:47,240 --> 00:01:50,130
and continuous random
variable as Y for now.
29
00:01:50,130 --> 00:01:52,820
So they are given by its
probability distribution--
30
00:01:52,820 --> 00:01:57,070
discrete random variable is
given by its probability mass
31
00:01:57,070 --> 00:02:02,490
function, f sub
X, I will denote.
32
00:02:02,490 --> 00:02:06,900
And continuous is given by
probability distribution
33
00:02:06,900 --> 00:02:07,399
function.
34
00:02:11,530 --> 00:02:17,745
I will denote by f
sub Y. So pmf and pdf.
35
00:02:22,210 --> 00:02:23,930
Here, I just use a
subscript because I
36
00:02:23,930 --> 00:02:26,030
wanted to distinguish
f sub X and f sub Y.
37
00:02:26,030 --> 00:02:29,140
But when it's clear which random
variable we're talking about,
38
00:02:29,140 --> 00:02:32,190
I'll just say f.
39
00:02:32,190 --> 00:02:33,740
So what is this?
40
00:02:33,740 --> 00:02:42,980
A probability mass function is
a function from the sample space
41
00:02:42,980 --> 00:02:50,290
to non-negative reals such
that the sum over all points
42
00:02:50,290 --> 00:02:54,480
in the domain equals 1.
43
00:02:54,480 --> 00:02:57,110
The probability distribution
is very similar.
44
00:02:59,730 --> 00:03:02,890
A function from the
sample space to non-negative
45
00:03:02,890 --> 00:03:07,500
reals, but now the integration
over the domain equals 1.
46
00:03:11,780 --> 00:03:16,650
So it's pretty much safe to
consider our sample space
47
00:03:16,650 --> 00:03:20,570
to be the real numbers for
continuous random variables.
48
00:03:20,570 --> 00:03:23,960
Later in the course, you
will see some examples where
49
00:03:23,960 --> 00:03:25,230
it's not the real numbers.
50
00:03:25,230 --> 00:03:29,217
But for now, just consider
it as real numbers.
51
00:03:34,840 --> 00:03:39,412
For example, probability
mass function.
52
00:03:39,412 --> 00:03:46,810
If X takes 1 with
probability 1/3,
53
00:03:46,810 --> 00:03:53,010
minus 1 with probability 1/3,
and 0 with probability 1/3.
54
00:03:56,070 --> 00:04:01,464
Then our probability mass
function is f_X(1) equals
55
00:04:01,464 --> 00:04:08,370
f_X(-1) equals f_X(0) equals 1/3.
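As a quick aside (not part of the lecture), that three-point pmf can be written out in Python to check that it sums to 1; the name f_X below just mirrors the board notation:

```python
from fractions import Fraction

# pmf of X: takes the values 1, -1, 0, each with probability 1/3
f_X = {1: Fraction(1, 3), -1: Fraction(1, 3), 0: Fraction(1, 3)}

# a probability mass function must be non-negative and sum to 1
total = sum(f_X.values())
```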
56
00:04:08,370 --> 00:04:11,820
An example of a
continuous random variable
57
00:04:11,820 --> 00:04:17,470
is if-- let's say, for
example, if f sub Y is
58
00:04:17,470 --> 00:04:25,420
equal to 1 for all
y in [0,1], then
59
00:04:25,420 --> 00:04:36,305
this is pdf of uniform
random variable
60
00:04:36,305 --> 00:04:39,800
where the space is [0,1].
61
00:04:39,800 --> 00:04:41,850
So this random variable
just picks one out
62
00:04:41,850 --> 00:04:44,330
of the three numbers
with equal probability.
63
00:04:44,330 --> 00:04:47,450
This picks one out of this,
all the real numbers between 0
64
00:04:47,450 --> 00:04:51,600
and 1, with equal probability.
65
00:04:51,600 --> 00:04:54,956
These are just some basic stuff.
66
00:04:54,956 --> 00:04:56,330
You should be
familiar with this,
67
00:04:56,330 --> 00:05:00,934
but I wrote it down just so
that we agree on the notation.
68
00:05:00,934 --> 00:05:01,858
OK.
69
00:05:01,858 --> 00:05:03,353
Both of the boards don't slide.
70
00:05:03,353 --> 00:05:06,311
That's good.
71
00:05:06,311 --> 00:05:08,490
A few more things.
72
00:05:08,490 --> 00:05:14,530
Expectation-- probability first.
73
00:05:14,530 --> 00:05:22,092
Probability of an event can be
computed as probability of A
74
00:05:22,092 --> 00:05:28,200
is equal to either sum of all
points in A-- of the probability
75
00:05:28,200 --> 00:05:36,700
mass function-- or
integral over the set A
76
00:05:36,700 --> 00:05:39,540
depending on what you're using.
77
00:05:39,540 --> 00:05:50,050
And expectation, or mean
is-- expectation of X
78
00:05:50,050 --> 00:05:55,410
is equal to the sum over
all x of x times f_X(x).
79
00:05:55,410 --> 00:06:01,110
And expectation of Y is
the integral over omega.
80
00:06:01,110 --> 00:06:02,580
Oh, sorry.
81
00:06:02,580 --> 00:06:04,540
Space.
82
00:06:04,540 --> 00:06:05,538
y times f_Y(y).
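A small illustrative check (not from the lecture) of both formulas, using the three-point X from before and the uniform Y on [0, 1]; the midpoint Riemann sum below is just one crude way to approximate the integral:

```python
from fractions import Fraction

# discrete: E[X] = sum over x of x * f_X(x)
f_X = {1: Fraction(1, 3), -1: Fraction(1, 3), 0: Fraction(1, 3)}
E_X = sum(x * p for x, p in f_X.items())  # by symmetry this should be 0

# continuous: E[Y] = integral over [0, 1] of y * f_Y(y) dy with f_Y(y) = 1,
# approximated here by a midpoint Riemann sum
n = 100000
E_Y = sum(((i + 0.5) / n) * (1.0 / n) for i in range(n))  # should be near 1/2
```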
83
00:06:11,016 --> 00:06:12,520
OK.
84
00:06:12,520 --> 00:06:16,850
And one more basic
concept I'd like to review
85
00:06:16,850 --> 00:06:32,150
is two random variables X_1, X_2
are independent if probability
86
00:06:32,150 --> 00:06:38,220
that X_1 is in A and
X_2 is in B equals
87
00:06:38,220 --> 00:06:48,898
the product of the
probabilities, for all events A
88
00:06:48,898 --> 00:06:54,222
and B. OK.
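To make the definition concrete, here is a hypothetical two-dice sketch (not the lecture's example) checking the product rule for one particular pair of events A and B:

```python
from fractions import Fraction
from itertools import product

# X_1, X_2: two independent fair dice; the joint pmf is uniform on 36 outcomes
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

def prob(event):
    # probability of an event = sum of the pmf over outcomes in the event
    return sum(p for w, p in joint.items() if event(w))

A = lambda w: w[0] <= 2       # an event about X_1 only
B = lambda w: w[1] % 2 == 0   # an event about X_2 only
lhs = prob(lambda w: A(w) and B(w))   # P(X_1 in A and X_2 in B)
rhs = prob(A) * prob(B)               # product of the probabilities
```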
89
00:06:57,610 --> 00:06:59,570
All agreed?
90
00:06:59,570 --> 00:07:01,910
So for independence, I will
talk about independence
91
00:07:01,910 --> 00:07:04,570
of several random
variables as well.
92
00:07:04,570 --> 00:07:09,290
There are two concepts
of independence--
93
00:07:09,290 --> 00:07:10,760
not two, but several.
94
00:07:10,760 --> 00:07:17,220
The two most popular are
mutually independent events
95
00:07:17,220 --> 00:07:19,110
and pairwise independent events.
96
00:07:23,583 --> 00:07:27,060
Can somebody tell me the
difference between these two
97
00:07:27,060 --> 00:07:28,865
for several variables?
98
00:07:33,230 --> 00:07:34,200
Yes?
99
00:07:34,200 --> 00:07:35,655
AUDIENCE: So
usually, independent
100
00:07:35,655 --> 00:07:38,640
means all the random
variables are independent,
101
00:07:38,640 --> 00:07:42,550
like X_1 is independent
with every others.
102
00:07:42,550 --> 00:07:46,610
But pairwise means X_1
and X_2 are independent,
103
00:07:46,610 --> 00:07:51,677
but X_1, X_2, and X_3, they
may not be independent.
104
00:07:51,677 --> 00:07:52,260
PROFESSOR: OK.
105
00:07:52,260 --> 00:07:54,940
Maybe-- yeah.
106
00:07:54,940 --> 00:07:57,020
So that's good.
107
00:07:57,020 --> 00:08:04,420
So let's see-- for the example
of three random variables,
108
00:08:04,420 --> 00:08:07,770
it might be the case that
each pair are independent.
109
00:08:07,770 --> 00:08:10,110
X_1 is
independent of X_2,
110
00:08:10,110 --> 00:08:12,940
X_1 is independent of
X_3, and X_2 of X_3.
111
00:08:12,940 --> 00:08:15,290
But altogether, they're
not independent.
112
00:08:15,290 --> 00:08:20,780
What that means is, this type
of statement is not true.
113
00:08:20,780 --> 00:08:25,200
So there are say A_1, A_2, A_3
for which this does not hold.
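A standard concrete instance (an aside, not the lecture's example) is two fair coin flips X_1, X_2 and X_3 = X_1 XOR X_2: every pair is independent, but the triple product rule fails:

```python
from fractions import Fraction
from itertools import product

# outcomes (x1, x2, x1 XOR x2); each of the four (x1, x2) pairs has prob 1/4
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]
p = Fraction(1, 4)

def prob(event):
    return sum(p for w in outcomes if event(w))

# pairwise independence: P(X_i = 1, X_j = 1) = P(X_i = 1) * P(X_j = 1)
pairwise = all(
    prob(lambda w: w[i] == 1 and w[j] == 1)
    == prob(lambda w: w[i] == 1) * prob(lambda w: w[j] == 1)
    for i, j in [(0, 1), (0, 2), (1, 2)]
)

# but mutual independence fails: X_3 is determined by X_1 and X_2
triple_lhs = prob(lambda w: w == (1, 1, 1))  # impossible, since 1 XOR 1 = 0
triple_rhs = Fraction(1, 2) ** 3             # product of the three marginals
```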
114
00:08:25,200 --> 00:08:28,150
But that's just some
technical detail.
115
00:08:28,150 --> 00:08:30,960
We will mostly just consider
mutually independent events.
116
00:08:30,960 --> 00:08:32,960
So when we say that several
random variables are
117
00:08:32,960 --> 00:08:36,630
independent, it just means
whatever collection you take,
118
00:08:36,630 --> 00:08:37,742
they're all independent.
119
00:08:43,995 --> 00:08:44,960
OK.
120
00:08:44,960 --> 00:08:47,780
So a little bit more fun
stuff in this overview.
121
00:08:50,640 --> 00:08:54,275
So we defined random variables.
122
00:08:54,275 --> 00:08:59,060
And one of the most
universal random variables,
123
00:08:59,060 --> 00:09:02,310
or distribution, is a
normal distribution.
124
00:09:10,920 --> 00:09:14,450
It's a continuous
random variable.
125
00:09:14,450 --> 00:09:21,160
A continuous random variable
126
00:09:21,160 --> 00:09:29,835
is said to have normal
distribution, if-- N(mu,
127
00:09:29,835 --> 00:09:40,380
sigma)-- if the probability
distribution function is given
128
00:09:40,380 --> 00:09:46,820
as 1 over sigma
square root 2 pi,
129
00:09:46,820 --> 00:09:50,830
e to the minus (x minus mu)
squared over 2 sigma squared,
130
00:09:57,270 --> 00:10:01,194
For all reals.
131
00:10:01,194 --> 00:10:04,146
OK?
132
00:10:04,146 --> 00:10:12,500
So mu is the mean--
that's one of the most
133
00:10:12,500 --> 00:10:17,050
universal random variables--
distributions, the most
134
00:10:17,050 --> 00:10:18,100
important one as well.
135
00:10:28,990 --> 00:10:29,870
OK.
136
00:10:29,870 --> 00:10:33,150
So this distribution, how
it looks like-- I'm sure
137
00:10:33,150 --> 00:10:36,043
you saw this bell curve before.
138
00:10:36,043 --> 00:10:42,351
It looks like this if
it's N(0,1), let's say.
139
00:10:42,351 --> 00:10:45,420
And that's your y.
140
00:10:45,420 --> 00:10:48,360
So it's centered
around the origin,
141
00:10:48,360 --> 00:10:52,090
and it's symmetric
about the origin.
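As a numerical aside (not from the lecture), writing out the N(0, 1) density confirms both properties: the symmetry about the origin, and that the total mass integrates to 1 (approximated by a Riemann sum here):

```python
import math

def phi(x, mu=0.0, sigma=1.0):
    # normal density: (1 / (sigma * sqrt(2 pi))) * exp(-(x - mu)^2 / (2 sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# symmetric about the origin for N(0, 1)
symmetric = abs(phi(1.7) - phi(-1.7)) < 1e-15

# total probability: midpoint Riemann sum over [-10, 10] (tails are negligible)
n = 200000
width = 20.0 / n
total = sum(phi(-10.0 + (i + 0.5) * width) * width for i in range(n))
```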
142
00:10:52,090 --> 00:10:55,290
So now let's look
at our purpose.
143
00:10:55,290 --> 00:10:56,850
Let's think about our purpose.
144
00:10:56,850 --> 00:11:01,940
We want to model a financial
product or a stock,
145
00:11:01,940 --> 00:11:05,350
the price of the stock,
using some random variable.
146
00:11:05,350 --> 00:11:09,065
The first thing you can try
is to use normal distribution.
147
00:11:09,065 --> 00:11:10,690
Taking the price itself to be
normal doesn't make sense,
148
00:11:10,690 --> 00:11:19,586
but we can say the price at
day n minus the price at day n
149
00:11:19,586 --> 00:11:21,615
minus 1 has normal distribution.
150
00:11:25,575 --> 00:11:29,440
Is this a sensible definition?
151
00:11:29,440 --> 00:11:30,637
Not really.
152
00:11:30,637 --> 00:11:31,720
So it's not a good choice.
153
00:11:31,720 --> 00:11:35,810
You can model it like this,
but it's not a good choice.
154
00:11:35,810 --> 00:11:38,050
There may be several
reasons, but one reason
155
00:11:38,050 --> 00:11:40,860
is that it doesn't take into
account the order of magnitude
156
00:11:40,860 --> 00:11:42,110
of the price itself.
157
00:11:42,110 --> 00:11:49,487
So the stock-- let's say
you have a stock price that
158
00:11:49,487 --> 00:11:52,730
goes something like that.
159
00:11:52,730 --> 00:11:58,620
And say it was $10
here, and $50 here.
160
00:11:58,620 --> 00:12:01,890
Regardless of where
your position is at,
161
00:12:01,890 --> 00:12:05,900
it says that the increment,
the absolute value of increment
162
00:12:05,900 --> 00:12:11,080
is identically distributed at
this point and at this point.
163
00:12:11,080 --> 00:12:14,770
But if you observed
how it works,
164
00:12:14,770 --> 00:12:18,040
usually that's not
normally distributed.
165
00:12:18,040 --> 00:12:21,800
What's normally distributed
is the percentage
166
00:12:21,800 --> 00:12:24,610
of how much it changes daily.
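Here is a deterministic toy comparison (my illustration, not the lecture's): after a run of equally "bad" days, an additive model walks below zero, while a percentage-change model stays positive:

```python
# hypothetical starting price and a streak of 20 bad days
p_additive = 10.0     # model: P_n = P_{n-1} + increment, increment = -1 each day
p_percentage = 10.0   # model: P_n = P_{n-1} * (1 + r), r = -10% each day
for _ in range(20):
    p_additive += -1.0
    p_percentage *= 0.9

# a real price cannot be negative, but the additive model's is;
# the multiplicative model shrinks yet remains positive
```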
167
00:12:24,610 --> 00:12:32,125
So this is not a sensible
model, not a good model.
168
00:12:35,910 --> 00:12:41,200
But still, we can use
normal distribution
169
00:12:41,200 --> 00:12:42,830
to come up with a
pretty good model.
170
00:12:49,170 --> 00:13:06,130
So instead, what we want
is a relative difference
171
00:13:06,130 --> 00:13:07,892
to be normally distributed.
172
00:13:15,680 --> 00:13:16,720
That is the percent.
173
00:13:26,760 --> 00:13:33,150
The question is, what is
the distribution of price?
174
00:13:33,150 --> 00:13:34,826
What is the
distribution of the price?
175
00:13:45,750 --> 00:13:48,660
So it's not a very
good explanation.
176
00:13:48,660 --> 00:13:52,860
Because I'm giving just
discrete increments while
177
00:13:52,860 --> 00:13:55,770
these are continuous
random variables and so on.
178
00:13:55,770 --> 00:13:59,030
But what I'm trying to say here
is that normal distribution
179
00:13:59,030 --> 00:14:00,500
is not good enough.
180
00:14:00,500 --> 00:14:03,360
Instead, we want the
percentage change
181
00:14:03,360 --> 00:14:05,450
to be normally distributed.
182
00:14:05,450 --> 00:14:11,300
And if that is the case,
what will be the distribution
183
00:14:11,300 --> 00:14:13,066
of the random variable?
184
00:14:13,066 --> 00:14:15,440
In this case, what will be
the distribution of the price?
185
00:14:27,420 --> 00:14:30,250
One thing I should
mention is, in this case,
186
00:14:30,250 --> 00:14:34,230
if each increment is
normally distributed,
187
00:14:34,230 --> 00:14:39,530
then the price at
day n will still
188
00:14:39,530 --> 00:14:44,270
be a normal random variable
distributed like that.
189
00:14:47,440 --> 00:14:53,900
So if there's no tendency-- if
the average daily increment is
190
00:14:53,900 --> 00:14:56,832
0, then no matter
how far you go,
191
00:14:56,832 --> 00:14:58,915
your random variable will
be normally distributed.
192
00:15:02,230 --> 00:15:06,110
But here, that will
not be the case.
193
00:15:06,110 --> 00:15:08,785
So we want to see what
the distribution of P_n
194
00:15:08,785 --> 00:15:11,981
will be in this case.
195
00:15:11,981 --> 00:15:12,480
OK.
196
00:15:17,820 --> 00:15:29,300
To do that-- let me formally
write down what I want to say.
197
00:15:29,300 --> 00:15:34,008
What I want to say is this.
198
00:15:34,008 --> 00:15:46,030
I want to define a
log-normal distribution Y,
199
00:15:46,030 --> 00:16:07,274
or log-normal random variable
Y, such that log of Y
200
00:16:07,274 --> 00:16:08,762
is normally distributed.
201
00:16:24,170 --> 00:16:26,670
So to derive the probability
distribution of this
202
00:16:26,670 --> 00:16:28,220
from the normal
distribution, we can
203
00:16:28,220 --> 00:16:40,010
use the change of
variable formula, which
204
00:16:40,010 --> 00:16:47,340
says the following:
suppose X and Y
205
00:16:47,340 --> 00:17:16,781
are random variables such
that-- the probability that X
206
00:17:16,781 --> 00:17:26,262
is at most x equals the probability
that Y is at most h(x)-- for all x.
207
00:17:32,250 --> 00:17:48,218
Then f sub X of x
is equal to f
208
00:17:48,218 --> 00:17:52,709
sub Y of h(x) times
209
00:17:58,198 --> 00:17:59,196
the derivative h'(x).
210
00:18:07,200 --> 00:18:11,930
So let's try to fit
into this story.
211
00:18:11,930 --> 00:18:14,920
We want to have a
random variable Y such
212
00:18:14,920 --> 00:18:18,510
that log Y is
normally distributed.
213
00:18:18,510 --> 00:18:26,430
Here-- so you can
put log of x here.
214
00:18:26,430 --> 00:18:30,300
If Y is normally distributed,
X will be the distribution
215
00:18:30,300 --> 00:18:32,890
that we're interested in.
216
00:18:32,890 --> 00:18:37,870
So using this formula, we can
find probability distribution
217
00:18:37,870 --> 00:18:40,650
function of the log-normal
distribution using
218
00:18:40,650 --> 00:18:43,720
the probability
distribution of normal.
219
00:18:43,720 --> 00:18:44,810
So let's do that.
220
00:19:05,669 --> 00:19:10,659
AUDIENCE: [INAUDIBLE], right?
221
00:19:10,659 --> 00:19:12,910
PROFESSOR: Yes.
222
00:19:12,910 --> 00:19:15,006
So it's not a good choice.
223
00:19:15,006 --> 00:19:16,380
Locally, it might
be good choice.
224
00:19:16,380 --> 00:19:20,357
But if it's taken
over a long time,
225
00:19:20,357 --> 00:19:21,440
it won't be a good choice.
226
00:19:21,440 --> 00:19:24,398
Because it will also take
negative values, for example.
227
00:19:28,517 --> 00:19:30,100
So if you just take
this model, what's
228
00:19:30,100 --> 00:19:31,849
going to happen over
a long period of time
229
00:19:31,849 --> 00:19:35,730
is it's going to hit
this square root of n,
230
00:19:35,730 --> 00:19:38,090
negative square root of
n line infinitely often.
231
00:19:42,050 --> 00:19:44,620
And then it can
go up to infinity,
232
00:19:44,620 --> 00:19:47,470
or it can go down to
infinity eventually.
233
00:19:47,470 --> 00:19:49,720
So it will take negative
values and positive values.
234
00:19:53,310 --> 00:19:55,460
That's one reason, but
there are several reasons
235
00:19:55,460 --> 00:19:57,970
why that's not a good choice.
236
00:19:57,970 --> 00:19:59,440
If you look at a
very small scale,
237
00:19:59,440 --> 00:20:03,610
it might be OK, because the base
price doesn't change that much.
238
00:20:03,610 --> 00:20:05,490
So if you model
in terms of ratio,
239
00:20:05,490 --> 00:20:07,930
or if you model it
in an absolute way,
240
00:20:07,930 --> 00:20:09,830
it doesn't matter that much.
241
00:20:09,830 --> 00:20:13,850
But if you want to do it a
little bit more large scale,
242
00:20:13,850 --> 00:20:17,890
then that's not a
very good choice.
243
00:20:17,890 --> 00:20:20,120
Other questions?
244
00:20:20,120 --> 00:20:21,745
Do you want me to
add some explanation?
245
00:20:25,322 --> 00:20:25,822
OK.
246
00:20:29,580 --> 00:20:32,720
So let me get this right.
247
00:20:37,120 --> 00:20:45,440
Y. I want X to be-- yes.
248
00:20:45,440 --> 00:20:49,950
I want X to be the
log-normal distribution.
249
00:20:56,950 --> 00:21:04,580
And I want Y to be
normal distribution
250
00:21:04,580 --> 00:21:07,190
or a normal random variable.
251
00:21:07,190 --> 00:21:12,572
Then the probability
that X is at most x
252
00:21:12,572 --> 00:21:24,500
equals the probability
that Y is at most-- sigma.
253
00:21:24,500 --> 00:21:29,070
Y is at most log x.
254
00:21:29,070 --> 00:21:33,160
That's the definition of
log-normal distribution.
255
00:21:33,160 --> 00:21:39,130
Then by using this change
of variable formula,
256
00:21:39,130 --> 00:21:41,780
probability density
function of X
257
00:21:41,780 --> 00:21:46,980
is equal to probability
density function of Y at log
258
00:21:46,980 --> 00:21:54,440
x times the differentiation
of log x which is 1 over x.
259
00:21:54,440 --> 00:22:00,460
So it becomes 1 over
x sigma square root
260
00:22:00,460 --> 00:22:07,704
2 pi, e to the minus
(log x minus mu) squared over 2 sigma squared.
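To sanity-check the derivation numerically (an aside, with mu = 0 and sigma = 1 assumed), integrating this density up to x should agree with the normal CDF evaluated at ln x:

```python
import math

MU, SIGMA = 0.0, 1.0  # assumed parameters for this check

def f_X(x):
    # log-normal density derived above: f_Y(ln x) * d(ln x)/dx
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-z * z / 2) / (x * SIGMA * math.sqrt(2 * math.pi))

def Phi(y):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf((y - MU) / (SIGMA * math.sqrt(2.0))))

# P(X <= 2) two ways: integrate f_X on (0, 2], or evaluate P(Y <= ln 2)
n = 200000
width = 2.0 / n
lhs = sum(f_X((i + 0.5) * width) * width for i in range(n))
rhs = Phi(math.log(2.0))
```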
261
00:22:11,610 --> 00:22:13,430
So log-normal
distribution can also
262
00:22:13,430 --> 00:22:15,380
be defined as the
distribution which has
263
00:22:15,380 --> 00:22:17,246
probability density function this.
264
00:22:22,650 --> 00:22:26,160
You can use either definition.
265
00:22:26,160 --> 00:22:29,391
Let me just make sure that I
didn't mess up in the middle.
266
00:22:32,800 --> 00:22:33,780
Yes.
267
00:22:33,780 --> 00:22:39,187
And that only works
for x greater than 0.
268
00:22:39,187 --> 00:22:39,687
Yes?
269
00:22:39,687 --> 00:22:41,714
AUDIENCE: [INAUDIBLE]?
270
00:22:41,714 --> 00:22:42,380
PROFESSOR: Yeah.
271
00:22:42,380 --> 00:22:43,940
So all logs are natural log.
272
00:22:43,940 --> 00:22:46,171
It should be ln.
273
00:22:46,171 --> 00:22:46,670
Yeah.
274
00:22:46,670 --> 00:22:48,320
Thank you.
275
00:22:48,320 --> 00:22:49,810
OK.
276
00:22:49,810 --> 00:22:58,370
So question-- what's the mean
of this distribution here?
277
00:22:58,370 --> 00:22:58,870
Yeah?
278
00:22:58,870 --> 00:23:00,970
AUDIENCE: 1?
279
00:23:00,970 --> 00:23:02,460
PROFESSOR: Not 1.
280
00:23:02,460 --> 00:23:04,820
It might be mu.
281
00:23:04,820 --> 00:23:07,500
Is it mu?
282
00:23:07,500 --> 00:23:08,260
Oh, sorry.
283
00:23:08,260 --> 00:23:09,850
It might be e to the mu.
284
00:23:09,850 --> 00:23:15,470
Because log X, the normal
distribution had mean mu.
285
00:23:15,470 --> 00:23:17,630
log x equals mu
might be the center.
286
00:23:17,630 --> 00:23:20,850
If that's the case, x is e
to the mu will be the mean.
287
00:23:20,850 --> 00:23:23,915
Is that the case?
288
00:23:23,915 --> 00:23:24,415
Yes?
289
00:23:24,415 --> 00:23:27,890
AUDIENCE: Can you get
the mu minus [INAUDIBLE]?
290
00:23:27,890 --> 00:23:29,760
PROFESSOR: Probably right.
291
00:23:29,760 --> 00:23:31,070
I don't remember what's there.
292
00:23:31,070 --> 00:23:32,490
There is a correcting factor.
293
00:23:32,490 --> 00:23:34,292
I don't remember
exactly what that is,
294
00:23:34,292 --> 00:23:37,210
but I think you're right.
295
00:23:37,210 --> 00:23:39,770
So one very important
thing to remember
296
00:23:39,770 --> 00:23:43,500
is log-normal
distributions are referred
297
00:23:43,500 --> 00:23:48,150
to in terms of the
parameters mu and sigma,
298
00:23:48,150 --> 00:23:50,510
because that's the mu and
sigma up here and here coming
299
00:23:50,510 --> 00:23:52,600
from the normal distribution.
300
00:23:52,600 --> 00:23:57,580
But those are not the
mean and variance anymore,
301
00:23:57,580 --> 00:24:01,900
because you skew
the distribution.
302
00:24:01,900 --> 00:24:03,700
It's no longer centered at mu.
303
00:24:03,700 --> 00:24:07,490
log X is centered at mu, but
when it takes exponential,
304
00:24:07,490 --> 00:24:08,590
it becomes skewed.
305
00:24:08,590 --> 00:24:12,630
And when we take the average,
you'll see that the mean
306
00:24:12,630 --> 00:24:13,930
is no longer e to the mu.
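For the record (a numerical aside, not worked in class), the correcting factor is E[X] = e^(mu + sigma^2 / 2), which we can confirm by integrating x times the log-normal density for assumed parameter values:

```python
import math

MU, SIGMA = 0.5, 0.8  # parameters of log X -- NOT the mean/variance of X

def f_X(x):
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-z * z / 2) / (x * SIGMA * math.sqrt(2 * math.pi))

# E[X] by a midpoint Riemann sum of x * f_X(x) on (0, 100]
n = 400000
width = 100.0 / n
mean = 0.0
for i in range(n):
    x = (i + 0.5) * width
    mean += x * f_X(x) * width

closed_form = math.exp(MU + SIGMA ** 2 / 2)  # the known log-normal mean
```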
307
00:24:13,930 --> 00:24:16,365
So that doesn't give the mean.
308
00:24:16,365 --> 00:24:18,490
That doesn't imply that
the mean is e to the sigma.
309
00:24:18,490 --> 00:24:20,870
That doesn't imply
that the variance is
310
00:24:20,870 --> 00:24:23,242
something like e to the sigma.
311
00:24:23,242 --> 00:24:27,040
That's just totally nonsense.
312
00:24:27,040 --> 00:24:30,080
Just remember-- these are just
parameters, some parameters.
313
00:24:30,080 --> 00:24:32,450
It's no longer mean or variance.
314
00:24:35,670 --> 00:24:39,794
And in your homework,
one exercise,
315
00:24:39,794 --> 00:24:41,710
we'll ask you to compute
the mean and variance
316
00:24:41,710 --> 00:24:44,490
of the random variable.
317
00:24:44,490 --> 00:24:48,560
But really, just try to
have it stick in your mind
318
00:24:48,560 --> 00:24:53,160
that mu and sigma is no
longer mean and variance.
319
00:24:53,160 --> 00:24:56,230
That's only the case for
normal random variables.
320
00:24:56,230 --> 00:24:58,380
And the reason we are
still using mu and sigma
321
00:24:58,380 --> 00:25:00,680
is because of this derivation.
322
00:25:00,680 --> 00:25:02,390
And it's easy to
describe it in those.
323
00:25:05,830 --> 00:25:07,940
OK.
324
00:25:07,940 --> 00:25:11,800
So the normal distribution
and log-normal distribution
325
00:25:11,800 --> 00:25:13,720
will probably be
the distributions
326
00:25:13,720 --> 00:25:15,742
that you'll see the most
throughout the course.
327
00:25:15,742 --> 00:25:17,325
But there are some
other distributions
328
00:25:17,325 --> 00:25:18,500
that you'll also see.
329
00:25:23,460 --> 00:25:24,948
I need this.
330
00:25:32,884 --> 00:25:35,650
I will not talk
about it in detail.
331
00:25:35,650 --> 00:25:38,540
It will be some
exercise questions.
332
00:25:38,540 --> 00:25:44,939
For example, you have Poisson
distribution or exponential
333
00:25:44,939 --> 00:25:45,522
distributions.
334
00:25:52,130 --> 00:25:56,550
These are some other
distributions that you'll see.
335
00:25:56,550 --> 00:25:59,060
And all of these-- normal,
log-normal, Poisson,
336
00:25:59,060 --> 00:26:01,060
and exponential,
and a lot more can
337
00:26:01,060 --> 00:26:04,400
be grouped into a
family of distributions
338
00:26:04,400 --> 00:26:05,798
called exponential family.
339
00:26:18,490 --> 00:26:24,026
So a distribution is said to
be in an exponential family--
340
00:26:24,026 --> 00:26:36,590
A distribution belongs
to exponential family
341
00:26:36,590 --> 00:26:50,890
if there exists a theta,
a vector that parametrizes
342
00:26:50,890 --> 00:27:05,520
the distribution such that
the probability density
343
00:27:05,520 --> 00:27:10,670
function for this choice
of parameter theta
344
00:27:10,670 --> 00:27:16,480
can be written as h
of x times c of theta
345
00:27:16,480 --> 00:27:22,498
times the exponent of
sum from i equal 1 to k of w_i(theta) times t_i(x).
346
00:27:35,446 --> 00:27:35,970
Yes.
347
00:27:35,970 --> 00:27:40,100
So here, when I write
only x, h should only
348
00:27:40,100 --> 00:27:43,400
depend on x, not on theta.
349
00:27:43,400 --> 00:27:45,090
When I write some
function of theta,
350
00:27:45,090 --> 00:27:48,020
it should only depend
on theta, not on x.
351
00:27:48,020 --> 00:28:01,070
So h(x), t_i(x) depends only
on x and c(theta) on my value
352
00:28:01,070 --> 00:28:04,679
theta, depends only on theta.
353
00:28:04,679 --> 00:28:05,720
That's an abstract thing.
354
00:28:05,720 --> 00:28:07,830
It's not clear why
this is so useful,
355
00:28:07,830 --> 00:28:10,140
at least from the definition.
356
00:28:10,140 --> 00:28:14,955
But you're going to talk
about some distribution
357
00:28:14,955 --> 00:28:16,650
for an exponential
family, right?
358
00:28:16,650 --> 00:28:17,150
Yeah.
359
00:28:17,150 --> 00:28:19,840
So you will see
something about this.
360
00:28:19,840 --> 00:28:21,770
But one good thing
is, they exhibit
361
00:28:21,770 --> 00:28:25,360
some good statistical
behavior, the things-- when
362
00:28:25,360 --> 00:28:28,330
you group them into--
all distributions
363
00:28:28,330 --> 00:28:31,460
in the exponential family
have some nice statistical
364
00:28:31,460 --> 00:28:35,590
properties, which makes it good.
365
00:28:35,590 --> 00:28:37,270
That's too abstract.
366
00:28:37,270 --> 00:28:42,140
Let's see how log-normal
distribution actually falls
367
00:28:42,140 --> 00:28:43,631
into the exponential family.
368
00:28:47,607 --> 00:28:49,444
AUDIENCE: So, let
me just comment.
369
00:28:49,444 --> 00:28:50,360
PROFESSOR: Yeah, sure.
370
00:28:50,360 --> 00:28:53,976
AUDIENCE: The notion of
independent random variables,
371
00:28:53,976 --> 00:28:58,687
you went over how the--
well, the probability density
372
00:28:58,687 --> 00:29:00,520
functions of collections
of random variables
373
00:29:00,520 --> 00:29:01,936
if they're mutually
independent is
374
00:29:01,936 --> 00:29:05,640
the product of the
probability densities
375
00:29:05,640 --> 00:29:07,132
of the individual variables.
376
00:29:07,132 --> 00:29:10,240
And so with this
exponential family,
377
00:29:10,240 --> 00:29:12,685
if you have random variables
from the same exponential
378
00:29:12,685 --> 00:29:18,380
family, products of this
density function factor out
379
00:29:18,380 --> 00:29:19,700
into a very simple form.
380
00:29:19,700 --> 00:29:21,360
It doesn't get more
complicated as you
381
00:29:21,360 --> 00:29:24,430
look at the joint density
of many variables,
382
00:29:24,430 --> 00:29:27,510
and in fact simplifies to
the same exponential family.
383
00:29:27,510 --> 00:29:30,210
So that's where that
becomes very useful.
384
00:29:30,210 --> 00:29:32,305
PROFESSOR: So it's designed
so that it factors out
385
00:29:32,305 --> 00:29:33,180
when it's multiplied.
386
00:29:33,180 --> 00:29:34,644
It factors out well.
387
00:29:37,990 --> 00:29:38,650
OK.
388
00:29:38,650 --> 00:29:43,000
So-- sorry about that.
389
00:29:43,000 --> 00:29:44,960
Yeah, log-normal distribution.
390
00:29:44,960 --> 00:29:49,970
So take h(x), 1 over x.
391
00:29:49,970 --> 00:29:52,350
Before that, let's just rewrite
that in a different way.
392
00:29:52,350 --> 00:29:58,804
So 1 over x sigma square
root 2 pi, e to the minus log
393
00:29:58,804 --> 00:30:03,430
x [INAUDIBLE] squared.
394
00:30:03,430 --> 00:30:04,530
Square.
395
00:30:04,530 --> 00:30:10,546
Can be rewritten as 1
over x, times 1 over sigma
396
00:30:10,546 --> 00:30:18,215
square root 2 pi, e to
the minus log x square
397
00:30:18,215 --> 00:30:30,590
over 2 sigma square plus
mu log x over sigma square
398
00:30:30,590 --> 00:30:33,065
minus mu square over 2 sigma square.
399
00:30:37,050 --> 00:30:38,730
Let's write it like that.
400
00:30:38,730 --> 00:30:42,464
Set up h(x) equals 1 over x.
401
00:30:42,464 --> 00:30:51,422
c of theta-- sorry,
theta equals (mu, sigma).
402
00:30:51,422 --> 00:30:55,932
c(theta) is equal to 1 over
sigma square root 2 pi, e
403
00:30:55,932 --> 00:30:57,163
to the minus mu square over 2 sigma square.
404
00:31:01,510 --> 00:31:03,920
So you will
parametrize this family
405
00:31:03,920 --> 00:31:06,870
in terms of mu and sigma.
406
00:31:06,870 --> 00:31:09,490
Your h of x here
will be 1 over x.
407
00:31:09,490 --> 00:31:14,000
Your c(theta) will be this
term and the last term here,
408
00:31:14,000 --> 00:31:16,960
because this
doesn't depend on x.
409
00:31:16,960 --> 00:31:21,630
And then you have to
figure out what w and t is.
410
00:31:21,630 --> 00:31:24,970
You can let w_1 of
x be log x square.
411
00:31:29,180 --> 00:31:38,940
t_1-- no, t_1 of x be log x
square, w_1 of theta be minus 1
412
00:31:38,940 --> 00:31:41,392
over 2 sigma square.
413
00:31:41,392 --> 00:31:44,080
And similarly, you
can let t_2 equals log
414
00:31:44,080 --> 00:31:51,404
x and w_2 equals mu over sigma square.
415
00:31:54,580 --> 00:31:56,570
It's just some technicality,
but at least you
416
00:31:56,570 --> 00:31:59,974
can see it really fits in.
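Numerically (an aside), the factorization above really does reassemble the log-normal density: h(x) times c(theta) times exp(w_1 t_1(x) + w_2 t_2(x)) matches the direct formula, here for assumed values of mu and sigma:

```python
import math

MU, SIGMA = 0.5, 0.8  # assumed theta = (mu, sigma)

def f_direct(x):
    # log-normal density in its usual form
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-z * z / 2) / (x * SIGMA * math.sqrt(2 * math.pi))

# the exponential-family pieces from the board
h = lambda x: 1.0 / x
c = math.exp(-MU ** 2 / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))
w1, w2 = -1.0 / (2 * SIGMA ** 2), MU / SIGMA ** 2
t1, t2 = lambda x: math.log(x) ** 2, lambda x: math.log(x)

def f_family(x):
    return h(x) * c * math.exp(w1 * t1(x) + w2 * t2(x))
```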
417
00:32:02,690 --> 00:32:05,200
OK.
418
00:32:05,200 --> 00:32:07,380
So that's all
about distributions
419
00:32:07,380 --> 00:32:10,080
that I want to talk about.
420
00:32:10,080 --> 00:32:12,640
And then let's talk
a little bit more
421
00:32:12,640 --> 00:32:15,340
about more interesting
stuff, in my opinion.
422
00:32:15,340 --> 00:32:16,705
I like this stuff better.
423
00:32:19,440 --> 00:32:23,340
There are two main things
that we're interested in.
424
00:32:23,340 --> 00:32:30,650
When we have a random variable,
at least for our purpose, what
425
00:32:30,650 --> 00:32:42,766
we want to study is given
a random variable, first,
426
00:32:42,766 --> 00:32:44,015
we want to study its statistics.
427
00:32:50,710 --> 00:32:54,826
So we want to study
these statistics, whatever
428
00:32:54,826 --> 00:32:55,798
that means.
429
00:32:59,690 --> 00:33:02,567
And that will be represented
by the k-th moments
430
00:33:02,567 --> 00:33:03,525
of the random variable.
431
00:33:10,340 --> 00:33:15,370
Where k-th moment is defined
as expectation of X to the k.
432
00:33:20,600 --> 00:33:24,000
And a good way to study
all the moments together
433
00:33:24,000 --> 00:33:26,855
in one function is a
moment-generating function.
434
00:33:34,300 --> 00:33:36,480
So this moment-generating
function
435
00:33:36,480 --> 00:33:40,340
encodes all the k-th moments
of a random variable.
436
00:33:40,340 --> 00:33:43,130
So it contains all the
statistical information
437
00:33:43,130 --> 00:33:45,339
of a random variable.
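As a sketch (not from the lecture), here is the MGF of the three-point X from earlier, M(t) = E[e^(tX)], with a finite-difference check that its second derivative at 0 recovers the second moment E[X^2]:

```python
import math

# pmf of the earlier discrete X
pmf = {1: 1/3, -1: 1/3, 0: 1/3}

def M(t):
    # moment-generating function M(t) = E[e^(tX)] = sum of f(x) e^(tx)
    return sum(p * math.exp(t * x) for x, p in pmf.items())

# E[X^k] is the k-th derivative of M at t = 0; approximate M''(0)
# with a central finite difference
h = 1e-3
second_moment_numeric = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2
second_moment_exact = sum(p * x ** 2 for x, p in pmf.items())  # = 2/3
```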
438
00:33:45,339 --> 00:33:46,880
That's why
moment-generating function
439
00:33:46,880 --> 00:33:48,060
will be interesting to us.
440
00:33:48,060 --> 00:33:50,050
Because when you
want to study it,
441
00:33:50,050 --> 00:33:52,760
you don't have to consider
each moment separately.
442
00:33:52,760 --> 00:33:54,090
It gives a unified way.
443
00:33:54,090 --> 00:33:58,050
It gives a very good
feeling about your function.
444
00:33:58,050 --> 00:33:59,560
That will be our first topic.
445
00:33:59,560 --> 00:34:02,200
Our second topic will
be we want to study
446
00:34:02,200 --> 00:34:10,140
its long-term or
large-scale behavior.
447
00:34:18,190 --> 00:34:21,199
So for example, assume that you
have a normal distribution--
448
00:34:21,199 --> 00:34:24,449
one random variable with
normal distribution.
449
00:34:24,449 --> 00:34:28,800
If we just have a
single random variable,
450
00:34:28,800 --> 00:34:30,760
you really have no control.
451
00:34:30,760 --> 00:34:31,870
It can be anywhere.
452
00:34:31,870 --> 00:34:39,260
The outcome can be anything
according to that distribution.
453
00:34:39,260 --> 00:34:41,429
But if you have several
independent random variables
454
00:34:41,429 --> 00:34:44,540
with the exact
same distribution,
455
00:34:44,540 --> 00:34:49,530
if the number is super large--
let's say 100 million--
456
00:34:49,530 --> 00:34:55,320
and you plot how many random
variables fall into each point
457
00:34:55,320 --> 00:34:58,150
into a graph,
you'll know that it
458
00:34:58,150 --> 00:35:01,672
has to look very
close to this curve.
459
00:35:01,672 --> 00:35:04,160
It will be more dense
here, sparser there,
460
00:35:04,160 --> 00:35:06,720
and sparser there.
461
00:35:06,720 --> 00:35:09,050
So you don't have
individual control on each
462
00:35:09,050 --> 00:35:10,150
of the random variables.
463
00:35:10,150 --> 00:35:12,185
But when you look
at large scale,
464
00:35:12,185 --> 00:35:16,860
you know, at least with
very high probability,
465
00:35:16,860 --> 00:35:19,990
it has to look like this curve.
466
00:35:19,990 --> 00:35:22,480
Those kind of things are
what we want to study.
467
00:35:22,480 --> 00:35:25,720
When we look at this long-term
behavior or large scale
468
00:35:25,720 --> 00:35:28,500
behavior, what can we say?
469
00:35:28,500 --> 00:35:30,130
What kind of events
are guaranteed
470
00:35:30,130 --> 00:35:35,110
to happen with probability,
let's say, 99.9%?
471
00:35:35,110 --> 00:35:38,680
And actually, some interesting
things are happening.
472
00:35:38,680 --> 00:35:44,800
As you might already know, two
typical theorems of this type
473
00:35:44,800 --> 00:35:46,850
will be, in this
topic will be law
474
00:35:46,850 --> 00:35:53,282
of large numbers and
central limit theorem.
475
00:36:02,520 --> 00:36:04,590
So let's start with
our first topic--
476
00:36:04,590 --> 00:36:05,975
the moment-generating function.
477
00:36:26,310 --> 00:36:28,800
The moment-generating
function of a random variable
478
00:36:28,800 --> 00:36:31,540
is defined as-- I
write it as m sub
479
00:36:31,540 --> 00:36:39,330
X. It's defined as expectation
of e to the t times x
480
00:36:39,330 --> 00:36:41,090
where t is some parameter.
481
00:36:41,090 --> 00:36:42,510
t can be any real.
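As an illustrative aside (not from the lecture), the definition can be checked numerically for a simple discrete random variable, here a Bernoulli coin flip, where the expectation is just a weighted sum over the two outcomes:

```python
import math

def mgf_bernoulli(t, p=0.5):
    """M_X(t) = E[e^{tX}] for X ~ Bernoulli(p): a weighted sum over outcomes 0 and 1."""
    return (1 - p) * math.exp(t * 0) + p * math.exp(t * 1)

# At t = 0 every MGF equals 1, since E[e^0] = 1 -- a quick sanity check.
print(mgf_bernoulli(0.0))  # 1.0
print(mgf_bernoulli(1.0))  # (1 + e) / 2, about 1.859
```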
482
00:36:47,372 --> 00:36:48,330
You have to be careful.
483
00:36:48,330 --> 00:36:51,680
It doesn't always converge.
484
00:36:51,680 --> 00:36:58,360
So remark: does not
necessarily exist.
485
00:37:09,900 --> 00:37:12,960
So for example, one of the
distributions you already saw
486
00:37:12,960 --> 00:37:15,010
does not have
moment-generating function.
487
00:37:15,010 --> 00:37:22,101
The log-normal
distribution does not
488
00:37:22,101 --> 00:37:23,600
have any moment-generating
function.
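A small Python aside (not from the lecture) hints at why: the k-th moment of a standard log-normal is e^(k^2/2), which grows so fast that the series defining the MGF diverges for every positive t:

```python
import math

# k-th moment of a standard log-normal: E[X^k] = exp(k^2 / 2).
# In the series sum_k t^k m_k / k!, consecutive terms have ratio
# t * exp((2k + 1) / 2) / (k + 1), which grows without bound,
# so E[e^{tX}] is infinite for every t > 0.
def series_term(t, k):
    return t**k * math.exp(k * k / 2) / math.factorial(k)

t = 0.01  # even a tiny t does not help
terms = [series_term(t, k) for k in range(1, 30)]
ratios = [b / a for a, b in zip(terms, terms[1:])]
print(ratios[0], ratios[-1])  # the ratios keep growing
```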
489
00:37:30,650 --> 00:37:33,720
And that's one thing
you have to be careful.
490
00:37:33,720 --> 00:37:35,870
It's not just some
theoretical thing.
491
00:37:38,329 --> 00:37:40,120
The statement is not
something theoretical.
492
00:37:40,120 --> 00:37:42,670
It actually happens for
some random variables
493
00:37:42,670 --> 00:37:45,548
that you encounter in your life.
494
00:37:45,548 --> 00:37:48,190
So be careful.
495
00:37:48,190 --> 00:37:54,460
And that will actually show
some very interesting thing
496
00:37:54,460 --> 00:37:57,220
I will later explain.
497
00:37:57,220 --> 00:37:59,796
Some very interesting
facts arise from it.
498
00:38:03,900 --> 00:38:06,277
Before going into
that, first of all,
499
00:38:06,277 --> 00:38:08,110
why is it called
moment-generating function?
500
00:38:08,110 --> 00:38:14,540
It's because if you
take the k-th derivative
501
00:38:14,540 --> 00:38:26,280
of this function,
then it actually
502
00:38:26,280 --> 00:38:33,131
gives the k-th moment
of your random variable.
503
00:38:33,131 --> 00:38:34,505
That's where the
name comes from.
504
00:38:43,235 --> 00:38:45,225
It's for all nonnegative integers k.
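This derivative property can be checked with finite differences; the following Python sketch (an aside, not board work) uses a Bernoulli(1/2) coin, whose MGF is (1 - p) + p e^t:

```python
import math

def mgf(t, p=0.5):
    # MGF of a Bernoulli(p) coin flip: (1 - p) + p * e^t.
    return (1 - p) + p * math.exp(t)

h = 1e-5
# Central differences approximate M'(0) and M''(0).
first = (mgf(h) - mgf(-h)) / (2 * h)             # should be E[X]   = 0.5
second = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2  # should be E[X^2] = 0.5
print(first, second)  # both approximately 0.5
```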
505
00:38:58,320 --> 00:39:00,040
And that gives a
different way of writing
506
00:39:00,040 --> 00:39:01,248
a moment-generating function.
507
00:39:11,230 --> 00:39:18,090
Because of that, we may write
the moment-generating function
508
00:39:18,090 --> 00:39:24,992
as the sum from k equals
0 to infinity, t to the k,
509
00:39:24,992 --> 00:39:29,912
k factorial, times
a k-th moment.
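The expansion can be tested in a few lines of Python (an illustrative aside): for a Bernoulli(1/2) variable every moment with k at least 1 equals 1/2, and summing the series rebuilds (1 + e^t)/2:

```python
import math

# All moments of a Bernoulli(1/2) satisfy E[X^k] = 1/2 for k >= 1, with m_0 = 1.
# Summing t^k m_k / k! should rebuild the MGF, M(t) = (1 + e^t) / 2.
def mgf_from_moments(t, K=40):
    total = 1.0  # the k = 0 term
    for k in range(1, K):
        total += t**k * 0.5 / math.factorial(k)
    return total

t = 1.3
print(mgf_from_moments(t), (1 + math.exp(t)) / 2)  # the two values should agree
```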
510
00:39:37,790 --> 00:39:40,469
That's like the
Taylor expansion.
511
00:39:40,469 --> 00:39:42,010
Because you know
all the derivatives,
512
00:39:42,010 --> 00:39:43,551
you know what the
functions would be.
513
00:39:43,551 --> 00:39:45,300
Of course, only if it exists.
514
00:39:45,300 --> 00:39:46,300
This might not converge.
515
00:39:55,080 --> 00:39:58,360
So if moment-generating
function exists,
516
00:39:58,360 --> 00:40:01,120
they pretty much classify
your random variables.
517
00:40:04,630 --> 00:40:09,020
So if two random
variables, X, Y,
518
00:40:09,020 --> 00:40:16,120
have the same
moment-generating function,
519
00:40:16,120 --> 00:40:24,835
then X and Y have the
same distribution.
520
00:40:30,020 --> 00:40:32,550
I will not prove this theorem.
521
00:40:32,550 --> 00:40:35,080
But it says that
moment-generating function,
522
00:40:35,080 --> 00:40:39,600
if it exists, encodes
really all the information
523
00:40:39,600 --> 00:40:41,516
about your random variables.
524
00:40:41,516 --> 00:40:42,990
You're not losing anything.
525
00:40:46,320 --> 00:40:50,540
However, be very careful when
you're applying this theorem.
526
00:40:50,540 --> 00:40:59,920
Because remark,
it does not imply
527
00:40:59,920 --> 00:41:20,740
that all random variables
with identical k-th moments
528
00:41:20,740 --> 00:41:26,790
for all k have the
same distribution.
529
00:41:37,418 --> 00:41:40,030
Do you see it?
530
00:41:40,030 --> 00:41:43,330
If X and Y have a
moment-generating function,
531
00:41:43,330 --> 00:41:49,210
and they're the same, then they
have the same distribution.
532
00:41:49,210 --> 00:41:52,710
This looks a little bit
contradictory to this theorem.
533
00:41:52,710 --> 00:41:56,890
It says that it's not
necessarily the case
534
00:41:56,890 --> 00:42:01,000
that two random variables, which
have identical moments-- so
535
00:42:01,000 --> 00:42:04,750
all k-th moments are the
same for two variables--
536
00:42:04,750 --> 00:42:06,710
even if that's the case,
they don't necessarily
537
00:42:06,710 --> 00:42:10,060
have to have the
same distribution.
538
00:42:10,060 --> 00:42:12,014
Which seems like it
doesn't make sense
539
00:42:12,014 --> 00:42:13,180
if you look at this theorem.
540
00:42:13,180 --> 00:42:14,596
Because moment-generating
function
541
00:42:14,596 --> 00:42:16,650
is defined in terms
of the moments.
542
00:42:16,650 --> 00:42:18,742
If two random variables
have the same moment,
543
00:42:18,742 --> 00:42:20,575
we have the same
moment-generating function.
544
00:42:20,575 --> 00:42:22,616
If they have the same
moment-generating function,
545
00:42:22,616 --> 00:42:24,970
they have the same distribution.
546
00:42:24,970 --> 00:42:28,450
There is a hole
in this argument.
547
00:42:28,450 --> 00:42:31,850
Even if they have
the same moments,
548
00:42:31,850 --> 00:42:33,792
it doesn't necessarily
imply that they
549
00:42:33,792 --> 00:42:35,500
have the same
moment-generating function.
550
00:42:35,500 --> 00:42:39,520
They might both not have
moment-generating functions.
551
00:42:39,520 --> 00:42:42,620
That's the glitch.
552
00:42:42,620 --> 00:42:44,040
Be careful.
553
00:42:44,040 --> 00:42:47,587
So just remember that even if
they have the same moments,
554
00:42:47,587 --> 00:42:49,670
they don't necessarily
have the same distribution.
555
00:42:49,670 --> 00:42:51,740
And the reason is
because-- one reason
556
00:42:51,740 --> 00:42:56,110
is because the moment-generating
function might not exist.
557
00:42:56,110 --> 00:42:57,930
And if you look in
to Wikipedia, you'll
558
00:42:57,930 --> 00:43:00,850
see an example of
when it happens,
559
00:43:00,850 --> 00:43:03,345
of two random variables
where this happens.
560
00:43:10,310 --> 00:43:13,380
So that's one thing
we will use later.
561
00:43:13,380 --> 00:43:17,660
Another thing that
we will use later,
562
00:43:17,660 --> 00:43:20,950
it's a statement
very similar to that,
563
00:43:20,950 --> 00:43:25,820
but it says something about a
sequence of random variables.
564
00:43:25,820 --> 00:43:39,406
So if X_1, X_2, up to X_n is
a sequence of random variables
565
00:43:39,406 --> 00:43:48,470
such that the moment-generating
function exists,
566
00:43:48,470 --> 00:43:52,580
and it converges-- as
n goes to infinity--
567
00:43:57,542 --> 00:44:03,250
Tends to the
moment-generating function
568
00:44:03,250 --> 00:44:05,380
of some random variable
569
00:44:05,380 --> 00:44:13,091
X. For some random
variable X, for all t.
570
00:44:16,250 --> 00:44:18,970
Here, we're assuming that all
moment-generating function
571
00:44:18,970 --> 00:44:20,280
exists.
572
00:44:20,280 --> 00:44:22,050
So again, the
situation is, you have
573
00:44:22,050 --> 00:44:24,900
a sequence of random variables.
574
00:44:24,900 --> 00:44:27,600
Their moment-generating
function exists.
575
00:44:27,600 --> 00:44:31,790
And in each point
t, it converges
576
00:44:31,790 --> 00:44:33,967
to the value of the
moment-generating function
577
00:44:33,967 --> 00:44:35,300
of some other random variable x.
578
00:44:38,270 --> 00:44:41,310
And what should happen?
579
00:44:41,310 --> 00:44:43,880
In light of this theorem,
it should be the case
580
00:44:43,880 --> 00:44:47,490
that the distribution
of this sequence
581
00:44:47,490 --> 00:44:49,240
gets closer and closer
to the distribution
582
00:44:49,240 --> 00:44:53,360
of this random variable x.
583
00:44:53,360 --> 00:45:00,220
And to make it formal, to make
that information formal, what
584
00:45:00,220 --> 00:45:09,760
we can conclude is, for
all x, the probability
585
00:45:09,760 --> 00:45:15,440
X_n is less than or equal to
x tends to the probability
586
00:45:15,440 --> 00:45:17,300
that X is less than or equal to x.
587
00:45:20,090 --> 00:45:22,990
So in this sense,
the distributions
588
00:45:22,990 --> 00:45:25,940
of these random variables
converge to the distribution
589
00:45:25,940 --> 00:45:27,216
of that random variable.
590
00:45:30,090 --> 00:45:32,330
So it's just a technical issue.
591
00:45:32,330 --> 00:45:38,890
You can just think of it as
these random variables converge
592
00:45:38,890 --> 00:45:41,200
to that random variable.
593
00:45:41,200 --> 00:45:43,230
If you take some graduate
probability course,
594
00:45:43,230 --> 00:45:47,100
you'll see that there's
several possible ways
595
00:45:47,100 --> 00:45:48,730
to define convergence.
596
00:45:48,730 --> 00:45:50,740
But that's just
some technicality.
597
00:45:50,740 --> 00:45:53,397
And the spirit
here is just really
598
00:45:53,397 --> 00:45:55,730
the sequence converges if its
moment-generating function
599
00:45:55,730 --> 00:45:56,229
converges.
600
00:45:59,790 --> 00:46:02,470
So as you can see from
these two theorems,
601
00:46:02,470 --> 00:46:04,440
moment-generating
function, if it exists,
602
00:46:04,440 --> 00:46:08,270
is a really powerful
tool that allows you
603
00:46:08,270 --> 00:46:09,480
to control the distribution.
604
00:46:13,060 --> 00:46:16,407
You'll see some applications
later in central limit theorem.
605
00:46:16,407 --> 00:46:16,990
Any questions?
606
00:46:21,530 --> 00:46:22,446
AUDIENCE: [INAUDIBLE]?
607
00:46:28,557 --> 00:46:29,390
PROFESSOR: This one?
608
00:46:32,870 --> 00:46:34,154
Why?
609
00:46:34,154 --> 00:46:35,612
AUDIENCE: Because
it starts with t,
610
00:46:35,612 --> 00:46:38,162
and the right-hand side
has nothing general.
611
00:46:40,777 --> 00:46:41,360
PROFESSOR: Ah.
612
00:46:44,318 --> 00:46:47,180
Thank you.
613
00:46:47,180 --> 00:46:48,350
We evaluated at zero.
614
00:46:53,230 --> 00:46:54,694
Other questions?
615
00:46:54,694 --> 00:46:56,646
Other corrections?
616
00:46:56,646 --> 00:46:59,086
AUDIENCE: When you say the
moment-generating function
617
00:46:59,086 --> 00:47:01,526
doesn't exist, do you mean
that it isn't analytic
618
00:47:01,526 --> 00:47:03,010
or it doesn't converge?
619
00:47:03,010 --> 00:47:04,580
PROFESSOR: It
might not converge.
620
00:47:04,580 --> 00:47:08,130
So log-normal distribution,
it does not converge.
621
00:47:08,130 --> 00:47:10,412
So for all non-zero
t, it does not
622
00:47:10,412 --> 00:47:12,109
converge, for
log-normal distribution.
623
00:47:12,109 --> 00:47:13,025
AUDIENCE: [INAUDIBLE]?
624
00:47:16,350 --> 00:47:17,140
PROFESSOR: Here?
625
00:47:17,140 --> 00:47:17,640
Yes.
626
00:47:17,640 --> 00:47:19,822
Pointwise convergence of the
moment-generating functions implies
pointwise convergence of the
distribution functions.
627
00:47:22,420 --> 00:47:22,945
No, no.
628
00:47:26,760 --> 00:47:30,474
Because it's pointwise, this
conclusion is also rather weak.
629
00:47:30,474 --> 00:47:32,640
It's almost the weakest
convergence in distribution.
630
00:48:01,024 --> 00:48:01,524
OK.
631
00:48:01,524 --> 00:48:12,480
The law of large numbers.
632
00:49:04,100 --> 00:49:06,940
So now we're talking about
large-scale behavior.
633
00:49:06,940 --> 00:49:09,630
Let X_1 up to X_n be
independent random variables
634
00:49:09,630 --> 00:49:11,334
with identical distribution.
635
00:49:11,334 --> 00:49:13,250
We don't really know
what the distribution is,
636
00:49:13,250 --> 00:49:15,270
but we know that
they're all the same.
637
00:49:15,270 --> 00:49:18,620
In short, I'll just refer
to this condition as i.i.d.
638
00:49:18,620 --> 00:49:21,990
random variables later.
639
00:49:21,990 --> 00:49:25,048
Independent, identically
distributed random variables.
640
00:49:29,040 --> 00:49:36,530
And let mean be mu,
variance be sigma square.
641
00:49:44,470 --> 00:49:50,740
Let's also define X as the
average of n random variables.
642
00:49:54,590 --> 00:50:22,986
Then for all positive epsilon,
the probability that X minus mu
643
00:50:22,986 --> 00:50:23,486
exceeds epsilon tends to 0.
644
00:50:31,590 --> 00:50:35,100
So whenever you have identical
independent distributions, when
645
00:50:35,100 --> 00:50:39,050
you take their average, if
you take a large enough number
646
00:50:39,050 --> 00:50:43,430
of samples, they will be
very close to the mean, which
647
00:50:43,430 --> 00:50:44,144
makes sense.
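A short simulation (an illustrative aside, not from the lecture) makes this concrete; here fair dice, whose true mean is mu = 3.5, are averaged for growing n:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Average of n fair six-sided dice; the true mean is 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in [10, 1000, 100000]:
    print(n, sample_mean(n))  # the averages cluster ever closer to 3.5
```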
648
00:51:04,420 --> 00:51:06,270
So what's an example of this?
649
00:51:06,270 --> 00:51:14,010
Before proving it, example
of this theorem in practice
650
00:51:14,010 --> 00:51:16,605
can be seen in the casino.
651
00:51:22,530 --> 00:51:25,120
So for example, if
you're playing blackjack
652
00:51:25,120 --> 00:51:38,890
in a casino, when you're
playing against the casino,
653
00:51:38,890 --> 00:51:42,700
you have a very
small disadvantage.
654
00:51:42,700 --> 00:51:52,500
If you're playing at
the optimal strategy,
655
00:51:52,500 --> 00:51:56,380
you have-- does anybody
know the probability?
656
00:51:56,380 --> 00:52:00,460
It's about 48%, 49%.
657
00:52:00,460 --> 00:52:04,520
About 48% chance of winning.
658
00:52:09,160 --> 00:52:14,340
That means if you bet $1 at
the beginning of each round,
659
00:52:14,340 --> 00:52:22,605
the expected amount
you'll win is $0.48.
660
00:52:22,605 --> 00:52:28,060
The expected amount that the
casino will win is $0.52.
661
00:52:28,060 --> 00:52:30,760
But it's designed so
that the variance is
662
00:52:30,760 --> 00:52:37,030
so big that this expectation
is hidden, the mean is hidden.
663
00:52:37,030 --> 00:52:39,390
From the player's
point of view, you only
664
00:52:39,390 --> 00:52:41,390
have a very small sample.
665
00:52:41,390 --> 00:52:44,960
So it looks like the
mean doesn't matter,
666
00:52:44,960 --> 00:52:48,710
because the variance takes
over in a very short scale.
667
00:52:48,710 --> 00:52:50,730
But from the casino's
point of view,
668
00:52:50,730 --> 00:52:54,680
they're taking a
very large n there.
669
00:52:54,680 --> 00:53:02,720
So for each round, let's
say from the casino's
670
00:53:02,720 --> 00:53:13,500
point of view, they
671
00:53:13,500 --> 00:53:20,520
are taking an enormous value of n.
672
00:53:26,640 --> 00:53:27,660
n here.
673
00:53:27,660 --> 00:53:32,380
And that means as long as they
have the slightest advantage,
674
00:53:32,380 --> 00:53:34,993
they'll be winning money,
and a huge amount of money.
675
00:53:38,240 --> 00:53:41,690
And most games played in the
casinos are designed like this.
676
00:53:41,690 --> 00:53:45,730
It looks like the mean
is really close to 50%,
677
00:53:45,730 --> 00:53:47,840
but it's hidden,
because they designed it
678
00:53:47,840 --> 00:53:51,000
so the variance is big.
679
00:53:51,000 --> 00:53:53,180
But from the casino's
point of view,
680
00:53:53,180 --> 00:53:55,010
they have enough
players to play the game
681
00:53:55,010 --> 00:54:02,120
so that the law of large
numbers just makes them money.
682
00:54:07,770 --> 00:54:09,530
The moral is, don't
play blackjack.
683
00:54:12,240 --> 00:54:15,360
Play poker.
684
00:54:15,360 --> 00:54:19,790
The reason that the law
of large numbers
685
00:54:19,790 --> 00:54:23,010
doesn't apply, at least
in this sense, to poker--
686
00:54:23,010 --> 00:54:24,220
can anybody explain why?
687
00:54:27,100 --> 00:54:32,000
It's because poker, you're
playing against other players.
688
00:54:32,000 --> 00:54:36,500
If you have an advantage, if
your skill-- if you believe
689
00:54:36,500 --> 00:54:38,980
that there is skill in poker--
if your skill is better
690
00:54:38,980 --> 00:54:41,330
than the other
player by, let's say,
691
00:54:41,330 --> 00:54:47,010
5% chance, then you have
an edge over that player.
692
00:54:47,010 --> 00:54:48,010
So you can win money.
693
00:54:48,010 --> 00:54:53,870
The only problem is that
because-- poker, you're
694
00:54:53,870 --> 00:54:55,691
not playing against the casino.
695
00:55:00,390 --> 00:55:04,770
Don't play against casino.
696
00:55:04,770 --> 00:55:06,530
But they still
have to make money.
697
00:55:06,530 --> 00:55:08,770
So what they do instead
is they take rake.
698
00:55:08,770 --> 00:55:12,350
So for each round
that the players play,
699
00:55:12,350 --> 00:55:15,740
they pay some fee to the casino.
700
00:55:15,740 --> 00:55:19,920
And how the casino makes
money at the poker table
701
00:55:19,920 --> 00:55:22,870
is by accumulating those fees.
702
00:55:22,870 --> 00:55:25,291
They're not taking
chances there.
703
00:55:25,291 --> 00:55:26,790
But from the player's
point of view,
704
00:55:26,790 --> 00:55:32,405
if you're better than the other
player, and the amount of edge
705
00:55:32,405 --> 00:55:35,630
you have over the other
player is larger than the fee
706
00:55:35,630 --> 00:55:38,000
that the casino
charges to you, then
707
00:55:38,000 --> 00:55:41,380
now you can apply law of large
numbers to yourself and win.
708
00:55:45,420 --> 00:55:50,360
And if you take an
example as poker,
709
00:55:50,360 --> 00:55:54,372
it looks like-- OK, I'm
not going to play poker.
710
00:55:54,372 --> 00:55:59,320
But if it's a hedge
fund, or if you're
711
00:55:59,320 --> 00:56:04,850
doing high-frequency trading,
that's the moral behind it.
712
00:56:04,850 --> 00:56:07,860
So that's the belief
you should have.
713
00:56:07,860 --> 00:56:10,760
You have to believe
that you have an edge.
714
00:56:10,760 --> 00:56:13,660
Even if you have a
tiny edge, if you
715
00:56:13,660 --> 00:56:16,400
can have enough
number of trials,
716
00:56:16,400 --> 00:56:21,000
if you can trade enough of times
using some strategy that you
717
00:56:21,000 --> 00:56:26,580
believe is winning over time,
then law of large numbers
718
00:56:26,580 --> 00:56:31,266
will take it from there and
will bring you money, profit.
719
00:56:34,920 --> 00:56:41,770
Of course, the problem is,
when the variance is big,
720
00:56:41,770 --> 00:56:45,210
your belief starts to fall.
721
00:56:45,210 --> 00:56:48,660
At least, that was the case for
me when I was playing poker.
722
00:56:48,660 --> 00:56:51,650
Because I believed
that I had an edge,
723
00:56:51,650 --> 00:56:55,520
but when there is
really swing, it
724
00:56:55,520 --> 00:56:59,680
looks like your
expectation is negative.
725
00:56:59,680 --> 00:57:01,885
And that's when you have
to believe in yourself.
726
00:57:05,590 --> 00:57:07,690
Yeah.
727
00:57:07,690 --> 00:57:09,480
That's when your
faith in mathematics
728
00:57:09,480 --> 00:57:11,929
is being challenged.
729
00:57:11,929 --> 00:57:12,720
It really happened.
730
00:57:15,290 --> 00:57:17,290
I hope it doesn't happen to you.
731
00:57:17,290 --> 00:57:22,730
Anyway, that's proof
law of large numbers.
732
00:57:22,730 --> 00:57:23,690
How do you prove it?
733
00:57:23,690 --> 00:57:24,690
The proof is quite easy.
734
00:57:27,840 --> 00:57:32,940
First of all, one observation--
expectation of X is just
735
00:57:32,940 --> 00:57:37,640
expectation of 1 over
n times sum of X_i's.
736
00:57:41,400 --> 00:57:52,471
And that, by linearity,
just becomes 1 over n times the
737
00:57:52,471 --> 00:57:55,883
sum of the expectations, and that's mu.
738
00:57:55,883 --> 00:57:56,383
OK.
739
00:57:56,383 --> 00:57:59,317
That's good.
740
00:57:59,317 --> 00:58:01,610
And then the variance,
what's the variance of X?
741
00:58:04,430 --> 00:58:09,750
That's the expectation
of X minus mu
742
00:58:09,750 --> 00:58:20,976
squared, which is the expectation of
1 over n times the sum of the X_i's,
743
00:58:20,976 --> 00:58:21,476
minus mu, squared.
744
00:58:24,344 --> 00:58:26,260
I'll group them.
745
00:58:26,260 --> 00:58:33,584
That's the expectation of 1 over
n sum of X_i minus mu square.
746
00:58:33,584 --> 00:58:35,580
i is from 1 to n.
747
00:58:43,570 --> 00:58:44,800
What did I do wrong?
748
00:58:44,800 --> 00:58:46,610
1 over n is inside the square.
749
00:58:46,610 --> 00:58:50,720
So I can take it out
and square, n square.
750
00:58:50,720 --> 00:58:53,660
And then you're summing
n terms of sigma square.
751
00:58:53,660 --> 00:58:57,145
So that is equal to
sigma square over n.
752
00:59:02,450 --> 00:59:04,110
That means the
effect of averaging
753
00:59:04,110 --> 00:59:08,600
n terms does not
affect your average,
754
00:59:08,600 --> 00:59:10,020
but it affects your variance.
755
00:59:13,510 --> 00:59:15,802
It divides your variance by n.
756
00:59:15,802 --> 00:59:18,890
If you take larger and
larger n, your variance
757
00:59:18,890 --> 00:59:20,080
gets smaller and smaller.
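A quick numerical aside (the uniform distribution and sample sizes are arbitrary choices, not from the lecture): the empirical variance of an average of n uniform(0, 1) variables should sit near sigma squared over n, with sigma squared equal to 1/12.

```python
import random
import statistics

random.seed(42)

n, trials = 50, 20000
# Variance of the average of n uniform(0, 1) draws, estimated over many trials.
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
emp_var = statistics.pvariance(means)
print(emp_var, (1 / 12) / n)  # the two numbers should be close
```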
758
00:59:22,590 --> 00:59:25,970
And using that, we can
prove this statement.
759
00:59:25,970 --> 00:59:27,840
There's only one thing
you have to notice--
760
00:59:27,840 --> 00:59:30,510
that the probability
that x minus mu
761
00:59:30,510 --> 00:59:32,620
is greater than epsilon,
762
00:59:32,620 --> 00:59:35,840
when multiplied
by epsilon squared,
763
00:59:35,840 --> 00:59:41,230
will be less than or
equal to the variance of X.
764
00:59:41,230 --> 00:59:42,780
The reason this
inequality holds is
765
00:59:42,780 --> 00:59:46,290
because variance X is defined
as the expectation of X minus mu
766
00:59:46,290 --> 00:59:48,200
square.
767
00:59:48,200 --> 00:59:52,340
For all the events when you have
X minus mu at least epsilon,
768
00:59:52,340 --> 00:59:54,260
your multiplying
factor, X minus mu squared, will
769
00:59:54,260 --> 00:59:56,780
be at least epsilon squared.
770
00:59:56,780 --> 01:00:00,350
This term will be at
least epsilon square
771
01:00:00,350 --> 01:00:03,520
when you fall into this event.
772
01:00:03,520 --> 01:00:07,100
So your variance has
to be at least that.
773
01:00:07,100 --> 01:00:11,971
And this is known to
be sigma square over n.
774
01:00:11,971 --> 01:00:15,704
So probability that
x minus mu is greater
775
01:00:15,704 --> 01:00:21,980
than epsilon is at most sigma
square over n epsilon squared.
776
01:00:21,980 --> 01:00:26,140
That means if you take n to go
to infinity, that goes to zero.
777
01:00:26,140 --> 01:00:29,590
So the probability that
you deviate from the mean
778
01:00:29,590 --> 01:00:33,187
by more than epsilon goes to 0.
779
01:00:33,187 --> 01:00:35,645
You can actually read out a
little bit more from the proof.
780
01:00:38,690 --> 01:00:41,635
It also tells a little bit
about the speed of convergence.
781
01:00:44,260 --> 01:00:50,230
So let's say you have a random
variable X. Your mean is 50.
782
01:00:50,230 --> 01:00:53,930
You epsilon is 0.1.
783
01:00:53,930 --> 01:00:55,830
So you want to know
the probability
784
01:00:55,830 --> 01:01:00,480
that you deviate from your
mean by more than 0.1.
785
01:01:00,480 --> 01:01:06,010
Let's say you want
to be 99% sure.
786
01:01:06,010 --> 01:01:14,812
Want to be 99% sure that X
minus mu is less than 0.1,
787
01:01:14,812 --> 01:01:18,120
or X minus 50 is less than 0.1.
788
01:01:18,120 --> 01:01:23,060
In that case, what you can do
is-- you want this to be 0.01.
789
01:01:23,060 --> 01:01:26,360
It has to be 0.01.
790
01:01:26,360 --> 01:01:29,800
So plug in that, plug in your
variance, plug in your epsilon.
791
01:01:29,800 --> 01:01:32,230
That will give you
some bound on n.
792
01:01:32,230 --> 01:01:34,190
If you have more than
that number of trials,
793
01:01:34,190 --> 01:01:38,113
you can be 99% sure that you
don't deviate from your mean
794
01:01:38,113 --> 01:01:40,680
by more than epsilon.
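As a sketch of this plug-in step (the variance value 100 is a hypothetical number chosen only for illustration), the Chebyshev bound sigma^2 / (n * eps^2) <= delta can be solved for n:

```python
import math

def chebyshev_n(sigma2, eps, delta):
    """Smallest n with sigma^2 / (n * eps^2) <= delta, from Chebyshev's bound."""
    return math.ceil(sigma2 / (delta * eps**2))

# Hypothetical variance 100, epsilon = 0.1, 99% confidence (delta = 0.01).
print(chebyshev_n(100, 0.1, 0.01))  # 1000000 -- "close to millions"
```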
795
01:01:40,680 --> 01:01:42,700
So that does give
some estimate, but I
796
01:01:42,700 --> 01:01:46,150
should mention that this
is a very bad estimate.
797
01:01:46,150 --> 01:01:47,990
There are much more
powerful estimates
798
01:01:47,990 --> 01:01:48,970
that can be done here.
799
01:01:48,970 --> 01:01:50,770
That will give the order of
magnitude-- I didn't really
800
01:01:50,770 --> 01:01:53,440
calculate here, but it looks
like it's close to millions.
801
01:01:53,440 --> 01:01:55,900
It has to be close to millions.
802
01:01:55,900 --> 01:02:00,360
But in practice, if you use
a lot more powerful tool
803
01:02:00,360 --> 01:02:05,008
of estimating it, it should
only be hundreds or at most
804
01:02:05,008 --> 01:02:05,508
thousands.
805
01:02:13,460 --> 01:02:15,960
So the tool you'll use there
is moment-generating functions,
806
01:02:15,960 --> 01:02:18,360
something similar to
moment-generating functions.
807
01:02:18,360 --> 01:02:20,412
But I will not go into it.
808
01:02:20,412 --> 01:02:20,995
Any questions?
809
01:02:23,610 --> 01:02:25,090
OK.
810
01:02:25,090 --> 01:02:28,552
For those who already saw
law of large numbers before,
811
01:02:28,552 --> 01:02:30,510
the name suggests there's
also something called
812
01:02:30,510 --> 01:02:32,250
strong law of large numbers.
813
01:02:35,982 --> 01:02:41,380
In that theorem, your
conclusion is stronger.
814
01:02:41,380 --> 01:02:45,005
So the convergence is stronger
than this type of convergence.
815
01:02:47,810 --> 01:02:51,610
And also, the
condition I gave here
816
01:02:51,610 --> 01:02:53,580
is a very strong condition.
817
01:02:53,580 --> 01:02:56,020
The same conclusion
is true even if you
818
01:02:56,020 --> 01:02:58,840
weaken some of the conditions.
819
01:02:58,840 --> 01:03:01,580
So for example, the variance
does not have to exist.
820
01:03:01,580 --> 01:03:06,480
It can be replaced by some
other condition, and so on.
821
01:03:06,480 --> 01:03:08,860
But here, I just want
it to be a simple form
822
01:03:08,860 --> 01:03:11,350
so that it's easy to prove.
823
01:03:11,350 --> 01:03:14,274
And you at least get the
spirit of what's happening.
824
01:03:20,480 --> 01:03:26,140
Now let's move on to the next
topic-- central limit theorem.
825
01:04:11,240 --> 01:04:16,880
So weak law of
large numbers says
826
01:04:16,880 --> 01:04:22,210
that if you have IID random
variables, 1 over n times
827
01:04:22,210 --> 01:04:27,400
sum over X_i's converges to mu,
the mean, in some weak sense.
828
01:04:31,210 --> 01:04:33,730
And the reason it happened
was because this had
829
01:04:33,730 --> 01:04:39,157
mean mu and variance
sigma square over n.
830
01:04:43,660 --> 01:04:49,730
We've exploited the fact that
variance vanishes to get this.
831
01:04:49,730 --> 01:04:53,560
So the question is, what
happens if you replace 1 over n
832
01:04:53,560 --> 01:04:54,903
by 1 over square root n?
833
01:04:59,250 --> 01:05:04,590
What happens if-- for
the random variable
834
01:05:04,590 --> 01:05:08,300
1 over square root n times the sum of X_i?
835
01:05:14,180 --> 01:05:16,990
The reason I'm making this
choice of 1 over square root n
836
01:05:16,990 --> 01:05:19,310
is because if you
make this choice,
837
01:05:19,310 --> 01:05:26,330
now the average has mean mu
and variance sigma square just
838
01:05:26,330 --> 01:05:28,770
as in X_i's.
839
01:05:28,770 --> 01:05:34,981
So this is the same as X_i.
840
01:05:40,910 --> 01:05:44,330
Then what should it look like?
841
01:05:44,330 --> 01:05:46,730
If the random variable is the
same mean and same variance
842
01:05:46,730 --> 01:05:52,120
as your original random
variable, the distribution
843
01:05:52,120 --> 01:05:54,795
of this, should it look like
the distribution of X_i?
844
01:06:00,530 --> 01:06:01,290
If mean is mu.
845
01:06:01,290 --> 01:06:04,170
Thank you very much.
846
01:06:04,170 --> 01:06:05,535
The case when mean is 0.
847
01:06:13,160 --> 01:06:13,660
OK.
848
01:06:13,660 --> 01:06:17,620
For this special case,
will it look like X_i,
849
01:06:17,620 --> 01:06:20,820
or will it not look like X_i?
850
01:06:20,820 --> 01:06:24,260
If it doesn't look like X_i,
can we say anything interesting
851
01:06:24,260 --> 01:06:27,590
about the distribution of this?
852
01:06:27,590 --> 01:06:31,480
And central limit theorem
answers this question.
853
01:06:31,480 --> 01:06:34,980
When I first saw it, I thought
it was really interesting.
854
01:06:34,980 --> 01:06:37,161
Because normal
distribution comes up here.
855
01:06:40,250 --> 01:06:42,050
And that's probably
one of the reasons
856
01:06:42,050 --> 01:06:45,010
that normal distribution
is so universal.
857
01:06:45,010 --> 01:06:50,310
Because when you take
many independent events
858
01:06:50,310 --> 01:06:53,270
and take the average
in this sense,
859
01:06:53,270 --> 01:06:56,765
their distribution converges
to a normal distribution.
860
01:06:56,765 --> 01:06:57,265
Yes?
861
01:06:57,265 --> 01:06:59,660
AUDIENCE: How did you get
mean equals [INAUDIBLE]?
862
01:06:59,660 --> 01:07:00,970
PROFESSOR: I didn't get it.
863
01:07:00,970 --> 01:07:02,678
I assumed it if X-- yeah.
864
01:07:29,600 --> 01:07:41,480
So theorem: let
X_1, X_2, to X_n be
865
01:07:41,480 --> 01:07:51,960
IID random variables with mean,
this time, mu and variance
866
01:07:51,960 --> 01:07:55,020
sigma squared.
867
01:07:55,020 --> 01:07:59,308
And let X-- or Y_n.
868
01:08:01,940 --> 01:08:10,023
Y_n be square root n times
1 over n sum of X_i, minus mu.
869
01:08:24,813 --> 01:08:41,080
Then the distribution
of Y_n converges
870
01:08:41,080 --> 01:08:50,056
to that of normal distribution
with mean 0 and variance sigma squared.
871
01:08:55,050 --> 01:08:57,350
What this means-- I'll
write it down again--
872
01:08:57,350 --> 01:09:01,790
it means for all x,
probability that Y_n
873
01:09:01,790 --> 01:09:03,790
is less than or
equal to x converges
874
01:09:03,790 --> 01:09:07,722
the probability that normal
distribution is less than
875
01:09:07,722 --> 01:09:08,910
or equal to x.
876
01:09:14,140 --> 01:09:16,220
What's really
interesting here is,
877
01:09:16,220 --> 01:09:20,340
no matter what distribution
you had in the beginning,
878
01:09:20,340 --> 01:09:24,090
if we average it
out in this sense,
879
01:09:24,090 --> 01:09:25,965
then you converge to
the normal distribution.
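As an illustrative aside (the uniform distribution and sample sizes are arbitrary choices), simulating Y_n = sqrt(n) times (sample mean minus mu) for uniform(0, 1) variables gives an empirical mean near 0 and variance near sigma squared = 1/12:

```python
import random
import statistics

random.seed(7)

n, trials = 400, 5000
mu = 0.5  # mean of uniform(0, 1); its variance is sigma^2 = 1/12

# Y_n = sqrt(n) * (sample mean - mu), sampled many times.
ys = [
    n**0.5 * (sum(random.random() for _ in range(n)) / n - mu)
    for _ in range(trials)
]
print(statistics.fmean(ys), statistics.pvariance(ys))  # near 0 and near 1/12
```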
880
01:09:35,429 --> 01:09:37,720
Any questions about this
statement, or any corrections?
881
01:09:40,490 --> 01:09:43,545
Any mistakes that I made?
882
01:09:43,545 --> 01:09:46,015
OK.
883
01:09:46,015 --> 01:09:47,003
Here's the proof.
884
01:09:50,970 --> 01:09:54,400
I will prove it when the
moment-generating function
885
01:09:54,400 --> 01:09:54,900
exists.
886
01:09:54,900 --> 01:09:56,816
So assume that the
moment-generating function
887
01:09:56,816 --> 01:09:58,010
exists.
888
01:09:58,010 --> 01:10:04,963
So proof assuming
M of X_i exists.
889
01:10:16,810 --> 01:10:19,860
So remember that theorem.
890
01:10:19,860 --> 01:10:22,160
Try to recall that
theorem where if you
891
01:10:22,160 --> 01:10:25,130
know that the moment-generating
function of Y_n's converges
892
01:10:25,130 --> 01:10:29,250
to the moment-generating
function of the normal, then
893
01:10:29,250 --> 01:10:30,210
we have the statement.
894
01:10:30,210 --> 01:10:31,400
The distribution converges.
895
01:10:31,400 --> 01:10:34,328
So that's the statement
we're going to use.
896
01:10:34,328 --> 01:10:37,100
That means our goal is to prove
that the moment-generating
897
01:10:37,100 --> 01:10:43,020
function of these Y_n's converges
to the moment-generating
898
01:10:43,020 --> 01:10:51,088
function of the normal for
all t, pointwise convergence.
899
01:10:56,360 --> 01:11:00,080
And this part is well known.
900
01:11:00,080 --> 01:11:01,455
I'll just write it down.
901
01:11:01,455 --> 01:11:06,094
It's known to be e to the t
square sigma square over 2.
902
01:11:08,818 --> 01:11:11,173
That just can be computed.
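(For reference, the standard computation being alluded to here, not worked out in the lecture: if Z is normal with mean 0 and variance sigma squared, completing the square in the exponent gives)

```latex
M_Z(t) = \mathbb{E}\left[e^{tZ}\right]
       = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\,
         e^{tz - z^2/(2\sigma^2)}\,dz
       = e^{t^2\sigma^2/2}
         \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\,
         e^{-(z-\sigma^2 t)^2/(2\sigma^2)}\,dz
       = e^{t^2\sigma^2/2},
```

since the remaining integral is a normal density and equals 1.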
903
01:11:18,610 --> 01:11:21,270
So we want to somehow show that
the moment-generating function
904
01:11:21,270 --> 01:11:25,738
of this Y_n converges to that.
905
01:11:25,738 --> 01:11:29,440
The moment-generating
function of Y_n
906
01:11:29,440 --> 01:11:36,102
is equal to expectation
of e to t Y_n.
907
01:11:42,544 --> 01:11:50,496
e to the t, 1 over square
root n, sum of X_i minus mu.
908
01:11:54,490 --> 01:11:57,680
And then because each of
the X_i's are independent,
909
01:11:57,680 --> 01:11:59,403
this sum will split
into products.
910
01:12:02,650 --> 01:12:14,059
Product of-- let
me split it better.
911
01:12:14,059 --> 01:12:19,240
It is the expectation-- we
didn't use independence yet.
912
01:12:19,240 --> 01:12:26,504
Sum becomes products of e to
the t, 1 over square root n, X_i
913
01:12:26,504 --> 01:12:27,462
minus mu.
914
01:12:34,650 --> 01:12:36,380
And then because
they're independent,
915
01:12:36,380 --> 01:12:37,530
this product can go out.
916
01:12:40,925 --> 01:12:49,996
Equal to the product from 1 to
n expectation e to the t over
917
01:12:49,996 --> 01:12:50,984
square root n--
918
01:12:56,160 --> 01:12:56,660
OK.
919
01:12:56,660 --> 01:12:58,159
Now they're identically
distributed,
920
01:12:58,159 --> 01:13:00,900
so you just have to take
the n-th power of that.
921
01:13:00,900 --> 01:13:03,923
That's equal to the
expectation of e
922
01:13:03,923 --> 01:13:11,920
to the t over square root n,
X_i minus mu, to the n-th power.
923
01:13:11,920 --> 01:13:15,420
Now we'll do some estimation.
924
01:13:15,420 --> 01:13:19,450
So use the Taylor
expansion of this.
925
01:13:19,450 --> 01:13:30,002
What we get is expectation of 1
plus that, t over square root n
926
01:13:30,002 --> 01:13:36,990
X_i minus mu, plus 1 over
2 factorial, that squared,
927
01:13:36,990 --> 01:13:43,760
t over square root n,
X_i minus mu squared,
928
01:13:43,760 --> 01:13:48,748
plus 1 over 3 factorial,
that cubed plus so on.
929
01:13:55,050 --> 01:13:57,990
Then that's equal to 1--
Ah, to the n-th power.
930
01:14:02,920 --> 01:14:06,890
By the linearity of
expectation, 1 comes out.
931
01:14:06,890 --> 01:14:12,830
Second term is 0,
because X_i have mean mu.
932
01:14:12,830 --> 01:14:15,020
So that disappears.
933
01:14:15,020 --> 01:14:26,930
This term-- we have 1 over 2,
t squared over n, X_i minus mu
934
01:14:26,930 --> 01:14:29,370
square.
935
01:14:29,370 --> 01:14:31,590
X_i minus mu square, when
you take expectation,
936
01:14:31,590 --> 01:14:35,550
that will be sigma square.
937
01:14:35,550 --> 01:14:39,720
And then the terms after
that, because we're
938
01:14:39,720 --> 01:14:42,850
only interested in
proving that for fixed t,
939
01:14:42,850 --> 01:14:46,160
this converges-- so we're only
proving pointwise convergence.
940
01:14:46,160 --> 01:14:49,030
You may consider t
as a fixed number.
941
01:14:49,030 --> 01:14:52,540
So as n goes to infinity--
if n is really, really large,
942
01:14:52,540 --> 01:14:56,730
all these terms will be
smaller order of magnitude
943
01:14:56,730 --> 01:15:00,830
than 1 over n.
944
01:15:00,830 --> 01:15:02,270
Something like that happens.
945
01:15:08,530 --> 01:15:11,250
And that's happening
because t is fixed.
946
01:15:11,250 --> 01:15:14,260
For fixed t, we
have to prove it.
947
01:15:14,260 --> 01:15:16,292
So if we're saying
something uniformly about t,
948
01:15:16,292 --> 01:15:18,390
that's no longer true.
949
01:15:18,390 --> 01:15:21,060
Now we go back to
the exponential form.
950
01:15:21,060 --> 01:15:26,540
So this is pretty much
just e to that term,
951
01:15:26,540 --> 01:15:30,900
1 over 2 t square
sigma square over n
952
01:15:30,900 --> 01:15:37,370
plus little o of 1 over
n to the n-th power.
953
01:15:37,370 --> 01:15:42,980
Now, that n can be
multiplied to cancel out.
954
01:15:42,980 --> 01:15:46,640
And we see that it's e to t
square sigma square over 2
955
01:15:46,640 --> 01:15:48,342
plus the little o of 1.
956
01:15:48,342 --> 01:15:50,370
So if you take n
to go to infinity,
957
01:15:50,370 --> 01:15:55,840
that term disappears,
and we prove
958
01:15:55,840 --> 01:15:57,410
that it converges to that.
959
01:16:00,100 --> 01:16:04,516
And then by the theorem that I
stated before, if we have this,
960
01:16:04,516 --> 01:16:06,182
we know that the
distribution converges.
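(A quick sanity check of that final limit, (1 + t^2 sigma^2/(2n) + o(1/n))^n converging to e^(t^2 sigma^2/2), with arbitrary assumed values t = 0.5 and sigma = 2, not values from the lecture:)

```python
import math

# Sanity check of the final limit in the proof:
#   (1 + t^2 * sigma^2 / (2n))^n  ->  exp(t^2 * sigma^2 / 2)  as n -> infinity.
# Assumed illustrative values (not from the lecture): t = 0.5, sigma = 2.
t, sigma = 0.5, 2.0
target = math.exp(t**2 * sigma**2 / 2)

for n in (10, 1000, 100000):
    approx = (1 + t**2 * sigma**2 / (2 * n)) ** n
    print(n, round(approx, 6), round(target, 6))
```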
961
01:16:09,880 --> 01:16:10,500
Any questions?
962
01:16:13,760 --> 01:16:14,260
OK.
963
01:16:14,260 --> 01:16:15,515
I'll make one final remark.
964
01:16:29,009 --> 01:16:42,640
So suppose there is a random
variable X whose mean we do not
965
01:16:42,640 --> 01:16:44,865
know, whose mean is unknown.
966
01:16:53,670 --> 01:16:55,710
Our goal is to
estimate the mean.
967
01:16:58,970 --> 01:17:02,730
And one way to do that is by
taking many independent trials
968
01:17:02,730 --> 01:17:05,220
of this random variable.
969
01:17:05,220 --> 01:17:21,680
So take independent trials X_1,
X_2, to X_n, and use 1 over n of
970
01:17:21,680 --> 01:17:22,250
X_1 plus...
971
01:17:22,250 --> 01:17:23,565
X_n as our estimator.
972
01:17:32,960 --> 01:17:34,990
Then the law of large
numbers says that this
973
01:17:34,990 --> 01:17:36,750
will be very close to the mean.
974
01:17:36,750 --> 01:17:39,840
So if you take n
to be large enough,
975
01:17:39,840 --> 01:17:42,100
you will more than likely
have some value which
976
01:17:42,100 --> 01:17:44,190
is very close to the mean.
977
01:17:44,190 --> 01:17:47,050
And then the central
limit theorem
978
01:17:47,050 --> 01:17:53,530
tells you how the
distribution of this variable
979
01:17:53,530 --> 01:17:55,915
is around the mean.
980
01:17:55,915 --> 01:17:57,920
So we don't know what
the real value is,
981
01:17:57,920 --> 01:18:00,620
but we know that
the distribution
982
01:18:00,620 --> 01:18:02,980
of the value that
we will obtain here
983
01:18:02,980 --> 01:18:05,048
is something like
that around the mean.
984
01:18:09,340 --> 01:18:17,080
And because the normal distribution
has very small tails,
985
01:18:17,080 --> 01:18:21,900
the tail probabilities
are really small,
986
01:18:21,900 --> 01:18:23,950
we will get really
close really fast.
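(A minimal sketch of this estimation recipe, under an assumed setup not from the lecture: X uniform on [0, 1], so the "unknown" mean is 0.5.)

```python
import random

# Sketch of the estimation recipe just described (assumed setup, not from
# the lecture): X is uniform on [0, 1], so the "unknown" mean is 0.5.
random.seed(1)

def sample_mean(n):
    """Average n independent trials of X -- the estimator from the lecture."""
    return sum(random.random() for _ in range(n)) / n

# Law of large numbers: estimates tighten around 0.5 as n grows.
for n in (10, 1000, 100000):
    print(n, round(sample_mean(n), 3))
```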
987
01:18:27,290 --> 01:18:34,387
And this is known as the maximum
likelihood estimator, is it?
988
01:18:37,670 --> 01:18:38,310
OK, yeah.
989
01:18:38,310 --> 01:18:39,980
For some distributions,
it's better
990
01:18:39,980 --> 01:18:44,080
to take some other estimator.
991
01:18:44,080 --> 01:18:47,280
Which is quite interesting.
992
01:18:47,280 --> 01:18:50,015
At least my intuition is that
taking this in every single case
993
01:18:50,015 --> 01:18:52,890
looks like that will
be a good choice.
994
01:18:52,890 --> 01:18:54,680
But it turns out that
that's not the case;
995
01:18:54,680 --> 01:18:59,492
for some distributions there's
a better choice than this.
996
01:18:59,492 --> 01:19:03,210
And Peter will
later talk about it.
997
01:19:06,340 --> 01:19:09,960
If you're interested
in it, come back.
998
01:19:09,960 --> 01:19:13,960
And that's it for
today, any questions?
999
01:19:13,960 --> 01:19:17,875
So next Tuesday we will
have an outside speaker,
1000
01:19:17,875 --> 01:19:21,256
and it will be on bonds.
1001
01:19:21,256 --> 01:19:24,883
And I don't think anything from
linear algebra will be needed there.