The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So we've been talking about this chi square test. And the name chi square comes from the fact that we build a test statistic whose asymptotic distribution is given by the chi square distribution. Let's just give it another shot.

OK. This test. Who has actually ever encountered the chi square test outside of a stats classroom? All right, so some people have. It's a fairly common test that you might encounter. It was essentially designed to test, given some data with a fixed probability mass function, so a discrete distribution, whether the PMF was equal to a set value, p0, or different from p0. And the way the chi square arose here was by looking at Wald's test. Wald's test is the one that has the chi square as its limiting distribution. You invert the asymptotic covariance matrix, that is, you compute the Fisher information, which in this particular case does not exist for the multinomial distribution. But we found a trick: we removed the part that prevented the matrix from being invertible, and then we found this chi square distribution. In a way we have this test statistic, which you might have learned as a black box, a laundry-list item, but going through the math, which might have been slightly unpleasant, I acknowledge, really told you why you should use this particular normalization.

So since some of you requested more practical examples of how those things work, let me show you a couple. The first one: you want to answer the question, when should I be born to be successful?
Some people believe in the zodiac, and so Fortune magazine actually collected the signs of 256 heads of Fortune 500 companies; those were taken randomly. And you can see the count of the number of CEOs that have a particular zodiac sign. If this were completely uniformly distributed, you should get numbers around 256 divided by 12, which in this case is 21.33. And you can see that there are numbers that are probably in that vicinity, but look at this one. Pisces, that's 29. So who's a Pisces here? All right. All right, so give me your information and we'll meet again in 10 years.

So basically you might want to test whether the assumption that it's uniformly distributed is valid. Now this is clearly a random variable: I pick a random CEO and I measure what their zodiac sign is. So it's a probability distribution over, say, 12 zodiac signs, and I want to know if it's uniform or not. Uniform sounds like it should be the status quo, if you're reasonable, and maybe there's actually something that moves away from it. So we could ask: in view of these data, is there evidence that the distribution is different?

Here is another example where you might want to apply the chi square test. So as I said, the benchmark distribution for the zodiac signs was the uniform distribution, and that's usually the one I give you: 1 over k, ..., 1 over k, because that's sort of the central point for all distributions, the center of what we call the simplex. But you can have another benchmark that makes sense. For example, this is an actual dataset where 275 jurors were identified and their racial groups were collected, and you might want to know if juries in this country are actually representative of the actual population. And here of course the population is not uniformly distributed according to racial group.
And the way you actually do it is you go on Wikipedia, for example, and you look at the demographics of the United States, and you find that the proportion of white is 72%, black is 7%, Hispanic is 12%, and other is about 9%. So that's a total of 1. And this is what we actually measured for some jurors. So for this one, you can run the chi square test. You have the estimated proportions, which come from the first line, and you have the tested proportions, p0, which come from the second line, and you might want to check whether those actually correspond to each other. OK, so I'm not going to do it for you, but I invite you to run the test, compare the statistic to the quantiles of the appropriate chi square distribution, and see what you can conclude from those two things; a sketch of this computation in code follows.
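Here is a minimal sketch of how that computation might look in Python. Only the totals (256 CEOs, 275 jurors), the Pisces count of 29, and the benchmark proportions are quoted in the lecture, so all the other counts below are hypothetical placeholders for illustration.

```python
# Hypothetical chi-square goodness-of-fit computations for the two examples.
import numpy as np
from scipy import stats

# Zodiac example: H0 is the uniform PMF (1/12, ..., 1/12).
ceo_counts = np.array([23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 29])  # Pisces last
assert ceo_counts.sum() == 256
chi2_stat, p_value = stats.chisquare(ceo_counts)  # uniform expected counts by default
print(f"zodiac: chi2 = {chi2_stat:.2f}, p-value = {p_value:.3f}")

# Juror example: H0 is the census proportions, not the uniform distribution.
p0 = np.array([0.72, 0.07, 0.12, 0.09])      # white, black, Hispanic, other
juror_counts = np.array([205, 26, 25, 19])   # hypothetical counts, sum to 275
chi2_stat, p_value = stats.chisquare(juror_counts, f_exp=275 * p0)
print(f"jurors: chi2 = {chi2_stat:.2f}, p-value = {p_value:.3f}")

# Equivalent manual rejection rule at level alpha = 5%:
# reject H0 when the statistic exceeds the chi-square quantile with k - 1 dof.
print("5% quantile of chi2(11):", stats.chi2.ppf(0.95, df=11))
```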
All right. So this was the multinomial case. This is essentially what we did: we computed the MLE under the right constraint, and that gave us our test statistic that converges to the chi square distribution. If you've seen it before, that's all that was given to you. Now we know why the normalization here is p0 j, and not p0 j squared or square root of p0 j, or even 1. I mean, it's not obvious that this should be the right normalization, but we know it's what comes from taking the right normalization, which comes from the Fisher information. All right? OK.

So we've basically covered the chi square test. Are there any questions about the chi square test? And for those of you who were not here on Thursday, I'm really just-- do not pretend I just did it; that's something we did last Thursday. But are there any questions that arose when you were reading your notes, things that you didn't understand? Yes.

AUDIENCE: Is there like a formal name? Before, we had talked about how what we call the Fisher information [INAUDIBLE], still has the same [INAUDIBLE] because it's the same number.

PROFESSOR: So it's not the Fisher information. The Fisher information does not exist in this case, and so there's no appropriate name for this. It's the pseudoinverse of the asymptotic covariance matrix, and that's what it is. I don't know if I mentioned it last time, but there's an entire field that uses-- you know, for people who really aspire to differential geometry but are stuck in the stats department, there's this thing called information geometry, which essentially studies the manifolds associated to the Fisher information metric, the metric associated to the Fisher information. And those can of course be lower dimensional manifolds; the geometry is not only distorted, everything is forced to live on a lower dimension, which is what happens when your Fisher information does not exist. And so there's a bunch of things you can study, what this manifold looks like, et cetera. But no, there's no particular terminology here.

To be fair, within the scope of this class, the multinomial case is the only case where you typically see a lack of a Fisher information matrix. And that's just because we have this extra constraint that the sum of the parameters should be 1. If you have an extra constraint that removes one degree of freedom, this will happen inevitably. And so maybe what you can do is reparameterize. If I reparameterize everything as a function of p1 to p k minus 1, and then 1 minus their sum, this would not have happened, because I have only a (k minus 1)-dimensional space. So there are tricks around this to make the Fisher information exist if you want it to exist. Any other question? All right.
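To make the singularity concrete, here is the computation behind the statement that the Fisher information does not exist for the multinomial; the last display, the Fisher information of the reparameterized model, is the standard textbook expression, added here for reference.

```latex
% If X is the one-hot encoding of one multinomial draw with PMF p,
% its covariance matrix is
\[
  \Sigma \;=\; \operatorname{diag}(p) - p\,p^\top ,
\]
% and since the coordinates of p sum to one,
\[
  \Sigma \mathbf{1} \;=\; p - p\,(p^\top \mathbf{1}) \;=\; p - p \;=\; 0,
\]
% so \Sigma is singular: there is no Fisher information matrix in the
% full k-dimensional parameterization.  After reparameterizing by
% (p_1, \dots, p_{k-1}), the Fisher information exists and equals
\[
  I(p)_{ij} \;=\; \frac{\delta_{ij}}{p_i} + \frac{1}{p_k},
  \qquad 1 \le i, j \le k-1 .
\]
```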
So let's move on to Student's t-test. We mentioned it last time. You've probably used it more in the homework than in lectures, but quickly, this is essentially the test we use when the data actually comes from a normal distribution and there is no Central Limit Theorem to invoke. This is really to account for the fact that, for smaller sample sizes, it might not be exactly true that, when I look at xn bar minus mu divided by sigma, times square root of n, this thing has an N(0, 1) distribution approximately, by the Central Limit Theorem. That's for n large. But if n is small, it is still true that, when the data is N(mu, sigma squared), square root of n times xn bar minus mu over sigma is exactly N(0, 1); here it was approximate, for Gaussian data it is always true. But I don't know sigma in practice, right? Mu is fine; under the null, mu comes from my mu 0, that's the value that goes into the test statistic. But for sigma I'm inevitably going to have to find an estimator. And now, for small n, once I plug in the estimator, this is no longer true. What the t statistic is doing is essentially telling you what the distribution of this quantity is. So what you should say is that now this quantity has a t distribution with n minus 1 degrees of freedom. That's the laundry-list stats answer you would learn: it just says, look at a different table. But we actually defined what a t distribution is. A t distribution with d degrees of freedom is something that has the same distribution as an N(0, 1) divided by the square root of a chi square with d degrees of freedom divided by d, where those two are independent. And so what I need to check is that the quantity over there is of this form. OK? So let's look at the numerator: square root of n, times xn bar minus mu.
What is the distribution of this thing? Is it an N(0, 1)?

AUDIENCE: N(0, sigma squared)?

PROFESSOR: N(0, sigma squared), right. So I'm not going to put it here. If I want this to be N(0, 1), I need to divide by sigma; that's what we have over there. So that's my N(0, 1) that's going to play the role of the numerator here. If I want to go a little further, I need square root of n, and I need to find something here that looks like my square root of a chi square divided by-- yeah?

AUDIENCE: Really quick question. The equals sign with the d on top, that's just defined as?

PROFESSOR: No, that means equality in distribution. So, I don't know.

AUDIENCE: Then never mind.

PROFESSOR: Let's just write it like that, if you want. I mean, that's not really appropriate notation. Usually you write only one distribution on the right-hand side of this little symbol, not a complicated function of distributions. This is more to explain. OK, so usually the thing you should say is that t is equal to X divided by the square root of Z over d, where X has a normal distribution and Z has a chi square distribution with d degrees of freedom.

So what do we need here? Well, I need something which looks like my sigma hat, right? So somehow, inevitably, I'm going to need sigma hat. Now of course I need to divide this by sigma so that the sigma goes away. And so now this thing here-- sorry, I should move to the right, OK. And so this thing here, sigma hat, is the square root of Sn. And now I'm almost there. So this thing is actually equal to square root of n. But this thing here is actually not a-- so this thing here follows a distribution which is actually the square root of a chi square distribution divided by n.
Yeah, that's the square root of a chi square distribution with n minus 1 degrees of freedom, divided by n, because sigma hat squared, which is Sn, is equal to 1 over n times the sum from i equal 1 to n of xi minus x bar squared. And we just said that this part is a chi square distribution. We didn't just say it, we said it a few lectures back: this sum is a chi square distribution, and the presence of this x bar here removes one degree of freedom from the sum. OK, so this quantity has the same distribution as a chi square with n minus 1 degrees of freedom, divided by n.

So I still need to arrange this a little bit to get a t distribution. I should not see n here; I should see n minus 1, because the d in the denominator is the same as the degrees of freedom. So let me make the correction so that this actually happens. Well, if I write square root of n minus 1, as on the slide, times xn bar minus mu, divided by the square root of Sn, which is my sigma hat, then this follows an N(0, 1) divided by the square root of a chi square distribution with n minus 1 degrees of freedom, over n minus 1. And the fact that I multiply by square root of n minus 1 while I have the square root of n inside is essentially the same as dividing here by n minus 1. And that's my t distribution with n minus 1 degrees of freedom, just by definition of what this thing is. OK? All right. Yes?

AUDIENCE: Where'd you get the square root from?

PROFESSOR: This one? Oh sorry, that's sigma squared. Thank you. That's the estimator of the variance, not the estimator of the standard deviation, and when I want to divide, I divide by the standard deviation. Thank you. Any other question or remark?

AUDIENCE: Shouldn't you divide by sigma squared? The actual.
The estimator for the variance is equal to sigma squared times a chi square, right?

PROFESSOR: The estimator for the variance. Oh yes, you're right. So there's a sigma squared here. Is that what you're asking?

AUDIENCE: Yeah.

PROFESSOR: Yes, absolutely. And that's where it gets canceled here. OK? So this is really sigma squared times a chi square. The fact that it's sigma squared is just because I can pull out the sigma squared and think of those terms as N(0, 1).

All right. So that's my t distribution. Now that I actually have a pivotal distribution, what I do is form the statistic; here I called it Tn tilde. OK. And what is this thing? I know that this has a pivotal distribution. So, for example, I know that the probability that Tn tilde in absolute value exceeds some number, which I'm going to call q alpha over 2 of t n minus 1, is equal to alpha. Remember, the t distribution has the same shape as the Gaussian distribution. What I'm finding is, for this t distribution, some numbers q alpha over 2 of t n minus 1 and minus q alpha over 2 of t n minus 1, different from the Gaussian ones, such that the area under the curve is alpha over 2 on each side, so that the probability that the absolute value exceeds this number is equal to alpha. And that's what I'm going to use to reject the test.

So now my test becomes: for H0, mu is equal to some mu 0, versus H1, mu is not equal to mu 0, the rejection region is the set on which square root of n minus 1, times xn bar minus mu 0 this time, divided by square root of Sn, exceeds, in absolute value, q alpha over 2 of t n minus 1. So I reject when this statistic is large. It's the same as the Gaussian case, except that rather than reading my quantiles from the Gaussian table, I read them from the Student table. It's just the same thing, so they're just going to be a little bit farther out. This quantile is going to be a little bigger than the Gaussian one, because it's going to require a little more evidence in my data to be able to reject, since I have to account for the fluctuations of sigma hat. A small numerical sketch of this test follows.
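Here is a minimal sketch of the test as stated, on a made-up sample; mu0, alpha, and the data are illustrative choices. Note that library routines such as scipy.stats.ttest_1samp use the unbiased variance, with n minus 1 in the denominator, which gives exactly the same statistic as the square root of n minus 1 over square root of Sn form used on the board.

```python
# Minimal sketch of the one-sample two-sided t-test from the lecture.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu0, alpha = 15, 0.0, 0.05
x = rng.normal(loc=0.3, scale=1.0, size=n)   # made-up Gaussian sample

xbar = x.mean()
Sn = ((x - xbar) ** 2).mean()                # biased variance estimator (1/n)
Tn = np.sqrt(n - 1) * (xbar - mu0) / np.sqrt(Sn)

q = stats.t.ppf(1 - alpha / 2, df=n - 1)     # Student quantile q_{alpha/2}(t_{n-1})
print(f"Tn = {Tn:.3f}, cutoff = {q:.3f}, reject = {abs(Tn) > q}")

# Same statistic via the library (it uses the unbiased variance internally):
print(stats.ttest_1samp(x, popmean=mu0))

# The Student cutoff is a bit larger than the Gaussian one:
print("t quantile:", q, "vs normal quantile:", stats.norm.ppf(1 - alpha / 2))
```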
So of course Student's test is used everywhere. People use only t tests, right? If you look at any output, even if you had 500 observations, any statistical software output is going to say t test. And the reason you see t test is that somehow it feels like it's not asymptotic; you don't need to be particularly careful. And anyway, if n is equal to 500, since the two curves are basically on top of each other, it's the same thing, so it doesn't really change anything. So why not use the t test? It's not asymptotic. It doesn't require the Central Limit Theorem to kick in, and so in particular it can be run if you have 15 observations.

Of course, the drawback of the Student test is that it relies on the assumption that the sample is Gaussian, and that's something we really need to keep in mind. If you have a small sample size, there is no magic going on. It's not that the Student t test allows you to get rid of asymptotic normality; it sort of assumes normality is built in. It assumes that your data has a Gaussian distribution.

So if you have 15 observations, what are you going to do? You want to test if the mean is equal to 0 or not equal to 0, but you have only 15 observations. You have to somehow assume that your data is Gaussian. But if the data is given to you, this is not math: you actually have to check that it's Gaussian. And so we're going to have to find a test that, given some data, tells us whether it's Gaussian or not.
419 00:20:39,830 --> 00:20:42,320 If I have 15 observations, 8 of them 420 00:20:42,320 --> 00:20:46,089 are equal to plus 1 and 7 of them are equal to minus 1, 421 00:20:46,089 --> 00:20:47,630 then it's pretty unlikely that you're 422 00:20:47,630 --> 00:20:50,046 going to be able to conclude that your data has a Gaussian 423 00:20:50,046 --> 00:20:51,000 distribution. 424 00:20:51,000 --> 00:20:54,320 However, if you see some sort of spread around some value, 425 00:20:54,320 --> 00:20:56,120 you form a histogram maybe and it sort of 426 00:20:56,120 --> 00:20:57,710 looks like it's a Gaussian, you might 427 00:20:57,710 --> 00:20:59,120 want to say it's Gaussian. 428 00:20:59,120 --> 00:21:01,920 And so how do we make this more quantitative? 429 00:21:01,920 --> 00:21:05,390 Well, the sad answer to this question 430 00:21:05,390 --> 00:21:08,030 is that there will be some tests that make it quantitative, 431 00:21:08,030 --> 00:21:11,590 but here, if you think about it for one second, what is going 432 00:21:11,590 --> 00:21:13,030 to be your null hypothesis? 433 00:21:13,030 --> 00:21:15,930 Your null hypothesis, since it's one point, 434 00:21:15,930 --> 00:21:17,880 it's going to be that it's Gaussian, 435 00:21:17,880 --> 00:21:19,290 and then the alternative is going 436 00:21:19,290 --> 00:21:21,520 to be that it's not Gaussian. 437 00:21:21,520 --> 00:21:23,860 So what it means is that, for the first time 438 00:21:23,860 --> 00:21:26,140 in your statistician life, you're 439 00:21:26,140 --> 00:21:30,142 going to want to conclude that H0 is the true one. 440 00:21:30,142 --> 00:21:31,600 You're definitely not going to want 441 00:21:31,600 --> 00:21:34,016 to say that it's not Gaussian, because then everything you 442 00:21:34,016 --> 00:21:36,580 know is sort of falling apart. 443 00:21:36,580 --> 00:21:39,540 And so it's kind of a weird thing where 444 00:21:39,540 --> 00:21:41,430 you're sort of going to be seeking tests 445 00:21:41,430 --> 00:21:43,140 that have no power basically. 446 00:21:43,140 --> 00:21:46,240 You're going to want to test that, and that's the nature. 447 00:21:46,240 --> 00:21:49,140 The amount of alternatives, the number 448 00:21:49,140 --> 00:21:52,710 of ways you can be not Gaussian, is so huge 449 00:21:52,710 --> 00:21:56,569 that all tests are sort of bound to have very low power. 450 00:21:56,569 --> 00:21:58,860 And so that's why people are pretty happy with the idea 451 00:21:58,860 --> 00:22:00,420 that things are Gaussian, because it's 452 00:22:00,420 --> 00:22:01,961 very hard to find a test that's going 453 00:22:01,961 --> 00:22:04,790 to reject this hypothesis. 454 00:22:04,790 --> 00:22:08,479 And so we're even going to find some tests that are visual, 455 00:22:08,479 --> 00:22:10,020 where you're going to be able to say, 456 00:22:10,020 --> 00:22:12,800 well, sort of looks Gaussian to me. 457 00:22:12,800 --> 00:22:16,760 It allows you to deal with the borderline cases 458 00:22:16,760 --> 00:22:17,580 pretty efficiently. 459 00:22:17,580 --> 00:22:19,930 We'll see actually a particular example. 460 00:22:19,930 --> 00:22:22,280 All right, so this theory of testing 461 00:22:22,280 --> 00:22:24,470 whether data comes from a particular distribution 462 00:22:24,470 --> 00:22:26,930 is called goodness of fit. 463 00:22:26,930 --> 00:22:31,480 Is this distribution a good fit for my data? 464 00:22:31,480 --> 00:22:33,620 That's the goodness of fit test. 465 00:22:33,620 --> 00:22:36,110 We have just seen a goodness of fit test. 
What was it? Yeah, the chi square test, right? In the chi square test, we were given a candidate PMF and we were testing whether it was a good fit for our data. That was a goodness of fit test. So of course the multinomial is one example, but really what we have in the back of our mind is: I want to test if my data is Gaussian. That's basically the usual thing. And just as you always see the t test as the standard output from statistical software, whether you ask for it or not, there will be a test for normality, whether you ask for it or not, in any statistical software app.

All right. So a goodness of fit test looks as follows. There's a random variable X and you're given i.i.d. copies of X, X1 to Xn; they come from the same distribution. And you're going to ask the following question: does X have a standard normal distribution? For the t-test, that's definitely the kind of question you may want to ask. Does X have a uniform distribution on [0, 1]? That's different from the PMF 1 over k, ..., 1 over k; it's the continuous notion of uniformity.

And for example, there's actually a nice exercise, which is to look at p-values. So we've defined what p-values are, and a p-value is a number between 0 and 1, right? And you could ask yourself: what is the distribution of the p-value under the null? So the p-value is a random number. Let's look at the following test: H0, mu is equal to 0, versus H1, mu is not equal to 0. And I'm going to look at Xn bar minus mu, times square root of n, divided by sigma; let's say we know sigma for one second. Then the p-value is the probability that this is larger than square root of n times little xn bar, minus 0 actually in this case, divided by sigma, where little xn bar is the observed value.
So now you could say, well, how is that a random variable? It's just a number; it's just a probability of something. But then I can view this as a function of the observed value, which becomes a random variable when I plug the random average back in. So what I mean by this is that, if I say that Phi is the CDF of N(0, 1), the p-value is the probability that I exceed this value; so that's the probability that I'm either here or here, in the two tails.

AUDIENCE: [INAUDIBLE]

PROFESSOR: No, it's not, right?

AUDIENCE: [INAUDIBLE]

PROFESSOR: This is a big X and this is a small x. The small x is just where you plug in your data. The p-value is the probability that you see more evidence against your null than what you already have. OK, so now I can write it in terms of cumulative distribution functions. So this is what? It's basically 2 times Phi of minus square root of n times the absolute value of xn bar, divided by sigma. That's my p-value. If you give me data, I compute the average, plug it in there, and it spits out the p-value. Everybody agrees?

So now, if I start looking back, I say, well, where does this data come from? It could be a random variable; it came as the realization of this thing. So I can think of this value where now this is a random variable, because I just plugged a random variable in here. So now I view my p-value as a random variable. So I keep switching from small x to large X. Everybody agrees with what I'm doing here? I just wrote the p-value as a deterministic function of some deterministic number, and now the function stays deterministic but the number becomes random. And so I can think of this as some statistic of my data, and I could say, well, what is the distribution of this random variable?
558 00:27:26,560 --> 00:27:29,480 Now if my data is actually normally distributed, 559 00:27:29,480 --> 00:27:31,810 so I'm actually under the null, so 560 00:27:31,810 --> 00:27:37,570 under the null, that means that Xn bar times square root of n 561 00:27:37,570 --> 00:27:40,947 divided by sigma has what distribution? 562 00:27:48,335 --> 00:27:48,835 Normal? 563 00:27:56,540 --> 00:27:59,260 Well it was sigma, I assume I knew it. 564 00:27:59,260 --> 00:28:00,520 So it's N 0, 1, right? 565 00:28:00,520 --> 00:28:02,080 I divided by sigma here. 566 00:28:02,080 --> 00:28:03,010 OK? 567 00:28:03,010 --> 00:28:04,500 So now I have this random variable. 568 00:28:15,880 --> 00:28:24,012 And so my random variable is now 2 phi of minus absolute value 569 00:28:24,012 --> 00:28:24,595 of a Gaussian. 570 00:28:34,430 --> 00:28:40,300 And I'm actually interested in the distribution of this thing. 571 00:28:40,300 --> 00:28:41,620 I could ask that. 572 00:28:41,620 --> 00:28:43,150 Anybody has an idea of how you would 573 00:28:43,150 --> 00:28:45,017 want to tackle this thing? 574 00:28:45,017 --> 00:28:46,600 If I ask you, what is the distribution 575 00:28:46,600 --> 00:28:48,930 of a random variable, how do you tackle this question? 576 00:28:53,120 --> 00:28:54,360 There's basically two ways. 577 00:28:54,360 --> 00:28:55,880 One is to try to find something that 578 00:28:55,880 --> 00:29:02,090 looks like the expectation of h of x for all h. 579 00:29:02,090 --> 00:29:04,790 And you try to write this using change of variables 580 00:29:04,790 --> 00:29:09,260 and something that looks like integral of h of x p of x dx. 581 00:29:09,260 --> 00:29:12,540 And then you say, well, that's the density. 582 00:29:12,540 --> 00:29:15,290 If you can read this for any h, then that's 583 00:29:15,290 --> 00:29:16,970 the way you would do it. 584 00:29:16,970 --> 00:29:19,160 But there's a simpler way that does not 585 00:29:19,160 --> 00:29:21,805 involve changing variables, et cetera, 586 00:29:21,805 --> 00:29:23,930 you just try to compute the cumulative distribution 587 00:29:23,930 --> 00:29:25,250 function. 588 00:29:25,250 --> 00:29:26,900 So let's try to compute the probability 589 00:29:26,900 --> 00:29:34,850 that 2 phi minus N 0, 1, is less than t. 590 00:29:34,850 --> 00:29:38,130 And maybe we can find something we know. 591 00:29:38,130 --> 00:29:38,630 OK. 592 00:29:38,630 --> 00:29:39,713 Well that's equal to what? 593 00:29:39,713 --> 00:29:43,040 That's the probability that a minus N 0, 594 00:29:43,040 --> 00:29:45,968 well let's say that an N 0, 1-- 595 00:29:45,968 --> 00:29:57,590 sorry, N 0, 1 absolute value is greater than minus phi inverse 596 00:29:57,590 --> 00:29:58,600 of t over 2. 597 00:30:04,170 --> 00:30:05,477 And that's what? 598 00:30:05,477 --> 00:30:07,560 Well, it's just the same thing that we had before. 599 00:30:07,560 --> 00:30:12,990 It's equal to-- so if I look again, 600 00:30:12,990 --> 00:30:15,840 this is the probability that I'm actually on this side 601 00:30:15,840 --> 00:30:17,550 or that side of this number. 602 00:30:17,550 --> 00:30:18,650 And this number is what? 603 00:30:18,650 --> 00:30:25,840 It's minus phi of t over 2. 604 00:30:25,840 --> 00:30:27,080 Why do I have a minus here? 605 00:30:32,230 --> 00:30:33,880 That's fine, OK. 606 00:30:33,880 --> 00:30:36,220 So it's actually not this, it's actually the probability 607 00:30:36,220 --> 00:30:39,830 that my absolute value-- 608 00:30:39,830 --> 00:30:41,340 oh, because phi inverse. 
OK. Because Phi inverse is-- so I'm going to look at t between 0 and 1. Well, why between 0 and 1? The probability that something is less than t should range over the values that this random variable takes.

AUDIENCE: Negative absolute value is always less than [INAUDIBLE].

PROFESSOR: Yeah, you're right, thank you. Minus the absolute value is always a number less than or equal to 0, so the probability that the Gaussian is less than this number is always less than the probability that it's less than 0, which is 1/2, so 2 Phi of minus the absolute value takes values between 0 and 1, and t only has to be between 0 and 1. Thank you.

And so now, for t between 0 and 1, this number minus Phi inverse of t over 2 is positive, for the same reason as before. And so what is that probability? That's just, basically, 2 times Phi of Phi inverse of t over 2. That's just playing with the symmetry a little bit; you can look at the areas under the curve. And Phi of Phi inverse is the identity, so those two cancel, and this is equal to t. So which distribution has a cumulative distribution function which is equal to t for t between 0 and 1? That's the uniform distribution, right? So it means that this quantity follows a uniform distribution on the interval [0, 1].

And you can actually check that for any test you're going to come up with, this is going to be the case: your p-value under the null will have a uniform distribution. So now if somebody shows up and says, here's my test, it's awesome, it just works great, I'm not going to explain to you how I built it, it's a complicated statistic that involves moments of order 27; then I'm like, OK, you know, how am I going to check that your test statistic actually makes sense?
Well, one thing I can do is draw a bunch of samples, compute your test statistic, compute the p-value each time, and check whether my p-values have a uniform distribution on the interval [0, 1]. But for that I need a test that, given a bunch of observations, can tell me whether they're actually distributed uniformly on the interval [0, 1]. And again, one thing I could do is build a histogram and see if it looks like that of a uniform, but I could also try to be slightly more quantitative about this.

AUDIENCE: Why does the [INAUDIBLE] have to be for a [INAUDIBLE]?

PROFESSOR: For two tests?

AUDIENCE: For each test. Why does the p-value have to be normal? I mean, uniform.

PROFESSOR: It's uniform under the null. Because my test statistic was built under the null, and so I have to be able to plug in the right value in there; otherwise it's going to shift everything for this particular test.

AUDIENCE: At the beginning, your probability was of big Xn, that thing. That thing is the p-value.

PROFESSOR: That's the p-value, right? That's the definition of the p-value.

AUDIENCE: OK.

PROFESSOR: So it's the probability that my test statistic exceeds what I've actually observed.

AUDIENCE: So how you run the test is basically you have your observations and plug them into the cumulative distribution function for a normal, and then see if it falls under the given--

PROFESSOR: Yeah. So my p-value is just this number when I plug in the values that I observe here; that's one number. For every dataset you're going to give me, it's going to be one number. Now what I can do is generate a bunch of datasets of size n, like 200 of them. And then I'm going to have a new sample of, say, 200 p-values. And I want to test whether those p-values have a uniform distribution. OK? Because that's the distribution they should be having under the null.
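Here is a minimal sketch of that simulation, assuming the simple known-sigma test from the board; the sample sizes and the final Kolmogorov-Smirnov check are illustrative choices, not something prescribed in the lecture.

```python
# Simulate the p-value of the two-sided test of H0: mu = 0 (sigma known)
# over many datasets drawn under the null, and check uniformity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_datasets, n, sigma = 200, 50, 1.0

pvals = np.empty(n_datasets)
for i in range(n_datasets):
    x = rng.normal(loc=0.0, scale=sigma, size=n)      # data under H0
    # p-value = 2 * Phi(-sqrt(n) * |xbar| / sigma), as derived on the board
    pvals[i] = 2 * stats.norm.cdf(-np.sqrt(n) * abs(x.mean()) / sigma)

# One quantitative check of uniformity (previewing goodness of fit):
print(stats.kstest(pvals, "uniform"))   # large p-value: consistent with U(0, 1)
```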
All right? OK. This one we've already seen: does X have a PMF with 30%, 50%, and 20%? That's something I could try to test. That looks like your grade point distribution for this class. Well, not exactly, but it looks like it.

So all these things are known as goodness of fit tests. With a goodness of fit test, you want to know whether the data you have at hand follows the hypothesized distribution. So it's not a parametric test. It's not a test that says, is my mean equal to 25 or not, or is my proportion of heads larger than 1/2 or not. It's a test that says, is my distribution this particular thing. So I'm going to write them as goodness of fit, G-O-F here. You don't need parametric modeling to do that.

So how does it work? If I don't have any parametric modeling, I need something which is somewhat non-parametric, something that goes beyond computing the mean and the standard deviation, something that captures some intrinsic non-parametric aspect of my data. And just as in the computation we made here, where we checked that the CDF of my p-value is that of a uniform to conclude it's uniform, the cumulative distribution function has the property that it captures the entire distribution. Everything I need to know about my distribution is captured by the cumulative distribution function.

Now I have a data-driven way of computing an estimate for the cumulative distribution function, using the old statistical trick which consists of replacing expectations by averages. So as I said, the cumulative distribution function for any random variable is: F of t is the probability that X is less than or equal to t, which is equal to the expectation of the indicator that X is less than or equal to t. That's the definition of a probability.
And so here I'm just going to replace the expectation by the average; that's my usual statistical trick. And so my estimator Fn of t is 1 over n times the sum from i equal 1 to n of the indicators that Xi is less than or equal to t. This is called the empirical CDF. It's just the data version of the CDF: I replaced the expectation by an average.

Now, when I sum indicators, I'm actually counting the number of them that satisfy something. So this sum is the number of Xi's that are less than or equal to t, and when I divide by n, it's the proportion of observations that are less than or equal to t. That's what the empirical CDF is; that's what's written here, the number of data points that are less than or equal to t, over n. And so this is something that's trying to estimate the true CDF. And the law of large numbers actually tells me that, for any given t, if n is large enough, Fn of t should be close to F of t, because it's an average. This entire statistical trick, replacing expectations by averages, is justified by the law of large numbers; every time we used it, it was because the law of large numbers guaranteed that the average was close to the expectation.

OK. So the law of large numbers, the strong law, tells me that Fn of t converges almost surely to F of t. And that's just for any given t. Is there any question about this? That averages converge to expectations, that's the law of large numbers. And instead of almost surely we could say in probability; that would be the weak law of large numbers.

Now this is fine: for any given t, the average converges to the truth. It just happens that this random variable is indexed by t, and I could do it for t equals 1 or 2 or 25, and just check it again.
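Here is a minimal sketch of the empirical CDF and its pointwise convergence, on simulated standard Gaussian data; the sample sizes and the evaluation point t are illustrative.

```python
# Empirical CDF: Fn(t) = (1/n) * #{i : Xi <= t}, compared to the true CDF
# at a fixed point t, for growing n (pointwise law of large numbers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
t = 1.0
for n in (10, 100, 10_000):
    x = rng.normal(size=n)                 # simulated N(0, 1) sample
    Fn_t = np.mean(x <= t)                 # empirical CDF evaluated at t
    print(f"n = {n:6d}: Fn(t) = {Fn_t:.4f}, F(t) = {stats.norm.cdf(t):.4f}")
```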
799 00:40:19,770 --> 00:40:21,960 That's called a uniform result. I 800 00:40:21,960 --> 00:40:25,200 want this to hold for all t at the same time. 801 00:40:25,200 --> 00:40:28,720 And it may be the case that it works for each t individually 802 00:40:28,720 --> 00:40:31,200 but not for all t's at the same time. 803 00:40:31,200 --> 00:40:33,330 What could happen is that for t equals 1 804 00:40:33,330 --> 00:40:36,326 it converges at a certain rate, and for t equals 2 805 00:40:36,326 --> 00:40:37,950 it converges at a bit of a slower rate, 806 00:40:37,950 --> 00:40:41,010 and for t equals 3 at a slower rate and slower rate. 807 00:40:41,010 --> 00:40:43,770 And so as t goes to infinity, the rate is going to vanish 808 00:40:43,770 --> 00:40:45,360 and nothing is going to converge. 809 00:40:45,360 --> 00:40:46,290 That could happen. 810 00:40:46,290 --> 00:40:48,600 I could make this happen at a finite point. 811 00:40:48,600 --> 00:40:50,850 There are many ways I could make this happen. 812 00:40:50,850 --> 00:40:52,780 Let's see how that could work. 813 00:40:52,780 --> 00:40:54,780 I could say, well, actually no. 814 00:40:54,780 --> 00:40:59,115 I still need to have this at infinity for some reason. 815 00:40:59,115 --> 00:41:01,686 It turns out that this is still true uniformly, 816 00:41:01,686 --> 00:41:03,810 and this is actually a much more complicated result 817 00:41:03,810 --> 00:41:05,340 than the law of large numbers. 818 00:41:05,340 --> 00:41:07,740 It's called the Glivenko-Cantelli Theorem. 819 00:41:07,740 --> 00:41:09,270 And the Glivenko-Cantelli Theorem 820 00:41:09,270 --> 00:41:14,960 tells me that, for all t's at once, Fn converges to F. 821 00:41:14,960 --> 00:41:18,230 So let me just show you quickly why 822 00:41:18,230 --> 00:41:22,040 this is just a little bit stronger than the one 823 00:41:22,040 --> 00:41:25,900 that we had. 824 00:41:25,900 --> 00:41:29,120 If sup is confusing you, think of max. 825 00:41:29,120 --> 00:41:31,880 It's just the max over an infinite set. 826 00:41:31,880 --> 00:41:40,500 And so what we know is that Fn of t goes to F of t 827 00:41:40,500 --> 00:41:43,270 as n goes to infinity. 828 00:41:43,270 --> 00:41:45,560 And that's almost surely. 829 00:41:45,560 --> 00:41:48,360 And that's the law of large numbers. 830 00:41:48,360 --> 00:41:54,920 Which is equivalent to saying that Fn of t minus F of t as n 831 00:41:54,920 --> 00:41:59,700 goes to infinity converges almost surely to 0, right? 832 00:41:59,700 --> 00:42:01,823 This is the same thing. 833 00:42:01,823 --> 00:42:07,217 Now I want this to happen for all t's at once. 834 00:42:07,217 --> 00:42:09,300 So what I'm going to do-- oh, and this is actually 835 00:42:09,300 --> 00:42:11,174 equivalent to this. 836 00:42:11,174 --> 00:42:12,840 And so what I'm going to do is I'm going 837 00:42:12,840 --> 00:42:14,590 to make it a little stronger. 838 00:42:14,590 --> 00:42:16,990 So here the arrow only goes one way. 839 00:42:16,990 --> 00:42:20,660 And this is where I put the sup over t in R of Fn of t minus F of t. 840 00:42:26,847 --> 00:42:28,930 And you could actually show that this happens also 841 00:42:28,930 --> 00:42:29,513 almost surely. 842 00:42:35,500 --> 00:42:37,650 Now maybe almost surely is a bit more 843 00:42:37,650 --> 00:42:39,210 difficult to get a grasp on.
844 00:42:43,560 --> 00:42:48,630 Does anybody want to see, like why this statement for this sup 845 00:42:48,630 --> 00:42:51,030 is strictly stronger than the one that holds individually 846 00:42:51,030 --> 00:42:52,880 for all t's? 847 00:42:52,880 --> 00:42:54,086 You want to see that? 848 00:42:54,086 --> 00:42:54,960 OK, so let's do that. 849 00:42:54,960 --> 00:42:57,410 So forget about the almost surely for one second. 850 00:42:57,410 --> 00:42:59,660 Let's just do it in probability. 851 00:42:59,660 --> 00:43:09,690 The fact that Fn of t converges to F of t for all t, 852 00:43:09,690 --> 00:43:12,300 in probability means that this goes to 0 as n goes 853 00:43:12,300 --> 00:43:13,860 to infinity for any epsilon. 854 00:43:17,400 --> 00:43:19,150 For any epsilon and t we know we have this. 855 00:43:19,150 --> 00:43:22,529 That's the convergence in probability. 856 00:43:22,529 --> 00:43:24,070 Now what I want is to put a sup here. 857 00:43:28,408 --> 00:43:32,920 The probability that involves the sup 858 00:43:32,920 --> 00:43:38,080 might actually always stay larger than something, and never go to 0 859 00:43:38,080 --> 00:43:39,090 in some cases. 860 00:43:39,090 --> 00:43:42,410 It could be the case that for each given t, 861 00:43:42,410 --> 00:43:46,440 I can make n large enough so that this probability becomes 862 00:43:46,440 --> 00:43:47,640 small. 863 00:43:47,640 --> 00:43:49,950 But then maybe it's an n of t. 864 00:43:49,950 --> 00:43:53,750 So this here means that for any-- 865 00:43:53,750 --> 00:43:56,570 maybe I shouldn't put, let me put a delta here. 866 00:43:56,570 --> 00:44:02,460 So for any delta, for any t and for any epsilon, 867 00:44:02,460 --> 00:44:09,800 there exists n, which could depend on both epsilon 868 00:44:09,800 --> 00:44:15,920 and t, such that the probability of Fn of t 869 00:44:15,920 --> 00:44:25,110 minus F of t exceeding delta is less than epsilon. 870 00:44:25,110 --> 00:44:29,140 There exists an n and a delta. 871 00:44:29,140 --> 00:44:30,600 No, that's for all delta, sorry. 872 00:44:34,810 --> 00:44:36,100 So this is true. 873 00:44:36,100 --> 00:44:40,040 That's what this limit statement actually means. 874 00:44:40,040 --> 00:44:43,060 But it could be the case that now when I take the sup over t, 875 00:44:43,060 --> 00:44:47,380 maybe that n of t is something that looks like t. 876 00:44:50,480 --> 00:44:54,510 Or maybe, well, integer part of t. 877 00:44:54,510 --> 00:44:56,175 It could be, right? 878 00:44:56,175 --> 00:44:57,050 I don't say anything. 879 00:44:57,050 --> 00:44:59,710 It's just an n that depends on t. 880 00:44:59,710 --> 00:45:04,730 So if this n is just t, maybe t over epsilon, 881 00:45:04,730 --> 00:45:05,930 because I want epsilon. 882 00:45:05,930 --> 00:45:07,610 Something like this. 883 00:45:07,610 --> 00:45:09,470 Well that means that if I want this 884 00:45:09,470 --> 00:45:11,510 to hold for all t's at once, I'm going 885 00:45:11,510 --> 00:45:15,980 to have to go for the n that works for all t's at once. 886 00:45:15,980 --> 00:45:19,070 But there's no such n that works for all t's at once. 887 00:45:19,070 --> 00:45:21,830 The only n that works is infinity. 888 00:45:21,830 --> 00:45:24,350 And so I cannot make this happen for all of them. 889 00:45:24,350 --> 00:45:26,420 What Glivenko-Cantelli tells you 890 00:45:26,420 --> 00:45:29,090 is that this is actually not what happens here.
891 00:45:29,090 --> 00:45:33,650 For the n that depends on t, there's actually one largest n 892 00:45:33,650 --> 00:45:37,150 that works for all the t's at once, and that's it. 893 00:45:39,451 --> 00:45:39,950 OK. 894 00:45:39,950 --> 00:45:44,150 So just so you know why this is actually a stronger statement, 895 00:45:44,150 --> 00:45:48,880 and that's basically how it works. 896 00:45:48,880 --> 00:45:50,567 Any other question? 897 00:45:50,567 --> 00:45:51,067 Yeah. 898 00:45:51,067 --> 00:45:53,271 AUDIENCE: So what's the condition for this 899 00:45:53,271 --> 00:45:54,979 to hold? Does the random variable have to have 900 00:45:54,979 --> 00:45:57,179 a finite mean, finite variance? 901 00:45:57,179 --> 00:45:58,664 PROFESSOR: No. 902 00:45:58,664 --> 00:46:00,580 Well the random variable does have finite mean 903 00:46:00,580 --> 00:46:02,580 and finite variance, because the random variable 904 00:46:02,580 --> 00:46:03,580 is an indicator. 905 00:46:03,580 --> 00:46:04,830 So it has everything you want. 906 00:46:04,830 --> 00:46:06,621 This is one of the nicest random variables, 907 00:46:06,621 --> 00:46:08,410 this is a Bernoulli random variable. 908 00:46:08,410 --> 00:46:11,952 So here when I say law of large numbers, that this holds. 909 00:46:11,952 --> 00:46:12,910 Where did I write this? 910 00:46:12,910 --> 00:46:14,140 I think I erased it. 911 00:46:14,140 --> 00:46:15,440 Yeah, the one over there. 912 00:46:15,440 --> 00:46:16,570 This is actually the law of large numbers 913 00:46:16,570 --> 00:46:17,680 for Bernoulli random variables. 914 00:46:17,680 --> 00:46:18,930 They have everything you want. 915 00:46:18,930 --> 00:46:21,320 They're bounded. 916 00:46:21,320 --> 00:46:21,820 Yes. 917 00:46:21,820 --> 00:46:23,989 AUDIENCE: So I'm having trouble understanding 918 00:46:23,989 --> 00:46:25,194 the first statement. 919 00:46:25,194 --> 00:46:27,122 So it says, for all epsilon and all t, 920 00:46:27,122 --> 00:46:29,540 the probability of that-- 921 00:46:29,540 --> 00:46:31,040 PROFESSOR: So you mean this one? 922 00:46:31,040 --> 00:46:31,790 AUDIENCE: Yeah. 923 00:46:31,790 --> 00:46:34,760 PROFESSOR: For all epsilon and all t. 924 00:46:34,760 --> 00:46:36,110 So you fix them now. 925 00:46:36,110 --> 00:46:39,050 Then the probability that, sorry, that was delta. 926 00:46:39,050 --> 00:46:41,694 I changed this epsilon to delta at some point. 927 00:46:41,694 --> 00:46:44,930 AUDIENCE: And then what's the second line? 928 00:46:44,930 --> 00:46:49,860 PROFESSOR: Oh, so then the second line says that, 929 00:46:49,860 --> 00:46:53,330 so I'm just rewriting in terms of epsilon delta 930 00:46:53,330 --> 00:46:56,360 what this n goes to infinity means. 931 00:46:56,360 --> 00:47:01,880 So it means that for any t and delta, 932 00:47:01,880 --> 00:47:04,250 so that's the same as this guy here, 933 00:47:04,250 --> 00:47:06,284 then here I'm just going back to rewriting this. 934 00:47:06,284 --> 00:47:08,450 It says that for any epsilon there exists an n large 935 00:47:08,450 --> 00:47:11,990 enough such that, well, n larger than this thing 936 00:47:11,990 --> 00:47:14,370 basically, such that this thing is less than epsilon. 937 00:47:18,670 --> 00:47:21,420 So Glivenko-Cantelli tells us that not only is this thing 938 00:47:21,420 --> 00:47:25,150 a good idea pointwise, but it's also a good idea uniformly.
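[Editor's note: the uniform convergence that Glivenko-Cantelli promises is easy to watch numerically. Below is a small simulation sketch in Python with NumPy and SciPy; the N(0, 1) sampling distribution, the grid standing in for "all t," and the sample sizes are illustrative choices, not from the lecture.]

```python
import numpy as np
from scipy.stats import norm

# Approximate sup_t |F_n(t) - F(t)| on a fine grid for growing n,
# with X_1, ..., X_n drawn from N(0, 1), so F is the standard normal CDF.
rng = np.random.default_rng(1)
grid = np.linspace(-4.0, 4.0, 2001)
for n in (100, 1_000, 10_000):
    x = np.sort(rng.standard_normal(n))
    Fn = np.searchsorted(x, grid, side="right") / n  # F_n evaluated on the grid
    print(n, np.max(np.abs(Fn - norm.cdf(grid))))    # uniform distance shrinks in n
```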
939 00:47:25,150 --> 00:47:27,690 And all it's saying is if you actually 940 00:47:27,690 --> 00:47:30,300 were happy with just this result, you should 941 00:47:30,300 --> 00:47:32,290 be even happier with that result. 942 00:47:32,290 --> 00:47:34,570 And both of those results only tell you one thing. 943 00:47:34,570 --> 00:47:36,720 They're just telling you that the empirical CDF 944 00:47:36,720 --> 00:47:38,196 is a good estimator of the CDF. 945 00:47:41,600 --> 00:47:47,720 Now since those indicators are Bernoulli random variables, 946 00:47:47,720 --> 00:47:50,390 I can actually do even more. 947 00:47:50,390 --> 00:47:52,130 So let me get this guy here. 948 00:48:00,240 --> 00:48:14,220 OK so, those guys inside Fn of t, each indicator here, 949 00:48:14,220 --> 00:48:16,530 is a Bernoulli random variable. 950 00:48:16,530 --> 00:48:20,494 What is the parameter of this Bernoulli distribution? 951 00:48:20,494 --> 00:48:22,410 What is the probability that it takes value 1? 952 00:48:26,250 --> 00:48:26,989 AUDIENCE: F of t. 953 00:48:26,989 --> 00:48:28,030 PROFESSOR: F of t, right? 954 00:48:28,030 --> 00:48:30,484 It's just the probability that this thing happens, 955 00:48:30,484 --> 00:48:31,150 which is F of t. 956 00:48:34,020 --> 00:48:40,650 So in particular the variance of this guy 957 00:48:40,650 --> 00:48:42,540 is the variance of this Bernoulli. 958 00:48:42,540 --> 00:48:46,730 So it's F of t times 1 minus F of t. 959 00:48:46,730 --> 00:48:50,230 And I can use that in my Central Limit Theorem. 960 00:48:50,230 --> 00:48:51,760 And the Central Limit Theorem is just 961 00:48:51,760 --> 00:48:53,890 going to tell me that if I look at the average 962 00:48:53,890 --> 00:48:56,530 of random variables, I remove their mean, 963 00:48:56,530 --> 00:49:01,000 so I look at square root of n Fn of t, 964 00:49:01,000 --> 00:49:04,380 which I could really write as xn bar, right? 965 00:49:04,380 --> 00:49:06,730 That's really just an xn bar. 966 00:49:06,730 --> 00:49:08,320 Minus the expectation, which is F 967 00:49:08,320 --> 00:49:11,290 of t, that comes from this guy. 968 00:49:11,290 --> 00:49:16,000 Now if I divide by square root of the variance, that's 969 00:49:16,000 --> 00:49:18,950 my square root of p times 1 minus p. 970 00:49:18,950 --> 00:49:22,540 Then this guy, by the Central Limit Theorem, 971 00:49:22,540 --> 00:49:23,890 goes to some N 0, 1. 972 00:49:27,032 --> 00:49:28,740 Which is the same thing as you see there, 973 00:49:28,740 --> 00:49:30,865 except that the variance was put on the other side. 974 00:49:34,850 --> 00:49:36,170 OK. 975 00:49:36,170 --> 00:49:42,422 Do I have the same thing uniformly in t? 976 00:49:46,110 --> 00:49:48,630 Can I write something that holds uniformly in t? 977 00:49:48,630 --> 00:49:50,940 Well, if you think about it for one second 978 00:49:50,940 --> 00:49:53,520 it's unlikely it's going to go too well. 979 00:49:53,520 --> 00:49:55,650 In the sense that it's unlikely that the supremum 980 00:49:55,650 --> 00:49:58,530 of those random variables over t is going to also be a Gaussian. 981 00:50:02,800 --> 00:50:08,120 And the reason is that, well actually the reason 982 00:50:08,120 --> 00:50:10,790 is that this thing is actually a stochastic process indexed 983 00:50:10,790 --> 00:50:11,460 by t. 984 00:50:11,460 --> 00:50:14,870 A stochastic process is just a collection of random variables 985 00:50:14,870 --> 00:50:17,060 that's indexed by, let's say time.
986 00:50:17,060 --> 00:50:20,030 The one that's the most famous is Brownian motion, 987 00:50:20,030 --> 00:50:24,440 and it's basically a bunch of Gaussian increments. 988 00:50:24,440 --> 00:50:27,170 So when you go from t to just t a little after that, 989 00:50:27,170 --> 00:50:30,180 you add some Gaussian into the thing. 990 00:50:30,180 --> 00:50:33,770 And here it's basically the same thing that's happening. 991 00:50:33,770 --> 00:50:35,970 And you would sort of expect, since each of these guys 992 00:50:35,970 --> 00:50:37,610 is Gaussian, you would expect to see 993 00:50:37,610 --> 00:50:40,005 something that looks like a Brownian motion at the end. 994 00:50:40,005 --> 00:50:41,630 But it's not exactly a Brownian motion, 995 00:50:41,630 --> 00:50:43,671 it's something that's called the Brownian bridge. 996 00:50:43,671 --> 00:50:45,920 So if you've seen the Brownian motion, if I make 997 00:50:45,920 --> 00:50:49,160 it start at 0 for example, so this is the value 998 00:50:49,160 --> 00:50:50,280 of my Brownian motion. 999 00:50:50,280 --> 00:50:52,110 Let's write it. 1000 00:50:52,110 --> 00:50:56,370 So this is one path, one realization of Brownian motion. 1001 00:50:56,370 --> 00:50:59,350 Let's call it w of t as t increases. 1002 00:50:59,350 --> 00:51:04,430 So let's say it starts at 0 and looks like something like this. 1003 00:51:04,430 --> 00:51:06,540 So that's what Brownian motion looks like. 1004 00:51:06,540 --> 00:51:11,010 It's just something that's pretty nasty. 1005 00:51:11,010 --> 00:51:13,710 I mean it looks pretty nasty, it's not differentiable, et cetera, 1006 00:51:13,710 --> 00:51:19,110 but it's actually very benign in some average way. 1007 00:51:19,110 --> 00:51:21,150 So Brownian motion is just something, 1008 00:51:21,150 --> 00:51:25,820 you should view this as if I sum some random variables that 1009 00:51:25,820 --> 00:51:29,520 are Gaussian, and then I look at this from farther and farther, 1010 00:51:29,520 --> 00:51:31,980 it's going to look like this. 1011 00:51:31,980 --> 00:51:34,750 And so here I cannot have a Brownian motion in the end, 1012 00:51:34,750 --> 00:51:40,030 because what is the variance of Fn of t minus F of t at t is 1013 00:51:40,030 --> 00:51:40,673 equal to 1? 1014 00:51:43,780 --> 00:51:47,890 Sorry, at t is equal to infinity. 1015 00:51:47,890 --> 00:51:48,390 AUDIENCE: 0. 1016 00:51:48,390 --> 00:51:49,460 PROFESSOR: It's 0, right? 1017 00:51:49,460 --> 00:51:52,100 The variance goes from 0 at t is negative infinity, 1018 00:51:52,100 --> 00:51:56,940 because at negative infinity F of t is going to 0. 1019 00:51:56,940 --> 00:51:59,850 And as t goes to plus infinity, F of t 1020 00:51:59,850 --> 00:52:03,590 is going to 1, which means that the variance of this guy as t 1021 00:52:03,590 --> 00:52:06,320 goes from negative infinity to plus infinity 1022 00:52:06,320 --> 00:52:09,990 is pinned to be 0 on each side. 1023 00:52:09,990 --> 00:52:12,200 And so my Brownian motion cannot, 1024 00:52:12,200 --> 00:52:14,630 when I describe a Brownian motion I'm just adding more 1025 00:52:14,630 --> 00:52:16,880 and more entropy to the thing and it's going all over 1026 00:52:16,880 --> 00:52:20,450 the place, but here what I want is that as I go back it should 1027 00:52:20,450 --> 00:52:21,920 go back to essentially 0. 1028 00:52:21,920 --> 00:52:25,280 It should be pinned down to a specific value at the end. 1029 00:52:25,280 --> 00:52:27,322 And that's actually called the Brownian bridge.
1030 00:52:27,322 --> 00:52:29,030 It's a Brownian motion that's conditioned 1031 00:52:29,030 --> 00:52:32,546 to come back to where it started essentially. 1032 00:52:32,546 --> 00:52:35,170 Now you don't need to understand Brownian bridges to understand 1033 00:52:35,170 --> 00:52:36,780 what I'm going to be telling you. 1034 00:52:36,780 --> 00:52:39,040 The only thing I want to communicate to you 1035 00:52:39,040 --> 00:52:42,720 is that this guy here, when I say a Brownian bridge, 1036 00:52:42,720 --> 00:52:45,010 I can go to any probabilist and they can tell you 1037 00:52:45,010 --> 00:52:51,167 all the probability properties of this stochastic process. 1038 00:52:51,167 --> 00:52:52,750 They can tell me the probability that it 1039 00:52:52,750 --> 00:52:55,120 takes any value at any point. 1040 00:52:55,120 --> 00:52:57,370 In particular, they can tell me-- 1041 00:52:57,370 --> 00:53:01,040 the supremum between 0 and 1 of this guy, 1042 00:53:01,040 --> 00:53:03,230 they could tell me what the cumulative distribution 1043 00:53:03,230 --> 00:53:04,813 function of this thing is, can tell me 1044 00:53:04,813 --> 00:53:07,585 what the density of this thing is, can tell me everything. 1045 00:53:07,585 --> 00:53:09,710 So it means that if I want to compute probabilities 1046 00:53:09,710 --> 00:53:14,210 on this object here, which is the maximum value that this guy 1047 00:53:14,210 --> 00:53:17,565 can take over a certain period of time, which is basically 1048 00:53:17,565 --> 00:53:18,440 this random variable. 1049 00:53:18,440 --> 00:53:20,390 So if I look at the value here, it's 1050 00:53:20,390 --> 00:53:22,310 a random variable that fluctuates. 1051 00:53:22,310 --> 00:53:25,160 They can tell me where it is with high probability, can tell me 1052 00:53:25,160 --> 00:53:28,790 the quantiles of this thing, which is useful 1053 00:53:28,790 --> 00:53:31,440 because I can build a table and use it to compute my quantiles 1054 00:53:31,440 --> 00:53:34,480 and form tests from it. 1055 00:53:34,480 --> 00:53:36,100 So that's what actually is quite nice. 1056 00:53:36,100 --> 00:53:38,170 It says that if I look at the sup over t of square root of n 1057 00:53:38,170 --> 00:53:40,999 times Fn hat minus F, I get something 1058 00:53:40,999 --> 00:53:42,790 that looks like the sup of these Gaussians, 1059 00:53:42,790 --> 00:53:44,290 but it's not really the sup of Gaussians, 1060 00:53:44,290 --> 00:53:46,040 it's the sup of a Brownian bridge. 1061 00:53:46,040 --> 00:53:48,290 Now there's something you should be very careful about here. 1062 00:53:48,290 --> 00:53:49,210 I cheated a little bit. 1063 00:53:49,210 --> 00:53:51,251 I mean, I didn't cheat, I can do whatever I want. 1064 00:53:51,251 --> 00:53:55,730 But my notation might be a little confusing. 1065 00:53:55,730 --> 00:54:01,870 Everybody sees that this t here is not the same as this t here? 1066 00:54:01,870 --> 00:54:03,140 Can somebody see that? 1067 00:54:03,140 --> 00:54:05,690 Just because, first of all, this guy's between 0 and 1. 1068 00:54:05,690 --> 00:54:09,550 And this guy is in all of R. 1069 00:54:09,550 --> 00:54:12,760 What is this t here? 1070 00:54:12,760 --> 00:54:14,040 As a function of this t here? 1071 00:54:21,270 --> 00:54:23,770 This guy is F of this guy.
1072 00:54:23,770 --> 00:54:27,790 So really, if I want it to be completely transparent 1073 00:54:27,790 --> 00:54:32,750 and not try to save the keys of my keyboard, 1074 00:54:32,750 --> 00:54:42,460 I would read this as the sup over t of square root of n times Fn of t minus F of t 1075 00:54:42,460 --> 00:54:46,430 converging in distribution as n goes to infinity. 1076 00:54:46,430 --> 00:54:50,440 The supremum over t, again in R, so this guy is 1077 00:54:50,440 --> 00:54:52,655 for t in the entire real line, this guy 1078 00:54:52,655 --> 00:54:54,620 is for t in the entire real line. 1079 00:54:54,620 --> 00:54:58,440 But now I should write B of what? 1080 00:54:58,440 --> 00:55:00,710 F of t, exactly. 1081 00:55:00,710 --> 00:55:04,150 So really the t here is F of the original one. 1082 00:55:04,150 --> 00:55:06,570 And so that's a Brownian bridge, where, 1083 00:55:06,570 --> 00:55:09,870 as t ranges over the real line, F of t 1084 00:55:09,870 --> 00:55:11,670 goes from 0 to 1 and it looks like this. 1085 00:55:11,670 --> 00:55:16,100 A Brownian bridge at 0 is 0, at 1 it's 0. 1086 00:55:16,100 --> 00:55:18,470 And it does this. 1087 00:55:18,470 --> 00:55:20,580 But it doesn't stray too far because I condition 1088 00:55:20,580 --> 00:55:22,860 it to come back to this point. 1089 00:55:22,860 --> 00:55:26,600 That's what a Brownian bridge is. 1090 00:55:26,600 --> 00:55:28,450 OK. 1091 00:55:28,450 --> 00:55:33,527 So in particular, I can find a distribution for this guy. 1092 00:55:33,527 --> 00:55:35,610 And I can use this to build a test which is called 1093 00:55:35,610 --> 00:55:37,120 the Kolmogorov-Smirnov test. 1094 00:55:39,810 --> 00:55:40,895 The idea is the following. 1095 00:55:40,895 --> 00:55:44,875 It says, if I want to test some distribution 1096 00:55:44,875 --> 00:55:49,650 F0, some distribution that has a particular CDF F0, 1097 00:55:49,650 --> 00:55:52,360 and I plug it in under the null, then 1098 00:55:52,360 --> 00:55:55,420 this guy should have pretty much the same distribution 1099 00:55:55,420 --> 00:55:58,090 as the supremum of a Brownian bridge. 1100 00:55:58,090 --> 00:56:00,790 And so if I see this to be much larger than it should 1101 00:56:00,790 --> 00:56:02,980 be when it's the supremum of a Brownian bridge, 1102 00:56:02,980 --> 00:56:05,020 I'm actually going to reject my hypothesis. 1103 00:56:08,270 --> 00:56:09,290 So here's the test. 1104 00:56:09,290 --> 00:56:17,100 I want to test whether H0, F is equal to F0, 1105 00:56:17,100 --> 00:56:22,850 and you will see that most of the goodness of fit tests 1106 00:56:22,850 --> 00:56:24,950 are formulated mathematically in terms 1107 00:56:24,950 --> 00:56:26,960 of the cumulative distribution function. 1108 00:56:26,960 --> 00:56:29,600 I could formulate them in terms of probability density 1109 00:56:29,600 --> 00:56:33,270 functions, or just write x follows N 0, 1, 1110 00:56:33,270 --> 00:56:34,950 but that's the way we write it. 1111 00:56:34,950 --> 00:56:37,880 We formulate them in terms of cumulative distribution 1112 00:56:37,880 --> 00:56:39,650 function because that's what we have 1113 00:56:39,650 --> 00:56:42,320 a handle on through the empirical cumulative 1114 00:56:42,320 --> 00:56:44,330 distribution function. 1115 00:56:44,330 --> 00:56:50,300 And then it's versus H1, F is not equal to F0. 1116 00:56:50,300 --> 00:56:52,370 So now I have my empirical CDF. 1117 00:56:52,370 --> 00:56:54,650 And I hope that for all t's, Fn of t 1118 00:56:54,650 --> 00:56:57,900 should be close to F0 of t.
1119 00:56:57,900 --> 00:57:00,330 Let me write it like this. 1120 00:57:00,330 --> 00:57:03,740 I put it on the exponent because otherwise that 1121 00:57:03,740 --> 00:57:06,650 would be the empirical distribution function based 1122 00:57:06,650 --> 00:57:07,970 on zero observations. 1123 00:57:11,060 --> 00:57:14,011 Now I form the following test statistic. 1124 00:57:21,450 --> 00:57:24,280 So my test statistic is tn, which 1125 00:57:24,280 --> 00:57:28,120 is the supremum over t in the real line of square root 1126 00:57:28,120 --> 00:57:34,494 of n Fn of t minus F of t, sorry, F0 of t. 1127 00:57:34,494 --> 00:57:35,660 So I can compute everything. 1128 00:57:35,660 --> 00:57:37,450 I know this from the data, and this 1129 00:57:37,450 --> 00:57:39,930 is the one that comes from my null hypothesis. 1130 00:57:39,930 --> 00:57:41,939 So I can compute this thing. 1131 00:57:41,939 --> 00:57:43,480 And I know that if this is true, this 1132 00:57:43,480 --> 00:57:46,180 should actually be the supremum of a Brownian bridge. 1133 00:57:46,180 --> 00:57:48,940 Pretty much. 1134 00:57:48,940 --> 00:58:01,620 And so the Kolmogorov-Smirnov test is simply, 1135 00:58:01,620 --> 00:58:09,080 reject if this guy, tn, in absolute value, 1136 00:58:09,080 --> 00:58:10,690 no actually not in absolute value. 1137 00:58:10,690 --> 00:58:13,590 This is just already absolute valued. 1138 00:58:13,590 --> 00:58:14,960 Then this guy should be what? 1139 00:58:14,960 --> 00:58:20,580 It should be larger than the quantile q alpha over 2 1140 00:58:20,580 --> 00:58:21,540 that I have. 1141 00:58:21,540 --> 00:58:24,870 But now rather than putting an N 0, 1, or a tn, 1142 00:58:24,870 --> 00:58:30,016 this here is whatever notation I have for the supremum 1143 00:58:30,016 --> 00:58:31,952 of a Brownian bridge. 1144 00:58:40,860 --> 00:58:43,710 Just like I did for any pivotal distribution. 1145 00:58:43,710 --> 00:58:45,900 That was the same recipe every single time. 1146 00:58:45,900 --> 00:58:47,970 I formed the test statistic such that 1147 00:58:47,970 --> 00:58:51,330 the asymptotic distribution did not depend on anything I don't know, 1148 00:58:51,330 --> 00:58:54,300 and then I would just reject when this pivotal statistic 1149 00:58:54,300 --> 00:58:56,080 was larger than something. 1150 00:58:56,080 --> 00:58:56,845 Yes? 1151 00:58:56,845 --> 00:58:59,635 AUDIENCE: I'm not really sure why Brownian bridge appears. 1152 00:59:02,900 --> 00:59:05,542 PROFESSOR: Do you know what a Brownian bridge is, or? 1153 00:59:05,542 --> 00:59:06,500 AUDIENCE: Only vaguely. 1154 00:59:06,500 --> 00:59:07,100 PROFESSOR: OK. 1155 00:59:07,100 --> 00:59:14,320 So this thing here, think of it as being a Gaussian. 1156 00:59:14,320 --> 00:59:18,110 So for all t you have a Gaussian distribution. 1157 00:59:18,110 --> 00:59:27,270 Now a Brownian motion, so if I had a Brownian motion 1158 00:59:27,270 --> 00:59:28,770 I need to tell you what the-- 1159 00:59:28,770 --> 00:59:30,300 so it's basically, a Brownian motion 1160 00:59:30,300 --> 00:59:31,730 is something that looks like this. 1161 00:59:31,730 --> 00:59:34,180 It's some random variable that's indexed by t. 1162 00:59:34,180 --> 00:59:38,610 I want, say, the expectation of Xt to be equal to 0 1163 00:59:38,610 --> 00:59:40,640 for all t. 1164 00:59:40,640 --> 00:59:42,960 And what I want is that the increments 1165 00:59:42,960 --> 00:59:44,640 have a certain distribution.
1166 00:59:44,640 --> 00:59:53,700 So what I want is that Xt minus Xs 1167 00:59:53,700 --> 00:59:58,050 follows some distribution which is N 0, t minus s. 1168 00:59:58,050 --> 01:00:00,750 So the increments are bigger as I go farther, 1169 01:00:00,750 --> 01:00:02,580 in terms of variability. 1170 01:00:02,580 --> 01:00:05,880 And I also want some covariance structure between the two. 1171 01:00:05,880 --> 01:00:10,320 So what I want is that the covariance between Xs and Xt 1172 01:00:10,320 --> 01:00:12,750 is actually equal to the minimum of s and t. 1173 01:00:18,520 --> 01:00:21,660 Yeah, maybe. 1174 01:00:21,660 --> 01:00:23,220 Yeah, that should be there. 1175 01:00:23,220 --> 01:00:26,040 So this is, you open a probability book, that's 1176 01:00:26,040 --> 01:00:27,370 what it's going to look like. 1177 01:00:27,370 --> 01:00:31,710 So in particular, you can see, if I put 0 here 1178 01:00:31,710 --> 01:00:34,390 and X0 is equal to 0, it has 0 variance. 1179 01:00:34,390 --> 01:00:38,180 So in particular, it means that Xt, 1180 01:00:38,180 --> 01:00:39,920 if I look only at the t-th one, it 1181 01:00:39,920 --> 01:00:43,110 has some normal distribution with variance t. 1182 01:00:43,110 --> 01:00:46,050 So this is something that just blows up. 1183 01:00:46,050 --> 01:00:49,230 So this guy here looks like it's going 1184 01:00:49,230 --> 01:00:50,730 to be a Brownian motion because when 1185 01:00:50,730 --> 01:00:53,700 I look at the left-hand side it has a normal distribution. 1186 01:00:53,700 --> 01:00:55,950 Now there's a bunch of other things you need to check. 1187 01:00:55,950 --> 01:00:58,325 It's the fact that you have this covariance, for example, 1188 01:00:58,325 --> 01:01:00,090 which I did not tell you. 1189 01:01:00,090 --> 01:01:03,300 But it sure looks somewhat like that. 1190 01:01:03,300 --> 01:01:07,590 And in particular, when I look at the normal with mean 0 1191 01:01:07,590 --> 01:01:10,440 and variance here, then it's clear 1192 01:01:10,440 --> 01:01:12,420 that this guy does not have a variance that's 1193 01:01:12,420 --> 01:01:16,560 going to go to infinity just like the variance of this guy. 1194 01:01:16,560 --> 01:01:21,620 We know that the variance is forced to be back to 0. 1195 01:01:21,620 --> 01:01:23,290 And so in particular we have something 1196 01:01:23,290 --> 01:01:28,270 that has mean 0 always, whose variance has to be 0 at 0-- 1197 01:01:28,270 --> 01:01:31,870 sorry, at t equals negative infinity, 1198 01:01:31,870 --> 01:01:34,890 and, since F of t goes to 1 at t equals plus infinity, 1199 01:01:34,890 --> 01:01:36,920 a variance 0 at t equals plus infinity as well, 1200 01:01:36,920 --> 01:01:40,420 and so I have to basically force it to be equal to 0 at each end. 1201 01:01:40,420 --> 01:01:42,400 So the Brownian motion here tends 1202 01:01:42,400 --> 01:01:44,830 to just go to infinity somewhere, 1203 01:01:44,830 --> 01:01:47,110 whereas this guy forces it to come back. 1204 01:01:47,110 --> 01:01:48,700 Now everything I described to you 1205 01:01:48,700 --> 01:01:52,720 is on the scale negative infinity to plus infinity, 1206 01:01:52,720 --> 01:01:56,620 but since everything depends on F of t, 1207 01:01:56,620 --> 01:01:58,360 I can actually just put that back 1208 01:01:58,360 --> 01:02:02,300 onto a scale, which is 0 to 1, by a simple change of variable. 1209 01:02:02,300 --> 01:02:06,814 It's called change of time for the Brownian motion. 1210 01:02:06,814 --> 01:02:07,314 OK?
1211 01:02:07,314 --> 01:02:08,302 Yeah. 1212 01:02:08,302 --> 01:02:09,784 AUDIENCE: So does a Brownian bridge 1213 01:02:09,784 --> 01:02:13,242 have a variance at each point that's proportional? 1214 01:02:13,242 --> 01:02:15,382 Like it starts at 0 variance and then 1215 01:02:15,382 --> 01:02:17,688 goes to 1/4 variance in the middle 1216 01:02:17,688 --> 01:02:21,146 and then goes back to 0 variance? 1217 01:02:21,146 --> 01:02:23,924 Like in the same parabolic shape? 1218 01:02:23,924 --> 01:02:24,590 PROFESSOR: Yeah. 1219 01:02:24,590 --> 01:02:26,180 I mean, definitely. 1220 01:02:26,180 --> 01:02:29,873 I mean by symmetry you can probably infer all the things. 1221 01:02:29,873 --> 01:02:31,706 AUDIENCE: Well I can imagine Brownian bridge 1222 01:02:31,706 --> 01:02:34,904 with a variance that starts at 0 and stays, like, 1223 01:02:34,904 --> 01:02:38,809 the shape of the variance as you move along. 1224 01:02:38,809 --> 01:02:40,600 PROFESSOR: Yeah, so I don't know if-- there 1225 01:02:40,600 --> 01:02:43,205 is an explicit formula for this, and it's simple. 1226 01:02:43,205 --> 01:02:45,830 That's what I can tell you, but I don't know, 1227 01:02:45,830 --> 01:02:47,756 off the top of my head, what the explicit formula is. 1228 01:02:47,756 --> 01:02:49,708 AUDIENCE: But would it have to match this F 1229 01:02:49,708 --> 01:02:53,112 of t times 1 minus F of t structure? 1230 01:02:53,112 --> 01:02:53,612 Or not? 1231 01:02:53,612 --> 01:02:54,278 PROFESSOR: Yeah. 1232 01:02:56,052 --> 01:02:58,510 AUDIENCE: Or does the fact that we're taking the supremum-- 1233 01:02:58,510 --> 01:02:59,700 PROFESSOR: No. 1234 01:02:59,700 --> 01:03:03,390 Well the Brownian bridge, this is the supremum-- you're right. 1235 01:03:03,390 --> 01:03:06,700 So this will be this form for the variance for sure, 1236 01:03:06,700 --> 01:03:08,700 because this is only marginal distributions that 1237 01:03:08,700 --> 01:03:10,920 don't take-- right, the process is not just 1238 01:03:10,920 --> 01:03:13,360 what is the distribution at each instant t. 1239 01:03:13,360 --> 01:03:15,990 It's also how do those distributions interact 1240 01:03:15,990 --> 01:03:17,657 with each other in terms of covariance. 1241 01:03:17,657 --> 01:03:19,740 For the marginal distributions at each instant t, 1242 01:03:19,740 --> 01:03:22,950 you're right, the variance is F of t times 1 minus F of t. 1243 01:03:22,950 --> 01:03:25,170 We're not going to escape that. 1244 01:03:25,170 --> 01:03:27,390 But then the covariance structure between those guys 1245 01:03:27,390 --> 01:03:29,250 is a little more complicated. 1246 01:03:29,250 --> 01:03:30,330 But yes, you're right. 1247 01:03:30,330 --> 01:03:32,201 For the marginals that's enough. 1248 01:03:32,201 --> 01:03:32,701 Yeah? 1249 01:03:32,701 --> 01:03:34,701 AUDIENCE: So the supremum of the Brownian bridge 1250 01:03:34,701 --> 01:03:38,180 is a number between 0 and 10, let's just say. 1251 01:03:38,180 --> 01:03:40,244 PROFESSOR: Yeah, it could be infinity. 1252 01:03:40,244 --> 01:03:43,226 AUDIENCE: So it's not symmetrical with respect to 0, 1253 01:03:43,226 --> 01:03:45,214 so why are we doing alpha over 2? 1254 01:03:56,170 --> 01:03:57,900 PROFESSOR: OK. 1255 01:03:57,900 --> 01:03:58,640 Did I say that? 1256 01:03:58,640 --> 01:03:59,230 Yeah.
1257 01:03:59,230 --> 01:04:01,974 Because here I didn't say the supremum of the absolute value 1258 01:04:01,974 --> 01:04:03,890 of a Brownian bridge, I just said the supremum 1259 01:04:03,890 --> 01:04:04,765 of a Brownian bridge. 1260 01:04:04,765 --> 01:04:08,640 But you're right, let's just do this like that. 1261 01:04:08,640 --> 01:04:11,070 And then it's probably cleaner. 1262 01:04:14,580 --> 01:04:17,210 So yeah, actually well it should be q alpha. 1263 01:04:17,210 --> 01:04:19,630 So this is basically, you're right. 1264 01:04:19,630 --> 01:04:22,210 So think of it as being one-sided. 1265 01:04:22,210 --> 01:04:25,960 And there's actually no symmetry for the supremum. 1266 01:04:25,960 --> 01:04:29,074 I mean the supremum is not symmetric around 0, 1267 01:04:29,074 --> 01:04:29,740 so you're right. 1268 01:04:29,740 --> 01:04:33,630 I should not use alpha over 2, thank you. 1269 01:04:33,630 --> 01:04:35,690 Any other question? 1270 01:04:35,690 --> 01:04:36,950 This should be alpha. 1271 01:04:36,950 --> 01:04:37,815 Yeah. 1272 01:04:37,815 --> 01:04:39,940 I mean those slides were written with 1 minus alpha 1273 01:04:39,940 --> 01:04:42,490 and I have not replaced all instances of 1 minus alpha 1274 01:04:42,490 --> 01:04:43,980 by alpha. 1275 01:04:43,980 --> 01:04:45,392 I mean, except this guy, tilde. 1276 01:04:45,392 --> 01:04:47,100 Well, depends on how you want to call it. 1277 01:04:47,100 --> 01:04:50,520 But this is still, the probability that Z exceeds 1278 01:04:50,520 --> 01:04:53,550 this guy should be alpha. 1279 01:04:53,550 --> 01:04:54,160 OK? 1280 01:04:54,160 --> 01:04:55,910 And this can be found in tables. 1281 01:04:55,910 --> 01:05:00,370 And we can compute the p-value just like we did before. 1282 01:05:00,370 --> 01:05:02,320 But we have to simulate it because it's not 1283 01:05:02,320 --> 01:05:04,236 going to depend on the cumulative distribution 1284 01:05:04,236 --> 01:05:06,890 function of a Gaussian, like it did for the usual Gaussian 1285 01:05:06,890 --> 01:05:07,740 test. 1286 01:05:07,740 --> 01:05:09,656 That's something that's more complicated, 1287 01:05:09,656 --> 01:05:11,030 and typically you don't even try. 1288 01:05:11,030 --> 01:05:14,210 You get the statistical software to do it for you. 1289 01:05:14,210 --> 01:05:17,650 So just let me skip a few lines. 1290 01:05:17,650 --> 01:05:20,150 This is what the table looks like for the Kolmogorov-Smirnov 1291 01:05:20,150 --> 01:05:21,430 test. 1292 01:05:21,430 --> 01:05:25,690 So it just tells you, what is your number of observations, n. 1293 01:05:25,690 --> 01:05:28,270 Then you want alpha to be equal to 5%, say. 1294 01:05:28,270 --> 01:05:30,320 Let's say you have nine observations. 1295 01:05:30,320 --> 01:05:34,240 So if square root of n times the sup of the absolute value of Fn of t minus F of t 1296 01:05:34,240 --> 01:05:36,610 exceeds this thing, you reject. 1297 01:05:46,060 --> 01:05:47,620 Well, what's pretty clear from this test 1298 01:05:47,620 --> 01:05:49,390 is that it looks very nice, and I tell 1299 01:05:49,390 --> 01:05:50,680 you this is how you build it. 1300 01:05:50,680 --> 01:05:52,577 But if you think about it for one second, 1301 01:05:52,577 --> 01:05:54,160 it's actually really an annoying thing 1302 01:05:54,160 --> 01:05:57,760 to build because you have to take the supremum over t. 1303 01:05:57,760 --> 01:06:01,150 This depends on computing a supremum, which in practice 1304 01:06:01,150 --> 01:06:03,070 might be super cumbersome.
1305 01:06:03,070 --> 01:06:05,360 I don't want to have to compute this for all values of t 1306 01:06:05,360 --> 01:06:07,784 and then to take the maximum of those guys. 1307 01:06:07,784 --> 01:06:09,950 It turns out, and that's actually quite nice, that we 1308 01:06:09,950 --> 01:06:11,720 don't have to actually do this. 1309 01:06:11,720 --> 01:06:14,060 What does the empirical distribution function 1310 01:06:14,060 --> 01:06:15,474 look like? 1311 01:06:15,474 --> 01:06:23,350 Well, this thing, remember Fn of t by definition was-- 1312 01:06:23,350 --> 01:06:25,590 so let me go to the slide that's relevant. 1313 01:06:25,590 --> 01:06:27,290 So Fn of t looks like this. 1314 01:06:38,320 --> 01:06:41,590 So what it means is that when t is between two observations, 1315 01:06:41,590 --> 01:06:44,390 then this guy is actually keeping the same value. 1316 01:06:44,390 --> 01:06:48,210 So if I put my observations on the real line here. 1317 01:06:48,210 --> 01:06:49,940 So let's say I have one observation here, 1318 01:06:49,940 --> 01:06:51,782 one observation here, one observation here, 1319 01:06:51,782 --> 01:06:53,740 one observation here, and one observation here, 1320 01:06:53,740 --> 01:06:55,270 for simplicity. 1321 01:06:55,270 --> 01:06:57,730 Then this guy is basically, up to this normalization, 1322 01:06:57,730 --> 01:07:01,820 counting how many observations I have that are less than or equal to t. 1323 01:07:01,820 --> 01:07:05,020 So since I normalize by n, I know that the smallest number 1324 01:07:05,020 --> 01:07:10,480 here is going to be 0, and the largest number here 1325 01:07:10,480 --> 01:07:13,300 is going to be 1. 1326 01:07:13,300 --> 01:07:14,980 So let's say this looks like this. 1327 01:07:14,980 --> 01:07:18,290 This is the value 1. 1328 01:07:18,290 --> 01:07:21,800 At the value, since I take it less than or equal to, 1329 01:07:21,800 --> 01:07:24,530 when I'm at Xi, I'm actually counting it. 1330 01:07:24,530 --> 01:07:26,570 So the jump happens at Xi. 1331 01:07:26,570 --> 01:07:29,000 So that's the first observation, and then I jump. 1332 01:07:29,000 --> 01:07:30,860 By how much do I jump? 1333 01:07:33,650 --> 01:07:35,510 Yeah? 1334 01:07:35,510 --> 01:07:38,670 One over n, right? 1335 01:07:38,670 --> 01:07:41,732 And then this value belongs to the right. 1336 01:07:41,732 --> 01:07:42,690 And then I do it again. 1337 01:07:50,850 --> 01:07:54,534 I know it's not going to work out for me, but we'll see. 1338 01:07:54,534 --> 01:07:55,950 Oh no actually, I did pretty well. 1339 01:08:00,790 --> 01:08:04,130 This is what my empirical cumulative distribution function looks like. 1340 01:08:04,130 --> 01:08:05,630 Now if you look on this slide, there 1341 01:08:05,630 --> 01:08:07,870 is this weird notation where I start putting now 1342 01:08:07,870 --> 01:08:10,390 my indices in parentheses. 1343 01:08:10,390 --> 01:08:13,350 X parenthesis 1, X parenthesis 2, et cetera. 1344 01:08:13,350 --> 01:08:15,910 Those are called the order statistics. 1345 01:08:15,910 --> 01:08:18,729 It's just because, when my data is given 1346 01:08:18,729 --> 01:08:20,422 to me, I just call the first observation 1347 01:08:20,422 --> 01:08:21,880 the one that's on top of the table, 1348 01:08:21,880 --> 01:08:24,640 but it doesn't have to be the smallest value. 1349 01:08:24,640 --> 01:08:28,029 So it might be that this is X1 and that this is X2, 1350 01:08:28,029 --> 01:08:31,510 and then this is X3, X4, and X5. 1351 01:08:31,510 --> 01:08:33,374 These might be my observations.
1352 01:08:33,374 --> 01:08:35,290 So what I do is that I relabel them in such a way 1353 01:08:35,290 --> 01:08:38,109 that, this is actually, I call this guy X parenthesis 1, 1354 01:08:38,109 --> 01:08:40,569 which is just really X3. 1355 01:08:40,569 --> 01:08:46,810 This is X2, X3, X4, and X5. 1356 01:08:46,810 --> 01:08:48,790 These are my reordered observations 1357 01:08:48,790 --> 01:08:52,029 in such a way that the smallest one is indexed by one 1358 01:08:52,029 --> 01:08:54,013 and the largest one is indexed by n. 1359 01:08:58,439 --> 01:09:01,200 So now this is actually quite nice, 1360 01:09:01,200 --> 01:09:04,170 because what I'm trying to do is to find the largest 1361 01:09:04,170 --> 01:09:07,210 deviation from this guy to the true cumulative distribution 1362 01:09:07,210 --> 01:09:07,710 function. 1363 01:09:07,710 --> 01:09:09,460 The true cumulative distribution function, 1364 01:09:09,460 --> 01:09:11,729 let's say it's Gaussian, looks like this. 1365 01:09:15,340 --> 01:09:19,120 It's something continuous, for a symmetric distribution 1366 01:09:19,120 --> 01:09:22,520 it crosses this axis at 1/2, and that's what it looks like. 1367 01:09:22,520 --> 01:09:25,330 And the Kolmogorov-Smirnov test is just 1368 01:09:25,330 --> 01:09:31,470 telling me how far apart do those two curves get 1369 01:09:31,470 --> 01:09:35,069 in the worst possible case? 1370 01:09:35,069 --> 01:09:37,380 So in particular here, where are they the farthest? 1371 01:09:37,380 --> 01:09:40,729 Clearly that's this point. 1372 01:09:40,729 --> 01:09:42,490 And so up to rescaling, this is the value 1373 01:09:42,490 --> 01:09:44,510 I'm going to be interested in. 1374 01:09:44,510 --> 01:09:49,130 That's how they get as far as possible from each other. 1375 01:09:49,130 --> 01:09:52,399 Here, something just happened, right? 1376 01:09:52,399 --> 01:09:54,695 The farthest distance that I got was exactly 1377 01:09:54,695 --> 01:09:55,970 at one of those dots. 1378 01:09:58,480 --> 01:10:01,600 It turns out it is enough to look at those dots. 1379 01:10:01,600 --> 01:10:04,660 And the reason is, well because after this dot 1380 01:10:04,660 --> 01:10:08,460 and until the next jump, this guy does not change, 1381 01:10:08,460 --> 01:10:11,230 but this guy increases. 1382 01:10:11,230 --> 01:10:15,220 And so the only point where they can be the farthest apart 1383 01:10:15,220 --> 01:10:19,720 is either to the left of a jump or to the right of a jump. 1384 01:10:19,720 --> 01:10:22,540 That's the only place where they can be far from each other. 1385 01:10:22,540 --> 01:10:24,896 And that means I only need to look at the observations. 1386 01:10:24,896 --> 01:10:26,620 Everybody sees that? 1387 01:10:26,620 --> 01:10:29,470 The farthest points, the points at which those two curves are 1388 01:10:29,470 --> 01:10:31,300 the farthest from each other, have 1389 01:10:31,300 --> 01:10:34,000 to be at one of the observations. 1390 01:10:34,000 --> 01:10:37,790 And so rather than looking at a sup over all possible t's, 1391 01:10:37,790 --> 01:10:40,920 really all I need to do is to look at a maximum 1392 01:10:40,920 --> 01:10:43,036 only at my observations. 1393 01:10:46,390 --> 01:10:48,960 I just need to check at each of those points 1394 01:10:48,960 --> 01:10:51,150 whether they're far. 1395 01:10:51,150 --> 01:10:53,090 Now here, notice that this 1396 01:10:53,090 --> 01:10:57,530 is not written Fn of Xi. 1397 01:10:57,530 --> 01:11:01,410 The reason is because I actually know what Fn of Xi is.
1398 01:11:01,410 --> 01:11:05,320 Fn of the i-th order observation is just 1399 01:11:05,320 --> 01:11:08,680 the number of jumps I've had until this observation. 1400 01:11:08,680 --> 01:11:11,290 So here, I know that the value of Fn is 1 over n, 1401 01:11:11,290 --> 01:11:15,520 here it's 2 over n, 3 over n, 4 over n, 5 over n. 1402 01:11:15,520 --> 01:11:19,300 So I know that the values of Fn at my observations, 1403 01:11:19,300 --> 01:11:22,300 and those are actually the only values that Fn can take, 1404 01:11:22,300 --> 01:11:25,060 are an integer divided by n. 1405 01:11:25,060 --> 01:11:29,680 And that's why you see i minus 1 over n, or i over n. 1406 01:11:29,680 --> 01:11:32,520 This is the difference just before the jump, 1407 01:11:32,520 --> 01:11:34,450 and this is the difference at the jump. 1408 01:11:38,090 --> 01:11:42,800 So here the key message is that this is no longer 1409 01:11:42,800 --> 01:11:44,610 a supremum over all t's, but it's just 1410 01:11:44,610 --> 01:11:46,110 the maximum over i from 1 to n. 1411 01:11:46,110 --> 01:11:49,160 So I really have only 2n values to compute. 1412 01:11:49,160 --> 01:11:51,970 This value and this value for each observation, that's 2n 1413 01:11:51,970 --> 01:11:52,850 total. 1414 01:11:52,850 --> 01:11:55,760 I look at the maximum and that's actually the value. 1415 01:11:55,760 --> 01:11:58,907 And it's actually equal to tn. 1416 01:11:58,907 --> 01:11:59,990 It's not an approximation. 1417 01:11:59,990 --> 01:12:00,840 Those things are equal. 1418 01:12:00,840 --> 01:12:02,060 Those are just the only places where 1419 01:12:02,060 --> 01:12:03,143 those guys can be maximal. 1420 01:12:09,242 --> 01:12:10,715 Yes? 1421 01:12:10,715 --> 01:12:15,134 AUDIENCE: It seems like since the null hypothesis [INAUDIBLE] 1422 01:12:15,134 --> 01:12:17,589 the entire distribution of theta, 1423 01:12:17,589 --> 01:12:19,716 this is like strictly more powerful than just 1424 01:12:19,716 --> 01:12:23,000 doing it [INAUDIBLE]. 1425 01:12:23,000 --> 01:12:24,832 PROFESSOR: It's strictly less powerful. 1426 01:12:24,832 --> 01:12:27,784 AUDIENCE: Strictly less powerful. 1427 01:12:27,784 --> 01:12:30,490 But is there, is that like a big trade-off 1428 01:12:30,490 --> 01:12:32,018 that we're making when we do that? 1429 01:12:32,018 --> 01:12:33,934 Obviously we're not certain in the first place 1430 01:12:33,934 --> 01:12:35,309 that we want to assume normality. 1431 01:12:35,309 --> 01:12:37,624 Does it make sense to [INAUDIBLE], 1432 01:12:37,624 --> 01:12:39,592 the Gaussian [INAUDIBLE]. 1433 01:12:48,000 --> 01:12:50,420 PROFESSOR: So can you, I'm not sure what 1434 01:12:50,420 --> 01:12:51,400 question you're asking. 1435 01:12:51,400 --> 01:12:53,360 AUDIENCE: So when we're doing a normal test, 1436 01:12:53,360 --> 01:12:55,810 we're just asking questions about the mus, 1437 01:12:55,810 --> 01:12:57,280 the means of our distribution. 1438 01:12:57,280 --> 01:13:00,383 [INAUDIBLE] This one, it seems like it 1439 01:13:00,383 --> 01:13:02,670 would be both at the same time. 1440 01:13:02,670 --> 01:13:11,000 [INAUDIBLE] Is this decreasing power [INAUDIBLE]? 1441 01:13:11,000 --> 01:13:13,470 PROFESSOR: So remember, here in this test 1442 01:13:13,470 --> 01:13:16,140 we want to conclude to H0, in the other test we typically 1443 01:13:16,140 --> 01:13:17,670 want to conclude to H1. 1444 01:13:17,670 --> 01:13:21,150 So here we actually don't want power, in a way.
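[Editor's note: the reduction of the supremum to a maximum over order statistics, described a moment ago, is a few lines in code. Here is a sketch in Python with NumPy and SciPy; the helper name ks_statistic and the N(0, 1) example are the editor's choices. It is checked against scipy.stats.kstest, which reports the unnormalized sup distance D.]

```python
import numpy as np
from scipy.stats import norm, kstest

def ks_statistic(x, F0):
    """sqrt(n) * sup_t |F_n(t) - F0(t)|, computed at the order statistics.

    The sup can only be attained just before or at a jump of F_n, so it
    equals max over i of max(i/n - F0(x_(i)), F0(x_(i)) - (i-1)/n).
    """
    x = np.sort(np.asarray(x))  # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    i = np.arange(1, n + 1)
    F0x = F0(x)
    D = np.max(np.maximum(i / n - F0x, F0x - (i - 1) / n))
    return np.sqrt(n) * D

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
print(ks_statistic(x, norm.cdf))                      # H0: F = N(0, 1), fully specified
print(np.sqrt(len(x)) * kstest(x, "norm").statistic)  # same value via SciPy's D
```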
1445 01:13:21,150 --> 01:13:24,619 And you have to also assume that doing a test on the mean 1446 01:13:24,619 --> 01:13:26,160 is probably not the only thing you're 1447 01:13:26,160 --> 01:13:27,576 going to end up doing on your data 1448 01:13:27,576 --> 01:13:31,472 after you actually establish that it's normally distributed. 1449 01:13:31,472 --> 01:13:33,180 Then you have the dataset, you've sort of 1450 01:13:33,180 --> 01:13:34,763 established it's normally distributed, 1451 01:13:34,763 --> 01:13:38,090 and then you can just run the arsenal of statistical studies. 1452 01:13:38,090 --> 01:13:39,540 And we're going to see regression 1453 01:13:39,540 --> 01:13:42,570 and all sorts of predictive things, which are not just 1454 01:13:42,570 --> 01:13:44,280 tests if the mean is equal to something. 1455 01:13:44,280 --> 01:13:45,660 Maybe you want to build a confidence interval 1456 01:13:45,660 --> 01:13:46,530 for the mean. 1457 01:13:46,530 --> 01:13:50,052 Then this is not a test, a confidence interval is not a test. 1458 01:13:50,052 --> 01:13:52,260 So you're going to have to first test if it's normal, 1459 01:13:52,260 --> 01:13:53,760 and then see if you can actually use 1460 01:13:53,760 --> 01:13:55,770 the quantiles of a Gaussian distribution or a t 1461 01:13:55,770 --> 01:13:59,760 distribution to build this confidence interval. 1462 01:13:59,760 --> 01:14:03,510 So in a way you should view this as, like, the flat fee 1463 01:14:03,510 --> 01:14:05,670 to enter the Gaussian world, and then you 1464 01:14:05,670 --> 01:14:09,072 can do whatever you want to do in the Gaussian world. 1465 01:14:09,072 --> 01:14:11,030 We'll see actually that your question goes back 1466 01:14:11,030 --> 01:14:14,750 to something that's a little important, which is: here 1467 01:14:14,750 --> 01:14:17,540 I said F0 is fully specified. 1468 01:14:17,540 --> 01:14:21,490 It's like an N 1, 5. 1469 01:14:21,490 --> 01:14:24,410 But I didn't say, is it normally distributed, 1470 01:14:24,410 --> 01:14:26,440 which is the question that everybody asks. 1471 01:14:26,440 --> 01:14:29,189 You're not asking, is it this particular normal distribution 1472 01:14:29,189 --> 01:14:31,480 with this particular mean and this particular variance. 1473 01:14:31,480 --> 01:14:32,860 So how would you do it in practice? 1474 01:14:32,860 --> 01:14:34,276 Well you would say, I'm just going 1475 01:14:34,276 --> 01:14:36,910 to replace the mean by the empirical mean and the variance 1476 01:14:36,910 --> 01:14:38,720 by the empirical variance. 1477 01:14:38,720 --> 01:14:41,710 But by doing that you're making a huge mistake because you 1478 01:14:41,710 --> 01:14:45,160 are sort of depriving your test of the possibility 1479 01:14:45,160 --> 01:14:46,967 to reject the Gaussian hypothesis just 1480 01:14:46,967 --> 01:14:49,300 based on the fact that the mean is wrong or the variance 1481 01:14:49,300 --> 01:14:49,930 is wrong. 1482 01:14:49,930 --> 01:14:52,600 You've already stuck to your data pretty well. 1483 01:14:52,600 --> 01:14:55,660 And so you're sort of like already 1484 01:14:55,660 --> 01:14:59,320 tilting the game in favor of H0 big time. 1485 01:14:59,320 --> 01:15:01,300 So there's actually a way to correct for this. 1486 01:15:03,930 --> 01:15:05,555 OK, so this is about pivotal statistics. 1487 01:15:05,555 --> 01:15:06,950 We've used this word many times. 1488 01:15:09,680 --> 01:15:12,272 And so that's how it goes. 1489 01:15:12,272 --> 01:15:13,730 I'm not going to go into this test.
1490 01:15:13,730 --> 01:15:16,640 It's really, this is a recipe on how you would actually 1491 01:15:16,640 --> 01:15:20,920 build the table that I showed you, this table. 1492 01:15:20,920 --> 01:15:23,660 This is basically the recipe on how to build it. 1493 01:15:23,660 --> 01:15:25,790 There's another recipe to build it, which is just 1494 01:15:25,790 --> 01:15:27,730 open a book at this page. 1495 01:15:27,730 --> 01:15:29,770 That's a little faster. 1496 01:15:29,770 --> 01:15:32,870 Or use software. 1497 01:15:32,870 --> 01:15:34,050 I just wanted to show you. 1498 01:15:34,050 --> 01:15:36,891 So let's just keep in mind, does anybody have a good memory? 1499 01:15:36,891 --> 01:15:38,390 Let's just keep in mind this number. 1500 01:15:38,390 --> 01:15:44,060 This is the threshold for the Kolmogorov-Smirnov statistic. 1501 01:15:44,060 --> 01:15:47,250 If I have 10 observations and I want to do it at 5%, 1502 01:15:47,250 --> 01:15:50,060 it's about 41%. 1503 01:15:50,060 --> 01:15:52,380 So that's the number that it should be larger than. 1504 01:15:52,380 --> 01:15:56,630 So it turns out that if you want to test if it's normal, and not 1505 01:15:56,630 --> 01:15:59,000 just the specific normal, this number 1506 01:15:59,000 --> 01:16:00,145 is going to be different. 1507 01:16:00,145 --> 01:16:01,520 Do you think the number I'm going 1508 01:16:01,520 --> 01:16:03,561 to read in a table that's appropriate for this is 1509 01:16:03,561 --> 01:16:05,630 going to be larger or smaller? 1510 01:16:05,630 --> 01:16:07,505 Who says larger? 1511 01:16:07,505 --> 01:16:09,130 AUDIENCE: Sorry, what was the question? 1512 01:16:09,130 --> 01:16:10,588 PROFESSOR: So the question is, this 1513 01:16:10,588 --> 01:16:20,270 is the number I should see if my test was, is X, say, N 0, 5. 1514 01:16:20,270 --> 01:16:20,770 Right? 1515 01:16:20,770 --> 01:16:25,630 That's a specific distribution with a specific F0. 1516 01:16:25,630 --> 01:16:27,810 So that's the number, I would build 1517 01:16:27,810 --> 01:16:29,630 the Kolmogorov-Smirnov statistic from this. 1518 01:16:29,630 --> 01:16:32,460 I would perform a test and check if my Kolmogorov-Smirnov 1519 01:16:32,460 --> 01:16:34,970 statistic tn is larger than this number or not. 1520 01:16:34,970 --> 01:16:36,450 If it's larger I'm going to reject. 1521 01:16:36,450 --> 01:16:40,940 Now I say, actually, I don't want to test if H0 is N 0, 5, 1522 01:16:40,940 --> 01:16:47,942 but that it's just an N mu, sigma squared for some mu and sigma squared. 1523 01:16:47,942 --> 01:16:50,400 And in particular I'm just going to plug in mu hat and sigma 1524 01:16:50,400 --> 01:16:52,680 hat into my F0, run the same statistic, 1525 01:16:52,680 --> 01:16:56,280 but compare it to a different number. 1526 01:16:56,280 --> 01:17:00,090 So the larger the number, the more or less 1527 01:17:00,090 --> 01:17:03,660 likely am I to reject? 1528 01:17:03,660 --> 01:17:05,730 The less likely I am to reject, right? 1529 01:17:05,730 --> 01:17:09,700 So if I just use that number, let's say 1530 01:17:09,700 --> 01:17:12,660 this is a large number, I would be more 1531 01:17:12,660 --> 01:17:14,077 tempted to say it's Gaussian. 1532 01:17:14,077 --> 01:17:15,660 And if you look at the table you would 1533 01:17:15,660 --> 01:17:18,300 get that if you make the appropriate correction 1534 01:17:18,300 --> 01:17:21,200 at the same number of observations, 10, 1535 01:17:21,200 --> 01:17:26,359 and the same level, you get 25% as opposed to 41%.
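[Editor's note: those two thresholds are easy to reproduce by Monte Carlo. The sketch below, in Python with NumPy and SciPy, is illustrative (seeds, simulation counts, and the use of the ddof=1 sample standard deviation are assumptions): it simulates the 5% critical value of sup_t |F_n(t) - F0(t)| for n = 10, once with F0 fully specified and once with mu hat and sigma hat plugged in, and should give roughly 0.41 and 0.26, matching the 41% and 25% quoted.]

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, n_sim = 10, 20_000
i = np.arange(1, n + 1)

def sup_distance(u):
    # sup_t |F_n(t) - F0(t)| from the sorted values u_i = F0(x_(i))
    return np.max(np.maximum(i / n - u, u - (i - 1) / n))

d_ks, d_lf = [], []
for _ in range(n_sim):
    x = np.sort(rng.standard_normal(n))
    d_ks.append(sup_distance(norm.cdf(x)))                            # F0 known
    d_lf.append(sup_distance(norm.cdf(x, x.mean(), x.std(ddof=1))))   # F0 estimated
print(np.quantile(d_ks, 0.95))  # about 0.41: the Kolmogorov-Smirnov threshold
print(np.quantile(d_lf, 0.95))  # about 0.26: the Kolmogorov-Lilliefors threshold
```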
1536 01:17:26,359 --> 01:17:28,650 That means that you're actually much more likely if you 1537 01:17:28,650 --> 01:17:32,670 use the appropriate test to reject the hypothesis that it's 1538 01:17:32,670 --> 01:17:34,680 normal, which is bad news, because that means 1539 01:17:34,680 --> 01:17:36,780 you don't have access to the Gaussian arsenal, 1540 01:17:36,780 --> 01:17:38,160 and nobody wants to do this. 1541 01:17:38,160 --> 01:17:40,920 So actually this is a mistake that people do a lot. 1542 01:17:40,920 --> 01:17:42,570 They use the Kolmogorov-Smirnov test 1543 01:17:42,570 --> 01:17:45,810 to test for normality without adjusting for the fact 1544 01:17:45,810 --> 01:17:48,210 that they've plugged in the estimated mean 1545 01:17:48,210 --> 01:17:50,100 and the estimated variance. 1546 01:17:50,100 --> 01:17:53,520 This leads to rejecting less often, right? 1547 01:17:53,520 --> 01:17:58,490 I mean this is almost half of the number that we had. 1548 01:17:58,490 --> 01:18:00,990 And then they can be happy and walk home 1549 01:18:00,990 --> 01:18:03,120 and say, well, I did the test and it was normal. 1550 01:18:03,120 --> 01:18:04,640 So this is actually a mistake that I 1551 01:18:04,640 --> 01:18:07,130 genuinely believe at least a quarter of the people 1552 01:18:07,130 --> 01:18:09,357 do make on purpose. 1553 01:18:09,357 --> 01:18:11,690 They just say, well I want it to be Gaussian so I'm just 1554 01:18:11,690 --> 01:18:13,760 going to make my life easier. 1555 01:18:13,760 --> 01:18:17,180 So this is the so-called Kolmogorov-Lilliefors test. 1556 01:18:17,180 --> 01:18:20,800 We'll talk about it, well not today for sure. 1557 01:18:20,800 --> 01:18:24,650 There are other statistics that you can use. 1558 01:18:24,650 --> 01:18:26,390 And the idea is to say, well, we want 1559 01:18:26,390 --> 01:18:28,280 to know if the empirical distribution 1560 01:18:28,280 --> 01:18:31,900 function, the empirical CDF, is close to the true CDF. 1561 01:18:31,900 --> 01:18:33,880 The way we did it is by forming the difference 1562 01:18:33,880 --> 01:18:36,240 and looking at the worst possible distance they can be apart. 1563 01:18:36,240 --> 01:18:39,880 That's called a sup norm, or L infinity norm, 1564 01:18:39,880 --> 01:18:42,140 in functional analysis. 1565 01:18:42,140 --> 01:18:44,020 So here, this is what it looked like. 1566 01:18:44,020 --> 01:18:46,630 The distance between Fn and F that we measured was just 1567 01:18:46,630 --> 01:18:48,520 the supremum distance over all t's. 1568 01:18:48,520 --> 01:18:51,100 That's one way to measure distance between two functions. 1569 01:18:51,100 --> 01:18:53,170 But there are infinitely many ways 1570 01:18:53,170 --> 01:18:54,880 to measure distance between functions. 1571 01:18:54,880 --> 01:18:56,840 One is something we're much more familiar with, 1572 01:18:56,840 --> 01:18:59,510 which is the squared L2-norm. 1573 01:18:59,510 --> 01:19:02,770 This is nice because this has like an inner product, 1574 01:19:02,770 --> 01:19:04,370 it has some nice properties. 1575 01:19:04,370 --> 01:19:06,740 And you could actually just, rather than taking the sup, 1576 01:19:06,740 --> 01:19:10,280 you could just integrate the squared distance. 1577 01:19:10,280 --> 01:19:14,485 And this is what leads to the Cramér-von Mises test. 1578 01:19:14,485 --> 01:19:15,860 And then there's another one that 1579 01:19:15,860 --> 01:19:18,770 says, well, maybe I don't want to integrate without weights.
1580 01:19:18,770 --> 01:19:22,230 Maybe I want to put weights that account for the variance. 1581 01:19:22,230 --> 01:19:24,500 And this guy is called Anderson-Darling. 1582 01:19:24,500 --> 01:19:26,810 For each of these tests you can check 1583 01:19:26,810 --> 01:19:29,660 that the asymptotic distribution is going to be pivotal, 1584 01:19:29,660 --> 01:19:32,420 which means that there will be a table at the back of some book 1585 01:19:32,420 --> 01:19:37,190 that tells you what the quantiles 1586 01:19:37,190 --> 01:19:38,730 of square root of n times this guy 1587 01:19:38,730 --> 01:19:40,333 are, asymptotically. 1588 01:19:40,333 --> 01:19:41,299 Yeah? 1589 01:19:41,299 --> 01:19:44,197 AUDIENCE: For the Kolmogorov-Smirnov test, 1590 01:19:44,197 --> 01:19:48,061 the table that shows the critical values 1591 01:19:48,061 --> 01:19:51,572 has a value for each different n. 1592 01:19:51,572 --> 01:19:53,390 But I thought we [INAUDIBLE]-- 1593 01:19:53,390 --> 01:19:54,150 PROFESSOR: Yeah. 1594 01:19:54,150 --> 01:19:56,649 So that's just to show you that asymptotically it's pivotal, 1595 01:19:56,649 --> 01:19:59,160 and I can point you to one specific thing. 1596 01:19:59,160 --> 01:20:02,842 But it turns out that this thing is actually pivotal for each n. 1597 01:20:02,842 --> 01:20:05,300 And that's why you have this recipe to construct the entire 1598 01:20:05,300 --> 01:20:08,690 thing, because it's actually not true for all possible n's. 1599 01:20:08,690 --> 01:20:10,700 Also, there's the n that shows up here. 1600 01:20:10,700 --> 01:20:13,040 So no, actually, this is something 1601 01:20:13,040 --> 01:20:14,090 you should have in mind. 1602 01:20:14,090 --> 01:20:18,350 So basically, let me strike what I just said. 1603 01:20:18,350 --> 01:20:20,330 This distribution 1604 01:20:20,330 --> 01:20:24,119 will not depend on F0 for any particular n. 1605 01:20:24,119 --> 01:20:25,910 It's just not going to be a Brownian bridge 1606 01:20:25,910 --> 01:20:28,130 but a finite-sample approximation of a Brownian 1607 01:20:28,130 --> 01:20:31,160 bridge, and you can simulate it by just drawing samples 1608 01:20:31,160 --> 01:20:33,500 from it, building a histogram, and constructing 1609 01:20:33,500 --> 01:20:35,286 the quantiles for this guy. 1610 01:20:35,286 --> 01:20:36,910 AUDIENCE: No one has actually developed 1611 01:20:36,910 --> 01:20:38,304 a table for Brownian-- 1612 01:20:38,304 --> 01:20:39,470 PROFESSOR: Oh, there is one. 1613 01:20:39,470 --> 01:20:42,570 That's the table, maybe. 1614 01:20:42,570 --> 01:20:46,670 Let's see if we see it at the bottom of the other table. 1615 01:20:46,670 --> 01:20:47,220 Yeah. 1616 01:20:47,220 --> 01:20:47,720 See? 1617 01:20:47,720 --> 01:20:48,997 Over 40, over 30. 1618 01:20:48,997 --> 01:20:50,580 So this is not the Kolmogorov-Smirnov, 1619 01:20:50,580 --> 01:20:52,710 but that's the Kolmogorov-Lilliefors. 1620 01:20:52,710 --> 01:20:54,900 Those numbers that you see here, they 1621 01:20:54,900 --> 01:20:57,060 are the numbers for the asymptotic thing, which is 1622 01:20:57,060 --> 01:20:59,192 some sort of Brownian bridge. 1623 01:20:59,192 --> 01:21:00,184 Yeah? 1624 01:21:00,184 --> 01:21:01,184 AUDIENCE: Two questions. 1625 01:21:01,184 --> 01:21:03,656 If I want to build the Kolmogorov-Smirnov test, 1626 01:21:03,656 --> 01:21:08,120 it says that F0 is required to be continuous. 1627 01:21:08,120 --> 01:21:10,104 PROFESSOR: Yeah.
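[Editor's sketch of the simulation recipe just described: draw samples under H0, build the distribution of the statistic, read off the quantile. The key fact is that under H0 with continuous F0, the values F0(X_i) are uniform on [0, 1], so the simulation does not depend on F0; n = 10 at level 5% is chosen to match the number quoted earlier.]

```python
# Monte Carlo approximation of the finite-sample null distribution
# of the Kolmogorov-Smirnov statistic, which is pivotal for each n.
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 10, 100_000
i = np.arange(1, n + 1)

draws = np.empty(n_sims)
for s in range(n_sims):
    u = np.sort(rng.uniform(size=n))  # F0(X_i) under H0
    draws[s] = max(np.max(i / n - u), np.max(u - (i - 1) / n))

# The 95% quantile is the 5%-level threshold; ~0.41 for n = 10.
print(np.quantile(draws, 0.95))
```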
1628 01:21:10,104 --> 01:21:13,576 AUDIENCE: [INAUDIBLE] If we have, like, probability 1629 01:21:13,576 --> 01:21:15,560 mass at a particular value. 1630 01:21:15,560 --> 01:21:18,060 Like some sort of discrete data. 1631 01:21:18,060 --> 01:21:20,769 PROFESSOR: So then you won't have this nice picture, right? 1632 01:21:20,769 --> 01:21:22,560 This can happen at any point, because you're 1633 01:21:22,560 --> 01:21:24,335 going to have discontinuities in F, 1634 01:21:24,335 --> 01:21:26,620 and those things can happen everywhere. 1635 01:21:26,620 --> 01:21:27,120 And then-- 1636 01:21:27,120 --> 01:21:29,034 AUDIENCE: Would the supremum still work? 1637 01:21:29,034 --> 01:21:30,700 PROFESSOR: You mean the Brownian bridge? 1638 01:21:30,700 --> 01:21:32,140 AUDIENCE: Yeah. 1639 01:21:32,140 --> 01:21:35,340 The Kolmogorov test doesn't say that you 1640 01:21:35,340 --> 01:21:37,382 have to be able to easily calculate the supremum. 1641 01:21:37,382 --> 01:21:39,256 PROFESSOR: No, no, no, but you still need it. 1642 01:21:39,256 --> 01:21:40,600 You still need it for-- 1643 01:21:40,600 --> 01:21:42,832 so there are some finite-sample versions of it 1644 01:21:42,832 --> 01:21:45,040 that you can use that are slightly more conservative, 1645 01:21:45,040 --> 01:21:47,740 which is in a way good news, because you're 1646 01:21:47,740 --> 01:21:50,250 going to conclude in favor of H0 more often. 1647 01:21:50,250 --> 01:21:52,930 And there's one, I forget the exact name, 1648 01:21:52,930 --> 01:21:57,172 it's the Dvoretzky-Kiefer-Wolfowitz 1649 01:21:57,172 --> 01:21:59,630 inequality, which is basically like Hoeffding's inequality. 1650 01:21:59,630 --> 01:22:01,510 So it's basically, up to worse constants, 1651 01:22:01,510 --> 01:22:04,900 telling you the same result as the Brownian bridge result, 1652 01:22:04,900 --> 01:22:06,850 and it's true all the time. 1653 01:22:06,850 --> 01:22:08,830 But for the exact asymptotic distribution, 1654 01:22:08,830 --> 01:22:11,467 you need continuity. 1655 01:22:11,467 --> 01:22:12,496 Yes. 1656 01:22:12,496 --> 01:22:13,912 AUDIENCE: So just a clarification. 1657 01:22:13,912 --> 01:22:15,868 So when we are testing with Kolmogorov, 1658 01:22:15,868 --> 01:22:19,902 we shouldn't test a particular mu and sigma squared? 1659 01:22:19,902 --> 01:22:22,110 PROFESSOR: Well, if you know what they are you can use 1660 01:22:22,110 --> 01:22:25,259 Kolmogorov-Smirnov, but if you don't know what they are 1661 01:22:25,259 --> 01:22:26,300 you're going to plug in-- 1662 01:22:26,300 --> 01:22:27,758 as soon as you're going to estimate 1663 01:22:27,758 --> 01:22:29,534 the mean and the variance from the data, 1664 01:22:29,534 --> 01:22:31,700 you should use the one we'll see next time, which is 1665 01:22:31,700 --> 01:22:33,200 called Kolmogorov-Lilliefors. 1666 01:22:33,200 --> 01:22:34,950 You don't have to think about it too much. 1667 01:22:34,950 --> 01:22:38,000 We'll talk about it on Thursday. 1668 01:22:38,000 --> 01:22:39,215 Any other question? 1669 01:22:39,215 --> 01:22:40,090 So we're out of time. 1670 01:22:40,090 --> 01:22:45,700 So I think we should stop here, and we'll resume on Thursday.
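[Editor's postscript on the inequality mentioned in that last exchange: the Dvoretzky-Kiefer-Wolfowitz bound states P(sup_t |Fn(t) - F(t)| > eps) <= 2 exp(-2 n eps^2) for every n and every F, continuous or not. A minimal sketch of inverting it at a given level, reusing the n = 10, 5% setting purely for comparison.]

```python
# Inverting the Dvoretzky-Kiefer-Wolfowitz bound
#   P(sup_t |Fn(t) - F(t)| > eps) <= 2 exp(-2 n eps^2)
# at level alpha gives a conservative stand-in for the KS threshold.
import math

def dkw_threshold(n: int, alpha: float) -> float:
    """Smallest eps for which the DKW bound is at most alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

# ~0.43 for n = 10 at 5%, slightly above the exact KS value of ~0.41,
# i.e. slightly more conservative, as the discussion above says.
print(dkw_threshold(10, 0.05))
```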