PROFESSOR: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

So welcome back. We are now moving to a new chapter, which is going to have a little more of a statistical flavor when it comes to designing methods, all right? Because if you think about it, OK, some of you have probably attempted problem number two in the problem set. And you realized that maximum likelihood does not give you super trivial estimators, right? I mean, when you have an N(theta, theta) model, the thing you get is not something you could have guessed before you actually attempted to solve that problem. And so, in a way, we've already seen sophisticated methods. However, in many instances, the maximum likelihood estimator was just an average.
And in a way, even if we had this confirmation for maximum likelihood, that indeed that was the estimator maximum likelihood would spit out, and that our intuition was therefore pretty good, most of the statistical analysis, the use of the central limit theorem, all these things actually did not come in the building of the estimator, in the design of the estimator, but really in the analysis of the estimator. And you could say, well, if I know already that the best estimator is the average, I'm just going to use the average. I don't have to quantify how good it is. I just know it's the best I can do.

We're going to talk about tests. And we're going to talk about parametric hypothesis testing. So you should view this as follows: parametric means, well, it's about a parameter, like we did before. And hypothesis testing is on the same level as estimation. On the same level as "estimator" will be the word "test," OK? And when we devise a test, we're actually going to need to understand the random fluctuations that arise from the central limit theorem better, OK? It's not just going to be in the analysis.
It's also going to be in the design. And everything we've been doing before in understanding the behavior of an estimator is actually going to come in and be extremely useful in the actual design of tests, OK?

So as an example, I want to talk to you about some real data. I will not study this data, but this data actually exists. You can find it in R. It's the data from the so-called Credit Union Cherry Blossom Run, which is a 10 mile race. It takes place every year in D.C. It seems that some of the years are pretty nice. In 2009, there were about 15,000 participants. Pretty big race. And the average running time was 103.5 minutes, all right? So about an hour and three quarters.

And so, you can ask the following question, right? This is actual data, right? 103.5 is actually the average running time over all 15,000 runners. Now, in practice, this may not be something very suitable to compute. You might want to just sample a few runners and try to understand how they're behaving every year, without having to collect the entire data set.
And so, you could ask the question: well, let's say my budget is to ask maybe 10 runners what their running time was. I still want to be able to determine whether they were running faster in 2012 than in 2009. Why do I put 2012, and not 2016? Well, because the data set for 2012 is also available. So if you are interested and you know how to use R, just go and have fun with it.

So to answer this question, what we do is we select n runners, right? So n is a moderate number that's more manageable than 15,000. We select them from the 2012 race at random. That's where the random variable is going to come from, right? That's where we actually inject randomness into our problem.

So remember, this is an experiment. So really, in a way, the runners are the omegas. And I'm interested in measurements on those guys. So this is how I have a random variable. And this random variable here is measuring their running time. OK. If you look at the data set, there are all sorts of random variables you could measure about those random runners: country of origin, I don't know, height, age, a bunch of things. OK.
Here, the random variable of interest is the running time. OK. Everybody understands what the process is? OK.

So now I'm going to have to make some modeling assumptions. And here, I'm actually pretty lucky: I actually have all the data from a past year. I mean, this is not the data from 2012, which I also have but don't use. But I can actually use past data to try to understand what distribution I have, right? I mean, after all, running time is going to be rounded to something. Maybe I can think of it as a discrete random variable. Maybe I can think of it as an exponential random variable; those are positive numbers. I mean, there are many kinds of distributions I could think of for this modeling part. But it turns out that if you actually plot the histogram of those running times for all 15,000 runners in 2009, you are pretty happy to see that it really looks like a bell-shaped curve, which suggests that this should be a Gaussian.
So what you go on to do is you estimate the mean from past observations, which was actually 103.5, as we said. You estimate the variance, which was 373. And you just try to superimpose on the histogram a Gaussian PDF with mean 103.5 and variance 373. And you see that they actually look very much alike. And so here, you're pretty comfortable saying that the running time actually has a Gaussian distribution. All right?

So now, the x1 to xn, I'm going to say they're Gaussian, OK? I still need to specify two parameters. So what I want to know is: is the distribution the same as in past years? So I want to know, if I pick one of the random variables I'm looking at, say x1, does it have the same distribution in 2012 that it did in 2009? OK. And so, the question is: does x1 have a Gaussian distribution with mean 103.5 and variance 373? Is that clear? OK. So this question, which calls for a yes or no answer, is a hypothesis testing problem. I am testing a hypothesis.
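The fitting step just described (estimate the mean and the variance, then superimpose the Gaussian PDF on the histogram) can be sketched numerically. The actual race data is not reproduced here, so the snippet below simulates stand-in times from the fitted model; that substitution is an assumption made purely for illustration.

```python
import numpy as np

# Stand-in for the 2009 running times: the real data set ships with R,
# so here we simulate 15,000 times from the fitted N(103.5, 373).
rng = np.random.default_rng(0)
times = rng.normal(loc=103.5, scale=np.sqrt(373.0), size=15_000)

# The two moment estimates from the lecture.
mu_hat = times.mean()
var_hat = times.var()

# "Superimpose the curve" without plotting: compare the histogram's
# bin heights with the fitted Gaussian PDF at the bin centers.
def gaussian_pdf(t, mu, var):
    return np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

heights, edges = np.histogram(times, bins=40, density=True)
centers = (edges[:-1] + edges[1:]) / 2
max_gap = np.abs(heights - gaussian_pdf(centers, mu_hat, var_hat)).max()
print(round(mu_hat, 1), round(var_hat, 1), max_gap < 0.005)
```

On real data you would draw the two on the same axes; a small maximum gap between the bin heights and the fitted PDF plays the role of "they look very much alike."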
This is the basis of basically all of data-driven scientific inquiry. You ask questions. You formulate a scientific hypothesis. Knocking down this gene is going to cure melanoma: is this true? I'm going to try it. I'm going to observe some patients in whom I knock down this gene. I'm going to collect some measurements. And I'm going to try to answer this yes/no question, OK? It's different from the question: what is the mean running time for this year? OK.

So hypothesis testing is testing whether this hypothesis is true. The hypothesis in common English, as we just said, was: were runners running faster? All right? Anybody could formulate this hypothesis. Now, you go to a statistician, and he's like, oh, what you're really asking me is: does x1 have a Gaussian distribution with mean less than 103.5 and variance 373, right? That's really the question you're asking in statistical terms. And so, if you're asking whether this is the same as before, there are many ways it could fail to be the same as before.
There are basically three ways it could fail to be the same as before. It could be the case that x1 is no longer equal in expectation to 103.5, so the expectation has changed. Or the variance has changed. Or the distribution has changed. I mean, who knows? Maybe runners are now all running holding hands, and the distribution is now a point mass at one given point. OK. So you never know what could [INAUDIBLE].

Now of course, if you allow for any change, you will find change. And so what you have to do is factor in as much knowledge as you can. Make as many modeling assumptions as you can, so that you can let the data speak about your particular question. Here, your particular question is: are they running faster? So you're really only asking a question about the expectation. You really want to know if the expectation has changed. So as far as you're concerned, you're happy to make the assumption that the rest is unchanged. OK. And so, this is the question we're asking: is the expectation now less than 103.5?
Because you specifically asked whether runners were going faster this year, right? Whether they tend to go faster, rather than slower, all right? OK. So this is the question we're asking in mathematical terms.

So first, to do that, I need to basically fix the rest. And fixing the rest is actually part of the modeling assumptions. So I fix my variance to be 373, OK? I assume that the variance has not changed between 2009 and 2012. Now, this is an assumption. It turns out it's wrong: if you look at the data from 2012, this is not the correct assumption. But I'm just going to make it right now for the sake of argument, OK?

And there's also the fact that it's Gaussian. Now, this one is going to be hard to violate, right? I mean, where did this bell-shaped curve come from? Well, it's just natural when you measure a bunch of things. The central limit theorem appears in the small things of nature. I mean, that's the bedtime story you get about the central limit theorem. And that's why the bell-shaped curve is everywhere in nature.
It's the sum of the little independent things that are going on. And this Gaussian assumption, even if I wanted to relax it, there's not much else I could do. It is pretty robust across the years. All right.

So the only thing that we did not fix is the expectation of x1, which is now what I want to know. And since I don't know what it is, I'm going to call it mu. And it's going to be the variable of interest, all right? So it's just a number, mu. Whatever it is, I can try to estimate it, maybe using maximum likelihood estimation. Probably using the average, because this is Gaussian, and we know that the maximum likelihood estimator for the mean of a Gaussian is just the average. And now we only want to test whether mu is equal to 103.5, like it was in 2009, or, on the contrary, whether mu is not equal to 103.5, and more specifically, whether mu is actually strictly less than 103.5. That's the question you ask. Now, why am I writing "mu equal to 103.5 versus mu less than 103.5," and not "mu equal to 103.5 versus mu not equal to 103.5"?
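As an aside, the claim that the Gaussian maximum likelihood estimator of the mean (with known variance) is just the average can be checked numerically. The sample of running times below is made up for illustration; it is not taken from the actual 2012 data.

```python
import numpy as np

# Hypothetical sample of n = 10 running times in minutes.
x = np.array([95.2, 110.4, 101.7, 99.0, 120.3,
              88.5, 104.9, 97.6, 113.1, 100.8])
sigma2 = 373.0  # variance fixed by the modeling assumption

# Gaussian log-likelihood in mu (variance known), up to additive
# constants, evaluated on a fine grid of candidate means.
mu_grid = np.linspace(80.0, 130.0, 100_001)
log_lik = -((x[None, :] - mu_grid[:, None]) ** 2).sum(axis=1) / (2 * sigma2)

# The maximizer coincides with the sample average (up to grid spacing).
mu_mle = mu_grid[np.argmax(log_lik)]
print(mu_mle, x.mean())
```

The grid search is only a sanity check; setting the derivative of the log-likelihood to zero gives the sample average exactly.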
Why the one-sided version? Because you asked me a more precise question, so I'm going to be able to give you a more precise answer. And so, if your question is very specific, "are they running faster?", I'm going to factor that into what I write. If you just ask me, "is it the same?", I'm going to have to write "or is it different from 103.5?". And that's less information about what you're looking for, OK?

So by making all these modeling assumptions, the fact that the variance doesn't change, the fact that it's still Gaussian, I've actually reduced the "number" of ways the hypothesis can be violated. And I put "number" in quotes, because there are still infinitely many of them. But I'm limiting the number of ways the hypothesis can be violated, the number of possible alternative realities for this hypothesis, all right? For example, I'm saying there's no way mu can be larger than 103.5. I've already factored that in, OK? It could be larger. But in that case, all I'm going to be able to tell you is that it's not smaller. I'm not going to be able to tell you that it's actually larger, OK?
And the only way the hypothesis can be rejected now, the only way I can reject it, is if x belongs to a very specific family of distributions: if it has a distribution which is Gaussian with mean mu and variance 373, for some mu less than 103.5. All right? So we started with, basically, the reality: x1 follows N(103.5, 373), OK? And then there's everything else, right? So for example, here is x following some Exponential(0.1), OK? That's just another distribution. Those are all the possible distributions. What we said is: OK, first of all, let's keep only the Gaussian distributions, right? And second, we said, well, among those Gaussian distributions (with the reality maybe sitting at the boundary), let's only look at the Gaussians here. So these guys here are all the Gaussians with mean mu and variance 373, for mu less than 103.5, OK? So when you give me data, I'm going to be able to say: well, am I this guy, or am I one of those guys? Rather than searching through everything.
And the more you search, the easier it is for you to find something that fits the data better, right? And so, if I allow every possible distribution, then there's going to be something that, just by pure randomness, is actually going to look better for the data, OK?

So for example, say I draw 10 random variables, so n is equal to 10. And let's say they take 10 different values. Then it's actually more likely that those guys come from a discrete distribution that takes each of these values with probability 1 over 10 than from some Gaussian random variable, right? That fit would be perfect; I could explain the data exactly. If the numbers I got were, say, three values, 91, 95, and 102, then the most likely distribution for those guys is the discrete distribution that takes the value 91 with probability 1/3, 95 with probability 1/3, and 102 with probability 1/3, right? That's definitely the most likely distribution for this data. So if I allowed this, I would say: oh no, this is not distributed according to a Gaussian.
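This overfitting point can be illustrated with a quick computation. Formally, a discrete probability mass and a Gaussian density are not on the same footing, so take this only as a sketch of why an unrestricted family always "fits better": the empirical distribution assigns the three observations a far larger likelihood than even the best-fitting Gaussian does.

```python
import math

# The three observations from the example above.
x = [91.0, 95.0, 102.0]
n = len(x)

# Likelihood under the empirical (discrete uniform) distribution:
# each observed value gets probability mass 1/3.
emp_lik = (1.0 / n) ** n

# Best-fitting Gaussian: plug in the MLE mean and variance, then take
# the product of the densities at the data points.
mu = sum(x) / n
var = sum((xi - mu) ** 2 for xi in x) / n
gauss_lik = math.prod(
    math.exp(-(xi - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    for xi in x
)

print(emp_lik, gauss_lik, emp_lik > gauss_lik)
```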
Instead, I would say it's distributed according to this very specific discrete distribution, which sits somewhere in the realm of all possible distributions, OK? So now we're just going to carve out all this stuff by making our assumptions. OK.

So here, in this particular example, just make a mental note that a little birdie told me that the reference number is 103.5, OK? That was the thing I was actually comparing to. In practice, it's actually seldom the case that you have such a reference to compare to, right? Here, I just happen to have the full data set of all the runners of 2009. But suppose I really just asked you: were runners faster in 2012 than in 2009? Here's $10 to perform your statistical analysis. What you're probably going to do is call maybe 10 runners from 2012, maybe 15 runners from 2009, ask them, and try to compare their means. There's no standard reference. You would not be able to come up with this 103.5, because this data may be expensive to get, or something. OK. So that is really more the standard case, all right?
Where you really compare two things with each other, but there's no actual ground truth number that you're comparing to. OK. We'll come back to that in a second; I'll tell you what the other example looks like. For now, let's just stick to this example. I tell you it's 103.5, OK?

Let's try to have our intuition work the same way as before. We said, well, averages work well. The average over these 10 guys should tell me what the mean is. So I can just say, well, x bar is going to be close to the true mean by the law of large numbers. So I'm going to check whether x bar is less than 103.5, and conclude that in that case, indeed, mu is less than 103.5, because those two quantities are close, right? I could do that. The problem is that this could go pretty wrong. Because if n is small, then I know that xn bar is not equal to mu. I know that xn bar is close to mu, but I also know that there's a pretty high chance that it's not equal to mu. In particular, I know it's going to be somewhere around 1 over root n away from mu, right?
1 over root n, with the root n coming from what? The CLT, right? That's the root n that comes from the CLT. In blunt words, the CLT tells me the sample mean is at distance about 1 over root n from the expectation. That's what it's telling me. So, 1 over root n. If I have 10 people in there, 1 over root 10 is not a huge number, right? It's like 1/3, pretty much. So a fluctuation of 1/3 around 103.5. If the true mean was actually 103.4, but my average was telling me 103.4 plus 1/3, I would actually come to two different conclusions, right?

So let's say that mu is equal to 103.4, OK? You're not supposed to know this, right? That's the hidden truth. OK. Now, I have n equal to 10. So I know that x bar n minus 103.4 is something of the order of 1 over the square root of 10, which is of the order of, say, 0.3. OK. So here, this is all hand wavy, OK? But that's what the central limit theorem tells me. What it means is that it is possible that x bar n is actually equal to 103.4 plus 0.3, which is equal to 103.7.
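The misfire risk just computed by hand can be checked by simulation. Note that the back-of-the-envelope above keeps track only of the 1 over root n rate; under the fitted model the sample mean actually fluctuates on the scale sigma over root n, which is sqrt(373/10), about 6 minutes for n = 10, so the naive rule is even shakier than the 0.3 figure suggests. The simulation below uses synthetic draws from the model.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 103.4         # the hidden truth: runners really are faster
sigma = np.sqrt(373.0)  # standard deviation from the fitted model
n = 10

# Draw many samples of size 10 and apply the naive rule:
# conclude "faster" iff the sample average is below 103.5.
xbars = rng.normal(mu_true, sigma, size=(100_000, n)).mean(axis=1)
misfire = np.mean(xbars >= 103.5)  # fraction reaching the wrong conclusion

print(round(float(misfire), 2))
```

With the true mean only 0.1 below the threshold and fluctuations of several minutes, the naive rule gets it wrong close to half the time, which is exactly why a buffer is needed.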
So while the truth is that mu is less than 103.5, I would conclude that mu is larger than 103.5, OK? And that's because I have not been very cautious, OK?

So what we want to do is have a little buffer to account for the fact that xn bar is not a precise value for the true mu. It's something that's about 1 over root n away from it. And so, what we want is a better heuristic that says: well, if I want to conclude that I'm less than 103.5, maybe I need to be less than 103.5 minus a little buffer, a buffer that goes to 0 as my sample size goes to infinity. That's what the law of large numbers tells me. And the central limit theorem tells me the rate: the buffer should go to 0 as n goes to infinity at the rate 1 over root n, right? That's basically what the central limit theorem tells me. So to make this intuition more precise, we need to understand those fluctuations. We need to put in something more precise than these little wiggles here, OK? We need to actually have the central limit theorem come in.

So here is the example of comparing two groups.
442 00:20:57,340 --> 00:21:00,700 So pharmaceutical companies use hypothesis 443 00:21:00,700 --> 00:21:03,167 testing to test if a drug is effective, right? 444 00:21:03,167 --> 00:21:04,000 That's what they do. 445 00:21:04,000 --> 00:21:06,170 They want to know, does my new drug work? 446 00:21:06,170 --> 00:21:09,175 And that's what the Food and Drug Administration 447 00:21:09,175 --> 00:21:11,560 is doing on a daily basis. 448 00:21:11,560 --> 00:21:18,660 They ask for extremely well-regulated clinical trials 449 00:21:18,660 --> 00:21:22,530 on a thousand people, and check, does this drug 450 00:21:22,530 --> 00:21:23,460 make a difference? 451 00:21:23,460 --> 00:21:24,900 Did everybody die? 452 00:21:24,900 --> 00:21:27,270 Does it make no difference? 453 00:21:27,270 --> 00:21:30,450 Should people pay $200 for a pill of sugar, right? 454 00:21:30,450 --> 00:21:33,030 So that's what people are actually asking. 455 00:21:33,030 --> 00:21:36,060 So to do so, of course, there is no ground truth about-- 456 00:21:36,060 --> 00:21:38,830 so there's actually a placebo effect. 457 00:21:38,830 --> 00:21:41,970 So it's not like actually giving a drug that does not work 458 00:21:41,970 --> 00:21:44,400 is going to have no effect on patients. 459 00:21:44,400 --> 00:21:47,320 It will have a small effect, but it's very hard to quantify. 460 00:21:47,320 --> 00:21:50,460 We know that it's there, but we don't know what it is. 461 00:21:50,460 --> 00:21:52,670 And so rather than saying, oh the ground truth 462 00:21:52,670 --> 00:21:56,280 is no improvement, the ground truth is the placebo effect. 463 00:21:56,280 --> 00:22:00,020 And we need to measure what the placebo effect is. 464 00:22:00,020 --> 00:22:01,730 So what we're going to do is we're 465 00:22:01,730 --> 00:22:04,730 going to split our patients into two groups. 466 00:22:04,730 --> 00:22:06,590 And there's going to be what's called a test 467 00:22:06,590 --> 00:22:10,020 group and a control group. 
468 00:22:10,020 --> 00:22:13,040 So the word test here is used in a different way 469 00:22:13,040 --> 00:22:14,220 than hypothesis testing. 470 00:22:14,220 --> 00:22:17,490 So we'll just call it typically the drug group. 471 00:22:17,490 --> 00:22:22,790 And so, I will refer to mu drug for this guy, OK? 472 00:22:22,790 --> 00:22:26,120 Now, let's say this is a cough syrup, OK? 473 00:22:26,120 --> 00:22:29,990 And when you have a cough syrup, the way 474 00:22:29,990 --> 00:22:34,520 you measure the efficacy of a cough syrup 475 00:22:34,520 --> 00:22:40,000 is to measure how many times you cough per minute, OK? 476 00:22:40,000 --> 00:22:42,740 And so, I define mu control as the number 477 00:22:42,740 --> 00:22:48,840 of expectorations per hour. 478 00:22:48,840 --> 00:22:50,500 So just the expected number, right? 479 00:22:50,500 --> 00:22:53,000 This is the number I don't know, because I don't have access 480 00:22:53,000 --> 00:22:55,430 to the entire population of people that will ever 481 00:22:55,430 --> 00:22:57,590 take this cough syrup. 482 00:22:57,590 --> 00:23:00,526 And so, I will call it mu control for the control group. 483 00:23:00,526 --> 00:23:02,900 So those are the people who have been actually given just 484 00:23:02,900 --> 00:23:05,330 like sugar, like maple syrup. 485 00:23:05,330 --> 00:23:09,440 And mu drug are those people who are given the actual syrup, OK? 486 00:23:09,440 --> 00:23:12,710 And you can imagine that maybe maple syrup will have an effect 487 00:23:12,710 --> 00:23:18,020 on expectorations per hour just because, well, it's just sweet 488 00:23:18,020 --> 00:23:19,040 and it helps, OK? 489 00:23:19,040 --> 00:23:21,290 And so, we don't know what this effect is going to be. 490 00:23:21,290 --> 00:23:24,920 We just want to measure if the drug is actually 491 00:23:24,920 --> 00:23:28,670 having a better impact on expectorations 492 00:23:28,670 --> 00:23:34,700 per hour than just pure maple syrup, OK? 
493 00:23:34,700 --> 00:23:38,385 So what we want to know is if mu drug is less than mu control. 494 00:23:38,385 --> 00:23:39,260 That would be enough. 495 00:23:39,260 --> 00:23:41,540 If we had access to all the populations 496 00:23:41,540 --> 00:23:44,390 that will ever take the syrup for all ages, 497 00:23:44,390 --> 00:23:46,760 then we would just measure, did this have an impact? 498 00:23:46,760 --> 00:23:49,880 And even if it's an ever so small impact, 499 00:23:49,880 --> 00:23:52,760 then it's good to release this cough syrup, 500 00:23:52,760 --> 00:23:55,430 assuming that it has no side effects or anything like this, 501 00:23:55,430 --> 00:23:58,370 because it's just better than maple syrup, OK? 502 00:23:58,370 --> 00:24:00,430 The problem is that we don't have access to this. 503 00:24:00,430 --> 00:24:03,110 And we're going to have to make this decision based on samples 504 00:24:03,110 --> 00:24:09,140 that give me imprecise knowledge about mu drug and mu control. 505 00:24:09,140 --> 00:24:10,870 So in this case, unlike the first case 506 00:24:10,870 --> 00:24:13,990 where we compared an unknown expected value 507 00:24:13,990 --> 00:24:17,530 to a fixed number, which was the 103.5, here, 508 00:24:17,530 --> 00:24:20,980 we're just comparing two unknown numbers with each other, OK? 509 00:24:20,980 --> 00:24:22,522 So there's two sources of randomness: 510 00:24:22,522 --> 00:24:23,896 trying to estimate the first one, 511 00:24:23,896 --> 00:24:25,492 and trying to estimate the second one. 512 00:24:29,190 --> 00:24:31,860 Before I move on, I just wanted to tell you I apologize. 513 00:24:31,860 --> 00:24:34,480 One of the graders was not able to finish grading his problem 514 00:24:34,480 --> 00:24:35,240 sets for today. 515 00:24:35,240 --> 00:24:39,390 So for those of you who are here just to pick up their homework, 516 00:24:39,390 --> 00:24:41,010 feel free to leave now. 
517 00:24:41,010 --> 00:24:45,071 Even if you have a name tag, I will pretend I did not read it. 518 00:24:45,071 --> 00:24:45,570 OK. 519 00:24:45,570 --> 00:24:47,300 So I'm sorry. 520 00:24:47,300 --> 00:24:49,970 You'll get it on Tuesday. 521 00:24:49,970 --> 00:24:53,540 And this will not happen again. 522 00:24:53,540 --> 00:24:54,510 OK. 523 00:24:54,510 --> 00:24:56,722 So for the clinical trials, now I'm 524 00:24:56,722 --> 00:24:57,930 going to collect information. 525 00:24:57,930 --> 00:25:00,138 I'm going to collect the data from the control group. 526 00:25:00,138 --> 00:25:03,420 And I'm going to collect data from the test group, all right? 527 00:25:03,420 --> 00:25:05,490 So my control group here. 528 00:25:05,490 --> 00:25:08,170 I don't have to collect the same number of people in the control 529 00:25:08,170 --> 00:25:09,600 group as in the drug group. 530 00:25:09,600 --> 00:25:12,520 Actually, for cough syrup, maybe it's not that important. 531 00:25:12,520 --> 00:25:14,790 But you can imagine that if you think 532 00:25:14,790 --> 00:25:20,160 you have the cure to a really annoying disease, 533 00:25:20,160 --> 00:25:23,910 it's actually hard to tell half of the people they 534 00:25:23,910 --> 00:25:26,460 will get a pill of nothing, OK? 535 00:25:26,460 --> 00:25:28,140 People tend to want to try the drug. 536 00:25:28,140 --> 00:25:28,942 They're desperate. 537 00:25:28,942 --> 00:25:30,900 And so, you have to have this sort of imbalance 538 00:25:30,900 --> 00:25:35,390 between who is getting the drug and who's not getting the drug. 539 00:25:35,390 --> 00:25:37,865 And people have to qualify for the clinical trials. 540 00:25:37,865 --> 00:25:39,240 There's lots of fluctuations that 541 00:25:39,240 --> 00:25:42,091 affect what the final numbers of people who are actually 542 00:25:42,091 --> 00:25:44,340 going to get the drug and who are going to get the control 543 00:25:44,340 --> 00:25:45,180 are going to be. 
544 00:25:45,180 --> 00:25:49,277 And so, it's not easy for you to make those two numbers equal. 545 00:25:49,277 --> 00:25:51,360 You'd like to have those numbers equal if you can, 546 00:25:51,360 --> 00:25:55,910 but not necessarily. 547 00:25:55,910 --> 00:25:59,270 And by the way, this is all part of some mystical science called 548 00:25:59,270 --> 00:26:00,440 "design of experiments." 549 00:26:00,440 --> 00:26:02,780 And in particular, you can imagine 550 00:26:02,780 --> 00:26:04,910 that if one of the series had higher variance, 551 00:26:04,910 --> 00:26:07,040 you would want more people in this group 552 00:26:07,040 --> 00:26:08,000 than the other group. 553 00:26:08,000 --> 00:26:10,982 Yeah? 554 00:26:10,982 --> 00:26:13,964 STUDENT: So when we're subtracting [INAUDIBLE] 555 00:26:13,964 --> 00:26:20,425 something that [INAUDIBLE] 0 [INAUDIBLE] to be satisfied. 556 00:26:20,425 --> 00:26:22,420 So that's on purpose [INAUDIBLE].. 557 00:26:22,420 --> 00:26:23,180 PROFESSOR: Yeah, that's on purpose. 558 00:26:23,180 --> 00:26:25,055 And I'll come to that in a second, all right? 559 00:26:25,055 --> 00:26:31,130 So basically, we're going to make it 560 00:26:31,130 --> 00:26:34,760 if your answer is, is this true? 561 00:26:34,760 --> 00:26:39,170 We're going to make it as hard as possible, but no harder, 562 00:26:39,170 --> 00:26:41,150 for you to say yes to this answer. 563 00:26:41,150 --> 00:26:43,230 Because, well, we'll see why. 564 00:26:45,900 --> 00:26:50,040 OK, so now we have two sets of data, the x's and the y's. 565 00:26:50,040 --> 00:26:51,810 The x's are the ones for the drug. 566 00:26:51,810 --> 00:26:55,230 And the y's are the data that I collected from the people who 567 00:26:55,230 --> 00:26:57,520 were just given a placebo, OK? 568 00:26:57,520 --> 00:26:59,280 And so, they're all IID random variables. 
569 00:26:59,280 --> 00:27:02,220 And here, since it's the number of expectorations, 570 00:27:02,220 --> 00:27:07,352 I'm making a blunt modeling assumption. 571 00:27:07,352 --> 00:27:08,810 I'm just going to say it's Poisson. 572 00:27:08,810 --> 00:27:11,990 And it's characterized only by the mean mu drug or the mean mu 573 00:27:11,990 --> 00:27:13,440 control, OK? 574 00:27:13,440 --> 00:27:15,090 I've just made an assumption here. 575 00:27:15,090 --> 00:27:16,750 It could be something different. 576 00:27:16,750 --> 00:27:19,260 But let's say it's a Poisson distribution. 577 00:27:19,260 --> 00:27:21,655 So now what I want to know is to test whether mu drug is 578 00:27:21,655 --> 00:27:22,530 less than mu control. 579 00:27:22,530 --> 00:27:23,880 We said that already. 580 00:27:23,880 --> 00:27:26,340 But the way we said it before was not as mathematical 581 00:27:26,340 --> 00:27:27,120 as it is now. 582 00:27:27,120 --> 00:27:29,520 Now we're actually making a test on the parameters 583 00:27:29,520 --> 00:27:30,540 of a Poisson distribution. 584 00:27:30,540 --> 00:27:32,610 Whereas before, we were just making tests 585 00:27:32,610 --> 00:27:36,410 on expected numbers, OK? 586 00:27:36,410 --> 00:27:39,650 So the heuristic-- again, if we try to apply the heuristic now. 587 00:27:39,650 --> 00:27:42,890 Rather than comparing x bar drug to some fixed number, 588 00:27:42,890 --> 00:27:46,306 I'm actually comparing x bar drug to some control. 589 00:27:46,306 --> 00:27:48,680 But now here, I need to have something that accounts for, 590 00:27:48,680 --> 00:27:51,200 not only the fluctuations of x bar drug, 591 00:27:51,200 --> 00:27:55,406 but also for the fluctuations of x bar control, OK? 592 00:27:55,406 --> 00:27:56,780 And so, now I need something that 593 00:27:56,780 --> 00:27:59,540 goes to 0 when both of those sample sizes go to infinity. 
594 00:27:59,540 --> 00:28:02,840 And typically, it should go to zero with 1 over root of n drug 595 00:28:02,840 --> 00:28:06,340 and 1 over square root of n control, OK? 596 00:28:06,340 --> 00:28:08,830 That's what the central limit theorems for both x bar 597 00:28:08,830 --> 00:28:11,320 drug and x bar control, 598 00:28:11,320 --> 00:28:15,261 two central limit theorems, are actually telling me. 599 00:28:15,261 --> 00:28:15,760 OK. 600 00:28:15,760 --> 00:28:17,840 And then we can conclude that this happens. 601 00:28:17,840 --> 00:28:19,570 And as you said, we're trying to make it 602 00:28:19,570 --> 00:28:21,700 a bit harder to conclude this. 603 00:28:21,700 --> 00:28:23,830 Because let's face it. 604 00:28:23,830 --> 00:28:26,800 If we were actually using too simple a heuristic, right? 605 00:28:30,114 --> 00:28:31,030 For simplicity, right? 606 00:28:35,880 --> 00:28:43,930 So I can rewrite x bar drug less than x bar control 607 00:28:43,930 --> 00:28:46,060 minus this something that goes to 0. 608 00:28:46,060 --> 00:28:54,650 I can write it as x bar drug minus x bar control less 609 00:28:54,650 --> 00:28:57,930 than something negative, OK? 610 00:28:57,930 --> 00:29:00,280 This little something, OK? 611 00:29:00,280 --> 00:29:02,990 So now let's look at those guys. 612 00:29:02,990 --> 00:29:06,470 This is the difference of two random variables. 613 00:29:06,470 --> 00:29:08,570 From the central limit theorem, they 614 00:29:08,570 --> 00:29:12,914 should be approximately Gaussian each. 615 00:29:12,914 --> 00:29:14,330 And actually, we're going to think 616 00:29:14,330 --> 00:29:15,660 of them as being independent. 617 00:29:15,660 --> 00:29:18,460 There's no reason why the people in the control group 618 00:29:18,460 --> 00:29:20,210 should have any effect on what's happening 619 00:29:20,210 --> 00:29:21,350 to the people in the test group. 620 00:29:21,350 --> 00:29:23,550 Those people probably don't even know each other. 
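Under that independence assumption, the two central limit theorems combine: the variance of the difference is the sum of the two variances. Here is a sketch, not from the lecture, with made-up numbers: a hypothetical Poisson rate of 2 expectorations per hour for both groups (the "scam" case, where the drug does nothing) and unequal group sizes.

```python
import math
import random

random.seed(1)

# Hypothetical setup: both groups cough at the same Poisson rate,
# i.e. mu_drug == mu_control and the drug has no effect.
mu = 2.0
n_drug, n_control = 120, 200  # the two groups need not be the same size

def poisson(lam):
    # Draw one Poisson(lam) variate (Knuth's method, fine for small lam).
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

xbar_drug = sum(poisson(mu) for _ in range(n_drug)) / n_drug
xbar_control = sum(poisson(mu) for _ in range(n_control)) / n_control

# For a Poisson, the variance equals the mean, so plug in the sample
# means; independent variances add when taking a difference.
se = math.sqrt(xbar_drug / n_drug + xbar_control / n_control)
z = (xbar_drug - xbar_control) / se
print(f"standardized difference: {z:.2f}")
```

When the two rates really are equal, this standardized difference behaves approximately like a standard Gaussian, which is exactly the centered-at-zero picture drawn next.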
621 00:29:23,550 --> 00:29:27,056 And so, when I look at this, this should look like a Gaussian 622 00:29:27,056 --> 00:29:28,430 with some mean and some variance; 623 00:29:28,430 --> 00:29:30,590 let's say I don't know what it is, OK? 624 00:29:30,590 --> 00:29:31,940 The mean I actually know. 625 00:29:31,940 --> 00:29:37,250 It's mu drug minus mu control, OK? 626 00:29:37,250 --> 00:29:39,950 So if I were to plot the PDF of this guy, 627 00:29:39,950 --> 00:29:41,210 it would look like this. 628 00:29:41,210 --> 00:29:42,890 I would have something which is centered 629 00:29:42,890 --> 00:29:45,590 at mu drug minus mu control. 630 00:29:48,280 --> 00:29:51,880 And it would look like this, OK? 631 00:29:51,880 --> 00:29:55,780 Now let's say that mu drug is actually equal to mu control. 632 00:29:55,780 --> 00:29:59,810 That this pharmaceutical company is a huge scam. 633 00:29:59,810 --> 00:30:04,810 And they really are trying to sell bottled corn 634 00:30:04,810 --> 00:30:07,580 syrup for $200 a pop, OK? 635 00:30:07,580 --> 00:30:08,980 So this is a huge scam. 636 00:30:08,980 --> 00:30:12,010 And the true difference is actually equal to 0. 637 00:30:12,010 --> 00:30:15,670 So this thing is really centered about 0, OK? 638 00:30:15,670 --> 00:30:18,760 Now, if we were not to do this, then basically, half 639 00:30:18,760 --> 00:30:20,890 of the time I would actually come up 640 00:30:20,890 --> 00:30:22,900 with a difference that's above this value. 641 00:30:22,900 --> 00:30:24,370 And half of the time I would have something that's 642 00:30:24,370 --> 00:30:27,070 below this value, which would mean that half of the scams 643 00:30:27,070 --> 00:30:31,040 would actually go through the FDA if I did not do this. 644 00:30:31,040 --> 00:30:33,340 So what I'm trying to do is to say, well, OK. 
645 00:30:33,340 --> 00:30:35,830 You have to be here, so that there is actually 646 00:30:35,830 --> 00:30:37,930 a very low probability that just by chance 647 00:30:37,930 --> 00:30:40,120 you end up being here. 648 00:30:40,120 --> 00:30:42,520 And we'll make all the statements extremely precise 649 00:30:42,520 --> 00:30:43,780 later on. 650 00:30:43,780 --> 00:30:46,060 But I think the drug thing makes it 651 00:30:46,060 --> 00:30:49,330 interesting to see why you're making it hard, 652 00:30:49,330 --> 00:30:51,220 because you don't want to allow people 653 00:30:51,220 --> 00:30:52,420 to sell a thing like that. 654 00:30:55,830 --> 00:30:58,620 Before we go more into the statistical thinking associated 655 00:30:58,620 --> 00:31:01,050 to tests, let's just see how we would 656 00:31:01,050 --> 00:31:02,460 do this quantification, right? 657 00:31:02,460 --> 00:31:04,410 I mean after all, this is what we probably 658 00:31:04,410 --> 00:31:07,590 are the most comfortable with at this point. 659 00:31:07,590 --> 00:31:10,650 So let's just try to understand this. 660 00:31:10,650 --> 00:31:16,060 And I'm going to make the statistician's favorite test, 661 00:31:16,060 --> 00:31:19,700 which is the thing that obviously you do at home all 662 00:31:19,700 --> 00:31:21,450 the time every time you get a new quarter, 663 00:31:21,450 --> 00:31:23,740 is testing whether it's a fair coin or not. 664 00:31:23,740 --> 00:31:24,240 All right? 665 00:31:24,240 --> 00:31:27,840 So this test, of course, exists only in textbooks. 666 00:31:27,840 --> 00:31:30,780 And I actually did not write this slide. 667 00:31:30,780 --> 00:31:32,670 I was too lazy to just replace all this stuff 668 00:31:32,670 --> 00:31:37,410 with the Cherry Blossom Run. 669 00:31:37,410 --> 00:31:38,440 So you have a coin. 670 00:31:38,440 --> 00:31:42,330 Now you have 80 observations, x1 to x80. 671 00:31:42,330 --> 00:31:45,420 So n is equal to 80. 
672 00:31:45,420 --> 00:31:53,820 I have x1 through xn, IID Bernoulli p. 673 00:31:53,820 --> 00:31:55,570 And I want to know if I have a fair coin. 674 00:31:55,570 --> 00:31:57,330 So in mathematical language, I want 675 00:31:57,330 --> 00:32:00,380 to know if p is equal to 1/2. 676 00:32:04,800 --> 00:32:07,680 Let's say this is just the heads, OK? 677 00:32:07,680 --> 00:32:09,030 And a biased coin? 678 00:32:09,030 --> 00:32:10,536 Well, maybe you would potentially 679 00:32:10,536 --> 00:32:11,910 be interested in whether it's biased 680 00:32:11,910 --> 00:32:13,034 in one direction or the other. 681 00:32:13,034 --> 00:32:15,900 But not being a fair coin is already somewhat 682 00:32:15,900 --> 00:32:17,360 of a discovery, OK? 683 00:32:17,360 --> 00:32:20,520 And so, you just want to know whether p is equal to 1/2 684 00:32:20,520 --> 00:32:25,280 or p is not equal to 1/2, OK? 685 00:32:25,280 --> 00:32:29,890 Now, if I were to apply the very naive first heuristic, 686 00:32:29,890 --> 00:32:32,080 to not reject this hypothesis 687 00:32:32,080 --> 00:32:35,710 when I run this thing 80 times, I would need 688 00:32:35,710 --> 00:32:40,360 to see exactly 40 heads and 40 tails. 689 00:32:40,360 --> 00:32:43,960 Now this is very unlikely to happen exactly. 690 00:32:43,960 --> 00:32:47,200 You're going to have close to 40 heads and close to 40 tails, 691 00:32:47,200 --> 00:32:49,510 but how close should those things be? 692 00:32:49,510 --> 00:32:50,310 OK? 693 00:32:50,310 --> 00:32:52,540 And so, the little something is going 694 00:32:52,540 --> 00:32:55,300 to be quantified by exactly this, OK? 695 00:32:55,300 --> 00:33:06,490 So now here, let's say that my experiment gave me 54 heads. 696 00:33:06,490 --> 00:33:07,541 That's 54? 697 00:33:07,541 --> 00:33:08,040 Yeah. 698 00:33:10,870 --> 00:33:21,100 Which means that my xn bar is 54 over 80, which is 0.68. 699 00:33:21,100 --> 00:33:21,970 All right? 700 00:33:21,970 --> 00:33:24,580 So I have this estimator. 
701 00:33:24,580 --> 00:33:26,620 Looks pretty large, right? 702 00:33:26,620 --> 00:33:29,950 It's much larger than 0.5, so it does look like, 703 00:33:29,950 --> 00:33:32,080 and my mom would certainly conclude, 704 00:33:32,080 --> 00:33:34,240 that this is a biased coin for sure, 705 00:33:34,240 --> 00:33:35,870 because she thinks I'm tricky. 706 00:33:35,870 --> 00:33:36,370 All right. 707 00:33:36,370 --> 00:33:40,110 So the question is, can this be due to chance? 708 00:33:40,110 --> 00:33:42,050 Can this be due to chance alone? 709 00:33:42,050 --> 00:33:45,490 Like what is the likelihood that a fair coin would actually 710 00:33:45,490 --> 00:33:51,800 end up being 54 times on heads rather than 40? 711 00:33:51,800 --> 00:33:52,900 OK? 712 00:33:52,900 --> 00:33:55,540 And so, what we do is we say, OK, I 713 00:33:55,540 --> 00:33:58,370 need to understand, what is the distribution of the number 714 00:33:58,370 --> 00:33:59,732 of times it comes up heads? 715 00:33:59,732 --> 00:34:01,190 And this is going to be a binomial, 716 00:34:01,190 --> 00:34:02,984 but it's a little annoying to play with. 717 00:34:02,984 --> 00:34:05,150 So we're going to use the central limit theorem that 718 00:34:05,150 --> 00:34:10,790 tells me that square root of n times xn bar minus p divided 719 00:34:10,790 --> 00:34:15,940 by square root of p 1 minus p is approximately distributed 720 00:34:15,940 --> 00:34:17,300 as an n 0,1. 721 00:34:17,300 --> 00:34:18,944 And here, since n is equal to 80, 722 00:34:18,944 --> 00:34:21,110 I'm pretty safe that this is actually going to work. 723 00:34:28,420 --> 00:34:33,300 And I can actually use Slutsky, 724 00:34:33,300 --> 00:34:34,510 and put xn bar here. 725 00:34:38,050 --> 00:34:40,980 Slutsky tells me that this is OK to do. 726 00:34:40,980 --> 00:34:41,480 All right. 727 00:34:41,480 --> 00:34:43,350 So now I'm actually going to compute this. 728 00:34:43,350 --> 00:34:44,870 So here, I know this. 
729 00:34:44,870 --> 00:34:46,280 This is square root of 80. 730 00:34:46,280 --> 00:34:48,860 This is a 0.68. 731 00:34:48,860 --> 00:34:50,210 What is this value here? 732 00:34:50,210 --> 00:34:51,179 We'll talk about it. 733 00:34:51,179 --> 00:34:53,090 Well, we're trying to understand what happens 734 00:34:53,090 --> 00:34:55,610 if it is a fair coin, right? 735 00:34:55,610 --> 00:35:02,220 So if fair, then p is equal to 0.5, right? 736 00:35:02,220 --> 00:35:05,160 So what I want to know is, what is the likelihood 737 00:35:05,160 --> 00:35:09,240 that a fair coin would give me 0.68? 738 00:35:09,240 --> 00:35:10,560 Let me finish. 739 00:35:10,560 --> 00:35:11,850 All right. 740 00:35:11,850 --> 00:35:14,190 What is the likelihood that a fair coin will 741 00:35:14,190 --> 00:35:17,470 allow me to do this, so I'm actually allowed to plug in p 742 00:35:17,470 --> 00:35:19,530 to be 0.5 here? 743 00:35:19,530 --> 00:35:25,000 Now, your question is, why do I not plug in p to be 0.5? 744 00:35:25,000 --> 00:35:25,640 But you can. 745 00:35:25,640 --> 00:35:26,140 All right. 746 00:35:26,140 --> 00:35:29,024 I just want to make you plug in p at one specific point, 747 00:35:29,024 --> 00:35:30,190 but you're absolutely right. 748 00:35:34,040 --> 00:35:34,540 OK. 749 00:35:34,540 --> 00:35:37,270 Let's forget about your question for one second. 750 00:35:37,270 --> 00:35:41,590 So now I'm going to have to look at square root of n times xn bar minus 0.5, divided 751 00:35:41,590 --> 00:35:45,300 by the square root of xn bar 1 minus xn bar. 752 00:35:45,300 --> 00:35:51,130 Then this thing is approximately Gaussian, n 0,1, 753 00:35:51,130 --> 00:35:52,270 if the coin is fair. 754 00:35:56,150 --> 00:36:01,120 Otherwise, I'm going to have a mean which is not zero here. 755 00:36:01,120 --> 00:36:04,730 If the coin is something else, whatever I get here, right? 756 00:36:07,772 --> 00:36:09,230 Let's just write it for one second. 757 00:36:23,520 --> 00:36:25,010 Let's do it. 
758 00:36:25,010 --> 00:36:27,510 So what is the distribution of this if p-- 759 00:36:27,510 --> 00:36:33,380 so that's p is equal to 0.5. 760 00:36:33,380 --> 00:36:33,930 OK? 761 00:36:33,930 --> 00:36:39,910 Now if p is equal to 0.6, then this thing is just, well, 762 00:36:39,910 --> 00:36:43,660 I know that this is equal to square root of n times xn 763 00:36:43,660 --> 00:36:52,140 bar minus 0.6, divided by the square root of xn bar 1 764 00:36:52,140 --> 00:36:55,740 minus xn bar, plus-- 765 00:36:55,740 --> 00:36:57,050 well, now the difference, 766 00:36:57,050 --> 00:37:00,710 which is square root of n times 0.6 minus 767 00:37:00,710 --> 00:37:07,520 0.5, divided by square root of xn bar 1 minus xn bar, right? 768 00:37:07,520 --> 00:37:13,790 Now if p is equal to 0.6, then this guy is n 0,1, 769 00:37:13,790 --> 00:37:17,950 but this guy is something different. 770 00:37:17,950 --> 00:37:22,155 This is just a number that depends on square root of n. 771 00:37:22,155 --> 00:37:24,130 It's actually pretty large. 772 00:37:24,130 --> 00:37:28,620 So if I want to use the fact that this guy has 773 00:37:28,620 --> 00:37:33,650 a normal distribution, I need to plug in the true value here. 774 00:37:33,650 --> 00:37:38,630 Now, the implicit question that I got was the following. 775 00:37:38,630 --> 00:37:43,630 It says, well, if you know what p is, then what's 776 00:37:43,630 --> 00:37:46,600 actually true is also this. 777 00:37:46,600 --> 00:37:51,450 If p is equal to 0.5, then since I 778 00:37:51,450 --> 00:37:57,180 know that root n xn bar minus p divided by square root of p 1 779 00:37:57,180 --> 00:38:01,590 minus p is some n 0, 1, it's also true 780 00:38:01,590 --> 00:38:06,570 that square root of n xn bar minus 0.5 781 00:38:06,570 --> 00:38:14,940 divided by square root of 0.5 1 minus 0.5 is n 0,1, right? 782 00:38:14,940 --> 00:38:15,750 I know what p is. 783 00:38:15,750 --> 00:38:18,620 I'm just going to make it appear. 
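That decomposition is nothing more than adding and subtracting 0.6 in the numerator. A quick numeric check, using the lecture's n = 80 and the rounded x bar = 0.68:

```python
import math

n, xbar = 80, 0.68
sd = math.sqrt(xbar * (1 - xbar))  # plug-in estimate of the standard deviation

# The statistic centered at 0.5 splits into a centered piece plus a drift.
lhs = math.sqrt(n) * (xbar - 0.5) / sd
centered = math.sqrt(n) * (xbar - 0.6) / sd  # approximately N(0,1) if p were 0.6
shift = math.sqrt(n) * (0.6 - 0.5) / sd      # deterministic drift, grows like sqrt(n)

# The identity holds exactly, term by term.
assert abs(lhs - (centered + shift)) < 1e-12
print(f"statistic = {lhs:.2f} = {centered:.2f} + {shift:.2f}")
```

The drift term here is already about 1.92 at n = 80, and it keeps growing like the square root of n, which is why using the wrong center destroys the standard Gaussian approximation.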
784 00:38:18,620 --> 00:38:19,460 OK. 785 00:38:19,460 --> 00:38:22,100 And so, what's actually nice about this particular 786 00:38:22,100 --> 00:38:27,460 [INAUDIBLE] experiment is that I can check if my assumption is 787 00:38:27,460 --> 00:38:31,386 valid by checking whether I'm actually-- 788 00:38:31,386 --> 00:38:32,760 so what I'm going to do right now 789 00:38:32,760 --> 00:38:36,697 is check whether this is likely to be a Gaussian or not, right? 790 00:38:36,697 --> 00:38:38,280 And there's two ways I can violate it. 791 00:38:38,280 --> 00:38:42,496 By violating the mean, but also by violating the variance. 792 00:38:42,496 --> 00:38:44,120 And here, what I did in the first case, 793 00:38:44,120 --> 00:38:46,340 I said, well, I'm not allowing you to check whether you've 794 00:38:46,340 --> 00:38:47,040 violated the variance. 795 00:38:47,040 --> 00:38:49,340 I'm just plugging in whatever variance you're getting. 796 00:38:49,340 --> 00:38:51,110 Whereas here, I'm saying, well, there's 797 00:38:51,110 --> 00:38:52,460 two ways you can violate it. 798 00:38:52,460 --> 00:38:55,430 And I'm just going to factor everything in. 799 00:38:55,430 --> 00:38:58,700 So now I can plug in this number. 800 00:38:58,700 --> 00:39:00,146 So this is 80. 801 00:39:00,146 --> 00:39:02,440 This is 0.68. 802 00:39:02,440 --> 00:39:04,210 So I can compute all this stuff. 803 00:39:04,210 --> 00:39:06,910 I can compute all this stuff here as well. 804 00:39:06,910 --> 00:39:10,210 And what I get in this case, if I plug the xn bar in, 805 00:39:10,210 --> 00:39:15,160 I get 3.45, OK? 806 00:39:15,160 --> 00:39:17,120 And now I claim that this makes it 807 00:39:17,120 --> 00:39:19,430 reasonable to reject the hypothesis that p 808 00:39:19,430 --> 00:39:21,790 is equal to 0.5. 809 00:39:21,790 --> 00:39:22,900 Can somebody tell me why? 810 00:39:27,819 --> 00:39:28,860 STUDENT: It's pretty big. 811 00:39:28,860 --> 00:39:30,235 PROFESSOR: Yeah, 3 is pretty big. 
812 00:39:30,235 --> 00:39:31,510 So it's very unlikely. 813 00:39:31,510 --> 00:39:33,730 So this number that I should see should 814 00:39:33,730 --> 00:39:39,460 look like the number I would get if I asked a computer to draw 815 00:39:39,460 --> 00:39:42,960 one random Gaussian for me. 816 00:39:42,960 --> 00:39:45,250 This number, when I draw one random Gaussian, 817 00:39:45,250 --> 00:39:49,510 is actually a number that with 99.7% probability will 818 00:39:49,510 --> 00:39:52,700 be between negative 3 and 3. 819 00:39:52,700 --> 00:39:55,540 With 95% it's going to be between negative 2 and 2. 820 00:40:01,990 --> 00:40:04,410 68% is between minus 1 and 1. 821 00:40:04,410 --> 00:40:07,560 And with like 90% it's between minus 1.65 and 1.65. 822 00:40:07,560 --> 00:40:10,860 So getting a 3.45 when you do this 823 00:40:10,860 --> 00:40:13,260 is extremely unlikely to happen, which 824 00:40:13,260 --> 00:40:17,260 means that you would have to be extremely unlucky for this 825 00:40:17,260 --> 00:40:17,890 to ever happen. 826 00:40:17,890 --> 00:40:19,750 Now, it can happen, right? 827 00:40:19,750 --> 00:40:25,000 It could be the case that you flip 80 coins and 80 of them 828 00:40:25,000 --> 00:40:27,505 are heads. 829 00:40:27,505 --> 00:40:29,130 With what probability does this happen? 830 00:40:32,680 --> 00:40:34,390 1 over 2 to the 80, right? 831 00:40:34,390 --> 00:40:39,890 You'd probably be better off playing the lottery 832 00:40:39,890 --> 00:40:41,140 with this kind of odds, right? 833 00:40:41,140 --> 00:40:43,840 I mean, this is just not going to happen, but it might happen. 834 00:40:43,840 --> 00:40:48,490 So we cannot remove completely the uncertainty, right? 835 00:40:48,490 --> 00:40:53,250 It's still possible that this is due to noise. 836 00:40:53,250 --> 00:40:55,640 But we're just trying to make all the cases that 837 00:40:55,640 --> 00:40:58,340 are very unlikely go away, OK? 
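To put a number on "extremely unlikely", here is a sketch computing the coin statistic and the two-sided Gaussian tail probability beyond it, using only the standard library. Note that with the exact x bar = 54/80 = 0.675, rather than the lecture's rounded 0.68, the statistic comes out near 3.34 instead of 3.45; the conclusion is the same.

```python
import math

n, heads = 80, 54
xbar = heads / n  # 0.675 exactly; the lecture rounds this to 0.68

# Standardized statistic under the fair-coin hypothesis p = 0.5,
# with the plug-in variance xbar * (1 - xbar).
z = math.sqrt(n) * (xbar - 0.5) / math.sqrt(xbar * (1 - xbar))

# P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2)): the chance a fair coin
# looks at least this extreme, in either direction.
two_sided_tail = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided tail probability = {two_sided_tail:.4f}")
```

The tail probability lands below one in a thousand, which is the quantitative version of "too far in the tails" in the picture that follows.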
838 00:40:58,340 --> 00:41:03,350 And so, now I claim that 3.45 is very unlikely for a Gaussian. 839 00:41:03,350 --> 00:41:07,700 So if I were to draw the PDF of a standard Gaussian, right? 840 00:41:07,700 --> 00:41:09,320 So n 0, 1, right? 841 00:41:09,320 --> 00:41:12,880 So that's the PDF of n 0, 1. 842 00:41:16,740 --> 00:41:21,900 3.45 is basically here, OK? 843 00:41:21,900 --> 00:41:25,260 So it's just too far in the tails. 844 00:41:25,260 --> 00:41:26,030 Understood? 845 00:41:26,030 --> 00:41:30,320 Now I cannot say that the probability that the Gaussian 846 00:41:30,320 --> 00:41:33,810 is equal to 3.45 is small, right? 847 00:41:33,810 --> 00:41:35,660 I just cannot say that, because it's 0. 848 00:41:35,660 --> 00:41:37,770 And it's also 0 for the probability that it's 0, 849 00:41:37,770 --> 00:41:41,060 even though the most likely values are around 0. 850 00:41:41,060 --> 00:41:44,236 It's a continuous random variable. 851 00:41:44,236 --> 00:41:45,610 Any value you give me, it's going 852 00:41:45,610 --> 00:41:47,720 to happen with probability zero. 853 00:41:47,720 --> 00:41:51,190 So what we're going to say is, well, the fluctuations 854 00:41:51,190 --> 00:41:52,780 are larger than this number. 855 00:41:52,780 --> 00:41:55,030 The probability that I get anything worse 856 00:41:55,030 --> 00:41:57,040 than this is actually extremely small, right? 857 00:41:57,040 --> 00:42:00,970 Anything worse than this is just like farther than 3.45. 858 00:42:00,970 --> 00:42:03,710 And this is going to be what we control. 859 00:42:03,710 --> 00:42:04,300 All right? 860 00:42:04,300 --> 00:42:06,550 So in this case, I claim that it's quite reasonable 861 00:42:06,550 --> 00:42:07,740 to reject the hypothesis. 862 00:42:07,740 --> 00:42:10,830 Is everybody OK with this? 863 00:42:10,830 --> 00:42:12,440 Does everybody find this shocking? 864 00:42:12,440 --> 00:42:14,290 Or does everybody have no idea what's going on? 
865 00:42:14,290 --> 00:42:16,422 Do you have any questions? 866 00:42:16,422 --> 00:42:17,410 Yeah? 867 00:42:17,410 --> 00:42:19,386 STUDENT: Regarding the case of p, where 868 00:42:19,386 --> 00:42:21,362 minus p isn't close to xn. 869 00:42:21,362 --> 00:42:24,820 If you use 1 minus p as 0.5, then you're 870 00:42:24,820 --> 00:42:28,772 dividing by a larger number than you would if you used xn. 871 00:42:28,772 --> 00:42:32,230 So it feels like our true number is not 3.45. 872 00:42:32,230 --> 00:42:34,700 It's something a little bit smaller 873 00:42:34,700 --> 00:42:39,146 than 3.45 for the distribution to actually be like 1/2. 874 00:42:39,146 --> 00:42:40,628 Because it seems like we're adding 875 00:42:40,628 --> 00:42:43,574 an unnecessary extra error by using xn bar. 876 00:42:43,574 --> 00:42:45,074 And we're adding an error that makes 877 00:42:45,074 --> 00:42:50,014 it seem that our result was less likely than it actually was. 878 00:43:00,450 --> 00:43:03,490 PROFESSOR: That's correct. 879 00:43:03,490 --> 00:43:05,790 And you're right. 880 00:43:05,790 --> 00:43:07,540 I didn't want to plug in the p everywhere, 881 00:43:07,540 --> 00:43:09,670 but you should plug it in everywhere you can. 882 00:43:09,670 --> 00:43:11,010 That's for sure, OK? 883 00:43:11,010 --> 00:43:12,205 So let's agree on that. 884 00:43:12,205 --> 00:43:15,022 And that's true that it makes the number a little bigger. 885 00:43:15,022 --> 00:43:16,480 Let's compute how much we would get 886 00:43:16,480 --> 00:43:18,000 if we used 0.5 there. 887 00:43:20,760 --> 00:43:23,040 Well, I don't know what the square root of 80 is. 888 00:43:23,040 --> 00:43:26,370 Can somebody compute quickly? 889 00:43:26,370 --> 00:43:27,600 I'm not asking you to do it. 890 00:43:27,600 --> 00:43:46,150 But what I want is two times square root of 80 times 0.18. 891 00:43:46,150 --> 00:43:48,850 3.22. 892 00:43:48,850 --> 00:43:49,970 OK. 
893 00:43:49,970 --> 00:43:55,462 I can make the same cartoon picture with 3.22. 894 00:43:55,462 --> 00:43:56,170 But you're right. 895 00:43:56,170 --> 00:43:57,370 This is definitely more accurate. 896 00:43:57,370 --> 00:43:58,536 And I should have done this. 897 00:43:58,536 --> 00:44:02,350 I didn't want to make the message confusing, OK? 898 00:44:02,350 --> 00:44:02,850 All right. 899 00:44:02,850 --> 00:44:07,770 So now here's a second example that you can think of. 900 00:44:07,770 --> 00:44:11,520 So now I toss it 30 times. 901 00:44:11,520 --> 00:44:17,310 Still in the realm of the central limit theorem. 902 00:44:17,310 --> 00:44:23,910 I get 13 heads rather than 15. 903 00:44:23,910 --> 00:44:27,127 So I'm actually much closer to being exactly at half. 904 00:44:27,127 --> 00:44:28,710 So let's see if this is actually going 905 00:44:28,710 --> 00:44:29,918 to give me a plausible value. 906 00:44:32,580 --> 00:44:34,830 So I get 0.43 on average. 907 00:44:34,830 --> 00:44:40,950 If the truth was 0.5, I would get something like 0.77. 908 00:44:40,950 --> 00:44:44,700 And now I claim that 0.77 is a plausible realization 909 00:44:44,700 --> 00:44:46,950 for some standard Gaussian, OK? 910 00:44:46,950 --> 00:44:49,620 Now, 0.77 is going to look like it's here. 911 00:44:55,130 --> 00:44:57,830 So that could very well be something that just 912 00:44:57,830 --> 00:44:59,420 comes because of randomness. 913 00:44:59,420 --> 00:45:01,860 And again, think about it. 914 00:45:01,860 --> 00:45:06,740 If I told you you were expecting 15, and you saw 13, 915 00:45:06,740 --> 00:45:09,479 you're happy to put that on the account of randomness. 916 00:45:09,479 --> 00:45:11,270 Now of course, the question is going to be, 917 00:45:11,270 --> 00:45:12,770 where do I draw the line? 918 00:45:12,770 --> 00:45:13,760 Right? 919 00:45:13,760 --> 00:45:15,740 Is 12 the right number? 920 00:45:15,740 --> 00:45:16,840 Is 11? 921 00:45:16,840 --> 00:45:17,570 Is 10?
922 00:45:17,570 --> 00:45:18,070 What is it? 923 00:45:21,690 --> 00:45:24,770 So basically, the answer is it's whatever you want it to be. 924 00:45:24,770 --> 00:45:28,250 The problem is it's hard to think on this scale, right? 925 00:45:28,250 --> 00:45:30,332 What does it mean to think on this scale? 926 00:45:30,332 --> 00:45:31,790 If I can't think on this scale, I'm 927 00:45:31,790 --> 00:45:33,950 going to have to think on the scale of 80 of them. 928 00:45:33,950 --> 00:45:38,390 I'm going to have to think on the scale of running 100 coin 929 00:45:38,390 --> 00:45:39,170 flips. 930 00:45:39,170 --> 00:45:43,004 And so, this scale is a moving target all the time. 931 00:45:43,004 --> 00:45:44,420 Every time you have a new problem, 932 00:45:44,420 --> 00:45:45,961 you have to have a new scale in mind. 933 00:45:45,961 --> 00:45:47,660 And it's very difficult. 934 00:45:47,660 --> 00:45:50,840 The purpose of statistical analysis, 935 00:45:50,840 --> 00:45:53,810 and in particular this process 936 00:45:53,810 --> 00:45:55,670 that takes your x bar and turns it 937 00:45:55,670 --> 00:45:58,530 into something that should be standard Gaussian, 938 00:45:58,530 --> 00:46:01,010 is that it allows you to map the value of x bar 939 00:46:01,010 --> 00:46:06,350 into a scale that is the standard scale of the Gaussian. 940 00:46:06,350 --> 00:46:07,070 All right? 941 00:46:07,070 --> 00:46:09,050 Now, all you need to have in mind 942 00:46:09,050 --> 00:46:13,220 is, what is a large number or an unusually large number 943 00:46:13,220 --> 00:46:14,014 for a Gaussian? 944 00:46:14,014 --> 00:46:15,180 That's all you need to know. 945 00:46:18,880 --> 00:46:21,760 So here, by the way, 0.77 is not this one, 946 00:46:21,760 --> 00:46:26,860 because it was actually negative 0.77. 947 00:46:26,860 --> 00:46:28,660 So this one. 948 00:46:28,660 --> 00:46:29,160 OK. 949 00:46:29,160 --> 00:46:34,460 So I can be on the right or I can be on the left of zero.
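The standardization step just described, mapping the 30-toss example onto the Gaussian scale, can be sketched as follows. This is my own Python illustration, not from the lecture; the plug-in denominator sqrt(x̄(1 − x̄)) follows the formula discussed earlier:

```python
from math import sqrt

def standardized(heads, n, p0=0.5):
    """sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar)): approximately
    standard Gaussian under the null p = p0, for large n."""
    xbar = heads / n
    return sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar))

t = standardized(13, 30)
print(round(t, 2))  # a small negative value, well inside the bulk of N(0, 1)
```

13 heads out of 30 lands well within the plausible range of a standard Gaussian, while the value from the earlier example was far out in the tails.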
950 00:46:34,460 --> 00:46:36,030 But they are still plausible. 951 00:46:36,030 --> 00:46:40,000 So understand, you could actually have in mind 952 00:46:40,000 --> 00:46:42,040 all the values that are plausible for a Gaussian 953 00:46:42,040 --> 00:46:43,415 and those that are not plausible, 954 00:46:43,415 --> 00:46:46,190 and draw the line based on what you think is the right number. 955 00:46:46,190 --> 00:46:49,690 So how large should a positive value of a Gaussian be 956 00:46:49,690 --> 00:46:52,990 before it becomes unreasonable for you? 957 00:46:52,990 --> 00:46:54,530 Is it 1? 958 00:46:54,530 --> 00:46:56,190 Is it 1.5? 959 00:46:56,190 --> 00:46:56,819 Is it 2? 960 00:46:56,819 --> 00:46:57,860 Stop me when I get there. 961 00:46:57,860 --> 00:46:59,900 Is it 2.5? 962 00:46:59,900 --> 00:47:00,557 Is it 3? 963 00:47:00,557 --> 00:47:02,348 STUDENT: I think 2.5 is definitely too big. 964 00:47:02,348 --> 00:47:03,014 PROFESSOR: What? 965 00:47:03,014 --> 00:47:04,808 STUDENT: Doesn't it depend on our prior? 966 00:47:04,808 --> 00:47:06,776 Let's say we already have really good evidence 967 00:47:06,776 --> 00:47:09,864 at this point [INAUDIBLE] 968 00:47:09,864 --> 00:47:12,030 PROFESSOR: Yeah, so this is not Bayesian statistics. 969 00:47:12,030 --> 00:47:14,160 So there's no such thing as a prior right now. 970 00:47:14,160 --> 00:47:15,150 We'll get there. 971 00:47:15,150 --> 00:47:18,360 You'll have your moment during one short chapter. 972 00:47:23,510 --> 00:47:25,580 So there's no prior here, right? 973 00:47:25,580 --> 00:47:27,380 It's really a matter of whether you think 974 00:47:27,380 --> 00:47:28,820 a Gaussian value is large or not. 975 00:47:28,820 --> 00:47:30,230 It's not a matter of coins. 976 00:47:30,230 --> 00:47:31,530 It's not a matter of anything. 977 00:47:31,530 --> 00:47:33,950 Now I've just reduced it to just one question. 978 00:47:33,950 --> 00:47:36,350 So forget about everything we just said.
979 00:47:36,350 --> 00:47:38,240 And I'm asking you, when do you decide 980 00:47:38,240 --> 00:47:43,010 that a number is too large to be reasonably drawn 981 00:47:43,010 --> 00:47:44,790 from a Gaussian? 982 00:47:44,790 --> 00:47:50,190 And this number is 2 or 1.96. 983 00:47:50,190 --> 00:47:53,010 And that's basically the number that you get from this quantile. 984 00:47:53,010 --> 00:47:55,560 We've seen the 1.96 before, right? 985 00:47:55,560 --> 00:47:59,920 It's actually q alpha over 2, where alpha is equal to 5%. 986 00:47:59,920 --> 00:48:01,620 That's a quantile of a Gaussian. 987 00:48:01,620 --> 00:48:05,130 So actually, what we do is we map it again. 988 00:48:05,130 --> 00:48:06,300 So we are now at the Gaussians. 989 00:48:06,300 --> 00:48:08,380 And then we map it again into some probabilities, 990 00:48:08,380 --> 00:48:10,796 which is the probability of being farther than this thing. 991 00:48:10,796 --> 00:48:12,460 And now probabilities, we can think. 992 00:48:12,460 --> 00:48:15,060 Probability is something that quantifies my error. 993 00:48:15,060 --> 00:48:17,220 And the question is what percentage of error 994 00:48:17,220 --> 00:48:18,300 am I willing to tolerate. 995 00:48:18,300 --> 00:48:20,310 And if I tell you 5%, that's something 996 00:48:20,310 --> 00:48:21,570 you can really envision. 997 00:48:21,570 --> 00:48:24,750 What it means is that if I were to do this test a million 998 00:48:24,750 --> 00:48:28,380 times, 5% of the time I would expose myself 999 00:48:28,380 --> 00:48:30,400 to making a mistake. 1000 00:48:30,400 --> 00:48:30,900 All right. 1001 00:48:30,900 --> 00:48:31,950 That's all it would say. 1002 00:48:31,950 --> 00:48:36,790 If you said, well, I don't want to account for 5%, 1003 00:48:36,790 --> 00:48:42,280 maybe I want 1%, then you have to move from 1.96 to 2.58.
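The mapping from a tolerated error level alpha to the Gaussian threshold q alpha over 2 can be sketched with the standard library's inverse CDF. This is my own illustration, not from the lecture:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian N(0, 1)

def threshold(alpha):
    """Two-sided threshold q_{alpha/2}: P(|Z| > q) = alpha for Z ~ N(0, 1)."""
    return Z.inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.0001):
    print(f"{alpha:>7}: {threshold(alpha):.2f}")
# 5% gives 1.96, 1% gives about 2.58, and 0.01% pushes past 3.8
```

This is exactly why stating 1%, 5%, 10% is easier than remembering 1.96, 2.58, and so on: the percentage is the thing you can envision, and the threshold is just its image under the inverse CDF.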
1004 00:48:42,280 --> 00:48:44,710 And then if you say I want 0.01%, 1005 00:48:44,710 --> 00:48:47,180 then you have to move to an even larger number. 1006 00:48:47,180 --> 00:48:48,100 So it depends. 1007 00:48:48,100 --> 00:48:51,610 But stating this number 1%, 5%, 10% 1008 00:48:51,610 --> 00:48:57,380 is much easier than seeing those numbers 1.96, 2.58, et cetera. 1009 00:48:57,380 --> 00:49:00,370 So we're just putting everything back on the same scale. 1010 00:49:00,370 --> 00:49:01,480 All right. 1011 00:49:01,480 --> 00:49:03,190 To conclude, this, again, as we said, 1012 00:49:03,190 --> 00:49:05,720 does not suggest that the coin is unfair. 1013 00:49:05,720 --> 00:49:08,260 Now, it might be that the coin is unfair. 1014 00:49:08,260 --> 00:49:10,330 We just don't have enough evidence to say that. 1015 00:49:10,330 --> 00:49:12,580 And that goes back to your question about, 1016 00:49:12,580 --> 00:49:17,420 why are we siding with the fact that we're 1017 00:49:17,420 --> 00:49:22,086 making it harder to conclude that the runners were faster? 1018 00:49:22,086 --> 00:49:23,210 And this is the same thing. 1019 00:49:23,210 --> 00:49:24,830 We're making it harder to conclude 1020 00:49:24,830 --> 00:49:26,540 that the coin is biased. 1021 00:49:26,540 --> 00:49:28,700 Because there is a status quo. 1022 00:49:28,700 --> 00:49:30,500 And we're trying to see if we have evidence 1023 00:49:30,500 --> 00:49:31,910 against the status quo. 1024 00:49:31,910 --> 00:49:35,380 The status quo for the runners is they ran the same speed. 1025 00:49:35,380 --> 00:49:37,880 The status quo for the coin, we can probably all agree, 1026 00:49:37,880 --> 00:49:39,560 is that the coin is fair. 1027 00:49:39,560 --> 00:49:41,150 The status quo for a drug? 1028 00:49:41,150 --> 00:49:43,940 I mean, again, unless you prove to me 1029 00:49:43,940 --> 00:49:45,920 that you're actually not a scammer, 1030 00:49:45,920 --> 00:49:48,959 the status quo is that this is maple syrup.
1031 00:49:48,959 --> 00:49:50,000 There's nothing in there. 1032 00:49:50,000 --> 00:49:51,060 Why would you? 1033 00:49:51,060 --> 00:49:53,060 I mean, if I let you get away with it, 1034 00:49:53,060 --> 00:49:55,810 you would put corn syrup. 1035 00:49:55,810 --> 00:49:58,720 It's cheaper. 1036 00:49:58,720 --> 00:49:59,840 OK. 1037 00:49:59,840 --> 00:50:01,171 So now let's move on to math. 1038 00:50:01,171 --> 00:50:01,670 All right. 1039 00:50:01,670 --> 00:50:04,460 So when I start doing mathematics, 1040 00:50:04,460 --> 00:50:06,690 I'm going to have to talk about random variables 1041 00:50:06,690 --> 00:50:08,270 and statistical models. 1042 00:50:08,270 --> 00:50:13,110 And here, there is actually a very simple thing, 1043 00:50:13,110 --> 00:50:15,800 which actually goes back to this picture. 1044 00:50:18,950 --> 00:50:27,230 A test is really asking me if my parameter 1045 00:50:27,230 --> 00:50:28,850 is in some region of the parameter set 1046 00:50:28,850 --> 00:50:30,766 or another region of the parameter set, right? 1047 00:50:30,766 --> 00:50:32,000 Yes/no. 1048 00:50:32,000 --> 00:50:37,490 And so, what I'm going to be given is a sample, x1, ..., xn. 1049 00:50:37,490 --> 00:50:38,120 I have a model. 1050 00:50:41,830 --> 00:50:46,460 And again, those can be braces depending on the day. 1051 00:50:46,460 --> 00:50:54,670 And so, now I'm going to give myself theta 0 and theta 1, 1052 00:50:54,670 --> 00:50:55,660 two disjoint subsets. 1053 00:50:58,590 --> 00:51:01,170 OK. 1054 00:51:01,170 --> 00:51:02,840 So capital theta here is the space 1055 00:51:02,840 --> 00:51:05,200 in which my parameter can live. 1056 00:51:05,200 --> 00:51:06,950 To make two disjoint subsets, I could just 1057 00:51:06,950 --> 00:51:11,356 split this guy in half, right? 1058 00:51:11,356 --> 00:51:13,730 I'm going to say, well, maybe it's this guy and this guy. 1059 00:51:13,730 --> 00:51:14,230 OK. 1060 00:51:14,230 --> 00:51:16,250 So this is theta 0.
1061 00:51:16,250 --> 00:51:18,830 And this is theta 1. 1062 00:51:18,830 --> 00:51:22,840 What it means when I split those two guys is that, in a test, 1063 00:51:22,840 --> 00:51:25,510 I'm actually going to focus only on theta 0 or theta 1. 1064 00:51:25,510 --> 00:51:28,010 And so, it means that a priori I've already 1065 00:51:28,010 --> 00:51:32,240 removed all the possibilities of theta being in this region. 1066 00:51:32,240 --> 00:51:33,540 What does it mean? 1067 00:51:33,540 --> 00:51:37,910 Go back to the example of runners. 1068 00:51:37,910 --> 00:51:41,180 This region here for the Cherry Blossom Run 1069 00:51:41,180 --> 00:51:44,420 is the set of parameters, where mu was larger 1070 00:51:44,420 --> 00:51:47,354 than 103.5, right? 1071 00:51:47,354 --> 00:51:48,020 We removed that. 1072 00:51:48,020 --> 00:51:49,880 We didn't even consider this possibility. 1073 00:51:49,880 --> 00:51:52,820 We said either it's less-- 1074 00:51:52,820 --> 00:51:53,320 sorry. 1075 00:51:53,320 --> 00:51:55,860 That's mu equal to 103.5. 1076 00:51:55,860 --> 00:51:59,770 And this was mu less than 103.5, OK? 1077 00:51:59,770 --> 00:52:03,950 But these guys were like if it happens, it happens. 1078 00:52:03,950 --> 00:52:06,980 I'm not making any statement about that case. 1079 00:52:06,980 --> 00:52:07,520 All right? 1080 00:52:07,520 --> 00:52:09,565 So now I take those two subsets. 1081 00:52:09,565 --> 00:52:11,690 And now I'm going to give them two different names, 1082 00:52:11,690 --> 00:52:15,050 because they're going to have an asymmetric role. 1083 00:52:15,050 --> 00:52:18,710 h0 is the null hypothesis. 1084 00:52:18,710 --> 00:52:23,630 And h1 is the alternative hypothesis. 1085 00:52:23,630 --> 00:52:27,030 h0 is the status quo. 1086 00:52:27,030 --> 00:52:29,560 h1 is what is typically considered 1087 00:52:29,560 --> 00:52:32,140 a scientific discovery. 1088 00:52:32,140 --> 00:52:36,550 So if you're a regulator, you're going to push towards h0.
1089 00:52:36,550 --> 00:52:39,832 If you're a scientist, you're going to push towards h1. 1090 00:52:39,832 --> 00:52:41,290 If you're a pharmaceutical company, 1091 00:52:41,290 --> 00:52:42,990 you're going to push towards h1. 1092 00:52:42,990 --> 00:52:43,900 OK? 1093 00:52:43,900 --> 00:52:47,410 And so, depending on whether you want to be conservative-- oh, 1094 00:52:47,410 --> 00:52:49,272 I can find evidence in a lot of data. 1095 00:52:49,272 --> 00:52:50,980 As soon as you give me three data points, 1096 00:52:50,980 --> 00:52:52,563 I'm going to be able to find evidence. 1097 00:52:52,563 --> 00:52:55,000 That means I'm going to tend to say, oh, it's h1. 1098 00:52:55,000 --> 00:52:58,010 But if you say you need a lot of data 1099 00:52:58,010 --> 00:53:00,260 before you can actually move away from the status quo, 1100 00:53:00,260 --> 00:53:01,870 that's h0, OK? 1101 00:53:01,870 --> 00:53:03,940 So think of h0 as being status quo, 1102 00:53:03,940 --> 00:53:08,480 h1 being some discovery that goes against the status quo. 1103 00:53:08,480 --> 00:53:08,980 All right? 1104 00:53:08,980 --> 00:53:12,310 So if we believe that the true theta is 1105 00:53:12,310 --> 00:53:17,330 in one of those, what we say is we want to test h0 against h1. 1106 00:53:17,330 --> 00:53:17,830 OK. 1107 00:53:17,830 --> 00:53:19,790 This is actually wording. 1108 00:53:19,790 --> 00:53:22,360 So remember, because this is how your questions are 1109 00:53:22,360 --> 00:53:23,770 going to be formulated. 1110 00:53:23,770 --> 00:53:26,320 And this is how you want to probably communicate 1111 00:53:26,320 --> 00:53:27,730 as a statistician. 1112 00:53:27,730 --> 00:53:29,347 So you're going to say I have the null 1113 00:53:29,347 --> 00:53:30,430 and I have an alternative. 1114 00:53:30,430 --> 00:53:32,830 I want to test h0 against h1.
1115 00:53:32,830 --> 00:53:34,480 I want to test the null hypothesis 1116 00:53:34,480 --> 00:53:36,220 against the alternative hypothesis, OK? 1117 00:53:39,540 --> 00:53:42,590 Now, the two hypotheses I forgot to say are actually this. 1118 00:53:42,590 --> 00:53:46,710 h0 is that theta belongs to theta 0. 1119 00:53:46,710 --> 00:53:50,911 And h1 is that theta belongs to theta 1. 1120 00:53:50,911 --> 00:53:51,410 OK. 1121 00:53:51,410 --> 00:53:53,280 So here, for example, theta was mu. 1122 00:53:53,280 --> 00:53:57,320 And that was mu equal to 103.5. 1123 00:53:57,320 --> 00:54:01,530 And this was mu less than 103.5. 1124 00:54:01,530 --> 00:54:02,480 OK? 1125 00:54:02,480 --> 00:54:06,360 So typically, they're not going to look like thetas and things 1126 00:54:06,360 --> 00:54:06,860 like that. 1127 00:54:06,860 --> 00:54:09,140 They're going to look like very simple things, where you take 1128 00:54:09,140 --> 00:54:11,180 your usual notation for your usual parameter 1129 00:54:11,180 --> 00:54:15,060 and you just say in mathematical terms what relationship this 1130 00:54:15,060 --> 00:54:16,910 should be satisfying, right? 1131 00:54:16,910 --> 00:54:18,320 For example, in the drug example, 1132 00:54:18,320 --> 00:54:25,150 that would be mu drug is equal to mu control. 1133 00:54:25,150 --> 00:54:30,380 And here, that would be mu drug less than mu control. 1134 00:54:30,380 --> 00:54:34,150 The number of expectorations for people 1135 00:54:34,150 --> 00:54:35,800 who take the drug for the cough syrup 1136 00:54:35,800 --> 00:54:38,300 is less than the number of expectorations of people 1137 00:54:38,300 --> 00:54:42,990 who take the corn syrup, OK? 1138 00:54:42,990 --> 00:54:45,090 So now, what do we want to do? 1139 00:54:47,890 --> 00:54:51,060 We've set up our hypothesis testing problem. 1140 00:54:51,060 --> 00:54:52,260 You're a scientist. 1141 00:54:52,260 --> 00:54:55,020 You've set up your problem.
1142 00:54:55,020 --> 00:54:58,170 Now what you're going to do is collect data. 1143 00:54:58,170 --> 00:55:00,660 And what you're going to try to find in this data 1144 00:55:00,660 --> 00:55:04,050 is evidence against h0. 1145 00:55:04,050 --> 00:55:06,060 And the alternative is going to guide you 1146 00:55:06,060 --> 00:55:08,010 into which direction you should be looking 1147 00:55:08,010 --> 00:55:10,200 for evidence against this guy. 1148 00:55:10,200 --> 00:55:11,290 All right? 1149 00:55:11,290 --> 00:55:13,820 And so, of course, the narrower the alternative, 1150 00:55:13,820 --> 00:55:15,570 the easier it is for you, because you just 1151 00:55:15,570 --> 00:55:19,410 have to look at the one possible candidate, right? 1152 00:55:19,410 --> 00:55:22,650 But typically, h1 is a big group, like less than. 1153 00:55:22,650 --> 00:55:27,990 Nobody tells you it's either 103.5 or 103. 1154 00:55:27,990 --> 00:55:32,460 People tell you it's either 103.5 or less than 103.5. 1155 00:55:32,460 --> 00:55:33,280 OK. 1156 00:55:33,280 --> 00:55:37,330 And so, what we want to do is to decide whether we reject h0. 1157 00:55:37,330 --> 00:55:40,480 So we look for evidence against h0 in the data, OK? 1158 00:55:44,320 --> 00:55:48,880 So as I said, h0 and h1 do not play a symmetric role. 1159 00:55:48,880 --> 00:55:51,700 It's very important to know which one you're 1160 00:55:51,700 --> 00:55:53,465 going to place as h0 and which one you're 1161 00:55:53,465 --> 00:55:54,340 going to place as h1. 1162 00:55:59,260 --> 00:56:01,480 If it's a close call, you're always 1163 00:56:01,480 --> 00:56:04,000 going to side with h0, OK? 1164 00:56:04,000 --> 00:56:05,860 So you have to be careful about those. 1165 00:56:05,860 --> 00:56:07,609 You have to keep in mind that if it's 1166 00:56:07,609 --> 00:56:10,410 a close call, if data does not carry a lot of evidence, 1167 00:56:10,410 --> 00:56:12,280 you're going to side with h0.
1168 00:56:12,280 --> 00:56:15,700 And so, you're actually never saying that h0 is true. 1169 00:56:15,700 --> 00:56:18,430 You're just saying I did not find evidence against h0. 1170 00:56:18,430 --> 00:56:21,400 You don't say I accept h0. 1171 00:56:21,400 --> 00:56:25,590 You say I failed to reject h0. 1172 00:56:25,590 --> 00:56:26,090 OK. 1173 00:56:26,090 --> 00:56:28,189 And so one of the things that you 1174 00:56:28,189 --> 00:56:29,980 want to keep in mind when you're doing this 1175 00:56:29,980 --> 00:56:32,960 is this innocent until proven guilty. 1176 00:56:32,960 --> 00:56:37,010 So if you come from a country, like America, 1177 00:56:37,010 --> 00:56:38,090 there's such a thing. 1178 00:56:38,090 --> 00:56:41,270 And in particular, lack of evidence 1179 00:56:41,270 --> 00:56:45,410 does not mean that you are not guilty, all right? 1180 00:56:45,410 --> 00:56:47,720 OJ Simpson was found not guilty. 1181 00:56:47,720 --> 00:56:50,270 He was not found innocent, OK? 1182 00:56:50,270 --> 00:56:52,100 And so, what basically happens 1183 00:56:52,100 --> 00:56:55,760 is that the prosecutor brings their evidence. 1184 00:56:55,760 --> 00:56:58,940 And then the jury has to decide whether they 1185 00:56:58,940 --> 00:57:07,400 were convinced that this person was guilty of anything. 1186 00:57:07,400 --> 00:57:11,820 And the question is, do you have enough evidence? 1187 00:57:11,820 --> 00:57:13,550 But if you don't have evidence, it's 1188 00:57:13,550 --> 00:57:17,257 not the burden of the defendant to prove that they're innocent. 1189 00:57:17,257 --> 00:57:18,590 Nobody's proving they're innocent. 1190 00:57:18,590 --> 00:57:20,120 I mean, sometimes it helps. 1191 00:57:20,120 --> 00:57:22,160 But you just have to make sure that there's not 1192 00:57:22,160 --> 00:57:24,560 enough evidence against you, OK? 1193 00:57:24,560 --> 00:57:26,370 And that's basically what it's doing. 1194 00:57:26,370 --> 00:57:28,980 You're h0 until proven h1.
1195 00:57:31,529 --> 00:57:32,820 So how are we going to do this? 1196 00:57:32,820 --> 00:57:37,200 Well, as I said, the role of estimators 1197 00:57:37,200 --> 00:57:40,460 in hypothesis testing is played by something called tests. 1198 00:57:40,460 --> 00:57:42,300 And a test is a statistic. 1199 00:57:42,300 --> 00:57:44,340 Can somebody remind me what a statistic is? 1200 00:57:47,190 --> 00:57:48,140 Yep? 1201 00:57:48,140 --> 00:57:50,050 STUDENT: The measure [INAUDIBLE] 1202 00:57:50,050 --> 00:57:53,290 PROFESSOR: Yeah, that's actually just one step more. 1203 00:57:53,290 --> 00:57:54,940 So it's a function of the observations. 1204 00:57:54,940 --> 00:57:56,734 And we require it to be measurable. 1205 00:57:56,734 --> 00:57:58,150 And as a rule of thumb, measurable 1206 00:57:58,150 --> 00:58:00,567 means if I give you data, you can actually compute it, OK? 1207 00:58:00,567 --> 00:58:02,649 If you don't see a [INAUDIBLE] or an [INAUDIBLE], 1208 00:58:02,649 --> 00:58:04,140 you don't have to think about it. 1209 00:58:04,140 --> 00:58:04,780 All right. 1210 00:58:08,950 --> 00:58:11,590 And so, what we do is we just have this test. 1211 00:58:11,590 --> 00:58:14,590 But now I'm actually asking only from this test 1212 00:58:14,590 --> 00:58:18,280 a yes/no answer, which I can code as 0, 1, right? 1213 00:58:18,280 --> 00:58:21,640 So as a rule of thumb, you say that, well, if the test 1214 00:58:21,640 --> 00:58:23,140 is equal to 0, then h0. 1215 00:58:23,140 --> 00:58:25,490 If the test is equal to 1, then h1. 1216 00:58:25,490 --> 00:58:27,604 And as we said, if the test is equal to 0, 1217 00:58:27,604 --> 00:58:29,020 it doesn't mean that h0 is true. 1218 00:58:29,020 --> 00:58:31,060 It means that I failed to reject h0. 1219 00:58:31,060 --> 00:58:33,390 And if the test is equal to 1, I reject h0. 1220 00:58:36,510 --> 00:58:38,750 So I have two possibilities. 1221 00:58:38,750 --> 00:58:39,640 I look at my data.
1222 00:58:39,640 --> 00:58:41,800 I turn it into a yes/no answer. 1223 00:58:41,800 --> 00:58:45,310 And the yes/no answer is really h0 or h1, OK? 1224 00:58:45,310 --> 00:58:49,260 Which one is the most likely, basically. 1225 00:58:49,260 --> 00:58:50,720 All right. 1226 00:58:50,720 --> 00:58:57,530 So in the coin flip example, our test statistic 1227 00:58:57,530 --> 00:59:00,850 is actually something that takes values 0, 1. 1228 00:59:00,850 --> 00:59:04,600 And anything, any function that takes values 0, 1229 00:59:04,600 --> 00:59:07,030 1 is an indicator function, OK? 1230 00:59:07,030 --> 00:59:11,507 So an indicator function is just a function. 1231 00:59:11,507 --> 00:59:13,090 So there's many ways you can write it. 1232 00:59:18,760 --> 00:59:20,440 So it's a 1 with a double bar. 1233 00:59:20,440 --> 00:59:21,940 If you aren't comfortable with this, 1234 00:59:21,940 --> 00:59:27,650 it's totally OK to write i of something, like i of a. 1235 00:59:27,650 --> 00:59:28,534 OK. 1236 00:59:28,534 --> 00:59:29,200 And that's what? 1237 00:59:29,200 --> 00:59:34,270 So a, here, is a statement, like an inequality, an equality, 1238 00:59:34,270 --> 00:59:38,600 some mathematical statement, OK? 1239 00:59:38,600 --> 00:59:39,700 Or not mathematical. 1240 00:59:39,700 --> 00:59:43,430 I mean, "a" can be, you know, my grandma is 20 years old, OK? 1241 00:59:43,430 --> 00:59:50,390 And so, this is basically 1 if a is true, and 0 if a is false. 1242 00:59:54,510 --> 00:59:56,260 That's the way you want to think about it. 1243 01:00:02,840 --> 01:00:05,420 This function takes only two values, and that's it. 1244 01:00:10,290 --> 01:00:12,080 So here's the example that we had.
1245 01:00:12,080 --> 01:00:17,330 We looked at whether the standardized xn 1246 01:00:17,330 --> 01:00:20,220 bar, the one that actually is approximately N(0, 1), 1247 01:00:20,220 --> 01:00:22,590 was larger than something in absolute value, 1248 01:00:22,590 --> 01:00:27,550 either very large or very small, but negative. 1249 01:00:27,550 --> 01:00:29,020 I'm going back to this picture. 1250 01:00:29,020 --> 01:00:31,270 We wanted to know if this guy was 1251 01:00:31,270 --> 01:00:35,660 either to the left of something or to the right of something, 1252 01:00:35,660 --> 01:00:36,160 right? 1253 01:00:36,160 --> 01:00:37,642 Was it in these regions? 1254 01:00:42,352 --> 01:00:49,250 Now this indicator, I can view this as a function of x bar. 1255 01:00:49,250 --> 01:00:52,100 What it does, it really splits the possible values 1256 01:00:52,100 --> 01:00:54,500 of x bar, which is just a real number, 1257 01:00:54,500 --> 01:00:56,180 into two groups. 1258 01:00:56,180 --> 01:00:59,030 The group on which they lead to a value which is 1. 1259 01:00:59,030 --> 01:01:00,590 And the group on which they lead 1260 01:01:00,590 --> 01:01:02,270 to a value which is 0, right? 1261 01:01:02,270 --> 01:01:05,420 So what it does is that I can actually think 1262 01:01:05,420 --> 01:01:09,140 of it as the real line, x bar. 1263 01:01:09,140 --> 01:01:13,010 And there's basically some values here, 1264 01:01:13,010 --> 01:01:14,927 where I'm going to get a 1. 1265 01:01:14,927 --> 01:01:16,260 Maybe I'm going to get a 0 here. 1266 01:01:16,260 --> 01:01:17,670 Maybe I'm going to get a 0. 1267 01:01:17,670 --> 01:01:18,840 Maybe I'm going to get a 1. 1268 01:01:18,840 --> 01:01:22,470 I'm just splitting all possible values of x bar. 1269 01:01:22,470 --> 01:01:25,350 And I see whether it spits out the side which is 0 1270 01:01:25,350 --> 01:01:26,880 or which is 1. 1271 01:01:26,880 --> 01:01:29,471 In this case, it's not clear, right?
1272 01:01:29,471 --> 01:01:31,095 I mean, the function is very nonlinear. 1273 01:01:31,095 --> 01:01:34,530 It's x bar minus 0.5 divided by the square root of x bar times 1 1274 01:01:34,530 --> 01:01:35,532 minus x bar. 1275 01:01:35,532 --> 01:01:36,990 If we put the p in the denominator, 1276 01:01:36,990 --> 01:01:38,070 that would be clear. 1277 01:01:38,070 --> 01:01:40,530 That would just be exactly something that looks like this. 1278 01:01:45,427 --> 01:01:46,760 The function would be like this. 1279 01:01:46,760 --> 01:01:49,370 It would be 1 if it's smaller than some value. 1280 01:01:49,370 --> 01:01:52,580 Then 0 if it's in between two values. 1281 01:01:52,580 --> 01:01:54,220 And then 1 again. 1282 01:01:54,220 --> 01:02:00,730 So that's psi, OK? 1283 01:02:00,730 --> 01:02:02,470 So this is 1, right? 1284 01:02:02,470 --> 01:02:03,130 This is 1. 1285 01:02:03,130 --> 01:02:04,540 And this is 0. 1286 01:02:04,540 --> 01:02:07,390 So if x bar is too small or if x bar is too large, 1287 01:02:07,390 --> 01:02:09,590 then I'm getting a value 1. 1288 01:02:09,590 --> 01:02:12,640 But if it's somewhere in between, I'm getting a value 0. 1289 01:02:12,640 --> 01:02:14,410 Now, if I have this weird function, 1290 01:02:14,410 --> 01:02:18,090 it's not clear how this happened. 1291 01:02:18,090 --> 01:02:20,434 So the picture here that I get is 1292 01:02:20,434 --> 01:02:27,080 that I have a weird non-linear function, right? 1293 01:02:27,080 --> 01:02:28,420 So that's x bar. 1294 01:02:28,420 --> 01:02:32,270 That's square root of n times x bar n minus 0.5, 1295 01:02:32,270 --> 01:02:36,980 divided by the square root of x bar n times 1 minus x bar n, right? 1296 01:02:36,980 --> 01:02:38,210 That's this function. 1297 01:02:38,210 --> 01:02:40,890 A priori, I have no idea what this function looks like. 1298 01:02:43,650 --> 01:02:45,320 We can probably analyze this function, 1299 01:02:45,320 --> 01:02:46,740 but let's pretend we don't know.
1300 01:02:46,740 --> 01:02:49,680 So it's like some crazy stuff like this. 1301 01:02:49,680 --> 01:02:56,320 And all I'm asking is whether in absolute value 1302 01:02:56,320 --> 01:02:59,510 it's larger than c, which means, is this function larger 1303 01:02:59,510 --> 01:03:01,383 than c or less than minus c? 1304 01:03:05,020 --> 01:03:07,100 The intervals on which I'm going to say 1 1305 01:03:07,100 --> 01:03:17,821 are this guy, this guy, this guy, and this guy. 1306 01:03:17,821 --> 01:03:18,320 OK. 1307 01:03:18,320 --> 01:03:20,897 And everywhere else, I'm saying 0. 1308 01:03:20,897 --> 01:03:21,980 Everybody agree with this? 1309 01:03:21,980 --> 01:03:24,930 This is what I'm doing. 1310 01:03:24,930 --> 01:03:27,370 Now of course, it's probably easier for you 1311 01:03:27,370 --> 01:03:29,350 to just package it into this nice thing that's 1312 01:03:29,350 --> 01:03:31,602 just either larger than c in absolute value, 1313 01:03:31,602 --> 01:03:33,810 or less than c. Otherwise, I'd have to plot this function. 1314 01:03:33,810 --> 01:03:36,950 In practice, you don't have to. 1315 01:03:36,950 --> 01:03:40,050 Now, here is what I am actually claiming. 1316 01:03:40,050 --> 01:03:42,200 So here, I actually defined for you a test. 1317 01:03:42,200 --> 01:03:44,500 And I started this lecture by saying, 1318 01:03:44,500 --> 01:03:46,250 oh, now we're going to do something better 1319 01:03:46,250 --> 01:03:47,880 than computing averages. 1320 01:03:47,880 --> 01:03:50,450 Now I'm telling you it's just computing an average. 1321 01:03:50,450 --> 01:03:52,910 And the thing is the test is not just 1322 01:03:52,910 --> 01:03:54,830 the specification of this x bar. 1323 01:03:54,830 --> 01:03:57,901 It's also the specification of this constant c. 1324 01:03:57,901 --> 01:03:58,400 All right?
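Putting the statistic and the constant c together, the whole test can be sketched as a single indicator. This is my own hedged Python sketch, not from the lecture; c = 1.96 corresponds to the 5% level discussed earlier:

```python
from math import sqrt

def psi(xbar, n, c=1.96, p0=0.5):
    """The test: returns 1 (reject H0: p = p0) iff |T_n| > c, where
    T_n = sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar))."""
    t_n = sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar))
    return 1 if abs(t_n) > c else 0

print(psi(13 / 30, 30))   # 0: fail to reject, 13 heads in 30 is plausible
print(psi(0.7, 100))      # 1: reject, 70 heads in 100 is too far from fair
```

Note that the test is the pair: the statistic computed from the average, and the threshold c that encodes what "unusually large for a Gaussian" means.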
1325 01:03:58,400 --> 01:04:02,270 And the constant c was exactly where 1326 01:04:02,270 --> 01:04:07,360 our belief about what a large value for a Gaussian is. 1327 01:04:07,360 --> 01:04:09,200 That's exactly where it came in. 1328 01:04:09,200 --> 01:04:12,500 So this choice of c is basically a threshold 1329 01:04:12,500 --> 01:04:15,320 at which we decide above this threshold this isn't 1330 01:04:15,320 --> 01:04:17,104 likely to come from a Gaussian. 1331 01:04:17,104 --> 01:04:18,770 Below this threshold we decide that it's 1332 01:04:18,770 --> 01:04:20,660 likely to come from a Gaussian. 1333 01:04:20,660 --> 01:04:24,200 So we have to choose what this threshold is based 1334 01:04:24,200 --> 01:04:26,750 on what we think likely means. 1335 01:04:34,420 --> 01:04:37,490 Just a little bit more of those things. 1336 01:04:37,490 --> 01:04:39,520 So now we're going to have to characterize 1337 01:04:39,520 --> 01:04:43,030 what makes a good test, right? 1338 01:04:43,030 --> 01:04:44,770 Well, I'll come back to it in a second. 1339 01:04:44,770 --> 01:04:48,755 But you could have a test that says reject all the time. 1340 01:04:48,755 --> 01:04:50,380 And that's going to be bad test, right? 1341 01:04:50,380 --> 01:04:52,120 The FDA is not implementing a test 1342 01:04:52,120 --> 01:04:56,440 that says, yes all drugs work, now let's just go to Aruba, OK? 1343 01:04:56,440 --> 01:04:59,650 So people are trying to have something that 1344 01:04:59,650 --> 01:05:01,560 tries to work all the time. 1345 01:05:01,560 --> 01:05:03,700 Now FDA's not either saying, let's just 1346 01:05:03,700 --> 01:05:07,340 say that no drugs work, and let's go to Aruba, all right? 1347 01:05:07,340 --> 01:05:09,880 They're just trying to say the right thing 1348 01:05:09,880 --> 01:05:11,180 as often as possible. 1349 01:05:11,180 --> 01:05:13,300 And so, we're going to have to measure this. 
1350 01:05:13,300 --> 01:05:15,790 So the things that are associated to a test 1351 01:05:15,790 --> 01:05:17,440 are the rejection region. 1352 01:05:17,440 --> 01:05:21,550 And if you look at this x in E to the n, such that psi 1353 01:05:21,550 --> 01:05:25,420 of x is equal to 1, this is exactly this guy that I drew. 1354 01:05:29,024 --> 01:05:30,940 So here, I summarized the values of the sample 1355 01:05:30,940 --> 01:05:32,460 into their average. 1356 01:05:32,460 --> 01:05:35,080 But the values of the sample that I collect 1357 01:05:35,080 --> 01:05:38,691 will lead to a test that says 1. 1358 01:05:38,691 --> 01:05:39,190 All right? 1359 01:05:39,190 --> 01:05:40,610 So this is the rejection region. 1360 01:05:40,610 --> 01:05:43,990 If I collect a data point, technically I have-- 1361 01:05:43,990 --> 01:05:51,250 so I have E to the n, which is a big space like this. 1362 01:05:51,250 --> 01:05:52,540 So that's E to the n. 1363 01:05:52,540 --> 01:05:55,150 Think of it as being the space of the Xn's. 1364 01:05:55,150 --> 01:05:59,200 And I have a function that takes only value 0, 1. 1365 01:05:59,200 --> 01:06:01,962 So I can decompose it into this part 1366 01:06:01,962 --> 01:06:04,420 where it takes value 0 and the part where it takes value 1. 1367 01:06:04,420 --> 01:06:06,740 And those can be super complicated, right? 1368 01:06:06,740 --> 01:06:07,880 It can have a thing like this. 1369 01:06:07,880 --> 01:06:11,990 It can have some weird little islands where it takes value 1. 1370 01:06:11,990 --> 01:06:14,090 I can have some islands where it takes value 0. 1371 01:06:14,090 --> 01:06:16,237 I can have some weird stuff going on. 1372 01:06:16,237 --> 01:06:18,070 But I can always partition it into the part 1373 01:06:18,070 --> 01:06:20,570 where it takes value 0 and the part where it takes value 1. 
1374 01:06:20,570 --> 01:06:25,530 And the part where it takes value 1, if psi is equal to 1, 1375 01:06:25,530 --> 01:06:32,630 this is called the rejection region of the test, OK? 1376 01:06:32,630 --> 01:06:37,810 So just the samples that would lead me to rejecting. 1377 01:06:37,810 --> 01:06:42,460 And notice that this is the indicator of the rejection 1378 01:06:42,460 --> 01:06:44,010 region. 1379 01:06:44,010 --> 01:06:48,190 The test is the indicator of the rejection region. 1380 01:06:48,190 --> 01:06:52,530 So there's two ways you can make an error when there's a test. 1381 01:06:52,530 --> 01:06:56,460 Either the truth is in h0, and you're saying it's h1. 1382 01:06:56,460 --> 01:06:59,310 Or the truth is in h1, and you say it's h0. 1383 01:06:59,310 --> 01:07:04,710 And that's how we build in the asymmetry between h0 and h1. 1384 01:07:04,710 --> 01:07:06,480 We control only one of the two errors. 1385 01:07:06,480 --> 01:07:09,510 And we hope for the best for the second one. 1386 01:07:09,510 --> 01:07:13,370 So the type 1 error is the one that says, well, 1387 01:07:13,370 --> 01:07:16,640 if it is actually the status quo, but I claim 1388 01:07:16,640 --> 01:07:19,220 that there is a discovery-- if it's actually h0, 1389 01:07:19,220 --> 01:07:21,700 but I claim that I'm in h1, then I 1390 01:07:21,700 --> 01:07:25,560 commit a type 1 error. 1391 01:07:25,560 --> 01:07:27,870 And so the probability of type 1 error 1392 01:07:27,870 --> 01:07:29,730 is this function alpha of psi, which 1393 01:07:29,730 --> 01:07:34,920 is the probability of saying that psi is equal to 1 1394 01:07:34,920 --> 01:07:36,809 when theta is in h0. 1395 01:07:36,809 --> 01:07:38,850 Now, the problem is that this is not just a number, 1396 01:07:38,850 --> 01:07:41,790 because theta is just like moving all over h0, right? 1397 01:07:41,790 --> 01:07:45,980 There's many values that theta can be, right? 1398 01:07:45,980 --> 01:07:48,500 So theta is somewhere here. 
1399 01:07:52,854 --> 01:07:53,520 I erased it, OK. 1400 01:07:59,120 --> 01:08:00,335 All right. 1401 01:08:00,335 --> 01:08:02,210 For simplicity, we're going to think of theta 1402 01:08:02,210 --> 01:08:07,980 as being mu, and theta 0 as 103.5, OK? 1403 01:08:07,980 --> 01:08:11,720 And so, I know that this is theta 1. 1404 01:08:11,720 --> 01:08:18,990 And just this point here was theta 0, OK? 1405 01:08:18,990 --> 01:08:19,490 Agreed? 1406 01:08:19,490 --> 01:08:22,800 This is with the Cherry Blossom Run. 1407 01:08:22,800 --> 01:08:25,699 Now, here in this case, it's actually easy. 1408 01:08:25,699 --> 01:08:27,240 I need to compute this function alpha 1409 01:08:27,240 --> 01:08:37,144 of psi, which maps theta in theta 0 to p theta of psi 1410 01:08:37,144 --> 01:08:37,689 equals 1. 1411 01:08:37,689 --> 01:08:40,682 So that's the probability that I reject when theta is in h0. 1412 01:08:40,682 --> 01:08:42,390 Then there's only one of them to compute, 1413 01:08:42,390 --> 01:08:44,220 because theta can only take this one value. 1414 01:08:44,220 --> 01:08:46,840 So this is really 103.5. 1415 01:08:46,840 --> 01:08:47,340 OK. 1416 01:08:47,340 --> 01:08:48,964 So that's the probability that I reject 1417 01:08:48,964 --> 01:08:52,649 when the true mean was 103.5. 1418 01:08:52,649 --> 01:08:54,870 Now, if I was testing whether-- 1419 01:08:54,870 --> 01:08:57,660 if h0 was this entire guy here, all the 1420 01:08:57,660 --> 01:09:00,090 values larger than 103.5, then I would 1421 01:09:00,090 --> 01:09:03,450 have to compute this function for all possible values 1422 01:09:03,450 --> 01:09:06,220 of theta in there. 1423 01:09:06,220 --> 01:09:07,220 And guess what? 1424 01:09:07,220 --> 01:09:09,414 The worst case is when it's going to be here. 1425 01:09:09,414 --> 01:09:11,080 Because it's so close to the alternative 1426 01:09:11,080 --> 01:09:15,149 that that's where I'm making the most error possible. 
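When theta 0 is the single point 103.5, alpha of psi is just one number, and if we take the Gaussian approximation from the central limit theorem at face value, it has a closed form: under theta 0 the standardized average is approximately standard Gaussian, so the probability of rejecting is P(|Z| > c). A minimal sketch of that computation; the function names are made up for this illustration:

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def alpha(c):
    """P(psi = 1) when theta = 103.5: the standardized average is
    approximately N(0, 1) there, so this is P(|Z| > c) = 2 * (1 - Phi(c))."""
    return 2.0 * (1.0 - gaussian_cdf(c))

print(round(alpha(1.96), 3))  # prints 0.05: reject a true H0 about 5% of the time
```

Raising c shrinks this number, which is exactly the "what does likely mean" dial from earlier: a more demanding threshold rejects a true null less often.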
1427 01:09:15,149 --> 01:09:17,550 And then there's the type 2 error, 1428 01:09:17,550 --> 01:09:22,200 which is defined basically in a symmetric way. 1429 01:09:22,200 --> 01:09:25,089 The function that maps theta to the probability. 1430 01:09:25,089 --> 01:09:26,880 So that's the probability of type 2 errors. 1431 01:09:26,880 --> 01:09:30,330 The probability that I fail to reject h0, right? 1432 01:09:30,330 --> 01:09:34,080 If psi is equal to 0, I fail to reject h0. 1433 01:09:34,080 --> 01:09:39,840 But it actually came from h1, OK? 1434 01:09:39,840 --> 01:09:41,540 So in this example, it's clear. 1435 01:09:41,540 --> 01:09:45,629 If I'm here, like if the true mean was 100, 1436 01:09:45,629 --> 01:09:48,170 I'm looking at the probability that, when the true mean is actually 1437 01:09:48,170 --> 01:09:51,569 100, I say it's 103.5-- 1438 01:09:51,569 --> 01:09:53,504 or that it's not less than 103.5. 1439 01:09:53,504 --> 01:09:54,004 Yeah? 1440 01:09:54,004 --> 01:09:56,045 STUDENT: I'm just still confused by the notation. 1441 01:09:56,045 --> 01:09:59,471 When you say that [INAUDIBLE] theta sub 1 arrow r, 1442 01:09:59,471 --> 01:10:02,650 I'm not sure what that notation means. 1443 01:10:02,650 --> 01:10:04,650 PROFESSOR: Well, this just means it's a function 1444 01:10:04,650 --> 01:10:08,060 that maps theta 0 to r. 1445 01:10:08,060 --> 01:10:09,450 You've seen functions, right? 1446 01:10:09,450 --> 01:10:10,220 OK. 1447 01:10:10,220 --> 01:10:14,690 So that's just the way you write. 1448 01:10:14,690 --> 01:10:20,800 So that means that's a function f that goes from, say, r to r, 1449 01:10:20,800 --> 01:10:25,381 and that maps x to x squared. 1450 01:10:25,381 --> 01:10:25,880 OK. 1451 01:10:25,880 --> 01:10:27,921 So here, I'm just saying I don't have to consider 1452 01:10:27,921 --> 01:10:29,030 all possible values. 1453 01:10:29,030 --> 01:10:32,540 I'm only considering the values on theta 0. 
1454 01:10:32,540 --> 01:10:33,620 I put r actually. 1455 01:10:33,620 --> 01:10:36,320 I could restrict myself to the interval 0, 1, 1456 01:10:36,320 --> 01:10:38,160 because those are probabilities. 1457 01:10:38,160 --> 01:10:41,090 So it's just telling me where my function comes from 1458 01:10:41,090 --> 01:10:44,330 and where my function goes to. 1459 01:10:44,330 --> 01:10:47,240 And beta is a function, right? 1460 01:10:47,240 --> 01:10:52,990 So beta psi of theta is just the probability 1461 01:10:52,990 --> 01:10:55,610 that psi is equal to 1. 1462 01:10:55,610 --> 01:10:57,930 And I could define that for all thetas-- 1463 01:10:57,930 --> 01:10:58,430 sorry. 1464 01:10:58,430 --> 01:11:00,600 That psi is equal to 0 in this case. 1465 01:11:00,600 --> 01:11:02,810 And I could define that for all thetas. 1466 01:11:02,810 --> 01:11:05,240 But the only ones that lead to an error 1467 01:11:05,240 --> 01:11:06,812 are the thetas that are in h1. 1468 01:11:06,812 --> 01:11:08,270 I mean, I can define this function. 1469 01:11:08,270 --> 01:11:11,121 It's just not going to correspond to an error, OK? 1470 01:11:13,930 --> 01:11:18,960 And the power of a test is the smallest-- 1471 01:11:18,960 --> 01:11:22,101 so the power is basically 1 minus an error. 1472 01:11:22,101 --> 01:11:23,600 1 minus the probability of an error. 1473 01:11:23,600 --> 01:11:27,324 So it's the probability of making a correct decision, OK? 1474 01:11:27,324 --> 01:11:29,490 So it's the probability of making a correct decision 1475 01:11:29,490 --> 01:11:31,830 under h1, that's what the power is. 1476 01:11:31,830 --> 01:11:34,830 But again, this could be a function. 1477 01:11:34,830 --> 01:11:36,900 Because there's many ways that theta can be in h1 1478 01:11:36,900 --> 01:11:39,150 if h1 is an entire set of numbers. 1479 01:11:39,150 --> 01:11:42,200 For example, all the numbers that are less than 103.5. 
1480 01:11:42,200 --> 01:11:45,510 And so, what I'm doing here when I define the power of a test, 1481 01:11:45,510 --> 01:11:50,057 I'm looking at the smallest possible of those values, OK? 1482 01:11:50,057 --> 01:11:51,390 So I'm looking at this function. 1483 01:11:54,140 --> 01:11:57,028 Maybe I should actually expand a little more on this. 1484 01:12:02,700 --> 01:12:03,200 OK. 1485 01:12:03,200 --> 01:12:10,790 So beta psi of theta is the probability under theta 1486 01:12:10,790 --> 01:12:12,590 that psi is equal to 0, right? 1487 01:12:12,590 --> 01:12:18,710 That's the probability, for theta in theta 1, 1488 01:12:18,710 --> 01:12:21,110 which means under the alternative, that I 1489 01:12:21,110 --> 01:12:21,884 fail to reject. 1490 01:12:21,884 --> 01:12:23,300 And I really should reject, because theta 1491 01:12:23,300 --> 01:12:25,520 was actually in theta 1, OK? 1492 01:12:25,520 --> 01:12:29,150 So this thing here is the probability of type 2 error. 1493 01:12:29,150 --> 01:12:34,560 Now, this is 1 minus the probability that I did reject 1494 01:12:34,560 --> 01:12:36,960 and I should have rejected. 1495 01:12:36,960 --> 01:12:39,830 That's just taking the complement. 1496 01:12:39,830 --> 01:12:42,910 Because if psi is not equal to 0, then it's equal to 1. 1497 01:12:42,910 --> 01:12:44,620 So now if I rearrange this, it tells me 1498 01:12:44,620 --> 01:12:48,260 that the probability that psi is equal to 1-- 1499 01:12:48,260 --> 01:12:50,100 this is actually 1 minus beta psi of theta. 1500 01:12:54,440 --> 01:12:57,110 So that's true for all thetas in theta 1. 1501 01:12:57,110 --> 01:12:58,640 And what I'm saying is, well, this 1502 01:12:58,640 --> 01:13:00,904 is now a good thing, right? 1503 01:13:00,904 --> 01:13:02,570 This number being large is a good thing. 1504 01:13:02,570 --> 01:13:05,330 It means I should have rejected, and I rejected. 1505 01:13:05,330 --> 01:13:07,650 I want this to happen with large probability. 
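For the Gaussian-mean example, 1 minus beta psi of theta also has a closed form under the CLT approximation: when the true mean is mu, the standardized average is roughly N(m, 1) with mean shift m = sqrt(n) (mu - 103.5) / sigma. A sketch of the resulting function; sigma = 3 and n = 100 are assumed example values, not numbers from the lecture:

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_reject(mu, mu0=103.5, sigma=3.0, n=100, c=1.96):
    """1 - beta_psi(mu): probability of (correctly) rejecting when the true
    mean is mu.  The test statistic is approximately N(m, 1)."""
    m = math.sqrt(n) * (mu - mu0) / sigma  # mean shift under mu
    return 1.0 - (gaussian_cdf(c - m) - gaussian_cdf(-c - m))

# The further mu is below 103.5, the easier rejecting is; near the boundary,
# this probability sinks toward the level of the test.  The power is the
# worst case: the smallest of these values over the alternative.
for mu in (100.0, 102.5, 103.4):
    print(round(prob_reject(mu), 3))
```

This makes the "most conservative choice" concrete: the power is the infimum of prob_reject over theta 1, and it is dragged down by alternatives sitting right next to 103.5.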
1506 01:13:07,650 --> 01:13:09,025 And so, what I'm going to look at 1507 01:13:09,025 --> 01:13:11,894 is the most conservative choice of this number, right? 1508 01:13:11,894 --> 01:13:13,310 Rather than being super optimistic 1509 01:13:13,310 --> 01:13:16,970 and say, oh, but indeed if theta was actually equal to zero, 1510 01:13:16,970 --> 01:13:19,300 then I'm always going to conclude that-- 1511 01:13:19,300 --> 01:13:22,730 I mean, if mu is equal to 0, everybody runs in 0 seconds, 1512 01:13:22,730 --> 01:13:25,990 then with high probability I'm actually 1513 01:13:25,990 --> 01:13:27,770 going to make no mistake. 1514 01:13:27,770 --> 01:13:30,920 But really, I should look at the worst possible case, OK? 1515 01:13:30,920 --> 01:13:32,900 So what I'm looking at is basically 1516 01:13:32,900 --> 01:13:45,002 the smallest value it can take on theta 1, 1517 01:13:45,002 --> 01:13:53,490 which is called the power of psi. 1518 01:13:53,490 --> 01:13:55,850 Power of the test psi, OK? 1519 01:13:55,850 --> 01:13:58,930 So that's the smallest possible value it can take. 1520 01:14:01,610 --> 01:14:02,110 All right. 1521 01:14:02,110 --> 01:14:02,651 So I'm sorry. 1522 01:14:02,651 --> 01:14:05,310 This is a lot of definitions that you have to let sink in. 1523 01:14:05,310 --> 01:14:06,870 And it's not super pleasant. 1524 01:14:06,870 --> 01:14:09,180 But that's what testing is. 1525 01:14:09,180 --> 01:14:10,540 There's a lot of jargon. 1526 01:14:10,540 --> 01:14:12,440 Those are actually fairly simple things. 1527 01:14:12,440 --> 01:14:14,460 Just maybe you should get a sheet for yourself. 1528 01:14:14,460 --> 01:14:17,400 And say, these are the new terms that I learned. 1529 01:14:17,400 --> 01:14:19,584 What is a test, a rejection region? 1530 01:14:19,584 --> 01:14:21,000 Probability of type 1 error, probability 1531 01:14:21,000 --> 01:14:22,740 of type 2 error, and power. 1532 01:14:22,740 --> 01:14:23,910 Just make sure you know what those guys are. 
1533 01:14:23,910 --> 01:14:24,410 Oh. 1534 01:14:24,410 --> 01:14:27,162 And null and alternative hypothesis, OK? 1535 01:14:27,162 --> 01:14:28,620 And once you know all these things, 1536 01:14:28,620 --> 01:14:29,953 you know what I'm talking about. 1537 01:14:29,953 --> 01:14:31,320 You know what I'm referring to. 1538 01:14:31,320 --> 01:14:33,000 And this is just jargon. 1539 01:14:33,000 --> 01:14:35,610 But in the end, those are just probabilities. 1540 01:14:35,610 --> 01:14:38,369 I mean, these are natural quantities. 1541 01:14:38,369 --> 01:14:40,160 Just for some reason, people have been used 1542 01:14:40,160 --> 01:14:43,850 to using different terminology. 1543 01:14:43,850 --> 01:14:46,420 So just to illustrate. 1544 01:14:46,420 --> 01:14:48,090 When do I make a type 1 error? 1545 01:14:48,090 --> 01:14:51,880 And when do I not make a type 1 error? 1546 01:14:51,880 --> 01:14:56,500 So I make a type 1 error if h0 is true and I reject h0, right? 1547 01:14:56,500 --> 01:14:59,560 So the off diagonal blocks are when I make an error. 1548 01:14:59,560 --> 01:15:02,920 When I'm on the diagonal terms, h1 is true 1549 01:15:02,920 --> 01:15:05,530 and I reject h0, that's a correct decision. 1550 01:15:05,530 --> 01:15:08,200 When h0 is true and I fail to reject h0, 1551 01:15:08,200 --> 01:15:11,210 that's also the correct decision to make. 1552 01:15:11,210 --> 01:15:17,110 So I only make errors when I'm in one of the red blocks. 1553 01:15:17,110 --> 01:15:20,860 And one block is the type 1 error and the other block 1554 01:15:20,860 --> 01:15:21,730 is the type 2 error. 1555 01:15:21,730 --> 01:15:24,059 That's all it means, OK? 1556 01:15:24,059 --> 01:15:26,100 So you just have to know which one we call type 1. 1557 01:15:32,460 --> 01:15:36,960 I mean, this was chosen in a pretty ad hoc way. 1558 01:15:36,960 --> 01:15:40,440 So to conclude this lecture, let me ask you a few questions. 
1559 01:15:40,440 --> 01:15:46,780 If in a US court, the defendant is found, 1560 01:15:46,780 --> 01:15:49,601 let's just say for the sake of discussion, innocent or guilty. 1561 01:15:49,601 --> 01:15:50,100 All right? 1562 01:15:50,100 --> 01:15:51,516 It's really guilty or not guilty, 1563 01:15:51,516 --> 01:15:53,900 but let's say innocent or guilty. 1564 01:15:53,900 --> 01:15:56,106 When does the jury make a type 1 error? 1565 01:16:03,414 --> 01:16:03,914 Yep? 1566 01:16:07,840 --> 01:16:10,180 And he's guilty? 1567 01:16:10,180 --> 01:16:11,560 And he's innocent, right? 1568 01:16:11,560 --> 01:16:14,560 The status quo, everybody is innocent until proven guilty. 1569 01:16:14,560 --> 01:16:18,500 So our h0 is that the person is innocent. 1570 01:16:18,500 --> 01:16:21,510 And so, that means that h0 is innocent. 1571 01:16:21,510 --> 01:16:23,760 And so, we're looking at the probability of type 1 error, 1572 01:16:23,760 --> 01:16:25,560 so that's when we reject the fact that it's innocent. 1573 01:16:25,560 --> 01:16:27,752 So we conclude that this person is guilty, OK? 1574 01:16:27,752 --> 01:16:29,710 So type 1 error is when this person is innocent 1575 01:16:29,710 --> 01:16:31,090 and we conclude it's guilty. 1576 01:16:31,090 --> 01:16:32,131 What is the type 2 error? 1577 01:16:36,390 --> 01:16:38,160 Letting a guilty person go free, which 1578 01:16:38,160 --> 01:16:40,290 actually according to the constitution, 1579 01:16:40,290 --> 01:16:42,320 is the better of the two. 1580 01:16:42,320 --> 01:16:42,820 All right? 1581 01:16:42,820 --> 01:16:45,361 So what we're going to try to do is to control the first one, 1582 01:16:45,361 --> 01:16:47,870 and hope for the best for the second one. 1583 01:16:47,870 --> 01:16:51,260 How could the jury make sure that they make no type 1 1584 01:16:51,260 --> 01:16:52,500 error ever? 1585 01:16:57,870 --> 01:17:01,406 Always let the guy go free, right? 
1586 01:17:01,406 --> 01:17:03,030 What is the effect on the type 2 error? 1587 01:17:06,600 --> 01:17:08,430 Yeah, it's the worst possible, right? 1588 01:17:08,430 --> 01:17:12,550 I mean, basically, for every guy that's guilty, you let them go. 1589 01:17:12,550 --> 01:17:14,970 That's the worst you can do. 1590 01:17:14,970 --> 01:17:15,930 And same thing, right? 1591 01:17:15,930 --> 01:17:20,470 How can the jury make sure that there's no type 2 error? 1592 01:17:20,470 --> 01:17:21,140 Always convict. 1593 01:17:21,140 --> 01:17:22,890 What is the effect on the American budget? 1594 01:17:22,890 --> 01:17:25,135 What is the effect on the type 1 error? 1595 01:17:28,180 --> 01:17:28,680 Right. 1596 01:17:28,680 --> 01:17:31,710 So the effect is that basically the type 1 error is maximized. 1597 01:17:31,710 --> 01:17:33,540 So there's this trade-off between type 1 1598 01:17:33,540 --> 01:17:35,430 and type 2 error that's inherent. 1599 01:17:35,430 --> 01:17:39,030 And that's why we have this sort of multi objective thing. 1600 01:17:39,030 --> 01:17:41,530 We're trying to minimize two things at the same time. 1601 01:17:41,530 --> 01:17:44,340 And you can find many ad hoc ways, right? 1602 01:17:44,340 --> 01:17:46,730 So if you've taken any optimization, 1603 01:17:46,730 --> 01:17:49,040 trying to optimize two things when one is going up 1604 01:17:49,040 --> 01:17:51,540 while the other one is going down, the only thing you can do 1605 01:17:51,540 --> 01:17:53,370 is make ad hoc heuristics. 1606 01:17:53,370 --> 01:17:55,740 Maybe you try to minimize the sum of those two guys. 1607 01:17:55,740 --> 01:17:59,310 Maybe you try to minimize 1/3 of the first guy 1608 01:17:59,310 --> 01:18:00,940 plus 2/3 of the second guy. 1609 01:18:00,940 --> 01:18:03,390 Maybe you try to minimize the first guy plus the square 1610 01:18:03,390 --> 01:18:04,140 of the second guy. 
1611 01:18:04,140 --> 01:18:05,973 You can think of many ways, but none of them 1612 01:18:05,973 --> 01:18:07,470 is more justified than the other. 1613 01:18:07,470 --> 01:18:10,120 However, for statistical hypothesis testing, 1614 01:18:10,120 --> 01:18:12,790 there's one that's very well justified, which is just 1615 01:18:12,790 --> 01:18:15,990 constrain your type 1 error to be at most 1616 01:18:15,990 --> 01:18:18,460 at a level that you deem acceptable. 1617 01:18:18,460 --> 01:18:18,960 5%. 1618 01:18:24,510 --> 01:18:27,850 I want to convict at most 5% of innocent people. 1619 01:18:27,850 --> 01:18:29,850 That's what I deem reasonable. 1620 01:18:29,850 --> 01:18:33,570 And based on that, I'm going to try to convict as many people 1621 01:18:33,570 --> 01:18:37,170 as I can, all right? 1622 01:18:37,170 --> 01:18:39,780 So that's called the Neyman-Pearson paradigm, 1623 01:18:39,780 --> 01:18:42,240 and we'll talk about it next time. 1624 01:18:42,240 --> 01:18:43,140 All right. 1625 01:18:43,140 --> 01:18:44,990 Thank you.
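The recipe the lecture closes on — fix the type 1 error at an acceptable level first, then take whatever that leaves you on the type 2 side — can be sketched as follows. The 5% level matches the lecture, but mu0 = 103.5, sigma = 3, and n = 100 are assumed example values, and the function names are illustrative:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian

def threshold(level):
    """Calibrate c so that P(|Z| > c) = level: the type 1 error is fixed first."""
    return Z.inv_cdf(1.0 - level / 2.0)

def power_at(mu, level, mu0=103.5, sigma=3.0, n=100):
    """With c chosen for the given level, the probability of rejecting when
    the true mean is mu; 1 minus this is the type 2 error there."""
    c = threshold(level)
    m = (n ** 0.5) * (mu - mu0) / sigma  # mean shift of the statistic under mu
    return 1.0 - (Z.cdf(c - m) - Z.cdf(-c - m))

print(round(threshold(0.05), 2))        # prints 1.96, the familiar Gaussian threshold
print(round(power_at(102.5, 0.05), 2))  # what a 5% level "buys" against mu = 102.5
```

Nothing here minimizes a weighted sum of the two errors; the level is a hard constraint, and the power against each alternative is whatever falls out. That asymmetry is the essence of the paradigm previewed for next time.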