The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PHILIPPE RIGOLLET: It doesn't want to run Flash Player, so I had to run them on Chrome.

All right, so let's move on to our second chapter. And hopefully, in this chapter, you will feel a little better if you felt like it was going a bit fast in the first chapter. The main reason we went fast, especially in terms of confidence intervals: some of you came and asked me, what do you mean by "this is a confidence interval"? What does it mean that it's in there with probability 95%, et cetera? I went really fast because I didn't want to give you a first week doing probability only, without understanding what the statistical context for it was. So hopefully, for all these things that we've done in terms of probability, you actually know why we've been doing them.
And so we're basically going to go back to what we were doing, maybe start with some statistical setup. But the goal of this lecture is really to go back again to what we've seen, from a purely statistical perspective. All right? So the first thing we're going to do is explain why we're doing statistical modeling. So in practice, you have data, you observe a bunch of points, and here I gave you some numbers, for example. So here's a partial data set with the number of siblings, including self, that was collected from college students a few years back. I was teaching a class like yours, and I asked students to go and fill out a Google form and tell me a bunch of things. And one of the questions was: including yourself, how many siblings do you have? And so they gave me this list of numbers. And there are many ways I can think of this list of numbers, right? I could think of it as just a discrete distribution on the set of numbers starting at 1. I know there's not going to be an answer which is less than 1, unless, well, someone doesn't understand the question.
But all the answers I should get are positive integers: 1, 2, 3, et cetera. And there probably is an upper bound, but I don't know it off the top of my head. So maybe I should say 100. Maybe I should say 15. It depends, right? And I think the largest number I got for this was 6. So here you can see you have pretty standard families, you know, lots of 1s, 2s, and 3s. What statistical modeling does is try to compress this information, which I could otherwise describe only in a very naive way. So let's start with the basic, usual statistical setup. I will start, as on many of the boards, with X1, ..., Xn, random variables. And what I'm going to assume, as we said, is that typically those guys are IID. And they have some distribution, so they all share the same distribution. And the fact that they're IID is so that I can actually do statistics. Statistics means looking at some global averaging so that I can get a sense of what the global behavior is for the population, right?
If I start assuming that those things are not identically distributed, that they all live on their own (my sequence of numbers is your number of siblings, the shoe size of this person, the depth of the Charles River, and I start measuring a bunch of stuff), there's nothing I can actually put together. I need to have something that's cohesive. And so here, I collected some data that was cohesive. And so the goal here, the first thing, is to say what the distribution is that I actually have here. So I could be very general. I could just say it's some distribution P. And let's say those are random variables, not random vectors; I could collect entire vectors about students, but let's say those are just random variables. And so now I can start making assumptions on this distribution P. What can I say about a distribution? Well, maybe if those numbers are continuous, for example, I could assume they have a density, a probability density function. That's already an assumption. Maybe I could start to assume that their probability density function is smooth.
That's another assumption. Maybe I could assume that it's piecewise constant. That's even better, right? And those things make my life simpler and simpler, because what I do by making these successive assumptions is reduce the degrees of freedom of the space in which I am actually searching for the distribution. And so what we want is something which is small enough that we can actually have some averaging going on, but also big enough that it has a chance of containing a distribution that makes sense for us. So let's start with the simplest possible example, which is when the Xi's belong to {0, 1}. And as I said, here we don't have a choice. The distribution of those guys has to be Bernoulli. And since they are IID, they all share the same p. So that's definitely the simplest possible thing I could think of. They are just Bernoulli(p). And so all I would have to figure out in this case is p. And this is the simplest case. And unsurprisingly, it has the simplest answer, right?
We will come back to this example when we study maximum likelihood estimators or method of moments estimators. But at the end of the day, what we will do is always the naive estimator you would come up with: the proportion of 1s. And this will be, in pretty much all respects, the best estimator you can think of. All right? So then we're going to try to assess its performance. And we saw how to do that in the first chapter as well. So this problem here, somehow, is completely understood. We'll come back to it, but there's nothing fancy that is going to happen. But now I could have some more complicated things. For example, in the example of the students, my Xi's belong to the sequence of integers 1, 2, 3, et cetera, which is also denoted by N, maybe without the 0 if you don't want to put 0 in there: the positive integers. Or I could actually just put in some prior knowledge about how much time humans have to have families. But maybe some people thought of their college mates as being their brothers and sisters.
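Since that proportion-of-1s estimator keeps coming back, here is a minimal numeric sketch of it (the 0/1 data below is hypothetical, not the class survey): for IID Bernoulli(p) observations, the sample proportion of 1s, which is just the sample mean, estimates p.

```python
# Estimate the Bernoulli parameter p by the proportion of 1s in the
# sample (equivalently, the sample mean of 0/1 observations).
def estimate_p(xs):
    """xs: list of 0/1 observations; returns the proportion of 1s."""
    return sum(xs) / len(xs)

# Hypothetical 0/1 data, e.g. answers to a yes/no question.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
p_hat = estimate_p(sample)
print(p_hat)  # 0.7
```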
And one student would actually put 465 siblings, because we're all good friends. Or maybe they actually think that all their Facebook contacts are their siblings. And so you never know what's going to happen. So maybe you want to account for this, but maybe you know that people are reasonable, and they will actually give you something like this. Now intuitively, maybe you would say, well, why would you bother doing this if you're not really sure about the 20? But I think that probably all of you intuitively guessed that it is a good idea to put in this kind of assumption rather than allowing for any number in the first place, because this eventually gets injected into the precision of our estimators. If I allow anything, it's going to be more complicated for me to get an accurate estimator. If I know that the numbers are either 1 or 2, then I'm actually going to be slightly more accurate as well. Because if, for example, somebody puts a 5, I can remove it, and then it's not going to corrupt my estimator.
All right, so now let's say we agree that we have numbers. And here I put seven numbers, OK? So I just said, well, let's assume that the numbers I'm going to get are going to be 1 all the way to this number that I denote by "larger than or equal to 7", which is a placeholder for any number larger than or equal to 7, OK? Because maybe I don't want to distinguish between people that have 9 or 25 siblings. OK, and so now this is a distribution on seven possible values, a discrete distribution. And you know from your probability class that the way you describe this distribution is using the probability mass function, or PMF. So that's how we describe a discrete distribution. And the PMF is just a list of numbers, right? So as I wrote here, you have a list of numbers. In one row, you write the possible values that your random variable can take, and in the other you write the probability that your random variable takes each value. So the possible values are 1, 2, 3, all the way to larger than or equal to 7. And then I'm trying to estimate those numbers. Right?
If I give you those numbers, at least up to this compression of all the numbers that are larger than or equal to 7, you have the full description of your distribution. And that is the ultimate goal of statistics, right? The ultimate goal of statistics is to say what distribution your data came from, because that's basically the best you're going to be able to do. Now admittedly, if I started looking at the fraction of 1s, and the fraction of 2s, and the fraction of 3s, et cetera, I would eventually get those numbers. Just like looking at the fraction of 1s gave me a good estimate for p in the Bernoulli case, it would do the same in this case, right? It's a pretty intuitive idea. It's just the law of large numbers. Everybody agrees with that? If I look at the proportion of 1s, the proportion of 2s, the proportion of 3s, that should give me something that gets closer and closer to what I want as my sample size increases. But the problem is when my sample size is not huge: here I have seven numbers to estimate.
And if I have 20 observations, the ratio is not really in my favor. With 20 observations to estimate seven parameters, some of them are going to be pretty off, typically the ones for the large values. If you have only 20 students, look at the list of numbers. I don't know how many numbers I have, but it's probably close to 20, maybe 15 or something. And if you look at this list, nobody has four or more siblings, right? There's no such person. So that means that from this data set, my estimates (those numbers I denote by, say, p1, p2, p3, et cetera), the estimate p4 hat, would be equal to what from this data? 0, right? And p5 hat would be equal to 0, and p6 hat would be equal to 0. And the estimate of p-larger-than-or-equal-to-7 would be equal to 0. That would be my estimate from this data set. So maybe this is not ideal. Maybe I want to pull some information from the people who have fewer siblings to try to make a guess which is probably slightly better for the larger values, right?
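To see the issue concretely, here is a minimal sketch with made-up sibling counts (not the actual survey data): estimating each of the seven probabilities by its raw sample proportion assigns exactly 0 to every value that nobody happened to report.

```python
from collections import Counter

def empirical_pmf(data, kmax=7):
    """Estimate P(X = k) for k = 1, ..., kmax by raw sample proportions,
    lumping all values >= kmax into the last bin."""
    capped = [min(x, kmax) for x in data]
    counts = Counter(capped)
    n = len(data)
    return {k: counts.get(k, 0) / n for k in range(1, kmax + 1)}

# Hypothetical class answers: nobody reports 4 or more siblings.
data = [2, 1, 2, 3, 1, 1, 2, 2, 3, 1, 2, 1, 3, 2, 2]
pmf_hat = empirical_pmf(data)
print(pmf_hat[4], pmf_hat[7])  # 0.0 0.0
```

The estimates for the unseen values 4 through "larger than or equal to 7" all come out to exactly 0, even though those proportions are surely positive in the population.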
It's pretty clear that on average there are more than 0: the proportion of the population of households that have four children or more is definitely more than 0, all right? So that means my data set is not representative there, and what I'm going to try to do is find a model that uses the data I have for the smaller values, the ones I can observe, and pushes that information up to the other ones. And so what we can do is reduce those parameters to something that's well understood. And this is part of the modeling that I talked about in the first place. Now, how do you succinctly describe a count of something? Well, one thing that you do is use the Poisson distribution, right? Why Poisson? There are many reasons. Again, that's part of statistical modeling. But once you know that you have a count of something that can be modeled by a Poisson, why not try a Poisson? You could just fit a Poisson. And the Poisson is something that looks like this. And I guess you've all seen it.
If X follows a Poisson distribution with parameter lambda, then the probability that X is equal to little x is lambda to the x, over x factorial, times e to the minus lambda: P(X = x) = (lambda^x / x!) e^(-lambda). OK? And if you did the sheet that I gave you on the first day, you can check those numbers. So this is, of course, for x = 0, 1, et cetera, right? So x ranges over the natural integers. And if you sum this thing from x = 0 to infinity, you get e to the lambda. And so the exponentials cancel, and you have a sum which is equal to 1, which is indeed a PMF. But what's key about this PMF is that it never takes the value 0. This thing is always strictly positive. So whatever value of lambda I find from this data will give me something that's certainly more interesting than just putting the value 0. But more importantly, rather than having to estimate seven parameters and, as a consequence, having to estimate 1, 2, 3, 4 of them as being equal to 0, I have only one parameter to estimate, which is lambda. The problem with doing this is that now lambda may not be something as simple to find as computing the average number. Right?
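A quick numeric check of those two facts, as a sketch using only the standard library: the Poisson PMF is strictly positive at every x, and its values sum to 1.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam**x / x! * exp(-lam), for x = 0, 1, 2, ..."""
    return lam ** x / math.factorial(x) * math.exp(-lam)

lam = 2.0
# Strictly positive everywhere, even far beyond the observed values.
assert all(poisson_pmf(x, lam) > 0 for x in range(50))
# The total mass is 1 (truncating the infinite sum at x = 100).
total = sum(poisson_pmf(x, lam) for x in range(100))
print(round(total, 10))  # 1.0
```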
In this case, it will be. But in many instances, it's actually not clear that, with this parametrization by lambda that I chose, I'm going to be able to estimate lambda just by computing the average number that I get. Here it will be the case. But when it's not, remember the example of the exponential we did in the last lecture: we could use the delta method and things like that to estimate it. All right, so here's modeling 101. The purpose of modeling is to restrict the space of possible distributions to a subspace that's actually plausible, but much simpler for me to estimate. So we went from all distributions on seven values, which is a large space (that's a lot of things), to something which is just one number. And this number is positive. Any questions about the purpose of doing this? OK, so we're going to have to do a little bit of formalism now. This is a statistics classroom, and I'm not going to want to talk about the Poisson model specifically every single time. I'm going to want to talk about generic models.
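In this Poisson case, the average does work: the sample mean is the natural estimate of lambda (and it turns out to coincide with the maximum likelihood and method-of-moments estimators studied later). A minimal sketch with hypothetical counts:

```python
def fit_poisson_lambda(data):
    """Estimate the Poisson parameter lambda by the sample mean."""
    return sum(data) / len(data)

# Hypothetical sibling counts, not the actual survey data.
data = [2, 1, 2, 3, 1, 1, 2, 2, 3, 1]
lam_hat = fit_poisson_lambda(data)
print(lam_hat)  # 1.8
```

One number summarizes the whole fitted distribution: plugging lam_hat into the PMF gives a positive estimated probability for every count, including the values never observed.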
And then you're going to be able to plug in your favorite word (Poisson, binomial, exponential, uniform, all these words that you've seen), you're going to be able to plug it in there. But we're just going to have some generic notation and some generic terminology for a statistical model. All right? So here is the formal definition. I'm going to go through it with you. OK, so the definition is that of a statistical model. Sorry, that's a statistical experiment, I should say. So a statistical experiment is actually just a pair: E, which is a set, and a family of distributions P theta, where theta ranges in some set capital Theta. OK? So I hope you're up to date with your Greek letters: the small theta goes with the capital Theta. And I don't have the best handwriting, so if you don't see something, just ask me. And so now each of these guys is a probability distribution. All right?
So for example, this could be a Poisson with parameter theta, or a Bernoulli with parameter theta, OK, or an exponential with parameter, I don't know, 1 over theta squared if you want. OK, but they're just indexed by theta. And for each theta, this completely describes the distribution. It could be more complicated. This theta could be a pair, a (mu, sigma squared). And that could give you some normal N(mu, sigma squared). OK, so anything where, rather than giving you a full distribution, I can compress it into a parameter. But it could be worse. It could be this guy here, right? Theta could be (p1, ..., p-larger-than-or-equal-to-7). And my distribution could just be something that has PMF (p1, ..., p-larger-than-or-equal-to-7). That's another parameter. This one is seven-dimensional. This one is two-dimensional. And all these other guys are just one-dimensional. All of these are parameters. Is that clear?
What's important here is that once they give you theta, you know exactly all the probabilities associated with this random variable. You know its distribution perfectly. So this is the definition. Is that clear? Is there a question about this distribution... about this definition, sorry? All right. So really, the key thing is the statistical model associated to a statistical experiment. OK? So let's just see some examples. That's probably better because, again, the formalism on its own is never really clear. Actually, that's the next slide. OK, so there are two things we need to assume. The purpose of a statistical model is that once I estimate the parameter, I know exactly what distribution the data has, OK? So it means that I could potentially have several parameters that give me the same distribution, and that would still be fine, because I could estimate one guy, or I could estimate the other guy, and I would still recover the underlying distribution of my data.
The problem is that this creates really annoying theoretical problems: things don't work, the algorithms won't work, the guarantees won't work. And so what we typically assume is that the model is so-called well-specified. Sorry, that's not well-specified; I'm jumping ahead of myself. OK, well-specified means that the distribution of your data is actually one of those guys. OK? So, some vocabulary: well-specified means that, for my observations X, there exists a theta in capital Theta such that X follows P sub theta. I should put a double bar there. OK, so that's what well-specified means. It means that the distribution of your actual data is just one of those guys. This is a bit strong of an assumption. It's strong in the sense that... I don't know if you've heard this saying. I can tell you who it's attributed to, but that probably means that this person did not come up with it. The saying is that all models are wrong, but some of them are useful.
441 00:20:37,890 --> 00:20:40,690 All right, so all models are wrong 442 00:20:40,690 --> 00:20:44,470 means that maybe it's not true that this Poisson distribution 443 00:20:44,470 --> 00:20:47,965 that I assume for the number of siblings for college students-- 444 00:20:47,965 --> 00:20:50,350 maybe that's not perfectly correct. 445 00:20:50,350 --> 00:20:53,210 Maybe there's a spike at three, right? 446 00:20:53,210 --> 00:20:55,900 Maybe there's a spike at one, because you know, 447 00:20:55,900 --> 00:20:58,180 maybe those are slightly more educated families. 448 00:20:58,180 --> 00:20:59,260 They have fewer children. 449 00:20:59,260 --> 00:21:02,260 Maybe this is actually not exactly perfect. 450 00:21:02,260 --> 00:21:04,496 But it's probably good enough for our purposes. 451 00:21:04,496 --> 00:21:05,870 And when we make this assumption, 452 00:21:05,870 --> 00:21:07,750 we're actually assuming that the data really 453 00:21:07,750 --> 00:21:09,756 comes from a Poisson model. 454 00:21:09,756 --> 00:21:11,380 There is a lot of research that goes on 455 00:21:11,380 --> 00:21:14,380 about misspecified models and that tells you 456 00:21:14,380 --> 00:21:16,686 how well you're doing with the model that's 457 00:21:16,686 --> 00:21:18,310 closest to the actual distribution. 458 00:21:18,310 --> 00:21:19,630 So that's pretty much it. 459 00:21:19,630 --> 00:21:21,049 Yeah? 460 00:21:21,049 --> 00:21:22,470 AUDIENCE: [INAUDIBLE]. 461 00:21:24,130 --> 00:21:25,970 PHILIPPE RIGOLLET: So my data-- 462 00:21:25,970 --> 00:21:29,620 so it's always the way I denote one 463 00:21:29,620 --> 00:21:31,480 of the generic observations, right? 464 00:21:31,480 --> 00:21:36,100 So my observations are x1 through xn. 465 00:21:36,100 --> 00:21:39,670 And they're IID with distribution p-- 466 00:21:39,670 --> 00:21:40,990 always. 467 00:21:40,990 --> 00:21:42,610 So x is just one of those guys. 468 00:21:42,610 --> 00:21:46,720 I don't want to write x5 or x4.
469 00:21:46,720 --> 00:21:47,490 They're IID. 470 00:21:47,490 --> 00:21:49,840 So they all have the same distribution. 471 00:21:49,840 --> 00:21:54,780 So OK-- no, no, no. 472 00:21:54,780 --> 00:21:55,490 They're all IID. 473 00:21:55,490 --> 00:21:57,490 So they all have the same p theta. 474 00:21:57,490 --> 00:21:59,150 They'll have the same p, which means 475 00:21:59,150 --> 00:22:00,840 they'll have the same p theta. 476 00:22:00,840 --> 00:22:02,250 So I can pick any one of them. 477 00:22:02,250 --> 00:22:05,470 So I'd just remove the index just so we're clear. 478 00:22:05,470 --> 00:22:06,940 OK? 479 00:22:06,940 --> 00:22:09,580 So when I write x, I just mean think of x1. 480 00:22:09,580 --> 00:22:10,540 Right, they're IID. 481 00:22:10,540 --> 00:22:12,607 I can pick whichever I want. 482 00:22:12,607 --> 00:22:13,690 I'm not going to write x1. 483 00:22:13,690 --> 00:22:14,648 It's going to be weird. 484 00:22:17,070 --> 00:22:18,470 OK? 485 00:22:18,470 --> 00:22:19,670 Is that clear? 486 00:22:19,670 --> 00:22:20,780 OK. 487 00:22:20,780 --> 00:22:26,522 So this particular theta is called the true parameter. 488 00:22:34,240 --> 00:22:37,060 Sometimes, since we're going to want to vary theta, 489 00:22:37,060 --> 00:22:41,610 we might denote it by theta star as opposed 490 00:22:41,610 --> 00:22:43,750 to theta hat, which is always our estimator. 491 00:22:43,750 --> 00:22:47,250 But I'll keep it to be theta for now. 492 00:22:47,250 --> 00:22:50,280 And so the aim of this statistical experiment 493 00:22:50,280 --> 00:22:52,500 is to estimate theta so that once I actually 494 00:22:52,500 --> 00:22:56,010 plug in theta in the form of my distribution, for example, 495 00:22:56,010 --> 00:22:58,020 I could plug in theta here. 496 00:22:58,020 --> 00:23:01,600 So theta here was actually lambda.
497 00:23:01,600 --> 00:23:03,600 So once I estimate this guy, I would plug it in, 498 00:23:03,600 --> 00:23:06,183 and I would know the probability that my random variable takes 499 00:23:06,183 --> 00:23:09,240 any value, by just putting the lambda hat and the lambda hat 500 00:23:09,240 --> 00:23:10,700 here. 501 00:23:10,700 --> 00:23:11,240 OK? 502 00:23:11,240 --> 00:23:12,960 So my goal is going to be to estimate 503 00:23:12,960 --> 00:23:16,080 this guy so that I can actually compute those distributions. 504 00:23:16,080 --> 00:23:18,520 But actually, we'll see, for example, 505 00:23:18,520 --> 00:23:21,540 when we talk about regression that this parameter actually 506 00:23:21,540 --> 00:23:23,340 has a meaning in many instances. 507 00:23:23,340 --> 00:23:26,670 And so just knowing the parameter itself 508 00:23:26,670 --> 00:23:30,360 intuitively or say more-- 509 00:23:30,360 --> 00:23:33,680 let's say more so than just computing probabilities, 510 00:23:33,680 --> 00:23:36,480 will actually tell us something about the process. 511 00:23:36,480 --> 00:23:38,880 For example, we're going to run linear regression. 512 00:23:38,880 --> 00:23:40,260 And when we do linear regression, 513 00:23:40,260 --> 00:23:41,490 there's going to be some coefficients 514 00:23:41,490 --> 00:23:42,810 in the linear regression. 515 00:23:42,810 --> 00:23:44,250 And the value of this coefficient 516 00:23:44,250 --> 00:23:47,460 is actually telling me what is the sensitivity of the response 517 00:23:47,460 --> 00:23:50,620 that I'm looking at to this particular input. 518 00:23:50,620 --> 00:23:51,120 All right? 519 00:23:51,120 --> 00:23:52,950 So just knowing if this number is larger 520 00:23:52,950 --> 00:23:55,050 or if this number is small is actually 521 00:23:55,050 --> 00:23:58,180 going to be useful for us to just look at this guy. 522 00:23:58,180 --> 00:23:58,680 All right? 
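The plug-in idea is easy to sketch in a few lines of Python (the sibling counts below are invented for illustration): estimate lambda by the sample mean, then plug lambda hat into the Poisson PMF to compute any probability you want.

```python
import math

def poisson_pmf(lam, k):
    # P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical sibling counts collected from students.
data = [2, 1, 3, 2, 1, 2, 4, 1, 2, 2]

# Plug-in: estimate lambda by the sample mean...
lam_hat = sum(data) / len(data)

# ...then plug lambda hat into the PMF to estimate any probability.
prob_two_siblings = poisson_pmf(lam_hat, 2)
print(lam_hat, round(prob_two_siblings, 3))
```

Once lambda hat is computed, every probability in the model follows from it; that is exactly what "the parameter characterizes the distribution" buys you.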
523 00:23:58,680 --> 00:24:00,485 So there's going to be some instances where 524 00:24:00,485 --> 00:24:01,610 it's going to be important. 525 00:24:01,610 --> 00:24:04,026 Sometimes we're going to want to know if this parameter is 526 00:24:04,026 --> 00:24:07,260 larger or smaller than something or if it's equal to something 527 00:24:07,260 --> 00:24:08,647 or not equal to something. 528 00:24:08,647 --> 00:24:10,730 And those things are also important-- for example, 529 00:24:10,730 --> 00:24:13,091 if theta actually measures the true-- 530 00:24:13,091 --> 00:24:13,590 right? 531 00:24:13,590 --> 00:24:16,380 So theta is the true unknown parameter-- true efficacy 532 00:24:16,380 --> 00:24:18,010 of a drug. 533 00:24:18,010 --> 00:24:18,510 OK? 534 00:24:18,510 --> 00:24:21,720 Let's say I want to know what the true efficacy of a drug is. 535 00:24:21,720 --> 00:24:25,084 And what I'm going to want to know is maybe it's a score. 536 00:24:25,084 --> 00:24:27,500 Maybe I'm going to want to know if theta is larger than 2. 537 00:24:27,500 --> 00:24:30,166 Maybe I want to know if theta is the average number of siblings. 538 00:24:30,166 --> 00:24:32,080 Is this true number larger than 2 or not? 539 00:24:32,080 --> 00:24:32,580 Right? 540 00:24:32,580 --> 00:24:37,410 Maybe I am interested in knowing if college students come from-- 541 00:24:37,410 --> 00:24:40,217 so maybe from a sociological perspective, 542 00:24:40,217 --> 00:24:42,300 I'm interested in knowing if college students come 543 00:24:42,300 --> 00:24:45,375 from households with more than two children. 544 00:24:45,375 --> 00:24:47,070 All right, so those can be the questions 545 00:24:47,070 --> 00:24:48,814 that I may ask myself. 546 00:24:48,814 --> 00:24:50,730 I'm going to want to know maybe theta is going 547 00:24:50,730 --> 00:24:51,940 to be equal to 1/2 or not. 
548 00:24:51,940 --> 00:24:54,810 So maybe for a drug efficacy, is it completely 549 00:24:54,810 --> 00:24:57,640 standard-- maybe for elections. 550 00:24:57,640 --> 00:24:59,220 Is the proportion of the population 551 00:24:59,220 --> 00:25:02,380 that is going to vote for this particular candidate 552 00:25:02,380 --> 00:25:03,420 equal to 0.5? 553 00:25:03,420 --> 00:25:05,584 Or is it different from 0.5? 554 00:25:05,584 --> 00:25:07,250 OK, and I can think of different things. 555 00:25:07,250 --> 00:25:09,025 When I'm talking about the regression, 556 00:25:09,025 --> 00:25:11,400 I'm going to want to test if this coefficient is actually 557 00:25:11,400 --> 00:25:13,650 0 or not, because if it's 0, it means 558 00:25:13,650 --> 00:25:17,200 that the variable that's in front of it actually goes out. 559 00:25:17,200 --> 00:25:18,900 And so those are things we're testing. 560 00:25:18,900 --> 00:25:22,050 Actually having this very specific yes/no answer 561 00:25:22,050 --> 00:25:26,760 is going to give me a huge intuition or huge understanding 562 00:25:26,760 --> 00:25:29,850 of what's going on in the phenomenon that I observe. 563 00:25:29,850 --> 00:25:32,850 But actually, since the questions are so precise, 564 00:25:32,850 --> 00:25:34,204 it's going to be much more-- 565 00:25:34,204 --> 00:25:36,370 I'm going to be much better at answering them rather 566 00:25:36,370 --> 00:25:38,520 than giving you an estimate for theta 567 00:25:38,520 --> 00:25:41,240 with some confidence around it. 568 00:25:41,240 --> 00:25:44,870 All right, it's sort of the same principle as trying to reduce. 569 00:25:44,870 --> 00:25:46,620 What you're trying to do as a statistician 570 00:25:46,620 --> 00:25:49,830 is to inject as much knowledge about the question and about 571 00:25:49,830 --> 00:25:52,740 the problem that you can so that the data has 572 00:25:52,740 --> 00:25:54,450 to do a minimal job. 
573 00:25:54,450 --> 00:25:58,300 And henceforth, you actually need less data. 574 00:25:58,300 --> 00:26:00,720 So from now on, we will always assume-- 575 00:26:00,720 --> 00:26:03,030 and this is because this is an intro stats class-- 576 00:26:03,030 --> 00:26:05,550 we will always assume that theta-- 577 00:26:05,550 --> 00:26:09,180 the set of parameters is a subset of r to the d. 578 00:26:09,180 --> 00:26:11,940 That means that theta is a vector 579 00:26:11,940 --> 00:26:16,320 with a finite number of coordinates. 580 00:26:16,320 --> 00:26:17,970 Why do I say this? 581 00:26:17,970 --> 00:26:20,280 Well, this is called a parametric model. 582 00:26:20,280 --> 00:26:31,750 So it's called a parametric model or sometimes 583 00:26:31,750 --> 00:26:35,022 parametric statistics. 584 00:26:35,022 --> 00:26:37,480 Actually, we don't really talk about parametric statistics. 585 00:26:37,480 --> 00:26:40,330 But we talk a lot about nonparametric statistics 586 00:26:40,330 --> 00:26:42,340 or a non-parametric model. 587 00:26:42,340 --> 00:26:45,852 Can somebody think of a model which is non-parametric? 588 00:26:53,090 --> 00:26:56,190 For example, in the siblings example, 589 00:26:56,190 --> 00:27:01,160 if I did not cap the number of siblings to 7, 590 00:27:01,160 --> 00:27:06,350 but I let this list go to infinity, 591 00:27:06,350 --> 00:27:09,530 I would have an infinite number of parameters to estimate. 592 00:27:09,530 --> 00:27:12,207 Very likely, the last ones would be 0. 593 00:27:12,207 --> 00:27:14,540 But still, I would have an infinite number of parameters 594 00:27:14,540 --> 00:27:15,295 to estimate. 595 00:27:15,295 --> 00:27:17,400 So this would not be a parametric model 596 00:27:17,400 --> 00:27:19,430 if I just let this list of things 597 00:27:19,430 --> 00:27:21,740 to be estimated be infinite.
598 00:27:21,740 --> 00:27:24,580 But there are other classes that are actually infinite 599 00:27:24,580 --> 00:27:26,990 and cannot be represented by vectors. 600 00:27:26,990 --> 00:27:29,700 For example, functions, right? 601 00:27:29,700 --> 00:27:38,870 If I tell you my model, p sub f, is just 602 00:27:38,870 --> 00:27:43,880 the distributions of x-- the probability distributions-- 603 00:27:43,880 --> 00:27:48,187 that have density f, right? 604 00:27:48,187 --> 00:27:50,270 So what I know is that the density is non-negative 605 00:27:50,270 --> 00:27:52,100 and that it integrates to one, right? 606 00:27:52,100 --> 00:27:54,900 That's all I know about densities. 607 00:27:54,900 --> 00:27:57,620 Well, f is not something you're going 608 00:27:57,620 --> 00:28:01,250 to be able to describe with a finite number of values, right? 609 00:28:01,250 --> 00:28:03,610 The set of all possible functions is a huge set. 610 00:28:03,610 --> 00:28:08,730 It's certainly not representable by 10 numbers. 611 00:28:08,730 --> 00:28:12,470 And so non-parametric estimation is typically 612 00:28:12,470 --> 00:28:14,600 when you actually want to parametrize this 613 00:28:14,600 --> 00:28:17,600 by a large class of functions. 614 00:28:17,600 --> 00:28:20,810 And so for example, histograms are the prime tool 615 00:28:20,810 --> 00:28:22,970 of non-parametric estimation, because when 616 00:28:22,970 --> 00:28:24,470 you fit a histogram to data, you're 617 00:28:24,470 --> 00:28:26,929 trying to estimate the density of your data, 618 00:28:26,929 --> 00:28:28,470 but you're not trying to represent it 619 00:28:28,470 --> 00:28:31,920 as a finite number of points. 620 00:28:31,920 --> 00:28:35,040 That's really-- I mean, effectively, 621 00:28:35,040 --> 00:28:36,390 you have to represent it, right? 622 00:28:36,390 --> 00:28:38,480 So you actually truncate somewhere and just 623 00:28:38,480 --> 00:28:40,670 say those things are not going to matter. 624 00:28:40,670 --> 00:28:41,360 All right?
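A histogram density estimate can be sketched by hand in Python (the data and the number of bins below are arbitrary choices made for illustration): bin the observations and normalize the counts so the resulting step function integrates to 1.

```python
def histogram_density(data, bins=10):
    """Crude nonparametric density estimate: bin counts
    normalized so that the step function integrates to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # clamp the right endpoint into the last bin
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    n = len(data)
    return [c / (n * width) for c in counts], width

data = [0.1, 0.2, 0.25, 0.5, 0.55, 0.6, 0.9, 1.0]
heights, width = histogram_density(data, bins=4)
# The step function integrates to 1, like a density should.
print(sum(h * width for h in heights))
```

The "parameters" here are the bar heights, and their number grows as you refine the bins, which is exactly why this sits on the nonparametric side.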
625 00:28:41,360 --> 00:28:44,690 But really the key thing is that this is non-parametric 626 00:28:44,690 --> 00:28:47,360 where you have a potentially infinite number of parameters. 627 00:28:47,360 --> 00:28:49,490 Whereas we're going to only talk about finite ones. 628 00:28:49,490 --> 00:28:53,790 And actually, the dimension in the overwhelming majority of cases 629 00:28:53,790 --> 00:28:55,320 is going to be 1. 630 00:28:55,320 --> 00:28:58,404 So theta is going to be a subset of r1. 631 00:28:58,404 --> 00:29:00,320 OK, we're going to be interested in estimating 632 00:29:00,320 --> 00:29:03,770 one parameter just like the parameter of a Poisson 633 00:29:03,770 --> 00:29:05,760 or the parameter of an exponential-- 634 00:29:05,760 --> 00:29:07,460 the parameter of Bernoulli. 635 00:29:07,460 --> 00:29:09,539 But for example, really, we're going 636 00:29:09,539 --> 00:29:11,330 to be interested in estimating mu and sigma 637 00:29:11,330 --> 00:29:12,730 square for the normal. 638 00:29:17,880 --> 00:29:19,850 So here are some statistical models. 639 00:29:19,850 --> 00:29:20,350 All right? 640 00:29:20,350 --> 00:29:23,040 So I'm going to go through them with you. 641 00:29:31,360 --> 00:29:35,050 So if I tell you I observe-- 642 00:29:35,050 --> 00:29:38,940 I'm interested in understanding-- 643 00:29:38,940 --> 00:29:42,040 I'm still [INAUDIBLE] I'm interested in understanding 644 00:29:42,040 --> 00:29:44,680 the proportion of people who kiss by bending 645 00:29:44,680 --> 00:29:46,240 their head to the right. 646 00:29:46,240 --> 00:29:50,050 And for that, I collected n observations. 647 00:29:50,050 --> 00:29:53,050 And I'm interested in making some inference 648 00:29:53,050 --> 00:29:54,880 in the statistical model. 649 00:29:54,880 --> 00:29:58,000 My question to you is, what is the statistical model?
650 00:29:58,000 --> 00:30:00,050 Well, if you want to read the statistical model, 651 00:30:00,050 --> 00:30:02,170 you're going to have to write this E-- 652 00:30:02,170 --> 00:30:03,960 oh, sorry, I never told you what E was. 653 00:30:03,960 --> 00:30:06,610 OK, well actually just go to the examples, 654 00:30:06,610 --> 00:30:09,350 and then you'll know what E is. 655 00:30:09,350 --> 00:30:14,532 So you're going to have to write to me an E and a p theta, OK? 656 00:30:14,532 --> 00:30:16,240 So let's start with the Bernoulli trials. 657 00:30:25,180 --> 00:30:29,480 So this e here is called the sample space. 658 00:30:33,290 --> 00:30:37,040 And in normal people's words, 659 00:30:37,040 --> 00:30:44,980 it just means the space or the set in which x lives-- 660 00:30:44,980 --> 00:30:48,200 and back to your question, x is just a generic observation. 661 00:30:51,620 --> 00:30:56,110 OK, and hopefully, this is the smallest set you can think of. 662 00:30:56,110 --> 00:30:58,560 OK, so for example, for Bernoulli trials, 663 00:30:58,560 --> 00:31:01,270 I'm going to observe a sequence of 0's and 1's. 664 00:31:01,270 --> 00:31:04,360 So my experiment is going to be-- as written on the board, 665 00:31:04,360 --> 00:31:06,880 is going to be 1, 0, 1. 666 00:31:06,880 --> 00:31:08,657 And then the probability distributions 667 00:31:08,657 --> 00:31:10,240 are going to be, well, it's just going 668 00:31:10,240 --> 00:31:13,290 to be the Bernoulli distributions indexed 669 00:31:13,290 --> 00:31:14,240 by p, right? 670 00:31:14,240 --> 00:31:17,050 So rather than writing p sub p, I'm 671 00:31:17,050 --> 00:31:20,650 going to write it as Bernoulli p, 672 00:31:20,650 --> 00:31:24,145 because it's clear what I mean when I write that. 673 00:31:24,145 --> 00:31:25,537 Is everybody happy? 674 00:31:25,537 --> 00:31:27,370 Actually, I need to tell you something more. 675 00:31:27,370 --> 00:31:28,880 This is a family of distributions.
676 00:31:28,880 --> 00:31:29,932 So I need p. 677 00:31:29,932 --> 00:31:31,390 And maybe I don't want to have a p 678 00:31:31,390 --> 00:31:33,370 that takes value 0 or 1, right? 679 00:31:33,370 --> 00:31:34,390 It doesn't make sense. 680 00:31:34,390 --> 00:31:37,660 I would probably not look at this problem 681 00:31:37,660 --> 00:31:40,660 if I anticipated that everybody would kiss to the right. 682 00:31:40,660 --> 00:31:43,280 Or that everybody would kiss to the left. 683 00:31:43,280 --> 00:31:45,400 So I am going to assume that p is in 0, 1, 684 00:31:45,400 --> 00:31:47,930 but excluding 0 and 1. 685 00:31:47,930 --> 00:31:48,430 OK? 686 00:31:48,430 --> 00:31:52,494 So that's the statistical model for a Bernoulli trial. 687 00:32:00,250 --> 00:32:03,180 OK, now the next one, what do we have? 688 00:32:03,180 --> 00:32:03,991 Exponential. 689 00:32:03,991 --> 00:32:04,490 OK? 690 00:32:09,630 --> 00:32:12,684 OK, so when I have exponential distributions, 691 00:32:12,684 --> 00:32:14,850 what is the support of the exponential distribution? 692 00:32:14,850 --> 00:32:17,150 What value is it going to take? 693 00:32:20,520 --> 00:32:23,190 0 to infinity, right? 694 00:32:23,190 --> 00:32:26,700 So what I have is that my sample space 695 00:32:26,700 --> 00:32:28,740 is the values that my random variable can take. 696 00:32:28,740 --> 00:32:34,290 So it's-- well, actually I can remove the 0 again-- 697 00:32:34,290 --> 00:32:37,140 0 to plus infinity. 698 00:32:37,140 --> 00:32:39,450 And then the family of distributions 699 00:32:39,450 --> 00:32:43,320 that I have are exponential with parameter lambda. 700 00:32:43,320 --> 00:32:45,090 And again, maybe you've seen me switching 701 00:32:45,090 --> 00:32:49,147 from p, to lambda, to theta, to mu, to sigma square. 702 00:32:49,147 --> 00:32:50,730 Honestly, you can do whatever you want.
703 00:32:50,730 --> 00:32:53,430 But it's just that it's customary to have this particular group 704 00:32:53,430 --> 00:32:54,740 of letters. 705 00:32:54,740 --> 00:32:55,620 OK? 706 00:32:55,620 --> 00:32:58,950 And so the parameters of an exponential 707 00:32:58,950 --> 00:33:02,210 are just positive numbers. 708 00:33:02,210 --> 00:33:02,710 OK? 709 00:33:02,710 --> 00:33:08,714 And that's my exponential model. 710 00:33:08,714 --> 00:33:09,630 What is the third one? 711 00:33:09,630 --> 00:33:11,960 Can somebody tell me? 712 00:33:11,960 --> 00:33:12,960 Poisson, OK? 713 00:33:16,080 --> 00:33:20,230 OK, so Poisson-- is a Poisson random variable 714 00:33:20,230 --> 00:33:21,545 discrete or continuous? 715 00:33:27,720 --> 00:33:29,740 Go back to your probability. 716 00:33:29,740 --> 00:33:34,150 All right, so the answer being the opposite of continuous-- 717 00:33:34,150 --> 00:33:36,790 good job. 718 00:33:36,790 --> 00:33:38,471 All right, so it's going to be-- 719 00:33:38,471 --> 00:33:39,720 what value can a Poisson take? 720 00:33:43,157 --> 00:33:44,490 All the natural integers, right? 721 00:33:44,490 --> 00:33:47,434 So 0, 1, 2, 3, all the way to infinity. 722 00:33:47,434 --> 00:33:48,850 We don't have any control over this. 723 00:33:48,850 --> 00:33:53,830 So I'm going to write this as n without 0. 724 00:33:53,830 --> 00:33:55,882 I think in the slides, it's n-star maybe. 725 00:33:55,882 --> 00:33:57,340 Actually, no, it can take value 0. 726 00:33:57,340 --> 00:33:57,840 I'm sorry. 727 00:33:57,840 --> 00:33:59,630 This actually takes value 0 quite a lot. 728 00:33:59,630 --> 00:34:03,190 That's typically, in many instances, actually the mode. 729 00:34:03,190 --> 00:34:05,780 So it's n, and then I'm going to write it 730 00:34:05,780 --> 00:34:08,500 as Poisson with parameter-- well, 731 00:34:08,500 --> 00:34:11,350 here it's again lambda as a parameter. 732 00:34:11,350 --> 00:34:13,492 And lambda can take any positive value.
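One way to make the pair of sample space and family of distributions concrete in code (purely an illustration, with made-up names, not standard library objects) is to record each model as a sample-space description plus a predicate saying which parameter values are allowed.

```python
from collections import namedtuple

# Hypothetical representation of a statistical model: a sample-space
# description and a predicate for the allowed parameter values.
Model = namedtuple("Model", ["sample_space", "param_ok"])

bernoulli = Model("{0, 1}", lambda p: 0 < p < 1)
exponential = Model("(0, inf)", lambda lam: lam > 0)
poisson = Model("N = {0, 1, 2, ...}", lambda lam: lam > 0)

# p = 0 and p = 1 are excluded from the Bernoulli model:
print(bernoulli.param_ok(0.5), bernoulli.param_ok(1.0))
```

The point is only that each model is fully specified by these two pieces: where the observations live, and which parameters index the family.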
733 00:34:13,492 --> 00:34:13,992 OK? 734 00:34:17,469 --> 00:34:21,280 And that's where you can actually see that the model 735 00:34:21,280 --> 00:34:23,719 that we had for the siblings-- right? 736 00:34:23,719 --> 00:34:27,270 So let me actually just squeeze in the siblings model here. 737 00:34:31,230 --> 00:34:35,920 So that was the bad model that I had in the first place 738 00:34:35,920 --> 00:34:37,210 when I actually kept this. 739 00:34:37,210 --> 00:34:39,106 Let's say we just kept it at 7. 740 00:34:39,106 --> 00:34:40,730 Forget about larger than or equal to 7. 741 00:34:40,730 --> 00:34:42,290 We just assumed it was 7. 742 00:34:42,290 --> 00:34:43,749 What was our sample space? 743 00:34:54,228 --> 00:34:56,699 We said 7. 744 00:34:56,699 --> 00:35:01,530 So it's 1, 2, up to 7, right? 745 00:35:01,530 --> 00:35:04,770 Those were the possible values that this thing would take. 746 00:35:04,770 --> 00:35:06,160 And then what was my-- 747 00:35:06,160 --> 00:35:07,330 what's my parameter space? 748 00:35:10,550 --> 00:35:12,750 So it's going to be a nightmare to write. 749 00:35:12,750 --> 00:35:14,210 But I'm going to write it. 750 00:35:14,210 --> 00:35:18,380 OK, so I'm going to write it as something like the probability 751 00:35:18,380 --> 00:35:22,470 that x is equal to k is equal to p sub k. 752 00:35:26,210 --> 00:35:27,540 OK? 753 00:35:27,540 --> 00:35:33,480 And that's going to be for p. 754 00:35:33,480 --> 00:35:36,150 OK, so that's for all k's, right? 755 00:35:36,150 --> 00:35:38,890 Or for k equal 1 to 7. 756 00:35:38,890 --> 00:35:44,999 And here the index is the set of parameters p1 to p7. 757 00:35:44,999 --> 00:35:47,040 And I know a little more about those guys, right? 758 00:35:47,040 --> 00:35:49,670 I know they're going to be non-negative-- 759 00:35:49,670 --> 00:35:50,910 p sub j non-negative. 760 00:35:50,910 --> 00:35:52,320 And I know that they sum to 1.
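For this discrete model, the natural estimator of p1 through p7 is just the vector of empirical frequencies; a quick Python sketch with invented sibling counts:

```python
from collections import Counter

# Hypothetical sibling counts, capped at 7 as in the lecture.
data = [2, 1, 3, 2, 2, 1, 4, 2, 3, 1]

counts = Counter(data)
n = len(data)
# p_hat[k-1] estimates P(X = k) for k = 1, ..., 7.
p_hat = [counts.get(k, 0) / n for k in range(1, 8)]

# The estimate lands inside the parameter space:
# non-negative entries that sum to 1.
print(p_hat, sum(p_hat))
```

Note that the estimated vector automatically satisfies the two constraints that define the parameter space, non-negativity and summing to 1.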
761 00:35:57,960 --> 00:36:01,770 OK, so maybe writing this, you start 762 00:36:01,770 --> 00:36:05,880 seeing why we like those Poisson and exponential 763 00:36:05,880 --> 00:36:08,010 short notations, because I actually don't have 764 00:36:08,010 --> 00:36:09,530 to write the PMF of a Poisson. 765 00:36:09,530 --> 00:36:10,870 The Poisson is really just this. 766 00:36:10,870 --> 00:36:12,570 But I call it Poisson so I don't have 767 00:36:12,570 --> 00:36:14,600 to rewrite this all the time. 768 00:36:14,600 --> 00:36:17,620 And so here, I did not use a particular form. 769 00:36:17,620 --> 00:36:19,810 So I just have this thing, and that's what it is. 770 00:36:19,810 --> 00:36:24,940 The set of parameters is the set of non-negative numbers-- 771 00:36:24,940 --> 00:36:28,560 p1 to p7-- 772 00:36:28,560 --> 00:36:31,010 that sum to 1, right? 773 00:36:31,010 --> 00:36:34,110 And so this is just a list of numbers 774 00:36:34,110 --> 00:36:37,240 that are non-negative and sum up to 1. 775 00:36:37,240 --> 00:36:39,971 So that's my parameter space. 776 00:36:39,971 --> 00:36:40,470 OK? 777 00:36:40,470 --> 00:36:42,280 So here, that's my theta. 778 00:36:42,280 --> 00:36:45,360 This whole thing here-- 779 00:36:45,360 --> 00:36:47,551 this is my capital theta. 780 00:36:47,551 --> 00:36:48,051 OK? 781 00:36:51,947 --> 00:36:53,760 So that's just the set of parameters 782 00:36:53,760 --> 00:36:55,530 that theta-- the set of parameters 783 00:36:55,530 --> 00:36:58,440 that theta is allowed to take. 784 00:36:58,440 --> 00:37:01,890 OK, and finally, we're going to end with the star of all, 785 00:37:01,890 --> 00:37:03,970 and that's the normal distribution.
786 00:37:03,970 --> 00:37:06,960 And in the normal distribution, you still 787 00:37:06,960 --> 00:37:10,080 have also some flexibility in terms of choices, 788 00:37:10,080 --> 00:37:13,710 because then naturally, the normal distribution 789 00:37:13,710 --> 00:37:16,740 is parametrized by-- 790 00:37:16,740 --> 00:37:19,380 the normal distribution is parametrized by two parameters, 791 00:37:19,380 --> 00:37:19,880 right? 792 00:37:19,880 --> 00:37:20,709 Mean and variance. 793 00:37:26,200 --> 00:37:30,450 So what values can a Gaussian random variable take? 794 00:37:30,450 --> 00:37:33,980 The entire real line, right? 795 00:37:33,980 --> 00:37:35,990 And the set of parameters that it 796 00:37:35,990 --> 00:37:42,200 can take-- so this is going to be n, mu, sigma square. 797 00:37:42,200 --> 00:37:46,190 And mu is going to be positive. 798 00:37:46,190 --> 00:37:49,080 And sigma square is going-- 799 00:37:49,080 --> 00:37:51,720 sorry, mu is going to be in r. 800 00:37:51,720 --> 00:37:55,070 And sigma square is going to be positive. 801 00:37:55,070 --> 00:37:57,414 OK, so again here, that's the way 802 00:37:57,414 --> 00:37:58,580 you're supposed to write it. 803 00:37:58,580 --> 00:38:03,260 If you really want to identify what theta is, 804 00:38:03,260 --> 00:38:08,760 well, theta formally is the set of mu, sigma square such that-- 805 00:38:08,760 --> 00:38:15,821 well, they're in r times 0, infinity, right? 806 00:38:19,120 --> 00:38:22,200 That's just to be formal, but this does the job just fine. 807 00:38:22,200 --> 00:38:22,700 OK? 808 00:38:22,700 --> 00:38:25,820 You don't have to be super formal. 809 00:38:25,820 --> 00:38:28,500 OK, that's not three. 810 00:38:28,500 --> 00:38:30,170 That's like five. 811 00:38:30,170 --> 00:38:32,120 Actually, I just want to write another one. 812 00:38:32,120 --> 00:38:35,130 Let's call it 5-bis. 813 00:38:35,130 --> 00:38:41,030 And 5-bis is just Gaussian with known variance.
814 00:38:46,760 --> 00:38:50,230 And this arises a lot in labs when 815 00:38:50,230 --> 00:38:51,550 you have measurement error-- 816 00:38:51,550 --> 00:38:55,870 when you actually receive your measurement device. 817 00:38:55,870 --> 00:38:57,910 This thing has been tested by the manufacturer 818 00:38:57,910 --> 00:39:00,940 so much that it actually comes on the side of the box. 819 00:39:00,940 --> 00:39:04,030 It says that the standard deviation of your measurements 820 00:39:04,030 --> 00:39:07,574 is going to be 0.23. 821 00:39:07,574 --> 00:39:09,490 OK, and actually why they do this is because they 822 00:39:09,490 --> 00:39:11,290 can brag about accuracy, right? 823 00:39:11,290 --> 00:39:13,510 That's how they sell you this particular device. 824 00:39:13,510 --> 00:39:16,480 And so you actually know exactly what sigma square is. 825 00:39:16,480 --> 00:39:20,230 So once you actually get your data in the lab, 826 00:39:20,230 --> 00:39:22,210 you actually only have to estimate mu, 827 00:39:22,210 --> 00:39:25,530 because sigma comes on the label. 828 00:39:25,530 --> 00:39:28,660 So now, what is your statistical model? 829 00:39:28,660 --> 00:39:33,190 Well, the numbers I'm collecting are still in r. 830 00:39:33,190 --> 00:39:42,034 But now, the model that I have is n, mu, sigma squared. 831 00:39:42,034 --> 00:39:46,010 But the parameter space is not mu in r and sigma positive. 832 00:39:46,010 --> 00:39:46,897 It's just mu in r. 833 00:39:54,530 --> 00:39:58,449 And to be a little more emphatic about this, 834 00:39:58,449 --> 00:39:59,990 this is enough to describe it, right? 835 00:39:59,990 --> 00:40:02,300 Because if sigma is the sigma that 836 00:40:02,300 --> 00:40:04,580 was specified by the manufacturer, 837 00:40:04,580 --> 00:40:07,340 then this is the sigma you want. 838 00:40:07,340 --> 00:40:10,710 But you can actually write sigma is equal to-- 839 00:40:10,710 --> 00:40:15,420 sigma square is equal to sigma square manufacturer.
840 00:40:15,420 --> 00:40:15,920 Right? 841 00:40:15,920 --> 00:40:18,860 You can just fix it to be this particular value. 842 00:40:18,860 --> 00:40:21,230 Or maybe you don't want to write that index that's 843 00:40:21,230 --> 00:40:22,007 the manufacturer. 844 00:40:22,007 --> 00:40:23,590 And so you just say, well, the sigma-- 845 00:40:23,590 --> 00:40:24,740 when I write sigma squared, what I mean 846 00:40:24,740 --> 00:40:26,490 is the sigma square from the manufacturer. 847 00:40:26,490 --> 00:40:27,404 Yeah? 848 00:40:27,404 --> 00:40:29,368 AUDIENCE: [INAUDIBLE] 849 00:40:35,320 --> 00:40:37,080 PHILIPPE RIGOLLET: Yeah. 850 00:40:37,080 --> 00:40:39,597 For a particular measuring device? 851 00:40:39,597 --> 00:40:42,180 You know, you're in a lab, and you have some measuring device. 852 00:40:42,180 --> 00:40:45,260 I don't know-- something that measures 853 00:40:45,260 --> 00:40:48,042 tensile strength of something. 854 00:40:48,042 --> 00:40:49,750 And it's just going to measure something. 855 00:40:49,750 --> 00:40:51,480 And it will naturally make errors. 856 00:40:51,480 --> 00:40:53,865 But it's been tested so much by the manufacturer 857 00:40:53,865 --> 00:40:55,770 and calibrated by them. 858 00:40:55,770 --> 00:40:57,807 They know it's not going to be perfect. 859 00:40:57,807 --> 00:40:59,265 But they knew exactly what error it 860 00:40:59,265 --> 00:41:00,690 was making, because they've actually tried it 861 00:41:00,690 --> 00:41:02,231 on things for which they exactly knew 862 00:41:02,231 --> 00:41:04,431 what the tensile strength was. 863 00:41:04,431 --> 00:41:05,385 OK? 864 00:41:05,385 --> 00:41:06,339 Yeah. 865 00:41:06,339 --> 00:41:07,770 AUDIENCE: [INAUDIBLE] 866 00:41:09,155 --> 00:41:10,155 PHILIPPE RIGOLLET: This? 867 00:41:10,155 --> 00:41:11,600 AUDIENCE: [INAUDIBLE] 868 00:41:11,600 --> 00:41:13,898 PHILIPPE RIGOLLET: Oh, like that's pointing to-- 869 00:41:13,898 --> 00:41:14,886 5 prime? 870 00:41:19,340 --> 00:41:21,260 OK?
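A sketch of the known-variance setup in Python (the measurements are invented, and 1.96 is the usual 95% Gaussian quantile): sigma comes from the label, so only mu is estimated, and the width of a confidence interval is known exactly rather than estimated.

```python
import math

SIGMA = 0.23  # known standard deviation, from the manufacturer's label

# Hypothetical lab measurements.
data = [4.98, 5.12, 5.03, 4.91, 5.07]

n = len(data)
mu_hat = sum(data) / n  # the only parameter left to estimate

# 95% confidence interval: sigma is known, so no need to estimate it.
half_width = 1.96 * SIGMA / math.sqrt(n)
print(mu_hat - half_width, mu_hat + half_width)
```

Compare with the full Gaussian model, where sigma squared would also have to be estimated from the same data before any interval could be formed.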
871 00:41:21,260 --> 00:41:24,230 And we can come up with other examples, right? 872 00:41:24,230 --> 00:41:26,030 So for example, here's another one. 873 00:41:30,380 --> 00:41:33,350 So the names don't really matter, right? 874 00:41:33,350 --> 00:41:34,670 I call it the siblings model. 875 00:41:34,670 --> 00:41:37,662 But you won't find the siblings model in the textbook, right? 876 00:41:37,662 --> 00:41:38,870 So I wouldn't worry too much. 877 00:41:38,870 --> 00:41:41,810 But for example, let's say you have something-- so 878 00:41:41,810 --> 00:41:42,710 let's call it 6. 879 00:41:42,710 --> 00:41:45,700 You have-- I don't know-- 880 00:41:45,700 --> 00:41:54,240 a truncated-- and that's the name I just came up with. 881 00:41:54,240 --> 00:41:57,490 But it's actually not exactly describing what I want. 882 00:41:57,490 --> 00:42:03,510 But let's say I observe y, which is the indicator of x larger 883 00:42:03,510 --> 00:42:11,460 than say 5 when x follows some exponential with parameter 884 00:42:11,460 --> 00:42:13,181 lambda. 885 00:42:13,181 --> 00:42:13,680 OK? 886 00:42:13,680 --> 00:42:15,570 This is what I get to observe. 887 00:42:15,570 --> 00:42:18,990 I only observe if my waiting time 888 00:42:18,990 --> 00:42:20,610 was more than five minutes, because I 889 00:42:20,610 --> 00:42:23,160 see somebody coming out of the Kendall Station 890 00:42:23,160 --> 00:42:24,380 being really upset. 891 00:42:24,380 --> 00:42:26,310 And all I record is that I've been waiting 892 00:42:26,310 --> 00:42:27,770 for more than five minutes. 893 00:42:27,770 --> 00:42:29,460 And that's all I get to record. 894 00:42:29,460 --> 00:42:29,960 OK? 895 00:42:29,960 --> 00:42:31,109 That happens a lot. 896 00:42:31,109 --> 00:42:32,400 These are called censored data. 897 00:42:32,400 --> 00:42:34,960 I should probably not call it truncated, 898 00:42:34,960 --> 00:42:36,712 but this should be censored. 899 00:42:36,712 --> 00:42:38,140 OK?
900 00:42:38,140 --> 00:42:40,620 You see a lot of censored data when you ask people 901 00:42:40,620 --> 00:42:42,290 how much they make. 902 00:42:42,290 --> 00:42:45,330 They say, well, more than five figures. 903 00:42:45,330 --> 00:42:47,720 And that's all they want to tell you. 904 00:42:47,720 --> 00:42:48,380 OK? 905 00:42:48,380 --> 00:42:54,410 And so you see a lot of censored data in survival analysis, 906 00:42:54,410 --> 00:42:55,560 right? 907 00:42:55,560 --> 00:42:58,620 You are trying to understand how long your patients are going 908 00:42:58,620 --> 00:43:01,720 to live after some surgery, OK? 909 00:43:01,720 --> 00:43:05,970 And maybe you're not going to keep people alive, 910 00:43:05,970 --> 00:43:07,491 and you're not going to actually be 911 00:43:07,491 --> 00:43:09,490 in touch in their family every day and ask them, 912 00:43:09,490 --> 00:43:10,920 is the guy still alive? 913 00:43:10,920 --> 00:43:12,750 And so what you can do is just you 914 00:43:12,750 --> 00:43:15,540 ask people maybe five years after your study 915 00:43:15,540 --> 00:43:18,060 and say, please, come in. 916 00:43:18,060 --> 00:43:20,970 And you will just happen to have some people say, well, you 917 00:43:20,970 --> 00:43:22,470 know, the person is deceased. 918 00:43:22,470 --> 00:43:25,560 And you will only be able to know that the person deceased 919 00:43:25,560 --> 00:43:27,780 less than five years ago. 920 00:43:27,780 --> 00:43:31,980 But you only see what happens after that, OK? 921 00:43:31,980 --> 00:43:34,080 And so this is this truncated and censored data. 922 00:43:34,080 --> 00:43:35,940 It happens all the time just because you 923 00:43:35,940 --> 00:43:39,750 don't have the ability to do better than that. 924 00:43:39,750 --> 00:43:42,380 So this could happen here. 925 00:43:42,380 --> 00:43:45,270 So what is my physical experiment, right? 
926 00:43:45,270 --> 00:43:47,650 So here, I should probably write this like this, 927 00:43:47,650 --> 00:43:50,380 because I just told you that my observations are going to be x, 928 00:43:50,380 --> 00:43:52,720 but there is some unknown y. 929 00:43:52,720 --> 00:43:54,210 I will never get to see this y. 930 00:43:54,210 --> 00:43:57,230 I only get to see the x. 931 00:43:57,230 --> 00:43:58,700 What is my statistical experiment? 932 00:43:58,700 --> 00:44:00,215 Please help me. 933 00:44:00,215 --> 00:44:02,460 So is it the real line? 934 00:44:02,460 --> 00:44:04,850 My sample space-- is it the real line? 935 00:44:09,270 --> 00:44:12,410 Sorry, who does not know what this means? 936 00:44:12,410 --> 00:44:13,440 I'm sorry. 937 00:44:13,440 --> 00:44:15,450 OK. 938 00:44:15,450 --> 00:44:18,460 So this is called an indicator. 939 00:44:18,460 --> 00:44:20,586 So I read it as-- 940 00:44:20,586 --> 00:44:23,940 if I write it well, that's a one with a double bar. 941 00:44:23,940 --> 00:44:26,070 You can also write i if you prefer 942 00:44:26,070 --> 00:44:28,200 if you don't feel like writing one in double bars. 943 00:44:28,200 --> 00:44:31,235 And it's one of say-- 944 00:44:31,235 --> 00:44:32,610 I'm going to write it like that-- 945 00:44:32,610 --> 00:44:43,590 1 of a is equal to 1 if a is true and 0 if a is false. 946 00:44:43,590 --> 00:44:44,880 OK? 947 00:44:44,880 --> 00:44:48,370 So that means that if y is larger than 5, this thing is 1. 948 00:44:48,370 --> 00:44:52,350 And if y is not larger than 5, this thing is 0. 949 00:44:52,350 --> 00:44:53,754 OK. 950 00:44:53,754 --> 00:44:56,247 So that's called an indicator-- 951 00:45:00,143 --> 00:45:01,604 indicator function. 952 00:45:06,480 --> 00:45:10,760 It's very useful to just turn anything into a 0 or 1. 953 00:45:10,760 --> 00:45:14,387 So now that I'm here, what is my sample space? 954 00:45:17,380 --> 00:45:18,800 0, 1. 
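The indicator just defined is easy to mirror in code. A minimal Python sketch (the function name and the waiting-time value are made up for illustration, not from the lecture):

```python
# Sketch of the indicator function 1{A}: it maps the truth value
# of an event A to 0 or 1. The name "indicator" is our own choice.

def indicator(event_is_true):
    """1{A} = 1 if A is true, 0 if A is false."""
    return 1 if event_is_true else 0

# Hypothetical waiting time y, censored at five minutes:
y = 7.3
x = indicator(y >= 5)  # the censored observation
print(x)  # prints 1, since we waited more than five minutes
```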
955 00:45:18,800 --> 00:45:21,610 Well, whatever values I told you this thing 956 00:45:21,610 --> 00:45:24,247 was taking-- that's the thing you should have put-- 957 00:45:24,247 --> 00:45:26,580 if I had told you it was taking values 6 or 7, that 958 00:45:26,580 --> 00:45:29,760 would be your sample space, OK? 959 00:45:29,760 --> 00:45:33,220 OK, so it takes values 0, 1. 960 00:45:33,220 --> 00:45:37,060 And then what is the probability here? 961 00:45:37,060 --> 00:45:38,410 What should I write here? 962 00:45:38,410 --> 00:45:40,243 What should you write without even thinking? 963 00:45:44,062 --> 00:45:45,020 Yeah. 964 00:45:45,020 --> 00:45:47,070 So let's assume there's two seconds 965 00:45:47,070 --> 00:45:48,830 before the end of the exam. 966 00:45:48,830 --> 00:45:50,164 You're going to write Bernoulli. 967 00:45:50,164 --> 00:45:52,663 And that's when you're going to start checking if I'm going 968 00:45:52,663 --> 00:45:54,174 to give you extra time, OK? 969 00:45:54,174 --> 00:45:55,840 So you write Bernoulli without thinking, 970 00:45:55,840 --> 00:45:57,330 because it's taking values 0, 1. 971 00:45:57,330 --> 00:45:59,760 So you just write Bernoulli, but you still have to tell me 972 00:45:59,760 --> 00:46:04,110 what possible parameters this thing is taking, right? 973 00:46:04,110 --> 00:46:06,450 So I'm going to write it p, because I don't know. 974 00:46:06,450 --> 00:46:09,080 And then p takes value-- 975 00:46:09,080 --> 00:46:11,370 OK, so sorry. 976 00:46:11,370 --> 00:46:14,980 I could write it like that. 977 00:46:14,980 --> 00:46:16,910 Right? 978 00:46:16,910 --> 00:46:21,260 That would be perfectly valid, but actually, no. 979 00:46:21,260 --> 00:46:23,390 It's not any p. 980 00:46:23,390 --> 00:46:26,330 The p is the probability that an exponential lambda 981 00:46:26,330 --> 00:46:27,560 is larger than 5. 982 00:46:27,560 --> 00:46:30,530 And maybe I want to have lambda as a parameter. 
983 00:46:30,530 --> 00:46:33,450 OK, so what I need to actually compute is, 984 00:46:33,450 --> 00:46:38,180 what is the probability that y is larger than 5-- 985 00:46:38,180 --> 00:46:40,725 when y is this exponential lambda, 986 00:46:40,725 --> 00:46:42,350 which means that what I need to compute 987 00:46:42,350 --> 00:46:46,414 is the integral between 5 and infinity of-- 988 00:46:46,414 --> 00:46:47,388 what is it? 989 00:46:47,388 --> 00:46:49,823 1 over lambda. 990 00:46:49,823 --> 00:46:52,745 How did I define it in this class? 991 00:46:52,745 --> 00:46:54,206 Did I change it-- what? 992 00:46:54,206 --> 00:46:57,150 AUDIENCE: [INAUDIBLE]. 993 00:46:57,150 --> 00:46:59,090 PHILIPPE RIGOLLET: Yeah, right, right, right. 994 00:46:59,090 --> 00:46:59,760 Yeah. 995 00:46:59,760 --> 00:47:04,230 Lambda e to the minus lambda x dx, right? 996 00:47:04,230 --> 00:47:07,760 So that's what I need to compute. 997 00:47:07,760 --> 00:47:09,580 What is this? 998 00:47:09,580 --> 00:47:11,678 Yeah, so what is the value of this integral? 999 00:47:14,666 --> 00:47:16,658 Can you integrate this? 1000 00:47:25,622 --> 00:47:28,112 AUDIENCE: [INAUDIBLE] 1001 00:47:32,594 --> 00:47:33,610 PHILIPPE RIGOLLET: OK? 1002 00:47:33,610 --> 00:47:35,984 And again, you can cancel this, right? 1003 00:47:35,984 --> 00:47:37,650 So when I'm going to integrate this guy, 1004 00:47:37,650 --> 00:47:39,070 those guys are going to cancel. 1005 00:47:39,070 --> 00:47:40,900 I'm going to get 0 at infinity. 1006 00:47:40,900 --> 00:47:42,929 I'm going to get a 5 for this guy. 1007 00:47:42,929 --> 00:47:45,470 And well, I know it's going to be a positive number, so I'm not 1008 00:47:45,470 --> 00:47:46,890 really going to bother with the signs, 1009 00:47:46,890 --> 00:47:48,556 because I know that's what it should be. 1010 00:47:48,556 --> 00:47:51,850 OK, so I get e to the minus 5 lambda. 
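The closed form just derived, P(X > 5) = e^(-5 lambda), can be sanity-checked by simulation. A short sketch, using a made-up rate lambda = 0.3:

```python
import math
import random

lam = 0.3  # a hypothetical rate lambda for the Exp(lambda) waiting time

# Closed form from the integral above: P(X > 5) = exp(-5 * lambda)
p_closed = math.exp(-5 * lam)

# Monte Carlo check using exponential draws
random.seed(0)
n = 200_000
p_mc = sum(random.expovariate(lam) > 5 for _ in range(n)) / n

print(p_closed, p_mc)  # the two should agree to roughly two decimal places
```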
1011 00:47:51,850 --> 00:47:55,231 And so that means that I can actually write this like that-- 1012 00:47:57,973 --> 00:48:01,710 and now parametrize this thing by lambda positive. 1013 00:48:01,710 --> 00:48:02,210 OK? 1014 00:48:02,210 --> 00:48:05,480 So what I did here is I changed the parametrization from p 1015 00:48:05,480 --> 00:48:06,890 to lambda. 1016 00:48:06,890 --> 00:48:07,490 Why? 1017 00:48:07,490 --> 00:48:10,550 Well, because maybe if I know this is happening, 1018 00:48:10,550 --> 00:48:13,400 maybe I am actually interested in reporting lambda 1019 00:48:13,400 --> 00:48:15,910 to MBTA, for example. 1020 00:48:15,910 --> 00:48:20,375 Maybe I'm actually trying to estimate 1 over lambda, so 1021 00:48:20,375 --> 00:48:22,857 that I know it is-- 1022 00:48:22,857 --> 00:48:24,440 well, lambda is actually the intensity 1023 00:48:24,440 --> 00:48:26,819 of arrival of my Poisson process, right? 1024 00:48:26,819 --> 00:48:27,860 I have a Poisson process. 1025 00:48:27,860 --> 00:48:31,357 That's how my trains are coming in. 1026 00:48:31,357 --> 00:48:32,690 And so I'm interested in lambda. 1027 00:48:32,690 --> 00:48:34,314 So I will parametrize things by lambda. 1028 00:48:34,314 --> 00:48:35,680 So the thing I get is lambda. 1029 00:48:35,680 --> 00:48:37,010 You can play with this, right? 1030 00:48:37,010 --> 00:48:39,051 I mean, I could parametrize this by 1 over lambda 1031 00:48:39,051 --> 00:48:42,960 and put 1 over lambda here if I want it. 1032 00:48:42,960 --> 00:48:46,650 But you know, the context of your problem 1033 00:48:46,650 --> 00:48:50,321 will tell you exactly how to parametrize this. 1034 00:48:50,321 --> 00:48:50,820 OK? 1035 00:48:53,890 --> 00:48:59,200 So what else did I want to tell you? 1036 00:48:59,200 --> 00:49:00,688 OK, let's do a final one. 
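Because p = e^(-5 lambda) is an invertible function of lambda, an estimate of p from the censored 0/1 observations can be turned back into an estimate of lambda. A sketch under a made-up true rate (the inversion is implicit in the reparametrization above; estimators are only introduced later in the lecture):

```python
import math
import random

lam_true = 0.4  # hypothetical true rate; in practice this is unknown
random.seed(1)
n = 100_000

# We only ever see the censored indicators x_i = 1{y_i > 5}.
xs = [1 if random.expovariate(lam_true) > 5 else 0 for _ in range(n)]

p_hat = sum(xs) / n              # estimates p = exp(-5 * lambda)
lam_hat = -math.log(p_hat) / 5   # invert the reparametrization

print(lam_hat)  # close to the true rate 0.4
```

With n this large, p_hat is bounded away from 0, so the logarithm is safe to take.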
1037 00:49:13,660 --> 00:49:17,800 By the way, are you guys OK with Poisson, exponential, 1038 00:49:17,800 --> 00:49:21,060 Bernoulli-- 1039 00:49:21,060 --> 00:49:22,930 I don't know, binomial, normal-- 1040 00:49:22,930 --> 00:49:24,155 all these things. 1041 00:49:24,155 --> 00:49:25,780 I'm not going to go back to it, but I'm 1042 00:49:25,780 --> 00:49:26,863 going to use them heavily. 1043 00:49:26,863 --> 00:49:29,435 So just spend five minutes on Wikipedia 1044 00:49:29,435 --> 00:49:31,870 if you forgot about what those things are. 1045 00:49:31,870 --> 00:49:35,140 Usually, you must have seen them in your probability class. 1046 00:49:35,140 --> 00:49:36,670 So these should not be crazy names. 1047 00:49:36,670 --> 00:49:38,410 And again, I'm not expecting you to. 1048 00:49:38,410 --> 00:49:40,804 I don't remember what the density of an exponential is. 1049 00:49:40,804 --> 00:49:42,220 So it would be pretty unfair of me 1050 00:49:42,220 --> 00:49:44,069 to actually ask you to remember what it is. 1051 00:49:44,069 --> 00:49:45,610 Even for the Gaussian, I don't expect 1052 00:49:45,610 --> 00:49:46,760 you to remember what it is. 1053 00:49:46,760 --> 00:49:51,550 But I want you to remember that if I add 5 to a Gaussian, then 1054 00:49:51,550 --> 00:49:54,490 I have a Gaussian with mean mu plus 5, and similarly if I multiply it 1055 00:49:54,490 --> 00:49:55,800 by something, right? 1056 00:49:55,800 --> 00:49:59,170 You need to know how to operate those things. 1057 00:49:59,170 --> 00:50:02,110 But knowing complicated densities 1058 00:50:02,110 --> 00:50:04,290 is definitely not part of the game. 1059 00:50:04,290 --> 00:50:05,740 OK? 1060 00:50:05,740 --> 00:50:09,591 So let's do a final one. 1061 00:50:09,591 --> 00:50:11,090 I don't know what number I have now. 1062 00:50:11,090 --> 00:50:12,298 I'm going to just do uniform. 1063 00:50:14,999 --> 00:50:15,790 That's another one. 1064 00:50:15,790 --> 00:50:18,370 Everybody knows what uniform is? 
1065 00:50:18,370 --> 00:50:19,360 So it's uniform, right? 1066 00:50:19,360 --> 00:50:22,810 So I'm going to have x, which my observations are 1067 00:50:22,810 --> 00:50:27,140 going to be uniform on the interval 0 theta, right? 1068 00:50:27,140 --> 00:50:30,100 So if I want to define a uniform distribution 1069 00:50:30,100 --> 00:50:32,575 for a random variable, I have to tell you which interval 1070 00:50:32,575 --> 00:50:35,200 or which set I want it to be uniform on. 1071 00:50:35,200 --> 00:50:38,210 And so here I'm telling you is the interval 0 theta. 1072 00:50:38,210 --> 00:50:41,270 And so what is going to be my sample space? 1073 00:50:41,270 --> 00:50:42,204 AUDIENCE: [INAUDIBLE] 1074 00:50:42,204 --> 00:50:44,480 PHILIPPE RIGOLLET: I'm sorry? 1075 00:50:44,480 --> 00:50:44,980 0 to theta. 1076 00:50:47,620 --> 00:50:50,770 And then what is my probability distribution? 1077 00:50:50,770 --> 00:50:52,228 My family of parameters? 1078 00:50:57,100 --> 00:51:00,000 So well, I can write it like this, right? 1079 00:51:00,000 --> 00:51:03,444 Uniform theta, right? 1080 00:51:03,444 --> 00:51:06,408 And theta let's say is positive. 1081 00:51:09,866 --> 00:51:12,336 Can somebody tell me what's wrong with what I wrote? 1082 00:51:18,780 --> 00:51:20,592 This makes no sense. 1083 00:51:20,592 --> 00:51:21,554 Tell me why. 1084 00:51:24,440 --> 00:51:26,845 Yeah? 1085 00:51:26,845 --> 00:51:30,292 Yeah, this set depends on theta, and why is that a problem? 1086 00:51:30,292 --> 00:51:32,188 AUDIENCE: [INAUDIBLE] 1087 00:51:36,869 --> 00:51:38,410 PHILIPPE RIGOLLET: There is no theta. 1088 00:51:38,410 --> 00:51:40,990 Right now, there's the families of theta. 1089 00:51:40,990 --> 00:51:43,430 Which one did you pick here? 
1090 00:51:43,430 --> 00:51:46,540 Right, this is just something that's indexed by theta, 1091 00:51:46,540 --> 00:51:49,090 but I could have very well written it as, you know, 1092 00:51:49,090 --> 00:51:51,860 just not being Greek for a second, 1093 00:51:51,860 --> 00:51:55,652 I could have just written this as t rather than theta. 1094 00:51:55,652 --> 00:51:56,860 That would be the same thing. 1095 00:51:56,860 --> 00:51:59,349 And then what the hell is theta? 1096 00:51:59,349 --> 00:52:00,640 There's no such thing as theta. 1097 00:52:00,640 --> 00:52:02,230 We don't know what the parameter is. 1098 00:52:02,230 --> 00:52:04,314 This sample space should work for every parameter. 1099 00:52:04,314 --> 00:52:05,980 And so that means that I actually am not 1100 00:52:05,980 --> 00:52:07,200 allowed to pick this theta. 1101 00:52:07,200 --> 00:52:10,060 I'm actually-- just for the reason that there is no 1102 00:52:10,060 --> 00:52:12,056 parameter to put on the left-hand side-- 1103 00:52:12,056 --> 00:52:13,180 there should not be, right? 1104 00:52:13,180 --> 00:52:14,910 So you just said, well, there's a problem because the parameter 1105 00:52:14,910 --> 00:52:16,030 is on the left-hand side. 1106 00:52:16,030 --> 00:52:17,405 But there's not even a parameter. 1107 00:52:17,405 --> 00:52:19,630 I'm describing the family of possible parameters. 1108 00:52:19,630 --> 00:52:22,060 There is no one that you can actually plug in. 1109 00:52:22,060 --> 00:52:24,352 So this should really be 1. 1110 00:52:24,352 --> 00:52:25,810 And I'm going to go back to writing 1111 00:52:25,810 --> 00:52:29,776 this as theta because that's pretty standard. 1112 00:52:29,776 --> 00:52:31,750 Is that clear for everyone? 
1113 00:52:31,750 --> 00:52:37,780 I cannot just pick one and put it in there and just take the-- 1114 00:52:37,780 --> 00:52:40,600 before I run my experiments, I could potentially 1115 00:52:40,600 --> 00:52:42,370 get numbers that are all the way up to 1, 1116 00:52:42,370 --> 00:52:45,206 because I don't know what theta is going to be ahead of time. 1117 00:52:45,206 --> 00:52:47,740 Now, if somebody promised to me that theta 1118 00:52:47,740 --> 00:52:49,690 was going to be less than 0.5, that would be-- 1119 00:52:49,690 --> 00:52:50,980 sorry, why do I put 1 here? 1120 00:52:56,740 --> 00:52:58,490 I could put theta between 0 and 1. 1121 00:52:58,490 --> 00:53:00,150 But if somebody is going to promise me, for example, 1122 00:53:00,150 --> 00:53:01,649 if theta is going to be less than 1, 1123 00:53:01,649 --> 00:53:03,599 then you expect to put 0, 1. 1124 00:53:03,599 --> 00:53:04,565 All right? 1125 00:53:08,912 --> 00:53:12,310 Is that clear? 1126 00:53:12,310 --> 00:53:15,410 OK, so now you know how to answer the question-- 1127 00:53:15,410 --> 00:53:18,390 what is the statistical model? 1128 00:53:18,390 --> 00:53:20,140 And again, within the scope of this class, 1129 00:53:20,140 --> 00:53:23,110 you will not be asked to just come up with a model right that 1130 00:53:23,110 --> 00:53:24,260 will just tell you. 1131 00:53:24,260 --> 00:53:26,390 Poisson would be probably be a good idea here. 1132 00:53:26,390 --> 00:53:28,681 And then you would just have to trust me that indeed it 1133 00:53:28,681 --> 00:53:30,460 would be a good idea. 1134 00:53:30,460 --> 00:53:35,230 All right, so what I started talking about 20 minutes ago-- 1135 00:53:35,230 --> 00:53:38,350 so it's definitely ahead of myself 1136 00:53:38,350 --> 00:53:40,000 is the notion-- so that's when I was 1137 00:53:40,000 --> 00:53:41,290 talking about well-specified. 
1138 00:53:41,290 --> 00:53:44,650 Remember, well-specified says that the true distribution 1139 00:53:44,650 --> 00:53:47,080 is one of the distributions in this parametric families 1140 00:53:47,080 --> 00:53:48,310 of distribution. 1141 00:53:48,310 --> 00:53:50,050 The true distribution of my siblings 1142 00:53:50,050 --> 00:53:52,570 is actually a Poisson with some parameters. 1143 00:53:52,570 --> 00:53:56,800 And all I need to figure out is what this parameter is. 1144 00:53:56,800 --> 00:53:58,560 When I started saying that, I said, well, 1145 00:53:58,560 --> 00:53:59,970 but then that could be that there 1146 00:53:59,970 --> 00:54:01,428 are several parameters that give me 1147 00:54:01,428 --> 00:54:03,010 the same distribution, right? 1148 00:54:03,010 --> 00:54:07,840 It could be the case that Poisson 5 and Poisson 17 1149 00:54:07,840 --> 00:54:09,910 are exactly the same distributions when 1150 00:54:09,910 --> 00:54:13,540 I started putting those numbers in the formula which I erased, 1151 00:54:13,540 --> 00:54:14,140 OK? 1152 00:54:14,140 --> 00:54:18,070 So it could be the case that two different numbers would give me 1153 00:54:18,070 --> 00:54:20,360 exactly the same probabilities. 1154 00:54:20,360 --> 00:54:24,600 And in this case, we see that the model is not identifiable. 1155 00:54:24,600 --> 00:54:26,680 I mean, the parameter is not identifiable. 1156 00:54:26,680 --> 00:54:29,410 I cannot identify the parameter, even if you actually gave me 1157 00:54:29,410 --> 00:54:32,080 an infinite amount of data, which means that I could 1158 00:54:32,080 --> 00:54:34,750 actually estimate exactly the PMF. 1159 00:54:34,750 --> 00:54:37,277 I might not be able to go back, because there would 1160 00:54:37,277 --> 00:54:38,860 be several candidates, and I would not 1161 00:54:38,860 --> 00:54:41,360 be able to tell you which one it was in the first place. 1162 00:54:41,360 --> 00:54:41,860 OK? 
1163 00:54:41,860 --> 00:54:45,310 So what we want is that this function-- 1164 00:54:45,310 --> 00:54:49,720 theta maps to p theta is injective. 1165 00:54:49,720 --> 00:54:51,067 And that can sound fancy. 1166 00:54:54,410 --> 00:54:57,560 What I really mean is that if theta 1167 00:54:57,560 --> 00:55:01,580 is different from theta prime, then p of theta 1168 00:55:01,580 --> 00:55:04,100 is different from p of theta prime. 1169 00:55:04,100 --> 00:55:07,580 Or, if you prefer to think about the contrapositive of this, 1170 00:55:07,580 --> 00:55:11,960 this is the same as saying that if p theta gives me 1171 00:55:11,960 --> 00:55:15,320 the same distribution as p theta prime, 1172 00:55:15,320 --> 00:55:17,480 then that implies that theta must 1173 00:55:17,480 --> 00:55:20,180 be equal to theta prime. 1174 00:55:20,180 --> 00:55:24,330 Logically, those two things are equivalent, right? 1175 00:55:24,330 --> 00:55:26,780 So that's what this means. 1176 00:55:26,780 --> 00:55:37,130 So this is-- we say that the parameter is identifiable 1177 00:55:37,130 --> 00:55:41,414 or identified-- it doesn't really matter-- 1178 00:55:41,414 --> 00:55:42,386 in this model. 1179 00:55:49,170 --> 00:55:50,920 And this is something we're going to want. 1180 00:55:50,920 --> 00:55:51,920 OK? 1181 00:55:51,920 --> 00:55:54,980 So in all the examples that I gave you, 1182 00:55:54,980 --> 00:55:57,090 those parameters are completely identified. 1183 00:55:57,090 --> 00:55:57,590 Right? 1184 00:55:57,590 --> 00:55:58,410 If I tell you-- 1185 00:55:58,410 --> 00:56:01,440 I mean, if those things are in a probability textbook, 1186 00:56:01,440 --> 00:56:03,920 it means that they were probably thought through, right? 
1187 00:56:03,920 --> 00:56:06,290 So when I say exponential lambda, 1188 00:56:06,290 --> 00:56:09,987 I'm really talking about one specific distribution and not-- 1189 00:56:09,987 --> 00:56:11,820 there's not another lambda going to give you 1190 00:56:11,820 --> 00:56:13,910 exactly the same distribution. 1191 00:56:13,910 --> 00:56:15,020 OK so that was the case. 1192 00:56:15,020 --> 00:56:17,150 And you can check that, but it's a little annoying. 1193 00:56:17,150 --> 00:56:19,306 So I would probably not do it. 1194 00:56:19,306 --> 00:56:20,930 But rather than doing this, let me just 1195 00:56:20,930 --> 00:56:24,210 give you some examples where it would not be the case. 1196 00:56:24,210 --> 00:56:25,980 Again, here's an example, if I take xi-- 1197 00:56:31,220 --> 00:56:36,980 so now I'm back to just using this indicator function-- 1198 00:56:36,980 --> 00:56:39,080 but now for a Gaussian. 1199 00:56:39,080 --> 00:56:42,030 So what I observe is x is the indicator 1200 00:56:42,030 --> 00:56:44,020 that y is, what did we say? 1201 00:56:44,020 --> 00:56:44,914 Positive. 1202 00:56:48,050 --> 00:56:49,350 OK? 1203 00:56:49,350 --> 00:56:51,458 So this is a Bernoulli random variable, right? 1204 00:56:56,400 --> 00:56:57,709 And it has some parameter p. 1205 00:56:57,709 --> 00:56:59,250 But p now is going to depend-- sorry, 1206 00:56:59,250 --> 00:57:04,890 and here y is n mu sigma square. 1207 00:57:04,890 --> 00:57:09,090 So the p, the probability that this thing is positive, 1208 00:57:09,090 --> 00:57:10,020 is actually-- 1209 00:57:10,020 --> 00:57:11,390 I don't think I put the 0. 1210 00:57:11,390 --> 00:57:13,010 Oh, yeah, because I have mu. 1211 00:57:13,010 --> 00:57:15,935 OK, so this distribution-- this p the probability 1212 00:57:15,935 --> 00:57:17,560 that it's positive is just the probably 1213 00:57:17,560 --> 00:57:19,715 that some Gaussian is positive. 1214 00:57:19,715 --> 00:57:22,410 And it will depend on mu and sigma, right? 
1215 00:57:22,410 --> 00:57:31,680 Because if I draw a 0, and I draw my Gaussian around mu, 1216 00:57:31,680 --> 00:57:35,430 then the probability of this Bernoulli being 1 1217 00:57:35,430 --> 00:57:39,790 is really the area under the curve here. 1218 00:57:39,790 --> 00:57:40,770 Right? 1219 00:57:40,770 --> 00:57:42,827 And this thing-- well, if mu is very large, 1220 00:57:42,827 --> 00:57:44,160 it's going to become very large. 1221 00:57:44,160 --> 00:57:48,540 If mu is very small, it's going to become very small. 1222 00:57:48,540 --> 00:57:51,961 And if sigma changes, it's also going to affect it-- 1223 00:57:51,961 --> 00:57:53,970 is that clear for everyone? 1224 00:57:53,970 --> 00:57:56,040 But we can actually compute this, right? 1225 00:57:56,040 --> 00:57:59,610 So the parameter p that I'm looking for here 1226 00:57:59,610 --> 00:58:01,650 as a function of mu and sigma is simply 1227 00:58:01,650 --> 00:58:06,270 the probability that some y is non-negative, 1228 00:58:06,270 --> 00:58:12,150 which is the probability that y minus mu divided by sigma 1229 00:58:12,150 --> 00:58:16,790 is larger than minus mu divided by sigma. 1230 00:58:16,790 --> 00:58:20,160 But when you studied probability, is that some operation you 1231 00:58:20,160 --> 00:58:22,170 were used to making? 1232 00:58:22,170 --> 00:58:26,010 Removing the mean and dividing by the standard deviation? 1233 00:58:26,010 --> 00:58:28,116 What is the effect of doing that on a Gaussian 1234 00:58:28,116 --> 00:58:30,655 random variable? 1235 00:58:30,655 --> 00:58:32,030 Yeah, so you normalize it, right? 1236 00:58:32,030 --> 00:58:33,490 And you standardize it. 1237 00:58:33,490 --> 00:58:34,900 You make it a standard Gaussian. 1238 00:58:34,900 --> 00:58:36,880 You remove the mean. 1239 00:58:36,880 --> 00:58:38,470 It becomes a mean-0 Gaussian. 1240 00:58:38,470 --> 00:58:41,170 And you scale the variance for it to become 1. 
1241 00:58:41,170 --> 00:58:43,019 So when you have a Gaussian, remove the mean 1242 00:58:43,019 --> 00:58:44,560 and divide by the standard deviation, 1243 00:58:44,560 --> 00:58:46,570 it becomes a standard Gaussian-- 1244 00:58:46,570 --> 00:58:50,975 which means this thing has an N(0, 1) distribution, 1245 00:58:50,975 --> 00:58:53,350 which is the one you can read the quantiles of at the end 1246 00:58:53,350 --> 00:58:54,640 of the book. 1247 00:58:54,640 --> 00:58:55,140 Right? 1248 00:58:55,140 --> 00:58:57,020 And that's exactly what we did. 1249 00:58:57,020 --> 00:58:57,520 OK? 1250 00:58:57,520 --> 00:59:00,190 So now you have the probability that some standard Gaussian 1251 00:59:00,190 --> 00:59:04,366 exceeds negative mu over sigma, which 1252 00:59:04,366 --> 00:59:06,490 I can write in terms of the cumulative distribution 1253 00:59:06,490 --> 00:59:07,894 function, capital phi-- 1254 00:59:14,720 --> 00:59:16,560 like we did in the first lecture. 1255 00:59:16,560 --> 00:59:19,070 So if I do this cumulative distribution function, 1256 00:59:19,070 --> 00:59:21,012 what is this probability in terms of phi? 1257 00:59:25,431 --> 00:59:26,413 [INAUDIBLE]? 1258 00:59:26,413 --> 00:59:28,307 AUDIENCE: [INAUDIBLE]. 1259 00:59:28,307 --> 00:59:30,640 PHILIPPE RIGOLLET: Well, that's what your name tag says. 1260 00:59:33,400 --> 00:59:34,366 1 minus-- 1261 00:59:34,366 --> 00:59:36,034 AUDIENCE: [INAUDIBLE]. 1262 00:59:36,034 --> 00:59:37,700 PHILIPPE RIGOLLET: 1 minus mu over sigma. 1263 00:59:37,700 --> 00:59:39,786 What happens with phi in our-- 1264 00:59:39,786 --> 00:59:43,450 do you think I defined this for fun? 1265 00:59:43,450 --> 00:59:50,210 1 minus phi of minus mu over sigma, right? 1266 00:59:50,210 --> 00:59:50,710 Right? 1267 00:59:50,710 --> 00:59:52,320 Because this is 1 minus the probability 1268 00:59:52,320 --> 00:59:53,361 that it's less than this. 
1269 00:59:53,361 --> 00:59:55,180 And this is exactly the definition 1270 00:59:55,180 --> 00:59:57,915 of the cumulative distribution function. 1271 00:59:57,915 --> 01:00:04,080 So in particular, this thing only depends on mu over sigma. 1272 01:00:04,080 --> 01:00:05,640 Agreed? 1273 01:00:05,640 --> 01:00:09,630 So in particular, if I had 2 mu over 2 sigma, 1274 01:00:09,630 --> 01:00:11,240 p would remain unchanged. 1275 01:00:11,240 --> 01:00:15,340 If I have 12 mu over 12 sigma, this thing 1276 01:00:15,340 --> 01:00:18,370 would remain unchanged, which means 1277 01:00:18,370 --> 01:00:22,780 that p does not change if I scale mu 1278 01:00:22,780 --> 01:00:25,410 and sigma by the same factor. 1279 01:00:25,410 --> 01:00:28,840 So even by observing x an 1280 01:00:28,840 --> 01:00:32,090 infinite number of times, so that I can actually get exactly what p is, 1281 01:00:32,090 --> 01:00:34,870 I'm never going to be able to get mu and sigma separately. 1282 01:00:34,870 --> 01:00:37,842 All I'm going to be able to get is mu over sigma. 1283 01:00:37,842 --> 01:00:41,730 So here, we say that mu sigma-- 1284 01:00:41,730 --> 01:00:43,120 the parameter mu sigma-- 1285 01:00:43,120 --> 01:00:46,056 or actually each of them individually-- those guys-- 1286 01:00:50,310 --> 01:00:51,769 they're not identifiable. 1287 01:00:58,810 --> 01:01:03,180 But the parameter mu over sigma is identifiable. 1288 01:01:09,180 --> 01:01:13,820 So if I wanted to write a statistical model in which 1289 01:01:13,820 --> 01:01:15,620 the parameter is identifiable-- 1290 01:01:25,660 --> 01:01:32,440 I would write 0, 1 Bernoulli. 1291 01:01:32,440 --> 01:01:41,289 And then I would write 1 minus phi of minus mu over sigma. 1292 01:01:41,289 --> 01:01:42,830 And then I would take two parameters, 1293 01:01:42,830 --> 01:01:48,940 which are mu in R and sigma squared positive. 1294 01:01:48,940 --> 01:01:52,244 So let's write sigma positive. 
1295 01:01:52,244 --> 01:01:52,744 Right? 1296 01:01:56,970 --> 01:01:59,440 No, this is not identifiable. 1297 01:01:59,440 --> 01:02:02,848 I cannot write those two guys as being two things different. 1298 01:02:12,010 --> 01:02:22,050 Instead, what I want to write is 0, 1, Bernoulli 1 minus-- 1299 01:02:26,026 --> 01:02:30,002 and now my parameter-- 1300 01:02:30,002 --> 01:02:37,610 I forgot this-- my parameter is mu over sigma. 1301 01:02:37,610 --> 01:02:41,180 Can somebody tell me where mu over sigma lives? 1302 01:02:41,180 --> 01:02:42,670 What values can this thing take? 1303 01:02:46,630 --> 01:02:48,115 Any real value, right? 1304 01:02:53,070 --> 01:02:55,920 OK, so now I've done this definitely out of convenience, 1305 01:02:55,920 --> 01:02:56,430 right? 1306 01:02:56,430 --> 01:02:59,130 Because that was the only thing I was able to identify-- 1307 01:02:59,130 --> 01:03:01,020 the ratio of mu over sigma. 1308 01:03:01,020 --> 01:03:04,260 But it's still something that has some meaning. 1309 01:03:04,260 --> 01:03:06,570 It's the normalized mean. 1310 01:03:06,570 --> 01:03:08,880 It really tells me what the mean is compared 1311 01:03:08,880 --> 01:03:10,200 to the standard deviation. 1312 01:03:10,200 --> 01:03:13,980 So in some models, in reality, in some real applications, 1313 01:03:13,980 --> 01:03:16,050 this actually might have a good meaning. 1314 01:03:16,050 --> 01:03:17,940 It's just telling me how big the mean 1315 01:03:17,940 --> 01:03:22,620 is compared to the standard fluctuations of this model. 1316 01:03:22,620 --> 01:03:24,936 But I won't be able to get more than that. 1317 01:03:24,936 --> 01:03:25,436 Agreed? 1318 01:03:30,630 --> 01:03:32,500 All right? 1319 01:03:32,500 --> 01:03:37,510 So now that we've set a parametric model, 1320 01:03:37,510 --> 01:03:40,580 let's try to see what our goals are going to be. 1321 01:03:40,580 --> 01:03:41,080 OK? 
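The scaling argument above, that p = 1 - phi(-mu/sigma) depends on (mu, sigma) only through the ratio mu/sigma, is easy to check numerically. A sketch using made-up values of mu and sigma:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p(mu, sigma):
    """P(Y >= 0) for Y ~ N(mu, sigma^2), i.e. 1 - Phi(-mu / sigma)."""
    return 1.0 - Phi(-mu / sigma)

# Scaling (mu, sigma) by the same factor leaves p unchanged:
print(p(1.0, 2.0), p(2.0, 4.0), p(12.0, 24.0))  # all three are identical
```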
1322 01:03:41,080 --> 01:03:44,740 So now we have a sample and a statistical model. 1323 01:03:44,740 --> 01:03:47,560 And we want to estimate the parameter theta, 1324 01:03:47,560 --> 01:03:49,510 and I could say, well, you know what? 1325 01:03:49,510 --> 01:03:51,190 I don't have time for this analysis. 1326 01:03:51,190 --> 01:03:53,390 Collecting data is going to take me a while. 1327 01:03:53,390 --> 01:03:55,010 So I'm just going to mmm-- 1328 01:03:55,010 --> 01:03:57,509 and I'm going to say that mu over sigma is 4. 1329 01:03:57,509 --> 01:03:59,050 And I'm just going to give it to you. 1330 01:03:59,050 --> 01:04:00,970 And maybe you will tell me, yeah, 1331 01:04:00,970 --> 01:04:02,140 it's not very good, right? 1332 01:04:02,140 --> 01:04:04,870 So we need some measure of performance 1333 01:04:04,870 --> 01:04:05,770 of a given parameter. 1334 01:04:05,770 --> 01:04:09,310 We need to be able to evaluate if eyeballing the problem 1335 01:04:09,310 --> 01:04:11,680 is worse than actually collecting 1336 01:04:11,680 --> 01:04:13,030 a large amount of data. 1337 01:04:13,030 --> 01:04:16,150 We need to know if even if I come up with an estimator that 1338 01:04:16,150 --> 01:04:18,520 actually sort of uses the data, does it 1339 01:04:18,520 --> 01:04:20,620 make an efficient use of the data? 1340 01:04:20,620 --> 01:04:22,750 Would I actually need 10 times more observations 1341 01:04:22,750 --> 01:04:24,279 to achieve the same accuracy? 1342 01:04:24,279 --> 01:04:25,820 To be able to answer these questions, 1343 01:04:25,820 --> 01:04:28,390 well, I need to define what accuracy means. 1344 01:04:28,390 --> 01:04:30,550 And accuracy is something that sort of makes sense. 1345 01:04:30,550 --> 01:04:31,716 It says, well, I want theta hat 1346 01:04:31,716 --> 01:04:33,520 to be close to theta. 1347 01:04:33,520 --> 01:04:35,344 And theta hat is a random variable. 
1348 01:04:35,344 --> 01:04:36,760 So I'm going to have to understand 1349 01:04:36,760 --> 01:04:38,218 what it means for a random variable 1350 01:04:38,218 --> 01:04:40,570 to be close to a deterministic number. 1351 01:04:40,570 --> 01:04:44,354 And so, what is an estimator of a parameter, right? 1352 01:04:44,354 --> 01:04:46,770 So I have an estimator, and I said it's a random variable. 1353 01:04:49,692 --> 01:04:51,640 And the formal definition-- 1354 01:04:59,920 --> 01:05:10,470 so an estimator is a measurable function of the data. 1355 01:05:10,470 --> 01:05:12,570 So when I write theta hat, and that 1356 01:05:12,570 --> 01:05:18,060 will typically be my notation for an estimator, right? 1357 01:05:18,060 --> 01:05:24,820 I should really write theta hat of x1, ..., xn. 1358 01:05:24,820 --> 01:05:25,380 OK? 1359 01:05:25,380 --> 01:05:26,930 That's what an estimator is. 1360 01:05:26,930 --> 01:05:28,620 If you want to know what an estimator is, 1361 01:05:28,620 --> 01:05:30,400 this is a measurable function of the data. 1362 01:05:30,400 --> 01:05:35,187 And it's actually also known as a statistic. 1363 01:05:37,767 --> 01:05:39,350 And you know, 1364 01:05:39,350 --> 01:05:43,340 I see it every time I have, like, 1365 01:05:43,340 --> 01:05:47,250 you know, a dinner with normal people. 1366 01:05:47,250 --> 01:05:48,630 And I say I'm a statistician. 1367 01:05:48,630 --> 01:05:50,660 Oh, yeah, I really like baseball. 1368 01:05:50,660 --> 01:05:53,210 And they talk to me about batting averages. 1369 01:05:53,210 --> 01:05:54,119 That's not what I do. 1370 01:05:54,119 --> 01:05:55,910 But for them, that's what it is, and that's 1371 01:05:55,910 --> 01:05:58,010 because in a way, that's what a statistic is. 1372 01:05:58,010 --> 01:06:00,450 A batting average is a statistic. 1373 01:06:00,450 --> 01:06:02,240 OK, and so here are some examples. 1374 01:06:02,240 --> 01:06:04,250 You can take the average xn bar.
1375 01:06:04,250 --> 01:06:06,400 You can take the maximum of your observation. 1376 01:06:06,400 --> 01:06:07,370 That's a statistic. 1377 01:06:07,370 --> 01:06:08,990 You can take the first one. 1378 01:06:08,990 --> 01:06:10,820 You can take the first one plus log of 1 1379 01:06:10,820 --> 01:06:12,830 plus the absolute value of the last one. 1380 01:06:12,830 --> 01:06:15,980 You can do whatever you want; that will be an estimator. 1381 01:06:15,980 --> 01:06:17,780 Some of them are clearly going to be bad. 1382 01:06:17,780 --> 01:06:20,090 But that's still a statistic, and you can do this. 1383 01:06:20,090 --> 01:06:24,610 Now, when I say measurable, I always have-- 1384 01:06:24,610 --> 01:06:26,277 so you know, graduate students sometimes 1385 01:06:26,277 --> 01:06:28,943 ask me like, yeah, how do I know if this estimator is measurable 1386 01:06:28,943 --> 01:06:29,480 or not. 1387 01:06:29,480 --> 01:06:31,710 And usually, my answer is, well, if I give you data, 1388 01:06:31,710 --> 01:06:32,866 can you compute it? 1389 01:06:32,866 --> 01:06:35,240 And they say, yeah, and I'm like, well, then it's measurable. 1390 01:06:35,240 --> 01:06:38,390 That's a very good rule to check if you can actually-- 1391 01:06:38,390 --> 01:06:40,970 if something is actually measurable. 1392 01:06:40,970 --> 01:06:42,560 When is this thing non-measurable? 1393 01:06:42,560 --> 01:06:44,750 It's when it's implicitly defined. 1394 01:06:44,750 --> 01:06:46,700 OK, and in particular, the things 1395 01:06:46,700 --> 01:06:48,880 that give you problems are-- 1396 01:06:52,370 --> 01:06:53,525 sup or inf. 1397 01:06:53,525 --> 01:06:55,820 Anybody knows what a sup or an inf is? 1398 01:06:55,820 --> 01:06:57,560 It's like a max or a min. 1399 01:06:57,560 --> 01:06:59,720 But it's not always attained. 1400 01:06:59,720 --> 01:07:02,120 OK, so if I have x1--
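The example statistics listed a moment ago can be written out concretely. This is a small illustrative sketch with made-up numbers, not part of the lecture:

```python
from math import log

# A statistic is any (measurable) function of the data x1, ..., xn.
x = [2.0, 5.0, 1.0, 3.0, 4.0]   # made-up sample

xbar = sum(x) / len(x)               # the average xn bar
xmax = max(x)                        # the maximum observation
first = x[0]                         # just the first observation
fancy = x[0] + log(1 + abs(x[-1]))   # first one + log(1 + |last one|)

# All four are statistics; some of them (like `first`) are clearly
# bad estimators of the mean, but they are statistics nonetheless.
```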
1401 01:07:02,120 --> 01:07:06,800 So if I look at the infimum of the function 1402 01:07:06,800 --> 01:07:11,610 f of x for x on the real line-- sorry, 1403 01:07:11,610 --> 01:07:13,880 let's say x on the interval from 1 to infinity. 1404 01:07:13,880 --> 01:07:16,970 And f of x is equal to 1 over x. 1405 01:07:16,970 --> 01:07:18,070 Right? 1406 01:07:18,070 --> 01:07:20,030 Then the infimum is the smallest value 1407 01:07:20,030 --> 01:07:22,850 it can take, except that it doesn't really 1408 01:07:22,850 --> 01:07:28,270 take it at 0, right, because 1 over x is going to 0. 1409 01:07:28,270 --> 01:07:30,240 But it's never really getting there. 1410 01:07:30,240 --> 01:07:32,590 So we just call the inf 0. 1411 01:07:32,590 --> 01:07:34,340 But it's not a value that it ever takes. 1412 01:07:34,340 --> 01:07:37,952 And these things might actually be complicated to compute. 1413 01:07:37,952 --> 01:07:40,160 And so that's when you actually have problems, right? 1414 01:07:40,160 --> 01:07:41,870 When the limit is not-- 1415 01:07:41,870 --> 01:07:44,030 you're not really quite reaching the limit. 1416 01:07:44,030 --> 01:07:47,481 You won't have this problem in general, but just so you know, 1417 01:07:47,481 --> 01:07:48,980 an estimator is not really anything. 1418 01:07:48,980 --> 01:07:51,440 It has to actually be measurable. 1419 01:07:51,440 --> 01:07:54,630 OK, so the first thing we want to know-- I mentioned it-- 1420 01:07:54,630 --> 01:07:57,690 so an estimator is a statistic which does not depend on theta, 1421 01:07:57,690 --> 01:07:58,670 of course. 1422 01:07:58,670 --> 01:08:01,430 So if I give you the data, you have to be able to compute it. 1423 01:08:01,430 --> 01:08:04,250 And that should not require knowing any unknown 1424 01:08:04,250 --> 01:08:06,840 parameters. 1425 01:08:06,840 --> 01:08:11,070 OK, so an estimator is said to be consistent.
1426 01:08:11,070 --> 01:08:13,790 When my data-- when I collect more and more data, this thing 1427 01:08:13,790 --> 01:08:16,130 is getting closer and closer to the true parameter. 1428 01:08:16,130 --> 01:08:16,629 All right? 1429 01:08:16,629 --> 01:08:20,080 And we said that eyeballing and saying that it's going to be 4 1430 01:08:20,080 --> 01:08:21,770 is not really something that's probably 1431 01:08:21,770 --> 01:08:22,907 going to be consistent. 1432 01:08:22,907 --> 01:08:24,740 But we can have things that are consistent 1433 01:08:24,740 --> 01:08:28,850 but that are converging to theta at different speeds. 1434 01:08:28,850 --> 01:08:29,816 OK? 1435 01:08:29,816 --> 01:08:32,479 And we know also that this is a random variable. 1436 01:08:32,479 --> 01:08:33,649 It converges to something. 1437 01:08:33,649 --> 01:08:35,390 And there might be some different notions 1438 01:08:35,390 --> 01:08:36,556 of convergence that kick in. 1439 01:08:36,556 --> 01:08:38,060 And actually there are. 1440 01:08:38,060 --> 01:08:40,850 And we say that it's weakly consistent if it converges 1441 01:08:40,850 --> 01:08:43,670 in probability and strongly consistent 1442 01:08:43,670 --> 01:08:46,310 if it converges almost surely. 1443 01:08:46,310 --> 01:08:46,970 OK? 1444 01:08:46,970 --> 01:08:48,890 And this is just vocabulary. 1445 01:08:48,890 --> 01:08:50,581 It won't make a big difference. 1446 01:08:50,581 --> 01:08:51,080 OK? 1447 01:08:51,080 --> 01:08:56,228 So we will typically say it's consistent with any of the two. 1448 01:08:56,228 --> 01:08:57,707 AUDIENCE: [INAUDIBLE]. 1449 01:09:02,637 --> 01:09:07,488 PHILIPPE RIGOLLET: Well, so in parametric statistics, 1450 01:09:07,488 --> 01:09:09,529 it's actually a little difficult to come up with. 1451 01:09:09,529 --> 01:09:15,022 But in non-parametric ones, I could just say, if I had xi, 1452 01:09:15,022 --> 01:09:24,180 yi, and I know that yi is f of xi plus noise epsilon i.
1453 01:09:24,180 --> 01:09:26,800 And I know that f belongs to some class of functions, 1454 01:09:26,800 --> 01:09:27,930 let's say-- 1455 01:09:27,930 --> 01:09:31,310 [INAUDIBLE] class of smooth functions-- it's massive. 1456 01:09:31,310 --> 01:09:33,810 And now, I'm going to actually find the following estimator. 1457 01:09:33,810 --> 01:09:35,279 I'm going to take the average. 1458 01:09:35,279 --> 01:09:36,945 So I'm going to do least squares, right? 1459 01:09:40,310 --> 01:09:41,720 So I just check. 1460 01:09:41,720 --> 01:09:44,300 I'm trying to minimize the distance of each of my f of xi 1461 01:09:44,300 --> 01:09:45,979 to my yi. 1462 01:09:45,979 --> 01:09:49,700 And now, I want to find the smallest of them. 1463 01:09:49,700 --> 01:09:56,240 So if I look at the infimum here, then the question is-- 1464 01:09:56,240 --> 01:09:57,590 so that could be-- 1465 01:09:57,590 --> 01:09:59,750 well, that's not really an estimator for f. 1466 01:09:59,750 --> 01:10:02,630 But it's an estimator for the smallest possible value. 1467 01:10:02,630 --> 01:10:04,550 And so for example, this is actually 1468 01:10:04,550 --> 01:10:07,340 an estimator for the variance sigma squared. 1469 01:10:07,340 --> 01:10:09,990 This might not be attained, and this might not 1470 01:10:09,990 --> 01:10:13,710 be measurable if f is massive. 1471 01:10:13,710 --> 01:10:16,285 All right, so that's the infimum over some class f of x. 1472 01:10:16,285 --> 01:10:18,150 OK? 1473 01:10:18,150 --> 01:10:20,810 So it's always things that are defined implicitly. 1474 01:10:20,810 --> 01:10:24,982 If it's an average, for example, it's completely measurable. 1475 01:10:24,982 --> 01:10:27,467 OK? 1476 01:10:27,467 --> 01:10:28,958 Any other question?
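To see the sup/inf point from the discussion above numerically, here is a tiny sketch of my own (not from the lecture): the infimum of f(x) = 1/x over x >= 1 is 0, yet no x ever attains it.

```python
def f(x):
    # f(x) = 1/x on [1, infinity); its infimum is 0, never attained.
    return 1.0 / x

# On any finite grid the minimum stays strictly positive...
grid_min = min(f(x) for x in range(1, 10001))   # = 1/10000

# ...and it only creeps toward 0 as the grid extends to the right,
# which is exactly why an inf or sup can be delicate to "compute".
```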
1477 01:10:31,950 --> 01:10:37,809 OK, so we know that the first thing we might want to check, 1478 01:10:37,809 --> 01:10:40,350 and that's definitely something we want about estimators that 1479 01:10:40,350 --> 01:10:43,020 is consistent, because all consistency tells 1480 01:10:43,020 --> 01:10:45,924 us is that just as I collect more and more data, 1481 01:10:45,924 --> 01:10:47,382 my estimator is going to get closer 1482 01:10:47,382 --> 01:10:51,300 and closer to the parameter. 1483 01:10:51,300 --> 01:10:52,930 There's other things we can look at. 1484 01:10:52,930 --> 01:10:55,600 For each possible value of n-- now, right now, 1485 01:10:55,600 --> 01:11:00,560 I have a finite number of observations-- 1486 01:11:00,560 --> 01:11:01,850 25. 1487 01:11:01,850 --> 01:11:04,450 And I want to know something about my estimator. 1488 01:11:04,450 --> 01:11:08,672 The first thing I want to check is maybe if in average, right? 1489 01:11:08,672 --> 01:11:09,880 So this is a random variable. 1490 01:11:09,880 --> 01:11:11,860 Is this random variable in average 1491 01:11:11,860 --> 01:11:14,540 going to be close to theta or not? 1492 01:11:14,540 --> 01:11:17,260 And so the difference how far I am from theta 1493 01:11:17,260 --> 01:11:20,140 is actually called the bias. 1494 01:11:20,140 --> 01:11:28,400 So the bias of an estimator is the expectation of theta hat 1495 01:11:28,400 --> 01:11:31,640 minus the value that I hope it gets, which is theta. 1496 01:11:31,640 --> 01:11:38,642 If this thing is equal to 0, we say that theta hat is unbiased. 1497 01:11:42,030 --> 01:11:44,680 And unbiased estimators are things that people 1498 01:11:44,680 --> 01:11:46,967 are looking for in general. 1499 01:11:46,967 --> 01:11:49,300 The problem is that there's lots of unbiased estimators. 
1500 01:11:49,300 --> 01:11:52,829 And so it might be misleading to look for unbiasedness 1501 01:11:52,829 --> 01:11:54,370 when that's not really the only thing 1502 01:11:54,370 --> 01:11:55,870 you should be looking for. 1503 01:11:55,870 --> 01:11:58,690 OK, so what does it mean to be unbiased? 1504 01:11:58,690 --> 01:12:00,432 Maybe for this particular round of data 1505 01:12:00,432 --> 01:12:02,140 you collected, you're actually pretty far 1506 01:12:02,140 --> 01:12:04,240 from the true parameter. 1507 01:12:04,240 --> 01:12:08,440 But one thing that actually-- 1508 01:12:08,440 --> 01:12:12,580 what it means is that if I redid this experiment over, and over, 1509 01:12:12,580 --> 01:12:16,870 and over again, and I averaged all the values of my estimators 1510 01:12:16,870 --> 01:12:19,770 that I got, then this would actually be the right-- 1511 01:12:19,770 --> 01:12:21,371 the true parameter. 1512 01:12:21,371 --> 01:12:21,870 OK. 1513 01:12:21,870 --> 01:12:22,990 That's what it means. 1514 01:12:22,990 --> 01:12:25,360 If I were to repeat this experiment, 1515 01:12:25,360 --> 01:12:27,610 in average, I would actually get the right thing. 1516 01:12:27,610 --> 01:12:30,300 But you don't get to repeat the experiment. 1517 01:12:30,300 --> 01:12:33,070 OK, just a remark about estimators, 1518 01:12:33,070 --> 01:12:34,910 look at this estimator-- xn bar. 1519 01:12:34,910 --> 01:12:35,410 Right? 1520 01:12:35,410 --> 01:12:36,670 Think of the kiss example. 1521 01:12:36,670 --> 01:12:39,605 I'm looking at the average of my observations. 1522 01:12:39,605 --> 01:12:41,980 And I want to know what the expectation of this thing is. 1523 01:12:44,670 --> 01:12:45,780 OK? 1524 01:12:45,780 --> 01:12:56,843 Now, this guy is by linearity of the expectation, 1525 01:12:56,843 --> 01:12:59,710 it is this, right? 1526 01:12:59,710 --> 01:13:03,850 But my data is identically distributed.
1527 01:13:03,850 --> 01:13:07,060 So in particular, all the xi's have the same expectation, 1528 01:13:07,060 --> 01:13:09,070 right? 1529 01:13:09,070 --> 01:13:10,734 Everybody agrees with this. 1530 01:13:10,734 --> 01:13:12,150 When it's identically distributed, 1531 01:13:12,150 --> 01:13:14,870 they'll get the same expectation. 1532 01:13:14,870 --> 01:13:17,950 So what it means is that this guy's here-- 1533 01:13:17,950 --> 01:13:22,210 they're all equal to the expectation of x1. 1534 01:13:22,210 --> 01:13:23,710 Right? 1535 01:13:23,710 --> 01:13:25,980 So what it means is that these guys-- 1536 01:13:25,980 --> 01:13:28,280 I have the average of the same number. 1537 01:13:28,280 --> 01:13:31,830 So this is actually the expectation of x1. 1538 01:13:31,830 --> 01:13:32,517 OK? 1539 01:13:32,517 --> 01:13:33,100 And it's true. 1540 01:13:33,100 --> 01:13:36,836 In the kiss example, this was p. 1541 01:13:36,836 --> 01:13:37,780 And this is p-- 1542 01:13:40,620 --> 01:13:43,200 the probability of turning your head right. 1543 01:13:43,200 --> 01:13:43,840 OK? 1544 01:13:43,840 --> 01:13:45,670 So those two things are the same. 1545 01:13:45,670 --> 01:13:50,020 In particular, that means that xn bar and just x1 1546 01:13:50,020 --> 01:13:54,120 have the same bias. 1547 01:13:54,120 --> 01:13:56,100 So that should probably illustrate to you 1548 01:13:56,100 --> 01:13:59,400 that bias is not something that really is telling you 1549 01:13:59,400 --> 01:14:02,930 the entire picture, Right? 1550 01:14:02,930 --> 01:14:05,350 I can take only one of my observations-- 1551 01:14:05,350 --> 01:14:06,484 Bernoulli 0, 1. 1552 01:14:06,484 --> 01:14:07,900 This thing will have the same bias 1553 01:14:07,900 --> 01:14:10,880 as if I average 1,000 of them. 1554 01:14:10,880 --> 01:14:13,350 But the bias is really telling you where I am in average. 1555 01:14:13,350 --> 01:14:16,261 But it's really not telling me what fluctuations I'm getting. 
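The claim that x1 and xn bar have the same bias but very different fluctuations can be checked by simulation. A sketch with arbitrary made-up parameters, not from the lecture:

```python
import random

random.seed(1)
p = 0.4            # arbitrary "true" Bernoulli parameter
n, reps = 50, 20_000

sum_xbar, sum_x1 = 0.0, 0.0
for _ in range(reps):
    xs = [1.0 if random.random() < p else 0.0 for _ in range(n)]
    sum_xbar += sum(xs) / n   # the average of the sample
    sum_x1 += xs[0]           # just the first observation

bias_xbar = sum_xbar / reps - p   # ~ 0: xn bar is unbiased
bias_x1 = sum_x1 / reps - p       # ~ 0 as well: same bias as xn bar
```

Both estimated biases come out near zero, even though a single observation fluctuates between 0 and 1 while the average of 50 stays close to p; the bias alone cannot distinguish them.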
1556 01:14:16,261 --> 01:14:18,510 And so if you want to start having fluctuations coming 1557 01:14:18,510 --> 01:14:20,460 into the picture, we actually have 1558 01:14:20,460 --> 01:14:22,350 to look at the risk or the quadratic risk 1559 01:14:22,350 --> 01:14:23,852 of the estimator. 1560 01:14:23,852 --> 01:14:25,560 And so the quadratic risk is defined as 1561 01:14:25,560 --> 01:14:28,590 the expectation of the square distance between theta hat 1562 01:14:28,590 --> 01:14:30,770 and theta. 1563 01:14:30,770 --> 01:14:33,155 OK? 1564 01:14:33,155 --> 01:14:34,520 So let's look at this. 1565 01:14:42,360 --> 01:14:43,830 So the quadratic risk-- 1566 01:14:47,270 --> 01:14:48,710 sometimes it's denoted-- people 1567 01:14:48,710 --> 01:14:57,590 call it the l2 risk of theta hat, of course. 1568 01:14:57,590 --> 01:14:59,950 I'm sorry for maintaining such an ugly board. 1569 01:14:59,950 --> 01:15:00,910 [INAUDIBLE] this stuff. 1570 01:15:09,460 --> 01:15:10,960 OK, so I look at the square distance 1571 01:15:10,960 --> 01:15:12,070 between theta hat and theta. 1572 01:15:12,070 --> 01:15:14,530 This is still-- this is a function of a random variable. 1573 01:15:14,530 --> 01:15:16,250 So it's a random variable as well. 1574 01:15:16,250 --> 01:15:19,081 And now I'm looking at the expectation of this guy. 1575 01:15:19,081 --> 01:15:23,060 That's the definition. 1576 01:15:23,060 --> 01:15:25,560 I claim that when this thing goes to 0, then 1577 01:15:25,560 --> 01:15:28,350 my estimator is actually going to be consistent. 1578 01:15:28,350 --> 01:15:30,298 Everybody agrees with this? 1579 01:15:37,116 --> 01:15:47,715 So if it goes to zero as n goes to infinity-- and here, 1580 01:15:47,715 --> 01:15:50,090 I don't need to tell you what kind of convergence I have, 1581 01:15:50,090 --> 01:15:51,715 because this is just a number, right? 1582 01:15:51,715 --> 01:15:52,640 It's an expectation.
1583 01:15:52,640 --> 01:15:57,260 So it's a regular, usual calculus-style convergence. 1584 01:15:57,260 --> 01:16:03,326 Then that implies that theta hat is actually weakly consistent. 1585 01:16:07,294 --> 01:16:09,774 What did I use to tell you this? 1586 01:16:14,740 --> 01:16:17,450 Yeah, this is the convergence in l2. 1587 01:16:17,450 --> 01:16:19,950 This actually is strictly equivalent. 1588 01:16:19,950 --> 01:16:26,264 This is by definition saying that theta hat converges in l2 1589 01:16:26,264 --> 01:16:29,360 to theta. 1590 01:16:29,360 --> 01:16:31,190 And we know that convergence in l2 1591 01:16:31,190 --> 01:16:37,305 implies convergence in probability to theta. 1592 01:16:37,305 --> 01:16:38,180 That was the picture. 1593 01:16:38,180 --> 01:16:40,160 We're going up. 1594 01:16:40,160 --> 01:16:42,498 And this is actually equivalent to consistency 1595 01:16:42,498 --> 01:16:46,329 by definition-- weak consistency. 1596 01:16:46,329 --> 01:16:48,370 OK, so this is actually telling you a little more 1597 01:16:48,370 --> 01:16:50,380 because this guy here-- 1598 01:16:50,380 --> 01:16:52,810 they are both unbiased. 1599 01:16:52,810 --> 01:16:55,010 So xn bar is unbiased. 1600 01:16:55,010 --> 01:16:56,470 X1 is unbiased. 1601 01:16:56,470 --> 01:16:58,060 But x1 is certainly not consistent, 1602 01:16:58,060 --> 01:17:01,150 because the more data I collect, I'm not even doing anything 1603 01:17:01,150 --> 01:17:01,650 with it. 1604 01:17:01,650 --> 01:17:04,120 I'm just taking the first data point you're giving to me. 1605 01:17:04,120 --> 01:17:05,650 So they're both unbiased. 1606 01:17:05,650 --> 01:17:07,420 But this one is not consistent. 1607 01:17:07,420 --> 01:17:09,930 And this one we'll see is actually consistent. 1608 01:17:09,930 --> 01:17:11,370 xn bar is consistent. 1609 01:17:11,370 --> 01:17:14,358 And actually, we've seen that last time. 1610 01:17:14,358 --> 01:17:15,816 And that's because of the?
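The step from the risk going to 0 to weak consistency can be spelled out; it is Markov's inequality applied to the squared deviation:

```latex
\mathbb{P}\left(|\hat{\theta}_n - \theta| \ge \varepsilon\right)
  = \mathbb{P}\left((\hat{\theta}_n - \theta)^2 \ge \varepsilon^2\right)
  \le \frac{\mathbb{E}\left[(\hat{\theta}_n - \theta)^2\right]}{\varepsilon^2}
  \xrightarrow[n \to \infty]{} 0
  \quad \text{for every } \varepsilon > 0,
```

which is exactly convergence in probability of theta hat to theta, that is, weak consistency.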
1611 01:17:19,704 --> 01:17:23,752 What guarantees the fact that xn bar is consistent? 1612 01:17:23,752 --> 01:17:25,210 AUDIENCE: The law of large numbers. 1613 01:17:25,210 --> 01:17:26,110 PHILIPPE RIGOLLET: The law of large numbers, right? 1614 01:17:26,110 --> 01:17:27,780 Actually, it's strongly consistent 1615 01:17:27,780 --> 01:17:29,880 if you have a strong law of large numbers. 1616 01:17:29,880 --> 01:17:35,580 OK, so just in the last two minutes, 1617 01:17:35,580 --> 01:17:39,920 I want to tell you a little bit about how this risk is linked 1618 01:17:39,920 --> 01:17:43,350 to the bias and the variance: the quadratic risk is equal to the bias 1619 01:17:43,350 --> 01:17:44,840 squared plus the variance. 1620 01:17:44,840 --> 01:17:48,030 So let's see what I mean by this. 1621 01:17:48,030 --> 01:17:50,030 So I'm going to forget about the absolute values, 1622 01:17:50,030 --> 01:17:50,988 since we have a square. 1623 01:17:50,988 --> 01:17:54,600 I don't really need them. 1624 01:17:54,600 --> 01:17:57,530 If theta hat was unbiased, this thing 1625 01:17:57,530 --> 01:18:01,619 would be the expectation of theta hat. 1626 01:18:01,619 --> 01:18:02,660 It might not be the case. 1627 01:18:02,660 --> 01:18:06,464 So let me see how I can actually see-- put the bias in there. 1628 01:18:06,464 --> 01:18:07,880 Well, one way to do this is to see 1629 01:18:07,880 --> 01:18:10,170 that this is equal to the expectation of theta 1630 01:18:10,170 --> 01:18:13,110 hat minus the expectation of theta hat, 1631 01:18:13,110 --> 01:18:17,160 plus the expectation of theta hat minus theta. 1632 01:18:21,450 --> 01:18:22,620 OK? 1633 01:18:22,620 --> 01:18:24,860 I just removed and added the same thing. 1634 01:18:24,860 --> 01:18:27,030 So I didn't change anything. 1635 01:18:27,030 --> 01:18:29,840 Now, this guy is my bias, right? 1636 01:18:32,570 --> 01:18:34,680 So now let me expand the square.
1637 01:18:34,680 --> 01:18:37,680 So what I get is the expectation of the square of theta 1638 01:18:37,680 --> 01:18:39,295 hat minus its expectation. 1639 01:18:42,480 --> 01:18:45,850 I should put some square brackets-- 1640 01:18:45,850 --> 01:18:50,410 plus two times the cross-product. 1641 01:18:50,410 --> 01:18:52,900 So the cross-product is what? Expectation 1642 01:18:52,900 --> 01:18:59,740 of theta hat minus the expectation of theta hat times 1643 01:18:59,740 --> 01:19:03,516 expectation of theta hat minus theta. 1644 01:19:07,250 --> 01:19:08,892 And then I have the last square. 1645 01:19:17,830 --> 01:19:22,890 Expectation of theta hat minus theta squared. 1646 01:19:22,890 --> 01:19:24,240 OK? 1647 01:19:24,240 --> 01:19:27,100 So square, cross-product, square. 1648 01:19:27,100 --> 01:19:29,980 Everybody is with me? 1649 01:19:29,980 --> 01:19:32,830 Now this guy here-- 1650 01:19:32,830 --> 01:19:35,070 if you pay attention, this thing is the expectation 1651 01:19:35,070 --> 01:19:36,070 of some random variable. 1652 01:19:36,070 --> 01:19:38,450 So it's a deterministic number. 1653 01:19:38,450 --> 01:19:39,800 Theta is the true parameter. 1654 01:19:39,800 --> 01:19:41,810 It's a deterministic number. 1655 01:19:41,810 --> 01:19:44,020 So what I can do is pull this entire thing out 1656 01:19:44,020 --> 01:19:52,090 of the expectation like this and compute the expectation only 1657 01:19:52,090 --> 01:19:53,607 with respect to that part. 1658 01:19:53,607 --> 01:19:56,529 But what is the expectation of this thing? 1659 01:19:59,460 --> 01:20:00,240 It's zero, right? 1660 01:20:00,240 --> 01:20:02,323 The expectation of theta hat minus the expectation 1661 01:20:02,323 --> 01:20:03,852 of theta hat is 0. 1662 01:20:03,852 --> 01:20:07,900 So this entire thing is equal to 0.
1663 01:20:07,900 --> 01:20:12,200 So now when I actually collect back my quadratic terms-- 1664 01:20:12,200 --> 01:20:15,980 my two squared terms in this expansion-- 1665 01:20:15,980 --> 01:20:18,650 what I get is that the expectation 1666 01:20:18,650 --> 01:20:21,800 of theta hat minus theta squared is 1667 01:20:21,800 --> 01:20:26,450 equal to the expectation of theta hat minus expectation 1668 01:20:26,450 --> 01:20:32,220 of theta hat squared plus the square of expectation 1669 01:20:32,220 --> 01:20:35,650 of theta hat minus theta. 1670 01:20:40,550 --> 01:20:41,190 Right? 1671 01:20:41,190 --> 01:20:42,600 So those are just the two-- 1672 01:20:42,600 --> 01:20:46,560 the first and the last term of the previous equality. 1673 01:20:46,560 --> 01:20:48,690 Now, here I have the expectation of the square 1674 01:20:48,690 --> 01:20:50,481 of the difference between a random variable 1675 01:20:50,481 --> 01:20:52,440 and its expectation. 1676 01:20:52,440 --> 01:20:56,100 This is otherwise known as the variance, right? 1677 01:20:56,100 --> 01:21:03,660 So this is actually equal to the variance of theta hat. 1678 01:21:03,660 --> 01:21:05,690 And well, this was the bias. 1679 01:21:05,690 --> 01:21:07,200 We already said that's there. 1680 01:21:07,200 --> 01:21:09,385 So this whole thing is the bias squared. 1681 01:21:12,531 --> 01:21:13,030 OK? 1682 01:21:13,030 --> 01:21:15,550 And hence the quadratic risk is the sum 1683 01:21:15,550 --> 01:21:18,010 of the variance and the squared bias. 1684 01:21:18,010 --> 01:21:18,940 Why squared bias? 1685 01:21:18,940 --> 01:21:21,130 Well, because otherwise, you would be adding dollars 1686 01:21:21,130 --> 01:21:22,799 to dollars squared. 1687 01:21:22,799 --> 01:21:24,715 So you need to add dollars squared and dollars 1688 01:21:24,715 --> 01:21:27,870 squared so that this thing is actually homogeneous.
1689 01:21:27,870 --> 01:21:30,910 So if x is in dollars, then the bias is in dollars, 1690 01:21:30,910 --> 01:21:32,870 but the variance is in dollars squared. 1691 01:21:32,870 --> 01:21:35,036 OK, and the square here forced you to put everything 1692 01:21:35,036 --> 01:21:36,150 on the square scale. 1693 01:21:36,150 --> 01:21:39,190 All right, so what's nice is that if the quadratic risk goes 1694 01:21:39,190 --> 01:21:42,820 to 0, then since I have the sum of two positive terms, 1695 01:21:42,820 --> 01:21:45,040 both of them have to go to 0. 1696 01:21:45,040 --> 01:21:46,930 That means that my variance is going to 0-- 1697 01:21:46,930 --> 01:21:48,790 very little fluctuations. 1698 01:21:48,790 --> 01:21:51,580 And my bias is also going to 0, which means that I'm actually 1699 01:21:51,580 --> 01:21:53,260 going to be on target once I reduce 1700 01:21:53,260 --> 01:21:55,450 my fluctuations, because it's one thing to reduce 1701 01:21:55,450 --> 01:21:56,732 the fluctuations. 1702 01:21:56,732 --> 01:21:58,690 But if I'm not on target, it's an issue, right? 1703 01:21:58,690 --> 01:22:03,422 For example, the estimator for the value 4 has no variance. 1704 01:22:03,422 --> 01:22:05,380 Every time I'm going to repeat the experiments, 1705 01:22:05,380 --> 01:22:07,510 I'm going to get 4, 4, 4, 4-- 1706 01:22:07,510 --> 01:22:08,890 variance is 0. 1707 01:22:08,890 --> 01:22:10,390 But the bias is bad. 1708 01:22:10,390 --> 01:22:12,880 The bias is 4 minus theta. 1709 01:22:12,880 --> 01:22:17,140 And if theta is far from 4, that's not doing very well. 1710 01:22:17,140 --> 01:22:21,420 OK, so next week, we will-- 1711 01:22:21,420 --> 01:22:25,060 we'll talk about what is a good estimate-- 1712 01:22:25,060 --> 01:22:26,740 how estimators change if they have 1713 01:22:26,740 --> 01:22:32,440 high variance or low variance or high bias and low bias. 1714 01:22:32,440 --> 01:22:35,640 And we'll talk about confidence intervals as well.
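As a closing aside not from the lecture, the bias-variance decomposition derived above can be sanity-checked by simulation. The estimator below is an arbitrary, deliberately biased one made up for illustration:

```python
import random

random.seed(2)
theta = 1.0        # arbitrary "true" parameter
reps = 50_000

def theta_hat():
    # Mean of 10 Gaussian draws, shifted by 0.5 to force a bias of 0.5.
    return sum(random.gauss(theta, 1.0) for _ in range(10)) / 10 + 0.5

vals = [theta_hat() for _ in range(reps)]
m = sum(vals) / reps
risk = sum((v - theta) ** 2 for v in vals) / reps   # E[(theta hat - theta)^2]
var = sum((v - m) ** 2 for v in vals) / reps        # variance of theta hat
bias_sq = (m - theta) ** 2                          # squared bias, ~ 0.25

# In-sample, risk = variance + bias^2 holds exactly (up to rounding),
# mirroring the decomposition on the board: here variance ~ 1/10.
```

The "eyeball" estimator that always answers 4 is the extreme case: its variance term is 0, and its entire risk is the squared bias (4 minus theta) squared.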