1
00:00:00,000 --> 00:00:00,590
Hi.
2
00:00:00,590 --> 00:00:03,110
In this problem, Romeo and
Juliet are back and they're
3
00:00:03,110 --> 00:00:05,470
still looking to meet
up for a date.
4
00:00:05,470 --> 00:00:07,944
Remember, the last time we met
up with them, it was back in
5
00:00:07,944 --> 00:00:09,613
the beginning of the course and
they were trying to meet
6
00:00:09,613 --> 00:00:11,510
up for a date but they weren't
always punctual.
7
00:00:11,510 --> 00:00:15,570
So we modeled their delay as
uniformly distributed between
8
00:00:15,570 --> 00:00:18,060
0 and 1 hour.
9
00:00:18,060 --> 00:00:19,770
So now in this problem,
we're actually
10
00:00:19,770 --> 00:00:21,610
going to look at a variation.
11
00:00:21,610 --> 00:00:24,020
And we're going to ask the
question, how do we actually
12
00:00:24,020 --> 00:00:26,970
know that the delay is
uniformly distributed between
13
00:00:26,970 --> 00:00:29,190
0 and 1 hour?
14
00:00:29,190 --> 00:00:31,900
It could just as well be
uniformly
15
00:00:31,900 --> 00:00:34,190
distributed between 0
and half an hour, or
16
00:00:34,190 --> 00:00:35,600
between 0 and two hours.
17
00:00:35,600 --> 00:00:38,150
How do we actually know what
this parameter of the uniform
18
00:00:38,150 --> 00:00:39,844
distribution is?
19
00:00:39,844 --> 00:00:44,030
OK, so let's put ourselves in
the shoes of Romeo who's tired
20
00:00:44,030 --> 00:00:46,690
of being stood up by Juliet
on all these dates.
21
00:00:46,690 --> 00:00:49,870
And fortunately, he's learned
some probability since the
22
00:00:49,870 --> 00:00:51,590
beginning of the course,
and so have we.
23
00:00:51,590 --> 00:00:54,470
And in particular we've learned
Bayesian inference.
24
00:00:54,470 --> 00:00:56,120
And so in this problem, we're
actually going to use
25
00:00:56,120 --> 00:00:58,830
basically all the concepts and
tools of Bayesian inference
26
00:00:58,830 --> 00:01:00,720
that we learned in chapter
eight and apply them.
27
00:01:00,720 --> 00:01:04,269
So it's a nice review problem,
and so let's get started.
28
00:01:04,269 --> 00:01:08,530
The setup of the problem is
similar to the first Romeo and
29
00:01:08,530 --> 00:01:11,810
Juliet problem that
we dealt with.
30
00:01:11,810 --> 00:01:14,480
They are meeting up for a date,
and they're not always
31
00:01:14,480 --> 00:01:16,190
punctual and they
have a delay.
32
00:01:16,190 --> 00:01:19,680
But instead of the delay being
uniformly distributed between
33
00:01:19,680 --> 00:01:24,570
0 and 1 hour, now we have an
extra layer of uncertainty.
34
00:01:24,570 --> 00:01:31,780
So if we know some theta, then we
know that the delay, which
35
00:01:31,780 --> 00:01:34,770
we'll call x, is uniformly
distributed
36
00:01:34,770 --> 00:01:36,520
between 0 and theta.
37
00:01:36,520 --> 00:01:39,670
So here's one possible
theta, theta 1.
38
00:01:39,670 --> 00:01:42,740
But we don't actually know
what this theta is.
39
00:01:42,740 --> 00:01:45,120
So in the original problem
we knew that theta
40
00:01:45,120 --> 00:01:46,820
was exactly one hour.
41
00:01:46,820 --> 00:01:49,320
But in this problem we don't
know what theta is.
42
00:01:49,320 --> 00:01:54,150
So theta could also be like
this, some other theta 2.
43
00:01:54,150 --> 00:01:56,530
And we don't know what
this theta is.
44
00:01:56,530 --> 00:02:01,590
And we choose to model it as
being uniformly distributed
45
00:02:01,590 --> 00:02:04,950
between 0 and 1.
46
00:02:04,950 --> 00:02:07,330
So like I said, we have
two layers now.
47
00:02:07,330 --> 00:02:10,419
We have uncertainty about theta,
which is the parameter
48
00:02:10,419 --> 00:02:11,730
of the uniform distribution.
49
00:02:11,730 --> 00:02:16,510
And then we have uncertainty
in regards to the
50
00:02:16,510 --> 00:02:19,342
actual delay, x.
51
00:02:19,342 --> 00:02:23,030
OK, so let's actually
write out what these
52
00:02:23,030 --> 00:02:23,720
distributions are.
53
00:02:23,720 --> 00:02:27,360
So theta, the unknown parameter,
we're told in the
54
00:02:27,360 --> 00:02:29,500
problem that we're going to
assume that it is uniformly
55
00:02:29,500 --> 00:02:30,930
distributed between 0 and 1.
56
00:02:30,930 --> 00:02:35,960
And so the PDF is just 1, when
theta is between 0 and 1, and
57
00:02:35,960 --> 00:02:38,870
0 otherwise.
58
00:02:38,870 --> 00:02:44,530
And we're told that, given what
theta is, given what this
59
00:02:44,530 --> 00:02:50,090
parameter is, the delay is
uniformly distributed between
60
00:02:50,090 --> 00:02:52,330
0 and this theta.
61
00:02:52,330 --> 00:02:55,670
So what that means is that we
know this conditional PDF, the
62
00:02:55,670 --> 00:03:01,410
conditional PDF of x given theta
is going to be 1 over
63
00:03:01,410 --> 00:03:10,620
theta if x is between 0 and
theta, and 0 otherwise.
64
00:03:10,620 --> 00:03:14,120
All right, because we know
that given a theta, x is
65
00:03:14,120 --> 00:03:16,810
uniformly distributed
between 0 and theta.
66
00:03:16,810 --> 00:03:20,520
So in order for this to be a valid
uniform distribution, the
67
00:03:20,520 --> 00:03:23,900
normalization, or the height,
as you can think of it, has to be
68
00:03:23,900 --> 00:03:25,520
exactly 1 over theta, so that the total area is 1.
69
00:03:25,520 --> 00:03:29,300
So just imagine for a concrete
case, if theta were 1, the 1 hour
70
00:03:29,300 --> 00:03:32,330
in the original problem, then
this would just be a PDF equal to 1,
71
00:03:32,330 --> 00:03:36,350
that is, a standard uniform
distribution between 0 and 1.
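To make the two layers concrete, here is a minimal Python sketch of the prior and the conditional PDF written as plain functions. The names `prior_pdf` and `likelihood_pdf` are hypothetical, chosen only for illustration; the densities themselves are exactly the ones stated above (height 1 on [0, 1] for theta, height 1 over theta on [0, theta] for x).

```python
def prior_pdf(theta: float) -> float:
    """Prior f_Theta(theta): uniform on [0, 1], so height 1 there and 0 elsewhere."""
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0


def likelihood_pdf(x: float, theta: float) -> float:
    """Conditional f_{X|Theta}(x | theta): uniform on [0, theta], height 1/theta."""
    if theta > 0.0 and 0.0 <= x <= theta:
        return 1.0 / theta
    return 0.0


# Concrete cases matching the discussion above.
print(likelihood_pdf(0.3, 1.0))  # theta = 1 hour: height 1.0
print(likelihood_pdf(0.3, 0.5))  # theta = half an hour: height 2.0
print(likelihood_pdf(0.6, 0.5))  # delay longer than theta is impossible: 0.0
```

Note how the height grows as theta shrinks: the rectangle gets narrower, so it must get taller to keep area 1.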
72
00:03:36,350 --> 00:03:41,160
OK, so now we have the
necessary fundamentals for
73
00:03:41,160 --> 00:03:42,110
this problem.
74
00:03:42,110 --> 00:03:43,790
And what do we do
in inference?
75
00:03:43,790 --> 00:03:46,760
Well the objective is
to try to infer
76
00:03:46,760 --> 00:03:48,350
some unknown parameter.
77
00:03:48,350 --> 00:03:56,780
What we have is, first, a
prior, which is our initial
78
00:03:56,780 --> 00:04:00,280
belief for what this
parameter might be.
79
00:04:00,280 --> 00:04:01,940
And then we have some data.
80
00:04:01,940 --> 00:04:04,370
So in this case, the data that
we collect is the actual
81
00:04:04,370 --> 00:04:07,700
observed delay
for Juliet, x.
82
00:04:07,700 --> 00:04:10,420
And this model tells
us how this data
83
00:04:10,420 --> 00:04:12,540
is essentially generated.
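One way to see how the model generates the data is to simulate it. This is a sketch assuming Python's standard `random` module; `sample_date` is a hypothetical helper that first draws theta from the prior and then draws a delay x given that theta.

```python
import random


def sample_date(rng: random.Random) -> tuple[float, float]:
    """Draw (theta, x) from the two-layer model: theta ~ U[0, 1], then x ~ U[0, theta]."""
    theta = rng.uniform(0.0, 1.0)  # unknown parameter, drawn from the prior
    x = rng.uniform(0.0, theta)    # observed delay, drawn given theta
    return theta, x


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_date(rng) for _ in range(5)]
for theta, x in samples:
    print(f"theta = {theta:.3f} h, observed delay x = {x:.3f} h")
```

In the inference problem we only ever see the second number in each pair; the first one is what we are trying to recover.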
84
00:04:12,540 --> 00:04:17,740
And now what we do is, we want
to use the data and our prior
85
00:04:17,740 --> 00:04:20,560
belief, combine them somehow,
and use it to update our
86
00:04:20,560 --> 00:04:23,180
belief into what we call
our posterior.
87
00:04:23,180 --> 00:04:26,050
In order to do that, we use
Bayes' rule, which is why this
88
00:04:26,050 --> 00:04:28,340
is called Bayesian inference.
89
00:04:28,340 --> 00:04:33,750
So when we use Bayes' rule,
remember, what we want to
90
00:04:33,750 --> 00:04:36,870
find now is the
posterior, which is the
91
00:04:36,870 --> 00:04:40,360
conditional PDF of theta, the
unknown parameter, given x.
92
00:04:40,360 --> 00:04:43,710
So essentially we just flip
the conditioning.
93
00:04:43,710 --> 00:04:47,560
And remember Bayes' rule is
given as the following.
94
00:04:47,560 --> 00:04:56,210
It's just the prior times this
conditional PDF of x given
95
00:04:56,210 --> 00:05:02,290
theta divided by the PDF of x.
96
00:05:02,290 --> 00:05:06,660
All right, and we know what
most of these things are.
97
00:05:06,660 --> 00:05:14,050
The prior, which is just the
PDF of theta, is 1.
98
00:05:14,050 --> 00:05:20,180
The conditional PDF of x given
theta is 1 over theta.
99
00:05:20,180 --> 00:05:24,370
And then of course we
have this PDF of x.
100
00:05:24,370 --> 00:05:27,950
But we always have to be careful
because these two
101
00:05:27,950 --> 00:05:32,520
values are only valid for
certain ranges of theta and x.
102
00:05:32,520 --> 00:05:35,620
So in order for this to be
valid we need theta to be
103
00:05:35,620 --> 00:05:38,560
between 0 and 1 because
otherwise it would be 0.
104
00:05:38,560 --> 00:05:41,890
So we need theta to be
between 0 and 1.
105
00:05:41,890 --> 00:05:45,640
And we need x to be between
0 and theta.
106
00:05:45,640 --> 00:05:49,580
107
00:05:49,580 --> 00:05:52,330
And otherwise this would be 0.
108
00:05:52,330 --> 00:05:55,400
109
00:05:55,400 --> 00:05:57,920
So now we're almost done.
110
00:05:57,920 --> 00:06:00,200
One last thing we need to do
is just calculate what this
111
00:06:00,200 --> 00:06:06,210
denominator is, f_X(x).
112
00:06:06,210 --> 00:06:09,360
Well the denominator,
remember, is just a
113
00:06:09,360 --> 00:06:10,490
normalization.
114
00:06:10,490 --> 00:06:13,400
And it's actually relatively
less important because what
115
00:06:13,400 --> 00:06:17,760
we'll find out is that this has
no dependence on theta.
116
00:06:17,760 --> 00:06:21,670
It will only depend on x.
117
00:06:21,670 --> 00:06:24,440
So all of the
dependence on theta will be
118
00:06:24,440 --> 00:06:26,390
captured just by
the numerator.
119
00:06:26,390 --> 00:06:29,810
But for completeness let's
calculate out what this is.
120
00:06:29,810 --> 00:06:32,350
So it's just a normalization.
121
00:06:32,350 --> 00:06:41,230
So it's actually just the
integral of the numerator.
122
00:06:41,230 --> 00:06:44,720
You can think of it as an
application of the total
123
00:06:44,720 --> 00:06:45,970
probability theorem.
124
00:06:45,970 --> 00:06:48,400
125
00:06:48,400 --> 00:06:52,700
So we integrate this numerator,
and what do we
126
00:06:52,700 --> 00:06:54,460
integrate it over?
127
00:06:54,460 --> 00:06:58,020
Well we know that we're
integrating over theta.
128
00:06:58,020 --> 00:07:02,220
And we know that theta
has to be between x and 1:
129
00:07:02,220 --> 00:07:06,060
it has to be at least x and
at most 1.
130
00:07:06,060 --> 00:07:10,750
So we integrate from theta
equals x to 1.
131
00:07:10,750 --> 00:07:13,850
And this is just the integral
from x to 1 of
132
00:07:13,850 --> 00:07:15,210
the numerator, right?
133
00:07:15,210 --> 00:07:17,830
This is just 1 and this
is 1 over theta.
134
00:07:17,830 --> 00:07:23,370
So it's the integral of 1 over
theta, d theta from x to 1.
135
00:07:23,370 --> 00:07:25,450
When you work it out, the
antiderivative
136
00:07:25,450 --> 00:07:27,560
is log of theta.
137
00:07:27,560 --> 00:07:33,520
So it's log of 1
minus log of x.
138
00:07:33,520 --> 00:07:35,750
Log of 1 is 0.
139
00:07:35,750 --> 00:07:39,800
And remember, x is between
0 and theta.
140
00:07:39,800 --> 00:07:40,560
Theta is less than 1.
141
00:07:40,560 --> 00:07:42,960
So x has to be between
0 and 1.
142
00:07:42,960 --> 00:07:46,310
The log of something between
0 and 1 is negative.
143
00:07:46,310 --> 00:07:48,690
So this is a negative number.
144
00:07:48,690 --> 00:07:51,030
This is 0.
145
00:07:51,030 --> 00:07:52,490
And then we have a
negative sign.
146
00:07:52,490 --> 00:07:58,910
So we can write this
as the absolute value
147
00:07:58,910 --> 00:08:01,260
of log of x.
148
00:08:01,260 --> 00:08:04,590
Strictly, the result
is negative log of x.
149
00:08:04,590 --> 00:08:07,300
But because log of x is
negative, we
150
00:08:07,300 --> 00:08:09,210
know that this is actually
going to be a positive number.
151
00:08:09,210 --> 00:08:13,550
So the absolute value just
makes that more intuitive.
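As a sanity check on the denominator, the integral of 1 over theta from x to 1 can be approximated numerically and compared with the absolute value of log x. This sketch uses a plain midpoint rule, an assumed choice of numerical method rather than anything from the lecture, and `marginal_numeric` is a hypothetical name.

```python
import math


def marginal_numeric(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of f_X(x) = integral from x to 1 of (1)(1/theta) d theta."""
    h = (1.0 - x) / steps
    return sum(h / (x + (i + 0.5) * h) for i in range(steps))


for x in (0.1, 0.5, 0.9):
    print(x, marginal_numeric(x), abs(math.log(x)))
```

The two columns should agree to many decimal places, which is just the calculus above done by brute force.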
152
00:08:13,550 --> 00:08:17,810
OK, so now to complete this we
can just plug that back in, and
153
00:08:17,810 --> 00:08:20,470
the final answer is
154
00:08:20,470 --> 00:08:31,480
1 over theta, divided by the absolute
value of log of x, or you
155
00:08:31,480 --> 00:08:38,380
could also write this as 1
over the product of theta and the absolute
156
00:08:38,380 --> 00:08:41,880
value of log of x.
157
00:08:41,880 --> 00:08:45,780
And of course, remember that
the actual limits for where
158
00:08:45,780 --> 00:08:49,690
this is valid, theta between x and 1, are
very important.
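Here is a minimal sketch of the one-observation posterior as a Python function, with its limits built in, plus a numerical check that it integrates to 1 over theta. The helper names `posterior_pdf` and `total_mass` are hypothetical, and the midpoint rule is again an assumed choice.

```python
import math


def posterior_pdf(theta: float, x: float) -> float:
    """f_{Theta|X}(theta | x) = 1 / (theta * |log x|) for x <= theta <= 1, else 0."""
    if 0.0 < x < 1.0 and x <= theta <= 1.0:
        return 1.0 / (theta * abs(math.log(x)))
    return 0.0


def total_mass(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the posterior over theta in [0, 1]."""
    h = 1.0 / steps
    return sum(posterior_pdf((i + 0.5) * h, x) * h for i in range(steps))


print(total_mass(0.5))          # close to 1
print(posterior_pdf(0.4, 0.5))  # 0.0: theta below the observed delay is ruled out
```

Dropping the range check would make the function return positive values for theta below x, which is exactly the mistake the limits guard against.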
159
00:08:49,690 --> 00:08:54,430
OK, so what does this
actually mean?
160
00:08:54,430 --> 00:09:02,170
Let's try to interpret
what this answer is.
161
00:09:02,170 --> 00:09:07,900
So what we have is this is the
posterior distribution.
162
00:09:07,900 --> 00:09:09,710
And now what have we done?
163
00:09:09,710 --> 00:09:14,650
Well we started out with the
prior, which was that theta is
164
00:09:14,650 --> 00:09:21,500
uniform between 0 and 1.
165
00:09:21,500 --> 00:09:24,120
This is our prior belief.
166
00:09:24,120 --> 00:09:25,900
Now we observed some data.
167
00:09:25,900 --> 00:09:29,200
And this allows us to
update our belief.
168
00:09:29,200 --> 00:09:31,870
And this is the update
that we get.
169
00:09:31,870 --> 00:09:36,700
So let's just assume that we
observe that Juliet is late by
170
00:09:36,700 --> 00:09:38,430
half an hour.
171
00:09:38,430 --> 00:09:40,660
Well if she's late by half an
hour, what does that tell us
172
00:09:40,660 --> 00:09:42,540
about what theta can be?
173
00:09:42,540 --> 00:09:46,750
Well what we know from that at
least is that theta cannot be
174
00:09:46,750 --> 00:09:50,460
anything less than half an hour
because if theta were
175
00:09:50,460 --> 00:09:54,040
less than half an hour there's
no way that her delay--
176
00:09:54,040 --> 00:09:56,740
remember her delay we know
has to be distributed
177
00:09:56,740 --> 00:09:58,150
between 0 and theta.
178
00:09:58,150 --> 00:10:01,090
There's no way that her delay
could be half an hour if theta
179
00:10:01,090 --> 00:10:02,860
were less than half an hour.
180
00:10:02,860 --> 00:10:10,960
So automatically we know that
now theta has to be somewhere
181
00:10:10,960 --> 00:10:14,900
between x and 1, which is where
this limit comes in.
182
00:10:14,900 --> 00:10:17,530
So we know that theta has to be
between x and 1 now instead
183
00:10:17,530 --> 00:10:18,820
of just 0 and 1.
184
00:10:18,820 --> 00:10:23,910
So observing x cuts
down and eliminates part of
185
00:10:23,910 --> 00:10:28,320
the range of values
that theta can take on.
186
00:10:28,320 --> 00:10:30,000
Now what else do we know?
187
00:10:30,000 --> 00:10:31,665
Well, we can actually
plot this.
188
00:10:31,665 --> 00:10:34,030
This is a function of theta.
189
00:10:34,030 --> 00:10:35,930
The absolute value of log x doesn't
depend on theta, so we can just
190
00:10:35,930 --> 00:10:37,500
think of it as a constant scaling factor.
191
00:10:37,500 --> 00:10:41,030
So it's something like
1 over theta scaled.
192
00:10:41,030 --> 00:10:43,015
And so that's going to look
something like this.
193
00:10:43,015 --> 00:10:46,080
194
00:10:46,080 --> 00:10:48,270
And so what we've done is we've
transformed the prior,
195
00:10:48,270 --> 00:10:50,850
which is flat and
uniform into something that
196
00:10:50,850 --> 00:10:52,660
looks like this,
the posterior.
197
00:10:52,660 --> 00:10:56,050
So we've eliminated small values
of theta because we know
198
00:10:56,050 --> 00:10:57,650
that those can't be possible.
199
00:10:57,650 --> 00:11:01,600
And now what's left is
everything between x and 1.
200
00:11:01,600 --> 00:11:07,830
So now why is it also
no longer uniform
201
00:11:07,830 --> 00:11:09,520
between x and 1?
202
00:11:09,520 --> 00:11:16,630
Well it's because, if you think
about it, when theta is
203
00:11:16,630 --> 00:11:20,010
close to x, so say x
is half an hour.
204
00:11:20,010 --> 00:11:23,520
If theta is half an hour, that
means that there's a higher
205
00:11:23,520 --> 00:11:26,320
probability density
for actually observing
206
00:11:26,320 --> 00:11:31,570
a delay of half an hour, because
there's only a range between 0
207
00:11:31,570 --> 00:11:36,640
and half an hour that
x can be drawn from.
208
00:11:36,640 --> 00:11:42,120
Now if theta were actually 1, then
x could be drawn anywhere
209
00:11:42,120 --> 00:11:44,330
from 0 to 1 which is
a wider range.
210
00:11:44,330 --> 00:11:48,730
And so it's less likely that
you'll get a value of x equal
211
00:11:48,730 --> 00:11:49,780
to half an hour.
212
00:11:49,780 --> 00:11:55,070
And so because of that, values
of theta closer
213
00:11:55,070 --> 00:11:56,540
to x are more likely.
214
00:11:56,540 --> 00:12:01,320
That's why you get this
decreasing function.
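The "smaller theta is more likely" argument can also be checked by brute force: simulate many dates from the model and keep only the theta values whose simulated delay came out close to half an hour. This is a crude rejection-style approximation, an illustration technique swapped in here rather than part of the lecture; `approx_posterior_samples` and its `width` parameter are hypothetical.

```python
import random


def approx_posterior_samples(x_obs: float, width: float = 0.01,
                             n_draws: int = 200_000, seed: int = 1) -> list[float]:
    """Keep prior draws of theta whose simulated delay lands within `width` of x_obs.

    A rough stand-in for f_{Theta|X}(theta | x_obs), used only to show its shape;
    the exact posterior is the formula derived above.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 1.0)  # draw from the prior
        x = rng.uniform(0.0, theta)    # simulate a delay given theta
        if abs(x - x_obs) < width:
            kept.append(theta)
    return kept


thetas = approx_posterior_samples(0.5)
near = sum(1 for t in thetas if t < 0.75) / len(thetas)
far = sum(1 for t in thetas if t >= 0.75) / len(thetas)
print(f"P(theta < 0.75) ~ {near:.2f}, P(theta >= 0.75) ~ {far:.2f}")
```

Under the exact posterior these two probabilities are log(1.5)/log(2), about 0.58, and log(4/3)/log(2), about 0.42, so the simulated split should land near those values.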
215
00:12:01,320 --> 00:12:09,690
OK, so now let's continue and
now what we have is this is
216
00:12:09,690 --> 00:12:12,350
the case for if you observe
one data point.
217
00:12:12,350 --> 00:12:16,440
So you arrange a date with
Juliet, you observe how late
218
00:12:16,440 --> 00:12:18,800
she is, and you get
one value of x.
219
00:12:18,800 --> 00:12:22,690
And now suppose you want to
collect more data, so you
220
00:12:22,690 --> 00:12:25,110
arrange, say, 10 dates
with Juliet.
221
00:12:25,110 --> 00:12:27,420
And for each one you observe
how late she was.
222
00:12:27,420 --> 00:12:33,720
So now we can collect multiple
samples, say
223
00:12:33,720 --> 00:12:35,290
n samples of delays.
224
00:12:35,290 --> 00:12:39,080
So x1 is her delay on
the first date.
225
00:12:39,080 --> 00:12:41,950
Xn is her delay on
the nth date.
226
00:12:41,950 --> 00:12:45,100
And we can now let x denote a
vector that's a collection
227
00:12:45,100 --> 00:12:46,730
of all of these.
228
00:12:46,730 --> 00:12:49,180
And now the question is, how do
you incorporate all this
229
00:12:49,180 --> 00:12:53,330
information into updating
your belief about theta?
230
00:12:53,330 --> 00:12:55,360
And it's actually pretty
analogous to
231
00:12:55,360 --> 00:12:56,730
what we've done here.
232
00:12:56,730 --> 00:12:59,030
The important assumption that
we make in this problem is
233
00:12:59,030 --> 00:13:04,590
that conditional on theta, all
of these delays are in fact
234
00:13:04,590 --> 00:13:06,730
conditionally independent.
235
00:13:06,730 --> 00:13:09,210
And that's going to help
us solve this problem.
236
00:13:09,210 --> 00:13:13,900
So the set up is essentially
the same.
237
00:13:13,900 --> 00:13:16,740
So what do we still need?
238
00:13:16,740 --> 00:13:18,980
We still need the prior.
239
00:13:18,980 --> 00:13:20,230
And the prior hasn't changed.
240
00:13:20,230 --> 00:13:23,980
241
00:13:23,980 --> 00:13:27,225
The prior is still uniform
between 0 and 1.
242
00:13:27,225 --> 00:13:35,460
243
00:13:35,460 --> 00:13:40,140
We assume the actual delays are
generated the same way as before:
244
00:13:40,140 --> 00:13:44,420
conditional
on theta, each one
245
00:13:44,420 --> 00:13:47,230
of these is conditionally
independent, and each one is
246
00:13:47,230 --> 00:13:51,000
uniformly distributed
between 0 and theta.
247
00:13:51,000 --> 00:13:57,790
And so what we get is that this
is going to be equal to--
248
00:13:57,790 --> 00:14:04,420
you can also imagine this as
a big joint PDF, joint
249
00:14:04,420 --> 00:14:12,250
conditional PDF of
all the x's.
250
00:14:12,250 --> 00:14:15,800
And because we said that they
are conditionally independent
251
00:14:15,800 --> 00:14:20,450
given theta, then we can
actually split this joint PDF
252
00:14:20,450 --> 00:14:24,330
into the product of a lot of
individual conditional PDFs.
253
00:14:24,330 --> 00:14:29,240
So this we can actually rewrite
as PDF of x1 given
254
00:14:29,240 --> 00:14:33,720
theta times all the way through
the conditional PDF of
255
00:14:33,720 --> 00:14:38,640
xn given theta.
256
00:14:38,640 --> 00:14:41,870
And because we assume that
each one of these is--
257
00:14:41,870 --> 00:14:44,060
for each one of these it's
uniformly distributed between
258
00:14:44,060 --> 00:14:46,680
0 and theta, they're
all the same.
259
00:14:46,680 --> 00:14:49,600
So in fact what we get
is 1 over theta
260
00:14:49,600 --> 00:14:50,810
for each one of these.
261
00:14:50,810 --> 00:14:52,090
And there's n of them.
262
00:14:52,090 --> 00:14:53,490
So it's 1 over theta to the n.
263
00:14:53,490 --> 00:14:57,700
264
00:14:57,700 --> 00:15:02,010
But what values of x
is this valid for?
265
00:15:02,010 --> 00:15:03,590
What values of x and theta?
266
00:15:03,590 --> 00:15:08,290
Well what we need is that for
each one of these, we need
267
00:15:08,290 --> 00:15:14,940
that theta has to be at least
equal to whatever x you get.
268
00:15:14,940 --> 00:15:17,820
Whatever x you observe, theta
has to be at least that.
269
00:15:17,820 --> 00:15:28,730
So we know that theta has to be at
least equal to x1 and all
270
00:15:28,730 --> 00:15:29,490
the way through xn.
271
00:15:29,490 --> 00:15:32,650
And so theta has to be
greater than or equal to
272
00:15:32,650 --> 00:15:39,370
all these x's and otherwise
this would be 0.
273
00:15:39,370 --> 00:15:42,110
So let's define something
that's going to help us.
274
00:15:42,110 --> 00:15:50,620
Let's define x bar to be
the maximum of all
275
00:15:50,620 --> 00:15:53,680
the observed x's.
276
00:15:53,680 --> 00:16:03,600
And so what we can do is rewrite
this condition as
277
00:16:03,600 --> 00:16:06,590
theta has to be at least
equal to the
278
00:16:06,590 --> 00:16:10,050
maximum, equal to x bar.
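The product of n identical factors, and the fact that only the largest delay matters for where it is nonzero, can be written as a short Python sketch. `joint_likelihood` is a hypothetical helper, and the delays below are made-up example values, not data from the lecture.

```python
def joint_likelihood(xs: list[float], theta: float) -> float:
    """f_{X|Theta}(x1, ..., xn | theta) for conditionally independent U[0, theta] delays.

    Each factor is 1/theta on [0, theta], so the product is theta**(-n) when every
    delay is at most theta, i.e. when theta >= x_bar = max(xs); otherwise it is 0.
    """
    if theta <= 0.0 or any(x < 0.0 for x in xs):
        return 0.0
    x_bar = max(xs)  # the only statistic that matters for the support
    return theta ** (-len(xs)) if theta >= x_bar else 0.0


delays = [0.2, 0.45, 0.3]              # hypothetical delays, in hours
print(joint_likelihood(delays, 0.5))   # 1 / 0.5**3 = 8.0
print(joint_likelihood(delays, 0.4))   # 0.0, because 0.45 > 0.4
```

Checking only `max(xs)` instead of every delay is the same simplification as defining x bar in the lecture.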
279
00:16:10,050 --> 00:16:13,740
All right, and now we can
again apply Bayes' rule.
280
00:16:13,740 --> 00:16:16,560
Bayes' rule will tell us
what this posterior
281
00:16:16,560 --> 00:16:19,270
distribution is.
282
00:16:19,270 --> 00:16:27,490
So again the numerator will
be the prior times this
283
00:16:27,490 --> 00:16:34,645
conditional PDF over PDF of x.
284
00:16:34,645 --> 00:16:41,100
OK, so the numerator again,
the prior is just one.
285
00:16:41,100 --> 00:16:43,850
This distribution we calculated
over here.
286
00:16:43,850 --> 00:16:47,240
It's 1 over theta to the n.
287
00:16:47,240 --> 00:16:50,630
And then we have this
denominator.
288
00:16:50,630 --> 00:16:53,660
289
00:16:53,660 --> 00:16:58,320
And again, we need to be careful
to write down when
290
00:16:58,320 --> 00:16:59,650
this is actually valid.
291
00:16:59,650 --> 00:17:08,599
So it's actually valid when x
bar is less than or equal to theta
292
00:17:08,599 --> 00:17:12,069
and theta is less than
or equal to 1, and
293
00:17:12,069 --> 00:17:14,480
otherwise it's zero.
294
00:17:14,480 --> 00:17:17,390
So this is actually more
or less complete.
295
00:17:17,390 --> 00:17:22,109
Again we need to calculate out
what exactly this denominator
296
00:17:22,109 --> 00:17:26,660
is but just like before it's
actually just a scaling factor
297
00:17:26,660 --> 00:17:28,800
which is independent
of what theta is.
298
00:17:28,800 --> 00:17:31,530
So if we wanted to, we could
actually calculate this out.
299
00:17:31,530 --> 00:17:34,150
It would be just like before.
300
00:17:34,150 --> 00:17:37,200
It would be the integral of the
numerator, which is 1 over
301
00:17:37,200 --> 00:17:39,340
theta to the n d theta.
302
00:17:39,340 --> 00:17:43,050
Before, we integrated theta
from x to 1.
303
00:17:43,050 --> 00:17:46,970
But now we need to integrate
from x bar to 1.
304
00:17:46,970 --> 00:17:49,100
And if we wanted to, we could
actually work it out.
305
00:17:49,100 --> 00:17:54,230
It's fairly simple calculus
to calculate what this
306
00:17:54,230 --> 00:17:55,840
normalization factor would be.
307
00:17:55,840 --> 00:17:58,600
But the main point is that the
shape of it will be dictated
308
00:17:58,600 --> 00:18:02,100
by this 1 over theta
to the n term.
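For completeness, the "fairly simple calculus" gives a closed form for the normalization: the integral of theta to the minus n from x bar to 1 is (x bar to the power 1 - n, minus 1) divided by (n - 1) when n is at least 2, and minus log of x bar when n = 1, which recovers the one-date answer. The sketch below encodes that and checks numerically that the resulting posterior integrates to 1. The function names and example delays are hypothetical, and it assumes x bar is strictly between 0 and 1.

```python
import math


def normalizer(x_bar: float, n: int) -> float:
    """Z = integral from x_bar to 1 of theta**(-n) d theta (the denominator f_X(x))."""
    if n == 1:
        return -math.log(x_bar)  # recovers |log x| from the one-date case
    return (x_bar ** (1 - n) - 1.0) / (n - 1)


def posterior_pdf_n(theta: float, xs: list[float]) -> float:
    """f_{Theta|X}(theta | x1..xn) = theta**(-n) / Z on [x_bar, 1], else 0."""
    n, x_bar = len(xs), max(xs)
    if x_bar <= theta <= 1.0:
        return theta ** (-n) / normalizer(x_bar, n)
    return 0.0


xs = [0.2, 0.45, 0.3]  # hypothetical delays, in hours
steps = 100_000
h = (1.0 - max(xs)) / steps
mass = sum(posterior_pdf_n(max(xs) + (i + 0.5) * h, xs) * h for i in range(steps))
print(mass)  # approximately 1
```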
309
00:18:02,100 --> 00:18:05,820
And so now we know that with n
pieces of data,
310
00:18:05,820 --> 00:18:07,530
the shape of the posterior
311
00:18:07,530 --> 00:18:11,190
will be 1 over theta
to the n, where theta has to
312
00:18:11,190 --> 00:18:14,700
be greater than or equal to x bar
and at most 1.
313
00:18:14,700 --> 00:18:18,890
Before, it was just
1 over theta, with theta
314
00:18:18,890 --> 00:18:20,920
between x and 1.
315
00:18:20,920 --> 00:18:24,920
So you can kind of see how the
problem generalizes when you
316
00:18:24,920 --> 00:18:27,140
collect more data.
317
00:18:27,140 --> 00:18:30,920
So now imagine that,
318
00:18:30,920 --> 00:18:34,460
when you collect n pieces of
data, the maximum of all the
319
00:18:34,460 --> 00:18:36,140
x's is here.
320
00:18:36,140 --> 00:18:40,000
Well, it turns out that the
posterior now is going to
321
00:18:40,000 --> 00:18:43,020
look something like this.
322
00:18:43,020 --> 00:18:45,960
323
00:18:45,960 --> 00:18:49,460
So it becomes steeper because
it's 1 over theta to the n as
324
00:18:49,460 --> 00:18:50,740
opposed to 1 over theta.
325
00:18:50,740 --> 00:18:55,560
And it's limited to be
between x bar and 1.
326
00:18:55,560 --> 00:19:01,020
And so with more data you're
more sure of the range that
327
00:19:01,020 --> 00:19:08,260
theta can take on, because each
data point rules out
328
00:19:08,260 --> 00:19:11,260
all values of theta
smaller than that data point.
329
00:19:11,260 --> 00:19:13,720
And so you're left with
just x bar to 1.
330
00:19:13,720 --> 00:19:15,320
And you're also more certain.
331
00:19:15,320 --> 00:19:20,350
So you have this kind
of distribution.
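To put a number on "more certain", this sketch computes how much posterior probability sits within 0.1 hour above x bar for different n, using the CDF obtained by integrating theta to the minus n. As a simplification for illustration, it holds x bar fixed at 0.5 while n grows; in practice x bar itself also creeps up as you collect more dates. `posterior_cdf` is a hypothetical helper.

```python
import math


def posterior_cdf(t: float, x_bar: float, n: int) -> float:
    """P(Theta <= t | data) for the posterior theta**(-n) / Z on [x_bar, 1], 0 < x_bar < 1."""
    t = min(max(t, x_bar), 1.0)  # clamp t to the support [x_bar, 1]
    if n == 1:
        return math.log(t / x_bar) / -math.log(x_bar)
    return (x_bar ** (1 - n) - t ** (1 - n)) / (x_bar ** (1 - n) - 1.0)


# Hold x_bar = 0.5 fixed and ask how much posterior mass lies in [0.5, 0.6].
for n in (1, 5, 20):
    print(n, round(posterior_cdf(0.6, 0.5, n), 3))
```

Working the formula by hand gives roughly 0.26, 0.55 and 0.97, so the posterior piles up just above x bar as n grows, matching the steeper curve described above.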
332
00:19:20,350 --> 00:19:26,590
OK, so this is the
posterior distribution,
333
00:19:26,590 --> 00:19:31,170
which gives you the
whole picture
334
00:19:31,170 --> 00:19:33,000
of what you know:
335
00:19:33,000 --> 00:19:35,220
the entire distribution of the
unknown parameter given all
336
00:19:35,220 --> 00:19:38,750
the data that you have
plus the prior
337
00:19:38,750 --> 00:19:40,800
distribution that you have.
338
00:19:40,800 --> 00:19:44,250
But if someone, say your
manager, were to come and ask
339
00:19:44,250 --> 00:19:49,090
you, "Well, what is your best
guess of what theta is?"
340
00:19:49,090 --> 00:19:54,130
It's not very useful
to just tell them,
341
00:19:54,130 --> 00:19:55,560
here's the distribution.
342
00:19:55,560 --> 00:20:00,000
Because you still have a big
range of what theta could be,
343
00:20:00,000 --> 00:20:02,760
it could be anything between
x and 1 or x bar and 1.
344
00:20:02,760 --> 00:20:05,420
So if you wanted to actually
come up with a point estimate
345
00:20:05,420 --> 00:20:09,190
which is just one single value,
there are different ways
346
00:20:09,190 --> 00:20:10,100
you can do it.
347
00:20:10,100 --> 00:20:16,380
The first way that we'll talk
about is the MAP, or maximum a posteriori, rule.
348
00:20:16,380 --> 00:20:20,130
What the MAP rule does is
it takes the posterior
349
00:20:20,130 --> 00:20:25,700
distribution and just finds
the value of the parameter
350
00:20:25,700 --> 00:20:29,050
that maximizes the posterior
distribution, that is, the
351
00:20:29,050 --> 00:20:31,360
highest point in the posterior
distribution.
352
00:20:31,360 --> 00:20:39,560
So if you look at this posterior
distribution, the MAP rule will
353
00:20:39,560 --> 00:20:43,870
just take the highest value.
354
00:20:43,870 --> 00:20:47,320
And in this case, because the
posterior looks like this, the
355
00:20:47,320 --> 00:20:51,060
highest value is at theta equal to x.
356
00:20:51,060 --> 00:21:00,260
And so theta hat MAP
is actually just x.
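A minimal sketch of the MAP rule done numerically: evaluate the unnormalized posterior on a grid over [x bar, 1] and take the argmax. The grid search is an assumed, deliberately naive method, and `map_estimate` is a hypothetical name; since the posterior is decreasing, it simply lands on x bar (or on x with a single date).

```python
def map_estimate(xs: list[float], grid_size: int = 10_001) -> float:
    """MAP estimate: the grid point in [x_bar, 1] with the largest posterior density.

    Only the unnormalized posterior theta**(-n) is needed, because the
    denominator does not depend on theta.
    """
    n, x_bar = len(xs), max(xs)
    grid = [x_bar + (1.0 - x_bar) * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda theta: theta ** (-n))


print(map_estimate([0.5]))             # one date, 30 minutes late
print(map_estimate([0.2, 0.45, 0.3]))  # several hypothetical dates
```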
357
00:21:00,260 --> 00:21:03,360
And if you think about it, this
is kind of an optimistic
358
00:21:03,360 --> 00:21:07,310
estimate, because you always
take the observed delay at face value: if
359
00:21:07,310 --> 00:21:12,800
Juliet were 30 minutes late then
you assume that her delay
360
00:21:12,800 --> 00:21:16,300
is uniformly distributed between
0 and 30 minutes.
361
00:21:16,300 --> 00:21:20,710
Well in fact, even though she
arrived 30 minutes late, that
362
00:21:20,710 --> 00:21:24,340
could have been because her delay is
actually distributed between 0
363
00:21:24,340 --> 00:21:27,450
and 1 hour, and you just happened
to get 30 minutes.
364
00:21:27,450 --> 00:21:30,590
But what you do is you always
take kind of the optimistic view,
365
00:21:30,590 --> 00:21:33,690
and just give her the benefit of
the doubt, and say that was
366
00:21:33,690 --> 00:21:37,210
actually kind of the worst
case scenario given her
367
00:21:37,210 --> 00:21:39,320
distribution.
368
00:21:39,320 --> 00:21:43,490
So another way to take this
entire posterior distribution
369
00:21:43,490 --> 00:21:46,440
and come up with just a single
number, a point estimate, is
370
00:21:46,440 --> 00:21:49,560
to take the conditional
expectation.
371
00:21:49,560 --> 00:21:51,750
So you have an entire
distribution.
372
00:21:51,750 --> 00:21:55,020
So there are two obvious ways of
getting a number out of this.
373
00:21:55,020 --> 00:21:57,500
One is to take the maximum and
the other is to take the
374
00:21:57,500 --> 00:21:58,170
expectation.
375
00:21:58,170 --> 00:22:01,130
So take everything in the
distribution, combine it and
376
00:22:01,130 --> 00:22:03,740
come up with an estimate.
377
00:22:03,740 --> 00:22:06,240
So if you think about it, it
will probably be somewhere
378
00:22:06,240 --> 00:22:09,610
around here; that would be the
conditional expectation.
379
00:22:09,610 --> 00:22:12,890
So this is called the
least mean squares, or LMS, estimator.
380
00:22:12,890 --> 00:22:17,590
And the way to calculate it is
just like we said, you take
381
00:22:17,590 --> 00:22:18,840
the conditional expectation.
382
00:22:18,840 --> 00:22:21,190
383
00:22:21,190 --> 00:22:23,710
So how do we take the
conditional expectation?
384
00:22:23,710 --> 00:22:29,810
Remember, it is just theta
integrated against the
385
00:22:29,810 --> 00:22:33,260
correct distribution; in this
case it's the conditional PDF
386
00:22:33,260 --> 00:22:37,580
of theta given x which is the
posterior distribution.
387
00:22:37,580 --> 00:22:40,770
And what do we integrate
theta from?
388
00:22:40,770 --> 00:22:45,840
Well we integrate
it from x to 1.
389
00:22:45,840 --> 00:22:48,780
Now if we plug this in, we
integrate from x to 1, theta
390
00:22:48,780 --> 00:22:56,370
times the posterior.
391
00:22:56,370 --> 00:23:02,710
The posterior we calculated
earlier; it was 1 over the product of theta
392
00:23:02,710 --> 00:23:07,840
and the absolute
value of log x.
393
00:23:07,840 --> 00:23:11,090
So the thetas just cancel out,
and you just have 1 over
394
00:23:11,090 --> 00:23:12,110
absolute value of log x.
395
00:23:12,110 --> 00:23:13,970
Well that doesn't
depend on theta.
396
00:23:13,970 --> 00:23:20,150
So integrating from x to 1, what you get is just 1
minus x, over the absolute
397
00:23:20,150 --> 00:23:23,810
value of log x.
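The LMS calculation can be double-checked numerically: integrate theta times the posterior from x to 1 and compare with (1 - x) over the absolute value of log x. As before, the midpoint rule and the names `lms_closed_form` and `lms_numeric` are assumptions made for this sketch.

```python
import math


def lms_closed_form(x: float) -> float:
    """E[Theta | X = x] = (1 - x) / |log x| for 0 < x < 1."""
    return (1.0 - x) / abs(math.log(x))


def lms_numeric(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of theta * f_{Theta|X}(theta | x) over theta in [x, 1]."""
    h = (1.0 - x) / steps
    total = 0.0
    for i in range(steps):
        theta = x + (i + 0.5) * h
        total += theta * (1.0 / (theta * abs(math.log(x)))) * h  # the thetas cancel
    return total


print(lms_closed_form(0.5), lms_numeric(0.5))
```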
398
00:23:23,810 --> 00:23:28,870
All right, so we can actually
plot this, so we have two
399
00:23:28,870 --> 00:23:29,570
estimates now.
400
00:23:29,570 --> 00:23:33,280
One is the MAP estimate,
which
401
00:23:33,280 --> 00:23:34,830
is just x.
402
00:23:34,830 --> 00:23:37,710
The other, the LMS estimate, is
1 minus x over absolute
403
00:23:37,710 --> 00:23:39,840
value of log x.
404
00:23:39,840 --> 00:23:41,630
So we can plot this and
compare the two.
405
00:23:41,630 --> 00:23:45,250
406
00:23:45,250 --> 00:23:53,320
So here's x, and here is theta
hat, theta hat of x for the
407
00:23:53,320 --> 00:23:55,490
two different estimates.
408
00:23:55,490 --> 00:24:02,690
So here's the estimate from
the MAP rule, which is
409
00:24:02,690 --> 00:24:06,190
whatever x is, we estimate
that theta is equal to x.
410
00:24:06,190 --> 00:24:09,130
So it just looks like this.
411
00:24:09,130 --> 00:24:11,520
Now if we plot the LMS estimate, it turns
out that it looks
412
00:24:11,520 --> 00:24:12,770
something like this.
413
00:24:12,770 --> 00:24:18,980
414
00:24:18,980 --> 00:24:22,550
And so whatever x is, this
will tell you what the
415
00:24:22,550 --> 00:24:25,480
estimate, the LMS estimate
of theta would be.
416
00:24:25,480 --> 00:24:27,740
And it turns out that
it's always higher
417
00:24:27,740 --> 00:24:29,850
than the MAP estimate.
418
00:24:29,850 --> 00:24:33,330
So it's less optimistic.
419
00:24:33,330 --> 00:24:36,485
And it kind of factors in
the entire distribution.
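The two estimators can be compared side by side in a short sketch. It tabulates both for a few values of x, and then, as an extra check not done in the lecture, estimates each one's mean squared error by simulating the model. LMS sits above MAP on (0, 1) because 1 - x + x log x is positive there, and by construction the conditional expectation has the smaller mean squared error. For MAP the error works out to exactly 1/9, since theta minus x equals theta times (1 - U) with U uniform and independent of theta. Names and seeds are illustrative assumptions.

```python
import math
import random


def map_est(x: float) -> float:
    return x  # MAP: the observed delay itself


def lms_est(x: float) -> float:
    if x >= 1.0:  # guard: the formula's limit as x -> 1 is 1
        return 1.0
    return (1.0 - x) / abs(math.log(x))  # LMS: the conditional expectation


for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"x = {x:.1f}: MAP = {map_est(x):.3f}, LMS = {lms_est(x):.3f}")

# Monte Carlo mean squared error under theta ~ U[0, 1], x ~ U[0, theta].
rng = random.Random(2)
sq_map = sq_lms = 0.0
trials = 100_000
for _ in range(trials):
    theta = 1.0 - rng.random()        # in (0, 1], avoids theta = 0
    x = theta * (1.0 - rng.random())  # in (0, theta], avoids log(0)
    sq_map += (map_est(x) - theta) ** 2
    sq_lms += (lms_est(x) - theta) ** 2
mse_map, mse_lms = sq_map / trials, sq_lms / trials
print(f"MSE(MAP) ~ {mse_map:.4f}, MSE(LMS) ~ {mse_lms:.4f}")
```

The optimism of MAP shows up as a systematic underestimate of theta, which the LMS estimate partly corrects by averaging over the whole posterior.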
420
00:24:36,485 --> 00:24:41,380
421
00:24:41,380 --> 00:24:43,980
So because there are several
parts to this problem, we're
422
00:24:43,980 --> 00:24:46,470
going to take a pause for a
quick break and we'll come
423
00:24:46,470 --> 00:24:48,500
back and finish the problem
in a little bit.
424
00:24:48,500 --> 00:24:51,034