1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:21,708
ocw.mit.edu.
9
00:00:21,708 --> 00:00:25,380
PROFESSOR: It involves real
phenomena out there.
10
00:00:25,380 --> 00:00:28,960
So we have real stuff
that happens.
11
00:00:28,960 --> 00:00:33,630
So it might be an arrival
process to a bank that we're
12
00:00:33,630 --> 00:00:35,790
trying to model.
13
00:00:35,790 --> 00:00:38,230
This is a reality, but
this is what we have
14
00:00:38,230 --> 00:00:39,660
been doing so far.
15
00:00:39,660 --> 00:00:41,910
We have been playing
with models of
16
00:00:41,910 --> 00:00:43,770
probabilistic phenomena.
17
00:00:43,770 --> 00:00:46,730
And somehow we need to
tie the two together.
18
00:00:46,730 --> 00:00:50,930
The way these are tied is that
we observe the real world and
19
00:00:50,930 --> 00:00:53,530
this gives us data.
20
00:00:53,530 --> 00:00:58,590
And then based on these data, we
try to come up with a model
21
00:00:58,590 --> 00:01:01,930
of what exactly is going on.
22
00:01:01,930 --> 00:01:05,290
For example, for an arrival
process, you might ask the
23
00:01:05,290 --> 00:01:08,680
model in question, is my arrival
process Poisson or is
24
00:01:08,680 --> 00:01:10,300
it something different?
25
00:01:10,300 --> 00:01:14,630
If it is Poisson, what is the
rate of the arrival process?
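[Editorial sketch, with invented numbers, of the second question: if the arrivals are Poisson and you count arrivals over equal unit-length intervals, the maximum-likelihood estimate of the rate is simply the sample mean of the counts.]

```python
# Hypothetical arrival counts per hour at the bank (invented data).
counts = [3, 5, 4, 6, 2, 5, 4, 3]

# For a Poisson process observed over unit-length intervals, the
# maximum-likelihood estimate of the rate is the sample mean.
lam_hat = sum(counts) / len(counts)
print(lam_hat)  # 4.0
```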
26
00:01:14,630 --> 00:01:17,460
Once you come up with your model
and you come up with the
27
00:01:17,460 --> 00:01:21,710
parameters of the model, then
you can use it to make
28
00:01:21,710 --> 00:01:27,520
predictions about reality or to
figure out certain hidden
29
00:01:27,520 --> 00:01:31,890
things, certain hidden aspects
of reality, that you do not
30
00:01:31,890 --> 00:01:35,560
observe directly, but you try
to infer what they are.
31
00:01:35,560 --> 00:01:38,900
So that's where the usefulness
of the model comes in.
32
00:01:38,900 --> 00:01:43,330
Now this field is of course
tremendously useful.
33
00:01:43,330 --> 00:01:46,650
And it shows up pretty
much everywhere.
34
00:01:46,650 --> 00:01:50,000
So we talked about the polling
examples in the
35
00:01:50,000 --> 00:01:51,280
last couple of lectures.
36
00:01:51,280 --> 00:01:53,520
This is, of course, a
real application.
37
00:01:53,520 --> 00:01:57,525
You sample and on the basis of
the sample that you have, you
38
00:01:57,525 --> 00:02:00,400
try to make some inferences
about, let's say, the
39
00:02:00,400 --> 00:02:03,060
preferences in a given
population.
40
00:02:03,060 --> 00:02:06,230
Let's say in the medical field,
you want to try whether
41
00:02:06,230 --> 00:02:08,919
a certain drug makes a
difference or not.
42
00:02:08,919 --> 00:02:14,380
So people would do medical
trials, get some results, and
43
00:02:14,380 --> 00:02:17,640
then from the data somehow you
need to make sense of them and
44
00:02:17,640 --> 00:02:18,530
make a decision.
45
00:02:18,530 --> 00:02:21,360
Is the new drug useful
or is it not?
46
00:02:21,360 --> 00:02:23,460
How do we go systematically
about the
47
00:02:23,460 --> 00:02:24,710
question of this type?
48
00:02:24,710 --> 00:02:27,770
49
00:02:27,770 --> 00:02:32,170
A sexier, more recent topic,
there's this famous Netflix
50
00:02:32,170 --> 00:02:37,510
competition where Netflix gives
you a huge table of
51
00:02:37,510 --> 00:02:41,450
movies and people.
52
00:02:41,450 --> 00:02:45,860
And people have rated the
movies, but not everyone has
53
00:02:45,860 --> 00:02:47,850
watched all of the
movies in there.
54
00:02:47,850 --> 00:02:49,460
You have some of the ratings.
55
00:02:49,460 --> 00:02:53,250
For example, this person gave a
4 to that particular movie.
56
00:02:53,250 --> 00:02:56,300
So you get the table that's
partially filled.
57
00:02:56,300 --> 00:02:58,300
And Netflix asks
you to make
58
00:02:58,300 --> 00:02:59,860
recommendations to people.
59
00:02:59,860 --> 00:03:02,410
So this means trying to guess.
60
00:03:02,410 --> 00:03:06,100
This person here, how much
would they like this
61
00:03:06,100 --> 00:03:07,610
particular movie?
62
00:03:07,610 --> 00:03:11,130
And you can start thinking,
well, maybe this person has
63
00:03:11,130 --> 00:03:14,860
given somewhat similar ratings
with another person.
64
00:03:14,860 --> 00:03:18,440
And if that other person has
also seen that movie, maybe
65
00:03:18,440 --> 00:03:21,290
the rating of that other
person is relevant.
66
00:03:21,290 --> 00:03:24,230
But of course it's a lot more
complicated than that.
67
00:03:24,230 --> 00:03:26,650
And this has been a serious
competition where people have
68
00:03:26,650 --> 00:03:30,230
been using every heavyweight
machinery that there is in
69
00:03:30,230 --> 00:03:32,540
statistics, trying to
come up with good
70
00:03:32,540 --> 00:03:35,140
recommendation systems.
71
00:03:35,140 --> 00:03:37,870
Then the other people, of
course, are trying to analyze
72
00:03:37,870 --> 00:03:39,010
financial data.
73
00:03:39,010 --> 00:03:43,680
Somebody gives you the sequence
of the values, let's
74
00:03:43,680 --> 00:03:45,840
say of the S&P index.
75
00:03:45,840 --> 00:03:47,850
You look at something like this
76
00:03:47,850 --> 00:03:49,770
and you can ask questions.
77
00:03:49,770 --> 00:03:55,030
How do I model these data using
any of the models that
78
00:03:55,030 --> 00:03:57,060
we have in our bag of tools?
79
00:03:57,060 --> 00:04:00,230
How can I make predictions about
what's going to happen
80
00:04:00,230 --> 00:04:03,310
afterwards, and so on?
81
00:04:03,310 --> 00:04:09,700
On the engineering side,
anywhere where you have noise
82
00:04:09,700 --> 00:04:11,590
inference comes in.
83
00:04:11,590 --> 00:04:13,810
Signal processing, in
some sense, is just
84
00:04:13,810 --> 00:04:14,960
an inference problem.
85
00:04:14,960 --> 00:04:18,730
You observe signals that are
noisy and you try to figure
86
00:04:18,730 --> 00:04:21,750
out exactly what's happening
out there or what kind of
87
00:04:21,750 --> 00:04:24,130
signal has been sent.
88
00:04:24,130 --> 00:04:28,830
Maybe the beginning of the field
could be traced a few
89
00:04:28,830 --> 00:04:32,060
hundred years ago where people
would make
90
00:04:32,060 --> 00:04:35,420
astronomical observations
of the position of the
91
00:04:35,420 --> 00:04:37,550
planets in the sky.
92
00:04:37,550 --> 00:04:41,130
They would have some beliefs
that perhaps the orbits of
93
00:04:41,130 --> 00:04:44,070
planets are ellipses.
94
00:04:44,070 --> 00:04:47,840
Or if it's a comet, maybe it's
a parabola, hyperbola, don't
95
00:04:47,840 --> 00:04:48,640
know what it is.
96
00:04:48,640 --> 00:04:51,320
But they would have
a model of that.
97
00:04:51,320 --> 00:04:53,840
But, of course, astronomical
measurements would not be
98
00:04:53,840 --> 00:04:55,300
perfectly exact.
99
00:04:55,300 --> 00:05:00,690
And they would try to find the
curve that fits these data.
100
00:05:00,690 --> 00:05:05,580
How do you go about choosing
this particular curve on the
101
00:05:05,580 --> 00:05:07,960
base of noisy data and
try to do it in a
102
00:05:07,960 --> 00:05:11,274
somewhat principled way?
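[One classical principled answer, sketched here with made-up data rather than real astronomical measurements, is least squares: choose the curve parameters minimizing the sum of squared deviations from the noisy observations. For simplicity the curve below is a straight line, which has a closed-form solution.]

```python
# Invented noisy observations, roughly following z = 2t.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
zs = [0.1, 1.9, 4.1, 5.9, 8.1]

n = len(ts)
mean_t = sum(ts) / n
mean_z = sum(zs) / n

# Closed-form least-squares slope and intercept for the model z = a*t + b.
a = (sum((t - mean_t) * (z - mean_z) for t, z in zip(ts, zs))
     / sum((t - mean_t) ** 2 for t in ts))
b = mean_z - a * mean_t
```

The same idea extends to ellipses and parabolas; only the algebra for minimizing the squared error changes.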
103
00:05:11,274 --> 00:05:13,890
OK, so questions of this
type-- clearly the
104
00:05:13,890 --> 00:05:17,100
applications are all
over the place.
105
00:05:17,100 --> 00:05:20,830
But how is this related
conceptually with what we have
106
00:05:20,830 --> 00:05:22,480
been doing so far?
107
00:05:22,480 --> 00:05:25,960
What's the relation between the
field of inference and the
108
00:05:25,960 --> 00:05:28,130
field of probability
as we have been
109
00:05:28,130 --> 00:05:30,650
practicing until now?
110
00:05:30,650 --> 00:05:33,620
Well, mathematically speaking,
what's going to happen in the
111
00:05:33,620 --> 00:05:38,780
next few lectures could be just
exercises or homework
112
00:05:38,780 --> 00:05:44,880
problems in this class based
on what we have done so far.
113
00:05:44,880 --> 00:05:48,560
That means you're not going
to get any new facts about
114
00:05:48,560 --> 00:05:50,200
probability theory.
115
00:05:50,200 --> 00:05:53,930
Everything we're going to do
will be simple applications of
116
00:05:53,930 --> 00:05:57,110
things that you already
do know.
117
00:05:57,110 --> 00:06:00,140
So in some sense, statistics
and inference is just an
118
00:06:00,140 --> 00:06:02,780
applied exercise
in probability.
119
00:06:02,780 --> 00:06:08,310
But actually, things are
not that simple in
120
00:06:08,310 --> 00:06:09,550
the following sense.
121
00:06:09,550 --> 00:06:12,510
If you get a probability
problem,
122
00:06:12,510 --> 00:06:14,040
there's a correct answer.
123
00:06:14,040 --> 00:06:15,450
There's a correct solution.
124
00:06:15,450 --> 00:06:18,170
And that correct solution
is unique.
125
00:06:18,170 --> 00:06:20,550
There's no ambiguity.
126
00:06:20,550 --> 00:06:23,380
The theory of probability has
clearly defined rules.
127
00:06:23,380 --> 00:06:24,570
These are the axioms.
128
00:06:24,570 --> 00:06:27,550
You're given some information
about probability
129
00:06:27,550 --> 00:06:28,280
distributions.
130
00:06:28,280 --> 00:06:31,000
You're asked to calculate
certain other things.
131
00:06:31,000 --> 00:06:32,190
There's no ambiguity.
132
00:06:32,190 --> 00:06:34,230
Answers are always unique.
133
00:06:34,230 --> 00:06:39,180
In statistical questions, it's
no longer the case that the
134
00:06:39,180 --> 00:06:41,420
question has a unique answer.
135
00:06:41,420 --> 00:06:44,990
If I give you data and I ask
you what's the best way of
136
00:06:44,990 --> 00:06:49,710
estimating the motion of that
planet, reasonable people can
137
00:06:49,710 --> 00:06:53,370
come up with different
methods.
138
00:06:53,370 --> 00:06:56,790
And reasonable people will try
to argue that my method has
139
00:06:56,790 --> 00:07:00,140
these desirable properties but
somebody else may say, here's
140
00:07:00,140 --> 00:07:03,740
another method that has certain
desirable properties.
141
00:07:03,740 --> 00:07:08,220
And it's not clear what
the best method is.
142
00:07:08,220 --> 00:07:11,330
So it's good to have some
understanding of what the
143
00:07:11,330 --> 00:07:16,910
issues are and to know at least
what is the general
144
00:07:16,910 --> 00:07:20,150
class of methods that one tries
to consider, how does
145
00:07:20,150 --> 00:07:22,380
one go about such problems.
146
00:07:22,380 --> 00:07:24,360
So we're going to see
lots and lots of
147
00:07:24,360 --> 00:07:25,880
different inference methods.
148
00:07:25,880 --> 00:07:27,350
We're not going to tell
you that one is
149
00:07:27,350 --> 00:07:28,730
better than the other.
150
00:07:28,730 --> 00:07:30,940
But it's important to understand
what are the
151
00:07:30,940 --> 00:07:33,980
concepts behind those
different methods.
152
00:07:33,980 --> 00:07:38,710
And finally, statistics can
be misused really badly.
153
00:07:38,710 --> 00:07:41,870
That is, one can come up with
methods that you think are
154
00:07:41,870 --> 00:07:48,650
sound, but in fact they're
not quite that.
155
00:07:48,650 --> 00:07:52,830
I will bring some examples next
time and talk a little
156
00:07:52,830 --> 00:07:54,290
more about this.
157
00:07:54,290 --> 00:07:58,540
So, let's say, you have
some data, you want to make
158
00:07:58,540 --> 00:08:02,590
some inference from them, what
many people will do is to go
159
00:08:02,590 --> 00:08:06,340
to Wikipedia, find a statistical
test that they
160
00:08:06,340 --> 00:08:08,990
think applies to that
situation, plug in numbers,
161
00:08:08,990 --> 00:08:10,880
and present results.
162
00:08:10,880 --> 00:08:14,220
Are the conclusions that they
get really justified or are
163
00:08:14,220 --> 00:08:16,400
they misusing statistical
methods?
164
00:08:16,400 --> 00:08:20,520
Well, too many people actually
do misuse statistics and
165
00:08:20,520 --> 00:08:24,530
conclusions that people
get are often false.
166
00:08:24,530 --> 00:08:29,840
So it's important to, besides
just being able to copy
167
00:08:29,840 --> 00:08:32,600
statistical tests and use them,
to understand what are
168
00:08:32,600 --> 00:08:35,860
the assumptions behind the
different methods and what
169
00:08:35,860 --> 00:08:40,559
kind of guarantees they
have, if any.
170
00:08:40,559 --> 00:08:44,420
All right, so we'll try to do a
quick tour through the field
171
00:08:44,420 --> 00:08:47,600
of inference in this lecture and
the next few lectures that
172
00:08:47,600 --> 00:08:51,700
we have left this semester and
try to highlight at the very
173
00:08:51,700 --> 00:08:53,940
high level the main concepts,
skills, and
174
00:08:53,940 --> 00:08:56,990
techniques that come in.
175
00:08:56,990 --> 00:08:59,840
Let's start with some
generalities and some general
176
00:08:59,840 --> 00:09:01,090
statements.
177
00:09:01,090 --> 00:09:03,090
178
00:09:03,090 --> 00:09:07,090
One first statement is that
statistics or inference
179
00:09:07,090 --> 00:09:11,800
problems come up in very
different guises.
180
00:09:11,800 --> 00:09:16,490
And they may look as if they are
of very different forms.
181
00:09:16,490 --> 00:09:20,190
Although, at some fundamental
level, the basic issues turn
182
00:09:20,190 --> 00:09:23,320
out to be always pretty
much the same.
183
00:09:23,320 --> 00:09:27,880
So let's look at this example.
184
00:09:27,880 --> 00:09:31,420
There's an unknown signal
that's being sent.
185
00:09:31,420 --> 00:09:35,840
It's sent through some medium,
and that medium just takes the
186
00:09:35,840 --> 00:09:39,180
signal and amplifies it
by a certain number.
187
00:09:39,180 --> 00:09:41,340
So you can think of
somebody shouting.
188
00:09:41,340 --> 00:09:42,920
There's the air out there.
189
00:09:42,920 --> 00:09:46,420
What you shouted will be
attenuated through the air
190
00:09:46,420 --> 00:09:48,040
until it gets to a receiver.
191
00:09:48,040 --> 00:09:51,730
And that receiver then observes
this, but together
192
00:09:51,730 --> 00:09:53,110
with some random noise.
193
00:09:53,110 --> 00:09:56,040
194
00:09:56,040 --> 00:10:00,390
Here I meant S. S is the signal
that's being sent.
195
00:10:00,390 --> 00:10:06,280
And what you observe is an X.
196
00:10:06,280 --> 00:10:09,240
You observe X, so what kind
of inference problems
197
00:10:09,240 --> 00:10:11,240
could we have here?
198
00:10:11,240 --> 00:10:15,400
In some cases, you want to build
a model of the physical
199
00:10:15,400 --> 00:10:17,450
phenomenon that you're
dealing with.
200
00:10:17,450 --> 00:10:21,180
So for example, you don't know
the attenuation of your signal
201
00:10:21,180 --> 00:10:25,190
and you try to find out what
this number is based on the
202
00:10:25,190 --> 00:10:26,980
observations that you have.
203
00:10:26,980 --> 00:10:30,240
So the way this is done in
engineering systems is that
204
00:10:30,240 --> 00:10:35,020
you design a certain signal, you
know what it is, you shout
205
00:10:35,020 --> 00:10:39,560
a particular word, and then
the receiver listens.
206
00:10:39,560 --> 00:10:43,460
And based on the intensity of
the signal that they get, they
207
00:10:43,460 --> 00:10:48,380
try to make a guess about A. So
you don't know A, but you
208
00:10:48,380 --> 00:10:52,460
know S. And by observing X,
you get some information
209
00:10:52,460 --> 00:10:54,270
about what A is.
210
00:10:54,270 --> 00:10:57,810
So in this case, you're trying
to build a model of the medium
211
00:10:57,810 --> 00:11:01,170
through which your signal
is propagating.
212
00:11:01,170 --> 00:11:04,600
So sometimes one would call
problems of this kind, let's
213
00:11:04,600 --> 00:11:07,990
say, system identification.
214
00:11:07,990 --> 00:11:11,980
In a different version of an
inference problem that comes
215
00:11:11,980 --> 00:11:15,300
with this picture, you've
done your modeling.
216
00:11:15,300 --> 00:11:18,160
You know your A. You know the
medium through which the
217
00:11:18,160 --> 00:11:22,330
signal is going, but it's
a communication system.
218
00:11:22,330 --> 00:11:24,190
This person is trying
to communicate
219
00:11:24,190 --> 00:11:26,140
something to that person.
220
00:11:26,140 --> 00:11:30,250
So you send the signal S, but
that person receives a noisy
221
00:11:30,250 --> 00:11:35,430
version of S. So that person
tries to reconstruct S based
222
00:11:35,430 --> 00:11:36,930
on X.
223
00:11:36,930 --> 00:11:42,210
So in both cases, we have a
linear relation between X and
224
00:11:42,210 --> 00:11:43,490
the unknown quantity.
225
00:11:43,490 --> 00:11:47,360
In one version, A is the unknown
and we know S. In the
226
00:11:47,360 --> 00:11:51,670
other version, A is known,
and so we try to infer S.
227
00:11:51,670 --> 00:11:54,300
Mathematically, you can see that
this is essentially the
228
00:11:54,300 --> 00:11:57,060
same kind of problem
in both cases.
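[A sketch of the system-identification version, with simulated data: the pilot signal S is known, the observations are X = a*S + W with zero-mean noise W, and a natural estimate of the unknown attenuation a is the least-squares one. All numbers here are invented.]

```python
import random

random.seed(0)
a_true = 0.7  # the unknown attenuation (used here only to simulate data)
s = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0] * 50  # known pilot signal

# Observations X_i = a * S_i + W_i, with zero-mean Gaussian noise W_i.
x = [a_true * si + random.gauss(0.0, 0.1) for si in s]

# Least-squares estimate of a, given the known S and the observed X.
a_hat = sum(si * xi for si, xi in zip(s, x)) / sum(si * si for si in s)
```

Estimating S when a is known is the mirror-image computation, which is one way to see that the two problems are mathematically the same.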
229
00:11:57,060 --> 00:12:03,590
Although, the kind of practical
problem that you're
230
00:12:03,590 --> 00:12:07,580
trying to solve is a
little different.
231
00:12:07,580 --> 00:12:11,880
So we will not be making any
distinctions between problems
232
00:12:11,880 --> 00:12:15,940
of the model building type as
opposed to models where you
233
00:12:15,940 --> 00:12:19,260
try to estimate some unknown
signal and so on.
234
00:12:19,260 --> 00:12:22,400
Because conceptually, the tools
that one uses for both
235
00:12:22,400 --> 00:12:26,850
types of problems are
essentially the same.
236
00:12:26,850 --> 00:12:30,430
OK, next a very useful
classification
237
00:12:30,430 --> 00:12:31,680
of inference problems--
238
00:12:31,680 --> 00:12:34,170
239
00:12:34,170 --> 00:12:37,760
the unknown quantity that you're
trying to estimate
240
00:12:37,760 --> 00:12:40,770
could be either a discrete
one that takes a
241
00:12:40,770 --> 00:12:43,040
small number of values.
242
00:12:43,040 --> 00:12:45,605
So this could be discrete
problems, such as the airplane
243
00:12:45,605 --> 00:12:48,080
radar problem we encountered
back a long
244
00:12:48,080 --> 00:12:50,120
time ago in this class.
245
00:12:50,120 --> 00:12:52,120
So there's two possibilities--
246
00:12:52,120 --> 00:12:55,450
an airplane is out there or an
airplane is not out there.
247
00:12:55,450 --> 00:12:57,050
And you're trying to
make a decision
248
00:12:57,050 --> 00:12:58,940
between these two options.
249
00:12:58,940 --> 00:13:01,570
Or you can have other problems
where you have, let's say,
250
00:13:01,570 --> 00:13:03,380
four possible options.
251
00:13:03,380 --> 00:13:05,970
You don't know which one is
true, but you get data and you
252
00:13:05,970 --> 00:13:09,040
try to figure out which
one is true.
253
00:13:09,040 --> 00:13:12,050
In problems of this kind,
usually you want to make a
254
00:13:12,050 --> 00:13:14,050
decision based on your data.
255
00:13:14,050 --> 00:13:17,000
And you're interested in the
probability of making a
256
00:13:17,000 --> 00:13:18,040
correct decision.
257
00:13:18,040 --> 00:13:19,430
You would like that
probability to
258
00:13:19,430 --> 00:13:21,830
be as high as possible.
259
00:13:21,830 --> 00:13:24,000
Estimation problems are
a little different.
260
00:13:24,000 --> 00:13:28,540
Here you have some continuous
quantity that's not known.
261
00:13:28,540 --> 00:13:31,860
And you try to make a good
guess of that quantity.
262
00:13:31,860 --> 00:13:36,050
And you would like your guess to
be as close as possible to
263
00:13:36,050 --> 00:13:37,310
the true quantity.
264
00:13:37,310 --> 00:13:40,270
So the polling problem
was of this type.
265
00:13:40,270 --> 00:13:44,720
There was an unknown fraction
f of the population that had
266
00:13:44,720 --> 00:13:45,870
some property.
267
00:13:45,870 --> 00:13:50,040
And you try to estimate f as
accurately as you can.
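[Concretely, as in the earlier polling lectures: the natural estimator of f is the sample proportion, and a conservative 95% accuracy margin shrinks like 1/sqrt(n). A sketch with invented responses (1 = has the property):]

```python
import math

# Invented poll responses: 1 if the person has the property, 0 if not.
sample = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

n = len(sample)
f_hat = sum(sample) / n  # the sample proportion estimates f

# Conservative 95% margin: 1.96 * sqrt(f(1-f)/n) <= roughly 1/sqrt(n).
margin = 1.0 / math.sqrt(n)
```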
268
00:13:50,040 --> 00:13:53,420
So the distinction here is that
usually here the unknown
269
00:13:53,420 --> 00:13:56,440
quantity takes on discrete
set of values.
270
00:13:56,440 --> 00:13:57,890
Here the unknown quantity
takes a
271
00:13:57,890 --> 00:14:00,030
continuous set of values.
272
00:14:00,030 --> 00:14:02,980
Here we're interested in the
probability of error.
273
00:14:02,980 --> 00:14:07,400
Here we're interested in
the size of the error.
274
00:14:07,400 --> 00:14:11,000
Broadly speaking, most inference
problems fall either
275
00:14:11,000 --> 00:14:13,940
in this category or
in that category.
276
00:14:13,940 --> 00:14:17,230
Although, if you want to
complicate life, you can also
277
00:14:17,230 --> 00:14:20,250
think or construct problems
where both of these aspects
278
00:14:20,250 --> 00:14:24,410
are simultaneously present.
279
00:14:24,410 --> 00:14:28,530
OK, finally since we're in
classification mode, there is
280
00:14:28,530 --> 00:14:33,670
a very big, important dichotomy
into how one goes
281
00:14:33,670 --> 00:14:35,940
about inference problems.
282
00:14:35,940 --> 00:14:39,150
And here there's two
fundamentally different
283
00:14:39,150 --> 00:14:46,070
philosophical points of view,
which is how do we model the
284
00:14:46,070 --> 00:14:50,270
quantity that is unknown?
285
00:14:50,270 --> 00:14:54,530
In one approach, you say there's
a certain quantity
286
00:14:54,530 --> 00:14:57,590
that has a definite value.
287
00:14:57,590 --> 00:15:00,010
It just happens that
they don't know it.
288
00:15:00,010 --> 00:15:01,320
But it's a number.
289
00:15:01,320 --> 00:15:03,290
There's nothing random
about it.
290
00:15:03,290 --> 00:15:05,945
So think of trying to estimate
some physical quantity.
291
00:15:05,945 --> 00:15:10,630
292
00:15:10,630 --> 00:15:13,350
You're making measurements, you
try to estimate the mass
293
00:15:13,350 --> 00:15:15,820
of an electron, which
is a sort of
294
00:15:15,820 --> 00:15:18,270
universal physical constant.
295
00:15:18,270 --> 00:15:20,320
There's nothing random
about it.
296
00:15:20,320 --> 00:15:22,340
It's a fixed number.
297
00:15:22,340 --> 00:15:29,120
You get data, because you have
some measuring apparatus.
298
00:15:29,120 --> 00:15:33,020
And with that measuring apparatus,
the results
299
00:15:33,020 --> 00:15:37,160
that you get are affected by the
true mass of the electron,
300
00:15:37,160 --> 00:15:39,340
but there's also some noise.
301
00:15:39,340 --> 00:15:42,200
You take the data out of your
measuring apparatus and you
302
00:15:42,200 --> 00:15:44,465
try to come up with
some estimate of
303
00:15:44,465 --> 00:15:47,220
that quantity theta.
304
00:15:47,220 --> 00:15:49,760
So this is definitely a
legitimate picture, but the
305
00:15:49,760 --> 00:15:52,370
important thing in this picture
is that this theta is
306
00:15:52,370 --> 00:15:54,570
written as lowercase.
307
00:15:54,570 --> 00:15:58,110
And that's to make the point
that it's a real number, not a
308
00:15:58,110 --> 00:16:00,900
random variable.
309
00:16:00,900 --> 00:16:03,230
There's a different
philosophical approach which
310
00:16:03,230 --> 00:16:08,180
says, well, anything that I
don't know I should model it
311
00:16:08,180 --> 00:16:10,190
as a random variable.
312
00:16:10,190 --> 00:16:11,130
Yes, I know.
313
00:16:11,130 --> 00:16:14,500
The mass of the electron
is not really random.
314
00:16:14,500 --> 00:16:15,690
It's a constant.
315
00:16:15,690 --> 00:16:17,920
But I don't know what it is.
316
00:16:17,920 --> 00:16:22,510
I have some vague sense,
perhaps, of what it is,
317
00:16:22,510 --> 00:16:24,290
because of the experiments
that some other
318
00:16:24,290 --> 00:16:25,940
people carried out.
319
00:16:25,940 --> 00:16:30,560
So perhaps I have a prior
distribution on the possible
320
00:16:30,560 --> 00:16:32,160
values of Theta.
321
00:16:32,160 --> 00:16:34,990
And that prior distribution
doesn't mean that the nature
322
00:16:34,990 --> 00:16:39,320
is random, but it's more of a
subjective description of my
323
00:16:39,320 --> 00:16:44,570
subjective beliefs of where do
I think this constant number
324
00:16:44,570 --> 00:16:46,200
happens to be.
325
00:16:46,200 --> 00:16:50,140
So even though it's not truly
random, I model my initial
326
00:16:50,140 --> 00:16:52,600
beliefs before the experiment
starts.
327
00:16:52,600 --> 00:16:55,790
In terms of a prior
distribution, I view it as a
328
00:16:55,790 --> 00:16:57,470
random variable.
329
00:16:57,470 --> 00:17:01,850
Then I observe another related
random variable through some
330
00:17:01,850 --> 00:17:02,930
measuring apparatus.
331
00:17:02,930 --> 00:17:05,920
And then I use this again
to create an estimate.
332
00:17:05,920 --> 00:17:08,819
333
00:17:08,819 --> 00:17:12,069
So these two pictures
philosophically are very
334
00:17:12,069 --> 00:17:13,589
different from each other.
335
00:17:13,589 --> 00:17:17,130
Here we treat the unknown
quantities as unknown numbers.
336
00:17:17,130 --> 00:17:20,589
Here we treat them as
random variables.
337
00:17:20,589 --> 00:17:24,829
When we treat them as random
variables, then we know pretty
338
00:17:24,829 --> 00:17:27,109
much already what we
should be doing.
339
00:17:27,109 --> 00:17:29,470
We should just use
the Bayes rule.
340
00:17:29,470 --> 00:17:31,850
Based on X, find
the conditional
341
00:17:31,850 --> 00:17:33,670
distribution of Theta.
342
00:17:33,670 --> 00:17:37,520
And that's what we will be doing
mostly over this lecture
343
00:17:37,520 --> 00:17:40,010
and the next lecture.
344
00:17:40,010 --> 00:17:44,660
Now in both cases, what you end
up getting at the end is
345
00:17:44,660 --> 00:17:47,240
an estimate.
346
00:17:47,240 --> 00:17:52,120
But actually, that estimate is
what kind of object is it?
347
00:17:52,120 --> 00:17:55,170
It's a random variable
in both cases.
348
00:17:55,170 --> 00:17:56,000
Why?
349
00:17:56,000 --> 00:17:58,130
Even in this case where
theta was a
350
00:17:58,130 --> 00:18:01,060
constant, my data are random.
351
00:18:01,060 --> 00:18:02,860
I do my data processing.
352
00:18:02,860 --> 00:18:06,050
So I calculate a function
of the data, the
353
00:18:06,050 --> 00:18:07,580
data are random variables.
354
00:18:07,580 --> 00:18:11,390
So out here we output something
which is a function
355
00:18:11,390 --> 00:18:12,770
of a random variable.
356
00:18:12,770 --> 00:18:15,830
So this quantity here
will be also random.
357
00:18:15,830 --> 00:18:18,400
It's affected by the noise and
the experiment that I have
358
00:18:18,400 --> 00:18:19,650
been doing.
359
00:18:19,650 --> 00:18:22,330
That's why these estimators
will be denoted
360
00:18:22,330 --> 00:18:24,920
by uppercase Thetas.
361
00:18:24,920 --> 00:18:26,740
And we will be using hats.
362
00:18:26,740 --> 00:18:29,030
Hat, usually in estimation,
means
363
00:18:29,030 --> 00:18:32,990
an estimate of something.
364
00:18:32,990 --> 00:18:35,380
All right, so this is
the big picture.
365
00:18:35,380 --> 00:18:38,690
We're going to start with
the Bayesian version.
366
00:18:38,690 --> 00:18:42,830
And then the last few lectures
we're going to talk about the
367
00:18:42,830 --> 00:18:45,690
non-Bayesian version or
the classical one.
368
00:18:45,690 --> 00:18:48,610
By the way, I should say that
statisticians have been
369
00:18:48,610 --> 00:18:52,500
debating fiercely for 100 years
whether the right way to
370
00:18:52,500 --> 00:18:56,030
approach statistics is to go
the classical way or the
371
00:18:56,030 --> 00:18:57,420
Bayesian way.
372
00:18:57,420 --> 00:19:00,530
And there have been tides going
back and forth between
373
00:19:00,530 --> 00:19:02,260
the two sides.
374
00:19:02,260 --> 00:19:05,330
These days, Bayesian methods
tend to become a little more
375
00:19:05,330 --> 00:19:07,320
popular for various reasons.
376
00:19:07,320 --> 00:19:11,730
We're going to come back
to this later.
377
00:19:11,730 --> 00:19:14,610
All right, so in Bayesian
estimation, what we got in our
378
00:19:14,610 --> 00:19:16,610
hands is Bayes rule.
379
00:19:16,610 --> 00:19:19,380
And if you have Bayes rule,
there's not a lot
380
00:19:19,380 --> 00:19:21,340
that's left to do.
381
00:19:21,340 --> 00:19:24,190
We have different forms of the
Bayes rule, depending on
382
00:19:24,190 --> 00:19:27,920
whether we're dealing with
discrete data and discrete
383
00:19:27,920 --> 00:19:32,310
quantities to estimate, or
continuous data, and so on.
384
00:19:32,310 --> 00:19:36,020
In the hypothesis testing
problem, the unknown quantity
385
00:19:36,020 --> 00:19:38,210
Theta is discrete.
386
00:19:38,210 --> 00:19:42,890
So in both cases here,
we have a P of Theta.
387
00:19:42,890 --> 00:19:45,530
We obtain data, the X's.
388
00:19:45,530 --> 00:19:49,040
And on the basis of the X that
we observe, we can calculate
389
00:19:49,040 --> 00:19:53,340
the posterior distribution
of Theta, given the data.
390
00:19:53,340 --> 00:19:59,840
So to use Bayesian inference,
what do we start with?
391
00:19:59,840 --> 00:20:03,160
We start with some priors.
392
00:20:03,160 --> 00:20:05,910
These are our initial
beliefs about what
393
00:20:05,910 --> 00:20:07,890
Theta that might be.
394
00:20:07,890 --> 00:20:10,440
That's before we do
the experiment.
395
00:20:10,440 --> 00:20:13,840
We have a model of the
experimental apparatus.
396
00:20:13,840 --> 00:20:17,520
397
00:20:17,520 --> 00:20:21,550
And the model of the
experimental apparatus tells
398
00:20:21,550 --> 00:20:28,040
us if this Theta is true, I'm
going to see X's of that kind.
399
00:20:28,040 --> 00:20:31,480
If that other Theta is true, I'm
going to see X's that
400
00:20:31,480 --> 00:20:33,130
are somewhere else.
401
00:20:33,130 --> 00:20:35,200
That models my apparatus.
402
00:20:35,200 --> 00:20:39,150
And based on that knowledge,
once I observe, I have these
403
00:20:39,150 --> 00:20:41,975
two functions in my hands, we
have already seen that if you
404
00:20:41,975 --> 00:20:44,760
know those two functions, you
can also calculate the
405
00:20:44,760 --> 00:20:46,550
denominator here.
406
00:20:46,550 --> 00:20:50,900
So all of these functions are
available, so you can compute,
407
00:20:50,900 --> 00:20:54,170
you can find a formula for
this function as well.
408
00:20:54,170 --> 00:20:58,780
And as soon as you observe the
data, the X's, you plug in
409
00:20:58,780 --> 00:21:02,220
here the numerical value
of those X's.
410
00:21:02,220 --> 00:21:04,720
And you get a function
of Theta.
411
00:21:04,720 --> 00:21:07,870
And this is the posterior
distribution of Theta, given
412
00:21:07,870 --> 00:21:09,680
the data that you have seen.
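[For a discrete Theta, this computation is short once the prior and the measurement model are written down. A sketch in the spirit of the airplane/radar example; all the probabilities below are made up.]

```python
# Made-up prior over a discrete Theta, and measurement model P(x | theta).
prior = {"plane": 0.05, "no_plane": 0.95}
likelihood = {"plane": 0.99, "no_plane": 0.10}  # P(radar blip | theta)

# Bayes rule: posterior(theta) = prior(theta) * P(x | theta) / P(x),
# where the denominator P(x) sums the numerator over all theta.
evidence = sum(prior[t] * likelihood[t] for t in prior)
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}
```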
413
00:21:09,680 --> 00:21:11,930
So you've already done
a fair number of
414
00:21:11,930 --> 00:21:13,760
exercises of these kind.
415
00:21:13,760 --> 00:21:17,320
So we will not say more about this.
416
00:21:17,320 --> 00:21:20,470
And there's a similar formula as
you know for the case where
417
00:21:20,470 --> 00:21:22,460
we have continuous data.
418
00:21:22,460 --> 00:21:25,140
If the X's are continuous random
variables, then the
419
00:21:25,140 --> 00:21:28,620
formula is the same, except
that X's are described by
420
00:21:28,620 --> 00:21:31,630
densities instead of being
described by a probability
421
00:21:31,630 --> 00:21:32,880
mass functions.
422
00:21:32,880 --> 00:21:35,170
423
00:21:35,170 --> 00:21:40,200
OK, now if Theta is continuous,
then we're dealing
424
00:21:40,200 --> 00:21:42,160
with estimation problems.
425
00:21:42,160 --> 00:21:44,880
But the story is once
more the same.
426
00:21:44,880 --> 00:21:47,920
You're going to use the Bayes
rule to come up with the
427
00:21:47,920 --> 00:21:51,090
posterior density of Theta,
given the data
428
00:21:51,090 --> 00:21:53,300
that you have observed.
429
00:21:53,300 --> 00:21:57,250
Now just for the sake of the
example, let's come back to
430
00:21:57,250 --> 00:21:58,900
this picture here.
431
00:21:58,900 --> 00:22:03,490
Suppose that something is flying
in the air, and maybe
432
00:22:03,490 --> 00:22:07,800
this is just an object in the
air close to the Earth.
433
00:22:07,800 --> 00:22:10,820
So because of gravity, the
trajectory that it's going to
434
00:22:10,820 --> 00:22:15,170
follow is going to
be a parabola.
435
00:22:15,170 --> 00:22:18,014
So this is the general equation
of a parabola.
436
00:22:18,014 --> 00:22:23,450
Zt is the position of my
object at time t.
437
00:22:23,450 --> 00:22:26,310
438
00:22:26,310 --> 00:22:29,500
But I don't know exactly
which parabola it is.
439
00:22:29,500 --> 00:22:32,690
So the parameters of the
parabola are unknown
440
00:22:32,690 --> 00:22:34,040
quantities.
441
00:22:34,040 --> 00:22:37,710
What I can do is to go and
measure the position of my
442
00:22:37,710 --> 00:22:41,880
object at different times.
443
00:22:41,880 --> 00:22:44,575
But unfortunately, my
measurements are noisy.
444
00:22:44,575 --> 00:22:47,380
445
00:22:47,380 --> 00:22:51,070
What I want to do is to model
the motion of my object.
446
00:22:51,070 --> 00:22:56,260
So I guess in the picture, the
axis would be t going this way
447
00:22:56,260 --> 00:22:59,980
and Z going this way.
448
00:22:59,980 --> 00:23:02,470
And on the basis of the
data that I get,
449
00:23:02,470 --> 00:23:05,020
these are my X's.
450
00:23:05,020 --> 00:23:07,390
I want to figure
out the Thetas.
451
00:23:07,390 --> 00:23:09,570
That is, I want to figure
out the exact
452
00:23:09,570 --> 00:23:11,840
equation of this parabola.
453
00:23:11,840 --> 00:23:14,940
Now if somebody gives you
probability distributions for
454
00:23:14,940 --> 00:23:18,490
Theta, these would
be your priors.
455
00:23:18,490 --> 00:23:19,840
So this is given.
456
00:23:19,840 --> 00:23:23,200
457
00:23:23,200 --> 00:23:26,200
We need the conditional
distribution of the X's given
458
00:23:26,200 --> 00:23:27,360
the Thetas.
459
00:23:27,360 --> 00:23:30,870
Well, we have the conditional
distribution of Z, given the
460
00:23:30,870 --> 00:23:32,920
Thetas from this equation.
461
00:23:32,920 --> 00:23:36,040
And then by playing with this
equation, you can also find
462
00:23:36,040 --> 00:23:42,460
how is X distributed if Theta
takes a particular value.
463
00:23:42,460 --> 00:23:46,420
So you do have all of the
densities that you might need.
464
00:23:46,420 --> 00:23:48,790
And you can apply
the Bayes rule.
465
00:23:48,790 --> 00:23:53,620
And at the end, your end result
would be a formula for
466
00:23:53,620 --> 00:23:57,270
the distribution of Theta,
given the X
467
00:23:57,270 --> 00:23:59,130
that you have observed--
468
00:23:59,130 --> 00:24:03,000
except for one sort of
complication, or to make things
469
00:24:03,000 --> 00:24:04,470
more interesting.
470
00:24:04,470 --> 00:24:07,680
Instead of these X's and Theta's
being single random
471
00:24:07,680 --> 00:24:11,070
variables that we have here,
typically those X's and
472
00:24:11,070 --> 00:24:13,400
Theta's will be
multi-dimensional random
473
00:24:13,400 --> 00:24:16,490
variables or will correspond
to multiple ones.
474
00:24:16,490 --> 00:24:19,920
So this little Theta here
actually stands for a triplet
475
00:24:19,920 --> 00:24:22,880
of Theta0, Theta1, and Theta2.
476
00:24:22,880 --> 00:24:26,820
And that X here stands here for
the entire sequence of X's
477
00:24:26,820 --> 00:24:28,410
that we have observed.
478
00:24:28,410 --> 00:24:31,060
So in reality, the object that
you're going to get at the
479
00:24:31,060 --> 00:24:35,900
end after inference is done is
a function that you plug in
480
00:24:35,900 --> 00:24:39,430
the values of the data and you
get a function of the
481
00:24:39,430 --> 00:24:43,240
Theta's that tells you the
relative likelihoods of
482
00:24:43,240 --> 00:24:46,780
different Theta triplets.
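The parabola estimation the professor describes can be sketched numerically. This is a hypothetical illustration, not from the lecture: it assumes Gaussian measurement noise and a flat prior on the Theta triplet, in which case the posterior over (Theta0, Theta1, Theta2) is peaked at the ordinary least squares fit, so the peak can be found with a polynomial regression.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: the true trajectory is the parabola
# Z(t) = theta0 + theta1 * t + theta2 * t^2, observed with noise.
true_theta = np.array([1.0, 4.0, -0.5])
t = np.linspace(0.0, 6.0, 13)
z = true_theta[0] + true_theta[1] * t + true_theta[2] * t**2
x = z + 0.3 * rng.standard_normal(t.size)     # noisy measurements X_t

# With Gaussian noise and a flat prior on the Theta triplet, the
# posterior peaks at the least-squares fit, so the most plausible
# Theta triplet comes from an ordinary polynomial regression.
A = np.vstack([np.ones_like(t), t, t**2]).T   # design matrix
theta_hat, *_ = np.linalg.lstsq(A, x, rcond=None)
print(np.round(theta_hat, 1))                 # close to the true triplet
```

With an informative Gaussian prior instead of a flat one, the same computation would become a regularized least squares fit.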
483
00:24:46,780 --> 00:24:49,760
So what I'm saying is that this
is no harder than the
484
00:24:49,760 --> 00:24:53,720
problems that you have dealt
with so far, except perhaps
485
00:24:53,720 --> 00:24:56,020
for the complication that,
usually in interesting
486
00:24:56,020 --> 00:24:57,490
inference problems,
487
00:24:57,490 --> 00:25:01,940
your Theta's and X's are often
vectors of random
488
00:25:01,940 --> 00:25:05,490
variables instead of individual
random variables.
489
00:25:05,490 --> 00:25:09,630
Now if you are to do estimation
in a case where you
490
00:25:09,630 --> 00:25:13,520
have discrete data, again the
situation is no different.
491
00:25:13,520 --> 00:25:17,020
We still have a Bayes rule of
the same kind, except that
492
00:25:17,020 --> 00:25:19,540
densities get replaced
by PMF's.
493
00:25:19,540 --> 00:25:23,680
If X is discrete, you put a P
here instead of putting an f.
494
00:25:23,680 --> 00:25:27,990
So an example of an estimation
problem with discrete data is
495
00:25:27,990 --> 00:25:29,740
similar to the polling
problem.
496
00:25:29,740 --> 00:25:31,600
You have a coin.
497
00:25:31,600 --> 00:25:33,500
It has an unknown
parameter Theta.
498
00:25:33,500 --> 00:25:35,230
This is the probability
of obtaining heads.
499
00:25:35,230 --> 00:25:37,410
You flip the coin many times.
500
00:25:37,410 --> 00:25:41,560
What can you tell me about
the true value of Theta?
501
00:25:41,560 --> 00:25:46,200
A classical statistician, at
this point, would say, OK, I'm
502
00:25:46,200 --> 00:25:48,900
going to use an estimator,
the most reasonable
503
00:25:48,900 --> 00:25:50,950
one, which is this.
504
00:25:50,950 --> 00:25:54,200
How many heads did I
obtain in n trials?
505
00:25:54,200 --> 00:25:56,440
Divide by the total
number of trials.
506
00:25:56,440 --> 00:26:00,700
This is my estimate of
the bias of my coin.
507
00:26:00,700 --> 00:26:02,860
And then the classical
statistician would continue
508
00:26:02,860 --> 00:26:07,610
from here and try to prove some
properties and argue that
509
00:26:07,610 --> 00:26:10,030
this estimate is a good one.
510
00:26:10,030 --> 00:26:12,850
For example, we have the weak
law of large numbers that
511
00:26:12,850 --> 00:26:15,630
tells us that this particular
estimate converges in
512
00:26:15,630 --> 00:26:17,990
probability to the
true parameter.
513
00:26:17,990 --> 00:26:21,000
This is a kind of guarantee
that's useful to have.
514
00:26:21,000 --> 00:26:23,410
And the classical statistician
would pretty much close the
515
00:26:23,410 --> 00:26:24,660
subject in this way.
516
00:26:24,660 --> 00:26:27,340
517
00:26:27,340 --> 00:26:30,160
What would the Bayesian
person do differently?
518
00:26:30,160 --> 00:26:35,040
The Bayesian person would start
by assuming a prior
519
00:26:35,040 --> 00:26:37,100
distribution of Theta.
520
00:26:37,100 --> 00:26:39,820
Instead of treating Theta as
an unknown constant, they
521
00:26:39,820 --> 00:26:44,340
would say that Theta was
picked randomly, or pretend that
522
00:26:44,340 --> 00:26:47,360
it was picked randomly,
and assume a
523
00:26:47,360 --> 00:26:49,300
distribution on Theta.
524
00:26:49,300 --> 00:26:54,290
So for example, if you don't
know anything more,
525
00:26:54,290 --> 00:26:57,510
you might assume that any value
for the bias of the coin
526
00:26:57,510 --> 00:27:01,460
is as likely as any other value
of the bias of the coin.
527
00:27:01,460 --> 00:27:04,150
And in this way, you get a
distribution
528
00:27:04,150 --> 00:27:05,720
that's uniform.
529
00:27:05,720 --> 00:27:09,840
Or if you have a little more
faith in the manufacturing
530
00:27:09,840 --> 00:27:13,270
process that created that
coin, you might choose your
531
00:27:13,270 --> 00:27:17,660
prior to be a distribution
that's centered around 1/2 and
532
00:27:17,660 --> 00:27:21,860
sits fairly narrowly
around 1/2.
533
00:27:21,860 --> 00:27:24,500
That would be a prior
distribution in which you say,
534
00:27:24,500 --> 00:27:27,920
well, I believe that the
manufacturer tried to make my
535
00:27:27,920 --> 00:27:29,410
coin to be fair.
536
00:27:29,410 --> 00:27:33,070
But they often make some
mistakes, so it's going to be,
537
00:27:33,070 --> 00:27:36,600
I believe, approximately
1/2 but not quite.
538
00:27:36,600 --> 00:27:40,050
So depending on your beliefs,
you would choose an
539
00:27:40,050 --> 00:27:43,630
appropriate prior for the
distribution of Theta.
540
00:27:43,630 --> 00:27:48,610
And then you would use the
Bayes rule to find the
541
00:27:48,610 --> 00:27:52,270
probabilities of different
values of Theta, based on the
542
00:27:52,270 --> 00:27:53,520
data that you have observed.
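The Bayesian coin-flip calculation just outlined can be sketched numerically. A minimal Python illustration, assuming a hypothetical data set of 7 heads in 10 flips (not part of the lecture) and the uniform prior discussed above, discretized on a grid:

```python
import numpy as np

# Hypothetical data: 7 heads in n = 10 flips of a coin whose bias
# Theta is given a uniform prior on [0, 1].
n, k = 10, 7

# Discretize Theta on a fine grid and apply the Bayes rule:
# posterior is proportional to prior times likelihood.
thetas = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(thetas)                     # uniform prior
likelihood = thetas**k * (1.0 - thetas)**(n - k)
posterior = prior * likelihood
posterior /= posterior.sum()                     # normalize on the grid

# The posterior peaks at k/n = 0.7, while the posterior mean is
# (k+1)/(n+2) = 8/12, the classic "rule of succession" estimate.
theta_peak = thetas[np.argmax(posterior)]
posterior_mean = (thetas * posterior).sum()
print(theta_peak, round(posterior_mean, 3))
```

A narrower prior centered at 1/2, as in the trusted-manufacturer scenario, would pull both numbers toward 1/2.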
543
00:27:53,520 --> 00:27:59,620
544
00:27:59,620 --> 00:28:04,640
So no matter which version of
the Bayes rule that you use,
545
00:28:04,640 --> 00:28:10,540
the end product of the Bayes
rule is going to be either a
546
00:28:10,540 --> 00:28:14,400
plot of this kind or a
plot of that kind.
547
00:28:14,400 --> 00:28:16,740
So what am I plotting here?
548
00:28:16,740 --> 00:28:19,810
This axis is the Theta axis.
549
00:28:19,810 --> 00:28:23,830
These are the possible values
of the unknown quantity that
550
00:28:23,830 --> 00:28:26,670
we're trying to estimate.
551
00:28:26,670 --> 00:28:28,990
In the continuous
case, Theta is a
552
00:28:28,990 --> 00:28:30,800
continuous random variable.
553
00:28:30,800 --> 00:28:32,560
I obtain my data.
554
00:28:32,560 --> 00:28:36,430
And I plot the posterior
probability distribution after
555
00:28:36,430 --> 00:28:37,940
observing my data.
556
00:28:37,940 --> 00:28:42,220
And I'm plotting here the
probability density for Theta.
557
00:28:42,220 --> 00:28:45,500
So this is a plot
of that density.
558
00:28:45,500 --> 00:28:49,210
In the discrete case, Theta can
take finitely many values
559
00:28:49,210 --> 00:28:51,570
or a discrete set of values.
560
00:28:51,570 --> 00:28:54,470
And for each one of those
values, I'm telling you how
561
00:28:54,470 --> 00:28:58,080
likely that value is to be
the correct one, given the
562
00:28:58,080 --> 00:29:01,040
data that I have observed.
563
00:29:01,040 --> 00:29:04,990
And in general, what you would
go back to your boss and
564
00:29:04,990 --> 00:29:08,520
report after you've done all
your inference work would be
565
00:29:08,520 --> 00:29:10,870
either a plot of this kind
or of that kind.
566
00:29:10,870 --> 00:29:14,180
So you go to your boss
who asks you, what is
567
00:29:14,180 --> 00:29:15,190
the value of Theta?
568
00:29:15,190 --> 00:29:17,490
And you say, well, I only
have limited data.
569
00:29:17,490 --> 00:29:19,420
So I don't know what it is.
570
00:29:19,420 --> 00:29:22,920
It could be this, with
so much probability.
571
00:29:22,920 --> 00:29:24,640
There's probability.
572
00:29:24,640 --> 00:29:27,220
OK, let's throw in some
numbers here.
573
00:29:27,220 --> 00:29:32,250
There's probability 0.3 that
Theta is this value.
574
00:29:32,250 --> 00:29:36,100
There's probability 0.2 that
Theta is this value, 0.1 that
575
00:29:36,100 --> 00:29:39,420
it's this one, 0.1 that it's
this one, 0.2 that it's that
576
00:29:39,420 --> 00:29:40,830
one, and so on.
577
00:29:40,830 --> 00:29:44,890
OK, now bosses often want
simple answers.
578
00:29:44,890 --> 00:29:48,480
They say, OK, you're
talking too much.
579
00:29:48,480 --> 00:29:51,770
What do you think Theta is?
580
00:29:51,770 --> 00:29:55,920
And now you're forced
to make a decision.
581
00:29:55,920 --> 00:30:00,680
If that was the situation and
you have to make a decision,
582
00:30:00,680 --> 00:30:02,370
how would you make it?
583
00:30:02,370 --> 00:30:06,880
Well, I'm going to make a
decision that's most likely to
584
00:30:06,880 --> 00:30:09,120
be correct.
585
00:30:09,120 --> 00:30:13,060
If I make this decision,
what's going to happen?
586
00:30:13,060 --> 00:30:17,670
Theta is this value with
probability 0.2, which means
587
00:30:17,670 --> 00:30:21,150
there's probability 0.8 that
I make an error
588
00:30:21,150 --> 00:30:23,280
if I make that guess.
589
00:30:23,280 --> 00:30:29,370
If I make that decision, this
decision has probability 0.3 of
590
00:30:29,370 --> 00:30:30,750
being the correct one.
591
00:30:30,750 --> 00:30:34,530
So I have probability
of error 0.7.
592
00:30:34,530 --> 00:30:38,460
So if you want to just maximize
the probability of
593
00:30:38,460 --> 00:30:41,730
giving the correct decision, or
if you want to minimize the
594
00:30:41,730 --> 00:30:44,780
probability of making an
incorrect decision, what
595
00:30:44,780 --> 00:30:48,790
you're going to choose to report
is that value of Theta
596
00:30:48,790 --> 00:30:51,450
for which the probability
is highest.
597
00:30:51,450 --> 00:30:54,230
So in this case, I would
choose to report this
598
00:30:54,230 --> 00:30:58,210
particular value, the most
likely value of Theta, given
599
00:30:58,210 --> 00:31:00,120
what I have observed.
600
00:31:00,120 --> 00:31:04,640
And that value is called the
maximum a posteriori
601
00:31:04,640 --> 00:31:07,550
probability estimate.
602
00:31:07,550 --> 00:31:11,550
It's going to be this
one in our case.
603
00:31:11,550 --> 00:31:16,830
So picking the point in the
posterior PMF that has the
604
00:31:16,830 --> 00:31:19,040
highest probability.
605
00:31:19,040 --> 00:31:20,720
That's the reasonable
thing to do.
606
00:31:20,720 --> 00:31:23,850
This is the optimal thing to do
if you want to minimize the
607
00:31:23,850 --> 00:31:27,340
probability of an incorrect
inference.
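The MAP decision rule just described can be sketched with the numbers used on the board. A minimal Python illustration; the candidate values of Theta and the trailing 0.1 that completes the "and so on" are hypothetical fill-ins:

```python
# Hypothetical posterior PMF over six candidate values of Theta,
# using the probabilities from the board (0.3, 0.2, 0.1, 0.1, 0.2)
# plus a final 0.1 so the probabilities sum to 1.
posterior_pmf = {1: 0.3, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.2, 6: 0.1}

# The MAP estimate picks the value with the largest posterior
# probability, which minimizes the probability of an incorrect guess.
theta_map = max(posterior_pmf, key=posterior_pmf.get)
prob_error = 1.0 - posterior_pmf[theta_map]
print(theta_map, round(prob_error, 1))   # the 0.3 value; error prob 0.7
```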
608
00:31:27,340 --> 00:31:31,400
And that's what people do
usually if they need to report
609
00:31:31,400 --> 00:31:35,280
a single answer, if they need
to report a single decision.
610
00:31:35,280 --> 00:31:39,530
How about in the estimation
context?
611
00:31:39,530 --> 00:31:43,250
If that's what you know about
Theta, Theta could be around
612
00:31:43,250 --> 00:31:46,670
here, but there's also some
sharp probability that it is
613
00:31:46,670 --> 00:31:48,720
around here.
614
00:31:48,720 --> 00:31:52,380
What's the single answer that
you would give to your boss?
615
00:31:52,380 --> 00:31:56,310
One option is to use the same
philosophy and say, OK, I'm
616
00:31:56,310 --> 00:32:00,135
going to find the Theta at which
this posterior density
617
00:32:00,135 --> 00:32:01,690
is highest.
618
00:32:01,690 --> 00:32:06,010
So I would pick this point
here and report this
619
00:32:06,010 --> 00:32:06,920
particular Theta.
620
00:32:06,920 --> 00:32:11,110
So this would be my Theta,
again, Theta MAP, the Theta
621
00:32:11,110 --> 00:32:15,290
that has the highest a
posteriori probability, just
622
00:32:15,290 --> 00:32:19,100
because it corresponds to
the peak of the density.
623
00:32:19,100 --> 00:32:23,810
But in this context, the
maximum a posteriori
624
00:32:23,810 --> 00:32:27,120
probability Theta was the
one that was most
625
00:32:27,120 --> 00:32:28,600
likely to be true.
626
00:32:28,600 --> 00:32:32,460
In the continuous case, you
cannot really say that this is
627
00:32:32,460 --> 00:32:34,940
the most likely value
of Theta.
628
00:32:34,940 --> 00:32:38,340
In a continuous setting, any
value of Theta has zero
629
00:32:38,340 --> 00:32:41,530
probability when we
talk about densities.
630
00:32:41,530 --> 00:32:43,260
So it's not the most likely.
631
00:32:43,260 --> 00:32:48,240
It's the one for which the
density, and so the probability
632
00:32:48,240 --> 00:32:51,820
of a small neighborhood,
is highest.
633
00:32:51,820 --> 00:32:56,390
So the rationale for picking
this particular estimate in
634
00:32:56,390 --> 00:33:00,050
the continuous case is much
less compelling than the
635
00:33:00,050 --> 00:33:02,210
rationale that we had in here.
636
00:33:02,210 --> 00:33:05,590
So in this case, reasonable
people might choose different
637
00:33:05,590 --> 00:33:07,460
quantities to report.
638
00:33:07,460 --> 00:33:11,810
And a very popular one would
be to report instead the
639
00:33:11,810 --> 00:33:13,700
conditional expectation.
640
00:33:13,700 --> 00:33:15,990
So I don't know quite
what Theta is.
641
00:33:15,990 --> 00:33:19,600
Given the data that I have,
Theta has this distribution.
642
00:33:19,600 --> 00:33:23,320
Let me just report the average
over that distribution.
643
00:33:23,320 --> 00:33:27,090
Let me report the center
of gravity of this figure.
644
00:33:27,090 --> 00:33:30,340
And in this figure, the center
of gravity would probably be
645
00:33:30,340 --> 00:33:32,230
somewhere around here.
646
00:33:32,230 --> 00:33:35,690
And that would be a different
estimate that you
647
00:33:35,690 --> 00:33:37,520
might choose to report.
648
00:33:37,520 --> 00:33:40,340
So center of gravity is
something around here.
649
00:33:40,340 --> 00:33:43,580
And this is a conditional
expectation of Theta, given
650
00:33:43,580 --> 00:33:46,010
the data that you have.
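The contrast between the peak and the center of gravity can be sketched numerically. This uses a hypothetical bimodal density chosen to resemble the board sketch; the bump locations and widths are assumptions, not lecture values:

```python
import numpy as np

# A hypothetical bimodal posterior density: a tall bump near
# Theta = 3 and a smaller, narrower bump near Theta = 8.
thetas = np.linspace(0.0, 10.0, 2001)
dt = thetas[1] - thetas[0]
density = 0.8 * np.exp(-0.5 * ((thetas - 3.0) / 0.5) ** 2) \
        + 0.2 * np.exp(-0.5 * ((thetas - 8.0) / 0.3) ** 2)
density /= density.sum() * dt          # normalize to integrate to 1

# MAP estimate: the location of the peak of the posterior density.
theta_map = thetas[np.argmax(density)]

# Conditional-expectation estimate: the center of gravity, which lands
# between the two bumps, pulled toward the heavier one.
theta_lms = (thetas * density).sum() * dt
print(round(theta_map, 2), round(theta_lms, 2))
```

Note how the center of gravity sits in a region where the density itself is low, which is exactly why reasonable people might prefer one estimate or the other.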
651
00:33:46,010 --> 00:33:51,190
So these are two, in some sense,
fairly reasonable ways
652
00:33:51,190 --> 00:33:53,850
of choosing what to report
to your boss.
653
00:33:53,850 --> 00:33:55,690
Some people might choose
to report this.
654
00:33:55,690 --> 00:33:58,630
Some people might choose
to report that.
655
00:33:58,630 --> 00:34:03,230
And a priori, there's no
compelling reason why one
656
00:34:08,639 --> 00:34:12,350
would be preferable to the other
one, unless you set some rules
657
00:34:08,639 --> 00:34:12,350
for the game and you describe
a little more precisely what
658
00:34:12,350 --> 00:34:14,090
your objectives are.
659
00:34:14,090 --> 00:34:19,070
But no matter which one you
report, a single answer, a
660
00:34:19,070 --> 00:34:24,350
point estimate, doesn't really
tell you the whole story.
661
00:34:24,350 --> 00:34:28,159
There's a lot more information
conveyed by this posterior
662
00:34:28,159 --> 00:34:31,060
distribution plot than
any single number
663
00:34:31,060 --> 00:34:32,159
that you might report.
664
00:34:32,159 --> 00:34:36,510
So in general, you may wish to
convince your boss that it's
665
00:34:36,510 --> 00:34:40,310
worth their time to look at the
entire plot, because that
666
00:34:40,310 --> 00:34:43,100
plot sort of covers all
the possibilities.
667
00:34:43,100 --> 00:34:47,060
It tells your boss most likely
we're in that range, but
668
00:34:47,060 --> 00:34:51,620
there's also a distinct chance
that our Theta happens to lie
669
00:34:51,620 --> 00:34:54,080
in that range.
670
00:34:54,080 --> 00:34:58,400
All right, now let us try to
perhaps differentiate between
671
00:34:58,400 --> 00:35:02,570
these two and see under what
circumstances this one might
672
00:35:02,570 --> 00:35:05,530
be the better estimate
to report.
673
00:35:05,530 --> 00:35:07,320
Better with respect to what?
674
00:35:07,320 --> 00:35:08,830
We need some rules.
675
00:35:08,830 --> 00:35:10,730
So we're going to throw
in some rules.
676
00:35:10,730 --> 00:35:14,320
677
00:35:14,320 --> 00:35:17,450
As a warm up, we're going to
deal with the problem of
678
00:35:17,450 --> 00:35:22,000
making an estimate if you
had no information at all,
679
00:35:22,000 --> 00:35:24,670
except for a prior
distribution.
680
00:35:24,670 --> 00:35:27,650
So this is a warm up for what's
coming next, which
681
00:35:27,650 --> 00:35:32,970
would be estimation that takes
into account some information.
682
00:35:32,970 --> 00:35:34,860
So we have a Theta.
683
00:35:34,860 --> 00:35:38,500
And because of your subjective
beliefs or models by others,
684
00:35:38,500 --> 00:35:41,780
you believe that Theta is
uniformly distributed between,
685
00:35:41,780 --> 00:35:46,250
let's say, 4 and 10.
686
00:35:46,250 --> 00:35:48,120
You want to come up with
a point estimate.
687
00:35:48,120 --> 00:35:51,770
688
00:35:51,770 --> 00:35:54,900
Let's try to look
for an estimate.
689
00:35:54,900 --> 00:35:57,580
Call it c, in this case.
690
00:35:57,580 --> 00:36:00,090
I want to pick a number
with which to estimate
691
00:36:00,090 --> 00:36:01,340
the value of Theta.
692
00:36:01,340 --> 00:36:04,030
693
00:36:04,030 --> 00:36:08,260
I will be interested in the size
of the error that I make.
694
00:36:08,260 --> 00:36:12,310
And I really dislike large
errors, so I'm going to focus
695
00:36:12,310 --> 00:36:15,500
on the square of the error
that I make.
696
00:36:15,500 --> 00:36:19,140
So I pick c.
697
00:36:19,140 --> 00:36:21,340
Theta has a random value
that I don't know.
698
00:36:21,340 --> 00:36:25,900
But whatever it is, once it
becomes known, it results into
699
00:36:25,900 --> 00:36:28,640
a squared error between
what it is and what I
700
00:36:28,640 --> 00:36:30,660
guessed that it was.
701
00:36:30,660 --> 00:36:35,770
And I'm interested in making
a small error on the average,
702
00:36:35,770 --> 00:36:38,170
where the average is taken
with respect to all the
703
00:36:38,170 --> 00:36:42,350
possible and unknown
values of Theta.
704
00:36:42,350 --> 00:36:47,220
So this is a least
squares formulation of
705
00:36:47,220 --> 00:36:49,240
the problem, where we
try to minimize the
706
00:36:49,240 --> 00:36:51,150
least squares errors.
707
00:36:51,150 --> 00:36:53,900
How do you find the optimal c?
708
00:36:53,900 --> 00:36:57,200
Well, we take that expression
and expand it.
709
00:36:57,200 --> 00:37:00,930
710
00:37:00,930 --> 00:37:05,650
And it is, using linearity
of expectations--
711
00:37:05,650 --> 00:37:11,460
expected Theta squared minus 2c expected
Theta plus c squared--
712
00:37:11,460 --> 00:37:13,620
that's the quantity that
we want to minimize,
713
00:37:13,620 --> 00:37:16,670
with respect to c.
714
00:37:16,670 --> 00:37:19,670
To do the minimization, take the
derivative with respect to
715
00:37:19,670 --> 00:37:21,950
c and set it to 0.
716
00:37:21,950 --> 00:37:27,320
So that differentiation gives us
from here minus 2 expected
717
00:37:27,320 --> 00:37:32,420
value of Theta plus
2c is equal to 0.
718
00:37:32,420 --> 00:37:36,550
And the answer that you get by
solving this equation is that
719
00:37:36,550 --> 00:37:39,350
c is the expected
value of Theta.
720
00:37:39,350 --> 00:37:42,860
So when you do this
optimization, you find that
721
00:37:42,860 --> 00:37:45,170
the optimal estimate, the
thing you should be
722
00:37:45,170 --> 00:37:47,970
reporting, is the expected
value of Theta.
723
00:37:47,970 --> 00:37:51,630
So in this particular example,
you would choose your estimate
724
00:37:51,630 --> 00:37:55,500
c to be just the middle
of these values,
725
00:37:55,500 --> 00:37:57,980
which would be 7.
726
00:37:57,980 --> 00:38:02,642
727
00:38:02,642 --> 00:38:06,640
OK, and in case your
boss asks you, how
728
00:38:06,640 --> 00:38:08,610
good is your estimate?
729
00:38:08,610 --> 00:38:11,390
How big is your error
going to be?
730
00:38:11,390 --> 00:38:14,910
731
00:38:14,910 --> 00:38:19,870
What you could report is the
average size of the estimation
732
00:38:19,870 --> 00:38:22,570
error that you are making.
733
00:38:22,570 --> 00:38:26,760
We picked our estimates to be
the expected value of Theta.
734
00:38:26,760 --> 00:38:29,450
So for this particular way that
I'm choosing to do my
735
00:38:29,450 --> 00:38:33,610
estimation, this is the mean
squared error that I get.
736
00:38:33,610 --> 00:38:35,330
And this is a familiar
quantity.
737
00:38:35,330 --> 00:38:38,370
It's just the variance
of the distribution.
738
00:38:38,370 --> 00:38:41,890
So the expectation is the
best way to estimate a
739
00:38:41,890 --> 00:38:45,550
quantity, if you're interested
in the mean squared error.
740
00:38:45,550 --> 00:38:50,430
And the resulting mean squared
error is the variance itself.
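This least squares conclusion is easy to check by simulation. A minimal sketch, assuming the uniform prior on [4, 10] from the example: the best constant guess should come out near E[Theta] = 7, and the error there near the variance, (10 - 4)^2 / 12 = 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Theta uniform on [4, 10], as in the example; no observations yet.
samples = rng.uniform(4.0, 10.0, size=200_000)

# Mean squared error of guessing the constant c, by simulation.
def mse(c):
    return np.mean((samples - c) ** 2)

# Scan candidate guesses: the minimizer of E[(Theta - c)^2] should be
# E[Theta] = 7, and the error there should be var(Theta) = 3.
candidates = np.linspace(4.0, 10.0, 61)
best = min(candidates, key=mse)
print(best, round(mse(best), 2))
```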
741
00:38:50,430 --> 00:38:56,380
How will this story change if
we now have data as well?
742
00:38:56,380 --> 00:39:01,290
Now having data means that
we can compute posterior
743
00:39:01,290 --> 00:39:05,150
distributions or conditional
distributions.
744
00:39:05,150 --> 00:39:10,400
So we get transported into a new
universe where instead of
745
00:39:10,400 --> 00:39:14,740
working with the original
distribution of Theta, the
746
00:39:14,740 --> 00:39:18,860
prior distribution, now we work
with the condition of
747
00:39:18,860 --> 00:39:22,280
distribution of Theta,
given the data
748
00:39:22,280 --> 00:39:24,860
that we have observed.
749
00:39:24,860 --> 00:39:30,430
Now remember our old slogan that
conditional models and
750
00:39:30,430 --> 00:39:33,570
conditional probabilities are
no different than ordinary
751
00:39:33,570 --> 00:39:38,880
probabilities, except that we
live now in a new universe
752
00:39:38,880 --> 00:39:42,690
where the new information has
been taken into account.
753
00:39:42,690 --> 00:39:47,860
So if you use that philosophy
and you're asked to minimize
754
00:39:47,860 --> 00:39:53,310
the squared error but now that
you live in a new universe
755
00:39:53,310 --> 00:39:56,910
where X has been fixed to
something, what would the
756
00:39:56,910 --> 00:39:59,210
optimal solution be?
757
00:39:59,210 --> 00:40:03,540
It would again be the
expectation of Theta, but
758
00:40:03,540 --> 00:40:04,730
which expectation?
759
00:40:04,730 --> 00:40:08,910
It's the expectation which
applies in the new conditional
760
00:40:08,910 --> 00:40:12,350
universe in which we
live right now.
761
00:40:12,350 --> 00:40:16,330
So because of what we did
before, by the same
762
00:40:16,330 --> 00:40:20,330
calculation, we would find that
the optimal estimate is
763
00:40:20,330 --> 00:40:24,970
the conditional expectation of
Theta given X, the optimal
764
00:40:24,970 --> 00:40:26,730
estimate that takes
into account the
765
00:40:26,730 --> 00:40:29,170
information that we have.
766
00:40:29,170 --> 00:40:33,600
So the conclusion, once you get
your data, if you want to
767
00:40:33,600 --> 00:40:40,480
minimize the mean squared error,
you should just report
768
00:40:40,480 --> 00:40:43,870
the conditional expectation of
this unknown quantity based on
769
00:40:43,870 --> 00:40:46,640
the data that you have.
770
00:40:46,640 --> 00:40:53,050
So the picture here is that
Theta is unknown.
771
00:40:53,050 --> 00:41:00,710
You have your apparatus that
creates measurements.
772
00:41:00,710 --> 00:41:07,880
So this creates an X. You take
an X, and here you have a box
773
00:41:07,880 --> 00:41:10,203
that does calculations.
774
00:41:10,203 --> 00:41:13,490
775
00:41:13,490 --> 00:41:18,180
It does calculations and it
spits out the conditional
776
00:41:18,180 --> 00:41:22,230
expectation of Theta, given the
particular data that you
777
00:41:22,230 --> 00:41:24,750
have observed.
778
00:41:24,750 --> 00:41:28,680
And what we have done in this
class so far is, to some
779
00:41:28,680 --> 00:41:33,450
extent, developing the
computational tools and skills
780
00:41:33,450 --> 00:41:36,020
to do this particular
calculation--
781
00:41:36,020 --> 00:41:39,780
how to calculate the posterior
density for Theta and how to
782
00:41:39,780 --> 00:41:42,750
calculate expectations,
conditional expectations.
783
00:41:42,750 --> 00:41:45,330
So in principle, we know
how to do this.
784
00:41:45,330 --> 00:41:50,040
In principle, we can program a
computer to take the data and
785
00:41:50,040 --> 00:41:51,670
to spit out conditional
expectations.
786
00:41:51,670 --> 00:41:56,140
787
00:41:56,140 --> 00:42:04,390
Somebody who doesn't think like
us might instead design a
788
00:42:04,390 --> 00:42:09,940
calculating machine that does
something differently and
789
00:42:09,940 --> 00:42:16,490
produces some other estimate.
790
00:42:16,490 --> 00:42:20,000
So we went through this argument
and we decided to
791
00:42:20,000 --> 00:42:23,110
program our computer to
calculate conditional
792
00:42:23,110 --> 00:42:24,490
expectations.
793
00:42:24,490 --> 00:42:28,460
Somebody else came up with some
other crazy idea for how
794
00:42:28,460 --> 00:42:30,590
to estimate the random
variable.
795
00:42:30,590 --> 00:42:34,460
They came up with some function
g and the programmed
796
00:42:34,460 --> 00:42:38,700
it, and they designed a machine
that estimates Theta's
797
00:42:38,700 --> 00:42:43,000
by outputting a certain
g of X.
798
00:42:43,000 --> 00:42:47,690
That could be an alternative
estimator.
799
00:42:47,690 --> 00:42:50,280
Which one is better?
800
00:42:50,280 --> 00:42:56,350
Well, we convinced ourselves
that this is the optimal one
801
00:42:56,350 --> 00:42:59,780
in a universe where we have
fixed the particular
802
00:42:59,780 --> 00:43:01,420
value of the data.
803
00:43:01,420 --> 00:43:06,030
So what we have proved so far
is a relation of this kind.
804
00:43:06,030 --> 00:43:09,670
In this conditional universe,
the mean squared
805
00:43:09,670 --> 00:43:11,920
error that I get--
806
00:43:11,920 --> 00:43:15,170
I'm the one who's using
this estimator--
807
00:43:15,170 --> 00:43:18,850
is less than or equal than the
mean squared error that this
808
00:43:18,850 --> 00:43:23,960
person will get, the person
who uses that estimator.
809
00:43:23,960 --> 00:43:28,040
For any particular value of
the data, I'm going to do
810
00:43:28,040 --> 00:43:30,190
better than the other person.
811
00:43:30,190 --> 00:43:32,760
Now the data themselves
are random.
812
00:43:32,760 --> 00:43:38,050
If I average over all possible
values of the data, I should
813
00:43:38,050 --> 00:43:40,240
still be better off.
814
00:43:40,240 --> 00:43:45,120
If I'm better off for any
possible value X, then I
815
00:43:45,120 --> 00:43:49,140
should be better off on the
average over all possible
816
00:43:49,140 --> 00:43:50,640
values of X.
817
00:43:50,640 --> 00:43:55,670
So let us average both sides of
this quantity with respect
818
00:43:55,670 --> 00:43:58,990
to the probability distribution
of X. If you want
819
00:43:58,990 --> 00:44:03,350
to do it formally, you can write
this inequality between
820
00:44:03,350 --> 00:44:06,520
numbers as an inequality between
random variables.
821
00:44:06,520 --> 00:44:10,240
And it tells us that no matter
what that random variable
822
00:44:10,240 --> 00:44:14,010
turns out to be, this quantity
is better than that quantity.
823
00:44:14,010 --> 00:44:17,270
Take expectations of both
sides, and you get this
824
00:44:17,270 --> 00:44:21,360
inequality between expectations
overall.
825
00:44:21,360 --> 00:44:29,130
And this last inequality tells
me that the person who's using
826
00:44:29,130 --> 00:44:34,430
this estimator who produces
estimates according to this
827
00:44:34,430 --> 00:44:45,090
machine will have a mean squared
estimation error
828
00:44:45,090 --> 00:44:48,580
that's less than or equal to
the estimation error that's
829
00:44:48,580 --> 00:44:51,290
produced by the other person.
830
00:44:51,290 --> 00:44:54,710
In a few words, the conditional
expectation
831
00:44:54,710 --> 00:44:58,500
estimator is the optimal
estimator.
832
00:44:58,500 --> 00:45:01,765
It's the ultimate estimating
machine.
833
00:45:01,765 --> 00:45:04,430
834
00:45:04,430 --> 00:45:08,720
That's how you should solve
estimation problems and report
835
00:45:08,720 --> 00:45:10,240
a single value.
836
00:45:10,240 --> 00:45:14,510
If you're forced to report a
single value and if you're
837
00:45:14,510 --> 00:45:18,060
interested in estimation
errors.
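The optimality of the conditional expectation can be illustrated by simulation. This sketch assumes a toy Gaussian model that is not the lecture's example: Theta standard normal and X equal to Theta plus independent standard normal noise, for which E[Theta | X] = X / 2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (an assumption): Theta ~ N(0, 1) and we observe
# X = Theta + W with independent noise W ~ N(0, 1).  In this
# Gaussian model the conditional expectation is E[Theta | X] = X / 2.
n = 500_000
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

lms_estimate = x / 2.0       # the conditional-expectation estimator
naive_estimate = x           # a competing estimator g(X) = X

# Averaged over both Theta and X, the conditional expectation wins:
# its mean squared error is about 1/2, versus about 1 for g(X) = X.
mse_lms = np.mean((theta - lms_estimate) ** 2)
mse_naive = np.mean((theta - naive_estimate) ** 2)
print(round(mse_lms, 2), round(mse_naive, 2))
```

Any other choice of g would likewise come out no better than the conditional expectation, which is the content of the inequality above.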
838
00:45:18,060 --> 00:45:24,620
OK, while we could have told you
that story, of course, a
839
00:45:24,620 --> 00:45:29,500
month or two ago, this is really
about interpretation --
840
00:45:29,500 --> 00:45:32,550
about realizing that conditional
expectations have
841
00:45:32,550 --> 00:45:35,160
a very nice property.
842
00:45:35,160 --> 00:45:38,180
But other than that, any
probabilistic skills that come
843
00:45:38,180 --> 00:45:41,180
into this business are just the
probabilistic skills of
844
00:45:41,180 --> 00:45:44,330
being able to calculate
conditional expectations,
845
00:45:44,330 --> 00:45:46,750
which you already
know how to do.
846
00:45:46,750 --> 00:45:51,380
So conclusion, all of optimal
Bayesian estimation just means
847
00:45:51,380 --> 00:45:54,655
calculating and reporting
conditional expectations.
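[A concrete illustration of that conclusion, again my own example rather than the lecture's: in a conjugate model the conditional expectation really is just a short calculation. Assume Theta, the bias of a coin, is uniform on [0, 1] (a Beta(1, 1) prior) and the flips are Bernoulli(Theta); conjugacy gives a Beta(a + heads, b + tails) posterior, whose mean is the Bayesian point estimate.]

```python
def posterior_mean_bernoulli(flips, a=1.0, b=1.0):
    """Posterior mean of the bias Theta of a coin, given a list of 0/1 flips.

    Prior: Theta ~ Beta(a, b); with a = b = 1 this is uniform on [0, 1].
    Conjugacy gives posterior Beta(a + heads, b + tails), whose mean,
    (a + heads) / (a + b + n), is the conditional expectation E[Theta | data].
    """
    heads = sum(flips)
    n = len(flips)
    return (a + heads) / (a + b + n)

# 7 heads out of 10 flips with a uniform prior: E[Theta | data] = 8/12.
print(posterior_mean_bernoulli([1, 1, 1, 0, 1, 0, 1, 1, 0, 1]))
```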
848
00:45:54,655 --> 00:45:58,380
Well, if the world were that
simple, then statisticians
849
00:45:58,380 --> 00:46:02,670
wouldn't be able to find jobs.
850
00:46:02,670 --> 00:46:05,690
So real life is not
that simple.
851
00:46:05,690 --> 00:46:07,540
There are complications.
852
00:46:07,540 --> 00:46:10,050
And that perhaps makes their
life a little more
853
00:46:10,050 --> 00:46:11,300
interesting.
854
00:46:11,300 --> 00:46:22,010
855
00:46:22,010 --> 00:46:25,500
OK, one complication is that we
may deal with vectors
856
00:46:25,500 --> 00:46:28,580
instead of just single
random variables.
857
00:46:28,580 --> 00:46:31,830
I use the notation here
as if X was a
858
00:46:31,830 --> 00:46:33,500
single random variable.
859
00:46:33,500 --> 00:46:37,710
In real life, you get
several data points.
860
00:46:37,710 --> 00:46:39,520
Does our story change?
861
00:46:39,520 --> 00:46:41,950
Not really, same argument--
862
00:46:41,950 --> 00:46:44,410
given all the data that you
have observed, you should
863
00:46:44,410 --> 00:46:47,660
still report the conditional
expectation of Theta.
864
00:46:47,660 --> 00:46:51,260
But what kind of work does it
take in order to report this
865
00:46:51,260 --> 00:46:53,080
conditional expectation?
866
00:46:53,080 --> 00:46:57,030
One issue is that you need to
cook up a plausible prior
867
00:46:57,030 --> 00:46:58,810
distribution for Theta.
868
00:46:58,810 --> 00:46:59,960
How do you do that?
869
00:46:59,960 --> 00:47:03,570
In a given application, this
is a bit of a judgment call,
870
00:47:03,570 --> 00:47:05,970
what prior would you
be working with.
871
00:47:05,970 --> 00:47:08,840
And there's a certain
skill there of not
872
00:47:08,840 --> 00:47:12,100
making silly choices.
873
00:47:12,100 --> 00:47:16,690
A more pragmatic, practical
issue is that this is a
874
00:47:16,690 --> 00:47:21,180
formula that's extremely nice
and compact and simple that
875
00:47:21,180 --> 00:47:24,560
you can write with
minimal ink.
876
00:47:24,560 --> 00:47:29,180
But behind it there could
be hidden a huge amount of
877
00:47:29,180 --> 00:47:31,520
calculation.
878
00:47:31,520 --> 00:47:34,820
So doing any sort of
calculations that involve
879
00:47:34,820 --> 00:47:39,640
multiple random variables really
involves calculating
880
00:47:39,640 --> 00:47:42,240
multi-dimensional integrals.
881
00:47:42,240 --> 00:47:46,230
And the multi-dimensional
integrals are hard to compute.
882
00:47:46,230 --> 00:47:50,830
So actually implementing this
calculating machine here may
883
00:47:50,830 --> 00:47:54,340
not be easy, might be
complicated computationally.
884
00:47:54,340 --> 00:47:58,250
It's also complicated in terms
of not being able to derive
885
00:47:58,250 --> 00:47:59,890
intuition about it.
886
00:47:59,890 --> 00:48:03,680
So perhaps you might want to
have a simpler version, a
887
00:48:03,680 --> 00:48:07,940
simpler alternative to this
formula that's easier to work
888
00:48:07,940 --> 00:48:10,950
with and easier to calculate.
889
00:48:10,950 --> 00:48:13,440
We will be talking about
one such simpler
890
00:48:13,440 --> 00:48:15,540
alternative next time.
891
00:48:15,540 --> 00:48:18,570
So again, to conclude, at
the high level, Bayesian
892
00:48:18,570 --> 00:48:22,330
estimation is very, very simple,
given that you have
893
00:48:22,330 --> 00:48:24,180
mastered everything that
has happened in
894
00:48:24,180 --> 00:48:26,370
this course so far.
895
00:48:26,370 --> 00:48:29,860
There are certain practical
issues and it's also good to
896
00:48:29,860 --> 00:48:33,590
be familiar with the concepts
and remember that, in
897
00:48:33,590 --> 00:48:36,620
general, you would prefer to
report the complete posterior
898
00:48:36,620 --> 00:48:37,360
distribution.
899
00:48:37,360 --> 00:48:40,890
But if you're forced to report a
point estimate, then there's
900
00:48:40,890 --> 00:48:43,130
a number of reasonable
ways to do it.
901
00:48:43,130 --> 00:48:45,690
And perhaps the most reasonable
one is to just
902
00:48:45,690 --> 00:48:48,220
report the conditional
expectation itself.
903
00:48:48,220 --> 00:48:49,470