1
00:00:01,260 --> 00:00:03,390
In this lecture
sequence, we introduced
2
00:00:03,390 --> 00:00:05,700
quite a few new
concepts and went
3
00:00:05,700 --> 00:00:08,090
through a fair
number of examples.
4
00:00:08,090 --> 00:00:09,930
So for this reason,
it is useful now
5
00:00:09,930 --> 00:00:14,860
to just take stock and summarize
the key ideas and concepts.
6
00:00:14,860 --> 00:00:17,650
The starting point in a
Bayesian inference problem
7
00:00:17,650 --> 00:00:18,580
is the following.
8
00:00:18,580 --> 00:00:20,510
There's an unknown
parameter, Theta,
9
00:00:20,510 --> 00:00:23,050
and we're given a
prior distribution
10
00:00:23,050 --> 00:00:24,510
for that parameter.
11
00:00:24,510 --> 00:00:28,690
We're also given a model
for the observations, X,
12
00:00:28,690 --> 00:00:31,240
in terms of a
distribution that depends
13
00:00:31,240 --> 00:00:34,140
on the unknown parameter, Theta.
14
00:00:34,140 --> 00:00:36,610
The inference problem
is as follows.
15
00:00:36,610 --> 00:00:40,590
We will be given the value
of the random variable X.
16
00:00:40,590 --> 00:00:43,690
And then we want to find
the posterior distribution
17
00:00:43,690 --> 00:00:48,060
of Theta, that is, given this
particular value of X, what
18
00:00:48,060 --> 00:00:50,420
is the conditional
distribution of Theta?
19
00:00:50,420 --> 00:00:52,210
In the case where
Theta is discrete,
20
00:00:52,210 --> 00:00:54,060
this will be in terms of a PMF.
21
00:00:54,060 --> 00:00:57,630
If Theta is continuous,
this would be a PDF.
22
00:00:57,630 --> 00:00:59,550
We find the posterior
distribution
23
00:00:59,550 --> 00:01:02,760
by using an appropriate
version of the Bayes rule.
24
00:01:02,760 --> 00:01:08,010
And here we have four different
combinations or four choices,
25
00:01:08,010 --> 00:01:13,460
depending on which variables
are discrete or continuous.
26
00:01:13,460 --> 00:01:15,780
This is a complete solution
to the Bayesian inference
27
00:01:15,780 --> 00:01:18,610
problem-- a posterior
distribution.
28
00:01:18,610 --> 00:01:24,640
But if we want to come up with
a single guess of what Theta is,
29
00:01:24,640 --> 00:01:27,860
then we use a
so-called estimator.
30
00:01:27,860 --> 00:01:31,289
What an estimator does is that
it calculates a certain value
31
00:01:31,289 --> 00:01:33,830
as a function of
the observed data.
32
00:01:33,830 --> 00:01:37,380
So g describes the way that
the data are processed.
33
00:01:37,380 --> 00:01:40,110
Because X is random,
the estimator
34
00:01:40,110 --> 00:01:42,710
itself will be a
random variable.
35
00:01:42,710 --> 00:01:46,979
But once we obtain a specific
value of our random variable
36
00:01:46,979 --> 00:01:49,860
and we apply this
particular estimator,
37
00:01:49,860 --> 00:01:53,490
then we get the realized
value of the estimator.
38
00:01:53,490 --> 00:01:56,740
So we apply g now
to the lowercase x,
39
00:01:56,740 --> 00:02:00,940
and this gives us an estimate,
which is actually a number.
40
00:02:00,940 --> 00:02:04,200
We have seen two particular
ways of constructing
41
00:02:04,200 --> 00:02:06,210
estimates or estimators.
42
00:02:06,210 --> 00:02:11,038
One of them is the maximum a
posteriori probability rule
43
00:02:11,038 --> 00:02:17,210
in which we choose an estimate
that maximizes the posterior
44
00:02:17,210 --> 00:02:18,520
distribution.
45
00:02:18,520 --> 00:02:20,590
So in the case where
Theta is discrete,
46
00:02:20,590 --> 00:02:23,500
this finds the value
of theta, which
47
00:02:23,500 --> 00:02:27,280
is the most likely one
given our observation.
48
00:02:27,280 --> 00:02:29,560
And similarly, in
the continuous case,
49
00:02:29,560 --> 00:02:32,360
it finds a value
of theta at which
50
00:02:32,360 --> 00:02:36,460
the conditional PDF of
Theta would be largest.
51
00:02:36,460 --> 00:02:40,000
Another estimator
is the one that we
52
00:02:40,000 --> 00:02:43,140
call the LMS or least mean
squares estimator, which
53
00:02:43,140 --> 00:02:45,390
calculates the
conditional expectation
54
00:02:45,390 --> 00:02:47,880
of the unknown parameter,
given the observations
55
00:02:47,880 --> 00:02:50,160
that we have obtained.
56
00:02:50,160 --> 00:02:53,150
Finally, we may be
interested in evaluating
57
00:02:53,150 --> 00:02:56,110
the performance of
a given estimator.
58
00:02:56,110 --> 00:02:58,260
For hypotheses-testing
problems we're
59
00:02:58,260 --> 00:03:00,910
interested in the
probability of error.
60
00:03:00,910 --> 00:03:03,820
And we have the conditional
probability of error.
61
00:03:03,820 --> 00:03:06,730
Given the data that
I have just observed
62
00:03:06,730 --> 00:03:09,420
and given that I'm using
a specific estimator,
63
00:03:09,420 --> 00:03:12,560
what is the probability
that I make a mistake?
64
00:03:12,560 --> 00:03:16,800
And then there's the overall
evaluation of the estimator--
65
00:03:16,800 --> 00:03:19,720
how well does it
do on the average
66
00:03:19,720 --> 00:03:23,710
before I know what
X is going to be?
67
00:03:23,710 --> 00:03:26,200
And this is just the
probability that I
68
00:03:26,200 --> 00:03:29,250
will be making an
incorrect decision.
69
00:03:29,250 --> 00:03:31,250
For estimation problems,
on the other hand,
70
00:03:31,250 --> 00:03:34,800
we're interested in the
distance between our estimates
71
00:03:34,800 --> 00:03:36,829
from the true value of Theta.
72
00:03:36,829 --> 00:03:40,780
And this leads us to the
following conditional mean
73
00:03:40,780 --> 00:03:44,270
squared error, given
that we have already
74
00:03:44,270 --> 00:03:45,740
obtained an observation.
75
00:03:45,740 --> 00:03:48,400
And we come up
with an estimator.
76
00:03:48,400 --> 00:03:50,890
In particular, the value of
the estimator at this time
77
00:03:50,890 --> 00:03:54,230
would be completely determined
by the data that we obtained.
78
00:03:54,230 --> 00:03:57,460
But Theta, the unknown
parameter remains random.
79
00:03:57,460 --> 00:04:00,790
And there's going to be
a certain squared error.
80
00:04:00,790 --> 00:04:03,270
We find the
conditional expectation
81
00:04:03,270 --> 00:04:07,130
of this squared error in this
particular situation, where
82
00:04:07,130 --> 00:04:09,910
we have obtained a specific
value of the random variable,
83
00:04:09,910 --> 00:04:12,930
capital X. On the
other hand, if we're
84
00:04:12,930 --> 00:04:15,390
looking at the estimator
more generally,
85
00:04:15,390 --> 00:04:17,990
how well it does on
the average, then
86
00:04:17,990 --> 00:04:21,640
we look at the unconditional
mean squared error,
87
00:04:21,640 --> 00:04:25,540
and this gives us an overall
performance evaluation.
88
00:04:25,540 --> 00:04:29,020
How do we calculate these
performance measures?
89
00:04:29,020 --> 00:04:31,670
Here, we live in a
conditional universe.
90
00:04:31,670 --> 00:04:35,570
And in a Bayesian estimation
problem at some point
91
00:04:35,570 --> 00:04:39,050
we do calculate the posterior
distribution of Theta,
92
00:04:39,050 --> 00:04:40,480
given the measurements.
93
00:04:40,480 --> 00:04:45,020
So these calculations
involved here
94
00:04:45,020 --> 00:04:49,490
consist of just an
integration or summation
95
00:04:49,490 --> 00:04:52,050
using the conditional
distribution.
96
00:04:52,050 --> 00:04:55,270
For example, here we would
integrate this quantity
97
00:04:55,270 --> 00:04:57,920
using the conditional
density of Theta,
98
00:04:57,920 --> 00:05:01,250
given the particular value
that we have obtained.
99
00:05:01,250 --> 00:05:06,000
If we want to now calculate
the unconditional performance,
100
00:05:06,000 --> 00:05:09,860
then we would have to
use the total probability
101
00:05:09,860 --> 00:05:11,625
or expectation theorem.
102
00:05:16,210 --> 00:05:21,880
And in that case, we can average
over all the possible values
103
00:05:21,880 --> 00:05:25,410
of X to find the overall error.
104
00:05:25,410 --> 00:05:29,560
So all of the calculations
involve tools and equations
105
00:05:29,560 --> 00:05:32,790
that we have seen and that
we have used in the past,
106
00:05:32,790 --> 00:05:36,400
so it is just a matter
of connecting those tools
107
00:05:36,400 --> 00:05:40,900
with the specific new concepts
that we have introduced here.