WEBVTT
00:00:02.080 --> 00:00:04.500
In the next variation
that we consider,
00:00:04.500 --> 00:00:07.160
the random variable
Theta is still discrete.
00:00:07.160 --> 00:00:09.520
So it might, for
example, represent
00:00:09.520 --> 00:00:11.900
a number of
alternative hypothesis.
00:00:11.900 --> 00:00:14.640
But now our observation
is continuous.
00:00:14.640 --> 00:00:17.630
Of course, we do have a
variation of the Bayes rule
00:00:17.630 --> 00:00:20.020
that's applicable
to this situation.
00:00:20.020 --> 00:00:22.600
The only difference from the
previous version of the Bayes
00:00:22.600 --> 00:00:27.290
rule is that now the PMF
of X, the unconditional
00:00:27.290 --> 00:00:30.590
and the conditional one,
is replaced by a PDF.
00:00:30.590 --> 00:00:33.610
Otherwise, everything
remains the same.
00:00:33.610 --> 00:00:37.760
A standard example
is the following.
00:00:37.760 --> 00:00:41.550
Here we're sending a signal
that takes one of, let's say,
00:00:41.550 --> 00:00:43.950
three alternative values.
00:00:43.950 --> 00:00:46.300
And what we observe
is the signal
00:00:46.300 --> 00:00:48.710
that was sent plus some noise.
00:00:48.710 --> 00:00:50.510
And the typical
assumption here might
00:00:50.510 --> 00:00:53.610
be that the noise has zero
mean and a certain variance,
00:00:53.610 --> 00:00:58.480
and is independent from
the signal that was sent.
00:00:58.480 --> 00:01:03.110
This is an example that we more
or less studied some time ago.
00:01:03.110 --> 00:01:05.090
Actually, at that time,
we looked at an example
00:01:05.090 --> 00:01:08.230
where Theta could only
take one out of two values,
00:01:08.230 --> 00:01:10.470
but the calculations
and the methodology
00:01:10.470 --> 00:01:14.310
remains essentially the same as
for the case of three values.
00:01:14.310 --> 00:01:16.870
So in principle, we
do know at this point
00:01:16.870 --> 00:01:19.530
how to apply the Bayes
rule in this situation
00:01:19.530 --> 00:01:23.950
to come up with a
conditional PMF of theta.
00:01:23.950 --> 00:01:28.330
And the key to that calculation
was that the term that we need,
00:01:28.330 --> 00:01:32.539
the conditional PDF of X, can
be obtained from this equation
00:01:32.539 --> 00:01:34.190
as follows.
00:01:34.190 --> 00:01:36.030
If I tell you the
value of Theta,
00:01:36.030 --> 00:01:41.140
then X is essentially the same
as W plus a certain constant.
00:01:41.140 --> 00:01:45.190
Adding a constant just
shifts the PDF of W
00:01:45.190 --> 00:01:47.650
by an amount equal
to that constant.
00:01:47.650 --> 00:01:50.360
And, therefore, the
conditional PDF of X
00:01:50.360 --> 00:01:54.910
is the shifted PDF of the
random variable W. Using
00:01:54.910 --> 00:01:58.890
this particular fact, we can
then apply the Bayes rule,
00:01:58.890 --> 00:02:02.040
carry out of the calculations,
and suppose that in the end
00:02:02.040 --> 00:02:04.350
we came up with these results.
00:02:04.350 --> 00:02:06.960
That is we obtain the
specific observation
00:02:06.960 --> 00:02:09.280
x and based on that
observation, we
00:02:09.280 --> 00:02:11.470
calculate the
conditional probabilities
00:02:11.470 --> 00:02:13.860
of the different
choices of Theta.
00:02:13.860 --> 00:02:16.670
At this point, we
may use the MAP rule
00:02:16.670 --> 00:02:19.880
and come up with
an estimate which
00:02:19.880 --> 00:02:24.160
is the value of Theta, which
is the more likely one.
00:02:24.160 --> 00:02:27.590
And then we can continue
exactly as in the case
00:02:27.590 --> 00:02:31.820
of discrete measurements,
of discrete observations,
00:02:31.820 --> 00:02:35.050
and talk about conditional
probabilities of error
00:02:35.050 --> 00:02:36.270
and so on.
00:02:36.270 --> 00:02:38.640
Now, the fact that
X is continuous
00:02:38.640 --> 00:02:43.800
really makes no difference,
once we arrive at this picture.
00:02:43.800 --> 00:02:46.730
With the MAP rule we still
choose the most likely value
00:02:46.730 --> 00:02:49.290
of theta, and this
is our estimates.
00:02:49.290 --> 00:02:51.220
And we can calculate
the probability
00:02:51.220 --> 00:02:52.835
of error, which
with the MAP rule
00:02:52.835 --> 00:02:56.030
would be 0.4, exactly
the same argument
00:02:56.030 --> 00:02:58.120
as for the case of
discrete observations
00:02:58.120 --> 00:03:01.290
applies and shows that this
conditional probability
00:03:01.290 --> 00:03:04.550
of error is smallest
under the MAP rule.
00:03:04.550 --> 00:03:06.760
And then we can
continue similarly
00:03:06.760 --> 00:03:10.010
and talk about the overall
probability of error, which
00:03:10.010 --> 00:03:12.750
can be calculated using
the total probability
00:03:12.750 --> 00:03:15.130
theorem in two ways.
00:03:15.130 --> 00:03:17.880
One way is to take the
conditional probability
00:03:17.880 --> 00:03:21.340
of error for any
given value of X
00:03:21.340 --> 00:03:23.760
and then average those
conditional probabilities
00:03:23.760 --> 00:03:27.030
of errors over all the
possible choices of X.
00:03:27.030 --> 00:03:29.010
Because X is now
continuous, here
00:03:29.010 --> 00:03:30.840
we're going to have an integral.
00:03:30.840 --> 00:03:34.350
Alternatively, you can
condition on the possible values
00:03:34.350 --> 00:03:39.090
of Theta, calculate conditional
probabilities of error
00:03:39.090 --> 00:03:41.710
for any particular
choice of theta,
00:03:41.710 --> 00:03:46.560
and then take a weighted
average of them.
00:03:46.560 --> 00:03:49.030
In practice, this
calculation sometimes
00:03:49.030 --> 00:03:51.720
turns out to be the simpler one.
00:03:51.720 --> 00:03:53.930
Finally, we can
replicate the argument
00:03:53.930 --> 00:03:55.760
that we had in
the discrete case.
00:03:55.760 --> 00:04:01.560
Since the MAP rule makes this
term here as small as possible,
00:04:01.560 --> 00:04:05.180
it is less than or equal
to the probability of error
00:04:05.180 --> 00:04:09.740
that you would get under any
other estimate or estimator,
00:04:09.740 --> 00:04:12.590
then it follows that
the integral will also
00:04:12.590 --> 00:04:14.830
be as small as possible.
00:04:14.830 --> 00:04:18.329
And therefore, the conclusion
is that the overall probability
00:04:18.329 --> 00:04:21.660
of error is, again,
the smallest possible
00:04:21.660 --> 00:04:23.970
when we use the MAP rule.
00:04:23.970 --> 00:04:27.970
And so the MAP rule
remains the optimal way
00:04:27.970 --> 00:04:31.070
of choosing between
alternative hypothesis,
00:04:31.070 --> 00:04:34.401
whether X is discrete
or continuous.