1 00:00:02,080 --> 00:00:04,500 In the next variation that we consider, 2 00:00:04,500 --> 00:00:07,160 the random variable Theta is still discrete. 3 00:00:07,160 --> 00:00:09,520 So it might, for example, represent 4 00:00:09,520 --> 00:00:11,900 a number of alternative hypothesis. 5 00:00:11,900 --> 00:00:14,640 But now our observation is continuous. 6 00:00:14,640 --> 00:00:17,630 Of course, we do have a variation of the Bayes rule 7 00:00:17,630 --> 00:00:20,020 that's applicable to this situation. 8 00:00:20,020 --> 00:00:22,600 The only difference from the previous version of the Bayes 9 00:00:22,600 --> 00:00:27,290 rule is that now the PMF of X, the unconditional 10 00:00:27,290 --> 00:00:30,590 and the conditional one, is replaced by a PDF. 11 00:00:30,590 --> 00:00:33,610 Otherwise, everything remains the same. 12 00:00:33,610 --> 00:00:37,760 A standard example is the following. 13 00:00:37,760 --> 00:00:41,550 Here we're sending a signal that takes one of, let's say, 14 00:00:41,550 --> 00:00:43,950 three alternative values. 15 00:00:43,950 --> 00:00:46,300 And what we observe is the signal 16 00:00:46,300 --> 00:00:48,710 that was sent plus some noise. 17 00:00:48,710 --> 00:00:50,510 And the typical assumption here might 18 00:00:50,510 --> 00:00:53,610 be that the noise has zero mean and a certain variance, 19 00:00:53,610 --> 00:00:58,480 and is independent from the signal that was sent. 20 00:00:58,480 --> 00:01:03,110 This is an example that we more or less studied some time ago. 21 00:01:03,110 --> 00:01:05,090 Actually, at that time, we looked at an example 22 00:01:05,090 --> 00:01:08,230 where Theta could only take one out of two values, 23 00:01:08,230 --> 00:01:10,470 but the calculations and the methodology 24 00:01:10,470 --> 00:01:14,310 remains essentially the same as for the case of three values. 25 00:01:14,310 --> 00:01:16,870 So in principle, we do know at this point 26 00:01:16,870 --> 00:01:19,530 how to apply the Bayes rule in this situation 27 00:01:19,530 --> 00:01:23,950 to come up with a conditional PMF of theta. 28 00:01:23,950 --> 00:01:28,330 And the key to that calculation was that the term that we need, 29 00:01:28,330 --> 00:01:32,539 the conditional PDF of X, can be obtained from this equation 30 00:01:32,539 --> 00:01:34,190 as follows. 31 00:01:34,190 --> 00:01:36,030 If I tell you the value of Theta, 32 00:01:36,030 --> 00:01:41,140 then X is essentially the same as W plus a certain constant. 33 00:01:41,140 --> 00:01:45,190 Adding a constant just shifts the PDF of W 34 00:01:45,190 --> 00:01:47,650 by an amount equal to that constant. 35 00:01:47,650 --> 00:01:50,360 And, therefore, the conditional PDF of X 36 00:01:50,360 --> 00:01:54,910 is the shifted PDF of the random variable W. Using 37 00:01:54,910 --> 00:01:58,890 this particular fact, we can then apply the Bayes rule, 38 00:01:58,890 --> 00:02:02,040 carry out of the calculations, and suppose that in the end 39 00:02:02,040 --> 00:02:04,350 we came up with these results. 40 00:02:04,350 --> 00:02:06,960 That is we obtain the specific observation 41 00:02:06,960 --> 00:02:09,280 x and based on that observation, we 42 00:02:09,280 --> 00:02:11,470 calculate the conditional probabilities 43 00:02:11,470 --> 00:02:13,860 of the different choices of Theta. 44 00:02:13,860 --> 00:02:16,670 At this point, we may use the MAP rule 45 00:02:16,670 --> 00:02:19,880 and come up with an estimate which 46 00:02:19,880 --> 00:02:24,160 is the value of Theta, which is the more likely one. 47 00:02:24,160 --> 00:02:27,590 And then we can continue exactly as in the case 48 00:02:27,590 --> 00:02:31,820 of discrete measurements, of discrete observations, 49 00:02:31,820 --> 00:02:35,050 and talk about conditional probabilities of error 50 00:02:35,050 --> 00:02:36,270 and so on. 51 00:02:36,270 --> 00:02:38,640 Now, the fact that X is continuous 52 00:02:38,640 --> 00:02:43,800 really makes no difference, once we arrive at this picture. 53 00:02:43,800 --> 00:02:46,730 With the MAP rule we still choose the most likely value 54 00:02:46,730 --> 00:02:49,290 of theta, and this is our estimates. 55 00:02:49,290 --> 00:02:51,220 And we can calculate the probability 56 00:02:51,220 --> 00:02:52,835 of error, which with the MAP rule 57 00:02:52,835 --> 00:02:56,030 would be 0.4, exactly the same argument 58 00:02:56,030 --> 00:02:58,120 as for the case of discrete observations 59 00:02:58,120 --> 00:03:01,290 applies and shows that this conditional probability 60 00:03:01,290 --> 00:03:04,550 of error is smallest under the MAP rule. 61 00:03:04,550 --> 00:03:06,760 And then we can continue similarly 62 00:03:06,760 --> 00:03:10,010 and talk about the overall probability of error, which 63 00:03:10,010 --> 00:03:12,750 can be calculated using the total probability 64 00:03:12,750 --> 00:03:15,130 theorem in two ways. 65 00:03:15,130 --> 00:03:17,880 One way is to take the conditional probability 66 00:03:17,880 --> 00:03:21,340 of error for any given value of X 67 00:03:21,340 --> 00:03:23,760 and then average those conditional probabilities 68 00:03:23,760 --> 00:03:27,030 of errors over all the possible choices of X. 69 00:03:27,030 --> 00:03:29,010 Because X is now continuous, here 70 00:03:29,010 --> 00:03:30,840 we're going to have an integral. 71 00:03:30,840 --> 00:03:34,350 Alternatively, you can condition on the possible values 72 00:03:34,350 --> 00:03:39,090 of Theta, calculate conditional probabilities of error 73 00:03:39,090 --> 00:03:41,710 for any particular choice of theta, 74 00:03:41,710 --> 00:03:46,560 and then take a weighted average of them. 75 00:03:46,560 --> 00:03:49,030 In practice, this calculation sometimes 76 00:03:49,030 --> 00:03:51,720 turns out to be the simpler one. 77 00:03:51,720 --> 00:03:53,930 Finally, we can replicate the argument 78 00:03:53,930 --> 00:03:55,760 that we had in the discrete case. 79 00:03:55,760 --> 00:04:01,560 Since the MAP rule makes this term here as small as possible, 80 00:04:01,560 --> 00:04:05,180 it is less than or equal to the probability of error 81 00:04:05,180 --> 00:04:09,740 that you would get under any other estimate or estimator, 82 00:04:09,740 --> 00:04:12,590 then it follows that the integral will also 83 00:04:12,590 --> 00:04:14,830 be as small as possible. 84 00:04:14,830 --> 00:04:18,329 And therefore, the conclusion is that the overall probability 85 00:04:18,329 --> 00:04:21,660 of error is, again, the smallest possible 86 00:04:21,660 --> 00:04:23,970 when we use the MAP rule. 87 00:04:23,970 --> 00:04:27,970 And so the MAP rule remains the optimal way 88 00:04:27,970 --> 00:04:31,070 of choosing between alternative hypothesis, 89 00:04:31,070 --> 00:04:34,401 whether X is discrete or continuous.