Our discussion of least mean squares estimation so far was based on the case where we have a single unknown random variable and a single observation, and we're interested in a point estimate of this single unknown random variable. What happens if we have multiple observations or parameters?

For example, suppose that instead of a single observation, we have a whole vector of observations. And, of course, we assume that we have a model for these observations. Once we observe our data, that is, a numerical value for this vector or, what is the same, numerical values for each one of these observation random variables, we're placed in the conditional universe where these values have been observed. Then we notice that the arguments that we carried out did not rely in any way on the fact that X was one-dimensional. Exactly the same argument goes through for the multi-dimensional case, and the answer is, again, that the optimal estimate, the one that minimizes the mean squared error, is the conditional expectation of the unknown random variable given the values of the observations. So this gives us a simple and much more general solution that also applies to the case of multiple observations.

Now, what if we have multiple parameters? Once more, the argument is exactly the same, and we obtain that the optimal estimate of any particular parameter is going to be the conditional expectation of that parameter given the observations. So if our parameter vector is something of this form, consisting of several components, then the LMS estimate of the jth component of our parameter vector is going to be simply the conditional expectation of this parameter given the data that we have obtained. And this gives us the most general solution to the problem of least mean squares estimation when we have multiple parameters and multiple observations: one very simple concept that applies to all possible cases.
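To make this concrete, here is a minimal numerical sketch, under an assumed toy model that is not from the lecture: Theta is scalar with a standard normal prior, and X is a vector of conditionally i.i.d. normal observations. The LMS estimate E[Theta | X = x] is approximated by applying Bayes' rule on a discretized grid of theta values.

```python
import numpy as np

# Toy model (an illustrative assumption, not from the lecture):
# Theta ~ Normal(0, 1); given Theta = theta, the observations
# X_1, ..., X_5 are i.i.d. Normal(theta, 1).
rng = np.random.default_rng(0)
theta_true = rng.normal(0.0, 1.0)            # the unknown parameter
x = theta_true + rng.normal(0.0, 1.0, 5)     # vector of observations

# Discretize Theta's range and apply Bayes' rule on the grid.
grid = np.linspace(-5.0, 5.0, 2001)
prior = np.exp(-grid ** 2 / 2.0)             # unnormalized N(0, 1) density
log_lik = -0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
posterior = prior * np.exp(log_lik)          # numerator of Bayes' rule
posterior /= posterior.sum()                 # denominator: normalize over theta

# The LMS estimate is the posterior mean E[Theta | X = x].
lms_estimate = (grid * posterior).sum()
print(f"true theta = {theta_true:.3f}, LMS estimate = {lms_estimate:.3f}")
```

In this conjugate normal model the posterior mean is also available in closed form, (x_1 + ... + x_n)/(n + 1), which can be used to check the grid approximation. The point of the sketch is only that the recipe, take the conditional expectation given the data, is unchanged when X is a vector.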
Unfortunately, however, our worries are not over. Even though LMS estimation has such a simple and such a general solution, things are not always easy. Let us see what's happening.

No matter what, we first have to find the posterior distribution of Theta given the observations that we have obtained. This is done using Bayes' rule, which we have written here, and this is how you evaluate the denominator in Bayes' rule.

What are the difficulties that we may encounter? One first difficulty is that in many applications we do not necessarily have a good model of the observations, or we are not very confident about our model. If X and Theta are multi-dimensional, such a model might be difficult to construct.

Setting this issue aside, there's a further issue. The conditional expectation of Theta given X may be a complicated nonlinear function of the observations. This means that it may be difficult to analyze, but, even more important, it may be very difficult to calculate even after you have obtained your data.

Let us understand why this might be the case. Suppose that Theta is a multi-dimensional parameter. Then, in order to calculate the denominator that's involved here in Bayes' rule, when you integrate with respect to theta, you actually have to carry out a multi-dimensional integral, and this can be very challenging or, sometimes, practically impossible. Even if you had this denominator term in your hands, in order to calculate a conditional expectation you would still have to calculate, once more, an integral of theta j integrated against the posterior distribution of the vector Theta. But this integral should be, once more, over all the parameters. So it would be a multi-dimensional integral in the general case, and that's one additional source of difficulty.

And this is the reason why we will also consider an alternative to least mean squares estimation, which is much simpler computationally and much less demanding in terms of the model that we need to have in our hands.
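For reference, the two multi-dimensional integrals described above can be written out as follows. The lecture points to formulas on a slide that are not reproduced in this transcript, so the notation here is a standard continuous-case statement of Bayes' rule, assumed rather than quoted:

```latex
f_{\Theta \mid X}(\theta \mid x)
  = \frac{f_{\Theta}(\theta)\, f_{X \mid \Theta}(x \mid \theta)}
         {\int f_{\Theta}(\theta')\, f_{X \mid \Theta}(x \mid \theta')\, d\theta'},
\qquad
\hat{\theta}_j = \mathrm{E}[\Theta_j \mid X = x]
  = \int \theta_j\, f_{\Theta \mid X}(\theta \mid x)\, d\theta
```

Both integrals run over the entire parameter vector theta, so when Theta has m components each one is an m-fold integral. That is exactly the computational burden that motivates the simpler alternative mentioned at the end of the lecture.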