1
00:00:01,600 --> 00:00:04,220
Our discussion of least
mean squares estimation
2
00:00:04,220 --> 00:00:06,590
so far was based
on the case where
3
00:00:06,590 --> 00:00:09,400
we have a single
unknown random variable
4
00:00:09,400 --> 00:00:11,990
and a single observation.
5
00:00:11,990 --> 00:00:14,300
And we're interested
in a point estimate
6
00:00:14,300 --> 00:00:16,720
of this single unknown
random variable.
7
00:00:16,720 --> 00:00:20,880
What happens if we have multiple
observations or parameters?
8
00:00:20,880 --> 00:00:22,850
For example,
suppose that instead
9
00:00:22,850 --> 00:00:27,540
of a single observation, we have
a whole vector of observations.
10
00:00:27,540 --> 00:00:29,220
And, of course,
we assume that we
11
00:00:29,220 --> 00:00:32,119
have a model for
these observations.
12
00:00:32,119 --> 00:00:35,510
Once we observe our
data, a numerical value
13
00:00:35,510 --> 00:00:39,300
for this vector, or, what is
the same, numerical values
14
00:00:39,300 --> 00:00:42,930
for each one of these
observation random variables,
15
00:00:42,930 --> 00:00:46,030
then we're placed in the
conditional universe where
16
00:00:46,030 --> 00:00:48,790
these values have been observed.
17
00:00:48,790 --> 00:00:51,780
Then, we notice that the
arguments that we carried out
18
00:00:51,780 --> 00:00:54,660
did not rely in
any way on the fact
19
00:00:54,660 --> 00:00:56,870
that X was one-dimensional.
20
00:00:56,870 --> 00:00:58,990
Exactly the same
argument goes through
21
00:00:58,990 --> 00:01:02,970
for the multi-dimensional
case, and the answer
22
00:01:02,970 --> 00:01:06,060
is, again, that the optimal
estimate, the one that
23
00:01:06,060 --> 00:01:09,120
minimizes the mean
squared error, is
24
00:01:09,120 --> 00:01:12,590
the conditional expectation
of the unknown random variable
25
00:01:12,590 --> 00:01:15,810
given the values of
the observations.
26
00:01:15,810 --> 00:01:19,110
So this gives us a simple
and much more general
27
00:01:19,110 --> 00:01:22,070
solution that also
applies to the case
28
00:01:22,070 --> 00:01:24,580
of multiple observations.
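As an illustration, here is a minimal numerical sketch of LMS estimation with a vector of observations. The model is an assumption of mine, not from the lecture: a binary Theta with a uniform prior, observed through two independent noisy measurements; the LMS estimate is the posterior mean computed via Bayes' rule.

```python
# Hypothetical discrete model (assumed for illustration, not the lecture's):
# Theta is 0 or 1 with a uniform prior, and two conditionally independent
# observations X1, X2 each equal Theta with probability 0.8.
PRIOR = {0: 0.5, 1: 0.5}

def likelihood(x, theta):
    # P(X = x | Theta = theta) for a single noisy observation
    return 0.8 if x == theta else 0.2

def lms_estimate(x1, x2):
    # Bayes' rule: the posterior is proportional to prior times likelihood.
    # The LMS estimate is the posterior mean E[Theta | X1 = x1, X2 = x2].
    joint = {t: PRIOR[t] * likelihood(x1, t) * likelihood(x2, t) for t in PRIOR}
    z = sum(joint.values())  # the denominator in Bayes' rule
    return sum(t * p / z for t, p in joint.items())

print(lms_estimate(1, 1))  # both observations point to Theta = 1: about 0.94
```

The same posterior-mean computation goes through unchanged however many observations we condition on; only the likelihood factors multiply.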
29
00:01:24,580 --> 00:01:28,780
Now, what if we have
multiple parameters?
30
00:01:28,780 --> 00:01:32,950
Once more, the argument
is exactly the same,
31
00:01:32,950 --> 00:01:35,320
and we obtain that
the optimal estimate
32
00:01:35,320 --> 00:01:37,740
of any particular
parameter is going
33
00:01:37,740 --> 00:01:41,070
to be the conditional
expectation of that parameter
34
00:01:41,070 --> 00:01:43,060
given the observations.
35
00:01:43,060 --> 00:01:50,729
So if our parameter vector
is something of this form,
36
00:01:50,729 --> 00:01:53,670
consisting of
several components,
37
00:01:53,670 --> 00:01:59,700
then the LMS estimate of the
jth component of our parameter
38
00:01:59,700 --> 00:02:03,950
vector is going to be simply
the conditional expectation
39
00:02:03,950 --> 00:02:09,850
of this parameter given the
data that we have obtained.
40
00:02:09,850 --> 00:02:13,180
And this gives us the
most general solution
41
00:02:13,180 --> 00:02:15,480
to the problem of
least mean squares
42
00:02:15,480 --> 00:02:20,370
estimation when we have
multiple parameters
43
00:02:20,370 --> 00:02:22,950
and multiple observations.
44
00:02:22,950 --> 00:02:27,565
One very simple concept that
applies to all possible cases.
45
00:02:30,670 --> 00:02:35,680
Unfortunately, however,
our worries are not over.
46
00:02:35,680 --> 00:02:39,970
Even though LMS estimation
has such a simple and such
47
00:02:39,970 --> 00:02:45,360
a general solution, things
are not always easy.
48
00:02:45,360 --> 00:02:48,180
Let us see what's happening.
49
00:02:48,180 --> 00:02:52,590
No matter what, we
have to first find out
50
00:02:52,590 --> 00:02:55,680
the posterior
distribution of Theta
51
00:02:55,680 --> 00:02:58,480
given the observations
that we have obtained.
52
00:02:58,480 --> 00:03:02,150
And this is done using Bayes'
rule, which we have written
53
00:03:02,150 --> 00:03:04,530
here, and this is
how you evaluate
54
00:03:04,530 --> 00:03:08,380
the denominator in Bayes' rule.
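A small numerical sketch of this computation, under an assumed model of my own (uniform prior, Gaussian observation noise, not the lecture's): both the denominator in Bayes' rule and the posterior mean are approximated by sums over a grid of Theta values.

```python
import numpy as np

# Assumed illustrative model: Theta ~ Uniform(0, 1), X | Theta ~ N(Theta, 0.5^2)
theta = np.linspace(0.0, 1.0, 2001)  # grid of candidate Theta values
d_theta = theta[1] - theta[0]
prior = np.ones_like(theta)          # uniform prior density on [0, 1]

def posterior_mean(x, sigma=0.5):
    lik = np.exp(-((x - theta) ** 2) / (2 * sigma ** 2))  # likelihood of X = x
    numer = prior * lik
    denom = numer.sum() * d_theta    # the denominator in Bayes' rule (an integral)
    post = numer / denom             # posterior density of Theta given X = x
    return (theta * post).sum() * d_theta  # E[Theta | X = x], the LMS estimate
```

For a single scalar Theta this grid sum is cheap; the difficulties discussed next arise when the same integrals must be taken over a vector of parameters.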
55
00:03:08,380 --> 00:03:11,390
What are the difficulties
that we may encounter?
56
00:03:11,390 --> 00:03:15,020
One first difficulty is
that in many applications,
57
00:03:15,020 --> 00:03:18,520
we do not necessarily
have a good model
58
00:03:18,520 --> 00:03:21,950
or we're not very
confident about our model
59
00:03:21,950 --> 00:03:23,890
of the observations.
60
00:03:23,890 --> 00:03:26,710
If X and Theta are
multi-dimensional,
61
00:03:26,710 --> 00:03:31,560
such a model might be
difficult to construct.
62
00:03:31,560 --> 00:03:36,594
Setting this difficulty aside,
there's a further issue.
63
00:03:36,594 --> 00:03:39,640
The conditional
expectation of Theta
64
00:03:39,640 --> 00:03:44,160
given X may be a complicated
non-linear function
65
00:03:44,160 --> 00:03:46,300
of the observations.
66
00:03:46,300 --> 00:03:49,890
This means that it may
be difficult to analyze,
67
00:03:49,890 --> 00:03:52,720
but even more
important, it may be
68
00:03:52,720 --> 00:03:56,140
very difficult to
calculate even after you
69
00:03:56,140 --> 00:03:58,140
have obtained your data.
70
00:03:58,140 --> 00:04:01,790
Let us understand why
this might be the case.
71
00:04:01,790 --> 00:04:06,100
Suppose that Theta is a
multi-dimensional parameter.
72
00:04:06,100 --> 00:04:09,680
Then in order to calculate the
denominator that's involved
73
00:04:09,680 --> 00:04:13,780
here in Bayes' rule, when you
integrate with respect to Theta,
74
00:02:13,780 --> 00:02:20,279
you have to actually carry out
a multi-dimensional integral,
75
00:04:20,279 --> 00:04:24,190
and this can be very
challenging or sometimes,
76
00:04:24,190 --> 00:04:27,110
practically impossible.
77
00:04:27,110 --> 00:04:31,040
Even if you had this
denominator term in your hands,
78
00:04:31,040 --> 00:04:36,420
still, in order to calculate
a conditional expectation,
79
00:04:36,420 --> 00:04:41,230
you would have to
calculate once more
80
00:04:41,230 --> 00:04:46,430
an integral of
Theta j
81
00:04:46,430 --> 00:04:49,780
against the posterior
distribution of the vector
82
00:04:49,780 --> 00:04:52,190
Theta.
83
00:04:52,190 --> 00:04:59,040
But this integral is, once
more, over all the parameters.
84
00:04:59,040 --> 00:05:02,640
So it would be a
multi-dimensional integral
85
00:05:02,640 --> 00:05:06,090
in the general case, and
that's one additional source
86
00:05:06,090 --> 00:05:07,940
of difficulty.
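A back-of-the-envelope count (my own illustration) of why these integrals become impractical: evaluating the Bayes-rule denominator on a grid needs n points per component of Theta, hence n to the power d points when Theta has d components.

```python
# Grid points needed to evaluate the Bayes-rule denominator by brute force,
# with n points per parameter dimension (illustrative count only).
n = 100
for d in (1, 2, 5, 10):
    print(f"d = {d:2d}: {n ** d:.0e} grid points")
```

At d = 10 the count is 10 to the 20th, which is why exact numerical evaluation gives way to the simpler alternative discussed next.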
87
00:05:07,940 --> 00:05:10,570
And this is the reason
why we will also
88
00:05:10,570 --> 00:05:14,490
consider an alternative to least
mean squares estimation, which
89
00:05:14,490 --> 00:05:17,890
is much simpler
computationally and much less
90
00:05:17,890 --> 00:05:21,100
demanding in terms
of the model that we
91
00:05:21,100 --> 00:05:23,610
need to have in our hands.