WEBVTT

00:00:01.600 --> 00:00:04.220
Our discussion of least
mean squares estimation

00:00:04.220 --> 00:00:06.590
so far was based
on the case where

00:00:06.590 --> 00:00:09.400
we have a single
unknown random variable

00:00:09.400 --> 00:00:11.990
and a single observation.

00:00:11.990 --> 00:00:14.300
And we're interested
in a point estimate

00:00:14.300 --> 00:00:16.720
of this single unknown
random variable.

00:00:16.720 --> 00:00:20.880
What happens if we have multiple
observations or parameters?

00:00:20.880 --> 00:00:22.850
For example,
suppose that instead

00:00:22.850 --> 00:00:27.540
of a single observation, we have
a whole vector of observations.

00:00:27.540 --> 00:00:29.220
And, of course,
we assume that we

00:00:29.220 --> 00:00:32.119
have a model for
these observations.

00:00:32.119 --> 00:00:35.510
Once we observe our
data, a numerical value

00:00:35.510 --> 00:00:39.300
for this vector, or what is
the same numerical values

00:00:39.300 --> 00:00:42.930
for each one of these
observation random variables.

00:00:42.930 --> 00:00:46.030
Then we're placed in the
conditional universe where

00:00:46.030 --> 00:00:48.790
these values have been observed.

00:00:48.790 --> 00:00:51.780
Then, we notice that the
arguments that we carried out

00:00:51.780 --> 00:00:54.660
did not rely in
any way on the fact

00:00:54.660 --> 00:00:56.870
that X was one-dimensional.

00:00:56.870 --> 00:00:58.990
Exactly the same
argument goes through

00:00:58.990 --> 00:01:02.970
for the multi-dimensional
case, and simply, the answer

00:01:02.970 --> 00:01:06.060
is again, that the optimal
estimate, the one that

00:01:06.060 --> 00:01:09.120
minimizes the mean
squared error, is again,

00:01:09.120 --> 00:01:12.590
the conditional expectation
of the unknown random variable

00:01:12.590 --> 00:01:15.810
given the values of
the observations.

00:01:15.810 --> 00:01:19.110
So this gives us a simple
and much more general

00:01:19.110 --> 00:01:22.070
solution that also
applies to the case

00:01:22.070 --> 00:01:24.580
of multiple observations.

00:01:24.580 --> 00:01:28.780
Now, what if we have
multiple parameters?

00:01:28.780 --> 00:01:32.950
Once more, the argument
is exactly the same,

00:01:32.950 --> 00:01:35.320
and we obtain that
the optimal estimate

00:01:35.320 --> 00:01:37.740
of any particular
parameter is going

00:01:37.740 --> 00:01:41.070
to be the conditional
expectation of that parameter

00:01:41.070 --> 00:01:43.060
given the observations.

00:01:43.060 --> 00:01:50.729
So if our parameter vector
is something of this form,

00:01:50.729 --> 00:01:53.670
consisting of
several components,

00:01:53.670 --> 00:01:59.700
then the LMS estimate of the
jth component of our parameter

00:01:59.700 --> 00:02:03.950
vector is going to be simply
the conditional expectation

00:02:03.950 --> 00:02:09.850
of this parameter given the
data that we have obtained.

00:02:09.850 --> 00:02:13.180
And this gives us the
most general solution

00:02:13.180 --> 00:02:15.480
to the program of
least mean squares

00:02:15.480 --> 00:02:20.370
estimation when we have
multiple parameters

00:02:20.370 --> 00:02:22.950
and multiple observations.

00:02:22.950 --> 00:02:27.565
One very simple concept that
applies to all possible cases.

00:02:30.670 --> 00:02:35.680
Unfortunately, however,
our worries are not over.

00:02:35.680 --> 00:02:39.970
Even though LMS estimation
has such a simple and such

00:02:39.970 --> 00:02:45.360
a general solution, things
are not always easy.

00:02:45.360 --> 00:02:48.180
Let us see what's happening.

00:02:48.180 --> 00:02:52.590
No matter what, we
have to first find out

00:02:52.590 --> 00:02:55.680
the posterior
distribution of Theta

00:02:55.680 --> 00:02:58.480
given the observations
that we have obtained.

00:02:58.480 --> 00:03:02.150
And this is done using the Bayes
rule, which we have written

00:03:02.150 --> 00:03:04.530
here, and this is
how you evaluate

00:03:04.530 --> 00:03:08.380
the denominator in Bayes' rule.

00:03:08.380 --> 00:03:11.390
What are the difficulties
that we may encounter?

00:03:11.390 --> 00:03:15.020
One first difficulty is
that in many applications,

00:03:15.020 --> 00:03:18.520
we do not necessarily
have a good model

00:03:18.520 --> 00:03:21.950
or we're not very
confident about our model

00:03:21.950 --> 00:03:23.890
of the observations.

00:03:23.890 --> 00:03:26.710
If X and Theta are
multi-dimensional,

00:03:26.710 --> 00:03:31.560
such a model might be
difficult to construct.

00:03:31.560 --> 00:03:36.594
Setting this issue aside,
there's a further issue.

00:03:36.594 --> 00:03:39.640
The conditional
expectation of Theta

00:03:39.640 --> 00:03:44.160
given X may be a complicated
non-linear function

00:03:44.160 --> 00:03:46.300
of the observations.

00:03:46.300 --> 00:03:49.890
This means that it may
be difficult to analyze,

00:03:49.890 --> 00:03:52.720
but even more
important, it may be

00:03:52.720 --> 00:03:56.140
very difficult to
calculate even after you

00:03:56.140 --> 00:03:58.140
have obtained your data.

00:03:58.140 --> 00:04:01.790
Let us understand why
this might be the case.

00:04:01.790 --> 00:04:06.100
Suppose that Theta is a
multi-dimensional parameter.

00:04:06.100 --> 00:04:09.680
Then in order to calculate the
denominator that's involved

00:04:09.680 --> 00:04:13.780
here in the Bayes rule, when you
integrate with respect to theta

00:04:13.780 --> 00:04:20.279
, you have to actually carry
a multi-dimensional integral,

00:04:20.279 --> 00:04:24.190
and this can be very
challenging or sometimes,

00:04:24.190 --> 00:04:27.110
practically impossible.

00:04:27.110 --> 00:04:31.040
Even if you had this
denominator term in your hands,

00:04:31.040 --> 00:04:36.420
still, in order to calculate
a conditional expectation,

00:04:36.420 --> 00:04:41.230
you would have to
calculate once more

00:04:41.230 --> 00:04:46.430
an integral of
theta j integrated

00:04:46.430 --> 00:04:49.780
against the posterior
distribution of the vector

00:04:49.780 --> 00:04:52.190
theta.

00:04:52.190 --> 00:04:59.040
But this integral should be once
more, over all the parameters.

00:04:59.040 --> 00:05:02.640
So it would be a
multi-dimensional integral

00:05:02.640 --> 00:05:06.090
in the general case, and
that's one additional source

00:05:06.090 --> 00:05:07.940
of difficulty.

00:05:07.940 --> 00:05:10.570
And this is the reason
why we will also

00:05:10.570 --> 00:05:14.490
consider an alternative to least
mean squares estimation, which

00:05:14.490 --> 00:05:17.890
is much simpler
computationally and much less

00:05:17.890 --> 00:05:21.100
demanding in terms
of the model that we

00:05:21.100 --> 00:05:23.610
need to have in our hands.