WEBVTT
00:00:01.000 --> 00:00:03.540
We have presented
the complete solution
00:00:03.540 --> 00:00:06.920
to the linear least mean squares
estimation problem, when
00:00:06.920 --> 00:00:10.430
we want to estimate a certain
unknown random variable
00:00:10.430 --> 00:00:13.720
on the basis of a
different random variable X
00:00:13.720 --> 00:00:15.550
that we get to observe.
00:00:15.550 --> 00:00:19.720
But what if we have
multiple observations?
00:00:19.720 --> 00:00:23.510
What would be the analogous
formulation of the problem?
00:00:23.510 --> 00:00:24.950
Here's the idea.
00:00:24.950 --> 00:00:28.320
Once more, we restrict
ourselves to estimators
00:00:28.320 --> 00:00:31.970
that are linear functions of
the data, linear functions
00:00:31.970 --> 00:00:34.280
of the observations
that we have.
00:00:34.280 --> 00:00:37.670
And then we pose the
problem of finding the best
00:00:37.670 --> 00:00:42.570
choices of these coefficients
a1 up to a n and b.
00:00:42.570 --> 00:00:45.540
What does it mean to
find the best choices?
00:00:45.540 --> 00:00:49.010
It means that if we
fix certain choices,
00:00:49.010 --> 00:00:52.170
we obtain an estimator,
we look at the difference
00:00:52.170 --> 00:00:54.480
between the estimator
and the quantity
00:00:54.480 --> 00:00:56.700
we're trying to estimate,
take the square,
00:00:56.700 --> 00:00:58.880
and then take the expectation.
00:00:58.880 --> 00:01:01.910
So once more, we're looking
at the mean squared error
00:01:01.910 --> 00:01:06.970
of our estimator and we try to
make it as small as possible.
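
NOTE
The formulation just described can be written out as below. This is a reconstruction from the spoken description, since the slide itself is not part of the transcript. The hat notation for the estimator and the bold E for expectation are assumed notation, not taken from the source.
```latex
\hat{\Theta} = a_1 X_1 + \cdots + a_n X_n + b,
\qquad
\min_{a_1,\ldots,a_n,\,b}\ \mathbf{E}\big[(\hat{\Theta} - \Theta)^2\big]
```
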
00:01:06.970 --> 00:01:10.760
So this is a well-defined
optimization problem.
00:01:10.760 --> 00:01:15.830
We have a quantity, which is a
function of certain parameters.
00:01:15.830 --> 00:01:19.050
And we wish to find the
choices for those parameters,
00:01:19.050 --> 00:01:21.420
or those coefficients,
that will make
00:01:21.420 --> 00:01:24.930
this quantity as
small as possible.
00:01:24.930 --> 00:01:27.820
One first comment is
similar to the case
00:01:27.820 --> 00:01:30.920
where we had a single
measurement [and]
00:01:30.920 --> 00:01:32.280
is the following.
00:01:32.280 --> 00:01:35.560
If it turns out that the
conditional expectation
00:01:35.560 --> 00:01:38.590
of Theta given all
of the data that we
00:01:38.590 --> 00:01:44.440
have is linear in X, if it is
of this form, then what happens?
00:01:44.440 --> 00:01:47.990
We know that this is the
best possible estimator.
00:01:47.990 --> 00:01:51.720
If it is also linear, then
it is the best estimator
00:01:51.720 --> 00:01:55.470
within the class of
linear estimators as well
00:01:55.470 --> 00:01:59.100
and, therefore, the linear
least mean squares estimator
00:01:59.100 --> 00:02:03.800
is the same as the general
least mean squares estimator.
00:02:03.800 --> 00:02:08.050
So if for some problems it
turns out that this is linear,
00:02:08.050 --> 00:02:13.240
then we automatically also have
the optimal linear estimator.
00:02:13.240 --> 00:02:15.520
And this is going to
be the case, once more,
00:02:15.520 --> 00:02:20.560
for certain normal problems with
a linear structure of the type
00:02:20.560 --> 00:02:22.520
that we have studied earlier.
00:02:25.740 --> 00:02:28.870
Now, let us look
into what it takes
00:02:28.870 --> 00:02:32.079
to carry out this optimization.
00:02:32.079 --> 00:02:35.100
If we had a single
observation, then we
00:02:35.100 --> 00:02:38.710
have seen a closed-form formula,
a fairly simple formula,
00:02:38.710 --> 00:02:41.650
that tells us what the
coefficients should be.
00:02:41.650 --> 00:02:43.920
For the more general
case, formulas
00:02:43.920 --> 00:02:47.090
would not be as
simple, but we can
00:02:47.090 --> 00:02:49.700
make the following observations.
00:02:49.700 --> 00:02:53.510
If you take this
expression and expand it,
00:02:53.510 --> 00:02:56.250
it's going to have
a bunch of terms.
00:02:56.250 --> 00:03:00.650
For example, it's going to have
a term of the form a1 squared
00:03:00.650 --> 00:03:04.730
times the expected
value of X1 squared.
00:03:04.730 --> 00:03:11.590
It's going to have a term
such as twice a1 a2 times
00:03:11.590 --> 00:03:16.150
the expected value of X1 X2.
00:03:16.150 --> 00:03:20.760
And then there's going to be
many more terms. Some of them
00:03:20.760 --> 00:03:26.920
will also involve products
of Theta with the X's.
00:03:26.920 --> 00:03:32.829
So we might see that we have
a term of the form a1 expected
00:03:32.829 --> 00:03:36.290
value of X1 Theta.
00:03:36.290 --> 00:03:40.010
And then, there's going to
be many, many more terms.
00:03:40.010 --> 00:03:42.350
What's the important
thing to notice?
00:03:42.350 --> 00:03:46.980
That this expression as a
function of the coefficients
00:03:46.980 --> 00:03:49.526
involves terms
either of this kind
00:03:49.526 --> 00:03:51.570
or of this kind,
or of that kind,
00:03:51.570 --> 00:03:55.800
first-order or
second-order terms.
00:03:55.800 --> 00:03:57.430
To minimize this
expression, we're
00:03:57.430 --> 00:04:02.730
going to take the derivative
of this and set it equal to 0.
00:04:02.730 --> 00:04:06.210
When you take the derivative
of a function that
00:04:06.210 --> 00:04:09.660
involves only quadratic
and linear terms,
00:04:09.660 --> 00:04:14.410
you get something that's
linear in the coefficients.
00:04:14.410 --> 00:04:16.730
The conclusion out of
all this discussion
00:04:16.730 --> 00:04:21.480
is that when you actually go
and carry out this minimization
00:04:21.480 --> 00:04:23.930
by setting derivatives
to zero, what you
00:04:23.930 --> 00:04:29.130
will end up doing is solving
a system of linear equations
00:04:29.130 --> 00:04:32.085
in the coefficients that
you're trying to determine.
00:04:32.085 --> 00:04:34.310
And why is this interesting?
00:04:34.310 --> 00:04:36.650
Well, it is because
if you actually
00:04:36.650 --> 00:04:39.010
want to carry out
this minimization,
00:04:39.010 --> 00:04:43.050
all you need to do is to solve
a linear system, which is easily
00:04:43.050 --> 00:04:46.370
done on a computer.
00:04:46.370 --> 00:04:51.100
The next observation is
that this expression only
00:04:51.100 --> 00:04:55.860
involves expectations
of various terms
00:04:55.860 --> 00:04:59.750
that are second order in the
random variables involved.
00:04:59.750 --> 00:05:02.950
So it involves the expected
value of X1 squared,
00:05:02.950 --> 00:05:05.050
it involves this term,
which has something
00:05:05.050 --> 00:05:07.960
to do with the
covariance of X1 and X2.
00:05:07.960 --> 00:05:11.280
This term that has something
to do with the covariance of X1
00:05:11.280 --> 00:05:12.910
with Theta.
00:05:12.910 --> 00:05:17.480
But these are the only terms out
of the distribution of the X's
00:05:17.480 --> 00:05:20.310
and of Theta that will matter.
00:05:20.310 --> 00:05:25.420
So similar to the case where
we had a single observation,
00:05:25.420 --> 00:05:27.360
in order to solve
this problem, we
00:05:27.360 --> 00:05:31.590
do not need to know the
complete distribution of the X's
00:05:31.590 --> 00:05:32.705
and of Theta.
00:05:32.705 --> 00:05:35.570
It is enough to know
all of the means,
00:05:35.570 --> 00:05:39.040
variances, and covariances
of the random variables
00:05:39.040 --> 00:05:40.550
that are involved.
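
NOTE
A minimal Python sketch of how the coefficients could be computed from means, variances, and covariances alone. It assumes the zero-derivative conditions reduce to: sum over i of a_i cov(X_i, X_j) = cov(Theta, X_j) for each j, and b = E[Theta] - sum over i of a_i E[X_i]. The function names and the example numbers (Theta with mean 2 and variance 1, observed twice with independent unit-variance noise) are hypothetical and not from the lecture.
```python
def solve_linear_system(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    # Augmented matrix [A | y], copied so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(y[i])] for i in range(n)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from all rows below the pivot row.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
#
def linear_lms_coefficients(mean_x, cov_x, mean_theta, cov_x_theta):
    """Return (a, b) for the estimator a[0]*X1 + ... + a[n-1]*Xn + b.
    Only first and second moments are used, never the full distribution."""
    a = solve_linear_system(cov_x, cov_x_theta)
    b = mean_theta - sum(ai * mi for ai, mi in zip(a, mean_x))
    return a, b
#
# Hypothetical example: X1 = Theta + W1, X2 = Theta + W2, with
# var(Theta) = var(W1) = var(W2) = 1 and everything else uncorrelated.
mean_x = [2.0, 2.0]
cov_x = [[2.0, 1.0], [1.0, 2.0]]
cov_x_theta = [1.0, 1.0]
a, b = linear_lms_coefficients(mean_x, cov_x, 2.0, cov_x_theta)
print(a, b)
```
In this example each observation gets weight close to 1/3 and the constant term is close to 2/3, so the estimator averages the two observations together with the prior mean.
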
00:05:40.550 --> 00:05:43.390
And once more, this
makes this approach
00:05:43.390 --> 00:05:47.060
to estimation a practical
one, because we do not
00:05:47.060 --> 00:05:50.090
need to model in complete
detail the distribution
00:05:50.090 --> 00:05:53.470
of the different
random variables.
00:05:53.470 --> 00:05:58.130
Finally, if we do not have just
one unknown random variable,
00:05:58.130 --> 00:06:00.570
but we have multiple
random variables that we
00:06:00.570 --> 00:06:03.740
want to estimate,
what should we do?
00:06:03.740 --> 00:06:05.800
Well, this is pretty simple.
00:06:05.800 --> 00:06:08.250
You just apply this
estimation methodology
00:06:08.250 --> 00:06:13.390
to each one of the unknown
random variables separately.
00:06:13.390 --> 00:06:18.720
To conclude, this linear
estimation methodology
00:06:18.720 --> 00:06:23.900
applies also to the case where
you have multiple observations.
00:06:23.900 --> 00:06:27.120
You need to solve a certain
computational problem in order
00:06:27.120 --> 00:06:30.390
to find the structure of
the best linear estimator,
00:06:30.390 --> 00:06:33.640
but it is not a very difficult
computational problem,
00:06:33.640 --> 00:06:36.260
because all that it
involves is to minimize
00:06:36.260 --> 00:06:38.780
a quadratic function
of the coefficients
00:06:38.780 --> 00:06:40.720
that you are trying
to determine.
00:06:40.720 --> 00:06:43.130
And this leads us
to having to solve
00:06:43.130 --> 00:06:45.230
a system of linear equations.
00:06:45.230 --> 00:06:48.420
For all these reasons,
linear estimation,
00:06:48.420 --> 00:06:53.310
or estimation using linear
estimators, is quite practical.