WEBVTT

00:00:01.020 --> 00:00:04.900
In this final segment, we want
to discuss an interesting point

00:00:04.900 --> 00:00:07.830
about linear estimators.

00:00:07.830 --> 00:00:10.000
Here's what the issue is.

00:00:10.000 --> 00:00:12.740
You obtain an observation,
X, on the basis

00:00:12.740 --> 00:00:15.210
of which you want
to estimate Theta.

00:00:15.210 --> 00:00:19.190
But perhaps you measure
X on a different scale,

00:00:19.190 --> 00:00:23.410
let's say on a cubic scale, so
that what you record actually

00:00:23.410 --> 00:00:25.160
is X cubed.

00:00:25.160 --> 00:00:28.440
So you're faced with two
possible estimation problems.

00:00:28.440 --> 00:00:32.200
One estimation problem is
to use X to estimate Theta.

00:00:32.200 --> 00:00:36.900
Another estimation problem is to
use X cubed to estimate Theta.

00:00:36.900 --> 00:00:38.990
Does it make a difference?

00:00:38.990 --> 00:00:42.890
Let's consider the case of
least mean squares estimation,

00:00:42.890 --> 00:00:45.550
without any
linearity constraint.

00:00:45.550 --> 00:00:48.530
If you use X to estimate
Theta, your estimator

00:00:48.530 --> 00:00:51.160
is going to be this
conditional expectation.

00:00:51.160 --> 00:00:53.580
If you use X cubed
to estimate Theta,

00:00:53.580 --> 00:00:56.900
your estimator will be this
conditional expectation.

00:00:56.900 --> 00:00:58.670
Are they different?

00:00:58.670 --> 00:01:05.510
Well, X and X cubed carry the
same information about Theta.

00:01:05.510 --> 00:01:10.320
In particular, the posterior
distribution of Theta given X

00:01:10.320 --> 00:01:14.200
is going to be the same as the
posterior distribution of Theta

00:01:14.200 --> 00:01:16.090
given X cubed.

00:01:16.090 --> 00:01:18.470
You will be getting
the same information,

00:01:18.470 --> 00:01:20.700
the same knowledge about Theta.

00:01:20.700 --> 00:01:22.480
And in particular,
if you calculate

00:01:22.480 --> 00:01:26.970
conditional expectations,
these will also be the same.
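
The equality of the two conditional expectations can be checked numerically. The sketch below is illustrative only; the toy joint distribution and all names are assumptions, not from the lecture. The key fact is that x maps to x cubed invertibly, so conditioning on X or on X cubed partitions the outcomes identically.

```python
# Illustrative sketch (assumed toy joint PMF, not from the lecture):
# because x -> x**3 is invertible, conditioning on X or on X**3 gives
# the same posterior, hence E[Theta | X] = E[Theta | X**3].
from collections import defaultdict

# Toy joint PMF of (Theta, X) on a few points.
joint = {(0, -1): 0.1, (0, 2): 0.2, (1, -1): 0.3, (1, 2): 0.4}

def cond_exp_theta(transform):
    """E[Theta | transform(X) = y] for every attainable y."""
    num, den = defaultdict(float), defaultdict(float)
    for (theta, x), p in joint.items():
        y = transform(x)
        num[y] += theta * p
        den[y] += p
    return {y: num[y] / den[y] for y in num}

est_x = cond_exp_theta(lambda x: x)        # condition on X
est_x3 = cond_exp_theta(lambda x: x ** 3)  # condition on X**3

# The two estimates agree at corresponding points.
for (theta, x), _ in joint.items():
    assert abs(est_x[x] - est_x3[x ** 3]) < 1e-12
```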

00:01:26.970 --> 00:01:29.260
What about the linear case?

00:01:29.260 --> 00:01:32.320
If we restrict to
linear estimators,

00:01:32.320 --> 00:01:35.220
then on the basis
of X, you would

00:01:35.220 --> 00:01:38.110
form a linear
estimator of this kind.

00:01:38.110 --> 00:01:42.350
But if your observation
is in the form of X cubed,

00:01:42.350 --> 00:01:45.050
then a linear
estimator would form

00:01:45.050 --> 00:01:47.680
a linear function of X cubed.

00:01:47.680 --> 00:01:51.060
So this would be a
different kind of estimator.

00:01:51.060 --> 00:01:54.110
We have seen a formula
for obtaining the best

00:01:54.110 --> 00:01:57.030
estimator, the best
choices of a and b

00:01:57.030 --> 00:01:59.229
for estimators of this kind.

00:01:59.229 --> 00:02:01.800
We can use that same
formula to obtain

00:02:01.800 --> 00:02:04.210
the best estimator of that kind.

00:02:04.210 --> 00:02:07.100
It's going to be, of course,
a different estimator.

00:02:07.100 --> 00:02:10.949
Here, we're optimizing
within a different class.
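
Numerically, applying the same formula within the two classes looks like the sketch below. The toy model (X equals Theta plus noise) and every name here are assumptions for illustration, not from the lecture; only the formula a = Cov(Theta, Y)/Var(Y), b = E[Theta] - a E[Y] is the lecture's.

```python
# Sketch (assumed toy model, not from the lecture): apply the LLMS
# formula once with Y = X and once with Y = X**3.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
theta = rng.normal(size=n)
# Assumed model: X = Theta + noise (linear, so the X-based class wins).
x = theta + 0.5 * rng.normal(size=n)

def llms(y, theta):
    # a = Cov(Theta, Y) / Var(Y),  b = E[Theta] - a * E[Y]
    a = np.mean((theta - theta.mean()) * (y - y.mean())) / np.var(y)
    b = theta.mean() - a * y.mean()
    return a, b

a_x, b_x = llms(x, theta)        # estimator linear in X
a_c, b_c = llms(x ** 3, theta)   # estimator linear in X cubed

mse_x = np.mean((theta - (a_x * x + b_x)) ** 2)
mse_c = np.mean((theta - (a_c * x ** 3 + b_c)) ** 2)
```

Under this assumed linear model, the X-based estimator attains the smaller mean squared error; under a cubic relation, the comparison would flip.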

00:02:10.949 --> 00:02:13.630
Which one of the two is better?

00:02:13.630 --> 00:02:15.210
Well, this depends
on what you know

00:02:15.210 --> 00:02:18.760
about the particular
problem at hand.

00:02:18.760 --> 00:02:21.850
If you have some
reason to believe,

00:02:21.850 --> 00:02:26.810
or if you know that Theta
and X are roughly related

00:02:26.810 --> 00:02:31.110
by some kind of cubic relation,
then perhaps estimators

00:02:31.110 --> 00:02:34.690
in this class are going to
perform better than estimators

00:02:34.690 --> 00:02:37.040
in that class.

00:02:37.040 --> 00:02:40.620
Let me also point out a related
issue that comes up here.

00:02:40.620 --> 00:02:43.650
To find the right
choice of a, you

00:02:43.650 --> 00:02:47.400
need to know the covariance
between X and Theta.

00:02:47.400 --> 00:02:49.380
That's what the
formula tells us about

00:02:49.380 --> 00:02:51.740
the optimal linear estimator.

00:02:51.740 --> 00:02:58.070
Here, you would need to know
the covariance between Theta

00:02:58.070 --> 00:03:01.140
and X cubed.

00:03:01.140 --> 00:03:04.640
In addition, the formula
requires the variance of X.

00:03:04.640 --> 00:03:07.720
But here, instead of
X, we're using X cubed.

00:03:07.720 --> 00:03:12.300
So in this case, we would
need the variance of X cubed.

00:03:12.300 --> 00:03:14.360
Now, this could be
more challenging.

00:03:14.360 --> 00:03:17.710
In general, the higher
the powers that you have,

00:03:17.710 --> 00:03:19.880
the more difficult
these quantities

00:03:19.880 --> 00:03:23.430
are to calculate or
to know what they are.

00:03:23.430 --> 00:03:27.430
But leaving that issue
aside, what we have here

00:03:27.430 --> 00:03:32.180
is two alternative choices for
the structure of the estimator

00:03:32.180 --> 00:03:33.130
that we're using.

00:03:35.770 --> 00:03:39.380
Now, we can push
this story further.

00:03:39.380 --> 00:03:42.910
Instead of considering just
estimators of this kind,

00:03:42.910 --> 00:03:48.380
we might consider as well
estimators of this kind.

00:03:48.380 --> 00:03:51.280
Is this a linear estimator?

00:03:51.280 --> 00:03:53.610
We still call it a
linear estimator,

00:03:53.610 --> 00:03:56.680
because it is linear
in the coefficients

00:03:56.680 --> 00:03:59.900
that we choose
when we optimize.

00:03:59.900 --> 00:04:01.440
That's the more important part.

00:04:01.440 --> 00:04:04.430
It's the linearity in these
coefficients that's important,

00:04:04.430 --> 00:04:07.460
rather than the
linearity in the X's.

00:04:07.460 --> 00:04:11.050
So as a function of
X, this is non-linear.

00:04:11.050 --> 00:04:14.950
On the other hand, we can think
of this X as one observation,

00:04:14.950 --> 00:04:17.269
X squared as
another observation,

00:04:17.269 --> 00:04:21.579
X cubed as a third observation,
and what we've got here

00:04:21.579 --> 00:04:26.210
is a linear function of
three different observations.

00:04:26.210 --> 00:04:30.650
So we can still pose a least
squares problem in which we

00:04:30.650 --> 00:04:34.680
try to find the best choices
for the coefficients a1, a2,

00:04:34.680 --> 00:04:37.370
and a3, as well as
the coefficient b,

00:04:37.370 --> 00:04:39.200
find those choices
that are going

00:04:39.200 --> 00:04:42.960
to give us the smallest
possible mean squared error.

00:04:42.960 --> 00:04:46.050
So we can optimize
within this class.
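
Treating X, X squared, and X cubed as three observations reduces this to an ordinary linear least squares problem over the coefficients. The sketch below is an assumed setup, not the lecture's example: the toy model makes Theta roughly cubic in X, and the coefficients are found by least squares on the feature columns.

```python
# Sketch (assumed toy model, not from the lecture): optimize within the
# class  Theta_hat = a1*X + a2*X**2 + a3*X**3 + b  by treating
# X, X**2, X**3 as three observations in a linear least squares problem.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
theta = rng.normal(size=n)
# Assumed model: Theta is roughly a cubic function of X.
x = np.cbrt(theta + 0.2 * rng.normal(size=n))

# Feature columns 1, X, X**2, X**3: linear in the unknown coefficients.
A = np.column_stack([np.ones(n), x, x ** 2, x ** 3])
coeffs, *_ = np.linalg.lstsq(A, theta, rcond=None)
mse_cubic = np.mean((theta - A @ coeffs) ** 2)

# For comparison: the less flexible class, linear in X alone.
coeffs_lin, *_ = np.linalg.lstsq(A[:, :2], theta, rcond=None)
mse_lin = np.mean((theta - A[:, :2] @ coeffs_lin) ** 2)
```

Since the cubic class contains the purely linear one, its optimal mean squared error can only be smaller or equal, which is the extra flexibility discussed next.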

00:04:46.050 --> 00:04:48.450
Within this class of
estimators, we certainly

00:04:48.450 --> 00:04:50.590
have more flexibility.

00:04:50.590 --> 00:04:52.630
This is a more general
class of estimators

00:04:52.630 --> 00:04:55.480
than either this
one or that one.

00:04:55.480 --> 00:04:59.720
So within this class, we should
be able to do even better.

00:04:59.720 --> 00:05:01.980
On the other hand, we
would have to pay a price

00:05:01.980 --> 00:05:04.730
in that this is a more
complex structure.

00:05:04.730 --> 00:05:08.420
It would be more difficult to
find the optimal coefficients.

00:05:08.420 --> 00:05:12.380
And also, we're going to
need higher order moments

00:05:12.380 --> 00:05:17.010
or expectations related
to the X's and the Thetas.

00:05:17.010 --> 00:05:21.500
Finally, there's nothing
special in us using powers of X

00:05:21.500 --> 00:05:23.410
and using a polynomial.

00:05:23.410 --> 00:05:25.610
We could also look
at estimators that

00:05:25.610 --> 00:05:27.800
have some other
type of structure.

00:05:27.800 --> 00:05:29.730
For example, we
might want to mix

00:05:29.730 --> 00:05:33.780
an exponential function of X
and a logarithmic function of X,

00:05:33.780 --> 00:05:38.560
look at estimators of this form,
and try to choose the best one.

00:05:38.560 --> 00:05:41.520
Find the best choice
of the coefficients.

00:05:41.520 --> 00:05:45.620
Again, this is something
that is possible.

00:05:45.620 --> 00:05:47.810
And again, it's
going to boil down

00:05:47.810 --> 00:05:52.250
to solving a system of linear
equations in the coefficients.
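
The same machinery works for these non-polynomial features. In the sketch below, the toy model and every name are assumptions for illustration, not from the lecture; what carries over is that the estimator is still linear in its coefficients, so finding them is again a linear least squares problem.

```python
# Sketch (assumed toy model, not from the lecture): an estimator of the
# form  a1*exp(X) + a2*log(X) + b  is still linear in (a1, a2, b), so
# the optimal coefficients solve a system of linear equations.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
theta = rng.uniform(-1, 1, size=n)
# Assumed model: X = exp(Theta + small noise), so X > 0 and
# log(X) is informative about Theta.
x = np.exp(theta + 0.05 * rng.normal(size=n))

# Feature columns 1, exp(X), log(X); solved via least squares.
A = np.column_stack([np.ones(n), np.exp(x), np.log(x)])
coeffs, *_ = np.linalg.lstsq(A, theta, rcond=None)
mse = np.mean((theta - A @ coeffs) ** 2)
```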

00:05:52.250 --> 00:05:55.490
On the other hand, we need
to know various expectations

00:05:55.490 --> 00:05:59.540
about X that might be
difficult to obtain.

00:05:59.540 --> 00:06:02.330
How do we choose
which structure to use?

00:06:02.330 --> 00:06:06.300
Should it be this one, this
one, this one, or that one?

00:06:06.300 --> 00:06:09.260
There's a trade-off, that
more complicated structures

00:06:09.260 --> 00:06:11.230
introduce more
complexity and make

00:06:11.230 --> 00:06:13.290
the problem more difficult.

00:06:13.290 --> 00:06:14.990
But there's also another issue.

00:06:14.990 --> 00:06:17.050
It has to do with
what do we know

00:06:17.050 --> 00:06:19.960
about the particular
problem at hand.

00:06:19.960 --> 00:06:23.290
If we know or have reason
to believe that third order

00:06:23.290 --> 00:06:27.180
polynomials are going to give
us excellent estimates of Theta,

00:06:27.180 --> 00:06:31.450
then we may want to
work within this class.

00:06:31.450 --> 00:06:33.990
In any case, the
moral of this story

00:06:33.990 --> 00:06:38.600
is that if we are to use the
linear estimation methodology,

00:06:38.600 --> 00:06:40.860
we do have some choices.

00:06:40.860 --> 00:06:43.790
Linear in what?

00:06:43.790 --> 00:06:47.590
And different choices will
give us different performance.

00:06:47.590 --> 00:06:51.710
But this now gets somewhat
away from the subject

00:06:51.710 --> 00:06:56.520
of a mathematical methodology,
and it gets closer to the art

00:06:56.520 --> 00:07:01.500
that you need to exercise in
any particular problem domain.