WEBVTT
00:00:00.475 --> 00:00:03.050
In this segment, we discuss
in a little more detail the
00:00:03.050 --> 00:00:04.890
mean squared error.
00:00:04.890 --> 00:00:06.660
Consider some estimator.
00:00:06.660 --> 00:00:10.000
It can be any estimator, not
just the sample mean.
00:00:10.000 --> 00:00:12.670
We can decompose the mean
squared error as
00:00:12.670 --> 00:00:15.430
a sum of two terms.
00:00:15.430 --> 00:00:17.520
Where does this formula
come from?
00:00:17.520 --> 00:00:20.610
Well, we know that for any
random variable Z, this
00:00:20.610 --> 00:00:22.490
formula is valid.
00:00:22.490 --> 00:00:27.060
And if we let Z be equal to
the difference between the
00:00:27.060 --> 00:00:31.230
estimator and the value that
we're trying to estimate, then
00:00:31.230 --> 00:00:33.120
we obtain this formula here.
00:00:33.120 --> 00:00:37.000
The expected value of our random
variable Z squared is
00:00:37.000 --> 00:00:39.460
equal to the variance of that
random variable plus the
00:00:39.460 --> 00:00:42.880
square of its mean.
00:00:42.880 --> 00:00:45.280
Let us now rewrite these
two terms in a
00:00:45.280 --> 00:00:47.120
more suggestive way.
00:00:47.120 --> 00:00:49.920
We first notice that theta
is a constant.
00:00:49.920 --> 00:00:52.530
When you add or subtract a
constant from a random
00:00:52.530 --> 00:00:55.030
variable, the variance
does not change.
00:00:55.030 --> 00:00:59.650
So this term is the same as
the variance of theta hat.
00:00:59.650 --> 00:01:02.710
This quantity here,
we will call it
00:01:02.710 --> 00:01:06.120
the bias of the estimator.
00:01:06.120 --> 00:01:10.130
It tells us whether theta hat
is systematically above or
00:01:10.130 --> 00:01:12.870
below the unknown parameter
theta that we're
00:01:12.870 --> 00:01:14.340
trying to estimate.
00:01:14.340 --> 00:01:18.310
And using this terminology, this
term here is just equal
00:01:18.310 --> 00:01:20.660
to the square of the bias.
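
One way to write out the decomposition described in this segment, letting Z be the difference between the estimator and the parameter:

```latex
% For any random variable Z:  E[Z^2] = var(Z) + (E[Z])^2.
% Let Z = \hat\theta - \theta, and note that \theta is a constant.
\begin{aligned}
\mathbb{E}\big[(\hat\theta - \theta)^2\big]
  &= \operatorname{var}(\hat\theta - \theta)
     + \big(\mathbb{E}[\hat\theta - \theta]\big)^2 \\
  &= \operatorname{var}(\hat\theta)
     + \big(\mathbb{E}[\hat\theta] - \theta\big)^2 \\
  &= \operatorname{var}(\hat\theta) + (\text{bias})^2 .
\end{aligned}
```
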
00:01:20.660 --> 00:01:23.890
So the mean squared error
consists of two components,
00:01:23.890 --> 00:01:27.820
and these capture different
aspects of an estimator's
00:01:27.820 --> 00:01:29.039
performance.
00:01:29.039 --> 00:01:32.620
Let us see what they are
in a concrete setting.
00:01:32.620 --> 00:01:35.780
Suppose that we're estimating
the unknown mean of some
00:01:35.780 --> 00:01:42.130
distribution, and that our
estimator is the sample mean.
00:01:42.130 --> 00:01:46.960
In this case, the mean squared
error is the variance, which
00:01:46.960 --> 00:01:52.940
we know to be sigma squared over
n, plus the bias term.
00:01:52.940 --> 00:01:55.539
But we know that the sample
mean is unbiased.
00:01:55.539 --> 00:01:58.450
The expected value of the sample
mean is equal to the
00:01:58.450 --> 00:01:59.500
unknown mean.
00:01:59.500 --> 00:02:03.160
And so the bias contribution
is equal to zero.
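
The claim that the mean squared error of the sample mean is sigma squared over n can be checked numerically. A minimal sketch, assuming a normal distribution with an arbitrarily chosen true mean, standard deviation, and sample size:

```python
import numpy as np

# Sketch (assumed setup): verify numerically that the MSE of the
# sample mean equals sigma^2 / n, since its bias is zero.
rng = np.random.default_rng(0)
theta, sigma, n = 2.0, 3.0, 25       # true mean, std dev, sample size
trials = 200_000

# Draw many independent data sets and compute the sample mean of each.
samples = rng.normal(theta, sigma, size=(trials, n))
estimates = samples.mean(axis=1)

mse = np.mean((estimates - theta) ** 2)
print(mse, sigma**2 / n)   # the two values should be close: ~0.36
```
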
00:02:03.160 --> 00:02:05.930
Now, for the sake of
comparison, let us consider a
00:02:05.930 --> 00:02:09.960
somewhat silly estimator which
ignores the data altogether,
00:02:09.960 --> 00:02:16.000
and always gives you an
estimate of zero.
00:02:16.000 --> 00:02:20.210
In this case, the mean squared
error is as follows.
00:02:20.210 --> 00:02:24.480
Since our estimator is just a
constant, its variance is
00:02:24.480 --> 00:02:26.530
going to be equal to zero.
00:02:26.530 --> 00:02:33.230
On the other hand, since theta
hat is zero, this term here is
00:02:33.230 --> 00:02:36.770
just the constant,
theta, squared.
00:02:36.770 --> 00:02:39.260
And this gives us the
corresponding
00:02:39.260 --> 00:02:40.980
mean squared error.
00:02:40.980 --> 00:02:43.829
Let us now compare the
two estimators.
00:02:43.829 --> 00:02:48.490
We will plot the mean squared
error as a function of the
00:02:48.490 --> 00:02:51.490
unknown parameter, theta.
00:02:51.490 --> 00:02:55.680
For the sample mean estimator,
the mean squared error is
00:02:55.680 --> 00:03:01.400
constant; it does not depend on
theta, and is equal to this
00:03:01.400 --> 00:03:03.650
value, sigma squared over n.
00:03:03.650 --> 00:03:08.260
On the other hand, for the
zero estimator, the mean
00:03:08.260 --> 00:03:12.130
squared error is equal
to theta squared.
00:03:12.130 --> 00:03:13.410
How do they compare?
00:03:13.410 --> 00:03:15.280
Which one is better?
00:03:15.280 --> 00:03:17.630
At this point, there's no
way to say that one is
00:03:17.630 --> 00:03:19.130
better than the other.
00:03:19.130 --> 00:03:23.260
For some theta, the sample
mean has a smaller mean
00:03:23.260 --> 00:03:25.320
squared error.
00:03:25.320 --> 00:03:31.130
But for other theta, the zero
estimator has a smaller mean
00:03:31.130 --> 00:03:33.190
squared error.
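
The comparison of the two MSE curves can be tabulated directly: the zero estimator wins exactly when theta squared is below sigma squared over n, that is, when theta is within sigma over the square root of n of zero. A sketch with assumed values of sigma and n:

```python
import numpy as np

# Sketch (assumed values): compare the two MSE curves as functions of
# the unknown parameter theta.
sigma, n = 2.0, 16
thetas = np.linspace(-2.0, 2.0, 9)

mse_sample_mean = np.full_like(thetas, sigma**2 / n)  # constant in theta
mse_zero = thetas**2                                  # grows with |theta|

for t, m1, m2 in zip(thetas, mse_sample_mean, mse_zero):
    winner = "sample mean" if m1 < m2 else "zero estimator"
    print(f"theta={t:+.2f}  MSE(mean)={m1:.3f}  MSE(zero)={m2:.3f}  -> {winner}")

# The zero estimator has smaller MSE exactly when |theta| < sigma/sqrt(n).
```
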
00:03:33.190 --> 00:03:36.630
But we do not know where the
true value of theta is.
00:03:36.630 --> 00:03:38.030
It could be anything.
00:03:38.030 --> 00:03:41.750
So we cannot say that one is
better than the other.
00:03:41.750 --> 00:03:44.640
Of course, we know that
the sample mean is
00:03:44.640 --> 00:03:46.720
a consistent estimator.
00:03:46.720 --> 00:03:49.530
As n goes to infinity,
it will give you the
00:03:49.530 --> 00:03:51.160
true value of theta.
00:03:51.160 --> 00:03:54.170
And this is a very desirable
property that the zero
00:03:54.170 --> 00:03:56.630
estimator does not have.
00:03:56.630 --> 00:04:01.610
But if n is moderate, the
situation is less clear.
00:04:01.610 --> 00:04:05.810
If we have some good reason to
expect that the true value of
00:04:05.810 --> 00:04:09.460
theta is somewhere in the
vicinity of zero, then the
00:04:09.460 --> 00:04:13.605
zero estimator might be a better
one, because it then
00:04:13.605 --> 00:04:17.180
will achieve a smaller
mean squared error.
00:04:17.180 --> 00:04:21.170
But in a classical statistical
framework, there is no way to
00:04:21.170 --> 00:04:24.710
express a belief of this kind.
00:04:24.710 --> 00:04:28.530
In contrast, if we were
following a Bayesian approach,
00:04:28.530 --> 00:04:31.830
we could provide a prior
distribution for theta that
00:04:31.830 --> 00:04:34.560
would be highly concentrated
around zero.
00:04:34.560 --> 00:04:37.780
This would express your beliefs
about theta, and would
00:04:37.780 --> 00:04:41.190
provide you with the guidance
to choose between the two
00:04:41.190 --> 00:04:46.860
estimators, or maybe suggest
an even better estimator.
00:04:46.860 --> 00:04:50.370
In any case, going back to this
formula, this quantity,
00:04:50.370 --> 00:04:53.770
the variance of the estimator,
plays an important role in the
00:04:53.770 --> 00:04:56.580
analysis of different
estimators.
00:04:56.580 --> 00:05:01.000
And a more intuitive variant
of this quantity is its square
00:05:01.000 --> 00:05:06.020
root, which is the standard
deviation of the estimator,
00:05:06.020 --> 00:05:08.320
and is usually called
the standard
00:05:08.320 --> 00:05:10.720
error of the estimator.
00:05:10.720 --> 00:05:14.280
We can interpret the standard
error as follows.
00:05:14.280 --> 00:05:16.730
We have the true
value of theta.
00:05:19.300 --> 00:05:23.290
Then on day one, we collect
some data, we perform the
00:05:23.290 --> 00:05:27.610
estimation procedure, and we
come up with an estimate.
00:05:27.610 --> 00:05:32.010
On day two, we do the same
thing, but independently.
00:05:32.010 --> 00:05:35.420
We collect a new set of
data, and we come up
00:05:35.420 --> 00:05:37.590
with another estimate.
00:05:37.590 --> 00:05:38.850
And so on.
00:05:38.850 --> 00:05:40.970
We do this many times.
00:05:40.970 --> 00:05:44.210
We use different data
sets to come up
00:05:44.210 --> 00:05:46.030
with different estimates.
00:05:46.030 --> 00:05:49.900
And because of the randomness
in the data, these estimates
00:05:49.900 --> 00:05:52.390
may be all over the place.
00:05:52.390 --> 00:05:56.710
Well, the standard error tells
us how spread out all these
00:05:56.710 --> 00:05:58.130
estimates will be.
00:05:58.130 --> 00:06:00.910
It is the standard
deviation of this
00:06:00.910 --> 00:06:03.240
collection of estimates.
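
This repeated-experiments picture can be simulated: estimate the mean "on many days," each time from a fresh, independent data set, and compare the spread of the resulting estimates to sigma over the square root of n. A sketch with assumed parameters:

```python
import numpy as np

# Sketch (assumed setup): repeat the estimation procedure many times,
# each time on an independent data set, and measure the spread of the
# resulting estimates.
rng = np.random.default_rng(1)
theta, sigma, n = 5.0, 2.0, 100     # true mean, std dev, sample size
days = 50_000

# Each "day": collect a fresh data set and compute the sample mean.
estimates = rng.normal(theta, sigma, size=(days, n)).mean(axis=1)

empirical_spread = estimates.std()
standard_error = sigma / np.sqrt(n)  # theoretical value: 0.2
print(empirical_spread, standard_error)
```
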
00:06:03.240 --> 00:06:06.820
Having a large standard error
means that our estimation
00:06:06.820 --> 00:06:11.570
procedure is quite noisy, and
that our estimates have some
00:06:11.570 --> 00:06:13.590
inherent randomness.
00:06:13.590 --> 00:06:17.560
And therefore, they also
lack accuracy.
00:06:17.560 --> 00:06:20.620
That is, they cannot be
trusted too much.
00:06:20.620 --> 00:06:23.240
That's the case of a large
standard error.
00:06:23.240 --> 00:06:27.240
Conversely, a small standard
error would tell us that the
00:06:27.240 --> 00:06:30.210
estimates would tend
to be concentrated
00:06:30.210 --> 00:06:32.520
close to each other.
00:06:32.520 --> 00:06:36.560
As such, the standard error
is a very useful piece of
00:06:36.560 --> 00:06:39.180
information to have.
00:06:39.180 --> 00:06:44.210
Besides designing and
implementing an estimator, one
00:06:44.210 --> 00:06:49.260
usually also tries to find a
way to calculate and report
00:06:49.260 --> 00:06:51.090
the associated standard error.