WEBVTT

00:00:01.230 --> 00:00:04.490
In this lecture, we provide a
quick introduction into the

00:00:04.490 --> 00:00:07.990
conceptual framework of
classical statistics.

00:00:07.990 --> 00:00:11.710
We use the words "classical" to
make the distinction from

00:00:11.710 --> 00:00:15.540
the Bayesian framework that
we have been using so far.

00:00:15.540 --> 00:00:18.890
Recall that in the Bayesian
framework, unknown quantities

00:00:18.890 --> 00:00:22.110
are viewed as random variables
that have a certain prior

00:00:22.110 --> 00:00:23.690
distribution.

00:00:23.690 --> 00:00:27.770
In contrast, in the classical
setting an unknown quantity is

00:00:27.770 --> 00:00:31.560
viewed as a non-random constant
that just happens to

00:00:31.560 --> 00:00:33.930
be unknown.

00:00:33.930 --> 00:00:36.950
We can still carry out
estimation without assuming a

00:00:36.950 --> 00:00:39.650
prior for the unknown
quantity.

00:00:39.650 --> 00:00:43.190
For example, if the unknown
theta is the mean of a certain

00:00:43.190 --> 00:00:46.630
distribution, we can generate
many samples from that

00:00:46.630 --> 00:00:49.960
distribution and form
the sample mean.

00:00:49.960 --> 00:00:53.110
The weak law of large numbers
then tells us that this

00:00:53.110 --> 00:00:57.690
estimate will approach in the
limit as n increases the true

00:00:57.690 --> 00:01:00.010
value of theta.

00:01:00.010 --> 00:01:03.440
After going through this
argument, we will then take

00:01:03.440 --> 00:01:06.960
the occasion to introduce some
terminology that is often used

00:01:06.960 --> 00:01:11.010
in connection with classical
estimation methods.

00:01:11.010 --> 00:01:14.940
Now, the sample mean provides
us a point estimate for the

00:01:14.940 --> 00:01:18.720
unknown theta, but does not
tell us how accurate that

00:01:18.720 --> 00:01:20.510
estimate is.

00:01:20.510 --> 00:01:23.400
To give a sense of the
accuracy involved, we

00:01:23.400 --> 00:01:27.180
introduce the concept of a
confidence interval, which is

00:01:27.180 --> 00:01:31.110
an interval that has high
probability of containing the

00:01:31.110 --> 00:01:33.140
true theta.

00:01:33.140 --> 00:01:36.560
In general, it is a common
practice to report not just

00:01:36.560 --> 00:01:39.950
estimates, but also confidence
intervals.

00:01:39.950 --> 00:01:43.120
But as we will discuss, one
has to be careful in

00:01:43.120 --> 00:01:48.300
interpreting what exactly a
confidence interval tells us.

00:01:48.300 --> 00:01:51.650
We will see that we can easily
calculate confidence intervals

00:01:51.650 --> 00:01:53.800
using the central
limit theorem.

00:01:53.800 --> 00:01:57.250
And we will discuss in some
detail some extra steps that

00:01:57.250 --> 00:02:00.930
need to be taken if we do not
know the variance of the

00:02:00.930 --> 00:02:04.630
random variables involved.

00:02:04.630 --> 00:02:07.080
We will then continue in the
direction of greater

00:02:07.080 --> 00:02:08.410
generality.

00:02:08.410 --> 00:02:12.550
We will see that by repeated use
of various sample means,

00:02:12.550 --> 00:02:16.210
we can also estimate more
complicated quantities, such

00:02:16.210 --> 00:02:18.970
as, for example, the correlation
coefficient

00:02:18.970 --> 00:02:22.160
between two random variables.

00:02:22.160 --> 00:02:24.990
We will conclude by introducing
a general

00:02:24.990 --> 00:02:29.120
estimation methodology, the
so-called maximum likelihood

00:02:29.120 --> 00:02:30.940
estimation method.

00:02:30.940 --> 00:02:35.300
This is a method that applies
always, even when the unknown

00:02:35.300 --> 00:02:38.240
parameter of interest cannot
be interpreted as an

00:02:38.240 --> 00:02:39.650
expectation.

00:02:39.650 --> 00:02:42.200
It is a universally-applicable
method.

00:02:42.200 --> 00:02:45.329
And fortunately, has some very
desirable properties.