WEBVTT

00:00:00.690 --> 00:00:04.270
We now discuss how to come up
with confidence intervals when

00:00:04.270 --> 00:00:08.550
we try to estimate the unknown
mean of some random variable,

00:00:08.550 --> 00:00:10.830
or of some distribution,
using the

00:00:10.830 --> 00:00:13.170
sample mean as our estimator.

00:00:13.170 --> 00:00:17.310
So here X1 up to Xn are
independent, identically

00:00:17.310 --> 00:00:20.320
distributed random variables
that are drawn from a

00:00:20.320 --> 00:00:23.320
distribution that has a certain
mean theta, the

00:00:23.320 --> 00:00:26.410
quantity that we want to
estimate, and some variance

00:00:26.410 --> 00:00:28.400
sigma squared.

00:00:28.400 --> 00:00:30.570
Let us say that we want
to construct a

00:00:30.570 --> 00:00:33.320
95% confidence interval.

00:00:33.320 --> 00:00:36.790
Our starting point will be the
fact that the sample mean,

00:00:36.790 --> 00:00:40.050
according to the central limit
theorem, can be described

00:00:40.050 --> 00:00:43.740
approximately using normal
distributions.

00:00:43.740 --> 00:00:48.420
And we look at the normal
table, and we observe the

00:00:48.420 --> 00:00:49.450
following--

00:00:49.450 --> 00:00:55.070
that if we take a standard
normal random variable, then

00:00:55.070 --> 00:01:02.430
there is probability 97.5% of
falling below this number,

00:01:02.430 --> 00:01:07.570
1.96, which means that there
is probability 2.5% of

00:01:07.570 --> 00:01:09.910
falling above that number.

00:01:09.910 --> 00:01:12.760
And by symmetry, the probability
of falling below

00:01:12.760 --> 00:01:17.990
minus 1.96 is also 2.5%.

00:01:17.990 --> 00:01:21.810
This means that this middle
interval here has probability

00:01:21.810 --> 00:01:26.070
95%, and we exploit this
fact as follows.

00:01:28.800 --> 00:01:32.860
If we take the sample mean,
subtract the true mean, and

00:01:32.860 --> 00:01:36.250
then divide by the standard
deviation of the sample mean,

00:01:36.250 --> 00:01:39.479
then we obtain a random
variable, which is

00:01:39.479 --> 00:01:42.630
approximately a standard
normal.

00:01:42.630 --> 00:01:46.320
Therefore, what we have here
is the probability of an

00:01:46.320 --> 00:01:49.039
approximately standard normal
random variable,

00:01:49.039 --> 00:01:55.420
or rather of its absolute value,
falling below 1.96.

00:01:55.420 --> 00:01:59.710
This is just the event that
our standard normal falls

00:01:59.710 --> 00:02:04.180
inside this middle interval
here. According to this entry

00:02:04.180 --> 00:02:07.090
from the normal tables and the
previous discussion, this

00:02:07.090 --> 00:02:10.179
probability is going to
be approximately 95%.
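
NOTE
Editorial sketch, not part of the spoken lecture. The statement being described
can be written out as follows. The symbol \hat{\Theta}_n for the sample mean is
an assumed notation; theta, sigma, and n are as introduced earlier, and
sigma/\sqrt{n} is the standard deviation of the sample mean.
```latex
\mathbb{P}\left( \frac{\left|\hat{\Theta}_n - \theta\right|}{\sigma/\sqrt{n}} \le 1.96 \right) \approx 0.95
```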

00:02:12.990 --> 00:02:17.470
And now we take this statement,
send this term to

00:02:17.470 --> 00:02:20.710
the other side of the
inequality, and then interpret

00:02:20.710 --> 00:02:22.910
what it means for an
absolute value to

00:02:22.910 --> 00:02:24.540
be less than something.

00:02:24.540 --> 00:02:28.130
And we obtain an equivalent
statement.

00:02:28.130 --> 00:02:32.320
This event here is algebraically
identical to the

00:02:32.320 --> 00:02:35.370
event that we have up there, and
this provides us with the

00:02:35.370 --> 00:02:37.360
desired confidence interval.
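
NOTE
Editorial sketch, not part of the spoken lecture. Multiplying through by
sigma/\sqrt{n} and unfolding the absolute value should give the equivalent
interval form below. The symbol \hat{\Theta}_n for the sample mean is an
assumed notation; the left and right expressions are the lower and upper
ends of the confidence interval.
```latex
\mathbb{P}\left( \hat{\Theta}_n - \frac{1.96\,\sigma}{\sqrt{n}} \le \theta \le \hat{\Theta}_n + \frac{1.96\,\sigma}{\sqrt{n}} \right) \approx 0.95
```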

00:02:37.360 --> 00:02:40.920
We think of this quantity here
as the lower end of the

00:02:40.920 --> 00:02:42.320
confidence interval.

00:02:42.320 --> 00:02:45.310
This quantity here is
the upper end of

00:02:45.310 --> 00:02:47.010
the confidence interval.

00:02:47.010 --> 00:02:50.730
And this statement tells us
that there is probability

00:02:50.730 --> 00:02:55.550
approximately equal to 95% that
the confidence interval

00:02:55.550 --> 00:02:59.560
constructed this way contains
the true value

00:02:59.560 --> 00:03:02.320
of the unknown parameter.
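
NOTE
Editorial sketch, not part of the spoken lecture. A minimal Python version of
the interval computation, assuming sigma is known. The function name and the
four data values are hypothetical and only illustrate the arithmetic; the
normal approximation itself relies on a reasonably large n.
```python
import math
from statistics import fmean
def confidence_interval_known_sigma(samples, sigma, z=1.96):
    """Return (lower, upper) = sample mean -/+ z * sigma / sqrt(n)."""
    n = len(samples)
    sample_mean = fmean(samples)
    # sigma / sqrt(n) is the standard deviation of the sample mean.
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width
# Hypothetical data with an assumed known sigma of 2.0.
lower, upper = confidence_interval_known_sigma([4.0, 6.0, 5.0, 5.0], sigma=2.0)
print(lower, upper)
```
With these made-up numbers the sample mean is 5.0 and the half-width is
1.96 * 2.0 / 2, so the interval runs from about 3.04 to about 6.96.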

00:03:02.320 --> 00:03:06.650
So this is how we obtain a
95% confidence interval.

00:03:06.650 --> 00:03:11.600
If instead we wanted a 90%
confidence interval, we would

00:03:11.600 --> 00:03:14.210
proceed in more or less
the same way.

00:03:14.210 --> 00:03:20.070
Here, we would want to
have the number 0.95.

00:03:20.070 --> 00:03:21.290
Why is that?

00:03:21.290 --> 00:03:26.020
We want this middle interval to
have probability 90%, which

00:03:26.020 --> 00:03:30.890
means that we want to have
probability 5% at

00:03:30.890 --> 00:03:32.525
each one of the tails.

00:03:35.550 --> 00:03:39.510
And then we look at the
normal tables, and we find

00:03:39.510 --> 00:03:44.980
that the entry that gives us
probability 95% of being below

00:03:44.980 --> 00:03:48.950
that value is 1.645.

00:03:48.950 --> 00:03:54.910
So if we use 1.645 in place
of 1.96, we obtain a 90%

00:03:54.910 --> 00:03:58.310
confidence interval, and
similarly for other choices,

00:03:58.310 --> 00:04:03.290
for example, if we want a
99% confidence interval.
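
NOTE
Editorial sketch, not part of the spoken lecture. Instead of reading the
normal table, the same numbers can be obtained from the inverse CDF of the
standard normal. This uses NormalDist from the Python standard library;
the helper name z_value is hypothetical.
```python
from statistics import NormalDist
def z_value(confidence_level):
    """Return z such that a standard normal lies in [-z, z] with this probability."""
    tail = (1.0 - confidence_level) / 2.0  # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)
for level in (0.90, 0.95, 0.99):
    print(level, round(z_value(level), 3))
```
For 90% and 95% this reproduces the 1.645 and 1.96 read from the table, and
for 99% it gives roughly 2.576.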

00:04:03.290 --> 00:04:06.660
There's only one issue that's
left to discuss, and this is

00:04:06.660 --> 00:04:07.620
the following.

00:04:07.620 --> 00:04:11.860
In order to obtain numerical
values for the endpoints of

00:04:11.860 --> 00:04:15.890
the confidence interval, we
need to know sigma, the

00:04:15.890 --> 00:04:17.940
standard deviation
of the random

00:04:17.940 --> 00:04:20.010
variables that we are observing.

00:04:20.010 --> 00:04:23.160
But if we do not know the value
of sigma, then we may

00:04:23.160 --> 00:04:24.720
have to do some additional
work.