WEBVTT
00:00:01.080 --> 00:00:03.700
We will now go through a
sequence of examples that
00:00:03.700 --> 00:00:07.880
illustrate the different types
of questions that we usually
00:00:07.880 --> 00:00:11.130
answer using a normal
approximation based on the
00:00:11.130 --> 00:00:13.060
central limit theorem.
00:00:13.060 --> 00:00:17.000
In general, one uses these
approximations to make
00:00:17.000 --> 00:00:19.820
statements of this type.
00:00:19.820 --> 00:00:23.400
That the probability of
the sum of n, i.i.d.
00:00:23.400 --> 00:00:26.900
random variables being less than
a certain number, that
00:00:26.900 --> 00:00:29.370
this probability is
approximately equal to some
00:00:29.370 --> 00:00:30.630
other number.
00:00:30.630 --> 00:00:34.770
Notice that this statement
involves three parameters, a,
00:00:34.770 --> 00:00:39.000
b, and n, and you can imagine
problems where you are given
00:00:39.000 --> 00:00:41.040
two of these parameters,
and you're
00:00:41.040 --> 00:00:42.920
asked to find the third.
00:00:42.920 --> 00:00:45.990
And this gives us the different
variations of the
00:00:45.990 --> 00:00:49.340
questions that we might
be able to answer.
00:00:49.340 --> 00:00:52.350
So we will go through examples
of each one of these
00:00:52.350 --> 00:00:55.210
variations.
00:00:55.210 --> 00:00:57.890
Our setting will
be as follows.
00:00:57.890 --> 00:01:02.420
We have a container, and the
container receives packages.
00:01:02.420 --> 00:01:05.300
Each package has a random
weight, which is an
00:01:05.300 --> 00:01:08.050
independent random variable
that's drawn from an
00:01:08.050 --> 00:01:12.590
exponential distribution
with a parameter 1/2.
00:01:12.590 --> 00:01:17.289
And we load the container
with 100 packages.
00:01:17.289 --> 00:01:21.260
We would like to calculate the
probability that the total
00:01:21.260 --> 00:01:25.230
weight of the 100 packages
exceeds 210.
00:01:25.230 --> 00:01:29.680
For example, 210 might be the
capacity of the container.
00:01:29.680 --> 00:01:33.160
Since we will be using the
central limit theorem, we will
00:01:33.160 --> 00:01:37.900
have to work with the
standardized version of Sn in
00:01:37.900 --> 00:01:42.410
which we subtract the mean of Sn
and divide by the standard
00:01:42.410 --> 00:01:44.976
deviation of Sn.
00:01:44.976 --> 00:01:48.390
And to do that, we will need
to know the mean and the
00:01:48.390 --> 00:01:49.910
standard deviation.
00:01:49.910 --> 00:01:53.950
Now for an exponential, the mean
is the inverse of lambda
00:01:53.950 --> 00:01:58.660
and the standard deviation is
also the inverse of lambda and
00:01:58.660 --> 00:02:02.090
so we know what these
quantities are.
00:02:02.090 --> 00:02:08.750
Then, the next step is to take
this event here and rewrite it
00:02:08.750 --> 00:02:13.270
in a way that involves
this random variable.
00:02:13.270 --> 00:02:18.530
So what we would do is that we
take the original description
00:02:18.530 --> 00:02:24.590
of the event, subtract from both
sides of this inequality
00:02:24.590 --> 00:02:26.640
this number n times mu.
00:02:26.640 --> 00:02:29.600
In this case, n is
100, mu is 2.
00:02:29.600 --> 00:02:35.079
So we subtract 200, divide by
this quantity, square root of
00:02:35.079 --> 00:02:39.090
100 is 10 times sigma,
this gives us 20.
00:02:39.090 --> 00:02:44.750
And we do the same on the other
side of the inequality.
00:02:44.750 --> 00:02:48.250
This is just an equivalent
representation of the original
00:02:48.250 --> 00:02:52.670
event, but we have here is
the probability that this
00:02:52.670 --> 00:02:58.660
standardized version of Sn is
larger than or equal to this
00:02:58.660 --> 00:03:01.580
number, which is 0.5.
00:03:01.580 --> 00:03:04.880
And at this point, we can use
the central limit theorem
00:03:04.880 --> 00:03:08.320
approximation to say that this
probability is approximately
00:03:08.320 --> 00:03:12.925
the same if we use a standard
normal instead of Zn.
00:03:15.570 --> 00:03:18.900
Now for a standard normal, we
can calculate probabilities in
00:03:18.900 --> 00:03:21.650
terms of the CDF that's
given in the table.
00:03:21.650 --> 00:03:24.010
But here, we have the
probability that Z is larger
00:03:24.010 --> 00:03:27.740
than something, not smaller
than something.
00:03:27.740 --> 00:03:30.810
The CDF gives us the probability
that Z is less
00:03:30.810 --> 00:03:31.790
than something.
00:03:31.790 --> 00:03:33.160
This is easy to handle.
00:03:33.160 --> 00:03:40.380
This probability is 1 minus the
probability that Z is less
00:03:40.380 --> 00:03:48.140
than 0.5, which is 1 minus the
CDF of the standard normal
00:03:48.140 --> 00:03:51.430
evaluated at 0.5.
00:03:51.430 --> 00:03:56.610
And at this point, we look up
the normal table, the standard
00:03:56.610 --> 00:04:04.070
normal table, and value for
an argument of 0.5.
00:04:04.070 --> 00:04:09.600
The corresponding value is this
one, so we obtain 1 minus
00:04:09.600 --> 00:04:17.339
0.6915, which evaluates
to 0.3085.
00:04:17.339 --> 00:04:22.560
And this is the answer to
this particular problem.
00:04:22.560 --> 00:04:27.280
In the next example, we ask a
somewhat different question.
00:04:27.280 --> 00:04:32.620
We fix again the number of
packages to be 100, but we're
00:04:32.620 --> 00:04:37.330
given some probabilistic
tolerance.
00:04:37.330 --> 00:04:42.360
We allow the packages, their
total weight, to exceed the
00:04:42.360 --> 00:04:44.790
capacity of the container.
00:04:44.790 --> 00:04:47.970
But we don't want that to happen
too often, we want to
00:04:47.970 --> 00:04:52.560
have only 5% probability of
exceeding that capacity.
00:04:52.560 --> 00:04:55.820
How should we choose the
capacity of the container if
00:04:55.820 --> 00:05:00.080
we want to have this kind
of a specification?
00:05:00.080 --> 00:05:02.220
So we proceed as follows.
00:05:02.220 --> 00:05:08.040
We want this number, 0.05, to be
approximately equal to this
00:05:08.040 --> 00:05:09.250
probability.
00:05:09.250 --> 00:05:15.100
But now, we take this event and
rewrite it in terms of the
00:05:15.100 --> 00:05:18.250
standardized random variable.
00:05:18.250 --> 00:05:24.460
That is, we start from both
sides of the inequality and
00:05:24.460 --> 00:05:32.680
subtract n times mu, which is
200, and then divide by the
00:05:32.680 --> 00:05:37.490
standard deviation of Sn, which
is this quantity and
00:05:37.490 --> 00:05:40.715
which is 20, exactly as in
the previous example.
00:05:45.540 --> 00:05:51.990
And now, this random variable,
Zn, is approximately a
00:05:51.990 --> 00:05:53.460
standard normal.
00:05:53.460 --> 00:05:55.600
So we're asking for the
probability of the standard
00:05:55.600 --> 00:05:59.680
normal is larger than or equal
to something which, using the
00:05:59.680 --> 00:06:05.230
argument as in the previous
example, is 1 minus the CDF of
00:06:05.230 --> 00:06:09.670
the standard normal, evaluated
at this particular value.
00:06:12.550 --> 00:06:19.770
Now, what this tells us is that
this quantity here, the
00:06:19.770 --> 00:06:25.940
value of the CDF, should be
equal to 1 minus 0.05.
00:06:25.940 --> 00:06:30.940
So this quantity here
should be 0.95.
00:06:30.940 --> 00:06:34.200
What does this tell us about
the argument of the CDF?
00:06:34.200 --> 00:06:37.050
We can look at the table
and try to find
00:06:37.050 --> 00:06:41.120
somewhere an entry of 0.95.
00:06:41.120 --> 00:06:48.159
And we find it either
here or there.
00:06:48.159 --> 00:06:51.940
We could choose either one, or
we might decide to split the
00:06:51.940 --> 00:06:58.310
difference and say that we get
the value of 0.95 when the
00:06:58.310 --> 00:07:04.450
argument is 1.645.
00:07:04.450 --> 00:07:11.590
And so we conclude that in order
for this to be 0.95, we
00:07:11.590 --> 00:07:20.390
need a minus 200 divided by
20 to be equal to 1.645.
00:07:20.390 --> 00:07:28.310
And then we solve for a and we
find that a should be 232.9.
00:07:28.310 --> 00:07:33.390
And we can choose the capacity
of our container this way.
00:07:33.390 --> 00:07:37.310
Our next example is a little
more challenging.
00:07:37.310 --> 00:07:40.390
Here, we will fix a and
b and we will ask for
00:07:40.390 --> 00:07:42.430
the value of n.
00:07:42.430 --> 00:07:48.310
Here's a type of question
that has this flavor.
00:07:48.310 --> 00:07:52.120
We are given the capacity
of our container.
00:07:52.120 --> 00:07:54.840
We want to have small
probability of
00:07:54.840 --> 00:07:57.070
exceeding that capacity.
00:07:57.070 --> 00:08:01.880
How many packages should
you try to load?
00:08:01.880 --> 00:08:05.550
What is the value of
n for which this
00:08:05.550 --> 00:08:08.360
relation will be true?
00:08:08.360 --> 00:08:13.650
So we proceed, as usual, by
taking this event and
00:08:13.650 --> 00:08:18.710
rewriting it in a way that
involves the standardized
00:08:18.710 --> 00:08:21.430
version of Sn.
00:08:21.430 --> 00:08:25.680
So we need to subtract n
times mu, which in this
00:08:25.680 --> 00:08:27.680
problem is 2 times n.
00:08:27.680 --> 00:08:30.960
We subtract it from both sides
of the inequality.
00:08:30.960 --> 00:08:34.909
And then we divide by the square
root of n times sigma,
00:08:34.909 --> 00:08:36.150
which is 2.
00:08:36.150 --> 00:08:41.929
So we divide both sides of the
inequality by this number.
00:08:41.929 --> 00:08:45.970
Once more, this event here is
identical to the original
00:08:45.970 --> 00:08:50.970
event, but now it is expressed
in terms of the standardized
00:08:50.970 --> 00:08:52.620
version of Sn.
00:08:52.620 --> 00:08:55.300
This is a random variable
that's approximately a
00:08:55.300 --> 00:08:59.010
standard normal, so once more,
we're talking about the
00:08:59.010 --> 00:09:02.070
probability that the standard
normal exceeds a certain
00:09:02.070 --> 00:09:05.420
value, and by the central
limit theorem, this is
00:09:05.420 --> 00:09:11.000
approximately equal to 1 minus
the standard normal CDF,
00:09:11.000 --> 00:09:17.010
evaluated at this particular
value here.
00:09:17.010 --> 00:09:21.870
Now we want this quantity to
be approximately equal to
00:09:21.870 --> 00:09:28.770
0.05, which, once more, means
that this quantity should be
00:09:28.770 --> 00:09:37.310
0.95 and arguing as before that
we try to find 0.95 in
00:09:37.310 --> 00:09:39.320
the standard normal table.
00:09:39.320 --> 00:09:43.850
And this tells us that the
argument of the normal CDF
00:09:43.850 --> 00:09:56.050
should be equal to 1.645.
00:09:56.050 --> 00:09:58.620
Here, we get an equation
for n.
00:09:58.620 --> 00:10:01.190
Unfortunately, it is a
quadratic equation.
00:10:01.190 --> 00:10:03.410
However, we can solve it.
00:10:03.410 --> 00:10:07.240
And after you solve it,
numerically or using the
00:10:07.240 --> 00:10:11.360
formula for the solution of
quadratic equations, you find
00:10:11.360 --> 00:10:17.220
the value of n that's somewhere
between 89 and 90.
00:10:17.220 --> 00:10:23.420
Now, n is an integer, so you
could choose either 89 or 90.
00:10:23.420 --> 00:10:26.960
If you want to be conservative,
then you would
00:10:26.960 --> 00:10:32.150
set n to the smaller value of
the two and set n to be 89.
00:10:35.930 --> 00:10:40.920
Our last example is going to
be a little different.
00:10:40.920 --> 00:10:43.240
Here's what happens.
00:10:43.240 --> 00:10:46.050
We start loading the container,
and the container
00:10:46.050 --> 00:10:48.780
has a capacity of 210.
00:10:48.780 --> 00:10:52.110
Once we load the package and
we see that the weight has
00:10:52.110 --> 00:10:55.630
exceeded 210, we stop.
00:10:55.630 --> 00:10:58.900
Let N be the number of packages
that have been
00:10:58.900 --> 00:11:01.990
loaded, and this number
is random.
00:11:01.990 --> 00:11:05.220
If you're unlucky and you happen
to get lots of heavy
00:11:05.220 --> 00:11:08.980
packages, then you will
stop earlier.
00:11:08.980 --> 00:11:12.560
We would like to calculate,
approximately, the probability
00:11:12.560 --> 00:11:15.760
that the number of packages
that have been loaded is
00:11:15.760 --> 00:11:18.630
larger than 100.
00:11:18.630 --> 00:11:22.440
Now, this problem feels
a little different.
00:11:22.440 --> 00:11:27.590
The reason is that N is not the
sum of independent random
00:11:27.590 --> 00:11:31.390
variables and so we do not have
a version of the central
00:11:31.390 --> 00:11:37.110
limit theorem that we could
apply to N. What can we do?
00:11:37.110 --> 00:11:41.860
Well, we try to take this event
and express it in terms
00:11:41.860 --> 00:11:43.720
of the Xi's.
00:11:43.720 --> 00:11:45.820
And here's how we go about it.
00:11:45.820 --> 00:11:50.200
What does it mean that we loaded
more than 100 packages?
00:11:50.200 --> 00:11:53.860
This means that at the time
we were loading the 100th
00:11:53.860 --> 00:11:56.740
package, we didn't stop.
00:11:56.740 --> 00:12:02.000
And this means that at that
time, after we loaded the
00:12:02.000 --> 00:12:07.800
100th package, the weight
had not exceeded 210.
00:12:07.800 --> 00:12:11.420
So the event that we're dealing
with here is the same
00:12:11.420 --> 00:12:18.070
as the event that the first
100 packages have a total
00:12:18.070 --> 00:12:25.350
weight which is less than
or equal to 210.
00:12:25.350 --> 00:12:27.675
But now we're back into
a problem that
00:12:27.675 --> 00:12:29.350
we know how to solve.
00:12:29.350 --> 00:12:33.270
And the way to solve it is to
take this random variable,
00:12:33.270 --> 00:12:34.850
standardize it--
00:12:34.850 --> 00:12:38.780
this actually is essentially the
same calculation as in our
00:12:38.780 --> 00:12:40.720
very first example--
00:12:40.720 --> 00:12:46.450
and we will get the standard
normal CDF evaluated at 210
00:12:46.450 --> 00:12:51.120
minus the mean of this random
variable, which is 200,
00:12:51.120 --> 00:12:53.610
divided by the standard
deviation of this random
00:12:53.610 --> 00:12:55.920
variable, which is 20.
00:12:55.920 --> 00:13:04.400
So we're looking at phi of 0.5,
which we look up at the
00:13:04.400 --> 00:13:14.900
standard normal table, and
has a value of 0.6915.
00:13:14.900 --> 00:13:19.480
So this was our last example,
and these four examples that
00:13:19.480 --> 00:13:24.330
we worked through cover pretty
much all of the types of
00:13:24.330 --> 00:13:27.410
problems that you
might encounter.
00:13:27.410 --> 00:13:31.110
Of course, sometimes it might
not be entirely obvious what
00:13:31.110 --> 00:13:33.370
kind of problem you
are dealing with.
00:13:33.370 --> 00:13:36.310
You may have to do some
translation from a problem
00:13:36.310 --> 00:13:40.100
statement to bring it in the
form that we dealt with at
00:13:40.100 --> 00:13:41.080
this point.
00:13:41.080 --> 00:13:46.120
But once you bring it into a
form where you can get close
00:13:46.120 --> 00:13:49.480
to applying the central limit
theorem, then the steps are
00:13:49.480 --> 00:13:53.340
pretty much routine, as long
as you carry them out in a
00:13:53.340 --> 00:13:54.990
systematic and organized
manner.