WEBVTT
00:00:01.080 --> 00:00:03.660
We have claimed that normal
random variables are very
00:00:03.660 --> 00:00:06.880
important, and therefore we
would like to be able to
00:00:06.880 --> 00:00:09.280
calculate probabilities
associated with them.
00:00:09.280 --> 00:00:11.940
For example, given a normal
random variable, what is the
00:00:11.940 --> 00:00:15.200
probability that it takes
a value less than 5?
00:00:15.200 --> 00:00:18.470
Unfortunately, there are no
closed-form expressions that
00:00:18.470 --> 00:00:20.210
can help us with this.
00:00:20.210 --> 00:00:23.040
In particular, the CDF, the
Cumulative Distribution
00:00:23.040 --> 00:00:26.030
Function of normal random
variables, is not given in
00:00:26.030 --> 00:00:27.120
closed form.
00:00:27.120 --> 00:00:30.960
But fortunately, we do have
tables for the standard normal
00:00:30.960 --> 00:00:32.820
random variable.
00:00:32.820 --> 00:00:38.270
These tables, which take the
form shown here, give us the
00:00:38.270 --> 00:00:40.200
following information.
00:00:40.200 --> 00:00:43.930
If we have a normal random
variable, which is a standard
00:00:43.930 --> 00:00:48.150
normal, they tell us the values
of the cumulative
00:00:48.150 --> 00:00:55.240
distribution function for
different values of little y.
00:00:55.240 --> 00:00:59.870
In terms of a picture, if this
is the PDF of a standard
00:00:59.870 --> 00:01:04.840
normal and I give you a value
little y, I'm interested in
00:01:04.840 --> 00:01:08.250
the corresponding value of
the CDF, which is the
00:01:08.250 --> 00:01:10.150
area under the curve.
00:01:10.150 --> 00:01:13.770
Well, that value, the area under
this curve, is exactly
00:01:13.770 --> 00:01:16.580
what this table is
giving to us.
00:01:16.580 --> 00:01:19.990
And there's a shorthand notation
for referring to the
00:01:19.990 --> 00:01:24.510
CDF of the standard normal,
which is just phi of y.
00:01:24.510 --> 00:01:27.390
Let us see how we
use this table.
00:01:27.390 --> 00:01:31.140
Suppose we're interested
in phi of 0,
00:01:31.140 --> 00:01:33.900
which is the probability that
our standard normal takes a
00:01:33.900 --> 00:01:36.570
value less than or equal to 0.
00:01:36.570 --> 00:01:40.259
Well, by symmetry since the PDF
is symmetric around 0, we
00:01:40.259 --> 00:01:44.180
know that this probability
should be 0.5.
00:01:44.180 --> 00:01:46.990
Let's see what the
table tells us.
00:01:46.990 --> 00:01:53.690
0 corresponds to this entry,
which is indeed 0.5.
00:01:53.690 --> 00:01:57.789
Let us look up the probability
that our standard normal takes
00:01:57.789 --> 00:02:01.330
a value less than,
let's say, 1.16.
00:02:01.330 --> 00:02:03.580
How do we find this
information?
00:02:03.580 --> 00:02:06.620
1 is here.
00:02:06.620 --> 00:02:11.770
And 1.1 is here.
00:02:11.770 --> 00:02:17.350
1.1, and then we have a 6 in the
next decimal place, which
00:02:17.350 --> 00:02:20.590
leads us to this entry.
00:02:20.590 --> 00:02:27.660
And so this value is 0.8770.
00:02:27.660 --> 00:02:31.200
Similarly, we can calculate
the probability that the
00:02:31.200 --> 00:02:35.030
normal is less than 2.9.
00:02:35.030 --> 00:02:37.010
How do we look up this
information?
00:02:37.010 --> 00:02:38.860
2.9 is here.
00:02:38.860 --> 00:02:42.520
We do not have another decimal
digit, so we're looking at
00:02:42.520 --> 00:02:43.640
this column.
00:02:43.640 --> 00:02:50.910
And we obtain this value,
which is 0.9981.
00:02:50.910 --> 00:02:54.930
And by looking at this number
we can actually tell that a
00:02:54.930 --> 00:02:59.810
standard normal random variable
has extremely low
00:02:59.810 --> 00:03:04.400
probability of being
bigger than 2.9.
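The table lookups above can also be checked numerically. What follows is only a sketch, not part of the lecture: it assumes the standard identity that expresses the standard normal CDF through the error function, and the helper name phi is made up here for illustration.

```python
import math


def phi(y: float) -> float:
    """Standard normal CDF, written via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


# The three lookups from the table, rounded to four decimals like the table.
for y in (0.0, 1.16, 2.9):
    print(f"phi({y}) = {phi(y):.4f}")

# The upper tail beyond 2.9 is whatever is left of the total probability 1.
print(f"P(Y > 2.9) = {1.0 - phi(2.9):.4f}")
```

To four decimals, the printed values should agree with the table entries 0.5, 0.8770, and 0.9981 read off above.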
00:03:04.400 --> 00:03:08.850
Now notice that the table
specifies phi of y for y being
00:03:08.850 --> 00:03:10.190
non-negative.
00:03:10.190 --> 00:03:15.820
What if we wish to calculate the
value, for example, of phi
00:03:15.820 --> 00:03:18.860
of minus 2?
00:03:18.860 --> 00:03:23.980
In terms of a picture, this
is a standard normal.
00:03:23.980 --> 00:03:26.870
Here is minus 2.
00:03:26.870 --> 00:03:29.570
And we wish to calculate
this probability.
00:03:29.570 --> 00:03:32.040
There's nothing in the table
that gives us this probability
00:03:32.040 --> 00:03:35.150
directly, but we can
argue as follows.
00:03:35.150 --> 00:03:37.730
The normal PDF is symmetric.
00:03:37.730 --> 00:03:43.530
So if we look at 2, then this
probability here, which is phi
00:03:43.530 --> 00:03:47.200
of minus 2, is the same
as that probability
00:03:47.200 --> 00:03:49.020
here, of that tail.
00:03:49.020 --> 00:03:51.810
What is the probability
of that tail?
00:03:51.810 --> 00:03:56.750
It's 1, which is the overall
area under the curve, minus
00:03:56.750 --> 00:04:01.530
the area under the curve when
you go up to the value of 2.
00:04:04.540 --> 00:04:11.340
So this quantity is going to be
the same as phi of minus 2.
00:04:11.340 --> 00:04:14.570
And this one we can now
get from the tables.
00:04:14.570 --> 00:04:15.830
It's 1 minus--
00:04:15.830 --> 00:04:18.709
let us see, 2 is here.
00:04:18.709 --> 00:04:27.170
It's 1 minus 0.9772.
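As a quick sanity check of the symmetry argument, here is a minimal sketch that compares phi of minus 2 computed directly with 1 minus phi of 2. The helper phi is an assumption of this sketch (the usual error-function formula), not something the table itself provides.

```python
import math


def phi(y: float) -> float:
    """Standard normal CDF via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


left_tail = phi(-2.0)         # area to the left of -2, computed directly
via_symmetry = 1.0 - phi(2.0)  # area to the right of +2, as read from a table

print(f"phi(-2)    = {left_tail:.4f}")
print(f"1 - phi(2) = {via_symmetry:.4f}")
```

Both numbers should match, which is the reason a table covering only non-negative y is enough.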
00:04:27.170 --> 00:04:30.460
The standard normal table
gives us probabilities
00:04:30.460 --> 00:04:33.610
associated with a standard
normal random variable.
00:04:33.610 --> 00:04:37.320
What if we're dealing with a
normal random variable that
00:04:37.320 --> 00:04:41.450
has a mean and a variance that
are different from those of
00:04:41.450 --> 00:04:42.820
the standard normal?
00:04:42.820 --> 00:04:44.280
What can we do?
00:04:44.280 --> 00:04:48.200
Well, there's a general trick
that you can do to a random
00:04:48.200 --> 00:04:51.409
variable, which is
the following.
00:04:51.409 --> 00:04:55.770
Let us define a new random
variable Y in this fashion.
00:04:55.770 --> 00:05:01.030
Y measures how far away X is
from the mean value.
00:05:01.030 --> 00:05:04.610
But because we divide by sigma,
the standard deviation,
00:05:04.610 --> 00:05:08.920
it measures this distance
in standard deviations.
00:05:08.920 --> 00:05:14.190
So if Y is equal to 3 it means
that X is 3 standard
00:05:14.190 --> 00:05:16.320
deviations away from the mean.
00:05:16.320 --> 00:05:20.540
In general, Y measures how many
standard deviations away from the
00:05:20.540 --> 00:05:22.270
mean you are.
00:05:22.270 --> 00:05:25.030
What properties does this
random variable have?
00:05:25.030 --> 00:05:30.080
The expected value of Y is
going to be equal to 0,
00:05:30.080 --> 00:05:33.650
because we have X and we're
subtracting the mean of X. So
00:05:33.650 --> 00:05:36.890
the expected value of this
term is equal to 0.
00:05:36.890 --> 00:05:39.690
How about the variance of Y?
00:05:39.690 --> 00:05:43.820
Whenever we multiply a random
variable by a constant, the
00:05:43.820 --> 00:05:48.300
variance gets multiplied by the
square of that constant.
00:05:48.300 --> 00:05:51.320
So we get this expression.
00:05:51.320 --> 00:05:54.280
But the variance of X
is sigma squared.
00:05:54.280 --> 00:05:55.980
So this is equal to 1.
00:05:55.980 --> 00:05:59.500
So starting from X, we have
obtained a closely related
00:05:59.500 --> 00:06:03.460
random variable Y that has the
property that it has 0 mean
00:06:03.460 --> 00:06:05.480
and unit variance.
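The zero mean and unit variance property holds for any random variable, not just a normal one. A small sketch of this, under the assumption that X is uniform over a made-up list of values, so that its mean and standard deviation are the list's mean and population standard deviation; the function name standardize and the data are hypothetical.

```python
import statistics


def standardize(xs):
    """Return (x - mu) / sigma for each x, with mu and sigma taken from xs."""
    mu = statistics.fmean(xs)      # mean of X (uniform over xs)
    sigma = statistics.pstdev(xs)  # population standard deviation of X
    return [(x - mu) / sigma for x in xs]


# Illustrative values only; any list with at least two distinct values works.
xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0, 12.0]
ys = standardize(xs)

print("mean of Y:    ", statistics.fmean(ys))
print("variance of Y:", statistics.pvariance(ys))
```

Up to floating-point rounding, the mean of Y should come out as 0 and its variance as 1, whatever values are placed in xs.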
00:06:05.480 --> 00:06:10.990
If it also happens that X is a
normal random variable, then Y
00:06:10.990 --> 00:06:14.380
is going to be a standard
normal random variable.
00:06:14.380 --> 00:06:18.160
So we have managed to relate
X to a standard
00:06:18.160 --> 00:06:19.940
normal random variable.
00:06:19.940 --> 00:06:23.400
And perhaps you can rewrite this
expression in this form,
00:06:23.400 --> 00:06:29.040
X equals mu plus
sigma Y where Y is
00:06:29.040 --> 00:06:32.620
now a standard normal.
00:06:32.620 --> 00:06:35.040
So, instead of doing
calculations having to do with
00:06:35.040 --> 00:06:40.220
X, we can try to calculate in
terms of Y. And for Y we do
00:06:40.220 --> 00:06:42.570
have available tables.
00:06:42.570 --> 00:06:46.300
Let us look at an example
of how this is done.
00:06:46.300 --> 00:06:49.210
The way to calculate
probabilities associated with
00:06:49.210 --> 00:06:53.560
general normal random variables
is to take the event
00:06:53.560 --> 00:06:58.140
whose probability we want
calculated and express it in
00:06:58.140 --> 00:07:00.690
terms of standard normal
random variables.
00:07:00.690 --> 00:07:03.810
And then use the standard
normal tables.
00:07:03.810 --> 00:07:07.090
Let us see how this is done
in terms of an example.
00:07:07.090 --> 00:07:13.000
Suppose that X is normal with
mean 6 and variance 4, so that
00:07:13.000 --> 00:07:15.780
the standard deviation
sigma is equal to 2.
00:07:15.780 --> 00:07:19.080
And suppose that we want to
calculate the probability that
00:07:19.080 --> 00:07:23.550
X lies between 2 and 8.
00:07:30.520 --> 00:07:33.200
Here's how we can proceed.
00:07:33.200 --> 00:07:39.140
This event is the same as the
event that X minus 6 takes a
00:07:39.140 --> 00:07:43.220
value between 2 minus
6 and 8 minus 6.
00:07:43.220 --> 00:07:45.790
This event is the same as the
original event we were
00:07:45.790 --> 00:07:47.590
interested in.
00:07:47.590 --> 00:07:50.890
We can also divide both sides
of this inequality by the
00:07:50.890 --> 00:07:52.420
standard deviation.
00:07:52.420 --> 00:07:55.040
And the event of interest
has now been
00:07:55.040 --> 00:07:59.430
expressed in this form.
00:07:59.430 --> 00:08:03.830
But at this point we recognize
that this is of the form X
00:08:03.830 --> 00:08:06.390
minus mu over sigma.
00:08:06.390 --> 00:08:09.430
So this random variable
here is a standard
00:08:09.430 --> 00:08:10.855
normal random variable.
00:08:14.090 --> 00:08:24.530
So the probability that X lies
between 2 and 8 is the same as
00:08:24.530 --> 00:08:27.600
the probability that a standard
normal random
00:08:27.600 --> 00:08:32.350
variable, call it Y, falls
between these numbers: minus 4
00:08:32.350 --> 00:08:35.090
divided by 2, that's minus 2,
00:08:35.090 --> 00:08:37.380
and 2 divided by 2, that's 1.
00:08:40.960 --> 00:08:44.340
And now we can use the standard
normal tables to
00:08:44.340 --> 00:08:46.210
calculate this probability.
00:08:46.210 --> 00:08:49.960
We have here 1 and here
we have minus 2.
00:08:49.960 --> 00:08:52.790
And we want to find the
probability that our standard
00:08:52.790 --> 00:08:55.660
normal falls inside
this range.
00:08:55.660 --> 00:08:59.420
This is the probability that
it is less than 1.
00:08:59.420 --> 00:09:02.640
But we need to subtract the
probability of that tail so
00:09:02.640 --> 00:09:06.850
that we're left just with
this intermediate area.
00:09:06.850 --> 00:09:11.970
So this is the probability that
Y is less than 1 minus
00:09:11.970 --> 00:09:17.390
the probability that Y
is less than minus 2.
00:09:17.390 --> 00:09:21.520
And finally, as we discussed
earlier, the probability that
00:09:21.520 --> 00:09:27.360
Y is less than minus 2, this
is 1 minus the probability
00:09:27.360 --> 00:09:31.690
that Y is less than
or equal to 2.
00:09:31.690 --> 00:09:35.670
And now we can go to the normal
tables, identify the
00:09:35.670 --> 00:09:38.660
values that we're interested in,
the probability that Y is
00:09:38.660 --> 00:09:41.970
less than 1, the probability
that Y is less than 2, and
00:09:41.970 --> 00:09:43.440
plug these in.
00:09:43.440 --> 00:09:46.250
And this gives us the
desired probability.
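The steps of this example can be sketched in code. This is a hedged illustration rather than the lecture's own method: phi is a hypothetical helper standing in for the table, and NormalDist from Python's statistics module is used only as an independent cross-check.

```python
import math
from statistics import NormalDist


def phi(y: float) -> float:
    """Standard normal CDF via the error function (stand-in for the table)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


mu, sigma = 6.0, 2.0  # mean 6, variance 4, so standard deviation 2
a, b = 2.0, 8.0       # we want P(a <= X <= b)

# Step 1: subtract the mean and divide by the standard deviation.
lo = (a - mu) / sigma  # (2 - 6) / 2
hi = (b - mu) / sigma  # (8 - 6) / 2

# Step 2: P(lo <= Y <= hi) = phi(hi) - phi(lo). Because lo is negative,
# phi(lo) is rewritten as 1 - phi(-lo), which only needs non-negative arguments.
p_table_style = phi(hi) - (1.0 - phi(-lo))

# Cross-check: work with X directly instead of standardizing.
x = NormalDist(mu=mu, sigma=sigma)
p_direct = x.cdf(b) - x.cdf(a)

print(f"standardized bounds: {lo}, {hi}")
print(f"via standardization: {p_table_style:.4f}")
print(f"direct computation:  {p_direct:.4f}")
```

The two printed probabilities should agree, at roughly 0.82, which is what plugging the table values for 1 and 2 into the expression above gives.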
00:09:46.250 --> 00:09:53.450
Again, the key step is to take
the event of interest and, by
00:09:53.450 --> 00:09:55.530
subtracting the mean and
dividing by the standard
00:09:55.530 --> 00:09:59.740
deviation, express that same
event in an equivalent form
00:09:59.740 --> 00:10:02.240
that now involves
a standard
00:10:02.240 --> 00:10:03.660
normal random variable.
00:10:06.880 --> 00:10:09.780
And then finally, use the
standard normal tables.