WEBVTT

00:00:02.220 --> 00:00:05.530
Simulation is an important
tool in the analysis

00:00:05.530 --> 00:00:08.830
of probabilistic phenomena.

00:00:08.830 --> 00:00:12.310
For example, suppose
that X, Y, and Z

00:00:12.310 --> 00:00:14.760
are independent
random variables,

00:00:14.760 --> 00:00:17.580
and you're interested in
the statistical properties

00:00:17.580 --> 00:00:20.400
of this random variable.

00:00:20.400 --> 00:00:22.660
Perhaps you can find
the distribution

00:00:22.660 --> 00:00:26.300
of this random variable by
solving a derived distribution

00:00:26.300 --> 00:00:29.210
problem, but sometimes
this is impossible.

00:00:29.210 --> 00:00:32.049
And in such cases,
what you do is,

00:00:32.049 --> 00:00:37.230
you generate random samples
of these random variables

00:00:37.230 --> 00:00:39.850
drawn according to
their distributions,

00:00:39.850 --> 00:00:45.450
and then evaluate the function
g on that random sample.

00:00:45.450 --> 00:00:49.120
And this gives you one sample
value of this function.

00:00:49.120 --> 00:00:51.400
And you can repeat
that several times

00:00:51.400 --> 00:00:54.070
to obtain some kind of
histogram and from that,

00:00:54.070 --> 00:00:56.690
get some understanding about
the statistical properties

00:00:56.690 --> 00:00:58.160
of this function.

00:00:58.160 --> 00:01:02.330
So the question is, how can
we generate a random sample

00:01:02.330 --> 00:01:06.300
of a random variable whose
distribution is known?

00:01:06.300 --> 00:01:10.230
So what we want is to
create some kind of box

00:01:10.230 --> 00:01:12.789
that outputs numbers.

00:01:12.789 --> 00:01:15.300
And these numbers
are random variables

00:01:15.300 --> 00:01:21.580
that are distributed according
to a CDF that's given to us.

00:01:21.580 --> 00:01:23.270
How can we do it?

00:01:23.270 --> 00:01:28.010
Well, computers typically
have a random number generator

00:01:28.010 --> 00:01:29.640
in them.

00:01:29.640 --> 00:01:33.650
And random number generators,
typically what they do

00:01:33.650 --> 00:01:37.729
is generate values
that are drawn

00:01:37.729 --> 00:01:39.495
from a uniform distribution.

00:01:42.160 --> 00:01:44.580
So this gives us
a starting point.

00:01:44.580 --> 00:01:47.640
We can generate uniform
random variables.

00:01:47.640 --> 00:01:51.060
But what we want is to generate
values of a random variable

00:01:51.060 --> 00:01:52.940
according to some
other distribution.

00:01:52.940 --> 00:01:54.890
How are we going to do it?

00:01:54.890 --> 00:02:00.770
What we want to do is to create
some kind of box or function

00:02:00.770 --> 00:02:07.870
that takes this uniform random
variable and generates g of U.

00:02:07.870 --> 00:02:13.610
And we want to find the
right function to use.

00:02:13.610 --> 00:02:20.190
Find a g so that the
random variable, g of U,

00:02:20.190 --> 00:02:23.860
is distributed according to
the distribution that we want.

00:02:23.860 --> 00:02:26.520
That is, we want
the CDF of g of U

00:02:26.520 --> 00:02:29.940
to be the CDF
that's given to us.

00:02:29.940 --> 00:02:32.140
So let's see how we can do this.

00:02:35.130 --> 00:02:39.310
Let us look at the discrete
case first, which is easier.

00:02:39.310 --> 00:02:41.780
And let us look at an example.

00:02:41.780 --> 00:02:46.700
So suppose that I want
to generate samples

00:02:46.700 --> 00:02:51.160
of a discrete random variable
that has the following PMF.

00:02:51.160 --> 00:02:53.920
It takes this value
with probability 2/6,

00:02:53.920 --> 00:02:58.050
this value with probability
3/6, and this value

00:02:58.050 --> 00:03:00.800
with probability 1/6.

00:03:00.800 --> 00:03:05.930
What I have is a
uniform random variable

00:03:05.930 --> 00:03:10.500
that's drawn from a
uniform distribution.

00:03:10.500 --> 00:03:13.100
What can I do?

00:03:13.100 --> 00:03:15.090
I can do the following.

00:03:15.090 --> 00:03:19.579
Let this number here be 2/6.

00:03:19.579 --> 00:03:22.420
If my uniform random
variable falls

00:03:22.420 --> 00:03:27.510
in this range, which happens
with probability 2/6,

00:03:27.510 --> 00:03:30.210
I'm going to report
this value for

00:03:30.210 --> 00:03:31.470
my discrete random variable.

00:03:34.390 --> 00:03:38.300
Then I take an
interval of length 3/6,

00:03:38.300 --> 00:03:42.860
which takes me to 5/6.

00:03:42.860 --> 00:03:47.500
And if my uniform random
variable falls in this range,

00:03:47.500 --> 00:03:50.470
then I'm going to
report that value

00:03:50.470 --> 00:03:52.700
for my discrete random variable.

00:03:52.700 --> 00:03:56.150
And finally, with
probability 1/6,

00:03:56.150 --> 00:03:59.640
my uniform random variable
happens to fall in here.

00:03:59.640 --> 00:04:02.810
And then I report that [value].

00:04:02.810 --> 00:04:05.720
So clearly, the value
that I'm reporting

00:04:05.720 --> 00:04:07.480
has the correct probabilities.

00:04:07.480 --> 00:04:11.180
I'm going to report this
value with probability 2/6,

00:04:11.180 --> 00:04:15.020
I'm going to report that
value with probability 3/6,

00:04:15.020 --> 00:04:16.190
and so on.

00:04:16.190 --> 00:04:19.100
So this is how we can
generate random samples

00:04:19.100 --> 00:04:22.010
of a discrete
distribution, starting

00:04:22.010 --> 00:04:24.460
from a uniform random variable.

00:04:24.460 --> 00:04:30.210
Let us now look at what we did
in a somewhat different way.

00:04:30.210 --> 00:04:33.650
This is the x-axis.

00:04:33.650 --> 00:04:39.610
And let me plot the CDF of
my discrete random variable.

00:04:39.610 --> 00:04:48.390
So the CDF has a jump of 2/6, at
a point which is equal to that.

00:04:48.390 --> 00:04:52.730
Then it has another
jump of size 3/6, which

00:04:52.730 --> 00:04:57.720
takes us to 5/6 at
some other point.

00:04:57.720 --> 00:05:01.930
And that point here corresponds
to the location of that value.

00:05:01.930 --> 00:05:05.600
And finally, it has
another jump of 1/6

00:05:05.600 --> 00:05:08.740
that takes us to 1,
at another point, that

00:05:08.740 --> 00:05:10.370
corresponds to the third value.

00:05:13.230 --> 00:05:18.670
And look now at this
interval here from 0 to 1.

00:05:18.670 --> 00:05:20.630
And let us think as follows.

00:05:20.630 --> 00:05:22.910
We have a uniform
random variable

00:05:22.910 --> 00:05:26.140
distributed between 0 to 1.

00:05:26.140 --> 00:05:28.770
If my uniform random
variable happens

00:05:28.770 --> 00:05:34.470
to fall in this interval, I'm
going to report that value.

00:05:34.470 --> 00:05:37.930
If my uniform random
variable happens

00:05:37.930 --> 00:05:49.320
to fall in this interval, I'm
going to report that value.

00:05:49.320 --> 00:05:54.690
And finally, if my uniform
falls in this interval,

00:05:54.690 --> 00:05:58.020
I'm going to report that value.

00:05:58.020 --> 00:06:01.030
We're doing exactly the
same thing as before.

00:06:01.030 --> 00:06:05.040
With probability 2/6,
my uniform falls here.

00:06:05.040 --> 00:06:08.270
And we report this
value and so on.

00:06:08.270 --> 00:06:10.840
So what's a graphical
way of understanding

00:06:10.840 --> 00:06:12.480
of what we're doing?

00:06:12.480 --> 00:06:14.430
We're taking the CDF.

00:06:14.430 --> 00:06:18.120
We generate a value
of the uniform.

00:06:18.120 --> 00:06:22.090
And then we move
until we hit the CDF

00:06:22.090 --> 00:06:25.740
and report the
corresponding value of x.

00:06:25.740 --> 00:06:28.280
It turns out that
this recipe will also

00:06:28.280 --> 00:06:32.860
work in the continuous case.

00:06:32.860 --> 00:06:34.230
Let's see how this is done.

00:06:36.830 --> 00:06:40.409
So let's assume that
we have a CDF, which

00:06:40.409 --> 00:06:41.800
is strictly monotonic.

00:06:44.860 --> 00:06:49.540
So the picture
would be as follows.

00:06:49.540 --> 00:06:51.300
It's a CDF.

00:06:51.300 --> 00:06:53.620
CDFs are monotonic,
but here, we assume

00:06:53.620 --> 00:06:56.290
that it is strictly monotonic.

00:06:56.290 --> 00:06:59.615
And we also assume
that it is continuous.

00:07:06.060 --> 00:07:08.690
It doesn't have any jumps.

00:07:08.690 --> 00:07:16.530
So this CDF starts at 0 and
rises, asymptotically, to 1.

00:07:16.530 --> 00:07:20.480
What was the recipe that
we were just discussing?

00:07:20.480 --> 00:07:24.195
We generate a value for a
uniform random variable.

00:07:28.110 --> 00:07:36.440
We move until we hit the CDF,
and then report this value here

00:07:36.440 --> 00:07:38.870
for x.

00:07:38.870 --> 00:07:40.480
So what is it that we're doing?

00:07:40.480 --> 00:07:43.440
We're going from u's to x's.

00:07:43.440 --> 00:07:48.100
So we're using the
inverse function.

00:07:48.100 --> 00:07:52.100
The cumulative takes
as an input an x,

00:07:52.100 --> 00:07:57.400
a value on this axis, and then
reports, a value on that axis.

00:07:57.400 --> 00:07:59.400
The inverse function
is the function

00:07:59.400 --> 00:08:01.350
that goes the opposite way.

00:08:01.350 --> 00:08:03.850
We start from a value
on the vertical axis

00:08:03.850 --> 00:08:08.030
and takes us to the
horizontal axis.

00:08:08.030 --> 00:08:10.620
Now, the important thing is
that because of our assumption

00:08:10.620 --> 00:08:14.450
that f is continuous
and strictly monotonic,

00:08:14.450 --> 00:08:17.920
this inverse function
is well-defined.

00:08:17.920 --> 00:08:20.020
Given any point
here, we can always

00:08:20.020 --> 00:08:26.720
find one and only
one corresponding x.

00:08:26.720 --> 00:08:29.480
Now, what are the
properties of this method

00:08:29.480 --> 00:08:31.910
that we have been using?

00:08:31.910 --> 00:08:39.010
If I take some number c and then
take the corresponding number

00:08:39.010 --> 00:08:45.620
up here, which is
going to be F_X of c,

00:08:45.620 --> 00:08:49.020
then we have the
following property.

00:08:49.020 --> 00:08:53.120
My random variable
X is going to be

00:08:53.120 --> 00:08:56.235
less than or equal
to c if and only

00:08:56.235 --> 00:09:01.160
if my random variable X
falls into this interval.

00:09:01.160 --> 00:09:09.680
But that's equivalent to
saying that the uniform random

00:09:09.680 --> 00:09:12.760
variable fell in that interval.

00:09:12.760 --> 00:09:16.180
Values of the uniform
in this interval-- these

00:09:16.180 --> 00:09:19.600
are the values that
give me x's that are

00:09:19.600 --> 00:09:22.370
less than or equal to c.

00:09:22.370 --> 00:09:26.270
So the event that X is
less than or equal to c

00:09:26.270 --> 00:09:30.990
is identical to the
event that U is less than

00:09:30.990 --> 00:09:34.410
or equal to F_X of c.

00:09:34.410 --> 00:09:39.260
So this is how I am generating
my x's based on u's.

00:09:39.260 --> 00:09:42.400
We now need to verify that
the x's that I'm generating

00:09:42.400 --> 00:09:47.110
this way have the correct
property, have the correct CDF.

00:09:47.110 --> 00:09:49.850
So let's check it out.

00:09:49.850 --> 00:09:55.240
The probability that X is
less than or equal to c, this

00:09:55.240 --> 00:10:03.230
is the probability that U is
less than or equal to F_X of c.

00:10:03.230 --> 00:10:05.640
But U is a uniform
random variable.

00:10:05.640 --> 00:10:08.050
The probability of being
less than something

00:10:08.050 --> 00:10:11.250
is just that something.

00:10:11.250 --> 00:10:16.530
So we have verified that
with this way of constructing

00:10:16.530 --> 00:10:19.650
samples of X based
on samples of U,

00:10:19.650 --> 00:10:23.225
the random variable that
we get has the desired CDF.

00:10:26.950 --> 00:10:29.340
Let's look at an example now.

00:10:29.340 --> 00:10:32.010
Suppose that we want
to generate samples

00:10:32.010 --> 00:10:36.710
of a random variable, which is
an exponential random variable,

00:10:36.710 --> 00:10:38.720
with parameter 1.

00:10:38.720 --> 00:10:41.770
In this case, we
know what the CDF is.

00:10:41.770 --> 00:10:47.600
The CDF of an exponential with
parameter 1 is given by this

00:10:47.600 --> 00:10:50.000
formula, for non-negative x's.

00:10:54.110 --> 00:10:57.065
Now, let us find the
inverse function.

00:11:01.080 --> 00:11:08.180
If a u corresponds to 1
minus e to the minus x--

00:11:08.180 --> 00:11:12.870
so we started with some x here
and we find the corresponding

00:11:12.870 --> 00:11:17.970
u-- this is the formula that
takes us from x's to u's.

00:11:17.970 --> 00:11:21.990
Let's find the formula that
takes us from u's to x's.

00:11:21.990 --> 00:11:24.910
So we need to solve
this equation.

00:11:24.910 --> 00:11:27.910
Let's send u to the
other side, and let's

00:11:27.910 --> 00:11:30.610
send this term to
the left hand side.

00:11:30.610 --> 00:11:35.750
We obtain e to the minus
x equals 1 minus u.

00:11:35.750 --> 00:11:41.060
Let us take logarithms: minus
x equals to the logarithm

00:11:41.060 --> 00:11:43.530
of 1 minus u.

00:11:43.530 --> 00:11:51.600
And finally, x is equal to minus
the logarithm of 1 minus u.

00:11:51.600 --> 00:11:54.850
So this is the inverse function.

00:11:54.850 --> 00:11:58.510
And now, what we have
discussed leads us

00:11:58.510 --> 00:12:00.590
to the following procedure.

00:12:00.590 --> 00:12:03.870
I generate a random
variable, U, according

00:12:03.870 --> 00:12:05.920
to the uniform distribution.

00:12:05.920 --> 00:12:09.420
Then I form the
random variable X

00:12:09.420 --> 00:12:15.750
by taking the negative of
the logarithm of 1 minus U.

00:12:15.750 --> 00:12:20.010
And this gives me
a random variable,

00:12:20.010 --> 00:12:22.430
which has an exponential
distribution.

00:12:22.430 --> 00:12:25.710
And so we have found
a way of simulating

00:12:25.710 --> 00:12:29.590
exponential random variables,
starting with a random number

00:12:29.590 --> 00:12:33.543
generator that produces
uniform random variables.