WEBVTT
00:00:00.700 --> 00:00:04.110
We will now define the notion
of a random variable.
00:00:04.110 --> 00:00:06.970
Very loosely speaking, a random
variable is a numerical
00:00:06.970 --> 00:00:10.180
quantity that takes
random values.
00:00:10.180 --> 00:00:11.410
But what does this mean?
00:00:11.410 --> 00:00:14.260
We want to be a little more
precise and I'm going to
00:00:14.260 --> 00:00:17.140
introduce the idea through
an example.
00:00:17.140 --> 00:00:22.160
Suppose that our sample space
is a set of students labeled
00:00:22.160 --> 00:00:23.780
according to their names.
00:00:23.780 --> 00:00:33.920
Or for simplicity, let's just
label them as a, b, c, and d.
00:00:33.920 --> 00:00:37.200
Our probabilistic experiment is
to pick a student at random
00:00:37.200 --> 00:00:41.690
according to some probability
law and then record their
00:00:41.690 --> 00:00:43.640
weight in kilograms.
00:00:47.700 --> 00:00:51.520
So for example, suppose that the
outcome of the experiment
00:00:51.520 --> 00:00:54.600
was this particular student,
and the weight of
00:00:54.600 --> 00:00:57.430
that student is 62.
00:00:57.430 --> 00:01:00.370
Or it could be that the outcome
of the experiment is
00:01:00.370 --> 00:01:03.980
this particular student, and
that particular student has a
00:01:03.980 --> 00:01:07.900
weight of 75 kilograms.
00:01:07.900 --> 00:01:11.930
The weight of a particular
student is a number, little w.
00:01:11.930 --> 00:01:15.280
But let us think of the abstract
concept of weight,
00:01:15.280 --> 00:01:21.170
something that we will denote
by capital W. Weight is an
00:01:21.170 --> 00:01:24.700
object whose value is determined
once you tell me
00:01:24.700 --> 00:01:28.300
the outcome of the experiment,
once you tell me which student
00:01:28.300 --> 00:01:29.370
was picked.
00:01:29.370 --> 00:01:33.080
In this sense, weight is really
a function of the
00:01:33.080 --> 00:01:34.960
outcome of the experiment.
00:01:34.960 --> 00:01:40.440
So think of weight as an
abstract box that takes as
00:01:40.440 --> 00:01:51.130
input a student and produces a
number, little w, which is the
00:01:51.130 --> 00:01:54.650
weight of that particular
student.
00:01:54.650 --> 00:01:59.220
Or more concretely, think of
weight with a capital W as a
00:01:59.220 --> 00:02:02.820
procedure that takes a student,
puts him or her on a
00:02:02.820 --> 00:02:04.840
scale, and reports the result.
00:02:04.840 --> 00:02:08.620
In this sense, weight is an
object of the same kind as the
00:02:08.620 --> 00:02:13.740
square root function that's
sitting inside your computer.
00:02:13.740 --> 00:02:16.450
The square root function
is a function.
00:02:16.450 --> 00:02:20.660
It's a subroutine, perhaps it is
a piece of code, that takes
00:02:20.660 --> 00:02:25.460
as input a number, let's say
the number 9, and produces
00:02:25.460 --> 00:02:26.750
another number.
00:02:26.750 --> 00:02:30.110
In this case, it would be the
number 3, which is the
00:02:30.110 --> 00:02:31.720
square root of 9.
00:02:34.300 --> 00:02:37.660
Notice here the distinction that
we will keep emphasizing
00:02:37.660 --> 00:02:39.050
over and over.
00:02:39.050 --> 00:02:41.190
Square root of 9 is a number.
00:02:41.190 --> 00:02:43.120
It is the number 3.
00:02:43.120 --> 00:02:46.375
The box square root
is a function.
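As a minimal sketch of this distinction in Python: the random variable W is a function on the sample space, while little w is the number it returns for one outcome. The lookup table is an assumption. The lecture mentions weights of 62 and 75 kilograms, and it is assumed here that they belong to students a and c; students b and d are left out because their weights are never stated.

```python
import math

# Assumed data: the lecture mentions weights of 62 and 75 kilograms;
# pairing them with students "a" and "c" is an assumption for illustration.
WEIGHTS_KG = {"a": 62.0, "c": 75.0}


def W(student: str) -> float:
    """Random variable W: maps an outcome (a student) to a weight in kg."""
    return WEIGHTS_KG[student]


w = W("a")           # a number: the weight of one particular student
root = math.sqrt(9)  # a number: the value of the square root function at 9

print(callable(W), w)             # W is a function, w is a number
print(callable(math.sqrt), root)  # sqrt is a function, root is a number
```

Here `W` and `math.sqrt` play the role of the box, and `w` and `root` play the role of the numbers that come out of it.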
00:02:49.320 --> 00:02:52.810
Now, let us go back to our
probabilistic experiment.
00:02:52.810 --> 00:02:55.470
Note that a probabilistic
experiment such as the one in
00:02:55.470 --> 00:02:59.720
our example can have several
associated random variables.
00:02:59.720 --> 00:03:03.290
For example, we could have
another random variable
00:03:03.290 --> 00:03:08.470
denoted by capital H, which
is the height of a student
00:03:08.470 --> 00:03:10.665
recorded in meters.
00:03:13.350 --> 00:03:17.030
So if the outcome of the
experiment, for example, was
00:03:17.030 --> 00:03:21.750
student a, then this random
variable would take a value
00:03:21.750 --> 00:03:26.690
which is the height of that
student, let's say it was 1.7.
00:03:26.690 --> 00:03:30.010
Or if the outcome of the
experiment was student c, then
00:03:30.010 --> 00:03:32.520
we would record the height
of that student.
00:03:32.520 --> 00:03:35.920
And let's say it turns
out to be 1.8.
00:03:35.920 --> 00:03:39.540
Once again, height with a
capital H is an abstract
00:03:39.540 --> 00:03:44.610
object, a function whose value
is determined once you tell me
00:03:44.610 --> 00:03:47.840
the outcome of the experiment.
00:03:47.840 --> 00:03:51.660
Now, given some random
variables, we can create new
00:03:51.660 --> 00:03:54.840
random variables as
functions of the
00:03:54.840 --> 00:03:56.730
original random variables.
00:03:56.730 --> 00:04:02.170
For example, consider the
quantity defined as weight
00:04:02.170 --> 00:04:05.010
divided by height squared.
00:04:05.010 --> 00:04:08.930
This quantity is the so-called
body mass index, and it is
00:04:08.930 --> 00:04:12.860
also a function on
the sample space.
00:04:12.860 --> 00:04:15.630
Why is it a function on
the sample space?
00:04:15.630 --> 00:04:19.540
Well, because once an outcome
of the experiment is
00:04:19.540 --> 00:04:22.580
determined, suppose that the
outcome of the experiment was
00:04:22.580 --> 00:04:26.830
the blue student, then these two
numbers, 62 and 1.7, are
00:04:26.830 --> 00:04:28.180
also determined.
00:04:28.180 --> 00:04:32.650
And using those numbers, we can
carry out this calculation
00:04:32.650 --> 00:04:36.940
and find the body mass index
of that particular student,
00:04:36.940 --> 00:04:40.510
which in this case
would be 21.5.
00:04:40.510 --> 00:04:44.960
Or if it happened that this
student was selected, then the
00:04:44.960 --> 00:04:48.980
body mass index would turn out
to be some other number.
00:04:48.980 --> 00:04:51.650
In this case, it would be 23.
00:04:51.650 --> 00:04:55.650
So again, we see that the body
mass index can be viewed as an
00:04:55.650 --> 00:04:58.820
abstract concept defined
by this formula.
00:04:58.820 --> 00:05:02.510
But once an outcome is
determined, then the body mass
00:05:02.510 --> 00:05:04.330
index is also determined.
00:05:04.330 --> 00:05:08.090
And so the body mass index is
really a function of which
00:05:08.090 --> 00:05:12.620
particular outcome
was selected.
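The calculation just described can be sketched in Python, with the body mass index written as a function of W and H and therefore as a function of the outcome. The assignment of the quoted weights and heights to students a and c is an assumption made for illustration; the names `W`, `H`, and `BMI` are simply the lecture's symbols turned into functions.

```python
# Assumed data: weights (kg) and heights (m) quoted in the lecture;
# assigning them to students "a" and "c" is an assumption.
WEIGHT_KG = {"a": 62.0, "c": 75.0}
HEIGHT_M = {"a": 1.7, "c": 1.8}


def W(outcome: str) -> float:
    """Random variable W: weight of the selected student, in kilograms."""
    return WEIGHT_KG[outcome]


def H(outcome: str) -> float:
    """Random variable H: height of the selected student, in meters."""
    return HEIGHT_M[outcome]


def BMI(outcome: str) -> float:
    """Body mass index W / H^2: a function of W and H, hence of the outcome."""
    return W(outcome) / H(outcome) ** 2


bmi_a = BMI("a")
bmi_c = BMI("c")
print(round(bmi_a, 1))  # 21.5
print(round(bmi_c))     # 23
```

Once the outcome is fixed, nothing random is left: `BMI("a")` always returns the same number.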
00:05:12.620 --> 00:05:16.280
Let us now abstract from the
previous discussion.
00:05:16.280 --> 00:05:21.930
We have seen that random
variables are abstract objects
00:05:21.930 --> 00:05:26.330
that associate a specific value,
a particular number, to
00:05:26.330 --> 00:05:30.700
any particular outcome of a
probabilistic experiment.
00:05:30.700 --> 00:05:33.760
So in that sense, random
variables are functions from
00:05:33.760 --> 00:05:36.690
the sample space to
the real numbers.
00:05:36.690 --> 00:05:39.970
They are numerical functions,
but as numerical functions
00:05:39.970 --> 00:05:42.760
they can either take discrete
values, for example the
00:05:42.760 --> 00:05:46.200
integers, or they can take
continuous values, let's say
00:05:46.200 --> 00:05:48.260
on the real line.
00:05:48.260 --> 00:05:52.200
For example, if your random
variable is the number of
00:05:52.200 --> 00:05:56.250
heads in 10 consecutive coin
tosses, this is a discrete
00:05:56.250 --> 00:05:58.630
random variable that
takes values in the
00:05:58.630 --> 00:06:01.220
set from 0 to 10.
00:06:01.220 --> 00:06:05.770
If your random variable is a
measurement of the time at
00:06:05.770 --> 00:06:10.900
which something happened, and
if your timer has infinite
00:06:10.900 --> 00:06:15.090
accuracy, then the timer reports
a real number and we
00:06:15.090 --> 00:06:18.710
would have a continuous
random variable.
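The coin-toss example can be checked with a short Python sketch that lists every outcome of 10 tosses and collects the values the random variable can take. The names `sample_space` and `num_heads` are illustrative, not from the lecture.

```python
from itertools import product

N_TOSSES = 10

# Sample space: every sequence of 10 tosses, each toss "H" or "T".
sample_space = list(product("HT", repeat=N_TOSSES))


def num_heads(outcome) -> int:
    """Random variable: the number of heads in one sequence of tosses."""
    return outcome.count("H")


# The set of values that the random variable can take.
values = sorted({num_heads(outcome) for outcome in sample_space})

print(len(sample_space))  # 1024 outcomes
print(values)             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The sample space has 1024 outcomes, but the random variable takes only 11 distinct values, which is what makes it discrete.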
00:06:18.710 --> 00:06:23.830
In this lecture sequence and in
the next few, we will
00:06:23.830 --> 00:06:26.730
concentrate on discrete random
variables because they are
00:06:26.730 --> 00:06:28.120
easier to handle.
00:06:28.120 --> 00:06:30.550
And then later on, we will
move to a discussion of
00:06:30.550 --> 00:06:33.060
continuous random variables.
00:06:33.060 --> 00:06:38.230
Throughout, we want to keep
noting this very important
00:06:38.230 --> 00:06:42.390
distinction that I already
brought up in the discussion of
00:06:42.390 --> 00:06:46.260
a particular example, but it
needs to be emphasized and
00:06:46.260 --> 00:06:47.750
re-emphasized.
00:06:47.750 --> 00:06:50.540
We make a distinction
between random variables,
00:06:50.540 --> 00:06:52.030
which are abstract objects.
00:06:52.030 --> 00:06:54.560
They are functions on the sample
space and they are
00:06:54.560 --> 00:06:57.270
denoted by uppercase letters.
00:06:57.270 --> 00:07:01.340
In contrast, we will use lowercase
letters to indicate
00:07:01.340 --> 00:07:05.000
numerical values of the
random variables.
00:07:05.000 --> 00:07:09.290
So little x is always a real
number, as opposed to the
00:07:09.290 --> 00:07:13.720
random variable, which
is a function.
00:07:13.720 --> 00:07:16.670
One point that we made earlier
is that for the same
00:07:16.670 --> 00:07:19.490
probabilistic experiment we
can have several random
00:07:19.490 --> 00:07:22.400
variables associated with
that experiment.
00:07:22.400 --> 00:07:25.080
And we can also combine
random variables to
00:07:25.080 --> 00:07:27.470
form new random variables.
00:07:27.470 --> 00:07:32.810
In general, a function of random
variables has numerical
00:07:32.810 --> 00:07:37.060
values that are determined by
the numerical values of the
00:07:37.060 --> 00:07:38.870
original random variables.
00:07:38.870 --> 00:07:42.070
And so, ultimately, they are
determined by the outcome of
00:07:42.070 --> 00:07:43.140
the experiment.
00:07:43.140 --> 00:07:45.780
So a function of random
variables has a numerical
00:07:45.780 --> 00:07:48.409
value which is completely
determined by the outcome of
00:07:48.409 --> 00:07:49.159
the experiment.
00:07:49.159 --> 00:07:52.080
And so a function of
random variables is
00:07:52.080 --> 00:07:54.440
also a random variable.
00:07:54.440 --> 00:07:58.980
As an example, we could think of
two random variables, X and
00:07:58.980 --> 00:08:02.700
Y, associated with the same
probabilistic experiment.
00:08:02.700 --> 00:08:07.060
And then define a random
variable, let's say X plus Y.
00:08:07.060 --> 00:08:08.230
What does that mean?
00:08:08.230 --> 00:08:17.770
X plus Y is a random variable
that takes the value little x
00:08:17.770 --> 00:08:25.560
plus little y when the random
variable capital X takes the
00:08:25.560 --> 00:08:34.240
value little x and capital Y
takes the value little y.
00:08:36.820 --> 00:08:39.400
So X and Y are random
variables.
00:08:39.400 --> 00:08:42.140
X plus Y is another
random variable.
00:08:42.140 --> 00:08:45.540
X and Y will take numerical
values once the outcome of the
00:08:45.540 --> 00:08:47.510
experiment has been obtained.
00:08:47.510 --> 00:08:50.740
And if the numerical values that
they take are little x
00:08:50.740 --> 00:08:55.160
and little y, then the random
variable X plus Y will take
00:08:55.160 --> 00:09:00.040
the numerical value little
x plus little y.
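One way to make the definition of X plus Y concrete is the following Python sketch. The experiment used here, rolling two six-sided dice, is a hypothetical choice that does not appear in the lecture, and the helper `add` is an illustrative name.

```python
from itertools import product

# Hypothetical experiment (not from the lecture): roll two six-sided dice.
# An outcome is a pair (first roll, second roll).
sample_space = list(product(range(1, 7), repeat=2))


def X(outcome) -> int:
    """Random variable X: the result of the first roll."""
    return outcome[0]


def Y(outcome) -> int:
    """Random variable Y: the result of the second roll."""
    return outcome[1]


def add(U, V):
    """Return the random variable U + V, defined outcome by outcome."""
    return lambda outcome: U(outcome) + V(outcome)


Z = add(X, Y)  # Z is a function on the sample space, like X and Y

outcome = (2, 5)
x, y = X(outcome), Y(outcome)  # numerical values little x and little y
z = Z(outcome)                 # numerical value little x plus little y
print(x, y, z)                 # 2 5 7
```

Note that `Z` is built without looking at any particular outcome; it only produces a number once an outcome is supplied.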
00:09:00.040 --> 00:09:03.700
So we can now move on and start
doing some interesting
00:09:03.700 --> 00:09:05.820
things with random variables.
00:09:05.820 --> 00:09:09.130
Characterize them, describe
them, give some examples, and
00:09:09.130 --> 00:09:11.860
introduce some new concepts
associated with them.