WEBVTT

00:00:00.820 --> 00:00:03.680
In this segment, we discuss the
expected value rule for

00:00:03.680 --> 00:00:05.890
calculating the expected
value of a

00:00:05.890 --> 00:00:08.000
function of a random variable.

00:00:08.000 --> 00:00:11.180
It corresponds to a nice formula
that we will see

00:00:11.180 --> 00:00:15.220
shortly, but it also involves a
much more general idea that

00:00:15.220 --> 00:00:18.970
we will encounter many times
in this course, in somewhat

00:00:18.970 --> 00:00:20.480
different forms.

00:00:20.480 --> 00:00:22.920
Here's what it is all about.

00:00:22.920 --> 00:00:27.910
We start with a certain random
variable that has a known PMF.

00:00:27.910 --> 00:00:30.720
However, we're ultimately
interested in another random

00:00:30.720 --> 00:00:34.190
variable Y, which is defined as
a function of the original

00:00:34.190 --> 00:00:35.460
random variable.

00:00:35.460 --> 00:00:38.130
We're interested in calculating
the expected value

00:00:38.130 --> 00:00:42.130
of this new random variable,
Y. How should we do it?

00:00:42.130 --> 00:00:45.150
We will illustrate the ideas
involved through a simple

00:00:45.150 --> 00:00:46.670
numerical example.

00:00:46.670 --> 00:00:49.810
In this example, we have a
random variable, X, that takes

00:00:49.810 --> 00:00:52.910
values 2, 3, 4, or 5, according
to some given

00:00:52.910 --> 00:00:54.140
probabilities.

00:00:54.140 --> 00:00:56.570
We are also given a
function that maps

00:00:56.570 --> 00:00:59.300
x-values into y-values.

00:00:59.300 --> 00:01:04.080
And this function, g, then
defines a new random variable.

00:01:04.080 --> 00:01:07.550
So if the outcome of the
experiment leads to an X equal

00:01:07.550 --> 00:01:11.200
to 4, then the random variable,
Y, will also take a

00:01:11.200 --> 00:01:13.130
value equal to 4.

00:01:13.130 --> 00:01:16.200
How do we calculate the
expected value of Y?

00:01:16.200 --> 00:01:18.720
The only tool that we have
available in our hands at this

00:01:18.720 --> 00:01:22.310
point is the definition of the
expected value, which tells us

00:01:22.310 --> 00:01:26.620
that we should run a summation
over the y-axis, consider

00:01:26.620 --> 00:01:29.520
different values of
y one at a time.

00:01:29.520 --> 00:01:33.640
And for each value for y,
multiply that value by its

00:01:33.640 --> 00:01:35.430
corresponding probability.

00:01:35.430 --> 00:01:39.640
So in this case, we start with
Y equal to 3, which needs to

00:01:39.640 --> 00:01:41.870
be multiplied by the
probability that

00:01:41.870 --> 00:01:43.620
Y is equal to 3.

00:01:43.620 --> 00:01:45.150
What is that probability?

00:01:45.150 --> 00:01:49.850
Well, Y is equal to 3, if
and only if X is 2 or 3, which

00:01:49.850 --> 00:01:56.430
happens with probability,
0.1 plus 0.2.

00:01:56.430 --> 00:01:59.400
Then we continue with the
summation by considering the

00:01:59.400 --> 00:02:01.790
next value of little y.

00:02:01.790 --> 00:02:04.280
The next possible value is 4.

00:02:04.280 --> 00:02:08.009
And this gives us a contribution
of 4, weighted by

00:02:08.009 --> 00:02:10.490
the probability of
obtaining a 4.

00:02:10.490 --> 00:02:13.340
The probability that Y is equal
to 4 is the probability

00:02:13.340 --> 00:02:17.579
that X is either equal to 4 or
to 5, which happens with

00:02:17.579 --> 00:02:18.340
probability

00:02:18.340 --> 00:02:20.750
0.3 plus 0.4.

00:02:20.750 --> 00:02:24.660
So this way, we obtain an
arithmetic expression which we

00:02:24.660 --> 00:02:25.890
can evaluate.

00:02:25.890 --> 00:02:29.480
And it's going to give us the
expected value of Y. But

00:02:29.480 --> 00:02:31.680
here's an alternative
way of calculating

00:02:31.680 --> 00:02:33.400
the expected value.

00:02:33.400 --> 00:02:38.310
And this corresponds to the
following type of thinking.

00:02:38.310 --> 00:02:42.260
10% of the time, X is going
to be equal to 2.

00:02:42.260 --> 00:02:46.079
And when that happens, Y
takes on a value of 3.

00:02:46.079 --> 00:02:49.930
So this should give us a
contribution to the average

00:02:49.930 --> 00:02:55.430
value of Y, which
is 3 times 0.1.

00:02:55.430 --> 00:03:02.590
Then, 20% of the time, X
is 3 and Y is also 3.

00:03:02.590 --> 00:03:09.980
So 20% of the time, we
also get 3's in Y.

00:03:09.980 --> 00:03:15.790
Then 30% of the time, X is 4,
which results in a Y that's

00:03:15.790 --> 00:03:17.280
equal to 4.

00:03:17.280 --> 00:03:21.620
So we obtain a 4 30%
of the time.

00:03:21.620 --> 00:03:26.440
And finally, 40% of the time, X
is equal to 5, which results

00:03:26.440 --> 00:03:30.329
in a Y equal to 4.

00:03:30.329 --> 00:03:33.710
And we obtain this arithmetic
expression.

00:03:33.710 --> 00:03:36.600
Now you can compare the two
arithmetic expressions, the

00:03:36.600 --> 00:03:39.940
red and the blue one, and you
will notice that they're

00:03:39.940 --> 00:03:43.470
equal, except that the terms
are arranged in a slightly

00:03:43.470 --> 00:03:44.880
different way.
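
NOTE
The two sums described above can be checked numerically. This is a minimal
sketch, assuming the PMF values and the mapping g as read out in the example;
the names pmf_x, g, expectation_by_definition and expectation_by_rule are
hypothetical and not from the lecture. Both routes come to 3.7.
```python
import math
# PMF of X as stated in the example: P(X=2)=0.1, P(X=3)=0.2, P(X=4)=0.3, P(X=5)=0.4.
pmf_x = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}
# The example's g sends 2 and 3 to 3, and 4 and 5 to 4 (stored as a lookup table, an assumption of this sketch).
g = {2: 3, 3: 3, 4: 4, 5: 4}
def expectation_by_definition(pmf_x, g):
    # Build the PMF of Y = g(X) first, then sum y * p_Y(y) over the values of y.
    pmf_y = {}
    for x, p in pmf_x.items():
        pmf_y[g[x]] = pmf_y.get(g[x], 0.0) + p
    return sum(y * p for y, p in pmf_y.items())
def expectation_by_rule(pmf_x, g):
    # Expected value rule: sum g(x) * p_X(x) over the values of x, no PMF of Y needed.
    return sum(g[x] * p for x, p in pmf_x.items())
e_def = expectation_by_definition(pmf_x, g)
e_rule = expectation_by_rule(pmf_x, g)
# Compare with a tolerance, since floating-point sums may differ in the last digits.
print(math.isclose(e_def, e_rule), round(e_rule, 10))
```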

00:03:44.880 --> 00:03:48.210
Conceptually, however, there's
a very big difference.

00:03:48.210 --> 00:03:50.920
In the first summation, we
run over the values of

00:03:50.920 --> 00:03:53.079
Y one at a time.

00:03:53.079 --> 00:03:56.630
In the second summation, we run
over the different values

00:03:56.630 --> 00:04:01.320
of X one at a time, and take
into account their individual

00:04:01.320 --> 00:04:02.990
contributions.

00:04:02.990 --> 00:04:07.500
This second way of calculating
the expected value of Y is

00:04:07.500 --> 00:04:09.710
called the expected
value rule.

00:04:09.710 --> 00:04:13.560
And it corresponds to the
following formula.

00:04:13.560 --> 00:04:17.390
We carry out a summation
over the x-axis.

00:04:17.390 --> 00:04:22.330
For each x-value that we
consider, we calculate what is

00:04:22.330 --> 00:04:27.830
the corresponding y-value,
that's g of x, and also weigh

00:04:27.830 --> 00:04:30.620
this term according to
the probability of

00:04:30.620 --> 00:04:32.510
this particular x.

00:04:32.510 --> 00:04:37.710
So for instance, a typical term
here would be when x is

00:04:37.710 --> 00:04:42.440
equal to 2, g of x would
be equal to 3.

00:04:42.440 --> 00:04:44.480
And the corresponding
probability, that's the

00:04:44.480 --> 00:04:49.560
probability of a 2,
would be 0.1.
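
NOTE
Written out, the formula described above would read as follows. The symbol
p_X for the PMF of X is notation assumed here, since the slide itself is not
part of the transcript. The second line is the typical term for x = 2.
```latex
\mathbf{E}[g(X)] = \sum_{x} g(x)\, p_X(x)
g(2)\, p_X(2) = 3 \cdot 0.1
```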

00:04:49.560 --> 00:04:52.790
The advantage of using the
expected value rule instead of

00:04:52.790 --> 00:04:55.180
the definition of the
expectation is that the

00:04:55.180 --> 00:04:58.510
expected value rule only
involves the PMF of the

00:04:58.510 --> 00:05:02.540
original random variable, so
we do not need to do any

00:05:02.540 --> 00:05:05.295
additional work to
find the PMF of

00:05:05.295 --> 00:05:08.230
the new random variable.

00:05:08.230 --> 00:05:13.610
Now we argued in favor of the
expected value rule by

00:05:13.610 --> 00:05:16.450
considering this numerical
example, and by checking that

00:05:16.450 --> 00:05:18.730
it gives the right result.

00:05:18.730 --> 00:05:20.520
But now let us verify.

00:05:20.520 --> 00:05:23.760
Let us argue more generally that
it's going to give us the

00:05:23.760 --> 00:05:25.410
right answer.

00:05:25.410 --> 00:05:30.300
So what we're going to do is
to take this summation and

00:05:30.300 --> 00:05:34.980
argue that it's equal to the
expected value of Y, which is

00:05:34.980 --> 00:05:37.040
defined by that summation.

00:05:37.040 --> 00:05:38.510
So let us start with this.

00:05:38.510 --> 00:05:41.790
It's a sum over all x's.

00:05:41.790 --> 00:05:48.920
Let us first fix a particular
value of y, and add over all

00:05:48.920 --> 00:05:52.970
those x's that correspond
to that particular y.

00:05:52.970 --> 00:05:56.440
So we're fixing a
particular y.

00:05:56.440 --> 00:06:02.320
And so we're adding only over
those x's that lead to that

00:06:02.320 --> 00:06:04.170
particular y.

00:06:04.170 --> 00:06:06.085
And we carry out
the summation.

00:06:09.680 --> 00:06:13.500
So this is the part of this
sum associated with one

00:06:13.500 --> 00:06:16.620
particular choice of y.

00:06:16.620 --> 00:06:20.000
And it's a sum, really,
over this set of x's.

00:06:20.000 --> 00:06:24.370
But in order to exhaust all x's,
we need to consider all

00:06:24.370 --> 00:06:26.930
possible values of y.

00:06:26.930 --> 00:06:31.690
And this gives rise to an
outer summation over the

00:06:31.690 --> 00:06:33.130
different y's.

00:06:33.130 --> 00:06:37.630
So for any fixed y, we add
over the associated x's.

00:06:37.630 --> 00:06:40.250
But we want to consider
all the possible y's.

00:06:43.010 --> 00:06:47.960
Now at this point, we make the
following observation.

00:06:50.460 --> 00:06:53.460
Here, we have a summation
over y's.

00:06:53.460 --> 00:06:57.710
And let's look at the
inner summation.

00:06:57.710 --> 00:07:03.520
The inner summation involves
x's, all of which are

00:07:03.520 --> 00:07:07.140
associated with a specific
value of y.

00:07:07.140 --> 00:07:13.360
Having fixed y, all the terms
inside this sum have the

00:07:13.360 --> 00:07:16.290
property that g of
x is equal to y.

00:07:16.290 --> 00:07:20.960
So g of x is equal to
that particular y.

00:07:20.960 --> 00:07:25.710
And we can make this
substitution here.

00:07:25.710 --> 00:07:28.510
Now when we look at this
summation, we now realize that

00:07:28.510 --> 00:07:33.470
it's a summation over x's
while y is being fixed.

00:07:33.470 --> 00:07:37.690
And so we can take this
term of y and pull

00:07:37.690 --> 00:07:39.165
it outside the summation.

00:07:42.620 --> 00:07:50.590
What this leaves us with is a
sum over all y's of y, and

00:07:50.590 --> 00:07:54.930
then a further sum over all
x's that lead to that

00:07:54.930 --> 00:07:59.590
particular y, of the
probabilities of those x's.

00:08:04.640 --> 00:08:09.850
Now what can we say about
this inner summation?

00:08:09.850 --> 00:08:12.960
We have fixed a y.

00:08:12.960 --> 00:08:16.050
For that particular y, we're
adding the probabilities of

00:08:16.050 --> 00:08:19.910
all the x's that lead to
that particular y.

00:08:19.910 --> 00:08:23.820
Fixing y, consider all the
x's that lead to it.

00:08:23.820 --> 00:08:27.100
This is just the probability
of that particular y.

00:08:29.740 --> 00:08:33.549
But what we have now is just the
definition of the expected

00:08:33.549 --> 00:08:38.840
value of Y. And this concludes
the proof that this

00:08:38.840 --> 00:08:42.490
expression, as given by the
expected value rule, gives us

00:08:42.490 --> 00:08:45.530
the same answer as the original
definition of the

00:08:45.530 --> 00:08:48.110
expected value of Y.
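
NOTE
The steps of the argument above can be summarized as one chain of equalities.
This is a sketch in assumed notation: p_X and p_Y stand for the PMFs of X and
of Y = g(X), and the inner sums run over the x's with g(x) = y.
```latex
\begin{align*}
\sum_{x} g(x)\, p_X(x)
  &= \sum_{y} \sum_{x:\, g(x) = y} g(x)\, p_X(x) \\
  &= \sum_{y} \sum_{x:\, g(x) = y} y\, p_X(x) \\
  &= \sum_{y} y \sum_{x:\, g(x) = y} p_X(x) \\
  &= \sum_{y} y\, p_Y(y) = \mathbf{E}[Y]
\end{align*}
```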

00:08:48.110 --> 00:08:50.860
Now before closing, a
few observations.

00:08:50.860 --> 00:08:54.350
The expected value rule is
really simple to use.

00:08:54.350 --> 00:08:56.890
For example, if you want to
calculate the expected value

00:08:56.890 --> 00:09:00.330
of the square of a random
variable, then you're dealing

00:09:00.330 --> 00:09:03.560
with a situation where
the g function

00:09:03.560 --> 00:09:07.350
is the square function.

00:09:07.350 --> 00:09:11.790
And so, the expected value of
X-squared will be the sum over

00:09:11.790 --> 00:09:16.700
x's of x squared weighted
according to the probability

00:09:16.700 --> 00:09:19.260
of a particular x.

00:09:19.260 --> 00:09:22.950
And finally, one important
word of caution, that in

00:09:22.950 --> 00:09:26.090
general, the expected value
of the function--

00:09:26.090 --> 00:09:29.750
so for example, the expected
value of X-squared.

00:09:29.750 --> 00:09:35.480
In general, it's not going to
be the same as taking the

00:09:35.480 --> 00:09:40.620
expected value of X
and squaring it.

00:09:40.620 --> 00:09:43.800
So this is a word of caution,
that in general, you

00:09:43.800 --> 00:09:46.960
cannot interchange the order
in which you apply a

00:09:46.960 --> 00:09:50.820
function and calculate the
expectation.

00:09:50.820 --> 00:09:54.060
There are exceptions, however,
in which we happen to have

00:09:54.060 --> 00:09:55.400
equality here.

00:09:55.400 --> 00:09:57.320
And this is going to
be our next topic.