WEBVTT
00:00:00.950 --> 00:00:04.460
In this segment, we derive and
discuss the Markov inequality,
00:00:04.460 --> 00:00:09.030
a rather simple but quite useful
and powerful fact about
00:00:09.030 --> 00:00:11.660
probability distributions.
00:00:11.660 --> 00:00:14.760
The basic idea behind the Markov
inequality as well as
00:00:14.760 --> 00:00:18.410
many other inequalities and
bounds in probability theory
00:00:18.410 --> 00:00:20.140
is the following.
00:00:20.140 --> 00:00:23.050
We may be interested in saying
something about the
00:00:23.050 --> 00:00:25.270
probability of an
extreme event.
00:00:25.270 --> 00:00:28.280
By extreme event, we mean that
some random variable takes a
00:00:28.280 --> 00:00:29.960
very large value.
00:00:29.960 --> 00:00:33.360
If we can calculate that
probability exactly, then, of
00:00:33.360 --> 00:00:35.020
course, everything is fine.
00:00:35.020 --> 00:00:38.600
But suppose that we only have
a little bit of information
00:00:38.600 --> 00:00:41.350
about the probability
distribution at hand.
00:00:41.350 --> 00:00:44.250
For example, suppose that we
only know the expected value
00:00:44.250 --> 00:00:46.060
associated with that
distribution.
00:00:46.060 --> 00:00:47.810
Can we say something?
00:00:47.810 --> 00:00:51.040
Well, here's a statement, which
is quite intuitive.
00:00:51.040 --> 00:00:53.790
If you have a non-negative
random variable, and I tell
00:00:53.790 --> 00:00:56.920
you that the average or the
expected value is rather
00:00:56.920 --> 00:01:01.580
small, then there should be only
a very small probability
00:01:01.580 --> 00:01:05.030
that the random variable takes
a very large value.
00:01:05.030 --> 00:01:08.039
This is an intuitively plausible
statement, and the
00:01:08.039 --> 00:01:10.950
Markov inequality makes that
statement precise.
00:01:10.950 --> 00:01:12.550
Here is what it says.
00:01:12.550 --> 00:01:15.160
If we have a random variable
that's non-negative and you
00:01:15.160 --> 00:01:18.820
take any positive number, the
probability that the random
00:01:18.820 --> 00:01:22.850
variable exceeds that particular
number is bounded
00:01:22.850 --> 00:01:25.289
by this ratio.
00:01:25.289 --> 00:01:28.940
If the expected value of X
is very small, then the
00:01:28.940 --> 00:01:34.070
probability of exceeding that
value of a will also be small.
00:01:34.070 --> 00:01:37.610
Furthermore, if a is very large,
the probability of
00:01:37.610 --> 00:01:41.660
exceeding that very large value
drops down because this
00:01:41.660 --> 00:01:44.560
ratio becomes smaller.
00:01:44.560 --> 00:01:47.039
So that's what the Markov
inequality says.
00:01:47.039 --> 00:01:49.430
Let us now proceed with
a derivation.
00:01:49.430 --> 00:01:53.259
Let's start with the formula for
the expected value of X,
00:01:53.259 --> 00:01:56.870
and just to keep the argument
concrete, let us assume that
00:01:56.870 --> 00:01:59.479
the random variable is
continuous so that the
00:01:59.479 --> 00:02:02.310
expected value is given
by an integral.
00:02:02.310 --> 00:02:04.890
The argument would be exactly
the same in the discrete
00:02:04.890 --> 00:02:08.570
case, but in the discrete case,
we would be using a sum.
00:02:08.570 --> 00:02:12.070
Now since the random variable is
non-negative, this integral
00:02:12.070 --> 00:02:16.110
only ranges from
0 to infinity.
00:02:16.110 --> 00:02:19.750
Now, we're interested, however,
in values of X larger
00:02:19.750 --> 00:02:24.840
than or equal to a, and that
tempts us to consider just the
00:02:24.840 --> 00:02:29.800
integral from a to infinity
of the same quantity.
00:02:29.800 --> 00:02:35.590
How do these two quantities
compare to each other?
00:02:35.590 --> 00:02:39.010
Since we're integrating a
non-negative quantity, if
00:02:39.010 --> 00:02:42.840
we're integrating over a smaller
range, the resulting
00:02:42.840 --> 00:02:48.579
integral will be less than or
equal to this integral here,
00:02:48.579 --> 00:02:53.680
so we get an inequality that
goes in this direction.
00:02:53.680 --> 00:02:56.500
Now let us look at this
integral here.
00:02:56.500 --> 00:03:00.860
Over the range of integration
that we're considering, X is
00:03:00.860 --> 00:03:03.200
at least as large as a.
00:03:03.200 --> 00:03:07.740
Therefore, the quantity that
we're integrating from a to
00:03:07.740 --> 00:03:16.420
infinity is at least as large
as a times the density of X.
00:03:16.420 --> 00:03:20.170
And now we can take this a,
which is a constant, pull it
00:03:20.170 --> 00:03:22.360
outside the integral.
00:03:22.360 --> 00:03:25.170
And what we're left with is the
integral of the density
00:03:25.170 --> 00:03:29.040
from a to infinity, which is
nothing but the probability
00:03:29.040 --> 00:03:32.140
that the random variable
takes a value larger
00:03:32.140 --> 00:03:33.510
than or equal to a.
00:03:33.510 --> 00:03:37.560
And now if you compare the two
sides of this inequality,
00:03:37.560 --> 00:03:44.010
that's exactly what the Markov
inequality is telling us.
00:03:44.010 --> 00:03:47.740
Now it is instructive to go
through a second derivation of
00:03:47.740 --> 00:03:49.290
the Markov inequality.
00:03:49.290 --> 00:03:53.180
This derivation is essentially
the same conceptually as the
00:03:53.180 --> 00:03:56.320
one that we just went through
except that it is more
00:03:56.320 --> 00:04:00.390
abstract and does not require us
to write down any explicit
00:04:00.390 --> 00:04:02.480
sums or integrals.
00:04:02.480 --> 00:04:04.070
Here's how it goes.
00:04:04.070 --> 00:04:08.200
Let us define a new random
variable Y, which is equal to
00:04:08.200 --> 00:04:13.350
0 if the random variable X
happens to be less than a and
00:04:13.350 --> 00:04:17.970
it is equal to a if X
happens to be larger
00:04:17.970 --> 00:04:19.709
than or equal to a.
00:04:19.709 --> 00:04:22.870
How is Y related to X?
00:04:22.870 --> 00:04:26.420
If X takes a value less than
a, it will still be a
00:04:26.420 --> 00:04:29.920
non-negative value, so X is
going to be at least as large
00:04:29.920 --> 00:04:31.360
as the value of 0
00:04:31.360 --> 00:04:33.250
that Y takes.
00:04:33.250 --> 00:04:38.050
If X is larger than or equal to
a, Y will be a, so X will
00:04:38.050 --> 00:04:40.440
again be at least as large.
00:04:40.440 --> 00:04:47.190
So no matter what, we have the
inequality that Y is always
00:04:47.190 --> 00:04:52.960
less than or equal to X. And
since this is always the case,
00:04:52.960 --> 00:04:56.950
this means that the expected
value of Y will be less than
00:04:56.950 --> 00:05:00.550
or equal to the expected
value of X.
00:05:00.550 --> 00:05:02.920
But now what is the expected
value of Y?
00:05:02.920 --> 00:05:08.570
Since Y is either 0 or a, the
expected value is equal to a
00:05:08.570 --> 00:05:14.050
times the probability of that
event, which is a times the
00:05:14.050 --> 00:05:19.580
probability that X is larger
than or equal to a.
00:05:19.580 --> 00:05:23.850
And by comparing the two sides
of this inequality, what we
00:05:23.850 --> 00:05:27.530
have is exactly the
Markov inequality.
00:05:27.530 --> 00:05:30.170
Let us now go through some
simple examples.
00:05:30.170 --> 00:05:32.659
Suppose that X is exponentially
distributed with
00:05:32.659 --> 00:05:36.590
parameter lambda equal to 1 so that
the expected value of X
00:05:36.590 --> 00:05:40.870
is also going to be equal to 1,
and in that case, we obtain
00:05:40.870 --> 00:05:43.510
a bound of 1 over a.
00:05:43.510 --> 00:05:47.210
To put this result in
perspective, note that we're
00:05:47.210 --> 00:05:49.570
trying to bound a probability.
00:05:49.570 --> 00:05:53.040
We know that the probability
lies between 0 and 1.
00:05:53.040 --> 00:05:56.409
There's a true value for this
probability, and in this
00:05:56.409 --> 00:05:58.909
particular example because
we have an exponential
00:05:58.909 --> 00:06:04.690
distribution, this probability
is equal to e to the minus a.
00:06:04.690 --> 00:06:07.370
The Markov inequality
gives us a bound.
00:06:07.370 --> 00:06:11.480
In this instance, the bound
takes the form of 1 over a,
00:06:11.480 --> 00:06:14.270
and the inequality tells us
that the true value is
00:06:14.270 --> 00:06:17.110
somewhere to the left of here.
00:06:17.110 --> 00:06:21.450
A bound will be considered good
or strong or useful if
00:06:21.450 --> 00:06:24.590
that bound turns out to be quite
close to the correct
00:06:24.590 --> 00:06:28.810
value so that it also serves as
a fairly accurate estimate.
00:06:28.810 --> 00:06:31.560
Unfortunately, in this example,
this is not the case
00:06:31.560 --> 00:06:35.000
because the true value falls
off exponentially with a,
00:06:35.000 --> 00:06:37.940
whereas the bound that we
obtained falls off at a much
00:06:37.940 --> 00:06:40.550
slower rate of 1 over a.
00:06:40.550 --> 00:06:43.690
For this reason, one would
like to have even better
00:06:43.690 --> 00:06:46.470
bounds than the Markov
inequality, and this is one
00:06:46.470 --> 00:06:49.350
motivation for the Chebyshev
inequality that we will be
00:06:49.350 --> 00:06:50.870
considering next.
00:06:50.870 --> 00:06:55.230
But before we move there, let us
consider one more example.
00:06:55.230 --> 00:06:59.220
Suppose that X is a uniform
random variable on the
00:06:59.220 --> 00:07:05.960
interval from minus 4 to 4, and
we're interested in saying
00:07:05.960 --> 00:07:10.270
something about the probability
that X is larger
00:07:10.270 --> 00:07:12.350
than or equal to 3.
00:07:12.350 --> 00:07:15.960
So we're interested in
this event here.
00:07:15.960 --> 00:07:20.015
So the value of the density,
because we have a range of
00:07:20.015 --> 00:07:23.470
length 8, the value of
the density is 1/8.
00:07:23.470 --> 00:07:27.750
So we know that this probability
has a true value
00:07:27.750 --> 00:07:35.350
of 1 over 8, which we can
indicate on a diagram here.
00:07:35.350 --> 00:07:37.540
Probabilities are
between 0 and 1.
00:07:37.540 --> 00:07:40.890
We have a true value
of 1 over 8.
00:07:40.890 --> 00:07:43.010
Let us see what the Markov
inequality is
00:07:43.010 --> 00:07:44.900
going to give us.
00:07:44.900 --> 00:07:48.650
There's one difficulty that X
is not a non-negative random
00:07:48.650 --> 00:07:51.300
variable, so we cannot
apply the Markov
00:07:51.300 --> 00:07:53.680
inequality right away.
00:07:53.680 --> 00:07:59.050
However, the event that X is
larger than or equal to 3 is
00:07:59.050 --> 00:08:03.920
smaller than the event that
the absolute value of X is
00:08:03.920 --> 00:08:06.690
larger than or equal to 3.
00:08:06.690 --> 00:08:12.460
That is, we take this blue event
and we also add this
00:08:12.460 --> 00:08:17.340
green event, and we say that
the probability of the blue
00:08:17.340 --> 00:08:21.370
event is less than or equal to
the probability of the blue
00:08:21.370 --> 00:08:24.970
together with the green event,
which is the event that the
00:08:24.970 --> 00:08:28.190
absolute value of X is larger
than or equal to 3.
00:08:28.190 --> 00:08:30.430
So now we have a random
variable, which is
00:08:30.430 --> 00:08:33.919
non-negative, and we can apply
the Markov inequality and
00:08:33.919 --> 00:08:36.890
write that this is less than or
equal to the expected value
00:08:36.890 --> 00:08:40.980
of the absolute value
of X divided by 3.
00:08:40.980 --> 00:08:44.153
What is this expectation of
the absolute value of X?
00:08:44.153 --> 00:08:47.160
X is uniform on this range.
00:08:47.160 --> 00:08:50.590
The absolute value of X will
be taking values only
00:08:50.590 --> 00:08:52.550
between 0 and 4.
00:08:52.550 --> 00:08:55.360
And because the original
distribution was uniform, the
00:08:55.360 --> 00:08:59.050
absolute value of X will
also be uniform on the
00:08:59.050 --> 00:09:00.820
range from 0 to 4.
00:09:00.820 --> 00:09:03.510
And for this reason, the
expected value is going to be
00:09:03.510 --> 00:09:08.290
equal to 2, and we get
a bound of 2/3.
00:09:08.290 --> 00:09:10.810
This is a pretty bad bound.
00:09:10.810 --> 00:09:14.030
It is true, of course,
but it is quite far
00:09:14.030 --> 00:09:15.910
from the true answer.
00:09:15.910 --> 00:09:18.230
Could we improve this bound?
00:09:18.230 --> 00:09:20.920
In this particular
example, we can.
00:09:20.920 --> 00:09:24.720
Because of symmetry, we know
that the probability of being
00:09:24.720 --> 00:09:29.150
larger than or equal to 3 is
equal to the probability of
00:09:29.150 --> 00:09:32.200
being less than or
equal to minus 3.
00:09:32.200 --> 00:09:35.840
Or the probability of this
event, which is the blue and
00:09:35.840 --> 00:09:39.100
the green, is twice
the probability of
00:09:39.100 --> 00:09:41.030
just the blue event.
00:09:41.030 --> 00:09:46.710
Or to put it differently, this
probability here is equal to 1/2
00:09:46.710 --> 00:09:50.870
of the probability that the
absolute value of X is larger
00:09:50.870 --> 00:09:54.780
than or equal to 3, and
therefore, by using the same
00:09:54.780 --> 00:10:00.090
bound as here, we will obtain
an answer of 1/3.
00:10:00.090 --> 00:10:03.440
So by being a little more clever
and exploiting the
00:10:03.440 --> 00:10:07.660
symmetry of this distribution
around 0, we get a somewhat
00:10:07.660 --> 00:10:12.660
better bound of 1/3, which
is, again, a true bound.
00:10:12.660 --> 00:10:16.260
It is more informative than the
original bound, but still
00:10:16.260 --> 00:10:18.910
it is quite far away from
the true answer.