WEBVTT

00:00:05.500 --> 00:00:09.600
Why does going to the airport seem to require
extra time compared with coming back from

00:00:09.600 --> 00:00:15.080
the airport even if the traffic is the same
in both directions? The answer must somehow

00:00:15.080 --> 00:00:19.900
depend on more than just the average travel
time, which we’re assuming is the same and

00:00:19.900 --> 00:00:27.760
often is. In fact, it depends on the distribution
of travel times. Probability distributions

00:00:27.769 --> 00:00:32.980
are fully described by listing or graphing
every probability. For example, how likely

00:00:32.980 --> 00:00:38.350
is a journey to the airport to be between
10 and 20 minutes? How likely is a 20–30

00:00:38.350 --> 00:00:43.550
minute journey? A 30–40 minute journey?
And so on. We’ll answer the airport question

00:00:43.550 --> 00:00:45.590
at the end of the video.

00:00:45.590 --> 00:00:50.379
This video is part of the Probability and
Statistics video series. Many natural and

00:00:50.379 --> 00:00:55.960
social phenomena are probabilistic in nature.
Engineers, scientists, and policymakers often

00:00:55.960 --> 00:00:59.390
use probability to model and predict system
behavior.

00:00:59.390 --> 00:01:04.720
Hi, my name is Sanjoy Mahajan, and I’m a
professor of Applied Science and Engineering

00:01:04.720 --> 00:01:10.420
at Olin College. Before watching this video,
you should be proficient with integration

00:01:10.420 --> 00:01:14.020
and have some familiarity with probabilities.

00:01:14.020 --> 00:01:16.920
After watching this video, you will be able
to:

00:01:16.920 --> 00:01:20.140
Explain what moments of distributions are,
and

00:01:20.140 --> 00:01:25.400
compute moments and understand what they mean.

00:01:27.658 --> 00:01:34.658
To illustrate what a probability distribution
is, let’s consider rolling two fair dice. The

00:01:34.670 --> 00:01:39.549
probability distribution of their sum is this
table. For example, the only way to get a

00:01:39.549 --> 00:01:46.329
sum of two is to roll a 1 on each die. And
there are 36 equally likely rolls for a pair of

00:01:46.329 --> 00:01:53.329
dice. So, getting a sum of two has a probability
of 1 over 36. The probability of rolling a

00:01:53.880 --> 00:02:00.880
sum of 3 is 2 over 36. And so on and so forth.
You can fill in a table like this yourself.

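NOTE A minimal Python sketch, not part of the video, that fills in the two-dice table by counting the 36 equally likely rolls. Probabilities are kept as exact fractions (Python's fractions module), so 2/36 is stored in lowest terms as 1/18. The function name dice_sum_distribution is hypothetical.
```python
from fractions import Fraction
from itertools import product
from collections import Counter
def dice_sum_distribution():
    # Count how many of the 36 equally likely (die1, die2) rolls give each sum.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total = sum(counts.values())  # 36 rolls in all
    # Exact probabilities: sum 2 is 1/36, sum 3 is 2/36 (stored as 1/18), and so on.
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}
dist = dice_sum_distribution()
for s, p in dist.items():
    print(s, p)
```
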
00:02:01.090 --> 00:02:06.219
But the whole distribution, even for something
as simple as two dice, is usually too much

00:02:06.219 --> 00:02:07.860
information.

00:02:07.860 --> 00:02:12.790
We often want to characterize the shape of
the distribution using only a few numbers.

00:02:12.790 --> 00:02:19.700
Of course, that throws away information, but
throwing away information is the only way

00:02:19.700 --> 00:02:23.200
to fit the complexity of the world into our
brains.

00:02:23.200 --> 00:02:29.040
The art comes in keeping the most important
information. Finding the moments of a distribution

00:02:29.040 --> 00:02:34.959
can help us reach our goal. Two moments that
you are probably already familiar with are

00:02:34.959 --> 00:02:39.690
mean and variance. They are the two most important
moments of distributions.

00:02:39.690 --> 00:02:47.150
Let’s define these moments more formally.
The mean is the first moment of a distribution.

00:02:47.150 --> 00:02:54.599
It is also called the expected value and is
computed as shown. Expected value of x, that’s

00:02:54.599 --> 00:03:00.349
x with angled brackets around it, is equal
to this sum: the sum over i of p sub i times x sub i. It’s the sum of all

00:03:00.349 --> 00:03:06.069
of the x’s, each weighted by its probability.
Let the x sub i be the possible values of

00:03:06.069 --> 00:03:07.569
x.

00:03:07.569 --> 00:03:14.400
For example, for the rolling of two dice,
the possible values for x sub i would be 2, 3, 4

00:03:14.400 --> 00:03:19.220
all the way up through 12. And p sub i would
be the corresponding probabilities of rolling

00:03:19.220 --> 00:03:24.540
those sums, so that was 1 over 36, 2 over
36, and so on.

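NOTE A hedged Python sketch of the mean as a probability-weighted sum, applied to the two-dice table. The shortcut 6 - |s - 7| for the number of rolls giving sum s, and the helper name mean, are illustrative assumptions; the result should be 7, the middle of the range 2 through 12.
```python
from fractions import Fraction
# Two-dice table: sum s occurs in 6 - |s - 7| of the 36 equally likely rolls.
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
def mean(dist):
    # Weighted sum of the possible values x_i, each weighted by its probability p_i.
    return sum(p * x for x, p in dist.items())
print(mean(dice))  # 7
```
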
00:03:24.540 --> 00:03:29.840
So, the first moment gives us some idea of
what our distribution might look like, but

00:03:29.840 --> 00:03:34.900
not much. Think about it like this: the center
of mass in these two images is in the same

00:03:34.900 --> 00:03:39.930
place, but the mass is actually distributed
very differently in the two cases. We need

00:03:39.930 --> 00:03:41.409
more information.

00:03:41.409 --> 00:03:46.099
The second moment can help us. The second
moment is very similar in structure to the

00:03:46.099 --> 00:03:51.819
first moment. We write it the same way with
angled brackets, but now we’re talking about

00:03:51.819 --> 00:03:58.379
the expected value of x squared. So it’s
still a sum and it’s still weighted by the

00:03:58.379 --> 00:04:04.340
probabilities p sub i, but now we square each
possible x value. For the dice example, those

00:04:04.340 --> 00:04:10.920
were the values from two through twelve. This
is also called the mean square. First you

00:04:10.920 --> 00:04:16.829
square the x values, then you take the mean,
weighting each x sub i by its probability,

00:04:16.829 --> 00:04:17.779
p sub i.

00:04:17.779 --> 00:04:24.780
In general, the nth moment, the expected value of x to the n, is the sum of p sub i times x sub i to the n.

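NOTE A hedged Python sketch of the general nth moment, the sum over i of p sub i times x sub i to the n, for the two-dice table (the helper name moment is hypothetical). With n = 0 it returns 1, the total probability; n = 1 gives the mean and n = 2 gives the mean square.
```python
from fractions import Fraction
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
def moment(dist, n):
    # nth moment: weight each x_i**n by its probability p_i and add them up.
    return sum(p * x**n for x, p in dist.items())
print(moment(dice, 1))  # 7
print(moment(dice, 2))  # 329/6, i.e. 1974/36
```
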
00:04:27.590 --> 00:04:32.479
So how does the second moment help us get
a better picture of our distribution? Because

00:04:32.479 --> 00:04:38.300
it can help us calculate something called
the variance. The variance measures how spread

00:04:38.300 --> 00:04:44.229
out the distribution is around the mean. To
calculate the variance, you first subtract

00:04:44.229 --> 00:04:49.710
the mean from each x sub i (this is like
finding the distance of each x sub i from

00:04:49.710 --> 00:04:56.710
the mean), and then you square the result,
multiply by p sub i, and add up all the terms.

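NOTE A minimal Python sketch of the variance computed straight from the definition: subtract the mean, square, weight by p sub i, and sum. The dice table and the helper name variance are illustrative assumptions.
```python
from fractions import Fraction
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
def variance(dist):
    mu = sum(p * x for x, p in dist.items())  # the mean <x>
    # Subtract the mean, square, weight by p_i, then add up all the terms.
    return sum(p * (x - mu) ** 2 for x, p in dist.items())
print(variance(dice))  # 35/6
```
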
00:04:59.620 --> 00:05:04.930
What are the dimensions of the variance? The
square of the dimensions of x. For example,

00:05:04.930 --> 00:05:10.660
if x is a length, then the variance
is a length squared. But we often want a measure

00:05:10.660 --> 00:05:16.490
of dispersion like the variance that instead
has the same dimensions as x itself. That

00:05:16.490 --> 00:05:22.320
measure is the standard deviation, sigma.
Sigma is defined as the square root of the

00:05:22.320 --> 00:05:27.520
variance. So if the variable x has dimensions
of length, then the variance will have dimensions

00:05:27.520 --> 00:05:32.470
of length squared, but the standard deviation,
sigma, will have dimensions of length, so it’s

00:05:32.470 --> 00:05:35.000
comparable to x directly.

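NOTE A hedged Python sketch of the standard deviation as the square root of the variance, for the two-dice table. Rescaling x by 100 (say, meters to centimeters, an assumed unit change) multiplies sigma by 100 but the variance by 100 squared, which is the dimensional point made above.
```python
import math
from fractions import Fraction
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
def std_dev(dist):
    mu = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mu) ** 2 for x, p in dist.items())
    return math.sqrt(var)  # math.sqrt converts the Fraction to a float
print(round(std_dev(dice), 3))  # 2.415
# Rescaling x by 100 scales sigma by 100 (the variance would scale by 100**2).
in_cm = {100 * x: p for x, p in dice.items()}
print(round(std_dev(in_cm) / std_dev(dice), 6))  # 100.0
```
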
00:05:35.000 --> 00:05:40.350
This expression for the variance looks like
a pain to compute, but it has an alternative

00:05:40.350 --> 00:05:45.320
expression that is much simpler. And you get
to show that as one of the exercises after

00:05:45.320 --> 00:05:51.490
the video. The alternative expression, the
much simpler one, is that the variance is

00:05:51.490 --> 00:05:57.159
equal to the second moment, our old friend,
minus the square of the first moment, that is,

00:05:57.159 --> 00:05:58.240
the squared mean.

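NOTE A minimal Python check, not from the video, that the shortcut (second moment minus squared mean) matches the definition of the variance. The second distribution is made up for illustration; exact fractions make the comparison an exact equality rather than a floating-point approximation.
```python
from fractions import Fraction
dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
made_up = {1: Fraction(1, 2), 4: Fraction(1, 4), 10: Fraction(1, 4)}  # hypothetical
def moment(dist, n):
    return sum(p * x**n for x, p in dist.items())
def var_by_definition(dist):
    mu = moment(dist, 1)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())
def var_shortcut(dist):
    # Second moment minus the square of the first moment.
    return moment(dist, 2) - moment(dist, 1) ** 2
for d in (dice, made_up):
    assert var_by_definition(d) == var_shortcut(d)
print(var_shortcut(dice), var_shortcut(made_up))  # 35/6 27/2
```
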
00:05:58.240 --> 00:06:05.240
Pause the video here to convince yourself
that this difference is always non-negative.

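NOTE One possible solution to the exercise, sketched in LaTeX under the assumption that the p sub i sum to one. Expanding the square gives the shortcut, and because the first line is a sum of non-negative terms (each p sub i is non-negative and so is each square), the difference can never be negative.
```latex
\operatorname{Var}(x) = \sum_i p_i \left(x_i - \langle x \rangle\right)^2
  = \sum_i p_i x_i^2 - 2\langle x \rangle \sum_i p_i x_i + \langle x \rangle^2 \sum_i p_i
  = \langle x^2 \rangle - 2\langle x \rangle^2 + \langle x \rangle^2
  = \langle x^2 \rangle - \langle x \rangle^2 \;\ge\; 0
```
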
00:06:09.729 --> 00:06:15.050
This alternative expression for the variance,
this much more useful one, is also the parallel

00:06:15.050 --> 00:06:19.990
axis theorem in mechanics, which says that
the moment of inertia of an object about the

00:06:19.990 --> 00:06:25.160
center of mass is equal to the moment of inertia
about an axis shifted by h from the center

00:06:25.160 --> 00:06:29.860
of mass, a parallel shift, minus m h squared, where m is the mass.

00:06:29.860 --> 00:06:36.350
So how does this analogy work? This, the dispersion
around the mean, which is here at the center

00:06:36.350 --> 00:06:42.610
of mass, is like the variance. This is like
the second moment if we make h equal to the

00:06:42.610 --> 00:06:50.389
mean. So this is the dispersion around zero,
or the second moment. So this is like the expected value of x squared.

00:06:50.389 --> 00:06:56.580
The mass is the sum total
of all the weights, the p sub i for each x sub i, which

00:06:56.580 --> 00:07:03.580
all add up to one. So the mass is just one
in this problem. And then h squared: well,

00:07:03.639 --> 00:07:06.840
h is the mean, so this is the mean squared.

00:07:06.840 --> 00:07:12.910
So you can see the exact same structure repeated
with h, the shift of axis, playing the role of the mean, and

00:07:12.910 --> 00:07:19.500
m, the mass, playing the role of the sum of all probabilities,
which is one. So this formula for the variance

00:07:19.500 --> 00:07:24.080
is also the parallel axis theorem.

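NOTE A hedged Python illustration of the parallel axis theorem for point masses on a line. The masses and positions are hypothetical; the point is that the moment of inertia about the center of mass equals the moment about x = 0 minus M h squared, the same structure as the variance formula.
```python
masses = [2.0, 1.0, 3.0]     # hypothetical masses m_k
positions = [0.5, 2.0, 4.0]  # hypothetical positions x_k, measured from an axis at x = 0
M = sum(masses)
h = sum(m * x for m, x in zip(masses, positions)) / M  # center of mass = shift of axis
I_axis = sum(m * x**2 for m, x in zip(masses, positions))        # about x = 0
I_cm = sum(m * (x - h) ** 2 for m, x in zip(masses, positions))  # about the center of mass
print(I_cm, I_axis - M * h**2)  # 15.0 15.0
# Dividing every mass by M turns the masses into probabilities: I/M becomes <x^2>,
# h becomes <x>, and the same identity reads Var(x) = <x^2> - <x>^2.
```
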
00:07:26.900 --> 00:07:32.100
Let’s use the definitions of the moments,
and also of the related quantity, the variance,

00:07:32.110 --> 00:07:34.460
and practice on a few distributions.

00:07:34.460 --> 00:07:39.639
A simple discrete distribution is a single
coin flip. Instead of thinking of the coin

00:07:39.639 --> 00:07:43.889
flip as resulting in heads or tails, let’s
think about the coin as turning up a zero

00:07:43.889 --> 00:07:47.970
or one. Let p be the probability of a one.

00:07:47.970 --> 00:07:53.560
So the mean is the sum of the x sub i’s,
weighted by the probabilities. So the mean

00:07:53.560 --> 00:08:02.340
of x is the sum of p sub i x sub i, which is equal to one minus
p, times zero, plus p times one, which is equal

00:08:02.349 --> 00:08:03.620
to p.

00:08:03.620 --> 00:08:11.040
What about the second moment? The mean of x squared is
equal to the weighted sum of the x sub i’s squared,

00:08:11.050 --> 00:08:18.970
so the weights are the same and we square
each value here, the x sub i’s. But since they’re

00:08:18.970 --> 00:08:24.919
all zero or one, squaring doesn’t change
them. So the second moment and the third moment

00:08:24.919 --> 00:08:32.759
and every higher moment are all p. Pause the
video here and compute the variance and sketch

00:08:32.760 --> 00:08:35.919
it as a function of p.

00:08:40.690 --> 00:08:44.750
The variance, from our old convenient form
of the formula, is the

00:08:44.750 --> 00:08:49.680
mean square minus the squared
mean, and all the moments themselves were just

00:08:49.680 --> 00:08:56.580
p. So that’s p minus p squared, which is
equal to p times (1 minus p).

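NOTE A minimal Python sketch of the coin-flip moments, with the example value p = 3/10 chosen arbitrarily. Because each x sub i is 0 or 1, every moment from the first upward equals p, and the variance comes out to p times (1 minus p).
```python
from fractions import Fraction
def coin(p):
    # A single flip that shows 1 with probability p and 0 otherwise.
    return {0: 1 - p, 1: p}
def moment(dist, n):
    return sum(q * x**n for x, q in dist.items())
p = Fraction(3, 10)  # an arbitrary example value of p
d = coin(p)
print([moment(d, n) for n in (1, 2, 3)])  # every moment n >= 1 equals p
print(moment(d, 2) - moment(d, 1) ** 2)   # variance p(1 - p) = 21/100
```
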
00:08:56.580 --> 00:09:03.339
What does that look like? We sketch it: p
on this axis, variance on that axis, and the

00:09:03.339 --> 00:09:09.310
curve starts at zero, rises to a peak,
and goes back to zero.

00:09:09.310 --> 00:09:15.450
This is p equals 1 and that’s p equals
zero. Does that make sense?

00:09:15.450 --> 00:09:22.080
Yeah, it does… from the meaning of variance
as dispersion around the mean. So take the

00:09:22.080 --> 00:09:27.430
first extreme case of p equals zero. In other
words, the coin has no chance of producing

00:09:27.430 --> 00:09:33.730
a one; it produces a zero every time.
There the mean is zero and there is no dispersion

00:09:33.730 --> 00:09:40.060
because it always produces zero. The same
applies when p equals one here at this extreme.

00:09:40.060 --> 00:09:45.560
The coin always produces a one with no dispersion.
There is no variation, there is no variance

00:09:45.560 --> 00:09:52.420
and it’s plausible that the variance should
be a maximum right in between, here at p

00:09:52.420 --> 00:09:59.100
equals one half, which it is on this curve.
So everything looks good. Our calculation

00:09:59.100 --> 00:10:03.540
seems reasonable and checks out in the extreme
cases.

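NOTE A hedged Python check of the sketched curve: tabulating p(1 - p) on a grid of p values (the grid spacing of 1/10 is an arbitrary choice) shows zero at both extremes and a maximum of 1/4 at p = 1/2.
```python
from fractions import Fraction
def coin_variance(p):
    return p * (1 - p)
grid = [Fraction(k, 10) for k in range(11)]  # p = 0, 1/10, ..., 1
values = [coin_variance(p) for p in grid]
print(values[0], values[-1])  # 0 0: no dispersion at the extremes
best = max(grid, key=coin_variance)
print(best, coin_variance(best))  # 1/2 1/4: maximum in the middle
```
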
00:10:03.540 --> 00:10:07.900
Before we go back to the airport problem,
let’s extend the idea of moments to continuous

00:10:07.900 --> 00:10:09.620
distributions.

00:10:09.630 --> 00:10:14.459
Here, instead of a list of probabilities for
each possible x, we have a probability density

00:10:14.459 --> 00:10:21.010
p as a function of x, where x is now a continuous
variable. Here’s the continuous version. The discrete formula

00:10:21.010 --> 00:10:27.880
for the nth moment was a sum of x sub i to the
nth, weighted by the probabilities. Here, the

00:10:27.880 --> 00:10:34.540
nth moment, the mean of x to the n, is equal to, instead of
a sum, an integral, weighted again, as always,

00:10:34.540 --> 00:10:42.340
by the probability, times x to the n as before,
and with a dx, because p of x times dx is the

00:10:42.340 --> 00:10:48.389
probability, and you add them all up over all
possible values of x. That’s the formula

00:10:48.399 --> 00:10:50.769
for a continuous distribution, for the moments
of a continuous distribution.

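NOTE A hedged Python sketch of the continuous nth moment, approximating the integral of p(x) x to the n dx with a midpoint rule. The rule, the step count, and the example density p(x) = 2x on [0, 1] are assumptions for illustration, not from the video; the exact values for that density are 1 and 2/3.
```python
def continuous_moment(density, n, a, b, steps=100_000):
    # Midpoint-rule stand-in for the integral of density(x) * x**n dx over [a, b].
    dx = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * dx  # midpoint of the kth slice
        total += density(x) * x**n * dx  # density(x) * dx is the slice's probability
    return total
def tri(x):
    # Example density (an assumption for illustration): p(x) = 2x on [0, 1].
    return 2 * x
print(round(continuous_moment(tri, 0, 0, 1), 6))  # 1.0, the total probability
print(round(continuous_moment(tri, 1, 0, 1), 6))  # 0.666667, exact value 2/3
```
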
00:10:50.769 --> 00:10:57.170
Let’s practice on the simplest continuous
distribution, the uniform distribution. X

00:10:57.170 --> 00:11:03.420
is equally likely to be any real number between
zero and one. That’s the distribution and

00:11:03.420 --> 00:11:07.450
we can compute the first and second moments
and the variance.

00:11:07.450 --> 00:11:12.700
Pause the video here, use the definition
of moments for a continuous distribution and

00:11:12.720 --> 00:11:19.720
compute the mean, first moment, the second
moment, and from those two, the variance.

00:11:27.930 --> 00:11:33.240
What you should have found is this. For the
mean, it’s the integral of one, because p

00:11:33.240 --> 00:11:40.980
of x is one, times x, between zero and one,
dx, which is x squared over two evaluated

00:11:40.990 --> 00:11:47.649
between zero and one, which equals one half,
which makes sense. The mean here, the average

00:11:47.649 --> 00:11:52.680
value is just one-half right in the middle
of the distribution of the possible values

00:11:52.680 --> 00:11:54.279
of x.

00:11:54.279 --> 00:12:02.159
What about the mean square? For that, you should
have found almost the same calculation, one

00:12:02.160 --> 00:12:09.519
times x squared dx, which equals x cubed over
3, evaluated between zero and one, which equals one-third. And

00:12:09.519 --> 00:12:15.200
thus, the variance is equal to one-third minus one-quarter,
that’s the mean square minus the squared

00:12:15.200 --> 00:12:22.720
mean, which is one twelfth. And that number
is familiar. That’s the same 1/12 that shows

00:12:22.730 --> 00:12:28.500
up in the moment of inertia of a ruler of
length l and mass m. Its moment of inertia about its center

00:12:28.500 --> 00:12:34.600
is 1/12 m l squared, which illustrates again
the connection between moments of inertia

00:12:34.600 --> 00:12:36.570
and moments of distributions.

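NOTE A minimal Python sketch of the uniform-distribution results, using the exact antiderivative so that the nth moment is 1/(n + 1). It reproduces the mean 1/2, mean square 1/3, and variance 1/12, and, for a hypothetical ruler with m = 3 and l = 2, the moment of inertia about its center, m l squared over 12.
```python
from fractions import Fraction
def uniform_moment(n):
    # p(x) = 1 on [0, 1]; the antiderivative of x**n is x**(n+1)/(n+1), so <x^n> = 1/(n+1).
    return Fraction(1, n + 1)
mean = uniform_moment(1)          # 1/2
mean_square = uniform_moment(2)   # 1/3
variance = mean_square - mean**2  # 1/3 - 1/4
print(mean, mean_square, variance)  # 1/2 1/3 1/12
# Hypothetical ruler: mass spread uniformly over [0, l], so positions scale by l and
# the variance scales by l**2; the moment of inertia about the center is m * l**2 * variance.
m, l = Fraction(3), Fraction(2)
print(m * l**2 * variance)  # 1, i.e. m l^2 / 12 for m = 3, l = 2
```
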
00:12:36.570 --> 00:12:42.589
Let’s apply our knowledge to understand
quantitatively, or in a formal way, what happens

00:12:42.589 --> 00:12:47.600
with airport travel: why does it seem
so much longer on the way there than on the

00:12:47.600 --> 00:12:48.370
way back?

00:12:48.370 --> 00:12:53.839
Here is the ideal travel experience to
the airport, the distribution of travel times

00:12:53.839 --> 00:12:59.079
t. Here’s the probability of each particular travel time, p

00:12:59.079 --> 00:13:06.219
of t. In the ideal world, the travel time
would be very predictable. Let’s say it

00:13:06.230 --> 00:13:11.570
would be almost always twenty minutes. In
that case, you would allow twenty minutes

00:13:11.570 --> 00:13:16.329
to get to the airport and you would allow
twenty minutes on the way back. Going there

00:13:16.329 --> 00:13:18.399
and coming back would seem the same.

00:13:18.399 --> 00:13:24.070
But, here’s what travel to the airport actually
looks like. Let’s say the mean is still

00:13:24.070 --> 00:13:30.139
the same, but the reality is that there’s
lots of dispersion. And so the curve actually

00:13:30.139 --> 00:13:36.360
looks like that. Sometimes the travel time
will be 30 minutes, sometimes 40, sometimes

00:13:36.360 --> 00:13:38.630
10.

00:13:38.630 --> 00:13:44.079
So now, what do you have to do? This is
reality. Well, on the way home, it’s no

00:13:44.079 --> 00:13:48.680
problem. On average, you get home in twenty
minutes. You leave whenever you get out of

00:13:48.680 --> 00:13:54.269
the baggage claim. And while it’s true that
the trip to the airport follows the same distribution,

00:13:54.269 --> 00:13:59.639
the risk to you of not making it to the airport
on time is much greater. If you just allow

00:13:59.639 --> 00:14:03.800
twenty minutes, yeah, sometimes you’ll get
lucky, but every once in a while it will take

00:14:03.800 --> 00:14:06.410
you twenty-five or thirty minutes.

00:14:06.410 --> 00:14:09.740
So what you have to do is allow more time
on the way there so that you don’t miss

00:14:09.740 --> 00:14:14.540
your flight, maybe thirty minutes, maybe
even forty minutes. It all depends on the

00:14:14.540 --> 00:14:19.730
dispersion, or standard deviation, of the
distribution. On the way to the airport, you

00:14:19.730 --> 00:14:25.290
are much more aware of the distribution, if
you will, than you are on the way back.

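NOTE A hedged Python sketch of the airport argument with made-up travel-time distributions, both with a mean of 20 minutes. The 95% on-time target and the helper time_to_allow are assumptions, not from the video; the point is that the realistic, dispersed distribution forces you to allow much more than the mean.
```python
from fractions import Fraction as F
# Hypothetical travel-time distributions: minutes mapped to probability.
ideal = {20: F(1)}
realistic = {10: F(35, 100), 20: F(40, 100), 30: F(15, 100), 40: F(10, 100)}
def mean(dist):
    return sum(p * t for t, p in dist.items())
def std_dev(dist):
    mu = mean(dist)
    return float(sum(p * (t - mu) ** 2 for t, p in dist.items())) ** 0.5
def time_to_allow(dist, on_time_prob):
    # Smallest travel time t whose cumulative probability reaches the target.
    cumulative = F(0)
    for t in sorted(dist):
        cumulative += dist[t]
        if cumulative >= on_time_prob:
            return t
    return max(dist)
for name, d in (("ideal", ideal), ("realistic", realistic)):
    print(name, mean(d), round(std_dev(d), 2), time_to_allow(d, F(95, 100)))
# ideal 20 0.0 20
# realistic 20 9.49 40
```
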
00:14:29.400 --> 00:14:34.320
In this video, we saw how to calculate the
moments of a distribution and how these moments can

00:14:34.320 --> 00:14:39.000
help us quickly summarize the distribution. Like life...

00:14:39.000 --> 00:14:46.000
when something is complicated, simplify it, grasp it, and understand it by appreciating its moments!