1
00:00:05,500 --> 00:00:09,600
Why does going to the airport seem to require
extra time compared with coming back from
2
00:00:09,600 --> 00:00:15,080
the airport even if the traffic is the same
in both directions? The answer must somehow
3
00:00:15,080 --> 00:00:19,900
depend on more than just the average travel
time, which we’re assuming is the same and
4
00:00:19,900 --> 00:00:27,760
often is. In fact, it depends on the distribution
of travel times. Probability distributions
5
00:00:27,769 --> 00:00:32,980
are fully described by listing or graphing
every probability. For example, how likely
6
00:00:32,980 --> 00:00:38,350
is a journey to the airport to be between
10 and 20 minutes? How likely is a 20–30
7
00:00:38,350 --> 00:00:43,550
minute journey? A 30–40 minute journey?
And so on. We’ll answer the airport question
8
00:00:43,550 --> 00:00:45,590
at the end of the video.
9
00:00:45,590 --> 00:00:50,379
This video is part of the Probability and
Statistics video series. Many natural and
10
00:00:50,379 --> 00:00:55,960
social phenomena are probabilistic in nature.
Engineers, scientists, and policymakers often
11
00:00:55,960 --> 00:00:59,390
use probability to model and predict system
behavior.
12
00:00:59,390 --> 00:01:04,720
Hi, my name is Sanjoy Mahajan, and I’m a
professor of Applied Science and Engineering
13
00:01:04,720 --> 00:01:10,420
at Olin College. Before watching this video,
you should be proficient with integration
14
00:01:10,420 --> 00:01:14,020
and have some familiarity with probabilities.
15
00:01:14,020 --> 00:01:16,920
After watching this video, you will be able
to:
16
00:01:16,920 --> 00:01:20,140
Explain what moments of distributions are,
and
17
00:01:20,140 --> 00:01:25,400
Compute moments and understand what they mean.
18
00:01:27,658 --> 00:01:34,658
To illustrate what a probability distribution
is, let’s consider rolling two fair dice. The
19
00:01:34,670 --> 00:01:39,549
probability distribution of their sum is this
table. For example, the only way to get a
20
00:01:39,549 --> 00:01:46,329
sum of two is to roll a 1 on each die. And,
there are 36 possible rolls for a pair of
21
00:01:46,329 --> 00:01:53,329
dice. So, getting a sum of two has a probability
of 1 over 36. The probability of rolling a
22
00:01:53,880 --> 00:02:00,880
sum of 3 is 2 over 36. And so on and so forth.
You can fill in a table like this yourself.
23
00:02:01,090 --> 00:02:06,219
But the whole distribution, even for something
as simple as two dice, is usually too much
24
00:02:06,219 --> 00:02:07,860
information.
25
00:02:07,860 --> 00:02:12,790
We often want to characterize the shape of
the distribution using only a few numbers.
26
00:02:12,790 --> 00:02:19,700
Of course, that throws away information, but
throwing away information is the only way
27
00:02:19,700 --> 00:02:23,200
to fit the complexity of the world into our
brains.
28
00:02:23,200 --> 00:02:29,040
The art comes in keeping the most important
information. Finding the moments of a distribution
29
00:02:29,040 --> 00:02:34,959
can help us reach our goal. Two moments that
you are probably already familiar with are
30
00:02:34,959 --> 00:02:39,690
mean and variance. They are the two most important
moments of distributions.
31
00:02:39,690 --> 00:02:47,150
Let’s define these moments more formally.
The mean is the first moment of a distribution.
32
00:02:47,150 --> 00:02:54,599
It is also called the expected value and is
computed as shown. Expected value of x, that’s
33
00:02:54,599 --> 00:03:00,349
x with angled brackets around it, is equal
to this sum. It’s the weighted sum of all
34
00:03:00,349 --> 00:03:06,069
of the x’s weighted by their probabilities.
Let the x sub i be the possible values of
35
00:03:06,069 --> 00:03:07,569
x.
36
00:03:07,569 --> 00:03:14,400
For example, for the rolling of two dice,
the possible values for x sub i would be 2, 3, 4,
37
00:03:14,400 --> 00:03:19,220
all the way up through 12. And p sub i would
be the corresponding probabilities of rolling
38
00:03:19,220 --> 00:03:24,540
those sums - so that was 1 over 36, 2 over
36, and so on.
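The weighted sum just described can be checked numerically with a small sketch (hypothetical Python, not from the video), applied to the two-dice distribution:

```python
from collections import Counter
from fractions import Fraction

# Probabilities p_i for each possible sum x_i of two fair dice.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
probs = {s: Fraction(c, 36) for s, c in counts.items()}

# First moment: the sum of p_i * x_i over all possible values.
mean = sum(p * x for x, p in probs.items())
print(mean)  # 7, right in the middle of 2..12
```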
39
00:03:24,540 --> 00:03:29,840
So, the first moment gives us some idea of
what our distribution might look like, but
40
00:03:29,840 --> 00:03:34,900
not much. Think about it like this: the center
of mass in these two images is in the same
41
00:03:34,900 --> 00:03:39,930
place, but the mass is actually distributed
very differently in the two cases. We need
42
00:03:39,930 --> 00:03:41,409
more information.
43
00:03:41,409 --> 00:03:46,099
The second moment can help us. The second
moment is very similar in structure to the
44
00:03:46,099 --> 00:03:51,819
first moment. We write it the same way with
angled brackets, but now we’re talking about
45
00:03:51,819 --> 00:03:58,379
the expected value of x squared. So it’s
still a sum and it’s still weighted by the
46
00:03:58,379 --> 00:04:04,340
probabilities p sub i, but now we square each
possible x value. For the dice example that
47
00:04:04,340 --> 00:04:10,920
was the values from two through twelve. This
is also called the mean square. First you
48
00:04:10,920 --> 00:04:16,829
square the x values, then you take the mean,
weighting each x sub i by its probability,
49
00:04:16,829 --> 00:04:17,779
p sub i.
50
00:04:17,779 --> 00:04:24,780
In general, the nth moment is defined as follows.
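The general definition lends itself to a one-line helper; here is a sketch in Python (the function name `moment` is my own, not from the video):

```python
from collections import Counter
from fractions import Fraction

def moment(probs, n):
    """nth moment: the sum of p_i * x_i**n over the distribution."""
    return sum(p * x**n for x, p in probs.items())

# The two-dice example from earlier in the video.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
probs = {s: Fraction(c, 36) for s, c in counts.items()}

print(moment(probs, 1))  # first moment (mean): 7
print(moment(probs, 2))  # second moment (mean square): 329/6
```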
51
00:04:27,590 --> 00:04:32,479
So how does the second moment help us get
a better picture of our distribution? Because
52
00:04:32,479 --> 00:04:38,300
it can help us calculate something called
the variance. The variance measures how spread
53
00:04:38,300 --> 00:04:44,229
out the distribution is around the mean. To
calculate the variance, you first subtract
54
00:04:44,229 --> 00:04:49,710
the mean from each x sub i – this is like
finding the distance of each x sub i from
55
00:04:49,710 --> 00:04:56,710
the mean – and then you square the result
and multiply by p sub i, summing over all the possible values.
56
00:04:59,620 --> 00:05:04,930
What are the dimensions of the variance? The
square of the dimensions of x. For example
57
00:05:04,930 --> 00:05:10,660
if the dimension is a length, then the variance
is a length squared. But we often want a measure
58
00:05:10,660 --> 00:05:16,490
of dispersion like the variance, but one that
has the same dimensions as x itself. That
59
00:05:16,490 --> 00:05:22,320
measure is the standard deviation, sigma.
Sigma is defined as the square root of the
60
00:05:22,320 --> 00:05:27,520
variance. So if the variable x has dimensions
of length, then the variance will have dimensions
61
00:05:27,520 --> 00:05:32,470
of length squared, but the standard deviation,
sigma, will have dimensions of length so it’s
62
00:05:32,470 --> 00:05:35,000
comparable to x directly.
63
00:05:35,000 --> 00:05:40,350
This expression for the variance looks like
a pain to compute, but it has an alternative
64
00:05:40,350 --> 00:05:45,320
expression that is much simpler. And you get
to show that as one of the exercises after
65
00:05:45,320 --> 00:05:51,490
the video. The alternative expression, the
much simpler one, is that the variance is
66
00:05:51,490 --> 00:05:57,159
equal to the second moment, our old friend,
minus the square of the first moment, or the
67
00:05:57,159 --> 00:05:58,240
mean.
68
00:05:58,240 --> 00:06:05,240
Pause the video here to convince yourself
that this difference is always non-negative.
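One way to convince yourself is to compute the variance both ways for the two-dice example; both forms agree and the result is non-negative (a Python sketch, not part of the video):

```python
from collections import Counter
from fractions import Fraction

counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
probs = {s: Fraction(c, 36) for s, c in counts.items()}

mean = sum(p * x for x, p in probs.items())

# Definition: probability-weighted squared distance from the mean.
var_def = sum(p * (x - mean) ** 2 for x, p in probs.items())

# Alternative form: second moment minus the squared mean.
var_alt = sum(p * x**2 for x, p in probs.items()) - mean**2

print(var_def, var_alt)  # both 35/6, and never negative
```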
69
00:06:09,729 --> 00:06:15,050
This alternative expression for the variance,
this much more useful one, is also the parallel
70
00:06:15,050 --> 00:06:19,990
axis theorem in mechanics, which says that
the moment of inertia of an object about the
71
00:06:19,990 --> 00:06:25,160
center of mass is equal to the moment of inertia
about an axis shifted by h from the center
72
00:06:25,160 --> 00:06:29,860
of mass, a parallel shift, minus mh squared.
73
00:06:29,860 --> 00:06:36,350
So how does this analogy work? This, the dispersion
around the mean, which is here at the center
74
00:06:36,350 --> 00:06:42,610
of mass, is like the variance. This is like
the second moment if we make h equal to the
75
00:06:42,610 --> 00:06:50,389
mean. So this is the dispersion around zero
or its second moment. So this is like x squared
76
00:06:50,389 --> 00:06:56,580
in angle brackets, the expected value of x squared. The mass is the sum total
of all the weights here for each of xi which
77
00:06:56,580 --> 00:07:03,580
all add up to one. So this is just like one
in this problem. And then the h squared, well
78
00:07:03,639 --> 00:07:06,840
h is the mean, so this is x squared.
79
00:07:06,840 --> 00:07:12,910
So you can see the exact same structure repeated
with h, the shift of axis as the mean, and
80
00:07:12,910 --> 00:07:19,500
m the mass, as the sum of all probabilities
which is one. So this formula for the variance
81
00:07:19,500 --> 00:07:24,080
is also the parallel axis theorem.
82
00:07:26,900 --> 00:07:32,100
Let’s use the definitions of the moments,
and also of the related quantity, the variance,
83
00:07:32,110 --> 00:07:34,460
and practice on a few distributions.
84
00:07:34,460 --> 00:07:39,639
A simple discrete distribution is a single
coin flip. Instead of thinking of the coin
85
00:07:39,639 --> 00:07:43,889
flip as resulting in heads or tails, let’s
think about the coin as turning up a zero
86
00:07:43,889 --> 00:07:47,970
or one. Let p be the probability of a one.
87
00:07:47,970 --> 00:07:53,560
So the mean is the weighted sum of the xi’s,
weighted by the probabilities. So the mean
88
00:07:53,560 --> 00:08:02,340
x is the sum pi xi which is equal to one minus
p times zero plus p times one which is equal
89
00:08:02,349 --> 00:08:03,620
to p.
90
00:08:03,620 --> 00:08:11,040
What about the second moment? X squared, it’s
equal to the weighted sum of the xi’s squared
91
00:08:11,050 --> 00:08:18,970
so the weights are the same and we can square
each value here, the xi’s, but since they’re
92
00:08:18,970 --> 00:08:24,919
all zero or one, squaring doesn’t change
them. So the second moment and the third moment
93
00:08:24,919 --> 00:08:32,759
and every higher moment are all p. Pause the
video here and compute the variance and sketch
94
00:08:32,760 --> 00:08:35,919
it as a function of p.
95
00:08:40,690 --> 00:08:44,750
The variance from our old convenient form
of the formula is… variance of x is the
96
00:08:44,750 --> 00:08:49,680
mean square minus the squared
mean, and all the moments themselves were just
97
00:08:49,680 --> 00:08:56,580
p. So that’s p minus p squared which is
equal to p times 1 minus p.
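The coin-flip result can be checked directly from the moments; a sketch (the helper function is my own, not the video's notation):

```python
def bernoulli_variance(p):
    """Variance of a coin that shows 1 with probability p, else 0."""
    probs = {0: 1 - p, 1: p}
    mean = sum(q * x for x, q in probs.items())
    mean_square = sum(q * x**2 for x, q in probs.items())
    # Every moment of a 0/1 variable is p, so this is p - p**2.
    return mean_square - mean**2

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, bernoulli_variance(p))  # zero at both extremes, max at 0.5
```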
98
00:08:56,580 --> 00:09:03,339
What does that look like? We sketch it. P
on this axis, variance on that axis and the
99
00:09:03,339 --> 00:09:09,310
curve starts at zero, rises to a maximum,
and goes back to zero.
100
00:09:09,310 --> 00:09:15,450
This is at p equals 1 and that’s at p equals
zero. Does that make sense?
101
00:09:15,450 --> 00:09:22,080
Yeah, it does… from the meaning of variance
as dispersion around the mean. So take the
102
00:09:22,080 --> 00:09:27,430
first extreme case of p equals zero. In other
words, the coin has no chance of producing
103
00:09:27,430 --> 00:09:33,730
a one, always produces a zero every time.
There the mean is zero and there is no dispersion
104
00:09:33,730 --> 00:09:40,060
because it always produces zero. The same
applies when p equals one here at this extreme.
105
00:09:40,060 --> 00:09:45,560
The coin always produces a one with no dispersion.
There is no variation, there is no variance
106
00:09:45,560 --> 00:09:52,420
and it’s plausible that the variance should
be a maximum right in between… here at p
107
00:09:52,420 --> 00:09:59,100
equals one half which it is on this curve.
So everything looks good. Our calculation
108
00:09:59,100 --> 00:10:03,540
seems reasonable and checks out in the extreme
cases.
109
00:10:03,540 --> 00:10:07,900
Before we go back to the airport problem,
let’s extend the idea of moments to continuous
110
00:10:07,900 --> 00:10:09,620
distributions.
111
00:10:09,630 --> 00:10:14,459
Here, instead of a list of probabilities for
each possible x, we have a probability density
112
00:10:14,459 --> 00:10:21,010
p as a function of x, where x is now a continuous
variable. The discrete version
113
00:10:21,010 --> 00:10:27,880
of the nth moment was a sum of xi to the
nth weighted by the probabilities. Here, the
114
00:10:27,880 --> 00:10:34,540
nth moment, the expected value of x to the n,
is equal to an integral instead of a sum. Weighted again, as always,
115
00:10:34,540 --> 00:10:42,340
by the probability times x to the n, as before
and with a dx because p of x times dx is the
116
00:10:42,340 --> 00:10:48,389
probability and you add them all up over all
possible values of x. That’s the formula
117
00:10:48,399 --> 00:10:50,769
for the moments
of a continuous distribution.
118
00:10:50,769 --> 00:10:57,170
Let’s practice on the simplest continuous
distribution, the uniform distribution. X
119
00:10:57,170 --> 00:11:03,420
is equally likely to be any real number between
zero and one. That’s the distribution and
120
00:11:03,420 --> 00:11:07,450
we can compute the first and second moments
and the variance.
121
00:11:07,450 --> 00:11:12,700
Pause the video here, use the definition
of moments for a continuous distribution and
122
00:11:12,720 --> 00:11:19,720
compute the mean, first moment, the second
moment, and from those two, the variance.
123
00:11:27,930 --> 00:11:33,240
What you should have found is … for the
mean, it’s the integral of one because p
124
00:11:33,240 --> 00:11:40,980
of x is one, times x between zero and one
dx, which is x squared over two evaluated
125
00:11:40,990 --> 00:11:47,649
between zero and one, which equal one half…
which makes sense. The mean here, the average
126
00:11:47,649 --> 00:11:52,680
value is just one-half right in the middle
of the distribution of the possible values
127
00:11:52,680 --> 00:11:54,279
of x.
128
00:11:54,279 --> 00:12:02,159
What about the mean square? For that, you should
have found almost the same calculation, one
129
00:12:02,160 --> 00:12:09,519
times x squared dx, which equals x cubed over
3 between zero and one equals one-third. And
130
00:12:09,519 --> 00:12:15,200
thus, the variance is equal to one-third,
that’s the mean square minus the squared
131
00:12:15,200 --> 00:12:22,720
mean, one fourth, which is… one twelfth. And that number
is familiar. That’s the same 1/12 that shows
132
00:12:22,730 --> 00:12:28,500
up in the moment of inertia of a ruler of
length l and mass m. Its moment of inertia
133
00:12:28,500 --> 00:12:34,600
is 1/12 ml squared which illustrates again
the connection between moments of inertia
134
00:12:34,600 --> 00:12:36,570
and moments of distributions.
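The three uniform-distribution results can be checked by approximating the integrals numerically (a Python sketch using a midpoint sum; the step count is an arbitrary choice of mine):

```python
def moment_uniform(n, steps=100_000):
    """Midpoint-rule estimate of the nth moment of the uniform
    distribution on [0, 1]: the integral of x**n * p(x) dx with p(x) = 1.
    The exact answer is 1 / (n + 1)."""
    dx = 1.0 / steps
    return sum(((i + 0.5) * dx) ** n * dx for i in range(steps))

mean = moment_uniform(1)           # approaches 1/2
mean_square = moment_uniform(2)    # approaches 1/3
variance = mean_square - mean**2   # approaches 1/12
print(mean, mean_square, variance)
```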
135
00:12:36,570 --> 00:12:42,589
Let’s apply our knowledge to understand
quantitatively, or in a formal way, what happens
136
00:12:42,589 --> 00:12:47,600
with airport travel – why does it seem
so much longer on the way there, than on the
137
00:12:47,600 --> 00:12:48,370
way back?
138
00:12:48,370 --> 00:12:53,839
Here is the ideal travel experience to
the airport, the distribution of travel times
139
00:12:53,839 --> 00:12:59,079
t. Here's the probability of each particular travel time, p
140
00:12:59,079 --> 00:13:06,219
of t. In the ideal world, the travel time
would be very predictable. Let’s say it
141
00:13:06,230 --> 00:13:11,570
would be almost always twenty minutes. In
that case, you would allow twenty minutes
142
00:13:11,570 --> 00:13:16,329
to get to the airport and you would allow
twenty minutes on the way back. Going there
143
00:13:16,329 --> 00:13:18,399
and coming back would seem the same.
144
00:13:18,399 --> 00:13:24,070
But, here’s what travel to the airport actually
looks like. Let’s say the mean is still
145
00:13:24,070 --> 00:13:30,139
the same, but the reality is that there’s
lots of dispersion. And so the curve actually
146
00:13:30,139 --> 00:13:36,360
looks like that. Sometimes the travel time
will be 30 minutes, sometimes 40, sometimes
147
00:13:36,360 --> 00:13:38,630
10.
148
00:13:38,630 --> 00:13:44,079
So now, what do you have to do?... this is
reality. Well, on the way home, it’s no
149
00:13:44,079 --> 00:13:48,680
problem. On average, you get home in twenty
minutes. You leave whenever you get out of
150
00:13:48,680 --> 00:13:54,269
the baggage claim. And while it’s true that
the trip to the airport follows the same distribution,
151
00:13:54,269 --> 00:13:59,639
the risk to you of not making it to the airport
on time is much greater. If you just allow
152
00:13:59,639 --> 00:14:03,800
twenty minutes, yeah, sometimes you’ll get
lucky, but every once in a while it will take
153
00:14:03,800 --> 00:14:06,410
you twenty-five or thirty minutes.
154
00:14:06,410 --> 00:14:09,740
So what you have to do is allow more time
on the way there so that you don’t miss
155
00:14:09,740 --> 00:14:14,540
your flight - maybe thirty minutes, maybe
even forty minutes. It all depends on the
156
00:14:14,540 --> 00:14:19,730
dispersion, or standard deviation, of the
distribution. On the way to the airport, you
157
00:14:19,730 --> 00:14:25,290
are much more aware of the distribution, if
you will, than you are on the way back.
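The asymmetry can be made concrete with a small simulation. This sketch assumes, purely for illustration, that travel time is roughly normal with a 20-minute mean; the function and its parameters are my own, not from the video:

```python
import random

random.seed(0)  # reproducible illustration

def minutes_to_allow(sigma, reliability=0.95, trials=100_000):
    """Time budget needed to arrive on time in the given fraction of
    trips, for a (hypothetical) roughly normal travel-time distribution
    with a 20-minute mean and standard deviation sigma."""
    times = sorted(max(0.0, random.gauss(20, sigma)) for _ in range(trials))
    return times[int(reliability * trials)]

print(minutes_to_allow(0.5))  # predictable traffic: barely over 20 minutes
print(minutes_to_allow(8.0))  # same mean, wide dispersion: roughly 33
```

The mean is identical in both cases; only the standard deviation changes how much time you must budget to avoid missing the flight.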
158
00:14:29,400 --> 00:14:34,320
In this video, we saw how to calculate the
moments of a distribution and how these moments can
159
00:14:34,320 --> 00:14:39,000
help us quickly summarize the distribution. Like life...
160
00:14:39,000 --> 00:14:46,000
when something is complicated, simplify it, grasp it, and understand it by appreciating its moments!