1
00:00:00,000 --> 00:00:00,630
2
00:00:00,630 --> 00:00:01,500
Hi.
3
00:00:01,500 --> 00:00:04,750
In this problem, we're dealing
with buses of students going
4
00:00:04,750 --> 00:00:06,450
to a job convention.
5
00:00:06,450 --> 00:00:10,190
And in the problem, we'll
be exercising our
6
00:00:10,190 --> 00:00:11,400
knowledge of PMFs--
7
00:00:11,400 --> 00:00:13,050
probability mass functions.
8
00:00:13,050 --> 00:00:15,060
So we'll get a couple of
opportunities to write out
9
00:00:15,060 --> 00:00:18,210
some PMFs, and also to calculate
expectations or
10
00:00:18,210 --> 00:00:19,700
expected values.
11
00:00:19,700 --> 00:00:22,340
And also, importantly, we'll
actually be exercising our
12
00:00:22,340 --> 00:00:27,510
intuition to help us not just
rely on numbers, but also to
13
00:00:27,510 --> 00:00:30,850
just have a sense of what the
answers to some probability
14
00:00:30,850 --> 00:00:32,930
questions should be.
15
00:00:32,930 --> 00:00:35,850
So the problem specifically
deals with
16
00:00:35,850 --> 00:00:37,710
four buses of students.
17
00:00:37,710 --> 00:00:41,070
So we have buses, and each
one carries a different number
18
00:00:41,070 --> 00:00:41,630
of students.
19
00:00:41,630 --> 00:00:44,960
So the first one carries 40
students, the second one 33,
20
00:00:44,960 --> 00:00:48,790
the third one has 25, and the
last one has 50 students for a
21
00:00:48,790 --> 00:00:54,480
total of 148 students.
22
00:00:54,480 --> 00:00:56,670
And because these students
are smart, and they like
23
00:00:56,670 --> 00:00:58,210
probability, they are
24
00:00:58,210 --> 00:01:00,170
interested in a couple questions.
25
00:01:00,170 --> 00:01:06,370
So suppose that one of these
148 students is chosen
26
00:01:06,370 --> 00:01:09,880
randomly, and so we'll assume
that what that means is that
27
00:01:09,880 --> 00:01:12,360
each one has the same
probability of being chosen.
28
00:01:12,360 --> 00:01:15,120
So they're chosen uniformly
at random.
29
00:01:15,120 --> 00:01:18,580
And let's assign a couple
of random variables.
30
00:01:18,580 --> 00:01:28,100
So we'll say x corresponds to
the number of students in the
31
00:01:28,100 --> 00:01:40,010
bus of the selected student.
32
00:01:40,010 --> 00:01:44,830
OK, so one of these 148 students
is selected uniformly
33
00:01:44,830 --> 00:01:47,670
at random, and we'll let x
correspond to the number of
34
00:01:47,670 --> 00:01:51,310
students in that
student's bus.
35
00:01:51,310 --> 00:01:55,310
So if a student from this bus
was chosen, then x would be
36
00:01:55,310 --> 00:01:57,750
25, for example.
37
00:01:57,750 --> 00:02:00,340
OK, and then let's come up with
another random variable,
38
00:02:00,340 --> 00:02:04,430
y, which is almost
the same thing.
39
00:02:04,430 --> 00:02:08,789
Except instead of now selecting
a random student,
40
00:02:08,789 --> 00:02:11,920
we'll select a random bus.
41
00:02:11,920 --> 00:02:17,110
Or equivalently, we'll select
a random bus driver.
42
00:02:17,110 --> 00:02:20,390
So each bus has one driver, and
instead of selecting one
43
00:02:20,390 --> 00:02:23,320
of the 148 students at random,
we'll select one of the four
44
00:02:23,320 --> 00:02:26,620
bus drivers also uniformly
at random.
45
00:02:26,620 --> 00:02:30,110
And we'll say the number
of students in that
46
00:02:30,110 --> 00:02:32,930
driver's bus will be y.
47
00:02:32,930 --> 00:02:36,940
So for example, if this bus
driver was selected, then y
48
00:02:36,940 --> 00:02:38,820
would be 33.
49
00:02:38,820 --> 00:02:44,640
OK, so the main problem that
we're trying to answer is what
50
00:02:44,640 --> 00:02:47,270
do you expect the
expectation--
51
00:02:47,270 --> 00:02:48,910
which one of these random
variables do you expect to
52
00:02:48,910 --> 00:02:53,050
have the higher expectation or
the higher expected value?
53
00:02:53,050 --> 00:02:56,280
So, would you expect
x to be higher on
54
00:02:56,280 --> 00:02:58,050
average, or y to be higher?
55
00:02:58,050 --> 00:03:00,620
And what would be the
intuition for this?
56
00:03:00,620 --> 00:03:02,910
So obviously, we can actually
write out the
57
00:03:02,910 --> 00:03:03,990
PMFs for x and y.
58
00:03:03,990 --> 00:03:05,780
These are just discrete
random variables.
59
00:03:05,780 --> 00:03:08,170
And we can actually calculate
out what the expectation is.
60
00:03:08,170 --> 00:03:11,190
But it's also useful to exercise
your intuition, and
61
00:03:11,190 --> 00:03:14,260
your sense of what the
answer should be.
62
00:03:14,260 --> 00:03:18,420
So it might not be immediately
clear which one would be
63
00:03:18,420 --> 00:03:20,640
higher, or you might even say
that maybe it doesn't make a
64
00:03:20,640 --> 00:03:21,280
difference.
65
00:03:21,280 --> 00:03:23,350
They're actually the same.
66
00:03:23,350 --> 00:03:27,800
But a useful way to approach
some of these questions is to
67
00:03:27,800 --> 00:03:30,360
try to take things to
the extreme and see
68
00:03:30,360 --> 00:03:31,440
how that plays out.
69
00:03:31,440 --> 00:03:33,580
So let's take the simpler
example and take it to the
70
00:03:33,580 --> 00:03:37,260
extreme and say, suppose instead
of four buses carrying these
71
00:03:37,260 --> 00:03:38,370
numbers of students,
72
00:03:38,370 --> 00:03:39,620
we have only two buses--
73
00:03:39,620 --> 00:03:49,280
one bus that has only 1 student,
and we have another
74
00:03:49,280 --> 00:03:57,280
bus that has 1,000 students.
75
00:03:57,280 --> 00:03:58,370
OK.
76
00:03:58,370 --> 00:04:00,840
And suppose we ask the
same question.
77
00:04:00,840 --> 00:04:05,880
Well, now if you look at it,
there's a total of 1,001
78
00:04:05,880 --> 00:04:06,850
students now.
79
00:04:06,850 --> 00:04:10,770
If you select one of the
students at random, it's
80
00:04:10,770 --> 00:04:14,040
overwhelmingly more likely that
that student will be one
81
00:04:14,040 --> 00:04:17,140
of the 1,000 students
on this huge bus.
82
00:04:17,140 --> 00:04:20,630
It's very unlikely that you'll
get lucky and select the one
83
00:04:20,630 --> 00:04:23,140
student who is by himself.
84
00:04:23,140 --> 00:04:27,210
And so because of that, you
have a very high chance of
85
00:04:27,210 --> 00:04:30,930
selecting the bus with the
high number of students.
86
00:04:30,930 --> 00:04:33,710
And so you would expect
x, the number of
87
00:04:33,710 --> 00:04:37,490
students, to be high--
88
00:04:37,490 --> 00:04:40,590
to be almost 1,000 in
the expectation.
89
00:04:40,590 --> 00:04:44,840
But on the other hand, if you
selected the driver at random,
90
00:04:44,840 --> 00:04:46,880
then you have a 50/50
chance of selecting
91
00:04:46,880 --> 00:04:48,430
this one or that one.
92
00:04:48,430 --> 00:04:54,210
And so you would expect the
expectation there to be
93
00:04:54,210 --> 00:04:56,160
roughly 500 or so.
94
00:04:56,160 --> 00:04:59,740
And so you can see that if you
take this to the extreme, then
95
00:04:59,740 --> 00:05:03,240
it becomes more clear what
the answer would be.
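To put numbers on the extreme case, here is a minimal Python sketch. The function name `expected_sizes` is made up for illustration and is not part of the original problem; it assumes a uniformly random student for x and a uniformly random driver for y, as described above.

```python
from fractions import Fraction

def expected_sizes(bus_sizes):
    """Return (E[X], E[Y]) for a list of bus sizes.

    E[X]: pick a student uniformly at random, report the size of that student's bus.
    E[Y]: pick a bus (driver) uniformly at random, report the size of that bus.
    """
    total_students = sum(bus_sizes)
    num_buses = len(bus_sizes)
    # A bus with n students is selected with probability n / total_students.
    e_x = sum(Fraction(n, total_students) * n for n in bus_sizes)
    # Every bus is selected with probability 1 / num_buses.
    e_y = sum(Fraction(1, num_buses) * n for n in bus_sizes)
    return e_x, e_y

e_x, e_y = expected_sizes([1, 1000])
print(float(e_x))  # just under 1,000
print(float(e_y))  # 500.5
```

Exact fractions are used here only to avoid rounding noise; plain floats would show the same gap.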
96
00:05:03,240 --> 00:05:06,650
And the argument is that the
expectation of x should be
97
00:05:06,650 --> 00:05:10,930
higher than the expectation of
y, and the reason here is that
98
00:05:10,930 --> 00:05:14,250
because you select the student
at random, you're more likely
99
00:05:14,250 --> 00:05:18,410
to select a student who is in a
large bus, because that bus
100
00:05:18,410 --> 00:05:20,920
just has more students
to select from.
101
00:05:20,920 --> 00:05:23,910
And because of that, you're
more biased in favor of
102
00:05:23,910 --> 00:05:27,980
selecting large buses, and
therefore, that makes x higher
103
00:05:27,980 --> 00:05:29,910
in expectation.
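The same argument can be sketched in symbols. The notation here is an assumption added for illustration: n_1, ..., n_k are the bus sizes and N is their sum, with capital X and Y standing for the x and y defined above.

```latex
\begin{align*}
E[X] &= \sum_{i=1}^{k} n_i \cdot \frac{n_i}{N} = \frac{\sum_{i} n_i^2}{N},
\qquad
E[Y] = \sum_{i=1}^{k} n_i \cdot \frac{1}{k} = \frac{N}{k}, \\
E[X] &= \frac{\tfrac{1}{k}\sum_{i} n_i^2}{\tfrac{1}{k} N} = \frac{E[Y^2]}{E[Y]}, \\
E[X] - E[Y] &= \frac{E[Y^2] - (E[Y])^2}{E[Y]} = \frac{\operatorname{var}(Y)}{E[Y]} \ge 0 .
\end{align*}
```

So under these assumptions the two expectations agree only when every bus carries the same number of students.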
104
00:05:29,910 --> 00:05:32,580
OK, so that's the intuition
behind this problem.
105
00:05:32,580 --> 00:05:34,240
And now, let's actually go
through some of the
106
00:05:34,240 --> 00:05:38,100
mechanics and write out what the
PMFs and the calculation
107
00:05:38,100 --> 00:05:40,640
for the expectation would be to
verify that our intuition
108
00:05:40,640 --> 00:05:42,400
is actually correct.
109
00:05:42,400 --> 00:05:46,020
OK, so we have two random
variables that are defined.
110
00:05:46,020 --> 00:05:48,940
Now let's just write out
what their PMFs are.
111
00:05:48,940 --> 00:05:51,270
So the PMF--
112
00:05:51,270 --> 00:05:58,240
we write it as little p, subscript
capital X, of little x.
113
00:05:58,240 --> 00:06:00,740
So the random variable-- what
we do is we say the
114
00:06:00,740 --> 00:06:03,970
probability that it will take
on a certain value, right?
115
00:06:03,970 --> 00:06:09,210
So what is the probability
that x will be 40?
116
00:06:09,210 --> 00:06:13,030
Well, x will be 40
if a student from
117
00:06:13,030 --> 00:06:14,810
this bus was selected.
118
00:06:14,810 --> 00:06:16,870
And what's the probability that
a student from this bus
119
00:06:16,870 --> 00:06:17,570
is selected?
120
00:06:17,570 --> 00:06:23,230
That probability is 40/148,
because there's 148 students,
121
00:06:23,230 --> 00:06:27,160
40 of whom are sitting
in this bus.
122
00:06:27,160 --> 00:06:35,470
And similarly, x will be 33 with
probability 33/148, and x
123
00:06:35,470 --> 00:06:40,750
will be 25 with probability
25/148.
124
00:06:40,750 --> 00:06:45,120
And x will be 50 with
probability 50/148.
125
00:06:45,120 --> 00:06:47,030
And it will be 0 otherwise.
126
00:06:47,030 --> 00:06:51,750
127
00:06:51,750 --> 00:06:57,440
OK, so there is our PMF for
x, and we can do the
128
00:06:57,440 --> 00:06:59,920
same thing for y.
129
00:06:59,920 --> 00:07:02,060
The PMF of y--
130
00:07:02,060 --> 00:07:05,160
again, we say what is the
probability that y will take
131
00:07:05,160 --> 00:07:06,150
on certain values?
132
00:07:06,150 --> 00:07:09,900
Well, y can take on the same
values as x can, because we're
133
00:07:09,900 --> 00:07:12,580
still dealing with the number
of students in each bus.
134
00:07:12,580 --> 00:07:14,910
So y can be 40.
135
00:07:14,910 --> 00:07:17,390
But the probability that y is
40, because we're selecting
136
00:07:17,390 --> 00:07:20,290
the driver at random
now, is 1/4, right?
137
00:07:20,290 --> 00:07:23,260
Because there's a 1/4 chance
that we'll pick this driver.
138
00:07:23,260 --> 00:07:27,960
And the probability that y will
be 33 will also be 1/4,
139
00:07:27,960 --> 00:07:35,840
and the same thing
for 25 and 50.
140
00:07:35,840 --> 00:07:42,260
And it's 0 otherwise.
141
00:07:42,260 --> 00:07:49,690
OK, so those are the PMFs
for our two random
142
00:07:49,690 --> 00:07:51,950
variables, x and y.
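The two PMFs can be written down as small lookup tables. This is only a sketch: the names `pmf_x`, `pmf_y`, and `prob` are illustrative, and storing the PMF in a dictionary keyed by bus size assumes the four sizes are distinct, which they are here.

```python
from fractions import Fraction

bus_sizes = [40, 33, 25, 50]
total_students = sum(bus_sizes)  # 148

# PMF of x: a bus carrying n students is hit with probability n/148.
pmf_x = {n: Fraction(n, total_students) for n in bus_sizes}
# PMF of y: each of the four drivers is equally likely.
pmf_y = {n: Fraction(1, len(bus_sizes)) for n in bus_sizes}

def prob(pmf, value):
    """Look up a PMF; any value not listed has probability 0."""
    return pmf.get(value, Fraction(0))

# Sanity check: a valid PMF sums to 1.
assert sum(pmf_x.values()) == 1
assert sum(pmf_y.values()) == 1

# Fractions are stored in lowest terms, so 40/148 is shown as 10/37.
print(prob(pmf_x, 40), prob(pmf_y, 40), prob(pmf_x, 41))
```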
143
00:07:51,950 --> 00:07:55,630
And we can also draw out what
the PMFs look like.
144
00:07:55,630 --> 00:08:14,730
So if this is 25, 30, 35,
40, 45, and 50, then the
145
00:08:14,730 --> 00:08:17,650
probability that it's
25 is 25/148.
146
00:08:17,650 --> 00:08:21,290
So we can draw a mass
right there.
147
00:08:21,290 --> 00:08:24,130
For 33, it's a little
higher, because it's
148
00:08:24,130 --> 00:08:27,440
33/148 instead of 25/148.
149
00:08:27,440 --> 00:08:29,260
For 40, it's even
higher still.
150
00:08:29,260 --> 00:08:30,220
It's 40/148.
151
00:08:30,220 --> 00:08:39,380
And for 50, it is still higher,
because it is 50/148.
152
00:08:39,380 --> 00:08:44,620
And so you can see that the PMF
is more heavily favored
153
00:08:44,620 --> 00:08:47,410
towards the larger values.
154
00:08:47,410 --> 00:08:51,610
We can do the same thing for
y, and we'll notice that
155
00:08:51,610 --> 00:08:54,690
there's a difference in how
these distributions look.
156
00:08:54,690 --> 00:09:00,460
157
00:09:00,460 --> 00:09:05,280
So if we do the same thing, the
difference now is that all
158
00:09:05,280 --> 00:09:11,500
four of these masses will
have the same height.
159
00:09:11,500 --> 00:09:16,240
Each one will have height 1/4,
whereas this one for x, it's
160
00:09:16,240 --> 00:09:18,710
more heavily biased in favor
of the larger ones.
161
00:09:18,710 --> 00:09:21,410
And so because of that, we can
actually now calculate what
162
00:09:21,410 --> 00:09:24,740
the expectations are and figure
out whether or not our
163
00:09:24,740 --> 00:09:27,600
intuition was correct.
164
00:09:27,600 --> 00:09:30,760
OK, so now let's actually
calculate out what these
165
00:09:30,760 --> 00:09:33,610
expectations are.
166
00:09:33,610 --> 00:09:37,880
So as you recall, the
expectation is calculated out
167
00:09:37,880 --> 00:09:39,830
as a weighted sum.
168
00:09:39,830 --> 00:09:45,490
So for each possible value of x,
you take that value and you
169
00:09:45,490 --> 00:09:50,070
weight it by the probability of
the random variable taking
170
00:09:50,070 --> 00:09:52,080
on that value.
171
00:09:52,080 --> 00:09:59,920
So in this case, it would be
40 times 40/148, 33 times
172
00:09:59,920 --> 00:10:10,650
33/148,
173
00:10:10,650 --> 00:10:20,760
plus 25 times 25/148
plus 50 times 50/148.
174
00:10:20,760 --> 00:10:25,810
And if you do out this
calculation, what you'll get
175
00:10:25,810 --> 00:10:30,820
is that it is around 39.
176
00:10:30,820 --> 00:10:33,070
Roughly 39.
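Written out as a worked equation, with capital X standing for x, the weighted sum is:

```latex
\begin{align*}
E[X] &= 40\cdot\frac{40}{148} + 33\cdot\frac{33}{148} + 25\cdot\frac{25}{148} + 50\cdot\frac{50}{148} \\
     &= \frac{1600 + 1089 + 625 + 2500}{148} = \frac{5814}{148} \approx 39.28 .
\end{align*}
```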
177
00:10:33,070 --> 00:10:36,910
And now we can do the
same thing for y.
178
00:10:36,910 --> 00:10:41,650
But for y, it's different,
because now instead of
179
00:10:41,650 --> 00:10:44,650
weighting it by the probabilities
for x, we'll weight it
180
00:10:44,650 --> 00:10:45,920
by the probabilities for y.
181
00:10:45,920 --> 00:10:48,600
So each one has the same
weight of 1/4.
182
00:10:48,600 --> 00:10:55,390
So now we get 40 times 1/4
plus 33 times 1/4
183
00:10:55,390 --> 00:11:01,130
plus 25 times 1/4
plus 50 times 1/4.
184
00:11:01,130 --> 00:11:06,030
And if you do out this
arithmetic, what you get is
185
00:11:06,030 --> 00:11:10,310
that this expectation is 37.
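As a worked equation, with capital Y standing for y, the equal weights of 1/4 factor out:

```latex
\begin{align*}
E[Y] &= 40\cdot\frac{1}{4} + 33\cdot\frac{1}{4} + 25\cdot\frac{1}{4} + 50\cdot\frac{1}{4}
      = \frac{40 + 33 + 25 + 50}{4} = \frac{148}{4} = 37 .
\end{align*}
```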
186
00:11:10,310 --> 00:11:13,930
And so what we get is that, in
fact, after we do out the
187
00:11:13,930 --> 00:11:17,090
calculations, the expected value
of x is indeed greater
188
00:11:17,090 --> 00:11:20,110
than the expected value of y,
which confirms our intuition.
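One way to double-check the comparison is a quick simulation. This is only a sketch and not part of the original solution; the function name `simulate`, the trial count, and the seed are arbitrary choices made for illustration.

```python
import random

def simulate(bus_sizes, trials=200_000, seed=0):
    """Estimate E[X] and E[Y] by repeated random selection."""
    rng = random.Random(seed)
    # One entry per student, holding the size of that student's bus.
    student_bus_size = [n for n in bus_sizes for _ in range(n)]
    # x: choose a student uniformly at random, record the bus size.
    avg_x = sum(rng.choice(student_bus_size) for _ in range(trials)) / trials
    # y: choose a driver (a bus) uniformly at random, record the bus size.
    avg_y = sum(rng.choice(bus_sizes) for _ in range(trials)) / trials
    return avg_x, avg_y

avg_x, avg_y = simulate([40, 33, 25, 50])
print(avg_x, avg_y)  # estimates; they should land near 39.28 and 37
```

The estimates vary with the seed and the number of trials, so they should be read as approximate agreement with the exact values, not as a replacement for the calculation.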
189
00:11:20,110 --> 00:11:24,310
OK, so this problem, to
summarize-- we've reviewed how
190
00:11:24,310 --> 00:11:27,650
to write out a PMF and also how
to calculate expectations.
191
00:11:27,650 --> 00:11:33,540
But also, we've got a chance to
figure out some intuition
192
00:11:33,540 --> 00:11:35,840
behind some of these problems.
193
00:11:35,840 --> 00:11:39,480
And so sometimes it's helpful
to take simpler things and
194
00:11:39,480 --> 00:11:42,250
take things to the extreme and
figure out intuitively whether
195
00:11:42,250 --> 00:11:43,520
or not the answer makes sense.
196
00:11:43,520 --> 00:11:47,530
It's useful just to verify
whether the numerical answer
197
00:11:47,530 --> 00:11:48,950
that you get in the
end is correct.
198
00:11:48,950 --> 00:11:50,440
Does this actually make sense?
199
00:11:50,440 --> 00:11:53,850
It's a useful guide for when
you're solving these problems.
200
00:11:53,850 --> 00:11:55,310
OK, so we'll see you next time.
201
00:11:55,310 --> 00:11:56,560