1 00:00:00,000 --> 00:00:00,630 2 00:00:00,630 --> 00:00:01,500 Hi. 3 00:00:01,500 --> 00:00:04,750 In this problem, we're dealing with buses of students going 4 00:00:04,750 --> 00:00:06,450 to a job convention. 5 00:00:06,450 --> 00:00:10,190 And in the problem, we'll be exercising our 6 00:00:10,190 --> 00:00:11,400 knowledge of PMFs-- 7 00:00:11,400 --> 00:00:13,050 probability mass functions. 8 00:00:13,050 --> 00:00:15,060 So we'll get a couple of opportunities to write out 9 00:00:15,060 --> 00:00:18,210 some PMFs, and also to calculate expectations or 10 00:00:18,210 --> 00:00:19,700 expected values. 11 00:00:19,700 --> 00:00:22,340 And also, importantly, we'll actually be exercising our 12 00:00:22,340 --> 00:00:27,510 intuition to help us not just rely on numbers, but also to 13 00:00:27,510 --> 00:00:30,850 just have a sense of what the answers to some probability 14 00:00:30,850 --> 00:00:32,930 questions should be. 15 00:00:32,930 --> 00:00:35,850 So the problem specifically deals with 16 00:00:35,850 --> 00:00:37,710 four buses of students. 17 00:00:37,710 --> 00:00:41,070 So we have buses, and each one carries a different number 18 00:00:41,070 --> 00:00:41,630 of students. 19 00:00:41,630 --> 00:00:44,960 So the first one carries 40 students, the second one 33, 20 00:00:44,960 --> 00:00:48,790 the third one has 25, and the last one has 50 students for a 21 00:00:48,790 --> 00:00:54,480 total of 148 students. 22 00:00:54,480 --> 00:00:56,670 And because these students are smart, and they like 23 00:00:56,670 --> 00:00:58,210 probability, they are 24 00:00:58,210 --> 00:01:00,170 interested in a couple of questions. 25 00:01:00,170 --> 00:01:06,370 So suppose that one of these 148 students is chosen 26 00:01:06,370 --> 00:01:09,880 randomly, and so we'll assume that what that means is that 27 00:01:09,880 --> 00:01:12,360 each one has the same probability of being chosen. 28 00:01:12,360 --> 00:01:15,120 So they're chosen uniformly at random. 
29 00:01:15,120 --> 00:01:18,580 And let's assign a couple of random variables. 30 00:01:18,580 --> 00:01:28,100 So we'll say x corresponds to the number of students in the 31 00:01:28,100 --> 00:01:40,010 bus of the selected student. 32 00:01:40,010 --> 00:01:44,830 OK, so one of these 148 students is selected uniformly 33 00:01:44,830 --> 00:01:47,670 at random, and we'll let x correspond to the number of 34 00:01:47,670 --> 00:01:51,310 students in that student's bus. 35 00:01:51,310 --> 00:01:55,310 So if a student from this bus was chosen, then x would be 36 00:01:55,310 --> 00:01:57,750 25, for example. 37 00:01:57,750 --> 00:02:00,340 OK, and then let's come up with another random variable, 38 00:02:00,340 --> 00:02:04,430 y, which is almost the same thing. 39 00:02:04,430 --> 00:02:08,789 Except instead of now selecting a random student, 40 00:02:08,789 --> 00:02:11,920 we'll select a random bus. 41 00:02:11,920 --> 00:02:17,110 Or equivalently, we'll select a random bus driver. 42 00:02:17,110 --> 00:02:20,390 So each bus has one driver, and instead of selecting one 43 00:02:20,390 --> 00:02:23,320 of the 148 students at random, we'll select one of the four 44 00:02:23,320 --> 00:02:26,620 bus drivers also uniformly at random. 45 00:02:26,620 --> 00:02:30,110 And we'll say the number of students in that 46 00:02:30,110 --> 00:02:32,930 driver's bus will be y. 47 00:02:32,930 --> 00:02:36,940 So for example, if this bus driver was selected, then y 48 00:02:36,940 --> 00:02:38,820 would be 33. 49 00:02:38,820 --> 00:02:44,640 OK, so the main problem that we're trying to answer is what 50 00:02:44,640 --> 00:02:47,270 do you expect the expectation-- 51 00:02:47,270 --> 00:02:48,910 which one of these random variables do you expect to 52 00:02:48,910 --> 00:02:53,050 have the higher expectation or the higher expected value? 53 00:02:53,050 --> 00:02:56,280 So, would you expect x to be higher on 54 00:02:56,280 --> 00:02:58,050 average, or y to be higher? 
55 00:02:58,050 --> 00:03:00,620 And what would be the intuition for this? 56 00:03:00,620 --> 00:03:02,910 So obviously, we can actually write out the 57 00:03:02,910 --> 00:03:03,990 PMFs for x and y. 58 00:03:03,990 --> 00:03:05,780 These are just discrete random variables. 59 00:03:05,780 --> 00:03:08,170 And we can actually calculate out what the expectation is. 60 00:03:08,170 --> 00:03:11,190 But it's also useful to exercise your intuition, and 61 00:03:11,190 --> 00:03:14,260 your sense of what the answer should be. 62 00:03:14,260 --> 00:03:18,420 So it might not be immediately clear which one would be 63 00:03:18,420 --> 00:03:20,640 higher, or you might even say that maybe it doesn't make a 64 00:03:20,640 --> 00:03:21,280 difference. 65 00:03:21,280 --> 00:03:23,350 They're actually the same. 66 00:03:23,350 --> 00:03:27,800 But a useful way to approach some of these questions is to 67 00:03:27,800 --> 00:03:30,360 try to take things to the extreme and see 68 00:03:30,360 --> 00:03:31,440 how that plays out. 69 00:03:31,440 --> 00:03:33,580 So let's take a simpler example and take it to the 70 00:03:33,580 --> 00:03:37,260 extreme and say, suppose instead of four buses carrying these 71 00:03:37,260 --> 00:03:38,370 numbers of students, 72 00:03:38,370 --> 00:03:39,620 we have only two buses-- 73 00:03:39,620 --> 00:03:49,280 one bus that has only 1 student, and we have another 74 00:03:49,280 --> 00:03:57,280 bus that has 1,000 students. 75 00:03:57,280 --> 00:03:58,370 OK. 76 00:03:58,370 --> 00:04:00,840 And suppose we ask the same question. 77 00:04:00,840 --> 00:04:05,880 Well, now if you look at it, there's a total of 1,001 78 00:04:05,880 --> 00:04:06,850 students now. 79 00:04:06,850 --> 00:04:10,770 If you select one of the students at random, it's 80 00:04:10,770 --> 00:04:14,040 overwhelmingly more likely that that student will be one 81 00:04:14,040 --> 00:04:17,140 of the 1,000 students on this huge bus. 
82 00:04:17,140 --> 00:04:20,630 It's very unlikely that you'll get lucky and select the one 83 00:04:20,630 --> 00:04:23,140 student who is by himself. 84 00:04:23,140 --> 00:04:27,210 And so because of that, you have a very high chance of 85 00:04:27,210 --> 00:04:30,930 selecting the bus with the high number of students. 86 00:04:30,930 --> 00:04:33,710 And so you would expect x, the number of 87 00:04:33,710 --> 00:04:37,490 students, to be high-- 88 00:04:37,490 --> 00:04:40,590 to be almost 1,000 in the expectation. 89 00:04:40,590 --> 00:04:44,840 But on the other hand, if you selected the driver at random, 90 00:04:44,840 --> 00:04:46,880 then you have a 50/50 chance of selecting 91 00:04:46,880 --> 00:04:48,430 this one or that one. 92 00:04:48,430 --> 00:04:54,210 And so you would expect the expectation there to be 93 00:04:54,210 --> 00:04:56,160 roughly 500 or so. 94 00:04:56,160 --> 00:04:59,740 And so you can see that if you take this to the extreme, then 95 00:04:59,740 --> 00:05:03,240 it becomes more clear what the answer would be. 96 00:05:03,240 --> 00:05:06,650 And the argument is that the expectation of x should be 97 00:05:06,650 --> 00:05:10,930 higher than the expectation of y, and the reason here is that 98 00:05:10,930 --> 00:05:14,250 because you select the student at random, you're more likely 99 00:05:14,250 --> 00:05:18,410 to select a student who is in a large bus, because that bus 100 00:05:18,410 --> 00:05:20,920 just has more students to select from. 101 00:05:20,920 --> 00:05:23,910 And because of that, you're more biased in favor of 102 00:05:23,910 --> 00:05:27,980 selecting large buses, and therefore, that makes x higher 103 00:05:27,980 --> 00:05:29,910 in expectation. 104 00:05:29,910 --> 00:05:32,580 OK, so that's the intuition behind this problem. 
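The extreme two-bus case can be checked numerically. The following is a minimal sketch, not part of the original lecture: the function names are hypothetical, and exact fractions are used only to avoid rounding. It computes the expected bus size when a student is picked uniformly at random and when a driver is picked uniformly at random.

```python
from fractions import Fraction

# Hypothetical helper names, introduced for illustration only.
def expected_size_random_student(bus_sizes):
    """E[X]: pick a student uniformly at random, report that student's bus size."""
    total = sum(bus_sizes)
    # A bus with n students is hit with probability n/total and contributes the value n.
    return sum(Fraction(n, total) * n for n in bus_sizes)

def expected_size_random_driver(bus_sizes):
    """E[Y]: pick a bus (driver) uniformly at random, report that bus's size."""
    return Fraction(sum(bus_sizes), len(bus_sizes))

extreme = [1, 1000]  # the two-bus extreme case from the discussion
print(float(expected_size_random_student(extreme)))  # about 999.002
print(float(expected_size_random_driver(extreme)))   # 500.5
```

With sizes 1 and 1,000, the student-based expectation is 1,000,001/1,001, just under 1,000, while the driver-based expectation is exactly 500.5, matching the "almost 1,000" versus "roughly 500" estimates.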
105 00:05:32,580 --> 00:05:34,240 And now, let's actually go through some more of the 106 00:05:34,240 --> 00:05:38,100 mechanics and write out what the PMFs and the calculation 107 00:05:38,100 --> 00:05:40,640 for the expectation would be to verify that our intuition 108 00:05:40,640 --> 00:05:42,400 is actually correct. 109 00:05:42,400 --> 00:05:46,020 OK, so we have two random variables that are defined. 110 00:05:46,020 --> 00:05:48,940 Now let's just write out what their PMFs are. 111 00:05:48,940 --> 00:05:51,270 So the PMF-- 112 00:05:51,270 --> 00:05:58,240 we write it as little p, subscript capital X, of little x. 113 00:05:58,240 --> 00:06:00,740 So the random variable-- what we do is we say the 114 00:06:00,740 --> 00:06:03,970 probability that it will take on a certain value, right? 115 00:06:03,970 --> 00:06:09,210 So what is the probability that x will be 40? 116 00:06:09,210 --> 00:06:13,030 Well, x will be 40 if a student from 117 00:06:13,030 --> 00:06:14,810 this bus was selected. 118 00:06:14,810 --> 00:06:16,870 And what's the probability that a student from this bus 119 00:06:16,870 --> 00:06:17,570 is selected? 120 00:06:17,570 --> 00:06:23,230 That probability is 40/148, because there are 148 students, 121 00:06:23,230 --> 00:06:27,160 40 of whom are sitting in this bus. 122 00:06:27,160 --> 00:06:35,470 And similarly, x will be 33 with probability 33/148, and x 123 00:06:35,470 --> 00:06:40,750 will be 25 with probability 25/148. 124 00:06:40,750 --> 00:06:45,120 And x will be 50 with probability 50/148. 125 00:06:45,120 --> 00:06:47,030 And the PMF will be 0 otherwise. 126 00:06:47,030 --> 00:06:51,750 127 00:06:51,750 --> 00:06:57,440 OK, so there is our PMF for x, and we can do the 128 00:06:57,440 --> 00:06:59,920 same thing for y. 129 00:06:59,920 --> 00:07:02,060 The PMF of y-- 130 00:07:02,060 --> 00:07:05,160 again, we say what is the probability that y will take 131 00:07:05,160 --> 00:07:06,150 on certain values? 
132 00:07:06,150 --> 00:07:09,900 Well, y can take on the same values as x can, because we're 133 00:07:09,900 --> 00:07:12,580 still dealing with the number of students in each bus. 134 00:07:12,580 --> 00:07:14,910 So y can be 40. 135 00:07:14,910 --> 00:07:17,390 But the probability that y is 40, because we're selecting 136 00:07:17,390 --> 00:07:20,290 the driver at random now, is 1/4, right? 137 00:07:20,290 --> 00:07:23,260 Because there's a 1/4 chance that we'll pick this driver. 138 00:07:23,260 --> 00:07:27,960 And the probability that y will be 33 will also be 1/4, 139 00:07:27,960 --> 00:07:35,840 and the same thing for 25 and 50. 140 00:07:35,840 --> 00:07:42,260 And it's 0 otherwise. 141 00:07:42,260 --> 00:07:49,690 OK, so those are the PMFs for our two random 142 00:07:49,690 --> 00:07:51,950 variables, x and y. 143 00:07:51,950 --> 00:07:55,630 And we can also draw out what the PMFs look like. 144 00:07:55,630 --> 00:08:14,730 So if this is 25, 30, 35, 40, 45, and 50, then the 145 00:08:14,730 --> 00:08:17,650 probability that it's 25 is 25/148. 146 00:08:17,650 --> 00:08:21,290 So we can draw a mass right there. 147 00:08:21,290 --> 00:08:24,130 For 33, it's a little higher, because it's 148 00:08:24,130 --> 00:08:27,440 33/148 instead of 25. 149 00:08:27,440 --> 00:08:29,260 For 40, it's even higher still. 150 00:08:29,260 --> 00:08:30,220 It's 40/148. 151 00:08:30,220 --> 00:08:39,380 And for 50, it is still higher, because it is 50/148. 152 00:08:39,380 --> 00:08:44,620 And so you can see that the PMF is more heavily favored 153 00:08:44,620 --> 00:08:47,410 towards the larger values. 154 00:08:47,410 --> 00:08:51,610 We can do the same thing for y, and we'll notice that 155 00:08:51,610 --> 00:08:54,690 there's a difference in how these distributions look. 
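The two PMFs described here can be written out as small lookup tables. This is an illustrative sketch only: the variable names are assumptions, and it relies on the four bus sizes being distinct so that each size can serve as a dictionary key.

```python
from fractions import Fraction

bus_sizes = [40, 33, 25, 50]      # students per bus, from the problem statement
total_students = sum(bus_sizes)   # 148

# PMF of X: a uniformly random student sits in a bus of size n with probability n/148.
pmf_x = {n: Fraction(n, total_students) for n in bus_sizes}
# PMF of Y: a uniformly random driver; each of the four buses has probability 1/4.
pmf_y = {n: Fraction(1, len(bus_sizes)) for n in bus_sizes}

# Print value, P(X = value), P(Y = value); Fraction shows probabilities in lowest terms.
for n in sorted(bus_sizes):
    print(n, pmf_x[n], pmf_y[n])
```

Both tables sum to 1, and any value other than 25, 33, 40, or 50 has probability 0 under either PMF. The probabilities for X grow with the bus size, while those for Y are flat.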
156 00:08:54,690 --> 00:09:00,460 157 00:09:00,460 --> 00:09:05,280 So if we do the same thing, the difference now is that all 158 00:09:05,280 --> 00:09:11,500 four of these masses will have the same height. 159 00:09:11,500 --> 00:09:16,240 Each one will have height 1/4, whereas this one for x, it's 160 00:09:16,240 --> 00:09:18,710 more heavily biased in favor of the larger ones. 161 00:09:18,710 --> 00:09:21,410 And so because of that, we can actually now calculate what 162 00:09:21,410 --> 00:09:24,740 the expectations are and figure out whether or not our 163 00:09:24,740 --> 00:09:27,600 intuition was correct. 164 00:09:27,600 --> 00:09:30,760 OK, so now let's actually calculate out what these 165 00:09:30,760 --> 00:09:33,610 expectations are. 166 00:09:33,610 --> 00:09:37,880 So as you recall, the expectation is calculated out 167 00:09:37,880 --> 00:09:39,830 as a weighted sum. 168 00:09:39,830 --> 00:09:45,490 So for each possible value of x, you take that value and you 169 00:09:45,490 --> 00:09:50,070 weight it by the probability of the random variable taking 170 00:09:50,070 --> 00:09:52,080 on that value. 171 00:09:52,080 --> 00:09:59,920 So in this case, it would be 40 times 40/148 plus 33 times 172 00:09:59,920 --> 00:10:10,650 33/148 173 00:10:10,650 --> 00:10:20,760 plus 25 times 25/148 plus 50 times 50/148. 174 00:10:20,760 --> 00:10:25,810 And if you do out this calculation, what you'll get 175 00:10:25,810 --> 00:10:30,820 is that it is around 39. 176 00:10:30,820 --> 00:10:33,070 Roughly 39. 177 00:10:33,070 --> 00:10:36,910 And now we can do the same thing for y. 178 00:10:36,910 --> 00:10:41,650 But for y, it's different, because now instead of 179 00:10:41,650 --> 00:10:44,650 weighting it by these probabilities, we'll weight it 180 00:10:44,650 --> 00:10:45,920 by these probabilities. 181 00:10:45,920 --> 00:10:48,600 So each one has the same weight of 1/4. 
182 00:10:48,600 --> 00:10:55,390 So now we get 40 times 1/4 plus 33 times 1/4 183 00:10:55,390 --> 00:11:01,130 plus 25 times 1/4 plus 50 times 1/4. 184 00:11:01,130 --> 00:11:06,030 And if you do out this arithmetic, what you get is 185 00:11:06,030 --> 00:11:10,310 that this expectation is 37. 186 00:11:10,310 --> 00:11:13,930 And so what we get is that, in fact, after we do out the 187 00:11:13,930 --> 00:11:17,090 calculations, the expected value of x is indeed greater 188 00:11:17,090 --> 00:11:20,110 than the expected value of y, which confirms our intuition. 189 00:11:20,110 --> 00:11:24,310 OK, so this problem, to summarize-- we've reviewed how 190 00:11:24,310 --> 00:11:27,650 to write out a PMF and also how to calculate expectations. 191 00:11:27,650 --> 00:11:33,540 But also, we've got a chance to figure out some intuition 192 00:11:33,540 --> 00:11:35,840 behind some of these problems. 193 00:11:35,840 --> 00:11:39,480 And so sometimes it's helpful to take simpler things and 194 00:11:39,480 --> 00:11:42,250 take things to the extreme and figure out intuitively whether 195 00:11:42,250 --> 00:11:43,520 or not the answer makes sense. 196 00:11:43,520 --> 00:11:47,530 It's useful just to verify whether the numerical answer 197 00:11:47,530 --> 00:11:48,950 that you get in the end is correct. 198 00:11:48,950 --> 00:11:50,440 Does this actually make sense? 199 00:11:50,440 --> 00:11:53,850 It's a useful guide for when you're solving these problems. 200 00:11:53,850 --> 00:11:55,310 OK, so we'll see you next time. 201 00:11:55,310 --> 00:11:56,560
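For reference, the two weighted sums from the calculation can be written out in full. This is a worked restatement of the arithmetic, using capital X and Y for the random variables and the PMFs stated in the problem:

```latex
\begin{align*}
\mathbf{E}[X] &= 40\cdot\tfrac{40}{148} + 33\cdot\tfrac{33}{148}
               + 25\cdot\tfrac{25}{148} + 50\cdot\tfrac{50}{148} \\
              &= \frac{1600 + 1089 + 625 + 2500}{148}
               = \frac{5814}{148} \approx 39.28, \\[4pt]
\mathbf{E}[Y] &= 40\cdot\tfrac14 + 33\cdot\tfrac14 + 25\cdot\tfrac14 + 50\cdot\tfrac14
               = \frac{148}{4} = 37.
\end{align*}
```

So the "roughly 39" figure is about 39.28, and it exceeds 37 as the intuition predicted.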