1
00:00:00,000 --> 00:00:00,530
2
00:00:00,530 --> 00:00:03,340
In this exercise, we'll be
looking at a problem, also
3
00:00:03,340 --> 00:00:05,710
know as the coupons collector's
problem.
4
00:00:05,710 --> 00:00:09,500
We have a set of K coupons,
or grades in our case.
5
00:00:09,500 --> 00:00:11,230
And each time slot
we're revealed
6
00:00:11,230 --> 00:00:13,210
with one random grade.
7
00:00:13,210 --> 00:00:15,230
And we'd like to know how long
it would take for us to
8
00:00:15,230 --> 00:00:16,900
collect all K grades.
9
00:00:16,900 --> 00:00:20,690
In our case, K is equal to 6.
10
00:00:20,690 --> 00:00:22,390
Now the key to solving
the problem
11
00:00:22,390 --> 00:00:23,930
is essentially twofolds.
12
00:00:23,930 --> 00:00:26,980
First, we'll have to find a way
to intelligently define
13
00:00:26,980 --> 00:00:30,050
sequence random variables that
captured, essentially,
14
00:00:30,050 --> 00:00:32,280
stopping time of this process.
15
00:00:32,280 --> 00:00:35,930
And then we'll employ the idea
of linearity of expectations
16
00:00:35,930 --> 00:00:39,740
in breaking down this value
in simpler terms.
17
00:00:39,740 --> 00:00:41,260
So let's get started.
18
00:00:41,260 --> 00:00:49,440
We'll define Yi as the number
of papers till we see
19
00:00:49,440 --> 00:00:50,910
the i-th new grade.
20
00:00:50,910 --> 00:00:56,420
21
00:00:56,420 --> 00:00:57,280
What does that mean?
22
00:00:57,280 --> 00:01:00,040
Well, let's take a look
at an example.
23
00:01:00,040 --> 00:01:04,069
Suppose, here we have a timeline
from no paper yet,
24
00:01:04,069 --> 00:01:06,740
first paper, second paper,
third paper,
25
00:01:06,740 --> 00:01:08,430
so on, and so forth.
26
00:01:08,430 --> 00:01:12,230
Now, if we got grade A on the
first slot, grade A minus on
27
00:01:12,230 --> 00:01:16,020
second slot, A again on the
third slot, let's say there's
28
00:01:16,020 --> 00:01:17,960
a fourth's slot, we got B.
29
00:01:17,960 --> 00:01:22,380
According to this process, we
see that Y1 is always 1,
30
00:01:22,380 --> 00:01:24,100
because whatever we got
on the first slot
31
00:01:24,100 --> 00:01:25,810
will be a new grade.
32
00:01:25,810 --> 00:01:29,000
Now, Y2 is 2, because
the second paper is,
33
00:01:29,000 --> 00:01:31,190
again, a new grade.
34
00:01:31,190 --> 00:01:33,690
On the third paper we got a
grade, which is the same as
35
00:01:33,690 --> 00:01:34,950
the first grade.
36
00:01:34,950 --> 00:01:38,150
So that would not
count as any Yi.
37
00:01:38,150 --> 00:01:43,940
And the third time we saw new
grade would now be paper four.
38
00:01:43,940 --> 00:01:47,490
According to this notation,
we're interested in knowing
39
00:01:47,490 --> 00:01:53,600
what is the expected value of E
of Y6, which is the time it
40
00:01:53,600 --> 00:01:56,270
takes to receive
all six grades.
41
00:01:56,270 --> 00:01:59,180
So so far this notation isn't
really helping us in solving
42
00:01:59,180 --> 00:02:02,290
the problem, but kind of just
staying a different way.
43
00:02:02,290 --> 00:02:05,290
It turns out, it's much easier
to look at the following
44
00:02:05,290 --> 00:02:07,690
variable derived from the Yis.
45
00:02:07,690 --> 00:02:11,090
We'll define Xi as the
difference between Yi
46
00:02:11,090 --> 00:02:13,700
plus 1 minus Yi.
47
00:02:13,700 --> 00:02:17,690
And in [? words, ?] it says, Xi
is a number of papers you
48
00:02:17,690 --> 00:02:21,950
need until you see the i plus
1-th new grade, after you have
49
00:02:21,950 --> 00:02:23,840
received i new grade so far.
50
00:02:23,840 --> 00:02:30,450
So in this case, X1 will be if
we call 0, Y0, will be the
51
00:02:30,450 --> 00:02:34,030
difference between Y1 and
Y0, which is always 1--
52
00:02:34,030 --> 00:02:35,270
that's X1.
53
00:02:35,270 --> 00:02:38,100
And the difference between
these two will be X2.
54
00:02:38,100 --> 00:02:42,100
And the difference between
Y3 and Y2--
55
00:02:42,100 --> 00:02:44,685
Sorry.
56
00:02:44,685 --> 00:02:51,590
It should be Y X0,
1, 2, and so on.
57
00:02:51,590 --> 00:02:53,610
OK?
58
00:02:53,610 --> 00:02:59,040
Through this notation we see
that Y6 now can be written as
59
00:02:59,040 --> 00:03:04,700
the summation of i equal
to 0, 2, 5, X, i.
60
00:03:04,700 --> 00:03:08,580
So all I did was to break down
i6 into a sequence of
61
00:03:08,580 --> 00:03:13,220
summation of the differences,
like Y. 6 Minus Y5, Y5 minus
62
00:03:13,220 --> 00:03:14,960
Y4, and so on.
63
00:03:14,960 --> 00:03:19,060
It turns out, this expression
will be very useful.
64
00:03:19,060 --> 00:03:20,420
OK.
65
00:03:20,420 --> 00:03:25,280
So now that we have the two
variables Y and X, let's see
66
00:03:25,280 --> 00:03:28,200
if it will be easier to look
at the distribution of X in
67
00:03:28,200 --> 00:03:30,170
studying this process.
68
00:03:30,170 --> 00:03:34,370
Let's say, we have seen
a new grade so far--
69
00:03:34,370 --> 00:03:35,400
one.
70
00:03:35,400 --> 00:03:37,660
How many trials would it
take for us to see
71
00:03:37,660 --> 00:03:38,710
the second new grade?
72
00:03:38,710 --> 00:03:40,720
It turns out it's
not that hard.
73
00:03:40,720 --> 00:03:44,790
In this case, we know there is
a total of six grades, and we
74
00:03:44,790 --> 00:03:45,890
have seen one of them.
75
00:03:45,890 --> 00:03:48,920
So that leaves us five
more grades that
76
00:03:48,920 --> 00:03:50,290
we'll potentially see.
77
00:03:50,290 --> 00:03:53,740
And therefore, on any random
trial after that, there is a
78
00:03:53,740 --> 00:03:57,560
probability of 5 over 6 that
we'll see a new grade.
79
00:03:57,560 --> 00:04:03,970
And hence, we know that X1 has
a distribution geometric with
80
00:04:03,970 --> 00:04:08,350
a success probability,
or a parameter, 5/6.
81
00:04:08,350 --> 00:04:12,580
Now, more generally, if we
extend this idea further, we
82
00:04:12,580 --> 00:04:19,019
see that Xi will have a
geometric distribution of
83
00:04:19,019 --> 00:04:24,840
parameter 6 minus i over 6.
84
00:04:24,840 --> 00:04:27,120
And this is due to the fact that
so far we have already
85
00:04:27,120 --> 00:04:29,630
seen i new grades.
86
00:04:29,630 --> 00:04:33,350
And that will be the success
probability of seeing a
87
00:04:33,350 --> 00:04:35,480
further new grade.
88
00:04:35,480 --> 00:04:39,670
So from this expression, we know
that the expected value
89
00:04:39,670 --> 00:04:45,780
of Xi will simply be the inverse
of the parameter of
90
00:04:45,780 --> 00:04:51,730
the geometric distribution,
which is 6 over 6 minus i or 6
91
00:04:51,730 --> 00:04:54,350
times 1 over 6 minus i.
92
00:04:54,350 --> 00:04:56,930
93
00:04:56,930 --> 00:05:00,600
And now we're ready to compute
a final answer.
94
00:05:00,600 --> 00:05:05,550
So from this expression we know
expected value of Y6 is
95
00:05:05,550 --> 00:05:13,280
equal to the expected value of
sum of i equal to 0 to 5 Xi.
96
00:05:13,280 --> 00:05:16,360
97
00:05:16,360 --> 00:05:19,600
And by the linearity of
expectation, we can pull out
98
00:05:19,600 --> 00:05:28,140
the sum and write it as 2,
5 expected value of Xi.
99
00:05:28,140 --> 00:05:31,220
Now, since we know that
expective of Xi is the
100
00:05:31,220 --> 00:05:32,670
following expression.
101
00:05:32,670 --> 00:05:36,990
We see that this term is equal
to 6 times expected value of i
102
00:05:36,990 --> 00:05:43,260
equals 0, 5, 1 over 6 minus i.
103
00:05:43,260 --> 00:05:48,260
Or written in the other way
this is equal to 6 times i
104
00:05:48,260 --> 00:05:51,356
equal to 0, 2, 5.
105
00:05:51,356 --> 00:06:03,510
In fact, 1, 2, 5, 1 over i.
106
00:06:03,510 --> 00:06:05,970
And all I did here was to,
essentially, change the
107
00:06:05,970 --> 00:06:10,190
variable, so that these two
summations contained exactly
108
00:06:10,190 --> 00:06:12,030
the same terms.
109
00:06:12,030 --> 00:06:20,840
And this will give us the
answer, which is 14.7.
110
00:06:20,840 --> 00:06:24,420
Now, more generally, we can
see that there's nothing
111
00:06:24,420 --> 00:06:26,400
special about number 6 here.
112
00:06:26,400 --> 00:06:30,640
We could have substituted 6 with
a number, let's say, K.
113
00:06:30,640 --> 00:06:37,840
And then we'll get E of YK,
let's say, there's more than
114
00:06:37,840 --> 00:06:39,580
six labels.
115
00:06:39,580 --> 00:06:45,030
And this will give us K times
summation i equal to 1, so K
116
00:06:45,030 --> 00:06:47,740
minus 1, 1 over i.
117
00:06:47,740 --> 00:06:51,000
Interestingly, it turns out
this quantity has an
118
00:06:51,000 --> 00:06:51,480
[? asymptotic ?]
119
00:06:51,480 --> 00:06:56,170
expression that, essentially,
is roughly equal to K times
120
00:06:56,170 --> 00:07:01,100
the natural logarithm of K.
And this is known as the
121
00:07:01,100 --> 00:07:02,190
scaling [? la ?]
122
00:07:02,190 --> 00:07:04,680
for the coupon collector's
problem that says,
123
00:07:04,680 --> 00:07:06,810
essentially, takes about
K times [? la ?]
124
00:07:06,810 --> 00:07:11,650
K many trials until we collect
all K coupons.
125
00:07:11,650 --> 00:07:13,800
And that'll be the end
of the problem.
126
00:07:13,800 --> 00:07:15,050
See you next time.
127
00:07:15,050 --> 00:07:15,850