1
00:00:00,000 --> 00:00:01,924
[SQUEAKING]

2
00:00:01,924 --> 00:00:04,329
[RUSTLING]

3
00:00:04,329 --> 00:00:05,772
[CLICKING]

4
00:00:12,510 --> 00:00:13,510
ERIK DEMAINE: All right.

5
00:00:13,510 --> 00:00:18,080
Welcome to practice
problem session 3, 006.

6
00:00:18,080 --> 00:00:20,600
Today, we are going to go
through a bunch of problems,

7
00:00:20,600 --> 00:00:23,497
which you should have already.

8
00:00:23,497 --> 00:00:25,580
I was thinking of skipping
the very first problem,

9
00:00:25,580 --> 00:00:27,200
because it's just
a mechanical thing.

10
00:00:27,200 --> 00:00:28,910
If we have time at the end,
we can come back to it.

11
00:00:28,910 --> 00:00:31,310
But there's not really any
insight I can give you in how

12
00:00:31,310 --> 00:00:32,352
to approach this problem.

13
00:00:32,352 --> 00:00:34,220
It's just, do you
understand hashing?

14
00:00:34,220 --> 00:00:36,470
So I want to go into the
more creative problems first.

15
00:00:36,470 --> 00:00:41,010
Let's start with problem
3.2, hash sequence.

16
00:00:41,010 --> 00:00:42,080
So I'll just read it.

17
00:00:42,080 --> 00:00:46,070
And then our first task is
to convert the word problem

18
00:00:46,070 --> 00:00:49,635
into a concise formal algorithms
thing we need to achieve.

19
00:00:49,635 --> 00:00:52,010
Then we need to come up with
ideas for how to achieve it.

20
00:00:52,010 --> 00:00:53,990
And we need to
check the details.

21
00:00:53,990 --> 00:00:55,885
That'll be our general pattern.

22
00:00:55,885 --> 00:00:57,260
So this problem
says, hash tables

23
00:00:57,260 --> 00:00:59,210
are not only useful for
implementing set operations,

24
00:00:59,210 --> 00:01:01,085
they can also be used
to implement sequences.

25
00:01:01,085 --> 00:01:04,730
Remember from lecture 2,
we have a set interface,

26
00:01:04,730 --> 00:01:07,370
which is about querying
items by their key

27
00:01:07,370 --> 00:01:09,260
and sort of intrinsic
order that's

28
00:01:09,260 --> 00:01:11,672
about the items themselves,
versus a sequence

29
00:01:11,672 --> 00:01:13,880
interface that we started
out with, with linked lists

30
00:01:13,880 --> 00:01:18,482
and so on, and arrays,
where we're given an order,

31
00:01:18,482 --> 00:01:19,940
and we want to
maintain that order.

32
00:01:19,940 --> 00:01:21,440
And that order may
not have anything

33
00:01:21,440 --> 00:01:23,080
to do with the items themselves.

34
00:01:23,080 --> 00:01:24,890
So that's what we call
an extrinsic order.

35
00:01:24,890 --> 00:01:26,515
We're told what the
order is by saying,

36
00:01:26,515 --> 00:01:29,240
insert this item after this one,
or append this one to the end,

37
00:01:29,240 --> 00:01:31,620
or prepend it to the beginning.

38
00:01:31,620 --> 00:01:36,830
So in lecture last week, we
saw hash tables implement sets.

39
00:01:36,830 --> 00:01:43,470
And let me just remind you
some things that they can do.

40
00:01:43,470 --> 00:01:47,460
So we have, on the
one hand, set hashing.

41
00:01:50,330 --> 00:01:55,760
So we're going to
need this in a moment.

42
00:01:55,760 --> 00:01:58,320
This is just a
reminder from lecture.

43
00:01:58,320 --> 00:02:01,910
We can build one in
linear time expected.

44
00:02:01,910 --> 00:02:06,530
We can find an item in
constant expected time by key.

45
00:02:06,530 --> 00:02:12,320
And we can insert
or delete an item

46
00:02:12,320 --> 00:02:17,480
in constant expected amortized.

47
00:02:20,570 --> 00:02:25,070
OK, so this is a black
box that we're given.

48
00:02:25,070 --> 00:02:31,130
And the problem statement
says that, imagine you're

49
00:02:31,130 --> 00:02:33,410
given a hash table
as a black box, which

50
00:02:33,410 --> 00:02:37,160
means we're given a thing
that behaves just like a--

51
00:02:37,160 --> 00:02:39,530
thank you, too.

52
00:02:39,530 --> 00:02:41,870
We're given something
that is a hash table.

53
00:02:41,870 --> 00:02:43,653
But it's black box in
the sense we're not

54
00:02:43,653 --> 00:02:46,070
allowed to reach in and change
the implementation details.

55
00:02:46,070 --> 00:02:49,190
We're supposed to use it as is,
just by calling its interface.

56
00:02:49,190 --> 00:02:51,440
So in particular, we're
given these three operations.

57
00:02:51,440 --> 00:02:55,105
I'll maybe also use iter to
iterate through the items.

58
00:02:55,105 --> 00:02:57,230
So we're allowed to build
something in linear time,

59
00:02:57,230 --> 00:03:02,120
find, and insert and delete in
constant expected amortized.

60
00:03:02,120 --> 00:03:03,920
And what the
problem is asking is

61
00:03:03,920 --> 00:03:06,320
to build out of
this data structure

62
00:03:06,320 --> 00:03:11,540
a sequence with
particular time bounds.

63
00:03:11,540 --> 00:03:17,000
So this is what we call
a reduction in that we're

64
00:03:17,000 --> 00:03:19,700
going to convert--

65
00:03:19,700 --> 00:03:22,820
I guess technically, we're
reducing the sequence problem

66
00:03:22,820 --> 00:03:25,838
to the set problem,
because we're

67
00:03:25,838 --> 00:03:28,130
showing how to solve the
sequence problem using the set

68
00:03:28,130 --> 00:03:28,910
problem.

69
00:03:28,910 --> 00:03:30,980
But the way we'll think about
it is in the other direction.

70
00:03:30,980 --> 00:03:32,660
We're given a data
structure that solves set.

71
00:03:32,660 --> 00:03:34,550
And we're going to convert
it into a data structure that

72
00:03:34,550 --> 00:03:35,640
solves sequence.

73
00:03:35,640 --> 00:03:38,580
So given that we already know
how to do this from lecture,

74
00:03:38,580 --> 00:03:40,661
we're going to learn
how to do this.

75
00:03:40,661 --> 00:03:43,945
This is teaching you new
stuff in a problem set.

76
00:03:43,945 --> 00:03:45,320
So the specific
bounds that we're

77
00:03:45,320 --> 00:03:51,020
told to achieve are build
in linear expected time,

78
00:03:51,020 --> 00:04:00,290
get_at and set_at in
constant expected time,

79
00:04:00,290 --> 00:04:12,770
insert_at and delete_at in
linear expected time,

80
00:04:12,770 --> 00:04:23,930
and insert and delete
first and last in--

81
00:04:23,930 --> 00:04:26,060
we're running out of room here--

82
00:04:26,060 --> 00:04:31,550
constant expected amortized.

83
00:04:31,550 --> 00:04:33,530
OK.

84
00:04:33,530 --> 00:04:36,560
So this is just what
we're told to do.

85
00:04:36,560 --> 00:04:38,730
And now we start thinking.

86
00:04:38,730 --> 00:04:39,890
So we're given this.

87
00:04:39,890 --> 00:04:42,500
We want to build this.

88
00:04:42,500 --> 00:04:44,990
And so I'm going to tell you
a little bit about my thought

89
00:04:44,990 --> 00:04:45,620
process.

90
00:04:45,620 --> 00:04:47,453
When I'm presented with
a problem like this,

91
00:04:47,453 --> 00:04:49,070
the first thing is
to read the problem

92
00:04:49,070 --> 00:04:52,210
and see, OK, what's
the hard part here?

93
00:04:52,210 --> 00:04:53,210
What are the challenges?

94
00:04:53,210 --> 00:04:55,010
So clearly, we
have to do all four

95
00:04:55,010 --> 00:04:56,480
of these types of operations.

96
00:04:56,480 --> 00:04:59,060
Build in linear expected time--
that's basically everything

97
00:04:59,060 --> 00:05:00,140
we've seen.

98
00:05:00,140 --> 00:05:03,890
Get or set_at in constant
expected time-- that's fast.

99
00:05:03,890 --> 00:05:07,400
And that feels kind of
like this find operation.

100
00:05:07,400 --> 00:05:10,520
So both of these seem
pretty matchy-matchy.

101
00:05:10,520 --> 00:05:12,770
So that looks like
a good mapping.

102
00:05:12,770 --> 00:05:14,770
I'm going to try to build
these operations using

103
00:05:14,770 --> 00:05:16,150
those operations.

104
00:05:16,150 --> 00:05:20,260
Insert and delete at
a specific location

105
00:05:20,260 --> 00:05:22,270
in constant expected time--

106
00:05:22,270 --> 00:05:25,270
sorry, linear expected
time-- that's big.

107
00:05:25,270 --> 00:05:28,540
Linear expected time means I
can rebuild the entire data

108
00:05:28,540 --> 00:05:30,730
structure every time
I do an operation.

109
00:05:30,730 --> 00:05:32,627
So this is easy.

110
00:05:32,627 --> 00:05:34,210
OK, that's the first
thing to realize.

111
00:05:34,210 --> 00:05:36,350
This is big.

112
00:05:36,350 --> 00:05:36,850
Great.

113
00:05:36,850 --> 00:05:39,160
So I don't really have to
worry about these operations.

114
00:05:39,160 --> 00:05:40,780
I mean, I do have
to implement them.

115
00:05:40,780 --> 00:05:46,480
But it's not hard to do it that
fast, because I can rebuild.

116
00:05:46,480 --> 00:05:49,390
And then here, insert and delete
at the beginning and the end

117
00:05:49,390 --> 00:05:52,300
of the array, these are the deque,
double-ended queue operations,

118
00:05:52,300 --> 00:05:54,520
insert and delete at
either end, in constant

119
00:05:54,520 --> 00:05:55,780
expected amortized time.

120
00:05:55,780 --> 00:05:58,970
This, I feel like,
is a tricky one.

121
00:05:58,970 --> 00:06:01,510
You've seen one way to
do this in a problem set.

122
00:06:01,510 --> 00:06:04,480
But now we're going to
see another way with--

123
00:06:04,480 --> 00:06:08,800
OK, the other thing to notice
is these "expected" words.

124
00:06:08,800 --> 00:06:10,780
In this case, we're
told to use hashing.

125
00:06:10,780 --> 00:06:13,960
But with a lot of the problems,
you're not told how to solve it

126
00:06:13,960 --> 00:06:16,570
or what you should be
basing your thing on.

127
00:06:16,570 --> 00:06:19,330
And so "expected" is
always a good keyword,

128
00:06:19,330 --> 00:06:21,730
because it means randomization
is involved somehow.

129
00:06:21,730 --> 00:06:24,460
If you're told the bound
is going to be expected,

130
00:06:24,460 --> 00:06:28,030
you probably need to
use randomization.

131
00:06:28,030 --> 00:06:30,830
And in this class, the only form
of randomization you will use

132
00:06:30,830 --> 00:06:32,320
is essentially hashing.

133
00:06:32,320 --> 00:06:35,645
So that's a good hint.

134
00:06:35,645 --> 00:06:38,020
In this case, we know
we're supposed to use hashing.

135
00:06:38,020 --> 00:06:43,870
All right, so this is
going to be the challenge.

136
00:06:46,390 --> 00:06:49,460
But any ideas on how we
might tackle this problem?

137
00:06:49,460 --> 00:06:53,500
How can we-- so set, remember,
every item has a key.

138
00:06:53,500 --> 00:06:55,823
In a sequence, items
are just items.

139
00:06:55,823 --> 00:06:57,490
And we're told to
insert and delete them

140
00:06:57,490 --> 00:06:59,110
at particular locations.

141
00:06:59,110 --> 00:07:01,060
But they don't have keys.

142
00:07:01,060 --> 00:07:02,440
So one of the
challenges is going

143
00:07:02,440 --> 00:07:05,770
to be to take our items
here, give them keys

144
00:07:05,770 --> 00:07:07,460
so that we can
store them in a set.

145
00:07:07,460 --> 00:07:12,556
Otherwise, we can't use find.

146
00:07:12,556 --> 00:07:14,700
If there's no keys
there, there's

147
00:07:14,700 --> 00:07:15,700
no way to search by key.

148
00:07:18,100 --> 00:07:18,600
Ideas?

149
00:07:23,870 --> 00:07:27,830
So let's think about
what we want to do.

150
00:07:27,830 --> 00:07:31,370
Let's start with-- so
build, I think, is fine.

151
00:07:31,370 --> 00:07:33,370
If you just want to
build a data structure,

152
00:07:33,370 --> 00:07:35,220
you don't need to do anything.

153
00:07:35,220 --> 00:07:37,460
The hard part is the
queries or updates

154
00:07:37,460 --> 00:07:39,870
you want to be able to do
on your data structure.

155
00:07:39,870 --> 00:07:42,900
Let's start with this
operation, get and set_at.

156
00:07:42,900 --> 00:07:46,280
So remember, get_at,
you're given an index i,

157
00:07:46,280 --> 00:07:53,950
and you want to find
the item at position i.

158
00:07:53,950 --> 00:07:58,190
And set_at, we're
given a position,

159
00:07:58,190 --> 00:08:00,965
and we want to change the
item stored at that position,

160
00:08:00,965 --> 00:08:03,210
at that index, i.

161
00:08:03,210 --> 00:08:06,420
Now, over here,
what we're given,

162
00:08:06,420 --> 00:08:07,850
we can insert and delete.

163
00:08:07,850 --> 00:08:09,920
But the main sort of lookup--

164
00:08:09,920 --> 00:08:11,870
let's think about get_at first.

165
00:08:11,870 --> 00:08:16,170
A natural mapping, given
this arrow, is find.

166
00:08:16,170 --> 00:08:19,500
Find will search
for an item by key.

167
00:08:19,500 --> 00:08:21,750
So here's-- just
staring at that,

168
00:08:21,750 --> 00:08:24,110
let's look at all the possible
pairings you could do.

169
00:08:24,110 --> 00:08:25,760
We have find by key over here.

170
00:08:25,760 --> 00:08:28,680
And we need to implement
get_at by index.

171
00:08:28,680 --> 00:08:31,460
So let's make the indices keys.

172
00:08:31,460 --> 00:08:35,390
OK, so this is idea number one--

173
00:08:35,390 --> 00:08:47,870
index-- assign a
key to each item

174
00:08:47,870 --> 00:08:51,495
equal to index in sequence.

175
00:08:55,790 --> 00:08:58,580
OK, so then, when I do--

176
00:08:58,580 --> 00:09:00,440
to implement get_at,
I can just call

177
00:09:00,440 --> 00:09:06,380
find of i if i is also a key.

178
00:09:06,380 --> 00:09:08,455
And that should give me
the thing that I want.

179
00:09:08,455 --> 00:09:09,830
Maybe for this to
make sense, let

180
00:09:09,830 --> 00:09:11,780
me tell you how I'm building.

181
00:09:11,780 --> 00:09:16,820
So if I'm given, say, an array
A of items, and they're both--

182
00:09:16,820 --> 00:09:18,910
the only name conflict
here is build.

183
00:09:18,910 --> 00:09:21,620
So let me call this
one sequence build.

184
00:09:24,260 --> 00:09:26,300
And I'm going to implement
it using set build.

185
00:09:29,600 --> 00:09:32,930
And I'll use some
shorthand notation here.

186
00:09:32,930 --> 00:09:35,870
Let's say I want to
make an object that

187
00:09:35,870 --> 00:09:47,960
has a key equal to i and
a value equal to A of i--

188
00:09:47,960 --> 00:09:50,090
that's my object notation--

189
00:09:50,090 --> 00:09:58,100
for i equals 0, 1, up
to size of A minus 1.

190
00:09:58,100 --> 00:10:02,040
That's a little bit code-like,
but not quite literal code.

191
00:10:02,040 --> 00:10:03,750
So I'm just going
to use this to say,

192
00:10:03,750 --> 00:10:07,863
let's make an object
that has two parts.

193
00:10:07,863 --> 00:10:08,780
One is called the key.

194
00:10:08,780 --> 00:10:11,840
So we can talk about the
object.key, so we can--

195
00:10:11,840 --> 00:10:13,085
which sets want to do.

196
00:10:13,085 --> 00:10:15,710
And we're also going to store a
value, which is the actual item

197
00:10:15,710 --> 00:10:17,640
that we're given.

198
00:10:17,640 --> 00:10:20,030
So I'm just-- because these
are given in the sequence,

199
00:10:20,030 --> 00:10:21,738
I'm just representing
that sequence order

200
00:10:21,738 --> 00:10:23,490
by assigning i to be the key.

201
00:10:23,490 --> 00:10:26,240
And so now, if I want to
find the item at index i,

202
00:10:26,240 --> 00:10:27,590
I can do find of i.

203
00:10:27,590 --> 00:10:30,980
And technically, I should
probably do .value.

204
00:10:30,980 --> 00:10:32,520
That will give me
the actual item

205
00:10:32,520 --> 00:10:34,550
that's stored at that position.

206
00:10:34,550 --> 00:10:37,580
When I do find of i, I'm
going to get this whole object

207
00:10:37,580 --> 00:10:38,390
with the key of i.

208
00:10:38,390 --> 00:10:41,720
And then I want to get
the value part of it.

209
00:10:41,720 --> 00:10:45,500
So then set_at, I
can just use this

210
00:10:45,500 --> 00:10:52,860
find operation to get the
object and set its value to x.

211
00:10:52,860 --> 00:10:53,360
Boom.

212
00:10:53,360 --> 00:10:57,050
We've implemented array-like
semantics, get_at i

213
00:10:57,050 --> 00:11:01,130
and set_at i, using a set.
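
[Editor's sketch] The idea just described (key each item by its index, then answer get_at and set_at with find) can be written out roughly as follows. This is only a sketch under assumptions: the `HashSet` class is a hypothetical stand-in for the black-box hash table, backed by a Python dict, and the names `Item` and `HashSequence` are illustrative, not from the lecture.

```python
class Item:
    """An object with a key (the sequence index) and a value (the stored item)."""
    def __init__(self, key, value):
        self.key = key
        self.value = value


class HashSet:
    """Hypothetical stand-in for the black-box set; a dict plays the hash table."""
    def __init__(self):
        self._table = {}

    def build(self, items):
        # Expected linear time in the number of items.
        self._table = {item.key: item for item in items}

    def find(self, key):
        # Expected constant time lookup by key.
        return self._table.get(key)


class HashSequence:
    def __init__(self):
        self.set = HashSet()
        self.size = 0

    def build(self, A):
        # "Sequence build": item A[i] gets key i.
        self.set.build(Item(i, A[i]) for i in range(len(A)))
        self.size = len(A)

    def get_at(self, i):
        # find returns the whole object; take its value part.
        return self.set.find(i).value

    def set_at(self, i, x):
        # Find the object with key i and overwrite its value.
        self.set.find(i).value = x


seq = HashSequence()
seq.build(["a", "b", "c"])
seq.set_at(1, "z")
print(seq.get_at(0), seq.get_at(1), seq.get_at(2))
```

Only find is used after the build, so get_at and set_at inherit its expected constant time.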

214
00:11:01,130 --> 00:11:03,110
If you've ever
programmed in JavaScript,

215
00:11:03,110 --> 00:11:05,750
this should feel very familiar,
because JavaScript actually

216
00:11:05,750 --> 00:11:10,130
implements arrays, at least
at the conceptual level,

217
00:11:10,130 --> 00:11:13,340
as just general mapping
types, which are--

218
00:11:13,340 --> 00:11:16,820
they call them objects, but
they are basically sets.

219
00:11:16,820 --> 00:11:18,290
And it's even grosser.

220
00:11:18,290 --> 00:11:20,690
They convert the integers
into strings and then

221
00:11:20,690 --> 00:11:24,815
index everything by the
strings, semantically, anyway.

222
00:11:24,815 --> 00:11:26,690
Implementation details
can be more efficient.

223
00:11:26,690 --> 00:11:28,860
But conceptually,
that's what's going on.

224
00:11:28,860 --> 00:11:34,910
And so that's the idea we're
doing here, which seems great.

225
00:11:34,910 --> 00:11:37,420
Any problems?

226
00:11:37,420 --> 00:11:38,040
So let's see.

227
00:11:38,040 --> 00:11:39,740
There's insert_at and delete_at.

228
00:11:39,740 --> 00:11:41,420
As I mentioned,
what I'm going to do

229
00:11:41,420 --> 00:11:44,650
for those operations is just
rebuild the entire structure.

230
00:11:44,650 --> 00:11:46,715
I'll just write that briefly.

231
00:11:49,340 --> 00:11:54,122
Basically, let's just
iterate all the items.

232
00:11:54,122 --> 00:12:00,950
Iterate all items, let's
say, into an array.

233
00:12:00,950 --> 00:12:04,670
Insert, delete one of them.

234
00:12:04,670 --> 00:12:07,290
And then rebuild.

235
00:12:07,290 --> 00:12:09,980
OK, and if I was
writing a P set answer,

236
00:12:09,980 --> 00:12:11,480
I would say a little
bit more detail

237
00:12:11,480 --> 00:12:14,482
what I mean in this step.

238
00:12:14,482 --> 00:12:15,565
I've done it in the notes.

239
00:12:15,565 --> 00:12:16,490
Not that hard.

240
00:12:16,490 --> 00:12:18,320
But we can afford
linear expected time.

241
00:12:18,320 --> 00:12:20,735
I can afford to
call build again.

242
00:12:20,735 --> 00:12:22,110
I guess, technically,
I'm calling

243
00:12:22,110 --> 00:12:23,970
this build, sequence build.

244
00:12:27,462 --> 00:12:29,670
So I can afford just to
extract things into an array,

245
00:12:29,670 --> 00:12:31,445
do the linear time
operation on the array

246
00:12:31,445 --> 00:12:32,820
with the shifting
and everything,

247
00:12:32,820 --> 00:12:34,205
and then just call build again.

248
00:12:34,205 --> 00:12:34,830
Yeah, question?

249
00:12:34,830 --> 00:12:36,648
AUDIENCE: I had a
question about get_at.

250
00:12:36,648 --> 00:12:37,440
ERIK DEMAINE: Yeah.

251
00:12:37,440 --> 00:12:41,340
AUDIENCE: [INAUDIBLE] get_at.

252
00:12:41,340 --> 00:12:44,640
ERIK DEMAINE: Sorry, no.

253
00:12:44,640 --> 00:12:46,800
These are separate
definitions, yeah?

254
00:12:46,800 --> 00:12:48,100
Sorry, they got a little close.

255
00:12:48,100 --> 00:12:48,300
AUDIENCE: Oh.

256
00:12:48,300 --> 00:12:50,280
ERIK DEMAINE: So this is
the definition of get_at.

257
00:12:50,280 --> 00:12:51,840
This is the definition
of sequence build.

258
00:12:51,840 --> 00:12:52,688
AUDIENCE: Oh, I see.

259
00:12:52,688 --> 00:12:53,480
ERIK DEMAINE: Yeah.

260
00:12:53,480 --> 00:12:54,510
Thanks for asking.

261
00:12:57,340 --> 00:12:59,730
OK, all good, yeah?

262
00:12:59,730 --> 00:13:02,653
AUDIENCE: Can you explain
the insert and delete again?

263
00:13:02,653 --> 00:13:04,320
ERIK DEMAINE: Explain
insert and delete.

264
00:13:04,320 --> 00:13:09,270
OK, so maybe I should actually
write one of them down,

265
00:13:09,270 --> 00:13:11,100
or I'll just draw
a picture, maybe.

266
00:13:11,100 --> 00:13:15,720
So we have this data structure,
which is now a sequence data

267
00:13:15,720 --> 00:13:20,220
structure, represents
some sequence of items.

268
00:13:20,220 --> 00:13:23,460
And my goal is to say,
delete the i-th item.

269
00:13:23,460 --> 00:13:27,930
So there's some items in
here, x0 up to xn minus 1.

270
00:13:27,930 --> 00:13:31,320
I want to remove xi
from the sequence.

271
00:13:31,320 --> 00:13:33,640
Or I guess I should
draw it this way.

272
00:13:33,640 --> 00:13:35,180
It's coming out.

273
00:13:35,180 --> 00:13:38,020
So what I'm going to do is
first extract all the items

274
00:13:38,020 --> 00:13:38,870
from the sequence.

275
00:13:38,870 --> 00:13:40,510
And I didn't write
it, but there's

276
00:13:40,510 --> 00:13:44,650
an interface over here called
iter, which just gives me

277
00:13:44,650 --> 00:13:46,760
all the items in order.

278
00:13:46,760 --> 00:13:50,320
So I'm going to extract
this into an array sequence.

279
00:13:55,240 --> 00:13:59,620
Let's say I'll just build
a static array of size n.

280
00:13:59,620 --> 00:14:01,810
I also have a length
operation that tells me

281
00:14:01,810 --> 00:14:03,100
how many items are in here.

282
00:14:03,100 --> 00:14:06,910
And the iter operation will
give me all the items in order.

283
00:14:06,910 --> 00:14:11,000
And so I'll put into my
array x0 and then x1,

284
00:14:11,000 --> 00:14:13,700
and so on, as they come out.

285
00:14:13,700 --> 00:14:17,140
Then I go to position i.

286
00:14:17,140 --> 00:14:20,890
And I want to delete that item
and shift all the others over.

287
00:14:20,890 --> 00:14:23,290
This is the boring--

288
00:14:23,290 --> 00:14:28,600
I think we even said how to
do delete_at in dynamic arrays

289
00:14:28,600 --> 00:14:31,120
in recitation 2, pretty sure.

290
00:14:31,120 --> 00:14:32,560
So I'm just mimicking that.

291
00:14:32,560 --> 00:14:35,830
I'm building this just to
get the new order of things.

292
00:14:35,830 --> 00:14:39,340
And then I'm applying,
via the build operation,

293
00:14:39,340 --> 00:14:43,570
I'm building a
totally new sequence.

294
00:14:43,570 --> 00:14:46,895
And that's how I would
implement delete_at, one way.
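
[Editor's sketch] The rebuild approach to delete_at and insert_at could look roughly like this. It assumes a plain dict keyed by index as a stand-in for the black-box set, and the helper names `seq_build` and `seq_iter` are illustrative rather than anything from the problem set.

```python
def seq_build(A):
    # Sequence build: key each item by its index (dict stands in for the set).
    return {i: A[i] for i in range(len(A))}


def seq_iter(S):
    # Yield items in sequence order by looking up keys 0, 1, ..., n - 1.
    for i in range(len(S)):
        yield S[i]


def delete_at(S, i):
    A = list(seq_iter(S))              # extract all n items into an array
    x = A[i]
    for j in range(i, len(A) - 1):     # shift later items left by one
        A[j] = A[j + 1]
    A.pop()                            # drop the now-duplicated last slot
    return x, seq_build(A)             # rebuild from scratch


def insert_at(S, i, x):
    A = [None] * (len(S) + 1)          # array of size n + 1, allocated up front
    for j, item in enumerate(seq_iter(S)):
        A[j if j < i else j + 1] = item  # items at i and beyond move right
    A[i] = x
    return seq_build(A)


S = seq_build(["a", "b", "c", "d"])
removed, S = delete_at(S, 1)
S = insert_at(S, 1, "q")
print(removed, list(seq_iter(S)))
```

Every step is a single pass over n items, so the whole operation stays within the linear expected time budget.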

295
00:14:46,895 --> 00:14:47,770
There are other ways.

296
00:14:47,770 --> 00:14:48,270
Yeah?

297
00:14:48,270 --> 00:14:50,960
AUDIENCE: Do you have
[INAUDIBLE] space [INAUDIBLE],


298
00:14:50,960 --> 00:14:51,460
or--

299
00:14:51,460 --> 00:14:53,640
ERIK DEMAINE: How much
space is this using?

300
00:14:53,640 --> 00:14:55,515
Oh, problem with space
if you're inserting--

301
00:14:55,515 --> 00:14:56,890
if you're inserting,
you probably

302
00:14:56,890 --> 00:14:58,990
want to allocate a static
array of size n plus 1.

303
00:14:58,990 --> 00:15:01,010
You know exactly
what's going to happen.

304
00:15:01,010 --> 00:15:03,130
So just allocate a
little bit bigger.

305
00:15:03,130 --> 00:15:04,760
Then you can do the shift.

306
00:15:04,760 --> 00:15:06,250
You could also use
dynamic arrays.

307
00:15:06,250 --> 00:15:08,240
But then you would
get maybe an--

308
00:15:08,240 --> 00:15:10,240
it's not an amortized
bound, because you're only

309
00:15:10,240 --> 00:15:12,130
doing one insertion.

310
00:15:12,130 --> 00:15:14,180
The point is this
is really easy.

311
00:15:14,180 --> 00:15:15,710
We can spend linear time.

312
00:15:15,710 --> 00:15:17,920
So we can rebuild the--

313
00:15:17,920 --> 00:15:20,930
we can rebuild this array
three times if we wanted.

314
00:15:20,930 --> 00:15:22,190
Question?

315
00:15:22,190 --> 00:15:23,910
AUDIENCE: What if
you weren't allowed

316
00:15:23,910 --> 00:15:27,488
external non-constant space?

317
00:15:27,488 --> 00:15:28,238
ERIK DEMAINE: Huh.

318
00:15:28,238 --> 00:15:29,988
You're going to throw
me an open problem.

319
00:15:29,988 --> 00:15:32,275
What if you only have
constant extra space?

320
00:15:36,750 --> 00:15:37,350
Right.

321
00:15:37,350 --> 00:15:40,050
Then I think we need to
use insert and delete.

322
00:15:40,050 --> 00:15:44,160
So we could-- good question.

323
00:15:44,160 --> 00:15:46,590
We could conceptually
do this shifting,

324
00:15:46,590 --> 00:15:48,960
but do it using
insert and delete.

325
00:15:48,960 --> 00:15:53,730
So we can-- so let's do
the delete case again.

326
00:15:53,730 --> 00:15:56,190
So we want to--

327
00:15:56,190 --> 00:15:57,030
here's xi.

328
00:15:57,030 --> 00:16:00,330
We want to replace it
with xi plus 1 and so on.

329
00:16:04,380 --> 00:16:10,170
And so we can start out by
deleting the item with key i.

330
00:16:10,170 --> 00:16:12,060
That will get rid of this guy.

331
00:16:12,060 --> 00:16:16,920
Then we can delete the
item with key i plus 1.

332
00:16:16,920 --> 00:16:18,630
And that gives us the item.

333
00:16:18,630 --> 00:16:23,310
And then we can reassign its
key to i instead of i plus 1

334
00:16:23,310 --> 00:16:25,200
and then reinsert it.

335
00:16:25,200 --> 00:16:27,060
So we can take this item out.

336
00:16:27,060 --> 00:16:30,516
It has a key, which is--

337
00:16:30,516 --> 00:16:31,770
I'll draw this properly.

338
00:16:31,770 --> 00:16:37,200
So we have key i
plus 1 and value xi

339
00:16:37,200 --> 00:16:40,350
plus 1 stored in
this data structure.

340
00:16:40,350 --> 00:16:43,260
Then we update the key to i.

341
00:16:43,260 --> 00:16:44,700
And then we reinsert it.

342
00:16:44,700 --> 00:16:46,955
And it takes the
place of this guy.

343
00:16:46,955 --> 00:16:47,830
So you could do that.

344
00:16:47,830 --> 00:16:50,220
You could go down this
list and-- or not the list,

345
00:16:50,220 --> 00:16:52,980
but you could iterate
for i equals-- sorry,

346
00:16:52,980 --> 00:16:56,370
for j equals i to n minus 1,
and for each of those items,

347
00:16:56,370 --> 00:16:59,688
delete it, change its key,
reinsert it with the new key.

348
00:16:59,688 --> 00:17:01,980
And then you don't have to
build this intermediate data

349
00:17:01,980 --> 00:17:03,100
structure.
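
[Editor's sketch] The constant-extra-space variant (delete each later item, change its key, reinsert it) might be sketched as below. A plain dict again stands in for the black-box set: `pop` plays the role of the set's delete and item assignment plays the role of insert. The function name is illustrative.

```python
def delete_at_in_place(S, i):
    """Remove the item at index i from S, a dict keyed by 0..n-1, with O(1) extra space."""
    n = len(S)
    removed = S.pop(i)            # delete the item with key i
    for j in range(i + 1, n):
        x = S.pop(j)              # delete the item with key j
        S[j - 1] = x              # reinsert it under key j - 1
    return removed


S = {0: "a", 1: "b", 2: "c", 3: "d"}
removed = delete_at_in_place(S, 1)
print(removed, S)
```

Each of the at most n delete and insert calls costs expected constant time (amortized), so this is still linear expected time overall, without the intermediate array.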

350
00:17:03,100 --> 00:17:05,335
So if you're told to
minimize space, great.

351
00:17:05,335 --> 00:17:06,960
And maybe you think
of that as simpler.

352
00:17:06,960 --> 00:17:09,000
I like to think of this
as simpler, because I--

353
00:17:09,000 --> 00:17:10,208
point is, I have linear time.

354
00:17:10,208 --> 00:17:14,069
I can do crazy, silly, very
non-data-structures-y things,

355
00:17:14,069 --> 00:17:16,670
where I just start from scratch.

356
00:17:16,670 --> 00:17:18,619
OK, great.

357
00:17:18,619 --> 00:17:20,510
But there's one more
set of operations,

358
00:17:20,510 --> 00:17:23,609
insert, delete, first and last.

359
00:17:23,609 --> 00:17:26,339
Are these easy?

360
00:17:26,339 --> 00:17:26,839
Good?

361
00:17:33,540 --> 00:17:35,340
Shall we try?

362
00:17:35,340 --> 00:17:40,440
We can insert last.

363
00:17:40,440 --> 00:17:44,520
So this is given an item,
x, we want to add it

364
00:17:44,520 --> 00:17:45,790
to the end of the structure.

365
00:17:45,790 --> 00:17:48,928
So that means its index is going
to be equal to-- because we

366
00:17:48,928 --> 00:17:50,970
start at 0, it's going to
be equal to the length,

367
00:17:50,970 --> 00:17:52,780
current length of the structure.

368
00:17:52,780 --> 00:17:57,350
So let's just insert
a new object, which

369
00:17:57,350 --> 00:18:00,720
has key equal to the length.

370
00:18:00,720 --> 00:18:04,240
And it has value equal to x.

371
00:18:04,240 --> 00:18:05,310
We're done.

372
00:18:05,310 --> 00:18:06,690
Delete last, similar.

373
00:18:06,690 --> 00:18:10,740
Just delete the item
with key length minus 1.
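
[Editor's sketch] With index keys, the two operations at the end are short. Here a dict keyed by index is assumed as the stand-in for the black-box set, and `len(S)` plays the role of the length operation.

```python
def insert_last(S, x):
    # Indices start at 0, so the new item's key is the current length.
    S[len(S)] = x


def delete_last(S):
    # The last item has key length - 1.
    return S.pop(len(S) - 1)


S = {0: "a", 1: "b"}
insert_last(S, "c")
last = delete_last(S)
print(last, S)
```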

374
00:18:10,740 --> 00:18:11,760
OK, what about first?

375
00:18:17,170 --> 00:18:19,905
This is supposed to add x to
the beginning of my sequence.

376
00:18:22,810 --> 00:18:25,120
Well, now I realize
I have a problem,

377
00:18:25,120 --> 00:18:29,830
because I want this
new item to have key 0,

378
00:18:29,830 --> 00:18:33,700
because after I do an
insert first, get_at of 0

379
00:18:33,700 --> 00:18:35,530
should return this item.

380
00:18:35,530 --> 00:18:39,370
But I already have an item with
key 0, and an item with key 1,

381
00:18:39,370 --> 00:18:41,990
and an item with key 2,
and so on down the way.

382
00:18:41,990 --> 00:18:44,170
And so if I wanted
to give x a key of 0,

383
00:18:44,170 --> 00:18:46,750
I have to shift the keys
of all of those items,

384
00:18:46,750 --> 00:18:48,320
just like we were doing here.

385
00:18:48,320 --> 00:18:50,890
And that's going to
take linear time.

386
00:18:50,890 --> 00:18:54,320
But we were supposed to do this
in constant expected amortized

387
00:18:54,320 --> 00:18:55,640
time.

388
00:18:55,640 --> 00:18:57,800
So that's no good.

389
00:18:57,800 --> 00:19:00,890
So this idea is not enough.

390
00:19:00,890 --> 00:19:01,850
It's not a bad idea.

391
00:19:01,850 --> 00:19:03,190
It's still a good idea.

392
00:19:03,190 --> 00:19:05,770
But it's no longer what
we actually want to do.

393
00:19:05,770 --> 00:19:09,430
It's only morally
what we want to do.

394
00:19:09,430 --> 00:19:13,750
So do you have any thoughts
on how we might get around

395
00:19:13,750 --> 00:19:15,760
this problem?

396
00:19:15,760 --> 00:19:17,630
Seems like inserting
at position 0,

397
00:19:17,630 --> 00:19:20,095
I need to shift everything
down, linear time.

398
00:19:20,095 --> 00:19:20,965
That really sucks.

399
00:19:28,680 --> 00:19:31,310
Yeah?

400
00:19:31,310 --> 00:19:36,210
AUDIENCE: You could create some
sort of link to something else.

401
00:19:36,210 --> 00:19:38,640
ERIK DEMAINE: Link this data
structure with another one.

402
00:19:38,640 --> 00:19:40,140
So we could build
more than one set.

403
00:19:40,140 --> 00:19:41,182
That's certainly allowed.

404
00:19:45,260 --> 00:19:46,940
I don't know how
to do-- oh, I see.

405
00:19:46,940 --> 00:19:49,070
You're saying maybe build
a whole other structure

406
00:19:49,070 --> 00:19:51,620
for the items that
come before 0?

407
00:19:51,620 --> 00:19:52,820
AUDIENCE: Yeah.

408
00:19:52,820 --> 00:19:54,028
ERIK DEMAINE: Yeah, actually.

409
00:19:54,028 --> 00:19:57,260
That would work, I think, maybe.

410
00:19:57,260 --> 00:19:58,490
It's like in the P set.

411
00:19:58,490 --> 00:20:01,280
Then you have to deal
with when-- if you delete

412
00:20:01,280 --> 00:20:02,630
one of them, it becomes empty.

413
00:20:02,630 --> 00:20:04,340
Then things get messy.

414
00:20:04,340 --> 00:20:07,280
Delete first is also going
to be a problem, because I

415
00:20:07,280 --> 00:20:11,970
delete beginning of
this data structure,

416
00:20:11,970 --> 00:20:13,820
then I lose my 0 item.

417
00:20:13,820 --> 00:20:16,850
And I want the new 0
item to be the 1 item.

418
00:20:16,850 --> 00:20:18,500
And again, all
the indices shift.

419
00:20:18,500 --> 00:20:21,230
So deleting and inserting
at the first is hard.

420
00:20:21,230 --> 00:20:23,720
So we could do that trick
like in the P set, but--

421
00:20:23,720 --> 00:20:27,540
or like in the last
problem session and so on.

422
00:20:27,540 --> 00:20:31,629
But there's a much simpler idea.

423
00:20:31,629 --> 00:20:33,296
AUDIENCE: Can you
have an extra variable

424
00:20:33,296 --> 00:20:35,453
to keep track of where
is the beginning?

425
00:20:35,453 --> 00:20:36,245
ERIK DEMAINE: Nice.

426
00:20:36,245 --> 00:20:38,930
I can have an extra
variable to keep track

427
00:20:38,930 --> 00:20:40,190
of where the beginning is.

428
00:20:52,084 --> 00:20:54,390
Call this first.

429
00:20:54,390 --> 00:21:08,090
This is going to be the key
of the first item, index 0.
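
[Editor's sketch] One plausible way to use the extra variable is shown below. This is an assumption about where the idea leads, not the lecture's worked solution: `first` holds the key of the item at index 0, index i maps to key `first + i`, and inserting at the front simply moves `first` down by one (so keys may go negative). A dict stands in for the black-box set, and the class name is illustrative.

```python
class OffsetSequence:
    def __init__(self, A=()):
        self.table = {i: x for i, x in enumerate(A)}  # dict stand-in for the set
        self.first = 0                                # key of the item at index 0

    def __len__(self):
        return len(self.table)

    def get_at(self, i):
        # Index i lives under key first + i.
        return self.table[self.first + i]

    def insert_first(self, x):
        # No shifting: the new front item takes the next smaller key.
        self.first -= 1
        self.table[self.first] = x

    def delete_first(self):
        # Remove the front item; the next key becomes index 0.
        x = self.table.pop(self.first)
        self.first += 1
        return x


seq = OffsetSequence(["b", "c"])
seq.insert_first("a")
print(seq.first, seq.get_at(0), seq.get_at(1), seq.get_at(2))
```

No existing key changes, which is what lets both front operations avoid the linear-time shift.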

430
00:21:08,090 --> 00:21:10,130
Another way to say
this is, let's just

431
00:21:10,130 --> 00:21:13,100
use negative integers, right?

432
00:21:13,100 --> 00:21:15,920
Sets work for any
keys, any integer keys.

433
00:21:15,920 --> 00:21:17,360
OK, actually, we
technically said,

434
00:21:17,360 --> 00:21:19,670
make sure you use
keys 0 to u minus 1.

435
00:21:19,670 --> 00:21:24,290
But then, if you have negative
numbers, you can easily fold--

436
00:21:24,290 --> 00:21:26,435
AUDIENCE: Wait, doesn't
it [INAUDIBLE] to like

437
00:21:26,435 --> 00:21:27,205
[INAUDIBLE]?

438
00:21:27,205 --> 00:21:27,913
ERIK DEMAINE: Ah.

439
00:21:27,913 --> 00:21:29,850
Python negative numbers
mean something else.

440
00:21:29,850 --> 00:21:31,475
But we're not using
a Python interface.

441
00:21:31,475 --> 00:21:34,760
We're using our custom
magical set interface,

442
00:21:34,760 --> 00:21:38,090
which we show how to implement
in recitation notes, which

443
00:21:38,090 --> 00:21:40,280
can take an arbitrary key.

444
00:21:40,280 --> 00:21:46,040
It hashes that key and finds
a place to put that item.

445
00:21:46,040 --> 00:21:48,338
So we're not actually
storing things in order here.

446
00:21:48,338 --> 00:21:49,880
We're storing things
in a hash table.

447
00:21:49,880 --> 00:21:52,160
But we're not supposed to
get into the implementation

448
00:21:52,160 --> 00:21:52,940
details.

449
00:21:52,940 --> 00:21:56,720
I think the way we presented
hashing with our universal hash

450
00:21:56,720 --> 00:21:59,220
functions, we only
allowed positive numbers.

451
00:21:59,220 --> 00:22:01,980
So maybe, technically,
I should point out,

452
00:22:01,980 --> 00:22:07,430
if you have positive
and negative numbers,

453
00:22:07,430 --> 00:22:15,260
you can fold this in half by
mapping 0 to 0, 1 to 2, 2 to 4,

454
00:22:15,260 --> 00:22:17,300
spreading it out.

455
00:22:17,300 --> 00:22:19,760
And then you can take
minus 1 and map it

456
00:22:19,760 --> 00:22:23,750
to plus 1, and minus 2
and map it to plus 3.

457
00:22:23,750 --> 00:22:26,540
So this is like multiplying
each of these guys by 2,

458
00:22:26,540 --> 00:22:30,200
and multiplying each of these
guys by minus 2 and subtracting 1.

459
00:22:30,200 --> 00:22:35,960
And then you get non-negative
integers out of all integers.

460
00:22:35,960 --> 00:22:40,100
This is a typical
math trick for showing

461
00:22:40,100 --> 00:22:42,320
that the number of integers
is equal to the number

462
00:22:42,320 --> 00:22:45,140
of non-negative integers,
which may seem weird to you.

463
00:22:45,140 --> 00:22:48,410
But they're both
countably infinite.

464
00:22:48,410 --> 00:22:50,690
So you could-- if your
structure only supports

465
00:22:50,690 --> 00:22:53,930
non-negative keys, you could
map negative keys in this way

466
00:22:53,930 --> 00:22:57,200
and throw them into
the hash table, OK?
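
The folding trick can be sketched in a few lines of Python. The names `fold` and `unfold` are made up for this illustration and do not come from the course notes. Non-negative keys go to the even numbers and negative keys go to the odd numbers, so minus 1 lands on 1 and minus 2 lands on 3.

```python
def fold(k: int) -> int:
    """Map any integer to a distinct non-negative integer."""
    # Non-negative keys land on even numbers: 0->0, 1->2, 2->4, ...
    if k >= 0:
        return 2 * k
    # Negative keys land on odd numbers: -1->1, -2->3, -3->5, ...
    return -2 * k - 1


def unfold(m: int) -> int:
    """Inverse of fold: recover the original key."""
    if m % 2 == 0:
        return m // 2
    return -((m + 1) // 2)
```

Because `unfold(fold(k)) == k` for every integer `k`, no two keys collide, which is all a hash table needs. The map does not preserve order.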

467
00:22:57,200 --> 00:23:00,170
So now, I allow
negative things for--

468
00:23:03,490 --> 00:23:04,250
like that.

469
00:23:04,250 --> 00:23:05,230
And so, great.

470
00:23:05,230 --> 00:23:08,080
If I want to insert at the
beginning, what I can do

471
00:23:08,080 --> 00:23:14,080
is just decrement my
first variable, which

472
00:23:14,080 --> 00:23:15,450
is keeping track of the index.

473
00:23:15,450 --> 00:23:17,920
So initially, first
is going to be 0.

474
00:23:17,920 --> 00:23:20,590
So I'm going to
add into my build.

475
00:23:20,590 --> 00:23:23,500
First, I'm going
to say first equals

476
00:23:23,500 --> 00:23:28,270
0, because I start with
key 0 when I initially

477
00:23:28,270 --> 00:23:29,680
build a structure.

478
00:23:29,680 --> 00:23:31,930
And if I want to-- if I
need more room before 0,

479
00:23:31,930 --> 00:23:34,240
I just set first to minus 1.

480
00:23:34,240 --> 00:23:36,400
And if I already have
a minus 1 element,

481
00:23:36,400 --> 00:23:38,040
I'll decrement it to minus 2.

482
00:23:38,040 --> 00:23:40,600
Decrement means decrease by 1--

483
00:23:40,600 --> 00:23:42,640
shows my assembly
language programming.

484
00:23:42,640 --> 00:23:45,550
This is usually a built-in
operation on most computers.

485
00:23:45,550 --> 00:23:55,690
And then I can insert an item
with key first and value x.

486
00:23:58,410 --> 00:23:58,910
Great.

487
00:23:58,910 --> 00:24:01,020
And if I want to
delete the first item,

488
00:24:01,020 --> 00:24:05,850
I would delete the item with key
first and then increment first.

489
00:24:05,850 --> 00:24:08,720
And now all of my operations
have to change a little bit--

490
00:24:08,720 --> 00:24:11,030
let me use another color--

491
00:24:11,030 --> 00:24:13,190
because I was
implicitly assuming here

492
00:24:13,190 --> 00:24:15,320
that all my indices
started at 0.

493
00:24:15,320 --> 00:24:16,850
But now they start at first.

494
00:24:16,850 --> 00:24:22,730
The index 0 maps to key first.

495
00:24:22,730 --> 00:24:24,680
And so the right
thing to do here

496
00:24:24,680 --> 00:24:31,370
is plus first and plus first.

497
00:24:31,370 --> 00:24:35,930
Basically, add a whole bunch
of plus firsts throughout.

498
00:24:35,930 --> 00:24:37,080
This one is probably fine.

499
00:24:37,080 --> 00:24:40,550
If I'm globally rebuilding,
I can reassign all my labels.

500
00:24:40,550 --> 00:24:44,840
But this one should
be first plus length.

501
00:24:47,630 --> 00:24:52,760
OK, so just by keeping track
of where my keys are starting,

502
00:24:52,760 --> 00:24:56,510
I can do this shifting and
not have to worry about stuff.
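
Here is a minimal sketch of the idea, assuming a Python `dict` stands in for the set interface (the class name `HashSequence` and its method names are hypothetical, chosen to mirror the sequence operations discussed here). Only the operations at the two ends and the index lookups are shown; inserting in the middle would still need a rebuild.

```python
class HashSequence:
    """Sequence stored in a hash table under keys first .. first+length-1."""

    def __init__(self, items=()):
        self.table = {}   # dict plays the role of the set (an assumption)
        self.first = 0    # key of the item at index 0
        self.length = 0
        for x in items:
            self.insert_last(x)

    def get_at(self, i):
        # Index i lives at key first + i.
        return self.table[self.first + i]

    def set_at(self, i, x):
        if not 0 <= i < self.length:
            raise IndexError(i)
        self.table[self.first + i] = x

    def insert_first(self, x):
        # Decrement first, then store x under the new smallest key.
        self.first -= 1
        self.table[self.first] = x
        self.length += 1

    def delete_first(self):
        # Remove the item with key first, then increment first.
        x = self.table.pop(self.first)
        self.first += 1
        self.length -= 1
        return x

    def insert_last(self, x):
        self.table[self.first + self.length] = x
        self.length += 1

    def delete_last(self):
        x = self.table.pop(self.first + self.length - 1)
        self.length -= 1
        return x

    def check_invariant(self):
        # Keys are exactly first, first+1, ..., first+length-1.
        expected = set(range(self.first, self.first + self.length))
        return set(self.table) == expected
```

Each method does a constant number of dictionary operations, so the cost is expected constant time per operation under the usual hashing assumptions.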

503
00:24:56,510 --> 00:24:59,630
And this is a lot easier
than having to worry about

504
00:24:59,630 --> 00:25:03,290
maintaining two structures, and
keeping them both non-empty,

505
00:25:03,290 --> 00:25:06,070
and stuff like
that, because of--

506
00:25:06,070 --> 00:25:08,780
if I assume my
set has this power

507
00:25:08,780 --> 00:25:11,420
of dealing with negative
integers, and strings,

508
00:25:11,420 --> 00:25:14,450
and whatever else.

509
00:25:14,450 --> 00:25:15,400
Cool?

510
00:25:15,400 --> 00:25:15,900
Yeah?

511
00:25:15,900 --> 00:25:17,330
AUDIENCE: Is there a
reason why you didn't do

512
00:25:17,330 --> 00:25:18,490
like the sorting--

513
00:25:18,490 --> 00:25:21,870
like, have [INAUDIBLE]?

514
00:25:21,870 --> 00:25:24,130
ERIK DEMAINE: Oh, why
didn't I use a linked list?

515
00:25:24,130 --> 00:25:27,790
Because this.

516
00:25:27,790 --> 00:25:32,440
Linked lists are very bad at
get and set at a given index.

517
00:25:32,440 --> 00:25:33,400
AUDIENCE: Is that the--

518
00:25:33,400 --> 00:25:35,025
the bottom idea, is
that a linked list?

519
00:25:35,025 --> 00:25:36,692
ERIK DEMAINE: This
is not a linked list.

520
00:25:36,692 --> 00:25:38,200
This is just storing
a single number

521
00:25:38,200 --> 00:25:41,050
as an integer in your data
structure that says,

522
00:25:41,050 --> 00:25:44,660
what is the smallest key
in my data structure?

523
00:25:44,660 --> 00:25:45,970
That's all it is.

524
00:25:45,970 --> 00:25:47,078
It's a counter.

525
00:25:47,078 --> 00:25:48,302
AUDIENCE: Ah.

526
00:25:48,302 --> 00:25:50,027
ERIK DEMAINE: OK,
so data structure

527
00:25:50,027 --> 00:25:51,110
keeps track of its length.

528
00:25:51,110 --> 00:25:52,910
And it keeps track
of the minimum key.

529
00:25:52,910 --> 00:25:55,100
And so it will always
consist-- the invariant is,

530
00:25:55,100 --> 00:25:57,530
you will always
have keys from first

531
00:25:57,530 --> 00:26:00,290
up to first plus length minus 1.

532
00:26:00,290 --> 00:26:04,800
And that's what we're
exploiting here.

533
00:26:04,800 --> 00:26:06,330
We have no idea
where first will be.

534
00:26:06,330 --> 00:26:08,430
It depends how many
operations you've done,

535
00:26:08,430 --> 00:26:10,510
how many inserts at the
beginning, and so on.

536
00:26:10,510 --> 00:26:23,520
But the keys-- keys will
always be first to first plus

537
00:26:23,520 --> 00:26:26,433
length minus 1.

538
00:26:26,433 --> 00:26:27,850
This is what we
call an invariant.

539
00:26:27,850 --> 00:26:31,200
Useful to write
these things down so

540
00:26:31,200 --> 00:26:32,970
you can understand
what the heck--

541
00:26:32,970 --> 00:26:35,077
why is your data
structure correct?

542
00:26:35,077 --> 00:26:36,660
Because of invariants
like this, which

543
00:26:36,660 --> 00:26:39,072
you can prove by
induction, by showing,

544
00:26:39,072 --> 00:26:40,530
each time you do
an operation, this

545
00:26:40,530 --> 00:26:44,130
is maintained, even when
I'm changing first in order

546
00:26:44,130 --> 00:26:47,820
to maintain this invariant.

547
00:26:47,820 --> 00:26:48,352
Cool.

548
00:26:48,352 --> 00:26:50,310
Sometimes you come up
with the invariant first.

549
00:26:50,310 --> 00:26:57,010
In this case, I came up with
it post facto, after the fact.

550
00:26:57,010 --> 00:26:58,510
Cool.

551
00:26:58,510 --> 00:27:18,760
Let's move on to problem 3,
which is called critter sort.

552
00:27:18,760 --> 00:27:21,600
And the other key thing I
want you to learn about--

553
00:27:21,600 --> 00:27:22,100
question?

554
00:27:22,100 --> 00:27:22,600
Sorry.

555
00:27:22,600 --> 00:27:23,480
AUDIENCE: Yeah.

556
00:27:23,480 --> 00:27:25,547
So when you do
first, first plus 1,

557
00:27:25,547 --> 00:27:28,247
is that a rebuilding
of the [INAUDIBLE]?

558
00:27:28,247 --> 00:27:29,830
ERIK DEMAINE: This
is just a sentence.

559
00:27:29,830 --> 00:27:32,720
It is not an algorithm
or data structure.

560
00:27:32,720 --> 00:27:34,115
This is a mathematical property.

561
00:27:34,115 --> 00:27:34,990
AUDIENCE: [INAUDIBLE]

562
00:27:34,990 --> 00:27:36,770
ERIK DEMAINE: This
is not an assignment.

563
00:27:36,770 --> 00:27:38,800
This is a mathematical
"is equal to."

564
00:27:38,800 --> 00:27:41,920
AUDIENCE: But you are
re-indexing it though,

565
00:27:41,920 --> 00:27:43,630
because you're
doing first plus 1.

566
00:27:43,630 --> 00:27:46,480
ERIK DEMAINE: So are you asking
about one of these operations,

567
00:27:46,480 --> 00:27:48,142
like this one?

568
00:27:48,142 --> 00:27:48,850
AUDIENCE: Oh, OK.

569
00:27:48,850 --> 00:27:49,090
Never mind.

570
00:27:49,090 --> 00:27:49,530
I get it.

571
00:27:49,530 --> 00:27:50,030
OK.

572
00:27:50,030 --> 00:27:51,830
ERIK DEMAINE: Yeah.

573
00:27:51,830 --> 00:27:52,600
OK.

574
00:27:52,600 --> 00:27:54,160
So the other
important takeaway I

575
00:27:54,160 --> 00:27:56,350
want you to get about
reading our problem sets

576
00:27:56,350 --> 00:27:58,630
is that they have
hidden humor inside.

577
00:27:58,630 --> 00:28:00,470
I don't know if you've noticed.

578
00:28:00,470 --> 00:28:02,950
But here's an example of a
problem called critter sort.

579
00:28:02,950 --> 00:28:05,770
Ashley Gettem collects
and trains pocket critters

580
00:28:05,770 --> 00:28:07,750
to fight other pocket
critters in battle.

581
00:28:07,750 --> 00:28:09,235
What is this a reference to?

582
00:28:09,235 --> 00:28:10,313
AUDIENCE: Digimon.

583
00:28:10,313 --> 00:28:11,230
ERIK DEMAINE: Digimon.

584
00:28:11,230 --> 00:28:13,150
Wow, you guys are so young.

585
00:28:13,150 --> 00:28:16,990
Pokemon, the ancient form.

586
00:28:16,990 --> 00:28:19,690
Pokemon is short
for pocket monsters.

587
00:28:19,690 --> 00:28:20,950
And in fact, in the original--

588
00:28:20,950 --> 00:28:21,490
AUDIENCE: [INAUDIBLE]

589
00:28:21,490 --> 00:28:22,448
ERIK DEMAINE: --anime--

590
00:28:22,448 --> 00:28:24,160
AUDIENCE: Actually,
there's [INAUDIBLE].

591
00:28:24,160 --> 00:28:25,285
ERIK DEMAINE: I don't know.

592
00:28:25,285 --> 00:28:27,880
This is all after my time.

593
00:28:27,880 --> 00:28:28,845
We can debate after.

594
00:28:28,845 --> 00:28:30,220
So pocket critters
is a reference

595
00:28:30,220 --> 00:28:33,130
to pocket monsters,
which is Pokemon.

596
00:28:33,130 --> 00:28:35,240
Who's Ashley Gettem?

597
00:28:35,240 --> 00:28:36,400
AUDIENCE: Ash.

598
00:28:36,400 --> 00:28:38,470
ERIK DEMAINE: Ash
Ketchum is his full name

599
00:28:38,470 --> 00:28:40,952
in the English version.

600
00:28:40,952 --> 00:28:42,910
Totally different name
in the Japanese version.

601
00:28:42,910 --> 00:28:46,090
But they're both puns on
collect them all, right?

602
00:28:46,090 --> 00:28:48,040
All right, so that's
the important stuff.

603
00:28:48,040 --> 00:28:50,240
We'll see more jokes later.

604
00:28:50,240 --> 00:28:51,640
So there's this setup.

605
00:28:51,640 --> 00:28:54,280
But basically, we
have n critters.

606
00:28:54,280 --> 00:28:57,290
And we want to sort them
by four different things.

607
00:28:57,290 --> 00:28:59,590
And so I'm just going
to abstract this problem

608
00:28:59,590 --> 00:29:04,438
to sort n objects by the
following types of keys.

609
00:29:04,438 --> 00:29:06,730
And for each one, we want to
know what the best sorting

610
00:29:06,730 --> 00:29:08,200
algorithm is.

611
00:29:08,200 --> 00:29:10,990
And there's this footnote
that's very important that says,

612
00:29:10,990 --> 00:29:13,420
faster correct algorithms
will receive more points

613
00:29:13,420 --> 00:29:15,938
than slower correct algorithms.

614
00:29:15,938 --> 00:29:17,980
Also, correct algorithms
will receive more points

615
00:29:17,980 --> 00:29:19,150
than incorrect algorithms.

616
00:29:19,150 --> 00:29:20,230
But that's implicit.

617
00:29:20,230 --> 00:29:22,330
Incorrect generally gets zero.

618
00:29:22,330 --> 00:29:27,340
OK, so part a, it
says, species ID.

619
00:29:27,340 --> 00:29:36,400
But basically, we have integers
in the range minus n to n.

620
00:29:36,400 --> 00:29:40,900
So if I want to sort n integers
in the range minus n to n,

621
00:29:40,900 --> 00:29:42,340
what should I do?

622
00:29:42,340 --> 00:29:45,550
This is a reference to
yesterday's lecture.

623
00:29:50,170 --> 00:29:51,350
Yeah?

624
00:29:51,350 --> 00:29:52,800
Radix sort, yeah.

625
00:29:52,800 --> 00:29:54,005
Always a good answer.

626
00:29:54,005 --> 00:29:56,990
Or almost always a good
answer when you have integers.

627
00:29:56,990 --> 00:29:59,640
It's a good answer whenever
you have small integers.

628
00:29:59,640 --> 00:30:01,590
Now, radix sort, the
way we phrased it--

629
00:30:01,590 --> 00:30:04,010
let me maybe put it down here.

630
00:30:04,010 --> 00:30:20,270
Radix sort sorts n integers in
the range 0 to u minus 1 in n

631
00:30:20,270 --> 00:30:25,910
plus n log base n of u time.

632
00:30:28,460 --> 00:30:31,800
And in particular,
this is linear time

633
00:30:31,800 --> 00:30:38,420
if u is n to some
constant power.
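
As a rough sketch of that bound, here is one way radix sort in base n could look in Python. The function names are invented for this illustration, and the bucket-based counting sort is just one simple stable variant, not necessarily the version from lecture.

```python
def counting_sort_by_digit(a, base, exp):
    """Stable sort of non-negative ints by the digit (x // exp) % base."""
    buckets = [[] for _ in range(base)]
    for x in a:
        buckets[(x // exp) % base].append(x)
    # Reading the buckets in order keeps equal digits in input order.
    return [x for bucket in buckets for x in bucket]


def radix_sort(a, u):
    """Sort ints drawn from range(u), using base n = len(a)."""
    base = max(len(a), 2)   # base n, but at least 2 so digits make sense
    exp = 1
    # One stable pass per base-n digit, least significant digit first.
    while exp < u:
        a = counting_sort_by_digit(a, base, exp)
        exp *= base
    return a
```

The loop runs about log base n of u times, and each pass costs order n plus the base, which is where the n plus n log base n of u comes from. When u is n to a constant power, the number of passes is a constant.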

634
00:30:38,420 --> 00:30:43,430
OK, so can I just apply this
as is to these integers?

635
00:30:43,430 --> 00:30:45,987
No, because they're negative.

636
00:30:45,987 --> 00:30:46,820
So what should I do?

637
00:30:46,820 --> 00:30:48,278
Maybe I should do
my folding trick.

638
00:30:48,278 --> 00:30:51,440
We just saw how to take negative
numbers and fold them in,

639
00:30:51,440 --> 00:30:53,160
interspersed with
positive numbers.

640
00:30:53,160 --> 00:30:57,900
If I sort that, will that work?

641
00:30:57,900 --> 00:31:00,960
No, because that does
not preserve order.

642
00:31:00,960 --> 00:31:03,335
It would intersperse.

643
00:31:03,335 --> 00:31:05,460
We want all the negative
numbers to come before all

644
00:31:05,460 --> 00:31:06,240
the positive numbers.

645
00:31:06,240 --> 00:31:06,630
Yeah?

646
00:31:06,630 --> 00:31:08,672
AUDIENCE: Can you just
add n to all the integers?

647
00:31:08,672 --> 00:31:10,485
ERIK DEMAINE: Just add n, yep.

648
00:31:10,485 --> 00:31:11,190
Boom.

649
00:31:11,190 --> 00:31:11,970
Plus n.

650
00:31:11,970 --> 00:31:14,730
Now we have integers
in the range--

651
00:31:14,730 --> 00:31:16,020
let's be careful--

652
00:31:16,020 --> 00:31:17,640
0 to 2n.

653
00:31:21,130 --> 00:31:21,630
Cool.

654
00:31:21,630 --> 00:31:22,650
Now we can apply this.

655
00:31:22,650 --> 00:31:25,470
Now u equals,
technically, 2n plus 1,

656
00:31:25,470 --> 00:31:27,833
because we're only supposed
to go to u minus 1.

657
00:31:27,833 --> 00:31:28,500
But that's fine.

658
00:31:28,500 --> 00:31:29,490
That's linear.

659
00:31:29,490 --> 00:31:32,220
And so we can sort
in linear time, easy.
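
A small sketch of the shift, under the assumption that every key really is between minus n and n. Since u is only 2n plus 1 here, this version uses a single counting pass rather than a full radix sort; the function name `sort_signed` is hypothetical.

```python
def sort_signed(a):
    """Sort n integers in [-n, n] by shifting them into [0, 2n]."""
    n = len(a)
    u = 2 * n + 1              # shifted keys lie in 0 .. u-1
    counts = [0] * u
    for x in a:
        counts[x + n] += 1     # add n so every key is non-negative
    out = []
    for shifted, c in enumerate(counts):
        out.extend([shifted - n] * c)   # subtract n to undo the shift
    return out
```

Adding the same constant to every key keeps the order intact, which is exactly what the folding trick fails to do.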

660
00:31:32,220 --> 00:31:34,850
This is a super easy problem.

661
00:31:34,850 --> 00:31:37,500
But in each one, we might need
to do some transformation.

662
00:31:37,500 --> 00:31:41,410
Part b is a little
more interesting.

663
00:31:41,410 --> 00:31:58,230
So we have strings over 26
letters of length at most

664
00:31:58,230 --> 00:32:00,930
10 ceiling log n.

665
00:32:03,640 --> 00:32:05,160
OK, this is a little trickier.

666
00:32:05,160 --> 00:32:06,300
What could I do?

667
00:32:06,300 --> 00:32:10,350
Again, I'd like to see
whether radix sort applies.

668
00:32:10,350 --> 00:32:12,120
I should say radix sort sorts.

669
00:32:15,420 --> 00:32:17,130
I'd like to see if
radix sort applies.

670
00:32:17,130 --> 00:32:20,550
To do that, I have to map these
strings into integers somehow.

671
00:32:20,550 --> 00:32:21,510
Any way to do that?

672
00:32:24,405 --> 00:32:26,155
This is easy if you
understand radix sort.

673
00:32:26,155 --> 00:32:26,655
Yeah?

674
00:32:26,655 --> 00:32:29,180
AUDIENCE: Can you just
index the letters?

675
00:32:29,180 --> 00:32:30,618
ERIK DEMAINE:
Index the letters.

676
00:32:30,618 --> 00:32:31,118
Yeah.

677
00:32:34,610 --> 00:32:35,910
Yeah, we can map--

678
00:32:35,910 --> 00:32:36,410
right.

679
00:32:36,410 --> 00:32:39,150
So we can map A to 0, B to 1.

680
00:32:39,150 --> 00:32:39,650
Then what?

681
00:32:42,288 --> 00:32:43,080
AUDIENCE: Oh, wait.

682
00:32:43,080 --> 00:32:45,170
Length--

683
00:32:45,170 --> 00:32:48,050
ERIK DEMAINE: So
that's for each letter.

684
00:32:48,050 --> 00:32:52,520
But we have a lot of letters.

685
00:32:52,520 --> 00:32:53,660
There are only 26 letters.

686
00:32:53,660 --> 00:32:58,070
But then we have 10 log
n letters in a string.

687
00:32:58,070 --> 00:33:00,920
That is, together, a single
key that we need to sort.

688
00:33:05,130 --> 00:33:05,630
Yeah?

689
00:33:05,630 --> 00:33:09,995
AUDIENCE: Can't we just sort by
the first letter first, then--

690
00:33:09,995 --> 00:33:12,620
ERIK DEMAINE: Sort by the first
letter, then the second letter.

691
00:33:12,620 --> 00:33:14,517
That is exactly the
opposite of radix sort.

692
00:33:14,517 --> 00:33:16,100
Remember, radix sort,
we want to start

693
00:33:16,100 --> 00:33:18,980
by the last letter, and then
the next to last letter,

694
00:33:18,980 --> 00:33:20,450
and finally, the first letter.

695
00:33:20,450 --> 00:33:21,530
AUDIENCE: But you want
to sort by the first one

696
00:33:21,530 --> 00:33:23,233
in order to alphabetize things.

697
00:33:23,233 --> 00:33:24,650
ERIK DEMAINE: No,
to alphabetize--

698
00:33:24,650 --> 00:33:27,230
we do want to, in the end,
sort by the first letter.

699
00:33:27,230 --> 00:33:29,150
But that's at the end.

700
00:33:29,150 --> 00:33:30,690
So at the end--
remember, radix sort

701
00:33:30,690 --> 00:33:32,690
always goes backwards
from the least significant

702
00:33:32,690 --> 00:33:34,340
to the most significant.

703
00:33:34,340 --> 00:33:36,090
And so indeed, that
is what we want to do.

704
00:33:36,090 --> 00:33:37,250
You're just saying,
use radix sort.

705
00:33:37,250 --> 00:33:38,690
But what am I radix sort on?

706
00:33:38,690 --> 00:33:41,150
What am I radix sorting on?

707
00:33:41,150 --> 00:33:44,353
AUDIENCE: Yeah, on the last
letters, not the first letters.

708
00:33:44,353 --> 00:33:45,770
ERIK DEMAINE: So
technically, that

709
00:33:45,770 --> 00:33:47,953
would be using counting
sort on the last letter,

710
00:33:47,953 --> 00:33:49,870
counting sort on the
next to last letter, dot,

711
00:33:49,870 --> 00:33:53,330
dot, dot, counting sort
on the first letter.

712
00:33:53,330 --> 00:33:56,510
But that is, together,
radix sort on something,

713
00:33:56,510 --> 00:34:01,580
or Jason likes to call
this tuple sorting.

714
00:34:01,580 --> 00:34:03,150
Tuple sort is the thing--

715
00:34:03,150 --> 00:34:05,600
is the algorithm that says,
sort by the last thing,

716
00:34:05,600 --> 00:34:08,580
then sort by the previous
thing, and so on.

717
00:34:08,580 --> 00:34:10,820
You can also think of
this as radix sorting

718
00:34:10,820 --> 00:34:13,025
on a number written in base 26.

719
00:34:15,550 --> 00:34:16,639
They're the same thing.

720
00:34:24,440 --> 00:34:28,227
But in the end, we can
sort in linear time.

721
00:34:28,227 --> 00:34:30,830
AUDIENCE: How do you
ensure that the letters are

722
00:34:30,830 --> 00:34:33,818
sorted in order, though?

723
00:34:33,818 --> 00:34:36,975
Like, how do you ensure that--
how do you tell the algorithm

724
00:34:36,975 --> 00:34:39,850
that you want A to come--

725
00:34:39,850 --> 00:34:42,596
just like not-- 0 is less
than 1, A is less than B.

726
00:34:42,596 --> 00:34:43,429
ERIK DEMAINE: Right.

727
00:34:43,429 --> 00:34:45,230
So I mean, technically,
when you call

728
00:34:45,230 --> 00:34:47,270
something like tuple sort--

729
00:34:47,270 --> 00:34:49,810
or maybe it's even clearer
when you call radix sort.

730
00:34:49,810 --> 00:34:51,810
Radix sort, you're giving
it a bunch of numbers.

731
00:34:51,810 --> 00:34:55,560
So you're taking these strings
and mapping them to numbers.

732
00:34:55,560 --> 00:34:57,775
And when you do that, you
get to decide which letter

733
00:34:57,775 --> 00:34:59,150
is the most
significant, which is

734
00:34:59,150 --> 00:35:01,100
the least significant, right?

735
00:35:01,100 --> 00:35:04,700
So you will choose to always map
the first letter in your string

736
00:35:04,700 --> 00:35:07,250
to position--

737
00:35:07,250 --> 00:35:13,130
to value, or the--
what do you call it?

738
00:35:13,130 --> 00:35:15,050
Position in positional notation.

739
00:35:15,050 --> 00:35:19,590
Position 26 to the power 10
log n as the most significant.

740
00:35:19,590 --> 00:35:21,090
So it's always the
most significant.

741
00:35:21,090 --> 00:35:22,580
Even if your string
is of length 1,

742
00:35:22,580 --> 00:35:25,070
you want to put that in
the most significant digit.

743
00:35:25,070 --> 00:35:26,870
And you'll pad with
zeros at the end

744
00:35:26,870 --> 00:35:29,340
if you run out of
letters in your string.

745
00:35:29,340 --> 00:35:31,970
AUDIENCE: How many times are
you running counting sort here?

746
00:35:31,970 --> 00:35:34,303
ERIK DEMAINE: How many times
am I running counting sort?

747
00:35:34,303 --> 00:35:35,680
Oh, 10 log n times.

748
00:35:35,680 --> 00:35:36,775
Whoops.

749
00:35:36,775 --> 00:35:37,880
Yeah, good question.

750
00:35:37,880 --> 00:35:39,240
Good point.

751
00:35:39,240 --> 00:35:40,890
Yeah, I computed this wrong.

752
00:35:40,890 --> 00:35:42,980
So right.

753
00:35:42,980 --> 00:35:47,180
There are log n
digits in the string.

754
00:35:47,180 --> 00:35:49,310
So that is bad.

755
00:35:49,310 --> 00:35:51,270
I mean, it's OK.

756
00:35:51,270 --> 00:35:57,480
We'll end up with n log n
running time by the tuple sort.

757
00:36:01,850 --> 00:36:03,920
However-- so that's
the tuple sort.

758
00:36:03,920 --> 00:36:07,380
So I should really make
this not equivalent.

759
00:36:07,380 --> 00:36:10,410
If I run tuple sort letter
by letter, I'm going to do--

760
00:36:10,410 --> 00:36:12,270
I'm running counting
sort log n times.

761
00:36:12,270 --> 00:36:16,310
And so I get n log n, because
each one takes linear time.

762
00:36:16,310 --> 00:36:20,630
If I map my strings
into numbers first,

763
00:36:20,630 --> 00:36:22,355
radix sort doesn't use base 26.

764
00:36:22,355 --> 00:36:24,500
It uses base n.

765
00:36:24,500 --> 00:36:28,070
And then it will
only run 10 times,

766
00:36:28,070 --> 00:36:38,520
because 2 to the 10
log n is n to the 10.

767
00:36:38,520 --> 00:36:42,800
And so the numbers that we're
sorting are between 0 and n

768
00:36:42,800 --> 00:36:44,040
to the 10.

769
00:36:44,040 --> 00:36:46,290
And so u is n to the 10.

770
00:36:46,290 --> 00:36:49,380
And so that's the case when
radix sort runs in linear time.

771
00:36:49,380 --> 00:36:53,210
So if you run tuple sort
letter by letter, it's slow.

772
00:36:53,210 --> 00:36:55,700
If you run radix sort,
it's doing a whole bunch

773
00:36:55,700 --> 00:36:56,720
of letters at once.

774
00:36:56,720 --> 00:36:58,340
Effectively, it's
doing log n letters

775
00:36:58,340 --> 00:37:01,700
at a time in a single
call to counting sort.

776
00:37:01,700 --> 00:37:05,930
And so the radix sort will
actually win and get linear.

777
00:37:09,883 --> 00:37:11,300
There's a subtlety
here, which is,

778
00:37:11,300 --> 00:37:13,940
I'm assuming that we can
actually take these strings

779
00:37:13,940 --> 00:37:18,380
and convert them into integers
in constant time each.

780
00:37:18,380 --> 00:37:20,750
And this problem
set was ambiguous.

781
00:37:20,750 --> 00:37:22,610
And both answers were accepted.

782
00:37:22,610 --> 00:37:25,010
If you assume these letters
are nice and compactly

783
00:37:25,010 --> 00:37:27,800
stored, and they
fit in 10 words,

784
00:37:27,800 --> 00:37:31,280
because a word is at
least log n bits long,

785
00:37:31,280 --> 00:37:33,560
then you can actually do this.

786
00:37:33,560 --> 00:37:36,650
If you store each letter
in a separate word,

787
00:37:36,650 --> 00:37:40,496
then just reading the entire
input will take n log n time.

788
00:37:40,496 --> 00:37:44,947
So that's a subtlety which we
don't need to worry too much

789
00:37:44,947 --> 00:37:45,780
about in this class.

790
00:37:45,780 --> 00:37:46,280
Yeah?

791
00:37:46,280 --> 00:37:49,340
AUDIENCE: So [INAUDIBLE]
bounding the letters

792
00:37:49,340 --> 00:37:50,312
to numbers.

793
00:37:50,312 --> 00:37:52,267
And like, how would that help?

794
00:37:52,267 --> 00:37:55,130
Because we still have to do 26--

795
00:37:55,130 --> 00:37:58,340
ERIK DEMAINE: Yeah, there
are 26 possible letters,

796
00:37:58,340 --> 00:38:01,260
numbering them 0 to 25.

797
00:38:01,260 --> 00:38:05,450
And then when we take
a string, like AA,

798
00:38:05,450 --> 00:38:09,740
we map this into 00 in base 26.

799
00:38:12,350 --> 00:38:13,430
That's a number.

800
00:38:13,430 --> 00:38:18,350
If we do BB, for
example, this maps to 11

801
00:38:18,350 --> 00:38:25,820
in base 26, which means 1
times 26 plus 1, which is 27.

802
00:38:25,820 --> 00:38:28,958
OK, so that's the
mapping that I mean.

803
00:38:28,958 --> 00:38:31,250
AUDIENCE: You're mapping the
whole string [INAUDIBLE]?

804
00:38:31,250 --> 00:38:34,130
ERIK DEMAINE: The whole string
to a single number, yeah.

805
00:38:34,130 --> 00:38:36,740
And there's a subtlety,
because I want lexicographic.

806
00:38:36,740 --> 00:38:38,540
I need to pad things
with spaces at the end

807
00:38:38,540 --> 00:38:40,850
or pad them with As
at the end in case

808
00:38:40,850 --> 00:38:44,410
they're shorter than 10 log n.
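
To make the mapping concrete, here is a sketch of the "pad with spaces" variant in Python. It assumes lowercase strings over a to z of length at most `width`, and the name `encode` is made up for this example. A blank counts as digit 0 and the letters as digits 1 to 26, so the number is read in base 27 rather than 26. With plain base 26 and padding by As, a string like "b" and its extension "ba" would get the same number.

```python
def encode(s: str, width: int) -> int:
    """Read a lowercase string as a fixed-width number, first letter first."""
    value = 0
    for i in range(width):
        # Blank padding is digit 0; 'a'..'z' are digits 1..26.
        digit = ord(s[i]) - ord("a") + 1 if i < len(s) else 0
        # The first letter ends up in the most significant position.
        value = value * 27 + digit
    return value
```

Sorting the resulting integers then gives lexicographic order on the strings, for example `sorted(words, key=lambda w: encode(w, width))`. The universe is still a constant power of n when `width` is a constant times log n, only with a larger exponent than 10.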

809
00:38:44,410 --> 00:38:46,150
OK, cool.

810
00:38:46,150 --> 00:38:46,990
That was b.

811
00:38:50,330 --> 00:38:52,610
c is not very interesting.

812
00:38:52,610 --> 00:39:00,410
It's integers in the
range 0 to n squared.

813
00:39:00,410 --> 00:39:02,420
This, I can just
solve with radix sort,

814
00:39:02,420 --> 00:39:03,990
because my radix
sort, at this point,

815
00:39:03,990 --> 00:39:06,160
we've done it a third time.

816
00:39:06,160 --> 00:39:09,050
Radix sort, we can sort as
long as the integers are

817
00:39:09,050 --> 00:39:10,400
bounded by a polynomial.

818
00:39:10,400 --> 00:39:13,940
Here, it's a fixed polynomial
with a constant exponent.

819
00:39:13,940 --> 00:39:16,370
So this will-- and
this is radix sort,

820
00:39:16,370 --> 00:39:19,220
like we saw, that just
calls counting sort twice,

821
00:39:19,220 --> 00:39:21,290
linear time.

822
00:39:21,290 --> 00:39:24,240
d is where things
get more interesting.

823
00:39:24,240 --> 00:39:29,100
Let me get this
phrasing the same.

824
00:39:29,100 --> 00:39:36,230
So in d, we have rational
numbers of the form w over f.

825
00:39:36,230 --> 00:39:40,310
This is some win ratio.

826
00:39:40,310 --> 00:39:42,230
Always in the range 0 to 1.

827
00:39:42,230 --> 00:39:44,420
So we saw w is at most f.

828
00:39:44,420 --> 00:39:49,550
And 0 is less than w, is
less than f, is less than n

829
00:39:49,550 --> 00:39:51,380
squared, because the--

830
00:39:51,380 --> 00:39:53,185
that is really confusing--

831
00:39:55,920 --> 00:39:57,470
is less than n squared--

832
00:39:57,470 --> 00:40:00,200
those are separate statements--

833
00:40:00,200 --> 00:40:02,240
because the f actually
comes from part c.

834
00:40:02,240 --> 00:40:04,800
And c is really a
setup for this one.

835
00:40:04,800 --> 00:40:06,770
Doesn't really matter
what this means.

836
00:40:06,770 --> 00:40:09,050
It's just that we
have numbers w and f,

837
00:40:09,050 --> 00:40:10,460
where w is always less than f.

838
00:40:10,460 --> 00:40:12,420
And they're between
0 and n squared.

839
00:40:12,420 --> 00:40:15,500
So you should think, this is
a good range for me, right?

840
00:40:15,500 --> 00:40:17,240
That I'm representing
this rational

841
00:40:17,240 --> 00:40:19,650
in terms of two numbers
between 0 and n squared.

842
00:40:19,650 --> 00:40:22,700
So there's like n to
the 4th possible choices

843
00:40:22,700 --> 00:40:24,500
for what w and f are.

844
00:40:24,500 --> 00:40:26,780
So the range of my
values is n to the 4th.

845
00:40:26,780 --> 00:40:29,780
That's the setting where
radix sort should run fast.

846
00:40:29,780 --> 00:40:33,290
Unfortunately, these numbers--
what I want to sort by

847
00:40:33,290 --> 00:40:34,500
is not an integer.

848
00:40:34,500 --> 00:40:36,650
It's a rational.

849
00:40:36,650 --> 00:40:39,020
And that's annoying.

850
00:40:39,020 --> 00:40:43,850
So there are a couple of
ways to solve this problem.

851
00:40:43,850 --> 00:40:48,980
In general, a good way to solve
sorting is to use merge sort.

852
00:40:48,980 --> 00:40:50,590
Merge sort is always
a good answer.

853
00:40:50,590 --> 00:40:51,770
It's not the best answer.

854
00:40:51,770 --> 00:40:54,540
In these cases, we
shaved off a log.

855
00:40:54,540 --> 00:40:55,730
We got to linear time.

856
00:40:55,730 --> 00:40:57,330
But n log n is pretty good.

857
00:40:57,330 --> 00:40:58,530
It's pretty close to n.

858
00:40:58,530 --> 00:41:01,820
So first goal might
be, can we even achieve

859
00:41:01,820 --> 00:41:03,428
n log n via merge sort?

860
00:41:06,710 --> 00:41:13,805
What would I need to do in order
to actually apply merge sort

861
00:41:13,805 --> 00:41:14,840
to this instance?

862
00:41:18,700 --> 00:41:21,690
What does merge
sort do to its keys?

863
00:41:32,110 --> 00:41:32,610
Sorry?

864
00:41:32,610 --> 00:41:34,230
AUDIENCE: Isolate
and compare them.

865
00:41:34,230 --> 00:41:36,930
ERIK DEMAINE: It isolates and
compares them, yeah, right.

866
00:41:36,930 --> 00:41:39,030
So there's an array
data structure.

867
00:41:39,030 --> 00:41:40,410
And it indexes into the array.

868
00:41:40,410 --> 00:41:41,820
That's the isolation.

869
00:41:41,820 --> 00:41:45,330
But then the thing it actually
does with the items themselves

870
00:41:45,330 --> 00:41:46,427
is always a comparison.

871
00:41:46,427 --> 00:41:48,510
And this is why we introduced
the comparison model

872
00:41:48,510 --> 00:41:51,150
and proved an n log n lower
bound in the comparison model,

873
00:41:51,150 --> 00:41:53,010
because merge sort,
and insertion sort,

874
00:41:53,010 --> 00:41:55,920
and selection sort are
all comparison algorithms.

875
00:41:55,920 --> 00:41:57,443
Radix sort is not.

876
00:41:57,443 --> 00:41:58,110
But this one is.

877
00:41:58,110 --> 00:42:03,150
But to apply merge sort, I need
to say, how do I compare wi

878
00:42:03,150 --> 00:42:09,375
over fi versus wj over fj?

879
00:42:12,850 --> 00:42:15,580
My computer only
deals with integers.

880
00:42:15,580 --> 00:42:19,090
We can't actually
represent wi over fi

881
00:42:19,090 --> 00:42:23,710
explicitly in binary, because
it has infinitely many bits.

882
00:42:23,710 --> 00:42:25,240
But I can represent
it implicitly

883
00:42:25,240 --> 00:42:27,080
by storing wi and fi.

884
00:42:27,080 --> 00:42:27,580
Yeah?

885
00:42:27,580 --> 00:42:29,620
AUDIENCE: Multiply by fi and fj.

886
00:42:29,620 --> 00:42:31,570
ERIK DEMAINE: Multiply
by fi and fj, yeah.

887
00:42:31,570 --> 00:42:33,220
When I went-- I
didn't go to school,

888
00:42:33,220 --> 00:42:37,123
but then we learned
cross multiplication,

889
00:42:37,123 --> 00:42:39,165
which is the same as
multiplying both sides by fi

890
00:42:39,165 --> 00:42:41,530
and multiplying both
sides by fj, as you said.

891
00:42:41,530 --> 00:42:48,880
So then we get wi fj less
than question mark f--

892
00:42:48,880 --> 00:42:53,433
whatever-- fi wj.

893
00:42:53,433 --> 00:42:55,600
When we do that, we better
make sure that the things

894
00:42:55,600 --> 00:42:57,183
we're multiplying
by are non-negative.

895
00:42:57,183 --> 00:42:58,750
Otherwise, the sign flips.

896
00:42:58,750 --> 00:43:01,870
But here, we assume
they're all non-negative.

897
00:43:01,870 --> 00:43:02,988
So this is good.

898
00:43:02,988 --> 00:43:05,030
And now we're just
multiplying two integers here,

899
00:43:05,030 --> 00:43:06,970
multiplying two integers
here, and comparing.

900
00:43:06,970 --> 00:43:08,803
Those are all things I
can do in a word RAM.
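
[EDITOR'S SKETCH] A minimal sketch of the cross-multiplication comparison plugged into merge sort. It assumes each Pokemon is a hypothetical (w, f) pair of integers with f strictly positive; the names ratio_less and merge_sort_by_ratio are made up for illustration and are not from the problem set.

```python
def ratio_less(a, b):
    """Return True if a's ratio w/f is strictly less than b's.

    a and b are (w, f) integer pairs with f > 0 (assumed), so
    w_a / f_a < w_b / f_b  is equivalent to  w_a * f_b < w_b * f_a.
    Only integer multiplications and one comparison are used.
    """
    (wa, fa), (wb, fb) = a, b
    return wa * fb < wb * fa


def merge_sort_by_ratio(items):
    """Plain merge sort that only touches keys through ratio_less."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by_ratio(items[:mid])
    right = merge_sort_by_ratio(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Take from the right only when it is strictly smaller (keeps ties stable).
        if ratio_less(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

No division appears anywhere, so the ratio is never materialized as a real number; the sort does O(n log n) comparisons.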

901
00:43:11,440 --> 00:43:13,600
So this was actually
the intended solution

902
00:43:13,600 --> 00:43:14,980
when this problem was posed.

903
00:43:14,980 --> 00:43:16,510
Here's a way to do
comparison sort.

904
00:43:16,510 --> 00:43:17,890
We get n log n.

905
00:43:17,890 --> 00:43:21,140
But in fact, you can
achieve linear time.

906
00:43:21,140 --> 00:43:21,640
Yeah?

907
00:43:21,640 --> 00:43:23,723
AUDIENCE: [INAUDIBLE] that
solution, how would you

908
00:43:23,723 --> 00:43:26,110
quickly say which one's bigger?

909
00:43:26,110 --> 00:43:29,573
Because wi times f
of j, one of them

910
00:43:29,573 --> 00:43:32,320
belongs to one of the
Pokemons, and the other one

911
00:43:32,320 --> 00:43:34,852
is [INAUDIBLE].

912
00:43:34,852 --> 00:43:37,060
ERIK DEMAINE: I feel like
there's a joke here, like--

913
00:43:37,060 --> 00:43:38,523
AUDIENCE: [INAUDIBLE]

914
00:43:38,523 --> 00:43:39,940
ERIK DEMAINE:
Pikachu is superior.

915
00:43:39,940 --> 00:43:42,860
That's always the answer.

916
00:43:42,860 --> 00:43:44,950
So how do I tell
whether one Pokemon

917
00:43:44,950 --> 00:43:46,730
is superior to the other?

918
00:43:46,730 --> 00:43:51,510
If I multiply my--

919
00:43:51,510 --> 00:43:56,020
I multiply i's f value
with j's w value.

920
00:43:56,020 --> 00:43:59,890
And I see whether that's
greater than i's w

921
00:43:59,890 --> 00:44:02,510
value times j's f value.

922
00:44:02,510 --> 00:44:03,830
And if it is--

923
00:44:03,830 --> 00:44:06,718
so these are equivalent.

924
00:44:06,718 --> 00:44:08,260
If this one is
greater than this one,

925
00:44:08,260 --> 00:44:10,180
I know that this is
greater than this.

926
00:44:10,180 --> 00:44:14,350
These are equivalent sentences
by mathematics, by algebra.

927
00:44:14,350 --> 00:44:16,630
And so this is what
I want to know.

928
00:44:16,630 --> 00:44:18,760
This would say j
is superior to i.

929
00:44:18,760 --> 00:44:22,010
And so I determine that
by actually doing this.

930
00:44:22,010 --> 00:44:25,060
So then I don't have to divide
and deal with real numbers,

931
00:44:25,060 --> 00:44:27,790
because I don't know how,
because I'm a computer.

932
00:44:31,110 --> 00:44:32,430
We're all computers in the end.

933
00:44:39,345 --> 00:44:39,845
OK.

934
00:44:53,050 --> 00:44:57,100
So it would be great
if my numbers all

935
00:44:57,100 --> 00:44:58,670
had the same denominator.

936
00:44:58,670 --> 00:45:02,530
If they all had the same f, then
I could just compare the w's.

937
00:45:02,530 --> 00:45:06,220
So that's one intuition
for why we can actually

938
00:45:06,220 --> 00:45:09,070
do this in linear time.

939
00:45:09,070 --> 00:45:11,440
But the way I like to
think about it-- so let's

940
00:45:11,440 --> 00:45:16,240
just draw the real
interval from 0 to 1.

941
00:45:16,240 --> 00:45:23,040
And there are various spots
all over here that represent--

942
00:45:23,040 --> 00:45:24,310
I can't actually compute this.

943
00:45:24,310 --> 00:45:27,310
But conceptually, each of
these wi over fi's falls somewhere

944
00:45:27,310 --> 00:45:29,790
in that interval from 0 to 1.

945
00:45:29,790 --> 00:45:34,012
And I want to sort them somehow.

946
00:45:34,012 --> 00:45:35,470
So one thing that
would be great is

947
00:45:35,470 --> 00:45:37,030
if I could take
these real numbers

948
00:45:37,030 --> 00:45:42,830
and somehow map
them to integers,

949
00:45:42,830 --> 00:45:47,560
which are uniformly spaced,
maybe a few more of them.

950
00:45:47,560 --> 00:45:50,620
But these go from
0 to u minus 1.

951
00:45:50,620 --> 00:45:53,080
And if I could get
u relatively small,

952
00:45:53,080 --> 00:45:55,550
and I could map each of these--

953
00:45:55,550 --> 00:45:58,450
so I want that mapping
to be order preserving.

954
00:45:58,450 --> 00:46:03,160
And I want two very close,
but distinct items to map to--

955
00:46:03,160 --> 00:46:04,600
distinct keys here.

956
00:46:04,600 --> 00:46:06,760
I want them to map to
distinct integers down here.

957
00:46:06,760 --> 00:46:08,968
If I could do that, then I
just sort by the integers.

958
00:46:08,968 --> 00:46:13,070
And that's the same as
sorting by the real numbers.

959
00:46:13,070 --> 00:46:17,650
And so at this point,
I wonder, how close

960
00:46:17,650 --> 00:46:20,350
can two of these numbers be?

961
00:46:20,350 --> 00:46:30,010
So how close can two keys be?

962
00:46:30,010 --> 00:46:37,510
So I want to consider wi
over fi minus wj over fj

963
00:46:37,510 --> 00:46:38,580
in absolute value.

964
00:46:41,110 --> 00:46:43,550
Now I do algebra.

965
00:46:43,550 --> 00:46:44,830
So this is--

966
00:46:44,830 --> 00:46:47,900
I'd like to bring
this into one ratio.

967
00:46:47,900 --> 00:46:49,090
So this is--

968
00:46:49,090 --> 00:46:52,782
I can do that by multiplying
one by fi, one by fj.

969
00:46:52,782 --> 00:46:58,360
Now that's wi fj minus
wj fi, which should

970
00:46:58,360 --> 00:47:01,030
look a lot like something here.

971
00:47:01,030 --> 00:47:02,740
But never mind.

972
00:47:02,740 --> 00:47:04,520
I'm sure there's a
deep connection here.

973
00:47:04,520 --> 00:47:08,430
I can probably use this to
prove that and vice versa.

974
00:47:08,430 --> 00:47:08,930
Cool.

975
00:47:08,930 --> 00:47:11,980
So with some absolute
values, same thing.

976
00:47:11,980 --> 00:47:15,160
Maybe these are non-negative,
so I can actually

977
00:47:15,160 --> 00:47:18,220
just put absolute
values on the top part.

978
00:47:18,220 --> 00:47:24,070
And OK, wi is an integer, fj is
an integer, wj is an integer,

979
00:47:24,070 --> 00:47:27,310
fi is an integer, all
greater than or equal to 0.

980
00:47:27,310 --> 00:47:30,560
So this thing is an integer.

981
00:47:30,560 --> 00:47:33,405
So it could be equal to 0.

982
00:47:33,405 --> 00:47:35,530
It's a non-negative integer,
because all the things

983
00:47:35,530 --> 00:47:36,400
are non-negative.

984
00:47:36,400 --> 00:47:37,390
It could be equal to 0.

985
00:47:37,390 --> 00:47:38,765
But if they're
equal to 0, that's

986
00:47:38,765 --> 00:47:41,590
actually identical
ratios, right?

987
00:47:41,590 --> 00:47:43,240
If this is 0, the
whole thing is 0.

988
00:47:43,240 --> 00:47:46,180
And so these two
values were the same.

989
00:47:46,180 --> 00:47:47,560
OK, but let's
suppose it's not 0.

990
00:47:47,560 --> 00:47:50,170
If it's not 0, it's
actually at least 1,

991
00:47:50,170 --> 00:47:53,950
the absolute value,
because it's an integer.

992
00:47:53,950 --> 00:47:55,210
What about the bottom?

993
00:47:55,210 --> 00:47:57,950
fi-- so now we want this--

994
00:47:57,950 --> 00:48:00,040
I want to know how
small this ratio can be.

995
00:48:00,040 --> 00:48:02,860
It's going to be small when
this is small and this is big.

996
00:48:02,860 --> 00:48:05,030
How big could fi fj be?

997
00:48:05,030 --> 00:48:07,720
Well, we're told that all the
f's are less than n squared.

998
00:48:07,720 --> 00:48:10,610
So this thing is
at most n squared,

999
00:48:10,610 --> 00:48:14,380
n to the 4th, less
than n to the 4th--

1000
00:48:14,380 --> 00:48:17,620
n squared minus 1 squared,
less than n to the 4th.

1001
00:48:17,620 --> 00:48:20,050
AUDIENCE: [INAUDIBLE]

1002
00:48:20,050 --> 00:48:21,910
ERIK DEMAINE: fi is
at most n squared.

1003
00:48:21,910 --> 00:48:24,230
fj is at most n squared.

1004
00:48:24,230 --> 00:48:26,450
So it's n squared squared.

1005
00:48:26,450 --> 00:48:29,570
So this is at least
1 over n to the 4th.

1006
00:48:29,570 --> 00:48:32,830
So the closest the two
points can get here

1007
00:48:32,830 --> 00:48:34,810
is 1 over n to the 4th.
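
[EDITOR'S SKETCH] The board algebra for the gap between two distinct keys, written out under the stated constraints that every f is a positive integer less than n squared and every w is a non-negative integer:

```latex
\[
\left|\frac{w_i}{f_i}-\frac{w_j}{f_j}\right|
  = \frac{\lvert w_i f_j - w_j f_i\rvert}{f_i f_j}
  \;\ge\; \frac{1}{f_i f_j}
  \;\ge\; \frac{1}{(n^2-1)^2}
  \;>\; \frac{1}{n^4}
  \qquad\text{whenever } \frac{w_i}{f_i}\neq\frac{w_j}{f_j}.
\]
```

The first inequality holds because the numerator is a nonzero integer, so its absolute value is at least 1.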

1008
00:48:34,810 --> 00:48:39,430
So what can I do to scale
that up to make them

1009
00:48:39,430 --> 00:48:41,860
kind of like integers?

1010
00:48:41,860 --> 00:48:45,400
Multiply by n to the 4th.

1011
00:48:45,400 --> 00:48:54,220
So just multiply by n to
the 4th and then floor.

1012
00:48:54,220 --> 00:48:59,170
So we're going to
take each fi over--

1013
00:48:59,170 --> 00:49:00,640
I'd like to compute this ratio.

1014
00:49:00,640 --> 00:49:02,110
But I don't know how.

1015
00:49:02,110 --> 00:49:05,320
So instead, I'm going
to take fi, multiply--

1016
00:49:05,320 --> 00:49:05,890
OK.

1017
00:49:05,890 --> 00:49:08,890
Conceptually, what I want to
do is multiply by n to the 4th

1018
00:49:08,890 --> 00:49:10,510
and take the floor.

1019
00:49:10,510 --> 00:49:16,840
How do I actually do this
in a machine that doesn't

1020
00:49:16,840 --> 00:49:18,310
have real numbers like this?

1021
00:49:20,900 --> 00:49:23,575
So I don't have a
floor operation.

1022
00:49:23,575 --> 00:49:26,200
I just have integer operations.

1023
00:49:26,200 --> 00:49:36,610
Then I can take fi,
multiply it by n to the 4th,

1024
00:49:36,610 --> 00:49:38,450
and integer divide by wj.

1025
00:49:41,290 --> 00:49:42,400
That is the same--

1026
00:49:42,400 --> 00:49:45,160
that computes exactly
this, because I

1027
00:49:45,160 --> 00:49:46,910
can do the multiplication
and the division

1028
00:49:46,910 --> 00:49:50,192
in either order in real space.

1029
00:49:50,192 --> 00:49:52,400
And then this does the floor
at the appropriate time.

1030
00:49:52,400 --> 00:49:54,620
But this is just
operations on integers.

1031
00:49:54,620 --> 00:49:57,010
And now these are
integers representing

1032
00:49:57,010 --> 00:49:59,080
how good my Pokemon are.

1033
00:49:59,080 --> 00:50:02,110
They have the property that
any two distinct ones--

1034
00:50:02,110 --> 00:50:04,690
before I take the floor, any
two distinct ones are at least 1

1035
00:50:04,690 --> 00:50:05,493
apart.

1036
00:50:05,493 --> 00:50:07,660
So after I take the floor,
they will remain 1 apart.

1037
00:50:07,660 --> 00:50:09,760
They will remain
distinct integers.

1038
00:50:09,760 --> 00:50:12,430
And so I have successfully
mapped my real numbers

1039
00:50:12,430 --> 00:50:14,710
to integers where distinct
real numbers map to

1040
00:50:14,710 --> 00:50:15,590
distinct integers.

1041
00:50:15,590 --> 00:50:16,090
Yeah?

1042
00:50:16,090 --> 00:50:16,715
AUDIENCE: Wait.

1043
00:50:16,715 --> 00:50:18,870
So why is fi now in
the numerator, and wi

1044
00:50:18,870 --> 00:50:19,760
in the denominator?

1045
00:50:19,760 --> 00:50:21,010
ERIK DEMAINE: Did I flip them?

1046
00:50:21,010 --> 00:50:22,450
Yeah, sorry.

1047
00:50:22,450 --> 00:50:26,620
Please invert
everything-- just here.

1048
00:50:26,620 --> 00:50:27,670
This is w and fi.

1049
00:50:27,670 --> 00:50:28,630
That was just a typo.

1050
00:50:33,390 --> 00:50:35,016
That's all of them.

1051
00:50:35,016 --> 00:50:36,480
OK.

1052
00:50:36,480 --> 00:50:38,200
AUDIENCE: Are they
both i's or j's?

1053
00:50:41,840 --> 00:50:45,310
ERIK DEMAINE: These are
supposed to both be i's, yeah.

1054
00:50:45,310 --> 00:50:46,410
Thank you.

1055
00:50:46,410 --> 00:50:49,680
This was for each
Pokemon, i, we're

1056
00:50:49,680 --> 00:50:52,595
going to compute
this as our key.

1057
00:50:52,595 --> 00:50:54,720
And then we're going to
sort by those integer keys.

1058
00:50:54,720 --> 00:50:56,678
And that will sort the
Pokemon by their ratios.

1059
00:51:00,930 --> 00:51:05,080
Let's write mon for monster.

1060
00:51:05,080 --> 00:51:05,580
Yeah?

1061
00:51:05,580 --> 00:51:10,560
AUDIENCE: [INAUDIBLE]
u minus 1 [INAUDIBLE]??

1062
00:51:10,560 --> 00:51:12,960
ERIK DEMAINE: So u was just a--

1063
00:51:12,960 --> 00:51:16,028
sorry, this is-- a label on
this thing might help you.

1064
00:51:16,028 --> 00:51:16,570
AUDIENCE: Oh.

1065
00:51:16,570 --> 00:51:19,050
ERIK DEMAINE: Yeah.

1066
00:51:19,050 --> 00:51:21,720
So now my u-- oh, right.

1067
00:51:21,720 --> 00:51:23,110
What is my u?

1068
00:51:23,110 --> 00:51:24,480
What is my largest key?

1069
00:51:32,048 --> 00:51:36,145
It occurs to me, I really would
like fi to be bigger than 0.

1070
00:51:36,145 --> 00:51:40,540
But let's not worry about it.

1071
00:51:40,540 --> 00:51:42,040
How big can u be?

1072
00:51:42,040 --> 00:51:45,410
Well, the biggest this
can be is if fi is small,

1073
00:51:45,410 --> 00:51:46,240
and this is big.

1074
00:51:46,240 --> 00:51:47,920
Let's say fi can
only go down to 1.

1075
00:51:47,920 --> 00:51:49,645
Otherwise, we'll
get a division by 0.

1076
00:51:49,645 --> 00:51:51,910
We have to deal with
infinity specially.

1077
00:51:51,910 --> 00:51:54,250
Probably, the problem isn't
even well defined then.

1078
00:51:54,250 --> 00:51:55,270
How big could this be?

1079
00:51:55,270 --> 00:51:57,049
Well, I know the wi's--

1080
00:51:57,049 --> 00:51:58,632
AUDIENCE: f's are
defined as positive.

1081
00:51:58,632 --> 00:51:59,890
ERIK DEMAINE: Oh, good.

1082
00:51:59,890 --> 00:52:00,498
Thank you.

1083
00:52:00,498 --> 00:52:02,290
So there's also a
positive constraint here.

1084
00:52:02,290 --> 00:52:07,720
Just I failed to preserve
that constraint in my mapping

1085
00:52:07,720 --> 00:52:10,150
from the word problem
into the formal problem.

1086
00:52:12,950 --> 00:52:14,800
So f is at least 1.

1087
00:52:14,800 --> 00:52:15,490
Good.

1088
00:52:15,490 --> 00:52:17,890
But worst case is when it's 1.

1089
00:52:17,890 --> 00:52:19,730
And when wi-- how
big could it be?

1090
00:52:19,730 --> 00:52:21,340
Well, n squared minus 1.

1091
00:52:21,340 --> 00:52:23,680
So this could be,
basically, n squared

1092
00:52:23,680 --> 00:52:27,415
times n to the 4th divided
by 1, which is n to the 6th.

1093
00:52:27,415 --> 00:52:31,720
So w-- or sorry, u, the
largest key I can have plus 1,

1094
00:52:31,720 --> 00:52:33,550
is n to the 6th.

1095
00:52:33,550 --> 00:52:36,400
But that's OK, because
radix sort can handle

1096
00:52:36,400 --> 00:52:38,780
any fixed polynomial in n.

1097
00:52:38,780 --> 00:52:41,305
So it's going to end up doing
six counting sort passes.
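
[EDITOR'S SKETCH] A minimal sketch of the linear-time approach, assuming each Pokemon is a hypothetical (w, f) integer pair with 0 <= w < n**2 and 1 <= f < n**2. The integer key is (w * n**4) // f, and the radix sort works in base n with six stable counting-sort passes. The function names are invented for illustration.

```python
def counting_sort_by_digit(pairs, base, exp):
    """Stable counting sort of (key, item) pairs on digit (key // exp) % base."""
    buckets = [[] for _ in range(base)]
    for key, item in pairs:
        buckets[(key // exp) % base].append((key, item))
    return [p for bucket in buckets for p in bucket]


def sort_by_ratio_linear(mons):
    """Sort (w, f) pairs by w / f using integer keys and radix sort.

    Assumes 0 <= w < n**2 and 1 <= f < n**2 with n = len(mons),
    so every key is below n**6 and six base-n digits suffice.
    """
    n = max(len(mons), 2)  # the base must be at least 2
    # Multiply first, then integer-divide: this is floor(n^4 * w / f).
    pairs = [((w * n**4) // f, (w, f)) for w, f in mons]
    exp = 1
    for _ in range(6):  # least significant digit first
        pairs = counting_sort_by_digit(pairs, n, exp)
        exp *= n
    return [item for _, item in pairs]
```

Each pass costs O(n) time, so the whole sort is linear. Equal ratios get equal keys and keep their input order because every pass is stable.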

1098
00:52:46,960 --> 00:52:51,820
OK, that's problem 3.

1099
00:52:51,820 --> 00:52:54,640
Let's move on to problem 4.

1100
00:53:18,390 --> 00:53:22,260
So problem 4, MIT has
employed Gank Frehry.

1101
00:53:22,260 --> 00:53:22,800
Who's that?

1102
00:53:26,610 --> 00:53:27,790
Frank Gehry, yeah.

1103
00:53:27,790 --> 00:53:34,970
This is a common encoding
that Jason really likes.

1104
00:53:34,970 --> 00:53:38,820
I've grown to like it.

1105
00:53:38,820 --> 00:53:42,720
This is called a spoonerism,
where you replace some part

1106
00:53:42,720 --> 00:53:45,580
of the beginning of your thing.

1107
00:53:45,580 --> 00:53:47,410
OK, that's one joke.

1108
00:53:47,410 --> 00:53:49,357
There's another joke
in this problem.

1109
00:53:49,357 --> 00:53:51,690
Anyway, they're building a
new wing of the Stata Center,

1110
00:53:51,690 --> 00:53:53,280
as one does.

1111
00:53:53,280 --> 00:53:55,510
We have a bunch of cubes.

1112
00:53:55,510 --> 00:53:58,620
If you read long enough, you
realize that's a red herring.

1113
00:53:58,620 --> 00:54:00,810
Cubes do not play a
role in this problem.

1114
00:54:00,810 --> 00:54:03,870
In the end, what
we have is a bunch

1115
00:54:03,870 --> 00:54:10,197
of integers, which happen to be
the side length of the cubes.

1116
00:54:10,197 --> 00:54:12,030
But we just care about
the side lengths, not

1117
00:54:12,030 --> 00:54:14,310
their volume or anything--

1118
00:54:14,310 --> 00:54:21,640
s n minus 1.

1119
00:54:21,640 --> 00:54:32,015
And we want two numbers
in s summing to h.

1120
00:54:32,015 --> 00:54:34,440
AUDIENCE: This is
dumb, but how can cubes

1121
00:54:34,440 --> 00:54:36,553
have more than six sides?

1122
00:54:36,553 --> 00:54:38,220
ERIK DEMAINE: This
is a side length, not

1123
00:54:38,220 --> 00:54:39,610
the number of sides.

1124
00:54:39,610 --> 00:54:40,470
So a cube--

1125
00:54:40,470 --> 00:54:43,518
AUDIENCE: Oh, OK.

1126
00:54:43,518 --> 00:54:44,310
ERIK DEMAINE: Cool.

1127
00:54:44,310 --> 00:54:46,470
I didn't know we'd be
doing 3D geometry today.

1128
00:54:46,470 --> 00:54:48,090
That's si.

1129
00:54:48,090 --> 00:54:49,800
OK, so you got little cubes.

1130
00:54:49,800 --> 00:54:52,020
You've got big cubes.

1131
00:54:52,020 --> 00:54:53,100
This is a small si.

1132
00:54:53,100 --> 00:54:53,910
This is a big si.

1133
00:54:53,910 --> 00:54:54,868
Doesn't matter, though.

1134
00:54:54,868 --> 00:54:56,163
They're just numbers.

1135
00:54:56,163 --> 00:54:57,330
We're not using them at all.

1136
00:54:57,330 --> 00:54:58,650
In the problem,
you're trying to like

1137
00:54:58,650 --> 00:54:59,860
stack one cube on the other.

1138
00:54:59,860 --> 00:55:01,820
But all we really care
about is two numbers

1139
00:55:01,820 --> 00:55:06,930
whose sum, regular old
sum, is exactly h, ideally.

1140
00:55:06,930 --> 00:55:09,250
There's going to be two
versions of this problem.

1141
00:55:09,250 --> 00:55:12,360
And so the first goal
is to solve this exactly

1142
00:55:12,360 --> 00:55:17,550
in linear expected time.

1143
00:55:17,550 --> 00:55:20,100
That's what the problem says.

1144
00:55:20,100 --> 00:55:23,110
So what do we know?

1145
00:55:23,110 --> 00:55:24,523
Well, linear time, that's--

1146
00:55:24,523 --> 00:55:26,940
can't get much faster than
that, because we need that just

1147
00:55:26,940 --> 00:55:28,170
to read the input.

1148
00:55:28,170 --> 00:55:32,220
Expected time-- hashing, right?

1149
00:55:32,220 --> 00:55:34,680
We're told, basically,
we should use hashing.

1150
00:55:34,680 --> 00:55:36,540
Now, if we're really
annoying, maybe we

1151
00:55:36,540 --> 00:55:38,290
throw that in even
when you don't need it.

1152
00:55:38,290 --> 00:55:40,750
But that's pretty rare.

1153
00:55:40,750 --> 00:55:43,728
So when we see expected,
we should, in a problem

1154
00:55:43,728 --> 00:55:45,270
set setting like
this-- in real life,

1155
00:55:45,270 --> 00:55:46,728
you never know what
you should use.

1156
00:55:46,728 --> 00:55:49,110
But in our-- with your
learning in this class,

1157
00:55:49,110 --> 00:55:51,082
we're going to tell
you basically what

1158
00:55:51,082 --> 00:55:52,290
tricks you're allowed to use.

1159
00:55:52,290 --> 00:55:54,040
Here, you're allowed
to use randomization.

1160
00:55:54,040 --> 00:55:55,590
So probably, we need it.

1161
00:55:55,590 --> 00:55:59,760
Indeed, you need it
to achieve this bound.

1162
00:55:59,760 --> 00:56:01,650
Cool.

1163
00:56:01,650 --> 00:56:03,210
Hashing.

1164
00:56:03,210 --> 00:56:05,460
Not obvious how to approach
this problem with hashing.

1165
00:56:05,460 --> 00:56:08,640
So I'm going to
give you the way I--

1166
00:56:08,640 --> 00:56:10,800
it's hard for me to not
know this algorithm.

1167
00:56:10,800 --> 00:56:14,400
But to me, the first thing
you should think about

1168
00:56:14,400 --> 00:56:17,760
is if I have linear
time and n things,

1169
00:56:17,760 --> 00:56:22,800
and I'm going to use hashing,
the obvious thing to do

1170
00:56:22,800 --> 00:56:25,590
is to take those n things
and put them in a hash table.

1171
00:56:25,590 --> 00:56:26,400
Build.

1172
00:56:26,400 --> 00:56:28,540
Why not?

1173
00:56:28,540 --> 00:56:38,400
So let's just build a hash
table on all the keys in s.

1174
00:56:38,400 --> 00:56:39,540
That's idea one.

1175
00:56:45,510 --> 00:56:46,980
Seems like the
first thing to try.

1176
00:56:46,980 --> 00:56:49,440
So what does that let me do?

1177
00:56:49,440 --> 00:56:50,640
It lets me--

1178
00:56:50,640 --> 00:56:52,950
I just erased the
interface for hash tables.

1179
00:56:52,950 --> 00:56:55,170
But I can build a
sequence out of it.

1180
00:56:55,170 --> 00:56:57,630
But normally, it gives
me a set interface.

1181
00:56:57,630 --> 00:56:59,880
So I can call find
now in constant time.

1182
00:56:59,880 --> 00:57:03,330
It lets me, given the
number, determine immediately

1183
00:57:03,330 --> 00:57:06,050
whether that number is in s.

1184
00:57:06,050 --> 00:57:07,800
Well, that sounds
interesting, because I'm

1185
00:57:07,800 --> 00:57:09,990
looking for two numbers in s.

1186
00:57:09,990 --> 00:57:11,740
So it lets me find one of them.

1187
00:57:11,740 --> 00:57:13,710
So I call it twice?

1188
00:57:13,710 --> 00:57:14,880
No.

1189
00:57:14,880 --> 00:57:17,342
Calling it twice and only
spending constant time

1190
00:57:17,342 --> 00:57:19,050
on this beautiful data
structure will not

1191
00:57:19,050 --> 00:57:20,220
give you anything useful.

1192
00:57:23,310 --> 00:57:25,070
But we have linear time, right?

1193
00:57:25,070 --> 00:57:26,940
So in addition to
building a table,

1194
00:57:26,940 --> 00:57:29,730
we could call find on
that table a linear number

1195
00:57:29,730 --> 00:57:32,610
of times, because each find
only takes constant expected

1196
00:57:32,610 --> 00:57:33,910
amortized time.

1197
00:57:33,910 --> 00:57:37,410
So if I do n of them, that
will take linear expected time.

1198
00:57:37,410 --> 00:57:40,850
The amortization disappears,
because I'm using it n times.

1199
00:57:40,850 --> 00:57:41,725
AUDIENCE: [INAUDIBLE]

1200
00:57:41,725 --> 00:57:42,725
ERIK DEMAINE: Oh, right.

1201
00:57:42,725 --> 00:57:44,020
Find never has amortization.

1202
00:57:44,020 --> 00:57:46,740
So it doesn't disappear,
because it was never there.

1203
00:57:46,740 --> 00:57:47,610
Never mind.

1204
00:57:47,610 --> 00:57:52,530
I can afford n calls,
or 5n calls, to find,

1205
00:57:52,530 --> 00:57:55,290
because each one costs
constant expected.

1206
00:57:55,290 --> 00:57:57,250
And the total for that
will be linear time.

1207
00:57:57,250 --> 00:58:04,966
So the next idea is let's
just somehow call find

1208
00:58:04,966 --> 00:58:12,280
a linear number of times, OK?

1209
00:58:12,280 --> 00:58:16,440
So I want to find two numbers
summing to a given value, h.

1210
00:58:16,440 --> 00:58:19,540
That wasn't maybe
clear, but h is given.

1211
00:58:19,540 --> 00:58:20,440
AUDIENCE: Sorry.

1212
00:58:20,440 --> 00:58:22,720
How long does it take
to build the hash table?

1213
00:58:22,720 --> 00:58:24,130
ERIK DEMAINE: How long does
it take to build a hash table?

1214
00:58:24,130 --> 00:58:26,422
It was previously on this
board-- linear expected time.

1215
00:58:29,360 --> 00:58:32,421
See previous lecture.

1216
00:58:32,421 --> 00:58:35,220
No, two years ago.

1217
00:58:35,220 --> 00:58:35,860
OK.

1218
00:58:35,860 --> 00:58:37,410
Well, if we're going to do
this a linear number of times,

1219
00:58:37,410 --> 00:58:39,310
I guess we should
have a for loop.

1220
00:58:39,310 --> 00:58:41,145
Let's do a for loop
over the numbers.

1221
00:58:41,145 --> 00:58:44,370
That's the next idea.

1222
00:58:44,370 --> 00:58:48,247
Loop over s.

1223
00:58:48,247 --> 00:58:49,830
And at this point,
we're done, almost.

1224
00:58:52,560 --> 00:58:55,330
I want space.

1225
00:58:55,330 --> 00:58:57,000
So I want to loop
over the numbers.

1226
00:58:57,000 --> 00:58:58,860
And each one, I
want to do a find.

1227
00:58:58,860 --> 00:59:00,900
That's kind of all
I have time to do.

1228
00:59:00,900 --> 00:59:04,900
So seems like a
natural thing to try.

1229
00:59:04,900 --> 00:59:06,670
This is by no means easy.

1230
00:59:06,670 --> 00:59:07,890
Don't get me wrong.

1231
00:59:07,890 --> 00:59:09,570
Having these ideas is--

1232
00:59:09,570 --> 00:59:12,150
while I'm explaining them
as the obvious ideas,

1233
00:59:12,150 --> 00:59:13,600
they're not obvious.

1234
00:59:13,600 --> 00:59:17,040
But they are easy,
at least, just not

1235
00:59:17,040 --> 00:59:19,300
obvious to come up
with the easy ideas.

1236
00:59:19,300 --> 00:59:22,590
So let's loop over
s, somehow call find,

1237
00:59:22,590 --> 00:59:23,670
using our hash table.

1238
00:59:23,670 --> 00:59:25,860
So the order is actually, we're
going to build the hash table,

1239
00:59:25,860 --> 00:59:26,500
then loop.

1240
00:59:26,500 --> 00:59:28,583
And inside the loop, we're
going to call find once

1241
00:59:28,583 --> 00:59:31,000
per loop iteration.

1242
00:59:31,000 --> 00:59:32,020
So let's do it.

1243
00:59:32,020 --> 00:59:37,410
Let's say, for si in S--

1244
00:59:40,320 --> 00:59:42,210
so I want to find two numbers.

1245
00:59:42,210 --> 00:59:46,050
Here, I have exhaustively
looped over one number.

1246
00:59:46,050 --> 00:59:47,850
I just need to find
the second number that

1247
00:59:47,850 --> 00:59:49,600
can possibly add up, right?

1248
00:59:49,600 --> 00:59:57,390
I want to find whether
there's an sj in S such

1249
00:59:57,390 --> 01:00:05,090
that si plus sj equals h.

1250
01:00:08,030 --> 01:00:09,530
Can I do that query with find?

1251
01:00:12,290 --> 01:00:12,790
How?

1252
01:00:16,350 --> 01:00:17,790
So what does find do?

1253
01:00:17,790 --> 01:00:22,590
Find says, if I give you a key,
it will tell me whether-- like,

1254
01:00:22,590 --> 01:00:25,800
if I knew what sj
was, it would tell me

1255
01:00:25,800 --> 01:00:29,346
whether it's in S. Yeah?

1256
01:00:29,346 --> 01:00:31,776
AUDIENCE: Can't you
just subtract h from si

1257
01:00:31,776 --> 01:00:33,450
and then see if [INAUDIBLE]?

1258
01:00:33,450 --> 01:00:38,860
ERIK DEMAINE: Subtract h from
si and see whether that exists.

1259
01:00:38,860 --> 01:00:39,720
Did I get it right?

1260
01:00:39,720 --> 01:00:40,890
AUDIENCE: h minus si.

1261
01:00:40,890 --> 01:00:42,092
ERIK DEMAINE: h minus si.

1262
01:00:42,092 --> 01:00:43,582
I always get it wrong.

1263
01:00:46,500 --> 01:00:48,330
Don't feel bad that
you also got it wrong.

1264
01:00:48,330 --> 01:00:51,040
It makes me feel better,
because I always get it wrong.

1265
01:00:51,040 --> 01:00:52,650
So the claim is this.

1266
01:00:52,650 --> 01:00:53,460
Why?

1267
01:00:53,460 --> 01:00:57,570
Because what we
want to do is find--

1268
01:00:57,570 --> 01:00:58,890
well, OK.

1269
01:00:58,890 --> 01:01:01,040
Let's see what it
says over here.

1270
01:01:01,040 --> 01:01:05,100
So if we do h minus
si equals sj--

1271
01:01:05,100 --> 01:01:07,740
so these are
equivalent statements,

1272
01:01:07,740 --> 01:01:10,320
just by moving the si over.

1273
01:01:10,320 --> 01:01:12,460
And this is a query we can do.

1274
01:01:12,460 --> 01:01:15,330
So let's remember, these
are things we know.

1275
01:01:19,830 --> 01:01:21,870
And s.j is something
we don't know.

1276
01:01:21,870 --> 01:01:24,140
All that we know
is that it's in S.

1277
01:01:24,140 --> 01:01:25,650
OK, so we know these two things.

1278
01:01:25,650 --> 01:01:27,970
So if we bring them
over to the same side,

1279
01:01:27,970 --> 01:01:30,220
we're searching for
an unknown thing,

1280
01:01:30,220 --> 01:01:33,030
which is equal to exactly this
thing that we can compute.

1281
01:01:33,030 --> 01:01:35,190
So we just compute h minus si.

1282
01:01:35,190 --> 01:01:35,910
We call find.

1283
01:01:35,910 --> 01:01:40,230
That will tell us whether
there is an sj equal to this.

1284
01:01:40,230 --> 01:01:43,590
OK, so this is like a comment.

1285
01:01:43,590 --> 01:01:45,990
And this is what we actually do.

1286
01:01:45,990 --> 01:01:49,050
And if there is a pair
of numbers summing to h,

1287
01:01:49,050 --> 01:01:50,400
this will find it.

1288
01:01:50,400 --> 01:01:52,090
How much time did it take?

1289
01:01:52,090 --> 01:01:54,690
Well, we're doing n
iterations of this loop.

1290
01:01:54,690 --> 01:01:57,450
Each one, we're calling
a single find operation.

1291
01:01:57,450 --> 01:02:01,760
And find costs
constant expected time.

1292
01:02:01,760 --> 01:02:05,700
And so the total is
linear expected time.

1293
01:02:05,700 --> 01:02:07,050
Great.

1294
01:02:07,050 --> 01:02:08,136
Part A done.
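
[EDITOR'S SKETCH] A minimal sketch of part (a), using Python's built-in set as the hash table. The function name is made up for illustration. One assumption to flag: as written, an element may pair with itself when h equals twice its value; whether that is allowed depends on the problem statement, and ruling it out would need a count per value.

```python
def find_pair_summing_to(S, h):
    """Return (si, sj) with si + sj == h and both in S, or None if no pair exists."""
    table = set(S)  # build the hash table: expected O(n)
    for si in S:  # n loop iterations
        # si + sj == h  is the same as  sj == h - si, which is a key we can look up.
        if (h - si) in table:  # find: expected O(1)
            return (si, h - si)
    return None
```

The total is one build plus n finds, which is linear expected time.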

1295
01:02:19,560 --> 01:02:24,330
Then they throw part b
at us, make it harder.

1296
01:02:24,330 --> 01:02:25,950
Those pesky instructors.

1297
01:02:33,680 --> 01:02:37,010
So we read part b.

1298
01:02:37,010 --> 01:02:41,240
Part b says two things
to make it harder.

1299
01:02:41,240 --> 01:02:45,200
So first of all, we want
linear worst-case time.

1300
01:02:50,290 --> 01:02:54,360
And furthermore-- so we
can't use hashing anymore.

1301
01:02:54,360 --> 01:02:57,270
Furthermore-- so
here, we just needed

1302
01:02:57,270 --> 01:02:59,460
to solve the exact problem
to find whether the two

1303
01:02:59,460 --> 01:03:01,470
numbers sum exactly to h.

1304
01:03:01,470 --> 01:03:04,350
Now we would like to find the
best solution smaller than

1305
01:03:04,350 --> 01:03:05,940
or equal to h.

1306
01:03:05,940 --> 01:03:17,730
So find biggest
pairwise sum that's

1307
01:03:17,730 --> 01:03:22,890
less than or equal to h if
there's no perfect pair.

1308
01:03:22,890 --> 01:03:28,620
But we're given a little bit
of extra information, which

1309
01:03:28,620 --> 01:03:37,122
is, we can assume h
equals 600 n to the 6th.

1310
01:03:37,122 --> 01:03:40,620
That's a weird polynomial.

1311
01:03:40,620 --> 01:03:43,380
Took me a while to even notice
that that was a joke in here--

1312
01:03:43,380 --> 01:03:46,950
6006, hiding in a polynomial.

1313
01:03:46,950 --> 01:03:49,470
All right, so polynomial.

1314
01:03:49,470 --> 01:03:51,060
Hm.

1315
01:03:51,060 --> 01:03:52,680
That should make you
think radix sort.

1316
01:03:52,680 --> 01:03:54,160
It is radix sort week.

1317
01:03:54,160 --> 01:03:56,160
So that is a natural
thing to try.

1318
01:03:56,160 --> 01:03:58,530
But in general, even
later in the semester,

1319
01:03:58,530 --> 01:04:02,580
when you see a nice
polynomial with a fixed

1320
01:04:02,580 --> 01:04:04,178
constant like this,
and it's somehow

1321
01:04:04,178 --> 01:04:05,970
related to the integers
we're dealing with,

1322
01:04:05,970 --> 01:04:07,137
you should think radix sort.

1323
01:04:07,137 --> 01:04:09,730
Especially because now, we
want linear worst-case time,

1324
01:04:09,730 --> 01:04:11,820
radix sort seems like
a good thing to do.

1325
01:04:11,820 --> 01:04:13,380
Don't know what
to do with it yet.

1326
01:04:13,380 --> 01:04:15,810
In fact, I can't even
apply radix sort.

1327
01:04:15,810 --> 01:04:19,170
But idea one is radix sort.

1328
01:04:19,170 --> 01:04:21,000
Just because I see
that polynomial,

1329
01:04:21,000 --> 01:04:24,600
I think maybe I should try it.

1330
01:04:24,600 --> 01:04:27,690
Now, there's a problem here,
because we're given some

1331
01:04:27,690 --> 01:04:31,080
numbers, some integers, si's.

1332
01:04:31,080 --> 01:04:32,010
We're also given h.

1333
01:04:32,010 --> 01:04:34,510
We're told now that is a
nice, small polynomial.

1334
01:04:34,510 --> 01:04:37,710
But we have no idea how
big these numbers are.

1335
01:04:37,710 --> 01:04:39,540
So the problem with
this idea is that--

1336
01:04:43,180 --> 01:04:46,260
but si could be bigger than h.

1337
01:04:46,260 --> 01:04:48,420
We have no idea how
big the si's are.

1338
01:04:51,140 --> 01:04:53,510
What can I say
about si's that are

1339
01:04:53,510 --> 01:04:57,400
bigger than h for this problem?

1340
01:05:00,100 --> 01:05:02,360
Summing to h.

1341
01:05:02,360 --> 01:05:02,860
Oh.

1342
01:05:02,860 --> 01:05:05,193
I didn't say it, but all these
numbers are non-negative.

1343
01:05:05,193 --> 01:05:06,490
That's important.

1344
01:05:06,490 --> 01:05:09,160
That looks like [INAUDIBLE].

1345
01:05:09,160 --> 01:05:11,581
Greater than or equal to 0.

1346
01:05:16,580 --> 01:05:17,080
Yeah?

1347
01:05:17,080 --> 01:05:18,640
AUDIENCE: [INAUDIBLE]
solution [INAUDIBLE].

1348
01:05:18,640 --> 01:05:19,420
ERIK DEMAINE: Right.

1349
01:05:19,420 --> 01:05:21,587
If I'm finding a sum that's
less than or equal to h,

1350
01:05:21,587 --> 01:05:22,810
they're non-negative.

1351
01:05:22,810 --> 01:05:28,540
And any number that's greater
than h, I can just throw away.

1352
01:05:28,540 --> 01:05:29,920
They'll never be in a solution.

1353
01:05:29,920 --> 01:05:32,670
So they already-- a sum of
one number is bigger than h.

1354
01:05:32,670 --> 01:05:35,175
So two is only going to get
bigger if they're non-negative.

1355
01:05:38,090 --> 01:05:41,830
So idea number two is let's
just throw out all the big si's,

1356
01:05:41,830 --> 01:05:43,180
anything bigger than h.

1357
01:05:43,180 --> 01:05:46,090
Now, that won't change the
answer, because those can never

1358
01:05:46,090 --> 01:05:47,320
be in a solution.

1359
01:05:47,320 --> 01:05:49,120
And now I have all
the si's having

1360
01:05:49,120 --> 01:05:51,700
the property that they're
less than or equal to h.

1361
01:05:51,700 --> 01:05:56,150
And so they are small,
bounded by a fixed polynomial.

1362
01:05:56,150 --> 01:05:57,600
And now I can apply radix sort.

1363
01:05:57,600 --> 01:06:01,370
So after this idea, I
can apply this idea.
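
A minimal Python sketch of these two ideas, discarding the big items and then radix sorting the rest. The function names, the bucket-based counting sort, and the choice of base n are illustrative assumptions, not code from the course. It assumes all items are non-negative integers and that h is bounded by a fixed polynomial in n, so the number of passes is constant.

```python
def counting_sort_by_digit(values, base, exp):
    """Stable sort of values by the digit (v // exp) % base."""
    buckets = [[] for _ in range(base)]
    for v in values:
        buckets[(v // exp) % base].append(v)
    return [v for bucket in buckets for v in bucket]


def filter_and_radix_sort(S, h):
    """Drop items larger than h, then radix sort the survivors."""
    # Items bigger than h can never be part of a sum <= h,
    # because every item is non-negative.
    small = [s for s in S if s <= h]
    # Use base n (at least 2) so each pass costs O(n).
    base = max(len(S), 2)
    exp = 1
    while exp <= h:  # one stable pass per base-n digit of h
        small = counting_sort_by_digit(small, base, exp)
        exp *= base
    return small


print(filter_and_radix_sort([5, 100, 3, 8, 0, 7], 10))
```

If h is at most some fixed power of n, the loop runs a constant number of times, so the whole sort takes linear worst-case time.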

1364
01:06:01,370 --> 01:06:02,830
OK, this gives you
a flavor of how

1365
01:06:02,830 --> 01:06:04,720
I like to think about problems.

1366
01:06:04,720 --> 01:06:06,820
I see clues, like a polynomial.

1367
01:06:06,820 --> 01:06:08,860
I think radix sort. Doesn't work.

1368
01:06:08,860 --> 01:06:12,655
But with some more ideas,
I can get it to work.

1369
01:06:12,655 --> 01:06:14,370
OK.

1370
01:06:14,370 --> 01:06:16,890
What goes with the--
so now I've sorted si.

1371
01:06:16,890 --> 01:06:18,380
OK, great.

1372
01:06:18,380 --> 01:06:19,550
S is sorted.

1373
01:06:23,650 --> 01:06:27,250
I guess we can try to
do the same algorithm,

1374
01:06:27,250 --> 01:06:29,540
except I don't have
a hash table anymore.

1375
01:06:29,540 --> 01:06:34,820
So let's just try doing a
for loop over the S. Why not?

1376
01:06:34,820 --> 01:06:40,270
So let's do for si in
S. But now it's sorted.

1377
01:06:40,270 --> 01:06:42,920
So presumably, I should
exploit the sorted order.

1378
01:06:42,920 --> 01:06:45,860
So let's do them in order.

1379
01:06:45,860 --> 01:06:49,870
So i equals 0, 1,
up to n minus 1.

1380
01:06:49,870 --> 01:06:52,030
Let's say that s0
is the smallest.

1381
01:06:52,030 --> 01:06:53,770
s1 is the next smallest.

1382
01:06:53,770 --> 01:06:56,260
sn minus 1 is the biggest.

1383
01:06:56,260 --> 01:06:58,120
So I want to do something with--

1384
01:06:58,120 --> 01:06:59,890
so I have si.

1385
01:06:59,890 --> 01:07:05,200
And I want to figure out
whether h minus si is in there.

1386
01:07:08,680 --> 01:07:12,040
Hard to do that better than--

1387
01:07:12,040 --> 01:07:15,730
actually, I could do
this with binary search.

1388
01:07:15,730 --> 01:07:18,730
I'm looking for this value.

1389
01:07:18,730 --> 01:07:20,260
And I have a sorted array now.

1390
01:07:20,260 --> 01:07:22,960
So I could binary
search for h minus si.

1391
01:07:22,960 --> 01:07:26,600
And in log n time, I will find
whether that guy is in there.

1392
01:07:26,600 --> 01:07:27,820
And if not, keep looping.

1393
01:07:27,820 --> 01:07:29,960
I can keep track of the
best thing that I found.

1394
01:07:29,960 --> 01:07:32,430
And so in n log n time, I
can definitely solve this.

1395
01:07:32,430 --> 01:07:35,020
But I'd like to get linear time.

1396
01:07:35,020 --> 01:07:36,070
Do you have a question?

1397
01:07:36,070 --> 01:07:37,270
AUDIENCE: Well,
I'm just wondering,

1398
01:07:37,270 --> 01:07:38,512
how would you [INAUDIBLE]?

1399
01:07:38,512 --> 01:07:40,622
Like, why would you
[INAUDIBLE] whether that

1400
01:07:40,622 --> 01:07:41,728
is in there [INAUDIBLE]?

1401
01:07:41,728 --> 01:07:43,270
ERIK DEMAINE: I'm
not looking for si.

1402
01:07:43,270 --> 01:07:45,910
I'm going to compute h minus si.

1403
01:07:45,910 --> 01:07:48,550
So this is-- maybe I shouldn't
even write this down, but--

1404
01:07:48,550 --> 01:07:52,090
AUDIENCE: She's asking about
the [INAUDIBLE] constraint of,

1405
01:07:52,090 --> 01:07:53,290
we're not looking for h.

1406
01:07:53,290 --> 01:07:55,990
We're looking for
something smaller than h.

1407
01:07:55,990 --> 01:07:57,020
ERIK DEMAINE: This one?

1408
01:07:57,020 --> 01:07:57,520
Or--

1409
01:07:57,520 --> 01:07:58,690
AUDIENCE: Something larger.

1410
01:07:58,690 --> 01:07:59,950
ERIK DEMAINE: Oh, this thing.

1411
01:07:59,950 --> 01:08:01,010
AUDIENCE: A large
thing less than h.

1412
01:08:01,010 --> 01:08:01,843
ERIK DEMAINE: Right.

1413
01:08:01,843 --> 01:08:06,100
So in particular, if there
are two items that sum to h,

1414
01:08:06,100 --> 01:08:08,920
I want to find it.

1415
01:08:08,920 --> 01:08:09,970
So let's start with that.

1416
01:08:09,970 --> 01:08:14,710
So I'm binary searching
for h minus si in S.

1417
01:08:14,710 --> 01:08:16,330
So I could certainly do that.

1418
01:08:16,330 --> 01:08:19,737
And if I find it, great.

1419
01:08:19,737 --> 01:08:21,279
I found a pair that
sum to exactly h.

1420
01:08:21,279 --> 01:08:24,670
If I don't find it, binary
search tells me not only

1421
01:08:24,670 --> 01:08:26,229
that it's not there,
but it tells me

1422
01:08:26,229 --> 01:08:28,850
what the previous
and next value are.

1423
01:08:28,850 --> 01:08:30,520
So even though h
minus si isn't there,

1424
01:08:30,520 --> 01:08:33,430
I can get the next largest thing
and the next smallest thing.

1425
01:08:33,430 --> 01:08:35,319
What I want is the
next smallest thing.

1426
01:08:35,319 --> 01:08:39,970
And that will be the largest
sum I can get using si.

1427
01:08:39,970 --> 01:08:43,499
And so then that's one
candidate for a sum less than

1428
01:08:43,499 --> 01:08:44,498
or equal to h.

1429
01:08:44,498 --> 01:08:45,790
I want to find the largest one.

1430
01:08:45,790 --> 01:08:46,899
So I do a for loop.

1431
01:08:46,899 --> 01:08:48,010
I always keep track of--

1432
01:08:48,010 --> 01:08:49,930
I take a list of all
the candidates I got.

1433
01:08:49,930 --> 01:08:53,529
Each time I do an iteration of
this loop, I get one candidate.

1434
01:08:53,529 --> 01:08:56,229
Then I take the largest one.

1435
01:08:56,229 --> 01:08:59,593
OK, so return largest candidate.

1436
01:09:06,859 --> 01:09:12,910
So this gives me a candidate,
just the previous item.

1437
01:09:12,910 --> 01:09:15,700
This is what we called find
previous, or find prev,

1438
01:09:15,700 --> 01:09:18,220
probably, in our set interface.

1439
01:09:18,220 --> 01:09:21,350
And if you have a sorted set,
you can do that in log n time.

1440
01:09:21,350 --> 01:09:27,382
So this is an n log n solution,
because we do n iterations

1441
01:09:27,382 --> 01:09:28,090
through the loop.

1442
01:09:28,090 --> 01:09:29,920
Each binary search takes log n.

1443
01:09:29,920 --> 01:09:32,510
I want to get linear.
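
The n log n approach just described might look like the following Python sketch. It uses the standard `bisect` module as the binary search; the function name and the rule that the two items must sit at different positions are assumptions made for illustration. It keeps a running best instead of a list of candidates.

```python
from bisect import bisect_right


def best_pair_sum_binary_search(S_sorted, h):
    """Largest s_i + s_j <= h over two different positions, or None."""
    best = None
    for i, s in enumerate(S_sorted):
        # "Find previous": rightmost index whose value is <= h - s.
        j = bisect_right(S_sorted, h - s) - 1
        if j == i:
            j -= 1  # do not pair an item with itself
        if j < 0:
            continue  # nothing small enough to pair with s
        candidate = s + S_sorted[j]
        if best is None or candidate > best:
            best = candidate
    return best


print(best_pair_sum_binary_search([1, 4, 6], 9))
```

Each of the n iterations does one binary search, which gives the n log n total.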

1444
01:09:32,510 --> 01:09:35,455
This is not obvious.

1445
01:09:40,689 --> 01:09:43,600
The best intuition I can
think of for this next idea

1446
01:09:43,600 --> 01:09:49,750
is, well, I start with the
very smallest item in S.

1447
01:09:49,750 --> 01:09:53,029
And I want to sum up to
something that's kind of big,

1448
01:09:53,029 --> 01:09:53,529
right?

1449
01:09:53,529 --> 01:09:56,920
I threw away all the
items bigger than h.

1450
01:09:56,920 --> 01:10:00,730
If s0 is like tiny,
like close to 0,

1451
01:10:00,730 --> 01:10:02,680
because it's the smallest
one, then maybe I

1452
01:10:02,680 --> 01:10:05,380
should look at the
end of the array,

1453
01:10:05,380 --> 01:10:07,150
because I want to
compare, or I want

1454
01:10:07,150 --> 01:10:08,670
to add the smallest
thing probably

1455
01:10:08,670 --> 01:10:09,820
with the biggest thing.

1456
01:10:09,820 --> 01:10:12,940
That's as close
as I can imagine.

1457
01:10:12,940 --> 01:10:20,530
So then-- so here's my sorted S.
It's the smallest item, biggest

1458
01:10:20,530 --> 01:10:21,320
item.

1459
01:10:21,320 --> 01:10:25,150
So I'm going to loop over
these items one by one.

1460
01:10:25,150 --> 01:10:27,850
So let's start by comparing the
first one with the last one.

1461
01:10:31,690 --> 01:10:34,630
The two-finger algorithm, OK?

1462
01:10:41,950 --> 01:10:43,210
This is the big idea.

1463
01:10:43,210 --> 01:10:45,400
You're doing it all
the time in this class.

1464
01:10:45,400 --> 01:10:46,150
It's super useful.

1465
01:10:46,150 --> 01:10:48,880
We saw it in merge sort, for
example, in merging two lists.

1466
01:10:48,880 --> 01:10:51,190
We have fingers in two
lists that advance.

1467
01:10:51,190 --> 01:10:54,582
And because they only advance,
it takes linear total time.

1468
01:10:54,582 --> 01:10:57,040
So we're going to do this kind
of folded in backwards here.

1469
01:10:57,040 --> 01:10:58,550
We're going to start here.

1470
01:10:58,550 --> 01:11:00,700
This seems like a good
candidate to start with.

1471
01:11:00,700 --> 01:11:03,400
Now, what else
could this add with?

1472
01:11:03,400 --> 01:11:05,650
Well, maybe smaller items.

1473
01:11:05,650 --> 01:11:07,930
And maybe I have to go
all the way through here.

1474
01:11:07,930 --> 01:11:10,345
And then I've got to
advance my left finger.

1475
01:11:10,345 --> 01:11:11,080
Yeah, OK.

1476
01:11:11,080 --> 01:11:15,020
So here's the idea.

1477
01:11:15,020 --> 01:11:19,240
So let's look at--

1478
01:11:19,240 --> 01:11:22,650
so I'm going to call this
finger i and this finger j.

1479
01:11:22,650 --> 01:11:25,070
So we want to sum two things.

1480
01:11:25,070 --> 01:11:27,040
So I guess one other
inspiration here is,

1481
01:11:27,040 --> 01:11:28,390
we want to add two things up.

1482
01:11:28,390 --> 01:11:31,170
And we have one algorithm
that has the word "two" in it.

1483
01:11:31,170 --> 01:11:33,080
And it's the
two-finger algorithm.

1484
01:11:33,080 --> 01:11:34,978
So let's try that.

1485
01:11:34,978 --> 01:11:36,520
So we're going to
start at i equals 0

1486
01:11:36,520 --> 01:11:38,050
and j equals n minus 1.

1487
01:11:38,050 --> 01:11:44,770
We're going to look at si plus
sj and see, how good is it?

1488
01:11:44,770 --> 01:11:46,900
How close to summing to h is it?

1489
01:11:46,900 --> 01:11:51,400
Well, in particular, it's
either less than or equal to h

1490
01:11:51,400 --> 01:11:52,330
or bigger than h.

1491
01:11:57,940 --> 01:11:59,980
If it's bigger than h--

1492
01:11:59,980 --> 01:12:01,300
so this sum is too big.

1493
01:12:01,300 --> 01:12:03,520
I can't even use
it as a candidate.

1494
01:12:03,520 --> 01:12:06,640
Well, that means I really
don't need this guy, right?

1495
01:12:06,640 --> 01:12:08,020
It's too big overall.

1496
01:12:08,020 --> 01:12:10,450
I'm adding the smallest
item to this item.

1497
01:12:10,450 --> 01:12:11,362
And it's too big.

1498
01:12:11,362 --> 01:12:12,820
Well, then I should
go to the left.

1499
01:12:12,820 --> 01:12:14,960
I should move my right
finger to the left.

1500
01:12:14,960 --> 01:12:19,750
So in this case, we decrement j.

1501
01:12:19,750 --> 01:12:22,520
Move the right
finger to the left.

1502
01:12:22,520 --> 01:12:26,760
So I'm guessing, in this
case, I increment i.

1503
01:12:26,760 --> 01:12:28,660
Why?

1504
01:12:28,660 --> 01:12:32,395
If I add these two items
up, and this is too small,

1505
01:12:32,395 --> 01:12:36,370
it's smaller than h, then this
item was probably too small.

1506
01:12:36,370 --> 01:12:38,215
It might actually--
it's an OK solution.

1507
01:12:38,215 --> 01:12:39,423
It's less than or equal to h.

1508
01:12:39,423 --> 01:12:41,965
So I should keep
it as a candidate.

1509
01:12:41,965 --> 01:12:47,800
Let's say add candidate.

1510
01:12:47,800 --> 01:12:51,220
So I'm just going to keep a
list of candidates that I see.

1511
01:12:51,220 --> 01:12:52,900
So this is a possible solution.

1512
01:12:52,900 --> 01:12:54,430
It might not be the best one.

1513
01:12:54,430 --> 01:12:55,810
But it's one to add to my list.

1514
01:12:55,810 --> 01:13:00,400
And then I'm going to increase i
and now work on this sub-array,

1515
01:13:00,400 --> 01:13:02,510
because that will be
a little bit bigger.

1516
01:13:02,510 --> 01:13:04,160
I can't go this way
to make it bigger,

1517
01:13:04,160 --> 01:13:05,860
because I'm at the last item.

1518
01:13:05,860 --> 01:13:07,690
And it's not obvious
that this works.

1519
01:13:07,690 --> 01:13:15,920
I think there's a nice invariant
that will help somewhere.

1520
01:13:15,920 --> 01:13:17,691
Where'd I put my piece of paper?

1521
01:13:22,430 --> 01:13:22,930
Yeah.

1522
01:13:26,325 --> 01:13:28,750
So here's an invariant.

1523
01:13:37,076 --> 01:13:38,010
Oh, yes.

1524
01:14:00,672 --> 01:14:02,380
It's really clear this
is the right thing

1525
01:14:02,380 --> 01:14:03,460
to do in the first step.

1526
01:14:03,460 --> 01:14:06,340
And the tricky part is to argue
that it works in all steps,

1527
01:14:06,340 --> 01:14:09,243
because when I really have the
smallest item and the largest

1528
01:14:09,243 --> 01:14:11,535
item, it's clear that I should
advance one or the other

1529
01:14:11,535 --> 01:14:13,060
if I'm too small or too big.

1530
01:14:13,060 --> 01:14:15,520
But the way to prove it
in general by induction

1531
01:14:15,520 --> 01:14:17,290
is to show this invariant that--

1532
01:14:17,290 --> 01:14:19,870
so at some point
through this execution,

1533
01:14:19,870 --> 01:14:21,070
i and j are somewhere.

1534
01:14:21,070 --> 01:14:23,830
And I want to say that if I
take any j from the right--

1535
01:14:23,830 --> 01:14:26,860
any j prime to the right of
j and any i prime to the left

1536
01:14:26,860 --> 01:14:30,310
of i, unstrictly, then
all of those pairs,

1537
01:14:30,310 --> 01:14:33,010
all those pairwise sums,
are either too big--

1538
01:14:33,010 --> 01:14:34,750
and that's when we decrease j--

1539
01:14:34,750 --> 01:14:37,450
or they're less than or equal
to the largest candidate

1540
01:14:37,450 --> 01:14:39,490
that we've seen so far.

1541
01:14:39,490 --> 01:14:42,140
That's because we added
these candidates in there.

1542
01:14:42,140 --> 01:14:45,280
So that invariant will
hold by induction,

1543
01:14:45,280 --> 01:14:47,950
because whenever there's a
possible thing that's good,

1544
01:14:47,950 --> 01:14:49,577
I add it to my candidate list.

1545
01:14:49,577 --> 01:14:51,160
And then, at the end
of the algorithm,

1546
01:14:51,160 --> 01:14:54,250
I just loop through my
candidate list, compute the max,

1547
01:14:54,250 --> 01:14:55,090
return that pair.

1548
01:14:58,390 --> 01:15:01,600
OK, so that is
two-finger algorithm,

1549
01:15:01,600 --> 01:15:04,360
which solves the
non-exact problem

1550
01:15:04,360 --> 01:15:06,070
in linear worst-case time.

1551
01:15:06,070 --> 01:15:06,570
Yeah?

1552
01:15:06,570 --> 01:15:08,170
AUDIENCE: i cannot
equal j, right?

1553
01:15:08,170 --> 01:15:09,040
ERIK DEMAINE: Oh,
i cannot-- right.

1554
01:15:09,040 --> 01:15:10,900
So what are the
termination conditions?

1555
01:15:10,900 --> 01:15:14,740
When i equals j, that's
probably when you want to stop.

1556
01:15:14,740 --> 01:15:15,350
It depends.

1557
01:15:15,350 --> 01:15:19,540
You could say, if i is
greater than j, stop.

1558
01:15:19,540 --> 01:15:21,230
Return max candidate.

1559
01:15:24,005 --> 01:15:25,880
There are two ways to
interpret this problem.

1560
01:15:25,880 --> 01:15:29,290
One is that the two
values you choose in S

1561
01:15:29,290 --> 01:15:31,840
need to be different
values, or you

1562
01:15:31,840 --> 01:15:35,080
allow them to be the same value,
like they can both be h over 2.

1563
01:15:35,080 --> 01:15:36,730
And either way is
easy to solve.

1564
01:15:36,730 --> 01:15:39,490
If you want to allow
h over 2, then I

1565
01:15:39,490 --> 01:15:40,660
would put greater than here.

1566
01:15:40,660 --> 01:15:44,470
If you don't want
to allow h over 2,

1567
01:15:44,470 --> 01:15:46,990
then I would put greater
than or equal to--

1568
01:15:46,990 --> 01:15:47,590
either way.

1569
01:15:47,590 --> 01:15:49,870
Both of these problems,
you can solve both ways.

1570
01:15:49,870 --> 01:15:54,820
Or both algorithms can
handle both situations.
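
A hedged Python sketch of the two-finger algorithm, assuming the input is already sorted and filtered so that every item is at most h. The function name and the returned tuple are illustrative choices. Instead of storing every candidate and taking the max at the end, it keeps only the best candidate seen so far, which gives the same answer.

```python
def best_pair_sum_two_finger(S_sorted, h):
    """Return (sum, a, b) with the largest a + b <= h, or None."""
    i, j = 0, len(S_sorted) - 1
    best = None
    # Stopping when i meets j forces two different positions.
    # Using `while i <= j` instead would allow reusing one item (h / 2 twice).
    while i < j:
        total = S_sorted[i] + S_sorted[j]
        if total > h:
            j -= 1  # too big: move the right finger left
        else:
            # Valid candidate: remember it if it beats the best so far.
            if best is None or total > best[0]:
                best = (total, S_sorted[i], S_sorted[j])
            i += 1  # try to make the sum bigger
    return best


print(best_pair_sum_two_finger([0, 3, 5, 7, 8], 10))  # (10, 3, 7)
```

Each step moves one finger and the fingers never back up, so the loop runs at most n - 1 times.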

1571
01:15:54,820 --> 01:15:57,790
OK, one more problem.

1572
01:16:00,950 --> 01:16:01,640
All right.

1573
01:16:01,640 --> 01:16:04,100
Yeah, I'm all out of time.

1574
01:16:04,100 --> 01:16:06,740
But I'm getting
faster and faster.

1575
01:16:06,740 --> 01:16:08,180
Of course, on the
hardest problem,

1576
01:16:08,180 --> 01:16:09,590
I can do it the fastest.

1577
01:16:09,590 --> 01:16:14,180
All right, so Meff Ja--

1578
01:16:14,180 --> 01:16:16,880
this is a reference to Jeff
Ma of the MIT Blackjack

1579
01:16:16,880 --> 01:16:19,910
Team, who I got to speak here
at LSC a bunch of years ago.

1580
01:16:19,910 --> 01:16:22,880
But he's featured in
the movie 21 and so on--

1581
01:16:22,880 --> 01:16:25,910
fictionalized.

1582
01:16:25,910 --> 01:16:27,860
So I was playing this game.

1583
01:16:27,860 --> 01:16:29,270
It's a great setup.

1584
01:16:29,270 --> 01:16:31,220
You should definitely
read this problem--

1585
01:16:31,220 --> 01:16:35,120
Po-k-er.

1586
01:16:35,120 --> 01:16:38,930
And he has a deck of
cards, where each card has

1587
01:16:38,930 --> 01:16:40,400
a letter of the alphabet on it.

1588
01:16:40,400 --> 01:16:42,510
I guess this is
the right way up.

1589
01:16:42,510 --> 01:16:44,930
So I, of course,
have such a deck.

1590
01:16:44,930 --> 01:16:46,040
Doesn't everyone?

1591
01:16:46,040 --> 01:16:47,000
You can buy these.

1592
01:16:47,000 --> 01:16:49,370
I have several, actually.

1593
01:16:49,370 --> 01:16:50,990
And so we can do a
quick magic trick,

1594
01:16:50,990 --> 01:16:54,272
like pick a card, any
card-- here, pick a card.

1595
01:16:54,272 --> 01:16:57,710
[INAUDIBLE] OK, good choice.

1596
01:16:57,710 --> 01:16:59,960
I can't force, so it
doesn't really matter.

1597
01:16:59,960 --> 01:17:04,160
OK, and so this is
your card, right?

1598
01:17:04,160 --> 01:17:07,680
And your card is an s, right?

1599
01:17:07,680 --> 01:17:09,560
OK, good.

1600
01:17:09,560 --> 01:17:11,180
No, not all the cards are s's.

1601
01:17:11,180 --> 01:17:14,923
[LAUGHTER]

1602
01:17:14,923 --> 01:17:16,340
But he has mirrors
in his glasses.

1603
01:17:16,340 --> 01:17:17,404
No.

1604
01:17:17,404 --> 01:17:20,630
I can reveal later
how that's done.

1605
01:17:20,630 --> 01:17:22,350
OK, so a deck of cards.

1606
01:17:22,350 --> 01:17:25,730
Each card has 26
possible letters on it.

1607
01:17:25,730 --> 01:17:29,090
And there's this
weird dealing process.

1608
01:17:29,090 --> 01:17:30,590
Even just defining
this problem is

1609
01:17:30,590 --> 01:17:31,370
going to take a little while.

1610
01:17:31,370 --> 01:17:32,578
Oh, here's my piece of paper.

1611
01:17:35,930 --> 01:17:38,000
So we have this dealing process.

1612
01:17:38,000 --> 01:17:40,310
Here's an example that's
in the problem, abcdbc.

1613
01:17:45,020 --> 01:17:47,330
So you know the
order of the cards.

1614
01:17:47,330 --> 01:17:48,350
This is the top card.

1615
01:17:48,350 --> 01:17:49,940
This is the bottom card.

1616
01:17:49,940 --> 01:17:53,450
And now, randomly, you do a cut.

1617
01:17:53,450 --> 01:17:56,910
Cut is this.

1618
01:17:56,910 --> 01:18:00,140
So I take some chunk off the
top, move it to the bottom,

1619
01:18:00,140 --> 01:18:02,030
once, randomly.

1620
01:18:02,030 --> 01:18:06,080
So for example, I
could take this cut.

1621
01:18:06,080 --> 01:18:13,190
And then what I would get
is cdbc for this part that's

1622
01:18:13,190 --> 01:18:18,840
copied here, and ab as the--

1623
01:18:18,840 --> 01:18:23,490
so this is-- so the first
thing we do is cut at i.

1624
01:18:23,490 --> 01:18:24,780
This is position i.

1625
01:18:24,780 --> 01:18:28,860
In this example, i equals 2.

1626
01:18:28,860 --> 01:18:33,610
OK, then we deal
the top k cards.

1627
01:18:33,610 --> 01:18:38,610
So let's say we deal the
top four cards, k equals 4.

1628
01:18:38,610 --> 01:18:42,960
So this is deal k.

1629
01:18:42,960 --> 01:18:45,900
So we get cdbc, in that order.

1630
01:18:45,900 --> 01:18:48,270
But the order doesn't matter,
because the last operation

1631
01:18:48,270 --> 01:18:53,882
we do in the problem is
sort them, which is bccd,

1632
01:18:53,882 --> 01:18:55,590
like you do when you
get a hand of cards.

1633
01:18:55,590 --> 01:18:57,420
You tend to sort them.

1634
01:18:57,420 --> 01:18:58,860
OK, so this is a process.

1635
01:18:58,860 --> 01:19:03,330
Given a deck-- so the
deck here is fixed.

1636
01:19:03,330 --> 01:19:10,200
We call this process, I
think, P of D comma i comma k.

1637
01:19:10,200 --> 01:19:11,320
We're told what D is.

1638
01:19:11,320 --> 01:19:12,720
We're told what k is.

1639
01:19:12,720 --> 01:19:15,210
i is chosen randomly.

1640
01:19:15,210 --> 01:19:18,090
And we'd like to know what
happens with different i's.

1641
01:19:21,160 --> 01:19:24,265
So if you stare at this problem
enough, it begins to simplify.

1642
01:19:24,265 --> 01:19:26,382
So this is a complicated setup.

1643
01:19:26,382 --> 01:19:28,840
But what's really going on is
we're starting at position i.

1644
01:19:28,840 --> 01:19:32,830
And we're taking the next k
cards from there cyclically.

1645
01:19:32,830 --> 01:19:34,270
So here, we just
took those four.

1646
01:19:34,270 --> 01:19:38,410
If i equaled 3, we would deal
d, then b, then c, then a.

1647
01:19:38,410 --> 01:19:40,290
But then we sort them.

1648
01:19:40,290 --> 01:19:43,240
OK, so we're getting different
substrings of length k,

1649
01:19:43,240 --> 01:19:44,350
cyclically.

1650
01:19:44,350 --> 01:19:46,710
But then we're
sorting those letters.
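
As a small illustration, the cut, deal, and sort process can be written directly in Python. The function name `hand` and the use of a string for the deck are assumptions for this sketch; it assumes k is at most the deck size.

```python
def hand(D, i, k):
    """P(D, i, k): cut deck D at position i, deal k cards, sort them."""
    n = len(D)
    # Cutting at i and dealing from the top is the same as reading
    # k cards cyclically starting at index i.
    dealt = [D[(i + t) % n] for t in range(k)]
    return "".join(sorted(dealt))


print(hand("abcdbc", 2, 4))  # bccd
print(hand("abcdbc", 3, 4))  # abcd
```

This direct version costs about k log k per hand, which is why the rest of the problem looks for a compressed representation.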

1651
01:19:46,710 --> 01:19:48,460
Sorting is really
crucial for this problem

1652
01:19:48,460 --> 01:19:50,890
to at all be feasible.

1653
01:19:50,890 --> 01:19:53,780
It took me a while even to
see how to solve this problem.

1654
01:19:53,780 --> 01:19:56,950
But the key is sorting,
that they get sorted,

1655
01:19:56,950 --> 01:19:59,350
because that means--

1656
01:19:59,350 --> 01:20:01,630
because we sort, it
doesn't matter whether you

1657
01:20:01,630 --> 01:20:08,050
have aaba, or baaa, or abaa.

1658
01:20:08,050 --> 01:20:10,490
These are all the same.

1659
01:20:10,490 --> 01:20:12,550
If you take these
cards dealt, you

1660
01:20:12,550 --> 01:20:15,130
sort them to the
same thing, which is

1661
01:20:15,130 --> 01:20:17,650
the one I didn't write, aaab.

1662
01:20:17,650 --> 01:20:20,150
All of these get sorted
to the same thing.

1663
01:20:20,150 --> 01:20:23,970
So we lost some information
when we sort, lost the order.

1664
01:20:23,970 --> 01:20:25,680
The first question
to get you thinking

1665
01:20:25,680 --> 01:20:29,970
in this direction, part a,
says, build a data structure

1666
01:20:29,970 --> 01:20:36,600
given D and k that lets me know,
given two indices, i and j,

1667
01:20:36,600 --> 01:20:39,670
do I end up with
the exact same hand?

1668
01:20:39,670 --> 01:20:43,080
This thing is called a hand.

1669
01:20:43,080 --> 01:20:46,770
And it's exactly this P(D, i, k).

1670
01:20:46,770 --> 01:20:50,610
So I want to do P(D,
i, k) and P(D, j, k).

1671
01:20:50,610 --> 01:20:51,480
And I want to know--

1672
01:20:51,480 --> 01:20:52,260
JK.

1673
01:20:52,260 --> 01:20:54,480
And I want to know whether
those two things are

1674
01:20:54,480 --> 01:20:57,340
equal in constant time.

1675
01:20:57,340 --> 01:20:58,380
That's what this says--

1676
01:20:58,380 --> 01:21:00,240
constant time.

1677
01:21:00,240 --> 01:21:02,835
Doesn't say worst case,
but worst case is possible.

1678
01:21:06,577 --> 01:21:08,160
And that sounds hard,
because, I mean,

1679
01:21:08,160 --> 01:21:10,560
there's k symbols for one
of them, another k symbols

1680
01:21:10,560 --> 01:21:11,312
for the other guy.

1681
01:21:11,312 --> 01:21:13,020
But we don't have to
compare the symbols.

1682
01:21:13,020 --> 01:21:17,220
We just need to compare the
sorting of those strings.

1683
01:21:17,220 --> 01:21:19,620
And this, we can compress.

1684
01:21:19,620 --> 01:21:21,570
So this is a subtlety.

1685
01:21:21,570 --> 01:21:24,210
But all I really need to know
is that there are three a's

1686
01:21:24,210 --> 01:21:30,260
here, and one b, and zero
c's, and zero d's, and zero

1687
01:21:30,260 --> 01:21:31,980
e's, and so on.

1688
01:21:31,980 --> 01:21:35,310
But because there's only
26 letters in this deck--

1689
01:21:35,310 --> 01:21:38,480
and indeed, in this deck, it
happens upper and lowercase a

1690
01:21:38,480 --> 01:21:39,180
through z.

1691
01:21:39,180 --> 01:21:41,220
But we might have n cards.

1692
01:21:41,220 --> 01:21:43,190
But there are only
26 possible labels.

1693
01:21:43,190 --> 01:21:44,648
So in fact, a lot
of them are going

1694
01:21:44,648 --> 01:21:46,390
to be equal if n is large.

1695
01:21:46,390 --> 01:21:47,940
So this is a good
compression scheme,

1696
01:21:47,940 --> 01:21:52,620
because to represent the things
I get after sorting, I just

1697
01:21:52,620 --> 01:21:54,830
need to give you 26 numbers.

1698
01:21:54,830 --> 01:21:58,050
And for us, 26 is small,
because 26 is a constant.

1699
01:21:58,050 --> 01:22:00,690
Independent of the number of
cards, I just need to say,

1700
01:22:00,690 --> 01:22:01,710
how many a's are there?

1701
01:22:01,710 --> 01:22:03,630
It could be anywhere
between 0 and n.

1702
01:22:03,630 --> 01:22:04,710
How many b's are there?

1703
01:22:04,710 --> 01:22:05,642
Between 0 and n.

1704
01:22:05,642 --> 01:22:06,600
How many c's are there?

1705
01:22:06,600 --> 01:22:07,800
Between 0 and n.

1706
01:22:07,800 --> 01:22:15,690
So 26 numbers in
the range 0 to n--

1707
01:22:15,690 --> 01:22:24,480
I like to think of this as a
26-digit number base n plus 1.

1708
01:22:24,480 --> 01:22:29,220
We can map this
into base n plus 1.

1709
01:22:29,220 --> 01:22:36,360
And we get 26
digits in that base.

1710
01:22:36,360 --> 01:22:39,907
Another way to say it is
that the number of possible

1711
01:22:39,907 --> 01:22:42,240
combinations here-- how many
a's, how many b's, how many

1712
01:22:42,240 --> 01:22:42,960
c's--

1713
01:22:42,960 --> 01:22:45,480
is not even theta.

1714
01:22:45,480 --> 01:22:51,120
It is n plus 1, anything between
0 and n, to the power of 26.

1715
01:22:51,120 --> 01:22:54,270
This is a good polynomial.

1716
01:22:54,270 --> 01:22:58,630
So I can do stuff
like radix sort.
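
One way to picture the 26-digit number in base n plus 1, as a Python sketch. The function name is hypothetical, and the counts are assumed to each lie between 0 and n.

```python
def encode_counts(counts, n):
    """Read a list of letter counts as digits of a number in base n + 1."""
    value = 0
    for c in counts:
        # Each count is in the range 0..n, so it is a valid base-(n + 1) digit.
        value = value * (n + 1) + c
    return value


# Three a's, one b, zero of everything else, in a deck of n = 4 cards.
counts = [3, 1] + [0] * 24
print(encode_counts(counts, 4) < (4 + 1) ** 26)  # True
```

Different count vectors give different numbers, and every such number is below (n + 1) to the 26th power.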

1717
01:23:01,300 --> 01:23:02,200
Cool.

1718
01:23:02,200 --> 01:23:07,330
So let me summarize a little
bit how we solve part a.

1719
01:23:07,330 --> 01:23:10,440
So I want to build a
data structure, which is,

1720
01:23:10,440 --> 01:23:12,960
for each value i,
I know I'm going

1721
01:23:12,960 --> 01:23:17,610
to end up serving these four
cards, or in general, k cards.

1722
01:23:17,610 --> 01:23:20,915
So for those cards, I
would like to compute

1723
01:23:20,915 --> 01:23:23,040
how many a's, how many b's,
how many c's are there?

1724
01:23:23,040 --> 01:23:25,770
And then just write
down this number.

1725
01:23:25,770 --> 01:23:27,470
This is a number
which I can write down

1726
01:23:27,470 --> 01:23:32,020
in at most 26 words, because we
can represent numbers between 0

1727
01:23:32,020 --> 01:23:33,660
and n in a single word.

1728
01:23:33,660 --> 01:23:37,020
That's the w is at
least log n assumption.

1729
01:23:37,020 --> 01:23:38,520
So it's constant size.

1730
01:23:38,520 --> 01:23:41,130
In a constant
number of numbers, I

1731
01:23:41,130 --> 01:23:44,640
can represent all I need to
know about a thing of size--

1732
01:23:44,640 --> 01:23:46,480
of length k here.

1733
01:23:46,480 --> 01:23:49,088
So I don't need to know
which letter is where.

1734
01:23:49,088 --> 01:23:50,630
I just need to know
the sorted order.

1735
01:23:50,630 --> 01:23:53,490
So I just need to know-- this
is called a frequency table--

1736
01:23:53,490 --> 01:23:55,230
how many a's, how many b's?

1737
01:23:55,230 --> 01:23:59,010
And so if I can compute those,
then given that representation

1738
01:23:59,010 --> 01:24:02,130
for starting at i, and
given that representation

1739
01:24:02,130 --> 01:24:04,300
for starting at j,
say, which would

1740
01:24:04,300 --> 01:24:07,920
be these two and these
two, I can compare them

1741
01:24:07,920 --> 01:24:09,780
by just comparing
those 26 numbers.

1742
01:24:09,780 --> 01:24:11,370
If they're all
equal, then they're

1743
01:24:11,370 --> 01:24:13,230
the same string after sorting.

1744
01:24:13,230 --> 01:24:15,510
And if there's any difference,
then they're different.
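
A short Python sketch of the frequency table idea, assuming the cards are lowercase letters a through z. The function name is an illustrative choice.

```python
def frequency_table(cards):
    """Count how many a's, b's, ..., z's appear in the given cards."""
    counts = [0] * 26
    for c in cards:
        counts[ord(c) - ord("a")] += 1
    return tuple(counts)


# Different dealing orders, same hand after sorting.
print(frequency_table("aaba") == frequency_table("baaa"))  # True
print(frequency_table("aaba") == frequency_table("aabb"))  # False
```

Comparing two tables means comparing 26 numbers, which is constant work no matter how large k is.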

1745
01:24:15,510 --> 01:24:17,343
So that's how I could
do it in constant time

1746
01:24:17,343 --> 01:24:19,410
if I can compute
these representations.

1747
01:24:19,410 --> 01:24:21,660
And it's not hard to do that.

1748
01:24:21,660 --> 01:24:24,420
It's called a sliding window
technique, where you compute it

1749
01:24:24,420 --> 01:24:26,640
for the first k guys.

1750
01:24:26,640 --> 01:24:29,370
And then you remove this
item and add this item.

1751
01:24:29,370 --> 01:24:32,010
And just by incrementing
the counter for b,

1752
01:24:32,010 --> 01:24:33,810
decrementing the
counter for a, now

1753
01:24:33,810 --> 01:24:37,180
I know the representation
for these guys.

1754
01:24:37,180 --> 01:24:39,690
Make a copy of that, which is
a copy of those 26 numbers,

1755
01:24:39,690 --> 01:24:40,530
constant.

1756
01:24:40,530 --> 01:24:43,110
Then I add on c, remove b.

1757
01:24:43,110 --> 01:24:49,530
Then I add on a, remove c,
add on b, remove d, add on c,

1758
01:24:49,530 --> 01:24:53,983
remove b, and add
on d, and remove c.

1759
01:24:53,983 --> 01:24:55,400
Well, I got back
to the beginning.

1760
01:24:55,400 --> 01:24:57,425
So now I have
representation of those.

1761
01:24:57,425 --> 01:25:00,000
OK, so by sliding
this window, I'm

1762
01:25:00,000 --> 01:25:02,640
only changing at the two ends.

1763
01:25:02,640 --> 01:25:03,690
I add one guy on.

1764
01:25:03,690 --> 01:25:05,580
I increment one
of these counters.

1765
01:25:05,580 --> 01:25:07,140
I decrement one
of these counters.

1766
01:25:07,140 --> 01:25:09,620
So in constant time,
given the representation

1767
01:25:09,620 --> 01:25:12,120
of one of these substrings, I
can compute the representation

1768
01:25:12,120 --> 01:25:13,095
of the next one.

1769
01:25:13,095 --> 01:25:14,760
And that's how I,
in linear time,

1770
01:25:14,760 --> 01:25:16,470
can build such a
data structure that

1771
01:25:16,470 --> 01:25:19,860
lets me tell whether
any two hands are equal.
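
The sliding window build can be sketched in Python as follows. The names `all_hand_tables` and `same_hand` are hypothetical, and the sketch assumes lowercase letters and k between 1 and the deck size.

```python
def all_hand_tables(D, k):
    """Frequency table of the k cards starting at each position i, cyclically."""
    n = len(D)
    counts = [0] * 26
    for t in range(k):  # table for the first window, i = 0
        counts[ord(D[t % n]) - ord("a")] += 1
    tables = [tuple(counts)]
    for i in range(1, n):
        # Slide the window: drop the card leaving on the left,
        # add the card entering on the right.
        counts[ord(D[i - 1]) - ord("a")] -= 1
        counts[ord(D[(i - 1 + k) % n]) - ord("a")] += 1
        tables.append(tuple(counts))  # copying 26 numbers is constant work
    return tables


def same_hand(tables, i, j):
    """Constant-time check whether cuts i and j give the same sorted hand."""
    return tables[i] == tables[j]


tables = all_hand_tables("abcdbc", 4)
print(same_hand(tables, 0, 3))  # True: abcd and dbca both sort to abcd
print(same_hand(tables, 1, 4))  # False: bcdb versus bcab
```

Each slide touches two counters and copies one table, so building all n tables takes linear time.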

1772
01:25:19,860 --> 01:25:23,040
The next problem,
part b, is, given

1773
01:25:23,040 --> 01:25:25,650
all these
representations, can you

1774
01:25:25,650 --> 01:25:27,892
find which one is
the most common?

1775
01:25:27,892 --> 01:25:29,850
Because we were choosing
i uniformly at random,

1776
01:25:29,850 --> 01:25:33,930
I want to know what the most
likely hand that you get is.

1777
01:25:33,930 --> 01:25:35,910
And I think the
easiest way to say

1778
01:25:35,910 --> 01:25:38,670
this is you can do
that by radix sorting.

1779
01:25:38,670 --> 01:25:40,180
You take all these
representations.

1780
01:25:40,180 --> 01:25:42,480
They are nice numbers
in the range 0

1781
01:25:42,480 --> 01:25:45,010
to n plus 1 to the 26th power.

1782
01:25:45,010 --> 01:25:47,490
So I can just run radix
sort and sort them all.

1783
01:25:47,490 --> 01:25:49,830
And then with a single
scan through the array,

1784
01:25:49,830 --> 01:25:51,990
I can see which one
is the most common.

1785
01:25:51,990 --> 01:25:53,120
Or rather, I can--

1786
01:25:53,120 --> 01:25:56,610
in a single scan, I can compute,
OK, how many of the same things

1787
01:25:56,610 --> 01:25:57,330
are at the front?

1788
01:25:57,330 --> 01:26:00,310
If they're sorted, then all the
equal ones will be together.

1789
01:26:00,310 --> 01:26:01,630
So how many are there?

1790
01:26:01,630 --> 01:26:03,550
Then how many equal ones next?

1791
01:26:03,550 --> 01:26:05,190
And how many equal ones next?

1792
01:26:05,190 --> 01:26:07,950
Each time, comparing each
item to the previous one.

1793
01:26:07,950 --> 01:26:11,400
Then I get frequency counts
for all of these hands.

1794
01:26:11,400 --> 01:26:13,770
And then I do another scan
to find the most common one.

1795
01:26:13,770 --> 01:26:16,980
And I can do another scan to
find the lexically best one,

1796
01:26:16,980 --> 01:26:19,005
lexically last one.

1797
01:26:19,005 --> 01:26:22,370
And that's how you
solve problem 5.
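
A hedged Python sketch of part b: sort the representations, scan for runs of equal neighbors, and pick the most common hand. Python's built-in `sorted` stands in for radix sort here, so this sketch is not linear time as written; the function names and the tie rule (lexicographically last hand string among the most common) are assumptions based on the description above. The input is assumed to be a non-empty list of count tuples.

```python
def table_to_hand(table):
    """Turn a tuple of letter counts back into the sorted hand string."""
    return "".join(chr(ord("a") + idx) * cnt for idx, cnt in enumerate(table))


def most_likely_hand(tables):
    """Most frequent hand among the tables; ties go to the later string."""
    ordered = sorted(tables)  # stand-in for radix sort: equal tables end up adjacent
    runs = []  # (how many times the hand occurs, hand string)
    start = 0
    for idx in range(1, len(ordered) + 1):
        # A run ends at the end of the list or where the table changes.
        if idx == len(ordered) or ordered[idx] != ordered[start]:
            runs.append((idx - start, table_to_hand(ordered[start])))
            start = idx
    # Highest count first, then lexicographically last hand string.
    return max(runs)[1]


abcd = (1, 1, 1, 1) + (0,) * 22
bccd = (0, 1, 2, 1) + (0,) * 22
print(most_likely_hand([abcd, bccd, abcd]))  # abcd
```

Swapping `sorted` for a radix sort over the 26 counts would bring the whole procedure back to linear time.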