1
00:00:00,000 --> 00:00:01,976
[SQUEAKING]

2
00:00:01,976 --> 00:00:04,446
[RUSTLING]

3
00:00:04,446 --> 00:00:07,904
[CLICKING]

4
00:00:12,850 --> 00:00:16,580
JASON KU: Good
morning, everybody.

5
00:00:16,580 --> 00:00:18,400
How's everybody doing?

6
00:00:18,400 --> 00:00:20,920
Nice long weekend
we just came from--

7
00:00:20,920 --> 00:00:21,820
I'm doing well.

8
00:00:21,820 --> 00:00:24,870
I'm actually getting
over a little cold.

9
00:00:24,870 --> 00:00:27,220
Aw-- yeah, unfortunately.

10
00:00:27,220 --> 00:00:30,850
But after this, I don't have
anything else this week,

11
00:00:30,850 --> 00:00:32,470
so that's good.

12
00:00:32,470 --> 00:00:40,390
OK, so last time, last
week, we talked about how--

13
00:00:40,390 --> 00:00:45,550
we looked at the search
problem that we talked about

14
00:00:45,550 --> 00:00:49,070
earlier that week
and showed that,

15
00:00:49,070 --> 00:00:51,200
in a certain model
of computation,

16
00:00:51,200 --> 00:00:57,080
where I could only compare
two objects that I'm

17
00:00:57,080 --> 00:00:58,820
storing in my--

18
00:00:58,820 --> 00:01:02,480
that I'm storing and
get some constant number

19
00:01:02,480 --> 00:01:04,489
of outputs on what I could--

20
00:01:04,489 --> 00:01:06,140
how I could identify
these things,

21
00:01:06,140 --> 00:01:08,180
like equal, or less
than, or something

22
00:01:08,180 --> 00:01:11,070
like that, then we
drew a decision tree

23
00:01:11,070 --> 00:01:17,430
and we got this bound
that, if I had n outputs,

24
00:01:17,430 --> 00:01:21,760
I would require my decision tree
to be at least log n height.

25
00:01:21,760 --> 00:01:27,000
And so in this model, I
can't find the things faster

26
00:01:27,000 --> 00:01:28,810
than log n time.

27
00:01:28,810 --> 00:01:33,620
But luckily, we are in a
model of computation which

28
00:01:33,620 --> 00:01:35,690
has a stronger operation--

29
00:01:35,690 --> 00:01:38,680
namely, random accessing.

30
00:01:38,680 --> 00:01:43,390
And if we stored the things
that we're looking for,

31
00:01:43,390 --> 00:01:47,710
we have unique keys, and
those keys are integers.

32
00:01:47,710 --> 00:01:52,020
Then, if I have an
item with key K,

33
00:01:52,020 --> 00:01:57,540
if I store it at
index K in my array,

34
00:01:57,540 --> 00:02:02,220
then I can find it and
manipulate it in constant time.

35
00:02:02,220 --> 00:02:04,667
That's pretty cool.

36
00:02:04,667 --> 00:02:06,500
That's what we called
a direct access array.

37
00:02:06,500 --> 00:02:08,333
A direct access array--
really not different

38
00:02:08,333 --> 00:02:11,000
than a regular array,
except how are you using it

39
00:02:11,000 --> 00:02:13,550
when we were talking
about sequences is we

40
00:02:13,550 --> 00:02:19,910
are giving extrinsic
semantics to the slots where

41
00:02:19,910 --> 00:02:21,110
we are storing these things.

42
00:02:21,110 --> 00:02:26,120
Basically, I could put
any item in any slot.

43
00:02:26,120 --> 00:02:28,160
Where it was in my
array had nothing

44
00:02:28,160 --> 00:02:30,590
to do with what
those things were.

45
00:02:30,590 --> 00:02:35,900
Here we are imposing intrinsic
semantics on my array

46
00:02:35,900 --> 00:02:42,980
that, if I have an item with
key K, it must be at index K.

47
00:02:42,980 --> 00:02:46,970
That's the thing that we're
taking advantage of here.

48
00:02:46,970 --> 00:02:50,540
And then we can use this nice,
powerful linear branching

49
00:02:50,540 --> 00:02:54,183
random access operation to find
that thing in constant time,

50
00:02:54,183 --> 00:02:55,850
because that's our
model of computation.
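A minimal sketch of the direct access array just described, assuming every item carries a unique integer key in the range 0 to u - 1. The names `Item`, `DirectAccessArray`, and `slots` are illustrative and not from the lecture.

```python
from collections import namedtuple

# Hypothetical item type: any object with an integer .key would do.
Item = namedtuple("Item", ["key", "value"])

class DirectAccessArray:
    """Set of items with unique integer keys in range(u)."""

    def __init__(self, u):
        self.slots = [None] * u      # one slot per possible key: O(u) space

    def find(self, k):
        return self.slots[k]         # single random access: O(1)

    def insert(self, x):
        self.slots[x.key] = x        # item with key k must live at index k

    def delete(self, k):
        self.slots[k] = None         # empty the slot for key k
```

The O(u) allocation in the constructor is the space cost that the next part of the recap is about.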

51
00:02:55,850 --> 00:03:01,160
OK, then what was the problem
with this direct access array?

52
00:03:01,160 --> 00:03:02,120
Anyone shout it out.

53
00:03:06,180 --> 00:03:07,800
Space-- right.

54
00:03:07,800 --> 00:03:10,980
So we had to instantiate
a direct access

55
00:03:10,980 --> 00:03:15,180
array that was the size
of the space of our keys.

56
00:03:15,180 --> 00:03:18,930
In general, my
index location is--

57
00:03:18,930 --> 00:03:21,520
could go from 0 to
some positive number.

58
00:03:21,520 --> 00:03:24,270
If I had very large positive
numbers, if I was sorting--

59
00:03:24,270 --> 00:03:27,420
if I was searching
among your MIT IDs,

60
00:03:27,420 --> 00:03:29,280
I'd have to have a
direct access array that

61
00:03:29,280 --> 00:03:33,450
spanned that space of
possible keys you could have.

62
00:03:33,450 --> 00:03:35,850
And that could be
much larger than n.

63
00:03:35,850 --> 00:03:39,510
And so the rest of the
time we talked about how

64
00:03:39,510 --> 00:03:42,570
to fix that space problem.

65
00:03:42,570 --> 00:03:46,500
We can reduce the space by
taking that larger key space

66
00:03:46,500 --> 00:03:49,290
from 0 to u, which
could be very large,

67
00:03:49,290 --> 00:03:51,690
and map it down
to a small space.

68
00:03:51,690 --> 00:03:56,740
Now, in general, if I give you
a fixed hash function there,

69
00:03:56,740 --> 00:04:00,440
that's not going to be
good in-- for all inputs.

70
00:04:00,440 --> 00:04:04,160
If your inputs are very well
distributed over the key space,

71
00:04:04,160 --> 00:04:07,670
then it is good, but
in general, there

72
00:04:07,670 --> 00:04:12,680
would be hash functions with
some inputs that will be bad.

73
00:04:12,680 --> 00:04:14,720
That's what we argued.

74
00:04:14,720 --> 00:04:16,820
And so for the rest
of the time there,

75
00:04:16,820 --> 00:04:20,390
we talked about hash families,
choosing a hash function

76
00:04:20,390 --> 00:04:24,770
randomly from among a large
set of hash functions,

77
00:04:24,770 --> 00:04:28,850
which had a property that, if
I chose this thing randomly

78
00:04:28,850 --> 00:04:32,660
and you, generating your
input, didn't know which random

79
00:04:32,660 --> 00:04:38,170
numbers I was picking, the
expectation over my random

80
00:04:38,170 --> 00:04:39,003
choice-- me--

81
00:04:39,003 --> 00:04:40,420
I'm the one running
the algorithm,

82
00:04:40,420 --> 00:04:43,800
not you giving me the input--

83
00:04:43,800 --> 00:04:46,230
that random choice--
my algorithm

84
00:04:46,230 --> 00:04:48,360
actually behaves really
well in expectation.

85
00:04:48,360 --> 00:04:50,760
In particular, I
got constant time

86
00:04:50,760 --> 00:04:55,470
for finding, inserting,
and deleting into this data

87
00:04:55,470 --> 00:04:57,390
structure, in expectation.

88
00:04:57,390 --> 00:05:00,930
We did a little proof of--

89
00:05:00,930 --> 00:05:04,500
that the chain links where we
stored collisions in our hash

90
00:05:04,500 --> 00:05:06,120
function-- in our hash table--

91
00:05:06,120 --> 00:05:10,750
sorry-- those
wouldn't be very long,

92
00:05:10,750 --> 00:05:13,500
and so if they were
constant, then I

93
00:05:13,500 --> 00:05:16,560
don't have to search more than
a constant number of things

94
00:05:16,560 --> 00:05:19,710
when I go to an-- a
hashed index location.
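A rough sketch of hashing with chaining, assuming non-negative integer keys smaller than a fixed prime p. The hash family h(k) = ((a*k + b) mod p) mod m, the class name `ChainedHashTable`, and the default sizes are assumptions for illustration. Resizing is left out, so the expected constant-time bounds only hold while the number of stored items stays proportional to m.

```python
import random

class ChainedHashTable:
    """Hash table with chaining and a randomly chosen hash function."""

    def __init__(self, m=8, p=2**31 - 1):
        self.m, self.p = m, p                 # table size and a prime > any key
        self.a = random.randrange(1, p)       # random choice made by the algorithm,
        self.b = random.randrange(0, p)       # hidden from whoever picks the input
        self.chains = [[] for _ in range(m)]  # one chain (list) per index

    def _hash(self, k):
        return ((self.a * k + self.b) % self.p) % self.m

    def find(self, k):
        for key, value in self.chains[self._hash(k)]:   # scan one chain only
            if key == k:
                return value
        return None

    def insert(self, k, value):
        chain = self.chains[self._hash(k)]
        for i, (key, _) in enumerate(chain):  # check whether the key already exists
            if key == k:
                chain[i] = (k, value)
                return
        chain.append((k, value))

    def delete(self, k):
        chain = self.chains[self._hash(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]
                return
```

Each operation scans a single chain, so its cost is the length of that chain: constant in expectation, but linear if everything lands at the same index.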

95
00:05:19,710 --> 00:05:23,070
Does everyone remember what
we talked about last week?

96
00:05:26,930 --> 00:05:29,120
I didn't show you
this chart at the end,

97
00:05:29,120 --> 00:05:31,370
but I'm showing it to you now.

98
00:05:31,370 --> 00:05:34,430
Essentially, what we had was we
have a bunch of different ways

99
00:05:34,430 --> 00:05:37,550
to deal with this set interface.

100
00:05:37,550 --> 00:05:40,850
And last week, we talked
about the sorted array,

101
00:05:40,850 --> 00:05:43,490
and then we talked about this
direct access array and this

102
00:05:43,490 --> 00:05:52,300
hash table, which do better for
these dictionary-- the find,

103
00:05:52,300 --> 00:05:55,090
and insert, and
delete operations--

104
00:05:55,090 --> 00:05:58,750
or at least better
in an expected sense.

105
00:05:58,750 --> 00:06:02,080
What's the worst case
performance of a hash table?

106
00:06:06,290 --> 00:06:08,840
If I have to look up
something in a hash table,

107
00:06:08,840 --> 00:06:12,200
and I happen to choose a bad
hash table-- hash function,

108
00:06:12,200 --> 00:06:14,420
what's the worst case here?

109
00:06:14,420 --> 00:06:15,560
What?

110
00:06:15,560 --> 00:06:16,550
n, right?

111
00:06:16,550 --> 00:06:20,300
It's worse than a sorted
array, because potentially, I

112
00:06:20,300 --> 00:06:22,280
hashed everything
that I was storing

113
00:06:22,280 --> 00:06:24,860
to the same index
in my hash table,

114
00:06:24,860 --> 00:06:27,380
and to be able to
distinguish between them,

115
00:06:27,380 --> 00:06:30,860
I can't do anything more
than a linear search.

116
00:06:30,860 --> 00:06:35,810
I could store another set
data structure as my chain

117
00:06:35,810 --> 00:06:37,430
and do better that way.

118
00:06:37,430 --> 00:06:39,680
That's actually
how Java does it.

119
00:06:39,680 --> 00:06:42,140
They store a data
structure we're

120
00:06:42,140 --> 00:06:45,350
going to be talking about
next week as the chains

121
00:06:45,350 --> 00:06:48,170
so that they can get,
worst case, log n.

122
00:06:48,170 --> 00:06:54,138
But in general, that
hash table is only good

123
00:06:54,138 --> 00:06:54,930
if we're allowing--

124
00:06:54,930 --> 00:06:58,410
OK, I want this to be expected
good, but in the worst case,

125
00:06:58,410 --> 00:07:01,080
if I really need that
operation to be worst case--

126
00:07:01,080 --> 00:07:03,780
I really can't afford
linear time ever

127
00:07:03,780 --> 00:07:05,467
for an operation of that kind--

128
00:07:05,467 --> 00:07:07,050
then I don't want
to use a hash table.

129
00:07:07,050 --> 00:07:10,560
And so on your p set 2,
everything we ask you for

130
00:07:10,560 --> 00:07:12,570
is worst case, so
probably, you don't

131
00:07:12,570 --> 00:07:14,800
want to be using hash tables.

132
00:07:14,800 --> 00:07:16,300
OK?

133
00:07:16,300 --> 00:07:17,117
Yes?

134
00:07:17,117 --> 00:07:18,310
AUDIENCE: What does
the subscript e mean?

135
00:07:18,310 --> 00:07:19,570
JASON KU: What does
the subscript e mean?

136
00:07:19,570 --> 00:07:20,200
That's great.

137
00:07:20,200 --> 00:07:28,840
In this chart, I put a subscript
e meaning this is an expected runtime,

138
00:07:28,840 --> 00:07:31,510
or an A meaning this is
an amortized runtime.

139
00:07:31,510 --> 00:07:35,290
At the end, we talked about
how, if we had too many things

140
00:07:35,290 --> 00:07:40,870
in our hash table, then, as long
as we didn't do it too often--

141
00:07:40,870 --> 00:07:42,460
this is a little
hand wavey argument,

142
00:07:42,460 --> 00:07:45,490
but the same kinds of ideas
as the dynamic array--

143
00:07:45,490 --> 00:07:48,720
if, whenever we got a linear--

144
00:07:48,720 --> 00:07:53,970
we are more than a linear factor
away from where we are trying--

145
00:07:53,970 --> 00:07:56,580
basically, the fill factor
we were trying to be,

146
00:07:56,580 --> 00:07:58,950
then we could just completely
rebuild the hash table

147
00:07:58,950 --> 00:08:00,780
with the new hash
function randomly

148
00:08:00,780 --> 00:08:03,920
chosen from our hash
family with a new size,

149
00:08:03,920 --> 00:08:05,690
and we could get
amortized bounds.

150
00:08:05,690 --> 00:08:07,220
And so that's what Python--

151
00:08:07,220 --> 00:08:10,820
how Python implements
dictionaries, or sets, or even

152
00:08:10,820 --> 00:08:17,500
objects when it's trying to
map keys to different things.

153
00:08:17,500 --> 00:08:19,050
So that's hash tables.

154
00:08:19,050 --> 00:08:20,310
That's great.

155
00:08:20,310 --> 00:08:25,350
The key thing here is, well,
actually, if your range of keys

156
00:08:25,350 --> 00:08:28,770
is small, or if
you as a programmer

157
00:08:28,770 --> 00:08:31,920
have the ability to choose
the keys that you identify

158
00:08:31,920 --> 00:08:33,809
your objects with,
you can actually

159
00:08:33,809 --> 00:08:36,270
choose that range to
be small, to be linear,

160
00:08:36,270 --> 00:08:39,107
to be small with
respect to your items.

161
00:08:39,107 --> 00:08:40,440
And you don't need a hash table.

162
00:08:40,440 --> 00:08:43,230
You can just use a
direct access array,

163
00:08:43,230 --> 00:08:48,580
because if you know your key
space is small, that's great.

164
00:08:48,580 --> 00:08:50,097
So a lot of C
programmers probably

165
00:08:50,097 --> 00:08:52,180
would like to do something
like that, because they

166
00:08:52,180 --> 00:08:53,470
don't have access to--

167
00:08:53,470 --> 00:08:58,760
maybe C++ programmers would
have access to their hash table.

168
00:08:58,760 --> 00:09:02,370
Any questions on this
stuff before we move on?

169
00:09:02,370 --> 00:09:03,091
Yeah?

170
00:09:03,091 --> 00:09:07,240
AUDIENCE: So why is [INAUDIBLE]?

171
00:09:07,240 --> 00:09:09,040
JASON KU: Why is it expected?

172
00:09:09,040 --> 00:09:12,610
When I'm building,
I could insert--

173
00:09:12,610 --> 00:09:16,720
I'm inserting these things from
X one by one into my hash table.

174
00:09:16,720 --> 00:09:19,840
Each of those
insert operations--

175
00:09:19,840 --> 00:09:22,480
I'm looking up to
see whether that--

176
00:09:22,480 --> 00:09:25,460
an item with that key already
exists in my hash table.

177
00:09:25,460 --> 00:09:29,120
And so I have to look down
the chain to see where it is.

178
00:09:29,120 --> 00:09:33,010
However, if I happen to
know that all of my keys

179
00:09:33,010 --> 00:09:35,960
are unique in my input, all
the items I'm trying to store

180
00:09:35,960 --> 00:09:38,380
are unique, then I don't
have to do that check

181
00:09:38,380 --> 00:09:40,270
and I can get worst
case linear time.

182
00:09:40,270 --> 00:09:42,160
Does that make sense?

183
00:09:42,160 --> 00:09:42,680
All right.

184
00:09:42,680 --> 00:09:45,360
It's a subtlety, but
that's a great question.

185
00:09:45,360 --> 00:09:49,280
OK, so today, instead of
talking about searching,

186
00:09:49,280 --> 00:09:52,030
we're talking about sorting.

187
00:09:52,030 --> 00:09:57,360
Last week, we saw a
few ways to do sort.

188
00:09:57,360 --> 00:09:59,940
Some of them were quadratic--
insertion sort and selection

189
00:09:59,940 --> 00:10:00,450
sort--

190
00:10:00,450 --> 00:10:02,340
and then we had one
that was n log n.

191
00:10:02,340 --> 00:10:06,330
And this thing, n log
n, seemed pretty good,

192
00:10:06,330 --> 00:10:08,160
but can I do better?

193
00:10:11,070 --> 00:10:12,788
Can I do better?

194
00:10:12,788 --> 00:10:15,330
Well, what we're going to show
at the beginning of this class

195
00:10:15,330 --> 00:10:19,640
is, in this
comparison model, no.

196
00:10:19,640 --> 00:10:20,990
n log n is optimal.

197
00:10:20,990 --> 00:10:23,210
And we're going to go
through the exact same line

198
00:10:23,210 --> 00:10:26,430
of reasoning that
we had last week.

199
00:10:26,430 --> 00:10:35,340
So in the comparison
model, what did we

200
00:10:35,340 --> 00:10:40,020
use when we were trying
to make this argument

201
00:10:40,020 --> 00:10:45,830
that any comparison
model algorithm was going

202
00:10:45,830 --> 00:10:48,470
to take at least log n time?

203
00:10:48,470 --> 00:10:50,690
What we did was
we said, OK, I can

204
00:10:50,690 --> 00:10:54,080
think of any model in
the comparison model--

205
00:10:54,080 --> 00:10:59,060
any algorithm in the comparison
model as kind of this--

206
00:10:59,060 --> 00:11:01,250
some comparisons happen.

207
00:11:01,250 --> 00:11:03,950
They branch in a
binary sense, but you

208
00:11:03,950 --> 00:11:07,260
could have it generalized to
any constant branching factor.

209
00:11:07,260 --> 00:11:10,590
But for our purposes,
binary's fine.

210
00:11:10,590 --> 00:11:16,040
And what we said was that
there were at least n outputs--

211
00:11:16,040 --> 00:11:17,660
really n plus 1, but--

212
00:11:17,660 --> 00:11:20,270
at least order n outputs.

213
00:11:20,270 --> 00:11:22,520
And we showed that--

214
00:11:22,520 --> 00:11:25,430
or we argued to you that
the height of this tree

215
00:11:25,430 --> 00:11:29,210
had to be at least log n--

216
00:11:31,880 --> 00:11:34,130
log the number of leaves.

217
00:11:34,130 --> 00:11:36,320
It had to be at least
log the number of leaves.

218
00:11:36,320 --> 00:11:38,840
That was the height
of the decision tree.

219
00:11:38,840 --> 00:11:44,120
And if this decision tree
represented a search algorithm,

220
00:11:44,120 --> 00:11:48,110
I had to walk down and
perform these comparisons

221
00:11:48,110 --> 00:11:53,930
in order, reach a leaf where
I would output something.

222
00:11:53,930 --> 00:12:00,050
If the minimum height of any
binary tree on a linear number

223
00:12:00,050 --> 00:12:05,030
of leaves is log n,
then any algorithm

224
00:12:05,030 --> 00:12:10,570
in the comparison model
also has to take log n time,

225
00:12:10,570 --> 00:12:13,090
because it has to do that many
comparisons to differentiate

226
00:12:13,090 --> 00:12:16,150
between all possible outputs.

227
00:12:16,150 --> 00:12:18,490
Does that make sense?

228
00:12:18,490 --> 00:12:19,180
All right.

229
00:12:19,180 --> 00:12:28,070
So in the sort problem, how
many possible outputs are there?

230
00:12:31,220 --> 00:12:33,405
What is the output of
a sorting algorithm?

231
00:12:38,740 --> 00:12:39,710
AUDIENCE: [INAUDIBLE]

232
00:12:39,710 --> 00:12:40,916
JASON KU: What?

233
00:12:40,916 --> 00:12:42,940
What's up?

234
00:12:42,940 --> 00:12:48,080
A list-- in particular,
given my input--

235
00:12:48,080 --> 00:12:56,330
some set of items
A that has size n--

236
00:12:56,330 --> 00:13:00,650
what I'm going to give you is
some permutation of that list.

237
00:13:00,650 --> 00:13:06,310
So for each index, say, I
could tell you where it goes.

238
00:13:09,190 --> 00:13:13,060
Another way I could say is,
where does the first item

239
00:13:13,060 --> 00:13:16,570
go to, where does
the second item go

240
00:13:16,570 --> 00:13:18,690
to, where does the third
item go to-- blah, blah,

241
00:13:18,690 --> 00:13:20,970
blah-- like that.

242
00:13:20,970 --> 00:13:25,210
So how many different choices
of a permutation are there?

243
00:13:25,210 --> 00:13:29,130
Well, how many choices do I have
for the first thing of where

244
00:13:29,130 --> 00:13:31,740
it could be in the
final sorted array?

245
00:13:31,740 --> 00:13:36,790
It could be in any of
the places, so it's n.

246
00:13:36,790 --> 00:13:38,830
How about this one,
the second one?

247
00:13:38,830 --> 00:13:41,470
Well, it can't go to
where this one went,

248
00:13:41,470 --> 00:13:43,430
right, but it can
go anywhere else.

249
00:13:43,430 --> 00:13:45,280
So it's n minus 1.

250
00:13:45,280 --> 00:13:47,600
And since these are
independent choices I'm making,

251
00:13:47,600 --> 00:13:49,240
if I multiply them
all together, I

252
00:13:49,240 --> 00:13:52,030
get n factorial
permutations that

253
00:13:52,030 --> 00:13:53,830
are the number of
possible outputs

254
00:13:53,830 --> 00:13:55,780
that I have to my
sorting algorithm.

255
00:13:55,780 --> 00:13:58,870
So for me, to have an output
to my sorting algorithm

256
00:13:58,870 --> 00:14:01,990
be correct, I need at
least n factorial leaves.
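A small check of the counting argument, as a sketch. It enumerates the orderings of n distinct items and compares the count with n factorial, then reports the smallest height a binary decision tree with that many leaves could have. The helper names `num_outputs` and `min_height` are made up for this example.

```python
from itertools import permutations
from math import factorial

def num_outputs(n):
    """Count the orderings of n distinct items by enumerating them."""
    return sum(1 for _ in permutations(range(n)))

def min_height(leaves):
    """Smallest height of a binary tree with at least this many leaves."""
    return (leaves - 1).bit_length()   # exact ceil(log2(leaves)) for leaves >= 1

for n in range(1, 7):
    assert num_outputs(n) == factorial(n)   # n choices, then n - 1, and so on
    print(n, factorial(n), min_height(factorial(n)))
```

The last column is a lower bound on the worst-case number of comparisons any comparison sort needs for n items.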

257
00:14:01,990 --> 00:14:03,730
Does that make sense?

258
00:14:03,730 --> 00:14:04,230
OK.

259
00:14:07,260 --> 00:14:10,170
The nice thing about
doing this last week

260
00:14:10,170 --> 00:14:14,010
is this is really just
the number of leaves

261
00:14:14,010 --> 00:14:16,740
and this is really
the number of leaves.

262
00:14:16,740 --> 00:14:21,090
So what's the number of
leaves is theta n factorial.

263
00:14:21,090 --> 00:14:23,370
Here it's actually n
factorial, but I'm just

264
00:14:23,370 --> 00:14:25,410
going to put it there.

265
00:14:25,410 --> 00:14:27,690
And here we get an n factorial.

266
00:14:33,460 --> 00:14:34,030
I see.

267
00:14:34,030 --> 00:14:39,640
So it's at least
omega n factorial.

268
00:14:39,640 --> 00:14:42,320
Does that make you happier?

269
00:14:42,320 --> 00:14:44,120
Theta here-- thank you--

270
00:14:44,120 --> 00:14:47,040
has to be at least.

271
00:14:47,040 --> 00:14:47,790
So this was right.

272
00:14:50,580 --> 00:14:53,820
OK, so at least this many--

273
00:14:53,820 --> 00:14:57,480
there are algorithms
that, if it got--

274
00:14:57,480 --> 00:14:59,100
it could take two
different routes

275
00:14:59,100 --> 00:15:00,900
to get to the same output.

276
00:15:00,900 --> 00:15:03,970
So this is a lower bound
on the number of leaves.

277
00:15:03,970 --> 00:15:05,070
OK?

278
00:15:05,070 --> 00:15:06,690
So what this
argument is saying is

279
00:15:06,690 --> 00:15:10,110
that, if I just replace
the number of leaves n here

280
00:15:10,110 --> 00:15:13,500
with n factorial, I get
a similar comparison

281
00:15:13,500 --> 00:15:15,660
sort lower bound now.

282
00:15:15,660 --> 00:15:17,790
So what is log of n factorial?

283
00:15:20,980 --> 00:15:24,610
This is familiar
from p set 1 maybe.

284
00:15:24,610 --> 00:15:28,820
So one thing I could do is I
could put in Stirling's formula,

285
00:15:28,820 --> 00:15:29,320
right?

286
00:15:32,690 --> 00:15:35,370
And that'll give me something
of the form n log n.

287
00:15:35,370 --> 00:15:41,180
But what's another way I
could lower bound n factorial?

288
00:15:41,180 --> 00:15:42,850
Well, I have a bunch
of things here.

289
00:15:46,030 --> 00:15:48,280
That's n factorial.

290
00:15:48,280 --> 00:15:50,480
Half of these things--

291
00:15:50,480 --> 00:15:53,980
these half, n/2 things--

292
00:15:53,980 --> 00:15:58,770
are bigger than or equal to n/2.

293
00:15:58,770 --> 00:16:00,600
That make sense?

294
00:16:00,600 --> 00:16:08,930
So I can certainly lower bound
this thing by n/2 to the n/2.

295
00:16:08,930 --> 00:16:11,780
That's a little easier
thing to take a log of.

296
00:16:11,780 --> 00:16:17,250
If you take a log of that,
that's asymptotically n log n.
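Written out, the bound sketched on the board looks roughly like this, assuming n is at least 2 and taking logs base 2 (the base is an assumption and only changes constants). The terms from the ceiling of n/2 up to n number at least n/2, and each one is at least n/2.

```latex
\begin{align*}
n! &= \prod_{i=1}^{n} i
   \;\ge\; \prod_{i=\lceil n/2 \rceil}^{n} i
   \;\ge\; \left(\frac{n}{2}\right)^{n/2}, \\
\log_2 (n!) &\ge \frac{n}{2}\,\log_2 \frac{n}{2}
   = \frac{n}{2}\left(\log_2 n - 1\right)
   = \Omega(n \log n).
\end{align*}
```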

297
00:16:17,250 --> 00:16:20,850
So what we're getting here
is any sorting algorithm here

298
00:16:20,850 --> 00:16:24,270
takes at least n
log n comparisons,

299
00:16:24,270 --> 00:16:26,100
and so a merge sort's
the best we can do.

300
00:16:28,950 --> 00:16:30,480
That make sense to everybody?

301
00:16:30,480 --> 00:16:33,600
We're just piggybacking on the
analysis we had about decision

302
00:16:33,600 --> 00:16:38,800
trees, connecting leaves
with the minimum height

303
00:16:38,800 --> 00:16:43,090
of any binary tree on
that number of leaves,

304
00:16:43,090 --> 00:16:46,090
and just replacing
n with n factorial--

305
00:16:46,090 --> 00:16:48,650
nothing super interesting here.

306
00:16:48,650 --> 00:16:49,150
Yeah?

307
00:16:49,150 --> 00:16:51,450
AUDIENCE: [INAUDIBLE]
the n over 2.

308
00:16:51,450 --> 00:16:53,047
JASON KU: Yeah, sure.

309
00:16:53,047 --> 00:16:54,630
You can just plug
in Stirling's formula,

310
00:16:54,630 --> 00:16:58,950
but I did this, so I
might as well clarify.

311
00:16:58,950 --> 00:17:03,320
There are n terms
here in the product.

312
00:17:03,320 --> 00:17:06,214
Half of them are at least n/2.

313
00:17:06,214 --> 00:17:07,089
Does that make sense?

314
00:17:09,660 --> 00:17:12,300
I can lower bound this
product by something

315
00:17:12,300 --> 00:17:15,240
smaller than half of the terms--

316
00:17:15,240 --> 00:17:18,119
a product of that,
and that'll be fine.

317
00:17:18,119 --> 00:17:23,819
So I'm taking n/2 of them and
I'm multiplying n/2 altogether,

318
00:17:23,819 --> 00:17:24,825
n/2 times.

319
00:17:24,825 --> 00:17:25,700
Does that make sense?

320
00:17:28,650 --> 00:17:31,250
It's just providing
a lower bound.

321
00:17:31,250 --> 00:17:33,200
I just need something
that's smaller

322
00:17:33,200 --> 00:17:34,460
than all of these terms.

323
00:17:34,460 --> 00:17:36,260
And multiply them all
together, and that'll

324
00:17:36,260 --> 00:17:39,120
give me a lower bound.

325
00:17:39,120 --> 00:17:43,210
OK, so we can't do better than
n log n in the comparison model,

326
00:17:43,210 --> 00:17:46,650
but what we did
last week was use

327
00:17:46,650 --> 00:17:49,500
random access and a direct
access array to do better.

328
00:17:49,500 --> 00:17:51,750
OK?

329
00:17:51,750 --> 00:17:57,170
Can anyone think of how to
use that idea to sort faster?

330
00:17:57,170 --> 00:18:00,320
And I'm going to give
you a caveat here.

331
00:18:00,320 --> 00:18:04,400
I'm going to let you assume that
the keys of the things you're

332
00:18:04,400 --> 00:18:05,960
trying to sort out are unique.

333
00:18:08,850 --> 00:18:13,760
And say they're in a
bound-- in a small range.

334
00:18:13,760 --> 00:18:19,610
So how could I use a direct
access array to sort faster?

335
00:18:19,610 --> 00:18:21,260
Any ideas?

336
00:18:21,260 --> 00:18:21,760
Yeah?

337
00:18:21,760 --> 00:18:23,732
AUDIENCE: Could
you just literally

338
00:18:23,732 --> 00:18:26,267
insert [INAUDIBLE] into
a direct access array?

339
00:18:26,267 --> 00:18:26,975
JASON KU: Uh-huh.

340
00:18:26,975 --> 00:18:31,410
AUDIENCE: And then you look at
that array and how to sort it.

341
00:18:31,410 --> 00:18:31,980
JASON KU: OK.

342
00:18:31,980 --> 00:18:34,230
So what your colleague is
saying is exactly correct.

343
00:18:34,230 --> 00:18:37,620
It's something that I like to
call direct access array sort.

344
00:18:37,620 --> 00:18:41,370
We won't really call it that,
because there's something more

345
00:18:41,370 --> 00:18:45,190
general that we'll talk
about in just a second.

346
00:18:45,190 --> 00:18:47,340
But what your colleague
was saying is,

347
00:18:47,340 --> 00:18:50,610
instantiate a big
direct access array--

348
00:18:50,610 --> 00:18:53,805
direct access array sort.

349
00:18:56,570 --> 00:18:59,510
I'm instantiating
this big direct access

350
00:18:59,510 --> 00:19:03,910
array of the space
of my keys, and what

351
00:19:03,910 --> 00:19:05,410
your colleague was
saying was I take

352
00:19:05,410 --> 00:19:08,812
each one of the items in my--

353
00:19:08,812 --> 00:19:10,270
the things that
I'm trying to sort,

354
00:19:10,270 --> 00:19:11,980
I look at each
one of their keys,

355
00:19:11,980 --> 00:19:16,240
and I stick it in
the direct accessory

356
00:19:16,240 --> 00:19:20,740
exactly where it needs
to go, in constant time.

357
00:19:20,740 --> 00:19:22,150
That's great.

358
00:19:22,150 --> 00:19:25,150
Now, I gave you this caveat
that all the keys were unique,

359
00:19:25,150 --> 00:19:27,580
so I don't have to deal
with collisions here.

360
00:19:27,580 --> 00:19:30,430
But then, after I'm done with
this, all of these things

361
00:19:30,430 --> 00:19:33,040
are now in sorted
order, and what I can do

362
00:19:33,040 --> 00:19:37,240
is I can just walk
down this list.

363
00:19:37,240 --> 00:19:39,550
A lot of these cells
are empty, potentially.

364
00:19:42,430 --> 00:19:44,890
Some of the keys might
not be there, but what

365
00:19:44,890 --> 00:19:47,170
I can do is just
walk down this list,

366
00:19:47,170 --> 00:19:51,950
pick off every item that does
exist, stick them in an array--

367
00:19:51,950 --> 00:19:52,450
I'm done.

368
00:19:55,990 --> 00:20:00,400
Stick a key into here and then--

369
00:20:00,400 --> 00:20:03,260
all right.

370
00:20:03,260 --> 00:20:08,750
Make direct access array.

371
00:20:08,750 --> 00:20:20,630
Store items-- item
x in index x.key.

372
00:20:25,700 --> 00:20:43,000
Walk down direct access array,
and return items seen in order.

373
00:20:43,000 --> 00:20:45,560
Does that make
sense to everybody?

374
00:20:45,560 --> 00:20:47,410
All right, how long
does this step take?

375
00:20:53,910 --> 00:20:58,380
Building a direct
access array order u--

376
00:20:58,380 --> 00:21:01,970
OK, so this is order u--

377
00:21:01,970 --> 00:21:02,970
how long does this take?

378
00:21:06,390 --> 00:21:08,160
How many items do you
have to insert?

379
00:21:08,160 --> 00:21:11,240
Order n, or just n--

380
00:21:11,240 --> 00:21:14,210
and how long does it take to
insert each one of these things

381
00:21:14,210 --> 00:21:17,110
into my direct access array?

382
00:21:17,110 --> 00:21:20,850
Worst case constant time--

383
00:21:20,850 --> 00:21:26,220
so this is n times worst
case constant time--

384
00:21:26,220 --> 00:21:27,115
great.

385
00:21:27,115 --> 00:21:28,490
How long does this
last one take?

386
00:21:35,030 --> 00:21:36,880
Anyone?

387
00:21:36,880 --> 00:21:39,070
O of u also-- right,
because I'm walking down

388
00:21:39,070 --> 00:21:40,150
the entire length of u.

389
00:21:43,510 --> 00:21:50,840
So this algorithm takes,
in total, n plus u time.
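The three steps on the board can be sketched as follows, assuming items with unique non-negative integer keys. The `Item` type and the function name `direct_access_sort` are illustrative. Computing u from the largest key is also an assumption here; the lecture treats u as given.

```python
from collections import namedtuple

Item = namedtuple("Item", ["key", "value"])   # hypothetical item type

def direct_access_sort(A):
    """Sort list A of items with unique non-negative integer keys, in place."""
    if not A:
        return
    u = 1 + max(x.key for x in A)    # O(n): find the size of the key range
    D = [None] * u                   # O(u): make the direct access array
    for x in A:
        D[x.key] = x                 # O(n): store item x at index x.key
    i = 0
    for slot in D:                   # O(u): walk down and read items in order
        if slot is not None:
            A[i] = slot
            i += 1
```

For example, sorting items with keys 5, 2, 9, 0 builds a ten-slot array and reads the four occupied slots back in key order, for O(n + u) work in total.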

390
00:21:50,840 --> 00:21:53,810
This is great.

391
00:21:53,810 --> 00:21:56,375
u is at least n, because
we assumed distinct keys.

392
00:21:58,920 --> 00:22:03,150
But if u is on the
order of n, then we now

393
00:22:03,150 --> 00:22:04,650
have a linear time
sorting algorithm.

394
00:22:04,650 --> 00:22:07,360
Yes?

395
00:22:07,360 --> 00:22:08,306
What's up?

396
00:22:08,306 --> 00:22:10,277
AUDIENCE: [INAUDIBLE]

397
00:22:10,277 --> 00:22:11,110
JASON KU: I'm sorry.

398
00:22:11,110 --> 00:22:11,985
You have to speak up.

399
00:22:11,985 --> 00:22:16,670
AUDIENCE: How do you attach
keys to the [INAUDIBLE]?

400
00:22:16,670 --> 00:22:21,760
JASON KU: How do I attach
keys to my inputs in my--

401
00:22:21,760 --> 00:22:25,090
for a set data structure that
we've been talking about,

402
00:22:25,090 --> 00:22:27,350
all of my items have keys.

403
00:22:27,350 --> 00:22:30,810
That's just something that
we impose on our input.

404
00:22:30,810 --> 00:22:34,310
AUDIENCE: [INAUDIBLE]

405
00:22:34,310 --> 00:22:36,270
JASON KU: Each of the
keys is-- in this case,

406
00:22:36,270 --> 00:22:37,190
it has to be a number.

407
00:22:40,380 --> 00:22:42,150
That's a nice point.

408
00:22:42,150 --> 00:22:48,260
We do this to talk about
sorting items generally so

409
00:22:48,260 --> 00:22:51,050
that we don't have to deal with
potentially if these keys have

410
00:22:51,050 --> 00:22:53,360
values associated with--
or other stuff associated--

411
00:22:53,360 --> 00:22:56,100
put them on that item, and
they'll still be there.

412
00:22:56,100 --> 00:22:59,090
But in general, if you just
wanted to sort integers,

413
00:22:59,090 --> 00:23:02,480
you could say that .key is--

414
00:23:02,480 --> 00:23:04,545
points back to
the object itself,

415
00:23:04,545 --> 00:23:06,170
if you want to just
sort some integers.

416
00:23:06,170 --> 00:23:07,220
Does that make sense?

417
00:23:07,220 --> 00:23:09,220
It's a good question, though.

418
00:23:09,220 --> 00:23:12,880
OK, so that gives us a
linear time algorithm

419
00:23:12,880 --> 00:23:16,510
when u is small, and
under this condition

420
00:23:16,510 --> 00:23:20,050
that I have unique keys
when I want to sort.

421
00:23:20,050 --> 00:23:22,840
Those are fairly
restrictive, so we

422
00:23:22,840 --> 00:23:24,730
might want to generalize
this a little bit.

423
00:23:24,730 --> 00:23:26,160
OK?

424
00:23:26,160 --> 00:23:30,030
So that's direct
access array sort.

425
00:23:30,030 --> 00:23:34,410
What if we had a set of keys
that was a little larger?

426
00:23:40,480 --> 00:23:53,250
So let's say u is theta n
implies linear time sorting.

427
00:23:53,250 --> 00:23:54,810
That's great.

428
00:23:54,810 --> 00:23:59,250
So now, what happens if we
expand that range a little bit?

429
00:23:59,250 --> 00:24:02,880
Say u is less than or
equal to n squared--

430
00:24:02,880 --> 00:24:03,960
maybe just less than.

431
00:24:06,470 --> 00:24:14,300
OK, this is a bigger range.
And if we instantiated

432
00:24:14,300 --> 00:24:18,500
a direct access array
of quadratic size,

433
00:24:18,500 --> 00:24:20,210
we'd have a quadratic
time algorithm.

434
00:24:20,210 --> 00:24:21,170
This is not helpful.

435
00:24:24,000 --> 00:24:30,740
Anyone have a way in which
we could sort integers that

436
00:24:30,740 --> 00:24:33,170
are between 0 and n squared?

437
00:24:36,070 --> 00:24:38,140
Maybe using the stuff
that we had above--

438
00:24:42,860 --> 00:24:43,360
Yeah?

439
00:24:43,360 --> 00:24:46,460
AUDIENCE: [INAUDIBLE]
sort by the first n,

440
00:24:46,460 --> 00:24:49,790
kind of like the first digit.

441
00:24:49,790 --> 00:24:52,520
JASON KU: Your colleague
is saying exactly

442
00:24:52,520 --> 00:24:54,980
the thing that I'm looking
for, which is great,

443
00:24:54,980 --> 00:25:02,970
which is maybe we could break
this larger number into two

444
00:25:02,970 --> 00:25:05,700
smaller numbers.

445
00:25:05,700 --> 00:25:12,810
Any integer that is between 0 and
n squared can be written as--

446
00:25:15,330 --> 00:25:26,580
key can be some a and b, where
a is essentially the higher n

447
00:25:26,580 --> 00:25:28,980
and b is the lower n.

448
00:25:28,980 --> 00:25:31,020
This is kind of weird.

449
00:25:31,020 --> 00:25:33,340
OK, so what do I
actually mean by this?

450
00:25:33,340 --> 00:25:43,350
I mean that let's let a be
K, when I divide it by n--

451
00:25:43,350 --> 00:25:51,300
integer, the floor-- key
integer-divided by n.

452
00:25:51,300 --> 00:25:57,000
And b equals K mod n.

453
00:25:57,000 --> 00:26:00,960
So this is a number that's
less than n and this is

454
00:26:00,960 --> 00:26:03,120
a number that's less than n.

455
00:26:03,120 --> 00:26:04,480
Does that make sense?

456
00:26:04,480 --> 00:26:08,430
And actually, I can
recover K at any time

457
00:26:08,430 --> 00:26:13,130
by saying K equals an plus b.
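[Editor's note] Written out, the decomposition just described is the following, using lowercase k for the key; the bounds on a and b assume the key is a non-negative integer less than n squared.

```latex
a = \left\lfloor \frac{k}{n} \right\rfloor, \qquad
b = k \bmod n, \qquad
k = a\,n + b, \qquad
0 \le a < n,\; 0 \le b < n \quad \text{whenever } 0 \le k < n^2 .
```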

458
00:26:13,130 --> 00:26:17,710
I've essentially decomposed
this into a base n

459
00:26:17,710 --> 00:26:21,350
representation of this number.

460
00:26:21,350 --> 00:26:23,690
And I have two digits
in that number.

461
00:26:23,690 --> 00:26:26,510
This is the n-th--

462
00:26:26,510 --> 00:26:29,560
n's digit, and this
is the ones digit.

463
00:26:29,560 --> 00:26:31,460
Does that make sense?

464
00:26:31,460 --> 00:26:36,035
All right, so now let's say
I have this list of numbers--

465
00:26:43,980 --> 00:26:49,390
17, 3, 24, 22, 12.

466
00:26:54,660 --> 00:26:57,610
Here I have five numbers.

467
00:26:57,610 --> 00:27:00,290
So what's n in this case?

468
00:27:00,290 --> 00:27:03,320
5-- OK, not so interesting.

469
00:27:03,320 --> 00:27:06,320
n is 5 here.

470
00:27:06,320 --> 00:27:11,870
And I'm going to represent
this as five pairs of numbers

471
00:27:11,870 --> 00:27:15,930
that are each within
the bounds of 0 to 4.

472
00:27:15,930 --> 00:27:17,310
Does that make sense?

473
00:27:17,310 --> 00:27:20,525
So what is my a, b
representation of 17?

474
00:27:24,320 --> 00:27:30,380
3, 2-- OK.

475
00:27:30,380 --> 00:27:35,400
Yeah, so that's
3 times 5 plus 2.

476
00:27:35,400 --> 00:27:35,900
That's good.

477
00:27:35,900 --> 00:27:37,040
That's 17.

478
00:27:37,040 --> 00:27:38,240
Yeah?

479
00:27:38,240 --> 00:27:41,600
I think your colleague
did that, right?

480
00:27:41,600 --> 00:27:43,220
I have all of
these written down,

481
00:27:43,220 --> 00:27:44,637
so I'm just going
to write it out.

482
00:27:53,810 --> 00:27:56,690
And I hope I did it correctly.

483
00:27:56,690 --> 00:28:00,890
OK-- 3, 2; 0, 3;
4, 4; 4, 2; 2, 2--

484
00:28:00,890 --> 00:28:02,230
OK.

485
00:28:02,230 --> 00:28:04,210
So now I have a bunch
of things that I

486
00:28:04,210 --> 00:28:12,480
want to sort based on
this function that I have.

487
00:28:12,480 --> 00:28:15,390
These are no longer just
integers that I need to sort.

488
00:28:15,390 --> 00:28:19,470
I need to sort by this
transformation of this thing

489
00:28:19,470 --> 00:28:20,580
into a number.

490
00:28:20,580 --> 00:28:22,400
Does that make sense?

491
00:28:22,400 --> 00:28:27,490
So anyone have any
ideas on how we could--

492
00:28:27,490 --> 00:28:31,810
by the way, these are both
constant time operations

493
00:28:31,810 --> 00:28:36,370
on your computer, as long
as it's an integer division

494
00:28:36,370 --> 00:28:38,080
and this is mod.

495
00:28:38,080 --> 00:28:41,260
Python also has a
nice thing, I think,

496
00:28:41,260 --> 00:28:51,100
in its standard operations,
which is divmod of K, n.

497
00:28:51,100 --> 00:28:52,250
Is that right?

498
00:28:52,250 --> 00:28:53,860
Yeah.

499
00:28:53,860 --> 00:28:55,360
So if you want to
use that, you can.
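[Editor's sketch] For reference, Python's built-in `divmod(k, n)` returns the pair `(k // n, k % n)`, so the a, b pairs for the five numbers on the board can be reproduced like this. The variable names are only illustrative.

```python
n = 5
keys = [17, 3, 24, 22, 12]

# divmod(k, n) gives (a, b) with a = k // n and b = k % n
pairs = [divmod(k, n) for k in keys]

# Each key can be recovered as a * n + b
recovered = [a * n + b for a, b in pairs]
```

Here `pairs` comes out as (3, 2), (0, 3), (4, 4), (4, 2), (2, 2), matching the board.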

500
00:28:58,180 --> 00:29:00,850
OK, so how do we
sort these tuples?

501
00:29:00,850 --> 00:29:02,800
These are tuples, right?

502
00:29:02,800 --> 00:29:06,250
You guys are, I'm sure, very
familiar with tuples by now.

503
00:29:08,860 --> 00:29:09,985
How do I sort these tuples?

504
00:29:14,680 --> 00:29:18,220
What's the most important
digit of this thing?

505
00:29:18,220 --> 00:29:21,120
If I had to sort
one of the digits

506
00:29:21,120 --> 00:29:24,250
and get something
that's close to sorted,

507
00:29:24,250 --> 00:29:28,800
what's more important-- the
1's digit or the n's digit?

508
00:29:28,800 --> 00:29:31,170
OK, we have discrepancy here.

509
00:29:31,170 --> 00:29:33,700
Who says 1?

510
00:29:33,700 --> 00:29:35,380
Who says n?

511
00:29:35,380 --> 00:29:37,060
Someone who said n tell me why.

512
00:29:41,620 --> 00:29:45,742
Oh, you all think that
way for no reason.

513
00:29:45,742 --> 00:29:47,560
AUDIENCE: [INAUDIBLE]

514
00:29:47,560 --> 00:29:48,550
JASON KU: Yeah.

515
00:29:48,550 --> 00:29:49,180
Sorry.

516
00:29:49,180 --> 00:29:50,600
This is a little confusing.

517
00:29:50,600 --> 00:29:51,580
This is the 1's digit.

518
00:29:51,580 --> 00:29:53,110
This is the n's digit.

519
00:29:53,110 --> 00:29:54,250
This is the n's digit.

520
00:29:54,250 --> 00:29:56,770
This is the 1's digit
in how I'm writing this.

521
00:29:56,770 --> 00:29:58,710
Does that make sense?

522
00:29:58,710 --> 00:29:59,210
Yeah?

523
00:29:59,210 --> 00:30:03,701
AUDIENCE: [INAUDIBLE] have
a different ones digit

524
00:30:03,701 --> 00:30:04,700
inside of it.

525
00:30:04,700 --> 00:30:07,856
So you could have
[INAUDIBLE] but

526
00:30:07,856 --> 00:30:10,211
that only tells you where
they are with regard

527
00:30:10,211 --> 00:30:11,794
to the specific n
category they're in.

528
00:30:11,794 --> 00:30:13,100
So it's more of a [INAUDIBLE].

529
00:30:13,100 --> 00:30:13,725
JASON KU: Yeah.

530
00:30:13,725 --> 00:30:18,010
So what your colleague is
saying is exactly correct.

531
00:30:18,010 --> 00:30:23,760
I could vary b all I want,
right, with the same a.

532
00:30:23,760 --> 00:30:26,730
If I change a by 1,
it doesn't matter what

533
00:30:26,730 --> 00:30:28,110
b is-- it's going to be bigger.

534
00:30:31,840 --> 00:30:33,910
Does that make sense?

535
00:30:33,910 --> 00:30:36,460
The K is much more
sensitive to a

536
00:30:36,460 --> 00:30:40,040
than it is to b, so a is
more important than b.

537
00:30:40,040 --> 00:30:42,080
Does that make sense?

538
00:30:42,080 --> 00:30:47,560
So if I just wanted to get
some linear time algorithm,

539
00:30:47,560 --> 00:30:50,230
I could just sort by
their bigger digits

540
00:30:50,230 --> 00:30:55,630
and hope they don't differ very
much on the smaller things.

541
00:30:55,630 --> 00:30:57,800
I've kind of sorted
these things.

542
00:30:57,800 --> 00:30:59,480
Does that make sense?

543
00:30:59,480 --> 00:30:59,980
OK.

544
00:30:59,980 --> 00:31:01,900
What if I actually want
to sort these things?

545
00:31:06,250 --> 00:31:06,970
Any hints?

546
00:31:10,150 --> 00:31:12,590
Yeah?

547
00:31:12,590 --> 00:31:14,330
I need to sort on
both, in some sense.

548
00:31:17,630 --> 00:31:19,130
What I'm going to
tell you right now

549
00:31:19,130 --> 00:31:23,690
is an algorithm that I
like to call tuple sort,

550
00:31:23,690 --> 00:31:28,280
but you can also think of it
as Excel spreadsheet sort.

551
00:31:28,280 --> 00:31:31,220
I have an Excel spreadsheet
of a bunch of data.

552
00:31:31,220 --> 00:31:34,665
I have a prioritization on how
important the keys are to me--

553
00:31:34,665 --> 00:31:35,165
the columns.

554
00:31:37,720 --> 00:31:42,730
And if I have a very
important column and an order

555
00:31:42,730 --> 00:31:46,100
of the columns of how
important they are to me,

556
00:31:46,100 --> 00:31:49,730
I can repeatedly
sort on the columns

557
00:31:49,730 --> 00:31:54,010
until they're sorted
based on my preference.

558
00:31:54,010 --> 00:31:57,080
That's something that
you may have done.

559
00:31:57,080 --> 00:32:00,350
Now, if I have an ordering on
the preferences of my columns,

560
00:32:00,350 --> 00:32:04,430
do I start by
sorting all of them

561
00:32:04,430 --> 00:32:07,980
on the most important thing
or the least important thing?

562
00:32:07,980 --> 00:32:10,320
What?

563
00:32:10,320 --> 00:32:12,590
Who says most?

564
00:32:12,590 --> 00:32:13,550
Who says least?

565
00:32:16,090 --> 00:32:18,970
There's discrepancy here.

566
00:32:18,970 --> 00:32:22,520
All right, let's try it out.

567
00:32:22,520 --> 00:32:26,060
All right, tuple sort--

568
00:32:26,060 --> 00:32:30,950
let's start by sorting
these things by least

569
00:32:30,950 --> 00:32:33,650
significant first, and then--

570
00:32:33,650 --> 00:32:36,140
no, most significant first
and then least significant.

571
00:32:36,140 --> 00:32:38,570
That was the first thing
I asked you, right?

572
00:32:38,570 --> 00:32:41,940
All right, so these are the
most significant things,

573
00:32:41,940 --> 00:32:42,620
the first ones.

574
00:32:42,620 --> 00:32:46,050
And these are the less
significant things.

575
00:32:46,050 --> 00:32:48,930
All right, instead of
writing it as tuples,

576
00:32:48,930 --> 00:32:55,110
I'm going to write them
as 32, 03, 44, 42, 22.

577
00:32:55,110 --> 00:32:56,340
Is everyone cool with that?

578
00:32:56,340 --> 00:33:00,440
This is just base
five representation.

579
00:33:00,440 --> 00:33:04,610
All right, so let's start by
sorting all of these things

580
00:33:04,610 --> 00:33:08,180
by the most significant
thing, which

581
00:33:08,180 --> 00:33:12,620
is by this guy, this guy, this
guy, this guy, and this guy.

582
00:33:12,620 --> 00:33:13,470
So how do I do it?

583
00:33:13,470 --> 00:33:22,530
The first one is 03, second one
is 22, the next one is 32, 42,

584
00:33:22,530 --> 00:33:23,295
and then 44--

585
00:33:26,000 --> 00:33:27,800
maybe 44?

586
00:33:27,800 --> 00:33:29,090
I don't know.

587
00:33:29,090 --> 00:33:33,208
Does it matter, the order
in which I put these things?

588
00:33:33,208 --> 00:33:33,750
I don't know.

589
00:33:33,750 --> 00:33:36,090
I'm just going to keep it
the same order for now.

590
00:33:36,090 --> 00:33:38,670
All right, so I've sorted it
by the least significant--

591
00:33:38,670 --> 00:33:41,100
or the most
significant-- sorry--

592
00:33:41,100 --> 00:33:43,350
the leading term.

593
00:33:43,350 --> 00:33:46,720
And now I'm going to sort
by the least significant.

594
00:33:46,720 --> 00:33:49,620
So what's the least
significant here?

595
00:33:49,620 --> 00:33:53,950
22-- then 2 is also--

596
00:33:53,950 --> 00:33:55,180
this is also 2.

597
00:33:55,180 --> 00:33:57,100
This is also 2.

598
00:33:57,100 --> 00:33:59,440
This is 3.

599
00:33:59,440 --> 00:34:01,630
And sorted list-- voila.

600
00:34:04,640 --> 00:34:05,690
Why did that not work?

601
00:34:09,100 --> 00:34:10,471
Yeah?

602
00:34:10,471 --> 00:34:17,034
AUDIENCE: [INAUDIBLE]

603
00:34:17,034 --> 00:34:17,659
JASON KU: Yeah.

604
00:34:17,659 --> 00:34:20,480
So what happened is I
did take into account

605
00:34:20,480 --> 00:34:23,179
the most significant
digit sort, but when

606
00:34:23,179 --> 00:34:27,620
I did the less significant
thing, it erased all of my work

607
00:34:27,620 --> 00:34:28,739
from up here.

608
00:34:28,739 --> 00:34:31,080
Does that make sense?

609
00:34:31,080 --> 00:34:36,780
In the case of ties, we want
the more significant thing

610
00:34:36,780 --> 00:34:39,675
to take precedence, so we
want to do that thing last.

611
00:34:39,675 --> 00:34:41,230
Does that make sense?

612
00:34:41,230 --> 00:34:44,560
So the right way to do this--

613
00:34:44,560 --> 00:34:54,415
this is the most significant
first [INAUDIBLE] not good.

614
00:34:54,415 --> 00:34:56,040
All right, least
significant first--

615
00:34:56,040 --> 00:34:59,010
let's try that.

616
00:34:59,010 --> 00:35:02,370
So least significant here is 2.

617
00:35:02,370 --> 00:35:16,520
OK, so I see 32, 42,
22, 03, and then 44.

618
00:35:16,520 --> 00:35:18,740
OK?

619
00:35:18,740 --> 00:35:20,160
Sound good?

620
00:35:20,160 --> 00:35:24,013
Least significant first--
now I do most significant.

621
00:35:24,013 --> 00:35:25,430
I sort the most
significant thing.

622
00:35:25,430 --> 00:35:27,590
OK, so what's the most
significant thing?

623
00:35:27,590 --> 00:35:37,610
03, 22, 32-- most
significant four--

624
00:35:37,610 --> 00:35:41,310
44, and 42-- cool.

625
00:35:41,310 --> 00:35:44,260
We're sorted, right?

626
00:35:44,260 --> 00:35:45,790
I did what you told me to do.

627
00:35:45,790 --> 00:35:50,750
I sorted by the most
significant thing.

628
00:35:50,750 --> 00:35:51,950
What's the problem here?

629
00:35:58,920 --> 00:36:00,300
What did I do wrong?

630
00:36:00,300 --> 00:36:05,830
You wanted me to put 42
here and 44 here, right?

631
00:36:05,830 --> 00:36:10,120
Because 42 came first
in the input and 44

632
00:36:10,120 --> 00:36:11,010
came second, right?

633
00:36:14,070 --> 00:36:18,510
OK, if a sorting algorithm
maintains this property that,

634
00:36:18,510 --> 00:36:25,050
if they are the same
thing, then the output

635
00:36:25,050 --> 00:36:29,400
maintains their order from
the input to the output--

636
00:36:29,400 --> 00:36:33,260
their relative order-- that's
what we call a stable sorting

637
00:36:33,260 --> 00:36:34,190
algorithm.

638
00:36:34,190 --> 00:36:36,650
And so if we have a stable
sorting algorithm when

639
00:36:36,650 --> 00:36:40,850
we're doing tuple sort, when
we're sorting on different keys

640
00:36:40,850 --> 00:36:45,110
or columns of a
set, we really want

641
00:36:45,110 --> 00:36:48,110
to be using a stable
sorting algorithm.
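[Editor's sketch] The two orderings tried on the board can be replayed with Python's built-in `sorted`, which is documented to be stable. This is only an illustration of tuple sort using a comparison sort as the stable helper, not the linear-time method the lecture is building toward.

```python
pairs = [(3, 2), (0, 3), (4, 4), (4, 2), (2, 2)]   # 17, 3, 24, 22, 12 in base 5

# Least significant digit first, then most significant: ends up sorted.
by_b = sorted(pairs, key=lambda p: p[1])       # stable pass on b
lsd_first = sorted(by_b, key=lambda p: p[0])   # stable pass on a keeps ties in b order

# Most significant digit first, then least significant: the second pass
# reorders everything by b and throws away the work done on a.
by_a = sorted(pairs, key=lambda p: p[0])
msd_first = sorted(by_a, key=lambda p: p[1])
```

With a stable helper, `lsd_first` is (0, 3), (2, 2), (3, 2), (4, 2), (4, 4), while `msd_first` is ordered only by the less significant digit.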

642
00:36:48,110 --> 00:36:50,250
Does that make sense?

643
00:36:50,250 --> 00:36:52,850
Because otherwise,
we may mess up

644
00:36:52,850 --> 00:36:56,330
work we did before
in a previous sort

645
00:36:56,330 --> 00:36:58,500
of the less significant things.

646
00:36:58,500 --> 00:37:04,610
And so yes, we want a stable
sorting algorithm here,

647
00:37:04,610 --> 00:37:08,690
because then we will end
up sorting our thing.

648
00:37:08,690 --> 00:37:10,190
Does that make sense?

649
00:37:10,190 --> 00:37:11,058
Yes?

650
00:37:11,058 --> 00:37:17,760
AUDIENCE: [INAUDIBLE]

651
00:37:17,760 --> 00:37:21,810
JASON KU: So what your
colleague is saying--

652
00:37:21,810 --> 00:37:26,220
let's sort by most significant,
then look at all of the things

653
00:37:26,220 --> 00:37:32,530
with one of those that are
the same, and now sort that.

654
00:37:32,530 --> 00:37:34,750
That's something we could do.

655
00:37:34,750 --> 00:37:36,140
How long would that take?

656
00:37:36,140 --> 00:37:43,220
Well, let's say I didn't use
half of my more significant set

657
00:37:43,220 --> 00:37:44,810
of digits.

658
00:37:44,810 --> 00:37:49,960
Say I'm only using n/2 or--

659
00:37:49,960 --> 00:37:51,750
that's not quite going
to get what I want.

660
00:37:54,672 --> 00:37:58,517
AUDIENCE: [INAUDIBLE]

661
00:37:58,517 --> 00:37:59,350
JASON KU: Say again.

662
00:37:59,350 --> 00:38:01,875
AUDIENCE: We'll take
n squared [INAUDIBLE].

663
00:38:01,875 --> 00:38:02,500
JASON KU: Yeah.

664
00:38:02,500 --> 00:38:06,760
So what we're going to do, if we
have direct access array sort--

665
00:38:06,760 --> 00:38:08,920
if I then go into each
one of these digits

666
00:38:08,920 --> 00:38:11,410
and try to sort the
things that are in there,

667
00:38:11,410 --> 00:38:13,210
that's going to take time.

668
00:38:13,210 --> 00:38:16,810
It's going to take time
for each of those digits.

669
00:38:16,810 --> 00:38:20,440
There might be a
ton of collisions

670
00:38:20,440 --> 00:38:25,030
into one of the things, and
so I might take more time

671
00:38:25,030 --> 00:38:26,860
to sort that than linear.

672
00:38:26,860 --> 00:38:28,250
Does that make sense?

673
00:38:28,250 --> 00:38:31,840
So I would prefer to do this
tuple sort kind of behavior,

674
00:38:31,840 --> 00:38:34,990
sorting the smaller thing,
sorting the bigger thing.

675
00:38:34,990 --> 00:38:37,390
And because I only have a
constant number of things

676
00:38:37,390 --> 00:38:40,540
in my tuples, this is
important, because I only

677
00:38:40,540 --> 00:38:43,150
have two things I'm
worried about here.

678
00:38:43,150 --> 00:38:48,840
I only have to do two passes
of a sorting algorithm

679
00:38:48,840 --> 00:38:51,180
to be able to sort
these numbers.

680
00:38:51,180 --> 00:38:56,180
However, can I use direct
access array sort here?

681
00:38:56,180 --> 00:39:00,320
What was the initial stipulation
I had on direct access array?

682
00:39:00,320 --> 00:39:03,048
That the keys were unique--

683
00:39:03,048 --> 00:39:05,090
that's exactly the opposite
of what we have here.

684
00:39:05,090 --> 00:39:06,890
We have things that
could be the same.

685
00:39:10,970 --> 00:39:12,320
So we give up--

686
00:39:12,320 --> 00:39:12,850
can't do it.

687
00:39:16,093 --> 00:39:17,010
What do we do instead?

688
00:39:20,790 --> 00:39:21,290
Yeah?

689
00:39:21,290 --> 00:39:29,030
AUDIENCE: [INAUDIBLE]

690
00:39:29,030 --> 00:39:31,590
JASON KU: You've already said
the thing that I'm looking for,

691
00:39:31,590 --> 00:39:32,750
so that's great.

692
00:39:32,750 --> 00:39:35,180
Your colleague
said, why can't we

693
00:39:35,180 --> 00:39:39,140
just put more things at a key?

694
00:39:39,140 --> 00:39:41,530
Why can't we put a list there?

695
00:39:41,530 --> 00:39:42,662
That's exactly what we do.

696
00:39:42,662 --> 00:39:43,870
This is called counting sort.

697
00:39:48,340 --> 00:39:50,980
And what we do here is we
still have this direct access

698
00:39:50,980 --> 00:39:56,140
array of space u--
0 to u minus 1,

699
00:39:56,140 --> 00:40:03,390
but instead of storing one
thing here at each key K,

700
00:40:03,390 --> 00:40:07,830
we store a pointer to a chain.

701
00:40:07,830 --> 00:40:10,350
This sounds like hashing, right?

702
00:40:10,350 --> 00:40:13,230
But the important
thing is that I

703
00:40:13,230 --> 00:40:16,290
need to make sure, as
I'm inserting things

704
00:40:16,290 --> 00:40:19,170
in here, that I'm maintaining
the order in which they

705
00:40:19,170 --> 00:40:20,330
came in.

706
00:40:20,330 --> 00:40:21,930
I can't just throw
them willy nilly,

707
00:40:21,930 --> 00:40:25,020
or else we have this problem
up here that we had before.

708
00:40:25,020 --> 00:40:28,350
So I need what I would
say is a sequence data

709
00:40:28,350 --> 00:40:32,310
structure, something that will
maintain the order that I--

710
00:40:32,310 --> 00:40:34,140
the extrinsic order
that I had when

711
00:40:34,140 --> 00:40:37,360
I'm putting these things in.

712
00:40:37,360 --> 00:40:43,920
So as I have multiple
things with K,

713
00:40:43,920 --> 00:40:46,770
I'm going to put
them in the order.

714
00:40:46,770 --> 00:40:50,160
I can put-- have a pointer to a
dynamic array or a linked list,

715
00:40:50,160 --> 00:40:53,418
where I just add
things to the end.

716
00:40:53,418 --> 00:40:54,960
And then, at the
end of my algorithm,

717
00:40:54,960 --> 00:40:58,890
when I read off the
things, I can just

718
00:40:58,890 --> 00:41:02,460
look at any one that
has a non-empty data

719
00:41:02,460 --> 00:41:04,980
structure under here and
read them off in the order

720
00:41:04,980 --> 00:41:06,770
that they came.
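[Editor's sketch] A minimal Python version of the counting sort just described, assuming every key is an integer in the range 0 to u minus 1. Each slot of the direct access array holds a Python list (a dynamic array) that is only ever appended to, which is what keeps the sort stable. The `key` parameter and the function name are illustrative choices, not something from the lecture.

```python
def counting_sort(items, u, key=lambda x: x):
    """Stable sort of items whose key(x) lies in range(u), in O(n + u) time."""
    chains = [[] for _ in range(u)]    # one empty chain per possible key: O(u)
    for x in items:                    # append preserves arrival order: O(n)
        chains[key(x)].append(x)
    # Read chains off in key order, each chain in insertion order: O(n + u)
    return [x for chain in chains for x in chain]


# Sort base-5 digit pairs by their most significant digit only.
pairs = [(3, 2), (4, 2), (2, 2), (0, 3), (4, 4)]
by_a = counting_sort(pairs, 5, key=lambda p: p[0])
```

Because (4, 2) arrives before (4, 4), it also leaves first, so the ties on the digit 4 keep their input order.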

721
00:41:06,770 --> 00:41:09,500
Does that make sense?

722
00:41:09,500 --> 00:41:14,200
So for this example,
I'm just going

723
00:41:14,200 --> 00:41:20,230
to do this last step
here from the first row

724
00:41:20,230 --> 00:41:22,280
to the second row.

725
00:41:22,280 --> 00:41:28,030
I'm going to have this direct
access array with 0, 1, 2, 3, 4

726
00:41:28,030 --> 00:41:28,705
on the slots.

727
00:41:32,110 --> 00:41:35,930
So how am I going to do
this counting sort now?

728
00:41:35,930 --> 00:41:41,540
I have 32, 42, 22, 03, and 44.

729
00:41:41,540 --> 00:41:43,460
I can take the first one, 32.

730
00:41:43,460 --> 00:41:46,100
I'm sorting by the
most significant thing.

731
00:41:46,100 --> 00:41:48,890
I stick it here--

732
00:41:48,890 --> 00:41:53,484
32, and then 44--

733
00:41:53,484 --> 00:41:59,360
42-- sorry-- 42, 22.

734
00:41:59,360 --> 00:42:06,800
This is not so much different
yet than dynamic array--

735
00:42:06,800 --> 00:42:08,120
direct access array sort.

736
00:42:08,120 --> 00:42:14,700
But when we get
to this duplicate,

737
00:42:14,700 --> 00:42:19,430
44 here, we now have two
things in this thing.

738
00:42:19,430 --> 00:42:24,230
And because we are keeping
them in order in this sequence,

739
00:42:24,230 --> 00:42:26,060
I'm appending to the end.

740
00:42:26,060 --> 00:42:31,700
Then, when I go and read
off the different things,

741
00:42:31,700 --> 00:42:35,000
then I'm returning them
in a stable way in the way

742
00:42:35,000 --> 00:42:36,290
that I want them to be.

743
00:42:36,290 --> 00:42:38,620
Does that make sense?

744
00:42:38,620 --> 00:42:40,380
And it's not
overwriting the work

745
00:42:40,380 --> 00:42:42,300
I did on the lower
significant digits.

746
00:42:44,900 --> 00:42:46,160
So how long does this take?

747
00:42:49,660 --> 00:42:56,290
This also only takes
order n plus u,

748
00:42:56,290 --> 00:43:00,450
because I'm instantiating
this thing of size u.

749
00:43:00,450 --> 00:43:02,730
And then, how big are
these data structures?

750
00:43:02,730 --> 00:43:07,630
Well, maybe I'm storing one, a
constant amount for each index.

751
00:43:07,630 --> 00:43:09,720
So that's a u overhead.

752
00:43:09,720 --> 00:43:14,220
And then I'm paying 1 for
every item I'm storing.

753
00:43:14,220 --> 00:43:18,030
These things are
only the lengths.

754
00:43:18,030 --> 00:43:21,360
The sum total of
their lengths is n,

755
00:43:21,360 --> 00:43:24,990
because I'm only storing
n things in there.

756
00:43:24,990 --> 00:43:29,870
So the total amount of
space, the total amount

757
00:43:29,870 --> 00:43:32,480
of work I have to do is order--

758
00:43:32,480 --> 00:43:36,200
I need to be able to
append in constant time

759
00:43:36,200 --> 00:43:38,390
and I need to be able to
cycle through these things,

760
00:43:38,390 --> 00:43:40,550
iterate over them
in linear time.

761
00:43:40,550 --> 00:43:42,830
But if I have that,
I get n plus u.

762
00:43:42,830 --> 00:43:44,110
Yeah?

763
00:43:44,110 --> 00:43:45,800
AUDIENCE: How do
you ensure that,

764
00:43:45,800 --> 00:43:49,167
within your linked list or
your dynamic-- those elements,

765
00:43:49,167 --> 00:43:52,534
like four equals four--
how do you make sure

766
00:43:52,534 --> 00:43:55,460
that those are sorted?

767
00:43:55,460 --> 00:43:58,790
JASON KU: So your
colleague is saying,

768
00:43:58,790 --> 00:44:01,280
how do I ensure that the
things in these lists,

769
00:44:01,280 --> 00:44:03,920
where they collide, how do you
ensure that they're sorted?

770
00:44:03,920 --> 00:44:05,270
I don't.

771
00:44:05,270 --> 00:44:09,740
I just ensure that they came
in the order that they came.

772
00:44:09,740 --> 00:44:14,700
But as long as I sorted the
lower order digits correctly

773
00:44:14,700 --> 00:44:19,140
in the previous things,
then I'm assuming

774
00:44:19,140 --> 00:44:20,880
that their order as
they come in will

775
00:44:20,880 --> 00:44:22,470
be sorted, if they collide.

776
00:44:22,470 --> 00:44:23,910
That's the assumption.

777
00:44:23,910 --> 00:44:27,997
That's the reason why I'm
doing this building up

778
00:44:27,997 --> 00:44:29,580
from the least
significant to the most

779
00:44:29,580 --> 00:44:33,690
significant is so that I
know that, when they collide,

780
00:44:33,690 --> 00:44:37,675
the underlying stuff there is
sorted already in the input.

781
00:44:37,675 --> 00:44:38,550
Does that make sense?

782
00:44:38,550 --> 00:44:39,422
Great-- yeah?

783
00:44:39,422 --> 00:44:45,246
AUDIENCE: So this array
isn't as big as u.

784
00:44:45,246 --> 00:44:48,388
It's as big as n.

785
00:44:48,388 --> 00:44:50,680
JASON KU: I'm using a direct
access array on the keys--

786
00:44:50,680 --> 00:44:51,460
oh, this is n.

787
00:44:56,080 --> 00:44:59,140
So counting sort is
general for any u.

788
00:44:59,140 --> 00:45:04,360
I just happened to pick u being
n in this case when I broke

789
00:45:04,360 --> 00:45:06,010
this thing up into n squared.

790
00:45:06,010 --> 00:45:07,870
But this general concept is--

791
00:45:10,440 --> 00:45:12,160
doesn't matter what
I choose for u.

792
00:45:12,160 --> 00:45:15,100
Does that make sense?

793
00:45:15,100 --> 00:45:15,600
OK.

794
00:45:15,600 --> 00:45:18,870
But we will use that
right now to sort

795
00:45:18,870 --> 00:45:20,150
larger ranges of numbers.

796
00:45:25,830 --> 00:45:27,120
This was exactly the idea.

797
00:45:27,120 --> 00:45:30,150
We're going to combine
tuple sort, use counting

798
00:45:30,150 --> 00:45:32,910
sort as its auxiliary sorting--

799
00:45:32,910 --> 00:45:38,040
stable sorting algorithm to do
all its work on these digits.

800
00:45:38,040 --> 00:45:45,090
And so to sort n
squared size numbers,

801
00:45:45,090 --> 00:45:47,340
I get linear time,
which is great,

802
00:45:47,340 --> 00:45:49,620
because u is n in this case.

803
00:45:54,890 --> 00:45:56,120
But can I extend that?

804
00:45:56,120 --> 00:45:59,210
What if I had n cubed?

805
00:45:59,210 --> 00:46:04,700
What if I had up to size u
equals n cubed, or less than n

806
00:46:04,700 --> 00:46:05,882
cubed?

807
00:46:05,882 --> 00:46:07,340
How many digits
would I have there?

808
00:46:12,290 --> 00:46:16,510
How many size n digits
would I need to represent

809
00:46:16,510 --> 00:46:18,400
a number of size n cubed?

810
00:46:21,710 --> 00:46:22,400
Any ideas?

811
00:46:24,920 --> 00:46:26,270
What did we do here?

812
00:46:26,270 --> 00:46:28,580
We divided off an n.

813
00:46:28,580 --> 00:46:30,050
We took it and stored it.

814
00:46:30,050 --> 00:46:32,750
We're left with
something of size n.

815
00:46:32,750 --> 00:46:36,680
If I had a number of size n
cubed, I could divide off an n.

816
00:46:36,680 --> 00:46:38,540
I'm left with
something of n squared.

817
00:46:38,540 --> 00:46:40,910
I don't know how to deal
with something of n squared.

818
00:46:40,910 --> 00:46:42,530
Actually, I do.

819
00:46:42,530 --> 00:46:46,130
I can split it up into
two size n numbers.

820
00:46:46,130 --> 00:46:52,210
So if I had numbers bound--
upper bounded by a cubic--

821
00:46:52,210 --> 00:46:56,080
n cubed-- I could split
it up into three digits.

822
00:46:56,080 --> 00:46:57,850
Three is still constant.

823
00:46:57,850 --> 00:47:01,030
And so I could split it
up into three digits,

824
00:47:01,030 --> 00:47:06,310
tuple sort them in their
increasing priority,

825
00:47:06,310 --> 00:47:07,300
and sort those.

826
00:47:07,300 --> 00:47:10,660
Again, I'm doing
linear work per digit.

827
00:47:10,660 --> 00:47:12,200
I have a constant
number of digits,

828
00:47:12,200 --> 00:47:14,500
so I get a linear
time algorithm.

829
00:47:14,500 --> 00:47:15,293
Yeah?

830
00:47:15,293 --> 00:47:18,040
AUDIENCE: When it comes
to sorting [INAUDIBLE]--

831
00:47:18,040 --> 00:47:18,960
JASON KU: Uh-huh.

832
00:47:18,960 --> 00:47:21,755
AUDIENCE: Are you ensuring
that that runtime is also

833
00:47:21,755 --> 00:47:23,705
big O of n plus u?

834
00:47:23,705 --> 00:47:24,330
JASON KU: Yeah.

835
00:47:24,330 --> 00:47:28,080
So it's always going to
be big O of n plus u,

836
00:47:28,080 --> 00:47:33,300
but because I'm bounding
my digit size to be n,

837
00:47:33,300 --> 00:47:36,100
u is n there, and so
I'm getting linear time.

838
00:47:36,100 --> 00:47:37,180
Does that make sense?

839
00:47:37,180 --> 00:47:37,680
Yeah.

840
00:47:37,680 --> 00:47:39,580
So the idea here--

841
00:47:39,580 --> 00:47:42,090
this is what we call radix sort.

842
00:47:42,090 --> 00:48:01,180
Radix sort-- break up
integers, max size u,

843
00:48:01,180 --> 00:48:10,645
into a base-n tuple.

844
00:48:14,470 --> 00:48:17,650
So basically, each one of my
digits can range from 0 to n.

845
00:48:21,790 --> 00:48:24,760
How many base n digits do I have if
I have a number of size u?

846
00:48:28,590 --> 00:48:30,690
Yeah, log n of u--

847
00:48:30,690 --> 00:48:36,900
number of digits is log n of u--

848
00:48:36,900 --> 00:48:38,130
log base n of u.

849
00:48:41,930 --> 00:48:56,060
And then tuple sort on
digits using counting sort,

850
00:48:56,060 --> 00:49:05,190
from least to most significant--

851
00:49:05,190 --> 00:49:06,780
that's the algorithm.
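[Editor's sketch] Putting the pieces together, radix sort might be sketched in Python as below. It assumes non-negative integer keys, uses base n (the number of keys), and pulls each digit out on the fly with integer division and mod rather than building the tuples up front; the function name and those details are illustrative choices rather than the lecture's own code.

```python
def radix_sort(keys):
    """Sort non-negative integers by stable counting-sort passes on base-n digits."""
    n = len(keys)
    if n < 2:
        return list(keys)
    u = max(keys) + 1                  # keys lie in range(u)
    num_digits = 1                     # number of base-n digits needed, about log_n(u)
    while n ** num_digits < u:
        num_digits += 1

    out = list(keys)
    power = 1                          # n ** d for the current digit position d
    for _ in range(num_digits):        # least significant digit first
        chains = [[] for _ in range(n)]
        for k in out:                  # stable counting-sort pass on one digit
            chains[(k // power) % n].append(k)
        out = [k for chain in chains for k in chain]
        power *= n
    return out


result = radix_sort([17, 3, 24, 22, 12])
```

Each pass costs O(n) because the digits range over n values, and there are about log base n of u passes.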

852
00:49:06,780 --> 00:49:10,200
How long does that take?

853
00:49:10,200 --> 00:49:13,980
How long does it take
to sort on a digit that

854
00:49:13,980 --> 00:49:17,550
spans the key 0 to n?

855
00:49:17,550 --> 00:49:18,690
Linear time, right?

856
00:49:18,690 --> 00:49:23,790
Order n time-- how many times
do I have to do this tuple sort?

857
00:49:23,790 --> 00:49:27,060
The number of
digits times, right?

858
00:49:27,060 --> 00:49:29,370
So the running time
of this algorithm--

859
00:49:29,370 --> 00:49:35,230
first, I have to do this stuff,
break up each of the integers.

860
00:49:35,230 --> 00:49:37,420
That takes n time--

861
00:49:37,420 --> 00:49:38,680
n times the number of digits.

862
00:49:38,680 --> 00:49:42,100
I had to create each
one of these tuples--

863
00:49:42,100 --> 00:49:47,850
so n plus n times the
number of digits--

864
00:49:47,850 --> 00:49:54,900
log base n of u.

865
00:49:54,900 --> 00:49:58,320
So here I had to loop
through all the things.

866
00:49:58,320 --> 00:50:01,170
And then here, for each
thing, I broke it up

867
00:50:01,170 --> 00:50:09,500
into log base n of
u digits, and that's

868
00:50:09,500 --> 00:50:11,480
how long the first thing took.

869
00:50:11,480 --> 00:50:15,250
And then, how long did
it take me to tuple sort?

870
00:50:15,250 --> 00:50:17,590
n time per digit--

871
00:50:17,590 --> 00:50:19,030
so I also get this factor.

872
00:50:19,030 --> 00:50:20,220
Does that make sense?

873
00:50:23,790 --> 00:50:24,630
How long is that?

874
00:50:24,630 --> 00:50:25,240
Is that good?

875
00:50:25,240 --> 00:50:27,010
Is that bad?

876
00:50:27,010 --> 00:50:30,430
For what values of u
is this linear time?

877
00:50:35,500 --> 00:50:43,390
If u is less than n to
the c for some constant c,

878
00:50:43,390 --> 00:50:48,430
then the c comes out of the
logarithm, log n of n is 1,

879
00:50:48,430 --> 00:50:50,155
and we get a linear
time algorithm.
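[Editor's note] The running-time argument can be written out as follows, assuming u is at most n to the c for a constant c.

```latex
T(n) = O\!\left(n + n \log_n u\right), \qquad
u \le n^{c} \;\Longrightarrow\; \log_n u \le \log_n n^{c} = c \,\log_n n = c,
\qquad\text{so}\qquad T(n) = O(n + c\,n) = O(n).
```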

880
00:50:50,155 --> 00:50:52,130
Does that make sense?

881
00:50:52,130 --> 00:50:52,640
OK.

882
00:50:52,640 --> 00:50:56,060
So that's how we can
sort in linear time,

883
00:50:56,060 --> 00:51:00,920
if our things are only
polynomially large.

884
00:51:00,920 --> 00:51:04,100
So in counting sort,
we get n plus u.

885
00:51:04,100 --> 00:51:07,910
In radix sort, we get also
a stable sorting algorithm

886
00:51:07,910 --> 00:51:14,140
where the running time is n
plus n times log base n of u.

887
00:51:14,140 --> 00:51:16,700
Does that make sense?

888
00:51:16,700 --> 00:51:20,855
And then, in the
situations where--

889
00:51:20,855 --> 00:51:22,480
there's a typo there
in counting sort--

890
00:51:22,480 --> 00:51:25,270
that should be
when u is order n--

891
00:51:25,270 --> 00:51:27,490
counting sort runs
in linear time.

892
00:51:27,490 --> 00:51:31,510
And it's linear time also
in the case of radix sort,

893
00:51:31,510 --> 00:51:35,380
if our things are bounded
by a polynomial in n,

894
00:51:35,380 --> 00:51:38,770
by n to the c for
some constant c.

895
00:51:38,770 --> 00:51:41,180
Does that make sense?

896
00:51:41,180 --> 00:51:45,350
All right, so that's how
to sort in linear time,

897
00:51:45,350 --> 00:51:47,690
with the caveat that your
numbers aren't too big.

898
00:51:47,690 --> 00:51:51,700
OK, see you next week.