WEBVTT
00:00:08.000 --> 00:00:12.000
Today we're going to talk about
sorting, which may not come as
00:00:12.000 --> 00:00:15.000
such a big surprise.
We talked about sorting for a
00:00:15.000 --> 00:00:20.000
while, but we're going to talk
about it at a somewhat higher
00:00:20.000 --> 00:00:24.000
level and question some of the
assumptions that we've been
00:00:24.000 --> 00:00:27.000
making so far.
And we're going to ask the
00:00:27.000 --> 00:00:32.000
question how fast can we sort?
A pretty natural question.
00:00:32.000 --> 00:00:35.000
You may think you know the
answer.
00:00:35.000 --> 00:00:40.000
Perhaps you do.
Any suggestions on what the
00:00:40.000 --> 00:00:43.000
answer to this question might
be?
00:00:43.000 --> 00:00:46.000
There are several possible
answers.
00:00:46.000 --> 00:00:50.000
Many of them are partially
correct.
00:00:50.000 --> 00:00:56.000
Let's hear any kinds of answers
you'd like and start waking up
00:00:56.000 --> 00:01:00.000
this fresh morning.
Sorry?
00:01:00.000 --> 00:01:02.000
Theta n log n.
That's a good answer.
00:01:02.000 --> 00:01:06.000
That's often correct.
Any other suggestions?
00:01:06.000 --> 00:01:09.000
N squared.
That's correct if all you're
00:01:09.000 --> 00:01:12.000
allowed to do is swap adjacent
elements.
00:01:12.000 --> 00:01:13.000
Good.
That was close.
00:01:13.000 --> 00:01:17.000
I will see if I can make every
answer correct.
00:01:17.000 --> 00:01:20.000
Usually n squared is not the
right answer,
00:01:20.000 --> 00:01:22.000
but in some models it is.
Yeah?
00:01:22.000 --> 00:01:26.000
Theta n is also sometimes the
right answer.
00:01:26.000 --> 00:01:30.000
The real answer is "it
depends".
00:01:30.000 --> 00:01:33.000
That's the point of today's
lecture.
00:01:33.000 --> 00:01:37.000
It depends on what we call the
computational model,
00:01:37.000 --> 00:01:42.000
what you're allowed to do.
And, in particular here,
00:01:42.000 --> 00:01:46.000
with sorting,
what we care about is the order
00:01:46.000 --> 00:01:49.000
of the elements,
how are you allowed to
00:01:49.000 --> 00:01:54.000
manipulate the elements,
what are you allowed to do with
00:01:54.000 --> 00:02:00.000
them and find out their order.
The model is what you can do
00:02:00.000 --> 00:02:03.000
with the elements.
00:02:14.000 --> 00:02:18.000
Now, we've seen several sorting
algorithms.
00:02:18.000 --> 00:02:23.000
Do you want to shout some out?
I think we've seen four,
00:02:23.000 --> 00:02:27.000
but maybe you know even more
algorithms.
00:02:27.000 --> 00:02:30.000
Quicksort.
Keep going.
00:02:30.000 --> 00:02:32.000
Heapsort.
Merge sort.
00:02:32.000 --> 00:02:37.000
You can remember all the way
back to Lecture 1.
00:02:37.000 --> 00:02:39.000
Any others?
Insertion sort.
00:02:39.000 --> 00:02:43.000
All right.
You're on top of it today.
00:02:43.000 --> 00:02:49.000
I don't know exactly why,
but these two are single words
00:02:49.000 --> 00:02:54.000
and these two are two words.
That's the style.
00:02:54.000 --> 00:03:00.000
What is the running time of
quicksort?
00:03:00.000 --> 00:03:04.000
This is a bit tricky.
N log n in the average case.
00:03:04.000 --> 00:03:10.000
Or, if we randomize quicksort,
randomized quicksort runs in n
00:03:10.000 --> 00:03:14.000
log n expected for any input
sequence.
00:03:14.000 --> 00:03:18.000
Let's say n lg n randomized.
That's theta.
00:03:18.000 --> 00:03:24.000
And the worst-case with plain
old quicksort where you just
00:03:24.000 --> 00:03:30.000
pick the first element as the
partition element.
00:03:30.000 --> 00:03:34.000
That's n^2.
Heapsort, what's the running
00:03:34.000 --> 00:03:37.000
time there?
n lg n always.
00:03:37.000 --> 00:03:43.000
Merge sort, I hope you can
remember that as well,
00:03:43.000 --> 00:03:46.000
n lg n.
And insertion sort?
00:03:46.000 --> 00:03:50.000
n^2.
All of these algorithms run no
00:03:50.000 --> 00:03:54.000
faster than n lg n,
so we might ask,
00:03:54.000 --> 00:03:59.000
can we do better than n lg n?
00:04:11.000 --> 00:04:13.000
And that is a question,
in some sense,
00:04:13.000 --> 00:04:16.000
we will answer both yes and no
to today.
00:04:16.000 --> 00:04:20.000
But all of these algorithms
have something in common in
00:04:20.000 --> 00:04:25.000
terms of the model of what
you're allowed to do with the
00:04:25.000 --> 00:04:28.000
elements.
Any guesses on what that model
00:04:28.000 --> 00:04:30.000
might be?
Yeah?
00:04:30.000 --> 00:04:33.000
You compare pairs of elements,
exactly.
00:04:33.000 --> 00:04:39.000
That is indeed the model used
by all four of these algorithms.
00:04:39.000 --> 00:04:43.000
And in that model n lg n is the
best you can do.
00:04:43.000 --> 00:04:48.000
We have so far just looked at
what are called comparison
00:04:48.000 --> 00:04:52.000
sorting algorithms or
"comparison sorts".
00:04:52.000 --> 00:04:57.000
And this is a model for the
sorting problem of what you're
00:04:57.000 --> 00:05:02.000
allowed to do.
Here all you can do is use
00:05:02.000 --> 00:05:06.000
comparisons meaning less than,
greater than,
00:05:06.000 --> 00:05:11.000
less than or equal to,
greater than or equal to,
00:05:11.000 --> 00:05:17.000
equals, to determine the
relative order of elements.
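A minimal sketch of what this restriction looks like in code, assuming Python: the helper name sort_with_comparisons_only is made up for illustration and is not from the lecture. The elements are only ever touched through a pairwise comparison function, and every call to it is counted.

```python
from functools import cmp_to_key


def sort_with_comparisons_only(items):
    """Sort items while inspecting them only through pairwise comparisons.

    Returns the sorted list and the number of comparisons that were made.
    """
    count = 0

    def compare(x, y):
        # The only operation the sort may apply to the elements.
        nonlocal count
        count += 1
        return (x > y) - (x < y)  # negative, zero, or positive

    ordered = sorted(items, key=cmp_to_key(compare))
    return ordered, count


result, comparisons = sort_with_comparisons_only([9, 4, 6])
print(result, comparisons)
```

The exact count depends on the library's sorting routine, so treat the number as an observation about one run, not as a bound.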
00:05:25.000 --> 00:05:26.000
This is a restriction on
algorithms.
00:05:26.000 --> 00:05:29.000
It is, in some sense,
stating what kinds of elements
00:05:29.000 --> 00:05:32.000
we're dealing with.
They are elements that we can
00:05:32.000 --> 00:05:35.000
somehow compare.
They have a total order,
00:05:35.000 --> 00:05:37.000
some are less,
some are bigger.
00:05:37.000 --> 00:05:39.000
But it also restricts the
algorithm.
00:05:39.000 --> 00:05:42.000
You could say,
well, I'm sorting integers,
00:05:42.000 --> 00:05:45.000
but still I'm only allowed to
do comparisons with them.
00:05:45.000 --> 00:05:49.000
I'm not allowed to multiply the
integers or do other weird
00:05:49.000 --> 00:05:51.000
things.
That's the comparison sorting
00:05:51.000 --> 00:05:52.000
model.
And this lecture,
00:05:52.000 --> 00:05:55.000
in some sense,
follows the standard
00:05:55.000 --> 00:05:58.000
mathematical progression where
you have a theorem,
00:05:58.000 --> 00:06:01.000
then you have a proof,
then you have a counter
00:06:01.000 --> 00:06:05.000
example.
It's always a good way to have
00:06:05.000 --> 00:06:07.000
a math lecture.
We're going to prove the
00:06:07.000 --> 00:06:11.000
theorem that no comparison
sorting algorithm runs better
00:06:11.000 --> 00:06:13.000
than n lg n.
Comparisons.
00:06:13.000 --> 00:06:17.000
State the theorem,
prove that, and then we'll give
00:06:17.000 --> 00:06:21.000
a counter example in the sense
that if you go outside the
00:06:21.000 --> 00:06:25.000
comparison sorting model you can
do better, you can get linear
00:06:25.000 --> 00:06:28.000
time in some cases,
better than n lg n.
00:06:28.000 --> 00:06:32.000
So, that is what we're doing
today.
00:06:32.000 --> 00:06:36.000
But first we're going to stick
to this comparison model and try
00:06:36.000 --> 00:06:41.000
to understand why we need n lg n
comparisons if that's all we're
00:06:41.000 --> 00:06:45.000
allowed to do.
And for that we're going to
00:06:45.000 --> 00:06:48.000
look at something called
decision trees,
00:06:48.000 --> 00:06:52.000
which in some sense is another
model of what you're allowed to
00:06:52.000 --> 00:06:56.000
do in an algorithm,
but it's more general than the
00:06:56.000 --> 00:07:01.000
comparison model.
And let's try an example to
00:07:01.000 --> 00:07:06.000
get some intuition.
Suppose we want to sort three
00:07:06.000 --> 00:07:10.000
elements.
This is not very challenging,
00:07:10.000 --> 00:07:15.000
but we'll get to draw the
decision tree that corresponds
00:07:15.000 --> 00:07:22.000
to sorting three elements.
Here is one solution I claim.
00:07:42.000 --> 00:07:45.000
This is, in a certain sense,
an algorithm,
00:07:45.000 --> 00:07:50.000
but it's drawn as a tree
instead of pseudocode.
00:08:15.000 --> 00:08:18.000
What this tree means is that
at each node you're making a
00:08:18.000 --> 00:08:21.000
comparison.
This says compare a_1 versus
00:08:21.000 --> 00:08:24.000
a_2.
If a_1 is smaller than a_2 you
00:08:24.000 --> 00:08:27.000
go this way, if it is bigger
than a_2 you go this way,
00:08:27.000 --> 00:08:32.000
and then you proceed.
When you get down to a leaf,
00:08:32.000 --> 00:08:36.000
this is the answer.
Remember, the sorting problem
00:08:36.000 --> 00:08:41.000
is you're trying to find a
permutation of the inputs that
00:08:41.000 --> 00:08:45.000
puts it in sorted order.
Let's try it with some sequence
00:08:45.000 --> 00:08:48.000
of numbers, say 9,
4 and 6.
00:08:48.000 --> 00:08:51.000
We want to sort 9,
4 and 6, so first we compare
00:08:51.000 --> 00:08:55.000
the first element with the
second element.
00:08:55.000 --> 00:09:00.000
9 is bigger than 4 so we go
down this way.
00:09:00.000 --> 00:09:03.000
Then we compare the first
element with the third element,
00:09:03.000 --> 00:09:05.000
that's 9 versus 6.
9 is bigger than 6,
00:09:05.000 --> 00:09:08.000
so we go this way.
And then we compare the second
00:09:08.000 --> 00:09:11.000
element with the third element,
4 is less than 6,
00:09:11.000 --> 00:09:14.000
so we go this way.
And the claim is that this is
00:09:14.000 --> 00:09:16.000
the correct permutation of the
elements.
00:09:16.000 --> 00:09:19.000
You take a_2,
which is 4, then you take a_3,
00:09:19.000 --> 00:09:22.000
which is 6, and then you take
a_1, which is 9,
00:09:22.000 --> 00:09:25.000
so indeed that works out.
And if I wrote this down right,
00:09:25.000 --> 00:09:30.000
this is a sorting algorithm in
the decision tree model.
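The tree itself was drawn on the board and is not in the transcript, so the following is only a sketch of one three-element decision tree that is consistent with the walk-through above (compare 1:2, then 1:3, then 2:3 for the input 9, 4, 6). Each if is an internal node, each return is a leaf holding a permutation of the indices. The function name sort_three is an assumption made for illustration.

```python
def sort_three(a1, a2, a3):
    """Decision tree for n = 3, written as nested comparisons.

    Returns the 1-based indices of the inputs in sorted order.
    """
    if a1 <= a2:                 # node 1:2, left branch
        if a2 <= a3:             # node 2:3
            return (1, 2, 3)
        elif a1 <= a3:           # node 1:3
            return (1, 3, 2)
        else:
            return (3, 1, 2)
    else:                        # node 1:2, right branch
        if a1 <= a3:             # node 1:3
            return (2, 1, 3)
        elif a2 <= a3:           # node 2:3
            return (2, 3, 1)
        else:
            return (3, 2, 1)


print(sort_three(9, 4, 6))  # the lecture's example: a_2, a_3, a_1
```

Reading off the leaf for 9, 4, 6 gives a_2 = 4, a_3 = 6, a_1 = 9, which matches the path traced in the lecture.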
00:09:30.000 --> 00:09:36.000
In general, let me just say the
rules of this game.
00:09:36.000 --> 00:09:43.000
In general, we have n elements
we want to sort.
00:09:43.000 --> 00:09:52.000
And I only drew the n = 3 case
because these trees get very big
00:09:52.000 --> 00:09:56.000
very quickly.
Each internal node,
00:09:56.000 --> 00:10:03.000
so every non-leaf node,
has a label of the form i :
00:10:03.000 --> 00:10:10.000
j where i and j are between 1
and n.
00:10:15.000 --> 00:10:23.000
And this means that we compare
a_i with a_j.
00:10:29.000 --> 00:10:33.000
And we have two subtrees from
every such node.
00:10:33.000 --> 00:10:40.000
We have the left subtree which
tells you what the algorithm
00:10:40.000 --> 00:10:45.000
does, what subsequent
comparisons it makes if it comes
00:10:45.000 --> 00:10:48.000
out less than.
00:10:54.000 --> 00:10:57.000
And we have to be a little bit
careful because it could also
00:10:57.000 --> 00:10:59.000
come out equal.
What we will do is the left
00:10:59.000 --> 00:11:03.000
subtree corresponds to less than
or equal to and the right
00:11:03.000 --> 00:11:06.000
subtree corresponds to strictly
greater than.
00:11:17.000 --> 00:11:21.000
That is a little bit more
precise than what we were doing
00:11:21.000 --> 00:11:23.000
here.
Here all the elements were
00:11:23.000 --> 00:11:26.000
distinct so no problem.
But, in general,
00:11:26.000 --> 00:11:30.000
we care about the equality case
too to be general.
00:11:30.000 --> 00:11:32.000
So, that was the internal
nodes.
00:11:32.000 --> 00:11:36.000
And then each leaf node gives
you a permutation.
00:11:44.000 --> 00:11:47.000
So, in order to be the answer
to that sorting problem,
00:11:47.000 --> 00:11:52.000
that permutation better have
the property that it orders the
00:11:52.000 --> 00:11:54.000
elements.
This is from the first lecture
00:11:54.000 --> 00:11:58.000
when we defined the sorting
problem.
00:11:58.000 --> 00:12:05.000
Some permutation on n things
such that a_pi(1) is less than
00:12:05.000 --> 00:12:09.000
or equal to a_pi(2) and so on.
00:12:15.000 --> 00:12:18.000
So, that is the definition of a
decision tree.
00:12:18.000 --> 00:12:21.000
Any binary tree with these
kinds of labels satisfies all
00:12:21.000 --> 00:12:24.000
these properties.
That is, in some sense,
00:12:24.000 --> 00:12:28.000
a sorting algorithm.
It's a sorting algorithm in the
00:12:28.000 --> 00:12:31.000
decision tree model.
Now, as you might expect,
00:12:31.000 --> 00:12:35.000
this is really not too
different than the comparison
00:12:35.000 --> 00:12:37.000
model.
If I give you a comparison
00:12:37.000 --> 00:12:40.000
sorting algorithm,
we have these four,
00:12:40.000 --> 00:12:44.000
quicksort, heapsort,
merge sort and insertion sort.
00:12:44.000 --> 00:12:48.000
All of them can be translated
into the decision tree model.
00:12:48.000 --> 00:12:52.000
It's sort of a graphical
representation of what the
00:12:52.000 --> 00:12:55.000
algorithm does.
It's not a terribly useful one
00:12:55.000 --> 00:13:00.000
for writing down an algorithm.
Any guesses why?
00:13:00.000 --> 00:13:03.000
Why do we not draw these
pictures as a definition of
00:13:03.000 --> 00:13:06.000
quicksort or a definition of
merge sort?
00:13:06.000 --> 00:13:09.000
It depends on the size of the
input, that's a good point.
00:13:09.000 --> 00:13:13.000
This tree is specific to the
value of n, so it is,
00:13:13.000 --> 00:13:15.000
in some sense,
not as generic.
00:13:15.000 --> 00:13:19.000
Now, we could try to write down
a construction for an arbitrary
00:13:19.000 --> 00:13:22.000
value of n of one of these
decision trees and that would
00:13:22.000 --> 00:13:28.000
give us sort of a real algorithm
that works for any input size.
00:13:28.000 --> 00:13:31.000
But even then this is not a
terribly convenient
00:13:31.000 --> 00:13:34.000
representation for writing down
an algorithm.
00:13:34.000 --> 00:13:38.000
Well, let's write down a
transformation that converts a
00:13:38.000 --> 00:13:42.000
comparison sorting algorithm to
a decision tree and then maybe
00:13:42.000 --> 00:13:45.000
you will see why.
This is not a useless model,
00:13:45.000 --> 00:13:48.000
obviously, I wouldn't be
telling you otherwise.
00:13:48.000 --> 00:13:52.000
It will be very powerful for
proving that we cannot do better
00:13:52.000 --> 00:13:56.000
than n lg n, but as writing down
an algorithm,
00:13:56.000 --> 00:14:00.000
if you were going to implement
something, this tree is not so
00:14:00.000 --> 00:14:05.000
useful.
Even if you had a decision tree
00:14:05.000 --> 00:14:10.000
computer, whatever that is.
But let's prove this theorem
00:14:10.000 --> 00:14:14.000
that decision trees,
in some sense,
00:14:14.000 --> 00:14:19.000
model comparison sorting
algorithms, which we call just
00:14:19.000 --> 00:14:22.000
comparison sorts.
00:14:29.000 --> 00:14:33.000
This is a transformation.
And we're going to build one
00:14:33.000 --> 00:14:38.000
tree for each value of n.
The decision trees depend on n.
00:14:38.000 --> 00:14:43.000
The algorithm hopefully,
well, it depends on n,
00:14:43.000 --> 00:14:46.000
but it works for all values of
n.
00:14:46.000 --> 00:14:51.000
And we're just going to think
of the algorithm as splitting
00:14:51.000 --> 00:14:55.000
into two forks,
the left subtree and the right
00:14:55.000 --> 00:15:00.000
subtree whenever it makes a
comparison.
00:15:07.000 --> 00:15:09.000
If we take a comparison sort
like merge sort.
00:15:09.000 --> 00:15:12.000
And it does lots of stuff.
It does index arithmetic,
00:15:12.000 --> 00:15:14.000
it does recursion,
whatever.
00:15:14.000 --> 00:15:18.000
But at some point it makes a
comparison and then we say,
00:15:18.000 --> 00:15:20.000
OK, there are two halves of the
algorithm.
00:15:20.000 --> 00:15:24.000
There is what the algorithm
would do if the comparison came
00:15:24.000 --> 00:15:27.000
out less than or equal to and
what the algorithm would do if
00:15:27.000 --> 00:15:31.000
the comparison came out greater
than.
00:15:31.000 --> 00:15:33.000
So, you can build a tree in
this way.
00:15:33.000 --> 00:15:37.000
In some sense,
what this tree is doing is
00:15:37.000 --> 00:15:42.000
listing all possible executions
of this algorithm considering
00:15:42.000 --> 00:15:46.000
what would happen for all
possible values of those
00:15:46.000 --> 00:15:48.000
comparisons.
00:15:59.000 --> 00:16:03.000
We will call these all possible
instruction traces.
00:16:03.000 --> 00:16:09.000
If you write down all the
instructions that are executed
00:16:09.000 --> 00:16:13.000
by this algorithm,
for all possible input arrays,
00:16:13.000 --> 00:16:19.000
a_1 to a_n, and see how all the
comparisons could come out
00:16:19.000 --> 00:16:25.000
and what the algorithm does,
in the end you will get a tree.
00:16:25.000 --> 00:16:30.000
Now, how big will that tree be
roughly?
00:16:43.000 --> 00:16:48.000
As a function of n. Yeah?
00:16:55.000 --> 00:16:57.000
Right.
If it's got to be able to sort
00:16:57.000 --> 00:17:01.000
every possible list of length n,
at the leaves I have to have
00:17:01.000 --> 00:17:05.000
all the permutations of those
elements.
00:17:05.000 --> 00:17:07.000
That is a lot.
There are a lot of permutations
00:17:07.000 --> 00:17:10.000
on n elements.
There's n factorial of them.
00:17:10.000 --> 00:17:13.000
N factorial is exponential,
it's really big.
00:17:13.000 --> 00:17:17.000
So, this tree is huge.
It's going to be exponential on
00:17:17.000 --> 00:17:19.000
the input size n.
That is why we don't write
00:17:19.000 --> 00:17:22.000
algorithms down normally as a
decision tree,
00:17:22.000 --> 00:17:25.000
even though in some cases maybe
we could.
00:17:25.000 --> 00:17:29.000
It's not a very compact
representation.
00:17:29.000 --> 00:17:31.000
These algorithms,
you write them down in
00:17:31.000 --> 00:17:33.000
pseudocode, they have constant
length.
00:17:33.000 --> 00:17:35.000
It's a very succinct
representation of this
00:17:35.000 --> 00:17:38.000
algorithm.
Here the length depends on n
00:17:38.000 --> 00:17:41.000
and it depends exponentially on
n, which is not useful if you
00:17:41.000 --> 00:17:44.000
wanted to implement the
algorithm because writing down
00:17:44.000 --> 00:17:46.000
the algorithm would take a long
time.
00:17:46.000 --> 00:17:49.000
But, nonetheless,
we can use this as a tool to
00:17:49.000 --> 00:17:51.000
analyze these comparison sorting
algorithms.
00:17:51.000 --> 00:17:54.000
We have all of these.
Any algorithm can be
00:17:54.000 --> 00:17:58.000
transformed in this way into a
decision tree.
00:17:58.000 --> 00:18:03.000
And now we have this
observation that the number of
00:18:03.000 --> 00:18:08.000
leaves in this decision tree has
to be really big.
00:18:08.000 --> 00:18:12.000
Let me talk about leaves in a
second.
00:18:12.000 --> 00:18:18.000
Before we get to leaves,
let's talk about the depth of
00:18:18.000 --> 00:18:20.000
the tree.
00:18:26.000 --> 00:18:29.000
This decision tree represents
all possible executions of the
00:18:29.000 --> 00:18:31.000
algorithm.
If I look at a particular
00:18:31.000 --> 00:18:35.000
execution, which corresponds to
some root to leaf path in the
00:18:35.000 --> 00:18:38.000
tree, the running time or the
number of comparisons made by
00:18:38.000 --> 00:18:42.000
that execution is just the
length of the path.
00:18:47.000 --> 00:18:52.000
And, therefore,
the worst-case running time,
00:18:52.000 --> 00:18:59.000
over all possible inputs of
length n, is going to be --
00:19:05.000 --> 00:19:06.000
n - 1?
Could be.
00:19:06.000 --> 00:19:11.000
Depends on the decision tree.
But, as a function of the
00:19:11.000 --> 00:19:14.000
decision tree?
The longest path,
00:19:14.000 --> 00:19:19.000
right, which is called the
height of the tree.
00:19:24.000 --> 00:19:26.000
So, this is what we want to
measure.
00:19:26.000 --> 00:19:29.000
We want to claim that the
height of the tree has to be at
00:19:29.000 --> 00:19:32.000
least n lg n with an omega in
front.
00:19:32.000 --> 00:19:34.000
That is what we'll prove.
00:19:42.000 --> 00:19:44.000
And the only thing we're going
to use is that the number of
00:19:44.000 --> 00:19:48.000
leaves in that tree has to be
big, has to be n factorial.
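To make the transformation concrete, here is a rough sketch, assuming Python, that records the outcome of every comparison insertion sort makes, for every permutation of n distinct values. Each distinct sequence of outcomes is one root-to-leaf path. The function names are made up for illustration.

```python
from itertools import permutations
from math import factorial


def insertion_sort_trace(values):
    """Run insertion sort and record the outcome of each comparison."""
    a = list(values)
    trace = []
    for j in range(1, len(a)):
        i = j
        while i > 0:
            greater = a[i - 1] > a[i]   # one comparison = one tree node
            trace.append(greater)
            if not greater:
                break
            a[i - 1], a[i] = a[i], a[i - 1]
            i -= 1
    return tuple(trace), a


def count_leaves_and_height(n):
    """Enumerate all root-to-leaf paths of insertion sort's tree for size n."""
    traces = set()
    for p in permutations(range(n)):
        trace, out = insertion_sort_trace(p)
        assert out == sorted(p)
        traces.add(trace)
    return len(traces), max(len(t) for t in traces)


leaves, height = count_leaves_and_height(4)
print(leaves, factorial(4), height)
```

For distinct inputs every permutation ends at its own leaf, so the number of distinct traces is n factorial, and the longest trace is the worst-case number of comparisons.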
00:20:00.000 --> 00:20:09.000
This is a lower bound on
decision tree sorting.
00:20:21.000 --> 00:20:26.000
And the lower bound says that
if you have any decision tree
00:20:26.000 --> 00:20:32.000
that sorts n elements then its
height has to be at least n lg n
00:20:32.000 --> 00:20:35.000
up to constant factors.
00:20:45.000 --> 00:20:52.000
So, that is the theorem.
Now we're going to prove the
00:20:52.000 --> 00:20:57.000
theorem.
And we're going to use that the
00:20:57.000 --> 00:21:06.000
number of leaves in that tree
must be at least n factorial.
00:21:06.000 --> 00:21:10.000
Because there are n factorial
permutations of the inputs.
00:21:10.000 --> 00:21:14.000
All of them could happen.
And so, for this algorithm to
00:21:14.000 --> 00:21:19.000
be correct, it has to detect every
one of those permutations in
00:21:19.000 --> 00:21:22.000
some way.
Now, it may do it very quickly.
00:21:22.000 --> 00:21:26.000
We better only need n lg n
comparisons because we know
00:21:26.000 --> 00:21:31.000
that's possible.
The depth of the tree may not
00:21:31.000 --> 00:21:35.000
be too big, but it has to have a
huge number of leaves down
00:21:35.000 --> 00:21:37.000
there.
It has to branch enough to get
00:21:37.000 --> 00:21:42.000
n factorial leaves because it
has to give the right answer in
00:21:42.000 --> 00:21:45.000
all possible inputs.
This is, in some sense,
00:21:45.000 --> 00:21:49.000
counting the number of possible
inputs that we have to
00:21:49.000 --> 00:21:52.000
distinguish.
This is the number of leaves.
00:21:52.000 --> 00:21:55.000
What we care about is the
height of the tree.
00:21:55.000 --> 00:21:59.000
Let's call the height of the
tree h.
00:21:59.000 --> 00:22:02.000
Now, if I have a tree of height
h, how many leaves could it
00:22:02.000 --> 00:22:04.000
have?
What's the maximum number of
00:22:04.000 --> 00:22:06.000
leaves it could have?
00:22:19.000 --> 00:22:23.000
2^h, exactly.
Because this is a binary tree,
00:22:23.000 --> 00:22:29.000
comparison trees always have a
branching factor of 2,
00:22:29.000 --> 00:22:35.000
the number of leaves has to be
at most 2^h, if I have a height
00:22:35.000 --> 00:22:38.000
h tree.
Now, this gives me a relation.
00:22:38.000 --> 00:22:41.000
The number of leaves has to be
greater than or equal to n
00:22:41.000 --> 00:22:44.000
factorial and the number of
leaves has to be less than or
00:22:44.000 --> 00:22:47.000
equal to 2^h.
Therefore, n factorial is less
00:22:47.000 --> 00:22:50.000
than or equal to 2^h,
if I got that right.
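As a quick numeric check of this relation, here is a small sketch, assuming Python, that finds the smallest height h with 2^h at least n factorial. The function name min_height is hypothetical.

```python
from math import factorial


def min_height(n):
    """Smallest h such that a binary tree of height h can hold n! leaves."""
    h = 0
    while 2 ** h < factorial(n):
        h += 1
    return h


for n in (3, 4, 5, 10):
    print(n, factorial(n), min_height(n))
```

For n = 3 this gives 3, which matches the depth of the three-element tree drawn earlier: two comparisons cannot separate six permutations.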
00:22:58.000 --> 00:23:02.000
Now, again, we care about h in
terms of n factorial,
00:23:02.000 --> 00:23:04.000
so we solve this by taking
logs.
00:23:04.000 --> 00:23:07.000
And I am also going to flip
sides.
00:23:07.000 --> 00:23:12.000
Now h is at least log base 2,
because there is a 2 over here,
00:23:12.000 --> 00:23:15.000
of n factorial.
There is a property that I'm
00:23:15.000 --> 00:23:20.000
using here in order to derive
this inequality from this
00:23:20.000 --> 00:23:23.000
inequality.
This is a technical aside,
00:23:23.000 --> 00:23:27.000
but it's important that you
realize there is a technical
00:23:27.000 --> 00:23:30.000
issue here.
00:23:40.000 --> 00:23:43.000
The general principle I'm
applying is I have some
00:23:43.000 --> 00:23:46.000
inequality, I do the same thing
to both sides,
00:23:46.000 --> 00:23:49.000
and hopefully that inequality
should still be true.
00:23:49.000 --> 00:23:53.000
But, in order for that to be
the case, I need a property
00:23:53.000 --> 00:23:56.000
about that operation that I'm
performing.
00:23:56.000 --> 00:24:00.000
It has to be a monotonic
transformation.
00:24:00.000 --> 00:24:04.000
Here what I'm using is that log
is a monotonically increasing
00:24:04.000 --> 00:24:06.000
function.
That is important.
00:24:06.000 --> 00:24:11.000
If I multiply both sides by -1,
which is a decreasing function,
00:24:11.000 --> 00:24:14.000
the inequality would have to
get flipped.
00:24:14.000 --> 00:24:18.000
The fact that the inequality is
not flipping here,
00:24:18.000 --> 00:24:21.000
I need to know that log is
monotonically increasing.
00:24:21.000 --> 00:24:27.000
If you see log that's true.
We need to be careful here.
00:24:27.000 --> 00:24:31.000
Now we need some approximation
of n factorial in order to
00:24:31.000 --> 00:24:36.000
figure out what its log is.
Does anyone know a good
00:24:36.000 --> 00:24:41.000
approximation for n factorial?
Not necessarily the equation
00:24:41.000 --> 00:24:44.000
but the name.
Stirling's formula.
00:24:44.000 --> 00:24:47.000
Good.
You all remember Stirling.
00:24:47.000 --> 00:24:52.000
And I just need the highest
order term, which I believe is
00:24:52.000 --> 00:24:54.000
that.
N factorial is at least
00:24:54.000 --> 00:24:59.000
(n/e)^n.
So, that's all we need here.
00:24:59.000 --> 00:25:06.000
Now I can use properties of
logs to bring the n outside.
00:25:06.000 --> 00:25:09.000
This is n lg (n/e).
00:25:15.000 --> 00:25:18.000
And then lg (n/e) I can
simplify.
00:25:28.000 --> 00:25:32.000
That is just lg n - lg e.
So, this is n(lg n - lg e).
00:25:32.000 --> 00:25:37.000
Lg e is a constant,
so it's really tiny compared to
00:25:37.000 --> 00:25:39.000
this lg n which is growing
with n.
00:25:39.000 --> 00:25:44.000
This is Omega(n lg n).
All we care about is the
00:25:44.000 --> 00:25:47.000
leading term.
It is actually Theta(n lg n),
00:25:47.000 --> 00:25:52.000
but because we have it greater
than or equal to all we care
00:25:52.000 --> 00:25:57.000
about is the omega.
A theta here wouldn't give us
00:25:57.000 --> 00:26:01.000
anything stronger.
Of course, not all algorithms
00:26:01.000 --> 00:26:04.000
have n lg n running time or make
n lg n comparisons.
00:26:04.000 --> 00:26:07.000
Some of them do,
some of them are worse,
00:26:07.000 --> 00:26:10.000
but this proves that all of
them require a height of at
00:26:10.000 --> 00:26:12.000
least n lg n.
There you see the proof,
00:26:12.000 --> 00:26:15.000
once you observe the fact about
the number of leaves,
00:26:15.000 --> 00:26:18.000
and if you remember Stirling's
formula.
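Written out as a single chain, with h the height of the tree and lg the base-2 logarithm as used above, the argument reads roughly as follows:

```latex
\begin{align*}
n! \le \#\text{leaves} \le 2^h
  \quad&\Longrightarrow\quad h \ge \lg(n!) \\
  &\phantom{\Longrightarrow}\quad\; \ge \lg\!\left(\left(\tfrac{n}{e}\right)^{n}\right)
      && \text{(Stirling: } n! \ge (n/e)^n\text{)} \\
  &\phantom{\Longrightarrow}\quad\; = n \lg\!\left(\tfrac{n}{e}\right) \\
  &\phantom{\Longrightarrow}\quad\; = n \lg n - n \lg e \\
  &\phantom{\Longrightarrow}\quad\; = \Omega(n \lg n).
\end{align*}
```

The first implication uses that lg is monotonically increasing, and the last step uses that lg e is a constant.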
00:26:18.000 --> 00:26:22.000
So, you should know this proof.
You can show that all sorts of
00:26:22.000 --> 00:26:25.000
problems require n lg n time
with this kind of technique,
00:26:25.000 --> 00:26:30.000
provided you're in some kind of
a decision tree model.
00:26:30.000 --> 00:26:32.000
That's important.
We really need that our
00:26:32.000 --> 00:26:35.000
algorithm can be phrased as a
decision tree.
00:26:35.000 --> 00:26:37.000
And, in particular,
we know from this
00:26:37.000 --> 00:26:40.000
transformation that all
comparison sorts can be
00:26:40.000 --> 00:26:42.000
represented as a decision
tree.
00:26:42.000 --> 00:26:45.000
But there are some sorting
algorithms which cannot be
00:26:45.000 --> 00:26:48.000
represented as a decision tree.
And we will turn to that
00:26:48.000 --> 00:26:51.000
momentarily.
But before we get there I
00:26:51.000 --> 00:26:54.000
phrased this theorem as a lower
bound on decision tree sorting.
00:26:54.000 --> 00:26:57.000
But, of course,
we also get a lower bound on
00:26:57.000 --> 00:27:02.000
comparison sorting.
And, in particular,
00:27:02.000 --> 00:27:08.000
it tells us that merge sort and
heapsort are asymptotically
00:27:08.000 --> 00:27:11.000
optimal.
Their dependence on n,
00:27:11.000 --> 00:27:17.000
in terms of asymptotic
notation, so ignoring constant
00:27:17.000 --> 00:27:24.000
factors, these algorithms are
optimal in terms of growth of n,
00:27:24.000 --> 00:27:30.000
but this is only in the
comparison model.
00:27:30.000 --> 00:27:33.000
So, among comparison sorting
algorithms, which these are,
00:27:33.000 --> 00:27:35.000
they are asymptotically
optimal.
00:27:35.000 --> 00:27:39.000
They use the minimum number of
comparisons up to constant
00:27:39.000 --> 00:27:41.000
factors.
In fact, their whole running
00:27:41.000 --> 00:27:44.000
time is dominated by the number
of comparisons.
00:27:44.000 --> 00:27:47.000
It's all Theta(n lg n).
So, this is good news.
00:27:47.000 --> 00:27:51.000
And I should probably mention a
little bit about what happens
00:27:51.000 --> 00:27:55.000
with randomized algorithms.
What I've described here really
00:27:55.000 --> 00:27:57.000
only applies,
in some sense,
00:27:57.000 --> 00:28:02.000
to deterministic algorithms.
Does anyone see what would
00:28:02.000 --> 00:28:06.000
change with randomized
algorithms or where I've assumed
00:28:06.000 --> 00:28:09.000
that I've had a deterministic
comparison sort?
00:28:09.000 --> 00:28:13.000
This is a bit subtle.
And I only noticed it reading
00:28:13.000 --> 00:28:17.000
the notes this morning,
oh, wait.
00:28:28.000 --> 00:28:30.000
I will give you a hint.
It's over here,
00:28:30.000 --> 00:28:33.000
the right-hand side of the
world.
00:28:50.000 --> 00:28:55.000
If I have a deterministic
algorithm, what the algorithm
00:28:55.000 --> 00:29:00.000
does is completely determinate
at each step.
00:29:00.000 --> 00:29:05.000
As long as I know all the
comparisons that it made up to
00:29:05.000 --> 00:29:11.000
some point, it's determinate
what that algorithm will do.
00:29:11.000 --> 00:29:17.000
But, if I have a randomized
algorithm, it also depends on
00:29:17.000 --> 00:29:24.000
the outcomes of some coin flips.
Any suggestions of what breaks
00:29:24.000 --> 00:29:28.000
over here?
There is more than one tree,
00:29:28.000 --> 00:29:31.000
exactly.
So, we had this assumption that
00:29:31.000 --> 00:29:33.000
we only have one tree for each
n.
00:29:33.000 --> 00:29:36.000
In fact, what we get is a
probability distribution over
00:29:36.000 --> 00:29:38.000
trees.
For each value of n,
00:29:38.000 --> 00:29:41.000
if you take all the possible
executions of that algorithm,
00:29:41.000 --> 00:29:44.000
all the instruction traces,
well, now, in addition to
00:29:44.000 --> 00:29:47.000
branching on comparisons,
we also branch on whether a
00:29:47.000 --> 00:29:50.000
coin flip came out heads or
tails, or however we're
00:29:50.000 --> 00:29:53.000
generating random numbers it
came out with some value between
00:29:53.000 --> 00:29:55.000
1 and n.
So, we get a probability
00:29:55.000 --> 00:29:58.000
distribution over trees.
This lower bound still applies,
00:29:58.000 --> 00:30:02.000
though.
Because, no matter what tree we
00:30:02.000 --> 00:30:05.000
get, I don't really care.
I get at least one tree for
00:30:05.000 --> 00:30:08.000
each n.
And this proof applies to every
00:30:08.000 --> 00:30:10.000
tree.
So, no matter what tree you
00:30:10.000 --> 00:30:15.000
get, if it is a correct tree it
has to have height Omega(n lg
00:30:15.000 --> 00:30:17.000
n).
This lower bound applies even
00:30:17.000 --> 00:30:21.000
for randomized algorithms.
You cannot get better than n lg
00:30:21.000 --> 00:30:24.000
n, because no matter what tree
it comes up with,
00:30:24.000 --> 00:30:29.000
no matter how those coin flips
come out, this argument still
00:30:29.000 --> 00:30:33.000
applies.
Every tree that comes out has
00:30:33.000 --> 00:30:37.000
to be correct,
so this is really at least one
00:30:37.000 --> 00:30:38.000
tree.
00:30:43.000 --> 00:30:47.000
And that will now work.
We also get the fact that
00:30:47.000 --> 00:30:52.000
randomized quicksort is
asymptotically optimal in
00:30:52.000 --> 00:30:54.000
expectation.
00:31:05.000 --> 00:31:09.000
But, in order to say that
randomized quicksort is
00:31:09.000 --> 00:31:13.000
asymptotically optimal,
we need to know that all
00:31:13.000 --> 00:31:19.000
randomized algorithms require
Omega(n lg n) comparisons.
00:31:19.000 --> 00:31:22.000
Now we know that so all is
well.
00:31:22.000 --> 00:31:27.000
That is the comparison model.
Any questions before we go on?
00:31:27.000 --> 00:31:31.000
Good.
The next topic is to burst
00:31:31.000 --> 00:31:37.000
outside of the comparison model
and try to sort in linear time.
00:31:43.000 --> 00:31:45.000
It is pretty clear that,
as long as you don't have some
00:31:45.000 --> 00:31:48.000
kind of a parallel algorithm or
something really fancy,
00:31:48.000 --> 00:31:51.000
you cannot sort any better than
linear time because you've at
00:31:51.000 --> 00:31:54.000
least got to look at the data.
No matter what you're doing
00:31:54.000 --> 00:31:56.000
with the data,
you've got to look at it,
00:31:56.000 --> 00:31:59.000
otherwise you're not sorting it
correctly.
00:31:59.000 --> 00:32:01.000
So, linear time is the best we
could hope for.
00:32:01.000 --> 00:32:05.000
N lg n is pretty close.
How could we sort in linear
00:32:05.000 --> 00:32:07.000
time?
Well, we're going to need some
00:32:07.000 --> 00:32:10.000
more powerful assumption.
And this is the counter
00:32:10.000 --> 00:32:12.000
example.
We're going to have to move
00:32:12.000 --> 00:32:16.000
outside the comparison model and
do something else with our
00:32:16.000 --> 00:32:18.000
elements.
And what we're going to do is
00:32:18.000 --> 00:32:21.000
assume that they're integers in
a particular range,
00:32:21.000 --> 00:32:24.000
and we will use that to sort in
linear time.
00:32:24.000 --> 00:32:27.000
We're going to see two
algorithms for sorting faster
00:32:27.000 --> 00:32:32.000
than n lg n.
The first one is pretty simple,
00:32:32.000 --> 00:32:35.000
and we will use it in the
second algorithm.
00:32:35.000 --> 00:32:40.000
It's called counting sort.
The input to counting sort is
00:32:40.000 --> 00:32:44.000
an array, as usual,
but we're going to assume what
00:32:44.000 --> 00:32:49.000
those array elements look like.
Each A[i] is an integer from
00:32:49.000 --> 00:32:52.000
the range of 1 to k.
This is a pretty strong
00:32:52.000 --> 00:32:55.000
assumption.
And the running time is
00:32:55.000 --> 00:33:01.000
actually going to depend on k.
If k is small it is going to be
00:33:01.000 --> 00:33:06.000
a good algorithm.
If k is big it's going to be a
00:33:06.000 --> 00:33:10.000
really bad algorithm,
worse than n lg n.
00:33:10.000 --> 00:33:15.000
Our goal is to output some
sorted version of this array.
00:33:15.000 --> 00:33:20.000
Let's call this sorting of A.
It's going to be easier to
00:33:20.000 --> 00:33:25.000
write down the output directly
instead of writing down
00:33:25.000 --> 00:33:32.000
a permutation for this algorithm.
And then we have some auxiliary
00:33:32.000 --> 00:33:36.000
storage.
I'm about to write down the
00:33:36.000 --> 00:33:41.000
pseudocode, which is why I'm
declaring all my variables here.
00:33:41.000 --> 00:33:45.000
And the auxiliary storage will
have length k,
00:33:45.000 --> 00:33:48.000
which is the range on my input
values.
00:33:48.000 --> 00:33:52.000
Let's see the algorithm.
00:34:07.000 --> 00:34:09.000
This is counting sort.
00:34:17.000 --> 00:34:20.000
And it takes a little while to
write down but it's pretty
00:34:20.000 --> 00:34:22.000
straightforward.
00:34:28.000 --> 00:34:32.000
First we do some
initialization.
00:34:32.000 --> 00:34:36.000
Then we do some counting.
00:35:04.000 --> 00:35:06.000
Then we do some summing.
00:35:50.000 --> 00:35:54.000
And then we actually write the
output.
00:36:28.000 --> 00:36:30.000
Is that algorithm perfectly
clear to everyone?
00:36:30.000 --> 00:36:30.000
No one.
Good.
00:36:30.000 --> 00:36:33.000
This should illustrate how
obscure pseudocode can be.
00:36:33.000 --> 00:36:36.000
And when you're solving your
problem sets,
00:36:36.000 --> 00:36:39.000
you should keep in mind that
it's really hard to understand
00:36:39.000 --> 00:36:41.000
an algorithm just given
pseudocode like this.
00:36:41.000 --> 00:36:45.000
You need some kind of English
description of what's going on
00:36:45.000 --> 00:36:48.000
because, while you could work
through and figure out what this
00:36:48.000 --> 00:36:51.000
means, it could take half an
hour to an hour.
00:36:51.000 --> 00:36:53.000
And that's not a good way of
expressing yourself.
00:36:53.000 --> 00:36:57.000
And so what I will give you now
is the English description,
00:36:57.000 --> 00:37:01.000
but we will refer back to this
to understand.
00:37:01.000 --> 00:37:05.000
This is sort of our bible of
what the algorithm is supposed
00:37:05.000 --> 00:37:07.000
to do.
Let me go over it briefly.
00:37:07.000 --> 00:37:11.000
The first step is just some
initialization.
00:37:11.000 --> 00:37:15.000
The C[i]'s are going to count
some things, count occurrences
00:37:15.000 --> 00:37:18.000
of values.
And so first we set them to
00:37:18.000 --> 00:37:20.000
zero.
Then, for every value we see
00:37:20.000 --> 00:37:25.000
A[j], we're going to increment
the counter for that value A[j].
00:37:25.000 --> 00:37:30.000
Then the C[i]s will give me the
number of elements equal to a
00:37:30.000 --> 00:37:35.000
particular value i.
Then I'm going to take prefix
00:37:35.000 --> 00:37:39.000
sums, which will make it so that
C[i] gives me the number of
00:37:39.000 --> 00:37:42.000
keys, the number of elements
less than or equal to i
00:37:42.000 --> 00:37:45.000
instead of equals.
And then, finally,
00:37:45.000 --> 00:37:49.000
it turns out that's enough to
put all the elements in the
00:37:49.000 --> 00:37:52.000
right place.
This I will call distribution.
00:37:52.000 --> 00:37:56.000
This is the distribution step.
And it's probably the least
00:37:56.000 --> 00:38:01.000
obvious of all the steps.
And let's do an example to make
00:38:01.000 --> 00:38:04.000
it more obvious what's going on.
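The four steps just described (initialize, count, sum, distribute) can be sketched in Python as follows. This is a reading of the English description, not a copy of the pseudocode on the board. The name `counting_sort` and the choice to leave slot 0 of the counter array unused, so that indices match the values 1 to k, are assumptions made for this illustration.

```python
def counting_sort(a, k):
    """Return a sorted copy of a, whose elements are integers in the range 1..k."""
    n = len(a)
    c = [0] * (k + 1)            # initialization; c[0] is unused
    for x in a:                  # counting: c[i] = number of elements equal to i
        c[x] += 1
    for i in range(2, k + 1):    # summing: c[i] = number of elements <= i
        c[i] += c[i - 1]
    b = [None] * n
    for x in reversed(a):        # distribution: scan from the end of the input
        b[c[x] - 1] = x          # c[x] is a 1-based position in the output
        c[x] -= 1                # the next equal value goes one slot to the left
    return b

if __name__ == "__main__":
    print(counting_sort([4, 1, 3, 4, 3], 4))
```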
00:38:12.000 --> 00:38:30.000
Let's take an array A = [4,
1, 3, 4, 3].
00:38:30.000 --> 00:38:36.000
And then I want some array C.
And let me add some indices
00:38:36.000 --> 00:38:43.000
here so we can see what the
algorithm is really doing.
00:38:43.000 --> 00:38:50.000
Here it turns out that all of
my numbers are in the range 1 to
00:38:50.000 --> 00:38:54.000
4, so k = 4.
My array C has four values.
00:38:54.000 --> 00:39:00.000
Initially, I set them all to
zero.
00:39:00.000 --> 00:39:03.000
That's easy.
And now I want to count through
00:39:03.000 --> 00:39:07.000
everything.
And let me not cheat here.
00:39:07.000 --> 00:39:10.000
I'm in the second step,
so to speak.
00:39:10.000 --> 00:39:13.000
And I look for each element in
order.
00:39:13.000 --> 00:39:17.000
I look at the C[i] value.
The first element is 4,
00:39:17.000 --> 00:39:20.000
so I look at C[4].
That is 0.
00:39:20.000 --> 00:39:24.000
I increment it to 1.
Then I look at element 1.
00:39:24.000 --> 00:39:28.000
That's 0.
I increment it to 1.
00:39:28.000 --> 00:39:30.000
Then I look at 3 and that's
here.
00:39:30.000 --> 00:39:33.000
It is also 0.
I increment it to 1.
00:39:33.000 --> 00:39:37.000
Not so exciting so far.
Now I see 4,
00:39:37.000 --> 00:39:40.000
which I've seen before,
how exciting.
00:39:40.000 --> 00:39:44.000
I had value 1 in here,
I increment it to 2.
00:39:44.000 --> 00:39:48.000
Then I see value 3,
which also had a value of 1.
00:39:48.000 --> 00:39:51.000
I increment that to 2.
The result is [1,
00:39:51.000 --> 00:39:55.000
0, 2, 2].
That's what my array C looks
00:39:55.000 --> 00:40:00.000
like at this point in the
algorithm.
00:40:00.000 --> 00:40:04.000
Now I do a relatively simple
transformation of taking prefix
00:40:04.000 --> 00:40:05.000
sums.
I want to know,
00:40:05.000 --> 00:40:09.000
instead of these individual
values, the sum of this prefix,
00:40:09.000 --> 00:40:13.000
the sum of this prefix,
the sum of this prefix and the
00:40:13.000 --> 00:40:17.000
sum of this prefix.
I will call that C prime just
00:40:17.000 --> 00:40:21.000
so we don't get too lost in all
these different versions of C.
00:40:21.000 --> 00:40:23.000
This is just 1.
And 1 plus 0 is 1.
00:40:23.000 --> 00:40:25.000
1 plus 2 is 3.
3 plus 2 is 5.
00:40:25.000 --> 00:40:30.000
So, these are sort of the
running totals.
00:40:30.000 --> 00:40:33.000
There are five elements total,
there are three elements less
00:40:33.000 --> 00:40:37.000
than or equal to 3,
there is one element less than
00:40:37.000 --> 00:40:38.000
or equal to 2,
and so on.
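The counting and prefix-sum steps of this example can be reproduced with a short Python sketch. The helper name `counts_and_prefix_sums` is hypothetical, and the lists here are 0-based, so the count for value v sits in slot v - 1.

```python
from itertools import accumulate

def counts_and_prefix_sums(a, k):
    """Return (C, C') for values in 1..k: per-value counts and their running totals."""
    counts = [0] * k
    for x in a:
        counts[x - 1] += 1       # value v is tallied in slot v - 1
    return counts, list(accumulate(counts))

if __name__ == "__main__":
    counts, running = counts_and_prefix_sums([4, 1, 3, 4, 3], 4)
    print(counts)    # [1, 0, 2, 2]
    print(running)   # [1, 1, 3, 5]
```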
00:40:38.000 --> 00:40:40.000
Now, the fun part,
the distribution.
00:40:40.000 --> 00:40:43.000
And this is where we get our
array B.
00:40:43.000 --> 00:40:46.000
B better have the same size,
every element better appear
00:40:46.000 --> 00:40:50.000
here somewhere and they should
come out in sorted order.
00:40:50.000 --> 00:40:54.000
Let's just run the algorithm.
j is going to start at the end
00:40:54.000 --> 00:40:58.000
of the array and work its way
down to 1, the beginning of the
00:40:58.000 --> 00:41:02.000
array.
And what we do is we pick up
00:41:02.000 --> 00:41:05.000
the last element of A,
A[n].
00:41:05.000 --> 00:41:11.000
We look at the counter.
We look at the C vector for
00:41:11.000 --> 00:41:14.000
that value.
Here the value is 3,
00:41:14.000 --> 00:41:19.000
and this is the third column,
so that has number 3.
00:41:19.000 --> 00:41:24.000
And the claim is that's where
it belongs in B.
00:41:24.000 --> 00:41:29.000
You take this number 3,
you put it in index 3 of the
00:41:29.000 --> 00:41:34.000
array B.
And then you decrement the
00:41:34.000 --> 00:41:37.000
counter.
I'm going to replace 3 here
00:41:37.000 --> 00:41:40.000
with 2.
And the idea is these numbers
00:41:40.000 --> 00:41:44.000
tell you where those values
should go.
00:41:44.000 --> 00:41:48.000
Anything of value 1 should go
at position 1.
00:41:48.000 --> 00:41:53.000
Anything with value 3 should go
at position 3 or less.
00:41:53.000 --> 00:41:59.000
This is going to be the last
place that a 3 should go.
00:41:59.000 --> 00:42:02.000
And then anything with value 4
should go at position 5 or less,
00:42:02.000 --> 00:42:06.000
definitely should go at the end
of the array because 4 is the
00:42:06.000 --> 00:42:09.000
largest value.
And this counter will work out
00:42:09.000 --> 00:42:13.000
perfectly because these counts
have left enough space in each
00:42:13.000 --> 00:42:15.000
section of the array.
Effectively,
00:42:15.000 --> 00:42:18.000
this part is reserved for ones,
there are no twos,
00:42:18.000 --> 00:42:21.000
this part is reserved for
threes, and this part is
00:42:21.000 --> 00:42:24.000
reserved for fours.
You can check if that's really
00:42:24.000 --> 00:42:27.000
what this array means.
Let's finish running the
00:42:27.000 --> 00:42:31.000
algorithm.
That was the last element.
00:42:31.000 --> 00:42:34.000
I won't cross it off,
but we've sort of done that.
00:42:34.000 --> 00:42:36.000
Now I look at the next to last
element.
00:42:36.000 --> 00:42:38.000
That's a 4.
Fours go in position 5.
00:42:38.000 --> 00:42:42.000
So, I put my 4 here in position
5 and I decrement that counter.
00:42:42.000 --> 00:42:45.000
Next I look at another 3.
Threes now go in position 2,
00:42:45.000 --> 00:42:48.000
so that goes there.
And then I decrement that
00:42:48.000 --> 00:42:50.000
counter.
I won't actually use that
00:42:50.000 --> 00:42:53.000
counter anymore,
but let's decrement it because
00:42:53.000 --> 00:42:57.000
that's what the algorithm says.
I look at the previous element.
00:42:57.000 --> 00:43:00.000
That's a 1.
Ones go in position 1,
00:43:00.000 --> 00:43:04.000
so I put it here and decrement
that counter.
00:43:04.000 --> 00:43:09.000
And finally I have another 4.
And fours go in position 4 now,
00:43:09.000 --> 00:43:13.000
position 4 is here,
and I decrement that counter.
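The distribution walk-through above can be replayed in Python. This sketch records, for each element picked up from the end of A, the 1-based position of B it lands in. The function name `distribute` and the returned list of placements are assumptions added for the trace.

```python
def distribute(a, k):
    """Run counting sort on a (values in 1..k) and record each placement."""
    c = [0] * (k + 1)                # c[0] unused so indices match values
    for x in a:
        c[x] += 1
    for i in range(2, k + 1):
        c[i] += c[i - 1]             # c[i] = number of elements <= i
    b = [None] * len(a)
    placements = []                  # (value, 1-based position in B)
    for x in reversed(a):            # j runs from n down to 1
        pos = c[x]
        b[pos - 1] = x
        c[x] -= 1
        placements.append((x, pos))
    return b, placements

if __name__ == "__main__":
    b, placements = distribute([4, 1, 3, 4, 3], 4)
    print(placements)   # [(3, 3), (4, 5), (3, 2), (1, 1), (4, 4)]
    print(b)            # [1, 3, 3, 4, 4]
```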
00:43:13.000 --> 00:43:18.000
So, that's counting sort.
And you'll notice that all the
00:43:18.000 --> 00:43:23.000
elements appear and they appear
in order, so that's the
00:43:23.000 --> 00:43:26.000
algorithm.
Now, what's the running time of
00:43:26.000 --> 00:43:31.000
counting sort?
kn is an upper bound.
00:43:31.000 --> 00:43:35.000
It's a little bit better than
that.
00:43:35.000 --> 00:43:43.000
Actually, quite a bit better.
This requires some summing.
00:43:43.000 --> 00:43:49.000
Let's go back to the top of the
algorithm.
00:43:49.000 --> 00:43:53.000
How much time does this step
take?
00:43:53.000 --> 00:43:57.000
k.
How much time does this step
00:43:57.000 --> 00:44:00.000
take?
n.
00:44:00.000 --> 00:44:05.000
How much time does this step
take?
00:44:05.000 --> 00:44:10.000
k.
Each of these operations in the
00:44:10.000 --> 00:44:17.000
for loops is taking constant
time, so it is how many
00:44:17.000 --> 00:44:22.000
iterations of that for loop are
there?
00:44:22.000 --> 00:44:29.000
And, finally,
this step takes n.
00:44:29.000 --> 00:44:35.000
So, the total running time of
counting sort is k + n.
00:44:35.000 --> 00:44:43.000
And this is a great algorithm
if k is relatively small,
00:44:43.000 --> 00:44:49.000
like at most n.
If k is big like n^2 or 2^n or
00:44:49.000 --> 00:44:54.000
whatever, this is not such a
good algorithm,
00:44:54.000 --> 00:45:01.000
but if k = O(n) this is great.
And we get our linear time
00:45:01.000 --> 00:45:04.000
sorting algorithm.
Not only do we need the
00:45:04.000 --> 00:45:08.000
assumption that our numbers are
integers, but we need that the
00:45:08.000 --> 00:45:12.000
range of the integers is pretty
small for this algorithm to
00:45:12.000 --> 00:45:14.000
work.
If all the numbers are between
00:45:14.000 --> 00:45:17.000
1 and order n then we get a
linear time algorithm.
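Adding up the four loops gives the running time. Written out, with the labels being informal names for the steps:

```latex
T(n, k)
  = \underbrace{\Theta(k)}_{\text{initialize}}
  + \underbrace{\Theta(n)}_{\text{count}}
  + \underbrace{\Theta(k)}_{\text{prefix sums}}
  + \underbrace{\Theta(n)}_{\text{distribute}}
  = \Theta(n + k),
\qquad\text{so } k = O(n) \implies T(n, k) = \Theta(n).
```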
00:45:17.000 --> 00:45:20.000
But as soon as they're up to n
lg n we're toast.
00:45:20.000 --> 00:45:24.000
We're back to n lg n sorting.
It's not so great.
00:45:24.000 --> 00:45:27.000
So, you could write a
combination algorithm that says,
00:45:27.000 --> 00:45:31.000
well, if k is bigger than n lg
n, then I will just use merge
00:45:31.000 --> 00:45:35.000
sort.
And if it's less than n lg n
00:45:35.000 --> 00:45:38.000
I'll use counting sort.
And that would work,
00:45:38.000 --> 00:45:42.000
but we can do better than that.
How's the time?
00:45:42.000 --> 00:45:46.000
It is worth noting that we've
beaten our bound,
00:45:46.000 --> 00:45:51.000
but only assuming that we're
outside the comparison model.
00:45:51.000 --> 00:45:55.000
We haven't really contradicted
the original theorem,
00:45:55.000 --> 00:46:00.000
we're just changing the model.
And it's always good to
00:46:00.000 --> 00:46:04.000
question what you're allowed to
do in any problem scenario.
00:46:04.000 --> 00:46:07.000
In, say, some practical
scenarios, this would be great
00:46:07.000 --> 00:46:10.000
if the numbers you're dealing
with are, say,
00:46:10.000 --> 00:46:12.000
a byte long.
Then k is only 2^8,
00:46:12.000 --> 00:46:15.000
which is 256.
You need this auxiliary array
00:46:15.000 --> 00:46:17.000
of size 256, and this is really
fast.
00:46:17.000 --> 00:46:21.000
256 + n, no matter how big n is
it's linear in n.
00:46:21.000 --> 00:46:24.000
If you know your numbers are
small, it's great.
00:46:24.000 --> 00:46:27.000
But if your numbers are
bigger, say you still know
00:46:27.000 --> 00:46:30.000
they're integers but they fit in
like 32 bit words,
00:46:30.000 --> 00:46:35.000
then life is not so easy.
Because k is then 2^32,
00:46:35.000 --> 00:46:39.000
which is 4.2 billion or so,
which is pretty big.
00:46:39.000 --> 00:46:43.000
And you would need this
auxiliary array of 4.2 billion
00:46:43.000 --> 00:46:46.000
words, which is probably like 16
gigabytes.
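The arithmetic behind these figures is easy to check. The sketch below assumes one 4-byte counter per possible key value, which matches the "32 bit words" figure but is an assumption about the machine. The name `aux_array_bytes` is made up for this illustration.

```python
def aux_array_bytes(bits_per_key: int, bytes_per_counter: int = 4) -> int:
    """Size of the counting-sort auxiliary array with one counter per key value."""
    return (2 ** bits_per_key) * bytes_per_counter

if __name__ == "__main__":
    print(aux_array_bytes(8))                # one-byte keys: 256 counters
    print(aux_array_bytes(32) / 2 ** 30)     # 32-bit keys, size in GiB
```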
00:46:46.000 --> 00:46:51.000
So, you just need to initialize
that array before you can even
00:46:51.000 --> 00:46:54.000
get started.
Unless n is like much,
00:46:54.000 --> 00:46:58.000
much more than 4 billion and
you have 16 gigabytes of storage
00:46:58.000 --> 00:47:02.000
just to throw away,
which I don't even have any
00:47:02.000 --> 00:47:06.000
machines with 16 gigabytes of
RAM, this is not such a great
00:47:06.000 --> 00:47:10.000
algorithm.
Just to get a feel,
00:47:10.000 --> 00:47:13.000
it's good if the numbers are
really small.
00:47:13.000 --> 00:47:18.000
What we're going to do next is
come up with a fancier algorithm
00:47:18.000 --> 00:47:22.000
that uses this as a subroutine
on small numbers and combines
00:47:22.000 --> 00:47:25.000
this algorithm to handle larger
numbers.
00:47:25.000 --> 00:47:29.000
That algorithm is called radix
sort.
00:47:29.000 --> 00:47:34.000
But we need one important
property of counting sort before
00:47:34.000 --> 00:47:36.000
we can go there.
00:47:42.000 --> 00:47:45.000
And that important property is
stability.
00:47:50.000 --> 00:47:58.000
A stable sorting algorithm
preserves the order of equal
00:47:58.000 --> 00:48:05.000
elements, let's say the relative
order.
00:48:19.000 --> 00:48:21.000
This is a bit subtle because
usually we think of elements
00:48:21.000 --> 00:48:24.000
just as numbers.
And, yeah, we had a couple
00:48:24.000 --> 00:48:25.000
threes and we had a couple
fours.
00:48:25.000 --> 00:48:28.000
It turns out,
if you look at the order of
00:48:28.000 --> 00:48:31.000
those threes and the order of
those fours, we kept them in
00:48:31.000 --> 00:48:33.000
order.
Because we took the last three
00:48:33.000 --> 00:48:36.000
and we put it here.
Then we took the next to the
00:48:36.000 --> 00:48:39.000
last three and we put it to the
left of that, because we're
00:48:39.000 --> 00:48:42.000
decrementing our counter and
moving from the end of the array
00:48:42.000 --> 00:48:45.000
to the beginning of the array.
No matter how we do that,
00:48:45.000 --> 00:48:49.000
the orders of those threes are
preserved, the orders of the
00:48:49.000 --> 00:48:51.000
fours are preserved.
This may seem like a relatively
00:48:51.000 --> 00:48:54.000
simple thing,
but if you look at the other
00:48:54.000 --> 00:48:57.000
four sorting algorithms we've
seen, not all of them are
00:48:57.000 --> 00:49:00.000
stable.
So, this is an exercise.
00:49:06.000 --> 00:49:11.000
Exercise is figure out which
other sorting algorithms that
00:49:11.000 --> 00:49:15.000
we've seen are stable and which
are not.
00:49:21.000 --> 00:49:25.000
I encourage you to work that
out because this is the sort of
00:49:25.000 --> 00:49:29.000
thing that we ask on quizzes.
But for now all we need is that
00:49:29.000 --> 00:49:33.000
counting sort is stable.
And I won't prove this,
00:49:33.000 --> 00:49:37.000
but it should be pretty obvious
from the algorithm.
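Stability only becomes visible when equal keys carry something that tells them apart. In this Python sketch each element is a (key, label) pair. The labels, the function name `counting_sort_by_key` and the `scan_from_end` switch are all assumptions added for illustration. Turning the switch off scans the input front to back instead, which still sorts by key but reverses the order of equal keys.

```python
def counting_sort_by_key(items, k, key, scan_from_end=True):
    """Sort items by key(item), an integer in 1..k, using counting sort."""
    c = [0] * (k + 1)
    for it in items:
        c[key(it)] += 1
    for i in range(2, k + 1):
        c[i] += c[i - 1]
    out = [None] * len(items)
    order = reversed(items) if scan_from_end else items
    for it in order:
        c[key(it)] -= 1              # move to the last free slot for this key
        out[c[key(it)]] = it
    return out

if __name__ == "__main__":
    records = [(4, "a"), (1, "b"), (3, "c"), (4, "d"), (3, "e")]
    first = lambda r: r[0]
    print(counting_sort_by_key(records, 4, first))
    print(counting_sort_by_key(records, 4, first, scan_from_end=False))
```

In the first output the two 3s and the two 4s keep their input order. In the second they come out swapped.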
00:49:37.000 --> 00:49:41.000
Now we get to talk about radix
sort.
00:49:55.000 --> 00:50:01.000
Radix sort is going to work for
a much larger range of numbers
00:50:01.000 --> 00:50:04.000
in linear time.
Still it has to have an
00:50:04.000 --> 00:50:09.000
assumption about how big those
numbers are, but it will be a
00:50:09.000 --> 00:50:13.000
much more lax assumption.
Now, to increase suspense even
00:50:13.000 --> 00:50:18.000
further, I am going to tell you
some history about radix sort.
00:50:18.000 --> 00:50:22.000
This is one of the oldest
sorting algorithms.
00:50:22.000 --> 00:50:26.000
It's probably the oldest
implemented sorting algorithm.
00:50:26.000 --> 00:50:32.000
It was implemented around 1890.
This is Herman Hollerith.
00:50:32.000 --> 00:50:35.000
Let's say around 1890.
Has anyone heard of Hollerith
00:50:35.000 --> 00:50:37.000
before?
A couple people.
00:50:37.000 --> 00:50:41.000
Not too many.
He is sort of an important guy.
00:50:41.000 --> 00:50:43.000
He was a lecturer at MIT at
some point.
00:50:43.000 --> 00:50:47.000
He developed an early version
of punch cards.
00:50:47.000 --> 00:50:51.000
Punch card technology.
This is before my time so I
00:50:51.000 --> 00:50:54.000
even have to look at my notes to
remember.
00:50:54.000 --> 00:50:57.000
Oh, yeah, they're called punch
cards.
00:50:57.000 --> 00:51:02.000
You may have seen them.
If not they're in the
00:51:02.000 --> 00:51:06.000
PowerPoint lecture notes.
There's this big grid.
00:51:06.000 --> 00:51:11.000
These days, if you've used a
modern punch card recently,
00:51:11.000 --> 00:51:16.000
they are 80 characters wide
and, I don't know,
00:51:16.000 --> 00:51:21.000
I think it's something like 16,
I don't remember exactly.
00:51:21.000 --> 00:51:25.000
And then you punch little holes
here.
00:51:25.000 --> 00:51:30.000
You have this magic machine.
It's like a typewriter.
00:51:30.000 --> 00:51:34.000
You press a letter and that
corresponds to some character.
00:51:34.000 --> 00:51:38.000
Maybe it will punch out a hole
here, punch out a hole here.
00:51:38.000 --> 00:51:42.000
You can see the website if you
want to know exactly how this
00:51:42.000 --> 00:51:46.000
works for historical reasons.
You don't see these too often
00:51:46.000 --> 00:51:49.000
anymore, but this is in
particular the reason why most
00:51:49.000 --> 00:51:53.000
terminals are 80 characters wide
because that was how things
00:51:53.000 --> 00:51:55.000
were.
Hollerith actually didn't
00:51:55.000 --> 00:51:59.000
develop these punch cards
exactly, although eventually he
00:51:59.000 --> 00:52:01.000
did.
In the beginning,
00:52:01.000 --> 00:52:04.000
in 1890, the big deal was the
US Census.
00:52:04.000 --> 00:52:07.000
If you watched the news,
I guess like a year or two ago,
00:52:07.000 --> 00:52:10.000
the US Census was a big deal
because it's really expensive to
00:52:10.000 --> 00:52:12.000
collect all this data from
everyone.
00:52:12.000 --> 00:52:15.000
And the Constitution says
you've got to collect data about
00:52:15.000 --> 00:52:18.000
everyone every ten years.
And it was getting hard.
00:52:18.000 --> 00:52:20.000
In particular,
in 1880, they did the census.
00:52:20.000 --> 00:52:24.000
And it took them almost ten
years to complete the census.
00:52:24.000 --> 00:52:27.000
The population kept going up,
and ten years to do a ten-year
00:52:27.000 --> 00:52:30.000
census, that's going to start
getting expensive when they
00:52:30.000 --> 00:52:34.000
overlap with each other.
So, for 1890 they wanted to do
00:52:34.000 --> 00:52:37.000
something fancier.
And Hollerith said,
00:52:37.000 --> 00:52:40.000
OK, I'm going to build a
machine that you take in the
00:52:40.000 --> 00:52:42.000
data.
It was a modified punch card
00:52:42.000 --> 00:52:46.000
where you would mark out
particular squares depending on
00:52:46.000 --> 00:52:50.000
your status, whether you were
single or married or whatever.
00:52:50.000 --> 00:52:53.000
All the things they wanted to
know on the census they would
00:52:53.000 --> 00:52:57.000
encode in binary onto this card.
And then he built a machine
00:52:57.000 --> 00:53:02.000
that would sort these cards so
you could do counting.
00:53:02.000 --> 00:53:05.000
And, in some sense,
these are numbers.
00:53:05.000 --> 00:53:10.000
And the numbers aren't too big,
but they're big enough that
00:53:10.000 --> 00:53:15.000
counting sort wouldn't work.
I mean if there were a hundred
00:53:15.000 --> 00:53:18.000
numbers here,
2^100 is pretty overwhelming,
00:53:18.000 --> 00:53:24.000
so we cannot use counting sort.
The first idea was the wrong
00:53:24.000 --> 00:53:27.000
idea.
I'm going to think of these as
00:53:27.000 --> 00:53:30.000
numbers.
Let's say each of these columns
00:53:30.000 --> 00:53:34.000
is one number.
And so there's sort of the most
00:53:34.000 --> 00:53:38.000
significant number out here and
there is the least significant
00:53:38.000 --> 00:53:40.000
number out here.
The first idea was you sort by
00:53:40.000 --> 00:53:43.000
the most significant digit
first.
00:53:50.000 --> 00:53:53.000
That's not such a great
algorithm, because if you sort
00:53:53.000 --> 00:53:58.000
by the most significant digit
you get a bunch of buckets each
00:53:58.000 --> 00:54:01.000
with a pile of cards.
And this was a physical device.
00:54:01.000 --> 00:54:04.000
It wasn't exactly an
electronically controlled
00:54:04.000 --> 00:54:06.000
computer.
It was a human that would push
00:54:06.000 --> 00:54:09.000
down some kind of reader.
It would see which holes in the
00:54:09.000 --> 00:54:12.000
first column are punched.
And then it would open a
00:54:12.000 --> 00:54:15.000
physical bin in which the person
would sort of swipe it and it
00:54:15.000 --> 00:54:17.000
would just fall into the right
bin.
00:54:17.000 --> 00:54:20.000
It was semi-automated.
I mean the computer was the
00:54:20.000 --> 00:54:22.000
human plus the machine,
but never mind.
00:54:22.000 --> 00:54:25.000
This was the procedure.
You sorted it into bins.
00:54:25.000 --> 00:54:28.000
Then you had to go through and
sort each bin by the second
00:54:28.000 --> 00:54:32.000
digit.
And pretty soon the number of
00:54:32.000 --> 00:54:36.000
bins gets pretty big.
And if you don't have too many
00:54:36.000 --> 00:54:40.000
digits this is OK,
but it's not the right thing to
00:54:40.000 --> 00:54:41.000
do.
The right idea,
00:54:41.000 --> 00:54:45.000
which is what Hollerith came up
with after that,
00:54:45.000 --> 00:54:50.000
was to sort by the least
significant digit first.
00:55:00.000 --> 00:55:03.000
And you should also do that
using a stable sorting
00:55:03.000 --> 00:55:05.000
algorithm.
Now, Hollerith probably didn't
00:55:05.000 --> 00:55:08.000
call it a stable sorting
algorithm at the time,
00:55:08.000 --> 00:55:11.000
but we will.
And this won Hollerith lots of
00:55:11.000 --> 00:55:14.000
money and good things.
He founded the Tabulating
00:55:14.000 --> 00:55:17.000
Machine Company in 1896,
and that merged with several
00:55:17.000 --> 00:55:21.000
other companies in 1911 to form
something you may have heard of,
00:55:21.000 --> 00:55:24.000
which was renamed IBM in 1924.
That may be the context in
00:55:24.000 --> 00:55:28.000
which you've heard of Hollerith,
or if you've done punch cards
00:55:28.000 --> 00:55:32.000
before.
The whole idea is that we're
00:55:32.000 --> 00:55:37.000
doing a digit by digit sort.
I should have mentioned that at
00:55:37.000 --> 00:55:40.000
the beginning.
And we're going to do it from
00:55:40.000 --> 00:55:43.000
least significant to most
significant.
00:55:43.000 --> 00:55:48.000
It turns out that works.
And to see that let's do an
00:55:48.000 --> 00:55:50.000
example.
I think I'm going to need a
00:55:50.000 --> 00:55:55.000
whole two boards ideally.
First we'll see an example.
00:55:55.000 --> 00:55:59.000
Then we'll prove the theorem.
The proof is actually pretty
00:55:59.000 --> 00:56:03.000
darn easy.
But, nonetheless,
00:56:03.000 --> 00:56:07.000
it's rather counterintuitive
this works if you haven't seen
00:56:07.000 --> 00:56:10.000
it before.
Certainly, the first time I saw
00:56:10.000 --> 00:56:14.000
it, it was quite a surprise.
The nice thing also about this
00:56:14.000 --> 00:56:19.000
algorithm is there are no bins.
It's all one big bin at all
00:56:19.000 --> 00:56:21.000
times.
Let's take some numbers.
00:56:23.000 --> 00:56:28.000
I'm spacing out the digits so
we can see them a little bit
00:56:28.000 --> 00:56:30.000
better.
329, 457,
00:56:30.000 --> 00:56:33.000
657, 839, 436,
720 and 355.
00:56:33.000 --> 00:56:38.000
I'm assuming here we're using
decimal numbers.
00:56:38.000 --> 00:56:43.000
Why not?
Hopefully these are not yet
00:56:43.000 --> 00:56:47.000
sorted.
We'd like to sort them.
00:56:47.000 --> 00:56:54.000
The first thing we do is take
the least significant digit,
00:56:54.000 --> 00:57:00.000
sort by the least significant
digit.
00:57:00.000 --> 00:57:04.000
And whenever we have equal
elements like these two nines,
00:57:04.000 --> 00:57:07.000
we preserve their relative
order.
00:57:07.000 --> 00:57:11.000
So, 329 is going to remain
above 839.
00:57:11.000 --> 00:57:16.000
It doesn't matter here because
we're doing the first sort,
00:57:16.000 --> 00:57:20.000
but in general we're always
using a stable sorting
00:57:20.000 --> 00:57:23.000
algorithm.
When we sort by this column,
00:57:23.000 --> 00:57:27.000
first we get the zero,
so that's 720,
00:57:27.000 --> 00:57:30.000
then we get 5,
355.
00:57:30.000 --> 00:57:31.000
Then we get 6,
436.
00:57:31.000 --> 00:57:36.000
Stop me if I make a mistake.
Then we get the 7s,
00:57:36.000 --> 00:57:42.000
and we preserve the order.
Here it happens to be the right
00:57:42.000 --> 00:57:47.000
order, but it may not be at this
point.
00:57:47.000 --> 00:57:51.000
We haven't even looked at the
other digits.
00:57:51.000 --> 00:57:54.000
Then we get 9s,
there are two 9s,
00:57:54.000 --> 00:57:57.000
329 and 839.
All right so far?
00:57:57.000 --> 00:58:03.000
Good.
Now we sort by the middle
00:58:03.000 --> 00:58:07.000
digit, the next least
significant.
00:58:07.000 --> 00:58:12.000
And we start out with what
looks like the 2s.
00:58:12.000 --> 00:58:17.000
There is a 2 up here and a 2
down here.
00:58:17.000 --> 00:58:23.000
Of course, we write the first 2
first, 720, then 329.
00:58:23.000 --> 00:58:30.000
Then we have the 3s,
so we have 436 and 839.
00:58:30.000 --> 00:58:33.000
Then we have a bunch of 5s it
looks like.
00:58:33.000 --> 00:58:36.000
Have I missed anyone so far?
No.
00:58:36.000 --> 00:58:38.000
Good.
We have three 5s,
00:58:38.000 --> 00:58:42.000
355, 457 and 657.
I like to check that I haven't
00:58:42.000 --> 00:58:45.000
lost any elements.
We have seven here,
00:58:45.000 --> 00:58:48.000
seven here and seven elements
here.
00:58:48.000 --> 00:58:51.000
Good.
Finally, we sort by the last
00:58:51.000 --> 00:58:53.000
digit.
One thing to notice,
00:58:53.000 --> 00:59:00.000
by the way, is before we sorted
by the last digit --
00:59:00.000 --> 00:59:05.000
Currently these numbers don't
resemble sorted order at all.
00:59:05.000 --> 00:59:10.000
But if you look at everything
beyond the digit we haven't yet
00:59:10.000 --> 00:59:15.000
sorted, so these two digits,
that's nice and sorted,
00:59:15.000 --> 00:59:17.000
20, 29, 36, 39,
55, 57, 57.
00:59:17.000 --> 00:59:20.000
Pretty cool.
Let's finish it off.
00:59:20.000 --> 00:59:23.000
We stably sort by the first
digit.
00:59:23.000 --> 00:59:29.000
And the smallest number we get
is a 3, so we get 329 and 355, then
00:59:36.000 --> 00:59:45.000
436 and 457,
then we get a 6,
00:59:45.000 --> 00:59:55.000
657, then a 7,
and then we have an 8.
00:59:55.000 --> 01:00:01.631
And check.
I still have seven elements.
01:00:01.631 --> 01:00:03.203
Good.
I haven't lost anyone.
01:00:03.203 --> 01:00:05.533
And, indeed,
they're now in sorted order.
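The three passes of this example can be replayed with a short Python sketch of least-significant-digit-first radix sort. It uses a counting sort on one decimal digit as the stable subroutine and assumes non-negative integers. The names `stable_sort_by_digit` and `radix_sort` are made up here, and the list of intermediate passes is returned only so the board work can be compared.

```python
def stable_sort_by_digit(nums, place, base=10):
    """Stable counting sort of non-negative ints on the digit at the given place value."""
    c = [0] * base
    for x in nums:
        c[(x // place) % base] += 1
    for d in range(1, base):
        c[d] += c[d - 1]             # c[d] = number of elements with digit <= d
    out = [0] * len(nums)
    for x in reversed(nums):         # scanning from the end keeps equal digits in order
        d = (x // place) % base
        c[d] -= 1
        out[c[d]] = x
    return out

def radix_sort(nums, digits, base=10):
    """Sort by each digit in turn, least significant first; also return every pass."""
    passes = []
    place = 1
    for _ in range(digits):
        nums = stable_sort_by_digit(nums, place, base)
        passes.append(nums)
        place *= base
    return nums, passes

if __name__ == "__main__":
    result, passes = radix_sort([329, 457, 657, 839, 436, 720, 355], digits=3)
    for p in passes:
        print(p)
```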
01:00:05.533 --> 01:00:08.097
And you can start to see why
this is working.
01:00:08.097 --> 01:00:11.417
When I have equal elements
here, I have already sorted the
01:00:11.417 --> 01:00:13.398
suffix.
Let's write down a proof of
01:00:13.398 --> 01:00:15.029
that.
What is nice about this
01:00:15.029 --> 01:00:17.650
algorithm is we're not
partitioning into bins.
01:00:17.650 --> 01:00:20.970
We always keep the huge batch
of elements in one big pile,
01:00:20.970 --> 01:00:23.650
but we're just going through it
multiple times.
01:00:23.650 --> 01:00:27.087
In general, we sort of need to
go through it multiple times.
01:00:27.087 --> 01:00:32.006
Hopefully not too many times.
But let's first argue
01:00:32.006 --> 01:00:36.019
correctness.
To analyze the running time is
01:00:36.019 --> 01:00:41.751
a little bit tricky here because
it depends how you partition
01:00:41.751 --> 01:00:44.808
into digits.
Correctness is easy.
01:00:44.808 --> 01:00:50.159
We just induct on the digit
position that we're currently
01:00:50.159 --> 01:00:55.891
sorting, so let's call that t.
And we can assume by induction
01:00:55.891 --> 01:01:02.656
that it's sorted beyond digit t.
This is our induction
01:01:02.656 --> 01:01:07.841
hypothesis.
We assume that we're sorted on
01:01:07.841 --> 01:01:14.924
the low-order t - 1 digits.
And then the next thing we do
01:01:14.924 --> 01:01:21.501
is sort on the t-th digit.
We just need to check that
01:01:21.501 --> 01:01:26.561
things work.
And we restore the induction
01:01:26.561 --> 01:01:32.000
hypothesis for t instead of t -
1.
01:01:32.000 --> 01:01:36.009
When we sort on the t-th digit
there are two cases.
01:01:36.009 --> 01:01:40.981
If we look at any two elements,
we want to know whether they're
01:01:40.981 --> 01:01:45.150
put in the right order.
If two elements are the same,
01:01:45.150 --> 01:01:49.000
let's say they have the same
t-th digit --
01:01:58.000 --> 01:02:02.000
This is the tricky case.
If they have the same t-th
01:02:02.000 --> 01:02:05.519
digit then their order should
not be changed.
01:02:05.519 --> 01:02:09.360
So, by stability,
we know that they remain in the
01:02:09.360 --> 01:02:14.400
same order because stability is
supposed to preserve things that
01:02:14.400 --> 01:02:17.519
have the same key that we're
sorting on.
01:02:17.519 --> 01:02:21.920
And then, by the induction
hypothesis, we know that that
01:02:21.920 --> 01:02:26.239
keeps them in sorted order
because induction hypothesis
01:02:26.239 --> 01:02:30.000
says that they used to be
sorted.
01:02:30.000 --> 01:02:35.369
Adding on this value in the
front that's the same in both
01:02:35.369 --> 01:02:39.684
doesn't change anything so they
remain sorted.
01:02:39.684 --> 01:02:44.000
And if they have differing t-th
digits --
01:02:54.000 --> 01:03:00.000
-- then this sorting step will
put them in the right order.
01:03:00.000 --> 01:03:03.189
Because that's what sorting
does.
01:03:03.189 --> 01:03:08.870
This is the most significant
digit, so you've got to order
01:03:08.870 --> 01:03:12.558
them by the t-th digit if they
differ.
01:03:12.558 --> 01:03:17.840
The rest are irrelevant.
So, proof here of correctness
01:03:17.840 --> 01:03:22.026
is very simple once you know the
algorithm.
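A minimal sketch of the argument above, assuming decimal digits and using Python's built-in `sorted` (which is stable) for each pass. The names `digit`, `lsd_sort`, and `lsd_sort_unstable` are made up for this illustration. The second function deliberately breaks stability, to show that the "same t-th digit" case really does need it.

```python
def digit(x, t, base=10):
    """Return digit t of x, where t = 0 is the least significant digit."""
    return (x // base ** t) % base


def lsd_sort(nums, num_digits, base=10):
    """Sort on digit 0, then digit 1, and so on, with a stable sort per pass."""
    out = list(nums)
    for t in range(num_digits):
        # sorted() is stable: elements with equal digit t keep their order,
        # so the order established on the lower digits survives this pass.
        out = sorted(out, key=lambda x: digit(x, t, base))
    return out


def lsd_sort_unstable(nums, num_digits, base=10):
    """Same passes, but each pass flips the order of elements with equal digits."""
    out = list(nums)
    for t in range(num_digits):
        # Reversing before a stable sort makes ties come out in reverse order,
        # which is what a non-stable per-digit sort is allowed to do.
        out = sorted(reversed(out), key=lambda x: digit(x, t, base))
    return out


print(lsd_sort([329, 457, 657, 839, 436, 720, 355], 3))
print(lsd_sort_unstable([21, 22], 2))  # 21 and 22 tie on the top digit
```

In the second call the two numbers agree on the high digit, so the last pass has to keep the order from the earlier pass; without stability it comes out as 22 before 21.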
01:03:22.026 --> 01:03:25.514
Any questions before we go on?
Good.
01:03:25.514 --> 01:03:30.000
We're going to use counting
sort.
01:03:30.000 --> 01:03:30.344
We could use any sorting
algorithm we want for individual
01:03:30.344 --> 01:03:30.713
digits, but the only algorithm
that we know that runs in less
01:03:30.713 --> 01:03:30.916
than n lg n time is counting
sort.
01:03:30.916 --> 01:03:31.267
So, we better use that one to
sort of bootstrap and get an
01:03:31.267 --> 01:03:31.501
even faster and more general
algorithm.
01:03:31.501 --> 01:03:31.883
I just erased the running time.
Counting sort runs in order k +
01:03:31.883 --> 01:03:36.003
n time.
We need to remember that.
01:03:36.003 --> 01:03:44.329
And the range of the numbers is
1 to k or 0 to k - 1.
01:03:44.329 --> 01:03:53.616
When we sort by a particular
digit, we shouldn't use an n lg n
01:03:53.616 --> 01:04:02.743
algorithm because then this
thing will take n lg n for one
01:04:02.743 --> 01:04:09.788
round and it's going to have
multiple rounds.
01:04:09.788 --> 01:04:15.552
That's going to be worse than n
lg n.
01:04:15.552 --> 01:04:25.000
We're going to use counting
sort for each round.
01:04:32.000 --> 01:04:34.931
We use counting sort for each
digit.
01:04:34.931 --> 01:04:40.125
And we know the running time of
counting sort here is order k +
01:04:40.125 --> 01:04:42.973
n .
But I don't want to assume that
01:04:42.973 --> 01:04:46.324
my integers are split into
digits for me.
01:04:46.324 --> 01:04:50.261
That's sort of giving away too
much flexibility.
01:04:50.261 --> 01:04:55.287
Because if I have some number
written in whatever form it is,
01:04:55.287 --> 01:05:00.062
probably written in binary,
I can cluster together some of
01:05:00.062 --> 01:05:04.000
those bits and call that a
digit.
01:05:04.000 --> 01:05:07.415
Let's think of our numbers as
binary.
01:05:07.415 --> 01:05:12.442
Suppose we have n integers.
And they're in some range.
01:05:12.442 --> 01:05:16.901
And we want to know how big a
range they can be.
01:05:16.901 --> 01:05:21.264
Let's say, a sort of practical
way of thinking,
01:05:21.264 --> 01:05:26.577
you know, we're in a binary
world, each integer is b bits
01:05:26.577 --> 01:05:29.774
long.
So, in other words,
01:05:29.774 --> 01:05:35.283
the range is from 0 to 2^b - 1.
I will assume that my numbers
01:05:35.283 --> 01:05:39.765
are non-negative.
It doesn't make much difference
01:05:39.765 --> 01:05:42.006
if they're negative,
too.
01:05:42.006 --> 01:05:47.515
I want to know how big a b I
can handle, but I don't want to
01:05:47.515 --> 01:05:52.650
split into bits as my digits
because then I would have b
01:05:52.650 --> 01:05:59.000
digits and I would have to do b
rounds of this algorithm.
01:05:59.000 --> 01:06:02.839
The number of rounds of this
algorithm is the number of
01:06:02.839 --> 01:06:05.754
digits that I have.
And each one costs me,
01:06:05.754 --> 01:06:08.598
let's hope, for linear time.
And, indeed,
01:06:08.598 --> 01:06:10.589
if I use a single bit,
k = 2.
01:06:10.589 --> 01:06:14.428
And so this is order n.
But then the running time would
01:06:14.428 --> 01:06:17.557
be order n per round.
And there are b digits,
01:06:17.557 --> 01:06:21.183
if I consider them to be bits,
order n times b time.
01:06:21.183 --> 01:06:24.240
And even if b is something
small like log n,
01:06:24.240 --> 01:06:27.866
if I have log n bits,
then these are numbers between
01:06:27.866 --> 01:06:32.549
0 and n - 1.
I already know how to sort
01:06:32.549 --> 01:06:36.666
numbers between 0 and n - 1 in
linear time.
01:06:36.666 --> 01:06:41.372
Here I'm spending n lg n time,
so that's no good.
01:06:41.372 --> 01:06:47.549
Instead, what we're going to do
is take a bunch of bits and call
01:06:47.549 --> 01:06:51.470
that a digit,
the most bits we can handle
01:06:51.470 --> 01:06:56.078
with counting sort.
The notation will be I split
01:06:56.078 --> 01:07:01.846
each integer into b/r digits.
Each r bits long.
01:07:01.846 --> 01:07:06.630
In other words,
I think of my number as being
01:07:06.630 --> 01:07:11.086
in base 2^r.
And I happen to be writing it
01:07:11.086 --> 01:07:15.869
down in binary,
but I cluster together r bits
01:07:15.869 --> 01:07:20.108
and I get a bunch of digits in
base 2^r.
01:07:20.108 --> 01:07:26.195
And then there are b/ r digits.
This b/r is the number of
01:07:26.195 --> 01:07:30.000
rounds.
And this base --
01:07:30.000 --> 01:07:34.104
This is the maximum value I
have in one of these digits.
01:07:34.104 --> 01:07:37.537
It's between 0 and 2^r.
This is, in some sense,
01:07:37.537 --> 01:07:40.000
k for a run of counting sort.
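Here is one way this could look in code, as a sketch rather than the lecture's own implementation. It assumes non-negative integers of at most b bits; the function names `counting_sort_by_digit` and `radix_sort` are chosen for this example. Each round is a stable counting sort on one r-bit digit, so k = 2^r.

```python
def counting_sort_by_digit(nums, shift, r):
    """Stable counting sort of nums on the r-bit digit starting at bit `shift`."""
    k = 1 << r                 # number of possible digit values, k = 2^r
    mask = k - 1
    count = [0] * k
    for x in nums:
        count[(x >> shift) & mask] += 1
    # Turn counts into starting positions: count[d] = first output index for digit d.
    total = 0
    for d in range(k):
        count[d], total = total, total + count[d]
    out = [0] * len(nums)
    for x in nums:             # left-to-right scan keeps equal digits in order
        d = (x >> shift) & mask
        out[count[d]] = x
        count[d] += 1
    return out


def radix_sort(nums, b, r):
    """Sort non-negative b-bit integers using ceil(b / r) rounds of counting sort."""
    for shift in range(0, b, r):   # least significant digit first
        nums = counting_sort_by_digit(nums, shift, r)
    return nums


data = [3735928559, 0, 255, 256, 65535, 4294967295, 12345, 255]
print(radix_sort(data, b=32, r=8))
```

Each round costs order n + 2^r, and the loop over `shift` runs once per digit, which is where the b/r factor comes from.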
01:07:49.000 --> 01:07:54.673
What is the running time?
Well, I have b/r rounds.
01:07:54.673 --> 01:08:00.000
It's b/r times the running time
for a round.
01:08:00.000 --> 01:08:05.830
Which I have n numbers and my
value of k is 2^r.
01:08:05.830 --> 01:08:10.917
This is the running time of
counting sort,
01:08:10.917 --> 01:08:18.236
n + k, this is the number of
rounds, so this is b/r (n+2^r).
01:08:18.236 --> 01:08:23.198
And I am free to choose r
however I want.
01:08:23.198 --> 01:08:30.145
What I would like to do is
minimize this run time over my
01:08:30.145 --> 01:08:35.703
choices of r.
Any suggestions on how I might
01:08:35.703 --> 01:08:40.303
find the minimum running time
over all choices of r?
01:08:40.303 --> 01:08:44.000
Techniques, not necessarily
solutions.
01:08:53.000 --> 01:08:55.488
We're not used to this because
it's asymptotic,
01:08:55.488 --> 01:08:58.288
but forget the big O here.
How do I minimize a function
01:08:58.288 --> 01:09:01.336
with respect to one variable?
Take the derivative,
01:09:01.336 --> 01:09:03.541
yeah.
I can take the derivative of
01:09:03.541 --> 01:09:06.080
this function by r,
differentiate by r,
01:09:06.080 --> 01:09:10.022
set the derivative equal to 0,
and that should be a critical
01:09:10.022 --> 01:09:13.496
point in this function.
It turns out this function is
01:09:13.496 --> 01:09:16.368
unimodal in r and you will find
the minimum.
01:09:16.368 --> 01:09:19.510
We could do that.
I'm not going to do it because
01:09:19.510 --> 01:09:23.385
it takes a little bit more work.
You should try it at home.
01:09:23.385 --> 01:09:27.059
It will give you the exact
minimum, which is good if you
01:09:27.059 --> 01:09:32.283
know what this constant is.
Differentiate with respect to r
01:09:32.283 --> 01:09:35.305
and set to 0.
I am going to do it a little
01:09:35.305 --> 01:09:39.063
bit more intuitively,
in other words less precisely,
01:09:39.063 --> 01:09:41.788
but I will still get the right
answer.
01:09:41.788 --> 01:09:46.210
And definitely I will get an
upper bound because I can choose
01:09:46.210 --> 01:09:50.115
r to be whatever I want.
It turns out this will be the
01:09:50.115 --> 01:09:53.210
right answer.
Let's just think about growth
01:09:53.210 --> 01:09:56.526
in terms of r.
There are essentially two terms
01:09:56.526 --> 01:10:00.024
here.
I have b/r(n) and I have
01:10:00.024 --> 01:10:03.315
b/r(2^r).
Now, b/r(n) would like r to be
01:10:03.315 --> 01:10:07.364
as big as possible.
As r gets bigger, the number of
01:10:07.364 --> 01:10:10.992
rounds goes down.
This number in front of n,
01:10:10.992 --> 01:10:16.138
this coefficient in front of n
goes down, so I would like r to
01:10:16.138 --> 01:10:18.669
be big.
So, b/r(n) wants r big.
01:10:18.669 --> 01:10:23.478
However, r cannot be too big.
This is saying I want digits
01:10:23.478 --> 01:10:28.540
that have a lot of bits in them.
It cannot be too big because
01:10:28.540 --> 01:10:34.465
there's 2^r term out here.
If this happens to be bigger
01:10:34.465 --> 01:10:39.220
than n then this will dominate
in terms of growth of r.
01:10:39.220 --> 01:10:43.182
This is going to be b times 2
to the r over r.
01:10:43.182 --> 01:10:46.264
2 to the r is much,
much bigger than r,
01:10:46.264 --> 01:10:50.490
so it's going to grow much
faster is what I mean.
01:10:50.490 --> 01:10:55.949
And so I really don't want r to
be too big for this other term.
01:10:55.949 --> 01:11:00.000
So, that is b/r(2^r) wants r
small.
01:11:00.000 --> 01:11:06.684
Provided that this term is
bigger or equal to this term
01:11:06.684 --> 01:11:11.758
then I can set r pretty big for
that term.
01:11:11.758 --> 01:11:16.710
What I want is the n to
dominate the 2^r.
01:11:16.710 --> 01:11:23.641
Provided I have that then I can
set r as large as I want.
01:11:23.641 --> 01:11:30.697
Let's say I want to choose r to
be maximum subject to this
01:11:30.697 --> 01:11:38.000
condition that n is greater than
or equal to 2^r.
01:11:38.000 --> 01:11:42.291
This is an upper bound on 2^r,
and so an upper bound on r.
01:11:42.291 --> 01:11:44.899
In other words,
I want r = lg n.
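To see the tradeoff numerically, here is a small sketch that plugs values of r into the cost expression (b/r)(n + 2^r) with the constant dropped. The helper names `radix_cost` and `best_r` and the sample values of b and n are assumptions for illustration, and r is restricted to whole numbers of bits.

```python
import math


def radix_cost(b, n, r):
    """The quantity (b / r) * (n + 2^r), ignoring the constant hidden in the big O."""
    return (b / r) * (n + 2 ** r)


def best_r(b, n):
    """Brute-force search for the integer r in 1..b that minimizes radix_cost."""
    return min(range(1, b + 1), key=lambda r: radix_cost(b, n, r))


b, n = 64, 1 << 16            # sample values: 64-bit keys, 65536 of them
lg_n = int(math.log2(n))
r_star = best_r(b, n)

print("best r:", r_star, "cost:", radix_cost(b, n, r_star))
print("r = lg n:", lg_n, "cost:", radix_cost(b, n, lg_n))
```

For r at most lg n the cost is at least (b / lg n) n, and past lg n the 2^r term takes over, so the cost at r = lg n, which is 2bn / lg n, is within a factor of 2 of the brute-force minimum.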
01:11:44.899 --> 01:11:49.948
This turns out to be the right
answer up to constant factors.
01:11:49.948 --> 01:11:53.566
There we go.
And definitely choosing r to be
01:11:53.566 --> 01:11:58.951
lg n will give me an upper bound
on the best running time I could
01:11:58.951 --> 01:12:04.000
get because I can choose it to
be whatever I want.
01:12:04.000 --> 01:12:10.564
If you differentiate you will
indeed get the same answer.
01:12:10.564 --> 01:12:15.956
This was not quite a formal
argument but close,
01:12:15.956 --> 01:12:21.699
because the big O is all about
what grows fastest.
01:12:21.699 --> 01:12:26.036
If we plug in r = lg n we get
bn/lg n.
01:12:26.036 --> 01:12:31.780
The n and the 2^r are equal,
that's a factor of 2,
01:12:31.780 --> 01:12:38.704
2 times n, not a big deal.
It comes out into the O.
01:12:38.704 --> 01:12:44.788
We have bn/lg n which is r.
We have to think about what
01:12:44.788 --> 01:12:49.859
this means and translate it in
terms of range.
01:12:49.859 --> 01:12:56.957
b was the number of bits in our
number, which corresponds to the
01:12:56.957 --> 01:13:03.417
range of the number.
I've got 20 minutes under so
01:13:03.417 --> 01:13:08.543
far in lecture so I can go 20
minutes over,
01:13:08.543 --> 01:13:11.228
right?
No, I'm kidding.
01:13:11.228 --> 01:13:15.988
Almost done.
Let's say that our numbers,
01:13:15.988 --> 01:13:21.724
our integers, are in the range,
we have 0 to 2^b,
01:13:21.724 --> 01:13:26.606
I'm going to say that it's
range 0 to n^d.
01:13:26.606 --> 01:13:33.449
This should be a -1 here.
If I have numbers that are
01:13:33.449 --> 01:13:38.632
between 0 and n^d - 1 where d is
a constant or d is some
01:13:38.632 --> 01:13:42.306
parameter, so this is a
polynomial in n,
01:13:42.306 --> 01:13:45.604
then you work out this running
time.
01:13:45.604 --> 01:13:49.844
It is order dn.
This is the way to think about
01:13:49.844 --> 01:13:54.179
it because now we can compare to
counting sort.
01:13:54.179 --> 01:13:59.644
Counting sort could handle 0 up
to some constant times n in
01:13:59.644 --> 01:14:04.501
linear time.
Now I can handle 0 up to n to
01:14:04.501 --> 01:14:07.434
some constant power in linear
time.
01:14:07.434 --> 01:14:12.178
This is if d = order 1 then we
get a linear time sorting
01:14:12.178 --> 01:14:15.543
algorithm.
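Spelled out as a worked step, using the same b, r, n, and d as above: numbers below n^d need b = d lg n bits, and plugging r = lg n into the running time gives

```latex
\begin{align*}
b &= \lg\left(n^{d}\right) = d \lg n, \\
T(n) &= O\!\left(\frac{b}{r}\left(n + 2^{r}\right)\right) \quad \text{with } r = \lg n \\
     &= O\!\left(\frac{d \lg n}{\lg n}\,(n + n)\right) = O(dn).
\end{align*}
```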
And that is cool as long as d
01:14:15.543 --> 01:14:19.511
is at most lg n.
As long as your numbers are at
01:14:19.511 --> 01:14:24.255
most n^(lg n) then we have
something that beats our n lg n
01:14:24.255 --> 01:14:29.000
sorting algorithms.
And this is pretty nice.
01:14:29.000 --> 01:14:33.099
Whenever you know that your
numbers are order log n bits
01:14:33.099 --> 01:14:36.048
long we are happy,
and you get some smooth
01:14:36.048 --> 01:14:37.990
tradeoff there.
For example,
01:14:37.990 --> 01:14:42.018
if we have our 32 bit numbers
and we split into let's say
01:14:42.018 --> 01:14:46.262
eight bit chunks then we'll only
have to do four rounds each
01:14:46.262 --> 01:14:49.570
linear time and we have just 256
working space.
01:14:49.570 --> 01:14:52.735
We were doing four rounds for
32 bit numbers.
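As a concrete picture of that split, here is a short sketch that chops one 32-bit number into four 8-bit digits, least significant first. The function name `split_digits` is made up for this example.

```python
def split_digits(x, b=32, r=8):
    """Split a b-bit non-negative integer into r-bit digits, lowest digit first."""
    mask = (1 << r) - 1        # 0xFF when r = 8, so each digit is in 0..255
    return [(x >> shift) & mask for shift in range(0, b, r)]


digits = split_digits(0x12345678)
print([hex(d) for d in digits])   # four digits, one per round of counting sort
print(len(digits), 1 << 8)        # number of rounds, and number of counters per round
```

Each digit is the key for one round of counting sort, and the 256 counters are the working space mentioned above.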
01:14:52.735 --> 01:14:56.835
If you use an n lg n algorithm,
you're going to be doing lg n
01:14:56.835 --> 01:15:00.941
rounds through your numbers.
n is like 2000,
01:15:00.941 --> 01:15:03.515
and that's at least 11 rounds
for example.
01:15:03.515 --> 01:15:07.281
You would think this algorithm
is going to be much faster for
01:15:07.281 --> 01:15:09.038
small numbers.
Unfortunately,
01:15:09.038 --> 01:15:11.612
counting sort is not very good
on a cache.
01:15:11.612 --> 01:15:14.311
In practice,
radix sort is not that fast an
01:15:14.311 --> 01:15:17.199
algorithm unless your numbers
are really small.
01:15:17.199 --> 01:15:19.584
Something like quicksort can do
better.
01:15:19.584 --> 01:15:22.660
It's sort of a shame,
but theoretically this is very
01:15:22.660 --> 01:15:25.045
beautiful.
And there are contexts where
01:15:25.045 --> 01:15:29.000
this is really the right way to
sort things.
01:15:29.000 --> 01:15:34.352
I will mention finally that if
you have arbitrary integers that
01:15:34.352 --> 01:15:39.100
are one word length long.
Here we're assuming that there
01:15:39.100 --> 01:15:44.280
are b bits in a word and we have
some dependence indirectly on b
01:15:44.280 --> 01:15:46.093
here.
But, in general,
01:15:46.093 --> 01:15:51.100
if you have a bunch of integers
and they're one word length
01:15:51.100 --> 01:15:55.589
long, and you can manipulate a
word in constant time,
01:15:55.589 --> 01:16:00.597
then the best algorithm we know
for sorting runs in n times
01:16:00.597 --> 01:16:05.000
square root of lg lg n time
expected.
01:16:05.000 --> 01:16:08.719
It is a randomized algorithm.
We're not going to cover that
01:16:08.719 --> 01:16:11.798
algorithm in this class.
It's rather complicated.
01:16:11.798 --> 01:16:15.068
I didn't even cover it in
Advanced Algorithms when I
01:16:15.068 --> 01:16:17.570
taught it.
If you want something easier,
01:16:17.570 --> 01:16:21.289
you can get n times lg lg n
time worst-case.
01:16:21.289 --> 01:16:23.406
And that paper is almost
readable.
01:16:23.406 --> 01:16:26.035
I have taught that in Advanced
Algorithms.
01:16:26.035 --> 01:16:28.729
If you're interested in this
kind of stuff,
01:16:28.729 --> 01:16:32.000
take Advanced Algorithms next
fall.
01:16:32.000 --> 01:16:34.552
It's one of the follow-ons to
this class.
01:16:34.552 --> 01:16:38.317
These are much more complicated
algorithms, but it gives you
01:16:38.317 --> 01:16:40.870
some sense.
You can even break out of the
01:16:40.870 --> 01:16:43.742
dependence on b,
as long as you know that b is
01:16:43.742 --> 01:16:46.486
at most a word.
And I will stop there unless
01:16:46.486 --> 01:16:49.000
there are any questions.
Then see you Wednesday.