WEBVTT

00:00:00.070 --> 00:00:01.770
The following
content is provided

00:00:01.770 --> 00:00:04.010
under a Creative
Commons license.

00:00:04.010 --> 00:00:06.860
Your support will help MIT
OpenCourseWare continue

00:00:06.860 --> 00:00:10.720
to offer high quality
educational resources for free.

00:00:10.720 --> 00:00:13.330
To make a donation or
view additional materials

00:00:13.330 --> 00:00:17.209
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.209 --> 00:00:17.834
at ocw.mit.edu.

00:00:21.141 --> 00:00:23.390
VICTOR COSTAN: Any questions
about the sorting methods

00:00:23.390 --> 00:00:27.570
that you want me to go over
in that while I revise?

00:00:32.440 --> 00:00:34.744
OK.

00:00:34.744 --> 00:00:35.535
All right, sorting.

00:00:40.520 --> 00:00:44.030
What sorting methods
have we learned?

00:00:44.030 --> 00:00:46.395
Let's start from
dumbest to smartest.

00:00:46.395 --> 00:00:47.700
AUDIENCE: Merge sorting.

00:00:47.700 --> 00:00:50.550
VICTOR COSTAN: OK,
somewhere in the middle.

00:00:50.550 --> 00:00:52.020
Merge sort isn't very bad.

00:00:52.020 --> 00:00:54.401
What's the easiest
method to sort?

00:00:54.401 --> 00:00:55.234
AUDIENCE: Insertion.

00:00:58.120 --> 00:00:59.370
VICTOR COSTAN: Insertion sort.

00:00:59.370 --> 00:01:00.370
Excellent.

00:01:00.370 --> 00:01:00.930
All right.

00:01:00.930 --> 00:01:01.520
What else?

00:01:06.730 --> 00:01:07.230
Heapsort.

00:01:11.070 --> 00:01:11.620
And?

00:01:11.620 --> 00:01:14.239
I gave two away now.

00:01:14.239 --> 00:01:15.030
AUDIENCE: Counting.

00:01:17.772 --> 00:01:18.980
VICTOR COSTAN: Counting sort.

00:01:18.980 --> 00:01:20.190
Very good.

00:01:20.190 --> 00:01:20.690
And?

00:01:25.800 --> 00:01:26.300
Oh, wow.

00:01:26.300 --> 00:01:29.130
If you don't even
have the name of it.

00:01:29.130 --> 00:01:32.120
So the last one is radix sort.

00:01:32.120 --> 00:01:35.520
What are the running
times for these three

00:01:35.520 --> 00:01:36.555
that you guys remember?

00:01:40.191 --> 00:01:44.530
AUDIENCE: Insertion sort
is linearly one more.

00:01:44.530 --> 00:01:45.030
It's bad.

00:01:45.030 --> 00:01:47.696
VICTOR COSTAN: I want to see our
pseudocode for insertion sort.

00:01:47.696 --> 00:01:49.915
AUDIENCE: n squared.

00:01:49.915 --> 00:01:52.390
AUDIENCE: Now that's really bad.

00:01:52.390 --> 00:01:55.430
VICTOR COSTAN: So linear is as
good as you could possibly get.

00:01:55.430 --> 00:01:58.770
So sorting takes an
array of random stuff

00:01:58.770 --> 00:02:01.510
and outputs an array of
things in a sorted order.

00:02:01.510 --> 00:02:05.170
The array is size n, so it has
to output an array of size n.

00:02:05.170 --> 00:02:07.680
If you can do an algorithm
that runs in order n time,

00:02:07.680 --> 00:02:09.979
then that's the best you
could possibly accomplish,

00:02:09.979 --> 00:02:12.340
because you have to
output n elements.

00:02:12.340 --> 00:02:14.790
So the best possible time
you could get for sorting

00:02:14.790 --> 00:02:17.070
is theta of n.

00:02:17.070 --> 00:02:17.570
All right.

00:02:17.570 --> 00:02:18.573
How about merge sort?

00:02:21.351 --> 00:02:22.549
AUDIENCE: [INAUDIBLE].

00:02:22.549 --> 00:02:23.590
VICTOR COSTAN: Thank you.

00:02:26.260 --> 00:02:28.171
Heapsort.

00:02:28.171 --> 00:02:30.506
AUDIENCE: Order h.

00:02:30.506 --> 00:02:32.850
Order h is log n.

00:02:32.850 --> 00:02:34.680
VICTOR COSTAN: Order
h where h is log n.

00:02:34.680 --> 00:02:35.340
OK.

00:02:35.340 --> 00:02:37.900
And you're missing a factor.

00:02:37.900 --> 00:02:41.300
So a heap operation takes
order h, which is log n.

00:02:41.300 --> 00:02:43.540
So if I have to insert
a number in a heap

00:02:43.540 --> 00:02:46.750
or extract a number from
a heap, that's log n.

00:02:46.750 --> 00:02:51.765
In order to sort an array,
how many insertions do I do?

00:02:51.765 --> 00:02:54.140
AUDIENCE: I think--
now I don't know.

00:02:54.140 --> 00:02:55.090
VICTOR COSTAN: OK.

00:02:55.090 --> 00:02:56.580
Wild guess.

00:02:56.580 --> 00:02:57.390
AUDIENCE: n.

00:02:57.390 --> 00:02:58.610
VICTOR COSTAN: Very good.

00:02:58.610 --> 00:03:00.560
See, there you go.

00:03:00.560 --> 00:03:04.240
So you need to insert all
your numbers in a heap

00:03:04.240 --> 00:03:05.740
and then extract
them one by one.

00:03:05.740 --> 00:03:07.750
And you will get them
in the correct order

00:03:07.750 --> 00:03:09.360
that gives you the
sorted results.

00:03:09.360 --> 00:03:12.440
So n log n.
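
NOTE
A minimal Python sketch of the insert-everything-then-extract idea described above.
It assumes the standard library heapq module as the heap, and heap_sort is a
hypothetical name. It builds a separate heap list, so it shows the n log n
operation count, not the in-place variant of heapsort.
```python
import heapq
def heap_sort(values):
    """Sort by pushing every key onto a min-heap, then popping them in order."""
    heap = []
    for value in values:
        # n insertions, each O(log n)
        heapq.heappush(heap, value)
    # n extractions, each O(log n); the smallest remaining key comes out first
    return [heapq.heappop(heap) for _ in range(len(heap))]
print(heap_sort([4, 1, 3, 2, 3]))  # prints [1, 2, 3, 3, 4]
```
Two passes of n heap operations at O(log n) each give the n log n total.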

00:03:12.440 --> 00:03:16.240
Does anyone remember what's
special about these three

00:03:16.240 --> 00:03:19.628
sorting methods that does
not apply to the other two?

00:03:19.628 --> 00:03:23.942
AUDIENCE: They're in place.

00:03:23.942 --> 00:03:25.900
VICTOR COSTAN: Merge sort
isn't quite in place.

00:03:25.900 --> 00:03:29.000
If it would be in place,
it would be perfect.

00:03:29.000 --> 00:03:32.260
There is actually a way of
making in place merge sort,

00:03:32.260 --> 00:03:35.820
but it requires a PhD
degree to understand that.

00:03:35.820 --> 00:03:40.120
So we will not cover it in 6.006,
because I do not understand it.

00:03:40.120 --> 00:03:42.230
So I couldn't explain it.

00:03:42.230 --> 00:03:44.120
So merge sort is
not quite in place.

00:03:44.120 --> 00:03:45.640
Which one is in place?

00:03:49.951 --> 00:03:51.337
AUDIENCE: Heapsort.

00:03:51.337 --> 00:03:52.170
VICTOR COSTAN: Good.

00:03:52.170 --> 00:03:54.210
So heapsort is in place.

00:03:54.210 --> 00:03:56.000
Merge sort is not in place.

00:03:56.000 --> 00:03:58.890
And insertion sort
is really slow,

00:03:58.890 --> 00:04:00.800
so we don't care
that much about it.

00:04:00.800 --> 00:04:04.080
So what's special
about these three

00:04:04.080 --> 00:04:05.550
that does not
apply to these two?

00:04:11.200 --> 00:04:13.320
AUDIENCE: You don't
have to use integers.

00:04:13.320 --> 00:04:14.070
VICTOR COSTAN: OK.

00:04:14.070 --> 00:04:15.720
You don't have to use integers.

00:04:15.720 --> 00:04:17.519
What do you need
to know instead

00:04:17.519 --> 00:04:19.149
about the things you use?

00:04:19.149 --> 00:04:20.303
So we'll call them keys.

00:04:20.303 --> 00:04:22.219
AUDIENCE: You need to
be able to compare them.

00:04:22.219 --> 00:04:22.674
VICTOR COSTAN: All right.

00:04:22.674 --> 00:04:24.729
AUDIENCE: You don't
need to have a minimum

00:04:24.729 --> 00:04:27.806
and a maximum integer.

00:04:27.806 --> 00:04:30.430
VICTOR COSTAN: So turns out, if
you have a comparison operator,

00:04:30.430 --> 00:04:32.720
you will have a
minimum and a maximum.

00:04:32.720 --> 00:04:35.130
But that's complex
abstract algebra

00:04:35.130 --> 00:04:37.230
that we don't need
to worry about.

00:04:37.230 --> 00:04:39.060
So you gave me the
good answer, which

00:04:39.060 --> 00:04:42.915
is we use something
called a comparison model.

00:04:45.870 --> 00:04:47.970
And in that model,
you do not need

00:04:47.970 --> 00:04:49.610
to know too much
about your keys.

00:04:49.610 --> 00:04:52.640
So the elements in the
array that you're sorting.

00:04:52.640 --> 00:04:54.000
Your keys are blobs.

00:04:54.000 --> 00:04:55.750
And all they have
to be able to do

00:04:55.750 --> 00:04:57.380
is know-- if you
have two of them--

00:04:57.380 --> 00:05:01.670
you have to know
which one's greater.

00:05:01.670 --> 00:05:02.190
That's it.

00:05:02.190 --> 00:05:03.820
Nothing else.

00:05:03.820 --> 00:05:05.720
What's the problem with
the comparison model?

00:05:09.024 --> 00:05:11.384
AUDIENCE: It takes
time to compare things.

00:05:11.384 --> 00:05:12.800
It's like with everything.

00:05:12.800 --> 00:05:13.633
VICTOR COSTAN: Yeah.

00:05:16.192 --> 00:05:17.650
So we learned in
lecture that there

00:05:17.650 --> 00:05:20.260
is a lower bound for
the comparison model.

00:05:20.260 --> 00:05:23.720
And if you want to sort using
nothing but this information,

00:05:23.720 --> 00:05:28.680
that will take you at
least n log n time.

00:05:28.680 --> 00:05:31.310
You cannot do better than n
log n if all you're using is

00:05:31.310 --> 00:05:33.020
comparisons.

00:05:33.020 --> 00:05:37.260
So in that respect, merge sort
and heap sort are optimal.

00:05:37.260 --> 00:05:39.170
If you want to stay
within this model,

00:05:39.170 --> 00:05:42.290
this is the best time
you're going to get.

00:05:42.290 --> 00:05:46.670
Does anyone know how you can
implement this comparison model

00:05:46.670 --> 00:05:47.730
in Python?

00:05:47.730 --> 00:05:51.190
So numbers respond to
these operators, right?

00:05:51.190 --> 00:05:54.020
Actually, in Python
this is equals equals.

00:05:54.020 --> 00:05:56.080
What if I have a
random object and I

00:05:56.080 --> 00:05:58.630
want to make it respond
to these operators?

00:05:58.630 --> 00:06:00.370
So for example, I
write merge sort.

00:06:00.370 --> 00:06:01.840
We wrote merge sort.

00:06:01.840 --> 00:06:03.952
And now I have my own
objects, my own keys

00:06:03.952 --> 00:06:05.410
which are not
necessarily integers,

00:06:05.410 --> 00:06:07.004
because that's why we like this.

00:06:07.004 --> 00:06:09.170
And we want to make them
respond to these operators.

00:06:09.170 --> 00:06:11.490
So I can call merge
sort on an array of them

00:06:11.490 --> 00:06:13.240
and it won't crash.

00:06:13.240 --> 00:06:16.042
What do I have to do?

00:06:16.042 --> 00:06:18.432
AUDIENCE: I mean, you
have to give the keys

00:06:18.432 --> 00:06:21.300
values that can be compared.

00:06:21.300 --> 00:06:23.430
VICTOR COSTAN: So suppose
this is my key class.

00:06:27.254 --> 00:06:31.119
AUDIENCE: There's,
like, the lt, and gt.

00:06:31.119 --> 00:06:32.160
VICTOR COSTAN: All right.

00:06:32.160 --> 00:06:34.360
There's a magical
method in Python.

00:06:34.360 --> 00:06:36.410
So there is the
old school model,

00:06:36.410 --> 00:06:41.480
which you might see in
legacy code, which only works

00:06:41.480 --> 00:06:45.000
in Python 2.x, which is you
define the method called

00:06:45.000 --> 00:06:52.480
cmp that takes self and other.

00:06:52.480 --> 00:06:54.900
And it has to return
a number that's

00:06:54.900 --> 00:06:58.780
either smaller than zero, equal
to zero, or greater than zero.

00:06:58.780 --> 00:07:01.660
And this maps to this.

00:07:04.590 --> 00:07:06.130
So you'll see this in old code.

00:07:06.130 --> 00:07:08.340
But you shouldn't
use it in new code.

00:07:08.340 --> 00:07:10.940
Unless you have a
very good reason to.
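
NOTE
A small sketch of how an old-style three-way comparison can still be used from
current Python, assuming the standard library functools.cmp_to_key adapter.
compare_lengths and words are hypothetical names chosen for illustration.
```python
import functools
def compare_lengths(a, b):
    """Old-style comparison: negative, zero, or positive result."""
    return len(a) - len(b)
words = ["pear", "fig", "banana"]
# cmp_to_key wraps the three-way function so that sorted() can use it as a key
print(sorted(words, key=functools.cmp_to_key(compare_lengths)))  # prints ['fig', 'pear', 'banana']
```
The negative, zero, or positive result plays the role of less than, equal, and greater than.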

00:07:10.940 --> 00:07:14.020
Instead, the new
model says that you

00:07:14.020 --> 00:07:21.480
define special methods called
lt, which stands for less than.

00:07:21.480 --> 00:07:22.470
So it's this guy.

00:07:24.980 --> 00:07:30.100
le, which is less or equal.

00:07:30.100 --> 00:07:31.800
gt, which is greater than.

00:07:31.800 --> 00:07:34.935
And ge, which is
greater or equal.

00:07:37.630 --> 00:07:40.740
And if you look at our code
for psets two and three,

00:07:40.740 --> 00:07:43.640
we have some objects that
pretend they're keys.

00:07:43.640 --> 00:07:47.780
And we have to
define these methods.

00:07:47.780 --> 00:07:49.460
Also, when you
define these, it's

00:07:49.460 --> 00:07:56.420
a good idea to define eq
for equality comparison.

00:07:56.420 --> 00:08:01.250
And ne, which is this guy.

00:08:01.250 --> 00:08:06.620
So these also take
self and other key

00:08:06.620 --> 00:08:08.790
that you're comparing with.

00:08:08.790 --> 00:08:10.370
And they return true or false.

00:08:13.440 --> 00:08:16.930
So this will help you
understand the code better.
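
NOTE
A minimal sketch of a key class that responds to the comparison operators.
In Python source the special methods are written with double underscores
(__lt__, __le__, and so on). The class name Key and the attributes rank and
label are assumptions made for illustration, not the course's code.
```python
class Key:
    """Hypothetical key: compared by rank only; label is extra payload."""
    def __init__(self, rank, label):
        self.rank = rank
        self.label = label
    def __lt__(self, other):
        return self.rank < other.rank
    def __le__(self, other):
        return self.rank <= other.rank
    def __gt__(self, other):
        return self.rank > other.rank
    def __ge__(self, other):
        return self.rank >= other.rank
    def __eq__(self, other):
        return self.rank == other.rank
    def __ne__(self, other):
        return self.rank != other.rank
    def __repr__(self):
        return "Key(%r, %r)" % (self.rank, self.label)
keys = [Key(3, "a"), Key(1, "b"), Key(2, "c")]
print(sorted(keys))  # ordered by rank: 1, 2, 3
```
Each method takes self and the other key and returns True or False, so any comparison-based sort can use these objects.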

00:08:16.930 --> 00:08:18.680
All right, so with
relatively little work,

00:08:18.680 --> 00:08:23.450
you can have any wild object
you want act as a key.

00:08:23.450 --> 00:08:25.950
And then you have
insertion sort,

00:08:25.950 --> 00:08:31.050
merge sort, heapsort,
heaps, binary trees, AVLs.

00:08:31.050 --> 00:08:34.020
Everything works, because
everything uses the comparison

00:08:34.020 --> 00:08:35.190
model.

00:08:35.190 --> 00:08:37.600
The problem is
this n log n bound.

00:08:40.200 --> 00:08:43.890
It's not as fast as the best
possible sorting algorithm

00:08:43.890 --> 00:08:45.410
you could come up with.

00:08:45.410 --> 00:08:47.550
This is slower than this.

00:08:47.550 --> 00:08:50.250
So that's why we have to break
out of the comparison model.

00:08:50.250 --> 00:08:55.000
And we have to look into these
boxes and get more information,

00:08:55.000 --> 00:08:58.200
so that we can write
faster sorting algorithms.

00:08:58.200 --> 00:09:02.678
Does anyone remember the
running time for counting sort?

00:09:02.678 --> 00:09:04.580
AUDIENCE: [INAUDIBLE] again?

00:09:04.580 --> 00:09:05.330
VICTOR COSTAN: OK.

00:09:09.524 --> 00:09:10.650
AUDIENCE: n plus k.

00:09:10.650 --> 00:09:11.400
VICTOR COSTAN: OK.

00:09:15.720 --> 00:09:18.790
Let's remember what
counting sort looks like.

00:09:18.790 --> 00:09:27.710
Let's get this array that-- that
should be enough-- four, one,

00:09:27.710 --> 00:09:30.970
three, two, three.

00:09:30.970 --> 00:09:33.180
How do we sort it
using counting sort?

00:09:38.080 --> 00:09:43.929
AUDIENCE: We initialize an array
of all the possible values.

00:09:43.929 --> 00:09:44.970
VICTOR COSTAN: Very good.

00:09:44.970 --> 00:09:45.850
Very good.

00:09:45.850 --> 00:09:48.370
So counting sort needs to know
something about your values,

00:09:48.370 --> 00:09:48.870
right?

00:09:48.870 --> 00:09:49.870
It makes an assumption.

00:09:49.870 --> 00:09:51.630
And the assumption
is that these values

00:09:51.630 --> 00:09:57.170
are integers from 0
to, say, k minus 1.

00:09:57.170 --> 00:10:00.340
So you have k possible values.

00:10:00.340 --> 00:10:02.380
And they don't really
have to be these as long

00:10:02.380 --> 00:10:05.640
as you can map them
to these numbers.

00:10:05.640 --> 00:10:10.090
So we are going to
initialize an array.

00:10:10.090 --> 00:10:14.960
Let's say this is an array.

00:10:14.960 --> 00:10:16.760
And zero, one.

00:10:16.760 --> 00:10:20.500
So zero, one, two,
three, four, five.

00:10:23.460 --> 00:10:26.090
So we're going to
initialize it with--

00:10:26.090 --> 00:10:27.440
AUDIENCE: Oh, zeroes.

00:10:27.440 --> 00:10:28.481
VICTOR COSTAN: All right.

00:10:31.020 --> 00:10:31.990
And then?

00:10:31.990 --> 00:10:37.380
AUDIENCE: Iterate
over our list,

00:10:37.380 --> 00:10:44.620
incrementing the corresponding
value to each key in your--

00:10:44.620 --> 00:10:46.890
VICTOR COSTAN: So which
one am I incrementing here?

00:10:46.890 --> 00:10:47.360
AUDIENCE: Pardon?

00:10:47.360 --> 00:10:48.790
VICTOR COSTAN: Which one
am I incrementing here?

00:10:48.790 --> 00:10:50.081
AUDIENCE: Zero ne through four.

00:10:53.450 --> 00:10:53.950
One.

00:10:56.740 --> 00:10:59.800
VICTOR COSTAN: Three, two.

00:10:59.800 --> 00:11:00.697
And then?

00:11:00.697 --> 00:11:02.040
AUDIENCE: Three again.

00:11:02.040 --> 00:11:03.970
VICTOR COSTAN: So this becomes a two.

00:11:06.630 --> 00:11:09.380
And what do I do now?

00:11:09.380 --> 00:11:14.482
AUDIENCE: Reiterate over
that-- I don't know.

00:11:14.482 --> 00:11:16.940
I don't know what to call that
identity [INAUDIBLE] almost?

00:11:16.940 --> 00:11:20.270
OK, an array.

00:11:20.270 --> 00:11:27.510
Printing into your output array
one one, one two, two threes,

00:11:27.510 --> 00:11:28.210
one four.

00:11:28.210 --> 00:11:28.820
VICTOR COSTAN: All right.

00:11:28.820 --> 00:11:30.330
So there's no zeroes
and no fives.

00:11:30.330 --> 00:11:36.240
So one one, one two,
two threes, and one four.
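
NOTE
A minimal sketch of the counting steps walked through above, on the same
array four, one, three, two, three. It assumes the keys are integers from 0
to k minus 1. counting_sort_keys is a hypothetical name.
```python
def counting_sort_keys(keys, k):
    """Sort integers in range(k) by counting how often each value occurs."""
    counts = [0] * k                 # one counter per possible value
    for key in keys:
        counts[key] += 1             # increment the counter for this key
    output = []
    for value in range(k):
        # emit each value as many times as it was counted
        output.extend([value] * counts[value])
    return output
print(counting_sort_keys([4, 1, 3, 2, 3], 6))  # prints [1, 2, 3, 3, 4]
```
This version only reproduces key values, so it cannot carry extra data attached to a key.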

00:11:38.930 --> 00:11:39.830
OK, so far so good.

00:11:39.830 --> 00:11:42.180
This is great.

00:11:42.180 --> 00:11:44.310
There's one thing
that's missing.

00:11:44.310 --> 00:11:47.200
For counting sort and for
other sorting algorithms,

00:11:47.200 --> 00:11:50.300
we care about the
property called stability.

00:11:50.300 --> 00:11:52.780
And stability means
that if you have

00:11:52.780 --> 00:11:55.090
two equal keys, or
at least two keys

00:11:55.090 --> 00:11:56.810
that look equal to
the sorting algorithm,

00:11:56.810 --> 00:11:58.684
they might be different
objects, because they

00:11:58.684 --> 00:12:00.800
might be implementing that.

00:12:00.800 --> 00:12:02.700
The one that shows
up first in the input

00:12:02.700 --> 00:12:06.010
should also show up
first in the output.

00:12:06.010 --> 00:12:07.710
And that requires
particular care,

00:12:07.710 --> 00:12:10.267
because you can't
just look at the keys

00:12:10.267 --> 00:12:11.850
from your sorting
perspective and know

00:12:11.850 --> 00:12:13.224
which one's supposed
to go where.

00:12:13.224 --> 00:12:15.660
You have to remember where
they were in the input.

00:12:15.660 --> 00:12:19.841
So if this guy is 3a,
and this guy is 3b,

00:12:19.841 --> 00:12:21.587
I can't use this
approach anymore, right?

00:12:21.587 --> 00:12:23.420
Because when I'm
outputting here, all I know

00:12:23.420 --> 00:12:24.520
is I have to output a three.

00:12:24.520 --> 00:12:25.936
I don't have any
other information

00:12:25.936 --> 00:12:27.950
associated with the key.

00:12:27.950 --> 00:12:30.006
So instead, I have to
do something smarter.

00:12:30.006 --> 00:12:34.200
AUDIENCE: Either replace
your array with a 2-D array.

00:12:34.200 --> 00:12:36.920
Or I think better
would be to replace

00:12:36.920 --> 00:12:39.601
each value with a linked list.

00:12:39.601 --> 00:12:40.350
VICTOR COSTAN: OK.

00:12:40.350 --> 00:12:45.700
So we can replace each value
with a linked list, which

00:12:45.700 --> 00:12:48.030
would have the keys
that map to it, right.

00:12:48.030 --> 00:12:49.700
So here I would have a one.

00:12:49.700 --> 00:12:51.970
Here I would have a two.

00:12:51.970 --> 00:12:55.450
Here I would have
3a, and then 3b.

00:12:55.450 --> 00:12:59.120
And here I would have a four.

00:12:59.120 --> 00:13:03.150
So then I can go through these
and output them the right way.
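
NOTE
A sketch of the one-list-per-value idea, assuming Python lists stand in for
the linked lists and that each item is a (key, tag) pair. The function name
bucket_counting_sort and the key parameter are hypothetical.
```python
def bucket_counting_sort(items, k, key=lambda item: item):
    """Stable sort: each of the k slots holds the items that map to it."""
    buckets = [[] for _ in range(k)]
    for item in items:
        # appending keeps equal keys in their input order
        buckets[key(item)].append(item)
    output = []
    for bucket in buckets:
        output.extend(bucket)
    return output
items = [(4, "x"), (1, "y"), (3, "a"), (2, "z"), (3, "b")]
print(bucket_counting_sort(items, 6, key=lambda item: item[0]))
```
With this input, (3, "a") stays ahead of (3, "b") in the output, which is the stability property.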

00:13:03.150 --> 00:13:06.940
OK, now suppose I'm
writing this in C.

00:13:06.940 --> 00:13:09.089
Suppose I'm in a
low level language.

00:13:09.089 --> 00:13:10.880
And I'm in a low level
language because I'm

00:13:10.880 --> 00:13:14.760
hired by one of these startups
that are doing NoSQL databases.

00:13:14.760 --> 00:13:16.490
And they're writing
everything in C

00:13:16.490 --> 00:13:18.660
to make their
things really fast.

00:13:18.660 --> 00:13:20.660
So I'm writing an index
that uses counting sort.

00:13:20.660 --> 00:13:23.870
I don't have linked lists,
because if I'm writing in C,

00:13:23.870 --> 00:13:24.900
I have to write my own.

00:13:24.900 --> 00:13:26.220
And that's hard.

00:13:26.220 --> 00:13:28.430
So I want to implement
this in another way.

00:13:32.240 --> 00:13:33.560
Linked lists are hard.

00:13:33.560 --> 00:13:36.070
What would I do instead?

00:13:36.070 --> 00:13:37.440
Can anyone think of another way?

00:13:37.440 --> 00:13:40.350
AUDIENCE: I think you
can decrement the values

00:13:40.350 --> 00:13:43.906
for the C in the
array that you have,

00:13:43.906 --> 00:13:46.354
where you have the
count of each anyway.

00:13:46.354 --> 00:13:48.270
VICTOR COSTAN: OK, so
you have the right idea.

00:13:50.880 --> 00:13:51.910
You're missing one step.

00:13:51.910 --> 00:13:53.350
So I'll give
everyone else a hint

00:13:53.350 --> 00:13:54.600
so that everyone can catch up.

00:13:54.600 --> 00:13:58.470
So what I want to do is I want
to take this and transform it

00:13:58.470 --> 00:14:02.330
into something that allows
me to go through the keys.

00:14:02.330 --> 00:14:05.020
So I know I have five keys here.

00:14:05.020 --> 00:14:07.970
I'm going to make an output
array of five elements.

00:14:07.970 --> 00:14:11.100
And I want to be able
to see four and know

00:14:11.100 --> 00:14:12.630
that it belongs here.

00:14:12.630 --> 00:14:15.180
See one, know that
it belongs here.

00:14:15.180 --> 00:14:18.470
See 3a, know that
it belongs here.

00:14:18.470 --> 00:14:21.410
Then probably update the
value associated with three.

00:14:21.410 --> 00:14:23.040
See two, know that
it belongs here.

00:14:23.040 --> 00:14:26.010
And then when I see 3b,
know that it belongs here.

00:14:28.720 --> 00:14:33.270
So I want to look, when I get to
3a, I want to look inside here.

00:14:33.270 --> 00:14:39.090
And I want this to tell
me that 3 belongs here,

00:14:39.090 --> 00:14:39.960
3a belongs here.

00:14:54.610 --> 00:14:58.570
So what would the
position of 3a be?

00:14:58.570 --> 00:14:59.700
That's not good, right?

00:14:59.700 --> 00:15:02.760
Let's call this c instead
so that I can say 3a be.

00:15:05.920 --> 00:15:09.470
So how would I
define the position

00:15:09.470 --> 00:15:14.350
using the sorted property?

00:15:14.350 --> 00:15:19.780
3a should go in the index that
is how many keys smaller than 3

00:15:19.780 --> 00:15:20.420
there are.

00:15:22.970 --> 00:15:25.750
So if I can look
through here and see

00:15:25.750 --> 00:15:29.800
how many keys do I have
that are smaller than 3,

00:15:29.800 --> 00:15:33.210
this is where 3a needs to go.

00:15:33.210 --> 00:15:35.180
If I look at four,
there are four keys

00:15:35.180 --> 00:15:36.450
that are smaller than four.

00:15:36.450 --> 00:15:40.640
So it needs to go
in position four.

00:15:40.640 --> 00:15:44.680
AUDIENCE: Well, that almost
seems more like a compare.

00:15:44.680 --> 00:15:46.253
I'm guessing that
makes it-- I think

00:15:46.253 --> 00:15:47.586
it's kind of a comparison model.

00:15:47.586 --> 00:15:51.530
But you're saying
is it greater than.

00:15:51.530 --> 00:15:54.180
So it's not really counting
sort anymore as much.

00:15:54.180 --> 00:15:55.680
VICTOR COSTAN: Well,
I'm telling you

00:15:55.680 --> 00:15:58.860
I can compute that using this.

00:15:58.860 --> 00:16:00.650
So I can use the
counting sort algorithm

00:16:00.650 --> 00:16:05.130
and change this array a little
bit so that I can do this trick

00:16:05.130 --> 00:16:07.082
and know what goes where.

00:16:07.082 --> 00:16:09.860
AUDIENCE: You already
mentioned using a 2-D array.

00:16:09.860 --> 00:16:14.470
VICTOR COSTAN: But a 2-D
array would be too much.

00:16:14.470 --> 00:16:16.660
In the end, I will be
changing this in place.

00:16:16.660 --> 00:16:22.910
So no extra space except
for this array of size k.

00:16:22.910 --> 00:16:25.320
But let's not worry about
changing it in place right now.

00:16:25.320 --> 00:16:28.920
Let's say we're going to
make another array of size k.

00:16:34.920 --> 00:16:39.320
So I want it to tell me that-- I
guess I don't care about this--

00:16:39.320 --> 00:16:42.360
but I want it to tell me
that one, the first one

00:16:42.360 --> 00:16:44.340
should go here, the
first two should go here,

00:16:44.340 --> 00:16:47.540
the first three should go here,
the first four should go here.

00:16:47.540 --> 00:16:48.310
How do I do that?

00:16:51.472 --> 00:16:54.820
AUDIENCE: Well, you could
make that array, right.

00:16:54.820 --> 00:16:56.620
VICTOR COSTAN: But
how do I compute it?

00:16:56.620 --> 00:16:58.245
AUDIENCE: While you're
making this one,

00:16:58.245 --> 00:17:01.120
you can start
filling that one in.

00:17:01.120 --> 00:17:03.910
But while you're
making the top one.

00:17:03.910 --> 00:17:05.269
VICTOR COSTAN: Can I?

00:17:05.269 --> 00:17:08.790
AUDIENCE: It would be like
insertion sort though, kind of.

00:17:08.790 --> 00:17:11.655
So you come across the four.

00:17:11.655 --> 00:17:14.030
You put it in there, because
you know how many there are.

00:17:14.030 --> 00:17:15.510
But that doesn't
make a lot of sense.

00:17:15.510 --> 00:17:16.510
VICTOR COSTAN: Yeah, OK.

00:17:16.510 --> 00:17:17.980
So let's abandon that route.

00:17:17.980 --> 00:17:20.309
Let's think of something else.

00:17:20.309 --> 00:17:21.946
AUDIENCE: Could you
populate the array

00:17:21.946 --> 00:17:26.250
with the number of elements that
are less than that [INAUDIBLE]?

00:17:26.250 --> 00:17:28.270
VICTOR COSTAN: So
intuitively, I want

00:17:28.270 --> 00:17:30.150
this to tell me how
many elements there

00:17:30.150 --> 00:17:32.170
are that are smaller than two.

00:17:32.170 --> 00:17:34.084
This should tell me
the number of elements

00:17:34.084 --> 00:17:36.500
there are that are smaller
than three, so on and so forth.

00:17:39.520 --> 00:17:41.230
OK, how would I compute that?

00:17:46.852 --> 00:17:48.310
Let's see what it's
supposed to be.

00:17:48.310 --> 00:17:49.768
Let's fill it out
with real values.

00:17:49.768 --> 00:17:50.847
AUDIENCE: Zero.

00:17:50.847 --> 00:17:51.680
VICTOR COSTAN: Zero.

00:17:51.680 --> 00:17:53.410
How many elements
smaller than one?

00:17:53.410 --> 00:17:54.650
AUDIENCE: Zero.

00:17:54.650 --> 00:17:57.087
VICTOR COSTAN: How many
elements smaller than two?

00:17:57.087 --> 00:17:58.032
AUDIENCE: One.

00:17:58.032 --> 00:18:00.198
VICTOR COSTAN: How many
elements smaller than three?

00:18:00.198 --> 00:18:01.950
AUDIENCE: Two.

00:18:01.950 --> 00:18:03.770
It's a cumulative sum.

00:18:03.770 --> 00:18:04.800
VICTOR COSTAN: OK.

00:18:04.800 --> 00:18:07.240
AUDIENCE: On the array above.

00:18:07.240 --> 00:18:09.650
VICTOR COSTAN: So this is
how many elements smaller

00:18:09.650 --> 00:18:11.236
than four?

00:18:11.236 --> 00:18:13.530
Or how many elements
smaller than five? Four.

00:18:13.530 --> 00:18:14.810
OK.

00:18:14.810 --> 00:18:16.992
What's the difference
between these two guys?

00:18:16.992 --> 00:18:18.350
AUDIENCE: One.

00:18:18.350 --> 00:18:19.724
VICTOR COSTAN: What's the
difference between these two

00:18:19.724 --> 00:18:19.968
guys?

00:18:19.968 --> 00:18:20.551
AUDIENCE: One.

00:18:22.994 --> 00:18:24.410
VICTOR COSTAN:
Yeah, you're right.

00:18:24.410 --> 00:18:26.410
Sorry.

00:18:26.410 --> 00:18:27.530
Thank you.

00:18:27.530 --> 00:18:29.982
What's the difference
between these two guys?

00:18:29.982 --> 00:18:32.142
AUDIENCE: Two.

00:18:32.142 --> 00:18:32.643
One.

00:18:32.643 --> 00:18:35.058
VICTOR COSTAN: And what's the
difference between these two

00:18:35.058 --> 00:18:36.116
guys?

00:18:36.116 --> 00:18:38.561
AUDIENCE: Zero.

00:18:38.561 --> 00:18:41.154
VICTOR COSTAN: OK, What
did I just write here?

00:18:41.154 --> 00:18:42.615
AUDIENCE: Same series up there.

00:18:42.615 --> 00:18:44.080
AUDIENCE: Array.

00:18:44.080 --> 00:18:45.660
VICTOR COSTAN: All right.

00:18:45.660 --> 00:18:49.610
So this guy is zero, right,
because there's no element

00:18:49.610 --> 00:18:52.380
that-- there's nothing that's
smaller than the smallest key.

00:18:52.380 --> 00:18:59.610
And then this guy is whatever
was here plus this almost.

00:18:59.610 --> 00:19:02.140
So the difference between
this guy and this guy is this.
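
NOTE
A sketch of the relationship just described: each position is the previous
position plus the previous count. It assumes the counts from the example
array (0, 1, 1, 2, 1, 0) and uses a second array for clarity.
positions_from_counts is a hypothetical name.
```python
def positions_from_counts(counts):
    """pos[v] is the number of keys smaller than v (a cumulative sum)."""
    pos = [0] * len(counts)          # nothing is smaller than the smallest key
    for value in range(1, len(counts)):
        # keys smaller than value = keys smaller than value - 1, plus keys equal to value - 1
        pos[value] = pos[value - 1] + counts[value - 1]
    return pos
print(positions_from_counts([0, 1, 1, 2, 1, 0]))  # prints [0, 0, 1, 2, 4, 5]
```
The differences between neighboring positions give back the original counts.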

00:19:05.633 --> 00:19:07.629
AUDIENCE: So why go
through an array?

00:19:07.629 --> 00:19:09.967
I mean, why did you bother?

00:19:09.967 --> 00:19:11.835
Why do we make a new array?

00:19:11.835 --> 00:19:13.675
Because we could just
get that information.

00:19:13.675 --> 00:19:15.050
VICTOR COSTAN:
Making a new array

00:19:15.050 --> 00:19:18.870
so that we can see
how to compute it.

00:19:18.870 --> 00:19:21.170
So now we're going to try
to write pseudocode that

00:19:21.170 --> 00:19:23.950
does this in place.

00:19:23.950 --> 00:19:27.780
So suppose this
array is a and this

00:19:27.780 --> 00:19:33.300
array is pos for position.

00:19:33.300 --> 00:19:35.770
And suppose-- sorry,
not this array.

00:19:35.770 --> 00:19:36.640
This array is a.

00:19:39.460 --> 00:19:40.360
This array is pos.

00:19:40.360 --> 00:19:41.740
And I start with this.

00:19:41.740 --> 00:19:45.020
And I want to end up with this.

00:19:45.020 --> 00:19:48.510
So let's try to write the
pseudocode for counting sort.

00:19:48.510 --> 00:19:51.250
Counting sort with an array a.

00:19:51.250 --> 00:19:57.300
I'm not going to write the first
two lines that produce this.

00:19:57.300 --> 00:19:59.827
Let's transform this to this.

00:19:59.827 --> 00:20:00.660
How would I do that?

00:20:03.950 --> 00:20:06.770
AUDIENCE: Initialize an
array of the same size.

00:20:06.770 --> 00:20:08.597
VICTOR COSTAN: OK.

00:20:08.597 --> 00:20:09.805
Can we try to do it in place?

00:20:13.253 --> 00:20:13.878
AUDIENCE: Sure.

00:20:15.864 --> 00:20:17.530
VICTOR COSTAN: How
do we do it in place?

00:20:22.450 --> 00:20:25.530
AUDIENCE: You could, well
for four, you get the four.

00:20:25.530 --> 00:20:28.660
You're like, oh, I haven't
encountered anything below me.

00:20:28.660 --> 00:20:31.745
So you put it in zero
initially for four.

00:20:31.745 --> 00:20:32.704
And then you get a one.

00:20:32.704 --> 00:20:35.036
And you're like, oh, I haven't
gotten anything below me.

00:20:35.036 --> 00:20:36.866
But I forget to keep
track of the fact

00:20:36.866 --> 00:20:39.116
that you have to iterate a
whole list every single time

00:20:39.116 --> 00:20:40.260
you get a new input.

00:20:40.260 --> 00:20:42.010
VICTOR COSTAN: So I
don't want to do that,

00:20:42.010 --> 00:20:43.051
because that's n squared.

00:20:43.051 --> 00:20:50.540
AUDIENCE: What you need to
do is keep a running sum.

00:20:50.540 --> 00:20:51.440
Is it a register?

00:20:51.440 --> 00:20:52.340
Is that what you call it?

00:20:52.340 --> 00:20:52.920
VICTOR COSTAN: Running sum.

00:20:52.920 --> 00:20:53.720
I like running sum.

00:20:53.720 --> 00:20:54.261
AUDIENCE: OK.

00:20:54.261 --> 00:20:55.480
Keep a running sum of--

00:20:58.770 --> 00:21:00.780
VICTOR COSTAN: Sums always
start at zero, right?

00:21:00.780 --> 00:21:01.820
AUDIENCE: Right.

00:21:01.820 --> 00:21:10.140
So you keep zero at--
you take the value

00:21:10.140 --> 00:21:14.920
in each index of that
array and add it to sum.

00:21:14.920 --> 00:21:16.250
VICTOR COSTAN: OK.

00:21:16.250 --> 00:21:23.280
So for i iterating
from zero to-- so you

00:21:23.280 --> 00:21:26.577
want each value in
this array, right?

00:21:26.577 --> 00:21:27.160
AUDIENCE: Yes.

00:21:27.160 --> 00:21:30.980
VICTOR COSTAN: So it's going
to iterate from zero to what?

00:21:30.980 --> 00:21:34.635
How many elements
do I have there?

00:21:34.635 --> 00:21:36.910
AUDIENCE: Length k.

00:21:36.910 --> 00:21:38.400
VICTOR COSTAN: OK, almost.

00:21:38.400 --> 00:21:43.220
So we're using Python numbering,
which is zero-based indexing.

00:21:43.220 --> 00:21:45.290
The indices look like this.

00:21:45.290 --> 00:21:46.799
So it's zero to--

00:21:46.799 --> 00:21:47.715
AUDIENCE: [INAUDIBLE].

00:21:47.715 --> 00:21:48.756
VICTOR COSTAN: Very good.

00:21:48.756 --> 00:21:50.870
Thank you.

00:21:50.870 --> 00:21:53.790
And you said I'm going to
add the elements to a sum.

00:21:53.790 --> 00:22:03.300
So sum is sum plus
position of i.

00:22:03.300 --> 00:22:05.080
OK.

00:22:05.080 --> 00:22:05.590
And then?

00:22:08.265 --> 00:22:12.510
AUDIENCE: The replace
is the [INAUDIBLE].

00:22:12.510 --> 00:22:16.830
So zero should be zero still.

00:22:16.830 --> 00:22:23.665
One should be the sum
after evaluating zero.

00:22:23.665 --> 00:22:25.610
You'll need a temp variable.

00:22:25.610 --> 00:22:26.400
VICTOR COSTAN: OK.

00:22:26.400 --> 00:22:31.360
AUDIENCE: You'll need to
grab position i and put it in temp.

00:22:31.360 --> 00:22:35.400
VICTOR COSTAN:
Temp is position i.

00:22:35.400 --> 00:22:42.605
AUDIENCE: Then say position i
is sum before incrementing sum.

00:22:45.985 --> 00:22:46.485
No.

00:22:46.485 --> 00:22:49.395
That's not it at all.

00:22:49.395 --> 00:22:51.335
VICTOR COSTAN: Really?

00:22:51.335 --> 00:22:53.670
AUDIENCE: We'll have to say
that sum is sum plus temp.

00:23:01.542 --> 00:23:02.526
That is going to work.
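
NOTE
One way to write out the running-sum steps suggested above as runnable
Python. This is a sketch under assumptions, not the recitation's final
pseudocode: items are (key, tag) pairs, keys are integers from 0 to k minus 1,
and the names stable_counting_sort, key, and total are hypothetical.
```python
def stable_counting_sort(items, k, key=lambda item: item):
    """Stable counting sort using one array of size k for counts, then positions."""
    pos = [0] * k
    for item in items:
        pos[key(item)] += 1          # first pass: pos holds the counts
    total = 0                        # running sum starts at zero
    for i in range(k):
        temp = pos[i]                # save the count before overwriting it
        pos[i] = total               # number of keys smaller than i
        total += temp                # sum is sum plus temp
    output = [None] * len(items)
    for item in items:
        output[pos[key(item)]] = item
        pos[key(item)] += 1          # the next equal key goes one slot later
    return output
items = [(4, "x"), (1, "y"), (3, "a"), (2, "z"), (3, "b")]
print(stable_counting_sort(items, 6, key=lambda item: item[0]))
```
The temp variable is what allows the counts to be turned into positions within the same array.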

00:23:05.480 --> 00:23:06.230
VICTOR COSTAN: OK.

00:23:06.230 --> 00:23:09.500
How does everyone
else feel about this?

00:23:09.500 --> 00:23:11.396
Does it make sense?

00:23:11.396 --> 00:23:12.283
AUDIENCE: Not really.

00:23:12.283 --> 00:23:14.324
AUDIENCE: [INAUDIBLE]
temporary blast [INAUDIBLE]

00:23:14.324 --> 00:23:19.500
previous iteration, because--
so when you first started,

00:23:19.500 --> 00:23:21.930
it's the very initial
case that doesn't work.

00:23:21.930 --> 00:23:24.520
So like, if you're in the first
column, everything's fine.

00:23:24.520 --> 00:23:27.283
Then you go to column one.

00:23:27.283 --> 00:23:29.919
You're looking at everything
to the left of it.

00:23:29.919 --> 00:23:31.085
It's still going to be zero.

00:23:31.085 --> 00:23:32.560
Then you go to
the second column,

00:23:32.560 --> 00:23:35.250
but you already overwrote
the previous column.

00:23:35.250 --> 00:23:39.620
So you need to store
somehow the-- I don't know.

00:23:39.620 --> 00:23:42.932
It's just the initial
case from when it first

00:23:42.932 --> 00:23:46.047
goes from zero to an
actual qualified number.

00:23:46.047 --> 00:23:47.547
Because otherwise,
you're just going

00:23:47.547 --> 00:23:48.824
to get like zero, zero, zero.

00:23:48.824 --> 00:23:51.770
And you just overwrite.

00:23:51.770 --> 00:23:56.189
AUDIENCE: Can you
start [INAUDIBLE]?

00:23:56.189 --> 00:23:59.293
Was that before you changed?

00:23:59.293 --> 00:24:00.043
VICTOR COSTAN: OK.

00:24:14.580 --> 00:24:16.650
Sorry, I'm getting confused.

00:24:24.770 --> 00:24:27.220
This is getting hard.

00:24:27.220 --> 00:24:29.750
I will show you a trick
to make life easier.

00:24:29.750 --> 00:24:33.160
I'm going to put-- how many
elements do I have here?

00:24:33.160 --> 00:24:35.050
Five, right?

00:24:35.050 --> 00:24:41.170
So I'm going to put a
five here after the array.

00:24:41.170 --> 00:24:45.212
And then I'm going to ask
you, what's this difference.

00:24:45.212 --> 00:24:47.110
AUDIENCE: Zero.

00:24:47.110 --> 00:24:48.490
VICTOR COSTAN: OK.

00:24:48.490 --> 00:24:50.380
So now we have this whole array.

00:24:57.650 --> 00:25:00.780
Can people see what's
going on here?

00:25:00.780 --> 00:25:03.130
So instead of starting
at the beginning,

00:25:03.130 --> 00:25:04.380
I'm going to start at the end.

00:25:04.380 --> 00:25:09.750
And I'm going to know-- I know
for sure there are n elements.

00:25:09.750 --> 00:25:13.310
Therefore, the index of
this guy is n minus--

00:25:13.310 --> 00:25:17.850
so the index of the last key
is n minus how many keys I

00:25:17.850 --> 00:25:18.770
have with this value.

00:25:21.720 --> 00:25:24.140
Does this make sense?

00:25:24.140 --> 00:25:26.334
AUDIENCE: But you're iterating
in order, right?

00:25:26.334 --> 00:25:27.875
So we can't just
take the whole thing

00:25:27.875 --> 00:25:30.300
and say we're going to
shift it over to the right.

00:25:30.300 --> 00:25:31.383
VICTOR COSTAN: How about--

00:25:35.069 --> 00:25:37.110
AUDIENCE: And you're going
through left to right.

00:25:37.110 --> 00:25:39.840
You'll only know what
you see thus far.

00:25:39.840 --> 00:25:48.150
VICTOR COSTAN: How about doing
it for i from n minus 1 to 0?

00:25:48.150 --> 00:25:51.230
Will it work then?

00:25:51.230 --> 00:25:52.370
So what would I write?

00:25:52.370 --> 00:25:54.510
AUDIENCE: But isn't
that super inefficient?

00:25:54.510 --> 00:25:57.534
Because then you're starting
looking at the whole list.

00:25:57.534 --> 00:25:59.430
And then you're sort
of, rather than just

00:25:59.430 --> 00:26:02.955
looking at the previous sum
that you just-- the cumulative.

00:26:02.955 --> 00:26:04.386
So your first
iteration, you have

00:26:04.386 --> 00:26:05.817
to add up everything
that you see.

00:26:05.817 --> 00:26:08.204
Like, each iteration, you have
to add everything up.

00:26:08.204 --> 00:26:10.120
VICTOR COSTAN: So if I
add everything up here,

00:26:10.120 --> 00:26:12.790
what's the result going to be?

00:26:12.790 --> 00:26:13.570
AUDIENCE: Five.

00:26:13.570 --> 00:26:14.320
VICTOR COSTAN: OK.

00:26:14.320 --> 00:26:14.880
What's five?

00:26:21.094 --> 00:26:24.070
So this counts
how many zero keys

00:26:24.070 --> 00:26:26.086
I've seen, how many
one keys I've seen,

00:26:26.086 --> 00:26:29.389
how many two keys I've
seen, so on and so forth.

00:26:29.389 --> 00:26:29.930
So in total--

00:26:29.930 --> 00:26:30.960
AUDIENCE: So you're subtracting--

00:26:30.960 --> 00:26:32.793
VICTOR COSTAN: It's how
many keys I've seen.

00:26:32.793 --> 00:26:36.040
All this, the sum of all these,
is how many keys I've seen.

00:26:36.040 --> 00:26:37.540
How many keys do I have?

00:26:37.540 --> 00:26:38.450
AUDIENCE: Five.

00:26:38.450 --> 00:26:39.950
For each one you
see, you can just--

00:26:39.950 --> 00:26:42.846
VICTOR COSTAN: So who's five?

00:26:42.846 --> 00:26:45.320
It's the length of
this guy, right?

00:26:45.320 --> 00:26:47.680
And we usually call that n.

00:26:47.680 --> 00:26:53.620
So when we're doing
sorting, this is n.

00:26:53.620 --> 00:26:55.175
So maybe it's less confusing.

00:26:55.175 --> 00:26:57.180
Oh, I already used
n in two places.

00:26:57.180 --> 00:26:59.930
So I guess that's it.

00:26:59.930 --> 00:27:04.245
I could say the length
of a, but there you go.

00:27:07.327 --> 00:27:09.660
So I could do the thing that
we were going through before.

00:27:09.660 --> 00:27:11.201
I could figure out
my temp variables.

00:27:11.201 --> 00:27:14.060
And I could make it work.

00:27:14.060 --> 00:27:15.145
Or I could do this.

00:27:15.145 --> 00:27:16.440
AUDIENCE: I think it's
the same though, isn't it?

00:27:16.440 --> 00:27:17.340
VICTOR COSTAN: Yup.

00:27:17.340 --> 00:27:20.616
It's the same thing, except I
think this is easier to write.

00:27:20.616 --> 00:27:22.240
Does anyone want to
help me write this?

00:27:28.636 --> 00:27:31.217
AUDIENCE: Maybe
doing once you're

00:27:31.217 --> 00:27:34.052
starting with the top array,
and then finding the bottom one.

00:27:34.052 --> 00:27:35.024
VICTOR COSTAN: Yeah.

00:27:35.024 --> 00:27:35.960
AUDIENCE: Oh, OK.

00:27:35.960 --> 00:27:37.835
Well, you just-- you
start with the first one

00:27:37.835 --> 00:27:40.444
and the one ahead of it.

00:27:40.444 --> 00:27:43.240
And oh, I mean starting
with the top right.

00:27:43.240 --> 00:27:43.820
Sorry.

00:27:43.820 --> 00:27:46.400
VICTOR COSTAN: OK,
so I have this.

00:27:46.400 --> 00:27:49.182
And then what do I do?

00:27:49.182 --> 00:27:51.597
AUDIENCE: [INAUDIBLE]?

00:27:51.597 --> 00:27:53.530
Oh, so you're starting
from the back.

00:27:53.530 --> 00:27:54.534
VICTOR COSTAN: Yep.

00:27:54.534 --> 00:28:00.720
AUDIENCE: Well, then you just
compare that to-- I mean,

00:28:00.720 --> 00:28:03.445
you're going to start
with zero difference.

00:28:03.445 --> 00:28:05.820
If you have-- well you don't
have any of those last keys,

00:28:05.820 --> 00:28:07.434
so you'd be able to
start with a zero.

00:28:07.434 --> 00:28:09.600
VICTOR COSTAN: So what's
the difference between five

00:28:09.600 --> 00:28:12.780
here, which is n, and this guy?

00:28:12.780 --> 00:28:13.437
What is this?

00:28:13.437 --> 00:28:14.770
AUDIENCE: It's going to be zero.

00:28:14.770 --> 00:28:16.140
VICTOR COSTAN: But what is it?

00:28:16.140 --> 00:28:16.870
Why is it zero?

00:28:16.870 --> 00:28:19.350
So this one's zero, this
one's one, this one's two.

00:28:19.350 --> 00:28:21.910
What is this?

00:28:21.910 --> 00:28:23.160
It's the last guy here, right?

00:28:23.160 --> 00:28:24.180
AUDIENCE: Yeah, yeah.

00:28:24.180 --> 00:28:29.450
VICTOR COSTAN: So this
is pos of n minus 1.

00:28:29.450 --> 00:28:34.090
And this is pos of n minus
2, so on and so forth.

00:28:34.090 --> 00:28:38.420
So to get from n
to the value here,

00:28:38.420 --> 00:28:40.030
I have to subtract this guy.

00:28:45.417 --> 00:28:46.250
AUDIENCE: Pos of i.

00:28:48.859 --> 00:28:49.900
VICTOR COSTAN: Pos of i.

00:28:56.264 --> 00:28:57.180
AUDIENCE: [INAUDIBLE].

00:29:02.681 --> 00:29:03.430
VICTOR COSTAN: OK.

00:29:03.430 --> 00:29:04.180
Very good.

00:29:04.180 --> 00:29:07.659
AUDIENCE: And then update sum.

00:29:07.659 --> 00:29:10.144
Sum equals a pos value.

00:29:13.140 --> 00:29:14.210
VICTOR COSTAN: Sweet.

00:29:14.210 --> 00:29:17.681
No temp variables, aside
from this, I guess.

00:29:17.681 --> 00:29:18.680
How does this look like?

00:29:18.680 --> 00:29:20.860
Do people get it?

00:29:20.860 --> 00:29:22.690
AUDIENCE: You're
subtracting pos of i,

00:29:22.690 --> 00:29:25.450
or you're subtracting a of i.

00:29:25.450 --> 00:29:26.870
AUDIENCE: It's all one array.

00:29:26.870 --> 00:29:28.120
AUDIENCE: It's the same thing.

00:29:28.120 --> 00:29:28.661
That's right.

00:29:28.661 --> 00:29:32.230
VICTOR COSTAN: So a is this
array. a is the input array.

00:29:32.230 --> 00:29:34.370
And pos is this guy.

00:29:34.370 --> 00:29:36.800
And this is pos
before the for loop.

00:29:36.800 --> 00:29:39.320
And this is pos
after the for loop.

00:29:39.320 --> 00:29:41.150
So I guess this is pos zero.

00:29:41.150 --> 00:29:43.050
And this is pos one.

00:29:43.050 --> 00:29:46.080
And here, we start
with pos zero.

00:29:46.080 --> 00:29:48.910
This, we end up with pos one.

00:29:48.910 --> 00:29:53.300
OK.

00:29:53.300 --> 00:29:55.747
So we're able to compute this.

00:29:55.747 --> 00:29:57.830
There are many ways of
doing this, but in the end,

00:29:57.830 --> 00:30:00.250
you want an array
that looks like that.

00:30:00.250 --> 00:30:01.379
This is counting sort.

00:30:01.379 --> 00:30:03.420
This is the hard part of
counting sort, coming up

00:30:03.420 --> 00:30:04.650
with that array.

00:30:04.650 --> 00:30:06.980
Once you come up with
that array, you're golden.
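
NOTE
The backward pass worked out on the board can be sketched as follows. This is a sketch under assumptions, not the exact board code: the names starts_backward and total are hypothetical (total plays the role of the board's sum, started at n), and pos is assumed to hold the count of each key value on entry.
```python
def starts_backward(pos, n):
    # On entry, pos[v] is the number of keys equal to v, and n is the
    # total number of keys. On exit, pos[v] is the first output index
    # for key v.
    total = n
    for i in range(len(pos) - 1, -1, -1):
        total -= pos[i]   # step back over the keys with value i
        pos[i] = total    # the first of them lands at this index
    return pos
```
No temp variable is needed because each count is subtracted from the running total before its slot is overwritten.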

00:30:06.980 --> 00:30:10.990
So let's see that we're golden
and produce an output array

00:30:10.990 --> 00:30:12.500
with the keys in
the right order.

00:30:15.760 --> 00:30:18.050
So say we have an
array called output.

00:30:18.050 --> 00:30:21.530
And this is going to have
these keys in the right order.

00:30:21.530 --> 00:30:24.956
What's the pseudocode for that?

00:30:24.956 --> 00:30:28.160
First, I'm going to
create a new array.

00:30:28.160 --> 00:30:32.155
And I'm going to initialize
it with n NIL values.

00:30:34.990 --> 00:30:35.740
Then what do I do?

00:30:40.390 --> 00:30:42.050
AUDIENCE: Iterate over a.

00:30:42.050 --> 00:30:44.470
VICTOR COSTAN: Very good.

00:30:44.470 --> 00:30:47.350
For-- nah, it's too low.

00:30:47.350 --> 00:30:48.458
Let's do it here.

00:30:48.458 --> 00:30:49.454
AUDIENCE: i of a.

00:30:53.440 --> 00:30:57.950
From zero to n minus 1.

00:30:57.950 --> 00:30:59.660
VICTOR COSTAN: OK.

00:30:59.660 --> 00:31:00.210
What do I do?

00:31:03.070 --> 00:31:12.962
AUDIENCE: Out of
[INAUDIBLE] has to be-- oh,

00:31:12.962 --> 00:31:14.450
can we modify pos one as we go?

00:31:14.450 --> 00:31:15.350
VICTOR COSTAN: Yeah.

00:31:15.350 --> 00:31:18.750
AUDIENCE: So you could
say, out of pos one--

00:31:18.750 --> 00:31:20.500
VICTOR COSTAN: So by
the way, this is pos.

00:31:20.500 --> 00:31:22.350
The reason I label
them with zero and one,

00:31:22.350 --> 00:31:23.900
so we're doing the
change in place.

00:31:23.900 --> 00:31:24.150
AUDIENCE: Right.

00:31:24.150 --> 00:31:25.608
VICTOR COSTAN: The
reason I labeled

00:31:25.608 --> 00:31:27.825
them is to say that
this is what pos

00:31:27.825 --> 00:31:29.590
is before we go
into the loop.

00:31:29.590 --> 00:31:31.600
This is what pos is afterwards.

00:31:31.600 --> 00:31:33.440
But it's a single array.

00:31:33.440 --> 00:31:34.580
So let's call it pos.

00:31:34.580 --> 00:31:36.455
So out of pos of--

00:31:36.455 --> 00:31:40.330
AUDIENCE: a of i
equals a of i.

00:31:43.281 --> 00:31:46.990
Pos of a of i plus plus.

00:31:46.990 --> 00:31:48.830
VICTOR COSTAN: Yup.

00:31:48.830 --> 00:31:51.670
And I'm going to use
the CLRS, the way

00:31:51.670 --> 00:31:54.590
which makes me write more.

00:31:58.190 --> 00:31:59.470
So how does this work?

00:31:59.470 --> 00:32:01.710
I have this array here.

00:32:01.710 --> 00:32:04.370
I start at four.

00:32:04.370 --> 00:32:05.750
What's pos of four?

00:32:08.582 --> 00:32:10.000
AUDIENCE: Four.

00:32:10.000 --> 00:32:12.570
VICTOR COSTAN: All
right, so I'm going

00:32:12.570 --> 00:32:14.520
to write this as position four.

00:32:14.520 --> 00:32:18.950
I should probably make
this a proper array.

00:32:18.950 --> 00:32:21.510
One two, three, four, five.

00:32:24.570 --> 00:32:26.740
So at four, I write four.

00:32:26.740 --> 00:32:30.420
And then I increment
this guy to become five.

00:32:34.820 --> 00:32:36.070
Then I get to one.

00:32:36.070 --> 00:32:38.320
So I look at pos of--

00:32:38.320 --> 00:32:39.680
AUDIENCE: One.

00:32:39.680 --> 00:32:42.800
VICTOR COSTAN: And that is zero.

00:32:42.800 --> 00:32:46.580
So I'm going to write
one at position zero.

00:32:46.580 --> 00:32:50.470
And I'm going to increment it.

00:32:50.470 --> 00:32:51.755
Then I get to 3a.

00:32:51.755 --> 00:32:53.620
I look at pos of 3.

00:32:53.620 --> 00:32:54.600
It says 2.

00:32:54.600 --> 00:32:58.793
So I'm going to write 3a
here and increment this.

00:33:03.140 --> 00:33:04.660
Then I get to two.

00:33:04.660 --> 00:33:06.707
Pos of two is--

00:33:06.707 --> 00:33:07.290
AUDIENCE: One.

00:33:07.290 --> 00:33:08.240
VICTOR COSTAN: One.

00:33:08.240 --> 00:33:10.560
So I write two here.

00:33:10.560 --> 00:33:14.530
Pos of two becomes two.

00:33:14.530 --> 00:33:19.610
Then I have 3c, which
is pos of 3 is now 3.

00:33:19.610 --> 00:33:20.790
It's not two anymore.

00:33:20.790 --> 00:33:23.870
So yay, I'm not overwriting 3a.

00:33:23.870 --> 00:33:24.960
That's good.

00:33:24.960 --> 00:33:26.040
And this becomes four.

00:33:31.559 --> 00:33:33.350
Are people getting what
just happened here?

00:33:35.990 --> 00:33:41.250
AUDIENCE: Wait, why didn't
[INAUDIBLE] to just basically

00:33:41.250 --> 00:33:45.639
train the next array
into an index binder?

00:33:45.639 --> 00:33:46.430
VICTOR COSTAN: Yep.

00:33:46.430 --> 00:33:50.090
So this guy tells
me if I have a key,

00:33:50.090 --> 00:33:52.700
where do I write it in here?

00:33:52.700 --> 00:33:56.950
So these start out with pointers
to the first element that

00:33:56.950 --> 00:33:58.240
would store that key value.

00:33:58.240 --> 00:34:01.910
And when I store a key, say when
I store 3a, when I get to 3c,

00:34:01.910 --> 00:34:03.770
I don't want to store
it in the same place.

00:34:03.770 --> 00:34:05.280
So I have to increment that.

00:34:05.280 --> 00:34:07.910
I have to say, yo, I
wrote 3a at position two.

00:34:07.910 --> 00:34:10.510
So next time, write
it-- next time you

00:34:10.510 --> 00:34:13.290
see a three, write it at
the position following that.

00:34:13.290 --> 00:34:16.750
And that's what this guy does.
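
NOTE
Putting the two parts together, here is a hedged sketch of a stable counting sort. The function name, the key parameter, and the tuple encoding of labeled keys such as 3a are assumptions made for illustration; the sketch assumes every key maps to an integer from 0 to k minus 1.
```python
def counting_sort(a, k, key=lambda x: x):
    # Assumes 0 <= key(x) < k for every x in a.
    pos = [0] * k
    for x in a:
        pos[key(x)] += 1          # count how many keys have each value
    total = 0
    for v in range(k):
        # turn each count into the first output index for that value
        pos[v], total = total, total + pos[v]
    out = [None] * len(a)
    for x in a:                   # left to right keeps equal keys in order
        out[pos[key(x)]] = x
        pos[key(x)] += 1          # the next equal key goes one slot later
    return out
```
For example, counting_sort([(4, ""), (1, ""), (3, "a"), (2, ""), (3, "c")], 5, key=lambda t: t[0]) places (3, "a") before (3, "c"), which is the stability property.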

00:34:19.989 --> 00:34:22.710
So this is the
relatively easy part.

00:34:22.710 --> 00:34:26.159
And this is the hard
magic in counting sort.

00:34:30.270 --> 00:34:33.290
So how are people
feeling about it now?

00:34:35.969 --> 00:34:39.426
Any nods, or is it still
confusing as hell?

00:34:39.426 --> 00:34:40.300
AUDIENCE: It's a lot.

00:34:40.300 --> 00:34:43.340
I'm confused.

00:34:43.340 --> 00:34:46.230
VICTOR COSTAN: OK.

00:34:46.230 --> 00:34:47.690
Well what should we do?

00:34:47.690 --> 00:34:49.690
Do you guys want to
ask more questions?

00:34:49.690 --> 00:34:52.815
Do you want to run
through another example?

00:34:52.815 --> 00:34:55.750
Do you want to try to see how
this becomes useful in radix

00:34:55.750 --> 00:34:58.720
sort, so that you're motivated
to figure it out on your own?

00:34:58.720 --> 00:35:00.570
What would make more sense?

00:35:00.570 --> 00:35:01.070
All right.

00:35:01.070 --> 00:35:03.710
Who wants to do more count sort?

00:35:03.710 --> 00:35:06.200
Who wants to do some radix sort?

00:35:06.200 --> 00:35:07.190
All right.

00:35:07.190 --> 00:35:07.950
Radix sort it is.

00:35:10.399 --> 00:35:12.440
Next time you want to move
on, tell me understood

00:35:12.440 --> 00:35:13.434
and I'll believe you.

00:35:13.434 --> 00:35:14.600
And it'll look good on tape.

00:35:17.310 --> 00:35:18.390
Two, three--

00:35:18.390 --> 00:35:20.056
AUDIENCE: You're not
supposed to tell us

00:35:20.056 --> 00:35:21.630
that there's a camera in here.

00:35:21.630 --> 00:35:24.091
VICTOR COSTAN: One, four.

00:35:24.091 --> 00:35:26.340
I think you're supposed to
know, because otherwise you

00:35:26.340 --> 00:35:30.410
don't know that we're
violating your rights.

00:35:30.410 --> 00:35:31.330
Two, four--

00:35:31.330 --> 00:35:34.250
AUDIENCE: This is out the door.

00:35:34.250 --> 00:35:44.090
VICTOR COSTAN: One, two, four,
three, two, one, four, three.

00:35:44.090 --> 00:35:45.075
And one more.

00:35:45.075 --> 00:35:48.120
One, two, three, four.

00:35:48.120 --> 00:35:50.150
So this is to
refresh your memory.

00:35:50.150 --> 00:35:54.800
What do keys look like
in merge and radix sort?

00:35:54.800 --> 00:35:58.580
So in counting sort, the keys have to
be numbers from 0 to k minus 1.

00:35:58.580 --> 00:35:59.480
How about merge sort?

00:35:59.480 --> 00:36:00.508
What do keys look like?

00:36:10.560 --> 00:36:16.080
So radix sort says that a
key is a sequence of digits.

00:36:16.080 --> 00:36:21.010
Say you have d digits in a key.

00:36:21.010 --> 00:36:23.880
But then each digit isn't
necessarily a base 10 digit

00:36:23.880 --> 00:36:25.240
like we're used to.

00:36:25.240 --> 00:36:27.920
Each digit is in base k.

00:36:27.920 --> 00:36:31.940
So each digit can be
from 0 to k minus 1.

00:36:31.940 --> 00:36:36.820
And we're using base k.

00:36:36.820 --> 00:36:39.230
How many keys can I
represent this way?

00:36:41.940 --> 00:36:44.310
So if you have numbers
of d digits in base k,

00:36:44.310 --> 00:36:46.310
what's the biggest number
that we can represent,

00:36:46.310 --> 00:36:48.389
or how many numbers can
we represent with that?

00:36:48.389 --> 00:36:49.796
AUDIENCE: n to the k.

00:36:49.796 --> 00:36:53.079
No, d to the k.

00:36:53.079 --> 00:36:54.774
Right?

00:36:54.774 --> 00:36:55.690
VICTOR COSTAN: Almost.

00:36:55.690 --> 00:36:56.856
AUDIENCE: [INAUDIBLE] the d.

00:37:00.479 --> 00:37:01.520
VICTOR COSTAN: All right.

00:37:01.520 --> 00:37:03.900
So if our base is two,
like if we're using bits,

00:37:03.900 --> 00:37:05.630
then our base is two.

00:37:05.630 --> 00:37:08.010
And if I have eight bits,
then two to the eight.

00:37:10.600 --> 00:37:11.330
Cool.

00:37:11.330 --> 00:37:14.490
So if I add one
more digit, I get

00:37:14.490 --> 00:37:19.800
to multiply the number
of keys I represent by k.

00:37:19.800 --> 00:37:21.850
How do I radix sort?

00:37:21.850 --> 00:37:24.610
Does anyone remember?

00:37:24.610 --> 00:37:28.692
AUDIENCE: We checked the
log base k of everything.

00:37:28.692 --> 00:37:32.330
I guess log base d.

00:37:32.330 --> 00:37:33.260
Oh, k.

00:37:33.260 --> 00:37:34.190
It's based in--

00:37:34.190 --> 00:37:34.330
VICTOR COSTAN: No.

00:37:34.330 --> 00:37:35.340
That would be hard math.

00:37:35.340 --> 00:37:36.820
We don't do hard math.

00:37:36.820 --> 00:37:40.340
In sorting, if you have
integers going into your sort,

00:37:40.340 --> 00:37:41.690
you only do integer operations.

00:37:41.690 --> 00:37:43.690
You don't do any
math beyond that.

00:37:51.830 --> 00:37:54.520
So what we do is we've
broken up the keys

00:37:54.520 --> 00:37:55.950
into d digits for a reason.

00:37:55.950 --> 00:37:59.030
We're going to have
d rounds in the sort.

00:37:59.030 --> 00:38:03.659
And in each round, we're
going to take all the keys

00:38:03.659 --> 00:38:04.200
that we have.

00:38:04.200 --> 00:38:08.940
And we're going to sort them
according to one of the digits.

00:38:08.940 --> 00:38:11.800
So in one round, we'll sort
them according to this digit.

00:38:11.800 --> 00:38:13.540
In one round, we'll
sort them according

00:38:13.540 --> 00:38:15.690
to this digit, this
digit, this digit.

00:38:18.750 --> 00:38:20.580
Which digit do we start with?

00:38:20.580 --> 00:38:21.988
What do you guys think?

00:38:21.988 --> 00:38:23.900
AUDIENCE: The least
significant digit, right?

00:38:23.900 --> 00:38:25.691
AUDIENCE: And most
significant on the left.

00:38:29.830 --> 00:38:32.272
VICTOR COSTAN: So this or this?

00:38:32.272 --> 00:38:33.580
AUDIENCE: The right side.

00:38:33.580 --> 00:38:36.180
AUDIENCE: 100 is
bigger than 1, even

00:38:36.180 --> 00:38:41.219
though the 1 is greater
than the 0 in 100.

00:38:41.219 --> 00:38:42.760
VICTOR COSTAN: You're
helping me out.

00:38:42.760 --> 00:38:44.730
So the point I'm
trying to make here

00:38:44.730 --> 00:38:47.280
is radix sort is unintuitive.

00:38:47.280 --> 00:38:49.269
If we ask you on a quiz
what do you start with,

00:38:49.269 --> 00:38:50.810
your intuition will
tell you to start

00:38:50.810 --> 00:38:52.460
with the most significant digit.

00:38:52.460 --> 00:38:53.876
Go against it.

00:38:53.876 --> 00:38:56.726
In radix sort, you start with
the least significant digit

00:38:56.726 --> 00:38:57.850
and then move your way out.

00:38:57.850 --> 00:39:02.333
So radix sort goes like this.

00:39:02.333 --> 00:39:03.874
AUDIENCE: I mean,
it does make sense,

00:39:03.874 --> 00:39:05.962
because you don't have very
much information unless you're

00:39:05.962 --> 00:39:06.766
looking at bits.

00:39:06.766 --> 00:39:09.176
You can get a bunch
of twos, but that

00:39:09.176 --> 00:39:10.622
doesn't give you
much information.

00:39:10.622 --> 00:39:12.400
The most information
is the smallest bit.

00:39:12.400 --> 00:39:14.671
And then you move up from there.

00:39:14.671 --> 00:39:16.420
VICTOR COSTAN: It
depends what information

00:39:16.420 --> 00:39:17.530
you're trying to get.

00:39:17.530 --> 00:39:20.810
But maybe you know the
algorithm, so you're thinking,

00:39:20.810 --> 00:39:22.470
oh, by knowing the
algorithm, I know

00:39:22.470 --> 00:39:24.700
that I'll have the
most information

00:39:24.700 --> 00:39:26.300
by looking at it this way.

00:39:26.300 --> 00:39:30.050
All right, so let's sort
these by the last digit.

00:39:30.050 --> 00:39:30.990
Sweet.

00:39:30.990 --> 00:39:33.660
Let's sort them by the
digit, by the digit

00:39:33.660 --> 00:39:35.970
before the last digit.

00:39:35.970 --> 00:39:38.022
What do I have to
do in my sorting?

00:39:38.022 --> 00:39:39.480
What do I have to
pay attention to?

00:39:43.380 --> 00:39:46.280
So the sorting method that I
use has to have a property.

00:39:46.280 --> 00:39:47.780
It can't be any kind of sorting.

00:39:50.251 --> 00:39:50.750
Stable.

00:39:55.460 --> 00:39:57.990
So the reason we went through
all this pain in counting sort

00:39:57.990 --> 00:40:00.840
is because we want to
have a stable sort here.

00:40:00.840 --> 00:40:05.750
Now, let's try to sort
these in a stable manner.

00:40:05.750 --> 00:40:09.110
This is the first one,
two, four, one, three.

00:40:09.110 --> 00:40:15.854
Then I have two threes, so one,
four, three, two, one, two,

00:40:15.854 --> 00:40:16.780
three, four.

00:40:16.780 --> 00:40:20.160
And then I have three fours.

00:40:20.160 --> 00:40:22.160
Two, three, four, one.

00:40:22.160 --> 00:40:24.660
Two, four, one, three.

00:40:24.660 --> 00:40:27.320
Two, one, four, three.

00:40:27.320 --> 00:40:28.340
Wait, this isn't good.

00:40:32.860 --> 00:40:35.560
Two, three, four, one.

00:40:35.560 --> 00:40:36.735
One, two, four, three.

00:40:36.735 --> 00:40:38.718
AUDIENCE: You should cross them
off if you write them down.

00:40:38.718 --> 00:40:39.717
VICTOR COSTAN: I should.

00:40:43.450 --> 00:40:47.020
I was hoping you guys
would help me if I mess up.

00:40:47.020 --> 00:40:48.900
So now these are sorted stably.

00:40:48.900 --> 00:40:53.330
Let's look at these last three
that have the same digit here.

00:40:53.330 --> 00:40:54.900
So they have the same four.

00:40:54.900 --> 00:40:57.360
If you look at the
last digit, because I

00:40:57.360 --> 00:40:59.850
used a stable
sorting, they're also

00:40:59.850 --> 00:41:02.230
sorted according
to this last digit.

00:41:02.230 --> 00:41:05.770
So they're sorted according
to these last two digits,

00:41:05.770 --> 00:41:08.140
because the sorting
that I used is stable.

00:41:08.140 --> 00:41:11.740
So now if I sort
according to this digit,

00:41:11.740 --> 00:41:13.570
then if my sorting
is stable, they're

00:41:13.570 --> 00:41:17.350
going to be sorted according
to the last three digits.

00:41:17.350 --> 00:41:21.300
So as I go from my last
digit to my first digit,

00:41:21.300 --> 00:41:23.300
the keys are going to
be sorted according

00:41:23.300 --> 00:41:25.550
to the last digit, the last
two digits, the last three

00:41:25.550 --> 00:41:28.190
digits, and then all the
way up to everything.

00:41:28.190 --> 00:41:29.760
This is why I need
a stable sort.

00:41:29.760 --> 00:41:32.034
And also, this is why I
need to start from the end.
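
NOTE
The rounds described above can be sketched as a least-significant-digit radix sort. This is an illustrative sketch, not the board's pseudocode: the function name, the place variable, and the use of integer division and remainder to extract a base-k digit are assumptions. Each round is a stable counting sort on one digit.
```python
def radix_sort(keys, k, d):
    # Assumes every key is an integer with 0 <= key < k**d.
    place = 1                              # k**round, selects the digit
    for _ in range(d):                     # d rounds, last digit first
        pos = [0] * k
        for x in keys:
            pos[(x // place) % k] += 1     # count each digit value
        total = 0
        for v in range(k):
            pos[v], total = total, total + pos[v]
        out = [None] * len(keys)
        for x in keys:                     # stable: keeps earlier rounds' order
            digit = (x // place) % k
            out[pos[digit]] = x
            pos[digit] += 1
        keys = out
        place *= k                         # move to the next digit leftward
    return keys
```
Only integer operations are used, and the total work is d rounds of order n plus k each.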

00:41:40.160 --> 00:41:42.870
Does this make some sense?

00:41:42.870 --> 00:41:46.750
What stable sort
did we just learn?

00:41:46.750 --> 00:41:48.582
AUDIENCE: Counting.

00:41:48.582 --> 00:41:49.790
VICTOR COSTAN: Counting sort.

00:41:49.790 --> 00:41:50.290
All right.

00:41:50.290 --> 00:41:52.930
So we're going to use
counting sort for this.

00:41:52.930 --> 00:41:54.680
What's the running
time for one round?

00:41:54.680 --> 00:41:57.520
So for one sorting.

00:41:57.520 --> 00:41:59.220
One counting sort
takes how much time?

00:42:02.094 --> 00:42:04.559
AUDIENCE: This is a radix sort.

00:42:04.559 --> 00:42:05.350
VICTOR COSTAN: Yes.

00:42:05.350 --> 00:42:08.620
So radix sort is d
rounds of counting sort.

00:42:08.620 --> 00:42:10.320
Count sort this,
count sort this,

00:42:10.320 --> 00:42:12.890
count sort this,
count sort this.

00:42:12.890 --> 00:42:15.550
So one round, one counting
sort, what's the running time?

00:42:18.256 --> 00:42:19.610
AUDIENCE: [INAUDIBLE].

00:42:19.610 --> 00:42:20.651
VICTOR COSTAN: Thank you.

00:42:23.840 --> 00:42:31.750
Now how about d of these
plus the running time?

00:42:31.750 --> 00:42:33.202
AUDIENCE: d times n plus k.

00:42:40.480 --> 00:42:43.040
VICTOR COSTAN: OK, but I
want to come back here.

00:42:43.040 --> 00:42:46.450
And I want to be able to say
that radix sort is optimal.

00:42:46.450 --> 00:42:50.780
I want to be able to
say that it is order n.

00:42:50.780 --> 00:42:53.635
So what do I have to do in
order to be able to say that?

00:42:58.810 --> 00:43:02.550
AUDIENCE: [INAUDIBLE]
k equal to n.

00:43:02.550 --> 00:43:05.450
VICTOR COSTAN: So you're going
from-- you know the answer.

00:43:05.450 --> 00:43:07.616
You're going from the fact
that you know the answer.

00:43:07.616 --> 00:43:08.660
AUDIENCE: [INAUDIBLE].

00:43:08.660 --> 00:43:09.510
VICTOR COSTAN: OK, very good.

00:43:09.510 --> 00:43:11.010
What if we wouldn't
know the answer?

00:43:11.010 --> 00:43:12.725
What do I need to do?

00:43:12.725 --> 00:43:14.808
AUDIENCE: Well, we know
the first part is order n.

00:43:14.808 --> 00:43:16.189
So--

00:43:16.189 --> 00:43:17.480
VICTOR COSTAN: So d has to be--

00:43:17.480 --> 00:43:21.400
AUDIENCE: We want dn to
be greater than dk, right?

00:43:21.400 --> 00:43:23.470
VICTOR COSTAN: Well, so dn.

00:43:23.470 --> 00:43:26.817
dn has to be, at
most, O of n, right?

00:43:26.817 --> 00:43:28.900
Because otherwise, the
whole thing would go above.

00:43:28.900 --> 00:43:30.150
So that wouldn't work.

00:43:30.150 --> 00:43:34.000
So then what can I say about d?

00:43:34.000 --> 00:43:35.340
AUDIENCE: Constant.

00:43:35.340 --> 00:43:36.381
VICTOR COSTAN: Very good.

00:43:36.381 --> 00:43:38.402
And how do you write
constants in math mode?

00:43:38.402 --> 00:43:39.509
AUDIENCE: Order one.

00:43:39.509 --> 00:43:40.550
VICTOR COSTAN: Very good.

00:43:40.550 --> 00:43:42.060
So d has to be order one.

00:43:42.060 --> 00:43:44.980
Otherwise, it's not going
to come out to that.

00:43:44.980 --> 00:43:46.490
Now, what else do we know?

00:43:46.490 --> 00:43:50.700
We have this that's
order n plus k.

00:43:50.700 --> 00:43:52.940
If I set k to be
a lot smaller than n,

00:43:52.940 --> 00:43:56.820
if I set it to be log n,
it's going to be order n.

00:43:56.820 --> 00:44:01.900
If I set k to be a
constant, if I use bits,

00:44:01.900 --> 00:44:05.500
if I use base 2-- so I set
k equal 2-- this is still

00:44:05.500 --> 00:44:07.080
going to be order n.

00:44:07.080 --> 00:44:09.740
So if k goes way
below n, this step

00:44:09.740 --> 00:44:11.360
is still going to be order n.

00:44:11.360 --> 00:44:13.975
So I might as well set
k as high as possible.

00:44:17.550 --> 00:44:21.330
So k is order n, because
that's the highest

00:44:21.330 --> 00:44:22.650
thing I could set it to.

00:44:22.650 --> 00:44:25.120
Now why do I want to do that?

00:44:25.120 --> 00:44:26.374
Yes, you have a ques--

00:44:26.374 --> 00:44:29.146
AUDIENCE: [INAUDIBLE] represent
in counting sort again?

00:44:29.146 --> 00:44:31.352
The length of what?

00:44:31.352 --> 00:44:32.810
VICTOR COSTAN: So
in counting sort,

00:44:32.810 --> 00:44:37.230
n is your input, how
many keys you have.

00:44:37.230 --> 00:44:41.150
And k is the size of this array.

00:44:41.150 --> 00:44:41.859
AUDIENCE: Oh, OK.

00:44:41.859 --> 00:44:43.691
VICTOR COSTAN: So you
have to be able to map

00:44:43.691 --> 00:44:45.090
your keys from 0 to k minus 1.

00:44:45.090 --> 00:44:46.962
AUDIENCE: It's set
by n, basically.

00:44:46.962 --> 00:44:48.370
Or it's set by the elements.

00:44:48.370 --> 00:44:48.760
VICTOR COSTAN: Yeah.

00:44:48.760 --> 00:44:50.115
It's set by the
nature of the keys.

00:44:50.115 --> 00:44:50.390
AUDIENCE: OK.

00:44:50.390 --> 00:44:50.910
Got it.

00:44:50.910 --> 00:44:52.451
VICTOR COSTAN: So
in real life, we're

00:44:52.451 --> 00:44:55.520
thinking maybe we have some huge
numbers that we want to sort.

00:44:55.520 --> 00:44:57.830
And we're going to chunk
them up into-- when we're

00:44:57.830 --> 00:44:59.330
writing on the
board, we always have

00:44:59.330 --> 00:45:00.829
to chunk them up
in base 10 digits,

00:45:00.829 --> 00:45:02.870
because that's the only
way we know how to write.

00:45:02.870 --> 00:45:05.890
But in a computer memory, we
can chunk them up into, say,

00:45:05.890 --> 00:45:08.440
base 10,000 digits.

00:45:08.440 --> 00:45:10.440
And the fewer digits
you have, the faster

00:45:10.440 --> 00:45:11.380
this is going to run.

00:45:11.380 --> 00:45:14.480
So we have to figure
out what's the base.

00:45:14.480 --> 00:45:15.940
And it turns out
that if you want

00:45:15.940 --> 00:45:19.680
to have radix sort run
in order n time, well,

00:45:19.680 --> 00:45:24.390
the number of digits has
to be sort of constant.

00:45:24.390 --> 00:45:27.210
I know that k should
be order n, because I

00:45:27.210 --> 00:45:30.320
have no interest in
making it lower than that.

00:45:30.320 --> 00:45:32.050
So these two bounds
together tell me

00:45:32.050 --> 00:45:36.666
that the keys that I
can sort are from zero

00:45:36.666 --> 00:45:42.610
up to order n to the order one.

00:45:42.610 --> 00:45:45.940
And this looks terrible,
but what it comes up to

00:45:45.940 --> 00:45:49.690
is that you can
sort keys that look

00:45:49.690 --> 00:45:52.640
like n to some constant
for any constant.

00:45:55.630 --> 00:45:59.690
So you can sort huge keys,
as long as huge still

00:45:59.690 --> 00:46:00.260
means finite.

00:46:04.070 --> 00:46:06.780
And as long as you can figure
out how to map them to numbers.

00:46:10.060 --> 00:46:13.170
Does this make some sense?

00:46:13.170 --> 00:46:15.950
Would we ever want to use merge
sort instead of counting sort?

00:46:15.950 --> 00:46:17.870
Suppose we had a
stable merge sort.

00:46:17.870 --> 00:46:24.071
Would we want to use that
instead of counting sort here?

00:46:24.071 --> 00:46:24.820
What would happen?

00:46:32.300 --> 00:46:33.330
So suppose it's stable.

00:46:33.330 --> 00:46:34.010
So it's correct.

00:46:34.010 --> 00:46:35.551
The algorithm isn't
going to blow up.

00:46:35.551 --> 00:46:37.273
What's the running
time for merge sort?

00:46:44.380 --> 00:46:46.880
So if I use a merge sort.

00:46:46.880 --> 00:46:50.760
So if I use the merge sort, it's
going to be d times n log n.

00:46:50.760 --> 00:46:52.980
So no matter how
small d is, I'm still

00:46:52.980 --> 00:46:55.190
not running in linear time.

00:46:55.190 --> 00:46:57.657
So merge sort does not
go well with radix sort.

00:47:01.340 --> 00:47:03.280
So from my end, we're
pretty much done.

00:47:03.280 --> 00:47:04.730
We started with n log n.

00:47:04.730 --> 00:47:08.217
And we got to a sorting
algorithm that's order n.

00:47:08.217 --> 00:47:10.300
We started at the beginning
of [INAUDIBLE], saying

00:47:10.300 --> 00:47:13.090
that the best thing
we can do is omega--

00:47:13.090 --> 00:47:15.160
is that omega-- omega of n.

00:47:15.160 --> 00:47:16.051
We got to that limit.

00:47:16.051 --> 00:47:16.550
We're happy.

00:47:16.550 --> 00:47:17.950
We're going to be
done with sorting.

00:47:17.950 --> 00:47:19.140
Any questions from you guys?

00:47:23.912 --> 00:47:25.495
That means everyone's
confused, right?

00:47:25.495 --> 00:47:26.360
Yes, thank you.

00:47:26.360 --> 00:47:28.610
AUDIENCE: Can you explain
what the stability criteria

00:47:28.610 --> 00:47:29.512
is again?

00:47:29.512 --> 00:47:30.345
VICTOR COSTAN: The--

00:47:30.345 --> 00:47:34.990
AUDIENCE: Stability for
these sorting algorithms.

00:47:34.990 --> 00:47:37.032
Which ones are stable and
what makes it unstable?

00:47:37.032 --> 00:47:38.531
VICTOR COSTAN: All
right, very good.

00:47:38.531 --> 00:47:39.160
Thank you.

00:47:39.160 --> 00:47:41.420
So I like especially
the last part,

00:47:41.420 --> 00:47:42.674
with which ones are stable.

00:47:42.674 --> 00:47:43.840
I'd like to go through that.

00:47:43.840 --> 00:47:46.410
So a stable sorting
algorithm means

00:47:46.410 --> 00:47:49.300
that if you have two
keys that are equal,

00:47:49.300 --> 00:47:51.480
the key that shows
up first in the input

00:47:51.480 --> 00:47:55.610
is the key that is
produced first in the output.

00:47:55.610 --> 00:47:58.920
So in this model, your keys
are not necessarily integers.

00:47:58.920 --> 00:48:00.950
Your keys might be
those weird classes

00:48:00.950 --> 00:48:04.880
that implement some method
that maps them to integers.

00:48:04.880 --> 00:48:08.665
So say there is a
method there, __int__,

00:48:08.665 --> 00:48:11.880
that gives you the
integer for that.

00:48:11.880 --> 00:48:14.750
So the sorting algorithm
would only see a three here.

00:48:14.750 --> 00:48:17.010
But in fact, this
is a complex object.

00:48:17.010 --> 00:48:18.830
And this is another
complex object,

00:48:18.830 --> 00:48:21.690
but the sorting
only sees the three.

00:48:21.690 --> 00:48:24.380
If this guy shows up before
this guy in the input,

00:48:24.380 --> 00:48:28.278
they have to show up in the
same order in the output.
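
NOTE
A small illustration of the stability property just described, under
stated assumptions: Record is a hypothetical class standing in for the
"complex object" on the board, and the sort only sees int(record).
Python's built-in sorted is documented to be stable, so the two records
that both look like 3 keep their input order.
```python
class Record:
    # Hypothetical complex object; the sort compares only its integer value.
    def __init__(self, number, label):
        self.number = number
        self.label = label
    def __int__(self):
        return self.number
records = [Record(3, "first"), Record(1, "other"), Record(3, "second")]
ordered = sorted(records, key=int)  # key=int calls Record.__int__
labels = [r.label for r in ordered]
print(labels)  # "first" stays ahead of "second" because the sort is stable
```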

00:48:28.278 --> 00:48:31.480
AUDIENCE: Why would that
be bad if they're switched?

00:48:31.480 --> 00:48:34.730
VICTOR COSTAN: It's not stable.

00:48:34.730 --> 00:48:39.900
If they're switched, then when
we're using a stable sorting

00:48:39.900 --> 00:48:41.720
algorithm here.

00:48:41.720 --> 00:48:46.050
So here, the key is
this complicated object.

00:48:46.050 --> 00:48:47.650
But say we're in
the second round.

00:48:47.650 --> 00:48:49.900
We're in this round,
which we played with.

00:48:49.900 --> 00:48:52.730
Even though the key is this
whole complicated object,

00:48:52.730 --> 00:48:55.730
the only thing that the counting
sort sees is this number.

00:48:58.400 --> 00:49:00.430
So this guy looks like three.

00:49:00.430 --> 00:49:01.930
This guy looks like three.

00:49:01.930 --> 00:49:04.310
And these three guys,
although they're different,

00:49:04.310 --> 00:49:06.430
they look like four.

00:49:06.430 --> 00:49:09.480
If I don't output them
in the right order--

00:49:09.480 --> 00:49:12.770
say I output this one all
the way at the end-- then

00:49:12.770 --> 00:49:16.550
I'm going to get two, three,
four, one to be down here.

00:49:16.550 --> 00:49:20.360
And now my numbers aren't sorted
by the last two digits anymore.

00:49:20.360 --> 00:49:23.440
So it breaks any algorithm
that assumes stability.
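
NOTE
A sketch of why radix sort depends on a stable per-digit sort. The
numbers here are made up for illustration and are not the ones on the
board. tie_reversing_pass is a deliberately unstable digit sort (it
flips the order of equal digits); plugging it into the same radix loop
gives a wrong answer. All function names are hypothetical.
```python
def digit(x, place):
    return (x // place) % 10
def lsd_radix(nums, digit_sort, num_digits):
    # Sort by each base-10 digit in turn, least significant first.
    out = list(nums)
    for i in range(num_digits):
        out = digit_sort(out, 10 ** i)
    return out
def stable_pass(nums, place):
    # sorted() is stable, so ties keep the order left by earlier rounds.
    return sorted(nums, key=lambda x: digit(x, place))
def tie_reversing_pass(nums, place):
    # Deliberately unstable: reversing the input first flips the order of ties.
    return sorted(reversed(nums), key=lambda x: digit(x, place))
good = lsd_radix([21, 12, 11, 22], stable_pass, 2)
bad = lsd_radix([21, 12, 11, 22], tie_reversing_pass, 2)
print(good)  # [11, 12, 21, 22]
print(bad)   # [12, 11, 22, 21], the work of the first round was undone
```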

00:49:23.440 --> 00:49:25.910
So stability is something
that you get from a sort,

00:49:25.910 --> 00:49:27.940
because it's
convenient to assume it

00:49:27.940 --> 00:49:30.580
in some other algorithm
that builds up on that sort.

00:49:30.580 --> 00:49:32.496
If you don't need it,
you don't care about it.

00:49:32.496 --> 00:49:34.920
But in some cases, you need it.

00:49:34.920 --> 00:49:37.915
And for the second part,
which algorithms are stable.

00:49:41.370 --> 00:49:42.675
Is insertion sort stable?

00:49:46.968 --> 00:49:47.960
AUDIENCE: I assume so.

00:49:47.960 --> 00:49:49.810
I mean, stable is
being correct, right?

00:49:49.810 --> 00:49:50.560
VICTOR COSTAN: No.

00:49:50.560 --> 00:49:53.482
We mean that property there.

00:49:53.482 --> 00:49:54.344
AUDIENCE: Oh, I see.

00:49:54.344 --> 00:49:55.204
You mean in order.

00:49:55.204 --> 00:49:55.995
VICTOR COSTAN: Yep.

00:49:55.995 --> 00:49:56.905
AUDIENCE: Oh, OK.

00:50:00.927 --> 00:50:03.463
Insertion sort goes in order.

00:50:03.463 --> 00:50:06.076
But I guess it could push
other things out of order.

00:50:06.076 --> 00:50:07.450
VICTOR COSTAN: So
insertion sort,

00:50:07.450 --> 00:50:09.730
you're doing swapping to
move things to the left.

00:50:09.730 --> 00:50:11.480
But if you find two
things that are equal,

00:50:11.480 --> 00:50:14.000
you're never going to swap them.

00:50:14.000 --> 00:50:18.760
So insertion sort is
in order, is stable.

00:50:18.760 --> 00:50:23.130
Merge sort, the one we gave
you in that list, is not stable.

00:50:23.130 --> 00:50:25.907
But there is a one-character
change that makes it stable.

00:50:25.907 --> 00:50:27.740
And you should look at
today's lecture notes

00:50:27.740 --> 00:50:29.260
to find out what that is.

00:50:29.260 --> 00:50:31.580
So merge sort can be stable.
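
NOTE
A hedged sketch of a stable merge sort. This is not the code from the
course handout, and the comparison shown is only one common way to get
stability; the lecture notes remain the authority on the exact change
the instructor has in mind. The merge_sort name and key parameter are
assumptions for illustration.
```python
def merge_sort(items, key=lambda x: x):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, preserving input order.
        # With "<" here, an equal key from the right half would jump ahead.
        if key(left[i]) <= key(right[j]):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
pairs = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
by_number = merge_sort(pairs, key=lambda p: p[0])
```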

00:50:31.580 --> 00:50:33.075
Heapsort, stable or unstable?

00:50:37.040 --> 00:50:37.630
Unstable.

00:50:37.630 --> 00:50:39.070
And there's a
really small example

00:50:39.070 --> 00:50:41.180
that you should look at.

00:50:41.180 --> 00:50:43.470
Counting sort,
stable or unstable?

00:50:43.470 --> 00:50:44.430
AUDIENCE: Stable.

00:50:44.430 --> 00:50:45.471
VICTOR COSTAN: Thank you.

00:50:45.471 --> 00:50:48.700
It would have broken my heart if
this would have come out wrong.

00:50:48.700 --> 00:50:50.878
And radix sort?

00:50:50.878 --> 00:50:52.330
AUDIENCE: Probably.

00:50:52.330 --> 00:50:54.417
Yes.

00:50:54.417 --> 00:50:56.000
VICTOR COSTAN:
Probably stable, right?

00:50:56.000 --> 00:50:56.755
All right.

00:50:56.755 --> 00:50:57.970
Any more questions?

00:50:57.970 --> 00:51:00.980
I like that question by the way,
because you made me do this.

00:51:00.980 --> 00:51:01.480
I like that.

00:51:01.480 --> 00:51:02.405
Any more questions?

00:51:06.070 --> 00:51:08.390
All right, thank you guys.