WEBVTT

00:00:00.090 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.019
Commons license.

00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.730
continue to offer high quality
educational resources for free.

00:00:10.730 --> 00:00:13.340
To make a donation or
view additional materials

00:00:13.340 --> 00:00:17.215
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.215 --> 00:00:17.840
at ocw.mit.edu.

00:00:20.706 --> 00:00:23.206
AMARTYA SHANKHA BISWAS: Feel
free to populate the front row.

00:00:23.206 --> 00:00:26.030
I'm not that scary.

00:00:26.030 --> 00:00:29.010
So today, we're going to look
at more greedy algorithms.

00:00:29.010 --> 00:00:31.420
So I think you went
over Kruskal's algorithm

00:00:31.420 --> 00:00:34.040
and how you do the
sorting in the lecture.

00:00:34.040 --> 00:00:38.970
So going back to make
change from last recitation,

00:00:38.970 --> 00:00:41.220
so this is sort of
a variant on that.

00:00:41.220 --> 00:00:43.080
So instead of
discrete coins, we now

00:00:43.080 --> 00:00:46.440
have continuous coins, in the
sense so the analogy here is,

00:00:46.440 --> 00:00:50.030
let's say, you have N metals,
and each of the metals

00:00:50.030 --> 00:00:53.550
has some value given by
Ci dollars per kilogram,

00:00:53.550 --> 00:00:55.990
or whatever units you prefer.

00:00:55.990 --> 00:00:58.660
And you want to
achieve some value

00:00:58.660 --> 00:01:01.520
T. You want to give someone
T dollars worth of metal.

00:01:01.520 --> 00:01:04.110
And you want to do this
while minimizing-- oh, so I

00:01:04.110 --> 00:01:05.099
should mention this.

00:01:05.099 --> 00:01:07.580
ki is the weight of
every metal that you

00:01:07.580 --> 00:01:09.720
will give to the person.

00:01:09.720 --> 00:01:13.242
So you're taking ki of metal
i, and you are going to--

00:01:13.242 --> 00:01:14.950
and you have to ensure,
so basically, you

00:01:14.950 --> 00:01:22.820
have to ensure that some
summation of ki Ci over all i

00:01:22.820 --> 00:01:26.890
is equal to T. And in doing
so you want to minimize

00:01:26.890 --> 00:01:28.570
the summation over all ki.

00:01:28.570 --> 00:01:29.570
So does that make sense?

00:01:29.570 --> 00:01:30.550
So you have a bunch of metals.

00:01:30.550 --> 00:01:32.383
Some of them are more
expensive than others.

00:01:32.383 --> 00:01:34.740
And you want to measure
them out and give someone

00:01:34.740 --> 00:01:36.660
a certain fixed value.

00:01:36.660 --> 00:01:39.590
So anyone have any
ideas how to do this?

00:01:39.590 --> 00:01:49.780
Should be-- should be the
first thing that comes to mind.

00:01:49.780 --> 00:01:52.340
So you have much of
metals-- some of them

00:01:52.340 --> 00:01:53.470
with certain costs.

00:01:53.470 --> 00:01:58.200
And you're trying
to create a value T.

00:01:58.200 --> 00:02:00.660
So which metal would
you want to pick?

00:02:00.660 --> 00:02:03.422
So it should seem
intuitive that if you

00:02:03.422 --> 00:02:05.130
want to minimize the
weight of the metal,

00:02:05.130 --> 00:02:07.540
you would want to pick
the-- have the most

00:02:07.540 --> 00:02:10.130
expensive one for weight.

00:02:10.130 --> 00:02:17.760
So let's start by sort by Ci.

00:02:17.760 --> 00:02:22.990
And we want to sort it
in decreasing order.

00:02:22.990 --> 00:02:23.670
Does make sense?

00:02:23.670 --> 00:02:25.920
So if you have the
most expensive metal,

00:02:25.920 --> 00:02:28.480
you want to use as much
of that as you can,

00:02:28.480 --> 00:02:30.510
so that your weight
is minimized.

00:02:30.510 --> 00:02:33.560
So once you sort by Ci, so
let's say, you have your costs

00:02:33.560 --> 00:02:41.190
right now are-- let's call
this one C1, C2, up to Cn.

00:02:41.190 --> 00:02:42.760
And these are in sorted order.

00:02:42.760 --> 00:02:44.120
So it's increasing this way.

00:02:46.910 --> 00:02:52.970
So you now take your value
T, and you look at T by C1.

00:02:52.970 --> 00:02:56.000
And that is the amount of weight
you would need to generate C.

00:02:56.000 --> 00:02:58.900
So you look at how
much you have here.

00:02:58.900 --> 00:03:02.400
So the amount of metal-- so a
constraint I forgot to mention,

00:03:02.400 --> 00:03:05.301
you are given a limited
amount of every metal.

00:03:05.301 --> 00:03:10.340
OK, that's-- it's
not that trivial.

00:03:10.340 --> 00:03:13.280
So you have--
let's mention that.

00:03:13.280 --> 00:03:16.260
So you have-- is that used?

00:03:16.260 --> 00:03:19.940
No, it's not-- amount.

00:03:32.136 --> 00:03:33.390
Does that make more sense?

00:03:36.210 --> 00:03:38.730
So you look at T over Ci.

00:03:38.730 --> 00:03:45.220
And if T over Ci is
greater than W of i,

00:03:45.220 --> 00:03:47.890
then you just use the amount
you need to construct Wi,

00:03:47.890 --> 00:03:49.170
and you're done.

00:03:49.170 --> 00:03:52.100
Otherwise, you use all of Ci.

00:03:52.100 --> 00:03:57.320
So if it's less than
Wi, in that case,

00:03:57.320 --> 00:03:59.420
you-- sorry, other way around.

00:03:59.420 --> 00:04:02.140
If it's greater than
Wi, you use all of it.

00:04:02.140 --> 00:04:05.210
And then you move on to the next
one, the next one, and so on.

00:04:05.210 --> 00:04:06.890
So that seems pretty intuitive.

00:04:06.890 --> 00:04:09.300
Let's actually do a
formal proof of that.

00:04:09.300 --> 00:04:11.660
So how you go about
proving this is

00:04:11.660 --> 00:04:15.201
that-- so let's say-- so it's
what we call the current base

00:04:15.201 --> 00:04:15.700
method.

00:04:15.700 --> 00:04:17.201
So basically what
you have is, let's

00:04:17.201 --> 00:04:19.241
say you're not using the
most expensive metal you

00:04:19.241 --> 00:04:20.079
have at this point.

00:04:20.079 --> 00:04:23.680
So let's say your most
expensive metal has cost Ci,

00:04:23.680 --> 00:04:27.450
but instead, you
decide to use Cj.

00:04:27.450 --> 00:04:33.350
So let's say you decide to
use some kj amount of Cj.

00:04:33.350 --> 00:04:40.260
So the value you're
getting from this is Cj kj.

00:04:40.260 --> 00:04:45.250
And instead, if you use
Ci, how much metal would

00:04:45.250 --> 00:04:46.690
you need to get the same value?

00:04:46.690 --> 00:04:50.885
You would need Cj kj over Ci.

00:04:50.885 --> 00:04:52.860
Does that make sense?

00:04:52.860 --> 00:04:55.560
So this is the value you
would obtain by using

00:04:55.560 --> 00:04:59.290
kj kilograms of this metal.

00:04:59.290 --> 00:05:01.830
So if you instead used this
one, you'd get this value.

00:05:01.830 --> 00:05:04.480
And so this is the
most expensive one.

00:05:04.480 --> 00:05:07.840
And Ci is greater
than Cj, this value,

00:05:07.840 --> 00:05:11.200
so this value is less than kj.

00:05:11.200 --> 00:05:14.150
So by using this metal
instead of that one,

00:05:14.150 --> 00:05:16.830
you are decreasing the amount--
the weight you would need,

00:05:16.830 --> 00:05:19.110
so your minimization goes down.

00:05:19.110 --> 00:05:20.175
Make sense?

00:05:20.175 --> 00:05:23.860
So that's like a very
simple greedy algorithm.

00:05:23.860 --> 00:05:26.350
And it's-- the algorithm is
exactly what you'd expect,

00:05:26.350 --> 00:05:28.180
and the proof isn't very hard.

00:05:28.180 --> 00:05:32.040
So let's move on to a
slightly interesting one.

00:05:36.130 --> 00:05:38.870
So this is process scheduling.

00:05:38.870 --> 00:05:40.250
So let's say you
have a computer,

00:05:40.250 --> 00:05:41.780
and you're running
end processes.

00:05:41.780 --> 00:05:45.560
And each of the process
has a time-- t1 through tn,

00:05:45.560 --> 00:05:46.670
again processes.

00:05:46.670 --> 00:05:48.890
And you want to order
them in some way.

00:05:48.890 --> 00:05:50.560
So first, you will
do process p1.

00:05:50.560 --> 00:05:53.550
Then you'll process p2,
and so on, and so forth.

00:05:53.550 --> 00:05:55.190
Then you'll define
a completion time.

00:05:55.190 --> 00:05:58.026
So completion time is simply
when does process i end.

00:05:58.026 --> 00:05:59.150
So when does process i end?

00:05:59.150 --> 00:06:02.140
You just p1 plus-- it's like
the time for p1 plus time for p2

00:06:02.140 --> 00:06:03.170
up to pn.

00:06:03.170 --> 00:06:07.230
So basically, you have
all your processes.

00:06:07.230 --> 00:06:12.130
So let's says this is p1,
this is p2, and so on.

00:06:12.130 --> 00:06:14.690
And the completion time for a
certain process in the middle

00:06:14.690 --> 00:06:17.000
is just the sum of
all times before it.

00:06:17.000 --> 00:06:18.150
That's completion time.

00:06:18.150 --> 00:06:18.920
And now what you
want to do is you

00:06:18.920 --> 00:06:20.544
want to minimize the
average completion

00:06:20.544 --> 00:06:22.860
time, which is summation
over all the completion

00:06:22.860 --> 00:06:25.270
times over n.

00:06:25.270 --> 00:06:29.370
So any ideas what an algorithm
for this would look like?

00:06:29.370 --> 00:06:31.780
Essentially, you
want to minimize

00:06:31.780 --> 00:06:33.460
the sum of all these times.

00:06:33.460 --> 00:06:36.250
So all these times, you want to
minimize the average of these.

00:06:36.250 --> 00:06:37.333
So what do you want to do?

00:06:37.333 --> 00:06:40.440
Do you want to shift the
slower-- the processes which

00:06:40.440 --> 00:06:41.680
take more time, do you want
to keep them at the end,

00:06:41.680 --> 00:06:43.555
or do you want to keep
them at the beginning?

00:06:50.310 --> 00:06:52.060
So if you have a bunch
of small processes,

00:06:52.060 --> 00:06:53.220
what do you do with
them at the end?

00:06:53.220 --> 00:06:54.553
What do you do at the beginning?

00:07:02.990 --> 00:07:04.520
Completion time
is when does-- So

00:07:04.520 --> 00:07:06.400
let's say this is process pi.

00:07:06.400 --> 00:07:09.097
And completion time for process
pi is like this distance.

00:07:09.097 --> 00:07:10.680
It's like, when does
pi get completed?

00:07:10.680 --> 00:07:13.840
So it's summation of all
the times-- so the time

00:07:13.840 --> 00:07:16.030
taken for p1, p2,
up to the end, see?

00:07:16.030 --> 00:07:17.840
So you want to
basically minimize

00:07:17.840 --> 00:07:20.629
the average of these values.

00:07:20.629 --> 00:07:22.420
So where do you put
the smaller processes--

00:07:22.420 --> 00:07:23.878
would you put the
shorter processes

00:07:23.878 --> 00:07:25.790
at the end or the beginning?

00:07:25.790 --> 00:07:29.435
Which one would
decrease your average?

00:07:29.435 --> 00:07:31.080
The beginning?

00:07:31.080 --> 00:07:32.307
Makes sense, right?

00:07:32.307 --> 00:07:33.390
So yeah, that makes sense.

00:07:33.390 --> 00:07:34.610
So you basically
want to like scrunch

00:07:34.610 --> 00:07:36.026
these lines towards
the beginning,

00:07:36.026 --> 00:07:37.740
so your average is smaller.

00:07:37.740 --> 00:07:41.000
Note that this total length
is always the constant.

00:07:41.000 --> 00:07:44.640
It's like summation over all ti.

00:07:44.640 --> 00:07:49.060
So let's go about-- so
OK, this is strategy.

00:07:49.060 --> 00:07:58.590
Again, sort by ti, and
this is increasing order,

00:07:58.590 --> 00:08:00.100
and that's it basically.

00:08:00.100 --> 00:08:02.170
So this is your
algorithm, sorted by tn,

00:08:02.170 --> 00:08:03.950
but use the process
in that order.

00:08:03.950 --> 00:08:06.120
So let's try to prove this.

00:08:06.120 --> 00:08:08.417
So the way you prove this
is a pretty generic method.

00:08:08.417 --> 00:08:10.250
It is often used to
prove greedy algorithms.

00:08:10.250 --> 00:08:11.430
So let's say that this
is not the optimal.

00:08:11.430 --> 00:08:12.890
Let's say someone
comes up to you

00:08:12.890 --> 00:08:15.490
and tells you, OK, I
have a better sequence.

00:08:15.490 --> 00:08:17.100
I have a sequence,
let's say, called--

00:08:17.100 --> 00:08:19.760
let's say I have a
sequence of p1 to pn.

00:08:19.760 --> 00:08:22.620
And that sequence does
better than a sorted order.

00:08:22.620 --> 00:08:26.585
So you're like, OK, so
if this is not sorted,

00:08:26.585 --> 00:08:28.335
then you have some
elements in the middle.

00:08:28.335 --> 00:08:31.480
Let's say you call
them pi is greater

00:08:31.480 --> 00:08:39.770
than pj, with i is less than j.

00:08:39.770 --> 00:08:43.500
So there's some pi here, and
there's some pj here, such

00:08:43.500 --> 00:08:45.614
that this is greater than that.

00:08:45.614 --> 00:08:46.780
So it's not in sorted order.

00:08:46.780 --> 00:08:48.800
So you can always
find a pair like that.

00:08:48.800 --> 00:08:53.140
So now I'm going to claim
that if you swap these two

00:08:53.140 --> 00:08:57.770
values-- so you swap pi
and pj-- that'll actually

00:08:57.770 --> 00:09:01.340
decrease whatever current
average completion time you

00:09:01.340 --> 00:09:01.840
have.

00:09:04.820 --> 00:09:06.880
So initially, you had
something like this.

00:09:06.880 --> 00:09:10.980
So-- no, let's not
draw a line there.

00:09:10.980 --> 00:09:16.670
So let's say you had something
like you had this process, so

00:09:16.670 --> 00:09:21.130
pi-- actually, this is the
bigger process, so this is pi,

00:09:21.130 --> 00:09:25.547
and this is pj.

00:09:25.547 --> 00:09:26.922
And now I'm saying
that-- and you

00:09:26.922 --> 00:09:28.770
have some stuff in the middle.

00:09:28.770 --> 00:09:31.190
And my claim is that,
no, this is not optimal.

00:09:31.190 --> 00:09:33.850
You'd do much better if
you moved the pj over here.

00:09:33.850 --> 00:09:41.630
So you want to go from this
to this, and big process.

00:09:41.630 --> 00:09:44.410
So let's see what changes when
you go from there to there.

00:09:44.410 --> 00:09:46.760
So first of all, observe
that the completion

00:09:46.760 --> 00:09:48.594
times of everything
behind this is the same.

00:09:48.594 --> 00:09:50.218
They all have the
same completion time;

00:09:50.218 --> 00:09:51.160
nothing is affected.

00:09:51.160 --> 00:09:53.044
And you're only changing
these two things.

00:09:53.044 --> 00:09:54.710
Everything after this
is also the same--

00:09:54.710 --> 00:09:56.620
has the same completion time.

00:09:56.620 --> 00:09:58.120
So the only things
that are changing

00:09:58.120 --> 00:10:00.490
are this one, this one, and
all the ones up to this one.

00:10:00.490 --> 00:10:02.990
Even this one has the
same completion time.

00:10:02.990 --> 00:10:03.900
Make sense?

00:10:03.900 --> 00:10:08.100
So how much is this changing by?

00:10:08.100 --> 00:10:08.990
So let's define this.

00:10:08.990 --> 00:10:18.110
Delta is equal to t
of pi minus t of pj--

00:10:18.110 --> 00:10:20.650
so the difference between
these two processes.

00:10:20.650 --> 00:10:25.060
So the original completion
time of pi was this.

00:10:25.060 --> 00:10:26.600
And now the
corresponding process

00:10:26.600 --> 00:10:29.290
down here, the completion
time is decreased by delta.

00:10:29.290 --> 00:10:32.340
So completion time for us
goes down like minus delta.

00:10:32.340 --> 00:10:34.020
This is a summation
of completion time.

00:10:34.020 --> 00:10:35.500
This divided by n is a constant.

00:10:35.500 --> 00:10:38.642
So you just want
to minimize this.

00:10:38.642 --> 00:10:40.850
So first it goes-- so this
one goes down minus delta.

00:10:40.850 --> 00:10:42.266
So let's look at
the next process.

00:10:42.266 --> 00:10:44.700
The next process is
something like this.

00:10:44.700 --> 00:10:46.539
So again, these do not change.

00:10:46.539 --> 00:10:47.830
You're only swapping these two.

00:10:47.830 --> 00:10:50.820
So this completion time
also down a minus delta,

00:10:50.820 --> 00:10:52.330
and so on, and so forth.

00:10:52.330 --> 00:10:54.880
So you just get a
bunch of minus deltas,

00:10:54.880 --> 00:10:57.311
which is equal to however
many processes you have.

00:10:57.311 --> 00:10:58.560
But that's not even important.

00:10:58.560 --> 00:11:00.490
What is important is
that just by swapping,

00:11:00.490 --> 00:11:02.810
you're going to get at
least one minus delta.

00:11:02.810 --> 00:11:06.084
And delta is positive, because
assumption-- oops, sorry,

00:11:06.084 --> 00:11:11.347
t-- because assumption was that
t pi minus t pj is positive.

00:11:11.347 --> 00:11:13.680
So just by swapping, you're
going to always decrease it.

00:11:13.680 --> 00:11:17.260
So the claim that that sequence
was an optimal solution

00:11:17.260 --> 00:11:18.260
is wrong.

00:11:18.260 --> 00:11:21.040
So you can always do better
by swapping two inversions.

00:11:21.040 --> 00:11:24.180
So that out of sorted order
is called an inversion.

00:11:24.180 --> 00:11:27.626
So if you solve an inversion,
you always get a better result.

00:11:27.626 --> 00:11:28.750
Does that proof make sense?

00:11:33.060 --> 00:11:36.560
So that's a slightly more
interesting recent algorithm.

00:11:36.560 --> 00:11:40.410
So let's move on to the
third one we have here.

00:11:40.410 --> 00:11:42.901
The third one is event overlap.

00:11:42.901 --> 00:11:43.900
So this is how it works.

00:11:46.520 --> 00:11:51.210
So you wake up in the morning,
and you look at your calendar.

00:11:51.210 --> 00:11:54.450
And being an MIT student, your
calendar looks pretty full.

00:11:54.450 --> 00:11:57.810
So let's say this is
what it looks like.

00:11:57.810 --> 00:11:59.942
So these are your events.

00:11:59.942 --> 00:12:04.872
Let's use some colors, make
it a little clearer possibly.

00:12:04.872 --> 00:12:08.140
And let's say you have
another event over here.

00:12:10.680 --> 00:12:14.816
You have something here.

00:12:14.816 --> 00:12:18.956
You have something here.

00:12:18.956 --> 00:12:21.396
You have something here.

00:12:21.396 --> 00:12:26.907
And you have something here.

00:12:26.907 --> 00:12:28.839
So OK, let's move this down.

00:12:33.560 --> 00:12:36.780
So the problem is that
you have this bunch

00:12:36.780 --> 00:12:37.920
of events planned out.

00:12:37.920 --> 00:12:39.720
Now clearly,
they're overlapping,

00:12:39.720 --> 00:12:41.880
so you can't attend all of them.

00:12:41.880 --> 00:12:45.010
So the idea is you make a
bunch of clones of yourself.

00:12:45.010 --> 00:12:49.140
And so in this case, look
at the matching colors.

00:12:49.140 --> 00:12:51.850
So if you create clone
number 1 goes here,

00:12:51.850 --> 00:12:54.820
and clone number 2 goes
to red, and clone number 3

00:12:54.820 --> 00:12:55.890
goes to blue.

00:12:55.890 --> 00:12:57.460
So then clone
number 1 does this.

00:12:57.460 --> 00:12:59.120
Clone number 3
does the blue one.

00:12:59.120 --> 00:13:01.770
Clone number 2 does red.

00:13:01.770 --> 00:13:03.760
I guess, we should move
the red back a little

00:13:03.760 --> 00:13:06.288
or forward a little bit
just to make it clear.

00:13:06.288 --> 00:13:07.780
Yeah, there we go.

00:13:07.780 --> 00:13:11.350
And now, you could easily
see that this is optimal.

00:13:11.350 --> 00:13:14.417
So you can do this with
three clones and no less.

00:13:14.417 --> 00:13:16.000
So you make three
clones, and then you

00:13:16.000 --> 00:13:17.490
can go off to spring break.

00:13:17.490 --> 00:13:18.720
And your schedule is fine.

00:13:18.720 --> 00:13:22.480
So now, how would you
approach this problem?

00:13:22.480 --> 00:13:26.210
So what is a greedy strategy
to, given a number of intervals,

00:13:26.210 --> 00:13:29.200
how do you find the
minimum number of clones

00:13:29.200 --> 00:13:32.750
you need to cover your day?

00:13:32.750 --> 00:13:33.250
Any ideas?

00:13:35.552 --> 00:13:37.010
What is a naive
thing you could do?

00:13:39.705 --> 00:13:41.495
AUDIENCE: When you
say to cover your day,

00:13:41.495 --> 00:13:42.620
then it's like the number--

00:13:42.620 --> 00:13:45.480
AMARTYA SHANKHA BISWAS: So
you want to do every event.

00:13:45.480 --> 00:13:48.710
But like so this clone can't--
so clone number 1 does this

00:13:48.710 --> 00:13:49.210
event.

00:13:49.210 --> 00:13:51.371
Then he can't do this
event or this event.

00:13:51.371 --> 00:13:53.335
AUDIENCE: Sort of
maximizing your events?

00:13:53.335 --> 00:13:55.635
AMARTYA SHANKHA BISWAS: You
want to do all the events?

00:13:55.635 --> 00:13:57.385
You want to minimize
the number of clones.

00:13:57.385 --> 00:13:58.375
AUDIENCE: [INAUDIBLE]

00:14:01.196 --> 00:14:03.570
AMARTYA SHANKHA BISWAS: So
it's like interval scheduling.

00:14:03.570 --> 00:14:07.240
But you want to do
all the intervals,

00:14:07.240 --> 00:14:09.005
but you're allowed to
use multiple people

00:14:09.005 --> 00:14:10.004
to do all the intervals.

00:14:12.944 --> 00:14:13.875
Yes?

00:14:13.875 --> 00:14:14.375
Yeah?

00:14:14.375 --> 00:14:16.760
AUDIENCE: You could
sort by end time.

00:14:16.760 --> 00:14:17.330
AMARTYA SHANKHA BISWAS:
By end time, OK.

00:14:17.330 --> 00:14:19.079
What do you do after
you sort by end time?

00:14:19.079 --> 00:14:22.139
AUDIENCE: And then iterate
over all the intervals

00:14:22.139 --> 00:14:25.369
once they're sorted and just
count how many intervals there

00:14:25.369 --> 00:14:31.085
are between the [INAUDIBLE].

00:14:31.085 --> 00:14:32.774
That would get complicated.

00:14:32.774 --> 00:14:34.440
AMARTYA SHANKHA BISWAS:
So you're close.

00:14:34.440 --> 00:14:35.940
So you do begin by sorting.

00:14:35.940 --> 00:14:38.426
But you can actually do
it by sorting by end time.

00:14:38.426 --> 00:14:40.550
It's easier to visualize
if you sort by start time.

00:14:40.550 --> 00:14:44.602
So leading from that,
anyone want to top in?

00:14:44.602 --> 00:14:46.530
AUDIENCE: It's when
your sorting starts,

00:14:46.530 --> 00:14:50.386
and every time you get a
class, you get a new clone.

00:14:50.386 --> 00:14:52.584
AMARTYA SHANKHA BISWAS:
So yeah, essentially,

00:14:52.584 --> 00:14:55.000
every time you can't add it
to one of your current clones,

00:14:55.000 --> 00:14:55.810
you just create a new one.

00:14:55.810 --> 00:14:56.593
You could also do
it by end time,

00:14:56.593 --> 00:14:57.340
because it's symmetrical, right?

00:14:57.340 --> 00:14:58.930
So if you sort by
end time, then you

00:14:58.930 --> 00:15:01.700
start with the smallest-- last
end time and go backwards,

00:15:01.700 --> 00:15:03.870
exactly the same thing.

00:15:03.870 --> 00:15:05.210
So let's write it down.

00:15:17.070 --> 00:15:19.330
So sort by start
time, and so actually,

00:15:19.330 --> 00:15:21.210
let's work out this example.

00:15:21.210 --> 00:15:26.130
So in this case, you
would go-- OK, actually,

00:15:26.130 --> 00:15:28.580
if once you sort--
so first you have 1,

00:15:28.580 --> 00:15:35.120
then you have 2, 3, 4, 5, 6.

00:15:35.120 --> 00:15:37.260
So that's sorted by start time.

00:15:37.260 --> 00:15:40.920
And then you have-- so
first you go for this one.

00:15:40.920 --> 00:15:45.160
Then you go for 2, and
2 intersects with 1.

00:15:45.160 --> 00:15:46.750
So you put 2 into it.

00:15:46.750 --> 00:15:48.670
So this is clone number 1.

00:15:48.670 --> 00:15:50.920
And then you have to
create a new clone for 2,

00:15:50.920 --> 00:15:52.310
so you create the new clone.

00:15:52.310 --> 00:15:53.820
And there we go.

00:15:53.820 --> 00:15:55.300
So then you go to 3.

00:15:55.300 --> 00:15:56.950
3 clashes with both
1 and 2, so you

00:15:56.950 --> 00:15:59.110
have to create a
new clone again.

00:15:59.110 --> 00:16:05.930
So in that case, you
go forth and create 3.

00:16:05.930 --> 00:16:07.100
Then you go to 4.

00:16:07.100 --> 00:16:10.050
Now, 4, you see, it's
starts with 2 and 3,

00:16:10.050 --> 00:16:11.620
but it is good with 1.

00:16:11.620 --> 00:16:15.224
So you just put 4 over here.

00:16:15.224 --> 00:16:17.140
And if you continue like
this, you essentially

00:16:17.140 --> 00:16:22.120
get this and this.

00:16:22.120 --> 00:16:23.050
Make sense?

00:16:23.050 --> 00:16:24.970
So that's how you schedule it.

00:16:24.970 --> 00:16:28.330
So does that
algorithm make sense?

00:16:28.330 --> 00:16:30.520
Let's try to prove
its correctness.

00:16:33.280 --> 00:16:36.820
So let's look at the instance
where you're inserting

00:16:36.820 --> 00:16:41.995
the m-th clone-- so m-th clone.

00:16:52.090 --> 00:16:55.219
So when the m-th is
created, you already

00:16:55.219 --> 00:16:56.260
have some values in here.

00:16:59.000 --> 00:17:03.740
So you have 1, 2, all
the way up to m minus 1.

00:17:03.740 --> 00:17:06.650
So now, you bring
in your interval,

00:17:06.650 --> 00:17:09.560
and you see that it collides
with all of these values.

00:17:09.560 --> 00:17:13.430
So let's just draw the final
interval for all these guys.

00:17:13.430 --> 00:17:16.664
So let's say the final interval
for this guy was out here.

00:17:16.664 --> 00:17:18.645
Let's say the final
interval for this guy

00:17:18.645 --> 00:17:23.010
was out here, and so on, and
blah, blah, blah, blah, blah.

00:17:23.010 --> 00:17:25.859
And so when you
create the m-th clone,

00:17:25.859 --> 00:17:27.000
you look at the start time.

00:17:27.000 --> 00:17:31.080
So what happens is that the
start time is somewhere,

00:17:31.080 --> 00:17:32.350
let's say, here.

00:17:32.350 --> 00:17:34.450
And now, you know
that because of this--

00:17:34.450 --> 00:17:36.810
so you're only adding
a new clone when you

00:17:36.810 --> 00:17:38.590
don't have an available slot.

00:17:38.590 --> 00:17:40.890
So that means that there
is some interval here,

00:17:40.890 --> 00:17:43.716
which intersects with this guy.

00:17:43.716 --> 00:17:45.590
So how do you show that
this is one interval?

00:17:45.590 --> 00:17:47.050
Well, it's like
consider any level.

00:17:47.050 --> 00:17:49.258
But say there is no interval
that intersects with it.

00:17:49.258 --> 00:17:51.860
So that means that
there is either-- So

00:17:51.860 --> 00:17:57.480
if there were a gap here-- so
let's say, at this location,

00:17:57.480 --> 00:17:58.980
this interval wasn't here.

00:17:58.980 --> 00:18:02.430
Let's say if you extrapolate
this line outward-- so this

00:18:02.430 --> 00:18:04.210
is your current starting value.

00:18:04.210 --> 00:18:06.730
And let's say you
look at this line.

00:18:06.730 --> 00:18:10.510
And in this segment,
you can't have

00:18:10.510 --> 00:18:12.250
something which
starts after this,

00:18:12.250 --> 00:18:15.140
because this is the current
highest sorting starting time.

00:18:15.140 --> 00:18:17.340
So there's no interval
that starts after this.

00:18:17.340 --> 00:18:19.400
So the only interval
that's going to exist

00:18:19.400 --> 00:18:21.124
have already ended here.

00:18:21.124 --> 00:18:22.540
And if they're
already ended here,

00:18:22.540 --> 00:18:24.560
that means you
could evaluate here.

00:18:24.560 --> 00:18:27.210
Does that make sense?

00:18:27.210 --> 00:18:29.370
So basically, then
you can show that, OK,

00:18:29.370 --> 00:18:32.579
so at every existing-- if
you're adding a new clone, that

00:18:32.579 --> 00:18:33.995
means at every
existing level, you

00:18:33.995 --> 00:18:36.270
have something which intersects.

00:18:36.270 --> 00:18:40.220
So what that means
is that you have

00:18:40.220 --> 00:18:42.672
a single point of
time where there

00:18:42.672 --> 00:18:46.100
are m minus 1 plus 1 intervals.

00:18:46.100 --> 00:18:49.330
That means that you absolutely
need m intervals regardless

00:18:49.330 --> 00:18:51.010
of what your strategy is.

00:18:51.010 --> 00:18:53.690
So adding the m-th
clone is necessary.

00:18:53.690 --> 00:18:55.640
So if you go on,
continue the argument--

00:18:55.640 --> 00:18:57.181
let's say your total
number of clones

00:18:57.181 --> 00:18:59.190
was m-- so you can just
do this argument for m.

00:18:59.190 --> 00:19:01.648
There you will show that, oh,
if I followed all these rules

00:19:01.648 --> 00:19:04.360
correctly, I can show
that the start time for m

00:19:04.360 --> 00:19:07.300
intersects with m minus
1 other intervals.

00:19:07.300 --> 00:19:10.310
So there's no way I
can create a scheduling

00:19:10.310 --> 00:19:12.962
with less than m clones.

00:19:12.962 --> 00:19:15.420
Did that argument make sense,
or should I go over it again?

00:19:19.060 --> 00:19:25.316
So that's somewhat hand wavy,
but that shouldn't be-- OK.

00:19:25.316 --> 00:19:28.390
In any case, well, that's
the three problems.

00:19:28.390 --> 00:19:30.790
So I guess we can
go back to this one

00:19:30.790 --> 00:19:32.510
and sort of give the
motivation for this.

00:19:32.510 --> 00:19:37.055
So this could, for example, be
used in scheduling processes

00:19:37.055 --> 00:19:38.550
for servers, for instance.

00:19:38.550 --> 00:19:42.010
So let's say your server gets
a request to run n processes,

00:19:42.010 --> 00:19:43.570
and they have times like that.

00:19:43.570 --> 00:19:45.140
So this is like
shortest time first.

00:19:45.140 --> 00:19:47.600
So you take all the
short-- the smallest jobs,

00:19:47.600 --> 00:19:49.184
and you execute them
in the beginning.

00:19:49.184 --> 00:19:50.349
And you wait for other jobs.

00:19:50.349 --> 00:19:51.860
And this can also
be done online.

00:19:51.860 --> 00:19:55.780
So you can have an
online version of this.

00:19:55.780 --> 00:19:58.890
So if you take this algorithm
and you do it online-- so let's

00:19:58.890 --> 00:20:02.080
say your server is running
jobs, and you get a new request.

00:20:02.080 --> 00:20:06.170
So you get a new request, so
you already have some set of t1

00:20:06.170 --> 00:20:07.150
to tn.

00:20:07.150 --> 00:20:09.820
And let's say, at
the current moment,

00:20:09.820 --> 00:20:11.560
ti is your smallest job.

00:20:11.560 --> 00:20:16.200
And you're running it, and
you're currently at this point.

00:20:16.200 --> 00:20:18.120
And then in the
middle of running it,

00:20:18.120 --> 00:20:20.350
you can get new
requests for jobs.

00:20:20.350 --> 00:20:23.205
So how would you modify this
algorithm to handle that?

00:20:26.875 --> 00:20:30.290
So you still want to maintain
this lowest average completion

00:20:30.290 --> 00:20:31.316
time thing.

00:20:31.316 --> 00:20:32.940
So how would you
handle this situation.

00:20:32.940 --> 00:20:34.090
So let's say you're
in the middle of a job

00:20:34.090 --> 00:20:36.010
and you get a bunch
of new requests.

00:20:36.010 --> 00:20:39.390
So current set is all
these existing jobs

00:20:39.390 --> 00:20:41.590
plus some other things
you get in here.

00:20:44.910 --> 00:20:48.042
So would you consider switching
to a different job here,

00:20:48.042 --> 00:20:49.250
or would you keep doing this?

00:20:56.520 --> 00:20:59.840
Let's say one of the new
jobs you get is really small.

00:20:59.840 --> 00:21:02.460
So what you would
do in that case

00:21:02.460 --> 00:21:04.910
is that instead of
continuing with this,

00:21:04.910 --> 00:21:07.200
you would switch to
current smallest job.

00:21:07.200 --> 00:21:08.670
So you would look
at the remaining

00:21:08.670 --> 00:21:09.987
time, so that's important.

00:21:09.987 --> 00:21:11.820
So you could forget
about the amount of time

00:21:11.820 --> 00:21:12.904
you already spent on this.

00:21:12.904 --> 00:21:14.403
You know what the
remaining time is,

00:21:14.403 --> 00:21:16.070
and that is all
that is relevant.

00:21:16.070 --> 00:21:17.803
So you can just
consider this problem

00:21:17.803 --> 00:21:19.870
in a different framework.

00:21:19.870 --> 00:21:21.620
It's the exact same question.

00:21:21.620 --> 00:21:24.210
You just look at remaining
time, instead of total time.

00:21:24.210 --> 00:21:25.080
So if you're in the
middle of a job,

00:21:25.080 --> 00:21:26.747
and a new one comes
in which is smaller,

00:21:26.747 --> 00:21:28.371
you just switch to
that, complete that,

00:21:28.371 --> 00:21:30.570
and then look at the remaining
times for everything.

00:21:30.570 --> 00:21:32.028
So at some point
of time, you might

00:21:32.028 --> 00:21:34.470
have a lot of half completed
jobs just lying around.

00:21:34.470 --> 00:21:37.960
And for all of them, you'll
update their ti values

00:21:37.960 --> 00:21:40.130
to remaining time
rather than start time.

00:21:40.130 --> 00:21:42.860
And that gives you
a nice way to decide

00:21:42.860 --> 00:21:44.227
which processes to do online.

00:21:44.227 --> 00:21:45.060
And that gives you--

00:21:45.060 --> 00:21:49.280
So this is assuming that all of
your tasks have equal weights.

00:21:49.280 --> 00:21:51.150
So all of them
have equal reward.

00:21:51.150 --> 00:21:52.970
So obviously, that's
not always the case.

00:21:52.970 --> 00:21:55.730
You might be pushing back
a very long job forever,

00:21:55.730 --> 00:21:57.712
because smaller
things keep coming in

00:21:57.712 --> 00:21:58.920
and that might get important.

00:21:58.920 --> 00:22:01.092
But everything is
equally weighted,

00:22:01.092 --> 00:22:02.842
then this is the optimal
thing you can do.

00:22:02.842 --> 00:22:06.040
And it's a very simple
strategy that works.

00:22:06.040 --> 00:22:09.925
So those are the three
problems I wanted to discuss.

00:22:09.925 --> 00:22:11.925
Do you guys have any other
questions or comments

00:22:11.925 --> 00:22:14.900
or anything?

00:22:14.900 --> 00:22:16.322
Good?

00:22:16.322 --> 00:22:18.700
OK.

00:22:18.700 --> 00:22:20.420
We finished pretty
early, so I guess,

00:22:20.420 --> 00:22:22.540
have a great spring break.