1
00:00:07 --> 00:00:09
Good morning,
everyone.
2
00:00:09 --> 00:00:14
Glad you are all here bright
and early.
3
00:00:14 --> 00:00:20
I'm counting the days till the
TAs outnumber the students.
4
00:00:20 --> 00:00:26
They'll show up.
We return to a familiar story.
5
00:00:26 --> 00:00:32
This is part two,
the Empire Strikes Back.
6
00:00:32 --> 00:00:33
So last time,
our adversary,
7
00:00:33 --> 00:00:36
the graph, came to us with a
problem.
8
00:00:36 --> 00:00:39
We have a source,
and we had a directed graph,
9
00:00:39 --> 00:00:43
and we had weights on the
edges, and they were all
10
00:00:43 --> 00:00:46
nonnegative.
And there was happiness.
11
00:00:46 --> 00:00:50
And we triumphed over the
Empire by designing Dijkstra's
12
00:00:50 --> 00:00:54
algorithm, and very efficiently
finding single source shortest
13
00:00:54 --> 00:01:00
paths, shortest path weight from
s to every other vertex.
14
00:01:00 --> 00:01:02
Today, however,
the Death Star has a new trick
15
00:01:02 --> 00:01:05
up its sleeve,
and we have negative weights,
16
00:01:05 --> 00:01:07
potentially.
And we're going to have to
17
00:01:07 --> 00:01:09
somehow deal with,
in particular,
18
00:01:09 --> 00:01:13
negative weight cycles.
And we saw that when we have a
19
00:01:13 --> 00:01:16
negative weight cycle,
we can just keep going around,
20
00:01:16 --> 00:01:19
and around, and around,
and go back in time farther,
21
00:01:19 --> 00:01:21
and farther,
and farther.
22
00:01:21 --> 00:01:24
And we can get to be
arbitrarily far back in the
23
00:01:24 --> 00:01:26
past.
And so there's no shortest
24
00:01:26 --> 00:01:29
path, because whatever path you
take you can get a shorter one.
25
00:01:29 --> 00:01:33
So we want to address that
issue today, and we're going to
26
00:01:33 --> 00:01:37
come up with a new algorithm
actually simpler than Dijkstra,
27
00:01:37 --> 00:01:39
but not as fast,
called the Bellman-Ford
28
00:01:39 --> 00:01:44
algorithm.
And, it's going to allow
29
00:01:44 --> 00:01:48
negative weights,
and in some sense allow
30
00:01:48 --> 00:01:54
negative weight cycles,
although maybe not as much as
31
00:01:54 --> 00:01:59
you might hope.
We have to leave room for a
32
00:01:59 --> 00:02:04
sequel, of course.
OK, so the Bellman-Ford
33
00:02:04 --> 00:02:09
algorithm, invented by two guys,
as you might expect,
34
00:02:09 --> 00:02:13
it computes the shortest path
weights.
35
00:02:13 --> 00:02:17
So, it makes no assumption
about the weights.
36
00:02:17 --> 00:02:22
Weights are arbitrary,
and it's going to compute the
37
00:02:22 --> 00:02:27
shortest path weights.
So, remember this notation:
38
00:02:27 --> 00:02:33
delta of s, v is the weight of
the shortest path from s to v.
39
00:02:33 --> 00:02:40
s was called a source vertex.
And, we want to compute these
40
00:02:40 --> 00:02:43
weights for all vertices,
little v.
41
00:02:43 --> 00:02:47
The claim is that computing
from s to everywhere is no
42
00:02:47 --> 00:02:51
harder than computing s to a
particular location.
43
00:02:51 --> 00:02:53
So, we're going to do for all
them.
44
00:02:53 --> 00:02:56
It's still going to be the case
here.
45
00:02:56 --> 00:02:59
And, it allows negative
weights.
46
00:02:59 --> 00:03:03
And this is the good case,
but there's an alternative,
47
00:03:03 --> 00:03:07
which is that Bellman-Ford may
just say, oops,
48
00:03:07 --> 00:03:11
there's a negative weight
cycle.
49
00:03:11 --> 00:03:14
And in that case it will just
say so.
50
00:03:14 --> 00:03:18
So, it says a negative weight
cycle exists.
51
00:03:18 --> 00:03:23
Therefore, some of these deltas
are minus infinity.
52
00:03:23 --> 00:03:27
And that seems weird.
So, Bellman-Ford as we'll
53
00:03:27 --> 00:03:33
present it today is intended for
the case where there are no
54
00:03:33 --> 00:03:39
negative weight cycles,
which is more intuitive.
55
00:03:39 --> 00:03:42
It sort of allows them,
but it will just report them.
56
00:03:42 --> 00:03:45
In that case,
it will not give you delta
57
00:03:45 --> 00:03:48
values.
You can change the algorithm to
58
00:03:48 --> 00:03:52
give you delta values in that
case, but we are not going to
59
00:03:52 --> 00:03:54
see it here.
So, as an exercise,
60
00:03:54 --> 00:03:57
after you see the algorithm,
the exercise is:
61
00:03:57 --> 00:04:01
compute these deltas in all
cases.
62
00:04:01 --> 00:04:12
63
00:04:12 --> 00:04:19
So, it's not hard to do.
But we don't have time for it
64
00:04:19 --> 00:04:24
here.
So, here's the algorithm.
65
00:04:24 --> 00:04:32
It's pretty straightforward.
As I said, it's easier than
66
00:04:32 --> 00:04:36
Dijkstra.
It's a relaxation algorithm.
67
00:04:36 --> 00:04:40
So the main thing that it does
is relax edges just like
68
00:04:40 --> 00:04:43
Dijkstra.
So, we'll be able to use a lot
69
00:04:43 --> 00:04:47
of lemmas from Dijkstra.
And proof of correctness will
70
00:04:47 --> 00:04:51
be three times shorter because
the first two thirds we already
71
00:04:51 --> 00:04:55
have from Dijkstra.
But I'm jumping ahead a bit.
72
00:04:55 --> 00:04:57
So, the first part is
initialization.
73
00:04:57 --> 00:05:01
Again, d of v will represent
the estimated distance from s to
74
00:05:01 --> 00:05:05
v.
And we're going to be updating
75
00:05:05 --> 00:05:08
those estimates as the algorithm
goes along.
76
00:05:08 --> 00:05:10
And initially,
d of s is zero,
77
00:05:10 --> 00:05:14
which now may not be the right
answer conceivably.
78
00:05:14 --> 00:05:17
Everyone else is infinity,
which is certainly an upper
79
00:05:17 --> 00:05:20
bound.
OK, these are both upper bounds
80
00:05:20 --> 00:05:23
on the true distance.
So that's fine.
81
00:05:23 --> 00:05:27
That's initialization just like
before.
82
00:05:27 --> 00:05:36
83
00:05:36 --> 00:05:39
And now we have a main loop
which happens v minus one times.
84
00:05:39 --> 00:05:41
We're not actually going to use
the index i.
85
00:05:41 --> 00:05:43
It's just a counter.
86
00:05:43 --> 00:06:02
87
00:06:02 --> 00:06:07
And we're just going to look at
every edge and relax it.
88
00:06:07 --> 00:06:13
It's a very simple idea.
If you learn about relaxation,
89
00:06:13 --> 00:06:16
this is the first thing you
might try.
90
00:06:16 --> 00:06:20
The question is when do you
stop.
91
00:06:20 --> 00:06:25
It's sort of like I have this
friend who, when he was like six
92
00:06:25 --> 00:06:31
years old, would claim,
oh, I know how to spell banana.
93
00:06:31 --> 00:06:37
I just don't know when to stop.
OK, same thing with relaxation.
94
00:06:37 --> 00:06:40
This is our relaxation step
just as before.
95
00:06:40 --> 00:06:43
We look at the edge;
we see whether it violates the
96
00:06:43 --> 00:06:47
triangle inequality according to
our current estimates: we know
97
00:06:47 --> 00:06:51
the distance from s to v should
be at most distance from s to u
98
00:06:51 --> 00:06:54
plus the weight of that edge
from u to v.
99
00:06:54 --> 00:06:55
If it isn't,
we set it equal.
100
00:06:55 --> 00:07:00
We've proved that this is
always an OK thing to do.
101
00:07:00 --> 00:07:03
We never violate,
I mean, these d of v's never
102
00:07:03 --> 00:07:07
get too small if we do a bunch
of relaxations.
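That relaxation step can be written as a tiny helper. A sketch in Python; the names `d` (current estimates) and `w` (edge-weight map) are assumptions for illustration, not the lecture's notation:

```python
def relax(u, v, w, d):
    """Relax edge (u, v): if going through u beats the current
    estimate for v, update it. d maps vertices to estimates,
    w maps (u, v) pairs to edge weights."""
    if d[u] + w[(u, v)] < d[v]:
        d[v] = d[u] + w[(u, v)]
```

Note that `float('inf') + 2` is still `inf` in Python, which matches the "infinity plus two is infinity" remark: relaxing out of an unreached vertex never updates anything.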
103
00:07:07 --> 00:07:09
So, the idea is you take every
edge.
104
00:07:09 --> 00:07:12
You relax it.
I don't care which order.
105
00:07:12 --> 00:07:15
Just relax every edge,
one each.
106
00:07:15 --> 00:07:17
And we do that V minus one
times.
107
00:07:17 --> 00:07:21
The claim is that that should
be enough if you have no
108
00:07:21 --> 00:07:25
negative weight cycles.
So, if there's a negative
109
00:07:25 --> 00:07:30
weight cycle,
we need to figure it out.
110
00:07:30 --> 00:07:35
And, we'll do that in a fairly
straightforward way,
111
00:07:35 --> 00:07:40
which is we're going to do
exactly the same thing.
112
00:07:40 --> 00:07:44
So this is outside the for loop
here.
113
00:07:44 --> 00:07:50
We'll have the same for loop:
for each edge in our graph.
114
00:07:50 --> 00:07:54
We'll try to relax it.
And if you can relax it,
115
00:07:54 --> 00:08:02
the claim is that there has to
be a negative weight cycle.
116
00:08:02 --> 00:08:04
So this is the main thing that
needs proof.
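Putting the pieces just described together — initialization, V minus one rounds of relaxing every edge, then one extra pass to detect a negative-weight cycle — here is a minimal Python sketch; the (u, v, weight) edge-list representation is an assumption, not the lecture's:

```python
def bellman_ford(vertices, edges, s):
    """edges is a list of (u, v, weight) triples.
    Returns shortest-path estimates from s, or raises if a
    negative-weight cycle is reachable from s."""
    # Initialization: d[s] = 0, everything else infinity.
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    # Main loop: |V| - 1 rounds, relaxing every edge each round.
    for _ in range(len(vertices) - 1):
        for u, v, weight in edges:
            if d[u] + weight < d[v]:
                d[v] = d[u] + weight
    # Extra pass: if any edge can still be relaxed,
    # there must be a negative-weight cycle.
    for u, v, weight in edges:
        if d[u] + weight < d[v]:
            raise ValueError("negative-weight cycle detected")
    return d
```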
117
00:08:04 --> 00:08:28
118
00:08:28 --> 00:08:31
OK, and that's the algorithm.
So the claim is that at the
119
00:08:31 --> 00:08:35
end we should have, so to
speak, the d values correct:
120
00:08:35 --> 00:08:38
d of v equals delta of s comma
v for every vertex,
121
00:08:38 --> 00:08:40
v.
If we don't find a negative
122
00:08:40 --> 00:08:44
weight cycle according to this
rule, then we should have all
123
00:08:44 --> 00:08:47
the shortest path weights.
That's the claim.
124
00:08:47 --> 00:08:50
Now, the first question is,
in here, the running time is
125
00:08:50 --> 00:08:54
very easy to analyze.
So let's start with the running
126
00:08:54 --> 00:08:56
time.
We can compare it to Dijkstra,
127
00:08:56 --> 00:09:02
which is over here.
What is the running time of
128
00:09:02 --> 00:09:06
this algorithm?
V times E, exactly.
129
00:09:06 --> 00:09:12
OK, I'm going to assume,
because it's pretty reasonable,
130
00:09:12 --> 00:09:19
that V and E are both positive.
Then it's V times E.
131
00:09:19 --> 00:09:25
So, this is a little bit
slower, or a fair amount slower,
132
00:09:25 --> 00:09:30
than Dijkstra's algorithm.
There it is:
133
00:09:30 --> 00:09:35
E plus V log V is essentially,
ignoring the logs, pretty
134
00:09:35 --> 00:09:39
much linear time.
Here we have something that's
135
00:09:39 --> 00:09:43
at least quadratic in V,
assuming your graph is
136
00:09:43 --> 00:09:45
connected.
So, it's slower,
137
00:09:45 --> 00:09:48
but it's going to handle these
negative weights.
138
00:09:48 --> 00:09:52
Dijkstra can't handle negative
weights at all.
139
00:09:52 --> 00:09:56
So, let's do an example,
make it clear why you might
140
00:09:56 --> 00:10:03
hope this algorithm works.
And then we'll prove that it
141
00:10:03 --> 00:10:08
works, of course.
But the proof will be pretty
142
00:10:08 --> 00:10:12
easy.
So, I'm going to draw a graph
143
00:10:12 --> 00:10:18
that has negative weights,
but no negative weight cycles
144
00:10:18 --> 00:10:24
so that I get an interesting
answer.
145
00:10:24 --> 00:10:55
146
00:10:55 --> 00:10:57
Good.
The other thing I need in order
147
00:10:57 --> 00:11:00
to make the output of this
algorithm well defined,
148
00:11:00 --> 00:11:03
it depends in which order you
visit the edges.
149
00:11:03 --> 00:11:07
So I'm going to assign an
arbitrary order to these edges.
150
00:11:07 --> 00:11:11
I could just ask you for an
order, but to be consistent with
151
00:11:11 --> 00:11:13
the notes, I'll put an ordering
on it.
152
00:11:13 --> 00:11:17
Let's say I put number four,
say that's the fourth edge I'll
153
00:11:17 --> 00:11:18
visit.
It doesn't matter.
154
00:11:18 --> 00:11:22
But it will affect what happens
during the algorithm for a
155
00:11:22 --> 00:11:25
particular graph.
156
00:11:25 --> 00:11:43
157
00:11:43 --> 00:11:46
Do they get them all?
One, two, three,
158
00:11:46 --> 00:11:48
four, five, six,
seven, eight,
159
00:11:48 --> 00:11:51
OK.
And my source is going to be A.
160
00:11:51 --> 00:11:54
And, that's it.
So, I want to run this
161
00:11:54 --> 00:11:57
algorithm.
I'm just going to initialize
162
00:11:57 --> 00:12:01
everything.
So, I set the estimates for s
163
00:12:01 --> 00:12:06
to be zero, and everyone else to
be infinity.
164
00:12:06 --> 00:12:10
And to give me some notion of
time, over here I'm going to
165
00:12:10 --> 00:12:15
draw or write down what all of
these d values are as the
166
00:12:15 --> 00:12:20
algorithm proceeds because I'm
going to start crossing them out
167
00:12:20 --> 00:12:25
and rewriting them that the
figure will get a little bit
168
00:12:25 --> 00:12:28
messier.
But we can keep track of it
169
00:12:28 --> 00:12:31
over here.
It's initially zero and
170
00:12:31 --> 00:12:34
infinities.
Yeah?
171
00:12:34 --> 00:12:36
It doesn't matter.
So, for the algorithm you can
172
00:12:36 --> 00:12:40
go to the edges in a different
order every time if you want.
173
00:12:40 --> 00:12:42
We'll prove that,
but here I'm going to go
174
00:12:42 --> 00:12:44
through the same order every
time.
175
00:12:44 --> 00:12:47
Good question.
It turns out it doesn't matter
176
00:12:47 --> 00:12:49
here.
OK, so here's the starting
177
00:12:49 --> 00:12:51
point.
Now I'm going to relax every
178
00:12:51 --> 00:12:53
edge.
So, there's going to be a lot
179
00:12:53 --> 00:12:55
of edges here that don't do
anything.
180
00:12:55 --> 00:12:57
I try to relax edge number one.
I'd say, well,
181
00:12:57 --> 00:13:02
I know how to get from s to B
with weight infinity.
182
00:13:02 --> 00:13:04
With infinity plus two I can
get from s to E.
183
00:13:04 --> 00:13:08
Well, infinity plus two is not
much better than infinity.
184
00:13:08 --> 00:13:11
OK, so I don't do anything,
don't update this to infinity.
185
00:13:11 --> 00:13:14
I mean, infinity plus two
sounds even worse.
186
00:13:14 --> 00:13:16
But infinity plus two is
infinity.
187
00:13:16 --> 00:13:20
OK, that's the edge number one.
So, no relaxation edge number
188
00:13:20 --> 00:13:24
two, same deal as number three,
same deal, edge number four we
189
00:13:24 --> 00:13:27
start to get something
interesting because I have a
190
00:13:27 --> 00:13:31
finite value here that says I
can get from A to B using a
191
00:13:31 --> 00:13:35
total weight of minus one.
So that seems good.
192
00:13:35 --> 00:13:41
I'll write down minus one here,
and update B to minus one.
193
00:13:41 --> 00:13:45
The rest stay the same.
So, I'm just going to keep
194
00:13:45 --> 00:13:50
doing this over and over.
That was edge number four.
195
00:13:50 --> 00:13:53
Number five,
we also get a relaxation.
196
00:13:53 --> 00:14:00
Four is better than infinity.
So, c gets a value of four.
197
00:14:00 --> 00:14:04
Then we get to edge number six.
That's infinity plus five is
198
00:14:04 --> 00:14:07
worse than four.
OK, so no relaxation there.
199
00:14:07 --> 00:14:11
Edge number seven is
interesting because I have a
200
00:14:11 --> 00:14:15
finite value here minus one plus
the weight of this edge,
201
00:14:15 --> 00:14:18
which is three.
That's a total of two,
202
00:14:18 --> 00:14:20
which is actually better than
four.
203
00:14:20 --> 00:14:24
So, this route,
A, B, c is actually better than
204
00:14:24 --> 00:14:26
the route I just found a second
ago.
205
00:14:26 --> 00:14:30
So, this is now a two.
This is all happening in one
206
00:14:30 --> 00:14:35
iteration of the main loop.
We actually found two good
207
00:14:35 --> 00:14:38
paths to c.
We found one better than the
208
00:14:38 --> 00:14:41
other.
OK, and that was edge number
209
00:14:41 --> 00:14:44
seven, and edge number eight is
over here.
210
00:14:44 --> 00:14:47
It doesn't matter.
OK, so that was round one of
211
00:14:47 --> 00:14:50
this outer loop,
so, the first value of i.
212
00:14:50 --> 00:14:52
i equals one.
OK, now we continue.
213
00:14:52 --> 00:14:56
Just keep going.
So, we start with edge number
214
00:14:56 --> 00:15:00
one.
Now, minus one plus two is one.
215
00:15:00 --> 00:15:04
That's better than infinity.
It'll start speeding up.
216
00:15:04 --> 00:15:08
It's repetitive.
It's actually not too much
217
00:15:08 --> 00:15:14
longer until we're done.
Number two, this is an infinity
218
00:15:14 --> 00:15:17
so we don't do anything.
Number three:
219
00:15:17 --> 00:15:22
minus one plus two is one;
better than infinity.
220
00:15:22 --> 00:15:25
This is vertex d,
and it's number three.
221
00:15:25 --> 00:15:31
Number four we've already done.
Nothing changed.
222
00:15:31 --> 00:15:35
Number five:
this is where we see the path
223
00:15:35 --> 00:15:38
four again, but that's worse
than two.
224
00:15:38 --> 00:15:43
So, we don't update anything.
Number six: one plus five is
225
00:15:43 --> 00:15:47
six, which is bigger than two,
so no good.
226
00:15:47 --> 00:15:49
Go around this way.
Number seven:
227
00:15:49 --> 00:15:53
same deal.
Number eight is interesting.
228
00:15:53 --> 00:15:58
So, we have a weight of one
here, a weight of minus three
229
00:15:58 --> 00:16:02
here.
So, the total is minus two,
230
00:16:02 --> 00:16:07
which is better than one.
So, that was d.
231
00:16:07 --> 00:16:13
And, I believe that's it.
So that was definitely the end
232
00:16:13 --> 00:16:18
of that round.
So, that's it for i equals two; we
233
00:16:18 --> 00:16:24
just looked at the eighth edge.
And, I'll cheat and check.
234
00:16:24 --> 00:16:30
Indeed, that is the last thing
that happens.
235
00:16:30 --> 00:16:33
We can check the couple of
outgoing edges from d because
236
00:16:33 --> 00:16:36
that's the only one whose value
just changed.
237
00:16:36 --> 00:16:39
And, there are no more
relaxations possible.
238
00:16:39 --> 00:16:43
So, that was in two rounds.
The claim is we got all the
239
00:16:43 --> 00:16:47
shortest path weights.
The algorithm would actually
240
00:16:47 --> 00:16:51
loop four times to guarantee
correctness because we have five
241
00:16:51 --> 00:16:53
vertices here and one less than
that.
242
00:16:53 --> 00:16:56
So, in fact,
in the execution here there are
243
00:16:56 --> 00:16:59
two more blank rounds at the
bottom.
244
00:16:59 --> 00:17:03
Nothing happens.
But, what the hell?
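The trace just worked on the board can be replayed in code. The edge list and its 1-through-8 visiting order below are reconstructed from the transcript, so treat the exact graph as an assumption; the distances it produces match the values read out in lecture:

```python
from math import inf

# Reconstructed example graph; source is A, edges listed in the
# lecture's visiting order 1..8 as (u, v, weight).
edges = [
    ('B', 'E', 2),   # edge 1
    ('D', 'B', 1),   # edge 2
    ('B', 'D', 2),   # edge 3
    ('A', 'B', -1),  # edge 4
    ('A', 'C', 4),   # edge 5
    ('D', 'C', 5),   # edge 6
    ('B', 'C', 3),   # edge 7
    ('E', 'D', -3),  # edge 8
]
d = {v: inf for v in 'ABCDE'}
d['A'] = 0  # source
for _ in range(4):  # |V| - 1 = 4 rounds; the last two change nothing
    for u, v, w in edges:
        if d[u] + w < d[v]:
            d[v] = d[u] + w
print(d)  # → {'A': 0, 'B': -1, 'C': 2, 'D': -2, 'E': 1}
```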
245
00:17:03 --> 00:17:06
OK, so that is Bellman-Ford.
I mean, it's certainly not
246
00:17:06 --> 00:17:08
doing anything wrong.
The question is,
247
00:17:08 --> 00:17:11
why is it guaranteed to
converge in V minus one steps
248
00:17:11 --> 00:17:13
unless there is a negative
weight cycle?
249
00:17:13 --> 00:17:15
Question?
250
00:17:15 --> 00:17:24
251
00:17:24 --> 00:17:25
Right, so that's an
optimization.
252
00:17:25 --> 00:17:28
If you discover a whole round,
and nothing happens,
253
00:17:28 --> 00:17:31
so you can keep track of that
in the algorithm thing,
254
00:17:31 --> 00:17:33
you can stop.
In the worst case,
255
00:17:33 --> 00:17:35
it won't make a difference.
But in practice,
256
00:17:35 --> 00:17:37
you probably want to do that.
Yeah?
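The early-exit optimization just mentioned is a one-flag change to the main loop; a sketch under the same assumed edge-list representation:

```python
def bellman_ford_early_stop(vertices, edges, s):
    """Bellman-Ford that stops as soon as an entire round
    relaxes nothing. Worst case is still O(VE), but on many
    inputs it finishes in far fewer rounds."""
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                changed = True
        if not changed:  # a quiet round: estimates have converged
            break
    return d
```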
257
00:17:37 --> 00:17:40
Good question.
All right, so some simple
258
00:17:40 --> 00:17:42
observations,
I mean, we're only doing
259
00:17:42 --> 00:17:44
relaxation.
So, we can use a lot of our
260
00:17:44 --> 00:17:46
analysis from before.
In particular,
261
00:17:46 --> 00:17:49
the d values are only
decreasing monotonically.
262
00:17:49 --> 00:17:51
As we cross out values here,
we are always making them
263
00:17:51 --> 00:17:54
smaller, which is good.
Another nifty thing about this
264
00:17:54 --> 00:18:00
algorithm is that you can run it
even in a distributed system.
265
00:18:00 --> 00:18:02
If this is some actual network,
some computer network,
266
00:18:02 --> 00:18:05
and these are machines,
and they're communicating by
267
00:18:05 --> 00:18:07
these links, I mean,
it's a purely local thing.
268
00:18:07 --> 00:18:09
Relaxation is a local thing.
You don't need any global
269
00:18:09 --> 00:18:12
strategy, and you're asking
about, can we do a different
270
00:18:12 --> 00:18:15
order in each step?
Well, yeah, you could just keep
271
00:18:15 --> 00:18:16
relaxing edges,
and keep relaxing edges,
272
00:18:16 --> 00:18:19
and just keep going for the
entire lifetime of the network.
273
00:18:19 --> 00:18:21
And eventually,
you will find shortest paths.
274
00:18:21 --> 00:18:24
So, this algorithm is
guaranteed to finish in V rounds
275
00:18:24 --> 00:18:27
in a distributed system.
It might be more asynchronous.
276
00:18:27 --> 00:18:30
And, it's a little harder to
analyze.
277
00:18:30 --> 00:18:34
But it will still work
eventually.
278
00:18:34 --> 00:18:41
It's guaranteed to converge.
And so, Bellman-Ford is used in
279
00:18:41 --> 00:18:46
the Internet for finding
shortest paths.
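The order-independence behind that distributed picture is easy to check in code: relax edges in a different arbitrary order on every sweep and stop when a sweep changes nothing. With no negative-weight cycle, the same distances come out. A sketch, not a real distributed implementation:

```python
import random

def relax_until_stable(vertices, edges, s, seed=0):
    """Relax edges in a freshly shuffled order each sweep until
    no estimate changes. Without a negative-weight cycle this
    converges to the shortest-path weights regardless of order."""
    rng = random.Random(seed)
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    while True:
        order = list(edges)
        rng.shuffle(order)
        changed = False
        for u, v, w in order:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                changed = True
        if not changed:
            return d
```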
280
00:18:46 --> 00:18:51
OK, so let's finally prove that
it works.
281
00:18:51 --> 00:18:56
This should only take a couple
of boards.
282
00:18:56 --> 00:19:03
So let's suppose we have a
graph and some edge weights that
283
00:19:03 --> 00:19:13
have no negative weight cycles.
Then the claim is that we
284
00:19:13 --> 00:19:19
terminate with the correct
answer.
285
00:19:19 --> 00:19:29
So, Bellman-Ford terminates
with all of these d of v values
286
00:19:29 --> 00:19:38
set to the delta values for
every vertex.
287
00:19:38 --> 00:19:42
OK, the proof is going to be
pretty immediate using the
288
00:19:42 --> 00:19:45
lemmas that we had from before
if you remember them.
289
00:19:45 --> 00:19:50
So, we're just going to look at
every vertex separately.
290
00:19:50 --> 00:19:54
So, I'll call the vertex v.
The claim is that this holds by
291
00:19:54 --> 00:19:58
the end of the algorithm.
So, remember what we need to
292
00:19:58 --> 00:20:02
prove is that at some point,
d of v equals delta of s comma
293
00:20:02 --> 00:20:06
v because we know it decreases
monotonically,
294
00:20:06 --> 00:20:10
and we know that it never gets
any smaller than the correct
295
00:20:10 --> 00:20:15
value because relaxations are
always safe.
296
00:20:15 --> 00:20:24
So, we just need to show at
some point this holds,
297
00:20:24 --> 00:20:32
and that it will hold at the
end.
298
00:20:32 --> 00:20:41
So, by monotonicity of the d
values, and by correctness part
299
00:20:41 --> 00:20:51
one, which was that the d of v's
are always greater than or equal
300
00:20:51 --> 00:20:58
to the deltas,
we only need to show that at
301
00:20:58 --> 00:21:04
some point we have equality.
302
00:21:04 --> 00:21:18
303
00:21:18 --> 00:21:21
So that's our goal.
So what we're going to do is
304
00:21:21 --> 00:21:24
just look at v,
and the shortest path to v,
305
00:21:24 --> 00:21:30
and see what happens to the
algorithm relative to that path.
306
00:21:30 --> 00:21:35
So, I'm going to name the path.
Let's call it p.
307
00:21:35 --> 00:21:40
It starts at vertex v_0 and
goes to v_1, v_2,
308
00:21:40 --> 00:21:46
whatever, and ends at v_k.
And, this is not just any
309
00:21:46 --> 00:21:51
shortest path,
but it's one that starts at s.
310
00:21:51 --> 00:21:54
So, v_0 is s,
and it ends at v.
311
00:21:54 --> 00:22:01
So, I'm going to give a couple
of names to s and v so I can
312
00:22:01 --> 00:22:04
talk about the path more
uniformly.
313
00:22:04 --> 00:22:11
So, this is a shortest path
from s to v.
314
00:22:11 --> 00:22:15
Now, I also want it to be not
just any shortest path from s to
315
00:22:15 --> 00:22:20
v, but among all shortest paths
from s to v I want it to be one
316
00:22:20 --> 00:22:23
with the fewest possible edges.
317
00:22:23 --> 00:22:32
318
00:22:32 --> 00:22:36
OK, so shortest here means in
terms of the total weight of the
319
00:22:36 --> 00:22:38
path.
Subject to being shortest in
320
00:22:38 --> 00:22:42
weight, I wanted to also be
shortest in the number of edges.
321
00:22:42 --> 00:22:46
And, the reason I want that is
to be able to conclude that p is
322
00:22:46 --> 00:22:50
a simple path,
meaning that it doesn't repeat
323
00:22:50 --> 00:22:52
any vertices.
Now, can anyone tell me why I
324
00:22:52 --> 00:22:56
need to assume that the number
of edges is the smallest
325
00:22:56 --> 00:23:01
possible in order to guarantee
that p is simple?
326
00:23:01 --> 00:23:04
The claim is that not all
shortest paths are necessarily
327
00:23:04 --> 00:23:05
simple.
Yeah?
328
00:23:05 --> 00:23:07
Right, I can have a zero weight
cycle, exactly.
329
00:23:07 --> 00:23:10
So, we are hoping,
I mean, in fact in the theorem
330
00:23:10 --> 00:23:14
here, we're assuming that there
are no negative weight cycles.
331
00:23:14 --> 00:23:17
But there might be zero weight
cycles still.
332
00:23:17 --> 00:23:20
As a zero weight cycle,
you can put that in the middle
333
00:23:20 --> 00:23:23
of any shortest path to make it
arbitrarily long,
334
00:23:23 --> 00:23:26
repeat vertices over and over.
That's going to be annoying.
335
00:23:26 --> 00:23:30
What I want is that p is
simple.
336
00:23:30 --> 00:23:33
And, I can guarantee that
essentially by shortcutting.
337
00:23:33 --> 00:23:36
If ever I take a zero weight
cycle, I throw it away.
338
00:23:36 --> 00:23:39
And this is one mathematical
way of doing that.
339
00:23:39 --> 00:23:43
OK, now what else do we know
about this shortest path?
340
00:23:43 --> 00:23:47
Well, we know that subpaths of
shortest paths are shortest
341
00:23:47 --> 00:23:49
paths.
That's optimal substructure.
342
00:23:49 --> 00:23:53
So, we know what the shortest
path from s to v_i is sort of
343
00:23:53 --> 00:23:55
inductively.
It's the shortest path,
344
00:23:55 --> 00:23:58
I mean, it's the weight of that
path, which is,
345
00:23:58 --> 00:24:01
in particular,
the shortest path from s to v_i
346
00:24:01 --> 00:24:07
minus one plus the weight of the
last edge, v_i minus one to v_i.
347
00:24:07 --> 00:24:17
So, this is by optimal
substructure as we proved last
348
00:24:17 --> 00:24:23
time.
OK, and I think that's pretty
349
00:24:23 --> 00:24:30
much the warm-up.
So, I want to sort of do this
350
00:24:30 --> 00:24:33
inductively in I,
start out with v zero,
351
00:24:33 --> 00:24:37
and go up to v_k.
So, the first question is,
352
00:24:37 --> 00:24:40
what is d of v_0,
which is s?
353
00:24:40 --> 00:24:44
What is d of the source?
Well, certainly at the
354
00:24:44 --> 00:24:47
beginning of the algorithm,
it's zero.
355
00:24:47 --> 00:24:52
So, let's say equals zero
initially because that's what we
356
00:24:52 --> 00:24:55
set it to.
And it only goes down from
357
00:24:55 --> 00:24:57
there.
So, it certainly,
358
00:24:57 --> 00:25:01
at most, zero.
The real question is,
359
00:25:01 --> 00:25:06
what is delta of s comma v_0.
What is the shortest path
360
00:25:06 --> 00:25:09
weight from s to s?
It has to be zero,
361
00:25:09 --> 00:25:13
otherwise you have a negative
weight cycle,
362
00:25:13 --> 00:25:15
exactly.
My favorite answer,
363
00:25:15 --> 00:25:19
zero.
So, if we had another path from
364
00:25:19 --> 00:25:21
s to s, I mean,
that is a cycle.
365
00:25:21 --> 00:25:26
So, it's got to be zero.
So, these are actually equal at
366
00:25:26 --> 00:25:32
the beginning of the algorithm,
which is great.
367
00:25:32 --> 00:25:37
That means they will be for all
time because we just argued up
368
00:25:37 --> 00:25:41
here, only goes down,
never can get too small.
369
00:25:41 --> 00:25:45
So, we have d of v_0 set to the
right thing.
370
00:25:45 --> 00:25:49
Great: good for the base case
of the induction.
371
00:25:49 --> 00:25:53
Of course, what we really care
about is v_k,
372
00:25:53 --> 00:25:56
which is v.
So, let's talk about the v_i
373
00:25:56 --> 00:26:02
inductively, and then we will
get v_k as a result.
374
00:26:02 --> 00:26:11
375
00:26:11 --> 00:26:14
So, yeah, let's do it by
induction.
376
00:26:14 --> 00:26:16
That's more fun.
377
00:26:16 --> 00:26:27
378
00:26:27 --> 00:26:32
Let's say that d of v_i is
equal to delta of s, v_i after i
379
00:26:32 --> 00:26:38
rounds of the algorithm.
So, this is actually referring
380
00:26:38 --> 00:26:42
to the i that is in the
algorithm here.
381
00:26:42 --> 00:26:46
These are rounds.
So, one round is an entire
382
00:26:46 --> 00:26:52
execution of all the edges,
relaxation of all the edges.
383
00:26:52 --> 00:26:56
So, this is certainly true for
i equals zero.
384
00:26:56 --> 00:27:00
We just proved that.
After zero rounds,
385
00:27:00 --> 00:27:06
at the beginning of the
algorithm, d of v_0 equals delta
386
00:27:06 --> 00:27:11
of s, v_0.
OK, so now, that's not really
387
00:27:11 --> 00:27:13
what I wanted,
but OK, fine.
388
00:27:13 --> 00:27:16
Now we'll prove it for d of v_i
plus one.
389
00:27:16 --> 00:27:20
Generally, I recommend you
assume something.
390
00:27:20 --> 00:27:24
In fact, why don't I follow my
own advice and change it?
391
00:27:24 --> 00:27:29
It's usually nicer to think of
induction as recursion.
392
00:27:29 --> 00:27:32
So, you assume that this is
true, let's say,
393
00:27:32 --> 00:27:37
for j less than the i that you
care about, and then you prove
394
00:27:37 --> 00:27:42
it for d of v_i.
It's usually a lot easier to
395
00:27:42 --> 00:27:44
think about it that way.
In particular,
396
00:27:44 --> 00:27:48
you can use strong induction
for all less than i.
397
00:27:48 --> 00:27:51
Here, we're only going to need
it for one less.
398
00:27:51 --> 00:27:56
We have some relation between i
and i minus one here in terms of
399
00:27:56 --> 00:27:59
the deltas.
And so, we want to argue
400
00:27:59 --> 00:28:05
something about the d values.
OK, well, let's think about
401
00:28:05 --> 00:28:08
what's going on here.
We know that,
402
00:28:08 --> 00:28:15
let's say, after i minus one
rounds, we have this inductive
403
00:28:15 --> 00:28:22
hypothesis, d of v_i minus one
equals delta of s v_i minus one.
404
00:28:22 --> 00:28:27
And, we want to conclude that
after i rounds,
405
00:28:27 --> 00:28:31
so we have one more round to do
this.
406
00:28:31 --> 00:28:38
We want to conclude that d of
v_i has the right answer,
407
00:28:38 --> 00:28:44
delta of s comma v_i.
Does that look familiar at all?
408
00:28:44 --> 00:28:47
So we want to relax every edge
in this round.
409
00:28:47 --> 00:28:49
In particular,
at some point,
410
00:28:49 --> 00:28:53
we have to relax the edge from
v_i minus one to v_i.
411
00:28:53 --> 00:28:56
We know that this path consists
of edges.
412
00:28:56 --> 00:29:00
That's the definition of a
path.
413
00:29:00 --> 00:29:10
So, during the i'th round,
we relax every edge.
414
00:29:10 --> 00:29:18
So, we better relax v_i minus
one v_i.
415
00:29:18 --> 00:29:30
And, what happens then?
It's a test of memory.
416
00:29:30 --> 00:29:43
417
00:29:43 --> 00:29:46
Quick, the Death Star is
approaching.
418
00:29:46 --> 00:29:51
So, if we have the correct
value for v_i minus one,
419
00:29:51 --> 00:29:57
then we relax an outgoing edge
from there, and that edge is an
420
00:29:57 --> 00:30:01
edge of the shortest path from s
to v_i.
421
00:30:01 --> 00:30:07
What do we know?
d of v_i becomes the correct
422
00:30:07 --> 00:30:13
value, delta of s comma v_i.
This was called correctness
423
00:30:13 --> 00:30:18
lemma last time.
One of the things we proved
424
00:30:18 --> 00:30:24
about Dijkstra's algorithm,
but it was really just a fact
425
00:30:24 --> 00:30:29
about relaxation.
And it was a pretty simple
426
00:30:29 --> 00:30:32
proof.
And it comes from this fact.
427
00:30:32 --> 00:30:35
We know the shortest path
weight is this.
428
00:30:35 --> 00:30:38
So, certainly d of v_i was at
least this big,
429
00:30:38 --> 00:30:42
and let's suppose it's greater,
or otherwise we were done.
430
00:30:42 --> 00:30:44
We know d of v_i minus one is
set to this.
431
00:30:44 --> 00:30:48
And so, this is exactly the
condition that's being checked
432
00:30:48 --> 00:30:52
in the relaxation step.
And, the d of v_i value will be
433
00:30:52 --> 00:30:54
greater than this,
let's suppose.
434
00:30:54 --> 00:30:56
And then, we'll set it equal to
this.
435
00:30:56 --> 00:31:01
And that's exactly delta of s, v_i.
So, when we relax that edge,
436
00:31:01 --> 00:31:04
we've got to set it to the
right value.
437
00:31:04 --> 00:31:06
So, this is the end of the
proof, right?
438
00:31:06 --> 00:31:08
It's very simple.
The point is,
439
00:31:08 --> 00:31:11
you look at your shortest path.
Here it is.
440
00:31:11 --> 00:31:14
And if we assume there's no
negative weight cycles,
441
00:31:14 --> 00:31:17
this has the correct value
initially.
442
00:31:17 --> 00:31:20
d of s is going to be zero.
After the first round,
443
00:31:20 --> 00:31:23
you've got to relax this edge.
And then you get the right
444
00:31:23 --> 00:31:26
value for that vertex.
After the second round,
445
00:31:26 --> 00:31:30
you've got to relax this edge,
which gets you the right d
446
00:31:30 --> 00:31:36
value for this vertex and so on.
And so, no matter which
447
00:31:36 --> 00:31:40
shortest path you take,
you can apply this analysis.
448
00:31:40 --> 00:31:44
And you know that by,
if the length of this path,
449
00:31:44 --> 00:31:50
here we assumed it was k edges,
then after k rounds you've got
450
00:31:50 --> 00:31:53
to be done.
OK, so this was not actually
451
00:31:53 --> 00:31:57
the end of the proof.
Sorry.
452
00:31:57 --> 00:32:03
So this means after k rounds,
we have the right answer for
453
00:32:03 --> 00:32:08
v_k, which is v.
So, the only question is how
454
00:32:08 --> 00:32:12
big could k be?
And, k had better be,
455
00:32:12 --> 00:32:18
at most, V minus one.
That's the claim made by the
456
00:32:18 --> 00:32:24
algorithm: that you only need
to do V minus one rounds.
457
00:32:24 --> 00:32:30
And indeed, the number of edges
in a simple path in a graph is,
458
00:32:30 --> 00:32:37
at most, the number of vertices
minus one.
459
00:32:37 --> 00:32:40
k is, at most,
v minus one because p is
460
00:32:40 --> 00:32:43
simple.
So, that's why we had to assume
461
00:32:43 --> 00:32:47
that it wasn't just any shortest
path.
462
00:32:47 --> 00:32:52
It had to be a simple one so it
didn't repeat any vertices.
463
00:32:52 --> 00:32:55
So there are,
at most, V vertices in the
464
00:32:55 --> 00:33:01
path, so at most,
V minus one edges in the path.
465
00:33:01 --> 00:33:05
OK, and that's all there is to
Bellman-Ford.
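Since the algorithm has now been fully described, a minimal sketch in Python may help; the edge-list graph representation and names here are my own choices, not from the lecture:

```python
# Sketch of Bellman-Ford as described in lecture (representation is
# an assumption: vertices are hashable labels, edges are (u, v, w)).
def bellman_ford(vertices, edges, s):
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    # |V| - 1 rounds suffice: a shortest path is simple,
    # so it has at most |V| - 1 edges.
    for _ in range(len(vertices) - 1):
        for (u, v, w) in edges:
            if d[u] + w < d[v]:       # the relaxation step
                d[v] = d[u] + w
    # The extra V'th round: if any edge still relaxes,
    # there is a negative weight cycle.
    for (u, v, w) in edges:
        if d[u] + w < d[v]:
            raise ValueError("negative weight cycle")
    return d
```

For instance, on the small graph with edges s to a of weight 1, a to b of weight minus 2, and s to b of weight 5, this returns d of b equal to minus 1.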
466
00:33:05 --> 00:33:08
So: pretty simple in
correctness.
467
00:33:08 --> 00:33:15
Of course, we're using a lot of
the lemmas that we proved last
468
00:33:15 --> 00:33:21
time, which makes it easier.
OK, a consequence of this
469
00:33:21 --> 00:33:27
theorem, or of this proof is
that if Bellman-Ford fails to
470
00:33:27 --> 00:33:33
converge, and that's what the
algorithm is checking: whether
471
00:33:33 --> 00:33:39
this relaxation still requires
work after these V minus one
472
00:33:39 --> 00:33:44
steps.
Right, the end of this
473
00:33:44 --> 00:33:48
algorithm is run another round,
a V'th round,
474
00:33:48 --> 00:33:53
see whether anything changes.
So, we'll say that the
475
00:33:53 --> 00:33:58
algorithm fails to converge
after V minus one steps or
476
00:33:58 --> 00:34:01
rounds.
Then, there has to be a
477
00:34:01 --> 00:34:04
negative weight cycle.
OK, this is just a
478
00:34:04 --> 00:34:06
contrapositive of what we
proved.
479
00:34:06 --> 00:34:10
We proved that if you assume
there's no negative weight
480
00:34:10 --> 00:34:14
cycle, then we know that d of s
is zero, and then all this
481
00:34:14 --> 00:34:18
argument says is you've got to
converge after v minus one
482
00:34:18 --> 00:34:21
rounds.
There can't be anything left to
483
00:34:21 --> 00:34:24
do once you've reached the
shortest path weights because
484
00:34:24 --> 00:34:30
you're going down monotonically;
you can never go below the bottom.
485
00:34:30 --> 00:34:33
You can never go through the floor.
So, if you fail to converge
486
00:34:33 --> 00:34:37
somehow after V minus one
rounds, you've got to have
487
00:34:37 --> 00:34:40
violated the assumption.
The only assumption we made was
488
00:34:40 --> 00:34:42
there's no negative weight
cycle.
489
00:34:42 --> 00:34:45
So, this tells us that
Bellman-Ford is actually
490
00:34:45 --> 00:34:48
correct.
When it says that there is a
491
00:34:48 --> 00:34:51
negative weight cycle,
it indeed means it.
492
00:34:51 --> 00:34:53
It's true.
OK, and you can modify
493
00:34:53 --> 00:34:56
Bellman-Ford in that case to
sort of run a little longer,
494
00:34:56 --> 00:35:01
and find where all the minus
infinities are.
495
00:35:01 --> 00:35:02
And that is,
in some sense,
496
00:35:02 --> 00:35:05
one of the things you have to
do in your problem set,
497
00:35:05 --> 00:35:08
I believe.
So, I won't cover it here.
498
00:35:08 --> 00:35:11
But, it's a good exercise in
any case to figure out how you
499
00:35:11 --> 00:35:14
would find where the minus
infinities are.
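The lecture leaves the details as an exercise, so the following is only one plausible sketch, not the lecture's solution: after V minus one rounds, any edge that still relaxes points into the negative-weight-cycle region, and everything reachable from there gets minus infinity.

```python
from collections import deque

# One plausible way to locate the minus-infinity vertices (a sketch,
# since the lecture leaves this as an exercise): d holds the distance
# estimates after |V| - 1 rounds of relaxing every edge.
def minus_infinity_vertices(vertices, edges, d):
    # Heads of edges that still relax lie on or past a negative cycle.
    affected = deque(v for (u, v, w) in edges if d[u] + w < d[v])
    seen = set(affected)
    adj = {u: [] for u in vertices}
    for (u, v, w) in edges:
        adj[u].append(v)
    # BFS: every vertex reachable from an affected vertex also has
    # distance minus infinity.
    while affected:
        u = affected.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                affected.append(v)
    return seen
```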
500
00:35:14 --> 00:35:18
What are all the vertices
reachable from negative weight
501
00:35:18 --> 00:35:20
cycle?
Those are the ones that have
502
00:35:20 --> 00:35:22
minus infinities.
OK, so you might say,
503
00:35:22 --> 00:35:26
well, that was awfully fast.
Actually, it's not over yet.
504
00:35:26 --> 00:35:29
The episode is not yet ended.
We're going to use Bellman-Ford
505
00:35:29 --> 00:35:35
to solve the even bigger and
greater shortest path problems.
506
00:35:35 --> 00:35:39
And in the remainder of today's
lecture, we will see it applied
507
00:35:39 --> 00:35:42
to a more general problem,
in some sense,
508
00:35:42 --> 00:35:45
called linear programming.
And the next lecture,
509
00:35:45 --> 00:35:49
we'll really use it to do some
amazing stuff with all pairs
510
00:35:49 --> 00:35:52
shortest paths.
Let's go over here.
511
00:35:52 --> 00:35:55
So, our goal,
although it won't be obvious
512
00:35:55 --> 00:35:59
today, is to be able to compute
the shortest paths between every
513
00:35:59 --> 00:36:03
pair of vertices,
which we could certainly do at
514
00:36:03 --> 00:36:08
this point just by running
Bellman-Ford v times.
515
00:36:08 --> 00:36:15
OK, but we want to do better
than that, of course.
516
00:36:15 --> 00:36:21
And, that will be the climax of
the trilogy.
517
00:36:21 --> 00:36:30
OK, today we just discovered
who Luke's father is.
518
00:36:30 --> 00:36:37
So, it turns out the father of
shortest paths is linear
519
00:36:37 --> 00:36:42
programming.
Actually, simultaneously the
520
00:36:42 --> 00:36:50
father and the mother because
programs do not have gender.
521
00:36:50 --> 00:36:57
OK, my father likes to say,
we both took improv comedy
522
00:36:57 --> 00:37:05
lessons so we have degrees in
improvisation.
523
00:37:05 --> 00:37:07
And he said,
you know, we went to improv
524
00:37:07 --> 00:37:10
classes in order to learn how to
make our humor better.
525
00:37:10 --> 00:37:13
And, the problem is,
it didn't actually make our
526
00:37:13 --> 00:37:16
humor better.
It just made us less afraid to
527
00:37:16 --> 00:37:17
use it.
[LAUGHTER] So,
528
00:37:17 --> 00:37:20
you are subjected to all this
improv humor.
529
00:37:20 --> 00:37:22
I didn't see the connection of
Luke's father,
530
00:37:22 --> 00:37:25
but there you go.
OK, so, linear programming is a
531
00:37:25 --> 00:37:29
very general problem,
a very big tool.
532
00:37:29 --> 00:37:32
Has anyone seen linear
programming before?
533
00:37:32 --> 00:37:36
OK, one person.
And, I'm sure you will,
534
00:37:36 --> 00:37:40
at some time in your life.
If you do anything vaguely
535
00:37:40 --> 00:37:45
computing or optimization related,
linear programming comes up at
536
00:37:45 --> 00:37:48
some point.
It's a very useful tool.
537
00:37:48 --> 00:37:53
You're given a matrix and two
vectors: not too exciting yet.
538
00:37:53 --> 00:37:57
What you want to do is find a
vector.
539
00:37:57 --> 00:38:02
This is a very dry description.
We'll see what makes it so
540
00:38:02 --> 00:38:04
interesting in a moment.
541
00:38:04 --> 00:38:17
542
00:38:17 --> 00:38:21
So, you want to maximize some
objective, and you have some
543
00:38:21 --> 00:38:24
constraints.
And they're all linear.
544
00:38:24 --> 00:38:28
So, the objective is a linear
function in the variables x,
545
00:38:28 --> 00:38:32
and your constraints are a
bunch of linear constraints,
546
00:38:32 --> 00:38:36
inequality constraints,
that's what makes it
547
00:38:36 --> 00:38:39
interesting.
It's not just solving a linear
548
00:38:39 --> 00:38:43
system as you've seen in linear
algebra, or whatever.
549
00:38:43 --> 00:38:46
Or, of course,
it could be that there is no
550
00:38:46 --> 00:38:49
such x.
OK: vaguely familiar you might
551
00:38:49 --> 00:38:52
think to the theorem about
Bellman-Ford.
552
00:38:52 --> 00:38:56
And, we'll show that there's
some kind of connection here
553
00:38:56 --> 00:39:01
that either you want to find
something, or show that it
554
00:39:01 --> 00:39:06
doesn't exist.
Well, that's still a pretty
555
00:39:06 --> 00:39:09
vague connection,
but I also want to maximize
556
00:39:09 --> 00:39:13
something, or sort of minimize
in the case of shortest paths,
557
00:39:13 --> 00:39:17
OK, somewhat similar.
We have these constraints.
558
00:39:17 --> 00:39:19
So, yeah.
This may be intuitive to you,
559
00:39:19 --> 00:39:22
I don't know.
I prefer a more geometric
560
00:39:22 --> 00:39:27
picture, and I will try to draw
such a geometric picture,
561
00:39:27 --> 00:39:30
and I've never tried to do this
on a blackboard,
562
00:39:30 --> 00:39:36
so it should be interesting.
I think I'm going to fail
563
00:39:36 --> 00:39:39
miserably.
It sort of looks like a
564
00:39:39 --> 00:39:41
dodecahedron,
right?
565
00:39:41 --> 00:39:44
Sort of, kind of,
not really.
566
00:39:44 --> 00:39:47
A bit rough on the bottom,
OK.
567
00:39:47 --> 00:39:51
So, if you have a bunch of
linear constraints,
568
00:39:51 --> 00:39:56
this is supposed to be in 3-D.
Now I labeled it.
569
00:39:56 --> 00:40:00
It's now in 3-D.
Good.
570
00:40:00 --> 00:40:02
So, you have these linear
constraints.
571
00:40:02 --> 00:40:06
That turns out to define
hyperplanes in n dimensions.
572
00:40:06 --> 00:40:11
OK, so you have this base here
that's three-dimensional space.
573
00:40:11 --> 00:40:14
So, n equals three.
And, these hyperplanes,
574
00:40:14 --> 00:40:17
if you're looking at one side
of the hyperplane,
575
00:40:17 --> 00:40:21
that's the less than or equal
to, if you take the
576
00:40:21 --> 00:40:24
intersection,
you get some convex polytope or
577
00:40:24 --> 00:40:27
polyhedron.
In 3-D, you might get a
578
00:40:27 --> 00:40:29
dodecahedron or whatever.
And, your goal,
579
00:40:29 --> 00:40:33
you have some objective vector
c, let's say,
580
00:40:33 --> 00:40:37
up.
Suppose that's the c vector.
581
00:40:37 --> 00:40:42
Your goal is to find the
highest point in this polytope.
582
00:40:42 --> 00:40:47
So here, it's maybe this one.
OK, this is the target.
583
00:40:47 --> 00:40:49
This is the optimal,
x.
584
00:40:49 --> 00:40:54
That is the geometric view.
If you prefer the algebraic
585
00:40:54 --> 00:41:00
view, you want to maximize the c
transpose times x.
586
00:41:00 --> 00:41:01
So, this is m.
This is n.
587
00:41:01 --> 00:41:04
Check out the dimensions work
out.
588
00:41:04 --> 00:41:08
So that's saying you want to
maximize the dot product.
589
00:41:08 --> 00:41:13
You want to maximize the extent
to which x is in the direction
590
00:41:13 --> 00:41:16
c.
And, you want to maximize that
591
00:41:16 --> 00:41:20
subject to some constraints,
which looks something like
592
00:41:20 --> 00:41:22
this, maybe.
So, this is A,
593
00:41:22 --> 00:41:25
and it's m by n.
You want to multiply it by,
594
00:41:25 --> 00:41:30
it should be something of
height n.
595
00:41:30 --> 00:41:32
That's x.
Let me put x down here,
596
00:41:32 --> 00:41:36
n by one.
And, it should be less than or
597
00:41:36 --> 00:41:39
equal to something of this
height, which is B,
598
00:41:39 --> 00:41:44
the right hand side.
OK, that's the algebraic view,
599
00:41:44 --> 00:41:48
which is to check out all the
dimensions are working out.
600
00:41:48 --> 00:41:52
But, you can read these off:
each row here,
601
00:41:52 --> 00:41:57
when multiplied by this column,
gives you one value here.
602
00:41:57 --> 00:42:03
And that's just a linear
constraint on all the x_i's.
603
00:42:03 --> 00:42:08
So, you want to maximize this
linear function of x_1 up to x_n
604
00:42:08 --> 00:42:11
subject to these constraints,
OK?
605
00:42:11 --> 00:42:16
Pretty simple,
but pretty powerful in general.
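To make the algebraic view concrete, here is a toy instance in plain Python; the matrix, vectors, and candidate point are made up for illustration and are not from the lecture:

```python
# Toy LP instance (made-up numbers): maximize c^T x subject to A x <= b.
A = [[1, 1],     # row 1:  x1 + x2 <= 4
     [1, -1]]    # row 2:  x1 - x2 <= 2
b = [4, 2]
c = [1, 2]       # objective: maximize x1 + 2*x2

def feasible(x):
    # Each row of A, multiplied by the column x, must be <= that b_i.
    return all(sum(a * xj for a, xj in zip(row, x)) <= bi
               for row, bi in zip(A, b))

def objective(x):
    return sum(cj * xj for cj, xj in zip(c, x))

# The point (0, 4) satisfies both constraints; its objective value is 8.
```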
606
00:42:16 --> 00:42:21
So, it turns out that with,
you can formulate a huge number
607
00:42:21 --> 00:42:26
of problems such as shortest
paths as a linear program.
608
00:42:26 --> 00:42:31
So, it's a general tool.
And in this class,
609
00:42:31 --> 00:42:37
we will not cover any
algorithms for solving linear
610
00:42:37 --> 00:42:40
programming.
It's a bit tricky.
611
00:42:40 --> 00:42:44
I'll just mention that they are
out there.
612
00:42:44 --> 00:42:50
So, there's many efficient
algorithms, and lots of code
613
00:42:50 --> 00:42:55
that does this.
It's a very practical setup.
614
00:42:55 --> 00:43:02
So, lots of algorithms to solve
LP's, linear programs.
615
00:43:02 --> 00:43:05
Linear programming is usually
called LP.
616
00:43:05 --> 00:43:08
And, I'll mention a few of
them.
617
00:43:08 --> 00:43:14
There's the simplex algorithm.
This is one of the first.
618
00:43:14 --> 00:43:18
I think it is the first.
There's the ellipsoid algorithm.
619
00:43:18 --> 00:43:24
There's interior point methods,
and there's random sampling.
620
00:43:24 --> 00:43:29
I'll just say a little bit
about each of these because
621
00:43:29 --> 00:43:36
we're not going to talk about
any of them in depth.
622
00:43:36 --> 00:43:38
The simplex algorithm,
this is, I mean,
623
00:43:38 --> 00:43:41
one of the first algorithms in
the world in some sense,
624
00:43:41 --> 00:43:43
certainly one of the most
popular.
625
00:43:43 --> 00:43:47
It's still used today.
Almost all linear programming
626
00:43:47 --> 00:43:50
code uses the simplex algorithm.
It happens to run an
627
00:43:50 --> 00:43:53
exponential time in the
worst-case, so it's actually
628
00:43:53 --> 00:43:56
pretty bad theoretically.
But in practice,
629
00:43:56 --> 00:43:59
it works really well.
And there is some recent work
630
00:43:59 --> 00:44:03
that tries to understand this.
It's still exponential in the
631
00:44:03 --> 00:44:06
worst case.
But, it's practical.
632
00:44:06 --> 00:44:10
There's actually an open
problem whether there exists a
633
00:44:10 --> 00:44:13
variation of simplex that runs
in polynomial time.
634
00:44:13 --> 00:44:17
But, I won't go into that.
That's a major open problem in
635
00:44:17 --> 00:44:22
this area of linear programming.
The ellipsoid algorithm was the
636
00:44:22 --> 00:44:26
first algorithm to solve linear
programming in polynomial time.
637
00:44:26 --> 00:44:30
So, for a long time,
people didn't know.
638
00:44:30 --> 00:44:32
Around this time,
people started realizing
639
00:44:32 --> 00:44:36
polynomial time is a good thing.
That happened around the late
640
00:44:36 --> 00:44:37
60s.
Polynomial time is good.
641
00:44:37 --> 00:44:41
And, the ellipsoid algorithm is
the first one to do it.
642
00:44:41 --> 00:44:44
It's a very general algorithm,
and very powerful,
643
00:44:44 --> 00:44:46
theoretically:
completely impractical.
644
00:44:46 --> 00:44:49
But, it's cool.
It lets you do things like you
645
00:44:49 --> 00:44:52
can solve a linear program that
has exponentially many
646
00:44:52 --> 00:44:56
constraints in polynomial time.
You've got all sorts of crazy
647
00:44:56 --> 00:44:57
things.
So, I'll just say it's
648
00:44:57 --> 00:45:01
polynomial time.
If I can't say something nice
649
00:45:01 --> 00:45:04
about it, I won't say it at all.
It's impractical.
650
00:45:04 --> 00:45:07
Interior point methods are sort
of the mixture.
651
00:45:07 --> 00:45:11
They run in polynomial time.
You can guarantee that.
652
00:45:11 --> 00:45:14
And, they are also pretty
practical, and there's sort of
653
00:45:14 --> 00:45:18
this competition these days
about whether simplex or
654
00:45:18 --> 00:45:21
interior point is better.
And, I don't know what it is
655
00:45:21 --> 00:45:24
today but a few years ago they
were neck and neck.
656
00:45:24 --> 00:45:27
And, random sampling is a brand
new approach.
657
00:45:27 --> 00:45:31
This is just from a couple
years ago by two MIT professors,
658
00:45:31 --> 00:45:35
Dimitris Bertsimas and Santosh
Vempala, I guess the other is in
659
00:45:35 --> 00:45:39
applied math.
So, just to show you,
660
00:45:39 --> 00:45:41
there's active work in this
area.
661
00:45:41 --> 00:45:44
People are still finding new
ways to solve linear programs.
662
00:45:44 --> 00:45:47
This is completely randomized,
and very simple,
663
00:45:47 --> 00:45:50
and very general.
It hasn't been implemented,
664
00:45:50 --> 00:45:52
so we don't know how practical
it is yet.
665
00:45:52 --> 00:45:54
But, it has potential.
OK: pretty neat.
666
00:45:54 --> 00:45:57
OK, we're going to look at a
somewhat simpler version of
667
00:45:57 --> 00:46:02
linear programming.
The first restriction we are
668
00:46:02 --> 00:46:05
going to make is actually not
much of a restriction.
669
00:46:05 --> 00:46:09
But, nonetheless we will
consider it; it's a little bit
670
00:46:09 --> 00:46:13
easier to think about.
So here, we had some polytope
671
00:46:13 --> 00:46:16
we wanted to maximize some
objective.
672
00:46:16 --> 00:46:19
In a feasibility problem,
I just want to know,
673
00:46:19 --> 00:46:23
is the polytope empty?
Can you find any point in that
674
00:46:23 --> 00:46:26
polytope?
Can you find any set of values,
675
00:46:26 --> 00:46:30
x, that satisfy these
constraints?
676
00:46:30 --> 00:46:34
OK, so there's no objective
c; just find x such that Ax is
677
00:46:34 --> 00:46:39
less than or equal to B.
OK, it turns out you can prove
678
00:46:39 --> 00:46:43
a very general theorem that if
you can solve linear
679
00:46:43 --> 00:46:47
feasibility, you can also solve
linear programming.
680
00:46:47 --> 00:46:52
We won't prove that here,
but this is actually no easier
681
00:46:52 --> 00:46:56
than the original problem even
though it feels easier,
682
00:46:56 --> 00:47:03
and it's easier to think about.
I was just saying actually no
683
00:47:03 --> 00:47:08
easier than LP.
OK, the next restriction we're
684
00:47:08 --> 00:47:11
going to make is a real
restriction.
685
00:47:11 --> 00:47:17
And it simplifies the problem
quite a bit.
686
00:47:17 --> 00:47:30
687
00:47:30 --> 00:47:35
And that's to look at difference
constraints.
688
00:47:35 --> 00:47:40
And, if all this seemed a bit
abstract so far,
689
00:47:40 --> 00:47:45
we will now ground ourselves
little bit.
690
00:47:45 --> 00:47:51
A system of difference
constraints is a linear
691
00:47:51 --> 00:47:57
feasibility problem.
So, it's an LP where there's no
692
00:47:57 --> 00:48:06
objective.
And, it's with a restriction,
693
00:48:06 --> 00:48:17
so, where each row of the
matrix, so, the matrix,
694
00:48:17 --> 00:48:26
A, has one one,
and it has one minus one,
695
00:48:26 --> 00:48:36
and everything else in the row
is zero.
696
00:48:36 --> 00:48:40
OK, in other words,
each constraint has its very
697
00:48:40 --> 00:48:45
simple form.
It involves two variables and
698
00:48:45 --> 00:48:49
some number.
So, we have something like x_j
699
00:48:49 --> 00:48:53
minus x_i is less than or equal
to w_ij.
700
00:48:53 --> 00:49:00
So, this is just a number.
These are two variables.
701
00:49:00 --> 00:49:02
There's a minus sign,
no values up here,
702
00:49:02 --> 00:49:06
no coefficients,
no other of the X_k's appear,
703
00:49:06 --> 00:49:09
just two of them.
And, you have a bunch of
704
00:49:09 --> 00:49:13
constraints of this form,
one per row of the matrix.
705
00:49:13 --> 00:49:16
Geometrically,
I haven't thought about what
706
00:49:16 --> 00:49:18
this means.
I think it means the
707
00:49:18 --> 00:49:22
hyperplanes are pretty simple.
Sorry I can't do better than
708
00:49:22 --> 00:49:25
that.
It's a little hard to see this
709
00:49:25 --> 00:49:30
in high dimensions.
But, it will start to
710
00:49:30 --> 00:49:38
correspond to something we've
seen, namely the board that it's
711
00:49:38 --> 00:49:45
next to, very shortly.
OK, so let's do a very quick
712
00:49:45 --> 00:49:50
example mainly to have something
to point at.
713
00:49:50 --> 00:49:59
Here's a very simple system of
difference constraints --
714
00:49:59 --> 00:50:11
715
00:50:11 --> 00:50:13
-- OK, and a solution.
Why not?
716
00:50:13 --> 00:50:18
It's not totally trivial to
solve this, but here's a
717
00:50:18 --> 00:50:21
solution.
And the only thing to check is
718
00:50:21 --> 00:50:25
that each of these constraints
is satisfied.
719
00:50:25 --> 00:50:29
x_1 minus x_2 is three,
which is less than or equal to
720
00:50:29 --> 00:50:35
three, and so on.
There could be negative values.
721
00:50:35 --> 00:50:42
There could be positive values.
It doesn't matter.
722
00:50:42 --> 00:50:49
I'd like to transform this
system of difference constraints
723
00:50:49 --> 00:50:55
into a graph because we know a
lot about graphs.
724
00:50:55 --> 00:51:03
So, we're going to call this
the constraint graph.
725
00:51:03 --> 00:51:08
And, it's going to represent
these constraints.
726
00:51:08 --> 00:51:13
How'd I do it?
Well, I take every constraint,
727
00:51:13 --> 00:51:20
which in general looks like
this, and I convert it into an
728
00:51:20 --> 00:51:24
edge.
OK, so if I write it as x_j
729
00:51:24 --> 00:51:29
minus x_i is less than or equal
to some w_ij,
730
00:51:29 --> 00:51:36
w seems suggestive of weights.
That's exactly why I called it
731
00:51:36 --> 00:51:38
w.
I'm going to make that an edge
732
00:51:38 --> 00:51:41
from v_i to v_j.
So, the order flips a little
733
00:51:41 --> 00:51:44
bit.
And, the weight of that edge is
734
00:51:44 --> 00:51:46
w_ij.
So, just do that.
735
00:51:46 --> 00:51:49
Make n vertices.
So, you have the number of
736
00:51:49 --> 00:51:53
vertices equals n.
The number of edges equals the
737
00:51:53 --> 00:51:56
number of constraints,
which is m, the height of the
738
00:51:56 --> 00:52:01
matrix, and just transform.
So, for example,
739
00:52:01 --> 00:52:06
here we have three variables.
So, we have three vertices,
740
00:52:06 --> 00:52:09
v_1, v_2, v_3.
We have x_1 minus x_2.
741
00:52:09 --> 00:52:14
So, we have an edge from v_2 to
v_1 of weight three.
742
00:52:14 --> 00:52:18
We have x_2 minus x_3.
So, we have an edge from v_3 to
743
00:52:18 --> 00:52:23
v_2 of weight minus two.
And, we have x_1 minus x_3.
744
00:52:23 --> 00:52:27
So, we have an edge from v_3 to
v_1 of weight two.
745
00:52:27 --> 00:52:32
I hope I got the directions
right.
746
00:52:32 --> 00:52:34
Yep.
So, there it is,
747
00:52:34 --> 00:52:40
a graph: currently no obvious
connection to shortest paths,
748
00:52:40 --> 00:52:42
right?
But in fact,
749
00:52:42 --> 00:52:47
this constraint is closely
related to shortest paths.
750
00:52:47 --> 00:52:52
So let me just rewrite it.
You could say,
751
00:52:52 --> 00:52:59
well, an x_j is less than or
equal to x_i plus w_ij.
752
00:52:59 --> 00:53:03
Or, you could think of it as
d[j] less than or equal to d[i]
753
00:53:03 --> 00:53:07
plus w_ij.
This is a conceptual balloon.
754
00:53:07 --> 00:53:10
Look awfully familiar?
A lot like the triangle
755
00:53:10 --> 00:53:13
inequality, a lot like
relaxation.
756
00:53:13 --> 00:53:17
So, there's a very close
connection between these two
757
00:53:17 --> 00:53:21
problems as we will now prove.
758
00:53:21 --> 00:53:43
759
00:53:43 --> 00:53:45
So, we're going to have two
theorems.
760
00:53:45 --> 00:53:49
And, they're going to look
similar to the correctness of
761
00:53:49 --> 00:53:53
Bellman-Ford in that they talk
about negative weight cycles.
762
00:53:53 --> 00:53:54
Here we go.
It turns out,
763
00:53:54 --> 00:53:57
I mean, we have this constraint
graph.
764
00:53:57 --> 00:54:02
It can have negative weights.
It can have positive weights.
765
00:54:02 --> 00:54:05
It turns out what matters is if
you have a negative weight
766
00:54:05 --> 00:54:07
cycle.
So, the first thing to prove is
767
00:54:07 --> 00:54:11
that if you have a negative
weight cycle that something bad
768
00:54:11 --> 00:54:13
happens.
OK, what could happen bad?
769
00:54:13 --> 00:54:16
Well, we're just trying to
satisfy this system of
770
00:54:16 --> 00:54:19
constraints.
So, the bad thing is that there
771
00:54:19 --> 00:54:22
might not be any solution.
These constraints may be
772
00:54:22 --> 00:54:24
infeasible.
And that's the claim.
773
00:54:24 --> 00:54:29
The claim is that this is
actually an if and only if.
774
00:54:29 --> 00:54:33
But first we'll proved the if.
If you have a negative weight
775
00:54:33 --> 00:54:38
cycle, you're doomed.
The difference constraints are
776
00:54:38 --> 00:54:41
unsatisfiable.
That's a more intuitive way to
777
00:54:41 --> 00:54:43
say it.
In the LP world,
778
00:54:43 --> 00:54:48
they call it infeasible.
But unsatisfiable makes a lot
779
00:54:48 --> 00:54:51
more sense.
There's no way to assign the
780
00:54:51 --> 00:54:56
x_i's in order to satisfy all
the constraints simultaneously.
781
00:54:56 --> 00:55:01
So, let's just take a look.
Consider a negative weight
782
00:55:01 --> 00:55:03
cycle.
It starts at some vertex,
783
00:55:03 --> 00:55:07
goes through some vertices,
and at some point comes back.
784
00:55:07 --> 00:55:11
I don't care whether it repeats
vertices, just as long as this
785
00:55:11 --> 00:55:15
cycle, from v_1 to v_1 is a
negative weight cycle strictly
786
00:55:15 --> 00:55:17
negative weight.
787
00:55:17 --> 00:55:26
788
00:55:26 --> 00:55:30
OK, and what I'm going to do is
just write down all the
789
00:55:30 --> 00:55:34
constraints.
Each of these edges corresponds
790
00:55:34 --> 00:55:37
to a constraint,
which must be in the set of
791
00:55:37 --> 00:55:40
constraints because we had that
graph.
792
00:55:40 --> 00:55:45
So, these are all edges.
Let's look at what they give
793
00:55:45 --> 00:55:48
us.
So, we have an edge from v_1 to
794
00:55:48 --> 00:55:50
v_2.
That corresponds to x_2 minus
795
00:55:50 --> 00:55:53
x_1 is, at most,
something, w_12.
796
00:55:53 --> 00:55:57
Then we have x_3 minus x_2.
That's the weight w_23,
797
00:55:57 --> 00:56:04
and so on.
And eventually we get up to
798
00:56:04 --> 00:56:08
something like x_k minus
x_(k-1).
799
00:56:08 --> 00:56:15
That's this edge:
w_(k-1),k , and lastly we have
800
00:56:15 --> 00:56:23
this edge, which wraps around.
So, it's x_1 minus x_k,
801
00:56:23 --> 00:56:30
w_k1 if I've got the signs
right.
802
00:56:30 --> 00:56:35
Good, so here's a bunch of
constraints.
803
00:56:35 --> 00:56:40
What do you suggest I do with
them?
804
00:56:40 --> 00:56:47
Anything interesting about
these constraints,
805
00:56:47 --> 00:56:52
say, the left hand sides?
Sorry?
806
00:56:52 --> 00:57:00
It sounded like the right word.
What was it?
807
00:57:00 --> 00:57:01
Telescopes, yes,
good.
808
00:57:01 --> 00:57:04
Everything cancels.
If I added these up,
809
00:57:04 --> 00:57:08
there's an x_2 and a minus x_2.
There's a minus x_1 and an x_1.
810
00:57:08 --> 00:57:12
There's a minus x_k and an x_k.
Everything here cancels if I
811
00:57:12 --> 00:57:15
add up the left hand sides.
So, what happens if I add up
812
00:57:15 --> 00:57:18
the right hand sides?
Over here I get zero,
813
00:57:18 --> 00:57:20
my favorite answer.
And over here,
814
00:57:20 --> 00:57:24
we get all the weights of all
the edges in the negative weight
815
00:57:24 --> 00:57:30
cycle, which is the weight of
the cycle, which is negative.
816
00:57:30 --> 00:57:33
So, zero is strictly less than
zero: contradiction.
817
00:57:33 --> 00:57:35
Contradiction:
wait a minute,
818
00:57:35 --> 00:57:37
we didn't assume anything that
was false.
819
00:57:37 --> 00:57:40
So, it's not really a
contradiction in the
820
00:57:40 --> 00:57:43
mathematical sense.
We didn't contradict the world.
821
00:57:43 --> 00:57:47
We just said that these
constraints are contradictory.
822
00:57:47 --> 00:57:50
In other words,
if you pick any values of the
823
00:57:50 --> 00:57:53
x_i's, there is no way that
these can all be true because
824
00:57:53 --> 00:57:55
that you would get a
contradiction.
825
00:57:55 --> 00:57:59
So, it's impossible for these
things to be satisfied by some
826
00:57:59 --> 00:58:01
real x_i's.
So, these must be
827
00:58:01 --> 00:58:07
unsatisfiable.
Let's say there's no satisfying
828
00:58:07 --> 00:58:11
assignment, a little more
precise, x_1 up to x_n,
829
00:58:11 --> 00:58:14
no weights.
Can we satisfy those
830
00:58:14 --> 00:58:18
constraints?
Because they add up to zero on
831
00:58:18 --> 00:58:23
the left-hand side,
and negative on the right-hand
832
00:58:23 --> 00:58:26
side.
OK, so that's an easy proof.
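A tiny numeric check of the telescoping argument, with made-up weights rather than the lecture's symbols:

```python
# Constraints around a cycle v_1 -> v_2 -> v_3 -> v_1 whose total
# weight is -1 (weights are made up for illustration).
cycle = [(2, 1, 2),    # x_2 - x_1 <= 2
         (3, 2, -4),   # x_3 - x_2 <= -4
         (1, 3, 1)]    # x_1 - x_3 <= 1

# The left-hand sides telescope: for ANY assignment x they sum to zero.
x = {1: 7, 2: -3, 3: 0}   # arbitrary values
assert sum(x[j] - x[i] for (j, i, _) in cycle) == 0

# The right-hand sides sum to the cycle weight, which is negative,
# so the constraints would force 0 <= -1: unsatisfiable.
total_weight = sum(w for (_, _, w) in cycle)
assert total_weight == -1
```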
833
00:58:26 --> 00:58:33
The reverse direction will be
only slightly harder.
834
00:58:33 --> 00:58:34
OK, so, cool.
We have this connection.
835
00:58:34 --> 00:58:37
So motivation is,
suppose you'd want to solve
836
00:58:37 --> 00:58:40
these difference constraints.
And we'll see one such
837
00:58:40 --> 00:58:42
application.
I Googled around for difference
838
00:58:42 --> 00:58:44
constraints.
There is a fair number of
839
00:58:44 --> 00:58:46
papers that care about
difference constraints.
840
00:58:46 --> 00:58:49
And, they all use shortest
paths to solve them.
841
00:58:49 --> 00:58:51
So, if we can prove a
connection between shortest
842
00:58:51 --> 00:58:54
paths, which we know how to
compute, and difference
843
00:58:54 --> 00:58:56
constraints, then we'll have
something cool.
844
00:58:56 --> 00:59:00
And, next class will see even
more applications of difference
845
00:59:00 --> 00:59:05
constraints.
It turns out they're really
846
00:59:05 --> 00:59:09
useful for all pairs shortest
paths.
847
00:59:09 --> 00:59:16
OK, but for now let's just
prove this equivalence and
848
00:59:16 --> 00:59:21
finish it off.
So, the reverse direction is if
849
00:59:21 --> 00:59:29
there's no negative weight cycle
in this constraint graph,
850
00:59:29 --> 00:59:35
then the system better be
satisfiable.
851
00:59:35 --> 00:59:42
The claim is that these
negative weight cycles are the
852
00:59:42 --> 00:59:49
only barriers for finding a
solution to these difference
853
00:59:49 --> 00:59:54
constraints.
I have this feeling somewhere
854
00:59:54 --> 00:59:58
here.
I had to talk about the
855
00:59:58 --> 1:00:03
constraint graph.
Good.
856
1:00:03 --> 1:00:13
857
1:00:13 --> 1:00:19.83
Satisfied, good.
So, here we're going to see a
858
1:00:19.83 --> 1:00:28.482
technique that is very useful
when thinking about shortest
859
1:00:28.482 --> 1:00:32.788
paths.
And, it's a bit hard to guess,
860
1:00:32.788 --> 1:00:36.505
especially if you haven't seen
it before.
861
1:00:36.505 --> 1:00:40.78
This is useful in problem sets,
and in quizzes,
862
1:00:40.78 --> 1:00:45.334
and finals, and everything.
So, keep this in mind.
863
1:00:45.334 --> 1:00:50.539
I mean, I'm using it to prove
this rather simple theorem,
864
1:00:50.539 --> 1:00:56.115
but the idea of changing the
graph, so I'm going to call this
865
1:00:56.115 --> 1:01:00.483
constraint graph G.
Changing the graph is a very
866
1:01:00.483 --> 1:01:04.386
powerful idea.
So, we're going to add a new
867
1:01:04.386 --> 1:01:07.732
vertex, s, or source,
use the source,
868
1:01:07.732 --> 1:01:13.215
Luke, and we're going to add a
bunch of edges from s because
869
1:01:13.215 --> 1:01:17.397
being a source,
it better be connected to some
870
1:01:17.397 --> 1:01:23.529
things.
So, we are going to add a zero
871
1:01:23.529 --> 1:01:29.764
weight edge, or weight zero edge
from s to everywhere,
872
1:01:29.764 --> 1:01:36
so, to every other vertex in
the constraint graph.
873
1:01:36 --> 1:01:40.121
Those vertices are called v_i,
v_1 up to v_n.
874
1:01:40.121 --> 1:01:45.928
So, I have my constraint graph.
But I'll copy this one so I can
875
1:01:45.928 --> 1:01:49.768
change it.
It's always good to backup your
876
1:01:49.768 --> 1:01:53.046
work before you make changes,
right?
877
1:01:53.046 --> 1:01:57.542
So now, I want to add a new
vertex, s, over here,
878
1:01:57.542 --> 1:02:01.195
my new source.
I just take my constraint
879
1:02:01.195 --> 1:02:06.909
graph, whatever it looks like,
add in weight zero edges to all
880
1:02:06.909 --> 1:02:11.171
the other vertices.
Simple enough.
881
1:02:11.171 --> 1:02:14.1
Now, what did I do?
What did you do?
882
1:02:14.1 --> 1:02:18.953
Well, I have a candidate source
now which can reach all the
883
1:02:18.953 --> 1:02:21.799
vertices.
So, shortest path from s,
884
1:02:21.799 --> 1:02:24.728
hopefully, well,
paths from s exist.
885
1:02:24.728 --> 1:02:30
I can get from s to everywhere
in weight at most zero.
886
1:02:30 --> 1:02:31.851
OK, maybe less.
Could it be less?
887
1:02:31.851 --> 1:02:34.338
Well, you know,
like v_2, I can get to it by
888
1:02:34.338 --> 1:02:36.71
zero minus two.
So, that's less than zero.
889
1:02:36.71 --> 1:02:38.677
So I've got to be a little
careful.
890
1:02:38.677 --> 1:02:40.933
What if there's a negative
weight cycle?
891
1:02:40.933 --> 1:02:42.785
Oh no?
Then there wouldn't be any
892
1:02:42.785 --> 1:02:44.347
shortest paths.
Fortunately,
893
1:02:44.347 --> 1:02:47.413
we assume that there's no
negative weight cycle in the
894
1:02:47.413 --> 1:02:49.785
original graph.
And if you think about it,
895
1:02:49.785 --> 1:02:53.082
if there's no negative weight
cycle in the original graph,
896
1:02:53.082 --> 1:02:55.396
we add an edge from s to
everywhere else.
897
1:02:55.396 --> 1:02:58.52
We're not making any new
negative weight cycles because
898
1:02:58.52 --> 1:03:01.586
you can start at s and go
somewhere at a cost of zero,
899
1:03:01.586 --> 1:03:05
which doesn't affect any
weights.
900
1:03:05 --> 1:03:08.92
And then, you are forced to
stay in the old graph.
901
1:03:08.92 --> 1:03:12.84
So, there can't be any new
negative weight cycles.
902
1:03:12.84 --> 1:03:17
So, the modified graph has no
negative weight cycles.
903
1:03:17 --> 1:03:20.519
That's good because it also has
paths from s,
904
1:03:20.519 --> 1:03:25
and therefore it also has
shortest paths from s.
905
1:03:25 --> 1:03:30.376
The modified graph has no
negative weight cycles because it
906
1:03:30.376 --> 1:03:34.487
didn't before.
And, it has paths from s.
907
1:03:34.487 --> 1:03:38.387
There's a path from s to every
vertex.
908
1:03:38.387 --> 1:03:44.923
There may not have been before.
Before, I couldn't get from v_2
909
1:03:44.923 --> 1:03:49.561
to v_3, for example.
Well, that's still true.
910
1:03:49.561 --> 1:03:53.145
But from s I can get to
everywhere.
911
1:03:53.145 --> 1:03:58.521
So, that means that this graph,
this modified graph,
912
1:03:58.521 --> 1:04:04.974
has shortest paths.
Shortest paths exist from s.
913
1:04:04.974 --> 1:04:09.86
In other words,
if I took all the shortest path
914
1:04:09.86 --> 1:04:14.641
weights, like I ran Bellman-Ford
from s, then,
915
1:04:14.641 --> 1:04:19.421
I would get a bunch of finite
numbers, d of v,
916
1:04:19.421 --> 1:04:22.926
for every value,
for every vertex.
917
1:04:22.926 --> 1:04:27.175
That seems like a good idea.
Let's do it.
918
1:04:27.175 --> 1:04:33.757
So, shortest paths exist.
Let's just assign x_i to be the
919
1:04:33.757 --> 1:04:36.782
shortest path weight from s to
v_i.
920
1:04:36.782 --> 1:04:39.806
Why not?
That's a good choice for a
921
1:04:39.806 --> 1:04:43.898
number, the shortest path weight
from s to v_i.
922
1:04:43.898 --> 1:04:47.99
This is finite because it's
less than infinity,
923
1:04:47.99 --> 1:04:51.549
and it's greater than minus
infinity, so,
924
1:04:51.549 --> 1:04:55.73
some finite number.
That's what we need to do in
925
1:04:55.73 --> 1:05:00
order to satisfy these
constraints.
926
1:05:00 --> 1:05:03.933
The claim is that this is a
satisfying assignment.
927
1:05:03.933 --> 1:05:05.86
Why?
Triangle inequality.
928
1:05:05.86 --> 1:05:09.311
Somewhere here we wrote
triangle inequality.
929
1:05:09.311 --> 1:05:12.924
This looks a lot like the
triangle inequality.
930
1:05:12.924 --> 1:05:16.456
In fact, I think that's the end
of the proof.
931
1:05:16.456 --> 1:05:19.908
Let's see here.
What we want to be true with
932
1:05:19.908 --> 1:05:24.564
this assignment is that x_j
minus x_i is less than or equal
933
1:05:24.564 --> 1:05:28.497
to w_ij whenever ij is an edge.
Or, let's say v_i,
934
1:05:28.497 --> 1:05:31.949
v_j, for every such constraint,
so, for v_i,
935
1:05:31.949 --> 1:05:37.313
v_j in the edge set.
OK, so why is this true?
936
1:05:37.313 --> 1:05:42.217
Well, let's just expand it out.
So, x_i is this delta,
937
1:05:42.217 --> 1:05:46.935
and x_j is some other delta.
So, we have delta of s,
938
1:05:46.935 --> 1:05:51.654
v_j minus delta of s, v_i.
And, on the right-hand side,
939
1:05:51.654 --> 1:05:56.743
well, w_ij, that was the weight
of the edge from i to j.
940
1:05:56.743 --> 1:06:01
So, this is the weight of v_i
to v_j.
941
1:06:01 --> 1:06:03.659
OK, I will rewrite this
slightly.
942
1:06:03.659 --> 1:06:07.315
Delta of s, v_j is less than or
equal to delta of s,
943
1:06:07.315 --> 1:06:09.06
v_i plus w of v_i,
v_j.
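Written out, the chain of rewrites just performed is, with x_i set to the shortest-path weight delta(s, v_i):

```latex
x_j - x_i \le w_{ij}
\iff \delta(s, v_j) - \delta(s, v_i) \le w(v_i, v_j)
\iff \delta(s, v_j) \le \delta(s, v_i) + w(v_i, v_j)
```

and the last form is exactly the triangle inequality, since the single edge (v_i, v_j) is one particular path from v_i to v_j.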
944
1:06:09.06 --> 1:06:12.965
And that's the triangle
inequality more or less.
945
1:06:12.965 --> 1:06:18.117
The shortest path from s to v_j
is, at most, shortest path from
946
1:06:18.117 --> 1:06:22.022
s to v_i plus a particular path
from v_i to v_j,
947
1:06:22.022 --> 1:06:24.765
namely the single edge v_i to
v_j.
948
1:06:24.765 --> 1:06:30
This could only be longer than
the shortest path.
949
1:06:30 --> 1:06:33.372
And so, that makes the
right-hand side bigger,
950
1:06:33.372 --> 1:06:37.644
which makes this inequality
more true, meaning it was true
951
1:06:37.644 --> 1:06:39.967
before.
And now it's still true.
952
1:06:39.967 --> 1:06:42.441
And, that proves it.
This is true.
953
1:06:42.441 --> 1:06:45.513
And, these were all equivalent
statements.
954
1:06:45.513 --> 1:06:48.961
This we know to be true by
triangle inequality.
955
1:06:48.961 --> 1:06:52.408
Therefore, these constraints
are all satisfied.
956
1:06:52.408 --> 1:06:54.357
Magic.
I'm so excited here.
957
1:06:54.357 --> 1:06:59.004
So, we've proved that having a
negative weight cycle is exactly
958
1:06:59.004 --> 1:07:05
when this system of difference
constraints is unsatisfiable.
959
1:07:05 --> 1:07:08.241
So, if we want to satisfy them,
if we want to find the right
960
1:07:08.241 --> 1:07:10
answer to x, we run
Bellman-Ford.
961
1:07:10 --> 1:07:12.417
Either it says,
oh no, a negative weight cycle.
962
1:07:12.417 --> 1:07:14.945
Then you are hosed.
Then, there is no solution.
963
1:07:14.945 --> 1:07:17.252
But that's the best you could
hope to know.
964
1:07:17.252 --> 1:07:19.67
Otherwise, it says,
oh, there was no negative
965
1:07:19.67 --> 1:07:22.087
weight cycle,
and here are your shortest path
966
1:07:22.087 --> 1:07:23.736
weights.
You just plug them in,
967
1:07:23.736 --> 1:07:26.868
and bam, you have your x_i's
that satisfy the constraints.
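Putting the whole reduction together, here is a minimal sketch (the (i, j, w) encoding of constraints and all names are mine, not the lecture's): run Bellman-Ford from the added source; if an edge can still be relaxed after n rounds, there is a negative-weight cycle and the constraints are unsatisfiable; otherwise the shortest-path weights themselves are a satisfying assignment.

```python
def solve_difference_constraints(n, constraints):
    """constraints: list of (i, j, w) meaning x_j - x_i <= w, variables 1..n.
    Returns a satisfying assignment [x_1, ..., x_n], or None if unsatisfiable."""
    # Build the constraint graph plus the new source s (vertex 0).
    edges = [(i, j, w) for (i, j, w) in constraints]
    edges += [(0, v, 0) for v in range(1, n + 1)]

    d = [0] + [float("inf")] * n
    # Bellman-Ford: |V| - 1 = n rounds of relaxation.
    for _ in range(n):
        for (u, v, w) in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # One more pass: any further relaxation means a negative-weight cycle.
    for (u, v, w) in edges:
        if d[u] + w < d[v]:
            return None
    return d[1:]  # x_i = delta(s, v_i)

x = solve_difference_constraints(3, [(1, 2, -2), (2, 3, 1)])
```

Each returned x satisfies every constraint by the triangle inequality, exactly as in the proof above.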
968
1:07:26.868 --> 1:07:30
Awesome.
Now, it wasn't just any graph.
969
1:07:30 --> 1:07:32.877
I mean, we started with
constraints, algebra,
970
1:07:32.877 --> 1:07:35.886
we converted it into a graph by
this transform.
971
1:07:35.886 --> 1:07:37.978
Then we added a source vertex,
s.
972
1:07:37.978 --> 1:07:41.641
So, I mean, we had to build a
graph to solve our problem,
973
1:07:41.641 --> 1:07:43.21
very powerful idea.
Cool.
974
1:07:43.21 --> 1:07:47.135
This is the idea of reduction.
You can reduce the problem you
975
1:07:47.135 --> 1:07:50.601
want to solve into some problem
you know how to solve.
976
1:07:50.601 --> 1:07:54.656
You know how to solve shortest
paths when there are no negative
977
1:07:54.656 --> 1:07:57.337
weight cycles,
or find out that there is a
978
1:07:57.337 --> 1:08:01
negative weight cycle by
Bellman-Ford.
979
1:08:01 --> 1:08:06.099
So, now we know how to solve
difference constraints.
980
1:08:06.099 --> 1:08:09.4
It turns out you can do even
more.
981
1:08:09.4 --> 1:08:15
Bellman-Ford does a little bit
more than just solve these
982
1:08:15 --> 1:08:18.899
constraints.
But first let me write down
983
1:08:18.899 --> 1:08:22.899
what I've been jumping up and
down about.
984
1:08:22.899 --> 1:08:27
The corollary is you can use
Bellman-Ford.
985
1:08:27 --> 1:08:34.484
I mean, you make this graph.
Then you apply Bellman-Ford,
986
1:08:34.484 --> 1:08:41.33
and it will solve your system
of difference constraints.
987
1:08:41.33 --> 1:08:45.686
So, let me put in some numbers
here.
988
1:08:45.686 --> 1:08:49.793
You have m difference
constraints.
989
1:08:49.793 --> 1:08:56.266
And, you have n variables.
And, it will solve them in
990
1:08:56.266 --> 1:09:02.416
order m times n time.
Actually, these numbers go up
991
1:09:02.416 --> 1:09:07.333
slightly because we are adding n
edges, and we're adding one
992
1:09:07.333 --> 1:09:12
vertex, but assuming all of
these numbers are nontrivial,
993
1:09:12 --> 1:09:14.916
m is at least n.
It's order mn time.
994
1:09:14.916 --> 1:09:20.083
OK, trying to avoid cases where
some of them are close to zero.
995
1:09:20.083 --> 1:09:22.25
Good.
So, some other facts,
996
1:09:22.25 --> 1:09:26.25
that's what I just said.
And we'll leave these as
997
1:09:26.25 --> 1:09:31
exercises because they're not
too essential.
998
1:09:31 --> 1:09:35.627
The main thing we need is this.
But, some other cool facts is
999
1:09:35.627 --> 1:09:39.484
that Bellman-Ford actually
optimizes some objective
1000
1:09:39.484 --> 1:09:42.492
functions.
So, we are saying it's just a
1001
1:09:42.492 --> 1:09:46.194
feasibility problem.
We just want to know whether
1002
1:09:46.194 --> 1:09:48.739
these constraints are
satisfiable.
1003
1:09:48.739 --> 1:09:52.75
In fact, you can add a
particular objective function.
1004
1:09:52.75 --> 1:09:56.837
So, you can't give it an
arbitrary objective function,
1005
1:09:56.837 --> 1:10:04.647
but here's one of interest.
x_1 plus x_2 plus ... plus x_n,
1006
1:10:04.647 --> 1:10:15
OK, but not just that.
We have some constraints.
1007
1:10:15 --> 1:10:24
1008
1:10:24 --> 1:10:27.395
OK, this is a linear program.
I want to maximize the sum of
1009
1:10:27.395 --> 1:10:30.849
the x_i's subject to all the
x_i's being nonpositive and the
1010
1:10:30.849 --> 1:10:33.542
difference constraints.
So, this we had before.
1011
1:10:33.542 --> 1:10:35.943
This is fine.
We noticed at some point you
1012
1:10:35.943 --> 1:10:38.811
could get from s to everywhere
with cost, at most,
1013
1:10:38.811 --> 1:10:40.509
zero.
So, we know that in this
1014
1:10:40.509 --> 1:10:42.851
assignment all of the x_i's are
nonpositive.
1015
1:10:42.851 --> 1:10:45.602
That's not necessary,
but it's true when you run
1016
1:10:45.602 --> 1:10:47.944
Bellman-Ford.
So if you solve your system
1017
1:10:47.944 --> 1:10:50.754
using Bellman-Ford,
which is no less general than
1018
1:10:50.754 --> 1:10:53.272
anything else,
you happen to get nonpositive
1019
1:10:53.272 --> 1:10:54.969
x_i's.
And so, subject to that
1020
1:10:54.969 --> 1:10:58.072
constraint, it actually makes
them as close to zero as
1021
1:10:58.072 --> 1:11:04.009
possible in the L1 norm.
In the sum of these values,
1022
1:11:04.009 --> 1:11:08.578
it tries to make the sum as
close to zero,
1023
1:11:08.578 --> 1:11:15.154
it tries to make the values as
small as possible in absolute
1024
1:11:15.154 --> 1:11:20.393
value in this sense.
OK, it does more than that.
1025
1:11:20.393 --> 1:11:25.297
It cooks, it cleans,
it finds shortest paths.
1026
1:11:25.297 --> 1:11:31.761
It also minimizes the spread,
the maximum over all i of x_i
1027
1:11:31.761 --> 1:11:37
minus the minimum over all i of
x_i.
1028
1:11:37 --> 1:11:40.84
So, I mean, if you have your
real line, and here are the
1029
1:11:40.84 --> 1:11:44.402
x_i's wherever they are.
It minimizes this distance.
1030
1:11:44.402 --> 1:11:46.567
And zero is somewhere over
here.
1031
1:11:46.567 --> 1:11:50.268
So, it tries to make the x_i's
as compact as possible.
1032
1:11:50.268 --> 1:11:54.458
This is actually the L infinity
norm, if you know stuff about
1033
1:11:54.458 --> 1:11:56.972
norms from your linear algebra
class.
1034
1:11:56.972 --> 1:12:00.673
OK, this is the L1 norm.
I think it minimizes every L_p
1035
1:12:00.673 --> 1:12:05.17
norm.
Good, so let's use this for
1036
1:12:05.17 --> 1:12:09.163
something.
Yeah, let's solve a real
1037
1:12:09.163 --> 1:12:13.978
problem, and then we'll be done
for today.
1038
1:12:13.978 --> 1:12:20.79
Next class we'll see the really
cool stuff, the really cool
1039
1:12:20.79 --> 1:12:27.366
application of all of this.
For now, and we'll see a cool
1040
1:12:27.366 --> 1:12:32.886
but relatively simple
application, which is VLSI
1041
1:12:32.886 --> 1:12:37.528
layout.
We talked a little bit about
1042
1:12:37.528 --> 1:12:40.779
VLSI way back in divide and
conquer.
1043
1:12:40.779 --> 1:12:45.655
You have a bunch of chips,
or you want to arrange them,
1044
1:12:45.655 --> 1:12:50.441
and minimize some objectives.
So, here's a particular,
1045
1:12:50.441 --> 1:12:54.505
tons of problems that come out
of VLSI layout.
1046
1:12:54.505 --> 1:12:59.02
Here's one of them.
You have a bunch of features of
1047
1:12:59.02 --> 1:13:04.583
an integrated circuit.
You want to somehow arrange
1048
1:13:04.583 --> 1:13:09.845
them on your circuit without
putting any two of them too
1049
1:13:09.845 --> 1:13:13.768
close to each other.
You have some minimum
1050
1:13:13.768 --> 1:13:19.03
separation, like at least they
should not be on top of each
1051
1:13:19.03 --> 1:13:22.283
other.
Probably, you also need some
1052
1:13:22.283 --> 1:13:26.589
separation to put wires in
between, and so on,
1053
1:13:26.589 --> 1:13:33
so, without putting any two
features too close together.
1054
1:13:33 --> 1:13:37.152
OK, so just to give you an
idea, so I have some objects and
1055
1:13:37.152 --> 1:13:41.089
I'm going to be a little bit
vague about how this works.
1056
1:13:41.089 --> 1:13:43.738
You have some features.
This is stuff,
1057
1:13:43.738 --> 1:13:47.46
some chips, whatever.
We don't really care what their
1058
1:13:47.46 --> 1:13:50.825
shapes look like.
I just want to be able to move
1059
1:13:50.825 --> 1:13:55.192
them around so that the gap at
any point, so let me just think
1060
1:13:55.192 --> 1:13:58.199
about this gap.
This gap should be at least
1061
1:13:58.199 --> 1:14:01.134
some delta.
Or, I don't want to use delta.
1062
1:14:01.134 --> 1:14:05
Let's say epsilon,
good, small number.
1063
1:14:05 --> 1:14:08.828
So, I just need some separation
between all of my parts.
1064
1:14:08.828 --> 1:14:12.378
And for this problem,
I'm going to be pretty simple,
1065
1:14:12.378 --> 1:14:15.719
just say that the parts are
only allowed to slide
1066
1:14:15.719 --> 1:14:18.433
horizontally.
So, it's a one-dimensional
1067
1:14:18.433 --> 1:14:20.73
problem.
These objects are in 2-d,
1068
1:14:20.73 --> 1:14:23.654
or whatever,
but I can only slide them in the x
1069
1:14:23.654 --> 1:14:25.672
coordinate.
So, to model that,
1070
1:14:25.672 --> 1:14:29.57
I'm going to look at the left
edge of every part and say,
1071
1:14:29.57 --> 1:14:32.981
well, these two left edges
should be at least some
1072
1:14:32.981 --> 1:14:36.848
separation.
So, I think of it as whatever
1073
1:14:36.848 --> 1:14:38.952
the distance is plus some
epsilon.
1074
1:14:38.952 --> 1:14:41.501
But, you know,
if you have some funky 2-d
1075
1:14:41.501 --> 1:14:45.135
shapes you have to compute,
well, this is a little bit too
1076
1:14:45.135 --> 1:14:47.621
close because these come into
alignment.
1077
1:14:47.621 --> 1:14:51.063
But, there's some constraint,
well, for any two pieces,
1078
1:14:51.063 --> 1:14:53.677
I could figure out how close
they can get.
1079
1:14:53.677 --> 1:14:57.31
They should get no closer.
So, I'm going to call this x_1.
1080
1:14:57.31 --> 1:15:00.243
I'll call this x_2.
So, we have some constraint
1081
1:15:00.243 --> 1:15:03.111
like x_2 minus x_1 is at least d
plus epsilon,
1082
1:15:03.111 --> 1:15:07
or whatever you compute that
weight to be.
1083
1:15:07 --> 1:15:09.735
OK, so for every pair of
pieces, I can do this,
1084
1:15:09.735 --> 1:15:13.066
compute some constraint on how
far apart they have to be.
1085
1:15:13.066 --> 1:15:15.861
And, now I'd like to assign
these x coordinates.
1086
1:15:15.861 --> 1:15:18.596
Right now, I'm assuming they're
just variables.
1087
1:15:18.596 --> 1:15:22.105
I want to slide these pieces
around horizontally in order to
1088
1:15:22.105 --> 1:15:25.257
compactify them as much as
possible so they fit in the
1089
1:15:25.257 --> 1:15:28.35
smallest chip that I can make
because it costs money,
1090
1:15:28.35 --> 1:15:31.145
and time, and everything,
and power, everything.
1091
1:15:31.145 --> 1:15:34
You always want your chip
small.
1092
1:15:34 --> 1:15:40.225
So, Bellman-Ford does that.
All right, so Bellman-Ford
1093
1:15:40.225 --> 1:15:47.626
solves these constraints because
it's just a bunch of difference
1094
1:15:47.626 --> 1:15:51.972
constraints.
And we know that they are
1095
1:15:51.972 --> 1:15:57.963
solvable because you could
spread all the pieces out
1096
1:15:57.963 --> 1:16:03.25
arbitrarily far.
And, it minimizes the spread,
1097
1:16:03.25 --> 1:16:10.298
minimizes the size of the chip
I need, a max of x_i minus the
1098
1:16:10.298 --> 1:16:14.879
min of x_i.
So, it maximizes
1099
1:16:14.879 --> 1:16:18.167
compactness, or minimizes size
of the chip.
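The VLSI reduction can be sketched end to end. This is an illustrative sketch with made-up gap values (the names and the (i, j, gap) encoding are mine): "part j must sit at least gap to the right of part i" is x_j minus x_i at least gap, i.e. the difference constraint x_i minus x_j at most minus gap; solving with Bellman-Ford from an added source then packs the parts as tightly as the gaps allow.

```python
def pack_parts(n, gaps):
    """gaps: list of (i, j, gap) meaning x_j - x_i >= gap, parts 1..n.
    Returns left-edge coordinates, or None if the gaps are contradictory."""
    # x_j - x_i >= gap becomes x_i - x_j <= -gap: edge v_j -> v_i, weight -gap.
    edges = [(j, i, -gap) for (i, j, gap) in gaps]
    edges += [(0, v, 0) for v in range(1, n + 1)]   # new source s = vertex 0
    d = [0] + [float("inf")] * n
    for _ in range(n):                              # Bellman-Ford relaxation
        for (u, v, w) in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    if any(d[u] + w < d[v] for (u, v, w) in edges):
        return None                                 # negative-weight cycle
    return d[1:]

# Part 2 at least 3 units right of part 1; part 3 at least 2 right of part 2.
coords = pack_parts(3, [(1, 2, 3), (2, 3, 2)])
```

The resulting coordinates respect every gap, and the spread, max x_i minus min x_i, is as small as the constraints allow.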
1100
1:16:18.167 --> 1:16:22.943
OK, this is a one-dimensional
problem, so it may seem a little
1101
1:16:22.943 --> 1:16:27.014
artificial, but the two
dimensional problem is really
1102
1:16:27.014 --> 1:16:29.049
hard to solve.
And this is,
1103
1:16:29.049 --> 1:16:33.355
in fact, the best you can do
with a nice polynomial time
1104
1:16:33.355 --> 1:16:37.419
algorithm.
There are other applications if
1105
1:16:37.419 --> 1:16:42.024
you're scheduling events in,
like, a multimedia environment,
1106
1:16:42.024 --> 1:16:46.629
and you want to guarantee that
this audio plays at least two
1107
1:16:46.629 --> 1:16:50.922
seconds after this video,
but then there are things that
1108
1:16:50.922 --> 1:16:55.605
are playing at the same time,
and they have to be within some
1109
1:16:55.605 --> 1:16:59.351
gap of each other,
so, lots of papers about using
1110
1:16:59.351 --> 1:17:02.786
Bellman-Ford,
solve difference constraints to
1111
1:17:02.786 --> 1:17:06.766
enable multimedia environments.
OK, so there you go.
1112
1:17:06.766 --> 1:17:11.449
And next class we'll see more
applications of Bellman-Ford to
1113
1:17:11.449 --> 1:17:14.181
all pairs shortest paths.
Questions?
1114
1:17:14.181 --> 1:17:17
Great.