WEBVTT

00:00:00.090 --> 00:00:01.800
The following
content is provided

00:00:01.800 --> 00:00:04.040
under a Creative
Commons license.

00:00:04.040 --> 00:00:06.880
Your support will help MIT
OpenCourseWare continue

00:00:06.880 --> 00:00:10.740
to offer high quality
educational resources for free.

00:00:10.740 --> 00:00:13.350
To make a donation or
view additional materials

00:00:13.350 --> 00:00:15.800
from hundreds of
MIT courses, visit

00:00:15.800 --> 00:00:21.994
MIT OpenCourseWare
at ocw.mit.edu

00:00:21.994 --> 00:00:24.850
PROFESSOR: All right,
let's get started.

00:00:24.850 --> 00:00:27.810
We return today to graph search.

00:00:27.810 --> 00:00:29.950
Last time we saw breadth-first
search, today we're

00:00:29.950 --> 00:00:31.672
going to do depth-first search.

00:00:31.672 --> 00:00:34.130
It's a simple algorithm, but
you can do lots of cool things

00:00:34.130 --> 00:00:34.570
with it.

00:00:34.570 --> 00:00:36.236
And that's what I'll
spend most of today

00:00:36.236 --> 00:00:39.680
on, in particular, telling
whether your graph has a cycle,

00:00:39.680 --> 00:00:42.680
and something called
topological sort.

00:00:42.680 --> 00:00:45.970
As usual, basically in
all graph algorithms

00:00:45.970 --> 00:00:48.790
in this class, the input, the
way the graph is specified

00:00:48.790 --> 00:00:52.840
is as an adjacency list, or I
guess adjacency list plural.

00:00:52.840 --> 00:00:56.460
So you have a bunch of lists,
each one says for each vertex,

00:00:56.460 --> 00:00:58.550
what are the vertices
I'm connected to?

00:00:58.550 --> 00:01:02.900
What are the vertices I can
get to in one step via an edge?
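
NOTE
A minimal sketch of the adjacency-list input described above, assuming a Python dict keyed by vertex name. The tiny graph and the names Adj, neighbors_of_a, and num_edges are illustrative assumptions, not taken from the lecture.
```python
# Hypothetical example graph; the vertex names are illustrative only.
# Adj maps each vertex to the list of vertices reachable in one step.
Adj = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}
# The vertices that "a" can reach in one step via an edge.
neighbors_of_a = Adj["a"]
# For a directed graph, the edge count is the sum of the list lengths.
num_edges = sum(len(out) for out in Adj.values())
```
A dict of lists is only one possible encoding; an array of linked lists, as mentioned later in the lecture, carries the same information.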

00:01:02.900 --> 00:01:05.040
So that's our
input and our goal,

00:01:05.040 --> 00:01:08.630
in general, with graph search
is to explore the graph.

00:01:08.630 --> 00:01:10.390
In particular, the
kind of exploration

00:01:10.390 --> 00:01:13.430
we're going to be doing today
is to visit all the vertices,

00:01:13.430 --> 00:01:17.416
in some order, and visit
each vertex only once.

00:01:17.416 --> 00:01:19.040
So the way we did
breadth-first search,

00:01:19.040 --> 00:01:20.581
breadth-first search
was really good.

00:01:20.581 --> 00:01:22.830
It explored things
layer by layer,

00:01:22.830 --> 00:01:25.340
and that was nice because
it gave us shortest paths,

00:01:25.340 --> 00:01:28.830
it gave us the fastest
way to get to everywhere,

00:01:28.830 --> 00:01:31.490
from a particular
source, vertex s.

00:01:31.490 --> 00:01:34.210
But if you can't get
from s to your vertex,

00:01:34.210 --> 00:01:37.410
then the shortest way to
get there is infinity,

00:01:37.410 --> 00:01:39.440
there's no way to get there.

00:01:39.440 --> 00:01:41.790
And BFS is good for detecting
that, it can tell you

00:01:41.790 --> 00:01:46.490
which vertices are
unreachable from s.

00:01:46.490 --> 00:01:50.130
DFS can do that as
well, but it's often

00:01:50.130 --> 00:01:52.390
used to explore the
whole graph, not just

00:01:52.390 --> 00:01:54.134
the part reachable
from s, and so

00:01:54.134 --> 00:01:55.800
we're going to see
how to do that today.

00:01:55.800 --> 00:01:58.580
This trick could be used
for BFS or for DFS,

00:01:58.580 --> 00:02:01.590
but we're going to do it
here for DFS, because that's

00:02:01.590 --> 00:02:02.840
more common, let's say.

00:02:07.080 --> 00:02:09.014
So DFS.

00:02:21.110 --> 00:02:24.930
So depth-first search is kind
of like how you solve a maze.

00:02:24.930 --> 00:02:27.350
Like, the other weekend
I was at the big corn

00:02:27.350 --> 00:02:32.050
maze in central
Massachusetts, and it's

00:02:32.050 --> 00:02:34.584
easy to get lost in
there, in particular,

00:02:34.584 --> 00:02:36.250
because I didn't bring
any bread crumbs.

00:02:36.250 --> 00:02:39.160
The proper way to solve a
maze, if you're in there

00:02:39.160 --> 00:02:41.950
and all you can do is see which
way to go next and then walk

00:02:41.950 --> 00:02:43.730
a little bit to
the next junction,

00:02:43.730 --> 00:02:45.970
and then you have to
keep making decisions.

00:02:45.970 --> 00:02:49.490
Unless you have a really
good memory, which I do not,

00:02:49.490 --> 00:02:53.780
teaching staff can attest to
that, then an easy way to do it

00:02:53.780 --> 00:02:55.720
is to leave bread
crumbs behind, say,

00:02:55.720 --> 00:02:58.710
this is the last way
I went from this node,

00:02:58.710 --> 00:03:00.974
so that when I
reach a dead end, I

00:03:00.974 --> 00:03:02.390
have to turn around
and backtrack.

00:03:02.390 --> 00:03:04.450
I reach a bread crumb that
says, oh, last time you

00:03:04.450 --> 00:03:07.160
went this way, next time
you should go this way,

00:03:07.160 --> 00:03:10.570
and in particular, keep track
at each node, which of the edges

00:03:10.570 --> 00:03:14.890
have I visited, which ones
are still left to visit.

00:03:14.890 --> 00:03:18.910
And this can be done very easily
on a computer using recursion.

00:03:30.520 --> 00:03:32.140
So high-level
description is we're

00:03:32.140 --> 00:03:37.400
going to just recursively
explore the graph,

00:03:37.400 --> 00:03:42.495
backtracking as necessary, kind
of like how you solve a maze.

00:03:54.980 --> 00:03:59.210
In fact, when I was
seven years old,

00:03:59.210 --> 00:04:00.960
one of the first
computer programs I wrote

00:04:00.960 --> 00:04:01.918
was for solving a maze.

00:04:01.918 --> 00:04:04.140
I didn't know it was
depth-first search at the time,

00:04:04.140 --> 00:04:04.810
but now I know.

00:04:11.050 --> 00:04:12.690
It was so much harder
doing algorithms

00:04:12.690 --> 00:04:15.540
when I didn't know
what they were.

00:04:15.540 --> 00:04:20.779
Anyway, I'm going to write some
code for depth-first search,

00:04:20.779 --> 00:04:26.900
it is super simple code, the
simplest graph algorithm.

00:04:49.175 --> 00:04:50.255
It's four lines.

00:05:05.590 --> 00:05:06.090
That's it.

00:05:06.090 --> 00:05:08.280
I'm going to write a little
bit of code after this,

00:05:08.280 --> 00:05:11.500
but this is basic
depth-first search.

00:05:11.500 --> 00:05:13.580
This will visit all
the vertices reachable

00:05:13.580 --> 00:05:16.480
from a given source, vertex s.

00:05:16.480 --> 00:05:19.780
So we're given the
adjacency list.

00:05:19.780 --> 00:05:22.130
I don't know why I put v
here, you could erase it,

00:05:22.130 --> 00:05:24.380
it's not necessary.

00:05:24.380 --> 00:05:29.030
And all we do is, we
have our vertex v, sorry,

00:05:29.030 --> 00:05:31.300
we have our vertex s.

00:05:31.300 --> 00:05:35.930
We look at all of the
outgoing edges from s.

00:05:35.930 --> 00:05:40.950
For each one, we'll
call it v, we check,

00:05:40.950 --> 00:05:42.770
have I visited this
vertex already?

00:05:45.422 --> 00:05:46.880
A place where we
need to be careful

00:05:46.880 --> 00:05:49.160
is to not repeat vertices.

00:05:49.160 --> 00:05:50.980
We need to do this
in BFS as well.

00:05:56.110 --> 00:05:58.430
So, the way we're
going to do that

00:05:58.430 --> 00:06:00.450
is by setting the
parent of a node,

00:06:00.450 --> 00:06:03.210
we'll see what that
actually means later.

00:06:03.210 --> 00:06:05.940
But for now, it's just, are you
in the parent structure or not?

00:06:05.940 --> 00:06:09.600
This is initially, we've
seen s, so we give it

00:06:09.600 --> 00:06:14.250
a parent of nothing, but it
exists in this dictionary.

00:06:14.250 --> 00:06:16.830
If the vertex v that
we're looking at

00:06:16.830 --> 00:06:19.300
is not in our dictionary,
we haven't seen it yet,

00:06:19.300 --> 00:06:23.190
we mark it as seen by
setting its parent to s,

00:06:23.190 --> 00:06:25.060
and then we
recursively visit it.

00:06:25.060 --> 00:06:26.310
That's it.

00:06:26.310 --> 00:06:29.120
Super simple, just recurse.

00:06:29.120 --> 00:06:32.130
Sort of the magical part
is the preventing yourself

00:06:32.130 --> 00:06:34.070
from repeating.

00:06:34.070 --> 00:06:37.250
As you explore the graph,
if you reach something

00:06:37.250 --> 00:06:39.950
you've already seen before
you just skip it again.

00:06:39.950 --> 00:06:45.010
So you only visit every
vertex once, at most once.

00:06:45.010 --> 00:06:47.260
This will not visit
the entire graph,

00:06:47.260 --> 00:06:50.840
it will only visit the
vertices reachable from s.
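
NOTE
A sketch of the recursive routine described above, assuming the adjacency lists are a Python dict and the parent dictionary is passed in explicitly. The board code itself is not in the transcript, so the function name dfs_visit, its signature, and the small graph are assumptions.
```python
def dfs_visit(Adj, s, parent):
    """Visit every vertex reachable from s that is not yet in parent."""
    for v in Adj[s]:
        if v not in parent:      # v has not been seen yet
            parent[v] = s        # mark v as seen; s is how we reached it
            dfs_visit(Adj, v, parent)
# Usage on a small hypothetical graph; "z" cannot be reached from "s".
Adj = {"s": ["x", "y"], "x": ["y"], "y": [], "z": ["s"]}
parent = {"s": None}             # the source is seen but has no parent
dfs_visit(Adj, "s", parent)
```
After the call, parent holds only the vertices reachable from "s", which is why "z" is missing.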

00:06:50.840 --> 00:06:52.940
The next part of the
code I'd like to give you

00:06:52.940 --> 00:06:56.920
is for visiting all the
vertices, and in the textbook

00:06:56.920 --> 00:06:58.820
this is called the DFS,
whereas this is just

00:06:58.820 --> 00:07:02.180
called DFS visit, that's
sort of the recursive part,

00:07:02.180 --> 00:07:08.330
and this is sort of a
top level algorithm.

00:07:08.330 --> 00:07:19.840
Here we are going to use
the set of vertices, V,

00:07:19.840 --> 00:07:22.040
and here we're just going
to iterate over the s's.

00:07:47.960 --> 00:07:51.150
So it looks almost the same,
but what we're iterating over

00:07:51.150 --> 00:07:52.200
is different.

00:07:52.200 --> 00:07:55.720
Here we're iterating over
the outgoing edges from s,

00:07:55.720 --> 00:07:57.855
here we're iterating
over the choices of s.

00:08:03.190 --> 00:08:05.239
So the idea here
is we don't really

00:08:05.239 --> 00:08:06.530
know where to start our search.

00:08:06.530 --> 00:08:09.154
If it's a disconnected graph or
not a strongly connected graph,

00:08:09.154 --> 00:08:12.330
we might have to start
our search multiple times.

00:08:12.330 --> 00:08:15.520
This DFS algorithm is finding
all the possible places

00:08:15.520 --> 00:08:19.290
you might start the search
and trying them all.

00:08:19.290 --> 00:08:21.320
So it's like, OK, let's
try the first vertex.

00:08:21.320 --> 00:08:23.778
If that hasn't been visited,
which initially nothing's been

00:08:23.778 --> 00:08:27.380
visited, then visit it,
recursively, everything

00:08:27.380 --> 00:08:29.010
reachable from s.

00:08:29.010 --> 00:08:30.630
Then you go on to
the second vertex.

00:08:30.630 --> 00:08:32.480
Now, you may have already
visited it, then you skip it.

00:08:32.480 --> 00:08:34.271
Third vertex, maybe
you visited it already.

00:08:34.271 --> 00:08:36.250
Third, fourth
vertex, keep going,

00:08:36.250 --> 00:08:39.049
until you find some vertex
you haven't visited at all.

00:08:39.049 --> 00:08:42.990
And then you recursively visit
everything reachable from it,

00:08:42.990 --> 00:08:45.400
and you repeat.

00:08:45.400 --> 00:08:48.400
This will find all the
different clusters,

00:08:48.400 --> 00:08:50.480
all the different strongly
connected components

00:08:50.480 --> 00:08:51.630
of your graph.

00:08:51.630 --> 00:08:54.190
Most of the work is being
done by this recursion,

00:08:54.190 --> 00:08:55.970
but then there's
this top level, just

00:08:55.970 --> 00:08:59.090
to make sure that all
the vertices get visited.
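
NOTE
A sketch of the top-level loop described above, under the same assumptions as before (dict-based adjacency lists, hypothetical function names dfs and dfs_visit). The example graph is invented to show two separate starts.
```python
def dfs_visit(Adj, s, parent):
    for v in Adj[s]:
        if v not in parent:
            parent[v] = s
            dfs_visit(Adj, v, parent)
def dfs(V, Adj):
    """Run dfs_visit from every vertex not reached by an earlier start."""
    parent = {}
    for s in V:
        if s not in parent:
            parent[s] = None     # s starts a new tree in the forest
            dfs_visit(Adj, s, parent)
    return parent
# Hypothetical graph with two pieces that cannot reach each other.
V = ["p", "q", "r", "t"]
Adj = {"p": ["q"], "q": [], "r": ["t"], "t": ["r"]}
parent = dfs(V, Adj)
```
The vertices whose parent is None are exactly the places where the outer loop had to start a fresh search.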

00:08:59.090 --> 00:09:03.380
Let's do a little example,
so this is super clear,

00:09:03.380 --> 00:09:07.410
and then it will also
let me do something

00:09:07.410 --> 00:09:09.480
called edge classification.

00:09:09.480 --> 00:09:13.340
Once we see every
edge in the graph

00:09:13.340 --> 00:09:15.870
gets visited by DFS
in one way or another,

00:09:15.870 --> 00:09:18.820
and it's really helpful to
think about the different ways

00:09:18.820 --> 00:09:20.910
they can be visited.

00:09:20.910 --> 00:09:25.810
So here's a graph.

00:09:25.810 --> 00:09:29.010
I think it's similar
to one from last class.

00:09:46.160 --> 00:09:50.220
It's not strongly
connected, I don't think,

00:09:50.220 --> 00:09:53.960
so you can't get from
these vertices to c.

00:09:53.960 --> 00:09:55.510
You can get from
c to everywhere,

00:09:55.510 --> 00:10:00.110
it looks like, but not
strongly connected.

00:10:00.110 --> 00:10:02.820
And we're going to run
DFS, and I think, basically

00:10:02.820 --> 00:10:06.480
in alphabetical order
is how we're imagining--

00:10:06.480 --> 00:10:08.230
these vertices have
to be ordered somehow,

00:10:08.230 --> 00:10:12.680
we don't really care how, but
for sake of example I care.

00:10:12.680 --> 00:10:15.610
So we're going to
start with a, that's

00:10:15.610 --> 00:10:17.029
the first vertex in here.

00:10:17.029 --> 00:10:19.570
We're going to recursively visit
everything reachable from a,

00:10:19.570 --> 00:10:22.750
so we enter here
with s equals a.

00:10:22.750 --> 00:10:30.275
So I'll mark this s1, to be the
first value of s at this level.

00:10:33.070 --> 00:10:37.180
So we consider-- I'm going
to check the order here--

00:10:37.180 --> 00:10:39.345
first edge we look at,
there's two outgoing edges,

00:10:39.345 --> 00:10:40.845
let's say we look
at this one first.

00:10:46.230 --> 00:10:48.950
We look at b, b has
not been visited yet,

00:10:48.950 --> 00:10:50.570
has no parent pointer.

00:10:50.570 --> 00:10:54.040
This one has a
parent pointer of 0.

00:10:54.040 --> 00:10:59.560
B we're going to give a parent
pointer of a, that's here.

00:10:59.560 --> 00:11:01.970
Then we recursively
visit everything for b.

00:11:01.970 --> 00:11:04.670
So we look at all the outgoing
edges from b, there's only one.

00:11:04.670 --> 00:11:05.750
So we visit this edge.

00:11:09.230 --> 00:11:11.160
from b to e. e has
not been visited,

00:11:11.160 --> 00:11:15.200
so we set its parent pointer to
b, and now we recursively visit

00:11:15.200 --> 00:11:16.451
e.

00:11:16.451 --> 00:11:22.590
e has only one outgoing edge, so
we look at it, over here to d.

00:11:25.230 --> 00:11:29.286
d has not been visited, so
we set a parent pointer to e,

00:11:29.286 --> 00:11:31.160
and we look at all the
outgoing edges from d.

00:11:31.160 --> 00:11:33.170
d has one outgoing
edge, which is

00:11:33.170 --> 00:11:35.760
to b. b has already
been visited,

00:11:35.760 --> 00:11:38.530
so we skip that
one, nothing to do.

00:11:38.530 --> 00:11:42.720
That's the else case
of this if, so we

00:11:42.720 --> 00:11:45.730
do nothing in the else case,
we just go to the next edge.

00:11:45.730 --> 00:11:48.450
But there's no next edge
for d, so we're done.

00:11:48.450 --> 00:11:52.440
So this algorithm returns
to the next level up.

00:11:52.440 --> 00:11:54.220
Next level up was
e, we were iterating

00:11:54.220 --> 00:11:55.690
over the outgoing edges from e.

00:11:55.690 --> 00:11:59.870
But there was only one, so
we're done, so e finishes.

00:11:59.870 --> 00:12:05.340
Then we backtrack to b,
which is always going back

00:12:05.340 --> 00:12:07.420
along the parent pointer,
but it's also just

00:12:07.420 --> 00:12:08.500
in the recursion.

00:12:08.500 --> 00:12:10.915
We know where to go back to.

00:12:10.915 --> 00:12:13.540
We were going over the outgoing
edges from b, there's only one,

00:12:13.540 --> 00:12:15.610
we're done.

00:12:15.610 --> 00:12:16.960
So we go back to a.

00:12:16.960 --> 00:12:18.910
We only looked at one
outgoing edge from a.

00:12:18.910 --> 00:12:22.130
There's another outgoing
edge, which is this one,

00:12:22.130 --> 00:12:24.880
but we've already visited
d, so we skip over that one,

00:12:24.880 --> 00:12:27.240
too, so we're done
recursively visiting

00:12:27.240 --> 00:12:30.970
everything reachable from a.

00:12:30.970 --> 00:12:34.190
Now we go back to this
loop, the outer loop.

00:12:34.190 --> 00:12:38.310
So we did a, next we look at b,
we say, oh b has been visited,

00:12:38.310 --> 00:12:40.000
we don't need to do
anything from there.

00:12:40.000 --> 00:12:42.430
Then we go to c, c
hasn't been visited

00:12:42.430 --> 00:12:46.210
so we're going to loop
from c, and so this

00:12:46.210 --> 00:12:50.390
is our second choice
of s in this recursion,

00:12:50.390 --> 00:12:53.460
or in this outer loop.

00:12:53.460 --> 00:12:56.200
And so we look at the
outgoing edges from s2,

00:12:56.200 --> 00:12:59.210
let me match the
order in the notes.

00:12:59.210 --> 00:13:03.516
Let's say first we go to f.

00:13:03.516 --> 00:13:08.150
f has not been visited, so we
set its parent pointer to c.

00:13:08.150 --> 00:13:10.130
Then we look at all the
outgoing edges from f.

00:13:10.130 --> 00:13:13.710
There's one outgoing edge
from f, it goes to f.

00:13:13.710 --> 00:13:18.860
I guess I shouldn't
really bold this, sorry.

00:13:18.860 --> 00:13:21.040
I'll say what the bold
edges mean in a moment.

00:13:23.570 --> 00:13:25.300
This is just a regular edge.

00:13:25.300 --> 00:13:27.570
We follow the edge from f to f.

00:13:27.570 --> 00:13:29.385
We see, oh, f has
already been visited,

00:13:29.385 --> 00:13:31.400
it already has a parent
pointer, so there's

00:13:31.400 --> 00:13:33.389
no point going down there.

00:13:33.389 --> 00:13:35.430
We're done with f, that's
the only outgoing edge.

00:13:35.430 --> 00:13:37.650
We go back to c, there's
one other outgoing edge,

00:13:37.650 --> 00:13:40.900
but it leads to a vertex we've
already visited, namely e,

00:13:40.900 --> 00:13:44.600
and so we're done with visiting
everything reachable from c.

00:13:44.600 --> 00:13:46.100
We didn't visit
everything reachable

00:13:46.100 --> 00:13:49.250
from c, because some of it
was already visited from a.

00:13:49.250 --> 00:13:51.685
Then we go back to the outer
loop, say, OK, what about d?

00:13:51.685 --> 00:13:53.060
D has been visited,
what about e?

00:13:53.060 --> 00:13:54.351
E's been visited, what about f?

00:13:54.351 --> 00:13:55.590
F's been visited.

00:13:55.590 --> 00:13:57.790
So we're visiting
these vertices again,

00:13:57.790 --> 00:14:03.640
but should only be twice
in total, and in the end

00:14:03.640 --> 00:14:06.230
we visit all the vertices,
and, in a certain sense,

00:14:06.230 --> 00:14:07.170
all the edges as well.
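
NOTE
The walkthrough above can be replayed in code. The graph below is reconstructed from the spoken description only (the board drawing is not in the transcript), so treat the edge lists and their order as an assumption.
```python
def dfs(V, Adj):
    parent = {}
    def visit(s):
        for v in Adj[s]:
            if v not in parent:
                parent[v] = s
                visit(v)
    for s in V:                  # outer loop, alphabetical order
        if s not in parent:
            parent[s] = None
            visit(s)
    return parent
V = ["a", "b", "c", "d", "e", "f"]
Adj = {
    "a": ["b", "d"],             # the edge to b is examined first
    "b": ["e"],
    "c": ["f", "e"],             # the edge to f is examined first
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],                  # self-loop
}
parent = dfs(V, Adj)
visit_order = list(parent)       # dicts keep insertion order
roots = [v for v in V if parent[v] is None]
```
The two roots correspond to the two choices of s in the outer loop, s1 = a and s2 = c.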

00:14:12.440 --> 00:14:18.070
Let's talk about running time.

00:14:27.597 --> 00:14:29.930
What do you think the running
time of this algorithm is?

00:14:38.120 --> 00:14:39.590
Anyone?

00:14:39.590 --> 00:14:42.935
Time to wake up.

00:14:42.935 --> 00:14:43.897
AUDIENCE: Upper bound?

00:14:43.897 --> 00:14:45.810
PROFESSOR: Upper bound, sure.

00:14:45.810 --> 00:14:46.310
AUDIENCE: V?

00:14:46.310 --> 00:14:46.851
PROFESSOR: V?

00:14:46.851 --> 00:14:48.690
AUDIENCE: [INAUDIBLE].

00:14:48.690 --> 00:14:55.690
PROFESSOR: V is a little bit
optimistic, plus e, good,

00:14:55.690 --> 00:14:57.720
collaborative effort.

00:14:57.720 --> 00:15:00.070
It's linear time, just like BFS.

00:15:00.070 --> 00:15:02.520
This is what we
call linear time,

00:15:02.520 --> 00:15:07.550
because this is the
size of the input.

00:15:07.550 --> 00:15:11.342
It's theta V plus E
for the whole thing.

00:15:11.342 --> 00:15:12.800
The size of the
input was v plus e.

00:15:12.800 --> 00:15:15.300
We needed v slots
in an array, plus we

00:15:15.300 --> 00:15:20.400
needed e items in these linked
lists, one for each edge.

00:15:20.400 --> 00:15:22.560
We have to traverse
that whole structure.

00:15:22.560 --> 00:15:27.030
The reason it's order v plus e
is-- first, as you were saying,

00:15:27.030 --> 00:15:30.320
you're visiting every vertex
once in this outer loop,

00:15:30.320 --> 00:15:46.160
so not worrying about the
recursion in DFS alone,

00:15:46.160 --> 00:15:48.480
so that's order V.

00:15:48.480 --> 00:15:51.040
Then we have to worry
about this recursion,

00:15:51.040 --> 00:15:56.160
but we know that whenever we
call DFS visit on a vertex,

00:15:56.160 --> 00:15:58.961
that it did not have
a parent before.

00:15:58.961 --> 00:16:00.830
Right before we
called DFS visit,

00:16:00.830 --> 00:16:03.170
we set its parent
for the first time.

00:16:03.170 --> 00:16:05.590
Right before we called
DFS visit on v here,

00:16:05.590 --> 00:16:07.580
we set its parent
for the first time,

00:16:07.580 --> 00:16:09.930
because it wasn't set before.

00:16:09.930 --> 00:16:17.880
So DFS visit, and I'm
going to just write of v,

00:16:17.880 --> 00:16:19.840
meaning the last argument here.

00:16:25.520 --> 00:16:32.660
It's called once, at
most once, per vertex v.

00:16:35.800 --> 00:16:37.580
But it does not
take constant time.

00:16:37.580 --> 00:16:41.310
This takes constant time per
vertex, plus a recursive call.

00:16:41.310 --> 00:16:44.320
This thing, this takes constant
time, but there's a for loop

00:16:44.320 --> 00:16:44.820
here.

00:16:44.820 --> 00:16:47.140
We have to pay for however
many outgoing edges

00:16:47.140 --> 00:16:49.300
there are from v, that's
the part you're missing.

00:16:52.880 --> 00:17:00.560
And we pay length of adjacency
of v for that vertex.

00:17:00.560 --> 00:17:03.046
So the total in
addition to this v

00:17:03.046 --> 00:17:08.300
is going to be order the sum
over all vertices, v in capital

00:17:08.300 --> 00:17:13.400
V, of length of the
adjacency list for v,

00:17:13.400 --> 00:17:22.150
which is E. This
is the handshaking

00:17:22.150 --> 00:17:24.592
lemma from last time.

00:17:24.592 --> 00:17:27.010
It's twice e for
undirected graphs,

00:17:27.010 --> 00:17:29.550
it's e for directed graphs.

00:17:29.550 --> 00:17:33.970
I've drawn directed graphs here,
it's a little more interesting.
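
NOTE
One way to check the V plus E accounting above is to count the work directly. This sketch assumes the same reconstructed example graph as in the walkthrough; the counter names are hypothetical.
```python
def dfs_counting(V, Adj):
    """Run DFS and count recursive visits and edge examinations."""
    parent = {}
    counts = {"visit_calls": 0, "edges_examined": 0}
    def visit(s):
        counts["visit_calls"] += 1
        for v in Adj[s]:
            counts["edges_examined"] += 1
            if v not in parent:
                parent[v] = s
                visit(v)
    for s in V:
        if s not in parent:
            parent[s] = None
            visit(s)
    return counts
V = ["a", "b", "c", "d", "e", "f"]
Adj = {"a": ["b", "d"], "b": ["e"], "c": ["f", "e"],
       "d": ["b"], "e": ["d"], "f": ["f"]}
counts = dfs_counting(V, Adj)
```
Each vertex is visited once and each adjacency list is scanned once, so the counts equal the number of vertices (6) and the number of directed edges (8).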

00:17:33.970 --> 00:17:37.560
OK, so it's linear time, just
like the BFS, so you could say,

00:17:37.560 --> 00:17:42.240
who cares, but DFS offers a
lot of different properties

00:17:42.240 --> 00:17:42.870
than BFS.

00:17:42.870 --> 00:17:44.660
They each have their niche.

00:17:44.660 --> 00:17:46.250
BFS is great for shortest paths.

00:17:46.250 --> 00:17:49.080
You want to know the fastest
way to solve the Rubik's cube,

00:17:49.080 --> 00:17:50.560
BFS will find it.

00:17:50.560 --> 00:17:53.330
You want to find the fastest
way to solve the Rubik's cube,

00:17:53.330 --> 00:17:55.150
DFS will not find it.

00:17:55.150 --> 00:17:57.090
It's not following
shortest paths here.

00:17:57.090 --> 00:17:59.300
Going from a to
d, we use the path

00:17:59.300 --> 00:18:01.324
of length 3, that's
the bold edges.

00:18:01.324 --> 00:18:02.740
We could have gone
directly from a

00:18:02.740 --> 00:18:05.170
to d, so it's a
different kind of search,

00:18:05.170 --> 00:18:07.340
but sort of the inverse.
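
NOTE
A small comparison of the two searches on the path from a to d, assuming the reconstructed example graph. The helper names dfs_parents, bfs_parents, and path_length are hypothetical.
```python
from collections import deque
def dfs_parents(Adj, s):
    parent = {s: None}
    def visit(u):
        for v in Adj[u]:
            if v not in parent:
                parent[v] = u
                visit(v)
    visit(s)
    return parent
def bfs_parents(Adj, s):
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in Adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent
def path_length(parent, v):
    """Number of edges on the parent-pointer path from the source to v."""
    length = 0
    while parent[v] is not None:
        v = parent[v]
        length += 1
    return length
Adj = {"a": ["b", "d"], "b": ["e"], "c": ["f", "e"],
       "d": ["b"], "e": ["d"], "f": ["f"]}
dfs_len = path_length(dfs_parents(Adj, "a"), "d")
bfs_len = path_length(bfs_parents(Adj, "a"), "d")
```
DFS reaches d through b and e (3 edges), while BFS uses the direct edge (1 edge).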

00:18:07.340 --> 00:18:10.560
But it's extremely useful,
in particular, in the way

00:18:10.560 --> 00:18:13.082
that it classifies edges.

00:18:13.082 --> 00:18:14.790
So let me talk about
edge classification.

00:18:27.630 --> 00:18:31.540
You can check every edge
in this graph gets visited.

00:18:31.540 --> 00:18:34.060
In a directed graph every
edge gets visited once,

00:18:34.060 --> 00:18:35.740
in an undirected
graph, every edge

00:18:35.740 --> 00:18:37.660
gets visited twice,
once from each side.

00:18:40.200 --> 00:18:42.240
And when you visit
that edge, there's

00:18:42.240 --> 00:18:45.710
sort of different categories
of what could happen to it.

00:18:45.710 --> 00:18:50.920
Maybe the edge led to something
unvisited, when you went there.

00:18:50.920 --> 00:18:52.190
We call those tree edges.

00:19:10.360 --> 00:19:12.920
That's what the parent
pointers are specifying

00:19:12.920 --> 00:19:16.420
and all the bold edges here
are called tree edges.

00:19:16.420 --> 00:19:27.410
This is when we visit a
new vertex via that edge.

00:19:29.832 --> 00:19:31.540
So we look at the
other side of the edge,

00:19:31.540 --> 00:19:33.024
we discover a new vertex.

00:19:33.024 --> 00:19:34.440
Those are what we
call tree edges,

00:19:34.440 --> 00:19:37.830
it turns out they form
a tree, a directed tree.

00:19:37.830 --> 00:19:39.930
That's a lemma you can prove.

00:19:39.930 --> 00:19:40.810
You can see it here.

00:19:40.810 --> 00:19:44.650
We just have a path, actually a
forest would be more accurate.

00:19:44.650 --> 00:19:48.916
We have a path abed,
and we have an edge cf,

00:19:48.916 --> 00:19:51.209
but, in general, it's a forest.

00:19:51.209 --> 00:19:53.250
So for example, if there
was another thing coming

00:19:53.250 --> 00:19:57.540
from e here, let's modify my
graph, we would, at some point,

00:19:57.540 --> 00:19:59.720
visit that edge and say,
oh, here's a new way to go,

00:19:59.720 --> 00:20:04.250
and now that bold structure
forms an actual tree.

00:20:04.250 --> 00:20:06.850
These are called tree edges,
you can call them forest edges

00:20:06.850 --> 00:20:10.080
if you feel like it.

00:20:10.080 --> 00:20:13.120
There are other edges in
there, the nonbold edges,

00:20:13.120 --> 00:20:17.260
and the textbook distinguishes
three types, three types?

00:20:17.260 --> 00:20:19.950
Three types, so many types.

00:20:22.500 --> 00:20:40.580
They are forward edges,
backward edges, and cross edges.

00:20:44.720 --> 00:20:47.740
Some of these are more useful
to distinguish than others,

00:20:47.740 --> 00:20:51.490
but it doesn't hurt
to have them all.

00:20:51.490 --> 00:20:57.590
So, for example, this edge I'm
going to call a forward edge,

00:20:57.590 --> 00:21:01.260
just write f,
that's unambiguous,

00:21:01.260 --> 00:21:04.430
because it goes, in some
sense, forward along the tree.

00:21:04.430 --> 00:21:09.730
It goes from the root of
this tree to a descendant.

00:21:09.730 --> 00:21:12.130
There is a path
in the tree from a

00:21:12.130 --> 00:21:14.720
to d, so we call
it a forward edge.

00:21:14.720 --> 00:21:20.770
By contrast, this edge I'm
going to call a backward edge,

00:21:20.770 --> 00:21:24.570
because it goes from
a node in the tree

00:21:24.570 --> 00:21:26.390
to an ancestor in the tree.

00:21:26.390 --> 00:21:28.914
If you think of parents, I
can go from d to its parent

00:21:28.914 --> 00:21:30.830
to its parent, and that's
where the edge goes,

00:21:30.830 --> 00:21:33.460
so that's a backward
edge-- double check I

00:21:33.460 --> 00:21:36.870
got these not reversed,
yeah, that's right.

00:21:36.870 --> 00:21:39.334
Forward edge because I could
go from d to its parent

00:21:39.334 --> 00:21:41.000
to its parent to its
parent and the edge

00:21:41.000 --> 00:21:44.220
went the other way,
that's a forward edge.

00:21:44.220 --> 00:21:49.170
So forward edge goes from a node
to a descendant in the tree.

00:21:52.540 --> 00:21:56.660
Backward edge goes from a node
to an ancestor in the tree.

00:22:02.670 --> 00:22:04.180
And when I say,
tree, I mean forest.

00:22:07.080 --> 00:22:10.170
And then all the other
edges are cross edges.

00:22:12.940 --> 00:22:17.670
So I guess, here,
this is a cross edge.

00:22:17.670 --> 00:22:20.840
In this case, it goes from
one tree to another, doesn't

00:22:20.840 --> 00:22:22.540
have to go between
different trees.

00:22:22.540 --> 00:22:28.540
For example, let's say
I'm visiting d, then

00:22:28.540 --> 00:22:32.942
I go back to e, I visit g,
or there could be this edge.

00:22:32.942 --> 00:22:37.720
If this edge existed, it
would be a cross edge,

00:22:37.720 --> 00:22:40.970
because g and d are
not ancestor related,

00:22:40.970 --> 00:22:42.980
neither one is an
ancestor of the other,

00:22:42.980 --> 00:22:46.329
they are siblings actually.

00:22:46.329 --> 00:22:47.870
So there's, in
general, there's going

00:22:47.870 --> 00:22:51.210
to be some subtree over
here, some subtree over here,

00:22:51.210 --> 00:22:55.760
and this is a cross edge
between two different subtrees.

00:22:55.760 --> 00:23:07.960
This cross edge is between two,
sort of, non ancestor related,

00:23:07.960 --> 00:23:16.955
I think is the shortest way to
write this, subtrees or nodes.

00:23:26.520 --> 00:23:29.065
A little puzzle for
you, well, I guess

00:23:29.065 --> 00:23:31.620
the first question is, how do
you compute this structure?

00:23:31.620 --> 00:23:34.212
How do you compute
which edges are which?

00:23:34.212 --> 00:23:36.670
This is not hard, although I
haven't written it in the code

00:23:36.670 --> 00:23:37.200
here.

00:23:37.200 --> 00:23:42.290
You can check the textbook
for one way to do it.

00:23:42.290 --> 00:23:45.800
The parent structure tells you
which edges are tree edges.

00:23:45.800 --> 00:23:47.980
So that part we have done.

00:23:47.980 --> 00:23:52.670
Every parent pointer corresponds
to the reverse of a tree edge,

00:23:52.670 --> 00:23:55.250
so at the same time you could
mark that edge a tree edge,

00:23:55.250 --> 00:23:56.958
and you'd know which
edges are tree edges

00:23:56.958 --> 00:23:58.874
and which edges
are nontree edges.

00:23:58.874 --> 00:24:01.290
If you want to know which are
forward, which are backward,

00:24:01.290 --> 00:24:06.130
which are cross edges, the
key thing you need to know

00:24:06.130 --> 00:24:14.140
is, well, in particular,
for backward edges, one way

00:24:14.140 --> 00:24:16.850
to compute them is
to mark which nodes

00:24:16.850 --> 00:24:19.880
you are currently exploring.

00:24:19.880 --> 00:24:22.660
So when we do a DFS
visit on a node,

00:24:22.660 --> 00:24:25.160
we could say at
the beginning here,

00:24:25.160 --> 00:24:31.230
basically, we're starting
to visit s, say, start s,

00:24:31.230 --> 00:24:33.569
and then at the end of
this for loop, we write,

00:24:33.569 --> 00:24:34.485
we're finished with s.

00:24:38.190 --> 00:24:40.130
And you could mark that
in the s structure.

00:24:40.130 --> 00:24:43.720
You could say s dot in
process is true up here,

00:24:43.720 --> 00:24:46.730
s dot in process
equals false down here.

00:24:46.730 --> 00:24:49.470
Keep track of which nodes are
currently in the recursion

00:24:49.470 --> 00:24:53.120
stack, just by marking
them and unmarking them

00:24:53.120 --> 00:24:55.430
at the beginning and the end.

00:24:55.430 --> 00:24:58.210
Then we'll know, if we follow
an edge and it's an edge

00:24:58.210 --> 00:25:01.220
to somebody who's
already in the stack,

00:25:01.220 --> 00:25:06.020
then it's a backward edge,
because that's-- everyone

00:25:06.020 --> 00:25:10.690
in the stack is an ancestor
from our current node.
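
NOTE
A sketch of the marking idea above. The lecture suggests a per-vertex in-process flag; here a Python set stands in for that flag, which is an assumption, as are the function name and the reconstructed example graph.
```python
def find_backward_edges(V, Adj):
    parent = {}
    in_process = set()           # vertices currently on the recursion stack
    backward = []
    def visit(s):
        in_process.add(s)        # start s
        for v in Adj[s]:
            if v not in parent:
                parent[v] = s
                visit(v)
            elif v in in_process:
                backward.append((s, v))   # v is an ancestor of s
        in_process.remove(s)     # finish s
    for s in V:
        if s not in parent:
            parent[s] = None
            visit(s)
    return backward
V = ["a", "b", "c", "d", "e", "f"]
Adj = {"a": ["b", "d"], "b": ["e"], "c": ["f", "e"],
       "d": ["b"], "e": ["d"], "f": ["f"]}
backward = find_backward_edges(V, Adj)
```
On this graph the backward edges are the edge from d to b and the self-loop at f.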

00:25:10.690 --> 00:25:15.400
Detecting forward edges,
it's a little trickier.

00:25:18.940 --> 00:25:23.330
Forward edges
versus cross edges,

00:25:23.330 --> 00:25:25.220
any suggestions on an
easy way to do that?

00:25:28.480 --> 00:25:31.840
I don't think I know
an easy way to do that.

00:25:31.840 --> 00:25:33.560
It can be done.

00:25:33.560 --> 00:25:35.750
The way the textbook does
it is a little bit more

00:25:35.750 --> 00:25:41.030
sophisticated, in that when
they start visiting a vertex,

00:25:41.030 --> 00:25:44.890
they record the time
that it got visited.

00:25:44.890 --> 00:25:46.620
What's time?

00:25:46.620 --> 00:25:49.220
You could think of it as
the clock on your computer,

00:25:49.220 --> 00:25:51.140
another way to do
it is, every time

00:25:51.140 --> 00:25:55.000
you do a step in this algorithm,
you increment a counter.

00:25:55.000 --> 00:25:58.351
So every time anything happens,
you increment a counter,

00:25:58.351 --> 00:25:59.850
and then you store
the value of that

00:25:59.850 --> 00:26:02.910
counter here for s, that
would be the start time for s,

00:26:02.910 --> 00:26:06.100
you store the finish
time for s down here,

00:26:06.100 --> 00:26:08.040
and then this gives
you, this tells you

00:26:08.040 --> 00:26:09.970
when a node was
visited, and you can

00:26:09.970 --> 00:26:12.520
use that to compute when
an edge is a forward edge

00:26:12.520 --> 00:26:14.924
and otherwise it's a cross edge.

00:26:14.924 --> 00:26:16.840
It's not terribly exciting,
though, so I'm not

00:26:16.840 --> 00:26:18.810
going to detail that.

00:26:18.810 --> 00:26:22.450
You can look at the textbook
if you're interested.
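
NOTE
A sketch of the counter-based approach mentioned above, in the spirit of the textbook's start and finish times. The exact rules below are one common formulation, not quoted from the lecture; the names classify_edges, start, and finish are assumptions, and the graph is the reconstructed example.
```python
def classify_edges(V, Adj):
    parent, start, finish = {}, {}, {}
    kind = {}                    # maps edge (u, v) to its category
    clock = 0
    def visit(u):
        nonlocal clock
        clock += 1
        start[u] = clock         # start time of u
        for v in Adj[u]:
            if v not in parent:
                parent[v] = u
                kind[(u, v)] = "tree"
                visit(v)
            elif v not in finish:
                kind[(u, v)] = "backward"   # v is still in process
            elif start[u] < start[v]:
                kind[(u, v)] = "forward"    # v started after u: descendant
            else:
                kind[(u, v)] = "cross"
        clock += 1
        finish[u] = clock        # finish time of u
    for s in V:
        if s not in parent:
            parent[s] = None
            visit(s)
    return kind
V = ["a", "b", "c", "d", "e", "f"]
Adj = {"a": ["b", "d"], "b": ["e"], "c": ["f", "e"],
       "d": ["b"], "e": ["d"], "f": ["f"]}
kind = classify_edges(V, Adj)
```
This agrees with the labels given in the lecture: the edge from a to d is forward, the edge from d to b is backward, and the edge from c to e is a cross edge.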

00:26:22.450 --> 00:26:24.140
But here's a fun puzzle.

00:26:24.140 --> 00:26:32.920
In an undirected graph, which
of these edges can exist?

00:26:32.920 --> 00:26:38.790
We can have a vote, do some
democratic mathematics.

00:26:38.790 --> 00:26:41.910
How many people think tree edges
exist in undirected graphs?

00:26:44.510 --> 00:26:46.170
You, OK.

00:26:46.170 --> 00:26:46.670
Sarini does.

00:26:46.670 --> 00:26:47.740
That's a good sign.

00:26:47.740 --> 00:26:49.340
How many people
think forward edges

00:26:49.340 --> 00:26:50.920
exist in an undirected graph?

00:26:54.310 --> 00:26:54.870
A couple.

00:26:54.870 --> 00:26:56.370
How many people
think backward edges

00:26:56.370 --> 00:26:59.500
exist in an undirected graph?

00:26:59.500 --> 00:27:00.000
Couple.

00:27:00.000 --> 00:27:01.850
How many people
think cross edges

00:27:01.850 --> 00:27:03.980
exist in an undirected graph?

00:27:03.980 --> 00:27:05.250
More people, OK.

00:27:05.250 --> 00:27:07.870
I think voting worked.

00:27:07.870 --> 00:27:10.830
They all exist, no,
that's not true.

00:27:10.830 --> 00:27:13.217
This one can exist and
this one can exist.

00:27:13.217 --> 00:27:15.050
I actually wrote the
wrong ones in my notes,

00:27:15.050 --> 00:27:19.020
so it's good to trick you,
no, I made a mistake.

00:27:19.020 --> 00:27:20.870
It's very easy to
get these mixed up

00:27:20.870 --> 00:27:24.360
and you can think
about why this is true,

00:27:24.360 --> 00:27:26.200
maybe I'll draw some
pictures to clarify.

00:27:30.080 --> 00:27:35.570
This is something, you remember
the-- there was a BFS diagram,

00:27:35.570 --> 00:27:38.460
I talked a little bit
about this last class.

00:27:38.460 --> 00:27:40.650
Tree edges better exist,
those are the things

00:27:40.650 --> 00:27:42.370
you use to visit new vertices.

00:27:42.370 --> 00:27:45.640
So that always happens,
undirected or otherwise.

00:27:45.640 --> 00:27:47.640
Forward edges, though,
a forward edge

00:27:47.640 --> 00:27:51.590
would be, OK, I visited
this, then I visited this.

00:27:51.590 --> 00:27:52.770
Those were tree edges.

00:27:55.370 --> 00:27:58.552
Then I backtrack and I
follow an edge like this.

00:27:58.552 --> 00:27:59.760
This would be a forward edge.

00:27:59.760 --> 00:28:03.470
And in a directed
graph that can happen.

00:28:03.470 --> 00:28:11.320
In an undirected graph,
it can also happen, right?

00:28:11.320 --> 00:28:12.540
Oh, no, it can't, it can't.

00:28:12.540 --> 00:28:14.530
OK.

00:28:14.530 --> 00:28:15.720
So confusing.

00:28:15.720 --> 00:28:17.970
In an undirected graph, if
you look like this,

00:28:17.970 --> 00:28:20.300
you start-- let's say this is s.

00:28:20.300 --> 00:28:24.000
You start here, and suppose
we follow this edge.

00:28:24.000 --> 00:28:27.180
We get to here, then we follow
this edge, we get to here.

00:28:27.180 --> 00:28:31.390
Then we will follow this
edge in the other direction,

00:28:31.390 --> 00:28:35.240
and that's guaranteed to
finish before we get back to s.

00:28:35.240 --> 00:28:36.970
So, in order to
be a forward edge,

00:28:36.970 --> 00:28:39.110
this one has to be
visited after this one,

00:28:39.110 --> 00:28:43.030
from s, but in this scenario,
if you follow this one first,

00:28:43.030 --> 00:28:44.530
you'll eventually
get to this vertex

00:28:44.530 --> 00:28:47.440
and then you will come back,
and then that will be classified

00:28:47.440 --> 00:28:49.670
as a backward edge in
an undirected graph.

00:28:49.670 --> 00:28:53.335
So you can never have forward
edges in an undirected graph.

00:29:00.900 --> 00:29:04.490
But I have a backward edge
here, that would suggest

00:29:04.490 --> 00:29:08.190
I can have backward edges
here, and no cross edges.

00:29:08.190 --> 00:29:14.410
Well, democracy did not work, I
was swayed by the popular vote.

00:29:14.410 --> 00:29:17.700
So I claim, apparently,
cross edges do not exist.

00:29:17.700 --> 00:29:18.660
Let's try to draw this.

00:29:18.660 --> 00:29:26.240
So a cross edge typical
scenario would be either here,

00:29:26.240 --> 00:29:29.900
you follow this
edge, you backtrack,

00:29:29.900 --> 00:29:31.950
you follow another
edge, and then

00:29:31.950 --> 00:29:34.670
you discover there was an
edge back to some other subtree

00:29:34.670 --> 00:29:36.020
that you've already visited.

00:29:36.020 --> 00:29:38.365
That can't happen in
an undirected graph.

00:29:38.365 --> 00:29:41.930
For the same reason, if
I follow this one first,

00:29:41.930 --> 00:29:46.240
and this edge exists undirected,
then I will go down that way.

00:29:46.240 --> 00:29:50.260
So it will actually be a tree
edge, not a cross edge.
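
NOTE
Editor's sketch, not part of the lecture: a small check of the claim that DFS on an undirected graph only produces tree and back edges. It assumes an undirected graph stored as a dict where every edge appears in both endpoints' lists. The name classify_undirected and the label strings are hypothetical.
```python
# Sketch only: each undirected edge is classified once, the first time DFS looks at it.
def classify_undirected(adj):
    """Return a dict mapping frozenset({u, v}) to an edge kind."""
    parent, done, kinds = {}, set(), {}
    def visit(u):
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in kinds:
                continue                          # already classified from the other endpoint
            if v not in parent:
                parent[v] = u
                kinds[edge] = "tree"              # edge used to discover v
                visit(v)
            elif v not in done:
                kinds[edge] = "back"              # v is an ancestor still in process
            else:
                kinds[edge] = "forward or cross"  # by the argument above, never reached
        done.add(u)
    for s in adj:
        if s not in parent:
            parent[s] = None
            visit(s)
    return kinds
```
The last branch is never taken because a finished vertex has already examined all of its edges, so any edge to it was classified from its side first.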

00:29:50.260 --> 00:29:51.670
OK, phew.

00:29:51.670 --> 00:29:56.494
That means my
notes were correct.

00:29:56.494 --> 00:29:57.910
I was surprised,
because they were

00:29:57.910 --> 00:30:04.355
copied from the textbook,
uncorrect my correction.

00:30:04.355 --> 00:30:04.855
Good.

00:30:10.080 --> 00:30:13.140
So what?

00:30:13.140 --> 00:30:15.930
Why do I care about these
edge classifications?

00:30:15.930 --> 00:30:21.970
I claim they're super handy for
two problems, cycle detection,

00:30:21.970 --> 00:30:24.140
which is a pretty
intuitive problem.

00:30:24.140 --> 00:30:26.760
Does my graph have any cycles?

00:30:26.760 --> 00:30:29.890
In the directed case, this
is particularly interesting.

00:30:29.890 --> 00:30:33.390
I want to know, does a graph
have any directed cycles?

00:30:33.390 --> 00:30:35.360
And another problem
called topological sort,

00:30:35.360 --> 00:30:36.390
which we will get to.

00:30:41.500 --> 00:30:45.360
So let's start with
cycle detection.

00:30:45.360 --> 00:30:48.870
This is actually a warmup
for topological sort.

00:30:52.760 --> 00:30:55.680
So does my graph
have any cycles?

00:30:55.680 --> 00:31:00.600
G has a cycle, I claim.

00:31:00.600 --> 00:31:10.660
This happens, if and only if, G
has a back edge, or let's say,

00:31:10.660 --> 00:31:13.940
a depth-first search of
that graph has a back edge.

00:31:17.250 --> 00:31:19.840
So it doesn't matter
where I start from

00:31:19.840 --> 00:31:22.944
or how this algorithm-- I run
this top level DFS algorithm,

00:31:22.944 --> 00:31:24.360
explore the whole
graph, because I

00:31:24.360 --> 00:31:26.970
want to know in the whole
graph is there a cycle?

00:31:26.970 --> 00:31:29.580
I claim, if there's a back
edge, then there's a cycle.

00:31:33.030 --> 00:31:35.729
So it all comes
down to back edges.

00:31:35.729 --> 00:31:38.020
This will work for both
directed and undirected graphs.

00:31:38.020 --> 00:31:41.070
Detecting cycles is pretty
easy in undirected graphs.

00:31:41.070 --> 00:31:43.370
It's a little more subtle
with directed graphs,

00:31:43.370 --> 00:31:46.750
because you have to worry
about the edge directions.

00:31:46.750 --> 00:31:49.610
So let's prove this.

00:31:49.610 --> 00:31:52.770
We haven't done a
serious proof in a while,

00:31:52.770 --> 00:31:57.110
so this is still a pretty easy
one, let's think about it.

00:31:57.110 --> 00:31:58.880
What do you think is
the easier direction

00:31:58.880 --> 00:32:02.780
to prove here, left or right?

00:32:02.780 --> 00:32:03.720
So, more democracy.

00:32:03.720 --> 00:32:07.292
How many people
think left is easy?

00:32:07.292 --> 00:32:08.360
A couple.

00:32:08.360 --> 00:32:10.240
How many people
think right is easy?

00:32:10.240 --> 00:32:12.410
A whole bunch more.

00:32:12.410 --> 00:32:14.890
I disagree with you.

00:32:14.890 --> 00:32:18.320
I guess it depends
what you consider easy.

00:32:18.320 --> 00:32:21.210
Let me show you
how easy left is.

00:32:21.210 --> 00:32:25.780
Left is, I have a back edge, I
want to claim there's a cycle.

00:32:25.780 --> 00:32:27.610
What does the back edge look like?

00:32:27.610 --> 00:32:34.050
Well, it's an edge to
an ancestor in the tree.

00:32:34.050 --> 00:32:35.796
If this node is a
descendant of this node

00:32:35.796 --> 00:32:39.920
and this node is an ancestor
of this node, that's

00:32:39.920 --> 00:32:42.860
saying there are
tree edges, there's

00:32:42.860 --> 00:32:45.820
a path, a tree path, that
connects one to the other.

00:32:49.340 --> 00:32:54.160
So these are tree
edges, because this

00:32:54.160 --> 00:32:57.859
is supposed to be an
ancestor, and this

00:32:57.859 --> 00:32:59.150
is supposed to be a descendant.

00:33:03.670 --> 00:33:08.770
And that's the definition
of a back edge.

00:33:08.770 --> 00:33:11.540
Do you see a cycle?

00:33:11.540 --> 00:33:12.820
I see a cycle.

00:33:12.820 --> 00:33:17.550
This is a cycle, directed cycle.

00:33:17.550 --> 00:33:21.970
So if there's a back edge, by
definition, it makes a cycle.

00:33:21.970 --> 00:33:24.290
Now, it's harder to say
if I have 10 back edges,

00:33:24.290 --> 00:33:25.400
how many cycles are there?

00:33:25.400 --> 00:33:26.560
Could be many.

00:33:26.560 --> 00:33:28.880
But if there's a
back edge, there's

00:33:28.880 --> 00:33:30.410
definitely at least one cycle.

00:33:34.082 --> 00:33:35.790
The other direction
is also not too hard,

00:33:35.790 --> 00:33:38.600
but I would hesitate
to call it easy.

00:33:38.600 --> 00:33:42.690
Any suggestions? If I
know there is a cycle,

00:33:42.690 --> 00:33:46.910
how do I prove that there's
a back edge somewhere?

00:33:46.910 --> 00:33:49.110
Think about that,
let me draw a cycle.

00:34:11.439 --> 00:34:12.480
There's a length k cycle.

00:34:16.214 --> 00:34:17.880
Where do you think,
which of these edges

00:34:17.880 --> 00:34:19.260
do you think is going
to be a back edge?

00:34:19.260 --> 00:34:20.835
Let's hope it's
one of these edges.

00:34:23.350 --> 00:34:24.190
Sorry?

00:34:24.190 --> 00:34:25.420
AUDIENCE: Vk to v zero.

00:34:25.420 --> 00:34:26.560
PROFESSOR: Vk to v zero.

00:34:26.560 --> 00:34:31.000
That's a good idea, maybe
this is a back edge.

00:34:31.000 --> 00:34:34.670
Of course, this is
symmetric, why that edge?

00:34:34.670 --> 00:34:36.780
I labeled it in
a suggestive way,

00:34:36.780 --> 00:34:39.389
but I need to say something
before I know actually which

00:34:39.389 --> 00:34:42.404
edge is going to
be the back edge.

00:34:42.404 --> 00:34:44.320
AUDIENCE: You have to
say you start at v zero?

00:34:44.320 --> 00:34:45.850
PROFESSOR: Start at v zero.

00:34:45.850 --> 00:34:48.460
If I started a search
at v zero, that

00:34:48.460 --> 00:34:49.839
looks good, because
the search is

00:34:49.839 --> 00:34:51.719
kind of going to go
in this direction.

00:34:51.719 --> 00:34:53.949
vk will maybe be the
last thing to be visited,

00:34:53.949 --> 00:34:55.480
that's not actually true.

00:34:55.480 --> 00:34:57.710
Could be there's an edge
directly from v zero to vk,

00:34:57.710 --> 00:35:00.700
but intuitively vk
will kind of be later,

00:35:00.700 --> 00:35:02.470
and then when this
edge gets visited,

00:35:02.470 --> 00:35:05.350
this will be an ancestor
and it will be a back edge.

00:35:05.350 --> 00:35:10.270
Of course, we may not
start a search here,

00:35:10.270 --> 00:35:12.240
so calling it the
start of the search

00:35:12.240 --> 00:35:16.079
is not quite right,
a little different.

00:35:16.079 --> 00:35:18.800
AUDIENCE: First vertex
that gets hit [INAUDIBLE].

00:35:18.800 --> 00:35:21.550
PROFESSOR: First vertex
that gets hit, good.

00:35:21.550 --> 00:35:24.820
I'm going to start the
numbering, v zero,

00:35:24.820 --> 00:35:38.460
let's assume v 0 is the
first vertex in the cycle,

00:35:38.460 --> 00:35:40.040
visited by the
depth-first search.

00:35:47.100 --> 00:35:54.060
Together, if you want some
pillows if you like them,

00:35:54.060 --> 00:35:56.640
especially convenient
that they're in front.

00:35:56.640 --> 00:35:59.130
So right, if it's
not v zero, say

00:35:59.130 --> 00:36:00.470
v3 was the first one visited.

00:36:00.470 --> 00:36:01.845
We will just change
the labeling,

00:36:01.845 --> 00:36:06.260
so that's v zero, that's
v1, that's v2, and so on.

00:36:06.260 --> 00:36:09.340
So set this labeling,
so that v0 is the first one,

00:36:09.340 --> 00:36:12.430
first vertex that gets visited.

00:36:12.430 --> 00:36:20.230
Then, I claim that-- let me
just write the claim first.

00:36:20.230 --> 00:36:23.610
This edge vkv0 will
be a back edge.

00:36:26.350 --> 00:36:29.252
We'll just say, is back edge.

00:36:29.252 --> 00:36:32.780
And I would say this is not
obvious, be a little careful.

00:36:50.420 --> 00:36:54.460
We have to somehow exploit
the depth-first nature of DFS,

00:36:54.460 --> 00:36:58.820
the fact that it goes deep-- it
goes as deep as it can before

00:36:58.820 --> 00:37:00.396
backtracking.

00:37:00.396 --> 00:37:02.820
If you think about
it, we're starting,

00:37:02.820 --> 00:37:05.690
at this point we are starting a
search relative to this cycle.

00:37:05.690 --> 00:37:08.550
No one has been visited,
except v zero just

00:37:08.550 --> 00:37:10.930
got visited, has a parent
pointer off somewhere else.

00:37:15.990 --> 00:37:16.880
What do we do next?

00:37:16.880 --> 00:37:19.309
Well, we visit all the
outgoing edges from v zero,

00:37:19.309 --> 00:37:20.850
there might be many
of them. It could

00:37:20.850 --> 00:37:23.480
be an edge from v zero to v1,
it could be an edge from v zero

00:37:23.480 --> 00:37:28.750
to v3, it could be an edge
from v zero to something else.

00:37:28.750 --> 00:37:31.980
We don't know which one's
going to happen first.

00:37:31.980 --> 00:37:39.760
But the one thing I
can claim is that v1

00:37:39.760 --> 00:37:46.610
will be visited before we
finish visiting v zero.

00:37:52.124 --> 00:37:53.790
From v zero, we might
go somewhere else,

00:37:53.790 --> 00:37:55.790
we might go somewhere
else that might eventually

00:37:55.790 --> 00:37:58.130
lead to v1 by some other
route, but in particular, we

00:37:58.130 --> 00:38:01.440
look at that edge
from v zero to v1.

00:38:01.440 --> 00:38:03.730
And so, at some point,
we're searching,

00:38:03.730 --> 00:38:06.580
we're visiting all the things
reachable from v zero, that

00:38:06.580 --> 00:38:09.830
includes v1, and
that will happen,

00:38:09.830 --> 00:38:11.950
we will touch v1
for the first time,

00:38:11.950 --> 00:38:13.800
because it hasn't
been touched yet.

00:38:13.800 --> 00:38:17.932
We will visit it before
we finish visiting v zero.

00:38:17.932 --> 00:38:21.660
The same goes actually for all
of the vi's, because they're all

00:38:21.660 --> 00:38:23.510
reachable from v zero.

00:38:23.510 --> 00:38:25.760
You can prove this by induction.

00:38:25.760 --> 00:38:29.860
You'll have to visit v1 before
you finish visiting v zero.

00:38:29.860 --> 00:38:32.480
You'll have to visit v2
before you finish visiting

00:38:32.480 --> 00:38:35.592
v1, although you might
actually visit v2 before v1.

00:38:35.592 --> 00:38:37.050
You would definitely
finish, you'll

00:38:37.050 --> 00:38:41.880
finish v2 before you
finish v1, and so on.

00:38:41.880 --> 00:38:47.424
So vi will be visited before
you finish vi minus 1,

00:38:47.424 --> 00:38:49.090
but in particular,
what we care about is

00:38:49.090 --> 00:38:58.760
that vk is visited
before we finish v zero.

00:39:02.040 --> 00:39:03.670
And it will be entirely visited.

00:39:03.670 --> 00:39:05.930
We will finish
visiting vk before we

00:39:05.930 --> 00:39:07.570
finish visiting v zero.

00:39:07.570 --> 00:39:10.280
We will start vk
after we start v zero,

00:39:10.280 --> 00:39:12.330
because v zero is first.

00:39:12.330 --> 00:39:16.580
So the order is going to
look like, start v zero,

00:39:16.580 --> 00:39:20.940
at some point we will start vk.

00:39:20.940 --> 00:39:27.950
Then we'll finish vk,
then we'll finish v zero.

00:39:27.950 --> 00:39:30.340
This is something the
textbook likes to call,

00:39:30.340 --> 00:39:33.200
and I like to call,
balanced parentheses.

00:39:33.200 --> 00:39:38.690
You can think of it as, we
start v zero, then we start vk,

00:39:38.690 --> 00:39:42.390
then we finish vk,
then we finish v zero.

00:39:42.390 --> 00:39:44.290
And these match up
and they're balanced.

00:39:46.970 --> 00:39:48.720
Depth-first search
always looks like that,

00:39:48.720 --> 00:39:50.630
because once you
start a vertex, you

00:39:50.630 --> 00:39:53.060
keep chugging until you've visited
all the things reachable

00:39:53.060 --> 00:39:54.460
from it.

00:39:54.460 --> 00:39:55.500
Then you finish it.

00:39:55.500 --> 00:39:57.560
You won't finish v zero
before you finish vk,

00:39:57.560 --> 00:40:00.114
because it's part
of the recursion.

00:40:00.114 --> 00:40:01.530
You can't return
at a higher level

00:40:01.530 --> 00:40:04.942
before you return
at the lower levels.
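
NOTE
Editor's sketch, not part of the lecture: a way to see the balanced-parentheses structure by printing an opening mark when a vertex starts and a closing mark when it finishes. It assumes the graph is a dict of adjacency lists. The name dfs_parentheses and the output format are hypothetical.
```python
# Sketch only: records start and finish events of a full DFS as a parenthesized string.
def dfs_parentheses(adj):
    """Return a string such as '(a (b b) a)' describing start and finish order."""
    events = []
    visited = set()
    def visit(u):
        visited.add(u)
        events.append("(" + str(u))      # start of u
        for v in adj[u]:
            if v not in visited:
                visit(v)                 # nested call returns before u can finish
        events.append(str(u) + ")")      # finish of u
    for s in adj:
        if s not in visited:
            visit(s)
    return " ".join(events)
```
Because a recursive call has to return before its caller does, every vertex started inside another vertex's visit also closes inside it, which is the nesting used in the argument above.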

00:40:04.942 --> 00:40:06.400
So we've just argued
that the order

00:40:06.400 --> 00:40:08.025
is like this, because
v zero was first,

00:40:08.025 --> 00:40:11.600
so vk starts after v zero, and
also we're going to finish vk

00:40:11.600 --> 00:40:14.550
before we finish v zero, because
it's reachable, and hasn't

00:40:14.550 --> 00:40:17.000
been visited before.

00:40:17.000 --> 00:40:25.200
So, in here, we
consider vkv zero.

00:40:28.000 --> 00:40:32.070
When we consider that edge,
it will be a back edge.

00:40:34.750 --> 00:40:35.710
Why?

00:40:35.710 --> 00:40:39.640
Because v zero is currently
on the recursion stack,

00:40:39.640 --> 00:40:42.427
and so you will have marked v
zero as currently in process.

00:40:42.427 --> 00:40:44.760
So when you look at that edge,
you see it's a back edge,

00:40:44.760 --> 00:40:47.660
it's an edge to your ancestor.

00:40:47.660 --> 00:40:48.430
That's the proof.

00:40:51.700 --> 00:40:52.790
Any questions about that?

00:40:55.490 --> 00:40:59.460
It's pretty easy once you set
up the starting point, which

00:40:59.460 --> 00:41:01.470
is look at the first
time you visit the cycle,

00:41:01.470 --> 00:41:03.732
then just think about how
you walk around the cycle.

00:41:03.732 --> 00:41:05.940
There's lots of ways you
might walk around the cycle,

00:41:05.940 --> 00:41:08.579
but it's guaranteed you'll
visit vk at some point,

00:41:08.579 --> 00:41:10.870
then you'll look at the edge.
v0 is still in the stack,

00:41:10.870 --> 00:41:12.730
so it's a back edge.

00:41:12.730 --> 00:41:14.575
And so this proves
that having a cycle

00:41:14.575 --> 00:41:16.260
is equivalent to
having a back edge.

00:41:16.260 --> 00:41:18.980
This gives you an easy linear
time algorithm to tell,

00:41:18.980 --> 00:41:20.902
does my graph have a cycle?

00:41:20.902 --> 00:41:22.860
And if it does, it's
actually easy to find one,

00:41:22.860 --> 00:41:26.102
because we find a back edge,
just follow the tree edges,

00:41:26.102 --> 00:41:27.060
and you get your cycle.
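
NOTE
Editor's sketch, not part of the lecture: cycle detection for a directed graph by looking for a back edge, then following parent pointers (tree edges) to recover the cycle. It assumes a dict of adjacency lists. The names find_cycle, parent and on_stack are hypothetical. For very deep graphs the recursion may need to be replaced by an explicit stack.
```python
# Sketch only: directed graphs; an undirected adjacency list would need the parent edge skipped.
def find_cycle(adj):
    """Return a list of vertices forming a directed cycle, or None if adj is acyclic."""
    parent = {}            # DFS tree parent pointers; also marks visited vertices
    on_stack = set()       # vertices currently in process (on the recursion stack)
    def visit(u):
        on_stack.add(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                cycle = visit(v)
                if cycle is not None:
                    return cycle
            elif v in on_stack:            # back edge (u, v): v is an ancestor of u
                cycle = [u]
                while cycle[-1] != v:      # climb tree edges from u up to v
                    cycle.append(parent[cycle[-1]])
                cycle.reverse()            # now reads v, ..., u along tree edges
                return cycle
        on_stack.discard(u)                # u is finished
        return None
    for s in adj:                          # top-level loop over all vertices
        if s not in parent:
            parent[s] = None
            cycle = visit(s)
            if cycle is not None:
                return cycle
    return None
```
Each vertex and edge is handled a constant number of times, matching the linear-time claim; the returned list closes into a cycle via the back edge from its last vertex to its first.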

00:41:29.564 --> 00:41:31.230
So if someone gives
you a graph and says,

00:41:31.230 --> 00:41:34.350
hey, I think this is acyclic,
you can very quickly say,

00:41:34.350 --> 00:41:36.590
no, it's not, here's
a cycle, or say,

00:41:36.590 --> 00:41:40.490
yeah, I agree, no back edges,
I only have tree, forward,

00:41:40.490 --> 00:41:41.611
and cross edges.

00:41:49.150 --> 00:41:50.545
OK, that was application 1.

00:41:56.610 --> 00:41:58.990
Application 2 is
topological sort,

00:41:58.990 --> 00:42:02.790
which we're going to
think about in the setting

00:42:02.790 --> 00:42:04.320
of a problem called
job scheduling.

00:42:07.700 --> 00:42:14.860
So job scheduling, we are
given a directed acyclic graph.

00:42:21.770 --> 00:42:39.090
I want to order the vertices
so that all edges point

00:42:39.090 --> 00:42:46.090
from lower order to higher order.

00:42:52.520 --> 00:42:54.405
Directed acyclic
graph is called a DAG,

00:42:54.405 --> 00:42:59.830
you should know that from 042.

00:42:59.830 --> 00:43:02.790
And maybe I'll
draw one for kicks.

00:43:32.030 --> 00:43:34.760
Now, I've drawn the graph so
all the edges go left to right,

00:43:34.760 --> 00:43:37.110
so you can see that
there's no cycles here,

00:43:37.110 --> 00:43:41.090
but generally you'd run DFS and
you'd detect there's no cycles.

00:43:41.090 --> 00:43:43.170
And now, imagine these
vertices represent

00:43:43.170 --> 00:43:45.746
things you need to do.

00:43:45.746 --> 00:43:49.080
The textbook has a funny example
where you're getting dressed,

00:43:49.080 --> 00:43:50.820
so you have these
constraints that say,

00:43:50.820 --> 00:43:53.579
well, I've got to put my socks
on before I put my shoes on.

00:43:53.579 --> 00:43:55.620
And then I've got to put
my underwear on before I

00:43:55.620 --> 00:43:59.350
put my pants on, and all
these kinds of things.

00:43:59.350 --> 00:44:01.460
You would code that as a
directed acyclic graph.

00:44:01.460 --> 00:44:03.293
You hope there's no
cycles, because then you

00:44:03.293 --> 00:44:05.100
can't get dressed.

00:44:05.100 --> 00:44:06.830
And there's some
things, like, well, I

00:44:06.830 --> 00:44:09.050
could put my glasses on
whenever, although actually I

00:44:09.050 --> 00:44:11.174
should put my glasses on
before I do anything else,

00:44:11.174 --> 00:44:12.730
otherwise there's problems.

00:44:12.730 --> 00:44:14.980
I don't know, you could put
your watch on at any time,

00:44:14.980 --> 00:44:17.110
unless you need to
know what time it is.

00:44:17.110 --> 00:44:20.287
So there's some disconnected
parts, whatever.

00:44:20.287 --> 00:44:21.870
There's some unrelated
things, like, I

00:44:21.870 --> 00:44:24.955
don't care the order between
my shirt and my pants

00:44:24.955 --> 00:44:28.780
or whatever, some things
aren't constrained.

00:44:28.780 --> 00:44:31.760
What you'd like to do is choose
an actual order to do things.

00:44:31.760 --> 00:44:33.275
Say you're a
sequential being, you

00:44:33.275 --> 00:44:35.630
can only do one
thing at a time, so I

00:44:35.630 --> 00:44:37.050
want to compute a total order.

00:44:37.050 --> 00:44:39.510
First I'll do g,
then I'll do a, then

00:44:39.510 --> 00:44:42.900
I can do h, because I've done
both of the predecessors.

00:44:42.900 --> 00:44:45.160
Then I can't do b,
because I haven't done d,

00:44:45.160 --> 00:44:49.040
so maybe I'll do d first, and
then b, and then e, then c,

00:44:49.040 --> 00:44:50.090
then f, then i.

00:44:50.090 --> 00:44:53.180
That would be a valid order,
because all edges point

00:44:53.180 --> 00:44:55.580
from an earlier number
to a later number.

00:44:55.580 --> 00:44:56.930
So that's the goal.

00:44:56.930 --> 00:44:59.300
And these are real job
scheduling problems

00:44:59.300 --> 00:45:01.670
that come up, you'll
see more applications

00:45:01.670 --> 00:45:04.710
in your problem set.

00:45:04.710 --> 00:45:07.199
How do we do this?

00:45:07.199 --> 00:45:08.990
Well, at this point we
have two algorithms,

00:45:08.990 --> 00:45:10.880
and I pretty much
revealed it is DFS.

00:45:10.880 --> 00:45:13.100
DFS will do this.

00:45:13.100 --> 00:45:16.650
It's a topological sort, is
what this algorithm is usually

00:45:16.650 --> 00:45:17.150
called.

00:45:20.010 --> 00:45:23.280
Topological sort because
you're given a graph, which

00:45:23.280 --> 00:45:25.070
you could think
of as a topology.

00:45:25.070 --> 00:45:26.912
You want to sort it,
in a certain sense.

00:45:26.912 --> 00:45:28.370
It's not like
sorting numbers, it's

00:45:28.370 --> 00:45:32.370
sorting vertices in a graph,
so, hence, topological sort.

00:45:32.370 --> 00:45:34.150
That's the name
of the algorithm.

00:45:34.150 --> 00:45:46.250
And it's run DFS, and
output the reverse

00:45:46.250 --> 00:45:55.192
of the finishing
times of vertices.

00:45:55.192 --> 00:45:57.150
So this is another
application where you really

00:45:57.150 --> 00:45:58.983
want to visit all the
vertices in the graph,

00:45:58.983 --> 00:46:05.100
so we use this top level DFS,
so everybody gets visited.

00:46:05.100 --> 00:46:07.350
And there are these
finishing times,

00:46:07.350 --> 00:46:11.470
so every time I finish a vertex,
I could add it to a list.

00:46:11.470 --> 00:46:13.294
Say OK, that one
was finished next,

00:46:13.294 --> 00:46:15.460
then this one is finished,
then this one's finished.

00:46:15.460 --> 00:46:18.320
I take that order
and I reverse it.

00:46:18.320 --> 00:46:21.588
That will be a
topological order.
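
NOTE
Editor's sketch, not part of the lecture: topological sort as described above, appending each vertex to a list when it finishes and reversing the list at the end. It assumes the input is a DAG given as a dict of adjacency lists and does not check for cycles. The name topological_sort and the getting-dressed vertex labels are hypothetical.
```python
# Sketch only: assumes adj is acyclic; on a cyclic graph the output is not a valid order.
def topological_sort(adj):
    """Return the vertices of a DAG so that every edge points from earlier to later."""
    finished = []          # vertices in order of finishing time
    visited = set()
    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)     # u finishes after everything reachable from it
    for s in adj:              # top-level DFS so every vertex gets visited
        if s not in visited:
            visit(s)
    finished.reverse()         # reverse of the finishing order
    return finished
# Usage on a small getting-dressed graph (labels are illustrative only).
dressing = {"socks": ["shoes"], "underwear": ["pants"], "pants": ["shoes"], "shoes": [], "watch": []}
order = topological_sort(dressing)
position = {v: i for i, v in enumerate(order)}
assert all(position[u] < position[v] for u in dressing for v in dressing[u])
```
The final assertion checks the defining property directly: every edge goes from an earlier position to a later one.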

00:46:21.588 --> 00:46:22.900
Why?

00:46:22.900 --> 00:46:24.190
Who knows.

00:46:24.190 --> 00:46:24.880
Let's prove it.

00:46:34.440 --> 00:46:38.610
We've actually done pretty
much the hard work, which

00:46:38.610 --> 00:46:42.560
is to say-- we're assuming
our graph has no cycles,

00:46:42.560 --> 00:46:46.150
so that tells us by
this cycle detection

00:46:46.150 --> 00:46:47.410
that there are no back edges.

00:46:47.410 --> 00:46:49.780
Back edges are kind
of the annoying part.

00:46:49.780 --> 00:46:51.500
Now they don't exist here.

00:46:51.500 --> 00:46:56.970
So all the edges are tree edges,
forward edges, and cross edges,

00:46:56.970 --> 00:47:01.765
and we use that to
prove the theorem.

00:47:05.020 --> 00:47:10.570
So we want to prove that all
the edges point from an earlier

00:47:10.570 --> 00:47:12.170
number to a later number.

00:47:15.320 --> 00:47:17.080
So what that means
is for an edge,

00:47:17.080 --> 00:47:22.830
uv, we want to show that
v finishes before u.

00:47:32.010 --> 00:47:34.750
That's the reverse,
because what we're taking

00:47:34.750 --> 00:47:38.610
is the reverse of
the finishing order.

00:47:38.610 --> 00:47:41.790
So edge uv, I want to make
sure v finishes first,

00:47:41.790 --> 00:47:43.595
so that u will be ordered first.

00:47:45.917 --> 00:47:47.000
Well, there are two cases.

00:47:51.290 --> 00:47:59.010
Case 1 is that u
starts before v. Case 2

00:47:59.010 --> 00:48:01.460
is that he v before u.

00:48:06.690 --> 00:48:08.220
At some point they
start, because we

00:48:08.220 --> 00:48:09.136
visit the whole graph.

00:48:13.160 --> 00:48:16.400
This top loop guarantees that.

00:48:16.400 --> 00:48:21.440
So consider what order we visit
them first, at the beginning,

00:48:21.440 --> 00:48:23.960
and then we'll think
about how they finish.

00:48:23.960 --> 00:48:27.400
Well, this case is kind of
something we've seen before.

00:48:27.400 --> 00:48:31.480
We visit u, we have
not yet visited v,

00:48:31.480 --> 00:48:35.440
but v is reachable from
u, so maybe via this edge,

00:48:35.440 --> 00:48:38.320
or maybe via some other
path, we will eventually

00:48:38.320 --> 00:48:41.190
visit v in the recursion for u.

00:48:41.190 --> 00:48:48.950
So before u finishes,
we will visit v, visit v

00:48:48.950 --> 00:48:53.070
before u finishes.

00:48:53.070 --> 00:48:58.560
That sentence is just
like this sentence,

00:48:58.560 --> 00:48:59.849
so same kind of argument.

00:48:59.849 --> 00:49:01.640
We won't go into detail,
because we already

00:49:01.640 --> 00:49:04.470
did that several times.

00:49:04.470 --> 00:49:07.710
So that means we'll visit v,
we will completely visit v,

00:49:07.710 --> 00:49:10.040
we will finish v
before we finish u

00:49:10.040 --> 00:49:12.100
and that's what we
wanted to prove.

00:49:12.100 --> 00:49:14.580
So that case is good.

00:49:14.580 --> 00:49:18.820
The other case is
that v starts before u.

00:49:18.820 --> 00:49:21.764
Here, you might get
slightly worried.

00:49:21.764 --> 00:49:24.810
So we have an edge, uv,
still, same direction.

00:49:24.810 --> 00:49:29.930
But now we start at v, u
has not yet been visited.

00:49:29.930 --> 00:49:35.646
Well, now we worry
that we visit u.

00:49:35.646 --> 00:49:38.510
If we visit u, we're going to
finish u before we finish v,

00:49:38.510 --> 00:49:40.640
but we want it to be
the other way around.

00:49:40.640 --> 00:49:43.096
Why can't that happen?

00:49:43.096 --> 00:49:44.013
AUDIENCE: [INAUDIBLE].

00:49:44.013 --> 00:49:46.262
PROFESSOR: Because there's
a back edge somewhere here.

00:49:46.262 --> 00:49:48.610
In particular, the graph
would have to be cyclic.

00:49:48.610 --> 00:49:54.830
This is a cycle, so this
can't happen, a contradiction.

00:49:54.830 --> 00:50:00.350
So v will finish before
we visit u at all.

00:50:04.690 --> 00:50:07.830
So v will still finish first,
because we don't even touch u,

00:50:07.830 --> 00:50:10.080
because there's no cycles.

00:50:10.080 --> 00:50:13.280
So that's actually the proof
that topological sort gives you

00:50:13.280 --> 00:50:18.195
a valid job schedule,
and it's kind of-- there

00:50:18.195 --> 00:50:21.200
are even more things
you can do with DFS.

00:50:21.200 --> 00:50:24.520
We'll see some in recitations,
more in the textbook.

00:50:24.520 --> 00:50:28.280
But simple algorithm, can do
a lot of nifty things with it,

00:50:28.280 --> 00:50:30.930
very fast, linear time.