Lecture 11: Weighted Shortest Paths

Flash and JavaScript are required for this feature.

Download the video from Internet Archive.

This lecture discusses weighted graphs and weighted paths. This prepares for the next four lectures, which are on algorithms to find shortest-path weights in weighted graphs.

Instructor: Jason Ku

[SQUEAKING]

[RUSTLING]

[CLICKING]

 

JASON KU: Hi, everyone. Welcome to the 11th lecture of 6.006, our first lecture on weighted shortest paths. Until now, we've only been talking about graphs that-- where we measure distance in terms of the number of edges in a path. Today, we're going to generalize that notion. But I just want to go over what we've talked about in the last two lectures.

In the last two lectures, we've talked about two algorithms, breadth-first search and depth-first search to solve a range of problems. Here's some of the problems that we've been solving. Single-source shortest paths, where distances are measured in number of edges in a path. And we used BFS to solve this problem, starting from a single source, usually a vertex s that we call. And we solve that in linear time. And we solve that in order v plus e. That's what we called linear time for a graph.

For the special case of single-source reachability, here we had to return a shortest path distance for every vertex. And there was, at most, E things reachable from a vertex. So this is the bound we got. But in the special case for single-source reachability, when our output only has to list the vertices that are reachable from me, the number of things reachable in basically a spanning tree of the connected component of my source can almost be of order E. And so for all the little singleton vertices in my graph, I don't really care. So I can get this in order E, but that's kind of a little optimization.

The next thing we did was we talked about connected components. And we didn't just reduce to using a search algorithm like a single-source reachability algorithm like BFS or DFS. We put a for loop around that to explore the entire graph by basically saying, if I've explored one connected component, then I can look at any other vertex I haven't seen and explore the next one. And so that actually with some-- a little analysis, also got linear time, because I'm at most traversing any component of my graph once. That's kind of the idea.

And we can use that using BFS or DFS really, because we're just trying to get a thing that searches an entire connected component. And then this topological sort we did at the end of the last lecture. We used full DFS to give an ordering of the vertices in a DAG-- maybe I'll specify clearly that this is only for a DAG-- where we have an ordering of the vertices so all the edges go forward in that order, for example. And that we also did in linear time.

All right, in this lecture, and in actually the next four lectures, what we're going to do is instead of measuring distance in terms of the number of edges in a path-- so previously, distance equaled number of edges-- we're going to generalize that notion. So instead counting an edge, we're going to count an integer associated with that edge. It's going to be called a weight.

So here's an example of a weighted graph G. And I've labeled, in red, weights for each of these edges. This is a directed graph on eight vertices. And I've got an integer associated with each edge. You'll notice, some of them are positive, some of them are negative. It's OK to be zero as well. It's just any integer edge weight here.

So generally we're going to be-- along with our graph G, we're going to be given a weight function that maps the edges of G to, we're going to say, integers, in this class anyway. In other contexts, in mathematics, you might have these be real numbers. But in this class, we're going to deal with integers.

So each edge, if you have an edge, we're going to say this is the edge weight-- the weight of this edge e, from e. Sometimes, if this edge e is u, v, we might sometimes say the weight from u to v, since we have a simple graph that's unambiguous. All right, so but this is just talking about our notation. So in general, for example, the weight from vertex b to f in this graph is what? Can someone tell me?

AUDIENCE: Minus 4.

JASON KU: Minus 4, right? It's right here. And I'll be consistent with my coloring, because I've got colored chalk today. Minus 4. Happiness.

All right, so why do we care about adding weights to our graph? Well, this comes up a lot in many applications. For example, distances in a road network. if I have a road from here-- so from Mass Ave, front of MIT, to Central Square, we might think of that as one road. Maybe you've got each road is a connection between two intersections in my road network. But an edge, it takes longer to go from, say, Vassar Street to Amherst. That takes a shorter amount of time than it does to go from Memorial Drive across the river to Beacon Street. So we might want to associate a larger distance or a weight associated with that edge.

Latency in a network, for example. Maybe strength of relationships in a social network. And you could imagine that it's possible maybe you're "frenemies" with someone, you don't like them, and so maybe you have a negative weight associated with an edge in a social network. I'm not sure. Maybe not. But there are lots of applications where you might want weights on your edges.

So that comes to the next question of, how do I represent-- how do I give the user, or the algorithm, these weights in my graph? We had a representation for a graph. Our common way to represent a graph was store a set data structure on the vertices mapping to the adjacencies of each vertex, which we stored in what we called an adjacency list, which really could be any data structure. Commonly, it's just an array of the adjacencies. But you could also have that be a set data structure, where you can query in constant time what-- if a particular adjacency exists in that graph.

So there are two common ways to store these weights. One is just, with every adjacency, I'm going to store its weight. Maybe just in a tuple. With each adjacency, also store weight of the edge that it corresponds to, just in any way.

A second way, instead of trying to modify our graph structure that we gave you before, let's just have a dictionary of all the edges mapping to their weights. And we already know how to do that. Just any set data structure-- any separate set data structure mapping edges to their, I guess, weights. Bad notation, but you get the idea.

And it doesn't really matter how we're doing this. The assumption that we're going to rely on here is that, given an edge, given this vertex pair, I can query what the weight of that edge is in constant time. And so if I'm going to do that, I can either store it with maybe a hash table of hash tables-- a hash table mapping the set of vertices to their adjacencies, and then each adjacency list stores its adjacencies in a hash table. And that way, in constant time, I can check what the weight is there.

Or here, I'm just-- I could even have just a single hash table mapping the pair, the edge, the tuple, constant size, to its weight. So either way is fine. We're just going to assume that we can query an edge in constant time-- the weight of an edge in constant time.

OK, so this is that graph example. It's a little busy here. I'm probably going to erase that in just a second. But we're going to move on to what giving these edges weights implies for these problems that we've defined in terms of unweighted graphs. In particular, we are going to be concentrating on single-source shortest paths, again, at least for the next three lectures. We'll generalize that even still in the next lecture-- I mean, in the fourth-- in three lectures from now.

But what we had here was that the distance before in an unweighted graph was the number of edges in the path. Here, we're going to generalize that notion kind of obviously to weighted paths. And the weight of a path, I'm going call it pi. So some weight of path pi is just going to be the sum of the weights in the edges in the path. So edge in the path, I'm going to sum their weights. So that's all the weight of a path means. It's just I'm going to sum all the weights in a path.

So if I took a look at the-- maybe there's a particular path here. The path from a to b to f to g is going to be minus 5, minus 4, 2. It's going to be minus 9 plus 2 is minus 7. So just as an example.

So then what is the shortest path then? Well, kind of obviously among all paths between two vertices, it's going to be one with the minimum weight. Yeah, question.

AUDIENCE: Can I use the same edge more than once?

JASON KU: Can I use the same edge more than once? Right now, you're asking about the distinction in our class which we have between paths and simple paths. So here, a weighted path doesn't really care if we visit an edge more than once. So if an edge appears more than once in pi, we have to count that more than once in the edge weight-- in the weight of the path. OK, great question.

But what we're going to see later on is shortest paths cannot repeat an edge more than once in certain contexts. So we're going to get to the problem there a little later in this lecture. And we're going to solve that in tomorrow's lecture. But if you have-- we're getting a little ahead of ourselves. But when we have negative weights in a graph, it's possible that things go wrong. We're going to get there in about five lines. OK, great.

So a shortest path-- and in this case, I'm going to clarify that this is the weighted shortest path-- is a minimum-- min-i-mum-- sure-- is a minimum weight path from s to t. Nothing too interesting here, but there's actually some subtleties we have to deal with here.

We're going to call-- just like we did with breadth-first search when we talked about shortest paths, we're going to define an expression for what the distance or the shortest path weight is between two vertices. And I'm going to represent that by a delta. A delta from a vertex s to t is going to be-- let's-- I'm going to do the wrong thing first-- the minimum over the weight of all paths for all paths pi from s to t.

OK, so there's a couple things that go wrong here. First thing that goes wrong is the same thing that went wrong with breadth-first search. Anyone remember what could go wrong with breadth-first search for this delta definition?

AUDIENCE: Maybe there's no path.

JASON KU: Maybe there's no path, right. So except if no path. Just by convention, we're setting delta s, t, to equal infinity. But there's one additional problem with weighted shortest paths, and it's a little subtle. It's possible that a finite shortest-- finite length shortest path doesn't exist.

And what do I mean by that? It means I could keep going through edges in my graph and continually getting a shorter path. So if the shortest-- the minimum weight of a path from s to t actually goes through an infinite number of edges, then this isn't really well-defined.

So I'm going to change this minimum here to-- in mathematics we would, just to be specific, we call it an infimum. So if in the case where the weight of a shortest path can approach arbitrarily small, then we'll call this thing minus infinity. So when does that occur? When does that occur? When could we have our shortest path go through lots and lots of vertices?

Well, let's actually take a look at this example here. Can someone tell me what the shortest path is from a to actually any vertex in this graph?

AUDIENCE: b, f, g, c.

JASON KU: Ah, OK. So well, we could look at this path I have to b. Let's just take a look at b. I have a path going from a to b that is minus 5. OK, that's pretty good. That's pretty small. And it seems that if I go around this graph through another way, it might be bigger.

So I go 7 plus 3 plus 8-- that's 15-- minus 1-- that's 14. That's much bigger than minus 5, so it seems like minus 5 should be good, right? Anyone have a problem with this path or a problem with this being the shortest path?

And what your colleague just informed me was that there is something interesting happening here in this graph in particular. We have a cycle from b to f to g to c that has negative total weight back to b. This has minus 4 plus 2 plus 1 minus 1. So that total cycle has a cycle weight of minus 2, this negative weight cycle.

So if I want to get to b, I could go there via this minus 5 weight edge. But every time I circled around this cycle, I incur minus 2 to my path weight. So I just keep going around this cycle over and over and over and over and over and over again, and I don't have any finite length minimum weight path. And in such cases, we just say that delta is minus infinity.

So the problem here is that we could have negative weight cycles-- deserves a capital letter-- Negative weight cycles. It's a problem. In particular, if there exists a path from s to some vertex v that goes through a vertex on a negative weight cycle, then I can take that path to that vertex, circle around the negative weight cycle, and then proceed to v, and I can take that cycle as many times as I want. Then this delta s,v we're going to set to minus infinity.

And in such cases, in our shortest paths algorithm, we don't really care about what the shortest path is. We're not even going to deal with parent pointers here, because there is no finite length shortest path. So I'm just going to kind of throw up my hands in the air and say, you know what, I can't return you a shortest path, but I might want to return to you a negative weight cycle.

If you told me that this thing has bad weight, maybe I want you to tell me what a path is that goes through a negative weight cycle to get back to s. So that's what we're going to talk about next lecture. This lecture, we are not going to talk about that. We are going to talk about weighted shortest paths, though. That's what the remainder of this unit on graphs is really about is weighted shortest paths.

OK, so in weighted shortest paths, we actually know an algorithm already to solve a subset of weighted shortest parts, namely BFS, right? Now, you're like wait, Jason, BFS doesn't solve weighted shortest paths. We didn't even know about weighted graphs then. How does that solve weighted shortest paths?

Well, there's a couple cases where we might be able to reduce to solving shortest paths using BFS. Can anyone think of such a scenario? So let's say, I mean, kind of what we did before was we counted the number of edges. So if we gave a weight of 1 to every edge in my graph, then just that graph, that weighted graph, corresponds to an unweighted graph using the other distance metric. So in that case, BFS just solves our problem.

And in fact, we can generalize further. What if all of our weights were positive, but the same value? If it was all positive and the same value, then we could just divide by that value. Now we have an unweighted graph which we can run BFS, and then multiply shortest path distances by that value later on.

And in fact, there's one further generalization we can make, which is a little bit of a tricky graph transformation problem. But we can also get this linear time algorithm for weighted single-source shortest paths in contexts where the weights aren't that large.

So if I have positive edge weights-- if I have a positive edge weight, let's say-- using my weight color here-- that's, like, weight of 4, that's kind of problematic, because I don't know how to simulate that using an unweighted graph. Or do I? Anyone have an idea of how I could simulate an edge of weight 4 with an unweighted graph? Yeah.

AUDIENCE: Have four edges of weight 1.

JASON KU: Yeah, I can just put four edges of weight 1 in parallel here-- I'm sorry, in series, the opposite of parallel. I can just convert this here into 1, 2, 3, 4 edges. And if I do that for every edge in my graph and we have positive edge weights, then that transformation can hold. Now, that's not necessarily a good transformation to make. Why?

AUDIENCE: The weight might be very big.

JASON KU: Yeah, the weights might be very big compared to the number of vertices and edges in my graph. However, if the sum of all weights in my graph is asymptotically less than v plus e, we can get a linear time algorithm again by reducing to BFS. OK, so that's great.

But in general, that gives us a linear time algorithm in these very special cases. And in general, it's an open problem. We don't know whether we can solve the single-source shortest paths problem in the weighted context for general graphs in linear time. We don't know how to do it.

But what we do know are some algorithms that do pretty well. And that's what we use all the time. But one more special case we're going to go over today is when we have this really nice structure where we have a DAG, a Directed Acyclic Graph, like we were talking about in the last lecture.

For any set of edge weights-- remember, with BFS, we needed to restrict our edge weights to be positive and maybe bounded to get this good running time? For any set of edge weights, if our graph structure is DAG-- it really has nothing to do with the weights-- if the graph structure is a DAG, then we can actually solve this single-source shortest paths problem in linear time, which is pretty awesome.

Now, for general graphs, we're going to show you in the next lecture how to, for any graph-- even with cycles, even with negative weight cycles-- we're going to show you how to solve this single-source shortest paths problem in something like a quadratic running time bound. Now, this isn't the best known, but it's a really practical algorithm and people use it all the time. And we are going to show Bellman-Ford in the context of the DAG algorithm we're going to solve today.

So that's the very general case in terms of restrictions on our graph. But in reality, most problems that come up in applications occur with graphs that have positive edge weights. You can think of a road network. You've got-- or non-negative ones anyway. You're traveling along, and it's not ever useful to go back to where you came from, because you want to make progress to where you're going.

So in the context where you don't have negative weights, you don't have this problem where you have negative weight cycles. We can actually do a lot better by exploiting that property. And we get a bound that's a little bit-- that looks a little bit more like n log n. It's pretty close to linear. You're losing a log factor on the number of vertices. But it's pretty good. This is called Dijkstra, and we'll get to that in two lectures.

OK, so that's the roadmap of what we're going to do for at least the next three lectures. But before we go on to showing you how to solve single-source shortest paths in a DAG using this algorithm that I'm calling DAG relaxation here, I'm going to go back to a thing that we talked about in breadth-first search, where in breadth-first search when we solved single-source shortest paths, we output two things. We output single-source shortest paths, these deltas, for the other definition of distance, the weights-- I mean, not the weights, the distances, the shortest distances. But we also returned parent pointers. We return parent pointers back along paths to the source along shortest paths. We call this the shortest paths tree.

So I'm going to revisit this topic of shortest paths tree-- shortest path trees-- shortest path trees. And in particular, it's kind of going to be annoying to talk about both of these quantities-- distances and parent pointers-- as we go through all three of these algorithms. It's basically going to be bookkeeping to-- distances are actually sufficient for us to reconstruct parent pointers if we need them later.

So what I'm going to show for you-- prove to you now is that, if I give you the shortest path distances for the subset of the graph reachable from s that doesn't go through negative weight cycles, if I'm giving you those distances, I can reconstruct parent pointers along shortest paths in linear time for any graph I might give you if I give you those shortest path distances.

OK, so that's what I'm going to try to show to you now. So here's the algorithm. For weighted-- there's the caveat here I'm going to write down. For weighted shortest paths, only need parent pointers for v with finite shortest path distance-- only finite shortest path distance. We don't care about the infinite ones or the minus infinite ones, just the finite ones.

OK, so here's the algorithm. I can initialize all Pv to equal-- sorry, oh, getting ahead of myself. I'm writing down DAG. Init parent pointer data structure to be empty. At first, I'm not going to sort any parent pointers. But at the beginning, I'm going to set the parent pointer of the source to be none. So that's what we kind of did in breadth-first search as well.

Now, what I've given you is-- I'm trying to show that, given all the shortest path distances, I can construct these parent pointers correctly. So what I'm going to do is, for each vertex u in my graph, where my delta s of u is finite, what am I going to do? I'm going to say, well, let's take a look at all my outgoing neighbors. This is kind of what we do in every graph algorithm.

For each v in the adjacency, the outgoing adjacencies of u, if there is no parent pointer assigned to this v, there's the potential that i-- u-- [CHUCKLES] I, you-- this u, this vertex u, is the parent of v. It's possible. It's some incoming edge to v.

When will it be an incoming edge to v? If v not in P-- I haven't assigned it a parent pointer-- and-- so this means it could be my parent. When is it my parent along the shortest path? Sure.

AUDIENCE: Sum the distance along the edge to the distance of the other.

JASON KU: Yeah, so we have some edge from u to v. It has some weight. If I already know the shortest path distance to u, and I know the shortest path distance to v, if the shortest path distance from s to u-- let's draw a picture here. We've got s, we've got some path here to u, and we know we've got an edge from u to v. If this shortest path distance plus this edge weight is equal to the shortest path distance from s to v, then it better-- I mean, there may be more than one shortest path, but this is certainly a shortest path, so we can assign a parent pointer back to u.

So let's write that condition down. If the shortest path distance from s to v equals the shortest path distance from s to u, and then traversing the edge from u to v, then exists shortest path that uses edge u, v, in particular this one. So set the parent of u-- of v to u.

OK, so this is the algorithm. I'm not going to prove to you that this is correct. But it kind of intuitively makes sense, right? If I have these shortest path distances, you can prove by induction that not only does this parent pointer point to the right place along some shortest path here, but it also does so in linear time, because I'm looping over all the vertices and looping over its outgoing adjacencies once. Same analysis as we had for both BFS and DFS, essentially.

And then, since we can do this, since we can compute parent pointers from these distances, we're going to ignore computing these parent pointers from now on. We're just going to concentrate on computing the distances, because we're going to have to take linear time anyway at least. And all these other things take more time. So we can compute the distances in more time, and then compute the parents after. OK, so that's what we're going to do.

So now, with all that buildup, let's show an algorithm. [CHUCKLES] How do we compute single-source shortest paths in a DAG in linear time? Well, a DAG-- I mean, this is actually a super useful, convenient thing in algorithms in general. DAGs are just nice things. They're kind of ordered in a way. There's this topological sort order that we were talking about before. This is going to play a key role.

There's a really nice structure to DAGs not having cycles, not having to deal with this negative weight cycle problem. You can only go in one direction along this graph. It's a very nice structure to exploit. And so we're going to exploit it.

And here's the idea. DAG relaxation, what it's going to do is it's going to start out with some estimates of what these distances should be. So maintain distance estimates. And now I'm going to try to be careful here about how I draw my Ds. This is a d, this is a delta. This is shortest paths. This is a distance estimate. So that's what I'm going to be using for the rest of this time.

So we're going to maintain these estimates of distance d, which are going to start at initially infinite. I don't know what they are. I don't know what the shortest paths are, but they better be less than infinite or else I don't care. So that's the worst case scenario. It can't be worse than this-- for every vertex. And we're going to maintain the property that estimates upper bound-- that should probably be two words-- upper bound delta s, v-- we're going to maintain that they upper-bound this thing and gradually lower until they're equal.

So this is the idea. We start from an over-estimate, an upper bound on the distance estimate. And then we're repeatedly going to lower that value as we gain more information about the graph, maintaining that we're always upper-bounding the distance. And we're going to keep doing it, keep doing it, keep doing it, until, as we will try to prove to you, these estimates reach, actually reach down, all the way to our shortest path distances.

So when do we lower these things? When do we lower these things? We are going to lower these distance estimates whenever the distance estimates violate what we're going to call the triangle inequality. OK, what is the triangle inequality?

Triangle inequality is actually a pretty intuitive notion. It's basically saying, if I have three points-- thus, triangle-- maybe bigger so I can write a letter in them. It's basically saying that if I have a vertex u, a vertex v, vertex x, for example, the shortest path distance-- the shortest path distance delta of u, v-- that's the shortest distance from u to v-- it can't be bigger then a shortest path from u to v that also goes through x.

Of my paths, I'm now restricting the paths I have to the ones that go through x. The shortest path distance from u to v can't be bigger than restricting paths that go through x and taking that shortest distance, getting the shortest path distance from here and adding it to the shortest path distance here-- delta u, x, delta x, v. That's just a statement of, I'm restricting to a subset of the paths. I can't decrease my minimum distance.

So this is the statement of the triangle inequality, that the shortest path distance from u to v can't be bigger than the shortest path distance from u to x plus the shortest path distance from x to v for any x in my graph that's not u and v. So that's the triangle inequality. Pretty intuitive notion, right?

Why is this useful? OK, well, if I find-- if I find an edge in my graph, if there's an edge u, v, in my graph such that this condition is violated for the estimates that I have-- it obviously can't be violated on my shortest path distances, but if it violates it on the estimates-- u, v, is bigger than u, x-- sorry, u-- how am I going to do this?

I want this to be s. I'm calculating shortest path distances from s and shortest path distances from s to some incoming vertex u plus the edge weight from u to v. All right, so what is this doing?

I have some edge u, v in my graph. Basically, what I've said is that I have some distance estimate to u, but going through-- making a path to v by going through u, and then the edge from u to v is better than my current estimate, my shortest path estimate to v. That's bad, right? That's violating the triangle inequality. These cannot be the right weights. These cannot be the right distances.

So how we're going to do that is lower-- this is what we said, repeatedly lower these distance estimates. I'm going to lower this guy down to equal this thing. In a sense, this constraint was violated. But now we're relaxing that constraint so that this is no longer violated.

So relax is a little weird word here. We're using it for historical reasons. But that's what we mean by when we say relax. This thing is a violated constraint. It's got some pressure to be resolved. And so what we're doing is, to resolve it, we're just setting this guy equal to this, so it at least resolves, locally, that constraint. Now, it may violate the triangle inequality other places now that we've done this change. But at least this constraint is now relaxed and satisfied.

OK, so relax edge by lowering d of s, v, to this thing. That's what we're going to mean by relaxing an edge. And relaxing an edge is what I'll call safe. It's safe to do. What do I mean by relaxation is safe? It means that as I am computing these shortest path distances, I'm going to maintain this property that each one of these estimates-- sorry, these estimates here-- has the property that it's either infinite or it is the weight of some path to v. So that's the thing the-- relaxation is safe.

OK, so each distance estimate, s, v, is weight of some path from s to v or infinite. And this is a pretty easy thing to prove. If I had the invariant that these were all weights of shortest paths, let's try to relax an edge. And we need to show that this property is maintained. Relax edge u, v, OK?

Now, if I relax edge u, v, what do I do? I set this thing-- or, sorry, I set this thing, my shortest path distance to v, to be this thing plus this thing. There's a weight of an edge from u to v.

Now, by my assumption that we're maintaining that this is the weight of some path in my graph, if this thing is bigger, I'm setting it to the weight of some path on my graph to u, plus an edge from u to v, and so this checks out. So assign d of s, v, to weight of some path.

I'm not going to write down all the argument that I just had here. But basically, since this distance estimate was by supposition before the weight of some path to v-- to u, then this is, again, the weight of some path to v. OK, great. So now we're ready to actually go through this algorithm.

So DAG relaxation, from over there, initializes all of our distance estimates to equal infinity, just like we did in BFS. Then, set my distance estimate to myself to be 0. Now, it's possible that this might be minus infinity or negative at some point. But right now, I'm just setting it to 0. And either way, 0 is going to upper-bound the distance to s.

So in particular, at initialization, anything not reachable from s is set correctly. And s itself is set as an upper bound to the shortest path distance. Now we're going to process each vertex u in a topological sort order. So remember, our input to DAG relaxation is a DAG. So this thing has a topological sort order. We're going to process these vertices in that order.

You can imagine we're starting at the top, and all my vertices are-- all my edges are pointed away from me. And I'm just processing each vertex down this topological sort line. And then for each of these vertices, what am I going to do? I'm going to look at all the outgoing neighbors. And if the triangle inequality is violated, I'm going to relax that edge. The algorithm is as simple as that.

For each outgoing neighbor of v-- sorry, of u-- I always get u and v mixed up here. If my shortest path estimate to v violates the triangle inequality for an edge, for an incoming edge, then I'm going to set-- relax u, v, i.e. set d, s,v, equal to d, s,u, plus w, u,v. So that's the algorithm.

So if I were to take a look at this example graph over here, maybe a is my start vertex. I initialize it to--

AUDIENCE: DAG.

JASON KU: This is not a DAG. Thank you. Let's make it a DAG. I claim to you now this is a DAG. In particular, a topological sort order is when there's a path through all the vertices, then there's a unique topological sort order-- a, b, e, f, g, h, d, c. This is a topological order. You can check all the edges.

So I'm going to start by setting the-- actually, let's use e. Let's use shortest paths from e. Why not? Shortest paths from e.

Vertex a actually comes before e in the topological order. So it has no-- I mean, its shortest path distance, when I initialize, I'm going to initialize this to 0. I'm going to initialize this to infinite, infinite, infinite, infinite, all these things to infinite. These are my estimates. These are not quite the shortest paths yet-- distances.

But when I get here, clearly I can't be-- distance to me being infinite can never violate the triangle inequality with something infinite or finite. It doesn't matter, right? So I don't do anything to a. Anything before my source vertex in the topological order can't be visited, because it's before in the topological order. That's kind of the point. There's no path from my source vertex to anything before it in the topological order.

So same with b. b is before a in the topological order. Now, I'm at e, and it's possible we are violating triangle inequality, in particular here. I think the shortest path distance to f is infinite. But actually, if I go to e through this edge with a weight 3, I know that this is violating triangle inequality. So actually, this thing is wrong, and I can set it equal to 3.

Now, that might not be the shortest path distance. But right now it's a better estimate, so we've set it.

Now, I'm moving on. I'm done with this guy. I move to the next vertex in my topological order. And again, I relax edges out of f. OK, so here, looking at 8, 3 plus 8 is better than infinite, so we'll say that that's 11. And 3 plus 2 is better than infinite, so that's 5. Now, I keep going in the topological order. g is the next. 5 plus 1 is 6. OK, so we found a better estimate here. So this 11 is not as good. 6 is better, so we replace it.

Here, we haven't visited before. It's still infinite. So 5 plus minus 2 is 3. This is the next in the topological order. 3 plus 9 is bigger than 6, so that's not a shorter path. 3 plus 4 is certainly smaller than infinite, so set that equal to 7. Then, 7 plus 5 is also bigger than 6. And actually, you can confirm that these are all the shortest path distances from e.

So this algorithm seems to work. Does it actually work? Let's take a look. The claim to you is that at the end of relaxation, this algorithm, we've set-- claim at end, all the estimates equal shortest path distances. The basic idea here is that if I take the k-th vertex in the topological order, assuming that these distances are all equal for the ones before me in the topological order, I can prove by induction.

We can consider a shortest path from s to v, the k-th vertex, and look at the vertex preceding me along the shortest path. That vertex better be before me in the topological order or we're not a DAG. And we've already set its shortest path distance to be equal to the correct thing by induction.

So then when we processed u-- s to u to v-- when we processed u in DAG relaxation here, processed the vertex and looked at all its outgoing adjacencies, we would have relaxed this edge to be no greater than that shortest path distance. So this is correct.

You can also think of it as the DAG relaxation algorithm for each vertex looks all at its incoming neighbors, assuming that their shortest path distances are computed correctly already. Any shortest path distance to me needs to be composed of a shortest path distance to one of my incoming neighbors through an edge to me, so I can just check all of them. That's what DAG relaxation does.

And again, we're looping over every vertex and looking at its adjacencies doing constant work. This, again, takes linear time. OK, so that's shortest paths and a DAG. Next time, we'll look at, for general graphs, how we can use kind of the same technique in an algorithm we call Bellman-Ford. OK, that's it for today.