1 00:00:00,000 --> 00:00:07,728 [SQUEAKING] [RUSTLING] [CLICKING] 2 00:00:13,618 --> 00:00:14,660 JUSTIN SOLOMON: OK, team. 3 00:00:14,660 --> 00:00:18,600 Let's get started for the day. 4 00:00:18,600 --> 00:00:21,350 It's a pleasure to see all of you guys. 5 00:00:21,350 --> 00:00:23,000 In case you don't remember, I'm Justin. 6 00:00:23,000 --> 00:00:26,630 I'm the third instructor of 6.006 that you probably forgot about, 7 00:00:26,630 --> 00:00:29,180 but you're going to see a lot more of me in the graph theory 8 00:00:29,180 --> 00:00:31,640 part of our course because that's the part of algorithms 9 00:00:31,640 --> 00:00:32,280 that I like. 10 00:00:32,280 --> 00:00:35,672 If I were reincarnated as a theoretical computer scientist, 11 00:00:35,672 --> 00:00:37,130 I would probably go into this area. 12 00:00:37,130 --> 00:00:38,420 Hey, guys. 13 00:00:38,420 --> 00:00:40,490 OK. 14 00:00:40,490 --> 00:00:43,190 We have our PhD admit visit days coming up 15 00:00:43,190 --> 00:00:46,460 for the next couple of days. I'm working on my camp counselor 16 00:00:46,460 --> 00:00:47,780 cheerleader voice. 17 00:00:47,780 --> 00:00:51,680 So don't make me wake all of you guys up for the day. 18 00:00:51,680 --> 00:00:53,490 You're not going to like it. 19 00:00:53,490 --> 00:00:57,350 But in any event, so in 6.006 if you look back 20 00:00:57,350 --> 00:00:59,000 at the course outline, we're officially 21 00:00:59,000 --> 00:01:01,610 starting part two of this class. 22 00:01:01,610 --> 00:01:04,155 There are a few corollaries to that fact. 23 00:01:04,155 --> 00:01:06,030 So unless there are any questions about that, 24 00:01:06,030 --> 00:01:09,140 we'll get started with our new unit in 6.006 25 00:01:09,140 --> 00:01:11,870 which is graph theory. 26 00:01:11,870 --> 00:01:15,153 If you're wondering, there's a graph on the screen here.
27 00:01:15,153 --> 00:01:17,570 But of course, we'll fill in a little bit more information 28 00:01:17,570 --> 00:01:19,370 today throughout our lecture. 29 00:01:19,370 --> 00:01:24,680 When I was learning how to teach, which I'm still doing, 30 00:01:24,680 --> 00:01:26,960 actually my PhD advisor told me if you want somebody 31 00:01:26,960 --> 00:01:29,460 to learn something, you have to write it as big as possible. 32 00:01:29,460 --> 00:01:31,880 And so I'm really leaning into that approach 33 00:01:31,880 --> 00:01:34,260 today in our slides. 34 00:01:34,260 --> 00:01:36,770 So in any event, so today we're going 35 00:01:36,770 --> 00:01:38,690 to have our first lecture on graphs 36 00:01:38,690 --> 00:01:42,890 which I think will somewhat be a review for many of you guys. 37 00:01:42,890 --> 00:01:44,337 And if it's not, that's cool too. 38 00:01:44,337 --> 00:01:45,920 Because we'll start from the beginning 39 00:01:45,920 --> 00:01:47,420 and kind of build up all the notions 40 00:01:47,420 --> 00:01:50,300 that we need to understand and process graphs and hopefully 41 00:01:50,300 --> 00:01:53,560 by the end of lecture, have some style 42 00:01:53,560 --> 00:01:56,060 of algorithm for computing the shortest path from one vertex 43 00:01:56,060 --> 00:01:58,140 to all the other ones. 44 00:01:58,140 --> 00:02:01,620 So in case we forgot a little bit of terminology, a graph-- 45 00:02:01,620 --> 00:02:04,370 some people call this network, but sometimes that term is 46 00:02:04,370 --> 00:02:06,620 overloaded with a few different kind of variations 47 00:02:06,620 --> 00:02:07,820 on the theme-- 48 00:02:07,820 --> 00:02:10,070 is a collection of two things. 49 00:02:10,070 --> 00:02:12,650 That's what this parentheses notation means. 50 00:02:12,650 --> 00:02:16,520 There's a set of vertices and a set of edges. 
51 00:02:16,520 --> 00:02:19,670 And the edges, like you can see in the sort of third point 52 00:02:19,670 --> 00:02:23,600 on our screen here, are a subset of v cross v. Now 53 00:02:23,600 --> 00:02:25,220 this is fancy notation for something 54 00:02:25,220 --> 00:02:27,257 really, really simple. 55 00:02:27,257 --> 00:02:28,590 Because what is this telling me? 56 00:02:28,590 --> 00:02:30,870 This is telling me that an edge, like in the picture 57 00:02:30,870 --> 00:02:32,640 that we see on the screen here, 58 00:02:32,640 --> 00:02:36,107 is just something that connects two vertices together. 59 00:02:36,107 --> 00:02:38,690 So if I think of there being a pair of vertices, like the from 60 00:02:38,690 --> 00:02:42,335 and the to, then the set of those pairs is a subset of the cross product 61 00:02:42,335 --> 00:02:44,010 of v and itself. 62 00:02:44,010 --> 00:02:46,502 So hopefully the notation in that third line on the screen 63 00:02:46,502 --> 00:02:47,210 makes some sense. 64 00:02:47,210 --> 00:02:50,900 This is just fancy notation for edges are pairs of vertices. 65 00:02:50,900 --> 00:02:52,790 But of course, inside of that notation 66 00:02:52,790 --> 00:02:57,230 there are two special cases that we care about in this class. 67 00:02:57,230 --> 00:02:59,390 One is when you have a directed graph, 68 00:02:59,390 --> 00:03:01,550 and one is when you have an undirected graph-- 69 00:03:01,550 --> 00:03:03,592 because I said them in opposite order from what's 70 00:03:03,592 --> 00:03:04,930 on the screen.
71 00:03:04,930 --> 00:03:08,355 So in an undirected graph, I guess we still think of an edge 72 00:03:08,355 --> 00:03:10,730 like a pair of vertices, but really I should have notated 73 00:03:10,730 --> 00:03:12,230 this slightly differently-- in fact, 74 00:03:12,230 --> 00:03:14,480 maybe I'll revise it in the slides before they go 75 00:03:14,480 --> 00:03:15,650 into OCW-- 76 00:03:15,650 --> 00:03:20,150 where instead of writing e equals (w, v) with parentheses, 77 00:03:20,150 --> 00:03:24,950 I should in fact write e equals {v, w} with curly braces. 78 00:03:24,950 --> 00:03:27,770 And notice that there's a slight difference between the notation 79 00:03:27,770 --> 00:03:30,020 on the slide and what I've written on the board, which 80 00:03:30,020 --> 00:03:32,480 is the set notation here. 81 00:03:32,480 --> 00:03:35,050 The difference between parentheses and squiggly lines 82 00:03:35,050 --> 00:03:36,580 is that this guy is unordered. 83 00:03:36,580 --> 00:03:38,140 This is a set of things. 84 00:03:38,140 --> 00:03:40,960 And what's on the board is ordered-- 85 00:03:40,960 --> 00:03:43,030 or what's on the screen rather. 86 00:03:43,030 --> 00:03:44,860 And of course, in an undirected edge 87 00:03:44,860 --> 00:03:46,930 there's no such thing as an edge from w 88 00:03:46,930 --> 00:03:49,788 to v being distinct from an edge from v to w. 89 00:03:49,788 --> 00:03:50,830 Those are the same thing. 90 00:03:50,830 --> 00:03:51,622 They're undirected. 91 00:03:51,622 --> 00:03:53,890 It just is a notion of connectivity. 92 00:03:53,890 --> 00:03:56,020 Whereas in a directed graph, now we're 93 00:03:56,020 --> 00:03:58,120 going to use that parenthetical notation 94 00:03:58,120 --> 00:04:01,330 to say that the edge from w to v is 95 00:04:01,330 --> 00:04:03,987 different than the edge from v to w. 96 00:04:03,987 --> 00:04:05,570 That's going to make a big difference.
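As a rough illustration of the ordered-versus-unordered distinction described here, the following is a minimal Python sketch. Using a tuple for a directed edge and a frozenset for an undirected edge is an assumption made for illustration, not something taken from the lecture slides, and the vertex labels are hypothetical.

```python
# Hypothetical vertex labels, used only for illustration.
v, w = "v", "w"

# Directed edge: an ordered pair, so (v, w) and (w, v) are different edges.
directed_vw = (v, w)
directed_wv = (w, v)

# Undirected edge: an unordered set, so {v, w} and {w, v} are the same edge.
undirected_vw = frozenset({v, w})
undirected_wv = frozenset({w, v})

print(directed_vw == directed_wv)      # False
print(undirected_vw == undirected_wv)  # True
```

The frozenset is chosen over a plain set only because it is hashable, so undirected edges can themselves be stored in a set of edges.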
97 00:04:05,570 --> 00:04:09,160 So for example in the graph on the right-- 98 00:04:09,160 --> 00:04:12,470 let's maybe redraw it on the board here. 99 00:04:12,470 --> 00:04:14,140 So we have four vertices. 100 00:04:14,140 --> 00:04:16,540 I drew this last night, and I'm hoping that this example 101 00:04:16,540 --> 00:04:17,260 actually works. 102 00:04:22,029 --> 00:04:26,580 Like that-- can I get from the upper right vertex 103 00:04:26,580 --> 00:04:31,550 to the lower left vertex following edges in this graph? 104 00:04:31,550 --> 00:04:33,500 I heard one person. 105 00:04:33,500 --> 00:04:35,838 Everybody on three-- 1, 2, 3. 106 00:04:35,838 --> 00:04:36,380 AUDIENCE: No. 107 00:04:36,380 --> 00:04:37,463 JUSTIN SOLOMON: No, right. 108 00:04:37,463 --> 00:04:38,780 Because if I wanted to-- 109 00:04:38,780 --> 00:04:41,300 I mean maybe I think of drawing this path here-- 110 00:04:41,300 --> 00:04:44,270 but of course, if I would go from the upper right 111 00:04:44,270 --> 00:04:45,590 to the lower left-- 112 00:04:45,590 --> 00:04:48,550 this is like the ugliest thing I've ever done, I'm so sorry-- 113 00:04:48,550 --> 00:04:50,300 you can notice that the edges are pointing 114 00:04:50,300 --> 00:04:52,410 in the up direction here. 115 00:04:52,410 --> 00:04:54,650 So I'd have to go against the stream of the water, 116 00:04:54,650 --> 00:04:57,242 but that's not allowable in the directed graph case. 117 00:04:57,242 --> 00:04:58,700 Of course, I'm already anticipating 118 00:04:58,700 --> 00:05:01,075 the notion of a path which we haven't really defined yet. 119 00:05:01,075 --> 00:05:02,900 But I think intuitively, that's sort 120 00:05:02,900 --> 00:05:06,015 of the big difference between a directed and undirected graph. 121 00:05:06,015 --> 00:05:08,140 Does that distinction make sense to all of you 122 00:05:08,140 --> 00:05:12,560 or have I managed to lose you in four minutes or less? 123 00:05:12,560 --> 00:05:14,360 Excellent.
124 00:05:14,360 --> 00:05:16,652 So I flipped things a tiny, tiny bit from the course 125 00:05:16,652 --> 00:05:18,110 notes because I figured we'd define 126 00:05:18,110 --> 00:05:19,777 what a graph is first before telling you 127 00:05:19,777 --> 00:05:21,690 what the implications are. 128 00:05:21,690 --> 00:05:23,990 But in any event, I think it's really 129 00:05:23,990 --> 00:05:26,000 not a big stretch of the imagination 130 00:05:26,000 --> 00:05:28,640 to say that graphs are literally everywhere 131 00:05:28,640 --> 00:05:30,320 in our everyday life, right. 132 00:05:30,320 --> 00:05:34,130 Any time that we come up with a network of stuff connected 133 00:05:34,130 --> 00:05:37,040 together, implicitly the right abstraction often 134 00:05:37,040 --> 00:05:40,480 in the back of our heads is to think about a graph. 135 00:05:40,480 --> 00:05:42,230 So some simple examples that I think 136 00:05:42,230 --> 00:05:43,730 would all come to mind for us would 137 00:05:43,730 --> 00:05:45,580 be like computer networks-- 138 00:05:45,580 --> 00:05:49,160 so the nodes or the vertices of your graph in that case, maybe 139 00:05:49,160 --> 00:05:50,960 are computers, and then the edges 140 00:05:50,960 --> 00:05:53,480 are roughly the cables connecting them together 141 00:05:53,480 --> 00:05:57,110 in my very coarse understanding of how networks work-- 142 00:05:57,110 --> 00:05:59,060 or maybe a social network-- 143 00:05:59,060 --> 00:06:02,210 the nodes are people on your social network, 144 00:06:02,210 --> 00:06:04,490 and the edges are friend relationships 145 00:06:04,490 --> 00:06:08,300 or frenemy relationships or whatever. 146 00:06:08,300 --> 00:06:11,330 In fact, I think you could think of both directed and undirected 147 00:06:11,330 --> 00:06:15,220 versions of that particular network.
148 00:06:15,220 --> 00:06:18,110 In road networks, maybe I'm working for Google 149 00:06:18,110 --> 00:06:20,600 and I want to tell you the shortest path 150 00:06:20,600 --> 00:06:22,220 between your house and MIT. 151 00:06:22,220 --> 00:06:24,500 Of course, in order to do that and essentially 152 00:06:24,500 --> 00:06:26,840 behind the scenes, we're solving some version 153 00:06:26,840 --> 00:06:28,640 of computing the shortest path between two 154 00:06:28,640 --> 00:06:30,225 vertices in a graph. 155 00:06:30,225 --> 00:06:31,850 That's a tiny bit of a lie in the sense 156 00:06:31,850 --> 00:06:33,808 that there's a lot of structure in that problem 157 00:06:33,808 --> 00:06:35,930 that we're not going to leverage in this course. 158 00:06:35,930 --> 00:06:38,900 A road network is a very special type of graph, 159 00:06:38,900 --> 00:06:40,670 and if you take an advanced course maybe 160 00:06:40,670 --> 00:06:43,490 you'll say, well, if I know a little more about my graph 161 00:06:43,490 --> 00:06:46,740 I can do better than the general case we'll talk about here. 162 00:06:46,740 --> 00:06:49,610 But the basic algorithms that we'll talk about in 6.006 163 00:06:49,610 --> 00:06:51,073 are certainly relevant in that case 164 00:06:51,073 --> 00:06:52,490 and are really the building blocks 165 00:06:52,490 --> 00:06:54,620 for what goes on in the tools that 166 00:06:54,620 --> 00:06:57,260 are used every day on your phone when you open Google Maps 167 00:06:57,260 --> 00:06:59,630 or Waze or whatever. 168 00:06:59,630 --> 00:07:01,310 And of course, there are many others. 169 00:07:01,310 --> 00:07:03,835 So for instance, an example that maybe 170 00:07:03,835 --> 00:07:05,210 is a little bit more subtle would 171 00:07:05,210 --> 00:07:07,430 be the set of states and transitions 172 00:07:07,430 --> 00:07:08,820 of a discrete thing. 173 00:07:08,820 --> 00:07:11,420 So think about like a Rubik's cube.
174 00:07:11,420 --> 00:07:13,580 So I could make a graph where the node 175 00:07:13,580 --> 00:07:15,860 is every configuration of my Rubik's cube, 176 00:07:15,860 --> 00:07:17,300 like every rotation. 177 00:07:17,300 --> 00:07:19,490 And then the edges are like can I 178 00:07:19,490 --> 00:07:21,500 get from this configuration to that one 179 00:07:21,500 --> 00:07:24,170 by making one simple transition, like one flip. 180 00:07:24,170 --> 00:07:26,420 I don't actually know the terminology in Rubik's cube, 181 00:07:26,420 --> 00:07:29,870 I have a feeling you do, for one rotation. 182 00:07:29,870 --> 00:07:32,478 Twist-- thank you. 183 00:07:32,478 --> 00:07:34,270 And of course, there are many other places. 184 00:07:34,270 --> 00:07:38,250 So for instance, in my day job here at MIT 185 00:07:38,250 --> 00:07:40,130 I typically teach computer graphics courses. 186 00:07:40,130 --> 00:07:42,170 And actually graph theory, although we 187 00:07:42,170 --> 00:07:45,170 talk about it very differently, appears in that world 188 00:07:45,170 --> 00:07:46,460 constantly. 189 00:07:46,460 --> 00:07:48,860 Because of course, what's sitting behind any 3D model 190 00:07:48,860 --> 00:07:52,725 on your computer is a giant network of triangles. 191 00:07:52,725 --> 00:07:54,350 This is called a triangulated surface-- 192 00:07:54,350 --> 00:07:56,210 like this torus we see here. 193 00:07:56,210 --> 00:07:57,827 And this is nothing more than a graph. 194 00:07:57,827 --> 00:07:59,660 And in fact, if you squint at the algorithms 195 00:07:59,660 --> 00:08:01,460 that we cover in 6.838, 196 00:08:01,460 --> 00:08:05,373 you'll see they're roughly just graph algorithms in disguise. 197 00:08:05,373 --> 00:08:07,790 In fact, if you take my graduate course one thing we'll do 198 00:08:07,790 --> 00:08:10,165 is we'll spend a lot of time doing differential geometry.
199 00:08:10,165 --> 00:08:12,103 And then we'll step back 10 feet and notice 200 00:08:12,103 --> 00:08:13,520 that exactly the algorithms we are 201 00:08:13,520 --> 00:08:17,570 using for computing curvature and bendiness on triangle 202 00:08:17,570 --> 00:08:19,940 meshes, just looks like a graph algorithm 203 00:08:19,940 --> 00:08:22,540 and can be applied to networks in exactly the same way. 204 00:08:22,540 --> 00:08:25,430 So it will be a nice kind of fun reveal there. 205 00:08:25,430 --> 00:08:28,940 And of course, there's one last kind of fun application. 206 00:08:28,940 --> 00:08:31,160 I actually was gone the last couple of days 207 00:08:31,160 --> 00:08:33,320 at a conference on political redistricting. 208 00:08:33,320 --> 00:08:35,570 And the funny thing is most of the discussion 209 00:08:35,570 --> 00:08:38,510 at that conference was about graph theory. 210 00:08:38,510 --> 00:08:43,340 And the reason for that is sort of a theme that shows up 211 00:08:43,340 --> 00:08:46,642 a lot in geometry world, which is if I take my state, 212 00:08:46,642 --> 00:08:48,350 in this case I think these are the voting 213 00:08:48,350 --> 00:08:51,500 precincts in some state or another, 214 00:08:51,500 --> 00:08:53,810 and I look at adjacency relationships, 215 00:08:53,810 --> 00:08:56,570 then maybe I put a node for every precinct and an edge 216 00:08:56,570 --> 00:08:59,760 any time that they share a boundary with one another. 217 00:08:59,760 --> 00:09:01,010 Well now I have a network. 218 00:09:01,010 --> 00:09:03,980 And maybe a region on my graph is like a connected piece 219 00:09:03,980 --> 00:09:06,150 of this network. 220 00:09:06,150 --> 00:09:08,780 And so anyway, this is one of these examples where 221 00:09:08,780 --> 00:09:11,210 graphs and networks and connectivity and so on just 222 00:09:11,210 --> 00:09:12,920 show up literally no matter where you go. 223 00:09:12,920 --> 00:09:14,877 They're totally unavoidable. 
224 00:09:14,877 --> 00:09:17,210 And so that's what we'll be spending quite a bit of time 225 00:09:17,210 --> 00:09:20,250 on in this class here. 226 00:09:20,250 --> 00:09:23,000 Now you could easily take, I would argue, 227 00:09:23,000 --> 00:09:27,620 at least three entire courses on graph theory here at MIT, 228 00:09:27,620 --> 00:09:30,560 and you could easily build a PhD dissertation doing nothing 229 00:09:30,560 --> 00:09:32,780 more than really simple problems on graphs. 230 00:09:32,780 --> 00:09:37,580 Of course, in this class we're limited to just a few 231 00:09:37,580 --> 00:09:38,455 lectures out of many. 232 00:09:38,455 --> 00:09:40,372 So we're going to make a couple of assumptions 233 00:09:40,372 --> 00:09:43,250 both on the problems we want to solve, as well as in the graphs 234 00:09:43,250 --> 00:09:44,760 that we care about. 235 00:09:44,760 --> 00:09:47,210 So in particular, one simplifying assumption, 236 00:09:47,210 --> 00:09:48,767 which actually really doesn't affect 237 00:09:48,767 --> 00:09:50,600 many of the algorithms we'll talk about here 238 00:09:50,600 --> 00:09:52,790 but it's worth noting explicitly, 239 00:09:52,790 --> 00:09:55,520 is that we'll mostly be thinking about a particular type 240 00:09:55,520 --> 00:09:58,810 of graph which is a simple graph. 241 00:09:58,810 --> 00:10:01,690 And in fact often, depending on how you define your graph, 242 00:10:01,690 --> 00:10:03,970 you kind of accidentally made your graph simple 243 00:10:03,970 --> 00:10:05,200 even if you didn't intend to. 244 00:10:05,200 --> 00:10:07,570 So for example, we wrote that our edges 245 00:10:07,570 --> 00:10:11,110 were a subset of v cross v. Which maybe means that I can't 246 00:10:11,110 --> 00:10:15,460 have multiple edges that sort of traverse 247 00:10:15,460 --> 00:10:17,110 the same pair of vertices. 248 00:10:17,110 --> 00:10:22,190 So let's see an example of a graph that is not simple. 
249 00:10:22,190 --> 00:10:23,920 So sorry, I haven't actually defined it. 250 00:10:23,920 --> 00:10:27,440 A simple graph is a graph that has no self loops, 251 00:10:27,440 --> 00:10:30,210 so it can't go from a vertex to itself, 252 00:10:30,210 --> 00:10:31,860 and every edge is distinct. 253 00:10:31,860 --> 00:10:36,610 So let's make the most non simple graph we can think of. 254 00:10:36,610 --> 00:10:40,600 Like let's say I have two vertices. 255 00:10:40,600 --> 00:10:42,700 So maybe if I want to make my-- 256 00:10:42,700 --> 00:10:45,400 so there's a graph, right, two vertices and one edge. 257 00:10:45,400 --> 00:10:47,050 This is simple. 258 00:10:47,050 --> 00:10:49,420 If I wanted to be annoying and make it not simple, 259 00:10:49,420 --> 00:10:53,130 maybe I take this edge and I'd duplicate it three times just 260 00:10:53,130 --> 00:10:54,430 for fun. 261 00:10:54,430 --> 00:10:57,090 That violates the second assumption. 262 00:10:57,090 --> 00:10:59,590 And now to make it even worse, I could violate the first one 263 00:10:59,590 --> 00:11:03,050 by adding an edge that goes from this vertex to itself. 264 00:11:03,050 --> 00:11:06,530 This is not simple. 265 00:11:09,828 --> 00:11:12,370 I don't know what you would call it actually-- general graph, 266 00:11:12,370 --> 00:11:14,930 I guess-- complicated because it's not simple. 267 00:11:14,930 --> 00:11:19,297 I don't know-- a multigraph. 268 00:11:19,297 --> 00:11:20,380 I always thought of that-- 269 00:11:20,380 --> 00:11:22,220 anyway, it doesn't matter. 270 00:11:22,220 --> 00:11:24,100 But in any event, in this class we're 271 00:11:24,100 --> 00:11:26,745 not going to worry about this particular circumstance. 272 00:11:26,745 --> 00:11:28,870 And of course, in many applications of graph theory 273 00:11:28,870 --> 00:11:31,612 that's a totally reasonable assumption to make. 274 00:11:31,612 --> 00:11:33,820 Any questions about the definition of a simple graph? 
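The two conditions in the definition of a simple graph (no self-loops, no repeated edges) can be checked mechanically. Here is a minimal Python sketch, assuming the graph is given as a list of vertex pairs; the function name `is_simple` and the `directed` flag are hypothetical names chosen for this illustration, not part of the course materials.

```python
def is_simple(edges, directed=True):
    """Return True if the edge list has no self-loops and no repeated edges."""
    seen = set()
    for u, v in edges:
        if u == v:
            # Self-loop: violates the first condition.
            return False
        # In the undirected case (u, v) and (v, u) are the same edge.
        key = (u, v) if directed else frozenset((u, v))
        if key in seen:
            # Repeated edge: violates the second condition.
            return False
        seen.add(key)
    return True

# The board examples: one edge, the same edge copied three times, and a self-loop.
print(is_simple([("a", "b")], directed=False))                          # True
print(is_simple([("a", "b"), ("a", "b"), ("a", "b")], directed=False))  # False
print(is_simple([("a", "b"), ("a", "a")], directed=False))              # False
```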
275 00:11:36,440 --> 00:11:39,092 OK, so from now on whenever we think about a graph, 276 00:11:39,092 --> 00:11:40,550 in the back of our head we're going 277 00:11:40,550 --> 00:11:42,260 to think of our graph as simple. 278 00:11:42,260 --> 00:11:44,870 There's one nice property that a simple graph has, 279 00:11:44,870 --> 00:11:47,120 which I've written in really big text on the screen 280 00:11:47,120 --> 00:11:52,250 here, which is that the edges are big O of v squared. 281 00:11:52,250 --> 00:11:55,790 And in fact, let's expand that formula just a tiny bit. 282 00:11:55,790 --> 00:11:57,860 So there's sort of two cases, one 283 00:11:57,860 --> 00:11:59,870 is when my graph is undirected, the other 284 00:11:59,870 --> 00:12:03,780 is when my graph is directed. 285 00:12:03,780 --> 00:12:05,270 So if I have a directed graph-- 286 00:12:12,220 --> 00:12:15,570 well, let's think about how many edges we could possibly have. 287 00:12:15,570 --> 00:12:21,580 So an edge is a pair of a from vertex and a to vertex, 288 00:12:21,580 --> 00:12:23,230 and I can never repeat it twice. 289 00:12:23,230 --> 00:12:27,190 That's sort of like the second assumption here. 290 00:12:27,190 --> 00:12:29,570 So in particular, what do we know? 291 00:12:29,570 --> 00:12:31,390 We know that mod E-- or rather the number 292 00:12:31,390 --> 00:12:36,740 of edges in our graph is upper bounded by what? 293 00:12:36,740 --> 00:12:40,600 Well, I can take any pair of vertices-- 294 00:12:43,958 --> 00:12:46,000 like that-- but I have to be a little bit careful 295 00:12:46,000 --> 00:12:48,250 because my graph is directed-- 296 00:12:48,250 --> 00:12:50,450 so from and to matter here. 
297 00:12:50,450 --> 00:12:52,900 So this v choose 2 is saying that I 298 00:12:52,900 --> 00:12:55,630 can take any unique pair of vertices, 299 00:12:55,630 --> 00:12:57,940 but I have to put a factor of 2 in front of it 300 00:12:57,940 --> 00:13:00,190 to account for the fact that the source and the target 301 00:13:00,190 --> 00:13:01,900 can be flipped back and forth. 302 00:13:01,900 --> 00:13:03,825 And of course, if I want to do the undirected case, 303 00:13:03,825 --> 00:13:05,200 I don't have to worry about that. 304 00:13:10,520 --> 00:13:16,540 We'll get that mod E here is less than or equal to just mod v choose 2. 305 00:13:16,540 --> 00:13:19,030 So this is just a fancy way of saying 306 00:13:19,030 --> 00:13:21,400 that every edge consists of two vertices, 307 00:13:21,400 --> 00:13:22,648 and my edges are unique. 308 00:13:22,648 --> 00:13:24,190 And one thing, if you just write down 309 00:13:24,190 --> 00:13:27,160 the formula for our binomial coefficient here, 310 00:13:27,160 --> 00:13:30,440 we'll see that both of these things-- 311 00:13:30,440 --> 00:13:33,850 oops, oh, yeah, sorry-- 312 00:13:37,330 --> 00:13:39,920 are at worst mod v squared here. 313 00:13:43,370 --> 00:13:45,723 And that makes perfect sense, because of course, an edge 314 00:13:45,723 --> 00:13:46,640 is a pair of vertices. 315 00:13:46,640 --> 00:13:48,470 You kind of expect there to be a square there. 316 00:13:48,470 --> 00:13:48,970 Yes? 317 00:13:48,970 --> 00:13:53,942 AUDIENCE: [INAUDIBLE] 318 00:13:53,942 --> 00:13:55,150 JUSTIN SOLOMON: I'm so sorry. 319 00:13:55,150 --> 00:13:56,177 I can't hear you. 320 00:13:56,177 --> 00:13:57,760 AUDIENCE: So the 2 comes from the fact 321 00:13:57,760 --> 00:13:58,932 that it's from the source. 322 00:13:58,932 --> 00:14:00,140 JUSTIN SOLOMON: Yes, exactly.
323 00:14:00,140 --> 00:14:03,400 So the 2 for the directed case comes from the fact 324 00:14:03,400 --> 00:14:07,900 that an edge from v to w is different than an edge from w 325 00:14:07,900 --> 00:14:11,105 to v. So remember that the binomial coefficient here, it's 326 00:14:11,105 --> 00:14:12,730 just counting the number of ways that I 327 00:14:12,730 --> 00:14:15,280 can choose two things from a set of size v, 328 00:14:15,280 --> 00:14:16,930 but it doesn't care about ordering. 329 00:14:16,930 --> 00:14:20,760 Yeah, any other questions? 330 00:14:20,760 --> 00:14:21,808 Fabulous. 331 00:14:21,808 --> 00:14:23,100 So why is this going to matter? 332 00:14:23,100 --> 00:14:24,600 Well, these sorts of bounds, I mean 333 00:14:24,600 --> 00:14:26,470 they might seem a little bit obvious to you, 334 00:14:26,470 --> 00:14:28,428 but we're going to write down graph algorithms. 335 00:14:28,428 --> 00:14:30,480 And now when we analyze the runtime and the space 336 00:14:30,480 --> 00:14:33,368 that they take, we now have sort of two different numbers 337 00:14:33,368 --> 00:14:34,410 that we can think about-- 338 00:14:34,410 --> 00:14:37,740 the number of vertices and the number of edges. 339 00:14:37,740 --> 00:14:39,240 And so for instance, if I write down 340 00:14:39,240 --> 00:14:42,300 an algorithm whose runtime is proportional to the number 341 00:14:42,300 --> 00:14:44,940 of edges, maybe then generically I 342 00:14:44,940 --> 00:14:46,513 could also think of the algorithm 343 00:14:46,513 --> 00:14:48,930 as having a runtime that looks like the number of vertices 344 00:14:48,930 --> 00:14:51,030 squared unless I put some additional assumptions 345 00:14:51,030 --> 00:14:52,100 on my graph. 346 00:14:52,100 --> 00:14:53,850 And so there's some connection between all 347 00:14:53,850 --> 00:14:55,470 of these different constants, and it's 348 00:14:55,470 --> 00:14:57,000 useful to kind of keep that at the back of our head.
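To make the two edge bounds concrete, here is a small Python sketch that evaluates them for a few vertex counts. The helper name `max_edges` is hypothetical; the only library call used is `math.comb` from the Python standard library, which computes the binomial coefficient.

```python
from math import comb

def max_edges(num_vertices, directed):
    """Largest possible number of edges in a simple graph on num_vertices vertices."""
    pairs = comb(num_vertices, 2)  # unordered pairs of distinct vertices
    # A directed graph may use each pair in both directions, hence the factor of 2.
    return 2 * pairs if directed else pairs

for n in (2, 4, 10):
    undirected_bound = max_edges(n, directed=False)
    directed_bound = max_edges(n, directed=True)
    # Both bounds stay at or below n squared, consistent with |E| = O(|V|^2).
    print(n, undirected_bound, directed_bound, n ** 2)
```

Writing out the binomial coefficient gives |V|(|V| - 1)/2 for the undirected bound and |V|(|V| - 1) for the directed one, which is where the quadratic growth comes from.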
349 00:14:57,000 --> 00:14:59,417 Sometimes you'll see a bunch of different expressions 350 00:14:59,417 --> 00:15:02,220 that really are encoding roughly the same relationship just 351 00:15:02,220 --> 00:15:04,130 in different language. 352 00:15:04,130 --> 00:15:06,840 Of course, that also means that we can be more precise. 353 00:15:06,840 --> 00:15:10,470 So sometimes a graph is what we would call sparse. 354 00:15:10,470 --> 00:15:12,290 So in my universe, almost all graphs 355 00:15:12,290 --> 00:15:15,530 that I deal with in my day to day life are extremely sparse. 356 00:15:15,530 --> 00:15:18,530 This is a consequence of topology. 357 00:15:18,530 --> 00:15:20,750 And because of that, an algorithm 358 00:15:20,750 --> 00:15:22,790 that scales like the number of edges 359 00:15:22,790 --> 00:15:25,215 might actually be much preferable to an algorithm 360 00:15:25,215 --> 00:15:26,840 that scales like the number of vertices 361 00:15:26,840 --> 00:15:28,910 squared because, in practice, often 362 00:15:28,910 --> 00:15:32,330 there are fewer edges than like every single possible pair. 363 00:15:32,330 --> 00:15:34,560 And so that's the sort of reason why 364 00:15:34,560 --> 00:15:37,190 we're thinking about these numbers. 365 00:15:37,190 --> 00:15:40,550 OK, so let's continue making boring definitions here. 366 00:15:40,550 --> 00:15:42,560 So some other ones that we should think about 367 00:15:42,560 --> 00:15:46,490 involve the topology or the connectivity of our graph-- 368 00:15:46,490 --> 00:15:49,280 in particular, thinking about neighbors. 369 00:15:49,280 --> 00:15:52,400 So in general we kind of think about pairs of vertices 370 00:15:52,400 --> 00:15:54,090 as being neighbors of one another 371 00:15:54,090 --> 00:15:55,940 if there's an edge between them.
372 00:15:55,940 --> 00:15:57,710 We have to be a little bit careful 373 00:15:57,710 --> 00:16:00,440 because, of course, when we have a directed edge, 374 00:16:00,440 --> 00:16:02,900 we have to be careful who's on the sort of giving 375 00:16:02,900 --> 00:16:05,480 and the receiving end of this neighbor relationship. 376 00:16:05,480 --> 00:16:08,730 Yeah, so let's draw a really, really simple graph. 377 00:16:08,730 --> 00:16:13,130 So here's vertex 0, here's vertex 1, here's vertex 2. 378 00:16:13,130 --> 00:16:17,570 And maybe we'll have an edge going up, an edge going down, 379 00:16:17,570 --> 00:16:21,470 and then a cycle here. 380 00:16:21,470 --> 00:16:22,700 OK. 381 00:16:22,700 --> 00:16:24,980 Now we can define a lot of different notions 382 00:16:24,980 --> 00:16:27,290 of neighbors-- like the outgoing neighbor set, 383 00:16:27,290 --> 00:16:29,120 the incoming neighbor set. 384 00:16:29,120 --> 00:16:30,800 And the basic idea here is that we 385 00:16:30,800 --> 00:16:34,040 want to keep track of edges going from a vertex and edges 386 00:16:34,040 --> 00:16:35,630 pointing into one. 387 00:16:35,630 --> 00:16:37,910 Yeah, so for instance, the outgoing neighbor 388 00:16:37,910 --> 00:16:42,200 set, which we're going to notate as Adj plus here-- 389 00:16:44,900 --> 00:16:49,037 what is the outgoing neighbor set of node 0 here? 390 00:16:49,037 --> 00:16:50,870 Well, if we take a look, notice that there's 391 00:16:50,870 --> 00:16:54,230 one edge going out of node 0, and it points to node 2. 392 00:16:54,230 --> 00:16:57,230 So of course, this is a set which 393 00:16:57,230 --> 00:16:59,390 just contains one other node. 394 00:16:59,390 --> 00:17:05,960 And similarly, the incoming neighbor set of node 0, well 395 00:17:05,960 --> 00:17:09,540 notice that there's one incoming neighbor from vertex 1, 396 00:17:09,540 --> 00:17:14,180 so that is a set like that. 
397 00:17:14,180 --> 00:17:16,550 Now of course, in an undirected graph 398 00:17:16,550 --> 00:17:18,650 the sort of distinction between these two things 399 00:17:18,650 --> 00:17:19,323 doesn't matter. 400 00:17:19,323 --> 00:17:20,990 So if you look at our final bullet point 401 00:17:20,990 --> 00:17:22,910 here, often in the undirected case 402 00:17:22,910 --> 00:17:25,020 we just drop that plus or minus superscript 403 00:17:25,020 --> 00:17:28,069 because it sort of doesn't matter. 404 00:17:28,069 --> 00:17:31,190 In any event, there's one additional piece of terminology 405 00:17:31,190 --> 00:17:33,890 that matters quite a bit, which is degree. 406 00:17:33,890 --> 00:17:35,780 And this is nothing more than just counting 407 00:17:35,780 --> 00:17:37,010 the size of this set. 408 00:17:37,010 --> 00:17:38,910 So the out degree is the number of edges 409 00:17:38,910 --> 00:17:40,830 that point out of a vertex. 410 00:17:40,830 --> 00:17:44,150 And the in degree is the number of edges that point in. 411 00:17:44,150 --> 00:17:47,475 So notice in this case, both of those numbers are 1. 412 00:17:47,475 --> 00:17:49,100 Let's see an example where they're not. 413 00:17:49,100 --> 00:17:53,160 So in node 1, notice there's two edges that come out. 414 00:17:53,160 --> 00:17:56,030 So the out degree of node 1 is 2. 415 00:17:56,030 --> 00:18:01,420 There's one edge that points in, so the in degree is 1. 416 00:18:01,420 --> 00:18:03,177 OK, so often why are we going to do this? 417 00:18:03,177 --> 00:18:05,260 Well, we're going to get a lot of graph algorithms 418 00:18:05,260 --> 00:18:07,660 that like have a FOR loop over the neighbors of a given 419 00:18:07,660 --> 00:18:08,750 vertex. 420 00:18:08,750 --> 00:18:12,320 And then this degree number is going to come into play. 421 00:18:12,320 --> 00:18:15,950 It's worth bounding these things just a tiny bit. 
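The neighbor sets and degrees for the three-vertex example can be computed directly from an edge list. The sketch below is only an illustration: the edge set is inferred from the spoken description (the out-neighbor and in-neighbor sets of node 0 and the degrees of node 1), since the board drawing itself is not in the transcript, and the names `adj_out` and `adj_in` are hypothetical stand-ins for Adj+ and Adj-.

```python
# Edge set inferred from the spoken description of the three-vertex example;
# treat it as an assumption, since the board drawing is not in the transcript.
edges = [(1, 0), (0, 2), (1, 2), (2, 1)]
vertices = [0, 1, 2]

adj_out = {x: set() for x in vertices}  # Adj+(x): vertices that x points to
adj_in = {x: set() for x in vertices}   # Adj-(x): vertices that point to x
for u, v in edges:
    adj_out[u].add(v)
    adj_in[v].add(u)

# Node 0: one outgoing neighbor (2) and one incoming neighbor (1).
print(adj_out[0], adj_in[0])            # {2} {1}

# Node 1: out-degree 2 and in-degree 1, the sizes of the two sets.
print(len(adj_out[1]), len(adj_in[1]))  # 2 1
```

A loop of the form "for every neighbor of v" then costs time proportional to the degree of v, which is why the degree counts matter for runtime analysis.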
422 00:18:15,950 --> 00:18:20,050 So in particular, one thing we could think about-- 423 00:18:23,610 --> 00:18:25,840 I write too big, and I'm going to run out of space 424 00:18:25,840 --> 00:18:27,940 really quickly here-- 425 00:18:27,940 --> 00:18:29,030 is the following. 426 00:18:29,030 --> 00:18:33,250 So let's take a look at all of the possible nodes 427 00:18:33,250 --> 00:18:35,578 inside of my graph, and now let's 428 00:18:35,578 --> 00:18:36,745 sum up all of their degrees. 429 00:18:42,460 --> 00:18:43,760 So I'm going to-- 430 00:18:43,760 --> 00:18:45,890 let's see, if I look at this graph 431 00:18:45,890 --> 00:18:49,940 notice there's three edges adjacent to this vertex 432 00:18:49,940 --> 00:18:52,910 here, three edges adjacent to that one, two adjacent to this. 433 00:18:52,910 --> 00:18:54,760 So we sum them all together. 434 00:18:54,760 --> 00:18:56,830 So it's just a convenient bound to have around-- 435 00:18:56,830 --> 00:18:58,160 is to sum these things, because we're 436 00:18:58,160 --> 00:19:00,620 going to have algorithms that look like for every vertex, 437 00:19:00,620 --> 00:19:02,400 for every neighbor do something. 438 00:19:02,400 --> 00:19:04,850 So we might as well know roughly how much time 439 00:19:04,850 --> 00:19:06,920 that's going to take. 440 00:19:06,920 --> 00:19:09,120 Let's think about this. 441 00:19:09,120 --> 00:19:10,475 So what do we know? 442 00:19:10,475 --> 00:19:14,030 In an undirected graph every edge 443 00:19:14,030 --> 00:19:15,380 is adjacent to two vertices. 444 00:19:18,320 --> 00:19:20,540 So if we think about how we account 445 00:19:20,540 --> 00:19:23,440 for degree what do we know? 446 00:19:23,440 --> 00:19:25,060 Well we know that an edge sort of 447 00:19:25,060 --> 00:19:29,470 contributes to the degree of two different vertices. 
448 00:19:29,470 --> 00:19:35,260 So if we think about it carefully here, 449 00:19:35,260 --> 00:19:38,710 what we're going to see is that if our graph is undirected-- 450 00:19:43,070 --> 00:19:49,020 oh, sorry-- is that right, wait I'm backward again. 451 00:19:49,020 --> 00:19:53,060 So if I have a graph with two vertices and one edge 452 00:19:53,060 --> 00:19:58,500 and it is undirected, notice that the number of edges 453 00:19:58,500 --> 00:19:59,670 here is 1. 454 00:19:59,670 --> 00:20:02,580 What is the sum of the degrees? 455 00:20:02,580 --> 00:20:04,890 Well, it's 1 plus 1 equals 2. 456 00:20:04,890 --> 00:20:14,990 Yeah, so the sum is 2E if my graph is undirected, 457 00:20:14,990 --> 00:20:20,900 and E if my graph is directed, if what I'm counting 458 00:20:20,900 --> 00:20:24,776 is just the outgoing degree. 459 00:20:24,776 --> 00:20:25,790 Does that make sense? 460 00:20:25,790 --> 00:20:27,832 I think I managed to totally botch that sentence, 461 00:20:27,832 --> 00:20:29,280 so maybe let's try that again. 462 00:20:29,280 --> 00:20:32,180 So if I'm counting just the number of edges pointing out 463 00:20:32,180 --> 00:20:34,250 of every vertex, and I count that 464 00:20:34,250 --> 00:20:36,560 over all of the possible vertices, 465 00:20:36,560 --> 00:20:37,890 then there's two cases-- 466 00:20:37,890 --> 00:20:40,220 one is directed and one is undirected. 467 00:20:40,220 --> 00:20:42,590 So in the undirected case you get a 2 here 468 00:20:42,590 --> 00:20:44,840 because essentially every edge is simultaneously 469 00:20:44,840 --> 00:20:46,400 incoming and outgoing. 470 00:20:46,400 --> 00:20:48,140 Whereas you get a 1 as the coefficient 471 00:20:48,140 --> 00:20:49,070 in the directed case. 472 00:20:49,070 --> 00:20:50,210 Does that make sense? 473 00:20:50,210 --> 00:20:53,030 I'm sorry I botched that for a second. 474 00:20:53,030 --> 00:20:54,950 OK, excellent.
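The degree-sum bound described here can be checked on a small example. This is only a sketch: the two edge lists and the function names are made up for illustration and are not the graph on the slide. It counts degrees directly from the edges and compares the total with 2|E| (undirected) and |E| (directed, out-degree only).

```python
# Hypothetical example graphs, chosen only for illustration.
undirected_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
directed_edges = [(0, 1), (0, 2), (1, 2), (2, 1)]


def degree_sum_undirected(edges):
    """Sum of degrees when each edge {u, v} touches both endpoints."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree.values())


def out_degree_sum_directed(edges):
    """Sum of out-degrees when each edge (u, v) leaves only u."""
    out_degree = {}
    for u, _ in edges:
        out_degree[u] = out_degree.get(u, 0) + 1
    return sum(out_degree.values())


# Every undirected edge is counted twice, every directed edge once.
assert degree_sum_undirected(undirected_edges) == 2 * len(undirected_edges)
assert out_degree_sum_directed(directed_edges) == len(directed_edges)
```

This is why a loop of the form "for every vertex, for every neighbor" does work proportional to the number of edges overall.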
475 00:20:54,950 --> 00:20:58,298 OK, that's going to be a useful bound for us later on. 476 00:20:58,298 --> 00:20:59,840 Now when we think about graphs, of course, 477 00:20:59,840 --> 00:21:01,010 we just spent the last couple of weeks 478 00:21:01,010 --> 00:21:02,305 thinking about data structures. 479 00:21:02,305 --> 00:21:04,680 We should think about how to store a graph on a computer, 480 00:21:04,680 --> 00:21:06,720 and there's many different options. 481 00:21:06,720 --> 00:21:10,040 In fact, really one thing that you can do is sort of pair-- 482 00:21:10,040 --> 00:21:12,227 just like when we talked about sets. 483 00:21:12,227 --> 00:21:14,060 There are many different ways to store sets. 484 00:21:14,060 --> 00:21:16,340 And one way to think about it was depending 485 00:21:16,340 --> 00:21:18,860 on how we're going to interact with that set we might choose 486 00:21:18,860 --> 00:21:21,830 one data structure or another to sort of optimize 487 00:21:21,830 --> 00:21:24,590 the types of interactions we're going to have with that set 488 00:21:24,590 --> 00:21:26,360 and make them as fast as possible. 489 00:21:26,360 --> 00:21:29,690 This is exactly the same story for a graph. 490 00:21:29,690 --> 00:21:33,470 So for instance, the world's dumbest representation 491 00:21:33,470 --> 00:21:36,530 of a graph would be to just have a long list of edges. 492 00:21:36,530 --> 00:21:39,980 So for example, for this graph up here maybe 493 00:21:39,980 --> 00:21:44,930 I have 0, 1, that's an edge, and then 0, 494 00:21:44,930 --> 00:21:50,930 2, that's another edge, and then 1, 2, and then 2, 1. 495 00:21:50,930 --> 00:21:52,340 There's a big list of edges. 496 00:21:52,340 --> 00:21:53,480 It's really a set. 497 00:21:53,480 --> 00:21:55,312 I don't care about the order. 498 00:21:55,312 --> 00:21:56,718 AUDIENCE: The first one's 1, 2. 499 00:21:56,718 --> 00:21:58,260 JUSTIN SOLOMON: 1-- oh, you're right. 500 00:21:58,260 --> 00:21:58,860 I'm sorry.
501 00:21:58,860 --> 00:22:00,630 Yeah, the edge points up-- 502 00:22:00,630 --> 00:22:01,875 thanks Erik, or not Erik-- 503 00:22:01,875 --> 00:22:02,375 Jason. 504 00:22:05,110 --> 00:22:08,408 OK, so let's say that I have a graph algorithm, 505 00:22:08,408 --> 00:22:09,950 and I'm going to have to do something 506 00:22:09,950 --> 00:22:13,100 like check whether there exists an edge from v 507 00:22:13,100 --> 00:22:14,270 to w a bunch of times. 508 00:22:16,790 --> 00:22:19,380 How long is that going to take in this data structure? 509 00:22:19,380 --> 00:22:22,850 Well, if I just have like a hot mess disorganized list of edges 510 00:22:22,850 --> 00:22:25,250 and I want to know does there exist an edge from v to w, 511 00:22:25,250 --> 00:22:28,605 all I can do is write a FOR loop that just goes along this 512 00:22:28,605 --> 00:22:30,480 and says, like, is this the edge I'm looking for? 513 00:22:30,480 --> 00:22:30,890 No. 514 00:22:30,890 --> 00:22:31,880 Is that the edge I'm looking for? 515 00:22:31,880 --> 00:22:32,645 No. 516 00:22:32,645 --> 00:22:35,360 So every single time I want to find an edge, 517 00:22:35,360 --> 00:22:37,933 it's going to take me time proportional 518 00:22:37,933 --> 00:22:39,350 to the number of edges of my graph 519 00:22:39,350 --> 00:22:42,020 which could potentially be up to v squared. 520 00:22:42,020 --> 00:22:44,600 Yeah, so this is not such a great representation 521 00:22:44,600 --> 00:22:47,352 of a graph on my computer. 522 00:22:47,352 --> 00:22:49,310 So if we're thinking back to our data structure 523 00:22:49,310 --> 00:22:52,130 we may say, OK, so an edge list is probably not the way to go. 524 00:22:52,130 --> 00:22:53,780 Although notice that the way we notated 525 00:22:53,780 --> 00:22:56,750 what is a graph kind of looks like an edge list. 526 00:22:56,750 --> 00:22:59,870 But in any event, the more common thing to do 527 00:22:59,870 --> 00:23:03,750 is to store something like an adjacency list.
528 00:23:03,750 --> 00:23:08,360 So the basic idea of an adjacency list 529 00:23:08,360 --> 00:23:18,710 is that what I'm going to store is a set that maps a vertex 530 00:23:18,710 --> 00:23:24,465 u to everything adjacent to u. 531 00:23:24,465 --> 00:23:25,840 So in other words, I'm just going 532 00:23:25,840 --> 00:23:27,465 to keep track of all the outgoing edges 533 00:23:27,465 --> 00:23:29,830 from every vertex. 534 00:23:29,830 --> 00:23:33,642 And now I have to decide, how am I going to store this object. 535 00:23:33,642 --> 00:23:35,100 And oftentimes, we're going to have 536 00:23:35,100 --> 00:23:37,350 to answer queries like does there exist an edge from v 537 00:23:37,350 --> 00:23:38,522 to w. 538 00:23:38,522 --> 00:23:39,480 So how could I do that? 539 00:23:39,480 --> 00:23:43,140 First, I would look up v, and I get back 540 00:23:43,140 --> 00:23:44,958 sort of a list or a set of all the things 541 00:23:44,958 --> 00:23:47,250 that are adjacent to v. And I have to query that thing. 542 00:23:47,250 --> 00:23:49,110 And I want it to be pretty fast. 543 00:23:49,110 --> 00:23:54,990 So maybe what I do is I store the set 544 00:23:54,990 --> 00:24:00,990 of adjacent stuff as something like a direct access 545 00:24:00,990 --> 00:24:07,500 array or a hash table to make that look up fast. 546 00:24:14,870 --> 00:24:17,123 So for example, how long would it take-- 547 00:24:17,123 --> 00:24:19,040 I see, I'm going to finish the sentence here-- 548 00:24:19,040 --> 00:24:20,457 how long would it take me to check 549 00:24:20,457 --> 00:24:22,213 if an edge existed in my graph? 550 00:24:22,213 --> 00:24:23,130 Well, what would I do? 551 00:24:23,130 --> 00:24:24,680 I would first pull out this object, 552 00:24:24,680 --> 00:24:26,720 and then I'd look inside of here. 
553 00:24:26,720 --> 00:24:28,490 So if I stored this as a hash table, 554 00:24:28,490 --> 00:24:31,100 then the expected time I would have order one look up, 555 00:24:31,100 --> 00:24:32,690 because this is order one and then 556 00:24:32,690 --> 00:24:34,890 you have another order one look up there. 557 00:24:34,890 --> 00:24:38,580 So we went from v squared to one with one simple trick. 558 00:24:38,580 --> 00:24:39,327 Yes? 559 00:24:39,327 --> 00:24:42,223 AUDIENCE: Does it matter what direction [INAUDIBLE] 560 00:24:42,223 --> 00:24:43,890 JUSTIN SOLOMON: That's a great question. 561 00:24:43,890 --> 00:24:45,307 So this is a design decision here. 562 00:24:45,307 --> 00:24:48,110 I'm sorry, in my head I think a lot about undirected graphs, 563 00:24:48,110 --> 00:24:49,818 and I'm going to make this mistake a lot. 564 00:24:49,818 --> 00:24:51,260 And I'm glad that you caught me. 565 00:24:51,260 --> 00:24:52,968 There's a totally reasonable thing to do, 566 00:24:52,968 --> 00:24:55,670 which is maybe just to keep track of the outgoing edges 567 00:24:55,670 --> 00:24:56,687 for every vertex. 568 00:24:56,687 --> 00:24:57,770 This is a design decision. 569 00:24:57,770 --> 00:24:59,690 For an algorithm maybe I want to keep 570 00:24:59,690 --> 00:25:00,973 track of the incoming edges. 571 00:25:00,973 --> 00:25:02,390 Whatever, I just have to make sure 572 00:25:02,390 --> 00:25:05,580 that it aligns with what I want to do with my graph later. 573 00:25:05,580 --> 00:25:06,990 Excellent point. 574 00:25:06,990 --> 00:25:10,520 Sorry, as a geometry person we rarely encounter directed 575 00:25:10,520 --> 00:25:11,540 graphs. 576 00:25:11,540 --> 00:25:13,700 But it's important to keep remembering 577 00:25:13,700 --> 00:25:16,870 that not everybody works on the same problems that I do. 
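To make the comparison concrete, here is a minimal sketch of the two representations discussed so far: a plain edge list with a linear scan, and an adjacency structure that maps each vertex to a hash set of its outgoing neighbors. The function names are hypothetical, and storing outgoing edges (rather than incoming) is the design decision assumed here.

```python
def has_edge_in_edge_list(edges, v, w):
    """Linear scan: time proportional to the number of edges."""
    for a, b in edges:
        if a == v and b == w:
            return True
    return False


def build_adjacency(vertices, edges):
    """Map each vertex u to the set of vertices that u points to."""
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)  # keep only outgoing edges (a design decision)
    return adj


def has_edge(adj, v, w):
    """Two hash lookups: expected constant time."""
    return w in adj[v]


# Small directed example with made-up vertex labels.
edges = [(0, 1), (0, 2), (1, 2), (2, 1)]
adj = build_adjacency(range(3), edges)
assert has_edge(adj, 2, 1) and has_edge_in_edge_list(edges, 2, 1)
assert not has_edge(adj, 1, 0) and not has_edge_in_edge_list(edges, 1, 0)
```

For an undirected graph under this sketch, each edge would be added in both directions, which is where the factor of 2 in the degree sum shows up again.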
578 00:25:16,870 --> 00:25:20,630 OK, now if I wanted to be totally extreme about it-- 579 00:25:20,630 --> 00:25:22,683 as just a third example of representation, 580 00:25:22,683 --> 00:25:24,100 which actually, in some sense, you 581 00:25:24,100 --> 00:25:25,725 could think of like an adjacency list-- 582 00:25:25,725 --> 00:25:28,340 would be an adjacency matrix where now I just keep 583 00:25:28,340 --> 00:25:32,090 a giant v by v array of like does this exist, 584 00:25:32,090 --> 00:25:33,780 does that edge exist. 585 00:25:33,780 --> 00:25:38,670 Now it's really, really easy to check if an edge exists. 586 00:25:38,670 --> 00:25:41,045 But now let's say that I make a graph algorithm that's 587 00:25:41,045 --> 00:25:42,420 going to have a FOR loop over all 588 00:25:42,420 --> 00:25:45,470 the neighbors of some vertex. 589 00:25:45,470 --> 00:25:49,900 So here, if I wanted to loop over all the neighbors of u, 590 00:25:49,900 --> 00:25:51,400 I could do that in time proportional 591 00:25:51,400 --> 00:25:54,300 to the number of neighbors of u. 592 00:25:54,300 --> 00:25:56,130 But if I just have a big adjacency matrix, 593 00:25:56,130 --> 00:25:59,760 just a bunch of binary values-- like for every pair of vertices 594 00:25:59,760 --> 00:26:02,610 are these vertices adjacent-- yea or nay. 595 00:26:02,610 --> 00:26:04,720 If I want to iterate over all my neighbors, 596 00:26:04,720 --> 00:26:07,110 now I have to iterate over all the vertices 597 00:26:07,110 --> 00:26:10,132 and check is that number one and then do something. 598 00:26:10,132 --> 00:26:12,090 So actually that can incur some additional time 599 00:26:12,090 --> 00:26:13,200 and additional space. 600 00:26:13,200 --> 00:26:16,450 Does that make sense? 601 00:26:16,450 --> 00:26:18,930 So in any event, that's a sort of a lazy man's graph 602 00:26:18,930 --> 00:26:19,680 representation.
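A small sketch of the adjacency matrix trade-off, assuming vertices are labeled 0 through n-1 (the helper names are hypothetical). The edge check is a single array access, but listing the neighbors of one vertex scans an entire row of length |V|, however few neighbors there are, and the matrix itself takes |V| squared space.

```python
def build_matrix(n, edges):
    """n-by-n table of booleans: matrix[u][v] is True iff edge (u, v) exists."""
    matrix = [[False] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = True
    return matrix


def neighbors_from_matrix(matrix, u):
    """Scans all n entries of row u, even if u has few neighbors."""
    return [v for v, present in enumerate(matrix[u]) if present]


edges = [(0, 1), (0, 2), (1, 2), (2, 1)]
matrix = build_matrix(3, edges)
assert matrix[0][2] is True       # constant-time edge check
assert neighbors_from_matrix(matrix, 2) == [1]
```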
603 00:26:19,680 --> 00:26:22,260 I use it a lot when I'm coding because adjacency matrices are 604 00:26:22,260 --> 00:26:23,160 easy to work with. 605 00:26:23,160 --> 00:26:25,407 But it does incur a lot of additional space, 606 00:26:25,407 --> 00:26:27,240 and it's not always the most efficient thing 607 00:26:27,240 --> 00:26:29,460 even if you have the space because iterating over 608 00:26:29,460 --> 00:26:32,700 neighbors, it actually can take quite a bit of time. 609 00:26:32,700 --> 00:26:35,250 OK, so the real point of our lecture today 610 00:26:35,250 --> 00:26:37,980 is to start introducing sort of the canonical problem 611 00:26:37,980 --> 00:26:39,900 that we all worry about on graphs 612 00:26:39,900 --> 00:26:43,383 which is computing paths, in particular shortest paths. 613 00:26:43,383 --> 00:26:45,300 So the first thing we should do is, of course, 614 00:26:45,300 --> 00:26:48,580 define what a path is on a graph. 615 00:26:48,580 --> 00:26:51,570 So we're going to talk about our graph like a road network. 616 00:26:51,570 --> 00:26:54,660 Let's think of maybe every node here as an intersection. 617 00:26:54,660 --> 00:26:56,970 So this is roughly Kendall Square. 618 00:26:56,970 --> 00:26:59,520 See it's a square. 619 00:26:59,520 --> 00:27:04,050 But in any event, let's say that I want to find-- 620 00:27:04,050 --> 00:27:05,850 maybe a question one would be does 621 00:27:05,850 --> 00:27:08,970 there exist a way to get from vertex 1 to vertex 3. 622 00:27:08,970 --> 00:27:10,380 And then a better question to ask 623 00:27:10,380 --> 00:27:12,870 would be does there exist a short way to get from vertex 1 624 00:27:12,870 --> 00:27:13,647 to vertex 3. 625 00:27:13,647 --> 00:27:15,480 Then of course, the first thing I have to do 626 00:27:15,480 --> 00:27:16,440 is to define my enemy. 627 00:27:16,440 --> 00:27:19,380 I have to define what I'm looking for, which is a path.
628 00:27:19,380 --> 00:27:23,130 So a path is nothing more than a sequence of vertices in a graph 629 00:27:23,130 --> 00:27:26,280 where every pair of adjacent vertices in that sequence 630 00:27:26,280 --> 00:27:27,560 is an edge. 631 00:27:27,560 --> 00:27:29,310 I think this all aligns with our intuition 632 00:27:29,310 --> 00:27:31,360 of what a path is in a graph. 633 00:27:31,360 --> 00:27:35,370 So for instance, here's a path p equals v1, v2, v3. 634 00:27:35,370 --> 00:27:38,310 So notice that there's an edge from v1 to v2 635 00:27:38,310 --> 00:27:40,020 and also an edge from v2 to v3. 636 00:27:40,020 --> 00:27:45,600 So it satisfies the assumptions set forth in our definition. 637 00:27:45,600 --> 00:27:47,370 What would not be a path in our graph-- 638 00:27:47,370 --> 00:27:51,900 would be like v1 comma v3, because there's no edge there. 639 00:27:51,900 --> 00:27:55,110 OK, so if we talk about paths, then there's 640 00:27:55,110 --> 00:27:58,407 a very natural notion which is the length. 641 00:27:58,407 --> 00:28:00,990 Length, I guess you could think of like the number of vertices 642 00:28:00,990 --> 00:28:03,870 in your path minus 1, or the number of edges 643 00:28:03,870 --> 00:28:05,100 that your path traverses. 644 00:28:05,100 --> 00:28:06,960 Those are the same thing. 645 00:28:06,960 --> 00:28:10,485 So for instance, the length of the path p here is 2. 646 00:28:10,485 --> 00:28:12,360 Does everybody see that? 647 00:28:12,360 --> 00:28:14,970 A very common coding bug that I encounter a lot 648 00:28:14,970 --> 00:28:18,750 is adding 1 to that number by accident. 649 00:28:18,750 --> 00:28:21,310 Because of course, there's one more vertex in your path 650 00:28:21,310 --> 00:28:23,690 than there are edges. 651 00:28:23,690 --> 00:28:25,730 OK, and there are many different-- 652 00:28:25,730 --> 00:28:28,512 there could be potentially more than one path 653 00:28:28,512 --> 00:28:29,720 between any pair of vertices. 
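The path definition and the length convention can be written down directly. This sketch assumes the graph is stored as a dictionary from each vertex to the set of its outgoing neighbors; the vertex labels mirror the v1, v2, v3 example, and the function names are made up.

```python
def is_path(adj, vertices):
    """True iff every consecutive pair in the sequence is an edge."""
    if not vertices:
        return False
    return all(b in adj[a] for a, b in zip(vertices, vertices[1:]))


def path_length(vertices):
    """Number of edges traversed: one fewer than the number of vertices."""
    return len(vertices) - 1


# Assumed example: edges v1 -> v2 and v2 -> v3, and no edge v1 -> v3.
adj = {"v1": {"v2"}, "v2": {"v3"}, "v3": set()}
p = ["v1", "v2", "v3"]
assert is_path(adj, p)
assert not is_path(adj, ["v1", "v3"])
assert path_length(p) == 2   # three vertices, two edges
```

Returning the number of vertices instead of `len(vertices) - 1` is exactly the off-by-one bug mentioned above.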
654 00:28:29,720 --> 00:28:33,470 So let's say that I have an undirected graph that 655 00:28:33,470 --> 00:28:35,100 looks like the following. 656 00:28:35,100 --> 00:28:37,880 So it's just a square plus a diagonal. 657 00:28:37,880 --> 00:28:39,365 So here are nodes. 658 00:28:42,110 --> 00:28:44,600 So then a perfectly valid path from the lower left 659 00:28:44,600 --> 00:28:47,758 to the upper right would be to go one over and one up, 660 00:28:47,758 --> 00:28:49,550 but of course, there's a more efficient way 661 00:28:49,550 --> 00:28:51,717 to get from the lower left to the upper right, which 662 00:28:51,717 --> 00:28:54,667 is to go across the diagonal. 663 00:28:54,667 --> 00:28:56,500 And so when we talk about the shortest path, 664 00:28:56,500 --> 00:28:57,875 it's nothing more than the length 665 00:28:57,875 --> 00:28:59,920 of the path that has the fewest number of edges 666 00:28:59,920 --> 00:29:05,290 or vertices between any pair of vertices in my graph. 667 00:29:05,290 --> 00:29:06,460 OK, so this is our enemy. 668 00:29:06,460 --> 00:29:07,502 This is what we're after. 669 00:29:07,502 --> 00:29:11,690 It's computing the shortest path between vertices in a graph. 670 00:29:11,690 --> 00:29:14,260 And this is the thing that we'll be talking about quite 671 00:29:14,260 --> 00:29:15,135 a bit in this course. 672 00:29:15,135 --> 00:29:17,135 Because of course, it's a very practical matter. 673 00:29:17,135 --> 00:29:19,180 Like when I want to solve routing problems, 674 00:29:19,180 --> 00:29:20,900 I want to move packets out of my network, 675 00:29:20,900 --> 00:29:23,350 I'd prefer not to-- well, unless I'm doing Tor-- 676 00:29:23,350 --> 00:29:27,040 I would prefer them not to hit too many computers in between. 677 00:29:27,040 --> 00:29:28,990 Then maybe I want to compute a shortest path. 678 00:29:28,990 --> 00:29:33,910 Or on a surface maybe I want to move information 679 00:29:33,910 --> 00:29:36,190 in a way that's not too far away.
680 00:29:36,190 --> 00:29:38,140 But of course, there's sort of many variations 681 00:29:38,140 --> 00:29:40,990 on that theme when we talk about shortest path 682 00:29:40,990 --> 00:29:42,870 or even just existence of a path. 683 00:29:42,870 --> 00:29:44,980 So these are three sort of model problems 684 00:29:44,980 --> 00:29:47,480 that we might solve on a graph. 685 00:29:47,480 --> 00:29:50,090 So the first one, which in this course 686 00:29:50,090 --> 00:29:52,360 we're calling the single pair reachability, 687 00:29:52,360 --> 00:29:54,310 would be the idea that I take two vertices s 688 00:29:54,310 --> 00:29:58,720 and t on my graph g, and I ask you does there 689 00:29:58,720 --> 00:30:01,794 exist a path between s and t. 690 00:30:01,794 --> 00:30:04,010 So what would be the sort of extreme example 691 00:30:04,010 --> 00:30:07,670 where this problem may not always 692 00:30:07,670 --> 00:30:09,265 give back the answer yes? 693 00:30:09,265 --> 00:30:11,390 Somehow in our head, I think we think of all graphs 694 00:30:11,390 --> 00:30:13,250 as being connected. 695 00:30:13,250 --> 00:30:15,840 But a perfectly valid graph the way we've defined it 696 00:30:15,840 --> 00:30:19,713 would be like 10 vertices and no edges. 697 00:30:19,713 --> 00:30:21,380 This function would be very easy to code 698 00:30:21,380 --> 00:30:23,930 if that were the only graph you ever cared about. 699 00:30:23,930 --> 00:30:26,560 But in any event, the existence of a path 700 00:30:26,560 --> 00:30:28,310 is already a query that takes a little bit 701 00:30:28,310 --> 00:30:29,310 of algorithmic thinking. 702 00:30:29,310 --> 00:30:32,950 We haven't figured out how to do that yet. 703 00:30:32,950 --> 00:30:35,450 Now another problem we can solve would be the shortest path.
704 00:30:35,450 --> 00:30:37,670 Given a graph and two vertices, we 705 00:30:37,670 --> 00:30:41,150 might say, well, how far apart are these vertices of my graph 706 00:30:41,150 --> 00:30:43,910 if I want to use the shortest possible distance from one 707 00:30:43,910 --> 00:30:45,650 to the other. 708 00:30:45,650 --> 00:30:49,250 Notice that I can use the second problem to solve the first one. 709 00:30:49,250 --> 00:30:51,200 Because what's the length of the shortest 710 00:30:51,200 --> 00:30:55,600 path between two vertices that don't have a path between them? 711 00:30:55,600 --> 00:30:57,700 Infinity or a shrug-- that's actually 712 00:30:57,700 --> 00:30:58,720 a totally valid answer. 713 00:30:58,720 --> 00:31:00,730 Yeah, that's right. 714 00:31:00,730 --> 00:31:03,760 So how could I implement the reachability code? 715 00:31:03,760 --> 00:31:05,680 Well, I could call my shortest path code, 716 00:31:05,680 --> 00:31:07,030 and if it gives me infinity, 717 00:31:07,030 --> 00:31:08,990 then I return no, it's not reachable. 718 00:31:08,990 --> 00:31:12,250 And if it gives me not infinity, I return yes. 719 00:31:12,250 --> 00:31:14,917 So remember that a key idea in an algorithms class 720 00:31:14,917 --> 00:31:16,000 is this idea of reduction. 721 00:31:16,000 --> 00:31:19,190 That I can use one function to solve another. 722 00:31:19,190 --> 00:31:21,260 So in this case, if we can solve shortest path, 723 00:31:21,260 --> 00:31:23,890 then we can certainly solve the reachability problem 724 00:31:23,890 --> 00:31:26,620 by calling that piece of code. 725 00:31:26,620 --> 00:31:30,130 And then finally we could talk about single source shortest 726 00:31:30,130 --> 00:31:30,730 path.
727 00:31:30,730 --> 00:31:35,220 So notice now that there's only one input node here s-- 728 00:31:35,220 --> 00:31:37,020 so what this problem is saying is give me 729 00:31:37,020 --> 00:31:38,790 the length of the shortest path from s 730 00:31:38,790 --> 00:31:42,493 to every single other vertex in my graph. 731 00:31:42,493 --> 00:31:43,410 Does that make sense? 732 00:31:43,410 --> 00:31:46,230 Like maybe I return a big array with all the information, 733 00:31:46,230 --> 00:31:49,690 every single shortest distance. 734 00:31:49,690 --> 00:31:52,650 So can we solve single pair shortest path using 735 00:31:52,650 --> 00:31:56,120 single source shortest path? 736 00:31:56,120 --> 00:31:57,110 Absolutely. 737 00:31:57,110 --> 00:32:01,580 I could take s in my single pair shortest path problem, 738 00:32:01,580 --> 00:32:04,935 compute the shortest path from s to literally everything else, 739 00:32:04,935 --> 00:32:07,560 and then throw away all of that information except the shortest 740 00:32:07,560 --> 00:32:10,050 path to t, and now I'm good. 741 00:32:10,050 --> 00:32:13,350 Now I haven't justified that this is the fastest possible 742 00:32:13,350 --> 00:32:14,760 way to solve that second problem, 743 00:32:14,760 --> 00:32:17,370 but at least it shows that if I can solve problem three 744 00:32:17,370 --> 00:32:19,572 I can also solve problem two. 745 00:32:19,572 --> 00:32:21,780 If I can solve problem two I can also solve problem one. 746 00:32:21,780 --> 00:32:24,030 So in today's lecture, we're just 747 00:32:24,030 --> 00:32:25,603 going to worry about problem three. 748 00:32:25,603 --> 00:32:27,270 In other words, these things are sort of 749 00:32:27,270 --> 00:32:31,390 listed in increasing order of their difficulty. 750 00:32:31,390 --> 00:32:34,810 OK, so in order to think about the single source shortest path 751 00:32:34,810 --> 00:32:37,480 problem, we're going to make one additional construction.
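The chain of reductions (problem three solves problem two, which solves problem one) can be sketched as three functions. An important assumption: no algorithm for single source shortest paths has been given at this point in the lecture, so `single_source_lengths` below is a stand-in (a simple breadth-first traversal) included only so the sketch runs. All function names and the example graph are hypothetical.

```python
import math
from collections import deque


def single_source_lengths(adj, s):
    """Problem 3 (stand-in implementation, an assumption of this sketch):
    shortest-path length from s to every vertex, math.inf if unreachable."""
    dist = {v: math.inf for v in adj}
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == math.inf:      # first visit gives the shortest length
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def single_pair_length(adj, s, t):
    """Problem 2 via problem 3: compute everything, keep only the entry for t."""
    return single_source_lengths(adj, s)[t]


def reachable(adj, s, t):
    """Problem 1 via problem 2: reachable iff the length is not infinity."""
    return single_pair_length(adj, s, t) != math.inf


# Made-up undirected graph (each edge listed in both directions); g is isolated.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "e"},
    "d": {"c", "f"}, "e": {"c", "f"}, "f": {"d", "e"}, "g": set(),
}
assert single_pair_length(adj, "a", "f") == 4
assert reachable(adj, "a", "f") and not reachable(adj, "a", "g")
```

As noted above, nothing here claims the reduction is the fastest way to answer a single-pair query; it only shows that solving problem three is enough.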
752 00:32:37,480 --> 00:32:41,530 And this is an idea called the shortest path tree. 753 00:32:41,530 --> 00:32:44,290 I got lazy drawing PowerPoint slides at 2:00 AM yesterday, 754 00:32:44,290 --> 00:32:48,230 and instead thought I'd draw a picture on the board. 755 00:32:48,230 --> 00:32:52,940 So let's draw a graph. 756 00:32:52,940 --> 00:32:56,140 So here we have a, b-- 757 00:32:56,140 --> 00:32:58,030 I'm going to use letters instead of numbers 758 00:32:58,030 --> 00:33:01,330 to refer to nodes from now on because I don't want 759 00:33:01,330 --> 00:33:03,130 to confuse the length of the shortest path 760 00:33:03,130 --> 00:33:06,440 with the index of my node. 761 00:33:06,440 --> 00:33:08,320 So here's a, b, c-- 762 00:33:08,320 --> 00:33:10,320 I'm going to match my notes here-- 763 00:33:10,320 --> 00:33:14,725 d, e, f. 764 00:33:17,310 --> 00:33:19,815 Here's a graph-- again undirected 765 00:33:19,815 --> 00:33:22,440 because your instructor likes to think about undirected graphs. 766 00:33:22,440 --> 00:33:24,232 But I know I'm going to get feedback that I 767 00:33:24,232 --> 00:33:26,110 shouldn't have done that later. 768 00:33:26,110 --> 00:33:27,570 But in any event, let's say that I 769 00:33:27,570 --> 00:33:29,820 want to compute the shortest path from a to everything 770 00:33:29,820 --> 00:33:31,535 else-- or the length rather. 771 00:33:31,535 --> 00:33:33,910 So first of all, even without talking about an algorithm, 772 00:33:33,910 --> 00:33:35,785 I think it's pretty easy to guess what it is. 773 00:33:35,785 --> 00:33:39,240 So clearly the shortest path from a to a has length 0. 774 00:33:39,240 --> 00:33:42,360 The shortest length from a to b is 1, from a to c is 2-- 775 00:33:42,360 --> 00:33:43,920 because I can follow these guys. 776 00:33:43,920 --> 00:33:45,040 Now it gets complicated. 777 00:33:45,040 --> 00:33:47,250 It branched. 778 00:33:47,250 --> 00:33:52,695 So the next shortest path is length 3, and then 4 like that. 
779 00:33:52,695 --> 00:33:54,570 Does everybody agree with me that the numbers 780 00:33:54,570 --> 00:33:56,653 I've decorated here are the length of the shortest 781 00:33:56,653 --> 00:33:58,590 path from a to everything else? 782 00:34:01,520 --> 00:34:03,880 But what have I not done? 783 00:34:03,880 --> 00:34:06,430 I haven't told you how to actually compute the path, 784 00:34:06,430 --> 00:34:09,730 I've just given you the length of the path. 785 00:34:09,730 --> 00:34:12,340 So I may want a piece of code that in addition 786 00:34:12,340 --> 00:34:15,550 to doing single source shortest path length, 787 00:34:15,550 --> 00:34:19,989 also gives me a single source shortest path. 788 00:34:19,989 --> 00:34:22,460 So initially when I think about that, I might think about, 789 00:34:22,460 --> 00:34:25,420 well, how do I even write down a data structure that 790 00:34:25,420 --> 00:34:28,060 can store all of those paths. 791 00:34:28,060 --> 00:34:31,060 Well every path could have like v vertices in it, right. 792 00:34:31,060 --> 00:34:33,255 It could be that for whatever reason, 793 00:34:33,255 --> 00:34:34,880 there's a lot of branching in my graph. 794 00:34:34,880 --> 00:34:36,730 And all the paths are super long. 795 00:34:36,730 --> 00:34:38,409 Actually, I guess I have to think about 796 00:34:38,409 --> 00:34:40,929 whether branching would make them longer or shorter. 797 00:34:40,929 --> 00:34:44,415 But in any event, I could have a really boring data structure 798 00:34:44,415 --> 00:34:45,790 that just for every single vertex 799 00:34:45,790 --> 00:34:51,570 keeps track of the shortest path from a to that vertex. 800 00:34:51,570 --> 00:34:54,210 How big would that data structure be? 
801 00:34:54,210 --> 00:34:56,730 Well, if the only bound I have on the length of a path 802 00:34:56,730 --> 00:34:58,215 is that-- 803 00:34:58,215 --> 00:35:02,010 it certainly at most it takes all the vertices in my graph-- 804 00:35:02,010 --> 00:35:04,750 then any one path will take v space. 805 00:35:04,750 --> 00:35:07,700 So that would take v squared space total. 806 00:35:07,700 --> 00:35:08,930 That wouldn't be so good. 807 00:35:08,930 --> 00:35:11,450 Because somehow I have an amount of information on my graph 808 00:35:11,450 --> 00:35:12,470 currently that's linear. 809 00:35:12,470 --> 00:35:13,997 It's just the length of the path. 810 00:35:13,997 --> 00:35:15,830 If I want to actually reconstruct that path, 811 00:35:15,830 --> 00:35:18,650 initially sort of spiritually feels like I need way 812 00:35:18,650 --> 00:35:20,242 more space to do that. 813 00:35:20,242 --> 00:35:21,950 But the answer is that we actually don't. 814 00:35:21,950 --> 00:35:23,480 That we're going to only need linear space, 815 00:35:23,480 --> 00:35:25,490 and the idea for that is to store an object 816 00:35:25,490 --> 00:35:26,740 called the shortest path tree. 817 00:35:26,740 --> 00:35:27,240 Yes? 818 00:35:27,240 --> 00:35:33,823 AUDIENCE: Just for [INAUDIBLE] previous [INAUDIBLE].. 819 00:35:33,823 --> 00:35:35,990 JUSTIN SOLOMON: So the question was about recursion. 820 00:35:35,990 --> 00:35:38,240 We haven't actually written down any graph algorithms. 821 00:35:38,240 --> 00:35:41,105 So we're going to defer on that until we actually recurse. 822 00:35:41,105 --> 00:35:42,980 And then we'll think about it more carefully. 823 00:35:42,980 --> 00:35:45,410 Yeah, but it's a totally reasonable question. 824 00:35:45,410 --> 00:35:47,810 There are plenty of recursive graph algorithms out there. 825 00:35:47,810 --> 00:35:50,500 And then we'll have to do our counting 826 00:35:50,500 --> 00:35:51,500 very carefully for sure. 
827 00:35:54,480 --> 00:35:57,990 Right, so instead, we're going to define an object called 828 00:35:57,990 --> 00:35:59,250 the shortest path tree. 829 00:35:59,250 --> 00:36:01,810 And the basic trick here is to say, well, 830 00:36:01,810 --> 00:36:04,550 how did I get from a to c? 831 00:36:04,550 --> 00:36:08,990 Well, there's always a vertex, which is its predecessor, 832 00:36:08,990 --> 00:36:10,290 on the shortest path. 833 00:36:10,290 --> 00:36:12,960 And shortest paths have this really beautiful property, 834 00:36:12,960 --> 00:36:16,790 which is that the shortest path from a to c, if I truncate it-- 835 00:36:16,790 --> 00:36:19,990 right, so it goes a to b to c-- 836 00:36:19,990 --> 00:36:23,140 then the truncated one is also the shortest path 837 00:36:23,140 --> 00:36:24,515 to that previous vertex. 838 00:36:24,515 --> 00:36:26,140 So let's think about that a little bit, 839 00:36:26,140 --> 00:36:27,970 because that sentence was, as usual, poorly 840 00:36:27,970 --> 00:36:29,530 phrased by your instructor. 841 00:36:29,530 --> 00:36:31,930 So let's say that I have the shortest 842 00:36:31,930 --> 00:36:38,800 path from a to d, which is very clearly a, b, c, d. 843 00:36:38,800 --> 00:36:40,540 I think we can all agree. 844 00:36:40,540 --> 00:36:42,820 And now I take like this sublist. 845 00:36:42,820 --> 00:36:45,310 I just look from a to c. 846 00:36:45,310 --> 00:36:47,020 Is there ever a circumstance when 847 00:36:47,020 --> 00:36:53,270 this is not the shortest path or a shortest path from a to c? 848 00:36:53,270 --> 00:36:56,750 No, right because if there existed a shorter path 849 00:36:56,750 --> 00:36:59,930 from a to c, I could splice it in here 850 00:36:59,930 --> 00:37:02,636 and find a shorter path from a to d. 851 00:37:02,636 --> 00:37:04,670 Do you see that?
852 00:37:04,670 --> 00:37:06,920 So based on that reasoning, rather than 853 00:37:06,920 --> 00:37:10,000 storing like this giant set of shortest paths, 854 00:37:10,000 --> 00:37:11,750 sort of actually applying, in some sense, 855 00:37:11,750 --> 00:37:15,290 the recursive suggestion, instead I can just 856 00:37:15,290 --> 00:37:19,832 think of the one vertex that's before me in my shortest path. 857 00:37:19,832 --> 00:37:21,040 I'm going to trace backwards. 858 00:37:21,040 --> 00:37:22,730 So let's take a look at our graph here. 859 00:37:25,667 --> 00:37:27,750 Essentially, the object I'm going to keep track of 860 00:37:27,750 --> 00:37:30,130 is like a predecessor, right. 861 00:37:30,130 --> 00:37:34,210 So what is the predecessor of f on the shortest path? 862 00:37:34,210 --> 00:37:35,620 It's actually either d or e. 863 00:37:35,620 --> 00:37:37,460 It doesn't matter in this case. 864 00:37:37,460 --> 00:37:42,690 Maybe the predecessor is e for fun, right. 865 00:37:42,690 --> 00:37:44,040 What's the predecessor of e? 866 00:37:44,040 --> 00:37:47,160 Well, clearly the previous vertex on the shortest path 867 00:37:47,160 --> 00:37:49,260 is c. 868 00:37:49,260 --> 00:37:54,540 Similarly for d-- now we have b and a and a bunch 869 00:37:54,540 --> 00:37:56,640 of arrows that point this way. 870 00:37:56,640 --> 00:37:58,092 So for every vertex I'm just going 871 00:37:58,092 --> 00:38:00,300 to store an arrow pointing toward the previous vertex 872 00:38:00,300 --> 00:38:01,270 on the shortest path. 873 00:38:01,270 --> 00:38:03,480 I'm not going to store the whole shortest path, just 874 00:38:03,480 --> 00:38:06,370 the very last edge. 875 00:38:06,370 --> 00:38:10,530 So first of all, how much storage does this take? 876 00:38:10,530 --> 00:38:11,663 It takes v space. 877 00:38:11,663 --> 00:38:12,330 Do you see that? 878 00:38:12,330 --> 00:38:14,220 Or the size of the vertices space.
879 00:38:14,220 --> 00:38:16,570 Because every vertex just has to store one thing, 880 00:38:16,570 --> 00:38:20,642 which is the previous vertex on the shortest path. 881 00:38:20,642 --> 00:38:22,850 Now what is my algorithm for tracing shortest paths? 882 00:38:22,850 --> 00:38:23,850 It's really simple. 883 00:38:23,850 --> 00:38:25,940 I just start walking along these edges 884 00:38:25,940 --> 00:38:29,208 all the way until I get back to a. 885 00:38:29,208 --> 00:38:31,250 Now this object is called the shortest path tree. 886 00:38:31,250 --> 00:38:35,060 Notice I snuck in one additional word which is tree. 887 00:38:35,060 --> 00:38:35,630 Why is that? 888 00:38:35,630 --> 00:38:38,612 Can I ever have a cycle in this graph? 889 00:38:38,612 --> 00:38:40,320 It wouldn't really make any sense, right. 890 00:38:40,320 --> 00:38:41,320 These are shortest paths. 891 00:38:41,320 --> 00:38:43,570 You should be able to kind of follow the gradient back 892 00:38:43,570 --> 00:38:46,450 to the original vertex. 893 00:38:46,450 --> 00:38:51,460 OK, so in other words, I'm going to basically decorate my graph 894 00:38:51,460 --> 00:38:52,570 with one additional thing. 895 00:38:52,570 --> 00:38:55,960 We'll call it p of v which is the previous vertex 896 00:38:55,960 --> 00:39:00,590 on the shortest path from my source point to my vertex v. 897 00:39:00,590 --> 00:39:02,972 And what I think I've tried to argue to you guys today 898 00:39:02,972 --> 00:39:04,430 is that if I have this information, 899 00:39:04,430 --> 00:39:06,763 that's actually enough to reconstruct the shortest path. 900 00:39:06,763 --> 00:39:10,190 I just keep taking p of v, and then p of p of v, 901 00:39:10,190 --> 00:39:12,230 and then p of p of p of v, and so on, which 902 00:39:12,230 --> 00:39:14,330 sounds more complicated than it is, 903 00:39:14,330 --> 00:39:17,010 until I trace back to my original vertex.
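The trace-back just described (take p of v, then p of p of v, and so on) can be sketched in a few lines of Python. The function name `reconstruct_path`, the dictionary representation of p, and the particular parent pointers are assumptions for illustration; the pointers are only a guess at the a-through-f graph on the slide, reconstructed from the spoken description.

```python
def reconstruct_path(parent, source, target):
    """Follow parent pointers from target back to source.

    Returns the path as a list from source to target, or None if target
    has no chain of parent pointers leading back to source.
    """
    path = [target]
    while path[-1] != source:
        prev = parent.get(path[-1])
        if prev is None:
            # Ran out of pointers before reaching the source.
            return None
        path.append(prev)
    path.reverse()  # we collected target..source, so flip it
    return path


# Hypothetical parent pointers p(v) for a source "a" (an assumption,
# not the exact graph from the lecture slide).
parent = {"a": None, "b": "a", "c": "b", "d": "c", "e": "c", "f": "e"}

print(reconstruct_path(parent, "a", "f"))  # ['a', 'b', 'c', 'e', 'f']
```

Storage is one pointer per vertex, and the walk touches each vertex on the path once, so tracing a path costs time proportional to its length.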
904 00:39:17,010 --> 00:39:20,368 And this object conceptually is called the shortest path tree. 905 00:39:20,368 --> 00:39:21,410 Any questions about that? 906 00:39:23,930 --> 00:39:25,106 Yes? 907 00:39:25,106 --> 00:39:28,045 AUDIENCE: [INAUDIBLE] 908 00:39:28,045 --> 00:39:30,925 JUSTIN SOLOMON: If I had an edge that connected a to d, OK. 909 00:39:30,925 --> 00:39:36,980 AUDIENCE: [INAUDIBLE] 910 00:39:36,980 --> 00:39:39,680 JUSTIN SOLOMON: Oh, OK so the question was, 911 00:39:39,680 --> 00:39:44,600 let's say that our colleague here added an edge-- 912 00:39:44,600 --> 00:39:47,840 this is a great question. 913 00:39:47,840 --> 00:39:51,410 You know somebody was evil, my adversarial neural network, 914 00:39:51,410 --> 00:39:53,810 stuck an edge here because it was adversarial, 915 00:39:53,810 --> 00:39:56,580 and it wanted my shortest path code to fail. 916 00:39:56,580 --> 00:40:01,250 And now somehow the tree that I gave you is no longer correct. 917 00:40:01,250 --> 00:40:03,440 And my answer to that is yes. 918 00:40:03,440 --> 00:40:04,070 Why is that? 919 00:40:04,070 --> 00:40:07,580 Well, by adding this edge here, the length of my shortest path 920 00:40:07,580 --> 00:40:08,450 changed. 921 00:40:08,450 --> 00:40:10,927 The shortest path from a to d is now 1. 922 00:40:10,927 --> 00:40:12,260 So this tree is no longer valid. 923 00:40:12,260 --> 00:40:14,120 I need a new tree. 924 00:40:14,120 --> 00:40:19,120 So now what would be the previous p of d here? 925 00:40:19,120 --> 00:40:22,720 Well, rather than being c, it would be a. 926 00:40:22,720 --> 00:40:24,230 Yes, that's absolutely right. 927 00:40:24,230 --> 00:40:27,580 And it actually is reflective of a really annoying property 928 00:40:27,580 --> 00:40:31,150 of shortest path, which is if I add one edge to my graph, 929 00:40:31,150 --> 00:40:34,030 the length of the shortest path to every vertex can change. 
930 00:40:34,030 --> 00:40:37,330 Well, I guess with the exception of the source vertex. 931 00:40:37,330 --> 00:40:41,140 Yeah, and that's actually a really big headache 932 00:40:41,140 --> 00:40:42,170 in certain applications. 933 00:40:42,170 --> 00:40:44,980 So for instance-- and then I'll shut up about applications 934 00:40:44,980 --> 00:40:47,020 and do math again-- 935 00:40:47,020 --> 00:40:48,880 I work a lot with 3D models. 936 00:40:48,880 --> 00:40:51,730 And there's a big data set of 3D models of like ballerinas. 937 00:40:51,730 --> 00:40:54,105 And ballerinas are really annoying because sometimes they 938 00:40:54,105 --> 00:40:55,580 put their hands together like that. 939 00:40:55,580 --> 00:40:58,540 And then suddenly the shortest path between your fingers 940 00:40:58,540 --> 00:41:02,770 goes from your entire body to like 0. 941 00:41:02,770 --> 00:41:05,860 And so incremental algorithms for computing shortest paths 942 00:41:05,860 --> 00:41:06,940 can fail here, right. 943 00:41:06,940 --> 00:41:08,890 Because I have to update like everything 944 00:41:08,890 --> 00:41:11,975 if I accidentally glued together fingers like that. 945 00:41:11,975 --> 00:41:13,600 So anyway, I'll let you think about how 946 00:41:13,600 --> 00:41:14,725 you might fix that problem. 947 00:41:14,725 --> 00:41:17,380 If you want to know more, you should take 6.838. 948 00:41:17,380 --> 00:41:18,226 Yes? 949 00:41:18,226 --> 00:41:19,862 AUDIENCE: [INAUDIBLE]. 950 00:41:19,862 --> 00:41:21,820 JUSTIN SOLOMON: If you change your source node, 951 00:41:21,820 --> 00:41:23,120 the shortest paths will change again.
952 00:41:23,120 --> 00:41:25,310 Yeah, so this is going to be one of these really boring things 953 00:41:25,310 --> 00:41:26,240 where I'm going to keep answering 954 00:41:26,240 --> 00:41:28,440 like any time I change anything about my problem-- 955 00:41:28,440 --> 00:41:30,440 I change my source, I change my edges-- 956 00:41:30,440 --> 00:41:32,580 I have to just recompute all the shortest paths. 957 00:41:32,580 --> 00:41:36,570 There are obviously algorithms out there that don't do that. 958 00:41:36,570 --> 00:41:38,600 But we're not going to think about them yet. 959 00:41:38,600 --> 00:41:39,710 OK. 960 00:41:39,710 --> 00:41:41,230 So as usual, I've talked too much 961 00:41:41,230 --> 00:41:43,220 and left myself about 10 minutes to do 962 00:41:43,220 --> 00:41:45,860 the actual algorithm that's interesting in the lecture 963 00:41:45,860 --> 00:41:46,467 here-- 964 00:41:46,467 --> 00:41:48,550 although actually, it's really not so complicated, 965 00:41:48,550 --> 00:41:50,120 so I think we'll do OK-- 966 00:41:50,120 --> 00:41:52,288 which is how do I actually compute shortest paths? 967 00:41:52,288 --> 00:41:54,080 Yeah, and the basic thing we're going to do 968 00:41:54,080 --> 00:41:57,388 is sort of build on this tree analogy here. 969 00:41:57,388 --> 00:41:59,930 We are going to define one more object, which I really like-- 970 00:41:59,930 --> 00:42:02,305 actually I enjoy this from Jason's notes because it looks 971 00:42:02,305 --> 00:42:04,730 like calculus, and I enjoy that-- 972 00:42:04,730 --> 00:42:08,150 and that's an idea of the level set. 973 00:42:08,150 --> 00:42:11,060 And so this is a whole set of things L sub k. 974 00:42:11,060 --> 00:42:13,715 And these are all the vertices that are distance k 975 00:42:13,715 --> 00:42:15,810 away from my source. 
976 00:42:15,810 --> 00:42:18,620 So for instance, if my source vertex in this example 977 00:42:18,620 --> 00:42:20,540 is the vertex all the way on the left, 978 00:42:20,540 --> 00:42:24,230 then L0 obviously contains just that vertex, right. 979 00:42:24,230 --> 00:42:25,670 L1 is the next one. 980 00:42:25,670 --> 00:42:27,080 L2 is the third one. 981 00:42:27,080 --> 00:42:30,597 But now L3 is a set of three vertices 982 00:42:30,597 --> 00:42:32,680 because those are all the things that are distance 983 00:42:32,680 --> 00:42:34,170 3 away from the source. 984 00:42:34,170 --> 00:42:36,630 That's what I've labeled in pink here. 985 00:42:36,630 --> 00:42:40,230 OK, so that's all that this notation here means. 986 00:42:40,230 --> 00:42:42,840 Oh, I've made a slight typo because in this class distance 987 00:42:42,840 --> 00:42:46,165 is delta and not d, but whatever. 988 00:42:46,165 --> 00:42:47,040 AUDIENCE: [INAUDIBLE] 989 00:42:47,040 --> 00:42:48,180 JUSTIN SOLOMON: The shortest distance-- 990 00:42:48,180 --> 00:42:49,200 that's absolutely right. 991 00:42:49,200 --> 00:42:51,283 So for instance, I could have a very long distance 992 00:42:51,283 --> 00:42:52,390 from L0 to L2, right. 993 00:42:52,390 --> 00:42:54,990 I could just flip back and forth between L0 and L1, 994 00:42:54,990 --> 00:42:57,220 maybe go over to L4 and then go back. 995 00:42:57,220 --> 00:43:00,040 But that wouldn't be a terribly helpful thing to compute. 996 00:43:00,040 --> 00:43:01,040 That's absolutely right. 997 00:43:01,040 --> 00:43:02,286 Yes? 998 00:43:02,286 --> 00:43:04,197 AUDIENCE: [INAUDIBLE]. 999 00:43:04,197 --> 00:43:05,780 JUSTIN SOLOMON: Oh, the red background 1000 00:43:05,780 --> 00:43:10,515 is the set L. So for example, L3 contains these three vertices 1001 00:43:10,515 --> 00:43:12,140 because they're all the things that are 1002 00:43:12,140 --> 00:43:14,000 distance 3 away from the left. 
1003 00:43:14,000 --> 00:43:17,930 I got a little too slick drawing my diagram late last night. 1004 00:43:17,930 --> 00:43:19,800 I'm kind of proud of it. 1005 00:43:19,800 --> 00:43:24,140 OK, so essentially if I wanted to compute 1006 00:43:24,140 --> 00:43:27,350 the length of the shortest path from all the way on the left 1007 00:43:27,350 --> 00:43:29,590 to all the other vertices, one way 1008 00:43:29,590 --> 00:43:32,090 to do that would be to compute all these level sets and then 1009 00:43:32,090 --> 00:43:35,150 just sort of check what level set I'm in, right. 1010 00:43:35,150 --> 00:43:37,700 So we're going to introduce an algorithm called 1011 00:43:37,700 --> 00:43:40,940 Breadth-First search which does roughly that. 1012 00:43:40,940 --> 00:43:42,680 So Breadth-First search, the way we'll 1013 00:43:42,680 --> 00:43:45,440 introduce it today is going to be an algorithm for computing 1014 00:43:45,440 --> 00:43:48,500 all of those level sets, L sub i, and then from that, 1015 00:43:48,500 --> 00:43:50,120 we can construct the length and even 1016 00:43:50,120 --> 00:43:52,740 the shape of the shortest path. 1017 00:43:52,740 --> 00:43:55,350 And I'm going to move to my handwritten notes. 1018 00:43:55,350 --> 00:44:00,568 OK, and here's what our algorithm is going to do. 1019 00:44:00,568 --> 00:44:02,610 I'm going to write it in a slightly different way 1020 00:44:02,610 --> 00:44:07,080 than what's in the notes and on the screen, but only slightly. 1021 00:44:07,080 --> 00:44:10,530 So first of all, one thing I think we can all agree on 1022 00:44:10,530 --> 00:44:12,150 is that level set 0-- 1023 00:44:12,150 --> 00:44:15,200 oh, that's-- this chalk bifurcated-- 1024 00:44:15,200 --> 00:44:17,400 it contains one node. 1025 00:44:17,400 --> 00:44:19,450 What should that node be? 1026 00:44:19,450 --> 00:44:22,120 The source, because the only thing that's distance 1027 00:44:22,120 --> 00:44:24,140 0 away from the source is the source node.
1028 00:44:28,750 --> 00:44:33,460 OK, and in addition to that, we can initialize the distance 1029 00:44:33,460 --> 00:44:35,753 from the source to itself. 1030 00:44:35,753 --> 00:44:37,420 Everybody on three, what is the distance 1031 00:44:37,420 --> 00:44:38,560 from the source to itself-- 1032 00:44:38,560 --> 00:44:39,500 1, 2, 3. 1033 00:44:39,500 --> 00:44:40,117 AUDIENCE: 0. 1034 00:44:40,117 --> 00:44:41,200 JUSTIN SOLOMON: Thank you. 1035 00:44:41,200 --> 00:44:43,790 See you're waking up now, it's almost 11:00-- 1036 00:44:43,790 --> 00:44:44,290 12:00. 1037 00:44:44,290 --> 00:44:45,200 What time is it? 1038 00:44:45,200 --> 00:44:49,210 Almost 12:00-- OK, and then finally-- 1039 00:44:49,210 --> 00:44:50,830 well maybe initially we don't really 1040 00:44:50,830 --> 00:44:54,100 know anything about the array p, so we just make it empty. 1041 00:44:54,100 --> 00:44:56,545 Because p of the source, it somehow doesn't matter. 1042 00:44:56,545 --> 00:44:58,420 Because once I've made it back to the source, 1043 00:44:58,420 --> 00:45:00,402 I'm done computing shortest path. 1044 00:45:00,402 --> 00:45:02,110 So we're going to write an algorithm that 1045 00:45:02,110 --> 00:45:06,310 computes all the level sets and fills in this array p and fills 1046 00:45:06,310 --> 00:45:08,380 in the distances all in one big shot. 1047 00:45:08,380 --> 00:45:11,040 We're going to call it Breadth-First search. 1048 00:45:11,040 --> 00:45:12,080 OK, so let's do that. 1049 00:45:15,810 --> 00:45:17,785 So we can use the notation here. 1050 00:45:17,785 --> 00:45:19,160 And notice that there's basically 1051 00:45:19,160 --> 00:45:22,070 an induction going on, which is I'm going to compute level set 1052 00:45:22,070 --> 00:45:25,490 1 from level set 0, level set 2 from level set 1, 1053 00:45:25,490 --> 00:45:28,560 and so on, until I fill in all my level sets. 1054 00:45:28,560 --> 00:45:30,250 Does that makes sense? 
1055 00:45:30,250 --> 00:45:34,080 So here's a slightly different way to notate the same thing. 1056 00:45:34,080 --> 00:45:35,580 I'm going to use a WHILE loop, which 1057 00:45:35,580 --> 00:45:39,540 I know is like slightly non-kosher, but that's OK. 1058 00:45:39,540 --> 00:45:42,720 So I'm going to initialize a number i to be 1. 1059 00:45:42,720 --> 00:45:44,580 This is going to be like our counter. 1060 00:45:44,580 --> 00:45:52,830 I'm going to say WHILE the previous level set is not 1061 00:45:52,830 --> 00:45:55,470 empty, meaning that potentially there's 1062 00:45:55,470 --> 00:45:57,960 a path that goes through the previous level set 1063 00:45:57,960 --> 00:45:59,640 into the next one. 1064 00:45:59,640 --> 00:46:02,340 Because as soon as one of my levels is empty, 1065 00:46:02,340 --> 00:46:04,845 notice that like the Li for even bigger i 1066 00:46:04,845 --> 00:46:05,970 are also going to be empty. 1067 00:46:05,970 --> 00:46:08,910 There's like never a case when there's something not distance 1068 00:46:08,910 --> 00:46:12,110 i but then distance i plus 5. 1069 00:46:12,110 --> 00:46:15,570 OK, so now what am I going to do? 1070 00:46:15,570 --> 00:46:18,110 Well, let's think back to our graph. 1071 00:46:23,040 --> 00:46:26,100 So like now I know that this guy is distance 0 away. 1072 00:46:26,100 --> 00:46:28,090 That's what I started with. 1073 00:46:28,090 --> 00:46:30,998 So now I'm going to look at all the neighbors of this vertex. 1074 00:46:30,998 --> 00:46:32,790 And I'm going to make them distance 1 away. 1075 00:46:32,790 --> 00:46:35,740 Does that makes sense? 1076 00:46:35,740 --> 00:46:38,850 And similarly here, this guy is distance 2. 1077 00:46:38,850 --> 00:46:42,300 And eventually I'm going to get in trouble because maybe-- 1078 00:46:42,300 --> 00:46:44,110 well, what's a good example here. 1079 00:46:44,110 --> 00:46:46,272 I won't even try to draw. 
1080 00:46:46,272 --> 00:46:47,730 I could run into trouble if I don't 1081 00:46:47,730 --> 00:46:51,670 want to add a vertex twice to two different level sets. 1082 00:46:51,670 --> 00:46:53,970 Once I've put it in Li then I don't 1083 00:46:53,970 --> 00:46:56,190 want to put it in Li plus 5 because I already 1084 00:46:56,190 --> 00:46:58,260 know that it's distance i away. 1085 00:46:58,260 --> 00:47:00,090 Does that makes sense? 1086 00:47:00,090 --> 00:47:01,890 OK, so what I'm going to do is I'm 1087 00:47:01,890 --> 00:47:04,720 going to iterate over all the vertices in my previous level 1088 00:47:04,720 --> 00:47:05,220 set. 1089 00:47:09,680 --> 00:47:12,260 And now I'm going to look at every vertex that 1090 00:47:12,260 --> 00:47:15,512 is adjacent to u. 1091 00:47:15,512 --> 00:47:16,470 Because what do I know? 1092 00:47:16,470 --> 00:47:19,520 I know that if I can get to u in i minus 1 steps, 1093 00:47:19,520 --> 00:47:24,310 how many steps should it take me to get to any neighbor of u? 1094 00:47:24,310 --> 00:47:26,710 i steps because I can go through the path, which 1095 00:47:26,710 --> 00:47:30,370 is the length of i minus 1, add one additional edge, 1096 00:47:30,370 --> 00:47:32,590 and I'll get to that new guy. 1097 00:47:32,590 --> 00:47:35,000 So what can I do? 1098 00:47:35,000 --> 00:47:37,450 I can iterate over all of v, which 1099 00:47:37,450 --> 00:47:42,790 is in the adjacent set of u. 1100 00:47:42,790 --> 00:47:45,310 But I have to be a little bit careful because what 1101 00:47:45,310 --> 00:47:46,510 if I have an edge backwards? 1102 00:47:46,510 --> 00:47:48,610 So like for instance, here I have an edge back 1103 00:47:48,610 --> 00:47:50,560 to the source. 1104 00:47:50,560 --> 00:47:52,260 I guess this is-- 1105 00:47:52,260 --> 00:47:54,665 yeah, that's a valid example. 
1106 00:47:54,665 --> 00:47:56,040 I wouldn't want to add the source 1107 00:47:56,040 --> 00:47:58,560 to the third level set because I already 1108 00:47:58,560 --> 00:48:01,080 added it in the previous guy. 1109 00:48:01,080 --> 00:48:05,175 So I want to get rid of the union of all 1110 00:48:05,175 --> 00:48:06,300 of the previous level sets. 1111 00:48:11,416 --> 00:48:12,405 Does that make sense? 1112 00:48:12,405 --> 00:48:13,780 So in other words, I'm only going 1113 00:48:13,780 --> 00:48:17,950 to look at the adjacent vertices that I haven't visited yet 1114 00:48:17,950 --> 00:48:21,160 in my level set computational algorithm. 1115 00:48:21,160 --> 00:48:23,410 And all I have to do is update my arrays, right. 1116 00:48:23,410 --> 00:48:32,230 So in particular, I'm going to add vertex v to level set i 1117 00:48:32,230 --> 00:48:34,450 because I haven't seen v yet. 1118 00:48:34,450 --> 00:48:43,030 I'm going to set the distance from s to v equal to i 1119 00:48:43,030 --> 00:48:46,240 because I'm currently filling in my level set i. 1120 00:48:46,240 --> 00:48:51,820 And then finally what is p of v? 1121 00:48:51,820 --> 00:48:54,430 What is the previous vertex to v in my shortest 1122 00:48:54,430 --> 00:48:55,540 path from my source? 1123 00:48:58,420 --> 00:48:59,680 It's u, right. 1124 00:48:59,680 --> 00:49:01,810 Because that's the guy in the previous level 1125 00:49:01,810 --> 00:49:05,590 set that I'm building my path from, right. 1126 00:49:05,590 --> 00:49:07,247 I'm going to set that to u. 1127 00:49:07,247 --> 00:49:08,830 And then-- sorry, I ran out of space-- 1128 00:49:08,830 --> 00:49:12,990 but I also have to increment i. 1129 00:49:12,990 --> 00:49:15,910 OK, so what does this algorithm do? 1130 00:49:15,910 --> 00:49:17,880 It's just building one level set at a time. 
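The chalkboard pseudocode above can be sketched in Python roughly as follows. This is not the version from the course notes: the name `bfs_levels`, the adjacency-dictionary input, and the example graph are assumptions for illustration. It also uses dictionaries for the distances and for p instead of arrays of size |V|, and it tests "not visited yet" by checking whether a distance has been assigned, which stands in for subtracting the union of the earlier level sets.

```python
def bfs_levels(adj, s):
    """Breadth-first search from s.

    adj maps each vertex to a list of its neighbors.
    Returns (levels, dist, parent) where levels[i] is the level set L_i,
    dist[v] is the shortest-path distance from s to v, and parent[v] is
    the previous vertex p(v) on a shortest path from s to v.
    """
    levels = [[s]]          # L_0 contains only the source
    dist = {s: 0}           # distance from the source to itself is 0
    parent = {s: None}      # the source has no predecessor
    i = 1
    while levels[i - 1]:    # stop once the previous level set is empty
        levels.append([])
        for u in levels[i - 1]:
            for v in adj[u]:
                if v not in dist:        # skip anything already placed
                    levels[i].append(v)
                    dist[v] = i
                    parent[v] = u
        i += 1
    levels.pop()            # drop the final empty level set
    return levels, dist, parent


# Hypothetical undirected graph (an assumption, not the slide's graph).
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "f"],
    "e": ["c", "f"],
    "f": ["d", "e"],
}

levels, dist, parent = bfs_levels(adj, "a")
print(levels)  # [['a'], ['b'], ['c'], ['d', 'e'], ['f']]
```

Vertices that are unreachable from the source never get a distance or a parent, so looking them up in `dist` is one way to detect them.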
1131 00:49:17,880 --> 00:49:21,930 If we go back to our picture, so it starts by initializing L0 1132 00:49:21,930 --> 00:49:25,110 to just be the source vertex, then it looks at all the edges 1133 00:49:25,110 --> 00:49:27,210 coming out of that-- in that case just one-- 1134 00:49:27,210 --> 00:49:29,190 it makes that length 1-- 1135 00:49:29,190 --> 00:49:30,125 and so on. 1136 00:49:30,125 --> 00:49:31,500 And so this is just incrementally 1137 00:49:31,500 --> 00:49:32,990 building up all these level sets. 1138 00:49:32,990 --> 00:49:34,740 Now there's a pretty straightforward proof 1139 00:49:34,740 --> 00:49:37,350 by induction that this algorithm correctly 1140 00:49:37,350 --> 00:49:40,140 computes the L's the p's and the deltas which 1141 00:49:40,140 --> 00:49:43,890 is all the information that we need to compute the shortest 1142 00:49:43,890 --> 00:49:44,490 path. 1143 00:49:44,490 --> 00:49:46,448 I think you guys can do that in your recitation 1144 00:49:46,448 --> 00:49:49,800 if you still need a little bit of induction proof practice 1145 00:49:49,800 --> 00:49:51,240 here. 1146 00:49:51,240 --> 00:49:53,130 And the final thing that we should check 1147 00:49:53,130 --> 00:49:55,860 is what is the runtime of this algorithm. 1148 00:49:55,860 --> 00:49:59,800 I'm going to squeeze it in there just at the last second here. 1149 00:49:59,800 --> 00:50:01,540 So let's take a look. 1150 00:50:01,540 --> 00:50:08,700 So first of all, I did something a little-- oh, no it's OK-- 1151 00:50:08,700 --> 00:50:10,298 in my algorithm actually in step zero 1152 00:50:10,298 --> 00:50:12,840 I had to make an array which was the size equal to the number 1153 00:50:12,840 --> 00:50:14,490 of vertices. 1154 00:50:14,490 --> 00:50:17,250 Remember that in 6.006 how much time does 1155 00:50:17,250 --> 00:50:19,853 it take to allocate memory? 
1156 00:50:19,853 --> 00:50:21,770 Yeah, it takes the amount of time proportional 1157 00:50:21,770 --> 00:50:24,590 to the amount of memory that I allocate. 1158 00:50:24,590 --> 00:50:27,180 So already-- Steph, I see your hand but we're low on time. 1159 00:50:27,180 --> 00:50:29,210 So we're going to make it to the end. 1160 00:50:29,210 --> 00:50:32,600 Already we've incurred v time because our shortest path 1161 00:50:32,600 --> 00:50:34,730 array takes v space. 1162 00:50:34,730 --> 00:50:39,500 But in addition to that, we have this kind of funny FOR loop 1163 00:50:39,500 --> 00:50:44,610 where for every node I have to visit all of its neighbors. 1164 00:50:44,610 --> 00:50:49,080 But first of all, do I ever see a node twice here? 1165 00:50:49,080 --> 00:50:52,260 No, because I'm going in order of distance. 1166 00:50:52,260 --> 00:50:54,423 And the second that I've seen a node in one level 1167 00:50:54,423 --> 00:50:55,590 set, it can't be in another. 1168 00:50:55,590 --> 00:50:58,830 That's our basic construction here. 1169 00:50:58,830 --> 00:51:01,380 Well, conveniently for you guys, you already 1170 00:51:01,380 --> 00:51:03,060 proved exactly the formula that we need. 1171 00:51:03,060 --> 00:51:04,560 And if I'm lucky, I didn't erase it. 1172 00:51:04,560 --> 00:51:06,900 Yeah, here we are. 1173 00:51:06,900 --> 00:51:09,630 So if we take a look here, this is exactly the scenario 1174 00:51:09,630 --> 00:51:10,712 that we're in. 1175 00:51:10,712 --> 00:51:11,670 Because what did we do? 1176 00:51:11,670 --> 00:51:14,622 We iterated over all the nodes in our graph, 1177 00:51:14,622 --> 00:51:17,080 and then we iterated over all the neighbors of those nodes. 1178 00:51:17,080 --> 00:51:18,705 And that's the basic computational time 1179 00:51:18,705 --> 00:51:20,950 in our algorithm.
1180 00:51:20,950 --> 00:51:25,150 So that FOR loop, or that WHILE loop rather, in my code 1181 00:51:25,150 --> 00:51:29,510 is incurring time proportional to the number of edges. 1182 00:51:29,510 --> 00:51:33,440 So what is the total run time for Breadth-First search? 1183 00:51:33,440 --> 00:51:36,980 Well, we need to construct that array. 1184 00:51:36,980 --> 00:51:40,280 So just at step zero, we've incurred v time. 1185 00:51:40,280 --> 00:51:43,340 And then we have to iterate over something that takes 1186 00:51:43,340 --> 00:51:45,120 at most the number of edges. 1187 00:51:45,120 --> 00:51:54,813 So overall our algorithm takes big O of |V| plus |E| time. 1188 00:51:54,813 --> 00:51:56,730 Now, notice that this is-- you might view this 1189 00:51:56,730 --> 00:51:57,605 as kind of redundant. 1190 00:51:57,605 --> 00:51:58,630 By the way this-- 1191 00:51:58,630 --> 00:52:00,060 I have a little bit of a quibble with Jason. 1192 00:52:00,060 --> 00:52:02,160 But in this class we will call this a linear time algorithm 1193 00:52:02,160 --> 00:52:04,140 because it's linear in the space that you're 1194 00:52:04,140 --> 00:52:05,833 using to store your graph. 1195 00:52:05,833 --> 00:52:07,500 I think that's a little fishy personally 1196 00:52:07,500 --> 00:52:10,090 because this could scale quadratically in v, 1197 00:52:10,090 --> 00:52:12,930 but I digress. 1198 00:52:12,930 --> 00:52:19,210 In any event, why do we need both of these terms here? 1199 00:52:19,210 --> 00:52:23,740 Well, notice that if I had no edges in my graph, 1200 00:52:23,740 --> 00:52:26,110 now this term is going to dominate. 1201 00:52:26,110 --> 00:52:28,870 But as I add edges to my graph, this thing 1202 00:52:28,870 --> 00:52:30,470 could go up to v squared. 1203 00:52:30,470 --> 00:52:32,470 So this is somehow a more informative expression 1204 00:52:32,470 --> 00:52:35,003 than just saying, well at worst this is v squared time.
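The running-time count can be written out as a short calculation. This is a sketch assuming a simple undirected graph stored as adjacency lists; the symbol T for the total running time is notation introduced here.

```latex
\[
  T \;=\; \underbrace{O(|V|)}_{\text{allocate the arrays}}
     \;+\; \sum_{u \in V} O\bigl(1 + \deg(u)\bigr)
  \;=\; O(|V|) + O\bigl(|V| + 2|E|\bigr)
  \;=\; O(|V| + |E|),
\]
using $\sum_{u \in V} \deg(u) = 2|E|$, since each vertex is processed at
most once and each edge is examined from each of its two endpoints.
For a simple undirected graph,
\[
  0 \;\le\; |E| \;\le\; \binom{|V|}{2} \;=\; O(|V|^2),
\]
so the bound ranges from $O(|V|)$ with no edges up to $O(|V|^2)$ for a
dense graph.
```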
1205 00:52:35,003 --> 00:52:35,920 Does that make sense? 1206 00:52:35,920 --> 00:52:38,290 It's a slightly better formula to have. 1207 00:52:38,290 --> 00:52:40,630 OK, so with that we just squeaked into the finish line. 1208 00:52:40,630 --> 00:52:43,340 We have an algorithm for computing shortest paths. 1209 00:52:43,340 --> 00:52:46,980 And I will see you guys again I guess on Tuesday.