1 00:00:00,000 --> 00:00:02,425 [SQUEAKING] 2 00:00:02,425 --> 00:00:04,365 [RUSTLING] 3 00:00:04,365 --> 00:00:05,820 [CLICKING] 4 00:00:12,620 --> 00:00:16,340 JUSTIN SOLOMON: Well, welcome to problem session 5 of 6.006. 5 00:00:16,340 --> 00:00:20,195 It's a pleasure to see all your smiling faces here today. 6 00:00:20,195 --> 00:00:22,070 This week, we're going to cover some problems 7 00:00:22,070 --> 00:00:24,947 in graph theory related to depth-first search 8 00:00:24,947 --> 00:00:27,530 and breadth-first search, which were roughly the topics that I 9 00:00:27,530 --> 00:00:29,060 guess we've covered in the last couple lectures 10 00:00:29,060 --> 00:00:30,920 and what's going to be on your homework. 11 00:00:30,920 --> 00:00:32,753 And I believe this is basically the homework 12 00:00:32,753 --> 00:00:36,440 from last year with a few revisions, 13 00:00:36,440 --> 00:00:38,282 based on some typos we caught. 14 00:00:38,282 --> 00:00:39,740 Oh, and I caught a spelling mistake 15 00:00:39,740 --> 00:00:41,960 that I'll bother our instructor about later. 16 00:00:41,960 --> 00:00:42,950 OK. 17 00:00:42,950 --> 00:00:44,913 So without further ado, let's get started. 18 00:00:44,913 --> 00:00:47,330 I guess we'll just do them in order for lack of creativity 19 00:00:47,330 --> 00:00:48,068 here. 20 00:00:48,068 --> 00:00:50,360 The very first problem has to do with some measurements 21 00:00:50,360 --> 00:00:54,170 on a graph, which is actually a really interesting one to me. 22 00:00:54,170 --> 00:00:56,378 So it turns out that in a lot of research in-- 23 00:00:56,378 --> 00:00:57,920 for some reason, in computer science, 24 00:00:57,920 --> 00:00:59,878 there's graph theory research, and then there's 25 00:00:59,878 --> 00:01:01,050 networks research. 26 00:01:01,050 --> 00:01:03,170 And these are two different communities 27 00:01:03,170 --> 00:01:05,600 for weird historical reasons that I don't totally 28 00:01:05,600 --> 00:01:06,830 understand. 
29 00:01:06,830 --> 00:01:09,140 But people in the network science literature 30 00:01:09,140 --> 00:01:11,000 often measure things like the radius 31 00:01:11,000 --> 00:01:12,937 of the graph and some other kind of measures 32 00:01:12,937 --> 00:01:15,020 that are trying to tell you something about, like, 33 00:01:15,020 --> 00:01:16,970 is a graph a long, spread-out thing, 34 00:01:16,970 --> 00:01:20,510 like a line graph, or something super compact, like a star? 35 00:01:20,510 --> 00:01:21,447 And so on. 36 00:01:21,447 --> 00:01:23,030 And so this problem is kind of digging 37 00:01:23,030 --> 00:01:24,873 into the algorithmic aspects of how 38 00:01:24,873 --> 00:01:26,540 we might compute one of the measurements 39 00:01:26,540 --> 00:01:32,520 that I believe is fairly common in that community. 40 00:01:32,520 --> 00:01:34,340 So let's kind of go through these problems. 41 00:01:34,340 --> 00:01:37,718 As usual, in 6.006, we like to take actually 42 00:01:37,718 --> 00:01:39,760 relatively straightforward computational problems 43 00:01:39,760 --> 00:01:41,740 and then dress them up with a lot of language 44 00:01:41,740 --> 00:01:43,840 to make it annoying for you guys to parse. 45 00:01:43,840 --> 00:01:47,240 And indeed, this problem is no exception to that. 46 00:01:47,240 --> 00:01:50,560 So in this problem, we're given an undirected graph. 47 00:01:56,050 --> 00:02:00,980 And as usual, we will call him G. 48 00:02:00,980 --> 00:02:02,960 And we define a particular number 49 00:02:02,960 --> 00:02:05,380 that we're trying to measure in this problem, right? 50 00:02:05,380 --> 00:02:12,140 So in particular, if we're given a vertex v, 51 00:02:12,140 --> 00:02:15,860 then we can define something called the eccentricity of v, 52 00:02:15,860 --> 00:02:19,470 which is the distance to the farthest-away thing. 
53 00:02:19,470 --> 00:02:22,250 So in particular, we can define-- 54 00:02:27,290 --> 00:02:31,730 it's going to be given by the following, which is 55 00:02:31,730 --> 00:02:35,540 the max over all the possible-- 56 00:02:35,540 --> 00:02:38,347 I'll try to make sure my notation-- 57 00:02:38,347 --> 00:02:39,930 oops, I've already-- well, that's OK-- 58 00:02:42,990 --> 00:02:50,070 over all the other vertices of the distance from v to w. 59 00:02:50,070 --> 00:02:52,140 So I'm standing at a point on a graph. 60 00:02:52,140 --> 00:02:54,090 And now I like make a loud noise. 61 00:02:54,090 --> 00:02:56,850 And the last person to hear me, the distance to him 62 00:02:56,850 --> 00:03:00,760 would be the eccentricity of that vertex. 63 00:03:00,760 --> 00:03:06,000 And so this is some kind of notion of radius or diameter, 64 00:03:06,000 --> 00:03:08,197 but sort of planted at a point. 65 00:03:08,197 --> 00:03:10,530 And then if we want to learn a property not of a vertex, 66 00:03:10,530 --> 00:03:13,230 but of the entire graph, one thing we can do 67 00:03:13,230 --> 00:03:14,250 is define the radius. 68 00:03:17,510 --> 00:03:25,390 And that is given by R of G is the min over all 69 00:03:25,390 --> 00:03:31,390 of the different vertices, u, of the eccentricity of u. 70 00:03:31,390 --> 00:03:33,640 OK, so I think this is one of these definitions that's 71 00:03:33,640 --> 00:03:35,425 really annoying to parse and think about. 72 00:03:35,425 --> 00:03:37,300 So we should draw a little bit of a schematic 73 00:03:37,300 --> 00:03:40,270 and see what's going on here, because 74 00:03:40,270 --> 00:03:42,250 especially as a geometry professor, this one's 75 00:03:42,250 --> 00:03:44,710 kind of nice, because it translates directly to what you 76 00:03:44,710 --> 00:03:46,010 might do in metric geometry. 77 00:03:46,010 --> 00:03:49,780 So let's say that I have a circle here. 
78 00:03:49,780 --> 00:03:51,610 And I want the world's most complicated way 79 00:03:51,610 --> 00:03:53,890 of defining its radius. 80 00:03:53,890 --> 00:03:56,890 So for any given point-- 81 00:03:56,890 --> 00:03:58,120 there's a point in a circle. 82 00:03:58,120 --> 00:03:59,470 That's a circle, in case you were wondering. 83 00:03:59,470 --> 00:04:01,160 You know those internet contests where 84 00:04:01,160 --> 00:04:03,160 they have people that just walk up to the board, 85 00:04:03,160 --> 00:04:05,350 and draw perfect circles, and then leave? 86 00:04:05,350 --> 00:04:09,280 I unfortunately am not an expert at this matter. 87 00:04:09,280 --> 00:04:11,740 But anyway, so if we think of a point-- 88 00:04:11,740 --> 00:04:14,080 like a circle as some analog of our graph, 89 00:04:14,080 --> 00:04:15,730 and then I draw a point, which might 90 00:04:15,730 --> 00:04:19,360 be the analog of a vertex, then what is the eccentricity? 91 00:04:19,360 --> 00:04:23,350 Well, it's the distance to the farthest-away point, right? 92 00:04:23,350 --> 00:04:28,450 So for this guy, it might be the length of this line, 93 00:04:28,450 --> 00:04:30,730 roughly, because that's the distance 94 00:04:30,730 --> 00:04:32,720 to the farthest-away thing. 95 00:04:32,720 --> 00:04:36,280 So for every different point that I draw, 96 00:04:36,280 --> 00:04:38,110 each point has its own farthest-away point 97 00:04:38,110 --> 00:04:39,857 in the circle. 98 00:04:39,857 --> 00:04:41,440 So there's some positive number that's 99 00:04:41,440 --> 00:04:43,680 assigned to every single point in this domain. 100 00:04:43,680 --> 00:04:46,870 And if I take the minimum of that positive number, 101 00:04:46,870 --> 00:04:48,694 where do you think I end up? 102 00:04:48,694 --> 00:04:49,990 AUDIENCE: Center of the circle. 103 00:04:49,990 --> 00:04:51,490 JUSTIN SOLOMON: That's right, Jason. 
104 00:04:51,490 --> 00:04:53,050 I end up in the center of the circle, 105 00:04:53,050 --> 00:04:55,210 because if I think about it, the distance 106 00:04:55,210 --> 00:04:58,750 to the farthest-away point in this domain, one thing you 107 00:04:58,750 --> 00:05:00,250 can convince yourself is that that's 108 00:05:00,250 --> 00:05:02,270 sort of as small as possible. 109 00:05:02,270 --> 00:05:04,240 So this is what we might call a min-max problem 110 00:05:04,240 --> 00:05:07,000 in optimization, because we are minimizing 111 00:05:07,000 --> 00:05:09,820 the maximum distance, yeah? 112 00:05:09,820 --> 00:05:11,420 This also shows up in game theory, 113 00:05:11,420 --> 00:05:13,700 all kinds of different places that solve this stuff. 114 00:05:13,700 --> 00:05:15,492 But thankfully, in this particular problem, 115 00:05:15,492 --> 00:05:17,020 we're not going to need all that. 116 00:05:17,020 --> 00:05:18,680 OK, so right. 117 00:05:18,680 --> 00:05:22,100 So this homework problem has two parts. 118 00:05:22,100 --> 00:05:25,480 The first is to give an algorithm for computing 119 00:05:25,480 --> 00:05:27,152 the radius of a graph. 120 00:05:27,152 --> 00:05:29,110 And then the second one is to give an algorithm 121 00:05:29,110 --> 00:05:31,540 for approximating the radius of the graph 122 00:05:31,540 --> 00:05:35,032 really quickly, or more quickly than the first part. 123 00:05:35,032 --> 00:05:37,240 I actually don't know if there's a lower bound there. 124 00:05:37,240 --> 00:05:40,060 But come back to that later. 125 00:05:40,060 --> 00:05:47,430 OK, so in part a, we're given G. And moreover, we're 126 00:05:47,430 --> 00:05:49,930 given one additional piece of information, which we actually 127 00:05:49,930 --> 00:05:50,965 do need in this problem. 128 00:05:50,965 --> 00:05:53,590 I think it's one of those words that kind of slips past us when 129 00:05:53,590 --> 00:05:55,270 we read a graph theory problem. 
130 00:05:55,270 --> 00:05:57,430 But it's important to pay attention, of course. 131 00:05:57,430 --> 00:06:02,427 And that is, we're given G. And it's connected. 132 00:06:02,427 --> 00:06:04,510 I suppose, really, it should be given connected G. 133 00:06:04,510 --> 00:06:06,530 But that's OK. 134 00:06:06,530 --> 00:06:14,860 Now what we want is to compute the radius of G in time that 135 00:06:14,860 --> 00:06:17,920 looks like the product of the vertices, 136 00:06:17,920 --> 00:06:20,320 the number of vertices-- oops-- 137 00:06:20,320 --> 00:06:23,800 times the number of edges, or the number of times 138 00:06:23,800 --> 00:06:25,530 the number of vertices. 139 00:06:25,530 --> 00:06:27,442 Your instructor struggles to speak and write 140 00:06:27,442 --> 00:06:28,150 at the same time. 141 00:06:28,150 --> 00:06:30,310 But it's a skill that I'm working on. 142 00:06:30,310 --> 00:06:33,750 And frankly, handwriting is much easier with this little chalk. 143 00:06:33,750 --> 00:06:34,600 OK. 144 00:06:34,600 --> 00:06:38,620 So essentially-- I used to have a math professor in college 145 00:06:38,620 --> 00:06:40,900 that used this phrase all the time that was just 146 00:06:40,900 --> 00:06:43,360 like, it's important not to think here. 147 00:06:43,360 --> 00:06:48,130 The problem asks you to compute the radius of a graph. 148 00:06:48,130 --> 00:06:49,840 And in some sense, there's an algorithm 149 00:06:49,840 --> 00:06:52,990 that just writes itself for computing the radius, right? 150 00:06:52,990 --> 00:06:55,900 Because the radius is the min over all the vertices, 151 00:06:55,900 --> 00:06:57,460 of the eccentricity. 152 00:06:57,460 --> 00:06:59,690 The eccentricity is the max distance. 153 00:06:59,690 --> 00:07:01,780 So what would be the simplest thing to do here? 
154 00:07:01,780 --> 00:07:05,528 Well, in some sense, it would be to loop over all the vertices, 155 00:07:05,528 --> 00:07:07,570 compute their distance to all the other vertices, 156 00:07:07,570 --> 00:07:11,600 and take the max for each one of those 157 00:07:11,600 --> 00:07:14,710 and then the min over all of the guys in the outer loop. 158 00:07:14,710 --> 00:07:17,170 Since I just said a sentence that I'm realizing doesn't 159 00:07:17,170 --> 00:07:20,860 parse particularly well, let's sort of write down what I mean, 160 00:07:20,860 --> 00:07:24,130 which is to say, we're going to think of there being an outer-- 161 00:07:24,130 --> 00:07:27,170 that's why we don't use this chalk-- an outer for loop, 162 00:07:27,170 --> 00:07:29,230 which is computing this min. 163 00:07:29,230 --> 00:07:33,070 So-- right? 164 00:07:33,070 --> 00:07:34,320 Well, what are we going to do? 165 00:07:34,320 --> 00:07:45,390 We can compute the shortest path distance 166 00:07:45,390 --> 00:08:01,450 to all of the other w in my graph, take the max of w-- 167 00:08:01,450 --> 00:08:04,860 of distance from v to all the other w's. 168 00:08:04,860 --> 00:08:08,010 Obviously, we can kind of do these two at the same time. 169 00:08:08,010 --> 00:08:10,980 And then, if this number is bigger than my current max, 170 00:08:10,980 --> 00:08:12,470 keep it. 171 00:08:12,470 --> 00:08:13,470 Oh, yikes. 172 00:08:13,470 --> 00:08:18,280 If it's smaller than the current estimate I have of the radius, 173 00:08:18,280 --> 00:08:18,960 then I keep it. 174 00:08:18,960 --> 00:08:20,835 And if it's not, then I throw it away, right? 175 00:08:20,835 --> 00:08:25,480 So maybe I initialize my radius at infinity. 176 00:08:25,480 --> 00:08:29,410 And now let's call this number, I don't know, little r. 177 00:08:29,410 --> 00:08:34,409 If little r is less than big R, then just 178 00:08:34,409 --> 00:08:36,650 keep it around, right? 
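The loop described above can be sketched in Python. This is a minimal illustration, not the course's reference solution: the adjacency-dictionary representation and the names `bfs_distances` and `radius` are assumptions made for the example, and the graph is assumed to be connected and undirected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:          # first visit gives the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def radius(adj):
    """Radius of a connected undirected graph {vertex: list of neighbors}."""
    R = float("inf")                   # initialize the radius estimate at infinity
    for v in adj:                      # outer loop computes the min
        r = max(bfs_distances(adj, v).values())  # eccentricity of v (the max)
        if r < R:                      # keep r only if it improves the estimate
            R = r
    return R

# Hypothetical test graphs: a 5-vertex line and a 5-vertex star.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(radius(path), radius(star))
```

The line graph is the spread-out case and the star is the compact one, so the star should come out with the smaller radius.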
179 00:08:36,650 --> 00:08:38,336 And so if we think about it, I don't 180 00:08:38,336 --> 00:08:40,669 think it's terribly hard to prove that this algorithm is 181 00:08:40,669 --> 00:08:42,799 correct, because it's sort of just taking 182 00:08:42,799 --> 00:08:45,500 our definition of what the radius of a graph is 183 00:08:45,500 --> 00:08:49,440 and translating it into a braindead algorithm. 184 00:08:49,440 --> 00:08:52,310 So I think really, the challenge here 185 00:08:52,310 --> 00:08:57,420 is proving the runtime in this particular algorithm. 186 00:08:57,420 --> 00:08:59,020 So what does our runtime look like? 187 00:08:59,020 --> 00:09:03,800 So we have a loop over vertices. 188 00:09:03,800 --> 00:09:07,640 So I kind of incur a factor of mod v here. 189 00:09:07,640 --> 00:09:11,030 And then, well, our graph is unweighted. 190 00:09:11,030 --> 00:09:14,210 So one strategy for computing the shortest path distance 191 00:09:14,210 --> 00:09:16,783 would be breadth-first search. 192 00:09:16,783 --> 00:09:18,200 I think that's what's in my notes. 193 00:09:18,200 --> 00:09:19,340 Yep. 194 00:09:19,340 --> 00:09:21,680 And in general, breadth-first search, 195 00:09:21,680 --> 00:09:28,380 if you recall from lecture, takes mod v plus mod E time. 196 00:09:28,380 --> 00:09:31,817 So the question is, OK, so if I multiply these things together, 197 00:09:31,817 --> 00:09:32,400 what do I get? 198 00:09:32,400 --> 00:09:43,400 I get O of v times v plus E, like that, time. 199 00:09:43,400 --> 00:09:44,450 But like, uh-oh. 200 00:09:44,450 --> 00:09:47,450 That's not the time that my homework problem wanted, right? 201 00:09:47,450 --> 00:09:49,070 Because the homework problem asks 202 00:09:49,070 --> 00:09:53,000 you to solve this in just mod v times E time. 203 00:09:53,000 --> 00:09:56,330 And somehow we've incurred an extra factor. 
204 00:09:56,330 --> 00:09:59,297 And now we have to figure out why this is actually OK, 205 00:09:59,297 --> 00:10:00,630 or we have to fix our algorithm. 206 00:10:00,630 --> 00:10:03,170 But in this case, it turns out that this runtime 207 00:10:03,170 --> 00:10:07,360 is just inaccurate, OK? 208 00:10:07,360 --> 00:10:08,590 What's our intuition here? 209 00:10:08,590 --> 00:10:11,650 Well, I kind of underlined it for you here. 210 00:10:11,650 --> 00:10:13,442 Our graph is connected. 211 00:10:13,442 --> 00:10:15,420 And in particular, there's going to be 212 00:10:15,420 --> 00:10:18,600 a nice property of connected graphs, which 213 00:10:18,600 --> 00:10:21,210 is that the number of edges dwarfs the number of vertices 214 00:10:21,210 --> 00:10:22,060 here. 215 00:10:22,060 --> 00:10:25,200 So really, if we have v plus E, in some sense, 216 00:10:25,200 --> 00:10:27,480 this is going to look like a constant factor times E 217 00:10:27,480 --> 00:10:29,320 plus another E here. 218 00:10:29,320 --> 00:10:32,010 So this whole thing is going to be v times E time, yeah? 219 00:10:32,010 --> 00:10:36,310 So let's make that argument a tiny bit more formal here. 220 00:10:36,310 --> 00:10:41,250 So in particular, we know that G is connected. 221 00:10:46,050 --> 00:10:49,170 And every vertex-- so in particular, 222 00:10:49,170 --> 00:10:51,270 what can happen here is-- 223 00:10:51,270 --> 00:10:54,510 OK, unless my graph consists of one vertex, which 224 00:10:54,510 --> 00:10:57,387 is a case you could dispose of pretty quickly, what I can't 225 00:10:57,387 --> 00:10:59,970 have is a graph that looks like this, like one vertex and then 226 00:10:59,970 --> 00:11:01,350 an edge floating around there. 227 00:11:01,350 --> 00:11:05,790 Everything has to be connected together-- 228 00:11:05,790 --> 00:11:07,180 connected together. 
229 00:11:07,180 --> 00:11:10,530 OK, so in particular, what this means 230 00:11:10,530 --> 00:11:22,770 is that every vertex is adjacent to at least one edge, again, 231 00:11:22,770 --> 00:11:24,930 except, I guess, technically, the one vertex case. 232 00:11:24,930 --> 00:11:26,520 But I think we can convince ourselves 233 00:11:26,520 --> 00:11:28,110 that for any graph of constant size, 234 00:11:28,110 --> 00:11:30,540 we're not terribly worried about it, right? 235 00:11:30,540 --> 00:11:33,090 It's just the asymptotics that matter in this problem. 236 00:11:33,090 --> 00:11:35,700 OK, so if every vertex is adjacent to one 237 00:11:35,700 --> 00:11:38,700 edge, well-- and remember that every edge, kind 238 00:11:38,700 --> 00:11:42,070 of by definition of an edge, is adjacent to two vertices. 239 00:11:42,070 --> 00:11:45,210 Then what we can conclude is that the number of vertices 240 00:11:45,210 --> 00:11:48,480 is less than or equal to the number of edges divided by 2. 241 00:11:48,480 --> 00:11:50,860 This is a conservative estimate. 242 00:11:50,860 --> 00:11:53,680 And so in particular, what does that mean? 243 00:11:53,680 --> 00:11:59,478 It means that v is big O of E. This 244 00:11:59,478 --> 00:12:01,770 is a case where we have to be quite careful about big O 245 00:12:01,770 --> 00:12:04,310 being an upper bound, right? 246 00:12:04,310 --> 00:12:07,710 In this case, typically, v is much less than E-- 247 00:12:07,710 --> 00:12:09,720 well, depends how many edges, like 248 00:12:09,720 --> 00:12:12,760 if you have a really dense graph or not. 249 00:12:12,760 --> 00:12:14,380 But in this case, what does that mean? 250 00:12:14,380 --> 00:12:22,030 That means that mod v plus mod E is really just big O of mod E, 251 00:12:22,030 --> 00:12:22,530 right? 252 00:12:22,530 --> 00:12:26,220 Because this is big O of mod E plus big O of mod E. 
253 00:12:26,220 --> 00:12:35,840 And that means that our problem really runs in v times E time, 254 00:12:35,840 --> 00:12:38,320 which is what we wanted in our problem. 255 00:12:38,320 --> 00:12:44,340 Are there any questions from our audience on part a here? 256 00:12:44,340 --> 00:12:44,840 Cool. 257 00:12:44,840 --> 00:12:47,710 AUDIENCE: I don't quite understand why-- 258 00:12:47,710 --> 00:12:49,570 where you went from the first statement 259 00:12:49,570 --> 00:12:51,520 to the second statement there. 260 00:12:51,520 --> 00:12:55,318 At least one edge implies v is less than or equal to E over 2. 261 00:12:55,318 --> 00:12:56,360 JUSTIN SOLOMON: Oh, yeah. 262 00:12:56,360 --> 00:12:59,110 So I guess there's sort of two things that matter here, right? 263 00:12:59,110 --> 00:13:02,800 Every vertex is adjacent to at least one edge. 264 00:13:02,800 --> 00:13:08,680 And every edge-- yikes. 265 00:13:08,680 --> 00:13:16,593 Every edge is adjacent to two vertices. 266 00:13:16,593 --> 00:13:18,760 I guess, actually, it's the second one that matters. 267 00:13:18,760 --> 00:13:22,070 So you can never have a vertex just floating by itself. 268 00:13:22,070 --> 00:13:24,625 So one way that I can count my number of vertices 269 00:13:24,625 --> 00:13:26,530 is by looking at the number of edges 270 00:13:26,530 --> 00:13:30,250 and saying that, well, every edge can touch exactly two 271 00:13:30,250 --> 00:13:30,820 vertices. 272 00:13:30,820 --> 00:13:33,160 Every vertex has to touch exactly-- 273 00:13:33,160 --> 00:13:34,725 well, at least one edge. 274 00:13:34,725 --> 00:13:36,100 So if you put those together, you 275 00:13:36,100 --> 00:13:38,363 can convince yourself that this bound has to be 2. 276 00:13:38,363 --> 00:13:40,030 If you want to be conservative about it, 277 00:13:40,030 --> 00:13:42,760 you can just get rid of the divided by 2 here, I guess. 278 00:13:42,760 --> 00:13:45,570 It doesn't really matter.
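The counting argument can be written out as a short derivation. This is a sketch assuming G is connected, undirected, and has at least two vertices; the degree notation deg(v) is introduced here for illustration. Counting edge endpoints gives the bound with the factor of 2 on the edge side, which is all the runtime claim needs. The spanning-tree fact that a connected graph has at least |V| - 1 edges gives the same conclusion.

```latex
\begin{align*}
\sum_{v \in V} \deg(v) &= 2|E|
  && \text{each edge touches exactly two vertices} \\
|V| \;\le\; \sum_{v \in V} \deg(v) &= 2|E|
  && \text{each vertex touches at least one edge} \\
|V| + |E| &\le 3|E| = O(|E|) \\
|V| \cdot O(|V| + |E|) &= O(|V| \cdot |E|)
  && \text{one BFS per vertex}
\end{align*}
```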
279 00:13:45,570 --> 00:13:47,280 Any other questions from our audience? 280 00:13:47,280 --> 00:13:48,330 Cool. 281 00:13:48,330 --> 00:13:52,710 All right, so now, let's take a look at part b. 282 00:13:52,710 --> 00:13:57,248 So in part b here, they ask us to basically do some version 283 00:13:57,248 --> 00:13:58,290 of the same thing, right? 284 00:13:58,290 --> 00:14:00,870 They want us to now approximate the radius. 285 00:14:00,870 --> 00:14:03,850 But we're given a smaller budget of time. 286 00:14:03,850 --> 00:14:08,250 So now what we want in number b here 287 00:14:08,250 --> 00:14:17,730 is, compute an R star such that-- 288 00:14:17,730 --> 00:14:21,420 I got yelled at in my textbook that st should always 289 00:14:21,420 --> 00:14:22,830 be "subject to." 290 00:14:22,830 --> 00:14:24,900 I got an angry review of the textbook 291 00:14:24,900 --> 00:14:27,300 I wrote because of that, which was puzzling to me. 292 00:14:27,300 --> 00:14:32,310 But amazon.com is not a great source of useful data. 293 00:14:32,310 --> 00:14:35,280 But in any event, we want R star, which 294 00:14:35,280 --> 00:14:40,000 is sandwiched between the radius of G 295 00:14:40,000 --> 00:14:45,740 and 2 times the radius of G, like that. 296 00:14:45,740 --> 00:14:48,115 Now, notice-- so in other words, we want to-- 297 00:14:48,115 --> 00:14:49,990 the first thing to notice is we want to upper 298 00:14:49,990 --> 00:14:53,060 bound the radius of our graph. 299 00:14:53,060 --> 00:14:55,310 And already, this should suggest to us 300 00:14:55,310 --> 00:14:58,520 how we might solve this problem, because if we take a look back 301 00:14:58,520 --> 00:15:00,600 at our definition of radius over here, 302 00:15:00,600 --> 00:15:03,440 notice that the radius is a min, right? 303 00:15:03,440 --> 00:15:06,230 So what's going to happen if I returned 304 00:15:06,230 --> 00:15:08,330 epsilon of some other vertex? 
305 00:15:08,330 --> 00:15:10,490 Well, it's lower bounded by the radius, 306 00:15:10,490 --> 00:15:12,740 because the radius is the smallest possible epsilon 307 00:15:12,740 --> 00:15:14,300 over any vertex. 308 00:15:14,300 --> 00:15:16,160 That make sense? 309 00:15:16,160 --> 00:15:18,320 Now, when I was doing this problem, 310 00:15:18,320 --> 00:15:21,290 because, you know, I'm the dumb instructor of the three, 311 00:15:21,290 --> 00:15:25,700 I said, well, OK, but maybe I need 312 00:15:25,700 --> 00:15:29,480 to be somehow judicious about what vertex I choose. 313 00:15:29,480 --> 00:15:32,612 Like, well, in some sense, what this suggests 314 00:15:32,612 --> 00:15:34,320 is that maybe I choose some other vertex, 315 00:15:34,320 --> 00:15:36,770 and compute its radius, and return 316 00:15:36,770 --> 00:15:38,900 that as our approximation. 317 00:15:38,900 --> 00:15:41,420 But of course, the problem wants me to sandwich it 318 00:15:41,420 --> 00:15:42,780 between two values here. 319 00:15:42,780 --> 00:15:46,498 So in addition to upper bounding R, 320 00:15:46,498 --> 00:15:48,540 I want to be less than 2 times R. In other words, 321 00:15:48,540 --> 00:15:51,860 my approximation is within a constant factor. 322 00:15:51,860 --> 00:15:54,470 I tried some weird stuff, like farthest point sampling and so 323 00:15:54,470 --> 00:15:54,970 on. 324 00:15:54,970 --> 00:15:57,140 Then I realized that you actually don't really 325 00:15:57,140 --> 00:15:58,490 need to do any of that. 326 00:15:58,490 --> 00:16:02,750 One thing you can do is literally choose any vertex, 327 00:16:02,750 --> 00:16:06,020 return its eccentricity. 328 00:16:06,020 --> 00:16:08,330 And that's actually good enough. 329 00:16:08,330 --> 00:16:12,050 So here's our algorithm. 330 00:16:12,050 --> 00:16:14,287 Let me go back to my notes here. 331 00:16:14,287 --> 00:16:16,370 I don't know why I'm following my notes, actually. 
332 00:16:16,370 --> 00:16:17,100 I could do this off the top of my head. 333 00:16:17,100 --> 00:16:19,400 But they make me feel better if I'm looking 334 00:16:19,400 --> 00:16:21,660 at them at the same time. 335 00:16:21,660 --> 00:16:31,830 So in particular, what I'm going to do is choose u in V. Let me 336 00:16:31,830 --> 00:16:32,730 be clear here-- 337 00:16:32,730 --> 00:16:36,960 any u in V. So if I'm using some data structure 338 00:16:36,960 --> 00:16:40,620 to store all my vertices, I just take the first one, whatever. 339 00:16:40,620 --> 00:16:53,012 And two, I'm going to return R star is equal to epsilon of u. 340 00:16:53,012 --> 00:16:54,970 Now, of course, this isn't really an algorithm. 341 00:16:54,970 --> 00:16:56,770 If you do this on your homework, you'll lose points. 342 00:16:56,770 --> 00:16:58,478 And the reason is that I haven't told you 343 00:16:58,478 --> 00:17:00,545 how to compute this value here. 344 00:17:00,545 --> 00:17:02,920 So if you were to write out your answer for this problem, 345 00:17:02,920 --> 00:17:04,810 of course, you should tell us that, 346 00:17:04,810 --> 00:17:07,420 like, really, to compute epsilon, what do I do? 347 00:17:07,420 --> 00:17:09,250 I use breadth-first search to compute 348 00:17:09,250 --> 00:17:11,710 the shortest path from u to all the other vertices. 349 00:17:11,710 --> 00:17:14,460 And then I guess I take the max value here. 350 00:17:14,460 --> 00:17:17,200 OK, so I think you guys can fill in the details 351 00:17:17,200 --> 00:17:19,000 of the algorithm. 352 00:17:19,000 --> 00:17:20,740 The bigger challenge is going to be 353 00:17:20,740 --> 00:17:24,400 to prove that this is actually a good bound, right? 354 00:17:24,400 --> 00:17:28,480 And so, in other words, what we need to prove here-- 355 00:17:28,480 --> 00:17:29,063 I don't know. 356 00:17:29,063 --> 00:17:29,980 Like, there's a claim. 357 00:17:29,980 --> 00:17:31,000 There's a proposition.
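The two-step approximation just described (pick any vertex, return its eccentricity) might look like the following in Python. This is a hedged sketch: the function name `approx_radius` and the adjacency-dictionary input are assumptions for illustration, and the graph is assumed connected and undirected. A single breadth-first search is all it does, so it runs in O(|V| + |E|) time, which is O(|E|) for a connected graph.

```python
from collections import deque

def approx_radius(adj):
    """Eccentricity of an arbitrary vertex of a connected undirected graph."""
    u = next(iter(adj))            # any vertex works; take the first one stored
    dist = {u: 0}
    queue = deque([u])
    while queue:                   # breadth-first search from u
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())      # distance to the farthest-away vertex

# Hypothetical example: a 5-vertex line graph, whose true radius is 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(approx_radius(path))
```

Starting from an endpoint of the line is the worst case here: the returned value is 4, which is twice the true radius of 2.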
358 00:17:31,000 --> 00:17:33,202 There's a theorem, somewhere on that axis. 359 00:17:33,202 --> 00:17:34,660 I'm going to call this one a claim. 360 00:17:34,660 --> 00:17:37,600 I'm going to downgrade it. 361 00:17:37,600 --> 00:17:42,820 And that is that the radius of my graph 362 00:17:42,820 --> 00:17:47,680 is less than or equal to R star, which is less than 363 00:17:47,680 --> 00:17:53,140 or equal to 2 times the radius of my graph. 364 00:17:53,140 --> 00:17:57,586 OK, so let's prove this thing. 365 00:17:57,586 --> 00:18:02,240 I'm managing to use all of my boards on one problem here. 366 00:18:02,240 --> 00:18:05,720 OK, so in particular, to prove this claim, 367 00:18:05,720 --> 00:18:07,190 I need to prove two inequalities. 368 00:18:07,190 --> 00:18:09,680 This is like two homework problems in one. 369 00:18:09,680 --> 00:18:11,010 So let's number those off. 370 00:18:11,010 --> 00:18:12,080 There's 1. 371 00:18:12,080 --> 00:18:14,040 There's 2. 372 00:18:14,040 --> 00:18:19,080 OK, so let's do inequality 1. 373 00:18:19,080 --> 00:18:23,860 I think we can squeeze him into a relatively small space. 374 00:18:23,860 --> 00:18:28,000 So remember, what is the radius of my graph? 375 00:18:28,000 --> 00:18:30,960 Well, just by definition, we know 376 00:18:30,960 --> 00:18:39,270 that it's the min over all possible u of epsilon of u. 377 00:18:39,270 --> 00:18:41,020 So in particular, what's-- 378 00:18:41,020 --> 00:18:43,020 the nice property about the minimum of something 379 00:18:43,020 --> 00:18:46,050 is that it's less than everything else or equal. 380 00:18:46,050 --> 00:18:49,860 So this-- maybe let's call this u0, 381 00:18:49,860 --> 00:18:52,500 just to distinguish between that and the notation I 382 00:18:52,500 --> 00:18:53,940 have on the left-hand side. 383 00:18:53,940 --> 00:18:58,210 This is less than or equal to epsilon of u, 384 00:18:58,210 --> 00:19:02,770 because, I don't know, because min, yeah? 
385 00:19:02,770 --> 00:19:05,200 So that actually already-- 386 00:19:05,200 --> 00:19:08,770 and of course, this is exactly what we chose to be our R star. 387 00:19:08,770 --> 00:19:12,340 So our first part of our proof is done here. 388 00:19:12,340 --> 00:19:14,680 So this is the easy part. 389 00:19:14,680 --> 00:19:16,218 And sometimes, like, this is sort 390 00:19:16,218 --> 00:19:17,510 of what inspired our algorithm. 391 00:19:17,510 --> 00:19:20,890 So we expect this bound to be kind of straightforward. 392 00:19:20,890 --> 00:19:23,680 OK, but the other half of the problem 393 00:19:23,680 --> 00:19:26,120 is a little more tricky. 394 00:19:26,120 --> 00:19:28,360 And actually, there's a solution in the notes. 395 00:19:28,360 --> 00:19:30,550 And then I decided, just to make it a little more inaccurate, 396 00:19:30,550 --> 00:19:31,342 to write up my own. 397 00:19:33,940 --> 00:19:36,753 But actually, I have an ulterior motive, which 398 00:19:36,753 --> 00:19:38,170 is I notice in this class we don't 399 00:19:38,170 --> 00:19:40,900 tend to use a tiny piece of notation that I like. 400 00:19:40,900 --> 00:19:43,060 So for my convenience in future problem sessions, 401 00:19:43,060 --> 00:19:46,000 I thought I'd introduce it now. 402 00:19:46,000 --> 00:19:49,040 So we're solving a minimization problem. 403 00:19:49,040 --> 00:19:51,040 The nice thing is that in this class, everything 404 00:19:51,040 --> 00:19:52,307 we do is finite. 405 00:19:52,307 --> 00:19:53,890 If you take my graduate course, that's 406 00:19:53,890 --> 00:19:55,100 not going to be the case. 407 00:19:55,100 --> 00:19:56,230 In fact, actually, in lecture 2, we're 408 00:19:56,230 --> 00:19:57,813 going to do like variational calculus. 409 00:19:57,813 --> 00:20:01,400 But in this course, what does that mean?
410 00:20:01,400 --> 00:20:03,700 That means if I minimize a function, 411 00:20:03,700 --> 00:20:05,920 there is actually a vertex in my graph 412 00:20:05,920 --> 00:20:09,160 that achieves that minimum, right? 413 00:20:09,160 --> 00:20:11,330 This is different than like, for example, if I-- 414 00:20:11,330 --> 00:20:12,080 then I'll shut up. 415 00:20:12,080 --> 00:20:17,950 But if I wanted to minimize-- here's f of x equals 1 over x, 416 00:20:17,950 --> 00:20:19,780 and I ask you for the minimum value. 417 00:20:19,780 --> 00:20:23,770 Well, it's over all x greater than or equal to 0. 418 00:20:23,770 --> 00:20:27,340 Well, the minimum value is 0 if I take x off to infinity. 419 00:20:27,340 --> 00:20:29,110 But it never quite crosses 0. 420 00:20:29,110 --> 00:20:31,300 So you're kind of in this weird universe. 421 00:20:31,300 --> 00:20:33,940 If you remember Jason's lecture, he talked about infs and sups 422 00:20:33,940 --> 00:20:35,770 as opposed to mins and maxes. 423 00:20:35,770 --> 00:20:38,672 But this can't happen in our problem, 424 00:20:38,672 --> 00:20:40,630 because when we compute a min, there's actually 425 00:20:40,630 --> 00:20:42,400 a vertex that achieves it. 426 00:20:42,400 --> 00:20:45,880 And that vertex, we call arg min. 427 00:20:45,880 --> 00:20:48,980 And so this arg here stands for argument. 428 00:20:48,980 --> 00:20:51,370 So one thing that I can do is say, OK. 429 00:20:51,370 --> 00:20:59,190 So remember that my radius is the min, the min 430 00:20:59,190 --> 00:21:03,660 over all u of epsilon of u. 431 00:21:03,660 --> 00:21:09,045 Then I'm going to define a vertex, u0, to be the arg 432 00:21:09,045 --> 00:21:14,620 min over u of epsilon of u. 433 00:21:14,620 --> 00:21:16,930 And this is just fancy notation for saying, give me 434 00:21:16,930 --> 00:21:19,780 the actual vertex that makes this value as 435 00:21:19,780 --> 00:21:22,095 small as possible, yeah? 
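The difference between min and arg min is easy to see on a finite set. In this small Python illustration, the eccentricity values are made-up numbers for a hypothetical five-vertex graph, and the variable names are assumptions for the example.

```python
# Hypothetical eccentricities for five vertices (illustrative values only).
ecc = {"a": 4, "b": 3, "c": 2, "d": 3, "e": 4}

R = min(ecc.values())        # min: the smallest eccentricity, i.e. the radius
u0 = min(ecc, key=ecc.get)   # arg min: a vertex that achieves that value

print(R, u0)
```

Because the set of vertices is finite, some vertex always attains the minimum, so `ecc[u0] == R` holds. If several vertices tie, `min` returns the first one it encounters.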
436 00:21:22,095 --> 00:21:23,470 The nice thing about this problem 437 00:21:23,470 --> 00:21:27,268 is that we're not worried yet about how we make runtime. 438 00:21:27,268 --> 00:21:28,810 So I can construct this kind of thing 439 00:21:28,810 --> 00:21:31,510 and not worry about how I actually found it, right? 440 00:21:31,510 --> 00:21:33,980 OK, so let's say that we did that. 441 00:21:33,980 --> 00:21:38,740 So this is, find me the vertex that actually gives me 442 00:21:38,740 --> 00:21:39,940 the radius, right? 443 00:21:39,940 --> 00:21:41,690 So in other words, I find that vertex. 444 00:21:41,690 --> 00:21:45,110 And then I find his or her farthest-away vertex 445 00:21:45,110 --> 00:21:46,370 and measure the distance. 446 00:21:46,370 --> 00:21:50,517 And that distance is the radius of my graph, OK? 447 00:21:50,517 --> 00:21:51,600 So let's actually do that. 448 00:21:51,600 --> 00:21:56,710 So in particular, then I can define a second vertex, v0. 449 00:21:56,710 --> 00:21:59,103 Well, how does the radius algorithm work? 450 00:21:59,103 --> 00:22:00,520 I find the central guy, and then I 451 00:22:00,520 --> 00:22:02,510 find the one that's farthest away. 452 00:22:02,510 --> 00:22:05,110 So we're going to make him the arg 453 00:22:05,110 --> 00:22:11,290 max over all v in my graph of the distance starting 454 00:22:11,290 --> 00:22:14,800 at u0 to any v. 455 00:22:14,800 --> 00:22:17,380 So if I think about my circle-- 456 00:22:17,380 --> 00:22:18,910 that's a circle. 457 00:22:18,910 --> 00:22:22,570 Then u0 is like that center of my circle. 458 00:22:22,570 --> 00:22:27,505 And then v0 is like that far-away point. 459 00:22:27,505 --> 00:22:28,630 This is a schematic, right? 460 00:22:28,630 --> 00:22:30,520 My circle is really a graph in this problem. 461 00:22:30,520 --> 00:22:32,780 But I think the analogy actually works. 462 00:22:32,780 --> 00:22:33,910 OK. 463 00:22:33,910 --> 00:22:36,070 But in reality, my algorithm was braindead.
464 00:22:36,070 --> 00:22:38,090 I didn't actually compute u0. 465 00:22:38,090 --> 00:22:39,315 I just randomly drew-- 466 00:22:39,315 --> 00:22:40,690 sorry, I shouldn't use that word. 467 00:22:40,690 --> 00:22:44,650 I arbitrarily drew a vertex, u, and then computed 468 00:22:44,650 --> 00:22:47,350 the farthest-away distance from that guy. 469 00:22:47,350 --> 00:22:48,880 And of course, what we have to check 470 00:22:48,880 --> 00:22:51,940 is that that thing is within a factor of 2 of what I wanted. 471 00:22:51,940 --> 00:22:53,230 So OK. 472 00:22:53,230 --> 00:22:55,120 If I have u, then I'm additionally 473 00:22:55,120 --> 00:22:58,780 going to define one more thing called v. And that-- 474 00:23:09,810 --> 00:23:10,950 oh boy. 475 00:23:10,950 --> 00:23:13,000 OK, so I'm noticing I'm saying one thing, 476 00:23:13,000 --> 00:23:14,130 and I'm writing another. 477 00:23:14,130 --> 00:23:16,860 u0 is the center of my graph. 478 00:23:16,860 --> 00:23:17,610 I think I said it. 479 00:23:17,610 --> 00:23:18,750 I just forgot to write it. 480 00:23:18,750 --> 00:23:21,630 And then this v is the farthest-away guy from him. 481 00:23:21,630 --> 00:23:24,300 So basically, the subscript 0 here means-- 482 00:23:24,300 --> 00:23:26,760 is the platonic ideal of what I wanted in my problem. 483 00:23:26,760 --> 00:23:29,730 And no subscript is going to mean the other one. 484 00:23:29,730 --> 00:23:33,757 So now I compute the farthest-away thing from the u 485 00:23:33,757 --> 00:23:35,340 that I actually chose in my algorithm. 486 00:23:35,340 --> 00:23:36,990 That's some v bar. 487 00:23:36,990 --> 00:23:39,570 So again, remember, my algorithm just says, OK, 488 00:23:39,570 --> 00:23:42,420 I'm going to choose some other point, v, 489 00:23:42,420 --> 00:23:47,590 and then return v's distance to some farthest-away point-- 490 00:23:47,590 --> 00:23:49,140 oh, sorry, choose another-- 491 00:23:49,140 --> 00:23:50,000 oh boy. 
492 00:23:50,000 --> 00:23:53,090 Choose a point, u, and return his distance 493 00:23:53,090 --> 00:23:55,130 to some far-away point, v. I think 494 00:23:55,130 --> 00:23:57,920 I've managed to lose everybody, knotting together 495 00:23:57,920 --> 00:23:59,580 u's and v's here. 496 00:23:59,580 --> 00:24:01,838 OK, so why'd I introduce all of this notation? 497 00:24:01,838 --> 00:24:04,130 Because this is what's going on in this problem, right? 498 00:24:04,130 --> 00:24:05,570 To actually compute the radius, I 499 00:24:05,570 --> 00:24:08,270 want to find the most central point, u0, and its distance 500 00:24:08,270 --> 00:24:10,250 to its farthest-away thing, v0. 501 00:24:10,250 --> 00:24:13,340 In reality, I arbitrarily chose the point u. 502 00:24:13,340 --> 00:24:16,122 And I returned u's distance to its farthest-away point, 503 00:24:16,122 --> 00:24:18,080 v. And I want to show that those two things are 504 00:24:18,080 --> 00:24:19,940 within a factor of 2 of each other. 505 00:24:19,940 --> 00:24:21,530 OK, that summary makes sense, 506 00:24:21,530 --> 00:24:23,470 even if I talked in circles for a little while. 507 00:24:23,470 --> 00:24:25,890 OK, so let's actually do that. 508 00:24:25,890 --> 00:24:29,480 So remember that the thing that I'm going to actually return 509 00:24:29,480 --> 00:24:30,500 is R star. 510 00:24:30,500 --> 00:24:35,210 And that is equal to the distance from u to v now, 511 00:24:35,210 --> 00:24:37,790 because I just made all these definitions. 512 00:24:37,790 --> 00:24:43,637 And now I get to use my favorite inequality.
513 00:24:43,637 --> 00:24:45,470 In fact, this is sort of the only inequality 514 00:24:45,470 --> 00:24:47,470 we know in this class so far, I think, 515 00:24:47,470 --> 00:24:49,840 which is the triangle inequality, which 516 00:24:49,840 --> 00:24:52,750 says that, of course, this is less 517 00:24:52,750 --> 00:24:56,260 than or equal to the distance from u 518 00:24:56,260 --> 00:25:02,193 to u0 plus the distance from u0 to v. 519 00:25:02,193 --> 00:25:03,610 So in other words, this is saying, 520 00:25:03,610 --> 00:25:06,235 the shortest path from u to v is always 521 00:25:06,235 --> 00:25:08,110 upper bounded by the length of the shortest 522 00:25:08,110 --> 00:25:11,330 path from u to u0 and then u0 to v, right? 523 00:25:11,330 --> 00:25:13,560 This is drawing a triangle. 524 00:25:13,560 --> 00:25:15,210 Aha! 525 00:25:15,210 --> 00:25:16,200 But take a look. 526 00:25:16,200 --> 00:25:20,410 What is the actual radius of my graph? 527 00:25:20,410 --> 00:25:23,280 Well, in my notation, the radius of my graph 528 00:25:23,280 --> 00:25:30,250 is exactly the distance from u0 to v0. 529 00:25:30,250 --> 00:25:35,050 And this thing is at least as big as the distance from u0 530 00:25:35,050 --> 00:25:43,430 to anything else, by definition, for all v, right? 531 00:25:43,430 --> 00:25:47,240 So if I flip this inequality backward, well, take a look. 532 00:25:47,240 --> 00:25:49,250 This is the distance from u0 to something. 533 00:25:49,250 --> 00:25:51,230 This is the distance from u0 to something. 534 00:25:51,230 --> 00:25:52,955 So I incur two factors of the radius. 535 00:25:56,270 --> 00:25:58,910 And I get the bound that I wanted, yeah? 536 00:25:58,910 --> 00:26:01,340 And so this is a slightly more formal little proof 537 00:26:01,340 --> 00:26:04,208 of exactly the same thing that's in the homework notes.
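The bound just proved, R <= R star <= 2R, can be checked numerically on a small example. The following is a minimal sketch, assuming a connected undirected graph stored as an adjacency dictionary; the function names (`bfs_distances`, `eccentricity`, `approx_radius`, `exact_radius`) are illustrative and do not come from the problem set.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricity(adj, u):
    """Distance from u to its farthest-away vertex (graph assumed connected)."""
    return max(bfs_distances(adj, u).values())

def approx_radius(adj):
    """R star: the eccentricity of one arbitrarily chosen vertex (a single BFS)."""
    u = next(iter(adj))
    return eccentricity(adj, u)

def exact_radius(adj):
    """R: the minimum eccentricity over all vertices (one BFS per vertex)."""
    return min(eccentricity(adj, u) for u in adj)

# A path graph 0 - 1 - 2 - 3 - 4; vertex 2 is the center.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
R = exact_radius(path)
R_star = approx_radius(path)
print(R, R_star)  # prints "2 4": the arbitrary start 0 is an endpoint of the path
```

The path graph is a worst case for this approximation: starting from an endpoint gives exactly twice the true radius, so the factor of 2 cannot be improved in general.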
538 00:26:04,208 --> 00:26:05,750 OK, so the one thing that's remaining 539 00:26:05,750 --> 00:26:08,248 is to actually show that our algorithm runs 540 00:26:08,248 --> 00:26:09,540 in a reasonable amount of time. 541 00:26:09,540 --> 00:26:13,970 So I think they give us a budget of order E time. 542 00:26:13,970 --> 00:26:16,490 But notice, that argument is precisely the argument 543 00:26:16,490 --> 00:26:20,690 that we just made right here, just minus the v factor. 544 00:26:20,690 --> 00:26:22,970 And the v factor just came from looping over 545 00:26:22,970 --> 00:26:25,740 all the vertices in part a. 546 00:26:25,740 --> 00:26:29,120 So now I think we're done with problem 1. 547 00:26:29,120 --> 00:26:31,820 As usual, I've wasted too much time on the easy problem. 548 00:26:31,820 --> 00:26:34,890 All right, any questions about this one? 549 00:26:34,890 --> 00:26:35,390 Excellent. 550 00:26:35,390 --> 00:26:39,582 Well, now that I've written too much, let's do the rest of it. 551 00:26:39,582 --> 00:26:41,540 I spent time on this problem because I like it. 552 00:26:41,540 --> 00:26:43,170 It looks like a geometry problem. 553 00:26:43,170 --> 00:26:43,670 OK. 554 00:26:46,290 --> 00:26:47,830 So now, let's see. 555 00:26:47,830 --> 00:26:50,700 In problem 2, which I noticed that this homework is 556 00:26:50,700 --> 00:26:56,010 kind of full of prototypical 6.006 557 00:26:56,010 --> 00:26:57,813 slash graph theory problems in general. 558 00:26:57,813 --> 00:27:00,480 Like, they just go down the list of things that people typically 559 00:27:00,480 --> 00:27:04,470 do in graph theory that are useful tricks to know. 560 00:27:04,470 --> 00:27:07,450 So I would suggest to the students in this class, 561 00:27:07,450 --> 00:27:10,230 even if it's pass-fail, look very closely at this homework 562 00:27:10,230 --> 00:27:12,390 before doing the current one. 
563 00:27:12,390 --> 00:27:16,740 I think the ordering works out that they can do that, 564 00:27:16,740 --> 00:27:19,860 because I think you'll get some good hints for how to solve 565 00:27:19,860 --> 00:27:22,320 all the current homework. 566 00:27:22,320 --> 00:27:25,330 So you heard it here first, guys. 567 00:27:25,330 --> 00:27:25,830 OK. 568 00:27:25,830 --> 00:27:31,680 So in problem 2, we're talking about internet investigation. 569 00:27:31,680 --> 00:27:36,570 So in particular, MIT has a bunch of different routers 570 00:27:36,570 --> 00:27:39,730 that are connected by cables to one another. 571 00:27:39,730 --> 00:27:42,120 And essentially, what are we given? 572 00:27:42,120 --> 00:27:44,040 We're given a bunch of different routers. 573 00:27:44,040 --> 00:27:46,860 And we're given the length of the cable in between them. 574 00:27:46,860 --> 00:27:48,570 And the latency, unsurprisingly, is 575 00:27:48,570 --> 00:27:50,237 proportional to the length of the cable. 576 00:27:50,237 --> 00:27:52,260 That, in my abstract understanding 577 00:27:52,260 --> 00:27:54,460 of how computers work, kind of makes sense to me. 578 00:27:54,460 --> 00:27:55,877 I'm not sure that's actually true. 579 00:27:55,877 --> 00:28:01,050 But that's sort of immaterial for 6.006. 580 00:28:01,050 --> 00:28:02,940 I assume our department has a networks 581 00:28:02,940 --> 00:28:06,360 class if you're interested in that kind of thing. 582 00:28:06,360 --> 00:28:08,220 And essentially, what we're trying to do 583 00:28:08,220 --> 00:28:11,910 is sum up the latency over all of the routers. 584 00:28:11,910 --> 00:28:14,190 So let's break down a little bit of notation 585 00:28:14,190 --> 00:28:18,246 here while I continue to dance all over the room here. 586 00:28:18,246 --> 00:28:20,400 I keep losing my chalk. 587 00:28:27,050 --> 00:28:28,610 I need like a holster. 588 00:28:28,610 --> 00:28:31,700 I feel like that would be useful for the chalk bucket.
589 00:28:31,700 --> 00:28:33,290 OK. 590 00:28:33,290 --> 00:28:37,080 So now we're going to do problem 2 here. 591 00:28:37,080 --> 00:28:38,540 So we're given r routers. 592 00:28:42,460 --> 00:28:48,360 And some of them are marked as entry points. 593 00:28:53,730 --> 00:29:06,840 And now we have a bunch of bidirectional wires, wi, 594 00:29:06,840 --> 00:29:12,350 each of which has length li. 595 00:29:12,350 --> 00:29:17,360 And that's a positive integer value here. 596 00:29:17,360 --> 00:29:18,938 And actually, because of this-- 597 00:29:18,938 --> 00:29:21,230 so technically, I think a lot of students in this class 598 00:29:21,230 --> 00:29:22,897 have encountered weighted graphs before. 599 00:29:22,897 --> 00:29:25,640 But if you think about the narrative of this course, 600 00:29:25,640 --> 00:29:27,590 I think, for the version of this homework, 601 00:29:27,590 --> 00:29:29,750 we haven't really encountered weighted graphs yet. 602 00:29:29,750 --> 00:29:32,208 But a better way of putting it, rather than psychologically 603 00:29:32,208 --> 00:29:35,603 diagnosing your instructors, is that what we're going to find 604 00:29:35,603 --> 00:29:38,270 is that there are often problems that look like they're weighted 605 00:29:38,270 --> 00:29:39,895 graph problems, but they really aren't. 606 00:29:39,895 --> 00:29:43,490 And this is a nice example where that's the case. 607 00:29:43,490 --> 00:29:48,770 OK, so we define latency as follows, 608 00:29:48,770 --> 00:29:52,400 that it's at least proportional to the shortest 609 00:29:52,400 --> 00:29:53,560 path to an entry point. 610 00:30:01,740 --> 00:30:04,690 And now we have two additional assumptions that we need, 611 00:30:04,690 --> 00:30:05,190 right? 612 00:30:05,190 --> 00:30:09,690 One is that the total latency, or at least the latency 613 00:30:09,690 --> 00:30:15,470 of every vertex, which is the same thing-- latency-- 614 00:30:15,470 --> 00:30:17,570 is less than infinity. 
615 00:30:17,570 --> 00:30:20,090 What is this really saying, by the way? 616 00:30:20,090 --> 00:30:21,860 Like, when would the latency be infinity? 617 00:30:21,860 --> 00:30:23,630 It would only be infinity if I like took a pair of scissors, 618 00:30:23,630 --> 00:30:25,550 and cut a wire, and disconnected a router 619 00:30:25,550 --> 00:30:27,440 from the rest of the network. 620 00:30:27,440 --> 00:30:28,460 Yes? 621 00:30:28,460 --> 00:30:31,002 AUDIENCE: Every router is connected to some entry point. 622 00:30:31,002 --> 00:30:32,210 JUSTIN SOLOMON: Yes, exactly. 623 00:30:32,210 --> 00:30:34,820 Like, there's some path from every router to some entry 624 00:30:34,820 --> 00:30:35,360 point. 625 00:30:35,360 --> 00:30:37,652 Doesn't necessarily mean the entire graph is connected, 626 00:30:37,652 --> 00:30:38,430 I guess. 627 00:30:38,430 --> 00:30:41,900 But at least you can always get to an entry point. 628 00:30:41,900 --> 00:30:44,240 And then, moreover, and this one's 629 00:30:44,240 --> 00:30:46,940 the real kicker here, that there's 630 00:30:46,940 --> 00:30:51,650 at most 100r feet of wire. 631 00:30:55,920 --> 00:30:57,780 Incidentally, r stands for routers. 632 00:30:57,780 --> 00:30:59,960 I had the previous problem in my head 633 00:30:59,960 --> 00:31:02,110 and was thinking radius for a long time. 634 00:31:02,110 --> 00:31:03,785 So don't be like your instructor. 635 00:31:03,785 --> 00:31:05,910 And actually read the entire problem before getting 636 00:31:05,910 --> 00:31:07,330 hung up on it. 637 00:31:07,330 --> 00:31:11,190 But in any event, the thing that you're trying to do 638 00:31:11,190 --> 00:31:20,550 is to compute the sum over all of the routers-- 639 00:31:20,550 --> 00:31:25,990 I don't know, r, whatever-- of the latency of that router. 640 00:31:25,990 --> 00:31:27,595 OK, so that's our problem here.
641 00:31:27,595 --> 00:31:29,910 Incidentally, this little goofy exercise 642 00:31:29,910 --> 00:31:31,707 I just did of taking this paragraph problem 643 00:31:31,707 --> 00:31:34,290 and kind of writing it in bullet points, I find helps me a lot 644 00:31:34,290 --> 00:31:36,330 when I'm trying to solve these algorithms problems, 645 00:31:36,330 --> 00:31:38,830 because I think it's really easy to just get like thrown off 646 00:31:38,830 --> 00:31:41,200 by a wall of text here. 647 00:31:41,200 --> 00:31:44,940 OK, so this problem is screaming out graph theory. 648 00:31:44,940 --> 00:31:47,040 Like, we're practically using the terms here. 649 00:31:47,040 --> 00:31:48,300 We are using the terms, right? 650 00:31:48,300 --> 00:31:51,660 Like, we've got nodes that are kind of like routers. 651 00:31:51,660 --> 00:31:54,840 And maybe edges are kind of like wires. 652 00:31:54,840 --> 00:31:58,410 But there's a bit of a catch, which is that your runtime-- 653 00:31:58,410 --> 00:32:04,650 at the end of the day, I think you want order r runtime. 654 00:32:04,650 --> 00:32:07,620 That's where things get a little funky initially. 655 00:32:07,620 --> 00:32:09,120 And so we have to think a little bit 656 00:32:09,120 --> 00:32:11,100 carefully about how to do it. 657 00:32:11,100 --> 00:32:13,345 And here's going to be the trick. 658 00:32:13,345 --> 00:32:15,720 So this is starting to look like a shortest path problem. 659 00:32:15,720 --> 00:32:18,480 But what would you maybe not want to do? 660 00:32:18,480 --> 00:32:20,820 Would be to iterate over every single router, 661 00:32:20,820 --> 00:32:23,610 or every single vertex and every single router, 662 00:32:23,610 --> 00:32:26,610 and compute the shortest path between every single pair, 663 00:32:26,610 --> 00:32:28,500 because if you did that-- oh boy, I'm 664 00:32:28,500 --> 00:32:30,120 confusing my terminology. 
665 00:32:30,120 --> 00:32:32,400 There are entry points, which is the thing that I need 666 00:32:32,400 --> 00:32:34,410 to compute the distance to. 667 00:32:34,410 --> 00:32:36,535 And I need to iterate over every single router 668 00:32:36,535 --> 00:32:38,910 and compute its distance, maybe, to all the entry points, 669 00:32:38,910 --> 00:32:41,140 and then take the min, or something like that. 670 00:32:41,140 --> 00:32:42,990 But if I had a double for loop, then I'm 671 00:32:42,990 --> 00:32:45,237 probably not going to get order r time, right? 672 00:32:45,237 --> 00:32:47,820 Because somehow, you expect it to look like something squared, 673 00:32:47,820 --> 00:32:49,310 or like the product of two terms. 674 00:32:49,310 --> 00:32:51,310 So we have to be a little more sneaky than that. 675 00:32:51,310 --> 00:32:54,810 And we're going to use sort of a canonical trick in graph 676 00:32:54,810 --> 00:32:56,050 theory. 677 00:32:56,050 --> 00:33:00,050 OK, so let's follow the Toucan Sam approach here. 678 00:33:00,050 --> 00:33:03,160 We're going to follow our nose and say that, OK, there's 679 00:33:03,160 --> 00:33:06,768 basically a graph that's staring us in the face in this problem. 680 00:33:06,768 --> 00:33:09,310 But then we're going to have to make a little bit of an edit, 681 00:33:09,310 --> 00:33:13,180 because we'd like to use the kind of linear-looking time 682 00:33:13,180 --> 00:33:16,240 search that BFS affords us. 683 00:33:16,240 --> 00:33:18,620 But it looks like we have edge weights in our graph, 684 00:33:18,620 --> 00:33:22,510 because the wires are associated to lengths, right? 685 00:33:22,510 --> 00:33:24,490 Different wires have different sizes. 686 00:33:24,490 --> 00:33:26,680 But we have this nice fun fact, which 687 00:33:26,680 --> 00:33:29,170 is that the total amount of wire in our whole universe 688 00:33:29,170 --> 00:33:31,758 is less than 100r. 
689 00:33:31,758 --> 00:33:34,050 I guess the units of this 100 are kind of weird, right? 690 00:33:34,050 --> 00:33:38,220 It's like feet per router or something, but whatever. 691 00:33:38,220 --> 00:33:46,190 OK, so in particular, I'm going to make a graph 692 00:33:46,190 --> 00:33:51,200 with a node per router. 693 00:33:51,200 --> 00:33:53,090 So like, maybe here's a router. 694 00:33:53,090 --> 00:33:54,620 There's another router. 695 00:33:54,620 --> 00:33:58,200 There's router 1 and router 2. 696 00:33:58,200 --> 00:34:00,450 But since I want to use the sort of linear time 697 00:34:00,450 --> 00:34:02,940 advantages of breadth-first search 698 00:34:02,940 --> 00:34:06,390 when I'm computing distances, I can be a little bit sneaky 699 00:34:06,390 --> 00:34:08,429 about this, which is to say, instead 700 00:34:08,429 --> 00:34:10,980 of having like 10 feet of wire, I'm going 701 00:34:10,980 --> 00:34:14,370 to have 10 1-foot wires, yeah? 702 00:34:14,370 --> 00:34:19,250 Except now I'm additionally going to have little chains. 703 00:34:19,250 --> 00:34:24,889 So here, maybe the length l 1, 2 is equal to 3, right? 704 00:34:24,889 --> 00:34:29,060 So I'm going to put three edges in between. 705 00:34:29,060 --> 00:34:30,040 So in other words-- 706 00:34:30,040 --> 00:34:38,219 and I'm going to connect them with chains 707 00:34:38,219 --> 00:34:45,510 of li edges for each wire. 708 00:34:49,155 --> 00:34:50,030 Does that make sense? 709 00:34:50,030 --> 00:34:52,548 So essentially, I'm going to take my weighted graph problem 710 00:34:52,548 --> 00:34:55,090 and make it unweighted by just like repeating a bunch-- well, 711 00:34:55,090 --> 00:34:56,882 not really repeating, but chaining together 712 00:34:56,882 --> 00:35:00,880 a bunch of edges so that the total length of this thing 713 00:35:00,880 --> 00:35:04,120 is equal to the length of the wire from one router to another. 714 00:35:04,120 --> 00:35:05,890 OK.
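The chain construction can be sketched as follows. This is a hedged illustration rather than the official solution: it assumes routers are numbered 0 to r - 1 and that wires arrive as `(a, b, length)` triples with positive integer lengths; the function name `expand_wires` and that input format are assumptions made for the example.

```python
def expand_wires(num_routers, wires):
    """Build an unweighted adjacency list in which each wire of length l
    becomes a chain of l unit edges (so l - 1 fresh intermediate vertices).

    Routers keep ids 0..num_routers-1; chain vertices get the next free ids.
    """
    adj = [[] for _ in range(num_routers)]
    for a, b, length in wires:
        prev = a
        for _ in range(length - 1):
            adj.append([])           # new intermediate vertex on the chain
            mid = len(adj) - 1
            adj[prev].append(mid)
            adj[mid].append(prev)
            prev = mid
        adj[prev].append(b)          # last unit edge reaches the far router
        adj[b].append(prev)
    return adj

# Router 0 and router 1 joined by a wire of length 3, as in the board example.
adj = expand_wires(2, [(0, 1, 3)])
num_vertices = len(adj)
num_edges = sum(len(neighbors) for neighbors in adj) // 2
print(num_vertices, num_edges)  # prints "4 3": two routers, two chain vertices, three edges
```

Since the total wire length is at most 100r, this construction adds fewer than 100r vertices and exactly as many unit edges as there are feet of wire.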
715 00:35:05,890 --> 00:35:07,600 One thing we might as well do is bound 716 00:35:07,600 --> 00:35:10,940 the number of vertices and edges in our graph when we do that. 717 00:35:10,940 --> 00:35:18,180 So first of all, let's think about the number of vertices. 718 00:35:18,180 --> 00:35:20,550 And we can be totally lazy and upper bound this stuff. 719 00:35:20,550 --> 00:35:21,570 It doesn't matter. 720 00:35:21,570 --> 00:35:24,210 Well, for one thing, I have one node per router. 721 00:35:24,210 --> 00:35:28,280 So we incur one factor of r there. 722 00:35:28,280 --> 00:35:31,220 And now, notice that we're kind of laying down cable 723 00:35:31,220 --> 00:35:35,120 one little piece at a time here in our chains. 724 00:35:35,120 --> 00:35:39,050 And now I always tend to have a fencepost-style headache 725 00:35:39,050 --> 00:35:41,090 about exactly what the constant factor is here. 726 00:35:41,090 --> 00:35:43,880 But if we're conservative about it, 727 00:35:43,880 --> 00:35:48,010 we incur at most a factor of 100r kind of additional edges, 728 00:35:48,010 --> 00:35:49,760 because those are all the different pieces 729 00:35:49,760 --> 00:35:50,660 that we could lay together. 730 00:35:50,660 --> 00:35:53,243 I think it's actually less than that because of the endpoints, 731 00:35:53,243 --> 00:35:58,280 but whatever, because r plus 100 r is big O of r. 732 00:35:58,280 --> 00:36:02,202 So the number of vertices in my graph here is big O of r. 733 00:36:02,202 --> 00:36:03,785 Similarly, what's the number of edges? 734 00:36:07,590 --> 00:36:10,950 Well, this is exactly the amount of cable that's inside 735 00:36:10,950 --> 00:36:12,800 of my network, I believe. 736 00:36:12,800 --> 00:36:13,840 Yep. 737 00:36:13,840 --> 00:36:15,090 So this is exactly 100r. 738 00:36:17,600 --> 00:36:19,490 Well, I guess the way the problem is written, 739 00:36:19,490 --> 00:36:23,690 it's upper bounded by 100r, but whatever. 
740 00:36:23,690 --> 00:36:27,065 So this, again, is big O of r. 741 00:36:27,065 --> 00:36:28,190 This is kind of convenient. 742 00:36:28,190 --> 00:36:29,898 So now we have one number that rules them 743 00:36:29,898 --> 00:36:32,730 all, which is r, which tells you both the number of vertices 744 00:36:32,730 --> 00:36:35,420 and the number of edges, up to a constant factor, right? 745 00:36:35,420 --> 00:36:37,340 So one thing I can convince myself 746 00:36:37,340 --> 00:36:39,920 is if I do BFS on my graph, that's sort of OK. 747 00:36:39,920 --> 00:36:43,020 Remember, that's vertices plus edges time. 748 00:36:43,020 --> 00:36:45,560 But in this case, those are the same. 749 00:36:45,560 --> 00:36:48,702 OK, so right. 750 00:36:48,702 --> 00:36:50,160 So remember, at the end of the day, 751 00:36:50,160 --> 00:36:51,660 I'm trying to compute the latency. 752 00:36:51,660 --> 00:36:56,240 This is like the length of the shortest path to the entry 753 00:36:56,240 --> 00:36:58,460 point nodes. 754 00:36:58,460 --> 00:37:00,440 So here would be a braindead algorithm, 755 00:37:00,440 --> 00:37:12,990 which is to say, for all routers, for all entry points, 756 00:37:12,990 --> 00:37:13,490 compute-- 757 00:37:16,438 --> 00:37:16,980 I don't know. 758 00:37:16,980 --> 00:37:20,770 Let's call the router on i, the entry point j. 759 00:37:20,770 --> 00:37:23,760 I compute distance ij like using breadth-first search 760 00:37:23,760 --> 00:37:25,330 or something. 761 00:37:25,330 --> 00:37:32,850 And then I take the min of these values 762 00:37:32,850 --> 00:37:35,660 and add them all together, right? 763 00:37:35,660 --> 00:37:38,890 So I compute-- for every router, I look at every possible entry 764 00:37:38,890 --> 00:37:39,390 point. 765 00:37:39,390 --> 00:37:41,190 I compute its distance to the entry point. 766 00:37:41,190 --> 00:37:42,990 I take the min over all these things. 767 00:37:42,990 --> 00:37:46,655 And now I add that to my running sum. 
768 00:37:46,655 --> 00:37:48,030 There's a problem here, which is, 769 00:37:48,030 --> 00:37:50,520 I haven't told you the relative number of entry points 770 00:37:50,520 --> 00:37:52,610 to the total number of routers. 771 00:37:52,610 --> 00:37:56,250 So at least the way that I've written this algorithm here, 772 00:37:56,250 --> 00:37:58,290 how much time would this take? 773 00:37:58,290 --> 00:38:00,360 Well, there's two different for loops. 774 00:38:00,360 --> 00:38:02,453 And in the worst possible case, at least 775 00:38:02,453 --> 00:38:03,870 in my braindead algorithm, I don't 776 00:38:03,870 --> 00:38:07,150 notice that if I am an entry point, 777 00:38:07,150 --> 00:38:10,570 then I don't need to compute distances. 778 00:38:10,570 --> 00:38:14,140 Well, this would take order r squared-- 779 00:38:14,140 --> 00:38:17,890 whoa, r squared time, right? 780 00:38:17,890 --> 00:38:18,880 At least, right? 781 00:38:18,880 --> 00:38:21,880 Actually, I shouldn't even write big O. I should write-- 782 00:38:21,880 --> 00:38:22,840 what's lower bound? 783 00:38:22,840 --> 00:38:24,760 Oh god, I'm a terrible algorithms professor. 784 00:38:24,760 --> 00:38:28,573 Omega of r squared time, because I haven't even 785 00:38:28,573 --> 00:38:30,490 accounted for the amount of time that it takes 786 00:38:30,490 --> 00:38:33,412 to compute the distance, right? 787 00:38:33,412 --> 00:38:34,870 And this is a problem, because I've 788 00:38:34,870 --> 00:38:37,030 only given you a budget of linear time for your algorithm, 789 00:38:37,030 --> 00:38:37,690 right? 790 00:38:37,690 --> 00:38:40,120 So this is frowny face. 791 00:38:40,120 --> 00:38:42,070 I tried drawing the turd emoji on my notes. 792 00:38:42,070 --> 00:38:43,750 And it really-- it didn't work. 793 00:38:43,750 --> 00:38:47,440 OK, so we need a better trick. 
794 00:38:47,440 --> 00:38:51,700 And this is actually one of these prototypical tricks, 795 00:38:51,700 --> 00:38:53,750 which is to do the following. 796 00:38:53,750 --> 00:38:54,920 So let's construct a graph. 797 00:38:54,920 --> 00:38:57,610 I'm going to draw my graph in a particular way. 798 00:38:57,610 --> 00:38:59,843 But notice that there's nothing about my algorithm 799 00:38:59,843 --> 00:39:01,510 that cares about the way that I drew it. 800 00:39:01,510 --> 00:39:03,143 This is just to make my life easier, 801 00:39:03,143 --> 00:39:04,810 which is, I'm going to put all the entry 802 00:39:04,810 --> 00:39:08,050 points on the left and all the remaining 803 00:39:08,050 --> 00:39:11,433 non-entry-point routers on the right, because I can. 804 00:39:11,433 --> 00:39:13,100 And so this is what my graph looks like. 805 00:39:13,100 --> 00:39:14,517 So these are like my entry points. 806 00:39:16,952 --> 00:39:18,035 Here are my other routers. 807 00:39:21,206 --> 00:39:23,402 My graph doesn't have to be bipartite. 808 00:39:23,402 --> 00:39:25,360 Like, it could be that my routers are connected 809 00:39:25,360 --> 00:39:27,670 to each other, whatever. 810 00:39:27,670 --> 00:39:31,120 And then there are some edges that go from my entry points 811 00:39:31,120 --> 00:39:33,070 to the routers in the graph. 812 00:39:33,070 --> 00:39:35,620 I'm trying to make sure that my graph is connected. 813 00:39:35,620 --> 00:39:36,680 OK. 814 00:39:36,680 --> 00:39:39,490 And so essentially, what this problem is asking you to do 815 00:39:39,490 --> 00:39:42,160 is to say, OK, for every single node in my graph, 816 00:39:42,160 --> 00:39:46,870 I need to compute the distance to the closest entry point 817 00:39:46,870 --> 00:39:48,910 and then sum all those things together, right? 818 00:39:48,910 --> 00:39:52,980 That's just the schematic we could have in mind. 
819 00:39:52,980 --> 00:39:57,160 So in some sense, what we want to do 820 00:39:57,160 --> 00:39:58,870 is think about the set of entry points 821 00:39:58,870 --> 00:40:02,200 as like one giant node, because it doesn't matter 822 00:40:02,200 --> 00:40:06,540 which of these guys I choose for my shortest path to an entry 823 00:40:06,540 --> 00:40:07,040 point. 824 00:40:07,040 --> 00:40:08,890 I just need to find one, yeah? 825 00:40:08,890 --> 00:40:10,205 And so here's the basic trick. 826 00:40:10,205 --> 00:40:12,580 And this is one that appears all over graph theory, which 827 00:40:12,580 --> 00:40:15,260 is I'm going to introduce one additional node to my graph. 828 00:40:15,260 --> 00:40:18,140 And I'm going to put him on the left-hand side. 829 00:40:18,140 --> 00:40:23,540 He's really big, because he is a supernode, which 830 00:40:23,540 --> 00:40:25,370 is a term of art. 831 00:40:25,370 --> 00:40:26,570 This term shows up a lot. 832 00:40:26,570 --> 00:40:31,860 And I'm going to connect it to every entry 833 00:40:31,860 --> 00:40:35,961 point in my network of routers. 834 00:40:35,961 --> 00:40:38,670 Does that make sense, class? 835 00:40:38,670 --> 00:40:39,930 OK. 836 00:40:39,930 --> 00:40:42,760 So here's the kind of cool thing. 837 00:40:42,760 --> 00:40:44,790 So first of all, for every entry point, 838 00:40:44,790 --> 00:40:46,770 what's the shortest path from the entry 839 00:40:46,770 --> 00:40:48,328 point to the supernode? 840 00:40:48,328 --> 00:40:49,995 Well, obviously, it has length 1, right? 841 00:40:49,995 --> 00:40:52,240 I drew it for you here. 842 00:40:52,240 --> 00:40:53,680 Now, here's the thing. 843 00:40:53,680 --> 00:40:56,100 Let's take the shortest path from the supernode 844 00:40:56,100 --> 00:40:59,690 to any of the routers on the right-hand side. 845 00:40:59,690 --> 00:41:00,530 What do I know? 846 00:41:00,530 --> 00:41:04,378 Well, clearly-- like maybe I choose this guy here.
847 00:41:04,378 --> 00:41:05,670 Well, what is my shortest path? 848 00:41:05,670 --> 00:41:08,760 It goes here and then there. 849 00:41:08,760 --> 00:41:10,140 There's one property that matters 850 00:41:10,140 --> 00:41:14,550 here, which is that it has to pass through one of these entry 851 00:41:14,550 --> 00:41:16,420 nodes. 852 00:41:16,420 --> 00:41:19,370 Which one does it have to pass through? 853 00:41:19,370 --> 00:41:21,020 Shrug. 854 00:41:21,020 --> 00:41:21,890 For shame. 855 00:41:21,890 --> 00:41:24,710 Well, remember, Justin's favorite inequality 856 00:41:24,710 --> 00:41:26,385 is the triangle inequality. 857 00:41:26,385 --> 00:41:27,260 And what does it say? 858 00:41:27,260 --> 00:41:29,060 It says that if I compute the shortest 859 00:41:29,060 --> 00:41:33,560 path from the supernode to any node in my graph, 860 00:41:33,560 --> 00:41:36,920 then every sort of sub-piece of that shortest path 861 00:41:36,920 --> 00:41:38,750 is also a shortest path. 862 00:41:38,750 --> 00:41:40,190 That sentence was hard to parse. 863 00:41:40,190 --> 00:41:41,400 Let's try that again. 864 00:41:41,400 --> 00:41:43,400 So in particular, if I have a shortest path 865 00:41:43,400 --> 00:41:45,793 from the supernode to some router over here, 866 00:41:45,793 --> 00:41:47,210 well, we've convinced ourselves it 867 00:41:47,210 --> 00:41:50,112 has to pass through one of the entry nodes. 868 00:41:50,112 --> 00:41:51,820 Which one does that have to pass through? 869 00:41:51,820 --> 00:41:55,450 Is it ever something that is farther than the closest entry 870 00:41:55,450 --> 00:41:57,230 node? 871 00:41:57,230 --> 00:41:59,210 Well, no, because I could compute a shorter 872 00:41:59,210 --> 00:42:01,940 path in that case by choosing the closest entry 873 00:42:01,940 --> 00:42:04,740 node and then going to the supernode.
874 00:42:04,740 --> 00:42:07,850 So this is a complicated way of saying that essentially, 875 00:42:07,850 --> 00:42:18,910 what we really want is for every router, the distance 876 00:42:18,910 --> 00:42:26,310 from that router, let's call it i, to the supernode, s. 877 00:42:26,310 --> 00:42:27,210 Is that quite right? 878 00:42:27,210 --> 00:42:31,132 Is that the distance to the closest entry point? 879 00:42:31,132 --> 00:42:32,840 AUDIENCE: You went one more inch too far. 880 00:42:32,840 --> 00:42:34,210 JUSTIN SOLOMON: I went one inch too far, right? 881 00:42:34,210 --> 00:42:36,210 Because I went to the closest entry point. 882 00:42:36,210 --> 00:42:37,668 And then I took an additional edge. 883 00:42:37,668 --> 00:42:40,500 So we want to do minus 1. 884 00:42:40,500 --> 00:42:41,730 OK. 885 00:42:41,730 --> 00:42:43,117 So what does this mean? 886 00:42:43,117 --> 00:42:44,700 Well, that means that I don't actually 887 00:42:44,700 --> 00:42:48,180 have to have this inner for loop over all the possible entry 888 00:42:48,180 --> 00:42:49,090 points. 889 00:42:49,090 --> 00:42:51,690 I just need to construct this new special graph with one 890 00:42:51,690 --> 00:42:53,815 additional node-- notice that's not going to affect 891 00:42:53,815 --> 00:42:54,690 my runtime-- 892 00:42:54,690 --> 00:42:56,940 and compute the shortest distance from the supernode 893 00:42:56,940 --> 00:43:00,190 to every other node in my graph, and then 894 00:43:00,190 --> 00:43:02,080 use that as my output, yeah? 895 00:43:02,080 --> 00:43:07,810 So in other words, what is my algorithm going to look like? 896 00:43:07,810 --> 00:43:13,560 Well, first, I'm going to construct my graph, right? 897 00:43:13,560 --> 00:43:15,188 So what do I need to do? 
898 00:43:15,188 --> 00:43:16,980 If I were to write this out in my homework, 899 00:43:16,980 --> 00:43:18,570 I would have to talk about how I've 900 00:43:18,570 --> 00:43:20,790 got these chains of edges between different pairs 901 00:43:20,790 --> 00:43:21,570 of routers. 902 00:43:21,570 --> 00:43:23,153 In addition to that, I'm going to make 903 00:43:23,153 --> 00:43:26,130 one additional supernode and insert an edge 904 00:43:26,130 --> 00:43:29,250 from that to every entry point. 905 00:43:29,250 --> 00:43:33,750 Notice that adding the supernode here just adds 1 906 00:43:33,750 --> 00:43:36,780 to the number of vertices, and at most, I 907 00:43:36,780 --> 00:43:38,620 guess, r to the number of edges, 908 00:43:38,620 --> 00:43:40,740 which doesn't affect asymptotically 909 00:43:40,740 --> 00:43:42,480 the size of either of these two sets. 910 00:43:42,480 --> 00:43:44,430 So that's a good thing. 911 00:43:44,430 --> 00:43:47,820 Now I'm going to do-- 912 00:43:47,820 --> 00:43:56,620 I'm going to use BFS to do a single-source shortest path 913 00:43:56,620 --> 00:44:02,260 from my supernode to all other vertices. 914 00:44:06,070 --> 00:44:08,510 And how much time does this take? 915 00:44:08,510 --> 00:44:11,450 Well, remember that in general, BFS takes V plus E time. 916 00:44:11,450 --> 00:44:14,450 In this case, V and E are both-- 917 00:44:14,450 --> 00:44:15,470 look like r. 918 00:44:15,470 --> 00:44:18,440 So this is order r time. 919 00:44:18,440 --> 00:44:19,760 OK. 920 00:44:19,760 --> 00:44:31,010 And then finally, I'm going to sum over routers 921 00:44:31,010 --> 00:44:37,790 i the value of the distance from the supernode to the router i, 922 00:44:37,790 --> 00:44:41,070 minus 1 to account for that additional edge that I added. 923 00:44:41,070 --> 00:44:44,390 OK, and that's the solution to our problem. 924 00:44:44,390 --> 00:44:48,410 OK, any questions about number 2 here? 925 00:44:48,410 --> 00:44:49,190 Excellent.
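The supernode trick for problem 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official solution: the network is assumed to be already expanded into an unweighted, undirected adjacency dictionary in which every router appears as a key, and the names `sum_distances_to_nearest_entry`, `adj`, `entry_points`, and `SUPER` are hypothetical.

```python
from collections import deque


def sum_distances_to_nearest_entry(adj, entry_points):
    """Sum, over all routers, of the distance to the closest entry point.

    adj: dict mapping each router to a list of neighboring routers
         (assumed unweighted and undirected; every router is a key).
    entry_points: iterable of routers that are entry points.
    """
    SUPER = object()  # hypothetical label for the extra supernode

    # Copy the graph and attach the supernode to every entry point.
    # BFS only ever leaves the supernode, so outgoing edges suffice.
    graph = {u: list(vs) for u, vs in adj.items()}
    graph[SUPER] = list(entry_points)

    # Standard BFS from the supernode: O(V + E).
    dist = {SUPER: 0}
    queue = deque([SUPER])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # Subtract 1 per router for the extra supernode edge.
    # Routers that cannot reach any entry point are skipped (an assumption).
    return sum(dist[r] - 1 for r in adj if r in dist)
```

One BFS from the supernode replaces a separate search per entry point, which is what removes the inner loop over entry points mentioned above.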
926 00:44:49,190 --> 00:44:50,220 Go team. 927 00:44:50,220 --> 00:44:50,720 OK. 928 00:44:50,720 --> 00:44:53,420 So now let's move on to problem 3. 929 00:44:53,420 --> 00:44:55,910 Am I-- yeah, we're about halfway. 930 00:44:55,910 --> 00:44:59,240 OK, so in problem 3-- 931 00:44:59,240 --> 00:44:59,740 right. 932 00:44:59,740 --> 00:45:06,715 So we're doing Potry Harter and three wizard friends. 933 00:45:06,715 --> 00:45:08,090 The number three here, I believe, 934 00:45:08,090 --> 00:45:10,845 is actually irrelevant, although like any time 935 00:45:10,845 --> 00:45:12,470 you see a specific number in a problem, 936 00:45:12,470 --> 00:45:15,560 you should cache that in your bag of things to remember. 937 00:45:15,560 --> 00:45:18,890 And in this case, that was a red herring. 938 00:45:18,890 --> 00:45:24,200 Potry Harter and her three wizard friends 939 00:45:24,200 --> 00:45:27,890 are tasked with searching around a labyrinth, yeah? 940 00:45:27,890 --> 00:45:30,620 And in particular, there's some nice things 941 00:45:30,620 --> 00:45:35,390 to know about the labyrinth and Potry Harter world-- 942 00:45:35,390 --> 00:45:39,150 this is really throwing off my dyslexia here-- 943 00:45:39,150 --> 00:45:40,130 which is the following. 944 00:45:43,140 --> 00:45:45,820 Right, so what do we know? 945 00:45:45,820 --> 00:45:54,880 We know that there are n rooms in my labyrinth 946 00:45:54,880 --> 00:46:02,110 and that each of my rooms has at most four doors. 947 00:46:05,810 --> 00:46:09,080 So in other words, if I think of building a graph out 948 00:46:09,080 --> 00:46:10,940 of my rooms, which is like, I don't think 949 00:46:10,940 --> 00:46:12,890 I'm giving much away about this problem 950 00:46:12,890 --> 00:46:15,560 by jumping to the solution a little bit, 951 00:46:15,560 --> 00:46:17,870 what do we know about the degree of any vertex, 952 00:46:17,870 --> 00:46:21,620 assuming my vertices are rooms in the labyrinth? 
953 00:46:21,620 --> 00:46:22,460 It's at most four. 954 00:46:22,460 --> 00:46:24,260 So that's kind of nice. 955 00:46:24,260 --> 00:46:26,630 OK, right. 956 00:46:26,630 --> 00:46:28,160 And all the doors start closed. 957 00:46:31,110 --> 00:46:33,750 So that seems like a useful piece of information 958 00:46:33,750 --> 00:46:35,580 to remember. 959 00:46:35,580 --> 00:46:37,980 But we have this kind of weird thing, which is 960 00:46:37,980 --> 00:46:44,454 that some doors are enchanted. 961 00:46:47,120 --> 00:46:54,320 And apparently, Potry Harter can open up 962 00:46:54,320 --> 00:46:57,030 certain doors for free, which are not the enchanted doors. 963 00:46:57,030 --> 00:46:59,180 And then other ones, they have to do 964 00:46:59,180 --> 00:47:00,860 the blessing, and the holy water, 965 00:47:00,860 --> 00:47:04,460 and whatever it is that happens in this universe, 966 00:47:04,460 --> 00:47:06,200 and then open up that door. 967 00:47:06,200 --> 00:47:09,500 But that costs them materials and heartache, right? 968 00:47:09,500 --> 00:47:12,110 And so we want to minimize that. 969 00:47:12,110 --> 00:47:17,580 And so what they're given is basically a map. 970 00:47:17,580 --> 00:47:19,580 And this includes all of the different rooms, 971 00:47:19,580 --> 00:47:22,080 how they're connected to one another, and which of the doors 972 00:47:22,080 --> 00:47:23,790 are enchanted. 973 00:47:23,790 --> 00:47:31,740 And what I want is the minimum number of doors 974 00:47:31,740 --> 00:47:35,040 that they have to disenchant. 975 00:47:39,630 --> 00:47:42,780 Now, this problem is like kind of sneaky. 976 00:47:42,780 --> 00:47:46,860 And the reason why is that there's like the network that's 977 00:47:46,860 --> 00:47:48,210 obvious to build. 978 00:47:48,210 --> 00:47:51,053 And that turns out to be not quite the right one.
979 00:47:51,053 --> 00:47:53,220 And then you can start thinking about adding weights 980 00:47:53,220 --> 00:47:54,887 on your graph and going crazy with that. 981 00:47:54,887 --> 00:47:57,370 But that turns out not to be the right direction. 982 00:47:57,370 --> 00:47:59,932 And in fact, in Potry Harter world, 983 00:47:59,932 --> 00:48:02,390 apparently, we're not worried about their physical fitness. 984 00:48:02,390 --> 00:48:05,035 In other words, shortest paths are actually 985 00:48:05,035 --> 00:48:06,160 irrelevant in this problem. 986 00:48:06,160 --> 00:48:06,827 Do you see that? 987 00:48:06,827 --> 00:48:13,710 Because let's say that I have a really complicated, annoying 988 00:48:13,710 --> 00:48:14,210 problem. 989 00:48:14,210 --> 00:48:16,150 So like, maybe I have-- 990 00:48:16,150 --> 00:48:20,310 here's my labyrinth. 991 00:48:20,310 --> 00:48:24,720 And we don't even talk about the entry point, 992 00:48:24,720 --> 00:48:26,130 like where they actually go in. 993 00:48:26,130 --> 00:48:28,020 But just for fiction purposes, let's say 994 00:48:28,020 --> 00:48:30,990 that they enter my labyrinth here and that, 995 00:48:30,990 --> 00:48:34,638 just to be annoying, the two doors that are enchanted-- 996 00:48:34,638 --> 00:48:36,180 remember, we could make a graph where 997 00:48:36,180 --> 00:48:39,060 all the vertices are rooms, and the edges are doors-- 998 00:48:39,060 --> 00:48:42,090 are like at these two endpoints of the T. So I have a giant T. 999 00:48:42,090 --> 00:48:44,920 And I enter right in the middle. 1000 00:48:44,920 --> 00:48:48,340 Now, what is Potry Harter to do here? 1001 00:48:48,340 --> 00:48:50,993 Well, obviously, there's-- since this graph is a tree, 1002 00:48:50,993 --> 00:48:52,660 there's only so much they can do, right? 1003 00:48:52,660 --> 00:48:54,760 Maybe they enter here. 
1004 00:48:54,760 --> 00:48:56,950 They walk over all the way to the end 1005 00:48:56,950 --> 00:48:59,560 to disenchant the door over here. 1006 00:48:59,560 --> 00:49:02,680 And then they turn around and walk to the other end. 1007 00:49:02,680 --> 00:49:06,100 They disenchant that guy, yeah? 1008 00:49:06,100 --> 00:49:08,140 And now they can reach other rooms. 1009 00:49:10,737 --> 00:49:13,070 Yeah, because that's their goal, is to visit every room. 1010 00:49:13,070 --> 00:49:15,842 Sorry, I think I skipped that step. 1011 00:49:15,842 --> 00:49:17,300 Now, there's a few things to notice 1012 00:49:17,300 --> 00:49:18,675 about this example, which make it 1013 00:49:18,675 --> 00:49:20,870 a little bit different from the typical graph 1014 00:49:20,870 --> 00:49:24,750 theory thing, which is, once they disenchant this door, 1015 00:49:24,750 --> 00:49:28,600 like they walk over here, and they open it, well, now they 1016 00:49:28,600 --> 00:49:31,240 walk over to this other room, just to-- 1017 00:49:31,240 --> 00:49:32,740 you know those gym exercises where 1018 00:49:32,740 --> 00:49:34,630 you run to the other side of the room, you touch the floor, 1019 00:49:34,630 --> 00:49:35,600 and then you run back? 1020 00:49:35,600 --> 00:49:36,880 That's kind of what they did here, right? 1021 00:49:36,880 --> 00:49:38,170 They ran to this room. 1022 00:49:38,170 --> 00:49:39,240 They tapped that vertex. 1023 00:49:39,240 --> 00:49:41,740 And now they want to turn around and walk to the other side. 1024 00:49:41,740 --> 00:49:44,860 They don't pay money again on their way out, right? 1025 00:49:44,860 --> 00:49:48,960 So once they open that door, it stays open. 1026 00:49:48,960 --> 00:49:52,052 And that's actually quite important, because what it does 1027 00:49:52,052 --> 00:49:54,510 is it makes this problem not look like a traveling salesman 1028 00:49:54,510 --> 00:49:57,030 problem, which wouldn't be so great. 
1029 00:49:57,030 --> 00:49:59,040 OK, so right. 1030 00:49:59,040 --> 00:50:02,140 And moreover, does the fact that they're-- like, 1031 00:50:02,140 --> 00:50:03,420 maybe I subdivide these edges. 1032 00:50:03,420 --> 00:50:06,690 I have a bunch of edges here that are all not enchanted. 1033 00:50:06,690 --> 00:50:12,700 Does that matter, like if I had like five billion edges here? 1034 00:50:12,700 --> 00:50:13,510 No, right? 1035 00:50:13,510 --> 00:50:15,220 Because they only ask in this problem 1036 00:50:15,220 --> 00:50:20,180 for the minimum number of doors that you have to disenchant, 1037 00:50:20,180 --> 00:50:20,680 yeah? 1038 00:50:20,680 --> 00:50:24,940 So it might be that Potry Harter walks really 1039 00:50:24,940 --> 00:50:26,195 far along my graph. 1040 00:50:26,195 --> 00:50:28,570 But as long as they don't walk through an enchanted door, 1041 00:50:28,570 --> 00:50:31,072 it costs them nothing. 1042 00:50:31,072 --> 00:50:32,030 So what does that mean? 1043 00:50:32,030 --> 00:50:34,530 Well, that means that in some sense, the second that I enter 1044 00:50:34,530 --> 00:50:37,710 a room, I might as well walk to every other room 1045 00:50:37,710 --> 00:50:41,370 that it's connected to through unenchanted doors. 1046 00:50:41,370 --> 00:50:43,210 And that doesn't cost me anything. 1047 00:50:43,210 --> 00:50:46,050 So sort of as a policy, I should do that, right? 1048 00:50:46,050 --> 00:50:47,160 I enter a room. 1049 00:50:47,160 --> 00:50:48,930 And then I kind of just search around 1050 00:50:48,930 --> 00:50:50,730 and enter every possible door that I 1051 00:50:50,730 --> 00:50:54,180 can that doesn't cost me an enchantment, because those 1052 00:50:54,180 --> 00:50:54,690 are free. 1053 00:50:54,690 --> 00:50:58,080 And my goal is to visit every room, yeah? 1054 00:50:58,080 --> 00:51:00,870 OK, so here's going to be the sneaky trick. 1055 00:51:00,870 --> 00:51:03,090 Like, what is that starting to smell like?
1056 00:51:03,090 --> 00:51:06,300 I open a door, and now I want to explore all the other rooms 1057 00:51:06,300 --> 00:51:08,370 that are connected to that one. 1058 00:51:08,370 --> 00:51:09,987 AUDIENCE: Maybe a connected component. 1059 00:51:09,987 --> 00:51:12,070 JUSTIN SOLOMON: Yeah, maybe a connected component. 1060 00:51:12,070 --> 00:51:12,820 There's a problem. 1061 00:51:12,820 --> 00:51:14,760 Is it a connected component in this graph? 1062 00:51:14,760 --> 00:51:15,260 Well, no. 1063 00:51:15,260 --> 00:51:17,978 Like, this whole graph is one giant connected component. 1064 00:51:17,978 --> 00:51:19,770 So the sneaky trick is we're actually going 1065 00:51:19,770 --> 00:51:23,100 to remove the enchanted doors. 1066 00:51:23,100 --> 00:51:26,340 That was supposed to erase, and it didn't happen. 1067 00:51:26,340 --> 00:51:29,140 But the point is that if we remove the enchanted doors, 1068 00:51:29,140 --> 00:51:30,900 these are like the chunks of my map 1069 00:51:30,900 --> 00:51:34,540 that I can visit without incurring any cost. 1070 00:51:34,540 --> 00:51:36,600 So if I think of my graph, maybe there's 1071 00:51:36,600 --> 00:51:38,262 a bunch of vertices over here. 1072 00:51:38,262 --> 00:51:39,720 And then there's an enchanted door. 1073 00:51:39,720 --> 00:51:41,550 And there's a bunch of vertices over here, 1074 00:51:41,550 --> 00:51:44,397 and then like two more enchanted doors like that. 1075 00:51:44,397 --> 00:51:46,230 And like, what goes on in here, like if this 1076 00:51:46,230 --> 00:51:47,940 is like a giant triangle or something, 1077 00:51:47,940 --> 00:51:50,910 is actually irrelevant, because once I touch any one of these, 1078 00:51:50,910 --> 00:51:53,740 I can now touch all the rest of them. 1079 00:51:53,740 --> 00:51:54,990 So let's suggest an algorithm.
1080 00:51:54,990 --> 00:52:03,740 So our first step is that we construct a graph, G, 1081 00:52:03,740 --> 00:52:05,420 where the nodes are the rooms-- 1082 00:52:09,440 --> 00:52:10,535 are the rooms. 1083 00:52:13,520 --> 00:52:15,650 And what should the edges be? 1084 00:52:15,650 --> 00:52:19,220 Well, if I'm just trying to find these little clumps of rooms 1085 00:52:19,220 --> 00:52:22,640 that I can visit for free if I get to any one of them, 1086 00:52:22,640 --> 00:52:29,100 then the edges are the non-enchanted doors. 1087 00:52:36,300 --> 00:52:37,230 OK? 1088 00:52:37,230 --> 00:52:41,640 And so now, in step two, I'm going 1089 00:52:41,640 --> 00:52:46,020 to compute my connected components, which 1090 00:52:46,020 --> 00:52:49,030 we covered in lecture-- 1091 00:52:49,030 --> 00:52:51,490 the connected components of my graph, 1092 00:52:51,490 --> 00:52:54,410 G. How much time does that take? 1093 00:52:54,410 --> 00:52:56,660 Well, remember that there are two different algorithms 1094 00:52:56,660 --> 00:52:58,850 we mentioned that can do this. 1095 00:52:58,850 --> 00:53:04,910 This is full BFS or DFS. 1096 00:53:04,910 --> 00:53:07,880 And both of them are going to take the same amount of time. 1097 00:53:07,880 --> 00:53:09,250 What is that time? 1098 00:53:09,250 --> 00:53:11,000 AUDIENCE: Linear in the size of the graph. 1099 00:53:11,000 --> 00:53:13,430 JUSTIN SOLOMON: Linear in the size of the graph. 1100 00:53:13,430 --> 00:53:16,040 So initially, that could be problematic, 1101 00:53:16,040 --> 00:53:17,130 because I want order n. 1102 00:53:17,130 --> 00:53:20,840 Remember, there are n rooms here. 
1103 00:53:20,840 --> 00:53:23,420 But thanks to our degree bound, thanks 1104 00:53:23,420 --> 00:53:25,953 to knowing that every room has at most four doors, 1105 00:53:25,953 --> 00:53:28,370 you can convince yourself that both the number of vertices 1106 00:53:28,370 --> 00:53:30,460 and the number of edges are order n, which 1107 00:53:30,460 --> 00:53:32,502 I should probably rush through, because as usual, 1108 00:53:32,502 --> 00:53:33,920 I'm going slowly. 1109 00:53:33,920 --> 00:53:35,090 OK. 1110 00:53:35,090 --> 00:53:36,815 So now, what do I have? 1111 00:53:36,815 --> 00:53:38,690 I have a list of all the connected components 1112 00:53:38,690 --> 00:53:40,860 in my graph. 1113 00:53:40,860 --> 00:53:43,340 And each one is potentially connected 1114 00:53:43,340 --> 00:53:47,290 to some other ones by enchanted doors. 1115 00:53:47,290 --> 00:53:49,473 So in some sense, I could think about-- 1116 00:53:49,473 --> 00:53:51,640 it's not to say this is the solution to the problem. 1117 00:53:51,640 --> 00:53:53,432 But I could think about modeling my problem 1118 00:53:53,432 --> 00:53:57,010 as making some new graph, where I put a giant vertex 1119 00:53:57,010 --> 00:53:59,020 in every connected component. 1120 00:53:59,020 --> 00:54:01,270 And maybe I connect them by enchanted doors. 1121 00:54:01,270 --> 00:54:06,490 And I want a path that touches every one of these rooms. 1122 00:54:06,490 --> 00:54:09,470 But that's not quite the right way to go. 1123 00:54:09,470 --> 00:54:12,160 And this is what catches you by surprise, because this 1124 00:54:12,160 --> 00:54:13,480 starts to sound like something scary, right? 1125 00:54:13,480 --> 00:54:15,563 If you've heard of the traveling salesman problem, 1126 00:54:15,563 --> 00:54:17,080 it kind of smells like that. 1127 00:54:17,080 --> 00:54:20,160 But that's not actually correct here for two reasons.
1128 00:54:20,160 --> 00:54:22,878 One is that once I open an enchanted door, 1129 00:54:22,878 --> 00:54:23,920 I can go back through it. 1130 00:54:23,920 --> 00:54:25,780 Like, I can like hopscotch back and forth 1131 00:54:25,780 --> 00:54:27,370 through that door as many times as I want, 1132 00:54:27,370 --> 00:54:28,703 and it doesn't cost me anything. 1133 00:54:28,703 --> 00:54:33,380 It only costs me something the first time I open it, yeah? 1134 00:54:33,380 --> 00:54:37,100 And moreover, I didn't ask you to actually 1135 00:54:37,100 --> 00:54:38,643 compute me that path. 1136 00:54:38,643 --> 00:54:40,310 If you read the problem closely, it just 1137 00:54:40,310 --> 00:54:43,890 asks for the minimum number of doors you have to open. 1138 00:54:43,890 --> 00:54:45,830 So this is a really sneaky problem, 1139 00:54:45,830 --> 00:54:50,690 because it turns out there's an additional one line of code 1140 00:54:50,690 --> 00:54:51,853 that solves this problem. 1141 00:54:51,853 --> 00:54:53,270 That's step three, which I'm going 1142 00:54:53,270 --> 00:54:56,600 to write before steps one and two, just to keep you confused. 1143 00:54:56,600 --> 00:55:04,180 And that is to return this number of connected components 1144 00:55:04,180 --> 00:55:04,720 minus 1. 1145 00:55:07,610 --> 00:55:08,680 That seems sneaky. 1146 00:55:08,680 --> 00:55:09,580 Why is that? 1147 00:55:09,580 --> 00:55:14,223 Well, what's going on here is the following, 1148 00:55:14,223 --> 00:55:16,390 which is that let's say that I walk along-- remember 1149 00:55:16,390 --> 00:55:17,580 that my graph is connected. 1150 00:55:17,580 --> 00:55:19,150 So what I know is that I can always 1151 00:55:19,150 --> 00:55:23,080 get from any one connected component to any other. 
1152 00:55:23,080 --> 00:55:26,190 And so let's just take whatever order-- 1153 00:55:26,190 --> 00:55:28,020 notice that the problem hasn't actually 1154 00:55:28,020 --> 00:55:31,720 asked me how to return an efficient path. 1155 00:55:31,720 --> 00:55:34,807 It just asked me for the minimum number of doors I have to open. 1156 00:55:34,807 --> 00:55:36,390 So all I have to do is convince myself 1157 00:55:36,390 --> 00:55:38,850 there exists a path with this many doors I have to open. 1158 00:55:38,850 --> 00:55:40,380 I don't have to actually return it. 1159 00:55:40,380 --> 00:55:44,070 If I did, it would be mildly more annoying to think about. 1160 00:55:44,070 --> 00:55:45,690 OK. 1161 00:55:45,690 --> 00:55:47,292 So my graph is connected. 1162 00:55:47,292 --> 00:55:49,375 So one thing that I could do is make the world's-- 1163 00:55:52,170 --> 00:55:54,490 well, how do I want to do this? 1164 00:55:57,110 --> 00:55:58,170 Well, let's see here. 1165 00:55:58,170 --> 00:56:01,690 I guess I could come up with an ordering that 1166 00:56:01,690 --> 00:56:04,210 looks like depth-first search of my graph. 1167 00:56:04,210 --> 00:56:05,390 That should do it. 1168 00:56:05,390 --> 00:56:05,890 OK. 1169 00:56:05,890 --> 00:56:07,720 So maybe I start at this guy. 1170 00:56:07,720 --> 00:56:10,570 I just start at some arbitrary vertex. 1171 00:56:10,570 --> 00:56:12,760 And then I'm going to do depth-first search, 1172 00:56:12,760 --> 00:56:14,650 but rather than on the full graph, 1173 00:56:14,650 --> 00:56:16,030 on this kind of meta graph, where 1174 00:56:16,030 --> 00:56:19,398 I've clumped together rooms that I can get to with no cost. 1175 00:56:19,398 --> 00:56:20,440 So what am I going to do? 1176 00:56:20,440 --> 00:56:21,820 I'm going to start walking outward 1177 00:56:21,820 --> 00:56:23,740 toward this guy and then a depth-first search, 1178 00:56:23,740 --> 00:56:26,740 backtracking, and then going back down. 
1179 00:56:26,740 --> 00:56:28,328 And if you think about it, remember 1180 00:56:28,328 --> 00:56:30,370 that in depth-first search, I have this property, 1181 00:56:30,370 --> 00:56:34,390 I never need to revisit a clump once I've gotten to it. 1182 00:56:34,390 --> 00:56:36,640 Well, the total number of doors that I'm going to open 1183 00:56:36,640 --> 00:56:38,890 is exactly the number of connected components minus 1, 1184 00:56:38,890 --> 00:56:41,380 because as soon as I've done that, my depth-first search is 1185 00:56:41,380 --> 00:56:43,060 done here, yeah? 1186 00:56:43,060 --> 00:56:46,680 In other words, that's the number of nodes in my meta graph, minus 1. 1187 00:56:46,680 --> 00:56:48,760 So if I took-- 1188 00:56:48,760 --> 00:56:51,930 what would be a better way to-- 1189 00:56:51,930 --> 00:56:53,840 I'm noticing that in my head, this was easier 1190 00:56:53,840 --> 00:56:55,060 to articulate than in words. 1191 00:56:58,315 --> 00:57:03,170 Here would be a way to do it, would be-- 1192 00:57:03,170 --> 00:57:06,060 AUDIENCE: Maybe add some more enchanted doors to the graph? 1193 00:57:06,060 --> 00:57:07,390 JUSTIN SOLOMON: Maybe add some more enchanted doors. 1194 00:57:07,390 --> 00:57:08,100 Ah, that's true. 1195 00:57:08,100 --> 00:57:11,070 Actually, my problem's a little too easy. 1196 00:57:11,070 --> 00:57:13,800 So as long as my depth-first search backtracks 1197 00:57:13,800 --> 00:57:15,510 along the paths it's already found, 1198 00:57:15,510 --> 00:57:17,280 then I'm sort of reaching out into this tentacle, 1199 00:57:17,280 --> 00:57:19,655 and then reaching back, and then reaching to a new place. 1200 00:57:19,655 --> 00:57:21,330 I'll never traverse an enchanted door 1201 00:57:21,330 --> 00:57:26,318 that I don't need to, because I've already seen the location. 1202 00:57:26,318 --> 00:57:28,360 AUDIENCE: So you're traversing a tree, basically? 1203 00:57:28,360 --> 00:57:29,235 JUSTIN SOLOMON: Yeah.
1204 00:57:29,235 --> 00:57:31,993 So I've got a shortest path tree that's going on here. 1205 00:57:31,993 --> 00:57:33,660 Actually, I guess a breadth-first search 1206 00:57:33,660 --> 00:57:34,743 would be a better example. 1207 00:57:37,638 --> 00:57:39,680 In fact, here's-- OK, let's be concrete about it. 1208 00:57:39,680 --> 00:57:40,220 I'm sorry. 1209 00:57:40,220 --> 00:57:42,178 I should have thought about this more carefully 1210 00:57:42,178 --> 00:57:43,687 than I did at home yesterday. 1211 00:57:43,687 --> 00:57:46,020 One thing I could do would be to compute a shortest path 1212 00:57:46,020 --> 00:57:49,790 tree from one vertex in this graph to all the other ones. 1213 00:57:49,790 --> 00:57:52,770 In particular, that gives me the shortest path. 1214 00:57:52,770 --> 00:57:55,490 And I could traverse that tree to one node, 1215 00:57:55,490 --> 00:57:57,103 and then traverse it all the way back, 1216 00:57:57,103 --> 00:57:58,520 and then traverse it to a new node, 1217 00:57:58,520 --> 00:58:01,040 and then traverse it all the way back, and so on. 1218 00:58:01,040 --> 00:58:04,310 This is not an efficient path from a walking perspective. 1219 00:58:04,310 --> 00:58:05,990 But from a door opening perspective, 1220 00:58:05,990 --> 00:58:08,690 it's extremely efficient, because it's a tree, right? 1221 00:58:08,690 --> 00:58:10,640 And remember that the number of edges 1222 00:58:10,640 --> 00:58:13,520 in a spanning tree of my graph is exactly 1223 00:58:13,520 --> 00:58:18,480 the number of vertices in my graph minus 1, 1224 00:58:18,480 --> 00:58:20,850 which is exactly the property we have here. 1225 00:58:20,850 --> 00:58:23,090 Whoo, sweating for a second there. 1226 00:58:23,090 --> 00:58:24,570 OK.
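The three steps for problem 3 (build the graph of non-enchanted doors, count its connected components, return that count minus 1) might look roughly like the Python below. This is a minimal sketch under assumptions that are not in the problem statement: rooms are labeled 0 to n-1 with n at least 1, each door is given as a hypothetical `(u, v, enchanted)` tuple, and the labyrinth is connected once enchanted doors are included. The function name `min_doors_to_disenchant` is made up for illustration.

```python
def min_doors_to_disenchant(n, doors):
    """Minimum number of enchanted doors to open in order to visit every room.

    n: number of rooms, labeled 0..n-1 (assumed n >= 1).
    doors: list of (u, v, enchanted) tuples, enchanted being a bool.
    Assumes the labyrinth is connected when all doors are counted.
    """
    # Step 1: graph whose edges are only the non-enchanted (free) doors.
    adj = [[] for _ in range(n)]
    for u, v, enchanted in doors:
        if not enchanted:
            adj[u].append(v)
            adj[v].append(u)

    # Step 2: count connected components with a full DFS.
    # With at most four doors per room this is O(n).
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)

    # Step 3: a spanning tree over the components has components - 1 edges.
    return components - 1
```

On the T-shaped example from the board, with one enchanted door at each end of the T, the free doors leave three clumps, so the answer is 2.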
1227 00:58:24,570 --> 00:58:27,720 So now, in our remaining 30 minutes here, we've 1228 00:58:27,720 --> 00:58:31,210 got two more problems, which is more than enough time, 1229 00:58:31,210 --> 00:58:33,728 especially because the last problem is 1230 00:58:33,728 --> 00:58:35,520 largely combinatorial and less algorithmic. 1231 00:58:35,520 --> 00:58:37,890 So I think it's OK to focus-- 1232 00:58:37,890 --> 00:58:41,220 maybe talk about that at a high level and show a fun plot. 1233 00:58:41,220 --> 00:58:47,140 OK, so for problem 4, we have an airline, Purity Atlantic. 1234 00:58:47,140 --> 00:58:49,800 That's cute, Jason, really. 1235 00:58:49,800 --> 00:58:53,430 And it's owned by Brichard Ranson. 1236 00:58:53,430 --> 00:58:55,110 Did I get that right? 1237 00:58:55,110 --> 00:59:00,876 And Purity Atlantic has a cute sale-- 1238 00:59:00,876 --> 00:59:05,190 this is not like an acute angle, I suppose-- 1239 00:59:05,190 --> 00:59:06,880 which is essentially the following, 1240 00:59:06,880 --> 00:59:09,210 which is that you can book an itinerary where 1241 00:59:09,210 --> 00:59:11,160 you have your home city. 1242 00:59:11,160 --> 00:59:14,160 And then you choose, I believe, three other cities 1243 00:59:14,160 --> 00:59:16,170 that you want to visit. 1244 00:59:16,170 --> 00:59:18,560 And then Purity Atlantic-- 1245 00:59:18,560 --> 00:59:20,310 maybe you're on your honeymoon, and you're 1246 00:59:20,310 --> 00:59:23,025 not concerned with price, but rather, just the efficiency, 1247 00:59:23,025 --> 00:59:25,650 because you don't want to spend your whole time in an airplane. 1248 00:59:25,650 --> 00:59:28,290 That's particularly true this month. 1249 00:59:28,290 --> 00:59:30,170 Then what do you want to do? 1250 00:59:30,170 --> 00:59:33,090 You want to minimize your total number of connections, right?
1251 00:59:33,090 --> 00:59:38,070 Because as we all know, in spring 2020, 1252 00:59:38,070 --> 00:59:41,190 we don't want to spend very much time in airports, yeah? 1253 00:59:41,190 --> 00:59:43,530 So, right. 1254 00:59:43,530 --> 00:59:45,120 So how do we do that? 1255 00:59:45,120 --> 00:59:47,460 Well, we make a website, where you 1256 00:59:47,460 --> 00:59:49,593 tell Purity Atlantic the cities that you want 1257 00:59:49,593 --> 00:59:50,760 to visit and your home city. 1258 00:59:50,760 --> 00:59:53,430 And they give you back an efficient itinerary 1259 00:59:53,430 --> 00:59:56,410 that minimizes the number of connections. 1260 00:59:56,410 --> 00:59:56,910 OK. 1261 00:59:56,910 --> 01:00:00,140 And the question is, how do you actually do that, right? 1262 01:00:00,140 --> 01:00:03,420 How do you compute the best itinerary 1263 01:00:03,420 --> 01:00:05,837 that minimizes the number of flights you have to take? 1264 01:00:05,837 --> 01:00:06,920 So what are our variables? 1265 01:00:06,920 --> 01:00:08,330 And sometimes it feels like the variables 1266 01:00:08,330 --> 01:00:10,190 are all the different permutations of the cities you 1267 01:00:10,190 --> 01:00:10,990 could visit, right? 1268 01:00:10,990 --> 01:00:15,140 I could go to, I don't know, Cambridge, Boston, and then 1269 01:00:15,140 --> 01:00:16,760 Cambridge in the UK. 1270 01:00:16,760 --> 01:00:18,590 Maybe you're doing like a University thing, 1271 01:00:18,590 --> 01:00:21,210 and then, I don't know, Budapest and some other place. 1272 01:00:21,210 --> 01:00:23,330 Or I could do those in any other order. 1273 01:00:23,330 --> 01:00:25,680 And that feels like it should be factorial, 1274 01:00:25,680 --> 01:00:28,820 which would be bad news. 1275 01:00:28,820 --> 01:00:33,140 But this is one of these problems which 1276 01:00:33,140 --> 01:00:35,750 I suppose a computer science theorist might 1277 01:00:35,750 --> 01:00:37,190 call fixed parameter tractable. 
1278 01:00:37,190 --> 01:00:39,800 But that's sort of an overkill term here. 1279 01:00:39,800 --> 01:00:41,690 But essentially, as long as you ignore 1280 01:00:41,690 --> 01:00:45,410 all the factors that make this problem hard, then it's easy. 1281 01:00:45,410 --> 01:00:48,290 A different way to put it is that, OK, 1282 01:00:48,290 --> 01:00:50,720 if I'm only visiting three cities, what's 1283 01:00:50,720 --> 01:00:56,650 the total number of possible orderings of my three cities? 1284 01:00:56,650 --> 01:00:59,090 Class? 1285 01:00:59,090 --> 01:01:04,960 So I have city A, B, C. I could do B, C, A. I could do B, A, C. 1286 01:01:04,960 --> 01:01:06,150 AUDIENCE: List them out. 1287 01:01:06,150 --> 01:01:09,270 JUSTIN SOLOMON: Yeah, fine, Jason. 1288 01:01:09,270 --> 01:01:10,870 I'll do that. 1289 01:01:10,870 --> 01:01:15,830 So this is what we call direct proof mathematically, 1290 01:01:15,830 --> 01:01:21,050 which are other possible ways to visit three cities. 1291 01:01:28,810 --> 01:01:32,710 And now, by my direct proof, I claim there are no other ways 1292 01:01:32,710 --> 01:01:34,360 to visit three cities. 1293 01:01:34,360 --> 01:01:36,890 And in particular, there are 1, 2, 3, 4, 5, 1294 01:01:36,890 --> 01:01:41,470 6 different orderings of the cities that I can visit. 1295 01:01:41,470 --> 01:01:43,570 Notice that this is a constant in my problem. 1296 01:01:43,570 --> 01:01:45,850 I am not asking you to make a website that 1297 01:01:45,850 --> 01:01:48,747 takes like the total set of cities 1298 01:01:48,747 --> 01:01:50,830 that you want to visit as a couple and order them. 1299 01:01:50,830 --> 01:01:52,450 It's specifically three. 1300 01:01:52,450 --> 01:01:57,730 You might also notice that 6 is 3 factorial, which is perhaps 1301 01:01:57,730 --> 01:02:00,880 a more efficient way to get to that same bound. 1302 01:02:00,880 --> 01:02:02,050 OK. 1303 01:02:02,050 --> 01:02:02,740 So, right. 
1304 01:02:02,740 --> 01:02:05,470 So there are six different orderings of the cities. 1305 01:02:05,470 --> 01:02:09,090 And in each case, what am I going to have to do? 1306 01:02:09,090 --> 01:02:13,270 I'm going to have to compute the sum of going from my source 1307 01:02:13,270 --> 01:02:16,498 city to the first one, from the first one to the second one, 1308 01:02:16,498 --> 01:02:18,040 from the second one to the third one, 1309 01:02:18,040 --> 01:02:21,490 from the third one back to my source city, OK? 1310 01:02:21,490 --> 01:02:24,140 So what do I need? 1311 01:02:24,140 --> 01:02:26,535 Well, I need-- in some sense, I want 1312 01:02:26,535 --> 01:02:27,910 to be conservative about it, just 1313 01:02:27,910 --> 01:02:30,850 the cost of flying from every city to every other city. 1314 01:02:30,850 --> 01:02:32,390 But that's not quite right. 1315 01:02:32,390 --> 01:02:35,590 I only need the cost of flying from every city 1316 01:02:35,590 --> 01:02:37,270 that you have specified as a city you're 1317 01:02:37,270 --> 01:02:38,920 interested in to every other city 1318 01:02:38,920 --> 01:02:41,524 that you've specified that you're interested in, yeah? 1319 01:02:41,524 --> 01:02:42,790 OK. 1320 01:02:42,790 --> 01:02:46,880 So in particular, I go to my new one. 1321 01:02:46,880 --> 01:02:52,300 So in this problem, we have c cities and f flights. 1322 01:02:56,780 --> 01:02:57,280 OK. 1323 01:02:57,280 --> 01:02:58,910 And initially, it might seem that we 1324 01:02:58,910 --> 01:03:01,010 have to compute a ton of shortest paths.
1325 01:03:01,010 --> 01:03:03,170 But like, if I want to go from Boston, 1326 01:03:03,170 --> 01:03:06,365 to Budapest, to London, to-- 1327 01:03:06,365 --> 01:03:07,490 I'm running out of cities-- 1328 01:03:07,490 --> 01:03:10,460 Paris, and back to Boston, or whatever ordering I prefer, 1329 01:03:10,460 --> 01:03:16,070 do I need to worry about the shortest path from Nebraska 1330 01:03:16,070 --> 01:03:17,630 to California? 1331 01:03:17,630 --> 01:03:18,297 Potentially not. 1332 01:03:18,297 --> 01:03:19,588 Like, that could be irrelevant. 1333 01:03:19,588 --> 01:03:22,040 The only ones that I care about are those four cities 1334 01:03:22,040 --> 01:03:24,770 that I've identified, OK? 1335 01:03:24,770 --> 01:03:29,620 So there's 3 factorial possible permutations. 1336 01:03:29,620 --> 01:03:36,700 And at the end of the day, well, there's 2 times 4 choose 2. 1337 01:03:36,700 --> 01:03:45,480 If you're wondering, this is 12, or big O of 1 pairs of cities, 1338 01:03:45,480 --> 01:03:46,470 meaning that, like-- 1339 01:03:49,260 --> 01:03:53,930 for itinerary purposes-- itinerary-- 1340 01:03:57,480 --> 01:04:02,685 meaning that if I always enter an airport in one of city A, B, 1341 01:04:02,685 --> 01:04:06,720 C, or my hometown, and I always exit through another one, 1342 01:04:06,720 --> 01:04:08,820 so then there's four possible cities. 1343 01:04:08,820 --> 01:04:11,550 I choose them two at a time. 1344 01:04:11,550 --> 01:04:13,430 Notice that flights might not be ordered. 1345 01:04:13,430 --> 01:04:15,680 Like, I might be able to get from one city to another. 1346 01:04:15,680 --> 01:04:18,150 But then maybe the airplane has a connection or something. 1347 01:04:18,150 --> 01:04:19,970 So going back is a different cost. 1348 01:04:19,970 --> 01:04:24,140 But totally, there's 2 times 4 choose 2 different pairs 1349 01:04:24,140 --> 01:04:27,050 of cities that I could enter or exit from. 1350 01:04:27,050 --> 01:04:28,400 OK. 
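As a quick check of the two counts quoted above (orderings of the three destinations, and ordered pairs among the four cities of interest), the arithmetic can be written out. Here $k = 4$ is just shorthand for "home plus three destinations" and is not a symbol from the problem statement:

```latex
\[
  3! = 3 \cdot 2 \cdot 1 = 6
  \qquad\text{and}\qquad
  2\binom{4}{2} = 2 \cdot \frac{4 \cdot 3}{2} = 12 = k(k-1)\Big|_{k=4}.
\]
```

Both numbers are constants, which is why they drop out of the asymptotic running time.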
1351 01:04:28,400 --> 01:04:31,020 So now, what am I going to do? 1352 01:04:31,020 --> 01:04:36,790 Well, so I can compute the 12 different shortest paths 1353 01:04:36,790 --> 01:04:38,640 that matter in my graph. 1354 01:04:38,640 --> 01:04:40,720 So when I say shortest path, what do I mean? 1355 01:04:40,720 --> 01:04:49,850 Well, I'm going to construct a graph, G, 1356 01:04:49,850 --> 01:05:02,660 with one vertex per city and one edge per flight. 1357 01:05:05,760 --> 01:05:08,130 And notice the number of connections 1358 01:05:08,130 --> 01:05:10,020 that I need to make, the minimum number 1359 01:05:10,020 --> 01:05:13,020 between any city and any other city, 1360 01:05:13,020 --> 01:05:14,760 is equal to the shortest path-- 1361 01:05:14,760 --> 01:05:17,370 the length of the shortest path minus 1, right? 1362 01:05:17,370 --> 01:05:18,885 So like, maybe I have-- 1363 01:05:18,885 --> 01:05:20,430 like, here's Boston. 1364 01:05:20,430 --> 01:05:21,630 Here's London. 1365 01:05:21,630 --> 01:05:28,450 Here's Paris, B, L, P for short. 1366 01:05:28,450 --> 01:05:31,977 Then the length of my shortest path is 2. 1367 01:05:31,977 --> 01:05:33,810 And the number of connections I have to make 1368 01:05:33,810 --> 01:05:37,110 is 1, because I stop through London, yeah? 1369 01:05:37,110 --> 01:05:38,470 So what am I going to do? 1370 01:05:38,470 --> 01:05:46,623 Well, for every pair of cities in this-- 1371 01:05:46,623 --> 01:05:49,620 oops-- no, that's OK-- 1372 01:05:53,230 --> 01:05:57,250 in the set of the source city and the three cities 1373 01:05:57,250 --> 01:06:04,280 you want to visit, I'm going to compute the shortest-- 1374 01:06:04,280 --> 01:06:06,257 the length of the shortest path. 1375 01:06:06,257 --> 01:06:08,090 So this is the minimum number of connections 1376 01:06:08,090 --> 01:06:09,548 I need to get from any one of these 1377 01:06:09,548 --> 01:06:14,600 to any other one in my graph, G.
1378 01:06:14,600 --> 01:06:16,100 Well, how much time does this take? 1379 01:06:16,100 --> 01:06:17,690 Well, there's 12 such pairs. 1380 01:06:17,690 --> 01:06:19,740 We already argued that. 1381 01:06:19,740 --> 01:06:25,890 And how much time does it take to actually do shortest path, 1382 01:06:25,890 --> 01:06:27,817 so using breadth-first search? 1383 01:06:27,817 --> 01:06:29,650 AUDIENCE: Linear time the size of the graph. 1384 01:06:29,650 --> 01:06:31,733 JUSTIN SOLOMON: Linear time the size of the graph. 1385 01:06:31,733 --> 01:06:34,060 I think Jason actually has a t-shirt that says that. 1386 01:06:34,060 --> 01:06:36,040 Well, in this case, remember, that's 1387 01:06:36,040 --> 01:06:39,370 big O of the number of edges plus the number of vertices. 1388 01:06:39,370 --> 01:06:42,400 But just to make your life a little more annoying, 1389 01:06:42,400 --> 01:06:44,647 the number of vertices is the number of cities. 1390 01:06:44,647 --> 01:06:46,980 And the number of edges is the number of flights, right? 1391 01:06:46,980 --> 01:06:52,700 So this takes 12 times O of c plus f time, which, of course, 1392 01:06:52,700 --> 01:06:57,472 is just O of c plus f time. 1393 01:06:57,472 --> 01:06:58,930 Notice, this is one of these things 1394 01:06:58,930 --> 01:06:59,972 where we're being sneaky. 1395 01:06:59,972 --> 01:07:01,870 We told you that you specifically 1396 01:07:01,870 --> 01:07:04,030 visit three places. 1397 01:07:04,030 --> 01:07:06,580 And that's where this number 12 came from. 1398 01:07:06,580 --> 01:07:09,400 If we'd said that you wanted to visit m cities, 1399 01:07:09,400 --> 01:07:11,710 then this would be a very different homework problem. 1400 01:07:11,710 --> 01:07:13,710 This is one of those things you got to remember, 1401 01:07:13,710 --> 01:07:15,400 where we've given you a few constants, 1402 01:07:15,400 --> 01:07:17,530 and you should use them. 1403 01:07:17,530 --> 01:07:18,430 OK. 
1404 01:07:18,430 --> 01:07:21,680 So now, what can I do? 1405 01:07:21,680 --> 01:07:39,400 I can iterate over every permutation of A, B, C, right? 1406 01:07:39,400 --> 01:07:45,670 So this is like saying I go from my source to city 1, to city 2, 1407 01:07:45,670 --> 01:07:49,930 to city 3, back to my source. 1408 01:07:49,930 --> 01:08:01,170 I add together and compute the cost, the cost of that trip. 1409 01:08:01,170 --> 01:08:02,940 And remember, cost in this case is 1410 01:08:02,940 --> 01:08:05,970 equal to the minimum number of connections. 1411 01:08:05,970 --> 01:08:07,500 And then I return the minimizer. 1412 01:08:10,890 --> 01:08:12,540 So I say, like, is it cheaper for me 1413 01:08:12,540 --> 01:08:15,270 to go Boston, Budapest, Paris, Boston, Paris, 1414 01:08:15,270 --> 01:08:17,140 Budapest, and so on. 1415 01:08:17,140 --> 01:08:19,080 So a for loop over permutations, which 1416 01:08:19,080 --> 01:08:20,609 generally is frowned upon. 1417 01:08:20,609 --> 01:08:22,710 But in this case, because we told you 1418 01:08:22,710 --> 01:08:26,090 you're visiting precisely three places, 1419 01:08:26,090 --> 01:08:29,390 how many steps are going to happen in that for loop? 1420 01:08:29,390 --> 01:08:32,390 Well, we actually wrote them all out over here on our board. 1421 01:08:32,390 --> 01:08:36,080 It's exactly 3 factorial, or 6 steps, right? 1422 01:08:36,080 --> 01:08:39,200 3 times 2 times 1, which is 6. 1423 01:08:39,200 --> 01:08:40,939 OK. 1424 01:08:40,939 --> 01:08:41,569 So, right. 1425 01:08:41,569 --> 01:08:44,300 So at the end of the day, this for loop 1426 01:08:44,300 --> 01:08:52,859 is going to take, well, order 6 time, which, of course, is just 1427 01:08:52,859 --> 01:08:53,370 order 1. 1428 01:08:53,370 --> 01:08:56,160 So it doesn't really contribute to our runtime at all. 1429 01:08:56,160 --> 01:09:01,560 And our entire algorithm runs in c plus f time. 1430 01:09:01,560 --> 01:09:02,677 OK, so right. 
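The itinerary algorithm described above can be sketched in Python roughly as follows. This is a minimal illustration, not the official solution: the function names (`bfs_distances`, `best_itinerary`), the representation of flights as `(origin, destination)` pairs, and the treatment of flights as directed edges are all assumptions made for the example. It also runs one breadth-first search per city of interest (four searches) rather than one per pair, since a single search from a city already yields its distance to the other three; that still covers the 12 ordered pairs in O(c + f) time.

```python
from collections import deque
from itertools import permutations

def bfs_distances(adj, source):
    """Return {city: fewest flights needed to reach it from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def best_itinerary(flights, home, stops):
    """Return (total connections, route) for the cheapest round trip, or None."""
    # One vertex per city, one directed edge per flight.
    adj = {}
    for origin, destination in flights:
        adj.setdefault(origin, []).append(destination)

    # One BFS from each of the 4 cities of interest: O(c + f) each.
    dist = {city: bfs_distances(adj, city) for city in (home, *stops)}

    best = None
    for order in permutations(stops):  # 3! = 6 orderings when len(stops) == 3
        route = (home, *order, home)
        legs = [dist[a].get(b) for a, b in zip(route, route[1:])]
        if None in legs:  # some leg of this ordering is unreachable
            continue
        # A leg that uses k flights requires k - 1 connections.
        cost = sum(k - 1 for k in legs)
        if best is None or cost < best[0]:
            best = (cost, route)
    return best
```

Because every ordering has the same number of legs, minimizing total connections and minimizing total flights pick out the same ordering; the sketch reports connections to match the cost used in the session.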
1431 01:09:02,677 --> 01:09:04,260 So this is one of these problems where 1432 01:09:04,260 --> 01:09:06,210 you're really taking advantage of the constants 1433 01:09:06,210 --> 01:09:06,918 that we gave you. 1434 01:09:06,918 --> 01:09:09,060 We said you're visiting three cities. 1435 01:09:09,060 --> 01:09:10,470 So use it. 1436 01:09:10,470 --> 01:09:12,720 Incidentally, as a computer science theorist, 1437 01:09:12,720 --> 01:09:15,930 if I said you're visiting exactly 17 cities, 1438 01:09:15,930 --> 01:09:17,800 well, what would be our numbers now? 1439 01:09:17,800 --> 01:09:22,020 I mean, it would be 17 factorial and then like 17 choose 2. 1440 01:09:22,020 --> 01:09:22,960 Those are big numbers. 1441 01:09:22,960 --> 01:09:24,127 But they're still constants. 1442 01:09:24,127 --> 01:09:26,580 So for purposes of this class, that would be OK. 1443 01:09:26,580 --> 01:09:29,939 But the second that I give it a name, like m, 1444 01:09:29,939 --> 01:09:32,700 then I got to think about those factorial things a little more 1445 01:09:32,700 --> 01:09:34,490 carefully. 1446 01:09:34,490 --> 01:09:36,439 All right. 1447 01:09:36,439 --> 01:09:37,510 So that's this problem. 1448 01:09:37,510 --> 01:09:39,950 So the basic trick here was that, 1449 01:09:39,950 --> 01:09:42,800 like, yeah, it looks like all pairs shortest path. 1450 01:09:42,800 --> 01:09:43,760 But it's not quite. 1451 01:09:43,760 --> 01:09:45,740 It's all pairs of things that you're actually 1452 01:09:45,740 --> 01:09:47,850 going to travel between shortest path. 1453 01:09:47,850 --> 01:09:51,090 And since that number of pairs is finite-- it's just 12-- 1454 01:09:51,090 --> 01:09:53,210 that's an OK thing to do. 1455 01:09:53,210 --> 01:09:55,590 OK. 1456 01:09:55,590 --> 01:09:57,150 How we doing? 1457 01:09:57,150 --> 01:09:58,020 Ah, 15 minutes. 1458 01:09:58,020 --> 01:09:59,468 Perfect. 1459 01:09:59,468 --> 01:10:01,010 I didn't want to do the last problem. 
1460 01:10:01,010 --> 01:10:03,860 And I think I've managed to get myself 1461 01:10:03,860 --> 01:10:05,390 in exactly that position. 1462 01:10:05,390 --> 01:10:05,970 OK. 1463 01:10:05,970 --> 01:10:07,820 So the very last problem on this homework, 1464 01:10:07,820 --> 01:10:09,237 which, again, this homework really 1465 01:10:09,237 --> 01:10:12,260 follows the prototypical 6.006 breadth-first search, 1466 01:10:12,260 --> 01:10:13,700 depth-first search homework. 1467 01:10:13,700 --> 01:10:16,190 I feel like they all fall into a similar pattern. 1468 01:10:16,190 --> 01:10:18,680 Again, all these resources are available to you guys. 1469 01:10:18,680 --> 01:10:20,960 You should look at them. 1470 01:10:20,960 --> 01:10:22,850 We're not trying to hide anything. 1471 01:10:22,850 --> 01:10:24,920 This problem involves solving a pocket 1472 01:10:24,920 --> 01:10:26,990 cube, which is like a little mini Rubik's cube, 1473 01:10:26,990 --> 01:10:29,750 which is 2 by 2. 1474 01:10:29,750 --> 01:10:33,340 And it looks like this. 1475 01:10:33,340 --> 01:10:34,750 Ah, there's chalk. 1476 01:10:34,750 --> 01:10:38,750 Actually, there we go. 1477 01:10:38,750 --> 01:10:39,840 So here's my Rubik's cube. 1478 01:10:42,780 --> 01:10:47,790 Looks like a cube, which I'm having some trouble drawing. 1479 01:10:47,790 --> 01:10:52,190 And in particular, it's 2 by 2, which 1480 01:10:52,190 --> 01:10:56,870 makes it a little easier than your typical Rubik's cube. 1481 01:10:56,870 --> 01:11:00,260 And in particular, we're going to mark some faces. 1482 01:11:00,260 --> 01:11:02,340 Sneakily, they used a little geometry term here, 1483 01:11:02,340 --> 01:11:03,300 which is cute. 1484 01:11:03,300 --> 01:11:05,492 So here's face f0. 1485 01:11:05,492 --> 01:11:06,950 I'm sorry you can't quite see that. 1486 01:11:06,950 --> 01:11:11,360 But the top face is f0, in case you were wondering. 1487 01:11:11,360 --> 01:11:15,650 The left face is f1. 
1488 01:11:15,650 --> 01:11:20,323 And the front-facing face here, f2. 1489 01:11:20,323 --> 01:11:21,740 Notice that we've identified these 1490 01:11:21,740 --> 01:11:24,620 by like vectors that point 90 degrees out from the face. 1491 01:11:24,620 --> 01:11:26,625 These are called normal vectors. 1492 01:11:26,625 --> 01:11:28,250 If you want to define those rigorously, 1493 01:11:28,250 --> 01:11:30,950 you can take my grad level class. 1494 01:11:30,950 --> 01:11:33,960 But for a Rubik's cube, it's not terribly difficult. 1495 01:11:33,960 --> 01:11:37,250 But in any event, I can talk about flipping this Rubik's 1496 01:11:37,250 --> 01:11:40,080 cube in a pretty easy way, which is that I like, 1497 01:11:40,080 --> 01:11:42,450 I'm going to fix one corner of my cube. 1498 01:11:42,450 --> 01:11:45,110 So this is like the corner that I'm holding onto with my hand. 1499 01:11:45,110 --> 01:11:47,030 And now I can grab, what? 1500 01:11:47,030 --> 01:11:50,330 The top, the side, or the front of my cube. 1501 01:11:50,330 --> 01:11:53,218 And I can rotate it clockwise or counterclockwise. 1502 01:11:53,218 --> 01:11:55,760 And you can convince yourself those are all the possible ways 1503 01:11:55,760 --> 01:11:58,280 that I could sort of mess with the state of my cube 1504 01:11:58,280 --> 01:12:01,100 after fixing one corner. 1505 01:12:01,100 --> 01:12:02,960 OK, right. 1506 01:12:02,960 --> 01:12:05,592 And so this problem basically is involving 1507 01:12:05,592 --> 01:12:06,800 sort of a very typical trick. 1508 01:12:06,800 --> 01:12:10,430 In fact, a lot of the history of these different search 1509 01:12:10,430 --> 01:12:13,070 algorithms-- breadth-first search, depth-first search, A*, 1510 01:12:13,070 --> 01:12:16,370 which I guess we won't really cover here-- 1511 01:12:16,370 --> 01:12:19,770 date back to, what, 20 or 30 years ago, 1512 01:12:19,770 --> 01:12:22,250 we would have called artificial intelligence. 
1513 01:12:22,250 --> 01:12:25,220 These days, that has a very different meaning. 1514 01:12:25,220 --> 01:12:28,580 But back in the day, AI was all about solving board games, 1515 01:12:28,580 --> 01:12:31,910 and Rubik's cubes, and all these kinds of things, 1516 01:12:31,910 --> 01:12:32,912 using algorithms. 1517 01:12:32,912 --> 01:12:34,370 And the way that they would do that 1518 01:12:34,370 --> 01:12:37,910 is by searching the different spaces of configurations. 1519 01:12:37,910 --> 01:12:41,420 And so now, if we think of every face of this cube 1520 01:12:41,420 --> 01:12:45,080 as painted with a color, there are different configurations 1521 01:12:45,080 --> 01:12:49,280 of my graph that I get by flipping the three sides. 1522 01:12:49,280 --> 01:12:59,610 So if we think of there being a vertex for each state 1523 01:12:59,610 --> 01:13:03,180 of my cube, where state here means 1524 01:13:03,180 --> 01:13:07,020 like the coloring of every face on my Rubik's cube, 1525 01:13:07,020 --> 01:13:14,610 then there's an edge for every move. 1526 01:13:14,610 --> 01:13:17,460 And in this problem, we encoded a move as a pair, 1527 01:13:17,460 --> 01:13:20,280 j comma s, where it's saying that I'm 1528 01:13:20,280 --> 01:13:28,710 going to rotate face fj, where j is between, I guess, 0, 1, 2, 1529 01:13:28,710 --> 01:13:32,160 in direction s. 1530 01:13:32,160 --> 01:13:35,940 And we can just index that as like plus or minus 1 to kind 1531 01:13:35,940 --> 01:13:38,970 of say counterclockwise or clockwise. 1532 01:13:38,970 --> 01:13:40,720 So this is kind of a cute thing, where 1533 01:13:40,720 --> 01:13:43,310 your graph has a bunch of vertices, 1534 01:13:43,310 --> 01:13:46,760 which are all Rubik's cubes. 1535 01:13:46,760 --> 01:13:48,080 That's a cube. 1536 01:13:48,080 --> 01:13:50,900 And then there are edges if I can get from one to another 1537 01:13:50,900 --> 01:13:53,070 by doing one of these moves. 
1538 01:13:53,070 --> 01:13:54,980 And this is a nice abstraction, because if I 1539 01:13:54,980 --> 01:13:58,298 want to solve a Rubik's cube in the most efficient way 1540 01:13:58,298 --> 01:14:00,590 possible, one way to do that is to compute the shortest 1541 01:14:00,590 --> 01:14:03,920 path from my current configuration 1542 01:14:03,920 --> 01:14:08,090 to the Platonic Rubik's cube, where 1543 01:14:08,090 --> 01:14:11,720 all the colors are constant on the different faces of my cube. 1544 01:14:11,720 --> 01:14:15,560 And so that's like a sort of basic identification 1545 01:14:15,560 --> 01:14:18,650 that happens all over the place in search strategies, 1546 01:14:18,650 --> 01:14:21,530 where I'm going to think of every vertex of my graph 1547 01:14:21,530 --> 01:14:24,290 as being the state of some system and every edge 1548 01:14:24,290 --> 01:14:26,900 as being a transition from one to another. 1549 01:14:26,900 --> 01:14:29,693 And then paths in this thing are kind of like different ways 1550 01:14:29,693 --> 01:14:30,860 of solving my puzzle, right? 1551 01:14:30,860 --> 01:14:33,740 So like a different one would be, I don't know, 1552 01:14:33,740 --> 01:14:38,210 every vertex is a chess board with the chess pieces scattered 1553 01:14:38,210 --> 01:14:39,080 on the chess board. 1554 01:14:39,080 --> 01:14:42,520 And every edge is one chess move by one player or the other. 1555 01:14:42,520 --> 01:14:45,200 In that case, you'd have to be a little careful, because you 1556 01:14:45,200 --> 01:14:47,825 want player 1 or player 2 to go back and forth from each other. 1557 01:14:47,825 --> 01:14:50,150 But I'll let you think about the reduction there. 1558 01:14:50,150 --> 01:14:52,010 OK, so right. 1559 01:14:52,010 --> 01:14:54,680 This problem, I think largely, is mostly 1560 01:14:54,680 --> 01:14:57,120 just fun combinatorics rather than algorithms. 1561 01:14:57,120 --> 01:15:00,240 But there's a little bit of algorithms hiding in here. 
1562 01:15:00,240 --> 01:15:02,240 So they want you to argue that the number 1563 01:15:02,240 --> 01:15:04,460 of distinct configurations of this Rubik's 1564 01:15:04,460 --> 01:15:08,062 cube, this 2-by-2 guy, is less than 12 million. 1565 01:15:08,062 --> 01:15:10,520 This is nice, because 12 million is a number that computers 1566 01:15:10,520 --> 01:15:12,698 can actually cope with. 1567 01:15:12,698 --> 01:15:14,990 And so there's a pretty straightforward argument there. 1568 01:15:19,860 --> 01:15:20,790 Right. 1569 01:15:20,790 --> 01:15:27,490 So in particular, here's a cube. 1570 01:15:27,490 --> 01:15:29,463 How many corners are in a cube? 1571 01:15:29,463 --> 01:15:30,130 AUDIENCE: Eight. 1572 01:15:30,130 --> 01:15:31,450 JUSTIN SOLOMON: Eight, thanks. 1573 01:15:31,450 --> 01:15:35,650 I hid one back here, in case you were wondering. 1574 01:15:35,650 --> 01:15:39,220 So let's say that I fix a corner of the cube, 1575 01:15:39,220 --> 01:15:40,900 like we've done that. 1576 01:15:40,900 --> 01:15:44,470 Then every time that I rotate one of the faces of my cube 1577 01:15:44,470 --> 01:15:46,150 clockwise or counterclockwise, I'm 1578 01:15:46,150 --> 01:15:48,080 essentially like taking one corner of my cube 1579 01:15:48,080 --> 01:15:50,890 and like sticking it in another place, right? 1580 01:15:50,890 --> 01:15:56,835 So in all, seven corners of my cube can move. 1581 01:15:59,980 --> 01:16:02,605 And if I'm not worried about, like-- 1582 01:16:02,605 --> 01:16:04,690 it could be that some of these permutations 1583 01:16:04,690 --> 01:16:06,655 are not actually achievable by a set of steps. 1584 01:16:06,655 --> 01:16:08,530 Like, maybe I'd have to break my Rubik's cube 1585 01:16:08,530 --> 01:16:10,580 and glue it back together.
1586 01:16:10,580 --> 01:16:12,490 But if I'm being conservative about it, 1587 01:16:12,490 --> 01:16:15,430 there's of course less than or equal to 7 1588 01:16:15,430 --> 01:16:18,415 factorial different configurations of the corners. 1589 01:16:24,520 --> 01:16:26,590 So in other words, every time I rotate my face, 1590 01:16:26,590 --> 01:16:28,700 one of the corners ends up in a different place. 1591 01:16:28,700 --> 01:16:31,325 So there's 7 factorial different ways that could have happened. 1592 01:16:33,840 --> 01:16:35,957 OK, so that's part of my bound. 1593 01:16:35,957 --> 01:16:38,040 Remember that I'm trying to bound the total number 1594 01:16:38,040 --> 01:16:40,020 of configurations here. 1595 01:16:40,020 --> 01:16:42,240 And essentially, what I've done so far is I've said, 1596 01:16:42,240 --> 01:16:46,180 OK, well, there's a bunch of cubes 1597 01:16:46,180 --> 01:16:47,980 in my 2-by-2 Rubik's cube. 1598 01:16:47,980 --> 01:16:50,890 So I'm going to like unglue this entire cube, 1599 01:16:50,890 --> 01:16:53,568 take just this corner, and stick it up here. 1600 01:16:53,568 --> 01:16:55,360 And there's like 7 factorial different ways 1601 01:16:55,360 --> 01:16:57,873 that I could do that. 1602 01:16:57,873 --> 01:16:59,540 But I still have to account for the fact 1603 01:16:59,540 --> 01:17:00,830 that I pull this piece off. 1604 01:17:00,830 --> 01:17:02,002 I stick it in the top. 1605 01:17:02,002 --> 01:17:03,710 But I have to figure out its orientation. 1606 01:17:03,710 --> 01:17:07,340 I can still rotate it about this corner. 1607 01:17:07,340 --> 01:17:09,140 And in fact, there are three different ways 1608 01:17:09,140 --> 01:17:10,400 that I could rotate it, right? 1609 01:17:10,400 --> 01:17:11,840 You can kind of see it, right? 1610 01:17:11,840 --> 01:17:15,300 1, 2, 3, yeah? 1611 01:17:15,300 --> 01:17:27,750 So in all, so each corner can rotate three ways. 
1612 01:17:27,750 --> 01:17:32,300 So that means that I have 3 times 7 factorial 1613 01:17:32,300 --> 01:17:36,280 different configurations as an upper bound. 1614 01:17:36,280 --> 01:17:45,520 And this number is, wait for it, 11,022,480. 1615 01:17:45,520 --> 01:17:49,300 The problem asks you to argue that your upper bound is 1616 01:17:49,300 --> 01:17:50,840 upper bounded by 12 million. 1617 01:17:50,840 --> 01:17:55,045 And indeed, it is less than or equal to 12 million. 1618 01:17:55,045 --> 01:17:59,480 AUDIENCE: Is that 3 times 7 factorial or-- 1619 01:17:59,480 --> 01:18:00,830 JUSTIN SOLOMON: Oh, I'm sorry. 1620 01:18:00,830 --> 01:18:03,247 Right, because there are seven corners, each of which 1621 01:18:03,247 --> 01:18:04,580 can rotate three different ways. 1622 01:18:04,580 --> 01:18:07,730 It's actually 3 to the 7th power times 7 factorial. 1623 01:18:07,730 --> 01:18:10,550 Thank you, student. 1624 01:18:10,550 --> 01:18:12,170 OK, right. 1625 01:18:12,170 --> 01:18:13,820 So let's see here. 1626 01:18:13,820 --> 01:18:17,030 Really quickly moving here, the next problem says, 1627 01:18:17,030 --> 01:18:18,890 state the maximum and minimum degree 1628 01:18:18,890 --> 01:18:22,350 of any vertex in my graph. 1629 01:18:22,350 --> 01:18:24,420 First of all, do I expect vertices 1630 01:18:24,420 --> 01:18:25,740 to have different degrees? 1631 01:18:25,740 --> 01:18:28,320 This is kind of a goofy problem. 1632 01:18:28,320 --> 01:18:35,450 Like, what would it mean to have a vertex that somehow has lower 1633 01:18:35,450 --> 01:18:36,575 degree than another vertex? 1634 01:18:36,575 --> 01:18:38,450 It would mean that there's some configuration 1635 01:18:38,450 --> 01:18:40,950 of this cube for which there are fewer moves that I 1636 01:18:40,950 --> 01:18:43,320 could do to change it than a different configuration 1637 01:18:43,320 --> 01:18:45,582 of this cube. 1638 01:18:45,582 --> 01:18:47,290 And that's obviously not the case, right? 
1639 01:18:47,290 --> 01:18:51,820 Because when I flip one of the faces of my cube, all I'm doing 1640 01:18:51,820 --> 01:18:53,230 is I'm moving the colors around. 1641 01:18:53,230 --> 01:18:55,430 I haven't somehow changed the physics of how 1642 01:18:55,430 --> 01:18:57,160 a Rubik's cube works, right? 1643 01:18:57,160 --> 01:19:02,140 And so I think this was just intended to be 1644 01:19:02,140 --> 01:19:04,120 annoying by your instructors. 1645 01:19:04,120 --> 01:19:13,780 The min degree is equal to the max degree. 1646 01:19:13,780 --> 01:19:17,290 And in fact, the degree of every node in my graph 1647 01:19:17,290 --> 01:19:20,180 is constant here. 1648 01:19:20,180 --> 01:19:21,980 The one thing that's worth noting here-- 1649 01:19:21,980 --> 01:19:23,123 what I haven't argued-- 1650 01:19:23,123 --> 01:19:24,540 it turns out, I think, to be true. 1651 01:19:24,540 --> 01:19:27,980 But what I haven't argued is that I couldn't rotate a face 1652 01:19:27,980 --> 01:19:29,930 and actually end up in the same configuration. 1653 01:19:29,930 --> 01:19:32,128 Like, maybe, for some reason, I had red all the way 1654 01:19:32,128 --> 01:19:32,920 around the outside. 1655 01:19:32,920 --> 01:19:35,357 And so when I rotate it, nothing changed. 1656 01:19:35,357 --> 01:19:36,440 That obviously isn't true. 1657 01:19:36,440 --> 01:19:37,965 But I haven't argued it carefully. 1658 01:19:37,965 --> 01:19:40,340 But as long as I don't worry about my graph being simple, 1659 01:19:40,340 --> 01:19:43,880 like I'm OK with self-loops, then the degree is certainly 1660 01:19:43,880 --> 01:19:45,800 constant, yeah? 1661 01:19:45,800 --> 01:19:47,690 OK. 1662 01:19:47,690 --> 01:19:50,770 And in fact, I don't think that can happen in a typical Rubik's 1663 01:19:50,770 --> 01:19:51,270 cube. 1664 01:19:51,270 --> 01:19:52,645 AUDIENCE: Well, I think the point 1665 01:19:52,645 --> 01:19:55,585 is to say what the degree was.
1666 01:19:55,585 --> 01:19:56,960 JUSTIN SOLOMON: Oh, yeah, indeed. 1667 01:19:56,960 --> 01:19:58,130 So we haven't computed the degree. 1668 01:19:58,130 --> 01:20:00,260 But we've argued that they're equal to one another. 1669 01:20:00,260 --> 01:20:04,280 OK, so now we have to compute what that degree is. 1670 01:20:04,280 --> 01:20:05,410 And here's how to do it. 1671 01:20:05,410 --> 01:20:08,780 So of course-- well, this, I think, 1672 01:20:08,780 --> 01:20:11,120 is actually even easier than the first part. 1673 01:20:11,120 --> 01:20:14,030 Essentially, remember, we have three different options 1674 01:20:14,030 --> 01:20:15,440 for faces that I can rotate. 1675 01:20:15,440 --> 01:20:19,820 I can rotate the top, the front, or the side here. 1676 01:20:19,820 --> 01:20:24,950 So there's three faces that we could rotate. 1677 01:20:27,360 --> 01:20:29,360 OK, and how many different ways can I rotate them? 1678 01:20:29,360 --> 01:20:32,870 I can rotate them counterclockwise or clockwise. 1679 01:20:32,870 --> 01:20:34,560 So there's two directions. 1680 01:20:34,560 --> 01:20:37,580 So in all, there's degree 6 for every vertex, right? 1681 01:20:37,580 --> 01:20:43,030 There's six different ways in or out of a vertex here. 1682 01:20:43,030 --> 01:20:44,530 OK, so the next part of the problem 1683 01:20:44,530 --> 01:20:48,430 gives you a piece of code and then does breadth-first search 1684 01:20:48,430 --> 01:20:50,950 on this graph. 1685 01:20:50,950 --> 01:20:52,870 And it's super, super slow to give me 1686 01:20:52,870 --> 01:20:55,690 the distance to all the other configurations. 1687 01:20:55,690 --> 01:20:59,320 And Jason conveniently has run it on his laptop here. 1688 01:20:59,320 --> 01:21:02,580 I don't-- I'm nervous to touch your laptop. 1689 01:21:02,580 --> 01:21:04,750 I don't care so much, but I don't-- 1690 01:21:04,750 --> 01:21:06,440 you know, I don't want to infect your-- 1691 01:21:06,440 --> 01:21:06,940 yeah.
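The two counts from this part of the session (the corrected upper bound of 3 to the 7th times 7 factorial configurations, and the degree of 6 per vertex) are easy to confirm with a few lines of Python. The variable names `upper_bound` and `moves` are chosen here for illustration; the `(j, s)` encoding of a move follows the description given in the session.

```python
from itertools import product
from math import factorial

# With one corner held fixed, the other 7 corner pieces can be placed in
# at most 7! arrangements, and each can sit in one of 3 orientations.
upper_bound = 3**7 * factorial(7)

# A move is a pair (j, s): rotate face f_j, j in {0, 1, 2}, in direction
# s in {+1, -1}. Every configuration has the same set of available moves.
moves = list(product(range(3), (1, -1)))

print(upper_bound)               # 11022480
print(upper_bound < 12_000_000)  # True
print(len(moves))                # 6
```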
1692 01:21:09,460 --> 01:21:10,030 Right. 1693 01:21:10,030 --> 01:21:12,490 So we have a piece of code that explores 1694 01:21:12,490 --> 01:21:15,320 the graph of all the configurations of our cube 1695 01:21:15,320 --> 01:21:18,490 by breadth-first search and then sort of gives me the shortest 1696 01:21:18,490 --> 01:21:20,885 path, I think from the base cube, 1697 01:21:20,885 --> 01:21:22,510 where all the faces are constant to all 1698 01:21:22,510 --> 01:21:24,970 the other configurations that are reachable, 1699 01:21:24,970 --> 01:21:28,400 and generates a plot. 1700 01:21:28,400 --> 01:21:29,050 Right. 1701 01:21:29,050 --> 01:21:32,290 And so what they ask is to figure out 1702 01:21:32,290 --> 01:21:33,760 the total number of configurations 1703 01:21:33,760 --> 01:21:35,127 that it explores. 1704 01:21:35,127 --> 01:21:36,460 One thing that you'll find out-- 1705 01:21:39,330 --> 01:21:45,030 center down-- is that it explores pretty much a third-- 1706 01:21:45,030 --> 01:21:47,880 in fact, exactly a third of all the possible configurations 1707 01:21:47,880 --> 01:21:48,795 of my cube. 1708 01:21:48,795 --> 01:21:52,450 I think we can see that here. 1709 01:21:52,450 --> 01:21:54,030 So I guess it runs this whole thing. 1710 01:21:54,030 --> 01:21:56,580 You saw them all together. 1711 01:21:56,580 --> 01:21:57,420 Right. 1712 01:21:57,420 --> 01:21:57,920 Oops. 1713 01:22:02,280 --> 01:22:03,450 That's OK. 1714 01:22:03,450 --> 01:22:06,300 And so in fact, the kind of fun fact that you can learn about 1715 01:22:06,300 --> 01:22:10,710 the 2-by-2-by-2 Rubik's cube is that there's actually three 1716 01:22:10,710 --> 01:22:13,090 connected components in this graph. 1717 01:22:13,090 --> 01:22:15,240 So in other words, there's sort of like three 1718 01:22:15,240 --> 01:22:18,150 different Rubik's cubes you can make, modulo all the different 1719 01:22:18,150 --> 01:22:21,000 flips that you can do to the faces. 
1720 01:22:21,000 --> 01:22:23,550 And those correspond to three corner rotations of one 1721 01:22:23,550 --> 01:22:26,010 of the corners of the thing. 1722 01:22:26,010 --> 01:22:30,570 OK, so right. 1723 01:22:30,570 --> 01:22:32,900 So then the next part of the problem 1724 01:22:32,900 --> 01:22:35,420 asks you to state the maximum number of moves needed 1725 01:22:35,420 --> 01:22:37,067 to solve any Rubik's cube. 1726 01:22:37,067 --> 01:22:38,400 And you can see it in this plot. 1727 01:22:38,400 --> 01:22:40,370 So what this plot is showing you is the size 1728 01:22:40,370 --> 01:22:43,010 of the level set of this distance 1729 01:22:43,010 --> 01:22:44,970 function for every distance. 1730 01:22:44,970 --> 01:22:47,840 So I think technically, it looks like 0. 1731 01:22:47,840 --> 01:22:49,580 But there's actually, this is at a 1 1732 01:22:49,580 --> 01:22:52,970 here, which is to say there's one vertex at distance 0, which 1733 01:22:52,970 --> 01:22:54,380 is the source. 1734 01:22:54,380 --> 01:22:56,360 And as we move farther and farther out, 1735 01:22:56,360 --> 01:22:57,303 our tree is expanding. 1736 01:22:57,303 --> 01:22:58,970 And we're seeing more and more vertices. 1737 01:22:58,970 --> 01:23:02,240 And apparently, most vertices are approximately distance-- 1738 01:23:02,240 --> 01:23:04,010 is that 11 or 12? 1739 01:23:04,010 --> 01:23:06,480 11 away from the original. 1740 01:23:06,480 --> 01:23:09,140 Then eventually I explore the entire graph, and I'm done. 1741 01:23:09,140 --> 01:23:12,770 And you can see that the farthest-away vertex is 14 1742 01:23:12,770 --> 01:23:16,940 away, meaning that the most annoying Rubik's cube to solve 1743 01:23:16,940 --> 01:23:22,400 can be solved in 14 steps for the 2-by-2-by-2 pocket cube. 1744 01:23:22,400 --> 01:23:25,520 I'm sure that Jason probably knows the equivalent of this 1745 01:23:25,520 --> 01:23:26,840 number for the 3-by-3-by-3.
1746 01:23:26,840 --> 01:23:29,148 But I have no idea what it is. 1747 01:23:29,148 --> 01:23:31,190 I'm impressed if he can calculate it in his head, 1748 01:23:31,190 --> 01:23:33,460 like he looked like he was about to try. 1749 01:23:33,460 --> 01:23:35,490 But I digress. 1750 01:23:35,490 --> 01:23:35,990 Right. 1751 01:23:35,990 --> 01:23:40,323 So in other words, this is actually a fancy term for-- 1752 01:23:40,323 --> 01:23:41,990 we talked about the radius of your graph 1753 01:23:41,990 --> 01:23:42,865 in the first problem. 1754 01:23:42,865 --> 01:23:45,830 Now we've got the diameter, which is, well, 1755 01:23:45,830 --> 01:23:47,400 not necessarily 2 times the radius, 1756 01:23:47,400 --> 01:23:51,980 the way that we've defined it here, but actually, almost-- 1757 01:23:51,980 --> 01:23:55,150 I think within some constant of that. 1758 01:23:55,150 --> 01:23:58,100 OK, so right. 1759 01:23:58,100 --> 01:24:01,610 So notice that the vertical axis here is really big. 1760 01:24:01,610 --> 01:24:04,310 And this is explaining why this BFS code is so slow, right? 1761 01:24:04,310 --> 01:24:06,440 Because these are all the different configurations 1762 01:24:06,440 --> 01:24:07,670 it has to hit. 1763 01:24:07,670 --> 01:24:10,620 Or more accurately, if I take the y-position 1764 01:24:10,620 --> 01:24:13,610 of each one of these vertices and sum up its height, 1765 01:24:13,610 --> 01:24:16,220 those are all the configurations that are reachable. 1766 01:24:16,220 --> 01:24:19,040 And those are all the steps that BFS needs before it's done. 1767 01:24:19,040 --> 01:24:22,700 And so that number is in the-- 1768 01:24:22,700 --> 01:24:25,760 it's certainly in the millions, yeah. 
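The plot discussed above is a histogram of breadth-first search level sets: how many states sit at distance exactly 0, 1, 2, and so on from the source. A generic version of that computation might look like the sketch below. It is not the code handed out with the problem; `bfs_level_sizes` and `flip_one_bit` are hypothetical names, and a toy state space (4-bit integers, where a move flips one bit) stands in for the pocket cube so that the example stays short and easy to check by hand.

```python
from collections import deque

def bfs_level_sizes(start, neighbors):
    """Return a list whose i-th entry counts states at distance exactly i."""
    dist = {start: 0}
    sizes = [1]                      # only the source is at distance 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] == len(sizes):   # first state found at a new level
                    sizes.append(0)
                sizes[dist[v]] += 1
                queue.append(v)
    return sizes

def flip_one_bit(state):
    """Toy move set: flip any one of the 4 bits of an integer state."""
    return [state ^ (1 << k) for k in range(4)]

sizes = bfs_level_sizes(0, flip_one_bit)
print(sizes)           # [1, 4, 6, 4, 1]
print(sum(sizes))      # 16 states reached in total
print(len(sizes) - 1)  # 4, the distance of the farthest state
```

For the pocket cube, the same three quantities correspond to the bar heights in the plot, the number of configurations the search explores, and the maximum number of moves needed (14 in the session).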
1769 01:24:25,760 --> 01:24:27,830 OK, so then the last part of this problem, 1770 01:24:27,830 --> 01:24:31,160 which it conveniently looks like I'm low on time to solve, 1771 01:24:31,160 --> 01:24:34,010 but I'll refer you to the solution anyway, 1772 01:24:34,010 --> 01:24:37,520 is asking how we might do this faster. 1773 01:24:37,520 --> 01:24:41,130 And so in particular, what it says is, 1774 01:24:41,130 --> 01:24:44,850 let's say that I have a total of n configurations for my Rubik's 1775 01:24:44,850 --> 01:24:45,350 cube. 1776 01:24:45,350 --> 01:24:47,330 In this case, it turns out that that's 1777 01:24:47,330 --> 01:24:49,720 like roughly three million, I think. 1778 01:24:49,720 --> 01:24:50,510 OK. 1779 01:24:50,510 --> 01:24:54,860 And now I want an algorithm that gives me 1780 01:24:54,860 --> 01:25:00,410 the shortest sequence of moves to solve any pocket cube. 1781 01:25:00,410 --> 01:25:04,430 Man, I'm really ravaging the chalk today. 1782 01:25:04,430 --> 01:25:08,660 And I want to solve any cube in a number of steps 1783 01:25:08,660 --> 01:25:17,950 that looks like 2 times N sub the ceiling of w over 2, where-- 1784 01:25:23,660 --> 01:25:24,890 let's see here. 1785 01:25:24,890 --> 01:25:26,540 The code provided-- sorry, this problem 1786 01:25:26,540 --> 01:25:28,252 changed on me this afternoon. 1787 01:25:28,252 --> 01:25:30,530 [LAUGHS] Right. 1788 01:25:30,530 --> 01:25:36,710 So where N sub i is equal to the number of configurations 1789 01:25:36,710 --> 01:25:38,940 reachable within i moves. 1790 01:25:38,940 --> 01:25:39,440 Oh, good. 1791 01:25:39,440 --> 01:25:40,398 I see what we did here. 1792 01:25:46,840 --> 01:25:50,075 So if this is my base cube, then we've got like maybe, 1793 01:25:50,075 --> 01:25:52,250 I guess, six different-- 1794 01:25:52,250 --> 01:25:56,388 1, 2, 3, 4, 5, 6 different cubes that I can reach from those. 1795 01:25:56,388 --> 01:25:58,680 And then there's 6 cubes I can reach from all of those.
1796 01:25:58,680 --> 01:26:00,597 But of course, some of those might be pointing 1797 01:26:00,597 --> 01:26:03,290 backward or to each other. 1798 01:26:03,290 --> 01:26:06,020 But this is the number of things that are reachable in i moves. 1799 01:26:06,020 --> 01:26:08,330 And they ask for an algorithm that finds the shortest 1800 01:26:08,330 --> 01:26:10,910 path in this amount of time. 1801 01:26:10,910 --> 01:26:13,880 By the way, big N typically exponentiates in that subscript 1802 01:26:13,880 --> 01:26:14,800 there. 1803 01:26:14,800 --> 01:26:17,870 This looks innocent, but it's not. 1804 01:26:17,870 --> 01:26:20,413 The basic trick here is to do-- 1805 01:26:20,413 --> 01:26:22,080 I'm not going to bother writing it down. 1806 01:26:22,080 --> 01:26:24,663 We'll just talk about it for a second and call it for the day. 1807 01:26:24,663 --> 01:26:26,620 Well, I'll draw a picture. 1808 01:26:26,620 --> 01:26:28,680 So the breadth-first search algorithm 1809 01:26:28,680 --> 01:26:31,420 that we've thought about so far chooses a vertex 1810 01:26:31,420 --> 01:26:35,790 and then computes level sets outward from that vertex 1811 01:26:35,790 --> 01:26:39,570 until it maybe reaches the destination that you 1812 01:26:39,570 --> 01:26:41,602 want to hit. 1813 01:26:41,602 --> 01:26:42,810 That doesn't quite work here. 1814 01:26:42,810 --> 01:26:44,130 And the-- well, I mean, it does work. 1815 01:26:44,130 --> 01:26:46,140 But it's going to be quite slow, because like, 1816 01:26:46,140 --> 01:26:49,390 let's say I had bad luck, and now we're in that 14 vertex, 1817 01:26:49,390 --> 01:26:49,890 right? 1818 01:26:49,890 --> 01:26:51,348 Then, somewhere in there, I'm going 1819 01:26:51,348 --> 01:26:55,410 to hit this big height, which is sitting over the 11, 1820 01:26:55,410 --> 01:26:59,070 before I can get to vertex 14. 
1821 01:26:59,070 --> 01:27:01,110 So the trick here is it turns out 1822 01:27:01,110 --> 01:27:04,050 that I can do it by only ever getting to 7. 1823 01:27:04,050 --> 01:27:08,470 And the way that I'm going to do that is instead, 1824 01:27:08,470 --> 01:27:10,997 I'm going to run BFS sort of in parallel for two 1825 01:27:10,997 --> 01:27:12,080 different vertices, right? 1826 01:27:12,080 --> 01:27:13,163 The source and the target. 1827 01:27:13,163 --> 01:27:15,850 So in this case, my current cube and the cube 1828 01:27:15,850 --> 01:27:18,340 I would like it to be, like the solution to the problem. 1829 01:27:18,340 --> 01:27:21,690 I'm first going to compute the level set 1 of that cube, 1830 01:27:21,690 --> 01:27:24,820 then level set 1 of the next cube, then level set 2, 1831 01:27:24,820 --> 01:27:26,950 level set 2, 3, 4. 1832 01:27:26,950 --> 01:27:28,630 And notice that eventually, they're 1833 01:27:28,630 --> 01:27:32,320 going to intersect, pretty much right at the midpoint. 1834 01:27:32,320 --> 01:27:34,140 And so the size of the level set-- 1835 01:27:34,140 --> 01:27:35,890 I never need to compute a level set that's 1836 01:27:35,890 --> 01:27:38,850 bigger than a half of the shortest path length. 1837 01:27:38,850 --> 01:27:41,530 I have to round up to be conservative about that. 1838 01:27:41,530 --> 01:27:43,203 And that's where I get this factor here. 1839 01:27:43,203 --> 01:27:44,620 So that's just a nice little trick 1840 01:27:44,620 --> 01:27:46,697 for reducing the search size. 1841 01:27:46,697 --> 01:27:48,280 This is another kind of standard trick 1842 01:27:48,280 --> 01:27:50,530 if you look at some of the code people use 1843 01:27:50,530 --> 01:27:52,930 for solving board games algorithmically and so on. 
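The meet-in-the-middle idea described here can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the official solution: the names `expand_level` and `bidirectional_bfs` and the adjacency-dictionary format are hypothetical, and the graph is assumed undirected, which matches the cube because every move can be undone.

```python
def expand_level(adj, frontier, dist_mine, dist_other):
    """Grow one side's search by one level.

    Returns (next_frontier, path_length), where path_length is None unless
    this expansion touched a vertex already reached from the other side.
    """
    next_frontier = []
    for u in frontier:
        for v in adj[u]:
            if v in dist_other:
                # The two searches meet across the edge (u, v).
                return next_frontier, dist_mine[u] + 1 + dist_other[v]
            if v not in dist_mine:
                dist_mine[v] = dist_mine[u] + 1
                next_frontier.append(v)
    return next_frontier, None


def bidirectional_bfs(adj, s, t):
    """Length of a shortest s-t path in an undirected graph, or None if t is unreachable."""
    if s == t:
        return 0
    dist_s, dist_t = {s: 0}, {t: 0}
    frontier_s, frontier_t = [s], [t]
    while frontier_s and frontier_t:
        # Alternate: one full level from the source, then one from the target.
        frontier_s, length = expand_level(adj, frontier_s, dist_s, dist_t)
        if length is not None:
            return length
        frontier_t, length = expand_level(adj, frontier_t, dist_t, dist_s)
        if length is not None:
            return length
    return None  # one side ran out of vertices without meeting the other


# Toy graphs (hypothetical): a path 0-1-2-3-4 and a 6-cycle.
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle_adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path_length = bidirectional_bfs(path_adj, 0, 4)
cycle_length = bidirectional_bfs(cycle_adj, 0, 3)
```

Because the two searches stay disjoint until they meet and each side grows one whole level at a time, the first meeting already gives a shortest path, and neither side ever goes deeper than about half of that path length, rounded up.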
1844 01:27:52,930 --> 01:27:55,690 I think they typically sort of search from the beginning 1845 01:27:55,690 --> 01:27:58,900 and end state outward and try and meet in the middle 1846 01:27:58,900 --> 01:28:00,370 for exactly this reason, which is 1847 01:28:00,370 --> 01:28:03,470 that exponential growth, as we all know, 1848 01:28:03,470 --> 01:28:05,752 can be quite problematic. 1849 01:28:05,752 --> 01:28:06,460 All right, folks. 1850 01:28:06,460 --> 01:28:08,127 So I think we're just about out of time. 1851 01:28:08,127 --> 01:28:10,150 And I've certainly worn myself out. 1852 01:28:10,150 --> 01:28:12,580 So with that, hopefully we'll see you next week. 1853 01:28:12,580 --> 01:28:15,900 And yeah, I hope everybody is doing well.