1 00:00:00,000 --> 00:00:02,475 [SQUEAKING] 2 00:00:02,475 --> 00:00:03,955 [RUSTLING] 3 00:00:04,455 --> 00:00:05,445 [CLICKING] 4 00:00:25,270 --> 00:00:27,580 MICHAEL SIPSER: Welcome, everyone, 5 00:00:27,580 --> 00:00:31,150 back to theory of computation. 6 00:00:31,150 --> 00:00:35,710 So we are now at lecture number 15. 7 00:00:35,710 --> 00:00:40,283 And this is an important lecture. 8 00:00:40,283 --> 00:00:41,950 Well, all of our lectures are important, 9 00:00:41,950 --> 00:00:49,420 but this is going to introduce one of the major topics 10 00:00:49,420 --> 00:00:53,230 that we're going to see going on in various forms 11 00:00:53,230 --> 00:00:55,330 through the rest of the semester, which 12 00:00:55,330 --> 00:00:57,380 is the notion of a complete problem. 13 00:00:57,380 --> 00:00:59,860 In this case, it's going to be an NP complete problem. 14 00:00:59,860 --> 00:01:04,040 And I think some of you have probably heard of that concept 15 00:01:04,040 --> 00:01:04,540 already. 16 00:01:04,540 --> 00:01:06,457 Maybe you've seen it in some courses or other, 17 00:01:06,457 --> 00:01:11,080 but we're going to do that in a rather careful and formal way 18 00:01:11,080 --> 00:01:13,480 over the next couple of lectures. 19 00:01:13,480 --> 00:01:18,790 So we are following up on our previous discussions 20 00:01:18,790 --> 00:01:21,670 about complexity, time complexity. 21 00:01:21,670 --> 00:01:26,470 We defined the time and non-deterministic time complexity 22 00:01:26,470 --> 00:01:29,560 classes, as you may remember, the classes P 23 00:01:29,560 --> 00:01:35,430 and NP, talked about the P versus NP problem, 24 00:01:35,430 --> 00:01:37,170 looked at some interesting algorithms 25 00:01:37,170 --> 00:01:40,470 for showing problems in P called dynamic programming.
26 00:01:40,470 --> 00:01:44,010 And we started to move toward our discussion of NP 27 00:01:44,010 --> 00:01:46,860 completeness with the introduction of polynomial time 28 00:01:46,860 --> 00:01:51,120 reducibility, which is related to some of the earlier 29 00:01:51,120 --> 00:01:53,197 reducibility notions that we discussed 30 00:01:53,197 --> 00:01:54,405 in the computability section. 31 00:02:04,710 --> 00:02:12,070 Quick review-- what it means for one problem to be polynomial 32 00:02:12,070 --> 00:02:14,620 time reducible to another-- 33 00:02:14,620 --> 00:02:19,150 this follows our pattern of reducibility concepts, 34 00:02:19,150 --> 00:02:24,460 where if a problem is reducible to another problem, 35 00:02:24,460 --> 00:02:26,200 and that other problem is solvable, 36 00:02:26,200 --> 00:02:28,040 then the first problem is solvable. 37 00:02:28,040 --> 00:02:30,460 So if a problem is reducible to an easy problem, 38 00:02:30,460 --> 00:02:33,520 that problem becomes [INAUDIBLE].. 39 00:02:33,520 --> 00:02:35,590 And the kind of reduction that we're 40 00:02:35,590 --> 00:02:38,080 going to be looking at here are the mapping reductions, 41 00:02:38,080 --> 00:02:42,500 but now where the reductions are computable in polynomial time. 42 00:02:42,500 --> 00:02:45,220 And so we had this result we mentioned 43 00:02:45,220 --> 00:02:49,030 last time, that, if A is polynomial time reducible to B, 44 00:02:49,030 --> 00:02:53,980 and B is in P, then A is also in P. 45 00:02:53,980 --> 00:03:00,580 So that's going to be critically important for the whole 46 00:03:00,580 --> 00:03:02,095 of discussion that's coming. 47 00:03:04,630 --> 00:03:09,400 Our intuition about P and NP, repeating that here-- 48 00:03:09,400 --> 00:03:13,990 the NP problems are the ones where you can verify membership 49 00:03:13,990 --> 00:03:20,530 easily, as opposed to being able to test membership easily. 50 00:03:20,530 --> 00:03:22,510 Those are the problems that are in P. 
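The result recalled above, that if A is polynomial time reducible to B and B is in P then A is in P, comes down to composing two procedures. A minimal sketch follows, assuming toy languages that are not from the lecture: A is the set of even-length strings, B is the set of even non-negative integers, and the names `decide_via_reduction`, `length_of`, and `is_even` are hypothetical.

```python
def decide_via_reduction(reduce_a_to_b, decide_b):
    """Build a decider for A from a reduction A -> B and a decider for B."""
    def decide_a(w):
        # w is in A exactly when f(w) is in B, so one call to the
        # reduction followed by one call to B's decider settles it.
        return decide_b(reduce_a_to_b(w))
    return decide_a


# Toy, hypothetical instance: A = even-length strings, B = even integers.
def length_of(w):
    return len(w)          # the reduction f, computable in polynomial time


def is_even(n):
    return n % 2 == 0      # the decider for B


decide_even_length = decide_via_reduction(length_of, is_even)
```

If both pieces run in polynomial time, so does the composition, because the output of the reduction can only be polynomially longer than its input.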
51 00:03:22,510 --> 00:03:25,060 The verification typically requires 52 00:03:25,060 --> 00:03:30,250 some kind of a certificate that establishes 53 00:03:30,250 --> 00:03:34,180 a proof that the input is a member of that language. 54 00:03:37,460 --> 00:03:40,880 And the big question of the field, 55 00:03:40,880 --> 00:03:42,800 which remains an unsolved problem, 56 00:03:42,800 --> 00:03:44,270 is whether P equals NP. 57 00:03:44,270 --> 00:03:46,550 It's called the P versus NP question. 58 00:03:46,550 --> 00:03:48,270 And we don't know the answer-- 59 00:03:48,270 --> 00:03:53,300 so whether there are problems that are in NP solvable 60 00:03:53,300 --> 00:03:55,310 in non-deterministic polynomial time-- 61 00:03:55,310 --> 00:03:58,100 typically these are problems that involve searching-- 62 00:03:58,100 --> 00:04:00,260 whether they can be solved in polynomial time, 63 00:04:00,260 --> 00:04:02,630 typically without the searching. 64 00:04:02,630 --> 00:04:06,775 And if P were equal to NP, then you 65 00:04:06,775 --> 00:04:08,150 could always eliminate searching. 66 00:04:08,150 --> 00:04:10,130 And if P were different from NP, then there 67 00:04:10,130 --> 00:04:12,260 were cases where you need to search. 68 00:04:12,260 --> 00:04:16,220 And we don't know the answer to that. 69 00:04:16,220 --> 00:04:22,360 So in the direction of exploring this question 70 00:04:22,360 --> 00:04:25,510 and its ramifications, we introduced 71 00:04:25,510 --> 00:04:26,740 this problem called SAT. 72 00:04:26,740 --> 00:04:34,120 These are the Boolean formulas, which have an assignment that 73 00:04:34,120 --> 00:04:37,460 makes them evaluate to true. 74 00:04:37,460 --> 00:04:40,090 So we call those satisfiable formulas. 
75 00:04:40,090 --> 00:04:45,760 And we mentioned, but have not yet proven-- 76 00:04:45,760 --> 00:04:49,030 and we will not prove until the next lecture on Tuesday-- 77 00:04:49,030 --> 00:04:53,560 that there is this theorem, a very remarkable theorem 78 00:04:53,560 --> 00:04:56,260 that says that, if you have-- 79 00:04:56,260 --> 00:05:00,610 take the satisfiability problem, and if it is solvable quickly, 80 00:05:00,610 --> 00:05:03,830 then all of the NP problems are solvable quickly. 81 00:05:03,830 --> 00:05:06,670 So if SAT is in P, then P equals NP. 82 00:05:06,670 --> 00:05:11,140 So in a sense, SAT is kind of the-- 83 00:05:13,900 --> 00:05:17,050 it's sort of the super NP problem, 84 00:05:17,050 --> 00:05:22,060 in the sense that all of the difficulty of any NP problem 85 00:05:22,060 --> 00:05:24,220 is embedded within SAT. 86 00:05:24,220 --> 00:05:27,610 So if SAT becomes easy, then all of the other NP problems 87 00:05:27,610 --> 00:05:28,900 become easy. 88 00:05:28,900 --> 00:05:31,930 And we'll eventually prove that, but right now 89 00:05:31,930 --> 00:05:38,450 we're setting up the terminology to allow us to do that. 90 00:05:38,450 --> 00:05:42,440 So anyway, we'll get there. 91 00:05:42,440 --> 00:05:48,340 So the key ingredient for proving this Cook-Levin theorem 92 00:05:48,340 --> 00:05:50,860 is polynomial time reducibility. 93 00:05:50,860 --> 00:05:55,150 What we're going to show is that every problem in NP 94 00:05:55,150 --> 00:05:58,810 can be polynomial time reduced to SAT. 95 00:05:58,810 --> 00:06:03,670 So every NP problem can be converted into a SAT problem.
96 00:06:03,670 --> 00:06:06,870 And so working toward that-- because when 97 00:06:06,870 --> 00:06:09,960 you think about it, there are infinitely many problems in NP, 98 00:06:09,960 --> 00:06:16,260 and being able to show that all of them are reducible to SAT 99 00:06:16,260 --> 00:06:20,170 is, in a sense, kind of-- you have to prove a higher level 100 00:06:20,170 --> 00:06:20,670 theorem. 101 00:06:20,670 --> 00:06:22,050 It's not just a single reduction. 102 00:06:22,050 --> 00:06:24,450 We're exhibiting a schema of reductions, 103 00:06:24,450 --> 00:06:28,440 which shows that all of these problems can be reduced to SAT. 104 00:06:28,440 --> 00:06:35,910 And developing our intuition toward that direction, 105 00:06:35,910 --> 00:06:38,970 we're going to look at some specific polynomial time 106 00:06:38,970 --> 00:06:41,590 reductions today. 107 00:06:41,590 --> 00:06:47,070 And we'll start out with a reduction between two problems 108 00:06:47,070 --> 00:06:48,510 we have not yet seen-- 109 00:06:48,510 --> 00:06:51,510 one of them called 3SAT and the other one called clique. 110 00:06:51,510 --> 00:06:56,100 So on this slide, we're going to introduce those two problems. 111 00:06:56,100 --> 00:07:00,990 So remembering, again, about Boolean formulas, 112 00:07:00,990 --> 00:07:04,630 we're going to consider a special class 113 00:07:04,630 --> 00:07:08,680 of Boolean formulas, a restricted form of Boolean formulas 114 00:07:08,680 --> 00:07:11,680 called conjunctive normal form. 115 00:07:11,680 --> 00:07:14,440 And so this formula here in particular 116 00:07:14,440 --> 00:07:17,620 is in that conjunctive normal form.
117 00:07:17,620 --> 00:07:19,530 So just remember-- 118 00:07:19,530 --> 00:07:24,600 I was explaining some of this to someone else 119 00:07:24,600 --> 00:07:27,030 earlier this morning, showing some of these slides, 120 00:07:27,030 --> 00:07:29,550 and it was pointed out that not all of you 121 00:07:29,550 --> 00:07:33,600 may be familiar with the Boolean operations of 122 00:07:33,600 --> 00:07:35,280 AND and OR. 123 00:07:35,280 --> 00:07:37,470 So this is the OR symbol here. 124 00:07:37,470 --> 00:07:39,120 This is the AND symbol here. 125 00:07:42,080 --> 00:07:46,220 Hopefully you've seen just the concept of Boolean AND and OR. 126 00:07:46,220 --> 00:07:49,130 OR is, in a sense, a little bit like a union. 127 00:07:49,130 --> 00:07:50,870 AND is like an intersection. 128 00:07:50,870 --> 00:07:54,890 And the symbols here are a little bit similar to the union 129 00:07:54,890 --> 00:07:57,150 and the intersection-- 130 00:07:57,150 --> 00:08:00,320 union and intersection symbols themselves. 131 00:08:00,320 --> 00:08:05,120 The V shape is a little bit like a pointy union symbol and 132 00:08:05,120 --> 00:08:07,880 the upside down V shape is a little like 133 00:08:07,880 --> 00:08:10,610 a pointy intersection symbol. 134 00:08:10,610 --> 00:08:12,080 If you haven't seen those-- 135 00:08:12,080 --> 00:08:13,760 seen that connection before, maybe it's 136 00:08:13,760 --> 00:08:16,460 interesting to observe that. 137 00:08:16,460 --> 00:08:21,200 But anyway, what makes this formula 138 00:08:21,200 --> 00:08:24,740 be in conjunctive normal form? 139 00:08:24,740 --> 00:08:27,830 Well, conjunctive normal form formulas 140 00:08:27,830 --> 00:08:29,630 are organized in a certain way. 141 00:08:29,630 --> 00:08:34,490 They have these groups called clauses 142 00:08:34,490 --> 00:08:38,090 within the parentheses that are ANDed together, 143 00:08:38,090 --> 00:08:42,350 that are connected by these AND operations.
144 00:08:42,350 --> 00:08:47,090 And within those groups, which are called clauses, 145 00:08:47,090 --> 00:08:53,160 the elements are ORed together. 146 00:08:53,160 --> 00:08:56,640 Those elements are going to be either variables 147 00:08:56,640 --> 00:09:02,020 or variables with negation, negated variables. 148 00:09:02,020 --> 00:09:06,000 So just to repeat that, these-- 149 00:09:06,000 --> 00:09:10,440 well, just to state, those variables or negated variables 150 00:09:10,440 --> 00:09:16,710 are going to be called literals, and the ORs 151 00:09:16,710 --> 00:09:20,860 of a bunch of literals are going to be called clauses. 152 00:09:20,860 --> 00:09:23,630 This is just the standard terminology in the field. 153 00:09:23,630 --> 00:09:24,130 OK? 154 00:09:24,130 --> 00:09:30,260 So a literal is a variable or a negated variable 155 00:09:30,260 --> 00:09:34,770 and a clause is an or of literals-- 156 00:09:34,770 --> 00:09:36,633 just a definition. 157 00:09:36,633 --> 00:09:38,800 All right, so we have a bunch of these clauses here. 158 00:09:38,800 --> 00:09:43,510 Each of the clauses has an or of literals in it. 159 00:09:43,510 --> 00:09:46,490 Now, a Chomsky-- 160 00:09:46,490 --> 00:09:50,580 Chomsky-- conjunctive normal form formula-- 161 00:09:50,580 --> 00:09:51,990 I guess they're both CNFs-- 162 00:09:51,990 --> 00:09:55,350 but conjunctive normal form formula 163 00:09:55,350 --> 00:10:00,630 is one that's written as an and of these clauses. 164 00:10:00,630 --> 00:10:03,990 So we take these clauses, which themselves are ORs-- 165 00:10:03,990 --> 00:10:06,120 we and them together, we get a formula 166 00:10:06,120 --> 00:10:07,920 that's conjunctive normal form. 167 00:10:07,920 --> 00:10:10,230 Conjunctive stands for and, I think. 168 00:10:10,230 --> 00:10:11,310 Conjunction is an and. 169 00:10:14,300 --> 00:10:21,140 And then a 3CNF is a CNF where each clause has exactly three 170 00:10:21,140 --> 00:10:22,490 literals. 
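The terminology above (literal, clause, CNF, 3CNF) can be made concrete with a small sketch. The encoding is an assumption chosen for illustration, not something from the lecture: a literal is a `(variable_name, is_negated)` pair, a clause is a list of literals, and a CNF formula is a list of clauses. The function name `is_3cnf` and both example formulas are hypothetical.

```python
# Hypothetical encoding (not from the lecture):
#   literal = (variable_name, is_negated)
#   clause  = list of literals, read as their OR
#   formula = list of clauses, read as their AND

def is_3cnf(formula):
    """Return True if every clause has exactly three literals."""
    return all(len(clause) == 3 for clause in formula)


# (a OR NOT b OR c) AND (NOT a OR b): a CNF formula, but not a 3CNF,
# because the second clause has only two literals.
cnf_only = [
    [("a", False), ("b", True), ("c", False)],
    [("a", True), ("b", False)],
]

# (a OR b OR NOT c) AND (NOT a OR c OR d): a 3CNF formula.
three_cnf = [
    [("a", False), ("b", False), ("c", True)],
    [("a", True), ("c", False), ("d", False)],
]
```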
171 00:10:22,490 --> 00:10:27,590 So for example, this particular formula is a CNF, 172 00:10:27,590 --> 00:10:32,210 but it's not a 3CNF, because it has at least two clauses which 173 00:10:32,210 --> 00:10:34,820 violate the condition of three literals per clause. 174 00:10:34,820 --> 00:10:36,830 This one is OK, obviously-- 175 00:10:36,830 --> 00:10:39,460 this first clause. 176 00:10:39,460 --> 00:10:41,993 And a 3SAT problem-- 177 00:10:41,993 --> 00:10:43,660 so this is the first of the two problems 178 00:10:43,660 --> 00:10:46,030 we're going to be talking about-- 179 00:10:46,030 --> 00:10:50,860 the 3SAT problem is the satisfiability problem 180 00:10:50,860 --> 00:10:55,310 restricted to these 3CNF formulas. 181 00:10:55,310 --> 00:10:57,140 So this is the collection of-- 182 00:10:57,140 --> 00:11:01,940 set of 3CNF formulas which are satisfiable. 183 00:11:01,940 --> 00:11:05,170 So you can think of it as kind of a special case of the SAT 184 00:11:05,170 --> 00:11:07,780 problem, where we only care about formulas 185 00:11:07,780 --> 00:11:08,800 of the special kind. 186 00:11:11,650 --> 00:11:13,150 This being a special case, you might 187 00:11:13,150 --> 00:11:15,730 imagine that this might be an easier problem to solve, 188 00:11:15,730 --> 00:11:20,080 but in general, it turns out not to be, as we will see. 189 00:11:20,080 --> 00:11:22,570 Solving this special case is just as hard 190 00:11:22,570 --> 00:11:25,720 as solving the general case for satisfiability. 191 00:11:29,100 --> 00:11:32,220 Now let's turn to the second of these two languages 192 00:11:32,220 --> 00:11:37,450 up in the headline, the clique problem. 193 00:11:37,450 --> 00:11:39,300 So for that we-- 194 00:11:39,300 --> 00:11:41,170 we're going to turn to graph theory. 195 00:11:41,170 --> 00:11:45,480 So we're going to consider graphs, points 196 00:11:45,480 --> 00:11:47,340 and lines-- connecting them.
197 00:11:47,340 --> 00:11:52,230 And we will say that a clique in a graph 198 00:11:52,230 --> 00:11:53,880 is a collection of nodes-- 199 00:11:53,880 --> 00:11:57,060 collection of the points that are all pairwise 200 00:11:57,060 --> 00:11:58,500 connected by lines. 201 00:11:58,500 --> 00:12:02,110 And a k-clique is one where you have k such nodes. 202 00:12:02,110 --> 00:12:08,300 So here we have a three clique, four clique, and a five clique. 203 00:12:08,300 --> 00:12:11,270 And then the clique problem is to try 204 00:12:11,270 --> 00:12:16,610 to find cliques of a certain specified size embedded 205 00:12:16,610 --> 00:12:18,350 within a given graph. 206 00:12:21,670 --> 00:12:23,320 You're going to be given a graph, 207 00:12:23,320 --> 00:12:26,860 and a target clique size k, and you want to know, 208 00:12:26,860 --> 00:12:32,050 is there a subset of the nodes in the graph of size k 209 00:12:32,050 --> 00:12:34,760 that are all connected to one another? 210 00:12:34,760 --> 00:12:37,740 That's the clique problem. 211 00:12:37,740 --> 00:12:42,680 So obviously, the clique problem, just as with 212 00:12:42,680 --> 00:12:45,380 the satisfiability problem, is a decidable problem. 213 00:12:45,380 --> 00:12:49,460 You can just try every possible subset of k nodes 214 00:12:49,460 --> 00:12:52,640 and see whether it constitutes a clique. 215 00:12:52,640 --> 00:12:55,310 But that's, in general, going to be an exponential algorithm. 216 00:12:55,310 --> 00:13:00,080 If you have-- k is a large value-- 217 00:13:00,080 --> 00:13:02,660 it might be half the size of the graph-- you might be looking 218 00:13:02,660 --> 00:13:04,130 for a very large clique. 219 00:13:04,130 --> 00:13:06,920 Then you have to try many, many subsets in order 220 00:13:06,920 --> 00:13:10,175 to see whether any one of them is a clique.
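The "try every possible subset of k nodes" procedure described above can be sketched as follows. This is a minimal illustration, assuming an undirected graph given as a node list plus a list of edge pairs; the function name `has_k_clique` and the sample graph are hypothetical, not from the lecture.

```python
from itertools import combinations


def has_k_clique(nodes, edges, k):
    """Brute force: test every k-subset of nodes for pairwise adjacency."""
    # Store each undirected edge as a frozenset so (u, v) matches (v, u).
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(nodes, k):
        if all(frozenset(pair) in edge_set
               for pair in combinations(subset, 2)):
            return True
    return False


# Hypothetical example: a triangle 1-2-3 with an extra node 4 hanging off 3.
sample_nodes = [1, 2, 3, 4]
sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

The number of subsets examined is "n choose k", which grows exponentially when k is around half the number of nodes, matching the remark above that this is an exponential algorithm in general.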
221 00:13:10,175 --> 00:13:11,300 It's also going to be an-- 222 00:13:11,300 --> 00:13:15,260 I hope you get this intuition-- this is a problem that's in NP. 223 00:13:15,260 --> 00:13:16,850 The clique problem is an NP problem, 224 00:13:16,850 --> 00:13:23,870 because you can easily verify that a graph has a k-clique 225 00:13:23,870 --> 00:13:26,830 just by exhibiting the clique. 226 00:13:26,830 --> 00:13:28,360 OK. 227 00:13:28,360 --> 00:13:35,380 Now, so the language here is you're given a graph and a k, 228 00:13:35,380 --> 00:13:38,710 and you want to know, does the graph contain a k-clique? 229 00:13:38,710 --> 00:13:41,080 And what we're going to show is that these two problems 230 00:13:41,080 --> 00:13:45,490 are connected, that the 3SAT problem and the clique problem 231 00:13:45,490 --> 00:13:49,960 are related, in that you can reduce in polynomial time 3SAT 232 00:13:49,960 --> 00:13:50,530 to clique. 233 00:13:57,140 --> 00:14:00,440 Just standing back at it-- 234 00:14:00,440 --> 00:14:03,290 back for a minute, it seems kind of surprising. 235 00:14:03,290 --> 00:14:04,430 There's no real reason-- 236 00:14:07,930 --> 00:14:09,850 no obvious reason why there should 237 00:14:09,850 --> 00:14:12,160 be a connection between 3SAT and clique. 238 00:14:12,160 --> 00:14:14,620 They look very different from one another. 239 00:14:14,620 --> 00:14:15,760 But we do give that-- 240 00:14:15,760 --> 00:14:19,750 we will give that reduction. 241 00:14:19,750 --> 00:14:22,240 That implies that, if you can find 242 00:14:22,240 --> 00:14:24,280 a way of solving the clique problem quickly, 243 00:14:24,280 --> 00:14:26,590 that'll give you a way of solving the 3SAT 244 00:14:26,590 --> 00:14:28,240 problem quickly. 245 00:14:28,240 --> 00:14:30,800 And that's going to be the whole point of this. 246 00:14:30,800 --> 00:14:34,390 So we're going to show this over the next slide 247 00:14:34,390 --> 00:14:38,830 or two, this polynomial time reduction.
248 00:14:38,830 --> 00:14:40,810 I'm going to walk through it slowly, 249 00:14:40,810 --> 00:14:44,740 but this is one where it's really important to try 250 00:14:44,740 --> 00:14:48,580 to get a sense of how it's working, 251 00:14:48,580 --> 00:14:50,020 because this is the kind of thing 252 00:14:50,020 --> 00:14:51,310 that we're going to be doing a lot of, 253 00:14:51,310 --> 00:14:53,727 and you're going to be asked to do it also on the homework 254 00:14:53,727 --> 00:14:55,330 and possibly on the final exam. 255 00:14:55,330 --> 00:14:58,360 I hate to use that as the motivating force here, 256 00:14:58,360 --> 00:15:00,880 but at least it might be one motivation for some of you, 257 00:15:00,880 --> 00:15:05,240 that you have to understand how to do this kind of a reduction. 258 00:15:07,825 --> 00:15:09,850 I think, at a high level, what's going 259 00:15:09,850 --> 00:15:15,160 on is we're showing how to recode a Boolean formula 260 00:15:15,160 --> 00:15:18,122 satisfiability problem into the problem of testing 261 00:15:18,122 --> 00:15:19,330 whether a graph has a clique. 262 00:15:23,280 --> 00:15:25,080 Oh, let's see if we have any questions here 263 00:15:25,080 --> 00:15:26,685 that are coming up in the chat. 264 00:15:36,740 --> 00:15:40,160 OK, so this is an interesting question here. 265 00:15:40,160 --> 00:15:46,500 Can we always convert a Boolean formula 266 00:15:46,500 --> 00:15:50,640 into conjunctive normal form, first of all? 267 00:15:50,640 --> 00:15:51,150 Yes. 268 00:15:51,150 --> 00:15:54,360 The answer is you can always convert a Boolean formula 269 00:15:54,360 --> 00:15:59,130 into an equivalent one in CNF. 270 00:15:59,130 --> 00:16:02,910 But in general, that might make the formula 271 00:16:02,910 --> 00:16:06,210 exponentially larger. 
272 00:16:06,210 --> 00:16:10,180 So just the mere fact that you can convert formulas 273 00:16:10,180 --> 00:16:12,570 into conjunctive normal form doesn't 274 00:16:12,570 --> 00:16:21,090 mean that solving conjunctive normal form 275 00:16:21,090 --> 00:16:25,230 formulas for testing satisfiability of the CNF 276 00:16:25,230 --> 00:16:28,770 formulas is going to be as hard as testing the general case, 277 00:16:28,770 --> 00:16:31,980 because just the conversion might be exponential. 278 00:16:31,980 --> 00:16:36,240 There's something more-- little bit more complicated 279 00:16:36,240 --> 00:16:39,120 going on than that. 280 00:16:45,020 --> 00:16:45,980 Let me just see here. 281 00:16:50,360 --> 00:16:53,630 If phi is a satisfiable formula which is not in CNF, 282 00:16:53,630 --> 00:16:56,300 can be-- 283 00:16:56,300 --> 00:17:01,280 so a similar question-- so the questions that I'm getting 284 00:17:01,280 --> 00:17:05,780 are about converting formulas to CNF. 285 00:17:05,780 --> 00:17:10,359 So yeah, you can do it, but not in polynomial time, in general, 286 00:17:10,359 --> 00:17:12,859 because the resulting formula you get might be much larger-- 287 00:17:18,230 --> 00:17:21,530 if you're looking for an equivalent formula. 288 00:17:21,530 --> 00:17:23,900 If you're not looking for an equivalent formula, then-- 289 00:17:23,900 --> 00:17:26,582 of course, then, depending upon what you're looking for, 290 00:17:26,582 --> 00:17:28,415 you might be able to find something smaller. 291 00:17:34,690 --> 00:17:37,390 This is, I guess, a good basic question. 292 00:17:37,390 --> 00:17:41,380 Why is clique in NP? 293 00:17:41,380 --> 00:17:43,907 Doesn't verifying that you have a clique 294 00:17:43,907 --> 00:17:45,865 require going through all the possible cliques? 295 00:17:48,610 --> 00:17:50,920 You have to understand what verifying means.
296 00:17:50,920 --> 00:17:53,830 Verifying means you can verify something 297 00:17:53,830 --> 00:17:56,660 if you're given a certificate. 298 00:17:56,660 --> 00:17:58,700 In the case of a clique-- 299 00:17:58,700 --> 00:18:03,290 the clique problem-- the certificate is the clique. 300 00:18:03,290 --> 00:18:04,900 So once you have the certificate, 301 00:18:04,900 --> 00:18:07,780 you can do the verification in polynomial time. 302 00:18:07,780 --> 00:18:11,500 Finding the certificate, of course, might be difficult. 303 00:18:11,500 --> 00:18:13,290 So you only think of NP in the context 304 00:18:13,290 --> 00:18:15,100 of having that certificate. 305 00:18:15,100 --> 00:18:18,247 So in a case for compositeness, the certificate 306 00:18:18,247 --> 00:18:19,080 might be the factor. 307 00:18:25,010 --> 00:18:28,220 There are problems sometimes where what the certificate is 308 00:18:28,220 --> 00:18:31,190 is not necessarily obvious. 309 00:18:31,190 --> 00:18:37,630 But there can be a certificate for showing that the input is 310 00:18:37,630 --> 00:18:38,740 in the language. 311 00:18:38,740 --> 00:18:40,330 In the ones that we've done so far, 312 00:18:40,330 --> 00:18:42,385 maybe you can argue that the certificate is 313 00:18:42,385 --> 00:18:44,260 sort of an obvious thing, but it's not always 314 00:18:44,260 --> 00:18:46,600 an obvious thing. 315 00:18:46,600 --> 00:18:48,070 OK. 316 00:18:48,070 --> 00:18:50,930 Why don't we move on then? 317 00:18:50,930 --> 00:18:55,030 So let's see, how do we do this polynomial time reduction 318 00:18:55,030 --> 00:18:57,310 from 3SAT to clique? 319 00:18:57,310 --> 00:18:58,720 OK, here we go. 320 00:19:03,157 --> 00:19:04,740 So I'm just going to give a reduction. 321 00:19:04,740 --> 00:19:06,660 That's what the definition means. 
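Returning to the answer above on why clique is in NP: given the clique itself as the certificate, checking it takes only polynomial time, even though finding it may not. A minimal sketch of such a verifier follows, assuming a graph given as a node list and a list of edge pairs; the name `verify_clique` and the sample graph are hypothetical, not from the lecture.

```python
from itertools import combinations


def verify_clique(nodes, edges, k, certificate):
    """Check a claimed k-clique (the certificate) without any searching."""
    edge_set = {frozenset(e) for e in edges}
    claimed = set(certificate)
    # The certificate must name exactly k distinct nodes of the graph.
    if len(claimed) != k or not claimed <= set(nodes):
        return False
    # Every pair of claimed nodes must be joined by an edge:
    # about k*k/2 lookups, which is polynomial in the input size.
    return all(frozenset(pair) in edge_set
               for pair in combinations(claimed, 2))


# Hypothetical example: a triangle 1-2-3 with an extra node 4 hanging off 3.
sample_nodes = [1, 2, 3, 4]
sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

The verifier never enumerates subsets; it only inspects the one subset it was handed, which is the distinction between verifying membership and testing membership.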
322 00:19:06,660 --> 00:19:10,500 I'm going to give a way of converting formulas 323 00:19:10,500 --> 00:19:16,110 to pairs, a G and a k, where the formula's going 324 00:19:16,110 --> 00:19:19,940 to be satisfiable if and only if the graph has a k-clique. 325 00:19:22,610 --> 00:19:26,960 OK, so let's a little bit do it by example. 326 00:19:26,960 --> 00:19:29,610 And in order to do that-- 327 00:19:29,610 --> 00:19:31,520 so here's going to be a formula now. 328 00:19:31,520 --> 00:19:33,200 It's in 3CNF. 329 00:19:33,200 --> 00:19:35,600 That's what I need in order to be doing this reduction. 330 00:19:35,600 --> 00:19:38,735 I'm converting 3CNF formulas into clique problems. 331 00:19:42,570 --> 00:19:44,220 We have to understand a little bit 332 00:19:44,220 --> 00:19:48,110 what it means when we say-- 333 00:19:48,110 --> 00:19:51,110 we talk about the satisfiability of a formula like this, 334 00:19:51,110 --> 00:19:54,050 because it's going to be helpful in doing the reduction. 335 00:19:54,050 --> 00:19:56,310 Obviously, satisfiability means that the-- 336 00:19:56,310 --> 00:19:58,270 you can find an assignment to the variables. 337 00:19:58,270 --> 00:20:00,380 So you're going to set each of these variables-- 338 00:20:00,380 --> 00:20:03,200 A, B, and C, and so on-- to true or false, 339 00:20:03,200 --> 00:20:06,140 and you want to make the whole formula evaluate to true. 340 00:20:06,140 --> 00:20:07,640 But what does that actually mean? 341 00:20:10,650 --> 00:20:14,180 It means that, because of the structure of the formula, that 342 00:20:14,180 --> 00:20:17,990 making this formula true corresponds to making 343 00:20:17,990 --> 00:20:19,340 each clause true-- 344 00:20:22,100 --> 00:20:26,340 because the clauses are all ANDed together, 345 00:20:26,340 --> 00:20:28,430 so the only way for the formula to be true 346 00:20:28,430 --> 00:20:30,630 is to make each clause true.
347 00:20:30,630 --> 00:20:34,170 And to make a clause true, you have to make at least one 348 00:20:34,170 --> 00:20:37,480 of the literals true. 349 00:20:37,480 --> 00:20:43,820 So it's another way of thinking about satisfying this formula. 350 00:20:43,820 --> 00:20:46,460 Satisfying these [INAUDIBLE] satisfying assignment 351 00:20:46,460 --> 00:20:51,380 makes at least one true literal in every clause. 352 00:20:51,380 --> 00:20:56,730 It's really important to think about it that way, 353 00:20:56,730 --> 00:21:00,330 because that's what's going to be 354 00:21:00,330 --> 00:21:03,807 the basis for doing this reduction 355 00:21:03,807 --> 00:21:04,890 and all of the reductions. 356 00:21:04,890 --> 00:21:07,890 It's what makes 3SAT easy to think about, 357 00:21:07,890 --> 00:21:10,140 in terms of its satisfiability. 358 00:21:10,140 --> 00:21:13,800 If you had a general satisfiability problem 359 00:21:13,800 --> 00:21:15,610 and you had a satisfiable formula, 360 00:21:15,610 --> 00:21:18,900 there's no obvious way of seeing what the satisfying assignment 361 00:21:18,900 --> 00:21:21,690 looks like, but here we understand what it looks like. 362 00:21:21,690 --> 00:21:25,230 It has that very special form, making one true literal 363 00:21:25,230 --> 00:21:30,180 in every clause-- at least one true literal in every clause. 364 00:21:30,180 --> 00:21:32,350 So now we're going to do the reduction. 365 00:21:32,350 --> 00:21:34,665 So I'm going to take from this formula-- 366 00:21:38,070 --> 00:21:41,880 I know, for some of you, you're going to be chafing. 367 00:21:41,880 --> 00:21:42,990 Why am I going slowly? 368 00:21:42,990 --> 00:21:44,407 But I want to make sure that we're 369 00:21:44,407 --> 00:21:47,280 all together and understanding what the rules of the game are 370 00:21:47,280 --> 00:21:48,600 and what we're trying to do.
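The observation above, that a satisfying assignment is exactly one that makes at least one literal true in every clause, translates directly into an evaluation routine. A minimal sketch, assuming a hypothetical encoding that is not from the lecture: a literal is a `(variable_name, is_negated)` pair, a clause is a list of literals, a formula is a list of clauses, and an assignment is a dict from variable names to booleans.

```python
def satisfies(formula, assignment):
    """True iff every clause contains at least one true literal."""
    return all(
        # A literal is true when the variable's value differs from its
        # negation flag: x is true if x=True, NOT x is true if x=False.
        any(assignment[var] != negated for var, negated in clause)
        for clause in formula
    )


# Hypothetical formula: (a OR b OR NOT c) AND (NOT a OR c OR d)
example_formula = [
    [("a", False), ("b", False), ("c", True)],
    [("a", True), ("c", False), ("d", False)],
]

good = {"a": True, "b": False, "c": True, "d": False}
bad = {"a": True, "b": False, "c": False, "d": False}
```

Under `good`, the first clause is satisfied by a and the second by c. Under `bad`, every literal in the second clause is false, so the whole formula is false.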
371 00:21:52,000 --> 00:21:54,780 We're trying to convert this formula 372 00:21:54,780 --> 00:21:58,170 into a graph and a number. 373 00:21:58,170 --> 00:22:01,770 So right now my job is, to do this reduction, 374 00:22:01,770 --> 00:22:05,160 is to exhibit that graph. 375 00:22:05,160 --> 00:22:07,057 So I'm going to do that in two steps. 376 00:22:07,057 --> 00:22:09,390 First, I'm going to tell you what the nodes of the graph 377 00:22:09,390 --> 00:22:10,110 are. 378 00:22:10,110 --> 00:22:13,980 Then I'll tell you what the edges of that graph [AUDIO OUT] 379 00:22:13,980 --> 00:22:17,950 Finally, I'll tell you what the number k is. 380 00:22:17,950 --> 00:22:20,230 That's the way this polynomial time 381 00:22:20,230 --> 00:22:22,740 reduction is going to work. 382 00:22:22,740 --> 00:22:27,390 And we have to also observe at the very end that the reduction 383 00:22:27,390 --> 00:22:28,390 that I'm giving-- 384 00:22:28,390 --> 00:22:31,560 giving you, this procedure for building this graph 385 00:22:31,560 --> 00:22:34,050 can be done in polynomial time, but that, I 386 00:22:34,050 --> 00:22:35,910 think-- you'll see, once I'm done, 387 00:22:35,910 --> 00:22:38,010 that that's pretty obvious. 388 00:22:38,010 --> 00:22:41,880 OK, so first, as I promised, the nodes-- 389 00:22:41,880 --> 00:22:44,160 so the nodes of this graph are going 390 00:22:44,160 --> 00:22:47,790 to correspond to the literals of the formula. 391 00:22:47,790 --> 00:22:50,770 Every literal is going to become a node in the graph, 392 00:22:50,770 --> 00:22:56,460 and it's going to be labeled with the name of that literal. 393 00:22:56,460 --> 00:22:58,680 Every node is going to be labeled an a, 394 00:22:58,680 --> 00:23:01,360 a b, or c bar, and so on. 395 00:23:01,360 --> 00:23:02,160 So here it goes. 396 00:23:05,110 --> 00:23:08,370 So those are the nodes of the graph 397 00:23:08,370 --> 00:23:16,770 G, one for each literal in the formula, labeled as promised.
398 00:23:16,770 --> 00:23:18,990 Now I have to tell you what the edges look like. 399 00:23:23,400 --> 00:23:26,550 I'm going to tell you what the edges are by first telling you 400 00:23:26,550 --> 00:23:28,170 what they are not. 401 00:23:28,170 --> 00:23:29,640 I'm going to first explain to you 402 00:23:29,640 --> 00:23:32,790 what the forbidden edges are, what 403 00:23:32,790 --> 00:23:36,480 edges I'm going to promise not to include. 404 00:23:36,480 --> 00:23:38,670 And then the ones that I will include 405 00:23:38,670 --> 00:23:41,330 are going to be all the others. 406 00:23:41,330 --> 00:23:43,700 So what are going to be the forbidden edges? 407 00:23:43,700 --> 00:23:48,140 First of all, the forbidden edges 408 00:23:48,140 --> 00:23:50,010 are going to be of two types. 409 00:23:50,010 --> 00:23:55,820 One is the edges between nodes that come from literals 410 00:23:55,820 --> 00:23:57,060 in the same clause. 411 00:23:57,060 --> 00:24:02,060 So I would just call that no edges within a clause. 412 00:24:02,060 --> 00:24:04,160 So for example, these three nodes 413 00:24:04,160 --> 00:24:05,970 will not be connected to one another. 414 00:24:05,970 --> 00:24:08,330 And I'm going to indicate that by writing 415 00:24:08,330 --> 00:24:14,030 red dashed lines, which means there's not an edge there. 416 00:24:14,030 --> 00:24:17,350 Those are forbidden from having an edge. 417 00:24:17,350 --> 00:24:19,870 So these three have no edge, and the same thing 418 00:24:19,870 --> 00:24:26,780 for every other triple of three nodes that come from clauses. 419 00:24:26,780 --> 00:24:29,570 Those are no edges there. 420 00:24:29,570 --> 00:24:33,560 And there's one other category of edge that I'm forbidding, 421 00:24:33,560 --> 00:24:36,140 and that are edges that can-- 422 00:24:36,140 --> 00:24:40,400 that go between inconsistent labels, and the nodes 423 00:24:40,400 --> 00:24:42,060 with inconsistent labels. 
424 00:24:42,060 --> 00:24:45,020 So for example, between a and a bar-- 425 00:24:45,020 --> 00:24:47,570 those are inconsistent. 426 00:24:47,570 --> 00:24:49,715 Nothing's wrong with going a to d. 427 00:24:49,715 --> 00:24:50,840 Those are not inconsistent. 428 00:24:50,840 --> 00:24:52,370 Those are just different labels. 429 00:24:52,370 --> 00:24:54,590 Or going from a to a-- that's OK. 430 00:24:54,590 --> 00:24:57,990 But from a to a bar-- that's not allowed. 431 00:24:57,990 --> 00:25:01,190 So that's going to be another forbidden edge-- 432 00:25:01,190 --> 00:25:06,420 or for example, a to a bar here, a bar to a, or for example, 433 00:25:06,420 --> 00:25:09,110 from this c bar to c-- 434 00:25:09,110 --> 00:25:10,470 forbidden. 435 00:25:10,470 --> 00:25:10,970 OK? 436 00:25:10,970 --> 00:25:12,380 So you imagine, you're going to write down 437 00:25:12,380 --> 00:25:13,790 all of those forbidden edges. 438 00:25:18,180 --> 00:25:20,940 Those are all of the forbidden edges, 439 00:25:20,940 --> 00:25:24,960 and then, after taking those away, 440 00:25:24,960 --> 00:25:28,540 you're going to be putting in all the other edges possible. 441 00:25:28,540 --> 00:25:32,310 So for example-- let me just gray those out so they 442 00:25:32,310 --> 00:25:34,710 don't interfere with the picture-- 443 00:25:34,710 --> 00:25:41,140 from a to d, there's going to be an edge, because those are not 444 00:25:41,140 --> 00:25:42,160 forbidden. 445 00:25:42,160 --> 00:25:45,715 a to a-- not forbidden, because they-- 446 00:25:48,700 --> 00:25:53,110 they're in different clauses and they're not inconsistent. 447 00:25:53,110 --> 00:25:55,040 They're consistent with one another. 448 00:25:55,040 --> 00:25:56,570 So here are a whole bunch of others. 449 00:25:56,570 --> 00:25:57,612 I'm not showing them all. 450 00:25:57,612 --> 00:25:58,540 It becomes very messy.
451 00:25:58,540 --> 00:26:03,100 But here are all of the other edges between nodes which are-- 452 00:26:07,220 --> 00:26:09,410 where they're not forbidden. 453 00:26:09,410 --> 00:26:09,910 OK? 454 00:26:09,910 --> 00:26:12,820 And that's the graph. 455 00:26:12,820 --> 00:26:18,550 That's G. G is all those nodes and those edges which 456 00:26:18,550 --> 00:26:20,860 are not forbidden. 457 00:26:20,860 --> 00:26:22,810 And I just have to tell you what k is. 458 00:26:22,810 --> 00:26:28,150 k is going to be the number of clauses. 459 00:26:28,150 --> 00:26:31,420 That's going to be the size of the clique I'm 460 00:26:31,420 --> 00:26:35,430 looking for in this graph G that I just spelled out for you. 461 00:26:35,430 --> 00:26:40,080 And I'm going to claim that this graph here that I just 462 00:26:40,080 --> 00:26:43,470 described will have a k-clique-- 463 00:26:43,470 --> 00:26:50,420 k is the number of clauses-- exactly when phi is satisfiable. 464 00:26:50,420 --> 00:26:54,020 It's kind of cool. 465 00:26:54,020 --> 00:26:57,260 If phi is satisfiable, then there will be a k-clique here. 466 00:26:57,260 --> 00:27:01,340 And if phi is not satisfiable-- no k-clique. 467 00:27:01,340 --> 00:27:06,370 We're going to prove that as a claim on the next slide. 468 00:27:06,370 --> 00:27:09,140 OK, so k is the number of clauses. 469 00:27:09,140 --> 00:27:10,940 All right? 470 00:27:10,940 --> 00:27:13,770 Any questions on that construction? 471 00:27:13,770 --> 00:27:15,270 So I'm done with the construction. 472 00:27:15,270 --> 00:27:18,110 What's left is to argue why the construction works. 473 00:27:25,940 --> 00:27:29,540 So far, so good-- let's move on. 474 00:27:29,540 --> 00:27:30,040 OK. 475 00:27:35,450 --> 00:27:37,430 So here is that very same construction. 476 00:27:37,430 --> 00:27:40,520 I eliminated the red forbidden edges.
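The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the lecture: the function names, the `~a` spelling for "a bar", and the two-clause example formula are all made up here, and the formula on the slide is not reproduced.

```python
from itertools import combinations


def negate(literal):
    """Return the complementary literal: 'a' <-> '~a' ('~a' stands for a-bar)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def formula_to_clique_instance(clauses):
    """Build (nodes, edges, k) from a CNF formula given as a list of clauses.

    Each node is a (clause_index, position) pair, so the same literal
    appearing in two clauses becomes two distinct nodes.
    """
    nodes = [(i, j) for i, clause in enumerate(clauses) for j in range(len(clause))]
    edges = set()
    for u, v in combinations(nodes, 2):
        same_clause = u[0] == v[0]
        inconsistent = clauses[u[0]][u[1]] == negate(clauses[v[0]][v[1]])
        # Keep every edge that is not forbidden by one of the two rules.
        if not same_clause and not inconsistent:
            edges.add(frozenset((u, v)))
    return nodes, edges, len(clauses)


# Hypothetical example formula: (a or b or ~c) and (~a or b or d)
example = [["a", "b", "~c"], ["~a", "b", "d"]]
nodes, edges, k = formula_to_clique_instance(example)
print(len(nodes), len(edges), k)
```

The double loop over pairs of nodes is quadratic in the number of literals, which is consistent with the later remark that the reduction is computable in polynomial time.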
477 00:27:40,520 --> 00:27:42,440 These are the ones that were remaining, 478 00:27:42,440 --> 00:27:46,010 plus anything else that was not forbidden. 479 00:27:46,010 --> 00:27:51,000 That was the same formula that I had from the previous slide. 480 00:27:51,000 --> 00:27:54,590 Now I want to claim that that formula is satisfiable 481 00:27:54,590 --> 00:27:56,630 exactly when G has a k-clique. 482 00:27:56,630 --> 00:27:59,890 Now, why in the world is that? 483 00:27:59,890 --> 00:28:02,070 So this is an if and only if. 484 00:28:02,070 --> 00:28:04,870 It's proved in both directions. 485 00:28:04,870 --> 00:28:06,943 And this is going to be the typical kind of thing 486 00:28:06,943 --> 00:28:09,360 that you're going to want to do when you're exhibiting a-- 487 00:28:13,800 --> 00:28:16,260 one of these reductions, which is you're 488 00:28:16,260 --> 00:28:18,580 going to have an opportunity to do that. 489 00:28:18,580 --> 00:28:22,815 And we'll do that too in our examples. 490 00:28:25,380 --> 00:28:31,410 OK, so now what I want to show is 491 00:28:31,410 --> 00:28:36,440 that, if phi is satisfiable-- so let's 492 00:28:36,440 --> 00:28:39,470 prove the forward direction-- if phi is satisfiable, 493 00:28:39,470 --> 00:28:40,820 then G has a k-clique. 494 00:28:43,420 --> 00:28:46,540 OK, so first of all, if phi is satisfiable, 495 00:28:46,540 --> 00:28:51,580 that means it has a satisfying assignment. 496 00:28:51,580 --> 00:28:53,800 Now, here's a common confusion. 497 00:28:53,800 --> 00:28:56,530 I'm not sure whether it's helpful to sprinkle 498 00:28:56,530 --> 00:28:58,790 the confusions along with the discussion, 499 00:28:58,790 --> 00:29:02,390 but in case you're worried, you might ask, well, 500 00:29:02,390 --> 00:29:04,330 how do I find that satisfying assignment? 501 00:29:04,330 --> 00:29:10,120 I thought that was exponentially hard to do. 502 00:29:10,120 --> 00:29:10,980 This is a proof.
503 00:29:10,980 --> 00:29:13,250 This is not an algorithm. 504 00:29:13,250 --> 00:29:15,950 We're just trying to argue the correctness 505 00:29:15,950 --> 00:29:17,460 of this construction. 506 00:29:17,460 --> 00:29:20,060 So there's no longer any concern about how 507 00:29:20,060 --> 00:29:21,362 to find an assignment. 508 00:29:21,362 --> 00:29:23,570 I'm just saying, if there is an assignment-- if there 509 00:29:23,570 --> 00:29:25,790 is a satisfying assignment, then something 510 00:29:25,790 --> 00:29:27,320 will happen. 511 00:29:27,320 --> 00:29:30,230 Then something will happen-- in particular, 512 00:29:30,230 --> 00:29:33,900 then we will show that G has a k-clique. 513 00:29:33,900 --> 00:29:39,080 So let's say phi is satisfiable, so it has some assignment. 514 00:29:39,080 --> 00:29:41,510 Let's take that satisfying assignment. 515 00:29:41,510 --> 00:29:47,470 And remember that, in any satisfying assignment 516 00:29:47,470 --> 00:29:53,830 to a formula in CNF, that makes at least one literal true 517 00:29:53,830 --> 00:29:54,610 in every clause. 518 00:29:58,050 --> 00:30:00,260 So let's just pick one of those. 519 00:30:00,260 --> 00:30:01,850 There might be several clauses which 520 00:30:01,850 --> 00:30:04,070 have multiple true literals. 521 00:30:04,070 --> 00:30:08,230 In that case, just pick one of them arbitrarily. 522 00:30:11,100 --> 00:30:13,890 I don't have this indicated on this diagram here, 523 00:30:13,890 --> 00:30:18,330 but imagine maybe, in the very first clause, a was true; 524 00:30:18,330 --> 00:30:22,800 in the second clause, b was true; in the third clause, 525 00:30:22,800 --> 00:30:27,630 e complement-- e bar was true, not e-- 526 00:30:27,630 --> 00:30:30,390 e bar was true, which means e itself was false, 527 00:30:30,390 --> 00:30:31,920 but e bar was true.
528 00:30:31,920 --> 00:30:34,830 And that's the way each of those clauses, in turn, 529 00:30:34,830 --> 00:30:37,620 got to be true in this particular satisfying 530 00:30:37,620 --> 00:30:39,522 assignment-- 531 00:30:39,522 --> 00:30:41,980 because you're going to pick one true literal in every clause. 532 00:30:41,980 --> 00:30:51,260 And now, from that choice of literals-- one per clause-- 533 00:30:51,260 --> 00:30:55,550 I'm going to look at the corresponding nodes in G, 534 00:30:55,550 --> 00:30:59,600 and I'm going to claim that those nodes taken together 535 00:30:59,600 --> 00:31:01,040 form a k-clique. 536 00:31:06,240 --> 00:31:10,840 OK, so first of all, do they have the right number of nodes? 537 00:31:10,840 --> 00:31:15,630 Well, sure, because I'm picking one node per clause. 538 00:31:15,630 --> 00:31:18,120 I already said k is the number of clauses, 539 00:31:18,120 --> 00:31:20,400 so I'm getting exactly k nodes. 540 00:31:20,400 --> 00:31:24,510 So I have at least the right number of nodes. 541 00:31:24,510 --> 00:31:27,990 But how do I know they're all connected to one another, 542 00:31:27,990 --> 00:31:32,150 those nodes that I just picked? 543 00:31:32,150 --> 00:31:34,040 Well, they're all connected to each other 544 00:31:34,040 --> 00:31:36,270 because I'm going to say-- 545 00:31:36,270 --> 00:31:40,820 well, because there were no forbidden edges among them. 546 00:31:40,820 --> 00:31:43,990 And remember that we put in all possible edges except 547 00:31:43,990 --> 00:31:47,110 for the forbidden ones. 548 00:31:47,110 --> 00:31:49,350 So how do I know there are no forbidden edges? 549 00:31:49,350 --> 00:31:52,260 Well, what were the rules for being forbidden? 550 00:31:52,260 --> 00:31:55,320 It means they-- two nodes in the same clique-- 551 00:31:55,320 --> 00:31:58,830 in the same clause.
552 00:31:58,830 --> 00:32:03,040 Well, I'm picking one node per clause, 553 00:32:03,040 --> 00:32:05,730 so I can never be having two nodes from the same clause. 554 00:32:05,730 --> 00:32:07,650 So I'm never going to run into trouble 555 00:32:07,650 --> 00:32:11,082 with the first condition of being forbidden. 556 00:32:11,082 --> 00:32:13,290 So what's the second possibility for being forbidden, 557 00:32:13,290 --> 00:32:17,030 is that I'm picking two nodes which are inconsistent-- 558 00:32:17,030 --> 00:32:19,460 well, how do I know that I didn't end up 559 00:32:19,460 --> 00:32:23,460 with two nodes with inconsistent labels? 560 00:32:23,460 --> 00:32:26,130 That would be bad, because then they would have 561 00:32:26,130 --> 00:32:27,930 a forbidden edge, and what-- 562 00:32:27,930 --> 00:32:31,180 my result would not be a clique. 563 00:32:31,180 --> 00:32:32,805 How do I know I didn't end up picking-- 564 00:32:36,990 --> 00:32:42,450 in this group I pick a, and in this group I pick a bar? 565 00:32:42,450 --> 00:32:46,230 Well, because they all came from the same assignment. 566 00:32:48,840 --> 00:32:52,140 They were all true literals in clauses. 567 00:32:52,140 --> 00:32:54,870 It's not possible that a was true in this clause 568 00:32:54,870 --> 00:32:56,910 and a bar is true in that clause, 569 00:32:56,910 --> 00:32:59,430 because if a is true here, a bar's got to be false. 570 00:32:59,430 --> 00:33:02,170 It can't be the true literal in this clause. 571 00:33:02,170 --> 00:33:09,570 So I cannot have any inconsistent nodes appearing 572 00:33:09,570 --> 00:33:12,870 among the nodes of my clique. 573 00:33:12,870 --> 00:33:15,480 And so they're not in the same clause. 574 00:33:15,480 --> 00:33:18,690 They're not inconsistent, so the edge has to be there.
575 00:33:18,690 --> 00:33:21,390 And that's going to be true for every pair of nodes 576 00:33:21,390 --> 00:33:22,500 in that clique-- 577 00:33:22,500 --> 00:33:24,960 in that group of nodes, and that's why it's a clique. 578 00:33:27,490 --> 00:33:28,942 So that proves one direction. 579 00:33:28,942 --> 00:33:30,900 Now we still have to prove the other direction, 580 00:33:30,900 --> 00:33:34,170 because we have to say, well, if G has a k-clique, 581 00:33:34,170 --> 00:33:36,160 how do we know that phi is satisfiable? 582 00:33:36,160 --> 00:33:37,990 So that's the reverse. 583 00:33:37,990 --> 00:33:42,140 So let's just take any k-clique that's in G. 584 00:33:42,140 --> 00:33:44,965 And how do I know I can get from that, a satisfying assignment 585 00:33:44,965 --> 00:33:45,590 to the formula? 586 00:33:55,010 --> 00:33:59,550 Good-- getting some good questions here in the chat. 587 00:33:59,550 --> 00:34:02,960 But let me just move on. 588 00:34:02,960 --> 00:34:05,720 So proving the reverse-- 589 00:34:05,720 --> 00:34:09,770 the reverse direction, assuming we have a k-clique-- 590 00:34:09,770 --> 00:34:12,302 take any such k-clique. 591 00:34:12,302 --> 00:34:13,969 Now, first of all, you observe that it's 592 00:34:13,969 --> 00:34:16,865 got to have one node in every clause. 593 00:34:19,710 --> 00:34:22,409 It can't have two nodes in the same clause 594 00:34:22,409 --> 00:34:26,675 or zero nodes in a clause. 595 00:34:26,675 --> 00:34:29,050 First of all, it can't have two nodes in the same clause. 596 00:34:29,050 --> 00:34:29,770 Why? 597 00:34:29,770 --> 00:34:33,810 Well, because those nodes are never connected to one another. 598 00:34:33,810 --> 00:34:35,738 So they cannot be both in the same clique, 599 00:34:35,738 --> 00:34:38,030 because all the nodes in a clique are connected 600 00:34:38,030 --> 00:34:40,909 to one another.
601 00:34:40,909 --> 00:34:45,020 By the way we constructed G, nodes from the same clause are 602 00:34:45,020 --> 00:34:48,810 not connected, so they cannot appear in a clique together. 603 00:34:48,810 --> 00:34:52,340 So there's going to be at most one 604 00:34:52,340 --> 00:34:57,230 node from every clause appearing in this clique, but-- 605 00:34:57,230 --> 00:34:58,730 appearing in this clique. 606 00:34:58,730 --> 00:35:01,960 But we also have-- we know we have k nodes, 607 00:35:01,960 --> 00:35:07,090 so that means, if there is at most one per clause, 608 00:35:07,090 --> 00:35:10,390 and we only have k clauses, that means every clause has 609 00:35:10,390 --> 00:35:12,510 got to have one. 610 00:35:12,510 --> 00:35:14,310 If some clause is missing a node, 611 00:35:14,310 --> 00:35:18,860 then there's got to be some other clause that has two. 612 00:35:18,860 --> 00:35:22,850 So we know that every-- 613 00:35:22,850 --> 00:35:27,530 that this clique in G has exactly one 614 00:35:27,530 --> 00:35:30,190 node in every clause. 615 00:35:30,190 --> 00:35:33,950 So how do we get from that a satisfying assignment? 616 00:35:33,950 --> 00:35:39,550 So what we're going to do is take the corresponding literals 617 00:35:39,550 --> 00:35:46,120 that correspond to that one node that was in the clique, 618 00:35:46,120 --> 00:35:50,920 take those literals and set those literals to be true, 619 00:35:50,920 --> 00:35:56,050 which means setting the variable to be true or setting-- 620 00:35:56,050 --> 00:35:58,960 if the literal was a bar, then setting a 621 00:35:58,960 --> 00:36:00,940 to be false, because you want a bar to be true. 622 00:36:00,940 --> 00:36:04,190 You want to set the literals to be true.
623 00:36:04,190 --> 00:36:07,030 Well, now we're setting-- 624 00:36:07,030 --> 00:36:08,800 there's one node in each clause-- 625 00:36:08,800 --> 00:36:11,055 we're setting one literal-- 626 00:36:11,055 --> 00:36:12,430 the corresponding literal-- true, 627 00:36:12,430 --> 00:36:16,360 so that's going to set one true literal in every clause. 628 00:36:16,360 --> 00:36:18,610 So that means it's going to be satisfying. 629 00:36:18,610 --> 00:36:22,360 There's one thing that one has to really be careful here 630 00:36:22,360 --> 00:36:23,710 to double check-- 631 00:36:23,710 --> 00:36:26,260 to make sure that we're doing this consistently, 632 00:36:26,260 --> 00:36:32,170 that we're not being asked to set a true and also a bar true, 633 00:36:32,170 --> 00:36:35,290 because then that would not be possible. 634 00:36:35,290 --> 00:36:39,970 But a and a bar are not connected, so it's-- 635 00:36:39,970 --> 00:36:42,910 we're never going to be trying to set both a and a bar true. 636 00:36:42,910 --> 00:36:44,898 If we're going to be trying to set a true, 637 00:36:44,898 --> 00:36:46,690 we're never going to try to set a bar true, 638 00:36:46,690 --> 00:36:48,435 because those are not-- 639 00:36:48,435 --> 00:36:51,190 a and a bar cannot be in the same clique. 640 00:36:51,190 --> 00:36:53,330 They're not connected to each other. 641 00:36:53,330 --> 00:36:53,830 OK? 642 00:36:53,830 --> 00:36:57,790 So that is the argument. 643 00:36:57,790 --> 00:37:00,078 And lastly, I claim-- 644 00:37:00,078 --> 00:37:01,870 we're not going to look at this in detail-- 645 00:37:01,870 --> 00:37:04,690 that it's kind of obvious that this reduction can 646 00:37:04,690 --> 00:37:06,070 be done in polynomial time. 647 00:37:06,070 --> 00:37:08,320 Namely, if I give you that formula, 648 00:37:08,320 --> 00:37:10,480 you can write out that graph in-- 649 00:37:10,480 --> 00:37:11,830 pretty easily. 
650 00:37:11,830 --> 00:37:16,030 There's no hard work to be done in writing out that graph 651 00:37:16,030 --> 00:37:18,700 or counting up the number of clauses. 652 00:37:18,700 --> 00:37:24,490 So that's the proof that I can reduce 3SAT to clique. 653 00:37:24,490 --> 00:37:28,810 And it tells you also that, if I could solve clique quickly, 654 00:37:28,810 --> 00:37:31,150 then I can solve 3SAT quickly. 655 00:37:31,150 --> 00:37:34,060 And this is the whole point, because I 656 00:37:34,060 --> 00:37:39,820 can convert now clique problems to 3SAT problems. 657 00:37:39,820 --> 00:37:40,330 No. 658 00:37:40,330 --> 00:37:40,990 I'm sorry. 659 00:37:40,990 --> 00:37:43,268 I got it backwards-- convert 3SAT problems 660 00:37:43,268 --> 00:37:44,060 to clique problems. 661 00:37:44,060 --> 00:37:48,760 So if I can solve clique easily, then I can solve 3SAT easily. 662 00:37:48,760 --> 00:37:51,430 I just showed a way of converting these formulas 663 00:37:51,430 --> 00:37:53,560 to graphs. 664 00:37:53,560 --> 00:37:56,320 So that says, if I can solve the graphs easily, 665 00:37:56,320 --> 00:37:58,540 then I can solve the formulas easily. 666 00:37:58,540 --> 00:38:02,440 OK, why don't we turn to some-- oh, question, check-in. 667 00:38:02,440 --> 00:38:05,875 But we can now also-- 668 00:38:05,875 --> 00:38:08,740 I'll launch a check-in, but we'll-- 669 00:38:08,740 --> 00:38:11,800 let's just see. 670 00:38:11,800 --> 00:38:13,855 I'm seeing many, many questions here, 671 00:38:13,855 --> 00:38:15,730 which are actually what my check-in is about. 672 00:38:18,800 --> 00:38:20,090 I'll turn it back to you guys. 673 00:38:25,570 --> 00:38:33,130 OK, so where did we use the fact that we 674 00:38:33,130 --> 00:38:38,080 have three literals per clause? 675 00:38:38,080 --> 00:38:40,810 Does this thing just work even if we had any number 676 00:38:40,810 --> 00:38:43,570 of literals per clause? 677 00:38:43,570 --> 00:38:44,320 What do you think? 
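Both directions of the argument above can be traced on a small instance. The following is a sketch under assumptions rather than the lecture's own material: the three-clause formula, the `~` prefix for a barred literal, and every function name are hypothetical, and the brute-force clique search is exponential and is there only to exercise the reverse direction.

```python
from itertools import combinations


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def is_true(literal, assignment):
    """Evaluate a literal under a dict mapping variable names to booleans."""
    if literal.startswith("~"):
        return not assignment[literal[1:]]
    return assignment[literal]


def satisfies(clauses, assignment):
    return all(any(is_true(lit, assignment) for lit in clause) for clause in clauses)


def has_edge(clauses, u, v):
    """Edge rule of the construction: different clauses, consistent labels."""
    return u[0] != v[0] and clauses[u[0]][u[1]] != negate(clauses[v[0]][v[1]])


def is_clique(clauses, chosen):
    return all(has_edge(clauses, u, v) for u, v in combinations(chosen, 2))


def clique_from_assignment(clauses, assignment):
    """Forward direction: pick one true literal per clause (the first one found)."""
    return [
        (i, next(j for j, lit in enumerate(clause) if is_true(lit, assignment)))
        for i, clause in enumerate(clauses)
    ]


def assignment_from_clique(clauses, clique, variables):
    """Reverse direction: make every literal in the clique true.

    Variables the clique does not mention default to False (an arbitrary choice).
    """
    assignment = {v: False for v in variables}
    for i, j in clique:
        literal = clauses[i][j]
        assignment[literal.lstrip("~")] = not literal.startswith("~")
    return assignment


# Hypothetical formula: (a or b or ~c) and (~a or b or d) and (~b or c or ~d)
clauses = [["a", "b", "~c"], ["~a", "b", "d"], ["~b", "c", "~d"]]
variables = ["a", "b", "c", "d"]
k = len(clauses)

# Forward: a satisfying assignment yields k mutually adjacent nodes.
satisfying = {"a": True, "b": True, "c": True, "d": False}
forward_clique = clique_from_assignment(clauses, satisfying)

# Reverse: brute-force search for some k-clique, then read off an assignment.
nodes = [(i, j) for i, clause in enumerate(clauses) for j in range(len(clause))]
found = next(c for c in combinations(nodes, k) if is_clique(clauses, c))
recovered = assignment_from_clique(clauses, found, variables)
```

Nothing in these helpers depends on clauses having exactly three literals, only on `k` being the number of clauses.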
678 00:38:47,320 --> 00:38:49,030 We got a tight race here. 679 00:39:01,070 --> 00:39:04,720 Truth is losing out, unfortunately. 680 00:39:04,720 --> 00:39:05,220 All right. 681 00:39:05,220 --> 00:39:07,200 Oh, it's really close, but still-- 682 00:39:07,200 --> 00:39:07,700 OK. 683 00:39:11,610 --> 00:39:13,485 One more-- OK, it's neck and neck. 684 00:39:16,390 --> 00:39:18,520 OK, almost done-- 685 00:39:18,520 --> 00:39:25,630 10 seconds-- are we done here? 686 00:39:25,630 --> 00:39:29,020 OK, that's it-- ending polling. 687 00:39:42,260 --> 00:39:44,960 Yes, truth has won out. 688 00:39:44,960 --> 00:39:47,570 Thank God. 689 00:39:47,570 --> 00:39:49,280 Yeah, it works for any size clause. 690 00:39:49,280 --> 00:39:52,190 We didn't use the fact that it's a 3CNF. 691 00:39:52,190 --> 00:39:55,880 It could have been any number here. 692 00:39:55,880 --> 00:39:57,110 Well, we've got a-- 693 00:39:57,110 --> 00:40:03,020 I guess it's a plurality here, though, not a majority. 694 00:40:03,020 --> 00:40:05,817 Where did we use 3 in any of this argument? 695 00:40:05,817 --> 00:40:06,650 I didn't mention it. 696 00:40:06,650 --> 00:40:09,560 Maybe you were imagining that it's going to be part of it, 697 00:40:09,560 --> 00:40:13,370 but 3 does not come into this discussion at all. 698 00:40:13,370 --> 00:40:16,640 If you think about how it's-- why it's working, 699 00:40:16,640 --> 00:40:17,750 even if we had-- 700 00:40:17,750 --> 00:40:21,800 one of these clauses had 10 literals in it, 701 00:40:21,800 --> 00:40:25,370 as long as k is the number of clauses, 702 00:40:25,370 --> 00:40:28,100 and we don't connect any variable-- 703 00:40:28,100 --> 00:40:31,970 any literals internal to a clause, this whole argument 704 00:40:31,970 --> 00:40:33,200 is going to still work. 705 00:40:33,200 --> 00:40:34,460 So please check that. Make
706 00:40:34,460 --> 00:40:36,020 sure you understand what's going on, 707 00:40:36,020 --> 00:40:39,020 because I can see that a good chunk of you 708 00:40:39,020 --> 00:40:40,130 have not got this right. 709 00:40:44,120 --> 00:40:46,190 So I got a good question here. 710 00:40:46,190 --> 00:40:48,770 What if it's only just one big clause-- 711 00:40:48,770 --> 00:40:51,350 so it's like the whole formula's just one big OR. 712 00:40:51,350 --> 00:40:53,270 So what does happen in that case? 713 00:40:53,270 --> 00:40:55,445 Good question-- so in that case-- 714 00:41:00,530 --> 00:41:03,770 suppose there's one big clause with 100 literals. 715 00:41:03,770 --> 00:41:05,000 That's the whole formula. 716 00:41:05,000 --> 00:41:05,998 It's just a big OR. 717 00:41:05,998 --> 00:41:08,540 So we know the formula's going to be satisfiable, by the way, 718 00:41:08,540 --> 00:41:10,310 obviously. 719 00:41:10,310 --> 00:41:13,670 So if you look at the corresponding G, 720 00:41:13,670 --> 00:41:15,890 it's going to have 100 nodes, because there's 721 00:41:15,890 --> 00:41:18,410 one for every literal. 722 00:41:18,410 --> 00:41:20,630 None of them are going to be connected to each other, 723 00:41:20,630 --> 00:41:24,267 because they're all in the same clause. 724 00:41:24,267 --> 00:41:26,600 So it looks like we're going to be in tough shape trying 725 00:41:26,600 --> 00:41:29,272 to find a clique there, because there's no edges at all. 726 00:41:29,272 --> 00:41:30,230 Everything's forbidden. 727 00:41:30,230 --> 00:41:34,010 But what is k? k is going to be 1 in that case, 728 00:41:34,010 --> 00:41:35,840 because there's only one clause. 729 00:41:35,840 --> 00:41:44,030 And kind of a degenerate case, but a clique with just one-- 730 00:41:44,030 --> 00:41:48,560 just a single node is-- counts as a one clique, 731 00:41:48,560 --> 00:41:49,748 because it's just one node. 732 00:41:49,748 --> 00:41:51,290 There's no need for any edges at all.
733 00:41:51,290 --> 00:41:52,530 It still counts as a clique. 734 00:41:52,530 --> 00:41:56,670 So it'll still work out. 735 00:41:56,670 --> 00:41:57,180 Let's see. 736 00:41:57,180 --> 00:41:58,787 What else here? 737 00:41:58,787 --> 00:42:00,870 That was kind of a fun question that you asked me. 738 00:42:00,870 --> 00:42:01,500 Thank you. 739 00:42:04,880 --> 00:42:06,140 There are a lot of questions. 740 00:42:08,690 --> 00:42:11,690 I think we're actually pretty near the break also. 741 00:42:11,690 --> 00:42:13,380 We'll just see. 742 00:42:13,380 --> 00:42:13,880 Yeah. 743 00:42:20,090 --> 00:42:23,070 Oh, many, many questions here-- 744 00:42:23,070 --> 00:42:26,300 so I think I'm going to start the clock going down 745 00:42:26,300 --> 00:42:32,210 for our coffee break, and I will take some of these questions. 746 00:42:32,210 --> 00:42:35,860 I'll try to answer some of these questions-- oops-- 747 00:42:35,860 --> 00:42:41,290 try to answer some of these questions afterward. 748 00:42:44,080 --> 00:42:44,790 All right. 749 00:42:50,160 --> 00:42:51,480 Good-- all right. 750 00:42:54,770 --> 00:42:57,110 So I got a question about making sure 751 00:42:57,110 --> 00:43:00,470 why nodes don't have inconsistent labels, if I 752 00:43:00,470 --> 00:43:02,090 understand the question correctly. 753 00:43:06,940 --> 00:43:10,510 So I never put edges between nodes with inconsistent labels. 754 00:43:10,510 --> 00:43:11,530 That's the rule. 755 00:43:11,530 --> 00:43:12,610 That's my construction. 756 00:43:12,610 --> 00:43:15,590 I get to say how the construction works. 757 00:43:15,590 --> 00:43:18,760 So those two nodes with inconsistent labels 758 00:43:18,760 --> 00:43:23,680 can never be in the same clique, because there's 759 00:43:23,680 --> 00:43:25,208 no edge between them. 760 00:43:25,208 --> 00:43:27,250 So I'm not sure that answered the question there, 761 00:43:27,250 --> 00:43:29,890 but that was-- 762 00:43:29,890 --> 00:43:31,090 let's see.
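The one-big-clause case from the question can be checked mechanically. This is a small sketch with made-up names (`count_edges`, `big_or`, variables `x0` through `x99`), assuming the same two forbidden-edge rules as the construction: it counts the edges the graph would have and reads off `k`.

```python
from itertools import combinations


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def count_edges(clauses):
    """Count allowed edges: endpoints in different clauses with consistent labels."""
    nodes = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    return sum(
        1
        for (i, a), (j, b) in combinations(nodes, 2)
        if i != j and a != negate(b)
    )


big_or = [[f"x{n}" for n in range(100)]]  # one clause containing 100 literals
k = len(big_or)                            # number of clauses
node_count = sum(len(clause) for clause in big_or)
edge_count = count_edges(big_or)
print(node_count, edge_count, k)
```

With `k` equal to 1, any single node already meets the clique condition, since there are no pairs of nodes to check.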
763 00:43:31,090 --> 00:43:34,570 So we're getting another question here. 764 00:43:34,570 --> 00:43:38,620 Since any Boolean formula can be converted to CNF, 765 00:43:38,620 --> 00:43:40,870 does that mean-- 766 00:43:40,870 --> 00:43:43,390 is SAT polynomial time reducible to clique as w-- 767 00:43:50,900 --> 00:43:52,750 [INAUDIBLE] time reducible to clique, 768 00:43:52,750 --> 00:43:55,260 but not for the reason that any Boolean formula can be 769 00:43:55,260 --> 00:44:01,860 converted to CNF, because that conversion is going to be 770 00:44:01,860 --> 00:44:03,840 an exponential conversion in general-- 771 00:44:03,840 --> 00:44:05,007 the conversion to CNF. 772 00:44:05,007 --> 00:44:06,840 So you have to be careful about what we mean 773 00:44:06,840 --> 00:44:09,025 by converting a formula to CNF. 774 00:44:12,520 --> 00:44:15,020 So also getting question-- what happens in the case of 1SAT? 775 00:44:15,020 --> 00:44:18,593 Well, we didn't really talk about 1SAT. 776 00:44:18,593 --> 00:44:20,260 So the question is, suppose there's only 777 00:44:20,260 --> 00:44:24,320 a single literal per clause. 778 00:44:24,320 --> 00:44:27,620 That's an interesting-- another sort of edge case here. 779 00:44:27,620 --> 00:44:29,970 What happens under those circumstances? 780 00:44:29,970 --> 00:44:33,365 So there's only a single literal per clause. 781 00:44:37,530 --> 00:44:39,180 So you have to work out what happens. 782 00:44:39,180 --> 00:44:46,490 But in that case, all the clauses just 783 00:44:46,490 --> 00:44:50,360 have one literal in them, and now they have-- 784 00:45:05,000 --> 00:45:13,570 when you have a 1CNF, so there's only one literal per clause, 785 00:45:13,570 --> 00:45:15,520 the only way that can be satisfiable 786 00:45:15,520 --> 00:45:17,710 is you have to make each literal true, 787 00:45:17,710 --> 00:45:19,360 because there's no ORing anymore. 788 00:45:19,360 --> 00:45:21,730 So every literal has to be true in the assignment.
789 00:45:21,730 --> 00:45:25,810 So that means you can never have any inconsistent literals. 790 00:45:25,810 --> 00:45:28,180 And that's the only case when you would not have 791 00:45:28,180 --> 00:45:30,985 an edge, because you're not-- 792 00:45:30,985 --> 00:45:33,250 the forbidden condition won't happen 793 00:45:33,250 --> 00:45:37,120 because you don't have more than one node per clause. 794 00:45:37,120 --> 00:45:40,030 Every clause is going to have just a single node in it. 795 00:45:40,030 --> 00:45:42,940 So it really comes down to whether or not 796 00:45:42,940 --> 00:45:46,210 you have an inconsistently labeled-- 797 00:45:46,210 --> 00:45:48,200 inconsistent clauses. 798 00:45:48,200 --> 00:45:51,285 I didn't explain that super well, but it's-- 799 00:45:51,285 --> 00:45:52,660 the argument still works, though. 800 00:45:52,660 --> 00:45:54,035 You should check it for yourself. 801 00:45:56,210 --> 00:45:57,520 So this is a basic question. 802 00:45:57,520 --> 00:45:59,860 How do we see that F is polynomial time? 803 00:45:59,860 --> 00:46:05,140 I'm not sure I want to spend a lot of discussion on that. 804 00:46:05,140 --> 00:46:10,720 The conversion from the formula to the graph is kind 805 00:46:10,720 --> 00:46:13,180 of a one-to-one-- 806 00:46:13,180 --> 00:46:16,660 every literal becomes a node. 807 00:46:16,660 --> 00:46:18,490 The rule for when edges are present-- it's 808 00:46:18,490 --> 00:46:21,940 a very simple rule. 809 00:46:21,940 --> 00:46:23,470 I'm not sure what more to say. 810 00:46:23,470 --> 00:46:27,040 It seems pretty clear that the conversion-- 811 00:46:27,040 --> 00:46:29,560 that reduction is polynomial time computable. 812 00:46:29,560 --> 00:46:31,330 If you wrote a program to do that, 813 00:46:31,330 --> 00:46:35,050 it would operate very quickly. 814 00:46:35,050 --> 00:46:38,110 Is it possible for G to have a k plus n clique, where 815 00:46:38,110 --> 00:46:39,340 n is greater than 0?
816 00:46:39,340 --> 00:46:41,650 Does that matter? 817 00:46:41,650 --> 00:46:45,040 If you think about it, the biggest possible clique 818 00:46:45,040 --> 00:46:48,070 that this graph G could have is k. 819 00:46:48,070 --> 00:46:52,540 It could never have a bigger clique, 820 00:46:52,540 --> 00:46:58,220 because two nodes in the same clause are never connected. 821 00:46:58,220 --> 00:47:01,130 You only have k clauses. 822 00:47:01,130 --> 00:47:04,600 So the answer is no, you cannot have a-- 823 00:47:04,600 --> 00:47:06,550 bigger than a k-clique-- 824 00:47:06,550 --> 00:47:08,340 just not going to happen. 825 00:47:08,340 --> 00:47:11,200 Does each clause need to have the same number of literals? 826 00:47:11,200 --> 00:47:11,890 I don't see why. 827 00:47:17,460 --> 00:47:21,450 So question is, why are we worrying about 3SAT, 828 00:47:21,450 --> 00:47:23,680 since it didn't seem to matter here? 829 00:47:23,680 --> 00:47:26,910 There are other examples where it does matter. 830 00:47:26,910 --> 00:47:29,050 I'm actually not sure if we'll see one or not, 831 00:47:29,050 --> 00:47:33,270 but there are cases where it does matter. 832 00:47:33,270 --> 00:47:35,460 Let me just see if there's any quick questions here. 833 00:47:40,210 --> 00:47:42,590 Somebody's saying, does-- do we have 834 00:47:42,590 --> 00:47:45,770 to worry about there being a polynomial number of literals? 835 00:47:45,770 --> 00:47:48,500 So you have to think about what that question means. 836 00:47:48,500 --> 00:47:52,400 The size of the input, which includes all of the literals, 837 00:47:52,400 --> 00:47:54,750 is n. 838 00:47:54,750 --> 00:47:57,870 So there's going to be, at most, n literals, 839 00:47:57,870 --> 00:48:02,850 because that's the size of the input, at most, 840 00:48:02,850 --> 00:48:04,380 n literals appearing.
841 00:48:04,380 --> 00:48:09,920 So you don't have to worry about that being polynomial, 842 00:48:09,920 --> 00:48:10,610 because-- 843 00:48:10,610 --> 00:48:13,640 by the way we're defining the size of the input, 844 00:48:13,640 --> 00:48:16,840 there are going to be, at most, n literals. 845 00:48:16,840 --> 00:48:17,770 I hope that's clear. 846 00:48:21,585 --> 00:48:22,085 OK. 847 00:48:26,290 --> 00:48:27,640 OK, so this is a good question. 848 00:48:27,640 --> 00:48:29,560 When we talk about polynomial time, 849 00:48:29,560 --> 00:48:32,560 it's polynomial in the representation 850 00:48:32,560 --> 00:48:35,650 of the input, which is-- 851 00:48:35,650 --> 00:48:37,720 if you want to think about it in terms of bits, 852 00:48:37,720 --> 00:48:39,160 that's fine-- or whatever. 853 00:48:39,160 --> 00:48:41,590 It's not going to matter if you use some larger alphabet. 854 00:48:41,590 --> 00:48:45,610 But the number of symbols you need in your fixed size 855 00:48:45,610 --> 00:48:47,350 alphabet, the number of symbols you 856 00:48:47,350 --> 00:48:49,720 need to write down that input in whatever 857 00:48:49,720 --> 00:48:52,390 your reasonable encoding is is going to be n. 858 00:48:52,390 --> 00:48:54,880 And it's going to be polynomial in n, polynomial 859 00:48:54,880 --> 00:48:59,020 in that length of the input. 860 00:48:59,020 --> 00:49:01,340 So another question is, does SAT reduce to 3SAT? 861 00:49:01,340 --> 00:49:01,840 Yes. 862 00:49:01,840 --> 00:49:03,580 That we will see. 863 00:49:03,580 --> 00:49:06,190 SAT actually does polynomial time reduce to 3SAT. 864 00:49:06,190 --> 00:49:08,440 Not by converting to an equivalent formula, 865 00:49:08,440 --> 00:49:09,550 but by some more-- 866 00:49:09,550 --> 00:49:12,828 somewhat more involved argument than that. 867 00:49:12,828 --> 00:49:13,870 OK, why don't we move on? 868 00:49:13,870 --> 00:49:16,287 Some of these are going to come out anyway in the lecture.
869 00:49:19,290 --> 00:49:21,720 OK, so now let's talk about NP completeness, 870 00:49:21,720 --> 00:49:23,430 because we've kind of set things up. 871 00:49:23,430 --> 00:49:26,220 We're not going to prove the basic theorem, 872 00:49:26,220 --> 00:49:28,500 the Cook-Levin theorem about NP completeness, 873 00:49:28,500 --> 00:49:32,095 but at least we'll be able to make the definition. 874 00:49:32,095 --> 00:49:34,470 OK, here is our definition of what it means for a problem 875 00:49:34,470 --> 00:49:35,280 to be NP complete. 876 00:49:40,860 --> 00:49:47,000 So a language B is called NP complete 877 00:49:47,000 --> 00:49:49,550 if it has two properties. 878 00:49:49,550 --> 00:49:53,320 Number one is that it has to be a member of NP, 879 00:49:53,320 --> 00:49:54,820 and number two-- 880 00:49:54,820 --> 00:49:57,900 every language in NP has to polynomial time 881 00:49:57,900 --> 00:50:01,300 reduce to that language-- to that NP 882 00:50:01,300 --> 00:50:05,810 complete language in order for it to be NP complete. 883 00:50:05,810 --> 00:50:11,460 So simple picture-- has to be in NP and everything 884 00:50:11,460 --> 00:50:15,050 in NP reduces to it. 885 00:50:15,050 --> 00:50:17,200 And so that's kind of the magical property 886 00:50:17,200 --> 00:50:19,330 that we claim that SAT has. 887 00:50:19,330 --> 00:50:22,600 SAT, for one thing, is obviously in NP. 888 00:50:22,600 --> 00:50:27,340 And as we-- the Cook-Levin theorem shows-- 889 00:50:27,340 --> 00:50:30,700 or will show-- everything in NP is reducible to SAT, 890 00:50:30,700 --> 00:50:33,307 so SAT's going to be our first example of an NP 891 00:50:33,307 --> 00:50:34,015 complete problem.
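Written out symbolically, the definition just given reads as follows. The notation $\le_{\mathrm{P}}$ for polynomial time mapping reducibility is an assumed shorthand here; the transcript only uses the spoken phrase "polynomial time reducible".

```latex
% Definition (NP-completeness), as stated in the lecture.
% Assumed notation: A \le_{\mathrm{P}} B means
% "A is polynomial time (mapping) reducible to B".
\[
  B \text{ is NP-complete}
  \iff
  \underbrace{B \in \mathrm{NP}}_{\text{(1) membership}}
  \ \text{ and } \
  \underbrace{\forall A \in \mathrm{NP}:\ A \le_{\mathrm{P}} B}_{\text{(2) everything in NP reduces to } B}
\]
% Cook--Levin theorem (assumed for the rest of this lecture, proved next lecture):
\[
  \mathit{SAT} \text{ is NP-complete.}
\]
```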
892 00:50:42,200 --> 00:50:44,540 And we're going to get what we claimed also 893 00:50:44,540 --> 00:50:49,700 for SAT-- that, if SAT or any other NP complete problem 894 00:50:49,700 --> 00:50:53,690 turns out to be solvable in polynomial time, then 895 00:50:53,690 --> 00:50:56,060 every NP problem is solvable in polynomial time. 896 00:50:56,060 --> 00:50:58,460 And that's immediate, because everything 897 00:50:58,460 --> 00:51:02,400 is reducible in polynomial time to the NP complete problem. 898 00:51:02,400 --> 00:51:05,390 So if you can do it easily, you can do everything easily 899 00:51:05,390 --> 00:51:08,890 just by going through the reduction. 900 00:51:08,890 --> 00:51:10,510 OK. 901 00:51:10,510 --> 00:51:12,260 So the Cook-Levin theorem, as I mentioned, 902 00:51:12,260 --> 00:51:13,960 is that SAT is NP complete. 903 00:51:13,960 --> 00:51:16,780 And we're going to actually prove it next lecture, 904 00:51:16,780 --> 00:51:19,570 but let's assume for the remainder of this lecture 905 00:51:19,570 --> 00:51:21,860 that we know it to be true. 906 00:51:21,860 --> 00:51:25,210 So I'll use the terminology of problems being NP complete, 907 00:51:25,210 --> 00:51:27,940 assuming that we know-- 908 00:51:27,940 --> 00:51:31,140 that we have SAT as NP complete. 909 00:51:31,140 --> 00:51:31,950 OK? 910 00:51:31,950 --> 00:51:35,040 So we're going to be using some of the things 911 00:51:35,040 --> 00:51:38,790 that we're proving next lecture just in the terminology 912 00:51:38,790 --> 00:51:41,037 that we're going to be talking today. 913 00:51:41,037 --> 00:51:42,120 OK, so here's the picture. 914 00:51:44,670 --> 00:51:46,530 Here's the class NP. 915 00:51:46,530 --> 00:51:50,730 And everything in NP is polynomial time reducible 916 00:51:50,730 --> 00:51:51,720 to SAT. 
917 00:51:51,720 --> 00:51:53,300 SAT itself is a member of NP, but I 918 00:51:53,300 --> 00:51:55,800 didn't want to show it that way because it makes the picture 919 00:51:55,800 --> 00:52:01,740 kind of hard to display. 920 00:52:01,740 --> 00:52:06,720 So just from the perspective of the reduction, everything in NP 921 00:52:06,720 --> 00:52:08,700 is polynomial time reducible to SAT. 922 00:52:08,700 --> 00:52:10,380 We'll show that next lecture. 923 00:52:10,380 --> 00:52:12,990 Another thing that we'll show next lecture 924 00:52:12,990 --> 00:52:15,960 is that SAT, in turn, is polynomial time 925 00:52:15,960 --> 00:52:17,550 reducible to 3SAT. 926 00:52:17,550 --> 00:52:22,400 So 3SAT, as you remember, are just those problems 927 00:52:22,400 --> 00:52:24,110 that are in conjunct-- 928 00:52:24,110 --> 00:52:26,870 are in 3CNF. 929 00:52:26,870 --> 00:52:31,910 And then what we show today is that 3SAT is polynomial time 930 00:52:31,910 --> 00:52:34,220 reducible to clique. 931 00:52:34,220 --> 00:52:40,390 So now, taking the assumption that SAT is NP complete-- 932 00:52:40,390 --> 00:52:44,140 so everything in NP is polynomial time reducible to SAT, which 933 00:52:44,140 --> 00:52:47,500 is, in turn, polynomial time reducible to 3SAT, and in turn, 934 00:52:47,500 --> 00:52:48,700 reducible to clique. 935 00:52:48,700 --> 00:52:52,660 These reductions, as we've seen before, compose. 936 00:52:57,010 --> 00:53:00,960 You can just apply one reduction function after the next. 937 00:53:00,960 --> 00:53:04,080 If each one individually is polynomial, 938 00:53:04,080 --> 00:53:07,240 the whole thing as a combination is going to be polynomial.
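To make "apply one reduction function after the next" concrete, here is a minimal Python sketch. The three toy languages (multiples of 4, even numbers, odd numbers) and the names `f`, `g`, `compose`, `decide_A`, and `decide_C` are made up for illustration and are not from the lecture; they only stand in for the chain from SAT to 3SAT to clique.

```python
def compose(f, g):
    """Apply reduction f first, then reduction g."""
    return lambda w: g(f(w))

# Toy languages over the non-negative integers (hypothetical, for illustration):
#   A = multiples of 4, B = even numbers, C = odd numbers.

def f(n):
    """Reduces A to B: n is in A exactly when f(n) is in B."""
    return n // 2 if n % 2 == 0 else 1

def g(n):
    """Reduces B to C: n is in B exactly when g(n) is in C."""
    return n + 1

def decide_C(n):
    """A fast decider for C."""
    return n % 2 == 1

def decide_A(n):
    """Decide A by composing the two reductions and asking the decider for C."""
    return decide_C(compose(f, g)(n))
```

If `f` runs in time about n^a, its output has length at most about n^a, so running `g` (time n^b) on that output costs about n^(ab). The composition is still polynomial.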
939 00:53:07,240 --> 00:53:09,720 So now we know that 3SAT is going 940 00:53:09,720 --> 00:53:15,150 to be also NP complete, because we can reduce anything in NP 941 00:53:15,150 --> 00:53:17,610 to SAT, and then to 3SAT, and then 942 00:53:17,610 --> 00:53:19,890 we get a reduction directly to 3SAT 943 00:53:19,890 --> 00:53:21,420 by composing those two reductions-- 944 00:53:21,420 --> 00:53:23,130 and then, furthermore, to clique. 945 00:53:23,130 --> 00:53:26,530 So now we have several NP complete problems. 946 00:53:26,530 --> 00:53:31,260 And moving beyond that, we have the HAMPATH problem, which we 947 00:53:31,260 --> 00:53:32,980 are going to talk about next. 948 00:53:32,980 --> 00:53:36,960 And we'll show another reduction in addition 949 00:53:36,960 --> 00:53:39,420 to the one we just showed to clique, now one 950 00:53:39,420 --> 00:53:43,200 going from 3SAT to HAMPATH. 951 00:53:43,200 --> 00:53:44,430 OK. 952 00:53:44,430 --> 00:53:49,410 So in general, I think the takeaway message is 953 00:53:49,410 --> 00:53:51,840 that, to show some language is NP complete, 954 00:53:51,840 --> 00:53:54,960 you want to show that 3SAT is polynomial time 955 00:53:54,960 --> 00:53:55,830 reducible to it. 956 00:54:02,250 --> 00:54:04,820 OK, some good questions coming in-- 957 00:54:04,820 --> 00:54:06,300 I'll try to answer those. 958 00:54:06,300 --> 00:54:10,310 So you're going to take 3SAT and reduce to C. 959 00:54:10,310 --> 00:54:13,215 That's the most typical case. 960 00:54:13,215 --> 00:54:14,840 There are going to be other examples too, 961 00:54:14,840 --> 00:54:16,640 where you might start with another problem 962 00:54:16,640 --> 00:54:18,800 that you've already shown to be NP complete, 963 00:54:18,800 --> 00:54:20,870 and reduce it to your language. 964 00:54:20,870 --> 00:54:25,670 So it doesn't have to start with 3SAT, though often, it does. 965 00:54:29,330 --> 00:54:33,070 Why is this concept important?
966 00:54:33,070 --> 00:54:34,998 So I would say there are two reasons. 967 00:54:34,998 --> 00:54:36,540 And this is going to get a little bit 968 00:54:36,540 --> 00:54:40,390 at some of the chat questions. 969 00:54:40,390 --> 00:54:45,210 First of all, if you're faced with some new computational 970 00:54:45,210 --> 00:54:48,840 problem-- you've got some robotics problem that you want 971 00:54:48,840 --> 00:54:53,610 to solve in your thesis, and you need some algorithm about 972 00:54:53,610 --> 00:54:55,320 whether the robot arm can do such-- 973 00:54:55,320 --> 00:54:58,560 move in a certain such a way, and involves searching-- 974 00:54:58,560 --> 00:55:01,380 possibly searching through a space of different kinds 975 00:55:01,380 --> 00:55:04,830 of motions, and you want to know-- 976 00:55:04,830 --> 00:55:06,510 I'd like to find a polynomial algorithm 977 00:55:06,510 --> 00:55:08,267 to solve that problem. 978 00:55:08,267 --> 00:55:10,350 I'm using this as an example because this actually 979 00:55:10,350 --> 00:55:12,600 did happen to one of the former students 980 00:55:12,600 --> 00:55:15,030 from this class, who was working in robotics, 981 00:55:15,030 --> 00:55:17,940 and they actually end up proving that the problem that they were 982 00:55:17,940 --> 00:55:19,350 trying to solve is NP complete. 983 00:55:23,180 --> 00:55:29,230 So that's useful information, because even 984 00:55:29,230 --> 00:55:32,170 though knowing a problem is NP complete 985 00:55:32,170 --> 00:55:34,810 doesn't guarantee that it's not in P-- 986 00:55:34,810 --> 00:55:37,300 because conceivably, P equals NP-- 987 00:55:37,300 --> 00:55:39,820 what it does tell you-- 988 00:55:39,820 --> 00:55:41,650 if you have a problem that's NP complete 989 00:55:41,650 --> 00:55:45,070 and it does turn out to be in P, then P equals NP. 
990 00:55:45,070 --> 00:55:49,120 So there would be tremendous surprising consequences 991 00:55:49,120 --> 00:55:53,950 of your problem, if it was known to be NP complete, ending up 992 00:55:53,950 --> 00:55:57,100 in P. So generally, people take a proof of NP 993 00:55:57,100 --> 00:55:59,680 completeness as powerful evidence 994 00:55:59,680 --> 00:56:02,200 that the problem is not in P. Even though it's not quite 995 00:56:02,200 --> 00:56:05,140 a proof, it's powerful evidence that it's not in P. 996 00:56:05,140 --> 00:56:07,180 And so you might as well give up working 997 00:56:07,180 --> 00:56:09,460 on trying to find a polynomial algorithm for it, 998 00:56:09,460 --> 00:56:12,400 because if you do, you don't have to worry 999 00:56:12,400 --> 00:56:13,360 about robotics anymore. 1000 00:56:13,360 --> 00:56:19,850 You're going to become famous as a-- So I wouldn't worry about-- 1001 00:56:19,850 --> 00:56:21,800 so if you show a problem's NP complete, 1002 00:56:21,800 --> 00:56:23,600 you can pretty much assume-- 1003 00:56:23,600 --> 00:56:26,900 almost certainly not in P. Now, there's 1004 00:56:26,900 --> 00:56:28,730 another reason related to-- 1005 00:56:28,730 --> 00:56:31,880 for the theorists to be-- care about NP completeness, 1006 00:56:31,880 --> 00:56:36,710 and that is, if you're trying to prove P is different from NP-- 1007 00:56:36,710 --> 00:56:39,425 or P equals NP, as one of the chat questions is raising-- 1008 00:56:43,850 --> 00:56:48,660 In order to prove P different from NP, the most likely approach 1009 00:56:48,660 --> 00:56:51,480 is you pick some problem in NP 1010 00:56:51,480 --> 00:56:55,155 and show it's not in P. 1011 00:56:55,155 --> 00:56:57,280 That's what it would mean for them to be different. 1012 00:56:57,280 --> 00:56:58,780 You're going to pick some NP problem 1013 00:56:58,780 --> 00:57:00,640 and show it's not a P problem.
1014 00:57:03,250 --> 00:57:08,735 Well, one thing that would be terrible is, possibly, 1015 00:57:08,735 --> 00:57:13,620 P is different from NP, and you pick the wrong problem. 1016 00:57:13,620 --> 00:57:17,940 Suppose I'd spent all my time trying to show-- 1017 00:57:17,940 --> 00:57:20,190 back 20 years ago, I'm working really 1018 00:57:20,190 --> 00:57:23,290 hard trying to show composites is not 1019 00:57:23,290 --> 00:57:28,950 in P, which is-- would have been perfectly reasonable to do, 1020 00:57:28,950 --> 00:57:31,590 because composites is in NP, and it was not 1021 00:57:31,590 --> 00:57:34,993 known to be in P 20 years ago. 1022 00:57:34,993 --> 00:57:36,410 And then I invested tons of effort 1023 00:57:36,410 --> 00:57:39,050 to try to prove-- because I like number theory or God knows 1024 00:57:39,050 --> 00:57:42,410 what-- to prove that composites is not in P. 1025 00:57:42,410 --> 00:57:43,995 And then, turns out, composites was 1026 00:57:43,995 --> 00:57:45,620 in P. It was the wrong problem to pick, 1027 00:57:45,620 --> 00:57:48,040 even though P might be different from NP. 1028 00:57:48,040 --> 00:57:50,320 But what NP completeness guarantees 1029 00:57:50,320 --> 00:57:52,810 is that, if you work on a problem, which is NP complete, 1030 00:57:52,810 --> 00:57:54,730 you can't pick the wrong problem, 1031 00:57:54,730 --> 00:57:58,360 because if any problem is in NP, and not in P, 1032 00:57:58,360 --> 00:58:03,420 an NP complete problem is going to be an example of that-- 1033 00:58:03,420 --> 00:58:06,360 because if the NP complete problem's in P, everything in NP 1034 00:58:06,360 --> 00:58:07,410 is in P. 1035 00:58:07,410 --> 00:58:09,370 So if NP is different from P, you 1036 00:58:09,370 --> 00:58:11,160 know the complete problems are not in P, 1037 00:58:11,160 --> 00:58:13,600 so you might as well work on one of those.
1038 00:58:13,600 --> 00:58:17,340 So those are two ways in which NP completeness has turned out 1039 00:58:17,340 --> 00:58:19,118 to be-- is an important concept. 1040 00:58:19,118 --> 00:58:20,160 OK, so here's a check-in. 1041 00:58:26,240 --> 00:58:27,615 You guys are getting-- 1042 00:58:27,615 --> 00:58:29,990 maybe you're getting-- starting to think the way I think. 1043 00:58:29,990 --> 00:58:31,310 You're getting-- some-- 1044 00:58:31,310 --> 00:58:34,825 at least one of you asked this question in the chat. 1045 00:58:34,825 --> 00:58:36,200 Which language that we've previously 1046 00:58:36,200 --> 00:58:40,240 seen is most analogous to SAT-- 1047 00:58:40,240 --> 00:58:44,170 ATM, ETM, or 0 to the k, 1 to the k? 1048 00:58:46,690 --> 00:58:48,400 Obviously, this is maybe subjective. 1049 00:58:48,400 --> 00:58:49,900 You may have your own interpretation 1050 00:58:49,900 --> 00:58:51,370 of what that means. 1051 00:58:51,370 --> 00:58:54,190 There's, in a sense, perhaps no right answer, 1052 00:58:54,190 --> 00:58:55,330 but what do you think? 1053 00:59:00,440 --> 00:59:01,640 OK, that's it. 1054 00:59:05,167 --> 00:59:06,750 I don't know how in the world that you 1055 00:59:06,750 --> 00:59:09,750 could see this problem as analogous to 0 to the k, 1 1056 00:59:09,750 --> 00:59:11,970 to the k, but OK. 1057 00:59:11,970 --> 00:59:14,490 I'm sure you have your reason. 1058 00:59:14,490 --> 00:59:17,700 Yes, this is a lot like ATM. 1059 00:59:17,700 --> 00:59:18,570 Why? 1060 00:59:18,570 --> 00:59:21,720 Well, because for one thing, we showed in a homework problem 1061 00:59:21,720 --> 00:59:25,590 that all Turing recognizable languages 1062 00:59:25,590 --> 00:59:27,270 are reducible to ATM-- 1063 00:59:27,270 --> 00:59:28,650 mapping reducible to ATM.
1064 00:59:28,650 --> 00:59:31,830 So that's a little bit like the notion of completeness 1065 00:59:31,830 --> 00:59:37,290 that we have for satisfiability, because all NP problems are 1066 00:59:37,290 --> 00:59:41,630 going to be reducible to SAT. 1067 00:59:41,630 --> 00:59:46,130 And the other thing too is that, once we start-- 1068 00:59:46,130 --> 00:59:48,480 we want to show other problems are undecidable, 1069 00:59:48,480 --> 00:59:50,210 we reduced ATM to them. 1070 00:59:50,210 --> 00:59:51,840 And that's also very similar. 1071 00:59:51,840 --> 00:59:55,160 We're going to be reducing SAT or 3SAT-- 1072 00:59:55,160 --> 00:59:57,140 so it's indirectly from SAT-- 1073 00:59:57,140 --> 00:59:58,775 to other problems in order to show 1074 00:59:58,775 --> 01:00:00,150 that they're NP complete, 1075 01:00:00,150 --> 01:00:01,670 so that they're hard-- 1076 01:00:01,670 --> 01:00:04,110 that they're hard, in a sense. 1077 01:00:04,110 --> 01:00:08,330 And so in both those ways, ATM and SAT 1078 01:00:08,330 --> 01:00:10,820 are kind of playing similar roles. 1079 01:00:10,820 --> 01:00:15,680 One key difference, however, between ATM and SAT 1080 01:00:15,680 --> 01:00:20,990 is that for ATM, we can prove that it's undecidable, 1081 01:00:20,990 --> 01:00:23,240 but for SAT, we don't know how to prove 1082 01:00:23,240 --> 01:00:30,520 it's outside of P. Those would be the analogous situations. 1083 01:00:30,520 --> 01:00:36,690 And so the story for SAT, which is easily 1084 01:00:36,690 --> 01:00:41,110 solved by a diagonalization argument for ATM-- 1085 01:00:41,110 --> 01:00:43,600 there's reasons to believe-- that we will see later-- 1086 01:00:43,600 --> 01:00:47,440 that the diagonalization is not going to work to prove SAT 1087 01:00:47,440 --> 01:00:51,520 outside of P. And besides that, we don't really 1088 01:00:51,520 --> 01:00:54,310 have any good methods. 1089 01:00:54,310 --> 01:01:00,680 So anyway, let's move on.
1090 01:01:00,680 --> 01:01:04,850 Why is ETM less analogous? 1091 01:01:04,850 --> 01:01:07,720 Because the ATM was the first problem we showed undecidable, 1092 01:01:07,720 --> 01:01:09,940 and SAT is the first problem that we're going 1093 01:01:09,940 --> 01:01:12,020 to be showing NP complete. 1094 01:01:12,020 --> 01:01:15,650 I guess that would be my answer. 1095 01:01:15,650 --> 01:01:17,180 OK, let's continue. 1096 01:01:17,180 --> 01:01:22,330 So let's show now that HAMPATH is NP complete, 1097 01:01:22,330 --> 01:01:27,190 assuming that we know SAT or 3SAT is NP complete. 1098 01:01:27,190 --> 01:01:31,660 So we're going to give a reduction from 3SAT to HAMPATH. 1099 01:01:31,660 --> 01:01:33,102 That's what this is about. 1100 01:01:33,102 --> 01:01:35,560 It's just like what we did for clique, but now for HAMPATH. 1101 01:01:38,070 --> 01:01:41,330 And this is going to be very typical. 1102 01:01:41,330 --> 01:01:43,700 In these reductions, typically what 1103 01:01:43,700 --> 01:01:50,850 happens is that you're trying to simulate a formula-- 1104 01:01:50,850 --> 01:01:53,460 Boolean formula for satisfiability-- 1105 01:01:53,460 --> 01:01:55,050 from the satisfiability perspective. 1106 01:01:55,050 --> 01:01:56,820 You're trying to simulate that formula 1107 01:01:56,820 --> 01:02:00,720 with some sort of structures inside the target language, 1108 01:02:00,720 --> 01:02:03,620 which would be HAMPATH. 1109 01:02:03,620 --> 01:02:06,800 The lingo that people use is that you're 1110 01:02:06,800 --> 01:02:08,810 going to build gadgets to simulate 1111 01:02:08,810 --> 01:02:10,430 the structures in the formula-- 1112 01:02:10,430 --> 01:02:14,495 namely, the variables, the literals, and the clauses. 1113 01:02:17,900 --> 01:02:23,510 OK, these are going to be substructures of the graph, 1114 01:02:23,510 --> 01:02:26,127 in this case, that you're building. 1115 01:02:26,127 --> 01:02:27,210 We'll see what that means. 
1116 01:02:27,210 --> 01:02:32,420 So let's take a formula here and let's, again, 1117 01:02:32,420 --> 01:02:34,880 try to imagine how we would reduce that 1118 01:02:34,880 --> 01:02:36,510 to the HAMPATH problem. 1119 01:02:36,510 --> 01:02:39,370 So the reduction would produce a graph-- 1120 01:02:39,370 --> 01:02:43,190 no, it would produce a HAMPATH instance-- so a G, s, and t. 1121 01:02:43,190 --> 01:02:45,260 Want to know, is there a Hamiltonian path from s 1122 01:02:45,260 --> 01:02:47,860 to t in the graph? 1123 01:02:47,860 --> 01:02:50,800 And this is going to be not the whole graph, 1124 01:02:50,800 --> 01:02:53,633 but this is going to be a substructure in that graph. 1125 01:02:53,633 --> 01:02:55,800 The next slide is going to have the global structure 1126 01:02:55,800 --> 01:02:57,177 of the graph. 1127 01:02:57,177 --> 01:02:59,010 But here, this is going to be a key element, 1128 01:02:59,010 --> 01:03:02,325 and we're going to call that the variable gadget. 1129 01:03:05,220 --> 01:03:06,660 OK, what does it look like? 1130 01:03:06,660 --> 01:03:09,750 I don't know if you can see it on your-- 1131 01:03:09,750 --> 01:03:15,540 clearly enough, but these edges-- 1132 01:03:15,540 --> 01:03:17,390 so there are four outside nodes here. 1133 01:03:17,390 --> 01:03:20,510 The edges connecting them are all kind of pointed downward. 1134 01:03:20,510 --> 01:03:22,970 And then there are these horizontal nodes here, 1135 01:03:22,970 --> 01:03:24,680 and there are edges connecting them 1136 01:03:24,680 --> 01:03:26,510 both left to right and right to left. 1137 01:03:29,060 --> 01:03:31,260 There's a row of these horizontal nodes. 1138 01:03:31,260 --> 01:03:35,138 OK, so you get the picture of what this looks like? 1139 01:03:35,138 --> 01:03:37,180 You have to look carefully to see the arrowheads. 1140 01:03:37,180 --> 01:03:40,240 I maybe should have made those a little bigger.
1141 01:03:44,950 --> 01:03:50,640 Whoops-- so now, if we're trying to get from this node to that 1142 01:03:50,640 --> 01:03:51,480 node-- 1143 01:03:51,480 --> 01:03:55,000 imagine now you're trying to build a Hamiltonian path, 1144 01:03:55,000 --> 01:03:59,470 because this is going to be a part of that graph G 1145 01:03:59,470 --> 01:04:01,402 that I'm constructing. 1146 01:04:01,402 --> 01:04:02,860 Now, remember, for them-- for there 1147 01:04:02,860 --> 01:04:05,740 to be a Hamiltonian path, that means 1148 01:04:05,740 --> 01:04:11,660 you have to go through every node in the graph. 1149 01:04:11,660 --> 01:04:17,360 So if I want to get from s to t, the only way 1150 01:04:17,360 --> 01:04:23,120 I'm going to be able to go-- to hit these horizontal nodes here 1151 01:04:23,120 --> 01:04:32,620 is by picking them up as I go from s to t. 1152 01:04:32,620 --> 01:04:36,220 So the only possibility is-- if you think about it, is-- 1153 01:04:36,220 --> 01:04:43,490 would be for the sequence to go like this, if you can-- 1154 01:04:43,490 --> 01:04:45,040 if that comes through for you. 1155 01:04:45,040 --> 01:04:48,200 So the path would go from s to this node, 1156 01:04:48,200 --> 01:04:51,290 and then through these hops of-- 1157 01:04:51,290 --> 01:04:54,230 along these horizontal nodes, and then down to the bottom 1158 01:04:54,230 --> 01:04:56,330 node here. 1159 01:04:56,330 --> 01:05:00,163 So that's one way that you can get from here down to there 1160 01:05:00,163 --> 01:05:02,330 and pick up all the other nodes along the way, which 1161 01:05:02,330 --> 01:05:04,580 is-- they're not going to have any other possibility-- 1162 01:05:04,580 --> 01:05:06,720 possible ways of getting to them. 1163 01:05:06,720 --> 01:05:10,345 So I'm going to call that a zig-zag. 1164 01:05:10,345 --> 01:05:11,230 OK?
1165 01:05:11,230 --> 01:05:13,510 But there's another way to get from s 1166 01:05:13,510 --> 01:05:18,540 to the bottom node, which is going to be 1167 01:05:18,540 --> 01:05:22,180 by doing sort of the dual-- 1168 01:05:22,180 --> 01:05:24,740 going to the right, then going out to the left, 1169 01:05:24,740 --> 01:05:26,770 and then down to the bottom. 1170 01:05:26,770 --> 01:05:28,795 I'm going to call that a zag-zig-- 1171 01:05:35,590 --> 01:05:38,590 and with a little diagram here just 1172 01:05:38,590 --> 01:05:40,520 to summarize what it means. 1173 01:05:40,520 --> 01:05:44,800 And this is the classic thing for a variable gadget, 1174 01:05:44,800 --> 01:05:47,830 because it's a structure that when 1175 01:05:47,830 --> 01:05:52,840 you're trying to think about how the object that you're asking 1176 01:05:52,840 --> 01:05:54,100 whether it exists or not-- 1177 01:05:54,100 --> 01:05:57,190 the Hamiltonian path-- how it relates to that object, 1178 01:05:57,190 --> 01:05:59,140 it's going to have two possibilities, which 1179 01:05:59,140 --> 01:06:02,380 are going to correspond to the variable being set 1180 01:06:02,380 --> 01:06:04,550 true or false in the formula. 1181 01:06:04,550 --> 01:06:08,380 So we're showing how to set the-- 1182 01:06:08,380 --> 01:06:12,165 simulate the variable in this HAMPATH instance. 1183 01:06:15,340 --> 01:06:17,680 So setting the variable is going to correspond 1184 01:06:17,680 --> 01:06:22,910 to constructing the path. 1185 01:06:22,910 --> 01:06:23,570 OK? 1186 01:06:23,570 --> 01:06:27,920 Now, we also have to make sure not only that the variable gets 1187 01:06:27,920 --> 01:06:31,880 set to true or false, but that it gets set to true or false 1188 01:06:31,880 --> 01:06:35,030 in a way that makes this a satisfying assignment-- namely, 1189 01:06:35,030 --> 01:06:38,370 that we get one true literal in every clause.
1190 01:06:38,370 --> 01:06:40,830 So I'm going to add another gadget here called 1191 01:06:40,830 --> 01:06:44,580 a clause gadget, which is just a single node. 1192 01:06:47,520 --> 01:06:50,370 And visiting that node here is going 1193 01:06:50,370 --> 01:06:55,063 to correspond to satisfying that clause, 1194 01:06:55,063 --> 01:06:56,730 to having a true literal in that clause. 1195 01:07:00,140 --> 01:07:04,370 I'm going to have enabled a detour 1196 01:07:04,370 --> 01:07:08,090 from these horizontal nodes out to this clause gadget. 1197 01:07:08,090 --> 01:07:08,900 So here it is. 1198 01:07:08,900 --> 01:07:10,460 Here's that detour. 1199 01:07:10,460 --> 01:07:16,590 As I'm going from left to right, I can-- instead of 1200 01:07:16,590 --> 01:07:19,590 doing a single jump here, I could branch out 1201 01:07:19,590 --> 01:07:23,490 and visit that clause node, and then come back, 1202 01:07:23,490 --> 01:07:28,665 and pick up my left to right path, as I was doing before. 1203 01:07:32,750 --> 01:07:35,910 I hope you're seeing the big picture, 1204 01:07:35,910 --> 01:07:38,000 because this has to be a Hamiltonian path. 1205 01:07:38,000 --> 01:07:39,980 It has to hit every node. 1206 01:07:39,980 --> 01:07:42,410 That's one of the requirements. 1207 01:07:42,410 --> 01:07:45,110 This is going to be one of the nodes in my graph. 1208 01:07:45,110 --> 01:07:46,700 The path has got to hit that node. 1209 01:07:46,700 --> 01:07:49,310 The only way it's going to be able to hit that node 1210 01:07:49,310 --> 01:07:55,120 is by taking a detour off of this horizontal path here. 1211 01:07:55,120 --> 01:08:00,270 But notice-- and this is the key-- 1212 01:08:00,270 --> 01:08:05,030 if I'm doing a zig-zag, then I can make the detour and visit 1213 01:08:05,030 --> 01:08:06,080 that clause gadget-- 1214 01:08:06,080 --> 01:08:07,490 that clause node.
1215 01:08:07,490 --> 01:08:12,880 But if I'm doing a zag-zig, the way this detour is structured 1216 01:08:12,880 --> 01:08:16,420 does not allow me to visit that node, 1217 01:08:16,420 --> 01:08:20,170 because if I'm going from right to left here, 1218 01:08:20,170 --> 01:08:24,609 by the time I get to this outgoing edge-- 1219 01:08:24,609 --> 01:08:26,920 now I want to come back-- 1220 01:08:26,920 --> 01:08:30,720 that node has already been taken. 1221 01:08:30,720 --> 01:08:33,240 It only works if I'm going zig-zag, 1222 01:08:33,240 --> 01:08:36,149 if I'm going from-- 1223 01:08:36,149 --> 01:08:36,810 left to right. 1224 01:08:36,810 --> 01:08:39,132 If I'm going from left to right, then I can do it, 1225 01:08:39,132 --> 01:08:40,590 but if I'm going right to left, no. 1226 01:08:43,140 --> 01:08:45,090 If you think of left-- 1227 01:08:45,090 --> 01:08:47,160 the left to right direction as true, 1228 01:08:47,160 --> 01:08:49,560 that's going to correspond to that variable appearing 1229 01:08:49,560 --> 01:08:52,680 in the positive way in that clause, 1230 01:08:52,680 --> 01:08:54,270 but not in the negative way. 1231 01:08:54,270 --> 01:08:58,560 The negative way-- I would reverse in and out 1232 01:08:58,560 --> 01:09:03,200 of the detour, flip that around so now 1233 01:09:03,200 --> 01:09:06,140 I could only do the detour when I'm going right to left, 1234 01:09:06,140 --> 01:09:07,399 instead of left to right. 1235 01:09:07,399 --> 01:09:08,640 I hope you get the picture. 1236 01:09:08,640 --> 01:09:10,250 This is how the structure is working. 1237 01:09:10,250 --> 01:09:13,080 Now what's only left for me to do is put it all together. 1238 01:09:13,080 --> 01:09:20,150 But this slide contains the guts of what's happening. 1239 01:09:20,150 --> 01:09:24,729 Is there any question that I can answer for you on this?
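The one-way nature of the detour can be checked by brute force on a tiny stand-alone gadget. The sketch below is not the graph on the slide: it assumes a row of just four horizontal nodes (`L`, `a`, `b`, `R`) and one clause node `c`, with all node and function names invented for illustration. It enumerates every Hamiltonian path from the top node to the bottom node.

```python
from itertools import permutations

def hamiltonian_paths(nodes, edges, s, t):
    """Brute force: try every ordering of the inner nodes between s and t."""
    inner = [v for v in nodes if v not in (s, t)]
    found = []
    for order in permutations(inner):
        path = [s, *order, t]
        if all((u, v) in edges for u, v in zip(path, path[1:])):
            found.append(path)
    return found

def one_gadget(positive):
    """A single variable gadget with one detour to clause node 'c' (toy layout)."""
    row = ["L", "a", "b", "R"]
    # The four outside edges all point downward.
    edges = {("top", "L"), ("top", "R"), ("L", "bot"), ("R", "bot")}
    # Horizontal edges go in both directions.
    for u, v in zip(row, row[1:]):
        edges |= {(u, v), (v, u)}
    # Positive literal: leave at a, return at b (usable left to right only).
    # Negated literal: leave at b, return at a (usable right to left only).
    edges |= {("a", "c"), ("c", "b")} if positive else {("b", "c"), ("c", "a")}
    return ["top", "bot", "c", *row], edges
```

With the positive detour, the only Hamiltonian path found is the left to right one; flipping the detour leaves only the right to left one. That is exactly the true/false choice described above.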
1240 01:09:28,493 --> 01:09:31,160 Let's move on to the next slide, and then, as questions come up, 1241 01:09:31,160 --> 01:09:34,490 you can be typing them in, and we can maybe answer it there. 1242 01:09:34,490 --> 01:09:37,340 OK, so here is the big picture. 1243 01:09:37,340 --> 01:09:42,638 So imagine we have that formula that we're start-- 1244 01:09:42,638 --> 01:09:43,805 we're reducing that formula. 1245 01:09:54,890 --> 01:09:58,960 And let's say it has m variables and k clauses. 1246 01:09:58,960 --> 01:10:02,290 I'm going to call them clause c1, c2, up to ck-- 1247 01:10:02,290 --> 01:10:07,150 here, the m variables appearing either positively 1248 01:10:07,150 --> 01:10:11,170 or negated in that formula. 1249 01:10:11,170 --> 01:10:13,820 And this is the way the global structure of G 1250 01:10:13,820 --> 01:10:14,745 is going to look like. 1251 01:10:14,745 --> 01:10:16,120 Now, I'm getting a question here. 1252 01:10:16,120 --> 01:10:20,140 What do those horizontal nodes-- what role do they play? 1253 01:10:20,140 --> 01:10:26,440 Those nodes are there to allow me to visit those clause 1254 01:10:26,440 --> 01:10:30,040 nodes, the nodes which represent the clause gadgets, which 1255 01:10:30,040 --> 01:10:31,540 I'm going to place over here. 1256 01:10:39,070 --> 01:10:43,120 This is almost a whole of G. I'm just missing a few edges, 1257 01:10:43,120 --> 01:10:45,060 but these are all the nodes of G. 1258 01:10:45,060 --> 01:10:47,740 So remember, I'm trying to find out, 1259 01:10:47,740 --> 01:10:52,080 is there a Hamiltonian path from s to t? 1260 01:10:52,080 --> 01:10:55,830 Now, if I didn't have to worry about these nodes, 1261 01:10:55,830 --> 01:10:57,090 the answer would be just yes. 
1262 01:10:59,710 --> 01:11:02,530 In fact, there would be many Hamiltonian paths from s to t, 1263 01:11:02,530 --> 01:11:05,140 because I can do a zig-zag or a zag-zig 1264 01:11:05,140 --> 01:11:07,630 through each of these variable gadgets, 1265 01:11:07,630 --> 01:11:10,820 and that would take me from s to t, and I'd be good. 1266 01:11:10,820 --> 01:11:16,100 It's just the c nodes, these nodes-- these ci nodes-- 1267 01:11:16,100 --> 01:11:17,840 I have to hit them too. 1268 01:11:17,840 --> 01:11:20,150 So they're going to be-- 1269 01:11:24,110 --> 01:11:27,450 visiting them is going to be enabled by detours from here. 1270 01:11:27,450 --> 01:11:30,300 So let me just try to show you what that looks like. 1271 01:11:30,300 --> 01:11:35,360 So I'm going to magnify little pieces here from these gadgets 1272 01:11:35,360 --> 01:11:38,730 here and show you how these guys are connected up. 1273 01:11:38,730 --> 01:11:43,970 So here is the x1 gadget. 1274 01:11:43,970 --> 01:11:48,860 So x1 appears positively in c1. 1275 01:11:48,860 --> 01:11:50,240 Here's x1 and c1. 1276 01:11:50,240 --> 01:11:54,470 And so that means, when I'm going left to right, 1277 01:11:54,470 --> 01:11:57,155 there's going to be a possibility of visiting c1. 1278 01:12:01,550 --> 01:12:03,980 But now, let's look what happens with-- 1279 01:12:11,150 --> 01:12:13,610 which was the next one I had here? 1280 01:12:16,380 --> 01:12:16,880 OK. 1281 01:12:23,500 --> 01:12:24,700 Right. 1282 01:12:24,700 --> 01:12:27,160 So the next one is-- oh, yeah. 1283 01:12:27,160 --> 01:12:29,350 So x1 appears positively in c1. 1284 01:12:29,350 --> 01:12:32,560 That's why I have the connection like this. 1285 01:12:32,560 --> 01:12:42,430 Now, x1 appears negated in c2, so I only want to do-- 1286 01:12:42,430 --> 01:12:45,550 enable the detour to visit c2 when 1287 01:12:45,550 --> 01:12:49,940 I'm going right to left, as opposed to left to right.
1288 01:12:49,940 --> 01:12:56,860 So this set of horizontal nodes is only 1289 01:12:56,860 --> 01:13:00,550 going to allow me to take a detour either out to c1 1290 01:13:00,550 --> 01:13:05,150 or out to c2, but not to both, because the Hamiltonian 1291 01:13:05,150 --> 01:13:08,420 path is either going to go left to right or right to left. 1292 01:13:08,420 --> 01:13:14,040 It can't do both, when it's going through the x1 gadget. 1293 01:13:14,040 --> 01:13:15,190 You need to-- 1294 01:13:15,190 --> 01:13:17,610 [INAUDIBLE] I'll try to help you through it, 1295 01:13:17,610 --> 01:13:21,600 but you have to try to think about why it's working. 1296 01:13:21,600 --> 01:13:24,760 Let's think together. 1297 01:13:24,760 --> 01:13:31,980 So x2 also appears in c1, but now it appears negated. 1298 01:13:31,980 --> 01:13:38,010 So I'm going to have edges from this x2 gadget-- 1299 01:13:38,010 --> 01:13:43,350 oops-- the x2 gadget, but now look at the-- look 1300 01:13:43,350 --> 01:13:46,040 at how I've arranged that detour. 1301 01:13:46,040 --> 01:13:49,140 I can leave on the left leftward side 1302 01:13:49,140 --> 01:13:51,780 and return on the right side, which 1303 01:13:51,780 --> 01:13:57,000 means I can only do that detour when I'm going right to left. 1304 01:14:01,050 --> 01:14:03,030 And that's because x2 is negated in c1. 1305 01:14:06,870 --> 01:14:09,680 Maybe it's a lot here, if you're not quite getting it, 1306 01:14:09,680 --> 01:14:13,280 but the point is, let's say, try to quickly prove-- so that's 1307 01:14:13,280 --> 01:14:14,420 the whole construction. 1308 01:14:14,420 --> 01:14:17,780 You just do that for every single appearance 1309 01:14:17,780 --> 01:14:20,300 of the literal in a clause. 1310 01:14:20,300 --> 01:14:22,100 You're going to add these detours, which 1311 01:14:22,100 --> 01:14:28,860 allow you possibly to go visit the clause.
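As a rough sketch of the whole construction, the Python below builds a graph of this shape and then walks it according to a truth assignment. Several details are assumptions rather than the slide's exact layout: literals are encoded as signed integers (`+i` for x_i, `-i` for its negation), each row has one pair of nodes per clause with spacer nodes in between, the bottom of one gadget doubles as the top of the next, and all node labels and function names are invented for illustration.

```python
def row(i, k):
    """Horizontal nodes of the gadget for x_i: two ends, spacers, one pair per clause."""
    nodes = [("L", i), ("sep", i, 0)]
    for j in range(k):
        nodes += [("a", i, j), ("b", i, j), ("sep", i, j + 1)]
    return nodes + [("R", i)]

def build_instance(clauses, m):
    """Return (nodes, edges, s, t) for a formula with m variables."""
    k, edges = len(clauses), set()
    for i in range(1, m + 1):
        top, bottom, r = ("top", i), ("top", i + 1), row(i, k)
        # Outside edges point downward; horizontal edges go both ways.
        edges |= {(top, r[0]), (top, r[-1]), (r[0], bottom), (r[-1], bottom)}
        for u, v in zip(r, r[1:]):
            edges |= {(u, v), (v, u)}
    for j, clause in enumerate(clauses):
        for lit in clause:
            a, b, c = ("a", abs(lit), j), ("b", abs(lit), j), ("c", j)
            # Positive literal: detour works left to right; negated: right to left.
            edges |= {(a, c), (c, b)} if lit > 0 else {(b, c), (c, a)}
    nodes = {u for e in edges for u in e}
    return nodes, edges, ("top", 1), ("top", m + 1)

def path_from_assignment(clauses, m, assignment):
    """Zig-zag for True, zag-zig for False, detouring to each clause node once."""
    k, done, path = len(clauses), set(), []
    for i in range(1, m + 1):
        path.append(("top", i))
        r = row(i, k) if assignment[i] else row(i, k)[::-1]
        exit_kind, lit = ("a", i) if assignment[i] else ("b", -i)
        for node in r:
            path.append(node)
            if node[0] == exit_kind and lit in clauses[node[2]] and node[2] not in done:
                path.append(("c", node[2]))
                done.add(node[2])
    return path + [("top", m + 1)]

def is_hamiltonian_path(nodes, edges, s, t, path):
    """Check start, end, that every node is hit exactly once, and that each step is an edge."""
    return (path[0] == s and path[-1] == t and len(path) == len(nodes)
            and set(path) == nodes
            and all((u, v) in edges for u, v in zip(path, path[1:])))
```

For example, with clauses `[[1, -2, 3], [-1, 2, 3]]`, the assignment x1 = true, x2 = true, x3 = false yields a walk that passes `is_hamiltonian_path`, while an assignment that leaves the second clause unsatisfied produces a walk that skips that clause node. This only exercises the direction from satisfying assignment to Hamiltonian path.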
1312 01:14:28,860 --> 01:14:33,020 So the forward direction is-- 1313 01:14:33,020 --> 01:14:34,080 why is this true? 1314 01:14:34,080 --> 01:14:37,490 So you take any satisfying assignment, as I suggested, 1315 01:14:37,490 --> 01:14:40,090 make the corresponding zig-zags or zag-zigs 1316 01:14:40,090 --> 01:14:41,810 through the variable gadgets from s 1317 01:14:41,810 --> 01:14:46,550 to t, and then take the detours to visit the clause nodes. 1318 01:14:51,010 --> 01:14:54,230 The reverse direction actually is slightly trickier. 1319 01:14:54,230 --> 01:14:57,640 We're not going to have time to go through the subtlety of it, 1320 01:14:57,640 --> 01:15:01,780 but what you want to make sure here 1321 01:15:01,780 --> 01:15:08,620 is that you don't have a weird path occurring, because I'm 1322 01:15:08,620 --> 01:15:10,210 going to start with a Hamiltonian path 1323 01:15:10,210 --> 01:15:13,090 now and build an assignment. 1324 01:15:13,090 --> 01:15:17,050 And we want to make sure that the path that I'm constructing 1325 01:15:17,050 --> 01:15:21,670 doesn't go from one-- from these horizontal nodes 1326 01:15:21,670 --> 01:15:26,140 to a clause node, and then back to somebody else's 1327 01:15:26,140 --> 01:15:31,570 horizontal nodes, and is kind of a hodgepodge of things which 1328 01:15:31,570 --> 01:15:34,700 don't make any sense in trying to reconstruct a satisfying 1329 01:15:34,700 --> 01:15:35,200 assignment. 1330 01:15:35,200 --> 01:15:37,030 What you really want to have happen 1331 01:15:37,030 --> 01:15:40,150 is the Hamiltonian path should have clear zig-zags 1332 01:15:40,150 --> 01:15:43,630 and zag-zigs that allow you to decide 1333 01:15:43,630 --> 01:15:46,240 how to set the variables. 1334 01:15:46,240 --> 01:15:48,760 And so that's the role of these little nodes here.
1335 01:15:48,760 --> 01:15:53,020 These are spacers that separate the detours from one another, 1336 01:15:53,020 --> 01:15:59,260 and that force a visit to the clause node 1337 01:15:59,260 --> 01:16:05,280 to come back to the same place from which it left. 1338 01:16:05,280 --> 01:16:08,910 Otherwise, you would never be able to visit those spacer 1339 01:16:08,910 --> 01:16:09,930 nodes. 1340 01:16:09,930 --> 01:16:11,860 You have to look in the book for that, 1341 01:16:11,860 --> 01:16:13,920 but there is a little bit of a detail 1342 01:16:13,920 --> 01:16:15,300 that you have to go through. 1343 01:16:15,300 --> 01:16:18,450 You have to show it must be zig-zags and zag-zigs, 1344 01:16:18,450 --> 01:16:20,640 and then you get the corresponding truth assignment, 1345 01:16:20,640 --> 01:16:24,180 and it must satisfy phi for all paths. 1346 01:16:24,180 --> 01:16:26,935 OK. 1347 01:16:26,935 --> 01:16:29,060 Again, the reduction is polynomial time computable. 1348 01:16:29,060 --> 01:16:30,602 I'm not going to say more about that. 1349 01:16:30,602 --> 01:16:33,950 We're a little bit low on time. 1350 01:16:33,950 --> 01:16:37,770 Last check-in-- would this construction still 1351 01:16:37,770 --> 01:16:40,040 work if G was undirected? 1352 01:16:40,040 --> 01:16:45,440 Suppose I just eliminated all the directions from the edges, 1353 01:16:45,440 --> 01:16:47,480 made them lines, instead of arrows. 1354 01:16:47,480 --> 01:16:51,170 Would that now show that the undirected Hamiltonian path 1355 01:16:51,170 --> 01:16:54,810 problem is NP complete? 1356 01:16:54,810 --> 01:16:57,540 Let me see. 1357 01:16:57,540 --> 01:16:59,790 OK, what do the c nodes represent here? 1358 01:16:59,790 --> 01:17:04,780 There's one c node for every clause. 1359 01:17:04,780 --> 01:17:12,040 So there are k clauses named c1 to ck, and there are k c nodes. 
1360 01:17:12,040 --> 01:17:15,440 These are the so-called clause gadgets, 1361 01:17:15,440 --> 01:17:18,830 which are going to force there to be 1362 01:17:18,830 --> 01:17:25,880 one true literal in every clause for the satisfying assignment. 1363 01:17:33,665 --> 01:17:36,290 You have to look at it, or maybe we can spend a little bit more 1364 01:17:36,290 --> 01:17:39,800 time explaining it, but that's what the purpose is. 1365 01:17:44,880 --> 01:17:47,970 Does that mean we need only two inside nodes? 1366 01:17:51,000 --> 01:17:55,110 So the horizontal nodes-- do we only need two of them? 1367 01:17:55,110 --> 01:18:00,230 You won't be able to reuse these nodes for multiple detours. 1368 01:18:00,230 --> 01:18:03,500 For one thing, once you've gone to a detour, 1369 01:18:03,500 --> 01:18:06,770 you come back to the node next over, and so you better 1370 01:18:06,770 --> 01:18:08,900 not overlay multiple detours. 1371 01:18:11,690 --> 01:18:16,930 And also, you need to keep them separated from each other. 1372 01:18:16,930 --> 01:18:19,530 Don't forget, this node x1 can appear 1373 01:18:19,530 --> 01:18:24,900 in many, many different clauses, so you 1374 01:18:24,900 --> 01:18:31,085 would need to have possibly many of these horizontal nodes. 1375 01:18:35,250 --> 01:18:38,580 So someone now says 2k inside nodes would suffice. 1376 01:18:38,580 --> 01:18:40,650 Probably 2k-- I would say 3k, just to be 1377 01:18:40,650 --> 01:18:42,990 safe for the spacer nodes. 1378 01:18:42,990 --> 01:18:45,150 You need to look carefully at the argument, which 1379 01:18:45,150 --> 01:18:47,940 is laid out in the textbook. 1380 01:18:47,940 --> 01:18:49,920 You may not actually need the spacer nodes, 1381 01:18:49,920 --> 01:18:52,860 but then it makes the argument just more ugly. 1382 01:18:52,860 --> 01:18:55,230 So the way the construction is done 1383 01:18:55,230 --> 01:18:58,155 is you have 3k inside nodes. 1384 01:19:01,470 --> 01:19:02,920 OK. 
1385 01:19:02,920 --> 01:19:06,190 Again, several questions like that-- 1386 01:19:06,190 --> 01:19:11,590 the graph would start looking messy if x9 was in c1. 1387 01:19:11,590 --> 01:19:14,740 Yeah, if x9 down here was in c1? 1388 01:19:14,740 --> 01:19:17,140 Yeah, it would be messy. 1389 01:19:17,140 --> 01:19:17,980 It's OK. 1390 01:19:17,980 --> 01:19:21,127 Messy is allowed. 1391 01:19:21,127 --> 01:19:22,210 All right, I think we're-- 1392 01:19:22,210 --> 01:19:23,830 let's end the polling. 1393 01:19:23,830 --> 01:19:25,300 Are you all in? 1394 01:19:25,300 --> 01:19:29,420 All right-- share results. 1395 01:19:29,420 --> 01:19:31,340 Yes. 1396 01:19:31,340 --> 01:19:34,040 The answer's no. 1397 01:19:34,040 --> 01:19:36,440 The construction depends on this being directed. 1398 01:19:36,440 --> 01:19:40,310 You can see that all over the place, but for one thing, 1399 01:19:40,310 --> 01:19:47,150 the whole point of these detours is the directions of the edges. 1400 01:19:47,150 --> 01:19:51,410 And so without that, this construction 1401 01:19:51,410 --> 01:19:53,150 is going to be just a bunch of-- 1402 01:19:53,150 --> 01:19:55,525 is not going to mean anything. 1403 01:19:55,525 --> 01:19:56,900 It's not going to prove anything. 1404 01:19:56,900 --> 01:19:59,080 It's probably always going to be-- 1405 01:19:59,080 --> 01:20:03,150 have a Hamiltonian path without the directions. 1406 01:20:03,150 --> 01:20:04,745 So I think we're out of time. 1407 01:20:08,580 --> 01:20:12,160 A quick review-- these are the topics we've covered. 1408 01:20:12,160 --> 01:20:16,450 I think we're out of time, so I should let you go. 1409 01:20:16,450 --> 01:20:19,438 But I'll stick around for a few minutes, in case any of you 1410 01:20:19,438 --> 01:20:20,230 have any questions. 1411 01:20:20,230 --> 01:20:22,605 But I need to run off at 4:00 myself for another meeting, 1412 01:20:22,605 --> 01:20:25,347 so I don't have that much time.
1413 01:20:25,347 --> 01:20:27,430 Clarify my comment about picking the wrong problem 1414 01:20:27,430 --> 01:20:34,440 to tackle P versus NP, when I used composites as an example-- 1415 01:20:34,440 --> 01:20:36,750 if I worked hard to prove that composites is not 1416 01:20:36,750 --> 01:20:41,010 in P as a way of proving that there is some NP language which 1417 01:20:41,010 --> 01:20:44,880 is not in P, that would have been a mistake, 1418 01:20:44,880 --> 01:20:48,720 because composites is in P. I would 1419 01:20:48,720 --> 01:20:51,150 have been working hard to prove something which we now 1420 01:20:51,150 --> 01:20:52,840 know was false. 1421 01:20:52,840 --> 01:20:56,755 So we don't want to spend time working on the wrong language. 1422 01:20:59,410 --> 01:21:01,390 But the nice thing about NP complete languages 1423 01:21:01,390 --> 01:21:04,840 is that we have a guarantee that, if P's different from NP, 1424 01:21:04,840 --> 01:21:07,640 that language is not in P. Are there 1425 01:21:07,640 --> 01:21:10,730 problems that are not in P that are in NP, 1426 01:21:10,730 --> 01:21:11,900 but are not NP complete? 1427 01:21:11,900 --> 01:21:13,025 Oh, that's a good question. 1428 01:21:15,320 --> 01:21:20,300 Are there problems in between P and NP complete? 1429 01:21:20,300 --> 01:21:23,450 So NP complete is sort of like the hardest problems in NP, 1430 01:21:23,450 --> 01:21:26,330 and the P problems are obviously the easy problems 1431 01:21:26,330 --> 01:21:27,890 that are in NP. 1432 01:21:27,890 --> 01:21:30,740 Is everything either NP complete or in P? 1433 01:21:33,800 --> 01:21:37,550 So for one thing, there are problems that are not 1434 01:21:37,550 --> 01:21:39,050 known to be in either category.
1435 01:21:41,740 --> 01:21:45,740 So we'll discuss some of those in due course, but one of them 1436 01:21:45,740 --> 01:21:47,720 is the graph isomorphism problem, 1437 01:21:47,720 --> 01:21:51,140 testing two graphs-- if they're really just permutations of one 1438 01:21:51,140 --> 01:21:51,840 another. 1439 01:21:51,840 --> 01:21:55,250 It's clearly a problem in NP, but not known to be in P. 1440 01:21:55,250 --> 01:21:58,070 So there are problems that are not 1441 01:21:58,070 --> 01:22:00,260 known to be either NP complete or in P, 1442 01:22:00,260 --> 01:22:03,800 so there are problems that might be in between. 1443 01:22:03,800 --> 01:22:05,240 But then there was another theorem 1444 01:22:05,240 --> 01:22:09,020 out there, which says that, if you assume that P is different 1445 01:22:09,020 --> 01:22:13,770 from NP, then you can construct problems which are in between, 1446 01:22:13,770 --> 01:22:18,110 which are neither NP complete nor in P. They're NP problems, 1447 01:22:18,110 --> 01:22:23,210 but they're not NP complete, not in P. So those problems 1448 01:22:23,210 --> 01:22:25,490 themselves are perhaps somewhat artificial, 1449 01:22:25,490 --> 01:22:27,350 but they at least prove the point 1450 01:22:27,350 --> 01:22:30,260 that it is possible to have these intermediate problems. 1451 01:22:33,680 --> 01:22:36,530 Oh, so somebody's asking, isn't factorization one. 1452 01:22:36,530 --> 01:22:41,900 Not known-- for the case of factorization-- 1453 01:22:41,900 --> 01:22:44,330 or you have to make a language out of that, by the way. 1454 01:22:46,940 --> 01:22:56,130 But it's because factorization's a function, 1455 01:22:56,130 --> 01:22:58,380 so we really want to be talking about languages. 1456 01:22:58,380 --> 01:23:03,060 As the homework suggests, NP and P are classes of languages, 1457 01:23:03,060 --> 01:23:06,030 but OK, that's a separate note there. 1458 01:23:06,030 --> 01:23:11,740 Factorization is not known.
1459 01:23:11,740 --> 01:23:16,280 Factorization could be in P and it could be NP complete. 1460 01:23:16,280 --> 01:23:20,660 Both of those are not ruled out. 1461 01:23:20,660 --> 01:23:22,850 So I think most people would probably 1462 01:23:22,850 --> 01:23:25,130 venture to guess that it's a problem that's 1463 01:23:25,130 --> 01:23:28,490 in the in-between state, that's neither P nor NP complete, 1464 01:23:28,490 --> 01:23:29,390 but not known. 1465 01:23:34,510 --> 01:23:38,350 Who first thought of this reduction from 3SAT to HAMPATH? 1466 01:23:38,350 --> 01:23:39,820 It's so clever. 1467 01:23:39,820 --> 01:23:40,750 Well, it wasn't me. 1468 01:23:46,630 --> 01:23:49,510 I think that is due to Dick Karp, who 1469 01:23:49,510 --> 01:23:51,820 was one of my professors at Berkeley, where 1470 01:23:51,820 --> 01:23:52,900 I was a graduate student. 1471 01:23:52,900 --> 01:23:55,520 That was done before I got there. 1472 01:23:55,520 --> 01:23:57,880 It was around 1971 when-- 1473 01:23:57,880 --> 01:24:00,100 so there were two famous papers. 1474 01:24:00,100 --> 01:24:01,292 There's the Cook paper. 1475 01:24:01,292 --> 01:24:02,500 There's also the Levin paper. 1476 01:24:02,500 --> 01:24:03,430 That was in Russian. 1477 01:24:03,430 --> 01:24:06,013 That took a while for people to discover out here in the West. 
1478 01:24:06,013 --> 01:24:10,240 But the Cook paper was 1971, and very quickly followed after-- 1479 01:24:10,240 --> 01:24:13,420 he just showed SAT is NP complete, 1480 01:24:13,420 --> 01:24:18,910 but after that, Karp was-- 1481 01:24:18,910 --> 01:24:21,250 he had a paper called "Reducibility 1482 01:24:21,250 --> 01:24:23,320 Among Combinatorial Problems," and he 1483 01:24:23,320 --> 01:24:25,660 had a list of about 20 problems that he 1484 01:24:25,660 --> 01:24:28,270 showed were NP complete-- 1485 01:24:28,270 --> 01:24:31,240 by reduction from SAT-- 1486 01:24:31,240 --> 01:24:34,345 including clique, including HAMPATH, and a bunch of other things. 1487 01:24:36,970 --> 01:24:40,840 And that was also a very famous paper. 1488 01:24:40,840 --> 01:24:41,980 Both of those are-- 1489 01:24:41,980 --> 01:24:44,110 people often talk about Cook-Karp 1490 01:24:44,110 --> 01:24:50,095 as, together, they really show the importance 1491 01:24:50,095 --> 01:24:54,747 of NP completeness and the whole notion of NP completeness. 1492 01:24:57,370 --> 01:24:59,345 Yeah, 21 problems, so-- 1493 01:24:59,345 --> 01:25:01,220 yeah, so Karp proved 21 problems were NP 1494 01:25:01,220 --> 01:25:03,830 complete in 1972. 1495 01:25:03,830 --> 01:25:05,690 So that was shortly after Cook showed 1496 01:25:05,690 --> 01:25:09,350 that SAT was NP complete. 1497 01:25:09,350 --> 01:25:11,900 Incidentally, I think the terminology NP complete wasn't 1498 01:25:11,900 --> 01:25:16,310 around until a little later. 1499 01:25:16,310 --> 01:25:19,880 And that might have been-- might be due to Knuth. 1500 01:25:19,880 --> 01:25:20,600 I'm not sure. 1501 01:25:20,600 --> 01:25:23,000 I remember he did a big poll of people 1502 01:25:23,000 --> 01:25:27,380 about what should be the right language to use for that term, 1503 01:25:27,380 --> 01:25:28,700 and I think he came up with it. 1504 01:25:31,765 --> 01:25:33,390 All right, I'm going to head off, guys.
1505 01:25:33,390 --> 01:25:35,650 Nice seeing you all-- 1506 01:25:35,650 --> 01:25:38,700 so until Thursday-- oh, until Tuesday. 1507 01:25:38,700 --> 01:25:40,400 Bye bye.