1 00:00:00,000 --> 00:00:01,988 [SQUEAKING] 2 00:00:01,988 --> 00:00:04,473 [RUSTLING] 3 00:00:04,473 --> 00:00:07,455 [CLICKING] 4 00:00:24,910 --> 00:00:26,390 MICHAEL SIPSER: Welcome, everyone. 5 00:00:26,390 --> 00:00:29,800 Welcome back to theory of computation. 6 00:00:29,800 --> 00:00:38,130 And just to recap where we are, we 7 00:00:38,130 --> 00:00:40,470 have been looking at time complexity and space 8 00:00:40,470 --> 00:00:41,790 complexity. 9 00:00:41,790 --> 00:00:48,570 And we just finished proving what 10 00:00:48,570 --> 00:00:53,790 are called the hierarchy theorems, which, in a nutshell, 11 00:00:53,790 --> 00:00:58,360 basically say that, if you allow the computational model 12 00:00:58,360 --> 00:01:00,360 to have a little bit more resource, a little bit 13 00:01:00,360 --> 00:01:03,780 more time, a little bit more space, then 14 00:01:03,780 --> 00:01:09,500 you can do more things with certain conditions. 15 00:01:09,500 --> 00:01:11,890 So we proved that last time. 16 00:01:11,890 --> 00:01:14,600 It was a proof, basically, by a diagonalization. 17 00:01:14,600 --> 00:01:17,620 I don't know if you recognized the diagonalization there, 18 00:01:17,620 --> 00:01:21,730 but when you're encoding a machine by an input 19 00:01:21,730 --> 00:01:24,550 and then basically running all possible different machines, 20 00:01:24,550 --> 00:01:28,910 that's essentially a diagonalization. 21 00:01:28,910 --> 00:01:33,050 So today, we're going to build on that work 22 00:01:33,050 --> 00:01:35,810 to give an example of what we call 23 00:01:35,810 --> 00:01:39,260 a natural intractable problem. 24 00:01:39,260 --> 00:01:41,755 We'll say a bit more about what that means. 
25 00:01:41,755 --> 00:01:43,880 And then, we're going to talk about something which 26 00:01:43,880 --> 00:01:47,030 is a different topic, but nevertheless related, 27 00:01:47,030 --> 00:01:52,220 having to do with oracles and methods which may or may not 28 00:01:52,220 --> 00:01:59,750 work to solve the P versus NP problem, which, of course, is 29 00:01:59,750 --> 00:02:02,390 a big open problem in the field. 30 00:02:02,390 --> 00:02:03,470 OK. 31 00:02:03,470 --> 00:02:07,100 So the time and space hierarchy theorems-- 32 00:02:07,100 --> 00:02:09,560 because we're going to be using those today-- 33 00:02:09,560 --> 00:02:13,670 they say that if you give a little bit more space here-- so 34 00:02:13,670 --> 00:02:16,190 for space constructible functions, functions 35 00:02:16,190 --> 00:02:19,400 that you can actually compute within the amount of space 36 00:02:19,400 --> 00:02:24,230 that they specify, you can show that the things that you 37 00:02:24,230 --> 00:02:29,060 can do in that much space are provably more than what 38 00:02:29,060 --> 00:02:30,500 you can do in less space. 39 00:02:30,500 --> 00:02:33,390 And you can prove a similar, slightly weaker fact 40 00:02:33,390 --> 00:02:36,270 about the time complexity classes. 41 00:02:36,270 --> 00:02:43,050 So what that means is that these classes form a hierarchy. 42 00:02:43,050 --> 00:02:46,220 So as you add more time, or let's 43 00:02:46,220 --> 00:02:49,010 say, in this case, space, from n squared, to n cubed, 44 00:02:49,010 --> 00:02:53,810 to n to the 4th, you get larger and larger classes, 45 00:02:53,810 --> 00:02:57,660 which I'm kind of illustrating here by putting a dot there, 46 00:02:57,660 --> 00:02:59,360 which shows that there's something 47 00:02:59,360 --> 00:03:02,180 that we know that's new in those classes 48 00:03:02,180 --> 00:03:07,830 as you go up these different bounds.
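Written out, the two hierarchy theorems recapped here take roughly the following form. This is only a sketch of the standard textbook statements, assuming f is space constructible and t is time constructible; the little-o and log-factor details come from the usual formulation, not from what is said in this lecture.

```latex
\begin{align*}
\mathrm{SPACE}\bigl(o(f(n))\bigr) &\subsetneq \mathrm{SPACE}\bigl(f(n)\bigr)
  && \text{($f$ space constructible)} \\
\mathrm{TIME}\bigl(o(t(n)/\log t(n))\bigr) &\subsetneq \mathrm{TIME}\bigl(t(n)\bigr)
  && \text{($t$ time constructible)} \\
\mathrm{SPACE}(n^2) &\subsetneq \mathrm{SPACE}(n^3) \subsetneq \mathrm{SPACE}(n^4)
  && \text{(the hierarchy of dots on the slide)}
\end{align*}
```

The log factor in the second line is what makes the time version the "slightly weaker" of the two.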
49 00:03:07,830 --> 00:03:09,890 And this is going to be true for space complexity 50 00:03:09,890 --> 00:03:12,500 and it's also going to be true for time complexity. 51 00:03:17,900 --> 00:03:21,470 And one of the corollaries that we pointed out last time 52 00:03:21,470 --> 00:03:30,320 is that PSPACE properly includes non-deterministic log 53 00:03:30,320 --> 00:03:31,340 space, NL. 54 00:03:31,340 --> 00:03:33,920 So NL is a proper subset of PSPACE. 55 00:03:33,920 --> 00:03:38,750 So there's stuff in PSPACE that is not in NL. 56 00:03:38,750 --> 00:03:41,420 And remember this notation here, this means proper subset. 57 00:03:44,470 --> 00:03:46,600 One of the things that-- 58 00:03:46,600 --> 00:03:49,520 a follow-on corollary that we didn't mention last time, 59 00:03:49,520 --> 00:03:52,450 but that's something that you should know, 60 00:03:52,450 --> 00:03:57,970 is that the TQBF problem, our PSPACE-complete problem, 61 00:03:57,970 --> 00:04:02,140 is an example of a problem that's in PSPACE, obviously, 62 00:04:02,140 --> 00:04:05,620 but we know it's also not in NL. 63 00:04:05,620 --> 00:04:07,540 And in order to get that conclusion, 64 00:04:07,540 --> 00:04:14,020 you have to look, again, at the proof that TQBF is PSPACE 65 00:04:14,020 --> 00:04:19,300 complete, and observe that the reductions that we gave 66 00:04:19,300 --> 00:04:23,410 in that proof can be carried out not only in polynomial time, 67 00:04:23,410 --> 00:04:26,110 but they can be carried out in log space. 68 00:04:26,110 --> 00:04:32,200 And therefore, if TQBF turned out to go down to NL, 69 00:04:32,200 --> 00:04:34,660 then because everything in PSPACE 70 00:04:34,660 --> 00:04:38,650 is log space reducible to TQBF, that would bring all of PSPACE 71 00:04:38,650 --> 00:04:40,480 down to NL. 72 00:04:40,480 --> 00:04:42,950 But that, we just proved, is not the case. 73 00:04:42,950 --> 00:04:46,530 So therefore, TQBF could not be in NL.
74 00:04:46,530 --> 00:04:50,340 OK, and we're going to be using that kind of reasoning 75 00:04:50,340 --> 00:04:54,070 again in this lecture. 76 00:04:54,070 --> 00:04:56,740 So just a quick check-in. 77 00:04:56,740 --> 00:05:00,030 These are a few, more or less easy, 78 00:05:00,030 --> 00:05:04,370 maybe more or less tricky, follow-ons 79 00:05:04,370 --> 00:05:09,357 that you can conclude from the time and space hierarchy 80 00:05:09,357 --> 00:05:10,940 theorems plus some of the other things 81 00:05:10,940 --> 00:05:12,450 we've proven along the way. 82 00:05:12,450 --> 00:05:16,957 And so just as a check of your understanding, maybe 83 00:05:16,957 --> 00:05:18,540 these are a little bit on the tricky side, 84 00:05:18,540 --> 00:05:21,020 so you have to read them carefully. 85 00:05:21,020 --> 00:05:25,520 Which of these are known to be true based on the material 86 00:05:25,520 --> 00:05:26,510 that we've presented? 87 00:05:26,510 --> 00:05:29,210 And these are also just 88 00:05:29,210 --> 00:05:35,300 the facts that we know to be true in complexity theory. 89 00:05:35,300 --> 00:05:36,640 So let me launch that poll. 90 00:05:36,640 --> 00:05:43,290 And just check off the ones that we can prove. 91 00:05:43,290 --> 00:05:44,970 Hmm. 92 00:05:44,970 --> 00:05:46,710 OK. 93 00:05:46,710 --> 00:05:48,220 I'm going to close it down. 94 00:05:48,220 --> 00:05:53,010 So please answer quickly if you're going to. 95 00:05:53,010 --> 00:05:54,360 OK, 1, 2, 3, end. 96 00:05:57,120 --> 00:05:59,220 OK. 97 00:05:59,220 --> 00:06:04,620 Well, the two leading candidates are correct. 98 00:06:04,620 --> 00:06:06,750 And the two that are the laggards here 99 00:06:06,750 --> 00:06:09,960 are, in fact, the ones that are not true. 100 00:06:09,960 --> 00:06:14,910 So A and D are not true, based on what we know. 101 00:06:14,910 --> 00:06:18,070 And B and C are true.
102 00:06:18,070 --> 00:06:20,250 So let's understand, first of all, A, we 103 00:06:20,250 --> 00:06:25,620 know it's false because 2 to the n plus 1 is just 2 times 2 104 00:06:25,620 --> 00:06:27,140 to the n. 105 00:06:27,140 --> 00:06:32,250 And so these two bounds differ only by a constant factor. 106 00:06:32,250 --> 00:06:35,570 And so in fact, they're the same complexity class. 107 00:06:35,570 --> 00:06:37,940 And so you don't get proper containment for A. 108 00:06:37,940 --> 00:06:39,755 So that one we absolutely know is false. 109 00:06:42,820 --> 00:06:49,440 D, well, if we could prove that, then we 110 00:06:49,440 --> 00:06:51,120 would have solved a famous problem, 111 00:06:51,120 --> 00:06:56,530 because we don't know whether even P equals PSPACE. 112 00:06:56,530 --> 00:06:59,280 So if P equals PSPACE, then certainly PSPACE 113 00:06:59,280 --> 00:07:02,200 would equal NP, which is in between the two. 114 00:07:02,200 --> 00:07:04,890 And so we don't know how to prove 115 00:07:04,890 --> 00:07:08,130 PSPACE is different from NP, that's 116 00:07:08,130 --> 00:07:11,777 based on the current state of knowledge of the field. 117 00:07:11,777 --> 00:07:13,360 So this would not be something that we 118 00:07:13,360 --> 00:07:17,530 know to be true based on the things that we've said. 119 00:07:17,530 --> 00:07:23,980 Now, B follows directly from the time hierarchy theorem, 120 00:07:23,980 --> 00:07:28,210 because 2 to the 2n is the square of 2 to the n. 121 00:07:28,210 --> 00:07:33,650 And that is, asymptotically, a significantly larger bound. 122 00:07:33,650 --> 00:07:41,660 And so you can prove that time 2 to the 2n properly 123 00:07:41,660 --> 00:07:44,560 contains time 2 to the n. 124 00:07:44,560 --> 00:07:49,970 C is a little trickier because you need 125 00:07:49,970 --> 00:07:52,630 to remember Savitch's theorem. 126 00:07:52,630 --> 00:07:54,970 Savitch's theorem applies to space.
127 00:07:54,970 --> 00:07:56,650 But you also need to remember that what 128 00:07:56,650 --> 00:07:59,440 you can do in time, in non-deterministic time 129 00:07:59,440 --> 00:08:02,230 n squared, you can also do in non-deterministic space 130 00:08:02,230 --> 00:08:04,720 n squared, which, then, in turn, you 131 00:08:04,720 --> 00:08:09,550 can do in deterministic space n to the 4th, which is properly 132 00:08:09,550 --> 00:08:12,380 contained within space n to the 5th. 133 00:08:12,380 --> 00:08:15,800 So you can prove that PSPACE properly 134 00:08:15,800 --> 00:08:19,940 contains non-deterministic time n squared. 135 00:08:19,940 --> 00:08:22,850 OK, just a bunch of containments there. 136 00:08:22,850 --> 00:08:27,470 A and C are perhaps, in a sense, the trickiest 137 00:08:27,470 --> 00:08:29,380 of this group. 138 00:08:29,380 --> 00:08:32,049 OK. 139 00:08:32,049 --> 00:08:34,460 So let's move on. 140 00:08:34,460 --> 00:08:38,350 So we're going to introduce, today, two new classes. 141 00:08:38,350 --> 00:08:41,069 And actually, I want to go back to here. 142 00:08:43,750 --> 00:08:46,930 What are we going to be trying to accomplish 143 00:08:46,930 --> 00:08:48,620 in today's lecture? 144 00:08:48,620 --> 00:08:53,170 So we're going to be looking at provable intractability. 145 00:08:53,170 --> 00:08:58,780 So a problem being intractable for us means it's outside of P. 146 00:08:58,780 --> 00:09:02,560 So we can't solve it in polynomial time. 147 00:09:02,560 --> 00:09:06,160 From our perspective, we're going to call that 148 00:09:06,160 --> 00:09:08,710 an intractable problem. 149 00:09:08,710 --> 00:09:13,870 Now, this problem over here, that's sitting in time 2 to the n, 150 00:09:13,870 --> 00:09:17,740 but not in smaller classes, so this is an intractable problem. 151 00:09:17,740 --> 00:09:25,420 That's outside of P.
But this example 152 00:09:25,420 --> 00:09:30,280 of a language, if you remember how the time hierarchy 153 00:09:30,280 --> 00:09:32,560 theorem or the space hierarchy theorem 154 00:09:32,560 --> 00:09:36,430 was proved, basically, this language itself is not 155 00:09:36,430 --> 00:09:39,490 an interesting language other than for the purpose 156 00:09:39,490 --> 00:09:43,270 that it serves, to be in that class and not in a lower class. 157 00:09:43,270 --> 00:09:45,670 But it's not a language that anyone would care about. 158 00:09:45,670 --> 00:09:48,470 And it's not even a language that is easy to describe. 159 00:09:48,470 --> 00:09:53,110 It's just the language that some Turing machine decides, 160 00:09:53,110 --> 00:09:54,790 where that Turing machine is specially 161 00:09:54,790 --> 00:09:58,210 designed to have the property that its language is 162 00:09:58,210 --> 00:10:01,610 at a particular complexity level. 163 00:10:01,610 --> 00:10:04,388 But otherwise, there's no nice description of that language. 164 00:10:04,388 --> 00:10:05,930 It's not like a to the n, b to the n, 165 00:10:05,930 --> 00:10:13,320 or some equivalence of two DFAs or something like that. 166 00:10:13,320 --> 00:10:16,190 So I would say that that language, in a sense, 167 00:10:16,190 --> 00:10:18,830 serves its purpose, but it's not a natural language that you 168 00:10:18,830 --> 00:10:19,950 really care about. 169 00:10:19,950 --> 00:10:23,150 So one of the goals of today's lecture 170 00:10:23,150 --> 00:10:26,090 is to give an example of a natural language, 171 00:10:26,090 --> 00:10:30,260 a naturally-occurring language, in a sense, that's 172 00:10:30,260 --> 00:10:33,830 easy to describe, where you can prove that that language is 173 00:10:33,830 --> 00:10:39,170 intractable, is actually outside of P. 174 00:10:39,170 --> 00:10:42,840 So that's a bit of motivation for where we're going.
175 00:10:42,840 --> 00:10:45,230 So along the way, we're going to introduce 176 00:10:45,230 --> 00:10:48,810 these exponential complexity classes, exponential time 177 00:10:48,810 --> 00:10:52,490 and exponential space, which are exponentially 178 00:10:52,490 --> 00:10:57,170 bigger than polynomial time and polynomial space classes. 179 00:10:57,170 --> 00:11:00,200 So it's 2 to the n to the k in both cases. 180 00:11:00,200 --> 00:11:03,930 2 to a polynomial. 181 00:11:03,930 --> 00:11:08,790 And the first five of these classes, L 182 00:11:08,790 --> 00:11:10,430 through PSPACE we've already seen, 183 00:11:10,430 --> 00:11:12,890 and exponential time and exponential space 184 00:11:12,890 --> 00:11:17,880 extend the containments that we've already seen. 185 00:11:17,880 --> 00:11:24,200 So you have to double check that you understand why PSPACE is 186 00:11:24,200 --> 00:11:27,090 a subset of exponential time. 187 00:11:27,090 --> 00:11:30,170 But that's because that, as we showed, 188 00:11:30,170 --> 00:11:33,110 going from space to time, you can do that 189 00:11:33,110 --> 00:11:35,190 with an exponential increase. 190 00:11:35,190 --> 00:11:37,130 That's the cost of the simulation. 191 00:11:37,130 --> 00:11:39,050 And going from time to space, you 192 00:11:39,050 --> 00:11:40,370 don't need any increase at all. 193 00:11:40,370 --> 00:11:42,578 Anything that you can do in a certain amount of time, 194 00:11:42,578 --> 00:11:44,293 you can do in that much space. 195 00:11:44,293 --> 00:11:46,460 So anything you can do in a certain amount of space, 196 00:11:46,460 --> 00:11:50,290 you can also do in exponentially more amount of time. 197 00:11:50,290 --> 00:11:53,090 OK, so those were simple theorems 198 00:11:53,090 --> 00:11:57,590 that we proved right at the very beginning. 199 00:11:57,590 --> 00:11:59,560 Now, the hierarchy theorems allow 200 00:11:59,560 --> 00:12:02,890 us to conclude some separations among these classes. 
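The two new classes and the containments just described can be written out as follows. This is a sketch in standard notation; the side condition f(n) ≥ n on the space-to-time simulation is the usual textbook assumption and is not stated in the lecture.

```latex
\begin{align*}
\mathrm{EXPTIME} &= \bigcup_{k} \mathrm{TIME}\bigl(2^{n^k}\bigr), \qquad
\mathrm{EXPSPACE} = \bigcup_{k} \mathrm{SPACE}\bigl(2^{n^k}\bigr) \\
\mathrm{L} &\subseteq \mathrm{NL} \subseteq \mathrm{P} \subseteq \mathrm{NP}
  \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME} \subseteq \mathrm{EXPSPACE} \\
\mathrm{TIME}\bigl(t(n)\bigr) &\subseteq \mathrm{SPACE}\bigl(t(n)\bigr), \qquad
\mathrm{SPACE}\bigl(f(n)\bigr) \subseteq \mathrm{TIME}\bigl(2^{O(f(n))}\bigr)
  \quad (f(n) \ge n)
\end{align*}
```

The last line is what gives PSPACE ⊆ EXPTIME and EXPTIME ⊆ EXPSPACE: a polynomial space bound becomes a 2-to-a-polynomial time bound, and any time bound is also a space bound.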
201 00:12:02,890 --> 00:12:06,880 So we already looked at this one, NL versus PSPACE. 202 00:12:06,880 --> 00:12:11,800 And we saw that because NL is, by Savitch's theorem, 203 00:12:11,800 --> 00:12:15,550 in deterministic log squared space, which is properly 204 00:12:15,550 --> 00:12:19,270 contained in polynomial space, you 205 00:12:19,270 --> 00:12:24,260 get a separation between those two classes, provably. 206 00:12:24,260 --> 00:12:26,780 And for similar reasons, polynomial space 207 00:12:26,780 --> 00:12:28,490 to exponential space, you're going 208 00:12:28,490 --> 00:12:32,000 to get a separation from the space hierarchy theorem. 209 00:12:32,000 --> 00:12:34,280 And polynomial time to exponential time, 210 00:12:34,280 --> 00:12:37,480 you get a provable separation by virtue 211 00:12:37,480 --> 00:12:38,980 of the time hierarchy theorem. 212 00:12:41,930 --> 00:12:45,710 Now we're going to define complete problems 213 00:12:45,710 --> 00:12:48,333 for these two classes, exponential time 214 00:12:48,333 --> 00:12:49,250 and exponential space. 215 00:12:49,250 --> 00:12:51,500 So we have exponential time complete. 216 00:12:51,500 --> 00:12:56,880 It's going to be analogous to what we showed before, 217 00:12:56,880 --> 00:13:01,860 which is that it's a member of exponential time. 218 00:13:01,860 --> 00:13:06,173 And every problem in exponential time is reducible to it, 219 00:13:06,173 --> 00:13:08,090 let's say, in polynomial time, though it's not 220 00:13:08,090 --> 00:13:09,680 really going to turn out to matter. 221 00:13:09,680 --> 00:13:12,320 It could be in log space. 222 00:13:12,320 --> 00:13:14,135 Some simple method of doing the reduction 223 00:13:14,135 --> 00:13:15,260 is going to be good enough. 224 00:13:15,260 --> 00:13:18,800 Let's say polynomial time is the typical definition. 225 00:13:18,800 --> 00:13:21,350 And the same thing for exponential space complete.
226 00:13:21,350 --> 00:13:23,630 We'll say it's exponential space complete, 227 00:13:23,630 --> 00:13:25,040 if it's in exponential space. 228 00:13:25,040 --> 00:13:27,230 And anything else in exponential space 229 00:13:27,230 --> 00:13:31,050 is polynomial time reducible to it. 230 00:13:31,050 --> 00:13:31,890 OK. 231 00:13:31,890 --> 00:13:37,320 But the important thing is that if something 232 00:13:37,320 --> 00:13:40,156 is exponential time complete, you 233 00:13:40,156 --> 00:13:45,770 know it's outside of P, for the same reasons we've now 234 00:13:45,770 --> 00:13:46,745 seen several times. 235 00:13:49,860 --> 00:13:55,110 Namely, that if an exponential time complete problem ended up 236 00:13:55,110 --> 00:14:00,220 being in P, then because everything 237 00:14:00,220 --> 00:14:03,190 else in exponential time is reducible to the complete 238 00:14:03,190 --> 00:14:05,560 problem, they would also be in P. 239 00:14:05,560 --> 00:14:08,650 And so exponential time and P would be equal. 240 00:14:08,650 --> 00:14:12,820 But we just said they're not equal because of the hierarchy 241 00:14:12,820 --> 00:14:15,140 theorem. 242 00:14:15,140 --> 00:14:19,550 So the logic is the hierarchy theorem separates the classes, 243 00:14:19,550 --> 00:14:24,020 and then the complete problem inherits the difficulty 244 00:14:24,020 --> 00:14:26,010 of the larger class. 245 00:14:26,010 --> 00:14:30,202 So the complete problem cannot be any lower than the other 246 00:14:30,202 --> 00:14:32,660 problems in the class, because they're all reducible to it. 247 00:14:36,150 --> 00:14:39,390 So the same thing is going to be true for an exponential space 248 00:14:39,390 --> 00:14:40,170 complete problem. 249 00:14:40,170 --> 00:14:43,650 It can't even be in PSPACE because exponential space and PSPACE 250 00:14:43,650 --> 00:14:45,120 are different. 251 00:14:45,120 --> 00:14:47,940 And if it's not in PSPACE, it's not going to be in P.
252 00:14:47,940 --> 00:14:51,630 And so in both cases, if you have a problem that's 253 00:14:51,630 --> 00:14:54,780 complete for exponential space or exponential time, 254 00:14:54,780 --> 00:14:59,110 we know that those problems are intractable. 255 00:14:59,110 --> 00:15:02,080 And our strategy, then, for giving 256 00:15:02,080 --> 00:15:07,420 a natural intractable problem is to show 257 00:15:07,420 --> 00:15:09,760 it's complete for one of these classes. 258 00:15:09,760 --> 00:15:11,290 And it's actually going to turn out 259 00:15:11,290 --> 00:15:14,800 to be an exponential space complete problem that we're 260 00:15:14,800 --> 00:15:17,880 going to give as our example. 261 00:15:17,880 --> 00:15:19,890 OK, so that is the plan. 262 00:15:22,500 --> 00:15:24,390 I think it's a good time to-- 263 00:15:24,390 --> 00:15:27,120 let's just take a few questions here 264 00:15:27,120 --> 00:15:33,300 to make sure we're all on the same page as what we're doing. 265 00:15:33,300 --> 00:15:34,140 So let me just read. 266 00:15:34,140 --> 00:15:36,060 I got a couple of questions already in here. 267 00:15:45,360 --> 00:15:46,830 So this is a little bit of a side 268 00:15:46,830 --> 00:15:49,230 comment that somebody-- that's an interesting question. 269 00:15:49,230 --> 00:15:53,910 Basically, is it possible that we may not 270 00:15:53,910 --> 00:15:57,270 be able to prove, solve the P versus NP problem, 271 00:15:57,270 --> 00:16:01,050 that it's not a problem that one can answer 272 00:16:01,050 --> 00:16:02,830 from the basic axioms of mathematics, 273 00:16:02,830 --> 00:16:06,740 if I'm interpreting the question correctly. 
274 00:16:06,740 --> 00:16:10,280 There are certain problems in mathematics-- 275 00:16:10,280 --> 00:16:11,810 and I think, perhaps, I mentioned 276 00:16:11,810 --> 00:16:14,150 earlier in the term, the problem of whether there 277 00:16:14,150 --> 00:16:21,080 is a set whose size is in between the integers 278 00:16:21,080 --> 00:16:22,670 and the real numbers. 279 00:16:22,670 --> 00:16:24,890 We know the real numbers are larger in size 280 00:16:24,890 --> 00:16:26,540 than the integers. 281 00:16:26,540 --> 00:16:28,790 That was our first example of a diagonalization. 282 00:16:28,790 --> 00:16:32,660 And is there a set of size strictly in between the two? 283 00:16:32,660 --> 00:16:35,850 Bigger than the integers, smaller than the real numbers. 284 00:16:35,850 --> 00:16:39,680 So that's a problem that was posed a long time ago. 285 00:16:39,680 --> 00:16:41,240 It was one of Hilbert's problems. 286 00:16:41,240 --> 00:16:46,300 And it was eventually shown to be unanswerable 287 00:16:46,300 --> 00:16:49,310 using the basic axioms of mathematics. 288 00:16:49,310 --> 00:16:51,175 So the question is, maybe P versus NP 289 00:16:51,175 --> 00:16:52,510 is in the same category. 290 00:16:55,370 --> 00:16:55,870 Could be. 291 00:16:55,870 --> 00:16:58,350 That could be true of any unsolved problem 292 00:16:58,350 --> 00:17:00,000 in mathematics. 293 00:17:00,000 --> 00:17:02,130 But at least our experience has shown 294 00:17:02,130 --> 00:17:04,980 that the kinds of problems that, at least, have been shown 295 00:17:04,980 --> 00:17:09,420 to be unsolvable from mathematical axioms 296 00:17:09,420 --> 00:17:12,270 tend to involve infinities and very large 297 00:17:12,270 --> 00:17:14,940 things, things that are very far from our intuitions.
298 00:17:14,940 --> 00:17:18,810 And something as down to earth as P versus NP, at least, 299 00:17:18,810 --> 00:17:20,700 it would be very surprising to me 300 00:17:20,700 --> 00:17:23,430 if that turned out to be unanswerable 301 00:17:23,430 --> 00:17:25,530 using our mathematical axioms. 302 00:17:25,530 --> 00:17:26,692 But, who knows? 303 00:17:26,692 --> 00:17:28,109 Oh, this is another good question. 304 00:17:28,109 --> 00:17:31,260 Do the time and space hierarchy theorems 305 00:17:31,260 --> 00:17:33,130 have non-deterministic variants? 306 00:17:33,130 --> 00:17:34,380 Yes, they do. 307 00:17:34,380 --> 00:17:36,000 They're much harder to prove, however, 308 00:17:36,000 --> 00:17:37,417 and we're not going to cover that. 309 00:17:37,417 --> 00:17:42,900 But you can also prove that non-deterministic time, n cubed 310 00:17:42,900 --> 00:17:45,180 properly includes non-deterministic time 311 00:17:45,180 --> 00:17:45,750 n squared. 312 00:17:45,750 --> 00:17:47,583 You're not going to be responsible for that. 313 00:17:47,583 --> 00:17:48,990 Don't worry. 314 00:17:48,990 --> 00:17:51,480 If you try to actually prove that, 315 00:17:51,480 --> 00:17:58,520 you'll see the diagonalization doesn't directly work. 316 00:17:58,520 --> 00:18:01,730 And so you have to do something fancier. 317 00:18:04,560 --> 00:18:07,320 People are asking about which reduction method to use. 318 00:18:07,320 --> 00:18:11,550 Again, the kinds of reductions that we encounter 319 00:18:11,550 --> 00:18:13,210 are always very simple. 320 00:18:13,210 --> 00:18:16,200 So we're just going to be working with very weak notions 321 00:18:16,200 --> 00:18:17,460 of reductions. 322 00:18:17,460 --> 00:18:20,130 Not interesting yet, generally, to consider powerful kinds 323 00:18:20,130 --> 00:18:25,037 of reductions like polynomial exponential time reductions 324 00:18:25,037 --> 00:18:25,870 or things like that. 
325 00:18:25,870 --> 00:18:30,210 So it's just not something that people really think about much. 326 00:18:30,210 --> 00:18:33,360 I mean, I can talk about it at length offline. 327 00:18:33,360 --> 00:18:36,870 But let's just assume that our reduction strength 328 00:18:36,870 --> 00:18:38,370 is something very low. 329 00:18:38,370 --> 00:18:39,930 Log space is going to be good enough 330 00:18:39,930 --> 00:18:41,745 to do all of the reductions in this class. 331 00:18:45,330 --> 00:18:47,490 OK, so let's move on, then. 332 00:18:47,490 --> 00:18:51,090 So here is the problem that we're 333 00:18:51,090 --> 00:18:53,250 going to spend the next 20 minutes 334 00:18:53,250 --> 00:18:59,040 or so proving to be exponential space complete. 335 00:18:59,040 --> 00:19:01,080 I have got to do a little introduction first. 336 00:19:01,080 --> 00:19:07,130 So this is not the problem, but this is related to the problem. 337 00:19:07,130 --> 00:19:10,060 So the problem of testing if two regular expressions 338 00:19:10,060 --> 00:19:12,083 are equivalent. 339 00:19:12,083 --> 00:19:13,500 Write down two regular expressions, 340 00:19:13,500 --> 00:19:15,830 do they generate the same language? 341 00:19:15,830 --> 00:19:18,915 So that problem actually turns out to be in PSPACE. 342 00:19:18,915 --> 00:19:21,040 So it's not going to be exponential space complete. 343 00:19:21,040 --> 00:19:22,732 It's actually in PSPACE. 344 00:19:22,732 --> 00:19:24,190 I don't think we're going to have-- 345 00:19:24,190 --> 00:19:26,260 I thought about presenting it in the lecture. 346 00:19:26,260 --> 00:19:27,820 It's not that hard to show. 347 00:19:27,820 --> 00:19:30,190 But it just took too much time and doesn't really 348 00:19:30,190 --> 00:19:31,780 introduce new methods. 349 00:19:31,780 --> 00:19:35,890 It's a good exercise, actually, using Savitch's theorem.
350 00:19:35,890 --> 00:19:38,740 But maybe we'll do it in recitation, 351 00:19:38,740 --> 00:19:43,150 or if the lecture miraculously ends earlier, 352 00:19:43,150 --> 00:19:44,073 I'll do it at the end. 353 00:19:44,073 --> 00:19:45,490 But I don't think we'll have time. 354 00:19:50,570 --> 00:19:55,520 But that's a setup for the intractable problem 355 00:19:55,520 --> 00:19:58,850 that we're going to talk about, which is very related. 356 00:19:58,850 --> 00:20:01,310 Now, OK, before we get to that, so 357 00:20:01,310 --> 00:20:04,370 if I have a regular expression, I'm 358 00:20:04,370 --> 00:20:11,450 going to enhance our regular expression in one simple way, 359 00:20:11,450 --> 00:20:15,490 by allowing exponents or exponentiation. 360 00:20:15,490 --> 00:20:21,060 And that means if I have a regular expression R, 361 00:20:21,060 --> 00:20:25,170 I can write R to the k to mean R concatenated with itself k 362 00:20:25,170 --> 00:20:26,190 times. 363 00:20:26,190 --> 00:20:28,770 We've been sort of informally using that all the way along 364 00:20:28,770 --> 00:20:31,350 anyway, like when we talk about 0 to the k, 1 to the k. 365 00:20:34,260 --> 00:20:36,330 So if we're going to formally allow that 366 00:20:36,330 --> 00:20:40,080 when we write down regular expressions, in some cases, 367 00:20:40,080 --> 00:20:42,090 that might allow the regular expression 368 00:20:42,090 --> 00:20:46,030 to be much smaller, especially if we're 369 00:20:46,030 --> 00:20:48,800 writing down k in binary. 370 00:20:48,800 --> 00:20:52,220 Because I can write R to the million with just a few 371 00:20:52,220 --> 00:20:55,490 symbols if I have exponentiation. 
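To make the size saving concrete, here is a small sketch that compares how many symbols it takes to write R to the k with and without the exponent. The function names `compact_size` and `expanded_size` are hypothetical, and the counting convention (one symbol for the up arrow, exponent written in binary) is an assumption made for illustration.

```python
def compact_size(base: str, k: int) -> int:
    """Symbols needed to write base^k when k is written in binary.

    Assumed convention: the base itself, one symbol for the up arrow,
    and one symbol per binary digit of k.
    """
    return len(base) + 1 + k.bit_length()


def expanded_size(base: str, k: int) -> int:
    """Symbols needed when base is simply written out k times."""
    return len(base) * k


k = 1_000_000
print(compact_size("R", k))   # 22: "R", the arrow, and 20 binary digits
print(expanded_size("R", k))  # 1000000
```

The compact form grows with the number of digits of k, while the written-out form grows with k itself, which is the exponential gap the lecture relies on.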
372 00:20:55,490 --> 00:20:57,500 But if I don't have exponentiation, 373 00:20:57,500 --> 00:20:59,900 then I have to write R concatenated 374 00:20:59,900 --> 00:21:03,750 with itself a million times, and I get a much, 375 00:21:03,750 --> 00:21:07,340 much longer, an exponentially longer expression 376 00:21:07,340 --> 00:21:11,300 if I don't have that exponent as a way of describing 377 00:21:11,300 --> 00:21:13,010 regular expressions. 378 00:21:13,010 --> 00:21:15,870 And that's going to make a big difference. 379 00:21:15,870 --> 00:21:21,410 So now, the equivalence problem for regular expressions with 380 00:21:21,410 --> 00:21:25,890 exponentiation-- that's what that little up arrow means, 381 00:21:25,890 --> 00:21:27,840 what it signifies-- 382 00:21:27,840 --> 00:21:30,390 now I'm giving you two regular expressions. 383 00:21:30,390 --> 00:21:33,780 But they're allowed to have the exponentiation 384 00:21:33,780 --> 00:21:41,860 operation in addition to the standard regular operations. 385 00:21:41,860 --> 00:21:46,560 So now, testing whether two of these regular expressions that 386 00:21:46,560 --> 00:21:49,140 have exponentiation are equivalent, that problem 387 00:21:49,140 --> 00:21:51,975 turns out to be exponential space complete. 388 00:21:56,495 --> 00:21:58,870 So here's the equivalence problem for regular expressions 389 00:21:58,870 --> 00:22:00,250 with exponentiation. 390 00:22:00,250 --> 00:22:02,840 That's an exponential space complete problem. 391 00:22:02,840 --> 00:22:05,500 And as we pointed out, that means this problem 392 00:22:05,500 --> 00:22:08,360 is provably intractable. 393 00:22:08,360 --> 00:22:11,930 So there's just no way, in general, 394 00:22:11,930 --> 00:22:14,180 to solve that problem in polynomial time. 395 00:22:14,180 --> 00:22:15,740 That's proven, that's known. 396 00:22:19,120 --> 00:22:23,170 So we're going to go through the reduction.
397 00:22:23,170 --> 00:22:25,690 I think it's going to be our last reduction of the term, 398 00:22:25,690 --> 00:22:28,660 of proving problems complete for some class. 399 00:22:28,660 --> 00:22:34,660 But each one of those has its own kind of thing 400 00:22:34,660 --> 00:22:37,120 that makes it special. 401 00:22:37,120 --> 00:22:40,870 So first of all, we have to show that it's in exponential space. 402 00:22:40,870 --> 00:22:42,820 That's really going to rely on this other fact 403 00:22:42,820 --> 00:22:43,870 that we didn't prove. 404 00:22:43,870 --> 00:22:47,360 So I'm going to go over that very quickly. 405 00:22:47,360 --> 00:22:49,400 But the interesting part is doing the reduction. 406 00:22:49,400 --> 00:22:51,970 So if I have something in exponential space, 407 00:22:51,970 --> 00:22:54,820 I have to show that I can reduce it 408 00:22:54,820 --> 00:22:58,630 to the equivalence problem for regular expressions 409 00:22:58,630 --> 00:23:00,870 with exponentiation. 410 00:23:00,870 --> 00:23:04,530 OK, so quickly arguing part one that we're 411 00:23:04,530 --> 00:23:07,950 in exponential space, basically, what you do 412 00:23:07,950 --> 00:23:09,690 is you take your two regular expressions 413 00:23:09,690 --> 00:23:12,120 that you want to test to see if they're equivalent, 414 00:23:12,120 --> 00:23:14,010 but now they have exponentiation. 415 00:23:14,010 --> 00:23:17,400 And as a first step, you get rid of the exponentiation. 416 00:23:17,400 --> 00:23:22,620 You just expand things out by repeating the parts that 417 00:23:22,620 --> 00:23:25,050 have the exponents. 418 00:23:25,050 --> 00:23:27,900 And of course, as I said, that's going 419 00:23:27,900 --> 00:23:31,050 to make the expressions themselves exponentially 420 00:23:31,050 --> 00:23:33,040 bigger. 421 00:23:33,040 --> 00:23:36,610 But now, you run the PSPACE algorithm 422 00:23:36,610 --> 00:23:40,220 on those two exponentially larger expressions.
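The first step, getting rid of the exponentiation by repeating the parts that have exponents, can be sketched as a rewrite on a small expression tree. This is only an illustration under stated assumptions: the node classes `Sym`, `Cat`, `Alt`, `Star`, `Pow` and the helpers `expand` and `count_symbols` are hypothetical names, exponents are assumed to be at least 1, and the PSPACE equivalence test that would run afterwards is not implemented here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sym:           # a single alphabet symbol
    ch: str


@dataclass(frozen=True)
class Cat:           # concatenation: left followed by right
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:           # union: left or right
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:          # Kleene star
    inner: "Regex"


@dataclass(frozen=True)
class Pow:           # exponentiation: base concatenated with itself k times
    base: "Regex"
    k: int           # assumed k >= 1


Regex = Union[Sym, Cat, Alt, Star, Pow]


def expand(r: Regex) -> Regex:
    """Return an equivalent expression that contains no Pow nodes."""
    if isinstance(r, Sym):
        return r
    if isinstance(r, (Cat, Alt)):
        return type(r)(expand(r.left), expand(r.right))
    if isinstance(r, Star):
        return Star(expand(r.inner))
    # Pow: expand the base once, then concatenate k copies of it.
    base = expand(r.base)
    out = base
    for _ in range(r.k - 1):
        out = Cat(out, base)
    return out


def count_symbols(r: Regex) -> int:
    """Number of alphabet symbols that appear in the expression."""
    if isinstance(r, Sym):
        return 1
    if isinstance(r, Star):
        return count_symbols(r.inner)
    if isinstance(r, Pow):
        return count_symbols(r.base)
    return count_symbols(r.left) + count_symbols(r.right)


# Ten nested squarings of a single symbol: ((a^2)^2 ...)^2.
r: Regex = Sym("a")
for _ in range(10):
    r = Pow(r, 2)
print(count_symbols(r), count_symbols(expand(r)))
```

Each extra squaring adds a constant number of symbols to the input but doubles the expanded expression, so the input handed to the PSPACE algorithm can be exponentially larger than the original.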
423 00:23:40,220 --> 00:23:42,970 So the input to the PSPACE algorithm is now 424 00:23:42,970 --> 00:23:47,620 exponential in the original input size, 425 00:23:47,620 --> 00:23:50,330 but it's PSPACE in that enlarged input. 426 00:23:50,330 --> 00:23:52,690 So that's going to give you an exponential space 427 00:23:52,690 --> 00:23:57,070 algorithm in the original input size, because you expanded 428 00:23:57,070 --> 00:23:58,570 it to become exponentially bigger, 429 00:23:58,570 --> 00:24:04,655 and then you run the PSPACE algorithm on that expanded 430 00:24:04,655 --> 00:24:05,155 problem. 431 00:24:08,620 --> 00:24:10,750 So that gives you an exponential space algorithm 432 00:24:10,750 --> 00:24:15,380 for this problem. 433 00:24:15,380 --> 00:24:17,170 But now, what we're going to do-- 434 00:24:17,170 --> 00:24:20,480 the interesting part is the reduction. 435 00:24:20,480 --> 00:24:24,280 So given some language A in exponential space, say, 436 00:24:24,280 --> 00:24:27,880 decided by some Turing machine in that amount of space, 437 00:24:27,880 --> 00:24:33,730 2 to the n to the k, we're going to give a reduction that maps A 438 00:24:33,730 --> 00:24:38,710 to this equivalence problem. 439 00:24:38,710 --> 00:24:40,150 Got it? 440 00:24:40,150 --> 00:24:41,530 That is the plan. 441 00:24:44,640 --> 00:24:47,780 So let's make sure we're all together on the plan 442 00:24:47,780 --> 00:24:51,320 before we go ahead and carry out that plan. 443 00:24:56,160 --> 00:24:57,680 We just sort of set things up here, 444 00:24:57,680 --> 00:25:01,130 in a sense, for what we're going to be doing. 445 00:25:01,130 --> 00:25:08,240 So feel free to ask a question on just the plan. 446 00:25:08,240 --> 00:25:09,890 It's going to get technical.
447 00:25:09,890 --> 00:25:12,740 Because, as doing these reductions always is, 448 00:25:12,740 --> 00:25:15,080 there's a simulation involved, and you 449 00:25:15,080 --> 00:25:19,290 have to kind of describe that simulation in its own way. 450 00:25:19,290 --> 00:25:22,100 So now, we're going to be simulating, 451 00:25:22,100 --> 00:25:28,290 in a certain sense, M on w, the decider 452 00:25:28,290 --> 00:25:32,970 for this exponential space problem A. 453 00:25:32,970 --> 00:25:34,470 We're going to take M on w and we're 454 00:25:34,470 --> 00:25:39,130 going to somehow have to express the fact that M accepts w using 455 00:25:39,130 --> 00:25:41,130 this equivalence problem for regular expressions 456 00:25:41,130 --> 00:25:42,367 with exponentiation. 457 00:25:47,260 --> 00:25:48,365 So no questions? 458 00:25:48,365 --> 00:25:49,240 Why don't we move on? 459 00:25:52,120 --> 00:25:56,320 I have three slides on this, but they're kind of dense, 460 00:25:56,320 --> 00:25:57,340 I'm sorry to say. 461 00:26:00,810 --> 00:26:04,920 So here is the plan as usual. 462 00:26:04,920 --> 00:26:09,420 We're going to map A with a polynomial time reduction 463 00:26:09,420 --> 00:26:11,910 to the equivalence problem for regular expressions 464 00:26:11,910 --> 00:26:13,590 with exponentiation. 465 00:26:13,590 --> 00:26:16,680 So that means we're going to have to take an input, which 466 00:26:16,680 --> 00:26:21,480 may or may not be in A, and produce two regular expressions 467 00:26:21,480 --> 00:26:28,160 with exponentiation, which are going to be equivalent when 468 00:26:28,160 --> 00:26:33,220 w is in A. Or when M accepts w. 469 00:26:40,220 --> 00:26:45,570 So it's going to be, as these things always are, 470 00:26:45,570 --> 00:26:47,660 these are going to be in terms of the computation 471 00:26:47,660 --> 00:26:49,760 history for M on w.
472 00:26:49,760 --> 00:26:51,590 But in this case, it's going to turn out 473 00:26:51,590 --> 00:26:57,230 to be convenient to work with the rejecting computation 474 00:26:57,230 --> 00:26:59,270 history for M on w. 475 00:26:59,270 --> 00:27:04,240 So remember, now we have a Turing machine M. 476 00:27:04,240 --> 00:27:08,140 It's a decider, so that means it always halts-- 477 00:27:08,140 --> 00:27:11,140 for the strings in the language, it ends up at a Q accept state, 478 00:27:11,140 --> 00:27:15,340 for things not in the language, it ends up at a Q reject state. 479 00:27:15,340 --> 00:27:17,710 So a rejecting computation history 480 00:27:17,710 --> 00:27:19,330 is the sequence of configurations 481 00:27:19,330 --> 00:27:22,870 the machine goes through from the start configuration 482 00:27:22,870 --> 00:27:25,240 until it ends up at a configuration 483 00:27:25,240 --> 00:27:29,890 with a reject state, a rejecting configuration. 484 00:27:29,890 --> 00:27:32,830 And we're going to make a regular expression that 485 00:27:32,830 --> 00:27:38,640 describes all strings except for that one. 486 00:27:38,640 --> 00:27:43,050 It's going to avoid describing a rejecting computation 487 00:27:43,050 --> 00:27:44,670 history for M on w. 488 00:27:44,670 --> 00:27:47,445 Otherwise, it's going to describe all possible strings. 489 00:27:50,480 --> 00:27:54,530 Now, if M does not reject w, so there 490 00:27:54,530 --> 00:27:57,170 is no rejecting computation history-- 491 00:27:57,170 --> 00:27:59,000 namely, M accepts w, by the way. 492 00:27:59,000 --> 00:28:01,790 So if M accepts w, does not reject w, 493 00:28:01,790 --> 00:28:05,270 it does not have a rejecting computation history, 494 00:28:05,270 --> 00:28:09,470 what is R1 describing? 495 00:28:09,470 --> 00:28:12,610 Well, it's describing, in that case, everything, 496 00:28:12,610 --> 00:28:16,450 because there is no rejecting computation history.
497 00:28:16,450 --> 00:28:19,150 So it's describing every other string besides. 498 00:28:19,150 --> 00:28:23,170 So that means it's describing all strings, if there 499 00:28:23,170 --> 00:28:25,390 is no rejecting computation history in the case 500 00:28:25,390 --> 00:28:27,370 that M accepts w. 501 00:28:27,370 --> 00:28:30,890 So what does that suggest we should use for R2? 502 00:28:30,890 --> 00:28:33,740 R2 is going to be the regular expression that 503 00:28:33,740 --> 00:28:36,480 just generates all strings. 504 00:28:36,480 --> 00:28:40,920 So we'll be testing whether R1 generates all strings or not, 505 00:28:40,920 --> 00:28:49,410 which is the same as saying does M accept w or not. 506 00:28:49,410 --> 00:28:52,470 So R2 is going to be-- 507 00:28:52,470 --> 00:28:55,710 I would like to say sigma star, but sigma is really 508 00:28:55,710 --> 00:29:00,622 the input to M, and gamma is the tape alphabet for M. 509 00:29:00,622 --> 00:29:02,580 So we have a lot of Greek letters to play with, 510 00:29:02,580 --> 00:29:07,190 so we're going to use delta for the alphabet 511 00:29:07,190 --> 00:29:10,910 that we write the computation histories in. 512 00:29:10,910 --> 00:29:17,760 If you want to get reminded what that delta is, a computation 513 00:29:17,760 --> 00:29:20,760 history can have a tape alphabet symbol for M, 514 00:29:20,760 --> 00:29:23,970 it can have a state symbol for M, 515 00:29:23,970 --> 00:29:25,590 or it can have a delimiter pound-- 516 00:29:25,590 --> 00:29:26,880 hashtag. 517 00:29:26,880 --> 00:29:32,530 So it's either a capital delta alphabet 518 00:29:32,530 --> 00:29:37,330 is a tape alphabet symbol, or state, something representing 519 00:29:37,330 --> 00:29:40,740 a state symbol, or a hashtag. 520 00:29:40,740 --> 00:29:41,490 That's just delta. 
521 00:29:41,490 --> 00:29:44,190 So don't get-- I always feel bad if somebody 522 00:29:44,190 --> 00:29:45,840 gets confused by something that's 523 00:29:45,840 --> 00:29:47,140 supposed to be very simple. 524 00:29:47,140 --> 00:29:49,140 Don't get confused by delta star. 525 00:29:49,140 --> 00:29:51,150 This is just all possible strings 526 00:29:51,150 --> 00:29:52,275 over the alphabet delta. 527 00:29:56,450 --> 00:29:58,820 OK, so what does R1-- 528 00:29:58,820 --> 00:30:01,130 so my job is to do R1. 529 00:30:01,130 --> 00:30:04,160 R2, I already told you. 530 00:30:04,160 --> 00:30:07,380 R1 now has to describe all those strings 531 00:30:07,380 --> 00:30:10,830 except for the rejecting computation history. 532 00:30:10,830 --> 00:30:16,260 So everything that fails to be a rejecting computation history-- 533 00:30:16,260 --> 00:30:19,380 so it fails either because it started wrong, 534 00:30:19,380 --> 00:30:22,320 or it ended wrong, or it's wrong somewhere in the middle. 535 00:30:22,320 --> 00:30:27,510 And by wrong I mean, it fails to correctly describe 536 00:30:27,510 --> 00:30:32,235 the way the machine operates if it's ending up rejecting w. 537 00:30:35,550 --> 00:30:36,050 All right. 538 00:30:36,050 --> 00:30:40,970 So I'm going to describe all those possible strings 539 00:30:40,970 --> 00:30:44,780 by breaking it down into those three categories. 540 00:30:44,780 --> 00:30:47,870 Starts wrong, ends wrong, or somewhere computes 541 00:30:47,870 --> 00:30:51,180 wrong along the way. 542 00:30:51,180 --> 00:30:51,680 OK. 543 00:30:51,680 --> 00:30:56,780 So rejecting computation history looks something like this. 544 00:30:56,780 --> 00:31:04,400 Here's the start configuration as we usually envision it. 545 00:31:04,400 --> 00:31:07,160 It's a start state looking at the first symbol of the input, 546 00:31:07,160 --> 00:31:09,650 and there's the rest of the input. 547 00:31:09,650 --> 00:31:10,955 So let me just write this out. 
548 00:31:13,615 --> 00:31:15,760 This is a rejecting computation history now. 549 00:31:15,760 --> 00:31:19,310 So the first configuration, the second one, 550 00:31:19,310 --> 00:31:21,790 and so on and so on, until we end up at a rejecting 551 00:31:21,790 --> 00:31:26,350 computation-- rejecting configuration. 552 00:31:26,350 --> 00:31:32,530 Now, for convenience, I'm going to insist 553 00:31:32,530 --> 00:31:38,960 that all of these configurations are the same length. 554 00:31:38,960 --> 00:31:44,620 It's going to make my life easier in doing the proof. 555 00:31:44,620 --> 00:31:47,630 But why can I do that? 556 00:31:47,630 --> 00:31:49,193 Well, I'm just going to take them-- 557 00:31:49,193 --> 00:31:51,610 you know, because usually you think of the configurations, 558 00:31:51,610 --> 00:31:54,340 they start small because they're just basically of length n, 559 00:31:54,340 --> 00:31:56,380 but this is using exponential space, 560 00:31:56,380 --> 00:31:57,820 they're getting longer and longer. 561 00:31:57,820 --> 00:32:00,280 Let's just pad them all out with blanks 562 00:32:00,280 --> 00:32:02,780 so that they're all the same size. 563 00:32:02,780 --> 00:32:04,900 So as I've indicated over here, we're 564 00:32:04,900 --> 00:32:06,340 adding in a bunch of blanks. 565 00:32:06,340 --> 00:32:09,436 It's going to be a lot of blanks here, 566 00:32:09,436 --> 00:32:12,280 to make sure they all have length 2 to the n 567 00:32:12,280 --> 00:32:15,260 to the k, which is the maximum size of a configuration 568 00:32:15,260 --> 00:32:16,510 when you have that much space. 569 00:32:24,440 --> 00:32:26,720 I'm going to construct-- so basically, that's my job. 570 00:32:26,720 --> 00:32:29,000 I'm going to construct R1 so that it 571 00:32:29,000 --> 00:32:30,980 generates all those strings. 572 00:32:30,980 --> 00:32:37,500 I wrote a little box around that thing I'm trying to-- 573 00:32:41,703 --> 00:32:44,030 that's my to do.
574 00:32:44,030 --> 00:32:46,400 It's going to help me in the coming slides 575 00:32:46,400 --> 00:32:49,370 because they're a little bit dense. 576 00:32:49,370 --> 00:32:52,550 When I'm going to draw this sort of reddish, pinkish box 577 00:32:52,550 --> 00:32:55,760 around something, that means that I'm 578 00:32:55,760 --> 00:32:58,970 going to try to describe all strings except for that one. 579 00:33:10,770 --> 00:33:12,600 I want to avoid describing that one, 580 00:33:12,600 --> 00:33:14,730 because that's the rejecting computation history, 581 00:33:14,730 --> 00:33:16,410 but I want to describe everything else. 582 00:33:16,410 --> 00:33:18,585 That's my wish. 583 00:33:21,870 --> 00:33:25,180 So here's a check in before we move forward. 584 00:33:25,180 --> 00:33:26,790 But we can also-- maybe we should just 585 00:33:26,790 --> 00:33:30,463 take some questions, even before we launch the check in. 586 00:33:30,463 --> 00:33:31,380 How are we doing here? 587 00:33:36,000 --> 00:33:40,380 So, is R1 describing-- 588 00:33:40,380 --> 00:33:43,520 well, R1 is a regular expression. 589 00:33:43,520 --> 00:33:45,620 Over here, we're talking about a-- 590 00:33:45,620 --> 00:33:47,630 this is just an ordinary computation history, 591 00:33:47,630 --> 00:33:48,720 but it ends with a reject. 592 00:33:48,720 --> 00:33:49,220 That's all. 593 00:33:49,220 --> 00:33:52,040 A rejecting computation history is just one that's 594 00:33:52,040 --> 00:33:53,540 a little different at the end. 595 00:33:53,540 --> 00:33:56,180 The machine just ended up rejecting instead of accepting. 596 00:33:56,180 --> 00:34:03,620 Otherwise everything has to be spelled out in accordance 597 00:34:03,620 --> 00:34:06,254 with the rules of the machine and the start configuration. 598 00:34:09,710 --> 00:34:13,597 Yeah, we were assuming one rejecting state.
599 00:34:13,597 --> 00:34:15,889 Yeah, that's the way we actually define Turing machines 600 00:34:15,889 --> 00:34:16,681 in the first place. 601 00:34:16,681 --> 00:34:19,159 But, who's arguing. 602 00:34:19,159 --> 00:34:21,460 Yeah, there's one reject state. 603 00:34:21,460 --> 00:34:24,460 We're all deterministic, correct. 604 00:34:24,460 --> 00:34:25,960 Why do we need the padding? 605 00:34:25,960 --> 00:34:29,560 Because I want to make these all the same size, all of these 606 00:34:29,560 --> 00:34:30,550 configurations. 607 00:34:30,550 --> 00:34:32,739 That's going to help me later in terms 608 00:34:32,739 --> 00:34:37,810 of describing the invalid configurations, the ones that 609 00:34:37,810 --> 00:34:42,980 are not legal configurations, legal rejecting configurations. 610 00:34:42,980 --> 00:34:45,409 So just simply a matter of convenience, 611 00:34:45,409 --> 00:34:47,449 but just accept it for now. 612 00:34:47,449 --> 00:34:49,580 I just want all of those configurations 613 00:34:49,580 --> 00:34:55,540 to be the same length in my rejecting computation history. 614 00:34:55,540 --> 00:34:57,370 Otherwise I'm not going to-- 615 00:34:57,370 --> 00:34:59,980 I'm just coding that rejecting computation history 616 00:34:59,980 --> 00:35:01,210 in this particular way. 617 00:35:06,610 --> 00:35:09,340 So people are asking about the details of bad start. 618 00:35:09,340 --> 00:35:10,540 That's yet to come. 619 00:35:10,540 --> 00:35:13,010 I have two more slides on this. 620 00:35:13,010 --> 00:35:16,950 So I'll tell you about how we're going to do those. 621 00:35:16,950 --> 00:35:21,480 So R bad-start-- that's a good question-- is R bad-start all-- 622 00:35:21,480 --> 00:35:25,770 these are all the strings that don't start this way. 623 00:35:25,770 --> 00:35:27,060 We'll see it in a second. 
624 00:35:27,060 --> 00:35:30,480 But R bad-start are all the things that don't start with 625 00:35:30,480 --> 00:35:31,890 the-- 626 00:35:31,890 --> 00:35:33,000 they start bad. 627 00:35:36,720 --> 00:35:39,577 They're not starting with the start configuration. 628 00:35:39,577 --> 00:35:41,160 They're starting with some other junk. 629 00:35:46,130 --> 00:35:48,850 Do we need only one rejecting computation history? 630 00:35:48,850 --> 00:35:51,410 What about the other ones? 631 00:35:51,410 --> 00:35:54,500 This is a deterministic machine, so there's only going to be-- 632 00:35:54,500 --> 00:35:57,060 if I prescribe the lengths as I've done, 633 00:35:57,060 --> 00:35:59,607 there's going to be only one rejecting computation history. 634 00:35:59,607 --> 00:36:01,190 Because it's deterministic, everything 635 00:36:01,190 --> 00:36:06,380 is going to be forced from the beginning. 636 00:36:06,380 --> 00:36:09,020 Should R1 be the not of those three? 637 00:36:09,020 --> 00:36:09,880 No. 638 00:36:09,880 --> 00:36:11,870 R1 is describing all of the strings 639 00:36:11,870 --> 00:36:18,660 except, except this one string. 640 00:36:18,660 --> 00:36:21,800 So I'm capturing all the different possible ways 641 00:36:21,800 --> 00:36:24,140 a string could fail to be the string. 642 00:36:24,140 --> 00:36:26,180 It could start wrong. 643 00:36:26,180 --> 00:36:28,760 Could be wrong along the middle somewhere. 644 00:36:28,760 --> 00:36:31,870 So I have to union them together. 645 00:36:31,870 --> 00:36:35,290 Because I'm describing-- as I always believe, 646 00:36:35,290 --> 00:36:38,420 negations are the most confusing thing to everybody, 647 00:36:38,420 --> 00:36:41,510 including me. 648 00:36:41,510 --> 00:36:43,460 So we're describing all the things 649 00:36:43,460 --> 00:36:47,143 that are not this string. 650 00:36:47,143 --> 00:36:48,810 We're trying to stay away from that one. 651 00:36:48,810 --> 00:36:50,580 We want to describe everything else. 
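Written out, the construction discussed so far looks roughly like the following. The subscript names follow the lecture's spoken labels (bad-start, bad-move, bad-reject), and the symbol h is an assumption introduced here to stand for the unique padded rejecting computation history of M on w, when one exists.

```latex
R_2 = \Delta^{*}, \qquad
R_1 = R_{\text{bad-start}} \cup R_{\text{bad-move}} \cup R_{\text{bad-reject}},
\qquad
L(R_1) =
\begin{cases}
\Delta^{*} \setminus \{h\} & \text{if } M \text{ rejects } w,\\[2pt]
\Delta^{*} & \text{if } M \text{ accepts } w,
\end{cases}
\qquad\text{so}\qquad
L(R_1) = L(R_2) \iff M \text{ accepts } w .
```

The union, rather than a negation, is what lets each piece be described separately: a string lands in L(R_1) as soon as it fails in any one of the three ways.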
652 00:36:55,308 --> 00:36:57,100 All right, I think I'd better move on here. 653 00:36:57,100 --> 00:36:58,740 We've got a lot of questions. 654 00:36:58,740 --> 00:37:01,470 Talk to the TAs. 655 00:37:01,470 --> 00:37:02,700 All right, check in. 656 00:37:09,190 --> 00:37:15,110 How big is this rejecting computation history anyway? 657 00:37:15,110 --> 00:37:16,430 Interesting. 658 00:37:16,430 --> 00:37:18,620 There's a lesson here. 659 00:37:18,620 --> 00:37:21,680 I got a big burst of answers right at the very beginning. 660 00:37:21,680 --> 00:37:24,550 All wrong. 661 00:37:24,550 --> 00:37:26,020 But then the bright-- 662 00:37:26,020 --> 00:37:29,770 the people who took a little bit more time to think 663 00:37:29,770 --> 00:37:35,620 started getting the right answer, which is-- 664 00:37:35,620 --> 00:37:36,130 let's look. 665 00:37:36,130 --> 00:37:39,622 We've got a close election here folks, so now I have to report. 666 00:37:39,622 --> 00:37:41,080 Hope we don't have to do a recount. 667 00:37:45,680 --> 00:37:47,450 OK, come on guys. 668 00:37:47,450 --> 00:37:48,500 Answer up. 669 00:37:48,500 --> 00:37:50,152 10 seconds. 670 00:37:50,152 --> 00:37:51,110 This is not super hard. 671 00:37:53,690 --> 00:37:54,395 Stop the count. 672 00:37:57,500 --> 00:38:00,905 Yeah, I think we'd better stop at this, we're on the edge. 673 00:38:03,880 --> 00:38:05,845 OK, 3 seconds. 674 00:38:10,030 --> 00:38:12,780 End polling. 675 00:38:12,780 --> 00:38:13,470 Share results. 676 00:38:20,670 --> 00:38:23,100 The correct answer is, in fact, c. 677 00:38:23,100 --> 00:38:23,730 Why is that? 678 00:38:23,730 --> 00:38:27,900 Because each configuration is 2 to the n to the k. 679 00:38:27,900 --> 00:38:31,770 So that's how much space the machine has, exponential space. 
680 00:38:31,770 --> 00:38:35,328 But the amount of time, which is each one-- 681 00:38:35,328 --> 00:38:36,870 the number of configurations is going 682 00:38:36,870 --> 00:38:39,600 to be the amount of time that's used. 683 00:38:39,600 --> 00:38:42,300 It's going to be exponentially more even than that. 684 00:38:42,300 --> 00:38:45,450 So it's going to be 2 to the 2 to the n to the k, 685 00:38:45,450 --> 00:38:47,550 is how many steps the machine can run. 686 00:38:47,550 --> 00:38:51,420 And that's going to be how long the computation history could 687 00:38:51,420 --> 00:38:52,380 be. 688 00:38:52,380 --> 00:38:55,650 So it's a very long thing. 689 00:38:55,650 --> 00:39:02,070 And when you think about it, the regular expression 690 00:39:02,070 --> 00:39:03,860 we are generating, how big is that? 691 00:39:06,880 --> 00:39:09,490 The regular expression-- again, a lot 692 00:39:09,490 --> 00:39:11,830 of people playing off my comments here. 693 00:39:15,660 --> 00:39:17,100 Were the votes legal or not? 694 00:39:17,100 --> 00:39:18,810 OK. 695 00:39:18,810 --> 00:39:20,050 Let's focus here. 696 00:39:22,990 --> 00:39:26,260 So this is doubly exponentially large. 697 00:39:26,260 --> 00:39:28,360 How big is the regular expression 698 00:39:28,360 --> 00:39:29,680 that we're generating? 699 00:39:29,680 --> 00:39:32,780 Well that has to be produced in polynomial time, 700 00:39:32,780 --> 00:39:34,330 so it's only polynomially big. 701 00:39:34,330 --> 00:39:38,260 So we have this little teensy weensy, relatively speaking, 702 00:39:38,260 --> 00:39:42,100 regular expression, which is only n to the k. 703 00:39:42,100 --> 00:39:44,830 It's having to describe all strings 704 00:39:44,830 --> 00:39:49,300 except for this particular string, which is 2 to the 2 705 00:39:49,300 --> 00:39:51,050 to the n to the k.
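As a rough summary of the three sizes being compared, using the lecture's informal bounds (the names C_i for a configuration and h for the rejecting computation history are notation assumed here):

```latex
|C_i| = 2^{n^{k}}, \qquad
|h| \;\lesssim\; 2^{2^{n^{k}}} \ \text{(more carefully, } 2^{O(2^{n^{k}})}\text{)}, \qquad
|R_1| = n^{O(1)} .
```

The step count is bounded by the number of distinct configurations, since a decider that repeated a configuration would loop forever. A machine using space s has at most 2^{O(s)} configurations, which with s = 2^{n^k} gives the doubly exponential bound.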
706 00:39:51,050 --> 00:39:54,410 So in a sense, this string that is 707 00:39:54,410 --> 00:39:56,270 related to that regular expression 708 00:39:56,270 --> 00:39:58,520 is doubly exponentially larger than that. 709 00:39:58,520 --> 00:40:01,070 And that kind of presents some of the challenge 710 00:40:01,070 --> 00:40:05,000 in doing the reduction, in constructing 711 00:40:05,000 --> 00:40:07,640 that regular expression. 712 00:40:07,640 --> 00:40:09,740 So let's move on and start doing-- 713 00:40:09,740 --> 00:40:12,660 this is the hard stuff. 714 00:40:12,660 --> 00:40:17,430 Here is the bad start, which is challenging enough. 715 00:40:17,430 --> 00:40:20,160 Even this little piece is going to be a little bit challenging 716 00:40:20,160 --> 00:40:22,680 to describe. 717 00:40:22,680 --> 00:40:26,230 Just rewriting from the previous slide. 718 00:40:26,230 --> 00:40:28,050 So we're trying to make R1, which 719 00:40:28,050 --> 00:40:30,540 is generating all the strings except the rejecting 720 00:40:30,540 --> 00:40:34,230 computation history for M on w. 721 00:40:34,230 --> 00:40:35,940 It's in those three parts. 722 00:40:35,940 --> 00:40:39,150 Right now I'm describing the bad start piece. 723 00:40:39,150 --> 00:40:41,820 So that's going to describe all strings that 724 00:40:41,820 --> 00:40:46,630 don't start with this C1. 725 00:40:46,630 --> 00:40:47,880 So let me write that out here. 726 00:40:47,880 --> 00:40:49,650 This is going to generate all strings that 727 00:40:49,650 --> 00:40:56,520 don't start with C start or C1, which is as specified. 728 00:40:56,520 --> 00:40:57,460 Looks like this. 729 00:40:57,460 --> 00:41:01,320 So any string that doesn't start with these symbols, doesn't 730 00:41:01,320 --> 00:41:06,610 start exactly like this, should be 731 00:41:06,610 --> 00:41:12,220 described by bad start, that regular expression. 732 00:41:12,220 --> 00:41:17,620 So that, in itself, is going to be further subdivided. 
733 00:41:17,620 --> 00:41:22,600 And the reason for that is not that hard to understand. 734 00:41:22,600 --> 00:41:24,490 I'm going to-- bad start is going 735 00:41:24,490 --> 00:41:31,600 to accomplish its goal by saying, well, 736 00:41:31,600 --> 00:41:34,630 anything that doesn't start this way either 737 00:41:34,630 --> 00:41:38,260 doesn't start with a q0, or doesn't 738 00:41:38,260 --> 00:41:40,570 have a w1 in the next place, or doesn't 739 00:41:40,570 --> 00:41:42,460 have a w2 in the next place. 740 00:41:42,460 --> 00:41:48,220 Or somewhere along the way, it has a wrong symbol. 741 00:41:48,220 --> 00:41:52,960 Each one of these guys is going to be about one 742 00:41:52,960 --> 00:41:58,973 of those symbols being wrong in some particular place. 743 00:41:58,973 --> 00:42:00,890 So I'm going to show you what those look like. 744 00:42:00,890 --> 00:42:07,750 So right now, I'm going to focus my attention 745 00:42:07,750 --> 00:42:12,960 on describing all strings except for this one. 746 00:42:12,960 --> 00:42:18,150 All strings that start with something except for this one. 747 00:42:20,970 --> 00:42:23,310 So just remember, delta is the alphabet 748 00:42:23,310 --> 00:42:26,070 for the computation histories. 749 00:42:26,070 --> 00:42:28,950 And some notation here, delta sub epsilon, 750 00:42:28,950 --> 00:42:30,450 we've seen this before, is you're 751 00:42:30,450 --> 00:42:35,490 going to add in epsilon as an allowed thing for delta. 752 00:42:35,490 --> 00:42:39,600 So it's all the symbols, or epsilon, now 753 00:42:39,600 --> 00:42:41,740 thought of as a set here. 754 00:42:41,740 --> 00:42:45,390 And furthermore, it's going to be convenient to talk about all 755 00:42:45,390 --> 00:42:49,080 of the symbols in delta, except for some symbol. 756 00:42:49,080 --> 00:42:51,570 So like at the very beginning, q0. 757 00:42:51,570 --> 00:42:55,092 I want to talk about all of the symbols except for the q0 symbol.
758 00:42:55,092 --> 00:42:56,550 Because that's what I'm going to be 759 00:42:56,550 --> 00:43:02,130 using to start off R bad-start. 760 00:43:02,130 --> 00:43:05,590 It's going to be anything except for q0. 761 00:43:05,590 --> 00:43:07,370 So let's just see how that looks. 762 00:43:07,370 --> 00:43:11,890 So here is S0, the very first part of R bad-start. 763 00:43:11,890 --> 00:43:14,420 It's going to say-- 764 00:43:14,420 --> 00:43:18,430 I'm trying to color the active ingredient here 765 00:43:18,430 --> 00:43:22,020 in the pink color. 766 00:43:22,020 --> 00:43:29,980 So delta, with q0 removed, followed by anything. 767 00:43:29,980 --> 00:43:31,570 So this little regular expression 768 00:43:31,570 --> 00:43:36,460 here describes all strings that don't start with a q0, 769 00:43:36,460 --> 00:43:38,110 as I'm indicating over here. 770 00:43:38,110 --> 00:43:40,930 All strings that don't start with a q0 771 00:43:40,930 --> 00:43:45,750 is what S0 describes. 772 00:43:45,750 --> 00:43:47,782 You have to understand that, because it's just 773 00:43:47,782 --> 00:43:48,990 going to build up from there. 774 00:43:51,880 --> 00:43:53,820 So what do we want to say for S1? 775 00:43:53,820 --> 00:43:55,860 What's going to be all strings that don't 776 00:43:55,860 --> 00:43:58,990 have w1 in the second place? 777 00:43:58,990 --> 00:44:01,960 So I'm going to write that over here. 778 00:44:01,960 --> 00:44:06,520 S1 is anything in the first place-- 779 00:44:06,520 --> 00:44:09,400 I mean, if the first place was wrong, S0 took care of it. 780 00:44:09,400 --> 00:44:11,400 So I'm just going to keep my life simple. 781 00:44:11,400 --> 00:44:15,030 All I want to do is describe all of the places 782 00:44:15,030 --> 00:44:17,400 where the second symbol is wrong. 783 00:44:17,400 --> 00:44:19,530 Namely, it's not w1.
784 00:44:19,530 --> 00:44:22,600 So anything in the first place, something 785 00:44:22,600 --> 00:44:27,490 besides w1 in the next place, and then anything at all 786 00:44:27,490 --> 00:44:28,150 afterward. 787 00:44:28,150 --> 00:44:30,460 Those are all strings that don't have-- 788 00:44:30,460 --> 00:44:34,840 [AUDIO CUTS] 789 00:44:34,840 --> 00:44:37,090 So I'll write it over here like that. 790 00:44:37,090 --> 00:44:41,840 Now S2 similarly is going to be-- since I have exponentiation, 791 00:44:41,840 --> 00:44:45,100 let's use that for convenience. 792 00:44:45,100 --> 00:44:47,930 Delta delta, or just delta squared. 793 00:44:47,930 --> 00:44:53,645 So anything in the first two places, then not w2 in 794 00:44:53,645 --> 00:44:55,640 the next place, and then anything. 795 00:44:55,640 --> 00:44:57,540 So that's going to capture this part. 796 00:44:57,540 --> 00:44:59,630 So this is what these S's do, and you 797 00:44:59,630 --> 00:45:01,622 can sort of get the idea. 798 00:45:01,622 --> 00:45:02,330 So dot, dot, dot. 799 00:45:02,330 --> 00:45:04,790 This Sn is going to describe everything 800 00:45:04,790 --> 00:45:08,675 except for wn in that location, which 801 00:45:08,675 --> 00:45:12,590 is going to be the n plus first location, actually. 802 00:45:12,590 --> 00:45:18,620 And now I have to continue on doing that for the blanks. 803 00:45:18,620 --> 00:45:24,380 So now, if you think with me, let's just 804 00:45:24,380 --> 00:45:26,120 take a look how that could go. 805 00:45:29,630 --> 00:45:34,790 The next symbol, which is skipping over the n plus 1 806 00:45:34,790 --> 00:45:38,440 that I've already taken care of, I 807 00:45:38,440 --> 00:45:41,890 want to say it's not a blank symbol in this very 808 00:45:41,890 --> 00:45:44,450 first location after the input. 809 00:45:44,450 --> 00:45:46,660 So again, I'm describing these non-- 810 00:45:46,660 --> 00:45:49,420 these strings which are not the start configuration.
811 00:45:49,420 --> 00:45:53,198 It could fail because there's not a blank where 812 00:45:53,198 --> 00:45:54,490 there's supposed to be a blank. 813 00:45:57,190 --> 00:45:59,380 Suppose I do that for each one of these guys. 814 00:46:02,790 --> 00:46:05,570 That would work. 815 00:46:05,570 --> 00:46:06,560 But. 816 00:46:06,560 --> 00:46:07,370 But what? 817 00:46:10,250 --> 00:46:13,060 Think. 818 00:46:13,060 --> 00:46:17,260 This is actually not going to be a good solution for us. 819 00:46:17,260 --> 00:46:21,700 Because there are exponentially many blanks over here. 820 00:46:21,700 --> 00:46:24,652 This is a hugely long configuration. 821 00:46:24,652 --> 00:46:26,360 And so there's exponentially many blanks. 822 00:46:26,360 --> 00:46:30,890 If I do it this way, I'm going to end up with an exponentially 823 00:46:30,890 --> 00:46:32,990 large regular expression. 824 00:46:32,990 --> 00:46:35,970 And that's not doable in polynomial time. 825 00:46:35,970 --> 00:46:39,380 So I have a more complicated way of getting the same effect. 826 00:46:39,380 --> 00:46:40,970 Which is-- I don't really expect you 827 00:46:40,970 --> 00:46:43,010 to fully parse through this right 828 00:46:43,010 --> 00:46:46,250 now, in real time in lecture, but let me try to help you. 829 00:46:46,250 --> 00:46:50,030 What I'm going to do is skip over these first initial n 830 00:46:50,030 --> 00:46:53,120 plus 1 places, and then a variable number 831 00:46:53,120 --> 00:46:58,310 of places, which is indicated by the next piece here. 832 00:46:58,310 --> 00:47:00,040 And the way that works is-- 833 00:47:00,040 --> 00:47:02,110 these are all strings of length n 834 00:47:02,110 --> 00:47:09,370 plus 1 through the end of the configuration. 835 00:47:09,370 --> 00:47:13,930 And to understand that, it's almost a little 836 00:47:13,930 --> 00:47:16,630 too technical to even try, but let's see. 837 00:47:16,630 --> 00:47:19,840 If I put delta to the 7, that's all strings of length 7. 
838 00:47:19,840 --> 00:47:22,840 But if I put delta sub epsilon to the 7, 839 00:47:22,840 --> 00:47:24,400 if you think about what that means, 840 00:47:24,400 --> 00:47:28,120 that's all strings of length between 0 and 7. 841 00:47:31,530 --> 00:47:33,660 Because I can either have it as epsilon 842 00:47:33,660 --> 00:47:37,110 as my variable or a symbol from delta. 843 00:47:37,110 --> 00:47:39,360 And so that's what I'm doing over here. 844 00:47:39,360 --> 00:47:45,240 I'm getting a variable length space, spacer of deltas, 845 00:47:45,240 --> 00:47:48,420 that are going to then end up at a certain location-- 846 00:47:48,420 --> 00:47:50,670 I'm going to say at that place. 847 00:47:50,670 --> 00:47:53,430 Then I have a non-blank. 848 00:47:53,430 --> 00:47:56,220 Because all I need to do is describe 849 00:47:56,220 --> 00:48:02,440 the strings that fail to have a blank somewhere in this range. 850 00:48:02,440 --> 00:48:04,680 So we've got to sort of have a variable spacer 851 00:48:04,680 --> 00:48:10,800 out to that spot, where that missing blank might be. 852 00:48:10,800 --> 00:48:12,850 So that's what this describes. 853 00:48:12,850 --> 00:48:15,420 If you didn't get that, don't worry. 854 00:48:15,420 --> 00:48:17,010 That is a technical point and you can 855 00:48:17,010 --> 00:48:19,620 try to think about it offline. 856 00:48:19,620 --> 00:48:25,380 And then at the very end, I'm going to describe what happens. 857 00:48:25,380 --> 00:48:27,600 Describe the strings that fail to have 858 00:48:27,600 --> 00:48:30,510 a hashtag in that location. 859 00:48:30,510 --> 00:48:36,180 It's how I describe all strings that don't start right. 860 00:48:36,180 --> 00:48:39,330 That's a lot of work, just to do that little piece. 861 00:48:39,330 --> 00:48:42,540 Fortunately, the next two pieces are easier, surprisingly. 862 00:48:46,110 --> 00:48:50,160 You can jump in with a question, but maybe I 863 00:48:50,160 --> 00:48:53,580 should move, push on.
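The bad-start pieces above can be tried out on a toy instance. The following is only a sketch under assumptions: the five-symbol alphabet DELTA, the helper names any_sym, all_but, and bad_start, and the small configuration length L (standing in for 2 to the n to the k) are all made up for illustration, and Python's re syntax is used in place of the lecture's notation, with `{i}` playing the role of exponentiation and `(?:X?){m}` playing the role of delta sub epsilon to the m.

```python
import re

# Toy alphabet (an assumption): state q, tape symbols a and b, blank _, delimiter #.
DELTA = "qab_#"


def any_sym():
    """Character class for any single symbol of DELTA."""
    return "[" + re.escape(DELTA) + "]"


def all_but(c):
    """Character class for DELTA with the symbol c removed."""
    return "[" + re.escape(DELTA.replace(c, "")) + "]"


def bad_start(w, L):
    """Regex for strings over DELTA that do not begin with q w _..._ # (config length L)."""
    n = len(w)
    d = any_sym()
    parts = [all_but("q") + d + "*"]                      # S_0: first symbol is not q
    for i, c in enumerate(w, start=1):                    # S_i: position i is not w_i
        parts.append(f"{d}{{{i}}}{all_but(c)}{d}*")
    # S_blank: skip n+1 symbols, then a spacer of 0..(L-n-2) symbols, then a non-blank
    parts.append(f"{d}{{{n + 1}}}(?:{d}?){{{L - n - 2}}}{all_but('_')}{d}*")
    parts.append(f"{d}{{{L}}}{all_but('#')}{d}*")         # S_#: position L is not #
    return "|".join(f"(?:{p})" for p in parts)
```

With w = "ab" and L = 6, the correct start is "qab___#". The pattern from bad_start("ab", 6), used with re.fullmatch, rejects any string beginning that way and matches any string in which one of those seven positions has been changed. The single spacer term covers all the blank positions at once, which is the point of the trick: the expression stays short even when the number of blanks is huge, provided the exponent is written as a number rather than spelled out.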
864 00:48:53,580 --> 00:48:58,580 So now I'm going to describe the bad move and bad reject pieces. 865 00:48:58,580 --> 00:49:02,150 And bad reject generates all strings 866 00:49:02,150 --> 00:49:06,200 that don't contain the q reject symbol. 867 00:49:06,200 --> 00:49:07,850 So that's going to certainly describe 868 00:49:07,850 --> 00:49:11,580 all of the strings that don't end correctly. 869 00:49:11,580 --> 00:49:15,870 And that's just simply the delta with the q 870 00:49:15,870 --> 00:49:19,730 reject symbol removed, and then any string of those. 871 00:49:19,730 --> 00:49:22,140 That's all strings that don't have q reject. 872 00:49:22,140 --> 00:49:26,360 So that's going to describe all strings that 873 00:49:26,360 --> 00:49:28,713 don't end with a q reject, plus some other junk 874 00:49:28,713 --> 00:49:29,630 strings along the way. 875 00:49:29,630 --> 00:49:33,500 But that's never a problem, 876 00:49:33,500 --> 00:49:36,737 to put in other strings that you might be capturing 877 00:49:36,737 --> 00:49:38,570 in some other part of the regular expression 878 00:49:38,570 --> 00:49:40,220 that you know are bad strings. 879 00:49:40,220 --> 00:49:43,400 You just want to make sure you don't put in that one uniquely 880 00:49:43,400 --> 00:49:45,500 good string, which is the rejecting computation 881 00:49:45,500 --> 00:49:47,700 history, good string. 882 00:49:47,700 --> 00:49:51,470 And lastly, we're going to use the notion 883 00:49:51,470 --> 00:49:54,512 of the neighborhoods. 884 00:49:54,512 --> 00:49:56,220 You might think this is the hardest part, 885 00:49:56,220 --> 00:49:57,820 but in fact not that hard. 886 00:49:57,820 --> 00:50:01,650 So these are all of the strings that 887 00:50:01,650 --> 00:50:08,160 have somewhere along the way a violation according 888 00:50:08,160 --> 00:50:09,480 to M's rules. 889 00:50:09,480 --> 00:50:11,580 You want to describe all of those as well.
890 00:50:11,580 --> 00:50:13,720 I'm going to do that in terms of the neighborhoods. 891 00:50:13,720 --> 00:50:19,320 But the neighborhoods are going to be stretched out. 892 00:50:19,320 --> 00:50:22,460 We don't have a tableau anymore, so they're not 893 00:50:22,460 --> 00:50:25,910 so easily visualizable, but it's the same idea, 894 00:50:25,910 --> 00:50:26,820 the neighborhood. 895 00:50:26,820 --> 00:50:28,700 So this is abc and def. 896 00:50:28,700 --> 00:50:32,630 But now it's an illegal neighborhood. 897 00:50:32,630 --> 00:50:34,745 def does not follow from abc. 898 00:50:39,170 --> 00:50:41,330 If all the neighborhoods are legal, 899 00:50:41,330 --> 00:50:45,710 then the whole computation is a legitimate computation, 900 00:50:45,710 --> 00:50:48,060 provided it starts and ends correctly. 901 00:50:48,060 --> 00:50:50,030 So if it's not a legitimate computation, 902 00:50:50,030 --> 00:50:52,520 there's got to be an illegal neighborhood somewhere. 903 00:50:52,520 --> 00:50:55,250 And I'm going to just describe all strings that 904 00:50:55,250 --> 00:50:57,530 have an illegal neighborhood. 905 00:50:57,530 --> 00:51:00,350 And the interesting part is that you have to describe-- 906 00:51:00,350 --> 00:51:07,430 you have to place that separator between abc and def. 907 00:51:07,430 --> 00:51:10,550 So this is another place where we're going to critically use 908 00:51:10,550 --> 00:51:17,780 the exponentiation, and the fact that all of the configurations 909 00:51:17,780 --> 00:51:19,020 are the same length. 910 00:51:19,020 --> 00:51:20,420 That's what we're using there. 911 00:51:20,420 --> 00:51:23,720 We know exactly how far apart the bottom 912 00:51:23,720 --> 00:51:26,960 of the 2 by 3 neighborhood is from the top of the 2 913 00:51:26,960 --> 00:51:28,640 by 3 neighborhood. 914 00:51:28,640 --> 00:51:33,110 So we're going to take a union over all illegal 2 915 00:51:33,110 --> 00:51:35,900 by 3 neighborhoods. 
916 00:51:35,900 --> 00:51:37,520 Neighborhood settings, I should say. 917 00:51:37,520 --> 00:51:39,620 And there's only a fixed number of those, for the same reason 918 00:51:39,620 --> 00:51:41,360 that we had in the Cook-Levin theorem. 919 00:51:41,360 --> 00:51:44,210 There's a fixed number of those, depending upon the machine. 920 00:51:44,210 --> 00:51:45,920 And now we're going to have, say, we're 921 00:51:45,920 --> 00:51:48,050 going to start with anything. 922 00:51:48,050 --> 00:51:49,730 Here's the top of the neighborhood. 923 00:51:49,730 --> 00:51:53,480 Here is the separator that separates the top 924 00:51:53,480 --> 00:51:57,020 from the bottom in the two consecutive configurations, 925 00:51:57,020 --> 00:52:03,410 here's Ci going C i plus 1 inside my computation history. 926 00:52:03,410 --> 00:52:06,350 And then after that separator, I put 927 00:52:06,350 --> 00:52:09,365 in the second part of the neighborhood, which is the def. 928 00:52:13,020 --> 00:52:16,870 You have to really be comfortable with the way we've 929 00:52:16,870 --> 00:52:19,060 been presenting these other reductions up 930 00:52:19,060 --> 00:52:22,090 till now, to really get this. 931 00:52:22,090 --> 00:52:25,240 Anyway, I think we're at the break. 932 00:52:25,240 --> 00:52:27,940 So we can just take questions during the break, 933 00:52:27,940 --> 00:52:29,730 if you have any. 934 00:52:29,730 --> 00:52:35,910 And I will, otherwise, see you in five minutes. 935 00:52:35,910 --> 00:52:38,760 In my description back here-- 936 00:52:38,760 --> 00:52:40,020 let me just take this off. 937 00:52:42,810 --> 00:52:47,910 For bad reject, it looks like I'm doing kind of overkill, 938 00:52:47,910 --> 00:52:50,160 and maybe doing something wrong here. 939 00:52:50,160 --> 00:52:53,310 I'm describing all strings that don't have a reject anywhere. 
940 00:52:53,310 --> 00:52:57,930 But as long as I don't describe the legitimate rejecting 941 00:52:57,930 --> 00:53:02,200 computation history, I do describe all strings that 942 00:53:02,200 --> 00:53:07,490 don't end correctly, I'm good. 943 00:53:07,490 --> 00:53:10,670 I could go through more effort to make sure 944 00:53:10,670 --> 00:53:15,380 that I'm only describing the very last configuration here 945 00:53:15,380 --> 00:53:17,270 as not having the reject. 946 00:53:17,270 --> 00:53:22,465 But that would just be more work, 947 00:53:22,465 --> 00:53:23,840 and I don't need to do that work. 948 00:53:23,840 --> 00:53:25,673 So maybe it would be good just to understand 949 00:53:25,673 --> 00:53:28,172 why this is sufficient, what I've described here, 950 00:53:28,172 --> 00:53:30,005 and it's not going to cause me any problems. 951 00:53:37,010 --> 00:53:41,270 I'm getting a note from one of my TAs, Thomas, 952 00:53:41,270 --> 00:53:43,910 saying that the notion "bad" perhaps 953 00:53:43,910 --> 00:53:46,610 is confusing, because bad sounds like rejecting. 954 00:53:46,610 --> 00:53:47,720 Yes. 955 00:53:47,720 --> 00:53:52,490 I mean bad in the sense of not describing a legal computation 956 00:53:52,490 --> 00:53:53,150 history. 957 00:53:53,150 --> 00:53:54,525 If you can think of another name, 958 00:53:54,525 --> 00:53:59,090 I'm happy to switch that for future years. 959 00:53:59,090 --> 00:54:00,020 Too late for now. 960 00:54:00,020 --> 00:54:00,950 But, yeah. 961 00:54:00,950 --> 00:54:03,290 I don't mean that rejecting, I mean that it's-- 962 00:54:07,050 --> 00:54:08,910 well, I don't know what the right term is. 963 00:54:08,910 --> 00:54:11,040 Illegal? 964 00:54:11,040 --> 00:54:14,515 Or-- I'm not sure what a good-- 965 00:54:14,515 --> 00:54:16,140 How are the neighborhoods defined here? 966 00:54:16,140 --> 00:54:18,980 What is the tableau here? 967 00:54:18,980 --> 00:54:22,790 I think you do need to think about it after lecture. 
968 00:54:22,790 --> 00:54:25,580 But the tableau, you can think of the tableau 969 00:54:25,580 --> 00:54:28,030 now here just written out linearly. 970 00:54:28,030 --> 00:54:30,240 There are all of the rows now, instead of 971 00:54:30,240 --> 00:54:32,760 nicely organized into a table. 972 00:54:32,760 --> 00:54:35,557 They just appear consecutively, because I'm just 973 00:54:35,557 --> 00:54:36,390 trying to describe-- 974 00:54:36,390 --> 00:54:38,070 I need to do it to describe a string, 975 00:54:38,070 --> 00:54:40,080 whether my regular expression doesn't really 976 00:54:40,080 --> 00:54:41,520 make sense to think about. 977 00:54:41,520 --> 00:54:43,950 I mean you can fold it up into a tableau, if you like. 978 00:54:43,950 --> 00:54:47,220 And then abc and def will line up. 979 00:54:47,220 --> 00:54:50,280 But here, if you think about them written consecutively, 980 00:54:50,280 --> 00:54:52,740 this is exactly how far apart they end up being. 981 00:54:56,750 --> 00:55:02,570 Are there only polynomially many illegal neighborhoods? 982 00:55:02,570 --> 00:55:04,360 That's why I kind of corrected myself. 983 00:55:04,360 --> 00:55:07,030 It's not illegal neighborhoods that we're 984 00:55:07,030 --> 00:55:10,300 talking-- because the number of neighborhoods in this picture 985 00:55:10,300 --> 00:55:11,410 is vast. 986 00:55:11,410 --> 00:55:14,650 But the number of neighborhood settings, 987 00:55:14,650 --> 00:55:17,050 the way to set these values to abc, def. 988 00:55:20,950 --> 00:55:22,810 I mean these are symbols that can 989 00:55:22,810 --> 00:55:29,790 appear in a configuration of the machine. 990 00:55:29,790 --> 00:55:34,740 There's only a fixed number of symbols that can appear here, 991 00:55:34,740 --> 00:55:38,310 that depend upon the definition of the machine. 992 00:55:38,310 --> 00:55:39,720 So it's not only polynomial. 
993 00:55:39,720 --> 00:55:42,420 There's a constant number of these things, that 994 00:55:42,420 --> 00:55:45,190 only depends on the machine. 995 00:55:45,190 --> 00:55:47,560 So you have to think about what's going on. 996 00:55:47,560 --> 00:55:49,830 There's a lot-- this is a lot on the slide. 997 00:55:54,480 --> 00:55:56,040 Bad history for reject. 998 00:55:56,040 --> 00:55:57,900 It's a bad history for rejecting, 999 00:55:57,900 --> 00:55:59,310 somebody's suggesting. 1000 00:55:59,310 --> 00:56:01,765 Yeah, it's a bad history. 1001 00:56:04,930 --> 00:56:05,590 Fake news. 1002 00:56:05,590 --> 00:56:07,360 Maybe we should be fake. 1003 00:56:07,360 --> 00:56:09,010 Fake would be a good term. 1004 00:56:09,010 --> 00:56:10,360 No, that's not so good. 1005 00:56:10,360 --> 00:56:10,960 I don't know. 1006 00:56:17,600 --> 00:56:18,740 Yeah, 2 by 3. 1007 00:56:18,740 --> 00:56:21,380 The reason 2 by 3, is the right-- 1008 00:56:21,380 --> 00:56:22,880 Somebody's asking why 2 by 3. 1009 00:56:22,880 --> 00:56:25,670 2 by 3 is exactly the size you need 1010 00:56:25,670 --> 00:56:28,280 to say that, if all the 2 by 3 neighborhoods 1011 00:56:28,280 --> 00:56:33,620 are correct everywhere in the computation history, 1012 00:56:33,620 --> 00:56:36,590 then the whole history is going to be 1013 00:56:36,590 --> 00:56:38,480 consistent with the rules of M. It's 1014 00:56:38,480 --> 00:56:41,030 going to be a legal representation of a computation 1015 00:56:41,030 --> 00:56:42,350 of M. 1016 00:56:42,350 --> 00:56:48,500 So if the string, which is allegedly a computation 1017 00:56:48,500 --> 00:56:51,710 history, has a bad neighborhood somewhere, 1018 00:56:51,710 --> 00:56:55,140 bad 2 by 3 neighborhood somewhere, then-- 1019 00:56:55,140 --> 00:56:57,470 well if it's not a legal computation history, 1020 00:56:57,470 --> 00:57:00,920 it's got to have a bad neighborhood, 2 1021 00:57:00,920 --> 00:57:02,406 by 3 neighborhood somewhere. 
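To make the bad-move piece a bit more concrete, here is a minimal Python sketch under heavy simplifying assumptions. The alphabet `DELTA`, the rule `legal_window` (a pretend machine that never changes its tape), and the function names are all hypothetical stand-ins, not the lecture's construction for a real machine M. The sketch takes the union over all illegal 2 by 3 neighborhood settings of: top row abc, then a spacer of fixed length, then bottom row def. The fixed repetition count `{gap}` plays the role that exponentiation plays in the lecture's regular expression.

```python
import re
from itertools import product

DELTA = "01#"  # toy alphabet (assumption); '#' separates configurations


def legal_window(top, bottom):
    """Toy rule (assumption): the machine never changes its tape."""
    return top == bottom


def bad_move_regex(config_len):
    """Regex matching strings that contain some illegal 2 by 3 neighborhood."""
    if config_len < 2:
        raise ValueError("config_len must be at least 2")
    # abc and def start (config_len + 1) symbols apart; abc itself uses 3 of them.
    gap = (config_len + 1) - 3
    any_symbol = "[" + re.escape(DELTA) + "]"
    illegal = [
        re.escape(top) + any_symbol + "{%d}" % gap + re.escape(bottom)
        for top in map("".join, product(DELTA, repeat=3))
        for bottom in map("".join, product(DELTA, repeat=3))
        if not legal_window(top, bottom)
    ]
    # The number of alternatives depends only on DELTA, not on the input length.
    return re.compile("|".join(illegal))


def has_bad_move(history, config_len):
    """True if some top/bottom pair at the right distance is an illegal setting."""
    return bad_move_regex(config_len).search(history) is not None


if __name__ == "__main__":
    print(has_bad_move("0110#0110#0110#", 4))  # every row repeats the previous one
    print(has_bad_move("0110#0100#", 4))       # second row differs from the first
```

Using `search` rather than a full match corresponds to the leading and trailing delta star in the lecture's expression: anything may come before the top of the neighborhood and after the bottom.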
1022 00:57:06,390 --> 00:57:10,130 OK, let's move on. 1023 00:57:10,130 --> 00:57:13,970 Because I think we're out of time here. 1024 00:57:13,970 --> 00:57:16,220 Our timer is up. 1025 00:57:16,220 --> 00:57:18,440 We're going to shift gears now anyway. 1026 00:57:18,440 --> 00:57:22,803 So if you got a little lost in the previous proof, 1027 00:57:22,803 --> 00:57:24,720 we're going to talk about something different. 1028 00:57:24,720 --> 00:57:25,820 And in some ways, a little bit, I 1029 00:57:25,820 --> 00:57:27,500 think a little lighter, a little less technical. 1030 00:57:27,500 --> 00:57:28,542 And that's about oracles. 1031 00:57:36,570 --> 00:57:37,710 What are oracles? 1032 00:57:37,710 --> 00:57:39,000 Oracles are a simple thing. 1033 00:57:41,700 --> 00:57:48,008 But they are a useful concept for a number of reasons. 1034 00:57:48,008 --> 00:57:49,800 Especially because they're going to tell us 1035 00:57:49,800 --> 00:57:52,500 something interesting about methods, which may or may not 1036 00:57:52,500 --> 00:57:55,560 be useful for proving the P versus NP question, 1037 00:57:55,560 --> 00:57:57,990 when someday somebody hopefully does that. 1038 00:58:01,170 --> 00:58:03,030 What is an oracle? 1039 00:58:03,030 --> 00:58:05,340 An oracle is free information you're 1040 00:58:05,340 --> 00:58:07,050 going to give to a Turing machine, which 1041 00:58:07,050 --> 00:58:11,443 might affect the difficulty of solving problems. 1042 00:58:11,443 --> 00:58:13,860 And the way we're going to represent that free information 1043 00:58:13,860 --> 00:58:17,400 is, we're going to allow the Turing machine to test 1044 00:58:17,400 --> 00:58:21,740 membership in some specified language, 1045 00:58:21,740 --> 00:58:27,230 without charging for the work involved. 1046 00:58:30,970 --> 00:58:34,090 I'm going to allow you have any language at all, some language 1047 00:58:34,090 --> 00:58:38,810 A. 
And say a Turing machine with an oracle for A 1048 00:58:38,810 --> 00:58:43,085 is written this way, M with a superscript A. It's 1049 00:58:43,085 --> 00:58:46,370 a machine that has a black box that can answer questions. 1050 00:58:46,370 --> 00:58:50,540 Is some string, which the machine can choose, 1051 00:58:50,540 --> 00:58:52,680 in A or not? 1052 00:58:52,680 --> 00:58:56,340 And it gets that answer in one step, effectively for free. 1053 00:59:01,020 --> 00:59:04,320 So you can imagine, depending upon the language that you're 1054 00:59:04,320 --> 00:59:07,515 providing to the machine, that may or may not be useful. 1055 00:59:11,250 --> 00:59:15,800 For example, suppose I give you an oracle for the SAT language. 1056 00:59:15,800 --> 00:59:17,880 That can be very useful. 1057 00:59:17,880 --> 00:59:21,440 It could be very useful for deciding SAT, for example. 1058 00:59:21,440 --> 00:59:24,110 Because now you don't have to go through a brute force search 1059 00:59:24,110 --> 00:59:25,310 to solve SAT. 1060 00:59:25,310 --> 00:59:26,720 You just ask the oracle. 1061 00:59:26,720 --> 00:59:29,630 And the oracle is going to say, yes it's satisfiable, 1062 00:59:29,630 --> 00:59:31,430 or no it's not satisfiable. 1063 00:59:31,430 --> 00:59:36,230 But you can use that to solve other languages too, quickly. 1064 00:59:36,230 --> 00:59:40,670 Because anything that you can do in NP, you can reduce to SAT. 1065 00:59:40,670 --> 00:59:43,383 So you can convert it to a SAT question, which you can then 1066 00:59:43,383 --> 00:59:45,050 ship up to the oracle, and the oracle is 1067 00:59:45,050 --> 00:59:47,540 going to tell you the answer. 1068 00:59:47,540 --> 00:59:51,480 The word "oracle" already sort of conveys something magical. 
1069 00:59:51,480 --> 00:59:53,268 We're not really going to be concerned 1070 00:59:53,268 --> 00:59:55,310 with the operation of the oracle, so don't ask me 1071 00:59:55,310 --> 00:59:57,602 how does the oracle work, or what does it correspond to 1072 00:59:57,602 --> 00:59:58,320 in reality. 1073 00:59:58,320 --> 00:59:59,090 It doesn't. 1074 00:59:59,090 --> 01:00:01,280 It's just a mathematical device which 1075 01:00:01,280 --> 01:00:03,770 provides this free information to the Turing machine, which 1076 01:00:03,770 --> 01:00:05,870 enables it to compute certain things. 1077 01:00:05,870 --> 01:00:07,657 It turns out to be a useful concept. 1078 01:00:07,657 --> 01:00:09,740 It's used in cryptography, where you might imagine 1079 01:00:09,740 --> 01:00:13,250 the oracle could provide the factors to some number, 1080 01:00:13,250 --> 01:00:16,340 or the password to some system or something. 1081 01:00:16,340 --> 01:00:17,510 Free information. 1082 01:00:17,510 --> 01:00:19,680 And then what can you do with that? 1083 01:00:19,680 --> 01:00:22,475 So this is a notion that comes up in other places. 1084 01:00:27,660 --> 01:00:34,370 If we have an oracle, we can think of all of the things 1085 01:00:34,370 --> 01:00:37,220 that you can compute in polynomial time 1086 01:00:37,220 --> 01:00:39,020 relative to that oracle. 1087 01:00:39,020 --> 01:00:41,600 So that's what we-- 1088 01:00:41,600 --> 01:00:43,445 the terminology that people usually use 1089 01:00:43,445 --> 01:00:46,490 is sometimes called relativization, or computation 1090 01:00:46,490 --> 01:00:49,020 relative to having this extra information. 1091 01:00:49,020 --> 01:00:52,850 So P with an A oracle is all of the languages 1092 01:00:52,850 --> 01:00:54,740 that you can decide in polynomial time 1093 01:00:54,740 --> 01:00:59,450 if you have an oracle for A. Let's see. 1094 01:01:05,190 --> 01:01:05,785 Yeah.
1095 01:01:05,785 --> 01:01:07,410 Somebody's asking me, is it really free 1096 01:01:07,410 --> 01:01:10,326 or does it cost one unit? 1097 01:01:10,326 --> 01:01:12,947 Even just setting up the oracle and writing down the question 1098 01:01:12,947 --> 01:01:15,280 to the oracle is going to take you some number of steps. 1099 01:01:15,280 --> 01:01:17,447 So you're not going to be able to do an infinite number 1100 01:01:17,447 --> 01:01:19,690 of oracle calls in zero time. 1101 01:01:19,690 --> 01:01:22,198 So charging one step or zero steps, 1102 01:01:22,198 --> 01:01:23,490 not going to make a difference. 1103 01:01:23,490 --> 01:01:25,532 Because you still have to formulate the question. 1104 01:01:32,230 --> 01:01:34,608 As I pointed out, P with a SAT oracle-- 1105 01:01:34,608 --> 01:01:36,400 so all the things you can do in polynomial time 1106 01:01:36,400 --> 01:01:39,530 with a SAT oracle includes NP. 1107 01:01:39,530 --> 01:01:42,770 Does it perhaps include other stuff? 1108 01:01:42,770 --> 01:01:46,192 Or does it equal NP? 1109 01:01:46,192 --> 01:01:47,900 Would have been a good check-in question, 1110 01:01:47,900 --> 01:01:50,330 but I'm not going to ask that. 1111 01:01:50,330 --> 01:01:54,320 In fact, it seems like it contains other things too. 1112 01:01:54,320 --> 01:02:00,590 Because co-NP is also contained within P, given a SAT oracle. 1113 01:02:00,590 --> 01:02:04,310 Because the SAT oracle answers either yes or no, 1114 01:02:04,310 --> 01:02:07,050 depending upon the answer. 1115 01:02:07,050 --> 01:02:09,890 So if the formula is unsatisfiable, 1116 01:02:09,890 --> 01:02:12,980 the oracle is going to say no, it's not in the language. 1117 01:02:12,980 --> 01:02:16,640 And now you can do the complement of the SAT problem 1118 01:02:16,640 --> 01:02:17,190 as well. 1119 01:02:17,190 --> 01:02:18,580 The unsatisfiability problem. 1120 01:02:18,580 --> 01:02:21,620 So you can do all of co-NP in the same way.
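The point that a deterministic machine with a SAT oracle can simply flip the oracle's answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the oracle is modeled as a plain callable, `brute_force_sat_oracle` is a slow stand-in for the black box (a real oracle answers in one step), and the CNF encoding as lists of signed integers is a choice made here, not something from the lecture.

```python
from itertools import product


def brute_force_sat_oracle(query):
    """Stand-in for the black box: is the CNF formula satisfiable?

    query is (clauses, num_vars); a literal k > 0 means variable k,
    and -k means its negation (assumed encoding).
    """
    clauses, num_vars = query
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def decide_sat(query, oracle):
    """Deterministic machine with a SAT oracle: one query, answer passed through."""
    return oracle(query)


def decide_unsat(query, oracle):
    """Same machine, but it inverts the oracle's answer, so it decides co-SAT."""
    return not oracle(query)


if __name__ == "__main__":
    contradiction = ([[1], [-1]], 1)   # x1 AND (NOT x1)
    easy = ([[1, 2]], 2)               # x1 OR x2
    print(decide_unsat(contradiction, brute_force_sat_oracle))
    print(decide_unsat(easy, brute_force_sat_oracle))
```

Both deciders make a single oracle call and do constant extra work, which is the sense in which NP and co-NP both sit inside P with a SAT oracle once reductions to SAT are taken into account.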
1121 01:02:21,620 --> 01:02:25,840 You can also define NP relative to some oracle. 1122 01:02:25,840 --> 01:02:28,600 So all the things you can do with a non-deterministic Turing 1123 01:02:28,600 --> 01:02:30,910 machine, where all of the branches 1124 01:02:30,910 --> 01:02:33,400 separately have access. 1125 01:02:33,400 --> 01:02:35,710 And they can ask multiple questions, by the way, 1126 01:02:35,710 --> 01:02:38,840 of the oracle. 1127 01:02:38,840 --> 01:02:39,770 Independently. 1128 01:02:43,450 --> 01:02:46,690 Let's do another, a little bit of a more challenging example. 1129 01:02:46,690 --> 01:02:49,363 The MIN-FORMULA language, which I hope you 1130 01:02:49,363 --> 01:02:50,530 remember from your homework. 1131 01:02:53,510 --> 01:02:57,710 So those are all of the formulas that do not have 1132 01:02:57,710 --> 01:03:01,850 a shorter equivalent formula. 1133 01:03:01,850 --> 01:03:03,360 They are minimal. 1134 01:03:03,360 --> 01:03:06,030 You cannot make a smaller formula that's equivalent that 1135 01:03:06,030 --> 01:03:09,950 gives you the same Boolean function. 1136 01:03:09,950 --> 01:03:14,170 So you showed, for example, that that language is in PSPACE, 1137 01:03:14,170 --> 01:03:15,430 as I recall. 1138 01:03:15,430 --> 01:03:17,440 And there was some other-- you had another two 1139 01:03:17,440 --> 01:03:18,648 problems about that language. 1140 01:03:23,200 --> 01:03:26,580 The complement of the MIN-FORMULA problem 1141 01:03:26,580 --> 01:03:28,800 is in NP with a SAT oracle. 1142 01:03:31,640 --> 01:03:34,310 So mull that over for a second and then we'll see why. 1143 01:03:42,040 --> 01:03:49,900 Here's an algorithm, an NP with a SAT oracle algorithm. 1144 01:03:49,900 --> 01:03:57,470 So in other words, now I want to kind of implement 1145 01:03:57,470 --> 01:03:59,810 that strategy, which I argued in the homework problem 1146 01:03:59,810 --> 01:04:00,710 was not legal.
1147 01:04:00,710 --> 01:04:02,330 But now that I have the SAT oracle, 1148 01:04:02,330 --> 01:04:05,600 it's going to make it possible where before it 1149 01:04:05,600 --> 01:04:06,980 was not possible. 1150 01:04:10,230 --> 01:04:12,950 So let's just understand what I mean by that. 1151 01:04:15,730 --> 01:04:20,920 If I'm trying to do the non-minimal formulas, 1152 01:04:20,920 --> 01:04:29,050 namely the formulas that do have a shorter equivalent formula. 1153 01:04:29,050 --> 01:04:31,630 I'm going to guess that shorter formula, called psi. 1154 01:04:35,050 --> 01:04:38,300 The challenge before was testing whether that shorter formula 1155 01:04:38,300 --> 01:04:41,400 was actually equivalent to phi. 1156 01:04:41,400 --> 01:04:46,040 Because that's not obviously doable in polynomial time. 1157 01:04:46,040 --> 01:04:51,620 But the equivalence problem for formulas is a co-NP problem. 1158 01:04:51,620 --> 01:04:53,630 Or if you like to think about it the other way, 1159 01:04:53,630 --> 01:04:57,440 formula inequivalence is an NP problem, 1160 01:04:57,440 --> 01:04:59,360 because you just have to-- the witness 1161 01:04:59,360 --> 01:05:01,460 is the assignment on which they disagree. 1162 01:05:05,370 --> 01:05:09,940 So two formulas are equivalent if they never disagree. 1163 01:05:09,940 --> 01:05:11,295 And so that's a co-NP problem. 1164 01:05:15,560 --> 01:05:18,320 A SAT oracle can solve a co-NP problem. 1165 01:05:18,320 --> 01:05:22,310 Namely, the equivalence of the two formulas, the input formula 1166 01:05:22,310 --> 01:05:25,820 and the one that you non-deterministically guessed. 1167 01:05:25,820 --> 01:05:28,240 And if it turns out that they are equivalent, 1168 01:05:28,240 --> 01:05:30,820 a smaller formula is equivalent to the input formula, 1169 01:05:30,820 --> 01:05:34,568 you know the input formula is not minimal. 1170 01:05:34,568 --> 01:05:35,485 And so you can accept.
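Here is a minimal Python sketch of what a single branch of that guess-and-check strategy does, offered as an illustration rather than as the lecture's algorithm verbatim. The assumptions: formulas are nested tuples, size is the number of nodes, the non-deterministic guess is replaced by an explicitly supplied candidate `psi`, and `sat_oracle` is a brute-force stand-in for the black box. All names are hypothetical. The branch asks the oracle whether "phi differs from psi" is satisfiable and accepts exactly when the answer is no.

```python
from itertools import product

# Formulas (assumed encoding): ("var", name), ("not", f), ("and", f, g), ("or", f, g)


def variables(f):
    """Set of variable names occurring in formula f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))


def evaluate(f, env):
    """Truth value of f under the assignment env (dict from name to bool)."""
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    raise ValueError("unknown operator: %r" % (op,))


def size(f):
    """Number of nodes in the formula tree (one possible notion of length)."""
    return 1 if f[0] == "var" else 1 + sum(size(g) for g in f[1:])


def sat_oracle(f):
    """Brute-force stand-in for the SAT oracle."""
    names = sorted(variables(f))
    return any(evaluate(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))


def differ(f, g):
    """Formula that is satisfiable exactly when f and g disagree somewhere."""
    return ("or", ("and", f, ("not", g)), ("and", ("not", f), g))


def branch_accepts(phi, psi, oracle):
    """One branch: psi is the guessed formula; accept if shorter and equivalent."""
    return size(psi) < size(phi) and not oracle(differ(phi, psi))


if __name__ == "__main__":
    x, y = ("var", "x"), ("var", "y")
    phi = ("and", x, ("or", x, y))  # equivalent to x by absorption
    print(branch_accepts(phi, x, sat_oracle))
    print(branch_accepts(phi, y, sat_oracle))
```

A full non-deterministic machine would have one such branch per candidate psi shorter than phi, and would accept if any branch accepts.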
1171 01:05:39,210 --> 01:05:40,710 And if it gets the wrong formula, 1172 01:05:40,710 --> 01:05:43,350 it turns out not to be equivalent, 1173 01:05:43,350 --> 01:05:45,630 then you reject on that branch of the non-determinism, 1174 01:05:45,630 --> 01:05:47,200 just like we did before. 1175 01:05:47,200 --> 01:05:51,810 And if the formula really was minimal, none of the branches 1176 01:05:51,810 --> 01:05:54,340 is going to find a shorter equivalent formula. 1177 01:05:54,340 --> 01:05:58,920 So that's why this problem here is in NP with a SAT oracle. 1178 01:06:04,510 --> 01:06:08,920 So now we're going to try to investigate this on my-- 1179 01:06:08,920 --> 01:06:13,040 we're getting near the end of the lecture. 1180 01:06:13,040 --> 01:06:20,270 We're going to look at problems like, well, 1181 01:06:20,270 --> 01:06:22,700 suppose I compare P with a SAT oracle 1182 01:06:22,700 --> 01:06:25,980 and NP with a SAT oracle. 1183 01:06:25,980 --> 01:06:28,810 Could those be the same? 1184 01:06:28,810 --> 01:06:31,440 Well, there's reasons to believe those are not the same. 1185 01:06:34,050 --> 01:06:41,330 But could there be any A where P with A oracle 1186 01:06:41,330 --> 01:06:43,610 is the same as NP with an A oracle? 1187 01:06:43,610 --> 01:06:48,050 It seems like no, but actually that's wrong. 1188 01:06:48,050 --> 01:06:51,800 There is a language, there are languages for which 1189 01:06:51,800 --> 01:06:55,460 NP with that oracle and P with that oracle 1190 01:06:55,460 --> 01:06:57,260 are exactly the same. 1191 01:06:57,260 --> 01:06:59,660 And that actually is an interest-- it's not just 1192 01:06:59,660 --> 01:07:03,860 a curiosity, it actually has relevance to strategies 1193 01:07:03,860 --> 01:07:05,450 for solving P versus NP. 1194 01:07:08,330 --> 01:07:11,720 Hopefully I'll be able to have time to get to. 1195 01:07:11,720 --> 01:07:15,130 Can we think of an oracle like a hash table? 
1196 01:07:15,130 --> 01:07:17,335 I think hashing is somehow different in spirit. 1197 01:07:20,920 --> 01:07:23,080 I understand there's some similarity there, 1198 01:07:23,080 --> 01:07:24,190 but I don't see the-- 1199 01:07:24,190 --> 01:07:33,550 hashing is a way of finding sort of a short name for objects, 1200 01:07:33,550 --> 01:07:36,460 which has a variety of different purposes 1201 01:07:36,460 --> 01:07:38,340 why you might want to do that. 1202 01:07:38,340 --> 01:07:40,520 So I don't really think it's the same. 1203 01:07:40,520 --> 01:07:43,360 Let's see, an oracle question, OK, let's see. 1204 01:07:43,360 --> 01:07:46,030 How do we use SAT oracle to solve whether two formulas are 1205 01:07:46,030 --> 01:07:47,128 equivalent? 1206 01:07:51,195 --> 01:07:52,820 OK, this is getting back to this point. 1207 01:07:52,820 --> 01:07:55,790 How can we use a SAT oracle to solve whether two formulas are 1208 01:07:55,790 --> 01:07:58,690 equivalent? 1209 01:07:58,690 --> 01:08:03,010 Well, we can use a SAT oracle to solve any NP problem, 1210 01:08:03,010 --> 01:08:04,480 because it's reducible to SAT. 1211 01:08:07,080 --> 01:08:08,470 In other words-- 1212 01:08:08,470 --> 01:08:12,407 P with a SAT oracle contains all of NP, 1213 01:08:12,407 --> 01:08:14,490 so you have to make sure you understand that part. 1214 01:08:21,010 --> 01:08:23,569 If you have the clique problem, you can reduce. 1215 01:08:23,569 --> 01:08:27,010 If I give you a clique problem, which is an NP problem, 1216 01:08:27,010 --> 01:08:31,510 and I want to use the oracle to test if the formula-- 1217 01:08:31,510 --> 01:08:34,960 if the graph has a clique, I reduce that problem 1218 01:08:34,960 --> 01:08:37,029 to a SAT problem using the Cook-Levin theorem. 
1219 01:08:41,240 --> 01:08:43,520 And knowing that a clique of a certain size 1220 01:08:43,520 --> 01:08:46,500 is going to correspond to having a formula which is satisfiable, 1221 01:08:46,500 --> 01:08:49,770 now I can ask the oracle. 1222 01:08:49,770 --> 01:08:52,319 And if I can do NP problems, I can do co-NP problems, 1223 01:08:52,319 --> 01:08:54,527 because P is a deterministic class. 1224 01:08:54,527 --> 01:08:56,819 Even though it has an oracle, it's still deterministic. 1225 01:08:56,819 --> 01:08:58,830 It can invert the answer. 1226 01:08:58,830 --> 01:09:03,520 Something that non-deterministic machines cannot necessarily do. 1227 01:09:03,520 --> 01:09:05,229 So I don't know, maybe that's-- 1228 01:09:05,229 --> 01:09:08,109 let's move on. 1229 01:09:08,109 --> 01:09:11,939 So there's an oracle where P to the A 1230 01:09:11,939 --> 01:09:15,827 equals NP to the A, which kind of seems kind of amazing 1231 01:09:15,827 --> 01:09:16,410 at some level. 1232 01:09:16,410 --> 01:09:19,740 Because here's an oracle where the non-determinism-- if I 1233 01:09:19,740 --> 01:09:23,520 give you that oracle, non-determinism doesn't help. 1234 01:09:26,399 --> 01:09:28,939 And it's actually a language we've seen. 1235 01:09:28,939 --> 01:09:29,594 It's TQBF. 1236 01:09:32,470 --> 01:09:34,021 Why is that? 1237 01:09:34,021 --> 01:09:35,229 Well, here's the whole proof. 1238 01:09:38,220 --> 01:09:41,294 If I have NP with a TQBF oracle. 1239 01:09:44,840 --> 01:09:46,429 Let's just check each of these steps. 1240 01:09:49,380 --> 01:09:54,860 I claim I can do that with a non-deterministic polynomial 1241 01:09:54,860 --> 01:10:02,760 space machine, which no longer has an oracle. 1242 01:10:02,760 --> 01:10:06,210 The reason is that, if I have polynomial space, 1243 01:10:06,210 --> 01:10:10,170 I can answer questions about TQBF without needing an oracle. 
1244 01:10:10,170 --> 01:10:12,510 I have enough space just to answer the question directly 1245 01:10:12,510 --> 01:10:14,370 myself. 1246 01:10:14,370 --> 01:10:15,990 And I use my non-determinism here 1247 01:10:15,990 --> 01:10:20,610 to simulate the non-determinism of the NP machine. 1248 01:10:20,610 --> 01:10:24,450 So every time the NP machine branches non-deterministically, 1249 01:10:24,450 --> 01:10:26,850 so do I. Every time one of those branches 1250 01:10:26,850 --> 01:10:30,060 asks the oracle a TQBF question, I just 1251 01:10:30,060 --> 01:10:34,810 do my polynomial space algorithm to solve that question myself. 1252 01:10:34,810 --> 01:10:38,700 But now NPSPACE equals PSPACE by Savitch's theorem. 1253 01:10:38,700 --> 01:10:43,170 And because TQBF is PSPACE-complete, 1254 01:10:43,170 --> 01:10:47,070 for the very same reason that a SAT oracle allows 1255 01:10:47,070 --> 01:10:50,190 me to do every NP problem, a TQBF oracle 1256 01:10:50,190 --> 01:10:51,780 allows me to do every PSPACE problem. 1257 01:10:54,400 --> 01:10:59,310 And so I get NP contained within P for a TQBF oracle. 1258 01:10:59,310 --> 01:11:02,190 And of course, you get the containment the other way 1259 01:11:02,190 --> 01:11:03,340 immediately. 1260 01:11:03,340 --> 01:11:06,290 So they're equal. 1261 01:11:06,290 --> 01:11:08,035 What does that have to do with-- 1262 01:11:11,470 --> 01:11:14,707 somebody said-- well, I'll just-- 1263 01:11:14,707 --> 01:11:16,040 I don't want to run out of time. 1264 01:11:16,040 --> 01:11:19,810 So I'll take any questions at the end. 1265 01:11:19,810 --> 01:11:24,050 What has this got to do with the P versus NP problem? 1266 01:11:24,050 --> 01:11:27,140 OK, so this is a very interesting connection.
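Written out as one chain, the argument for the TQBF oracle just given looks roughly like this. The notation (superscripts for oracles, roman class names) is a choice made here and may differ from the slide.

```latex
\[
\mathrm{NP}^{\mathrm{TQBF}}
\;\subseteq\; \mathrm{NPSPACE}
\;=\; \mathrm{PSPACE}
\;\subseteq\; \mathrm{P}^{\mathrm{TQBF}}
\;\subseteq\; \mathrm{NP}^{\mathrm{TQBF}}
\]
```

Reading left to right: the first containment replaces each oracle call by a polynomial space computation of TQBF, the equality is Savitch's theorem, the next containment uses that TQBF is PSPACE-complete, and the last one holds because a deterministic oracle machine is a special case of a non-deterministic one. Since the chain starts and ends at the same class, every containment is an equality, and in particular P with a TQBF oracle equals NP with a TQBF oracle.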
1267 01:11:35,380 --> 01:11:39,600 Remember, we just showed through a combination 1268 01:11:39,600 --> 01:11:41,850 of today's lecture and yesterday's lecture, 1269 01:11:41,850 --> 01:11:49,470 and I guess Thursday's lecture, that this problem, 1270 01:11:49,470 --> 01:11:52,530 this equivalence problem, is not in PSPACE, 1271 01:11:52,530 --> 01:11:56,140 and therefore it's not in P, and therefore it's intractable. 1272 01:11:56,140 --> 01:11:57,140 That's what we just did. 1273 01:12:01,700 --> 01:12:04,280 We showed it's complete for a class, which 1274 01:12:04,280 --> 01:12:13,345 is provably outside of P, provably bigger than P. 1275 01:12:13,345 --> 01:12:14,720 That's the kind of thing we would 1276 01:12:14,720 --> 01:12:17,660 like to be able to do to separate P and NP. 1277 01:12:17,660 --> 01:12:20,510 We would like to show that some other problem is not 1278 01:12:20,510 --> 01:12:23,480 in P. Some other problem is intractable. 1279 01:12:23,480 --> 01:12:24,380 Namely, SAT. 1280 01:12:24,380 --> 01:12:27,560 If we could do SAT, then we're good. 1281 01:12:27,560 --> 01:12:29,990 We've solved P and NP. 1282 01:12:29,990 --> 01:12:33,260 So we already have an example of being able to do that. 1283 01:12:33,260 --> 01:12:37,020 Could we use the same method? 1284 01:12:37,020 --> 01:12:39,780 Which is something people did try to do many years ago, 1285 01:12:39,780 --> 01:12:45,900 to show that SAT is not in P. So what is that method really? 1286 01:12:45,900 --> 01:12:49,647 The guts of that method really comes from the hierarchy theorems. 1287 01:12:49,647 --> 01:12:52,230 That's where you were actually proving problems that are hard. 1288 01:12:52,230 --> 01:12:56,460 You're getting this problem through the hierarchy 1289 01:12:56,460 --> 01:13:02,830 construction that's provably outside of PSPACE. 1290 01:13:02,830 --> 01:13:05,490 And outside of P. 1291 01:13:05,490 --> 01:13:07,340 That's a diagonalization.
1292 01:13:07,340 --> 01:13:10,840 And if you look carefully at what's going on there-- 1293 01:13:10,840 --> 01:13:12,450 so we're going to say, no, we're not 1294 01:13:12,450 --> 01:13:15,240 going to be able to solve SAT, show SAT's outside of P 1295 01:13:15,240 --> 01:13:16,180 in the same way. 1296 01:13:16,180 --> 01:13:19,290 And the reason is, suppose we could. 1297 01:13:19,290 --> 01:13:22,450 Well the hierarchy theorems are proved by diagonalization. 1298 01:13:25,550 --> 01:13:30,260 What I mean by that is that in the hierarchy theorem, 1299 01:13:30,260 --> 01:13:34,430 there's a machine D, which is simulating some other machine, 1300 01:13:34,430 --> 01:13:39,210 M. To remember what's going on there, 1301 01:13:39,210 --> 01:13:41,700 remember that we made a machine which 1302 01:13:41,700 --> 01:13:44,580 is going to make its language different from the language 1303 01:13:44,580 --> 01:13:47,700 of every machine that's running with less space 1304 01:13:47,700 --> 01:13:48,600 or with less time. 1305 01:13:52,160 --> 01:13:54,680 That's how D was defined. 1306 01:13:54,680 --> 01:14:00,360 It wants to make sure its language cannot be done in less 1307 01:14:00,360 --> 01:14:01,090 space. 1308 01:14:01,090 --> 01:14:04,180 So it makes sure that its language is different. 1309 01:14:04,180 --> 01:14:07,860 It simulates the machines that use less space, 1310 01:14:07,860 --> 01:14:12,080 and does something different from what they do. 1311 01:14:12,080 --> 01:14:18,240 Well, that simulation-- if we had an oracle, 1312 01:14:18,240 --> 01:14:22,650 if we're trying to show that if we provide 1313 01:14:22,650 --> 01:14:25,860 both a simulator and the machine being simulated 1314 01:14:25,860 --> 01:14:29,340 with the same oracle, the simulation still works. 
1315 01:14:29,340 --> 01:14:33,600 Every time the machine you're simulating asks a question, 1316 01:14:33,600 --> 01:14:35,220 the simulator has the same oracle, 1317 01:14:35,220 --> 01:14:37,320 so it can also ask the same question, 1318 01:14:37,320 --> 01:14:39,490 and can still do the simulation. 1319 01:14:39,490 --> 01:14:43,830 So in other words, if you could prove P different from NP 1320 01:14:43,830 --> 01:14:46,290 using basically a simulation, which 1321 01:14:46,290 --> 01:14:48,990 is what a diagonalization is, then you 1322 01:14:48,990 --> 01:14:51,570 would be able to prove that P is different from NP 1323 01:14:51,570 --> 01:14:52,530 for every oracle. 1324 01:14:55,820 --> 01:14:58,670 So if you can prove P different from NP by a diagonalization, 1325 01:14:58,670 --> 01:15:00,290 that would also immediately prove 1326 01:15:00,290 --> 01:15:03,620 that P is different from NP for every oracle. 1327 01:15:03,620 --> 01:15:10,850 Because the argument is transparent to the oracle. 1328 01:15:10,850 --> 01:15:15,360 If you just put the oracle down, everything still works. 1329 01:15:15,360 --> 01:15:24,120 But-- here is the big but, it can't be that-- 1330 01:15:24,120 --> 01:15:27,990 we know that P with oracle A is-- 1331 01:15:27,990 --> 01:15:29,250 we know this is false. 1332 01:15:29,250 --> 01:15:33,220 We just exhibited an oracle for which they're equal. 1333 01:15:33,220 --> 01:15:35,280 It's not the case that P is different from NP 1334 01:15:35,280 --> 01:15:36,120 for every oracle. 1335 01:15:38,810 --> 01:15:41,480 Sometimes they're equal, for some oracles. 1336 01:15:41,480 --> 01:15:44,180 So something that's just basically a very 1337 01:15:44,180 --> 01:15:46,670 straightforward diagonalization, something that 1338 01:15:46,670 --> 01:15:48,860 at its core is a diagonalization, 1339 01:15:48,860 --> 01:15:51,320 is not going to be enough to solve P and NP.
1340 01:15:51,320 --> 01:15:53,210 Because otherwise it would prove that they're 1341 01:15:53,210 --> 01:15:54,335 different for every oracle. 1342 01:15:54,335 --> 01:15:57,320 And sometimes they're not different, for some oracles. 1343 01:16:00,460 --> 01:16:04,240 That's an important insight for what kind of a method 1344 01:16:04,240 --> 01:16:09,340 will not be adequate to prove P different from NP. 1345 01:16:09,340 --> 01:16:12,490 And this comes up all the time. 1346 01:16:12,490 --> 01:16:16,870 People propose hypothetical solutions where they're trying 1347 01:16:16,870 --> 01:16:18,220 to show P different from NP. 1348 01:16:18,220 --> 01:16:21,430 One of the very first things people ask is, well, 1349 01:16:21,430 --> 01:16:27,400 would that argument still work if you put an oracle there? 1350 01:16:27,400 --> 01:16:31,750 Often it does, which points out there was a flaw. 1351 01:16:31,750 --> 01:16:35,720 Anyway, last check-in. 1352 01:16:35,720 --> 01:16:38,830 So this is just a little test of your knowledge about oracles. 1353 01:16:41,780 --> 01:16:43,655 Why don't we-- in our remaining minute here. 1354 01:16:46,400 --> 01:16:48,230 Let's say 30 seconds. 1355 01:16:48,230 --> 01:16:51,770 And then we'll do a wrap on this, 1356 01:16:51,770 --> 01:16:54,290 and I'll point out which ones are right. 1357 01:16:54,290 --> 01:16:56,780 Oh boy, we're all over the place on this one. 1358 01:17:00,820 --> 01:17:02,170 You're liking them all. 1359 01:17:02,170 --> 01:17:08,100 Well, I guess the ones that are false are lagging slightly. 1360 01:17:15,630 --> 01:17:16,950 OK, let's conclude. 1361 01:17:21,310 --> 01:17:23,800 Did I give you enough time there? 1362 01:17:23,800 --> 01:17:24,430 Share results. 1363 01:17:30,160 --> 01:17:31,020 So, in fact-- 1364 01:17:37,510 --> 01:17:41,427 Yeah, so having an oracle for the complement 1365 01:17:41,427 --> 01:17:42,760 is the same as having an oracle for the language itself.
1366 01:17:42,760 --> 01:17:46,160 So this is certainly true. 1367 01:17:46,160 --> 01:17:50,220 NP with a SAT oracle equal to coNP with a SAT oracle, we have no reason 1368 01:17:50,220 --> 01:17:53,808 to believe that would be true, and we 1369 01:17:53,808 --> 01:17:54,850 don't know it to be true. 1370 01:17:54,850 --> 01:18:00,660 So B is not a good choice and that's the laggard here. 1371 01:18:00,660 --> 01:18:05,700 MIN-FORMULA, well, is in PSPACE, and anything in PSPACE 1372 01:18:05,700 --> 01:18:09,660 is reducible to TQBF, so this is certainly true. 1373 01:18:09,660 --> 01:18:14,370 And same thing for NP with TQBF and coNP with TQBF. 1374 01:18:14,370 --> 01:18:22,640 Once you have TQBF, you're going to get all of PSPACE. 1375 01:18:22,640 --> 01:18:27,450 And as we pointed out, this is going to be equal as well. 1376 01:18:27,450 --> 01:18:30,500 So why don't we end here. 1377 01:18:30,500 --> 01:18:34,910 And I think that's my last slide. 1378 01:18:34,910 --> 01:18:36,800 Oh no, there's my summary here. 1379 01:18:36,800 --> 01:18:38,150 This is what we've done. 1380 01:18:38,150 --> 01:18:43,888 And I will send you all off on your way. 1381 01:18:43,888 --> 01:18:45,680 How does the interaction between the Turing 1382 01:18:45,680 --> 01:18:46,847 machine and the oracle look? 1383 01:18:46,847 --> 01:18:49,490 Yeah, I didn't define exactly how the machine interacts 1384 01:18:49,490 --> 01:18:50,510 with an oracle. 1385 01:18:50,510 --> 01:18:52,580 You can imagine having a separate tape 1386 01:18:52,580 --> 01:18:54,110 where it writes the oracle question 1387 01:18:54,110 --> 01:18:56,960 and then goes into a special query state. 1388 01:18:56,960 --> 01:18:59,212 You can formalize it however you like. 1389 01:18:59,212 --> 01:19:01,670 They're all going to be-- any reasonable way of formalizing 1390 01:19:01,670 --> 01:19:06,500 it is going to come up with the same notion in the end. 1391 01:19:06,500 --> 01:19:10,070 It does show that P with a TQBF oracle equals PSPACE.
1392 01:19:10,070 --> 01:19:11,330 Yes, that is correct. 1393 01:19:11,330 --> 01:19:13,970 Good point. 1394 01:19:13,970 --> 01:19:16,220 Why do we need the oracle to be TQBF? 1395 01:19:16,220 --> 01:19:19,170 Wouldn't SAT also work because it could solve any NP problem? 1396 01:19:19,170 --> 01:19:23,010 So you're asking, does P with a SAT oracle 1397 01:19:23,010 --> 01:19:25,170 equal NP with a SAT oracle? 1398 01:19:25,170 --> 01:19:26,190 Not known. 1399 01:19:26,190 --> 01:19:29,670 And believed not to be true, but we don't have 1400 01:19:29,670 --> 01:19:32,460 a compelling reason for that. 1401 01:19:32,460 --> 01:19:34,290 No one has any idea how to do that. 1402 01:19:34,290 --> 01:19:42,630 Because, for example, we showed the complement of MIN-FORMULA 1403 01:19:42,630 --> 01:19:44,235 is in NP with a SAT oracle. 1404 01:19:47,050 --> 01:19:48,820 But no one knows how to do-- 1405 01:19:48,820 --> 01:19:51,070 because there are sort of two levels of non-determinism 1406 01:19:51,070 --> 01:19:51,570 there. 1407 01:19:51,570 --> 01:19:54,160 There's guessing the smaller formula, 1408 01:19:54,160 --> 01:19:59,070 and then guessing again to check the equivalence. 1409 01:19:59,070 --> 01:20:01,320 And they really can't be combined, because one of them 1410 01:20:01,320 --> 01:20:03,750 is sort of an exists-type guessing, 1411 01:20:03,750 --> 01:20:05,970 the other one is kind of a for-all-type guessing. 1412 01:20:11,010 --> 01:20:13,320 No one knows how to do that in polynomial time 1413 01:20:13,320 --> 01:20:14,190 with a SAT oracle. 1414 01:20:18,085 --> 01:20:19,960 Generally believed that they're not the same. 1415 01:20:23,250 --> 01:20:24,990 In the check-in, why was B false? 1416 01:20:24,990 --> 01:20:29,900 B is the same question. 1417 01:20:29,900 --> 01:20:35,750 Does NP with a SAT oracle equal coNP with a SAT oracle? 1418 01:20:35,750 --> 01:20:39,820 I'm not saying it's false, it's just not known to be true.
1419 01:20:39,820 --> 01:20:43,870 It doesn't follow from anything that we've shown so far. 1420 01:20:43,870 --> 01:20:47,500 And I think that would be something that-- 1421 01:20:47,500 --> 01:20:49,510 well, I guess it doesn't immediately 1422 01:20:49,510 --> 01:20:51,565 imply any famous open problem. 1423 01:20:54,790 --> 01:20:57,280 I wouldn't necessarily expect you 1424 01:20:57,280 --> 01:21:00,430 to know that it's an unsolved problem, but it is. 1425 01:21:04,870 --> 01:21:06,850 Could we have oracles for undecidable languages? 1426 01:21:06,850 --> 01:21:08,410 Absolutely. 1427 01:21:08,410 --> 01:21:09,370 Would it be helpful? 1428 01:21:09,370 --> 01:21:12,160 Well, if you're trying to solve an undecidable problem, 1429 01:21:12,160 --> 01:21:13,690 it would be helpful. 1430 01:21:13,690 --> 01:21:16,240 But people do study that. 1431 01:21:16,240 --> 01:21:22,690 In fact, the original concept of oracles was presented, 1432 01:21:22,690 --> 01:21:27,670 was derived in computability theory. 1433 01:21:27,670 --> 01:21:31,413 And as a side note, you can talk about reducibility. 1434 01:21:31,413 --> 01:21:32,830 No, I don't want to even go there. 1435 01:21:32,830 --> 01:21:33,910 Too complicated. 1436 01:21:37,130 --> 01:21:41,000 What is not known to be true? 1437 01:21:41,000 --> 01:21:44,210 What is not known to be true is that NP with a SAT oracle 1438 01:21:44,210 --> 01:21:48,980 equals coNP with a SAT oracle, or equals P with a SAT oracle. 1439 01:21:48,980 --> 01:21:53,120 Nothing is known except the obvious relations among those. 1440 01:21:53,120 --> 01:21:57,485 Those are all unknown, and just not known to be true or false. 1441 01:22:05,020 --> 01:22:08,730 Is NP with a SAT oracle equal to NP? 1442 01:22:08,730 --> 01:22:10,140 Probably not. 1443 01:22:10,140 --> 01:22:15,050 NP with a SAT oracle, for one thing, contains coNP. 1444 01:22:15,050 --> 01:22:16,700 Because it's even more powerful.
1445 01:22:16,700 --> 01:22:21,050 We pointed out that P with a SAT oracle contains coNP. 1446 01:22:21,050 --> 01:22:24,870 And so NP with a SAT oracle is going to be at least as good. 1447 01:22:24,870 --> 01:22:27,350 And so it's going to contain coNP. 1448 01:22:27,350 --> 01:22:29,810 And so, probably not going to be equal to 1449 01:22:29,810 --> 01:22:34,380 NP unless shockingly unexpected things 1450 01:22:34,380 --> 01:22:37,460 happen in our complexity world.
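The remark that P with a SAT oracle contains coNP can be made concrete with a small Python sketch. It uses UNSAT (the set of unsatisfiable formulas, a standard coNP-complete language) as the example. Everything here is an assumption for illustration: formulas are CNF in DIMACS-style integer lists, and `sat_oracle` is a brute-force stand-in for what would really be a black box answering in a single step; the names `sat_oracle` and `decide_unsat` are hypothetical.

```python
from itertools import product
from typing import List

CNF = List[List[int]]  # literal k means variable k is true, -k means it is false


def sat_oracle(cnf: CNF) -> bool:
    """Stand-in for the SAT oracle (brute force here; a real oracle is a black box)."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A clause is satisfied when at least one of its literals is true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf):
            return True
    return False


def decide_unsat(cnf: CNF) -> bool:
    """Deterministic procedure with one oracle query: ask SAT, then flip the answer."""
    return not sat_oracle(cnf)
```

Outside the single oracle call, `decide_unsat` does only constant work, which is the whole point: a deterministic machine can negate the oracle's answer, something a nondeterministic machine without an oracle is not known to be able to do.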