1 00:00:00,000 --> 00:00:02,450 [SQUEAKING] 2 00:00:02,450 --> 00:00:04,410 [RUSTLING] 3 00:00:04,410 --> 00:00:05,880 [CLICKING] 4 00:00:25,030 --> 00:00:28,220 MICHAEL SIPSER: Why don't we get started. 5 00:00:28,220 --> 00:00:31,270 So as I like to do, let's just review 6 00:00:31,270 --> 00:00:34,780 where we have been recently, which 7 00:00:34,780 --> 00:00:41,020 is to discuss context-free languages. 8 00:00:41,020 --> 00:00:44,020 We talked about the context-free grammars and the pushdown 9 00:00:44,020 --> 00:00:49,210 automata as a way of describing the context-free languages. 10 00:00:49,210 --> 00:00:52,540 As you remember, the context-free languages 11 00:00:52,540 --> 00:00:55,643 are a larger class of languages than the regular languages, 12 00:00:55,643 --> 00:00:57,310 which is where we started, the languages 13 00:00:57,310 --> 00:00:58,400 of the finite automata. 14 00:00:58,400 --> 00:01:00,910 So when you add a stack, you get more power. 15 00:01:00,910 --> 00:01:03,700 You get more languages that you can do. 16 00:01:06,860 --> 00:01:09,890 And we're very rapidly going to be moving on today 17 00:01:09,890 --> 00:01:13,130 to our main model for the semester, which 18 00:01:13,130 --> 00:01:14,730 is called the Turing machine. 19 00:01:14,730 --> 00:01:18,200 So let's just take a look at what we're 20 00:01:18,200 --> 00:01:19,790 going to be covering today. 21 00:01:19,790 --> 00:01:24,110 And that is, first, we're going to show 22 00:01:24,110 --> 00:01:30,470 a technique analogous to the one we use for proving 23 00:01:30,470 --> 00:01:33,470 that languages are not regular, but this time for proving 24 00:01:33,470 --> 00:01:35,630 languages are not context-free. 25 00:01:35,630 --> 00:01:38,390 So the pushdown automata and the grammars 26 00:01:38,390 --> 00:01:41,270 still have their limitations in terms of what we normally 27 00:01:41,270 --> 00:01:43,590 think a computer can do. 
28 00:01:43,590 --> 00:01:46,370 And with that, we're going to use that as a kind of a lead-in 29 00:01:46,370 --> 00:01:51,320 to our general-purpose model, which is the Turing machine. 30 00:01:51,320 --> 00:01:56,290 And so we're going to talk about Turing 31 00:01:56,290 --> 00:02:03,260 machines and aspects of that. 32 00:02:03,260 --> 00:02:05,030 I would want to comment-- 33 00:02:05,030 --> 00:02:08,820 so I have posted the solutions for the first problem set. 34 00:02:08,820 --> 00:02:11,320 I know you're starting to think about the second problem set 35 00:02:11,320 --> 00:02:14,980 now, which I have posted as well. 36 00:02:14,980 --> 00:02:17,500 If you want to get a sense of what I'm looking for in terms 37 00:02:17,500 --> 00:02:20,290 of the level of detail, you can look at the solutions 38 00:02:20,290 --> 00:02:22,120 to problem set one, because I consider 39 00:02:22,120 --> 00:02:23,630 those to be model solutions. 40 00:02:23,630 --> 00:02:25,733 That's part of the reason why I post them, just 41 00:02:25,733 --> 00:02:28,400 to give you a sense of the level of detail that I'm looking for, 42 00:02:28,400 --> 00:02:29,600 which is not a whole lot. 43 00:02:29,600 --> 00:02:31,630 But I do want to make sure you're 44 00:02:31,630 --> 00:02:35,560 capturing the main ideas of what's involved 45 00:02:35,560 --> 00:02:37,620 in solving the problem. 46 00:02:37,620 --> 00:02:40,220 So have a look at those. 47 00:02:40,220 --> 00:02:44,992 And for problem set two, which I'll talk about in a second-- 48 00:02:44,992 --> 00:02:46,200 so I'll just say a few words. 49 00:02:46,200 --> 00:02:49,603 If you want to pull that up, you can do that. 50 00:02:49,603 --> 00:02:51,770 But just to get you started on a few of the problems 51 00:02:51,770 --> 00:02:54,530 if you're finding some challenges there-- 52 00:02:54,530 --> 00:02:56,300 I don't want you to get stuck really 53 00:02:56,300 --> 00:02:59,630 before you even understand what the problem is saying. 
54 00:02:59,630 --> 00:03:02,697 So for problem number one, if you looked at that, 55 00:03:02,697 --> 00:03:04,280 so that's a problem where you're asked 56 00:03:04,280 --> 00:03:08,930 to prove a certain language is not context free. 57 00:03:08,930 --> 00:03:11,720 And by the way, all of the problems in this problem set 58 00:03:11,720 --> 00:03:15,500 except perhaps for the last, for number six, 59 00:03:15,500 --> 00:03:16,850 you'll be able to solve. 60 00:03:16,850 --> 00:03:19,340 We'll have enough material at the end of today's lecture 61 00:03:19,340 --> 00:03:21,750 to solve all of them. 62 00:03:21,750 --> 00:03:25,040 I believe that's right. 63 00:03:25,040 --> 00:03:27,020 Yeah, so number six, you should have 64 00:03:27,020 --> 00:03:31,020 enough as of Thursday's lecture to solve that. 65 00:03:31,020 --> 00:03:37,850 So problem number one, it's proving a language 66 00:03:37,850 --> 00:03:38,870 is not context free. 67 00:03:38,870 --> 00:03:41,037 So we're going to introduce a method for doing that. 68 00:03:41,037 --> 00:03:43,160 That method is going to come in handy. 69 00:03:43,160 --> 00:03:46,400 For parts B and C, if you look at the problem set, 70 00:03:46,400 --> 00:03:48,830 it has this strange-looking thing, sigma sigma 71 00:03:48,830 --> 00:03:50,930 sigma in parenthesis star. 72 00:03:50,930 --> 00:03:55,308 It's really just a regular expression that's very simple. 73 00:03:55,308 --> 00:03:56,975 You should just make sure you understand 74 00:03:56,975 --> 00:04:00,890 that that's a way of representing all strings whose 75 00:04:00,890 --> 00:04:04,070 length is a multiple of 3. 76 00:04:04,070 --> 00:04:06,710 And if I stick a sigma in front of that, 77 00:04:06,710 --> 00:04:12,560 it's all strings whose length is 1 plus a multiple of 3. 
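As a quick check on that reading of the two expressions, here is a small Python sketch. It assumes the alphabet sigma is {0, 1}, which the lecture does not specify, and the function names are made up for illustration.

```python
import re

# Assumption: sigma = {0, 1}. Any finite alphabet would work the same way.
SIGMA = "[01]"

# (sigma sigma sigma)* : strings whose length is a multiple of 3
MULTIPLE_OF_3 = re.compile(f"(?:{SIGMA}{SIGMA}{SIGMA})*")

# sigma (sigma sigma sigma)* : strings whose length is 1 plus a multiple of 3
ONE_PLUS_MULTIPLE_OF_3 = re.compile(f"{SIGMA}(?:{SIGMA}{SIGMA}{SIGMA})*")


def in_multiple_of_3(w: str) -> bool:
    """True if the whole string w matches (sigma sigma sigma)*."""
    return MULTIPLE_OF_3.fullmatch(w) is not None


def in_one_plus_multiple_of_3(w: str) -> bool:
    """True if the whole string w matches sigma (sigma sigma sigma)*."""
    return ONE_PLUS_MULTIPLE_OF_3.fullmatch(w) is not None
```

The empty string has length 0, a multiple of 3, so it matches the first expression but not the second.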
78 00:04:12,560 --> 00:04:16,880 So once you understand that, and if you 79 00:04:16,880 --> 00:04:20,420 think about what kinds of strings are in the language C2, 80 00:04:20,420 --> 00:04:22,460 it'll help you to understand what happens 81 00:04:22,460 --> 00:04:24,470 when you take those unions. 82 00:04:24,470 --> 00:04:29,300 And parts B and C are not intended to be very hard, 83 00:04:29,300 --> 00:04:32,540 but you just have to understand what's going on. 84 00:04:32,540 --> 00:04:34,850 Problem number two is about ambiguous grammars. 85 00:04:34,850 --> 00:04:38,310 I touched on that briefly in lecture. 86 00:04:38,310 --> 00:04:40,580 It's enough to solve the problem. 87 00:04:40,580 --> 00:04:42,200 The book has a little bit more detail 88 00:04:42,200 --> 00:04:48,470 about ambiguous languages, ambiguous grammars-- 89 00:04:48,470 --> 00:04:51,510 ambiguous grammars, I should say. 90 00:04:51,510 --> 00:04:55,040 And so this is a grammar that's supposed 91 00:04:55,040 --> 00:04:58,580 to represent a fragment of a programming language 92 00:04:58,580 --> 00:05:03,140 with if thens and if then elses. 93 00:05:03,140 --> 00:05:06,860 I'm sure you're all familiar with those kinds of constructs 94 00:05:06,860 --> 00:05:09,200 in programming languages. 95 00:05:09,200 --> 00:05:12,140 And there is a natural ambiguity that comes up 96 00:05:12,140 --> 00:05:13,730 in a programming language. 97 00:05:13,730 --> 00:05:18,500 If you have if some condition then statement one, 98 00:05:18,500 --> 00:05:20,240 else statement two, I presume you 99 00:05:20,240 --> 00:05:24,320 understand what the semantics of that is, what that means. 100 00:05:24,320 --> 00:05:27,210 And the tricky thing is that if you have-- 101 00:05:27,210 --> 00:05:29,660 those statements can themselves be if statements. 
102 00:05:29,660 --> 00:05:34,460 And so if you have the situation where you have if then 103 00:05:34,460 --> 00:05:39,930 and if then else is what follows that, the question is, 104 00:05:39,930 --> 00:05:41,120 where does the else attach? 105 00:05:41,120 --> 00:05:43,790 Is it to the second if or to the first if? 106 00:05:43,790 --> 00:05:46,715 So that's kind of a big hint on this problem, but that's OK. 107 00:05:52,680 --> 00:05:55,040 You need to take that and figure out 108 00:05:55,040 --> 00:05:58,340 how to get an actual member of the language which 109 00:05:58,340 --> 00:06:02,240 is ambiguously generated, and then show that it has-- 110 00:06:02,240 --> 00:06:07,070 show that it is by showing two parse trees or two 111 00:06:07,070 --> 00:06:07,953 leftmost derivations. 112 00:06:07,953 --> 00:06:10,370 If you read the book, you'll see that's an alternative way 113 00:06:10,370 --> 00:06:12,960 of representing a parse tree. 114 00:06:12,960 --> 00:06:15,740 So and then what you're supposed to do 115 00:06:15,740 --> 00:06:20,213 is give a grammar for the same language which is unambiguous. 116 00:06:20,213 --> 00:06:22,130 You don't have to prove that it's unambiguous, 117 00:06:22,130 --> 00:06:23,463 because that's a bit of a chore. 118 00:06:23,463 --> 00:06:25,490 But as long as you understand what's going on, 119 00:06:25,490 --> 00:06:28,160 you should be able to come up with an unambiguous grammar 120 00:06:28,160 --> 00:06:30,530 which resolves that ambiguity. 121 00:06:30,530 --> 00:06:33,140 And I don't have in mind changing the language 122 00:06:33,140 --> 00:06:36,170 by introducing new programming language 123 00:06:36,170 --> 00:06:38,930 constructs like a "begin end." 124 00:06:38,930 --> 00:06:40,670 That's not in the spirit of this problem, 125 00:06:40,670 --> 00:06:43,160 because that's a different-- it's grammar 126 00:06:43,160 --> 00:06:44,400 for a different language. 
127 00:06:44,400 --> 00:06:47,240 So you need to be generating the same language 128 00:06:47,240 --> 00:06:50,000 without any other extraneous things going on that are 129 00:06:50,000 --> 00:06:51,428 going to resolve the ambiguity. 130 00:06:51,428 --> 00:06:53,720 The ambiguity needs to be resolved within the structure 131 00:06:53,720 --> 00:06:55,860 of the grammar itself. 132 00:06:55,860 --> 00:06:58,570 So keep that in mind. 133 00:06:58,570 --> 00:07:01,900 For problem number three about the queue automata, 134 00:07:01,900 --> 00:07:06,070 you know, that came up actually as a suggestion last lecture, 135 00:07:06,070 --> 00:07:07,810 I believe, or two lectures back. 136 00:07:07,810 --> 00:07:10,750 What happens if you take a pushdown automaton, but instead 137 00:07:10,750 --> 00:07:12,520 of a pushdown-- 138 00:07:12,520 --> 00:07:15,520 instead of a stack, you add a queue. 139 00:07:15,520 --> 00:07:16,527 What happens then? 140 00:07:16,527 --> 00:07:18,610 Well actually, it turns out that the model you get 141 00:07:18,610 --> 00:07:19,350 is very powerful. 142 00:07:19,350 --> 00:07:21,100 And it turns out to be equivalent in power 143 00:07:21,100 --> 00:07:22,280 to a Turing machine. 144 00:07:22,280 --> 00:07:24,130 So you'll see arguments of that kind 145 00:07:24,130 --> 00:07:28,820 today, how you show that other models are equivalent-- no, not 146 00:07:28,820 --> 00:07:29,320 today. 147 00:07:33,460 --> 00:07:34,660 So I apologize. 148 00:07:34,660 --> 00:07:38,080 This is going to be something that you'll-- 149 00:07:38,080 --> 00:07:41,320 I'm confusing myself here. 150 00:07:41,320 --> 00:07:43,870 Problem number three actually needs Thursday's lecture 151 00:07:43,870 --> 00:07:47,347 as well to really at least see examples of how 152 00:07:47,347 --> 00:07:48,430 you do that kind of thing. 153 00:07:54,490 --> 00:07:58,270 Yeah, so I'll try to send out a note clarifying this. 
154 00:07:58,270 --> 00:08:01,030 By the end of Thursday, you'll be able to do everything, 155 00:08:01,030 --> 00:08:02,530 except for problem six. 156 00:08:02,530 --> 00:08:05,560 And for problem six, you'll need Tuesday's lecture, a week 157 00:08:05,560 --> 00:08:06,730 from today's lecture, to do. 158 00:08:11,840 --> 00:08:15,980 So problem number four, that one you'll 159 00:08:15,980 --> 00:08:18,200 be able to do at the end of today. 160 00:08:18,200 --> 00:08:19,670 That's also going to-- the problem 161 00:08:19,670 --> 00:08:22,543 is I'm working on preparing Thursday's lecture too. 162 00:08:22,543 --> 00:08:23,585 So I'm getting a little-- 163 00:08:23,585 --> 00:08:25,402 I'm confusing myself. 164 00:08:25,402 --> 00:08:27,110 Problem number four, you'll be able to do 165 00:08:27,110 --> 00:08:28,152 after Thursday's lecture. 166 00:08:28,152 --> 00:08:33,470 Maybe we should talk about that next lecture. 167 00:08:33,470 --> 00:08:39,620 Problem number five, you can do today, 168 00:08:39,620 --> 00:08:41,913 but maybe I'm not going to say anything about that. 169 00:08:41,913 --> 00:08:44,330 And problem number six, I won't say anything about either. 170 00:08:44,330 --> 00:08:46,850 OK, so why don't we just jump in then 171 00:08:46,850 --> 00:08:49,920 and look at today's material. 172 00:08:49,920 --> 00:08:52,490 What about seven? 173 00:08:52,490 --> 00:08:54,410 Oh, seven is an optional problem. 174 00:08:54,410 --> 00:08:56,270 Oh, I should have mentioned that. 175 00:08:56,270 --> 00:08:58,020 Seven is always going to be optional. 176 00:08:58,020 --> 00:08:59,810 I indicate that with a star. I should 177 00:08:59,810 --> 00:09:03,260 have made that clear on the actual description here, 178 00:09:03,260 --> 00:09:04,980 but seven is optional. 179 00:09:04,980 --> 00:09:06,860 It's just like we had for problem set one. 
180 00:09:09,382 --> 00:09:10,840 OK, let's move on, then, 181 00:09:10,840 --> 00:09:14,830 to what we're going to talk about today. 182 00:09:14,830 --> 00:09:17,750 And just a little bit of review-- 183 00:09:17,750 --> 00:09:19,900 so we talked about the equivalence 184 00:09:19,900 --> 00:09:22,600 of context-free grammars and pushdown automata, 185 00:09:22,600 --> 00:09:23,380 as you remember. 186 00:09:23,380 --> 00:09:27,190 Oops, let me get myself out of the picture here. 187 00:09:27,190 --> 00:09:29,620 As we mentioned last time, we actually proved 188 00:09:29,620 --> 00:09:32,033 one direction, but the other direction of that, 189 00:09:32,033 --> 00:09:33,700 you just have to know it's true, but you 190 00:09:33,700 --> 00:09:34,908 don't have to know the proof. 191 00:09:34,908 --> 00:09:36,910 The proof is a little bit lengthy, I would say. 192 00:09:36,910 --> 00:09:40,420 It's a nice proof, but it's pretty long. 193 00:09:40,420 --> 00:09:46,398 And there are two important corollaries to that. 194 00:09:46,398 --> 00:09:47,940 If you know what a corollary is, it's 195 00:09:47,940 --> 00:09:49,770 just a simple consequence which doesn't 196 00:09:49,770 --> 00:09:52,410 need much of a proof, sort of a very 197 00:09:52,410 --> 00:09:54,060 straightforward consequence. 198 00:09:54,060 --> 00:09:58,030 First of all, I think we pointed out last time, 199 00:09:58,030 --> 00:10:00,120 one conclusion, one corollary you get 200 00:10:00,120 --> 00:10:03,240 is that every regular language is a context-free language, 201 00:10:03,240 --> 00:10:10,110 because a finite automaton is a pushdown automaton that just 202 00:10:10,110 --> 00:10:12,150 happens not to use its stack. 203 00:10:12,150 --> 00:10:15,630 So immediately, you get that every regular language is context free. 
204 00:10:15,630 --> 00:10:19,770 And second of all, you also immediately get 205 00:10:19,770 --> 00:10:24,000 that whenever you have a context-free language 206 00:10:24,000 --> 00:10:27,830 and a regular language and you take their intersection, 207 00:10:27,830 --> 00:10:31,770 you get back a context-free language. 208 00:10:31,770 --> 00:10:36,960 So context free intersect regular is context free. 209 00:10:36,960 --> 00:10:40,020 That's actually mentioned in your homework as well as one 210 00:10:40,020 --> 00:10:45,960 of the 0.x problems which I give to try to get you-- 211 00:10:45,960 --> 00:10:48,840 you don't have to turn those in, but I suggest you look at them. 212 00:10:48,840 --> 00:10:50,890 I don't know how many of you are looking at them. 213 00:10:50,890 --> 00:10:52,980 But this is a useful fact. 214 00:10:52,980 --> 00:10:57,250 And some of those other facts in 0.x problems are useful. 215 00:10:57,250 --> 00:10:59,700 So I encourage you to look at them. 216 00:10:59,700 --> 00:11:02,790 But anyway, intersection of context free and regular 217 00:11:02,790 --> 00:11:04,060 is context free. 218 00:11:04,060 --> 00:11:06,450 You might ask, what about intersection of context 219 00:11:06,450 --> 00:11:08,070 free and context free? 220 00:11:08,070 --> 00:11:09,990 Do we have closure under intersection? 221 00:11:09,990 --> 00:11:12,570 The answer is, no, we do not have 222 00:11:12,570 --> 00:11:14,340 closure under intersection. 223 00:11:14,340 --> 00:11:16,330 We'll talk about that shortly. 224 00:11:16,330 --> 00:11:19,800 So here is the proof sketch for-- 225 00:11:19,800 --> 00:11:22,170 I wanted to say that the intersection of context 226 00:11:22,170 --> 00:11:25,260 free and regular, why do we know that's still context free? 
227 00:11:25,260 --> 00:11:30,990 Because the pushdown automaton for A 228 00:11:30,990 --> 00:11:34,080 can be simulating the finite automaton 229 00:11:34,080 --> 00:11:40,020 for B inside its finite control, inside its finite memory. 230 00:11:40,020 --> 00:11:42,690 The problem is, if you have two context-free languages, 231 00:11:42,690 --> 00:11:44,760 you have two pushdown automata, you 232 00:11:44,760 --> 00:11:47,190 can't simulate that with one pushdown automaton, 233 00:11:47,190 --> 00:11:49,925 because it has only a single stack. 234 00:11:49,925 --> 00:11:52,050 So if you're trying to take the intersection of two 235 00:11:52,050 --> 00:11:55,440 context-free languages with only a single stack, 236 00:11:55,440 --> 00:11:58,470 you're going to be in trouble, because it's hard to-- 237 00:11:58,470 --> 00:12:00,878 anyway, that's not a proof, but at least it 238 00:12:00,878 --> 00:12:03,420 shows you what goes wrong if you try to do the obvious thing. 239 00:12:06,220 --> 00:12:14,738 OK, so if-- and just, here is an important point 240 00:12:14,738 --> 00:12:16,030 that I was trying to make before. 241 00:12:16,030 --> 00:12:18,010 If A and B are both context free and you're 242 00:12:18,010 --> 00:12:22,060 taking the intersection, the result may not necessarily 243 00:12:22,060 --> 00:12:24,770 be a context-free language. 244 00:12:24,770 --> 00:12:26,860 So the class of context-free languages 245 00:12:26,860 --> 00:12:28,570 is not closed under intersection. 246 00:12:28,570 --> 00:12:31,060 We'll comment on that in a bit. 247 00:12:34,590 --> 00:12:38,000 The context-free languages are closed under the regular 248 00:12:38,000 --> 00:12:40,790 operations, however, union, intersection-- 249 00:12:40,790 --> 00:12:42,560 union, concatenation, and star. 250 00:12:42,560 --> 00:12:45,210 So you should feel comfortable that you 251 00:12:45,210 --> 00:12:47,300 know how to prove that. 
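The earlier point, that a pushdown automaton for A can run the finite automaton for B inside its finite control, can be sketched in Python. This is only an illustration under assumed choices that are not from the lecture: A is taken to be the strings 0^n 1^n, B is taken to be the strings with an even number of 0's, and the function name is made up. The single stack is used only for A, while B needs just one extra piece of finite state.

```python
def accepts_intersection(w: str) -> bool:
    """Recognize A intersect B with one stack plus finite state.

    Assumed example languages (not from the lecture):
      A = { 0^n 1^n : n >= 0 }          (needs the stack)
      B = strings with an even number of 0's  (needs only finite state)
    """
    stack = []        # the single stack, used only for A
    phase = "zeros"   # finite control for A: reading 0's, then 1's
    parity = 0        # finite control for B: number of 0's seen, mod 2

    for ch in w:
        if ch not in "01":
            return False

        # Step the finite automaton for B.
        if ch == "0":
            parity ^= 1

        # Step the stack machine for A on the same input symbol.
        if phase == "zeros" and ch == "0":
            stack.append("0")
        elif ch == "1":
            phase = "ones"
            if not stack:
                return False  # more 1's than 0's
            stack.pop()
        else:
            return False      # a 0 after the 1's have started

    # Accept only if both machines accept.
    return not stack and parity == 0
```

The same trick fails for two context-free languages because each one would want its own stack, which is the obstacle described above.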
252 00:12:47,300 --> 00:12:50,240 Again, it's one of the-- 253 00:12:50,240 --> 00:12:52,333 I think it's problem 0.2. 254 00:12:52,333 --> 00:12:54,750 And I think the solution is even given in the book for it. 255 00:12:54,750 --> 00:12:58,610 So you just should know how to prove that. 256 00:12:58,610 --> 00:13:00,710 It's pretty straightforward. 257 00:13:00,710 --> 00:13:08,750 OK, so let's move on then to basically conclude 258 00:13:08,750 --> 00:13:12,170 our work on context-free languages, 259 00:13:12,170 --> 00:13:17,600 to understand the limitations of context-free grammars, 260 00:13:17,600 --> 00:13:21,227 and what kinds of languages may not be context free. 261 00:13:21,227 --> 00:13:22,310 And how do you prove that? 262 00:13:22,310 --> 00:13:24,290 So how do you prove that, for some language, 263 00:13:24,290 --> 00:13:26,660 there is no grammar? 264 00:13:26,660 --> 00:13:28,310 Again, you know, it's not enough just 265 00:13:28,310 --> 00:13:36,880 to, say, give an informal comment that, I couldn't 266 00:13:36,880 --> 00:13:38,837 think of a grammar, or some-- 267 00:13:38,837 --> 00:13:39,670 things of that kind. 268 00:13:39,670 --> 00:13:41,128 That's not going to be good enough. 269 00:13:41,128 --> 00:13:42,600 We need to have a proof. 270 00:13:42,600 --> 00:13:46,355 So if we take the language here, 0 to the k, 1 to the k, 271 00:13:46,355 --> 00:13:48,650 2 to the k, so those are strings which 272 00:13:48,650 --> 00:13:51,350 are runs of 0's followed by an equal number of 1's 273 00:13:51,350 --> 00:13:54,320 followed by an equal number of 2's, so just 0's, 274 00:13:54,320 --> 00:13:58,920 then 1's, then 2's, all the same length. 275 00:13:58,920 --> 00:14:01,020 That's a language which is not going 276 00:14:01,020 --> 00:14:02,270 to be a context-free language. 277 00:14:02,270 --> 00:14:05,700 And we'll give a method for proving that. 
278 00:14:05,700 --> 00:14:10,110 If you had a stack, you can match the 1's with the 0's, but 279 00:14:10,110 --> 00:14:12,900 then once you're done with that, the stack is empty. 280 00:14:12,900 --> 00:14:16,380 And how do you now make sure that the number of 2's 281 00:14:16,380 --> 00:14:18,670 corresponds to the number of 1's that you had? 282 00:14:18,670 --> 00:14:21,060 So again, that's an informal argument 283 00:14:21,060 --> 00:14:23,980 that's not good enough to be a proof, 284 00:14:23,980 --> 00:14:26,220 but it sort of gives an intuition. 285 00:14:26,220 --> 00:14:32,670 So we're going to give a method for proving non-context-free-- 286 00:14:32,670 --> 00:14:38,130 languages are not context free using, again, a pumping lemma. 287 00:14:38,130 --> 00:14:40,290 But this is going to be a pumping lemma that 288 00:14:40,290 --> 00:14:41,910 applies to context-free languages, 289 00:14:41,910 --> 00:14:43,720 not to regular languages. 290 00:14:43,720 --> 00:14:46,320 It looks very similar, but it has some extra wrinkles 291 00:14:46,320 --> 00:14:50,640 thrown in, because the other older pumping lemma was 292 00:14:50,640 --> 00:14:52,203 specific to the regular languages. 293 00:14:52,203 --> 00:14:54,120 And this is going to be something that applies 294 00:14:54,120 --> 00:14:56,900 to the context-free languages. 295 00:14:56,900 --> 00:14:59,930 OK, so now let's just read it. 296 00:14:59,930 --> 00:15:03,480 And then we'll try to interpret it again. 297 00:15:03,480 --> 00:15:04,910 It's very similar in spirit. 298 00:15:04,910 --> 00:15:06,470 Basically, it says that, whenever 299 00:15:06,470 --> 00:15:08,150 you have a context-free language, 300 00:15:08,150 --> 00:15:10,850 all long strings in the language can 301 00:15:10,850 --> 00:15:13,428 be pumped in some kind of way. 302 00:15:13,428 --> 00:15:15,220 So it's going to be a little different kind 303 00:15:15,220 --> 00:15:17,890 of pumping than we had before. 
304 00:15:17,890 --> 00:15:20,360 And you stay in the language. 305 00:15:20,360 --> 00:15:25,540 OK, so before, we broke the string into three pieces 306 00:15:25,540 --> 00:15:28,510 where we could repeat that centerpiece as many times 307 00:15:28,510 --> 00:15:29,890 as you like. 308 00:15:29,890 --> 00:15:31,210 And you stay in the language. 309 00:15:31,210 --> 00:15:32,740 Here, we're going to end up breaking 310 00:15:32,740 --> 00:15:36,140 the string into five pieces. 311 00:15:36,140 --> 00:15:38,005 So s is going to be broken up into uvxyz. 312 00:15:45,604 --> 00:15:48,910 And the way it's going to work here-- so here is a picture. 313 00:15:48,910 --> 00:15:50,605 So all long strings-- again, there 314 00:15:50,605 --> 00:15:51,752 is going to be a threshold. 315 00:15:51,752 --> 00:15:53,335 So whenever you have a language, there 316 00:15:53,335 --> 00:15:55,630 is going to be some cut-off length. 317 00:15:55,630 --> 00:15:58,472 So all the longer strings in that language can be pumped. 318 00:15:58,472 --> 00:15:59,680 And you stay in the language. 319 00:15:59,680 --> 00:16:02,995 But the shorter strings, there is no guarantee. 320 00:16:02,995 --> 00:16:05,620 So if you have a long string in the language of length at least 321 00:16:05,620 --> 00:16:08,440 this pumping length p, then you can break it up 322 00:16:08,440 --> 00:16:09,710 into five pieces. 323 00:16:09,710 --> 00:16:14,770 But now it's that second and fourth string that 324 00:16:14,770 --> 00:16:19,630 are going to play that special pumping role, which means that, 325 00:16:19,630 --> 00:16:24,200 what you can do is you can repeat those 326 00:16:24,200 --> 00:16:26,660 and you stay in the language. 327 00:16:26,660 --> 00:16:28,280 And it's important that you repeat 328 00:16:28,280 --> 00:16:31,557 them both, that v and that y, the same number of times. 
329 00:16:31,557 --> 00:16:33,140 So you're going to have a picture that 330 00:16:33,140 --> 00:16:35,870 looks something like this. 331 00:16:35,870 --> 00:16:39,480 And that is going to-- you repeat. 332 00:16:42,430 --> 00:16:44,935 If you repeat the v and you repeat the y, you get uvvxyyz. 333 00:16:48,580 --> 00:16:50,545 Or if you look over here, it would 334 00:16:50,545 --> 00:16:54,820 be uv squared xy squared z. 335 00:16:54,820 --> 00:16:58,500 And that's going to still be in the language. 336 00:16:58,500 --> 00:17:00,080 And then we have-- 337 00:17:00,080 --> 00:17:01,080 so that's one condition. 338 00:17:01,080 --> 00:17:02,955 We'll have to look at all of these conditions 339 00:17:02,955 --> 00:17:04,380 when we do the proof, but we just 340 00:17:04,380 --> 00:17:06,505 want to understand what the statement is right now. 341 00:17:06,505 --> 00:17:10,589 So the second condition is that v and y together cannot be 342 00:17:10,589 --> 00:17:11,310 empty. 343 00:17:11,310 --> 00:17:13,859 And really, that's another way of saying, they can't both 344 00:17:13,859 --> 00:17:15,869 be the empty string, because if they 345 00:17:15,869 --> 00:17:17,940 were both the empty string, then repeating 346 00:17:17,940 --> 00:17:21,297 them wouldn't change s. 347 00:17:21,297 --> 00:17:23,339 And then of course it would stay in the language. 348 00:17:23,339 --> 00:17:24,839 So it would be kind of meaningless 349 00:17:24,839 --> 00:17:26,880 if they were allowed to be empty. 350 00:17:26,880 --> 00:17:28,380 And the last thing is, again, going 351 00:17:28,380 --> 00:17:30,930 to be there as a matter of convenience 352 00:17:30,930 --> 00:17:37,980 for proving languages are not context free, because you have 353 00:17:37,980 --> 00:17:40,710 to make sure there is no possible way of cutting up 354 00:17:40,710 --> 00:17:41,652 the string. 
355 00:17:41,652 --> 00:17:44,110 When you're trying to prove a language is not context free, 356 00:17:44,110 --> 00:17:45,568 you have to show the pumping fails. 357 00:17:48,690 --> 00:17:51,240 It's going to be helpful sometimes 358 00:17:51,240 --> 00:17:54,060 to limit the ways in which the string can be cut up, 359 00:17:54,060 --> 00:17:55,290 because then you have-- 360 00:17:55,290 --> 00:17:58,690 it's an easier job for you to work with it. 361 00:17:58,690 --> 00:18:00,660 So here, it's a little different than before, 362 00:18:00,660 --> 00:18:06,040 but sort of similar, that vxy combine as a substring. 363 00:18:06,040 --> 00:18:07,900 So I show that over here. 364 00:18:07,900 --> 00:18:12,600 vxy together is not too long. 365 00:18:12,600 --> 00:18:15,570 So the vxy-- maybe it's better seen up here-- 366 00:18:15,570 --> 00:18:18,240 is going to be, at most, p. 367 00:18:18,240 --> 00:18:21,000 We'll do an example in a minute of using this. 368 00:18:21,000 --> 00:18:23,640 OK, so again, here is our pumping lemma. 369 00:18:23,640 --> 00:18:25,020 I've just restated it. 370 00:18:25,020 --> 00:18:29,240 So we have it in front of us. 371 00:18:29,240 --> 00:18:31,078 And we're going to do a proof. 372 00:18:31,078 --> 00:18:33,370 I'm just going to give you the idea of the proof first. 373 00:18:33,370 --> 00:18:35,287 And then we'll go through some of the details. 374 00:18:35,287 --> 00:18:39,270 The idea is actually pretty simple. 375 00:18:39,270 --> 00:18:41,060 We give it-- call it a proof by picture. 376 00:18:43,953 --> 00:18:45,620 Again, remember what we're trying to do. 377 00:18:45,620 --> 00:18:48,700 We're trying to show that we have this context-free language 378 00:18:48,700 --> 00:18:55,240 A. And now all long strings in A have this pumping quality, 379 00:18:55,240 --> 00:18:58,750 that you can break them up into five pieces so 380 00:18:58,750 --> 00:19:01,880 that the second and the fourth piece can be repeated. 
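The three conditions of the lemma can be made concrete with a brute-force Python sketch. It is an illustration only: the function names are made up, the language is the 0^k 1^k 2^k example from earlier, and only exponents 0 through 3 are tried. Trying a few exponents is enough to rule a split out, but a split that passes here is not thereby shown to pump for every exponent.

```python
def in_B(w: str) -> bool:
    """Membership in B = { 0^k 1^k 2^k : k >= 0 }."""
    k = len(w) // 3
    return len(w) % 3 == 0 and w == "0" * k + "1" * k + "2" * k


def pump(u: str, v: str, x: str, y: str, z: str, i: int) -> str:
    """Build u v^i x y^i z, repeating v and y the same number of times."""
    return u + v * i + x + y * i + z


def split_survives(u, v, x, y, z, p, member, max_i=3) -> bool:
    """Check the lemma's conditions for one split, for i = 0..max_i only."""
    if len(v) + len(y) == 0:      # condition 2: v and y not both empty
        return False
    if len(v) + len(x) + len(y) > p:  # condition 3: |vxy| <= p
        return False
    # condition 1, tested on a few exponents
    return all(member(pump(u, v, x, y, z, i)) for i in range(max_i + 1))


def some_split_survives(s: str, p: int, member) -> bool:
    """Try every way of cutting s into five pieces u, v, x, y, z."""
    n = len(s)
    for a in range(n + 1):
        for b in range(a, n + 1):
            for c in range(b, n + 1):
                # d - a = |vxy|, so stop at a + p
                for d in range(c, min(n, a + p) + 1):
                    if split_survives(s[:a], s[a:b], s[b:c], s[c:d], s[d:],
                                      p, member):
                        return True
    return False
```

For example, with p = 3 and s = 000111222, no split survives, because a piece vxy of length at most 3 cannot touch all three kinds of symbols. That is the kind of failure a non-context-free proof has to establish for every possible p, which a finite search like this cannot do on its own.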
381 00:19:01,880 --> 00:19:04,900 And you stay in the language. 382 00:19:04,900 --> 00:19:07,450 So how do we know that that's going to be true? 383 00:19:07,450 --> 00:19:08,950 Let's take a look at the proof here. 384 00:19:08,950 --> 00:19:11,270 And why is that going to be true? 385 00:19:11,270 --> 00:19:16,100 So first of all, I'd like to do it qualitatively rather 386 00:19:16,100 --> 00:19:17,280 than quantitatively. 387 00:19:17,280 --> 00:19:20,210 So let's just imagine, instead of thinking-- we'll 388 00:19:20,210 --> 00:19:21,740 calculate what p is later. 389 00:19:21,740 --> 00:19:25,623 But just imagine that s is some really, really long string. 390 00:19:25,623 --> 00:19:27,290 That's the way I like to think about it. 391 00:19:27,290 --> 00:19:29,110 So s is just really long. 392 00:19:29,110 --> 00:19:31,480 What is that going to tell us? 393 00:19:31,480 --> 00:19:37,120 It's going to tell us something important about the way 394 00:19:37,120 --> 00:19:41,980 the grammar produces s, which is going to be useful in getting 395 00:19:41,980 --> 00:19:45,240 a way of pumping it. 396 00:19:45,240 --> 00:19:49,110 So if s is really long, we're going 397 00:19:49,110 --> 00:19:50,880 to look at the parse tree for s. 398 00:19:50,880 --> 00:19:53,580 And we're going to conclude that the parse tree has 399 00:19:53,580 --> 00:19:58,300 to be really tall, because it's impossible for a very 400 00:19:58,300 --> 00:20:03,880 shallow parse tree to generate a very long string. 401 00:20:03,880 --> 00:20:05,930 And again, we'll quantify that in a second. 402 00:20:05,930 --> 00:20:07,840 But intuitively, I think that's not 403 00:20:07,840 --> 00:20:10,430 too hard to see why that ought to be true. 
404 00:20:10,430 --> 00:20:13,390 So if you have a long s, the parse tree 405 00:20:13,390 --> 00:20:16,090 has to be really tall, because the parse tree can't 406 00:20:16,090 --> 00:20:18,040 generate very many-- it can't expand 407 00:20:18,040 --> 00:20:20,603 by very much at each level. 408 00:20:20,603 --> 00:20:22,270 So we'll look at how much it can expand. 409 00:20:22,270 --> 00:20:25,390 But it depends on the grammar, how much expansion 410 00:20:25,390 --> 00:20:26,660 you have at each level. 411 00:20:26,660 --> 00:20:30,250 And it's going to be-- you can't have just in three levels 412 00:20:30,250 --> 00:20:34,030 some small grammar generating a string of length 1 million. 413 00:20:34,030 --> 00:20:37,480 You'll see that that's just impossible. 414 00:20:37,480 --> 00:20:43,710 So once you know that the parse tree is really tall here, 415 00:20:43,710 --> 00:20:47,908 then you're actually almost done, because what 416 00:20:47,908 --> 00:20:49,200 does it mean to be really tall? 417 00:20:49,200 --> 00:20:52,800 It means that there is some path starting at the start variable 418 00:20:52,800 --> 00:20:57,240 E, I'm calling it in this parse tree, which goes down 419 00:20:57,240 --> 00:21:04,500 to some terminal symbol in s, which goes through many steps. 420 00:21:04,500 --> 00:21:06,960 That's what it means for the tree to be very tall. 421 00:21:06,960 --> 00:21:09,930 And each one of those steps is a variable 422 00:21:09,930 --> 00:21:13,930 until you get down to the very end. 423 00:21:13,930 --> 00:21:15,850 OK, so that's the way parse trees look. 424 00:21:15,850 --> 00:21:19,490 You keep expanding variables until you get to a terminal. 425 00:21:19,490 --> 00:21:25,130 So here, you get some path that's really a long path. 
426 00:21:25,130 --> 00:21:27,410 And once you have a long path that 427 00:21:27,410 --> 00:21:30,270 has many, many variables appearing on here, 428 00:21:30,270 --> 00:21:33,320 well, the grammar itself has only some fixed number 429 00:21:33,320 --> 00:21:35,120 of variables in it, so you're going 430 00:21:35,120 --> 00:21:39,800 to have to have a repetition coming among the variables that 431 00:21:39,800 --> 00:21:41,330 occur on that long path. 432 00:21:44,610 --> 00:21:45,780 Got that? 433 00:21:45,780 --> 00:21:49,600 So a long string forces a tall parse tree, 434 00:21:49,600 --> 00:21:53,610 forces a repetition on some path coming out 435 00:21:53,610 --> 00:22:00,120 of the start variable of some other variable that comes out. 436 00:22:00,120 --> 00:22:03,840 Now that's going to tell us how to cut up s, 437 00:22:03,840 --> 00:22:06,570 because if you look at the subtrees of s 438 00:22:06,570 --> 00:22:13,430 that those two R variables are generating, shown like this, 439 00:22:13,430 --> 00:22:15,270 I'm going to use that-- 440 00:22:15,270 --> 00:22:17,130 so you have to follow what I'm saying here. 441 00:22:17,130 --> 00:22:21,170 So R here is generating this portion of s. 442 00:22:21,170 --> 00:22:24,260 And the lower R is generating a smaller portion of s, 443 00:22:24,260 --> 00:22:27,980 just looking at the subtree that you get here. 444 00:22:27,980 --> 00:22:34,260 And that's going to tell us that we can cut up s accordingly. 445 00:22:34,260 --> 00:22:38,280 So u was that very first part out here generated 446 00:22:38,280 --> 00:22:41,670 by E, but not by the first R. R is generated-- 447 00:22:41,670 --> 00:22:44,910 v is generated by the first R, but not by the second R. 448 00:22:44,910 --> 00:22:47,310 The second R generates exactly x. 449 00:22:47,310 --> 00:22:52,040 And then we have y and z, similarly. 450 00:22:52,040 --> 00:22:57,200 So that all follows from having a tall parse tree. 
451 00:22:57,200 --> 00:22:58,700 And now we're finished. 452 00:22:58,700 --> 00:23:00,330 Now we know how to cut up s. 453 00:23:00,330 --> 00:23:04,880 How do we know we can repeat v and y 454 00:23:04,880 --> 00:23:06,290 and still be in the language? 455 00:23:06,290 --> 00:23:09,050 Well, I'll actually show you that you're in the language 456 00:23:09,050 --> 00:23:11,795 by exhibiting a parse tree for the string uvvxyyz. 457 00:23:14,450 --> 00:23:15,260 Here it is. 458 00:23:17,840 --> 00:23:22,850 I'm going to get that parse tree by, when I expand this lower R, 459 00:23:22,850 --> 00:23:25,010 instead of expanding it to get x, 460 00:23:25,010 --> 00:23:28,700 I'm going to follow the same substitutions that I had when 461 00:23:28,700 --> 00:23:33,035 I expanded the upper R. So it's as if I took this larger 462 00:23:33,035 --> 00:23:38,900 subtree here and I substituted it in for the smaller subtree 463 00:23:38,900 --> 00:23:42,260 under the second R. And so I get a picture that looks like this. 464 00:23:45,480 --> 00:23:49,710 So here I'm substituting under the second R the same subtree 465 00:23:49,710 --> 00:23:54,290 that I had originally coming out of the upper R, the first R. 466 00:23:54,290 --> 00:23:59,770 And so now this parse tree is generating the string uvvxyyz, 467 00:23:59,770 --> 00:24:01,020 which is what I'm looking for. 468 00:24:01,020 --> 00:24:03,230 And of course, you can do that again and again. 469 00:24:03,230 --> 00:24:06,380 And you're going to keep getting higher and higher exponents 470 00:24:06,380 --> 00:24:09,020 of v and y. 471 00:24:09,020 --> 00:24:11,910 And in fact, you can even get the 0 exponent, 472 00:24:11,910 --> 00:24:15,500 which means that v and y both disappear altogether. 473 00:24:15,500 --> 00:24:18,060 And for that, you do something slightly different, 474 00:24:18,060 --> 00:24:22,080 which is that you replace the larger subtree by the smaller 475 00:24:22,080 --> 00:24:22,580 subtree. 
476 00:24:25,400 --> 00:24:29,270 OK so here, which was originally that larger tree generating 477 00:24:29,270 --> 00:24:32,580 vxy, I stick instead the smaller subtree. 478 00:24:32,580 --> 00:24:34,880 I do the substitutions from the smaller subtree. 479 00:24:34,880 --> 00:24:37,195 And I just get x there. 480 00:24:37,195 --> 00:24:38,570 And so now the string I generated 481 00:24:38,570 --> 00:24:44,740 is uxz, which is the same as uv to the 0 xy to the 0 z. 482 00:24:44,740 --> 00:24:48,650 And that is the idea of the proof. 483 00:24:48,650 --> 00:24:55,930 Now, I think you could work out the quantities that you need 484 00:24:55,930 --> 00:24:58,150 in order to drive this proof. 485 00:24:58,150 --> 00:24:59,530 I'm going to do that for you. 486 00:24:59,530 --> 00:25:03,610 I actually hate writing down lots of inequalities, 487 00:25:03,610 --> 00:25:06,892 and equations, and so on, on the board, 488 00:25:06,892 --> 00:25:08,350 because I think they're just almost 489 00:25:08,350 --> 00:25:09,505 incomprehensible to follow. 490 00:25:09,505 --> 00:25:10,880 Or at least they would be for me. 491 00:25:10,880 --> 00:25:12,520 But I'm going to put them up there just 492 00:25:12,520 --> 00:25:15,082 for completeness sake. 493 00:25:15,082 --> 00:25:17,290 So here we're going to give the details of this proof 494 00:25:17,290 --> 00:25:19,592 on the next slide here. 495 00:25:19,592 --> 00:25:21,550 Oh yeah, so I just want to give a name to this. 496 00:25:21,550 --> 00:25:23,890 I'm going to call this the cutting and pasting argument, 497 00:25:23,890 --> 00:25:27,850 because I'm cutting apart pieces of this parse tree 498 00:25:27,850 --> 00:25:30,910 and I'm pasting them in to other places within the parse tree 499 00:25:30,910 --> 00:25:34,570 to get new strings being generated. 500 00:25:34,570 --> 00:25:36,320 So this is a cutting and pasting argument. 
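
A minimal sketch of the cutting and pasting argument, done on strings rather than on parse trees. The function name `pump` and the example grammar S -> 0S1 | epsilon are illustrative assumptions, not taken from the lecture. The only point is that v and y are repeated together, so every exponent i (including 0) gives u v^i x y^i z.

```python
def pump(u: str, v: str, x: str, y: str, z: str, i: int) -> str:
    """Return u v^i x y^i z: v and y are repeated the same number of times."""
    return u + v * i + x + y * i + z


# Hypothetical example: s = 000111 from the grammar S -> 0S1 | epsilon.
# Taking the repeated variable to be S, one valid cut of s is:
u, v, x, y, z = "00", "0", "", "1", "11"

assert pump(u, v, x, y, z, 1) == "000111"    # i = 1 gives back s itself
assert pump(u, v, x, y, z, 2) == "00001111"  # upper subtree pasted in once more
assert pump(u, v, x, y, z, 0) == "0011"      # lower subtree replaces the upper one
```

Pasting the larger subtree under the lower repeated variable corresponds to i = 2, and replacing the larger subtree by the smaller one corresponds to i = 0.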
501 00:25:36,320 --> 00:25:40,733 So OK, let's take a look at the details here, just, 502 00:25:40,733 --> 00:25:42,150 well, we have to understand, well, 503 00:25:42,150 --> 00:25:46,110 how big does p actually need to be in order for this thing 504 00:25:46,110 --> 00:25:46,860 to kick in? 505 00:25:49,800 --> 00:25:52,080 Well, first of all, we have to understand 506 00:25:52,080 --> 00:25:55,530 how fast that parse tree can be growing 507 00:25:55,530 --> 00:25:57,070 as we go level to level. 508 00:25:57,070 --> 00:26:01,860 And that's going to be dependent on how big the right-hand sides 509 00:26:01,860 --> 00:26:02,850 of rules are. 510 00:26:02,850 --> 00:26:06,120 I mean, that really tells you how many-- 511 00:26:06,120 --> 00:26:09,540 what's the fan out you know of each node? 512 00:26:09,540 --> 00:26:10,890 What's the maximum fan out? 513 00:26:10,890 --> 00:26:14,130 And that's going to be the maximum length 514 00:26:14,130 --> 00:26:17,280 of a right-hand side of any rule. 515 00:26:17,280 --> 00:26:18,960 So for example, in that other grammar 516 00:26:18,960 --> 00:26:22,890 we had seen last time for arithmetic expressions, 517 00:26:22,890 --> 00:26:27,030 we had this E goes to E plus T, this rule here. 518 00:26:27,030 --> 00:26:28,620 And in terms of the parse tree, that 519 00:26:28,620 --> 00:26:32,520 would look like a little element like that. 520 00:26:32,520 --> 00:26:35,520 And that's actually the longest right-hand side 521 00:26:35,520 --> 00:26:36,700 that you can get. 522 00:26:36,700 --> 00:26:39,900 And so the parse tree can be growing by a factor of 3 523 00:26:39,900 --> 00:26:41,470 each time. 524 00:26:41,470 --> 00:26:46,410 Now, that's going to tell us how big the string needs 525 00:26:46,410 --> 00:26:49,440 to be that's being generated, what is the value of p 526 00:26:49,440 --> 00:26:52,965 in order to get a high enough parse tree so that you're going 527 00:26:52,965 --> 00:26:54,090 to get a repeated variable. 
528 00:26:57,440 --> 00:27:01,850 Let's call the height of the parse tree for S h. 529 00:27:01,850 --> 00:27:04,550 So now if you-- this is just repeating what I just said. 530 00:27:04,550 --> 00:27:07,430 If you have a tree of height h and the maximum branching 531 00:27:07,430 --> 00:27:10,190 is b, then you get, at most, b to the h leaves, 532 00:27:10,190 --> 00:27:14,660 because each level, you get another factor of b coming up, 533 00:27:14,660 --> 00:27:17,630 because that's how much branching you have. 534 00:27:17,630 --> 00:27:20,330 So each node at one level can become b nodes 535 00:27:20,330 --> 00:27:21,470 at the next level down. 536 00:27:21,470 --> 00:27:23,277 So you're multiplying by b each time. 537 00:27:23,277 --> 00:27:24,860 And if you have h levels, you're going 538 00:27:24,860 --> 00:27:28,290 to have b to the h leaves. 539 00:27:28,290 --> 00:27:31,440 So the length of s, which are really the leaves here, 540 00:27:31,440 --> 00:27:33,098 is at most b to the h. 541 00:27:33,098 --> 00:27:34,890 The reason why it's at most and not exactly 542 00:27:34,890 --> 00:27:37,050 is you might be doing some substitutions which 543 00:27:37,050 --> 00:27:40,380 are shorter right-hand sides. 544 00:27:40,380 --> 00:27:42,450 OK, so to try to show this as a picture 545 00:27:42,450 --> 00:27:46,290 here, pulling that same picture we had before, 546 00:27:46,290 --> 00:27:51,180 we want h, the height, to be bigger 547 00:27:51,180 --> 00:27:56,100 than the number of variables to force a repetition. 548 00:27:56,100 --> 00:27:59,310 So the number of variables is going to be written this way. 549 00:27:59,310 --> 00:28:01,440 V is the variables. 550 00:28:01,440 --> 00:28:04,488 V with bars around it is going to be the number of variables. 551 00:28:04,488 --> 00:28:06,030 And we want that height to be greater 552 00:28:06,030 --> 00:28:07,810 than the number of variables. 
553 00:28:07,810 --> 00:28:10,770 So once you know how high you want that tree to be in order 554 00:28:10,770 --> 00:28:15,210 to force a repetition, then it tells you how big s has to be. 555 00:28:15,210 --> 00:28:22,560 So the length of s has to be bigger than b to the V, b to the size of V, 556 00:28:22,560 --> 00:28:26,820 because then the height that you're going to get 557 00:28:26,820 --> 00:28:33,060 is going to be greater than the size of V, which 558 00:28:33,060 --> 00:28:34,660 is-- so that's what you want. 559 00:28:34,660 --> 00:28:36,960 You want h to be greater than the size of V. 560 00:28:36,960 --> 00:28:41,180 So you're going to set p to be one more than b to the size of V. 561 00:28:41,180 --> 00:28:43,698 And so if s is at least that length, 562 00:28:43,698 --> 00:28:45,240 this whole thing is going to kick in. 563 00:28:45,240 --> 00:28:48,460 And you're going to get that repeated variable. 564 00:28:48,460 --> 00:28:51,150 So we'll let p be that value where 565 00:28:51,150 --> 00:28:53,230 the size of V is the number of variables in the grammar. 566 00:28:53,230 --> 00:28:59,654 And so if the length of s is at least p, which is greater than b to the size of V, 567 00:28:59,654 --> 00:29:04,710 then the length of s is going to be greater than b to the size of V. 568 00:29:04,710 --> 00:29:09,810 So h is going to be what you want to make this thing work. 569 00:29:09,810 --> 00:29:12,180 If you don't follow that, those inequalities, 570 00:29:12,180 --> 00:29:13,470 I sympathize with you. 571 00:29:13,470 --> 00:29:15,450 I would never follow that either in a lecture. 572 00:29:15,450 --> 00:29:19,840 So but I hope you get the idea. 
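
The chain of inequalities can be collected in one place. Here b is the maximum length of the right-hand side of any rule, |V| is the number of variables, and h is the height of the parse tree for s, as in the discussion above. The side condition b >= 2 and the numeric instance at the end (b = 3, |V| = 3) are assumptions added for illustration only.

```latex
\begin{align*}
  |s| &\le b^{h}
      && \text{(height $h$, branching at most $b$: at most $b^{h}$ leaves)} \\
  p &= b^{|V|} + 1
      && \text{(choice of pumping length)} \\
  |s| \ge p > b^{|V|}
      &\;\Longrightarrow\; b^{h} \ge |s| > b^{|V|}
      \;\Longrightarrow\; h > |V|
      && \text{(assuming $b \ge 2$)} \\
  h \ge |V| + 1
      &\;\Longrightarrow\; \text{some variable repeats on the longest path}
      && \text{(pigeonhole)}
\end{align*}
% Illustrative instance (assumed numbers): b = 3, |V| = 3 gives p = 3^3 + 1 = 28.
```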
573 00:29:19,840 --> 00:29:21,690 But we're not quite finished yet, 574 00:29:21,690 --> 00:29:25,110 because I want to now circle back, and look at these three 575 00:29:25,110 --> 00:29:29,520 conditions, and make sure that we've captured them all, 576 00:29:29,520 --> 00:29:34,950 because actually, it's not totally obvious in each 577 00:29:34,950 --> 00:29:36,850 of those cases that we've got them. 578 00:29:36,850 --> 00:29:41,460 So there is a few extra things we need to do. 579 00:29:41,460 --> 00:29:43,653 OK, so this is concluding the argument. 580 00:29:43,653 --> 00:29:46,320 There are going to be at least V plus 1 variables in the longest 581 00:29:46,320 --> 00:29:46,590 path. 582 00:29:46,590 --> 00:29:48,210 So there is going to be a repetition. 583 00:29:48,210 --> 00:29:50,430 So now let's go back here and see, now 584 00:29:50,430 --> 00:29:56,280 that we have this picture with a repeated variable, 585 00:29:56,280 --> 00:29:58,807 how do we know we can get condition one? 586 00:29:58,807 --> 00:30:00,890 Well, that's just the cutting and pasting argument 587 00:30:00,890 --> 00:30:01,970 from the previous slide. 588 00:30:05,090 --> 00:30:08,830 How do we know that v and y are not both empty? 589 00:30:08,830 --> 00:30:11,800 Well actually, that's not totally obvious, 590 00:30:11,800 --> 00:30:16,210 because it's possible that, when you generated v here 591 00:30:16,210 --> 00:30:21,010 and you generated y, maybe going from this R to that R, 592 00:30:21,010 --> 00:30:23,930 you got nothing new. 593 00:30:23,930 --> 00:30:27,130 You know, it could have been that R got replaced by T, 594 00:30:27,130 --> 00:30:29,350 another variable with nothing new coming out, 595 00:30:29,350 --> 00:30:31,580 and then T got replaced by R. 596 00:30:31,580 --> 00:30:35,187 You substituted T for R and then R for T. 597 00:30:35,187 --> 00:30:36,770 And you've got nothing new coming out. 
598 00:30:36,770 --> 00:30:40,420 And in that case, v and y would both be the empty string. 599 00:30:40,420 --> 00:30:44,280 And that would violate what we want. 600 00:30:44,280 --> 00:30:47,413 The way you get around-- that and these are details here. 601 00:30:47,413 --> 00:30:49,830 If you're not totally following these points, don't worry. 602 00:30:52,470 --> 00:30:53,580 They're easy to describe. 603 00:30:53,580 --> 00:30:56,038 So I figure, let me present the whole thing in full detail. 604 00:30:59,720 --> 00:31:03,440 So if going from this R to that R doesn't generate anything 605 00:31:03,440 --> 00:31:06,470 new, you're getting exactly the same things coming out-- 606 00:31:06,470 --> 00:31:10,040 v and y are just the empty string-- 607 00:31:10,040 --> 00:31:12,950 how do we avoid that from happening? 608 00:31:12,950 --> 00:31:18,450 There is a simple way to address that, which is to say, 609 00:31:18,450 --> 00:31:23,720 if you have this string s, when you take a parse tree, 610 00:31:23,720 --> 00:31:27,950 make sure you take a small-as-possible parse tree. 611 00:31:27,950 --> 00:31:31,550 You're not allowed to start off with an inefficient parse tree 612 00:31:31,550 --> 00:31:34,430 that can be shortened and still generate s. 613 00:31:34,430 --> 00:31:36,590 I want the smallest possible parse tree. 614 00:31:36,590 --> 00:31:38,540 And that smallest possible parse tree 615 00:31:38,540 --> 00:31:41,720 can't have an R going to another R which 616 00:31:41,720 --> 00:31:44,210 is generating nothing new, because then you could always 617 00:31:44,210 --> 00:31:45,770 have eliminated that step. 618 00:31:45,770 --> 00:31:47,870 And you would still have a parse tree for s, 619 00:31:47,870 --> 00:31:49,550 but it would be a smaller parse tree. 620 00:31:49,550 --> 00:31:50,300 So that would be-- 621 00:31:50,300 --> 00:31:54,170 I want you to start off with the smallest possible parse tree. 
622 00:31:54,170 --> 00:31:57,020 And then you're going to be guaranteed that v or y is going 623 00:31:57,020 --> 00:32:00,170 to be something not empty. 624 00:32:00,170 --> 00:32:05,770 So that takes care of condition two. 625 00:32:05,770 --> 00:32:13,300 Condition three-- you know, how do we know that vxy together 626 00:32:13,300 --> 00:32:15,280 is not very long? 627 00:32:15,280 --> 00:32:18,490 And basically, it's the same argument all over again. 628 00:32:18,490 --> 00:32:20,500 You just want to make sure that, when you're 629 00:32:20,500 --> 00:32:23,380 picking the repetition R, the two R's here, 630 00:32:23,380 --> 00:32:26,080 you pick the lowest possible repetitions that occur, 631 00:32:26,080 --> 00:32:28,550 if you have many choices. 632 00:32:28,550 --> 00:32:30,800 And those lowest two, those lowest repetitions, 633 00:32:30,800 --> 00:32:33,080 there is not going to be any lower repetition here. 634 00:32:33,080 --> 00:32:38,390 And then by the same argument, since once you 635 00:32:38,390 --> 00:32:41,210 have that very first R, there are no more repetitions occurring 636 00:32:41,210 --> 00:32:46,190 below, the vxy can't be very long, 637 00:32:46,190 --> 00:32:51,890 because that would, again, force another repetition to occur. 638 00:32:51,890 --> 00:32:54,110 So anyway, those are the three conditions. 639 00:32:54,110 --> 00:32:56,680 And that's the proof of the pumping lemma 640 00:32:56,680 --> 00:32:57,760 for context-free languages. 641 00:32:57,760 --> 00:33:01,030 Let's see how we use that. 642 00:33:01,030 --> 00:33:03,190 OK, so let's do an example of proving a language 643 00:33:03,190 --> 00:33:05,892 not context free using the pumping lemma. 644 00:33:05,892 --> 00:33:07,600 How are you going to go about doing that? 645 00:33:07,600 --> 00:33:10,710 Because that's the kind of thing, at the very least, 646 00:33:10,710 --> 00:33:13,210 you need to know how to do this in order to do the homework. 
647 00:33:15,690 --> 00:33:17,440 I'd like to motivate you that the stuff is 648 00:33:17,440 --> 00:33:20,780 so interesting and fun, but it doesn't work for everybody. 649 00:33:20,780 --> 00:33:25,990 So for you practical people out there, pay attention 650 00:33:25,990 --> 00:33:28,453 so you can do the homework. 651 00:33:28,453 --> 00:33:29,870 OK, let's go back to that language 652 00:33:29,870 --> 00:33:33,110 we had a couple of slides back, 0 to the k, 1 to the k, 653 00:33:33,110 --> 00:33:33,740 2 to the k. 654 00:33:33,740 --> 00:33:35,630 It's not a context-free language. 655 00:33:35,630 --> 00:33:38,210 We're going to show that now using the pumping lemma 656 00:33:38,210 --> 00:33:41,970 for context-free languages. 657 00:33:41,970 --> 00:33:46,160 So it's going to do, similar to the proofs using 658 00:33:46,160 --> 00:33:51,990 for non-regular languages, proof by contradiction. 659 00:33:51,990 --> 00:33:55,640 So you, first you assume the language is context-free. 660 00:33:55,640 --> 00:33:58,760 And then we're going to apply the pumping lemma. 661 00:33:58,760 --> 00:34:02,580 And then we're going to get a contradiction. 662 00:34:02,580 --> 00:34:04,580 So the pumping lemma gives that pumping length, 663 00:34:04,580 --> 00:34:06,170 as we described above. 664 00:34:06,170 --> 00:34:09,500 And now we just want to pick a longer string in the language 665 00:34:09,500 --> 00:34:12,770 and show that that longer string, which 666 00:34:12,770 --> 00:34:15,643 is supposed to be pumpable and stay in the language, in fact 667 00:34:15,643 --> 00:34:16,310 is not pumpable. 668 00:34:19,600 --> 00:34:21,100 So the pumping lemma says that you 669 00:34:21,100 --> 00:34:23,949 can divide it into five pieces satisfying the three 670 00:34:23,949 --> 00:34:25,060 conditions. 671 00:34:25,060 --> 00:34:28,153 Condition three implies that-- 672 00:34:28,153 --> 00:34:30,070 so now I'm going to I'm going to work through. 
673 00:34:30,070 --> 00:34:32,780 I'm going to show you get a contradiction. 674 00:34:32,780 --> 00:34:37,270 So condition three implies that you cannot contain both 675 00:34:37,270 --> 00:34:38,020 0's and-- 676 00:34:38,020 --> 00:34:39,409 let's pull up a picture here. 677 00:34:39,409 --> 00:34:45,100 So here is s, 0's, 1's, and then 2's, all of the same length. 678 00:34:45,100 --> 00:34:51,139 Condition three-- so if you break it up, 679 00:34:51,139 --> 00:34:55,340 condition three says, vxy together cannot be too long. 680 00:34:55,340 --> 00:34:58,670 Well, if vxy together is not too long, 681 00:34:58,670 --> 00:35:02,660 how could it be that, when you're repeating v and y, 682 00:35:02,660 --> 00:35:05,380 you stay in the language? 683 00:35:05,380 --> 00:35:09,940 For one thing, you can't have 0's, 1's, and 2's all 684 00:35:09,940 --> 00:35:12,100 occurring within v, x, and y. 685 00:35:16,020 --> 00:35:18,430 Some symbol is going to get left out. 686 00:35:18,430 --> 00:35:20,470 So then when you pump up, you're going 687 00:35:20,470 --> 00:35:22,138 to have unequal numbers of symbols. 688 00:35:22,138 --> 00:35:24,055 And so you're going to be out of the language. 689 00:35:27,360 --> 00:35:30,570 OK, so no matter how you try to cut it up 690 00:35:30,570 --> 00:35:33,930 following condition three, which is one of the things that 691 00:35:33,930 --> 00:35:36,990 restricts the ways to cut it up, you're 692 00:35:36,990 --> 00:35:41,290 going to end up, when you pump up, going out of the language. 693 00:35:41,290 --> 00:35:45,690 And so therefore, it's not in-- 694 00:35:45,690 --> 00:35:46,230 D? 695 00:35:46,230 --> 00:35:49,030 D is wrong. 696 00:35:49,030 --> 00:35:50,280 B, should say "B." 697 00:35:56,610 --> 00:35:58,660 I'm supposed to be able to write on this thing. 698 00:35:58,660 --> 00:36:00,320 I guess not. 699 00:36:00,320 --> 00:36:02,350 I didn't test that. 700 00:36:02,350 --> 00:36:08,320 Oh well, that's supposed to be a B. 
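
The argument for B can be sanity-checked by brute force for one small value of p. This is a sketch under assumptions, not part of the proof: the helper names (`in_B`, `decompositions`, `survives`) are made up for illustration, p = 3 is a stand-in for the unknown pumping length, and only the exponents 0 and 2 are tested. It enumerates every cut s = uvxyz satisfying conditions two and three and checks whether pumping keeps the string in B.

```python
from itertools import combinations_with_replacement


def in_B(t: str) -> bool:
    """Membership in B = { 0^k 1^k 2^k : k >= 0 }."""
    k, r = divmod(len(t), 3)
    return r == 0 and t == "0" * k + "1" * k + "2" * k


def decompositions(s: str, p: int):
    """Yield every cut s = u v x y z with |vxy| <= p and |vy| >= 1."""
    n = len(s)
    # a <= b <= c <= d are the cut points between u|v|x|y|z
    for a, b, c, d in combinations_with_replacement(range(n + 1), 4):
        if d - a <= p and (b - a) + (d - c) >= 1:
            yield s[:a], s[a:b], s[b:c], s[c:d], s[d:]


def survives(parts, in_lang, exponents=(0, 2)) -> bool:
    """True if u v^i x y^i z stays in the language for every tested i."""
    u, v, x, y, z = parts
    return all(in_lang(u + v * i + x + y * i + z) for i in exponents)


p = 3  # stand-in value; the real pumping length depends on the assumed grammar
s = "0" * p + "1" * p + "2" * p
pumpable = [parts for parts in decompositions(s, p) if survives(parts, in_B)]
print(len(pumpable))  # 0: no legal cut of s survives pumping
```

Because vxy is at most p long, it can touch at most two of the three blocks, so pumping up always leaves one symbol with too few occurrences. That is why the list comes out empty.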
701 00:36:08,320 --> 00:36:10,880 So B is a context-free language, which includes-- 702 00:36:10,880 --> 00:36:13,390 so that's the assumption, that B is a context-free language. 703 00:36:13,390 --> 00:36:13,932 That's false. 704 00:36:13,932 --> 00:36:17,170 And we conclude that it's not a context-free language. 705 00:36:17,170 --> 00:36:22,910 Let's do-- oh yeah, I have a check in here. 706 00:36:22,910 --> 00:36:28,540 So let's see what I'm going to ask you to think about. 707 00:36:32,190 --> 00:36:37,558 OK, my head is blocking part of the text? 708 00:36:37,558 --> 00:36:38,600 Oh, that was a while ago. 709 00:36:43,580 --> 00:36:46,260 Yes, so just one question by the way, 710 00:36:46,260 --> 00:36:50,990 in terms of applying the pumping lemma-- either v or y 711 00:36:50,990 --> 00:36:53,507 can be empty, but not both. 712 00:36:53,507 --> 00:36:55,340 But anyway, let's get to this check in here. 713 00:36:58,450 --> 00:37:01,320 So let's look at these two languages, A1 and A2, 714 00:37:01,320 --> 00:37:05,410 which look very similar to B, but a little different. 715 00:37:05,410 --> 00:37:08,760 So it's A1 is 0 to the k, 1 to the k, 2 716 00:37:08,760 --> 00:37:13,120 to the l, where k and l could be any numbers, 717 00:37:13,120 --> 00:37:16,850 any positive, non-negative numbers. 718 00:37:16,850 --> 00:37:18,820 So basically what this is saying is 719 00:37:18,820 --> 00:37:21,280 that the number of 0's and 1's are going to be equal, 720 00:37:21,280 --> 00:37:26,830 but the number of 2's can be anything, whereas A2, similar, 721 00:37:26,830 --> 00:37:28,510 but here, we're requiring the number 722 00:37:28,510 --> 00:37:31,877 of 1's and 2's to be equal. 723 00:37:31,877 --> 00:37:33,460 And the number of 0's can be anything. 
724 00:37:36,810 --> 00:37:40,050 Now, you can easily make, I hope-- 725 00:37:40,050 --> 00:37:41,940 you should make sure you can-- 726 00:37:41,940 --> 00:37:47,310 pushdown automata that can recognize A1 and A2, 727 00:37:47,310 --> 00:37:49,760 because let's just take A1. 728 00:37:49,760 --> 00:37:55,520 The pushdown automaton can push the 0's as it's reading them, 729 00:37:55,520 --> 00:37:57,920 pop them as it's reading the 1's to match them off 730 00:37:57,920 --> 00:38:00,620 and make sure that they're the same number of them. 731 00:38:00,620 --> 00:38:04,105 And then the 2's, it doesn't care how many there are. 732 00:38:04,105 --> 00:38:06,230 It just has to make sure that there are no strings, 733 00:38:06,230 --> 00:38:08,310 there are no letters coming out of order. 734 00:38:08,310 --> 00:38:09,848 But any number of 2's is fine. 735 00:38:09,848 --> 00:38:12,140 So you can easily make a pushdown automaton recognizing 736 00:38:12,140 --> 00:38:15,230 A1, similarly for A2. 737 00:38:15,230 --> 00:38:19,725 So what can we conclude from that? 738 00:38:19,725 --> 00:38:21,100 Here are the three possibilities. 739 00:38:21,100 --> 00:38:24,180 Let me-- so look at that, the class of context-free languages 740 00:38:24,180 --> 00:38:26,700 is not closed under intersection. 741 00:38:26,700 --> 00:38:28,140 You can read it. 742 00:38:28,140 --> 00:38:33,720 So I want to pull up the poll and launch that. 743 00:38:38,330 --> 00:38:39,290 Please fill that out. 744 00:38:43,420 --> 00:38:43,975 10 seconds. 745 00:38:46,890 --> 00:38:50,100 Again, just, if you don't know the answer, 746 00:38:50,100 --> 00:38:51,895 just give any answers so that-- 747 00:38:51,895 --> 00:38:53,520 because we're not counting correctness. 748 00:38:56,719 --> 00:38:58,870 There is still a few dribbling in. 749 00:38:58,870 --> 00:39:04,720 OK, five seconds. 750 00:39:04,720 --> 00:39:06,580 OK, end polling. 751 00:39:06,580 --> 00:39:07,750 Most of you got that right. 
752 00:39:10,468 --> 00:39:11,010 I don't know. 753 00:39:11,010 --> 00:39:12,302 Is it OK to share these things? 754 00:39:12,302 --> 00:39:15,660 I don't want to make people who didn't get the right feel bad. 755 00:39:15,660 --> 00:39:17,160 You know, but you should understand, 756 00:39:17,160 --> 00:39:18,750 I think if you're missing something, 757 00:39:18,750 --> 00:39:21,540 you should understand what you're missing. 758 00:39:21,540 --> 00:39:24,270 The pumping lemma shows that A1 union A2 is not 759 00:39:24,270 --> 00:39:25,330 a context-free language? 760 00:39:25,330 --> 00:39:25,830 No. 761 00:39:25,830 --> 00:39:29,520 As I mentioned at the beginning, the context-free languages 762 00:39:29,520 --> 00:39:31,380 are closed under union. 763 00:39:31,380 --> 00:39:34,843 So the pumping lemma had better not show that these-- 764 00:39:34,843 --> 00:39:36,510 we already know that these two languages 765 00:39:36,510 --> 00:39:40,080 are context free, because we get them from pushdown automaton. 766 00:39:40,080 --> 00:39:42,600 And we said at the beginning that context-free language 767 00:39:42,600 --> 00:39:43,740 is closed under union. 768 00:39:43,740 --> 00:39:45,660 So we know that these two are context free. 769 00:39:45,660 --> 00:39:47,160 So the pumping lemma better not show 770 00:39:47,160 --> 00:39:48,410 that they're not context free. 771 00:39:48,410 --> 00:39:52,410 Something would be terribly-- have gone terribly wrong 772 00:39:52,410 --> 00:39:53,550 if that were true. 773 00:39:53,550 --> 00:39:59,910 And also we know also from a little bit of further reasoning 774 00:39:59,910 --> 00:40:01,680 that the context-free languages is not 775 00:40:01,680 --> 00:40:06,420 closed under complement by what we've already discussed, 776 00:40:06,420 --> 00:40:08,610 because they are closed under union. 777 00:40:08,610 --> 00:40:12,330 And as I pointed out, they're not closed under intersection. 
778 00:40:12,330 --> 00:40:15,060 And so if they were closed under complement, 779 00:40:15,060 --> 00:40:17,280 De Morgan's Laws would say that closure 780 00:40:17,280 --> 00:40:19,050 under union and closure under complement 781 00:40:19,050 --> 00:40:21,595 would give you closure under intersection. 782 00:40:21,595 --> 00:40:23,470 But we don't have closure under intersection. 783 00:40:23,470 --> 00:40:27,600 So in fact, they're not closed under complement. 784 00:40:27,600 --> 00:40:30,025 OK, so in fact, this does show us 785 00:40:30,025 --> 00:40:31,650 that the class of context-free language 786 00:40:31,650 --> 00:40:33,330 is not closed under intersection, 787 00:40:33,330 --> 00:40:35,640 because the intersection of A1 and A2, 788 00:40:35,640 --> 00:40:41,620 two context-free languages, is B. And B is not context free. 789 00:40:41,620 --> 00:40:44,290 So it shows that this is-- 790 00:40:44,290 --> 00:40:49,810 the closure under intersection does not hold. 791 00:40:49,810 --> 00:40:53,650 So let us continue, then. 792 00:40:53,650 --> 00:40:55,690 We have one more example. 793 00:40:55,690 --> 00:40:58,850 Then we'll take a break. 794 00:40:58,850 --> 00:41:00,880 So the pumping lemma for context-free languages, 795 00:41:00,880 --> 00:41:04,180 again, here is the second example. 796 00:41:04,180 --> 00:41:07,360 Here is the language F. We have actually seen this before. 797 00:41:07,360 --> 00:41:16,077 ww, two copies of a string, two copies of any string-- 798 00:41:16,077 --> 00:41:18,535 and we're going to show that's not a context-free language. 799 00:41:22,200 --> 00:41:25,020 Assume that it is context free, the pumping lemma 800 00:41:25,020 --> 00:41:26,380 gives pumping length. 801 00:41:26,380 --> 00:41:28,380 Now, here you have to do a little bit more work. 
802 00:41:28,380 --> 00:41:33,180 Often, the challenge in applying the pumping lemma 803 00:41:33,180 --> 00:41:36,690 in either case that we've seen involves 804 00:41:36,690 --> 00:41:38,820 choosing that string that you need to pump, 805 00:41:38,820 --> 00:41:40,180 that you're going to pump. 806 00:41:40,180 --> 00:41:43,410 So you have to choose s in F, which is longer 807 00:41:43,410 --> 00:41:45,460 than p, which s to go with. 808 00:41:45,460 --> 00:41:48,103 So you might try this one, first glance. 809 00:41:48,103 --> 00:41:49,770 Here is a string that's in the language, 810 00:41:49,770 --> 00:41:53,955 because it's two copies of the string 0 to the p, 1-- 811 00:41:53,955 --> 00:41:55,140 0 to the p, 1, and then 0 to the p, 1. 812 00:41:55,140 --> 00:41:57,945 So that's in the language, but it's a bad choice. 813 00:42:00,530 --> 00:42:02,240 Before I get ahead of myself, let's 814 00:42:02,240 --> 00:42:06,060 draw a picture of s, which I think is always helpful to see. 815 00:42:06,060 --> 00:42:10,490 So here is runs of 0's and then a 1, runs of 0's and then a 1. 816 00:42:10,490 --> 00:42:12,200 Why is this a bad choice? 817 00:42:12,200 --> 00:42:16,850 Because you can pump that string and you remain in the language. 818 00:42:16,850 --> 00:42:19,807 There is a way to cut that string up 819 00:42:19,807 --> 00:42:21,140 and you'll stay in the language. 820 00:42:21,140 --> 00:42:23,510 And the way to cut it up is to let 821 00:42:23,510 --> 00:42:28,070 the x be just that substring which is just the 1. 822 00:42:28,070 --> 00:42:32,540 And the v and y can be a couple of 0's or a single 0 823 00:42:32,540 --> 00:42:34,280 on either side of that 1. 824 00:42:34,280 --> 00:42:38,840 And now that's going to be a small vxy. 825 00:42:38,840 --> 00:42:41,990 But if you repeat v and y, you're 826 00:42:41,990 --> 00:42:50,800 going to stay in the language, because you'll just 827 00:42:50,800 --> 00:42:51,910 be adding 0's here. 
828 00:42:51,910 --> 00:42:54,100 You'll be adding same number of 0's there. 829 00:42:54,100 --> 00:42:57,280 And then you're going to have a string which 830 00:42:57,280 --> 00:42:58,630 still looks like ww. 831 00:42:58,630 --> 00:43:00,310 And you'll still be in the language. 832 00:43:00,310 --> 00:43:03,700 So that means that cutting it up doesn't get you out 833 00:43:03,700 --> 00:43:06,430 of the language under pumping. 834 00:43:06,430 --> 00:43:09,948 And the fact is that that's a bad choice for s, 835 00:43:09,948 --> 00:43:11,740 because there is that way of cutting it up. 836 00:43:11,740 --> 00:43:14,170 So you have to show there's no way-- 837 00:43:14,170 --> 00:43:16,960 you don't get to pick the way to cut it up. 838 00:43:16,960 --> 00:43:21,100 You have to show that there is no way to cut it up in order 839 00:43:21,100 --> 00:43:25,640 to violate the pumping lemma. 840 00:43:25,640 --> 00:43:29,830 So if instead you use the string 0 to the p, 1 to the p, 841 00:43:29,830 --> 00:43:33,100 0 to the p, 1 to the p-- so this is 0's followed by 1's followed 842 00:43:33,100 --> 00:43:36,400 by 0's followed by 1's, all the same number of them-- 843 00:43:36,400 --> 00:43:40,840 that can't be pumped satisfying the three conditions. 844 00:43:40,840 --> 00:43:42,730 And just going through that-- 845 00:43:46,160 --> 00:43:49,370 now if you try to break it up, you're going to lose. 846 00:43:49,370 --> 00:43:51,105 Or the lemma is going to lose. 847 00:43:51,105 --> 00:43:52,730 You're going to be happy, but the lemma 848 00:43:52,730 --> 00:43:55,160 is not going to be happy, because it's not going-- it's 849 00:43:55,160 --> 00:43:58,400 going to violate the condition. 850 00:43:58,400 --> 00:44:03,290 Condition three says vxy is not-- 851 00:44:03,290 --> 00:44:05,390 doesn't span too much, and in fact, 852 00:44:05,390 --> 00:44:11,360 can't span two runs of 0's or two runs of 1's. 
853 00:44:11,360 --> 00:44:13,460 It's just not big enough, because they're 854 00:44:13,460 --> 00:44:16,010 more than p things-- they're p things apart. 855 00:44:16,010 --> 00:44:20,420 And this one string, this string vxy is only p long. 856 00:44:20,420 --> 00:44:25,620 And so therefore, if you repeat v and y, 857 00:44:25,620 --> 00:44:27,810 you're going to have two runs of 0's or two runs of 1's 858 00:44:27,810 --> 00:44:28,920 that have unequal length. 859 00:44:28,920 --> 00:44:32,160 And now that's not going to be the form ww. 860 00:44:32,160 --> 00:44:35,300 You're going to be out of the language. 861 00:44:35,300 --> 00:44:38,300 So I hope that's-- 862 00:44:38,300 --> 00:44:40,470 you've got a little practice with that. 863 00:44:40,470 --> 00:44:43,130 I think we're at our break. 864 00:44:43,130 --> 00:44:48,420 And I will see you back here in five minutes, 865 00:44:48,420 --> 00:44:53,630 if I can get my timer launched here. 866 00:44:53,630 --> 00:44:55,040 OK, so see you soon. 867 00:44:58,120 --> 00:45:03,730 This is a good time, by the way, to message me or the TAs. 868 00:45:03,730 --> 00:45:08,770 And I'll try to be looking for if you have any questions. 869 00:45:08,770 --> 00:45:10,540 In the pumping lemma, can x-- 870 00:45:10,540 --> 00:45:15,190 yeah, x can be epsilon in the pumping lemma. 871 00:45:15,190 --> 00:45:16,300 x can be epsilon. 872 00:45:16,300 --> 00:45:22,420 y can be epsilon, but v and y cannot both be epsilon, 873 00:45:22,420 --> 00:45:25,720 because then, when you pump, you'll get nothing new. 874 00:45:25,720 --> 00:45:30,320 Technically, v and y can include both 0's and 1's. 875 00:45:30,320 --> 00:45:33,560 Yeah, v and y can include both 0's and 1's. 876 00:45:38,550 --> 00:45:46,075 So let me try to put that back, if that will-- 877 00:45:50,960 --> 00:45:55,940 so v and y can have both 0's and 1's, but they can't have 878 00:45:55,940 --> 00:45:58,040 0's from two different blocks. 
879 00:46:01,180 --> 00:46:03,530 And you can't have 1's from two different blocks. 880 00:46:03,530 --> 00:46:05,680 So what's going to happen is either you're 881 00:46:05,680 --> 00:46:08,050 going to get things out of order when you repeat-- 882 00:46:08,050 --> 00:46:10,090 like, if v has both 0's and 1's in it. 883 00:46:10,090 --> 00:46:14,320 When you repeat v, you're going to have 884 00:46:14,320 --> 00:46:16,810 0's and 1's, and 0's and 1's, and 0's and 1's. 885 00:46:16,810 --> 00:46:19,990 That's clearly out of the language, so that's no good. 886 00:46:19,990 --> 00:46:23,440 Your only hope is to have v sticking only inside 887 00:46:23,440 --> 00:46:27,220 0's or only inside 1's, and y sticking only inside 0's or only inside 888 00:46:27,220 --> 00:46:28,152 1's. 889 00:46:28,152 --> 00:46:29,860 But now, if you repeat that and just look 890 00:46:29,860 --> 00:46:31,510 at what you're going to get, you're 891 00:46:31,510 --> 00:46:35,050 going to have a string which is going to be-- 892 00:46:35,050 --> 00:46:36,940 if you try to cut that string in half, 893 00:46:36,940 --> 00:46:38,930 it's not going to be of the right form. 894 00:46:38,930 --> 00:46:41,055 It's not going to be two copies of the same string, 895 00:46:41,055 --> 00:46:46,900 because it's going to have a run of 0's followed by a longer 896 00:46:46,900 --> 00:46:49,298 or shorter run of 0's, or a run of 1's 897 00:46:49,298 --> 00:46:51,340 followed by another run of 1's of unequal length. 898 00:46:51,340 --> 00:46:56,620 So there is no way this can be two strings, two 899 00:46:56,620 --> 00:46:59,530 copies of the same string, because that's 900 00:46:59,530 --> 00:47:00,430 what you required. 901 00:47:00,430 --> 00:47:02,290 F has to be two copies of the same string 902 00:47:02,290 --> 00:47:05,580 to be in the language. 903 00:47:05,580 --> 00:47:08,850 OK, let me just see where-- we're running out of time here. 904 00:47:08,850 --> 00:47:17,400 Let me just put my timer here.
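The pumping argument above can be checked by brute force for a small pumping length. What follows is only a sketch under stated assumptions: the names in_ww and pumpable_splits are made up for this illustration, and the exponent i is only tried up to a small bound max_i, so a split that "survives" is evidence, not a proof, that it pumps for every i. An empty result, on the other hand, really does mean every split fails for some small i.

```python
def in_ww(s: str) -> bool:
    """True when s is two copies of the same string (the form ww)."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def pumpable_splits(s: str, p: int, max_i: int = 4):
    """List every split s = u v x y z with |vxy| <= p and vy nonempty
    for which u v^i x y^i z keeps the form ww for all i in 0..max_i.
    (max_i is an illustration-only bound, not part of the lemma.)"""
    survivors = []
    n = len(s)
    for a in range(n + 1):                 # v starts at index a
        limit = min(a + p, n)              # vxy may not extend past here
        for b in range(a, limit + 1):      # v = s[a:b]
            for c in range(b, limit + 1):  # x = s[b:c]
                for d in range(c, limit + 1):  # y = s[c:d]
                    u, v, x, y, z = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
                    if not (v or y):
                        continue           # v and y cannot both be empty
                    if all(in_ww(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        survivors.append((u, v, x, y, z))
    return survivors


p = 3
bad_choice = ("0" * p + "1") * 2                        # 0^p 1 0^p 1
good_choice = "0" * p + "1" * p + "0" * p + "1" * p     # 0^p 1^p 0^p 1^p
print(pumpable_splits(bad_choice, p))
print(pumpable_splits(good_choice, p))
```

For p = 3 the first call should list at least the split with v and y on either side of the middle 1, which is the way of cutting up that makes 0^p 1 0^p 1 a bad choice for s, while the second call should come back empty.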
905 00:47:17,400 --> 00:47:18,795 We've only got 30 seconds. 906 00:47:25,230 --> 00:47:29,640 And I'm sorry I'm not getting to answer all the questions here. 907 00:47:29,640 --> 00:47:37,500 OK, we are done with our break. 908 00:47:37,500 --> 00:47:38,500 It's going to come back. 909 00:47:41,500 --> 00:47:45,490 And now we're shifting gears in a major way, 910 00:47:45,490 --> 00:47:47,860 because in a sense, everything we've 911 00:47:47,860 --> 00:47:53,080 done so far has been kind of a warm-up. 912 00:47:53,080 --> 00:47:57,850 These limited computational models 913 00:47:57,850 --> 00:48:00,940 really are kind of helping us to set 914 00:48:00,940 --> 00:48:03,760 our understanding of automata and the definitions 915 00:48:03,760 --> 00:48:05,500 and the notation. 916 00:48:05,500 --> 00:48:08,170 And they're also going to be helpful in providing examples 917 00:48:08,170 --> 00:48:10,270 later on in the term. 918 00:48:10,270 --> 00:48:12,820 But really, in terms of a model of computation, 919 00:48:12,820 --> 00:48:17,890 they don't cut it, because they cannot do very simple things 920 00:48:17,890 --> 00:48:20,800 that we normally think of a computer as being able to do. 921 00:48:20,800 --> 00:48:24,610 So here we're introducing another model of computation, 922 00:48:24,610 --> 00:48:25,750 called the Turing machine. 923 00:48:25,750 --> 00:48:27,840 And that's really going to be the model that we're 924 00:48:27,840 --> 00:48:29,920 going to stick with for the rest of the semester, 925 00:48:29,920 --> 00:48:32,410 because that's going to be our model 926 00:48:32,410 --> 00:48:35,413 of a general-purpose computer, the way 927 00:48:35,413 --> 00:48:36,580 you normally think about it. 928 00:48:39,380 --> 00:48:41,960 So let's-- we'll spend a little time introducing it. 929 00:48:41,960 --> 00:48:50,100 And then we'll continue that discussion next time.
930 00:48:50,100 --> 00:48:52,550 So in terms of a schematic, actually, 931 00:48:52,550 --> 00:48:54,410 the Turing machine model is pretty simple. 932 00:49:01,850 --> 00:49:04,520 It's going to have states and all that stuff. 933 00:49:04,520 --> 00:49:07,092 So there is going to be a finite control here, which 934 00:49:07,092 --> 00:49:09,300 is going to include states and a transition function, 935 00:49:09,300 --> 00:49:11,540 as we'll describe in a minute. 936 00:49:11,540 --> 00:49:13,730 The point is that it's going to have 937 00:49:13,730 --> 00:49:17,420 the input appearing on a tape. 938 00:49:17,420 --> 00:49:20,510 The key difference now is that the machine 939 00:49:20,510 --> 00:49:24,380 is going to be able to change the symbols on the tape. 940 00:49:24,380 --> 00:49:27,650 And so we think of the machine as being able to write as well 941 00:49:27,650 --> 00:49:29,480 as read the tape. 942 00:49:29,480 --> 00:49:36,740 So that's really the key feature of a Turing machine, 943 00:49:36,740 --> 00:49:38,810 is the ability to write on the tape. 944 00:49:38,810 --> 00:49:41,630 Everything else, in a sense, follows from that, 945 00:49:41,630 --> 00:49:43,730 and a few other differences. 946 00:49:43,730 --> 00:49:48,530 But so the fact is that the head can read and write, 947 00:49:48,530 --> 00:49:52,610 so that we can use the tape as storage 948 00:49:52,610 --> 00:49:54,620 much as we use the stack as storage, 949 00:49:54,620 --> 00:49:58,100 but it's not limited in the way we can access it the way a 950 00:49:58,100 --> 00:49:59,090 stack is-- 951 00:49:59,090 --> 00:50:02,930 so we kind of have very flexible access 952 00:50:02,930 --> 00:50:05,150 to the information on the tape. 953 00:50:05,150 --> 00:50:08,120 Now, being able to write on the tape 954 00:50:08,120 --> 00:50:10,730 doesn't do any good if you can't go back and read 955 00:50:10,730 --> 00:50:12,420 what you've written later on.
956 00:50:12,420 --> 00:50:15,290 So we're going to make the head to be able to be two way. 957 00:50:15,290 --> 00:50:17,510 So the head can move left to right as before, 958 00:50:17,510 --> 00:50:19,520 but it can also move back left. 959 00:50:19,520 --> 00:50:21,770 And that's going to be under control of the transition 960 00:50:21,770 --> 00:50:26,180 function, so under program control, essentially. 961 00:50:26,180 --> 00:50:28,160 The tape is going to be-- 962 00:50:28,160 --> 00:50:30,270 oops, sorry. 963 00:50:30,270 --> 00:50:34,210 The tape is infinite to the right. 964 00:50:34,210 --> 00:50:38,650 And so we're not going to limit how much storage the machine 965 00:50:38,650 --> 00:50:39,440 can have. 966 00:50:39,440 --> 00:50:41,898 So the tape is going to-- we'll think of as having, instead 967 00:50:41,898 --> 00:50:44,600 of just having the input on it, it's going to have the input. 968 00:50:44,600 --> 00:50:46,142 But then the rest, it's going to have 969 00:50:46,142 --> 00:50:50,110 infinitely many blanks, blank symbols following the input. 970 00:50:50,110 --> 00:50:57,410 So the tape is infinite in the right-hand direction. 971 00:50:57,410 --> 00:51:00,640 And so there is infinitely many blanks. 972 00:51:00,640 --> 00:51:04,180 I'm going to use that symbol for the blank to follow the input. 973 00:51:04,180 --> 00:51:05,530 You can accept or reject. 974 00:51:05,530 --> 00:51:07,930 Oh yeah, so that's another thing that's important. 975 00:51:07,930 --> 00:51:09,940 Normally, we think of-- 976 00:51:09,940 --> 00:51:13,810 in the previous machines, finite automata, pushdown automata, 977 00:51:13,810 --> 00:51:15,610 when you got to the end of the input, 978 00:51:15,610 --> 00:51:18,418 that's when the acceptance or rejection was decided. 979 00:51:18,418 --> 00:51:20,710 If you were going to accept it at the end of the input, 980 00:51:20,710 --> 00:51:21,850 then you accepted. 
981 00:51:21,850 --> 00:51:25,060 But you have to be in that location 982 00:51:25,060 --> 00:51:27,700 at the end of the input in order for that to take effect. 983 00:51:27,700 --> 00:51:29,200 That doesn't make any sense anymore, 984 00:51:29,200 --> 00:51:32,740 because the machine might go off beyond that, 985 00:51:32,740 --> 00:51:35,860 and still be computing, and come back and read the tape later 986 00:51:35,860 --> 00:51:36,470 on. 987 00:51:36,470 --> 00:51:38,410 So it only really makes sense to let 988 00:51:38,410 --> 00:51:44,040 the machine accept or reject upon entering the accept 989 00:51:44,040 --> 00:51:45,070 or reject state. 990 00:51:45,070 --> 00:51:47,280 So we're going to have a special accept state 991 00:51:47,280 --> 00:51:49,680 and a special reject state, which is also 992 00:51:49,680 --> 00:51:51,540 a little different than before. 993 00:51:51,540 --> 00:51:53,940 And when the machine enters those states, 994 00:51:53,940 --> 00:51:55,110 then the machine-- 995 00:51:55,110 --> 00:51:57,210 then the action takes effect. 996 00:51:57,210 --> 00:52:02,010 The machine halts and then accepts or halts and then 997 00:52:02,010 --> 00:52:03,000 rejects. 998 00:52:03,000 --> 00:52:06,030 So we'll make that absolutely clear in the formal definition 999 00:52:06,030 --> 00:52:10,000 in a second, but just to get the spirit of it. 1000 00:52:10,000 --> 00:52:13,890 So I'm going to give you an example of the thing running. 1001 00:52:13,890 --> 00:52:17,970 Sorry, me too-- again, my PowerPoint is having issues. 1002 00:52:21,950 --> 00:52:26,600 OK, so here is a Turing machine recognizing that language b. 1003 00:52:26,600 --> 00:52:28,180 Actually, I switched gears on you. 1004 00:52:28,180 --> 00:52:30,680 Instead of 0's, 1's, and 2's, I made them a's, b's, and c's, 1005 00:52:30,680 --> 00:52:31,745 but the same idea. 1006 00:52:35,790 --> 00:52:38,630 So I'm going to show you how the Turing machine operates. 
1007 00:52:38,630 --> 00:52:41,760 And then we'll give a formal definition. 1008 00:52:41,760 --> 00:52:43,610 I hope that's on here. 1009 00:52:43,610 --> 00:52:45,550 I think it is. 1010 00:52:45,550 --> 00:52:46,550 In a second, but let's-- 1011 00:52:46,550 --> 00:52:48,967 this is an informal discussion of how the machine is going 1012 00:52:48,967 --> 00:52:52,130 to operate to do this language, a to the k, b to the k, c 1013 00:52:52,130 --> 00:52:55,100 to the k, using its ability to write on the tape 1014 00:52:55,100 --> 00:52:58,050 as well as read and move its head in both directions. 1015 00:52:58,050 --> 00:53:02,690 OK, so let me just first describe in English 1016 00:53:02,690 --> 00:53:06,880 how this machine operates. 1017 00:53:06,880 --> 00:53:10,690 And then we will see it in action 1018 00:53:10,690 --> 00:53:14,460 on this little picture I have over here. 1019 00:53:14,460 --> 00:53:16,210 So the way the machine is going to operate 1020 00:53:16,210 --> 00:53:19,240 is the very first thing is the head is going to start here. 1021 00:53:19,240 --> 00:53:22,530 And the head is going to scan off to the right, 1022 00:53:22,530 --> 00:53:25,950 making sure that the symbols appear in the correct order. 1023 00:53:25,950 --> 00:53:29,750 So it's seeing that there are a's and b's and then c's, 1024 00:53:29,750 --> 00:53:33,230 without checking the quantities, just that the order is correct. 1025 00:53:33,230 --> 00:53:35,240 For that, you don't need to write. 1026 00:53:35,240 --> 00:53:38,450 A finite automaton can check that the input is of the form 1027 00:53:38,450 --> 00:53:40,920 a star, b star, c star. 1028 00:53:40,920 --> 00:53:45,570 So writing is not necessary. 1029 00:53:45,570 --> 00:53:48,760 The machine, if it detects symbols out of order, 1030 00:53:48,760 --> 00:53:52,320 it immediately rejects by going into a special reject state. 1031 00:53:52,320 --> 00:53:55,710 Otherwise, it's going to return its head back to the left end. 
1032 00:53:55,710 --> 00:53:59,680 And let me just show that here. 1033 00:53:59,680 --> 00:54:01,470 So here is-- oh no. 1034 00:54:04,835 --> 00:54:06,210 Before I illustrate it over here, 1035 00:54:06,210 --> 00:54:08,258 let's go through the whole algorithm. 1036 00:54:08,258 --> 00:54:10,800 So the next thing that happens is you're going to scan right. 1037 00:54:10,800 --> 00:54:12,570 And now you want to do the counting. 1038 00:54:12,570 --> 00:54:14,370 So you're going to scan right again, 1039 00:54:14,370 --> 00:54:16,110 but this time, you're going to make 1040 00:54:16,110 --> 00:54:19,710 a bunch of passes over the input, a bunch of scans. 1041 00:54:19,710 --> 00:54:21,600 And each time you make a scan, you're 1042 00:54:21,600 --> 00:54:25,540 going to cross off one symbol of each type. 1043 00:54:25,540 --> 00:54:27,000 So you're going to cross off an a. 1044 00:54:27,000 --> 00:54:28,380 You'll cross off a b. 1045 00:54:28,380 --> 00:54:30,930 You'll cross off a c on a single scan. 1046 00:54:30,930 --> 00:54:33,780 And then you repeat that, crossing off the next a, 1047 00:54:33,780 --> 00:54:35,130 the next b, the next c. 1048 00:54:35,130 --> 00:54:37,680 And you want to make sure that you've 1049 00:54:37,680 --> 00:54:40,920 crossed off all of the symbols on the same run 1050 00:54:40,920 --> 00:54:43,950 and not crossing off some symbols before other symbols, 1051 00:54:43,950 --> 00:54:46,710 while other symbols will remain, because that would mean 1052 00:54:46,710 --> 00:54:48,210 that the counts were not equal. 1053 00:54:48,210 --> 00:54:50,370 If you cross them off and they're all 1054 00:54:50,370 --> 00:54:54,270 run out on the same scan, same pass, 1055 00:54:54,270 --> 00:54:57,630 then we know that the numbers had to start off being equal. 1056 00:54:57,630 --> 00:55:00,090 So I mean, this is a sort of baby stuff here, 1057 00:55:00,090 --> 00:55:01,650 but I hope you get the idea. 
1058 00:55:01,650 --> 00:55:03,545 And we'll kind of illustrate it in a second. 1059 00:55:05,250 --> 00:55:08,500 If you have the last one of each symbol-- 1060 00:55:08,500 --> 00:55:10,530 so what I mean by that is you just 1061 00:55:10,530 --> 00:55:13,620 crossed off the last a, the last b, and the last c-- 1062 00:55:13,620 --> 00:55:17,523 then that means you originally had an equal number. 1063 00:55:17,523 --> 00:55:19,440 And so you accept, because you're crossing off 1064 00:55:19,440 --> 00:55:21,570 one of each on each scan. 1065 00:55:21,570 --> 00:55:25,020 So if you cross off, on the last scan, each one of them 1066 00:55:25,020 --> 00:55:29,080 gets crossed off, then you accept. 1067 00:55:29,080 --> 00:55:32,370 But if it was the last of some symbol but not of other 1068 00:55:32,370 --> 00:55:37,170 symbols, so you crossed off the last a, but there were several 1069 00:55:37,170 --> 00:55:40,080 b's remaining, then you started off with an unequal number 1070 00:55:40,080 --> 00:55:42,610 of a's, b's, and c's. 1071 00:55:42,610 --> 00:55:43,650 Then you can reject. 1072 00:55:43,650 --> 00:55:48,400 Or if symbols of each type still remain after you have crossed 1073 00:55:48,400 --> 00:55:52,560 one of each off, then you haven't done enough passes. 1074 00:55:52,560 --> 00:55:56,040 And you're going to repeat from stage three 1075 00:55:56,040 --> 00:55:59,280 and do another scan. 1076 00:55:59,280 --> 00:56:02,460 OK, so here is a little animation 1077 00:56:02,460 --> 00:56:04,870 which shows this happening on this diagram. 1078 00:56:04,870 --> 00:56:08,640 So here is the very first stage where you're scanning across, 1079 00:56:08,640 --> 00:56:11,220 making sure things are in the right order. 1080 00:56:11,220 --> 00:56:12,840 I didn't have to write on the tape. 1081 00:56:12,840 --> 00:56:16,451 And now you're going to reset the head back to the beginning.
1082 00:56:19,040 --> 00:56:21,860 This is, by the way, not the most efficient procedure 1083 00:56:21,860 --> 00:56:22,670 for doing this. 1084 00:56:29,330 --> 00:56:31,950 Now we're going to do a scan crossing off 1085 00:56:31,950 --> 00:56:35,000 a single a, a single b, and a single c. 1086 00:56:35,000 --> 00:56:38,390 So here, I'm going to show that here, a single a, a single b, 1087 00:56:38,390 --> 00:56:39,703 single c. 1088 00:56:39,703 --> 00:56:41,870 And now as soon as you have crossed out that last c, 1089 00:56:41,870 --> 00:56:44,570 we can return back to the beginning. 1090 00:56:44,570 --> 00:56:46,430 So scan right across-- 1091 00:56:49,020 --> 00:56:51,150 so if all symbols remain, so there are still 1092 00:56:51,150 --> 00:56:52,890 symbols remaining of each type, we're 1093 00:56:52,890 --> 00:56:54,515 going to return to the left and repeat. 1094 00:56:57,590 --> 00:57:01,010 Now we're getting another pass, single a, single b, 1095 00:57:01,010 --> 00:57:03,260 single c get crossed off. 1096 00:57:03,260 --> 00:57:04,970 Have we crossed them all off yet? 1097 00:57:04,970 --> 00:57:09,210 No, there is-- of each type, there still are remaining ones. 1098 00:57:09,210 --> 00:57:11,580 So again, we return back to the beginning. 1099 00:57:11,580 --> 00:57:14,000 Now we have a last pass, cross off 1100 00:57:14,000 --> 00:57:16,190 the last a, the last b, the last c. 1101 00:57:16,190 --> 00:57:18,420 The last one of each type was crossed off. 1102 00:57:18,420 --> 00:57:22,100 So now we know we can accept, because the original string 1103 00:57:22,100 --> 00:57:23,770 was in the language. 1104 00:57:23,770 --> 00:57:28,920 OK, so that's to give you at least some idea how the Turing 1105 00:57:28,920 --> 00:57:31,680 machine can operate, more like the way 1106 00:57:31,680 --> 00:57:33,910 you would think of a computer operating. 1107 00:57:33,910 --> 00:57:35,220 Maybe it's very primitive. 
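The crossing-off procedure just walked through can be sketched as ordinary code. This is a high-level illustration, not a Turing machine transition table: the function name accepts_akbkck and the use of "x" as the crossed-off symbol are assumptions of this sketch, and it also assumes the empty string (k = 0) counts as being in the language.

```python
CROSSED = "x"  # stand-in for a crossed-off symbol (illustration only)


def accepts_akbkck(w: str) -> bool:
    """Follow the lecture's stages on a list that plays the role of the tape."""
    tape = list(w)

    # Stage 1: one left-to-right scan checking the order a*b*c*.
    # No writing is needed for this part.
    last = "a"
    for ch in tape:
        if ch not in "abc" or ch < last:
            return False          # symbol out of order: reject
        last = ch

    if not tape:
        return True               # assumption: k = 0 is allowed

    # Stages 3 and 4: repeated scans, crossing off one a, one b, one c.
    while True:
        crossed, left_over = set(), set()
        for pos, ch in enumerate(tape):
            if ch == CROSSED:
                continue
            if ch in crossed:
                left_over.add(ch)     # this type still has symbols remaining
            else:
                tape[pos] = CROSSED   # first uncrossed symbol of its type
                crossed.add(ch)
        if crossed != {"a", "b", "c"}:
            return False          # some type had already run out
        if not left_over:
            return True           # last a, last b, last c on the same scan
        if left_over != {"a", "b", "c"}:
            return False          # last of some symbol but not of others
        # otherwise symbols of every type remain: do another scan


print(accepts_akbkck("aabbcc"))
print(accepts_akbkck("aabbc"))
```

Each pass of the while loop corresponds to one scan in the animation, and returning to the left end is implicit in restarting the for loop.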
1108 00:57:35,220 --> 00:57:37,740 You could imagine counting also. 1109 00:57:37,740 --> 00:57:39,450 And a Turing machine can count as well. 1110 00:57:39,450 --> 00:57:43,200 But this is the simplest procedure 1111 00:57:43,200 --> 00:57:49,470 that I can just describe for you without making 1112 00:57:49,470 --> 00:57:51,900 it too complicated. 1113 00:57:51,900 --> 00:57:55,550 OK, so let's do a little checking on that. 1114 00:57:55,550 --> 00:58:00,890 OK, so the way I'm describing this, how do you think? 1115 00:58:00,890 --> 00:58:03,350 And in a sense, you don't quite know enough yet. 1116 00:58:03,350 --> 00:58:09,520 But how do you think we're going to get this effect of crossing 1117 00:58:09,520 --> 00:58:12,400 off with the Turing machine? 1118 00:58:12,400 --> 00:58:15,490 Are we going to get that by changing the model 1119 00:58:15,490 --> 00:58:19,360 and adding that ability to cross off to the model? 1120 00:58:19,360 --> 00:58:21,040 Are we going to use a tape alphabet that 1121 00:58:21,040 --> 00:58:24,370 includes those crossed-off symbols among them? 1122 00:58:24,370 --> 00:58:27,050 Or we'll just assume that all Turing 1123 00:58:27,050 --> 00:58:29,050 machines come with an eraser and they can always 1124 00:58:29,050 --> 00:58:30,850 cross off stuff. 1125 00:58:30,850 --> 00:58:34,780 So what do you think is the nice way, sort of mathematically, 1126 00:58:34,780 --> 00:58:45,240 to describe this ability to cross things off? 1127 00:58:56,180 --> 00:59:01,780 Yeah, again, most of you, again, I think are getting this. 1128 00:59:01,780 --> 00:59:07,630 So there are, like, 10 laggards here. 1129 00:59:07,630 --> 00:59:14,280 So please wrap it up so we can close the poll. 1130 00:59:14,280 --> 00:59:15,420 Five seconds to go. 1131 00:59:18,800 --> 00:59:25,550 OK, polling ending, get your last-- last call. 1132 00:59:25,550 --> 00:59:28,420 All right, share the results. 1133 00:59:28,420 --> 00:59:31,370 So most of you got that right. 
1134 00:59:31,370 --> 00:59:33,120 All Turing machines come with the eraser-- 1135 00:59:33,120 --> 00:59:33,480 I don't know. 1136 00:59:33,480 --> 00:59:35,910 That was thrown in there as a joke, but it came in second. 1137 00:59:35,910 --> 00:59:38,970 So don't feel bad if you got it, but that's 1138 00:59:38,970 --> 00:59:41,180 not what I had in mind. 1139 00:59:41,180 --> 00:59:44,190 The way the Turing machine is going to be writing on the tape 1140 00:59:44,190 --> 00:59:50,160 is to write a crossed-off symbol instead of the symbol 1141 00:59:50,160 --> 00:59:51,473 that was originally there. 1142 00:59:51,473 --> 00:59:53,640 So we're going to add these new crossed-off symbols. 1143 00:59:53,640 --> 00:59:57,855 And that's going to be a common thing for us to do 1144 00:59:57,855 --> 00:59:59,688 when we design Turing machines. 1145 00:59:59,688 --> 01:00:01,980 We're not going to get down to the implementation level 1146 01:00:01,980 --> 01:00:02,610 for very long. 1147 01:00:02,610 --> 01:00:03,990 We're going to very quickly shift 1148 01:00:03,990 --> 01:00:07,567 to a higher level of discussion about the machines. 1149 01:00:07,567 --> 01:00:09,150 But anyway, that's how you would do it 1150 01:00:09,150 --> 01:00:11,670 if you were going to actually build a machine. 1151 01:00:11,670 --> 01:00:15,575 So let us then look at the formal definition. 1152 01:00:15,575 --> 01:00:17,700 And personally, maybe I should have done that check 1153 01:00:17,700 --> 01:00:19,050 in after the formal definition. 1154 01:00:19,050 --> 01:00:22,538 That might have been clearer, but oh well. 1155 01:00:22,538 --> 01:00:24,330 OK, Turing-- here is the formal definition. 1156 01:00:24,330 --> 01:00:26,310 This time, a Turing machine is a 7-tuple. 1157 01:00:28,950 --> 01:00:38,090 And there is-- now here, we have sigma, 1158 01:00:38,090 --> 01:00:39,490 which is the input alphabet. 1159 01:00:39,490 --> 01:00:42,070 Gamma is the tape alphabet. 
1160 01:00:42,070 --> 01:00:45,640 So now it's a little bit analogous to the stack 1161 01:00:45,640 --> 01:00:47,920 from before where gamma was the stack alphabet. 1162 01:00:47,920 --> 01:00:49,690 But these are the symbols that you're 1163 01:00:49,690 --> 01:00:52,570 allowed to write on the tape-- 1164 01:00:52,570 --> 01:00:55,120 that are allowed to be on the tape. 1165 01:00:55,120 --> 01:00:58,600 So obviously, all of the input symbols 1166 01:00:58,600 --> 01:01:02,440 are among the tape symbols, because they 1167 01:01:02,440 --> 01:01:03,490 can appear on the tape. 1168 01:01:03,490 --> 01:01:07,893 So you have sigma is a subset of gamma. 1169 01:01:07,893 --> 01:01:09,310 One thing I didn't mention here is 1170 01:01:09,310 --> 01:01:13,690 that the input alphabet, we don't allow the blank symbol 1171 01:01:13,690 --> 01:01:18,310 to be in the input alphabet, so that you can actually 1172 01:01:18,310 --> 01:01:20,110 use the blank symbol as a delimiter 1173 01:01:20,110 --> 01:01:23,910 for the end of the input, a marker 1174 01:01:23,910 --> 01:01:25,870 for the end of the input. 1175 01:01:25,870 --> 01:01:28,010 So in fact, the blank symbol is always 1176 01:01:28,010 --> 01:01:29,385 going to be in the tape alphabet. 1177 01:01:33,460 --> 01:01:36,660 So sigma is actually always going to be a proper subset of gamma because 1178 01:01:36,660 --> 01:01:38,235 of the blank symbol. 1179 01:01:38,235 --> 01:01:40,360 But we're just allowing-- it doesn't really matter. 1180 01:01:40,360 --> 01:01:43,230 We're allowing the tape alphabet to have 1181 01:01:43,230 --> 01:01:45,600 other symbols for convenience, so for example, 1182 01:01:45,600 --> 01:01:47,430 these crossed-off symbols. 1183 01:01:47,430 --> 01:01:49,620 Now let's look at what the transition function, 1184 01:01:49,620 --> 01:01:50,718 how that operates.
1185 01:01:50,718 --> 01:01:52,260 So the transition function, remember, 1186 01:01:52,260 --> 01:01:55,020 tells how the machine is actually doing its computation. 1187 01:01:55,020 --> 01:01:59,430 And it says that, if you're in a certain state and the head 1188 01:01:59,430 --> 01:02:03,780 is looking at a certain tape symbol, 1189 01:02:03,780 --> 01:02:06,300 then you can go to a new state. 1190 01:02:06,300 --> 01:02:10,320 You write a new symbol at that location on the tape. 1191 01:02:10,320 --> 01:02:13,688 And you can move the head either left or right. 1192 01:02:13,688 --> 01:02:15,480 So that's how we get the effect of the head 1193 01:02:15,480 --> 01:02:19,230 being able to be bi-directional. 1194 01:02:19,230 --> 01:02:21,375 And here is the writing on the tape. 1195 01:02:21,375 --> 01:02:22,552 It comes up right here. 1196 01:02:22,552 --> 01:02:26,295 So just an example here which says 1197 01:02:26,295 --> 01:02:29,820 that, if we're in state two and the head is looking at an a 1198 01:02:29,820 --> 01:02:32,400 currently on the tape, then we can move to state r. 1199 01:02:32,400 --> 01:02:33,450 We change that a to a b. 1200 01:02:33,450 --> 01:02:35,490 And we move the head right 1 square. 1201 01:02:39,990 --> 01:02:41,430 Now, this is important. 1202 01:02:44,740 --> 01:02:48,430 When you give a certain input here to the Turing machine, 1203 01:02:48,430 --> 01:02:50,680 it may compute around for a while, 1204 01:02:50,680 --> 01:02:53,290 moving its head back and forth, as we were showing. 1205 01:02:53,290 --> 01:02:55,270 And it may eventually halt by either 1206 01:02:55,270 --> 01:02:57,610 entering the q accept state or the q 1207 01:02:57,610 --> 01:03:01,000 reject state, which I didn't bring out here, 1208 01:03:01,000 --> 01:03:02,150 but that's important. 1209 01:03:02,150 --> 01:03:04,630 These are the accepting, rejecting, special states 1210 01:03:04,630 --> 01:03:06,130 of the machine.
1211 01:03:06,130 --> 01:03:08,145 Or the machine may never enter one of those. 1212 01:03:08,145 --> 01:03:09,520 It may just go on, and on, and on 1213 01:03:09,520 --> 01:03:13,880 and never halt. We call that looping, a little bit 1214 01:03:13,880 --> 01:03:16,550 of a misnomer, because looping implies 1215 01:03:16,550 --> 01:03:18,020 some sort of a repetition. 1216 01:03:18,020 --> 01:03:22,370 For us, looping just means not halting. 1217 01:03:22,370 --> 01:03:27,910 And so therefore, M has three possible outcomes 1218 01:03:27,910 --> 01:03:32,440 for each input, this w. 1219 01:03:32,440 --> 01:03:38,290 It might accept w by entering the accept state. 1220 01:03:38,290 --> 01:03:41,620 It could reject w by entering the reject state, which 1221 01:03:41,620 --> 01:03:44,540 means it's going to reject it by halting. 1222 01:03:44,540 --> 01:03:47,960 Or we also say we can reject by looping. 1223 01:03:47,960 --> 01:03:51,920 You can reject the string by running forever. 1224 01:03:51,920 --> 01:03:54,740 That's just the terminology that's common in the subject. 1225 01:03:54,740 --> 01:03:59,480 So you either accept it by halting and accepting 1226 01:03:59,480 --> 01:04:02,660 or rejecting it by either halting and rejecting 1227 01:04:02,660 --> 01:04:04,190 or by just going forever. 1228 01:04:04,190 --> 01:04:06,230 That's also considered to be rejecting, 1229 01:04:06,230 --> 01:04:09,350 sort of rejecting in a sense by default. 1230 01:04:09,350 --> 01:04:11,223 If you never actually have accepted it, 1231 01:04:11,223 --> 01:04:12,515 then it's going to be rejected. 1232 01:04:15,520 --> 01:04:18,280 OK, check in three here-- 1233 01:04:18,280 --> 01:04:23,030 all right, so now our last check in for the day, 1234 01:04:23,030 --> 01:04:28,030 we say, this Turing machine model is deterministic. 1235 01:04:28,030 --> 01:04:29,000 I'm just saying that. 
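The three outcomes described above (accept, reject by halting, reject by looping) can be seen in a small simulator. This is a minimal sketch under stated assumptions: the names run, BLANK, and max_steps, and the toy machine below, are made up for illustration. The step budget is not part of the Turing machine model; it is there only because a simulator cannot tell looping apart from a long computation, so it reports "still running" rather than a verdict. The sketch also assumes the head stays put if it tries to move left from the leftmost square, a convention the lecture has not stated.

```python
from typing import Dict, Tuple

BLANK = "_"  # stand-in for the blank symbol (assumption of this sketch)

# delta maps (state, symbol under head) to (new state, symbol to write, move)
Delta = Dict[Tuple[str, str], Tuple[str, str, str]]


def run(delta: Delta, w: str, start: str = "q0",
        q_accept: str = "q_acc", q_reject: str = "q_rej",
        max_steps: int = 1000) -> str:
    """Run a deterministic machine on w for at most max_steps steps."""
    tape = list(w) or [BLANK]
    state, head, steps = start, 0, 0
    while state not in (q_accept, q_reject):
        if steps == max_steps:
            return "still running"       # no verdict within the budget
        state, write, move = delta[(state, tape[head])]
        tape[head] = write               # the machine may change the tape
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(BLANK)       # blanks extend to the right
        else:
            head = max(0, head - 1)      # assumed left-end convention
        steps += 1
    return "accept" if state == q_accept else "reject"


# Toy machine: accept if the input starts with 0, reject the empty input,
# and run to the right forever if the input starts with 1.
toy: Delta = {
    ("q0", "0"): ("q_acc", "0", "R"),
    ("q0", BLANK): ("q_rej", BLANK, "R"),
    ("q0", "1"): ("q_run", "1", "R"),
    ("q_run", "0"): ("q_run", "0", "R"),
    ("q_run", "1"): ("q_run", "1", "R"),
    ("q_run", BLANK): ("q_run", BLANK, "R"),
}

for w in ["01", "", "10"]:
    print(repr(w), run(toy, w))
```

In the lecture's terminology, the input "10" is rejected by looping, even though this simulator can only ever report that the machine has not halted yet.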
1236 01:04:29,000 --> 01:04:33,010 But if you look at the way we set it up, 1237 01:04:33,010 --> 01:04:35,680 if you've been following the formal definition so far, 1238 01:04:35,680 --> 01:04:38,080 you would understand why it's deterministic. 1239 01:04:38,080 --> 01:04:41,080 So let's just, as a way of checking that, 1240 01:04:41,080 --> 01:04:43,030 how would we change this definition? 1241 01:04:43,030 --> 01:04:45,428 Because we will look at the next lecture 1242 01:04:45,428 --> 01:04:46,970 at non-deterministic Turing machines. 1243 01:04:46,970 --> 01:04:48,760 So a little bit of a lead in to that, 1244 01:04:48,760 --> 01:04:52,000 how would we change this definition 1245 01:04:52,000 --> 01:04:54,340 to make it a non-deterministic Turing machine? 1246 01:04:54,340 --> 01:04:58,760 Which of those three options would we use? 1247 01:04:58,760 --> 01:05:00,340 So here, I'll launch that poll. 1248 01:05:07,775 --> 01:05:09,602 I've got about 10 people left. 1249 01:05:09,602 --> 01:05:11,060 Let's give them another 10 seconds. 1250 01:05:17,670 --> 01:05:20,865 OK, I think that's everybody who has answered it from before. 1251 01:05:23,710 --> 01:05:26,760 So here, I think you pretty much almost all of you 1252 01:05:26,760 --> 01:05:28,230 got the right idea. 1253 01:05:28,230 --> 01:05:32,190 It is B, in fact, because when we have the power set symbol 1254 01:05:32,190 --> 01:05:34,200 here, that means there might be several-- 1255 01:05:34,200 --> 01:05:36,130 there is a subset of possibilities. 1256 01:05:36,130 --> 01:05:38,580 So that indicates several different ways to go. 1257 01:05:42,010 --> 01:05:44,380 And that's the essence of non-determinism. 1258 01:05:44,380 --> 01:05:48,040 OK, so I think we're-- 1259 01:05:48,040 --> 01:05:48,540 whoops. 1260 01:05:53,990 --> 01:05:59,060 All right, so look, this is also kind of setting us up 1261 01:05:59,060 --> 01:06:04,090 for next lecture and where we're going to be going with this. 
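For reference, the change discussed in that check-in can be written out as the type of the transition function. This is a sketch in standard notation, with Q the set of states and Gamma the tape alphabet; it is the usual textbook form and may differ cosmetically from what was on the slide.

```latex
% Deterministic: exactly one next move for each (state, tape symbol) pair
\delta \colon Q \times \Gamma \to Q \times \Gamma \times \{L, R\}

% Nondeterministic: a set of possible next moves (the power set on the right)
\delta \colon Q \times \Gamma \to \mathcal{P}\bigl(Q \times \Gamma \times \{L, R\}\bigr)
```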
1262 01:06:04,090 --> 01:06:08,670 So these are basically two in a-- well, two or three 1263 01:06:08,670 --> 01:06:11,560 important definitions here. 1264 01:06:11,560 --> 01:06:16,710 One is-- we talked about the regular languages 1265 01:06:16,710 --> 01:06:18,090 from finite automata. 1266 01:06:18,090 --> 01:06:21,060 We talked about the context-free languages 1267 01:06:21,060 --> 01:06:23,100 from the grammars and the pushdown automata. 1268 01:06:23,100 --> 01:06:28,170 What are the languages that the Turing machines can do? 1269 01:06:28,170 --> 01:06:30,870 Those are called, in this course, anyway, 1270 01:06:30,870 --> 01:06:36,420 Turing-recognizable languages, or T recognizable. 1271 01:06:36,420 --> 01:06:40,870 Those are the languages that a Turing machine can recognize. 1272 01:06:40,870 --> 01:06:46,710 And so just to make sure we're on the same page on this, 1273 01:06:46,710 --> 01:06:50,070 the language of the machine is the collection of strings 1274 01:06:50,070 --> 01:06:53,020 that the machine accepts. 1275 01:06:53,020 --> 01:06:54,770 So the things that are not in the language 1276 01:06:54,770 --> 01:06:57,320 are the things that are rejected either by looping 1277 01:06:57,320 --> 01:06:59,840 or by halting and rejecting. 1278 01:06:59,840 --> 01:07:03,320 So only the ones that are accepted are in the language. 1279 01:07:03,320 --> 01:07:05,480 Every machine has just a single language. 1280 01:07:05,480 --> 01:07:08,840 It's the language of all strings that that machine accepts. 1281 01:07:08,840 --> 01:07:11,360 And we'll say that M recognizes that language, 1282 01:07:11,360 --> 01:07:14,150 if that language is the collection of such strings 1283 01:07:14,150 --> 01:07:15,800 that are accepted. 1284 01:07:15,800 --> 01:07:21,980 And we will call that language a Turing-recognizable language, 1285 01:07:21,980 --> 01:07:24,230 if there is some Turing machine that can recognize it.
1286 01:07:28,090 --> 01:07:35,940 Now, this feature of being able to reject by running forever 1287 01:07:35,940 --> 01:07:37,310 is a little bit weird, perhaps. 1288 01:07:40,010 --> 01:07:46,370 And from the standpoint of practicality, 1289 01:07:46,370 --> 01:07:50,000 it's more convenient if the machine always makes a decision 1290 01:07:50,000 --> 01:07:52,940 to accept or reject in finite time 1291 01:07:52,940 --> 01:07:55,580 and doesn't just reject by going forever. 1292 01:07:55,580 --> 01:07:59,630 And so we're going to bring out a special class of Turing 1293 01:07:59,630 --> 01:08:05,090 machines that have that feature, that they always halt. 1294 01:08:05,090 --> 01:08:07,130 The halting states, by the way-- maybe it 1295 01:08:07,130 --> 01:08:09,590 didn't say this explicitly-- are the q accept 1296 01:08:09,590 --> 01:08:10,670 and the q reject states. 1297 01:08:10,670 --> 01:08:13,610 The accept and reject states are the halting states. 1298 01:08:13,610 --> 01:08:15,888 So if the machine halts, that means 1299 01:08:15,888 --> 01:08:17,180 it ends up in one of those two. 1300 01:08:17,180 --> 01:08:19,580 So it has made a decision of accepting 1301 01:08:19,580 --> 01:08:25,490 or rejecting at the point at which it has halted. 1302 01:08:25,490 --> 01:08:28,970 So we'll say a machine is a decider if it always 1303 01:08:28,970 --> 01:08:32,260 halts on every input. 1304 01:08:32,260 --> 01:08:34,510 So for every w fed in, the machine 1305 01:08:34,510 --> 01:08:38,660 is eventually going to come to a q accept or a q reject. 1306 01:08:38,660 --> 01:08:41,170 We call such a machine a decider. 1307 01:08:41,170 --> 01:08:46,279 And now we're going to say, a language is-- 1308 01:08:46,279 --> 01:08:50,463 so we'll say that the machine decides a language if it's 1309 01:08:50,463 --> 01:08:52,880 the language of the machine, so the collection of accepted 1310 01:08:52,880 --> 01:08:55,760 strings, and the machine is the decider. 
1311 01:08:55,760 --> 01:08:59,060 We'll say that, instead of just recognizing the language, 1312 01:08:59,060 --> 01:09:02,729 we'll say that it decides the language. 1313 01:09:02,729 --> 01:09:07,439 And a Turing-decidable language is a language that 1314 01:09:07,439 --> 01:09:08,640 the machine-- 1315 01:09:08,640 --> 01:09:11,939 of all strings the machine accepts for some Turing machine 1316 01:09:11,939 --> 01:09:14,040 which is a decider, which is a Turing machine that 1317 01:09:14,040 --> 01:09:15,960 always halts. 1318 01:09:15,960 --> 01:09:19,370 So if a Turing machine may sometimes reject by looping, 1319 01:09:19,370 --> 01:09:21,470 then it's only recognizing its language. 1320 01:09:21,470 --> 01:09:23,810 If the Turing machine is always halting, 1321 01:09:23,810 --> 01:09:26,779 so it's always rejecting by explicitly coming 1322 01:09:26,779 --> 01:09:29,779 to a reject state and halting, then we'll 1323 01:09:29,779 --> 01:09:32,399 say it's actually deciding the language. 1324 01:09:32,399 --> 01:09:34,268 So then, in a sense, that's better. 1325 01:09:34,268 --> 01:09:36,310 And we're going to distinguish between those two, 1326 01:09:36,310 --> 01:09:40,430 because they're not the same. 1327 01:09:40,430 --> 01:09:42,520 There are some languages which can be recognized, 1328 01:09:42,520 --> 01:09:44,340 but not decided. 1329 01:09:44,340 --> 01:09:47,500 And so in fact, we're going to get the following picture here, 1330 01:09:47,500 --> 01:09:50,870 that the Turing-recognizable languages are a proper superset. 1331 01:09:50,870 --> 01:09:52,077 They include all of-- 1332 01:09:52,077 --> 01:09:53,660 everything that's decidable, certainly 1333 01:09:53,660 --> 01:10:00,500 is going to be recognizable, because being a decider 1334 01:10:00,500 --> 01:10:02,630 is an additional restriction to impose, 1335 01:10:02,630 --> 01:10:04,615 an additional requirement.
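[Editor's note: a minimal sketch of the recognizer/decider distinction, not taken from the lecture. It simulates a one-tape Turing machine under a step budget. The names `run`, `Outcome`, `RECOGNIZER`, and `DECIDER`, the state names, and the example language (binary strings that begin with 1) are all hypothetical choices made for illustration. Both machines have the same language; only the second always halts. Note that a step budget cannot actually detect looping: `UNKNOWN` only means "did not halt within the budget".]

```python
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    UNKNOWN = "did not halt within the step budget"


def run(delta, w, start="q0", accept="qa", reject="qr",
        max_steps=1000, blank="_"):
    """Simulate a one-tape Turing machine for at most max_steps steps.

    delta maps (state, symbol) -> (new_state, symbol_to_write, "L" or "R").
    """
    tape = dict(enumerate(w))  # sparse tape; unwritten cells hold the blank
    state, head = start, 0
    for _ in range(max_steps):
        if state in (accept, reject):  # halting states: stop simulating
            break
        symbol = tape.get(head, blank)
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        # The head cannot move left of the first tape cell.
        head = max(0, head + (1 if move == "R" else -1))
    if state == accept:
        return Outcome.ACCEPT
    if state == reject:
        return Outcome.REJECT
    return Outcome.UNKNOWN


# Hypothetical recognizer: accepts strings starting with 1, and "rejects"
# everything else by moving right forever (it never reaches qr).
RECOGNIZER = {
    ("q0", "1"): ("qa", "1", "R"),
    ("q0", "0"): ("loop", "0", "R"),
    ("q0", "_"): ("loop", "_", "R"),
    ("loop", "0"): ("loop", "0", "R"),
    ("loop", "1"): ("loop", "1", "R"),
    ("loop", "_"): ("loop", "_", "R"),
}

# Hypothetical decider for the same language: halts on every input.
DECIDER = {
    ("q0", "1"): ("qa", "1", "R"),
    ("q0", "0"): ("qr", "0", "R"),
    ("q0", "_"): ("qr", "_", "R"),
}

if __name__ == "__main__":
    for w in ["10", "01", ""]:
        print(repr(w), run(RECOGNIZER, w).name, run(DECIDER, w).name)
```

On strings in the language the two machines agree; on strings outside it, the decider reports an explicit reject while the recognizer simply never answers, which is why the simulation can only report `UNKNOWN` for it.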
1336 01:10:04,615 --> 01:10:05,990 So everything that's decidable is 1337 01:10:05,990 --> 01:10:07,725 going to be automatically recognizable. 1338 01:10:07,725 --> 01:10:10,100 But there are things which are recognizable which are not 1339 01:10:10,100 --> 01:10:13,031 decidable, as we'll see. 1340 01:10:13,031 --> 01:10:14,630 I'll actually give an example of that, 1341 01:10:14,630 --> 01:10:17,270 but not prove it next lecture. 1342 01:10:17,270 --> 01:10:20,600 And just for, just to complete out this picture, 1343 01:10:20,600 --> 01:10:25,790 I'm going to also point out-- we haven't proven this yet, 1344 01:10:25,790 --> 01:10:29,080 but we will prove it-- 1345 01:10:29,080 --> 01:10:30,910 that the decidable languages also 1346 01:10:30,910 --> 01:10:34,150 include all the context-free languages, 1347 01:10:34,150 --> 01:10:36,070 which, in turn, include the regular languages, 1348 01:10:36,070 --> 01:10:37,180 as was already seen. 1349 01:10:37,180 --> 01:10:39,400 So we haven't shown this inclusion yet. 1350 01:10:39,400 --> 01:10:42,400 But actually, this is the picture that we get. 1351 01:10:42,400 --> 01:10:45,640 So there is actually a hierarchy of containments here. 1352 01:10:45,640 --> 01:10:50,170 Regular languages are a subset of the context-free languages, 1353 01:10:50,170 --> 01:10:53,050 which are, in turn, a subset of the decidable languages, which 1354 01:10:53,050 --> 01:10:57,850 in turn, are a subset of the Turing-recognizable languages. 1355 01:10:57,850 --> 01:11:00,070 And so with that, I think we're going 1356 01:11:00,070 --> 01:11:02,110 to move to our little bit of a review of what 1357 01:11:02,110 --> 01:11:04,290 we've done today. 1358 01:11:04,290 --> 01:11:06,800 So we proved that pumping lemma as a tool 1359 01:11:06,800 --> 01:11:10,790 for showing that languages are not context-free languages. 
1360 01:11:10,790 --> 01:11:12,500 We defined Turing machines, which 1361 01:11:12,500 --> 01:11:17,285 is going to be our model that we're going to be focusing 1362 01:11:17,285 --> 01:11:20,750 on for the rest of the term, not forgetting the other models, 1363 01:11:20,750 --> 01:11:23,930 because they're going to be useful examples for us. 1364 01:11:23,930 --> 01:11:27,650 And we defined Turing deciders, Turing machine deciders 1365 01:11:27,650 --> 01:11:29,810 that halt on all inputs. 1366 01:11:29,810 --> 01:11:33,050 OK, so I think, with that, we have come 1367 01:11:33,050 --> 01:11:35,282 to the end of today's lecture. 1368 01:11:35,282 --> 01:11:37,490 I will stick around a little bit and answer questions 1369 01:11:37,490 --> 01:11:38,720 in the chat. 1370 01:11:38,720 --> 01:11:40,705 I will try to share them with everybody 1371 01:11:40,705 --> 01:11:43,205 as I'm answering them so I'm not just talking to one person. 1372 01:11:46,530 --> 01:11:49,470 How is the concept applied in-- 1373 01:11:49,470 --> 01:11:52,860 so I'm getting one question about the practicality 1374 01:11:52,860 --> 01:11:54,270 of all this. 1375 01:11:54,270 --> 01:11:56,010 Bunches of questions are coming in. 1376 01:12:00,380 --> 01:12:06,080 So look, is this stuff all practical? 1377 01:12:06,080 --> 01:12:09,920 I would say, yes and no. 1378 01:12:15,236 --> 01:12:17,130 I don't know which concept you have in mind. 1379 01:12:17,130 --> 01:12:19,890 We're going to introduce lots of concepts in this course. 1380 01:12:19,890 --> 01:12:30,870 And the concept of the finite automata, and the pushdown 1381 01:12:30,870 --> 01:12:34,560 automata, and context-free languages, definitely 1382 01:12:34,560 --> 01:12:39,330 used in other subjects, in other fields in computer science 1383 01:12:39,330 --> 01:12:43,680 and elsewhere-- these are very basic and fundamental notions. 
1384 01:12:43,680 --> 01:12:46,227 And so yes, and Turing machines-- well, 1385 01:12:46,227 --> 01:12:48,060 I mean that's a model of a general computer. 1386 01:12:48,060 --> 01:12:50,190 If you want to understand computation, 1387 01:12:50,190 --> 01:12:52,133 you're going to need to understand some model. 1388 01:12:52,133 --> 01:12:54,300 And a Turing machine is a particularly simple model. 1389 01:12:54,300 --> 01:12:55,770 And that's why we use it. 1390 01:12:55,770 --> 01:12:56,730 As it turns out, it doesn't really 1391 01:12:56,730 --> 01:12:58,230 matter what model you use, but we'll 1392 01:12:58,230 --> 01:13:00,720 talk about that next time. 1393 01:13:00,720 --> 01:13:03,120 But yeah, I would say there is a lot of applied material 1394 01:13:03,120 --> 01:13:07,530 in this course, as time has shown, whether it's 1395 01:13:07,530 --> 01:13:10,500 led to things like public key cryptography, which 1396 01:13:10,500 --> 01:13:15,420 is used on the internet, or understanding 1397 01:13:15,420 --> 01:13:17,010 various algorithms. 1398 01:13:17,010 --> 01:13:19,410 I mean, that's not the reason I study it. 1399 01:13:19,410 --> 01:13:22,260 I study it because I'm more coming at it from more 1400 01:13:22,260 --> 01:13:23,730 of a mathematical perspective. 1401 01:13:23,730 --> 01:13:26,790 I just find the material very beautiful, and interesting, 1402 01:13:26,790 --> 01:13:30,330 and challenging, but it does have applications. 1403 01:13:33,010 --> 01:13:34,210 Any other questions here? 1404 01:13:34,210 --> 01:13:35,710 I think I'm going to sign off, then, 1405 01:13:35,710 --> 01:13:38,060 to get myself set up for my office hours, 1406 01:13:38,060 --> 01:13:40,600 which is on a different Zoom link. 1407 01:13:40,600 --> 01:13:42,220 OK, so thank you, everybody. 1408 01:13:42,220 --> 01:13:44,770 And see you on Thursday. 1409 01:13:44,770 --> 01:13:46,440 Bye bye.