1 00:00:00,000 --> 00:00:04,980 [SQUEAKING] [RUSTLING] [CLICKING] 2 00:00:25,240 --> 00:00:28,120 MICHAEL SIPSER: OK, folks. 3 00:00:28,120 --> 00:00:29,930 Here we are again. 4 00:00:29,930 --> 00:00:34,200 Welcome back for another episode of theory of computation. 5 00:00:34,200 --> 00:00:35,520 This is lecture number 3. 6 00:00:39,210 --> 00:00:42,570 I'm going to review what we've been doing. 7 00:00:42,570 --> 00:00:47,850 We've been looking at finite automata and regular languages. 8 00:00:47,850 --> 00:00:51,720 Those are the languages that finite automata can recognize. 9 00:00:51,720 --> 00:00:55,870 And we talked about nondeterminism. 10 00:00:55,870 --> 00:00:58,240 So we had non-deterministic finite automata 11 00:00:58,240 --> 00:00:59,760 and deterministic finite automata. 12 00:00:59,760 --> 00:01:01,860 We showed that they're equivalent. 13 00:01:01,860 --> 00:01:05,160 We looked at the closure properties 14 00:01:05,160 --> 00:01:08,640 over the regular operations union, concatenation, and star, 15 00:01:08,640 --> 00:01:11,400 and showed that the regular language is really-- 16 00:01:11,400 --> 00:01:13,650 the class of regular languages is closed 17 00:01:13,650 --> 00:01:15,760 under those regular operations. 18 00:01:15,760 --> 00:01:18,660 And we used the constructions that we 19 00:01:18,660 --> 00:01:21,330 developed in the proof of those closure properties 20 00:01:21,330 --> 00:01:23,730 to show that the-- 21 00:01:23,730 --> 00:01:28,350 we can give a way to convert regular expressions 22 00:01:28,350 --> 00:01:31,750 to finite automata. 23 00:01:31,750 --> 00:01:37,080 So that is-- was partway toward our goal 24 00:01:37,080 --> 00:01:40,890 of showing that these regular expressions and finite automata 25 00:01:40,890 --> 00:01:43,980 are equivalent with respect to the class of languages 26 00:01:43,980 --> 00:01:46,230 they describe, namely, the regular languages. 
27 00:01:46,230 --> 00:01:48,240 So regular expressions and finite automata 28 00:01:48,240 --> 00:01:50,310 are interchangeable from the perspective 29 00:01:50,310 --> 00:01:53,890 of what kinds of languages you can do with them. 30 00:01:53,890 --> 00:01:57,010 So we're going to finish that off today. 31 00:01:57,010 --> 00:02:01,650 So let's take a look at what our next topics 32 00:02:01,650 --> 00:02:03,270 we're going to be covering. 33 00:02:03,270 --> 00:02:08,160 We're going to reverse the construction we 34 00:02:08,160 --> 00:02:11,250 gave last time which allowed us to convert regular expressions 35 00:02:11,250 --> 00:02:12,330 to finite automata. 36 00:02:12,330 --> 00:02:13,680 Now we're going to go backwards. 37 00:02:13,680 --> 00:02:16,410 We're going to show how to convert finite automata back 38 00:02:16,410 --> 00:02:18,330 to regular expressions. 39 00:02:18,330 --> 00:02:23,640 And that-- those two constructions together 40 00:02:23,640 --> 00:02:26,640 show us that the regular expressions and finite automata 41 00:02:26,640 --> 00:02:29,560 can be interconverted from one another, 42 00:02:29,560 --> 00:02:31,890 and they're therefore equivalent with respect 43 00:02:31,890 --> 00:02:36,300 to the kinds of things they can do in language recognition 44 00:02:36,300 --> 00:02:38,340 or generation. 45 00:02:38,340 --> 00:02:41,040 Then, we're going to prove that-- 46 00:02:41,040 --> 00:02:43,890 we're going to look at how you prove certain languages are not 47 00:02:43,890 --> 00:02:45,900 regular, they're beyond the capabilities 48 00:02:45,900 --> 00:02:47,100 of finite automata. 49 00:02:47,100 --> 00:02:48,690 And finally, at the end, we're going 50 00:02:48,690 --> 00:02:51,270 to introduce a new model of computation which 51 00:02:51,270 --> 00:02:54,060 is more powerful than the finite automata 52 00:02:54,060 --> 00:02:57,360 and regular expressions, namely, the context-free grammars. 
53 00:02:57,360 --> 00:03:01,170 Those can do other kinds of languages 54 00:03:01,170 --> 00:03:05,400 that the simpler finite automata and regular expression models 55 00:03:05,400 --> 00:03:07,170 can't do. 56 00:03:07,170 --> 00:03:10,710 And I would also just like to note that a lot of what 57 00:03:10,710 --> 00:03:14,610 we're doing is a warm-up toward the more powerful models 58 00:03:14,610 --> 00:03:17,550 of computation that we're going to be looking at later-- 59 00:03:17,550 --> 00:03:19,260 well, in a week or so-- 60 00:03:19,260 --> 00:03:24,210 which are more general purpose computation. 61 00:03:24,210 --> 00:03:27,930 But along the way, introducing these models of finite automata 62 00:03:27,930 --> 00:03:32,760 and context-free languages is interesting and helpful 63 00:03:32,760 --> 00:03:34,500 because many of those-- 64 00:03:34,500 --> 00:03:37,440 a number-- those models turn out to be useful 65 00:03:37,440 --> 00:03:43,410 in a number of applications, whether it's from linguistics 66 00:03:43,410 --> 00:03:46,650 to programming languages. 67 00:03:46,650 --> 00:03:51,540 And a variety of different parts of computer science 68 00:03:51,540 --> 00:03:54,520 and other fields as well use those notions. 69 00:03:54,520 --> 00:04:01,110 So they're useful notions beyond just in this course. 70 00:04:01,110 --> 00:04:06,930 So I just want to-- 71 00:04:06,930 --> 00:04:09,960 a couple of administrative things to touch on. 72 00:04:09,960 --> 00:04:12,390 We are going to have additional check-ins 73 00:04:12,390 --> 00:04:16,300 today, as I mentioned to you. 74 00:04:16,300 --> 00:04:18,450 We're going to start counting participation-- not 75 00:04:18,450 --> 00:04:20,940 correctness, just participation-- 76 00:04:20,940 --> 00:04:23,220 in the live check-ins. 77 00:04:23,220 --> 00:04:30,510 So with that, let us move on to today's material. 
78 00:04:30,510 --> 00:04:31,950 As I mentioned, we're going to be 79 00:04:31,950 --> 00:04:34,650 showing how to convert finite automata 80 00:04:34,650 --> 00:04:36,660 to regular expressions. 81 00:04:36,660 --> 00:04:39,850 And that's going to complete our equivalence of finite automata 82 00:04:39,850 --> 00:04:41,140 and regular expressions. 83 00:04:41,140 --> 00:04:43,680 So just to recap what we did last time, 84 00:04:43,680 --> 00:04:50,620 we showed that if you have a regular expression 85 00:04:50,620 --> 00:04:55,130 and it describes some language, then that language is regular. 86 00:04:55,130 --> 00:04:58,630 So in other words, we have a way of-- we 87 00:04:58,630 --> 00:05:02,530 gave a way of converting regular expressions to finite automata, 88 00:05:02,530 --> 00:05:04,570 as kind of shown in this diagram. 89 00:05:04,570 --> 00:05:06,530 That's what we did last time. 90 00:05:06,530 --> 00:05:09,320 Now we're going to go the other way. 91 00:05:09,320 --> 00:05:10,950 We're going to show how to convert-- 92 00:05:10,950 --> 00:05:16,540 oh, and just a reminder in case you're just getting yourself-- 93 00:05:16,540 --> 00:05:19,900 your memory to work, maybe it'll help you just 94 00:05:19,900 --> 00:05:22,660 to remember that we actually did an example of that conversion. 95 00:05:22,660 --> 00:05:26,800 We looked at this regular expression, a union ab star. 96 00:05:26,800 --> 00:05:30,100 And we actually worked through the process of converting that. 97 00:05:30,100 --> 00:05:30,610 Oops. 98 00:05:30,610 --> 00:05:34,210 I need to make myself smaller so you can see all that. 99 00:05:34,210 --> 00:05:38,900 We went through the process of converting a union ab 100 00:05:38,900 --> 00:05:42,268 star as an example of that-- 101 00:05:42,268 --> 00:05:45,984 of-- made a mis-- 102 00:05:45,984 --> 00:05:49,520 [LAUGHS] OK. 
103 00:05:49,520 --> 00:05:53,840 Well, we went through the process of actually doing 104 00:05:53,840 --> 00:05:54,850 that conversion. 105 00:05:54,850 --> 00:05:57,433 And now we're going to show how to do it the other way around. 106 00:06:00,550 --> 00:06:05,600 So we're going to invert that and go backwards the other way. 107 00:06:05,600 --> 00:06:08,270 So today's theorem is to show that if A is regular, 108 00:06:08,270 --> 00:06:13,790 namely, it's the language of some finite automaton, 109 00:06:13,790 --> 00:06:21,120 then you can convert it to a regular expression which will 110 00:06:21,120 --> 00:06:22,690 describe that same language. 111 00:06:22,690 --> 00:06:25,230 So basically, we're going to give a conversion 112 00:06:25,230 --> 00:06:29,550 from finite automata to regular expressions. 113 00:06:29,550 --> 00:06:31,410 But before we do that, we're going 114 00:06:31,410 --> 00:06:33,878 to have to introduce a new concept. 115 00:06:33,878 --> 00:06:35,670 So we're not going to be able to dive right 116 00:06:35,670 --> 00:06:36,970 into that conversion. 117 00:06:36,970 --> 00:06:39,550 We're going to have to do-- introduce a new model first, 118 00:06:39,550 --> 00:06:42,330 which is going to facilitate that conversion. 119 00:06:42,330 --> 00:06:44,190 And that new model is called-- 120 00:06:44,190 --> 00:06:47,670 it's yet another kind of finite automaton called 121 00:06:47,670 --> 00:06:51,300 a Generalized Nondeterministic Finite 122 00:06:51,300 --> 00:06:57,490 Automaton, or a Generalized NFA, or just simply a GNFA. 123 00:06:57,490 --> 00:07:02,650 So this is yet another variant of the finite automaton model. 124 00:07:02,650 --> 00:07:06,520 And conceptually, it's very simple. 125 00:07:06,520 --> 00:07:08,560 It's similar to the NFAs. 126 00:07:08,560 --> 00:07:16,400 I'll give you-- here's a picture of a GNFA named G, G1. 127 00:07:16,400 --> 00:07:18,222 Very similar to the NFAs. 
128 00:07:18,222 --> 00:07:19,680 But if you look at it for a second, 129 00:07:19,680 --> 00:07:26,090 you'll see that the transitions have more complicated labels. 130 00:07:26,090 --> 00:07:27,860 For the NFAs, we're only allowing 131 00:07:27,860 --> 00:07:30,650 just single symbols, or the empty string, 132 00:07:30,650 --> 00:07:32,150 to appear on the labels. 133 00:07:32,150 --> 00:07:36,260 Now I'm actually allowing you to put full regular expressions 134 00:07:36,260 --> 00:07:40,310 on the labels for the automaton. 135 00:07:40,310 --> 00:07:45,830 Now, we have to understand how a GNFA processes its input. 136 00:07:45,830 --> 00:07:51,590 And the way it works is not complicated to understand. 137 00:07:51,590 --> 00:07:54,050 When you're getting an input string feeding-- 138 00:07:54,050 --> 00:07:57,620 when a GNFA is processing an input string, 139 00:07:57,620 --> 00:08:01,350 it starts at the start state, just like you would imagine. 140 00:08:01,350 --> 00:08:03,770 But now, to go along a transition, 141 00:08:03,770 --> 00:08:06,860 instead of reading just a single symbol, or the empty string, 142 00:08:06,860 --> 00:08:11,690 as in the case for the nondeterministic machine, 143 00:08:11,690 --> 00:08:16,220 it actually gets to read a whole string at one step, 144 00:08:16,220 --> 00:08:18,380 kind of, at one bite. 145 00:08:18,380 --> 00:08:21,620 It can read an entire string and go along 146 00:08:21,620 --> 00:08:27,860 that transition arrow, provided that chunk of the input 147 00:08:27,860 --> 00:08:32,360 that it read is in the regular expression 148 00:08:32,360 --> 00:08:39,120 that that transition has as its label. 149 00:08:39,120 --> 00:08:43,010 So for example, this-- 150 00:08:43,010 --> 00:08:49,580 you can go from q1 to q2 in one step in this GNFA 151 00:08:49,580 --> 00:08:53,630 by reading a, a, b, b off the input. 152 00:08:53,630 --> 00:08:56,615 So it reads all of those four symbols all at once. 
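The chunk-at-a-time reading just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gnfa_accepts`, the triple representation of transitions, and the two labels in the usage note (`a*b*` and `a*b`) are made up for the example and are not the actual labels on G1 in the slide. Labels are written in Python `re` syntax, and the machine accepts if any way of splitting the input into chunks ends in an accepting state.

```python
import re

def gnfa_accepts(transitions, start, accepts, s):
    """Return True if some sequence of chunk reads takes the GNFA from
    `start` to an accepting state while consuming all of `s`.

    transitions: list of (source, label, destination) triples, where
    `label` is a regular expression in Python `re` syntax (an assumed
    encoding chosen for this sketch).
    """
    seen = set()              # (state, position) pairs already explored
    stack = [(start, 0)]      # nondeterministic branches still to try
    while stack:
        state, i = stack.pop()
        if (state, i) in seen:
            continue
        seen.add((state, i))
        if state in accepts and i == len(s):
            return True
        for src, label, dst in transitions:
            if src != state:
                continue
            # Try every possible chunk s[i:j], including the empty chunk.
            for j in range(i, len(s) + 1):
                if re.fullmatch(label, s[i:j]):
                    stack.append((dst, j))
    return False
```

With the hypothetical transitions `[("q1", "a*b*", "q2"), ("q2", "a*b", "q3")]` and `q3` accepting, the input `aabbaab` can be read as the chunk `aabb` followed by the chunk `aab`.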
153 00:08:56,615 --> 00:08:59,330 It just swoops them up and then moves 154 00:08:59,330 --> 00:09:03,280 from q1 to q2 in one step. 155 00:09:03,280 --> 00:09:09,370 And then, when it's in q2 it can read aab and move to q3. 156 00:09:09,370 --> 00:09:11,930 And at q3, there's nowhere to go. 157 00:09:11,930 --> 00:09:14,350 So this is going to be a nondeterministic machine. 158 00:09:14,350 --> 00:09:17,330 There might be several different ways of processing the input. 159 00:09:17,330 --> 00:09:21,100 And if any one of them gets to an accepting state 160 00:09:21,100 --> 00:09:25,470 at the end of the input, we say the GNFA accepts. 161 00:09:25,470 --> 00:09:27,780 So it's similar to nondeterministic-- 162 00:09:27,780 --> 00:09:32,550 to NFAs in the way the acceptance criterion works. 163 00:09:32,550 --> 00:09:34,950 So you could do an example. 164 00:09:34,950 --> 00:09:39,210 But hopefully the concept of how this works is reasonably-- 165 00:09:39,210 --> 00:09:44,250 you can at least buy it, that it processes 166 00:09:44,250 --> 00:09:48,930 the input in chunks at a time. 167 00:09:48,930 --> 00:09:51,690 And those chunks have to be described 168 00:09:51,690 --> 00:09:56,910 by the regular expressions on the transition arrows, 169 00:09:56,910 --> 00:10:00,990 as it moves along those transitions. 170 00:10:00,990 --> 00:10:07,380 So what we're going to do now is to convert 171 00:10:07,380 --> 00:10:12,400 not DFAs to regular expressions, we're 172 00:10:12,400 --> 00:10:15,970 going to convert GNFAs to regular expressions. 173 00:10:15,970 --> 00:10:18,955 That's even harder, because GNFAs 174 00:10:18,955 --> 00:10:22,450 are-- allow you to do all sorts of other things besides just 175 00:10:22,450 --> 00:10:24,220 ordinary DFAs. 176 00:10:24,220 --> 00:10:25,580 So that's a harder job. 177 00:10:25,580 --> 00:10:27,310 Why am I making my life harder? 
178 00:10:27,310 --> 00:10:29,980 Well, you'll see in a minute that it's 179 00:10:29,980 --> 00:10:33,100 going to actually turn out to be helpful to be working 180 00:10:33,100 --> 00:10:36,370 with a more powerful model in the way 181 00:10:36,370 --> 00:10:40,120 this construction is going to work. 182 00:10:40,120 --> 00:10:43,210 Now, before I dive in and do the construction 183 00:10:43,210 --> 00:10:45,940 from GNFAs to regular expressions, 184 00:10:45,940 --> 00:10:50,050 I'm going to make a simplifying assumption about the GNFAs. 185 00:10:50,050 --> 00:10:52,390 I'm going to put them in a special form that's 186 00:10:52,390 --> 00:10:56,110 going to make it easier to do the conversion. 187 00:10:56,110 --> 00:11:00,500 And that simpler form is, first of all, 188 00:11:00,500 --> 00:11:07,390 I'm going to assume the GNFA has just a single accepting state. 189 00:11:07,390 --> 00:11:13,210 And that accepting state is not allowed to be the start state. 190 00:11:13,210 --> 00:11:16,120 So it has to have just a single accepting state. 191 00:11:16,120 --> 00:11:20,800 I've already violated that convenient assumption 192 00:11:20,800 --> 00:11:25,242 in this GNFA, because I have here two accepting states. 193 00:11:25,242 --> 00:11:26,200 That's not what I want. 194 00:11:26,200 --> 00:11:27,850 I want to have just one. 195 00:11:27,850 --> 00:11:30,880 Well, the thing is, it's easy to obtain just one, 196 00:11:30,880 --> 00:11:35,500 just to modify the machine so that I have just one by adding 197 00:11:35,500 --> 00:11:38,590 a new accepting state which is branched 198 00:11:38,590 --> 00:11:42,760 to from the former accepting states by empty transitions. 199 00:11:42,760 --> 00:11:46,600 So I can always jump from q2 to q4 at any time 200 00:11:46,600 --> 00:11:48,880 without even reading any input, just going along 201 00:11:48,880 --> 00:11:51,260 this empty transition. 
202 00:11:51,260 --> 00:11:56,750 And then I declassify the former accepting states as accepting. 203 00:11:56,750 --> 00:11:59,630 And now I have here just a single accepting state. 204 00:11:59,630 --> 00:12:03,080 And because it's going to be a new state that I added, 205 00:12:03,080 --> 00:12:04,610 it won't be the start state. 206 00:12:04,610 --> 00:12:10,310 And I have accomplished that one aspect of my assumption 207 00:12:10,310 --> 00:12:13,170 about the form of the GNFA. 208 00:12:13,170 --> 00:12:15,540 But there's another thing that I want to do, too. 209 00:12:15,540 --> 00:12:17,760 I want to assume-- 210 00:12:17,760 --> 00:12:19,560 as you will see, which is going to be 211 00:12:19,560 --> 00:12:24,600 convenient in my construction-- that we 212 00:12:24,600 --> 00:12:29,800 will have transition arrows going from every state 213 00:12:29,800 --> 00:12:31,187 to every other state. 214 00:12:31,187 --> 00:12:33,520 In fact, I want transition arrows going from every state 215 00:12:33,520 --> 00:12:36,160 even back to themselves. 216 00:12:36,160 --> 00:12:39,130 I want there to be-- all possible transition 217 00:12:39,130 --> 00:12:45,685 arrows should be present, with two exceptions. 218 00:12:45,685 --> 00:12:48,860 For the start state, there should be only transition 219 00:12:48,860 --> 00:12:51,158 arrows exiting the start state. 220 00:12:51,158 --> 00:12:52,700 And for the accepting state-- there's 221 00:12:52,700 --> 00:12:56,000 just one now-- there should be only transition arrows coming 222 00:12:56,000 --> 00:12:58,740 into the accept state. 223 00:12:58,740 --> 00:13:02,040 So it's kind of what you would imagine as being reasonable. 224 00:13:02,040 --> 00:13:06,660 For the other states, which are not accepting or starting, 225 00:13:06,660 --> 00:13:10,170 there should be transition arrows leaving and coming 226 00:13:10,170 --> 00:13:11,530 from everywhere else. 
227 00:13:11,530 --> 00:13:13,185 But for the start state, just leaving. 228 00:13:13,185 --> 00:13:16,080 And for the accept state, just coming in. 229 00:13:16,080 --> 00:13:19,650 And you could easily modify the machine to achieve that. 230 00:13:19,650 --> 00:13:21,580 Let's just see how to do that in one example. 231 00:13:21,580 --> 00:13:25,620 So from-- notice that from q3 to q2 232 00:13:25,620 --> 00:13:27,557 there is no transition right now. 233 00:13:27,557 --> 00:13:28,390 And that's not good. 234 00:13:28,390 --> 00:13:29,348 That's not what I want. 235 00:13:29,348 --> 00:13:31,620 I want there to be a transition from q3 to q2. 236 00:13:31,620 --> 00:13:35,340 Well, I'll just add that transition in. 237 00:13:35,340 --> 00:13:37,900 But I'm going to label it with the empty language 238 00:13:37,900 --> 00:13:39,855 regular expression. 239 00:13:39,855 --> 00:13:41,730 So that means, yeah, the transition is there, 240 00:13:41,730 --> 00:13:43,350 but you never can take it. 241 00:13:43,350 --> 00:13:47,160 So it doesn't change the language 242 00:13:47,160 --> 00:13:49,800 that the machine is going to be recognizing. 243 00:13:49,800 --> 00:13:56,040 But it fulfills my assumption, my convenient assumption, 244 00:13:56,040 --> 00:13:58,890 that we have all of these transition arrows being 245 00:13:58,890 --> 00:14:01,310 present in the machine. 246 00:14:01,310 --> 00:14:06,020 So anyway, I hope you will buy it. 247 00:14:06,020 --> 00:14:07,160 It's not going to be-- 248 00:14:07,160 --> 00:14:09,290 if you don't quite get this, don't worry. 249 00:14:09,290 --> 00:14:11,300 It's not totally critical that you're 250 00:14:11,300 --> 00:14:15,470 following all these little adjustments and modifications 251 00:14:15,470 --> 00:14:16,340 to the GNFA. 252 00:14:16,340 --> 00:14:18,440 But it will be helpful to understand what GNFAs 253 00:14:18,440 --> 00:14:21,590 themselves-- how they work. 
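The adjustments described above can be sketched as a small Python routine. Everything named here is an assumption made for illustration: the function `to_special_form`, the fresh state names `"S"` and `"A"`, the dictionary encoding of arrows, and the use of `None` to stand for the empty-language label (an arrow that is present but can never be taken). The lecture only describes adding a new accept state; adding a fresh start state in the same way is an extra step assumed here so that nothing enters the start state.

```python
def to_special_form(states, start, accepts, arrows):
    """Put a GNFA into the special form used by the conversion.

    arrows: dict mapping (source, destination) -> label, where a label is
    a regex string and "" means the empty string.  None is used for the
    empty-language label.  All of this encoding is hypothetical.
    Returns (all_states, new_start, new_accept, new_arrows).
    """
    new_start, new_accept = "S", "A"   # hypothetical fresh state names
    assert new_start not in states and new_accept not in states

    out = dict(arrows)
    # Fresh start state: jump to the old start without reading input.
    out[(new_start, start)] = ""
    # Single accept state: reached from each former accepting state by an
    # empty-string arrow; the former accepting states no longer accept.
    for q in accepts:
        out[(q, new_accept)] = ""

    all_states = [new_start] + list(states) + [new_accept]
    # Fill in every missing arrow with the empty-language label, except
    # arrows into the start state and arrows out of the accept state.
    for src in all_states:
        if src == new_accept:
            continue
        for dst in all_states:
            if dst == new_start:
                continue
            out.setdefault((src, dst), None)
    return all_states, new_start, new_accept, out
```

Because the filled-in arrows carry the empty-language label, they add no new way of reading input, so the language of the machine is unchanged.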
254 00:14:21,590 --> 00:14:23,470 So as I mentioned, we can easily modify 255 00:14:23,470 --> 00:14:27,970 a GNFA to have the special form that we're assuming here. 256 00:14:27,970 --> 00:14:31,790 So now we're going to jump in and start doing the conversion. 257 00:14:31,790 --> 00:14:33,790 So we're going to have a lemma, which 258 00:14:33,790 --> 00:14:38,140 is like a theorem that really is just of local interest here. 259 00:14:38,140 --> 00:14:39,875 It's not a general interest theorem. 260 00:14:39,875 --> 00:14:41,500 It's going to be relevant just to GNFAs, 261 00:14:41,500 --> 00:14:45,370 which are really just defined to help us do this conversion. 262 00:14:45,370 --> 00:14:48,950 They really don't have any other independent value. 263 00:14:48,950 --> 00:14:52,060 So every-- you want to show that every GNFA has 264 00:14:52,060 --> 00:14:57,800 an equivalent regular expression R. That's really my goal. 265 00:14:57,800 --> 00:15:02,490 And the way we're going to prove that is by induction. 266 00:15:02,490 --> 00:15:05,640 It's going to be by induction on the number of states 267 00:15:05,640 --> 00:15:08,040 of the GNFA. 268 00:15:08,040 --> 00:15:12,060 Now, you really should be familiar with induction 269 00:15:12,060 --> 00:15:15,420 as one of the expectations for being in this course. 270 00:15:15,420 --> 00:15:18,270 But in case you're a little shaky on it, don't worry. 271 00:15:18,270 --> 00:15:21,540 I'm going to unpack it as a procedure. 272 00:15:21,540 --> 00:15:22,680 It's really just recursion. 273 00:15:22,680 --> 00:15:24,685 You know, induction is just-- 274 00:15:24,685 --> 00:15:26,940 a proof that uses induction is really 275 00:15:26,940 --> 00:15:28,660 just a proof that calls itself. 276 00:15:28,660 --> 00:15:30,870 It's just a proof that-- it's a recursive proof. 277 00:15:30,870 --> 00:15:32,230 That's all it is. 
278 00:15:32,230 --> 00:15:33,930 So if you're comfortable with recursion, 279 00:15:33,930 --> 00:15:36,210 you'll be comfortable with induction. 280 00:15:36,210 --> 00:15:38,770 But anyway, I'm going to describe this as a procedure. 281 00:15:38,770 --> 00:15:43,020 So if you're a little shaky on induction, don't worry. 282 00:15:43,020 --> 00:15:44,670 So the basis is-- 283 00:15:44,670 --> 00:15:47,250 so first I'm going to handle the case where 284 00:15:47,250 --> 00:15:49,425 the GNFA has just two states. 285 00:15:53,590 --> 00:15:57,010 Now, remember, I'm assuming now my GNFAs 286 00:15:57,010 --> 00:15:58,870 are in the special form. 287 00:15:58,870 --> 00:16:01,180 So you can't even have a GNFA with one state, 288 00:16:01,180 --> 00:16:02,830 because it has to have a start state 289 00:16:02,830 --> 00:16:04,390 and it has to have an accept state, 290 00:16:04,390 --> 00:16:06,200 and they have to not be the same. 291 00:16:06,200 --> 00:16:08,770 So the smallest possible GNFA to worry about 292 00:16:08,770 --> 00:16:11,650 is a two-state GNFA. 293 00:16:11,650 --> 00:16:14,890 Now, if we have a-- if we happen to have a two-state GNFA, 294 00:16:14,890 --> 00:16:17,050 it turns out to be very easy to find 295 00:16:17,050 --> 00:16:18,550 the equivalent regular expression. 296 00:16:18,550 --> 00:16:19,630 Why? 297 00:16:19,630 --> 00:16:25,070 Because that two-state GNFA can only look like this. 298 00:16:25,070 --> 00:16:28,060 It can have a start state, it can have an accept state, 299 00:16:28,060 --> 00:16:30,850 and it can only have a transition going 300 00:16:30,850 --> 00:16:34,150 from the start to the accept because no other transitions 301 00:16:34,150 --> 00:16:37,040 are allowed. 302 00:16:37,040 --> 00:16:40,490 It only has outgoing from the start, only incoming from the-- 303 00:16:40,490 --> 00:16:41,690 to the accept. 304 00:16:41,690 --> 00:16:43,820 And so there's only one transition. 
305 00:16:43,820 --> 00:16:47,300 And it has a label with a regular expression R. 306 00:16:47,300 --> 00:16:50,090 So what do you think the equivalent regular expression 307 00:16:50,090 --> 00:16:51,650 is for this GNFA? 308 00:16:51,650 --> 00:16:54,380 It's just simply the one that's labeling that transition, 309 00:16:54,380 --> 00:16:57,380 because that tells us when I can go from the start 310 00:16:57,380 --> 00:16:58,067 to the accept. 311 00:16:58,067 --> 00:16:59,900 And there's nothing else the machine can do. 312 00:16:59,900 --> 00:17:02,150 It just makes one step, which is to accept 313 00:17:02,150 --> 00:17:05,730 its input if it's described by that regular expression. 314 00:17:05,730 --> 00:17:08,329 So therefore, the equivalent regular expression 315 00:17:08,329 --> 00:17:10,790 that we're looking for is simply the label 316 00:17:10,790 --> 00:17:14,109 on that single transition. 317 00:17:14,109 --> 00:17:17,680 So two-state GNFAs are easy. 318 00:17:17,680 --> 00:17:20,290 But what if-- what happens if you have more states? 319 00:17:20,290 --> 00:17:24,290 Then you're going to actually have to do some work. 320 00:17:24,290 --> 00:17:27,099 So we call that the induction step. 321 00:17:27,099 --> 00:17:30,240 That's when we have more than two states. 322 00:17:30,240 --> 00:17:33,000 And what that-- the way the induction works is we're 323 00:17:33,000 --> 00:17:35,370 going to assume we already know how 324 00:17:35,370 --> 00:17:39,660 to do it for k minus 1 states. 325 00:17:39,660 --> 00:17:41,310 And we're going to use that knowledge 326 00:17:41,310 --> 00:17:44,470 to show how to do it for k states. 327 00:17:44,470 --> 00:17:46,830 So in other words, we already know 328 00:17:46,830 --> 00:17:48,563 how to do it for two states. 
329 00:17:48,563 --> 00:17:49,980 I'm going to use that fact to show 330 00:17:49,980 --> 00:17:52,507 how to do it for three states, and then use the fact that I 331 00:17:52,507 --> 00:17:54,840 can do it for three states to show how to do it for four 332 00:17:54,840 --> 00:17:56,460 states, and so on, and so on. 333 00:17:59,440 --> 00:18:06,520 And the idea for how to do that is actually pretty easy 334 00:18:06,520 --> 00:18:08,680 to grasp. 335 00:18:08,680 --> 00:18:12,460 What we're going to do is, if we have a k state GNFA 336 00:18:12,460 --> 00:18:15,960 that we want to convert, we're going 337 00:18:15,960 --> 00:18:20,490 to change that k state GNFA to a k minus 1 state GNFA 338 00:18:20,490 --> 00:18:23,370 and then use our assumption that we already know how 339 00:18:23,370 --> 00:18:25,270 to do the k minus 1 state GNFA. 340 00:18:29,320 --> 00:18:35,050 So in terms of a picture, I'm going to take a k state-- 341 00:18:35,050 --> 00:18:38,290 to prove that I can always convert k state 342 00:18:38,290 --> 00:18:41,950 GNFAs to regular expressions, I'm 343 00:18:41,950 --> 00:18:46,600 going to show how to convert the k state one into an equivalent 344 00:18:46,600 --> 00:18:48,845 k minus 1 state GNFA. 345 00:18:48,845 --> 00:18:51,220 And then, if you just like to think of this procedurally, 346 00:18:51,220 --> 00:18:53,860 the k minus 1 gets converted to a k minus 2, 347 00:18:53,860 --> 00:18:56,690 gets converted to a k minus 3, and so on, and so on, 348 00:18:56,690 --> 00:19:01,450 until I get down to two, which then I know how to do directly. 349 00:19:01,450 --> 00:19:03,490 So the whole name of the game here 350 00:19:03,490 --> 00:19:07,780 is figuring out how to convert a GNFA that 351 00:19:07,780 --> 00:19:12,130 has k states into another one that has one fewer state that 352 00:19:12,130 --> 00:19:15,680 does the same language. 353 00:19:15,680 --> 00:19:17,670 So you have to hold that in your head. 
354 00:19:17,670 --> 00:19:19,670 I mean, I wish I had more blackboard space here, 355 00:19:19,670 --> 00:19:21,340 but it's very limited here. 356 00:19:21,340 --> 00:19:23,660 So you have to remember what we're 357 00:19:23,660 --> 00:19:26,120 going to be doing on the next slide, 358 00:19:26,120 --> 00:19:28,100 because that's going to finish the job for us. 359 00:19:28,100 --> 00:19:30,990 As long as I can show in general how to convert a k 360 00:19:30,990 --> 00:19:34,400 state GNFA to a GNFA that has one fewer state 361 00:19:34,400 --> 00:19:38,050 but it still does the same language, 362 00:19:38,050 --> 00:19:40,510 I'm good, because then I can keep iterating 363 00:19:40,510 --> 00:19:43,820 that till I get down to two. 364 00:19:43,820 --> 00:19:50,820 So here is-- this is the guts of the argument. 365 00:19:50,820 --> 00:19:52,700 So I have my k state machine. 366 00:19:52,700 --> 00:19:54,020 Here's my start state. 367 00:19:54,020 --> 00:19:56,420 Here's my accept state. 368 00:19:56,420 --> 00:19:59,060 Here's my k minus 1 state, that machine 369 00:19:59,060 --> 00:20:01,430 that I'm going to be building for you. 370 00:20:01,430 --> 00:20:03,740 It's actually going to be-- 371 00:20:03,740 --> 00:20:06,380 look almost exactly the same. 372 00:20:06,380 --> 00:20:14,600 I'm just going to remove one state from the bigger machine. 373 00:20:14,600 --> 00:20:18,430 So I'm going to pick any state which is not the start 374 00:20:18,430 --> 00:20:21,070 state or the accept state. 375 00:20:21,070 --> 00:20:23,960 Here it is pictured here. 376 00:20:23,960 --> 00:20:26,390 I mean, all of the states of the k state machine 377 00:20:26,390 --> 00:20:29,420 are going to appear in the k minus 1 state machine 378 00:20:29,420 --> 00:20:33,020 except for one state that I'm going to rip out. 379 00:20:36,570 --> 00:20:38,250 That's the state x. 380 00:20:38,250 --> 00:20:41,405 It's now here as a ghost. 381 00:20:41,405 --> 00:20:42,155 It's been removed. 
382 00:20:44,750 --> 00:20:46,130 It's not there anymore. 383 00:20:46,130 --> 00:20:47,720 But I'm just helping you to remember 384 00:20:47,720 --> 00:20:51,440 that it used to be there by showing this shadow. 385 00:20:51,440 --> 00:20:55,560 But it's a-- 386 00:20:55,560 --> 00:21:01,260 I have taken my original machine that had k states 387 00:21:01,260 --> 00:21:05,010 and basically just ripped out a state. 388 00:21:05,010 --> 00:21:09,560 And now I have one fewer state. 389 00:21:09,560 --> 00:21:13,220 So the good news is that I now have a machine 390 00:21:13,220 --> 00:21:15,170 with k minus 1 states. 391 00:21:15,170 --> 00:21:17,240 That's what I want. 392 00:21:17,240 --> 00:21:19,130 But the bad news is that it doesn't 393 00:21:19,130 --> 00:21:21,350 do the same language anymore. 394 00:21:21,350 --> 00:21:24,140 I broke the machine by rip-- 395 00:21:24,140 --> 00:21:26,868 if you're just going to rip out a state, who knows what 396 00:21:26,868 --> 00:21:28,160 the new machine is going to do. 397 00:21:28,160 --> 00:21:31,340 It's going to be probably not the same as what 398 00:21:31,340 --> 00:21:33,260 the original machine did. 399 00:21:33,260 --> 00:21:39,170 So what I need to do, then, is repair the damage. 400 00:21:39,170 --> 00:21:42,830 I've got to fix the damage that I caused by removing x. 401 00:21:45,980 --> 00:21:53,420 And whatever role x was playing in the original machine, 402 00:21:53,420 --> 00:21:58,530 I've got to make sure that the new machine that I have, 403 00:21:58,530 --> 00:22:03,860 which doesn't have x anymore, can still do the same things 404 00:22:03,860 --> 00:22:06,470 that the original machine did. 405 00:22:06,470 --> 00:22:08,720 And so the way I'm going to do that is 406 00:22:08,720 --> 00:22:13,100 look at all of the paths that could go through x and make 407 00:22:13,100 --> 00:22:15,320 sure that they are still present even though I 408 00:22:15,320 --> 00:22:18,322 don't have x anymore. 
409 00:22:18,322 --> 00:22:20,655 And the way I'm going to do that is, I'm going to take-- 410 00:22:23,370 --> 00:22:28,200 consider a part of a path that might use x. 411 00:22:28,200 --> 00:22:34,260 So it starts-- let's pick two states, qi and qj, 412 00:22:34,260 --> 00:22:38,520 in the machine that had k states. 413 00:22:38,520 --> 00:22:39,673 Let me just see here-- 414 00:22:39,673 --> 00:22:40,590 I don't know if this-- 415 00:22:43,782 --> 00:22:45,620 OK. 416 00:22:45,620 --> 00:22:47,570 We have-- if we have-- 417 00:22:47,570 --> 00:22:54,230 we'll pick two states, qi and qj, in the original machine. 418 00:22:54,230 --> 00:22:58,160 Now, qi might have the possibility 419 00:22:58,160 --> 00:23:00,080 of going to state x. 420 00:23:00,080 --> 00:23:03,020 And then x might have a self loop. 421 00:23:03,020 --> 00:23:04,900 And then it might go to qj. 422 00:23:08,010 --> 00:23:10,440 The new machine doesn't have an x anymore. 423 00:23:13,470 --> 00:23:20,230 The way I'm going to fix that is by replacing the label that 424 00:23:20,230 --> 00:23:26,260 goes directly from i to j with a new label that 425 00:23:26,260 --> 00:23:31,045 adds in all of the things I lost when I removed x. 426 00:23:37,480 --> 00:23:38,780 That's the whole idea here. 427 00:23:38,780 --> 00:23:42,700 So here is qi to qj, but there's no x anymore. 428 00:23:42,700 --> 00:23:45,760 How could I get from qi to qj? 429 00:23:51,128 --> 00:23:56,410 What were the inputs that could have brought us 430 00:23:56,410 --> 00:23:59,170 from qi to qj via x? 431 00:24:01,900 --> 00:24:04,880 Well, they would have been an input that 432 00:24:04,880 --> 00:24:08,510 read a string described by r1. 433 00:24:08,510 --> 00:24:11,240 I might have self-looped at x a few times, 434 00:24:11,240 --> 00:24:15,140 so I might have read several strings that 435 00:24:15,140 --> 00:24:17,830 are described by r2. 
436 00:24:17,830 --> 00:24:20,080 And then I would have read a string 437 00:24:20,080 --> 00:24:21,760 that was described by r3. 438 00:24:21,760 --> 00:24:23,215 And now I'm at qj. 439 00:24:27,870 --> 00:24:32,490 So the new label that I'm going to place over here 440 00:24:32,490 --> 00:24:40,160 is going to be the strings that I get from reading r1-- 441 00:24:40,160 --> 00:24:46,220 reading a string that's described by r1, then 442 00:24:46,220 --> 00:24:48,470 multiple copies of strings-- 443 00:24:48,470 --> 00:24:51,410 multiple strings that are possibly 444 00:24:51,410 --> 00:24:54,700 described by r2, which is the same as r2 star. 445 00:24:54,700 --> 00:24:56,180 Oh, and then multiples-- and then 446 00:24:56,180 --> 00:24:59,270 a string that could be described by r3. 447 00:24:59,270 --> 00:25:03,290 So that is a new addition to the transition 448 00:25:03,290 --> 00:25:07,430 that takes me from qi to qj. 449 00:25:07,430 --> 00:25:09,830 Of course, I need to include the things that 450 00:25:09,830 --> 00:25:13,620 would have taken me from qi to qj in the first place. 451 00:25:13,620 --> 00:25:16,820 So I'm also unioning in r4, which 452 00:25:16,820 --> 00:25:19,850 is the direct route from qi to qj 453 00:25:19,850 --> 00:25:23,850 that did not transit through x. 454 00:25:23,850 --> 00:25:36,590 So by making that new regular expression 455 00:25:36,590 --> 00:25:41,520 on the qi to qj transition, I have compensated 456 00:25:41,520 --> 00:25:50,520 for the loss of x for paths that go from qi to x and then out to 457 00:25:50,520 --> 00:25:51,300 qj. 458 00:25:51,300 --> 00:25:56,340 Now, what I need to do is to do that same thing 459 00:25:56,340 --> 00:26:01,610 for every pair qi and qj that are in the original machine.
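[Editor's sketch] The repair step just described, replacing the qi to qj label by (r1)(r2)*(r3) union r4 for every pair, might look roughly like this in Python. This is only an illustration under assumptions that are not from the lecture: labels are stored as Python regex strings in a dictionary, `None` (or a missing entry) stands for the empty-language label, union is written with Python's `|`, and the function name `rip_state` is made up for this example.

```python
import re

def rip_state(labels, states, x):
    """Remove state x from a GNFA and repair every remaining transition.

    labels maps (qi, qj) to a regex string; a missing entry or None means
    the empty-language label (no way to go directly from qi to qj).
    """
    remaining = [q for q in states if q != x]
    r2 = labels.get((x, x))
    # (r2)* is the self loop on x; with no self loop it contributes nothing.
    loop = f"({r2})*" if r2 is not None else ""
    new_labels = {}
    for qi in remaining:
        for qj in remaining:          # includes qi == qj (self loops)
            r1 = labels.get((qi, x))  # qi -> x
            r3 = labels.get((x, qj))  # x -> qj
            r4 = labels.get((qi, qj)) # direct route that avoids x
            if r1 is None or r3 is None:
                new = r4              # no path through x to compensate for
            elif r4 is None:
                new = f"({r1}){loop}({r3})"
            else:
                new = f"({r1}){loop}({r3})|({r4})"
            if new is not None:
                new_labels[(qi, qj)] = new
    return remaining, new_labels

# Hypothetical three-state example: s --a--> x, x loops on b, x --c--> t, s --d--> t.
example = {("s", "x"): "a", ("x", "x"): "b", ("x", "t"): "c", ("s", "t"): "d"}
states_left, repaired = rip_state(example, ["s", "x", "t"], "x")
print(repaired[("s", "t")])
```

Calling `rip_state` repeatedly until only the start and accept states remain would leave a single label, which is the regular expression for the whole machine.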
460 00:26:04,590 --> 00:26:07,970 And so if I do that for every possible pair, 461 00:26:07,970 --> 00:26:10,790 I'll be modifying all of the transitions 462 00:26:10,790 --> 00:26:21,280 in the new machine in a way that compensates for the loss of x. 463 00:26:21,280 --> 00:26:24,610 And now the new machine has been repaired from the damage 464 00:26:24,610 --> 00:26:26,320 that I caused by removing x. 465 00:26:26,320 --> 00:26:28,100 And it does the same language. 466 00:26:28,100 --> 00:26:30,600 It's the kind of thing you need to think a little bit about. 467 00:26:30,600 --> 00:26:32,210 I understand. 468 00:26:32,210 --> 00:26:36,170 But at least hopefully, the spirit 469 00:26:36,170 --> 00:26:38,480 of what I just described to you comes through, 470 00:26:38,480 --> 00:26:42,120 that we're going to convert this k-- 471 00:26:42,120 --> 00:26:45,500 machine with k states to one with k minus 1 states 472 00:26:45,500 --> 00:26:48,120 by removing a state and repairing the damage. 473 00:26:48,120 --> 00:26:49,890 And now it does the same language. 474 00:26:49,890 --> 00:26:51,350 And then I can remove another state 475 00:26:51,350 --> 00:26:53,090 and do the same thing over and over again 476 00:26:53,090 --> 00:26:56,670 until I get down to two states. 477 00:26:56,670 --> 00:26:59,460 So that's the idea. 478 00:26:59,460 --> 00:27:03,030 And that really completes the proof. 479 00:27:03,030 --> 00:27:10,440 That shows that I can convert every GNFA 480 00:27:10,440 --> 00:27:14,890 to a regular expression. 481 00:27:14,890 --> 00:27:19,540 And that really is the end of the story for this. 482 00:27:19,540 --> 00:27:23,290 And thus I claim that DFAs, now, and regular expressions 483 00:27:23,290 --> 00:27:25,570 are equivalent. 484 00:27:25,570 --> 00:27:31,710 So let me-- going to give you a little check-in here on this, 485 00:27:31,710 --> 00:27:35,700 really just to see, high-level, if you're following 486 00:27:35,700 --> 00:27:36,600 what's going on. 
487 00:27:39,270 --> 00:27:40,440 So just take a look. 488 00:27:40,440 --> 00:27:43,650 So we just showed how to convert GNFAs to regular expression. 489 00:27:43,650 --> 00:27:47,170 But we really wanted to convert DFAs to regular expressions. 490 00:27:47,170 --> 00:27:49,350 So how do we go from GNFA-- 491 00:27:49,350 --> 00:27:51,495 converting GNFAs to converting DFAs? 492 00:27:51,495 --> 00:27:53,960 Because they're not the same, obviously. 493 00:27:53,960 --> 00:27:55,620 Right? 494 00:27:55,620 --> 00:27:57,720 So how do we finish that? 495 00:27:57,720 --> 00:27:59,640 So there are three choices here. 496 00:27:59,640 --> 00:28:03,120 First, we have to show how to convert DFAs to GNFAs, maybe? 497 00:28:03,120 --> 00:28:05,970 Or show how to convert GNFAs to DFAs? 498 00:28:05,970 --> 00:28:07,260 Or maybe we're already done? 499 00:28:07,260 --> 00:28:10,140 So maybe I better launch that poll while you're reading that. 500 00:28:13,880 --> 00:28:16,490 And there you go. 501 00:28:16,490 --> 00:28:23,800 Hopefully you can-- all right. 502 00:28:23,800 --> 00:28:25,240 Why don't I end this? 503 00:28:25,240 --> 00:28:27,640 It's a little worrisome, because I 504 00:28:27,640 --> 00:28:31,720 would say we have a plurality who got the right answer, 505 00:28:31,720 --> 00:28:34,390 but not a majority. 506 00:28:34,390 --> 00:28:37,240 So let us share the results. 507 00:28:37,240 --> 00:28:42,950 I think-- so I sense not all of you are with me. 508 00:28:42,950 --> 00:28:45,730 But you're going to have to-- 509 00:28:48,380 --> 00:28:51,680 either that or you're playing-- you're reading 510 00:28:51,680 --> 00:28:54,140 your email while we're talking. 511 00:28:54,140 --> 00:28:54,810 I'm not sure. 512 00:28:54,810 --> 00:29:01,180 But whatever it is, you need to think a little bit about what's 513 00:29:01,180 --> 00:29:09,040 going on here, because the reason why we are done 514 00:29:09,040 --> 00:29:13,570 is because DFAs are a kind of GNFAs. 
515 00:29:13,570 --> 00:29:15,580 They're just-- they have a very simple kind 516 00:29:15,580 --> 00:29:18,520 of regular expression on each transition. 517 00:29:18,520 --> 00:29:21,500 They just have the regular expression 518 00:29:21,500 --> 00:29:23,520 which is just a single symbol. 519 00:29:23,520 --> 00:29:26,370 So all DFAs are automatically GNFAs. 520 00:29:26,370 --> 00:29:29,940 So if I can convert GNFAs, I can certainly convert DFAs, 521 00:29:29,940 --> 00:29:32,640 because GNFAs include the DFAs. 522 00:29:32,640 --> 00:29:33,330 I'm done. 523 00:29:33,330 --> 00:29:39,550 It really was-- number C was the correct answer. 524 00:29:39,550 --> 00:29:44,860 So good thing we're not [LAUGHS] counting correctness here. 525 00:29:44,860 --> 00:29:47,050 So participation is good enough. 526 00:29:47,050 --> 00:29:49,230 But I do think you need to think about what's 527 00:29:49,230 --> 00:29:52,870 going on and making sure that you're following along. 528 00:29:52,870 --> 00:29:54,015 So anyway, that's a-- 529 00:29:58,820 --> 00:29:59,900 we'll carry on here. 530 00:29:59,900 --> 00:30:02,900 But it makes me a little concerned. 531 00:30:06,290 --> 00:30:09,842 So let us now move on. 532 00:30:09,842 --> 00:30:11,300 So we're going to talk a little bit 533 00:30:11,300 --> 00:30:12,860 about non-regular languages. 534 00:30:16,640 --> 00:30:18,710 So somebody's asking, don't we have to still make 535 00:30:18,710 --> 00:30:20,270 the DFAs into the special type? 536 00:30:20,270 --> 00:30:22,820 Yes, we do have to make them to the special type. 537 00:30:22,820 --> 00:30:25,580 But we already showed how to make GNFAs 538 00:30:25,580 --> 00:30:26,660 into the special type. 539 00:30:26,660 --> 00:30:31,400 And DFA-- that is going to apply to DFAs as well. 540 00:30:31,400 --> 00:30:33,340 They'll become GNFAs. 
541 00:30:33,340 --> 00:30:35,840 You can add the extra starts-- 542 00:30:35,840 --> 00:30:39,530 add a new start state, add a new accept state, 543 00:30:39,530 --> 00:30:41,690 add in all the transitions with-- which 544 00:30:41,690 --> 00:30:45,810 you didn't have before with the empty language label, 545 00:30:45,810 --> 00:30:49,520 and you'll have a GNFA from a DFA. 546 00:30:49,520 --> 00:30:51,710 But that applies to GNFAs as-- 547 00:30:51,710 --> 00:30:52,610 in general. 548 00:30:52,610 --> 00:30:54,800 So it's nothing special about DFAs there. 549 00:30:54,800 --> 00:30:57,620 Anyway, I think you need to chew on that. 550 00:30:57,620 --> 00:31:02,690 And hopefully you're-- you'll be following going forward. 551 00:31:02,690 --> 00:31:06,080 Anyway, let us look now at non-- proving non-regularity. 552 00:31:06,080 --> 00:31:09,740 So we're finished with our goal of showing 553 00:31:09,740 --> 00:31:12,740 that regular languages-- that the regular languages can 554 00:31:12,740 --> 00:31:17,000 either come from DFAs or from regular expressions. 555 00:31:17,000 --> 00:31:19,910 Those are the same in terms of-- from the perspective 556 00:31:19,910 --> 00:31:23,280 of our course, they're interchangeable. 557 00:31:23,280 --> 00:31:26,968 So now, as we mentioned, there are 558 00:31:26,968 --> 00:31:29,010 going to be some languages which are not regular, 559 00:31:29,010 --> 00:31:31,178 which can't be done by DFAs. 560 00:31:31,178 --> 00:31:32,970 They're actually-- DFAs are actually pretty 561 00:31:32,970 --> 00:31:34,870 weak as a computational model. 562 00:31:34,870 --> 00:31:37,500 And so there's all sorts of very simple things that they cannot 563 00:31:37,500 --> 00:31:39,720 do-- 564 00:31:39,720 --> 00:31:42,330 though there are some fairly complicated things that they 565 00:31:42,330 --> 00:31:44,860 can do, surprisingly enough. 566 00:31:44,860 --> 00:31:48,640 But anyway, there are some simple things they can't do. 
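[Editor's sketch] The DFA-to-GNFA step mentioned in answer to the question above (new start state, new accept state, empty-language labels on the transitions that were missing) could be written along these lines. The representation is an assumption for illustration only: `delta` is a dictionary keyed by (state, symbol), labels are regex strings, `""` is the empty-string label, `None` is the empty-language label, and the names `q_start` and `q_accept` are hypothetical and assumed not to clash with existing states.

```python
def dfa_to_gnfa(states, alphabet, delta, start, accepts):
    """Wrap a DFA as a GNFA in the special form used by the conversion."""
    new_start, new_accept = "q_start", "q_accept"  # hypothetical fresh names
    labels = {}
    for q in states:
        for a in alphabet:
            target = delta[(q, a)]
            # Several symbols between the same two states become a union.
            if (q, target) in labels:
                labels[(q, target)] += "|" + a
            else:
                labels[(q, target)] = a
    labels[(new_start, start)] = ""        # empty-string move into the old start
    for q in accepts:
        labels[(q, new_accept)] = ""       # empty-string move out of old accepts
    all_states = [new_start] + list(states) + [new_accept]
    # Every other allowed pair gets the empty-language label (None).
    # Nothing enters the new start state and nothing leaves the new accept state.
    for qi in all_states:
        for qj in all_states:
            if qi != new_accept and qj != new_start:
                labels.setdefault((qi, qj), None)
    return all_states, labels

# Hypothetical DFA over {0, 1} accepting strings that end in 1.
delta = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}
gnfa_states, gnfa_labels = dfa_to_gnfa(["A", "B"], ["0", "1"], delta, "A", ["B"])
print(gnfa_labels[("A", "B")], gnfa_labels[("A", "q_accept")])
```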
567 00:31:48,640 --> 00:31:52,440 And so we have to develop a method for showing 568 00:31:52,440 --> 00:31:54,840 that a language is not regular. 569 00:31:54,840 --> 00:31:57,540 And that's going to be useful for your homework 570 00:31:57,540 --> 00:31:59,760 and in general for just understanding 571 00:31:59,760 --> 00:32:03,450 the power of DFAs. 572 00:32:03,450 --> 00:32:05,950 So how do we show a language is not regular? 573 00:32:05,950 --> 00:32:09,300 So remember, if you want to show a language is regular, 574 00:32:09,300 --> 00:32:12,810 basically what you need to do is give a DFA. 575 00:32:12,810 --> 00:32:14,543 Or you can use the closure properties. 576 00:32:14,543 --> 00:32:16,710 That's another way of showing a language is regular. 577 00:32:16,710 --> 00:32:19,110 But underneath that, it's basically constructing DFAs. 578 00:32:22,446 --> 00:32:24,750 To show a language is not regular you 579 00:32:24,750 --> 00:32:26,363 have to give a proof. 580 00:32:26,363 --> 00:32:27,780 Generally it's not a construction, 581 00:32:27,780 --> 00:32:32,400 it's a proof that there is no DFA or that whatever-- 582 00:32:32,400 --> 00:32:37,090 that it's just going to be impossible to make a DFA. 583 00:32:37,090 --> 00:32:38,590 And we have to develop a method. 584 00:32:38,590 --> 00:32:42,100 What is that proof method? 585 00:32:42,100 --> 00:32:45,520 Now, there is a tempting-- 586 00:32:45,520 --> 00:32:47,410 you know, I've taught this course many times, 587 00:32:47,410 --> 00:32:55,508 and there's a tempting approach that many people have. 588 00:32:55,508 --> 00:32:57,550 It's not only going to apply for finite automata, 589 00:32:57,550 --> 00:32:58,592 but for other things too. 
590 00:32:58,592 --> 00:33:01,090 And believe me, it's not only people in this class, 591 00:33:01,090 --> 00:33:03,790 it's for people out there in the-- 592 00:33:03,790 --> 00:33:07,800 who are trying to think about computation in general-- 593 00:33:07,800 --> 00:33:15,690 which is to say, well, I have some language. 594 00:33:15,690 --> 00:33:17,920 I'm trying to figure out if it's regular or not. 595 00:33:17,920 --> 00:33:21,360 And so I thought really hard how to make a DFA, 596 00:33:21,360 --> 00:33:22,830 and I couldn't find one. 597 00:33:22,830 --> 00:33:26,260 Therefore, it's not regular. 598 00:33:26,260 --> 00:33:27,730 That's not a proof. 599 00:33:27,730 --> 00:33:29,920 Just because you couldn't find a DFA 600 00:33:29,920 --> 00:33:32,310 doesn't mean there is no DFA. 601 00:33:32,310 --> 00:33:34,860 You need to prove that the language is not 602 00:33:34,860 --> 00:33:37,840 regular using some method. 603 00:33:37,840 --> 00:33:39,810 So I'm going to give you an example where 604 00:33:39,810 --> 00:33:43,260 that kind of approach can lead you wrong. 605 00:33:43,260 --> 00:33:47,430 And that is-- I'll give two examples of languages 606 00:33:47,430 --> 00:33:52,650 where you might try to prove they're regular or not, 607 00:33:52,650 --> 00:33:56,940 and you could be in trouble if you just follow 608 00:33:56,940 --> 00:34:00,480 that kind of informal approach. 609 00:34:00,480 --> 00:34:03,770 So if you take the language B, where these are strings-- well, 610 00:34:03,770 --> 00:34:07,570 let's assume our alphabet is zeros and ones. 611 00:34:07,570 --> 00:34:09,219 B is the language of all strings that 612 00:34:09,219 --> 00:34:12,909 have an equal number of zeros and ones. 613 00:34:12,909 --> 00:34:15,909 So you want to know, if I have 1,000 zeros, 614 00:34:15,909 --> 00:34:17,550 I need to have 1,000 ones. 
615 00:34:17,550 --> 00:34:19,300 So basically, the way you test that, you'd 616 00:34:19,300 --> 00:34:20,842 have to count up the number of zeros, 617 00:34:20,842 --> 00:34:23,920 count up the number of ones, and see if those two counts are 618 00:34:23,920 --> 00:34:25,659 the same. 619 00:34:25,659 --> 00:34:30,030 And that's going to be really tough to make a DFA do, 620 00:34:30,030 --> 00:34:33,030 because how are you going to remember such-- 621 00:34:33,030 --> 00:34:36,090 that really big number of zeros that-- the DFA 622 00:34:36,090 --> 00:34:37,860 might have 50 states. 623 00:34:37,860 --> 00:34:41,820 But you might need to count up to 100 or a million 624 00:34:41,820 --> 00:34:47,040 to figure out-- to count up how many zeros you've seen. 625 00:34:47,040 --> 00:34:50,340 And it seems really hard to be able to do that kind of a count 626 00:34:50,340 --> 00:34:53,204 when you only have 50 states. 627 00:34:53,204 --> 00:34:55,939 So whatever number of states you have, 628 00:34:55,939 --> 00:35:01,110 it seems hard to count when you have a finite automaton. 629 00:35:01,110 --> 00:35:03,570 So the intuition is, it's not regular 630 00:35:03,570 --> 00:35:06,720 because a finite automaton can't count. 631 00:35:06,720 --> 00:35:11,220 Which, in this case, you can convert that intuition 632 00:35:11,220 --> 00:35:12,150 into a real proof. 633 00:35:12,150 --> 00:35:14,370 I would say it's not a real proof yet, 634 00:35:14,370 --> 00:35:17,940 but it can be made into a real proof. 
635 00:35:17,940 --> 00:35:21,450 But compare that case with another language, 636 00:35:21,450 --> 00:35:27,160 which I'll call C, which, instead of looking at its input 637 00:35:27,160 --> 00:35:30,490 to see whether it has an equal number of zeros and ones, 638 00:35:30,490 --> 00:35:33,820 I'm going to look at the input and look at the substrings 639 00:35:33,820 --> 00:35:36,400 of 01s and 10s-- 640 00:35:36,400 --> 00:35:41,560 those two substrings-- and count the number of occurrences of 01 641 00:35:41,560 --> 00:35:44,680 as a substring and the number of occurrences of 10 642 00:35:44,680 --> 00:35:46,910 as a substring. 643 00:35:46,910 --> 00:35:48,960 Just to make sure you're understanding, 644 00:35:48,960 --> 00:35:50,240 let's look at some example-- 645 00:35:50,240 --> 00:35:51,720 two examples. 646 00:35:51,720 --> 00:35:57,800 So the string 0101 is not going to be in C, 647 00:35:57,800 --> 00:36:01,520 because if you count up the number of 01s and the number 648 00:36:01,520 --> 00:36:03,300 of 10s, not the same. 649 00:36:03,300 --> 00:36:07,070 So I'm even going to help you here, if you can see that. 650 00:36:07,070 --> 00:36:09,230 The number of 01s is two. 651 00:36:09,230 --> 00:36:14,610 But there's only a single occurrence of 10. 652 00:36:14,610 --> 00:36:17,980 So those are-- those two counts are different. 653 00:36:17,980 --> 00:36:21,930 And so that's why this input string is not in C. 654 00:36:21,930 --> 00:36:27,330 Compare that with the string 0110. 655 00:36:27,330 --> 00:36:31,560 Now, if you count up the number of 01 and 10 substrings, 656 00:36:31,560 --> 00:36:33,030 you're going to get the same value, 657 00:36:33,030 --> 00:36:38,226 because here we have a single 01 and a single 10. 658 00:36:38,226 --> 00:36:40,980 And so now the two counts of those number of substrings 659 00:36:40,980 --> 00:36:42,330 are the same. 660 00:36:42,330 --> 00:36:46,780 And so that's where you're in C. 
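[Editor's sketch] The two examples can be checked mechanically. The helper below simply counts occurrences of 01 and 10 as substrings and compares the counts; the function names are made up for this illustration, and this is a direct test of the definition of C, not the simpler description alluded to in the lecture.

```python
def count_occurrences(s, pattern):
    """Count (possibly overlapping) occurrences of pattern as a substring of s."""
    n = len(pattern)
    return sum(1 for i in range(len(s) - n + 1) if s[i:i + n] == pattern)

def in_C(s):
    """True when s has equally many 01 substrings and 10 substrings."""
    return count_occurrences(s, "01") == count_occurrences(s, "10")

print(in_C("0101"))  # False: two occurrences of 01, one of 10
print(in_C("0110"))  # True: one occurrence of each
```

Trying a handful of other strings with `in_C` is one way to start guessing the simpler description.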
661 00:36:46,780 --> 00:36:51,830 Now my question is, is C a regular language? 662 00:36:51,830 --> 00:36:54,920 Well, it looks like it shouldn't be regular for the same reason 663 00:36:54,920 --> 00:36:57,830 that B isn't regular-- because you have to count up two 664 00:36:57,830 --> 00:37:00,260 quantities and compare them. 665 00:37:07,940 --> 00:37:08,480 OK? 666 00:37:08,480 --> 00:37:17,110 So now, so if we-- 667 00:37:20,600 --> 00:37:22,490 so that's our intuition, that you just 668 00:37:22,490 --> 00:37:24,410 can't do it for the-- with a finite automaton, 669 00:37:24,410 --> 00:37:27,035 because you have to do the same kind of counting that you would 670 00:37:27,035 --> 00:37:30,575 have had to do for language B. But here you'll be-- 671 00:37:30,575 --> 00:37:33,830 you would be wrong, because C, in fact, is regular. 672 00:37:39,040 --> 00:37:42,550 It has a much simpler description 673 00:37:42,550 --> 00:37:47,530 than the one I gave over here at the beginning. 674 00:37:47,530 --> 00:37:50,290 The very same language, C, can be described 675 00:37:50,290 --> 00:37:51,670 in a much, much simpler way. 676 00:37:51,670 --> 00:37:52,750 I'm not going to tell you what it is. 677 00:37:52,750 --> 00:37:53,830 You can mull that over. 678 00:37:53,830 --> 00:37:56,890 You can try some examples to figure it out. 679 00:37:56,890 --> 00:37:58,480 But it has a much simpler description. 680 00:37:58,480 --> 00:38:00,670 It's not a totally trivial description. 681 00:38:00,670 --> 00:38:01,870 There is some content there. 682 00:38:01,870 --> 00:38:04,690 But there is-- it is the kind of thing 683 00:38:04,690 --> 00:38:06,670 that a finite automaton can do. 684 00:38:06,670 --> 00:38:09,460 It wouldn't do the counting this way. 685 00:38:09,460 --> 00:38:12,340 So the moral is-- the punch line is 686 00:38:12,340 --> 00:38:14,860 that sometimes the intuition works, but it can also 687 00:38:14,860 --> 00:38:16,605 be wrong. 
688 00:38:16,605 --> 00:38:17,980 And so the moral of the story is, 689 00:38:17,980 --> 00:38:22,210 you need to give a proof when you're doing things like that. 690 00:38:22,210 --> 00:38:24,880 So what we're going to do next, in the second half 691 00:38:24,880 --> 00:38:28,330 of the lecture, is to give a method for proving 692 00:38:28,330 --> 00:38:29,608 languages are not regular. 693 00:38:29,608 --> 00:38:32,150 And again, you're going to need to use that on your homework. 694 00:38:32,150 --> 00:38:33,680 So I hope you get it. 695 00:38:33,680 --> 00:38:36,040 But first of all-- 696 00:38:36,040 --> 00:38:38,860 did I-- never stopped sharing that poll. 697 00:38:38,860 --> 00:38:39,490 Forgive me. 698 00:38:46,680 --> 00:38:48,210 And so with that, I think we'll take 699 00:38:48,210 --> 00:38:52,440 our little requested break. 700 00:38:52,440 --> 00:38:55,950 And-- for five minutes. 701 00:38:55,950 --> 00:38:59,355 And we'll be back in five minutes. 702 00:39:02,620 --> 00:39:04,120 So, break time. 703 00:39:21,280 --> 00:39:23,080 We are done here. 704 00:39:23,080 --> 00:39:28,190 And proving languages not regular. 705 00:39:31,490 --> 00:39:34,070 The way we're going to prove languages are not regular 706 00:39:34,070 --> 00:39:40,420 is by introducing a method called the pumping lemma. 707 00:39:40,420 --> 00:39:45,520 And the overarching plan at the pumping lemma, 708 00:39:45,520 --> 00:39:49,690 without getting into the specifics of it, is to say-- 709 00:39:49,690 --> 00:39:50,550 show that-- 710 00:39:55,922 --> 00:40:00,050 that lemma says all regular languages 711 00:40:00,050 --> 00:40:04,780 have a certain property, which we will describe. 712 00:40:04,780 --> 00:40:08,710 And so to show a language is not regular 713 00:40:08,710 --> 00:40:13,110 you simply show the language doesn't have that property, 714 00:40:13,110 --> 00:40:17,410 because all regular languages have to have that property. 
715 00:40:17,410 --> 00:40:20,610 And so by showing a language fails to have the property, 716 00:40:20,610 --> 00:40:22,750 it could not be regular. 717 00:40:22,750 --> 00:40:24,025 That's the plan. 718 00:40:26,780 --> 00:40:32,630 Now, the property itself is a little complicated 719 00:40:32,630 --> 00:40:35,150 to describe, but not too bad. 720 00:40:35,150 --> 00:40:37,260 I'll try to unpack it for you. 721 00:40:37,260 --> 00:40:43,580 But first, let's look at the statement of the lemma, 722 00:40:43,580 --> 00:40:47,060 which says that whenever you have a regular language-- 723 00:40:47,060 --> 00:40:50,470 let's call it A. So for every regular language A 724 00:40:50,470 --> 00:40:54,710 there's always a special value called the pump-- 725 00:40:54,710 --> 00:40:56,180 a number. 726 00:40:56,180 --> 00:41:00,420 p, we'll call it-- called the pumping length. 727 00:41:00,420 --> 00:41:01,770 It's a special number. 728 00:41:01,770 --> 00:41:20,870 And it's-- and that length tells you that whenever a string is 729 00:41:20,870 --> 00:41:25,280 in that language and it's longer than that length, 730 00:41:25,280 --> 00:41:30,170 then something special happens. 731 00:41:30,170 --> 00:41:33,540 You can take that string and you can modify it, 732 00:41:33,540 --> 00:41:35,060 and you still stay in the language. 733 00:41:35,060 --> 00:41:37,700 So anything that's longer than that special length 734 00:41:37,700 --> 00:41:39,620 can be modified in a certain way, 735 00:41:39,620 --> 00:41:41,430 and you still stay in the language. 736 00:41:41,430 --> 00:41:47,240 So let's look at the actual statement of the lemma. 
737 00:41:47,240 --> 00:41:53,520 So there is a number p such that if s 738 00:41:53,520 --> 00:41:56,400 is a string in the language and it's longer than p, or at least 739 00:41:56,400 --> 00:42:00,610 of length p, then you can take s and you can cut it up 740 00:42:00,610 --> 00:42:01,960 into three pieces-- 741 00:42:01,960 --> 00:42:03,430 x, y, and z-- 742 00:42:03,430 --> 00:42:05,920 so that's just breaking s into three pieces-- 743 00:42:05,920 --> 00:42:12,520 where you can take that middle piece, repeat it as many times 744 00:42:12,520 --> 00:42:14,935 as you like, and you still stay in the language. 745 00:42:17,680 --> 00:42:22,508 That's the-- what the pumping lemma is saying. 746 00:42:22,508 --> 00:42:24,550 And there's a bunch of other conditions here too. 747 00:42:24,550 --> 00:42:26,920 But the spirit of the pumping lemma 748 00:42:26,920 --> 00:42:30,280 says, whenever you have a regular language there's 749 00:42:30,280 --> 00:42:32,170 some cutoff such that all strings 750 00:42:32,170 --> 00:42:36,070 longer than that cutoff can be what we call pumped. 751 00:42:36,070 --> 00:42:40,030 You can take that string, you can find a section 752 00:42:40,030 --> 00:42:41,620 somewhere in the middle of that string 753 00:42:41,620 --> 00:42:43,780 or somewhere-- you cut it up in three pieces, 754 00:42:43,780 --> 00:42:47,210 you take that center piece, and you can repeat it. 755 00:42:47,210 --> 00:42:50,290 You can pump it up. 756 00:42:50,290 --> 00:42:54,070 And by repeating that string and repeating that piece, 757 00:42:54,070 --> 00:42:55,960 the string gets longer and longer. 758 00:42:55,960 --> 00:42:58,930 But you still stay in the language. 759 00:42:58,930 --> 00:43:03,250 That's the special property that all regular languages have. 760 00:43:03,250 --> 00:43:05,450 So in an informal way-- and we'll do-- 761 00:43:05,450 --> 00:43:07,960 I'll try to help you get the feeling for this. 
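[Editor's note] For reference, the lemma can be written out in symbols. This is the standard textbook formulation and includes the two side conditions (y nonempty, and the first two pieces together of length at most p) that the lecture only gets to a little later.

```latex
\textbf{Pumping lemma.} If $A$ is a regular language, then there is a number $p$
(the pumping length) such that every string $s \in A$ with $|s| \ge p$ can be
written as $s = xyz$ satisfying
\begin{enumerate}
  \item $x y^i z \in A$ for every $i \ge 0$,
  \item $y \ne \varepsilon$,
  \item $|xy| \le p$.
\end{enumerate}
Here $y^i$ denotes $i$ copies of $y$ concatenated, so $y^0 = \varepsilon$ and
$x y^0 z = xz$.
```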
762 00:43:07,960 --> 00:43:12,520 Informally, it says that if you have a regular language, 763 00:43:12,520 --> 00:43:15,730 then every long string-- so a long is by-- 764 00:43:15,730 --> 00:43:18,130 informal way of saying bigger than this value p. 765 00:43:18,130 --> 00:43:21,540 Every long string in the language can be pumped. 766 00:43:21,540 --> 00:43:24,150 And this result still stays in the language. 767 00:43:24,150 --> 00:43:27,480 And by "pumped" means I can cut the string into three pieces 768 00:43:27,480 --> 00:43:29,785 and repeat that middle piece as many times as I want. 769 00:43:29,785 --> 00:43:31,410 That's what I mean by pumping a string. 770 00:43:34,510 --> 00:43:41,250 So we'll do some examples in a second. 771 00:43:41,250 --> 00:43:44,390 But first we're going to see how to prove this. 772 00:43:44,390 --> 00:43:47,420 And hopefully, that'll give you some feeling, also, 773 00:43:47,420 --> 00:43:50,040 for why it's true. 774 00:43:50,040 --> 00:43:56,820 So-- and actually, maybe before I actually jump into the proof, 775 00:43:56,820 --> 00:43:58,800 let me-- let's look at these three conditions 776 00:43:58,800 --> 00:44:04,710 here just to understand it a little bit more thoroughly. 777 00:44:04,710 --> 00:44:10,440 So condition one kind of says what I just was telling you. 778 00:44:10,440 --> 00:44:12,330 I can break s into three pieces-- 779 00:44:12,330 --> 00:44:17,700 x, y, z-- such that if I take x y to the i z-- 780 00:44:17,700 --> 00:44:20,800 so that's repeating y as many times as I want. 781 00:44:20,800 --> 00:44:23,440 So here's y to the i defined, if that's helpful to you-- 782 00:44:23,440 --> 00:44:24,180 it's just y-- 783 00:44:24,180 --> 00:44:25,470 i copies of y. 
784 00:44:25,470 --> 00:44:28,560 So I can take x y to the i z, and I 785 00:44:28,560 --> 00:44:32,880 remain in the language for every value of i-- 786 00:44:32,880 --> 00:44:35,160 even i equals 0, which means we're just 787 00:44:35,160 --> 00:44:38,760 removing y, which is sometimes actually a useful thing to do. 788 00:44:38,760 --> 00:44:42,810 But let's not get ahead of ourselves. 789 00:44:42,810 --> 00:44:48,270 So if-- you know, I can cut s-- 790 00:44:48,270 --> 00:44:51,270 I'm guaranteed to be able to cut s up into x, y, z 791 00:44:51,270 --> 00:44:57,960 so that the string xyyy is still in the language, or xyyyyy-- 792 00:44:57,960 --> 00:44:59,670 it's still in the language. 793 00:44:59,670 --> 00:45:03,090 That's going to be guaranteed for every regular language. 794 00:45:03,090 --> 00:45:06,622 That's a feature that's going to be true. 795 00:45:06,622 --> 00:45:08,080 And furthermore-- and this is going 796 00:45:08,080 --> 00:45:10,330 to be turning out to be-- it's not really 797 00:45:10,330 --> 00:45:12,700 part of the core idea of the pumping lemma, 798 00:45:12,700 --> 00:45:15,040 but it actually turns out to be very helpful in applying 799 00:45:15,040 --> 00:45:16,810 the pumping lemma. 800 00:45:16,810 --> 00:45:19,000 You can always cut it up in such a way 801 00:45:19,000 --> 00:45:24,530 that the first two pieces are not longer than that value p. 802 00:45:24,530 --> 00:45:28,280 So this-- it restricts on the ways you can cut the thing up. 803 00:45:28,280 --> 00:45:31,040 And that actually turns out to be very helpful. 804 00:45:31,040 --> 00:45:33,380 But let's first just look at the proof of this, 805 00:45:33,380 --> 00:45:35,285 giving a little bit the high-level picture. 806 00:45:37,890 --> 00:45:44,490 So my job is to show, if I have a string in my language-- 807 00:45:44,490 --> 00:45:45,540 let's say it's a-- 808 00:45:45,540 --> 00:45:48,720 think of it as a long string, really long. 
809 00:45:48,720 --> 00:45:50,160 So its length is more than p. 810 00:45:50,160 --> 00:45:54,480 But I think intuitively, it's just a very long string. 811 00:45:54,480 --> 00:45:58,890 And I'm going to feed that string into the machine 812 00:45:58,890 --> 00:46:00,870 and watch what happens. 813 00:46:00,870 --> 00:46:03,300 Something special happens when I feed the string 814 00:46:03,300 --> 00:46:09,270 and I look at how the machine proceeds on that string, 815 00:46:09,270 --> 00:46:16,250 because s is so long that as I wander around 816 00:46:16,250 --> 00:46:18,950 inside the machine I have to end up 817 00:46:18,950 --> 00:46:21,140 coming back to the same place more than once. 818 00:46:23,930 --> 00:46:26,270 It's like if you have a small park 819 00:46:26,270 --> 00:46:27,650 and you go for a long walk. 820 00:46:27,650 --> 00:46:29,300 You're going to end up coming back to where you've-- 821 00:46:29,300 --> 00:46:30,500 what you've already seen. 822 00:46:30,500 --> 00:46:32,450 You just can't keep on seeing new stuff 823 00:46:32,450 --> 00:46:35,405 when you have a more small area of space to explore. 824 00:46:41,590 --> 00:46:43,390 So we're guaranteed that M is going 825 00:46:43,390 --> 00:46:46,480 to end up repeating some state when it's reading s 826 00:46:46,480 --> 00:46:47,920 because s is so long. 827 00:46:47,920 --> 00:46:50,800 So in terms-- pictorially, if you 828 00:46:50,800 --> 00:46:55,000 imagine here this wiggly line is describing the path that M 829 00:46:55,000 --> 00:46:57,220 follows when it's reading s, it ends up 830 00:46:57,220 --> 00:47:00,470 coming back to that state qj more than once. 831 00:47:00,470 --> 00:47:02,920 So it comes back here, cycles around, 832 00:47:02,920 --> 00:47:05,490 comes back again before it ends up accepting. 833 00:47:05,490 --> 00:47:07,240 We know it ends up accepting because we're 834 00:47:07,240 --> 00:47:10,120 assuming we have a string that's in the language. 
835 00:47:10,120 --> 00:47:12,070 So we picked s in the language. 836 00:47:12,070 --> 00:47:15,670 So it has to be accepted by M. But the important thing 837 00:47:15,670 --> 00:47:18,760 is that it repeats a state. 838 00:47:18,760 --> 00:47:23,050 Now, how does that tell me I can cut s up 839 00:47:23,050 --> 00:47:24,610 into those three pieces? 840 00:47:24,610 --> 00:47:27,380 Well, I'm going to get those three pieces here. 841 00:47:27,380 --> 00:47:30,460 First of all, let's observe that here is processing-- 842 00:47:30,460 --> 00:47:31,750 as processing s. 843 00:47:31,750 --> 00:47:35,710 Here is the-- written right on top of the string, 844 00:47:35,710 --> 00:47:39,460 that state repetition occurring, qj, more than once. 845 00:47:39,460 --> 00:47:42,230 And now, if I look inside the machine, 846 00:47:42,230 --> 00:47:47,620 the part of s that took me to qj I'm going to call x. 847 00:47:47,620 --> 00:47:50,110 The part that took me from qj back 848 00:47:50,110 --> 00:47:52,210 to itself I'm going to call y. 849 00:47:52,210 --> 00:47:55,900 And the part that took qj to the accept state 850 00:47:55,900 --> 00:47:57,130 I'm going to call z. 851 00:47:57,130 --> 00:47:59,800 And I'm going to mark those off in s. 852 00:47:59,800 --> 00:48:05,160 And that gives me the way to cut s up into three pieces. 853 00:48:05,160 --> 00:48:11,180 Now, if you're appreciating what's 854 00:48:11,180 --> 00:48:13,190 going on inside the machine, you will 855 00:48:13,190 --> 00:48:18,545 see why M will also accept the string xyyz-- 856 00:48:22,360 --> 00:48:28,720 because every time-- once you're at qj, if you go around once, 857 00:48:28,720 --> 00:48:30,220 you come back to qj. 858 00:48:30,220 --> 00:48:32,980 And then if you go again, you'll come back to qj. 859 00:48:32,980 --> 00:48:35,180 And as many times as you keep seeing that y, 860 00:48:35,180 --> 00:48:37,270 you're just going to keep coming back to qj. 
861 00:48:37,270 --> 00:48:40,913 So it doesn't matter how many y's you have. 862 00:48:40,913 --> 00:48:42,580 You're going to still-- if you follow it 863 00:48:42,580 --> 00:48:45,070 by z, which is what you will do-- you'll 864 00:48:45,070 --> 00:48:48,850 end up accepting this string. 865 00:48:48,850 --> 00:48:50,800 And that's really the proof. 866 00:48:50,800 --> 00:48:53,680 I mean, you have to do a little bit more 867 00:48:53,680 --> 00:48:55,240 here just to understand-- 868 00:48:55,240 --> 00:48:57,010 I should have mentioned why I want 869 00:48:57,010 --> 00:48:58,672 to forbid y being the empty string, 870 00:48:58,672 --> 00:49:00,880 because if y's the empty string it's not interesting. 871 00:49:00,880 --> 00:49:06,040 It doesn't change-- repeating it doesn't actually 872 00:49:06,040 --> 00:49:06,710 change anything. 873 00:49:06,710 --> 00:49:08,293 So I have to make sure it's not empty. 874 00:49:08,293 --> 00:49:11,200 But anyway, that's a detail here. 875 00:49:11,200 --> 00:49:14,950 If you look at the string xyyz, that's 876 00:49:14,950 --> 00:49:17,020 still going to be accepted. 877 00:49:17,020 --> 00:49:21,740 So that's the proof of the pumping lemma. 878 00:49:21,740 --> 00:49:24,740 So let's have a little check-in related to that. 879 00:49:24,740 --> 00:49:26,990 This is not going to be-- again, not super hard. 880 00:49:26,990 --> 00:49:31,010 But more just a curiosity. 881 00:49:33,980 --> 00:49:37,180 So the pumping lemma depends on the fact that if M has p states 882 00:49:37,180 --> 00:49:39,290 and it runs for more than p steps, 883 00:49:39,290 --> 00:49:41,890 then it's going to enter some state twice. 884 00:49:41,890 --> 00:49:44,140 So you may have seen that before. 885 00:49:44,140 --> 00:49:49,340 It actually has a name which some of you may have seen. 886 00:49:49,340 --> 00:49:51,325 So let's see how to just get a poll here. 
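[Editor's sketch] The proof idea above (follow the machine on s, find the first state that repeats, and cut s there) can be tried out on a small machine. Everything in this example is an assumption for illustration: the DFA is a hypothetical three-state machine accepting strings whose number of ones is divisible by 3, `delta` is a dictionary keyed by (state, symbol), and the function names are invented.

```python
def run(delta, start, s):
    """Return the states visited while reading s (len(s) + 1 of them)."""
    path = [start]
    for ch in s:
        path.append(delta[(path[-1], ch)])
    return path

def pump_split(delta, start, s):
    """Cut s into x, y, z at the first repeated state on the machine's path."""
    first_seen = {}
    for pos, q in enumerate(run(delta, start, s)):
        if q in first_seen:
            i, j = first_seen[q], pos      # state q visited at i and again at j
            return s[:i], s[i:j], s[j:]    # x reaches q, y loops on q, z finishes
        first_seen[q] = pos
    return None  # no state repeats: s is shorter than the number of states

# Hypothetical DFA: state = (number of ones read) mod 3, accept state 0.
delta = {(q, "0"): q for q in range(3)}
delta.update({(q, "1"): (q + 1) % 3 for q in range(3)})

def accepts(s):
    return run(delta, 0, s)[-1] == 0

x, y, z = pump_split(delta, 0, "10110")
print(repr(x), repr(y), repr(z))
print(all(accepts(x + y * i + z) for i in range(6)))
```

Because the first repeat must happen within the first p symbols when the machine has p states, this way of cutting also gives a nonempty y and keeps x and y together no longer than p.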
887 00:49:54,849 --> 00:50:06,200 And I hope not too many of you are going to pick C, 888 00:50:06,200 --> 00:50:07,430 as it's-- some of you are. 889 00:50:07,430 --> 00:50:10,670 [LAUGHS] Oh well. 890 00:50:10,670 --> 00:50:13,550 Yes, I think this one most of you are-- 891 00:50:13,550 --> 00:50:15,360 you've seen this before. 892 00:50:15,360 --> 00:50:19,710 This is-- I think you pretty much all got it. 893 00:50:19,710 --> 00:50:24,920 This is what's known as the Pigeonhole Principle. 894 00:50:24,920 --> 00:50:30,133 So here, sharing the results, obviously I 895 00:50:30,133 --> 00:50:31,550 was having a little fun with this. 896 00:50:31,550 --> 00:50:33,550 I'm sure some of you were having fun back at me. 897 00:50:33,550 --> 00:50:35,720 That's OK. 898 00:50:35,720 --> 00:50:37,620 So let's continue on. 899 00:50:37,620 --> 00:50:43,610 Let's see how to use the pumping lemma to prove a language 900 00:50:43,610 --> 00:50:45,300 is not regular. 901 00:50:45,300 --> 00:50:47,150 So I put the pumping lemma up here 902 00:50:47,150 --> 00:50:51,630 just so you can remember the statement of it. 903 00:50:51,630 --> 00:50:55,430 So let's take the language D, which is the language 0 904 00:50:55,430 --> 00:50:58,220 to the k 1 to the k for any k. 905 00:50:58,220 --> 00:51:00,920 So that's some number of zeros followed 906 00:51:00,920 --> 00:51:03,420 by an equal number of ones. 907 00:51:03,420 --> 00:51:05,730 We're going to prove that language is not 908 00:51:05,730 --> 00:51:09,750 regular by using the pumping lemma. 909 00:51:09,750 --> 00:51:12,600 And this is going to be just an ironclad proof. 910 00:51:12,600 --> 00:51:16,100 It's not going to say, well, I couldn't think of how to-- 911 00:51:16,100 --> 00:51:18,870 I couldn't think of how to find it a finite automaton. 912 00:51:18,870 --> 00:51:20,565 This is going to be-- 913 00:51:20,565 --> 00:51:23,820 this is going to really be a proof. 
914 00:51:23,820 --> 00:51:27,870 So we want to show that D is not regular. 915 00:51:27,870 --> 00:51:29,190 And we're going to give-- 916 00:51:29,190 --> 00:51:32,260 these things always go as a proof by contradiction. 917 00:51:32,260 --> 00:51:35,430 So proof by contradiction-- hopefully as a reminder to you, 918 00:51:35,430 --> 00:51:36,900 the way that works is you're going 919 00:51:36,900 --> 00:51:39,750 to assume the opposite of what you're trying to prove. 920 00:51:39,750 --> 00:51:44,010 And then from that, something crazy is going to happen, 921 00:51:44,010 --> 00:51:48,130 something you know is obviously false or wrong. 922 00:51:48,130 --> 00:51:49,690 And so therefore your assumption, 923 00:51:49,690 --> 00:51:51,982 which is the opposite of what you were trying to prove, 924 00:51:51,982 --> 00:51:52,697 had to be wrong. 925 00:51:52,697 --> 00:51:54,780 And so therefore, the thing you're trying to prove 926 00:51:54,780 --> 00:51:56,850 has to be right. 927 00:51:56,850 --> 00:52:00,040 That's the essence of what's called proof by contradiction. 928 00:52:00,040 --> 00:52:01,710 So first of all, we're going to assume, 929 00:52:01,710 --> 00:52:06,360 to get our contradiction, that D is regular, 930 00:52:06,360 --> 00:52:09,060 which is what we're trying to show is not the case. 931 00:52:09,060 --> 00:52:14,520 Now, if D is regular, then we can apply the pumping lemma up 932 00:52:14,520 --> 00:52:20,150 above here, which gives us that pumping length p, which 933 00:52:20,150 --> 00:52:22,750 says that any string longer than p can be pumped 934 00:52:22,750 --> 00:52:24,050 and you stay in the language. 935 00:52:24,050 --> 00:52:26,200 That's what the pumping lemma tells you. 936 00:52:26,200 --> 00:52:29,830 So let's pick the string s, which is the string 937 00:52:29,830 --> 00:52:31,360 0 to the p 1 to the p. 938 00:52:31,360 --> 00:52:33,950 Here's sort of a picture of s off on the side here. 
939 00:52:33,950 --> 00:52:39,190 So a bunch of zeros followed by an equal number of ones. 940 00:52:39,190 --> 00:52:45,700 And that string is in D because D is strings of that form. 941 00:52:45,700 --> 00:52:47,890 And it's longer than p. 942 00:52:47,890 --> 00:52:51,020 Obviously, it's of length 2p. 943 00:52:51,020 --> 00:52:53,120 So the pumping lemma tells us there's 944 00:52:53,120 --> 00:52:57,390 a way to cut it up satisfying those three conditions. 945 00:52:57,390 --> 00:53:02,450 So how in the world could we possibly cut s up? 946 00:53:02,450 --> 00:53:04,070 Well, remember the three conditions. 947 00:53:04,070 --> 00:53:06,920 And especially condition 3 is going to come in handy here. 948 00:53:06,920 --> 00:53:10,100 Say that you can cut s up into three pieces-- 949 00:53:10,100 --> 00:53:11,720 x, y, and z-- 950 00:53:11,720 --> 00:53:19,850 where the first two pieces lie in the first p symbols of s 951 00:53:19,850 --> 00:53:22,160 at most p long. 952 00:53:22,160 --> 00:53:24,680 So x and y together are not very big. 953 00:53:24,680 --> 00:53:28,640 They don't extend beyond the first half of x-- 954 00:53:28,640 --> 00:53:29,360 first half of s. 955 00:53:29,360 --> 00:53:32,180 And in particular, they're all zeros. 956 00:53:32,180 --> 00:53:34,550 x and y are going to be all zeros. z is going 957 00:53:34,550 --> 00:53:36,470 to perhaps have some zeros and will 958 00:53:36,470 --> 00:53:40,890 have the rest of the ones-- will have the ones. 959 00:53:40,890 --> 00:53:48,220 Now, the pumping lemma says that if you cut it up that way, 960 00:53:48,220 --> 00:53:50,222 you can repeat y as many times as you like 961 00:53:50,222 --> 00:53:51,430 and you stay in the language. 962 00:53:51,430 --> 00:53:55,590 But that's obviously false, because if you repeat y-- 963 00:53:55,590 --> 00:53:57,240 which now has only zeros-- 964 00:53:57,240 --> 00:53:59,370 you're going to have too many zeros. 
965 00:53:59,370 --> 00:54:01,530 And so the resulting string is no longer 966 00:54:01,530 --> 00:54:04,560 going to be of the form 0 to the k 1 to the k. 967 00:54:04,560 --> 00:54:08,700 It's going to be lots of zeros followed by not so many ones. 968 00:54:08,700 --> 00:54:09,960 That's not in the language. 969 00:54:09,960 --> 00:54:12,300 And that violates what the pumping lemma tells 970 00:54:12,300 --> 00:54:14,290 you is supposed to happen. 971 00:54:14,290 --> 00:54:15,970 And that's a contradiction. 972 00:54:15,970 --> 00:54:18,660 So therefore, our assumption that D is regular is false. 973 00:54:18,660 --> 00:54:21,430 And so we conclude that D is not regular. 974 00:54:21,430 --> 00:54:23,920 So that's a fairly simple one. 975 00:54:23,920 --> 00:54:25,920 I thought I would do another couple of examples, 976 00:54:25,920 --> 00:54:27,250 because you have this on your homework 977 00:54:27,250 --> 00:54:28,667 and I thought it might be helpful. 978 00:54:28,667 --> 00:54:31,260 So here's the second one-- slightly harder, but not 979 00:54:31,260 --> 00:54:33,570 too much. 980 00:54:33,570 --> 00:54:37,540 Let's take the language F, which is-- 981 00:54:37,540 --> 00:54:41,570 looks like the strings ww. 982 00:54:41,570 --> 00:54:43,770 These are strings that-- 983 00:54:43,770 --> 00:54:46,980 two copies of the same string. 984 00:54:46,980 --> 00:54:50,040 For any string that might be in sigma star, 985 00:54:50,040 --> 00:54:51,840 so for any string at all, I'm going 986 00:54:51,840 --> 00:54:53,370 to have two copies of that string. 987 00:54:53,370 --> 00:54:57,300 And so F is those strings which can be-- which are just 988 00:54:57,300 --> 00:54:58,890 two copies of the same string. 989 00:55:02,650 --> 00:55:06,010 We're going to show that F is not regular. 990 00:55:06,010 --> 00:55:07,510 These things always go the same way. 991 00:55:07,510 --> 00:55:08,770 It's the same pattern.
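The argument for D above can be checked by brute force for small values of p. The sketch below is an illustration, not a proof: it assumes a few small pumping lengths and only tests repetitions of y up to a small bound, and the names `in_D` and `pumpable_splits` are invented here. It tries every way of cutting s = 0^p 1^p into x, y, z with y nonempty and xy inside the first p symbols, and keeps the cuts for which every pumped string stays in D.

```python
def in_D(w):
    """Membership in D = { 0^k 1^k : k >= 0 }."""
    k = len(w) // 2
    return w == "0" * k + "1" * k


def pumpable_splits(s, p, in_lang, max_i=3):
    """All (x, y, z) with s = xyz, y nonempty, |xy| <= p, and
    x y^i z in the language for every i from 0 to max_i."""
    good = []
    limit = min(p, len(s))
    for a in range(limit + 1):             # a = |x|
        for b in range(a + 1, limit + 1):  # b = |xy|, so y is nonempty
            x, y, z = s[:a], s[a:b], s[b:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good


for p in range(1, 7):
    s = "0" * p + "1" * p
    print(p, pumpable_splits(s, p, in_D))
```

Every allowed cut puts y among the zeros, so changing the number of copies of y changes the number of zeros while the number of ones stays fixed, and no cut survives.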
992 00:55:08,770 --> 00:55:10,280 You prove by contradiction. 993 00:55:10,280 --> 00:55:12,000 So you assume for contradiction that-- 994 00:55:12,000 --> 00:55:15,480 oh, D. That's bad. 995 00:55:15,480 --> 00:55:16,980 That was copied from my other slide. 996 00:55:16,980 --> 00:55:17,630 That's wrong. 997 00:55:17,630 --> 00:55:19,630 Let's see if I can actually make this work here. 998 00:55:26,150 --> 00:55:27,440 Good. 999 00:55:27,440 --> 00:55:29,420 Assume for contradiction that F is regular. 1000 00:55:31,980 --> 00:55:33,900 The pumping lemma gives p as above. 1001 00:55:33,900 --> 00:55:36,990 And so now we need to choose a string s 1002 00:55:36,990 --> 00:55:42,460 that's in F to do the pumping and show that the pumping lemma 1003 00:55:42,460 --> 00:55:44,578 is going to fail. 1004 00:55:44,578 --> 00:55:46,120 You're going to pump and you're going 1005 00:55:46,120 --> 00:55:49,630 to get something which is not in the language, which is-- 1006 00:55:49,630 --> 00:55:51,310 shows that the pump-- 1007 00:55:51,310 --> 00:55:52,600 something has gone wrong. 1008 00:55:52,600 --> 00:55:53,890 But which s to choose? 1009 00:55:53,890 --> 00:55:56,950 And sometimes that's where the creativity 1010 00:55:56,950 --> 00:55:59,890 in applying the pumping lemma comes in, 1011 00:55:59,890 --> 00:56:01,390 because you have to figure out which 1012 00:56:01,390 --> 00:56:04,240 is the right string you're going to pump on. 1013 00:56:04,240 --> 00:56:06,220 So you might try the string-- 1014 00:56:06,220 --> 00:56:09,280 well, 0 to the p 0 to the p. 1015 00:56:09,280 --> 00:56:15,180 That's certainly in F. It's two copies of the same string. 1016 00:56:15,180 --> 00:56:15,690 Here it is. 1017 00:56:15,690 --> 00:56:21,210 I've written lots of zeros followed 1018 00:56:21,210 --> 00:56:23,190 by the same number of zeros.
1019 00:56:23,190 --> 00:56:27,480 The problem is, if you use that string, 1020 00:56:27,480 --> 00:56:32,220 it actually is a string that you can pump. 1021 00:56:32,220 --> 00:56:35,010 You can break that string up into three pieces. 1022 00:56:35,010 --> 00:56:38,720 And then, if you let y be the string 00-- 1023 00:56:38,720 --> 00:56:40,470 actually, you have to be a little careful. 1024 00:56:40,470 --> 00:56:43,110 The string just 0 doesn't work, because there's 1025 00:56:43,110 --> 00:56:45,220 an evenness-oddness phenomenon going here. 1026 00:56:45,220 --> 00:56:47,140 So you might want to just think about that. 1027 00:56:47,140 --> 00:56:50,838 But if you let y be the string 00, 1028 00:56:50,838 --> 00:56:53,670 then if you have the string xy-- 1029 00:56:53,670 --> 00:56:55,815 x any number of y's-- 1030 00:56:55,815 --> 00:56:57,690 it's still just going to be a bunch of zeros. 1031 00:56:57,690 --> 00:57:00,330 And you're going to be able to see that that string is still 1032 00:57:00,330 --> 00:57:01,110 in the language. 1033 00:57:01,110 --> 00:57:03,890 So you haven't learned anything. 1034 00:57:03,890 --> 00:57:07,340 If the pumping lemma works and you're 1035 00:57:07,340 --> 00:57:10,880 satisfying the pumping lemma, you haven't learned anything. 1036 00:57:10,880 --> 00:57:15,650 So what you need to find is some other string. 1037 00:57:15,650 --> 00:57:17,520 That was a bad choice for s. 1038 00:57:17,520 --> 00:57:18,590 Find a different string. 1039 00:57:18,590 --> 00:57:22,850 So here's a different choice, 0 to the p 1 0 to the p 1. 1040 00:57:22,850 --> 00:57:25,600 So that's two copies of the same string. 1041 00:57:25,600 --> 00:57:27,250 And you're going to show it can't be-- 1042 00:57:27,250 --> 00:57:28,875 we're going to show it can't be pumped. 1043 00:57:28,875 --> 00:57:32,040 So here's a picture of that string here. 1044 00:57:32,040 --> 00:57:34,790 So zeros followed by 1, zeros followed by 1. 
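The two choices of s for F can be compared with a short experiment. This is a sketch under assumptions: p is fixed at 4 purely for illustration, only a few repetitions of y are tested, and the helper names `in_F` and `pumps` are made up here. It shows that 0^p 0^p can be pumped with y = 00 but not with y = 0 (the evenness-oddness issue), and that no allowed cut of 0^p 1 0^p 1 survives pumping.

```python
def in_F(w):
    """Membership in F = { ww : w any string }."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]


def pumps(x, y, z, max_i=4):
    """True if x y^i z stays in F for every i from 0 to max_i."""
    return all(in_F(x + y * i + z) for i in range(max_i + 1))


p = 4  # assumed pumping length, for illustration only

bad = "0" * p + "0" * p
print(pumps("", "00", bad[2:]))  # y = 00 keeps the length even
print(pumps("", "0", bad[1:]))   # y = 0 produces odd lengths

good = "0" * p + "1" + "0" * p + "1"
survivors = [
    (good[:a], good[a:b], good[b:])
    for a in range(p + 1)             # a = |x|
    for b in range(a + 1, p + 1)      # b = |xy| <= p, y nonempty
    if pumps(good[:a], good[a:b], good[b:])
]
print(survivors)
```

The first string is a bad choice precisely because some cut does pump, so it yields no contradiction. The second is a good choice because every cut allowed by the conditions fails.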
1045 00:57:34,790 --> 00:57:38,270 And now it's very similar to the first argument. 1046 00:57:38,270 --> 00:57:40,730 If you cut it into three pieces in such a way 1047 00:57:40,730 --> 00:57:43,050 that it satisfies the conditions, 1048 00:57:43,050 --> 00:57:45,350 the first two pieces are going to be residing only 1049 00:57:45,350 --> 00:57:46,490 among the zeros. 1050 00:57:46,490 --> 00:57:50,720 And so therefore, when you repeat a y 1051 00:57:50,720 --> 00:57:55,460 you're no longer going to have two copies of the same string. 1052 00:57:55,460 --> 00:57:57,380 And so it won't be in the language. 1053 00:57:57,380 --> 00:58:01,040 So therefore, you've got a contradiction and F is not 1054 00:58:01,040 --> 00:58:02,420 regular. 1055 00:58:02,420 --> 00:58:05,660 So you have to play with the pumping lemma a little bit. 1056 00:58:05,660 --> 00:58:08,863 If you haven't seen that before it's going to be-- 1057 00:58:08,863 --> 00:58:10,280 it takes a little getting used to. 1058 00:58:10,280 --> 00:58:11,930 But you have a few homework questions 1059 00:58:11,930 --> 00:58:15,230 that need to be solved using the pumping lemma. 1060 00:58:15,230 --> 00:58:19,490 So now, let's look at-- 1061 00:58:19,490 --> 00:58:22,040 lastly, there is another method that can come in, 1062 00:58:22,040 --> 00:58:24,290 which is combining closure properties 1063 00:58:24,290 --> 00:58:27,040 with the pumping lemma. 1064 00:58:27,040 --> 00:58:29,565 So closure properties sometimes help you. 1065 00:58:29,565 --> 00:58:31,690 So let's look at the language B, which is actually, 1066 00:58:31,690 --> 00:58:33,680 we saw earlier in the lecture, where 1067 00:58:33,680 --> 00:58:36,730 we have an equal number of zeros and ones. 1068 00:58:36,730 --> 00:58:38,860 Now, we could prove that directly, 1069 00:58:38,860 --> 00:58:41,860 using the pumping lemma, as not being regular. 1070 00:58:41,860 --> 00:58:45,580 But it's actually even easier.
1071 00:58:45,580 --> 00:58:47,680 What we're going to prove-- 1072 00:58:47,680 --> 00:58:50,465 that-- we're going to prove that it's not 1073 00:58:50,465 --> 00:58:51,590 regular in a different way. 1074 00:58:51,590 --> 00:58:54,173 First we're going to assume for contradiction, as we often do, 1075 00:58:54,173 --> 00:58:55,977 that it is regular. 1076 00:58:55,977 --> 00:58:58,060 And now we're going to use something-- we're going 1077 00:58:58,060 --> 00:58:59,410 to use some other knowledge. 1078 00:58:59,410 --> 00:59:01,798 We're not going to use the pumping lemma here 1079 00:59:01,798 --> 00:59:03,340 because we're going to take advantage 1080 00:59:03,340 --> 00:59:07,170 of an earlier case where we used the pumping lemma. 1081 00:59:07,170 --> 00:59:10,200 And so now we know that the string-- 1082 00:59:10,200 --> 00:59:14,100 the language 0 star 1 star is a regular language, 1083 00:59:14,100 --> 00:59:16,260 because it's described by a regular expression. 1084 00:59:16,260 --> 00:59:20,970 If you take the B, which is the equal numbers of zeros 1085 00:59:20,970 --> 00:59:24,720 and ones, and you intersect it with 0 star 1 star, 1086 00:59:24,720 --> 00:59:26,490 that's going to be a regular language 1087 00:59:26,490 --> 00:59:31,740 if B was regular, using closure under intersection. 1088 00:59:31,740 --> 00:59:34,860 But this language B intersect 0 star 1 star 1089 00:59:34,860 --> 00:59:39,810 is the language of equal numbers of zeros and ones 1090 00:59:39,810 --> 00:59:42,960 where the zeros come first. 1091 00:59:42,960 --> 00:59:44,940 And that's the language D that we 1092 00:59:44,940 --> 00:59:49,990 showed two slides back, that we already know can't be regular. 1093 00:59:49,990 --> 00:59:53,110 So that intersection cannot be regular. 1094 00:59:53,110 --> 00:59:57,440 And so it violates the closure property. 1095 00:59:57,440 --> 01:00:00,200 And again, we get a contradiction. 
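The key step in the closure argument, that B intersected with 0 star 1 star is exactly D, can be sanity-checked by enumeration. This is a sketch, not a proof: it only compares the two sets on binary strings up to an assumed length bound of 8, and the helper names are invented for the illustration. It uses Python's standard `re.fullmatch` for the regular expression and `itertools.product` to list the strings.

```python
import re
from itertools import product


def in_B(w):
    """B: equal numbers of zeros and ones, in any order."""
    return w.count("0") == w.count("1")


def in_D(w):
    """D = { 0^k 1^k : k >= 0 }."""
    k = len(w) // 2
    return w == "0" * k + "1" * k


def all_strings(max_len):
    """Every binary string of length 0 through max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)


MAX_LEN = 8  # assumed bound, chosen only to keep the enumeration small
intersection = {w for w in all_strings(MAX_LEN)
                if in_B(w) and re.fullmatch(r"0*1*", w)}
d_strings = {w for w in all_strings(MAX_LEN) if in_D(w)}
print(intersection == d_strings)
print(sorted(intersection, key=len))
```

Since regular languages are closed under intersection and 0 star 1 star is regular, B being regular would force D to be regular, which is the contradiction used above.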
1096 01:00:00,200 --> 01:00:06,490 So that's a different way of sometimes making a shortcut 1097 01:00:06,490 --> 01:00:09,830 to prove a language is not regular. 1098 01:00:09,830 --> 01:00:12,830 So we have-- in our last 10 minutes or so, 1099 01:00:12,830 --> 01:00:15,140 we're going to shift gears totally, 1100 01:00:15,140 --> 01:00:17,750 in an entirely different way, and consider 1101 01:00:17,750 --> 01:00:21,500 a new model of computation which is more powerful, 1102 01:00:21,500 --> 01:00:23,570 that can actually do things that we 1103 01:00:23,570 --> 01:00:25,850 can't do with finite automata. 1104 01:00:25,850 --> 01:00:28,470 And these are called context-free grammars. 1105 01:00:28,470 --> 01:00:31,100 So this is really just an introduction. 1106 01:00:31,100 --> 01:00:33,950 We're going to spend all of next lecture looking 1107 01:00:33,950 --> 01:00:38,270 at context-free grammars and their associated languages. 1108 01:00:38,270 --> 01:00:40,766 But let's just do-- 1109 01:00:40,766 --> 01:00:43,860 get a preview. 1110 01:00:43,860 --> 01:00:46,410 So a context-free grammar looks like this. 1111 01:00:46,410 --> 01:00:47,535 You have a bunch of these-- 1112 01:00:51,660 --> 01:00:54,750 what we call substitution rules, or rules, 1113 01:00:54,750 --> 01:00:58,560 sometimes, which just look like a symbol, 1114 01:00:58,560 --> 01:01:00,045 arrow, a string of symbols. 1115 01:01:03,420 --> 01:01:05,130 That's what a context-free grammar 1116 01:01:05,130 --> 01:01:08,490 looks like at a high level. 1117 01:01:08,490 --> 01:01:12,140 Let's define some terms. 1118 01:01:12,140 --> 01:01:15,290 So a rule, as I just described, is 1119 01:01:15,290 --> 01:01:18,500 going to be-- look-- it's going to be a symbol, which 1120 01:01:18,500 --> 01:01:21,110 we're going to call a variable. 
1121 01:01:21,110 --> 01:01:24,590 And that's going to have an arrow to a string of other-- 1122 01:01:24,590 --> 01:01:30,080 possibly, other variables and symbols called terminals. 1123 01:01:30,080 --> 01:01:35,030 So a variable is a symbol that appears on the left-hand side 1124 01:01:35,030 --> 01:01:36,320 of a rule. 1125 01:01:36,320 --> 01:01:38,480 Anything that appears on the left-hand side 1126 01:01:38,480 --> 01:01:41,345 is going to be considered to be a variable. 1127 01:01:44,450 --> 01:01:48,420 So S and R are both variables. 1128 01:01:48,420 --> 01:01:51,800 Now, other symbols that appear in the grammar which 1129 01:01:51,800 --> 01:01:55,520 don't appear in the left-hand side-- 1130 01:01:55,520 --> 01:01:59,030 those are going to be called terminals. 1131 01:01:59,030 --> 01:02:02,645 So here, 0 and 1 are terminals. 1132 01:02:02,645 --> 01:02:04,770 Now, you may think that empty string should also be 1133 01:02:04,770 --> 01:02:05,460 a terminal. 1134 01:02:05,460 --> 01:02:07,110 But that's not a symbol. 1135 01:02:07,110 --> 01:02:09,280 Empty string is a string. 1136 01:02:09,280 --> 01:02:10,620 It's just a string of length 0. 1137 01:02:10,620 --> 01:02:14,310 So I'm not considering empty string to be a terminal. 1138 01:02:17,770 --> 01:02:22,028 So-- and then there's going to be a special variable which 1139 01:02:22,028 --> 01:02:24,570 is going to be considered the starting variable, just like we 1140 01:02:24,570 --> 01:02:25,740 had a starting state. 1141 01:02:25,740 --> 01:02:29,100 And that's typically going to be written as the top-left symbol. 1142 01:02:29,100 --> 01:02:33,340 So this symbol S, here, is going to be the starting symbol. 1143 01:02:33,340 --> 01:02:40,270 And grammars can be used to define languages and to-- 1144 01:02:40,270 --> 01:02:42,970 well, to generate strings and to define languages.
1145 01:02:42,970 --> 01:02:45,340 So first of all, let's see how a grammar, 1146 01:02:45,340 --> 01:02:48,145 using this as an illustration, can generate strings. 1147 01:02:50,960 --> 01:02:55,220 Actually, just to emphasize this terminology here, 1148 01:02:55,220 --> 01:02:59,240 in this particular example we had three rules. 1149 01:02:59,240 --> 01:03:04,950 The two variables were R and S. The two terminals were 0 and 1. 1150 01:03:04,950 --> 01:03:07,650 And the start variable was this top left-hand symbol, 1151 01:03:07,650 --> 01:03:10,670 as I mentioned-- the S. 1152 01:03:10,670 --> 01:03:12,890 So grammars generate strings. 1153 01:03:12,890 --> 01:03:16,190 The way they do is you follow a certain procedure, 1154 01:03:16,190 --> 01:03:18,490 which is really pretty simple. 1155 01:03:18,490 --> 01:03:21,850 You write down, first of all, the start variable. 1156 01:03:21,850 --> 01:03:23,860 And I'll do an example in a second. 1157 01:03:23,860 --> 01:03:26,220 You write down the start variable. 1158 01:03:26,220 --> 01:03:29,520 And then you take a look what you've written down. 1159 01:03:29,520 --> 01:03:31,560 And if it has any variables in it, 1160 01:03:31,560 --> 01:03:37,290 you can apply one of the corresponding right-hand sides 1161 01:03:37,290 --> 01:03:44,800 of a rule as a substitution for that variable. 1162 01:03:44,800 --> 01:03:47,940 And so-- like, for example, if you have an S in the thing 1163 01:03:47,940 --> 01:03:51,885 you've written down, you can substitute for that S a 0S1. 1164 01:03:54,650 --> 01:03:57,590 Or you could substitute for that S an R. Or if you have an R, 1165 01:03:57,590 --> 01:04:01,410 you can substitute for the R an empty string. 1166 01:04:01,410 --> 01:04:04,520 So you're just going to keep on doing those substitutions over 1167 01:04:04,520 --> 01:04:06,980 and over again until there are no variables left, 1168 01:04:06,980 --> 01:04:09,650 so there's nothing left to substitute.
1169 01:04:09,650 --> 01:04:11,090 Only terminals remain. 1170 01:04:11,090 --> 01:04:13,610 At that point, you have generated 1171 01:04:13,610 --> 01:04:15,020 a string in the language. 1172 01:04:17,580 --> 01:04:19,313 So the language, then, is the collection 1173 01:04:19,313 --> 01:04:20,355 of all generated strings. 1174 01:04:22,930 --> 01:04:23,810 Let's do an example. 1175 01:04:23,810 --> 01:04:27,430 Here's an example of G1 generating some string. 1176 01:04:27,430 --> 01:04:29,980 So as I mentioned, first of all, you're 1177 01:04:29,980 --> 01:04:32,950 going to write down the start variable. 1178 01:04:32,950 --> 01:04:36,310 And I'm just going to illustrate this in two parallel tracks 1179 01:04:36,310 --> 01:04:37,353 here. 1180 01:04:37,353 --> 01:04:38,770 On the left side I'm going to show 1181 01:04:38,770 --> 01:04:40,278 you the tree of substitutions. 1182 01:04:40,278 --> 01:04:42,070 And on the right side I'm going to show you 1183 01:04:42,070 --> 01:04:44,050 the resulting string that you get by applying 1184 01:04:44,050 --> 01:04:47,120 those substitutions. 1185 01:04:47,120 --> 01:04:49,280 So over here I'm going to substitute 1186 01:04:49,280 --> 01:04:51,380 for S the string 0S1. 1187 01:04:51,380 --> 01:04:53,540 So on the right-hand side I just have 0S1, 1188 01:04:53,540 --> 01:04:55,430 because that's what I substituted for S. 1189 01:04:55,430 --> 01:04:58,070 But you'll see it's not going to-- it's going to look 1190 01:04:58,070 --> 01:04:59,960 a little different in a second. 1191 01:04:59,960 --> 01:05:02,430 Here, I'm going to-- again I still have a variable. 1192 01:05:02,430 --> 01:05:05,780 So I'm going to substitute for S 0S1. 1193 01:05:05,780 --> 01:05:08,120 Now I have the string-- 1194 01:05:08,120 --> 01:05:13,710 resulting string 00S11, because I've substituted 0S1 1195 01:05:13,710 --> 01:05:17,362 for the previous S, but the 0 and 1 stick around from before. 
1196 01:05:17,362 --> 01:05:18,320 They don't go anywhere. 1197 01:05:18,320 --> 01:05:22,070 So I have, at this point, 00S11. 1198 01:05:22,070 --> 01:05:25,860 Now I'm going to take a different choice. 1199 01:05:25,860 --> 01:05:27,840 I'm going to substitute for S-- 1200 01:05:27,840 --> 01:05:29,400 I could have gone either way. 1201 01:05:29,400 --> 01:05:32,010 This would have something-- almost like non-determinism 1202 01:05:32,010 --> 01:05:34,390 here, because you have a choice. 1203 01:05:34,390 --> 01:05:36,930 I'm going to substitute for S-- 1204 01:05:36,930 --> 01:05:39,000 instead of 0S1 I'm going to substitute R, 1205 01:05:39,000 --> 01:05:41,790 because that's also legitimate in terms of the rules. 1206 01:05:41,790 --> 01:05:43,614 And so now I'm going to have 00R11. 1207 01:05:46,360 --> 01:05:48,760 And now R-- there's no choices here. 1208 01:05:48,760 --> 01:05:53,120 R can only be substituted for by an empty string. 1209 01:05:53,120 --> 01:05:56,060 So I get to R becomes just empty string. 1210 01:05:56,060 --> 01:05:58,420 And in terms of the string generated, 1211 01:05:58,420 --> 01:06:00,530 empty string doesn't add anything. 1212 01:06:00,530 --> 01:06:02,140 It just really is-- 1213 01:06:02,140 --> 01:06:03,460 it's a nothing. 1214 01:06:03,460 --> 01:06:05,860 So I get the string 0011. 1215 01:06:05,860 --> 01:06:08,380 And this is a string just of terminal symbols. 1216 01:06:08,380 --> 01:06:13,660 And so that is a string in the language of the grammar G1. 1217 01:06:16,280 --> 01:06:19,760 And if you think about it, G1's language 1218 01:06:19,760 --> 01:06:22,860 is that language that we saw before, 1219 01:06:22,860 --> 01:06:26,270 which I think we called D-- 1220 01:06:26,270 --> 01:06:29,570 0 to the k 1 to the k for k greater than or equal to 0. 1221 01:06:29,570 --> 01:06:32,690 So this is an example of a language that a context-free 1222 01:06:32,690 --> 01:06:38,840 grammar can do but a finite automaton cannot do. 
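The substitution procedure for G1 can be sketched in Python. The three rules are the ones described above (S goes to 0S1, S goes to R, R goes to the empty string). Everything else is an assumption of this sketch: the dictionary encoding, the helper names `derive` and `generate`, the choice to always replace the leftmost variable, and the step bound.

```python
# G1 as described: S -> 0S1, S -> R, R -> (empty string).
RULES = {"S": ["0S1", "R"], "R": [""]}


def leftmost_variable(form):
    """Index of the first variable in form, or None if only terminals remain."""
    return next((i for i, c in enumerate(form) if c in RULES), None)


def derive(choices, start="S"):
    """Apply the chosen right-hand sides in order, always to the leftmost variable."""
    forms = [start]
    for rhs in choices:
        form = forms[-1]
        i = leftmost_variable(form)
        if i is None or rhs not in RULES[form[i]]:
            raise ValueError(f"cannot apply {rhs!r} to {form!r}")
        forms.append(form[:i] + rhs + form[i + 1:])
    return forms


def generate(max_steps, start="S"):
    """All terminal strings reachable in at most max_steps substitutions."""
    done, frontier = set(), {start}
    for _ in range(max_steps):
        next_frontier = set()
        for form in frontier:
            i = leftmost_variable(form)
            if i is None:
                done.add(form)  # only terminals remain
                continue
            for rhs in RULES[form[i]]:
                next_frontier.add(form[:i] + rhs + form[i + 1:])
        frontier = next_frontier
    done.update(f for f in frontier if leftmost_variable(f) is None)
    return done


print(derive(["0S1", "0S1", "R", ""]))
print(sorted(generate(6), key=len))
```

The first call replays the derivation from the example, S to 0S1 to 00S11 to 00R11 to 0011. The second collects every string the grammar can finish within six substitutions, and each one has the form 0 to the k 1 to the k.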
1223 01:06:38,840 --> 01:06:42,080 So that is our little introduction to-- 1224 01:06:42,080 --> 01:06:42,690 oops. 1225 01:06:42,690 --> 01:06:45,330 There's one more check-in here. 1226 01:06:45,330 --> 01:06:45,830 Oh, yeah. 1227 01:06:45,830 --> 01:06:48,710 So I'm asking you to actually look at-- 1228 01:06:48,710 --> 01:06:52,430 let me get myself out of this picture 1229 01:06:52,430 --> 01:06:57,020 so you don't see me blocking things. 1230 01:06:57,020 --> 01:06:58,940 And we will do one last check-in. 1231 01:07:01,840 --> 01:07:04,195 Make sure you're staying around for the whole thing. 1232 01:07:06,260 --> 01:07:08,260 Now there could be several of these strings that 1233 01:07:08,260 --> 01:07:09,093 are in the language. 1234 01:07:09,093 --> 01:07:11,020 You have to click them all-- all of the ones 1235 01:07:11,020 --> 01:07:13,720 that you have found that are in the language of this grammar 1236 01:07:13,720 --> 01:07:17,600 that can be generated by grammar G2, you have to click those. 1237 01:07:17,600 --> 01:07:20,680 I'll give you a little bit more time on this one 1238 01:07:20,680 --> 01:07:24,495 to see which ones G2 can generate. 1239 01:07:24,495 --> 01:07:25,370 I'll give you a hint. 1240 01:07:25,370 --> 01:07:27,970 It's more than one, but not all. 1241 01:07:30,890 --> 01:07:33,365 So I see you're making some progress here. 1242 01:07:39,560 --> 01:07:40,590 Interesting. 1243 01:07:40,590 --> 01:07:44,617 So please-- we're going to wrap this up very quickly. 1244 01:07:44,617 --> 01:07:46,700 You can-- somebody's telling me you can't unclick. 1245 01:07:46,700 --> 01:07:47,510 Thank you. 1246 01:07:47,510 --> 01:07:48,080 Good to know. 1247 01:07:51,790 --> 01:07:53,270 Still, things are coming in here. 1248 01:07:53,270 --> 01:07:58,560 So let's not-- we're running toward the end of the hour 1249 01:07:58,560 --> 01:07:59,060 here. 1250 01:07:59,060 --> 01:08:00,080 I don't want to go over. 
1251 01:08:00,080 --> 01:08:02,440 So I'm going to end it in five seconds. 1252 01:08:02,440 --> 01:08:03,100 Click away. 1253 01:08:03,100 --> 01:08:07,840 And don't forget, we're not going to charge you 1254 01:08:07,840 --> 01:08:10,510 if you get it wrong. 1255 01:08:10,510 --> 01:08:12,817 Sharing results. 1256 01:08:12,817 --> 01:08:14,650 I don't know why it has an orange one there, 1257 01:08:14,650 --> 01:08:20,630 because there are several correct answers here. 1258 01:08:20,630 --> 01:08:24,040 So it's A, B, and D are correct. 1259 01:08:24,040 --> 01:08:26,649 You can get any of those. 1260 01:08:26,649 --> 01:08:29,439 It's really sort of two copies of the language we 1261 01:08:29,439 --> 01:08:34,600 had before next to one another. 1262 01:08:34,600 --> 01:08:41,779 And so the only thing you cannot get is 1010. 1263 01:08:41,779 --> 01:08:43,850 So I encourage you to think about that. 1264 01:08:43,850 --> 01:08:50,189 And I will come to our last slide of today, 1265 01:08:50,189 --> 01:08:53,750 which is just a quick review. 1266 01:08:53,750 --> 01:08:54,964 I can put myself back. 1267 01:08:57,640 --> 01:09:01,330 So we showed how to convert DFAs to regular expressions. 1268 01:09:01,330 --> 01:09:07,090 And the summary is that DFAs, NFAs, GNFAs, even, and regular 1269 01:09:07,090 --> 01:09:10,990 expressions are all equivalent in the class of languages 1270 01:09:10,990 --> 01:09:14,380 they can describe. 1271 01:09:14,380 --> 01:09:16,180 The second thing we did was a method 1272 01:09:16,180 --> 01:09:18,160 for proving languages not regular by using 1273 01:09:18,160 --> 01:09:20,859 the pumping lemma or closure properties. 1274 01:09:20,859 --> 01:09:24,430 And lastly, we introduced context-free grammars. 1275 01:09:24,430 --> 01:09:28,000 And we're going to see more about those on Thursday. 1276 01:09:28,000 --> 01:09:30,740 So with that, I think we're out of time.
1277 01:09:30,740 --> 01:09:35,229 And thank you for the notes of appreciation. 1278 01:09:35,229 --> 01:09:39,979 And I will-- 1279 01:09:39,979 --> 01:09:42,560 I think we're going to end here. 1280 01:09:42,560 --> 01:09:48,820 And see you on Thursday, if not before.