1 00:00:25,370 --> 00:00:26,848 MICHAEL SIPSER: Hi, folks. 2 00:00:26,848 --> 00:00:27,890 Why don't we get started? 3 00:00:31,260 --> 00:00:33,650 Welcome back. 4 00:00:33,650 --> 00:00:37,220 Good to see you all here. 5 00:00:37,220 --> 00:00:41,335 So I am going to first-- 6 00:00:43,990 --> 00:00:47,470 well, we'll recap what we did last time 7 00:00:47,470 --> 00:00:49,840 and what we're going to do today. 8 00:00:49,840 --> 00:00:52,120 I'll talk a little bit about the problem set. 9 00:00:52,120 --> 00:00:56,150 And we'll also have a break, as requested, halfway through. 10 00:00:56,150 --> 00:00:58,400 So why don't we jump in? 11 00:00:58,400 --> 00:01:05,010 What we did last time was besides introducing the course, 12 00:01:05,010 --> 00:01:07,460 we introduced finite automata and regular languages, 13 00:01:07,460 --> 00:01:11,190 which are the languages that the finite automata can recognize. 14 00:01:11,190 --> 00:01:13,370 We talked about these regular operations. 15 00:01:18,200 --> 00:01:21,230 Those allow us to build what we call our regular expressions. 16 00:01:21,230 --> 00:01:24,270 These are ways of describing languages. 17 00:01:24,270 --> 00:01:27,530 So we have finite automata can describe languages, 18 00:01:27,530 --> 00:01:30,050 and regular expressions can describe languages. 19 00:01:30,050 --> 00:01:33,488 And one of our goals is to show that those two systems are 20 00:01:33,488 --> 00:01:35,780 equivalent to one another, even though they look rather 21 00:01:35,780 --> 00:01:38,890 different at first glance. 22 00:01:38,890 --> 00:01:45,570 So to move in that direction, we're 23 00:01:45,570 --> 00:01:47,100 going to prove closure properties 24 00:01:47,100 --> 00:01:50,190 for the class of regular languages 25 00:01:50,190 --> 00:01:52,000 over these regular operations.
26 00:01:52,000 --> 00:01:53,700 So we'll show that-- 27 00:01:53,700 --> 00:01:56,970 well, we already showed that any two regular languages 28 00:01:56,970 --> 00:01:59,010 have their union, also being regular. 29 00:01:59,010 --> 00:02:01,480 And we'll show that for the other two operations as well. 30 00:02:01,480 --> 00:02:05,615 So let's just look ahead to what we're going to do today. 31 00:02:05,615 --> 00:02:07,740 We're going to introduce an important concept which 32 00:02:07,740 --> 00:02:09,907 is going to be a theme throughout the course, called 33 00:02:09,907 --> 00:02:11,009 nondeterminism. 34 00:02:11,009 --> 00:02:17,550 And having that as a tool that we can use, 35 00:02:17,550 --> 00:02:22,520 we'll be able to show closure under concatenation and star, 36 00:02:22,520 --> 00:02:25,100 finishing up what we started to do last time. 37 00:02:25,100 --> 00:02:29,510 And then we'll use those closure constructions 38 00:02:29,510 --> 00:02:36,400 to show how to convert regular expressions to finite automata. 39 00:02:36,400 --> 00:02:39,520 And that's going to be halfway to our goal of showing 40 00:02:39,520 --> 00:02:43,180 that the two systems are equivalent to one another. 41 00:02:43,180 --> 00:02:44,830 And the following lecture, we will 42 00:02:44,830 --> 00:02:48,570 show how to do the conversion in the other direction. 43 00:02:48,570 --> 00:02:53,270 So I thought we would just jump in, then, and look at-- 44 00:02:53,270 --> 00:02:56,390 return to the material of the course. 45 00:02:56,390 --> 00:03:00,560 As you remember, we were looking at the closure properties 46 00:03:00,560 --> 00:03:03,570 for the class of regular languages. 47 00:03:03,570 --> 00:03:05,310 We started doing that. 48 00:03:05,310 --> 00:03:09,110 And if you recall, hopefully, we did closure under union. 
49 00:03:09,110 --> 00:03:12,650 And then we tried to do closure under concatenation, which 50 00:03:12,650 --> 00:03:18,800 I have shown here on this slide, the proof 51 00:03:18,800 --> 00:03:21,272 attempt that we tried to do last time. 52 00:03:21,272 --> 00:03:22,730 And let's just review that quickly, 53 00:03:22,730 --> 00:03:26,690 because I think that's going to be helpful to see how to fix 54 00:03:26,690 --> 00:03:29,010 the problem that came up. 55 00:03:29,010 --> 00:03:35,750 So if you remember, we're given two regular languages, A1 56 00:03:35,750 --> 00:03:36,410 and A2. 57 00:03:36,410 --> 00:03:39,080 And we're trying to show that the concatenation language 58 00:03:39,080 --> 00:03:42,270 A1A2 is also regular. 59 00:03:42,270 --> 00:03:45,990 And so the way we go about all of these things 60 00:03:45,990 --> 00:03:49,200 is we assume that A1 and A2 are regular. 61 00:03:49,200 --> 00:03:52,680 So that means we have machines, finite automata, for A1 and A2. 62 00:03:52,680 --> 00:03:56,670 We'll call them M1 and M2, that recognize 63 00:03:56,670 --> 00:03:59,260 A1 and A2, respectively. 64 00:03:59,260 --> 00:04:01,560 And then what we need to do in order 65 00:04:01,560 --> 00:04:04,740 to show the concatenation is regular 66 00:04:04,740 --> 00:04:07,020 is to make a finite automaton which 67 00:04:07,020 --> 00:04:09,330 recognizes the concatenation. 68 00:04:09,330 --> 00:04:11,980 And we tried to do that last time. 69 00:04:11,980 --> 00:04:17,470 So if you remember, that concatenation machine-- 70 00:04:17,470 --> 00:04:19,959 M, we're calling it-- 71 00:04:19,959 --> 00:04:21,620 what is it supposed to do? 72 00:04:21,620 --> 00:04:23,680 It's supposed to accept its input if it's 73 00:04:23,680 --> 00:04:25,280 in the concatenation language. 
74 00:04:25,280 --> 00:04:29,950 And that means that the input can be split into two parts, x 75 00:04:29,950 --> 00:04:32,980 and y, where x is in the A language, 76 00:04:32,980 --> 00:04:35,770 and y is in the B lang-- 77 00:04:35,770 --> 00:04:37,390 y is accepted by M1-- 78 00:04:37,390 --> 00:04:40,480 and x is accepted by M1, and y is accepted by M2. 79 00:04:40,480 --> 00:04:42,070 Sorry I garbled that up. 80 00:04:42,070 --> 00:04:46,000 So x should be in A1, and y should be in A2. 81 00:04:46,000 --> 00:04:48,580 If you can split w that way, then M should accept it. 82 00:04:48,580 --> 00:04:51,460 So M has to figure out if there's some way 83 00:04:51,460 --> 00:04:54,453 to split the input so that the first machine accepts 84 00:04:54,453 --> 00:04:55,870 the first part, the second machine 85 00:04:55,870 --> 00:04:57,440 accepts the second part. 86 00:04:57,440 --> 00:05:02,850 And the idea that we came up with for doing that 87 00:05:02,850 --> 00:05:11,250 was to take these two machines, build them into a new machine 88 00:05:11,250 --> 00:05:16,420 M, and then connect the accepting states for M 89 00:05:16,420 --> 00:05:18,010 to the start state-- 90 00:05:18,010 --> 00:05:23,320 connect the accepting states for M1 to the start state for M2. 91 00:05:23,320 --> 00:05:27,070 Because the idea would be that if M1 has accepted 92 00:05:27,070 --> 00:05:30,490 an initial part, well, then you want to pass control 93 00:05:30,490 --> 00:05:34,050 to M2 to accept the rest. 94 00:05:34,050 --> 00:05:36,560 But as we observed, that doesn't quite work. 95 00:05:36,560 --> 00:05:41,560 Because the first place to split w 96 00:05:41,560 --> 00:05:45,500 after you found an initial part that's accepted by M1 97 00:05:45,500 --> 00:05:47,020 may not be the right place. 98 00:05:47,020 --> 00:05:49,750 Because the remainder may not be accepted by M2.
99 00:05:49,750 --> 00:05:52,330 You might have been better off waiting 100 00:05:52,330 --> 00:05:54,730 until you found another place that M1 101 00:05:54,730 --> 00:05:57,760 accepted, later on in the string, say, over here. 102 00:05:57,760 --> 00:06:00,640 And then by splitting it over there, 103 00:06:00,640 --> 00:06:02,740 then maybe you do get successfully 104 00:06:02,740 --> 00:06:04,960 find that the remainder is accepted by M2. 105 00:06:04,960 --> 00:06:07,990 Whereas if you tried to split it in the first place, 106 00:06:07,990 --> 00:06:10,390 the remainder wouldn't have been accepted by M2. 107 00:06:10,390 --> 00:06:11,590 So all you need to do-- 108 00:06:11,590 --> 00:06:13,840 M has to know, is there some place 109 00:06:13,840 --> 00:06:18,010 to split the input so that you can get both parts accepted 110 00:06:18,010 --> 00:06:20,330 by the respective machines? 111 00:06:20,330 --> 00:06:24,350 The problem is that M might need to know the future in order 112 00:06:24,350 --> 00:06:26,240 to know where to make the split. 113 00:06:26,240 --> 00:06:28,060 And it doesn't have access to the future. 114 00:06:28,060 --> 00:06:30,120 So what do we do? 115 00:06:30,120 --> 00:06:32,330 So what we're going to do is introduce 116 00:06:32,330 --> 00:06:37,670 a new concept that will allow us to basically get 117 00:06:37,670 --> 00:06:40,160 the effect of M1-- 118 00:06:40,160 --> 00:06:43,100 and the-- sort of being able to see the future. 119 00:06:43,100 --> 00:06:45,500 And that new concept is going to be very important for us 120 00:06:45,500 --> 00:06:46,370 throughout the term. 121 00:06:46,370 --> 00:06:48,320 It's called nondeterminism. 122 00:06:48,320 --> 00:06:51,680 And so we're going to introduce a new kind of finite automaton 123 00:06:51,680 --> 00:06:54,230 called a nondeterministic finite automaton. 
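[Editor's sketch] The splitting problem described above can be illustrated with a small Python example. This is only a sketch under stated assumptions: the two languages and the helper names (in_a1, in_a2, greedy_split, any_split) are hypothetical and chosen for illustration; they do not come from the lecture. The first function commits to the earliest prefix that belongs to A1, which mimics handing control to M2 at the first opportunity. The second tries every split point.

```python
def in_a1(x: str) -> bool:
    # Hypothetical A1: strings made only of a's (the empty string included).
    return all(c == "a" for c in x)


def in_a2(y: str) -> bool:
    # Hypothetical A2: exactly the string "ab".
    return y == "ab"


def greedy_split(w: str) -> bool:
    # Commit to the FIRST prefix that is in A1, then require the rest to be in A2.
    for i in range(len(w) + 1):
        if in_a1(w[:i]):
            return in_a2(w[i:])
    return False


def any_split(w: str) -> bool:
    # Accept if SOME split point puts the prefix in A1 and the remainder in A2.
    return any(in_a1(w[:i]) and in_a2(w[i:]) for i in range(len(w) + 1))


if __name__ == "__main__":
    w = "aab"
    print("greedy:", greedy_split(w))   # splits too early, before the first a
    print("any split:", any_split(w))   # finds the split "a" + "ab"
```

On the input aab the early split leaves the remainder aab, which is not in the hypothetical A2, while the later split a + ab works. Note that any_split inspects the whole input at once, which is exactly what a machine reading left to right cannot do directly.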
124 00:06:54,230 --> 00:06:56,360 And first, we'll look at that, and then we'll 125 00:06:56,360 --> 00:07:00,050 see how that fits in with the previous deterministic finite 126 00:07:00,050 --> 00:07:03,060 automaton, that we introduced last time. 127 00:07:03,060 --> 00:07:04,145 So here's an example. 128 00:07:04,145 --> 00:07:07,870 It's always good to start off with an example. 129 00:07:07,870 --> 00:07:10,480 Here is a picture of a nondeterministic finite 130 00:07:10,480 --> 00:07:10,980 automaton. 131 00:07:10,980 --> 00:07:13,140 It looks very similar, at first glance, 132 00:07:13,140 --> 00:07:17,910 to the former kind, the deterministic finite automaton. 133 00:07:17,910 --> 00:07:19,420 But if you look a little carefully, 134 00:07:19,420 --> 00:07:22,540 you see that there are some key differences. 135 00:07:22,540 --> 00:07:31,360 The most important difference is that in state q1, for example, 136 00:07:31,360 --> 00:07:35,080 whereas in the machines that we introduced last time, 137 00:07:35,080 --> 00:07:38,830 there had to be exactly one way to go on each possible input 138 00:07:38,830 --> 00:07:42,430 symbol so you knew how to follow along through the machine 139 00:07:42,430 --> 00:07:45,790 as it's computing, here there are two ways to go. 140 00:07:45,790 --> 00:07:50,590 In q1, you can either stay in q1, or you can go to q2. 141 00:07:50,590 --> 00:07:52,540 That's the essence of nondeterminism. 142 00:07:52,540 --> 00:07:55,780 There could be many ways to proceed. 143 00:07:55,780 --> 00:07:59,920 And furthermore, on q1, if you get a b, 144 00:07:59,920 --> 00:08:02,390 then there's nowhere to go. 145 00:08:02,390 --> 00:08:09,820 So that's also possible within nondeterminism. 146 00:08:09,820 --> 00:08:12,920 So let's just start looking at these features. 147 00:08:12,920 --> 00:08:15,260 There are multiple paths forward-- 148 00:08:15,260 --> 00:08:16,490 multiple paths possible.
149 00:08:16,490 --> 00:08:20,450 You might be able to have one, as we had before, 150 00:08:20,450 --> 00:08:23,390 or many ways to go at each step, or maybe 0 ways 151 00:08:23,390 --> 00:08:24,560 to go at each step. 152 00:08:24,560 --> 00:08:28,460 Those are all legitimate for a nondeterministic machine, which 153 00:08:28,460 --> 00:08:32,049 is doing a nondeterministic computation. 154 00:08:32,049 --> 00:08:33,909 Another difference, if you look carefully, 155 00:08:33,909 --> 00:08:37,030 is that we're allowing here the empty string 156 00:08:37,030 --> 00:08:39,929 to appear on a transition. 157 00:08:39,929 --> 00:08:43,080 That's perhaps a little less essential 158 00:08:43,080 --> 00:08:44,880 to the spirit of nondeterminism. 159 00:08:44,880 --> 00:08:48,180 But it's going to turn out to be a convenience when we actually 160 00:08:48,180 --> 00:08:51,600 apply nondeterminism to build machines, 161 00:08:51,600 --> 00:08:52,860 as you'll see very shortly. 162 00:08:55,910 --> 00:09:00,090 Now, if there are many different ways to go-- 163 00:09:00,090 --> 00:09:04,560 and some of those ways to go might have different outcomes. 164 00:09:04,560 --> 00:09:07,740 As we remember from before, we accepted the input, 165 00:09:07,740 --> 00:09:09,570 if you end up in an accept state, 166 00:09:09,570 --> 00:09:12,060 and we rejected the input, if you end up 167 00:09:12,060 --> 00:09:14,700 not in an accept state, in a non-accepting state. 168 00:09:14,700 --> 00:09:15,900 Then you reject. 169 00:09:15,900 --> 00:09:18,480 But now there might be several different ways to go. 170 00:09:21,890 --> 00:09:23,390 And we'll do an example in a minute. 171 00:09:23,390 --> 00:09:25,390 But there might be several different ways to go. 172 00:09:25,390 --> 00:09:26,750 And they might disagree. 173 00:09:26,750 --> 00:09:28,250 Some of them might accept. 174 00:09:28,250 --> 00:09:29,850 Other ones might reject. 175 00:09:29,850 --> 00:09:31,350 So then what do you do? 
176 00:09:31,350 --> 00:09:37,220 Well, in that case, acceptance always overrules rejection. 177 00:09:37,220 --> 00:09:40,970 That's the essence of nondeterminism the way 178 00:09:40,970 --> 00:09:43,010 we're setting it up. 179 00:09:43,010 --> 00:09:44,690 You may ask why that is. 180 00:09:44,690 --> 00:09:49,920 And the spirit of that will become clear in due course. 181 00:09:49,920 --> 00:09:52,640 But right now, just take it as a rule. 182 00:09:52,640 --> 00:09:54,860 When we're having a nondeterministic machine, 183 00:09:54,860 --> 00:09:57,420 acceptance overrules rejection. 184 00:09:57,420 --> 00:10:02,420 So as long as there is-- one of the possible ways 185 00:10:02,420 --> 00:10:04,490 to go ends up at an accept, we say 186 00:10:04,490 --> 00:10:06,740 the whole thing is accepted. 187 00:10:06,740 --> 00:10:08,840 The only way we can possibly reject-- 188 00:10:08,840 --> 00:10:12,200 if all of the possible ways to go end up at rejection, 189 00:10:12,200 --> 00:10:13,910 end up at a non-accept state. 190 00:10:13,910 --> 00:10:15,542 So we'll see example of-- 191 00:10:15,542 --> 00:10:17,750 I think we're going to do an example right now, yeah. 192 00:10:17,750 --> 00:10:25,677 So if we take, for example, this machine N1 now on an input ab-- 193 00:10:25,677 --> 00:10:27,760 and we're going to process the symbols one by one, 194 00:10:27,760 --> 00:10:30,400 just like we did before. 195 00:10:30,400 --> 00:10:34,140 But now, to follow along, there might be several different ways 196 00:10:34,140 --> 00:10:35,260 to go. 197 00:10:35,260 --> 00:10:44,890 So if we take the first symbol a, and we run the machine-- 198 00:10:44,890 --> 00:10:49,050 so when the machine, it starts at the start state, as before-- 199 00:10:49,050 --> 00:10:50,790 but now an a comes in. 200 00:10:50,790 --> 00:10:52,352 And there might be two-- now there 201 00:10:52,352 --> 00:10:53,560 are two different ways to go.
202 00:10:53,560 --> 00:10:56,550 So we're going to keep track of both of them. 203 00:10:56,550 --> 00:10:58,890 After the machine reads an a, you 204 00:10:58,890 --> 00:11:03,480 can think of it as being in two states now simultaneously. 205 00:11:03,480 --> 00:11:06,450 It can be in state q1, and it can be in state q2. 206 00:11:06,450 --> 00:11:08,820 So those are two different possible places 207 00:11:08,820 --> 00:11:11,580 it could be at this moment. 208 00:11:11,580 --> 00:11:12,510 OK? 209 00:11:12,510 --> 00:11:17,100 Now we read the next symbol, the b. 210 00:11:17,100 --> 00:11:20,520 And from a b, you take each of the places 211 00:11:20,520 --> 00:11:26,120 where the machine could be at the end of the previous symbol, 212 00:11:26,120 --> 00:11:32,510 and you then apply reading a b, the next symbol, 213 00:11:32,510 --> 00:11:35,180 from each of those states where the machine could 214 00:11:35,180 --> 00:11:37,410 be in from the previous symbol. 215 00:11:37,410 --> 00:11:40,760 So the machine could be in q1 and q2 after reading an a. 216 00:11:40,760 --> 00:11:42,310 Now we apply a b. 217 00:11:42,310 --> 00:11:46,720 Well, q1 on a b goes nowhere. 218 00:11:46,720 --> 00:11:52,620 So you think of that branch, if you will, of the computation 219 00:11:52,620 --> 00:11:53,640 as just dying off. 220 00:11:53,640 --> 00:11:54,930 It has nowhere to go. 221 00:11:54,930 --> 00:11:57,390 It just vanishes. 222 00:11:57,390 --> 00:12:03,210 However, the other possibility, which was state q2 on a b, 223 00:12:03,210 --> 00:12:05,460 does allow, does have a place. 224 00:12:05,460 --> 00:12:12,180 So the machine is now going to go from q2 to q3 225 00:12:12,180 --> 00:12:13,980 on that branch of the computation, which 226 00:12:13,980 --> 00:12:15,450 reading-- on reading a b. 227 00:12:15,450 --> 00:12:25,370 And then it has, coming out of q3, there are two symbols. 228 00:12:25,370 --> 00:12:27,920 There's an a and an empty string symbol. 
229 00:12:27,920 --> 00:12:31,610 Now, on an a, the machine would have to read an a in order 230 00:12:31,610 --> 00:12:34,460 to transition along that arrow. 231 00:12:34,460 --> 00:12:37,370 But when there's an empty symbol on the arrow, 232 00:12:37,370 --> 00:12:41,990 that means the machine can go along that arrow for free 233 00:12:41,990 --> 00:12:44,090 without even reading anything. 234 00:12:44,090 --> 00:12:49,780 As long as it gets to q3, it can automatically jump to q4. 235 00:12:49,780 --> 00:12:55,030 And so once it has read a b and gone to q3, 236 00:12:55,030 --> 00:13:03,380 now it can either stay in q3, or it 237 00:13:03,380 --> 00:13:06,440 can go along the empty transition and go to q4. 238 00:13:06,440 --> 00:13:08,960 So again, it is going to be a nondeterministic step 239 00:13:08,960 --> 00:13:11,720 at this point. 240 00:13:11,720 --> 00:13:14,900 The essence of having an empty transition 241 00:13:14,900 --> 00:13:17,400 is that there is going to be nondeterminism. 242 00:13:17,400 --> 00:13:19,880 That's why we didn't introduce that 243 00:13:19,880 --> 00:13:23,780 for deterministic automata, because you 244 00:13:23,780 --> 00:13:27,380 don't have to transition along an empty string transition. 245 00:13:27,380 --> 00:13:29,690 You can stay where you are, or you 246 00:13:29,690 --> 00:13:32,030 can go along the empty string transition 247 00:13:32,030 --> 00:13:35,030 without reading any input and go over to the next state, which, 248 00:13:35,030 --> 00:13:37,070 in this case, is q4. 249 00:13:37,070 --> 00:13:38,990 So let's just see where we are. 250 00:13:38,990 --> 00:13:42,680 After reading an a, we're in states q1, q2. 251 00:13:42,680 --> 00:13:46,400 But now after reading a b, we're in states q3 and q4 252 00:13:46,400 --> 00:13:47,720 as possibilities. 253 00:13:47,720 --> 00:13:50,300 And now we're at the end of the input, 254 00:13:50,300 --> 00:13:53,940 and we look and see what we got.
255 00:13:53,940 --> 00:13:56,580 If any one of the states as possibilities 256 00:13:56,580 --> 00:13:58,740 that we are right now at the end of the string 257 00:13:58,740 --> 00:14:02,070 is an accept state, then we say, overall, the machine 258 00:14:02,070 --> 00:14:04,260 has accepted. 259 00:14:04,260 --> 00:14:07,050 So that corresponds to what we said over here before-- 260 00:14:07,050 --> 00:14:12,340 accept the input if some path leads to an accept. 261 00:14:12,340 --> 00:14:17,020 So if any way of proceeding through these nondeterministic 262 00:14:17,020 --> 00:14:19,960 choices will lead you to an accept, then you will say, 263 00:14:19,960 --> 00:14:21,750 we're going to accept the input. 264 00:14:21,750 --> 00:14:22,250 OK? 265 00:14:22,250 --> 00:14:25,338 So this input here is accepted. 266 00:14:25,338 --> 00:14:26,380 Let's do another example. 267 00:14:26,380 --> 00:14:31,070 Suppose we have the input aa instead of ab. 268 00:14:31,070 --> 00:14:34,220 So aa, after the first a, as before, 269 00:14:34,220 --> 00:14:37,130 we're in states q1 and q2 as possibilities. 270 00:14:37,130 --> 00:14:39,050 Now we read an a again. 271 00:14:39,050 --> 00:14:42,050 Now, the one that's on state q1, that possibility, 272 00:14:42,050 --> 00:14:44,630 q1 possibility, after reading an a, 273 00:14:44,630 --> 00:14:48,260 it again branches to q1 and q2. 274 00:14:48,260 --> 00:14:50,370 So we know after reading the second a, 275 00:14:50,370 --> 00:14:52,590 we're going to be in at least q1 and q2. 276 00:14:52,590 --> 00:14:56,930 Now how about the state that had been on q2 on reading 277 00:14:56,930 --> 00:14:59,530 an a, the one from before? 278 00:14:59,530 --> 00:15:01,600 After reading the first a, you were in q2. 279 00:15:01,600 --> 00:15:03,920 Now reading the second a, there's nowhere to go. 280 00:15:03,920 --> 00:15:06,600 So that one just gets removed.
281 00:15:06,600 --> 00:15:11,190 So after reading aa, we remain in states q1 and q2 282 00:15:11,190 --> 00:15:12,200 as possibilities. 283 00:15:14,760 --> 00:15:17,610 Neither of those are accept states. 284 00:15:17,610 --> 00:15:23,940 So therefore, on input aa, the machine rejects. 285 00:15:23,940 --> 00:15:25,700 OK? 286 00:15:25,700 --> 00:15:29,780 Let's just do a couple more, and then I'll ask you to do one. 287 00:15:29,780 --> 00:15:33,273 So we have aba as an input. 288 00:15:33,273 --> 00:15:34,440 Let's see what happens then. 289 00:15:34,440 --> 00:15:45,200 So remember, after reading ab, the machine is in the two 290 00:15:45,200 --> 00:15:47,580 states q3, q4 as possibilities. 291 00:15:47,580 --> 00:15:51,240 That's what we have from the first example. 292 00:15:51,240 --> 00:15:54,200 So after reading ab, we're in states q3, q4. 293 00:15:54,200 --> 00:15:57,110 Now we read another a. 294 00:15:57,110 --> 00:15:59,768 The q4 on an a has nowhere to go. 295 00:15:59,768 --> 00:16:01,560 In fact, it has nowhere to go on any input. 296 00:16:01,560 --> 00:16:04,790 So no matter what comes in after you're in state q4, 297 00:16:04,790 --> 00:16:07,070 that branch dies. 298 00:16:07,070 --> 00:16:10,900 But on q3, which is another one of the possibilities reading 299 00:16:10,900 --> 00:16:13,780 an a, it can follow along this transition. 300 00:16:13,780 --> 00:16:16,630 Because that's one of the labels on that transition, is a. 301 00:16:16,630 --> 00:16:18,540 So you can follow along this transition 302 00:16:18,540 --> 00:16:22,570 on reading an a, which is the last symbol in the string. 303 00:16:22,570 --> 00:16:28,660 And so now after aba, you are in only state q4 as a possibility. 304 00:16:28,660 --> 00:16:31,540 But that happens to be an accept state, so the machine accepts. 305 00:16:36,800 --> 00:16:37,310 OK. 306 00:16:37,310 --> 00:16:42,000 And now lastly, let's take our final example.
307 00:16:42,000 --> 00:16:43,620 What happens if we have abb? 308 00:16:46,800 --> 00:16:51,930 So as we remember before, after reading ab-- 309 00:16:51,930 --> 00:16:53,250 that was the first example-- 310 00:16:53,250 --> 00:16:56,070 we were in states q3, q4 as possibilities. 311 00:16:56,070 --> 00:16:57,570 Now we read a b. 312 00:16:57,570 --> 00:17:00,780 Well, neither of those states have anywhere to go on a b. 313 00:17:00,780 --> 00:17:05,430 So now all threads, all branches, of the computation 314 00:17:05,430 --> 00:17:07,750 die off. 315 00:17:07,750 --> 00:17:10,540 And at this point, the machine is totally dead. 316 00:17:10,540 --> 00:17:13,960 It has no active possibilities left. 317 00:17:13,960 --> 00:17:17,138 So certainly, it's going to reject this input, because none 318 00:17:17,138 --> 00:17:19,180 of the active states-- there are no active states 319 00:17:19,180 --> 00:17:20,359 or accepting states. 320 00:17:20,359 --> 00:17:24,400 And in fact, if you looked at anything that came later, 321 00:17:24,400 --> 00:17:26,920 anything that extended the string abb 322 00:17:26,920 --> 00:17:28,150 would also be rejected. 323 00:17:28,150 --> 00:17:30,766 Because once the machine had all-- 324 00:17:30,766 --> 00:17:33,010 all possibilities have died off, there's 325 00:17:33,010 --> 00:17:37,030 no way for them to come back to life on any extensions. 326 00:17:37,030 --> 00:17:40,450 So with that-- oh, here's an important point 327 00:17:40,450 --> 00:17:43,570 before I'm going to jump to a check-in on you. 328 00:17:43,570 --> 00:17:47,440 But I think one thing that might be 329 00:17:47,440 --> 00:17:49,330 on your mind about this nondeterminism 330 00:17:49,330 --> 00:17:53,680 is how does that correspond to reality? 331 00:17:53,680 --> 00:17:54,610 Well, it doesn't. 
332 00:17:57,310 --> 00:17:59,980 We're not intending for nondeterminism 333 00:17:59,980 --> 00:18:01,810 as we're defining it to correspond 334 00:18:01,810 --> 00:18:04,070 to a physical device. 335 00:18:04,070 --> 00:18:05,950 But nevertheless, as you'll see, it's 336 00:18:05,950 --> 00:18:09,580 a very mathematically useful concept, this nondeterminism. 337 00:18:09,580 --> 00:18:14,620 And it's going to be playing a big role throughout the subject 338 00:18:14,620 --> 00:18:17,630 as we'll experience it during the rest of the term. 339 00:18:17,630 --> 00:18:20,980 So with that, I'm going to have a little check-in. 340 00:18:20,980 --> 00:18:23,590 I'm going to ask you to consider what 341 00:18:23,590 --> 00:18:25,210 happens on one of the inputs. 342 00:18:25,210 --> 00:18:26,350 So here we go. 343 00:18:26,350 --> 00:18:28,823 What does it do on input aab? 344 00:18:28,823 --> 00:18:29,740 So here's the machine. 345 00:18:29,740 --> 00:18:30,560 You can look at it. 346 00:18:30,560 --> 00:18:33,460 And suppose, hopefully, there's a poll here 347 00:18:33,460 --> 00:18:39,815 for me to give to you so you can give me your input. 348 00:18:42,990 --> 00:18:45,010 So what does the machine do on input aab? 349 00:18:53,047 --> 00:18:54,130 Most of you have answered. 350 00:18:57,460 --> 00:18:59,230 Again, you're not going to be penalized 351 00:18:59,230 --> 00:19:01,240 for getting the wrong answer. 352 00:19:01,240 --> 00:19:07,130 But hopefully, you'll get the right answer. 353 00:19:07,130 --> 00:19:10,080 Anyway, let's just take a look here. 354 00:19:10,080 --> 00:19:12,140 So time is up. 355 00:19:12,140 --> 00:19:15,035 Let's end the polling and share the results. 356 00:19:17,780 --> 00:19:20,350 So the majority of you, majority have gotten 357 00:19:20,350 --> 00:19:21,985 the correct answer, which is a. 358 00:19:21,985 --> 00:19:25,870 The machine does accept aab. 
359 00:19:25,870 --> 00:19:28,540 Because when you have a-- 360 00:19:28,540 --> 00:19:33,490 so I'll show you the path that corresponds to accepting. 361 00:19:33,490 --> 00:19:41,180 You go a, a, b, and then empty string. 362 00:19:41,180 --> 00:19:49,230 And so that sequence of steps is one 363 00:19:49,230 --> 00:19:51,090 of the nondeterministic possibilities 364 00:19:51,090 --> 00:19:52,990 that the machine can follow. 365 00:19:52,990 --> 00:19:57,360 And that shows that the machine does accept the input aab. 366 00:19:57,360 --> 00:20:01,260 You can think about it, the way we did it before also. 367 00:20:01,260 --> 00:20:05,010 If you read an a, it's in the two possibilities q1, q2. 368 00:20:05,010 --> 00:20:08,550 You read a second a, again, in the possibilities q1, q2. 369 00:20:08,550 --> 00:20:09,930 Now you read a b. 370 00:20:09,930 --> 00:20:12,570 It's in the possibilities q3, q4. 371 00:20:12,570 --> 00:20:15,853 And that's it, aab. 372 00:20:15,853 --> 00:20:16,770 So now you read the b. 373 00:20:16,770 --> 00:20:18,750 It's in possibility q3, q4. 374 00:20:18,750 --> 00:20:20,190 q4 is an accepting state. 375 00:20:20,190 --> 00:20:23,080 That overrules the non-accepting state. 376 00:20:23,080 --> 00:20:25,410 And so the machine accepts. 377 00:20:25,410 --> 00:20:27,160 You have to understand this. 378 00:20:27,160 --> 00:20:29,880 So if you didn't get it right, go back 379 00:20:29,880 --> 00:20:32,560 and think about where you slipped up. 380 00:20:32,560 --> 00:20:33,060 OK? 381 00:20:33,060 --> 00:20:39,300 Because this is just getting-- we're 382 00:20:39,300 --> 00:20:41,850 just getting warmed up here. 383 00:20:41,850 --> 00:20:44,770 It's going to get a lot harder. 384 00:20:44,770 --> 00:20:48,580 OK, so stop sharing the results. 385 00:20:48,580 --> 00:20:51,400 And so let's continue. 
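[Editor's sketch] The walkthroughs above (ab, aa, aba, abb, and the check-in input aab) all follow one procedure: keep the set of states the machine could be in, advance every member on each symbol, and follow empty-string arrows for free. A minimal Python sketch of that procedure is below. The transition table is reconstructed from the spoken walkthrough rather than copied from the slide, so treat it as an assumption; the names EPS, DELTA, eps_closure, run and accepts are likewise illustrative.

```python
EPS = ""  # label used here for empty-string transitions (illustrative choice)

# Transition table for N1 as inferred from the walkthrough (assumption).
# A missing entry means "nowhere to go", so that branch dies.
DELTA = {
    ("q1", "a"): {"q1", "q2"},
    ("q2", "b"): {"q3"},
    ("q3", "a"): {"q4"},
    ("q3", EPS): {"q4"},
}
START = "q1"
ACCEPT = {"q4"}


def eps_closure(states):
    # Add every state reachable by following empty-string arrows without reading input.
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def run(w):
    # Return the set of states the machine could be in after reading all of w.
    current = eps_closure({START})
    for symbol in w:
        nxt = set()
        for q in current:
            nxt |= DELTA.get((q, symbol), set())
        current = eps_closure(nxt)
    return current


def accepts(w):
    # Acceptance overrules rejection: one accepting possibility is enough.
    return bool(run(w) & ACCEPT)


if __name__ == "__main__":
    for w in ["ab", "aa", "aba", "abb", "aab"]:
        print(w, sorted(run(w)), "accept" if accepts(w) else "reject")
```

With this table, the state sets match the ones named in the lecture: q3, q4 after ab and after aab, q1, q2 after aa, only q4 after aba, and no states at all after abb.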
386 00:20:51,400 --> 00:20:55,500 So just as we did last time, we can formally 387 00:20:55,500 --> 00:20:59,010 define a nondeterministic finite automaton. 388 00:20:59,010 --> 00:21:02,430 Here's the picture again. 389 00:21:02,430 --> 00:21:03,570 OK. 390 00:21:03,570 --> 00:21:07,410 So it looks a lot like the case we had before, 391 00:21:07,410 --> 00:21:10,680 the Deterministic Finite Automata, or DFA, 392 00:21:10,680 --> 00:21:12,120 as we'll call them. 393 00:21:12,120 --> 00:21:13,290 It's a 5-tuple. 394 00:21:13,290 --> 00:21:17,220 So I've written down little reminders 395 00:21:17,220 --> 00:21:19,290 for what those components of that 5-tuple 396 00:21:19,290 --> 00:21:24,220 are, that list of five components. 397 00:21:24,220 --> 00:21:26,580 So they're all the same as before-- 398 00:21:26,580 --> 00:21:30,480 states, alphabet, transition function, start state, 399 00:21:30,480 --> 00:21:31,740 and accepting states. 400 00:21:31,740 --> 00:21:34,560 So that the formal definition looks 401 00:21:34,560 --> 00:21:38,160 exactly the same except the structure of the transition 402 00:21:38,160 --> 00:21:40,290 function. 403 00:21:40,290 --> 00:21:43,010 So now, before, if you remember, you 404 00:21:43,010 --> 00:21:47,510 had a state and an input symbol, and you got back a state. 405 00:21:47,510 --> 00:21:49,880 Now we have something more complicated-looking. 406 00:21:49,880 --> 00:21:52,910 We have a state and an input symbol, 407 00:21:52,910 --> 00:21:56,210 but instead of just sigma, it's sigma sub epsilon. 408 00:21:56,210 --> 00:22:01,630 And that's a shorthand for sigma union epsilon. 409 00:22:01,630 --> 00:22:04,780 And that's a way-- my way of saying that you're 410 00:22:04,780 --> 00:22:07,780 allowed to have on your transition 411 00:22:07,780 --> 00:22:12,050 arrows either an input symbol or an empty string.
412 00:22:12,050 --> 00:22:14,200 So the transition function has to tell you 413 00:22:14,200 --> 00:22:17,870 what to do when you have an empty string coming in as well. 414 00:22:17,870 --> 00:22:20,320 So that would be part of your table for the transition 415 00:22:20,320 --> 00:22:21,050 function. 416 00:22:21,050 --> 00:22:23,470 Now, over here, what's going on over here? 417 00:22:23,470 --> 00:22:28,500 Well, now, instead of just producing a single state, when 418 00:22:28,500 --> 00:22:31,950 you've read, for example, an a from q1, 419 00:22:31,950 --> 00:22:34,570 there's a whole set of possibilities. 420 00:22:34,570 --> 00:22:38,710 So here we have what's called a power set. 421 00:22:38,710 --> 00:22:45,430 That's the set of subsets of the collection Q. 422 00:22:45,430 --> 00:22:49,570 So here we're going to produce an entire subset of states. 423 00:22:49,570 --> 00:22:51,610 Instead of just one state coming out, 424 00:22:51,610 --> 00:22:54,130 there might be a subset of possible states 425 00:22:54,130 --> 00:22:55,660 that you can go to. 426 00:22:55,660 --> 00:23:01,330 So the power set of Q is a set of subsets of Q. 427 00:23:01,330 --> 00:23:06,390 So that's what this notation means. 428 00:23:06,390 --> 00:23:10,430 Again, this is something that I'm, hopefully, presenting 429 00:23:10,430 --> 00:23:12,290 to you as a bit of a reminder. 430 00:23:12,290 --> 00:23:14,030 You've seen this somewhere else before. 431 00:23:14,030 --> 00:23:17,360 But please make sure you understand the notation, 432 00:23:17,360 --> 00:23:20,570 going forward, because we'll be doing less hand-holding 433 00:23:20,570 --> 00:23:23,960 as we start moving forward. 434 00:23:23,960 --> 00:23:25,190 OK. 435 00:23:25,190 --> 00:23:26,330 So just let's take a look. 
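[Editor's note] In symbols, the transition function described above can be written as follows. The letter delta for the transition function is an assumption here, since the spoken text only calls it "the transition function"; Q, sigma, and sigma sub epsilon are as named in the lecture.

```latex
\[
\delta \colon Q \times \Sigma_\varepsilon \to \mathcal{P}(Q),
\qquad
\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\},
\qquad
\mathcal{P}(Q) = \{\, R \mid R \subseteq Q \,\}
\]
```

For comparison, the deterministic version from the previous lecture takes a state and an input symbol to a single state, that is, a function from $Q \times \Sigma$ to $Q$.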
436 00:23:26,330 --> 00:23:31,340 In the N1 example here, just to illustrate what's going on, 437 00:23:31,340 --> 00:23:34,730 when you're in state q1 reading an a, 438 00:23:34,730 --> 00:23:36,950 now you get a whole set of possibilities, which, 439 00:23:36,950 --> 00:23:40,260 in this case, is q1 and q2. 440 00:23:40,260 --> 00:23:45,440 Whereas, if you're reading a b, what would be that set? 441 00:23:45,440 --> 00:23:49,730 Coming out of q1, what's the set of possible successor states? 442 00:23:49,730 --> 00:23:50,640 Well, there are none. 443 00:23:50,640 --> 00:23:53,140 So it's the empty set. 444 00:23:53,140 --> 00:23:53,640 OK? 445 00:23:53,640 --> 00:23:56,310 So hopefully, you're understanding the notation 446 00:23:56,310 --> 00:23:56,920 here. 447 00:23:56,920 --> 00:24:00,500 So now here's, I think, really important. 448 00:24:00,500 --> 00:24:03,310 How do we understand nondeterminism, intuitively 449 00:24:03,310 --> 00:24:04,060 speaking? 450 00:24:04,060 --> 00:24:06,280 And there are multiple different ways, 451 00:24:06,280 --> 00:24:11,550 which each has their value under different circumstances. 452 00:24:11,550 --> 00:24:15,200 So one way is thinking about nondeterminism 453 00:24:15,200 --> 00:24:17,190 as a kind of parallelism. 454 00:24:17,190 --> 00:24:19,970 So every time the machine has a nondeterministic choice 455 00:24:19,970 --> 00:24:22,250 to make, where there's more than one outcome, 456 00:24:22,250 --> 00:24:28,490 you think of the machine as a branching, forking, 457 00:24:28,490 --> 00:24:32,750 new threads of the parallel computation at that stage, 458 00:24:32,750 --> 00:24:35,360 where it makes an entire copy of itself when there's 459 00:24:35,360 --> 00:24:37,760 a choice of possibilities. 460 00:24:37,760 --> 00:24:40,790 And then each of those independently 461 00:24:40,790 --> 00:24:42,740 proceeds to read the rest of the input 462 00:24:42,740 --> 00:24:44,940 as separate threads of the computation. 
463 00:24:44,940 --> 00:24:47,030 So if you're familiar with parallel computing, 464 00:24:47,030 --> 00:24:49,200 this should be reasonably familiar to you. 465 00:24:49,200 --> 00:24:51,410 The only key thing to remember is 466 00:24:51,410 --> 00:24:55,550 that as this thing forks a number of possibilities, 467 00:24:55,550 --> 00:24:59,720 the acceptance rule is, that if any one of those possibilities 468 00:24:59,720 --> 00:25:03,200 gets to an accept at the end of the input, it raises a flag 469 00:25:03,200 --> 00:25:04,460 and says, accept. 470 00:25:04,460 --> 00:25:06,260 And that overrules everybody else. 471 00:25:09,040 --> 00:25:12,630 So acceptance dominates. 472 00:25:12,630 --> 00:25:16,730 So another way of looking at it is the mathematical view, 473 00:25:16,730 --> 00:25:19,160 where you can imagine-- and we're going to use all these. 474 00:25:19,160 --> 00:25:21,170 So you really need to understand them all. 475 00:25:21,170 --> 00:25:24,770 The mathematical view is you can think 476 00:25:24,770 --> 00:25:29,470 of the computation as kind of a tree of possibilities. 477 00:25:29,470 --> 00:25:31,320 So you start off at the very beginning 478 00:25:31,320 --> 00:25:34,920 at the root of the computation, which is when it really begins. 479 00:25:34,920 --> 00:25:37,830 But every time there's a nondeterministic branching 480 00:25:37,830 --> 00:25:44,370 that occurs, that node of the tree 481 00:25:44,370 --> 00:25:49,330 has multiple children coming out of that node. 482 00:25:49,330 --> 00:25:51,880 And so the different threads of the computation 483 00:25:51,880 --> 00:25:55,792 correspond to different branches of that tree. 484 00:25:55,792 --> 00:25:58,250 And now you're going to accept if any one of those branches 485 00:25:58,250 --> 00:26:00,580 leads to an accepting state-- 486 00:26:00,580 --> 00:26:04,370 OK, obviously, somewhat similar to what we had before. 
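The acceptance rule described above (accept if any thread ends in an accepting state) can be sketched in Python by tracking the set of states some thread could be in. This is only an illustrative sketch: the dictionary encoding, the names `EPS`, `eps_closure`, and `nfa_accepts`, and the state `q3` with its arrow are assumptions made for the example. Only the two facts about `q1` (on `a` it goes to `{q1, q2}`, on `b` it goes to the empty set) come from the lecture's N1 example.

```python
EPS = ""  # label assumed here for epsilon (empty-string) arrows


def eps_closure(states, delta):
    """Return all states reachable from `states` using only epsilon arrows."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()) - seen:
            seen.add(r)
            stack.append(r)
    return seen


def nfa_accepts(delta, start, accepting, w):
    """Follow every thread at once; accept if any thread ends in an accept state."""
    current = eps_closure({start}, delta)
    for symbol in w:
        step = set()
        for q in current:
            # A missing table entry means the empty set: that thread dies.
            step |= delta.get((q, symbol), set())
        current = eps_closure(step, delta)
    return bool(current & accepting)


# Hypothetical machine. Only the behavior of q1 is taken from the lecture.
delta = {
    ("q1", "a"): {"q1", "q2"},  # two possible successor states
    ("q2", "b"): {"q3"},        # assumed arrow, for illustration only
}
accepting = {"q3"}

print(nfa_accepts(delta, "q1", accepting, "aab"))  # True
print(nfa_accepts(delta, "q1", accepting, "b"))    # False: q1 has no b-successor
```

Representing a missing entry as the empty set mirrors the point made about reading a `b` from `q1`: there are no successor states, so that thread simply disappears.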
487 00:26:04,370 --> 00:26:09,100 But I think it's a little bit of a different perspective on how 488 00:26:09,100 --> 00:26:11,080 to think about nondeterminism. 489 00:26:11,080 --> 00:26:13,730 And the last one is going to sound a little weird. 490 00:26:13,730 --> 00:26:18,880 But actually, I think for people who are in the business, 491 00:26:18,880 --> 00:26:21,210 it's the one they use the most. 492 00:26:21,210 --> 00:26:27,630 And that's the magical way of thinking about nondeterminism. 493 00:26:27,630 --> 00:26:33,630 And that is, when the machine has nondeterministic choices 494 00:26:33,630 --> 00:26:36,320 to make, you think of the machine 495 00:26:36,320 --> 00:26:41,510 as magically guessing the correct one at every stage, 496 00:26:41,510 --> 00:26:44,600 and the correct one being the one that will eventually 497 00:26:44,600 --> 00:26:47,730 lead it to accept. 498 00:26:47,730 --> 00:26:48,230 OK? 499 00:26:48,230 --> 00:26:51,710 So you can think of the machine as guessing 500 00:26:51,710 --> 00:26:54,510 which is the right way to go. 501 00:26:54,510 --> 00:26:57,080 And if there is some right way to go, 502 00:26:57,080 --> 00:26:58,080 it always guesses right. 503 00:26:58,080 --> 00:27:00,090 Of course, if the machine ends up rejecting, 504 00:27:00,090 --> 00:27:02,590 because there is no right way to go, then it doesn't matter. 505 00:27:02,590 --> 00:27:03,990 There is no good guess. 506 00:27:03,990 --> 00:27:06,600 But if there is some good guess, we'll think of the machine 507 00:27:06,600 --> 00:27:09,810 as taking that good guess and going that way. 508 00:27:09,810 --> 00:27:11,060 OK. 509 00:27:11,060 --> 00:27:15,080 So now here is a very important thing. 510 00:27:15,080 --> 00:27:20,060 We introduced this new model, the Nondeterministic Finite 511 00:27:20,060 --> 00:27:22,460 Automaton, NFA.
512 00:27:22,460 --> 00:27:25,760 It turns out, even though it looks more powerful, 513 00:27:25,760 --> 00:27:27,590 because it has this nondeterminism, 514 00:27:27,590 --> 00:27:28,820 it isn't any more powerful. 515 00:27:28,820 --> 00:27:31,460 It can do exactly the same class of languages, 516 00:27:31,460 --> 00:27:32,810 the regular languages. 517 00:27:32,810 --> 00:27:36,350 And we'll show that with this theorem 518 00:27:36,350 --> 00:27:43,620 here, that if an NFA recognizes A, then A is regular. 519 00:27:43,620 --> 00:27:47,040 So we'll prove that by showing how 520 00:27:47,040 --> 00:27:51,150 to convert an NFA to an equivalent DFA, which 521 00:27:51,150 --> 00:27:52,540 does the same language. 522 00:27:52,540 --> 00:27:55,350 So we can take an NFA that has the nondeterminism 523 00:27:55,350 --> 00:27:58,770 and find another DFA which doesn't have nondeterminism, 524 00:27:58,770 --> 00:28:00,180 but does the same language. 525 00:28:00,180 --> 00:28:03,040 It accepts exactly the same strings, 526 00:28:03,040 --> 00:28:06,670 even though it lacks that nondeterministic capability. 527 00:28:06,670 --> 00:28:09,110 This is going to be extremely useful, by the way, 528 00:28:09,110 --> 00:28:13,990 for example, in showing closure under concatenation. 529 00:28:13,990 --> 00:28:19,750 OK, so in this presentation here, 530 00:28:19,750 --> 00:28:22,178 I'm going to ignore the epsilon transitions. 531 00:28:22,178 --> 00:28:24,220 Because once you get the idea for how to do this, 532 00:28:24,220 --> 00:28:26,200 you could figure out how to incorporate them. 533 00:28:26,200 --> 00:28:28,370 They just make things a little more complicated. 534 00:28:28,370 --> 00:28:33,010 So let's just focus on the key aspect of nondeterminism, which 535 00:28:33,010 --> 00:28:35,650 is that the machine could have several ways to go 536 00:28:35,650 --> 00:28:37,430 at any point in time.
537 00:28:37,430 --> 00:28:42,110 There could be several next states on an input. 538 00:28:42,110 --> 00:28:42,800 OK? 539 00:28:42,800 --> 00:28:45,590 Now the idea for the construction-- 540 00:28:45,590 --> 00:28:50,060 so we're going to start with a nondeterministic machine M, 541 00:28:50,060 --> 00:28:52,490 and we're going to build a deterministic machine 542 00:28:52,490 --> 00:28:55,910 M prime, which does exactly the same thing. 543 00:28:55,910 --> 00:28:58,040 And the way M prime works is it's 544 00:28:58,040 --> 00:29:01,830 going to do what you would do if you were simulating 545 00:29:01,830 --> 00:29:04,190 M. What would you do? 546 00:29:04,190 --> 00:29:08,060 This is what we were doing as I was explaining it to you. 547 00:29:08,060 --> 00:29:11,750 If you were simulating M, every time you get an input symbol, 548 00:29:11,750 --> 00:29:14,840 you just keep track of what is the set of possible states 549 00:29:14,840 --> 00:29:16,920 at that point in time. 550 00:29:16,920 --> 00:29:18,650 That's what the DFA is going to do. 551 00:29:21,920 --> 00:29:26,600 It's going to have to keep track of which possible set of states 552 00:29:26,600 --> 00:29:32,690 the NFA could be in at the point on that input 553 00:29:32,690 --> 00:29:34,830 where we are right now. 554 00:29:34,830 --> 00:29:39,470 And then as you get to the next symbol, 555 00:29:39,470 --> 00:29:42,470 the DFA is going to have to update things to keep track 556 00:29:42,470 --> 00:29:47,230 of the next set of states the NFA could be in at this point, 557 00:29:47,230 --> 00:29:49,610 just like you would do. 558 00:29:49,610 --> 00:29:50,110 OK? 559 00:29:50,110 --> 00:29:55,630 And so here's a kind of a picture. 560 00:29:55,630 --> 00:29:59,960 And how do we implement that? 561 00:29:59,960 --> 00:30:02,230 So here's the NFA that we're starting with, M, 562 00:30:02,230 --> 00:30:04,960 and we're going to make here the DFA.
563 00:30:04,960 --> 00:30:11,510 But in order to remember which set of states that DFA could 564 00:30:11,510 --> 00:30:14,390 be in at a given point-- so maybe it's in the set of states 565 00:30:14,390 --> 00:30:16,838 that M could be in. 566 00:30:16,838 --> 00:30:17,630 Did I say it wrong? 567 00:30:17,630 --> 00:30:19,950 Which set of states the NFA could be in a given time-- 568 00:30:19,950 --> 00:30:22,640 so maybe M, the NFA, could be in, 569 00:30:22,640 --> 00:30:25,970 at some point, state q3 and q7. 570 00:30:25,970 --> 00:30:27,890 The way the DFA keeps track of that, 571 00:30:27,890 --> 00:30:32,840 it's going to have a state for every possible subset of states 572 00:30:32,840 --> 00:30:35,080 of the NFA. 573 00:30:35,080 --> 00:30:39,820 That's how it remembers which subset of states the NFA is in. 574 00:30:39,820 --> 00:30:41,320 That's the way DFAs work. 575 00:30:41,320 --> 00:30:44,890 They have a separate state for each possibility 576 00:30:44,890 --> 00:30:46,570 that they need to keep track of. 577 00:30:46,570 --> 00:30:49,480 And the possibilities here are the different subsets 578 00:30:49,480 --> 00:30:53,570 of states that the NFA could be in at a given point. 579 00:30:53,570 --> 00:30:54,070 OK? 580 00:30:54,070 --> 00:30:58,330 So corresponding to this subset, to these two possibilities 581 00:30:58,330 --> 00:31:04,420 q3, q7, the DFA is going to have a state with the subset q3, q7. 582 00:31:04,420 --> 00:31:06,280 And it's going to, for every possible subset 583 00:31:06,280 --> 00:31:09,315 here, there's going to be a different state of M prime. 584 00:31:09,315 --> 00:31:10,690 So M prime is going to be bigger. 585 00:31:13,420 --> 00:31:13,990 OK. 586 00:31:13,990 --> 00:31:20,230 So quickly, the construction of M, the states of M prime now, 587 00:31:20,230 --> 00:31:24,520 q prime, are going to be the power set, the set of subsets 588 00:31:24,520 --> 00:31:30,010 of states from the original machine M. 
589 00:31:30,010 --> 00:31:38,380 And now we have to look at how the transition function 590 00:31:38,380 --> 00:31:41,620 of the DFA, when you made the primed machines 591 00:31:41,620 --> 00:31:43,910 of the DFAs, the DFA machine. 592 00:31:43,910 --> 00:31:46,460 So these are the deterministic components. 593 00:31:46,460 --> 00:31:56,980 So delta prime, when it has a subset, something like this, 594 00:31:56,980 --> 00:31:58,990 has one of its states, which corresponds 595 00:31:58,990 --> 00:32:03,220 to a subset of states of M, and it reads an input symbol, 596 00:32:03,220 --> 00:32:06,190 you just have to do the updating the way you would naturally do. 597 00:32:06,190 --> 00:32:08,920 You're going to look at every state in R, 598 00:32:08,920 --> 00:32:11,200 look at where that can go under a-- 599 00:32:11,200 --> 00:32:12,790 so there's a bunch of sets there. 600 00:32:12,790 --> 00:32:15,670 And look at all the possible states 601 00:32:15,670 --> 00:32:17,500 that could be in one of those subsets, 602 00:32:17,500 --> 00:32:20,532 and that's the set of states that you could be. 603 00:32:20,532 --> 00:32:22,240 That's going to be the new set of states, 604 00:32:22,240 --> 00:32:26,040 and that's going to be in the new state of M prime. 605 00:32:26,040 --> 00:32:26,540 OK? 606 00:32:26,540 --> 00:32:30,170 So it's going to be the subset corresponding to all 607 00:32:30,170 --> 00:32:35,360 of the states that could be in, when you apply the transition 608 00:32:35,360 --> 00:32:39,140 function of the nondeterministic machine, 609 00:32:39,140 --> 00:32:42,530 to one of the states in the subset of states 610 00:32:42,530 --> 00:32:44,750 that the nondeterministic machine could be in. 611 00:32:44,750 --> 00:32:45,980 OK? 612 00:32:45,980 --> 00:32:49,322 It's a little bit of a mouthful. 613 00:32:49,322 --> 00:32:51,280 I suggest you look at this, if you didn't quite 614 00:32:51,280 --> 00:32:53,380 get it, after the fact. 
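Written as a formula, the update rule just described looks like this. The name R for a subset of M's states follows the lecture; epsilon transitions are ignored here, as they are in this part of the presentation.

```latex
\delta'(R, a) \;=\; \{\, q \in Q \mid q \in \delta(r, a) \text{ for some } r \in R \,\}
\;=\; \bigcup_{r \in R} \delta(r, a),
\qquad R \subseteq Q,\ a \in \Sigma
```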
615 00:32:53,380 --> 00:32:55,390 Good to understand. 616 00:32:55,390 --> 00:32:58,090 The starting state for the NFA-- 617 00:32:58,090 --> 00:32:59,680 for the DFA-- I'm sorry-- is going 618 00:32:59,680 --> 00:33:02,140 to be which subset now we're going to start off with. 619 00:33:02,140 --> 00:33:04,840 It's going to be the subset corresponding to just the start 620 00:33:04,840 --> 00:33:15,380 state of M. And the accepting states are going to be-- 621 00:33:15,380 --> 00:33:18,650 of the deterministic machine are going to be all of the subsets 622 00:33:18,650 --> 00:33:24,000 that have at least one accepting state from the NFA. 623 00:33:24,000 --> 00:33:24,900 OK? 624 00:33:24,900 --> 00:33:25,860 So I hope you got that. 625 00:33:25,860 --> 00:33:28,840 Because I'm going to give you another little check-in here. 626 00:33:28,840 --> 00:33:31,530 Which is I'm going to ask you, how big is M prime? 627 00:33:31,530 --> 00:33:33,570 How many states does M prime have? 628 00:33:33,570 --> 00:33:35,370 I told you what those states are. 629 00:33:35,370 --> 00:33:37,200 So just go think about that. 630 00:33:40,780 --> 00:33:43,240 So check-in two-- if M has n states, 631 00:33:43,240 --> 00:33:47,270 how many states does M prime have by this construction? 632 00:33:47,270 --> 00:33:48,830 OK, so let's launch the next poll. 633 00:33:58,710 --> 00:34:03,160 OK, five seconds-- and I think we're almost done here. 634 00:34:03,160 --> 00:34:03,660 Good. 635 00:34:06,540 --> 00:34:09,670 All right, share results-- 636 00:34:09,670 --> 00:34:11,670 I don't know if sharing results is a good thing. 637 00:34:11,670 --> 00:34:13,190 I'm not trying to make you, if you 638 00:34:13,190 --> 00:34:14,531 didn't get the right answer-- 639 00:34:14,531 --> 00:34:16,739 because most of the people did get the right answer-- 640 00:34:16,739 --> 00:34:18,020 but if you didn't get the right answer, trying to make you 641 00:34:18,020 --> 00:34:18,530 feel bad.
642 00:34:18,530 --> 00:34:21,830 But it's a little bit of a suggestion 643 00:34:21,830 --> 00:34:24,480 that you need to review some basic concepts. 644 00:34:24,480 --> 00:34:26,270 So the basic concept here is if you 645 00:34:26,270 --> 00:34:28,610 have a collection-- you have a set of states, 646 00:34:28,610 --> 00:34:31,159 how many subsets are there? 647 00:34:31,159 --> 00:34:33,989 And the number of subsets is going to be exponential. 648 00:34:33,989 --> 00:34:36,739 So if you have a collection of n elements, 649 00:34:36,739 --> 00:34:40,550 the number of subsets of those n elements is 2 to the n. 650 00:34:40,550 --> 00:34:42,150 That's the fact we're using here. 651 00:34:42,150 --> 00:34:45,679 And that's why M prime has 2 to the n states, 652 00:34:45,679 --> 00:34:49,040 if M had n states. 653 00:34:49,040 --> 00:34:53,659 And you should make sure you understand why that is. 654 00:34:53,659 --> 00:34:59,230 All right, so with that, as requested, 655 00:34:59,230 --> 00:35:01,570 we're going to have a little break. 656 00:35:01,570 --> 00:35:07,845 And that break is going to last us exactly five minutes. 657 00:35:15,060 --> 00:35:17,910 So we will return in five minutes. 658 00:35:17,910 --> 00:35:19,350 I'm going to be prompt. 659 00:35:19,350 --> 00:35:21,070 So I gave you a little timer here. 660 00:35:21,070 --> 00:35:29,220 So please, I'm going to begin it right when this is over. 661 00:35:35,280 --> 00:35:38,705 OK, almost ready. 662 00:35:53,040 --> 00:35:57,770 I hope you're all refreshed and ready for the second half. 663 00:35:57,770 --> 00:36:00,830 So now that we have nondeterminism, 664 00:36:00,830 --> 00:36:05,030 we're going to use that as a tool to prove the closure 665 00:36:05,030 --> 00:36:08,270 properties that we were aiming for, 666 00:36:08,270 --> 00:36:10,310 starting from last lecture. 667 00:36:10,310 --> 00:36:11,870 OK. 668 00:36:11,870 --> 00:36:17,270 So remember, let's look at closure under union.
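As a recap of the NFA-to-DFA conversion and the 2 to the n count from before the break, here is a minimal Python sketch of the subset construction. Like the lecture's presentation, it ignores epsilon transitions. The function names, the dictionary encoding, and the three-state example machine are assumptions made for illustration, not something shown on the slides.

```python
from itertools import combinations


def powerset(states):
    """All subsets of `states`, each as a frozenset (2**n of them)."""
    items = sorted(states)
    return [frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)]


def nfa_to_dfa(states, alphabet, delta, start, accepting):
    """Subset construction for an NFA without epsilon arrows."""
    dfa_states = powerset(states)
    dfa_delta = {}
    for R in dfa_states:
        for a in alphabet:
            target = set()
            for r in R:
                # Union of everywhere some state in R can go on symbol a.
                target |= delta.get((r, a), set())
            dfa_delta[(R, a)] = frozenset(target)
    dfa_start = frozenset({start})
    # A subset accepts if it contains at least one accepting NFA state.
    dfa_accepting = {R for R in dfa_states if R & accepting}
    return dfa_states, dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa_delta, dfa_start, dfa_accepting, w):
    """Run the deterministic machine: exactly one current state at a time."""
    state = dfa_start
    for symbol in w:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting


# Hypothetical 3-state NFA, used only to exercise the construction.
nfa_states = {"q1", "q2", "q3"}
nfa_delta = {("q1", "a"): {"q1", "q2"}, ("q2", "b"): {"q3"}}
dfa_states, dfa_delta, dfa_start, dfa_accepting = nfa_to_dfa(
    nfa_states, "ab", nfa_delta, "q1", {"q3"})

print(len(dfa_states))  # 8, that is 2**3
print(dfa_accepts(dfa_delta, dfa_start, dfa_accepting, "aab"))  # True
```

This version builds every subset, reachable or not, which matches the count in the check-in. A practical implementation would usually generate only the subsets reachable from the start subset.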
669 00:36:17,270 --> 00:36:19,220 Now, we already did that, but I'm 670 00:36:19,220 --> 00:36:22,370 going to do it again, but this time, using nondeterminism. 671 00:36:22,370 --> 00:36:28,280 And you'll see how powerful nondeterminism is. 672 00:36:28,280 --> 00:36:36,870 Because it's going to allow us to do it almost with no effort. 673 00:36:36,870 --> 00:36:39,690 We'll start off the way we did before. 674 00:36:39,690 --> 00:36:44,390 I'm going to start off with two DFAs. 675 00:36:44,390 --> 00:36:48,350 But actually, these could be NFAs even. 676 00:36:48,350 --> 00:36:50,960 But let's say we started with the two DFAs 677 00:36:50,960 --> 00:36:53,270 for the two languages A1 and A2. 678 00:36:53,270 --> 00:37:00,690 And now we're going to construct an NFA, recognizing the union. 679 00:37:00,690 --> 00:37:02,440 And that's good enough, because we already 680 00:37:02,440 --> 00:37:05,110 know that we can convert NFAs to DFAs. 681 00:37:05,110 --> 00:37:10,260 And therefore, they do regular languages, too. 682 00:37:10,260 --> 00:37:11,100 OK. 683 00:37:11,100 --> 00:37:14,670 So now here are the two DFAs that 684 00:37:14,670 --> 00:37:17,520 do the languages A1 and A2. 685 00:37:17,520 --> 00:37:20,970 And what I'm going to do is I'm going to put them together 686 00:37:20,970 --> 00:37:28,800 into a bag of states, which is going to be M, the NFA that's 687 00:37:28,800 --> 00:37:30,840 going to do the union language. 688 00:37:30,840 --> 00:37:34,260 So remember-- what is M supposed to do? 689 00:37:34,260 --> 00:37:36,960 M is supposed to accept its input, 690 00:37:36,960 --> 00:37:39,780 if either M1 or M2 accepts. 691 00:37:39,780 --> 00:37:41,220 So how is it going to do that? 692 00:37:41,220 --> 00:37:42,637 What it's going to do, we're going 693 00:37:42,637 --> 00:37:46,410 to add a new state to M, which is going to branch 694 00:37:46,410 --> 00:37:48,010 under epsilon transitions.
695 00:37:48,010 --> 00:37:50,220 And now you can start to see how useful these epsilon 696 00:37:50,220 --> 00:37:52,050 transitions are going to be for us. 697 00:37:52,050 --> 00:37:54,720 Going to branch under epsilon transitions 698 00:37:54,720 --> 00:37:57,870 to the two original start states of M1 and M2. 699 00:37:57,870 --> 00:37:59,470 And we're done. 700 00:37:59,470 --> 00:38:01,750 Why? 701 00:38:01,750 --> 00:38:05,230 Well, now, nondeterministically, as we get an input, 702 00:38:05,230 --> 00:38:09,160 w coming in to M-- 703 00:38:09,160 --> 00:38:12,250 and at the very beginning, even just right after it gets going, 704 00:38:12,250 --> 00:38:13,750 the very first thing that happens 705 00:38:13,750 --> 00:38:15,970 is it's going to branch to M1 and also 706 00:38:15,970 --> 00:38:17,860 branch to M2 nondeterministically 707 00:38:17,860 --> 00:38:19,390 as two possibilities. 708 00:38:19,390 --> 00:38:22,630 And then inside M1 and M2, it's going to actually start 709 00:38:22,630 --> 00:38:23,740 reading the input. 710 00:38:23,740 --> 00:38:28,600 And each one is going to be now following along 711 00:38:28,600 --> 00:38:35,140 as it would have originally the states corresponding to reading 712 00:38:35,140 --> 00:38:37,690 those input symbols. 713 00:38:37,690 --> 00:38:42,160 And M, as a combination of M1 and M2, 714 00:38:42,160 --> 00:38:45,700 is going to have a possibility for one state in M1 715 00:38:45,700 --> 00:38:49,210 and one state in M2. 716 00:38:49,210 --> 00:38:55,280 And so M is going to have those combined into one package. 717 00:38:55,280 --> 00:39:00,580 And now at the end of the input, if either of these end 718 00:39:00,580 --> 00:39:06,670 up at an accepting state, then M is 719 00:39:06,670 --> 00:39:09,070 going to accept as a nondeterministic finite 720 00:39:09,070 --> 00:39:09,670 automaton. 721 00:39:09,670 --> 00:39:11,620 Because that's how nondeterminism works. 
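The union construction just described (a new start state with epsilon arrows to the two old start states) can be sketched in Python as follows. The tuple encoding `(delta, start, accepting)`, the helper `accepts`, the state name `s0`, and the two small example machines are assumptions chosen for illustration. The sketch also assumes the two machines use disjoint state names.

```python
EPS = ""  # label assumed here for epsilon (empty-string) arrows


def accepts(delta, start, accepting, w):
    """Simulate an NFA by tracking the set of states it could be in."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, EPS), set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen

    current = closure({start})
    for symbol in w:
        step = set()
        for q in current:
            step |= delta.get((q, symbol), set())
        current = closure(step)
    return bool(current & accepting)


def union_nfa(m1, m2, new_start="s0"):
    """New start state that branches on epsilon to both old start states."""
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    delta = {**d1, **d2}               # put both machines in one bag of states
    delta[(new_start, EPS)] = {s1, s2}
    return delta, new_start, f1 | f2   # accept if either machine accepts


# Example machines (hypothetical): M1 accepts strings ending in "a",
# M2 accepts strings of even length.
M1 = ({("p0", "a"): {"p1"}, ("p0", "b"): {"p0"},
       ("p1", "a"): {"p1"}, ("p1", "b"): {"p0"}}, "p0", {"p1"})
M2 = ({("r0", "a"): {"r1"}, ("r0", "b"): {"r1"},
       ("r1", "a"): {"r0"}, ("r1", "b"): {"r0"}}, "r0", {"r0"})

M = union_nfa(M1, M2)
print(accepts(*M, "a"))   # True: ends in "a"
print(accepts(*M, "bb"))  # True: even length
print(accepts(*M, "b"))   # False: neither machine accepts
```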
722 00:39:11,620 --> 00:39:13,270 You accept if either-- 723 00:39:13,270 --> 00:39:15,010 if any one of the branches ended up 724 00:39:15,010 --> 00:39:19,198 accepting-- which is just what you need for union. 725 00:39:19,198 --> 00:39:21,490 So when we're doing union, you want either one of these 726 00:39:21,490 --> 00:39:22,480 to be accepting. 727 00:39:22,480 --> 00:39:26,050 And the nondeterminism just is built 728 00:39:26,050 --> 00:39:32,480 conveniently to allow us to do the union almost for free. 729 00:39:32,480 --> 00:39:35,660 So again, thinking about nondeterminism 730 00:39:35,660 --> 00:39:38,730 in terms of parallelism, you could 731 00:39:38,730 --> 00:39:40,830 think of the nondeterministic machine 732 00:39:40,830 --> 00:39:46,150 as running in parallel M1 and M2 on the input. 733 00:39:46,150 --> 00:39:49,650 And if either one of them ends up accepting, M will accept. 734 00:39:49,650 --> 00:39:51,900 Or you can think about it in terms 735 00:39:51,900 --> 00:39:55,570 of that guessing that I referred to before, 736 00:39:55,570 --> 00:39:58,500 which means that as M is getting-- 737 00:39:58,500 --> 00:40:02,040 when it's just about to read the first symbols of its input, 738 00:40:02,040 --> 00:40:07,650 it guesses whether that's going to be an input accepted by M1 739 00:40:07,650 --> 00:40:09,780 or an input accepted by M2. 740 00:40:09,780 --> 00:40:12,030 And the magic of nondeterminism is 741 00:40:12,030 --> 00:40:14,070 that it always guesses right. 742 00:40:14,070 --> 00:40:16,800 So if that input happens to be an input that's 743 00:40:16,800 --> 00:40:19,860 going to be accepted by M2, 744 00:40:19,860 --> 00:40:23,700 M is going to guess that M2 is the right way to follow. 745 00:40:23,700 --> 00:40:26,820 And it's going to go in the M2 direction. 746 00:40:26,820 --> 00:40:29,340 Because nondeterminism, the magic 747 00:40:29,340 --> 00:40:31,245 is you always guess right.
748 00:40:31,245 --> 00:40:34,076 I wish that was true in real life. 749 00:40:34,076 --> 00:40:37,210 It would make exams a lot easier. 750 00:40:37,210 --> 00:40:41,080 Anyway, so now let's see how we can use that to do closure 751 00:40:41,080 --> 00:40:44,380 under concatenation. 752 00:40:44,380 --> 00:40:46,750 OK, so now we're going to actually 753 00:40:46,750 --> 00:40:49,510 have a picture very similar to the one we had originally. 754 00:40:49,510 --> 00:40:53,030 But now using nondeterminism, we can make it work. 755 00:40:53,030 --> 00:40:57,160 So here we have the two machines doing the two languages, A1 756 00:40:57,160 --> 00:40:58,900 and A2. 757 00:40:58,900 --> 00:41:03,760 And we're going to combine them into one bigger machine M, 758 00:41:03,760 --> 00:41:06,040 as shown. 759 00:41:06,040 --> 00:41:08,010 Remember, what M is supposed to do 760 00:41:08,010 --> 00:41:10,860 is accept its input, if there's some way of splitting 761 00:41:10,860 --> 00:41:14,820 that input, such that the first part is accepted by M1, 762 00:41:14,820 --> 00:41:18,591 and the second part is accepted by M2. 763 00:41:18,591 --> 00:41:21,230 The way we're going to get that effect is 764 00:41:21,230 --> 00:41:29,420 by putting in a transit empty-- 765 00:41:29,420 --> 00:41:32,450 empty transitions, epsilon transitions, 766 00:41:32,450 --> 00:41:38,750 going from the accept states of M1 to the start state of M2, 767 00:41:38,750 --> 00:41:40,450 just as I've shown in this diagram. 768 00:41:40,450 --> 00:41:44,270 So these were the original accepting states of M1. 769 00:41:44,270 --> 00:41:48,170 And now they're going to be declassified 770 00:41:48,170 --> 00:41:49,370 as accepting states. 771 00:41:49,370 --> 00:41:55,100 But they're going to have new transitions, empty transitions, 772 00:41:55,100 --> 00:41:56,780 attached to them, which allow them 773 00:41:56,780 --> 00:42:00,470 to branch to M2 without reading any input.
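The concatenation construction just described can be sketched in Python like this: epsilon arrows run from M1's accepting states to M2's start state, and only M2's accepting states stay accepting. The tuple encoding `(delta, start, accepting)`, the helper `accepts`, and the two example machines are assumptions for illustration, with disjoint state names assumed as well.

```python
EPS = ""  # label assumed here for epsilon (empty-string) arrows


def accepts(delta, start, accepting, w):
    """Simulate an NFA by tracking the set of states it could be in."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, EPS), set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen

    current = closure({start})
    for symbol in w:
        step = set()
        for q in current:
            step |= delta.get((q, symbol), set())
        current = closure(step)
    return bool(current & accepting)


def concat_nfa(m1, m2):
    """Epsilon arrows from M1's accept states into M2's start state."""
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    delta = {k: set(v) for k, v in {**d1, **d2}.items()}  # copy, don't mutate
    for q in f1:
        delta.setdefault((q, EPS), set()).add(s2)
    # M1's accept states are "declassified": only M2's remain accepting.
    return delta, s1, set(f2)


# Example machines (hypothetical): M1 accepts strings ending in "a",
# M2 accepts strings of even length.
M1 = ({("p0", "a"): {"p1"}, ("p0", "b"): {"p0"},
       ("p1", "a"): {"p1"}, ("p1", "b"): {"p0"}}, "p0", {"p1"})
M2 = ({("r0", "a"): {"r1"}, ("r0", "b"): {"r1"},
       ("r1", "a"): {"r0"}, ("r1", "b"): {"r0"}}, "r0", {"r0"})

M = concat_nfa(M1, M2)
print(accepts(*M, "aab"))  # True: split as "a" + "ab"
print(accepts(*M, "ab"))   # False: no split works
```

In the first example, cutting after the second "a" fails (the leftover "b" has odd length) while cutting after the first "a" works. The simulation keeps both possibilities alive, which is the "guess the split point" idea in set form.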
774 00:42:03,420 --> 00:42:05,070 And so intuitively speaking, this 775 00:42:05,070 --> 00:42:07,170 is going to do the right thing. 776 00:42:07,170 --> 00:42:13,710 Because once M1 has accepted some part of w, 777 00:42:13,710 --> 00:42:17,880 then you can nondeterministically 778 00:42:17,880 --> 00:42:20,560 branch to M2. 779 00:42:24,240 --> 00:42:28,380 And you're going to start processing inside M2. 780 00:42:28,380 --> 00:42:30,360 And the point is-- 781 00:42:30,360 --> 00:42:33,300 I jumped ahead of myself-- 782 00:42:33,300 --> 00:42:37,380 is that the reason why it fixes the problem we had before is 783 00:42:37,380 --> 00:42:40,530 that the epsilon transitions don't-- 784 00:42:40,530 --> 00:42:43,500 the machine does not have to take that. 785 00:42:43,500 --> 00:42:49,110 It can stay where it is as one nondeterministic option, 786 00:42:49,110 --> 00:42:51,390 or it can move along the epsilon transition, 787 00:42:51,390 --> 00:42:54,210 without reading any input, as another nondeterministic 788 00:42:54,210 --> 00:42:56,500 option. 789 00:42:56,500 --> 00:42:58,360 So it's using this nondeterminism now 790 00:42:58,360 --> 00:43:03,130 to both stay in M1 to continue reading more of the input 791 00:43:03,130 --> 00:43:07,480 and to jump into M2 to start processing what 792 00:43:07,480 --> 00:43:11,950 might be the second half or the second part of the input which 793 00:43:11,950 --> 00:43:13,940 M2 accepts. 794 00:43:13,940 --> 00:43:16,220 And you can think of it in terms of the guessing 795 00:43:16,220 --> 00:43:21,360 as that the machine is guessing where to make that split. 796 00:43:21,360 --> 00:43:25,290 Once it found an initial part that's accepted by M1, 797 00:43:25,290 --> 00:43:27,870 it guesses that this is the right split point. 798 00:43:27,870 --> 00:43:29,610 And that passes to M2.
799 00:43:29,610 --> 00:43:31,860 But there might be other guesses that it 800 00:43:31,860 --> 00:43:36,930 could make corresponding to other possibilities. 801 00:43:36,930 --> 00:43:41,680 And so with nondeterminism, it always guesses right. 802 00:43:41,680 --> 00:43:44,770 If there is some way to split the string into two 803 00:43:44,770 --> 00:43:48,880 parts accepted by M1 and M2, the machine 804 00:43:48,880 --> 00:43:50,810 will make that good guess. 805 00:43:50,810 --> 00:43:55,160 And then M1 will accept the first part, 806 00:43:55,160 --> 00:43:57,620 and M2 will accept the second part. 807 00:43:57,620 --> 00:44:01,760 And we'll get M accepting that whole string altogether. 808 00:44:01,760 --> 00:44:06,760 And so that is the solution to our puzzle 809 00:44:06,760 --> 00:44:10,390 for how do we do closure under concatenation. 810 00:44:10,390 --> 00:44:11,660 OK, I hope that came through. 811 00:44:11,660 --> 00:44:17,090 Because we're just getting going with nondeterminism. 812 00:44:17,090 --> 00:44:19,160 We're going to be using nondeterminism a lot, 813 00:44:19,160 --> 00:44:24,410 and you're going to need to get very comfortable with it. 814 00:44:24,410 --> 00:44:26,090 OK? 815 00:44:26,090 --> 00:44:29,210 Now let's do closure under star. 816 00:44:29,210 --> 00:44:34,075 And closure under star works very similarly, 817 00:44:34,075 --> 00:44:36,200 but now we're just going to have a single language. 818 00:44:36,200 --> 00:44:37,982 If A is regular, so is A star. 819 00:44:37,982 --> 00:44:39,440 So there's not a pair of languages, 820 00:44:39,440 --> 00:44:43,880 because star is a unary operation applying 821 00:44:43,880 --> 00:44:47,640 to just a single language. 822 00:44:47,640 --> 00:44:50,690 So if we have a DFA recognizing A, in order 823 00:44:50,690 --> 00:44:53,630 to show that A star is regular, we 824 00:44:53,630 --> 00:44:56,673 have to construct a machine that recognizes A star.
825 00:44:56,673 --> 00:44:58,340 And the machine we're going to construct 826 00:44:58,340 --> 00:45:01,650 is, as before, an NFA. 827 00:45:01,650 --> 00:45:02,370 OK? 828 00:45:02,370 --> 00:45:09,990 So here is M, the DFA for A. And we're 829 00:45:09,990 --> 00:45:15,630 going to build an NFA M prime that recognizes A star. 830 00:45:15,630 --> 00:45:21,520 And let's think now, what does it mean to recognize A star? 831 00:45:21,520 --> 00:45:24,510 So if I'm going to give you an input, 832 00:45:24,510 --> 00:45:26,640 when is it in the star language? 833 00:45:26,640 --> 00:45:30,150 What does M prime have to do? 834 00:45:30,150 --> 00:45:32,220 So remember what star is. 835 00:45:32,220 --> 00:45:34,740 Star means you can take as many copies 836 00:45:34,740 --> 00:45:37,110 as you like of strings in the original language, 837 00:45:37,110 --> 00:45:38,550 and that's in the star language. 838 00:45:38,550 --> 00:45:41,700 So to determine if something is in the star language, 839 00:45:41,700 --> 00:45:43,890 you have to see, can I break it up 840 00:45:43,890 --> 00:45:46,530 into pieces which are all in the original language? 841 00:45:52,960 --> 00:45:57,400 So you want to see, can I take my input w and cut it up 842 00:45:57,400 --> 00:46:00,610 into a bunch of pieces-- four, in this case-- 843 00:46:00,610 --> 00:46:03,130 where each of those pieces are members 844 00:46:03,130 --> 00:46:06,310 of A, the members of the original language? 845 00:46:06,310 --> 00:46:10,080 So that's what M prime's job is. 846 00:46:10,080 --> 00:46:12,310 It has its input and wants to know, 847 00:46:12,310 --> 00:46:15,270 can I cut that input up into pieces, each of which 848 00:46:15,270 --> 00:46:19,220 are accepted by the original machine M? 849 00:46:19,220 --> 00:46:21,430 That's what M prime does.
850 00:46:21,430 --> 00:46:26,840 And if you think about it a little bit, 851 00:46:26,840 --> 00:46:30,980 really what's happening is that as soon as M-- 852 00:46:30,980 --> 00:46:32,738 so M prime is going to be simulating 853 00:46:32,738 --> 00:46:35,030 M. That's the way I like to think about this, as having 854 00:46:35,030 --> 00:46:36,830 M inside. 855 00:46:36,830 --> 00:46:39,020 So if you were going to be doing this yourself, 856 00:46:39,020 --> 00:46:40,292 you're going to take w. 857 00:46:40,292 --> 00:46:41,750 You're going to run it for a while. 858 00:46:41,750 --> 00:46:43,910 You'll see, oh, M has accepted. 859 00:46:43,910 --> 00:46:46,960 Now I have to start it over again 860 00:46:46,960 --> 00:46:50,790 to see if it accepts the next segment. 861 00:46:50,790 --> 00:46:53,210 So every time M accepts, you're going 862 00:46:53,210 --> 00:46:56,450 to restart M to see if it accepts another segment. 863 00:46:56,450 --> 00:46:59,060 And so by doing that, you're going to be cutting w up 864 00:46:59,060 --> 00:47:01,610 into different segments, each of which 865 00:47:01,610 --> 00:47:05,600 is accepted by M. Of course, it's never totally 866 00:47:05,600 --> 00:47:09,530 clear whether you should, for any given segment, 867 00:47:09,530 --> 00:47:11,960 you should cut it there or you should wait a little longer 868 00:47:11,960 --> 00:47:14,390 and find another, a later place to cut. 869 00:47:14,390 --> 00:47:16,130 But that's exactly the same problem 870 00:47:16,130 --> 00:47:18,680 that we had before with concatenation. 871 00:47:18,680 --> 00:47:21,860 And we solved it using nondeterminism, 872 00:47:21,860 --> 00:47:25,310 and we're going to solve it again using nondeterminism.
873 00:47:25,310 --> 00:47:27,670 So the way we're going to get that effect 874 00:47:27,670 --> 00:47:31,720 of starting the machine over again, once it's accepted, 875 00:47:31,720 --> 00:47:37,380 is by adding in epsilon transitions that go 876 00:47:37,380 --> 00:47:39,760 from the start states back to-- 877 00:47:39,760 --> 00:47:41,910 from the accept state back to the start state. 878 00:47:47,700 --> 00:47:51,060 So now every time M has accepted, 879 00:47:51,060 --> 00:47:54,520 it has an option-- not a requirement, but has an option. 880 00:47:54,520 --> 00:47:57,120 It can either stay continuing to process, 881 00:47:57,120 --> 00:48:01,350 or it could restart, making a cut at that point 882 00:48:01,350 --> 00:48:07,650 and trying to see if there's yet a second, another segment 883 00:48:07,650 --> 00:48:11,670 of the input that it's going to accept. 884 00:48:11,670 --> 00:48:18,370 And this is basically the whole thing, with one little problem 885 00:48:18,370 --> 00:48:20,470 that we need to deal with. 886 00:48:20,470 --> 00:48:30,450 And that is we need to make sure that M prime accepts 887 00:48:30,450 --> 00:48:31,290 the empty string. 888 00:48:31,290 --> 00:48:33,870 Because remember, the empty string 889 00:48:33,870 --> 00:48:38,145 is always a member of the star language. 890 00:48:40,660 --> 00:48:44,980 And as it's written right now, we're 891 00:48:44,980 --> 00:48:49,450 going to be requiring there to be at least one copy 892 00:48:49,450 --> 00:48:51,490 of at least one segment. 893 00:48:51,490 --> 00:48:54,550 We're not taking into account the possibility of no segments, 894 00:48:54,550 --> 00:48:56,570 which is the empty string. 895 00:48:56,570 --> 00:48:58,720 And the way we're going to get that is-- 896 00:48:58,720 --> 00:49:04,160 well, I mean, one thing, one way to get to add-- 897 00:49:04,160 --> 00:49:06,580 so we're missing the empty string right now. 898 00:49:06,580 --> 00:49:07,560 So how do we fix it? 
899 00:49:07,560 --> 00:49:11,190 Basically, we're just going to take the construction we have 900 00:49:11,190 --> 00:49:15,300 on the screen, and we're going to adjust 901 00:49:15,300 --> 00:49:17,460 it to add in the empty string. 902 00:49:17,460 --> 00:49:19,380 Because it's possibly missing. 903 00:49:19,380 --> 00:49:25,390 One way to do that, which is tempting, but wrong, 904 00:49:25,390 --> 00:49:33,050 is to make the start state of M an accepting state for M prime. 905 00:49:33,050 --> 00:49:36,920 So we could have made this an accepting state, too. 906 00:49:36,920 --> 00:49:40,230 And now M prime is also going to accept the empty string. 907 00:49:40,230 --> 00:49:42,220 That's the good news. 908 00:49:42,220 --> 00:49:48,430 The problem is that the start state 909 00:49:48,430 --> 00:49:51,640 might be playing some other role in M 910 00:49:51,640 --> 00:49:53,140 besides just being the start. 911 00:49:53,140 --> 00:49:56,940 There might be times when M comes back to the start state 912 00:49:56,940 --> 00:49:58,600 later on. 913 00:49:58,600 --> 00:50:01,650 And if we make the start state an accept state, 914 00:50:01,650 --> 00:50:03,750 it's going to suddenly start accepting 915 00:50:03,750 --> 00:50:07,550 a bunch of other things too, which might not be intended. 916 00:50:07,550 --> 00:50:10,810 So it's a bad idea to make the start state an accept state. 917 00:50:10,810 --> 00:50:15,760 Instead, we'll take the simple solution alternative 918 00:50:15,760 --> 00:50:19,090 of adding a new start state, which will never 919 00:50:19,090 --> 00:50:22,180 be returned to under any circumstances, 920 00:50:22,180 --> 00:50:24,800 and make that a new start-- 921 00:50:24,800 --> 00:50:26,600 an accept state as well. 922 00:50:26,600 --> 00:50:29,380 So here, we'll have to make this additional modification. 923 00:50:29,380 --> 00:50:31,600 So as I'm saying, this is what we need to do. 
924 00:50:31,600 --> 00:50:34,738 And the way we'll do that is by adding a new start state, which 925 00:50:34,738 --> 00:50:36,280 is also an accept state, to make sure 926 00:50:36,280 --> 00:50:38,120 it accepts the empty string. 927 00:50:38,120 --> 00:50:44,410 And then that also can branch to start off M 928 00:50:44,410 --> 00:50:48,760 as before, if the string that's input is not the empty string. 929 00:50:48,760 --> 00:50:51,400 And so then M prime is actually going 930 00:50:51,400 --> 00:50:53,710 to have to do some work to see if it can be cut up, 931 00:50:53,710 --> 00:50:57,050 as it was doing before. 932 00:50:57,050 --> 00:51:00,382 So that's the proof of closure under star. 933 00:51:00,382 --> 00:51:02,590 I'm not going to do anything beyond what I've just 934 00:51:02,590 --> 00:51:03,100 described. 935 00:51:03,100 --> 00:51:06,820 These proofs by picture are convincing enough, I hope. 936 00:51:06,820 --> 00:51:10,703 And if not, they are explained in somewhat more detail, 937 00:51:10,703 --> 00:51:12,370 somewhat more formally, in the textbook. 938 00:51:12,370 --> 00:51:14,950 But for the lecture, this is where I'm going to stop, 939 00:51:14,950 --> 00:51:17,710 with these two arguments. 940 00:51:17,710 --> 00:51:21,385 And so now-- oh, we have one quick check-in on this. 941 00:51:23,930 --> 00:51:27,090 So if M has n states, how many states 942 00:51:27,090 --> 00:51:30,920 does M prime have by this construction? 943 00:51:30,920 --> 00:51:37,160 So I'm not intending these to be very hard, more just 944 00:51:37,160 --> 00:51:37,955 to keep you awake. 945 00:51:41,160 --> 00:51:48,490 So how many states does M prime have? 946 00:51:48,490 --> 00:51:52,550 OK, maybe a little too easy even for a check-in. 947 00:51:55,090 --> 00:51:57,830 Yeah, everybody is getting this one. 948 00:51:57,830 --> 00:51:59,000 Because all you did was-- 949 00:51:59,000 --> 00:52:01,130 we added one new state. 
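[Editor's sketch, not part of the lecture] The star construction just described can be written out in a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary encoding of transitions, the `EPS` marker, the state name `new_start`, and the helper names `star`, `closure`, and `accepts_string` are all illustrative choices, not anything shown on the slides. The example machine M (strings over {0,1} that end in 1) is also made up for the illustration; it was chosen because M returns to its start state during a run.

```python
EPS = None  # label for epsilon transitions (a convention of this sketch)

def star(delta, start, accepts):
    """Given an automaton for A, return (delta, start, accepts) of an NFA for A*.

    delta maps (state, symbol) to a set of next states.
    """
    new_start = "new_start"  # hypothetical name, assumed unused in M
    new_delta = {key: set(targets) for key, targets in delta.items()}
    # The new start state branches to M's old start state on epsilon.
    new_delta.setdefault((new_start, EPS), set()).add(start)
    # Each accept state may restart M: epsilon back to the old start state.
    for q in accepts:
        new_delta.setdefault((q, EPS), set()).add(start)
    # The new start state is also accepting, so the empty string is accepted.
    return new_delta, new_start, set(accepts) | {new_start}

def closure(delta, states):
    """All states reachable from `states` using only epsilon transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts_string(nfa, w):
    """Simulate the NFA on w by tracking the set of possible current states."""
    delta, start, accepts = nfa
    current = closure(delta, {start})
    for symbol in w:
        moved = {r for q in current for r in delta.get((q, symbol), ())}
        current = closure(delta, moved)
    return bool(current & accepts)

# Example M: a two-state DFA for strings over {0,1} that end in 1.
M_delta = {("q0", "0"): {"q0"}, ("q0", "1"): {"q1"},
           ("q1", "0"): {"q0"}, ("q1", "1"): {"q1"}}
M_star = star(M_delta, "q0", {"q1"})
```

With this M, the tempting shortcut of making q0 accepting would wrongly accept strings such as "0" and "10", which cannot be cut into pieces ending in 1. The machine built here rejects them while still accepting the empty string, and it has exactly one more state than M.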
950 00:52:01,130 --> 00:52:04,160 So the answer is as you have-- 951 00:52:04,160 --> 00:52:10,970 I think pretty much everybody is observing that it's number b. 952 00:52:10,970 --> 00:52:13,220 So I'm going to end the polling, and I'm 953 00:52:13,220 --> 00:52:14,900 going to share the results. 954 00:52:14,900 --> 00:52:17,030 And everybody got that one. 955 00:52:17,030 --> 00:52:18,560 And so let's continue on. 956 00:52:21,540 --> 00:52:27,470 And so the very last thing we're going to do today 957 00:52:27,470 --> 00:52:34,820 is show you how to convert regular expressions to NFAs, 958 00:52:34,820 --> 00:52:37,250 thereby showing that every language that you 959 00:52:37,250 --> 00:52:39,050 can describe with a regular expression 960 00:52:39,050 --> 00:52:42,242 is a regular language. 961 00:52:42,242 --> 00:52:44,200 On Tuesday, we'll show how to do the conversion 962 00:52:44,200 --> 00:52:46,720 in the other direction and so thereby showing 963 00:52:46,720 --> 00:52:50,830 that these two methods of describing languages 964 00:52:50,830 --> 00:52:52,400 are equivalent to one another. 965 00:52:52,400 --> 00:52:53,620 So here's our theorem. 966 00:52:53,620 --> 00:52:57,010 If R is a regular expression, and A 967 00:52:57,010 --> 00:52:58,990 is the language-- a set of strings 968 00:52:58,990 --> 00:53:03,630 that that regular expression describes, then A is regular. 969 00:53:03,630 --> 00:53:05,910 OK? 970 00:53:05,910 --> 00:53:11,430 So we're going to show how to convert. 971 00:53:11,430 --> 00:53:17,170 The strategy is to convert R to an equivalent NFA M. 972 00:53:17,170 --> 00:53:19,870 And so we have to think about, remember, 973 00:53:19,870 --> 00:53:23,290 these regular expressions that we introduced last time. 
974 00:53:23,290 --> 00:53:28,570 These are these expressions that look like ab union b 975 00:53:28,570 --> 00:53:31,040 star, something like that-- 976 00:53:31,040 --> 00:53:33,100 so built up out of the regular operations 977 00:53:33,100 --> 00:53:37,780 from the primitive regular expressions 978 00:53:37,780 --> 00:53:40,900 that don't have any operations, that we're calling atomic. 979 00:53:40,900 --> 00:53:43,000 So if R is an atomic regular expression, 980 00:53:43,000 --> 00:53:46,090 it just looks like either just a single symbol 981 00:53:46,090 --> 00:53:50,575 or an empty string symbol or an empty language symbol. 982 00:53:53,860 --> 00:54:01,000 Or R can be a composite regular expression-- 983 00:54:01,000 --> 00:54:02,650 whoops. 984 00:54:02,650 --> 00:54:03,640 We're having a little-- 985 00:54:06,190 --> 00:54:09,070 yeah, so we have two possibilities here. 986 00:54:09,070 --> 00:54:11,620 R is either atomic or composite. 987 00:54:11,620 --> 00:54:16,300 And so let's look at what the equivalent expression is 988 00:54:16,300 --> 00:54:17,840 in each case. 989 00:54:17,840 --> 00:54:21,880 So if R is just the single letter regular expression-- 990 00:54:21,880 --> 00:54:24,280 that's a totally legitimate regular expression, 991 00:54:24,280 --> 00:54:27,070 just a regular expression 1. 992 00:54:27,070 --> 00:54:31,390 So that just describes the language of the string 1. 993 00:54:31,390 --> 00:54:35,230 So we have to make an NFA which accepts-- 994 00:54:35,230 --> 00:54:37,000 which recognizes just that language, 995 00:54:37,000 --> 00:54:39,610 accepts only the string 1. 996 00:54:39,610 --> 00:54:41,800 So it's a very simple NFA. 997 00:54:41,800 --> 00:54:44,290 It just starts in the start state. 998 00:54:44,290 --> 00:54:49,420 And on that single symbol, it branches to an accept state. 999 00:54:49,420 --> 00:54:51,770 And there were no other transitions allowed. 
1000 00:54:51,770 --> 00:54:53,320 So if you get anything else coming 1001 00:54:53,320 --> 00:54:56,380 in besides that one, that string, which 1002 00:54:56,380 --> 00:55:00,730 is just that one symbol, the NFA is going to reject it. 1003 00:55:00,730 --> 00:55:04,930 If it's too long, if it gets aa coming in, 1004 00:55:04,930 --> 00:55:09,970 well, there's nowhere to go from this accepting state on an a. 1005 00:55:09,970 --> 00:55:12,010 So the machine is just going to die. 1006 00:55:12,010 --> 00:55:14,440 It has to be in an accept state at the end of the input. 1007 00:55:17,910 --> 00:55:19,910 Now, I want you to think for yourself for a minute, 1008 00:55:19,910 --> 00:55:24,790 how do we make an NFA which accepts only the empty string 1009 00:55:24,790 --> 00:55:26,290 and no other strings? 1010 00:55:26,290 --> 00:55:29,320 You can do that with just one state with an NFA, 1011 00:55:29,320 --> 00:55:31,665 just this one here. 1012 00:55:31,665 --> 00:55:33,040 The machine is going to start off 1013 00:55:33,040 --> 00:55:34,690 in the start state, which is also immediately 1014 00:55:34,690 --> 00:55:35,380 an accept state. 1015 00:55:35,380 --> 00:55:37,240 So it accepts the empty string. 1016 00:55:37,240 --> 00:55:38,950 But if anything else comes in, there's 1017 00:55:38,950 --> 00:55:41,750 nowhere to go, and the machine dies. 1018 00:55:41,750 --> 00:55:44,510 So this machine accepts just the empty string. 1019 00:55:44,510 --> 00:55:47,210 Or its language is the language with one element, 1020 00:55:47,210 --> 00:55:49,130 the empty string. 1021 00:55:49,130 --> 00:55:51,230 How about the empty language? 1022 00:55:51,230 --> 00:55:55,200 Well, here's an NFA which has no accepting state, 1023 00:55:55,200 --> 00:55:58,780 so it can't be accepting anything. 1024 00:55:58,780 --> 00:56:03,300 Now, if we have a composite regular expression, 1025 00:56:03,300 --> 00:56:04,680 we're already finished. 
1026 00:56:04,680 --> 00:56:07,740 Because we showed how to build up-- 1027 00:56:07,740 --> 00:56:13,600 we showed constructions which give us closure under union, 1028 00:56:13,600 --> 00:56:14,970 concatenation, and star. 1029 00:56:14,970 --> 00:56:16,860 And those constructions are going 1030 00:56:16,860 --> 00:56:22,620 to enable us to build up the NFAs that 1031 00:56:22,620 --> 00:56:29,000 do the language of these more complex regular expressions 1032 00:56:29,000 --> 00:56:36,030 built up out of the NFAs that do the individual parts. 1033 00:56:36,030 --> 00:56:40,660 So if we already have NFAs that do R1 and R2, 1034 00:56:40,660 --> 00:56:44,440 then the closure under union construction 1035 00:56:44,440 --> 00:56:51,140 gives us an NFA that does R1 union R2 as a language. 1036 00:56:51,140 --> 00:56:54,350 So I hope that's clear, but I'm going 1037 00:56:54,350 --> 00:56:57,290 to do an example which will hopefully illustrate it. 1038 00:56:57,290 --> 00:56:59,180 And it's going to show you-- 1039 00:56:59,180 --> 00:57:02,480 basically, what I'm giving you is an automatic procedure 1040 00:57:02,480 --> 00:57:05,720 for converting a regular expression into an equivalent 1041 00:57:05,720 --> 00:57:07,260 NFA. 1042 00:57:07,260 --> 00:57:12,520 So let's just see that procedure in action, 1043 00:57:12,520 --> 00:57:15,600 which is really just following this recipe 1044 00:57:15,600 --> 00:57:16,900 that I described for you. 1045 00:57:16,900 --> 00:57:23,350 So here is a regular expression a union ab star. 1046 00:57:23,350 --> 00:57:25,480 So this is a regular expression. 1047 00:57:25,480 --> 00:57:26,450 It's some language. 1048 00:57:26,450 --> 00:57:28,060 Whatever it is, I don't care. 1049 00:57:28,060 --> 00:57:32,890 But I want to make an NFA which recognizes that same language. 
1050 00:57:32,890 --> 00:57:35,740 And the way I'm going to do that is first build 1051 00:57:35,740 --> 00:57:42,610 NFAs for the components, the subexpressions 1052 00:57:42,610 --> 00:57:46,360 of this regular expression, and then combine them, 1053 00:57:46,360 --> 00:57:51,730 using our closure constructions, to build 1054 00:57:51,730 --> 00:57:54,520 NFAs for larger and larger subexpressions, 1055 00:57:54,520 --> 00:57:57,430 until I get the NFA that's the equivalent 1056 00:57:57,430 --> 00:57:59,690 of the entire expression. 1057 00:57:59,690 --> 00:58:01,160 So let's just see how that goes. 1058 00:58:01,160 --> 00:58:03,690 So the very most primitive parts, 1059 00:58:03,690 --> 00:58:06,130 the smallest subexpressions here, 1060 00:58:06,130 --> 00:58:09,820 are just the expressions for a and for b. 1061 00:58:09,820 --> 00:58:13,750 So here's the one just for a. 1062 00:58:13,750 --> 00:58:19,240 So this is the NFA which recognizes the language, which 1063 00:58:19,240 --> 00:58:21,660 is just the one string a. 1064 00:58:21,660 --> 00:58:26,970 Here is the NFA whose language is just the one string b. 1065 00:58:26,970 --> 00:58:34,320 And now I want an NFA which accepts only the string ab. 1066 00:58:34,320 --> 00:58:37,890 Now, of course, you could just do that by hand yourself. 1067 00:58:37,890 --> 00:58:38,880 It's simple enough. 1068 00:58:38,880 --> 00:58:43,050 But what I'm arguing is that we can do this automatically, 1069 00:58:43,050 --> 00:58:45,900 using the closure construction for concatenation. 1070 00:58:45,900 --> 00:58:50,070 Because really there's a hidden concatenation symbol. 1071 00:58:50,070 --> 00:58:53,810 This is a concatenate b. 
1072 00:58:53,810 --> 00:59:01,140 So now for ab, I'm going to take the thing from a and the part 1073 00:59:01,140 --> 00:59:02,700 from b-- 1074 00:59:02,700 --> 00:59:05,370 so these two things that I had from before, 1075 00:59:05,370 --> 00:59:11,560 and use the concatenation construction to combine them. 1076 00:59:11,560 --> 00:59:12,190 You see that? 1077 00:59:12,190 --> 00:59:15,130 So now I have automatically an NFA 1078 00:59:15,130 --> 00:59:17,770 which does the language whose string is just 1079 00:59:17,770 --> 00:59:21,040 ab, just the ab string. 1080 00:59:21,040 --> 00:59:23,560 And it's not the simplest NFA. 1081 00:59:23,560 --> 00:59:26,160 You can make a simpler one, but the virtue of this one 1082 00:59:26,160 --> 00:59:30,850 is that I got it automatically just by following the closure 1083 00:59:30,850 --> 00:59:31,960 construction. 1084 00:59:31,960 --> 00:59:34,450 So now I'm going to do a more complex one, just 1085 00:59:34,450 --> 00:59:36,020 the inside here, a union ab. 1086 00:59:39,780 --> 00:59:41,880 So the way I'm going to build that is from the two 1087 00:59:41,880 --> 00:59:44,730 parts, the a part and the ab part, 1088 00:59:44,730 --> 00:59:46,800 the a part and the ab part. 1089 00:59:46,800 --> 00:59:48,850 So here is the a part. 1090 00:59:48,850 --> 00:59:50,580 Here's the ab part. 1091 00:59:50,580 --> 00:59:52,050 I've already got those from before. 1092 00:59:52,050 --> 00:59:53,967 It's really kind of a proof by induction here. 1093 00:59:53,967 --> 00:59:56,430 But I think it's simple enough, we 1094 00:59:56,430 --> 00:59:58,900 don't have to use that language. 1095 00:59:58,900 --> 01:00:04,190 So we have the a part, the ab part. 1096 01:00:04,190 --> 01:00:09,490 And now we are going to apply the closure 1097 01:00:09,490 --> 01:00:12,760 under union construction to combine those into one machine. 1098 01:00:12,760 --> 01:00:15,580 And remember how that worked. 
1099 01:00:15,580 --> 01:00:18,640 We had a new symbol here, which branches under empty string 1100 01:00:18,640 --> 01:00:20,620 to the previous-- 1101 01:00:20,620 --> 01:00:22,240 we're adding a new start state, which 1102 01:00:22,240 --> 01:00:25,630 branches to the original start states under empty transition. 1103 01:00:25,630 --> 01:00:31,790 And now this is an NFA for this language, a union ab. 1104 01:00:31,790 --> 01:00:33,530 And lastly, now we're one step away 1105 01:00:33,530 --> 01:00:36,470 from getting the star of this. 1106 01:00:36,470 --> 01:00:38,400 And how are we going to do that? 1107 01:00:38,400 --> 01:00:42,230 We're going to take this thing here and apply the construction 1108 01:00:42,230 --> 01:00:44,910 for the star closure. 1109 01:00:44,910 --> 01:00:47,100 And that's going to be an NFA which 1110 01:00:47,100 --> 01:00:49,080 does a union ab star, which is what 1111 01:00:49,080 --> 01:00:50,890 we wanted in the first place. 1112 01:00:50,890 --> 01:00:55,928 So first, we're going to bring that one down. 1113 01:00:55,928 --> 01:00:57,470 Because we've already built that one. 1114 01:00:57,470 --> 01:01:03,160 And now remember how we built the closure under star. 1115 01:01:03,160 --> 01:01:07,360 We made the accepting states return back to the start state, 1116 01:01:07,360 --> 01:01:09,640 and we added a new start state to make 1117 01:01:09,640 --> 01:01:11,980 sure we got the empty string in there that 1118 01:01:11,980 --> 01:01:15,070 transitioned to the original start state under epsilon. 1119 01:01:15,070 --> 01:01:15,580 OK? 1120 01:01:15,580 --> 01:01:19,690 So that's all I wanted to say for today's lecture. 1121 01:01:19,690 --> 01:01:20,740 Let's do a quick review. 
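[Editor's sketch, not part of the lecture] The recipe walked through above (atomic NFAs for the base cases, then the union, concatenation, and star constructions for composite expressions) can be sketched as a short recursive Python function. Several things here are assumptions made for the illustration: the regular expression is given as nested tuples rather than parsed from text, states are fresh integers, `EPS` marks an epsilon transition, and the names `build`, `add`, `closure`, and `matches` are hypothetical.

```python
from itertools import count

EPS = None         # label for epsilon transitions (a convention of this sketch)
_fresh = count()   # supplies state names that have not been used yet

def add(delta, q, label, target):
    """Record a transition q --label--> target."""
    delta.setdefault((q, label), set()).add(target)

def build(r):
    """Return (start, accepts, delta) of an NFA for the regex tree r."""
    kind = r[0]
    if kind == "sym":      # a single symbol: start --symbol--> accept
        s, t = next(_fresh), next(_fresh)
        return s, {t}, {(s, r[1]): {t}}
    if kind == "eps":      # empty string: one state, start and accept
        s = next(_fresh)
        return s, {s}, {}
    if kind == "empty":    # empty language: no accept state at all
        return next(_fresh), set(), {}
    if kind == "star":     # new accepting start; accepts loop back on epsilon
        s1, f1, d1 = build(r[1])
        s, delta = next(_fresh), dict(d1)
        add(delta, s, EPS, s1)
        for q in f1:
            add(delta, q, EPS, s1)
        return s, f1 | {s}, delta
    s1, f1, d1 = build(r[1])
    s2, f2, d2 = build(r[2])
    delta = {**d1, **d2}   # the two parts use disjoint states
    if kind == "union":    # new start branches to both old starts on epsilon
        s = next(_fresh)
        add(delta, s, EPS, s1)
        add(delta, s, EPS, s2)
        return s, f1 | f2, delta
    if kind == "concat":   # accepts of the first part jump to the second start
        for q in f1:
            add(delta, q, EPS, s2)
        return s1, f2, delta
    raise ValueError(f"unknown regex node: {kind!r}")

def closure(delta, states):
    """All states reachable from `states` using only epsilon transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for t in delta.get((q, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(r, w):
    """Build the NFA for r and simulate it on the string w."""
    start, accepts, delta = build(r)
    current = closure(delta, {start})
    for c in w:
        current = closure(delta, {t for q in current for t in delta.get((q, c), ())})
    return bool(current & accepts)

# The lecture's example, (a union ab) star, as a tuple tree.
a, b = ("sym", "a"), ("sym", "b")
R = ("star", ("union", a, ("concat", a, b)))
```

As in the lecture, the NFA this produces is not the smallest possible one; the point is only that it falls out mechanically from the closure constructions. Calling `matches(R, "aab")` runs the machine built for the example expression.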
1122 01:01:23,440 --> 01:01:25,990 Very important concept, nondeterminism 1123 01:01:25,990 --> 01:01:28,330 and nondeterministic finite automata-- 1124 01:01:28,330 --> 01:01:31,840 we proved they were equivalent in power, 1125 01:01:31,840 --> 01:01:33,400 showed the class of regular languages 1126 01:01:33,400 --> 01:01:35,890 closed under concatenation and star. 1127 01:01:35,890 --> 01:01:39,430 We showed how to do conversion of regular expressions to NFAs. 1128 01:01:39,430 --> 01:01:45,340 So I think that is it for today's lecture. 1129 01:01:45,340 --> 01:01:50,268 And thank you, all, for being here. 1130 01:01:50,268 --> 01:01:51,685 I'll try to answer a few of these. 1131 01:01:56,770 --> 01:01:58,690 "Why does concatenation have order?" 1132 01:01:58,690 --> 01:02:02,800 Well, because it's an ordered construction. 1133 01:02:02,800 --> 01:02:05,710 "Is there a simple way to prove closure under concatenation 1134 01:02:05,710 --> 01:02:06,980 without using nondeterminism?" 1135 01:02:06,980 --> 01:02:07,480 No. 1136 01:02:10,340 --> 01:02:12,770 "Why are the empty strings at the accept state? 1137 01:02:12,770 --> 01:02:16,680 Can't they be at any state? 1138 01:02:16,680 --> 01:02:19,410 Doesn't star make copies of any part of the input?" 1139 01:02:19,410 --> 01:02:23,175 No, it's only-- you have to think about what's going on. 1140 01:02:25,920 --> 01:02:29,340 You have to branch back to the beginning only on an accept. 1141 01:02:29,340 --> 01:02:31,170 Because that means you found a piece that's 1142 01:02:31,170 --> 01:02:32,212 in the original language. 1143 01:02:34,630 --> 01:02:37,900 "Is there an automaton that can add some 1144 01:02:37,900 --> 01:02:40,720 or subtract memory automata?" 1145 01:02:40,720 --> 01:02:43,950 Well, depends on what you mean by all that. 1146 01:02:43,950 --> 01:02:47,660 But certainly, there are more powerful machines 1147 01:02:47,660 --> 01:02:49,685 that we're going to study than finite automata. 
1148 01:02:53,780 --> 01:02:56,420 But yes, there is. 1149 01:02:56,420 --> 01:02:59,270 And even finite automata can add and subtract, 1150 01:02:59,270 --> 01:03:02,020 if you present the input in the right way. 1151 01:03:02,020 --> 01:03:06,440 I would refer you to the first problem on the homework. 1152 01:03:06,440 --> 01:03:11,860 So I think I'm going to check out then. 1153 01:03:11,860 --> 01:03:12,760 Take care, everybody. 1154 01:03:12,760 --> 01:03:14,130 Bye-bye.