1 00:00:00,000 --> 00:00:02,470 [SQUEAKING] 2 00:00:02,470 --> 00:00:04,446 [RUSTLING] 3 00:00:04,446 --> 00:00:06,916 [CLICKING] 4 00:00:12,850 --> 00:00:14,740 ERIK DEMAINE: All right welcome back to 006, 5 00:00:14,740 --> 00:00:16,239 Dynamic Programming. 6 00:00:16,239 --> 00:00:19,390 We're now in step two out of four-- 7 00:00:19,390 --> 00:00:21,100 going to see a bunch more examples-- 8 00:00:21,100 --> 00:00:23,320 three more examples of dynamic programming-- 9 00:00:23,320 --> 00:00:25,900 longest common subsequence, longest increasing subsequence, 10 00:00:25,900 --> 00:00:28,990 and kind of a made up problem from 006, 11 00:00:28,990 --> 00:00:30,910 alternating coin game. 12 00:00:30,910 --> 00:00:32,980 And through those examples, we're 13 00:00:32,980 --> 00:00:36,760 going to explore a few new ideas, things 14 00:00:36,760 --> 00:00:38,830 we haven't seen in action before yet-- dealing 15 00:00:38,830 --> 00:00:40,990 with multiple sequences, instead of just one; 16 00:00:40,990 --> 00:00:42,880 dealing with substrings of sequences, 17 00:00:42,880 --> 00:00:45,190 instead of prefixes and suffixes; 18 00:00:45,190 --> 00:00:47,200 parent pointers so we can recover solutions, 19 00:00:47,200 --> 00:00:48,580 like in shortest paths. 20 00:00:48,580 --> 00:00:51,220 And a big one, which will mostly be fleshed out 21 00:00:51,220 --> 00:00:53,290 in the next lecture, is subproblem constraint 22 00:00:53,290 --> 00:00:55,030 and expansion. 23 00:00:55,030 --> 00:00:59,080 This is all part of the SRTBOT paradigm-- 24 00:00:59,080 --> 00:01:04,900 remember, subproblems, relations, topological order, 25 00:01:04,900 --> 00:01:07,780 base case, original problem, and time. 26 00:01:07,780 --> 00:01:09,520 Here is subproblems and relations. 27 00:01:09,520 --> 00:01:11,500 I've written down both what these things are 28 00:01:11,500 --> 00:01:15,340 and the key lessons we got from the first dynamic programming 29 00:01:15,340 --> 00:01:16,060 lecture. 
30 00:01:16,060 --> 00:01:18,400 Namely, we want to split up our problem into multiple 31 00:01:18,400 --> 00:01:20,980 subproblems, and if your input is a sequence-- 32 00:01:20,980 --> 00:01:24,130 that's the main case we've seen so far-- 33 00:01:24,130 --> 00:01:27,130 like the bowling problem, for example, 34 00:01:27,130 --> 00:01:29,650 then the natural subproblems to try 35 00:01:29,650 --> 00:01:32,200 are prefixes, suffixes, or substrings. 36 00:01:32,200 --> 00:01:35,188 Prefixes and suffixes are nice, because there's few of them. 37 00:01:35,188 --> 00:01:36,730 There's only a linear number of them. 38 00:01:36,730 --> 00:01:39,455 In general, we want a polynomial number. 39 00:01:39,455 --> 00:01:41,330 Sometimes you can get away with one of these. 40 00:01:41,330 --> 00:01:43,750 They're usually about the same. 41 00:01:43,750 --> 00:01:46,060 Sometimes you need substrings. 42 00:01:46,060 --> 00:01:48,980 There's quadratically many of those. 43 00:01:48,980 --> 00:01:53,020 Then, once you set up the subproblems, which-- 44 00:01:53,020 --> 00:01:54,940 it's easy to set up some problems, 45 00:01:54,940 --> 00:01:56,680 but hard to do it right-- 46 00:01:56,680 --> 00:01:58,420 to test whether you did it right is, 47 00:01:58,420 --> 00:02:00,400 can I write a recurrence relation 48 00:02:00,400 --> 00:02:04,600 that relates one subproblem solution to smaller subproblems 49 00:02:04,600 --> 00:02:05,620 solutions? 50 00:02:05,620 --> 00:02:07,390 And the general trick for doing this 51 00:02:07,390 --> 00:02:11,530 is to identify some feature of the solution you're 52 00:02:11,530 --> 00:02:12,200 looking for. 53 00:02:12,200 --> 00:02:13,990 So you're trying to solve some subproblem, 54 00:02:13,990 --> 00:02:17,710 and you try to ask a question whose answer would let you 55 00:02:17,710 --> 00:02:19,112 reduce to a smaller subproblem. 
56 00:02:19,112 --> 00:02:21,070 So if you can figure out what that question is, 57 00:02:21,070 --> 00:02:23,618 and that question only has a polynomial number of answers, 58 00:02:23,618 --> 00:02:25,660 then boom-- and you've only got a polynomial number 59 00:02:25,660 --> 00:02:28,730 of subproblems, then you will get a polynomial running time. 60 00:02:28,730 --> 00:02:31,810 So I have, I think, another way to say this. 61 00:02:31,810 --> 00:02:34,450 We just locally brute force all possible answers 62 00:02:34,450 --> 00:02:36,490 to whatever question we come up with, 63 00:02:36,490 --> 00:02:38,740 as long as there's polynomially many. 64 00:02:38,740 --> 00:02:43,180 And then each of them is going to recursively call the smaller 65 00:02:43,180 --> 00:02:45,190 subproblems, but because we memoize, 66 00:02:45,190 --> 00:02:47,110 we'll only solve each subproblem once. 67 00:02:47,110 --> 00:02:48,760 And so in the end, the running time 68 00:02:48,760 --> 00:02:50,770 will be, at most, the number of subproblems 69 00:02:50,770 --> 00:02:54,910 times the non-recursive work done in that relation. 70 00:02:54,910 --> 00:02:58,120 For that to work, of course, the relations between subproblems 71 00:02:58,120 --> 00:03:00,850 must be acyclic, so we'd like to give 72 00:03:00,850 --> 00:03:02,380 an explicit topological order. 73 00:03:02,380 --> 00:03:04,900 Usually it's a couple of for loops. 74 00:03:04,900 --> 00:03:07,450 But this is a topological order in the subproblem 75 00:03:07,450 --> 00:03:11,240 DAG, which I defined somewhat informally last time. 76 00:03:11,240 --> 00:03:11,740 The 
77 00:03:11,740 --> 00:03:13,840 vertices are subproblems, and I want 78 00:03:13,840 --> 00:03:17,180 to draw an edge from a smaller problem to a bigger problem, 79 00:03:17,180 --> 00:03:22,270 meaning that, if evaluating b in this relation calls a, 80 00:03:22,270 --> 00:03:24,070 then I'll draw an arrow from a to b, 81 00:03:24,070 --> 00:03:26,890 from the things I need to do first to the things 82 00:03:26,890 --> 00:03:28,880 I'll do later. 83 00:03:28,880 --> 00:03:31,150 So then the topological order will be 84 00:03:31,150 --> 00:03:34,060 ready-- by the time I try to compute b, 85 00:03:34,060 --> 00:03:37,220 I will have already computed a. 86 00:03:37,220 --> 00:03:39,860 And of course, the relation also needs base cases. 87 00:03:39,860 --> 00:03:43,940 And then sometimes the original problem we want to solve 88 00:03:43,940 --> 00:03:45,410 is just one of those subproblems. 89 00:03:45,410 --> 00:03:47,010 Sometimes we need to combine multiple. 90 00:03:47,010 --> 00:03:48,630 We'll see that today. 91 00:03:48,630 --> 00:03:51,770 So that's a review of this framework. 92 00:03:51,770 --> 00:03:55,430 And let's dive into longest common subsequence. 93 00:03:58,060 --> 00:03:59,665 This is kind of a classic problem. 94 00:04:07,280 --> 00:04:09,700 It even has applications to things 95 00:04:09,700 --> 00:04:11,050 like computational biology. 96 00:04:11,050 --> 00:04:12,490 You have two DNA sequences. 97 00:04:12,490 --> 00:04:15,250 You want to measure how much they have in common. 98 00:04:15,250 --> 00:04:16,660 One version of that-- 99 00:04:16,660 --> 00:04:19,209 might see other versions in recitation-- 100 00:04:19,209 --> 00:04:22,540 called edit distance-- this is the simplest, cleanest version, 101 00:04:22,540 --> 00:04:24,990 where I give you two sequences-- 102 00:04:24,990 --> 00:04:28,880 I have an example here. 103 00:04:28,880 --> 00:04:31,090 So for example, it could be a sequence of letters. 
104 00:04:37,010 --> 00:04:41,950 So my first sequence spells hieroglyphology-- 105 00:04:41,950 --> 00:04:43,450 study of hieroglyphs. 106 00:04:43,450 --> 00:04:47,950 And second sequence spells Michelangelo. 107 00:04:52,390 --> 00:04:54,630 And what I'd like is a subsequence. 108 00:04:54,630 --> 00:04:58,780 So remember, substring has to be some continuous range, 109 00:04:58,780 --> 00:05:00,400 some interval. 110 00:05:00,400 --> 00:05:03,010 Subsequence-- you can take any subset 111 00:05:03,010 --> 00:05:05,860 of the letters in your sequence or any subset 112 00:05:05,860 --> 00:05:07,130 of the items in your sequence. 113 00:05:07,130 --> 00:05:09,490 So you can have blanks in between. 114 00:05:09,490 --> 00:05:11,110 You can skip over items. 115 00:05:11,110 --> 00:05:14,440 And so what we want is the longest sequence 116 00:05:14,440 --> 00:05:17,650 that is a subsequence of both the first string, 117 00:05:17,650 --> 00:05:19,910 the first sequence, and the second string. 118 00:05:19,910 --> 00:05:22,000 And if you stare at this long enough, 119 00:05:22,000 --> 00:05:24,250 the longest common subsequence-- 120 00:05:24,250 --> 00:05:25,750 I don't think it's unique, but there 121 00:05:25,750 --> 00:05:32,830 is a longest common sequence, which is hello hiding in there. 122 00:05:35,560 --> 00:05:38,440 And that is a longest common subsequence. 123 00:05:38,440 --> 00:05:44,410 So given that input, the goal is to compute hello, 124 00:05:44,410 --> 00:05:46,420 or whatever the longest common subsequence is. 125 00:05:46,420 --> 00:05:49,660 So we're given-- write this down carefully-- 126 00:05:49,660 --> 00:05:50,890 given two sequences. 127 00:05:50,890 --> 00:05:58,120 Let me name them A and B. We want 128 00:05:58,120 --> 00:06:08,290 to find the longest sequence L that's 129 00:06:08,290 --> 00:06:18,862 a subsequence both A and B. 
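To make the substring versus subsequence distinction concrete, here is a minimal sketch of a subsequence check. The helper name `is_subsequence` is an assumption for illustration and does not come from the lecture; it only confirms that "hello" can be read off both example words by skipping letters.

```python
def is_subsequence(candidate, sequence):
    """Return True if candidate can be obtained from sequence by deleting items."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until it finds item,
    # so the matches are forced to occur in left-to-right order.
    return all(item in remaining for item in candidate)


a = "hieroglyphology"
b = "michelangelo"
print(is_subsequence("hello", a), is_subsequence("hello", b))  # True True
```

This only verifies that a given candidate is common to both sequences; it says nothing about whether it is the longest one, which is what the dynamic program is for.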
So that's the problem definition, 130 00:06:18,862 --> 00:06:20,320 and we're going to see how to solve 131 00:06:20,320 --> 00:06:23,620 it using dynamic programming. 132 00:06:23,620 --> 00:06:26,440 And whereas, in the bowling problem, 133 00:06:26,440 --> 00:06:29,980 we just had a single sequence of numbers-- 134 00:06:29,980 --> 00:06:31,780 the values of the bowling pins-- 135 00:06:31,780 --> 00:06:33,700 here we have two sequences. 136 00:06:33,700 --> 00:06:36,340 And so we need a new trick. 137 00:06:36,340 --> 00:06:39,108 Before, we said, OK, if our subproblems-- or sorry-- 138 00:06:39,108 --> 00:06:40,900 if our input consists of a single sequence, 139 00:06:40,900 --> 00:06:43,330 we'll try prefixes, suffixes, or substrings. 140 00:06:43,330 --> 00:06:45,370 Now we've got two sequences, so somehow we 141 00:06:45,370 --> 00:06:53,090 need to combine multiple inputs together. 142 00:06:53,090 --> 00:06:57,880 And so here's a general trick-- 143 00:06:57,880 --> 00:07:04,700 subproblems for multiple inputs. 144 00:07:04,700 --> 00:07:07,120 It's a very simple trick. 145 00:07:07,120 --> 00:07:16,045 We just take the product, multiply the subproblem spaces. 146 00:07:21,020 --> 00:07:21,520 OK. 147 00:07:21,520 --> 00:07:25,660 In the sense of cross product of sets, and in particular, 148 00:07:25,660 --> 00:07:27,540 from a combinatorial perspective-- 149 00:07:27,540 --> 00:07:29,530 so we have two inputs, the first sequence 150 00:07:29,530 --> 00:07:32,772 A and the second sequence B. For each of them, 151 00:07:32,772 --> 00:07:35,230 there's a natural choice, or there's three natural choices. 152 00:07:35,230 --> 00:07:36,272 We could do one of these. 153 00:07:36,272 --> 00:07:39,640 I will choose suffixes for A and suffixes for B. 154 00:07:39,640 --> 00:07:41,560 You could do some other combination, 155 00:07:41,560 --> 00:07:43,660 but that would be enough for here. 
156 00:07:43,660 --> 00:07:45,550 And then I want to multiply these spaces, 157 00:07:45,550 --> 00:07:47,860 meaning the number of subproblems 158 00:07:47,860 --> 00:07:50,140 is going to be the product of the number of suffixes 159 00:07:50,140 --> 00:07:53,810 here times the number of suffixes here. 160 00:07:53,810 --> 00:07:57,500 And in other words, every subproblem in LCS 161 00:07:57,500 --> 00:08:00,260 is going to be a pair of suffixes. 162 00:08:00,260 --> 00:08:05,400 So let me write that down. 163 00:08:05,400 --> 00:08:20,490 So for LCS, our subproblems are L of i, j-- 164 00:08:20,490 --> 00:08:22,860 this is going to be the longest common subsequence-- 165 00:08:22,860 --> 00:08:27,180 of the suffix of A starting at i and the suffix of B 166 00:08:27,180 --> 00:08:28,110 starting at j. 167 00:08:31,190 --> 00:08:35,750 And just to be clear how many there are, 168 00:08:35,750 --> 00:08:38,960 I'll give the ranges for i and j-- 169 00:08:41,909 --> 00:08:43,320 not going to assume the sequences 170 00:08:43,320 --> 00:08:46,980 are the same length, like in the example. 171 00:08:46,980 --> 00:08:49,490 So I'll write lengths of A and lengths of B. I like 172 00:08:49,490 --> 00:08:51,050 to include the empty suffix. 173 00:08:51,050 --> 00:08:55,730 So when j equals the length of B that's 0 items in it, 174 00:08:55,730 --> 00:08:58,000 because that makes for really easy base cases. 175 00:08:58,000 --> 00:09:01,320 So I'd like to include those in my problems. 176 00:09:01,320 --> 00:09:05,060 So that was the S in SRTBOT. 177 00:09:05,060 --> 00:09:13,160 Now I claim that set subproblems is enough to do a relation-- 178 00:09:13,160 --> 00:09:15,480 recursive relation among them. 179 00:09:15,480 --> 00:09:18,252 So I'd like to solve every subproblems L i, j. 180 00:09:21,920 --> 00:09:23,780 Relation is actually pretty simple, 181 00:09:23,780 --> 00:09:26,930 but it's maybe not so obvious. 
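As a small illustration of "multiplying the subproblem spaces", the sketch below enumerates every (i, j) pair of suffix start positions, including the empty suffixes. The function name `lcs_subproblems` is hypothetical, and the two words are just sample inputs.

```python
from itertools import product


def lcs_subproblems(a, b):
    """List every subproblem (i, j): suffix a[i:] paired with suffix b[j:]."""
    # i ranges over 0..len(a) and j over 0..len(b); the upper ends are the
    # empty suffixes, which serve as base cases.
    return list(product(range(len(a) + 1), range(len(b) + 1)))


pairs = lcs_subproblems("their", "habit")
print(len(pairs))  # (5 + 1) * (5 + 1) = 36
```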
182 00:09:26,930 --> 00:09:30,800 So the idea is, because we're looking at suffixes, 183 00:09:30,800 --> 00:09:32,750 we should always think about what 184 00:09:32,750 --> 00:09:36,500 happens in the first letter, because if we 185 00:09:36,500 --> 00:09:39,118 remove that first letter, then we get a smaller suffix. 186 00:09:39,118 --> 00:09:40,910 If you're doing prefixes, you should always 187 00:09:40,910 --> 00:09:41,910 look at the last letter. 188 00:09:41,910 --> 00:09:44,910 Either one would work for this problem. 189 00:09:44,910 --> 00:09:45,910 So we're doing suffixes. 190 00:09:45,910 --> 00:09:49,480 So we look at A of i and we look at B of j. 191 00:09:54,330 --> 00:09:58,290 That's the first letter in the suffix A starting at i 192 00:09:58,290 --> 00:10:01,260 and the suffix B starting at j. 193 00:10:01,260 --> 00:10:02,350 And there are two cases. 194 00:10:02,350 --> 00:10:06,226 They could be equal or different. 195 00:10:09,960 --> 00:10:11,760 I think the easier case to think about is 196 00:10:11,760 --> 00:10:12,720 when they're different. 197 00:10:12,720 --> 00:10:16,490 So like in hieroglyphology and Michelangelo, 198 00:10:16,490 --> 00:10:18,768 if we look at the whole string, say, the first letter 199 00:10:18,768 --> 00:10:21,060 in the top one is H. The first letter in the second one 200 00:10:21,060 --> 00:10:22,920 is M. Those are different letters, 201 00:10:22,920 --> 00:10:26,820 so clearly, one of those letters is not 202 00:10:26,820 --> 00:10:29,850 in the common subsequence, because they don't match. 203 00:10:29,850 --> 00:10:34,470 I can't start with an H. Well, I can start with an H, 204 00:10:34,470 --> 00:10:38,000 I could start with an M, but I can't start with both. 205 00:10:38,000 --> 00:10:39,750 One of those letters is not in the output. 206 00:10:39,750 --> 00:10:44,950 In this example, it's M. But I don't know which one, 207 00:10:44,950 --> 00:10:46,990 so I have this question. 
208 00:10:46,990 --> 00:10:49,120 I want to identify some question. 209 00:10:49,120 --> 00:10:51,810 And the question is, should I-- 210 00:10:51,810 --> 00:10:54,450 do I know that the H is not in the answer or do 211 00:10:54,450 --> 00:10:56,490 I know that M is not in the answer-- 212 00:10:56,490 --> 00:10:59,070 the final longest common subsequence? 213 00:10:59,070 --> 00:11:02,160 We don't know which, so we'll just try both. 214 00:11:02,160 --> 00:11:04,608 And then we're trying to maximize 215 00:11:04,608 --> 00:11:06,150 the length of our common subsequence, 216 00:11:06,150 --> 00:11:18,660 so we'll take the max of L i plus 1 j and L i, j minus 1. 217 00:11:21,280 --> 00:11:24,970 So the intuition here is one of-- 218 00:11:28,270 --> 00:11:36,105 at least one of Ai and Bj is not in the LCS. 219 00:11:40,100 --> 00:11:41,480 Got this wrong. 220 00:11:41,480 --> 00:11:45,530 j plus 1-- sorry-- 221 00:11:45,530 --> 00:11:47,010 thinking about substrings already. 222 00:11:47,010 --> 00:11:47,510 Yeah. 223 00:11:47,510 --> 00:11:50,070 These are the beginning points, so I want exclude the i-th 224 00:11:50,070 --> 00:11:50,830 letter-- 225 00:11:50,830 --> 00:11:53,640 so if Ai is not in, then I want to look at the suffix 226 00:11:53,640 --> 00:11:54,720 starting at i plus 1. 227 00:11:54,720 --> 00:11:56,850 If Bj is not in, then I want to look at the suffix 228 00:11:56,850 --> 00:11:58,410 starting at j plus 1. 229 00:11:58,410 --> 00:12:00,240 So the indices are always increasing 230 00:12:00,240 --> 00:12:03,580 in the function calls. 231 00:12:03,580 --> 00:12:06,090 And the other case is that they're equal. 232 00:12:06,090 --> 00:12:08,995 So this one I have a little bit harder time arguing. 233 00:12:08,995 --> 00:12:10,870 I'm going to write the answer, and then prove 234 00:12:10,870 --> 00:12:12,010 that the answer is correct. 235 00:12:14,680 --> 00:12:17,870 Here I claim you don't need to make any choices. 
236 00:12:17,870 --> 00:12:20,020 There's no question you need to answer. 237 00:12:20,020 --> 00:12:23,860 You can actually guarantee that Ai and Bj 238 00:12:23,860 --> 00:12:27,430 might as well be in the longest common subsequence. 239 00:12:27,430 --> 00:12:31,090 And so I get one point for that and then I recurse 240 00:12:31,090 --> 00:12:32,500 on all the remaining letters-- 241 00:12:32,500 --> 00:12:35,470 so from i plus 1 on and from j plus 1 on. 242 00:12:35,470 --> 00:12:36,940 Why is this OK? 243 00:12:36,940 --> 00:12:42,040 Well, we have A, B. We're starting at position i for A, 244 00:12:42,040 --> 00:12:47,440 and starting at position j for B. 245 00:12:47,440 --> 00:12:49,378 Think of some optimal solution, some longest 246 00:12:49,378 --> 00:12:50,170 common subsequence. 247 00:12:50,170 --> 00:12:54,490 So it pairs up letters in some way. 248 00:12:54,490 --> 00:12:57,850 This would be some non-crossing pairing between equal letters. 249 00:13:01,840 --> 00:13:03,880 So first case is that maybe i and j 250 00:13:03,880 --> 00:13:05,320 aren't paired with anything. 251 00:13:05,320 --> 00:13:06,730 Well, that's silly, because if they're not 252 00:13:06,730 --> 00:13:08,020 paired with anything, you have some pairing 253 00:13:08,020 --> 00:13:09,062 on the rest of the items. 254 00:13:09,062 --> 00:13:11,715 You can add this pair, and that would be a longer subsequence. 255 00:13:11,715 --> 00:13:13,090 So that would be a contradiction. 256 00:13:13,090 --> 00:13:16,510 If we're taking-- imagining some hypothetical optimal solution, 257 00:13:16,510 --> 00:13:20,050 it has to pair one of these with something. 258 00:13:20,050 --> 00:13:23,140 Maybe it pairs i with something else, though. 259 00:13:23,140 --> 00:13:25,690 Well, if we have a longest common subsequence 260 00:13:25,690 --> 00:13:31,930 that looks like that, I can just instead pair Ai with Bj. 
261 00:13:31,930 --> 00:13:35,800 If I had this pairing, I'm actually not using any 262 00:13:35,800 --> 00:13:38,510 of these letters, so why don't I just use this letter instead? 263 00:13:38,510 --> 00:13:41,052 So you can argue there is the longest common subsequence that 264 00:13:41,052 --> 00:13:43,630 matches Ai with Bj, and so then we 265 00:13:43,630 --> 00:13:46,600 can guarantee by that little proof 266 00:13:46,600 --> 00:13:50,200 that we get one point for matching them up-- 267 00:13:50,200 --> 00:13:52,660 that we don't have to max this with anything. 268 00:13:52,660 --> 00:13:55,000 OK, so two cases-- 269 00:13:55,000 --> 00:13:57,430 pretty simple formula. 270 00:13:57,430 --> 00:13:59,000 And then we're basically done. 271 00:13:59,000 --> 00:14:01,270 We just need to fill in the rest of SRTBOT. 272 00:14:01,270 --> 00:14:03,500 So next is topological order. 273 00:14:09,230 --> 00:14:12,040 So this I'll write as for loops. 274 00:14:20,493 --> 00:14:21,910 Because I'm dealing with suffixes, 275 00:14:21,910 --> 00:14:23,890 we want to start with the empty suffixes, 276 00:14:23,890 --> 00:14:26,080 and then work our way to larger and larger suffixes. 277 00:14:26,080 --> 00:14:27,430 So this might seem backwards. 278 00:14:27,430 --> 00:14:29,863 If you're doing prefixes, it would be an increasing order. 279 00:14:29,863 --> 00:14:31,030 There's all sorts of orders. 280 00:14:31,030 --> 00:14:32,410 You could flip the i and j. 281 00:14:32,410 --> 00:14:34,540 It's very symmetric, so it doesn't really matter. 282 00:14:34,540 --> 00:14:39,670 But anything that's generally decreasing i and j is good. 283 00:14:39,670 --> 00:14:41,080 Then we have base cases. 284 00:14:43,750 --> 00:14:46,660 These are when one of the sequences is empty. 285 00:14:46,660 --> 00:14:50,080 I don't care how many items are in B, but if A has no items, 286 00:14:50,080 --> 00:14:52,030 there's no common subsequence. 287 00:14:52,030 --> 00:14:53,350 It's empty. 
288 00:14:53,350 --> 00:14:57,670 And same for no matter how big A is, 289 00:14:57,670 --> 00:14:59,890 if I have exhausted the B string-- 290 00:14:59,890 --> 00:15:02,200 I start from beyond the last item-- 291 00:15:02,200 --> 00:15:05,020 then I should get 0. 292 00:15:05,020 --> 00:15:12,670 Then the original problem we want to solve is L 0, 0-- 293 00:15:12,670 --> 00:15:15,520 that's just the longest common subsequence 294 00:15:15,520 --> 00:15:18,340 of the entire A and the entire B-- 295 00:15:18,340 --> 00:15:19,870 and time. 296 00:15:22,630 --> 00:15:26,060 OK, so for time, we need to know how many subproblems there are. 297 00:15:26,060 --> 00:15:30,220 It's |A| plus 1 times |B| plus 1. 298 00:15:30,220 --> 00:15:33,760 I'll just call that theta of |A| times |B|. Assume these are not 299 00:15:33,760 --> 00:15:36,290 empty sequences. 300 00:15:36,290 --> 00:15:40,150 So this is the number of subproblems. 301 00:15:40,150 --> 00:15:43,240 And then what we care about is, how much time do we 302 00:15:43,240 --> 00:15:46,300 spend for each subproblem in evaluating this recurrence 303 00:15:46,300 --> 00:15:47,060 relation? 304 00:15:47,060 --> 00:15:49,090 So we ignore the cost to recursively call 305 00:15:49,090 --> 00:15:51,510 these L's, because they are smaller subproblems. 306 00:15:51,510 --> 00:15:53,260 They're already dealt with when I multiply 307 00:15:53,260 --> 00:15:54,860 by the number of subproblems. 308 00:15:54,860 --> 00:15:58,090 So I just care about this max computation and this equality 309 00:15:58,090 --> 00:16:02,830 check, and we'll say those each cost constant time. 310 00:16:02,830 --> 00:16:09,360 So this is quadratic time. 311 00:16:09,360 --> 00:16:11,430 If the two strings are size n, this is n squared. 312 00:16:11,430 --> 00:16:14,700 In general, it's the product of the two string sizes. 313 00:16:14,700 --> 00:16:18,030 And that's longest common subsequence-- 314 00:16:18,030 --> 00:16:22,260 so pretty straightforward. 
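The SRTBOT description above can be sketched as a bottom-up table fill. This is one possible rendering under stated assumptions, not code from the lecture: the name `lcs_length` and the table layout `L[i][j]` (for the suffixes `a[i:]` and `b[j:]`) are choices made here for illustration.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b, via suffix subproblems."""
    n, m = len(a), len(b)
    # L[i][j] is the LCS length of a[i:] and b[j:].
    # Row n and column m are the base cases (an empty suffix), left at 0.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    # Topological order: decreasing i and decreasing j.
    for i in reversed(range(n)):
        for j in reversed(range(m)):
            if a[i] == b[j]:
                # Equal first letters: match them and recurse on both suffixes.
                L[i][j] = 1 + L[i + 1][j + 1]
            else:
                # At least one of a[i], b[j] is unused: try dropping each.
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Original problem: the two full sequences.
    return L[0][0]


print(lcs_length("hieroglyphology", "michelangelo"))  # 5, e.g. "hello"
```

The two nested loops do constant work per cell, which matches the count of (|A| + 1)(|B| + 1) subproblems times constant non-recursive work.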
315 00:16:22,260 --> 00:16:24,360 Other than this little argument understanding 316 00:16:24,360 --> 00:16:26,670 the case when they're equal, the easy case where 317 00:16:26,670 --> 00:16:30,525 they're unequal, we just try the only two things we could do. 318 00:16:30,525 --> 00:16:34,620 One of Ai and Bj is not in the longest common subsequence, 319 00:16:34,620 --> 00:16:38,280 so what I like to say is we guess which of those 320 00:16:38,280 --> 00:16:40,470 is the correct one that is not in the longest 321 00:16:40,470 --> 00:16:41,550 common subsequence. 322 00:16:41,550 --> 00:16:43,980 And if we guess that it's in A, then we'll 323 00:16:43,980 --> 00:16:44,970 recurse on that side. 324 00:16:44,970 --> 00:16:48,000 If we guess that it's in j, then we'll recurse on-- 325 00:16:48,000 --> 00:16:49,190 by increasing j. 326 00:16:52,890 --> 00:16:54,210 I'd like to assume-- 327 00:16:54,210 --> 00:16:57,330 not really-- that we always make the correct guess, 328 00:16:57,330 --> 00:16:59,580 that we made the correct choice, whether it's i or j. 329 00:16:59,580 --> 00:17:01,090 Now, we don't actually know how to do that, 330 00:17:01,090 --> 00:17:02,430 so instead, we brute force. 331 00:17:02,430 --> 00:17:04,859 We try both of them, and then, because we're trying 332 00:17:04,859 --> 00:17:07,260 to maximize, we take the max-- 333 00:17:07,260 --> 00:17:09,839 just another way of thinking about it. 334 00:17:09,839 --> 00:17:11,880 But overall, very straightforward-- the only 335 00:17:11,880 --> 00:17:16,227 added complication here is we had to deal with two sequences 336 00:17:16,227 --> 00:17:18,060 simultaneously, and we just took the product 337 00:17:18,060 --> 00:17:21,450 of those-- pretty easy. 338 00:17:21,450 --> 00:17:23,937 In general, if you have some constant number of sequences, 339 00:17:23,937 --> 00:17:25,020 you can afford to do this. 340 00:17:25,020 --> 00:17:26,490 You'll still get polynomial. 
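To illustrate the remark that a constant number of sequences is still affordable, here is a hedged sketch for three sequences. The lecture does not spell out this recurrence; the version below is the usual extension of the two-sequence relation (match when all three first letters agree, otherwise try dropping one letter from each sequence in turn), and the name `lcs3_length` is hypothetical.

```python
from functools import lru_cache


def lcs3_length(a, b, c):
    """LCS length of three sequences; subproblems are triples of suffixes."""

    @lru_cache(maxsize=None)  # memoize: each (i, j, k) is solved once
    def L(i, j, k):
        # Base case: any exhausted sequence leaves nothing in common.
        if i == len(a) or j == len(b) or k == len(c):
            return 0
        if a[i] == b[j] == c[k]:
            return 1 + L(i + 1, j + 1, k + 1)
        # Otherwise at least one of the three first letters is unused.
        return max(L(i + 1, j, k), L(i, j + 1, k), L(i, j, k + 1))

    return L(0, 0, 0)


print(lcs3_length("their", "habit", "hi"))  # 2
```

The subproblem count is the product of the three suffix counts, so the running time is cubic for three sequences of similar length. The recursive form is only meant for short inputs, since deep recursion would hit Python's recursion limit.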
341 00:17:26,490 --> 00:17:28,283 But of course, once you go to n sequences, 342 00:17:28,283 --> 00:17:29,200 you can't afford this. 343 00:17:29,200 --> 00:17:31,810 You would get n to the n behavior, 344 00:17:31,810 --> 00:17:34,620 so that's a limit to how far you could go. 345 00:17:34,620 --> 00:17:36,480 Two sequences is fine, three sequences 346 00:17:36,480 --> 00:17:38,190 is fine, but n sequences-- 347 00:17:38,190 --> 00:17:41,580 there probably is no polynomial time algorithm 348 00:17:41,580 --> 00:17:43,548 for this problem. 349 00:17:43,548 --> 00:17:45,090 Cool-- I want to show you an example. 350 00:17:47,710 --> 00:17:50,050 I have an example here. 351 00:17:50,050 --> 00:17:52,930 I didn't want to try out hieroglyphology 352 00:17:52,930 --> 00:17:56,450 versus Michelangelo, so I came up with another example. 353 00:17:56,450 --> 00:18:00,100 Their habit is to say hi. 354 00:18:00,100 --> 00:18:02,470 So the longest common subsequence of their and habit 355 00:18:02,470 --> 00:18:04,450 is HI. 356 00:18:04,450 --> 00:18:06,460 And it's a giant-- 357 00:18:06,460 --> 00:18:09,850 well, not that giant-- it looks kind of like a grid graph. 358 00:18:09,850 --> 00:18:13,630 The base cases are out here, because those correspond-- 359 00:18:16,330 --> 00:18:17,830 each of these nodes is a subproblem, 360 00:18:17,830 --> 00:18:21,070 and this corresponds to, what is the longest common subsequence 361 00:18:21,070 --> 00:18:25,720 between EIR and ABIT? 362 00:18:25,720 --> 00:18:27,900 And it should be I. It's the only letter they have 363 00:18:27,900 --> 00:18:29,650 in common, and that's why there's a 1 here 364 00:18:29,650 --> 00:18:31,840 to say that the longest common subsequence has 365 00:18:31,840 --> 00:18:33,910 a 1-- has size 1. 
366 00:18:33,910 --> 00:18:36,910 The base cases are when either their has been emptied 367 00:18:36,910 --> 00:18:39,880 or when habit has been emptied, so those all have 0's 368 00:18:39,880 --> 00:18:40,990 on the outside. 369 00:18:40,990 --> 00:18:43,240 And then the problem we care about is this one. 370 00:18:43,240 --> 00:18:45,520 It's their versus habit. 371 00:18:45,520 --> 00:18:47,110 Claim is the length is 2. 372 00:18:47,110 --> 00:18:51,640 And what I've drawn in here are what I'll call parent pointers, 373 00:18:51,640 --> 00:18:56,980 like we talked about with BFS, and shortest paths, and so on. 374 00:18:59,530 --> 00:19:02,380 So we had this choice-- 375 00:19:02,380 --> 00:19:04,210 sometimes we had a choice-- 376 00:19:04,210 --> 00:19:07,930 on whether we recursed here or recursed here 377 00:19:07,930 --> 00:19:10,720 was the best thing to do. 378 00:19:10,720 --> 00:19:15,310 I'll draw in red the arrow from L i, j-- sorry-- to L 379 00:19:15,310 --> 00:19:17,847 i, j from one of these in this case. 380 00:19:17,847 --> 00:19:19,430 And in this case, there was no choice, 381 00:19:19,430 --> 00:19:23,650 but I'll still draw in red that arrow. 382 00:19:23,650 --> 00:19:26,550 So these diagonal edges are exactly the cases 383 00:19:26,550 --> 00:19:27,550 where the letters match. 384 00:19:27,550 --> 00:19:31,060 Here H equals H. I equals I here, 385 00:19:31,060 --> 00:19:32,290 so I draw this diagonal edge. 386 00:19:32,290 --> 00:19:34,850 That's that first case, where the letters are equal, 387 00:19:34,850 --> 00:19:37,780 and so I recurse by increasing i and j. 388 00:19:37,780 --> 00:19:39,220 That's why I get a diagonal edge. 389 00:19:39,220 --> 00:19:44,560 There's also one over here, where T equals T. 390 00:19:44,560 --> 00:19:46,270 So for those, we're getting a 1 plus. 391 00:19:46,270 --> 00:19:48,730 You see this 1 is 1 larger than the 0. 392 00:19:48,730 --> 00:19:50,260 This 2 is 1 larger than the 1. 
393 00:19:50,260 --> 00:19:52,180 This 1 is one larger than the 0. 394 00:19:52,180 --> 00:19:55,840 And for every other vertex, we are recursing 395 00:19:55,840 --> 00:19:57,910 this way and this way. 396 00:19:57,910 --> 00:20:01,240 We see what those two numbers are, and we take the max. 397 00:20:01,240 --> 00:20:04,630 So this whole diagram is just filled in by-- 398 00:20:04,630 --> 00:20:08,210 for each position where they're not equal, 399 00:20:08,210 --> 00:20:09,267 we look at the guy below. 400 00:20:09,267 --> 00:20:10,600 We look at the guy to the right. 401 00:20:10,600 --> 00:20:13,390 Those are the slightly smaller subproblems. 402 00:20:13,390 --> 00:20:14,530 We look at those values. 403 00:20:14,530 --> 00:20:15,870 We take the max. 404 00:20:15,870 --> 00:20:18,250 As long as you compute this in a generally 405 00:20:18,250 --> 00:20:21,310 right-to-left and bottom-up fashion, whenever you're 406 00:20:21,310 --> 00:20:22,840 trying to compute a guy, you will 407 00:20:22,840 --> 00:20:25,630 have the-- its predecessors already computed. 408 00:20:25,630 --> 00:20:28,030 That's the topological order of this graph. 409 00:20:28,030 --> 00:20:30,280 And then, at the end, we get our answer, which is 2. 410 00:20:30,280 --> 00:20:33,490 And now, if we pay attention to where we came from-- 411 00:20:33,490 --> 00:20:36,430 for example, this vertex had to come from this direction-- 412 00:20:36,430 --> 00:20:41,800 2 is the max of 2 and 1, so I highlight the 2 edge. 413 00:20:41,800 --> 00:20:46,600 And if I follow this path, there should be a unique path 414 00:20:46,600 --> 00:20:48,040 to some base case. 415 00:20:48,040 --> 00:20:49,630 We don't know which one. 416 00:20:49,630 --> 00:20:52,000 And in this case, the diagonal edges 417 00:20:52,000 --> 00:20:54,070 correspond to my matching letters. 418 00:20:54,070 --> 00:20:59,590 So there's the H here, followed by the I here. 
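The parent pointers in this walkthrough can be sketched in code by recording, for each subproblem, which smaller subproblem its value came from. The names `lcs_with_parents` and `parent` are assumptions made for this illustration, and ties in the max are broken arbitrarily (here toward increasing i), so a different but equally long subsequence could come out on other inputs.

```python
def lcs_with_parents(a, b):
    """Return one longest common subsequence of a and b as a list of items."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    # parent[i][j] is the subproblem that L[i][j] was computed from;
    # base cases (row n, column m) keep parent None.
    parent = [[None] * (m + 1) for _ in range(n + 1)]
    for i in reversed(range(n)):
        for j in reversed(range(m)):
            if a[i] == b[j]:
                L[i][j] = 1 + L[i + 1][j + 1]
                parent[i][j] = (i + 1, j + 1)  # diagonal edge: a match
            elif L[i + 1][j] >= L[i][j + 1]:
                L[i][j] = L[i + 1][j]
                parent[i][j] = (i + 1, j)      # a[i] is left out
            else:
                L[i][j] = L[i][j + 1]
                parent[i][j] = (i, j + 1)      # b[j] is left out
    # Follow the pointers from the original problem to a base case,
    # collecting one letter per diagonal edge.
    result = []
    i, j = 0, 0
    while parent[i][j] is not None:
        ni, nj = parent[i][j]
        if (ni, nj) == (i + 1, j + 1):
            result.append(a[i])
        i, j = ni, nj
    return result


print("".join(lcs_with_parents("their", "habit")))  # hi
```

Storing the pointers costs the same space as the table itself, and the walk visits at most |A| + |B| subproblems.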
419 00:20:59,590 --> 00:21:02,172 And so HI is our longest common subsequence. 420 00:21:02,172 --> 00:21:03,880 In general, we just follow these pointers 421 00:21:03,880 --> 00:21:08,090 backward-- the red pointers-- and we get our answer. 422 00:21:08,090 --> 00:21:10,660 So not only do we compute the length of the LCS, 423 00:21:10,660 --> 00:21:14,740 but we actually can find the LCS using parent pointers. 424 00:21:30,390 --> 00:21:33,670 And this is a concept you can use in most dynamic programs, 425 00:21:33,670 --> 00:21:36,480 including all the ones from today. 426 00:21:36,480 --> 00:21:37,950 OK, any questions about LCS? 427 00:21:42,370 --> 00:21:45,655 All right-- perfectly clear to everyone in the audience. 428 00:21:48,880 --> 00:21:54,430 Now we move onto longest increasing subsequence, which-- 429 00:21:54,430 --> 00:21:56,650 did I lose a page here? 430 00:21:56,650 --> 00:21:58,630 I just disordered them. 431 00:22:01,140 --> 00:22:03,210 OK, this problem has almost the same name, 432 00:22:03,210 --> 00:22:05,920 but is quite different in behavior-- 433 00:22:05,920 --> 00:22:11,390 longest increasing subsequence-- 434 00:22:15,940 --> 00:22:17,720 LIS, instead of LCS-- 435 00:22:21,690 --> 00:22:25,470 both famous problems, examples of dynamic programming. 436 00:22:25,470 --> 00:22:33,140 Here we're just given one sequence, like carbohydrate. 437 00:22:37,950 --> 00:22:39,510 So this is a sequence of letters, 438 00:22:39,510 --> 00:22:43,620 and I want to find the longest subsequence of this sequence 439 00:22:43,620 --> 00:22:45,930 that is increasing-- strictly increasing, 440 00:22:45,930 --> 00:22:50,910 let's say-- so in this case, alphabetically. 441 00:22:50,910 --> 00:22:53,760 I could include CR, for example, but not CB. 442 00:22:53,760 --> 00:22:57,960 That would be a descending subsequence. 443 00:22:57,960 --> 00:23:00,840 In this example, the right answer-- 444 00:23:00,840 --> 00:23:04,825 or a right answer is abort. 
445 00:23:04,825 --> 00:23:07,200 There aren't very many English words that are increasing, 446 00:23:07,200 --> 00:23:11,040 but there are some, and I looked through all of them. 447 00:23:11,040 --> 00:23:12,930 As I just implemented this dynamic program 448 00:23:12,930 --> 00:23:16,500 we're about to write down, it took me like two minutes 449 00:23:16,500 --> 00:23:18,240 to write down the DP, and then more 450 00:23:18,240 --> 00:23:22,540 work to read the dictionary and look for cool examples. 451 00:23:22,540 --> 00:23:28,680 So in general, we're given some sequence A, 452 00:23:28,680 --> 00:23:34,350 and we want to find the longest increasing subsequence of A-- 453 00:23:34,350 --> 00:23:38,345 the longest sequence that is increasing, strictly. 454 00:23:42,350 --> 00:23:45,210 We could use the same thing to solve not strictly increasing, 455 00:23:45,210 --> 00:23:45,710 but-- 456 00:23:48,680 --> 00:23:51,170 so here things are going to be a little trickier. 457 00:23:51,170 --> 00:23:53,280 It's easy, in that we just have a single sequence. 458 00:23:53,280 --> 00:23:56,360 So again, we think, OK, let's look at our chart here. 459 00:23:56,360 --> 00:23:58,370 We could try prefixes, suffixes, or substrings. 460 00:23:58,370 --> 00:24:00,320 I personally prefer suffixes. 461 00:24:00,320 --> 00:24:02,630 Jason prefers prefixes. 462 00:24:02,630 --> 00:24:09,317 Whatever you prefer is fine, but always, generally, start there, 463 00:24:09,317 --> 00:24:11,150 because there's nothing in this problem that 464 00:24:11,150 --> 00:24:13,433 makes me think I need to delete things from both ends. 465 00:24:13,433 --> 00:24:14,600 AUDIENCE: I have a question. 466 00:24:14,600 --> 00:24:15,500 ERIK DEMAINE: Yeah-- question? 467 00:24:15,500 --> 00:24:17,708 AUDIENCE: Isn't the answer to this problem always 26? 468 00:24:20,560 --> 00:24:24,230 ERIK DEMAINE: Is the answer always, at most, 26? 
469 00:24:24,230 --> 00:24:27,110 Yes, if you're dealing with English words-- 470 00:24:27,110 --> 00:24:30,440 so when I say sequence here, this is a sequence of arbitrary 471 00:24:30,440 --> 00:24:31,520 integers-- 472 00:24:31,520 --> 00:24:32,430 word size integers. 473 00:24:32,430 --> 00:24:34,220 So there you can have a ton of variety. 474 00:24:34,220 --> 00:24:37,760 This is just for the fun of examples, I've drawn this. 475 00:24:37,760 --> 00:24:40,490 But even if the answer is 26, finding that longest 476 00:24:40,490 --> 00:24:42,680 increasing subsequence is-- 477 00:24:42,680 --> 00:24:46,340 the obvious algorithm would be to take all subsequences of size 478 00:24:46,340 --> 00:24:49,620 26, which is n to the 26. 479 00:24:49,620 --> 00:24:51,920 We're going to do much faster than that here. 480 00:24:51,920 --> 00:24:53,240 N squared time. 481 00:24:53,240 --> 00:24:56,810 And then, if you remove the strictly increasing, 482 00:24:56,810 --> 00:24:59,840 then it can be arbitrarily large. 483 00:24:59,840 --> 00:25:04,310 OK, so let's try to do this. 484 00:25:07,760 --> 00:25:12,255 Maybe I won't be so pessimistic to write attempt here. 485 00:25:12,255 --> 00:25:13,130 Let's just go for it. 486 00:25:15,660 --> 00:25:18,830 So I want some subproblems, and I'm going to choose suffixes. 487 00:25:18,830 --> 00:25:22,250 So I'm going to define L of i to be the longest increasing 488 00:25:22,250 --> 00:25:26,870 subsequence of the suffix of A starting at i. 489 00:25:26,870 --> 00:25:30,350 That's the obvious thing to do. 490 00:25:30,350 --> 00:25:33,200 And now I'm going to leave myself a little space, 491 00:25:33,200 --> 00:25:38,200 and then I'd like a relation on these. 492 00:25:38,200 --> 00:25:43,260 So I'd like to say what L of i is. 493 00:25:43,260 --> 00:25:44,670 And what do I have to work with? 494 00:25:44,670 --> 00:25:47,680 Well, I have L of things larger than i. 495 00:25:47,680 --> 00:25:51,570 Those would be smaller suffixes.
496 00:25:51,570 --> 00:25:55,200 But let's go back to, what is a question that I could 497 00:25:55,200 --> 00:26:00,120 ask about this subproblem that might help me figure out 498 00:26:00,120 --> 00:26:02,710 what the longest increasing subsequence looks like? 499 00:26:02,710 --> 00:26:03,885 So we're looking at a-- 500 00:26:06,420 --> 00:26:09,630 here's A from i on. 501 00:26:09,630 --> 00:26:11,970 Longest increasing subsequence is some subsequence. 502 00:26:11,970 --> 00:26:13,857 And we'd like to remove letter i. 503 00:26:13,857 --> 00:26:15,690 Now, when we do that, there are two choices. 504 00:26:15,690 --> 00:26:18,840 Maybe i is in the longest increasing subsequence, 505 00:26:18,840 --> 00:26:19,807 or it's not in. 506 00:26:19,807 --> 00:26:21,390 So the question I would like to answer 507 00:26:21,390 --> 00:26:28,620 is, is i in the longest increasing subsequence of A-- 508 00:26:28,620 --> 00:26:30,135 of A from i onwards? 509 00:26:33,480 --> 00:26:34,590 This is a binary question. 510 00:26:34,590 --> 00:26:35,680 There are two options-- 511 00:26:35,680 --> 00:26:37,240 so again, just like before. 512 00:26:37,240 --> 00:26:41,340 And so I can brute force those two options. 513 00:26:41,340 --> 00:26:45,510 And then I want the longest one, so I'm going to take the max. 514 00:26:45,510 --> 00:26:49,980 So I'd like to take the max of something like Li plus 1. 515 00:26:49,980 --> 00:26:54,630 So in the case that I don't put i in the solution, that's fine. 516 00:26:54,630 --> 00:26:59,370 Then I just look at i plus 1 on, and recursively compute that, 517 00:26:59,370 --> 00:27:00,660 and that would be my answer. 518 00:27:00,660 --> 00:27:02,520 And the other option is that I do 519 00:27:02,520 --> 00:27:05,130 put i in the longest increasing subsequence, so I do 1 520 00:27:05,130 --> 00:27:11,830 plus the rest L i plus 1. 
521 00:27:11,830 --> 00:27:14,070 If I close this brace, this would 522 00:27:14,070 --> 00:27:16,620 be a very strange recurrence, because this is always 523 00:27:16,620 --> 00:27:18,570 bigger than this one. 524 00:27:18,570 --> 00:27:20,880 There's something wrong here, and the something wrong 525 00:27:20,880 --> 00:27:24,753 is I haven't enforced increasing at all. 526 00:27:24,753 --> 00:27:25,920 There's no constraints here. 527 00:27:25,920 --> 00:27:28,610 It's just saying, well, I'll put i in, and then I'll do 528 00:27:28,610 --> 00:27:31,970 whatever remains, and I'll pray that that's increasing-- 529 00:27:31,970 --> 00:27:37,430 probably won't be, because indeed, if i is a letter-- 530 00:27:37,430 --> 00:27:41,090 or is a number that is strictly greater than i plus 1, 531 00:27:41,090 --> 00:27:42,180 then this will be wrong. 532 00:27:42,180 --> 00:27:46,820 So I really can't always do this. 533 00:27:46,820 --> 00:27:50,000 I could check whether i plus 1 is in the answer, but some-- 534 00:27:50,000 --> 00:27:52,550 but I don't. 535 00:27:52,550 --> 00:27:56,990 I can check whether the letter i is less than letter i plus 1. 536 00:27:56,990 --> 00:27:59,733 But maybe I put this in the longest increasing subsequence 537 00:27:59,733 --> 00:28:02,150 and then I put this in the longest increasing subsequence, 538 00:28:02,150 --> 00:28:03,858 and so I need to compare these two items. 539 00:28:03,858 --> 00:28:06,290 But I don't know when that's going to happen. 540 00:28:06,290 --> 00:28:07,940 Things seem really hard. 541 00:28:07,940 --> 00:28:12,260 And indeed, there's no way, from this subproblem definition, 542 00:28:12,260 --> 00:28:14,910 to write down a relation. 543 00:28:14,910 --> 00:28:19,260 But there is a slight tweak to this definition 544 00:28:19,260 --> 00:28:20,530 that makes it work.
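To see concretely why the unconstrained first attempt fails, here is a small sketch of that recurrence in Python. The function name `first_attempt` is hypothetical, and the code is deliberately wrong in the same way as the recurrence on the board: nothing in it compares items.

```python
def first_attempt(A, i=0):
    """The unconstrained recurrence max(L(i+1), 1 + L(i+1)); deliberately wrong."""
    if i == len(A):
        return 0                  # empty suffix
    rest = first_attempt(A, i + 1)
    # "Leave A[i] out" versus "put A[i] in": no increasing check anywhere,
    # so the second option always wins.
    return max(rest, 1 + rest)
```

For `"cba"`, whose longest strictly increasing subsequence has length 1, `first_attempt("cba")` returns 3. The recurrence simply counts the items in the suffix.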
545 00:28:20,530 --> 00:28:25,500 So the trouble we have here-- and this is the idea 546 00:28:25,500 --> 00:28:29,640 of subproblem constraints or conditions-- 547 00:28:29,640 --> 00:28:32,010 the trouble we have is, when we recursively 548 00:28:32,010 --> 00:28:35,580 compute the longest increasing subsequence on the remainder, 549 00:28:35,580 --> 00:28:40,920 we don't know the first item in that answer. 550 00:28:40,920 --> 00:28:42,030 Maybe it's i plus 1. 551 00:28:42,030 --> 00:28:43,560 Maybe it's some guy over here. 552 00:28:43,560 --> 00:28:45,060 If we knew who it was, then we could 553 00:28:45,060 --> 00:28:47,850 compare that item to item i. 554 00:28:47,850 --> 00:28:53,340 And so what we'd like to do is add a constraint 555 00:28:53,340 --> 00:28:56,790 to the subproblem that somehow lets us know where the longest 556 00:28:56,790 --> 00:28:59,018 increasing subsequence starts. 557 00:28:59,018 --> 00:29:01,560 So what I would like to say is longest increasing subsequence 558 00:29:01,560 --> 00:29:12,390 of that suffix that starts with A of i. 559 00:29:12,390 --> 00:29:16,060 So in other words, it includes A of i. 560 00:29:19,660 --> 00:29:21,025 This was a separate question. 561 00:29:24,430 --> 00:29:26,410 OK, this is a bit of a funny constraint. 562 00:29:26,410 --> 00:29:27,880 It changes the problem. 563 00:29:27,880 --> 00:29:30,280 It's no longer what we want to solve. 564 00:29:30,280 --> 00:29:34,510 If you think about the original problem, 565 00:29:34,510 --> 00:29:36,662 before, it was solving L of 0. 566 00:29:36,662 --> 00:29:38,620 We just want the longest increasing subsequence 567 00:29:38,620 --> 00:29:39,710 of the whole thing. 568 00:29:39,710 --> 00:29:42,150 Now it's not necessarily L of 0. 569 00:29:42,150 --> 00:29:44,800 L of 0 means, what is the longest increasing subsequence 570 00:29:44,800 --> 00:29:50,800 of the whole sequence A that includes the first letter of A?
571 00:29:50,800 --> 00:29:52,933 And maybe we do include the first letter of A, 572 00:29:52,933 --> 00:29:54,850 maybe we don't. We don't know where the longest 573 00:29:54,850 --> 00:29:55,960 increasing subsequence starts. 574 00:29:55,960 --> 00:29:57,168 Here, for example, it didn't. 575 00:29:57,168 --> 00:29:59,970 It started at the second letter. 576 00:29:59,970 --> 00:30:03,450 But conveniently, it's OK that we don't know, 577 00:30:03,450 --> 00:30:05,880 because that's just another question we could 578 00:30:05,880 --> 00:30:08,610 ask is, where do we start? 579 00:30:08,610 --> 00:30:09,982 Where might the LIS start? 580 00:30:09,982 --> 00:30:11,940 Could start at the first letter, second letter, 581 00:30:11,940 --> 00:30:13,770 third letter-- there's only n choices, 582 00:30:13,770 --> 00:30:16,480 and let's just take the max of all of them. 583 00:30:16,480 --> 00:30:19,740 So before I get to the relation, let's solve this problem. 584 00:30:19,740 --> 00:30:24,380 I just want the max of L of i for all i. 585 00:30:29,305 --> 00:30:33,580 I guess we've been writing size of A here. 586 00:30:33,580 --> 00:30:37,062 OK, the maximum will be the overall longest 587 00:30:37,062 --> 00:30:38,020 increasing subsequence. 588 00:30:38,020 --> 00:30:40,480 So if I could find longest increasing subsequence that 589 00:30:40,480 --> 00:30:43,450 includes the first letter, or the longest one that includes 590 00:30:43,450 --> 00:30:44,865 the second letter, and so on-- 591 00:30:44,865 --> 00:30:46,240 so it starts at the first letter, 592 00:30:46,240 --> 00:30:48,520 starts at the second letter-- then this max 593 00:30:48,520 --> 00:30:51,220 will be the longest overall. 594 00:30:51,220 --> 00:30:53,358 This subproblem is not what I really wanted, 595 00:30:53,358 --> 00:30:55,150 but it's still good enough, because it lets 596 00:30:55,150 --> 00:30:57,190 me solve my original problem.
597 00:30:57,190 --> 00:31:02,020 And this is adding an extra constraint to my problem. 598 00:31:02,020 --> 00:31:04,840 And doing this is challenging. 599 00:31:04,840 --> 00:31:07,660 Thinking about what the right constraint that would let you 600 00:31:07,660 --> 00:31:11,360 solve your problem is tricky, especially in the beginning. 601 00:31:11,360 --> 00:31:13,840 But for now, let's just take this as a thing that works. 602 00:31:13,840 --> 00:31:15,250 Why does it work? 603 00:31:15,250 --> 00:31:18,050 Because now I can say-- 604 00:31:18,050 --> 00:31:20,620 well, this term was fine, max-- 605 00:31:20,620 --> 00:31:23,680 so I'm trying to write longest increasing subsequence starting 606 00:31:23,680 --> 00:31:25,900 with the i-th letter, versus-- 607 00:31:28,610 --> 00:31:29,750 yeah, actually, no. 608 00:31:29,750 --> 00:31:31,442 This is just going to be different. 609 00:31:34,220 --> 00:31:37,940 OK, so now I get to the central issue, which is I 610 00:31:37,940 --> 00:31:42,050 know, by the definition of L of i, that I include letter i. 611 00:31:42,050 --> 00:31:44,690 This is going to be in my longest increasing subsequence. 612 00:31:44,690 --> 00:31:49,320 That's what I'm looking for, this definition. 613 00:31:49,320 --> 00:31:52,020 But I don't know what the second letter is-- 614 00:31:52,020 --> 00:31:53,910 could be i plus 1, could be i plus 2, 615 00:31:53,910 --> 00:31:56,530 could be anything bigger. 616 00:31:56,530 --> 00:31:58,240 Whenever there's something I don't know, 617 00:31:58,240 --> 00:32:00,940 I'll just brute force it. 618 00:32:00,940 --> 00:32:03,010 I don't care that I don't know. 619 00:32:03,010 --> 00:32:05,470 I'll just take a max over all the possible choices. 620 00:32:05,470 --> 00:32:08,740 Let's say that the next letter included in the longest 621 00:32:08,740 --> 00:32:11,590 increasing subsequence i is j. 622 00:32:11,590 --> 00:32:15,460 Then I would like to look at L of j. 
623 00:32:15,460 --> 00:32:17,860 Now, I don't know what j is, but I'll just brute force 624 00:32:17,860 --> 00:32:19,330 all possible choices for j. 625 00:32:19,330 --> 00:32:22,150 So that's going to be i strictly less than j, 626 00:32:22,150 --> 00:32:26,290 because I don't want to include the same letter i again. 627 00:32:26,290 --> 00:32:28,600 And otherwise, I would get an infinite recursive loop, 628 00:32:28,600 --> 00:32:31,330 if I put less than or equal to here. 629 00:32:31,330 --> 00:32:36,420 And maybe I don't do anything else to n. 630 00:32:36,420 --> 00:32:39,420 OK, not quite done-- 631 00:32:39,420 --> 00:32:41,880 now-- this is the interesting part-- 632 00:32:41,880 --> 00:32:45,270 I can enforce increasing, because I can't just choose 633 00:32:45,270 --> 00:32:48,000 any letter j to the right of i. 634 00:32:48,000 --> 00:32:52,080 Also, the number-- or letter that's written in here 635 00:32:52,080 --> 00:32:54,360 has to be greater than the number that's written here. 636 00:32:54,360 --> 00:32:56,980 That's the strictly increasing property. 637 00:32:56,980 --> 00:33:06,820 So I can add as a constraint in this max to say A of i 638 00:33:06,820 --> 00:33:08,960 is strictly less than A of j. 639 00:33:08,960 --> 00:33:11,370 And if you wanted non-strictly increasing, 640 00:33:11,370 --> 00:33:14,480 you would add an equal there. 641 00:33:14,480 --> 00:33:16,010 This is mathematical notation. 642 00:33:16,010 --> 00:33:21,200 In Python, you would say max open paren of this thing for j 643 00:33:21,200 --> 00:33:24,530 in this range, if this holds. 644 00:33:24,530 --> 00:33:27,680 So I'm just doing a for loop, but I only do-- 645 00:33:27,680 --> 00:33:31,160 I only look at the possible choices for j 646 00:33:31,160 --> 00:33:35,550 when the strict increasing property holds. 647 00:33:35,550 --> 00:33:38,420 And then, when that holds, I put-- check this. 648 00:33:38,420 --> 00:33:39,680 Now, this set might be empty.
649 00:33:39,680 --> 00:33:44,030 I need to define what the max is when it's empty. 650 00:33:44,030 --> 00:33:45,770 Oh, I also need a 1 plus, don't I? 651 00:33:49,280 --> 00:33:51,680 Let me just do 1 plus. 652 00:33:51,680 --> 00:33:55,760 So we're told that i is in the answer, so we always get 1. 653 00:33:55,760 --> 00:34:00,560 And then the remainder is this or 0. 654 00:34:06,200 --> 00:34:09,140 If there are no Aj's greater than Ai, 655 00:34:09,140 --> 00:34:14,929 then we'll default to 0 and say that i is the only item 656 00:34:14,929 --> 00:34:18,350 in my increasing subsequence. 657 00:34:18,350 --> 00:34:20,600 OK, so there are a few details to get right, 658 00:34:20,600 --> 00:34:23,570 but in general, once you figure out what these recurrences look 659 00:34:23,570 --> 00:34:25,429 like, they're very simple. 660 00:34:25,429 --> 00:34:30,870 This is one line of code, and then all you need in addition 661 00:34:30,870 --> 00:34:34,260 to this is the original subproblem 662 00:34:34,260 --> 00:34:37,230 and some other things. 663 00:34:37,230 --> 00:34:42,659 We need the base cases, but I should do them in order. 664 00:34:42,659 --> 00:34:48,510 Topological order is just the usual for loop, 665 00:34:48,510 --> 00:34:50,400 because I'm doing suffixes. 666 00:34:50,400 --> 00:34:55,440 It's going to be i equals length of A down to 0. 667 00:34:55,440 --> 00:35:00,925 Base case is going to be L of length 668 00:35:00,925 --> 00:35:03,240 of A, which is 0, because there's 669 00:35:03,240 --> 00:35:06,480 no letters in that suffix. 670 00:35:06,480 --> 00:35:08,955 And we already did O, and then time-- 671 00:35:11,500 --> 00:35:14,030 is a little different from the past examples. 672 00:35:14,030 --> 00:35:15,960 So number of subproblems, just like usual, 673 00:35:15,960 --> 00:35:20,660 is linear length of A subproblems. 
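Putting the pieces stated so far together (constrained subproblem, relation, topological order, base case, original problem), a minimal Python sketch could look like this. The function name `lis_length` is an assumption made for illustration; the `max(... for j in ... if ...)` form follows the lecture's description, and the `default=0` argument plays the role of the "or 0" for an empty set.

```python
def lis_length(A):
    """Length of a longest strictly increasing subsequence of A."""
    n = len(A)
    # L[i] = length of the longest increasing subsequence of A[i:]
    # that starts with A[i]; L[n] = 0 is the base case (empty suffix).
    L = [0] * (n + 1)
    # Topological order: i from the end of A down to 0.
    for i in range(n - 1, -1, -1):
        # A[i] is in the answer (the "1 +"); brute-force the next item j.
        L[i] = 1 + max(
            (L[j] for j in range(i + 1, n) if A[i] < A[j]),
            default=0,  # no larger item to the right: A[i] stands alone
        )
    # Original problem: brute-force where the subsequence starts.
    return max(L)
```

For the lecture's examples, `lis_length("carbohydrate")` and `lis_length("empathy")` both return 5, matching `abort` and `empty`. Changing `<` to `<=` would give the non-strict variant.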
674 00:35:23,360 --> 00:35:25,875 It's only one sequence we're thinking about now, 675 00:35:25,875 --> 00:35:27,710 unlike the previous example. 676 00:35:27,710 --> 00:35:29,510 But the work we're doing in this relation 677 00:35:29,510 --> 00:35:31,040 now is non-trivial work. 678 00:35:31,040 --> 00:35:33,890 Before we were just guessing among two different choices. 679 00:35:33,890 --> 00:35:36,920 Now we're guessing among up to n different choices. 680 00:35:36,920 --> 00:35:42,130 This n here is length of A. 681 00:35:42,130 --> 00:35:49,620 And so we have theta length of A, 682 00:35:49,620 --> 00:36:00,990 non-recursive work that we're doing in each subproblem. 683 00:36:00,990 --> 00:36:06,210 Or you might think of this as choices that we're considering. 684 00:36:06,210 --> 00:36:08,310 And for each choice, we're just spending-- 685 00:36:08,310 --> 00:36:10,870 I mean, we're just taking a max of those items, adding 1. 686 00:36:10,870 --> 00:36:12,610 So that's a constant overhead. 687 00:36:12,610 --> 00:36:17,910 And so we just get this product, which is A squared-- 688 00:36:21,790 --> 00:36:22,960 cool. 689 00:36:22,960 --> 00:36:25,890 So that's longest increasing subsequence. 690 00:36:25,890 --> 00:36:28,820 Make sure I didn't miss anything else-- 691 00:36:28,820 --> 00:36:32,800 so we're using this idea of asking a question, 692 00:36:32,800 --> 00:36:35,320 and guessing or brute forcing the answer 693 00:36:35,320 --> 00:36:39,200 to that question in two places. 694 00:36:39,200 --> 00:36:43,900 One place is we're promising-- we're 695 00:36:43,900 --> 00:36:46,900 requesting that the longest increasing subsequence starts 696 00:36:46,900 --> 00:36:49,000 at i, so then the question is, well, 697 00:36:49,000 --> 00:36:54,450 what is the very second item that's 698 00:36:54,450 --> 00:36:57,360 in the longest increasing subsequence that starts with i? 
699 00:36:57,360 --> 00:37:00,600 We're calling that j, and we're brute 700 00:37:00,600 --> 00:37:03,810 forcing all the possible choices that j could be, 701 00:37:03,810 --> 00:37:06,210 which conveniently lets us check, 702 00:37:06,210 --> 00:37:09,600 confirm that it's actually an increasing subsequence locally 703 00:37:09,600 --> 00:37:10,860 from i to j. 704 00:37:10,860 --> 00:37:13,320 And then L of j will take care of the rest. 705 00:37:13,320 --> 00:37:15,450 By induction, the rest of the longest increasing 706 00:37:15,450 --> 00:37:18,000 subsequence starting at j will also be increasing. 707 00:37:18,000 --> 00:37:20,550 And so this guarantees, by induction, the whole thing 708 00:37:20,550 --> 00:37:22,620 will be increasing. 709 00:37:22,620 --> 00:37:26,190 Then we also use this local brute force 710 00:37:26,190 --> 00:37:27,645 to solve the original problem. 711 00:37:27,645 --> 00:37:29,520 So we added this constraint of starting at i, 712 00:37:29,520 --> 00:37:32,340 but we didn't actually know overall where to start. 713 00:37:32,340 --> 00:37:34,740 But that's fine, because there's only A choices. 714 00:37:34,740 --> 00:37:38,590 So I should mention, in the running time analysis-- 715 00:37:38,590 --> 00:37:40,830 so there's solving the subproblems. 716 00:37:40,830 --> 00:37:42,360 That's fine, but then there's also 717 00:37:42,360 --> 00:37:46,050 a plus whatever it costs to solve the original problem. 718 00:37:46,050 --> 00:37:46,950 But that's OK. 719 00:37:46,950 --> 00:37:50,100 That's length of A. 720 00:37:50,100 --> 00:37:54,900 All of this plus length of A is still length of A squared. 721 00:37:57,970 --> 00:38:00,850 But if you're doing exponential work here, that would be bad. 722 00:38:00,850 --> 00:38:02,890 We have to do some reasonable amount of work 723 00:38:02,890 --> 00:38:05,980 to solve the original problem in terms of all the subproblems. 724 00:38:05,980 --> 00:38:09,700 I have an example hiding here.
725 00:38:12,410 --> 00:38:14,990 This is a little harder to stare at. 726 00:38:14,990 --> 00:38:17,510 Here I have empathy. 727 00:38:17,510 --> 00:38:20,090 And this example is not-- doesn't have much empathy, 728 00:38:20,090 --> 00:38:22,400 because the longest increasing subsequence of empathy 729 00:38:22,400 --> 00:38:23,705 is empty. 730 00:38:27,060 --> 00:38:31,350 Empty is one of the few English words that's increasing. 731 00:38:31,350 --> 00:38:35,970 And the hard part here is drawing the DAG. 732 00:38:35,970 --> 00:38:38,790 It's almost the complete graph, but we only 733 00:38:38,790 --> 00:38:42,780 draw edges from smaller letters to bigger letters. 734 00:38:42,780 --> 00:38:45,510 So we draw from E to M, from E to P, 735 00:38:45,510 --> 00:38:49,620 from E not to A-- there's no edge from E to A-- from E 736 00:38:49,620 --> 00:38:53,760 to T, not from E to H, but yes from E to Y. 737 00:38:53,760 --> 00:38:55,710 And then we also draw from E to the base 738 00:38:55,710 --> 00:38:58,620 case, which is there's no more letters. 739 00:38:58,620 --> 00:39:02,491 That edge to the base case is-- 740 00:39:05,648 --> 00:39:10,760 corresponds to this 0, or I guess this n, where we say, 741 00:39:10,760 --> 00:39:12,700 oh, let's just recurse. 742 00:39:12,700 --> 00:39:16,085 Let's just throw away-- 743 00:39:16,085 --> 00:39:18,460 actually, maybe we don't need the union 0 there, in fact, 744 00:39:18,460 --> 00:39:25,060 because we include L of n, which is the empty substring. 745 00:39:25,060 --> 00:39:27,820 Then the definition of L is a little funny. 746 00:39:27,820 --> 00:39:30,410 What does it mean to say you start with A of n? 747 00:39:30,410 --> 00:39:30,910 Hm? 748 00:39:30,910 --> 00:39:32,488 AUDIENCE: If we define A of n. 749 00:39:32,488 --> 00:39:34,280 ERIK DEMAINE: Right, A of n is not defined, 750 00:39:34,280 --> 00:39:35,650 so that's not so nice. 751 00:39:35,650 --> 00:39:39,700 So maybe fix that. 
752 00:39:39,700 --> 00:39:43,565 n equals case. 753 00:39:43,565 --> 00:39:45,190 OK, but I'm still going to draw an edge 754 00:39:45,190 --> 00:39:48,880 there-- conceptually say, oh, we're just done at that point. 755 00:39:48,880 --> 00:39:54,160 That's the base case, where we have no string left-- 756 00:39:54,160 --> 00:39:55,420 cool. 757 00:39:55,420 --> 00:39:58,270 And when I said from to to actually, I meant the reverse. 758 00:39:58,270 --> 00:40:00,410 All the edges go from right to left. 759 00:40:00,410 --> 00:40:02,770 And then what we're doing is looking for the longest 760 00:40:02,770 --> 00:40:05,950 path in this DAG. 761 00:40:05,950 --> 00:40:08,980 Longest path is maybe a problem we've talked about in problem 762 00:40:08,980 --> 00:40:12,730 session, because it's a DAG-- 763 00:40:12,730 --> 00:40:15,250 well, longest path is the same thing as shortest path, 764 00:40:15,250 --> 00:40:16,960 if you just negate all the weights. 765 00:40:16,960 --> 00:40:18,620 There are no weights in this picture, 766 00:40:18,620 --> 00:40:21,700 so if you just put negative 1 on all these edges 767 00:40:21,700 --> 00:40:28,630 and ask for the shortest path from the base to anywhere-- 768 00:40:28,630 --> 00:40:32,050 so single source shortest paths from this base-- 769 00:40:32,050 --> 00:40:36,740 then we would end up getting this path, which, 770 00:40:36,740 --> 00:40:41,880 if you look at it, is E-M-P-T-Y, empty. 771 00:40:41,880 --> 00:40:45,070 And so that shortest path is indeed the right answer. 772 00:40:45,070 --> 00:40:47,340 What I've drawn here is the shortest path tree. 773 00:40:47,340 --> 00:40:49,980 So also, if you wanted the longest increasing subsequence 774 00:40:49,980 --> 00:40:54,630 starting at A, then it is A-T-Y, just by following 775 00:40:54,630 --> 00:40:56,930 the red arrows here. 776 00:40:56,930 --> 00:40:57,930 And how do you get that? 
777 00:40:57,930 --> 00:40:59,700 You just draw the parent pointers, 778 00:40:59,700 --> 00:41:01,020 just like we did before. 779 00:41:01,020 --> 00:41:03,030 I didn't mention, but this example can also 780 00:41:03,030 --> 00:41:05,080 be solved with shortest paths. 781 00:41:05,080 --> 00:41:09,510 Once I construct this graph, you can do the shortest path 782 00:41:09,510 --> 00:41:11,530 from some base-- 783 00:41:11,530 --> 00:41:13,470 I don't know which one-- 784 00:41:13,470 --> 00:41:16,440 to here. 785 00:41:16,440 --> 00:41:21,030 If you put negative 1 on all of the diagonal edges 786 00:41:21,030 --> 00:41:23,520 and you put weight 0 everywhere else, 787 00:41:23,520 --> 00:41:25,560 then that corresponds to-- 788 00:41:25,560 --> 00:41:27,360 the shortest path in that graph will 789 00:41:27,360 --> 00:41:29,010 correspond to the longest-- 790 00:41:29,010 --> 00:41:30,630 the path with the most diagonal edges. 791 00:41:30,630 --> 00:41:32,880 And that makes sense, because the diagonal is where we 792 00:41:32,880 --> 00:41:35,280 actually get letters in common. 793 00:41:35,280 --> 00:41:38,070 And so in this case, it's 2. 794 00:41:38,070 --> 00:41:40,560 So both of these dynamic programs could instead-- 795 00:41:40,560 --> 00:41:42,660 instead of writing them as a recursive thing 796 00:41:42,660 --> 00:41:46,980 with memoization or writing them bottom-up as a for loop 797 00:41:46,980 --> 00:41:49,320 and then doing the computation, you 798 00:41:49,320 --> 00:41:52,410 could instead construct a graph and then 799 00:41:52,410 --> 00:41:55,190 run DAG shortest paths on it. 800 00:41:55,190 --> 00:41:57,170 But the point is these are the same thing. 801 00:41:57,170 --> 00:41:58,890 It's actually a lot simpler to just write 802 00:41:58,890 --> 00:42:00,640 the dynamic programming code, because it's 803 00:42:00,640 --> 00:42:04,240 just a for loop and then a recurrence. 804 00:42:04,240 --> 00:42:06,550 So you're just updating an array. 
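As a sketch of "just a for loop and then a recurrence" that also records parent pointers, the following Python recovers an actual longest increasing subsequence rather than only its length. The name `lis` and the tie-breaking rule (keep the first best choice found) are assumptions made for illustration, not something the lecture specifies.

```python
def lis(A):
    """Return one longest strictly increasing subsequence of A as a list."""
    n = len(A)
    if n == 0:
        return []
    L = [1] * n          # L[i]: best length for a subsequence starting at A[i]
    parent = [None] * n  # parent[i]: index of the next item, None = base case
    for i in range(n - 1, -1, -1):      # suffix order: later indices first
        for j in range(i + 1, n):
            if A[i] < A[j] and 1 + L[j] > L[i]:
                L[i] = 1 + L[j]
                parent[i] = j           # remember which choice of j won
    # Original problem: brute-force the starting index.
    i = max(range(n), key=lambda k: L[k])
    out = []
    while i is not None:                # follow the parent pointers
        out.append(A[i])
        i = parent[i]
    return out
```

On the lecture's example, `"".join(lis("empathy"))` gives `"empty"`, and the pointer chain from the `a` is `a`, `t`, `y`, as in the red arrows described above.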
805 00:42:06,550 --> 00:42:10,240 Because you already know what the topological order is, 806 00:42:10,240 --> 00:42:13,360 you don't have to write a generic depth-first search 807 00:42:13,360 --> 00:42:15,460 algorithm, take the finishing order, 808 00:42:15,460 --> 00:42:21,070 reverse it, and then run this-- 809 00:42:21,070 --> 00:42:25,030 run DAG shortest paths with relaxation-- 810 00:42:25,030 --> 00:42:27,380 much simpler to just write down the recurrence, 811 00:42:27,380 --> 00:42:29,320 once you figured it out. 812 00:42:29,320 --> 00:42:32,140 But they are the same in these examples. 813 00:42:32,140 --> 00:42:34,480 In Fibonacci, for example, you cannot write Fibonacci 814 00:42:34,480 --> 00:42:36,230 as a single source shortest paths problem, 815 00:42:36,230 --> 00:42:41,250 but a lot of DPs you can write as shortest paths problems-- 816 00:42:41,250 --> 00:42:45,480 just a connection to things we've seen. 817 00:42:45,480 --> 00:42:52,440 All right, last example, last problem for today-- 818 00:42:52,440 --> 00:42:54,090 we'll do more next week. 819 00:43:02,350 --> 00:43:05,590 Alternating coin game-- this is a two player game. 820 00:43:05,590 --> 00:43:09,960 We're going to find the optimal strategy in this game. 821 00:43:09,960 --> 00:43:17,560 In general, you have a sequence of coins, 822 00:43:17,560 --> 00:43:20,800 and we have two players. 823 00:43:20,800 --> 00:43:22,520 They take turns. 824 00:43:22,520 --> 00:43:35,700 So given coins of value v0 to v n minus 1-- 825 00:43:35,700 --> 00:43:38,940 so it's a sequence. 826 00:43:38,940 --> 00:43:40,410 They're given in order-- 827 00:43:40,410 --> 00:43:45,930 in some order-- for example, 5, 10, 100, 828 00:43:45,930 --> 00:43:47,460 25-- not necessarily sorted order. 829 00:43:47,460 --> 00:43:50,340 And the rules of the game are we're going to take turns. 830 00:43:50,340 --> 00:43:51,970 I'm going to take turns with you.
831 00:43:51,970 --> 00:43:56,340 I'm going to use I and you to refer to the two players. 832 00:43:56,340 --> 00:44:00,390 And so in each turn, either one-- whoever's turn it is, 833 00:44:00,390 --> 00:44:01,110 I get to-- 834 00:44:01,110 --> 00:44:04,410 we get to choose either the first coin or the last coin 835 00:44:04,410 --> 00:44:06,400 among the coins that remain. 836 00:44:06,400 --> 00:44:08,700 So at the beginning, I can choose 5 or 25. 837 00:44:08,700 --> 00:44:11,070 And I might think, oh, 25's really good. 838 00:44:11,070 --> 00:44:11,995 That's better than 5. 839 00:44:11,995 --> 00:44:12,870 I should choose that. 840 00:44:12,870 --> 00:44:15,100 But then, of course, you're going next, 841 00:44:15,100 --> 00:44:17,490 and you're going to choose 100, and you'll win the game. 842 00:44:17,490 --> 00:44:20,563 You'll get more of the total value of the coins. 843 00:44:20,563 --> 00:44:22,230 So in this example, a better strategy 844 00:44:22,230 --> 00:44:26,290 is to take the 5, because then the 100 is still in the middle. 845 00:44:26,290 --> 00:44:32,280 And so once I take 5, you get to choose 10 or 25. 846 00:44:32,280 --> 00:44:34,170 At this point, you'd probably prefer 25, 847 00:44:34,170 --> 00:44:35,550 because that's better than 10. 848 00:44:35,550 --> 00:44:38,350 But whichever you choose, I can take the 100. 849 00:44:38,350 --> 00:44:43,670 And so I get 105 points, and you're going to get 35 points. 850 00:44:43,670 --> 00:44:47,910 OK-- good example for me. 851 00:44:47,910 --> 00:44:49,740 So that's easy for a simple example, 852 00:44:49,740 --> 00:44:52,260 but in general, there are exponentially many strategies 853 00:44:52,260 --> 00:44:52,760 here. 854 00:44:52,760 --> 00:44:55,610 At each step, either of us could go left or right-- 855 00:44:55,610 --> 00:44:58,000 choose the leftmost or the rightmost.
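The 105-versus-35 outcome can be checked with an exhaustive search over both choices at every turn. This is only a brute-force sketch in Python, exponential in the number of coins, and not the dynamic program the lecture goes on to develop. The function name `play` is hypothetical, and it assumes each player simply maximizes their own total.

```python
def play(coins):
    """Return (mover_total, other_total) under optimal play, by exhaustive search."""
    if not coins:
        return (0, 0)
    # Option 1: take the first coin; the other player then moves on the rest,
    # so the roles in the returned pair are swapped.
    mover, other = play(coins[1:])
    take_first = (coins[0] + other, mover)
    # Option 2: take the last coin instead.
    mover, other = play(coins[:-1])
    take_last = (coins[-1] + other, mover)
    # Tuples compare by first entry, so this keeps the option that gives
    # the player to move the larger total.
    return max(take_first, take_last)
```

On the lecture's example, `play([5, 10, 100, 25])` returns `(105, 35)`: the first player takes the 5 and ends up with the 100.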
856 00:44:58,000 --> 00:45:00,620 And we're going to give a dynamic programming algorithm 857 00:45:00,620 --> 00:45:04,710 that just solves this fast. 858 00:45:04,710 --> 00:45:08,250 I didn't mention-- so this algorithm is quadratic time, 859 00:45:08,250 --> 00:45:10,430 but it can be made n log n time. 860 00:45:10,430 --> 00:45:11,940 It's a fun exercise. 861 00:45:11,940 --> 00:45:14,460 Using a lot of the data structure augmentation stuff 862 00:45:14,460 --> 00:45:17,760 we've done, you can make this n log n. 863 00:45:17,760 --> 00:45:21,750 This algorithm, I think, is going to be n squared time. 864 00:45:26,440 --> 00:45:30,670 So I won't write the problem exactly, 865 00:45:30,670 --> 00:45:32,180 but I think you know the rules. 866 00:45:32,180 --> 00:45:36,130 Choose leftmost or rightmost coin, alternating moves. 867 00:45:36,130 --> 00:45:40,420 So I'd like to define some subproblems. 868 00:45:40,420 --> 00:45:43,180 And this is a problem that's very naturally a substring 869 00:45:43,180 --> 00:45:43,960 problem. 870 00:45:43,960 --> 00:45:47,800 If I just looked at suffixes, that would deal great with-- 871 00:45:47,800 --> 00:45:51,340 if I'm deleting coins from the left, but as soon as I delete-- 872 00:45:51,340 --> 00:45:53,440 and if I delete coins only from the right, 873 00:45:53,440 --> 00:45:55,090 that would give me prefixes. 874 00:45:55,090 --> 00:45:57,850 But I'll tell you now, there's no dynamic programming 875 00:45:57,850 --> 00:46:00,490 where the answer is suffixes and prefixes.
876 00:46:00,490 --> 00:46:03,830 You can do suffixes or prefixes, but if you need both, 877 00:46:03,830 --> 00:46:06,700 you almost certainly need substring, because as soon 878 00:46:06,700 --> 00:46:09,880 as I delete the first coin, and then maybe you 879 00:46:09,880 --> 00:46:11,710 take the second coin-- 880 00:46:11,710 --> 00:46:13,750 that's exactly the optimal strategy here-- now 881 00:46:13,750 --> 00:46:15,775 you have an arbitrary substring in the middle. 882 00:46:15,775 --> 00:46:17,650 But substrings are enough, because we're only 883 00:46:17,650 --> 00:46:19,060 deleting from the ends. 884 00:46:19,060 --> 00:46:20,740 We'll look at substrings. 885 00:46:23,500 --> 00:46:27,370 So more precisely-- this is just the intuition-- 886 00:46:27,370 --> 00:46:31,715 we're going to define some generic x of i, 887 00:46:31,715 --> 00:46:44,300 j is going to be what is the maximum total value I 888 00:46:44,300 --> 00:47:01,890 can get from this game, if we play it on coins of value Vi 889 00:47:01,890 --> 00:47:04,290 to Vj. 890 00:47:04,290 --> 00:47:07,340 So that's a substring. 891 00:47:07,340 --> 00:47:10,030 So this is one way to write down the subproblems, 892 00:47:10,030 --> 00:47:13,790 and it's also a good way. 893 00:47:13,790 --> 00:47:21,450 You could write down a relation on this definition 894 00:47:21,450 --> 00:47:22,110 of subproblems. 895 00:47:22,110 --> 00:47:23,280 But I'm low on time. 896 00:47:23,280 --> 00:47:25,690 There's two ways to solve this problem. 897 00:47:25,690 --> 00:47:27,690 This is a reasonable way, exploiting 898 00:47:27,690 --> 00:47:28,830 that the game is zero-sum. 899 00:47:28,830 --> 00:47:32,880 But I'd like to change just a little bit to give you, 900 00:47:32,880 --> 00:47:36,300 I think, what's a cleaner way to solve the problem, which 901 00:47:36,300 --> 00:47:40,430 is to add a third coordinate to my subproblems. 902 00:47:40,430 --> 00:47:42,540 So now it's parameterized by three things.
903 00:47:42,540 --> 00:47:48,150 P here is-- only has two choices. 904 00:47:48,150 --> 00:47:51,390 It's me or you. 905 00:47:51,390 --> 00:47:53,880 And this gets at a point that's maybe not totally 906 00:47:53,880 --> 00:47:55,680 clear from this definition-- 907 00:47:55,680 --> 00:47:59,800 max total value that I can get from these-- this substring 908 00:47:59,800 --> 00:48:00,300 of coins. 909 00:48:00,300 --> 00:48:03,780 But this is not obviously what I need. 910 00:48:03,780 --> 00:48:06,870 So obviously, at the beginning, I want the whole string 911 00:48:06,870 --> 00:48:10,050 and I want to know what my maximum value is-- fine. 912 00:48:10,050 --> 00:48:12,090 And I go first in this game. 913 00:48:12,090 --> 00:48:16,260 I didn't specify, but I do. 914 00:48:16,260 --> 00:48:18,750 [INAUDIBLE] 915 00:48:18,750 --> 00:48:21,720 But as soon as I do a move-- as soon as I take the first coin, 916 00:48:21,720 --> 00:48:22,540 for example-- 917 00:48:22,540 --> 00:48:24,300 it's now your turn. 918 00:48:24,300 --> 00:48:28,140 And so I don't really want to know the maximum total value 919 00:48:28,140 --> 00:48:30,810 that I would get if I go first. 920 00:48:30,810 --> 00:48:37,020 I'd like to say, if player P goes first. 921 00:48:37,020 --> 00:48:39,870 I'd really like to know what happens 922 00:48:39,870 --> 00:48:43,380 in the case where you go first. 923 00:48:43,380 --> 00:48:45,998 So for some of the substrings, I want 924 00:48:45,998 --> 00:48:48,540 to know what happens when you go first, and for some of them, 925 00:48:48,540 --> 00:48:50,373 I want to know what happens when I go first, 926 00:48:50,373 --> 00:48:52,530 because as soon as I make a move, it's your turn. 927 00:48:52,530 --> 00:48:55,530 And so we're going to flip back and forth between P being me 928 00:48:55,530 --> 00:48:57,330 and P being you-- 929 00:48:57,330 --> 00:49:01,810 P-U. So you don't have to parameterize. 
930 00:49:01,810 --> 00:49:03,810 There's a way to write the recurrence otherwise, 931 00:49:03,810 --> 00:49:06,390 but this is, I think, a lot more intuitive, 932 00:49:06,390 --> 00:49:13,340 because now we can do a very simple relation, which 933 00:49:13,340 --> 00:49:16,240 is as follows. 934 00:49:16,240 --> 00:49:18,450 So I'm going to split into two cases. 935 00:49:18,450 --> 00:49:22,220 One is x of i, j, me and the other is x of i, j, you. 936 00:49:25,610 --> 00:49:28,280 So x of i, j, me-- 937 00:49:28,280 --> 00:49:30,905 so I have some substring from i to j. 938 00:49:34,060 --> 00:49:34,970 What could I do? 939 00:49:34,970 --> 00:49:37,570 I could take the first coin or I could take the last coin. 940 00:49:37,570 --> 00:49:38,390 Which should I do? 941 00:49:38,390 --> 00:49:39,700 That's my question. 942 00:49:39,700 --> 00:49:41,140 What is my first move? 943 00:49:41,140 --> 00:49:43,510 Should I take the first coin or the last coin? 944 00:49:43,510 --> 00:49:45,730 So this is my question. 945 00:49:45,730 --> 00:49:47,300 What is the first move? 946 00:49:51,040 --> 00:49:53,870 There are exactly two possible answers to that question, 947 00:49:53,870 --> 00:49:59,290 so we can afford to just brute force them and take the max. 948 00:49:59,290 --> 00:50:02,650 If we're moving, we want the maximum number 949 00:50:02,650 --> 00:50:07,570 of points we can get-- maximum total value of the two choices. 950 00:50:07,570 --> 00:50:10,910 So if I take from the i side, the left side, 951 00:50:10,910 --> 00:50:14,664 that would be x sub i plus 1, j. 952 00:50:14,664 --> 00:50:15,490 Sorry. 953 00:50:15,490 --> 00:50:18,100 And now, crucially, we flip players, 954 00:50:18,100 --> 00:50:21,070 because then it's your turn. 955 00:50:21,070 --> 00:50:25,095 And if I take from the j side, that will make it j minus 1. 956 00:50:25,095 --> 00:50:26,470 This is what I accidentally wrote 957 00:50:26,470 --> 00:50:27,637 at the beginning of lecture.
958 00:50:31,630 --> 00:50:33,790 Also flip players. 959 00:50:33,790 --> 00:50:36,550 So either I shrink on the i side or I shrink on the j side. 960 00:50:36,550 --> 00:50:41,500 Oh, I should add on here the value of the coin that I get, 961 00:50:41,500 --> 00:50:46,900 and add on the value of the coin that I took. 962 00:50:46,900 --> 00:50:49,930 This is an expression inside the max. 963 00:50:49,930 --> 00:50:51,995 That sum. 964 00:50:51,995 --> 00:50:54,370 And if I take the max of those two options, that will give-- 965 00:50:54,370 --> 00:50:59,275 that is my local brute force, the best choice of how many-- 966 00:50:59,275 --> 00:51:00,650 what is the total value of coins 967 00:51:00,650 --> 00:51:03,760 I will get out of the remainder, given that you start, 968 00:51:03,760 --> 00:51:06,640 plus this coin that I took right now in the first step, 969 00:51:06,640 --> 00:51:09,610 and for the two possible choices of what that coin is? 970 00:51:09,610 --> 00:51:14,770 OK, what remains is, how do we define this x of i, j, you. 971 00:51:14,770 --> 00:51:18,820 This is a little bit funnier, but it's conceptually similar. 972 00:51:18,820 --> 00:51:25,150 I'm going to write basically the same thing here, but with me, 973 00:51:25,150 --> 00:51:26,980 instead of you-- because again, it flips. 974 00:51:30,670 --> 00:51:35,390 This is, if you go first, then the very next move will be me. 975 00:51:35,390 --> 00:51:43,080 So this is just the symmetric formula here. 976 00:51:43,080 --> 00:51:45,650 I can even put the braces in-- 977 00:51:45,650 --> 00:51:48,230 so far, the same. 978 00:51:48,230 --> 00:51:52,400 Now, I don't put in the plus Vi and I don't put in the plus Vj 979 00:51:52,400 --> 00:51:54,988 here, because if you're moving, I don't get those points. 980 00:51:54,988 --> 00:51:56,780 So there's an asymmetry in this definition.
981 00:51:56,780 --> 00:51:57,950 You could define it in different ways, 982 00:51:57,950 --> 00:51:59,720 but this is the maximum total value 983 00:51:59,720 --> 00:52:02,420 that I would get if you start. 984 00:52:02,420 --> 00:52:05,120 So in your first move, you get some points, 985 00:52:05,120 --> 00:52:06,990 but I don't get any points out of that. 986 00:52:06,990 --> 00:52:08,240 So there's no plus Vi. 987 00:52:08,240 --> 00:52:09,530 There's no plus Vj. 988 00:52:09,530 --> 00:52:11,840 It's just you either choose the i-th coin 989 00:52:11,840 --> 00:52:13,820 or you choose the j-th coin, and then the coins 990 00:52:13,820 --> 00:52:17,550 that remain for me shrink accordingly. 991 00:52:17,550 --> 00:52:19,895 Now, you're kind of a pain in the ass. 992 00:52:19,895 --> 00:52:21,270 You're an adversary. You're trying 993 00:52:21,270 --> 00:52:24,360 to minimize my score, potentially, because you're 994 00:52:24,360 --> 00:52:25,780 trying to maximize your score. 995 00:52:25,780 --> 00:52:26,860 This is a zero-sum game. 996 00:52:26,860 --> 00:52:30,292 So anything that you get I don't get. 997 00:52:30,292 --> 00:52:31,750 If you want to maximize your score, 998 00:52:31,750 --> 00:52:33,208 you're trying to minimize my score. 999 00:52:33,208 --> 00:52:34,420 These are symmetric things. 1000 00:52:34,420 --> 00:52:37,620 And so if you think for a while, the right thing to put here 1001 00:52:37,620 --> 00:52:39,030 is min. 1002 00:52:39,030 --> 00:52:40,710 From our perspective, we're imagining 1003 00:52:40,710 --> 00:52:43,410 what is the worst case that could happen, no matter 1004 00:52:43,410 --> 00:52:45,070 what you do. 1005 00:52:45,070 --> 00:52:48,250 And we don't have control over what you do, 1006 00:52:48,250 --> 00:52:51,748 and so we'd really like to see, what score would I 1007 00:52:51,748 --> 00:52:53,040 get if you chose the i-th coin? 1008 00:52:53,040 --> 00:52:57,360 What score would I get if you chose the j-th coin?
1009 00:52:57,360 --> 00:53:00,690 And then what we get is going to be the worst of those two 1010 00:53:00,690 --> 00:53:02,290 possibilities. 1011 00:53:02,290 --> 00:53:04,800 So when we get to choose, we're maximizing. 1012 00:53:04,800 --> 00:53:06,870 And this is a general two player phenomenon 1013 00:53:06,870 --> 00:53:09,630 that, when you choose, we end up minimizing, 1014 00:53:09,630 --> 00:53:12,870 because that's the saddest thing that could happen to us. 1015 00:53:12,870 --> 00:53:18,480 OK, this is one way to write a recurrence relation. 1016 00:53:18,480 --> 00:53:25,480 We have, of course, all of SRTBOT to do, 1017 00:53:25,480 --> 00:53:30,100 so the topological order here is in increasing length 1018 00:53:30,100 --> 00:53:31,190 of substring. 1019 00:53:31,190 --> 00:53:35,680 So the T is increasing j minus i. 1020 00:53:35,680 --> 00:53:37,040 Start with empty strings. 1021 00:53:37,040 --> 00:53:46,120 So base case is that x of i, i, me is Vi. 1022 00:53:46,120 --> 00:53:50,600 So here I'm inclusive in both ends in this definition. 1023 00:53:50,600 --> 00:53:52,532 So there is a coin I can take at the end. 1024 00:53:52,532 --> 00:53:54,490 But if you move last and there's one coin left, 1025 00:53:54,490 --> 00:53:57,010 then I don't get it, so it's 0. 1026 00:53:57,010 --> 00:54:02,110 Then we have the original problem that is x i, j, me-- 1027 00:54:02,110 --> 00:54:05,740 sorry-- x 0, n. 1028 00:54:05,740 --> 00:54:07,900 That's the entire coin set, starting with me. 1029 00:54:07,900 --> 00:54:09,550 That was the problem I wanted to do. 1030 00:54:09,550 --> 00:54:13,240 And then the running time we get is the number of subproblems-- 1031 00:54:13,240 --> 00:54:16,870 that's theta n squared, because we're doing substrings-- 1032 00:54:16,870 --> 00:54:19,960 times the amount of non-recursive work I do here. 1033 00:54:19,960 --> 00:54:21,350 That's just a max of two numbers. 1034 00:54:21,350 --> 00:54:22,240 Very simple.
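The recurrence, topological order, base cases, and original problem just described can be sketched bottom-up as follows. This is a minimal sketch under stated assumptions: coins are indexed 0 to n - 1 with both ends inclusive, the two tables `me` and `you` stand for x(i, j, me) and x(i, j, you), and the function name `coin_game` is made up for illustration.

```python
def coin_game(v):
    """Max total value I can collect from coins v if I move first.

    me[i][j]  = value I get on coins v[i..j] (inclusive) when I move first.
    you[i][j] = value I get on coins v[i..j] (inclusive) when you move first.
    """
    n = len(v)
    if n == 0:
        return 0
    me = [[0] * n for _ in range(n)]
    you = [[0] * n for _ in range(n)]

    # Base cases: one coin left. I take it on my turn; I get 0 on yours.
    for i in range(n):
        me[i][i] = v[i]
        you[i][i] = 0

    # Topological order: increasing substring length (increasing j - i).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # My move: take v[i] or v[j], then it is your turn on the rest.
            me[i][j] = max(v[i] + you[i + 1][j], v[j] + you[i][j - 1])
            # Your move: I gain nothing now, and you leave me the worse option.
            you[i][j] = min(me[i + 1][j], me[i][j - 1])

    # Original problem: the whole row of coins, with me moving first.
    return me[0][n - 1]


print(coin_game([5, 10, 100, 25]))  # 105
```

There are two tables of about n squared entries each and every entry costs constant work, which matches the quadratic bound stated above.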
1035 00:54:22,240 --> 00:54:25,210 Constant time. 1036 00:54:25,210 --> 00:54:26,440 So this is quadratic. 1037 00:54:31,100 --> 00:54:32,320 Let me show you an example. 1038 00:54:34,910 --> 00:54:38,050 This is hard to draw, but what I've described here 1039 00:54:38,050 --> 00:54:40,970 is called solution 2 in the notes. 1040 00:54:40,970 --> 00:54:42,010 So here's our sequence-- 1041 00:54:42,010 --> 00:54:43,540 5, 10, 100, 25-- 1042 00:54:43,540 --> 00:54:45,310 in both directions. 1043 00:54:45,310 --> 00:54:48,070 And what we're interested in is all substrings. 1044 00:54:48,070 --> 00:54:53,050 So over here I've written the choice for i. 1045 00:54:53,050 --> 00:54:56,110 So we start at one of these, and if you start here, 1046 00:54:56,110 --> 00:54:58,050 you can't end earlier than there. 1047 00:54:58,050 --> 00:55:01,250 So that's why we're in the upper diagonal of this matrix. 1048 00:55:01,250 --> 00:55:03,557 And then there's two versions of each problem-- 1049 00:55:03,557 --> 00:55:05,140 the white version and the blue version 1050 00:55:05,140 --> 00:55:06,515 just down and to the right of it. 1051 00:55:06,515 --> 00:55:08,050 If you can't see what blue is, this 1052 00:55:08,050 --> 00:55:09,670 is the version where you start. 1053 00:55:09,670 --> 00:55:11,740 This is the version where I start. 1054 00:55:11,740 --> 00:55:14,650 And I've labeled here all of the different numbers. 1055 00:55:14,650 --> 00:55:17,470 Please admire, because this took a long time to draw. 1056 00:55:17,470 --> 00:55:19,510 But in particular, we have 105 here, 1057 00:55:19,510 --> 00:55:22,630 meaning that the maximum points I can get is 105. 1058 00:55:22,630 --> 00:55:26,020 And that's the case because, if we look over there, 1059 00:55:26,020 --> 00:55:31,960 it is the max of these two incoming values 1060 00:55:31,960 --> 00:55:34,010 plus the Vi that I get.
1061 00:55:34,010 --> 00:55:38,290 So either I go to the left and I take that item or I go down 1062 00:55:38,290 --> 00:55:40,870 and I take that item. 1063 00:55:40,870 --> 00:55:47,970 So the option here is I went to the left and took-- 1064 00:55:47,970 --> 00:55:50,820 well, that's going to be tricky to do in time. 1065 00:55:50,820 --> 00:55:53,170 The claim is that the best answer here 1066 00:55:53,170 --> 00:55:58,330 is to go here with the 100 and take the 5, because going down 1067 00:55:58,330 --> 00:56:00,530 corresponds to removing the last item. 1068 00:56:00,530 --> 00:56:02,800 If I went to the left, that corresponds to-- 1069 00:56:02,800 --> 00:56:03,785 sorry-- the first item. 1070 00:56:03,785 --> 00:56:05,410 If I went to the left, that corresponds 1071 00:56:05,410 --> 00:56:07,390 to removing the last item, so my options 1072 00:56:07,390 --> 00:56:14,390 are 10 plus 25, which is 35, versus 100 plus 5. 1073 00:56:14,390 --> 00:56:17,530 105 wins, so that's why there's a red edge here showing 1074 00:56:17,530 --> 00:56:19,460 that was my better choice. 1075 00:56:19,460 --> 00:56:22,390 And in general, if you follow these parent pointers back, 1076 00:56:22,390 --> 00:56:25,855 it gives you the optimal strategy in what you should do. 1077 00:56:25,855 --> 00:56:28,480 First, you should take the 5 is what this is saying, because we 1078 00:56:28,480 --> 00:56:30,310 just clipped off the 5. 1079 00:56:30,310 --> 00:56:31,750 We used to start here, and now we 1080 00:56:31,750 --> 00:56:34,480 start here in this subinterval. 1081 00:56:34,480 --> 00:56:39,280 Then our opponents, to be annoying, will take the 25-- 1082 00:56:39,280 --> 00:56:41,110 doesn't actually matter, I think. 1083 00:56:41,110 --> 00:56:44,710 Then we will take the 100, and then they 1084 00:56:44,710 --> 00:56:48,507 take the 10, and it's game over. 
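The walk back through the table can be sketched in code. As a simplifying assumption, this version does not store parent pointers explicitly; it memoizes x(i, j, p) and then re-evaluates the two options at each step, which recovers the same choices. The name `optimal_play` and the tie-breaking rule (prefer the rightmost coin when both options are equally good) are assumptions for illustration.

```python
from functools import lru_cache


def optimal_play(v):
    """Return the sequence of (player, coin) moves under the recurrence."""

    @lru_cache(maxsize=None)
    def x(i, j, p):
        # Value I collect on coins v[i..j] (inclusive) when player p moves first.
        if i > j:
            return 0  # no coins left
        if p == "me":
            return max(v[i] + x(i + 1, j, "you"), v[j] + x(i, j - 1, "you"))
        return min(x(i + 1, j, "me"), x(i, j - 1, "me"))

    moves = []
    i, j, p = 0, len(v) - 1, "me"
    while i <= j:
        other = "you" if p == "me" else "me"
        if p == "me":
            # I pick whichever end maximizes my total.
            take_left = v[i] + x(i + 1, j, other) > v[j] + x(i, j - 1, other)
        else:
            # You pick whichever end minimizes my total.
            take_left = x(i + 1, j, other) < x(i, j - 1, other)
        if take_left:
            moves.append((p, v[i]))
            i += 1
        else:
            moves.append((p, v[j]))
            j -= 1
        p = other
    return moves


print(optimal_play([5, 10, 100, 25]))
# [('me', 5), ('you', 25), ('me', 100), ('you', 10)]
```

With this tie-breaking the opponent's second move is the 25; taking the 10 instead would leave my total unchanged at 105, consistent with the remark that the choice does not matter.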
1085 00:56:48,507 --> 00:56:50,340 OK, all the numbers here are how many points 1086 00:56:50,340 --> 00:56:52,810 we get-- doesn't say how many points the opponent gets. 1087 00:56:52,810 --> 00:56:54,870 Of course, you could add that as well. 1088 00:56:54,870 --> 00:57:00,060 It's just the total sum minus what we get. 1089 00:57:00,060 --> 00:57:02,160 Now, let me come back to high level here. 1090 00:57:02,160 --> 00:57:07,290 What we're really doing is subproblem expansion, 1091 00:57:07,290 --> 00:57:11,070 and this is an idea that we will expand on next lecture. 1092 00:57:15,790 --> 00:57:18,160 And the idea is that sometimes you 1093 00:57:18,160 --> 00:57:20,200 start with the obvious subproblems of prefixes, 1094 00:57:20,200 --> 00:57:21,310 suffixes, or substrings. 1095 00:57:21,310 --> 00:57:23,018 Here, the obvious version was substrings, 1096 00:57:23,018 --> 00:57:24,700 because we were moving from both ends. 1097 00:57:24,700 --> 00:57:28,295 If you don't know, probably suffixes or prefixes 1098 00:57:28,295 --> 00:57:28,795 are enough. 1099 00:57:31,625 --> 00:57:33,250 So we start there, but sometimes that's 1100 00:57:33,250 --> 00:57:35,710 still not enough subproblems. 1101 00:57:35,710 --> 00:57:38,290 Here, as soon as we made a move, our problem almost 1102 00:57:38,290 --> 00:57:39,790 turned upside-down, because now it's 1103 00:57:39,790 --> 00:57:41,530 your turn, instead of my turn. 1104 00:57:41,530 --> 00:57:43,640 And that was just annoying to deal with, 1105 00:57:43,640 --> 00:57:47,260 and so we could-- whenever you run into a new type of problem, 1106 00:57:47,260 --> 00:57:48,490 just build more subproblems. 1107 00:57:48,490 --> 00:57:50,680 As long as it stays polynomial number, 1108 00:57:50,680 --> 00:57:53,240 we'll get polynomial time. 
1109 00:57:53,240 --> 00:57:55,420 And so here we doubled the number of subproblems 1110 00:57:55,420 --> 00:57:57,280 to just the me case and the you case, 1111 00:57:57,280 --> 00:57:58,990 and that made this recurrence really easy to write. 1112 00:57:58,990 --> 00:58:01,115 In the notes, you'll see a messier way to write it, 1113 00:58:01,115 --> 00:58:02,110 if you don't do that. 1114 00:58:02,110 --> 00:58:03,980 In the examples we'll see next lecture, 1115 00:58:03,980 --> 00:58:06,430 we're going to do a lot more expansion, 1116 00:58:06,430 --> 00:58:10,010 maybe multiplying the number of problems by n or n squared. 1117 00:58:10,010 --> 00:58:12,100 And this will give us-- 1118 00:58:12,100 --> 00:58:17,800 it will let us add more constraints to our subproblems, 1119 00:58:17,800 --> 00:58:21,400 like we did in longest increasing subsequence. 1120 00:58:21,400 --> 00:58:25,340 We added this constraint that we start with a particular item. 1121 00:58:25,340 --> 00:58:28,450 The more subproblems we have, we can consider more constraints, 1122 00:58:28,450 --> 00:58:31,330 because we'll just brute force all the possible constraints 1123 00:58:31,330 --> 00:58:32,560 that could apply. 1124 00:58:32,560 --> 00:58:34,760 Well, we'll see more of that next time. 1125 00:58:34,760 --> 00:58:36,750 That's it for today.