1 00:00:00,000 --> 00:00:02,420 [SQUEAKING] 2 00:00:02,420 --> 00:00:04,356 [RUSTLING] 3 00:00:04,356 --> 00:00:06,776 [CLICKING] 4 00:00:14,425 --> 00:00:15,550 ERIK DEMAINE: Good morning. 5 00:00:15,550 --> 00:00:17,000 AUDIENCE: Morning! 6 00:00:17,000 --> 00:00:19,630 ERIK DEMAINE: Welcome to 006, Introduction to Algorithms, 7 00:00:19,630 --> 00:00:20,650 lecture two. 8 00:00:20,650 --> 00:00:23,860 I am Erik Demaine and I love algorithms. 9 00:00:23,860 --> 00:00:24,640 Are you with me? 10 00:00:24,640 --> 00:00:25,140 [APPLAUSE] 11 00:00:25,140 --> 00:00:27,010 Yeah. 12 00:00:27,010 --> 00:00:28,840 Today, we're not doing algorithms. 13 00:00:28,840 --> 00:00:30,920 No, we're doing data structures. 14 00:00:30,920 --> 00:00:31,420 It's OK. 15 00:00:31,420 --> 00:00:33,640 There's lots of algorithms in each data structure. 16 00:00:33,640 --> 00:00:36,280 It's like multiple algorithms for free. 17 00:00:36,280 --> 00:00:37,780 We're going to talk about sequences, 18 00:00:37,780 --> 00:00:42,310 and sets, and linked lists, and dynamic arrays. 19 00:00:42,310 --> 00:00:43,900 Fairly simple data structures today. 20 00:00:43,900 --> 00:00:46,480 This is the beginning of several data structures 21 00:00:46,480 --> 00:00:48,920 we'll be talking about in the next few lectures. 22 00:00:48,920 --> 00:00:51,580 But before we actually start with one, 23 00:00:51,580 --> 00:00:57,520 let me tell you-- or remind you-- of the difference between 24 00:00:57,520 --> 00:01:00,280 an interface-- which you might call an API 25 00:01:00,280 --> 00:01:03,010 if you're a programmer, or an ADT if you're an ancient 26 00:01:03,010 --> 00:01:04,840 algorithms person like me-- 27 00:01:04,840 --> 00:01:07,202 versus a data structure. 28 00:01:11,450 --> 00:01:14,650 These are useful distinctions. 29 00:01:14,650 --> 00:01:18,490 The idea is that an interface says what you want to do. 30 00:01:18,490 --> 00:01:20,840 A data structure says how you do it. 
31 00:01:20,840 --> 00:01:22,900 So you might call this a specification. 32 00:01:25,600 --> 00:01:27,340 And in the context of data structures, 33 00:01:27,340 --> 00:01:28,700 we're trying to store some data. 34 00:01:28,700 --> 00:01:35,580 So the interface will specify what data you can store, 35 00:01:35,580 --> 00:01:37,440 whereas the data structure will give you 36 00:01:37,440 --> 00:01:40,690 an actual representation and tell you how to store it. 37 00:01:48,140 --> 00:01:50,150 This is pretty boring. 38 00:01:50,150 --> 00:01:51,720 Just storing data is really easy. 39 00:01:51,720 --> 00:01:54,650 You just throw it in a file or something. 40 00:01:54,650 --> 00:01:59,930 What makes it interesting is having operations on that data. 41 00:01:59,930 --> 00:02:05,630 In the interface, you specify what the operations do, 42 00:02:05,630 --> 00:02:16,365 what operations are supported, and in some sense, 43 00:02:16,365 --> 00:02:16,990 what they mean. 44 00:02:22,720 --> 00:02:26,430 And the data structure actually gives you algorithms-- 45 00:02:26,430 --> 00:02:28,390 this is where the algorithms come in-- 46 00:02:28,390 --> 00:02:30,310 for how to support those operations. 47 00:02:42,870 --> 00:02:44,850 All right. 48 00:02:44,850 --> 00:02:51,510 In this class, we're going to focus on two main interfaces 49 00:02:51,510 --> 00:02:55,110 and various special cases of them. 50 00:02:55,110 --> 00:02:58,050 The idea is to separate what you want to do versus how to do it. 51 00:02:58,050 --> 00:03:02,250 Because-- you can think of this as the problem statement. 52 00:03:02,250 --> 00:03:06,870 Yesterday-- or, last class, Jason talked about problems 53 00:03:06,870 --> 00:03:11,400 and defined what a problem was versus algorithmic solutions 54 00:03:11,400 --> 00:03:13,000 to the problem. 
55 00:03:13,000 --> 00:03:15,720 And this is the analogous notion for data structures, where 56 00:03:15,720 --> 00:03:18,540 we want to maintain some data according 57 00:03:18,540 --> 00:03:21,480 to various operations. 58 00:03:21,480 --> 00:03:23,820 The same problem can be solved by many different data 59 00:03:23,820 --> 00:03:24,360 structures. 60 00:03:24,360 --> 00:03:25,440 And we're going to see that. 61 00:03:25,440 --> 00:03:27,000 And different data structures are going 62 00:03:27,000 --> 00:03:28,020 to have different advantages. 63 00:03:28,020 --> 00:03:30,390 They might support some operations faster than others. 64 00:03:30,390 --> 00:03:31,830 And depending on what you actually 65 00:03:31,830 --> 00:03:33,247 use those data structures for, you 66 00:03:33,247 --> 00:03:35,760 choose the right data structure. 67 00:03:35,760 --> 00:03:38,520 But you can maintain the same interface. 68 00:03:38,520 --> 00:03:40,600 We're going to think about two interfaces. 69 00:03:40,600 --> 00:03:45,780 One is called a set and one is called a sequence. 70 00:03:45,780 --> 00:03:47,035 These are highly loaded terms. 71 00:03:47,035 --> 00:03:48,660 Set means something to a mathematician. 72 00:03:48,660 --> 00:03:51,300 It means something else to a Python programmer. 73 00:03:51,300 --> 00:03:53,100 Sequence, similarly. 74 00:03:53,100 --> 00:03:58,290 I guess there's not a Python sequence data type built in. 75 00:03:58,290 --> 00:04:01,290 The idea is, we want to store n things. 76 00:04:01,290 --> 00:04:03,540 The things will be fairly arbitrary. 77 00:04:03,540 --> 00:04:06,420 Think of them as integers or strings. 78 00:04:06,420 --> 00:04:10,110 And on the one hand, we care about their values. 79 00:04:10,110 --> 00:04:12,840 And maybe we want to maintain them in sorted order 80 00:04:12,840 --> 00:04:15,350 and be able to search for a given value, which 81 00:04:15,350 --> 00:04:16,620 we'll call a key. 
82 00:04:16,620 --> 00:04:19,769 And on the other hand, we care about representing 83 00:04:19,769 --> 00:04:22,980 a particular sequence that we care about. 84 00:04:22,980 --> 00:04:27,450 Maybe we want to represent the numbers 5, 2, 9, 85 00:04:27,450 --> 00:04:30,060 7 in that order and store that. 86 00:04:30,060 --> 00:04:33,420 In Python, you could store that in a list, for example. 87 00:04:33,420 --> 00:04:35,050 And it will keep track of that order. 88 00:04:35,050 --> 00:04:39,030 And this is the first item, the second item, the last item. 89 00:04:39,030 --> 00:04:42,240 Today, we're going to be focusing on this sequence data 90 00:04:42,240 --> 00:04:43,740 structure, although at the end, I'll 91 00:04:43,740 --> 00:04:46,350 mention the interface for sets. 92 00:04:46,350 --> 00:04:48,840 But we're going to be actually solving sequences today. 93 00:04:48,840 --> 00:04:50,272 And in the next several lectures, 94 00:04:50,272 --> 00:04:52,230 we'll be bouncing back and forth between these. 95 00:04:52,230 --> 00:04:54,450 They're closely related. 96 00:04:54,450 --> 00:04:57,250 Pretty abstract at the moment. 97 00:04:57,250 --> 00:04:59,820 On the other hand, we're going to have two main-- 98 00:05:03,080 --> 00:05:07,510 let's call them data structure tools or approaches. 99 00:05:13,980 --> 00:05:21,200 One is arrays and the other is pointers-- 100 00:05:21,200 --> 00:05:25,600 pointer-based or linked data structures. 101 00:05:25,600 --> 00:05:26,810 You may have seen these. 102 00:05:26,810 --> 00:05:29,428 They're used a lot in programming, of course. 103 00:05:29,428 --> 00:05:31,220 But we're going to see both of these today. 104 00:05:31,220 --> 00:05:34,740 I'll come back to this sort of highlight in a moment. 105 00:05:34,740 --> 00:05:38,810 Let's jump into the sequence interface, which I conveniently 106 00:05:38,810 --> 00:05:41,610 have part of here. 
107 00:05:41,610 --> 00:05:43,708 There's a few different levels of sequences 108 00:05:43,708 --> 00:05:44,750 that we might care about. 109 00:05:44,750 --> 00:05:47,810 I'm going to start with the static sequence interface. 110 00:05:47,810 --> 00:05:54,160 This is where the number of items doesn't change, 111 00:05:54,160 --> 00:05:56,630 though the actual items might. 112 00:05:56,630 --> 00:05:58,520 Here, we have n items. 113 00:05:58,520 --> 00:06:02,390 I'm going to label them x 0 to x n minus 1, as in Python. 114 00:06:02,390 --> 00:06:04,190 So the number of items is n. 115 00:06:04,190 --> 00:06:06,470 And the operations I want to support are build, 116 00:06:06,470 --> 00:06:10,790 length, iteration, get, and set. 117 00:06:10,790 --> 00:06:12,830 So what do these do? 118 00:06:12,830 --> 00:06:14,210 Build is how you get started. 119 00:06:14,210 --> 00:06:17,060 To build a data structure in this interface, 120 00:06:17,060 --> 00:06:19,280 you call build of x. 121 00:06:19,280 --> 00:06:22,790 Exactly how you specify x isn't too important, but the idea is, 122 00:06:22,790 --> 00:06:25,010 I give you some items in some order. 123 00:06:25,010 --> 00:06:27,050 In Python, this would be an iterable. 124 00:06:27,050 --> 00:06:29,570 I'm going to want to also know its length. 125 00:06:29,570 --> 00:06:32,000 And I want to make a new data structure of size 126 00:06:32,000 --> 00:06:34,160 and a new static sequence of size n 127 00:06:34,160 --> 00:06:36,220 that has those items in that order. 128 00:06:36,220 --> 00:06:38,828 So that's how you build one of these. 129 00:06:38,828 --> 00:06:40,370 Because somehow, we have to specify n 130 00:06:40,370 --> 00:06:42,037 to this data structure, because n is not 131 00:06:42,037 --> 00:06:44,150 going to be allowed to change. 132 00:06:44,150 --> 00:06:46,430 I'm going to give you a length method. 
133 00:06:49,300 --> 00:06:52,000 Methods are the object-oriented way 134 00:06:52,000 --> 00:06:57,850 of thinking of operations that your interface supports. 135 00:06:57,850 --> 00:07:00,100 Length will just return this fixed value n. 136 00:07:00,100 --> 00:07:00,875 Iter sequence. 137 00:07:00,875 --> 00:07:03,250 This is the sense in which we want to maintain the order. 138 00:07:03,250 --> 00:07:06,490 I want to be able to output x 0 through x n minus 1 139 00:07:06,490 --> 00:07:08,950 in the sequence order, in that specified order 140 00:07:08,950 --> 00:07:13,750 that they were built in or that it was changed to. 141 00:07:13,750 --> 00:07:17,470 This is going to iterate through all of the items. 142 00:07:17,470 --> 00:07:20,410 So it's going to take at least linear time to output that. 143 00:07:20,410 --> 00:07:23,770 But more interesting is, we can dynamically access anywhere 144 00:07:23,770 --> 00:07:25,850 in the middle of the sequence. 145 00:07:25,850 --> 00:07:29,320 We can get the ith item, x i, given the value i, 146 00:07:29,320 --> 00:07:33,930 and we can change x i to a given new item. 147 00:07:33,930 --> 00:07:34,430 OK. 148 00:07:34,430 --> 00:07:37,040 So that's called get_at and set_at. 149 00:07:37,040 --> 00:07:38,150 Pretty straightforward. 150 00:07:38,150 --> 00:07:42,640 This should remind you very closely of something-- 151 00:07:42,640 --> 00:07:43,890 a data structure. 152 00:07:43,890 --> 00:07:45,090 So this is an interface. 153 00:07:45,090 --> 00:07:47,187 This is something I might want to solve. 154 00:07:47,187 --> 00:07:48,770 But what is the obvious data structure 155 00:07:48,770 --> 00:07:50,100 that solves this problem? 156 00:07:50,100 --> 00:07:50,600 Yeah? 157 00:07:50,600 --> 00:07:51,615 AUDIENCE: A list. 158 00:07:51,615 --> 00:07:52,490 ERIK DEMAINE: A list. 159 00:07:52,490 --> 00:07:54,530 In Python, it's called a list. 160 00:07:54,530 --> 00:07:58,337 I prefer to call it an array, but to each their own. 
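[The static sequence interface just described, sketched in Python. This is an illustrative sketch, not the course's official code: the method names follow the lecture's operation names (build, length, iter_seq, get_at, set_at), and a Python list stands in for the static array, since Python has no true static arrays.]

```python
class StaticArraySeq:
    """Static sequence sketch: n is fixed after build (by convention,
    since the underlying Python list is really a dynamic array)."""

    def __init__(self):
        self.data = []

    def build(self, X):
        self.data = [x for x in X]   # Theta(n): copy the items in order

    def __len__(self):
        return len(self.data)        # O(1): n is stored

    def iter_seq(self):
        yield from self.data         # Theta(n): output x_0, ..., x_{n-1} in order

    def get_at(self, i):
        return self.data[i]          # O(1) random access

    def set_at(self, i, x):
        self.data[i] = x             # O(1) random access
```

[Usage: build([5, 2, 9, 7]) stores that order; get_at(2) returns 9, and set_at changes one item without changing n.]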
161 00:07:58,337 --> 00:07:59,420 We're going to use "list." 162 00:07:59,420 --> 00:08:07,730 List could mean many things, but the solution 163 00:08:07,730 --> 00:08:09,920 to this interface problem-- 164 00:08:09,920 --> 00:08:20,330 the natural solution-- is what I'll call a static array. 165 00:08:23,180 --> 00:08:28,613 Jason mentioned these in lecture one. 166 00:08:28,613 --> 00:08:30,030 It's a little tricky because there 167 00:08:30,030 --> 00:08:31,322 are no static arrays in Python. 168 00:08:31,322 --> 00:08:32,822 There are only dynamic arrays, which 169 00:08:32,822 --> 00:08:34,080 is something we will get to. 170 00:08:34,080 --> 00:08:38,740 But I want to talk about, what is a static array, really? 171 00:08:38,740 --> 00:08:42,480 And this relates to our notion of-- 172 00:08:42,480 --> 00:08:45,810 our model of computation, Jason also talked about, 173 00:08:45,810 --> 00:08:49,380 which we call the word RAM, remember? 174 00:08:58,580 --> 00:09:03,140 The idea, in word RAM, is that your memory 175 00:09:03,140 --> 00:09:08,362 is an array of w-bit words. 176 00:09:08,362 --> 00:09:09,320 This is a bit circular. 177 00:09:09,320 --> 00:09:11,780 I'm going to define an array in terms of the word RAM, which 178 00:09:11,780 --> 00:09:13,030 is defined in terms of arrays. 179 00:09:13,030 --> 00:09:18,780 But I think you know the idea. 180 00:09:18,780 --> 00:09:23,180 So we have a big memory which goes off to infinity, maybe. 181 00:09:23,180 --> 00:09:26,390 It's divided into words. 182 00:09:26,390 --> 00:09:29,930 Each word here is w bits long. 183 00:09:29,930 --> 00:09:32,240 This is word 0, word 1, word 2. 184 00:09:32,240 --> 00:09:35,450 And you can access this array randomly-- 185 00:09:35,450 --> 00:09:36,840 random access memory. 186 00:09:36,840 --> 00:09:41,210 So I can give you the number 5 and get 0 1, 2, 3, 4, 187 00:09:41,210 --> 00:09:44,990 5, the fifth word in this RAM. 
188 00:09:44,990 --> 00:09:49,160 That's how actual memories work. 189 00:09:49,160 --> 00:09:53,540 You can access any of them equally quickly. 190 00:09:53,540 --> 00:09:57,300 OK, so that's memory. 191 00:09:57,300 --> 00:10:03,200 And so what we want to do is, when we say an array, 192 00:10:03,200 --> 00:10:11,150 we want this to be a consecutive chunk of memory. 193 00:10:16,460 --> 00:10:17,660 Let me get color. 194 00:10:22,180 --> 00:10:26,230 Let's say I have an array of size 4 and it lives here. 195 00:10:29,690 --> 00:10:31,910 Jason can't spell, but I can't count. 196 00:10:31,910 --> 00:10:34,370 So I think that's four. 197 00:10:34,370 --> 00:10:40,310 We've got-- so the array starts here and it ends over here. 198 00:10:40,310 --> 00:10:42,810 It's of size 4. 199 00:10:42,810 --> 00:10:45,450 And it's consecutive, which means, 200 00:10:45,450 --> 00:10:52,920 if I want to access the array at position-- 201 00:10:52,920 --> 00:10:57,720 at index i, then this is the same thing 202 00:10:57,720 --> 00:11:03,450 as accessing my memory array at position-- 203 00:11:03,450 --> 00:11:05,460 wherever the array starts, which I'll 204 00:11:05,460 --> 00:11:10,440 call the address of the array-- 205 00:11:10,440 --> 00:11:13,470 in Python, this is ID of array-- 206 00:11:13,470 --> 00:11:16,270 plus i. 207 00:11:16,270 --> 00:11:16,770 OK. 208 00:11:16,770 --> 00:11:19,470 This is just simple offset arithmetic. 209 00:11:19,470 --> 00:11:21,510 If I want to know the 0th item of the array, 210 00:11:21,510 --> 00:11:23,040 it's right here, where it starts. 211 00:11:23,040 --> 00:11:24,690 The first item is one after that. 212 00:11:24,690 --> 00:11:26,620 The second item is one after that. 213 00:11:26,620 --> 00:11:30,240 So as long as I store my array consecutively in memory, 214 00:11:30,240 --> 00:11:33,690 I can access the array in constant time. 
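[The offset arithmetic on the board can be simulated directly. This is a toy word-RAM, with illustrative names: one Python list plays the role of memory, and an "array" is nothing but a starting address plus a length.]

```python
memory = [0] * 16              # toy word-RAM: 16 words, randomly accessible

def array_get(addr, i):
    # A[i] is memory[address(A) + i]: one random access, O(1)
    return memory[addr + i]

def array_set(addr, i, value):
    memory[addr + i] = value   # same offset arithmetic for writes

# a size-4 array living consecutively starting at address 5
addr = 5
for i, v in enumerate([5, 2, 9, 7]):
    array_set(addr, i, v)
```

[Because the four words are consecutive, get_at and set_at are each a single constant-time memory access.]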
215 00:11:33,690 --> 00:11:36,390 I can do get_at and set_at as quickly 216 00:11:36,390 --> 00:11:40,290 as I can randomly access the memory and get value-- 217 00:11:40,290 --> 00:11:43,060 or set a value-- which we're assuming is constant time. 218 00:11:50,870 --> 00:11:54,470 My array access is constant time. 219 00:11:57,880 --> 00:12:00,160 This is what allows a static array 220 00:12:00,160 --> 00:12:06,310 to actually solve this problem in constant time per get_at 221 00:12:06,310 --> 00:12:09,040 and set_at operation. 222 00:12:09,040 --> 00:12:12,610 This may seem simple, but we're really going to need this model 223 00:12:12,610 --> 00:12:15,610 and really rely on this model increasingly as we get to more 224 00:12:15,610 --> 00:12:17,090 interesting data structures. 225 00:12:17,090 --> 00:12:20,890 This is the first time we're actually needing it. 226 00:12:20,890 --> 00:12:21,850 Let's see. 227 00:12:21,850 --> 00:12:23,710 Length is also constant time. 228 00:12:23,710 --> 00:12:25,460 We're just going to store that number 229 00:12:25,460 --> 00:12:29,230 n, along with its address. 230 00:12:29,230 --> 00:12:32,200 And build is going to take linear time. 231 00:12:32,200 --> 00:12:34,690 Iteration will take linear time. 232 00:12:43,350 --> 00:12:45,820 Pretty straightforward. 233 00:12:45,820 --> 00:12:48,030 I guess one thing here, when defining 234 00:12:48,030 --> 00:12:49,950 build, I need to introduce a little bit more 235 00:12:49,950 --> 00:12:51,390 of our model of computation, which 236 00:12:51,390 --> 00:12:54,980 is, how do you create an array in the beginning? 237 00:12:54,980 --> 00:12:57,320 I claim I could do it in linear time, 238 00:12:57,320 --> 00:12:59,240 but that's just part of the model. 239 00:12:59,240 --> 00:13:01,370 This is called the memory allocation model. 
240 00:13:09,450 --> 00:13:11,047 There are a few possible choices here, 241 00:13:11,047 --> 00:13:12,630 but the cleanest one is just to assume 242 00:13:12,630 --> 00:13:25,630 that you can allocate an array of size n in theta n time. 243 00:13:25,630 --> 00:13:29,177 So it takes linear time to make an array of size n. 244 00:13:29,177 --> 00:13:30,760 You could imagine this being constant. 245 00:13:30,760 --> 00:13:32,400 It doesn't really matter much. 246 00:13:32,400 --> 00:13:33,930 But it does take work. 247 00:13:33,930 --> 00:13:36,540 And in particular, if you just allocate some chunk of memory, 248 00:13:36,540 --> 00:13:38,290 you have no idea whether it's initialized. 249 00:13:38,290 --> 00:13:41,190 So initializing that array to 0s will cost linear time. 250 00:13:41,190 --> 00:13:43,380 It won't really matter, constant versus linear, 251 00:13:43,380 --> 00:13:47,880 but a nice side effect of this model is that space-- 252 00:13:47,880 --> 00:13:50,342 if you're just allocating arrays, 253 00:13:50,342 --> 00:13:52,800 the amount of space you use is, at most, the amount of time 254 00:13:52,800 --> 00:13:53,580 you use. 255 00:13:53,580 --> 00:13:57,240 Or, I guess, big O of that. 256 00:13:57,240 --> 00:14:00,290 So that's a nice feature. 257 00:14:00,290 --> 00:14:01,920 It's pretty weird if you imagine-- 258 00:14:01,920 --> 00:14:04,890 it's unrealistic to imagine you can allocate an array that's 259 00:14:04,890 --> 00:14:07,830 infinite size and then just use a few items out of it. 260 00:14:07,830 --> 00:14:10,540 That won't give you a good data structure. 261 00:14:10,540 --> 00:14:14,010 So we'll assume it costs to allocate memory. 262 00:14:14,010 --> 00:14:14,580 OK, great. 263 00:14:14,580 --> 00:14:16,890 We solved the sequence problem. 264 00:14:16,890 --> 00:14:18,450 Very simple, kind of boring. 265 00:14:18,450 --> 00:14:21,750 These are optimal running times. 
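[The allocation model as a sketch: allocating n words is assumed to cost Theta(n), which is exactly the cost of zero-initializing them, and build is then allocate-plus-copy. The function names here are illustrative, not part of any library.]

```python
def allocate(n):
    # fresh, zero-initialized chunk of n words: Theta(n) work to zero it
    return [0] * n

def build(X, n):
    # build(x): allocate Theta(n) space, then copy the n items in order
    A = allocate(n)
    for i, x in enumerate(X):
        A[i] = x
    return A
```

[A side effect of charging Theta(n) per allocation: total space used is at most O(total time spent), as noted above.]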
266 00:14:21,750 --> 00:14:24,780 Now, let's make it interesting-- make sure I didn't miss 267 00:14:24,780 --> 00:14:26,590 anything-- 268 00:14:26,590 --> 00:14:32,460 and talk about-- oh, there is one thing I want 269 00:14:32,460 --> 00:14:35,460 to talk about in the word RAM. 270 00:14:35,460 --> 00:14:39,270 A side effect of this assumption that array access should 271 00:14:39,270 --> 00:14:43,410 take constant time, and that accessing these positions 272 00:14:43,410 --> 00:14:45,660 in my memory should take constant time, 273 00:14:45,660 --> 00:14:55,410 is that we need to assume w is at least log n or so. 274 00:14:58,450 --> 00:15:00,880 w, remember, is the machine word size. 275 00:15:00,880 --> 00:15:04,540 In real computers, this is currently 64-- 276 00:15:04,540 --> 00:15:08,420 or 256, in some bizarre instructions. 277 00:15:08,420 --> 00:15:10,750 But we don't usually think of the machine 278 00:15:10,750 --> 00:15:13,663 as getting bigger over time, but you should think of the machine 279 00:15:13,663 --> 00:15:14,830 as getting bigger over time. 280 00:15:14,830 --> 00:15:17,110 This is a statement that says, the word 281 00:15:17,110 --> 00:15:19,150 size has to grow with n. 282 00:15:19,150 --> 00:15:20,650 It might grow faster than log n, but it 283 00:15:20,650 --> 00:15:22,600 has to grow at least as fast as log n. 284 00:15:22,600 --> 00:15:23,960 Why do I say that? 285 00:15:23,960 --> 00:15:26,560 Because if I have n things that I'm dealing with-- n, 286 00:15:26,560 --> 00:15:27,940 here, is the problem size. 287 00:15:27,940 --> 00:15:30,580 Maybe it's the array I'm trying to store-- whatever. 288 00:15:30,580 --> 00:15:33,610 If I'm having to deal with n things in my memory, 289 00:15:33,610 --> 00:15:36,200 at the very least, I need to be able to address them. 290 00:15:36,200 --> 00:15:38,320 I should be able to say, give me the ith one 291 00:15:38,320 --> 00:15:40,570 and represent that number i in a word. 
292 00:15:40,570 --> 00:15:43,240 Otherwise-- because the machine is designed to only work 293 00:15:43,240 --> 00:15:44,892 with w-bit words in constant time, 294 00:15:44,892 --> 00:15:46,600 and we want to be able to access the ith 295 00:15:46,600 --> 00:15:48,920 word in constant time-- I need a word size 296 00:15:48,920 --> 00:15:51,550 that's at least log n just to address those n things 297 00:15:51,550 --> 00:15:52,798 in my input. 298 00:15:52,798 --> 00:15:54,590 So this is a totally reasonable assumption. 299 00:15:54,590 --> 00:15:56,798 It may seem weird because you think of a real machine 300 00:15:56,798 --> 00:15:59,200 as having constant size, but a real machine 301 00:15:59,200 --> 00:16:01,390 has constant size RAM, also. 302 00:16:01,390 --> 00:16:04,450 My machine has 24 gigs of RAM, or whatever. 303 00:16:04,450 --> 00:16:05,770 That laptop has 8. 304 00:16:05,770 --> 00:16:08,567 But you don't think of that as changing over time. 305 00:16:08,567 --> 00:16:10,900 But of course, if you want it to process a larger input, 306 00:16:10,900 --> 00:16:12,490 you would buy more RAM. 307 00:16:12,490 --> 00:16:16,330 So eventually, when our n's get really, really big, 308 00:16:16,330 --> 00:16:17,980 we're going to have to increase w just 309 00:16:17,980 --> 00:16:20,480 so we can address that RAM. 310 00:16:20,480 --> 00:16:22,350 That's the intuition here. 311 00:16:22,350 --> 00:16:25,220 But this is a way to bridge reality, which are 312 00:16:25,220 --> 00:16:27,510 fixed machines, with theory. 313 00:16:27,510 --> 00:16:28,010 In 314 00:16:28,010 --> 00:16:30,740 algorithms, we care about scalability for very large n. 315 00:16:30,740 --> 00:16:33,320 We want to know what that growth function is and ignore 316 00:16:33,320 --> 00:16:34,700 the lead constant factor. 317 00:16:34,700 --> 00:16:37,410 That's what asymptotic notation is all about. 
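[The w >= log n requirement is just counting bits: to name any one of n addresses, an index needs ceil(log2 n) bits, and that index has to fit in a single machine word. A quick check (the helper name is illustrative):]

```python
import math

def bits_to_address(n):
    # smallest w such that a w-bit word can hold any index 0..n-1
    return max(1, math.ceil(math.log2(n)))
```

[For instance, a 64-bit word, like today's machines use, suffices to address up to 2^64 items, which is why w = 64 feels "constant" until n gets astronomically large.]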
318 00:16:37,410 --> 00:16:40,700 And for that, we need a notion of word size also changing 319 00:16:40,700 --> 00:16:42,620 in this asymptotic way. 320 00:16:42,620 --> 00:16:44,330 All right. 321 00:16:44,330 --> 00:16:46,220 That would be more important next week, 322 00:16:46,220 --> 00:16:48,620 when we talk about hashing and why hashing 323 00:16:48,620 --> 00:16:51,880 is a reasonable thing to do. 324 00:16:51,880 --> 00:16:57,480 But let's move on to dynamic sequences, which is 325 00:16:57,480 --> 00:16:59,025 where things get interesting. 326 00:17:01,730 --> 00:17:04,310 I have the update here. 327 00:17:04,310 --> 00:17:07,310 We start with static sequences. 328 00:17:07,310 --> 00:17:09,805 All of these operations are still 329 00:17:09,805 --> 00:17:11,930 something we want to support in a dynamic sequence, 330 00:17:11,930 --> 00:17:14,990 but we add two dynamic operations-- 331 00:17:14,990 --> 00:17:18,290 somewhat controversial operations, very exciting. 332 00:17:18,290 --> 00:17:21,530 I want to be able to insert in the middle of my sequence 333 00:17:21,530 --> 00:17:25,099 and I want to be able to delete from the middle of my sequence. 334 00:17:25,099 --> 00:17:28,735 Here's my sequence, which I'm going to think of in a picture. 335 00:17:28,735 --> 00:17:30,110 I'm going to draw it as an array. 336 00:17:30,110 --> 00:17:31,640 But it's stored however it's stored. 337 00:17:31,640 --> 00:17:32,540 We don't know. 338 00:17:32,540 --> 00:17:34,530 This is an interface, not an implementation. 339 00:17:34,530 --> 00:17:38,660 So we have x 0, x 1, x 2, x 3. 340 00:17:41,840 --> 00:17:46,940 And let's say I insert at position 2. 341 00:17:46,940 --> 00:17:49,070 Position 2 is here. 342 00:17:49,070 --> 00:17:51,800 So I come in with my new x, and I 343 00:17:51,800 --> 00:17:53,900 would like x to be the new x 2, but I don't 344 00:17:53,900 --> 00:17:55,130 want to lose any information. 
345 00:17:55,130 --> 00:17:58,670 If I did set_at 2, then I would erase this and replace it 346 00:17:58,670 --> 00:17:59,433 with x. 347 00:17:59,433 --> 00:18:01,850 But I want to do insert_at, which means all of these guys, 348 00:18:01,850 --> 00:18:03,392 conceptually, are going to shift over 349 00:18:03,392 --> 00:18:05,750 by 1 in terms of their indices. 350 00:18:05,750 --> 00:18:09,290 Then, I would get this picture that's one bigger. 351 00:18:13,510 --> 00:18:14,910 And now I've got the new x. 352 00:18:14,910 --> 00:18:18,840 I've got what was the old x 2, which I don't-- 353 00:18:18,840 --> 00:18:21,720 I hesitate to call x 2 because that's its old name, not 354 00:18:21,720 --> 00:18:22,330 its new name. 355 00:18:22,330 --> 00:18:27,620 I'm going to draw arrows to say, these guys get copied over. 356 00:18:27,620 --> 00:18:30,200 These ones are definitely unchanged. 357 00:18:30,200 --> 00:18:34,220 Our new x 2 prime is x. This 358 00:18:34,220 --> 00:18:38,720 is x 3 prime, x 4 prime, and so on. 359 00:18:38,720 --> 00:18:42,770 I want to be careful here-- and of course, the new n prime 360 00:18:42,770 --> 00:18:43,970 is n plus 1. 361 00:18:46,653 --> 00:18:49,070 I want to be careful about the labeling, because the key-- 362 00:18:49,070 --> 00:18:51,870 what makes insert_at interesting is that, later, when I call 363 00:18:51,870 --> 00:18:54,680 get_at, it's with the new indexing. 364 00:18:54,680 --> 00:18:59,600 So previously, if I did get_at at 2, I would get this value. 365 00:18:59,600 --> 00:19:01,427 And afterwards, if I did get_at at 2, 366 00:19:01,427 --> 00:19:02,510 I would get the new value. 367 00:19:02,510 --> 00:19:04,520 If I did get_at at 3 down here, I 368 00:19:04,520 --> 00:19:08,280 would get the value that used to be x 2. 369 00:19:08,280 --> 00:19:09,890 That's maybe hard to track. 
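[The shifting picture, written out for an array representation. This is a sketch of the semantics, not of an efficient data structure: copying the suffix over by one is exactly what makes insert_at cost Theta(n) on an array in the worst case.]

```python
def insert_at(A, i, x):
    """Insert x so it becomes the new A[i]; old items at i and beyond
    shift up by one index. Returns a new array one larger (Theta(n))."""
    B = [None] * (len(A) + 1)
    B[:i] = A[:i]        # indices 0..i-1 are unchanged
    B[i] = x             # x is the new item at index i
    B[i+1:] = A[i:]      # old A[i], A[i+1], ... move to indices i+1, i+2, ...
    return B
```

[Matching the board example: after inserting at position 2, get_at(2) returns the new x, and get_at(3) returns what used to be x 2.]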
370 00:19:09,890 --> 00:19:12,920 But this is a conceptually very useful thing 371 00:19:12,920 --> 00:19:16,340 to do, especially when you're inserting or deleting 372 00:19:16,340 --> 00:19:17,670 at the ends. 373 00:19:17,670 --> 00:19:26,490 So we're going to define, in particular, insert and delete 374 00:19:26,490 --> 00:19:29,235 first and last. 375 00:19:33,270 --> 00:19:37,860 These are sometimes given-- if you have an insert, 376 00:19:37,860 --> 00:19:39,600 it has an x. 377 00:19:39,600 --> 00:19:43,560 If you do a delete, it has no argument. 378 00:19:43,560 --> 00:19:46,530 This means insert_at the beginning of the array, which 379 00:19:46,530 --> 00:19:48,630 would be like adding it here. 380 00:19:48,630 --> 00:19:52,110 And insert_last means adding it on here. 381 00:19:52,110 --> 00:19:53,850 insert_last doesn't change the indices 382 00:19:53,850 --> 00:19:55,030 of any of the old items. 383 00:19:55,030 --> 00:19:56,572 That's a nice feature of insert_last. 384 00:19:56,572 --> 00:19:59,460 Insert-first changes all of them. 385 00:19:59,460 --> 00:20:02,460 They all get incremented by 1. 386 00:20:02,460 --> 00:20:05,520 And we're also interested in the similar things here. 387 00:20:05,520 --> 00:20:22,550 We could do get-first or -last or set-first or -last, 388 00:20:22,550 --> 00:20:25,760 which are the obvious special cases of get_at and set_at. 389 00:20:25,760 --> 00:20:27,560 Now, these special cases are particularly 390 00:20:27,560 --> 00:20:29,608 interesting in an algorithms context. 391 00:20:29,608 --> 00:20:31,650 If you were a mathematician, you would say, well, 392 00:20:31,650 --> 00:20:32,630 why do I even bother? 393 00:20:32,630 --> 00:20:37,550 This is just shorthand for a particular call to get or set. 
394 00:20:37,550 --> 00:20:40,340 But what makes it interesting from a data structures 395 00:20:40,340 --> 00:20:43,370 perspective is that we care about algorithms for supporting 396 00:20:43,370 --> 00:20:44,120 these operations. 397 00:20:44,120 --> 00:20:46,670 And maybe, the algorithm for supporting 398 00:20:46,670 --> 00:20:50,180 get-first or set-first, or in particular, insert-first or 399 00:20:50,180 --> 00:20:52,910 insert_last, might be more efficient. 400 00:20:52,910 --> 00:20:54,920 Maybe we can solve this problem better 401 00:20:54,920 --> 00:20:56,810 than we can solve insert_at. 402 00:20:56,810 --> 00:20:58,370 So while, ideally, we could solve 403 00:20:58,370 --> 00:21:01,220 the entire dynamic sequence interface in constant time 404 00:21:01,220 --> 00:21:03,050 per operation, that's not actually possible. 405 00:21:03,050 --> 00:21:04,490 You can prove that. 406 00:21:04,490 --> 00:21:06,350 But special cases of it-- where we're just 407 00:21:06,350 --> 00:21:10,190 inserting and deleting from the end, say-- we can do that. 408 00:21:10,190 --> 00:21:13,250 That's why it's interesting to introduce special cases that we 409 00:21:13,250 --> 00:21:16,180 care about. 410 00:21:16,180 --> 00:21:16,690 Cool. 411 00:21:16,690 --> 00:21:19,367 That's the definition of the dynamic sequence interface. 412 00:21:19,367 --> 00:21:20,950 Now, we're going to actually solve it. 413 00:21:27,830 --> 00:21:32,390 Our first data structure for this is called linked lists. 414 00:21:32,390 --> 00:21:39,810 You've taken, probably-- you've probably seen linked lists 415 00:21:39,810 --> 00:21:41,140 before at some point. 416 00:21:41,140 --> 00:21:43,140 But the main new part here is, we're 417 00:21:43,140 --> 00:21:46,650 going to actually analyze them and see how efficiently they 418 00:21:46,650 --> 00:21:49,380 implement all of these operations we might care about. 419 00:21:49,380 --> 00:21:50,730 First, review. 
420 00:21:50,730 --> 00:21:52,620 What is a linked list? 421 00:21:52,620 --> 00:22:00,910 We store our items in a bunch of nodes. 422 00:22:12,470 --> 00:22:20,150 Each node has an item in it and a next field. 423 00:22:20,150 --> 00:22:22,910 So you can think of these as class objects 424 00:22:22,910 --> 00:22:28,340 with two class variables, the item and the next pointer. 425 00:22:28,340 --> 00:22:30,980 And we assemble those into this kind of structure 426 00:22:30,980 --> 00:22:32,360 where we store-- 427 00:22:32,360 --> 00:22:35,930 in the item fields, we're going to store the actual values 428 00:22:35,930 --> 00:22:38,300 that we want to represent in our sequence, x 0 429 00:22:38,300 --> 00:22:41,690 through x n minus 1, in order. 430 00:22:41,690 --> 00:22:44,780 And then we're going to use the next pointers to link these 431 00:22:44,780 --> 00:22:46,590 all together in that order. 432 00:22:46,590 --> 00:22:48,980 So the next pointers are what actually give us the order. 433 00:22:48,980 --> 00:22:51,320 And in addition, we're going to keep track of what's 434 00:22:51,320 --> 00:22:53,840 called the head of the list. 435 00:22:53,840 --> 00:22:56,720 The data structure is going to be represented by a head. 436 00:22:56,720 --> 00:23:00,513 If you wanted to, you could also store length. 437 00:23:00,513 --> 00:23:02,180 This could be the data structure itself. 438 00:23:04,830 --> 00:23:08,010 And it's pointing to all of these types of data structures. 439 00:23:08,010 --> 00:23:10,690 Notice, we've just seen an array-based data structure, 440 00:23:10,690 --> 00:23:14,340 which is just a static array, and we've 441 00:23:14,340 --> 00:23:17,710 seen a pointer-based data structure. 
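[The node structure just described, as a minimal Python sketch. The field names item and next follow the lecture; storing a head pointer (and, optionally, the length) represents the list itself.]

```python
class Node:
    def __init__(self, item):
        self.item = item   # the stored value x_i
        self.next = None   # pointer to the next node (None at the end)

class LinkedListSeq:
    def __init__(self):
        self.head = None   # pointer to the first node
        self.size = 0      # optionally store the length, for O(1) len

    def build(self, X):
        # link fresh nodes together in the given order:
        # prepend items in reverse, so the first item ends up at the head
        self.head, self.size = None, 0
        for x in reversed(list(X)):
            node = Node(x)
            node.next = self.head
            self.head = node
            self.size += 1

    def iter_seq(self):
        # follow next pointers to output x_0, ..., x_{n-1} in order
        node = self.head
        while node is not None:
            yield node.item
            node = node.next
```

[The next pointers, not memory addresses, are what define the sequence order.]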
442 00:23:17,710 --> 00:23:22,180 And we're relying on the fact that pointers can be stored 443 00:23:22,180 --> 00:23:24,730 in a single word, which means we can de-reference them-- 444 00:23:24,730 --> 00:23:27,640 we can see what's on the other side of the pointer-- 445 00:23:27,640 --> 00:23:30,700 in constant time in our word RAM model. 446 00:23:30,700 --> 00:23:34,330 In reality, each of these nodes is stored somewhere 447 00:23:34,330 --> 00:23:38,150 in the memory of the computer. 448 00:23:38,150 --> 00:23:40,420 So maybe each one is two words long, 449 00:23:40,420 --> 00:23:44,330 so maybe one node is-- the first node is here. 450 00:23:44,330 --> 00:23:45,580 Maybe the second node is here. 451 00:23:45,580 --> 00:23:46,870 The third node is here. 452 00:23:46,870 --> 00:23:48,580 They're in some arbitrary order. 453 00:23:48,580 --> 00:23:54,190 We're using this fact, that we can allocate an array of size n 454 00:23:54,190 --> 00:23:55,810 in linear time-- in this case, we're 455 00:23:55,810 --> 00:23:57,550 going to have arrays of size 2. 456 00:23:57,550 --> 00:24:00,800 We can just say, oh, please give me a new array of size 2. 457 00:24:00,800 --> 00:24:05,830 And that will make us one of these nodes. 458 00:24:05,830 --> 00:24:07,480 And then we're storing pointers. 459 00:24:07,480 --> 00:24:11,380 Pointers are just indices into the giant memory array. 460 00:24:11,380 --> 00:24:14,962 They're just, what is the address of this little array? 461 00:24:14,962 --> 00:24:17,170 If you've ever wondered how pointers are implemented, 462 00:24:17,170 --> 00:24:20,200 they're just numbers that say where, in memory, is 463 00:24:20,200 --> 00:24:21,340 this thing over here? 464 00:24:21,340 --> 00:24:23,950 And in memory, they're in arbitrary order. 
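To make the representation concrete, here is a minimal Python sketch of a node as described above: a little two-field object holding an item and a next pointer, with the list itself represented by a head pointer (and, optionally, a length). The class and field names are my own, not code shown in lecture.

```python
class Node:
    """One linked-list node: an item plus a pointer to the next node."""
    def __init__(self, item):
        self.item = item
        self.next = None  # None marks the end of the list


class LinkedList:
    """Pointer-based sequence: the data structure is just a head pointer
    (plus, optionally, a stored length)."""
    def __init__(self):
        self.head = None
        self.length = 0
```

Dereferencing `node.next` is the constant-time pointer-follow from the word-RAM discussion: the nodes can sit anywhere in memory, and only the next pointers encode the order.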
465 00:24:23,950 --> 00:24:26,980 This is really nice because it's easy to manipulate 466 00:24:26,980 --> 00:24:28,780 the order of a linked list without actually 467 00:24:28,780 --> 00:24:34,900 physically moving nodes around, whereas arrays are problematic. 468 00:24:34,900 --> 00:24:37,270 Maybe it's worth mentioning. 469 00:24:37,270 --> 00:24:38,800 Let's start analyzing things. 470 00:24:38,800 --> 00:24:42,970 So we care about these dynamic sequence operations. 471 00:24:42,970 --> 00:24:47,140 And we could try to apply it to the static array data 472 00:24:47,140 --> 00:24:52,600 structure, or we could try to implement these operations 473 00:24:52,600 --> 00:24:53,950 in a static array. 474 00:24:53,950 --> 00:24:57,730 It's possible, just not going to be very good. 475 00:24:57,730 --> 00:25:00,070 And we can try to implement it with linked lists. 476 00:25:00,070 --> 00:25:02,990 And it's also not going to be that great. 477 00:25:02,990 --> 00:25:05,170 Let's go over here. 478 00:25:11,110 --> 00:25:13,280 Our goal is the next data structure, 479 00:25:13,280 --> 00:25:14,980 which is dynamic arrays. 480 00:25:14,980 --> 00:25:16,690 But linked lists and static arrays each 481 00:25:16,690 --> 00:25:18,130 have their advantages. 482 00:25:22,560 --> 00:25:33,730 Let's first analyze dynamic sequence operations, 483 00:25:33,730 --> 00:25:41,440 first on a static array and then on a linked list. 484 00:25:46,890 --> 00:25:50,520 On a static array, I think you all see, 485 00:25:50,520 --> 00:25:54,210 if I try to insert at the beginning of the static array-- 486 00:25:54,210 --> 00:25:55,710 that's kind of the worst case. 487 00:25:55,710 --> 00:25:59,970 If I insert first, then everybody has to shift over. 488 00:25:59,970 --> 00:26:01,920 If I'm going to maintain this invariant, 489 00:26:01,920 --> 00:26:04,800 that the ith item in the array represents-- 490 00:26:04,800 --> 00:26:08,880 I guess I didn't write it anywhere here. 
491 00:26:08,880 --> 00:26:09,780 Maybe here. 492 00:26:14,610 --> 00:26:15,840 Static array. 493 00:26:15,840 --> 00:26:18,210 We're going to maintain this invariant that a of i 494 00:26:18,210 --> 00:26:19,365 represents x i. 495 00:26:22,050 --> 00:26:24,173 If I want to maintain that at all times, when 496 00:26:24,173 --> 00:26:25,590 I insert a new thing in the front, 497 00:26:25,590 --> 00:26:27,870 because the indices of all the previous items change, 498 00:26:27,870 --> 00:26:30,150 I have to spend time to copy those over. 499 00:26:30,150 --> 00:26:32,520 You can do it in linear time, but no better. 500 00:26:36,030 --> 00:26:39,150 Static array. 501 00:26:39,150 --> 00:26:53,000 Insert and delete anywhere costs theta n time-- 502 00:26:55,870 --> 00:26:58,670 actually, for two reasons. 503 00:26:58,670 --> 00:27:03,560 Reason number one is that, if we're near the front, 504 00:27:03,560 --> 00:27:05,365 then we have to do shifting. 505 00:27:07,960 --> 00:27:13,640 What about insert or delete the last element of an array? 506 00:27:13,640 --> 00:27:15,350 Is that any easier? 507 00:27:15,350 --> 00:27:17,840 Because then, if I insert the very last element, 508 00:27:17,840 --> 00:27:19,050 none of the indices change. 509 00:27:19,050 --> 00:27:20,330 I'm just adding a new element. 510 00:27:24,158 --> 00:27:25,450 So I don't have to do shifting. 511 00:27:28,620 --> 00:27:31,830 So can I do insert and delete last in constant time 512 00:27:31,830 --> 00:27:32,590 in a static array? 513 00:27:36,920 --> 00:27:38,142 Yeah? 514 00:27:38,142 --> 00:27:40,095 AUDIENCE: No, because the size is constant. 515 00:27:40,095 --> 00:27:42,053 ERIK DEMAINE: No, because the size is constant. 
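Here is a sketch of that front insertion, simulating a static array of items with a fixed-size Python list that has one spare slot. The function name and the spare-slot convention are my own illustration; the point is the Theta(n) shift needed to maintain A[i] == x_i.

```python
def insert_first_static(A, n, x):
    """Insert x at the front of static array A holding n items,
    shifting A[0..n-1] right by one slot to keep A[i] == x_i.
    Assumes A already has room (len(A) > n); costs Theta(n) moves."""
    for i in range(n, 0, -1):  # shift right, working back to front
        A[i] = A[i - 1]
    A[0] = x
    return n + 1  # new number of items stored
```

Every existing item's index changes, so every existing item must move; that is the first source of the Theta(n) cost.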
516 00:27:42,053 --> 00:27:45,840 So our model-- remember, our memory allocation model-- is 517 00:27:45,840 --> 00:27:50,340 that we can allocate a static array of size n, 518 00:27:50,340 --> 00:27:52,410 but then it's stuck at size n. I can't just say, 519 00:27:52,410 --> 00:27:55,500 please make it bigger by 1. I need space 520 00:27:55,500 --> 00:27:57,130 to store this extra element. 521 00:27:57,130 --> 00:28:00,150 And if you think about where things are in memory, when 522 00:28:00,150 --> 00:28:02,195 you call to this memory allocator, which 523 00:28:02,195 --> 00:28:03,570 is part of your operating system, 524 00:28:03,570 --> 00:28:08,070 you say, please give me a chunk of memory. 525 00:28:08,070 --> 00:28:10,530 It's going to place them in various places in memory, 526 00:28:10,530 --> 00:28:12,980 and some of them might be next to each other. 527 00:28:12,980 --> 00:28:14,665 So if I try to grow this array by 1, 528 00:28:14,665 --> 00:28:16,290 there might already be something there. 529 00:28:16,290 --> 00:28:18,558 And that's not possible without first shifting. 530 00:28:18,558 --> 00:28:20,100 So even though, in the array, I don't 531 00:28:20,100 --> 00:28:21,558 have to do any shifting, in memory, 532 00:28:21,558 --> 00:28:22,780 I might have to do shifting. 533 00:28:22,780 --> 00:28:24,322 And that's outside the model. 534 00:28:24,322 --> 00:28:26,280 So we're going to stick to this model of just-- 535 00:28:26,280 --> 00:28:27,370 you can allocate memory. 536 00:28:27,370 --> 00:28:31,530 You can also de-allocate memory, just to keep space usage small. 537 00:28:31,530 --> 00:28:34,500 But the only way to get more space 538 00:28:34,500 --> 00:28:35,610 is to ask for a new array. 539 00:28:35,610 --> 00:28:39,220 And that new array won't be contiguous to your old one. 540 00:28:39,220 --> 00:28:40,093 Question? 
541 00:28:40,093 --> 00:28:42,065 AUDIENCE: [INAUDIBLE] 542 00:28:46,455 --> 00:28:48,080 ERIK DEMAINE: What is a dynamic array-- 543 00:28:48,080 --> 00:28:50,690 that will be the next topic, so maybe we'll come back to that. 544 00:28:50,690 --> 00:28:52,400 Yeah? 545 00:28:52,400 --> 00:28:55,340 In a static array, you're just not allowed to make it bigger. 546 00:28:55,340 --> 00:29:00,500 And so you have to allocate a new array, which 547 00:29:00,500 --> 00:29:01,760 we say takes linear time. 548 00:29:01,760 --> 00:29:05,090 Even if allocating the new array didn't take linear time, 549 00:29:05,090 --> 00:29:07,303 you have to copy all the elements over 550 00:29:07,303 --> 00:29:08,720 from the old array to the new one. 551 00:29:08,720 --> 00:29:12,470 Then you can throw away the old one. 552 00:29:12,470 --> 00:29:14,690 Just the copying from an array of size 553 00:29:14,690 --> 00:29:17,640 n to an array of size n plus 1, that will take linear time. 554 00:29:17,640 --> 00:29:21,020 So static arrays are really bad for dynamic operations-- 555 00:29:21,020 --> 00:29:21,680 no surprise. 556 00:29:21,680 --> 00:29:23,970 But you could do them. 557 00:29:23,970 --> 00:29:26,000 That's static array. 558 00:29:26,000 --> 00:29:29,960 Now, linked lists are going to be almost the opposite-- 559 00:29:29,960 --> 00:29:33,320 well, almost. 560 00:29:33,320 --> 00:29:36,500 If we store the length, OK, we can compute the length 561 00:29:36,500 --> 00:29:39,470 of the list very quickly. 562 00:29:39,470 --> 00:29:44,510 We can insert and delete at the front really efficiently. 563 00:29:44,510 --> 00:29:51,080 If I want to add a new item as a new first item, then what do I do? 564 00:29:51,080 --> 00:29:54,590 I allocate a new node, which I'll call x. 565 00:29:54,590 --> 00:30:01,060 This is insert_first of x. 566 00:30:01,060 --> 00:30:03,810 I'll allocate a new array of size 2. 
567 00:30:03,810 --> 00:30:05,320 I'm going to change-- 568 00:30:05,320 --> 00:30:08,110 let me do it in red. 569 00:30:08,110 --> 00:30:11,080 I'm going to change this head pointer. 570 00:30:11,080 --> 00:30:13,540 Maybe I should do that later. 571 00:30:13,540 --> 00:30:16,397 I'm going to set the next pointer here to this one, 572 00:30:16,397 --> 00:30:17,980 and then I'm going to change this head 573 00:30:17,980 --> 00:30:20,680 pointer to point to here. 574 00:30:20,680 --> 00:30:23,090 And, boom, now I've got a linked list. 575 00:30:23,090 --> 00:30:25,600 Again, we don't know anything about the order in memory 576 00:30:25,600 --> 00:30:26,692 of these nodes. 577 00:30:26,692 --> 00:30:28,150 We just care about the order that's 578 00:30:28,150 --> 00:30:29,860 represented implicitly by following 579 00:30:29,860 --> 00:30:32,700 the next pointers repeatedly. 580 00:30:32,700 --> 00:30:35,820 Now, I've got a new list that has x in front, and then x 0, 581 00:30:35,820 --> 00:30:37,450 and then x 1, and so on. 582 00:30:37,450 --> 00:30:41,045 So insert_first and delete_first, at least, are really efficient. 583 00:30:41,045 --> 00:30:42,420 We won't get much more than that, 584 00:30:42,420 --> 00:30:55,010 but in a linked list, insert_first and delete_first are constant time. 585 00:31:00,350 --> 00:31:03,030 So that's cool. 586 00:31:03,030 --> 00:31:05,120 However, everything else is going to be slow. 587 00:31:05,120 --> 00:31:10,490 If I want to get the 10th item in a linked list, 588 00:31:10,490 --> 00:31:13,490 I have to follow these pointers 10 times. 589 00:31:13,490 --> 00:31:17,510 I go 0, 1, 2, 3, and so on. 590 00:31:17,510 --> 00:31:20,930 Follow 10 next pointers and I'll get the 10th item. 591 00:31:20,930 --> 00:31:23,480 Accessing the ith item is going to take order i time. 592 00:31:29,090 --> 00:31:40,500 Get_at and set_at take theta i time, which, in the worst case, 593 00:31:40,500 --> 00:31:41,220 is theta n. 
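The two operations just analyzed can be sketched in a few lines of Python. This is my own illustration, restating the two-field node so the sketch is self-contained: insert_first does a constant amount of pointer surgery, while get_at has to walk i next pointers.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.next = None  # pointer to the next node (None = end of list)


class LinkedList:
    def __init__(self):
        self.head = None
        self.length = 0

    def insert_first(self, x):
        node = Node(x)         # allocate a new node -- O(1)
        node.next = self.head  # new node points at the old front
        self.head = node       # head now points at the new node
        self.length += 1

    def get_at(self, i):
        node = self.head
        for _ in range(i):     # follow i next pointers -- Theta(i) time
            node = node.next
        return node.item
```

Nothing moves in memory during insert_first; only two pointers change, which is exactly why it beats a static array at the front.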
594 00:31:48,830 --> 00:31:51,330 We have sort of complementary data structures here. 595 00:31:51,330 --> 00:31:54,470 On the one hand, a static array can do constant time 596 00:31:54,470 --> 00:31:55,450 get_at/set_at. 597 00:31:55,450 --> 00:31:57,710 So it's very fast at the random access aspect 598 00:31:57,710 --> 00:31:59,150 because it's an array. 599 00:31:59,150 --> 00:32:01,470 Linked lists are very bad at random access, 600 00:32:01,470 --> 00:32:04,160 but they're better at being dynamic. 601 00:32:04,160 --> 00:32:06,620 We can insert and delete-- at the beginning, at least-- 602 00:32:06,620 --> 00:32:08,960 in constant time. 603 00:32:08,960 --> 00:32:11,240 Now, if we want to actually insert and delete 604 00:32:11,240 --> 00:32:13,670 at a particular position, that's still hard, 605 00:32:13,670 --> 00:32:15,620 because we have to walk to that position. 606 00:32:15,620 --> 00:32:17,810 Even inserting and deleting at the end of the list 607 00:32:17,810 --> 00:32:20,435 is hard, although that's fixable. 608 00:32:23,150 --> 00:32:26,690 And maybe I'll leave that for problem session or problem set. 609 00:32:26,690 --> 00:32:31,170 But an easy-- here's a small puzzle. 610 00:32:31,170 --> 00:32:35,670 Suppose you wanted to solve get_last 611 00:32:35,670 --> 00:32:38,790 efficiently in a linked list. 612 00:32:38,790 --> 00:32:40,710 How would you solve that in constant time? 613 00:32:43,530 --> 00:32:44,132 Yeah? 614 00:32:44,132 --> 00:32:45,340 AUDIENCE: Doubly linked list. 615 00:32:45,340 --> 00:32:46,715 ERIK DEMAINE: Doubly linked list. 616 00:32:46,715 --> 00:32:49,450 It's a good idea, but actually not the right answer. 617 00:32:49,450 --> 00:32:52,030 That's an answer to the next question I might ask. 618 00:32:52,030 --> 00:32:52,891 Yeah? 619 00:32:52,891 --> 00:32:55,650 AUDIENCE: [INAUDIBLE] 620 00:32:55,650 --> 00:32:57,900 ERIK DEMAINE: [INAUDIBLE] pointer to the last element. 
621 00:32:57,900 --> 00:32:59,400 That's all we need here. 622 00:32:59,400 --> 00:33:03,150 And often, a doubly linked list has this. 623 00:33:03,150 --> 00:33:06,220 They usually call this the tail-- head and tail. 624 00:33:06,220 --> 00:33:09,720 And if we always just store a pointer to the last node-- 625 00:33:09,720 --> 00:33:12,390 this is what we call data structure augmentation, 626 00:33:12,390 --> 00:33:14,460 where we add some extra information to the data 627 00:33:14,460 --> 00:33:15,420 structure and-- 628 00:33:15,420 --> 00:33:18,160 we have to keep it up to date all the time. 629 00:33:18,160 --> 00:33:20,400 So if we do an insert_last or something, 630 00:33:20,400 --> 00:33:22,920 insert_last also becomes easy because I can just 631 00:33:22,920 --> 00:33:25,950 add a new node here and update the pointer here. 632 00:33:25,950 --> 00:33:27,270 delete_last is trickier. 633 00:33:27,270 --> 00:33:29,700 That's where you get a doubly linked list. 634 00:33:29,700 --> 00:33:32,130 But whenever I add something to the end of this list, 635 00:33:32,130 --> 00:33:34,440 I have to update the tail pointer also. 636 00:33:34,440 --> 00:33:36,840 As long as I maintain this, now, suddenly get_last 637 00:33:36,840 --> 00:33:38,197 is fast in constant time. 638 00:33:38,197 --> 00:33:40,530 So linked lists are great if you're working on the ends, 639 00:33:40,530 --> 00:33:42,060 even dynamically. 640 00:33:42,060 --> 00:33:44,940 Arrays are great if you're doing random access and nothing 641 00:33:44,940 --> 00:33:45,520 dynamic-- 642 00:33:45,520 --> 00:33:48,450 nothing adding or deleting at the ends or in the middle. 643 00:33:51,270 --> 00:33:54,210 Our final goal for today is to get 644 00:33:54,210 --> 00:33:56,850 sort of the best of both worlds with dynamic arrays. 
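The tail-pointer augmentation just described can be sketched like this (again a self-contained illustration with my own class names, not lecture code): every operation that changes the end of the list must also keep the tail pointer up to date.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.next = None


class TailedList:
    """Singly linked list augmented with a tail pointer, making get_last
    and insert_last O(1). delete_last still needs the predecessor of the
    tail, which is where a doubly linked list comes in."""
    def __init__(self):
        self.head = None
        self.tail = None

    def insert_last(self, x):
        node = Node(x)
        if self.tail is None:      # empty list: node is both head and tail
            self.head = node
        else:
            self.tail.next = node  # old last node links to the new one
        self.tail = node           # keep the augmentation up to date

    def get_last(self):
        return self.tail.item      # O(1) instead of walking the whole list
```

The cost of augmentation is exactly this maintenance obligation: forget to update `tail` in any end-modifying operation and `get_last` silently returns stale data.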
645 00:33:56,850 --> 00:33:59,968 We're going to try to get all of the good running times 646 00:33:59,968 --> 00:34:02,010 of linked lists and all of the good running times 647 00:34:02,010 --> 00:34:03,328 of static arrays. 648 00:34:03,328 --> 00:34:05,370 We won't get quite all of them, but most of them. 649 00:34:12,060 --> 00:34:19,980 And in some sense, another way to describe 650 00:34:19,980 --> 00:34:23,070 what these introductory lectures are about 651 00:34:23,070 --> 00:34:26,969 is telling you about how Python is implemented. 652 00:34:26,969 --> 00:34:31,139 What we're going to talk about next, dynamic arrays, 653 00:34:31,139 --> 00:34:33,090 I've alluded to many times. 654 00:34:33,090 --> 00:34:37,860 But these are what Python calls lists. 655 00:34:43,300 --> 00:34:46,210 You don't have to implement a dynamic array by hand 656 00:34:46,210 --> 00:34:50,739 because it's already built into many fancy new languages 657 00:34:50,739 --> 00:34:53,560 for free, because they're so darn useful. 658 00:34:53,560 --> 00:34:56,139 This lecture is about how these are actually implemented 659 00:34:56,139 --> 00:34:59,080 and why they're efficient. 660 00:34:59,080 --> 00:35:00,822 And in recitation notes, you'll see 661 00:35:00,822 --> 00:35:02,530 how to actually implement them if all you 662 00:35:02,530 --> 00:35:04,420 had were static arrays. 663 00:35:04,420 --> 00:35:06,130 But luckily, we have dynamic arrays, 664 00:35:06,130 --> 00:35:08,410 so we don't have to actually implement them. 665 00:35:08,410 --> 00:35:11,480 But inside the Python interpreter, 666 00:35:11,480 --> 00:35:14,500 this is exactly what's happening. 667 00:35:14,500 --> 00:35:23,200 The idea is to relax the constraint-- 668 00:35:23,200 --> 00:35:25,150 or the invariant, whatever-- 669 00:35:25,150 --> 00:35:30,340 that the size of the array we use 670 00:35:30,340 --> 00:35:37,150 equals n, which is the number of items in the sequence. 
671 00:35:40,773 --> 00:35:42,190 Remember, in the sequence problem, 672 00:35:42,190 --> 00:35:44,560 we're supposed to represent n items. 673 00:35:44,560 --> 00:35:47,020 With a static array, we allocated an array of size 674 00:35:47,020 --> 00:35:48,730 exactly n. 675 00:35:48,730 --> 00:35:49,780 So let's relax that. 676 00:35:49,780 --> 00:35:51,400 Let's not make it exactly n. 677 00:35:51,400 --> 00:35:54,100 Let's make it roughly n. 678 00:35:54,100 --> 00:35:56,890 How roughly, you can think about for a while. 679 00:35:56,890 --> 00:35:59,800 But from an algorithms perspective, 680 00:35:59,800 --> 00:36:02,410 usually, when we say roughly, we mean throw away 681 00:36:02,410 --> 00:36:03,220 constant factors. 682 00:36:03,220 --> 00:36:04,570 And that turns out to be the right answer here. 683 00:36:04,570 --> 00:36:06,040 It's not always the right answer. 684 00:36:06,040 --> 00:36:11,980 But we're going to enforce that the size of the array 685 00:36:11,980 --> 00:36:14,230 is theta n-- 686 00:36:14,230 --> 00:36:15,940 probably also greater than or equal to n. 687 00:36:15,940 --> 00:36:18,200 0.5 n would not be very helpful. 688 00:36:18,200 --> 00:36:21,370 So it's going to be at least n, and it's 689 00:36:21,370 --> 00:36:25,430 going to be at most some constant times n. 690 00:36:25,430 --> 00:36:28,225 2n, 10n, 1.1 times n. 691 00:36:28,225 --> 00:36:29,600 Any of these constants will work. 692 00:36:29,600 --> 00:36:32,840 I'm going to use 2n here, but there are lots of options. 693 00:36:35,430 --> 00:36:38,740 And now, things almost work for free. 694 00:36:38,740 --> 00:36:41,570 There's going to be one subtlety here. 695 00:36:41,570 --> 00:36:44,310 And I'm going to focus on-- 696 00:36:44,310 --> 00:36:50,990 we're still going to maintain that the ith item of the array 697 00:36:50,990 --> 00:36:54,460 represents x i. 698 00:36:54,460 --> 00:36:58,240 This data structure-- let me draw a picture. 
699 00:36:58,240 --> 00:37:00,130 We've got an array of some size. 700 00:37:00,130 --> 00:37:07,850 The first few items are used to store the sequence. 701 00:37:07,850 --> 00:37:11,570 But then, there's going to be some blank ones at the end. 702 00:37:11,570 --> 00:37:13,440 Maybe we'll keep track of this-- 703 00:37:13,440 --> 00:37:16,640 so the data structure itself is going to have an array and it's 704 00:37:16,640 --> 00:37:19,100 going to have a length. 705 00:37:19,100 --> 00:37:20,155 Something like this. 706 00:37:20,155 --> 00:37:22,030 We're also going to keep track of the length. 707 00:37:22,030 --> 00:37:24,500 So we know that the first length items 708 00:37:24,500 --> 00:37:30,500 are where the data is, and the remainder are meaningless. 709 00:37:30,500 --> 00:37:39,760 So now, if I want to go and do an insert_last, what do I do? 710 00:37:39,760 --> 00:37:43,590 I just go to a of length and set it to x. 711 00:37:48,480 --> 00:37:52,200 And then I increment length. 712 00:37:52,200 --> 00:37:52,700 Boom. 713 00:37:52,700 --> 00:37:53,200 Easy. 714 00:37:53,200 --> 00:37:53,850 Constant time. 715 00:37:53,850 --> 00:37:54,350 Yeah? 716 00:37:54,350 --> 00:37:56,770 AUDIENCE: [INAUDIBLE] 717 00:37:56,770 --> 00:37:58,520 ERIK DEMAINE: How do you have enough room. 718 00:37:58,520 --> 00:37:59,390 Indeed, I don't. 719 00:37:59,390 --> 00:38:01,520 This was an incorrect algorithm. 720 00:38:01,520 --> 00:38:03,110 But it's usually correct. 721 00:38:03,110 --> 00:38:04,760 As long as I have extra space, this 722 00:38:04,760 --> 00:38:08,000 is all I need to do for insert_last. 723 00:38:08,000 --> 00:38:14,100 But I am also going to store the size of the array. 724 00:38:14,100 --> 00:38:15,500 This is the actual-- 725 00:38:15,500 --> 00:38:22,860 this whole thing is size, and this part is length. 726 00:38:22,860 --> 00:38:26,640 Length is always going to be less than or equal to size. 727 00:38:26,640 --> 00:38:27,870 And so there's a problem. 
728 00:38:27,870 --> 00:38:31,290 If length equals size, then I don't have any space. 729 00:38:36,040 --> 00:38:48,070 Just add to end unless n equals size. 730 00:38:48,070 --> 00:38:51,950 I'm using n and length for the same thing. 731 00:38:51,950 --> 00:38:55,860 So length here is the same as n. 732 00:38:55,860 --> 00:38:57,510 That's our actual number of things 733 00:38:57,510 --> 00:38:59,200 we're trying to represent. 734 00:38:59,200 --> 00:39:00,660 And size-- this is great. 735 00:39:00,660 --> 00:39:03,305 This is the interface size. 736 00:39:03,305 --> 00:39:04,930 This is what we're trying to represent. 737 00:39:04,930 --> 00:39:07,920 And this is the representation size. 738 00:39:07,920 --> 00:39:09,155 This is the size of my array. 739 00:39:09,155 --> 00:39:10,530 These are the number of items I'm 740 00:39:10,530 --> 00:39:12,220 trying to store in that array. 741 00:39:12,220 --> 00:39:14,275 This is the interface versus data structure. 742 00:39:14,275 --> 00:39:15,150 Here's the interface. 743 00:39:15,150 --> 00:39:17,160 Here's the data structure. 744 00:39:17,160 --> 00:39:18,680 OK, cool. 745 00:39:22,560 --> 00:39:25,395 What do I do in the case when n equals size? 746 00:39:33,280 --> 00:39:36,110 I'm going to have to make my array bigger. 747 00:39:36,110 --> 00:39:40,075 This should sound just like static arrays. 748 00:39:40,075 --> 00:39:43,000 For static arrays, we made our array bigger every time 749 00:39:43,000 --> 00:39:44,230 we inserted. 750 00:39:44,230 --> 00:39:49,420 And that was this linear cost of allocation. 751 00:39:49,420 --> 00:39:50,890 We're going to do that sometimes. 752 00:39:50,890 --> 00:39:53,140 With static arrays, we had to do it every single time, 753 00:39:53,140 --> 00:39:55,450 because size equaled n. 754 00:39:55,450 --> 00:39:56,920 Now, we have some flexibility. 755 00:39:56,920 --> 00:39:59,780 We're only going to do it sometimes. 
756 00:39:59,780 --> 00:40:03,770 It's like, cookies are a sometimes food, apparently, 757 00:40:03,770 --> 00:40:06,530 according to modern Cookie Monster. 758 00:40:06,530 --> 00:40:07,860 I don't understand. 759 00:40:07,860 --> 00:40:12,470 But if n equals size, we're going 760 00:40:12,470 --> 00:40:25,625 to allocate a new array of size-- 761 00:40:30,190 --> 00:40:31,060 any suggestions? 762 00:40:31,060 --> 00:40:32,015 AUDIENCE: Bigger. 763 00:40:32,015 --> 00:40:32,890 ERIK DEMAINE: Bigger. 764 00:40:32,890 --> 00:40:33,760 I like it. 765 00:40:33,760 --> 00:40:35,740 Greater than size. 766 00:40:35,740 --> 00:40:36,790 How much bigger? 767 00:40:36,790 --> 00:40:38,550 AUDIENCE: Twice. 768 00:40:38,550 --> 00:40:40,003 ERIK DEMAINE: Twice. 769 00:40:40,003 --> 00:40:40,920 JASON KU: Five things. 770 00:40:40,920 --> 00:40:43,020 ERIK DEMAINE: Five things. 771 00:40:43,020 --> 00:40:44,640 Size plus 5? 772 00:40:44,640 --> 00:40:47,100 Come on, Jason. 773 00:40:47,100 --> 00:40:48,700 Trolling me. 774 00:40:48,700 --> 00:40:49,200 All right. 775 00:40:49,200 --> 00:40:51,490 There are a couple of natural choices here. 776 00:40:51,490 --> 00:40:53,250 One is a constant factor larger. 777 00:40:53,250 --> 00:40:57,540 You could use 1.1, or 1.01, or two, or 5, or 10. 778 00:40:57,540 --> 00:40:58,860 They will all work. 779 00:40:58,860 --> 00:41:02,430 Or you could use Jason's trolling answer of size 780 00:41:02,430 --> 00:41:06,420 plus a constant, like 5. 781 00:41:06,420 --> 00:41:07,320 Why is this bad? 782 00:41:10,390 --> 00:41:10,890 Yeah? 783 00:41:10,890 --> 00:41:13,290 AUDIENCE: [INAUDIBLE] 784 00:41:18,072 --> 00:41:19,780 ERIK DEMAINE: You'll have to do it again. 785 00:41:19,780 --> 00:41:22,060 You'll have to resize frequently. 786 00:41:22,060 --> 00:41:22,930 When? 787 00:41:22,930 --> 00:41:25,600 Five steps later. 
788 00:41:25,600 --> 00:41:27,190 In the original static array, we were 789 00:41:27,190 --> 00:41:29,410 reallocating every single time. 790 00:41:29,410 --> 00:41:30,970 That's like size plus 1. 791 00:41:30,970 --> 00:41:33,340 If we do size plus 5, that really doesn't change things 792 00:41:33,340 --> 00:41:34,960 if we ignore constant factors. 793 00:41:34,960 --> 00:41:37,840 Now, we'll have to spend linear time every five steps instead 794 00:41:37,840 --> 00:41:39,490 of linear time every one step. 795 00:41:39,490 --> 00:41:42,250 That's still linear time per operation, just, 796 00:41:42,250 --> 00:41:44,710 we're changing the constant factor. 797 00:41:44,710 --> 00:41:46,810 Whereas 2 times size, well, now we 798 00:41:46,810 --> 00:41:49,340 have to think a little bit harder. 799 00:41:49,340 --> 00:41:51,370 Let's just think about the case where 800 00:41:51,370 --> 00:41:54,370 we're inserting at the end of an array. 801 00:41:54,370 --> 00:42:05,470 Let's say we do n insert_lasts from an empty array. 802 00:42:12,010 --> 00:42:13,270 When do we resize? 803 00:42:13,270 --> 00:42:14,845 Well, at the beginning-- 804 00:42:14,845 --> 00:42:16,970 I guess I didn't say what we do for an empty array. 805 00:42:16,970 --> 00:42:18,670 Let's say size equals 1. 806 00:42:18,670 --> 00:42:20,320 We can insert one item for free. 807 00:42:20,320 --> 00:42:23,500 As soon as we insert the second item, then we have to resize. 808 00:42:23,500 --> 00:42:24,460 That seems bad. 809 00:42:24,460 --> 00:42:26,080 Immediately, we have to resize. 810 00:42:26,080 --> 00:42:27,550 Then we insert the third item. 811 00:42:27,550 --> 00:42:30,110 OK, now let's draw a picture. 812 00:42:30,110 --> 00:42:31,330 So we start with one item. 813 00:42:31,330 --> 00:42:33,260 We fill it up. 814 00:42:33,260 --> 00:42:37,000 Then, we grow to size 2, because that's twice 1. 815 00:42:37,000 --> 00:42:37,960 Then we fill it up. 
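Putting the pieces together, here is a sketch of a dynamic array with the doubling strategy, simulating static arrays with fixed-size Python lists. The class name and details are my own, a toy version of what a real interpreter does, not CPython's actual implementation.

```python
class DynamicArray:
    """Sketch of a dynamic array (what Python calls a list).
    Maintains size = Theta(length): doubles the allocation when full."""
    def __init__(self):
        self.size = 1                  # allocated slots (representation size)
        self.A = [None] * self.size    # the current "static array"
        self.length = 0                # n, the items actually stored

    def insert_last(self, x):
        if self.length == self.size:       # full: allocate a 2x array
            self.size *= 2
            B = [None] * self.size
            for i in range(self.length):   # Theta(n) copy, but it's rare
                B[i] = self.A[i]
            self.A = B                     # old array can be de-allocated
        self.A[self.length] = x            # maintain A[i] == x_i
        self.length += 1

    def get_at(self, i):
        return self.A[i]                   # still constant time
```

Because the invariant A[i] == x_i is preserved, get_at and set_at stay constant time; the only cost of the relaxation is the occasional doubling copy.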
816 00:42:37,960 --> 00:42:39,520 Immediately, we have to resize again. 817 00:42:39,520 --> 00:42:41,290 But now we start to get some benefit. 818 00:42:41,290 --> 00:42:45,100 Now, we have size 4, and so we can insert two items 819 00:42:45,100 --> 00:42:46,870 before we have to resize. 820 00:42:46,870 --> 00:42:50,740 And now, we're size 8, and we get 821 00:42:50,740 --> 00:42:55,510 to insert four items before we refill. 822 00:42:55,510 --> 00:42:57,970 This is going to resize-- 823 00:42:57,970 --> 00:43:00,190 and again, resizes are expensive both 824 00:43:00,190 --> 00:43:02,620 because we have to pay to allocate the new array-- 825 00:43:02,620 --> 00:43:04,890 I drew it as just extending it, but in fact, we're 826 00:43:04,890 --> 00:43:06,730 creating a whole new array, and then 827 00:43:06,730 --> 00:43:09,293 we have to copy all of the items over. 828 00:43:09,293 --> 00:43:11,710 So there's the allocation cost and then the copying costs. 829 00:43:11,710 --> 00:43:13,600 It's linear either way. 830 00:43:13,600 --> 00:43:20,500 But we're going to resize at n equals 1, 2, 4, 8, 831 00:43:20,500 --> 00:43:22,180 16-- you know this sequence. 832 00:43:22,180 --> 00:43:25,180 All the powers of 2, because we're doubling. 833 00:43:25,180 --> 00:43:28,240 That is exactly powers of 2. 834 00:43:28,240 --> 00:43:31,690 So we pay a linear cost. 835 00:43:31,690 --> 00:43:37,150 This resize cost, the allocation and the copying, 836 00:43:37,150 --> 00:43:40,070 is going to be-- it's linear each time. 837 00:43:40,070 --> 00:43:46,480 So it's 1 plus 2 plus 4 plus 8 plus 16. 838 00:43:46,480 --> 00:43:51,070 Really, I should write this as a sum from i 839 00:43:51,070 --> 00:43:53,680 equals 1 to roughly log n-- 840 00:43:53,680 --> 00:43:58,270 log base 2 of n is lg n-- of 2 to the i. 841 00:44:01,160 --> 00:44:04,700 If you want the last term here, it's roughly n. 
842 00:44:04,700 --> 00:44:06,140 It's actually the next-- 843 00:44:06,140 --> 00:44:09,110 the previous power of 2 of n, or something. 844 00:44:09,110 --> 00:44:10,187 But that won't matter. 845 00:44:10,187 --> 00:44:12,270 That will just affect things by a constant factor. 846 00:44:12,270 --> 00:44:16,510 What is the sum of 2 to the i? 847 00:44:16,510 --> 00:44:18,792 This is a geometric series. 848 00:44:18,792 --> 00:44:19,750 Anyone know the answer? 849 00:44:25,690 --> 00:44:26,525 Yeah? 850 00:44:26,525 --> 00:44:29,380 AUDIENCE: [INAUDIBLE] 851 00:44:29,380 --> 00:44:33,800 ERIK DEMAINE: 2 to the top limit plus 1 minus 1. 852 00:44:33,800 --> 00:44:34,300 Yeah. 853 00:44:34,300 --> 00:44:36,520 So this is the identity. 854 00:44:36,520 --> 00:44:45,760 Sum of 2 to the i from i equals 1 to k is 2 to the k plus 1, 855 00:44:45,760 --> 00:44:47,830 minus 1. 856 00:44:47,830 --> 00:44:49,123 So the plus 1 is upstairs. 857 00:44:49,123 --> 00:44:50,290 The minus 1 is downstairs. 858 00:44:50,290 --> 00:44:53,290 An easy way to remember this is if you think in binary-- 859 00:44:53,290 --> 00:44:54,220 as we all should. 860 00:44:54,220 --> 00:44:55,870 We're computer scientists. 861 00:44:55,870 --> 00:45:00,850 2 to the i means you set the ith bit to 1. 862 00:45:00,850 --> 00:45:02,290 Here's a bit string. 863 00:45:02,290 --> 00:45:03,250 This is the ith bit. 864 00:45:03,250 --> 00:45:04,480 This is 2 to the i. 865 00:45:04,480 --> 00:45:05,950 0 is down here. 866 00:45:05,950 --> 00:45:11,500 If I sum them all up, what that means is, I'm putting 1s here. 867 00:45:11,500 --> 00:45:15,820 And if you think about what this means, this is up to k from 0-- 868 00:45:15,820 --> 00:45:19,750 sorry, I should do 0 to be proper. 869 00:45:19,750 --> 00:45:23,110 If I write-- that's the left-hand side. 870 00:45:23,110 --> 00:45:27,700 The right-hand side is 2 to the k plus 1, which is a 1 here, 871 00:45:27,700 --> 00:45:30,100 and the rest 0s. 
872 00:45:30,100 --> 00:45:31,900 So if you know your binary arithmetic, 873 00:45:31,900 --> 00:45:34,330 you subtract-- if you add 1 to this, you get this. 874 00:45:34,330 --> 00:45:36,670 Or if you subtract 1 from this, you get this. 875 00:45:36,670 --> 00:45:39,610 This is why this identity holds. 876 00:45:39,610 --> 00:45:42,040 Or the higher-level thing is to say, oh, 877 00:45:42,040 --> 00:45:44,120 this is a geometric series. 878 00:45:44,120 --> 00:45:46,420 So I know-- you should know this. 879 00:45:46,420 --> 00:45:47,440 I'm telling you now. 880 00:45:47,440 --> 00:45:50,990 Geometric series are dominated by the last term-- the biggest 881 00:45:50,990 --> 00:45:51,490 term. 882 00:45:51,490 --> 00:45:53,948 If you have any series you can identify as geometric, which 883 00:45:53,948 --> 00:45:56,380 means it's growing at least exponentially, 884 00:45:56,380 --> 00:45:59,135 then in terms of theta notation, you 885 00:45:59,135 --> 00:46:01,510 can just look at the last term and put a theta around it, 886 00:46:01,510 --> 00:46:03,050 and you're done. 887 00:46:03,050 --> 00:46:05,290 So this is theta of the last term, 888 00:46:05,290 --> 00:46:09,340 like 2 to the log n, which is theta n. 889 00:46:12,920 --> 00:46:14,030 Cool. 890 00:46:14,030 --> 00:46:15,710 Linear time. 891 00:46:15,710 --> 00:46:18,140 Linear time for all of my operations. 892 00:46:18,140 --> 00:46:21,800 I'm doing n operations here, and I spent linear total time 893 00:46:21,800 --> 00:46:24,020 to do all of the resizing. 894 00:46:24,020 --> 00:46:24,620 That's good. 895 00:46:24,620 --> 00:46:28,401 That's like constant each, kind of. 896 00:46:28,401 --> 00:46:31,850 The "kind of" is an important notion 897 00:46:31,850 --> 00:46:35,990 which we call amortization. 
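The geometric-series identity and the "dominated by the last term" rule can both be sanity-checked in a couple of lines of Python:

```python
# Check the identity sum_{i=0}^{k} 2^i = 2^(k+1) - 1, and that the total
# is within a constant factor (here, 2) of the biggest term -- which is
# why a geometric series is theta of its last term.
for k in range(1, 20):
    total = sum(2**i for i in range(k + 1))
    assert total == 2**(k + 1) - 1
    assert total < 2 * 2**k  # dominated by the last term
```

So the total resize cost over n insert_lasts, a geometric series ending at roughly n, is theta(n).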
898 00:46:45,360 --> 00:46:55,560 I want to say an operation takes t of n 899 00:46:55,560 --> 00:47:13,540 amortized time if, let's say, any k of those operations 900 00:47:13,540 --> 00:47:19,790 take, at most, k times t of n time. 901 00:47:19,790 --> 00:47:23,080 This is a little bit sloppy, but it'll be good enough. 902 00:47:23,080 --> 00:47:26,110 The idea here is, if this works with k equal to n, 903 00:47:26,110 --> 00:47:29,170 to do n operations from an empty array here 904 00:47:29,170 --> 00:47:33,280 takes linear time, which means I would 905 00:47:33,280 --> 00:47:36,670 call this constant amortized. 906 00:47:36,670 --> 00:47:42,390 Amortized means a particular kind of averaging-- 907 00:47:42,390 --> 00:47:44,710 averaging over the sequence of operations. 908 00:47:44,710 --> 00:47:47,190 So while individual operations will be expensive, 909 00:47:47,190 --> 00:47:49,980 one near the end, when I have to resize the array, 910 00:47:49,980 --> 00:47:52,830 is going to take linear time just for that one operation. 911 00:47:52,830 --> 00:47:54,430 But most of the operations are cheap. 912 00:47:54,430 --> 00:47:56,110 Most of them are constant. 913 00:47:56,110 --> 00:48:00,510 So I can think of charging that high cost 914 00:48:00,510 --> 00:48:04,620 to all of the other operations that made it happen. 915 00:48:04,620 --> 00:48:11,880 This is averaging over the operation sequence. 916 00:48:11,880 --> 00:48:17,160 Every insert_last over there only takes constant time, 917 00:48:17,160 --> 00:48:23,750 on average, over the sequence of operations that we do. 918 00:48:23,750 --> 00:48:25,060 And so it's almost constant. 919 00:48:25,060 --> 00:48:27,310 It's not quite as good as constant, worst case, 920 00:48:27,310 --> 00:48:28,750 but it's almost as good. 921 00:48:28,750 --> 00:48:30,280 And it's as good as you could hope 922 00:48:30,280 --> 00:48:35,080 to do in this dynamic array allocation model. 
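[The amortized bound can be observed empirically. This is a minimal dynamic array sketch for illustration, not the lecture's code, where `work` counts element writes as a simple cost model; with doubling, any k insert_last operations cost at most a constant times k:]

```python
# Minimal dynamic array sketch: insert_last with doubling is constant amortized.
class DynamicArray:
    def __init__(self):
        self.data = [None]      # allocated slots (capacity 1 to start)
        self.n = 0              # number of items actually stored
        self.work = 0           # total element writes, our cost measure

    def insert_last(self, x):
        if self.n == len(self.data):            # full: allocate double, copy over
            new = [None] * (2 * len(self.data))
            for i in range(self.n):
                new[i] = self.data[i]
                self.work += 1
            self.data = new
        self.data[self.n] = x
        self.n += 1
        self.work += 1

A = DynamicArray()
k = 1000
for i in range(k):
    A.insert_last(i)
assert A.work <= 3 * k    # any k inserts cost at most c*k total: constant amortized
```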
923 00:48:35,080 --> 00:48:37,610 Let me put this into a table. 924 00:48:37,610 --> 00:48:39,940 And you'll find these in the lecture notes, also. 925 00:48:39,940 --> 00:48:42,130 We have, on the top, the main operations 926 00:48:42,130 --> 00:48:45,130 of the sequence interface, which we will revisit in lecture seven. 927 00:48:45,130 --> 00:48:47,170 We'll see some other data structures for this. 928 00:48:47,170 --> 00:48:49,167 Get_at and set_at in the first column. 929 00:48:49,167 --> 00:48:51,250 Insert_ and delete_first, insert_ and delete_last, 930 00:48:51,250 --> 00:48:54,340 insert_ and delete_at an arbitrary position. 931 00:48:54,340 --> 00:48:56,710 We've seen three data structures now. 932 00:48:56,710 --> 00:48:59,320 Arrays were really good at get_at/set_at. 933 00:48:59,320 --> 00:49:00,760 They took constant time. 934 00:49:00,760 --> 00:49:02,050 That's the blue one. 935 00:49:02,050 --> 00:49:04,075 We're omitting the thetas here. 936 00:49:04,075 --> 00:49:05,950 All of the other operations took linear time, 937 00:49:05,950 --> 00:49:07,360 no matter where they were. 938 00:49:07,360 --> 00:49:09,877 Linked lists were really good at insert_ and delete_first. 939 00:49:09,877 --> 00:49:11,710 They took constant time, but everything else 940 00:49:11,710 --> 00:49:13,690 took linear time, in the worst case. 941 00:49:13,690 --> 00:49:17,890 These new dynamic arrays achieve get_at and set_at 942 00:49:17,890 --> 00:49:24,060 in constant time because they maintain this invariant here 943 00:49:24,060 --> 00:49:25,410 that A of i equals x i. 944 00:49:25,410 --> 00:49:29,920 So we can still do get_ and set_at quickly. 945 00:49:29,920 --> 00:49:32,070 And we also just showed that insert_last 946 00:49:32,070 --> 00:49:33,540 is constant amortized. 947 00:49:33,540 --> 00:49:38,500 delete_last, you don't have to resize the array. 
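[The linked-list row of that table can be made concrete with a few lines. A sketch of my own, not from the lecture: insert_first and delete_first only touch the head node, which is why both are constant time.]

```python
# Linked list sketch: insert_first and delete_first are O(1),
# since each only manipulates the head pointer.
class Node:
    def __init__(self, x, next=None):
        self.x = x
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_first(self, x):          # O(1): one allocation, one pointer swap
        self.head = Node(x, self.head)

    def delete_first(self):             # O(1): unlink and return the head item
        x = self.head.x
        self.head = self.head.next
        return x

L = LinkedList()
L.insert_first(2)
L.insert_first(1)
assert L.delete_first() == 1
assert L.delete_first() == 2
```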
948 00:49:38,500 --> 00:49:40,750 You could just decrease length and, boom, 949 00:49:40,750 --> 00:49:43,130 you've deleted the last item. 950 00:49:43,130 --> 00:49:45,430 It's not so satisfying, because if you insert n items 951 00:49:45,430 --> 00:49:47,290 and then delete n items, you'll still 952 00:49:47,290 --> 00:49:50,200 have an array of size theta n, even though your current value 953 00:49:50,200 --> 00:49:52,120 of n is 0. 954 00:49:52,120 --> 00:49:54,250 You can get around that with a little bit more 955 00:49:54,250 --> 00:49:56,375 trickery, which is described in the lecture notes. 956 00:49:56,375 --> 00:49:59,530 But it's beyond the-- we're only going to do very simple 957 00:49:59,530 --> 00:50:01,450 amortized analysis in this class-- 958 00:50:01,450 --> 00:50:03,550 to prove that that algorithm is also constant 959 00:50:03,550 --> 00:50:05,320 amortized, which it is. 960 00:50:05,320 --> 00:50:09,820 You'll see it in 046, or you can find it in the CLRS book. 961 00:50:09,820 --> 00:50:11,820 That's it for today.
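[One common version of that trickery, sketched here under the usual assumption (shrink by half once only a quarter of the capacity is in use, so the allocation stays theta of the current n):]

```python
# Dynamic array sketch with shrinking delete_last (illustration only).
# Rule assumed: double when full; halve once only a quarter is in use.
class DynamicArray:
    def __init__(self):
        self.data = [None]
        self.n = 0

    def _resize(self, capacity):
        new = [None] * capacity
        for i in range(self.n):
            new[i] = self.data[i]
        self.data = new

    def insert_last(self, x):
        if self.n == len(self.data):
            self._resize(2 * len(self.data))
        self.data[self.n] = x
        self.n += 1

    def delete_last(self):
        self.n -= 1
        x = self.data[self.n]
        self.data[self.n] = None
        if 0 < self.n <= len(self.data) // 4:   # mostly empty: shrink by half
            self._resize(len(self.data) // 2)
        return x

A = DynamicArray()
for i in range(1000):
    A.insert_last(i)
for i in range(999):
    A.delete_last()
assert A.n == 1
assert len(A.data) <= 4 * A.n   # allocation is theta(current n), not theta(max n)
```

The quarter-full threshold (rather than half-full) is what keeps this constant amortized: after any resize, the array is exactly half full, so a linear number of operations must happen before the next resize.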