# Lecture 7: Randomization: Skip Lists

Flash and JavaScript are required for this feature.

Description: In this lecture, Professor Devadas continues with randomization, introducing skip lists as a randomized data structure.

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

SRINIVAS DEVADAS: All right. Good morning, everyone. And let's get started. Today's lecture is about a randomized data structure called the skip list. And it's a data structure that, obviously because it's randomized, we'd have to do a probabilistic analysis for.

And we're going to sort of raise the stakes here a little bit with respect to our expectation-- the pun intended of this data structure-- in the sense that we're not going to be happy with just doing an expected value analysis or to get what the expectation is of the search complexity in a skip list.

We're going to introduce this notion with high probability, which is a stronger notion than just giving you the expected value or the expectation for the complexity of a search algorithm. And we're going to prove that under this notion, that search has a particular complexity with high probability. So we'll get to the with high probability part a little bit later in the lecture, but we're just going to start off doing some cool data structure design, I guess, [INAUDIBLE] pointing to the skip list.

The skip list is a relatively young data structure invented by a guy called Bill Pugh in 1989, so not much older than you guys. It's relatively easy to implement as you'll see. I won't really claim that, but hopefully you'll be convinced by the time you're done describing the structure. Especially in comparison to balanced trees.

And we can do a comparison after we do our analysis of the data structure as to what the complexity comparisons are for search and insert when you take a skip list and compare it to an AVL tree, for example, or a red black tree, et cetera. In general, when we have a data structure, we want it to be dynamic. The skip list maintains a dynamic set.

What that means is not only do you want to search on it-- obviously it's uninteresting to have a static data structure and do a search. You want to be able to change it, want to be able to insert values into it. There's a complexity of insert to worry about. You want to be able to delete values. And the richness of the data structure comes from the operations and the augmentations you can do on it, and the skip lists are no exception to that.

So if you want to maintain a dynamic set of n elements, and you obviously know a ton of data structures to do this, each of which has different characteristics. And this is, if you ignore hash tables, this is your first real randomized data structure, if you're just taking 6006 and this class might have seen randomized structures in other classes.

So we're going to try and do this in order log n time. As you know with balanced binary trees, you can do things in order log n time, a ton of things, pretty much everything that is interesting. And this, given that it's randomized, it's a relatively easy analysis to show that the expectation or the expected value of a search would be order log n expected time.

But we're going to, as I said, raise the stakes, and we're going to spend a ton of time the second half of this showing the with high probability result. And that's a stronger result than just saying that search takes expected order log n time. All right, so that's the context.

You can think of a skip list as beginning with a simple linked list. So if we have one link list and that link list-- let's first think of this as being unsorted. So suppose I have a link list which is unsorted and I want to search for a particular value in this link list. And we can assume that this is a doubly-linked list, so the arrows go both ways. You have a pointer, let's say, just to the first element.

So if you have a list that's unsorted and you want to search for an element, you would want to do a membership query. If there's n elements, the complexity is?

AUDIENCE: Order n.

SRINIVAS DEVADAS: Order n. So a linked list, the search takes the order n time. Now let's go ahead and say that we are sorting this list, so it's a sorted linked list. So your values here, 14, 23, 34, 42, 50, 59. They're sorted in ascending order. You still only have a pointer to the front of the list and it's a doubly-linked list, what is the complexity of search in the sorted link list?

AUDIENCE: Log n.

SRINIVAS DEVADAS: Log n. Oh, I wanted to hear that. Because it is?

AUDIENCE: Order n.

SRINIVAS DEVADAS: It's order n. log n is-- yeah, that was a trick question. Because I liked that answer, the person who said log n gets a Frisbee. This person won't admit.

[LAUGHTER]

Oh, it was you. OK, all right. There you go. All right.

So log n would imply that you have random access. If you have an array that's sorted and you can go [? AFi, ?] and you can go [? AFi ?] divided by 2, or [? AF2i ?] and you can go directly to that element, then you can do binary search and you can get a log n, order log in.

But here, the sorting actually doesn't help you with respect to the search simply because you have to start from the beginning, from the front of the list, and you've got to keep walking. The only place that it helps you is that if you know it's sorted and you're looking for 37, you can stop after you see 42, right?

That's pretty much the only place that it helps you. But it's still order n because that could happen-- on average is going to happen halfway through the list for a given membership query. So it's still order n for a sorted link list.

But now let's say that we had two sorted link lists. And how are these two link lists structured? Well, they're structured in a certain way, and let me draw our canonical example for skip list that I'm going to keep coming back to. So I won't erase this, but I'll draw one out-- 1, 2, 3, 4, 5, 6, 7, 8, 9-- 9 elements.

So that's my first list which is sorted, and so I have 14, 23, 34, 42, 50, 59, 66, 72, and 79. What I'm going to have now is another list sort of on top of this. I can move from top to bottom, et cetera. But I'm not going to have elements on top of each bottom element.

By convention, I'm going to have elements on top of the first element, regardless of how many lists I have. We only have two at this point. And so I see a 14, which is exactly the same element duplicated up on the top list. And that list is also sorted, but I won't have all of the elements in the top list.

I'm just picking a couple here. So I've got 34, 42-- they're adjacent here-- and then I go all the way up to 72, and I duplicate it, et cetera. Now, this looks kind of random. Anybody recognize these numbers? No one from the great City of New York? No? Yup, yup.

AUDIENCE: On the subway stops?

SRINIVAS DEVADAS: Yeah, subway stops on the Seventh Avenue Express Line. So this is exactly the notion of a skip list, the fact that you have-- could you stand up? Great. All right.

So the notion here is that you don't have to make a lot of stops if you know you have to go far. So if you want to go from 14th Street to 72nd Street, you just take the express line. But if you want to go to 66th Street, what would you do?

AUDIENCE: Go to 72nd and then go back.

SRINIVAS DEVADAS: Well, that's one way. That's one way. That's not the way I wanted. The way we're going to do this is we're not going to overshoot. So we want to minimize distance, let's say. So our secondary thing is going to be minimizing distance travel.

And so you're going to pop up the express line, go all the way to 42nd Street, and you're going to say if I go to the next stop on the Express Line, I'm going too far. And so you're going to pop down to the local line. So you can think of this as being link list L0 and link list L1. You're going to pop down, and then you're going to go to 66th Street.

So search 66 will be going from 14 to 42 on L1, and then from 42, let's just say that's walking. 42 to 42, L1 to L0. And then 42 to 66 on L0.

So that's the basic notion of a skip list. So you can see that it's really pretty simple. What we're going to do now is do two things. I want to think about this double-sorted list as a data structure in its own right before I dive into skip lists in general. And I want to analyze at some level, the best case situation for worst case complexity.

And by that I mean I want to structure the express stops in the best manner possible. These stops are very structured for passengers because they figured fancy stops on 42nd Avenue, whatever-- fancy stores. Everybody wants to go there and so on and so forth. So you have 34 pretty close to 42 because they're both popular destinations.

But let's say that things where I guess more egalitarian and randomized, if you will. And what I want to do is I want to structure this double-sorted list so I get the best worst case complexity for search. And so let's do that.

And before I do that, let me write out the search algorithm, which is going to be important. I want you to assimilate this, keep it in your head because we're going to analyze search pretty much for the rest of the morning here. And so I'll write this down. You've got a sense of what it is based on what I just did here with this example of 66, but worth writing down.

We're going to walk right in the top linked list, so this is simply for two linked lists, and we'll generalize at some point. So we want to walk right in the top linked list, L1, until going right would go too far.

Now, there was this answer with 72, which I kind of dismissed. But there's no reason why you can't overshoot one stop and go backwards. It would just be a different search algorithm. It's not something we're going to analyze here. It turns out in analyzing that with high probability would be even more painful than the painful analysis we're going to do. So we won't go there.

And then we walk down to the bottom list. And the bottom list we'll call L0. And walk right in L0 until the element is found or not. And you know that if you've overshot.

So if you're looking here for route 67, when you get to 72 here-- you've seen 66 and you get to 72 and you're looking for 67, search fails. It stops and fails. Doesn't succeed in this case.

So that's what we got for search. And that's our two linked list argument. Now, our analysis essentially says what I have is I'm walking right at the bottom list here, and my top list is L1, so I start with L1. And my search cost is going to be approximately the length of L1.

The worst case analysis, I could go all the way on the top list-- it's possible. But for a given value, I'm going to be looking at only a portion of the bottom list. I'm not going to go all the way on the bottom list ever. I'm only going to be looking at a portion of it.

So it's going to be L0 divided by L1, if I have interspersed my express stops in a uniform way. So there's no reason-- if I have 100 elements in the bottom list, and if I had five, just for argument sake, five in the top list, then I'd put them at, let's say, the 0 position, 20, 40, 60, et cetera. So I want to have roughly equal spacings.

But we need to make that a little more concrete, and a little more precise. And what I'm saying here simply is that this is the cost of traversal in the top list, and this is the cost of traversal in the bottom list, because I'm not going to go all the way in the bottom list. I'm only going to go a portion on the bottom list. Everybody gets that? Yup? All right, good.

So if I want to minimize this cost, which is going to tell me how to scatter these elements in the top list, how to choose my express stops, if you will-- I want to scatter these in a uniform way, then this is minimized when terms are equal. You could go off and differentiate and do that. It's fairly standard.

And what you end up getting is you want to get L1 square equals L0 equals n. So all of the elements are down at the bottom list, and so the cardinality of the bottom list is n. And roughly speaking, you're going to end up optimizing, if you have this satisfied, which means that L1 is going to be square root of n. OK?

So what you've done here is you've said a bunch of things, actually. You've decided how many elements are going to be in your top list. If there's n elements in the bottom list, you want to have the square root of n elements in the top list.

And not only that, in order to make sure that this works properly, and that you don't get a worse case cost that is not optimal, you do have to intersperse the square root of n elements at regular intervals in relation to the bottom list on the top list. OK, so pictorially what this means is it's not what you have here.

What you really want is something that, let's say, looks like this where this part here is square root of n elements up until that point, and then let's say we go from here to here or square root of n elements, and maybe I'll have a 66 here because that's exactly where I want my square root of n. Basically, three elements in between. So I got 66 here, et cetera. I mean I chose n to be a particular value here, but you get the picture.

So the search now, as you can see if you just add those up you get square root of n here, and you got n divided by square root of n here. So that's square root of n as well. So the search cost is order square root of n.

And so that's it. That's the first generalization, and really the most important one, that comes from going from a single sorted list to an approximation of a skip list. So what do you do if you want to make things better? So we want to make things better? Are we happy with square root of n?

AUDIENCE: No.

SRINIVAS DEVADAS: No. Well, what's our target?

AUDIENCE: Log n.

SRINIVAS DEVADAS: Log n, obviously. Well, I guess you can argue that our target may be order 1 at some point, but for today's lecture it is order log n with high probability. We'll leave it at that.

And so what do you do if you want to go this way and generalize? You simply add more lists. I mean it seems to be pretty much the only thing we could do here. So let's go ahead and add a third list.

So if you have two sorted lists, that implies I have 2 square root of n. If I want to be explicit about the constant in terms of the search cost, assuming things are interspersed exactly right. Keep that in mind because that is going to go away when we go and randomize. We're going to be flipping coins and things like that. But so far, things are very structured.

What do you think-- we won't do this analysis-- the cost is going to be if I intersperse optimally, what is the cost going to be for a search when I have three sorted lists?

AUDIENCE: Cube root.

SRINIVAS DEVADAS: Cube root. Great guess. Who said cube root?

AUDIENCE: [INAUDIBLE].

SRINIVAS DEVADAS: You already have a Frisbee. Give it to a friend. I need to get rid of these.

So it's going to be cube root, and the constant in front of that is going to be?

AUDIENCE: 3.

SRINIVAS DEVADAS: 3. So you have-- right? So let's just keep going. You have k sorted lists. You're going to have k times the k-th root of n. That's what you got. And I'm not going to bother drawing this, but essentially what happens is you are making the same number of moves which corresponds to the root of n, the corresponding root of n, at every level.

And the last thing we have to do to get a sense for what happens here is we have log n sorted lists, so the number of levels here is log n. So this is starting to look kind of familiar because it borrows from other data structures. And what this is I'm just going to substitute log n for k, and I got this kind of scary looking-- I was scared the first time I saw this. Oh, this is n.

It's the log n-th root of n, OK? And so it's kind of scary looking. But what is the log n-th root of n-- and we can assume that n is a power of two?

AUDIENCE: 2.

SRINIVAS DEVADAS: 2, exactly. It's not that scary looking, and that's because I'm not a mathematician. That's why I was scared. So 2 log n.

All right. So that's it. So you get a sense of how this works now, right? We haven't talked about randomized structures yet, but I've given you the template that's associated with the skip list, which essentially says what I'm going to have are-- if it was static data items and n was a power of two, then essentially what I'm saying is I'm going to have a bunch of items, n items, at the bottom.

I'm going to have n over 2 items at the list that's just immediately above. And each of them are going to be alternating. You're going to have an item in between. And then on the top I'm going to see n over 4 items, and so on and so forth.

What does that look like? Kind of looks like a tree, right? I mean it doesn't have the structure of a tree in the sense of the edges of a tree. It's quite different because you're connecting things differently. You have all the leaves connected down at the bottom of this so-called tree with this doubly linked list, but it has the triangle structure of a tree. And that's where the log n comes from.

So this is would all be wonderful if this were a static set. And n doesn't have to be a power of 2-- you could pad it, and so on and so forth. But the big thing here is that we haven't quite accomplished what we set out to do, even though we seem to have this log n cost for search. But it's all based on a static set which doesn't change.

And the problem, of course, is that you could have deletions. You want to take away 42. For some reason you can't go to 42nd Avenue, or I guess art-- you can't go to [INAUDIBLE] would be a better example. So stuff breaks, right? And so you take stuff out and you insert things in.

Suppose I wanted to insert 60, 61, 62, 63, and 64 into that list that I have? What would happen? Yeah, you're shaking your head. I mean that log n would go away, so it would be a problem.

But what we have to do now is move to the probabilistic domain. We have to think about what happens when we insert elements. We need an algorithm for insert. So then we can start with the null list and build it up. And then you start with a null list and you have a randomized algorithm for insert, it ain't going to look that pretty. It's going to look random.

But you have to have a certain amount of structure so you can still get your order log n. So you have to do the insertion appropriately. So that's what we have to do next. But any questions about that complexity that I have up there? All right, good.

I want a canonical example of a list here, and I kind of ran out of room over there, so bear with me as I draw you a more sophisticated skip list that has a few more levels. And the reason for this is it's only interesting when you have three or more levels. The search algorithm is kind of the same. You go up top and when you overshoot you pop down one level, and then you do the same thing over and over.

But we are going to have to bound the number of levels in the skip list in a probabilistic way. We have to actually discover the expected number of levels because we're going to be doing inserts in a randomized way. And so it's worthwhile having a picture that's a little more interesting than the picture of the two linked lists that I had up there. So I'm going to leave this on for the rest of the lecture.

So that's our bottom, and that hasn't changed from our previous examples. I'm not going to bother drawing the horizontal connections. When you see things adjacent horizontally at the same level, assume that they're all connected-- all of them.

And so I have four levels here. And you can think of this as being the entire list or part of it. Just to delineate things nicely, we'll assume that 79, which is the last element, is all the way up at the top as well. Sort of the terminus, termini, corresponding to our analogy of subways.

And so that's our top-most level. And then I might have 50 here at this level, or so that looks like. I will have 50, so the invariant here, and that's another reason I want to draw this out, is that if you have a station at highest level, then you will have-- it's got to be sitting on something.

So if you've got a 79 at level four, or level three here if this is L0, then you will see 79 at L2, L1, and L0. And if you see 50 here, it's not in L3 so that's OK, but it's in L2, so it's got to be at L1 as well.

Of course you know that everything is down at L1, so this is interesting from a standpoint of the relationship between Li and Li plus 1 where i is greater than or equal to 1. So the implication is that if you see it at at Li plus 1, it's going to be at Li and Li minus 1 if that happens to exist, et cetera.

And so one last thing here just to finish it up. I got 34 here, which is an additional thing which ends there. So the highest level is this second level or L1. This is 66. And then that's it. So that's our skip list.

So if you wanted to search for 72, you would start here, and then you'd go to 79, or you'd look and say, oh, 79 is too far, so I'm going to pop down a level. And then you'd say 50, oh, good. I can get to 50. 79 is too far, so I'm going to pop down a level. And then you go to 66-- 79 is too far-- and at 66, you pop down a level and then you go 66 to 72.

So same as what we had before. Hopefully it's not too complicated. So that's our skip list. It's still looking pretty structured, looking pretty regular. But if I start taking that and start inserting things and deleting things, it could become quite irregular.

I could take away 23, for example. And there's nothing that's stopping me from taking away 34 or 79. You've got to delete an element, you've got to delete an element. I mean the fact that it's in four levels shouldn't make a difference. And so that's something to keep in mind. So this could get pretty messy.

So let's talk about insert, and I've spent a bunch of time skirting around the issue of what exactly happens when you insert an element. Turns out delete is pretty easy. Insert is more interesting. Let's do insert.

To insert an element x into a skip list, the first thing we're going to do is search to figure out where x fits into the bottom list. So you do a search just like you would if you were just doing a search.

You always insert into the appropriate position. So if there's a single sorted list, that would pretty much be it. And so that part is easy. If you want to insert 67, you do all of the search operations that I just went over, and then you insert 67 between 66 and 72. So do your pointer manipulations, what have you, and you're good.

But you're not done yet, because you want this to be a skip list and you want this to have expected search over any random query as the list grows and shrinks of order log n, expectation, and also with high probability. So what you're going to have to do is when you start inserting, you're going to have to decide if you're going to what is called promote these elements or not.

And the notion of a promotion is that you are going up and duplicating this inserted element some number of levels up. So if you just look at how this works, it's really pretty straightforward. What is going to happen is simply that let's say I have 67 and I'm going to insert it between 66 and 72. That much is a given. That is deterministic.

Then I'm going to flip a coin or spin a Frisbee. I like this better. I'm not sure if this is biased or not. It's probably seriously biased.

[LAUGHTER]

Would it ever go the other way is the question. Would it ever? No. All right. So we've got a problem here. I think we might have to do something like that.

[LAUGHTER]

I'm procrastinating. I don't want to teach the rest of this material.

[LAUGHTER]

All right. Let's go, let's go. So I'd like to insert into some of the lists, and the big question is which ones? It's going to be really cool. I'm just going to flip coins, fair coins, and decide how much to promote these elements.

So flip fair coin. If heads, promote x to the next level up, and repeat. Else, if you ever get a tails, you stop. And this next level up may be newly created.

So what might happen with the 67 is that you stick it in here, and it might happen that the first time you flip you get a tails, in which case, 67 is going to just be at the bottom list. But if you get one heads, then you're not only going to put 67 in here, you're going to put 67 up here as well. And you're going to flip again.

And if you get a heads again, you're going to put 67 up here. And if you get a heads again, you're going to put 67 up here. And if you get a heads again, you're going to create a new list up there, and at this point when you create the new list, it's only going to be 67 up there. And that's going to be the front of your list, because that's the one element that you're duplicating.

So you're going to keep going until you get a tails. Now, that's why this coin had better be fair. So you're going to keep going and you're going to keep adding. Every time you insert there's a potential for increasing the number of levels in this list.

Now, the number of levels is going to be bounded in expectation with a high probability of regular expectation, but I want to make it clear that every time you insert, if you get a chain of heads, you're going to be adding levels. And so the first time you get a tails, you just stop. You just stop.

So you can see that this can get pretty messy pretty quick. And especially if you were starting from ground zero and adding 14, 23-- all of those things, the bottom is going to look exactly like it looks now because you're going to put it in there. It's deterministic. But the very next level after that looked pretty messy. You could have all of them chunked up here, and a big gap, et cetera, et cetera. So it's all about randomized search cost.

The worse case cost here is going to be order n. Worst case cost is going to be order n, because you have no idea where these things are going to end up. But the randomized cost is what's cool about this. Any questions about insert or anything I said? Yeah, go ahead.

AUDIENCE: Is worse case really order n? What if you had a really long, like a lot of lists on top of each other, and you start at the top of that and you had to walk all the way [INAUDIBLE]?

SRINIVAS DEVADAS: Well, you go n down and n this way, right? You would be checking so it would be order n.

AUDIENCE: So it's [? bounded ?] by n?

SRINIVAS DEVADAS: Yeah, the worst case.

AUDIENCE: Worse case is infinity.

SRINIVAS DEVADAS: Worse case is infinity. Oh, in that sense, yeah. OK. Well, n elements, Eric is right. So what is happening here is that you have a small probability that you will keep flipping heads forever. So at some level, if you somehow take that away and use Frisbees instead or you truncate it.

Let's say at some point you ended up saying that you only have n levels total. So it's not a-- I should have gone there. The question has to be posed a little more precisely for the answer to be order n. You have to have some more limitations to avoid the case that Eric just mentioned, which is in the randomized situation you will have the possibility of getting an infinite number of heads. Yeah, question back there.

AUDIENCE: [INAUDIBLE].

SRINIVAS DEVADAS: Yes, you can certainly do capping and you can do a bunch of other things. It ends up becoming something which is not as clean as what you have here. The analysis is messy. And it's sort of in between a randomized data structure, a purely randomized data structure, and a deterministic one.

I think the important thing to bring out here is the worst case is much worse than order log n, OK? Cool. Good. Thanks for those questions.

And so what we have here now is an insert algorithm that could make things look pretty messy. I'm going to leave the insert up here, and that, of course, is part of that.

Now, for the rest of the lecture we're going to talk about why skip lists are good. And we're going to justify this randomized data structure and show lots of nice results with respect to the expectation on the number of levels, expectation on the number of moves in a search, regardless of what items you're inserting and deleting.

One last thing. To delete an item, you just delete it. You find it, search, and delete at all levels. So you can't leave it in any of the levels. So you find it, and you have to have the pointers set up properly-- move the previous pointer over to the next one, et cetera, et cetera. We won't get into that here, but you have to do the delete at every level. Yeah, question.

AUDIENCE: So what happens if you inserted 10s and you flip off a tail? So that's like your first element is not going to go up all the way, and then have you do search.

SRINIVAS DEVADAS: So typically what happens is you need to have a minus infinity here. And that's a good point. It's a corner case. You have to have a minus infinity that goes up all the way. Good question.

So the question was what happens if I had something less than 14 and I inserted it? Well, that doesn't happen because nothing is less than minus infinity, and that goes up all the way. But thanks for bringing it up.

And so we're going to do a little warm-up Lemma. I don't know if you've ever heard these two terms in juxtaposition like this-- warm up and Lemma. But here you go, your first warm-up Lemma.

I guess you'd never have a warm-up theorem. It's a warm-up Lemma for this theorem, which is going to take a while to prove. This comes down to trying to get a sense of how many levels you're going to have from a probabilistic standpoint. The number of levels in an n element skip list is order log n. And I'm going to now define the term with high probability.

So what does this mean exactly? Well, what this means is order log n is something like c log n plus a constant. Let's ignore the constant and let's stick with c log n. And with high probability is a probability that is really a function of n and alpha.

And you have this inverse polynomial relationship in the sense that obviously as n grows here, an alpha-- we'll assume that alpha is greater than the 1-- you are going to get a decrease in this quantity. So this is going to get closer and closer to 1 as n grows. So that's the difference between with high probability and just sort of giving you an expectation number where you have no such guarantees.

What is interesting about this is that as n grows, you're going to get a higher and higher probability. And this constant c is going to be related to alpha. That's the other thing that's interesting about this.

So it's like saying-- and you can kind of say this for using Chernoff bounds that we'll get to in a few minutes, even for expectation as well. But what this says is that if, for example, c doubled, then you are saying that your number of levels is order 4 log n. I mean I understand that that doesn't make too much sense, but it's less than or equal to 4 log n plus a constant.

And that 4 is going to get reflected in the alpha here. When the 4 goes from 4 to 8, the alpha increases. So the more room that you have with respect to this constant, the higher the probability. It becomes an overwhelming probability that you're going to be within those number of levels.

So maybe there's an 80% probability that you're within 2 log n. But there's a 99.99999% probability that you're within 4 log n, and so on and so forth. So that's the kind of thing that with the high probability analysis tells you explicitly.

And so you can do that, you can do this analysis fairly straightforwardly. And let me do that on a different board. Let me go ahead and do that over here. Actually, I don't really need this. So let's do that over here.

And so this is our first with high probability analysis. And I want to prove that warm-up Lemma. So usually what you do here is you look at the failure probability. So with high probability is typically something that looks like 1 minus 1 divided by n raised to alpha. And this part here is the failure probability. And that's typically what you analyze and what we're going to do today.

So the failure probability is that it's not less than c log n levels, is the complement of what we just looked at, which is the probability that it's strictly greater than c log n levels. And that's the probability that some element gets promoted greater than c log n times.

So why would you have more than c log n levels? It's essentially because you inserted something and that element got promoted strictly greater than c log n times, which obviously implies that you had the sequence of heads, and we'll get to that in just a second.

But before we go to that step of figuring out exactly what's going on here as to why this got promoted and what the probability of each promotion is, what I have here is I have a sequence of inserts potentially that I have to analyze. And in general, when I have an n element list, I'm going to assume that each of these elements got inserted into the list at some point. So I've had n inserts. And we just look at the case where you have n inserts, you could have deletes, and so you could have more inserts, but it won't really change anything.

You have n inserts corresponding to each of these elements, and one of those n elements got promoted in this failure case greater than c log n times. That's essentially what's happened here.

And so you don't know which one, but you can typically do this in with high probability analysis because the probabilities are so small and they're inverse polynomials, polynomials like n raised to alpha. You can use what's called the union bound that I'm sure you've used before in some context or the other. And you essentially say that this is less than or equal to the probability that a particular element x.

So you just pick an element, arbitrary element x, but you pick one. Gets promoted greater than c log n times. So you have a small probability. You have no idea whether these events are independent or not.

The union bound doesn't care about it. It's like saying you've got a 0.001 probability that any of these elements could get promoted greater than c log n times, and there's 10 of those elements. You don't know whether they're independent events or not, but you can certainly use the union bound that says the overall failure probability is going to be less than or equal to n equals 10, in my example, times that 0.001. That's basically it.

Now you can go off and say, what does it mean for an element to get promoted? What actually has to happen for an element to get promoted? And you have n times 1 over 2, because you're flipping a fair coin, and you are getting a c log n heads here. You flip and you get one promotion. There's two levels associated with a promotion, the level you came from and the level you went to.

And so a promotion is a move, so you're going to have one more level. If you count levels, then you have the number of promotions, right? That's just simply corresponds to taking this 1/2 and raising it to c log n, because that's essentially the number of promotions you have.

And you got n 1/2 c log n, and what does that turn into? What is n times 1/2 c log n? 1 over 2 raised to log n would give you? 2 raised to log ns? Is n, right?

So you got n divided by n raised to c, which is 1 divided by n raised to c minus 1, which is 1 divided by n raised to alpha where alpha is c minus 1. So that's it. That's our first with high probability analysis. Not too hard.

What I've done is done exactly what I just told you that the notion of with high probability is. You have a failure probability that is related. Inverse polynomial and the degree of the polynomial alpha is related to c. And so that's what I have out there, but c equals-- what did it have? Alpha equals c minus 1 or c equals alpha plus 1.

So what I've done here is done an analysis that tells you with high probability how many levels I'm going to have given my insert algorithm. So this is the first part of what we'd like to show. This just tells us how big this skip list is going to grow vertically.

It doesn't tell us anything about the structure of the list internally as to whether the randomization is going to cause that pretty structure that you see up here to be completely messed up to the point where we don't get order log n search complexity, because we are spending way too much time let's say on the bottom list or the list just above the bottom list, et cetera.

So we need to get a sense of how the structure corresponding to the skip list, whether it's going to look somewhat uniform or not. We have to categorize that, and the only way we're going to characterize that is by analyzing search and counting the number of moves that a search makes.

And the reason it's more complicated than what you see up there is that in a search, as you can see, you're going to be moving at different levels. You're going to be moving at the top level. Maybe at relatively small number of moves, you're going to pop down one, move a few moves at that level, pop down, et cetera, et cetera.

So there's a lot of things going on in search which happen at different levels, and the total cost is going to have to be all of the moves. So we're going to think about all of the moves-- up moves, down moves, and add them all up. They all have to be order log n with high probability. There's no getting around that because each of them costs you.

So that's the thing that we'll spend the next 20 minutes on. And the theorem that we like to prove for search is that-- this is what I just said-- any search in an n element skip list costs order log n w.h.p.

So it doesn't matter how this skip list looks. There's n elements, they got inserted using the insert algorithm-- that's important to know if you're going to have to use that. And when I do a search for an element, it may be in there, it may not be in there. Doesn't really matter. We'll assume a successful search. That is going to cost me order log n with high probability.

And the cool idea here in terms of analyzing the search in order to figure out how we're going to add up all of these moves is we're going to analyze the search backwards. So that's a cool idea. So what does that mean exactly?

Well, what that means is that we're going to think about this b search, which think of it as the backward search, starts-- it actually ends, so that's what I'm writing in brackets here, at the node in the bottom list. So we're assuming a successful search, as I mentioned before. Otherwise, the point would just be in between two members.

You know that it's not in there because you're looking for 67 and you see 66 to your left and 72 to your right. So either way it works, but keep in mind that it's a successful search because it just makes things a little bit easier.

Now, at each node that we visit, what we're going to do is we're going to say that if the node was not promoted higher, then what actually happened here was that when you inserted that particular element, you got a tails. Because otherwise you would have gotten a heads, that element would have been promoted higher.

Then you go-- and that really means that you came from the left-hand side, so you make a left move. Now, search of course makes down moves and right moves, but this is a backward search so it's going to make left moves and up moves.

What else do I have here? Running out of room, so let me-- let's continue with that.

All right. And now the case is if the node was promoted higher, that means we got heads here in that particular insertion. Then we go, and that means that during the search we came from upstairs.

And then lastly, we stop, which means we start when we reach the top level or minus infinity if we go all the way back. So that's it. A lot of writing here, but this should make things clear.

So let's say that we're searching for 66. I want to trace through what the backwards path would look like, and keep that code in mind as I do this. So I'm searching for 66, and obviously, we know how to find it. We've done that. But let's go backwards as to what exactly happened when we look for 66.

When we look for 66, right at this point when you see 66, where would you have come from?

AUDIENCE: [INAUDIBLE]

SRINIVAS DEVADAS: You'd have come from the top. And so if you go look at what happens here, the node when it got inserted was promoted one level. So that means that you would go up top in the backward search first. Your first move would be going up like that.

Now, if there's a 66 up there, you would go up one more. But there's not, so you go left. You go to 50. And when you have a 50 up here, would you stay on this level?

AUDIENCE: No.

SRINIVAS DEVADAS: No. You'd go up to 50 because the first chance you get you want to get up to the higher levels. And again, this 50 was promoted so you go up there, and you go to 14, and pretty much that's the end of that.

So this would look like you go like that, you have an up move, then you have a left move-- different colors here would be good-- then you have an up move, and a left, and then an up. So that's our backward search. And it's not that complicated, hopefully. If you're looking for 66 or 59, you do that. So it's much more natural, and you just need to flip it.

Why am I doing all this? Well, the reason I'm doing all this is that I have to do some bounding of the moves, and I know that the moves that correspond to the up moves are probabilistic in the sense that the reason I'm making them is because I flipped heads at some point. So all of this is going to turn into counting how many coin flips come out heads in a long stream of coin flips. So that's what this backward search is going to allow us to do.

And that crucial thing is what we'll look at next. So the analysis itself is a bit painful, but there's a bunch of algebra. But what I want to do is to make sure that you get the high level picture, number one, and the insights as to why the expected value or the with high probability value is going to be order log n. But the key is the strategy.

So we're going to go off and we're going to prove this theorem. Our backward search makes up moves and left moves. We know that. Each with probability 1/2. And the reason for that is when you go up is because you got a heads, and if you didn't get a heads in you got a tails, that meant you go left. Because of the previous element, every time you're passing these elements that are inserted, and they were inserted by flipping coins.

So that's key point number one. All of that, if you look at what happens here when I drew this out, you got heads here and you got tails there. So each of those things for a fair coin is happening with probability 1/2. And it's all about coin flips here.

Now, the number of moves going up is less than the number of levels-- the number of levels is one more than that. And we've shown that that's c log n with high probability by the warm-up Lemma. That's what this just did.

The number of up moves-- I mean you can't go off the list here. This list is now you're not inserting anymore, you're doing a search. So it's not like you're going to be adding levels or anything like that. So the number of up moves we've taken care of.

So this last thing here which I'm going to write out here is the key observation, which is going to make the whole analysis possible. And so this last thing it says that the total number of moves-- so now the total number of moves has to include, obviously, the up moves and the left moves, and there's no other kind.

The total number of moves is going to correspond to the number of moves till you get c log n up moves. So what does that mean? There's some sequence of heads and tails that I'm getting, each of them with probability 1/2. Every time that I got a heads, I moved up a level.

The fact of the matter is that I can't get more than c log n heads because I'm going to run out of levels. That's it. I'm going to run out of room vertically if I keep popping up and keep doing up moves. So at that point I'm forced to go left. Maybe I'm going left in the middle there when I still had a chance to go up. That corresponds to getting a tails as opposed to a heads.

But I can limit the total number of moves from a probabilistic standpoint by saying during that sequence of coin flips I only have a certain number of heads that I could have possibly gotten. Because if I got more heads than that, I would be up top. I'd be out of the skip list, and that doesn't work.

So the total number of moves is the number of moves till you get c log n up moves, which essentially corresponds to-- now, forget about skip lists for a second. Our claim is the total number of moves is the number of coin flips, so these are the same, because every move corresponds to a coin flip. Until-- it's a fair coin, probability 1/2-- until c log n heads have been obtained.

So the number of coin flips until c log n heads is the total number of moves. This equals that. And what we now want to show, if you believe that, and hopefully you do because the argument is simply that you run out of levels, that this is order log n w.h.p. That's why it's a claim.

So the observation is that the number of coin flips, as you flip a fair coin, until you get c log n heads will give you the number of moves in your search, total number of moves in your search. It includes the up moves as well as the left moves. And now what we have to show is that that is going to be order log n with high probability. OK?

And then once you do that you've done two things. You've bounded the number of levels in the skip list to be order log n with high probability. And you've said the number of moves in the search is order log n with high probability assuming that the number of levels is c log n, obviously.

So it's not that the bottom one subsumes the top one. It's the last thing to keep in mind as we get all of these items out of the way. This assumes that there are less than or equal to c log n levels. That's the only reason why I could make an argument that I've run out of levels.

So if I have this event A here-- if I call this event A, and I have this event B, what I really want is-- I've shown you that event A happens with high probability. That's the warm-up Lemma. I need to show you that event B happens with high probability. And then I have to show you that event A and event B happen with high probability, because I need both.

Any questions? We're stopping a minute here. The rest of the analysis, a bunch of algebra, we'll get through it, you can look at the notes. This is the key point. If you got this, you got it. Yeah.

AUDIENCE: Can you just say that because the probability of drawing an up move instead of a left move is 1/2, that the expected number of left moves should be equal to the number of up moves, [INAUDIBLE] bound the up moves?

SRINIVAS DEVADAS: So the argument is that since you have 1/2, can you simply say that the expected number of left moves is going to be the same as the same as the up moves?

You can make arguments about expectation. You can say that at any level, the number of left moves that you're going to have is going to be two in expectation. It's not going to give you your with high probability proof. It's not going to relate that to the 1 divided by n raised to alpha.

But I will tell you that if you just wanted to show expectation for search is order log n, you won't have to jump through all of these hoops. At some level you'll be making the assumptions that I've made explicit here through my observations when you do that expectation. So if you really want to write a precise proof of expected value for search complexity, you would have to do a lot of the things that I'm doing here.

I'm not saying you waved your hands. You did not. But it needed more to than what you just said. OK?

So this is pretty much what the analysis is. With high probability analysis we bounded the vertical, we bounded the number of moves. Assuming the vertical was bounded, we got the result for the number of moves. So both of those happen with high probability. You got your result, which is the theorem that we have somewhere. Woah, did I erase the theorem?

AUDIENCE: [INAUDIBLE].

SRINIVAS DEVADAS: It's somewhere. All right. Good.

So let's do what we can with respect to showing this theorem. There's a couple ways that you could prove this. There's a way that you could use a Chernoff bound. And this is kind of a cool result that I think is worth knowing.

I don't know if you've seen this, but this is a seminal theorem by Chernoff that says if you have a random variable representing the total number of tails, let's say-- it could be heads as well-- in a series of m-- not n, m-- independent coin flips where each flip has a probability p of coming up heads, then for all r greater than 0, we have this beautiful result that says the probability that y, which is a random variable-- a particular instance when you evaluate it-- that it is larger than the expectation by r is bounded.

So just a beautiful result that says here's a random variable that corresponds to flipping a coin. I'm going to flip this a bunch of times, and I know what the expectation is. If it's a fair coin of 1/2, then I'm going to get m over 2-- expected number of heads is going to be m over 2. Expected number of tails is going to be m over 2. If it's p, then obviously it's a little bit different-- p times m.

But what I have here is if you tell me what the probability is that I'm 10 away from the expectation and that would imply that r is 10, then that is bounded by e raised to minus 2 times 10 square divided by m. So that's Chernoff's bound.

And you can see how this relates to our with high probability analysis. Because our with high probability analysis is exactly this. This is the hammer that you can use to do with high probability analysis. Because this tells you as you get further and further away from the average or you get further and further away from the expectation, what the probability is that you're going to be so far away.

What is the probability that in 100 coin flips that are fair, you get 50 heads? It's a reasonably large number because the expected value corresponds to 50. So r is 0. So that just says this is a-- well, it doesn't tell you much because this says it's less than or equal to 1. That's all it's says.

But if you had 75, what are the probability that you get 75 heads when you flip a coin 100 times? Then e of y for a fair coin would be 50, r would be 25, and you'd go off and you could do the math for that.

So it's a beautiful relationship that tells you how the probabilities change as your random variable value is further and further away from the expectation. And you can imagine that this is going to be very useful in showing our with high probability result. And I think what I have time for is just to give you a sense of how this result works out-- I'm not going to do the algebra. I don't think it's worth it to write all of this on the board when you can read it in the notes.

But the bottom line is we're going to show this little Lemma that says for any c, invoking this Chernoff bound, there's a constant d, such that with high probability, the number of heads in flipping d log n. So I have a new constant here. d log n fair coins, or a single fair coin, d log n times, assuming independence, is at least c log n.

So what does this say? A lot of words. It just says, hey, you want an order log n bound here eventually. The beauty of order log n is that there's a constant in there that you control. That constant is d. So you tell me that c log n is 50. So c log n is 50.

Then what I'm going to do is I'm going to say something like, well, if I flip a coin 1,000 times, then I'm going to have an overwhelming probability that I'm going to get 50 heads. And that's it. That's what the Lemma says.

It says tell me what c log n is. Give me that value. And I will find you a d, such that by invoking Chernoff, I'm going to show you an overwhelming probability that for that d you're going to get at least c log n heads. So everybody buy that? Make sense from what you see up there? Yup?

So this essentially can be shown-- it turns out that what you have to do is-- and you don't have to choose 8, but you can choose d equals 8c. Just choose d equals 8c and you'll see the algebra in the notes corresponding to what each of these values are.

So e of y, just to tell you, would be m over 2. You're flipping m coins, fair coin with probability 1/2. So you got m over 2. And then the last thing that I'll tell you is that what you want in terms of invoking that, you want r-- remember we were talking about tails here-- so r is going to be d log n minus c log n.

So you just invoke Chernoff with e of y equals m over 2. And what you're saying here is you want c log n heads. You want to make sure you get c log n heads, which means that the number of tails is going to be d log n minus c log n.

And typically we analyze failure probability, so what this is is this is going to be a tiny number. So the failure is when you get fewer than c log n heads. So the failure is when you get fewer than c log n heads. And so that means that you're getting more than d log n minus c log n tails as you're flipping this coin. Fewer than c log n heads means you're getting at least d log n minus c log n tails. So that's why this is your r here.

And then when your r gets that large, and you can play around with the d and the c and choose d equals 8c, you realize that this is going to be a minuscule probability. And you can turn that around to a polynomial-- again, a little bit of algebra. But you can show this result on here that says that the number of coin flips until c log n heads is order log n with high probability by appropriately choosing the constant d to be some time/number over c.

So I'll let you do that algebra. But this one last thing that-- we're not quite done. So you thought we were done, but we're not quite done. And why is it that we're not quite done? Real quick question worth five Frisbees. Why is it that we're not quite done? What did I say? I have done event A and event B, right?

AUDIENCE: [INAUDIBLE].

SRINIVAS DEVADAS: I haven't done the last thing which is to show that probability of event A-- this is with high probability happens-- and I need to show that probability of event A and event B happens-- or this is with high probability. Or I should just say event A and event B happen with high probability.

And you can see that. It turns out it's pretty straightforward, but you got the gist of it. Thanks for being so patient. And there you go guys. Woah.