# Lecture 20: Independence

Description: Differentiates between independent and dependent events as it pertains to probability, covering applications like coin flips, the distribution of birthdays, hashing, and cryptography.

Speaker: Tom Leighton

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

TOM LEIGHTON: Today we're going to talk about the concept of independence. In probability, we say that an event A is independent of an event B if one of two conditions holds: either the probability of A given B is just the same as the probability of A, or B can't happen, namely the probability of B is 0.

In other words, A is independent of B if knowing that B happened doesn't change the probability that A is going to happen. So knowing that this event occurs doesn't influence the probability that A occurs. And there's a special case where they're independent because you know that B can't happen. If the probability of B happening is 0, then everything is independent of B.

Now, the typical example that gets used is when you flip two coins. So say we flip two fair, independent coins. And let's let B be the event that the first coin is heads and that means that the probability of B happening is 1/2, because we've assumed it's a fair coin, and we'll let A be the event that the second coin comes out heads. So we know the probability of A is 1/2 because it's fair.

And because they're independent, we can conclude that the probability of A given B is 1/2, which is the probability of A. In other words, seeing the result of the second coin doesn't tell you anything about the result of the first coin.
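As a minimal sketch of the definition just given, the two-coin example can be checked by brute-force enumeration. The names `prob` and `is_independent_of` are hypothetical helpers introduced only for this illustration, and the sample space assumes two fair, independent coins, so each of the four outcomes has probability 1/4.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair, independent coins: each outcome has probability 1/4.
OUTCOMES = list(product("HT", repeat=2))
PROB = {outcome: Fraction(1, 4) for outcome in OUTCOMES}

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum((PROB[o] for o in OUTCOMES if event(o)), Fraction(0))

def is_independent_of(a, b):
    """True if P(A | B) == P(A), or if P(B) == 0 (the special case)."""
    p_b = prob(b)
    if p_b == 0:
        return True
    p_a_given_b = prob(lambda o: a(o) and b(o)) / p_b
    return p_a_given_b == prob(a)

first_heads = lambda o: o[0] == "H"    # event B
second_heads = lambda o: o[1] == "H"   # event A

print(prob(second_heads))                            # 1/2
print(is_independent_of(second_heads, first_heads))  # True
```

The exact arithmetic with `Fraction` avoids any floating-point doubt when comparing the two probabilities.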

Now actually, when you flip two coins, it's not always the case that they're independent. Can anybody think of an example where you can flip a pair of coins and they are dependent somehow, they're not independent? Yeah.

AUDIENCE: Well, if you have to get two heads and two tails?

TOM LEIGHTON: If you have to get two heads or two tails. Well, how would you have to get?

AUDIENCE: The probability of getting two heads should be 1/4 [INAUDIBLE].

TOM LEIGHTON: Well, then they would be independent in that case. Yeah.

AUDIENCE: If you glue the coins together.

TOM LEIGHTON: Yeah. I mean, this is a silly example, but I got two fair coins here. I could clip them together and now I flip them and odds are pretty good they're both going to be heads or both be tails. If you know what happened to the right coin, it will tell you what happened to the left coin.

Now, that's a pretty contrived example, but it is illustrative of what happens in practice. In practice, we assume independence even though there can be subtle dependencies and this could lead to trouble. In fact, we're going to give a lot of examples where it leads to trouble today and also for the rest of the course. Because we're always going to want to assume independence and when we do, we're going to get very nice results, but things aren't always independent in practice and establishing independence is a hard thing to do.

For that matter, while we're on the subject, we always talk about fair coins. You flip a coin and it's fair. You know, that's not always true either. There's actually a famous mathematician named Persi Diaconis who used to be down the street at Harvard and he came and gave a talk one day at MIT in the math department and he's a probabilist. He does probability theory and is a very cool guy.

And so he flipped a coin, got a quarter from somebody in the audience and flipped it and he flipped that I think 10 or 20 straight times all the way to the roof, caught it, turned it over. Every time it was heads. And he goes, now what's the probability of that happening? Well, you know, it's 1/2 to the 20th or whatever, not very likely. How could he always make it come out heads?

Well, Persi was an unusual guy and in fact, he'd spent months in the strobe lab over at Harvard practicing to make it always rotate seven times, three of them on the way up, one at the top, and then three down. And he could actually see how many rotations it had done to make sure it was seven, so it always came out heads.

Now, he is an unusual fellow. He was 1 of 10 people in the world that could do a perfect shuffle reliably on a deck of cards and that's a very hard thing to do. He said he had to practice 8 hours a day for over six months to be able to do it every time. In fact, he gave another talk at MIT where he came in and he made magic tricks, actually based on mathematics. And you would cut a deck, he would feel it like this and tell you where you cut, how many cards were in the part you picked up and then do his eight perfect shuffles, which is enough to return a normal 52-card deck back to its original order.

And then using this, he could play the game where you pick any card, you stick it in, he feels where the card went, and then using mathematics, he could shuffle the deck eight times and make the card come out anywhere he wanted in the deck. So he had a lot going on upstairs too.

He had an interesting life history. He ran away from home as a young child and joined the traveling circus. And then somehow from there, he joined the faculty at Harvard. You know, there's an amazing story.

And actually another story about Persi is he was the first guy to get kicked out of casinos for card counting. He figured that out way before the MIT team and the movie 21. Down in Puerto Rico, he used to play and then they finally figured him out and he got booted.

So back to independence, let's do another picture example. Say that my sample space looks like this and I've got two events, A and B and they look like this, so they're disjoint. Are A and B independent? No. In fact what is the probability of A given B as I've drawn it?

AUDIENCE: 0.

TOM LEIGHTON: 0. Because if B occurs, you're outside of A. And so this does not equal the probability of A as long as it's not 0. So disjoint events don't imply that they're independent.

Now, what's the picture look like for them to be independent? What is the right picture to draw here? So I got my sample space and say I make this half the sample space be A. Well, then B to be independent, would look something-- I didn't quite draw it. I actually have it be 50-50.

So if A is 50% of S, like this half, then for A to be independent of B, A intersect B, this part, has to be 50% of B. Because the probability of A given B must equal the probability of A to be independent. So this would be a picture where they are independent.

Now, independent events are really nice to work with and in part because they have a very simple rule for computing the probability of an intersection of events and it's called the product rule for independent events. And that says that if A is independent of B, then the probability of A and B or A intersect B is just the product of their probabilities separately, the probability of A times the probability of B.

So let's prove this. And there's two cases, depending on whether or not B can happen, if the probability of B is 0 or not. So case 1 is B can't happen. The probability of B is 0. In this case, what's the probability of A and B? B can't happen. 0.

If B can't happen, then they both can't. You can't have both of them happening and that equals the probability of A times the probability of B because the probability of B is 0. So that case works.

Case 2 is the probability of B is bigger than 0. In that case, we have the probability of A and B, A intersect B, well, from the definition, is the probability of B times the probability of A given B. We did that last time. And by independence, this is just the probability of A because A is independent of B, so we're done.

In fact, many texts will define independence by this product rule. Many texts will say that A and B are independent if this is true. And it's equivalent, it turns out. We won't prove that here, but if you use this as the definition, then you can derive our definition as a result. So this is an equivalent definition of independence.

Another nice fact about independent events is that it's a symmetric relationship. It's called the symmetry of independence. That says that if A is independent of B, then the reverse is true. B is independent of A. Now, we won't prove that. It's actually easier to see that it's true if this were the definition of independence because A intersect B is the same as B intersect A and multiplication is commutative. So it's easier to see it if we had used that definition.

So because of this we often just say A and B are independent because it doesn't matter which order you're taking them in. All right, any questions about the definition so far? All right. Let's do some examples.

Let's say I have two independent fair coins. And I'm going to have the event A be the situation when the coins match, both heads, both tails. And B is going to be the event that the first coin is heads. And I want to know, are A and B independent? Are those independent events?

Well, what's the first answer to this? I mean, A is the event the coins match. B tells me what the first coin was. So the first inclination here is that these are dependent events because I know something about the first coin, so that might tell me something about the probability they match. There could be some dependence here.

Now, in fact, because of the way it's set up, they're independent and we can check that by just doing the calculation, computing the probability of A given B. Maybe I can do that. I'll do that here. The probability of A given B is, well, the condition that they're going to match given that the first coin is heads means it's the same as the second coin being heads. This is the probability the second coin is heads and that's just 1/2 because it's a fair coin and independent of the first one.

Now, the probability of A, by itself, the event the coins match, what's that? How much is that? What's the probability the coins match?

AUDIENCE: [INAUDIBLE].

TOM LEIGHTON: 1/4 plus 1/4. I've got 1/4 chance of heads, heads and 1/4 chance of tails, tails, so it's 1/2, so it works out. The probability of A given B equals the probability of A. They're both 1/2. So A and B are independent events because that's just the definition even though it looked like there might have been some dependence lurking around here.

Now, this example that I just did is a little misleading. The intuition that they probably are dependent actually is good intuition in this case because if I don't have fair coins, they are dependent. All right. So in particular, let's look at what happens if the probability of a heads is p and the probability of tails is 1 minus p for both coins.

So let's compute the probability of A given B. What is it in this case? Well, it's the probability the second coin is heads. What's that? p because both of them are heads with probability, p. They're independent still. The two coins are independent.

And now let's look at the probability that the coins match. Well, it's the probability of heads, heads plus the probability of tails, tails. Heads, heads is p times p. Tails, tails is (1 minus p) squared. So to be independent, I need this to equal that or to have the probability of B be 0.

So A and B are independent if and only if-- the first case is probability B is 0, which means that p equals 0, or that has to equal this. So p would have to equal 1 minus 2p plus 2p squared, just square that out there. So let's solve this. That happens if and only if 0 equals 1 minus 3p plus 2p squared. That's true if and only if 0 equals-- I factor this-- it's 1 minus 2p times 1 minus p and that's if and only if p is 1/2 or p is 1, two roots.

So if the coins are always heads, they're independent. If they're always tails, the events are independent or if they're fair coins, these two events are independent. But anything else, they're not independent anymore. Any questions? And now you can sort of see if the coins are likely to be tails and the first one comes up heads, that should influence the probability the coins match. It should change.
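The calculation above can be replayed with exact fractions. This is only a sketch: `match_vs_first_heads` and `events_independent` are hypothetical names, and the two coins are assumed independent with the same heads probability p, as in the lecture.

```python
from fractions import Fraction

def match_vs_first_heads(p):
    """Return (P(A | B), P(A)) for two independent coins with P(heads) = p.

    A = the coins match, B = the first coin is heads.
    P(A | B) is None when P(B) = 0, i.e. when p = 0.
    """
    p = Fraction(p)
    p_match = p * p + (1 - p) * (1 - p)                # heads-heads or tails-tails
    p_match_given_first_heads = p if p > 0 else None   # second coin must be heads
    return p_match_given_first_heads, p_match

def events_independent(p):
    """Independent if P(B) = 0 or P(A | B) = P(A)."""
    given, unconditional = match_vs_first_heads(p)
    return given is None or given == unconditional

for p in [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]:
    print(p, events_independent(p))
```

Only p = 0, p = 1/2 and p = 1 come out independent; for p = 1/4, for instance, the match probability is 5/8 while the conditional probability is 1/4.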

Questions? All right. So there's a nice application of this to getting an edge in ultimate Frisbee. Now, when you're playing ultimate, you've got to decide who gets the Frisbee first. And sometimes you don't have a coin to flip, call heads or tails, but you do have the Frisbee.

Now, you could flip the Frisbee and call right side up or not, but the problem is the Frisbee is known not to be a fair coin. When you toss it up in the air, it's likely to wind up on, I guess, the curved edge down. So that wouldn't be fair to call heads or tails.

So the standard solution is to flip two Frisbees at the same time or one Frisbee twice and somebody calls same or different, that the two Frisbees both come up the same way or they come up different ways, and then if you called it right, you get to start with the Frisbee. And the idea behind this is that that simulates a fair coin, that the probability that they're the same is 50-50.

What do you think? Is that a fair way to decide who starts first? Yeah.

AUDIENCE: No.

TOM LEIGHTON: No. Yeah, that's right. It's not. Now, it is in the case when the coin was fair, but we know the Frisbee is not fair. And in fact, you can see this from this probability. This is the probability of a match, which is fine at p equal 1/2, but in fact, if you analyze this equation, you find out its minimum value is at p equals 1/2 and as p starts moving away from 1/2 towards 0 or to 1, it gets bigger. And we know that for Frisbees, p is not 1/2. This means that the probability of a match is better than 50%.

So if you're ever playing ultimate, always call same because you're going to have a better than 50-50 chance of getting to start with the Frisbee. It's not a fair example. There is another example in the homework of how to turn a biased coin into an unbiased coin, ways of doing this that are fair. Because often you have biased random numbers and you want to get unbiased or maybe you got a fair coin and you want to make something that comes up heads with probability 1/3. How do you actually do that in a way that works? Any questions on that?
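A small sketch of why calling "same" wins: the match probability p² + (1 − p)² can be rewritten as 1/2 + 2(p − 1/2)², so it is never below 1/2. The values of p below are illustrative assumptions only; the lecture does not give the actual bias of a Frisbee, and `prob_same` is a hypothetical helper name.

```python
def prob_same(p):
    """Probability two independent flips land the same way, when one side has probability p."""
    return p * p + (1 - p) * (1 - p)

# Illustrative biases only; the true value for a Frisbee is not given in the lecture.
for p in [0.5, 0.6, 0.7, 0.9]:
    print(f"p = {p:.1f}: P(same) = {prob_same(p):.2f}")
```

At p = 0.5 the call is fair, and the edge for "same" grows as the bias moves away from 1/2 in either direction.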

The next example is from the first OJ Simpson trial. How many people here know who OJ Simpson is? OK, so he's still pretty famous. Now, as you probably know then he was a famous football player. Back when I was a kid, he was a famous college player, then he was a famous pro player and then he was an actor, famous actor.

And then he was accused of murdering his wife in a gory knifing and a friend of his wife's. And ultimately, the jury found him not guilty, but pretty much everybody in the country thought he did it. He looked really guilty. And it was a big media event, one of the first big trial events on TV. And so all the proceedings were on TV and everybody watched them. We'd all go home to watch the OJ hearing. It was amazing.

Now, during the indictment proceedings, there was a huge dispute over what independence was and does it matter. The issue arose when the prosecution witness claimed that only 1 in 200 Americans had a certain blood type that matched the blood type found at the scene of the crime, which was alleged to be OJ's blood. And this was during the indictment and back then DNA tests took a long time and they weren't ready yet. And the witness presented the following facts and this was the crime lab guy, the police guy.

He said that 1 in 10 people, roughly, matched type O blood. And that 1 in 5 people matched the Rh factor positive. And that 1 in 4 people match a certain kind of marker, which I don't remember what it was. We'll just call it marker XYZ, some other factor of the blood. And then his conclusion was that this means that 1 in 200 match all three factors.

And this seems reasonable because if 1/10 of the people have O, 1/5 of them have positive Rh factor, and then 1/4 of all of those have this marker, that's 1 in 200. Now, it's important because OJ's blood and the blood at the crime scene both matched all three. So the implication, of course, is that OJ is looking like the guy who did it. And the question was, well, is the 1 in 200 really true? We can sample these three in the populations and see they're true, but is 1 in 200 really true?

Now, it would be if, in fact, we verified that 1/5 of the type O people have positive and 1/4 of the O positive people have the XYZ marker. But well, we don't necessarily know that unless we go figure that out. If you assume they're independent, then it would be true. The product rule will tell us that if you assume they're independent.

So during the trial, a special math defense counsel showed up, not part of the normal defense team, but he was brought in as a mathematician and lawyer and he crosses the police guy on the stand. And he asked the police guy, the lab guy if it is known that these three factors are independent. Well, the poor police lab guy never heard the word independent before, didn't know what it meant and the defense counsel proceeded to crucify him on the stand. And then in the end, all he could say was, look, we just get these things and we multiply them. That's what we're supposed to do.

It was a little scary. The actual transcript-- you can still get it-- is a little scary. The same problem arises today with DNA testing. Only there, you've got lots of these things and you multiply them all together and you get probabilities like one in many billion probability of a match.

Now, there's probably a higher level of science going on with DNA testing, but it's even harder to really establish independence. If you assume it, fine. The math works out great. You just multiply them together. But how do you know it's really true? How do you know that a lot of people who have those four markers in their DNA don't just happen to have the fifth also, and that it really is totally unrelated?

And to know that for sure, you got to test hundreds of millions of people, which we really haven't done yet, and not just a few guys in Detroit to be able to conclude independence of 1 in a billion probabilities.

So for us, this is a lot easier. In the classroom, we assume independence and we'll keep doing that left and right, but it doesn't mean it's true in reality. In fact, in the last week of class, we'll talk about how a false assumption of independence on mortgage failures led to the subprime mortgage disaster in the recession. It was all because of some mathematics mistakes that people made.

Now, this example raises the question of, what does independence mean when you have more than two events? We defined independence when there are two events, but here there's three. And so to be careful, we got to actually define independence among more than two events and in this case, we talk about the events as being mutually independent. So let me define that.

So if I've got events A1, A2, up to An, we say they are mutually independent if, and this is a little complicated notation, but for all i and for all sets j that are subsets of the events, but not including i, then the probability that the i-th event occurs given that all the events in the subset occurred, is the same as the probability of the i-th event occurring by itself. Or there's a special case where the chance the other events occur is 0.

In other words, a collection of events is mutually independent if any knowledge about any of the rest of the events, happening or not, does not influence the event you're looking at for each of those events. So no information about any of the other markers in the blood influences the i-th marker for any i. The probabilities are unchanged.

Now, there's an equivalent definition based on the product rule. Let me show you that version because that's easier to work with usually. This is the product rule form and it says that A1, A2, up to An are mutually independent if for any subset of the events the probability of each of those events in the subset happening, all of them happening, is simply the product of their individual probabilities.

So independence means that if you want the probability of a bunch of events occurring, just multiply them out individually. And that follows from independence or it could be the definition of independence, depending on how you want to do it. So either of these are good enough for you to use as a definition or a result for independence. And so the blood guy, of course, is just multiplying them out because they're assumed to be independent, so it's OK that way.

Let's do an example. So for example, say we have three events. A1, A2, and A3 are mutually independent if, these are the things you have to check, probability A1 and A2 is just the probability of A1 times the probability of A2. Then you'd check that the probability of A1 and A3 is the product of their probabilities, A1 and A3. And you'd check the probability of A2 and A3 is the product of their probabilities.

And there's one more thing to check. What's that? All of them. The probability of all of them is the product of each of them together here. So if you want to show the three events are mutually independent, these are the four things you check. That's one way to do it, which is the case of the blood typing in the situation.

All right. Let's do an example. Well, for example, if I flip three unbiased, mutually independent coins. The probability of two of them being heads is 1/4. The probability of three being heads is 1/8 and so forth. Let's do a trickier example. This is a question that was on the final exam a few years ago and a lot of the class missed it. So now we'll do it here.

Say I flip three fair, mutually independent coins and my events are going to be A1 is the event coin 1 matches coin 2. The second event, A2, is the event that coin 2 matches coin 3. And the third event, A3, is the event that coin 3 matches coin 1.

And the question was, are these three events mutually independent? Prove your answer. Let's try to figure that out. The coins, of course, are mutually independent, but what about these events? So let's start doing it. What's the probability of one of the events occurring? Well, you got to get the two coins at hand to match, so that's the probability of a heads, heads plus the probability of a tails, tails. That's 1/4 plus 1/4 equals 1/2.

Now, the probability of Ai and Aj, i and j are 1 to 3, they're different, but what is a way of characterizing that case? Say event 1 occurred and event 2 occurred, how would I characterize that? Yeah.

AUDIENCE: All the same.

TOM LEIGHTON: All of them. Yeah. All of the coins are the same because if A1 and A2 occur, I know 1 matches 2 and 2 matches 3. If A1 and A3 happen, 1 matches 2 and 1 matches 3, so they're all the same and the same for A2 and A3. If 2 matches 3 and 3 matches 1, they're all the same. So this is the same as saying all three coins are the same. It could all be heads or all be tails.

And that's 1/8 plus 1/8, which is 1/4, and that equals the probability of Ai times the probability of Aj, which is what I need for independence. And then they said they're done. They are independent, the three events. You like that answer? What's missing?

The last case. They didn't check the last case and we got to do that to have mutual independence. So let's look at that. The last case is probability A1 intersect A2 intersect A3. What is the probability that all three events occur?

Well, the coins all have to match, right? If all the coins match, all three events occur, right? And what's the probability all 3 coins match? 1/4, just the same as this, is 1/4. Does that equal probability of A1 times the probability of A2 times the probability of A3?

What's that? 1/8. This is 1/8. They are not equal. They are not mutually independent events. All right? Any questions about that? It might well be something like this on the final this year, a good, decent chance.

So if you start going along, it looks like they're independent, but you forget to check that last case, which shows they're not mutually independent. So you've got to check for all pairs and all subsets of events for mutual independence. Any questions about that?
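The exam question can be checked mechanically by enumerating all eight outcomes and testing the product rule on every subset of two or more events. This is a sketch under the lecture's assumption of three fair, mutually independent coins; `prob`, `product_rule_holds` and `failing_subsets` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations, product

OUTCOMES = list(product("HT", repeat=3))   # 8 equally likely outcomes
P_OUTCOME = Fraction(1, len(OUTCOMES))

def prob(event_list):
    """Probability that every event (predicate) in event_list occurs."""
    hits = [o for o in OUTCOMES if all(e(o) for e in event_list)]
    return P_OUTCOME * len(hits)

events = {
    "A1": lambda o: o[0] == o[1],   # coin 1 matches coin 2
    "A2": lambda o: o[1] == o[2],   # coin 2 matches coin 3
    "A3": lambda o: o[2] == o[0],   # coin 3 matches coin 1
}

def product_rule_holds(names):
    """Does P(all named events) equal the product of their individual probabilities?"""
    expected = Fraction(1)
    for name in names:
        expected *= prob([events[name]])
    return prob([events[n] for n in names]) == expected

def failing_subsets():
    """All subsets of size >= 2 where the product rule fails."""
    return [names
            for size in range(2, len(events) + 1)
            for names in combinations(events, size)
            if not product_rule_holds(names)]

print(failing_subsets())   # [('A1', 'A2', 'A3')]
```

Every pair passes, and only the full triple fails, which is exactly the case that is easy to forget.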

Now, this is actually an interesting example because in this case, all pairs were independent and when that happens, we give that a special name and it's called pairwise independence, not too surprising. And that can be useful because there's many times where you do get pairwise independence, but not mutual independence. So let me give you that definition.

So a collection of events A1 through An are said to be pairwise independent if for all i and j, where i doesn't equal j, Ai and Aj are independent. Now, as we saw in this example, it was pairwise independence because the probability of Ai and Aj equaled the probability of Ai times the probability of Aj. For any pair, it was true. But it doesn't imply mutual independence. So pairwise does not imply mutual. Mutual would imply pairwise because it's true for every subset of events.

All right. So let's go back to OJ and see what would have happened. What can you say about the probability of a blood match for a random person if you only knew that these factors were pairwise independent? Say you only knew that. You didn't know they were mutually independent, but you knew they were pairwise independent in the population. What's the best you can say about the probability a random person matches that blood profile, an upper bound on the probability? Yeah.

AUDIENCE: 1 in 50.

TOM LEIGHTON: 1 in 50. Yeah. So what you can say is 1 in 50, but nothing better. So let's see why 1 in 50 works. So let's let M1 be the event you match here, M2 be the event you match there, and M3 be the event you match that. The probability you match all three is upper bounded by the probability you match the first two because matching all three is a subset of this.

Pairwise independence means that this is true. This equals the probability of matching the first times the probability of matching the second. The probability of matching the first is 1/10, the probability of matching the second is 1/5, so this is 1/50. And you picked the best two. You could have picked these two and said it was at most 1/20 or those two and said it's at most 1/40. But you were clever and said, OK, I'm going to take these two and use that as my upper bound, which is 1/50.

And it might well be that 1 in 50 people match all three. That can well be. Because maybe whenever you're O positive, you have marker XYZ. That's possible, potentially, unless we find out otherwise.

What if I tell you that you can't assume any independence at all? What can you say about the probability of a blood match here for a random person? Yeah.

AUDIENCE: 1/10.

TOM LEIGHTON: What is it?

AUDIENCE: 1/10.

TOM LEIGHTON: 1/10. Because if they match all three, they match this and that probability is 1/10, so it's at most 1/10. And it could be that everybody who's O is O positive and has XYZ. So unless you have more information, that's the best you can say. It might well be that's the answer. Any questions about that?
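The three answers for the blood-match probability can be put side by side. This sketch only uses the three frequencies quoted by the witness; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Marginal match frequencies quoted by the witness: type O, Rh positive, marker XYZ.
marginals = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 4)]

# Mutual independence: the product rule gives the exact probability.
mutual = prod(marginals)

# Pairwise independence only: P(all three) <= P(any two) = product of that pair,
# so take the smallest pairwise product as the upper bound.
pairwise_bound = min(a * b for a, b in combinations(marginals, 2))

# No independence at all: P(all three) <= P(any single factor).
no_assumption_bound = min(marginals)

print(mutual, pairwise_bound, no_assumption_bound)   # 1/200 1/50 1/10
```

Only the first number is an exact probability; the other two are upper bounds that could be attained if the factors were strongly correlated.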

So the assumptions really matter. The more independence you assume, the better the bounds you get on the probability of a match. It's a little bit unrelated to this, but there was another mathematics dispute at the OJ trial. It turned out that OJ had been beating up Nicole on a fairly regular basis and there were police records because after he'd beat her up, she'd go in and complain to the police.

And the prosecution wanted this evidence admitted at the trial because if the guy is a wife beater, it makes you think that maybe he killed her. And the defense lawyers argued against admitting that evidence because it wasn't tied to the actual murder scene in any way and they argued it would be prejudicial to the jury because, of course, if the jury hears that OJ was beating her, they might be more likely to convict him for murdering her.

Now, they got the math counsel again to argue that the reason you shouldn't admit this is because the probability that you kill your wife, that's K, given that you batter your wife, that's B, is 1 in 2,000. I would have guessed it was higher, but the evidence did show that. And so they said, look, there's only a 1 in 2,000 chance that this evidence of wife beating is relevant and therefore, it should not be admitted because there's a pretty decent chance if the jury hears this, they're going to convict him.

That's a pretty good argument. And usually that kind of thing, you exclude it. Yeah.

AUDIENCE: Where did that number come from?

TOM LEIGHTON: They got some study and some experts to come in and say that for every 2,000 wife beaters, only one of them actually kills his wife. Now, what do you suppose the prosecution argued back? They actually argued back very effectively, because that's a tough argument to get by. Yeah.

AUDIENCE: What's the probability that you kill your wife in the first place, that could be 100 times larger than usual.

TOM LEIGHTON: Well, that's a good point. So maybe the probability of killing your wife not knowing B, I hope is pretty small, probably that's very small, but I don't know. But in any case, this thing you're going from, say it's 1 in 1 million to 1 in 2,000, 1 in 2,000 is still too small to be used as evidence that OJ did it.

AUDIENCE: Frequency he did it.

TOM LEIGHTON: Frequency, they didn't get into that because I guess he'd done it a bunch, but that's a good point. It could be that with multiple beatings it's higher. Maybe that's 1 in 200 then. In fact, that may be the case, because I think they say that if you do it once, you do it multiple times. So there's not much more to be gained there.

There's a critical piece of information we've left out of our conditional probabilities here. In fact, the most glaring piece of evidence of all. What's missing here? What haven't we factored in? Yeah.

AUDIENCE: The probability of B.

TOM LEIGHTON: The probability of B, that's the battering. Battering, I don't know what it is, probably a large number. Defense would argue it's large, I guess, but it shouldn't matter that much.

AUDIENCE: The probability that he actually beat her, given that she threatened him?

TOM LEIGHTON: Well, there's that, but they have police-- well, that's true. They didn't see him doing it, but let's say that they had good evidence that he did it and defense wasn't arguing that he didn't really beat her. The key thing we're missing here is Nicole wound up dead. She was dead. And there's another stat here that the prosecution argued.

So they argued this fact. The probability the husband kills his wife, given that he batters her and she wound up dead, that somebody murdered her, is bigger than 1/2. So here M is somebody murdered the wife. Here, the husband beats her. Now, the conditional probability that he killed her is bigger than 1/2 and that's a whopper. Now, it's very relevant.

The probability he killed her just given that he beat her is only 1 in 2,000, but if you add the fact, which is very relevant in this case, that the wife was murdered, this is now very compelling. Now, in fact, they should have really compared this to the probability he kills her given that she's dead. And so that would determine now the relevance of the battering, the wife beating. That's what they should have done, but they didn't. They got this far and they had that and the judge said, I'm letting it in. So it came in at that point.

But this would be the right comparison, I think. Because you look at the probability that you killed her given that she's dead, but now the additional information, the wife battering, how does that change the probability? And it probably changes it materially. So it's all a little gory, but it's interesting to see how mathematics played out in this kind of environment. Yeah.

AUDIENCE: Are we supposed to assume that he did kill his wife?

TOM LEIGHTON: Yes, and they assumed that, but when you decide whether or not to admit evidence, if it's prejudicial, you've got to have really good grounds to get it in. Like if the evidence is going to make the jury think he did it, then you've really got to argue the evidence is relevant somehow. There's material information and that's what the fight was about. A 1 in 2,000 relevance isn't going to cut it. 1 in 2, that's probably pretty relevant. And that will be the grounds on which the judge makes his decision. But yeah, you assume he didn't do it.

All right. Back to independence. So the last example today is derived from a famous paradox and has several actually important applications in computer science. And this problem is known as the birthday problem or the birthday paradox. It's a paradox because it sort of has a surprising answer. Probably a lot of you have seen this before in some form or another.

In the birthday problem, there are N birthdays and typically we're going to look at the case where N is 365, the days of the year, and there are M people. And for example, maybe there's 100 people here. And what we want to know is, what is the probability that two or more people have the same birthday?

For example, how many people think there's at least a 50% chance that a pair of you in the audience here have the same birthday? That's good. How many people think there's a better than 90% chance? A few of you. All right. How many people think there's a better than a 99% chance that there's a pair of matching birthdays? A couple left.

How many think it's better than a 99.9% chance? We've got one, two. You guys are going to be stubborn. Another one. All right. How many people think it's more than 99.999% chance? Actually it's six 9's. It's incredible. It is a virtual certainty.

So let's see. In fact, the chance that you're all different is about 1 in 3 million chance that you're all different. And we're going to see why that's true here. But to do that, we're going to need to make two important assumptions. Any ideas about what assumptions you're going to need? Yeah.

AUDIENCE: Birthdays are uniformly distributed.

TOM LEIGHTON: Birthdays are uniformly distributed. Any other ideas? Yes.

TOM LEIGHTON: Oh, he stole yours. What else are you going to need to assume? Yeah.

AUDIENCE: All birthdays are independent of each other.

TOM LEIGHTON: Yeah. Mutually independent. We're going to need that as well. Now, in reality, neither is true. It's well known that birthdays tend to follow seasonal patterns and they're related to major events.

Now, do you all remember the big blackout that hit the Northeast several years ago? Do you remember that? Well, it turns out, this is a true fact, there were a lot of babies born nine months later. In fact, they had a name. They're called blackout babies, if you were born in that period in the Northeast, and there were all these news stories about the life of the blackout babies.

And the same thing happens after cold snaps in the winter and you get a blizzard or this kind of a thing. Nine months later, you get babies. In fact, I had a personal experience with this. Well, my son was born on October 18, 1996. And on the day he was born, we're going to the hospital and it was a zoo.

The maternity ward was totally full. We had to go to some other wing of the hospital. And babies were popping out all over the place. And I asked, what is going on? Why don't you have enough room for all the mothers here?

And they said, oh, it's all the blizzard babies. And I go, what? And they go, well, remember the blizzard of '96? It's like, oh yeah. I remember. Yeah. Nine months prior was the big blizzard and so it's all the blizzard babies coming.

So they're not uniform. They're all different probabilities here, but we're going to assume they're equally likely.

Now, independence is also not true, in general. What's one way that birthdays might not be independent? What is it?

AUDIENCE: Twins.

TOM LEIGHTON: Twins. So if they're twins, they have the same birthday. Now, there's other ways. In fact, my only sibling, my brother, has the same birthday I do, but I'm two years older, so we weren't twins. Now, you say, what are the odds of that? Well, 1 in 365, you think.

Well, one day I'm in middle school, about the age you start thinking about these things, and you get the idea to count back nine months from your birthday. Probably some of you have done that. And I did that and that's my dad's birthday. I was like, oh. Maybe it's not 1 in 365. It's like, Happy Birthday. I don't know.

Anyway, I almost needed to go into therapy after that, you know. So now you all got to count back nine months from your birthday. Anybody whose birthday is on September 30 or October 1, nine months back is New Year's Eve. That's dangerous. So in reality, birthdays are not independent and they are not randomly distributed, but we're going to assume that because we're going to use this same analysis for computer science problems where things are, hopefully, more independent and random.

Now, we're going to do an experiment to see how many people it takes us to get a pair of matching birthdays. So I'm going to run through people in order in the rows here, get your birthday and we're going to record and we're going to see how far we go until there's a match in that group. So I will write up the months here. And we'll start with my birthday is October 28.

So let's go right across. What's yours?

AUDIENCE: April 1.

TOM LEIGHTON: April 1. OK. We won't embarrass you here. OK, who's next? What's your birthday?

AUDIENCE: I'm sorry. September 2.

TOM LEIGHTON: September 2. All right. Yours.

AUDIENCE: June 1.

TOM LEIGHTON: June 1. OK. We'll come back.

AUDIENCE: April 8.

TOM LEIGHTON: What is it?

AUDIENCE: April 8.

TOM LEIGHTON: April 8. All right.

AUDIENCE: November 20.

TOM LEIGHTON: November 20.

AUDIENCE: June 12.

TOM LEIGHTON: June 12.

AUDIENCE: December 29.

TOM LEIGHTON: December 29.

AUDIENCE: [INAUDIBLE].

TOM LEIGHTON: What is it?

AUDIENCE: June 14.

TOM LEIGHTON: June 14. Ooh, I almost got one there. That one's close. All right. What's yours?

AUDIENCE: March 6.

TOM LEIGHTON: March 6.

AUDIENCE: May 2.

TOM LEIGHTON: May 2.

AUDIENCE: 17th of November.

TOM LEIGHTON: November 17. Close again.

AUDIENCE: August 4.

TOM LEIGHTON: August 4.

AUDIENCE: July 25.

TOM LEIGHTON: July 25. I don't think we'll get to 100 here, hopefully. Yeah, what's yours?

AUDIENCE: October 30.

TOM LEIGHTON: What is it?

AUDIENCE: October 30.

TOM LEIGHTON: October 30. Got close.

AUDIENCE: July 6.

TOM LEIGHTON: July 6. All right.

AUDIENCE: February 25.

TOM LEIGHTON: February 25.

AUDIENCE: May 21.

TOM LEIGHTON: May what? 21st of May.

AUDIENCE: May 30.

TOM LEIGHTON: May 30. You guys fooled me. What have you got?

AUDIENCE: January 12.

TOM LEIGHTON: January 12. All right.

AUDIENCE: July 14.

TOM LEIGHTON: July 14. OK.

AUDIENCE: April 30.

TOM LEIGHTON: April 30.

AUDIENCE: March 13.

TOM LEIGHTON: March 13. All right. Did I get--

AUDIENCE: October 7.

TOM LEIGHTON: October 7.

AUDIENCE: October 8.

TOM LEIGHTON: Ah, you guys. OK. Did I get you?

AUDIENCE: September 15.

TOM LEIGHTON: September 15.

AUDIENCE: November 9.

TOM LEIGHTON: November 9. All right.

AUDIENCE: July 15.

TOM LEIGHTON: July 15. Close.

AUDIENCE: September 3.

TOM LEIGHTON: September 3. You guys are killing me here.

AUDIENCE: February 6.

TOM LEIGHTON: February 6.

AUDIENCE: October 26.

TOM LEIGHTON: OK.

AUDIENCE: November 2.

TOM LEIGHTON: November 2.

AUDIENCE: January 23.

TOM LEIGHTON: January 23.

AUDIENCE: September 27.

TOM LEIGHTON: You guys are going to set a record for sure here. This isn't the way it's supposed to go.

AUDIENCE: December 30.

TOM LEIGHTON: December 30.

AUDIENCE: December 28.

TOM LEIGHTON: Ah, come on, guys. What is the probability of going this long here? Yeah.

AUDIENCE: September 22.

TOM LEIGHTON: September 22.

AUDIENCE: July 30.

TOM LEIGHTON: July 30.

AUDIENCE: The 24th of August.

TOM LEIGHTON: 24th August. I'm going to have to ask the same person to tell me twice here to get a match. We got over there now?

AUDIENCE: April 6.

TOM LEIGHTON: April 6.

AUDIENCE: October 16.

TOM LEIGHTON: October 16.

AUDIENCE: September 3.

TOM LEIGHTON: September 3. All right. Very good. All right. Let's count and see how many we got here. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42. That is a record. So it took 42 people to get a match.

Now it turns out that for N equals 365, the magic number for M is 23: by 23 people, we've got a 50-50 chance. In fact, the probability of a match with 23 people is 0.507. It's a little bit better than a 50-50 chance at 23. Now, maybe we should figure out, though it's too late for homework, what the chances are of going this long without a match. That may be worth figuring out.
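The classroom experiment can be imitated in a few lines of code. This is only a simulation sketch under the lecture's two assumptions (uniform, mutually independent birthdays); the function name, the seed, and the number of trials are arbitrary choices made for illustration.

```python
import random

def people_until_match(n_days=365, rng=random):
    """Draw uniform, independent birthdays until one repeats; return the count."""
    seen = set()
    while True:
        day = rng.randrange(n_days)
        if day in seen:
            return len(seen) + 1   # this person completes the first match
        seen.add(day)

rng = random.Random(20)            # arbitrary seed, for repeatability
trials = [people_until_match(365, rng) for _ in range(20_000)]

# Average number of people asked before the first match appears.
print(sum(trials) / len(trials))
# Fraction of simulated classrooms that needed 42 or more people.
print(sum(t >= 42 for t in trials) / len(trials))
```

Each trial plays the role of one classroom, so the second printed number gives a rough sense of how unusual a run as long as this one is.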

Now, it may seem surprising at first that 23 people is enough to have a 50/50 chance because the chance of any pair matching is 1 in 365, by our assumption. And that's small, but there's lots of pairs of people and every pair of people have a chance to match and that's why 23 turns out to be enough to get to 50-50.
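The "lots of pairs" intuition can be made concrete with a quick count. The estimate below treats every pair as a separate 1-in-365 chance, which is a heuristic only (the pairs are not mutually independent); the exact analysis follows in the lecture.

```python
import math

m, n = 23, 365
pairs = math.comb(m, 2)   # number of distinct pairs among m people

# Rough estimate: pretend each pair independently matches with chance 1/n.
# This is an assumption made for intuition, not the exact calculation.
rough_match = 1 - (1 - 1 / n) ** pairs

print(pairs)        # 253
print(rough_match)
```

With 253 pairs each getting a 1-in-365 shot, a match somewhere is no longer a long shot.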

Now, we're going to do the analysis for general M and N to figure out the probability of a match if there's M people and N birthdays. There's lots of ways to do it. The easiest is to sort of, well, we'll draw the sample space. It will be too big to draw the whole thing, but we can sort of model the sample space and then look at the sample points.

So you've got the first person and there's N birthdays here, so it could be anywhere from January 1 out to December 31 and in general this will be N. And then you have the second person and they have N possibilities for their birthday. And you take the tree down M levels to the very last person here.

So each node has degree N and there's M levels on this tree. So the sample space is the set of all M-tuples b1, b2, up to bM, these are the birthdays, where every value of bi is between 1 and N. So a sample point is all the birthdays of the M people.

How many sample points are there here? Remember how to count these things? Number of leaves on an N-ary tree of depth M or you can think of it this way. I've got N choices for each bi and there's M of them.

AUDIENCE: [INAUDIBLE].

TOM LEIGHTON: So what's the number of sample points?

AUDIENCE: N to the M.

TOM LEIGHTON: N to the M. Because N choices here, N choices here, N choices there, so you have N times N times N M times. And what's the probability of each outcome? For a set of possible birthdays, what's its probability? What's the probability of b1, b2, bM?

So the probability of a sample point. What's the probability that the first person has birthday b1, the second has b2, and the M-th has bM? Remember that? Yeah.

AUDIENCE: 1 over N to the M.

TOM LEIGHTON: 1 over N to the M because each edge has probability 1 over N and the paths have length M, so you've got 1 over N to the M-th power. The probability of the first birthday being b1 is 1 in N, times 1 in N for the second, times 1 in N for each one after that. And this actually makes sense because I've got N to the M sample points, each with probability 1 over N to the M. So they all add up to 1, which is good.
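For a tiny case the whole sample space can be listed and checked directly. The values N = 4 and M = 3 below are illustrative only, picked so the enumeration stays small.

```python
from fractions import Fraction
from itertools import product

N, M = 4, 3   # tiny illustrative values so the whole space can be listed

# Every sample point is an M-tuple (b1, ..., bM) with each bi in 1..N.
sample_space = list(product(range(1, N + 1), repeat=M))

# One factor of 1/N per level of the tree.
point_probability = Fraction(1, N) ** M

print(len(sample_space))                      # 64, which is N**M
print(point_probability * len(sample_space))  # 1
```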

What kind of sample space is this where this happens where all the probabilities are the same?

AUDIENCE: Uniform.

TOM LEIGHTON: Uniform. Makes it very easy to work with. All we got to do now is just count the number of sample points where there's a matching birthday and then we multiply by that one probability 1 over N to the M.

Now, it turns out that rather than counting the number of sample points where there's a matching birthday, it's easier to count the number of sample points where all the birthdays are different. And this is often the case when you're doing a counting problem, it's easier to count the opposite of what you're after. That can be the case and it is the case here. So we're going to do that.

So let's count how many sample points are all different birthdays, so no pair of bi's is the same. Let's do that. How many choices are there for b1? 365 or N. Let's do this in terms of N because we're going to use this for general N.

How many choices for b2? N minus 1. Given the first one, you can't match it. And then N minus 2, all the way over to the last one, which is N minus M plus 1. And this is a formula you should all remember. The product is just N factorial over N minus M factorial. You did this sort of stuff a couple weeks ago with counting sets and probability is really-- a lot of it's about counting.

So now we can compute the probability that all the birthdays are different. It's just adding up all the sample points, of which there are N factorial over N minus M factorial, and multiplying by the probability of each one, which is 1 over N to the M. All right. So we've actually now answered the question. This is the probability that all the birthdays are different.
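The formula can be evaluated exactly with integer arithmetic. The sketch below is one way to do it; the function name is made up for illustration, and the case where M exceeds N is handled as an added convenience.

```python
from fractions import Fraction
from math import factorial

def prob_all_different(n, m):
    """Exact Pr[all m birthdays differ] = n! / ((n - m)! * n**m)."""
    if m > n:
        return Fraction(0)   # pigeonhole: a match is forced
    return Fraction(factorial(n), factorial(n - m) * n**m)

# Evaluate at a few class sizes for n = 365 days.
for m in (23, 42, 100):
    print(m, float(prob_all_different(365, m)))
```

Using `Fraction` keeps the result exact until the final conversion to a float, so the huge factorials cause no rounding trouble.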

The only problem is, it's not so clear what the answer is to actually compute this or how fast it grows. So if I wanted to get a closed form for this without the factorials, what do I do? What do I use? Stirling's formula.

So let's remember that. It says that N factorial is asymptotically equal to square root 2 pi N times N over e to the N. And that is accurate within 0.1% when N is at least 100. So not only is it asymptotically equal, it's right on track for a reasonable size N.
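The accuracy claim is easy to check numerically. This is a small sketch; the function name is an illustrative choice.

```python
import math

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# Relative error of the approximation against the exact factorial.
for n in (10, 100):
    exact = math.factorial(n)
    rel_error = abs(stirling(n) - exact) / exact
    print(n, rel_error)
```

The relative error shrinks roughly like 1 over 12N, which is consistent with the 0.1% figure at N equals 100.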

Now, I won't drag you through all the calculations. I used to actually try plugging that formula in for here and here and then going through all the calculations, but we won't do it in class. It's in the text. But I will tell you where that winds up. It's not hard, you've just got to do the calculation.

So this means the probability that all birthdays are different turns out to be asymptotically equal to e to the power of N minus M plus 1/2, times the natural log of N over N minus M, minus M. And that's accurate to within 0.2%, if N and N minus M are large, larger than 100. So in fact, it's almost equal.
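Written out in symbols, the spoken formula and the substitution that produces it look like this. It is a sketch of the step the lecture leaves to the text: Stirling's formula is applied to both factorials and the result is collected into a single exponent.

```latex
\Pr[\text{all different}]
  = \frac{N!}{(N-M)!\,N^{M}}
  \sim \frac{\sqrt{2\pi N}\,(N/e)^{N}}
            {\sqrt{2\pi (N-M)}\,\bigl((N-M)/e\bigr)^{N-M}\,N^{M}}
  = \left(\frac{N}{N-M}\right)^{N-M+\frac{1}{2}} e^{-M}
  = e^{\left(N-M+\frac{1}{2}\right)\ln\frac{N}{N-M}\;-\;M}
```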

And now you could plug in N equals 365 and M equals 100. So if you do that, in fact, if somebody has a calculator, we should plug in, what do we have, 42. You should plug in M equals 42 and see what the probability is. But if M is 100, the chance that we're all different, this equals 3.07 dot, dot, dot times 10 to the minus 7. And we should check for M equals 42. My guess is it's pretty small, but I don't know. We'll have to check that.

AUDIENCE: 0.0859.

TOM LEIGHTON: Great. So there's about a 9% chance of having 42 people all miss. So we were a little unlucky. That won't happen very often. But when you go from 42 to 100, it gets really small. 1 in 3 million or so. Now, if N is 365 and M is 23, the probability comes out to be about 0.49, so about 50-50, they're all different.
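The numbers quoted here can be reproduced by plugging into the Stirling-based estimate and comparing with the exact product. This is a sketch; both function names are illustrative.

```python
import math

def approx_all_different(n, m):
    """exp((n - m + 1/2) * ln(n / (n - m)) - m), the Stirling-based estimate."""
    return math.exp((n - m + 0.5) * math.log(n / (n - m)) - m)

def exact_all_different(n, m):
    """Product form of n! / ((n - m)! * n**m), used here only for comparison."""
    return math.prod((n - i) / n for i in range(m))

# Estimate versus exact value for the three class sizes mentioned.
for m in (23, 42, 100):
    print(m, approx_all_different(365, m), exact_all_different(365, m))
```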

Now. For general M and N, we'd like to know when do you get to the 50-50 point? We'd like to derive an equation for M in terms of N where the probability of being all different is about 1/2. All right. So let's do that. So as long as we assume-- and this will turn out to be true-- that M is a little o of N to the 2/3 and remember little o means it grows slower than N to the 2/3. Then we can simplify that expression in asymptotic notation.

And when you do it, and I won't drag you through it on the board, it's also in the text, it turns out to be much simpler. It's just e to the minus M squared over 2N. So I take that thing up there and I assume that M is growing less fast than the 2/3 power of N and that whole upper expression reduces down to minus M squared over 2N in the exponent. Everything else in the exponent goes to 0. Doesn't matter.

Now, if I set this to be 1/2, I can solve this to find out what M has to be to make that be 1/2. All right. So this will be true if and only if minus M squared over 2N is equal to the natural log of 1/2. And that's true. Take the minus sign, put it inside to make a log of 2, multiply by 2N. That's true if M squared equals 2N natural log of 2.

And now I can solve for M really easily. That's true if and only if M equals the square root of 2N times the natural log of 2, which is about 1.177 times the square root of N. So for general N, you get a 50% probability of having a matching birthday when M is in this range, pretty close to 1.2 square root of N.

Now, this square root N phenomenon, this thing here, that's what's known as the birthday principle. It says if you've got roughly square root of N randomly allocated items into N boxes or bins or birthdays, there's a decent chance two of the items will go into the same bin if they're randomly allocated. In this case, the bins are the possible days of the year that we put each person into for their birthday. Any questions about that? Yeah.
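As a check on the algebra, the threshold can be computed and compared with the smallest M that gives even odds in the exact calculation. The function names below are illustrative.

```python
import math

def birthday_threshold(n):
    """M = sqrt(2 * n * ln 2): where exp(-M**2 / (2n)) equals 1/2."""
    return math.sqrt(2 * n * math.log(2))

def smallest_m_for_even_odds(n):
    """Smallest M whose exact match probability is at least 1/2."""
    p_all_different = 1.0
    m = 0
    while p_all_different > 0.5:
        p_all_different *= (n - m) / n   # person m+1 avoids the m days taken
        m += 1
    return m

print(math.sqrt(2 * math.log(2)))       # the constant, about 1.177
print(birthday_threshold(365))          # between 22 and 23
print(smallest_m_for_even_odds(365))    # 23
```

The asymptotic threshold lands just below the exact answer of 23, which is as close as one can ask of an approximation.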

AUDIENCE: M and N are like numbers like they're defined up there or does it mean to say M equals [INAUDIBLE]?

TOM LEIGHTON: Yeah. So here I looked at a special case where N was 365, M was 100, but we can imagine them as arbitrary numbers that could be getting large. And so over here when I say M is little o of N to the 2/3, I mean, well, M equals square root of N would qualify. Square root of N is little o of N to the 2/3. So as long as M is not growing too fast, I can simplify that expression up there, which is what I did.

And then we go back and we find, in fact, the square root of N is the right answer and that is little o of N to the 2/3. And I have to use a different argument if I assumed M was bigger, which I didn't do. I didn't drag you through that. But I would have to go check that case.

So in general we can think of M and N as being arbitrary variables and potentially growing. M can be a function of N. And in fact, when M is the square root function of N, then we've got a 50% chance of a match.

Now, the birthday principle comes up all over the place in computer science and it's worth remembering. For example, the generic form for this is when you have a hash function. Let's say I have a hash function, h, from a large set of items into a small set of items. For example, say I'm computing digital signatures. This is the space of all messages, this is the space of all 1,000-bit digital signatures, and h is a digital signature outcome.

Say I'm doing memory allocations. So all the things I might be sticking into a register, here's all the places it could go. Here's all the registers. Error checking. This is all the garbled messages in the world. This is the set of messages that make sense, all handled by functions, random kind of functions often.

Now, what you worry about when you're hashing is collisions. Let me define that. We say that x collides with y if the hash of x equals the hash of y, but x and y are different. For example, say you're looking at digital signatures. You would not want the signature for a \$100 check to your mom to match your signature for a \$100,000 check to Boris. Because that would be bad because then Boris could come in and take that check to your mom for \$100, convert it to a \$100,000 check to him and the signature is authentic if there's a collision in the signatures.

So it's very important when you're doing hash functions and in many applications, you don't want collisions because the whole thing starts breaking. Memory allocation. You don't want to assign two things to the same place. Error correction. There's only one answer you want to get out at the end.

Now, from the pigeon hole principle, you know if this set is bigger than that set, there is going to be a collision. That's what the pigeon hole principle says. Two guys will get mapped to the same thing. However, often in practice what we care about is a subset L prime of L that's pretty small because the set of messages we really assign is pretty small compared to all 1,000-bit signatures that are possible.

And what you'd like is that for this smaller set of messages, you might want to assign, they all get mapped one to one. And the birthday principle says life is not so nice. So let me write that down then we'll be done. All right. So the birthday principle says the following. Suppose the cardinality of S is at least 100, and L prime is a subset of L whose cardinality is at least 1.2 times the square root of the cardinality of S.

So the number of things you want to hash is at least 1.2 times the square root of the cardinality of S. If the values of the function h on L prime are randomly chosen, uniform, and mutually independent, then there's at least a 50% chance, so with probability at least 1/2, there's a collision. There exists an x and a y such that x does not equal y-- and these are in L prime-- but h of x equals h of y.
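The statement can be tested empirically by modeling h as uniform, independent random values, exactly as the principle assumes. The size of S, the seed, and the function name below are hypothetical choices for this sketch.

```python
import math
import random

def collision_rate(num_buckets, num_items, trials, rng):
    """Fraction of trials in which two of num_items random hash values coincide."""
    hits = 0
    for _ in range(trials):
        values = [rng.randrange(num_buckets) for _ in range(num_items)]
        if len(set(values)) < num_items:   # some value was repeated
            hits += 1
    return hits / trials

S = 10_000                                 # hypothetical size of the set S
items = math.ceil(1.2 * math.sqrt(S))      # size of L prime, per the principle
rng = random.Random(6)                     # arbitrary seed, for repeatability
print(items, collision_rate(S, items, 5_000, rng))
```

Only about 1.2 times the square root of 10,000 items are hashed, yet a collision shows up in roughly half of the trials.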

All right. The proof is not hard, it's just we more or less did it. You just plug in the cardinality of L prime for M and the cardinality of S for N. And it's bad news because it means it doesn't take very many messages, just square root the number of signatures to get a collision. You'd hope that you could have L prime be as big as S and that somehow they'd all go one to one, that everybody in this room would have a different birthday. That is not how it works if things are random, which is the case you usually like to have.

Now, this technique is used to crack cryptographic protocols and it's called the birthday attack based on the birthday principle. So what you do is, you get a bunch of messages that are encrypted and pretty soon you find two that get maybe encrypted the same way. And once you have that, now you can go back and crack the crypto system. For example, you break schemes like RSA with a birthday attack if this space is not big enough and that's one reason why now RSA, the keys have thousands of digits because otherwise you can use attacks like this and crack them more easily.
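The collision-finding step can be illustrated on a toy scale. The sketch below is not an attack on RSA or on any real protocol: it truncates SHA-256 to 16 bits on purpose so that the output space is tiny, and the message format and function names are made up for illustration.

```python
import hashlib

def toy_hash(message, bits=16):
    """First `bits` bits of SHA-256; deliberately tiny so collisions are easy."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits=16):
    """Hash distinct messages until two share a toy_hash value."""
    seen = {}                               # hash value -> message
    counter = 0
    while True:
        message = f"message-{counter}"      # hypothetical message format
        h = toy_hash(message, bits)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1

x, y, tried = find_collision()
# By the pigeonhole principle, tried can never exceed 2**16 + 1.
print(x, y, tried)
```

The birthday principle suggests a collision after a number of messages on the order of the square root of 65,536, far fewer than the pigeonhole bound, which is why real hash outputs have to be much longer.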

Any questions about that? OK. Very good. We're done for today.