# Lecture 19: Conditional Probability

Description: Covers conditional probability and its applications to examples including medical testing, gambling, and court cases.

Speaker: Tom Leighton

Instructor's Note: The actual details of the Berkeley sex discrimination case may have been different than what was stated in the lecture, so it is best to consider the description given in lecture as fictional but illustrative of the mathematical point being made.

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Just a reminder, drop day is tomorrow. So if you were thinking about dropping the course or in danger of a bad grade or something, tomorrow's the last chance to bail out. Last time we began our discussion on probability with the Monty Hall game-- the Monty Hall problem. And as part of the analysis, we made assumptions of the form that given that Carol placed the prize in box 1, the probability that the contestant chooses box 1 is 1/3. Now, this is an example of something that's called a conditional probability. And that's what we're going to study today.

Now, in general, you have something like the conditional probability that an event, A, happens given that some other event, B, has already taken place. And you write that down as a probability of A given B. And both A and B are events. Now, the example from Monty Hall-- and actually, we had several-- but you might have B being the event that Carol places the prize in box 1. And A might be the event that the contestant chooses box 1.

And we assumed for the Monty Hall game that the probability of A given B in this case was 1/3 because the contestant didn't know where the prize was. Now in general, there's a very simple formula to compute the probability of A given B. In fact, we'll treat it as a definition. Assuming the probability of B is nonzero, then the probability of A given B is just the probability of A and B both happening, divided by the probability of B happening.

And you can see why this makes sense with a picture-- say this is our sample space. And let this be the event, A, and this be the event, B. Now we're conditioning on the fact that B happened. Now once we've conditioned on that, all this stuff outside of B is no longer possible.

All those outcomes are no longer in the space of consideration. The only outcomes left are in B. So in some sense we've shrunk the sample space to be B. And all we care about is the probability that A happens inside this new sample space. That is, we're asking for the probability that one of these outcomes happens given that this is the sample space.

Well, this is just A intersect B because you still have to have A happen, but now you're inside of B. And then we divide by probability of B. So we normalize this to be probability one.

OK. Because we're saying B happened-- we're conditioning on that. Therefore, the probability of these outcomes must be 1. So we divide by the probability of B. So we normalize.

This now becomes-- the probability of A given B is the share of B that A takes up, weighted by the probabilities of the outcomes. OK. All right. For example then, what's the probability of B given B? What's that equal? 1.

OK. Because we said it happened-- so it happens with probability 1. Or, using the formula, that's just probability of B and B divided by probability of B. Well, that equals the probability of B divided by the probability of B, which is 1.

All right. Any questions about the definition of the conditional probability? Very simple. And it's easy to work with using the formulas.

Now, there's a nice rule called the product rule, which follows from the definition very simply. The product rule says that the probability of A and B for two events is equal to the probability of B times the probability of A given B. And that just follows straightforwardly from this definition.

Just multiply by probability of B on both sides. All right. So now you have a rule of computing a probability of two events simultaneously happening. So for example, in the Monty Hall problem, what's the probability that Carol places the prize in box one and that's the box the contestant chooses? All right?

So if we took A and B as defined up there, that's the probability that Carol places it in box 1 and the contestant chooses it. Well, that's the probability that Carol put it there, which is 1/3, times the probability that the contestant chooses it given that Carol put it there, which is also 1/3. So it's 1/9. OK?
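The definition and the product rule can be checked on a small finite sample space. The following is a minimal sketch, not something from the lecture: it assumes that Carol's placement and the contestant's pick are each uniform over the three boxes and independent of each other, and the names `space`, `prob`, and `cond_prob` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: (box holding the prize, box the contestant picks).
# Assumption for this sketch: both choices are uniform over the three boxes
# and independent of each other, so each of the 9 outcomes has probability 1/9.
space = {outcome: Fraction(1, 9) for outcome in product((1, 2, 3), repeat=2)}

def prob(event):
    """Pr[event]: add up the probabilities of the outcomes in the event."""
    return sum((space[o] for o in event), Fraction(0))

def cond_prob(a, b):
    """Pr[A | B] = Pr[A and B] / Pr[B]; only defined when Pr[B] is nonzero."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("Pr[B] is zero, so Pr[A | B] is undefined")
    return prob(a & b) / p_b

B = {o for o in space if o[0] == 1}   # Carol places the prize in box 1
A = {o for o in space if o[1] == 1}   # the contestant chooses box 1

print(cond_prob(A, B))             # 1/3
print(cond_prob(B, B))             # 1
print(prob(B) * cond_prob(A, B))   # product rule: Pr[A and B] = 1/9
```

Exact fractions are used instead of floats so the results can be compared directly with the numbers worked out on the board.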

And this extends to more events. It is called the general product rule. So if you want to compute the probability of A1 and A2 and all the way up to An, that's simply the probability of A1 happening all by itself times the probability of A2 given A1 times-- well, I'll do the next one-- times the probability of A3 given A1 and A2, dot, dot, dot, times, finally, the probability of An given all the others.

So that starts to look a little more complicated. But it gives you a handy way of computing the probability that an intersection of events takes place. This is proved by induction on n, just taking that rule and using induction on n. It's not hard. But we won't go through it.
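Written out in symbols, the general product rule described above reads as follows. The notation (Pr, the intersection sign, the vertical bar for "given") is one common convention and is assumed here rather than taken from the board.

```latex
\Pr[A_1 \cap A_2 \cap \cdots \cap A_n]
  = \Pr[A_1] \cdot \Pr[A_2 \mid A_1] \cdot \Pr[A_3 \mid A_1 \cap A_2]
    \cdots \Pr[A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}]
```

With n = 2 this is just the product rule for two events.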

All right. Let's do some examples. We'll start with an easy one. Say you're playing a playoff series and you're going to play best 2 out of 3. All right. So you have a best 2 out of 3 series. So whoever is first to win two games wins the series. And say you're told that the probability of winning the first game is 1/2.

So the teams are matched 50-50 for the first game. But then you're told that the probability of winning a game after a victory is higher. It's 2/3. So the probability of winning the game immediately following a win is 2/3. And similarly, the probability of winning after a loss is 1/3.

All right. And the idea here is that you win a game, you're sort of psyched, you've got momentum, and going into the next day you're more likely to win. Similarly, if you lost you're sort of down and the other guy has a better chance of beating you. Now, what we're going to try to figure out is the probability of winning the series given you won the first game. All right?

Now, conditional probability comes up in two places in this problem. Can anybody tell me places where it comes up? So I've got the problem statement, and the goal is to figure out the probability you win the series given you won the first game. So what's one place conditional probability is entering into this problem? Yeah?

AUDIENCE: The probability changes depending on the result of the previous game.

PROFESSOR: That's true. The probability of winning any particular game is influenced by the previous game. So you're using conditional probability there. All right. And where else? Yeah.

AUDIENCE: [INAUDIBLE] you have to take into account [INAUDIBLE].

PROFESSOR: That's interesting. That will be another question we're going to look at. What's the probability of playing three games? Yep. That's one.

OK. Well, the question we're after, what's the probability of winning the series given that you won the first game. We're going to compute a conditional probability there. So it's coming up in a couple of places here.

All right. Let's figure this out. It's easy to do given the tree method. So let's make the tree for this. So we have possibly three games there's game one, game two, and game three.

Game one, you can win or lose. There's two branches. Game two you can win or lose. And now, game three-- well, it doesn't even take place here.

But it does here. You can win or lose here. And you could win or lose here. And here the series is over. So there is no game three in that case.

Next we put a probability on every branch here. Game one is 50-50. What's the probability you take this branch? 2/3, because you're on the path where you won the first game.

You win the second game with 2/3. You lose with 1/3. Now here you're on the path where you lost the first game. So this has 1/3 and this has 2/3.

All right? And then lastly, what's the probability I have the win on the third game here? 1/3, because I just lost the last game. That's all I'm conditioning on.

So that becomes 1/3. And this is 2/3 now. And then here I just won a game. So I've got 2/3 and 1/3.

All right. So I got all the probabilities. And now I need to figure out for the sample points what's their probability. So this sample point we'll call win-win.

This sample point is win-lose-win. This one's win-lose-lose. Then we have lose-win-win, lose-win-lose, and then lose-lose.

So I got six sample points. And let's figure out the probability for each one. Now remember the rule we had for the tree method. I just multiply these things.

Well, in fact, the reason we have that rule is that it is the same as the product rule. Take the win-win scenario: win the first game, win the second game. By the product rule, its probability is the probability that I win the first game times the probability that I win the second game given that I won the first game. That's what the product rule says.

The probability I win the first game is 1/2, times the probability I win the second given that I won the first, which is 2/3. So that equals 1/3. So what we're doing here now is giving you the formal justification for that rule that we had last time and that you'll always use: the probability of a sample point is the product of the probabilities on the edges leading to it. It's just the product rule.

Now the next example is this one. And here we're going to use the general product rule to get it. The probability of win-lose-win by the general product rule is the probability that you win the first game times the probability you lose the second game given that you win the first times the probability you win the third given what?

What am I given on the product rule? Won the first, lost the second. All right. Well, now we can fill in the numbers.

The probability I win the first is 1/2. The probability that I lose the second given that I won the first, that's 1/3. And then this one here, the probability that I win the third given that I won the first and lost the second, that simplifies to the probability I win the third given that I lost the second. Doesn't matter what happened on the first. And that's 1/3.

So this is 1/2 times 1/3 times 1/3, and that's 1/18. And it's just the product because the product rule says to multiply the first probability by this one, which is the conditional probability of being here, and by this one, which is the conditional probability given that these events happened before.

Any questions about that? Very simple to do, which is good. Yeah. Is there a question? OK. All right. So let's fill in the other probabilities here.

For win-lose-lose I've got 1/2, 1/3, and 2/3. That's 1/9. Same thing for lose-win-win: 1/9. Lose-win-lose is 1/18 and lose-lose is 1/3. OK. So those are the probabilities of the sample points.

Now, to compute the probability of winning the series given that we won the first game, let's define the events here. So let A be the event that we win the series. B will be the event that we win the first game. And I want to compute the probability of A given B.

And we use our formula. Where's the formula for that? It's way back over there. The probability of A given B is the probability of both happening, the probability of A and B divided by the probability of B.

So now I just have to compute these probabilities. So to do that I got to figure out which sample points are in A and B here. So let's write that down. There's A, B, A and B. All right. So A is the event that we win the series. Now this sample point qualifies, that one does, and this one.

B is the event we won the first game. And that's these three sample points. And then A intersect B is these two. All right.

So for each event that I care about I figure out which sample points are in that event. And now I just add the probabilities up. So what's the probability of A and B? 7/18. 1/3 plus 1/18.

What's the probability of B? Yeah. 1/2, 9/18. I got these three points. So this'll be 1/3 plus 1/18 plus the extra one, 1/9. So I've got 7/18 over 9/18. 7/9 is the answer. So the probability we win the series given we won the first game is 7/9. Any questions?
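The tree method for this series can be sketched in a few lines of code. This is an illustrative sketch only: the function name `series_outcomes`, the constants, and the "W"/"L" string encoding of a path are assumptions introduced here, while the branch probabilities (1/2, 2/3, 1/3) are the ones given in the problem statement.

```python
from fractions import Fraction

FIRST = Fraction(1, 2)       # Pr[win game 1]
AFTER_WIN = Fraction(2, 3)   # Pr[win a game | won the previous game]
AFTER_LOSS = Fraction(1, 3)  # Pr[win a game | lost the previous game]

def series_outcomes():
    """Return {path: probability} for the leaves of the best-2-of-3 tree."""
    outcomes = {}

    def walk(history, p):
        if history.count("W") == 2 or history.count("L") == 2:
            outcomes[history] = p          # series decided: this is a sample point
            return
        if not history:
            p_win = FIRST
        elif history[-1] == "W":
            p_win = AFTER_WIN
        else:
            p_win = AFTER_LOSS
        # Multiply along the edges of the tree (the product rule).
        walk(history + "W", p * p_win)
        walk(history + "L", p * (1 - p_win))

    walk("", Fraction(1))
    return outcomes

outcomes = series_outcomes()

def prob(event):
    return sum((outcomes[o] for o in event), Fraction(0))

A = {o for o in outcomes if o.count("W") == 2}   # we win the series
B = {o for o in outcomes if o[0] == "W"}         # we win the first game

print(prob(A & B))             # 7/18
print(prob(B))                 # 1/2
print(prob(A & B) / prob(B))   # 7/9
```

Each leaf probability is the product of the edge probabilities on its path, so the dictionary reproduces the six sample points from the board.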

We're going to do this same thing about 10 different times. OK? And it will look a little different each time maybe. But it's the same idea. And the beauty here is it's really easy to do. I'm going to give you a lot of confusing examples. But really, if you just do this, it's going to be very easy.

All right. Somebody talked about the series lasting three games. What's the probability the series lasts three games? Can anybody look at that and tell me? 1/3, because what you would do is add up these four sample points in the middle. And it's the opposite of these two. So it's a 2/3 chance of two games, a 1/3 chance of three games. So it's not likely to go three games.

All right. So to this point, we've seen examples of a conditional probability where it's A given B where A follows B, like, we're told B happened. Now what's the chance of A. And A is coming later. The probability of winning today's game given that you won yesterday's game, the probability of winning the series given you already won the first game.

Next, we're going to look at the opposite scenario where the events are reversed in order. The probability that you won the first game given that you won the series. All right. Now, this is inherently confusing because if you know you won the series, well, you already know what happened in the first game because it's been played. So how could there be any probability there? It happened.

Well, one way to think about the meaning is this: over all the times the series was played, what fraction of the time did the team that won the series win the first game? Or, maybe you just don't know. The game was played. You know you won the series. But you don't know who won the first game. And so you could think of a probability still being there.

Now when you think about it, it gets me confused still. But just think about it like the math. It's the same formula. OK. It doesn't matter which happened first in time. You use the same mathematics. In fact, they give a special name to these kinds of things. They're called a posteriori conditional probabilities.

It's a fancy name for just saying that things are out of order in time. All right? So it's a probability of B given A where B precedes A in time. All right? So it's the same math. It's just they're out of order.

So let's figure out the probability that you won the first game given that you won the series. Let's figure it out. So I want probability of B given A now for this example. Well, it's just the probability of B and A over the probability of A.

We already computed the probability of A and B. That's 1/3 plus 1/18. What's the probability of A, the probability of winning the series? 1/2. It's those three sample points, 1/3 plus 1/18 plus 1/9, and they'd better add up to 1/2 because the setup is symmetric between the two teams. So that's over 1/2, which is 9/18.

Well, this was 7/18 over 9/18. It's 7/9. So the probability of winning the first game given that you won the series is 7/9. Anybody notice anything unusual about that answer here?

It's the same as the answer over there. Is that a theorem? No. The probability of A given B is not always the probability of B given A. It was in this case. It is not always true. In fact, we could make a simple example to see why that's not always the case.

All right. So say here's your sample space. And say that this is B and this is A. What's the probability of A given B in this case? 1. If you're in B-- wait. No. It's not 1. What's the probability of A given B? It's something less than 1. The way I've drawn it, it might be 1/3 if it was uniform.

But in this case, the probability of A given B is less than 1. What's the probability of B given A? 1, because if I'm in A I'm definitely in B. All right. So that's an example where they would be different. And that's the generic case is they're different. All right?

When are they equal? Because they were equal in this case. What makes them equal? Let's see. When does the probability of A given B equal the probability of B given A?

Let's see. Well, if I plug in the formula, this equals the probability of A and B over the probability of B. That equals the probability of B and A over the probability of A. So when are those equal? Yeah. When the probability of A equals the probability of B. All right. So that's one case.

What's the other case? Yeah-- when it's 0. Probability-- there's no intersection. Probability of A intersect B is 0. That's the other case. All right. But usually these conditions won't apply-- just happened to in this example by coincidence.
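The two cases just named can be read off directly from the definition. As a short derivation, assuming both Pr[A] and Pr[B] are nonzero so that both conditional probabilities are defined:

```latex
\Pr[A \mid B] = \Pr[B \mid A]
  \iff \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\Pr[A \cap B]}{\Pr[A]}
  \iff \Pr[A \cap B] = 0 \ \text{ or } \ \Pr[A] = \Pr[B]
```

In the series example both Pr[A] and Pr[B] equal 1/2, which is why the two answers agree at 7/9.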

Any questions about that? All right. Yeah. So the math is the same with a posteriori probabilities. It's really, really easy.

All right. So let's do another simple example that'll start to maybe be a little more confusing. Say we've got two coins.

One of them is a fair coin. And by that, I mean the probability it comes up heads is the same as the probability it comes up tails, which is 1/2. The other one is an unfair coin. And in this case, that means it's always heads.

The probability of heads is 1. The probability of tails is 0. All right? I've got two such coins here.

All right. Here is the unfair coin-- heads and heads. Actually, they make these things look like quarters sometimes. Here's the fair coin-- heads and tails. All right.

Now suppose I pick one of these at random, 50-50, and I flip it, which I'm doing behind my back, and lo and behold, it comes out and you see a heads. What's the probability I'm holding the fair coin? I picked the coin, 50-50, behind my back. So one answer is, I picked the fair coin with 50% probability.

But then I flipped it behind my back and I showed you the result. And you see heads. Of course, if I'd have shown you tails, you would have known for sure it was the fair coin because that's the only one with tails.

But you don't know for sure now. You see a heads. What's the probability this is the fair coin given that you saw a heads after the flip?

How many people think 1/2? After all, I picked it with probability 1/2. How many people think it's less than 1/2? Good. OK.

Somebody even said 1/3. Does that sound right? A couple people like 1/3. OK. All right.

Now, part of what makes this tricky is I told you I picked the coin with 50% probability. But then I gave you information. So I've conditioned the problem.

And so this is one of those things you could have asked Marilyn about. Is it 1/2 or is it 1/3? Because I picked it with 50% chance, what does the information do for you? Now, I'll give you a clue.

Bobo might have written in and said it's 1/2. And his proof is that three other mathematicians agreed with him.

[LAUGHTER]

All right? OK. So let's figure it out. And really it's very simple. It's just drawing out the tree and computing the conditional probability.

So we're going to do the same thing over and over again because it just works for every problem. Of course, you could imagine debating this for awhile, arguing with somebody. Is it 1/2 or 1/3? Much simpler just to do it.

So the first thing is we have, which coin is picked? So it could be fair-- and I told you that happens with probability 1/2-- or unfair, which is also 1/2. Then we have the flip. The fair coin is equally likely to be heads or tails, each with 1/2. The unfair coin, guaranteed to be heads, probability 1.

All right. Now we get the sample point outcomes. It's fair and heads with probability 1/4, fair and tails with probability 1/4, unfair and heads with probability 1/2. Now we define the events of interest. A is going to be that we chose the fair coin. And B is that the result is heads. And of course what I want to know is the probability that I chose the fair coin given that I saw a heads.

So to do that we plug in our formula. That's just the probability of A and B over the probability of B. And to compute that I got to figure out the probability of A and B and the probability of B.

So I'll make my diagram. A here, B here, A and B. A is the event I chose the fair coin. That's these guys. B is the event the result is heads. That's this one and this one. And A intersect B, that's only this one point. So this is really easy to compute now.

What's the probability of A and B? 1/4. It's just that sample point. What's the probability of B? 3/4, 1/4 plus 1/2. So the probability of A given B is 1/3.

Really simple to answer this question. Just don't even think about it. Just write down the tree when you get these things. So much easier just to write the tree down.

All right. Now the key here is we knew the probability of picking the fair coin in the first place. Maybe it's worth writing down what happens if that's a variable, some variable P. Let's do that.

For example, what if I hadn't told you the probability that I picked the fair coin? I just picked one and flipped it. Think that'll change the answer? It should because you got to plug something in there for the 1/2 for this to work.

So let's see what happens. Say I picked the fair coin with probability P and the unfair coin with 1 minus P. And this is the same heads and tails, 1/2, 1/2. Heads, the probability 1. Well now, instead of 1/4 I get P over 2 up here. And this is now 1 minus P instead of 1/2.

So for the probability of A given B: the probability of A and B is P over 2, and the probability of B is P over 2 plus 1 minus P. That's P over 2 up top and 1 minus P over 2 on the bottom. Multiply top and bottom by 2, and I get P over 2 minus P, that is, P/(2 - P).

So the probability with which I picked the coin to start with impacts the answer here. For example, what if I picked the unfair coin for sure? That would be P being 0. Well, the probability that I picked the fair coin is 0 over 2, which is 0. All right, so even though I showed you the heads, there's no chance it was the fair coin because I picked the unfair coin for sure.

Same thing if I picked the fair coin for sure, better be the case this is 1. So I get 1 over 2 minus 1. It's 1. Any questions? So it's important you know the probability I picked the fair coin to start with. Otherwise, you can't go anywhere.
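For reference, the algebra for the single-flip case with a general prior can be written in one line. Here p stands for the probability of picking the fair coin, A is "chose the fair coin", and B is "the flip shows heads", as defined above; the typeset notation itself is an assumption of this sketch.

```latex
\Pr[A \mid B]
  = \frac{\Pr[A \cap B]}{\Pr[B]}
  = \frac{p/2}{p/2 + (1 - p)}
  = \frac{p/2}{1 - p/2}
  = \frac{p}{2 - p}
```

Plugging in p = 1/2 gives 1/3, p = 0 gives 0, and p = 1 gives 1, matching the cases discussed above.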

All right. What if I do the same game? Pick a coin with probability p. But now I flip it K times. Say I flip it 100 times. And every time it comes up heads. I mean you're pretty sure you got the unfair coin because you never saw a tails. Right?

So let's do that. Let's compute that scenario. So instead of a single heads I get K straight heads and no tails. This would happen with 1 over 2 to the K. This would happen with 1 minus 1 over 2 to the K. So this is now P over 2 to the K. This is now P times (1 minus 2 to the minus K).

Let's recompute the probabilities. I'm going somewhere with this. Wait a minute.

So now B is the event that K straight heads come up. And I want to know the probability that I picked the fair coin given that it just never comes up tails. The math is the same.

The probability now that I picked the fair coin and got K straight heads is just P times 2 to the minus K. The probability that I got K straight heads is P times 2 to the minus K plus the chance I picked the unfair coin, which is 1 minus P. And if I multiply top and bottom by 2 to the K, I get P over P plus 2 to the K times (1 minus P).

All right. So it gets very unlikely that I've got the fair coin here as K gets big. Like if K is 100 I got a big number down here. And basically it's 0 chance-- close to 0 chance of the fair coin.
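To see how quickly this drops as K grows, here is a small sketch of the formula just derived. The function name `posterior_fair` and its parameters `p` and `k` are made up for illustration; `p` is the probability of picking the fair coin and `k` the number of straight heads observed.

```python
from fractions import Fraction

def posterior_fair(p, k=1):
    """Pr[fair coin | k straight heads], where p = Pr[the fair coin was picked]."""
    p = Fraction(p)
    fair_and_heads = p * Fraction(1, 2) ** k   # picked fair, then k heads in a row
    unfair_and_heads = 1 - p                   # the always-heads coin gives heads for sure
    return fair_and_heads / (fair_and_heads + unfair_and_heads)

half = Fraction(1, 2)
print(posterior_fair(half))              # 1/3, the single-flip answer
print(posterior_fair(0, k=100))          # 0: the unfair coin was picked for sure
print(posterior_fair(1, k=100))          # 1: the fair coin was picked for sure
print(float(posterior_fair(half, 10)))   # already below 0.001 after 10 heads
```

The value of `p` has to be supplied by the caller, which mirrors the point of the example: without knowing the prior, the observed heads alone do not pin down the answer.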

But now say I do the following experiment. I don't tell you P. But I pull a coin out and 100 flips in a row it's heads. Which coin do you think I have? I flipped it 100 straight times and it's heads every time.

Yeah. There's not enough information. You don't know. What do you want to say? You want to say it's the unfair coin but you have no idea because I might have picked the fair coin with probability 1, in which case it is the fair coin and it just was unlucky that it came up heads 100 times in a row. But it could be.

So you could say nothing if you don't know the probability P. Because sure enough, if I plug in P being 1 here, that wipes out the 2 to the K and I just get probability 1. OK?

All right. Now where this comes up in practice is with things like polling. Like, we just had an election. And people do polls ahead of time. And they sample thousands of voters, say 1% of the population.

And they say, OK, 60% of the people are going to vote Republican. And they might have a margin of error, three points, whatever that means. And we'll figure that out next week. What does that tell you about the electorate as a whole, the population, if they sample 1% at random and 60% are Republican? Yeah?

AUDIENCE: [INAUDIBLE] The options you have, is it all heads or is it all tails? It should be one option all heads and another option at least one tails.

PROFESSOR: You're right. Oops. All right. At least one tail for this one. Yeah. Good. That is true. OK. Any questions about that example?

OK. Now we're back to the election and there's a poll that says they sampled 1% of the population at random and 60% said they're going to vote Republican. And the margin of error is 3% or something. What does that tell you about the population of the country?

Nothing. That's right. It is what it is. All you can conclude is that either the population is close to 60% Republican or you were unlucky in the 1% you sample. That's what you can conclude because the population really is fixed in this case.

It is what it is. There's no randomness in the population. All right? So you have this next week in recitation. You're going to design a poll and work through how to calculate the margin of error and work through what that really means in terms of what the population is like.

Now of course, if it comes out 100 straight times heads, you've got to be really unlucky to have the fair coin. And the same thing with designing the poll if you're way off. Any questions about that?

OK. The next example comes up all the time in practice. And that's with medical testing. Maybe I'll leave-- no. I'll take that down. We know that now.

Now in this case-- in fact, this is a question we had on the final exam a few years ago. And there's a good chance this kind of question's going to be on the final this year. There's a disease out there. And you can have a test for it.

But like most medical tests, it's not perfect. Sometimes when it says you've got the disease you really don't. And sometimes when it says you don't have it, you really do.

So in this case, we're going to assume that 10% of the population has the disease, whatever it is. You don't get symptoms right away. So you have this test.

But if you have the disease there is a 10% chance that the test is negative. And this is called a false negative, because the test comes back negative but it's wrong, because you have the disease.

And similarly, if you have the disease-- or sorry-- if you don't have the disease, there's a 30% chance that the test comes back positive. And it's called a false positive because it came back positive, but you don't have it. So the test is pretty good. Right? It's got a 10% false negative rate and a 30% false positive rate.

Now say you select a random person and they test positive. What you want to know is the probability they have the disease given that it's a random person. So actually, this came up in my personal life.

Many years ago when my wife was pregnant with Alex, she was exposed to somebody with TB here at MIT. And she took the test. And it came back positive.

Now the bad thing-- TB's a bad thing. You don't want to get it. But the medicine for it you take for six months. And she was worried about taking medicine for six months when she's pregnant because who knows what the TB medicine does kind of thing if you have a baby.

So she asked the doc, what's the probability I really have the disease? The doc doesn't know. The doc maybe could give you some of these stats, 10% false negative, 30% false positive. But it tested positive. So they just normally give you the medicine.

So say this was the story. What would you say? What do you think? How many people think that it's at least a 70% chance you've got the disease? She tested positive and it's only got a 30% false positive rate. Anybody? So you don't think she's likely to have it.

How many people think it's better than 50-50 you have the disease? A few. How many people think less than 50%? A bunch. Yeah. You're right, in fact.

Let's figure out the answer. It's easy to do. So A is the event the person has the disease. And B is the event that the person tests positive.

And of course what we want to know is the probability you have the disease given that you tested positive. And that's just the probability of both events divided by the probability of testing positive. So let's figure that out by drawing the tree.

So first, do you have the disease? And it's yes or no. And let's see. The probability of having the disease, what is that for a random person? 10%. That's the stat. So it's-- actually, we'll call it 0.1. And 0.9 you don't have it.

And then there's the test. Well, you can be positive or negative. Now if you have the disease, there is a-- the chance you test negative is 10%, 0.1. Therefore there's a 90% chance you test positive.

Now, if you don't have the disease, you could test either way. If you don't have the disease there's a 30% chance you test positive, so 0.3 here, and a 70% chance you're negative.

Now we can compute each sample point probability. This one is 0.1 times 0.9 is 0.09. 0.1 times 0.1 is 0.01. 0.9 times 0.3 is 0.27. 0.9 times 0.7 is 0.63.

So all sample points are figured out. Now we figure out which sample points are in which sets. So we have event A, event B, and A intersect B. Let's see. A is the event you have the disease. That's these guys. B is the event you test positive. That's this one and this one. A intersect B is just this one.

All right. We're almost done. Let's just figure out the probability you have the disease. What's the probability of A intersect B? 0.09. It's just that one sample point.

What's the probability that you tested positive? 0.36. Yeah. 0.09 plus 0.27, which is 0.36. So I got 0.09 over 0.36 is 1/4. Wow. That seems bizarre.

Right? You've got a test, 10% false negative, 30% false positive. Yet, when you test positive there's only a 25% chance you have the disease. So maybe you don't take the medicine. So if there's risk both ways, you probably don't have the disease. Yeah?
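The four-leaf tree for the medical test is small enough to write out directly. This is a sketch under the numbers stated in the example (10% prevalence, 10% false negative rate, 30% false positive rate); the constant names and the labels "disease", "healthy", "positive", "negative" are chosen here for illustration.

```python
from fractions import Fraction

P_DISEASE = Fraction(1, 10)   # 10% of the population has the disease
FALSE_NEG = Fraction(1, 10)   # Pr[test negative | disease]
FALSE_POS = Fraction(3, 10)   # Pr[test positive | no disease]

# Sample points of the tree: (disease status, test result) -> probability,
# each computed as the product of the two edge probabilities.
points = {
    ("disease", "positive"): P_DISEASE * (1 - FALSE_NEG),
    ("disease", "negative"): P_DISEASE * FALSE_NEG,
    ("healthy", "positive"): (1 - P_DISEASE) * FALSE_POS,
    ("healthy", "negative"): (1 - P_DISEASE) * (1 - FALSE_POS),
}

p_a_and_b = points[("disease", "positive")]                     # has disease AND positive
p_b = p_a_and_b + points[("healthy", "positive")]               # tests positive
p_disease_given_positive = p_a_and_b / p_b

print(float(p_a_and_b))            # 0.09
print(float(p_b))                  # 0.36
print(p_disease_given_positive)    # 1/4
```

Changing `P_DISEASE` shows how strongly the answer depends on how common the disease is in the group being tested.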

AUDIENCE: [INAUDIBLE] disease change because you've already been exposed to somebody that has it?

PROFESSOR: That's a great point, great point, because there's additional information conditioning this in the personal example I cited. You were exposed to somebody. So we need to condition on that as well, which raises the chance you have the disease. That's a great point. Yeah.

Just like in the-- well, we haven't got to that example. We'll do another example where that exact kind of thing is very important. All right. So this is sort of paradoxical: it looks like a pretty good test, with low false positives and low false negatives, but it's likely to be wrong, at least if it tells you that you have the disease.

In fact, let's figure out. What's the probability that the test is correct? What's the probability the test is right in general? 72%. Let's see. So it would be 0.09 plus 0.63. 72%.

So it's likely to be right. But if it tells you you have the disease it's likely to be wrong. It's hard. Why is this happening? Why does it come out that way? Yeah?

AUDIENCE: Then there is only a 1 in 64 chance that you have the disease. So if it comes back negative, then it's a pretty good indication that you're OK.

PROFESSOR: Yeah. If it comes back negative then it really is doing very well. That's right. But why is it, when it comes back positive, that you're unlikely to have the disease if it's a good test? Yeah.

AUDIENCE: The disease is so rare.

PROFESSOR: The disease is so rare. Absolutely. This number here is so small. And that's what's doing it. Because if you look at how many people have the disease and test positive, it's 0.09. So many people don't have the disease that even with a small false positive rate, this number swamps out that number.

In fact, imagine nobody had the disease. You'd have a 0 here. All right? And then you would always be wrong if you said you had it. OK? That's good.
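
To see how much the rarity of the disease drives the answer, here is a small sketch that recomputes the chance of having the disease given a positive test for several prevalence values. The function name and the extra prevalence values (other than 0 and 0.1) are illustrative assumptions, not from the lecture.

```python
def p_disease_given_positive(prevalence, false_neg=0.1, false_pos=0.3):
    """Pr[disease | positive] for a test with the given error rates."""
    sick_and_positive = prevalence * (1 - false_neg)
    healthy_and_positive = (1 - prevalence) * false_pos
    return sick_and_positive / (sick_and_positive + healthy_and_positive)

# Hypothetical prevalence values; 0.1 is the lecture's case, 0.0 is the
# "nobody has the disease" case, where a positive result is always wrong.
for prevalence in (0.0, 0.01, 0.1, 0.5):
    print(prevalence, round(p_disease_given_positive(prevalence), 3))
```

With the same test, the answer moves from 0 (nobody has the disease) to 0.25 (the lecture's 10% case) and keeps rising as the disease becomes more common.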

OK. This comes up in weather prediction, the same paradox. For example, say you're trying to predict the weather for Seattle. Sometimes it seems like this in Boston. And you just say, it's going to rain.

Forget all the fancy weather forecasting stuff, the radar, and all the rest. Just say it's going to rain tomorrow. You're going to be right almost all the time. All right? And in fact, if you try to do fancy stuff, you're probably going to be wrong more of the time.

All right. For example, in this case, if you just say the person does not have the disease, forget the lab test. Just come back with negative. How often are you right? 90% of the time you're right. Much better than the test you paid a lot of money for.

I see. You've got to be careful what you're looking for, how you measure the value of a test or a prediction. Because presumably the one you paid for is better, even though accurate less of the time. Any questions about that?
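
The comparison between the lab test and the "always say negative" prediction can be sketched as follows, again assuming the lecture's numbers. The variable names are illustrative.

```python
from fractions import Fraction

p_disease = Fraction(1, 10)
false_neg = Fraction(1, 10)
false_pos = Fraction(3, 10)

# The lab test is right when a sick person tests positive
# or a healthy person tests negative: 0.09 + 0.63.
test_accuracy = p_disease * (1 - false_neg) + (1 - p_disease) * (1 - false_pos)

# "Always negative" is right exactly when the person is healthy.
always_negative_accuracy = 1 - p_disease

# A negative lab result is still informative: Pr[disease | negative].
p_negative = p_disease * false_neg + (1 - p_disease) * (1 - false_pos)
p_disease_given_negative = (p_disease * false_neg) / p_negative

print(float(test_accuracy), float(always_negative_accuracy))  # 0.72 0.9
print(p_disease_given_negative)                               # 1/64
```

Overall accuracy alone favors the trivial prediction, yet that prediction never identifies a sick person, which is one way to see why accuracy is a poor measure of the test's value here.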

OK. So for the rest of today we're going to do three more paradoxes. And in each case they're going to expose a flaw in our intuition about probability. But the good news is in each case it's easy to get the right answer. Just stick with the math and try not to think about it. Now the first example is a game involving dice that's called carnival dice, which you can find in carnivals and also in casinos.

It's a pretty popular game, actually. So the way it works is as follows. The player picks a number from 1 to 6-- we'll call it N-- and then rolls three dice. And let's say they're fair and mutually independent.

We haven't talked about independence. So they're fair dice. For now, normal dice-- nothing fishy. And the player wins if and only if the number he picked comes up on at least one of the dice. So you either win or you lose the game depending on if your lucky number came up at least once.

Now you've got three dice, each of which has a 1 in 6 chance of coming up a winner for you. So how many people think this is a fair game-- you got a 50-50 chance of winning-- three dice, each 1/6 chance of winning? Anybody think it's not a fair game? A bunch of you. How many people think it is a fair game-- 50-50? A few. All right.

Well, let's figure it out. And instead of doing the tree method, which we know we're supposed to do, we're just going to wing it, which always seems easier to do. If you're in the casino you want to just wing it instead of taking your napkin out and drawing a tree.

So the claim, question mark, is the probability you win is 1/2. And the proof, question mark, is you let Ai be the event that the i-th die comes up N. And i is 1 to 3 here.

So then you say, OK. The probability I win is the probability of A1-- I could win that way-- or A2, or A3. All I need is one of the dice to come up my way. And that is the probability of A1 plus the probability of A2 plus the probability of A3. And each die wins for me with probability 1/6. And that is then 1/2.

So that's a proof that we win with probability of 1/2. What do you think? Any problems with that proof?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Well, that's a great point. Yeah. So if I extended this nice proof technique, I could have a probability of 7/6 of winning with seven dice. Yeah?
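
The objection can be made concrete with a short sketch. It compares the "just add the 1/6's" sum with an exact value computed through the complement (the chance that no die shows N). The complement route is not the one taken in the lecture, and it assumes the dice are fair and mutually independent as stated; the function names are illustrative.

```python
from fractions import Fraction

def naive_sum(num_dice):
    """The flawed 'proof': add 1/6 once per die."""
    return Fraction(num_dice, 6)

def exact_win_probability(num_dice):
    """1 minus the chance that every die misses N.

    Assumes fair, mutually independent dice, so all dice miss
    with probability (5/6) ** num_dice.
    """
    return 1 - Fraction(5, 6) ** num_dice

for n in (6, 7):
    print(n, naive_sum(n), float(exact_win_probability(n)))
```

With seven dice the naive sum is 7/6, which cannot be a probability, while the exact value stays below 1.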

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah. You're very close. I didn't technically assume that.

AUDIENCE: [INAUDIBLE]

PROFESSOR: They could double up. Yeah. The proof assumes there's no intersection in the events. In fact, there is an intersection because there's a chance I rolled all six-- all Ns. Say N is 6. I could roll all sixes and then each of these would be a winner. But I don't get to count them separately. I only win once in that case.

In other words, all of these could be turned on at the same time. There's an intersection here. So this rule does not hold. I need the Ai to be disjoint for this to be true-- the events to be disjoint.

And they're not disjoint because there's a sample point where two or more of the dice could come up the same, being a winner, which means the same sample point, namely all dice are N, comes up in each of these three. So they're not disjoint.

Now what's the principle you used two weeks ago when you did cardinality of a set-- cardinality of a union of sets? Inclusion-exclusion. And the same thing needs to be done here.

So let's do that. And then we'll figure out the actual probability. So this is a fact based on the inclusion-exclusion principle. The probability of A1 union A2 union A3 is just what you think it would be from inclusion-exclusion.

It's the probability of A1 plus the probability of A2 plus the probability of A3 minus the pairwise intersections: minus the probability of A1 intersect A2, minus the probability of A1 intersect A3, minus the probability of A2 intersect A3. And is there anything else? Plus the probability of all of them matching, A1 intersect A2 intersect A3.

OK. So the proof is really the same proof you use for inclusion-exclusion with sets. The only difference is that in a probability space, we have weights on the elements. And the weight corresponds to the probability.

So in fact, if you were drawing the sample space, say here's A1 and here's A2, and here's A3. Well, you need to add the probabilities here, here, and here. Then you subtract off the double counting from here, from here, and from here. And then you add back again what you subtracted off too much there. Same proof, it's just that you have weights on the elements, the probabilities.

All right. So let's figure out the right probability. That's 1/6, 1/6, 1/6. What's the probability of the first two dice matching-- both of them? 1/36. We'll talk more about why that is next time. But there's a 1/6 for A1, then given that, 1/6 for the second die matching. So it's 1/6 times 1/6. So minus 1/36, minus 1/36, minus 1/36. Plus the chance that all three match, which is 1/216, or 1 over 6 cubed. So when you add all that up you get 0.421 and some more.

So the chance of winning this game is about 42%, which makes it one of the worst games in the casino. It is hard to find a worse game than this. Roulette, much better. We'll study roulette in the last lecture-- much better game. And even that's a terrible game to play.
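
As a check on the arithmetic, here is a minimal sketch that enumerates all 216 equally likely rolls of three fair dice and computes the winning probability two ways: by inclusion-exclusion and by directly counting the rolls in the union. The choice N = 6 and the helper names are illustrative; by symmetry any N from 1 to 6 gives the same result.

```python
from fractions import Fraction
from itertools import product

N = 6  # the player's number (illustrative choice)

# Uniform sample space: every ordered roll of three dice.
rolls = set(product(range(1, 7), repeat=3))

def pr(event):
    """Probability of an event (a set of rolls) in the uniform space."""
    return Fraction(len(event), len(rolls))

# A[i] is the event that die i comes up N.
A = [{r for r in rolls if r[i] == N} for i in range(3)]

singles = pr(A[0]) + pr(A[1]) + pr(A[2])                    # 1/6 + 1/6 + 1/6
pairs = pr(A[0] & A[1]) + pr(A[0] & A[2]) + pr(A[1] & A[2])  # 3 * 1/36
triple = pr(A[0] & A[1] & A[2])                              # 1/216

by_inclusion_exclusion = singles - pairs + triple
by_counting = pr(A[0] | A[1] | A[2])

print(by_inclusion_exclusion, by_counting)  # 91/216 91/216
```

Both routes give 91/216, which is about 0.421.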

So it looks like an easy game. There's a quick proof that it's 50-50. But it's horrible odds against the house. Now, this is a nice example because it shows how a rule you had for computing the cardinality of a set gives you the probability. All right.

In fact, all the set laws you learned a couple weeks ago work for probability spaces the same way. And there were several of those in the homework you just had on the last problem set. Any questions about that?

OK. Now in addition, all those set laws you did also work for conditional probabilities. For example, this is true. The probability of A union B given C-- whoops-- given C, is the probability of A given C plus the probability of B given C minus the intersection, A intersect B given C. In other words, take any probability rule you have and condition everything on an event, C, and it still works.
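
The conditional version of the rule can be spot-checked on a small uniform sample space. The sketch below uses two fair dice and three made-up events A, B, and C; the events and helper names are hypothetical, chosen only to have a nontrivial overlap.

```python
from fractions import Fraction
from itertools import product

# Uniform sample space: ordered rolls of two fair dice.
space = set(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(len(event), len(space))

def pr_given(event, condition):
    """Pr[event | condition] = Pr[event and condition] / Pr[condition]."""
    return pr(event & condition) / pr(condition)

# Hypothetical events, used only for illustration.
A = {w for w in space if w[0] % 2 == 0}     # first die is even
B = {w for w in space if w[0] + w[1] >= 8}  # sum is at least 8
C = {w for w in space if w[1] > 2}          # second die is above 2

lhs = pr_given(A | B, C)
rhs = pr_given(A, C) + pr_given(B, C) - pr_given(A & B, C)
print(lhs == rhs)  # True
```

A single example is not a proof, but it shows the rule holding with everything conditioned on the same event C.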

And the proof is not hard. You can go through each individual law but it all comes out to be fine. All right. You have to be a little careful though because you've got to remember which side you're working on, what you're putting on either side of the bar here. For example, what about this one? Is this true?

Claim. Let's take-- say C and D are disjoint. Is this true? Then the probability of A conditioned on C union D. So given that either C or D is true, does that equal the probability of A given C plus the probability of A given D?

We know that if I swapped all these, it's true. The probability of C union D given A, when C and D are disjoint, is the probability of C given A plus the probability of D given A. That I just claimed.

And what about this way? Can I swap things around? Yeah?

AUDIENCE: [INAUDIBLE] would C union D be 0?

PROFESSOR: If C and D are disjoint, C union D would just be C union D. But you're onto a good point. What if C and D are disjoint? That's a good example. Let's draw that. Let's look at that case.

So we've got a sample space here. And you've got C here and D here. And just for fun, let's make A be here-- include all of them. What's the probability-- is this going to do what I want? Yeah. What's the probability of A given C? 1. If I'm in C I'm in A. A is everything here. So the probability of A given C is one. What's the probability of A given D? 1.

All right. Well, this is a problem because I can't have the probability of-- what's the probability of A given C union D? Well, it can't be 2. Right? It's 1. They are not equal.
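
The counterexample from the board can be written out on a tiny sample space. The four-point space below is an illustrative assumption; all that matters is that C and D are disjoint and A contains both.

```python
from fractions import Fraction

# Illustrative uniform sample space with four points.
space = {1, 2, 3, 4}
C = {1, 2}
D = {3, 4}      # disjoint from C
A = set(space)  # A covers everything, as in the picture

def pr(event):
    return Fraction(len(event), len(space))

def pr_given(event, condition):
    return pr(event & condition) / pr(condition)

# Splitting a disjoint union on the RIGHT of the bar fails:
print(pr_given(A, C | D))               # 1
print(pr_given(A, C) + pr_given(A, D))  # 2

# Splitting a disjoint union on the LEFT of the bar works:
print(pr_given(C | D, A) == pr_given(C, A) + pr_given(D, A))  # True
```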

So you cannot do those set rules on the right side of the conditioning bar. You can do them on the left, not on the right. All right. So this is not true. Now nobody would do this. Right? I mean, the probability of-- not that it's-- see this example?

You just would never make this mistake again, seeing that example. Everybody understand the example, how it's clearly not always the case that the probability of A given C union D is the probability of A given C plus the probability of A given D?

Because now I'm going to show you an example where you're going to swear it's true. All right? And this is a real life example. Many years ago now there was a sex discrimination suit at Berkeley.

There was a female professor in the math department. And she was denied tenure. And she filed a lawsuit against Berkeley alleging sex discrimination. Said she wasn't tenured because she's a woman.

Now, unfortunately sex discrimination is a problem in math departments. It's historically been a difficult area. But it's always hard to prove. It's a nebulous kind of thing.

They don't say, hey, you can't have tenure because you're a woman. They'd get sued and get killed for that. So she had to get some math to back her up. So what she did is she looked into Berkeley's practices and she found that in all 22 departments, every single department, the percentage of male PhD applicants that were accepted was higher than the percentage of female PhD applicants that were accepted.

Now you could understand some of the departments accepting more male PhDs than female PhDs. But all 22? What are the odds of that? I mean, so the immediate conclusion is, well, clearly there's sex discrimination going on at Berkeley. OK?

Well, Berkeley took a look at that and said, not good. That doesn't look good for them. But they did their own study of PhD applicants. And they said that if you look at the university as a whole, actually, the women, the females, have a higher acceptance rate for the PhD program than the men.

So look. Berkeley said, we're accepting more women than men percentage-wise. So how could we be discriminating against women? And this is the same kind of argument the female faculty member's making, but they're making it for the university as a whole, when you add up all 22 departments. Well, that sounds pretty good. How could they be discriminating?

OK. So the question for you guys: is it possible that both sides were telling the truth, that in every single department the women have a lower acceptance rate than men, but in the university as a whole the women have a higher percentage? It sounds like it's-- and just to avoid any confusion here, people only apply to one department and they're only one sex. So you can't-- Carol didn't apply.

[LAUGHTER]

How many people think that one of the sides, actually, when they look at the studies, was wrong, that they're contradictory? Nobody? You've been in 6.042 too long. How many people think it's possible that both sides were right? Yeah. All right. So let's see how this works.

And to make it simple I'm going to get down to just two departments rather than try to do data for all 22. And I'm going to do not the actual data but something that represents what's going on. OK. So we're going to look at the following events. A is the event that the applicant is admitted. FCS is the event that the applicant is female and applying to CS.

FEE is the event that the applicant is female and applying to EE. MCS is the event the applicant is a male and CS. And then finally we have MEE is the event the applicant is male and in EE. So we're just going to look at two departments here and try to figure out if it can happen that in both departments the women are worse off but if you take the union they're better off.

So the female professor's argument effectively is, the probability of being admitted given that you're a female in CS is less than the probability of being admitted given that you're a male in CS. And same thing in EE. Probability of being admitted in EE if you're a female is less than if you're a male.

OK? Now Berkeley is saying it's sort of the reverse. The probability that you're admitted given that you're a female in either department is bigger than the probability of being admitted if you're a male in either department. OK. So we've now expressed their arguments as conditional probabilities. Any questions?

Can you sort of see why this seems contradictory? Not plus, union. Because this is sort of like-- these are disjoint. This is the sum of those. And this is sort of the sum of those. And yet the inequality changed.

All right. In fact, this is the logic that we've just debunked over there-- exactly that claim. In fact, these are not equal to the sum. So let's do an example. Say that-- let's do it over here. I'll put the real values in over here.

Say that for women in computer science, 0 out of 1 were admitted compared to the men, where 50 out of 100 were admitted. And then in EE, 70 out of 100 women were admitted compared to the men, which had 1 out of 1. All right? So as ratios, 70% is less than 100%. 0% is less than 50%.

Now if I look at the two departments as a whole, I get 70 over 101, which is in fact bigger than 51 over 101. All right? And so as a whole women are a lot more likely to be admitted even though in each department they're less likely to be admitted. OK?
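
The two-department example can be checked directly. The sketch below uses the illustrative admission counts from the board (not the actual Berkeley data); the dictionary layout and the function name are assumptions made for this example.

```python
from fractions import Fraction

# (admitted, applied) for each (sex, department), as given on the board.
admissions = {
    ("F", "CS"): (0, 1),
    ("M", "CS"): (50, 100),
    ("F", "EE"): (70, 100),
    ("M", "EE"): (1, 1),
}

def rate(sex, departments):
    """Pr[admitted | sex, applying to one of the given departments]."""
    admitted = sum(admissions[(sex, d)][0] for d in departments)
    applied = sum(admissions[(sex, d)][1] for d in departments)
    return Fraction(admitted, applied)

# Within each department, women have the lower admission rate.
for dept in ("CS", "EE"):
    print(dept, rate("F", [dept]) < rate("M", [dept]))  # True, True

# Over the union of the two departments, women have the higher rate.
print(rate("F", ["CS", "EE"]), rate("M", ["CS", "EE"]))  # 70/101 51/101
```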

So what went wrong with the intuition, which you didn't fall victim to, but people often do, that it shouldn't have been possible given that? What's going on here that makes it so that it's not a less than when you look at the union of the departments? Yeah?

AUDIENCE: [INAUDIBLE] they're weighted differently?

PROFESSOR: Yeah. They're weighted very differently. You've got huge weights here. Right? So if I look at the average of the percentages here, well it's 35% for the women versus 75% for the men. So the average of the percentages is just what you'd think. 35 is less than 75.
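
The role of the weights can be made explicit. In this sketch the overall admission rate is computed as a weighted average of the per-department rates, with each department weighted by its share of that group's applicants, and compared with the plain average of the percentages. The function names are illustrative, and the counts are the same illustrative ones from the board.

```python
from fractions import Fraction

def unweighted_average(groups):
    """Plain average of the per-department admission rates."""
    return sum(Fraction(adm, app) for adm, app in groups) / len(groups)

def weighted_average(groups):
    """Per-department rates weighted by each department's share of applicants."""
    total = sum(app for _, app in groups)
    return sum(Fraction(adm, app) * Fraction(app, total) for adm, app in groups)

# (admitted, applied) for CS and EE.
women = [(0, 1), (70, 100)]
men = [(50, 100), (1, 1)]

print(unweighted_average(women), unweighted_average(men))  # 7/20 3/4
print(weighted_average(women), weighted_average(men))      # 70/101 51/101
```

Nearly all of the women's weight sits on EE (rate 70%) and nearly all of the men's weight sits on CS (rate 50%), which is why the weighted comparison reverses.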

But I've got huge weightings on these guys, which changes the numbers quite dramatically. So it all depends how you count it. Actually, who do you think had a better-- Yeah. Go ahead.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Who won the lawsuit? Actually, the woman won the lawsuit. And which argument would you buy now? You've got two arguments. Which one would you believe if either? Which one? I mean, now if I look at exactly this data I might side-- I might side with Berkeley looking at these numbers.

Then again, when you think about all 22 departments and the fact they weren't this lopsided, not so good. So in the end Berkeley lost. We're going to see another example in a minute where it's even more clear which side to believe. But it really depends on the numbers as to which way you'd vote, if you had to vote.

Here's another example. This is from a newspaper article on which airlines are best to fly because they have the best on-time rates. And in this case they were comparing American Airlines and America West, looking at on-time rates. And here's the data they showed for the two airlines.

Here's American Airlines. Here's America West. And they took five cities, LA, Phoenix, San Diego, San Francisco, and Seattle. And then you looked at the number on time, the number of flights, and then the rate, percentage on time. And then same thing here. Number on time, number of flights, and the rate.

So I'm just going to give you the numbers here. So they had 500 out of 560 for a rate of 89%, 220 over 230 for 95%, 210 over 230 for 92%, 500 over 600 for 83%, and then Seattle. They had a lot of flights. That's where they have a hub-- 2,200 flights for 86%.

And if you added them all up, they got 3,300 out of 3,820 for 87% on time. Now the data for America West looks something like the following. In LA it's 700 out of 800 for 87%. They're based in Phoenix. They got a zillion flights there. 4,900 out of 5,300 for 92%. And 400 over 450 for 89%, 320 over 450 for 71%, 200 over 260 for 77%. And then you add all them up. And you've got 6,520 over 7,260 for 90%.

So the newspaper concluded and literally said that America West is the better airline to fly because their on-time rate is much better. It's 90% versus 87%. What do you think? Which airline would you fly looking at that data?

AUDIENCE: [INAUDIBLE]

PROFESSOR: I know which one I'd fly. It looks like America West is better. Every single city, American Airlines is better. 92 versus 89. Everywhere it's better by a bunch. 83 versus 71. 86 versus 77. Every single city, American Airlines is better. Yet, America West is better overall. And that's what the newspaper said. They went on this.

But of course, no matter where you're going you're better off with American Airlines. All right? Now what happened here? The weighting. In fact, America West flies out of Phoenix where the weather's great. So you get a higher on-time rate when in a good-weather city. And they got most of their flights there. American Airlines got a lot of flights in Seattle where the weather sucks and you're always delayed.

All right? And so they look worse on average because so many of their flights are in a bad city and so many of America West's are in a good city. All right? So it makes America West look better when in fact, in this case, it's absolutely clear who's better.
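
The airline table can be checked the same way. The sketch below uses the counts read out in the lecture. One value is an assumption: the Seattle on-time count for American Airlines was not stated, so it is inferred here from the stated total of 3,300 on-time flights. Percentages computed from these rounded counts can differ by a point from the figures quoted in lecture.

```python
# (on_time, flights) per city, as read out in the lecture.
america_west = {
    "LA": (700, 800),
    "Phoenix": (4900, 5300),
    "San Diego": (400, 450),
    "San Francisco": (320, 450),
    "Seattle": (200, 260),
}
american = {
    "LA": (500, 560),
    "Phoenix": (220, 230),
    "San Diego": (210, 230),
    "San Francisco": (500, 600),
}
# ASSUMPTION: Seattle's on-time count is not given in the lecture; it is
# inferred from the stated overall total of 3,300 on time out of 3,820.
seattle_on_time = 3300 - sum(on_time for on_time, _ in american.values())
american["Seattle"] = (seattle_on_time, 2200)

def overall_rate(airline):
    on_time = sum(o for o, _ in airline.values())
    flights = sum(f for _, f in airline.values())
    return on_time / flights

def city_rate(airline, city):
    on_time, flights = airline[city]
    return on_time / flights

# American Airlines has the better on-time rate in every single city...
for city in american:
    print(city, city_rate(american, city) > city_rate(america_west, city))

# ...yet America West has the better overall rate.
print(round(overall_rate(american), 3), round(overall_rate(america_west), 3))
```

Most of America West's flights are in Phoenix, its best city, while most of American's are in Seattle, so the overall rates are dominated by different cities.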

American Airlines is better, every single city. All right. That's why Mark Twain said, "There's three kinds of lies-- lies, damned lies, and statistics." We'll see more examples next time.