# Lecture 1: Probability Models and Axioms

Description: In this lecture, the professor discussed probability as a mathematical framework, probabilistic models, axioms of probability, and gave some simple examples.

Instructor: John Tsitsiklis

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, so welcome to 6.041/6.431, the class on probability models and the like. I'm John Tsitsiklis. I will be teaching this class, and I'm looking forward to this being an enjoyable and also useful experience. We have a fair amount of staff involved in this course, your recitation instructors and also a bunch of TAs, but I want to single out our head TA, Uzoma, who is the key person in this class. Everything has to go through him. If he doesn't know in which recitation section you are, then simply you do not exist, so keep that in mind.

All right. So we want to jump right into the subject, but I'm going to take just a few minutes to talk about a few administrative details and how the course is run. So we're going to have lectures twice a week and I'm going to use old fashioned transparencies. Now, you get copies of these slides with plenty of space for you to keep notes on them. A useful way of making good use of the slides is to use them as a sort of mnemonic summary of what happens in lecture. Not everything that I'm going to say is, of course, on the slides, but by looking at them you get the sense of what's happening right now. And it may be a good idea to review them before you go to recitation.

So what happens in recitation? In recitation, your recitation instructor is going to maybe review some of the theory and then solve some problems for you. And then you have tutorials where you meet in very small groups together with your TA. And what happens in tutorials is that you actually do the problem solving with the help of your TA and the help of your classmates in your tutorial section.

Now probability is a tricky subject. You may be reading the text, listening to lectures, everything makes perfect sense, and so on, but until you actually sit down and try to solve problems, you don't quite appreciate the subtleties and the difficulties that are involved. So problem solving is a key part of this class. And tutorials are extremely useful just for this reason because that's where you actually get the practice of solving problems on your own, as opposed to seeing someone else who's solving them for you.

OK but, mechanics, a key part of what's going to happen today is that you will turn in your schedule forms that are at the end of the handout that you have in your hands. Then, the TAs will be working frantically through the night, and they're going to be producing a list of who goes into what section. And when that happens, any person in this class, with probability 90%, is going to be happy with their assignment and, with probability 10%, they're going to be unhappy.

Now, unhappy people have an option, though. You can resubmit your form together with your full schedule and constraints, give it back to the head TA, who will then do some further juggling and reassign people, and after that happens, 90% of those unhappy people will become happy. And 10% of them will be less unhappy.

OK. So what's the probability that a random person is going to be unhappy at the end of this process? It's 1%. Excellent. Good. Maybe you don't need this class. OK, so 1%. We have about 100 people in this class, so there's going to be about one unhappy person. I mean, anywhere you look in life, in any group you look at, there's always one unhappy person, right? So, what can we do about it?
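The arithmetic behind the 1% answer can be sketched in a few lines of Python. This is only an illustration: the variable names are made up, and it assumes, as the story does, that a person unhappy after the first round stays unhappy after the second round with probability 10%.

```python
from fractions import Fraction

# Probability of being unhappy after the first assignment (from the lecture).
p_unhappy_first = Fraction(10, 100)
# Probability that an unhappy person is still unhappy after resubmitting.
p_still_unhappy = Fraction(10, 100)

# Unhappy at the end = unhappy after round one AND still unhappy after round two.
p_unhappy_final = p_unhappy_first * p_still_unhappy

class_size = 100  # "about 100 people", per the lecture
expected_unhappy = p_unhappy_final * class_size

print(p_unhappy_final)   # 1/100
print(expected_unhappy)  # 1
```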

All right. Another important part about mechanics is to read carefully the statement that we have about collaboration, academic honesty, and all that. You're encouraged, it's a very good idea to work with other students. You can consult sources that are out there, but when you sit down and write your solutions you have to do that by setting things aside and just write them on your own. You cannot copy something that somebody else has given to you.

One reason is that we're not going to like it when it happens, and then another reason is that you're not going to do yourself any favor. Really the only way to do well in this class is to get a lot of practice by solving problems yourselves. So if you don't do that on your own, then when quiz and exam time comes, things are going to be difficult.

So, as I mentioned here, we're going to have recitation sections, some of which are for 6.041 students and some for 6.431 students, the graduate section of the class. Now undergraduates can sit in the graduate recitation sections. What's going to happen there is that things may be just a little faster and you may be covering a problem that's a little more advanced and is not covered in the undergrad sections. But if you sit in the graduate section, and you're an undergraduate, you're still just responsible for the undergraduate material. That is, you can just do the undergraduate work in the class, but maybe be exposed to the different section.

OK. A few words about the style of this class. We want to focus on basic ideas and concepts. There's going to be lots of formulas, but what we try to do in this class is to actually have you understand what those formulas mean. And, in a year from now when almost all of the formulas have been wiped out from your memory, you still have the basic concepts. You can understand them, so when you look things up again, they will still make sense. It's not the plug and chug kind of class where you're given a list of formulas, you're given numbers, and you plug in and you get answers.

The really hard part is usually to choose which formulas you're going to use. You need judgment, you need intuition. Lots of probability problems, at least the interesting ones, often have lots of different solutions. Some are extremely long, some are extremely short. The extremely short ones usually involve some kind of deeper understanding of what's going on so that you can pick a shortcut and use it. And hopefully you are going to develop this skill during this class.

Now, I could spend a lot of time in this lecture talking about why the subject is important. I'll keep it short because I think it's almost obvious. Anything that happens in life is uncertain. There's uncertainty anywhere, so whatever you try to do, you need to have some way of dealing or thinking about this uncertainty. And the way to do that in a systematic way is by using the models that are given to us by probability theory.

So if you're an engineer and you're dealing with a communication system or signal processing, basically you're facing a fight against noise. Noise is random, is uncertain. How do you model it? How do you deal with it?

If you're a manager, I guess you're dealing with customer demand, which is, of course, random. Or you're dealing with the stock market, which is definitely random. Or you play the casino, which is, again, random, and so on. And the same goes for pretty much any other field that you can think of.

But, independent of which field you're coming from, the basic concepts and tools are really all the same. So you may see in bookstores that there are books, probability for scientists, probability for engineers, probability for social scientists, probability for astrologists. Well, what all those books have inside them is exactly the same models, the same equations, the same problems. They just make them somewhat different word problems.

The basic concepts are just one and the same, and we'll take this as an excuse for not going too much into specific domain applications. We will have problems and examples that are motivated, in some loose sense, from real world situations. But we're not really trying in this class to develop the skills for domain-specific problems. Rather, we're going to try to stick to general understanding of the subject.

OK. So the next slide, which you do have in your handout, gives you a few more details about the class. Maybe one thing to comment here is that you do need to read the text. And with calculus books, perhaps you can live with just a two page summary of all of the interesting formulas in calculus, and you can get by just with those formulas. But here, because we want to develop concepts and intuition, actually reading words, as opposed to just browsing through equations, does make a difference.

In the beginning, the class is kind of easy. When we deal with discrete probability, that's the material until our first quiz, and some of you may get by without being too systematic about following the material. But it does get substantially harder afterwards. And I would keep restating that you do have to read the text to really understand the material.

OK. So now we can start with the real part of the lecture. Let us set the goals for today. So probability, or probability theory, is a framework for dealing with uncertainty, for dealing with situations in which we have some kind of randomness. So what we want to do, by the end of today's lecture, is to give you everything that you need to know about what it takes to set up a probabilistic model. And what are the basic rules of the game for dealing with probabilistic models? So, by the end of this lecture, you will have essentially recovered half of this semester's tuition, right?

So we're going to talk about probabilistic models in more detail-- the sample space, which is basically a description of all the things that may happen during a random experiment, and the probability law, which describes our beliefs about which outcomes are more likely to occur compared to other outcomes. Probability laws have to obey certain properties that we call the axioms of probability. So the main part of today's lecture is to describe those axioms, which are the rules of the game, and consider a few really trivial examples.

OK, so let's start with our agenda. The first piece in a probabilistic model is a description of the sample space of an experiment. So we do an experiment, and by experiment we just mean that something happens out there. And that something that happens, it could be flipping a coin, or it could be rolling a die, or it could be doing something in a card game.

So we fix a particular experiment. And we come up with a list of all the possible things that may happen during this experiment. So we write down a list of all the possible outcomes. So here's a list of all the possible outcomes of the experiment. I use the word "list," but, if you want to be a little more formal, it's better to think of that list as a set.

So we have a set. That set is our sample space. And it's a set whose elements are the possible outcomes of the experiment. So, for example, if you're dealing with flipping a coin, your sample space would be heads, this is one outcome, tails is one outcome. And this set, which has two elements, is the sample space of the experiment.

OK. What do we need to think about when we're setting up the sample space? First, the list should be mutually exclusive, collectively exhaustive. What does that mean?

Collectively exhaustive means that, no matter what happens in the experiment, you're going to get one of the outcomes inside here. So you have not forgotten any of the possibilities of what may happen in the experiment. Mutually exclusive means that if this happens, then that cannot happen. So at the end of the experiment, you should be able to point out to me just one, exactly one, of these outcomes and say, this is the outcome that happened.

OK. So these are sort of basic requirements. There's another requirement which is a little more loose. When you set up your sample space, sometimes you do have some freedom about the details of how you're going to describe it. And the question is, how much detail are you going to include?

So let's take this coin flipping experiment and think of the following sample space. One possible outcome is heads, a second possible outcome is tails and it's raining, and the third possible outcome is tails and it's not raining. So this is another possible sample space for the experiment where I flip a coin just once. It's a legitimate one. These three possibilities are mutually exclusive and collectively exhaustive.

Which one is the right sample space? Is it this one or that one? Well, if you think that my coin flipping inside this room is completely unrelated to the weather outside, then you're going to stick with this sample space. If, on the other hand, you have some superstitious belief that maybe rain has an effect on my coins, you might work with the sample space of this kind. So you probably wouldn't do that, but it's a legitimate option, strictly speaking.

Now this example is a little bit on the frivolous side, but the issue that comes up here is a basic one that shows up anywhere in science and engineering. Whenever you're dealing with a model or with a situation, there are zillions of details in that situation. And when you come up with a model, you choose some of those details that you keep in your model, and some that you say, well, these are irrelevant. Or maybe there are small effects, I can neglect them, and you keep them outside your model. So when you go to the real world, there's definitely an element of art and some judgment that you need to do in order to set up an appropriate sample space.

So, an easy example now. So of course, the elementary examples are coins, cards, and dice. So let's deal with dice. But to keep the diagram small, instead of a six-sided die, we're going to think about the die that only has four faces. So you can do that with a tetrahedron, doesn't really matter. Basically, it's a die that when you roll it, you get a result which is one, two, three or four.

However, the experiment that I'm going to think about will consist of two rolls of a die. A crucial point here-- I'm rolling the die twice, but I'm thinking of this as just one experiment, not two different experiments, not a repetition twice of the same experiment. So it's one big experiment. During that big experiment various things could happen, such as I'm rolling the die once, and then I'm rolling the die a second time.

OK. So what's the sample space for that experiment? Well, the sample space consists of the possible outcomes. One possible outcome is that your first roll resulted in two and the second roll resulted in three. In which case, the outcome that you get is this one, a two followed by three. This is one possible outcome.

The way I'm describing things, this outcome is to be distinguished from this outcome here, where a three is followed by two. If you're playing backgammon, it doesn't matter which one of the two happened. But if you're dealing with a probabilistic model that you want to keep track of everything that happens in this composite experiment, there are good reasons for distinguishing between these two outcomes. I mean, when this happens, it's definitely something different from that happening. A two followed by a three is different from a three followed by a two.

So this is the correct sample space for this experiment where we roll the die twice. It has a total of 16 elements and it's, of course, a finite set.
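As a rough sketch, this sample space can be enumerated in Python. The names `faces` and `sample_space` are illustrative choices, and each outcome is represented as an ordered pair (first roll, second roll).

```python
from itertools import product

faces = [1, 2, 3, 4]  # possible results of one roll of the four-sided die

# Each outcome of the overall experiment is an ordered pair (first roll, second roll).
sample_space = set(product(faces, repeat=2))

print(len(sample_space))                               # 16
print((2, 3) in sample_space, (3, 2) in sample_space)  # True True
print((2, 3) == (3, 2))                                # False: order matters
```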

Sometimes, instead of describing sample spaces in terms of lists, or sets, or diagrams of this kind, it's useful to describe the experiment in some sequential way. Whenever you have an experiment that consists of multiple stages, it might be useful, at least visually, to give a diagram that shows you how those stages evolve. And that's what we do by using a sequential description or a tree-based description by drawing a tree of the possible evolutions during our experiment.

So in this tree, I'm thinking of a first stage in which I roll the first die, and there are four possible results, one, two, three, and four. And, given what happened, let's say in the first roll, suppose I got a one. Then I'm rolling the second die, and there are four possibilities for what may happen to the second die. And the possible results are one, two, three, and four again.

So what's the relation between the two diagrams? Well, for example, the outcome two followed by three corresponds to this path on the tree. So this path corresponds to two followed by a three. Any path is associated to a particular outcome, any outcome is associated to a particular path.

And, instead of paths, you may want to think in terms of the leaves of this diagram. Same thing, think of each one of the leaves as being one possible outcome. And of course we have 16 outcomes here, we have 16 outcomes here.
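One way to picture the tree in code, as a minimal sketch: an outer loop for the first stage and an inner loop for the second stage, with every completed path recorded as a leaf. The list name `leaves` is just an illustrative choice.

```python
faces = [1, 2, 3, 4]

# Walk the tree: the first stage branches on the first roll,
# and each branch then splits again on the second roll.
leaves = []
for first in faces:        # result of the first stage
    for second in faces:   # result of the second stage
        leaves.append((first, second))  # one leaf = one path = one outcome

print(len(leaves))       # 16
print(len(set(leaves)))  # 16, so no two paths lead to the same outcome
```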

Maybe you noticed the subtlety that I used in my language. I said I rolled the first die and the result that I get is a two. I didn't use the word "outcome." I want to reserve the word "outcome" to mean the overall outcome at the end of the overall experiment.

So "2, 3" is the outcome of the experiment. The experiment consisted of stages. Two was the result in the first stage, three was the result in the second stage. You put all those results together, and you get your outcome. OK, perhaps we are splitting hairs here, but it's useful to keep the concepts right.

What's special about this example is that, besides being trivial, it has a sample space which is finite. There's 16 possible total outcomes. Not every experiment has a finite sample space.

Here's an experiment in which the sample space is infinite. So you are playing darts and the target is this square. And you're perfect at that game, so you're sure that your darts will always fall inside the square. So, but where exactly your dart would fall inside that square, that itself is random. We don't know what it's going to be. It's uncertain.

So all the possible points inside the square are possible outcomes of the experiment. So a typical outcome of the experiment is going to be a pair of numbers, x,y, where x and y are real numbers between zero and one. Now there's infinitely many real numbers, there's infinitely many points in the square, so this is an example in which our sample space is an infinite set. OK, so we're going to revisit this example a little later.

So these are two examples of what the sample space might be in simple experiments. Now, the more important order of business is now to look at those possible outcomes and to make some statements about their relative likelihoods. Which outcome is more likely to occur compared to the others? And the way we do this is by assigning probabilities to the outcomes.

Well, not exactly. Suppose that all you were to do was to assign probabilities to individual outcomes. If you go back to this example, and you consider one particular outcome-- let's say this point-- what would be the probability that you hit exactly this point to infinite precision? Intuitively, that probability would be zero. So any individual point in this diagram in any reasonable model should have zero probability. So if you just tell me that any individual outcome has zero probability, you're not really telling me much to work with.

For that reason, what instead we're going to do is to assign probabilities to subsets of the sample space, as opposed to assigning probabilities to individual outcomes. So here's the picture. We have our sample space, which is omega, and we consider some subset of the sample space. Call it A. And I want to assign a number, a numerical probability, to this particular subset which represents my belief about how likely this set is to occur.

OK. What do we mean by "to occur"? And I'm introducing here a language that's being used in probability theory. When we talk about subsets of the sample space, we usually call them events, as opposed to subsets. And the reason is because it works nicely with the language that describes what's going on.

So the outcome is a point. The outcome is random. The outcome may be inside this set, in which case we say that event A occurred, if we get an outcome inside here. Or the outcome may fall outside the set, in which case we say that event A did not occur.

So we're going to assign probabilities to events. And now, how should we do this assignment? Well, probabilities are meant to describe your beliefs about which sets are more likely to occur versus other sets. So there's many ways that you can assign those probabilities. But there are some ground rules for this game.

First, we want probabilities to be numbers between zero and one because that's the usual convention. So a probability of zero means we're certain that something is not going to happen. Probability of one means that we're essentially certain that something's going to happen. So we want numbers between zero and one.

We also want a few other things. And those few other things are going to be encapsulated in a set of axioms. What "axioms" means in this context, it's the ground rules that any legitimate probabilistic model should obey. You have a choice of what kind of probabilities you use. But, no matter what you use, they should obey certain consistency properties because if they obey those properties, then you can go ahead and do useful calculations and do some useful reasoning.

So what are these properties? First, probabilities should be non-negative. OK? That's our convention. We want probabilities to be numbers between zero and one. So they should certainly be non-negative. The probability that event A occurs should be a non-negative number.

What's the second axiom? The probability of the entire sample space is equal to one. Why does this make sense? Well, the outcome is certain to be an element of the sample space because we set up a sample space, which is collectively exhaustive. No matter what the outcome is, it's going to be an element of the sample space. We're certain that event omega is going to occur. Therefore, we represent this certainty by saying that the probability of omega is equal to one.

Pretty straightforward so far. The more interesting axiom is the third rule. Before getting into it, just a quick reminder.

If you have two sets, A and B, the intersection of A and B consists of those elements that belong both to A and B. And we denote it this way. When you think probabilistically, the way to think of intersection is by using the word "and." This event, this intersection, is the event that A occurred and B occurred. If I get an outcome inside here, A has occurred and B has occurred at the same time. So you may find the word "and" to be a little more convenient than the word "intersection."

And similarly, we have some notation for the union of two events, which we write this way. The union of two sets, or two events, is the collection of all the elements that belong either to the first set, or to the second, or to both. When you talk about events, you can use the word "or." So this is the event that A occurred or B occurred. And this "or" means that it could also be that both of them occurred.

OK. So now that we have this notation, what does the third axiom say? The third axiom says that if we have two events, A and B, that have no common elements-- so here's A, here's B, and perhaps this is our big sample space. The two events have no common elements. So the intersection of the two events is the empty set. There's nothing in their intersection. Then, the total probability of A together with B has to be equal to the sum of the individual probabilities. So the probability that A occurs or B occurs is equal to the probability that A occurs plus the probability that B occurs.

So think of probability as being cream cheese. You have one pound of cream cheese, the total probability assigned to the entire sample space. And that cream cheese is spread out over this set. The probability of A is how much cream cheese sits on top of A. Probability of B is how much sits on top of B. The probability of A union B is the total amount of cream cheese sitting on top of this and that, which is obviously the sum of how much is sitting here and how much is sitting there.

So probabilities behave like cream cheese, or they behave like mass. For example, if you think of some material object, the mass of this set consisting of two pieces is obviously the sum of the two masses. So this property is a very intuitive one. It's a pretty natural one to have.
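Written out in symbols, the three axioms described so far look as follows. The notation here (P for the probability law, Ω for the sample space) is the usual one and is assumed to match what is on the slides.

```latex
\begin{align*}
&\text{(Nonnegativity)} && P(A) \ge 0 \\
&\text{(Normalization)} && P(\Omega) = 1 \\
&\text{(Additivity)}    && \text{if } A \cap B = \emptyset, \text{ then } P(A \cup B) = P(A) + P(B)
\end{align*}
```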

OK. Are these axioms enough for what we want to do? I mentioned a while ago that we want probabilities to be numbers between zero and one. Here's an axiom that tells you that probabilities are non-negative. Should we have another axiom that tells us that probabilities are less than or equal to one? It's a desirable property. We would like to have it in our hands.

OK, why is it not in that list? Well, the people who are in the axiom making business are mathematicians and mathematicians tend to be pretty laconic. You don't say something if you don't have to say it. And this is the case here. We don't need that extra axiom because we can derive it from the existing axioms.

Here's how it goes. One is the probability of the entire sample space. Here we're using the second axiom. Now the sample space consists of A together with the complement of A. OK? When I write the complement of A, I mean the complement of A inside of the set omega. So we have omega, here's A, here's the complement of A, and the overall set is omega.

OK. Now, what's the next step? What should I do next? Which axiom should I use? We use axiom three because a set and the complement of that set are disjoint. They don't have any common elements. So axiom three applies and tells me that this is the probability of A plus the probability of A complement. In particular, the probability of A is equal to one minus the probability of A complement, and this is less than or equal to one.

Why? Because probabilities are non-negative, by the first axiom.

OK. So we got the conclusion that we wanted. Probabilities are always less than or equal to one, and this is a simple consequence of the three axioms that we have. This is a really nice argument because it actually uses each one of those axioms. The argument is simple, but you have to use all of these three properties to get the conclusion that you want.
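For reference, the argument can be laid out as one chain of steps. This is a sketch in standard notation, writing A^c for the complement of A inside Ω.

```latex
\begin{align*}
1 &= P(\Omega) && \text{(axiom 2)} \\
  &= P(A \cup A^c) && \text{(since } \Omega = A \cup A^c\text{)} \\
  &= P(A) + P(A^c) && \text{(axiom 3, since } A \cap A^c = \emptyset\text{)} \\
\Rightarrow \quad P(A) &= 1 - P(A^c) \le 1 && \text{(axiom 1, since } P(A^c) \ge 0\text{)}
\end{align*}
```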

OK. So we can get interesting things out of our axioms. Can we get some more interesting ones? How about the union of three sets? What kind of probability should it have?

So here's an event consisting of three pieces. And I want to say something about the probability of A union B union C. What I would like to say is that this probability is equal to the sum of the three individual probabilities. How can I do it?

I have an axiom that tells me that I can do it for two events. I don't have an axiom for three events. Well, maybe I can manage things and still be able to use that axiom. And here's the trick. The union of three sets, you can think of it as forming the union of the first two sets and then taking the union with the third set. OK? So taking unions, you can take the unions in any order that you want.

So here we have the union of two sets. Now, A, B, and C are disjoint, by assumption or that's how I drew it. So if A, B, and C are disjoint, then A union B is disjoint from C. So here we have the union of two disjoint sets. So by the additivity axiom, the probability of the union is going to be the probability of the first set plus the probability of the second set.

And now I can use the additivity axiom once more to write that this is probability of A plus probability of B plus probability of C. So by using this axiom which was stated for two sets, we can actually derive a similar property for the union of three disjoint sets. And then you can repeat this argument as many times as you want. It's valid for the union of ten disjoint sets, for the union of a hundred disjoint sets, for the union of any finite number of sets. So if A1 up to An are disjoint, then the probability of the union of A1 up to An is equal to the sum of the probabilities of the individual sets.
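In symbols, the two applications of the additivity axiom, and the general statement they lead to, can be sketched like this (A, B, C and A_1, ..., A_n are assumed pairwise disjoint).

```latex
\begin{align*}
P(A \cup B \cup C) &= P\big((A \cup B) \cup C\big) \\
&= P(A \cup B) + P(C) && \text{(additivity, } A \cup B \text{ and } C \text{ disjoint)} \\
&= P(A) + P(B) + P(C) && \text{(additivity, } A \text{ and } B \text{ disjoint)} \\[4pt]
P(A_1 \cup \cdots \cup A_n) &= \sum_{i=1}^{n} P(A_i)
\end{align*}
```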

OK. Special case of this is when we're dealing with finite sets. Suppose I have just a finite set of outcomes. I put them together in a set and I'm interested in the probability of that set. So here's our sample space. There's lots of outcomes, but I'm taking a few of these and I form a set out of them.

This is a set consisting of, in this picture, three elements. In general, it consists of k elements. Now, a finite set, I can write it as a union of single element sets. So this set here is the union of this one element set, together with this one element set together with that one element set. So the total probability of this set is going to be the sum of the probabilities of the one element sets.

Now, probability of a one element set, you need to use the brackets here because probabilities are assigned to sets. But this gets kind of tedious, so here one abuses notation a little bit and we get rid of those brackets and just write probability of this single, individual outcome. In any case, conclusion from this exercise is that the total probability of a finite collection of possible outcomes, the total probability is equal to the sum of the probabilities of individual elements.
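Here is a small Python sketch of that conclusion. The probability law below is entirely made up for illustration (four outcomes labeled "a" to "d" with unequal probabilities), and `prob` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical probability law on a four-outcome sample space (made-up numbers).
law = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}

def prob(event, law):
    """Probability of a finite event: the sum of the probabilities of its outcomes."""
    return sum((law[outcome] for outcome in event), Fraction(0))

print(prob({"a", "c"}, law))  # 5/8
print(prob(set(law), law))    # 1, the probability of the whole sample space
```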

So these are basically the axioms of probability theory. Or, well, they're almost the axioms. There are some subtleties that are involved here.

One subtlety is that this axiom here doesn't quite do the job for everything we would like to do. And we're going to come back to this at the end of the lecture. A second subtlety has to do with weird sets.

We said that an event is a subset of the sample space and we assign probabilities to events. Does this mean that we are going to assign probability to every possible subset of the sample space? Ideally, we would wish to do that. Unfortunately, this is not always possible.

If you take a sample space, such as the square, the square has nice subsets, those that you can describe by cutting it with lines and so on. But it does have some very ugly subsets, as well, that are impossible to visualize, impossible to imagine, but they do exist. And those very weird sets are such that there's no way to assign probabilities to them in a way that's consistent with the axioms of probability.

OK. So this is a very, very fine point that you can immediately forget for the rest of this class. You will only encounter these sets if you end up doing doctoral work on the theoretical aspects of probability theory. So it's just a mathematical subtlety that some very weird sets do not have probabilities assigned to them. But we're not going to encounter these sets and they do not show up in any applications.

OK. So now let's revisit our examples. Let's go back to the die example. We have our sample space. Now we need to assign a probability law. There's lots of possible probability laws that you can assign. I'm picking one here, arbitrarily, in which I say that every possible outcome has the same probability of 1/16.

OK. Why do I make this model? Well, empirically, if you have well-manufactured dice, they tend to behave that way. We will be coming back to this kind of story later in this class. But I'm not saying that this is the only probability law that there can be. You might have weird dice in which certain outcomes are more likely than others. But to keep things simple, let's take every outcome to have the same probability of 1/16.

OK. Now that we have in our hands a sample space and the probability law, we can actually solve any problem there is. We can answer any question that could be posed to us. For example, what's the probability that the outcome, which is this pair, is either 1,1 or 1,2? We're talking here about this particular event, 1,1 or 1,2. So it's an event consisting of these two items.

According to what we were just discussing, the probability of a finite collection of outcomes is the sum of their individual probabilities. Each one of them has probability of 1/16, so the probability of this is 2/16.
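The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming two rolls of a four-sided die (which is what a probability of 1/16 per outcome implies) and the uniform law chosen in the lecture. The names `sample_space`, `law`, and `prob` are made up for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (x, y) from two rolls of a four-sided die.
sample_space = list(product(range(1, 5), repeat=2))

# Assumed probability law: each of the 16 outcomes gets probability 1/16.
law = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}

def prob(event):
    """Probability of a finite event: the sum of its outcomes' probabilities."""
    return sum((law[outcome] for outcome in event), Fraction(0))

event = {(1, 1), (1, 2)}
print(prob(event))  # 1/8, which is 2/16 in lowest terms
```

Exact fractions are used instead of floats so that the result can be compared directly with the 2/16 obtained by hand.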

How about the probability of the event that x is equal to one. x is the first roll, so that's the probability that the first roll is equal to one. Notice the syntax that's being used here. Probabilities are assigned to subsets, to sets, so we think of this as meaning the set of all outcomes such that x is equal to one.

How do you answer this question? You go back to the picture and you try to visualize or identify this event of interest. x is equal to one corresponds to this event here. These are all the outcomes at which x is equal to one. There's four outcomes. Each one has probability 1/16, so the answer is 4/16.

OK. How about the probability that x plus y is odd? OK. That will take a little bit more work.

But you go to the sample space and you identify all the outcomes at which the sum is an odd number. So that's a place where the sum is odd, these are other places, and I guess that exhausts all the possible outcomes at which we have an odd sum. We count them. How many are there? There's a total of eight of them. Each one has probability 1/16, total probability is 8/16.
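The same "identify the event, then count" procedure can be checked by enumeration. The sketch below assumes the 4-by-4 sample space with equally likely outcomes. The helper `prob` and the idea of describing an event by a predicate on (x, y) are illustrative choices, not something the lecture prescribes.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes (x, y) of two rolls of a four-sided die.
sample_space = list(product(range(1, 5), repeat=2))

def prob(predicate):
    """P(event) under the uniform law, where the event is the set of
    outcomes (x, y) for which predicate(x, y) is true."""
    favorable = [w for w in sample_space if predicate(*w)]
    return Fraction(len(favorable), len(sample_space))

p_first_is_one = prob(lambda x, y: x == 1)        # event {X = 1}
p_sum_is_odd = prob(lambda x, y: (x + y) % 2 == 1)  # event {X + Y is odd}

print(p_first_is_one)  # 1/4, i.e. 4/16
print(p_sum_is_odd)    # 1/2, i.e. 8/16
```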

And harder question. What is the probability that the minimum of the two rolls is equal to 2? This is something that you probably couldn't do in your head without the help of a diagram. But once you have a diagram, things are simple.

You ask the question. OK, this is an event, that the minimum of the two rolls is equal to two. This can happen in several ways. What are the several ways that it can happen? Go to the diagram and try to identify them.

So the minimum is equal to two if both of them are two's. Or it could be that x is two and y is bigger, or y is two and x is bigger. OK. I guess we rediscover that yellow and blue make green, so we see here that there's a total of five possible outcomes. The probability of this event is 5/16.
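For readers without the diagram, the five outcomes can be listed explicitly. This is a small sketch under the same assumptions as the lecture's model (two rolls of a four-sided die, all 16 outcomes equally likely). The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes (x, y).
sample_space = list(product(range(1, 5), repeat=2))

# Event: the smaller of the two rolls equals 2.
event = [(x, y) for (x, y) in sample_space if min(x, y) == 2]

print(event)  # [(2, 2), (2, 3), (2, 4), (3, 2), (4, 2)]
print(Fraction(len(event), len(sample_space)))  # 5/16
```

The outcome (2, 2) appears only once in the list, which is the overlap of the two colored regions on the slide.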

Simple example, but the procedure that we followed in this example actually applies to any probability model you might ever encounter. You set up your sample space, you make a statement that describes the probability law over that sample space, then somebody asks you questions about various events. You go to your pictures, identify those events, pin them down, and then start kind of counting and calculating the total probability for those outcomes that you're considering.

This example is a special case of what is called the discrete uniform law. The model obeys the discrete uniform law if all outcomes are equally likely. It doesn't have to be that way. That's just one example of a probability law.

But when things are that way, if all outcomes are equally likely and we have N of them, and you have a set A that has little n elements, then each one of those elements has probability one over capital N since all outcomes are equally likely. And for our probabilities to add up to one, each one must have this much probability, and there's little n elements. That gives you the probability of the event of interest.
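Written out as a formula, the argument above reads as follows. The symbols N (number of outcomes in the sample space) and n (number of elements of A) are the ones used in the lecture. The underbrace is added here only to show where the finite additivity property is applied.

```latex
\mathbf{P}(A)
  = \underbrace{\frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N}}_{n \text{ terms}}
  = \frac{n}{N}
```

In the die example, N = 16, and the event that the minimum equals 2 has n = 5, which gives 5/16.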

So problems like the one in the previous slide and more generally of the type described here under discrete uniform law, these problems reduce to just counting. How many elements are there in my sample space? How many elements are there inside the event of interest? Counting is generally simple, but for some problems it gets pretty complicated. And in a couple of weeks, we're going to have to spend the whole lecture just on the subject of how to count systematically.

Now the procedure we followed in the previous example is the same as the procedure you would follow in continuous probability problems. So, going back to our dart problem, we get the random point inside the square. That's our sample space. We need to assign a probability law. For lack of imagination, I'm taking the probability law to be the area of a subset.

So if we have two subsets of the sample space that have equal areas, then I'm postulating that they are equally likely to occur. The probability that they fall here is the same as the probability that they fall there. The model doesn't have to be that way. But if I have sort of complete ignorance of which points are more likely than others, that might be the reasonable model to use.

So equal areas mean equal probabilities. If the area is twice as large, the probability is going to be twice as big. So this is our model.

We can now answer questions. Let's answer the easy one. What's the probability that the outcome is exactly this point? That of course is zero because a single point has zero area. And since this probability is equal to area, that's zero probability.

How about the probability that the sum of the coordinates of the point that we got is less than or equal to 1/2? How do you deal with it? Well, you look at the picture again, at your sample space, and try to describe the event that you're talking about. The sum being less than or equal to 1/2 corresponds to getting an outcome that's on or below this line, where this line is the line where x plus y equals 1/2. So the intercepts of that line with the axes are 1/2 and 1/2.

So you describe the event visually and then you use your probability law. The probability law that we have is that the probability of a set is equal to the area of that set. So all we need to find is the area of this triangle, which is one half of the base times the height, that is, 1/2 times 1/2 times 1/2, which equals 1/8.
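The answer of 1/8 can be sanity-checked by simulation. The sketch below is a Monte Carlo estimate, not the exact area calculation from the lecture. It assumes the sample space is the unit square with the uniform (area) law, and the function name, sample count, and seed are arbitrary choices made for this illustration.

```python
import random

def estimate_prob_sum_at_most_half(num_samples=200_000, seed=0):
    """Estimate P(x + y <= 1/2) for a point drawn uniformly from the unit square."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x + y <= 0.5:
            hits += 1
    # Fraction of points that landed in the triangle approximates its area.
    return hits / num_samples

estimate = estimate_prob_sum_at_most_half()
print(estimate)  # should be close to the exact area 1/8 = 0.125
```

With this many samples the estimate typically agrees with 0.125 to two or three decimal places; the exact value still comes from the area of the triangle.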

OK. Moral from these two examples is that it's always useful to have a picture and work with a picture to visualize the events that you're talking about. And once you have a probability law in your hands, then it's a matter of calculation to find the probabilities of an event of interest. The calculations we did in these two examples, of course, were very simple.

Sometimes calculations may be a lot harder, but it's a different business. It's a business of calculus, for example, or being good in algebra and so on. As far as probability is concerned, it's clear what you will be doing, and then maybe you're faced with a harder algebraic part to actually carry out the calculations. The area of a triangle is easy to compute. If I had put down a very complicated shape, then you might need to solve a hard integration problem to find the area of that shape, but that's stuff that belongs to another class that you have presumably mastered by now.

Good, OK. So now let me spend just a couple of minutes to return to a point that I raised before. I was saying that the axiom that we had about additivity might not quite be enough. Let's illustrate what I mean by the following example.

Think of the experiment where you keep flipping a coin and you wait until you obtain heads for the first time. What's the sample space of this experiment? It might happen on the first flip, it might happen on the tenth flip. Heads for the first time might occur on the millionth flip.

So the outcome of this experiment is going to be an integer and there's no bound to that integer. You might have to wait a very long time until that happens. So the natural sample space is the set of all positive integers.

Somebody tells you some information about the probability law. The probability that you have to wait for n flips is equal to two to the minus n. Where did this come from? That's a separate story. Where did it come from? Somebody tells this to us, and those probabilities are plotted here as a function of n.

And you're asked to find the probability that the outcome is an even number. How do you go about calculating that probability? So the probability of being an even number is the probability of the subset that consists of just the even numbers. So it would be a subset of this kind, that includes two, four, and so on.

So any reasonable person would say, well the probability of obtaining an outcome that's either two or four or six and so on is equal to the probability of obtaining a two, plus the probability of obtaining a four, plus the probability of obtaining a six, and so on. These probabilities are given to us. So here I have to do my algebra. I add this geometric series and I get an answer of 1/3. That's what any reasonable person would do.
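The geometric series mentioned above has first term 2^(-2) = 1/4 and ratio 1/4, so its sum is (1/4) / (1 - 1/4) = 1/3. The following sketch computes exact partial sums to show them approaching 1/3. The function name `partial_sum_even` is hypothetical, and the law P(n) = 2^(-n) is taken as given, as in the lecture.

```python
from fractions import Fraction

def partial_sum_even(num_terms):
    """Sum of P(2) + P(4) + ... + P(2 * num_terms), where P(n) = 2**(-n)."""
    return sum(
        (Fraction(1, 2 ** (2 * k)) for k in range(1, num_terms + 1)),
        Fraction(0),
    )

for num_terms in (1, 2, 5, 10):
    s = partial_sum_even(num_terms)
    # The gap to 1/3 shrinks by a factor of 4 with each extra term.
    print(num_terms, s, Fraction(1, 3) - s)
```

The partial sums never reach 1/3 exactly. The value 1/3 is the limit of the sums, which is why adding infinitely many probabilities needs a justification beyond finite additivity.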

But the person who only knows the axioms that we posted just a little earlier may get stuck. They would get stuck at this point. How do we justify this?

We had this property for the union of disjoint sets and the corresponding property that tells us that the total probability of finitely many things, outcomes, is the sum of their individual probabilities. But here we're using it on an infinite collection. The probability of infinitely many points is equal to the sum of the probabilities of each one of these. To justify this step we need to introduce one additional rule, an additional axiom, that tells us that this step is actually legitimate.

And this is the countable additivity axiom, which is a little stronger, or quite a bit stronger, than the additivity axiom we had before. It tells us that if we have a sequence of sets that are disjoint and we want to find their total probability, then we are allowed to add their individual probabilities. So the picture might be such as follows.

We have a sequence of sets, A1, A2, A3, and so on. I guess in order to fit them inside the sample space, the sets need to get smaller and smaller perhaps. They are disjoint. We have a sequence of such sets. The total probability of falling anywhere inside one of those sets is the sum of their individual probabilities.

A key subtlety that's involved here is that we're talking about a sequence of events. By "sequence" we mean that these events can be arranged in order. I can tell you the first event, the second event, the third event, and so on. So if you have such a collection of events that can be ordered as first, second, third, and so on, then you can add their probabilities to find the probability of their union.
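In symbols, the countable additivity axiom can be written as below. The notation (bold P for the probability law, A_1, A_2, ... for the sequence of disjoint events) is a common convention assumed here. The second line applies it to the coin-flipping example by taking A_k to be the single outcome {2k}.

```latex
% Countable additivity: A_1, A_2, \ldots is a sequence of disjoint events.
\mathbf{P}(A_1 \cup A_2 \cup A_3 \cup \cdots)
  = \mathbf{P}(A_1) + \mathbf{P}(A_2) + \mathbf{P}(A_3) + \cdots

% Coin example with A_k = \{2k\} and \mathbf{P}(\{n\}) = 2^{-n}:
\mathbf{P}(\{2, 4, 6, \ldots\})
  = \sum_{k=1}^{\infty} 2^{-2k}
  = \frac{1/4}{1 - 1/4}
  = \frac{1}{3}
```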

So this point is actually a little more subtle than you might appreciate at this point, and I'm going to return to it at the beginning of the next lecture. For now, enjoy the first week of classes and have a good weekend. Thank you.
