Description: In this lecture, we learn about time-averages for renewal rewards, stopping trials for stochastic processes, and Wald's equality.
Instructor: Prof. Robert Gallager
Lecture 12: Renewal Rewards...
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
PROFESSOR: OK, today we are moving on with renewal processes, and we're going to finish up talking a little more about residual life, and we're going to generalize it to time averages for arbitrary renewal processes. And then finally we're going into the topic of stopping trials, stopping times, optional stopping times-- people call them different things-- and Wald's equality. This is, for some reason or other, normally a very tricky thing to understand.
I think finally after many years I understand it. And I hope that I can make it easy for you to understand it, so some of the confusions that often come in won't come in. And then we're going to end up talking about stopping when you're ahead, which is a gambling strategy which says if you're playing a fair game you play until you're $1.00 ahead, and then you stop. And you show that with probability 1 you will eventually become $1.00 ahead, so you have a way to beat the bank, or beat the casino. But of course casinos don't give you even odds, but you have a way to win any time you get even odds. We'll find that there's something wrong with that, but it is an interesting topic which illustrates Wald's equality in an interesting way.
So let's start out by reviewing a little bit. We first talked last time about convergence with probability 1, and we weren't talking about the strong law then, we were just talking about a sequence of random variables and what it means for them to converge with probability 1. And the theorem is that if you have a sequence of random variables, z sub n, they converge to some number alpha. This is slightly different than the theorem we stated last time, but it's trivially the same. In other words, it's the probability of the set of sample points. A sample point, now, is something which runs from time 0 to time infinity. It's something that covers the whole process, covers anything else you might want to talk about also. It's what you do when you go from a real situation to a model, and mathematically when you go to a model you put in everything you're going to be interested in at the beginning.
So, it converges to some alpha with probability 1 if the set of sequences for which the statement holds has probability 1, and this statement is that the sample paths converge to alpha. The set of sample paths that converge to alpha has probability 1. So, it's a statement not like most of the statements in probability where you start out talking about finite length, talking about probabilities, and expectation, and so forth. And then you go to the limit with these things you're used to talking about, which are easier to deal with. Here it's a statement about sample paths over the whole sequence. You break them into two categories. Implicitly we have shown that when you break them into two categories those two categories in fact have probabilities. They are events, and one of them has probability 1.
For renewal processes, if we have a renewal process we can use this theorem to talk about the sample averages. In other words, in a renewal process we have a sequence of IID random variables, x1, x2, x3, and so forth. We have the sample sums, the sum at time 1, the sum at time 2, the sum at time 3, and so forth. And the strong law of large numbers, which comes pretty much directly out of this convergence with probability 1 theorem, says that the probability of the set of sample points for which the limit of the sample average, namely sn of omega over n, is equal to x-bar is 1. You take the sample average over all lengths for one sample point. That goes to a limit, that's this convergence with probability 1, the probability that you get convergence is equal to 1, which is what with probability 1 means. Then we use the theorem on the top.
Let's reset a little bit. This theorem on the top is more than just a theorem about convergence with probability 1. It's the reason why convergence with probability 1 is so useful. It says if you have some function f of x, that function is continuous at this point alpha. You have this limit here, then the probability of the set of omega for which the function applied to each of these sample points as equal to alpha is also one.
AUDIENCE: [INAUDIBLE].
PROFESSOR: It should be f of alpha. Yes, absolutely. Sorry about that. This is f of alpha right there. So that probability is equal to 1. I'm finally getting my bearings back on this.
So if I use this theorem here, where I use f of x as equal to 1 over x, then look at what it says for a renewal process which has inter-renewals x1, x2, and so forth. The expected inter-renewal time might be infinite in general, but it has to be bigger than 0. It has to be bigger than 0 because renewals in 0 time are not possible by definition. And you've done a problem in the homework, or you will do a problem in the homework, whether you've done it or not, I don't know, but when you do it, it will in fact show that the expected value of x has to be greater than 0. We're also assuming that it's less than infinity. So, at that point we can use this theorem which says that when you look at, not sn of omega over n, but the inverse of that, n over sn of omega, the limit of that is going to be equal to f of x-bar which is 1 over x-bar. So we have this limit here, which is equal to 1 over x-bar with probability 1.
This is almost the strong law for renewal processes, and the reason why it's not takes a couple of pages in the text. What I'm going to try to argue here is that, while you need the couple of pages to do it mathematically, you can see what answer you're going to get. And the argument is the following. We talked about this last time a little bit. If you take an arbitrary time t then you look at n of t for that value of t, that's the number of arrivals you've had in this renewal process up until time t. And then you compare this point down here, which is the time of the last arrival before time t, and this point here, which is the time of the next arrival after time t, and you look at the slopes here, this slope here is n of t over t. That's the thing we're interested in. That expression there is the time average of the number of renewals within time t, and what we're interested in is what this becomes as t goes to infinity. That's squeezed between this slope here, which is n of t over s sub n of t. This is s sub n of t. And this point here is n of t, and this lower bound here, which is n of t divided by s sub n of t plus 1.
Now, all this argument is devoted to proving two things, which are almost obvious and therefore I'm just going to wave my hands about it. What this says is when t gets very, very large, n of t becomes very, very large also. When n of t becomes very, very large, the difference between this slope and this slope doesn't make a hill of beans worth of difference, and therefore those two, that upper bound and that lower bound, become the same. So with the argument that as t gets large, n of t gets large, and these two slopes become the same, you suddenly have the statement that if you have a renewal process and the expected inter-renewal time is less than infinity (it has to be greater than 0), then on a sample path basis the number of arrivals per unit time converges to 1 over x-bar. That's what this previous picture was saying to you. It was saying that n of t over t, which is the number of arrivals per unit time, as t becomes very, very large on a sample path basis becomes equal to 1 over x-bar.
So, it says that the rate of renewals over the infinite time horizon is 1 over x-bar with probability 1. And what the probability 1 statement is again saying is that the set of sample paths for which this is true has probability 1. This also implies the weak law for renewals, which says that the limit as t approaches infinity of the probability that the magnitude of n of t over t minus 1 over x-bar is greater than epsilon is 0. This, surprisingly enough, is a statement that you hardly ever see. You see a bunch of other renewal theorems, of which this theorem up here is probably the most important. There's something called an elementary renewal theorem, which is sort of a trivial statement but kind of hard to prove. And there's something called Blackwell's theorem, which talks about a local period of time after time gets very, very large. It's talking about the number of renewals in some tiny interval and it's proving things about stationarity over that local time interval. And we'll talk about that later, but this is sort of a nice result, and that's what it is.
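As a rough numerical illustration of the strong law for renewals, the sketch below simulates one sample path and compares n of t over t with the two bounding slopes and with 1 over x-bar. The uniform inter-renewal distribution on [1, 3], the seed, and the names `renewal_counts` and `draw` are assumptions made for this example only; they do not come from the lecture.

```python
import random

def renewal_counts(t_end, draw_interval, rng):
    """Simulate one sample path up to time t_end.

    Returns (N(t), S_N(t), S_N(t)+1): the number of arrivals by t_end,
    the time of the last arrival at or before t_end, and the time of
    the first arrival after t_end.
    """
    n = 0
    s_n = 0.0
    s_next = draw_interval(rng)
    while s_next <= t_end:
        n += 1
        s_n = s_next
        s_next = s_n + draw_interval(rng)
    return n, s_n, s_next

rng = random.Random(0)                  # fixed seed so the run is repeatable
x_bar = 2.0                             # mean of the assumed distribution
draw = lambda r: r.uniform(1.0, 3.0)    # assumed inter-renewal distribution
t_end = 100_000.0

n, s_n, s_next = renewal_counts(t_end, draw, rng)
lower = n / s_next      # slope to the next arrival (lower bound)
rate = n / t_end        # N(t)/t, the quantity of interest
upper = n / s_n         # slope to the previous arrival (upper bound)
print(lower, rate, upper, 1 / x_bar)
```

For large `t_end` the three slopes should be nearly indistinguishable and close to 1 over x-bar, which is the hand-waving argument above in numerical form.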
Let's review residual life a little bit. The residual life of a renewal process at time t, the residual life is the time you have to wait until the next renewal, until the next arrival. So residual life means, how long does it take? If you think of these renewals as something that comes along and kills everybody, then this residual life is the amount of life that people still have left. So, if you look at what that is for a particular sample path, I have a particular sample path drawn here, completely arbitrary. When I look at the first arrival time here, s1 of omega, and I look at all of the time before that, the amount of time you have to wait until this arrival.
Remember this is one sample point. We're not thinking in terms of an observer who comes in and doesn't know what's going to happen in the future. We're looking at one sample point. We know the whole sample point. And what we're doing we know this, we know this, we know this, and all we're doing here saying for each time t along this infinite path we're just plotting how long it is until the next arrival. And that's just a line with slope minus 1 until the next arrival occurs. After the next arrival and you start waiting for the arrival after that, after that when you wait for the arrival after that.
So on a sample path basis, the residual life is just this sequence of isosceles triangles here. So we looked at that, we said, if we look at the time from n equals 1 up until the number of arrivals at time t, the number of triangles we have is n of t, omega. The area of each triangle is x sub i squared over 2, and when we get all done we want to take the average of it so we divide each of these by t. This quantity here is less than or equal to the integral of y of t over t, and that's less than or equal to the same sum where if we look at a particular time of t this sum here is summing everything up to this point. This sum here is summing everything up to this point. And as t goes to infinity, this one little triangle in here, even though this is the biggest triangle there, it still doesn't make any difference. As t goes to infinity, that washes out. You have to show that, of course, but that's what we showed last time. When we go to the limit we find that this time average residual life is equal to the expected value of x squared over twice the expected value of x.
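The sandwich argument just described can be written out as follows. The notation is an assumption based on the spoken forms: Y(t) for the residual life ("y of t"), X_i for the i-th inter-renewal interval, and N(t) for the number of arrivals up to time t.

```latex
\frac{1}{t}\sum_{i=1}^{N(t)} \frac{X_i^2}{2}
  \;\le\; \frac{1}{t}\int_0^t Y(\tau)\,d\tau
  \;\le\; \frac{1}{t}\sum_{i=1}^{N(t)+1} \frac{X_i^2}{2},
\qquad
\lim_{t\to\infty} \frac{1}{t}\int_0^t Y(\tau)\,d\tau
  \;=\; \frac{\mathrm{E}[X^2]}{2\,\overline{X}}
  \quad \text{with probability 1.}
```

The limit comes from writing each bound as (N(t)/t) times a sample average of X_i^2/2: the first factor goes to 1 over x-bar and the second to E[X^2]/2.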
Now, this can be infinite. You can have a finite expected inter-renewal time and you can still have an infinite second moment. If you look at the example we talked about last time where the inter-renewal time had two possible values, either epsilon or 1 over epsilon, in other words, it was either enormous or it was very, very small, what we found out there was that some of these inter-renewal intervals, a very small fraction of them, were enormously large. And because they were enormously large, the time average residual life turned out to be enormous also. You think of what happens as epsilon gets smaller and smaller.
You can see intuitively why it makes sense that when you have humongously long inter-renewal times, and you have this x squared that occurs because of this triangle here, I think it is possible to see why this quantity here can become infinite if you have this situation of a very long-tailed distribution for the inter-renewal time. You have enormously long residual waits then, and you have them with high probability, because if you come into this process at some arbitrary time you're somewhat more likely to wind up in one of these large intervals than one of the small intervals, and that's what causes all the trouble here.
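To see the size of this effect, here is a small sketch that evaluates the expected value of x squared over twice x-bar exactly for a two-valued inter-renewal time. The transcript gives only the two values, epsilon and 1 over epsilon; the probabilities used below (1 minus epsilon for the small value, epsilon for the large one) are an assumption for illustration, as is the function name.

```python
from fractions import Fraction

def time_avg_residual_life(values_and_probs):
    """Return (E[X], E[X^2] / (2 E[X])) for a discrete distribution.

    values_and_probs is a list of (value, probability) pairs.
    """
    mean = sum(p * x for x, p in values_and_probs)
    second_moment = sum(p * x * x for x, p in values_and_probs)
    return mean, second_moment / (2 * mean)

results = {}
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # Assumed probabilities: small value w.p. 1 - eps, large value w.p. eps.
    dist = [(eps, 1 - eps), (1 / eps, eps)]
    mean, resid = time_avg_residual_life(dist)
    results[eps] = (mean, resid)
    print(f"eps={float(eps):g}  E[X]={float(mean):.4f}  "
          f"time-average residual life={float(resid):.2f}")
```

Under these assumed probabilities the mean inter-renewal time stays close to 1 while the time-average residual life grows roughly like 1 over twice epsilon, which is the behavior described above.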
There are similar examples here. You can look at the age of a process; z of t is defined as the interval between t and the previous arrival. So, if we look at some time t, the age at that point is the amount of time back to the previous arrival. That goes up like a triangle and drops as soon as we get the next arrival. It's exactly the same. It's just the same thing looking at it backwards instead of forwards, so the answer is the same also. If you look at something called duration, the question here is at a particular time t. If we take the difference between the next arrival and the previous arrival, that's called the duration at time t. How big is that? Well, obviously this is the same problem as the age and the residual life, and in fact, it's exactly the sum of the residual life and the age. So, it's not surprising that the time average-- oh my. There should be a 2 down there. So it's exactly the same situation as we had before when we were talking about residual life.
Now we'd like to generalize this, and I hope the generalization is almost obvious at this point. These three quantities here are all examples of assigning rewards to renewal processes. And the reward at any time t, when I'm trying to do this, is restricted to be a function of the inter-renewal period containing t. In other words, when you try to define what the reward is at a given time t, it's a function only of what's going on in the renewal period containing t. Now, in its simplest form, I want to go make this even simpler. In its simplest form r of t is restricted to be a function of the age at time t and the duration at time t. In other words, you try to say what's the reward at a given time t, and it's the same as these three examples we looked at. It's some function of how far back you have to go to the previous arrival, and how far forward you have to go until the next arrival. You can make this a function, if you want to, of any of the three quantities residual life, age, and duration. It seems to be intuitively a little simpler to talk about age and duration as opposed to the other quantities.
So the time average for a sample path of r of t is found by analogy to residual life. That's the way I'm going to do it here in class. The notes do a little more careful job than just by analogy. But you start with the n-th inter-renewal interval and you say, what is the aggregate reward I get in the n-th inter-renewal interval? It's a random variable, obviously. And what it is is the integral from the previous arrival up to the next arrival of r of t, omega, dt. This is very strange, because when we looked at this before we talked about n of t and we talked about s sub n of t plus 1 and s sub n of t, and suddenly n of t plus 1 has turned into n, and n of t has turned into n minus 1. What has happened? Well, this gives you a clue. And it's partly how we define the intervals. Interval 1 goes from 0 to s1 and z of t equals t; in interval n, z of t is t minus s sub n minus 1. Let me go back and show you the original figure here and I think it will make it clear.
Now, the first arrival comes at this time here. The first interval we're talking about goes from 0 to time s1 of omega. What we're interested in, in this first interval going from 0 to s1 of omega, is this interarrival time, which is the first interarrival time. This first interval is connected to x1 and it goes from s sub 0, which is just at 0, up to s1. The second interval goes from s1 up to s2. When we look at n of t, n of t is for some t here in this interval here, and this is s.
Am I going to get totally confused writing this or aren't I? I think I might get totally confused, so I'm not going to try to write it. Because when I try to write it I'm trying to make the comparison between n of t and n of t plus 1, which are the things up here. But the quantities here with the renewal periods are, this is the first inter-renewal period, this is the second inter-renewal period. The second inter-renewal period goes from s sub 2 back to s sub 1. The n-th goes from s sub n back to s sub n minus 1. So that's just the way it is when you count this way. If you count it the other way you would get even more confused because then in the interval x sub n the inter-renewal time would be x sub n plus 1 and that would be awful. So, this is the way we have to do it.
Let's look at what happens with the expected inter-renewal time, then, as a time average. And you can think of this as being for a sample point. When you have an expression which is valid for a sample point, it's also valid for a random variable, because it maps every sample point into some value. So, I'll just call it here r sub n, which is the amount of reward that you pick up in the n-th inter-renewal period, so it's the integral from the n minus first arrival up to the n-th arrival, and we're integrating over r of t over this whole interval. So r of t is a function of the age and of the duration. The age is t minus s sub n minus 1, and the duration is just x sub n. Now that's a little weird also because before we were talking about the duration as being statistically very, very different from the inter-renewal period. But it wasn't. If you look on a sample path basis, this duration of the n-th inter-renewal period is exactly equal to x sub n of omega, which is, for that particular sample path, the value of that inter-renewal interval.
So when we integrate this quantity now, we want to integrate it over dt. What do we get? Well, we've gotten rid of this t here, we just have the n-th duration here. The only t appears over here, and this is r of a difference between t and s sub n minus 1. So, we want to do a change of variables, we want to let z be t minus s sub n minus 1, and then we integrate z from 0 to x sub n. And now just imagine that you put omegas in all of these things to let you see that you're actually dealing with one sample function, and when you leave the omegas out you're just dealing with the random variables that arise because of doing that.
So when we do this integration, what we get is the integral of r of z with this fixed x sub n integrated against dz. This is a function only of the random variable x sub n. Strange. That's the expected value of r sub n. This unfortunately can't be reduced any further, this is just what it is. You have to find out what is the integral of r of z, x integrated over z, and then we have to take the expected value over x. If the expectation exists, you can write this in a simple way, because this quantity here integrated is just r sub n. And then you take the expected value of r sub n, and you divide by x-bar down here. This quantity here is just the expected value of the integral of r of z, x dz. And that integral there is just the expected value of r sub n. So, if you look over the entire sample path from 0 to infinity, what you get then is that the time average reward is just equal to the expected value of r sub n divided by x-bar. Oops. Lost the example.
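In symbols, the steps of this argument can be sketched as below. The notation is an assumption chosen to match the spoken forms: R(z, x) for the reward as a function of age z and duration x, R_n for the aggregate reward in the n-th inter-renewal interval, S_n for the n-th arrival time, and F_X for the distribution function of the inter-renewal time.

```latex
R_n \;=\; \int_{S_{n-1}}^{S_n} R(t)\,dt
    \;=\; \int_{0}^{X_n} R(z, X_n)\,dz ,
\qquad
\mathrm{E}[R_n] \;=\; \int_{x=0}^{\infty}\int_{z=0}^{x} R(z, x)\,dz\,dF_X(x),

\lim_{t\to\infty}\frac{1}{t}\int_0^t R(\tau)\,d\tau
  \;=\; \frac{\mathrm{E}[R_n]}{\overline{X}}
  \quad \text{with probability 1, if the expectation exists.}
```

The first equality is the change of variables z = t - S_{n-1}; the last line is the same sandwich argument used for residual life, applied to the rewards R_n.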
Suppose we want to find the k-th moment of the age. Simple thing to try to find. I want to do that because it's simple. If you're looking at the k-th moment of the age looked at as a time average, what it is is the reward at time t, it's the age at time t to the k-th power. We want to take the age to the k-th power at time t, and then integrate over t and divide by the final value and go to the limit. So the expected value of this k-th moment over 1 inter-renewal period is this integral here, z to the k dz times dF sub x of x. The only place x comes in this integral is in the final point of the integration where we are integrating up to the point x, at which the duration ends, and then we're taking the expected value of this. So, we integrate this, the integral of the z to the k from 0 to x is just x to the k plus 1 divided by k. So when you take that integral and you look at this, what is this? It's the 1 over k times the expected value of the random variable, x to the k plus first power. That's just this taking the expected value of it, which is you take the expected value by integrating against dF sub x of x. Yes?
AUDIENCE: [INAUDIBLE] divide by k plus 1?
PROFESSOR: Yes. I'm sorry. My evil twin brother is speaking this morning, and this should be a k plus 1 here. This should be a k plus 1, and this should be a k plus 1. And if you look at the example of the first moment, which is the first thing we looked at today, namely finding the expected value of the age, what we find out is that it's expected value of x squared divided by 2 times x-bar. So with k equal to 1 we get the expected value of x squared over 2 times x-bar. It's even worse because I left a 2 out when we were talking about the expected value of age. And to make things even worse, when I did this I went back to check whether it was the right answer, and of course the two mistakes cancelled each other out. So, sorry about that.
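With the k plus 1 correction in place, the time average of the k-th power of the age is the expected value of x to the k plus 1, divided by k plus 1 times x-bar. The sketch below checks this on a simulated sample path. The exponential inter-renewal distribution with rate 1, the seed, and the function name are assumptions for this example; with that distribution the expected value of x to the k plus 1 is (k+1) factorial, so the predicted values are 1 for k = 1 and 2 for k = 2.

```python
import random

def time_avg_age_moment(intervals, k):
    """Time average of Z(t)**k over [0, S_n] for one sample path.

    Within an inter-renewal interval of length x the age runs from 0
    to x, so that interval contributes the integral of z**k from 0 to
    x, which is x**(k + 1) / (k + 1).
    """
    total_reward = sum(x ** (k + 1) / (k + 1) for x in intervals)
    return total_reward / sum(intervals)

rng = random.Random(1)                                   # fixed seed
intervals = [rng.expovariate(1.0) for _ in range(200_000)]

estimates = {k: time_avg_age_moment(intervals, k) for k in (1, 2)}
for k, value in estimates.items():
    print(f"k={k}: time average of age**k is about {value:.3f}")
```

The estimates are sample-path averages over a finite horizon, so they should be close to, but not exactly equal to, the predicted values.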
The thing I really wanted to talk about today is stopping trials for stochastic processes, because that's a topic which has always caused confusion in this class. If you look at any of the textbooks on stochastic processes, it causes even more confusion. People talk about it, and then they talk about it again, and they say what we said before wasn't right, and then they talk about it again, they say what we said before that wasn't right, and it goes on and on and on. And you never know what's going on.
Very often instead of looking at an entire stochastic process over the infinite interval, you'd like to look at a finite interval. You'd like to look at the interval going from 0 up to some time t. But in almost a majority of those cases, the time t that you want to look at is not some fixed time but some random time which depends on something which is happening. And when you want to look at what's going on here, it's always tricky to visualize this because what you have is t becomes a random variable. That random variable is a function of the sample values that you have up until time t. But that seems like a circular argument, t a random variable and it depends on t. Things are not supposed to depend on themselves, because when they depend on themselves you have trouble understanding what's going on. So, we will sort that out.
In sorting it out we're only going to look at discrete-time processes. So, we'll only look at sequences of random variables. We will not assume that these random variables are IID for this argument here, because we want to look at a much wider range of things. So we have some arbitrary sequence of random variables, and the idea is you want to sit there looking at this evolution of the sequence of random variables, and in fact, you want to sit there looking at the sample path of this evolution. You want to look at x1, and you want to look at x2, and you want to look at x3, and so forth. And at each point along the line in terms of what you've seen already, you want to decide whether you're going to stop or whether you're going to continue at that point. And if at that point you develop a rule for what you're going to do with each point you call that a stopping rule, so that you have a rule that tells you when you want to stop looking at these things.
Let me give you some idea of the generality of this. Often when we look at queuing systems we are going to have an arrival process where arrivals keep coming in, things get serviced one after the other, and one of the very interesting things is whether the system ever empties out or whether the queue just keeps building up forever. Well, an interesting stopping rule then is when the queue empties out. We will see that an even more interesting stopping rule is starting with some arbitrary first arrival at time 0. Let's look for the time until another arrival occurs which starts another busy period. These stopping times, then, are going to be critical in analyzing things like queuing systems. They're also going to become critical in trying to analyze these renewal systems that we're already talking about. They will become critical to trying to take a time average view about renewal processes instead of a, not a time average, but an ensemble average viewpoint looking at particular times t as opposed to taking a time average. So we're going to look at these things in many different ways.
In order to do that, since you're looking at a sample function, you're observing it on each arrival of this process. At each observation of a sample value of a random variable, you decide whether you want to stop or not. So a sensible way to deal with this is to look over all time, define a random variable J, which describes when this sequence is to be stopped. So for each sample value x1 of omega, x2 of omega, x3 of omega, and so forth, if your rule is to stop the first time you see a sample value which is equal to 0, then J is an integer random variable whose value is the first n at which x sub n is equal to 0. And there are many other examples.
So at trial 1, then, x1 of omega is observed. A decision is made based on x1 of omega whether or not to stop. If we stop, J of omega equals 1. If we don't stop, J of omega is bigger than 1, but we don't know what it is yet. At trial two, if we haven't stopped yet, you observe x2 of omega, the second random variable, second sample value, you make a decision again based on x1 of omega and x2 of omega whether or not to stop. If you stop, J of omega is equal to 2. So you can visualize doing this on each trial, you don't have to worry about whether you've already stopped, you just have a rule that says, I want to stop at this point if I haven't stopped before. So the stopping event at time n is that you stop either if your rule tells you to stop or you stopped before. And if you stopped before then you're obviously stopped. At each trial n, if stopping has not yet occurred, x sub n is observed and the decision based on x1 to xn is made. If you stop, then J of omega is equal to n. So for each sample path, J of omega is the time or the trial at which you stop.
So, we're going to define a stopping trial, or stopping time, J for a sequence of random variables. You're going to define this to be a positive integer-valued random variable that has to be positive, because if you're going to stop before you observe anything that's not a very interesting thing. So you always observe at least one thing, then you decide whether you want to stop or you proceed, so forth. So it's a positive integer-valued random variable, you always stop at an integer trial, and for each n greater than or equal to 1 the indicator random variable the indicator of the event J equals n is a function of what you have observed up until that point. This part is a really critical part of it. The decision to stop at a time n has to be a function only of what you have already observed. You're not allowed as an observer to peek ahead a little bit and then decide on the basis of what you see in the future whether you want to stop at this time or not.
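A minimal sketch of this definition in code: the rule is handed only the values observed so far, x1 up to xn, so it has no way to peek ahead. The names `stopping_trial`, `should_stop`, and `first_zero` are hypothetical and introduced only for illustration; the example rule is the "stop at the first 0" rule mentioned earlier.

```python
from typing import Callable, Iterable, List, Optional

def stopping_trial(samples: Iterable[float],
                   should_stop: Callable[[List[float]], bool]) -> Optional[int]:
    """Return J, the first trial n (counting from 1) at which the rule stops.

    should_stop sees only x_1, ..., x_n, the values observed up to and
    including trial n. Returns None if the rule never fires on the
    samples supplied.
    """
    seen: List[float] = []
    for n, x in enumerate(samples, start=1):
        seen.append(x)
        if should_stop(seen):
            return n          # the event {J = n} has occurred
    return None

# Example rule: stop the first time the observed value equals 0.
first_zero = lambda seen: seen[-1] == 0

print(stopping_trial([3, 1, 0, 4, 0], first_zero))
```

Because the function returns at the first trial where the rule fires, the events J equals 1, J equals 2, and so on are disjoint by construction. The `None` return on a finite list only means the rule did not fire on the samples given; it says nothing about whether J is finite on the full infinite sequence.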
One example in the notes is that you're playing poker with somebody, and you make a bet and the other person wins. You make the bet on the basis of what you've seen in your cards so far, you don't make your bet on the basis of your observation of what the other player has, and the other player will get very angry if you try to base your decision later on what the person you're playing with has. So you can't do that sort of thing. You can't peek ahead when you're doing this. You could design random variables where, in fact, you can look ahead, but the times where you use what people call stopping trials are situations in which you stop based on what has already happened. You can generalize this, and we will generalize it later, where you can stop on the basis of other random variables, which also evolve in time, but you stop on the basis of what has already happened up until time n.
We could visualize conducting successive trials until some n at which the event J equals n occurs, with further trials then ceasing. It is simpler conceptually to visualize stopping the observation of trials after the stopping trial, but continuing to conduct trials. In other words, if we start talking about a stopping process and we say we stop on the n-th trial, then we're forbidden to even talk about the n plus first random variable. Now the n plus first random variable occurs on some sample paths but doesn't occur on other sample paths. I don't know how to deal with that, and neither do you. And what that means is the way we're going to visualize these processes is physically they continue forever, but we just stop observing them at a certain point. So, this stopping trial is the time at which we stop observing as opposed to the time at which the random variable ceases existing. If we define a random variable, it has a value for all sample points, and sample points involve all of these paths. So you sort of have to define it that way.
One thing you have to deal with is that in many of these situations you wind up never stopping. And the trouble is, when you're investigating stopping rules, stopping trials, you usually don't know ahead of time whether you're ever going to stop or not. And because of that if you don't know whether you're going to stop or not, you can't call it a stopping trial. So, what one normally does is to say that J is a possibly defective random variable. And if it's possibly defective you mean that all sample points are mapped into either finite J or perhaps J equals infinity, which means that you never stop. But you still have the same condition that we have here, that you stop on the basis of x1 up to x sub n. And you also have the condition, which isn't quite clear here, but it's clear from the fact that J is a random variable, that the events J equals 1, J equals 2, J equals 3, are all disjoint. So you can't stop twice, you have to just stop once. And once you stop you can't start again.
Now does everybody understand what a stopping trial or a stopping rule is at this point? If you don't understand it I think the trouble is you won't realize you don't understand it until you start seeing some of these examples where strange things are going on. I guess we have to just go ahead and see what happens then.
So the examples, which I'm not going to develop in any detail. A gambler goes to a casino and he gambles until he's broke. So, he goes in with a finite amount of capital. He has some particular system that he's playing by. If he's lucky he gets to play for a long, long time, he gets a lot of amusement as he loses his money, and if he's unlucky he loses very quickly. If you look at the odds in casinos and you apply the strong law of large numbers, you realize that with probability 1 you get wiped out eventually. Because the odds are not in your favor. Casinos wouldn't be built if the odds were in your favor.
Another example, just flip a coin until 10 successive heads appear. 10 heads in a row. That's a rather unlikely event. You're going to be able to figure out very easily from this theory what's the expected amount of time until 10 successive heads appear. It's a very easy problem.
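For a sense of scale, the sketch below computes that expected time exactly. It does not use the stopping-trial theory developed in this lecture; it uses an elementary first-step recursion instead, and it assumes independent flips with a fixed probability of heads (1/2 by default). To get a run of k heads you first need a run of k minus 1 heads, then one more flip, which either finishes the run or sends you back to the start, giving E_k = (E_{k-1} + 1) / p. The function name is hypothetical.

```python
from fractions import Fraction

def expected_flips_until_run(run_length, p_heads=Fraction(1, 2)):
    """Expected number of flips until run_length successive heads appear.

    Uses the recursion E_k = (E_{k-1} + 1) / p with E_0 = 0, which
    assumes independent flips with P(heads) = p_heads.
    """
    expected = Fraction(0)
    for _ in range(run_length):
        expected = (expected + 1) / p_heads
    return expected

print(expected_flips_until_run(10))
```

For a fair coin the recursion reduces to E_k = 2 E_{k-1} + 2, whose solution is 2 to the power k plus 1, minus 2.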
A more important problem is testing a hypothesis with repeated trials. So, you have the hypothesis that a certain kind of treatment will make patients well, and you want to know whether to use this treatment. So you test it on mice, or people, or what have you. If you think it's pretty safe, you start testing it on people. And how many tests do you make? Well, you originally think that as a scientist you should plan an experiment ahead of time, and you should say, I'm going to do 1,000 tests. But if you're doing experiments on people and you find out that the first 10 patients die, you're going to stop the experiment at that point. And if you find out that the first 100 patients all live, well, you might continue the experiment. But you're going to publish the results at that point because you're going to try to get the FDA to approve this drug, or this operation, or this what have you.
So if you view this stopping trial as the time at which you're going to try to publish something, then, again, if you have any sense, you are going to perform experiments, look at the experiments as you go, and decide what you're going to do as a function of what you've already seen. I mean, that's the way that all intelligent people operate. So if science says that's not a good way to operate, something's wrong with science. But fortunately science doesn't say that. Science allows you to do this.
AUDIENCE: So not every J is potentially defective? You can come up with examples of where it would be defective. But not every one is necessarily [INAUDIBLE].
PROFESSOR: Yes. If a random variable is defective it means there's some probability that J is going to be infinite, but there's also presumably some probability that J is equal to 1, a probability J is equal to 2, and so forth. So we have a distribution function for J, which instead of going up to 1 and stopping goes up to some smaller value and stops, and then it doesn't go any further.
AUDIENCE: Not every [INAUDIBLE]. Some of these are definitely going to [INAUDIBLE].
PROFESSOR: Yes. So this testing of hypotheses is really what triggered Abraham Wald to start trying to understand these situations, because he found out relatively quickly that it wasn't trivial to understand them, and all sorts of crazy things were happening. So he spent a lot of his time studying what happened in these cases. He called it sequential analysis. You might have heard the term. And sequential analysis is exactly the same idea. It's analyzing experiments as you go in time, and either stopping or doing something else or changing the experiment or what have you, depending on what happens.
Another thing is observe successive renewals in a renewal process until s sub n is greater than or equal to 100. Now, this particular thing is one of the things that we're going to use in studying renewal processes, it's why we're studying this topic right now. Well, we might study it now anyway, because we're going to use it in lots of other places, but it's also why you have to be very careful about talking about stopping time as opposed to stopping trials. Because if you stop an experiment at the arrival in a renewal process at which that renewal occurs after time 100, you don't know when you're going to stop. It might be a very long time after 100 before that first arrival after time 100 occurs, it might be a very short time. So it's not a stopping time that you're defining, it's a stopping trial. You are defining a rule which tells you at which trial you're going to stop as opposed to a rule that tells you in which time you're going to stop, and we'll see this as we go on in this.
Suppose the random variables in this process we're looking at have a finite number of possible sample values. Then any possibly defective stopping trial, and a stopping trial is a set of rules for when you're going to stop observing things, any stopping trial can be represented as a rooted tree. If you don't know what a rooted tree is, a rooted tree is what I've drawn here. So, there's no confusion here. A rooted tree is something that starts on the left hand side and it grows for a while. It's like an ordinary tree. It has a root, and it has branches that go up. But any possibly defective stopping trial can be represented as a rooted tree where the trial at which each sample path stops is represented by a terminal node.
So the example here I want to use is the same example as in the text. I was trying to get another example, which will be more interesting. The trouble is with these trees they get very big very quickly, and therefore this is not really a practical way of analyzing these problems. I think it's a nice conceptual way of analyzing. So, the experiment is x is a binary random variable, and stopping occurs when the pattern 1, 0 first occurs. You can certainly look at any other pattern you want to, any much longer pattern. But what I want to do here is to illustrate what the tree is that corresponds to this stopping rule.
So if the first random variable has a value 1 and then the second one has a value 0, you observe 1, 0, and at that point the experiment is over. You've seen the pattern 1, 0, and you stop. So there's a big circle there. If you observe 1 followed by a 1 followed by 0, again you stop 1, 1, 1, followed by a 0 you stop, and so forth. If you observe a 0 and then you observe a 1 and then you observe a 0, that's the first time at which you've seen the pattern 1, 0 so you stop there. 0, 1, 1, 0, you stop there, and so forth. So you get this kind of ladder structure.
The point that I want to make is not that this particular structure is very interesting. It's that whatever kind of rule you decide to use for stopping, you can express it in this way. You can express the points at which you stop by big circles and the points at which you don't stop as intermediate nodes where you keep on going.
I think in some sense this takes the picture which you have of sample functions, with a rule applied to each sample value, which is often the easiest way to express stopping rules, but in terms of random variables it doesn't always give you the right story. This gives you a story in terms of all possible choices of the random variables. The first toss has to be either a 1 or a 0. The second toss has to be a 1 or a 0, but if the first toss was a 1 and the second is a 0 you stop and don't go any further. If it's a 1 you continue further. So you keep continuing along here, you can continue forever here.
If you want to analyze this, it has a rather peculiar analysis. How long does it take until the first 1, 0 occurs? Well, if you start out with a 1 this experiment lasts until the first 0 appears. Because if the 0 doesn't appear right away another 1 appears, so you keep going along this path until a 0 appears. If you see a 0 first and then you see a 1, you're back in the same situation again. You proceed until a 0 occurs. What this is really saying is the time until you stop here consists of an arbitrary number, possibly zero, of 0's, followed by an arbitrary number, at least one, of 1's, followed by a single 0. So the only patterns you can have here which would lead to these terminal nodes are some number of 0's followed by some number of 1's, followed by a final 0. But the example is not important, so that just tells you what it means.
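The tree for the 1, 0 stopping rule can be enumerated mechanically. The following is a minimal sketch, not something from the lecture: the function names `stopping_trial` and `terminal_nodes` and the depth limit of 5 are illustrative assumptions. It checks that every terminal node is a run of 0's, then at least one 1, then a single 0.

```python
from itertools import product
import re

def stopping_trial(bits):
    """Return the trial J at which the pattern 1, 0 first completes, or None
    if it has not occurred within the observed bits (J may be defective)."""
    for n in range(1, len(bits)):
        # The decision at trial n + 1 uses only bits[0..n], never later ones.
        if bits[n - 1] == 1 and bits[n] == 0:
            return n + 1  # trials are numbered from 1
    return None

def terminal_nodes(max_depth):
    """List every binary string of length <= max_depth whose last bit
    completes the pattern 1, 0 for the first time (a terminal node)."""
    nodes = []
    for depth in range(2, max_depth + 1):
        for bits in product((0, 1), repeat=depth):
            if stopping_trial(bits) == depth:
                nodes.append("".join(map(str, bits)))
    return nodes

nodes = terminal_nodes(5)
print(nodes)
# Each terminal node is some 0's, then at least one 1, then a single 0.
print(all(re.fullmatch(r"0*1+0", s) for s in nodes))
```

Because `stopping_trial` only ever looks at the bits seen so far, it is a stopping rule in the sense defined above: it never peeks ahead.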
With all of this we finally have Wald's equality. And Wald's equality, it looks like a strange thing. I spent so much time talking about stopping rules because all the complexity lies in the stopping rules. Wald's equality itself is a very simple thing after you understand that. What Wald's equality says is, let x sub n be a sequence of IID random variables, each with mean x-bar. This is the kind of thing we've been talking about a lot. If J is a stopping trial for x sub n, n greater than or equal to 1, in other words, if it satisfies that definition where it never peeks ahead, and if the expected value of J is less than infinity, then look at the sum s sub J equals x1 plus x2 up to x sub J, the amount that the gambler has accumulated when the gambler stops. The sum at the stopping trial satisfies this: the expected value of the gain is equal to the expected value of x times the expected number of times you play. It sort of says that even when you use these stopping rules to decide when to stop, there's no way to beat the house. If you're playing a fair game, your expected gain is equal to the expected gain per trial times the expected number of times you play. And there's no way you can get around that.
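Written out as a formula, in notation chosen here for illustration (capital letters for the random variables, \bar{X} for the common mean), the statement reads:

```latex
\text{If } X_1, X_2, \ldots \text{ are IID with mean } \bar{X},\ 
J \text{ is a stopping trial, and } \mathrm{E}[J] < \infty, \text{ then}
\qquad
\mathrm{E}[S_J] = \bar{X}\,\mathrm{E}[J],
\qquad
S_J = X_1 + X_2 + \cdots + X_J .
```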
Note that this is more than a poof and less than a proof. I'll explain why it's more than a poof and less than a proof as we go. The random variable s sub J is equal to x sub 1 times the indicator function of J greater than or equal to 1 (the indicator function of J greater than or equal to 1 is just universally 1), plus x sub 2 times the indicator function of J greater than or equal to 2, and so forth. In other words, s sub J includes x sub 2 if the experiment proceeds long enough that you take the second trial. In other words, this indicator function is 1 minus the indicator function of the event that you stop at the end of the first trial. You continue here: each x sub n is included if the experiment has not been stopped before time n, that is, under the event that the stopping trial J is greater than or equal to n. So the expected value of s sub J is equal to the expected value of this sum here, of x sub n times the indicator function of J greater than or equal to n, which is equal to the sum over n of the expected value of x sub n times this indicator function.
By this time I hope that most of you are at least a little bit sensitive to interchanging expectations and sums. And the notes and the problem sets deal with that. That's the part of the proof that I don't want to talk about here, because it isn't really very interesting. It's just one of these typical tedious analysis things. Most of the time this is valid. We will see an example where it isn't, so you'll see what's going on.
The essence of the proof, however, the interesting part of the proof, is to show that x sub n and this indicator function are independent random variables. So you can think of that as a separate lemma, that the n-th trial and the indicator function of J greater than or equal to n are independent of each other. This seems a little bit weird, and it seems a little bit weird because this includes the event that J is equal to n. And the event that J is equal to n is something that's decided on the basis of the previous arrivals, or the previous what have you. It's highly influenced by x sub n. So, x sub n is not independent of the indicator function of J equals n. So this is strange. But we'll see how this works out.
Now, if we want to show that x sub n and this indicator function are independent, the thing that you do is note that the indicator function of J greater than or equal to n is the indicator of the event in which J is greater than or equal to n. What's the complement of that event? The complement of the event J greater than or equal to n is the event J less than n. So the indicator function of J greater than or equal to n is 1 minus the indicator function of J less than n. When you perform the experiment exactly one of these two events happens, so exactly one of the two indicators is 1 and the other is 0. Also, the indicator function of J less than n is a function of x1, x2, up to x sub n minus 1. Let me spell that out a little more because it looks a little strange the way it is. The event J less than n is the event J equals 1, union the event J equals 2, and so on up to union the event J equals n minus 1. These are all disjoint events, so the indicator function of J less than n is the sum of their indicator functions. You can't stop at two different times. So if you stop before time n you either stop at time 1, or you stop at time 2, and so on, or you stop at time n minus 1. The event J equals 1 depends only on x sub 1. The event J equals 2 depends on x1 and x2. The event J equals n minus 1 depends on x1 up to x sub n minus 1. OK, so this is what I mean when I say that the event J less than n is determined by x1 up to x sub n minus 1, because each of these sub events is determined by a subset of those random variables.
Since we're looking at the situation now where the x's are IID and this event is a function of x1 up to x sub n minus 1, it just depends on those random variables. These random variables are independent of x sub n, so the indicator function of J less than n is independent of x sub n, and thus x sub n is independent of the indicator function of J greater than or equal to n. Now, as I said, this is very surprising, and it's very surprising because this event, J greater than or equal to n, includes the event J equals n, which typically depends very heavily on x sub n. It also includes J equals n plus 1, and so forth. So it's a little paradoxical, and the resolution of the paradox is that given that J is greater than or equal to n, in other words, given that you haven't stopped before time n, the time at which you stop is very dependent on the observations that you make. But whether you stopped before time n or whether you haven't stopped before time n is independent of x sub n. So it really is this conditioning here that makes this a confusing issue. But whether or not J greater than or equal to n occurs depends only on x1 up to x sub n minus 1. And with that it's easy enough to finish the proof, it's just writing a few equations. The expected value of s sub J is equal to the sum over n of the expected value of x sub n times the indicator function of J greater than or equal to n.
We've just shown that these are independent. The random variable x sub n is independent of the event J greater than or equal to n, so it's independent of the indicator function of J greater than or equal to n, and we can write the expected value of the product as the product of the expected values. Now, when we have this expression, the x sub n's are IID, so all of their expected values are the same quantity x-bar. So since they're all the same quantity x-bar we can just take it outside of the summation. So we have x-bar times the sum of the expected value of the indicator function of J greater than or equal to n. Now, this indicator function is 1 when J is greater than or equal to n, and it's 0 otherwise. So the expected value of this is just the probability that J is greater than or equal to n. So, this is equal to the expected value of x times the sum of the probabilities that J is greater than or equal to n. This is really adding the elements of the complementary distribution function for J, and that gives us the expected value of J. Whether it's finite or not finite, it gives us this anyway.
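The chain of steps just described can be collected in one place. This is a summary sketch in notation assumed here (\mathbb{I} for the indicator function), with the one unproved step, the interchange of expectation and summation, marked explicitly:

```latex
\begin{align*}
S_J &= \sum_{n=1}^{\infty} X_n\, \mathbb{I}_{\{J \ge n\}} \\
\mathrm{E}[S_J]
  &= \sum_{n=1}^{\infty} \mathrm{E}\bigl[X_n\, \mathbb{I}_{\{J \ge n\}}\bigr]
     && \text{(interchange of expectation and sum, not justified here)} \\
  &= \sum_{n=1}^{\infty} \mathrm{E}[X_n]\; \mathrm{E}\bigl[\mathbb{I}_{\{J \ge n\}}\bigr]
     && \text{(} X_n \text{ independent of } \mathbb{I}_{\{J \ge n\}} \text{)} \\
  &= \bar{X} \sum_{n=1}^{\infty} \Pr\{J \ge n\}
     && \text{(IID, and } \mathrm{E}[\mathbb{I}_{A}] = \Pr\{A\} \text{)} \\
  &= \bar{X}\, \mathrm{E}[J]
     && \text{(sum of the complementary distribution function of } J \text{)}
\end{align*}
```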
So, this really is a proof except for that one step. Except for this interchange of expectation and summation here which is, well, we will get an idea of what that has to do with anything. So let's look at an example. Stop when you're ahead. And stop when you're ahead is a strategy for gambling with coins. You toss a coin with probability of heads equal to p. And we want to look at all the different cases for p, whether the coin is fair or biased in your favor or biased in the other person's favor. And the rule is that you stop whenever your winnings reach one. So you keep on dumbly gambling, and gambling, and gambling, until finally you get to the point where you're one ahead and when you're one ahead you sigh a sigh of relief and say, now my wife won't be angry at me, or my husband won't divorce me, or something of that sort. And you walk away swearing never to gamble again. Except, when you look at this you say, aha! If the game is fair, I am sure to win, and after I win I might as well play again, because I'm sure to win again, and I'm sure to win again, and I'm sure to win again, and so forth, which is all very strange. And we'll see why that is in a little bit.
But first let's look at the case where p is greater than 1/2. In other words, you have a loaded coin and you've talked somebody else, some poor sucker, into playing with you, and you've decided you're going to stop the first time you're ahead. And because p is greater than 1/2, the expected value of your gain each time you play the game is greater than 0. You're adding IID random variables so that eventually your winnings, if you kept on playing forever, would become humongous. So at some point you have to be 1 ahead in this process of getting to the point where you're arbitrarily large amounts ahead. So with probability 1, you will eventually become ahead.
We would like to know, since we know that you're going to be ahead at some point, how long is it going to take you to get ahead? We know that J has to be a random variable in this case, not a defective one, because we know you have to win with probability 1. s sub J equals 1 with probability 1. Namely, your winnings up to the time J. At the time J your winnings are 1. So s sub J is equal to 1, the expected value of s sub J is equal to 1, and Wald says that the expected value of J, the expected time you have to play, is just 1 over x-bar. Isn't that neat? Which says it's 1 over 2p minus 1. The expected value of x is p plus minus 1 times 1 minus p and if you look at that it's 2p minus 1, which is positive when p is bigger than 1/2.
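As a worked equation, using the same symbols as the spoken derivation (the typeset notation is an assumption of this sketch):

```latex
\begin{align*}
\bar{X} &= p \cdot (+1) + (1 - p) \cdot (-1) = 2p - 1 \\
1 = \mathrm{E}[S_J] &= \bar{X}\, \mathrm{E}[J]
\quad\Longrightarrow\quad
\mathrm{E}[J] = \frac{1}{\bar{X}} = \frac{1}{2p - 1}, \qquad p > \tfrac{1}{2}.
\end{align*}
```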
After all the abstraction of Wald's equality it's nice to look at this and solve it in a different way to see if we get the same answer. So we'll do that. If J is equal to 1, that means that on the first flip of the coin you've got heads, you became 1 up and you stopped. So the experiment was over at time 1 if the first toss was a head. So if the first toss was a tail, on the other hand, what has to happen in order for you to win? Well, you're sitting there at minus 1 and you're going to stop when you get to plus 1. Every time you move up one or down one, which means if you're ever going to get from minus 1 up to plus 1, you've got to go through 0 in order to get there.
So let's look at the expected time that it takes to get to 0 for the first time, and then we'll look at the expected time that it takes to get from 0 to 1 for the first time. Well, the expected time that it takes to get from minus 1 to 0 is exactly the same as the expected time that it takes to get from 0 to 1. So, the equation that you then have is that the expected value of J is 1, which is the first step you have to take anyway, plus something more: with probability 1 minus p you went to minus 1, and if you went to minus 1 you still have J-bar left to go to get to 0 again, and another J-bar to get to 1. So J-bar is equal to 1 plus 1 minus p, the probability of failing the first time, times the 2 J-bar it takes to get back up to plus 1. If you analyze this equation, J-bar is equal to 1 plus 2 J-bar minus 2p J-bar, you work that out and it's just 1 over 2p minus 1. So fortunately we get the same answer as we got with Wald's equality.
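A quick simulation gives a third check on the same number. This is a rough sketch under stated assumptions: the function names, the value p = 0.75, the number of runs, the seed, and the `max_trials` safety cap are all illustrative choices, not part of the lecture.

```python
import random

def play_until_ahead(p, rng, max_trials=10_000):
    """Toss a coin with P(heads) = p, winning 1 on heads and losing 1 on tails.
    Return the trial J at which the winnings first reach +1, or None if that
    has not happened within max_trials (a safety cap, assumed here)."""
    winnings = 0
    for n in range(1, max_trials + 1):
        winnings += 1 if rng.random() < p else -1
        if winnings == 1:
            return n
    return None

def estimate_expected_J(p, runs=20_000, seed=0):
    """Average the stopping trial over many independent runs."""
    rng = random.Random(seed)
    stops = [play_until_ahead(p, rng) for _ in range(runs)]
    finished = [j for j in stops if j is not None]
    return sum(finished) / len(finished)

p = 0.75
print("simulated E[J]:  ", estimate_expected_J(p))
print("Wald, 1/(2p - 1):", 1 / (2 * p - 1))
```

For p = 0.75 both the recursion and Wald's equality give an expected stopping trial of 2, and the simulated average should land close to that.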
Let's go on again. Let's look at what happens if you go to the casino and you play a game where you win $1 or lose $1. But the probability in casinos is somehow always tilted to be a little less. If you play for red or black in roulette, what happens is that red or black are equally likely, but there's the 0 and double 0 where the house always wins. So your probability of winning is always just a little less than 1/2, so that's the situation we have here. It's still possible to win and stop. Like if you win on the first toss, then you're going to stop. So J equals 1 with probability p. J equals 3 if you lose, and then you win, and then you win. So J equals 3 with probability p squared times 1 minus p.
And there are lots of other situations you can look at, J equals 5 and see when you win then, and you can count all these things up. Let's try to find a simple way of counting them up. Let's let theta be the probability that J is less than infinity. In other words, theta is the probability that you're ever going to stop. Sometimes in this game you won't ever stop because you're going to lose for a while, and as soon as you lose a fair amount you're very unlikely to ever recover because you're drifting South all the time.
Since you're drifting South, theta is going to be less than 1. So how do we analyze this? Same way as before. Given that J is greater than 1, in other words, given that you lose on the first trial, the event J less than infinity requires that at some time you go from minus 1 back up to 1. In other words, there's some time m at which s sub m minus s sub 1 is equal to 1. m is the time at which you first get from minus 1 back up to 0, and then there's some time that it takes to get from 0 up to 1, and there's some probability that this ever happens. The probability that you ever get from minus 1 up to 0 is equal to theta, the probability you ever get from 0 up to plus 1 is theta again. So theta is equal to p, which is the probability that you win right away, plus the probability that you lose on the first toss, and you ever get from minus 1 to 0, and you ever get from 0 to 1. So theta equals p plus 1 minus p times theta squared. There are two solutions to this. One is that theta is p over 1 minus p and the other is theta equals 1, which we know is impossible. And thus J is defective, and Wald's equality is inapplicable. But the solution that we have, which is useful to know, is that your probability of ever winning is p over 1 minus p. Nice to know. That's what happens when you go to a casino.
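The two solutions can be checked numerically. This is a small sketch, assuming a hypothetical helper name `stopping_probability_roots` and, purely for illustration, p = 18/38 (betting on red on a wheel with 18 red slots out of 38, a number not given in the lecture).

```python
import math

def stopping_probability_roots(p):
    """Solve theta = p + (1 - p) * theta**2 for theta, with 0 < p < 1.
    Rearranged: (1 - p) * theta**2 - theta + p = 0."""
    a, b, c = 1 - p, -1.0, p
    # The discriminant equals (1 - 2p)**2; clamp at 0 to guard against rounding.
    disc = math.sqrt(max(0.0, b * b - 4 * a * c))
    return sorted(((-b - disc) / (2 * a), (-b + disc) / (2 * a)))

p = 18 / 38  # assumed example value, slightly less than 1/2
small, large = stopping_probability_roots(p)
print("roots:      ", small, large)
print("p / (1 - p):", p / (1 - p))
```

For p below 1/2 the smaller root matches p over 1 minus p and the larger root is 1, the one ruled out by the drift argument.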
Finally, let's consider the interesting case which is p equals 1/2. In the limit as p approaches 1/2 from below, the probability that J is less than infinity, which is p over 1 minus p, goes to 1. In other words, in a fair game, and this is not obvious, there's probability 1 that you will eventually get to the point where s sub n is equal to 1. You can sort of see this if you view this in terms of the central limit theorem. And we looked pretty hard at the central limit theorem in the case of these binary experiments. As n gets bigger and bigger, the standard deviation of s sub n grows as the square root of n. Now the standard deviation is growing as the square root of n and over a period of n you're wobbling around. You can see that it doesn't make any sense to have the standard deviation growing with the square root of n if you don't at some point go from the positive half down to the negative half. If you always stay in the positive half after a certain point, then that much time later you're always going to stay in the same half again, and that doesn't work in terms of the square root of n. So you can convince yourself on the basis of a very tedious argument what this very simple argument shows you.
And as soon as we start studying Markov chains with a countably infinite number of states, we're going to analyze this case again. In other words, this is a very interesting situation because something very interesting happens here and we're going to analyze it at least three different ways. So if you don't believe this way, you will believe the other way. And what happens as p approaches 1/2 from above? You know the expected value of J approaches infinity. So what's going to happen at p equals 1/2 is the expected time for you to win goes to infinity, the probability of your winning goes to 1, and Wald's equality does not hold because the expected value of x is equal to 0, and you look at 0 times infinity and it doesn't tell you anything. The expected value of s sub n at every value of n is 0, but s sub J, when the experiment ends, is equal to 1. So there's something very funny going on here. If you think in practical terms, though, it takes an infinite expected time to make your $1.00 and, more important, it requires access to infinite capital. We will see that if you limit your capital, if you limit how far South you can go, then in fact you don't win with probability 1, because you can go there also. Next time we will apply this to renewal processes.
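The effect of limited capital in the fair game can be previewed with a simulation. This is a hedged sketch, not the analysis promised for later: the function names, the capital limit of 9, the number of runs, and the seed are illustrative assumptions. The comparison value c over c plus 1 for the probability of reaching plus 1 before minus c is the standard gambler's-ruin result for a fair game, which is not derived in this lecture.

```python
import random

def play_with_limit(capital, rng):
    """Fair +1/-1 game: stop when winnings reach +1 or fall to -capital.
    Returns (final winnings, number of trials played)."""
    winnings, n = 0, 0
    while -capital < winnings < 1:
        winnings += 1 if rng.random() < 0.5 else -1
        n += 1
    return winnings, n

def summarize(capital, runs=20_000, seed=0):
    """Estimate the win probability, mean gain, and mean number of trials."""
    rng = random.Random(seed)
    results = [play_with_limit(capital, rng) for _ in range(runs)]
    win_fraction = sum(1 for w, _ in results if w == 1) / runs
    mean_gain = sum(w for w, _ in results) / runs
    mean_trials = sum(n for _, n in results) / runs
    return win_fraction, mean_gain, mean_trials

capital = 9
win_fraction, mean_gain, mean_trials = summarize(capital)
print("fraction of runs ending at +1:", win_fraction)
print("c / (c + 1):                  ", capital / (capital + 1))
print("mean gain at stopping:        ", mean_gain)
print("mean number of trials:        ", mean_trials)
```

With the capital limit in place the expected stopping trial is finite, so Wald's equality applies again and the mean gain at stopping should be close to 0, the expected value of x times the expected number of plays.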