WEBVTT
00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative
00:00:02.960 --> 00:00:04.370
Commons license.
00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to
00:00:07.410 --> 00:00:11.060
offer high quality educational
resources for free.
00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from
00:00:13.960 --> 00:00:17.890
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:17.890 --> 00:00:23.400
ocw.mit.edu
00:00:23.400 --> 00:00:25.290
PROFESSOR: OK, we have
a busy day today,
00:00:25.290 --> 00:00:26.540
so let's get started.
00:00:32.580 --> 00:00:36.310
Want to go through Chernoff
bounds and the Wald identity,
00:00:36.310 --> 00:00:42.770
which are closely related, as
you'll see, and that involves
00:00:42.770 --> 00:00:47.220
coming back to the G/G/1 queue a
little bit and making use of
00:00:47.220 --> 00:00:49.290
what we did for that.
00:00:49.290 --> 00:00:54.320
It also means coming
back to hypothesis
00:00:54.320 --> 00:00:57.140
testing and using that.
00:00:57.140 --> 00:00:59.100
It would probably have been better
to start out with
00:00:59.100 --> 00:01:04.319
Wald's identity and the Chernoff
bound and then do the
00:01:04.319 --> 00:01:08.620
applications when it was at
the natural time for them.
00:01:08.620 --> 00:01:12.730
But anyway, this is the way it
is this time, and next time
00:01:12.730 --> 00:01:14.080
we'll probably do
it differently.
00:01:16.610 --> 00:01:19.580
Suppose you have a random
variable z.
00:01:19.580 --> 00:01:21.520
It has a moment generating
function.
00:01:21.520 --> 00:01:24.200
Remember, not all random
variables have moment
00:01:24.200 --> 00:01:25.250
generating functions.
00:01:25.250 --> 00:01:27.810
It's a pretty strong
restriction.
00:01:27.810 --> 00:01:28.970
You need a variance.
00:01:28.970 --> 00:01:31.570
You need moments
of all orders.
00:01:31.570 --> 00:01:35.100
You need all sorts of things,
but we'll assume it exists
00:01:35.100 --> 00:01:41.010
in some region between
r minus and r plus.
00:01:41.010 --> 00:01:43.310
There's always a question,
with moment generating
00:01:43.310 --> 00:01:49.240
functions, if they exist up
to some maximum value of r
00:01:49.240 --> 00:01:53.490
because some of them exist at
that value of r and then
00:01:53.490 --> 00:01:59.650
disappear immediately after
that, and others just sort of
00:01:59.650 --> 00:02:08.210
peter out as r approaches
r plus from below.
00:02:08.210 --> 00:02:10.750
I think in the homework this
week, you have an example of
00:02:10.750 --> 00:02:12.030
both of those.
00:02:12.030 --> 00:02:13.900
I mean, it's a very
simple issue.
00:02:13.900 --> 00:02:18.000
If you have an exponential
distribution, then as r
00:02:18.000 --> 00:02:23.900
approaches the rate of that
exponential distribution,
00:02:23.900 --> 00:02:26.620
obviously, the moment generating
function blows up
00:02:26.620 --> 00:02:30.700
because you're taking e to the
minus lambda x, and you're
00:02:30.700 --> 00:02:33.390
multiplying it by e to the r x.
00:02:33.390 --> 00:02:39.280
And when r is equal to lambda,
bingo, you're integrating 1
00:02:39.280 --> 00:02:42.290
over an infinite range, so
you've got infinity.
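[A quick numerical sketch of this point, added for concreteness; the rate lambda = 1, the truncation point, and the grid are illustrative choices, not from the lecture.]

```python
import math

def mgf_exponential(r, lam=1.0):
    # Closed form of the integral of e^{rx} * lam * e^{-lam x} over x >= 0.
    # It converges only for r < lam, and blows up as r approaches lam.
    assert r < lam, "MGF of an exponential only exists for r < lambda"
    return lam / (lam - r)

def mgf_numeric(r, lam=1.0, T=200.0, steps=200_000):
    # Brute-force left Riemann sum of the same integral, truncated at T.
    dx = T / steps
    return sum(math.exp(r * i * dx) * lam * math.exp(-lam * i * dx) * dx
               for i in range(steps))
```

As r moves up toward lambda, the closed form lam / (lam - r) grows without bound, which is exactly the blow-up being described.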
00:02:42.290 --> 00:02:46.150
If you multiply that exponential
by something which
00:02:46.150 --> 00:02:52.360
makes the integral finite when
you set r equal to lambda,
00:02:52.360 --> 00:02:53.900
then of course, you have
something which
00:02:53.900 --> 00:02:57.460
is finite at r plus.
00:02:57.460 --> 00:02:59.450
That is a big pain
in the neck.
00:02:59.450 --> 00:03:01.570
It's usually not important.
00:03:01.570 --> 00:03:06.240
The notes deal with it very
carefully, so we're not going
00:03:06.240 --> 00:03:07.100
to deal with it here.
00:03:07.100 --> 00:03:11.210
We will just assume here that
we're talking about r less
00:03:11.210 --> 00:03:16.040
than r plus and not worry about
that special case, which
00:03:16.040 --> 00:03:17.690
usually is not all
that important.
00:03:17.690 --> 00:03:19.940
But sometimes you have
to worry about it.
00:03:19.940 --> 00:03:23.610
OK, the Chernoff bound says
that the probability that
00:03:23.610 --> 00:03:28.140
a random variable z is greater than
or equal to alpha is less
00:03:28.140 --> 00:03:32.060
than or equal to the moment
generating function evaluated
00:03:32.060 --> 00:03:37.690
at some arbitrary value r times
e to the minus r alpha.
00:03:37.690 --> 00:03:41.700
And if you put it in terms of
the semi invariant moment
00:03:41.700 --> 00:03:44.440
generating function, the log
of the moment generating
00:03:44.440 --> 00:03:48.040
function, then the bound
is e to the gamma z of
00:03:48.040 --> 00:03:51.100
r minus alpha r.
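[As a numerical illustration of the bound just stated; the Poisson distribution and the parameters here are my choice of example, not the lecture's. For any r > 0 where the MGF exists, the tail is dominated by e^{gamma(r) - r alpha}.]

```python
import math

def poisson_tail(lam, alpha):
    # Exact P(Z >= alpha) for Z ~ Poisson(lam), alpha a positive integer.
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k)
                     for k in range(alpha))

def chernoff_bound(lam, alpha, r):
    # e^{gamma(r) - r*alpha}, where gamma(r) = lam*(e^r - 1) is the
    # semi invariant MGF of a Poisson(lam) random variable.
    return math.exp(lam * (math.exp(r) - 1.0) - r * alpha)
```

The bound holds for every r in the region of existence; optimizing over r (here r = ln(alpha/lam)) gives the tightest version.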
00:03:54.350 --> 00:03:57.410
When you see something like
that, you ought to look at it
00:03:57.410 --> 00:04:00.540
and say, gee, that looks funny
because here, we're taking an
00:04:00.540 --> 00:04:04.750
arbitrary random variable and
saying the tails of it have to
00:04:04.750 --> 00:04:07.370
go down exponentially.
00:04:07.370 --> 00:04:09.090
That's exactly what this says.
00:04:09.090 --> 00:04:13.560
It bounds how often z takes
on very large values.
00:04:13.560 --> 00:04:18.110
This is a fixed quantity here
for a given value of r, and
00:04:18.110 --> 00:04:21.700
it's going down as e to the
minus r times alpha.
00:04:21.700 --> 00:04:25.030
As you make alpha larger and
larger, this goes down faster
00:04:25.030 --> 00:04:25.620
and faster.
00:04:25.620 --> 00:04:26.790
So what's going on?
00:04:26.790 --> 00:04:30.090
How do you take an arbitrary
random variable and say the
00:04:30.090 --> 00:04:34.140
tails of it are exponentially
decreasing?
00:04:34.140 --> 00:04:37.510
That's why you have to insist
that the moment generating
00:04:37.510 --> 00:04:41.450
function exists because when the
moment generating function
00:04:41.450 --> 00:04:44.300
exists for some r, it means
that the tail of that
00:04:44.300 --> 00:04:48.870
distribution is, in fact, going
down at least that fast,
00:04:48.870 --> 00:04:50.690
so you get something
that exists.
00:04:50.690 --> 00:04:55.310
So the question is what's the
best bound of this sort of
00:04:55.310 --> 00:04:58.080
when you optimize o for r?
00:04:58.080 --> 00:05:02.650
Then the next thing we did is
we said that if z is a sum of n
00:05:02.650 --> 00:05:07.000
IID random variables, then the semi invariant
moment generating function for
00:05:07.000 --> 00:05:12.480
that sum is equal to n times
the semi invariant moment
00:05:12.480 --> 00:05:17.722
generating function for the
underlying random variable x.
00:05:17.722 --> 00:05:20.840
s sub n is the sum of n of these
IID random variables.
00:05:20.840 --> 00:05:24.180
So one thing you see
immediately, and ought to be
00:05:24.180 --> 00:05:28.890
second nature to you now, is
that if a random variable has
00:05:28.890 --> 00:05:32.480
a moment generating function
over some range, the sum of a
00:05:32.480 --> 00:05:36.070
bunch of those IID random
variables also has a moment
00:05:36.070 --> 00:05:39.200
generating function over
that same range.
00:05:39.200 --> 00:05:42.120
You can just count on that
because the semi invariant
00:05:42.120 --> 00:05:46.000
moment generating function
is just n times this.
00:05:46.000 --> 00:05:49.640
OK, so then what we've said is
the probability that sn is
00:05:49.640 --> 00:05:53.170
greater than or equal to na,
where na is playing the role
00:05:53.170 --> 00:05:57.640
of alpha and sn is playing the
role of z, is just a minimum
00:05:57.640 --> 00:06:04.850
over r of e to the n times gamma
x of r minus ra, and the
00:06:04.850 --> 00:06:08.540
n is multiplying the ra
as well as the n.
00:06:08.540 --> 00:06:12.100
OK, this is exponential
in n for a fixed a.
00:06:12.100 --> 00:06:15.820
In other words, what you do in
this minimization, if you
00:06:15.820 --> 00:06:18.780
don't worry about the special
cases or anything, how do you
00:06:18.780 --> 00:06:19.860
minimize something?
00:06:19.860 --> 00:06:25.590
Well, obviously, you want to
minimize the exponent here, so
00:06:25.590 --> 00:06:30.210
you take the derivative of this
exponent, and gamma prime of r has to
00:06:30.210 --> 00:06:32.330
be equal to a.
00:06:32.330 --> 00:06:35.690
Then n can be whatever it wants
to be when you find that
00:06:35.690 --> 00:06:40.240
optimum r, which is where gamma
prime of r equals a,
00:06:40.240 --> 00:06:42.610
then this goes down
exponentially with n.
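[The minimization just described can be sketched numerically; the Bernoulli step and the search grid below are mine, not the lecture's. The grid minimizer of gamma(r) - r a should land where gamma prime of r equals a.]

```python
import math

P = 0.3  # illustrative Bernoulli parameter

def gamma(r):
    return math.log(1 - P + P * math.exp(r))

def gamma_prime(r):
    return P * math.exp(r) / (1 - P + P * math.exp(r))

def best_exponent(a):
    # Grid-search min over r > 0 of gamma(r) - r*a; the Chernoff bound on
    # P(S_n >= n*a) is then e^{n * exponent}.
    rs = (i * 1e-4 for i in range(1, 50_000))
    r_opt = min(rs, key=lambda r: gamma(r) - r * a)
    return r_opt, gamma(r_opt) - r_opt * a
```

At the minimizer the derivative condition gamma prime of r = a holds, and the resulting exponent is negative, so the bound decays exponentially in n.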
00:06:45.590 --> 00:06:48.860
Now, however, we're interested
in something else.
00:06:48.860 --> 00:06:51.130
We're interested in threshold
crossings.
00:06:51.130 --> 00:06:56.800
We're not interested in picking
a particular value of
00:06:56.800 --> 00:07:00.120
a and asking, as n gets very,
very big, what's the
00:07:00.120 --> 00:07:03.150
probability that the sum of
random variable is greater
00:07:03.150 --> 00:07:05.460
than or equal to n times a.
00:07:05.460 --> 00:07:08.760
That is exponential in n, but
what we're interested in is
00:07:08.760 --> 00:07:12.440
the probability that s of n is
greater than or equal to just
00:07:12.440 --> 00:07:16.300
some constant alpha, and what
we're doing, now, is instead
00:07:16.300 --> 00:07:20.850
of varying n and varying
this with n also,
00:07:20.850 --> 00:07:22.030
we're holding this fixed.
00:07:22.030 --> 00:07:26.420
So we're asking as n gets very,
very large, but you hold
00:07:26.420 --> 00:07:30.340
this alpha fixed, what happens
on this bound over here?
00:07:30.340 --> 00:07:34.030
Well, when you minimize this,
taking the same simple-minded
00:07:34.030 --> 00:07:38.090
view, now the n is not
multiplied by the ra.
00:07:38.090 --> 00:07:41.480
It's just multiplied
by the gamma x.
00:07:41.480 --> 00:07:45.790
You get n times gamma prime of
r equal to alpha, which is where
00:07:45.790 --> 00:07:51.610
the minimum is, so it says
gamma prime of r is optimized
00:07:51.610 --> 00:07:55.270
when you pick gamma prime of r
equal to alpha over n.
00:07:55.270 --> 00:08:00.460
This quantity is minimized when
you pick gamma prime of r
00:08:00.460 --> 00:08:01.760
equal to alpha over n.
00:08:01.760 --> 00:08:06.200
So if you look at this bound as
n changes, what's happening
00:08:06.200 --> 00:08:11.710
is, as n changes, r is changing
also, so this is a
00:08:11.710 --> 00:08:15.360
harder thing to deal with
for variable n.
00:08:15.360 --> 00:08:19.580
But graphically, it's quite
easy to deal with.
00:08:19.580 --> 00:08:22.885
I'm not sure you all got the
graphical argument last time
00:08:22.885 --> 00:08:26.570
when we went through it, so I
want to go through it again.
00:08:26.570 --> 00:08:32.409
Let's look at this exponent r
minus n over alpha times gamma
00:08:32.409 --> 00:08:36.730
of r, and see what
it looks like.
00:08:36.730 --> 00:08:42.000
We'll take r, pick
any old r, there.
00:08:42.000 --> 00:08:48.530
What we want to do is show that
this, if you take a slope
00:08:48.530 --> 00:08:56.010
of alpha over n, and take an
arbitrary r, come down to
00:08:56.010 --> 00:09:00.620
gamma sub x of r, draw a line
of this slope, and look at
00:09:00.620 --> 00:09:06.350
where it hits the horizontal
axis here, that point is r
00:09:06.350 --> 00:09:09.750
plus the length of
this line here.
00:09:09.750 --> 00:09:13.410
The length of this line here
is gamma of r, that's a
00:09:13.410 --> 00:09:18.930
negative value, times 1 over
the slope of this line.
00:09:18.930 --> 00:09:24.830
And 1 over the slope of this
line is n over alpha, so when
00:09:24.830 --> 00:09:28.730
I pick a particular value of r,
the value of the experiment
00:09:28.730 --> 00:09:31.100
I have is this value here.
00:09:36.930 --> 00:09:39.570
How do I optimize this over r?
00:09:39.570 --> 00:09:42.340
How do I get the largest
exponent here?
00:09:42.340 --> 00:09:48.560
Well, I think of varying r, as
I vary r from 0, and each
00:09:48.560 --> 00:09:51.280
time, I take this straight
line here.
00:09:51.280 --> 00:09:54.530
And I start here, draw a
straight line over there,
00:09:54.530 --> 00:09:57.770
start here, draw a straight
line over, start at this
00:09:57.770 --> 00:10:00.320
tangent here, draw a
straight line over.
00:10:00.320 --> 00:10:03.890
And what happens when I come
to larger values of r?
00:10:03.890 --> 00:10:09.300
Just because gamma sub x of r
convex, what happens is I
00:10:09.300 --> 00:10:17.500
start taking these slope lines,
slope alpha over n, and
00:10:17.500 --> 00:10:21.420
they intercept the horizontal
axis at a smaller value.
00:10:21.420 --> 00:10:28.020
So this is optimized over r
at the value of r0, which
00:10:28.020 --> 00:10:32.790
satisfies alpha over n equals
gamma prime of r0.
00:10:32.790 --> 00:10:35.870
That's the same answer we got
before when we just used
00:10:35.870 --> 00:10:38.370
elementary calculus.
00:10:38.370 --> 00:10:41.330
Here, we're using a more
sophisticated argument, which
00:10:41.330 --> 00:10:44.600
you learned about probably
in 10th grade.
00:10:44.600 --> 00:10:47.540
I would argue that you learned
mostly really sophisticated
00:10:47.540 --> 00:10:51.450
things when you're in high
school, and then when you get
00:10:51.450 --> 00:10:55.370
to study engineering in college,
somehow you always
00:10:55.370 --> 00:10:57.130
study these mundane things.
00:10:57.130 --> 00:11:01.740
But anyway, aside from
that, why is this
00:11:01.740 --> 00:11:04.330
geometric argument better?
00:11:04.330 --> 00:11:07.630
Well, when you look at these
special cases of what happens
00:11:07.630 --> 00:11:13.650
when gamma of r comes around
like this, and then suddenly
00:11:13.650 --> 00:11:17.350
it stops in midair and just
doesn't exist anymore?
00:11:17.350 --> 00:11:21.230
So it comes around here, it's
still convex, but then
00:11:21.230 --> 00:11:23.470
suddenly it goes off
to infinity.
00:11:23.470 --> 00:11:25.580
How do you do that optimization
then?
00:11:25.580 --> 00:11:29.050
Well, the graphical argument
makes it clear how you do it,
00:11:29.050 --> 00:11:32.060
and makes it perfectly rigorous
how to do it, whereas
00:11:32.060 --> 00:11:34.310
if you're doing it by calculus,
you've got a really
00:11:34.310 --> 00:11:39.270
think it through, and it
becomes fairly tricky.
00:11:39.270 --> 00:11:45.040
OK, so anyway, now, the next
question we want to ask--
00:11:47.810 --> 00:11:51.040
I mean, at this point, we've
seen how to minimize this
00:11:51.040 --> 00:11:56.490
quantity over r, so we know what
this exponent is for a
00:11:56.490 --> 00:11:58.800
particular value of n.
00:11:58.800 --> 00:12:01.690
Now, what happens
when we vary n?
00:12:01.690 --> 00:12:05.500
As you vary n, the thing that
happens is we have this
00:12:05.500 --> 00:12:10.010
tangent line here, a
slope alpha over n.
00:12:10.010 --> 00:12:14.650
When you start making n larger,
alpha over n becomes
00:12:14.650 --> 00:12:18.850
smaller, so the slope
becomes smaller.
00:12:18.850 --> 00:12:24.210
And as n approaches infinity,
you wind up going way,
00:12:24.210 --> 00:12:26.270
way the heck out.
00:12:26.270 --> 00:12:29.700
As n gets smaller, you
come in again.
00:12:29.700 --> 00:12:33.110
You keep coming in until you
get to this point here.
00:12:33.110 --> 00:12:34.130
And what happens then?
00:12:34.130 --> 00:12:38.010
We're talking about
a line of--
00:12:38.010 --> 00:12:39.570
maybe I ought to draw
it on the board.
00:12:39.570 --> 00:12:41.370
It would be clearer, I think.
00:12:57.850 --> 00:13:02.570
As n gets smaller, you
get a point which is
00:13:02.570 --> 00:13:05.510
tangent here, this here.
00:13:05.510 --> 00:13:12.000
When you're here, the tangent
gets right here, so we've
00:13:12.000 --> 00:13:15.970
moved all the way into this
quantity we call r star, which
00:13:15.970 --> 00:13:20.520
is the root of the equation
gamma of r equals 0.
00:13:20.520 --> 00:13:24.230
Gamma of r equals 0 typically
has two roots, one
00:13:24.230 --> 00:13:27.920
here, and one at 0.
00:13:27.920 --> 00:13:31.220
It always has a root at 0
because the moment generating
00:13:31.220 --> 00:13:36.010
function evaluated at 0 is
always 1, so the log of
00:13:36.010 --> 00:13:37.950
it is always 0.
00:13:37.950 --> 00:13:41.500
There should be another root
because this is convex, unless
00:13:41.500 --> 00:13:45.610
it drops off suddenly, and even
if it drops off suddenly,
00:13:45.610 --> 00:13:48.270
you can visualize it
as a straight line
00:13:48.270 --> 00:13:50.450
going off to infinity.
00:13:50.450 --> 00:13:54.240
So when you get down to this
point, what happens?
00:13:54.240 --> 00:13:55.845
Well, we just keep
moving along.
00:14:04.340 --> 00:14:09.110
So as n increases, we start
out very large.
00:14:09.110 --> 00:14:10.140
We come in.
00:14:10.140 --> 00:14:14.260
We hit this point, and then
we start coming out again.
00:14:14.260 --> 00:14:18.040
I mean, if you think about it,
that makes perfect sense
00:14:18.040 --> 00:14:21.580
because what we're doing here is
we're imagining experiment
00:14:21.580 --> 00:14:27.150
where this random variable has
a negative expected value.
00:14:27.150 --> 00:14:33.060
That's what's indicated by
this quantity there.
00:14:33.060 --> 00:14:36.010
We're asking what's the
probability that the sum of a
00:14:36.010 --> 00:14:40.260
large number of IID random
variables with a negative
00:14:40.260 --> 00:14:44.480
expected value ever rises above
some positive threshold?
00:14:47.200 --> 00:14:50.000
Well, the law of large numbers
says it's not going to do that
00:14:50.000 --> 00:14:54.100
when n is very, very large,
and this says that, too.
00:14:54.100 --> 00:14:57.070
It says the probability of
it for n very large is
00:14:57.070 --> 00:14:58.910
extraordinarily small.
00:14:58.910 --> 00:15:02.860
It's e to the minus n times
an exponent, which is very,
00:15:02.860 --> 00:15:04.660
very large.
00:15:04.660 --> 00:15:09.660
So as n gets very small, it's
not going to happen either
00:15:09.660 --> 00:15:12.780
because it doesn't have time
to get to the threshold.
00:15:12.780 --> 00:15:15.910
So there's some intermediate
value at which it's most
00:15:15.910 --> 00:15:18.950
likely to cross the threshold,
if you're going to cross the
00:15:18.950 --> 00:15:22.460
threshold, and that intermediate
value is just
00:15:22.460 --> 00:15:28.960
that value at which gamma of
r star is equal to zero.
00:15:31.640 --> 00:15:34.670
So the probability this union
of terms, namely the
00:15:34.670 --> 00:15:38.310
probability you ever cross
alpha, is going to be, in some
00:15:38.310 --> 00:15:45.210
sense, approximately e to the
minus alpha r star because
00:15:45.210 --> 00:15:47.130
that's where the dominant
term is.
00:15:47.130 --> 00:15:50.060
The dominant term is where
alpha over n is
00:15:50.060 --> 00:15:52.180
equal to gamma prime.
00:15:52.180 --> 00:15:56.030
Blah, blah, blah, blah, blah,
where'd I put that?
00:15:56.030 --> 00:16:00.130
r star satisfies gamma
of r star equals 0.
00:16:00.130 --> 00:16:06.650
When you look at the line of
slope gamma prime
00:16:06.650 --> 00:16:12.860
of r star, that's where you get
this critical value of n
00:16:12.860 --> 00:16:17.396
where it's most likely to
cross the threshold.
00:16:17.396 --> 00:16:18.900
OK, I put that somewhere.
00:16:18.900 --> 00:16:24.410
I thought it was on this slide,
but it's the n, the
00:16:24.410 --> 00:16:32.195
critical n, let's call it n
crit, is equal to gamma prime.
00:16:41.240 --> 00:16:42.930
Is that right?
00:16:42.930 --> 00:16:45.720
Alpha over n is the
gamma prime.
00:16:45.720 --> 00:16:51.730
Alpha over n, 1 over n crit.
00:16:51.730 --> 00:17:02.830
n crit, this says, is alpha over
gamma prime of r star.
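[Putting those two relations in code, for a concrete +1/-1 step distribution of my own choosing: up with probability 0.3, so the drift is negative.]

```python
import math

P_UP = 0.3  # illustrative: X = +1 w.p. 0.3, X = -1 w.p. 0.7, so E[X] < 0

def gamma(r):
    # Semi invariant MGF of the +1/-1 step.
    return math.log(P_UP * math.exp(r) + (1 - P_UP) * math.exp(-r))

def gamma_prime(r):
    g = P_UP * math.exp(r) + (1 - P_UP) * math.exp(-r)
    return (P_UP * math.exp(r) - (1 - P_UP) * math.exp(-r)) / g

def r_star():
    # Bisect for the positive root of gamma(r) = 0: gamma starts negative
    # (negative drift) and eventually turns positive, so one root exists.
    lo, hi = 1e-9, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gamma(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def n_crit(alpha):
    # The most likely crossing time: alpha over gamma prime of r star.
    return alpha / gamma_prime(r_star())
```

For this step distribution r star works out to ln(7/3) and gamma prime of r star to 0.4, so a threshold at alpha = 20 is most likely to be crossed around n = 50.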
00:17:02.830 --> 00:17:09.670
OK, so that sort of nails down
everything you want to know
00:17:09.670 --> 00:17:12.640
about the Chernoff bound except
for the fact that it is
00:17:12.640 --> 00:17:14.270
exponentially tight.
00:17:14.270 --> 00:17:15.780
The text proves that.
00:17:15.780 --> 00:17:17.780
I'm not going to go
through that here.
00:17:17.780 --> 00:17:21.480
Exponentially tight means, if
you take an exponent which is
00:17:21.480 --> 00:17:25.260
just a little bit larger than
the one you found here, and
00:17:25.260 --> 00:17:27.660
look what happens as alpha
gets very, very
00:17:27.660 --> 00:17:31.590
large, then you lose.
00:17:31.590 --> 00:17:37.820
OK, let's go on, and at this
point, we're ready to talk
00:17:37.820 --> 00:17:40.510
about Wald's identity.
00:17:40.510 --> 00:17:43.600
And we'll prove Wald's identity
at the end of the
00:17:43.600 --> 00:17:45.510
lecture today.
00:17:45.510 --> 00:17:48.010
Turns out there's a very,
very simple proof of it.
00:17:48.010 --> 00:17:55.800
There's hardly anything to it,
but it seems more important to
00:17:55.800 --> 00:17:59.960
use it in several ways first so
that you get a sense that
00:17:59.960 --> 00:18:03.440
it, in fact, is sort
of important.
00:18:03.440 --> 00:18:09.910
OK, so we want to think about
a random walk, s sub n for n
00:18:09.910 --> 00:18:12.840
greater than or equal to 1, so
it's a sequence of sums of
00:18:12.840 --> 00:18:16.560
random variables, s sub
n is equal to x1
00:18:16.560 --> 00:18:19.150
plus up to x sub n.
00:18:19.150 --> 00:18:21.480
The x's are all IID.
00:18:21.480 --> 00:18:24.330
This is the thing we've been
talking about all term.
00:18:24.330 --> 00:18:27.040
We have a bunch of IID
random variables.
00:18:27.040 --> 00:18:29.600
We look at the partial
sums of them.
00:18:29.600 --> 00:18:32.220
We're interested in what happens
to that sequence of
00:18:32.220 --> 00:18:33.570
partial sums.
00:18:33.570 --> 00:18:36.940
The question we're asking here
is does that sequence of
00:18:36.940 --> 00:18:41.880
partial sums ever cross
a positive threshold?
00:18:41.880 --> 00:18:44.670
And now we're asking does
it ever cross a positive
00:18:44.670 --> 00:18:49.170
threshold, or does it cross a
negative threshold, and which
00:18:49.170 --> 00:18:51.230
does it cross first?
00:18:51.230 --> 00:18:54.690
So the probability that it
crosses this threshold is the
00:18:54.690 --> 00:18:57.340
probability that it
goes up, first.
00:18:57.340 --> 00:19:00.190
The probability that it crosses
this threshold is the
00:19:00.190 --> 00:19:03.140
probability that it
goes down, first.
00:19:03.140 --> 00:19:06.360
Now, what Wald's identity says
is the following thing.
00:19:06.360 --> 00:19:09.480
We're going to assume that
x is not identically 0.
00:19:09.480 --> 00:19:12.230
If x is identically
0, then it's never
00:19:12.230 --> 00:19:14.400
going to go any place.
00:19:14.400 --> 00:19:17.070
We're going to assume that it
has a semi invariant moment
00:19:17.070 --> 00:19:22.870
generating function in some
region, r minus to r plus.
00:19:22.870 --> 00:19:25.240
That's the same as assuming
that it has a generating
00:19:25.240 --> 00:19:31.230
function in that region, so it
exists from some value less
00:19:31.230 --> 00:19:35.060
than zero to some value
greater than zero.
00:19:35.060 --> 00:19:38.660
And we picked two thresholds,
one of them positive, one of
00:19:38.660 --> 00:19:44.980
them negative, and we let j be
the smallest value of n. j is
00:19:44.980 --> 00:19:49.790
a random variable, now, because
we've started to run
00:19:49.790 --> 00:19:51.070
this random walk.
00:19:51.070 --> 00:19:54.740
We run it until it crosses one
of these thresholds, and if it
00:19:54.740 --> 00:19:57.780
crosses the positive threshold,
j is the time at
00:19:57.780 --> 00:20:00.250
which it crosses the
positive threshold.
00:20:00.250 --> 00:20:03.470
If it crosses the negative
threshold, j is the time that
00:20:03.470 --> 00:20:05.390
it crosses the negative
threshold.
00:20:05.390 --> 00:20:07.150
We're only looking at the first
00:20:07.150 --> 00:20:08.400
threshold that it crosses.
00:20:10.990 --> 00:20:14.370
Now, notice that j is
a stopping trial.
00:20:14.370 --> 00:20:17.270
In other words, what that means
is you can determine
00:20:17.270 --> 00:20:21.720
whether you've crossed a
threshold at time n solely in
00:20:21.720 --> 00:20:26.040
terms of s1 up to s sub n.
00:20:26.040 --> 00:20:30.330
If you see all these sums, then
you know that you haven't
00:20:30.330 --> 00:20:33.170
crossed a threshold
up until time n.
00:20:33.170 --> 00:20:35.160
You know you have crossed
it at time n.
00:20:35.160 --> 00:20:37.990
Doesn't make any difference
what happens at times
00:20:37.990 --> 00:20:39.700
greater than n.
00:20:39.700 --> 00:20:42.720
OK, so it's a stopping trial
in the same sense as the
00:20:42.720 --> 00:20:45.330
stopping trials we talked
about before.
00:20:45.330 --> 00:20:48.720
You get the sense that Wald's
identity, which we're talking
00:20:48.720 --> 00:20:52.020
about here, is sort of like
Wald's equality, which we
00:20:52.020 --> 00:20:53.810
talked about before.
00:20:53.810 --> 00:20:57.620
Both of them have to do with
these stopping trials.
00:20:57.620 --> 00:21:01.000
Both of them have everything
to do with stopping trials.
00:21:01.000 --> 00:21:05.390
Wald was a famous statistician,
not all that
00:21:05.390 --> 00:21:09.030
much before your era.
00:21:09.030 --> 00:21:10.910
He didn't die too long ago.
00:21:10.910 --> 00:21:15.010
I forget when, but he was one
of the good statisticians.
00:21:15.010 --> 00:21:19.230
See, he was a statistician who
recognized that you wanted to
00:21:19.230 --> 00:21:21.850
look at lots of different
models to understand the
00:21:21.850 --> 00:21:24.830
problem, rather than a
statistician who only wanted
00:21:24.830 --> 00:21:30.040
to take data and think that he
wasn't assuming anything.
00:21:30.040 --> 00:21:32.630
So Wald was a good guy.
00:21:32.630 --> 00:21:37.530
And what his identity says is,
and the trouble with his
00:21:37.530 --> 00:21:42.070
identity, is you look at
it, and you blink.
00:21:42.070 --> 00:21:47.950
The expected value of
e to the r s sub j minus
00:21:47.950 --> 00:21:53.520
j times gamma of r. s sub j is
the value of the random walk
00:21:53.520 --> 00:21:57.470
at the time when you cross a
threshold, and j is the time
00:21:57.470 --> 00:22:01.510
at which you've crossed
the threshold.
00:22:01.510 --> 00:22:05.670
So when you take the expected
value of e to the this, you're
00:22:05.670 --> 00:22:10.410
averaging over j, the time
that you crossed the
00:22:10.410 --> 00:22:14.400
threshold, and also at the value
at which you crossed the
00:22:14.400 --> 00:22:17.500
threshold, so you're averaging
over both of these things.
00:22:17.500 --> 00:22:21.900
And Wald says this expectation
is not just less than or equal to
00:22:21.900 --> 00:22:27.870
1, but it's exactly 1, and it's
exactly 1 for every r
00:22:27.870 --> 00:22:31.700
between r minus and r plus.
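[A simulation sketch of that claim at the particular point r = r star, where gamma(r) = 0; the +1/-1 step with p = 0.3 and the thresholds at plus and minus 5 are illustrative choices. The sample average of e^{r star times s sub j} should come out near 1.]

```python
import math
import random

P_UP = 0.3                            # illustrative +1/-1 walk, negative drift
R_STAR = math.log((1 - P_UP) / P_UP)  # root of gamma(r) = 0 for this step
ALPHA, BETA = 5, -5                   # the two thresholds

def run_walk(rng):
    # Run until the walk first crosses either threshold; return S_J.
    s = 0
    while BETA < s < ALPHA:
        s += 1 if rng.random() < P_UP else -1
    return s

def wald_estimate(trials=100_000, seed=0):
    # Monte Carlo estimate of E[exp(r* S_J - J gamma(r*))].  Since
    # gamma(r*) = 0, the J term drops out, leaving E[exp(r* S_J)].
    rng = random.Random(seed)
    return sum(math.exp(R_STAR * run_walk(rng)) for _ in range(trials)) / trials
```

With +1/-1 steps and integer thresholds there is no overshoot, so at r star the identity holds exactly rather than approximately.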
00:22:31.700 --> 00:22:34.690
So it's a very surprising
result.
00:22:34.690 --> 00:22:34.910
Yes?
00:22:34.910 --> 00:22:35.862
AUDIENCE: Can you
please explain
00:22:35.862 --> 00:22:37.670
why j cannot be defective?
00:22:37.670 --> 00:22:38.710
I don't really see it.
00:22:38.710 --> 00:22:42.160
PROFESSOR: Oh, it's because
we were looking at two
00:22:42.160 --> 00:22:43.400
thresholds.
00:22:43.400 --> 00:22:46.790
If we only had one threshold,
then it could be defective.
00:22:46.790 --> 00:22:49.700
Since we're looking at two
thresholds, you keep adding
00:22:49.700 --> 00:22:53.510
random variables in, and the
sum starts to have a larger
00:22:53.510 --> 00:22:55.650
and larger variance.
00:22:55.650 --> 00:22:58.250
Now, even with a large variance,
you're not sure that
00:22:58.250 --> 00:23:01.090
you crossed a threshold,
but you see why you
00:23:01.090 --> 00:23:03.370
must cross a threshold.
00:23:03.370 --> 00:23:04.022
Yes?
00:23:04.022 --> 00:23:06.475
AUDIENCE: If the MGF is defined
at r minus, r plus,
00:23:06.475 --> 00:23:10.280
then is that also [INAUDIBLE]
quality?
00:23:10.280 --> 00:23:12.690
PROFESSOR: Yes.
00:23:12.690 --> 00:23:15.585
Oh, if it's defined at r plus.
00:23:19.240 --> 00:23:19.495
I don't know.
00:23:19.495 --> 00:23:24.060
I don't remember, and I would
have to think about it hard.
00:23:27.830 --> 00:23:32.225
Funny things happen right at the
ends of where these moment
00:23:32.225 --> 00:23:35.070
generating functions are
defined, and you'll see why
00:23:35.070 --> 00:23:36.320
when we prove it.
00:23:39.650 --> 00:23:42.450
I can give you a clue as to how
we're going to prove it.
00:23:42.450 --> 00:23:48.860
What we're going to do is, for
this random variable x, we're
00:23:48.860 --> 00:23:52.780
going to define another random
variable which has the same
00:23:52.780 --> 00:23:58.070
distribution as x except
it's tilted.
00:23:58.070 --> 00:24:03.250
For large values of x, you
multiply it by e to the rx.
00:24:03.250 --> 00:24:05.100
For small values of
x, you multiply it
00:24:05.100 --> 00:24:07.480
by e to the rx also.
00:24:07.480 --> 00:24:10.960
But if r is positive, that
means the positive values
00:24:10.960 --> 00:24:17.000
get shifted up, and the small
values get shifted down.
00:24:20.280 --> 00:24:23.560
So you're taking some of the
density that looks like this,
00:24:23.560 --> 00:24:29.640
and when you shift it to this
tilted value, you're shifting
00:24:29.640 --> 00:24:32.190
the whole thing upward.
00:24:32.190 --> 00:24:34.620
When r is negative,
you're shifting
00:24:34.620 --> 00:24:36.330
the whole thing downward.
00:24:36.330 --> 00:24:43.220
Now, what this says is that
tilted random variable, when
00:24:43.220 --> 00:24:47.130
it crosses the threshold, the
time of crossing the threshold
00:24:47.130 --> 00:24:48.640
is still a random variable.
00:24:48.640 --> 00:24:53.660
You will see that this simply
says that the expected value
00:24:53.660 --> 00:24:57.360
of that tilted random variable
is equal to--
00:24:57.360 --> 00:25:00.560
it says that tilted random
variable is, in fact, the
00:25:00.560 --> 00:25:01.430
random variable.
00:25:01.430 --> 00:25:02.640
It's not defective.
00:25:02.640 --> 00:25:05.600
And it's the same argument as
before, that has a finite
00:25:05.600 --> 00:25:09.270
variance, and therefore, since
it has a finite variance, it
00:25:09.270 --> 00:25:10.550
keeps expanding.
00:25:10.550 --> 00:25:13.960
It will cross one of the
thresholds eventually.
00:25:13.960 --> 00:25:21.040
OK, so the other thing you can
do here is to say, suppose
00:25:21.040 --> 00:25:25.020
instead of crossing a threshold,
you just fix this
00:25:25.020 --> 00:25:29.350
stopping rule to say we'll
stop at time 100.
00:25:29.350 --> 00:25:33.370
If you stop at time 100, then
what this says is expected
00:25:33.370 --> 00:25:38.745
value of e to the r s sub 100 minus
100 times gamma of
00:25:38.745 --> 00:25:40.940
r is equal to 1.
00:25:40.940 --> 00:25:44.690
But that's obvious because the
expected value of e to the r
00:25:44.690 --> 00:25:46.340
s sub j is, in fact--
00:25:54.030 --> 00:25:59.010
it's the expected value of e to
the rx, raised to the power j, so you're
00:25:59.010 --> 00:26:02.410
subtracting off j times
the log of the
00:26:02.410 --> 00:26:04.890
expected value of e to the rx.
00:26:04.890 --> 00:26:10.080
So it's a trivial identity
if j is fixed.
00:26:10.080 --> 00:26:15.930
OK, so Wald's identity
says this.
00:26:15.930 --> 00:26:20.240
Let's see what it means in terms
of crossing a threshold.
00:26:20.240 --> 00:26:22.290
We'll assume both thresholds
are there.
00:26:22.290 --> 00:26:27.070
Incidentally, Wald's identity
is valid in a much broader
00:26:27.070 --> 00:26:30.310
range of circumstances than
just where you have two
00:26:30.310 --> 00:26:33.430
thresholds, and you're looking
at a threshold crossing.
00:26:33.430 --> 00:26:41.680
It's just that's a particularly
valuable form of
00:26:41.680 --> 00:26:44.880
the Wald identity.
00:26:44.880 --> 00:26:47.350
So that's the only thing
we're going to use.
00:26:47.350 --> 00:26:51.080
But now, if we assume further
that this random variable x
00:26:51.080 --> 00:26:55.820
has a negative expectation,
when x has a negative
00:26:55.820 --> 00:27:00.490
expectation, gamma of r
starts off going down.
00:27:00.490 --> 00:27:03.300
Usually, it comes
back up again.
00:27:03.300 --> 00:27:12.000
We're going to assume that this
quantity r star here,
00:27:12.000 --> 00:27:22.400
where it crosses 0 again, we're
going to assume there is
00:27:22.400 --> 00:27:26.460
some value r star for which
gamma of r star equals 0.
00:27:26.460 --> 00:27:30.280
Namely, we're going to assume
this typical case in which it
00:27:30.280 --> 00:27:35.980
comes back up and crosses
the 0 point.
00:27:35.980 --> 00:27:43.600
And in that case, what it says
is the probability that sj is
00:27:43.600 --> 00:27:47.200
greater than or equal to alpha
is just less than or equal to
00:27:47.200 --> 00:27:50.320
e to the minus r star
times alpha.
00:27:50.320 --> 00:27:53.140
Very, very simple bound
at this point.
00:27:53.140 --> 00:27:57.130
You look at this, and you sort
of see why we're looking, now,
00:27:57.130 --> 00:28:01.820
not at r in general,
but just r star.
00:28:01.820 --> 00:28:05.360
At r star, gamma of r
star is equal to 0.
00:28:05.360 --> 00:28:09.270
So this term goes away, so we're
only talking about the
00:28:09.270 --> 00:28:15.670
expected value of e to the r
sj, e to the r star sj is
00:28:15.670 --> 00:28:16.640
equal to 1.
00:28:16.640 --> 00:28:19.320
So let's see what happens.
00:28:19.320 --> 00:28:24.080
We know that e to the r star sj
is greater than or equal to
00:28:24.080 --> 00:28:35.160
0 for all values of sj because
e to the anything real is
00:28:35.160 --> 00:28:36.970
going to be positive.
00:28:36.970 --> 00:28:41.670
OK, since e to the r star sj is
greater than or equal to 0,
00:28:41.670 --> 00:28:46.230
what we can do is break this
expected value here, this term
00:28:46.230 --> 00:28:49.760
is 0, now, remember, break
it into two terms.
00:28:49.760 --> 00:28:54.590
Break it into the term where s
sub j is bigger than alpha,
00:28:54.590 --> 00:29:00.060
and break it into the
term where s sub j
00:29:00.060 --> 00:29:02.250
is less than beta.
00:29:02.250 --> 00:29:04.570
So I'm just going to ignore the
case where it's less than
00:29:04.570 --> 00:29:05.970
or equal to beta.
00:29:05.970 --> 00:29:08.200
I'm going to take this
expected value.
00:29:08.200 --> 00:29:11.360
I'm going to write it as the
probability that s sub j is
00:29:11.360 --> 00:29:15.550
greater than or equal to alpha
times the expected
00:29:15.550 --> 00:29:19.750
value of e to the r star s
sub j given s sub j greater
00:29:19.750 --> 00:29:20.900
than or equal to alpha.
00:29:20.900 --> 00:29:25.740
There should be another term
in here to make this equal,
00:29:25.740 --> 00:29:28.740
and that's the probability that
s sub j is less than or
00:29:28.740 --> 00:29:33.450
equal to beta times e to the r
star s sub j, given that s sub
00:29:33.450 --> 00:29:35.760
j is less than or
equal to beta.
00:29:35.760 --> 00:29:39.120
We're going to ignore that, and
that's why we get the less
00:29:39.120 --> 00:29:40.990
than or equal to 1 here.
00:29:40.990 --> 00:29:43.770
Now, you can lower bound
e to the r star
00:29:43.770 --> 00:29:47.060
sj under this condition.
00:29:47.060 --> 00:29:51.980
What's a lower bound to s sub
j given that s sub j is
00:29:51.980 --> 00:29:53.352
greater than or equal
to alpha?
00:29:56.010 --> 00:29:58.510
Alpha.
00:29:58.510 --> 00:30:01.380
OK, we're looking at all cases
where s sub j is greater than
00:30:01.380 --> 00:30:08.200
or equal to alpha, and we're
going to stop this experiment
00:30:08.200 --> 00:30:10.920
at the point where it
first exceeds alpha.
00:30:10.920 --> 00:30:13.280
So we're going to lower bound
the point where it first
00:30:13.280 --> 00:30:20.190
exceeds alpha by alpha itself,
so this quantity is lower
00:30:20.190 --> 00:30:23.880
bounded, again, by taking the
probability that s sub j
00:30:23.880 --> 00:30:28.980
greater than or equal to alpha
times e to the r star alpha,
00:30:28.980 --> 00:30:31.680
and that whole thing is less
than or equal to 1.
00:30:31.680 --> 00:30:35.380
That says the probability that
sj is greater than or equal to
00:30:35.380 --> 00:30:44.590
alpha is less than or equal to
e to the minus r star alpha,
00:30:44.590 --> 00:30:47.990
which is what this inequality
says here.
00:30:47.990 --> 00:30:52.290
OK, so this is not
rocket science.
00:30:52.290 --> 00:30:57.910
This is a fairly simple result
if you believe in Wald's
00:30:57.910 --> 00:31:00.740
identity, which we'll
prove later.
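As a concrete sketch of that threshold bound (the plus-or-minus-1 walk, the numbers, and the function names below are my own illustrative assumptions, not from the lecture): take X equal to +1 with probability p and -1 otherwise, with p less than 1/2 so the mean is negative. Then gamma(r) = ln(p e^r + (1-p) e^{-r}), the second zero crossing is at r* = ln((1-p)/p), and the bound on crossing the threshold alpha is e^{-r* alpha}:

```python
import math

def gamma(r, p):
    # Semi-invariant MGF of X: +1 with probability p, -1 with probability 1-p.
    return math.log(p * math.exp(r) + (1 - p) * math.exp(-r))

def find_r_star(p):
    # gamma(0) = 0 always; for p < 1/2 gamma dips negative and then
    # re-crosses 0 at some r* > 0.  Bisect on that second root.
    lo, hi = 1e-9, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if gamma(mid, p) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

p = 0.25
r_star = find_r_star(p)             # closed form here: ln((1-p)/p) = ln 3
alpha = 10.0
bound = math.exp(-r_star * alpha)   # P(S_J >= alpha) <= this
```

The numeric root agrees with the closed form, and the resulting bound on ever reaching alpha = 10 is already tiny.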
00:31:00.740 --> 00:31:02.970
OK, so it's valid
for all choices
00:31:02.970 --> 00:31:05.430
of this lower threshold.
00:31:05.430 --> 00:31:10.260
And remember, this probability
here, it doesn't look like
00:31:10.260 --> 00:31:14.640
it's a function of both alpha
and beta, but it is because
00:31:14.640 --> 00:31:21.370
you're asking what's the
probability that you cross the
00:31:21.370 --> 00:31:26.380
threshold alpha before you
cross the threshold beta.
00:31:26.380 --> 00:31:29.180
And if you make beta very, very
large, it makes it more
00:31:29.180 --> 00:31:31.780
likely that you're going
to cross the threshold.
00:31:31.780 --> 00:31:36.780
If you make beta very close to
0, then you're probably going
00:31:36.780 --> 00:31:42.750
to cross beta first, so this
inequality here, this quantity
00:31:42.750 --> 00:31:44.750
here, depends on beta also.
00:31:44.750 --> 00:31:47.960
But we know that this inequality
is valid no matter
00:31:47.960 --> 00:31:52.210
what beta is, so we can let beta
approach minus infinity,
00:31:52.210 --> 00:31:54.700
and we can still have
this inequality.
00:31:54.700 --> 00:31:58.390
There's a little bit of tricky
math involved in that.
00:31:58.390 --> 00:32:02.800
There's an exercise in the text
which goes through that
00:32:02.800 --> 00:32:06.880
slightly tricky math, but what
you find is that this bound is
00:32:06.880 --> 00:32:09.920
valid with only one threshold,
as well as with two
00:32:09.920 --> 00:32:10.940
thresholds.
00:32:10.940 --> 00:32:15.010
But this proof here that we've
given depends on a lower
00:32:15.010 --> 00:32:17.420
threshold, which is somewhere.
00:32:17.420 --> 00:32:18.910
We don't care where.
00:32:18.910 --> 00:32:21.830
Valid for all choices of
beta, so it's valid
00:32:21.830 --> 00:32:24.580
without a lower threshold.
00:32:24.580 --> 00:32:30.590
The probability that the union
over all n of sn less than or
00:32:30.590 --> 00:32:31.410
equal to alpha.
00:32:31.410 --> 00:32:33.330
In other words, the probability
that we ever
00:32:33.330 --> 00:32:35.110
crossed a threshold alpha--
00:32:35.110 --> 00:32:36.430
AUDIENCE: It's not true equal.
00:32:36.430 --> 00:32:36.880
PROFESSOR: What?
00:32:36.880 --> 00:32:39.090
AUDIENCE: It's supposed to
be sn larger [INAUDIBLE]
00:32:39.090 --> 00:32:40.990
as the last time?
00:32:40.990 --> 00:32:43.940
PROFESSOR: It's less than or
equal to e to the minus r star
00:32:43.940 --> 00:32:46.590
alpha, which is--
00:32:46.590 --> 00:32:47.133
AUDIENCE: Oh, sn?
00:32:47.133 --> 00:32:48.000
sn?
00:32:48.000 --> 00:32:50.194
PROFESSOR: n.
00:32:50.194 --> 00:32:51.076
AUDIENCE: You just [INAUDIBLE]
00:32:51.076 --> 00:32:51.517
[? the quantity? ?]
00:32:51.517 --> 00:32:54.040
PROFESSOR: Oh, it's a union
over all n greater than or
00:32:54.040 --> 00:32:55.290
equal to 1.
00:32:58.407 --> 00:33:03.050
OK, in other words, this
quantity we're dealing with
00:33:03.050 --> 00:33:08.581
here is the probability
that sn---
00:33:08.581 --> 00:33:12.620
oh, I see what you're saying.
00:33:12.620 --> 00:33:14.320
This quantity here
should be greater
00:33:14.320 --> 00:33:15.230
than or equal to alpha.
00:33:15.230 --> 00:33:18.200
You're right.
00:33:18.200 --> 00:33:21.290
Sorry about that.
00:33:21.290 --> 00:33:22.710
I think it's right
most places.
00:33:22.710 --> 00:33:24.850
Yes, it's right.
00:33:24.850 --> 00:33:26.100
We have it right here.
00:33:32.210 --> 00:33:36.110
The probability of this union
is really the same as the
00:33:36.110 --> 00:33:39.930
probability that the value of
it, after it crosses the
00:33:39.930 --> 00:33:42.110
threshold, is greater than
or equal to alpha.
00:33:45.910 --> 00:33:51.150
OK, now, we saw before that the
probability that s sub n
00:33:51.150 --> 00:33:55.500
is greater than or
equal to alpha.
00:33:55.500 --> 00:33:56.750
Excuse me, that's the same.
00:34:00.580 --> 00:34:04.030
When you're writing things in
LaTeX, the symbol for less
00:34:04.030 --> 00:34:06.840
than or equal to is so similar
to that for greater than or
00:34:06.840 --> 00:34:09.110
equal to that it's hard to
keep them straight.
00:34:09.110 --> 00:34:14.239
That quantity there is a greater
than or equal to sign,
00:34:14.239 --> 00:34:16.850
if you're going from right to
left instead of left to right.
00:34:16.850 --> 00:34:22.659
So all we're doing here is
simply using this, well,
00:34:22.659 --> 00:34:24.550
greater than or equal to.
00:34:24.550 --> 00:34:27.949
OK, the corollary makes a
stronger and cleaner statement
00:34:27.949 --> 00:34:35.630
that the probability that you
ever cross alpha is less than
00:34:35.630 --> 00:34:36.810
or equal to--
00:34:36.810 --> 00:34:42.910
my heavens, my evil twin got
a hold of these slides.
00:34:46.449 --> 00:34:51.760
And let me rewrite that one.
00:34:51.760 --> 00:35:02.620
The probability that the union
over all n of the event s sub n
00:35:02.620 --> 00:35:09.425
greater than or equal to alpha
is less than or equal to e to
00:35:09.425 --> 00:35:13.860
the minus r star alpha.
00:35:13.860 --> 00:35:17.350
OK, so we've seen from the
Chernoff bound that for every
00:35:17.350 --> 00:35:21.050
n this bound is satisfied, this
says that it's not only
00:35:21.050 --> 00:35:25.740
satisfied for each n, but it
says it's satisfied over all n
00:35:25.740 --> 00:35:26.790
collectively.
00:35:26.790 --> 00:35:29.960
Otherwise, if we were using the
Chernoff bound, what would
00:35:29.960 --> 00:35:33.420
we have to do to get a handle
on this quantity?
00:35:33.420 --> 00:35:37.370
We'd have to use the union
bound, and then when we use
00:35:37.370 --> 00:35:41.950
the union bound, we can show
that for every n, the
00:35:41.950 --> 00:35:44.820
probability that sn is greater
than or equal to alpha is less
00:35:44.820 --> 00:35:46.170
than or equal to
this quantity.
00:35:46.170 --> 00:35:49.340
But then we'd have to add all
those terms, and we would have
00:35:49.340 --> 00:35:52.600
to somehow diddle around with
them to show that there are
00:35:52.600 --> 00:35:57.760
only a few of them which are
close to this value, and all
00:35:57.760 --> 00:36:00.850
the rest are negligible.
00:36:00.850 --> 00:36:04.420
And the number that are close to
that value is only growing
00:36:04.420 --> 00:36:07.800
with n, and that goes through
a lot of headache.
00:36:07.800 --> 00:36:11.460
Here, we don't have to do this
anymore because the Wald
00:36:11.460 --> 00:36:14.100
identity has saved us from
all that difficulty.
00:36:17.780 --> 00:36:24.100
OK, we talked about
the G/G/1 queue.
00:36:24.100 --> 00:36:28.000
We're going to apply this
corollary to the G/G/1 queue
00:36:28.000 --> 00:36:33.000
to the queueing time, namely to
the time w sub i that the
00:36:33.000 --> 00:36:36.590
ith arrival spends in
the queue before
00:36:36.590 --> 00:36:38.420
starting to be served.
00:36:38.420 --> 00:36:41.860
You remember, when we looked at
that, we found that if we
00:36:41.860 --> 00:36:48.850
define u sub i to be equal to
the ith interarrival time
00:36:48.850 --> 00:36:53.230
minus the i minus first service
time, those are
00:36:53.230 --> 00:36:55.640
independent of each other,
so this is the
00:36:55.640 --> 00:36:58.300
difference between those.
00:36:58.300 --> 00:37:02.100
So ui is the difference between
the ith interarrival time
00:37:02.100 --> 00:37:05.000
and the previous service time.
00:37:05.000 --> 00:37:09.280
What we showed was that this
sequence, u sub i, the
00:37:09.280 --> 00:37:14.170
sequence of the sums of u sub
i is a modification of a
00:37:14.170 --> 00:37:16.440
random walk.
00:37:16.440 --> 00:37:20.180
In other words, the sums of
the u sub i behave exactly
00:37:20.180 --> 00:37:24.280
like a random walk does, but
every time it gets down to 0,
00:37:24.280 --> 00:37:27.030
if it crosses 0, it
resets to 0 again.
00:37:27.030 --> 00:37:30.700
So it keeps bouncing up again.
00:37:30.700 --> 00:37:36.970
If you look in the text, what
it shows is that if you look
00:37:36.970 --> 00:37:41.130
at this sequence of u sub i's,
and you look at the sum of
00:37:41.130 --> 00:37:43.800
them, and you look at them
backward, if you look at the
00:37:43.800 --> 00:37:50.380
sum of u sub i plus u sub i
minus 1 plus u sub i minus 2,
00:37:50.380 --> 00:37:56.770
and so forth, when you look at
the sum that way, it actually
00:37:56.770 --> 00:37:58.490
becomes a random walk.
00:37:58.490 --> 00:38:03.290
Therefore, we can apply this
bound to the random walk, and
00:38:03.290 --> 00:38:08.200
what we find is that the
probability that the waiting
00:38:08.200 --> 00:38:18.010
time in queue of the nth
customer -- the probability that
00:38:18.010 --> 00:38:21.870
it's greater than or equal to
an arbitrary number alpha is
00:38:21.870 --> 00:38:25.150
less than or equal to the
probability that w sub
00:38:25.150 --> 00:38:28.640
infinity is greater than or
equal to alpha, and it's less
00:38:28.640 --> 00:38:31.610
than e to the minus
r star alpha.
00:38:31.610 --> 00:38:35.750
So again, all you have to do is
you have this interarrival
00:38:35.750 --> 00:38:39.390
time x, you have this service
time y, you take the
00:38:39.390 --> 00:38:42.770
difference of the two, that's a
random variable, you find a
00:38:42.770 --> 00:38:46.450
moment generating function of
that random variable, you find
00:38:46.450 --> 00:38:49.730
the point r star at which
that moment generating
00:38:49.730 --> 00:38:53.580
function equals 1, and then
the bound says that the
00:38:53.580 --> 00:38:57.760
probability that the queueing
time that you're going to be
00:38:57.760 --> 00:39:00.840
dealing with exceeds alpha is less
than or equal to e to the minus
00:39:00.840 --> 00:39:01.890
r star alpha.
00:39:01.890 --> 00:39:02.120
Yes?
00:39:02.120 --> 00:39:06.016
AUDIENCE: What do you work with
when you have the gamma
00:39:06.016 --> 00:39:08.938
function go like this, and thus
have infinity, and you
00:39:08.938 --> 00:39:09.912
cross it there.
00:39:09.912 --> 00:39:12.610
[INAUDIBLE] points that
we're looking for?
00:39:12.610 --> 00:39:15.090
PROFESSOR: For that, you
have to read the text.
00:39:15.090 --> 00:39:20.240
I mean, effectively, you can
think of it just as if gamma
00:39:20.240 --> 00:39:24.980
of r is a convex function
like anything else.
00:39:24.980 --> 00:39:28.840
It just has a discontinuity in
it, and bingo, it shoots off
00:39:28.840 --> 00:39:30.140
to infinity.
00:39:30.140 --> 00:39:33.570
So when you take these slope
arguments, what happens is
00:39:33.570 --> 00:39:37.140
that for all slopes beyond that
point, they just seesaw
00:39:37.140 --> 00:39:39.820
around at one point.
00:39:39.820 --> 00:39:41.250
But the same bound holds.
00:39:44.020 --> 00:39:46.960
OK, so that's the
Kingman bound.
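To make that recipe concrete (this M/M/1 special case, the rates, and the function names are my own sketch, not from the lecture): take exponential interarrivals of rate lam and exponential service of rate mu greater than lam. Following the Lindley recursion, the walk increment is service minus interarrival, its MGF is (mu/(mu - r)) times (lam/(lam + r)), and solving for the MGF equal to 1 gives r* = mu - lam, so the Kingman bound reads P(W >= alpha) <= e^{-(mu - lam) alpha}:

```python
import math

def mgf_increment(r, lam, mu):
    # E[e^{r(Y - X)}] with service Y ~ Exp(mu) and interarrival X ~ Exp(lam);
    # valid for -lam < r < mu.
    return (mu / (mu - r)) * (lam / (lam + r))

def kingman_r_star(lam, mu):
    # Root r* in (0, mu) of E[e^{r*(Y - X)}] = 1; needs lam < mu (stable queue).
    lo, hi = 1e-9, mu - 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        if mgf_increment(mid, lam, mu) > 1:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

lam, mu, alpha = 1.0, 2.0, 5.0
r_star = kingman_r_star(lam, mu)    # closed form for M/M/1: mu - lam = 1
bound = math.exp(-r_star * alpha)   # Kingman: P(W >= alpha) <= bound
```

For a general G/G/1 queue there is no closed form for r*, but the same bisection on the increment's MGF works whenever the MGF exists past the root.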
00:39:50.240 --> 00:39:53.250
Then we talked about
large deviations
00:39:53.250 --> 00:39:54.590
for hypothesis tests.
00:39:54.590 --> 00:39:58.340
Well, actually we just talked
about hypothesis tests, but not
00:39:58.340 --> 00:40:01.680
large deviations for them.
00:40:01.680 --> 00:40:05.840
Let's review where
we were on that.
00:40:05.840 --> 00:40:14.870
Let's let the vector y be an n
tuple of IID random variables,
00:40:14.870 --> 00:40:17.000
y1 up to y sub n.
00:40:17.000 --> 00:40:21.050
They're IID conditional
on hypothesis 0.
00:40:21.050 --> 00:40:25.540
They're also IID conditional on
hypothesis 1, so the game
00:40:25.540 --> 00:40:31.965
is nature chooses either
hypothesis 0 or hypothesis 1.
00:40:35.060 --> 00:40:40.590
You take n samples of some IID
random variable, and those n
00:40:40.590 --> 00:40:44.370
samples are IID conditional on
either nature choosing 0 or
00:40:44.370 --> 00:40:45.870
nature choosing 1.
00:40:45.870 --> 00:40:49.260
At the end of choosing those n
samples, you're supposed to
00:40:49.260 --> 00:40:54.070
guess whether h0 is the right
hypothesis or h1 is the right
00:40:54.070 --> 00:40:55.800
hypothesis.
00:40:55.800 --> 00:40:59.790
Say you invested in Apple stock 10 years
ago, and one hypothesis is
00:40:59.790 --> 00:41:00.970
it's going to go broke.
00:41:00.970 --> 00:41:04.180
The other hypothesis is it's
going to invent marvelous
00:41:04.180 --> 00:41:08.030
things, and your stock will
go up by a factor of 50.
00:41:08.030 --> 00:41:12.530
You take some samples, you make
your decision on that.
00:41:12.530 --> 00:41:15.210
Fortunately, with that, you can
make a separate decision
00:41:15.210 --> 00:41:18.900
each year, but that's the
kind of thing that
00:41:18.900 --> 00:41:19.840
we're talking about.
00:41:19.840 --> 00:41:24.790
We're just restricting it to
this case where you have n
00:41:24.790 --> 00:41:28.280
sample values that you're taking
one after the other,
00:41:28.280 --> 00:41:31.380
and they're all IID given the
particular value of the
00:41:31.380 --> 00:41:34.210
hypothesis that happens
to be there.
00:41:34.210 --> 00:41:37.830
OK, so we said there
is something called
00:41:37.830 --> 00:41:39.710
a likelihood ratio.
00:41:39.710 --> 00:41:45.610
The likelihood ratio for a
particular sequence y is
00:41:45.610 --> 00:41:53.490
lambda of y is equal to the
density of y given h1 divided
00:41:53.490 --> 00:41:55.900
by the density of y given h0.
00:41:55.900 --> 00:42:00.550
Why is it h1 on the top
and h0 on the bottom?
00:42:00.550 --> 00:42:02.970
Purely convention,
nothing else.
00:42:02.970 --> 00:42:05.600
The only thing that
distinguishes hypothesis 1
00:42:05.600 --> 00:42:10.940
from hypothesis 0 is you choose
one and call it 1, and
00:42:10.940 --> 00:42:13.350
you choose the other
and call it 0.
00:42:13.350 --> 00:42:16.900
Doesn't make any difference
how you do it.
00:42:16.900 --> 00:42:19.470
So after we make that choice,
the likelihood
00:42:19.470 --> 00:42:22.870
ratio is that ratio.
00:42:22.870 --> 00:42:27.200
Now, the reason for using semi
invariant moment generating
00:42:27.200 --> 00:42:31.170
functions is that this
density here is
00:42:31.170 --> 00:42:32.930
a product of densities.
00:42:32.930 --> 00:42:36.710
This density is a product of
densities, and therefore when
00:42:36.710 --> 00:42:44.380
you take the log of this ratio
of products, you get the sum
00:42:44.380 --> 00:42:51.810
from i equals 1 to n of this log
likelihood ratio for just
00:42:51.810 --> 00:42:54.970
a single experiment.
00:42:54.970 --> 00:42:58.520
It's a single experiment that
you're taking based on the
00:42:58.520 --> 00:43:01.850
fact that all n experiments
are based on the same
00:43:01.850 --> 00:43:05.290
hypothesis, either h0 or h1.
00:43:05.290 --> 00:43:08.430
So the game that you're playing,
and please remember
00:43:08.430 --> 00:43:11.310
what the game is if you forget
everything else about this
00:43:11.310 --> 00:43:16.350
game, is the hypothesis gets
chosen, and at the same time,
00:43:16.350 --> 00:43:18.710
you take n sample values.
00:43:18.710 --> 00:43:22.830
All n sample values correspond
to the same value of the
00:43:22.830 --> 00:43:24.300
hypothesis.
00:43:24.300 --> 00:43:29.010
OK, so when you do that, we're
going to call z sub i, this
00:43:29.010 --> 00:43:33.090
logarithm here, this log
likelihood ratio.
00:43:33.090 --> 00:43:37.370
And then we showed last time
that a threshold test is--
00:43:37.370 --> 00:43:43.250
well, we define the threshold
test as comparing the sum with
00:43:43.250 --> 00:43:45.620
the logarithm of a threshold.
00:43:45.620 --> 00:43:51.630
And the threshold is equal to
p0 over p sub 1, if in fact
00:43:51.630 --> 00:43:55.890
you're doing a maximum a
posteriori probability test,
00:43:55.890 --> 00:44:01.530
and p0 and p1 are the
probabilities of the hypotheses.
00:44:01.530 --> 00:44:03.410
Remember how we did that.
00:44:03.410 --> 00:44:04.700
It was a very simple thing.
00:44:04.700 --> 00:44:09.350
You just write out what the
probability is of hypothesis 0
00:44:09.350 --> 00:44:12.060
and a sequence of
n values of y.
00:44:12.060 --> 00:44:16.260
You write out what the
probability is of hypothesis 1
00:44:16.260 --> 00:44:22.360
and that same sequence of values
with the appropriate
00:44:22.360 --> 00:44:30.880
probability on that sequence for
h equals 1 and h equals 0.
00:44:30.880 --> 00:44:37.230
And what you get out of that
is that the threshold test
00:44:37.230 --> 00:44:40.050
sums up all the z sub i's,
compares it with the
00:44:40.050 --> 00:44:44.990
threshold, and makes a choice,
and that is the MAP choice.
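As a sketch of that threshold test (the Gaussian observation model, the means, and the function names below are my own assumptions for illustration): if f(y|H1) is N(+a, sigma^2) and f(y|H0) is N(-a, sigma^2), each z_i = ln f(y_i|H1)/f(y_i|H0) collapses to 2 a y_i / sigma^2, and the MAP rule sums these and compares with ln(p0/p1):

```python
import math

def llr(y, a=1.0, sigma=1.0):
    # z_i = ln f(y|H1)/f(y|H0) for H1: N(+a, sigma^2), H0: N(-a, sigma^2);
    # the quadratic terms cancel, leaving 2*a*y/sigma^2.
    return 2.0 * a * y / sigma ** 2

def map_decision(samples, p0=0.5, p1=0.5, a=1.0, sigma=1.0):
    # Sum the per-sample log likelihood ratios and compare with
    # ln(eta), eta = p0/p1: decide 1 above the threshold, 0 below.
    s = sum(llr(y, a, sigma) for y in samples)
    return 1 if s > math.log(p0 / p1) else 0
```

With equal priors the threshold is ln 1 = 0, so the rule reduces to the sign of the sample sum; skewing the priors toward H0 raises the bar that the samples must clear before deciding 1.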
00:44:44.990 --> 00:44:50.520
OK, so conditional on h0, you're
going to make an error
00:44:50.520 --> 00:44:54.490
if the sum of the z sub i's
is greater than the
00:44:54.490 --> 00:44:57.500
logarithm of eta.
00:44:57.500 --> 00:45:02.350
And conditional on h1, you're
going to make an error if the
00:45:02.350 --> 00:45:05.450
sum is less than or
equal to log eta.
00:45:05.450 --> 00:45:11.540
I denote these as the random
variable z sub i 0 to make
00:45:11.540 --> 00:45:16.020
sure that you recognize that
this random variable here is
00:45:16.020 --> 00:45:20.700
conditional on h0 in this case,
and it's conditional on
00:45:20.700 --> 00:45:22.965
h1 in the opposite case.
00:45:26.680 --> 00:45:32.570
OK, so the exponential bound
for z sub i sub 0--
00:45:32.570 --> 00:45:36.900
OK, so what we're doing now is
we're saying, OK, suppose that
00:45:36.900 --> 00:45:40.900
0 is the actual value
of this hypothesis.
00:45:40.900 --> 00:45:43.980
0 is the value of
the hypothesis.
00:45:43.980 --> 00:45:46.700
The experimenter doesn't
know this.
00:45:46.700 --> 00:45:49.820
What the experimenter does is
does what the experimenter has
00:45:49.820 --> 00:45:53.910
been told to do, namely the
experimenter take these n
00:45:53.910 --> 00:45:59.090
values, y1 up to y sub n, finds
the likelihood ratio,
00:45:59.090 --> 00:46:03.390
compares that likelihood ratio
with the threshold, and if the
00:46:03.390 --> 00:46:08.370
likelihood ratio is larger than the
threshold, it decides 1.
00:46:08.370 --> 00:46:11.465
If it's smaller than the
threshold, it decides the
00:46:11.465 --> 00:46:14.060
opposite thing.
00:46:14.060 --> 00:46:17.300
It decides 1 if it's above
the threshold, 0 if
00:46:17.300 --> 00:46:18.550
it's below the threshold.
00:46:26.390 --> 00:46:30.440
Well, first thing we want to do,
then, is to find the log
00:46:30.440 --> 00:46:36.050
likelihood ratio under the
assumption that 0 is the
00:46:36.050 --> 00:46:39.550
correct hypothesis, and
something very remarkable
00:46:39.550 --> 00:46:41.420
happens here.
00:46:41.420 --> 00:46:46.980
Gamma sub 0 of r is now the
logarithm because it's a semi
00:46:46.980 --> 00:46:52.190
invariant moment generating
function of the expected value
00:46:52.190 --> 00:46:57.912
of this quantity, e to
the r times z sub i.
00:46:57.912 --> 00:47:01.200
When we take the expected value,
we integrate over f of
00:47:01.200 --> 00:47:07.692
y given h0 times e to the r
times log of f of y given h1
00:47:07.692 --> 00:47:10.300
over f of y given h0.
00:47:10.300 --> 00:47:12.920
You look at this, and
what do you get?
00:47:12.920 --> 00:47:17.330
This quantity here is e
to the r times log of
00:47:17.330 --> 00:47:18.920
f of y given h1.
00:47:18.920 --> 00:47:22.670
That whole quantity in there
is just f of y given
00:47:22.670 --> 00:47:26.830
h1 to the rth power.
00:47:26.830 --> 00:47:33.270
So what we have is, in this
quantity here, is f of y given
00:47:33.270 --> 00:47:35.990
h0 to the minus r power.
00:47:35.990 --> 00:47:40.100
So this term combined with
this term gives us f to the 1
00:47:40.100 --> 00:47:45.890
minus r of y given h0, and this
quantity here is f to the
00:47:45.890 --> 00:47:50.460
r of y given h1 dy.
00:47:50.460 --> 00:47:55.340
So the semi invariant moment
generating function is this
00:47:55.340 --> 00:47:56.410
quantity here.
00:47:56.410 --> 00:48:02.910
At r equals 1, this is just the
integral of f of y given h1, which is 1, so the log of
00:48:02.910 --> 00:48:05.470
it is equal to 0.
00:48:05.470 --> 00:48:10.420
So what we're saying is that,
for any old detection problem
00:48:10.420 --> 00:48:13.590
in the world, so long as this
moment generating function
00:48:13.590 --> 00:48:20.560
exists, what happens is it
starts at 0, it comes down,
00:48:20.560 --> 00:48:26.750
comes back up again, and
r star is equal to 1.
00:48:26.750 --> 00:48:28.110
That's what we've just shown.
00:48:28.110 --> 00:48:33.140
When r is equal to 1, this whole
thing is equal to 1, so
00:48:33.140 --> 00:48:35.560
the log of 1 is equal to 0.
00:48:35.560 --> 00:48:38.680
For every one of these problems,
you know where this
00:48:38.680 --> 00:48:42.090
intercept is, you know where
this intercept is, one is at
00:48:42.090 --> 00:48:43.950
0, one is at 1.
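A quick numeric check of those two intercepts (the two-point observation model and the function name below are my own illustration): for discrete hypotheses, gamma_0(r) = ln of the sum over y of f(y|H0)^(1-r) times f(y|H1)^r, which is 0 at both r = 0 and r = 1 and dips below 0 in between:

```python
import math

def gamma0(r, p0, p1):
    # gamma_0(r) = ln sum_y p0(y)^(1-r) * p1(y)^r: the semi-invariant MGF
    # of the log likelihood ratio z = ln(p1(y)/p0(y)) computed under H0.
    return math.log(sum(q0 ** (1 - r) * q1 ** r for q0, q1 in zip(p0, p1)))

p0 = [0.7, 0.3]   # f(y|H0) on a binary observation (illustrative numbers)
p1 = [0.4, 0.6]   # f(y|H1)
# gamma0(0) = ln sum p0 = 0 and gamma0(1) = ln sum p1 = 0, so r* = 1,
# while convexity pushes gamma0 below 0 between the two roots.
```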
00:48:59.190 --> 00:49:01.560
What we're going to do now is
try to find out what the
00:49:01.560 --> 00:49:10.020
probability of error is given
that h is 0, h equals 0, is
00:49:10.020 --> 00:49:12.320
the correct hypothesis.
00:49:12.320 --> 00:49:16.010
So we're assuming that the
probabilities are actually f
00:49:16.010 --> 00:49:17.760
of y given h0.
00:49:17.760 --> 00:49:22.450
We calculate this quantity that
looks like this, and we
00:49:22.450 --> 00:49:32.290
ask what is the probability
that this sum of random
00:49:32.290 --> 00:49:36.690
variables exceeds the threshold,
exceeds the
00:49:36.690 --> 00:49:37.750
threshold eta.
00:49:37.750 --> 00:49:42.260
So the thing that we do is we
draw a line, a slope, natural
00:49:42.260 --> 00:49:44.840
log of eta divided by n.
00:49:44.840 --> 00:49:48.780
We draw that slope along here,
and we find that the
00:49:48.780 --> 00:49:54.120
probability of error is upper
bounded by gamma 0 of this
00:49:54.120 --> 00:49:58.550
quantity, defined by the slope,
minus r0 times log of
00:49:58.550 --> 00:50:01.850
eta divided by n.
00:50:01.850 --> 00:50:04.400
That's all there is to it.
00:50:04.400 --> 00:50:05.710
Any questions about that?
00:50:08.450 --> 00:50:11.470
Seem obvious?
00:50:11.470 --> 00:50:12.720
Seem strange?
00:50:14.820 --> 00:50:22.760
OK, so the probability of error
conditional on h equals 0 is e
00:50:22.760 --> 00:50:27.090
to the n times gamma 0 of
r0 minus r0 times the natural
00:50:27.090 --> 00:50:28.940
log of eta over n.
00:50:28.940 --> 00:50:33.050
And q sub l of eta is the probability
of error given
00:50:33.050 --> 00:50:36.790
that h is equal to l.
00:50:36.790 --> 00:50:43.190
OK, we can do the same thing
for hypothesis 1.
00:50:43.190 --> 00:50:46.570
We're asking what's the
probability of error given
00:50:46.570 --> 00:50:51.040
that h equals 1 is the correct
hypothesis, and given that we
00:50:51.040 --> 00:50:53.950
choose a threshold, say
we know the a priori
00:50:53.950 --> 00:50:58.180
probabilities, so we choose
a threshold that way.
00:50:58.180 --> 00:51:01.850
OK, we go through the same
argument, gamma 1 of s is the
00:51:01.850 --> 00:51:08.760
natural log of the integral of f of y given h1
times e to the s, we're using
00:51:08.760 --> 00:51:14.150
s in place of r here, times
the natural log of f of y
00:51:14.150 --> 00:51:18.320
given h1 over f of y given h0.
00:51:18.320 --> 00:51:25.380
And this quantity, now, f of y
given h1, the f of y given h1
00:51:25.380 --> 00:51:31.510
is upstairs, so we have f to the
1 plus s of y given h1.
00:51:31.510 --> 00:51:34.460
This quantity is down here,
so we have f to the minus
00:51:34.460 --> 00:51:38.210
s of y given h0.
00:51:38.210 --> 00:51:42.900
And we notice that when s is
equal to minus 1, this is
00:51:42.900 --> 00:51:47.220
again equal to 0, and we notice
also, if you compare
00:51:47.220 --> 00:51:53.350
this, gamma 1 of s is equal
to gamma 0 of s plus 1.
00:51:53.350 --> 00:51:57.740
These two functions are the
same, just shifts it by one.
00:51:57.740 --> 00:52:02.680
OK, so this one of the very
strange things about
00:52:02.680 --> 00:52:09.070
hypothesis testing, namely you
are calculating these expected
00:52:09.070 --> 00:52:12.640
values, but you're calculating
the expected value of a
00:52:12.640 --> 00:52:14.320
likelihood ratio.
00:52:14.320 --> 00:52:17.630
And the likelihood ratio
involves the probabilities of
00:52:17.630 --> 00:52:22.470
the hypotheses also, so when you
calculate that ratio, what
00:52:22.470 --> 00:52:27.240
you get this is funny quantity
here, which is related to what
00:52:27.240 --> 00:52:30.460
you get when you calculate
the semi invariant moment
00:52:30.460 --> 00:52:34.420
generating function given
the other hypothesis.
00:52:34.420 --> 00:52:38.600
So that now, what we wind up
with is that q1 of eta
00:52:38.600 --> 00:52:42.750
is e to the n times gamma 0 of r0
plus 1 minus r0 times log eta over n.
00:52:42.750 --> 00:52:46.610
I'm using the fact that gamma 1
of s is equal to gamma 0 of
00:52:46.610 --> 00:52:52.150
s plus 1, s is just r shifted
over by 1, so I can do the
00:52:52.150 --> 00:52:54.460
same optimization for each.
00:52:54.460 --> 00:52:58.940
So what I wind up with is
the probability of error
00:52:58.940 --> 00:53:05.150
conditional on hypothesis 0,
is this quantity down here.
00:53:11.050 --> 00:53:14.160
That's this one, and the
probability of error
00:53:14.160 --> 00:53:18.810
conditional on the other
hypothesis, the exponent is
00:53:18.810 --> 00:53:21.804
equal to this quantity here.
00:53:21.804 --> 00:53:29.480
OK, so what that says is that
as you shift the threshold--
00:53:29.480 --> 00:53:36.270
in other words, suppose instead
of using a map test,
00:53:36.270 --> 00:53:39.710
you say, well, I want the
probability of error to be
00:53:39.710 --> 00:53:44.050
small when hypothesis
0 is correct.
00:53:44.050 --> 00:53:47.850
I want it to be small when
hypothesis 1 is correct.
00:53:47.850 --> 00:53:50.450
I have a trade off between
those two.
00:53:50.450 --> 00:53:54.170
How do I choose my threshold in
order to get the smallest
00:53:54.170 --> 00:53:56.610
value overall?
00:53:56.610 --> 00:54:00.360
So you say, well,
you're stuck.
00:54:00.360 --> 00:54:04.400
You have one exponent
under hypothesis 0.
00:54:04.400 --> 00:54:08.120
You have another exponent
under hypothesis 1.
00:54:08.120 --> 00:54:10.350
You have this curve here.
00:54:10.350 --> 00:54:16.570
You can take whatever value you
want over here, and that
00:54:16.570 --> 00:54:19.950
sticks you with a value here.
00:54:19.950 --> 00:54:26.920
You can rock things around this
inverted seesaw, and you
00:54:26.920 --> 00:54:29.830
can make one probability of
error bigger by making the
00:54:29.830 --> 00:54:34.550
other one smaller, or you make
the other one bigger by making
00:54:34.550 --> 00:54:37.000
the other one smaller.
00:54:37.000 --> 00:54:40.820
Namely, what you're doing is
changing the threshold, and as
00:54:40.820 --> 00:54:45.190
you change the threshold, as
you make the threshold
00:54:45.190 --> 00:54:50.600
positive, what you're doing is
making it harder to accept h1,
00:54:50.600 --> 00:54:54.500
h equals 1, and easier
to accept h equals 0.
00:54:54.500 --> 00:54:57.350
When you move the threshold the
other way, you're making
00:54:57.350 --> 00:54:59.160
it easier the other way.
00:54:59.160 --> 00:55:04.410
This, in fact, gives you the
choice between the two.
00:55:04.410 --> 00:55:07.100
You decide you're going
to take n tests.
00:55:07.100 --> 00:55:10.650
You can make both of these
smaller by making n bigger.
00:55:10.650 --> 00:55:13.860
But there's a trade off between
the two, and the trade
00:55:13.860 --> 00:55:21.690
off is given by this tangent
line to this curve here.
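That seesaw around the tangent line can be sketched numerically (the two-point example, the numeric derivative, and the function names are my own assumptions): the tangent to gamma_0 at r0 has slope standing in for ln(eta)/n, its value at r = 0 is the exponent for errors under H0, and its value at r = 1 is the exponent for errors under H1:

```python
import math

def gamma0(r, p0, p1):
    # Semi-invariant MGF of the log likelihood ratio under H0.
    return math.log(sum(q0 ** (1 - r) * q1 ** r for q0, q1 in zip(p0, p1)))

def error_exponents(r0, p0, p1, h=1e-6):
    # Tangent to gamma0 at r0: its slope plays the role of ln(eta)/n,
    # its value at r = 0 is the exponent of Pr(error | H0), and its
    # value at r = 1 is the exponent of Pr(error | H1).
    slope = (gamma0(r0 + h, p0, p1) - gamma0(r0 - h, p0, p1)) / (2 * h)
    e0 = gamma0(r0, p0, p1) - r0 * slope
    e1 = gamma0(r0, p0, p1) + (1 - r0) * slope
    return e0, e1

p0 = [0.7, 0.3]
p1 = [0.4, 0.6]
# Rocking the tangent point r0 trades one exponent against the other.
low = error_exponents(0.2, p0, p1)
mid = error_exponents(0.5, p0, p1)
high = error_exponents(0.8, p0, p1)
```

Sliding r0 from 0.2 up to 0.8 makes the H0 exponent more negative (errors under H0 rarer) while the H1 exponent moves toward 0, which is exactly the threshold trade-off described above.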
00:55:21.690 --> 00:55:26.620
And you're always stuck with
r star equals 1 in
00:55:26.620 --> 00:55:27.900
all of these problems.
00:55:27.900 --> 00:55:30.840
So the only question is what
does this curve look like?
00:55:30.840 --> 00:55:39.350
Notice that the expected value
of the likelihood ratio given
00:55:39.350 --> 00:55:42.420
h equals 0 is negative.
00:55:42.420 --> 00:55:51.500
The expected value given h
equals 1 is positive, and
00:55:51.500 --> 00:55:56.400
that's just because of the form
of the likelihood ratio.
00:55:56.400 --> 00:56:03.420
OK, so this actually shows
these two exponents.
00:56:03.420 --> 00:56:07.040
These are the exponents for
the two kinds of errors.
00:56:07.040 --> 00:56:10.950
You can view this as a large
deviation form of the Neyman
00:56:10.950 --> 00:56:13.030
Pearson test.
00:56:13.030 --> 00:56:16.700
In the Neyman Pearson test,
you're doing things in a very
00:56:16.700 --> 00:56:23.080
detailed way, and you're
taking a choice between
00:56:23.080 --> 00:56:27.800
choosing different thresholds
to make the probability of
00:56:27.800 --> 00:56:32.030
error of one type bigger or less
than the other one, just
00:56:32.030 --> 00:56:33.110
the other way.
00:56:33.110 --> 00:56:36.250
Here, we're looking at the large
deviation form of it
00:56:36.250 --> 00:56:38.830
that becomes an upper bound
rather than an exact
00:56:38.830 --> 00:56:42.560
calculation, but it tells you
much, much more because for
00:56:42.560 --> 00:56:48.430
most of these threshold tests,
you're going to do enough
00:56:48.430 --> 00:56:51.460
experiments that your
probability of error is going
00:56:51.460 --> 00:56:52.820
to be very small.
00:56:52.820 --> 00:56:57.070
So the only question is where
do you really want the error
00:56:57.070 --> 00:56:58.880
probability to be small?
00:56:58.880 --> 00:57:02.540
You can make it very small one
way by shifting the curve this
00:57:02.540 --> 00:57:05.700
way, and make it very small the
other way by shifting the
00:57:05.700 --> 00:57:08.240
curve the other way.
00:57:08.240 --> 00:57:10.260
And you take your choice
of which you want.
00:57:13.010 --> 00:57:17.350
OK, the a priori probabilities
are usually not the essential
00:57:17.350 --> 00:57:20.010
characteristic when you're
dealing with this large
00:57:20.010 --> 00:57:23.310
deviation kind of result
because, when you take a large
00:57:23.310 --> 00:57:28.360
number of tests, this threshold,
log eta
00:57:28.360 --> 00:57:32.830
over n, when n becomes very
large, when you have a large
00:57:32.830 --> 00:57:36.840
number of experiments,
log eta over n
00:57:36.840 --> 00:57:38.290
becomes relatively small.
00:57:38.290 --> 00:57:42.070
So that's not the thing you're
usually concerned with.
00:57:42.070 --> 00:57:45.940
What you're concerned with is
whether one test, the patient
00:57:45.940 --> 00:57:50.640
dies, and the other test costs
a lot of money; or one
00:57:50.640 --> 00:57:55.010
test, the nuclear plant blows
up, and the other test, you
00:57:55.010 --> 00:57:56.450
waste a lot of money,
which you wouldn't
00:57:56.450 --> 00:57:57.700
have had to pay otherwise.
00:58:01.100 --> 00:58:09.240
OK, now, here's the important
part of all of this.
00:58:09.240 --> 00:58:12.630
So far, it looked like there
wasn't any way to get out of
00:58:12.630 --> 00:58:18.910
this trade off between choosing
a threshold to make
00:58:18.910 --> 00:58:22.350
the error probability small one
way, or making the error
00:58:22.350 --> 00:58:24.700
probability small
the other way.
00:58:24.700 --> 00:58:26.510
And you think, well,
yes, there is a way
00:58:26.510 --> 00:58:28.390
to get around it.
00:58:28.390 --> 00:58:32.980
What I should do is what I do
in real life, namely if I'm
00:58:32.980 --> 00:58:37.580
trying to decide about
something, what I'm normally
00:58:37.580 --> 00:58:41.690
going to do, I don't like to
waste my time deciding about
00:58:41.690 --> 00:58:44.770
it, so as soon as the decision
becomes relatively
00:58:44.770 --> 00:58:47.470
straightforward, I
make up my mind.
00:58:47.470 --> 00:58:49.780
If the decision is not
straightforward, if I don't
00:58:49.780 --> 00:58:53.780
have enough evidence, I keep
doing more tests, so
00:58:53.780 --> 00:58:57.460
sequential tests are an obvious
thing to try to do if
00:58:57.460 --> 00:59:00.080
you can do it.
00:59:00.080 --> 00:59:03.900
What we have here, what we've
shown, is we have two coupled
00:59:03.900 --> 00:59:05.480
random walks.
00:59:05.480 --> 00:59:11.120
Given hypothesis h equals 0, we
have one random walk, and
00:59:11.120 --> 00:59:16.250
that random walk is typically
going to go down.
00:59:16.250 --> 00:59:19.960
Given h equals 1, we have
another random walk.
00:59:19.960 --> 00:59:24.640
That random walk is typically
going to go up.
00:59:24.640 --> 00:59:27.680
And one is going to go down, one
is going to go up, because
00:59:27.680 --> 00:59:31.935
we've defined the random
variable involved as the log of
00:59:31.935 --> 00:59:41.300
f of y given h1 divided by f
of y given h0, which is why
00:59:41.300 --> 00:59:45.610
the 1 walk goes up, and
the 0 walk goes down.
00:59:45.610 --> 00:59:50.600
Now, the thing we're going to
do is do a sequential test.
00:59:50.600 --> 00:59:53.120
We're going to keep
doing experiments
00:59:53.120 --> 00:59:55.200
until we cross a threshold.
00:59:55.200 --> 00:59:58.560
We're going to decide what
threshold is going to give us
00:59:58.560 --> 01:00:02.550
a small enough probability of
error under each condition,
01:00:02.550 --> 01:00:04.650
and then we choose
that threshold.
01:00:04.650 --> 01:00:09.710
And we continue to test
until we get there.
01:00:09.710 --> 01:00:13.120
So we want to find out whether
we've gained anything by that,
01:00:13.120 --> 01:00:15.210
how much we've gained
if we gain something
01:00:15.210 --> 01:00:18.650
by it, and so forth.
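The sequential test just described can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it assumes Gaussian observations with means mu0 and mu1 under the two hypotheses, accumulates the log-likelihood ratio as a random walk, and stops at the first crossing of alpha or beta.

```python
import random

# Sketch of the sequential test described above: keep sampling and
# accumulate the log-likelihood ratio until it crosses alpha (> 0,
# decide H = 1) or beta (< 0, decide H = 0).  Gaussian observations
# with means mu0, mu1 are an illustrative assumption.
def sprt(samples, mu0, mu1, sigma, alpha, beta):
    s = 0.0
    for n, y in enumerate(samples, start=1):
        # z = ln f(y | H1) - ln f(y | H0) for N(mu, sigma^2) densities
        s += ((y - mu0) ** 2 - (y - mu1) ** 2) / (2 * sigma ** 2)
        if s >= alpha:
            return 1, n          # walk crossed the upper threshold
        if s <= beta:
            return 0, n          # walk crossed the lower threshold
    return None, len(samples)    # ran out of data before a threshold

random.seed(1)
# Data actually drawn under H = 1, so the walk typically drifts up.
data = [random.gauss(1.0, 1.0) for _ in range(10000)]
decision, n_tests = sprt(data, mu0=0.0, mu1=1.0, sigma=1.0,
                         alpha=5.0, beta=-5.0)
print(decision, n_tests)
```

With these numbers the walk drifts up at about 0.5 per test, so it usually crosses alpha after a handful of samples.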
01:00:18.650 --> 01:00:24.370
OK, when you use two thresholds,
alpha's going to
01:00:24.370 --> 01:00:25.420
be bigger than 0.
01:00:25.420 --> 01:00:27.720
Beta's going to be
less than 0.
01:00:27.720 --> 01:00:32.110
The expected value of z given
h0 is less than 0, but the
01:00:32.110 --> 01:00:36.410
value of z given h1
is greater than 0.
01:00:36.410 --> 01:00:38.810
That's why the walks are
coupled, so we can handle each
01:00:38.810 --> 01:00:42.120
of them separately until we can
get the answers for one
01:00:42.120 --> 01:00:44.530
from the answers
for the other.
01:00:44.530 --> 01:00:50.670
Crossing alpha is a rare event
for the random walk with h0
01:00:50.670 --> 01:00:52.270
because a random walk
with h0, you're
01:00:52.270 --> 01:00:54.250
going to go down typically.
01:00:54.250 --> 01:00:55.140
You hardly ever go up.
01:00:55.140 --> 01:00:55.784
Yes?
01:00:55.784 --> 01:00:57.560
AUDIENCE: Can you please
explain again sign of
01:00:57.560 --> 01:00:58.800
expectations?
01:00:58.800 --> 01:01:00.360
PROFESSOR: The sign of
the expectations?
01:01:00.360 --> 01:01:25.170
Yes, z is the log, so that when
we actually have h equals
01:01:25.170 --> 01:01:29.670
1, the expected value of this
is going to be lined up with
01:01:29.670 --> 01:01:31.350
this term on top.
01:01:31.350 --> 01:01:34.700
We have f of y.
01:01:34.700 --> 01:01:37.920
When we have h equals 0,
this is lined up with
01:01:37.920 --> 01:01:40.230
the term on the bottom.
01:01:40.230 --> 01:01:45.270
I mean, actually, you have to
go through and actually show
01:01:45.270 --> 01:01:52.590
that the integral of f of y
given h1 of this quantity is
01:01:52.590 --> 01:01:55.360
greater than 0, and the other
one is less than 0.
01:01:55.360 --> 01:01:59.020
We don't really have to do that
because, if we calculate
01:01:59.020 --> 01:02:02.550
this moment generating
function, we can
01:02:02.550 --> 01:02:05.640
pick it off of there.
01:02:05.640 --> 01:02:13.380
When we look at this moment
generating function, that
01:02:13.380 --> 01:02:19.110
slope there is the expected
value of z conditional on h
01:02:19.110 --> 01:02:23.900
equals 0, and because of the
shifting property, this slope
01:02:23.900 --> 01:02:29.010
here is the expected value of
z given h equals 1, just
01:02:29.010 --> 01:02:33.750
because the 1 curve is shifted
from the other by one unit.
01:02:38.840 --> 01:02:41.400
It's really because
of that ratio.
01:02:41.400 --> 01:02:44.010
If you defined it the other
way, you just changed the
01:02:44.010 --> 01:02:46.598
sign, so nothing important
would happen.
01:02:51.478 --> 01:02:58.980
OK, so r star equals 1 for
the h0 walk, so the
01:02:58.980 --> 01:03:03.700
probability of error, given h0,
is less than or equal to e
01:03:03.700 --> 01:03:05.430
to the minus alpha.
01:03:05.430 --> 01:03:08.040
Well, that's a nice simple
result, isn't it?
01:03:08.040 --> 01:03:09.420
In fact, that's really
beautiful.
01:03:09.420 --> 01:03:13.740
You just calculate this moment
generating function, you find
01:03:13.740 --> 01:03:15.350
the root of it, and
you're done.
01:03:15.350 --> 01:03:18.260
You have a nice bound, and in
fact, it's an exponentially
01:03:18.260 --> 01:03:20.460
tight bound.
01:03:20.460 --> 01:03:24.790
And on the other hand, when you
deal with the probability
01:03:24.790 --> 01:03:29.230
of error given h1 by symmetry,
it's less than or equal to e
01:03:29.230 --> 01:03:29.990
to the beta.
01:03:29.990 --> 01:03:32.570
Beta is a negative number,
remember, so this is
01:03:32.570 --> 01:03:35.460
exponentially going
down as you choose
01:03:35.460 --> 01:03:38.030
beta, smaller and smaller.
01:03:38.030 --> 01:03:41.140
So the thing that we're getting
is we can make each of
01:03:41.140 --> 01:03:48.290
these error probabilities as
small as we want, this one, by
01:03:48.290 --> 01:03:49.935
making alpha big.
01:03:49.935 --> 01:03:52.530
We can make this one as
small as we want by
01:03:52.530 --> 01:03:55.360
making beta big negative.
01:03:55.360 --> 01:03:58.460
There must be a cost to this.
01:03:58.460 --> 01:03:59.710
OK, but what's the cost?
01:04:03.480 --> 01:04:05.595
What happens when you
make alpha big?
01:04:10.010 --> 01:04:14.110
When hypothesis 1 is the correct
hypothesis, what
01:04:14.110 --> 01:04:19.010
normally happens is that this
random walk is going to go up
01:04:19.010 --> 01:04:23.580
roughly at a slope of the
expected value of z
01:04:23.580 --> 01:04:26.530
given h equals 1.
01:04:26.530 --> 01:04:29.970
So when you make alpha very,
very large, you're forced to
01:04:29.970 --> 01:04:35.040
make a very large number of
tests when h is equal to 1.
01:04:35.040 --> 01:04:37.610
When you make beta very, very
large, you're forced to take a
01:04:37.610 --> 01:04:41.840
large number of tests when
h is equal to 0.
01:04:41.840 --> 01:04:45.220
So the trade off here is
a little bit funny.
01:04:45.220 --> 01:04:49.090
You make your error probability
for h equals 0
01:04:49.090 --> 01:04:54.280
very, very small by costing more
money when hypotheses 1
01:04:54.280 --> 01:04:57.730
is the correct hypothesis
because you don't make a
01:04:57.730 --> 01:05:01.540
decision until you've
really climbed way up
01:05:01.540 --> 01:05:03.280
on this random walk.
01:05:03.280 --> 01:05:09.050
And that means it takes a long
time when you have h equals 1.
01:05:09.050 --> 01:05:12.490
Since when h is equal to 1, the
probability of crossing
01:05:12.490 --> 01:05:17.660
this lower threshold is
almost negligible, this
01:05:17.660 --> 01:05:20.550
expected time that it takes
is really just a
01:05:20.550 --> 01:05:23.110
function of h equals 1.
01:05:23.110 --> 01:05:24.880
I'm going to show that
in the next slide.
01:05:31.280 --> 01:05:35.780
When you increase alpha, it
lowers the probability of
01:05:35.780 --> 01:05:39.190
error given h equals 0.
01:05:39.190 --> 01:05:42.560
Excuse me, I should have h
equals 0 instead of h sub 0.
01:05:42.560 --> 01:05:48.660
Exponentially, it increases the
expected number of steps
01:05:48.660 --> 01:05:54.420
until you make a decision
given h1.
01:05:54.420 --> 01:05:58.940
Expected value of j given h1 is
effectively equal to alpha
01:05:58.940 --> 01:06:02.960
divided by expected value
of z given h1.
01:06:02.960 --> 01:06:05.840
Why is that?
01:06:05.840 --> 01:06:08.210
That's essentially
Wald's equality.
01:06:08.210 --> 01:06:11.700
Not Wald's identity, but Wald's
equality because--
01:06:25.250 --> 01:06:29.500
Yes, it says from Wald's
equality, since alpha is
01:06:29.500 --> 01:06:33.570
essentially equal to the
expected value of s of j given
01:06:33.570 --> 01:06:37.040
h equals 1, the number of
tests you have to take when
01:06:37.040 --> 01:06:41.960
h is equal to 1, when alpha
is very, very large, is
01:06:41.960 --> 01:06:45.430
effectively the amount of time
that it takes you to get up to
01:06:45.430 --> 01:06:46.310
the point alpha.
01:06:46.310 --> 01:06:51.110
That expected amount of time is
typically pretty close to
01:06:51.110 --> 01:06:52.790
the mean value.
01:06:52.790 --> 01:07:01.530
So alpha there is close to the
expected value of s of j given
01:07:01.530 --> 01:07:02.730
h equals 1.
01:07:02.730 --> 01:07:07.260
So Wald's equality, given h
equals 1, says the expected
01:07:07.260 --> 01:07:13.880
value of j given h1 is equal
to the expected value of sj
01:07:13.880 --> 01:07:17.530
given h equals 1, that's alpha,
divided by the expected
01:07:17.530 --> 01:07:22.550
value of z given h1,
which is just the
01:07:22.550 --> 01:07:24.220
underlying likelihood ratio.
01:07:27.640 --> 01:07:30.850
So to get this result, we just
substitute alpha for the
01:07:30.850 --> 01:07:33.340
expected value.
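The approximation from Wald's equality, expected value of j given h1 roughly equal to alpha over the expected value of z given h1, can be checked with a quick simulation. The Gaussian increments and the numbers here are illustrative assumptions; the simulated mean comes out slightly above alpha divided by the mean increment because the walk overshoots the threshold on the step where it crosses.

```python
import random

# Check of Wald's equality E[J | H=1] ~ alpha / E[Z | H=1] by
# simulation.  Gaussian increments with mean 0.5 are an illustrative
# assumption; the simulated mean slightly exceeds alpha / E[Z]
# because of overshoot past the threshold.
def stopping_time(alpha, beta, mean_z, sigma_z, rng):
    s, n = 0.0, 0
    while beta < s < alpha:
        s += rng.gauss(mean_z, sigma_z)
        n += 1
    return n

rng = random.Random(0)
alpha, beta, mean_z = 6.0, -6.0, 0.5   # increments under H = 1
trials = [stopping_time(alpha, beta, mean_z, 1.0, rng)
          for _ in range(2000)]
avg_j = sum(trials) / len(trials)
print(avg_j, alpha / mean_z)
```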
01:07:33.340 --> 01:07:37.260
And then the probability of
error, given h equals 0, if we
01:07:37.260 --> 01:07:40.680
write it this way, we see
the cost immediately.
01:07:40.680 --> 01:07:46.230
That's the expected value
of j given h equal to 1.
01:07:46.230 --> 01:07:50.430
In other words, the expected
number of tests given h equals
01:07:50.430 --> 01:07:54.555
1 times the expected value of
the log likelihood ratio given
01:07:54.555 --> 01:07:55.820
h equals 1.
01:07:55.820 --> 01:07:59.220
When you decrease beta, that
lowers the probability of
01:07:59.220 --> 01:08:03.960
error given h1 exponentially,
but it increases the number of
01:08:03.960 --> 01:08:08.390
tests when h0 is the
correct hypothesis.
01:08:08.390 --> 01:08:14.010
So in that case, you get the
probability of error given h
01:08:14.010 --> 01:08:20.240
equals 1 is effectively equal to
e to the
01:08:20.240 --> 01:08:23.970
expected value of j given h equals
0 times the expected value
of z given h equals 0.
01:08:23.970 --> 01:08:27.160
This is just the number of tests
you have to do when h is
01:08:27.160 --> 01:08:28.630
equal to 0.
01:08:28.630 --> 01:08:32.450
This is the expected value of
the log likelihood ratio when
01:08:32.450 --> 01:08:34.930
h is equal to 0.
01:08:34.930 --> 01:08:38.630
This is very approximate, but
this is how you would actually
01:08:38.630 --> 01:08:43.680
choose how big you make alpha,
how big do you make beta if
01:08:43.680 --> 01:08:48.109
you want to do a test between
these two hypotheses.
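The rough sizing rule in the last few sentences can be turned into arithmetic: pick alpha and beta from the target error probabilities, then read off the expected number of tests from Wald's equality. The target probabilities and the mean increments below are made-up illustrative values.

```python
import math

# Back-of-envelope threshold choice (illustrative numbers):
# Pr(error | H=0) <~ e^{-alpha}, Pr(error | H=1) <~ e^{beta},
# E[J | H=1] ~ alpha / E[Z | H=1], E[J | H=0] ~ beta / E[Z | H=0].
target_err_h0 = 1e-6
target_err_h1 = 1e-4
mean_z_h1 = 0.5       # E[Z | H=1] > 0 (assumed)
mean_z_h0 = -0.5      # E[Z | H=0] < 0 (assumed)

alpha = -math.log(target_err_h0)     # threshold for deciding H = 1
beta = math.log(target_err_h1)       # threshold for deciding H = 0
exp_tests_h1 = alpha / mean_z_h1     # expected tests when H = 1
exp_tests_h0 = beta / mean_z_h0      # expected tests when H = 0
print(alpha, beta, exp_tests_h1, exp_tests_h0)
```

Tightening either error target moves only that threshold, and the cost shows up as more expected tests under the opposite hypothesis, which is exactly the trade off described above.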
01:08:48.109 --> 01:08:54.520
Now, this shows what you're
gaining by the sequential test
01:08:54.520 --> 01:08:57.130
over what you're gaining by
the non-sequential test.
01:08:57.130 --> 01:09:00.160
You don't have this in your
notes, so you might just jot
01:09:00.160 --> 01:09:01.439
it down quickly.
01:09:01.439 --> 01:09:08.399
The expected value of z,
conditional on h equals 0, is
01:09:08.399 --> 01:09:14.220
this slope here, the slope of
the moment generating function
01:09:14.220 --> 01:09:16.250
at r equals 0.
01:09:16.250 --> 01:09:20.490
That's the slope of the
underlying random variable.
01:09:20.490 --> 01:09:24.920
Since this point is r equal to
1, this point down here is the
01:09:24.920 --> 01:09:29.130
expected value of z
given h equals 0.
01:09:29.130 --> 01:09:38.620
That's the exponent that you
get when h equals 0 is, in
01:09:38.620 --> 01:09:40.120
fact, the correct hypothesis.
01:09:40.120 --> 01:09:45.819
It's the exponent for the
probability of error given that
01:09:45.819 --> 01:09:50.600
h is equal to 0, namely the
probability that you choose
hypothesis 1.
01:09:50.600 --> 01:09:51.950
Same way over here.
01:09:51.950 --> 01:09:55.440
This slope here is the expected
value of the log
01:09:55.440 --> 01:09:59.015
likelihood ratio given
h equals 1.
01:09:59.015 --> 01:10:03.230
This hits down here at minus
expected value of z
01:10:03.230 --> 01:10:05.050
given h equals 1.
01:10:05.050 --> 01:10:08.750
So you have this exponent going
one way, you have this
01:10:08.750 --> 01:10:12.060
exponent going the other way
when the thing multiplying the
01:10:12.060 --> 01:10:15.900
exponent is not an absolute
value but is, in fact, the
01:10:15.900 --> 01:10:20.130
number of tests you have to
do under one hypothesis or the other.
01:10:20.130 --> 01:10:25.710
Now, if we do the fixed test,
what we're fixed with is a
01:10:25.710 --> 01:10:31.550
test where you take a line
tangent to this curve, which
01:10:31.550 --> 01:10:34.090
goes from here across
here to there.
01:10:34.090 --> 01:10:36.410
We can see-saw it around.
01:10:36.410 --> 01:10:40.470
When we see-saw it all the way
in the limit, we can get this
01:10:40.470 --> 01:10:42.370
result here.
01:10:42.370 --> 01:10:48.240
But we get this result here at
the cost of an error, which is
01:10:48.240 --> 01:10:52.030
almost one in the other
case, so that's
01:10:52.030 --> 01:10:54.060
not a very good deal.
01:10:54.060 --> 01:10:57.940
This says that sequential
testing, well, it shows you
01:10:57.940 --> 01:11:01.960
how much you gain by doing
a sequential test.
01:11:01.960 --> 01:11:04.620
I mean, it might not be
intuitively obvious why this
01:11:04.620 --> 01:11:06.580
is happening.
01:11:06.580 --> 01:11:10.480
I mean, really the reason it's
happening is that the times
01:11:10.480 --> 01:11:16.600
when you want to make the test
very long are those times when
01:11:16.600 --> 01:11:21.110
if h is equal to 0, you
normally go down.
01:11:21.110 --> 01:11:24.620
The next most normal thing is
you wobble around without
01:11:24.620 --> 01:11:28.430
doing anything for a long time,
in which case you want
01:11:28.430 --> 01:11:33.210
to keep doing additional tests
until finally it falls down,
01:11:33.210 --> 01:11:35.090
or finally it goes up.
01:11:35.090 --> 01:11:38.810
But by taking additional
tests, you make it very
01:11:38.810 --> 01:11:42.740
unlikely that you're ever going
to cross that threshold.
01:11:42.740 --> 01:11:45.800
So that's the thing
you're gaining.
01:11:45.800 --> 01:11:54.140
You are gaining the fact that
the error is small in those
01:11:54.140 --> 01:11:58.560
situations where the sum of
these random variables stays
01:11:58.560 --> 01:12:02.410
close to 0 for a long time, and
then you don't make errors
01:12:02.410 --> 01:12:03.660
in those cases.
01:12:08.120 --> 01:12:09.930
We now have just a little
bit of time to
01:12:09.930 --> 01:12:11.930
prove Wald's identity.
01:12:11.930 --> 01:12:14.890
I don't want to have a lot of
time to prove it because
01:12:14.890 --> 01:12:18.090
proofs of theorems are things
you really have to look at
01:12:18.090 --> 01:12:19.810
yourselves.
01:12:19.810 --> 01:12:22.320
This one, you almost don't
have to look at it.
01:12:22.320 --> 01:12:27.460
This one is almost obvious as
soon as you understand what a
01:12:27.460 --> 01:12:29.790
tilted probability is.
01:12:29.790 --> 01:12:35.480
So let's suppose that x sub n is
a sequence of IID discrete
01:12:35.480 --> 01:12:36.840
random variables.
01:12:36.840 --> 01:12:42.110
It has a moment generating
function for some given r.
01:12:42.110 --> 01:12:44.240
We're going to assume that these
random variables are
01:12:44.240 --> 01:12:47.740
discrete now to make this
argument simple.
01:12:47.740 --> 01:12:51.770
If they're not discrete, this
whole argument has to be
01:12:51.770 --> 01:12:54.270
replaced with all sorts
of [INAUDIBLE]
01:12:54.270 --> 01:12:56.090
integrals and all
of that stuff.
01:12:56.090 --> 01:12:59.450
It's exactly the same idea,
but it just is messy
01:12:59.450 --> 01:13:01.190
mathematically.
01:13:01.190 --> 01:13:04.540
So what we're going to do is
we're going to define a tilted
01:13:04.540 --> 01:13:05.970
random variable.
01:13:05.970 --> 01:13:10.590
A tilted random variable is a
random variable in a different
01:13:10.590 --> 01:13:11.860
probability space.
01:13:11.860 --> 01:13:14.950
OK, we start out with this
probability space that we're
01:13:14.950 --> 01:13:23.830
interested in, and then we say,
OK, suppose that we, just
01:13:23.830 --> 01:13:29.000
to satisfy our imaginations, we
suppose the probabilities
01:13:29.000 --> 01:13:30.690
are different.
01:13:30.690 --> 01:13:34.090
We assume that the probabilities
for a given r is
01:13:34.090 --> 01:13:39.220
the probability that the random
variable X is equal to
01:13:39.220 --> 01:13:47.070
little x, namely this quantity
here, is equal to the original
01:13:47.070 --> 01:13:50.420
probability that X is
equal to little x.
01:13:50.420 --> 01:13:52.980
All the sample values are
the same, it's just the
01:13:52.980 --> 01:13:59.470
probability's changed, times e
to the rx minus gamma of r.
01:13:59.470 --> 01:14:03.240
So we're taking these
probabilities: when x is large,
01:14:03.240 --> 01:14:05.790
we're magnifying them;
when x is small,
01:14:05.790 --> 01:14:07.950
we're knocking them down.
01:14:07.950 --> 01:14:09.190
What's the purpose of this?
01:14:09.190 --> 01:14:11.420
It's just a normalization
factor.
01:14:11.420 --> 01:14:16.800
e to the minus gamma of r is 1
over the moment generating
01:14:16.800 --> 01:14:20.730
function of r, so you take
p of x, e to the rx,
01:14:20.730 --> 01:14:25.560
divide it by g of r.
01:14:25.560 --> 01:14:31.280
So this is a probability mass
function, as well as this.
01:14:31.280 --> 01:14:33.980
This is the correct probability
mass function for
01:14:33.980 --> 01:14:36.320
the model you're looking at.
01:14:36.320 --> 01:14:39.860
This is an imaginary one, but
you can always imagine.
01:14:39.860 --> 01:14:44.070
You can say let's suppose that
we had this model instead of
01:14:44.070 --> 01:14:44.950
the other model.
01:14:44.950 --> 01:14:48.430
All the sample values are the
same, but the probabilities
01:14:48.430 --> 01:14:49.800
are different.
01:14:49.800 --> 01:14:53.530
So we want to see what we can
find out from these different
01:14:53.530 --> 01:14:58.200
probabilities in this different
probability model.
01:14:58.200 --> 01:15:01.540
If you sum over x here,
this sum is equal to
01:15:01.540 --> 01:15:03.940
1, as we just said.
01:15:03.940 --> 01:15:11.580
So we'll view q sub xr of x as
the probability mass function
01:15:11.580 --> 01:15:14.560
on x in a new probability
space.
01:15:14.560 --> 01:15:19.050
We can use all the laws of
probability in this new space,
01:15:19.050 --> 01:15:22.000
and that's exactly what
we're going to do.
01:15:22.000 --> 01:15:25.790
And we're going to say things
about the new space, but then
01:15:25.790 --> 01:15:29.670
we can always come back to the
old space from this formula
01:15:29.670 --> 01:15:34.230
here because whatever we find
out in the new space will work
01:15:34.230 --> 01:15:35.880
in the old space.
01:15:35.880 --> 01:15:40.000
One thing we'd like to do is
to be able to find the
01:15:40.000 --> 01:15:44.720
expected value of the random
variable x in this new
01:15:44.720 --> 01:15:48.940
probability space, so this isn't
the expected value in
01:15:48.940 --> 01:15:49.680
the old space.
01:15:49.680 --> 01:15:51.950
It's the expected value
in the new space.
01:15:51.950 --> 01:15:57.130
It's the sum over x of x
times q sub xr of x.
01:15:57.130 --> 01:15:59.320
That's what the expected
value is.
01:15:59.320 --> 01:16:02.000
X is the same in both spaces.
01:16:02.000 --> 01:16:05.030
That's just the probabilities
that have changed.
01:16:05.030 --> 01:16:12.400
These are p of x times e to the
rx minus gamma of r, so
01:16:12.400 --> 01:16:16.760
when you sum this, what you get
is 1 over g sub x of r, which
01:16:16.760 --> 01:16:20.440
is that term, times the
derivative of p sub x
01:16:20.440 --> 01:16:23.560
of x, e to the rx.
01:16:23.560 --> 01:16:27.290
When you take this derivative,
then you get an x in front,
01:16:27.290 --> 01:16:29.020
which is that x there.
01:16:29.020 --> 01:16:33.230
So you get g prime sub x of r
over g sub x of r, which is
01:16:33.230 --> 01:16:35.410
gamma prime of r.
01:16:35.410 --> 01:16:38.830
OK, so in terms of that graph
we've drawn, when you take
01:16:38.830 --> 01:16:44.320
these tilted probabilities, you
move that slope, that r
01:16:44.320 --> 01:16:48.370
equals 0, and now you're looking
at a slope at whatever
01:16:48.370 --> 01:16:50.590
r you're looking at.
01:16:50.590 --> 01:16:52.680
And that gives you the
expected value there.
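The two facts just derived, that the tilted probabilities sum to 1 and that the tilted mean is gamma prime of r, can be checked numerically. The three-point pmf and the value of r below are made-up illustrations.

```python
import math

# Tilted PMF sketch: q_r(x) = p(x) * exp(r*x - gamma(r)) with
# gamma(r) = ln g(r) and g(r) = sum_x p(x) e^{rx}.  The pmf values
# and r are made-up illustrations.
pmf = {-1.0: 0.5, 2.0: 0.3, 4.0: 0.2}
r = 0.3

g = sum(p * math.exp(r * x) for x, p in pmf.items())
gamma = math.log(g)
tilted = {x: p * math.exp(r * x - gamma) for x, p in pmf.items()}

total = sum(tilted.values())                 # a valid pmf: sums to 1
tilted_mean = sum(x * q for x, q in tilted.items())
g_prime = sum(x * p * math.exp(r * x) for x, p in pmf.items())
print(total, tilted_mean, g_prime / g)       # mean equals g'(r)/g(r)
```

For r greater than 0 the tilted mean sits to the right of the original mean, which is the shift of slope described on the graph.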
01:16:56.290 --> 01:17:03.770
OK, if you have a joint tilted
probability mass function--
01:17:03.770 --> 01:17:05.750
and don't think it gets
any more complicated.
01:17:05.750 --> 01:17:07.000
It doesn't.
01:17:07.000 --> 01:17:10.120
I mean, you've already gone
through the major complication
01:17:10.120 --> 01:17:11.580
of this argument.
01:17:11.580 --> 01:17:19.690
The joint tilted PMF is the
probability of x1 to xn is the
01:17:19.690 --> 01:17:24.940
old probability of x1 to xn
times all of these tilted
01:17:24.940 --> 01:17:27.440
factors here.
01:17:27.440 --> 01:17:31.470
If you let a of sn be the set
of n tuples which have the
01:17:31.470 --> 01:17:37.590
same sum, then all these terms
become r times s sub n.
01:17:37.590 --> 01:17:42.410
So what you get is that for each
xn for which the sum is
01:17:42.410 --> 01:17:48.620
sn, this tilted probability
becomes the old probability
01:17:48.620 --> 01:17:53.310
times e to the r sn minus n
gamma of r, which says that
01:17:53.310 --> 01:17:58.380
when we look at the tilted
probability of the sum, namely
01:17:58.380 --> 01:18:02.050
we said that when we tilt these
probabilities, we can do
01:18:02.050 --> 01:18:04.710
everything in a new space that
we could do in the old space.
01:18:04.710 --> 01:18:08.080
We can do everything that
probability theory allows us
01:18:08.080 --> 01:18:12.460
to do, so we can look at the
probability of s sub n in the
01:18:12.460 --> 01:18:14.210
new space also.
01:18:14.210 --> 01:18:17.340
The probability of sn in the
old space, namely we're
01:18:17.340 --> 01:18:23.950
summing this quantity, over all
xn in a of sn, so we sum up
01:18:23.950 --> 01:18:28.610
all of those as the probability
sub s sub n at sn
01:18:28.610 --> 01:18:32.090
times this quantity,
which is fixed.
01:18:32.090 --> 01:18:36.980
So this is the key to a lot
of large deviation theory.
01:18:36.980 --> 01:18:41.090
Any time you're dealing with a
difficult problem, and you
01:18:41.090 --> 01:18:45.170
want to see what's happening
way, way away from the mean,
01:18:45.170 --> 01:18:48.110
you want to see what these
sums look like for these
01:18:48.110 --> 01:18:52.040
exceptional cases, what we do
is we look at a new model
01:18:52.040 --> 01:18:56.850
where we tilt the probability so
that the region of concern
01:18:56.850 --> 01:19:02.320
becomes the main region
for that tilted model.
01:19:02.320 --> 01:19:04.960
So for r greater than 0, we're
tilting the probability
01:19:04.960 --> 01:19:08.330
towards large values, and you
can use the law of large
01:19:08.330 --> 01:19:11.970
numbers, essential limit
theorem, whatever you want to,
01:19:11.970 --> 01:19:15.280
in that new space, then.
01:19:15.280 --> 01:19:17.105
Now, we can prove
Wald's identity.
01:19:20.490 --> 01:19:25.020
What Wald's identity is is the
statement that when you tilt
01:19:25.020 --> 01:19:31.020
these probabilities, a stopping
rule in this tilted
01:19:31.020 --> 01:19:36.570
world still gives a stopping
time that is a random
01:19:36.570 --> 01:19:39.760
variable, namely you still
stop with probability 1.
01:19:39.760 --> 01:19:42.960
Somebody questioned whether you
stop with probability 1 in
01:19:42.960 --> 01:19:44.652
the old world.
01:19:44.652 --> 01:19:47.400
Like I said, you do because
you have this positive
01:19:47.400 --> 01:19:50.330
variance, and the thing with two
thresholds keeps growing
01:19:50.330 --> 01:19:51.850
and growing.
01:19:51.850 --> 01:19:55.560
Here, you have the same thing.
01:19:55.560 --> 01:19:58.530
I mean, the mean doesn't make
any difference at all.
01:19:58.530 --> 01:20:01.770
I mean, you're looking at trying
to exceed one of two
01:20:01.770 --> 01:20:05.020
different thresholds, and
eventually, you exceed one of
01:20:05.020 --> 01:20:07.900
them no matter where
you set r.
01:20:07.900 --> 01:20:13.840
So what this is saying is the
probability that j is equal to
01:20:13.840 --> 01:20:19.610
n in this tilted space is equal
to the probability that
01:20:19.610 --> 01:20:23.920
j is equal to n in the old
space times e to the r sn
01:20:23.920 --> 01:20:26.300
minus n gamma of r.
01:20:26.300 --> 01:20:30.280
So this quantity is equal to the
expected value of e to the
01:20:30.280 --> 01:20:32.540
r sn minus n gamma of r,
01:20:32.540 --> 01:20:36.300
given j equals n, times the
probability that j is equal to
01:20:36.300 --> 01:20:39.860
n, you sum this over n
and, bingo, you're
01:20:39.860 --> 01:20:42.010
back at the Wald identity.
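The identity itself, that the expected value of e to the r s sub j minus j gamma of r equals 1, can be checked by Monte Carlo on a simple two-threshold walk. The plus-or-minus-1 increments, probabilities, thresholds, and r below are illustrative assumptions; with unit steps there is no overshoot, so s sub j lands exactly on a threshold.

```python
import math
import random

# Monte Carlo check of Wald's identity E[exp(r*S_J - J*gamma(r))] = 1
# for a +/-1 random walk with two thresholds (illustrative numbers).
p_up = 0.4                      # P(Z = +1); P(Z = -1) = 0.6
alpha, beta = 3, -3             # stop when S_n >= alpha or <= beta
r = 1.0
gamma = math.log(p_up * math.exp(r) + (1 - p_up) * math.exp(-r))

rng = random.Random(42)
def one_walk():
    s, n = 0, 0
    while beta < s < alpha:
        s += 1 if rng.random() < p_up else -1
        n += 1
    return math.exp(r * s - n * gamma)

est = sum(one_walk() for _ in range(20000)) / 20000
print(est)    # should be close to 1
```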
01:20:42.010 --> 01:20:44.220
So that's all the Wald identity
is, is just a
01:20:44.220 --> 01:20:48.260
statement that when you tilt a
probability, and you have a
01:20:48.260 --> 01:20:52.340
stopping rule on the original
probabilities, you then have a
01:20:52.340 --> 01:20:55.620
stopping rule on the
new probabilities.
01:20:55.620 --> 01:20:59.390
And Wald's identity says--
01:20:59.390 --> 01:21:03.500
well, Wald's identity holds
whenever that tilted stopping
01:21:03.500 --> 01:21:06.372
rule is a random variable.
01:21:06.372 --> 01:21:11.540
OK, that's it for today.
01:21:11.540 --> 01:21:13.240
We will do martingales
on Wednesday.