WEBVTT
00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative
00:00:02.960 --> 00:00:04.370
Commons license.
00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to
00:00:07.410 --> 00:00:11.060
offer high quality educational
resources for free.
00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from
00:00:13.960 --> 00:00:17.890
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:17.890 --> 00:00:19.140
ocw.mit.edu.
00:00:23.140 --> 00:00:27.360
PROFESSOR: OK, I guess we're all
set for getting close to
00:00:27.360 --> 00:00:31.550
the end, coming now to a race
about whether we could say
00:00:31.550 --> 00:00:35.140
anything meaningful about
martingales or not.
00:00:35.140 --> 00:00:37.370
But I think we can.
00:00:37.370 --> 00:00:43.040
I want to spend a little time
reviewing the Wald identity
00:00:43.040 --> 00:00:47.380
today and also sequential
tests.
00:00:47.380 --> 00:00:51.446
It turns out that last
time on the slides--
00:00:55.750 --> 00:00:57.690
I didn't get the thresholds
confused--
00:00:57.690 --> 00:01:01.760
I got hypothesis 0 and
hypothesis 1 interchanged from
00:01:01.760 --> 00:01:03.560
the way we usually do them.
00:01:03.560 --> 00:01:05.780
And it doesn't make
any difference.
00:01:05.780 --> 00:01:11.520
There's no difference between
hypothesis 0 and hypothesis 1.
00:01:11.520 --> 00:01:13.900
And you can do it either
way you want to.
00:01:13.900 --> 00:01:18.000
But it gets very confusing when
you switch from one to
00:01:18.000 --> 00:01:20.520
the other when you're halfway
through an argument.
00:01:20.520 --> 00:01:25.960
So I'm going to go through
part of that again today.
00:01:25.960 --> 00:01:30.370
And we will get revised slides
on the web, so that if you
00:01:30.370 --> 00:01:37.310
want to see them with the
hypotheses done in a
00:01:37.310 --> 00:01:40.690
consistent way, you
will see it there.
00:01:40.690 --> 00:01:44.750
That should be on there by
this afternoon I hope.
00:01:44.750 --> 00:01:52.590
OK, so let's go on and review
what Mr. Wald said.
00:01:55.530 --> 00:01:58.820
He was talking about
a random walk.
00:01:58.820 --> 00:02:02.910
We start with a
sequence of IID
00:02:02.910 --> 00:02:04.510
random variables.
00:02:04.510 --> 00:02:08.949
The random walk consists of the
sequence of partial sums
00:02:08.949 --> 00:02:11.100
of those random variables.
00:02:11.100 --> 00:02:16.260
And the question is if this
random walk is taking place
00:02:16.260 --> 00:02:21.000
and you have two thresholds, one
at alpha and one at beta--
00:02:21.000 --> 00:02:24.080
and beta is below 0 and
alpha is above 0--
00:02:24.080 --> 00:02:29.510
and you start at 0, of course,
and the question is when do
00:02:29.510 --> 00:02:33.550
you cross one of these two
thresholds and which
00:02:33.550 --> 00:02:34.790
threshold do you cross?
00:02:34.790 --> 00:02:38.190
What's the probability of
crossing each one, and
00:02:38.190 --> 00:02:41.090
everything else you can say
about this problem.
00:02:41.090 --> 00:02:44.990
And it turns out this is a very
major problem as far as
00:02:44.990 --> 00:02:49.300
stochastic processes are
concerned, because it comes up
00:02:49.300 --> 00:02:50.670
almost everywhere.
00:02:50.670 --> 00:02:54.210
It certainly comes up as far as
00:02:54.210 --> 00:02:57.110
hypothesis testing is concerned.
00:02:57.110 --> 00:02:59.940
It's probably the major problem
there when you get
00:02:59.940 --> 00:03:02.710
into sequential analysis.
00:03:02.710 --> 00:03:05.090
It's the major problem there.
00:03:05.090 --> 00:03:07.060
So it's a very important
problem.
00:03:07.060 --> 00:03:14.820
And what Wald said was if you
let the random variable J be
00:03:14.820 --> 00:03:18.960
the stopping time of this random
walk, namely, the time
00:03:18.960 --> 00:03:23.340
in which the walk first crosses
either alpha or
00:03:23.340 --> 00:03:29.090
crosses the beta, and then he
said no matter what r you
00:03:29.090 --> 00:03:34.100
choose in the range of points
where the moment generating
00:03:34.100 --> 00:03:38.525
function g sub X of r exists.
00:03:38.525 --> 00:03:43.980
You can pick any r in that
range, and then what you get
00:03:43.980 --> 00:03:48.910
is this strange looking
equality here.
00:03:48.910 --> 00:03:58.030
And I pointed out last time it
just wasn't all that strange,
00:03:58.030 --> 00:04:02.720
because if instead of using the
stopping time of when you
00:04:02.720 --> 00:04:07.130
cross a threshold, if instead
you used as a stopping time
00:04:07.130 --> 00:04:08.640
just some particular n.
00:04:08.640 --> 00:04:11.660
You go for some number of steps,
and then you stop.
00:04:11.660 --> 00:04:13.440
And at that point, you
have the expected
00:04:13.440 --> 00:04:17.560
value of e to the r S sub n.
00:04:17.560 --> 00:04:22.460
The expected value of e to the
r S sub n by definition is the
00:04:22.460 --> 00:04:30.150
moment generating function at r
of S sub n, which is exactly
00:04:30.150 --> 00:04:41.460
equal to e to the
n times gamma of r.
00:04:41.460 --> 00:04:46.540
So all we're doing here,
all that Wald did--
00:04:46.540 --> 00:04:48.820
as it turns out, it
was quite a bit--
00:04:48.820 --> 00:04:52.430
was to say that when you replace
a fixed n with a
00:04:52.430 --> 00:04:56.150
stopping time, you still
get the same result.
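For reference, the identity being described can be written out. This is a sketch in standard notation, assuming gamma of r denotes the natural log of the moment generating function g sub X of r (as the later remarks about the roots of gamma suggest), and assuming the equality on the slide is Wald's identity in its usual form:

```latex
% Fixed number of steps n, with X_i IID and S_n = X_1 + ... + X_n:
\mathsf{E}\!\left[e^{r S_n}\right] = \bigl[g_X(r)\bigr]^n = e^{n\gamma(r)},
\qquad \gamma(r) = \ln g_X(r)

% Equivalently:
\mathsf{E}\!\left[e^{r S_n - n\gamma(r)}\right] = 1

% Wald's identity: the fixed n is replaced by the stopping time J
\mathsf{E}\!\left[e^{r S_J - J\gamma(r)}\right] = 1
```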
00:04:56.150 --> 00:04:59.750
We're stating it here just for
the case of two thresholds.
00:04:59.750 --> 00:05:01.990
Wald stated it in much
more general terms.
00:05:01.990 --> 00:05:05.620
We'll use it in more general
terms, when we say more about
00:05:05.620 --> 00:05:06.870
martingales.
00:05:11.740 --> 00:05:15.520
X, now remember is the
underlying random variable.
00:05:15.520 --> 00:05:18.038
S sub n is the sum of the X's.
00:05:29.740 --> 00:05:39.080
If X bar is less than 0, and
if gamma of r star equals 0, r
00:05:39.080 --> 00:05:43.140
star is the r at which
gamma of r equals 0.
00:05:43.140 --> 00:05:46.160
It's the second root
of gamma of r.
00:05:46.160 --> 00:06:01.660
Gamma of r, if you remember,
looks like this.
00:06:01.660 --> 00:06:04.230
This is r star here.
00:06:07.610 --> 00:06:13.420
This is the expected value
of X as the slope here.
00:06:13.420 --> 00:06:18.930
And we're assuming that X bar
is less than 0 for this.
00:06:18.930 --> 00:06:22.380
I don't know how that greater
than 0 got in.
00:06:22.380 --> 00:06:26.060
And then what it says is the
probability that SJ is greater
00:06:26.060 --> 00:06:28.810
than or equal to alpha is less
than or equal to e to the minus
00:06:28.810 --> 00:06:30.880
alpha r star.
00:06:30.880 --> 00:06:33.080
And last time, remember, we
went through a long, messy
00:06:33.080 --> 00:06:35.850
bunch of equations for that.
00:06:35.850 --> 00:06:40.520
I looked at it again, and this
is just the simple old Markov
00:06:40.520 --> 00:06:43.090
inequality again.
00:06:43.090 --> 00:06:48.620
All you do to get this is you
say, OK, think of the random
00:06:48.620 --> 00:06:51.730
variable, e to the r S sub J.
00:06:51.730 --> 00:06:54.220
SJ is a very complicated random
variable, but it's a
00:06:54.220 --> 00:06:56.420
random variable, nonetheless.
00:06:56.420 --> 00:07:02.960
So e to the r S sub J is a random
variable and the expected
00:07:02.960 --> 00:07:18.960
value of that random variable
is at r star, the expected
00:07:18.960 --> 00:07:21.165
value of it is just one.
00:07:33.430 --> 00:07:34.790
I'll write down.
00:07:34.790 --> 00:07:37.610
It'll be easier.
00:07:37.610 --> 00:07:47.630
The expected value of e to the r
star S sub J is equal to 1.
00:07:47.630 --> 00:08:00.360
And therefore, the probability
that e to the r star S sub J is
00:08:00.360 --> 00:08:10.520
greater than or equal to e to
the r star alpha is just less
00:08:10.520 --> 00:08:31.820
than or equal to 1 over e
to the r star alpha, OK?
00:08:31.820 --> 00:08:33.480
And that's what the
inequality says.
00:08:33.480 --> 00:08:36.960
So that's all there is to it.
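The bound can be checked on a case where the exact answer is known. The sketch below assumes a simple random walk whose steps are +1 with probability p and -1 otherwise, with p below 1/2 so the mean is negative. That example is not from the lecture, and the names `gamma` and `r_star` are illustrative. For this walk there is no overshoot, and the exact probability of ever reaching an integer alpha is (p/(1-p)) to the alpha, so the bound e to the minus alpha r star should be met with equality.

```python
import math


def gamma(r, p):
    """ln E[e^{rX}] for X = +1 with probability p and -1 with probability 1 - p."""
    return math.log(p * math.exp(r) + (1 - p) * math.exp(-r))


def r_star(p, hi=50.0, tol=1e-12):
    """Positive root of gamma(r) = 0, found by bisection.

    Assumes p < 1/2 (negative mean), so gamma is negative just to the
    right of the trivial root at r = 0 and positive for large r.
    """
    lo = 1e-9  # just right of r = 0, where gamma is still negative
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p, alpha = 0.4, 5
    rs = r_star(p)
    print("r* (bisection):     ", rs)
    print("ln((1-p)/p):        ", math.log((1 - p) / p))
    print("bound exp(-alpha r*):", math.exp(-alpha * rs))
    print("exact (p/(1-p))^a:  ", (p / (1 - p)) ** alpha)
```

With continuous step distributions the walk generally overshoots alpha, and the bound is then an inequality rather than an equality.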
00:08:36.960 --> 00:08:38.700
OK, what?
00:08:38.700 --> 00:08:43.155
AUDIENCE: I don't really see
why these two [INAUDIBLE]?
00:08:43.155 --> 00:08:47.120
They don't [INAUDIBLE].
00:08:47.120 --> 00:08:49.610
PROFESSOR: You need X bar negative
so that you get
00:08:49.610 --> 00:08:53.760
another root so that
r star exists.
00:08:53.760 --> 00:08:58.410
If X bar is positive, if the
expected value of X is
00:08:58.410 --> 00:09:03.090
positive, then r star is down
here at negative r.
00:09:13.000 --> 00:09:14.170
I mean, you're talking
about the other
00:09:14.170 --> 00:09:15.790
threshold in a sense.
00:09:19.900 --> 00:09:26.050
OK, this is valid for all
lower thresholds.
00:09:26.050 --> 00:09:28.850
And it's also valid
for no threshold.
00:09:28.850 --> 00:09:32.760
OK, in other words, this
equation here does not have
00:09:32.760 --> 00:09:34.630
beta in it at all.
00:09:34.630 --> 00:09:41.130
So this equation is an upper
bound on the probability, that
00:09:41.130 --> 00:09:43.960
you're going to cross that
threshold of alpha.
00:09:43.960 --> 00:09:47.550
And that upper bound is valid,
no matter where you put the
00:09:47.550 --> 00:09:49.480
lower bound at all.
00:09:49.480 --> 00:09:51.830
So you can go to the
limit as the lower
00:09:51.830 --> 00:09:53.930
bound goes to minus infinity.
00:09:53.930 --> 00:09:57.120
And this inequality should
still be valid.
00:09:57.120 --> 00:10:00.200
You have a homework problem
where you actually prove that.
00:10:00.200 --> 00:10:04.050
Sometimes when things go to
infinity, funny things happen.
00:10:04.050 --> 00:10:07.790
And that proves that nothing
funny happens then.
00:10:07.790 --> 00:10:13.900
So what happens then is the
probability that you ever
00:10:13.900 --> 00:10:18.000
cross a threshold at plus alpha,
when you have a random
00:10:18.000 --> 00:10:26.840
variable, which has a negative
mean, is bounded by this exponential here.
00:10:26.840 --> 00:10:31.900
And we also sort of showed by
looking at the Chernoff bound
00:10:31.900 --> 00:10:34.300
that this bound is
pretty tight.
00:10:34.300 --> 00:10:37.130
So in other words, what this
is saying is when you're
00:10:37.130 --> 00:10:40.740
looking at threshold
crossing problems--
00:10:40.740 --> 00:10:45.560
this quantity here, this
quantity where the second root
00:10:45.560 --> 00:10:48.220
of gamma of r is--
00:10:48.220 --> 00:10:50.460
that's sort of the
crucial parameter
00:10:50.460 --> 00:10:51.470
that you want to know.
00:10:51.470 --> 00:10:54.170
Usually the first thing you want
to know about a random
00:10:54.170 --> 00:10:57.150
variable is its mean, its
variance, all sorts
00:10:57.150 --> 00:10:58.640
of things like that.
00:10:58.640 --> 00:11:01.160
This is saying if you're
interested in thresholds,
00:11:01.160 --> 00:11:05.702
forget about all those things,
look at r star.
00:11:05.702 --> 00:11:09.320
If r star is positive, that means
the mean is negative, so
00:11:09.320 --> 00:11:10.580
there's no problem there.
00:11:10.580 --> 00:11:14.700
But this one quantity here is
sort of the most important
00:11:14.700 --> 00:11:18.140
parameter of all of
these problems.
00:11:18.140 --> 00:11:25.900
OK, so let's go back to look at
a hypothesis testing again,
00:11:25.900 --> 00:11:32.750
where we're looking at the
likelihood ratio as being the
00:11:32.750 --> 00:11:37.460
ratio of the density for
hypothesis 0 divided by
00:11:37.460 --> 00:11:39.960
hypothesis 1.
00:11:39.960 --> 00:11:45.480
What you get then is you observe
this sequence Y sub n.
00:11:45.480 --> 00:11:48.360
These are the observations
that you're taking.
00:11:48.360 --> 00:11:51.330
In other words, nature at the
beginning of this whole
00:11:51.330 --> 00:11:58.870
experiment chooses either H
equals 0 or H equals 1.
00:11:58.870 --> 00:12:02.370
At that point, you start
to make measurements.
00:12:02.370 --> 00:12:06.180
Now whether nature chooses H
equals 0 before or after or
00:12:06.180 --> 00:12:08.220
when doesn't make
any difference.
00:12:08.220 --> 00:12:12.090
The point is the experiment
consists of nature choosing
00:12:12.090 --> 00:12:13.980
one of these two hypotheses.
00:12:13.980 --> 00:12:18.590
You know all the probabilities
that exist in the world in
00:12:18.590 --> 00:12:19.710
this model.
00:12:19.710 --> 00:12:21.770
You go making these
measurements.
00:12:21.770 --> 00:12:23.640
All you observe is
these measurements.
00:12:23.640 --> 00:12:28.830
You don't observe what the
hypothesis is, so you define
00:12:28.830 --> 00:12:33.760
this likelihood ratio as the
ratio of the densities of the
00:12:33.760 --> 00:12:42.940
vector Y for H equals 0 and the
vector Y with H equals 1.
00:12:42.940 --> 00:12:47.660
These quantities exist no
matter what the a priori
00:12:47.660 --> 00:12:49.360
probabilities or the thresholds
00:12:49.360 --> 00:12:51.470
are or anything else.
00:12:51.470 --> 00:12:55.430
Even without all of that, so
long as you have a model which
00:12:55.430 --> 00:13:01.680
tells you what the densities
of these observations are,
00:13:01.680 --> 00:13:04.110
conditional on each hypothesis,
00:13:04.110 --> 00:13:05.140
you can define this.
00:13:05.140 --> 00:13:09.070
This doesn't depend on a priori
probabilities at all.
00:13:09.070 --> 00:13:14.010
OK, so now you look at the
probability that H is equal to
00:13:14.010 --> 00:13:18.150
0, given all these observations
divided by the
00:13:18.150 --> 00:13:20.470
probability it's equal to 1.
00:13:20.470 --> 00:13:23.000
What you get here now,
you have the a priori
00:13:23.000 --> 00:13:26.090
probabilities, p0 over p1.
00:13:26.090 --> 00:13:28.880
Here is the likelihood
ratio here.
00:13:28.880 --> 00:13:34.610
So what you have is p0 over p1
times the likelihood ratio
00:13:34.610 --> 00:13:36.400
of this vector of however many
00:13:36.400 --> 00:13:39.585
observations you have observed.
00:13:39.585 --> 00:13:43.350
It's just a nice way of breaking
up the problem into
00:13:43.350 --> 00:13:50.780
the likelihood ratio and the
a priori probabilities.
00:13:50.780 --> 00:13:55.170
Incidentally, we haven't talked
about this at all, but
00:13:55.170 --> 00:13:59.730
there's an important idea in
all of this hypothesis testing
00:13:59.730 --> 00:14:03.600
of a sufficient statistic, and
what do you think a sufficient
00:14:03.600 --> 00:14:05.380
statistic is?
00:14:05.380 --> 00:14:07.630
It's anything from which
you can calculate
00:14:07.630 --> 00:14:10.020
the likelihood ratio.
00:14:10.020 --> 00:14:12.480
In other words, what we're
saying here, the point we're
00:14:12.480 --> 00:14:17.510
making, is that any intelligent
choice of
00:14:17.510 --> 00:14:23.090
hypothesis is based on
a threshold test on the
00:14:23.090 --> 00:14:24.970
likelihood ratio.
00:14:24.970 --> 00:14:29.380
And therefore, the only thing
you can really be interested
00:14:29.380 --> 00:14:33.430
in in all your observations
is just what is
00:14:33.430 --> 00:14:35.190
the likelihood ratio?
00:14:35.190 --> 00:14:38.680
If you make all these 1,000
observations, a complicated sort
00:14:38.680 --> 00:14:41.680
of thing, you calculate
one number.
00:14:41.680 --> 00:14:44.220
And that's the only thing
you're interested in.
00:14:44.220 --> 00:14:47.900
And anything from which that
number could be calculated is
00:14:47.900 --> 00:14:49.900
a sufficient statistic.
00:14:49.900 --> 00:14:52.720
And anything from which it can't
be calculated means you've
00:14:52.720 --> 00:14:56.260
thrown away some of the
information that you have.
00:14:56.260 --> 00:15:00.140
If you study communication and
you study detection, you study
00:15:00.140 --> 00:15:04.900
how to receive data that's being
sent, what you find is
00:15:04.900 --> 00:15:10.000
that right at the beginning,
even before you do any
00:15:10.000 --> 00:15:13.530
detection, even before you do
any filtering, there's some
00:15:13.530 --> 00:15:16.490
idea of a sufficient
statistic there.
00:15:16.490 --> 00:15:20.900
That's what you need in order to
calculate everything else.
00:15:20.900 --> 00:15:23.100
And you want to make sure
that you have that.
00:15:23.100 --> 00:15:26.240
So that's an important
idea there.
00:15:26.240 --> 00:15:34.340
OK, but anyway, the MAP rule,
which comes right from this,
00:15:34.340 --> 00:15:38.100
says if you have these a priori
probabilities, and
00:15:38.100 --> 00:15:41.380
you're trying to maximize the
probability of choosing
00:15:41.380 --> 00:15:44.750
correctly, what do you do?
00:15:44.750 --> 00:15:52.290
Well, the probability that H
equals 0 is the correct
00:15:52.290 --> 00:15:55.680
hypothesis, given all
the observations you
00:15:55.680 --> 00:15:57.440
made, is in fact this.
00:15:57.440 --> 00:15:59.820
The probability that H equals
1 is the correct
00:15:59.820 --> 00:16:01.670
hypothesis is this.
00:16:01.670 --> 00:16:04.300
What do you do if you want to
maximize the probability of
00:16:04.300 --> 00:16:05.270
being correct?
00:16:05.270 --> 00:16:07.340
You choose the one
which is biggest.
00:16:07.340 --> 00:16:11.260
In other words, what you do is
you look at this number.
00:16:11.260 --> 00:16:14.890
And if this number is bigger
than 1, you choose 0.
00:16:14.890 --> 00:16:20.290
If it's less than 1, you
choose hypothesis 1.
00:16:20.290 --> 00:16:23.960
And what it turns out to
be is a threshold rule.
00:16:23.960 --> 00:16:26.280
You take this likelihood
ratio.
00:16:26.280 --> 00:16:30.020
You compare it with
p1 over p0.
00:16:30.020 --> 00:16:34.250
And in this case, you
select H equals 0.
00:16:34.250 --> 00:16:37.540
In this case you select
H equals 1.
00:16:37.540 --> 00:16:42.040
And last time I just had 1 and
0 reversed, which is fine,
00:16:42.040 --> 00:16:44.620
but if you reverse them one
place, you want to reverse them
00:16:44.620 --> 00:16:47.550
every place.
00:16:47.550 --> 00:16:52.260
And every other threshold test
does something like this,
00:16:52.260 --> 00:16:56.920
except you replace p1 over p0
with some arbitrary threshold.
00:16:56.920 --> 00:16:59.590
You say whatever reason you
want to find for that
00:16:59.590 --> 00:17:03.880
threshold, that's the only
intelligent kind of
00:17:03.880 --> 00:17:06.460
test you can make.
00:17:06.460 --> 00:17:09.300
OK, then we define the
log-likelihood ratio as the
00:17:09.300 --> 00:17:12.710
logarithm of the likelihood
ratio.
00:17:12.710 --> 00:17:17.970
And that was nice because it
was a sum of this quantity
00:17:17.970 --> 00:17:20.339
related to the individual
observations.
00:17:20.339 --> 00:17:25.400
For each observation you really
want to know f of
00:17:25.400 --> 00:17:30.870
y given H equals 0,
divided by f of y given H equals 1.
00:17:30.870 --> 00:17:32.000
You want to divide those two.
00:17:32.000 --> 00:17:34.060
You want to take the
logarithm of it.
00:17:34.060 --> 00:17:37.130
And then you have those numbers,
and the sufficient
00:17:37.130 --> 00:17:41.070
statistic that you're interested
in is just a sum of
00:17:41.070 --> 00:17:42.530
those numbers.
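The threshold rule on the sum of log-likelihood terms can be sketched in a few lines. The example assumes binary observations, with `q0` and `q1` the probabilities that an observation equals 1 under H equals 0 and H equals 1. Those parameters and the function names are illustrative, not from the lecture. Ties are broken in favor of H equals 0, which is an arbitrary choice.

```python
import math


def llr_terms(ys, q0, q1):
    """Per-observation z_i = ln(P(y_i | H=0) / P(y_i | H=1)) for binary y_i.

    q0 and q1 are the assumed probabilities that y_i = 1 under H=0 and H=1.
    """
    return [
        math.log(q0 / q1) if y == 1 else math.log((1 - q0) / (1 - q1))
        for y in ys
    ]


def map_decision(ys, q0, q1, p0=0.5, p1=0.5):
    """MAP rule: choose H=0 when the sum of the z_i is at least ln(p1 / p0)."""
    s_n = sum(llr_terms(ys, q0, q1))
    return 0 if s_n >= math.log(p1 / p0) else 1


if __name__ == "__main__":
    # Same data, different priors: a strong prior for H=1 raises the threshold.
    print(map_decision([1, 1, 1, 0], q0=0.8, q1=0.3))
    print(map_decision([1, 1, 1, 0], q0=0.8, q1=0.3, p0=0.01, p1=0.99))
```

Comparing the likelihood ratio with p1 over p0 is the same as comparing the sum of the logs with ln(p1 / p0), which is why only the sum needs to be kept.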
00:17:42.530 --> 00:17:47.385
So you're looking at a sum
of IID random variables.
00:17:47.385 --> 00:17:50.540
IID, why IID?
00:17:50.540 --> 00:17:57.650
Well, under the hypothesis that
H is equal to 1, those
00:17:57.650 --> 00:18:00.410
Yi's are IID.
00:18:00.410 --> 00:18:03.000
And therefore under the
hypothesis that
00:18:03.000 --> 00:18:04.250
H is equal to 1.
00:18:09.140 --> 00:18:12.410
Well, little Zi is just
a sample value.
00:18:12.410 --> 00:18:15.080
If you look at the random
variable which has these
00:18:15.080 --> 00:18:21.560
sampled values, Z sub i, under
the probability measure,
00:18:21.560 --> 00:18:27.850
corresponding to H equals 1,
those Z sub i's are IID.
00:18:27.850 --> 00:18:34.350
So what that says is when you
look at these sums of random
00:18:34.350 --> 00:18:39.780
variables, the sum of Zi from
1 to n, under hypothesis H
00:18:39.780 --> 00:18:43.750
equals 1, what do you get?
00:18:43.750 --> 00:18:46.540
You get a random walk.
00:18:46.540 --> 00:18:49.935
You get a sum of IID
random variables.
00:18:53.780 --> 00:18:58.770
If you take more observations,
S sub n just changes.
00:18:58.770 --> 00:19:02.020
With n changing, then you
have a larger number of
00:19:02.020 --> 00:19:02.840
observations.
00:19:02.840 --> 00:19:06.270
So the random walk goes a little
further out, and you
00:19:06.270 --> 00:19:08.825
might get closer to a threshold
or whatever.
00:19:11.990 --> 00:19:14.110
And that's what we're
trying to do here.
00:19:14.110 --> 00:19:20.680
OK, so the Z sub i's under the
hypothesis H equals 1, are IID,
00:19:20.680 --> 00:19:24.820
and the moment generating
function of the Z sub i's
00:19:24.820 --> 00:19:27.305
given H equals 1, is this.
00:19:30.460 --> 00:19:33.220
Let's be careful about this.
00:19:33.220 --> 00:19:41.240
The sampled values of the Z
sub i do not depend on the
00:19:41.240 --> 00:19:43.410
hypotheses at all.
00:19:43.410 --> 00:19:45.900
Namely, you make
an observation.
00:19:45.900 --> 00:19:49.280
You make an observation
of Y sub i.
00:19:49.280 --> 00:19:52.730
You calculate Z sub
i from Y sub i.
00:19:52.730 --> 00:19:55.910
That has nothing to do
with whether H equals
00:19:55.910 --> 00:19:58.270
0 or H equals 1.
00:19:58.270 --> 00:20:00.940
You try to calculate
this moment
00:20:00.940 --> 00:20:03.250
generating function, however.
00:20:03.250 --> 00:20:05.980
And you want to know what
the probability
00:20:05.980 --> 00:20:09.870
density of the Y's are.
00:20:09.870 --> 00:20:13.140
And you get a different
probability density for H
00:20:13.140 --> 00:20:16.500
equals 1 than you get under
the other hypothesis.
00:20:16.500 --> 00:20:19.850
If the observations behaved
the same way under both
00:20:19.850 --> 00:20:24.500
hypotheses, it wouldn't make
much sense to do the
00:20:24.500 --> 00:20:25.450
observation.
00:20:25.450 --> 00:20:27.580
Unless you have a government
grant, and you're trying to
00:20:27.580 --> 00:20:30.130
get money out of the government
instead of trying
00:20:30.130 --> 00:20:32.470
to do anything worthwhile.
00:20:32.470 --> 00:20:35.690
Under those circumstances, you
keep on making observations.
00:20:35.690 --> 00:20:38.560
You know perfectly well
that nothing is going
00:20:38.560 --> 00:20:39.830
to come from them.
00:20:39.830 --> 00:20:42.310
But otherwise, it's
a little silly.
00:20:42.310 --> 00:20:46.690
So this moment generating
function under the hypothesis
00:20:46.690 --> 00:20:52.150
H equals 1 is given by
this quantity here.
00:20:52.150 --> 00:20:57.630
And this density here is the
same as this density here.
00:20:57.630 --> 00:21:03.590
So you get this density to the
1 minus r power, and you get
00:21:03.590 --> 00:21:06.130
this density to the r power.
00:21:06.130 --> 00:21:09.210
So you get the product of
these two densities.
00:21:09.210 --> 00:21:16.200
You integrate it over Y, and
that's what gamma 1 of r is.
00:21:16.200 --> 00:21:18.780
Now I said that the really
important thing in all of
00:21:18.780 --> 00:21:23.990
these threshold problems
is what is r star?
00:21:23.990 --> 00:21:27.180
And for this problem,
r star is trivial.
00:21:27.180 --> 00:21:28.640
It's always the same.
00:21:28.640 --> 00:21:31.570
r star is always equal to 1.
00:21:31.570 --> 00:21:37.160
And the reason is when you set
r equal to 1 here, this
00:21:37.160 --> 00:21:39.080
quantity becomes 1.
00:21:39.080 --> 00:21:41.860
This quantity becomes
the density of Y
00:21:41.860 --> 00:21:44.370
conditional on H equals 0.
00:21:44.370 --> 00:21:46.990
When you integrate
that, you get 1.
00:21:46.990 --> 00:21:53.100
So for all of these hypothesis
testing problems, r star is
00:21:53.100 --> 00:21:56.150
equal to 1.
00:21:56.150 --> 00:22:00.050
Gamma 1 of 1 is equal to 0.
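The steps just described can be written out. This is a sketch, assuming Z is the log of the density under H equals 0 divided by the density under H equals 1, as defined earlier, and writing f(y | i) for the density of one observation under H equals i:

```latex
% Log moment generating function of Z = \ln\frac{f(Y \mid 0)}{f(Y \mid 1)} under H = 1:
\gamma_1(r) = \ln \int f(y \mid 1)\,
              \exp\!\left[ r \ln \frac{f(y \mid 0)}{f(y \mid 1)} \right] dy
            = \ln \int f(y \mid 1)^{\,1-r}\, f(y \mid 0)^{\,r}\, dy

% Setting r = 1, the first factor becomes 1 and the second is a density:
\gamma_1(1) = \ln \int f(y \mid 0)\, dy = \ln 1 = 0
\quad\Longrightarrow\quad r^* = 1
```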
00:22:00.050 --> 00:22:02.405
And this is what this
curve says.
00:22:10.660 --> 00:22:17.220
OK, this is gamma 1 of r here.
00:22:17.220 --> 00:22:20.450
This curve starts out here,
negative slope.
00:22:20.450 --> 00:22:22.030
It comes up here.
00:22:22.030 --> 00:22:26.630
r star is equal to
1 in this case.
00:22:26.630 --> 00:22:29.565
And that's sort of the end
of the story for that.
00:22:33.080 --> 00:22:38.660
Now if you are doing a test with
a fixed value of n, you
00:22:38.660 --> 00:22:41.000
say I'm going to make
n observations, it's
00:22:41.000 --> 00:22:42.770
all I have time for.
00:22:42.770 --> 00:22:44.180
The week is over.
00:22:44.180 --> 00:22:46.680
I'm going on vacation
next week.
00:22:46.680 --> 00:22:48.580
I've got to stop this test.
00:22:48.580 --> 00:22:50.730
I've got to write my paper.
00:22:50.730 --> 00:22:52.290
Take the n tests.
00:22:52.290 --> 00:22:54.170
You write your paper.
00:22:54.170 --> 00:22:56.820
And what do you do?
00:22:56.820 --> 00:23:01.420
You go through the optimal
tests the best you can.
00:23:01.420 --> 00:23:05.910
And what you find is given H
equals 1, an error is going to
00:23:05.910 --> 00:23:11.020
occur if the sum of random
variables, namely the
00:23:11.020 --> 00:23:14.800
log-likelihood ratio,
exceeds the logarithm
00:23:14.800 --> 00:23:16.490
of your threshold.
00:23:16.490 --> 00:23:20.650
OK, this is whatever threshold
you decide to establish.
00:23:20.650 --> 00:23:27.580
And we showed before that the
probability that S sub n is
00:23:27.580 --> 00:23:33.290
greater than or equal to log of
the threshold is evaluated
00:23:33.290 --> 00:23:37.220
as e to the n times this
quantity right here.
00:23:37.220 --> 00:23:40.210
The probability of error
given H equals 1 is
00:23:40.210 --> 00:23:41.430
this quantity here.
00:23:41.430 --> 00:23:45.380
Probability of error given H
equals 1 is the probability
00:23:45.380 --> 00:23:48.880
that the data looks like
H equals 0 was the right
00:23:48.880 --> 00:23:49.720
hypothesis.
00:23:49.720 --> 00:23:52.690
In other words, that you crossed
the threshold at plus
00:23:52.690 --> 00:23:55.790
alpha, instead of crossing
the threshold at beta.
00:23:59.710 --> 00:24:02.660
Excuse me.
00:24:02.660 --> 00:24:04.590
We have too many cases
here we're looking
00:24:04.590 --> 00:24:07.300
at, so it gets confusing.
00:24:07.300 --> 00:24:11.810
What I'm looking at here is
the probability that the
00:24:11.810 --> 00:24:16.090
log-likelihood ratio exceeds
this threshold eta, whatever
00:24:16.090 --> 00:24:18.380
we set eta to be.
00:24:18.380 --> 00:24:22.230
Eta is set, depending on the
cost of making errors of both
00:24:22.230 --> 00:24:26.955
types, on our a priori probabilities,
if we have any, and
00:24:26.955 --> 00:24:28.420
all of those things.
00:24:28.420 --> 00:24:31.920
And the probability of error
given H equals 1 is this
00:24:31.920 --> 00:24:36.490
quantity here, which has the
threshold in it over there.
00:24:36.490 --> 00:24:41.070
We've looked at that a number
of times in a lecture.
00:24:41.070 --> 00:24:42.950
We looked at it in
chapter one.
00:24:42.950 --> 00:24:46.510
And then again, we looked
at it in chapter seven.
00:24:46.510 --> 00:24:50.220
And you calculate it by taking
this moment generating
00:24:50.220 --> 00:24:58.480
function, drawing a tangent to
it at the point where the slope is the
00:24:58.480 --> 00:25:02.430
natural log of eta
divided by n.
00:25:02.430 --> 00:25:05.200
And then you take where
it comes in to this
00:25:05.200 --> 00:25:07.140
vertical axis here.
00:25:07.140 --> 00:25:13.690
And that's the exponent of the
error probability when
00:25:13.690 --> 00:25:16.206
hypothesis 1 is correct.
00:25:16.206 --> 00:25:23.050
Now, if the hypothesis is H
equals 0 instead, at that
00:25:23.050 --> 00:25:29.630
point with H equals 0, the
expected value of this
00:25:29.630 --> 00:25:32.860
log-likelihood ratio is
going to be positive.
00:25:32.860 --> 00:25:37.800
The situation is going to be a
curve that comes over here,
00:25:37.800 --> 00:25:40.080
comes back at some point here.
00:25:40.080 --> 00:25:44.040
And what we showed is that
this curve is just a
00:25:44.040 --> 00:25:47.630
translation of this
curve by 1.
00:25:47.630 --> 00:25:53.770
OK, namely if you calculate the
moment generating function
00:25:53.770 --> 00:25:59.450
for H equals 0, you get the same
thing that we got before.
00:26:01.970 --> 00:26:04.390
I'm not going to go through
all the details of this.
00:26:06.890 --> 00:26:09.695
Now you have 0 here
instead of 1.
00:26:13.740 --> 00:26:19.540
Over here you're going
to have just minus r.
00:26:19.540 --> 00:26:25.440
And over here you're going
to have 1 plus r.
00:26:25.440 --> 00:26:29.170
So this whole thing is
translated by 1.
00:26:29.170 --> 00:26:30.860
The action happens here.
00:26:30.860 --> 00:26:35.010
But if you translate it by 1
over in this direction, what
00:26:35.010 --> 00:26:38.590
happens is the error
probability is
00:26:38.590 --> 00:26:41.160
determined by this.
00:26:44.220 --> 00:26:45.560
It's this.
00:26:45.560 --> 00:26:49.800
The exponent is this point right
here, gamma 1 of r0 plus
00:26:49.800 --> 00:26:53.430
1 minus r0, times log of eta over n.
00:26:53.430 --> 00:26:57.280
And the r0 was again determined
by this point at
00:26:57.280 --> 00:27:00.718
which the slope is equal
to log eta over n.
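The two exponents read off the tangent line can be collected in one place. This is a sketch in the notation used above, assuming a fixed number n of observations, a threshold eta on the likelihood ratio, and a tangent point r0 between 0 and 1. The second line uses the translation property, gamma 0 of r equals gamma 1 of r plus 1:

```latex
% Tangent point: \gamma_1'(r_0) = \frac{\ln\eta}{n}, \qquad 0 \le r_0 \le 1

% Error given H = 1 (the sum S_n looks like H = 0):
\Pr\{ S_n \ge \ln\eta \mid H = 1 \}
  \le \exp\!\left\{ n \left[ \gamma_1(r_0) - r_0\,\frac{\ln\eta}{n} \right] \right\}

% Error given H = 0 (the sum S_n looks like H = 1):
\Pr\{ S_n < \ln\eta \mid H = 0 \}
  \le \exp\!\left\{ n \left[ \gamma_1(r_0) + (1 - r_0)\,\frac{\ln\eta}{n} \right] \right\}
```

The two bracketed quantities are where the tangent at r0 meets the vertical lines at r equals 0 and r equals 1, which is the seesaw picture: moving the tangent point lowers one and raises the other.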
00:27:03.240 --> 00:27:04.490
We did that before.
00:27:10.450 --> 00:27:13.440
Then I want to make clear you
understood it, because to
00:27:13.440 --> 00:27:16.100
really understand it, you have
to go through the arithmetic
00:27:16.100 --> 00:27:18.190
yourselves at least once.
00:27:18.190 --> 00:27:21.860
And you can do that easily by
following the notes, because
00:27:21.860 --> 00:27:25.650
it does it in almost
excruciating detail.
00:27:25.650 --> 00:27:29.820
So that's the argument
you get.
00:27:29.820 --> 00:27:36.380
We had this idea before, of the
Neyman-Pearson principle,
00:27:36.380 --> 00:27:40.750
which says you don't assume
a priori probabilities.
00:27:40.750 --> 00:27:45.450
You look at the probability of
making an error as being a
00:27:45.450 --> 00:27:50.160
trade off between the error you
make when H is equal to 1
00:27:50.160 --> 00:27:53.380
and the error you make
when H is equal to 0.
00:27:53.380 --> 00:27:55.270
In terms of the Chernoff
bound, this
00:27:55.270 --> 00:27:57.320
trade off is very clear.
00:28:00.700 --> 00:28:05.200
As you change the exponent that
you want to get under H
00:28:05.200 --> 00:28:09.810
equals 1, this point moves.
00:28:09.810 --> 00:28:12.330
The tangent then moves.
00:28:12.330 --> 00:28:14.300
And the exponent over
here moves.
00:28:14.300 --> 00:28:16.950
So you have this inverted
seesaw.
00:28:16.950 --> 00:28:20.170
And the exponent for one kind
of error is over here.
00:28:20.170 --> 00:28:24.790
And the exponent for the other
kind of error is over there.
00:28:24.790 --> 00:28:29.550
Then the next thing we said
was this is really stupid,
00:28:29.550 --> 00:28:32.410
unless you're going on
vacation this Friday.
00:28:32.410 --> 00:28:35.130
If you're not going on vacation
this Friday, if
00:28:35.130 --> 00:28:38.620
you're really serious about
making the right decision,
00:28:38.620 --> 00:28:44.210
then what you're going to do is
keep on making observations
00:28:44.210 --> 00:28:45.885
until you're pretty
sure you're right.
00:28:49.420 --> 00:28:53.620
Now somebody at the end of the
lecture last time pointed out
00:28:53.620 --> 00:28:59.840
something, which says that when
you do experiments and
00:28:59.840 --> 00:29:02.420
you keep on making observations
until you get the
00:29:02.420 --> 00:29:05.760
data that you want, there's
something very
00:29:05.760 --> 00:29:08.960
unethical about that.
00:29:08.960 --> 00:29:12.580
Is this that kind of
unethical behavior?
00:29:12.580 --> 00:29:13.830
Or is this really valid?
00:29:17.770 --> 00:29:21.840
Well, I claim this is valid,
because what we're doing when
00:29:21.840 --> 00:29:27.050
we're doing sequential testing
is we're deciding what we're
00:29:27.050 --> 00:29:29.390
going to do ahead of time.
00:29:29.390 --> 00:29:31.900
Namely, we've decided what we're
going to do is we're
00:29:31.900 --> 00:29:36.550
going to continue testing until
we cross a threshold and
00:29:36.550 --> 00:29:42.290
that threshold gives us a suitable
probability of error.
00:29:42.290 --> 00:29:44.770
So we're not cooking
the books at all.
00:29:44.770 --> 00:29:47.730
What we're doing is we're
following this preset
00:29:47.730 --> 00:29:49.250
procedure we've set up.
00:29:49.250 --> 00:29:54.230
And the only question is can
we get a very small error
00:29:54.230 --> 00:29:59.240
probability by using a smaller
number of observations on the
00:29:59.240 --> 00:30:01.580
average than what we
need otherwise?
00:30:04.330 --> 00:30:06.415
Put it in terms of a
communication system.
00:30:09.060 --> 00:30:12.490
One kind of communication
system, you have to send some
00:30:12.490 --> 00:30:15.320
data from one point
to another.
00:30:15.320 --> 00:30:18.380
You're not going to get
any feedback on it.
00:30:18.380 --> 00:30:21.630
You've got to get the data
through the first time.
00:30:21.630 --> 00:30:23.140
It's got to be right.
00:30:23.140 --> 00:30:25.250
What are you going to do?
00:30:25.250 --> 00:30:27.920
You're going to send this data
a very large number of times
00:30:27.920 --> 00:30:31.960
or use a very powerful coding
technique on it.
00:30:31.960 --> 00:30:35.790
And by the time it gets through,
you're going to be very sure
00:30:35.790 --> 00:30:38.860
you're right.
00:30:38.860 --> 00:30:42.780
Now a much better procedure, and
the thing which is used in
00:30:42.780 --> 00:30:46.600
almost all communication
systems, and the thing which
00:30:46.600 --> 00:30:50.220
we use as human beings all the
time, and the thing which
00:30:50.220 --> 00:30:53.120
control people use all the time,
the thing which almost
00:30:53.120 --> 00:30:57.150
everybody uses, because most of
us have common sense if we
00:30:57.150 --> 00:31:01.840
spend some time trying to do
these things, is instead of
00:31:01.840 --> 00:31:06.720
trying to get it right the first
time, we try a little bit
00:31:06.720 --> 00:31:08.500
to get it right the
first time.
00:31:08.500 --> 00:31:10.770
And we make sure that if we
don't get it right the first
00:31:10.770 --> 00:31:13.440
time, we have some way of
finding out about it and
00:31:13.440 --> 00:31:16.550
getting it right the
second time.
00:31:21.390 --> 00:31:26.500
And in the scientific way of
looking at it, what we do is
00:31:26.500 --> 00:31:30.250
we decide ahead of time exactly
what our procedure is
00:31:30.250 --> 00:31:33.043
going to be for making
repetitions--
00:31:36.090 --> 00:31:39.100
something called ARQ in
communication systems, which
00:31:39.100 --> 00:31:41.910
means automatic repeat
request.
00:31:41.910 --> 00:31:47.470
It's automatic, which means
you don't try to make your
00:31:47.470 --> 00:31:51.660
decision depending on whether
you'd like to receive this 0
00:31:51.660 --> 00:31:54.400
or like to receive a 1.
00:31:54.400 --> 00:31:57.730
You make the decision ahead of
time that if you have a clean
00:31:57.730 --> 00:32:01.110
enough answer, you're
going to accept it.
00:32:01.110 --> 00:32:04.310
If it looks doubtful, you're
going to send it over again.
00:32:04.310 --> 00:32:08.740
That's exactly the same sort
of thing we're doing here.
00:32:08.740 --> 00:32:17.810
OK, when we do that, given H
equals 1, we again have this S
00:32:17.810 --> 00:32:22.400
sub n as a function of
n as a random walk.
00:32:22.400 --> 00:32:26.040
It's a sum of IID random
variables and
00:32:26.040 --> 00:32:27.640
conditional on H equals 1.
00:32:27.640 --> 00:32:28.950
You have a random walk.
00:32:28.950 --> 00:32:33.810
Conditional on H equals 1, you
have a negative slope on this
00:32:33.810 --> 00:32:34.530
random walk.
00:32:34.530 --> 00:32:38.480
The random walk starts out and
on the average is going to go
00:32:38.480 --> 00:32:41.380
down, and it's going to continue
going down forever.
00:32:41.380 --> 00:32:43.580
And if you're looking for it to
cross some positive
00:32:43.580 --> 00:32:46.640
threshold, if it doesn't cross
it pretty soon, it's not going
00:32:46.640 --> 00:32:47.490
to cross it.
00:32:47.490 --> 00:32:50.230
But anyway, we have a test
which said we have some
00:32:50.230 --> 00:32:51.320
positive threshold.
00:32:51.320 --> 00:32:53.420
We have some negative
threshold.
00:32:53.420 --> 00:32:56.950
If we ever cross the positive
threshold, we say
00:32:56.950 --> 00:32:58.610
H is equal to 0.
00:32:58.610 --> 00:33:01.410
If we ever cross the negative
threshold, we say
00:33:01.410 --> 00:33:02.630
H is equal to 1.
00:33:02.630 --> 00:33:05.600
And then we're done with it.
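The two-threshold rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sequential_test` and `bernoulli_log_lr` are hypothetical, and the Bernoulli model (probability 0.6 of a 1 under H equals 0, and 0.4 under H equals 1) is made up for the example so that the walk drifts down under H equals 1, as in the lecture.

```python
import math


def sequential_test(samples, log_lr, alpha, beta):
    """Run a two-threshold sequential test on a stream of observations.

    S_n is the running sum of per-trial log-likelihood ratios.
    Decide H = 0 when S_n >= alpha (alpha > 0) and H = 1 when
    S_n <= beta (beta < 0). Returns (decision, number_of_trials);
    decision is None if the samples run out before a threshold is crossed.
    """
    s = 0.0
    n = 0
    for n, y in enumerate(samples, start=1):
        s += log_lr(y)
        if s >= alpha:
            return 0, n
        if s <= beta:
            return 1, n
    return None, n


def bernoulli_log_lr(y, p0=0.6, p1=0.4):
    """Log-likelihood ratio ln[p(y | H=0) / p(y | H=1)] for one binary observation.

    p0 and p1 are illustrative values, not taken from the lecture.
    """
    if y == 1:
        return math.log(p0 / p1)
    return math.log((1 - p0) / (1 - p1))


# Usage: a run of ones pushes the walk up toward alpha, so H = 0 is chosen.
print(sequential_test([1] * 10, bernoulli_log_lr, alpha=2.0, beta=-2.0))
```

The thresholds are fixed before any data is seen, which is the point made above about not cooking the books.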
00:33:05.600 --> 00:33:10.200
OK, now, let me give you another
argument why that
00:33:10.200 --> 00:33:10.920
makes sense.
00:33:10.920 --> 00:33:13.250
I gave you one argument
last time.
00:33:13.250 --> 00:33:14.930
I'll give you another
argument this time.
00:33:18.780 --> 00:33:28.730
If S sub J is greater than or
equal to alpha, we're going to
00:33:28.730 --> 00:33:32.550
decide that H equals 0 is the
correct hypothesis.
00:33:35.300 --> 00:33:40.370
If H equals 1 is the correct
hypothesis, then we're going to
00:33:40.370 --> 00:33:42.750
make an error when S
sub J is greater
00:33:42.750 --> 00:33:44.571
than or equal to alpha.
00:33:44.571 --> 00:33:47.010
If S sub J is less than or equal
to beta, we're going
00:33:47.010 --> 00:33:50.410
to decide H equals 1.
00:33:50.410 --> 00:33:54.810
And conditional on H equals 1,
an error is made if SJ is
00:33:54.810 --> 00:33:56.560
greater than or equal
to alpha.
00:33:56.560 --> 00:34:00.710
Conditional on H equals 0, an
error is made if SJ is less
00:34:00.710 --> 00:34:02.570
than or equal to beta.
00:34:02.570 --> 00:34:06.880
OK, so the probability of the
error conditional on H equals
00:34:06.880 --> 00:34:12.090
1 is the probability that S sub
J is greater than or equal
00:34:12.090 --> 00:34:16.370
to alpha, given H equals 1,
which is less than or equal to
00:34:16.370 --> 00:34:18.120
E to the minus alpha r star.
00:34:18.120 --> 00:34:22.340
This is the thing that
we said before.
00:34:22.340 --> 00:34:25.270
r star is the root
of gamma of r.
00:34:25.270 --> 00:34:29.840
And gamma of r is
equal to this.
00:34:29.840 --> 00:34:38.060
OK, so, let's just make life a
little easier for ourselves and
00:34:38.060 --> 00:34:43.050
assume that our a priori
probabilities are each 1/2.
00:34:43.050 --> 00:34:47.810
This is also called maximum
likelihood decision.
00:34:47.810 --> 00:34:51.300
You take this likelihood ratio,
and you just decide on
00:34:51.300 --> 00:34:54.770
the basis of the likelihood
ratio.
00:34:54.770 --> 00:34:58.790
OK, then at the end of trial
n, the probability of H
00:34:58.790 --> 00:35:03.850
equals 0 given Sn divided by the
probability of H equals 1
00:35:03.850 --> 00:35:07.120
given Sn, the a prioris
cancel out.
00:35:07.120 --> 00:35:11.260
It is just E to the S sub n.
00:35:11.260 --> 00:35:12.510
That's what it is.
00:35:15.000 --> 00:35:17.230
It's the likelihood ratio.
00:35:17.230 --> 00:35:21.100
S sub n is the log-likelihood
ratio.
00:35:21.100 --> 00:35:24.360
So this is what it is.
00:35:24.360 --> 00:35:30.190
If you now take probability of H
equals 0 as 1 minus the probability that
00:35:30.190 --> 00:35:34.130
H equals 1 given S sub
n, this equation
00:35:34.130 --> 00:35:36.440
becomes this equation.
00:35:36.440 --> 00:35:40.560
And then the probability of H
equals 1 given S sub n is just
00:35:40.560 --> 00:35:44.910
E to the minus Sn over 1
plus E to the minus Sn.
00:35:44.910 --> 00:35:50.210
Now if Sn is a large number, E
to the minus Sn is going to be
00:35:50.210 --> 00:35:52.120
totally trivial.
00:35:52.120 --> 00:35:55.570
And the probability that
H equals 1 given Sn is
00:35:55.570 --> 00:35:58.730
essentially E to the minus Sn.
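Written out, the posterior computation just described looks as follows. This is a sketch that assumes equal a priori probabilities, as in the lecture, with S_n the log-likelihood ratio after n trials.

```latex
% Equal priors cancel, leaving the likelihood ratio:
\frac{\Pr\{H=0 \mid S_n\}}{\Pr\{H=1 \mid S_n\}} = e^{S_n},
\qquad
\Pr\{H=0 \mid S_n\} + \Pr\{H=1 \mid S_n\} = 1 .
% Solving the two equations for the posterior of H = 1:
\Pr\{H=1 \mid S_n\} = \frac{e^{-S_n}}{1 + e^{-S_n}} \le e^{-S_n} .
```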
00:35:58.730 --> 00:36:04.260
It means when you can choose
different values of n, this
00:36:04.260 --> 00:36:06.990
very directly gives you
a control on what the
00:36:06.990 --> 00:36:08.900
probability of error is.
00:36:08.900 --> 00:36:13.680
The probability of error is
essentially E to the minus Sn.
00:36:13.680 --> 00:36:17.210
So if you choose a threshold
alpha, what you're doing is
00:36:17.210 --> 00:36:20.450
you're guaranteeing that the
probability of error cannot be
00:36:20.450 --> 00:36:23.080
more than E to the
minus alpha.
00:36:23.080 --> 00:36:27.650
OK, so this is more than just
talking about averages.
00:36:27.650 --> 00:36:31.120
This is saying if you use a
threshold rule, then what
00:36:31.120 --> 00:36:34.600
you're doing is guaranteeing
that the probability of error
00:36:34.600 --> 00:36:37.360
is never going to be more
than this quantity
00:36:37.360 --> 00:36:39.720
specified here.
00:36:39.720 --> 00:36:43.940
OK, we saw last time the cost of
choosing alpha to be large
00:36:43.940 --> 00:36:48.060
is that you have to make a very
large number of trials,
00:36:48.060 --> 00:36:50.010
at least given H equals 0.
00:36:50.010 --> 00:36:53.770
Why don't I worry about
the number of trials
00:36:53.770 --> 00:36:56.840
for H equals 1?
00:36:56.840 --> 00:37:00.590
I mean, it's nothing to be
thought through here.
00:37:00.590 --> 00:37:04.370
If my thresholds are large,
my probability of
00:37:04.370 --> 00:37:05.620
error is very small.
00:37:08.060 --> 00:37:17.530
The expected values of things
for very large log-likelihood
00:37:17.530 --> 00:37:22.800
ratios are determined almost
entirely by H equals 0.
00:37:22.800 --> 00:37:24.610
H equals 1 sometimes.
00:37:24.610 --> 00:37:27.250
You sometimes make a mistake,
because it's something very,
00:37:27.250 --> 00:37:28.730
very unusual.
00:37:28.730 --> 00:37:31.450
But that has very little
influence on the expected
00:37:31.450 --> 00:37:33.680
number of tests you're making.
00:37:33.680 --> 00:37:38.960
So what happens then is the
expected number of tests you
00:37:38.960 --> 00:37:42.790
make under the hypothesis
that H is equal to 0--
00:37:45.580 --> 00:37:47.440
now we're using Wald's equality
00:37:47.440 --> 00:37:49.540
rather than Wald's identity--
00:37:49.540 --> 00:37:52.720
it's equal to the expected
value of S sub J given H
00:37:52.720 --> 00:37:56.530
equals 0, divided by the
expected value of Z,
00:37:56.530 --> 00:37:57.780
given H equals 0.
00:38:01.150 --> 00:38:05.940
Z is the log-likelihood
ratio of one trial.
00:38:05.940 --> 00:38:10.650
This is just Wald's
equality with this
00:38:10.650 --> 00:38:12.710
condition thrown into it.
00:38:12.710 --> 00:38:17.290
Now what's the expected value
of SJ given H equals 0?
00:38:17.290 --> 00:38:20.780
It's essentially alpha, and if
you want to be more careful,
00:38:20.780 --> 00:38:26.720
it's alpha plus the expected
overshoot given H equals 0.
00:38:26.720 --> 00:38:29.960
And that's divided by the
expected value of Z,
00:38:29.960 --> 00:38:31.090
given H equals 0.
00:38:31.090 --> 00:38:32.990
This is the answer
we got last time.
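In symbols, the expected number of trials given H equals 0 reads as below. The approximation sign reflects the "essentially alpha" step in the lecture, which neglects the rare event of crossing beta when H equals 0.

```latex
\mathsf{E}[J \mid H=0]
  = \frac{\mathsf{E}[S_J \mid H=0]}{\mathsf{E}[Z \mid H=0]}
  \approx \frac{\alpha + \mathsf{E}[\mathrm{overshoot} \mid H=0]}{\mathsf{E}[Z \mid H=0]} .
```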
00:38:36.260 --> 00:38:39.520
So the number of tests you have
to make, if you set a
00:38:39.520 --> 00:38:43.820
positive threshold alpha, is
essentially the number of
00:38:43.820 --> 00:38:48.310
tests you have to make when
the hypothesis is H equals 0.
00:38:48.310 --> 00:38:52.400
So the funny thing which is
happening here is that as you
00:38:52.400 --> 00:38:56.760
change alpha, you're changing
the probability of error for
00:38:56.760 --> 00:39:00.810
hypothesis H equals 1.
00:39:00.810 --> 00:39:02.830
And you're changing the number
of tests you're going to have
00:39:02.830 --> 00:39:05.290
to do when H is equal to 0.
00:39:05.290 --> 00:39:08.650
When you change beta, it's
just the opposite.
00:39:08.650 --> 00:39:13.130
So that when you change beta, if
you make beta a very large
00:39:13.130 --> 00:39:17.720
negative, you have to make an
enormous number of tests under
00:39:17.720 --> 00:39:20.750
the circumstance that
H is equal to 1.
00:39:20.750 --> 00:39:24.700
But you might make an error
when H is equal to 0.
00:39:24.700 --> 00:39:30.270
So the trade off is between
number of trials under one
00:39:30.270 --> 00:39:33.700
hypothesis, error probability
under the other
00:39:33.700 --> 00:39:34.950
hypothesis.
00:39:39.300 --> 00:39:43.140
That's almost all we wanted to
say about Wald's identity.
00:39:43.140 --> 00:39:46.426
There's one other huge thing
that we want to talk about.
00:39:48.990 --> 00:39:54.470
If you take the first two
derivatives of Wald's identity
00:39:54.470 --> 00:40:00.360
at r equals 0, you get some
interesting things coming out.
00:40:00.360 --> 00:40:03.080
I mean, Wald's identity, you
can use it at any value
00:40:03.080 --> 00:40:05.430
of r you want to.
00:40:05.430 --> 00:40:08.190
And when you use it for a large
value of r, you get an
00:40:08.190 --> 00:40:11.520
interesting result about
large deviations.
00:40:11.520 --> 00:40:15.650
When you use it at a small value
of r, you get something
00:40:15.650 --> 00:40:18.220
more about typical cases.
00:40:18.220 --> 00:40:25.590
So looking at it at r equals 0,
what you want to do is you
00:40:25.590 --> 00:40:28.970
want to take the derivative
with respect
00:40:28.970 --> 00:40:34.170
to r of Wald's identity.
00:40:34.170 --> 00:40:37.720
This expected value in here
we know is equal to 1.
00:40:37.720 --> 00:40:41.050
It's equal to 1 whatever
value of r we choose.
00:40:41.050 --> 00:40:45.140
And therefore, when we take
the derivative of this, we
00:40:45.140 --> 00:40:46.890
have to get 0.
00:40:46.890 --> 00:40:49.150
But we also want to take
the derivative of it
00:40:49.150 --> 00:40:50.670
to see what we get.
00:40:50.670 --> 00:40:53.460
So when you take the derivative
of this quantity
00:40:53.460 --> 00:40:56.260
here and you don't worry
about what exists and
00:40:56.260 --> 00:40:57.510
what doesn't exist--
00:41:01.250 --> 00:41:03.290
you have to take the
derivative here--
00:41:03.290 --> 00:41:07.470
so you get an S sub J there.
00:41:07.470 --> 00:41:09.340
You take the derivative
here, you get a
00:41:09.340 --> 00:41:11.190
gamma prime of r there.
00:41:11.190 --> 00:41:15.440
So you get SJ minus J times
gamma prime of r, and this E
00:41:15.440 --> 00:41:19.110
to the what have you
just sits there.
00:41:19.110 --> 00:41:22.450
You take the derivative of E
to something, you never get
00:41:22.450 --> 00:41:23.580
rid of the E to something.
00:41:23.580 --> 00:41:27.300
You just get piled up stuff
in front of it.
00:41:27.300 --> 00:41:33.840
OK, so when we evaluate that at
r equals 0, what happens?
00:41:33.840 --> 00:41:36.380
Well, what's the value of
the gamma prime of 0?
00:41:40.180 --> 00:41:42.120
It's the expected
value of X, yes.
00:41:48.060 --> 00:41:53.860
And this quantity here is all
equal to 1, so we can forget
00:41:53.860 --> 00:41:56.140
about that.
00:41:56.140 --> 00:42:00.530
When r is equal to 0,
this is equal to 0.
00:42:00.530 --> 00:42:04.320
When r is equal to 0, gamma
of r is equal to 0.
00:42:04.320 --> 00:42:07.810
So this whole thing
in here is 0.
00:42:07.810 --> 00:42:09.510
So E to the 0 is 1.
00:42:09.510 --> 00:42:11.070
So we've got a 1 there.
00:42:11.070 --> 00:42:16.870
We got expected value of
S sub J minus J times X
00:42:16.870 --> 00:42:19.770
bar is equal to 0.
00:42:19.770 --> 00:42:20.290
What is that?
00:42:20.290 --> 00:42:22.190
That's Wald's equality.
00:42:22.190 --> 00:42:27.260
So Wald's equality falls out of
the Wald's identity as what
00:42:27.260 --> 00:42:31.630
happens as the derivative
of Wald's identity
00:42:31.630 --> 00:42:34.000
at r equals 0.
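The first-derivative step can be written out as follows, using the lecture's symbols: gamma of r is the semi-invariant (log) moment generating function of X, and X bar is the mean of X. As in the lecture, this sketch does not worry about when the derivative may be taken inside the expectation.

```latex
% Wald's identity, valid for every r where it holds:
\mathsf{E}\!\left[ e^{\,r S_J - J\gamma(r)} \right] = 1 .
% Differentiate both sides with respect to r:
\mathsf{E}\!\left[ \bigl(S_J - J\gamma'(r)\bigr)\, e^{\,r S_J - J\gamma(r)} \right] = 0 .
% Evaluate at r = 0, where \gamma(0) = 0 and \gamma'(0) = \overline{X}:
\mathsf{E}\!\left[ S_J - J\,\overline{X} \right] = 0
\quad\Longleftrightarrow\quad
\mathsf{E}[S_J] = \overline{X}\,\mathsf{E}[J] .
```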
00:42:34.000 --> 00:42:36.975
Well, since we're so successful
with that, let's go
00:42:36.975 --> 00:42:38.360
on and take another
derivative.
00:42:40.860 --> 00:42:41.566
Yes?
00:42:41.566 --> 00:42:46.180
AUDIENCE: I guess you want the
final equal to 0 [INAUDIBLE].
00:42:46.180 --> 00:42:49.970
PROFESSOR: Oh, the final equal
to 0 comes from the fact that
00:42:49.970 --> 00:42:54.080
this quantity here that you're
starting with is equal to 1
00:42:54.080 --> 00:42:56.120
for all values of r.
00:42:56.120 --> 00:42:57.680
Therefore, when I take
the derivative with
00:42:57.680 --> 00:43:01.130
respect to r, I get 0.
00:43:01.130 --> 00:43:02.750
So that's one equation.
00:43:02.750 --> 00:43:05.100
The other thing is I just go
through the mechanics of
00:43:05.100 --> 00:43:07.400
taking the derivative.
00:43:07.400 --> 00:43:11.720
OK, so let's try to take
the second derivative.
00:43:11.720 --> 00:43:16.360
Take the second derivative by
taking the derivative of the
00:43:16.360 --> 00:43:18.290
first derivative.
00:43:18.290 --> 00:43:26.370
And what happens then is, from this
quantity in here, I get an
00:43:26.370 --> 00:43:29.320
extra term of that sitting
over there.
00:43:29.320 --> 00:43:31.840
And along with that, I get the
derivative of this with
00:43:31.840 --> 00:43:33.090
respect to r.
00:43:35.480 --> 00:43:39.060
I should probably have written
that down there but since I
00:43:39.060 --> 00:43:47.160
didn't, let me see
if I can do it.
00:43:47.160 --> 00:44:03.740
I get the expected value of SJ
minus J gamma prime of r.
00:44:03.740 --> 00:44:05.650
And this quantity
is squared now,
00:44:05.650 --> 00:44:07.450
because I have this there.
00:44:07.450 --> 00:44:11.780
I'm taking the derivative of
this term with respect to r.
00:44:11.780 --> 00:44:16.190
And also, I have to take the
derivative of this with
00:44:16.190 --> 00:44:17.700
respect to r.
00:44:17.700 --> 00:44:25.630
So that gives me minus J times
gamma double prime of r.
00:44:25.630 --> 00:44:41.570
And all of this times E to the
r SJ minus J gamma of r.
00:44:45.260 --> 00:44:49.990
Now I want to evaluate
this at r equals 0.
00:44:49.990 --> 00:44:54.510
Evaluating this at r equals
0, this term goes away.
00:44:54.510 --> 00:45:04.030
So I wind up with the expected
value of SJ minus J gamma
00:45:04.030 --> 00:45:12.240
prime of 0, squared, minus J
gamma double prime of
00:45:12.240 --> 00:45:17.740
0, is equal to 0.
00:45:17.740 --> 00:45:20.060
Well, this doesn't look bad.
00:45:20.060 --> 00:45:23.700
But if you try to use it if you
expand this term here, you
00:45:23.700 --> 00:45:30.480
get a term the expected value of
S sub J times J. And you can
00:45:30.480 --> 00:45:32.440
struggle with that.
00:45:32.440 --> 00:45:33.900
And it's ugly.
00:45:33.900 --> 00:45:34.920
That's very ugly.
00:45:34.920 --> 00:45:43.340
But if you now say, if
we have a mean we
00:45:43.340 --> 00:45:45.200
can use Wald's equality.
00:45:45.200 --> 00:45:47.150
It tells us what we
want to know.
00:45:47.150 --> 00:45:51.170
If we don't have a mean, then
Wald's equality doesn't tell
00:45:51.170 --> 00:45:53.320
us anything.
00:45:53.320 --> 00:45:55.670
But this is going to
tell us something.
00:45:55.670 --> 00:45:59.100
So we're going to make the
assumption here that r is
00:45:59.100 --> 00:46:06.525
equal to 0 and X bar
is equal to 0.
00:46:09.340 --> 00:46:13.590
And if X bar is equal
to 0, gamma prime of
00:46:13.590 --> 00:46:17.170
0 is equal to 0.
00:46:17.170 --> 00:46:29.540
And gamma double prime of 0 is
equal to sigma squared of X.
00:46:29.540 --> 00:46:31.810
So you do all of that.
00:46:31.810 --> 00:46:37.030
What you get is the expected
value of S sub J squared minus
00:46:37.030 --> 00:46:42.070
sigma X squared times
J is equal to 0.
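For reference, the second-derivative step in the same notation is sketched below. It again sets aside questions of existence, and the last line uses the zero-mean assumption made above.

```latex
% Differentiate the first-derivative equation once more with respect to r:
\mathsf{E}\!\left[ \Bigl( \bigl(S_J - J\gamma'(r)\bigr)^2 - J\gamma''(r) \Bigr)\,
                   e^{\,r S_J - J\gamma(r)} \right] = 0 .
% At r = 0 with \overline{X} = 0, so \gamma'(0) = 0 and \gamma''(0) = \sigma_X^2:
\mathsf{E}\!\left[ S_J^2 - \sigma_X^2\, J \right] = 0
\quad\Longleftrightarrow\quad
\mathsf{E}[S_J^2] = \sigma_X^2\,\mathsf{E}[J] .
```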
00:46:42.070 --> 00:46:45.460
This is the same kind
of thing that we
00:46:45.460 --> 00:46:48.360
got from Wald's equality.
00:46:48.360 --> 00:46:52.130
From Wald's equality, it didn't
tell us anything.
00:46:52.130 --> 00:46:56.100
It just gave us a relationship
between the expected value of
00:46:56.100 --> 00:46:59.780
S sub J and expected
value of J. This is
00:46:59.780 --> 00:47:01.180
doing the same thing.
00:47:01.180 --> 00:47:05.540
It's giving us a relationship
between the expected value of
00:47:05.540 --> 00:47:12.040
S sub J squared and the expected
value of J. So we get
00:47:12.040 --> 00:47:13.930
the same kind of quantity.
00:47:13.930 --> 00:47:16.970
It's doing the same
thing for us.
00:47:16.970 --> 00:47:27.530
Now you look at this for a
zero-mean simple random walk.
00:47:27.530 --> 00:47:29.590
Now you would have thought
before you started to take
00:47:29.590 --> 00:47:34.390
this class that a simple random
walk with mean 0 was
00:47:34.390 --> 00:47:37.350
the simplest thing
in the world.
00:47:37.350 --> 00:47:41.010
And we've seen by looking at
that it really isn't all that
00:47:41.010 --> 00:47:45.830
simple, that you can play these
silly games like you can
00:47:45.830 --> 00:47:50.750
gamble forever, where with
probability 1/2 you lose $1,
00:47:50.750 --> 00:47:53.590
with probability 1/2
you win $1--
00:47:53.590 --> 00:47:55.420
perfectly fair game.
00:47:55.420 --> 00:47:58.365
And with probability one, you
get to make $1 out of that, and
00:47:58.365 --> 00:48:00.140
quit and go home.
00:48:00.140 --> 00:48:03.470
And since you get to make $1 out of
it, and quit and go home,
00:48:03.470 --> 00:48:06.610
you can then quickly come
back again and do it again.
00:48:06.610 --> 00:48:07.960
You can make $2.
00:48:07.960 --> 00:48:09.440
You can make $10.
00:48:09.440 --> 00:48:10.750
You can make $1,000.
00:48:10.750 --> 00:48:14.770
You can make $1 million
with probability 1.
00:48:14.770 --> 00:48:17.840
So the simple random walk
is no longer simple.
00:48:17.840 --> 00:48:19.820
It becomes puzzling.
00:48:19.820 --> 00:48:23.910
But Wald's identity is dealing
with two thresholds, one at
00:48:23.910 --> 00:48:27.390
alpha and one at beta.
00:48:27.390 --> 00:48:34.380
When we apply this and you
observe it as a simple random
00:48:34.380 --> 00:48:39.810
walk, where you either go up
by 1 or go down by 1, each
00:48:39.810 --> 00:48:44.410
with probability 1/2, the
mean of X is 0 and the
00:48:44.410 --> 00:48:46.730
variance of X is 1.
00:48:46.730 --> 00:48:49.900
So this quantity here is 1.
00:48:49.900 --> 00:48:56.240
You can then play games with
what the probability is that
00:48:56.240 --> 00:48:59.526
you hit the upper threshold and
the probability that you
00:48:59.526 --> 00:49:01.150
hit the lower threshold.
00:49:01.150 --> 00:49:02.270
I mean, it's done in the text.
00:49:02.270 --> 00:49:05.390
You don't have to take
my word for it.
00:49:05.390 --> 00:49:09.680
And when you do that, what you
find is the expected value of
00:49:09.680 --> 00:49:14.230
J is equal to minus
beta times alpha.
00:49:14.230 --> 00:49:18.000
Beta is a negative number,
remember, so this is expected
00:49:18.000 --> 00:49:22.690
value of J is the magnitude
of beta times the
00:49:22.690 --> 00:49:23.940
magnitude of alpha.
00:49:26.290 --> 00:49:30.230
Now that's a little bizarre, but
then you think about it a
00:49:30.230 --> 00:49:31.370
little bit.
00:49:31.370 --> 00:49:34.410
You think what happens.
00:49:34.410 --> 00:49:36.310
And this is really exact.
00:49:36.310 --> 00:49:40.720
I mean, this isn't an
approximation or anything.
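One way to check that this result is exact is to compute the expected stopping time directly, without Wald's identity. The sketch below assumes integer thresholds with beta below 0 and alpha above 0, and solves the first-step equations E[k] = 1 + (E[k-1] + E[k+1])/2 with E[alpha] = E[beta] = 0 in exact rational arithmetic. The function name `expected_stopping_time` is made up for this illustration, and this is not the derivation used in the text.

```python
from fractions import Fraction


def expected_stopping_time(alpha, beta):
    """Exact E[J] for the +/-1 fair random walk started at 0.

    The walk stops on first reaching alpha (> 0) or beta (< 0), both integers.
    Solves the tridiagonal system
        -1/2 * E[k-1] + E[k] - 1/2 * E[k+1] = 1,  beta < k < alpha,
    with E[alpha] = E[beta] = 0, using the Thomas algorithm.
    """
    if not (beta < 0 < alpha):
        raise ValueError("need beta < 0 < alpha")
    n = alpha - beta - 1                  # number of interior states
    a = c = Fraction(-1, 2)               # sub- and super-diagonal entries
    b = d = Fraction(1)                   # diagonal entry and right-hand side
    c_prime = [c / b]
    d_prime = [d / b]
    for i in range(1, n):                 # forward elimination
        denom = b - a * c_prime[i - 1]
        c_prime.append(c / denom)
        d_prime.append((d - a * d_prime[i - 1]) / denom)
    x = [Fraction(0)] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x[-beta - 1]                   # entry for starting state k = 0


# Usage: compare with minus beta times alpha for a few threshold pairs.
for alpha, beta in [(1, -1), (3, -7), (10, -10), (1, -100)]:
    print(alpha, beta, expected_stopping_time(alpha, beta), -beta * alpha)
```

The same number also follows from the second-derivative result: with variance 1, the expected value of J equals the expected value of S sub J squared, and weighting alpha squared and beta squared by the two threshold-crossing probabilities gives minus beta times alpha.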
00:49:40.720 --> 00:49:45.000
If alpha is very large, and
beta is very large and
00:49:45.000 --> 00:49:50.050
negative, and you play this
random walk game, you're going
00:49:50.050 --> 00:49:52.250
to fluctuate a long time.
00:49:52.250 --> 00:49:54.350
You're going to disperse
slowly.
00:49:54.350 --> 00:49:56.920
You're going to disperse
according to the square root
00:49:56.920 --> 00:50:00.010
of n, or the number
of tests you take.
00:50:00.010 --> 00:50:03.720
So the amount of time it takes
you until you get way out to
00:50:03.720 --> 00:50:08.330
these thresholds should be--
00:50:08.330 --> 00:50:16.080
namely, the value
that n has to have--
00:50:16.080 --> 00:50:19.990
roughly the square of alpha
when beta and alpha
00:50:19.990 --> 00:50:21.040
are both the same.
00:50:21.040 --> 00:50:22.820
This is something more
general than that.
00:50:22.820 --> 00:50:27.950
It says that if, in the
stop-when-you're-ahead game,
00:50:27.950 --> 00:50:34.820
we make alpha equal to 1, the
expected value of J depends on
00:50:34.820 --> 00:50:37.110
what the lower threshold is.
00:50:37.110 --> 00:50:41.870
And that suddenly makes sense,
because what that's saying is
00:50:41.870 --> 00:50:45.530
if we have a lower threshold at
10, an upper threshold at
00:50:45.530 --> 00:50:54.170
one, then most of the
time you win.
00:50:54.170 --> 00:50:57.250
But when you lose,
you lose $10.
00:50:57.250 --> 00:50:58.900
When you win, you win $1.
00:50:58.900 --> 00:51:02.000
When you set a lower threshold
at 100, when you
00:51:02.000 --> 00:51:03.800
lose, you lose $100.
00:51:03.800 --> 00:51:06.370
When you win, you win $1.
00:51:06.370 --> 00:51:09.310
And suddenly, that
stop-when-you're-ahead game
00:51:09.310 --> 00:51:13.300
does not look quite as
attractive as it did before.
00:51:13.300 --> 00:51:17.290
What you're doing is taking a
chance where you're probably
00:51:17.290 --> 00:51:22.950
going to win of winning $1, and
you're risking your life's
00:51:22.950 --> 00:51:27.470
assets for it, which doesn't
make too much sense anymore.
00:51:27.470 --> 00:51:32.200
OK, this, I think, gives you
a better idea of what's going
00:51:32.200 --> 00:51:36.600
on on the simple random walk
than anything else I've seen.
00:51:50.210 --> 00:51:53.100
So it's time to start talking
about martingales.
00:51:56.570 --> 00:51:59.740
A martingale, like most of the
other things we've been
00:51:59.740 --> 00:52:02.120
talking about in the
course, is a
00:52:02.120 --> 00:52:05.320
sequence of random variables.
00:52:05.320 --> 00:52:08.650
This is a more general kind of
sequence than most of them.
00:52:12.085 --> 00:52:16.200
Almost all of the processes
we've talked about so far have
00:52:16.200 --> 00:52:20.400
been the kinds of things you can
sort of get your hands on.
00:52:20.400 --> 00:52:23.950
And this is defined very
abstractly in terms of a
00:52:23.950 --> 00:52:26.690
peculiar property that it has.
00:52:26.690 --> 00:52:31.500
And then the peculiar property
it has is the expected value
00:52:31.500 --> 00:52:36.320
of the nth term, in this thing
called a martingale,
00:52:36.320 --> 00:52:40.010
conditional on knowing the
values of all the previous
00:52:40.010 --> 00:52:45.340
values, expected value of Z sub
n given the values of Z sub n
00:52:45.340 --> 00:52:50.130
minus 1, Z sub n minus 2, all the
way down to Z1 is equal to
00:52:50.130 --> 00:52:52.590
Z sub n minus 1.
00:52:52.590 --> 00:53:00.330
Namely, the expected value here
is what you had there.
00:53:00.330 --> 00:53:05.370
The word martingale comes from
gambling, where gamblers used
00:53:05.370 --> 00:53:09.740
to spend a great deal of time
trying to find gambling
00:53:09.740 --> 00:53:15.020
strategies when to stop, when to
start betting bigger, when
00:53:15.020 --> 00:53:18.460
to start betting smaller, when
to do all sorts of things, all
00:53:18.460 --> 00:53:23.050
sorts of strategies for how
to lose less money.
00:53:23.050 --> 00:53:27.520
Let me put it that way, because
you rarely find that
00:53:27.520 --> 00:53:30.890
opportunity where you can
play a fair game.
00:53:30.890 --> 00:53:37.450
But if you play a fair game,
martingales are what sort of
00:53:37.450 --> 00:53:38.650
rules on that.
00:53:38.650 --> 00:53:42.560
And what that says is if you
play this game for a long
00:53:42.560 --> 00:53:47.480
time, your capital is
Z sub n minus 1.
00:53:50.790 --> 00:53:54.090
This says, figure your expected
capital after you
00:53:54.090 --> 00:53:56.790
play one more time.
00:53:56.790 --> 00:54:01.640
No matter what strategy you use,
your expected capital is
00:54:01.640 --> 00:54:05.460
going to be the same as
the actual
00:54:05.460 --> 00:54:09.170
capital the time before.
00:54:09.170 --> 00:54:14.360
If this is too abstract for you,
and it's too abstract for
00:54:14.360 --> 00:54:18.010
me half the time, because I look
at this, and I say, gee,
00:54:18.010 --> 00:54:20.840
that's not much of a
restriction, is it?
00:54:20.840 --> 00:54:24.380
What we're talking about is
expected values here.
00:54:24.380 --> 00:54:28.810
But it's more than that, because
it's saying for every
00:54:28.810 --> 00:54:33.990
choice of sample value for all
of these things, none of them
00:54:33.990 --> 00:54:37.440
make any difference, except
the last one.
00:54:37.440 --> 00:54:39.970
And that's what happens
in gambling.
00:54:39.970 --> 00:54:44.130
It doesn't make any difference
how your capital has gotten to
00:54:44.130 --> 00:54:49.310
the point where it is
at time n minus 1.
00:54:49.310 --> 00:54:55.880
You make a bet in a fair bet,
and what you win is solely a
00:54:55.880 --> 00:55:01.020
function of what you've bet,
if the game is fair.
00:55:01.020 --> 00:55:02.270
And that's what this
is saying.
00:55:05.790 --> 00:55:09.510
So when you write it out this
way, the expected value of Zn,
00:55:09.510 --> 00:55:13.820
given that the random
variable Zn minus 1 has a
00:55:13.820 --> 00:55:16.900
particular value Zn minus 1.
00:55:19.920 --> 00:55:24.530
You started out with
a particular value.
00:55:24.530 --> 00:55:29.520
It says that expected value
is equal to what you
00:55:29.520 --> 00:55:32.870
had at the last time.
00:55:32.870 --> 00:55:36.170
And this is true for all sample
values Zn minus 1 down
00:55:36.170 --> 00:55:40.080
to Z1, which is why it's a much
stronger statement than
00:55:40.080 --> 00:55:41.330
it appears to be.
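In symbols, the defining property reads as below, with capital letters for the random variables and lowercase letters for their sample values. The usual textbook definition also requires each Z_n to have a finite expected absolute value, a condition not spelled out in this passage.

```latex
\mathsf{E}\!\left[ Z_n \mid Z_{n-1} = z_{n-1},\, Z_{n-2} = z_{n-2},\, \ldots,\, Z_1 = z_1 \right]
  = z_{n-1}
\qquad \text{for all sample values } z_1, \ldots, z_{n-1} .
```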
00:55:43.910 --> 00:55:46.200
Now there's a lemma.
00:55:46.200 --> 00:55:49.250
I want to talk about that a
little bit, because it's a
00:55:49.250 --> 00:55:51.500
good time to get you
used to what these
00:55:51.500 --> 00:55:53.055
expected values mean.
00:55:55.810 --> 00:56:02.040
For a martingale, the expected
value of Zn given Zi, Zi minus
00:56:02.040 --> 00:56:04.080
1, all the way down to Z1.
00:56:04.080 --> 00:56:07.770
This expected value is
equal to Z sub i.
00:56:07.770 --> 00:56:12.560
In other words, it's not only
that your expected capital,
00:56:12.560 --> 00:56:18.030
given all of the past, is equal
to what you had on the
00:56:18.030 --> 00:56:19.880
last time instant.
00:56:19.880 --> 00:56:25.830
If you're not given anything
for 100 years back, and all
00:56:25.830 --> 00:56:33.750
you know is what your capital
was 100 years ago, and if we
00:56:33.750 --> 00:56:36.180
think we're playing a fair game
all of this time, which
00:56:36.180 --> 00:56:39.840
is of course always a question,
the expected value
00:56:39.840 --> 00:56:45.530
of what we have now, conditional
on everything from
00:56:45.530 --> 00:56:51.730
100 years back through
recorded history, is
00:56:51.730 --> 00:56:53.400
just that last term.
00:56:53.400 --> 00:56:58.350
In other words, it's the same
kind of isolation of the past
00:56:58.350 --> 00:57:01.790
from the future as we had
with Markov chains.
00:57:01.790 --> 00:57:05.260
With Markov chains, remember,
it's only what happens at one
00:57:05.260 --> 00:57:10.990
instant, given what happens at
one instant, it makes the past
00:57:10.990 --> 00:57:13.080
independent of the future.
00:57:13.080 --> 00:57:17.430
Here it's not quite that way,
because the past and the
00:57:17.430 --> 00:57:22.140
future are separated only in
terms of the details of the
00:57:22.140 --> 00:57:27.135
past, and the expected
value of the future.
00:57:27.135 --> 00:57:33.320
It says expected value of Z sub
n given all of the details
00:57:33.320 --> 00:57:36.640
of the past, no matter what the
details of the past are,
00:57:36.640 --> 00:57:41.340
the expected value of Zn is
equal to the actual value at
00:57:41.340 --> 00:57:43.440
time i, Z sub i.
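The lemma can be checked by brute force on a small example. The sketch below uses the fair plus-or-minus-1 random walk as the martingale (an illustrative choice, not the general case), enumerates every equally likely path of length n, and averages Z sub n over all paths that share the same first i values. The function name `conditional_means` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_means(n, i):
    """Map each history (z_1, ..., z_i) to E[Z_n | Z_1 = z_1, ..., Z_i = z_i].

    Z_k is the position after k steps of a fair +/-1 random walk, so all
    2**n step sequences are equally likely and conditional expectations
    are plain averages over the paths sharing a given history.
    """
    totals = {}
    counts = {}
    for steps in product((-1, 1), repeat=n):
        path = []
        position = 0
        for step in steps:
            position += step
            path.append(position)
        history = tuple(path[:i])
        totals[history] = totals.get(history, 0) + path[n - 1]
        counts[history] = counts.get(history, 0) + 1
    return {h: Fraction(totals[h], counts[h]) for h in totals}


# Usage: for every history, the conditional mean of Z_6 equals the last
# known value z_3, whatever the earlier details z_1 and z_2 were.
means = conditional_means(n=6, i=3)
print(all(mean == history[-1] for history, mean in means.items()))
```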
00:57:43.440 --> 00:57:48.780
So I want to prove this for
you, and I warn you you're not
00:57:48.780 --> 00:57:50.880
going to follow this proof.
00:57:50.880 --> 00:57:53.820
And that's part of the reason
for me to do it, because I
00:57:53.820 --> 00:57:59.840
want you to go back to Chapter
1 and think that through.
00:57:59.840 --> 00:58:02.740
Because in dealing with
martingales, you have to think
00:58:02.740 --> 00:58:04.300
this through.
00:58:04.300 --> 00:58:07.240
Because if you don't think it
through, you're stuck with
00:58:07.240 --> 00:58:10.760
this notation all
the way through.
00:58:10.760 --> 00:58:14.730
And if you try to use this
notation with martingales,
00:58:14.730 --> 00:58:18.330
this is nice notation when you
get confused, but you don't
00:58:18.330 --> 00:58:19.950
want to use it all the time.
00:58:19.950 --> 00:58:23.500
So you have to be able to go
through arguments like this.
00:58:23.500 --> 00:58:30.860
What I want to show is that if
E of Z3, given Z1 and Z2
00:58:30.860 --> 00:58:37.390
is equal to Z2, then one special
case of this lemma is
00:58:37.390 --> 00:58:45.370
that expected value of Z3
given Z1 is equal to Z1.
00:58:45.370 --> 00:58:47.480
And how do we show that?
00:58:47.480 --> 00:58:51.930
Well, what we do is we use this
law of total expectation.
00:58:55.930 --> 00:59:00.500
Well, first we remember the
expected value of an arbitrary
00:59:00.500 --> 00:59:06.830
random variable X is the
expected value of the expected
00:59:06.830 --> 00:59:10.050
value of X given Y. Now
what does that mean?
00:59:10.050 --> 00:59:17.000
The expected value of the random
variable X, given Y, is
00:59:17.000 --> 00:59:18.990
a random variable.
00:59:18.990 --> 00:59:22.500
It's a random variable which
depends on Y. That's a
00:59:22.500 --> 00:59:27.250
function of the sample value of
Y. Namely, if you look at
00:59:27.250 --> 00:59:30.990
this quantity up here,
expected value of X
00:59:30.990 --> 00:59:33.290
given Y equals 1.
00:59:33.290 --> 00:59:36.130
Expected value of X
given Y equals 2.
00:59:36.130 --> 00:59:38.830
Expected value of X
given Y equals 3.
00:59:38.830 --> 00:59:41.010
We have all of these
values here.
00:59:41.010 --> 00:59:43.720
We have a probability
measure on it.
00:59:43.720 --> 00:59:51.150
This is a random variable,
which is a function of Y.
00:59:51.150 --> 00:59:54.710
You've averaged that over X,
but you're left with Y
00:59:54.710 --> 00:59:56.810
because of the conditioning
here.
00:59:56.810 --> 01:00:04.440
So this quantity in here is now
a function of Y. So when
01:00:04.440 --> 01:00:11.290
we take this equation and we
add the conditioning on Z1,
01:00:11.290 --> 01:00:16.920
namely, this is being
used for Z3 and Z2.
01:00:16.920 --> 01:00:26.910
Expected value of Z3 is equal to
the expected value over Z2
01:00:26.910 --> 01:00:32.340
of the expected value of Z3
given Z2, whole thing
01:00:32.340 --> 01:00:35.180
dependent on Z1.
01:00:35.180 --> 01:00:39.040
OK, so what it says is this
expected value is the expected
01:00:39.040 --> 01:00:46.110
value of the expected value of
Z3 conditioned on Z2 and Z1.
01:00:46.110 --> 01:00:50.840
This quantity here is
a function of what?
01:00:50.840 --> 01:00:51.990
That's a random variable.
01:00:51.990 --> 01:00:55.136
It's a function of what
random variables?
01:00:55.136 --> 01:00:56.540
AUDIENCE: Z2, Z1.
01:00:56.540 --> 01:00:59.350
PROFESSOR: Z1 and Z2, yes.
01:00:59.350 --> 01:01:02.380
So this is a function
of Z1 and Z2.
01:01:02.380 --> 01:01:06.220
What value is it as a function
of Z1 and Z2?
01:01:06.220 --> 01:01:09.040
It's just equal to Z2.
01:01:09.040 --> 01:01:12.360
So this quantity
in here is Z2.
01:01:12.360 --> 01:01:17.620
So we're asking what's the expected
value of Z2 given Z1.
01:01:17.620 --> 01:01:23.960
And by definition of martingale,
it is equal to Z1.
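Written out as a short derivation, the argument above might look like the following sketch. It only assumes the martingale condition as stated in the lecture; the layout is an editorial summary, not the slide itself.

```latex
\begin{align*}
E[Z_3 \mid Z_1]
  &= E\bigl[\, E[Z_3 \mid Z_2, Z_1] \,\bigm|\, Z_1 \bigr]
     && \text{(total expectation, conditioned on } Z_1\text{)} \\
  &= E[Z_2 \mid Z_1]
     && \text{(since } E[Z_3 \mid Z_2, Z_1] = Z_2\text{)} \\
  &= Z_1
     && \text{(martingale condition again)}
\end{align*}
```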
01:01:27.130 --> 01:01:30.910
Now I imagine about half of you
could follow that, and half of
01:01:30.910 --> 01:01:35.460
you couldn't, and half of
you sort of followed it.
01:01:35.460 --> 01:01:39.530
This is a kind of argument we'll
be using all the way
01:01:39.530 --> 01:01:41.840
through on this stuff.
01:01:41.840 --> 01:01:43.310
So make sure you
understand it.
01:01:45.990 --> 01:01:49.220
I mean, once you get
it, it's easy.
01:01:49.220 --> 01:01:52.390
And you can apply it in
all sorts of places.
01:01:52.390 --> 01:01:55.010
So it's worth doing it.
01:01:55.010 --> 01:01:57.650
In the same way, you can
follow the same kind of
01:01:57.650 --> 01:02:02.470
argument through the expected
value of Z sub i plus 2, using
01:02:02.470 --> 01:02:09.020
this total expectation
based on Zi plus 1.
01:02:09.020 --> 01:02:11.900
And you go through
the whole thing.
01:02:11.900 --> 01:02:17.770
When you go down to i equals 1,
it says the expected value
01:02:17.770 --> 01:02:22.330
of Z sub n is equal to the expected
value of Z1.
01:02:22.330 --> 01:02:26.420
If you want to become wealthy,
have a wealthy parent, who
01:02:26.420 --> 01:02:28.880
left you a lot of money
20 years ago.
01:02:28.880 --> 01:02:34.600
That's the easiest way to make
a million dollars is to start
01:02:34.600 --> 01:02:37.580
out with 2 million dollars
is the way that
01:02:37.580 --> 01:02:40.830
some people put it.
01:02:40.830 --> 01:02:45.770
OK, let's have some simple
examples of martingales.
01:02:45.770 --> 01:02:49.536
One of them is a zero-mean
random walk.
01:02:49.536 --> 01:02:52.370
Mainly, what I'm trying to do
here is to show you that
01:02:52.370 --> 01:02:56.320
martingales are really pretty
general things.
01:02:56.320 --> 01:03:00.480
And since there are many very
general theorems that hold for
01:03:00.480 --> 01:03:03.980
all martingales, you can then
apply them to all of these
01:03:03.980 --> 01:03:07.640
special cases, which
is kind of neat.
01:03:07.640 --> 01:03:14.530
To have a zero-mean random walk,
let Z sub n be the sum
01:03:14.530 --> 01:03:23.720
of X1 up to Xn, and the X sub
i's are IID and zero mean.
01:03:23.720 --> 01:03:26.710
The fact that they're IID
makes it a random walk.
01:03:26.710 --> 01:03:30.060
The fact that they're zero mean
makes it a special zero
01:03:30.060 --> 01:03:34.790
mean random walk, and the
expected value of Z sub n,
01:03:34.790 --> 01:03:38.860
given Zn minus 1.
01:03:38.860 --> 01:03:47.870
All the way back, Zn now
is Xn plus Zn minus 1.
01:03:47.870 --> 01:03:50.990
OK, Zn is the sum of all
these random variables.
01:03:50.990 --> 01:03:54.030
So you add up n minus 1 of them,
and then you add the
01:03:54.030 --> 01:03:55.290
last one in.
01:03:55.290 --> 01:04:07.250
So it's Zn minus 1 plus Xn, so
it's the expected value of Xn plus
01:04:07.250 --> 01:04:12.700
Zn minus 1, given all the
stuff before that.
01:04:12.700 --> 01:04:17.160
The expected value of Xn, given
all this stuff, is what?
01:04:17.160 --> 01:04:20.640
Xn is independent of all the
other X's, therefore it's
01:04:20.640 --> 01:04:23.760
independent of all
the earlier Z's.
01:04:23.760 --> 01:04:27.840
And therefore, that's just
expected value of Xn.
01:04:27.840 --> 01:04:32.740
So we have the expected value of
Zn minus 1, given Zn minus
01:04:32.740 --> 01:04:34.380
1 back to Z1.
01:04:34.380 --> 01:04:39.130
What's expected value of Zn
minus 1, given Zn minus 1?
01:04:39.130 --> 01:04:42.596
Well, it's Zn minus 1.
01:04:42.596 --> 01:04:44.380
That's no problem there.
01:04:44.380 --> 01:04:47.200
So this is 0.
01:04:47.200 --> 01:04:53.340
So this is equal to Zn minus
1, as it's supposed to be.
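As a compact restatement of the steps just described, here is a sketch of the derivation. It assumes only what the lecture states: the X sub i's are IID with zero mean and Zn is their sum.

```latex
\begin{align*}
E[Z_n \mid Z_{n-1}, \ldots, Z_1]
  &= E[X_n + Z_{n-1} \mid Z_{n-1}, \ldots, Z_1] \\
  &= E[X_n \mid Z_{n-1}, \ldots, Z_1] + E[Z_{n-1} \mid Z_{n-1}, \ldots, Z_1] \\
  &= E[X_n] + Z_{n-1}
     && \text{(} X_n \text{ is independent of } Z_1, \ldots, Z_{n-1}\text{)} \\
  &= 0 + Z_{n-1} = Z_{n-1}.
\end{align*}
```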
01:04:53.340 --> 01:04:56.770
All of these things you ought
to go back and think them
01:04:56.770 --> 01:04:57.620
through yourself.
01:04:57.620 --> 01:05:01.120
Because the first time you look
at martingales, all of
01:05:01.120 --> 01:05:05.040
this stuff, it's all pretty
easy, but it all looks a
01:05:05.040 --> 01:05:07.860
little strange at first.
01:05:07.860 --> 01:05:10.260
The next one is sums
of arbitrary
01:05:10.260 --> 01:05:12.540
dependent random variables.
01:05:12.540 --> 01:05:15.270
They're not quite arbitrary.
01:05:15.270 --> 01:05:19.900
Suppose you have a sequence of
random variables, X sub i, i
01:05:19.900 --> 01:05:21.910
greater or equal to 1.
01:05:21.910 --> 01:05:25.970
And they satisfy the expected
value of Xi, given all the
01:05:25.970 --> 01:05:30.310
earlier X sub i's
is equal to 0.
01:05:30.310 --> 01:05:33.060
It's similar to what
a martingale is.
01:05:33.060 --> 01:05:37.350
But here, we're just saying
the Xi's all have expected
01:05:37.350 --> 01:05:39.530
value of 0.
01:05:39.530 --> 01:05:45.350
And Zn, the sum of these,
has to be a martingale.
01:05:45.350 --> 01:05:48.380
And I'm not going to write
out a proof of that.
01:05:48.380 --> 01:05:55.020
I mean, this proof is really
the same as this proof.
01:05:55.020 --> 01:05:59.110
This is really a pretty
important thing, because given
01:05:59.110 --> 01:06:06.180
any martingale, you can always
look at the partial sums
01:06:06.180 --> 01:06:09.370
between the terms of
the martingale--
01:06:09.370 --> 01:06:13.120
namely, given Z1, Z2,
up to Z sub n.
01:06:13.120 --> 01:06:16.440
You can always look
at Z2 minus Z1.
01:06:16.440 --> 01:06:18.970
You can look at Z3 minus Z2.
01:06:18.970 --> 01:06:23.980
You can look at Z4 minus
Z3, and so forth.
01:06:23.980 --> 01:06:26.910
And each of the Z's is
just the sum of those
01:06:26.910 --> 01:06:28.380
other random variables.
01:06:28.380 --> 01:06:32.810
So given any martingale in the
world, you can always define
01:06:32.810 --> 01:06:37.920
a set of arbitrary dependent
random variables which satisfy
01:06:37.920 --> 01:06:39.970
this rule here.
01:06:39.970 --> 01:06:45.900
So Zn, in this case,
is a martingale.
01:06:45.900 --> 01:06:50.690
And if Zn is a martingale, you
can always define a set of
01:06:50.690 --> 01:06:55.450
random variables, which
satisfy this property.
01:06:55.450 --> 01:06:58.920
I think it's almost easier to
see what a martingale
01:06:58.920 --> 01:07:01.430
really has to do with gambling,
which is where it
01:07:01.430 --> 01:07:05.420
started, by looking at this.
01:07:05.420 --> 01:07:08.790
This is not your capital
at time n.
01:07:08.790 --> 01:07:13.750
This is how much you win
or lose at time i.
01:07:13.750 --> 01:07:20.270
And what it's saying is your
winnings or losings at time i
01:07:20.270 --> 01:07:25.560
has zero mean independent of
everything in the past.
01:07:25.560 --> 01:07:28.770
In other words, in a fair game,
you can bet whatever you
01:07:28.770 --> 01:07:34.470
want to and depending on what
you bet, that's the expected
01:07:34.470 --> 01:07:37.940
amount you get on that trial.
01:07:37.940 --> 01:07:40.590
And that's what this says.
01:07:40.590 --> 01:07:43.860
This says essentially, you're
playing a fair game.
01:07:43.860 --> 01:07:47.100
So martingales really have
to do with fair games.
01:07:47.100 --> 01:07:50.390
If you can find fair games,
why, that's great.
01:07:50.390 --> 01:07:54.200
But we always look for games
where we have an edge.
01:07:54.200 --> 01:07:56.585
But what you want to avoid
is games where Las
01:07:56.585 --> 01:07:58.630
Vegas has an edge.
01:07:58.630 --> 01:08:01.800
OK so, that's a general one.
01:08:04.320 --> 01:08:10.790
Here's an interesting one,
because I think this is an
01:08:10.790 --> 01:08:13.470
example which you can use.
01:08:13.470 --> 01:08:15.820
I mean, in any field you
study, there are always
01:08:15.820 --> 01:08:20.760
generic examples, which can be
used to generate counter
01:08:20.760 --> 01:08:24.899
examples to any simple thing
you might want to think of.
01:08:24.899 --> 01:08:27.850
And this is to me the most
interesting one of those for
01:08:27.850 --> 01:08:29.880
martingales.
01:08:29.880 --> 01:08:34.800
Suppose that Xi is the product
of two random variables--
01:08:34.800 --> 01:08:40.890
one is either plus 1 or minus 1,
each with probability 1/2.
01:08:40.890 --> 01:08:44.510
And the other one, Y sub i is
anything it wants to be.
01:08:44.510 --> 01:08:46.609
I don't care what Y sub i is.
01:08:46.609 --> 01:08:50.810
Y sub i is non-negative, might
as well make it non-negative.
01:08:50.810 --> 01:08:51.540
I don't care about it.
01:08:51.540 --> 01:08:54.670
I don't care how it's related
to all the other Y sub i's.
01:08:54.670 --> 01:08:58.200
All I want is that the U sub
i's are all independent of
01:08:58.200 --> 01:09:00.999
all the Y sub i's.
01:09:00.999 --> 01:09:03.590
And what happens then?
01:09:03.590 --> 01:09:07.420
I take the expected value of X
sub i, given anything in the
01:09:07.420 --> 01:09:09.529
past, and what do I get?
01:09:12.100 --> 01:09:15.450
U sub i is independent
of Y sub i.
01:09:15.450 --> 01:09:19.229
And therefore, the expected
value of U sub i times Y sub i
01:09:19.229 --> 01:09:24.330
is expected value of U sub
i, which is what--
01:09:24.330 --> 01:09:27.665
plus 1 or minus 1 with probability
1/2 each.
01:09:27.665 --> 01:09:31.880
The expected value of U
sub i is equal to 0.
01:09:31.880 --> 01:09:35.050
That makes expected value
of X sub i equal to 0
01:09:35.050 --> 01:09:39.080
whatever the past is.
01:09:39.080 --> 01:09:41.716
So you automatically
have the--
01:09:46.676 --> 01:09:49.050
I don't know what to call
them, the terms between the
01:09:49.050 --> 01:09:51.640
terms of a martingale--
01:09:51.640 --> 01:09:55.990
the interarrival terms,
so to speak.
01:09:55.990 --> 01:09:59.500
I mean, it's like those
for a renewal process.
01:09:59.500 --> 01:10:02.130
Those terms always
have mean 0.
01:10:02.130 --> 01:10:08.180
And therefore, the sums of
these turn out to be this
01:10:08.180 --> 01:10:11.000
simple kind of martingale.
01:10:11.000 --> 01:10:14.910
So that's a nice martingale to
use as counter examples for
01:10:14.910 --> 01:10:17.350
almost anything.
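To make this construction concrete, here is a small exact check in Python. It is only a sketch under assumptions that are not in the lecture: the Y sub i's are taken to be completely dependent on one another (Y_i = i times Y_1, with Y_1 equal to 1 or 3 with probability 1/2 each), and the function name `conditional_means` is made up for illustration. The U sub i's are independent fair signs, independent of the Y's, as the lecture requires.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def conditional_means(n):
    """Return E[X_n | X_1..X_{n-1}] for every possible past, computed exactly.

    Assumed toy model: X_i = U_i * Y_i, U_i = +1 or -1 with probability 1/2,
    Y_i = i * Y_1, and Y_1 is 1 or 3 with probability 1/2 (independent of the U's).
    """
    weighted_sum = defaultdict(Fraction)  # sum of prob * x_n for each past
    past_prob = defaultdict(Fraction)     # probability of each past
    for y1 in (1, 3):
        for signs in product((1, -1), repeat=n):
            prob = Fraction(1, 2) * Fraction(1, 2) ** n
            xs = tuple(u * y1 * (i + 1) for i, u in enumerate(signs))
            past, x_n = xs[:-1], xs[-1]
            weighted_sum[past] += prob * x_n
            past_prob[past] += prob
    return {past: weighted_sum[past] / past_prob[past] for past in past_prob}


means = conditional_means(4)
```

Every entry of `means` comes out to zero even though the Y's are strongly dependent, which is the property that makes the partial sums a martingale.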
01:10:17.350 --> 01:10:19.730
The next one is product
form martingales.
01:10:19.730 --> 01:10:24.970
Product form martingales are
things we use quite a bit too,
01:10:24.970 --> 01:10:28.240
because now when we're using
generating functions, we're in
01:10:28.240 --> 01:10:31.200
the habit of multiplying
things together.
01:10:31.200 --> 01:10:34.320
And that's a useful
thing to do.
01:10:34.320 --> 01:10:41.000
So the expected value of Z sub
n, given Z sub n minus
01:10:41.000 --> 01:10:48.590
1, down to Z1, where Zn is
this product of terms.
01:10:48.590 --> 01:10:55.410
OK, Z sub n then is equal to
Xn times Z sub n minus 1,
01:10:55.410 --> 01:10:56.900
which is what we're
doing here.
01:10:56.900 --> 01:11:01.180
Expected value of Zn conditional
on the past is the
01:11:01.180 --> 01:11:06.600
expected value of Xn times Zn
minus 1, conditional on the
01:11:06.600 --> 01:11:13.170
past, Xn and Zn minus 1.
01:11:19.750 --> 01:11:25.790
Oh, the expected value of Xn
for any given value of Zn
01:11:25.790 --> 01:11:29.770
minus 1, all the way back,
is just the expected
01:11:29.770 --> 01:11:32.750
value of X sub n.
01:11:32.750 --> 01:11:36.510
So we have expected value of X
sub n times the expected value
01:11:36.510 --> 01:11:40.970
of Z sub n minus 1, given
Zn minus 1 down to Z1.
01:11:40.970 --> 01:11:43.200
So that's just Zn minus 1.
01:11:46.170 --> 01:11:51.260
Ah, the missing quantity,
fortunately I wrote it here.
01:11:51.260 --> 01:11:53.675
The X sub i's are unit-mean
random variables.
01:11:57.990 --> 01:11:58.880
And they're IID.
01:11:58.880 --> 01:12:00.700
They're independent
of each other.
01:12:00.700 --> 01:12:05.260
And since the X sub i's are
independent of each other, X
01:12:05.260 --> 01:12:09.140
sub n is independent
of Xn minus 1, all
01:12:09.140 --> 01:12:12.110
the way back to X1.
01:12:12.110 --> 01:12:16.080
Zn minus 1 back to Z1
is a function of Xn
01:12:16.080 --> 01:12:19.140
minus 1, down to X1.
01:12:19.140 --> 01:12:24.090
So Xn is independent of all
those previous Z's also.
01:12:24.090 --> 01:12:27.560
That's why I could split
this apart in this way.
01:12:27.560 --> 01:12:31.350
And suddenly I wind up with
Zn minus 1 again.
01:12:31.350 --> 01:12:35.390
So product form martingales
work.
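The product form argument can be summarized in the same way. This is a sketch that assumes, as stated in the lecture, that the X sub i's are IID with unit mean and that Zn is the product of X1 through Xn.

```latex
\begin{align*}
E[Z_n \mid Z_{n-1}, \ldots, Z_1]
  &= E[X_n Z_{n-1} \mid Z_{n-1}, \ldots, Z_1] \\
  &= E[X_n] \, Z_{n-1}
     && \text{(} X_n \text{ is independent of } Z_1, \ldots, Z_{n-1}\text{)} \\
  &= Z_{n-1}
     && \text{(} E[X_n] = 1\text{)}
\end{align*}
```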
01:12:35.390 --> 01:12:39.010
Special form of product
form martingales--
01:12:39.010 --> 01:12:44.930
this again is a favorite counter
example for when you can and
01:12:44.930 --> 01:12:48.530
can't get away with
going to limits and
01:12:48.530 --> 01:12:51.040
interchanging limits.
01:12:51.040 --> 01:12:53.360
And it's a simple one.
01:12:53.360 --> 01:12:56.740
Suppose that X sub i's
are IID, as in
01:12:56.740 --> 01:12:59.790
the previous example.
01:12:59.790 --> 01:13:02.735
And they're 2 or 0,
each with probability 1/2.
01:13:09.010 --> 01:13:12.820
I mean, this is a game
you often play--
01:13:12.820 --> 01:13:14.190
double or nothing.
01:13:14.190 --> 01:13:15.950
You start out with a dollar.
01:13:15.950 --> 01:13:16.890
You play the game.
01:13:16.890 --> 01:13:18.780
If you win, you get $2.
01:13:18.780 --> 01:13:21.300
If you lose, you're broke.
01:13:21.300 --> 01:13:24.210
If you win, you play your $2.
01:13:24.210 --> 01:13:26.020
If you win again, you have $4.
01:13:26.020 --> 01:13:27.760
If you lose, you're broke.
01:13:27.760 --> 01:13:28.950
You play again.
01:13:28.950 --> 01:13:30.460
If you win, you have $8.
01:13:30.460 --> 01:13:33.020
If you lose again,
you're broke.
01:13:33.020 --> 01:13:39.140
So the probability that Z sub n,
which is your capital after
01:13:39.140 --> 01:13:43.640
n trials, is equal to 2 to the
n, namely you've won all n
01:13:43.640 --> 01:13:48.540
times, is 2 to the minus n.
01:13:48.540 --> 01:13:52.840
And every other instance
you've lost.
01:13:52.840 --> 01:13:57.370
So the probability that Zn is
equal to 0 is 1 minus 2
01:13:57.370 --> 01:13:59.170
to the minus n.
01:13:59.170 --> 01:14:02.660
So for each n, if you calculate
the expected value
01:14:02.660 --> 01:14:07.420
of Z sub n, it's equal to 1.
01:14:07.420 --> 01:14:12.970
Namely, with probability 2 to
the minus n, your capital is 2
01:14:12.970 --> 01:14:15.850
to the n, so that's 1.
01:14:15.850 --> 01:14:18.630
With all the other probability,
you have nothing.
01:14:18.630 --> 01:14:24.070
So your expected value of Z sub
n is always equal to 1.
01:14:24.070 --> 01:14:27.420
That's what this product
form martingale says.
01:14:27.420 --> 01:14:29.820
And this is a product
form martingale.
01:14:29.820 --> 01:14:36.150
However, the limit as n goes to
infinity of Zn is equal to
01:14:36.150 --> 01:14:39.130
0 with probability 1.
01:14:39.130 --> 01:14:43.300
If you play double or nothing,
eventually you lose.
01:14:43.300 --> 01:14:45.920
And then you're wiped out.
01:14:45.920 --> 01:14:47.830
In other words, there's no real
purpose to playing the
01:14:47.830 --> 01:14:50.720
game, because eventually
you lose.
01:14:50.720 --> 01:14:53.620
If you're playing with somebody
else, and they're
01:14:53.620 --> 01:14:56.500
playing double or nothing, then
of course you get their
01:14:56.500 --> 01:14:58.530
money eventually.
01:14:58.530 --> 01:15:02.890
Or you go broke and the bank
that you bank at fails, and
01:15:02.890 --> 01:15:05.240
all that stuff.
01:15:05.240 --> 01:15:06.810
We won't worry about that.
01:15:06.810 --> 01:15:10.220
OK, but the point of this is
that the limit as n goes to
01:15:10.220 --> 01:15:13.690
infinity of Zn is equal to 0.
01:15:13.690 --> 01:15:17.020
And the limit of the expected
value of Z sub
01:15:17.020 --> 01:15:19.920
n is equal to 1.
01:15:19.920 --> 01:15:24.870
And therefore, the limit of Zn
and the expected value of the
01:15:24.870 --> 01:15:26.130
limit of Zn.
01:15:26.130 --> 01:15:29.260
The expected value of the
limit of Zn is 0.
01:15:29.260 --> 01:15:33.550
The limit of the expected value
of Zn is equal to one.
01:15:33.550 --> 01:15:36.530
So this is a case where you
can't interchange limit and
01:15:36.530 --> 01:15:37.920
expectation.
01:15:37.920 --> 01:15:41.690
It's an easy one to keep in
mind, because we all know
01:15:41.690 --> 01:15:43.740
about playing double
or nothing.
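The double-or-nothing numbers are easy to tabulate exactly. The following Python sketch uses exact fractions; the function names are made up for illustration, and the starting capital of one dollar is the one used in the lecture.

```python
from fractions import Fraction


def double_or_nothing_distribution(n):
    """Exact distribution of Z_n, the capital after n rounds starting from 1 dollar."""
    p_win_all = Fraction(1, 2) ** n  # you must win all n rounds to have anything
    return {2 ** n: p_win_all, 0: 1 - p_win_all}


def expected_capital(n):
    """E[Z_n], summed directly over the two possible values."""
    dist = double_or_nothing_distribution(n)
    return sum(value * prob for value, prob in dist.items())


rounds = range(1, 21)
expectations = [expected_capital(n) for n in rounds]
prob_broke = [double_or_nothing_distribution(n)[0] for n in rounds]
```

Here `expectations` stays at 1 for every n, while `prob_broke` climbs toward 1, which is the gap between the limit of the expectation and the expectation of the limit.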
01:15:50.630 --> 01:15:53.500
Might as well define
submartingales and
01:15:53.500 --> 01:15:55.540
supermartingales.
01:15:55.540 --> 01:16:00.800
Because the first thing to know
about them is they're
01:16:00.800 --> 01:16:03.780
like martingales, except
they're defined with
01:16:03.780 --> 01:16:06.220
inequalities.
01:16:06.220 --> 01:16:10.580
And for a submartingale, the
expected value of Zn, given
01:16:10.580 --> 01:16:12.970
all the previous terms,
is greater than or
01:16:12.970 --> 01:16:15.560
equal to Zn minus 1.
01:16:15.560 --> 01:16:18.520
So submartingales go up.
01:16:18.520 --> 01:16:20.630
Supermartingales are
the opposite.
01:16:20.630 --> 01:16:23.350
Supermartingales go down.
01:16:23.350 --> 01:16:27.930
What else could you expect from
a mathematical theory?
01:16:27.930 --> 01:16:29.410
Things that should
go up, go down.
01:16:29.410 --> 01:16:31.500
Things that should
go down, go up.
01:16:31.500 --> 01:16:34.110
Only thing you have to remember
about submartingales
01:16:34.110 --> 01:16:39.490
and supermartingales is you
figure out what terminology
01:16:39.490 --> 01:16:43.350
should have been used, and you
remember the terminology they
01:16:43.350 --> 01:16:45.750
use was the opposite of
what they should use.
01:16:51.520 --> 01:16:52.980
I don't know whether I've
ever seen stupider
01:16:52.980 --> 01:16:54.720
terminology than this.
01:16:54.720 --> 01:16:57.890
And someone once explained the
reasoning for it, and the
01:16:57.890 --> 01:16:59.140
reasoning was stupid too.
01:17:01.990 --> 01:17:03.786
So there's no excuse
for that one.
01:17:08.130 --> 01:17:11.680
We're only going to refer to
submartingales in what we're
01:17:11.680 --> 01:17:15.330
doing, partly because that's
where most of the
01:17:15.330 --> 01:17:16.980
neat results are.
01:17:16.980 --> 01:17:24.920
And the other thing is if
you have to deal with a
01:17:24.920 --> 01:17:28.350
supermartingale, what you might
as well do is instead of
01:17:28.350 --> 01:17:29.430
dealing with a sequence--
01:17:29.430 --> 01:17:30.850
Z1, Z2--
01:17:30.850 --> 01:17:35.130
deal with the sequence minus
Z1, minus Z2, and so forth.
01:17:35.130 --> 01:17:39.290
And if you change the sign on
all the terms, then you change
01:17:39.290 --> 01:17:43.840
supermartingales into
submartingales and vice versa.
01:17:43.840 --> 01:17:47.510
You don't really have to
deal with both of them.
01:17:47.510 --> 01:17:57.090
Let me talk briefly about an
inequality that I'm sure most
01:17:57.090 --> 01:17:58.450
of you heard of.
01:17:58.450 --> 01:18:00.480
How many people have heard
of Jensen's Inequality?
01:18:03.680 --> 01:18:06.930
Maybe half of you,
so not everyone.
01:18:06.930 --> 01:18:10.700
Well, it's one of the
main workhorses
01:18:10.700 --> 01:18:12.540
of probability theory.
01:18:16.730 --> 01:18:21.160
Even though we haven't seen it
yet this term, you will see it
01:18:21.160 --> 01:18:22.990
many times.
01:18:22.990 --> 01:18:26.680
So what a convex function is.
01:18:26.680 --> 01:18:31.210
A convex function in simple
minded terms is something, a
01:18:31.210 --> 01:18:33.790
convex function from R into R.
01:18:33.790 --> 01:18:40.700
A real-valued convex function
is a function which has a
01:18:40.700 --> 01:18:43.250
positive second derivative
everywhere.
01:18:43.250 --> 01:18:46.760
So it curves down and
comes back up again.
01:18:46.760 --> 01:18:49.730
Since you also want to talk
about functions which don't
01:18:49.730 --> 01:18:53.740
have second derivatives,
you want something more
01:18:53.740 --> 01:18:55.110
general than that.
01:18:55.110 --> 01:18:58.760
So you go from derivatives.
01:18:58.760 --> 01:19:01.260
You go back to your
high school ideas,
01:19:01.260 --> 01:19:02.900
and you draw a picture.
01:19:02.900 --> 01:19:07.060
And a function is convex
01:19:07.060 --> 01:19:13.660
if all the tangents to the curve
lie, not strictly below,
01:19:13.660 --> 01:19:17.530
but all the tangents to the curve
lie beneath the curve, so
01:19:17.530 --> 01:19:20.800
wherever you draw a tangent,
you get something which
01:19:20.800 --> 01:19:24.530
doesn't cross the curve.
01:19:24.530 --> 01:19:29.550
Magnitude of X is a convex
function of X. Magnitude of X as
01:19:29.550 --> 01:19:32.120
a function of X looks
like this.
01:19:32.120 --> 01:19:34.990
You go down or up.
01:19:34.990 --> 01:19:39.740
And all tangents to this, this
goes off to infinity and this
01:19:39.740 --> 01:19:40.790
goes off to infinity.
01:19:40.790 --> 01:19:42.440
So there's no way to
get something like
01:19:42.440 --> 01:19:44.670
that in this tangent.
01:19:44.670 --> 01:19:46.540
So you have one tangent here.
01:19:46.540 --> 01:19:48.870
You have a bunch of tangents
along here.
01:19:48.870 --> 01:19:50.810
And you have one
tangent there.
01:19:50.810 --> 01:19:52.840
And they all lie below
the curve.
01:19:52.840 --> 01:19:56.320
So magnitude of X is a convex
function too.
01:19:56.320 --> 01:20:02.390
And Jensen's Inequality says if
H is convex, and if Z is a
01:20:02.390 --> 01:20:07.970
random variable that has finite
expectation, then H of the
01:20:07.970 --> 01:20:12.470
expected value of Z is less than
or equal to the expected
01:20:12.470 --> 01:20:16.960
value of H of Z. You can
interchange expected value and
01:20:16.960 --> 01:20:21.020
function with inequality
like this, if
01:20:21.020 --> 01:20:23.770
the function is convex.
01:20:23.770 --> 01:20:25.180
Now, why is this true?
01:20:29.430 --> 01:20:33.950
You can see why it's true
automatically, if you're
01:20:33.950 --> 01:20:38.600
dealing with a random variable
that has only two values.
01:20:38.600 --> 01:20:45.600
If you have two values for
the random variable--
01:20:45.600 --> 01:20:47.680
Z is a random variable here.
01:20:47.680 --> 01:20:51.630
You have one variable here,
one value here, one sample
01:20:51.630 --> 01:20:53.660
value here.
01:20:53.660 --> 01:21:00.130
Look at what the expected
value of H of Z is.
01:21:00.130 --> 01:21:06.640
The expected value of H of Z is
the expected value of this
01:21:06.640 --> 01:21:10.173
and this with the appropriate
probability put in on it, so
01:21:10.173 --> 01:21:12.770
it's at some point that lies
on the straight line
01:21:12.770 --> 01:21:15.990
between here and there.
01:21:15.990 --> 01:21:21.910
When you look at the H of the
expected value, then you
01:21:21.910 --> 01:21:24.340
find the expected value
along here.
01:21:24.340 --> 01:21:27.200
You can think of finding it
along the straight line here.
01:21:27.200 --> 01:21:29.960
And then it's that
point there.
01:21:29.960 --> 01:21:40.500
So since the curve is convex,
the H of the expected value of
01:21:40.500 --> 01:21:43.870
Z is there's always
a bunch of points.
01:21:43.870 --> 01:21:45.530
Average them, which
lie on a straight
01:21:45.530 --> 01:21:47.820
line beneath the curve.
01:21:47.820 --> 01:21:51.270
And the expected value of H
of Z is taking the average
01:21:51.270 --> 01:21:53.220
directly along the curve.
01:21:53.220 --> 01:21:56.280
So you get this boosting
up everywhere.
01:21:56.280 --> 01:22:03.570
It's like saying that the
absolute value of expected
01:22:03.570 --> 01:22:09.470
value of Z is less than or equal
to the expected value of
01:22:09.470 --> 01:22:14.610
the absolute value of Z.
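For a two-valued random variable the inequality can be checked by direct arithmetic. This Python sketch is only an illustration: the sample values (-1 and 3) and probabilities (1/4 and 3/4) are arbitrary choices, and `jensen_sides` is a made-up helper name.

```python
def jensen_sides(h, values, probs):
    """Return (h(E[Z]), E[h(Z)]) for a discrete random variable Z."""
    mean = sum(p * z for z, p in zip(values, probs))
    h_of_mean = h(mean)                                        # point on the curve at E[Z]
    mean_of_h = sum(p * h(z) for z, p in zip(values, probs))   # point on the chord
    return h_of_mean, mean_of_h


values, probs = [-1.0, 3.0], [0.25, 0.75]

# h(z) = |z|, the convex function used in the lecture.
abs_of_mean, mean_of_abs = jensen_sides(abs, values, probs)

# h(z) = z^2, another convex function, for comparison.
sq_of_mean, mean_of_sq = jensen_sides(lambda z: z * z, values, probs)
```

In both cases the first number, the function of the mean, is no larger than the second, the mean of the function.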
01:22:14.610 --> 01:22:22.242
And I think I will stop there
instead of going on, because
01:22:22.242 --> 01:22:24.410
we had a lot of new
things today.
01:22:24.410 --> 01:22:30.390
And somehow sequential detection
always wears one's
01:22:30.390 --> 01:22:34.590
mind out in a short
period of time.
01:22:34.590 --> 01:22:35.840
That should be enough.