WEBVTT
00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative
00:00:02.960 --> 00:00:04.370
Commons license.
00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to
00:00:07.410 --> 00:00:11.060
offer high quality educational
resources for free.
00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from
00:00:13.960 --> 00:00:17.890
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:17.890 --> 00:00:19.140
ocw.mit.edu.
00:00:24.010 --> 00:00:26.560
PROFESSOR: I'm going to spend
most of the time talking about
00:00:26.560 --> 00:00:29.400
chapters one, two, and three.
00:00:29.400 --> 00:00:32.220
A little bit talking about
chapter four, because we've
00:00:32.220 --> 00:00:36.370
been doing so much with chapter
four in the last
00:00:36.370 --> 00:00:39.980
couple of weeks that you
probably remember that more.
00:00:39.980 --> 00:00:40.580
OK.
00:00:40.580 --> 00:00:44.310
The basics, which we started out
with, and which you should
00:00:44.310 --> 00:00:48.800
never forget, is that any time
you develop a probability
00:00:48.800 --> 00:00:53.840
model, you've got to specify
what the sample space is and
00:00:53.840 --> 00:00:57.920
what the probability measure
on that sample space is.
00:00:57.920 --> 00:01:01.850
And in practice, and in almost
everything we've talked about
00:01:01.850 --> 00:01:05.800
so far, there's really a basic
countable set of random
00:01:05.800 --> 00:01:08.490
variables which determine
everything else.
00:01:08.490 --> 00:01:12.030
In other words, when you find
the joint probability
00:01:12.030 --> 00:01:16.730
distribution on that set of
random variables, that tells
00:01:16.730 --> 00:01:20.570
you everything else
of interest.
00:01:20.570 --> 00:01:25.200
And a sample point or a sample
path on that set of random
00:01:25.200 --> 00:01:29.520
variables is in a collection of
sample values, one sample
00:01:29.520 --> 00:01:33.980
value for each random
variable.
00:01:33.980 --> 00:01:37.740
It's very convenient, especially
when you're in an
00:01:37.740 --> 00:01:43.630
exam and a little bit rushed,
to confuse random variables
00:01:43.630 --> 00:01:47.250
with the sample values for
the random variables.
00:01:47.250 --> 00:01:48.920
And that's fine.
00:01:48.920 --> 00:01:51.900
I just want to caution you
again, and I've done this many
00:01:51.900 --> 00:01:58.410
times, that about half the
mistakes that people make--
00:01:58.410 --> 00:02:01.980
half of the conceptual mistakes
that people make
00:02:01.980 --> 00:02:06.200
doing problems and doing quizzes
are connected with
00:02:06.200 --> 00:02:09.810
getting confused at some point
about what's a random variable
00:02:09.810 --> 00:02:12.210
and what's a sample value
of that random variable.
00:02:12.210 --> 00:02:17.210
And you start thinking about
sample values as just numbers.
00:02:17.210 --> 00:02:19.090
And I do that too.
00:02:19.090 --> 00:02:21.220
It's convenient for thinking
about things.
00:02:21.220 --> 00:02:26.790
But you have to know that that's
not the whole story.
00:02:26.790 --> 00:02:29.740
Often, we have uncountable
sets of random variables.
00:02:29.740 --> 00:02:34.720
Like in renewal processes, we
have the counting renewal
00:02:34.720 --> 00:02:38.690
process, which typically has an
uncountable set of random
00:02:38.690 --> 00:02:43.860
variables, the number of arrivals
up to each time, t,
00:02:43.860 --> 00:02:48.750
where t is a continuous valued
variable.
00:02:48.750 --> 00:02:52.810
But in almost all of those
cases, you can define things
00:02:52.810 --> 00:02:56.195
in terms of simpler sets of
random variables, like the
00:02:56.195 --> 00:02:59.480
interarrival times,
which are IID.
00:03:02.530 --> 00:03:05.960
Most of the processes we've
talked about really have a
00:03:05.960 --> 00:03:08.600
pretty simple description if
you look for the simplest
00:03:08.600 --> 00:03:09.850
description of them.
00:03:13.730 --> 00:03:17.680
If you have a sequence of
IID random variables--
00:03:17.680 --> 00:03:25.270
which is what we have for
Poisson and renewal processes,
00:03:25.270 --> 00:03:28.680
and what we have for Markov
chains is not that much more
00:03:28.680 --> 00:03:30.310
complicated--
00:03:30.310 --> 00:03:35.500
the laws of large numbers are
useful to specify what the
00:03:35.500 --> 00:03:38.500
long term behavior is.
00:03:38.500 --> 00:03:47.280
The sample time average, as
we all know by now, is the sum
00:03:47.280 --> 00:03:49.960
of the random variables
divided by n.
00:03:49.960 --> 00:03:53.090
So it's a sample average
of these quantities.
00:03:53.090 --> 00:03:57.570
It's a random variable with
mean x bar, the expected
00:03:57.570 --> 00:04:00.140
value of x, that's
almost obvious.
00:04:00.140 --> 00:04:03.350
You just take the expected value
of s sub n, and it's n
00:04:03.350 --> 00:04:08.360
times the expected value of x
divided by n, and you're done.
00:04:08.360 --> 00:04:11.680
And the variance, since these
random variables are
00:04:11.680 --> 00:04:15.540
independent, you find that
almost as easily.
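In symbols, with sigma squared the variance of each X:

$$E\left[\frac{S_n}{n}\right] = \frac{n\bar{X}}{n} = \bar{X}, \qquad \mathrm{VAR}\left[\frac{S_n}{n}\right] = \frac{\sigma^2}{n},$$

where the variance computation uses the independence of the X's, so that the variances of the terms add.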
00:04:15.540 --> 00:04:18.810
That has this very
simple-minded
00:04:18.810 --> 00:04:20.850
distribution function.
00:04:20.850 --> 00:04:24.340
Remember, we usually work
with distribution
00:04:24.340 --> 00:04:26.960
functions in this class.
00:04:26.960 --> 00:04:32.580
And often, the exercises are
much easier when you do them
00:04:32.580 --> 00:04:36.500
in terms of the distribution
function than if you use
00:04:36.500 --> 00:04:40.760
formulas you remember from
elementary courses, which are
00:04:40.760 --> 00:04:44.260
specialized to--
00:04:44.260 --> 00:04:47.140
which are specialized to
probability density and
00:04:47.140 --> 00:04:51.170
probability mass functions, and
often have more special
00:04:51.170 --> 00:04:53.110
conditions on them than that.
00:04:53.110 --> 00:04:57.470
But anyway, the distribution
function starts
00:04:57.470 --> 00:04:58.570
to look like this.
00:04:58.570 --> 00:05:03.250
As n gets bigger, you notice
that what's happening is that
00:05:03.250 --> 00:05:08.860
you get a distribution which
is scrunching in this way,
00:05:08.860 --> 00:05:10.820
which is starting to
look smoother.
00:05:10.820 --> 00:05:13.450
The jumps in it get smaller.
00:05:13.450 --> 00:05:18.630
And you start out with this
thing which is kind of crazy.
00:05:18.630 --> 00:05:21.370
And by the time n is even 50,
00:05:21.370 --> 00:05:25.770
you get something which
almost looks like a--
00:05:25.770 --> 00:05:26.840
I don't know how we tell
the difference
00:05:26.840 --> 00:05:28.460
between those two things.
00:05:28.460 --> 00:05:30.060
I thought we could,
but we can't.
00:05:30.060 --> 00:05:31.670
I certainly can't up there.
00:05:31.670 --> 00:05:37.650
But anyway, the one that's
tightest in is the one
00:05:37.650 --> 00:05:39.880
for n equals 50.
00:05:39.880 --> 00:05:44.150
And what these laws of large
numbers all say in some sense
00:05:44.150 --> 00:05:51.380
is that this distribution
function gets crunched in
00:05:51.380 --> 00:05:54.550
towards an impulse
at the mean.
00:05:54.550 --> 00:05:58.260
And then they say other more
specialized things about how
00:05:58.260 --> 00:06:02.580
this happens, about sample
paths and all of that.
00:06:02.580 --> 00:06:06.270
But the idea is that this
distribution function is
00:06:06.270 --> 00:06:10.760
heading towards a
unit impulse.
00:06:10.760 --> 00:06:14.440
The weak law of large numbers
then says that if the expected
00:06:14.440 --> 00:06:18.840
value of the magnitude of x
is less than infinity--
00:06:18.840 --> 00:06:21.660
and usually when we talk about
random variables having a
00:06:21.660 --> 00:06:25.630
mean, that's exactly
what we mean.
00:06:25.630 --> 00:06:31.220
If that condition is not
satisfied, then we usually say
00:06:31.220 --> 00:06:33.690
that the random variable
doesn't have a mean.
00:06:33.690 --> 00:06:37.300
And you'll see that every time
you look at anything in
00:06:37.300 --> 00:06:38.520
probability theory.
00:06:38.520 --> 00:06:41.940
When people say the mean exists,
that's what they
00:06:41.940 --> 00:06:43.830
always mean.
00:06:43.830 --> 00:06:47.950
And what the theorem says then
is exactly what we were
00:06:47.950 --> 00:06:49.060
talking about before.
00:06:49.060 --> 00:06:54.940
The probability that the
difference between s n over n,
00:06:54.940 --> 00:06:58.570
and the mean x bar, the
probability that it's greater
00:06:58.570 --> 00:07:03.090
than or equal to epsilon
equals 0 in the limit.
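Written out, the weak law of large numbers being stated here is

$$\lim_{n \to \infty} \Pr\left( \left| \frac{S_n}{n} - \bar{X} \right| \geq \epsilon \right) = 0 \quad \text{for every } \epsilon > 0.$$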
00:07:03.090 --> 00:07:06.020
So it's saying that you put
epsilon limits on that
00:07:06.020 --> 00:07:10.860
distribution function and let
n get bigger and bigger, it
00:07:10.860 --> 00:07:14.570
goes to 1 and 0.
00:07:14.570 --> 00:07:18.120
It says the probability of s n
over n, less than or equal to
00:07:18.120 --> 00:07:23.240
x, approaches a unit step as
n approaches infinity.
00:07:23.240 --> 00:07:27.660
This says this is the condition
for convergence in
00:07:27.660 --> 00:07:30.440
probability.
00:07:30.440 --> 00:07:33.880
What we're saying is that that
also means convergence in
00:07:33.880 --> 00:07:38.740
distribution function, and in
distribution for this case.
00:07:38.740 --> 00:07:42.520
And then we also, when we got
to renewal processes, we
00:07:42.520 --> 00:07:45.330
talked about the strong
law of large numbers.
00:07:45.330 --> 00:07:49.760
And it says that if the expected
value of x is finite,
00:07:49.760 --> 00:07:56.630
then this limit approaches
x bar on a sample path basis.
00:07:56.630 --> 00:07:59.770
In other words, for every sample
path, except this set
00:07:59.770 --> 00:08:05.020
of probability 0, this
condition holds true.
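In symbols, the strong law says

$$\Pr\left( \lim_{n \to \infty} \frac{S_n}{n} = \bar{X} \right) = 1,$$

i.e., the set of sample paths on which the sample average fails to converge to the mean has probability 0.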
00:08:05.020 --> 00:08:08.260
That doesn't seem like it's
very different or very
00:08:08.260 --> 00:08:10.610
important for the time being.
00:08:10.610 --> 00:08:14.060
But when we started studying
renewal processes, which is
00:08:14.060 --> 00:08:19.120
where we actually talked about
this, we saw that in fact, it
00:08:19.120 --> 00:08:24.830
let us talk about this, which
says that if you take any
00:08:24.830 --> 00:08:28.700
function of s n over n--
00:08:28.700 --> 00:08:31.590
in other words, a function
of a real value--
00:08:31.590 --> 00:08:33.830
a function of a--
00:08:33.830 --> 00:08:35.720
a real valued function of a--
00:08:40.010 --> 00:08:43.570
a real valued function
of a real value, yes.
00:08:43.570 --> 00:08:46.470
What you get is that
same function
00:08:46.470 --> 00:08:49.100
applied to the mean here.
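The statement being made, written out: for any real-valued function f that is continuous at x bar,

$$\lim_{n \to \infty} f\left( \frac{S_n}{n} \right) = f(\bar{X}) \quad \text{with probability 1}.$$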
00:08:49.100 --> 00:08:50.260
And that's the thing
which is so
00:08:50.260 --> 00:08:52.630
useful for renewal processes.
00:08:52.630 --> 00:08:55.740
And it's what usually makes
the strong law of large
00:08:55.740 --> 00:08:58.730
numbers so much easier to
use than the weak law.
00:09:04.220 --> 00:09:06.170
That's a plug for
the strong law.
00:09:06.170 --> 00:09:08.745
There are many extensions of the
weak law telling how fast
00:09:08.745 --> 00:09:10.910
the convergence is.
00:09:10.910 --> 00:09:14.350
One thing you should always
remember about the central
00:09:14.350 --> 00:09:17.510
limit theorem, is it really
tells you something about the
00:09:17.510 --> 00:09:18.790
weak law of large numbers.
00:09:18.790 --> 00:09:22.260
It tells you how fast that
convergence is and what the
00:09:22.260 --> 00:09:24.720
convergence looks like.
00:09:24.720 --> 00:09:28.170
It says that if the variance
of this underlying random
00:09:28.170 --> 00:09:34.000
variable is finite, then this
limit here is equal to the
00:09:34.000 --> 00:09:37.290
normal distribution function,
the Gaussian at
00:09:37.290 --> 00:09:41.350
variance 1 and mean 0.
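Symbolically, the central limit theorem here is

$$\lim_{n \to \infty} \Pr\left( \frac{S_n - n\bar{X}}{\sigma \sqrt{n}} \leq y \right) = \Phi(y),$$

where Phi is the distribution function of a Gaussian random variable with mean 0 and variance 1.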
00:09:41.350 --> 00:09:45.070
And that becomes a little easier
to see what it's saying
00:09:45.070 --> 00:09:46.870
if you look at it this way.
00:09:46.870 --> 00:09:51.510
It says probability that s
n over n minus x bar--
00:09:51.510 --> 00:09:56.890
namely the difference between
the sum and the mean which
00:09:56.890 --> 00:09:58.380
it's converging to--
00:09:58.380 --> 00:10:01.340
the probability that that's less
than or equal to y sigma
00:10:01.340 --> 00:10:04.010
over square root of
n is this normal
00:10:04.010 --> 00:10:05.480
Gaussian random variable.
00:10:05.480 --> 00:10:11.740
It says that as n gets bigger
and bigger, this quantity here
00:10:11.740 --> 00:10:13.030
gets tighter and tighter.
00:10:13.030 --> 00:10:18.620
What it says in terms of the
picture here, in terms of this
00:10:18.620 --> 00:10:22.900
picture, it says that as n gets
bigger and bigger, this
00:10:22.900 --> 00:10:28.560
picture here scrunches down as
1 over the square root of n.
00:10:28.560 --> 00:10:30.970
And it also becomes Gaussian.
00:10:30.970 --> 00:10:33.760
It tells you exactly what
kind of convergence you
00:10:33.760 --> 00:10:34.770
actually have here.
00:10:34.770 --> 00:10:39.200
It's not only saying that this
does converge to a unit step.
00:10:39.200 --> 00:10:42.010
It says how it converges.
00:10:42.010 --> 00:10:48.240
And that's a nice thing,
conceptually.
00:10:48.240 --> 00:10:51.780
You don't always need
it in problems.
00:10:51.780 --> 00:10:54.600
But you need it for
understanding what's going on.
00:10:59.890 --> 00:11:01.690
We're moving backwards,
it seems.
00:11:06.180 --> 00:11:09.420
Now, 1, 2, Poisson processes.
00:11:09.420 --> 00:11:12.630
We talked about arrival
processes.
00:11:12.630 --> 00:11:15.260
You'd almost think that all
processes are arrival
00:11:15.260 --> 00:11:17.080
processes at this point.
00:11:17.080 --> 00:11:19.770
But any time you start to think
about that, think of a
00:11:19.770 --> 00:11:21.270
Markov chain.
00:11:21.270 --> 00:11:26.150
And a Markov chain is not an
arrival process, ordinarily.
00:11:26.150 --> 00:11:28.470
Some of them can be
viewed that way.
00:11:28.470 --> 00:11:29.690
But most of them can't.
00:11:29.690 --> 00:11:31.990
An arrival process
is an increasing
00:11:31.990 --> 00:11:34.650
sequence of random variables.
00:11:34.650 --> 00:11:40.020
0 less than s1, which is the
time of the first arrival, s2,
00:11:40.020 --> 00:11:42.810
which is a time of the second
arrival, and so forth.
00:11:42.810 --> 00:11:48.220
Interarrival times are x1 equals
s1, and x i equals s i
00:11:48.220 --> 00:11:51.150
minus s i minus 1.
00:11:51.150 --> 00:11:55.480
The picture, which you should
have indelibly printed on the
00:11:55.480 --> 00:11:58.850
back of your brain someplace
by this time, is
00:11:58.850 --> 00:12:00.430
this picture here.
00:12:00.430 --> 00:12:04.930
s1, s2, s3, are the times
at which arrivals occur.
00:12:04.930 --> 00:12:07.590
These are random variables, so
these arrivals come in at
00:12:07.590 --> 00:12:09.320
random times.
00:12:09.320 --> 00:12:14.690
x1, x2, x3 are the intervals
between arrivals.
00:12:14.690 --> 00:12:18.280
And N of t is the number of
arrivals that have occurred up
00:12:18.280 --> 00:12:19.860
until time t.
00:12:19.860 --> 00:12:26.800
So every time t passes one
of these arrival times, N of t
00:12:26.800 --> 00:12:31.140
pops up by one, pops up by one
again, pops up by one again.
00:12:31.140 --> 00:12:34.200
The sample value
pops up by one.
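As a minimal simulation sketch of this picture (the rate, horizon, and seed are arbitrary choices, not from the lecture; exponential interarrivals are used here, making it a Poisson process, but any positive interarrival distribution would do):

import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                       # arbitrary arrival rate
x = rng.exponential(1/lam, 20)  # interarrival times x1, x2, ... (IID)
s = np.cumsum(x)                # arrival epochs: s1 = x1, s_i = s_{i-1} + x_i

def N(t):
    # counting process: number of arrival epochs in (0, t]
    return int(np.searchsorted(s, t, side="right"))

print(N(1.0), N(5.0))  # N(t) steps up by one each time t passes an s_n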
00:12:34.200 --> 00:12:36.920
An arrival process can model
arrivals to a queue,
00:12:36.920 --> 00:12:40.320
departures from a queue,
locations of breaks in an oil
00:12:40.320 --> 00:12:43.960
line, an enormous number
of things.
00:12:43.960 --> 00:12:46.260
It's not just arrivals
we're talking about.
00:12:46.260 --> 00:12:48.070
It's all of these other
things, also.
00:12:48.070 --> 00:12:54.330
But it's something laid out on
a one-dimensional axis where
00:12:54.330 --> 00:12:58.390
things happen at various
places on that
00:12:58.390 --> 00:12:59.700
one-dimensional axis.
00:12:59.700 --> 00:13:05.100
So that's the way to view it.
00:13:05.100 --> 00:13:07.540
OK, same picture again.
00:13:07.540 --> 00:13:11.510
Process can be specified by the
joint distribution of the
00:13:11.510 --> 00:13:15.570
arrival epochs or the
interarrival times, and, in
00:13:15.570 --> 00:13:18.090
fact, of the counting process.
00:13:18.090 --> 00:13:25.200
If you see a sample path of
the counting process, then
00:13:25.200 --> 00:13:29.180
from that you can determine the
sample path of the arrival
00:13:29.180 --> 00:13:33.220
times and the sample path of
the interarrival times.
00:13:33.220 --> 00:13:38.320
And since any set of these
random variables specifies all
00:13:38.320 --> 00:13:43.220
three of these things, the
three are all equivalent.
00:13:43.220 --> 00:13:47.150
OK, we have this important
condition here.
00:13:47.150 --> 00:13:55.960
And I always sort of forget
this, but when these arrivals
00:13:55.960 --> 00:13:59.700
are highly delayed, when there's
a long period of time
00:13:59.700 --> 00:14:05.380
between each arrival, what that
says is the counting
00:14:05.380 --> 00:14:08.480
process is getting small.
00:14:08.480 --> 00:14:12.570
So big interarrival times
corresponds to a small
00:14:12.570 --> 00:14:14.180
value of N of t.
00:14:14.180 --> 00:14:16.420
And you can see that in
the picture here.
00:14:16.420 --> 00:14:20.020
If you spread out these
arrivals, you make s1 all the
00:14:20.020 --> 00:14:21.290
way out here.
00:14:21.290 --> 00:14:26.190
Then N of t doesn't become
1 until way out here.
00:14:26.190 --> 00:14:32.930
So N of t as a function of t is
getting smaller as s sub n
00:14:32.930 --> 00:14:36.030
is getting larger.
00:14:36.030 --> 00:14:41.560
S sub n is the minimum of the
set of t, such that N of t is
00:14:41.560 --> 00:14:45.830
greater than or equal to n.
Sounds like an unpleasantly
00:14:45.830 --> 00:14:49.460
complicated expression.
00:14:49.460 --> 00:14:52.210
If any of you can find a simpler
way to say it than
00:14:52.210 --> 00:14:55.950
that, I would be absolutely
delighted to hear it.
00:14:55.950 --> 00:14:57.530
But I don't think there is.
00:14:57.530 --> 00:15:01.150
I think the simpler way to say
it is this picture here.
00:15:01.150 --> 00:15:03.230
And the picture says it.
00:15:03.230 --> 00:15:08.770
And you can sort of figure out
all those logical statements
00:15:08.770 --> 00:15:11.670
from the picture, which
is intuitively a
00:15:11.670 --> 00:15:12.942
lot clearer, I think.
00:15:17.270 --> 00:15:23.380
So now, a renewal process is
an arrival process with IID
00:15:23.380 --> 00:15:25.100
interarrival times.
00:15:25.100 --> 00:15:28.800
And a Poisson process is a
renewal process where the
00:15:28.800 --> 00:15:32.130
interarrival random variables
are exponential.
00:15:32.130 --> 00:15:35.290
So, Poisson process
is a special
00:15:35.290 --> 00:15:37.200
case of renewal process.
00:15:37.200 --> 00:15:40.920
Why are these exponential
interarrival
00:15:40.920 --> 00:15:43.350
times so important?
00:15:43.350 --> 00:15:46.550
Well, it's because they're
memoryless.
00:15:46.550 --> 00:15:50.360
And the memoryless property says
that the probability that
00:15:50.360 --> 00:15:54.535
capital X is greater than t plus x is
equal to the probability that
00:15:54.535 --> 00:15:58.190
it's greater than x times the
probability that it's greater
00:15:58.190 --> 00:16:01.830
than t for all x and t greater
than or equal to 0.
00:16:01.830 --> 00:16:04.860
This makes better sense if
you say it conditionally.
00:16:04.860 --> 00:16:09.040
The probability that x is
greater than t plus x, given
00:16:09.040 --> 00:16:12.700
that it's greater than t, is
the same as the probability
00:16:12.700 --> 00:16:14.800
that x is greater than--
00:16:14.800 --> 00:16:17.460
capital X is greater
than little x.
00:16:17.460 --> 00:16:20.420
This really gives you
the memoryless
00:16:20.420 --> 00:16:21.780
property in a nutshell.
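Written out, and checked against the exponential: with Pr(X > x) = e^(-lambda x),

$$\Pr(X > t + x \mid X > t) = \frac{e^{-\lambda(t+x)}}{e^{-\lambda t}} = e^{-\lambda x} = \Pr(X > x),$$

so the condition holds for all x, t greater than or equal to 0.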
00:16:21.780 --> 00:16:25.860
It says if you're looking at
this process as it evolves,
00:16:25.860 --> 00:16:29.010
and you see an arrival, and then
you start looking for the
00:16:29.010 --> 00:16:32.160
next arrival, it says that no
matter how long you've been
00:16:32.160 --> 00:16:36.240
looking, the distribution
function, as the time to wait
00:16:36.240 --> 00:16:38.930
until the next arrival,
is the same
00:16:38.930 --> 00:16:40.580
exponential random variable.
00:16:40.580 --> 00:16:44.220
So you never gain anything
by waiting.
00:16:44.220 --> 00:16:46.390
You might as well
be impatient.
00:16:46.390 --> 00:16:48.790
But it doesn't do any good
to be impatient.
00:16:48.790 --> 00:16:51.130
It doesn't do any good to wait.
00:16:51.130 --> 00:16:52.850
It doesn't do any good
to not wait.
00:16:52.850 --> 00:16:56.280
No matter what you do, this
damn thing always takes an
00:16:56.280 --> 00:16:59.780
exponential amount
of time to occur.
00:16:59.780 --> 00:17:01.410
OK, that's what it means
to be memoryless.
00:17:01.410 --> 00:17:03.910
And the exponential is the only
00:17:03.910 --> 00:17:05.835
memoryless random variable.
00:17:10.775 --> 00:17:14.910
How about a geometric
random variable?
00:17:14.910 --> 00:17:19.190
The geometric random variable
is memoryless if you're only
00:17:19.190 --> 00:17:22.150
looking at integer times.
00:17:22.150 --> 00:17:32.180
Here we're talking about
times on a continuum.
00:17:32.180 --> 00:17:35.090
That's what this says.
00:17:35.090 --> 00:17:38.410
Well, that's what this says.
00:17:38.410 --> 00:17:46.590
And if you look at discrete
times, then a geometric random
00:17:46.590 --> 00:17:49.860
variable is memoryless also.
00:17:55.020 --> 00:17:58.210
We're given a Poisson
process of rate lambda.
00:17:58.210 --> 00:18:01.290
The interval from any given t
greater than 0 until the first
00:18:01.290 --> 00:18:04.190
arrival after t is a
random variable.
00:18:04.190 --> 00:18:06.010
Let's call it z1.
00:18:06.010 --> 00:18:08.650
We already said that that
random variable was
00:18:08.650 --> 00:18:11.430
exponential.
00:18:11.430 --> 00:18:17.040
And it's independent of all
arrivals which occur before
00:18:17.040 --> 00:18:18.630
that starting time t.
00:18:18.630 --> 00:18:23.220
So looking at any starting
time t, doesn't make any
00:18:23.220 --> 00:18:25.530
difference what has happened
back here.
00:18:25.530 --> 00:18:27.450
That's not only the
last arrival, but
00:18:27.450 --> 00:18:29.630
all the other arrivals.
00:18:29.630 --> 00:18:32.880
The time until the next arrival
is exponential.
00:18:32.880 --> 00:18:36.520
The time until each arrival
after that is exponential
00:18:36.520 --> 00:18:41.690
also, which says that if you
look at this process starting
00:18:41.690 --> 00:18:47.250
at time t, it's a Poisson
process again, where all the
00:18:47.250 --> 00:18:50.450
times have to be shifted, of
course, but it's a Poisson
00:18:50.450 --> 00:18:52.830
process starting at time t.
00:18:52.830 --> 00:19:00.570
The corresponding counting
process, we can call it n
00:19:00.570 --> 00:19:04.950
tilde of t and tau, where tau is
greater than or equal to t,
00:19:04.950 --> 00:19:09.690
where this is the number of
arrivals in the original
00:19:09.690 --> 00:19:14.610
process up until time tau minus
the number of arrivals
00:19:14.610 --> 00:19:16.340
up until time t.
00:19:16.340 --> 00:19:19.330
If you look at that difference,
so many arrivals
00:19:19.330 --> 00:19:26.550
up until t, so many more
up until time tau.
00:19:26.550 --> 00:19:29.030
You look at the difference
between tau and t.
00:19:29.030 --> 00:19:37.080
The number of arrivals in that
interval is the same Poisson
00:19:37.080 --> 00:19:39.800
distributed random
variable again.
00:19:39.800 --> 00:19:43.080
So, it has the same
distribution as N
00:19:43.080 --> 00:19:45.020
of tau minus t.
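In symbols: with N tilde of (t, tau) equal to N(tau) minus N(t) for tau greater than or equal to t, the stationary increment property is

$$\Pr(\tilde{N}(t, \tau) = n) = \Pr(N(\tau - t) = n) = \frac{[\lambda(\tau - t)]^n e^{-\lambda(\tau - t)}}{n!}.$$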
00:19:45.020 --> 00:19:47.650
And that's called the stationary
increment property.
00:19:47.650 --> 00:19:50.720
It says that no matter where you
start a Poisson process,
00:19:50.720 --> 00:19:53.030
it always looks exactly
the same.
00:19:53.030 --> 00:19:58.370
It says that if you wait for one
hour and start then, it's
00:19:58.370 --> 00:20:01.750
exactly the same as what
it was before.
00:20:01.750 --> 00:20:05.960
If we had Poisson processes in
the world, it wouldn't do any
00:20:05.960 --> 00:20:09.720
good to travel on certain days
rather than other days.
00:20:09.720 --> 00:20:13.170
It wouldn't do any good to leave
to drive home at one
00:20:13.170 --> 00:20:14.850
hour rather than another hour.
00:20:14.850 --> 00:20:17.670
You'd have the same travel
all the time.
00:20:17.670 --> 00:20:18.980
It's all equal.
00:20:18.980 --> 00:20:21.140
It would be an awful world
if it were stationary.
00:20:23.770 --> 00:20:26.750
The independent increment
properties for counting
00:20:26.750 --> 00:20:33.170
process is that for all
sequences of ordered times--
00:20:33.170 --> 00:20:37.490
0 less than t1 less than
t2 up to t k--
00:20:37.490 --> 00:20:40.310
the random variables n of t1--
00:20:40.310 --> 00:20:44.440
and now we're talking about the
number of arrivals between
00:20:44.440 --> 00:20:47.510
t1 and t2, the number
of arrivals between
00:20:47.510 --> 00:20:49.600
t n minus 1 and t n.
00:20:49.600 --> 00:20:52.330
These are all independent
of each other.
00:20:52.330 --> 00:20:55.390
That's what this independent
increment property says.
00:20:55.390 --> 00:20:58.110
And we see from what we've said
about this memoryless
00:20:58.110 --> 00:21:02.680
property that the Poisson
process does indeed have this
00:21:02.680 --> 00:21:04.750
independent increment
property.
00:21:04.750 --> 00:21:08.720
Poisson processes have both the
stationary and independent
00:21:08.720 --> 00:21:11.240
increment properties.
00:21:11.240 --> 00:21:15.760
And this looks like an immediate
consequence of that.
00:21:15.760 --> 00:21:16.370
It's not.
00:21:16.370 --> 00:21:19.630
Remember, we had to struggle
with this for a bit.
00:21:19.630 --> 00:21:22.500
But it says that Poisson
processes can be defined by
00:21:22.500 --> 00:21:26.450
the stationary and independent
increment properties, plus
00:21:26.450 --> 00:21:32.730
either the Poisson PMF for N
of t, or this incremental
00:21:32.730 --> 00:21:38.660
property, the probability that N
tilde of t and t plus delta,
00:21:38.660 --> 00:21:43.320
and the number of arrivals
between t and t plus delta,
00:21:43.320 --> 00:21:46.170
the probability that that's
1 is equal to
00:21:46.170 --> 00:21:47.600
lambda times delta.
00:21:47.600 --> 00:21:53.040
In other words, this view of a
Poisson process is the view
00:21:53.040 --> 00:21:56.850
that you get when you sort
of forget about time.
00:21:56.850 --> 00:22:00.220
And you think of arrivals from
outer space coming down and
00:22:00.220 --> 00:22:01.470
hitting on a line.
00:22:01.470 --> 00:22:03.760
And they hit on that
line randomly.
00:22:03.760 --> 00:22:05.860
And each one of them
is independent
00:22:05.860 --> 00:22:07.780
of every other one.
00:22:07.780 --> 00:22:15.350
And that's what you get if you
wind up with a density of
00:22:15.350 --> 00:22:18.770
lambda arrivals per unit time.
00:22:18.770 --> 00:22:22.120
OK, we talked about all
of that, of course.
00:22:22.120 --> 00:22:23.400
The probability distributions--
00:22:26.050 --> 00:22:29.380
there are many of them for
a Poisson process.
00:22:29.380 --> 00:22:32.470
The Poisson process is
remarkable in the sense that
00:22:32.470 --> 00:22:35.320
anything you want to find,
there's generally a simple
00:22:35.320 --> 00:22:37.070
formula for it.
00:22:37.070 --> 00:22:39.530
If it's complicated, you're
probably not looking at
00:22:39.530 --> 00:22:42.010
it the right way.
00:22:42.010 --> 00:22:45.360
So many things come out
very, very simply.
00:22:45.360 --> 00:22:46.660
The probability--
00:22:46.660 --> 00:22:50.580
the joint probability
distribution of all of the
00:22:50.580 --> 00:22:58.670
arrival times up until the n-th is
an exponential just in the
00:22:58.670 --> 00:23:05.080
last one, which says that the
intermediate arrival epochs
00:23:05.080 --> 00:23:09.140
are equally likely to be
anywhere, just as long as they
00:23:09.140 --> 00:23:13.440
satisfy this ordering
restriction, s1 less than s2.
00:23:13.440 --> 00:23:15.430
That's what this formula says.
00:23:15.430 --> 00:23:20.490
It says that the joint density
of these arrival times doesn't
00:23:20.490 --> 00:23:23.010
depend on anything except the
time of the last one.
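That formula, written out: for 0 < s1 < s2 < ... < sn,

$$f_{S_1 \cdots S_n}(s_1, \ldots, s_n) = \lambda^n e^{-\lambda s_n},$$

a density that depends only on s_n, the last of the ordered epochs.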
00:23:25.740 --> 00:23:28.520
But it does depend on the fact
that they're [INAUDIBLE].
00:23:28.520 --> 00:23:31.435
From that, you can find
virtually everything else if
00:23:31.435 --> 00:23:32.900
you want to.
00:23:32.900 --> 00:23:36.600
That really is saying exactly
the same thing as we were just
00:23:36.600 --> 00:23:38.440
saying a while ago.
00:23:38.440 --> 00:23:41.740
This is the viewpoint of looking
at this line from
00:23:41.740 --> 00:23:47.040
outer space with arrivals coming
in, coming in uniformly
00:23:47.040 --> 00:23:51.630
distributed over this line
interval, and each of them
00:23:51.630 --> 00:23:54.080
independent of each other one.
00:23:54.080 --> 00:23:57.740
That's what you wind
up saying.
00:23:57.740 --> 00:24:01.490
This density, then, of the
n-th arrival, if you just
00:24:01.490 --> 00:24:05.620
integrate all this stuff, you
get the Erlang formula.
00:24:05.620 --> 00:24:12.940
Probability of arrival n in
t to t plus delta is--
00:24:12.940 --> 00:24:17.820
now this is the derivation that
we went through before,
00:24:17.820 --> 00:24:20.310
going from Erlang to Poisson.
00:24:20.310 --> 00:24:24.370
You can go from Poisson to
Erlang too, if you want to.
00:24:24.370 --> 00:24:26.320
But it's a little easier
to go this way.
00:24:26.320 --> 00:24:30.500
The probability of arrival in
t to t plus delta is the
00:24:30.500 --> 00:24:35.890
probability that n of t is
equal to n minus 1 times
00:24:35.890 --> 00:24:40.670
lambda delta plus an o
of delta, of course.
00:24:40.670 --> 00:24:46.270
And the probability that n of
t is equal to n minus 1 from
00:24:46.270 --> 00:24:53.050
this formula here is going to be
the density of when s sub n
00:24:53.050 --> 00:24:55.040
appears, divided by lambda.
00:24:55.040 --> 00:24:58.910
That's exactly what this
formula here says.
00:24:58.910 --> 00:25:01.980
So that's just the Poisson
distribution.
00:25:01.980 --> 00:25:04.910
We've been through
that derivation.
00:25:04.910 --> 00:25:08.420
It's almost a derivation worth
remembering, because it just
00:25:08.420 --> 00:25:11.940
appears so often.
00:25:11.940 --> 00:25:16.160
As you've seen from the problem
sets we've done,
00:25:16.160 --> 00:25:20.970
almost every problem you can
dream of, dealing with Poisson
00:25:20.970 --> 00:25:27.150
processes, the easy way to do
them comes from this property
00:25:27.150 --> 00:25:30.730
of combining and splitting
Poisson processes.
00:25:30.730 --> 00:25:35.170
It says if n1 of t, n2 of t,
up to n sub k of t are
00:25:35.170 --> 00:25:37.500
independent Poisson
processes--
00:25:37.500 --> 00:25:39.880
what do you mean by
a process being
00:25:39.880 --> 00:25:42.200
independent of another process?
00:25:42.200 --> 00:25:46.660
Well, the process is specified
by the interarrival times for
00:25:46.660 --> 00:25:47.660
that process.
00:25:47.660 --> 00:25:50.950
So what we're saying here is the
interarrival times for the
00:25:50.950 --> 00:25:54.470
first process are independent
of the interarrival times of
00:25:54.470 --> 00:25:56.770
the second process,
independent of the
00:25:56.770 --> 00:26:00.620
interarrival times for the third
process, and so forth.
00:26:00.620 --> 00:26:02.990
Again, this is a view of someone
from outer space,
00:26:02.990 --> 00:26:06.180
throwing darts onto a line.
00:26:06.180 --> 00:26:09.750
And if you have multiple people
throwing darts on a
00:26:09.750 --> 00:26:13.450
line, but they're all equally
distributed, all uniformly
00:26:13.450 --> 00:26:16.600
distributed over the line,
this is exactly
00:26:16.600 --> 00:26:20.670
the model you get.
00:26:20.670 --> 00:26:22.180
So we have two views here.
00:26:22.180 --> 00:26:26.480
The first one is to look at
the arrival epochs that's
00:26:26.480 --> 00:26:28.420
generated from each process.
00:26:28.420 --> 00:26:31.710
And then combine all arrivals
into one Poisson process.
00:26:31.710 --> 00:26:34.900
So we look at all these Poisson
processes, and then
00:26:34.900 --> 00:26:38.340
take the sum of them, and we
get a Poisson process.
00:26:38.340 --> 00:26:40.190
The other way to look at it--
00:26:40.190 --> 00:26:43.120
and going back and forth between
these two views is the
00:26:43.120 --> 00:26:45.060
way you solve problems--
00:26:45.060 --> 00:26:46.770
you look at the combined
sequence of
00:26:46.770 --> 00:26:48.900
arrival epochs first.
00:26:48.900 --> 00:26:52.400
And then for each arrival that
comes in, you think of an IID
00:26:52.400 --> 00:26:55.450
random variable independent
of all the other random
00:26:55.450 --> 00:27:02.860
variables, which decides for
each arrival which of the
00:27:02.860 --> 00:27:04.710
sub-processes it goes to.
00:27:04.710 --> 00:27:08.680
So there's this hidden
process--
00:27:08.680 --> 00:27:09.890
well, it's not hidden.
00:27:09.890 --> 00:27:12.100
You can see what it's doing
from looking at all the
00:27:12.100 --> 00:27:14.340
sub-processes.
00:27:14.340 --> 00:27:20.670
And each arrival then is
associated with the given
00:27:20.670 --> 00:27:24.700
sub-process, with the
probability mass function
00:27:24.700 --> 00:27:28.160
lambda sub i over the
sum of lambda sub j.
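A minimal simulation sketch of this second view (the rates and sample size are arbitrary, not from the lecture): generate the combined process at the summed rate, then route each arrival independently.

import numpy as np

rng = np.random.default_rng(1)
lams = np.array([1.0, 2.0, 0.5])  # arbitrary rates for k = 3 sub-processes
lam_total = lams.sum()

# combined process: IID exponential interarrivals at the summed rate
arrivals = np.cumsum(rng.exponential(1/lam_total, 10000))

# route each arrival to sub-process i with probability lam_i / lam_total
labels = rng.choice(len(lams), size=arrivals.size, p=lams/lam_total)
subprocesses = [arrivals[labels == i] for i in range(len(lams))]

# each sub-process is itself Poisson of rate lam_i; check empirical rates
T = arrivals[-1]
print([round(len(sp)/T, 2) for sp in subprocesses])  # near 1.0, 2.0, 0.5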
00:27:28.160 --> 00:27:30.460
So this is the workhorse
of Poisson
00:27:30.460 --> 00:27:32.270
type queueing problems.
00:27:32.270 --> 00:27:35.990
You study queueing theory,
every page, you
00:27:35.990 --> 00:27:37.980
see this thing used.
00:27:37.980 --> 00:27:41.480
If you look at Kleinrock's books
on queueing, they're
00:27:41.480 --> 00:27:45.120
very nice books because they
cover so many different
00:27:45.120 --> 00:27:47.040
queueing situations.
00:27:47.040 --> 00:27:50.230
You find him using this
on every page.
00:27:50.230 --> 00:27:54.060
And he never tells you that he's
using it, but that's what
00:27:54.060 --> 00:27:54.670
he's doing.
00:27:54.670 --> 00:27:59.360
So that's a useful
thing to know.
00:27:59.360 --> 00:28:02.840
We then talked about conditional
arrivals and order
00:28:02.840 --> 00:28:05.590
statistics.
00:28:05.590 --> 00:28:12.280
The conditional distribution
of the first n arrivals--
00:28:12.280 --> 00:28:17.670
namely, s sub 1 s sub
2 up to s sub n--
00:28:17.670 --> 00:28:24.250
given that the number of arrivals
N of t equals n, is just n factorial
00:28:24.250 --> 00:28:25.430
over t to the n.
00:28:25.430 --> 00:28:29.380
Again, it doesn't depend on
where these arrivals are.
00:28:29.380 --> 00:28:33.215
It's just a function which is
independent of each arrival.
00:28:33.215 --> 00:28:36.660
It's the same kind of
conditioning we had before.
00:28:36.660 --> 00:28:40.080
It's n factorial divided
by t to the n.
00:28:40.080 --> 00:28:44.360
Because of the fact that if
you order these random
00:28:44.360 --> 00:28:49.450
variables, t1 less than t2 less
than t3, and so forth, up
00:28:49.450 --> 00:28:53.540
until time t, and then you say
how many different ways can I
00:28:53.540 --> 00:29:01.590
arrange a set of numbers, each
between 0 and t so that we
00:29:01.590 --> 00:29:03.630
have different orderings
of them.
00:29:03.630 --> 00:29:06.700
And you can choose any one
of the N to be the first.
00:29:06.700 --> 00:29:09.560
You can choose any one
of the remaining n
00:29:09.560 --> 00:29:11.510
minus 1 to be the second.
00:29:11.510 --> 00:29:14.670
And that's where this n
factorial comes from here.
00:29:14.670 --> 00:29:18.140
And that, again we've
been over.
00:29:18.140 --> 00:29:21.660
The probability that s1 is
greater than tau, given that
00:29:21.660 --> 00:29:27.540
there are n arrivals in the
overall interval t, comes from
00:29:27.540 --> 00:29:31.390
just looking at N uniformly
distributed random variables
00:29:31.390 --> 00:29:33.190
between 0 and t.
00:29:33.190 --> 00:29:35.840
And then what do you do with
those uniformly distributed
00:29:35.840 --> 00:29:37.670
random variables?
00:29:37.670 --> 00:29:40.490
Well, you ask the question,
what's the probability that
00:29:40.490 --> 00:29:44.140
all of them occur
after time tau?
00:29:44.140 --> 00:29:47.820
And that's just t minus tau
divided by t raised to the
00:29:47.820 --> 00:29:48.910
n-th power.
00:29:48.910 --> 00:29:51.980
And see, all of these formulas
just come from particular
00:29:51.980 --> 00:29:54.360
viewpoints about what's
going on.
00:29:54.360 --> 00:29:55.760
You have a number
of viewpoints.
00:29:55.760 --> 00:29:58.550
One of them is throwing
darts at a line.
00:29:58.550 --> 00:30:01.140
One of them is having
exponential
00:30:01.140 --> 00:30:02.510
interarrival times.
00:30:02.510 --> 00:30:06.660
One of them is these uniform
interarrivals.
00:30:06.660 --> 00:30:08.880
It's only a very small
number of tricks.
00:30:08.880 --> 00:30:13.600
And you just use them in
various combinations.
00:30:13.600 --> 00:30:17.800
So the joint distribution of s1
to s n, given N of t equals
00:30:17.800 --> 00:30:21.250
n, is the same as the joint
distribution of N uniform
00:30:21.250 --> 00:30:24.070
random variables after
they've been ordered.
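The two conditional formulas from this discussion, written out: given N(t) = n, for 0 < s1 < ... < sn <= t,

$$f_{S_1 \cdots S_n \mid N(t)}(s_1, \ldots, s_n \mid n) = \frac{n!}{t^n}, \qquad \Pr(S_1 > \tau \mid N(t) = n) = \left( \frac{t - \tau}{t} \right)^n.$$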
00:30:28.650 --> 00:30:32.115
So let's go on to finite
state Markov chains.
00:30:35.240 --> 00:30:37.670
Seems like we're covering an
enormous amount of material in
00:30:37.670 --> 00:30:38.350
this course.
00:30:38.350 --> 00:30:40.150
And I think we are.
00:30:40.150 --> 00:30:44.290
But as I'm trying to say, as
we go along, it's all--
00:30:44.290 --> 00:30:46.850
I mean, everything follows from
a relatively small set of
00:30:46.850 --> 00:30:48.620
principles.
00:30:48.620 --> 00:30:51.100
Of course, it's harder to
understand the small set of
00:30:51.100 --> 00:30:54.580
principles and how to apply them
than it is to understand
00:30:54.580 --> 00:30:55.460
all the details.
00:30:55.460 --> 00:30:56.710
But that's--
00:30:58.970 --> 00:31:01.560
but on the other hand, if you
understand the principles,
00:31:01.560 --> 00:31:04.620
then all those details,
including the ones we haven't
00:31:04.620 --> 00:31:08.280
talked about, are easy
to deal with.
00:31:08.280 --> 00:31:11.750
An integer-time stochastic
process--
00:31:11.750 --> 00:31:14.450
x1, x2, x3, blah, blah, blah--
00:31:14.450 --> 00:31:19.220
is a Markov chain if for all n,
namely the number of them
00:31:19.220 --> 00:31:21.770
that we're looking at--
00:31:21.770 --> 00:31:23.020
well--
00:31:25.880 --> 00:31:30.190
for all n, i, j, k, l, and so
forth, the probability that
00:31:30.190 --> 00:31:35.770
the n-th of these random
variables is equal to j, given
00:31:35.770 --> 00:31:39.340
what all of the others are-- and
these are not ordered now.
00:31:39.340 --> 00:31:41.460
I mean, in a Markov chain,
nothing is ordered.
00:31:41.460 --> 00:31:44.430
We're not talking about
an arrival process.
00:31:44.430 --> 00:31:47.220
We're just talking about a frog
jumping around on lily
00:31:47.220 --> 00:31:52.660
pads, if you arrange the lily
pads in a linear way, if these
00:31:52.660 --> 00:31:54.430
are random variables.
00:31:54.430 --> 00:32:00.530
The probability that the n-th
location is equal to j, given
00:32:00.530 --> 00:32:06.410
that the previous locations are
i, k, back to m, is just
00:32:06.410 --> 00:32:11.010
some probability p sub
i j, a conditional
00:32:11.010 --> 00:32:14.120
probability of j given i.
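The defining condition, in symbols: for all n and all states i, j, k, ..., m,

$$\Pr(X_n = j \mid X_{n-1} = i, X_{n-2} = k, \ldots, X_0 = m) = \Pr(X_n = j \mid X_{n-1} = i) = P_{ij}.$$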
00:32:14.120 --> 00:32:17.670
In other words, if you're
looking at what happens at
00:32:17.670 --> 00:32:22.340
time n, once you know what
happened at time n minus 1,
00:32:22.340 --> 00:32:24.830
everything else is
of no concern.
00:32:24.830 --> 00:32:29.400
This process evolves by having
a history of only one time
00:32:29.400 --> 00:32:31.980
unit, a little like the
Poisson process.
00:32:31.980 --> 00:32:36.070
The Poisson process evolves
by being totally
00:32:36.070 --> 00:32:37.880
independent of the past.
00:32:37.880 --> 00:32:40.600
Here, you put a little
dependence in the past.
00:32:40.600 --> 00:32:44.150
But the dependence is only to
look at the last thing that
00:32:44.150 --> 00:32:49.040
happened, and nothing before the
last time that happened.
00:32:49.040 --> 00:32:53.850
So p sub i j depends
only on i and j.
00:32:53.850 --> 00:32:59.170
And the initial probability mass
function is arbitrary.
00:32:59.170 --> 00:33:02.470
A Markov chain is finite-state if
the sample space for each x
00:33:02.470 --> 00:33:07.400
i, is a finite set S. And the
sample space S is usually
00:33:07.400 --> 00:33:10.530
taken to be integers
1 up to M.
00:33:10.530 --> 00:33:13.490
In all these formulas we write,
we're always summing
00:33:13.490 --> 00:33:17.230
from one to M. And the reason
for that is we've assumed the
00:33:17.230 --> 00:33:22.120
states are 1, 2, 3, up to M.
Sometimes it's more convenient
00:33:22.120 --> 00:33:23.765
to think of different
state spaces.
00:33:26.730 --> 00:33:29.040
But all the formulas
we use are based on
00:33:29.040 --> 00:33:31.290
this state space here.
00:33:31.290 --> 00:33:36.500
A Markov chain is completely
described by these transition
00:33:36.500 --> 00:33:41.200
probabilities plus the initial
probabilities.
00:33:41.200 --> 00:33:44.390
If you want to write down the
probability of what x is at
00:33:44.390 --> 00:33:49.030
some time n given what it was at
some time 0, all you have to
00:33:49.030 --> 00:33:52.890
do is trace all the paths from
0 out to N, add up the
00:33:52.890 --> 00:33:56.890
probabilities of all of those
paths, and that tells you the
00:33:56.890 --> 00:33:58.020
probability you want.
00:33:58.020 --> 00:34:01.820
All probabilities can be
calculated just from knowing
00:34:01.820 --> 00:34:06.240
what these transition
probabilities are.
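A minimal sketch of that computation (the matrix is an arbitrary example, not one from the lecture): summing the probabilities of all length-n paths from i to j is exactly what the n-th power of the transition matrix does.

import numpy as np

# arbitrary 3-state transition matrix; each row sums to 1
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

# [P^n]_{ij} = Pr(X_n = j | X_0 = i): each entry adds up the
# probabilities of every length-n path from i to j
n = 10
Pn = np.linalg.matrix_power(P, n)
print(Pn[0])  # distribution at time n, starting in state 0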
00:34:06.240 --> 00:34:10.980
Note that when we're dealing
with Poisson processes, we
00:34:10.980 --> 00:34:15.520
defined everything in
terms of how many--
00:34:15.520 --> 00:34:20.250
how many variables are there in
defining a Poisson process?
00:34:20.250 --> 00:34:25.020
How many things do you have to
specify before I know exactly
00:34:25.020 --> 00:34:27.320
what Poisson process
I'm talking about?
00:34:30.540 --> 00:34:31.760
Only the Poisson rate.
00:34:31.760 --> 00:34:35.650
Only one parameter is necessary
00:34:35.650 --> 00:34:37.639
for a Poisson process.
00:34:37.639 --> 00:34:43.219
For a finite-state Markov
process, you need a lot more.
00:34:43.219 --> 00:34:48.310
What you need is all of these
values, p sub i j.
00:34:48.310 --> 00:34:52.409
If you sum p sub i j over
j, you have to get 1.
00:34:52.409 --> 00:34:54.830
So that removes one of them.
00:34:54.830 --> 00:34:58.360
But as soon as you specify that
transition matrix, you've
00:34:58.360 --> 00:34:59.960
specified everything.
00:34:59.960 --> 00:35:01.260
So there's nothing more to know
00:35:01.260 --> 00:35:03.220
about the Markov chain.
00:35:03.220 --> 00:35:06.060
There's only all these gruesome
derivations that we
00:35:06.060 --> 00:35:07.580
go through.
00:35:07.580 --> 00:35:11.600
But everything is initially
determined.
00:35:11.600 --> 00:35:13.960
The set of transition probabilities
is usually
00:35:13.960 --> 00:35:16.030
viewed as the Markov chain.
00:35:16.030 --> 00:35:19.760
And the initial probabilities
are usually viewed as just a
00:35:19.760 --> 00:35:21.740
parameter that we deal with.
00:35:21.740 --> 00:35:23.840
In other words, we--
00:35:23.840 --> 00:35:28.250
in other words, what we study
is the particular Markov
00:35:28.250 --> 00:35:31.550
chain, whether it's recurrent,
whether it's transient,
00:35:31.550 --> 00:35:32.800
whatever it is.
00:35:32.800 --> 00:35:35.770
How you break it up into
classes, all of that stuff
00:35:35.770 --> 00:35:39.060
only depends on these transition
probabilities and
00:35:39.060 --> 00:35:40.815
doesn't depend on
where you start.
00:35:46.920 --> 00:35:51.490
Now, a finite-state Markov chain
can be described either
00:35:51.490 --> 00:35:54.230
as a directed graph
or as a matrix.
00:35:54.230 --> 00:35:58.300
I hope you've seen by this
time that some things are
00:35:58.300 --> 00:36:03.040
easier to look at if you look at
things in terms of a graph.
00:36:03.040 --> 00:36:07.180
Some things are easier to look
at if you look at something
00:36:07.180 --> 00:36:08.660
like this matrix.
00:36:08.660 --> 00:36:13.230
And some problems can be solved
by inspection, if you
00:36:13.230 --> 00:36:14.700
draw a graph of it.
00:36:14.700 --> 00:36:17.890
Some can be solved almost
by inspection if
00:36:17.890 --> 00:36:19.480
you look at the matrix.
00:36:19.480 --> 00:36:23.460
If you're doing things by
computer, usually computers
00:36:23.460 --> 00:36:27.450
deal with matrices more easily
than with graphs.
00:36:27.450 --> 00:36:31.070
If you're dealing with a Markov
chain with 100,000
00:36:31.070 --> 00:36:35.290
states, you're not going to
look at the graph and
00:36:35.290 --> 00:36:38.330
determine very much from it,
because it's typically going
00:36:38.330 --> 00:36:39.650
to be fairly complicated--
00:36:39.650 --> 00:36:42.020
unless it has some very
simple structure.
00:36:42.020 --> 00:36:46.440
And sometimes that simple
structure is determined.
00:36:46.440 --> 00:36:48.780
If it's something where
you can only--
00:36:48.780 --> 00:36:52.190
where you have the states
numbered from 1 to 100,000,
00:36:52.190 --> 00:36:56.270
and you can only go from state
i to state i plus 1, or from
00:36:56.270 --> 00:36:59.910
state i to state
i minus 1, then it
00:36:59.910 --> 00:37:01.380
becomes very simple.
00:37:01.380 --> 00:37:04.320
And you like to look at
it as a graph again.
00:37:04.320 --> 00:37:07.670
But ordinarily, you don't
like to do that.
00:37:07.670 --> 00:37:15.000
But the nice thing about this
graph is that it tells you
00:37:15.000 --> 00:37:19.090
very simply and visually which
transition probabilities are
00:37:19.090 --> 00:37:23.810
zero, and which transition
probabilities are non-zero.
00:37:23.810 --> 00:37:26.690
And that's the thing that
specifies which states are
00:37:26.690 --> 00:37:31.650
recurrent, which states are
transient, and all of that.
00:37:31.650 --> 00:37:35.400
All of that kind of elementary
analysis about a Markov chain
00:37:35.400 --> 00:37:40.300
all comes from looking at this
graph and seeing whether you
00:37:40.300 --> 00:37:46.290
can get from one state to
another state by some process.
00:37:46.290 --> 00:37:50.520
So let's move on from that.
00:37:50.520 --> 00:37:53.620
Talk about the classification
of states.
00:37:53.620 --> 00:37:57.500
We started out with the
idea of a walk and
00:37:57.500 --> 00:37:59.370
a path and a cycle.
00:37:59.370 --> 00:38:03.610
I'm not sure these terms are
uniform throughout the field.
00:38:03.610 --> 00:38:07.550
But a walk is an ordered
string of nodes, like
00:38:07.550 --> 00:38:10.020
i0, i1, up to i n.
00:38:10.020 --> 00:38:14.960
You can have repeated elements
here, but you need a directed
00:38:14.960 --> 00:38:18.170
arc from i sub m minus
1 to i sub m.
00:38:18.170 --> 00:38:23.035
Like for example, in this stupid
Markov chain here--
00:38:25.870 --> 00:38:28.880
I mean, when you're drawing
things in LaTeX, it's kind of
00:38:28.880 --> 00:38:31.760
hard to draw those nice
little curves there.
00:38:31.760 --> 00:38:34.610
And because of that, when you
once draw a Markov chain, you
00:38:34.610 --> 00:38:36.050
never want to change it.
00:38:36.050 --> 00:38:39.210
And that's why these nodes
have a very small set of
00:38:39.210 --> 00:38:40.530
Markov chains in them.
00:38:40.530 --> 00:38:46.580
It's just to save me some work,
drawing and redrawing
00:38:46.580 --> 00:38:47.830
these diagrams.
00:38:50.030 --> 00:38:55.700
An example of a walk, as you
start in 4, you take the self
00:38:55.700 --> 00:38:58.800
loop, go back to 4 at time 2.
00:38:58.800 --> 00:39:01.660
Then you go to state
1 at time 3.
00:39:01.660 --> 00:39:05.240
Then you go to state
2 at time 4.
00:39:05.240 --> 00:39:08.140
Then you go to state
3, time 5.
00:39:08.140 --> 00:39:11.010
And back to state 2 at time 6.
00:39:11.010 --> 00:39:13.300
You have repeated nodes there.
00:39:13.300 --> 00:39:17.230
You have repeated nodes
separated here.
00:39:17.230 --> 00:39:20.630
Another example of a
walk is 4, 1, 2, 3.
00:39:20.630 --> 00:39:24.120
Example of a path, the path
can't have any repeated nodes.
00:39:24.120 --> 00:39:27.060
We'd like to look at paths,
because if you're going to be
00:39:27.060 --> 00:39:30.280
able to get from one node to
another node, and there's some
00:39:30.280 --> 00:39:33.420
walk that goes all around the
place and gets to that final
00:39:33.420 --> 00:39:36.770
node, there's also path
that goes there.
00:39:36.770 --> 00:39:39.900
If you look at the walk, you
just leave out all the cycles
00:39:39.900 --> 00:39:42.570
along the way, and
you get to the end.
00:39:42.570 --> 00:39:45.980
And a cycle, of course, which I
didn't define, is something
00:39:45.980 --> 00:39:49.820
which starts at one node, goes
through a path, and then
00:39:49.820 --> 00:39:52.730
finally comes back to the same
node that it started at.
00:39:52.730 --> 00:39:56.800
And it doesn't make any
difference for the cycle 2, 3,
00:39:56.800 --> 00:40:01.610
2 whether you call it
2, 3, 2 or 3, 2, 3.
00:40:01.610 --> 00:40:04.390
That's the same cycle, and
it's not even worth
00:40:04.390 --> 00:40:07.200
distinguishing between
those two ideas.
00:40:07.200 --> 00:40:12.723
OK. That's that.
00:40:15.360 --> 00:40:20.010
If there's a path from--
00:40:20.010 --> 00:40:21.260
where did I--
00:40:26.110 --> 00:40:31.800
node j is accessible from i,
which we abbreviate as i
00:40:31.800 --> 00:40:33.680
has a path to j.
00:40:33.680 --> 00:40:38.010
If there's a walk from i to
j, which means that p
00:40:38.010 --> 00:40:40.650
sub i j to the n--
00:40:40.650 --> 00:40:44.150
this is the transition
probability, the probability
00:40:44.150 --> 00:40:49.160
that x sub n is equal to
j, given that x sub
00:40:49.160 --> 00:40:50.710
0 is equal to i.
00:40:50.710 --> 00:40:53.380
And we use this all the time.
00:40:53.380 --> 00:40:57.370
If this is greater than zero
for some n greater than 0.
00:40:57.370 --> 00:41:06.950
In other words, j is accessible
from i if there's a
00:41:06.950 --> 00:41:09.240
path from i that goes to j.
00:41:12.300 --> 00:41:17.170
And trivially, if there's a path from i to j, and
there's a path from j to
00:41:17.170 --> 00:41:21.520
k, then there has to be
a path from i to k.
00:41:21.520 --> 00:41:25.730
If you've ever tried to make up
a mapping program to find
00:41:25.730 --> 00:41:28.910
how to get from here to there,
this is one of the most useful
00:41:28.910 --> 00:41:29.740
things you use.
00:41:29.740 --> 00:41:32.320
If there's a way to get here
to there, and a way to get
00:41:32.320 --> 00:41:35.330
from here to there, then there's
a way to get from here
00:41:35.330 --> 00:41:37.560
all the way to the end.
00:41:37.560 --> 00:41:42.650
And if you look up what most of
these map programs do, you
00:41:42.650 --> 00:41:47.040
see that they overuse this
enormously and they wind up
00:41:47.040 --> 00:41:50.910
taking you from here to there
by some bizarre path just
00:41:50.910 --> 00:41:53.880
because it happens to go through
some intermediate node
00:41:53.880 --> 00:41:55.460
on the way.
00:41:55.460 --> 00:41:58.680
So two nodes communicate--
00:41:58.680 --> 00:42:01.890
i double arrow j--
00:42:01.890 --> 00:42:08.860
if j is accessible from i, and
if i is accessible from j.
00:42:08.860 --> 00:42:12.450
That means there's a path from
i to j, and another path from
00:42:12.450 --> 00:42:16.260
j back to i, if you shorten
them as much as you can.
00:42:16.260 --> 00:42:17.040
There's a cycle.
00:42:17.040 --> 00:42:23.530
It starts at i, goes through j,
and comes back to i again.
00:42:23.530 --> 00:42:29.810
I didn't say that quite right,
so delete that from what
00:42:29.810 --> 00:42:31.200
you've just heard.
00:42:31.200 --> 00:42:35.630
A class C of states as a
non-empty set, such that i and
00:42:35.630 --> 00:42:40.370
j communicate for each
i j in this class.
00:42:40.370 --> 00:42:45.330
But i does not communicate
with j for each i in C--
00:42:49.420 --> 00:42:53.210
for i in C and j not in C.
00:42:53.210 --> 00:42:55.870
The convenient way to think
about this-- and I should have
00:42:55.870 --> 00:42:59.670
stated this as a theorem in
the notes, because it's--
00:43:03.990 --> 00:43:06.130
I think it's something that
we all use without even
00:43:06.130 --> 00:43:07.750
thinking about it.
00:43:07.750 --> 00:43:12.480
It says that the entire set of
states, or the entire set of
00:43:12.480 --> 00:43:16.500
nodes in a graph, is partitioned
into classes.
00:43:16.500 --> 00:43:22.860
The class C containing i is i
in union with all of the j's
00:43:22.860 --> 00:43:24.110
that communicate with i.
00:43:24.110 --> 00:43:27.580
So if you want to find this
partition, you start out with
00:43:27.580 --> 00:43:31.280
an arbitrary node, you find all
of the other nodes that it
00:43:31.280 --> 00:43:34.590
communicates with, and you
find them by picking
00:43:34.590 --> 00:43:36.320
them one at a time.
00:43:36.320 --> 00:43:41.050
You pick all of the nodes
for which p sub i j is
00:43:41.050 --> 00:43:42.540
greater than 0.
00:43:42.540 --> 00:43:44.100
Then you pick--
00:43:44.100 --> 00:43:46.530
and p sub j i is greater--
00:43:46.530 --> 00:43:47.780
well-- blah.
00:43:50.030 --> 00:43:55.400
If you want to find the set of
nodes that are accessible from
00:43:55.400 --> 00:43:57.640
i, you start out looking at i.
00:43:57.640 --> 00:44:00.640
You look at all the states
which are accessible
00:44:00.640 --> 00:44:03.300
from i in one step.
00:44:03.300 --> 00:44:06.870
Then you look at all the steps,
all of the states,
00:44:06.870 --> 00:44:09.380
which you can access from
any one of those.
00:44:09.380 --> 00:44:12.720
Those are the states which are
accessible in two states--
00:44:12.720 --> 00:44:16.150
in two steps, then in three
steps, and so forth.
00:44:16.150 --> 00:44:21.380
So you find all the nodes that
are accessible from node i.
00:44:21.380 --> 00:44:24.640
And then you turn around and
do it the other way.
00:44:24.640 --> 00:44:29.600
And presto, you have all of
these classes of states all
00:44:29.600 --> 00:44:30.910
very simply.
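A sketch of that procedure in code (a hypothetical helper, assuming the chain is given by its transition matrix): compute which states can reach which, then group i and j together exactly when each is accessible from the other.

import numpy as np

def communicating_classes(P):
    # reach[i][j]: is j accessible from i in one or more steps?
    M = len(P)
    reach = np.array(P) > 0
    for k in range(M):  # boolean transitive closure (Floyd-Warshall style)
        reach = reach | (reach[:, k:k+1] & reach[k:k+1, :])
    reach |= np.eye(M, dtype=bool)  # each state belongs to its own class
    classes, seen = [], set()
    for i in range(M):
        if i not in seen:
            cls = {j for j in range(M) if reach[i, j] and reach[j, i]}
            classes.append(sorted(cls))
            seen |= cls
    return classes

P = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.3, 0.3, 0.4]]  # state 2 can reach {0, 1} but not return: transient
print(communicating_classes(P))  # [[0, 1], [2]]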
00:44:30.910 --> 00:44:34.990
For a finite-state chain, the
state i is transient if
00:44:34.990 --> 00:44:40.200
there's a j in S such that
i goes into j, but j
00:44:40.200 --> 00:44:41.420
does not go into i.
00:44:41.420 --> 00:44:46.900
In other words, if I'm a state
i, and I can get to you, but
00:44:46.900 --> 00:44:55.450
you can't get back to me,
then I'm transient.
00:44:55.450 --> 00:45:01.600
Because the way Markov chains
work, we keep going from one
00:45:01.600 --> 00:45:04.720
step to the next step to the
next step to the next step.
00:45:04.720 --> 00:45:09.710
And if I keep returning to
myself, then eventually I'm
00:45:09.710 --> 00:45:11.010
going to go to you.
00:45:11.010 --> 00:45:14.040
And once I go to you, I'll
never get back again.
00:45:14.040 --> 00:45:18.540
So because of that, these
transient states are states
00:45:18.540 --> 00:45:21.450
where eventually you
leave them and you
00:45:21.450 --> 00:45:23.160
never get back again.
00:45:23.160 --> 00:45:26.190
As soon as we start talking
about countable state Markov
00:45:26.190 --> 00:45:28.270
chains, you'll see that
this definition
00:45:28.270 --> 00:45:30.250
doesn't work anymore.
00:45:30.250 --> 00:45:32.620
You can--
00:45:32.620 --> 00:45:36.520
it is very possible to just
wander away in a countable
00:45:36.520 --> 00:45:40.390
state Markov chain, and you
never get back again that way.
00:45:40.390 --> 00:45:43.640
After you wander away too far,
the probability of getting
00:45:43.640 --> 00:45:45.540
back gets smaller and smaller.
00:45:45.540 --> 00:45:47.830
You keep getting further
and further away.
00:45:47.830 --> 00:45:52.810
The probability of returning
gets smaller and smaller, so
00:45:52.810 --> 00:45:56.360
that you have transience
that way also.
00:45:56.360 --> 00:45:59.470
But here, the situation is
simpler for a finite-state
00:45:59.470 --> 00:46:01.030
Markov chain.
00:46:01.030 --> 00:46:05.570
And you can define transience if
there's a j in S such that
00:46:05.570 --> 00:46:09.440
i goes into j, but j
doesn't go into i.
00:46:09.440 --> 00:46:13.160
If i's not transient,
then it's recurrent.
00:46:13.160 --> 00:46:16.240
Usually you define recurrence
first and transience later,
00:46:16.240 --> 00:46:19.470
but it's a little simpler
this way.
00:46:19.470 --> 00:46:22.310
All states in a class are
transient, or all are
00:46:22.310 --> 00:46:26.330
recurrent, and a finite-state
Markov chain contains at least
00:46:26.330 --> 00:46:27.990
one recurrent class.
00:46:27.990 --> 00:46:29.770
You did that in your homework.
00:46:29.770 --> 00:46:33.040
And you were surprised at how
complicated it was to do it.
00:46:33.040 --> 00:46:36.350
I hope that after you wrote
down a proof of this, you
00:46:36.350 --> 00:46:41.800
stopped and thought about what
you were actually proving,
00:46:41.800 --> 00:46:46.030
which intuitively is something
very, very simple.
00:46:46.030 --> 00:46:48.960
It's just looking at all of
the transient classes.
00:46:48.960 --> 00:46:51.480
Starting at one transient
class, you
00:46:51.480 --> 00:46:54.950
find if there's another--
00:46:54.950 --> 00:46:59.190
if there's another state you can
get to from i which is
00:46:59.190 --> 00:47:02.170
also transient, and then you
find if there's another state
00:47:02.170 --> 00:47:04.910
you get to from there which
is also transient.
00:47:04.910 --> 00:47:08.500
And eventually, you have to come
to a state from which you
00:47:08.500 --> 00:47:13.325
can't go to some other state,
from which you can't get back.
00:47:17.350 --> 00:47:20.410
That was explaining it almost
as badly as the problem
00:47:20.410 --> 00:47:22.120
statement explained it.
00:47:22.120 --> 00:47:25.460
And I hope that after you did
the problem, even if you can't
00:47:25.460 --> 00:47:27.910
explain it to someone,
you have an
00:47:27.910 --> 00:47:30.430
understanding of why it's true.
00:47:30.430 --> 00:47:34.920
It shouldn't be surprising
after you do that.
00:47:34.920 --> 00:47:38.950
So the finite-state Markov chain
contains at least one
00:47:38.950 --> 00:47:40.200
recurrent class.
00:47:42.800 --> 00:47:46.720
OK, the period of a state
i is defined as the greatest common
00:47:46.720 --> 00:47:51.730
divisor of the n such that
P sub i i to the n is greater than 0.
00:47:51.730 --> 00:47:54.580
Again, a very complicated
definition for a
00:47:54.580 --> 00:47:56.280
simple kind of idea.
00:47:56.280 --> 00:47:58.670
Namely, you start out
in a state i.
00:47:58.670 --> 00:48:02.440
You look at all of the times at
which you can get back to
00:48:02.440 --> 00:48:03.940
state i again.
00:48:03.940 --> 00:48:08.780
If you find that that set of
times has a period in it,
00:48:08.780 --> 00:48:19.550
namely, if every return time
is a multiple of some
00:48:19.550 --> 00:48:25.410
d, then you know that the state
is periodic if d is
00:48:25.410 --> 00:48:26.720
greater than 1.
00:48:26.720 --> 00:48:30.060
And what you have to do is to
find the largest such number.
00:48:30.060 --> 00:48:32.040
And that's the period
of the state.
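
Here's the same definition as a sketch in Python; the horizon cutoff is purely an illustration device, not part of the definition:

    from math import gcd

    def period(P, i, horizon=200):
        # gcd of the return times n <= horizon with (P^n)[i][i] > 0,
        # tracked with boolean one-step updates.
        M = len(P)
        possible = [P[i][j] > 0 for j in range(M)]   # after exactly 1 step
        d = 0
        for n in range(1, horizon + 1):
            if possible[i]:
                d = gcd(d, n)
            possible = [any(possible[k] and P[k][j] > 0 for k in range(M))
                        for j in range(M)]
        return d   # 0 means no return was seen within the horizon

    P = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
    print(period(P, 0))   # 3 for a three-state cycle
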
00:48:32.040 --> 00:48:35.170
All states in the same class
have the same period.
00:48:35.170 --> 00:48:38.690
A recurrent class with period
d greater than 1 can be
00:48:38.690 --> 00:48:40.550
partitioned into d sub-classes--
00:48:40.550 --> 00:48:42.640
this is the best way
of looking at
00:48:42.640 --> 00:48:45.820
periodic classes of states.
00:48:45.820 --> 00:48:49.780
If you have a periodic class of
states, then you can always
00:48:49.780 --> 00:48:53.960
separate it into
d sub-classes.
00:48:53.960 --> 00:48:59.300
And in such a set of
sub-classes, transitions from
00:48:59.300 --> 00:49:03.770
S1 and the states in
S1 only go to S2.
00:49:03.770 --> 00:49:07.710
Transitions from states
in S2 only go to S3.
00:49:07.710 --> 00:49:12.430
Up to, transitions from S
d only go back to S1.
00:49:12.430 --> 00:49:16.050
They have to go someplace,
so they go back to S1.
00:49:16.050 --> 00:49:22.500
So as you cycle around, it takes
d steps to cycle from 1
00:49:22.500 --> 00:49:24.000
back to 1 again.
00:49:24.000 --> 00:49:28.410
It takes d steps to cycle
from 2 back to 2 again.
00:49:28.410 --> 00:49:31.300
So you can see the structure of
the Markov chain and why,
00:49:31.300 --> 00:49:34.810
in fact, it does have to be--
00:49:34.810 --> 00:49:38.480
why that class has
to be periodic.
00:49:38.480 --> 00:49:41.870
An ergodic class is a recurrent
aperiodic class.
00:49:41.870 --> 00:49:44.760
In other words, it's a class
where the period is equal to
00:49:44.760 --> 00:49:48.450
1, which means there really
isn't any period.
00:49:48.450 --> 00:49:52.550
A Markov chain with only one
class is ergodic if the class
00:49:52.550 --> 00:49:54.640
is ergodic.
00:49:54.640 --> 00:49:56.880
And the big theorem here--
00:49:56.880 --> 00:49:59.670
I mean, this is probably the
most important theorem about
00:49:59.670 --> 00:50:01.820
finite-state Markov chains.
00:50:01.820 --> 00:50:05.100
You have an ergodic,
finite-state Markov chain.
00:50:05.100 --> 00:50:12.300
Then the limit as n goes to
infinity of the probability of
00:50:12.300 --> 00:50:16.700
arriving in state j after n
steps, given that you started
00:50:16.700 --> 00:50:20.780
in state i, is just some
function of j.
00:50:20.780 --> 00:50:24.400
In other words, when n gets very
large, it doesn't depend
00:50:24.400 --> 00:50:27.370
on how large n is.
00:50:27.370 --> 00:50:28.480
It stays the same.
00:50:28.480 --> 00:50:30.570
It becomes independent of n.
00:50:30.570 --> 00:50:32.450
It doesn't depend on
where you started.
00:50:32.450 --> 00:50:34.860
No matter where you start
in a finite-state
00:50:34.860 --> 00:50:36.570
ergodic Markov chain.
00:50:36.570 --> 00:50:40.580
After a very long time, the
probability of being in a
00:50:40.580 --> 00:50:44.620
state j is independent of where
you started, and it's
00:50:44.620 --> 00:50:48.170
independent of how long
you've been running.
00:50:48.170 --> 00:50:52.200
So that's a very strong
kind of--
00:50:52.200 --> 00:50:54.890
it's a very strong kind
of limit theorem.
00:50:54.890 --> 00:50:58.690
It's very much like the law of
large numbers and all of these
00:50:58.690 --> 00:51:00.030
other things.
00:51:00.030 --> 00:51:03.120
I'm going to talk a little bit
at the end about what that
00:51:03.120 --> 00:51:04.820
relationship really is.
00:51:07.360 --> 00:51:10.850
Except what it says is, after a
long time, you're in steady
00:51:10.850 --> 00:51:12.670
state, which is why
it's called the
00:51:12.670 --> 00:51:13.760
steady state theorem.
00:51:13.760 --> 00:51:14.440
Yes?
00:51:14.440 --> 00:51:17.386
AUDIENCE: Could you define the
steady states for periodic
00:51:17.386 --> 00:51:18.636
chains [INAUDIBLE]?
00:51:21.320 --> 00:51:26.460
PROFESSOR: I try to avoid doing
that because you have
00:51:26.460 --> 00:51:28.650
steady state probabilities.
00:51:28.650 --> 00:51:31.810
The steady state probabilities
that you have are these: if
00:51:34.990 --> 00:51:38.760
you have these
sub-classes,
00:51:38.760 --> 00:51:42.690
then you wind up with a steady
state within each sub-class.
00:51:42.690 --> 00:51:46.900
If you assign each state its
probability within the
00:51:46.900 --> 00:51:51.870
sub-class, divided by d, then
you get what is the steady
00:51:51.870 --> 00:51:52.930
state probability.
00:51:52.930 --> 00:51:56.870
If you start out in that steady
state, then you're in
00:51:56.870 --> 00:52:00.130
each sub-class with probability
1 over d.
00:52:00.130 --> 00:52:04.230
And you shift to the next
sub-class and you're still in
00:52:04.230 --> 00:52:08.340
steady state, because you have
a probability, 1 over d, of
00:52:08.340 --> 00:52:12.230
being in each of those
sub-classes to start with.
00:52:12.230 --> 00:52:16.970
You shift and you're still in
one of the sub-classes with
00:52:16.970 --> 00:52:19.130
probability 1 over d.
00:52:19.130 --> 00:52:22.690
So there still is a steady
state in that sense, but
00:52:22.690 --> 00:52:24.830
there's not a steady state
in any nice sense.
00:52:31.940 --> 00:52:39.470
So anyway, that's
the way it is.
00:52:39.470 --> 00:52:44.860
But you see, if you understand
this theorem for ergodic
00:52:44.860 --> 00:52:48.550
finite-state Markov
chains, and then you
00:52:48.550 --> 00:52:52.540
understand about periodic
chains and this set of
00:52:52.540 --> 00:52:56.070
sub-classes, you can
see within each
00:52:56.070 --> 00:52:59.450
sub-class, if you look at--
00:52:59.450 --> 00:53:00.700
if you look at--
00:53:04.440 --> 00:53:11.500
if you look at time 0, time d,
time 2d, 3d and 4d, then
00:53:11.500 --> 00:53:14.470
whatever state you start in,
you're going to be in the same
00:53:14.470 --> 00:53:19.380
sub-class after d steps, the same
sub-class after 2d steps.
00:53:19.380 --> 00:53:21.480
You're going to have
a transition
00:53:21.480 --> 00:53:24.280
matrix over d steps.
00:53:24.280 --> 00:53:27.360
And this theorem still applies
to these sub-classes over
00:53:27.360 --> 00:53:29.200
periods of d.
00:53:29.200 --> 00:53:32.030
So the hard part of it
is proving this.
00:53:32.030 --> 00:53:35.180
After you prove this, then you
see that the same thing
00:53:35.180 --> 00:53:38.200
happens over each sub-class
after that.
00:53:43.650 --> 00:53:45.290
That's a pretty major theorem.
00:53:45.290 --> 00:53:46.990
It's difficult to prove.
00:53:46.990 --> 00:53:50.890
A sub-step is to show that for
an ergodic M state Markov
00:53:50.890 --> 00:53:56.380
chain, the probability of being
in state j at time n,
00:53:56.380 --> 00:54:00.930
given that you're in state i at
time 0, is positive for all
00:54:00.930 --> 00:54:05.870
i j, and all n greater than
or equal to M minus 1 squared plus 1.
00:54:05.870 --> 00:54:10.900
It's very surprising that you
have to go this many states--
00:54:10.900 --> 00:54:14.980
this many steps before you get
to the point that all these
00:54:14.980 --> 00:54:18.440
transition probabilities
are positive.
00:54:18.440 --> 00:54:22.450
You look at this particular kind
of Markov chain in the
00:54:22.450 --> 00:54:26.660
homework, and I hope what you
found out from it was that if
00:54:26.660 --> 00:54:32.040
you start, say, in state two,
then at the next time, you
00:54:32.040 --> 00:54:33.640
have to be in 3.
00:54:33.640 --> 00:54:37.020
Next time, you have to be in
4, you have to be in 5, you
00:54:37.020 --> 00:54:38.560
have to be in 6.
00:54:38.560 --> 00:54:41.300
In other words, the size of
the set that you can be in
00:54:41.300 --> 00:54:46.550
after one step is just 1.
00:54:46.550 --> 00:54:51.170
One possible state here, one
possible state here, one
00:54:51.170 --> 00:54:52.640
possible state here.
00:54:52.640 --> 00:54:57.250
The next step, you're in either
1 or 2, and as you
00:54:57.250 --> 00:55:01.600
travel around, the size of the
set of states you can be in at
00:55:01.600 --> 00:55:06.510
these different steps, is 2,
until you get all the way
00:55:06.510 --> 00:55:07.510
around again.
00:55:07.510 --> 00:55:09.800
And then there's
a way to get--
00:55:09.800 --> 00:55:15.050
when you get to state 6 again,
the set of states enlarges.
00:55:15.050 --> 00:55:18.970
So finally you get up to a
set of states, which is
00:55:18.970 --> 00:55:20.800
up to M minus 1.
00:55:20.800 --> 00:55:25.630
And that's why you get the M
minus 1 squared here, plus 1.
00:55:25.630 --> 00:55:28.710
And this is the only Markov
chain that does this.
00:55:28.710 --> 00:55:31.850
You can have as many
states going around
00:55:31.850 --> 00:55:33.770
here as you want to.
00:55:33.770 --> 00:55:36.020
But you have to have this
structure at the end, where
00:55:36.020 --> 00:55:39.930
there's one special state and
one way of circumventing it,
00:55:39.930 --> 00:55:43.930
which means there's one cycle
of size M minus 1, and one
00:55:43.930 --> 00:55:48.440
cycle of size M. And that's the
only way you can get it.
00:55:48.440 --> 00:55:52.780
And that's the only Markov chain
that meets this bound
00:55:52.780 --> 00:55:53.640
with equality.
00:55:53.640 --> 00:56:01.470
In all other cases, you get this
property much earlier.
00:56:01.470 --> 00:56:05.200
And often, you get it after just
a linear amount of time.
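
If you want to check that bound numerically, here's a sketch that builds exactly that structure -- one cycle of length M and a chord giving a cycle of length M minus 1 -- and finds the first n at which every entry of P to the n is positive:

    def first_all_positive(P):
        # Smallest n with (P^n)[i][j] > 0 for all i, j, using boolean matrices.
        M = len(P)
        reach = [[P[i][j] > 0 for j in range(M)] for i in range(M)]
        n = 1
        while not all(all(row) for row in reach):
            reach = [[any(reach[i][k] and P[k][j] > 0 for k in range(M))
                      for j in range(M)] for i in range(M)]
            n += 1
        return n

    M = 6
    P = [[0.0] * M for _ in range(M)]
    for i in range(M - 1):
        P[i][i + 1] = 1.0        # the cycle of length M
    P[M - 1][0] = 0.5
    P[M - 1][1] = 0.5            # the chord creating a cycle of length M - 1
    print(first_all_positive(P))   # expect (M - 1)**2 + 1 = 26
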
00:56:09.360 --> 00:56:13.350
The other part of this major
theorem that you reach steady
00:56:13.350 --> 00:56:17.350
state says, let P be
greater than 0.
00:56:17.350 --> 00:56:19.150
In other words, let
all the transition
00:56:19.150 --> 00:56:22.410
probabilities be positive.
00:56:22.410 --> 00:56:28.040
And then define some quantity
alpha as the minimum of the
00:56:28.040 --> 00:56:30.160
transition probabilities.
00:56:30.160 --> 00:56:34.110
And then the theorem says, for
all states j and all n greater
00:56:34.110 --> 00:56:38.470
than or equal to 1, the maximum
over the initial
00:56:38.470 --> 00:56:43.180
states minus the minimum over
the initial states of P sub i
00:56:43.180 --> 00:56:49.040
j to the n plus 1--
that difference is less than
00:56:49.040 --> 00:56:52.470
or equal to the difference
at the n-th step,
00:56:52.470 --> 00:56:54.300
times 1 minus 2 alpha.
00:56:54.300 --> 00:56:58.970
Now 1 minus 2 alpha is
a number less than 1.
00:56:58.970 --> 00:57:03.700
And this says that this maximum
minus minimum is at most 1
00:57:03.700 --> 00:57:07.860
minus 2 alpha to the n, which
says that the limit of the
00:57:07.860 --> 00:57:11.220
maximizing term is equal
to the limit of
00:57:11.220 --> 00:57:12.640
the minimizing term.
00:57:12.640 --> 00:57:13.850
And what does that say?
00:57:13.850 --> 00:57:18.740
It says that everything in the
middle gets squeezed together.
00:57:18.740 --> 00:57:24.200
And it says exactly what we want
it to say, that the limit
00:57:24.200 --> 00:57:30.380
of P sub l j to the n is
independent of l, after n gets
00:57:30.380 --> 00:57:31.310
very large.
00:57:31.310 --> 00:57:34.090
Because the maximum and
the minimum get very
00:57:34.090 --> 00:57:37.560
close to each other.
00:57:37.560 --> 00:57:40.170
We also showed that P sub i j to the n
approaches that limit
00:57:40.170 --> 00:57:41.780
exponentially.
00:57:41.780 --> 00:57:43.640
That's what this says.
00:57:43.640 --> 00:57:49.860
The exponent here is just this
alpha, determined in that way.
00:57:49.860 --> 00:57:54.630
And the theorem for ergodic
Markov chains then follows by
00:57:54.630 --> 00:58:01.380
just looking at successive h
steps in the Markov chain when
00:58:01.380 --> 00:58:06.110
h is large enough so that all
these transition probabilities
00:58:06.110 --> 00:58:07.360
are positive.
00:58:09.300 --> 00:58:12.220
So you go out far enough
that all the transition
00:58:12.220 --> 00:58:13.860
probabilities are positive.
00:58:13.860 --> 00:58:16.980
And then you look at repetitions
of that, and apply
00:58:16.980 --> 00:58:18.230
this theorem.
00:58:18.230 --> 00:58:21.570
And suddenly you have this
general theorem,
00:58:21.570 --> 00:58:22.900
which is what we wanted.
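
As a numerical illustration of that squeezing, here's a sketch with a made-up all-positive matrix:

    def matmul(A, B):
        M = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(M)) for j in range(M)]
                for i in range(M)]

    P = [[0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3],
         [0.2, 0.2, 0.6]]
    alpha = min(min(row) for row in P)   # smallest transition probability
    Pn = P
    for n in range(1, 8):
        # Spread of column 0 of P^n: max over starting states minus min.
        # The theorem says the spread is at most (1 - 2*alpha)**n.
        spread = max(row[0] for row in Pn) - min(row[0] for row in Pn)
        print(n, round(spread, 6), round((1 - 2 * alpha) ** n, 6))
        Pn = matmul(Pn, P)
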
00:58:27.200 --> 00:58:30.530
An ergodic unichain is a Markov
chain with one
00:58:30.530 --> 00:58:33.870
ergodic recurrent class,
plus perhaps a set
00:58:33.870 --> 00:58:36.550
of transient states.
00:58:36.550 --> 00:58:39.600
And most of the things we talk
about in this course are for
00:58:39.600 --> 00:58:45.870
unichains, usually ergodic
unichains, because if you have
00:58:45.870 --> 00:58:49.160
multiple recurrent classes,
it just makes a mess.
00:58:49.160 --> 00:58:51.780
You wind up in this recurrent
class, or
00:58:51.780 --> 00:58:53.950
this recurrent class.
00:58:53.950 --> 00:59:00.080
And aside from the question of
which one you get to, you
00:59:00.080 --> 00:59:01.730
don't much care about it.
00:59:01.730 --> 00:59:05.790
And the theorem here is for an
ergodic finite-state unichain.
00:59:05.790 --> 00:59:10.370
The limit of P sub i j to the
n-- the probability of being in
00:59:10.370 --> 00:59:15.130
state j at time n, given that
you're in state i at time 0,
00:59:15.130 --> 00:59:17.290
is equal to pi sub j.
00:59:17.290 --> 00:59:22.330
In other words, this limit
here exists for all i j.
00:59:22.330 --> 00:59:25.210
And the limit is independent
of i.
00:59:25.210 --> 00:59:27.900
And it's independent of n
as n gets big enough.
00:59:32.820 --> 00:59:42.970
And then also, we can choose
this so that this set of
00:59:42.970 --> 00:59:47.680
probabilities here satisfies
this, what's called the steady
00:59:47.680 --> 00:59:51.780
state condition: the sum
over i of pi sub i times P sub i j
00:59:51.780 --> 00:59:53.140
is equal to pi j.
00:59:53.140 --> 00:59:56.380
In other words, if you start out
in steady state, and you
00:59:56.380 --> 01:00:00.300
look at the probabilities of
being in the different states
01:00:00.300 --> 01:00:06.610
at the next time unit, this is
the probability of being in
01:00:06.610 --> 01:00:11.610
state j at time n plus 1, if
this is the probability of
01:00:11.610 --> 01:00:14.420
being in state i at time n.
01:00:14.420 --> 01:00:17.790
So that condition
gets satisfied.
01:00:17.790 --> 01:00:19.280
That condition is satisfied.
01:00:19.280 --> 01:00:22.760
You just stay in steady
state forever.
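
For what it's worth, here's one standard way to compute that pi numerically, sketched with numpy assumed (not something we used in the course): drop one equation of pi P = pi and put the normalization in its place.

    import numpy as np

    def steady_state(P):
        # Solve pi (P - I) = 0 together with sum(pi) = 1.
        M = len(P)
        A = (np.array(P) - np.eye(M)).T    # (P - I)^T pi^T = 0, one row per equation
        A[-1, :] = 1.0                     # replace the last equation by sum(pi) = 1
        b = np.zeros(M)
        b[-1] = 1.0
        return np.linalg.solve(A, b)

    P = [[0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3],
         [0.2, 0.2, 0.6]]
    pi = steady_state(P)
    print(pi, pi @ np.array(P))   # the two printed vectors agree
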
01:00:22.760 --> 01:00:29.210
And pi i has to be positive for
a recurrent i, and pi i is
01:00:29.210 --> 01:00:31.680
equal to 0 otherwise.
01:00:31.680 --> 01:00:35.230
So this is just a
generalization
01:00:35.230 --> 01:00:38.090
of the ergodic theorem.
01:00:38.090 --> 01:00:43.400
And this is not what people
refer to as the ergodic
01:00:43.400 --> 01:00:48.160
theorem, which is a much more
general theorem than this.
01:00:48.160 --> 01:00:50.900
This is the ergodic theorem for
the case of finite state
01:00:50.900 --> 01:00:53.110
Markov chains.
01:00:53.110 --> 01:00:59.190
You can restate this in matrix
form as the limit of the
01:00:59.190 --> 01:01:02.900
matrix P to the n-th power.
01:01:02.900 --> 01:01:06.680
What I didn't mention here and
what I probably didn't mention
01:01:06.680 --> 01:01:11.880
enough in the notes is
that P sub i j to the n--
01:01:32.360 --> 01:01:47.560
but also, if you take the matrix
P times P times P, n
01:01:47.560 --> 01:01:53.880
times, namely, you take the
matrix, P to the n.
01:01:53.880 --> 01:02:00.720
This says that P sub i j to the n
is the i j element of that matrix.
01:02:09.900 --> 01:02:12.530
I'm sure all of you know that
by now, because you've been
01:02:12.530 --> 01:02:15.310
using it all the time.
01:02:15.310 --> 01:02:18.820
And what this says here--
01:02:18.820 --> 01:02:26.150
what we've said before is that
every row of this matrix, P to
01:02:26.150 --> 01:02:28.600
the n, is the same.
01:02:28.600 --> 01:02:31.290
Every row is equal to pi.
01:02:31.290 --> 01:02:47.786
P to the n tends to a matrix
which is pi 1, pi 2,
01:02:47.786 --> 01:02:52.120
up to pi sub M.
01:02:52.120 --> 01:02:57.000
Pi 1, pi 2, up to pi sub M.
01:03:00.760 --> 01:03:06.770
Pi 1, pi 2, up to pi sub M.
01:03:06.770 --> 01:03:14.660
And the easiest way to express
this is the vector e times pi,
01:03:14.660 --> 01:03:24.960
where e is the column vector of 1's.
01:03:24.960 --> 01:03:32.755
In other words, if you take a
column matrix, column 1, 1, 1,
01:03:32.755 --> 01:03:40.670
1, 1, and you multiply this by
a row vector, pi 1 up to pi
01:03:40.670 --> 01:03:48.030
sub M, what you get is, for this
first row multiplied by
01:03:48.030 --> 01:03:51.210
this, this gives you--
01:03:51.210 --> 01:03:53.480
well, in fact, if you
multiply this out,
01:03:53.480 --> 01:03:56.360
this is what you get.
01:03:56.360 --> 01:03:58.650
And if you've never gone through
the trouble of seeing
01:03:58.650 --> 01:04:03.880
that this multiplication leads
to this, please do it, because
01:04:03.880 --> 01:04:07.170
it's important to notice
that correspondence.
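
If you'd rather check it numerically than by hand, here's a sketch -- numpy assumed, and the matrix is just made up:

    import numpy as np

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    Pn = np.linalg.matrix_power(P, 50)   # all rows of P^n converge to the same vector
    pi = Pn[0]                           # so any row is, numerically, the steady state
    e = np.ones((3, 1))                  # the column vector of 1's
    print(np.allclose(Pn, e @ pi.reshape(1, -1)))   # True: P^n is close to e times pi
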
01:04:14.530 --> 01:04:18.080
We got specific results by
looking at the eigenvalues and
01:04:18.080 --> 01:04:20.880
eigenvectors of stochastic
matrices.
01:04:20.880 --> 01:04:24.720
And a stochastic matrix is the
transition matrix of a Markov chain.
01:04:28.500 --> 01:04:31.290
So some of these things
are sort of obvious.
01:04:31.290 --> 01:04:36.870
Lambda is an eigenvalue of P, if
and only if P minus lambda
01:04:36.870 --> 01:04:38.120
I is singular.
01:04:41.670 --> 01:04:45.040
This set of relationships
is not obvious.
01:04:45.040 --> 01:04:48.130
This is obvious linear
algebra.
01:04:48.130 --> 01:04:51.250
This is something that when
you study eigenvalues and
01:04:51.250 --> 01:04:55.430
eigenvectors in linear algebra,
you recognize that
01:04:55.430 --> 01:04:57.270
this is a summary of
a lot of things.
01:04:57.270 --> 01:05:01.440
If and only if this determinant
is equal to 0,
01:05:01.440 --> 01:05:05.650
which is true if and only if
there's some vector nu for
01:05:05.650 --> 01:05:12.560
which P times nu equals lambda
times nu for nu unequal to 0.
01:05:12.560 --> 01:05:16.920
And if and only if pi P equals
lambda pi for some
01:05:16.920 --> 01:05:18.210
pi unequal to 0.
01:05:18.210 --> 01:05:23.250
In other words, if this
determinant is equal to 0, it
01:05:23.250 --> 01:05:32.040
means that the matrix P minus
lambda I is singular.
01:05:32.040 --> 01:05:35.950
If the matrix is singular, there
has to be some solution
01:05:35.950 --> 01:05:38.370
to this equation here.
01:05:38.370 --> 01:05:40.220
There has to be some
solution to this
01:05:40.220 --> 01:05:44.530
left eigenvector equation.
01:05:44.530 --> 01:05:48.740
Now, once you see this, you
notice that e is always a
01:05:48.740 --> 01:05:53.750
right eigenvector of P. Every
stochastic matrix in the world
01:05:53.750 --> 01:05:58.920
has the property that e is a
right eigenvector of it.
01:05:58.920 --> 01:05:59.800
Why is that?
01:05:59.800 --> 01:06:05.230
Because all of the rows of a
stochastic matrix sum to 1.
01:06:05.230 --> 01:06:10.070
If you start off in state i, the
sum of the probabilities of the states
01:06:10.070 --> 01:06:14.530
you can be at in the next
step is equal to 1.
01:06:14.530 --> 01:06:17.120
You have to go somewhere.
01:06:17.120 --> 01:06:21.650
So e is always a right
eigenvector of P with
01:06:21.650 --> 01:06:23.300
eigenvalue 1.
01:06:23.300 --> 01:06:26.510
Since e is a right
eigenvector of P with
01:06:26.510 --> 01:06:29.850
eigenvalue 1, we go up here.
01:06:29.850 --> 01:06:32.460
We look at these if and
only if statements.
01:06:32.460 --> 01:06:34.890
We see, then, P minus I must
be singular.
01:06:34.890 --> 01:06:38.410
And then pi times P
equals lambda pi.
01:06:38.410 --> 01:06:41.410
So no matter how many recurrent
classes we have, no
01:06:41.410 --> 01:06:46.430
matter what periodicity we have
in each of them, there's
01:06:46.430 --> 01:06:53.170
always a solution to pi
times P equals pi.
01:06:53.170 --> 01:06:55.550
There's always at least one
steady state vector.
01:06:59.320 --> 01:07:03.580
This determinant is an M-th
degree polynomial in lambda.
01:07:03.580 --> 01:07:08.150
M-th degree polynomials
have M roots.
01:07:08.150 --> 01:07:10.400
They aren't necessarily
distinct.
01:07:10.400 --> 01:07:14.040
The multiplicity of an
eigenvalue is the number of roots
01:07:14.040 --> 01:07:15.500
of that value.
01:07:15.500 --> 01:07:19.780
And what is the multiplicity
of lambda equals 1?
01:07:19.780 --> 01:07:22.530
How many different roots
are there which have
01:07:22.530 --> 01:07:24.360
lambda equals 1?
01:07:24.360 --> 01:07:26.940
Well it turns out to be just
the number of recurrent
01:07:26.940 --> 01:07:29.550
classes that you have.
01:07:29.550 --> 01:07:32.750
If you have a bunch of recurrent
classes, within each
01:07:32.750 --> 01:07:37.330
recurrent class, there's a
solution to pi P equals pi,
01:07:37.330 --> 01:07:41.540
which is non-zero only on
that recurrent class.
01:07:41.540 --> 01:07:46.340
Namely, you take this huge
Markov chain and you say, I
01:07:46.340 --> 01:07:48.650
don't care about any
of this except this
01:07:48.650 --> 01:07:50.890
one recurrent class.
01:07:50.890 --> 01:07:53.990
If we look at this one recurrent
class, and solve for
01:07:53.990 --> 01:07:57.500
the steady state probability in
that one recurrent class,
01:07:57.500 --> 01:08:01.220
then we get an eigenvector
which is non-zero on that
01:08:01.220 --> 01:08:05.990
class, 0 everywhere else, that
has an eigenvalue 1.
01:08:05.990 --> 01:08:08.050
And for every other recurrent
class, we
01:08:08.050 --> 01:08:10.590
get the same situation.
01:08:10.590 --> 01:08:14.150
So the multiplicity of lambda
equals 1 is equal to the
01:08:14.150 --> 01:08:17.260
number of recurrent classes.
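
A quick numerical check of that count, as a sketch -- the chain below, with two recurrent classes and one transient state, is an example I made up, and numpy is assumed:

    import numpy as np

    # Recurrent classes {0, 1} and {2, 3}, plus the transient state 4.
    P = np.array([[0.7, 0.3, 0.0, 0.0, 0.0],
                  [0.4, 0.6, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5, 0.0],
                  [0.0, 0.0, 0.2, 0.8, 0.0],
                  [0.1, 0.1, 0.2, 0.2, 0.4]])
    eigvals = np.linalg.eigvals(P)
    print(sum(1 for lam in eigvals if abs(lam - 1) < 1e-9))   # 2 recurrent classes
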
01:08:17.260 --> 01:08:21.950
If you didn't get that proof
on the fly, it gets
01:08:21.950 --> 01:08:23.310
proved in the notes.
01:08:23.310 --> 01:08:27.130
And if you don't get the proof,
just remember that
01:08:27.130 --> 01:08:28.380
that's the way it is.
01:08:30.859 --> 01:08:34.859
For the special case where all
M eigenvalues are distinct,
01:08:34.859 --> 01:08:38.640
the right eigenvectors are
linearly independent.
01:08:38.640 --> 01:08:42.620
You remember that proof we went
through that all of the
01:08:42.620 --> 01:08:46.470
left eigenvectors and all the
right eigenvectors are all
01:08:46.470 --> 01:08:49.870
orthonormal to each other,
or you can make them all
01:08:49.870 --> 01:08:52.270
orthonormal to each other?
01:08:52.270 --> 01:08:57.380
That says that if the right
eigenvectors are linearly
01:08:57.380 --> 01:09:01.120
independent, you can represent
them as the columns of an
01:09:01.120 --> 01:09:04.750
invertible matrix U.
Then P times U is
01:09:04.750 --> 01:09:06.819
equal to U times lambda.
01:09:06.819 --> 01:09:09.800
What does this equation say?
01:09:09.800 --> 01:09:12.460
You split it up into a
bunch of equations.
01:09:16.500 --> 01:09:46.080
P times U and we look at it as
nu 1, nu 2, up to nu sub M.
01:09:46.080 --> 01:09:52.580
I guess I'd better put the
superscripts on it.
01:09:56.100 --> 01:10:01.270
If I take the matrix U and just
view it as M different
01:10:01.270 --> 01:10:05.190
columns, then what this
is saying is that
01:10:05.190 --> 01:10:06.545
this is equal to--
01:10:17.290 --> 01:10:35.540
nu 1, nu 2, up to nu M, times lambda
1, lambda 2, up to lambda M.
01:10:35.540 --> 01:10:38.500
Now you multiply this out,
and what do you get?
01:10:38.500 --> 01:10:41.860
You get nu 1 times lambda 1.
01:10:41.860 --> 01:10:46.190
You get nu 2 times lambda 2
for the second column, nu M
01:10:46.190 --> 01:10:49.820
times lambda M for the last
column, and here you get P
01:10:49.820 --> 01:10:54.360
times nu 1 is equal to nu 1
times lambda 1, and so forth.
01:10:54.360 --> 01:10:59.240
So all this vector equation says
is the same thing that
01:10:59.240 --> 01:11:04.760
these M individual eigenvector
equations say.
01:11:04.760 --> 01:11:11.160
It's just a more compact way
of saying the same thing.
01:11:11.160 --> 01:11:17.300
And if these eigenvectors span
this space, then this set of
01:11:17.300 --> 01:11:20.710
eigenvectors are linearly
independent of each other.
01:11:20.710 --> 01:11:24.860
And when you look at the set of
them, this matrix here has
01:11:24.860 --> 01:11:26.440
to have an inverse.
01:11:26.440 --> 01:11:34.890
So you can also express this
as P equals this vector--
01:11:34.890 --> 01:11:40.820
this matrix of right
eigenvectors times the
01:11:40.820 --> 01:11:46.630
diagonal matrix lambda, times
the inverse of this matrix.
01:11:46.630 --> 01:11:49.880
Matrix U to the minus 1 turns
out to have rows equal to the
01:11:49.880 --> 01:11:51.730
left eigenvectors.
01:11:51.730 --> 01:11:54.330
That's because these
eigenvectors--
01:11:54.330 --> 01:11:57.440
that's because the right
eigenvectors and the left
01:11:57.440 --> 01:12:01.270
eigenvectors are orthogonal
to each other.
01:12:04.670 --> 01:12:09.690
When we then split up this
matrix into a sum of M
01:12:09.690 --> 01:12:13.830
different matrices, each matrix
having only one of the eigenvalues in it--
01:12:41.270 --> 01:12:43.460
and so forth.
01:12:43.460 --> 01:12:45.710
Then what you get--
01:12:45.710 --> 01:12:48.490
here's this--
01:12:48.490 --> 01:12:54.730
this nice equation here, which
says that if all the
01:12:54.730 --> 01:12:58.870
eigenvalues are distinct, then
you can always represent a
01:12:58.870 --> 01:13:03.420
stochastic matrix as the sum of
lambda i times nu to the i
01:13:03.420 --> 01:13:04.670
times pi to the i.
01:13:04.670 --> 01:13:10.000
More importantly, if you take
this equation here and look at
01:13:10.000 --> 01:13:14.470
P to the n, P to the n is U
times lambda times U to the
01:13:14.470 --> 01:13:18.820
minus 1, times U times lambda
times U to the minus 1, blah,
01:13:18.820 --> 01:13:20.270
blah, blah forever.
01:13:20.270 --> 01:13:24.030
Each U to the minus 1 cancels
out with the following U. And
01:13:24.030 --> 01:13:29.330
you wind up with P to the n
equals U times lambda to the
01:13:29.330 --> 01:13:33.170
n, U to the minus 1.
01:13:33.170 --> 01:13:40.250
Which says that P to the
n is just a sum here.
01:13:40.250 --> 01:13:44.650
It's the sum of the eigenvalues
to the n-th power
01:13:44.650 --> 01:13:47.320
times these pairs of
eigenvectors here.
01:13:47.320 --> 01:13:51.660
So this is a general
decomposition for P to the n.
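
Here's that decomposition checked numerically, as a sketch (numpy assumed; this example matrix has distinct eigenvalues, so no Jordan form is needed):

    import numpy as np

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    lam, U = np.linalg.eig(P)          # columns of U are right eigenvectors
    n = 20
    Pn = U @ np.diag(lam ** n) @ np.linalg.inv(U)   # P^n = U Lambda^n U^-1
    print(np.allclose(Pn, np.linalg.matrix_power(P, n)))   # True
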
01:13:51.660 --> 01:13:56.010
What we're interested in is what
happens as n gets large.
01:13:56.010 --> 01:13:59.360
If we have a unichain, we
already know what happens as n
01:13:59.360 --> 01:14:00.570
gets large.
01:14:00.570 --> 01:14:07.110
We know that as n gets large,
we wind up with just 1 times
01:14:07.110 --> 01:14:12.480
this eigenvector e times
this eigenvector pi.
01:14:12.480 --> 01:14:15.760
Which says that all of the other
terms have to go
01:14:15.760 --> 01:14:19.670
to 0, which says that the
magnitudes of these other
01:14:19.670 --> 01:14:22.200
eigenvalues are less than 1.
01:14:22.200 --> 01:14:23.450
So they're all going away.
01:14:26.600 --> 01:14:32.300
So the facts here are that all
eigenvalues lambda have to
01:14:32.300 --> 01:14:35.310
satisfy the magnitude
of lambda is less
01:14:35.310 --> 01:14:36.740
than or equal to 1.
01:14:36.740 --> 01:14:39.680
That's what I just argued.
01:14:39.680 --> 01:14:44.530
For each recurrent class C,
there's one lambda equals 1,
01:14:44.530 --> 01:14:47.750
with a left eigenvector
equal to the steady state on
01:14:47.750 --> 01:14:51.190
that recurrent class
and 0 elsewhere.
01:14:51.190 --> 01:14:55.230
The right eigenvector nu
satisfies the limit as n goes
01:14:55.230 --> 01:14:56.410
to infinity.
01:14:56.410 --> 01:15:00.930
So the probability that x sub n
is in this recurrent class,
01:15:00.930 --> 01:15:04.850
given that x sub 0 is equal
to i, is equal to the i-th
01:15:04.850 --> 01:15:08.700
component of that right
eigenvector.
01:15:08.700 --> 01:15:13.200
In other words, if you have a
Markov chain which has several
01:15:13.200 --> 01:15:16.480
recurrent classes, and you
want to find out what the
01:15:16.480 --> 01:15:23.630
probability is, starting in a
transient state, of going
01:15:23.630 --> 01:15:29.170
to one of those classes, this is
what tells you that answer.
01:15:29.170 --> 01:15:33.510
This says that the probability
that you go to a particular
01:15:33.510 --> 01:15:37.530
recurrent class C, given that
you start off in a particular
01:15:37.530 --> 01:15:41.340
transient state i, is whatever
that right eigenvector
01:15:41.340 --> 01:15:42.690
turns out to be.
01:15:42.690 --> 01:15:46.170
And you can solve that right
eigenvector problem just as an
01:15:46.170 --> 01:15:48.920
M by M set of linear
equations.
01:15:48.920 --> 01:15:51.170
So you can find the
probabilities of absorption from
01:15:51.170 --> 01:15:56.370
each transient state just by
solving that set of linear
01:15:56.370 --> 01:16:01.650
equations and finding those
right eigenvectors.
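
Sketched as plain linear algebra (numpy assumed, and the partition of states below is a made-up example): nu is fixed at 1 on the target class, 0 on the other recurrent class, and solved for on the transient states.

    import numpy as np

    # Class A = {0, 1} and class B = {2} are recurrent; 3 and 4 are transient.
    P = np.array([[0.6, 0.4, 0.0, 0.0, 0.0],
                  [0.3, 0.7, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0, 0.0],
                  [0.2, 0.0, 0.3, 0.3, 0.2],
                  [0.0, 0.1, 0.2, 0.4, 0.3]])
    transient = [3, 4]
    # Solve nu_i = sum_j P_ij nu_j for the transient components:
    A = np.eye(len(transient)) - P[np.ix_(transient, transient)]
    b = P[np.ix_(transient, [0, 1])].sum(axis=1)   # one-step probability into A
    print(np.linalg.solve(A, b))   # P(absorb in class A | start at 3), same for 4
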
01:16:01.650 --> 01:16:05.770
For each recurrent periodic
class of period d, there are d
01:16:05.770 --> 01:16:09.140
eigenvalues equally spaced
on the unit circle.
01:16:09.140 --> 01:16:13.330
There are no other eigenvalues
with lambda equals 1-- with a
01:16:13.330 --> 01:16:15.080
magnitude of lambda equals 1.
01:16:15.080 --> 01:16:19.070
In other words, for each
recurrent class, you get one
01:16:19.070 --> 01:16:20.700
eigenvalue that's equal to 1.
01:16:20.700 --> 01:16:25.260
If that recurrent class is
periodic, you get a bunch of
01:16:25.260 --> 01:16:30.640
other eigenvalues put around
the unit circle.
01:16:30.640 --> 01:16:35.380
And those are all the
eigenvalues there are.
01:16:35.380 --> 01:16:36.296
Oh my God.
01:16:36.296 --> 01:16:38.000
It's--
01:16:38.000 --> 01:16:39.930
I thought I was talking
quickly.
01:16:39.930 --> 01:16:44.870
But anyway, if the eigenvectors
don't span the
01:16:44.870 --> 01:16:50.360
space, then P to the n is equal
to U times this Jordan
01:16:50.360 --> 01:16:55.350
form to the n, U to the minus 1, where
J is a Jordan form.
01:16:55.350 --> 01:16:58.320
What you saw in the homework
when you looked at the--
01:17:02.030 --> 01:17:04.075
when you looked at the
Markov chain--
01:17:28.120 --> 01:17:28.620
OK.
01:17:28.620 --> 01:17:35.020
This is one recurrent class
with this one node in it.
01:17:35.020 --> 01:17:38.030
These two nodes are
both transient.
01:17:38.030 --> 01:17:41.720
If you look at how long it takes
to get from here over to
01:17:41.720 --> 01:17:45.120
there, those transition
probabilities do not
01:17:45.120 --> 01:17:51.620
correspond to this
equation here.
01:17:51.620 --> 01:17:54.075
Instead, P sub 1 2--
01:17:57.400 --> 01:18:00.230
P sub 2 3, the way I've
drawn it here.
01:18:00.230 --> 01:18:07.140
P sub 1 3 to the n behaves like n times this
eigenvalue to the n, which
01:18:07.140 --> 01:18:09.760
is 1/2 in this case.
01:18:09.760 --> 01:18:12.820
And it doesn't correspond to
this, which is why you need a
01:18:12.820 --> 01:18:14.290
Jordan form.
01:18:14.290 --> 01:18:17.860
I said that Jordan forms
are excessively ugly.
01:18:17.860 --> 01:18:22.120
Jordan forms are really very
classy and nice ways of
01:18:22.120 --> 01:18:24.460
dealing with a problem
which is very ugly.
01:18:24.460 --> 01:18:26.340
So don't blame Jordan.
01:18:26.340 --> 01:18:29.670
Jordan simplified
things for us.
01:18:29.670 --> 01:18:36.840
So that's roughly as far as we
went with Markov chains.
01:18:40.970 --> 01:18:44.910
Renewal processes, we don't have
to review them because
01:18:44.910 --> 01:18:47.400
you're already intimately
familiar with them.
01:18:50.610 --> 01:18:55.910
I will do one thing next time
with renewal processes and
01:18:55.910 --> 01:19:00.290
Markov chains, which is to
explain to you why the
01:19:00.290 --> 01:19:04.660
expected amount of time to get
from one state back to itself
01:19:04.660 --> 01:19:07.380
is equal to 1 over pi--
01:19:07.380 --> 01:19:09.160
1 over pi sub i.
01:19:09.160 --> 01:19:10.790
You did that in the homework.
01:19:10.790 --> 01:19:12.900
And it was an awful
way to do it.
01:19:12.900 --> 01:19:14.340
And there's a nice
way to do it.
01:19:14.340 --> 01:19:15.860
I'll talk about that next time.