WEBVTT

00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative

00:00:02.960 --> 00:00:04.370
Commons license.

00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to

00:00:07.410 --> 00:00:11.060
offer high quality educational
resources for free.

00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from

00:00:13.960 --> 00:00:17.890
hundreds of MIT courses, visit
MIT OpenCourseWare at

00:00:17.890 --> 00:00:19.140
ocw.mit.edu.

00:00:24.010 --> 00:00:26.560
PROFESSOR: I'm going to spend
most of the time talking about

00:00:26.560 --> 00:00:29.400
chapters one, two, and three.

00:00:29.400 --> 00:00:32.220
A little bit talking about
chapter four, because we've

00:00:32.220 --> 00:00:36.370
been doing so much with chapter
four in the last

00:00:36.370 --> 00:00:39.980
couple of weeks that you
probably remember that more.

00:00:39.980 --> 00:00:40.580
OK.

00:00:40.580 --> 00:00:44.310
The basics, which we started out
with, and which you should

00:00:44.310 --> 00:00:48.800
never forget, is that any time
you develop a probability

00:00:48.800 --> 00:00:53.840
model, you've got to specify
what the sample space is and

00:00:53.840 --> 00:00:57.920
what the probability measure
on that sample space is.

00:00:57.920 --> 00:01:01.850
And in practice, and in almost
everything we've talked about

00:01:01.850 --> 00:01:05.800
so far, there's really a basic
countable set of random

00:01:05.800 --> 00:01:08.490
variables which determine
everything else.

00:01:08.490 --> 00:01:12.030
In other words, when you find
the joint probability

00:01:12.030 --> 00:01:16.730
distribution on that set of
random variables, that tells

00:01:16.730 --> 00:01:20.570
you everything else
of interest.

00:01:20.570 --> 00:01:25.200
And a sample point or a sample
path on that set of random

00:01:25.200 --> 00:01:29.520
variables is a collection of
sample values, one sample

00:01:29.520 --> 00:01:33.980
value for each random
variable.

00:01:33.980 --> 00:01:37.740
It's very convenient, especially
when you're in an

00:01:37.740 --> 00:01:43.630
exam and a little bit rushed,
to confuse random variables

00:01:43.630 --> 00:01:47.250
with the sample values for
the random variables.

00:01:47.250 --> 00:01:48.920
And that's fine.

00:01:48.920 --> 00:01:51.900
I just want to caution you
again, and I've done this many

00:01:51.900 --> 00:01:58.410
times, that about half the
mistakes that people make--

00:01:58.410 --> 00:02:01.980
half of the conceptual mistakes
that people make

00:02:01.980 --> 00:02:06.200
doing problems and doing quizzes
are connected with

00:02:06.200 --> 00:02:09.810
getting confused at some point
about what's a random variable

00:02:09.810 --> 00:02:12.210
and what's a sample value
of that random variable.

00:02:12.210 --> 00:02:17.210
And you start thinking about
sample values as just numbers.

00:02:17.210 --> 00:02:19.090
And I do that too.

00:02:19.090 --> 00:02:21.220
It's convenient for thinking
about things.

00:02:21.220 --> 00:02:26.790
But you have to know that that's
not the whole story.

00:02:26.790 --> 00:02:29.740
Often, we have uncountable
sets of random variables.

00:02:29.740 --> 00:02:34.720
Like in renewal processes, we
have the counting renewal

00:02:34.720 --> 00:02:38.690
process, which typically has an
uncountable set of random

00:02:38.690 --> 00:02:43.860
variables, a number of arrivals
up to each time, t,

00:02:43.860 --> 00:02:48.750
where t ranges over
a continuum of times.

00:02:48.750 --> 00:02:52.810
But in almost all of those
cases, you can define things

00:02:52.810 --> 00:02:56.195
in terms of simpler sets of
random variables, like the

00:02:56.195 --> 00:02:59.480
interarrival times,
which are IID.

00:03:02.530 --> 00:03:05.960
Most of the processes we've
talked about really have a

00:03:05.960 --> 00:03:08.600
pretty simple description if
you look for the simplest

00:03:08.600 --> 00:03:09.850
description of them.

00:03:13.730 --> 00:03:17.680
If you have a sequence of
IID random variables--

00:03:17.680 --> 00:03:25.270
which is what we have for
Poisson and renewal processes,

00:03:25.270 --> 00:03:28.680
and what we have for Markov
chains is not that much more

00:03:28.680 --> 00:03:30.310
complicated--

00:03:30.310 --> 00:03:35.500
the laws of large numbers are
useful to specify what the

00:03:35.500 --> 00:03:38.500
long term behavior is.

00:03:38.500 --> 00:03:47.280
The sample time average, as
we all know by now, is the sum

00:03:47.280 --> 00:03:49.960
of the random variables
divided by n.

00:03:49.960 --> 00:03:53.090
So it's a sample average
of these quantities.

00:03:53.090 --> 00:03:57.570
The random variable, which has
a mean X bar, the expected

00:03:57.570 --> 00:04:00.140
value of x, that's
almost obvious.

00:04:00.140 --> 00:04:03.350
You just take the expected value
of s sub n, and it's n

00:04:03.350 --> 00:04:08.360
times the expected value of x
divided by n, and you're done.

00:04:08.360 --> 00:04:11.680
And the variance, since these
random variables are

00:04:11.680 --> 00:04:15.540
independent, you find that
almost as easily.
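
NOTE
The mean and variance of S_n/n can be checked exactly on a small discrete example. This is a minimal sketch, assuming a hypothetical three-valued PMF for X (not from the lecture). It convolves the PMF n times to get the PMF of S_n, then confirms with exact fractions that E[S_n/n] = X bar and Var[S_n/n] = sigma^2/n. The names PMF, pmf_of_sum and mean_var_of_average are illustrative only.
```python
from fractions import Fraction
# Hypothetical PMF for X on {0, 1, 2}: mean 3/4, variance 11/16
PMF = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
def pmf_of_sum(pmf, n):
    # PMF of S_n = X_1 + ... + X_n for IID X_i, by repeated convolution
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for x, px in pmf.items():
                new[s + x] = new.get(s + x, 0) + ps * px
        dist = new
    return dist
def mean_var_of_average(pmf, n):
    # Exact mean and variance of the sample average S_n / n
    dist = pmf_of_sum(pmf, n)
    mean = sum(Fraction(s, n) * p for s, p in dist.items())
    var = sum((Fraction(s, n) - mean) ** 2 * p for s, p in dist.items())
    return mean, var
print(mean_var_of_average(PMF, 10))  # (Fraction(3, 4), Fraction(11, 160))
```
The variance 11/160 is exactly (11/16)/10, which is sigma^2 over n.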

00:04:15.540 --> 00:04:18.810
That has this very
simple-minded

00:04:18.810 --> 00:04:20.850
distribution function.

00:04:20.850 --> 00:04:24.340
Remember, we usually work
with distribution

00:04:24.340 --> 00:04:26.960
functions in this class.

00:04:26.960 --> 00:04:32.580
And often, the exercises are
much easier when you do them

00:04:32.580 --> 00:04:36.500
in terms of the distribution
function than if you use

00:04:36.500 --> 00:04:40.760
formulas you remember from
elementary courses, which are

00:04:40.760 --> 00:04:44.260
specialized to

00:04:44.260 --> 00:04:47.140
probability density and

00:04:47.140 --> 00:04:51.170
probability mass functions, and
often have more special

00:04:51.170 --> 00:04:53.110
conditions on them than that.

00:04:53.110 --> 00:04:57.470
But anyway, the distribution
function starts

00:04:57.470 --> 00:04:58.570
to look like this.

00:04:58.570 --> 00:05:03.250
As n gets bigger, you notice
that what's happening is that

00:05:03.250 --> 00:05:08.860
you get a distribution which
is scrunching in this way,

00:05:08.860 --> 00:05:10.820
which is starting to
look smoother.

00:05:10.820 --> 00:05:13.450
The jumps in it get smaller.

00:05:13.450 --> 00:05:18.630
And you start out with this
thing which is kind of crazy.

00:05:18.630 --> 00:05:21.370
And by the time n is even 50,

00:05:21.370 --> 00:05:25.770
you get something which
almost looks like a--

00:05:25.770 --> 00:05:26.840
I don't know how we tell
the difference

00:05:26.840 --> 00:05:28.460
between those two things.

00:05:28.460 --> 00:05:30.060
I thought we could,
but we can't.

00:05:30.060 --> 00:05:31.670
I certainly can't up there.

00:05:31.670 --> 00:05:37.650
But anyway, the one that's
tightest in is the one

00:05:37.650 --> 00:05:39.880
for n equals 50.

00:05:39.880 --> 00:05:44.150
And what these laws of large
numbers all say in some sense

00:05:44.150 --> 00:05:51.380
is that this distribution
function gets crunched in

00:05:51.380 --> 00:05:54.550
towards an impulse
at the mean.

00:05:54.550 --> 00:05:58.260
And then they say other more
specialized things about how

00:05:58.260 --> 00:06:02.580
this happens, about sample
paths and all of that.

00:06:02.580 --> 00:06:06.270
But the idea is that this
distribution function is

00:06:06.270 --> 00:06:10.760
heading towards a
unit impulse.

00:06:10.760 --> 00:06:14.440
The weak law of large numbers
then says that if the expected

00:06:14.440 --> 00:06:18.840
value of the magnitude of x
is less than infinity--

00:06:18.840 --> 00:06:21.660
and usually when we talk about
random variables having a

00:06:21.660 --> 00:06:25.630
mean, that's exactly
what we mean.

00:06:25.630 --> 00:06:31.220
If that condition is not
satisfied, then we usually say

00:06:31.220 --> 00:06:33.690
that the random variable
doesn't have a mean.

00:06:33.690 --> 00:06:37.300
And you'll see that every time
you look at anything in

00:06:37.300 --> 00:06:38.520
probability theory.

00:06:38.520 --> 00:06:41.940
When people say the mean exists,
that's what they

00:06:41.940 --> 00:06:43.830
always mean.

00:06:43.830 --> 00:06:47.950
And what the theorem says then
is exactly what we were

00:06:47.950 --> 00:06:49.060
talking about before.

00:06:49.060 --> 00:06:54.940
The probability that the magnitude of the
difference between S sub n over n

00:06:54.940 --> 00:06:58.570
and the mean X bar is greater

00:06:58.570 --> 00:07:03.090
than or equal to epsilon
goes to 0 in the limit.

00:07:03.090 --> 00:07:06.020
So it's saying that if you put
epsilon limits around X bar on that

00:07:06.020 --> 00:07:10.860
distribution function and let
n get bigger and bigger, it

00:07:10.860 --> 00:07:14.570
goes to 0 at X bar minus epsilon
and to 1 at X bar plus epsilon.

00:07:14.570 --> 00:07:18.120
It says the probability of s n
over n, less than or equal to

00:07:18.120 --> 00:07:23.240
x, approaches a unit step at X bar as
n approaches infinity.

00:07:23.240 --> 00:07:27.660
This is the condition
for convergence in

00:07:27.660 --> 00:07:30.440
probability.
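
NOTE
The weak-law statement can be illustrated exactly for coin-flip summands. This is a minimal sketch, assuming IID Bernoulli(p) variables so that S_n is binomial. It computes the exact probability that |S_n/n - X bar| is at least epsilon and compares it with the Chebyshev bound sigma^2/(n epsilon^2), the usual route to the weak law when the variance is finite. The function names are hypothetical. Parameters are passed as strings such as "1/10" so the comparison with epsilon is exact rather than floating point.
```python
import math
from fractions import Fraction
def wlln_tail(p, n, eps):
    # Exact Pr{|S_n/n - p| >= eps} when S_n is a sum of n IID Bernoulli(p) rvs
    p, eps = Fraction(p), Fraction(eps)
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) >= eps:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return float(total)
def chebyshev_bound(p, n, eps):
    # Chebyshev: Pr{|S_n/n - X bar| >= eps} <= sigma^2 / (n eps^2), sigma^2 = p(1 - p)
    p, eps = Fraction(p), Fraction(eps)
    return float(p * (1 - p) / (n * eps**2))
for n in (10, 100, 1000):
    print(n, wlln_tail("1/2", n, "1/10"), chebyshev_bound("1/2", n, "1/10"))
```
The exact tail falls much faster than the Chebyshev bound. Both go to 0, which is all the weak law claims.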

00:07:30.440 --> 00:07:33.880
What we're saying is that that
also means convergence in

00:07:33.880 --> 00:07:38.740
distribution, that is, convergence of the
distribution function, in this case.

00:07:38.740 --> 00:07:42.520
And then we also, when we got
to renewal processes, we

00:07:42.520 --> 00:07:45.330
talked about the strong
law of large numbers.

00:07:45.330 --> 00:07:49.760
And that says that if the expected
value of the magnitude of x is finite,

00:07:49.760 --> 00:07:56.630
then S sub n over n approaches
X bar on a sample path basis.

00:07:56.630 --> 00:07:59.770
In other words, for every sample
path, except a set

00:07:59.770 --> 00:08:05.020
of probability 0, this
condition holds true.

00:08:05.020 --> 00:08:08.260
That doesn't seem like it's
very different or very

00:08:08.260 --> 00:08:10.610
important for the time being.

00:08:10.610 --> 00:08:14.060
But when we started studying
renewal processes, which is

00:08:14.060 --> 00:08:19.120
where we actually talked about
this, we saw that in fact, it

00:08:19.120 --> 00:08:24.830
let us talk about this, which
says that if you take any

00:08:24.830 --> 00:08:28.700
function of s n over n--

00:08:28.700 --> 00:08:31.590
in other words, a function
of a real value--

00:08:31.590 --> 00:08:33.830
a function of a--

00:08:33.830 --> 00:08:35.720
a real valued function of a--

00:08:40.010 --> 00:08:43.570
a real-valued function of a real
value that's continuous at X bar--

00:08:43.570 --> 00:08:46.470
What you get is that
same function

00:08:46.470 --> 00:08:49.100
applied to the mean here.

00:08:49.100 --> 00:08:50.260
And that's the thing
which is so

00:08:50.260 --> 00:08:52.630
useful for renewal processes.

00:08:52.630 --> 00:08:55.740
And it's what usually makes
the strong law of large

00:08:55.740 --> 00:08:58.730
numbers so much easier to
use than the weak law.

00:09:04.220 --> 00:09:06.170
That's a plug for
the strong law.

00:09:06.170 --> 00:09:08.745
There are many extensions of the
weak law telling how fast

00:09:08.745 --> 00:09:10.910
the convergence is.

00:09:10.910 --> 00:09:14.350
One thing you should always
remember about the central

00:09:14.350 --> 00:09:17.510
limit theorem is that it really
tells you something about the

00:09:17.510 --> 00:09:18.790
weak law of large numbers.

00:09:18.790 --> 00:09:22.260
It tells you how fast that
convergence is and what the

00:09:22.260 --> 00:09:24.720
convergence looks like.

00:09:24.720 --> 00:09:28.170
It says that if the variance
of this underlying random

00:09:28.170 --> 00:09:34.000
variable is finite, then this
limit here is equal to the

00:09:34.000 --> 00:09:37.290
normal distribution function,
the Gaussian with

00:09:37.290 --> 00:09:41.350
variance 1 and mean 0.

00:09:41.350 --> 00:09:45.070
And that becomes a little easier
to see what it's saying

00:09:45.070 --> 00:09:46.870
if you look at it this way.

00:09:46.870 --> 00:09:51.510
It says the probability that S sub
n over n minus X bar--

00:09:51.510 --> 00:09:56.890
namely the difference between
the sample average and the mean

00:09:56.890 --> 00:09:58.380
it's converging to--

00:09:58.380 --> 00:10:01.340
the probability that that's less
than or equal to y sigma

00:10:01.340 --> 00:10:04.010
over square root of
n approaches the standard normal

00:10:04.010 --> 00:10:05.480
distribution function at y.

00:10:05.480 --> 00:10:11.740
It says that as n gets bigger
and bigger, this quantity here

00:10:11.740 --> 00:10:13.030
gets tighter and tighter.

00:10:13.030 --> 00:10:18.620
What it says in terms of the
picture here, in terms of this

00:10:18.620 --> 00:10:22.900
picture, it says that as n gets
bigger and bigger, this

00:10:22.900 --> 00:10:28.560
picture here scrunches down as
1 over the square root of n.

00:10:28.560 --> 00:10:30.970
And it also becomes Gaussian.
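
NOTE
The convergence to the Gaussian can be checked exactly for fair-coin summands. This is an assumption made for illustration, giving X bar = 1/2 and sigma = 1/2. The sketch computes Pr{(S_n - n X bar)/(sigma sqrt(n)) <= y} from exact integer binomial counts and compares it with Phi(y), the standard normal distribution function, written using math.erf. The names phi and fair_coin_standardized_cdf are hypothetical.
```python
import math
def phi(y):
    # Standard normal distribution function, written with the error function
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
def fair_coin_standardized_cdf(n, y):
    # Exact Pr{(S_n - n X bar) / (sigma sqrt(n)) <= y} for n fair-coin summands
    # (X bar = 1/2, sigma = 1/2), from exact integer binomial coefficients
    k_max = min(n, math.floor(n / 2 + y * 0.5 * math.sqrt(n)))
    total, c = 0, 1  # c holds C(n, k), updated incrementally
    for k in range(k_max + 1):
        total += c
        c = c * (n - k) // (k + 1)
    return total / 2**n
for n in (100, 400, 10_000):
    print(n, fair_coin_standardized_cdf(n, 1.0), phi(1.0))
```
The gap to Phi(1) shrinks as n grows. Part of the remaining gap comes from X being discrete, so S_n lives on a lattice.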

00:10:30.970 --> 00:10:33.760
So it tells you exactly what
kind of convergence you

00:10:33.760 --> 00:10:34.770
actually have here.

00:10:34.770 --> 00:10:39.200
It's not only saying that this
does converge to a unit step.

00:10:39.200 --> 00:10:42.010
It says how it converges.

00:10:42.010 --> 00:10:48.240
And that's a nice thing,
conceptually.

00:10:48.240 --> 00:10:51.780
You don't always need
it in problems.

00:10:51.780 --> 00:10:54.600
But you need it for
understanding what's going on.

00:10:59.890 --> 00:11:01.690
We're moving backwards,
it seems.

00:11:06.180 --> 00:11:09.420
Now, 1, 2, Poisson processes.

00:11:09.420 --> 00:11:12.630
We talked about arrival
processes.

00:11:12.630 --> 00:11:15.260
You'd almost think that all
processes are arrival

00:11:15.260 --> 00:11:17.080
processes at this point.

00:11:17.080 --> 00:11:19.770
But any time you start to think
about that, think of a

00:11:19.770 --> 00:11:21.270
Markov chain.

00:11:21.270 --> 00:11:26.150
And a Markov chain is not an
arrival process, ordinarily.

00:11:26.150 --> 00:11:28.470
Some of them can be
viewed that way.

00:11:28.470 --> 00:11:29.690
But most of them can't.

00:11:29.690 --> 00:11:31.990
An arrival process
is an increasing

00:11:31.990 --> 00:11:34.650
sequence of random variables.

00:11:34.650 --> 00:11:40.020
0 less than s1, which is the
time of the first arrival, s2,

00:11:40.020 --> 00:11:42.810
which is the time of the second
arrival, and so forth.

00:11:42.810 --> 00:11:48.220
Interarrival times are x1 equals
s1, and x i equals s i

00:11:48.220 --> 00:11:51.150
minus s i minus 1.
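
NOTE
The relations among arrival epochs, interarrival times and the counting process can be written out directly. This is a minimal sketch, assuming a sorted list of hypothetical sample epochs:
* interarrivals() recovers X_1 = S_1 and X_i = S_i - S_(i-1).
* epochs_from_interarrivals() rebuilds S_n as partial sums.
* count() evaluates N(t) as the number of epochs at or before t.
The helper names are illustrative only.
```python
import bisect
from itertools import accumulate
def interarrivals(epochs):
    # X_1 = S_1 and X_i = S_i - S_(i-1), with S_0 taken as 0
    return [s - prev for s, prev in zip(epochs, [0.0] + epochs[:-1])]
def epochs_from_interarrivals(xs):
    # S_n = X_1 + ... + X_n (partial sums)
    return list(accumulate(xs))
def count(epochs, t):
    # N(t) = number of arrival epochs S_i <= t (epochs must be sorted)
    return bisect.bisect_right(epochs, t)
epochs = [1.0, 2.5, 4.0]  # hypothetical sample path
print(interarrivals(epochs))  # [1.0, 1.5, 1.5]
print(epochs_from_interarrivals(interarrivals(epochs)))  # [1.0, 2.5, 4.0]
print([count(epochs, t) for t in (0.5, 1.0, 3.0, 10.0)])  # [0, 1, 2, 3]
```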

00:11:51.150 --> 00:11:55.480
The picture, which you should
have indelibly printed on the

00:11:55.480 --> 00:11:58.850
back of your brain someplace
by this time, is

00:11:58.850 --> 00:12:00.430
this picture here.

00:12:00.430 --> 00:12:04.930
s1, s2, s3, are the times
at which arrivals occur.

00:12:04.930 --> 00:12:07.590
These are random variables, so
these arrivals come in at

00:12:07.590 --> 00:12:09.320
random times.

00:12:09.320 --> 00:12:14.690
x1, x2, x3 are the intervals
between arrivals.

00:12:14.690 --> 00:12:18.280
And N of t is the number of
arrivals that have occurred up

00:12:18.280 --> 00:12:19.860
until time t.

00:12:19.860 --> 00:12:26.800
So every time t passes one
of these arrival times, N of t

00:12:26.800 --> 00:12:31.140
pops up by one, pops up by one
again, pops up by one again.

00:12:31.140 --> 00:12:34.200
The sample value
pops up by one.

00:12:34.200 --> 00:12:36.920
Arrival processes can model
arrivals to a queue,

00:12:36.920 --> 00:12:40.320
departures from a queue,
locations of breaks in an oil

00:12:40.320 --> 00:12:43.960
line, an enormous number
of things.

00:12:43.960 --> 00:12:46.260
It's not just arrivals
we're talking about.

00:12:46.260 --> 00:12:48.070
It's all of these other
things, also.

00:12:48.070 --> 00:12:54.330
But it's something laid out on
a one-dimensional axis where

00:12:54.330 --> 00:12:58.390
things happen at various
places on that

00:12:58.390 --> 00:12:59.700
one-dimensional axis.

00:12:59.700 --> 00:13:05.100
So that's the way to view it.

00:13:05.100 --> 00:13:07.540
OK, same picture again.

00:13:07.540 --> 00:13:11.510
The process can be specified by the
joint distribution of the

00:13:11.510 --> 00:13:15.570
arrival epochs, or of the
interarrival times, or, in

00:13:15.570 --> 00:13:18.090
fact, of the counting process.

00:13:18.090 --> 00:13:25.200
If you see a sample path of
the counting process, then

00:13:25.200 --> 00:13:29.180
from that you can determine the
sample path of the arrival

00:13:29.180 --> 00:13:33.220
times and the sample path of
the interarrival times.

00:13:33.220 --> 00:13:38.320
And since any set of these
random variables specifies all

00:13:38.320 --> 00:13:43.220
three of these things, the
three are all equivalent.

00:13:43.220 --> 00:13:47.150
OK, we have this important
condition here.

00:13:47.150 --> 00:13:55.960
And I always sort of forget
this, but when these arrivals

00:13:55.960 --> 00:13:59.700
are highly delayed, when there's
a long period of time

00:13:59.700 --> 00:14:05.380
between each arrival, what that
says is that the counting

00:14:05.380 --> 00:14:08.480
process is getting small.

00:14:08.480 --> 00:14:12.570
So big interarrival times
correspond to a small

00:14:12.570 --> 00:14:14.180
value of N of t.

00:14:14.180 --> 00:14:16.420
And you can see that in
the picture here.

00:14:16.420 --> 00:14:20.020
If you spread out these
arrivals, you make s1 all the

00:14:20.020 --> 00:14:21.290
way out here.

00:14:21.290 --> 00:14:26.190
Then N of t doesn't become
1 until way out here.

00:14:26.190 --> 00:14:32.930
So N of t as a function of t is
getting smaller as s sub n

00:14:32.930 --> 00:14:36.030
is getting larger.

00:14:36.030 --> 00:14:41.560
S sub n is the minimum of the
set of t, such that N of t is

00:14:41.560 --> 00:14:45.830
greater than or equal to n.
That sounds like an unpleasantly

00:14:45.830 --> 00:14:49.460
complicated expression.

00:14:49.460 --> 00:14:52.210
If any of you can find a simpler
way to say it than

00:14:52.210 --> 00:14:55.950
that, I would be absolutely
delighted to hear it.

00:14:55.950 --> 00:14:57.530
But I don't think there is.

00:14:57.530 --> 00:15:01.150
I think the simpler way to say
it is this picture here.

00:15:01.150 --> 00:15:03.230
And the picture says it.

00:15:03.230 --> 00:15:08.770
And you can sort of figure out
all those logical statements

00:15:08.770 --> 00:15:11.670
from the picture, which
is intuitively a

00:15:11.670 --> 00:15:12.942
lot clearer, I think.
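
NOTE
The statement S_n = min{t : N(t) >= n} rests on the fact that the events {S_n <= t} and {N(t) >= n} are the same event. This is easy to check on a simulated sample path. The sketch below is minimal and assumes a hypothetical path of 20 uniformly placed epochs; any increasing sequence would do. It checks the event identity on a grid of times. It also recovers S_n as the smallest time at which the counting process reaches n. The function names are illustrative.
```python
import bisect
import random
def count(epochs, t):
    # N(t): number of arrival epochs S_i with S_i <= t (epochs sorted)
    return bisect.bisect_right(epochs, t)
def s_n_from_counting(epochs, n):
    # S_n = min{t : N(t) >= n}; N(t) only jumps at epochs, so search over the epochs
    return min(t for t in epochs if count(epochs, t) >= n)
def events_agree(epochs, n, times):
    # True if {S_n <= t} and {N(t) >= n} agree at every t in times (n is 1-based)
    s_n = epochs[n - 1]
    return all((s_n <= t) == (count(epochs, t) >= n) for t in times)
rng = random.Random(1)
epochs = sorted(rng.uniform(0.0, 10.0) for _ in range(20))  # hypothetical sample path
grid = [i / 10 for i in range(121)] + epochs
print(all(events_agree(epochs, n, grid) for n in range(1, 21)))  # True
print(all(s_n_from_counting(epochs, n) == epochs[n - 1] for n in range(1, 21)))  # True
```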

00:15:17.270 --> 00:15:23.380
So now, a renewal process is
an arrival process with IID

00:15:23.380 --> 00:15:25.100
interarrival times.

00:15:25.100 --> 00:15:28.800
And a Poisson process is a
renewal process where the

00:15:28.800 --> 00:15:32.130
interarrival random variables
are exponential.

00:15:32.130 --> 00:15:35.290
So a Poisson process
is a special

00:15:35.290 --> 00:15:37.200
case of a renewal process.

00:15:37.200 --> 00:15:40.920
Why are these exponential
interarrival

00:15:40.920 --> 00:15:43.350
times so important?

00:15:43.350 --> 00:15:46.550
Well, it's because they're
memoryless.

00:15:46.550 --> 00:15:50.360
And the memoryless property says
that the probability that

00:15:50.360 --> 00:15:54.535
capital X is greater than t plus
little x is equal to the probability that

00:15:54.535 --> 00:15:58.190
it's greater than little x times the
probability that it's greater

00:15:58.190 --> 00:16:01.830
than t for all x and t greater
than or equal to 0.

00:16:01.830 --> 00:16:04.860
This makes better sense if
you say it conditionally.

00:16:04.860 --> 00:16:09.040
The probability that capital X is
greater than t plus little x, given

00:16:09.040 --> 00:16:12.700
that it's greater than t, is
the same as the probability

00:16:12.700 --> 00:16:14.800
that, unconditionally,

00:16:14.800 --> 00:16:17.460
capital X is greater
than little x.

00:16:17.460 --> 00:16:20.420
This really gives you
the memoryless

00:16:20.420 --> 00:16:21.780
property in a nutshell.
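
NOTE
Both forms of memorylessness can be checked for an exponential rv. This is a minimal sketch, assuming a hypothetical rate LAM = 2. First, it checks the product form Pr{X > t + x} = Pr{X > t} Pr{X > x} from the closed-form complementary distribution function. Second, it simulates the conditional form: among samples that exceed t = 1, the mean remaining wait X - t should be close to the unconditional mean 1/LAM. The helper names are illustrative.
```python
import math
import random
LAM = 2.0  # hypothetical rate lambda, chosen only for illustration
def ccdf(x, lam=LAM):
    # Pr{X > x} = exp(-lam x) for an exponential rv and x >= 0
    return math.exp(-lam * x)
def residual_mean_given_exceeds(t, samples):
    # Empirical E[X - t | X > t]: average remaining wait among samples that exceed t
    tail = [x - t for x in samples if x > t]
    return sum(tail) / len(tail)
# Product form of memorylessness: Pr{X > t + x} = Pr{X > t} Pr{X > x}
print(math.isclose(ccdf(0.7 + 0.4), ccdf(0.7) * ccdf(0.4)))  # True
rng = random.Random(42)
samples = [rng.expovariate(LAM) for _ in range(200_000)]
print(residual_mean_given_exceeds(1.0, samples))  # close to 1/LAM = 0.5
```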

00:16:21.780 --> 00:16:25.860
It says if you're looking at
this process as it evolves,

00:16:25.860 --> 00:16:29.010
and you see an arrival, and then
you start looking for the

00:16:29.010 --> 00:16:32.160
next arrival, it says that no
matter how long you've been

00:16:32.160 --> 00:16:36.240
looking, the distribution
function of the time to wait

00:16:36.240 --> 00:16:38.930
until the next arrival,
is the same

00:16:38.930 --> 00:16:40.580
exponential random variable.

00:16:40.580 --> 00:16:44.220
So you never gain anything
by waiting.

00:16:44.220 --> 00:16:46.390
You might as well
be impatient.

00:16:46.390 --> 00:16:48.790
But it doesn't do any good
to be impatient.

00:16:48.790 --> 00:16:51.130
It doesn't do any good to wait.

00:16:51.130 --> 00:16:52.850
It doesn't do any good
to not wait.

00:16:52.850 --> 00:16:56.280
No matter what you do, this
damn thing always takes an

00:16:56.280 --> 00:16:59.780
exponential amount
of time to occur.

00:16:59.780 --> 00:17:01.410
OK, that's what it means
to be memoryless.

00:17:01.410 --> 00:17:03.910
And the exponential is the only

00:17:03.910 --> 00:17:05.835
memoryless continuous random variable.

00:17:10.775 --> 00:17:14.910
How about a geometric
random variable?

00:17:14.910 --> 00:17:19.190
The geometric random variable
is memoryless if you're only

00:17:19.190 --> 00:17:22.150
looking at integer times.

00:17:22.150 --> 00:17:32.180
Here we're talking about
times on a continuum.

00:17:32.180 --> 00:17:35.090
That's what this says.

00:17:35.090 --> 00:17:38.410
Well, that's what this says.

00:17:38.410 --> 00:17:46.590
And if you look at discrete
times, then a geometric random

00:17:46.590 --> 00:17:49.860
variable is memoryless also.
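
NOTE
The discrete counterpart can be checked with exact arithmetic. This is a minimal sketch, assuming X is geometric on {1, 2, ...} with a hypothetical success probability p = 1/3, so that Pr{X > x} = (1 - p)^floor(x). The product rule Pr{X > m + k} = Pr{X > m} Pr{X > k} holds for all integers m and k. It fails for the non-integer split 0.5 + 0.5, which is why the geometric is memoryless only at integer times.
```python
import math
from fractions import Fraction
def geom_ccdf(x, p):
    # Pr{X > x} for X geometric on {1, 2, ...} with success probability p: (1 - p)^floor(x)
    return (1 - p) ** math.floor(x)
p = Fraction(1, 3)  # hypothetical parameter
integer_ok = all(geom_ccdf(m + k, p) == geom_ccdf(m, p) * geom_ccdf(k, p) for m in range(6) for k in range(6))
half_ok = geom_ccdf(0.5 + 0.5, p) == geom_ccdf(0.5, p) * geom_ccdf(0.5, p)
print(integer_ok, half_ok)  # True False
```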

00:17:55.020 --> 00:17:58.210
We're given a Poisson process
of rate lambda.

00:17:58.210 --> 00:18:01.290
The interval from any given t
greater than 0 until the first

00:18:01.290 --> 00:18:04.190
arrival after t is a
random variable.

00:18:04.190 --> 00:18:06.010
Let's call it z1.

00:18:06.010 --> 00:18:08.650
We already said that that
random variable was

00:18:08.650 --> 00:18:11.430
exponential.

00:18:11.430 --> 00:18:17.040
And it's independent of all
arrivals which occur before

00:18:17.040 --> 00:18:18.630
that starting time t.

00:18:18.630 --> 00:18:23.220
So, looking from any starting
time t, it doesn't make any

00:18:23.220 --> 00:18:25.530
difference what has happened
back here.

00:18:25.530 --> 00:18:27.450
That's not only the
last arrival, but

00:18:27.450 --> 00:18:29.630
all the other arrivals.

00:18:29.630 --> 00:18:32.880
The time until the next arrival
is exponential.

00:18:32.880 --> 00:18:36.520
The time until each arrival
after that is exponential

00:18:36.520 --> 00:18:41.690
also, which says that if you
look at this process starting

00:18:41.690 --> 00:18:47.250
at time t, it's a Poisson
process again, where all the

00:18:47.250 --> 00:18:50.450
times have to be shifted, of
course, but it's a Poisson

00:18:50.450 --> 00:18:52.830
process starting at time t.

00:18:52.830 --> 00:19:00.570
The corresponding counting
process, we can call it N

00:19:00.570 --> 00:19:04.950
tilde of t and tau, where tau is
greater than or equal to t,

00:19:04.950 --> 00:19:09.690
where this is the number of
arrivals in the original

00:19:09.690 --> 00:19:14.610
process up until time tau minus
the number of arrivals

00:19:14.610 --> 00:19:16.340
up until time t.

00:19:16.340 --> 00:19:19.330
If you look at that difference,
so many arrivals

00:19:19.330 --> 00:19:26.550
up until t, so many more
up until time tau.

00:19:26.550 --> 00:19:29.030
You look at the difference
between tau and t.

00:19:29.030 --> 00:19:37.080
The number of arrivals in that
interval is the same Poisson

00:19:37.080 --> 00:19:39.800
distributed random
variable again.

00:19:39.800 --> 00:19:43.080
So, it has the same
distribution as N

00:19:43.080 --> 00:19:45.020
of tau minus t.

00:19:45.020 --> 00:19:47.650
And that's called the stationary
increment property.
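
NOTE
The stationary increment property can be checked by simulation. This is a minimal sketch, assuming hypothetical values lambda = 1.5, t = 7 and tau = 9. It builds each Poisson sample path from IID exponential interarrival times and counts the arrivals in (t, tau]. It then compares the empirical PMF of that count with the Poisson PMF of parameter lambda (tau - t), which is the PMF of N(tau - t). The helper names are illustrative.
```python
import math
import random
def poisson_epochs(lam, horizon, rng):
    # Arrival epochs in (0, horizon]: partial sums of IID exponential(lam) interarrival times
    epochs, s = [], rng.expovariate(lam)
    while s <= horizon:
        epochs.append(s)
        s += rng.expovariate(lam)
    return epochs
def increment(epochs, t, tau):
    # N~(t, tau) = N(tau) - N(t): number of arrivals in (t, tau]
    return sum(1 for s in epochs if t < s <= tau)
def poisson_pmf(n, m):
    # Pr{N = n} for a Poisson rv with mean m
    return math.exp(-m) * m**n / math.factorial(n)
lam, t, tau, trials = 1.5, 7.0, 9.0, 20_000  # hypothetical parameters
rng = random.Random(7)
counts = [increment(poisson_epochs(lam, tau, rng), t, tau) for _ in range(trials)]
for n in range(5):
    print(n, counts.count(n) / trials, poisson_pmf(n, lam * (tau - t)))
```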

00:19:47.650 --> 00:19:50.720
It says that no matter where you
start a Poisson process,

00:19:50.720 --> 00:19:53.030
it always looks exactly
the same.

00:19:53.030 --> 00:19:58.370
It says that if you wait for one
hour and start then, it's

00:19:58.370 --> 00:20:01.750
exactly the same as what
it was before.

00:20:01.750 --> 00:20:05.960
If we had Poisson processes in
the world, it wouldn't do any

00:20:05.960 --> 00:20:09.720
good to travel on certain days
rather than other days.

00:20:09.720 --> 00:20:13.170
It wouldn't do any good to leave
to drive home at one

00:20:13.170 --> 00:20:14.850
hour rather than another hour.

00:20:14.850 --> 00:20:17.670
You'd have the same travel
time all the time.

00:20:17.670 --> 00:20:18.980
It's all equal.

00:20:18.980 --> 00:20:21.140
It would be an awful world
if it were stationary.

00:20:23.770 --> 00:20:26.750
The independent increment
property for a counting

00:20:26.750 --> 00:20:33.170
process is that for all
sequences of ordered times--

00:20:33.170 --> 00:20:37.490
0 less than t1 less than
t2 up to t k--

00:20:37.490 --> 00:20:40.310
the random variables N of t1--

00:20:40.310 --> 00:20:44.440
and now we're talking about the
number of arrivals between

00:20:44.440 --> 00:20:47.510
t1 and t2, the number
of arrivals between

00:20:47.510 --> 00:20:49.600
t sub k minus 1 and t sub k.

00:20:49.600 --> 00:20:52.330
These are all independent
of each other.

00:20:52.330 --> 00:20:55.390
That's what this independent
increment property says.

00:20:55.390 --> 00:20:58.110
And we see from what we've said
about this memoryless

00:20:58.110 --> 00:21:02.680
property that the Poisson
process does indeed have this

00:21:02.680 --> 00:21:04.750
independent increment
property.

00:21:04.750 --> 00:21:08.720
Poisson processes have both the
stationary and independent

00:21:08.720 --> 00:21:11.240
increment properties.

00:21:11.240 --> 00:21:15.760
And this looks like an immediate
consequence of that.

00:21:15.760 --> 00:21:16.370
It's not.

00:21:16.370 --> 00:21:19.630
Remember, we had to struggle
with this for a bit.

00:21:19.630 --> 00:21:22.500
But it says Poisson
processes can be defined by

00:21:22.500 --> 00:21:26.450
the stationary and independent
increment properties, plus

00:21:26.450 --> 00:21:32.730
either the Poisson PMF for N
of t, or this incremental

00:21:32.730 --> 00:21:38.660
property, the probability that N
tilde of t and t plus delta,

00:21:38.660 --> 00:21:43.320
namely, the number of arrivals
between t and t plus delta,

00:21:43.320 --> 00:21:46.170
the probability that that's
1 is equal to

00:21:46.170 --> 00:21:47.600
lambda times delta plus o of delta.
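
NOTE
The incremental property can be checked numerically from the Poisson PMF. This is a minimal sketch, assuming a hypothetical rate lambda = 2. The exact probability of one arrival in an interval of length delta is lambda delta e^(-lambda delta), so its ratio to lambda delta tends to 1. The probability of two or more arrivals, divided by delta, tends to 0. Both are the o(delta) statements. The function names are illustrative.
```python
import math
def prob_one(lam, delta):
    # Pr{exactly one arrival in (t, t + delta]} = lam*delta*exp(-lam*delta)
    m = lam * delta
    return m * math.exp(-m)
def prob_two_or_more(lam, delta):
    # Pr{two or more arrivals} = 1 - e^(-m) - m e^(-m); expm1 avoids cancellation for small m
    m = lam * delta
    return -math.expm1(-m) - m * math.exp(-m)
lam = 2.0  # hypothetical rate
for delta in (0.1, 0.01, 0.001):
    print(delta, prob_one(lam, delta) / (lam * delta), prob_two_or_more(lam, delta) / delta)
```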

00:21:47.600 --> 00:21:53.040
In other words, this view of a
Poisson process is the view

00:21:53.040 --> 00:21:56.850
that you get when you sort
of forget about time.

00:21:56.850 --> 00:22:00.220
And you think of arrivals from
outer space coming down and

00:22:00.220 --> 00:22:01.470
hitting on a line.

00:22:01.470 --> 00:22:03.760
And they hit on that
line randomly.

00:22:03.760 --> 00:22:05.860
And each one of them
is independent

00:22:05.860 --> 00:22:07.780
of every other one.

00:22:07.780 --> 00:22:15.350
And that's what you get if you
wind up with a density of

00:22:15.350 --> 00:22:18.770
lambda arrivals per unit time.

00:22:18.770 --> 00:22:22.120
OK, we talked about all
of that, of course.

00:22:22.120 --> 00:22:23.400
The probability distributions--

00:22:26.050 --> 00:22:29.380
there are many of them for
a Poisson process.

00:22:29.380 --> 00:22:32.470
The Poisson process is
remarkable in the sense that

00:22:32.470 --> 00:22:35.320
anything you want to find,
there's generally a simple

00:22:35.320 --> 00:22:37.070
formula for it.

00:22:37.070 --> 00:22:39.530
If it's complicated, you're
probably not looking at

00:22:39.530 --> 00:22:42.010
it the right way.

00:22:42.010 --> 00:22:45.360
So many things come out
very, very simply.

00:22:45.360 --> 00:22:46.660
The probability--

00:22:46.660 --> 00:22:50.580
the joint probability
density of the first n

00:22:50.580 --> 00:22:58.670
arrival epochs is an exponential
in the last one only,

00:22:58.670 --> 00:23:05.080
which says that the
intermediate arrival epochs

00:23:05.080 --> 00:23:09.140
are equally likely to be
anywhere, just as long as they

00:23:09.140 --> 00:23:13.440
satisfy this ordering
restriction, s1 less than s2, and so on.

00:23:13.440 --> 00:23:15.430
That's what this formula says.

00:23:15.430 --> 00:23:20.490
It says that the joint density
of these arrival times doesn't

00:23:20.490 --> 00:23:23.010
depend on anything except the
time of the last one.

00:23:25.740 --> 00:23:28.520
But it does depend on the fact
that they're ordered.

00:23:28.520 --> 00:23:31.435
From that, you can find
virtually everything else if

00:23:31.435 --> 00:23:32.900
you want to.

00:23:32.900 --> 00:23:36.600
That really is saying exactly
the same thing as we were just

00:23:36.600 --> 00:23:38.440
saying a while ago.

00:23:38.440 --> 00:23:41.740
This is the viewpoint of looking
at this line from

00:23:41.740 --> 00:23:47.040
outer space with arrivals coming
in, coming in uniformly

00:23:47.040 --> 00:23:51.630
distributed over this line
interval, and each of them

00:23:51.630 --> 00:23:54.080
independent of each other one.

00:23:54.080 --> 00:23:57.740
That's what you wind
up saying.

00:23:57.740 --> 00:24:01.490
This density, then, of the
n-th arrival, if you just

00:24:01.490 --> 00:24:05.620
integrate all this stuff, you
get the Erlang formula.

00:24:05.620 --> 00:24:12.940
Probability of arrival n in
t to t plus delta is--

00:24:12.940 --> 00:24:17.820
now this is the derivation that
we went through before,

00:24:17.820 --> 00:24:20.310
going from Erlang to Poisson.

00:24:20.310 --> 00:24:24.370
You can go from Poisson to
Erlang too, if you want to.

00:24:24.370 --> 00:24:26.320
But it's a little easier
to go this way.

00:24:26.320 --> 00:24:30.500
The probability of arrival n in
t to t plus delta is the

00:24:30.500 --> 00:24:35.890
probability that n of t is
equal to n minus 1 times

00:24:35.890 --> 00:24:40.670
lambda delta plus an o
of delta, of course.

00:24:40.670 --> 00:24:46.270
And the probability that n of
t is equal to n minus 1 from

00:24:46.270 --> 00:24:53.050
this formula here is going to be
the density of when s sub n

00:24:53.050 --> 00:24:55.040
appears, divided by lambda.

00:24:55.040 --> 00:24:58.910
That's exactly what this
formula here says.

00:24:58.910 --> 00:25:01.980
So that's just the Poisson
distribution.
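
NOTE
Two relations can be checked numerically here. The first is the one used above, Pr{N(t) = n - 1} = f_{S_n}(t)/lambda. The second is the companion identity Pr{S_n <= t} = Pr{N(t) >= n}. This is a minimal sketch, assuming hypothetical values lambda = 1.5, t = 2 and n = 4. The Erlang density is integrated with a plain trapezoid rule, which is adequate here but is only one possible choice.
```python
import math
def erlang_density(t, n, lam):
    # f_{S_n}(t) = lam^n t^(n-1) exp(-lam t) / (n-1)!
    return lam**n * t ** (n - 1) * math.exp(-lam * t) / math.factorial(n - 1)
def poisson_pmf(k, m):
    # Pr{N(t) = k} with m = lam * t
    return m**k * math.exp(-m) / math.factorial(k)
lam, t, n = 1.5, 2.0, 4  # hypothetical values
print(math.isclose(poisson_pmf(n - 1, lam * t), erlang_density(t, n, lam) / lam))  # True
# Pr{S_n <= t} by trapezoid-rule integration of the Erlang density over [0, t]
steps = 20_000
h = t / steps
inner = sum(erlang_density(i * h, n, lam) for i in range(1, steps))
integral = h * (inner + 0.5 * (erlang_density(0.0, n, lam) + erlang_density(t, n, lam)))
# Pr{N(t) >= n} = 1 - Pr{N(t) <= n - 1}
tail = 1.0 - sum(poisson_pmf(k, lam * t) for k in range(n))
print(abs(integral - tail) < 1e-6)  # True
```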

00:25:01.980 --> 00:25:04.910
We've been through
that derivation.

00:25:04.910 --> 00:25:08.420
It's almost a derivation worth
remembering, because it just

00:25:08.420 --> 00:25:11.940
appears so often.

00:25:11.940 --> 00:25:16.160
As you've seen from the problem
sets we've done,

00:25:16.160 --> 00:25:20.970
almost every problem you can
dream of, dealing with Poisson

00:25:20.970 --> 00:25:27.150
processes, the easy way to do
them comes from this property

00:25:27.150 --> 00:25:30.730
of combining and splitting
Poisson processes.

00:25:30.730 --> 00:25:35.170
It says if n1 of t, n2 of t,
up to n sub k of t are

00:25:35.170 --> 00:25:37.500
independent Poisson
processes--

00:25:37.500 --> 00:25:39.880
what do you mean by
a process being

00:25:39.880 --> 00:25:42.200
independent of another process?

00:25:42.200 --> 00:25:46.660
Well, the process is specified
by the interarrival times for

00:25:46.660 --> 00:25:47.660
that process.

00:25:47.660 --> 00:25:50.950
So what we're saying here is the
interarrival times for the

00:25:50.950 --> 00:25:54.470
first process are independent
of the interarrival times of

00:25:54.470 --> 00:25:56.770
the second process,
independent of the

00:25:56.770 --> 00:26:00.620
interarrival times for the third
process, and so forth.

00:26:00.620 --> 00:26:02.990
Again, this is a view of someone
from outer space,

00:26:02.990 --> 00:26:06.180
throwing darts onto a line.

00:26:06.180 --> 00:26:09.750
And if you have multiple people
throwing darts on a

00:26:09.750 --> 00:26:13.450
line, but they're all equally
distributed, all uniformly

00:26:13.450 --> 00:26:16.600
distributed over the line,
this is exactly

00:26:16.600 --> 00:26:20.670
the model you get.

00:26:20.670 --> 00:26:22.180
So we have two views here.

00:26:22.180 --> 00:26:26.480
The first one is to look at
the arrival epochs that are

00:26:26.480 --> 00:26:28.420
generated from each process.

00:26:28.420 --> 00:26:31.710
And then combine all arrivals
into one Poisson process.

00:26:31.710 --> 00:26:34.900
So we look at all these Poisson
processes, and then

00:26:34.900 --> 00:26:38.340
take the sum of them, and we
get a Poisson process.

00:26:38.340 --> 00:26:40.190
The other way to look at it--

00:26:40.190 --> 00:26:43.120
and going back and forth between
these two views is the

00:26:43.120 --> 00:26:45.060
way you solve problems--

00:26:45.060 --> 00:26:46.770
you look at the combined
sequence of

00:26:46.770 --> 00:26:48.900
arrival epochs first.

00:26:48.900 --> 00:26:52.400
And then for each arrival that
comes in, you think of an IID

00:26:52.400 --> 00:26:55.450
random variable independent
of all the other random

00:26:55.450 --> 00:27:02.860
variables, which decides for
each arrival which of the

00:27:02.860 --> 00:27:04.710
sub-processes it goes to.

00:27:04.710 --> 00:27:08.680
So there's this hidden
process--

00:27:08.680 --> 00:27:09.890
well, it's not hidden.

00:27:09.890 --> 00:27:12.100
You can see what it's doing
from looking at all the

00:27:12.100 --> 00:27:14.340
sub-processes.

00:27:14.340 --> 00:27:20.670
And each arrival then is
associated with the given

00:27:20.670 --> 00:27:24.700
sub-process, with the
probability mass function

00:27:24.700 --> 00:27:28.160
lambda sub i over the
sum of lambda sub j.

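NOTE A minimal simulation sketch of the second view: generate one Poisson process of rate equal to the sum of the lambda_j, then send each arrival independently to sub-process i with probability lambda_i over the sum of lambda_j. The helper name split_poisson, the rates, the horizon, and the seed are illustrative assumptions; each sub-process's empirical rate should come out close to its lambda_i.
```python
import random
def split_poisson(rates, horizon, seed=1):
    # Hypothetical helper: one combined Poisson process of rate sum(rates) on
    # (0, horizon]; each arrival is routed to sub-process i with prob rates[i]/sum(rates)
    rng = random.Random(seed)
    total = sum(rates)
    subs = [[] for _ in rates]
    t = rng.expovariate(total)  # exponential interarrival times
    while t <= horizon:
        i = rng.choices(range(len(rates)), weights=rates)[0]
        subs[i].append(t)
        t += rng.expovariate(total)
    return subs
rates, horizon = [1.0, 2.0, 3.0], 10_000.0
subs = split_poisson(rates, horizon)
for lam_i, arrivals in zip(rates, subs):
    print(lam_i, len(arrivals) / horizon)  # empirical rate, close to lam_i
```
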
00:27:28.160 --> 00:27:30.460
So this is the workhorse
of Poisson

00:27:30.460 --> 00:27:32.270
type queueing problems.

00:27:32.270 --> 00:27:35.990
If you study queueing theory,
on every page you

00:27:35.990 --> 00:27:37.980
see this thing used.

00:27:37.980 --> 00:27:41.480
If you look at Kleinrock's books
on queueing, they're

00:27:41.480 --> 00:27:45.120
very nice books because they
cover so many different

00:27:45.120 --> 00:27:47.040
queueing situations.

00:27:47.040 --> 00:27:50.230
You find him using this
on every page.

00:27:50.230 --> 00:27:54.060
And he never tells you that he's
using it, but that's what

00:27:54.060 --> 00:27:54.670
he's doing.

00:27:54.670 --> 00:27:59.360
So that's a useful
thing to know.

00:27:59.360 --> 00:28:02.840
We then talked about conditional
arrivals and order

00:28:02.840 --> 00:28:05.590
statistics.

00:28:05.590 --> 00:28:12.280
The conditional joint density
of the first n arrivals--

00:28:12.280 --> 00:28:17.670
namely, s sub 1, s sub
2, up to s sub n--

00:28:17.670 --> 00:28:24.250
given that the number of arrivals
N of t is n, is just n factorial

00:28:24.250 --> 00:28:25.430
over t to the n.

00:28:25.430 --> 00:28:29.380
Again, it doesn't depend on
where these arrivals are.

00:28:29.380 --> 00:28:33.215
It's just a constant, not a
function of any of the arrival epochs.

00:28:33.215 --> 00:28:36.660
It's the same kind of
conditioning we had before.

00:28:36.660 --> 00:28:40.080
It's n factorial divided
by t to the n.

00:28:40.080 --> 00:28:44.360
Because of the fact that if
you order these random

00:28:44.360 --> 00:28:49.450
variables, t1 less than t2 less
than t3, and so forth, up

00:28:49.450 --> 00:28:53.540
until time t, and then you say
how many different ways can I

00:28:53.540 --> 00:29:01.590
arrange a set of numbers, each
between 0 and t so that we

00:29:01.590 --> 00:29:03.630
have different orderings
of them.

00:29:03.630 --> 00:29:06.700
And you can choose any one
of the n to be the first.

00:29:06.700 --> 00:29:09.560
You can choose any one
of the remaining n

00:29:09.560 --> 00:29:11.510
minus 1 to be the second.

00:29:11.510 --> 00:29:14.670
And that's where this n
factorial comes from.

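NOTE A Monte Carlo sketch of the counting argument: n IID uniform samples fall into each of the n! orderings with equal probability, so the ordered region 0 < s1 < ... < sn < t has volume t^n/n!, and a constant density on it must be n!/t^n. Uniforms on [0, 1) are used because scaling by t does not change the ordering; the function name and trial count are assumptions.
```python
import math
import random
def fraction_sorted(n, trials, seed=2):
    # Fraction of draws of n IID uniforms that come out already in increasing order
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = [rng.random() for _ in range(n)]
        hits += u == sorted(u)
    return hits / trials
n = 4
print(fraction_sorted(n, 100_000), 1 / math.factorial(n))  # both near 1/24
```
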
00:29:14.670 --> 00:29:18.140
And that, again, we've
been over.

00:29:18.140 --> 00:29:21.660
The probability that s1 is
greater than tau, given that

00:29:21.660 --> 00:29:27.540
there are n arrivals in the
overall interval 0 to t, comes from

00:29:27.540 --> 00:29:31.390
just looking at n uniformly
distributed random variables

00:29:31.390 --> 00:29:33.190
between 0 and t.

00:29:33.190 --> 00:29:35.840
And then what do you do with
those uniformly distributed

00:29:35.840 --> 00:29:37.670
random variables?

00:29:37.670 --> 00:29:40.490
Well, you ask the question,
what's the probability that

00:29:40.490 --> 00:29:44.140
all of them occur
after time tau?

00:29:44.140 --> 00:29:47.820
And that's just t minus tau
divided by t raised to the

00:29:47.820 --> 00:29:48.910
n-th power.

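NOTE A sampling sketch of Pr{S1 > tau | N(t) = n} = ((t - tau)/t)^n: draw n uniforms on (0, t] and ask whether all of them land after tau, i.e. whether the smallest one does. The parameter values and the name prob_first_after are illustrative.
```python
import random
def prob_first_after(tau, t, n, trials=100_000, seed=3):
    # Given N(t) = n, the arrival epochs are n IID uniforms on (0, t], ordered;
    # S_1 > tau exactly when the minimum of the n uniforms exceeds tau
    rng = random.Random(seed)
    hits = sum(min(rng.uniform(0, t) for _ in range(n)) > tau for _ in range(trials))
    return hits / trials
tau, t, n = 0.5, 2.0, 3
print(prob_first_after(tau, t, n), ((t - tau) / t) ** n)  # both near 0.42
```
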
00:29:48.910 --> 00:29:51.980
And see, all of these formulas
just come from particular

00:29:51.980 --> 00:29:54.360
viewpoints about what's
going on.

00:29:54.360 --> 00:29:55.760
You have a number
of viewpoints.

00:29:55.760 --> 00:29:58.550
One of them is throwing
darts at a line.

00:29:58.550 --> 00:30:01.140
One of them is having
exponential

00:30:01.140 --> 00:30:02.510
interarrival times.

00:30:02.510 --> 00:30:06.660
One of them is these uniformly
distributed arrival epochs.

00:30:06.660 --> 00:30:08.880
It's only a very small
number of tricks.

00:30:08.880 --> 00:30:13.600
And you just use them in
various combinations.

00:30:13.600 --> 00:30:17.800
So the joint distribution of s1
to s sub n, given N of t equals

00:29:17.800 --> 00:29:21.250
n, is the same as the joint
distribution of n uniform

00:30:21.250 --> 00:30:24.070
random variables after
they've been ordered.

00:30:28.650 --> 00:30:32.115
So let's go on to finite
state Markov chains.

00:30:35.240 --> 00:30:37.670
Seems like we're covering an
enormous amount of material in

00:30:37.670 --> 00:30:38.350
this course.

00:30:38.350 --> 00:30:40.150
And I think we are.

00:30:40.150 --> 00:30:44.290
But as I'm trying to say, as
we go along, it's all--

00:30:44.290 --> 00:30:46.850
I mean, everything follows from
a relatively small set of

00:30:46.850 --> 00:30:48.620
principles.

00:30:48.620 --> 00:30:51.100
Of course, it's harder to
understand the small set of

00:30:51.100 --> 00:30:54.580
principles and how to apply them
than it is to understand

00:30:54.580 --> 00:30:55.460
all the details.

00:30:55.460 --> 00:30:56.710
But that's--

00:30:58.970 --> 00:31:01.560
but on the other hand, if you
understand the principles,

00:31:01.560 --> 00:31:04.620
then all those details,
including the ones we haven't

00:31:04.620 --> 00:31:08.280
talked about, are easy
to deal with.

00:31:08.280 --> 00:31:11.750
An integer-time stochastic
process--

00:31:11.750 --> 00:31:14.450
x1, x2, x3, blah, blah, blah--

00:31:14.450 --> 00:31:19.220
is a Markov chain if for all n,
namely the number of them

00:31:19.220 --> 00:31:21.770
that we're looking at--

00:31:21.770 --> 00:31:23.020
well--

00:31:25.880 --> 00:31:30.190
for all n, i, j, k, l, and so
forth, the probability that

00:31:30.190 --> 00:31:35.770
the n-th of these random
variables is equal to j, given

00:31:35.770 --> 00:31:39.340
what all of the others are-- and
these are not ordered now.

00:31:39.340 --> 00:31:41.460
I mean, in a Markov chain,
nothing is ordered.

00:31:41.460 --> 00:31:44.430
We're not talking about
an arrival process.

00:31:44.430 --> 00:31:47.220
We're just talking about a frog
jumping around on lily

00:31:47.220 --> 00:31:52.660
pads, if you arrange the lily
pads in a linear way, if these

00:31:52.660 --> 00:31:54.430
are random variables.

00:31:54.430 --> 00:32:00.530
The probability that the n-th
location is equal to j, given

00:32:00.530 --> 00:32:06.410
that the previous locations are
i, k, back to m, is just

00:32:06.410 --> 00:32:11.010
some probability p sub
i j, a conditional

00:32:11.010 --> 00:32:14.120
probability of j given i.

00:32:14.120 --> 00:32:17.670
In other words, if you're
looking at what happens at

00:32:17.670 --> 00:32:22.340
time n, once you know what
happened at time n minus 1,

00:32:22.340 --> 00:32:24.830
everything else is
of no concern.

00:32:24.830 --> 00:32:29.400
This process evolves by having
a history of only one time

00:32:29.400 --> 00:32:31.980
unit, a little like the
Poisson process.

00:32:31.980 --> 00:32:36.070
The Poisson process evolves
by being totally

00:32:36.070 --> 00:32:37.880
independent of the past.

00:32:37.880 --> 00:32:40.600
Here, you put a little
dependence in the past.

00:32:40.600 --> 00:32:44.150
But the dependence is only to
look at the last thing that

00:32:44.150 --> 00:32:49.040
happened, and nothing before the
last thing that happened.

00:32:49.040 --> 00:32:53.850
So p sub i j depends
only on i and j.

00:32:53.850 --> 00:32:59.170
And the initial probability mass
function is arbitrary.

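NOTE A minimal sketch of how a finite-state Markov chain generates a sample path: draw X_0 from an arbitrary initial PMF, then draw each X_n using only row X_{n-1} of the transition matrix. States are indexed 0 to M-1 here rather than 1 to M, and the matrix P and helper sample_path are hypothetical examples.
```python
import random
def sample_path(p0, P, steps, seed=4):
    # p0: initial PMF over states 0..M-1; P[i][j] = Pr{X_n = j | X_{n-1} = i}.
    # Only the current state is used to pick the next one: the Markov property.
    rng = random.Random(seed)
    states = range(len(p0))
    x = rng.choices(states, weights=p0)[0]
    path = [x]
    for _ in range(steps):
        x = rng.choices(states, weights=P[x])[0]
        path.append(x)
    return path
P = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.3, 0.0, 0.7]]
print(sample_path([1.0, 0.0, 0.0], P, 10))
```
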
00:32:59.170 --> 00:33:02.470
A Markov chain is finite-state if
the sample space for each x sub

00:33:02.470 --> 00:33:07.400
i is a finite set S. And the
sample space S is usually

00:33:07.400 --> 00:33:10.530
taken to be integers
1 up to M.

00:33:10.530 --> 00:33:13.490
In all these formulas we write,
we're always summing

00:33:13.490 --> 00:33:17.230
from one to M. And the reason
for that is we've assumed the

00:33:17.230 --> 00:33:22.120
states are 1, 2, 3, up to M.
Sometimes it's more convenient

00:33:22.120 --> 00:33:23.765
to think of different
state spaces.

00:33:26.730 --> 00:33:29.040
But all the formulas
we use are based on

00:33:29.040 --> 00:33:31.290
this state space here.

00:33:31.290 --> 00:33:36.500
A Markov chain is completely
described by these transition

00:33:36.500 --> 00:33:41.200
probabilities plus the initial
probabilities.

00:33:41.200 --> 00:33:44.390
If you want to write down the
probability of what x is at

00:33:44.390 --> 00:33:49.030
some time n, given what it was at
time 0, all you have to

00:33:49.030 --> 00:33:52.890
do is trace all the paths from
0 out to n, add up the

00:33:52.890 --> 00:33:56.890
probabilities of all of those
paths, and that tells you the

00:33:56.890 --> 00:33:58.020
probability you want.

00:33:58.020 --> 00:34:01.820
All probabilities can be
calculated just from knowing

00:34:01.820 --> 00:34:06.240
what these transition
probabilities are.

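NOTE A sketch of "trace all the paths and add up their probabilities": summing the product of transition probabilities over every length-n state sequence from i to j gives the same number as the (i, j) entry of the matrix power P^n. Pure Python with 0-indexed states; the example matrix and function names are hypothetical.
```python
from itertools import product
def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))] for i in range(len(A))]
def n_step_matrix(P, n):
    # P^n by repeated multiplication, starting from the identity matrix
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mult(R, P)
    return R
def n_step_by_walks(P, i, j, n):
    # Sum over every state sequence i, i_1, ..., i_{n-1}, j of the product of its transition probabilities
    total = 0.0
    for middle in product(range(len(P)), repeat=n - 1):
        walk = (i, *middle, j)
        prob = 1.0
        for a, b in zip(walk, walk[1:]):
            prob *= P[a][b]
        total += prob
    return total
P = [[0.9, 0.1, 0.0], [0.0, 0.5, 0.5], [0.3, 0.0, 0.7]]
print(n_step_matrix(P, 4)[0][2], n_step_by_walks(P, 0, 2, 4))  # equal up to rounding
```
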
00:34:06.240 --> 00:34:10.980
Note that when we're dealing
with Poisson processes, we

00:34:10.980 --> 00:34:15.520
defined everything in
terms of how many--

00:34:15.520 --> 00:34:20.250
how many variables are there in
defining a Poisson process?

00:34:20.250 --> 00:34:25.020
How many things do you have to
specify before I know exactly

00:34:25.020 --> 00:34:27.320
what Poisson process
I'm talking about?

00:34:30.540 --> 00:34:31.760
Only the Poisson rate.

00:34:31.760 --> 00:34:35.650
Only one parameter is necessary

00:34:35.650 --> 00:34:37.639
for a Poisson process.

00:34:37.639 --> 00:34:43.219
For a finite-state Markov
process, you need a lot more.

00:34:43.219 --> 00:34:48.310
What you need is all of these
values, p sub i j.

00:34:48.310 --> 00:34:52.409
If you sum p sub i j over
j, you have to get 1.

00:34:52.409 --> 00:34:54.830
So that removes one free parameter in each row.

00:34:54.830 --> 00:34:58.360
But as soon as you specify that
transition matrix, you've

00:34:58.360 --> 00:34:59.960
specified everything.

00:34:59.960 --> 00:35:01.260
So there's nothing more to know

00:35:01.260 --> 00:35:03.220
about the Markov chain.

00:35:03.220 --> 00:35:06.060
There's only all these gruesome
derivations that we

00:35:06.060 --> 00:35:07.580
go through.

00:35:07.580 --> 00:35:11.600
But everything is initially
determined.

00:35:11.600 --> 00:35:13.960
The set of transition probabilities
is usually

00:35:13.960 --> 00:35:16.030
viewed as the Markov chain.

00:35:16.030 --> 00:35:19.760
And the initial probabilities
are usually viewed as just a

00:35:19.760 --> 00:35:21.740
parameter that we deal with.

00:35:21.740 --> 00:35:23.840
In other words, we--

00:35:23.840 --> 00:35:28.250
in other words, what we study
is the particular Markov

00:35:28.250 --> 00:35:31.550
chain, whether it's recurrent,
whether it's transient,

00:35:31.550 --> 00:35:32.800
whatever it is.

00:35:32.800 --> 00:35:35.770
How you break it up into
classes, all of that stuff

00:35:35.770 --> 00:35:39.060
only depends on these transition
probabilities and

00:35:39.060 --> 00:35:40.815
doesn't depend on
where you start.

00:35:46.920 --> 00:35:51.490
Now, a finite-state Markov chain
can be described either

00:35:51.490 --> 00:35:54.230
as a directed graph
or as a matrix.

00:35:54.230 --> 00:35:58.300
I hope you've seen by this
time that some things are

00:35:58.300 --> 00:36:03.040
easier to look at if you look at
things in terms of a graph.

00:36:03.040 --> 00:36:07.180
Some things are easier to look
at if you look at something

00:36:07.180 --> 00:36:08.660
like this matrix.

00:36:08.660 --> 00:36:13.230
And some problems can be solved
by inspection, if you

00:36:13.230 --> 00:36:14.700
draw a graph of it.

00:36:14.700 --> 00:36:17.890
Some can be solved almost
by inspection if

00:36:17.890 --> 00:36:19.480
you look at the matrix.

00:36:19.480 --> 00:36:23.460
If you're doing things by
computer, usually computers

00:36:23.460 --> 00:36:27.450
deal with matrices more easily
than with graphs.

00:36:27.450 --> 00:36:31.070
If you're dealing with a Markov
chain with 100,000

00:36:31.070 --> 00:36:35.290
states, you're not going to
look at the graph and

00:36:35.290 --> 00:36:38.330
determine very much from it,
because it's typically going

00:36:38.330 --> 00:36:39.650
to be fairly complicated--

00:36:39.650 --> 00:36:42.020
unless it has some very
simple structure.

00:36:42.020 --> 00:36:46.440
And sometimes that simple
structure is determined.

00:36:46.440 --> 00:36:48.780
If it's something where
you can only--

00:36:48.780 --> 00:36:52.190
where you have the states
numbered from 1 to 100,000,

00:36:52.190 --> 00:36:56.270
and you can only go from state
i to state i plus 1, or from

00:36:56.270 --> 00:36:59.910
state i to i plus 1, or
i minus 1, then it

00:36:59.910 --> 00:37:01.380
becomes very simple.

00:37:01.380 --> 00:37:04.320
And you like to look at
it as a graph again.

00:37:04.320 --> 00:37:07.670
But ordinarily, you don't
like to do that.

00:37:07.670 --> 00:37:15.000
But the nice thing about this
graph is that it tells you

00:37:15.000 --> 00:37:19.090
very simply and visually which
transition probabilities are

00:37:19.090 --> 00:37:23.810
zero, and which transition
probabilities are non-zero.

00:37:23.810 --> 00:37:26.690
And that's the thing that
specifies which states are

00:37:26.690 --> 00:37:31.650
recurrent, which states are
transient, and all of that.

00:37:31.650 --> 00:37:35.400
All of that kind of elementary
analysis about a Markov chain

00:37:35.400 --> 00:37:40.300
all comes from looking at this
graph and seeing whether you

00:37:40.300 --> 00:37:46.290
can get from one state to
another state by some process.

00:37:46.290 --> 00:37:50.520
So let's move on from that.

00:37:50.520 --> 00:37:53.620
Talk about the classification
of states.

00:37:53.620 --> 00:37:57.500
We started out with the
idea of a walk and

00:37:57.500 --> 00:37:59.370
a path and a cycle.

00:37:59.370 --> 00:38:03.610
I'm not sure these terms are
uniform throughout the field.

00:38:03.610 --> 00:38:07.550
But a walk is an ordered
string of nodes, like

00:38:07.550 --> 00:38:10.020
i0, i1, up to i n.

00:38:10.020 --> 00:38:14.960
You can have repeated elements
here, but you need a directed

00:38:14.960 --> 00:38:18.170
arc from i sub m minus
1 to i sub m, for each m.

00:38:18.170 --> 00:38:23.035
Like for example, in this stupid
Markov chain here--

00:38:25.870 --> 00:38:28.880
I mean, when you're drawing
things in LaTeX, it's kind of

00:38:28.880 --> 00:38:31.760
hard to draw those nice
little curves there.

00:38:31.760 --> 00:38:34.610
And because of that, when you
once draw a Markov chain, you

00:38:34.610 --> 00:38:36.050
never want to change it.

00:38:36.050 --> 00:38:39.210
And that's why these notes
have a very small set of

00:38:39.210 --> 00:38:40.530
Markov chains in them.

00:38:40.530 --> 00:38:46.580
It's just to save me some work,
drawing and drawing

00:38:46.580 --> 00:38:47.830
these diagrams.

00:38:50.030 --> 00:38:55.700
An example of a walk: you
start in 4, you take the self

00:38:55.700 --> 00:38:58.800
loop, go back to 4 at time 2.

00:38:58.800 --> 00:39:01.660
Then you go to state
1 at time 3.

00:39:01.660 --> 00:39:05.240
Then you go to state
2 at time 4.

00:39:05.240 --> 00:39:08.140
Then you go to state
3, time 5.

00:39:08.140 --> 00:39:11.010
And back to state 2 at time 6.

00:39:11.010 --> 00:39:13.300
You have repeated nodes there.

00:39:13.300 --> 00:39:17.230
You have repeated nodes
separated here.

00:39:17.230 --> 00:39:20.630
Another example of a
walk is 4, 1, 2, 3.

00:39:20.630 --> 00:39:24.120
An example of a path: a path
can't have any repeated nodes.

00:39:24.120 --> 00:39:27.060
We'd like to look at paths,
because if you're going to be

00:39:27.060 --> 00:39:30.280
able to get from one node to
another node, and there's some

00:39:30.280 --> 00:39:33.420
walk that goes all around the
place and gets to that final

00:39:33.420 --> 00:39:36.770
node, there's also a path
that goes there.

00:39:36.770 --> 00:39:39.900
If you look at the walk, you
just leave out all the cycles

00:39:39.900 --> 00:39:42.570
along the way, and
you get to the end.

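NOTE A sketch of the walk-to-path argument: scan the walk and, whenever a node repeats, cut out the cycle just completed. The start and end nodes are preserved, and every remaining step is an arc the walk actually used. The input is the walk 4, 4, 1, 2, 3, 2 from the example above; the function name is hypothetical.
```python
def walk_to_path(walk):
    # Shorten a walk to a path with the same endpoints by removing cycles
    path = []
    for node in walk:
        if node in path:
            path = path[:path.index(node) + 1]  # drop the cycle just completed
        else:
            path.append(node)
    return path
print(walk_to_path([4, 4, 1, 2, 3, 2]))  # [4, 1, 2]
```
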
00:39:42.570 --> 00:39:45.980
And a cycle, of course, which I
didn't define, is something

00:39:45.980 --> 00:39:49.820
which starts at one node, goes
through a path, and then

00:39:49.820 --> 00:39:52.730
finally comes back to the same
node that it started at.

00:39:52.730 --> 00:39:56.800
And it doesn't make any
difference for the cycle 2, 3,

00:39:56.800 --> 00:40:01.610
2 whether you call it
2, 3, 2 or 3, 2, 3.

00:40:01.610 --> 00:40:04.390
That's the same cycle, and
it's not even worth

00:40:04.390 --> 00:40:07.200
distinguishing between
those two ideas.

00:40:07.200 --> 00:40:12.723
OK. That's that.

00:40:15.360 --> 00:40:20.010
If there's a path from--

00:40:20.010 --> 00:40:21.260
where did I--

00:40:26.110 --> 00:40:31.800
node j is accessible from i,
which we abbreviate as i

00:40:31.800 --> 00:40:33.680
arrow j,

00:40:33.680 --> 00:40:38.010
if there's a walk from i to
j, which means that p

00:40:38.010 --> 00:40:40.650
sub i j to the n--

00:40:40.650 --> 00:40:44.150
this is the transition
probability, the probability

00:40:44.150 --> 00:40:49.160
that x sub n is equal to
j, given that x sub

00:40:49.160 --> 00:40:50.710
0 is equal to i.

00:40:50.710 --> 00:40:53.380
And we use this all the time.

00:40:53.380 --> 00:40:57.370
If this is greater than zero
for some n greater than 0.

00:40:57.370 --> 00:41:06.950
In other words, j is accessible
from i if there's a

00:41:06.950 --> 00:41:09.240
path from i that goes to j.

00:41:12.300 --> 00:41:17.170
And trivially, if i goes to j, and
there's a path from j to

00:41:17.170 --> 00:41:21.520
k, then there has to be
a path from i to k.

00:41:21.520 --> 00:41:25.730
If you've ever tried to make up
a mapping program to find

00:41:25.730 --> 00:41:28.910
how to get from here to there,
this is one of the most useful

00:41:28.910 --> 00:41:29.740
things you use.

00:41:29.740 --> 00:41:32.320
If there's a way to get from here
to there, and a way to get

00:41:32.320 --> 00:41:35.330
from there to somewhere else, then there's
a way to get from here

00:41:35.330 --> 00:41:37.560
all the way to the end.

00:41:37.560 --> 00:41:42.650
And if you look up what most of
these map programs do, you

00:41:42.650 --> 00:41:47.040
see that they overuse this
enormously and they wind up

00:41:47.040 --> 00:41:50.910
taking you from here to there
by some bizarre path just

00:41:50.910 --> 00:41:53.880
because it happens to go through
some intermediate node

00:41:53.880 --> 00:41:55.460
on the way.

00:41:55.460 --> 00:41:58.680
So two nodes communicate--

00:41:58.680 --> 00:42:01.890
i double arrow j--

00:42:01.890 --> 00:42:08.860
if j is accessible from i, and
if i is accessible from j.

00:42:08.860 --> 00:42:12.450
That means there's a path from
i to j, and another path from

00:42:12.450 --> 00:42:16.260
j back to i, if you shorten
them as much as you can.

00:42:16.260 --> 00:42:17.040
There's a cycle.

00:42:17.040 --> 00:42:23.530
It starts at i, goes through j,
and comes back to i again.

00:42:23.530 --> 00:42:29.810
I didn't say that quite right,
so delete that from what

00:42:29.810 --> 00:42:31.200
you've just heard.

00:42:31.200 --> 00:42:35.630
A class C of states is a
non-empty set, such that i and

00:42:35.630 --> 00:42:40.370
j communicate for each
i, j in this class.

00:42:40.370 --> 00:42:45.330
But i does not communicate
with j for each i in C--

00:42:49.420 --> 00:42:53.210
for i in C and j not in C.

00:42:53.210 --> 00:42:55.870
The convenient way to think
about this-- and I should have

00:42:55.870 --> 00:42:59.670
stated this as a theorem in
the notes, because it's--

00:43:03.990 --> 00:43:06.130
I think it's something that
we all use without even

00:43:06.130 --> 00:43:07.750
thinking about it.

00:43:07.750 --> 00:43:12.480
It says that the entire set of
states, or the entire set of

00:43:12.480 --> 00:43:16.500
nodes in a graph, is partitioned
into classes.

00:43:16.500 --> 00:43:22.860
The class C containing i is i
in union with all of the j's

00:43:22.860 --> 00:43:24.110
that communicate with i.

00:43:24.110 --> 00:43:27.580
So if you want to find this
partition, you start out with

00:43:27.580 --> 00:43:31.280
an arbitrary node, you find all
of the other nodes that it

00:43:31.280 --> 00:43:34.590
communicates with, and you
find them by picking

00:43:34.590 --> 00:43:36.320
them one at a time.

00:43:36.320 --> 00:43:41.050
You pick all of the nodes
for which p sub i j is

00:43:41.050 --> 00:43:42.540
greater than 0.

00:43:42.540 --> 00:43:44.100
Then you pick--

00:43:44.100 --> 00:43:46.530
and p sub j i is great--

00:43:46.530 --> 00:43:47.780
well-- blah.

00:43:50.030 --> 00:43:55.400
If you want to find the set of
nodes that are accessible from

00:43:55.400 --> 00:43:57.640
i, you start out looking at i.

00:43:57.640 --> 00:44:00.640
You look at all the states
which are accessible

00:44:00.640 --> 00:44:03.300
from i in one step.

00:44:03.300 --> 00:44:06.870
Then you look at all the steps,
all of the states,

00:44:06.870 --> 00:44:09.380
which you can access from
any one of those.

00:44:09.380 --> 00:44:12.720
Those are the states which are
accessible in two states--

00:44:12.720 --> 00:44:16.150
in two steps, then in three
steps, and so forth.

00:44:16.150 --> 00:44:21.380
So you find all the nodes that
are accessible from node i.

00:44:21.380 --> 00:44:24.640
And then you turn around and
do it the other way.

00:44:24.640 --> 00:44:29.600
And presto, you have all of
these classes of states all

00:44:29.600 --> 00:44:30.910
very simply.

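NOTE A sketch of the procedure just described: find everything accessible from each node one step at a time (breadth-first search), then take the class of i to be i together with every j that i reaches and that reaches back to i. The adjacency list is a hypothetical graph (self-loop at 4, arcs 4 to 1, 1 to 2, 2 to 3, 3 to 2) and may not match the figure in the notes.
```python
def reachable(adj, i):
    # States accessible from i in one or more steps, found one step at a time
    seen, frontier = set(), [i]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return seen
def communicating_classes(adj):
    reach = {i: reachable(adj, i) for i in adj}
    classes, assigned = [], set()
    for i in adj:
        if i in assigned:
            continue
        # Class of i: i itself plus every j with i -> j and j -> i
        c = {i} | {j for j in reach[i] if i in reach[j]}
        classes.append(c)
        assigned |= c
    return classes, reach
adj = {1: [2], 2: [3], 3: [2], 4: [4, 1]}  # hypothetical graph
classes, reach = communicating_classes(adj)
print(classes)  # [{1}, {2, 3}, {4}]
```
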
00:44:30.910 --> 00:44:34.990
For finite-state chains, a
state i is transient if

00:44:34.990 --> 00:44:40.200
there's a j in S such that
i goes into j, but j

00:44:40.200 --> 00:44:41.420
does not go into i.

00:44:41.420 --> 00:44:46.900
In other words, if I'm a state
i, and I can get to you, but

00:44:46.900 --> 00:44:55.450
you can't get back to me,
then I'm transient.

00:44:55.450 --> 00:45:01.600
Because the way Markov chains
work, we keep going from one

00:45:01.600 --> 00:45:04.720
step to the next step to the
next step to the next step.

00:45:04.720 --> 00:45:09.710
And if I keep returning to
myself, then eventually I'm

00:45:09.710 --> 00:45:11.010
going to go to you.

00:45:11.010 --> 00:45:14.040
And once I go to you, I'll
never get back again.

00:45:14.040 --> 00:45:18.540
So because of that, these
transient states are states

00:45:18.540 --> 00:45:21.450
where eventually you
leave them and you

00:45:21.450 --> 00:45:23.160
never get back again.

00:45:23.160 --> 00:45:26.190
As soon as we start talking
about countable state Markov

00:45:26.190 --> 00:45:28.270
chains, you'll see that
this definition

00:45:28.270 --> 00:45:30.250
doesn't work anymore.

00:45:30.250 --> 00:45:32.620
You can--

00:45:32.620 --> 00:45:36.520
it is very possible to just
wander away in a countable

00:45:36.520 --> 00:45:40.390
state Markov chain, and you
never get back again that way.

00:45:40.390 --> 00:45:43.640
After you wander away too far,
the probability of getting

00:45:43.640 --> 00:45:45.540
back gets smaller and smaller.

00:45:45.540 --> 00:45:47.830
You keep getting further
and further away.

00:45:47.830 --> 00:45:52.810
The probability of returning
gets smaller and smaller, so

00:45:52.810 --> 00:45:56.360
that you have transience
that way also.

00:45:56.360 --> 00:45:59.470
But here, the situation is
simpler for a finite-state

00:45:59.470 --> 00:46:01.030
Markov chain.

00:46:01.030 --> 00:46:05.570
And you can define transience if
there's a j in S such that

00:46:05.570 --> 00:46:09.440
i goes into j, but j
doesn't go into i.

00:46:09.440 --> 00:46:13.160
If i's not transient,
then it's recurrent.

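NOTE For a finite-state chain this transience test needs only which P_ij are non-zero. This sketch computes accessibility with Warshall's transitive-closure algorithm (a standard technique swapped in here, not the method from the notes) and flags i as transient when some j has i going to j but not j going to i. States are 0-indexed and the matrix is a made-up example.
```python
def transitive_closure(P):
    # reach[i][j] is True iff j is accessible from i, i.e. [P^n]_ij > 0 for some n >= 1
    M = len(P)
    reach = [[P[i][j] > 0 for j in range(M)] for i in range(M)]
    for k in range(M):
        for i in range(M):
            if reach[i][k]:
                for j in range(M):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach
def transient_states(P):
    # Finite-state rule: i is transient iff some j has i -> j but not j -> i
    reach = transitive_closure(P)
    M = len(P)
    return [i for i in range(M) if any(reach[i][j] and not reach[j][i] for j in range(M))]
P = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8], [0.0, 0.6, 0.4]]
print(transient_states(P))  # [0]; states 1 and 2 form the recurrent class
```
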
00:46:13.160 --> 00:46:16.240
Usually you define recurrence
first and transience later,

00:46:16.240 --> 00:46:19.470
but it's a little simpler
this way.

00:46:19.470 --> 00:46:22.310
All states in a class are
transient, or all are

00:46:22.310 --> 00:46:26.330
recurrent, and a finite-state
Markov chain contains at least

00:46:26.330 --> 00:46:27.990
one recurrent class.

00:46:27.990 --> 00:46:29.770
You did that in your homework.

00:46:29.770 --> 00:46:33.040
And you were surprised at how
complicated it was to do it.

00:46:33.040 --> 00:46:36.350
I hope that after you wrote
down a proof of this, you

00:46:36.350 --> 00:46:41.800
stopped and thought about what
you were actually proving,

00:46:41.800 --> 00:46:46.030
which intuitively is something
very, very simple.

00:46:46.030 --> 00:46:48.960
It's just looking at all of
the transient classes.

00:46:48.960 --> 00:46:51.480
Starting at one transient
class, you

00:46:51.480 --> 00:46:54.950
find if there's another--

00:46:54.950 --> 00:46:59.190
if there's another state you can
get to from i which is

00:46:59.190 --> 00:47:02.170
also transient, and then you
find if there's another state

00:47:02.170 --> 00:47:04.910
you get to from there which
is also transient.

00:47:04.910 --> 00:47:08.500
And eventually, you have to come
to a state from which you

00:47:08.500 --> 00:47:13.325
can't go to some other state,
from which you can't get back.

00:47:17.350 --> 00:47:20.410
That was explaining it almost
as badly as the problem

00:47:20.410 --> 00:47:22.120
statement explained it.

00:47:22.120 --> 00:47:25.460
And I hope that after you did
the problem, even if you can't

00:47:25.460 --> 00:47:27.910
explain it to someone,
you have an

00:47:27.910 --> 00:47:30.430
understanding of why it's true.

00:47:30.430 --> 00:47:34.920
It shouldn't be surprising
after you do that.

00:47:34.920 --> 00:47:38.950
So the finite-state Markov chain
contains at least one

00:47:38.950 --> 00:47:40.200
recurrent class.

00:47:42.800 --> 00:47:46.720
OK, the period of a state
i is the greatest common

00:47:46.720 --> 00:47:51.730
divisor of the n such that
p sub i i to the n is greater than 0.

00:47:51.730 --> 00:47:54.580
Again, a very complicated
definition for a

00:47:54.580 --> 00:47:56.280
simple kind of idea.

00:47:56.280 --> 00:47:58.670
Namely, you start out
in a state i.

00:47:58.670 --> 00:48:02.440
You look at all of the times at
which you can get back to

00:48:02.440 --> 00:48:03.940
state i again.

00:48:03.940 --> 00:48:08.780
If you find that that set of
times has a period in it,

00:48:08.780 --> 00:48:19.550
namely, if every such return
time is a multiple of some

00:48:19.550 --> 00:48:25.410
d, then you know that the state
is periodic if d is

00:48:25.410 --> 00:48:26.720
greater than 1.

00:48:26.720 --> 00:48:30.060
And what you have to do is to
find the largest such number.

00:48:30.060 --> 00:48:32.040
And that's the period
of the state.

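NOTE A sketch of the period computation: track the set of states reachable from i in exactly n steps and take the gcd of every n at which i itself is reached. Only zero versus non-zero entries matter. The cutoff max_n = 2M^2 is an assumed bound that is ample for small examples like this one, not a proven general limit; the matrix is hypothetical, with sub-classes {0, 2} and {1, 3}.
```python
from math import gcd
def period(P, i, max_n=None):
    # gcd of all n <= max_n with [P^n]_ii > 0
    M = len(P)
    max_n = max_n or 2 * M * M  # assumed cutoff
    d, current = 0, {i}
    for n in range(1, max_n + 1):
        current = {j for u in current for j in range(M) if P[u][j] > 0}  # states at step n
        if i in current:
            d = gcd(d, n)
    return d
P = [[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
print(period(P, 0))  # 2: returns to state 0 take 2, 4, 6, ... steps
```
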
00:48:32.040 --> 00:48:35.170
All states in the same class
have the same period.

00:48:35.170 --> 00:48:38.690
A recurrent class with period
d greater than one can be

00:48:38.690 --> 00:48:40.550
partitioned into sub-classes--

00:48:40.550 --> 00:48:42.640
this is the best way
of looking at

00:48:42.640 --> 00:48:45.820
periodic classes of states.

00:48:45.820 --> 00:48:49.780
If you have a periodic class of
states, then you can always

00:48:49.780 --> 00:48:53.960
separate it into
d sub-classes.

00:48:53.960 --> 00:48:59.300
And in such a set of
sub-classes, transitions from

00:48:59.300 --> 00:49:03.770
the states in
S1 only go to S2.

00:49:03.770 --> 00:49:07.710
Transitions from states
in S2 only go to S3.

00:49:07.710 --> 00:49:12.430
And so on, up to transitions from S
sub d, which only go back to S1.

00:49:12.430 --> 00:49:16.050
They have to go someplace,
so they go back to S1.

00:49:16.050 --> 00:49:22.500
So as you cycle around, it takes
d steps to cycle from 1

00:49:22.500 --> 00:49:24.000
back to 1 again.

00:49:24.000 --> 00:49:28.410
It takes d steps to cycle
from 2 back to 2 again.

00:49:28.410 --> 00:49:31.300
So you can see the structure of
the Markov chain and why,

00:49:31.300 --> 00:49:34.810
in fact, it does have to be--

00:49:34.810 --> 00:49:38.480
why that class has
to be periodic.

00:49:38.480 --> 00:49:41.870
An ergodic class is a recurrent
aperiodic class.

00:49:41.870 --> 00:49:44.760
In other words, it's a class
where the period is equal to

00:49:44.760 --> 00:49:48.450
1, which means there really
isn't any period.

00:49:48.450 --> 00:49:52.550
A Markov chain with only one
class is ergodic if the class

00:49:52.550 --> 00:49:54.640
is ergodic.

00:49:54.640 --> 00:49:56.880
And the big theorem here--

00:49:56.880 --> 00:49:59.670
I mean, this is probably the
most important theorem about

00:49:59.670 --> 00:50:01.820
finite-state Markov chains.

00:50:01.820 --> 00:50:05.100
You have an ergodic,
finite-state Markov chain.

00:50:05.100 --> 00:50:12.300
Then the limit as n goes to
infinity of the probability of

00:50:12.300 --> 00:50:16.700
arriving in state j after n
steps, given that you started

00:50:16.700 --> 00:50:20.780
in state i, is just some
function of j.

00:50:20.780 --> 00:50:24.400
In other words, when n gets very
large, it doesn't depend

00:50:24.400 --> 00:50:27.370
on how large n is.

00:50:27.370 --> 00:50:28.480
It stays the same.

00:50:28.480 --> 00:50:30.570
It becomes independent of n.

00:50:30.570 --> 00:50:32.450
It doesn't depend on
where you started.

00:50:32.450 --> 00:50:34.860
No matter where you start
in a finite-state

00:50:34.860 --> 00:50:36.570
ergodic Markov chain,

00:50:36.570 --> 00:50:40.580
after a very long time, the
probability of being in a

00:50:40.580 --> 00:50:44.620
state j is independent of where
you started, and it's

00:50:44.620 --> 00:50:48.170
independent of how long
you've been running.

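NOTE A sketch of the steady-state theorem on a small hypothetical ergodic chain (one class, aperiodic because of the self-loops): raising P to a large power makes every row the same vector pi, which also satisfies pi = pi P. For this matrix, solving pi = pi P by hand gives pi = (15/23, 3/23, 5/23).
```python
def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))] for i in range(len(A))]
P = [[0.9, 0.1, 0.0], [0.0, 0.5, 0.5], [0.3, 0.0, 0.7]]
Pn = P
for _ in range(200):
    Pn = mat_mult(Pn, P)  # after the loop, Pn is P^201
for row in Pn:
    print([round(x, 6) for x in row])  # every row is (approximately) the same vector pi
pi = Pn[0]
piP = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print([round(x, 6) for x in piP])  # same as pi: it is a steady state
```
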
00:50:48.170 --> 00:50:52.200
So that's a very strong
kind of--

00:50:52.200 --> 00:50:54.890
it's a very strong kind
of limit theorem.

00:50:54.890 --> 00:50:58.690
It's very much like the law of
large numbers and all of these

00:50:58.690 --> 00:51:00.030
other things.

00:51:00.030 --> 00:51:03.120
I'm going to talk a little bit
at the end about what that

00:51:03.120 --> 00:51:04.820
relationship really is.

00:51:07.360 --> 00:51:10.850
Except what it says is, after a
long time, you're in steady

00:51:10.850 --> 00:51:12.670
state, which is why
it's called the

00:51:12.670 --> 00:51:13.760
steady state theorem.

00:51:13.760 --> 00:51:14.440
Yes?

00:51:14.440 --> 00:51:17.386
AUDIENCE: Could you define the
steady states for periodic

00:51:17.386 --> 00:51:18.636
chains [INAUDIBLE]?

00:51:21.320 --> 00:51:26.460
PROFESSOR: I try to avoid doing
that because you have

00:51:26.460 --> 00:51:28.650
steady state probabilities.

00:51:28.650 --> 00:51:31.810
The steady state probabilities
that you have are, you take--

00:51:34.990 --> 00:51:38.760
is if you have these
sub-classes.

00:51:38.760 --> 00:51:42.690
Then you wind up with a steady
state within each sub-class.

00:51:42.690 --> 00:51:46.900
If you assign each state its
steady-state probability within its

00:51:46.900 --> 00:51:51.870
sub-class, divided by d, then
you get what is the steady

00:51:51.870 --> 00:51:52.930
state probability.

00:51:52.930 --> 00:51:56.870
If you start out in that steady
state, then you're in

00:51:56.870 --> 00:52:00.130
each sub-class with probability
1 over d.

00:52:00.130 --> 00:52:04.230
And you shift to the next
sub-class and you're still in

00:52:04.230 --> 00:52:08.340
steady state, because you have
a probability, 1 over d, of

00:52:08.340 --> 00:52:12.230
being in each of those
sub-classes to start with.

00:52:12.230 --> 00:52:16.970
You shift and you're still in
one of the sub-classes with

00:52:16.970 --> 00:52:19.130
probability 1 over d.

00:52:19.130 --> 00:52:22.690
So there still is a steady
state in that sense, but

00:52:22.690 --> 00:52:24.830
there's not a steady state
in any nice sense.
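Here is a small sketch of the periodic case. It assumes the simplest hypothetical chain with period d = 2, in which the two states always swap. If the chain starts in one state, the distribution keeps alternating and never converges. The vector that puts probability 1 over d on each sub-class, however, is left unchanged by a step.

```python
# Hypothetical chain with period d = 2: the two states always swap.
P = [[0.0, 1.0],
     [1.0, 0.0]]


def step(dist, P):
    """One step of the chain: row vector dist times P."""
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]


dist = [1.0, 0.0]   # start in state 0 with probability 1
history = []
for n in range(4):
    history.append(dist)
    dist = step(dist, P)
print(history)      # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

pi = [0.5, 0.5]     # probability 1/d = 1/2 on each sub-class
print(step(pi, P))  # [0.5, 0.5]
```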

00:52:31.940 --> 00:52:39.470
So anyway, that's
the way it is.

00:52:39.470 --> 00:52:44.860
But you see, if you understand
this theorem for ergodic

00:52:44.860 --> 00:52:48.550
finite-state Markov
chains, and then you

00:52:48.550 --> 00:52:52.540
understand about periodic
chains and this set of

00:52:52.540 --> 00:52:56.070
sub-classes, you can
see within each

00:52:56.070 --> 00:52:59.450
sub-class, if you look at--

00:52:59.450 --> 00:53:00.700
if you look at--

00:53:04.440 --> 00:53:11.500
if you look at time 0, time d,
time 2d, times 3d and 4d, then

00:53:11.500 --> 00:53:14.470
whatever state you start in,
you're going to be in the same

00:53:14.470 --> 00:53:19.380
class after d steps, the same
class after 2d steps.

00:53:19.380 --> 00:53:21.480
You're going to have
a transition

00:53:21.480 --> 00:53:24.280
matrix over d steps.

00:53:24.280 --> 00:53:27.360
And this theorem still applies
to these sub-classes over

00:53:27.360 --> 00:53:29.200
periods of d.

00:53:29.200 --> 00:53:32.030
So the hard part of it
is proving this.

00:53:32.030 --> 00:53:35.180
After you prove this, then you
see that the same thing

00:53:35.180 --> 00:53:38.200
happens over each sub-class
after that.

00:53:43.650 --> 00:53:45.290
That's a pretty major theorem.

00:53:45.290 --> 00:53:46.990
It's difficult to prove.

00:53:46.990 --> 00:53:50.890
A sub-step is to show that for
an ergodic M state Markov

00:53:50.890 --> 00:53:56.380
chain, the probability of being
in state j at time n,

00:53:56.380 --> 00:54:00.930
given that you're in state i at
time 0, is positive for all

00:54:00.930 --> 00:54:05.870
i j, and all n greater than or equal to
M minus 1 squared plus 1.

00:54:05.870 --> 00:54:10.900
It's very surprising that you
have to go this many states--

00:54:10.900 --> 00:54:14.980
this many steps before you get
to the point that all these

00:54:14.980 --> 00:54:18.440
transition probabilities
are positive.

00:54:18.440 --> 00:54:22.450
You looked at this particular kind
of Markov chain in the

00:54:22.450 --> 00:54:26.660
homework, and I hope what you
found out from it was that if

00:54:26.660 --> 00:54:32.040
you start, say, in state two,
then at the next time, you

00:54:32.040 --> 00:54:33.640
have to be in 3.

00:54:33.640 --> 00:54:37.020
Next time, you have to be in
4, you have to be in 5, you

00:54:37.020 --> 00:54:38.560
have to be in 6.

00:54:38.560 --> 00:54:41.300
In other words, the size of
the set that you can be in

00:54:41.300 --> 00:54:46.550
after one step is just 1.

00:54:46.550 --> 00:54:51.170
One possible state here, one
possible state here, one

00:54:51.170 --> 00:54:52.640
possible state here.

00:54:52.640 --> 00:54:57.250
The next step, you're in either
1 or 2, and as you

00:54:57.250 --> 00:55:01.600
travel around, the size of the
set of states you can be in at

00:55:01.600 --> 00:55:06.510
these different steps, is 2,
until you get all the way

00:55:06.510 --> 00:55:07.510
around again.

00:55:07.510 --> 00:55:09.800
And then there's
a way to get--

00:55:09.800 --> 00:55:15.050
when you get to state 6 again,
the set of states enlarges.

00:55:15.050 --> 00:55:18.970
So finally you get up to a
set of states, which is

00:55:18.970 --> 00:55:20.800
up to M minus 1.

00:55:20.800 --> 00:55:25.630
And that's why you get the M
minus 1 squared here, plus 1.

00:55:25.630 --> 00:55:28.710
And this is the only Markov
chain there is.

00:55:28.710 --> 00:55:31.850
You can have as many
states going around

00:55:31.850 --> 00:55:33.770
here as you want to.

00:55:33.770 --> 00:55:36.020
But you have to have this
structure at the end, where

00:55:36.020 --> 00:55:39.930
there's one special state and
one way of circumventing it,

00:55:39.930 --> 00:55:43.930
which means there's one cycle
of size M minus 1, and one

00:55:43.930 --> 00:55:48.440
cycle of size M. And that's the
only way you can get it.

00:55:48.440 --> 00:55:52.780
And that's the only Markov chain
that meets this bound

00:55:52.780 --> 00:55:53.640
with equality.
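The bound can be checked by brute force. This minimal sketch assumes the chain has the structure described above. States 0 through M minus 1 are visited in order, and the last state can jump back either to state 0 or to state 1. This gives one cycle of length M and one of length M minus 1; the state numbering here is our own. Only the pattern of positive entries matters, so booleans stand in for probabilities. For this chain, once every entry of a power is positive, every later power is also positive, because every state has a predecessor.

```python
def wielandt(M):
    """Positive-entry pattern: i -> i + 1 for i < M - 1, and state M - 1 -> 0 or 1.
    One cycle has length M; the other has length M - 1 and skips state 0."""
    A = [[False] * M for _ in range(M)]
    for i in range(M - 1):
        A[i][i + 1] = True
    A[M - 1][0] = True
    A[M - 1][1] = True
    return A


def bool_mul(A, B):
    """Boolean product: entry (i, j) is True if some path i -> k -> j exists."""
    size = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def first_all_positive(A):
    """Smallest n with every entry of A to the n positive (A assumed primitive)."""
    power, n = A, 1
    while not all(all(row) for row in power):
        power = bool_mul(power, A)
        n += 1
    return n


for M in range(3, 8):
    # The last two numbers on each line should agree.
    print(M, first_all_positive(wielandt(M)), (M - 1) ** 2 + 1)
```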

00:55:53.640 --> 00:56:01.470
In all other cases, you get this
property much earlier.

00:56:01.470 --> 00:56:05.200
And often, you get it after just
a linear amount of time.

00:56:09.360 --> 00:56:13.350
The other part of this major
theorem that you reach steady

00:56:13.350 --> 00:56:17.350
state says, let P be
greater than 0.

00:56:17.350 --> 00:56:19.150
In other words, let
all the transition

00:56:19.150 --> 00:56:22.410
probabilities be positive.

00:56:22.410 --> 00:56:28.040
And then define some quantity
alpha as the minimum of the

00:56:28.040 --> 00:56:30.160
transition probabilities.

00:56:30.160 --> 00:56:34.110
And then the theorem says, for
all states j and all n greater

00:56:34.110 --> 00:56:38.470
than or equal to 1, the maximum
over the initial

00:56:38.470 --> 00:56:43.180
states minus the minimum over
the initial states of P sub i

00:56:43.180 --> 00:56:49.040
j to the n plus 1,
that difference is less than

00:56:49.040 --> 00:56:52.470
or equal to the difference
at the n-th step,

00:56:52.470 --> 00:56:54.300
times 1 minus 2 alpha.

00:56:54.300 --> 00:56:58.970
Now 1 minus 2 alpha is
less than 1, since alpha is positive.

00:56:58.970 --> 00:57:03.700
And this says that this maximum
minus minimum is at most 1

00:57:03.700 --> 00:57:07.860
minus 2 alpha to the n, which
says that the limit of the

00:57:07.860 --> 00:57:11.220
maximizing term is equal
to the limit of

00:57:11.220 --> 00:57:12.640
the minimizing term.

00:57:12.640 --> 00:57:13.850
And what does that say?

00:57:13.850 --> 00:57:18.740
It says that everything in the
middle gets squeezed together.

00:57:18.740 --> 00:57:24.200
And it says exactly what we want
it to say, that the limit

00:57:24.200 --> 00:57:30.380
of P sub l j to the n is
independent of l, after n gets

00:57:30.380 --> 00:57:31.310
very large.

00:57:31.310 --> 00:57:34.090
Because the maximum and
the minimum get very

00:57:34.090 --> 00:57:37.560
close to each other.

00:57:37.560 --> 00:57:40.170
We also showed that it
approaches that limit

00:57:40.170 --> 00:57:41.780
exponentially.

00:57:41.780 --> 00:57:43.640
That's what this says.

00:57:43.640 --> 00:57:49.860
The exponent here is just this
alpha, determined in that way.
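This is a quick numerical check of the contraction step. It assumes a hypothetical 3 by 3 matrix with every entry positive, so alpha is its smallest entry. For each column j, the spread between the largest and smallest P sub i j to the n over initial states i should shrink by at least a factor of 1 minus 2 alpha per step. It should also stay below (1 minus 2 alpha) to the n.

```python
# Hypothetical transition matrix with every entry positive.
P = [[0.2, 0.5, 0.3],
     [0.4, 0.4, 0.2],
     [0.1, 0.6, 0.3]]
alpha = min(min(row) for row in P)  # smallest transition probability (0.1 here)


def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def spread(Pn, j):
    """Max over initial states i of Pn[i][j], minus the min over i."""
    column = [row[j] for row in Pn]
    return max(column) - min(column)


bound_holds = True
Pn = P  # holds P to the n, starting at n = 1
for n in range(1, 20):
    Pnext = matmul(Pn, P)
    for j in range(len(P)):
        # One-step contraction, with a tiny tolerance for floating-point error.
        if spread(Pnext, j) > (1 - 2 * alpha) * spread(Pn, j) + 1e-12:
            bound_holds = False
        # Overall geometric bound at step n.
        if spread(Pn, j) > (1 - 2 * alpha) ** n + 1e-12:
            bound_holds = False
    Pn = Pnext

print(bound_holds)
```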

00:57:49.860 --> 00:57:54.630
And the theorem for ergodic
Markov chains then follows by

00:57:54.630 --> 00:58:01.380
just looking at successive h
steps in the Markov chain when

00:58:01.380 --> 00:58:06.110
h is large enough so that all
these transition probabilities

00:58:06.110 --> 00:58:07.360
are positive.

00:58:09.300 --> 00:58:12.220
So you go out far enough
that all the transition

00:58:12.220 --> 00:58:13.860
probabilities are positive.

00:58:13.860 --> 00:58:16.980
And then you look at repetitions
of that, and apply

00:58:16.980 --> 00:58:18.230
this theorem.

00:58:18.230 --> 00:58:21.570
And suddenly you have this
general theorem,

00:58:21.570 --> 00:58:22.900
which is what we wanted.

00:58:27.200 --> 00:58:30.530
An ergodic unichain is a Markov
chain with one

00:58:30.530 --> 00:58:33.870
ergodic recurrent class,
plus perhaps a set

00:58:33.870 --> 00:58:36.550
of transient states.

00:58:36.550 --> 00:58:39.600
And most of the things we talk
about in this course are for

00:58:39.600 --> 00:58:45.870
unichains, usually ergodic
unichains, because if you have

00:58:45.870 --> 00:58:49.160
multiple recurrent classes,
it just makes a mess.

00:58:49.160 --> 00:58:51.780
You wind up in this recurrent
class, or

00:58:51.780 --> 00:58:53.950
this recurrent class.

00:58:53.950 --> 00:59:00.080
And aside from the question of
which one you get to, you

00:59:00.080 --> 00:59:01.730
don't much care about it.

00:59:01.730 --> 00:59:05.790
And the theorem here is for an
ergodic finite-state unichain.

00:59:05.790 --> 00:59:10.370
The limit of P sub i j to the
n, the probability of being in

00:59:10.370 --> 00:59:15.130
state j at time n, given that
you're in state i at time 0,

00:59:15.130 --> 00:59:17.290
is equal to pi sub j.

00:59:17.290 --> 00:59:22.330
In other words, this limit
here exists for all i j.

00:59:22.330 --> 00:59:25.210
And the limit is independent
of i.

00:59:25.210 --> 00:59:27.900
And it's independent of n
as n gets big enough.

00:59:32.820 --> 00:59:42.970
And then also, we can choose
this so that this set of

00:59:42.970 --> 00:59:47.680
probabilities here satisfies
this, what's called the steady

00:59:47.680 --> 00:59:51.780
state condition, the sum
of pi i times P sub i j

00:59:51.780 --> 00:59:53.140
is equal to pi j.

00:59:53.140 --> 00:59:56.380
In other words, if you start out
in steady state, and you

00:59:56.380 --> 01:00:00.300
look at the probabilities of
being in the different states

01:00:00.300 --> 01:00:06.610
at the next time unit, this is
the probability of being in

01:00:06.610 --> 01:00:11.610
state j at time n plus 1, if
this is the probability of

01:00:11.610 --> 01:00:14.420
being in state i at time n.

01:00:14.420 --> 01:00:17.790
So that condition
gets satisfied.

01:00:17.790 --> 01:00:19.280
That condition is satisfied.

01:00:19.280 --> 01:00:22.760
You just stay in steady
state forever.
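This minimal sketch solves the steady-state condition directly. It assumes the chain is a unichain, so the solution is unique. One of the M equations in pi P = pi is redundant: summed over j, they reduce to 0 = 0, because every row of P sums to 1. That equation is therefore replaced by the normalization condition. Exact fractions avoid rounding error. The 3-state matrix is a hypothetical example.

```python
from fractions import Fraction as F


def steady_state(P):
    """Solve pi P = pi, sum(pi) = 1 exactly; assumes a unichain (unique solution)."""
    M = len(P)
    # Equation j: sum_i pi_i (P[i][j] - [i == j]) = 0, stored as row j.
    A = [[F(P[i][j]) - (1 if i == j else 0) for i in range(M)] for j in range(M)]
    b = [F(0)] * M
    # Summed over j these equations give 0 = 0, so the last one is redundant:
    # replace it with the normalization sum_i pi_i = 1.
    A[-1] = [F(1)] * M
    b[-1] = F(1)
    # Gauss-Jordan elimination with exact fractions.
    for col in range(M):
        pivot = next(r for r in range(col, M) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(M):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(M)]


# Hypothetical 3-state ergodic chain.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
pi = steady_state(P)
print(pi)  # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
```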

01:00:22.760 --> 01:00:29.210
And pi i has to be positive for
a recurrent i, and pi i is

01:00:29.210 --> 01:00:31.680
equal to 0 otherwise.

01:00:31.680 --> 01:00:35.230
So this is just a
generalization

01:00:35.230 --> 01:00:38.090
of the ergodic theorem.

01:00:38.090 --> 01:00:43.400
And this is not what people
refer to as the ergodic

01:00:43.400 --> 01:00:48.160
theorem, which is a much more
general theorem than this.

01:00:48.160 --> 01:00:50.900
This is the ergodic theorem for
the case of finite state

01:00:50.900 --> 01:00:53.110
Markov chains.

01:00:53.110 --> 01:00:59.190
You can restate this in matrix
form as the limit of the

01:00:59.190 --> 01:01:02.900
matrix P to the n-th power.

01:01:02.900 --> 01:01:06.680
What I didn't mention here and
what I probably didn't mention

01:01:06.680 --> 01:01:11.880
enough in the notes is
that P sub i j--

01:01:32.360 --> 01:01:47.560
but also, if you take the matrix
P times P times P, n

01:01:47.560 --> 01:01:53.880
times, namely, you take the
matrix, P to the n.

01:01:53.880 --> 01:02:00.720
This says that P sub i j to the n
is the i j element of P to the n.

01:02:09.900 --> 01:02:12.530
I'm sure all of you know that
by now, because you've been

01:02:12.530 --> 01:02:15.310
using it all the time.

01:02:15.310 --> 01:02:18.820
And what this says here--

01:02:18.820 --> 01:02:26.150
what we've said before is that
every row of this matrix, P to

01:02:26.150 --> 01:02:28.600
the n, is the same.

01:02:28.600 --> 01:02:31.290
Every row is equal to pi.

01:02:31.290 --> 01:02:47.786
P to the n tends to a matrix
which is pi 1, pi 2,

01:02:47.786 --> 01:02:52.120
up to pi sub M.

01:02:52.120 --> 01:02:57.000
Pi 1, pi 2, up to pi sub M.

01:03:00.760 --> 01:03:06.770
Pi 1, pi 2, up to pi sub M.

01:03:06.770 --> 01:03:14.660
And the easiest way to express
this is the vector e times pi,

01:03:14.660 --> 01:03:24.960
where e is the column vector of all 1s.

01:03:24.960 --> 01:03:32.755
In other words, if you take a
column matrix, column 1, 1, 1,

01:03:32.755 --> 01:03:40.670
1, 1, and you multiply this by
a row vector, pi 1 through pi

01:03:40.670 --> 01:03:48.030
sub M, what you get is, for this
first row multiplied by

01:03:48.030 --> 01:03:51.210
this, this gives you--

01:03:51.210 --> 01:03:53.480
well, in fact, if you
multiply this out,

01:03:53.480 --> 01:03:56.360
this is what you get.

01:03:56.360 --> 01:03:58.650
And if you've never gone through
the trouble of seeing

01:03:58.650 --> 01:04:03.880
that this multiplication leads
to this, please do it, because

01:04:03.880 --> 01:04:07.170
it's important to notice
that correspondence.
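The multiplication suggested above can be written out for a hypothetical 3-state steady-state vector. The column of ones times the row vector pi gives a matrix whose rows all equal pi.

```python
pi = [0.25, 0.5, 0.25]  # hypothetical steady-state row vector
e = [1, 1, 1]           # column vector of ones

# (M x 1) column times (1 x M) row: entry (i, j) is e[i] * pi[j].
limit = [[e_i * pi_j for pi_j in pi] for e_i in e]
for row in limit:
    print(row)          # every row prints as [0.25, 0.5, 0.25]
```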

01:04:14.530 --> 01:04:18.080
We got specific results by
looking at the eigenvalues and

01:04:18.080 --> 01:04:20.880
eigenvectors of stochastic
matrices.

01:04:20.880 --> 01:04:24.720
And a stochastic matrix is the
matrix of a Markov chain.

01:04:28.500 --> 01:04:31.290
So some of these things
are sort of obvious.

01:04:31.290 --> 01:04:36.870
Lambda is an eigenvalue of P, if
and only if P minus lambda

01:04:36.870 --> 01:04:38.120
I is singular, where I is the identity matrix.

01:04:41.670 --> 01:04:45.040
This set of relationships
is not obvious.

01:04:45.040 --> 01:04:48.130
This is obvious linear
algebra.

01:04:48.130 --> 01:04:51.250
This is something that when
you study eigenvalues and

01:04:51.250 --> 01:04:55.430
eigenvectors in linear algebra,
you recognize that

01:04:55.430 --> 01:04:57.270
this is a summary of
a lot of things.

01:04:57.270 --> 01:05:01.440
If and only if this determinant
is equal to 0,

01:05:01.440 --> 01:05:05.650
which is true if and only if
there's some vector nu for

01:05:05.650 --> 01:05:12.560
which P times nu equals lambda
times nu for nu unequal to 0.

01:05:12.560 --> 01:05:16.920
And if and only if pi P equals
lambda pi for some

01:05:16.920 --> 01:05:18.210
pi unequal to 0.

01:05:18.210 --> 01:05:23.250
In other words, if this
determinant is equal to 0, it

01:05:23.250 --> 01:05:32.040
means that the matrix P minus
lambda I is singular.

01:05:32.040 --> 01:05:35.950
If the matrix is singular, there
has to be some solution

01:05:35.950 --> 01:05:38.370
to this equation here.

01:05:38.370 --> 01:05:40.220
There has to be some
solution to this

01:05:40.220 --> 01:05:44.530
left eigenvector equation.

01:05:44.530 --> 01:05:48.740
Now, once you see this, you
notice that e is always a

01:05:48.740 --> 01:05:53.750
right eigenvector of P. Every
stochastic matrix in the world

01:05:53.750 --> 01:05:58.920
has the property that e is a
right eigenvector of it.

01:05:58.920 --> 01:05:59.800
Why is that?

01:05:59.800 --> 01:06:05.230
Because all of the rows of a
stochastic matrix sum to 1.

01:06:05.230 --> 01:06:10.070
If you start off in state i, the
sum of the probabilities of the states

01:06:10.070 --> 01:06:14.530
you can be in at the next
step is equal to 1.

01:06:14.530 --> 01:06:17.120
You have to go somewhere.

01:06:17.120 --> 01:06:21.650
So e is always a right
eigenvector of P with

01:06:21.650 --> 01:06:23.300
eigenvalue 1.
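A tiny check shows why e is always a right eigenvector with eigenvalue 1. It uses a hypothetical stochastic matrix with exact fractions. P times e simply adds up each row, and every row sums to 1.

```python
from fractions import Fraction as F

# Hypothetical stochastic matrix; each row sums to 1.
P = [[F(1, 5), F(1, 2), F(3, 10)],
     [F(2, 5), F(2, 5), F(1, 5)],
     [F(1, 10), F(3, 5), F(3, 10)]]
e = [F(1)] * 3

# (P e)_i is the sum of row i of P.
Pe = [sum(P[i][k] * e[k] for k in range(3)) for i in range(3)]
print(Pe == e)  # True: P e = 1 * e
```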

01:06:23.300 --> 01:06:26.510
Since e is always a right
eigenvector of P with

01:06:26.510 --> 01:06:29.850
eigenvalue 1, we go up here.

01:06:29.850 --> 01:06:32.460
We look at these if and
only if statements.

01:06:32.460 --> 01:06:34.890
We see, then, P minus I must
be singular.

01:06:34.890 --> 01:06:38.410
And then pi times P
equals pi for some nonzero pi.

01:06:38.410 --> 01:06:41.410
So no matter how many recurrent
classes we have, no

01:06:41.410 --> 01:06:46.430
matter what periodicity we have
in each of them, there's

01:06:46.430 --> 01:06:53.170
always a solution to pi
times P equals pi.

01:06:53.170 --> 01:06:55.550
There's always at least one
steady state vector.

01:06:59.320 --> 01:07:03.580
This determinant is an M-th
degree polynomial in lambda.

01:07:03.580 --> 01:07:08.150
M-th degree polynomials
have M roots.

01:07:08.150 --> 01:07:10.400
They aren't necessarily
distinct.

01:07:10.400 --> 01:07:14.040
The multiplicity of an
eigenvalue is the number of roots

01:07:14.040 --> 01:07:15.500
of that value.

01:07:15.500 --> 01:07:19.780
And what is the multiplicity
of lambda equals 1?

01:07:19.780 --> 01:07:22.530
How many different roots
are there which have

01:07:22.530 --> 01:07:24.360
lambda equals 1?

01:07:24.360 --> 01:07:26.940
Well it turns out to be just
the number of recurrent

01:07:26.940 --> 01:07:29.550
classes that you have.

01:07:29.550 --> 01:07:32.750
If you have a bunch of recurrent
classes, within each

01:07:32.750 --> 01:07:37.330
recurrent class, there's a
solution to pi P equals pi,

01:07:37.330 --> 01:07:41.540
which is non-zero only on
that recurrent class.

01:07:41.540 --> 01:07:46.340
Namely, you take this huge
Markov chain and you say, I

01:07:46.340 --> 01:07:48.650
don't care about any
of this except this

01:07:48.650 --> 01:07:50.890
one recurrent class.

01:07:50.890 --> 01:07:53.990
If we look at this one recurrent
class, and solve for

01:07:53.990 --> 01:07:57.500
the steady state probability in
that one recurrent class,

01:07:57.500 --> 01:08:01.220
then we get an eigenvector
which is non-zero on that

01:08:01.220 --> 01:08:05.990
class, 0 everywhere else, that
has an eigenvalue 1.

01:08:05.990 --> 01:08:08.050
And for every other recurrent
class, we

01:08:08.050 --> 01:08:10.590
get the same situation.

01:08:10.590 --> 01:08:14.150
So the multiplicity of lambda
equals 1 is equal to the

01:08:14.150 --> 01:08:17.260
number of recurrent classes.
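Here is a sketch of the multiplicity claim on a hypothetical 4-state chain. It has two recurrent classes, {0, 1} and {2}, plus a transient state 3. Each class contributes its own steady-state vector, which is zero off that class. The space of solutions to pi (P minus I) = 0 has dimension M minus the rank of P minus I, which is 2 here. For a stochastic matrix, the eigenvalue 1 has as many independent eigenvectors as its multiplicity, so counting these solutions counts the multiplicity.

```python
from fractions import Fraction as F

# Hypothetical chain: {0, 1} and {2} are recurrent classes, 3 is transient.
P = [[F(1, 2), F(1, 2), F(0), F(0)],
     [F(1, 2), F(1, 2), F(0), F(0)],
     [F(0), F(0), F(1), F(0)],
     [F(1, 4), F(0), F(1, 4), F(1, 2)]]
M = len(P)


def left_mul(v, P):
    """Row vector v times matrix P."""
    return [sum(v[i] * P[i][j] for i in range(M)) for j in range(M)]


def rank(A):
    """Rank by Gauss-Jordan elimination with exact fractions."""
    A = [row[:] for row in A]
    r = 0
    for col in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][col] != 0:
                f = A[i][col] / A[r][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


pi_a = [F(1, 2), F(1, 2), F(0), F(0)]  # steady state on class {0, 1}
pi_b = [F(0), F(0), F(1), F(0)]        # steady state on class {2}
P_minus_I = [[P[i][j] - (1 if i == j else 0) for j in range(M)] for i in range(M)]

print(left_mul(pi_a, P) == pi_a, left_mul(pi_b, P) == pi_b)  # True True
print(M - rank(P_minus_I))                                    # 2
```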

01:08:17.260 --> 01:08:21.950
If you didn't get that proof
on the fly, it gets

01:08:21.950 --> 01:08:23.310
proved in the notes.

01:08:23.310 --> 01:08:27.130
And if you don't get the proof,
just remember that

01:08:27.130 --> 01:08:28.380
that's the way it is.

01:08:30.859 --> 01:08:34.859
For the special case where all
M eigenvalues are distinct,

01:08:34.859 --> 01:08:38.640
the right eigenvectors are
linearly independent.

01:08:38.640 --> 01:08:42.620
You remember that proof we went
through that all of the

01:08:42.620 --> 01:08:46.470
left eigenvectors and all the
right eigenvectors are all

01:08:46.470 --> 01:08:49.870
orthonormal to each other,
or you can make them all

01:08:49.870 --> 01:08:52.270
orthonormal to each other?

01:08:52.270 --> 01:08:57.380
That says that if the right
eigenvectors are linearly

01:08:57.380 --> 01:09:01.120
independent, you can represent
them as the columns of an

01:09:01.120 --> 01:09:04.750
invertible matrix U.
Then P times U is

01:09:04.750 --> 01:09:06.819
equal to U times lambda.

01:09:06.819 --> 01:09:09.800
What does this equation say?

01:09:09.800 --> 01:09:12.460
You split it up into a
bunch of equations.

01:09:16.500 --> 01:09:46.080
P times U and we look at it as
nu 1, nu 2, up to nu M.

01:09:46.080 --> 01:09:52.580
I guess better put the
superscripts on it.

01:09:56.100 --> 01:10:01.270
If I take the matrix U and just
view it as M different

01:10:01.270 --> 01:10:05.190
columns, then what this
is saying is that

01:10:05.190 --> 01:10:06.545
this is equal to--

01:10:17.290 --> 01:10:35.540
nu 1, nu 2, nu M, times lambda
1, lambda 2, up to lambda M.

01:10:35.540 --> 01:10:38.500
Now you multiply this out,
and what do you get?

01:10:38.500 --> 01:10:41.860
You get nu 1 times lambda 1.

01:10:41.860 --> 01:10:46.190
You get nu 2 times lambda 2
for the second column, nu M

01:10:46.190 --> 01:10:49.820
times lambda M for the last
column, and here you get P

01:10:49.820 --> 01:10:54.360
times nu 1 is equal to nu 1
times lambda 1, and so forth.

01:10:54.360 --> 01:10:59.240
So all this vector equation says
is the same thing that

01:10:59.240 --> 01:11:04.760
these M individual eigenvector
equations say.

01:11:04.760 --> 01:11:11.160
It's just a more compact way
of saying the same thing.

01:11:11.160 --> 01:11:17.300
And if these eigenvectors span
this space, then this set of

01:11:17.300 --> 01:11:20.710
eigenvectors are linearly
independent of each other.

01:11:20.710 --> 01:11:24.860
And when you look at the set of
them, this matrix here has

01:11:24.860 --> 01:11:26.440
to have an inverse.

01:11:26.440 --> 01:11:34.890
So you can also express this
as P equals this vector--

01:11:34.890 --> 01:11:40.820
this matrix of right
eigenvectors times the

01:11:40.820 --> 01:11:46.630
diagonal matrix lambda, times
the inverse of this matrix.

01:11:46.630 --> 01:11:49.880
Matrix U to the minus 1 turns
out to have rows equal to the

01:11:49.880 --> 01:11:51.730
left eigenvectors.

01:11:51.730 --> 01:11:54.330
That's because these
eigenvectors--

01:11:54.330 --> 01:11:57.440
that's because the right
eigenvectors and the left

01:11:57.440 --> 01:12:01.270
eigenvectors are orthogonal
to each other.

01:12:04.670 --> 01:12:09.690
When we then split up this
matrix into a sum of M

01:12:09.690 --> 01:12:13.830
different matrices, each matrix
having only one--

01:12:41.270 --> 01:12:43.460
and so forth.

01:12:43.460 --> 01:12:45.710
Then what you get--

01:12:45.710 --> 01:12:48.490
here's this--

01:12:48.490 --> 01:12:54.730
this nice equation here, which
says that if all the

01:12:54.730 --> 01:12:58.870
eigenvalues are distinct, then
you can always represent a

01:12:58.870 --> 01:13:03.420
stochastic matrix as the sum of
lambda i times nu to the i

01:13:03.420 --> 01:13:04.670
times pi to the i.

01:13:04.670 --> 01:13:10.000
More importantly, if you take
this equation here and look at

01:13:10.000 --> 01:13:14.470
P to the n, P to the n is U
times lambda times U to the

01:13:14.470 --> 01:13:18.820
minus 1, times U times lambda
times U to the minus 1, blah,

01:13:18.820 --> 01:13:20.270
blah, blah forever.

01:13:20.270 --> 01:13:24.030
Each U to the minus 1 cancels
out with the following U. And

01:13:24.030 --> 01:13:29.330
you wind up with P to the n
equals U times lambda to the

01:13:29.330 --> 01:13:33.170
n, U to the minus 1.

01:13:33.170 --> 01:13:40.250
Which says that P to the
n is just a sum here.

01:13:40.250 --> 01:13:44.650
It's the sum of the eigenvalues
to the n-th power

01:13:44.650 --> 01:13:47.320
times these pairs of
eigenvectors here.

01:13:47.320 --> 01:13:51.660
So this is a general
decomposition for P to the n.
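This sketch checks the decomposition for a hypothetical 2-state chain with P = [[1 - a, a], [b, 1 - b]], whose eigenpairs have closed forms. The first eigenvalue is lambda 1 = 1, with nu = e and pi = (b, a)/(a + b). The second is lambda 2 = 1 - a - b, with nu = (a, -b) and pi = (1, -1)/(a + b). Each pi is normalized so that pi times its own nu equals 1. The code confirms that the sum of lambda i to the n times nu to the i times pi to the i matches P to the n computed by repeated multiplication.

```python
from fractions import Fraction as F

a, b = F(1, 3), F(1, 4)  # hypothetical transition probabilities
P = [[1 - a, a], [b, 1 - b]]

# (eigenvalue, right eigenvector nu, left eigenvector pi) with pi . nu = 1.
eig = [
    (F(1), [F(1), F(1)], [b / (a + b), a / (a + b)]),
    (1 - a - b, [a, -b], [1 / (a + b), -1 / (a + b)]),
]


def matmul(A, B):
    """Multiply two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_power(n):
    """Sum over i of lambda_i**n times the outer product nu^(i) pi^(i)."""
    return [[sum(lam ** n * nu[r] * pi[c] for lam, nu, pi in eig)
             for c in range(2)] for r in range(2)]


Pn = P
matches = []
for n in range(1, 8):
    matches.append(Pn == spectral_power(n))
    Pn = matmul(Pn, P)
print(all(matches))  # True
```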

01:13:51.660 --> 01:13:56.010
What we're interested in is what
happens as n gets large.

01:13:56.010 --> 01:13:59.360
If we have an ergodic unichain, we
already know what happens as n

01:13:59.360 --> 01:14:00.570
gets large.

01:14:00.570 --> 01:14:07.110
We know that as n gets large,
we wind up with just 1 times

01:14:07.110 --> 01:14:12.480
this eigenvector e times
this eigenvector pi.

01:14:12.480 --> 01:14:15.760
Which says that all of the other
eigenvalues, raised to the n-th power, have to go

01:14:15.760 --> 01:14:19.670
to 0, which says that the
magnitudes of these other

01:14:19.670 --> 01:14:22.200
eigenvalues are less than 1.

01:14:22.200 --> 01:14:23.450
So they're all going away.

01:14:26.600 --> 01:14:32.300
So the facts here are that all
eigenvalues lambda have to

01:14:32.300 --> 01:14:35.310
satisfy the magnitude
of lambda is less

01:14:35.310 --> 01:14:36.740
than or equal to 1.

01:14:36.740 --> 01:14:39.680
That's what I just argued.

01:14:39.680 --> 01:14:44.530
For each recurrent class C,
there's one lambda equals 1,

01:14:44.530 --> 01:14:47.750
with a left eigenvector
equal to the steady state on

01:14:47.750 --> 01:14:51.190
that recurrent class
and 0 elsewhere.

01:14:51.190 --> 01:14:55.230
The right eigenvector nu
satisfies the following: the limit as n goes

01:14:55.230 --> 01:14:56.410
to infinity

01:14:56.410 --> 01:15:00.930
of the probability that X sub n
is in this recurrent class,

01:15:00.930 --> 01:15:04.850
given that X sub 0 is equal
to i, is equal to the i-th

01:15:04.850 --> 01:15:08.700
component of that right
eigenvector.

01:15:08.700 --> 01:15:13.200
In other words, if you have a
Markov chain which has several

01:15:13.200 --> 01:15:16.480
recurrent classes, and you
want to find out what the

01:15:16.480 --> 01:15:23.630
probability is, starting in the
transient state, of going

01:15:23.630 --> 01:15:29.170
to one of those classes, this is
what tells you that answer.

01:15:29.170 --> 01:15:33.510
This says that the probability
that you go to a particular

01:15:33.510 --> 01:15:37.530
recurrent class C, given that
you start off in a particular

01:15:37.530 --> 01:15:41.340
transient state i, is whatever
that right eigenvector

01:15:41.340 --> 01:15:42.690
turns out to be.

01:15:42.690 --> 01:15:46.170
And you can solve that right
eigenvector problem just as an

01:15:46.170 --> 01:15:48.920
M by M set of linear
equations.

01:15:48.920 --> 01:15:51.170
So you can find the
probabilities of going from

01:15:51.170 --> 01:15:56.370
each transient state to each recurrent class just by
solving that set of linear

01:15:56.370 --> 01:16:01.650
equations and finding those
eigenvector equations.
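This minimal sketch shows that linear-equation approach, often called first-step analysis. For each transient i, nu_i = sum over j of P_ij nu_j, with nu fixed at 1 on the target class C and 0 on every other recurrent class. The 4-state chain is hypothetical: states 0 and 1 are absorbing, so each forms its own recurrent class, and states 2 and 3 are transient.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(A)
    A = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def reach_probabilities(P, transient, target):
    """nu_i = Pr{eventually enter the class `target` | X_0 = i} for transient i,
    from nu_i = sum_j P[i][j] nu_j with nu = 1 on `target` and nu = 0 on every
    other recurrent class."""
    A = [[F(int(i == k)) - P[i][k] for k in transient] for i in transient]
    b = [sum((P[i][j] for j in target), F(0)) for i in transient]
    return dict(zip(transient, solve(A, b)))


# Hypothetical chain: states 0 and 1 are absorbing, states 2 and 3 are transient.
P = [[F(1), F(0), F(0), F(0)],
     [F(0), F(1), F(0), F(0)],
     [F(1, 2), F(0), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3), F(1, 3)]]

probs = reach_probabilities(P, transient=[2, 3], target=[0])
print(probs)  # {2: Fraction(2, 3), 3: Fraction(1, 3)}
```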

01:16:01.650 --> 01:16:05.770
For each recurrent periodic
class of period d, there are d

01:16:05.770 --> 01:16:09.140
eigenvalues equally spaced
on the unit circle.

01:16:09.140 --> 01:16:13.330
There are no other eigenvalues
with lambda equals 1-- with a

01:16:13.330 --> 01:16:15.080
magnitude of lambda equals 1.

01:16:15.080 --> 01:16:19.070
In other words, for each
recurrent class, you get one

01:16:19.070 --> 01:16:20.700
eigenvalue that's equal to 1.

01:16:20.700 --> 01:16:25.260
If that recurrent class is
periodic, you get a bunch of

01:16:25.260 --> 01:16:30.640
other eigenvalues put around
the unit circle.

01:16:30.640 --> 01:16:35.380
And those are all the
eigenvalues there are.
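Here is a sketch for the simplest hypothetical periodic class: a single cycle of d states, where state i always moves to state i + 1 mod d. Each d-th root of unity lambda is an eigenvalue, with a right eigenvector whose k-th component is lambda to the k. The d eigenvalues therefore sit equally spaced on the unit circle.

```python
import cmath

d = 4  # hypothetical period
# Deterministic cycle: state i moves to state (i + 1) mod d with probability 1.
P = [[1.0 if j == (i + 1) % d else 0.0 for j in range(d)] for i in range(d)]

residuals = []
for m in range(d):
    lam = cmath.exp(2j * cmath.pi * m / d)  # m-th of the d roots of unity
    nu = [lam ** k for k in range(d)]       # candidate right eigenvector
    P_nu = [sum(P[i][k] * nu[k] for k in range(d)) for i in range(d)]
    residuals.append(max(abs(P_nu[i] - lam * nu[i]) for i in range(d)))
    print(m, lam)

print(max(residuals) < 1e-12)  # True: P nu = lambda nu for every root
```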

01:16:35.380 --> 01:16:36.296
Oh my God.

01:16:36.296 --> 01:16:38.000
It's--

01:16:38.000 --> 01:16:39.930
I thought I was talking
quickly.

01:16:39.930 --> 01:16:44.870
But anyway, if the eigenvectors
don't span the

01:16:44.870 --> 01:16:50.360
space, then P to the n is equal
to U times J to the n,

01:16:50.360 --> 01:16:55.350
times U to the minus 1, where
J is a Jordan form.

01:16:55.350 --> 01:16:58.320
What you saw in the homework
when you looked at the--

01:17:02.030 --> 01:17:04.075
when you looked at the
Markov chain--

01:17:28.120 --> 01:17:28.620
OK.

01:17:28.620 --> 01:17:35.020
This is one recurrent class
with this one node in it.

01:17:35.020 --> 01:17:38.030
These two nodes are
both transient.

01:17:38.030 --> 01:17:41.720
If you look at how long it takes
to get from here over to

01:17:41.720 --> 01:17:45.120
there, those transition
probabilities do not

01:17:45.120 --> 01:17:51.620
correspond to this
equation here.

01:17:51.620 --> 01:17:54.075
Instead, P sub 1 2--

01:17:57.400 --> 01:18:00.230
P sub 2 3, the way I've
drawn it here.

01:18:00.230 --> 01:18:07.140
P sub 1 3 is n times this
eigenvalue, which

01:18:07.140 --> 01:18:09.760
is 1/2 in this case.

01:18:09.760 --> 01:18:12.820
And it doesn't correspond to
this, which is why you need a

01:18:12.820 --> 01:18:14.290
Jordan form.

01:18:14.290 --> 01:18:17.860
I said that Jordan forms
are excessively ugly.

01:18:17.860 --> 01:18:22.120
Jordan forms are really very
classy and nice ways of

01:18:22.120 --> 01:18:24.460
dealing with a problem
which is very ugly.

01:18:24.460 --> 01:18:26.340
So don't blame Jordan.

01:18:26.340 --> 01:18:29.670
Jordan simplified
things for us.
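The factor of n comes from the Jordan block. This sketch uses exact arithmetic and assumes a 2 by 2 Jordan block with the eigenvalue 1/2 from the example above. Its n-th power has lambda to the n on the diagonal and n times lambda to the n minus 1 above it. As a result, P to the n picks up terms proportional to n, not just powers of lambda.

```python
from fractions import Fraction as F

lam = F(1, 2)                   # repeated eigenvalue (1/2, as in the example)
J = [[lam, F(1)], [F(0), lam]]  # 2 x 2 Jordan block


def matmul(A, B):
    """Multiply two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


Jn = J
for n in range(2, 10):
    Jn = matmul(Jn, J)
    expected = [[lam ** n, n * lam ** (n - 1)], [F(0), lam ** n]]
    print(n, Jn == expected)  # True for every n
```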

01:18:29.670 --> 01:18:36.840
So that's roughly as far as we
went with Markov chains.

01:18:40.970 --> 01:18:44.910
Renewal processes, we don't have
to review them because

01:18:44.910 --> 01:18:47.400
you're already immediately
familiar with them.

01:18:50.610 --> 01:18:55.910
I will do one thing next time
with renewal processes and

01:18:55.910 --> 01:19:00.290
Markov chains, which is to
explain to you why the

01:19:00.290 --> 01:19:04.660
expected amount of time to get
from one state back to itself

01:19:04.660 --> 01:19:07.380
is equal to 1 over pi--

01:19:07.380 --> 01:19:09.160
1 over pi sub i.

01:19:09.160 --> 01:19:10.790
You did that in the homework.

01:19:10.790 --> 01:19:12.900
And it was an awful
way to do it.

01:19:12.900 --> 01:19:14.340
And there's a nice
way to do it.

01:19:14.340 --> 01:19:15.860
I'll talk about that next time.
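This is a numerical sanity check of that claim, not the renewal argument. It uses a hypothetical 3-state chain whose steady state is (1/4, 1/2, 1/4). The expected return time is computed by first-step analysis. Let m_j be the expected time to first reach i starting from j. Then m_j = 1 + sum over k not equal to i of P_jk m_k, and the return time is 1 + sum over k not equal to i of P_ik m_k. It should equal 1 over pi sub i.

```python
from fractions import Fraction as F

# Hypothetical ergodic chain with steady state pi = (1/4, 1/2, 1/4).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(A)
    A = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def mean_return_time(P, i):
    """Expected number of steps to return to state i, starting in state i."""
    others = [j for j in range(len(P)) if j != i]
    # m_j = 1 + sum_{k != i} P[j][k] m_k for every j != i.
    A = [[F(int(j == k)) - P[j][k] for k in others] for j in others]
    m = dict(zip(others, solve(A, [F(1)] * len(others))))
    return 1 + sum(P[i][k] * m[k] for k in others)


for i in range(3):
    print(i, mean_return_time(P, i), 1 / pi[i])  # the last two columns agree
```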