WEBVTT
00:00:00.530 --> 00:00:02.960
The following content is
provided under a Creative
00:00:02.960 --> 00:00:04.370
Commons license.
00:00:04.370 --> 00:00:07.410
Your support will help MIT
OpenCourseWare continue to
00:00:07.410 --> 00:00:11.060
offer high-quality educational
resources for free.
00:00:11.060 --> 00:00:13.960
To make a donation or view
additional materials from
00:00:13.960 --> 00:00:19.790
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:19.790 --> 00:00:21.040
ocw.mit.edu.
00:00:23.540 --> 00:00:29.950
PROFESSOR: OK, let's get started
again on finite-state
00:00:29.950 --> 00:00:31.490
Markov chains.
00:00:31.490 --> 00:00:33.130
Sorry I was away last week.
00:00:33.130 --> 00:00:38.390
It was a long-term commitment
that I had to honor.
00:00:38.390 --> 00:00:40.170
But I think I will
be around for all
00:00:40.170 --> 00:00:41.550
the rest of the lectures.
00:00:41.550 --> 00:00:49.950
So I want to start out by
reviewing just a little bit.
00:00:49.950 --> 00:00:53.570
I'm spending a lot more time on
finite-state Markov chains
00:00:53.570 --> 00:00:57.940
than we usually do in this
course, partly because I've
00:00:57.940 --> 00:01:02.230
rewritten this section, partly
because I think the material
00:01:02.230 --> 00:01:04.510
is very important.
00:01:04.510 --> 00:01:09.850
It's sort of bread-and-butter
stuff, of
00:01:09.850 --> 00:01:11.930
discrete stochastic processes.
00:01:11.930 --> 00:01:13.690
You use it all the time.
00:01:13.690 --> 00:01:18.080
It's a foundation for almost
everything else.
00:01:18.080 --> 00:01:22.410
And after thinking about it
for a long time, it really
00:01:22.410 --> 00:01:23.800
isn't all that complicated.
00:01:23.800 --> 00:01:27.760
I used to think that all these
details of finding eigenvalues
00:01:27.760 --> 00:01:32.580
and eigenvectors and so on
was extremely tedious.
00:01:32.580 --> 00:01:35.810
And it turns out that
there's a very nice
00:01:35.810 --> 00:01:37.850
pleasant theory there.
00:01:37.850 --> 00:01:40.960
You can find all of these things
after you know what
00:01:40.960 --> 00:01:46.790
you're doing by very simple
computer packages.
00:01:46.790 --> 00:01:49.160
But they don't help if you don't
know what's going on.
00:01:49.160 --> 00:01:52.930
So here, we're trying to figure
out what's going on.
00:01:52.930 --> 00:01:57.720
So let's start out by reviewing
what we know about
00:01:57.720 --> 00:02:05.120
ergodic unit chains and
proceed from there.
00:02:05.120 --> 00:02:10.710
An ergodic finite-state Markov
chain has transition
00:02:10.710 --> 00:02:17.190
probabilities which, if you look
at the transition matrix
00:02:17.190 --> 00:02:21.180
raised to the nth power, what
that gives you is the
00:02:21.180 --> 00:02:24.440
transition probabilities of
an n-step Markov chain.
00:02:24.440 --> 00:02:29.360
In other words, you start at
time 0, and at time n, you
00:02:29.360 --> 00:02:31.310
look at what state you're in.
00:02:31.310 --> 00:02:38.190
P sub ij to the nth power is
then the probability that
00:02:38.190 --> 00:02:42.110
you're in state j at time
n, given that you're in
00:02:42.110 --> 00:02:45.910
state i at time 0.
00:02:45.910 --> 00:02:48.980
So this has all the information
that you want
00:02:48.980 --> 00:02:53.180
about what happens to Markov
chain as time gets large.
00:02:53.180 --> 00:02:57.010
One of the things we're most
concerned with is, do you go
00:02:57.010 --> 00:02:58.440
to steady state?
00:02:58.440 --> 00:03:01.150
And if you do go to steady
state, how fast do you go to
00:03:01.150 --> 00:03:02.610
steady state?
00:03:02.610 --> 00:03:05.830
And of course, this matrix
tells you the whole story
00:03:05.830 --> 00:03:11.720
there, because if you go to
steady state, and the Markov
00:03:11.720 --> 00:03:24.390
chain forgets where it started,
then P sub ij to the
00:03:24.390 --> 00:03:29.240
n goes to some constant, pi sub
j, which is independent of
00:03:29.240 --> 00:03:33.190
the starting state, i,
and independent of m,
00:03:33.190 --> 00:03:35.080
asymptotically, as n gets big.
00:03:35.080 --> 00:03:41.730
So this pi is a strictly
positive probability vector.
00:03:41.730 --> 00:03:43.680
I shouldn't say so it is.
00:03:43.680 --> 00:03:47.590
That's something that
was shown last time.
00:03:47.590 --> 00:03:55.530
If you multiply both sides of
this equation by P sub jk in
00:03:55.530 --> 00:03:57.980
sum over k, then what
do you get?
00:03:57.980 --> 00:04:01.580
You get P sub ik to
the n plus 1.
00:04:01.580 --> 00:04:02.995
That goes to a limit also.
00:04:02.995 --> 00:04:04.740
If the limit in n
goes to infin--
00:04:07.670 --> 00:04:10.730
then the limit as n plus 1 goes
to infinity is clearly
00:04:10.730 --> 00:04:12.150
the same thing.
00:04:12.150 --> 00:04:17.200
So this quantity here is
the sum over j, of pi
00:04:17.200 --> 00:04:19.329
sub j, P sub jk.
00:04:19.329 --> 00:04:25.730
And this quantity is equal to
pi sub k, just by definition
00:04:25.730 --> 00:04:26.930
of this quantity.
00:04:26.930 --> 00:04:29.960
So P sub k is equal
to sum of pi j.
00:04:29.960 --> 00:04:32.050
Pjk, what does that say?
00:04:32.050 --> 00:04:34.770
That's the definition of
a steady state vector.
00:04:34.770 --> 00:04:40.310
That's the definition of, if
your probabilities of being in
00:04:40.310 --> 00:04:46.490
state k satisfy this equation,
then one step later, you still
00:04:46.490 --> 00:04:49.580
have the same probability
of being in state k.
00:04:49.580 --> 00:04:52.870
Two steps later, you still have
the same probability of
00:04:52.870 --> 00:04:54.860
being in state k.
00:04:54.860 --> 00:05:00.430
So this is called the steady
state equation.
00:05:00.430 --> 00:05:05.070
And a solution to that is called
a steady state vector.
00:05:05.070 --> 00:05:07.795
And that satisfies this.
00:05:07.795 --> 00:05:10.990
In matrix terms, if you rate
this out, what does it say?
00:05:10.990 --> 00:05:14.800
It says the limit as n
approaches infinity of p to
00:05:14.800 --> 00:05:21.210
the n is equal to the column
vector, e of all 1s.
00:05:23.720 --> 00:05:26.300
The transpose here means
it's a column vector.
00:05:26.300 --> 00:05:30.200
So you have a column vector
times a row vector.
00:05:30.200 --> 00:05:32.760
Now, you know if you have a
row vector times a column
00:05:32.760 --> 00:05:36.360
vector, that just gives
you a number.
00:05:36.360 --> 00:05:38.960
If you have a column
vector times a row
00:05:38.960 --> 00:05:42.200
vector, what happens?
00:05:42.200 --> 00:05:46.510
Well, for each element
of the column, you
00:05:46.510 --> 00:05:47.970
get this whole row.
00:05:47.970 --> 00:05:50.840
And for the next element of the
column, you get the whole
00:05:50.840 --> 00:05:54.790
row down beneath it multiplied
by the element of the column,
00:05:54.790 --> 00:05:56.170
and so forth, day on.
00:05:56.170 --> 00:06:01.760
So a column vector times
a row vector is, in
00:06:01.760 --> 00:06:03.280
fact, a whole matrix.
00:06:03.280 --> 00:06:07.590
It's a j by j matrix.
00:06:07.590 --> 00:06:13.900
And since e is all 1s, what
that matrix is is a matrix
00:06:13.900 --> 00:06:17.950
where every row is a steady
state vector pi.
00:06:17.950 --> 00:06:21.950
So we're saying not only does
this pi that we're talking
00:06:21.950 --> 00:06:25.770
about satisfy this steady
state equation, but more
00:06:25.770 --> 00:06:29.640
important, it's this limiting
vector here.
00:06:29.640 --> 00:06:32.450
And as n goes to infinity,
you in fact do
00:06:32.450 --> 00:06:34.730
forget where you were.
00:06:34.730 --> 00:06:41.660
And the entire matrix of where
you are at time n, given where
00:06:41.660 --> 00:06:46.690
you were at time 0, goes to
just this fixed vector pi.
00:06:46.690 --> 00:06:51.377
So this is a column vector, and
pi is a row vector then.
00:06:54.660 --> 00:06:59.180
The same result almost holds
for ergodic unit chains.
00:06:59.180 --> 00:07:01.510
What's an ergodic unit chain?
00:07:01.510 --> 00:07:06.740
An ergodic unit chain is an
ergodic set of states plus a
00:07:06.740 --> 00:07:09.060
whole bunch of transient
states.
00:07:09.060 --> 00:07:12.970
Doesn't matter whether the
transient states are one class
00:07:12.970 --> 00:07:16.360
of transient states or whether
it's multiple classes of
00:07:16.360 --> 00:07:17.960
transient states.
00:07:17.960 --> 00:07:19.850
It's just transient states.
00:07:19.850 --> 00:07:22.540
And there's one recurrent
class.
00:07:22.540 --> 00:07:24.970
And we're assuming here
that it's recurrent.
00:07:24.970 --> 00:07:28.740
So you can almost see
intuitively that if you start
00:07:28.740 --> 00:07:31.800
out in any one of these
transient states, you bum
00:07:31.800 --> 00:07:34.360
around through the transient
states for a while.
00:07:34.360 --> 00:07:39.800
And eventually, you flop off
into the recurrent class.
00:07:39.800 --> 00:07:41.110
And once you're in
the recurrent
00:07:41.110 --> 00:07:43.960
class, there's no return.
00:07:43.960 --> 00:07:46.220
So you stay there forever.
00:07:46.220 --> 00:07:49.070
Now, that's something that
has to be proven.
00:07:49.070 --> 00:07:50.190
And it's proven in the notes.
00:07:50.190 --> 00:07:51.885
It was probably proven
last time.
00:07:54.420 --> 00:07:59.250
But anyway, what happens then
is that the sole difference
00:07:59.250 --> 00:08:05.090
between ergodic unit chains and
just having a completely
00:08:05.090 --> 00:08:10.670
ergodic Markov chain is that the
steady state factor is now
00:08:10.670 --> 00:08:14.610
positive for all ergodic states
and it's 0 for all
00:08:14.610 --> 00:08:16.100
transient states.
00:08:16.100 --> 00:08:20.710
And aside from that, you still
get the same behavior still.
00:08:20.710 --> 00:08:26.140
As n gets large, you go to the
steady state vector, which is
00:08:26.140 --> 00:08:30.920
the steady state vector
of the ergodic chain.
00:08:30.920 --> 00:08:35.500
If you're doing this stuff by
hand, how do you do it?
00:08:35.500 --> 00:08:39.270
Well, you start out just
with the ergodic class.
00:08:39.270 --> 00:08:42.130
I mean, you might as well
ignore everything else,
00:08:42.130 --> 00:08:45.250
because you know that eventually
you're in that
00:08:45.250 --> 00:08:46.280
ergodic class.
00:08:46.280 --> 00:08:49.830
And you find the steady state
vector in that ergodic class,
00:08:49.830 --> 00:08:51.480
and that's the steady
state vector you're
00:08:51.480 --> 00:08:53.890
going to wind up with.
00:08:53.890 --> 00:08:56.350
This is one advantage of
understanding what you're
00:08:56.350 --> 00:08:59.930
doing, because if you don't
understand what you're doing
00:08:59.930 --> 00:09:04.110
and you're just using computer
programs, then you never have
00:09:04.110 --> 00:09:06.360
any idea what's ergodic,
what's not
00:09:06.360 --> 00:09:07.650
ergodic or anything else.
00:09:07.650 --> 00:09:11.180
You just plug it, you grind
away, you get some answer and
00:09:11.180 --> 00:09:15.230
say, ah, I'll publish a paper.
00:09:15.230 --> 00:09:19.220
And you put down exactly what
the computer says, but you
00:09:19.220 --> 00:09:22.520
have no interpretation
of it at all.
00:09:22.520 --> 00:09:27.700
So the other way of looking at
this is, when you have a bunch
00:09:27.700 --> 00:09:33.640
of transient states, and you
also have an ergodic class,
00:09:33.640 --> 00:09:40.050
you can represent a matrix if
the recurrent states are at
00:09:40.050 --> 00:09:43.190
the end of the chain and the
transient states are at the
00:09:43.190 --> 00:09:45.830
beginning of the chain.
00:09:45.830 --> 00:09:50.850
This matrix here is the matrix
of transition probabilities
00:09:50.850 --> 00:09:53.200
within the recurrent class.
00:09:53.200 --> 00:09:58.870
These are the probabilities for
going from the transient
00:09:58.870 --> 00:10:02.820
states to the recurrent class.
00:10:02.820 --> 00:10:04.810
And once you get over
here, the only place
00:10:04.810 --> 00:10:06.060
to go is down here.
00:10:10.880 --> 00:10:15.380
And the transient class is
just a t by t class.
00:10:15.380 --> 00:10:20.035
And the recurrent class
is just a j minus t
00:10:20.035 --> 00:10:22.540
by j minus t matrix.
00:10:22.540 --> 00:10:25.250
So the idea is that each
transient state eventually has
00:10:25.250 --> 00:10:28.910
a transition to a recurrent
state, and the class of
00:10:28.910 --> 00:10:33.770
recurrent states leads to
study state as before.
00:10:33.770 --> 00:10:37.230
So that really, all that
analysis of ergodic unit
00:10:37.230 --> 00:10:43.470
chains, if you look at it
intuitively, it's all obvious.
00:10:43.470 --> 00:10:48.230
Now, as in much of mathematics,
knowing that
00:10:48.230 --> 00:10:52.820
something is obvious does not
relieve you of the need to
00:10:52.820 --> 00:10:55.950
prove it, because sometimes you
find that something that
00:10:55.950 --> 00:10:58.800
looks obvious is true
most of the time but
00:10:58.800 --> 00:10:59.870
not all of the time.
00:10:59.870 --> 00:11:04.230
And that's the purpose of
doing these things.
00:11:04.230 --> 00:11:07.650
There's another way to express
this eigenvalue, eigenvector
00:11:07.650 --> 00:11:10.700
equation we have here.
00:11:10.700 --> 00:11:16.900
And that is that the transition
matrix minus lambda
00:11:16.900 --> 00:11:23.020
times the identity matrix times
the column vector v is
00:11:23.020 --> 00:11:24.250
equal to 0.
00:11:24.250 --> 00:11:30.760
That's the same as the equation
p times v is equal to
00:11:30.760 --> 00:11:34.910
v. That's the same as
a right eigenvector.
00:11:34.910 --> 00:11:38.880
Well, this is the equation
for an eigenvalue 1.
00:11:38.880 --> 00:11:42.780
This is an equation for an
arbitrary eigenvalue lambda.
00:11:42.780 --> 00:11:48.610
But p times v equals lambda
times v is the same as p minus
00:11:48.610 --> 00:11:50.810
lambda i times v equals 0.
00:11:50.810 --> 00:11:55.010
Why do we even bother to say
something so obvious?
00:11:55.010 --> 00:11:59.980
Well, because when you look at
linear algebra, how many of
00:11:59.980 --> 00:12:04.220
you have never studied any
linear algebra at all or have
00:12:04.220 --> 00:12:09.430
only studied completely
mathematical linear algebra,
00:12:09.430 --> 00:12:15.000
where you never deal with
n-tuples as vectors or
00:12:15.000 --> 00:12:16.970
matrices or any things
like this?
00:12:16.970 --> 00:12:18.220
Is there anyone?
00:12:21.410 --> 00:12:25.890
If you don't have this
background, pick up--
00:12:30.330 --> 00:12:31.633
what's his name?
00:12:31.633 --> 00:12:32.420
AUDIENCE: Strang.
00:12:32.420 --> 00:12:33.390
PROFESSOR: Strang.
00:12:33.390 --> 00:12:35.070
Strang's book.
00:12:35.070 --> 00:12:39.090
It's a remarkably simple-minded
book which says
00:12:39.090 --> 00:12:42.370
everything as clearly
as it can be stated.
00:12:42.370 --> 00:12:45.040
And it tells you everything
you have to know.
00:12:45.040 --> 00:12:48.730
And it does it in a very
straightforward way.
00:12:48.730 --> 00:12:52.860
So I highly recommend it to get
any of the background that
00:12:52.860 --> 00:12:53.960
you might need.
00:12:53.960 --> 00:12:55.660
Most of you, I'm sure, are very
00:12:55.660 --> 00:12:56.870
familiar with these things.
00:12:56.870 --> 00:12:59.760
So I'm just reminding
you of then.
00:12:59.760 --> 00:13:03.540
Now, a square matrix is singular
if there's a vector
00:13:03.540 --> 00:13:07.640
v, such that a times
v is equal to 0.
00:13:07.640 --> 00:13:10.890
That's just a definition
as a singularity.
00:13:10.890 --> 00:13:16.340
Now, lambda is an eigenvalue of
a matrix p if and only if p
00:13:16.340 --> 00:13:19.180
minus lambda times
i is singular.
00:13:19.180 --> 00:13:23.600
In other words, if there's
some v for which p minus
00:13:23.600 --> 00:13:27.870
lambda i times v is equal to
0, that's what this says.
00:13:27.870 --> 00:13:32.150
You put p minus lambda i in
for a, and it says it's
00:13:32.150 --> 00:13:37.580
singular if there's some v
for which this matrix--
00:13:37.580 --> 00:13:40.940
this matrix is singular if
there's some v such that p
00:13:40.940 --> 00:13:44.860
minus lambda i times
v is equal to 0.
00:13:44.860 --> 00:13:48.800
So let a1 to am be
the columns of a.
00:13:48.800 --> 00:13:52.430
Then a is going to be
singular if a1 to am
00:13:52.430 --> 00:13:53.980
are linearly dependent.
00:13:53.980 --> 00:14:02.280
In other words, if there's some
set of coefficients you
00:14:02.280 --> 00:14:09.000
can attach to a1 times v1 plus
a2 times v2, plus up to am
00:14:09.000 --> 00:14:15.510
times vm such that that sum is
equal to 0, that means that a1
00:14:15.510 --> 00:14:17.910
to am are linearly dependent.
00:14:17.910 --> 00:14:24.580
It also means that the matrix a
times that v is equal to 0.
00:14:24.580 --> 00:14:27.390
So those two things say
the same thing again.
00:14:27.390 --> 00:14:30.200
So the square matrix, a, is
singular if and only if the
00:14:30.200 --> 00:14:34.390
rows of a are linearly
independent.
00:14:34.390 --> 00:14:36.100
We set columns here.
00:14:36.100 --> 00:14:38.260
Here, we're doing the
same thing for rows.
00:14:38.260 --> 00:14:40.120
It still holds true.
00:14:40.120 --> 00:14:44.340
And one new thing, if and only
if the determinant of a is
00:14:44.340 --> 00:14:45.720
equal to 0.
00:14:45.720 --> 00:14:49.210
One of the nice things about
determinants is that
00:14:49.210 --> 00:14:54.470
determinants are 0 if the matrix
is singular, if and
00:14:54.470 --> 00:14:56.170
only if the matrix
is singular.
00:14:56.170 --> 00:15:01.440
So the summary of all of this
for a matrix which is a
00:15:01.440 --> 00:15:02.740
transition matrix--
00:15:02.740 --> 00:15:04.960
namely, a stochastic matrix--
00:15:04.960 --> 00:15:10.050
is lambda, is an eigenvalue of
p, if and only if p minus
00:15:10.050 --> 00:15:14.320
lambda i is singular, if and
only if the determinant of p
00:15:14.320 --> 00:15:19.870
minus lambda i is equal to 0,
if and only if p times some
00:15:19.870 --> 00:15:25.150
vector v equals lambda v, and
if and only if u times p
00:15:25.150 --> 00:15:28.410
equals lambda u for some u.
00:15:28.410 --> 00:15:29.310
Yes?
00:15:29.310 --> 00:15:31.672
AUDIENCE: The second to last
statement is actually linearly
00:15:31.672 --> 00:15:34.540
independent, you said?
00:15:34.540 --> 00:15:35.974
The second to last.
00:15:35.974 --> 00:15:38.364
Square matrix a.
00:15:38.364 --> 00:15:39.215
No, above that.
00:15:39.215 --> 00:15:41.870
PROFESSOR: Oh, above that.
00:15:41.870 --> 00:15:46.910
A square matrix a is singular
if and only if the rows of a
00:15:46.910 --> 00:15:49.773
are linearly dependent, yes.
00:15:49.773 --> 00:15:50.680
AUDIENCE: Dependent.
00:15:50.680 --> 00:15:51.490
PROFESSOR: Dependent, yes.
00:15:51.490 --> 00:15:56.050
In other words, if there's
some vector v such that a
00:15:56.050 --> 00:16:08.370
times v is equal to 0, that
means that those columns are
00:16:08.370 --> 00:16:09.620
linearly dependent.
00:16:13.080 --> 00:16:16.630
So we need all of those
relationships.
00:16:16.630 --> 00:16:20.072
It says for every stochastic
matrix--
00:16:20.072 --> 00:16:22.820
oh, now this is something new.
00:16:22.820 --> 00:16:28.000
For every stochastic matrix,
P times e is equal to e.
00:16:28.000 --> 00:16:46.040
Obviously, because if you sum
up the sum of Pij over j is
00:16:46.040 --> 00:16:46.860
equal to 1.
00:16:46.860 --> 00:16:51.230
P sub ij is the probability,
given that you start in state
00:16:51.230 --> 00:16:54.370
i, that in the next step,
you'll be in state j.
00:16:54.370 --> 00:16:56.650
You have to be somewhere
in the next step.
00:16:56.650 --> 00:17:00.570
So if you sum these quantities
up, you have to get 1, which
00:17:00.570 --> 00:17:03.230
says you have to
be some place.
00:17:03.230 --> 00:17:04.480
So that's all this is saying.
00:17:07.109 --> 00:17:10.579
That's true for every
finite-state Markov chain in
00:17:10.579 --> 00:17:15.839
the world, no matter how ugly
it is, how many sets of
00:17:15.839 --> 00:17:20.660
recurrent states it has, how
much periodicity it has.
00:17:20.660 --> 00:17:26.010
A complete generality, P
times e is equal to e.
00:17:26.010 --> 00:17:30.620
So lambda is always an
eigenvalue of a stochastic
00:17:30.620 --> 00:17:35.350
matrix, and e is always
a right eigenvector.
00:17:35.350 --> 00:17:38.130
Well, from what we've just said,
that means there has to
00:17:38.130 --> 00:17:41.090
be a left eigenvector also.
00:17:41.090 --> 00:17:44.080
So there has to be some
pi such that pi times
00:17:44.080 --> 00:17:47.380
P is equal to pi.
00:17:47.380 --> 00:17:51.210
So suddenly, we find there's
also a left eigenvector.
00:17:51.210 --> 00:17:56.470
What we haven't shown yet is
that that pi that satisfies
00:17:56.470 --> 00:17:59.210
this equation is a probability
vector.
00:17:59.210 --> 00:18:03.070
Namely, we haven't shown that
all the components of pi are
00:18:03.070 --> 00:18:04.740
greater than or equal to 0.
00:18:04.740 --> 00:18:06.800
We still have to do that.
00:18:06.800 --> 00:18:10.340
And in fact, that's not
completely trivial.
00:18:10.340 --> 00:18:14.020
If we can find such a vector
that is a probability vector,
00:18:14.020 --> 00:18:17.960
the compound in sum to 1 and
they're not negative, then
00:18:17.960 --> 00:18:21.890
this is the equation for
a steady state vector.
00:18:21.890 --> 00:18:25.590
So what we don't know yet
is whether a steady
00:18:25.590 --> 00:18:27.120
state vector exists.
00:18:27.120 --> 00:18:31.400
We do know that a left
eigenvector exists.
00:18:31.400 --> 00:18:33.690
We're going to show later
that there is a steady
00:18:33.690 --> 00:18:35.050
state vector pi.
00:18:35.050 --> 00:18:40.400
In other words, a non-negative
vector which sums to 1 for all
00:18:40.400 --> 00:18:42.340
finite-state Markov chains.
00:18:42.340 --> 00:18:46.780
In other words, no matter how
messy it is, just like e, the
00:18:46.780 --> 00:18:50.900
column vector of all 1s is
always a right eigenvector of
00:18:50.900 --> 00:18:52.480
eigenvalue 1.
00:18:52.480 --> 00:18:56.800
There is always a non-negative
vector pi whose components sum
00:18:56.800 --> 00:19:03.590
to 1, which is a left
eigenvector with eigenvalue 1.
00:19:03.590 --> 00:19:06.260
So these two relationships
hold everywhere.
00:19:10.780 --> 00:19:15.030
Incidentally, the notes
at one point claim
00:19:15.030 --> 00:19:17.030
to have shown this.
00:19:17.030 --> 00:19:18.680
And the notes really
don't show it.
00:19:18.680 --> 00:19:21.270
I'm going to show
it to you today.
00:19:21.270 --> 00:19:22.400
I'm sorry for that.
00:19:22.400 --> 00:19:26.660
It's something I've known for so
long that I find it hard to
00:19:26.660 --> 00:19:29.420
say is this true or not.
00:19:29.420 --> 00:19:30.790
Of course it's true.
00:19:30.790 --> 00:19:34.540
But it does have to be shown,
and I will show it
00:19:34.540 --> 00:19:35.790
to you later on.
00:19:38.490 --> 00:19:44.410
Chapter three of the notes is
largely rewritten this year.
00:19:44.410 --> 00:19:47.920
And it has a few more typos
in it than most
00:19:47.920 --> 00:19:49.870
of the other chapters.
00:19:49.870 --> 00:19:52.280
And a few of the typos
are fairly important.
00:19:52.280 --> 00:19:55.518
I'll try to point some
of them out as we go.
00:19:55.518 --> 00:19:59.190
But I'm sure I haven't
caught them all yet.
00:19:59.190 --> 00:20:03.990
Now, what is the determinant
of an M by M matrix?
00:20:03.990 --> 00:20:08.660
It's this very simple-looking
but rather messy formula,
00:20:08.660 --> 00:20:13.560
which says the determinant of
a square matrix A is the sum
00:20:13.560 --> 00:20:14.810
over all partitions--
00:20:17.340 --> 00:20:19.000
and then there's a plus
minus here, which
00:20:19.000 --> 00:20:20.700
I'll talk about later--
00:20:20.700 --> 00:20:24.780
of the product from i equals
1 to M. M is the number of
00:20:24.780 --> 00:20:30.270
states of A sub i.
00:20:30.270 --> 00:20:34.480
This is the component
of the ij position.
00:20:34.480 --> 00:20:36.300
And we're taking A sub i.
00:20:36.300 --> 00:20:40.260
And then the partition that
we're dealing with, mu sub i.
00:20:40.260 --> 00:20:46.520
So what we're doing is taking
a matrix with all sorts of
00:20:46.520 --> 00:20:48.050
terms in it--
00:20:48.050 --> 00:21:03.600
A11 up to A1j on to Aj1
up to A sub jj.
00:21:03.600 --> 00:21:06.880
And these partitions we're
talking about are ways of
00:21:06.880 --> 00:21:11.900
selecting one element from each
row and one element from
00:21:11.900 --> 00:21:12.540
each column.
00:21:12.540 --> 00:21:20.240
Namely, that first sum there is
talking about one element
00:21:20.240 --> 00:21:22.000
from each row.
00:21:22.000 --> 00:21:25.140
And then when we're talking
about a permutation here,
00:21:25.140 --> 00:21:29.700
we're doing something like, for
this row, we're looking
00:21:29.700 --> 00:21:31.170
at, say, this element.
00:21:31.170 --> 00:21:34.430
For this row, we might be
looking at this element.
00:21:34.430 --> 00:21:37.500
For this row, we might be
looking at this element, and
00:21:37.500 --> 00:21:40.790
so forth down, until finally,
we're looking at some
00:21:40.790 --> 00:21:41.910
element down here.
00:21:41.910 --> 00:21:45.090
Now, we've picked out every
column and every row in doing
00:21:45.090 --> 00:21:48.960
this, but we only have one
element in each row and one
00:21:48.960 --> 00:21:51.400
element in each column.
00:21:51.400 --> 00:21:54.860
If you've studied linear algebra
and you're at all
00:21:54.860 --> 00:21:58.190
interested in computation, the
first thing that everybody
00:21:58.190 --> 00:22:02.610
tells you is that this is a
god-awful way to ever compute
00:22:02.610 --> 00:22:07.760
a determinant, because the
number of permutations grows
00:22:07.760 --> 00:22:10.680
very, very fast with the
size of the matrix.
00:22:10.680 --> 00:22:12.310
And therefore you don't
want to use this
00:22:12.310 --> 00:22:14.230
formula very often.
00:22:14.230 --> 00:22:18.590
It's a very useful formula
conceptually, though, because
00:22:18.590 --> 00:22:23.620
if we look at the determinant
of p minus lambda i, if we
00:22:23.620 --> 00:22:27.740
want to ask the question, how
many eigenvalues does this
00:22:27.740 --> 00:22:30.140
transition matrix have?
00:22:30.140 --> 00:22:33.190
well, the number of eigenvalues
it has is the
00:22:33.190 --> 00:22:36.940
number of values of lambda such
that the determinant of p
00:22:36.940 --> 00:22:42.270
minus lambda i is 0.
00:22:42.270 --> 00:22:44.840
Now, how many such
values are there?
00:22:44.840 --> 00:22:56.300
Well, you look the matrix for
that, and you get A11 minus
00:22:56.300 --> 00:23:13.460
lambda A12 and A22 minus lambda
Ajj minus lambda.
00:23:13.460 --> 00:23:16.430
And none of the other elements
have lambda in it.
00:23:16.430 --> 00:23:20.450
So when you're looking at this
formula for finding the
00:23:20.450 --> 00:23:24.840
determinant, one of the
partitions is this partition,
00:23:24.840 --> 00:23:30.520
which is a polynomial of
degree j in lambda.
00:23:30.520 --> 00:23:33.590
All of the others are
polynomials of degree less
00:23:33.590 --> 00:23:35.270
than j in lambda.
00:23:35.270 --> 00:23:38.750
And therefore this whole
bloody mess here is a
00:23:38.750 --> 00:23:44.140
polynomial of degree
j and lambda.
00:23:44.140 --> 00:23:48.410
So the equation, determinant of
p minus lambda i, which is
00:23:48.410 --> 00:23:53.070
a polynomial of degree j
in lambda, equals 0.
00:23:53.070 --> 00:23:55.440
How many roots does it have?
00:23:55.440 --> 00:23:58.120
Well, the fundamental theorem
of algebra says that a
00:23:58.120 --> 00:24:04.510
polynomial of degree j,
of complex numbers--
00:24:04.510 --> 00:24:07.130
and real is a special
case of complex--
00:24:07.130 --> 00:24:12.360
that it has exactly j roots.
00:24:12.360 --> 00:24:16.630
So there are exactly,
in this case, M--
00:24:16.630 --> 00:24:17.780
excuse me, I've been
calling it j
00:24:17.780 --> 00:24:19.030
sometimes and M sometimes.
00:24:21.870 --> 00:24:26.810
This equation here has exactly
M roots to it.
00:24:26.810 --> 00:24:30.200
And since it has exactly M
roots, that's the number of
00:24:30.200 --> 00:24:32.370
eigenvalues there are.
00:24:32.370 --> 00:24:35.710
There's one flaw in
that argument.
00:24:35.710 --> 00:24:40.020
And that is, some of the roots
might be repeated.
00:24:40.020 --> 00:24:44.070
Say you have M roots
altogether.
00:24:44.070 --> 00:24:48.460
Some of them appear more than
one time, so you'll have roots
00:24:48.460 --> 00:24:51.240
of multiplicity, something
or other.
00:24:51.240 --> 00:24:54.740
And when you add up the
multiplicities of each of the
00:24:54.740 --> 00:24:58.860
distinct eigenvalues, you
get capital M, which is
00:24:58.860 --> 00:25:00.700
the number of states.
00:25:00.700 --> 00:25:04.580
So the number of different
eigenvalues is less than or
00:25:04.580 --> 00:25:09.940
equal to M. And the number of
distinct eigenvalues times the
00:25:09.940 --> 00:25:17.350
multiplicity of each eigenvalue
is equal to M.
00:25:17.350 --> 00:25:19.910
That's a simple, straightforward
fact.
00:25:19.910 --> 00:25:22.910
And it's worth remembering.
00:25:22.910 --> 00:25:24.850
So there are M roots
to the equation.
00:25:24.850 --> 00:25:28.100
Determinant p minus
lambda i equals 0.
00:25:28.100 --> 00:25:33.220
And therefore there are
M eigenvalues of p.
00:25:33.220 --> 00:25:38.210
And therefore you might think
that there are M eigenvectors.
00:25:38.210 --> 00:25:43.530
That, unfortunately, is
not true necessarily.
00:25:43.530 --> 00:25:46.460
That's one of the really--
00:25:46.460 --> 00:25:50.380
it's probably the only really
ugly thing in linear algebra.
00:25:50.380 --> 00:25:52.505
I mean, linear algebra is
a beautiful theory.
00:25:55.380 --> 00:25:58.260
I mean, it's like Poisson's
stochastic processes.
00:25:58.260 --> 00:26:01.190
Everything that can
be true is true.
00:26:01.190 --> 00:26:03.070
And if something isn't true,
there's a simple
00:26:03.070 --> 00:26:05.580
counter-example of why
it can't be true.
00:26:05.580 --> 00:26:09.260
This thing is just
a bloody mess.
00:26:09.260 --> 00:26:15.570
But unfortunately, if you have
M states in a finite-state
00:26:15.570 --> 00:26:20.380
Markov chain, you might not have
M different eigenvectors.
00:26:20.380 --> 00:26:24.790
And that's unfortunate, but we
will forget about that for as
00:26:24.790 --> 00:26:28.780
long as we can, and we'll
finally come back to it
00:26:28.780 --> 00:26:31.130
towards the end.
00:26:31.130 --> 00:26:32.380
AUDIENCE: [INAUDIBLE]?
00:26:38.600 --> 00:26:38.870
PROFESSOR: What?
00:26:38.870 --> 00:26:41.158
AUDIENCE: Why would we care
about all the eigenvectors if
00:26:41.158 --> 00:26:45.790
we are only concerned with the
ones that [INAUDIBLE]?
00:26:45.790 --> 00:26:47.890
PROFESSOR: Well, because we're
interested in the other ones
00:26:47.890 --> 00:26:51.960
because that tells us how fast
p to the M converges to what
00:26:51.960 --> 00:26:54.760
it should be.
00:26:54.760 --> 00:26:58.700
I mean, all those other
eigenvalues, as we'll see, are
00:26:58.700 --> 00:27:04.884
the error terms in p to the
M as it approaches this
00:27:04.884 --> 00:27:06.868
asymptotic value.
00:27:06.868 --> 00:27:10.416
And therefore we want to know
what those eigenvalues are.
00:27:10.416 --> 00:27:11.630
At least we want to
know what the
00:27:11.630 --> 00:27:13.258
second-biggest eigenvalue is.
00:27:19.820 --> 00:27:23.850
Now, let's look at just
a case of two states.
00:27:23.850 --> 00:27:28.020
Most of the things that can
happen will happen with two
00:27:28.020 --> 00:27:31.280
states, except for this ugly
thing that I told you about
00:27:31.280 --> 00:27:33.330
that can't happen
with two states.
00:27:33.330 --> 00:27:36.370
And therefore two states is a
good thing to look at, because
00:27:36.370 --> 00:27:38.930
with two states, you can
calculate everything very
00:27:38.930 --> 00:27:43.210
easily and you don't have to
use any linear algebra.
00:27:43.210 --> 00:27:48.010
So if we look at a Markov chain
with two states, P sub
00:27:48.010 --> 00:27:54.620
ij is this set of transition
probabilities.
00:27:54.620 --> 00:28:03.880
The left eigenvector equation
is pi 1 times P11 times pi 2
00:28:03.880 --> 00:28:07.490
times P21 is equal
to lambda pi 1.
00:28:07.490 --> 00:28:12.500
And so this is writing out
what we said before.
00:28:12.500 --> 00:28:17.770
The vector pi times the matrix
P is equal to lambda
00:28:17.770 --> 00:28:19.780
times the vector pi.
00:28:19.780 --> 00:28:21.560
That covers both of
these equations.
00:28:21.560 --> 00:28:23.810
Since M is only 2,
we only have to
00:28:23.810 --> 00:28:25.830
write things out twice.
00:28:25.830 --> 00:28:29.250
Same thing for the right
eigenvector equation.
00:28:29.250 --> 00:28:30.990
That's this.
00:28:30.990 --> 00:28:34.230
The determinant of P minus
lambda i, if we use this
00:28:34.230 --> 00:28:39.940
formula that we talked about
here, you put A11 minus
00:28:39.940 --> 00:28:42.430
lambda, A22 minus lambda.
00:28:42.430 --> 00:28:44.620
Well, then you're done.
00:28:44.620 --> 00:28:50.220
So all you need is P11 minus
lambda times P22 minus lambda.
00:28:50.220 --> 00:28:53.560
That's this permutation there.
00:28:53.560 --> 00:28:59.150
And then you have an odd
permutation, A12 times A21.
00:28:59.150 --> 00:29:01.470
How do you know which
permutations are even and
00:29:01.470 --> 00:29:04.160
which permutations are odd?
00:29:04.160 --> 00:29:07.070
It's how many flips
you have to do.
00:29:07.070 --> 00:29:09.930
But to see that that's
consistent, you really have to
00:29:09.930 --> 00:29:13.800
look at Strang or some book on
linear algebra, because it's
00:29:13.800 --> 00:29:15.590
not relevant here.
00:29:15.590 --> 00:29:18.100
But anyway, that determinant
is equal to
00:29:18.100 --> 00:29:19.970
this quantity here.
00:29:19.970 --> 00:29:24.850
That's a polynomial of
degree 2 in lambda.
00:29:24.850 --> 00:29:30.650
If you solve it, you find
out that one solution is
00:29:30.650 --> 00:29:32.690
lambda 1 equals 1.
00:29:32.690 --> 00:29:39.710
The other solution is lambda
2 is 1 minus P12 minus P21.
00:29:39.710 --> 00:29:44.020
Now, there are a bunch of
cases to look at here.
00:29:44.020 --> 00:29:48.770
If the off-diagonal transition
probabilities are both 0, what
00:29:48.770 --> 00:29:49.380
does that mean?
00:29:49.380 --> 00:29:52.520
It means if you start in state
0, you stay there.
00:29:52.520 --> 00:29:56.450
If you start in state 1,
you stay there forever.
00:29:56.450 --> 00:29:59.590
If you start in state 2,
you stay there forever.
00:29:59.590 --> 00:30:04.520
That's a very boring Markov
chain, but it's not very nice
00:30:04.520 --> 00:30:06.780
for the theory.
00:30:06.780 --> 00:30:11.120
So we're going to leave that
case out for the time being.
00:30:11.120 --> 00:30:14.530
But anyway, if you have that
case, then the chain has two
00:30:14.530 --> 00:30:16.520
recurrent classes.
00:30:16.520 --> 00:30:19.740
Lambda equals 1, has
multiplicity 2.
00:30:19.740 --> 00:30:27.430
You have two eigenvalues of
algebraic multiplicity 2.
00:30:27.430 --> 00:30:30.660
I mean, it's just one number,
but it appears twice in this
00:30:30.660 --> 00:30:32.920
determinant equation.
00:30:32.920 --> 00:30:35.960
And it also appears twice in
the sense that you have two
00:30:35.960 --> 00:30:37.510
recurrent classes.
00:30:37.510 --> 00:30:43.930
And you will find that there are
two linearly independent
00:30:43.930 --> 00:30:47.210
left eigenvectors, two linearly
independent right
00:30:47.210 --> 00:30:48.360
eigenvectors.
00:30:48.360 --> 00:30:50.400
And how do you find those?
00:30:50.400 --> 00:30:54.160
You use your common sense and
you say, well, if you start in
00:30:54.160 --> 00:30:55.910
state 1, you're always there.
00:30:55.910 --> 00:30:58.400
If you start in state 2,
you're always there.
00:30:58.400 --> 00:31:01.130
Why do I even look at
these two states?
00:31:01.130 --> 00:31:05.330
This is a crazy thing where
wherever I start, I stay there
00:31:05.330 --> 00:31:09.130
and I only look at state
1 or state 2.
00:31:09.130 --> 00:31:13.820
It's scarcely even
a Markov chain.
00:31:13.820 --> 00:31:19.630
If P12 and P21 are both 1, what
it means is you can never
00:31:19.630 --> 00:31:21.610
go from state 1 to state 1.
00:31:21.610 --> 00:31:24.200
You always go from state
1 to state 2.
00:31:24.200 --> 00:31:27.220
And you always go from
state 2 to state 1.
00:31:27.220 --> 00:31:30.830
It means you have a two-state
periodic chain.
00:31:30.830 --> 00:31:33.130
And that's the other
crazy case.
00:31:33.130 --> 00:31:35.170
The other case is not
very interesting.
00:31:35.170 --> 00:31:38.800
There's nothing stochastic
about it at all.
00:31:38.800 --> 00:31:40.970
So the chain is periodic.
00:31:40.970 --> 00:31:45.170
And if you look at this equation
here, the second
00:31:45.170 --> 00:31:48.270
eigenvalue is equal
to minus 1.
00:31:48.270 --> 00:31:51.790
I might as well tell you that,
in general, if you have a
00:31:51.790 --> 00:31:57.520
periodic Markov chain, just one
recurrent class and it's
00:31:57.520 --> 00:32:04.760
periodic, a period d, then the
eigenvalues turn out to be the
00:32:04.760 --> 00:32:08.890
uniformly spaced eigenvalues
around the unit circle.
00:32:08.890 --> 00:32:10.920
One is one of the eigenvalues.
00:32:10.920 --> 00:32:12.490
We've already seen that.
00:32:12.490 --> 00:32:15.940
And the other d minus 1
eigenvalues are those
00:32:15.940 --> 00:32:18.690
uniformly spaced around
the unit circle.
00:32:18.690 --> 00:32:24.410
So they add up to 360 degrees
when you get all done with it.
00:32:24.410 --> 00:32:26.270
So that's an easy case.
00:32:26.270 --> 00:32:29.780
Proving that is tedious.
00:32:29.780 --> 00:32:31.420
It's done in the notes.
00:32:31.420 --> 00:32:32.690
It's not even done
in the notes.
00:32:32.690 --> 00:32:34.680
It's done in one of
the exercises.
00:32:34.680 --> 00:32:38.260
And you can do it
if you choose.
00:32:41.470 --> 00:32:46.540
So let's look at these
eigenvector equations and the
00:32:46.540 --> 00:32:48.400
eigenvalue equations.
00:32:48.400 --> 00:32:52.200
Incidentally, if you don't know
what the eigenvalues are,
00:32:52.200 --> 00:32:57.010
is this a linear set
of equations?
00:32:57.010 --> 00:32:59.380
No, it's a nonlinear
set of equations.
00:32:59.380 --> 00:33:02.770
This is a nonlinear set
of equations in pi
00:33:02.770 --> 00:33:06.930
1, pi 2, and lambda.
00:33:06.930 --> 00:33:11.390
How do you solve non-linear
equations like that?
00:33:11.390 --> 00:33:14.750
Well, if you have much sense,
you first find out what lambda
00:33:14.750 --> 00:33:17.350
is and then you solve
linear equations.
00:33:20.105 --> 00:33:21.390
And you can always do that.
00:33:21.390 --> 00:33:25.880
We've said that these solutions
for lambda, there
00:33:25.880 --> 00:33:28.660
can only be M of them.
00:33:28.660 --> 00:33:30.210
And you can find
them by solving
00:33:30.210 --> 00:33:32.220
this polynomial equation.
00:33:32.220 --> 00:33:35.910
Then you can solve the linear
equation by finding the
00:33:35.910 --> 00:33:37.100
eigenvectors.
00:33:37.100 --> 00:33:39.750
There are packages to do all
of these things, so there's
00:33:39.750 --> 00:33:44.650
nothing you should waste
time on doing here.
00:33:44.650 --> 00:33:49.090
It's just knowing what the
results are that's important.
00:33:49.090 --> 00:33:55.100
From now on, I'm going to assume
that P12 or P21 are
00:33:55.100 --> 00:33:56.140
greater than 0.
00:33:56.140 --> 00:33:58.650
In other words, I'm going to
assume that we don't have the
00:33:58.650 --> 00:34:05.010
periodic case and we don't have
the case where you have
00:34:05.010 --> 00:34:07.000
two classes of states.
00:34:07.000 --> 00:34:11.760
In other words, I'm going to
assume that our Markov chain
00:34:11.760 --> 00:34:13.080
is actually ergodic.
00:34:13.080 --> 00:34:17.530
That's the assumption that
I'm making here.
00:34:17.530 --> 00:34:22.500
If you then solve these
equations using lambda 1
00:34:22.500 --> 00:34:27.380
equals 1, you'll find
out that pi 1 is the
00:34:27.380 --> 00:34:29.350
component sum to 1.
00:34:29.350 --> 00:34:32.670
First component is
P21 over the sum.
00:34:32.670 --> 00:34:36.700
Second component is
P12 over the sum.
00:34:36.700 --> 00:34:37.950
Not very interesting.
00:34:40.440 --> 00:34:44.520
Why is the steady state
probability weighted towards
00:34:44.520 --> 00:34:47.839
the largest of these transition
probabilities?
00:34:47.839 --> 00:34:54.330
If P21 is bigger than P12, how
do you know intuitively that
00:34:54.330 --> 00:34:59.196
you're going to be in state 1
more than you're in state 2?
00:34:59.196 --> 00:35:03.131
Is this intuitively
obvious to-- yeah?
00:35:03.131 --> 00:35:04.381
AUDIENCE: [INAUDIBLE].
00:35:06.220 --> 00:35:08.810
PROFESSOR: Because you make more
transistors from 2 to 1.
00:35:08.810 --> 00:35:11.970
Well, actually you don't make
more transitions from 2 to 1.
00:35:11.970 --> 00:35:14.890
You make exactly the same number
of transitions, but
00:35:14.890 --> 00:35:17.260
since the probability is higher,
it means you have to
00:35:17.260 --> 00:35:20.160
be in state 1 more
of the time.
00:35:20.160 --> 00:35:21.410
Good.
00:35:23.840 --> 00:35:25.520
So these are the two.
00:35:28.740 --> 00:35:35.920
And this is the left eigenvector
for the second
00:35:35.920 --> 00:35:36.480
eigenvalue--
00:35:36.480 --> 00:35:39.230
namely, the smaller
eigenvalue.
00:35:39.230 --> 00:35:47.000
Now, if you look at these
equations, you'll notice that
00:35:47.000 --> 00:35:54.390
the vector pi, the left i-th
eigenvector, multiplied by the
00:35:54.390 --> 00:35:59.650
right j-th eigenvector, is
always equal to delta ij.
00:35:59.650 --> 00:36:05.790
In other words, the left
eigenvectors are orthogonal to
00:36:05.790 --> 00:36:08.670
the right eigenvectors.
00:36:08.670 --> 00:36:11.840
I mean, you can see this just
by multiplying it out.
00:36:11.840 --> 00:36:16.310
You multiply pi 1 times nu
1, and what do you get?
00:36:16.310 --> 00:36:20.460
You get this plus this,
which is 1.
00:36:20.460 --> 00:36:25.310
Delta 11 means there's something
which is 1 when i is
00:36:25.310 --> 00:36:29.165
equal j and 0 when i
is unequal to j.
00:36:29.165 --> 00:36:36.170
You take this and you
multiply it by this,
00:36:36.170 --> 00:36:36.950
and what do you get?
00:36:36.950 --> 00:36:41.160
You get P21 times P12
over the square.
00:36:41.160 --> 00:36:45.830
Minus P12 times P21, it's 0.
00:36:45.830 --> 00:36:47.040
Same thing here.
00:36:47.040 --> 00:36:53.160
1 minus 1, that vector times
this vector, is 0 again.
00:36:53.160 --> 00:36:56.160
So the cross-terms are 0.
00:36:56.160 --> 00:36:58.680
The diagonal terms are 1.
00:37:07.150 --> 00:37:08.400
That's the way it is.
00:37:11.500 --> 00:37:13.410
So let's move on with this.
00:37:17.580 --> 00:37:21.440
These right eigenvector
equations, you can write them
00:37:21.440 --> 00:37:23.530
in matrix form.
00:37:23.530 --> 00:37:24.720
I'm doing this slowly.
00:37:24.720 --> 00:37:27.740
I hope I'm not boring those who
have done a lot of linear
00:37:27.740 --> 00:37:29.830
algebra too much.
00:37:29.830 --> 00:37:37.130
But they won't go on forever,
and it gets us to where we
00:37:37.130 --> 00:37:38.270
want to go.
00:37:38.270 --> 00:37:41.830
So if you take these two
equations and you write them
00:37:41.830 --> 00:37:48.170
in matrix form, what you get
is P times u, where u is a
00:37:48.170 --> 00:37:53.770
matrix whose columns are
the vector nu 1 and
00:37:53.770 --> 00:37:56.120
the vector nu 2.
00:37:56.120 --> 00:38:01.090
And capital lambda is the
diagonal matrix of the
00:38:01.090 --> 00:38:01.980
eigenvalues.
00:38:01.980 --> 00:38:07.540
If you multiply P times the
first column of u, and then
00:38:07.540 --> 00:38:11.940
you look at the first column of
this matrix, what you get--
00:38:11.940 --> 00:38:13.680
yes, that's exactly the
right way to do it.
00:38:17.100 --> 00:38:20.510
And if you're not doing that,
you're probably not
00:38:20.510 --> 00:38:21.330
understanding it.
00:38:21.330 --> 00:38:25.220
But if you just think of
ordinary matrix vector
00:38:25.220 --> 00:38:29.110
multiplication, this
all works out.
00:38:32.320 --> 00:38:36.700
Because of this orthogonality
relationship, we see that the
00:38:36.700 --> 00:38:52.530
matrix whose rows are the left
eigenvectors times the matrix
00:38:52.530 --> 00:38:56.980
whose columns are the
right eigenvectors,
00:38:56.980 --> 00:38:59.720
that's equal to i.
00:38:59.720 --> 00:39:01.740
Namely, it's equal to
the identity matrix.
00:39:01.740 --> 00:39:05.580
That's what this orthogonality
relationship means.
00:39:05.580 --> 00:39:12.730
This means that this matrix is
the inverse of this matrix.
00:39:12.730 --> 00:39:16.310
This proves that u
is invertible.
00:39:16.310 --> 00:39:20.250
And in fact, we've done this
just for m equals 2.
00:39:20.250 --> 00:39:24.390
But in fact, this proof is
general and holds for
00:39:24.390 --> 00:39:31.520
arbitrary Markov chains if the
eigenvectors span the space.
00:39:31.520 --> 00:39:32.960
And we'll see that later.
00:39:32.960 --> 00:39:38.030
We're doing this for m equals
2 now, so we how to proceed
00:39:38.030 --> 00:39:41.130
when we have an arbitrary
Markov chain.
00:39:41.130 --> 00:39:42.610
u is invertible.
00:39:42.610 --> 00:39:46.790
u to the minus 1 has pi
1 and pi 2 as rows.
00:39:46.790 --> 00:39:49.690
And thus P is going
to be equal to--
00:39:52.540 --> 00:39:54.020
I guess we should--
00:39:54.020 --> 00:39:56.180
oh, we set it up here.
00:39:56.180 --> 00:39:59.220
P times u is equal to
u times lambda.
00:39:59.220 --> 00:40:02.780
We've shown here that u is
invertible, therefore we can
00:40:02.780 --> 00:40:06.830
multiply this equation
by u to the minus 1.
00:40:06.830 --> 00:40:11.830
And we get the transition matrix
P is equal to u times
00:40:11.830 --> 00:40:15.730
the diagonal matrix lambda
times the matrix u
00:40:15.730 --> 00:40:18.300
to the minus 1.
00:40:18.300 --> 00:40:21.350
What happens if we try
to find P squared?
00:40:21.350 --> 00:40:25.470
Well, it's u times lambda
times u to the minus 1.
00:40:25.470 --> 00:40:28.470
One of the nice things about
matrices is you can multiply
00:40:28.470 --> 00:40:30.310
them, if you don't
worry about the
00:40:30.310 --> 00:40:32.960
details, almost like numbers.
00:40:32.960 --> 00:40:36.540
Times u times lambda times
u to the minus 1.
00:40:36.540 --> 00:40:41.580
Except you don't have
commutativity.
00:40:41.580 --> 00:40:44.310
That's the only thing
that you don't have.
00:40:44.310 --> 00:40:47.100
But anyway, you have u times
lambda times u to the minus 1
00:40:47.100 --> 00:40:50.600
times u times lambda times
u to t he minus 1.
00:40:50.600 --> 00:40:54.840
This and this turn out to be
the identity matrix, so you
00:40:54.840 --> 00:40:58.220
have u times lambda times
lambda, which is lambda
00:40:58.220 --> 00:41:00.870
squared, times u
to the minus 1.
00:41:00.870 --> 00:41:03.580
You still have this diagonal
matrix here, but the
00:41:03.580 --> 00:41:06.610
eigenvalues have all
been doubled.
00:41:06.610 --> 00:41:12.660
If you keep doing that
repeatedly, you find out that
00:41:12.660 --> 00:41:17.410
P to the n-- namely, this
long-term transition matrix,
00:41:17.410 --> 00:41:20.290
which is the thing we're
interested in--
00:41:20.290 --> 00:41:25.860
is the matrix u times this
diagonal matrix, lambda to the
00:41:25.860 --> 00:41:29.710
n, times u to the minus 1.
00:41:29.710 --> 00:41:34.910
Equation 329 in the text has a
typo, and it should be this.
00:41:34.910 --> 00:41:39.650
It's given as u to the minus 1
times lambda to the n times u,
00:41:39.650 --> 00:41:43.030
which is not at all right.
00:41:43.030 --> 00:41:47.730
That's probably the worst typo,
because if you try to
00:41:47.730 --> 00:41:51.076
say something from that, you'll
get very confused.
00:41:51.076 --> 00:41:54.700
You can solve one in general if
all the M eigenvalues are
00:41:54.700 --> 00:41:57.730
distinct as easily as
for M equals 2.
00:41:57.730 --> 00:42:01.020
This is still valid
so long as the
00:42:01.020 --> 00:42:04.780
eigenvectors span the space.
00:42:04.780 --> 00:42:09.600
So now the thing we want to
do is relatively simple.
00:42:09.600 --> 00:42:14.460
This lambda to the n is
a diagonal matrix.
00:42:14.460 --> 00:42:19.900
I can represent it as the sum
of M different matrices.
00:42:19.900 --> 00:42:23.750
And each of those matrices
has only one
00:42:23.750 --> 00:42:26.590
diagonal element, non-0.
00:42:26.590 --> 00:42:30.670
In other words, for the case
here, what we're doing is
00:42:30.670 --> 00:42:40.650
taking lambda 1, 0 to the n,
0 lambda 2 to the n, and
00:42:40.650 --> 00:42:54.090
representing this as lambda 1 to
the n, 0, 0, 0, plus 0, 0,
00:42:54.090 --> 00:42:57.890
0 lambda 2 to the n.
00:43:01.240 --> 00:43:07.940
So we have those trivial
matrices with u on the left
00:43:07.940 --> 00:43:11.620
side and u to the minus
1 on the right side.
00:43:11.620 --> 00:43:17.920
And we think of how to multiply
the matrix u, which
00:43:17.920 --> 00:43:23.910
is a matrix whose columns are
the eigenvectors, times this
00:43:23.910 --> 00:43:27.730
matrix with only one non-0
element, times the matrix
00:43:27.730 --> 00:43:33.620
here, whose elements are
the left eigenvectors.
00:43:33.620 --> 00:43:36.050
And how do you do that?
00:43:36.050 --> 00:43:40.200
Well, if you do this for a
while, and you think of what
00:43:40.200 --> 00:43:46.720
this one element here times
a matrix whose rows are
00:43:46.720 --> 00:43:52.280
eigenvectors does, this non-0
term in here picks out the
00:43:52.280 --> 00:43:54.830
appropriate row here.
00:43:54.830 --> 00:43:57.310
And this non-0 element
picks out the
00:43:57.310 --> 00:43:59.620
appropriate column here.
00:43:59.620 --> 00:44:05.770
So what that gives you is p to
the n is equal to the sum over
00:44:05.770 --> 00:44:10.550
the number of states in the
Markov chain times lambda sub
00:44:10.550 --> 00:44:13.780
i-- the i-th value to
the nth power--
00:44:13.780 --> 00:44:17.090
times nu to the i times
pi to the i.
00:44:17.090 --> 00:44:21.560
pi to the i is the i-th
eigenvector of p.
00:44:21.560 --> 00:44:26.150
nu to the i is the i-th right
eigenvector of p.
00:44:26.150 --> 00:44:28.010
They have nothing
to do with n.
00:44:28.010 --> 00:44:33.370
The only thing that n affects
is this eigenvalue here.
00:44:33.370 --> 00:44:37.650
And what this is saying is that
p to the n is just the
00:44:37.650 --> 00:44:46.160
sum of eigenvalues which are,
if lambda is bigger than 1,
00:44:46.160 --> 00:44:47.720
this is exploding.
00:44:47.720 --> 00:44:51.540
If lambda 1 is less than
1, it's going to 0.
00:44:51.540 --> 00:44:55.410
And if lambda 1 is equal to
1, it's staying constant.
00:44:55.410 --> 00:45:00.790
If lambda 1 is complex but has
magnitude 1, then it's just
00:45:00.790 --> 00:45:04.770
gradually rotating around and
not doing much of interest at
00:45:04.770 --> 00:45:06.930
all, but it's going away.
00:45:06.930 --> 00:45:08.520
So that's what this
equation means.
00:45:08.520 --> 00:45:13.730
It says that we've converted the
problem of finding the nth
00:45:13.730 --> 00:45:18.030
power of p just to this problem
of finding the nth
00:45:18.030 --> 00:45:20.740
power of these eigenvalues.
00:45:20.740 --> 00:45:22.070
So we've made some
real progress.
00:45:22.070 --> 00:45:24.505
AUDIENCE: Professor, what
is nu i right here?
00:45:24.505 --> 00:45:24.992
PROFESSOR: What?
00:45:24.992 --> 00:45:26.453
AUDIENCE: What is nu i?
00:45:29.635 --> 00:45:34.680
PROFESSOR: nu sub i is the i-th
of the right eigenvectors
00:45:34.680 --> 00:45:37.624
of the matrix p.
00:45:37.624 --> 00:45:38.940
AUDIENCE: And pi i?
00:45:38.940 --> 00:45:44.020
PROFESSOR: And pi i is the
i-th left eigenvector.
00:45:44.020 --> 00:45:48.060
And what we've shown is that
these are orthogonal to each
00:45:48.060 --> 00:45:51.554
other, orthonormal.
00:45:51.554 --> 00:45:53.932
AUDIENCE: Can you please say
again what happens when lambda
00:45:53.932 --> 00:45:54.890
is complex?
00:45:54.890 --> 00:45:55.290
PROFESSOR: What?
00:45:55.290 --> 00:45:57.094
AUDIENCE: When lambda is
complex, what exactly happens?
00:46:01.110 --> 00:46:04.160
PROFESSOR: Oh, if lambda i is
complex and the magnitude is
00:46:04.160 --> 00:46:07.190
less than 1, it just
dies away.
00:46:07.190 --> 00:46:09.730
if the magnitude is bigger than
1, it explodes, which
00:46:09.730 --> 00:46:10.860
will be very strange.
00:46:10.860 --> 00:46:12.800
And we'll see that
can't happen.
00:46:12.800 --> 00:46:17.540
And if the magnitude is 1, as
you take powers of a complex
00:46:17.540 --> 00:46:22.320
number of magnitude 1, I mean,
it start out here, it goes
00:46:22.320 --> 00:46:23.820
here, then here.
00:46:23.820 --> 00:46:27.620
I mean, it just rotates around
in some crazy way.
00:46:27.620 --> 00:46:31.220
But it maintains its magnitude
as being equal
00:46:31.220 --> 00:46:32.470
to 1 all the time.
00:46:38.290 --> 00:46:40.850
So this is just repeating
what we had before.
00:46:40.850 --> 00:46:42.100
These are the eigenvectors.
00:46:46.350 --> 00:46:51.690
If you calculate this very
quickly using this and this,
00:46:51.690 --> 00:46:59.970
and if you recognize that the
right eigenvector, nu 2, is
00:46:59.970 --> 00:47:07.060
the first part of it is pi sub
2, the second part of it minus
00:47:07.060 --> 00:47:14.450
pi sub 1, where pi is just this
first eigenvector here.
00:47:14.450 --> 00:47:17.010
So if you do this
multiplication, you find that
00:47:17.010 --> 00:47:19.960
nu to the 1--
00:47:19.960 --> 00:47:21.560
oh, I thought I had all
of these things out.
00:47:21.560 --> 00:47:22.810
This should be nu.
00:47:26.580 --> 00:47:32.270
The first right eigenvector
times the first left
00:47:32.270 --> 00:47:32.630
eigenvector.
00:47:32.630 --> 00:47:35.750
Oh, but this is all right,
because I'm saying the first
00:47:35.750 --> 00:47:39.080
left eigenvector is a steady
state vector, which is the
00:47:39.080 --> 00:47:40.570
thing we're interested in.
00:47:40.570 --> 00:47:46.210
That's pi 1, pi 2, pi 1,
pi 2, where pi 1 is
00:47:46.210 --> 00:47:49.070
this and pi 2 is this.
00:47:49.070 --> 00:47:53.480
nu 2 times pi 2 is just this.
00:47:53.480 --> 00:47:59.910
So when we calculate p sub n,
we get pi 1 plus pi 2 times
00:47:59.910 --> 00:48:03.530
this eigenvalue to
the nth power.
00:48:03.530 --> 00:48:07.000
Pi 1 minus pi 1, lambda
2 to the nth power.
00:48:07.000 --> 00:48:12.110
pi 2 and pi 2 is what we get
for the main eigenvalue.
00:48:12.110 --> 00:48:14.980
This is what we get for
the little eigenvalue.
00:48:14.980 --> 00:48:20.790
This little eigenvalue here is
1 minus P12 minus P21, which
00:48:20.790 --> 00:48:29.690
has magnitude less than 1,
unless we either have the
00:48:29.690 --> 00:48:33.840
situation where P12 is equal to
P21 is equal to 0, or both
00:48:33.840 --> 00:48:35.140
of them are 1.
00:48:35.140 --> 00:48:38.500
So these are the terms
that go to 0.
00:48:38.500 --> 00:48:39.755
This solution is exact.
00:48:39.755 --> 00:48:41.990
There were no approximations
in here.
00:48:41.990 --> 00:48:47.140
Before, when we analyzed what
happened to P to the n, we saw
00:48:47.140 --> 00:48:49.690
that we converged, but
we didn't really
00:48:49.690 --> 00:48:51.250
see how fast we converged.
00:48:51.250 --> 00:48:53.750
Now we know how fast
we converge.
00:48:53.750 --> 00:48:59.010
The rate of convergence is
the value of this second
00:48:59.010 --> 00:49:01.790
eigenvalue here.
00:49:01.790 --> 00:49:04.230
And that's a pretty
general result.
00:49:04.230 --> 00:49:08.150
You converged like the
second-largest eigenvalue.
00:49:08.150 --> 00:49:10.900
And we'll see how
that works out.
00:49:15.210 --> 00:49:18.810
Now, let's go on to the case
where you have an arbitrary
00:49:18.810 --> 00:49:20.230
number of states.
00:49:20.230 --> 00:49:23.820
We've almost solved that
already, because as we were
00:49:23.820 --> 00:49:29.870
looking at the case with two
states, we were doing most of
00:49:29.870 --> 00:49:32.000
the things in general.
00:49:32.000 --> 00:49:36.430
If you have an n state Markov
chain, a determinant of P
00:49:36.430 --> 00:49:40.760
minus lambda is a polynomial
of degree M in lambda.
00:49:40.760 --> 00:49:42.790
That was what we said
a while ago.
00:49:42.790 --> 00:49:45.480
It has M roots, eigenvalues.
00:49:45.480 --> 00:49:48.740
And here, we're going to assume
that those roots are
00:49:48.740 --> 00:49:49.520
all distinct.
00:49:49.520 --> 00:49:52.590
So we don't have to worry
about what happens with
00:49:52.590 --> 00:49:54.320
repeated roots.
00:49:54.320 --> 00:49:58.010
Each eigenvalue lambda sub i--
there are M of them now--
00:49:58.010 --> 00:50:03.160
has a right eigenvector,
nu sub i, and a left
00:50:03.160 --> 00:50:06.010
eigenvector, pi sub i.
00:50:06.010 --> 00:50:10.030
And we have seen that--
00:50:10.030 --> 00:50:11.220
well, we haven't seen it yet.
00:50:11.220 --> 00:50:13.140
We're going to show
it in a second.
00:50:13.140 --> 00:50:18.060
pi super i times nu super
j is equal to j for each
00:50:18.060 --> 00:50:20.420
ij unequal to i.
00:50:20.420 --> 00:50:24.660
If you scale either this or
that, when you saw this
00:50:24.660 --> 00:50:30.160
eigenvector equation, you have
a pi on both sides or a nu on
00:50:30.160 --> 00:50:34.280
both sides, and you have a scale
factor which can't be
00:50:34.280 --> 00:50:37.040
determined from the eigenvector
equation.
00:50:37.040 --> 00:50:41.020
So you have to choose that
scaling factor somehow.
00:50:41.020 --> 00:50:45.070
If we choose the scaling factor
appropriately, we get
00:50:45.070 --> 00:50:51.810
pi, the i-th left eigenvector,
times the i-th right
00:50:51.810 --> 00:50:52.075
eigenvector.
00:50:52.075 --> 00:50:53.610
This is just a number now.
00:50:53.610 --> 00:50:56.520
It's that times that.
00:50:56.520 --> 00:51:00.810
We can scale things, so
that's equal to 1.
00:51:00.810 --> 00:51:05.930
Then as before, let u be the
matrix with columns nu 1 to nu
00:51:05.930 --> 00:51:12.090
M, and let v have the rows, pi
1 to pi M. Because of this
00:51:12.090 --> 00:51:16.340
orthogonality relationship we've
set up, v times u is
00:51:16.340 --> 00:51:17.530
equal to i.
00:51:17.530 --> 00:51:26.310
So again, the left eigenvector
rows forms a matrix which is
00:51:26.310 --> 00:51:30.910
the inverse of the right
eigenvector columns.
00:51:30.910 --> 00:51:35.400
So that says v is equal
to u to the minus 1.
00:51:35.400 --> 00:51:41.040
Thus the eigenvector is nu, the
first right eigenvector up
00:51:41.040 --> 00:51:44.870
to the nth right eigenvector,
these are linearly
00:51:44.870 --> 00:51:46.100
independent.
00:51:46.100 --> 00:51:47.350
And they span M space.
00:51:50.040 --> 00:51:53.480
That's a very peculiar
thing we've done.
00:51:53.480 --> 00:51:57.600
We've said we have all these
M right eigenvectors.
00:51:57.600 --> 00:52:02.690
We don't know anything about
them, but what we do know is
00:52:02.690 --> 00:52:08.030
we also have M left
eigenvectors.
00:52:08.030 --> 00:52:13.070
And the left eigenvectors, as
we're going to show in just a
00:52:13.070 --> 00:52:17.800
second, are orthogonal to
the right eigenvectors.
00:52:17.800 --> 00:52:20.500
And therefore, when we look at
these two matrices, we can
00:52:20.500 --> 00:52:23.610
multiply them and get
the identity matrix.
00:52:23.610 --> 00:52:29.370
And that means that the right
eigenvectors have to be--
00:52:29.370 --> 00:52:31.970
when we look at the matrix of
the right eigenvectors, is
00:52:31.970 --> 00:52:33.220
non-singular.
00:52:34.920 --> 00:52:37.870
Very, very peculiar argument.
00:52:37.870 --> 00:52:41.350
I mean, we find out that those
right eigenvectors span the
00:52:41.350 --> 00:52:44.600
space, not by looking at the
right eigenvectors, but by
00:52:44.600 --> 00:52:48.220
looking at how they relate
to the left eigenvectors.
00:52:48.220 --> 00:52:51.370
But anyway, that's perfectly
all right.
00:52:51.370 --> 00:52:54.890
And so long as we can show
that we can satisfy this
00:52:54.890 --> 00:53:01.330
orthogonality condition, then in
fact all this works out. v
00:53:01.330 --> 00:53:03.560
is equal to u to the minus 1.
00:53:03.560 --> 00:53:06.210
These eigenvectors are linearly
independent and they
00:53:06.210 --> 00:53:07.380
span M space.
00:53:07.380 --> 00:53:08.630
Same here.
00:53:12.980 --> 00:53:17.010
And putting these equations
together, P times u equals u
00:53:17.010 --> 00:53:17.810
times lambda.
00:53:17.810 --> 00:53:19.960
This is exactly what
we did before.
00:53:19.960 --> 00:53:24.680
Post-multiplying by u to the
minus 1, we get P equals u
00:53:24.680 --> 00:53:27.210
times lambda times
u to the minus 1.
00:53:27.210 --> 00:53:30.430
P to the n is then u times
lambda to the n times u
00:53:30.430 --> 00:53:32.230
to the minus 1.
00:53:32.230 --> 00:53:35.670
All this stuff about convergence
is all revolving
00:53:35.670 --> 00:53:39.010
down to simply the question
of what happens to these
00:53:39.010 --> 00:53:40.200
eigenvalues.
00:53:40.200 --> 00:53:43.420
I mean, there's a mess first,
finding out what all these
00:53:43.420 --> 00:53:47.580
right eigenvectors are and
what all these left
00:53:47.580 --> 00:53:48.740
eigenvectors are.
00:53:48.740 --> 00:53:54.870
But once you do that, P to the
n is just looking at this
00:53:54.870 --> 00:53:57.300
quantity, breaking up
lambda to the n
00:53:57.300 --> 00:53:59.660
the way we did before.
00:53:59.660 --> 00:54:04.550
P to the n is just
this sum here.
00:54:04.550 --> 00:54:08.670
Now, each row of P sums to 1, so
e is a right eigenvector of
00:54:08.670 --> 00:54:10.960
eigenvalue 1.
00:54:10.960 --> 00:54:15.040
So we have a theorem that says
the left eigenvector pi of
00:54:15.040 --> 00:54:20.060
eigenvalue 1 is a steady state
vector if it's normalized to
00:54:20.060 --> 00:54:22.510
pi times e equals 1.
00:54:26.050 --> 00:54:30.760
So we almost did that before,
but now we want to be a little
00:54:30.760 --> 00:54:32.010
more careful about it.
00:54:38.768 --> 00:54:42.180
Oh, excuse me.
00:54:42.180 --> 00:54:45.640
The theorem is that the left
eigenvector pi is a steady
00:54:45.640 --> 00:54:48.040
state vector if it's normalized
in this way.
00:54:48.040 --> 00:54:53.250
In other words, we know that
there is a left eigenvector
00:54:53.250 --> 00:54:57.830
pi, which has eigenvalue 1,
because there's a right
00:54:57.830 --> 00:54:58.260
eigenvector.
00:54:58.260 --> 00:55:00.850
If there's a right eigenvector,
there has to be a
00:55:00.850 --> 00:55:02.320
left eigenvector.
00:55:02.320 --> 00:55:06.190
What we don't know is
that pi actually has
00:55:06.190 --> 00:55:08.340
non-negative terms.
00:55:08.340 --> 00:55:11.320
So that's the thing
we want to show.
00:55:11.320 --> 00:55:15.790
The proof is, there must be
a left eigenvector pi for
00:55:15.790 --> 00:55:16.960
eigenvalue 1.
00:55:16.960 --> 00:55:18.540
We already know that.
00:55:18.540 --> 00:55:25.275
For every j, Pi sub j is equal
to the sum over k times pi sub
00:55:25.275 --> 00:55:27.490
k times p sub kj.
00:55:27.490 --> 00:55:29.860
We don't know whether these
are complex or real.
00:55:29.860 --> 00:55:32.140
We don't know whether they're
positive or negative, if
00:55:32.140 --> 00:55:33.270
they're real.
00:55:33.270 --> 00:55:37.960
But we do know that since they
satisfy this eigenvector
00:55:37.960 --> 00:55:41.500
equation, they satisfy
this equation.
00:55:41.500 --> 00:55:43.670
If I take the magnitudes
of all of these
00:55:43.670 --> 00:55:45.220
things, what do I get?
00:55:45.220 --> 00:55:51.440
The magnitude on this side
is pi sub j magnitude.
00:55:51.440 --> 00:55:55.700
This is less than or equal to
the sum of the magnitudes of
00:55:55.700 --> 00:55:56.720
these terms.
00:55:56.720 --> 00:56:03.690
If you take two complex numbers
and you add them up,
00:56:03.690 --> 00:56:07.250
you get something which, in
magnitude, is less than or
00:56:07.250 --> 00:56:10.120
equal to the sum of
the magnitudes.
00:56:12.804 --> 00:56:16.030
It might sound strange,
but if you look
00:56:16.030 --> 00:56:20.070
in the complex plane--
00:56:20.070 --> 00:56:23.480
imaginary, real--
00:56:23.480 --> 00:56:27.250
and you look at one complex
number, and you add it to
00:56:27.250 --> 00:56:33.380
another complex number, this
distance here is less than or
00:56:33.380 --> 00:56:36.700
equal to this magnitude
plus this magnitude.
00:56:36.700 --> 00:56:38.940
That's all that equation
is saying.
00:56:38.940 --> 00:56:44.160
And this is equal to this
distance plus this distance if
00:56:44.160 --> 00:56:51.080
and only if each of these
components of the eigenvector
00:56:51.080 --> 00:56:55.110
that we're talking about, if and
only if those components
00:56:55.110 --> 00:56:57.630
are all heading off in
the same direction
00:56:57.630 --> 00:57:00.620
in the complex plane.
00:57:00.620 --> 00:57:02.404
Now what do we do?
00:57:02.404 --> 00:57:05.950
Well, you look at this for a
while and you say, OK, what
00:57:05.950 --> 00:57:11.031
happens if I sum this
inequality over j?
00:57:11.031 --> 00:57:15.320
Well, if I sum this
over j, I get one.
00:57:15.320 --> 00:57:28.410
And therefore when I sum both
sides over j, the sum over j
00:57:28.410 --> 00:57:33.240
of the magnitudes of these
eigenvector components is less
00:57:33.240 --> 00:57:36.570
than or equal to the sum over
k of the magnitude.
00:57:36.570 --> 00:57:38.760
This is the same as this.
00:57:38.760 --> 00:57:42.220
This j is just a dummy
index of summation.
00:57:42.220 --> 00:57:45.030
This is a dummy index
of summation.
00:57:45.030 --> 00:57:47.810
Obviously, this is less
than or equal to this.
00:57:47.810 --> 00:57:52.470
But what's interesting here is
that this is equal to this.
00:57:52.470 --> 00:57:56.290
And the only way this can be
equal to this is if every one
00:57:56.290 --> 00:58:00.450
of these things are satisfied
with equality.
00:58:00.450 --> 00:58:03.720
If any one of these are
satisfied with inequality,
00:58:03.720 --> 00:58:07.690
then when you add them all up,
this will be satisfied with
00:58:07.690 --> 00:58:10.120
inequality also, which
is impossible.
00:58:10.120 --> 00:58:15.080
So all of these are satisfied
with equality, which says that
00:58:15.080 --> 00:58:25.060
the magnitude of pi sub j, the
vector whose elements are the
00:58:25.060 --> 00:58:31.010
magnitudes of this thing we
started with, in fact form a
00:58:31.010 --> 00:58:35.700
steady state vector if we
normalize them to 1.
00:58:35.700 --> 00:58:38.600
It says these magnitudes
satisfy the
00:58:38.600 --> 00:58:41.270
steady state equation.
00:58:41.270 --> 00:58:45.010
These magnitudes are real
and they're positive.
00:58:45.010 --> 00:58:48.470
So when we normalize them to
sum to 1, we have a steady
00:58:48.470 --> 00:58:50.940
state vector.
00:58:50.940 --> 00:58:53.780
And therefore the left
eigenvector pi of eigenvalue 1
00:58:53.780 --> 00:58:57.790
is a steady state vector if it's
normalized to pi times e
00:58:57.790 --> 00:59:03.120
equals 1, which is the way we
want to normalize them.
00:59:03.120 --> 00:59:07.140
So there always is a steady
state vector for every
00:59:07.140 --> 00:59:08.840
finite-state Markov chain.
00:59:12.440 --> 00:59:15.580
So this is a non-negative vector
satisfying a steady
00:59:15.580 --> 00:59:16.960
state vector equation.
00:59:16.960 --> 00:59:20.420
And normalizing it, we have
a steady state vector.
00:59:20.420 --> 00:59:24.300
So we've demonstrated the
existence of a left
00:59:24.300 --> 00:59:27.480
eigenvector which is a
steady state vector.
00:59:27.480 --> 00:59:34.180
Another theorem is that every
eigenvalue satisfies lambda,
00:59:34.180 --> 00:59:37.520
magnitude of the eigenvalue is
less than or equal to 1.
00:59:37.520 --> 00:59:41.370
This, again, is sort of obvious,
because if you have
00:59:41.370 --> 00:59:45.190
an eigenvalue which is bigger
than 1 and you start taking
00:59:45.190 --> 00:59:49.020
powers of it, it starts marching
off to infinity.
00:59:49.020 --> 00:59:50.920
Now, you might say, maybe
something else
00:59:50.920 --> 00:59:52.140
is balancing that.
00:59:52.140 --> 00:59:55.760
But since you only have a finite
number of these things,
00:59:55.760 --> 00:59:58.050
that sounds pretty weird.
00:59:58.050 --> 00:59:59.790
And in fact, it is.
00:59:59.790 --> 01:00:09.140
So the proof of this is, we want
to assume that pi super l
01:00:09.140 --> 01:00:14.985
is the l-th of these
eigenvectors of P. Its
01:00:14.985 --> 01:00:18.820
eigenvalue is lambda sub l.
01:00:18.820 --> 01:00:25.170
It also is a left eigenvector of
P to the n with eigenvalue
01:00:25.170 --> 01:00:26.370
lambda to the n.
01:00:26.370 --> 01:00:29.120
That's what we've
shown before.
01:00:29.120 --> 01:00:33.070
I mean, you can multiply this
matrix P, and all you're doing
01:00:33.070 --> 01:00:37.710
is just taking powers
of the eigenvalue.
01:00:37.710 --> 01:00:43.160
So if we start out with lambda
to the n, let's forget about
01:00:43.160 --> 01:00:46.290
the l's, because we're just
looking at a fixed l now.
01:00:46.290 --> 01:00:54.870
Lambda to the nth power times
the j-th component of pi is
01:00:54.870 --> 01:01:04.900
equal to the sum over i of the
i-th component of pi times Pij
01:01:04.900 --> 01:01:06.640
to the n, for all j.
01:01:11.080 --> 01:01:14.430
Now I take the magnitude of
everything is before.
01:01:14.430 --> 01:01:17.510
The magnitude of this is, again,
less than or equal to
01:01:17.510 --> 01:01:19.380
the magnitude of this.
01:01:19.380 --> 01:01:25.510
I want to let beta be the
largest of these quantities.
01:01:25.510 --> 01:01:32.240
And when I put that maximizing
j in here, lambda to the l
01:01:32.240 --> 01:01:40.550
times beta is less than or equal
to the sum over i of--
01:01:40.550 --> 01:01:43.810
I can upper-bound
these by beta.
01:01:43.810 --> 01:01:47.340
So I wind up with lambda to the
l times beta is less than
01:01:47.340 --> 01:01:51.800
or equal to the sum over i of
beta times Pij to the n.
01:01:51.800 --> 01:01:54.680
I don't know what these powers
are, but they're certainly
01:01:54.680 --> 01:01:57.340
less than or equal to 1.
01:01:57.340 --> 01:02:03.920
So lambda sub l is less
than or equal to n.
01:02:03.920 --> 01:02:05.260
That's what this said.
01:02:05.260 --> 01:02:14.680
When you take this magnitude of
the l-th eigenvalue, it's
01:02:14.680 --> 01:02:17.210
less than or equal
to this number n.
01:02:17.210 --> 01:02:22.310
Now, if this number were larger
than 1, if it was 1
01:02:22.310 --> 01:02:27.300
plus 10 to the minus sixth,
and you multiplied it by a
01:02:27.300 --> 01:02:31.410
large enough number n, that
this would grow to be
01:02:31.410 --> 01:02:33.330
arbitrarily large.
01:02:33.330 --> 01:02:36.880
It can't grow to be arbitrarily
large, therefore
01:02:36.880 --> 01:02:39.890
the magnitude of lambda
sub l has to be less
01:02:39.890 --> 01:02:41.880
than or equal to 1.
01:02:41.880 --> 01:02:48.980
Tedious proof, but
unfortunately, the notes just
01:02:48.980 --> 01:02:50.230
assume this.
01:02:53.630 --> 01:02:56.610
Maybe I had some good, simple
reason for it before.
01:02:56.610 --> 01:02:59.600
I don't have any now, so I have
to go through a proof.
01:02:59.600 --> 01:03:04.440
Anyway, these two theorems, if
you look at them, are valid
01:03:04.440 --> 01:03:06.880
for all finite-state
Markov chains.
01:03:06.880 --> 01:03:12.190
There was no place that we
used the fact that we had
01:03:12.190 --> 01:03:14.790
anything with distinct
eigenvalues or anything.
01:03:14.790 --> 01:03:20.840
But now when we had distinct
eigenvalues, we have the nth
01:03:20.840 --> 01:03:28.500
power of P is the sum here again
over right eigenvectors
01:03:28.500 --> 01:03:32.600
times left eigenvectors.
01:03:32.600 --> 01:03:35.340
When you take a right
eigenvector, which is a column
01:03:35.340 --> 01:03:39.720
vector, times a left
eigenvector, which is a row
01:03:39.720 --> 01:03:44.080
vector, you get an
M by M matrix.
01:03:44.080 --> 01:03:47.330
I don't know what that matrix
is, but it's a matrix.
01:03:47.330 --> 01:03:50.980
It's a fixed matrix
independent of n.
01:03:50.980 --> 01:03:53.390
And the only thing that's
varying with n is these
01:03:53.390 --> 01:03:56.000
eigenvalues.
01:03:56.000 --> 01:03:59.220
These quantities are less
than or equal to 1.
01:03:59.220 --> 01:04:03.270
So if the chain is an ergodic
unit chain, we've already seen
01:04:03.270 --> 01:04:07.250
that one eigenvalue is 1, and
the rest of the eigenvalues
01:04:07.250 --> 01:04:09.260
are strictly less than
1 in magnitude.
01:04:09.260 --> 01:04:13.600
We saw that by showing that for
an ergodic unit chain, P
01:04:13.600 --> 01:04:16.060
to the n converged.
01:04:16.060 --> 01:04:21.280
So the rate at which P to the
n approaches e times pi is
01:04:21.280 --> 01:04:24.190
going to be determined
by the second-largest
01:04:24.190 --> 01:04:27.710
eigenvalue in here.
01:04:27.710 --> 01:04:31.180
And that second-largest
eigenvalue is going to be less
01:04:31.180 --> 01:04:33.880
than 1, strictly less than 1.
01:04:33.880 --> 01:04:35.170
We don't know what it is.
01:04:35.170 --> 01:04:39.040
Before, we knew this convergence
here for an
01:04:39.040 --> 01:04:43.050
ergodic unit chain
is exponential.
01:04:43.050 --> 01:04:45.380
Now we know that it's
exponential and we know
01:04:45.380 --> 01:04:48.740
exactly how fast it goes,
because the speed of
01:04:48.740 --> 01:04:52.480
convergence is just the
second-largest eigenvalue.
01:04:52.480 --> 01:04:58.530
If you want to know how fast P
to the n approaches e times
01:04:58.530 --> 01:05:02.170
the steady state vector pi,
all you have to do is find
01:05:02.170 --> 01:05:05.220
that second-largest eigenvalue,
and that tells you
01:05:05.220 --> 01:05:09.560
how fast the convergence is,
except for calculating these
01:05:09.560 --> 01:05:11.015
things, which are just fixed.
01:05:13.580 --> 01:05:19.200
If P is a periodic unit chain
with period d, then if you
01:05:19.200 --> 01:05:20.110
read the notes--
01:05:20.110 --> 01:05:22.110
you should read the notes--
01:05:22.110 --> 01:05:24.420
there are d eigenvalues
equally spaced
01:05:24.420 --> 01:05:26.160
around the unit circle.
01:05:26.160 --> 01:05:28.470
P to the n doesn't converge.
01:05:28.470 --> 01:05:33.040
The only thing you can say here
is, what happens if you
01:05:33.040 --> 01:05:37.280
look at P to the d-th power?
01:05:37.280 --> 01:05:39.890
And you can imagine what happens
if you look at P to
01:05:39.890 --> 01:05:44.070
the d-th power without
doing any analysis.
01:05:44.070 --> 01:05:49.290
I mean, we know that what
happens in a periodic chain is
01:05:49.290 --> 01:05:53.170
that you rotate from one set
of states to another set of
01:05:53.170 --> 01:05:56.220
states to another set of states
to another set of
01:05:56.220 --> 01:05:57.910
states, and then back
to the set of
01:05:57.910 --> 01:05:59.110
states you started with.
01:05:59.110 --> 01:06:01.220
And you keep rotating around.
01:06:01.220 --> 01:06:05.520
Now, there are d sets of states
going around here.
01:06:05.520 --> 01:06:08.860
What happens if I
take P to the d?
01:06:08.860 --> 01:06:12.320
P to the d is looking at
the d-step transitions.
01:06:12.320 --> 01:06:16.840
So it's looking at, if you start
here, after d steps,
01:06:16.840 --> 01:06:19.310
you're back here again,
after d steps,
01:06:19.310 --> 01:06:20.960
you're back here again.
01:06:20.960 --> 01:06:31.700
So the matrix, P to the d, is
in fact the matrix of d
01:06:31.700 --> 01:06:35.180
ergodic subclasses.
01:06:37.940 --> 01:06:41.090
And for each one of them,
whatever subclass you start
01:06:41.090 --> 01:06:44.200
in, you stay in that
subclass forever.
01:06:44.200 --> 01:06:49.130
So the analysis of a periodic
unit chain, really the classy
01:06:49.130 --> 01:06:52.730
way to do it is to look
at P to the d and see
01:06:52.730 --> 01:06:54.980
what happens there.
01:06:54.980 --> 01:06:58.800
And you see that you get
convergence within each
01:06:58.800 --> 01:07:03.030
subclass, but you just keep
rotating among subclasses.
01:07:03.030 --> 01:07:06.060
So there's nothing very
fancy going on there.
01:07:06.060 --> 01:07:09.860
You just rotate from one
subclass to another.
01:07:09.860 --> 01:07:12.350
And that's the way it is.
01:07:12.350 --> 01:07:14.720
And P to the n doesn't
converge.
01:07:14.720 --> 01:07:18.495
But P to the d times
n does converge.
01:07:22.900 --> 01:07:30.050
Now, let's look at the next-most
complicated state.
01:07:30.050 --> 01:07:34.460
Suppose we have M states and
we have M independent
01:07:34.460 --> 01:07:35.180
eigenvectors.
01:07:35.180 --> 01:07:38.650
OK, remember I told you that
there was a very ugly thing in
01:07:38.650 --> 01:07:43.140
linear algebra that said, when
you had an eigenvalue of
01:07:43.140 --> 01:07:50.590
multiplicity k, you might not
have k linearly independent
01:07:50.590 --> 01:07:51.060
eigenvectors.
01:07:51.060 --> 01:07:52.720
You might have a smaller
number of them.
01:07:52.720 --> 01:07:55.070
We'll look at an example
of that later.
01:07:55.070 --> 01:07:58.730
But here, I'm saying, let's
forget about that case,
01:07:58.730 --> 01:08:00.760
because it's ugly.
01:08:00.760 --> 01:08:04.640
Let's assume that whatever
multiplicity each of these
01:08:04.640 --> 01:08:08.950
eigenvalues has, if you have
an eigenvalue with
01:08:08.950 --> 01:08:15.010
multiplicity k, then you have
k linearly independent right
01:08:15.010 --> 01:08:19.279
eigenvectors and k linearly
independent left eigenvectors
01:08:19.279 --> 01:08:20.960
to correspond to that.
01:08:20.960 --> 01:08:26.420
And then when you add up all of
the eigenvectors, you have
01:08:26.420 --> 01:08:30.020
M linearly independent
eigenvectors.
01:08:30.020 --> 01:08:36.029
And what happens when you have M
linearly independent vectors
01:08:36.029 --> 01:08:39.710
in a space of dimension M?
01:08:39.710 --> 01:08:42.649
If you have M linearly
independent vectors in a space
01:08:42.649 --> 01:08:48.880
of dimension N, you expand the
whole space, which says that
01:08:48.880 --> 01:08:53.490
the vector of these eigenvectors
is in fact
01:08:53.490 --> 01:08:56.920
non-singular, which says, again,
we can do all of the
01:08:56.920 --> 01:08:58.700
stuff we did before.
01:08:58.700 --> 01:09:01.830
There's a little bit of a trick
in showing that the left
01:09:01.830 --> 01:09:04.460
eigenvectors and the right
eigenvectors can be made
01:09:04.460 --> 01:09:06.490
orthogonal.
01:09:06.490 --> 01:09:10.359
But aside from that,
P to the n is again
01:09:10.359 --> 01:09:13.960
equal to the same form.
01:09:13.960 --> 01:09:23.550
And what this form says is, if
all of the eigenvalues except
01:09:23.550 --> 01:09:27.250
one are less than 1, then you're
again going to approach
01:09:27.250 --> 01:09:28.649
steady state.
01:09:28.649 --> 01:09:29.899
What does that mean?
01:09:32.870 --> 01:09:39.729
Suppose I have more than one
ergodic chain, more than one
01:09:39.729 --> 01:09:44.350
ergodic class, or suppose I
have a periodic class or
01:09:44.350 --> 01:09:45.130
something else.
01:09:45.130 --> 01:09:49.399
Is it possible to have one
eigenvalue equal to 1 and all
01:09:49.399 --> 01:09:52.040
the other eigenvalues
be smaller?
01:09:52.040 --> 01:09:55.670
If there's one eigenvalue that's
equal to 1, according
01:09:55.670 --> 01:09:59.740
to this formula here, eventually
P to the n
01:09:59.740 --> 01:10:05.090
converges to that one
value equal to 1.
01:10:05.090 --> 01:10:09.290
And right eigenvector
can be taken as e.
01:10:09.290 --> 01:10:13.230
Left eigenvector can be taken
as a steady state vector pi.
01:10:13.230 --> 01:10:16.250
And we have the case
of convergence.
01:10:16.250 --> 01:10:20.830
Can you have convergence to all
the rows being the same if
01:10:20.830 --> 01:10:24.830
you have multiple
ergodic classes?
01:10:24.830 --> 01:10:25.900
No.
01:10:25.900 --> 01:10:28.820
If you have multiple ergodic
classes and you start out in
01:10:28.820 --> 01:10:30.040
one class, you stay there.
01:10:30.040 --> 01:10:32.350
You can't get out of it.
01:10:32.350 --> 01:10:35.190
If you have a periodic class
and you start out in that
01:10:35.190 --> 01:10:39.120
periodic class, you can't
have convergence there.
01:10:39.120 --> 01:10:47.100
So in this situation here, where
all the eigenvalues are
01:10:47.100 --> 01:10:51.180
distinct, you can only have
one eigenvalue equal to 1.
01:10:51.180 --> 01:10:55.270
Here, when we're going to this
more general case, we might
01:10:55.270 --> 01:10:58.470
have more than one eigenvalue
equal to 1.
01:10:58.470 --> 01:11:02.960
But if in fact we only have one
eigenvalue equal to 1, and
01:11:02.960 --> 01:11:06.440
all the others are strictly
smaller in magnitude, then in
01:11:06.440 --> 01:11:09.620
fact you're just talking about
this case of an ergodic unit
01:11:09.620 --> 01:11:10.505
chain again.
01:11:10.505 --> 01:11:14.490
It's the only place
you can be.
01:11:14.490 --> 01:11:19.350
So let's look at an
example of this.
01:11:19.350 --> 01:11:23.050
Suppose you have a Markov
chain which has l
01:11:23.050 --> 01:11:26.610
ergodic sets of states.
01:11:26.610 --> 01:11:29.420
You have one set of states.
01:11:40.990 --> 01:11:47.610
So we have one set of states
over here, which will all go
01:11:47.610 --> 01:11:50.480
back and forth to each other.
01:11:50.480 --> 01:11:52.850
Then another set of
states over here.
01:11:58.260 --> 01:12:03.840
Let's let l equal
2 in this case.
01:12:03.840 --> 01:12:05.945
So what happens in
this situation?
01:12:16.840 --> 01:12:18.660
We'll have to work quickly
before it gets up.
01:12:25.400 --> 01:12:29.860
Anybody with any sense, faced
with a Markov chain like this,
01:12:29.860 --> 01:12:32.800
would say if we start here,
we're going to stay here, if
01:12:32.800 --> 01:12:35.020
we start here, we're
going to stay here.
01:12:35.020 --> 01:12:37.150
Let's just analyze this first.
01:12:37.150 --> 01:12:39.390
And then after we're done
analyzing this,
01:12:39.390 --> 01:12:40.960
we'll analyze this.
01:12:40.960 --> 01:12:43.160
And then we'll put the
whole thing together.
01:12:43.160 --> 01:12:48.180
And what we will find is
a transition matrix
01:12:48.180 --> 01:12:49.510
which looks like this.
01:12:54.540 --> 01:12:56.420
And if you're here,
you stay here.
01:12:56.420 --> 01:12:57.990
If you're here, you stay here.
01:12:57.990 --> 01:13:01.630
We can find the eigenvalues
and eigenvectors of this.
01:13:01.630 --> 01:13:05.030
We can find the eigenvalues
and eigenvectors of this.
01:13:05.030 --> 01:13:08.530
If you look at this crazy
formula for finding
01:13:08.530 --> 01:13:12.940
determinants, what you're stuck
with is permutations
01:13:12.940 --> 01:13:16.500
within here times permutations
within here.
01:13:16.500 --> 01:13:20.490
So the eigenvalues that you wind
up with are products of
01:13:20.490 --> 01:13:21.960
the two eigenvalues.
01:13:21.960 --> 01:13:29.970
Or any eigenvalue here is an
eigenvalue of the whole thing.
01:13:29.970 --> 01:13:32.715
Any eigenvalue here is an
eigenvalue of the whole thing.
01:13:32.715 --> 01:13:36.120
And we just look at the sum of
the number of eigenvalues here
01:13:36.120 --> 01:13:37.300
and the number there.
01:13:37.300 --> 01:13:40.490
So we have a very boring
case here.
01:13:40.490 --> 01:13:44.750
Each ergodic set has an
eigenvalue equal to 1, has a
01:13:44.750 --> 01:13:47.580
right eigenvector equal to 1.
01:13:47.580 --> 01:13:53.090
When the steps of that state
and 0 elsewhere.
01:13:53.090 --> 01:13:56.290
There's also a steady state
vector on that set of states.
01:13:56.290 --> 01:13:58.120
We've already seen that.
01:13:58.120 --> 01:14:03.940
So P to the n converges to a
block diagonal matrix, where
01:14:03.940 --> 01:14:08.270
for each ergodic set, the rows
within that set are the same.
01:14:08.270 --> 01:14:21.400
So P to the n then
is pi 1, pi 1.
01:14:21.400 --> 01:14:27.095
And then here, we have
pi 2, pi 2, pi 2.
01:14:29.610 --> 01:14:34.000
So that's all that
can happen here.
01:14:34.000 --> 01:14:35.250
This is limit.
01:14:42.090 --> 01:14:47.220
So one message of this is that,
after you understand
01:14:47.220 --> 01:14:51.740
ergodic unit chains, you
understand almost everything.
01:14:51.740 --> 01:14:55.310
You still have to worry about
periodic unit chains.
01:14:55.310 --> 01:14:58.220
But you just take a power of
them, and then you have
01:14:58.220 --> 01:15:00.400
ergodic sets of states.
01:15:04.650 --> 01:15:07.250
one final thing.
01:15:07.250 --> 01:15:09.640
Good, I have five minutes
to talk about this.
01:15:09.640 --> 01:15:12.480
I don't want any more time to
talk about it, because I'll
01:15:12.480 --> 01:15:15.490
get terribly confused if I do.
01:15:15.490 --> 01:15:21.030
And it's a topic which, if you
want to read more about it,
01:15:21.030 --> 01:15:24.610
read about it in Strang.
01:15:24.610 --> 01:15:27.010
He obviously doesn't like
the topic either.
01:15:27.010 --> 01:15:28.710
Nobody likes the topic.
01:15:28.710 --> 01:15:33.320
Strang at least was driven to
say something clear about it.
01:15:33.320 --> 01:15:36.330
Most people don't even
bother to say
01:15:36.330 --> 01:15:38.260
something clear about it.
01:15:38.260 --> 01:15:42.190
There's a theorem, due to, I
guess, Jordan, because it's
01:15:42.190 --> 01:15:45.320
called a Jordan form.
01:15:45.320 --> 01:15:51.210
And what Jordan said is, in
the nice cases we talked
01:15:51.210 --> 01:15:57.860
about, you have this
decomposition of the
01:15:57.860 --> 01:16:04.090
transition matrix in P into a
matrix here whose columns are
01:16:04.090 --> 01:16:09.480
the right eigenvectors times
a matrix here, which is a
01:16:09.480 --> 01:16:13.140
diagonal matrix with the
eigenvalues along it.
01:16:13.140 --> 01:16:19.980
And this, finally, is a matrix
which is the inverse of this,
01:16:19.980 --> 01:16:24.200
and, which properly normalized,
is the left
01:16:24.200 --> 01:16:33.400
eigenvectors of P. And you can
replace this form by what's
01:16:33.400 --> 01:16:39.040
called a Jordan form, where P
is equal to some matrix u
01:16:39.040 --> 01:16:45.720
times the Jordan form matrix
j times the inverse of u.
01:16:45.720 --> 01:16:49.870
Now, u is no longer the
right eigenvectors.
01:16:49.870 --> 01:16:52.480
It can't be the right
eigenvectors, because when we
01:16:52.480 --> 01:16:56.090
needed Jordan form, we don't
have enough right eigenvectors
01:16:56.090 --> 01:16:58.030
to span the space.
01:16:58.030 --> 01:17:00.910
So it has to be something
else.
01:17:00.910 --> 01:17:04.450
And like everyone else,
we say, I don't care
01:17:04.450 --> 01:17:06.320
what that matrix is.
01:17:06.320 --> 01:17:09.940
Jordan proved that there is such
a matrix, and that's all
01:17:09.940 --> 01:17:11.270
we want to know.
01:17:11.270 --> 01:17:17.230
The important thing is that this
matrix j in here is as
01:17:17.230 --> 01:17:19.860
close as you can get it.
01:17:19.860 --> 01:17:25.400
It's a matrix, which along the
main diagonal, has all the
01:17:25.400 --> 01:17:28.310
eigenvalues with their
appropriate multiplicity.
01:17:28.310 --> 01:17:31.670
Namely, lambda 1 is
an eigenvalue with
01:17:31.670 --> 01:17:33.550
multiplicity 2.
01:17:33.550 --> 01:17:38.700
Lambda 2 is an eigenvalue
of multiplicity 3.
01:17:38.700 --> 01:17:43.210
And in this situation, you have
two eigenvectors here, so
01:17:43.210 --> 01:17:46.180
nothing appears up there.
01:17:46.180 --> 01:17:53.530
With this multiplicity 3
eigenvalue, there are only two
01:17:53.530 --> 01:17:56.370
linearly independent
eigenvectors.
01:17:56.370 --> 01:18:00.640
And therefore Jordan says, why
don't we stick a 1 in here and
01:18:00.640 --> 01:18:03.270
then solve everything else?
01:18:03.270 --> 01:18:08.770
And his theorem says, if you
do that, it in fact works.
01:18:08.770 --> 01:18:11.190
So every time--
01:18:11.190 --> 01:18:17.850
well, the eigenvalue is on the
main diagonal, the ones on the
01:18:17.850 --> 01:18:22.190
next diagonal up, the only place
would be anything non-0
01:18:22.190 --> 01:18:25.850
is on the main diagonal in this
form, and on the next
01:18:25.850 --> 01:18:29.400
diagonal up, where you
occasionally have a 1.
01:18:29.400 --> 01:18:33.420
And the 1 is to replace
the need for deficient
01:18:33.420 --> 01:18:33.900
eigenvectors.
01:18:33.900 --> 01:18:37.230
So every time you have a
deficient eigenvector, you
01:18:37.230 --> 01:18:39.260
have some 1 appearing there.
01:18:39.260 --> 01:18:40.960
And then there's a way
to solve for u.
01:18:40.960 --> 01:18:44.650
And I don't have any idea what
it is, and I don't care.
01:18:44.650 --> 01:18:49.390
But if you get interested in it,
I think that's wonderful.
01:18:49.390 --> 01:18:53.075
But please don't tell
me about it.
01:18:59.250 --> 01:19:04.400
Nice example of this is
this matrix here.
01:19:04.400 --> 01:19:10.160
What happens if you try to
take the determinant of P
01:19:10.160 --> 01:19:11.835
minus lambda i?
01:19:11.835 --> 01:19:16.850
Well, you have 1/2 minus lambda,
1/2 minus lambda, 1
01:19:16.850 --> 01:19:19.250
minus lambda.
01:19:19.250 --> 01:19:25.180
What are all the permutations
here that you can take?
01:19:25.180 --> 01:19:29.200
There's the permutation of
the main diagonal itself.
01:19:29.200 --> 01:19:33.380
If I try to include that
element, there's nothing I can
01:19:33.380 --> 01:19:35.880
do but have some element
down here.
01:19:35.880 --> 01:19:37.150
And all these elements are 0.
01:19:39.870 --> 01:19:43.480
So those elements don't
contribute to a
01:19:43.480 --> 01:19:45.140
determinant at all.
01:19:45.140 --> 01:19:49.100
So I have one eigenvalue
which is equal to 1.
01:19:49.100 --> 01:19:53.020
I have two values at
multiplicity 2, eigenvalue
01:19:53.020 --> 01:19:54.600
which is 1/2.
01:19:54.600 --> 01:19:58.070
If you try to find the
eigenvector here, you find
01:19:58.070 --> 01:19:59.930
there is only one.
01:19:59.930 --> 01:20:03.700
So in fact, this corresponds
to a Jordan form,
01:20:03.700 --> 01:20:07.180
where you have 1/2.
01:20:15.300 --> 01:20:22.010
1, and a 0, and a 1 here,
and 0 everywhere else.
01:20:29.650 --> 01:20:37.110
And now if I want to find P to
the n, I have u times this j
01:20:37.110 --> 01:20:39.320
times u to the minus
1 times u.
01:20:39.320 --> 01:20:42.140
All the u's in the middle
cancel out, so I wind up
01:20:42.140 --> 01:20:46.640
eventually with u times j
to the nth power times u
01:20:46.640 --> 01:20:48.020
to the minus 1.
01:20:48.020 --> 01:20:49.490
What is j to the nth power?
01:20:49.490 --> 01:20:56.260
What happens if I multiply this
matrix by itself n times?
01:20:56.260 --> 01:20:59.970
Well, it turns out that what
happens is that this main
01:20:59.970 --> 01:21:03.880
diagonal here, you wind
up with a 1/4 and
01:21:03.880 --> 01:21:06.350
then 1/8 and so forth.
01:21:06.350 --> 01:21:13.190
This term here, it goes
down exponential.
01:21:13.190 --> 01:21:24.270
Well, if you multiply this by
itself, eventually, you can
01:21:24.270 --> 01:21:27.920
see what's going on here more
easily if you draw the Markov
01:21:27.920 --> 01:21:29.160
chain for it.
01:21:29.160 --> 01:21:34.590
You have state 1, state
2, and state 3.
01:21:34.590 --> 01:21:40.680
State 1, there's a transition
1/2 and a transition 1/2.
01:21:40.680 --> 01:21:47.530
State 2, there's a transition
1/2 and a transition 1/2, And
01:21:47.530 --> 01:21:50.810
state 3, you just stay there.
01:21:50.810 --> 01:21:53.600
So the amount of time that it
takes you to get to steady
01:21:53.600 --> 01:21:56.820
state is the amount of
time it takes you--
01:21:56.820 --> 01:21:58.690
you start in state 1.
01:21:58.690 --> 01:22:01.930
You've got to make this
transition eventually, and
01:22:01.930 --> 01:22:05.690
then you've got to make this
transition eventually.
01:22:05.690 --> 01:22:08.800
And the amount of time that it
takes you to do that is the
01:22:08.800 --> 01:22:12.170
sum of the amount of time it
takes you to go there, plus
01:22:12.170 --> 01:22:15.220
the amount of time that
it takes to go there.
01:22:15.220 --> 01:22:16.960
So you have two random
variables.
01:22:16.960 --> 01:22:19.400
One is the time to go here.
01:22:19.400 --> 01:22:22.320
The other is the time
to go here.
01:22:22.320 --> 01:22:24.590
Both of those are geometrically
decreasing
01:22:24.590 --> 01:22:25.960
random variables.
01:22:25.960 --> 01:22:30.470
When we convolve those things
with each other, what we get
01:22:30.470 --> 01:22:31.940
is an extra term n.
01:22:31.940 --> 01:22:40.070
So we get an n times
1/2 to the n.
01:22:40.070 --> 01:22:43.200
So the thing which is different
in the Jordan form
01:22:43.200 --> 01:22:47.200
is, instead of having an
eigenvalue to the nth power,
01:22:47.200 --> 01:22:50.840
you have an eigenvalue times--
01:22:50.840 --> 01:22:54.610
if there's only a single one
there, there's an n there.
01:22:54.610 --> 01:22:58.450
If there are two 1s both
together, you get an n times n
01:22:58.450 --> 01:23:00.230
minus 1, and so forth.
01:23:00.230 --> 01:23:04.690
So worst case, you've got a
polynomial to the nth power
01:23:04.690 --> 01:23:07.020
times an eigenvalue.
01:23:07.020 --> 01:23:10.130
For all practical purposes, this
is still the eigenvalue
01:23:10.130 --> 01:23:11.950
going down exponentially.
01:23:11.950 --> 01:23:17.090
So for all practical purposes,
what you wind up with is the
01:23:17.090 --> 01:23:22.180
second-largest eigenvalue still
determines how fast you
01:23:22.180 --> 01:23:23.430
get convergence.
01:23:26.490 --> 01:23:29.120
Sorry, I took eight minutes
talking about the Jordan form.
01:23:29.120 --> 01:23:32.020
I wanted to take five minutes
talking about it.
01:23:32.020 --> 01:23:34.030
You can read more about
it in the notes.