1
00:00:00,530 --> 00:00:02,960
The following content is
provided under a Creative
2
00:00:02,960 --> 00:00:04,370
Commons license.
3
00:00:04,370 --> 00:00:07,410
Your support will help MIT
OpenCourseWare continue to
4
00:00:07,410 --> 00:00:11,060
offer high-quality educational
resources for free.
5
00:00:11,060 --> 00:00:13,960
To make a donation or view
additional materials from
6
00:00:13,960 --> 00:00:19,790
hundreds of MIT courses, visit
MIT OpenCourseWare at
7
00:00:19,790 --> 00:00:21,040
ocw.mit.edu.
8
00:00:23,540 --> 00:00:29,950
PROFESSOR: OK, let's get started
again on finite-state
9
00:00:29,950 --> 00:00:31,490
Markov chains.
10
00:00:31,490 --> 00:00:33,130
Sorry I was away last week.
11
00:00:33,130 --> 00:00:38,390
It was a long-term commitment
that I had to honor.
12
00:00:38,390 --> 00:00:40,170
But I think I will
be around for all
13
00:00:40,170 --> 00:00:41,550
the rest of the lectures.
14
00:00:41,550 --> 00:00:49,950
So I want to start out by
reviewing just a little bit.
15
00:00:49,950 --> 00:00:53,570
I'm spending a lot more time on
finite-state Markov chains
16
00:00:53,570 --> 00:00:57,940
than we usually do in this
course, partly because I've
17
00:00:57,940 --> 00:01:02,230
rewritten this section, partly
because I think the material
18
00:01:02,230 --> 00:01:04,510
is very important.
19
00:01:04,510 --> 00:01:09,850
It's sort of bread-and-butter
stuff, of
20
00:01:09,850 --> 00:01:11,930
discrete stochastic processes.
21
00:01:11,930 --> 00:01:13,690
You use it all the time.
22
00:01:13,690 --> 00:01:18,080
It's a foundation for almost
everything else.
23
00:01:18,080 --> 00:01:22,410
And after thinking about it
for a long time, it really
24
00:01:22,410 --> 00:01:23,800
isn't all that complicated.
25
00:01:23,800 --> 00:01:27,760
I used to think that all these
details of finding eigenvalues
26
00:01:27,760 --> 00:01:32,580
and eigenvectors and so on
were extremely tedious.
27
00:01:32,580 --> 00:01:35,810
And it turns out that
there's a very nice
28
00:01:35,810 --> 00:01:37,850
pleasant theory there.
29
00:01:37,850 --> 00:01:40,960
You can find all of these things
after you know what
30
00:01:40,960 --> 00:01:46,790
you're doing by very simple
computer packages.
31
00:01:46,790 --> 00:01:49,160
But they don't help if you don't
know what's going on.
32
00:01:49,160 --> 00:01:52,930
So here, we're trying to figure
out what's going on.
33
00:01:52,930 --> 00:01:57,720
So let's start out by reviewing
what we know about
34
00:01:57,720 --> 00:02:05,120
ergodic unichains and
proceed from there.
35
00:02:05,120 --> 00:02:10,710
An ergodic finite-state Markov
chain has transition
36
00:02:10,710 --> 00:02:17,190
probabilities which, if you look
at the transition matrix
37
00:02:17,190 --> 00:02:21,180
raised to the nth power, what
that gives you is the
38
00:02:21,180 --> 00:02:24,440
transition probabilities of
an n-step Markov chain.
39
00:02:24,440 --> 00:02:29,360
In other words, you start at
time 0, and at time n, you
40
00:02:29,360 --> 00:02:31,310
look at what state you're in.
41
00:02:31,310 --> 00:02:38,190
P sub ij to the nth power is
then the probability that
42
00:02:38,190 --> 00:02:42,110
you're in state j at time
n, given that you're in
43
00:02:42,110 --> 00:02:45,910
state i at time 0.
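As a numerical sketch of this point (the two-state chain and the power 50 below are made-up illustrative choices, not from the lecture), raising P to the nth power really does wash out the starting state:

```python
# P^n as the n-step transition matrix: a made-up two-state chain,
# raised to a large power, forgets its starting state -- both rows of
# P^n approach the same steady-state vector.
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    # start from the identity and multiply n times (fine for small n)
    result = [[1.0 if i == j else 0.0 for j in range(len(P))]
              for i in range(len(P))]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

P = [[0.9, 0.1],
     [0.2, 0.8]]
P50 = mat_pow(P, 50)   # both rows are very close to (2/3, 1/3)
```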
44
00:02:45,910 --> 00:02:48,980
So this has all the information
that you want
45
00:02:48,980 --> 00:02:53,180
about what happens to the
Markov chain as time gets large.
46
00:02:53,180 --> 00:02:57,010
One of the things we're most
concerned with is, do you go
47
00:02:57,010 --> 00:02:58,440
to steady state?
48
00:02:58,440 --> 00:03:01,150
And if you do go to steady
state, how fast do you go to
49
00:03:01,150 --> 00:03:02,610
steady state?
50
00:03:02,610 --> 00:03:05,830
And of course, this matrix
tells you the whole story
51
00:03:05,830 --> 00:03:11,720
there, because if you go to
steady state, and the Markov
52
00:03:11,720 --> 00:03:24,390
chain forgets where it started,
then P sub ij to the
53
00:03:24,390 --> 00:03:29,240
n goes to some constant, pi sub
j, which is independent of
54
00:03:29,240 --> 00:03:33,190
the starting state, i,
and independent of n,
55
00:03:33,190 --> 00:03:35,080
asymptotically, as n gets big.
56
00:03:35,080 --> 00:03:41,730
So this pi is a strictly
positive probability vector.
57
00:03:41,730 --> 00:03:43,680
I shouldn't say "so"-- it is.
58
00:03:43,680 --> 00:03:47,590
That's something that
was shown last time.
59
00:03:47,590 --> 00:03:55,530
If you multiply both sides of
this equation by P sub jk and
60
00:03:55,530 --> 00:03:57,980
sum over k, then what
do you get?
61
00:03:57,980 --> 00:04:01,580
You get P sub ik to
the n plus 1.
62
00:04:01,580 --> 00:04:02,995
That goes to a limit also.
63
00:04:02,995 --> 00:04:04,740
If the limit as n
goes to infin--
64
00:04:07,670 --> 00:04:10,730
then the limit as n plus 1 goes
to infinity is clearly
65
00:04:10,730 --> 00:04:12,150
the same thing.
66
00:04:12,150 --> 00:04:17,200
So this quantity here is
the sum over j, of pi
67
00:04:17,200 --> 00:04:19,329
sub j, P sub jk.
68
00:04:19,329 --> 00:04:25,730
And this quantity is equal to
pi sub k, just by definition
69
00:04:25,730 --> 00:04:26,930
of this quantity.
70
00:04:26,930 --> 00:04:29,960
So pi sub k is equal
to sum of pi j.
71
00:04:29,960 --> 00:04:32,050
Pjk, what does that say?
72
00:04:32,050 --> 00:04:34,770
That's the definition of
a steady state vector.
73
00:04:34,770 --> 00:04:40,310
That's the definition of, if
your probabilities of being in
74
00:04:40,310 --> 00:04:46,490
state k satisfy this equation,
then one step later, you still
75
00:04:46,490 --> 00:04:49,580
have the same probability
of being in state k.
76
00:04:49,580 --> 00:04:52,870
Two steps later, you still have
the same probability of
77
00:04:52,870 --> 00:04:54,860
being in state k.
78
00:04:54,860 --> 00:05:00,430
So this is called the steady
state equation.
79
00:05:00,430 --> 00:05:05,070
And a solution to that is called
a steady state vector.
80
00:05:05,070 --> 00:05:07,795
And that satisfies this.
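As a minimal check of the steady state equation pi P = pi (the off-diagonal probabilities are made up; for two states the equation can be solved in closed form):

```python
# Solving pi P = pi for a two-state chain (illustrative probabilities).
# Writing pi = (pi1, pi2) with pi1 + pi2 = 1, the equation reduces to
# pi1 * p12 = pi2 * p21, so pi1 = p21 / (p12 + p21).
p12, p21 = 0.1, 0.2          # made-up off-diagonal entries
P = [[1.0 - p12, p12],
     [p21, 1.0 - p21]]
pi = (p21 / (p12 + p21), p12 / (p12 + p21))

# one step of the chain leaves pi unchanged
pi_next = tuple(sum(pi[i] * P[i][j] for i in range(2)) for j in range(2))
```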
81
00:05:07,795 --> 00:05:10,990
In matrix terms, if you write
this out, what does it say?
82
00:05:10,990 --> 00:05:14,800
It says the limit as n
approaches infinity of p to
83
00:05:14,800 --> 00:05:21,210
the n is equal to the column
vector e of all 1s, times pi.
84
00:05:23,720 --> 00:05:26,300
The transpose here means
it's a column vector.
85
00:05:26,300 --> 00:05:30,200
So you have a column vector
times a row vector.
86
00:05:30,200 --> 00:05:32,760
Now, you know if you have a
row vector times a column
87
00:05:32,760 --> 00:05:36,360
vector, that just gives
you a number.
88
00:05:36,360 --> 00:05:38,960
If you have a column
vector times a row
89
00:05:38,960 --> 00:05:42,200
vector, what happens?
90
00:05:42,200 --> 00:05:46,510
Well, for each element
of the column, you
91
00:05:46,510 --> 00:05:47,970
get this whole row.
92
00:05:47,970 --> 00:05:50,840
And for the next element of the
column, you get the whole
93
00:05:50,840 --> 00:05:54,790
row down beneath it multiplied
by the element of the column,
94
00:05:54,790 --> 00:05:56,170
and so forth, on down.
95
00:05:56,170 --> 00:06:01,760
So a column vector times
a row vector is, in
96
00:06:01,760 --> 00:06:03,280
fact, a whole matrix.
97
00:06:03,280 --> 00:06:07,590
It's a j by j matrix.
98
00:06:07,590 --> 00:06:13,900
And since e is all 1s, what
that matrix is is a matrix
99
00:06:13,900 --> 00:06:17,950
where every row is a steady
state vector pi.
100
00:06:17,950 --> 00:06:21,950
So we're saying not only does
this pi that we're talking
101
00:06:21,950 --> 00:06:25,770
about satisfy this steady
state equation, but more
102
00:06:25,770 --> 00:06:29,640
important, it's this limiting
vector here.
103
00:06:29,640 --> 00:06:32,450
And as n goes to infinity,
you in fact do
104
00:06:32,450 --> 00:06:34,730
forget where you were.
105
00:06:34,730 --> 00:06:41,660
And the entire matrix of where
you are at time n, given where
106
00:06:41,660 --> 00:06:46,690
you were at time 0, goes to
just this fixed vector pi.
107
00:06:46,690 --> 00:06:51,377
So this is a column vector, and
pi is a row vector then.
108
00:06:54,660 --> 00:06:59,180
The same result almost holds
for ergodic unichains.
109
00:06:59,180 --> 00:07:01,510
What's an ergodic unichain?
110
00:07:01,510 --> 00:07:06,740
An ergodic unichain is an
ergodic set of states plus a
111
00:07:06,740 --> 00:07:09,060
whole bunch of transient
states.
112
00:07:09,060 --> 00:07:12,970
Doesn't matter whether the
transient states are one class
113
00:07:12,970 --> 00:07:16,360
of transient states or whether
it's multiple classes of
114
00:07:16,360 --> 00:07:17,960
transient states.
115
00:07:17,960 --> 00:07:19,850
It's just transient states.
116
00:07:19,850 --> 00:07:22,540
And there's one recurrent
class.
117
00:07:22,540 --> 00:07:24,970
And we're assuming here
that it's recurrent.
118
00:07:24,970 --> 00:07:28,740
So you can almost see
intuitively that if you start
119
00:07:28,740 --> 00:07:31,800
out in any one of these
transient states, you bum
120
00:07:31,800 --> 00:07:34,360
around through the transient
states for a while.
121
00:07:34,360 --> 00:07:39,800
And eventually, you flop off
into the recurrent class.
122
00:07:39,800 --> 00:07:41,110
And once you're in
the recurrent
123
00:07:41,110 --> 00:07:43,960
class, there's no return.
124
00:07:43,960 --> 00:07:46,220
So you stay there forever.
125
00:07:46,220 --> 00:07:49,070
Now, that's something that
has to be proven.
126
00:07:49,070 --> 00:07:50,190
And it's proven in the notes.
127
00:07:50,190 --> 00:07:51,885
It was probably proven
last time.
128
00:07:54,420 --> 00:07:59,250
But anyway, what happens then
is that the sole difference
129
00:07:59,250 --> 00:08:05,090
between ergodic unichains
just having a completely
130
00:08:05,090 --> 00:08:10,670
ergodic Markov chain is that the
steady state vector is now
131
00:08:10,670 --> 00:08:14,610
positive for all ergodic states
and it's 0 for all
132
00:08:14,610 --> 00:08:16,100
transient states.
133
00:08:16,100 --> 00:08:20,710
And aside from that, you still
get the same behavior.
134
00:08:20,710 --> 00:08:26,140
As n gets large, you go to the
steady state vector, which is
135
00:08:26,140 --> 00:08:30,920
the steady state vector
of the ergodic chain.
136
00:08:30,920 --> 00:08:35,500
If you're doing this stuff by
hand, how do you do it?
137
00:08:35,500 --> 00:08:39,270
Well, you start out just
with the ergodic class.
138
00:08:39,270 --> 00:08:42,130
I mean, you might as well
ignore everything else,
139
00:08:42,130 --> 00:08:45,250
because you know that eventually
you're in that
140
00:08:45,250 --> 00:08:46,280
ergodic class.
141
00:08:46,280 --> 00:08:49,830
And you find the steady state
vector in that ergodic class,
142
00:08:49,830 --> 00:08:51,480
and that's the steady
state vector you're
143
00:08:51,480 --> 00:08:53,890
going to wind up with.
144
00:08:53,890 --> 00:08:56,350
This is one advantage of
understanding what you're
145
00:08:56,350 --> 00:08:59,930
doing, because if you don't
understand what you're doing
146
00:08:59,930 --> 00:09:04,110
and you're just using computer
programs, then you never have
147
00:09:04,110 --> 00:09:06,360
any idea what's ergodic,
what's not
148
00:09:06,360 --> 00:09:07,650
ergodic or anything else.
149
00:09:07,650 --> 00:09:11,180
You just plug it in, you grind
away, you get some answer and
150
00:09:11,180 --> 00:09:15,230
say, ah, I'll publish a paper.
151
00:09:15,230 --> 00:09:19,220
And you put down exactly what
the computer says, but you
152
00:09:19,220 --> 00:09:22,520
have no interpretation
of it at all.
153
00:09:22,520 --> 00:09:27,700
So the other way of looking at
this is, when you have a bunch
154
00:09:27,700 --> 00:09:33,640
of transient states, and you
also have an ergodic class,
155
00:09:33,640 --> 00:09:40,050
you can represent the matrix this way if
the recurrent states are at
156
00:09:40,050 --> 00:09:43,190
the end of the chain and the
transient states are at the
157
00:09:43,190 --> 00:09:45,830
beginning of the chain.
158
00:09:45,830 --> 00:09:50,850
This matrix here is the matrix
of transition probabilities
159
00:09:50,850 --> 00:09:53,200
within the recurrent class.
160
00:09:53,200 --> 00:09:58,870
These are the probabilities for
going from the transient
161
00:09:58,870 --> 00:10:02,820
states to the recurrent class.
162
00:10:02,820 --> 00:10:04,810
And once you get over
here, the only place
163
00:10:04,810 --> 00:10:06,060
to go is down here.
164
00:10:10,880 --> 00:10:15,380
And the transient class is
just a t by t matrix.
165
00:10:15,380 --> 00:10:20,035
And the recurrent class
is just a j minus t
166
00:10:20,035 --> 00:10:22,540
by j minus t matrix.
167
00:10:22,540 --> 00:10:25,250
So the idea is that each
transient state eventually has
168
00:10:25,250 --> 00:10:28,910
a transition to a recurrent
state, and the class of
169
00:10:28,910 --> 00:10:33,770
recurrent states leads to
steady state as before.
170
00:10:33,770 --> 00:10:37,230
So that really, all that
analysis of ergodic
171
00:10:37,230 --> 00:10:43,470
unichains, if you look at it
intuitively, it's all obvious.
172
00:10:43,470 --> 00:10:48,230
Now, as in much of mathematics,
knowing that
173
00:10:48,230 --> 00:10:52,820
something is obvious does not
relieve you of the need to
174
00:10:52,820 --> 00:10:55,950
prove it, because sometimes you
find that something that
175
00:10:55,950 --> 00:10:58,800
looks obvious is true
most of the time but
176
00:10:58,800 --> 00:10:59,870
not all of the time.
177
00:10:59,870 --> 00:11:04,230
And that's the purpose of
doing these things.
178
00:11:04,230 --> 00:11:07,650
There's another way to express
this eigenvalue, eigenvector
179
00:11:07,650 --> 00:11:10,700
equation we have here.
180
00:11:10,700 --> 00:11:16,900
And that is that the transition
matrix minus lambda
181
00:11:16,900 --> 00:11:23,020
times the identity matrix times
the column vector v is
182
00:11:23,020 --> 00:11:24,250
equal to 0.
183
00:11:24,250 --> 00:11:30,760
That's the same as the equation
p times v is equal to
184
00:11:30,760 --> 00:11:34,910
v. That's the same as
a right eigenvector.
185
00:11:34,910 --> 00:11:38,880
Well, this is the equation
for an eigenvalue 1.
186
00:11:38,880 --> 00:11:42,780
This is an equation for an
arbitrary eigenvalue lambda.
187
00:11:42,780 --> 00:11:48,610
But p times v equals lambda
times v is the same as p minus
188
00:11:48,610 --> 00:11:50,810
lambda i times v equals 0.
189
00:11:50,810 --> 00:11:55,010
Why do we even bother to say
something so obvious?
190
00:11:55,010 --> 00:11:59,980
Well, because when you look at
linear algebra, how many of
191
00:11:59,980 --> 00:12:04,220
you have never studied any
linear algebra at all or have
192
00:12:04,220 --> 00:12:09,430
only studied completely
mathematical linear algebra,
193
00:12:09,430 --> 00:12:15,000
where you never deal with
n-tuples as vectors or
194
00:12:15,000 --> 00:12:16,970
matrices or any things
like this?
195
00:12:16,970 --> 00:12:18,220
Is there anyone?
196
00:12:21,410 --> 00:12:25,890
If you don't have this
background, pick up--
197
00:12:30,330 --> 00:12:31,633
what's his name?
198
00:12:31,633 --> 00:12:32,420
AUDIENCE: Strang.
199
00:12:32,420 --> 00:12:33,390
PROFESSOR: Strang.
200
00:12:33,390 --> 00:12:35,070
Strang's book.
201
00:12:35,070 --> 00:12:39,090
It's a remarkably simple-minded
book which says
202
00:12:39,090 --> 00:12:42,370
everything as clearly
as it can be stated.
203
00:12:42,370 --> 00:12:45,040
And it tells you everything
you have to know.
204
00:12:45,040 --> 00:12:48,730
And it does it in a very
straightforward way.
205
00:12:48,730 --> 00:12:52,860
So I highly recommend it to get
any of the background that
206
00:12:52,860 --> 00:12:53,960
you might need.
207
00:12:53,960 --> 00:12:55,660
Most of you, I'm sure, are very
208
00:12:55,660 --> 00:12:56,870
familiar with these things.
209
00:12:56,870 --> 00:12:59,760
So I'm just reminding
you of them.
210
00:12:59,760 --> 00:13:03,540
Now, a square matrix is singular
if there's a vector
211
00:13:03,540 --> 00:13:07,640
v, such that a times
v is equal to 0.
212
00:13:07,640 --> 00:13:10,890
That's just the definition
of singularity.
213
00:13:10,890 --> 00:13:16,340
Now, lambda is an eigenvalue of
a matrix p if and only if p
214
00:13:16,340 --> 00:13:19,180
minus lambda times
i is singular.
215
00:13:19,180 --> 00:13:23,600
In other words, if there's
some v for which p minus
216
00:13:23,600 --> 00:13:27,870
lambda i times v is equal to
0, that's what this says.
217
00:13:27,870 --> 00:13:32,150
You put p minus lambda i in
for a, and it says it's
218
00:13:32,150 --> 00:13:37,580
singular if there's some v
for which this matrix--
219
00:13:37,580 --> 00:13:40,940
this matrix is singular if
there's some v such that p
220
00:13:40,940 --> 00:13:44,860
minus lambda i times
v is equal to 0.
221
00:13:44,860 --> 00:13:48,800
So let a1 to am be
the columns of a.
222
00:13:48,800 --> 00:13:52,430
Then a is going to be
singular if a1 to am
223
00:13:52,430 --> 00:13:53,980
are linearly dependent.
224
00:13:53,980 --> 00:14:02,280
In other words, if there's some
set of coefficients you
225
00:14:02,280 --> 00:14:09,000
can attach to a1 times v1 plus
a2 times v2, plus up to am
226
00:14:09,000 --> 00:14:15,510
times vm such that that sum is
equal to 0, that means that a1
227
00:14:15,510 --> 00:14:17,910
to am are linearly dependent.
228
00:14:17,910 --> 00:14:24,580
It also means that the matrix a
times that v is equal to 0.
229
00:14:24,580 --> 00:14:27,390
So those two things say
the same thing again.
230
00:14:27,390 --> 00:14:30,200
So the square matrix, a, is
singular if and only if the
231
00:14:30,200 --> 00:14:34,390
rows of a are linearly
independent.
232
00:14:34,390 --> 00:14:36,100
We said columns here.
233
00:14:36,100 --> 00:14:38,260
Here, we're doing the
same thing for rows.
234
00:14:38,260 --> 00:14:40,120
It still holds true.
235
00:14:40,120 --> 00:14:44,340
And one new thing, if and only
if the determinant of a is
236
00:14:44,340 --> 00:14:45,720
equal to 0.
237
00:14:45,720 --> 00:14:49,210
One of the nice things about
determinants is that
238
00:14:49,210 --> 00:14:54,470
determinants are 0 if the matrix
is singular, if and
239
00:14:54,470 --> 00:14:56,170
only if the matrix
is singular.
240
00:14:56,170 --> 00:15:01,440
So the summary of all of this
for a matrix which is a
241
00:15:01,440 --> 00:15:02,740
transition matrix--
242
00:15:02,740 --> 00:15:04,960
namely, a stochastic matrix--
243
00:15:04,960 --> 00:15:10,050
is: lambda is an eigenvalue of
p, if and only if p minus
244
00:15:10,050 --> 00:15:14,320
lambda i is singular, if and
only if the determinant of p
245
00:15:14,320 --> 00:15:19,870
minus lambda i is equal to 0,
if and only if p times some
246
00:15:19,870 --> 00:15:25,150
vector v equals lambda v, and
if and only if u times p
247
00:15:25,150 --> 00:15:28,410
equals lambda u for some u.
248
00:15:28,410 --> 00:15:29,310
Yes?
249
00:15:29,310 --> 00:15:31,672
AUDIENCE: The second to last
statement is actually linearly
250
00:15:31,672 --> 00:15:34,540
independent, you said?
251
00:15:34,540 --> 00:15:35,974
The second to last.
252
00:15:35,974 --> 00:15:38,364
Square matrix a.
253
00:15:38,364 --> 00:15:39,215
No, above that.
254
00:15:39,215 --> 00:15:41,870
PROFESSOR: Oh, above that.
255
00:15:41,870 --> 00:15:46,910
A square matrix a is singular
if and only if the rows of a
256
00:15:46,910 --> 00:15:49,773
are linearly dependent, yes.
257
00:15:49,773 --> 00:15:50,680
AUDIENCE: Dependent.
258
00:15:50,680 --> 00:15:51,490
PROFESSOR: Dependent, yes.
259
00:15:51,490 --> 00:15:56,050
In other words, if there's
some vector v such that a
260
00:15:56,050 --> 00:16:08,370
times v is equal to 0, that
means that those columns are
261
00:16:08,370 --> 00:16:09,620
linearly dependent.
262
00:16:13,080 --> 00:16:16,630
So we need all of those
relationships.
263
00:16:16,630 --> 00:16:20,072
It says for every stochastic
matrix--
264
00:16:20,072 --> 00:16:22,820
oh, now this is something new.
265
00:16:22,820 --> 00:16:28,000
For every stochastic matrix,
P times e is equal to e.
266
00:16:28,000 --> 00:16:46,040
Obviously, because if you sum
up-- the sum of Pij over j is
267
00:16:46,040 --> 00:16:46,860
equal to 1.
268
00:16:46,860 --> 00:16:51,230
P sub ij is the probability,
given that you start in state
269
00:16:51,230 --> 00:16:54,370
i, that in the next step,
you'll be in state j.
270
00:16:54,370 --> 00:16:56,650
You have to be somewhere
in the next step.
271
00:16:56,650 --> 00:17:00,570
So if you sum these quantities
up, you have to get 1, which
272
00:17:00,570 --> 00:17:03,230
says you have to
be some place.
273
00:17:03,230 --> 00:17:04,480
So that's all this is saying.
274
00:17:07,109 --> 00:17:10,579
That's true for every
finite-state Markov chain in
275
00:17:10,579 --> 00:17:15,839
the world, no matter how ugly
it is, how many sets of
276
00:17:15,839 --> 00:17:20,660
recurrent states it has, how
much periodicity it has.
277
00:17:20,660 --> 00:17:26,010
A complete generality, P
times e is equal to e.
278
00:17:26,010 --> 00:17:30,620
So lambda equal to 1 is always an
eigenvalue of a stochastic
279
00:17:30,620 --> 00:17:35,350
matrix, and e is always
a right eigenvector.
280
00:17:35,350 --> 00:17:38,130
Well, from what we've just said,
that means there has to
281
00:17:38,130 --> 00:17:41,090
be a left eigenvector also.
282
00:17:41,090 --> 00:17:44,080
So there has to be some
pi such that pi times
283
00:17:44,080 --> 00:17:47,380
P is equal to pi.
284
00:17:47,380 --> 00:17:51,210
So suddenly, we find there's
also a left eigenvector.
285
00:17:51,210 --> 00:17:56,470
What we haven't shown yet is
that that pi that satisfies
286
00:17:56,470 --> 00:17:59,210
this equation is a probability
vector.
287
00:17:59,210 --> 00:18:03,070
Namely, we haven't shown that
all the components of pi are
288
00:18:03,070 --> 00:18:04,740
greater than or equal to 0.
289
00:18:04,740 --> 00:18:06,800
We still have to do that.
290
00:18:06,800 --> 00:18:10,340
And in fact, that's not
completely trivial.
291
00:18:10,340 --> 00:18:14,020
If we can find such a vector
that is a probability vector,
292
00:18:14,020 --> 00:18:17,960
the components sum to 1 and
they're not negative, then
293
00:18:17,960 --> 00:18:21,890
this is the equation for
a steady state vector.
294
00:18:21,890 --> 00:18:25,590
So what we don't know yet
is whether a steady
295
00:18:25,590 --> 00:18:27,120
state vector exists.
296
00:18:27,120 --> 00:18:31,400
We do know that a left
eigenvector exists.
297
00:18:31,400 --> 00:18:33,690
We're going to show later
that there is a steady
298
00:18:33,690 --> 00:18:35,050
state vector pi.
299
00:18:35,050 --> 00:18:40,400
In other words, a non-negative
vector which sums to 1 for all
300
00:18:40,400 --> 00:18:42,340
finite-state Markov chains.
301
00:18:42,340 --> 00:18:46,780
In other words, no matter how
messy it is, just like e, the
302
00:18:46,780 --> 00:18:50,900
column vector of all 1s is
always a right eigenvector of
303
00:18:50,900 --> 00:18:52,480
eigenvalue 1.
304
00:18:52,480 --> 00:18:56,800
There is always a non-negative
vector pi whose components sum
305
00:18:56,800 --> 00:19:03,590
to 1, which is a left
eigenvector with eigenvalue 1.
306
00:19:03,590 --> 00:19:06,260
So these two relationships
hold everywhere.
307
00:19:10,780 --> 00:19:15,030
Incidentally, the notes
at one point claim
308
00:19:15,030 --> 00:19:17,030
to have shown this.
309
00:19:17,030 --> 00:19:18,680
And the notes really
don't show it.
310
00:19:18,680 --> 00:19:21,270
I'm going to show
it to you today.
311
00:19:21,270 --> 00:19:22,400
I'm sorry for that.
312
00:19:22,400 --> 00:19:26,660
It's something I've known for so
long that I find it hard to
313
00:19:26,660 --> 00:19:29,420
say is this true or not.
314
00:19:29,420 --> 00:19:30,790
Of course it's true.
315
00:19:30,790 --> 00:19:34,540
But it does have to be shown,
and I will show it
316
00:19:34,540 --> 00:19:35,790
to you later on.
317
00:19:38,490 --> 00:19:44,410
Chapter three of the notes is
largely rewritten this year.
318
00:19:44,410 --> 00:19:47,920
And it has a few more typos
in it than most
319
00:19:47,920 --> 00:19:49,870
of the other chapters.
320
00:19:49,870 --> 00:19:52,280
And a few of the typos
are fairly important.
321
00:19:52,280 --> 00:19:55,518
I'll try to point some
of them out as we go.
322
00:19:55,518 --> 00:19:59,190
But I'm sure I haven't
caught them all yet.
323
00:19:59,190 --> 00:20:03,990
Now, what is the determinant
of an M by M matrix?
324
00:20:03,990 --> 00:20:08,660
It's this very simple-looking
but rather messy formula,
325
00:20:08,660 --> 00:20:13,560
which says the determinant of
a square matrix A is the sum
326
00:20:13,560 --> 00:20:14,810
over all permutations--
327
00:20:17,340 --> 00:20:19,000
and then there's a plus
minus here, which
328
00:20:19,000 --> 00:20:20,700
I'll talk about later--
329
00:20:20,700 --> 00:20:24,780
of the product from i equals
1 to M-- M is the number of
330
00:20:24,780 --> 00:20:30,270
states-- of A sub i.
331
00:20:30,270 --> 00:20:34,480
This is the component
of the ij position.
332
00:20:34,480 --> 00:20:36,300
And we're taking A sub i.
333
00:20:36,300 --> 00:20:40,260
And then the partition that
we're dealing with, mu sub i.
334
00:20:40,260 --> 00:20:46,520
So what we're doing is taking
a matrix with all sorts of
335
00:20:46,520 --> 00:20:48,050
terms in it--
336
00:20:48,050 --> 00:21:03,600
A11 up to A1j on to Aj1
up to A sub jj.
337
00:21:03,600 --> 00:21:06,880
And these permutations we're
talking about are ways of
338
00:21:06,880 --> 00:21:11,900
selecting one element from each
row and one element from
339
00:21:11,900 --> 00:21:12,540
each column.
340
00:21:12,540 --> 00:21:20,240
Namely, that first sum there is
talking about one element
341
00:21:20,240 --> 00:21:22,000
from each row.
342
00:21:22,000 --> 00:21:25,140
And then when we're talking
about a permutation here,
343
00:21:25,140 --> 00:21:29,700
we're doing something like, for
this row, we're looking
344
00:21:29,700 --> 00:21:31,170
at, say, this element.
345
00:21:31,170 --> 00:21:34,430
For this row, we might be
looking at this element.
346
00:21:34,430 --> 00:21:37,500
For this row, we might be
looking at this element, and
347
00:21:37,500 --> 00:21:40,790
so forth down, until finally,
we're looking at some
348
00:21:40,790 --> 00:21:41,910
element down here.
349
00:21:41,910 --> 00:21:45,090
Now, we've picked out every
column and every row in doing
350
00:21:45,090 --> 00:21:48,960
this, but we only have one
element in each row and one
351
00:21:48,960 --> 00:21:51,400
element in each column.
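The sum-over-permutations formula can be written down directly; as the lecture warns, it is conceptually useful but computationally hopeless for large matrices:

```python
from itertools import permutations

# det A = sum over permutations mu of sign(mu) * prod_i A[i][mu[i]]:
# one element from each row and each column, with a plus-or-minus sign.
# The number of terms is M factorial, so never compute this way at scale.
def det_by_permutations(A):
    n = len(A)
    total = 0.0
    for mu in permutations(range(n)):
        # sign(mu) = (-1)^(number of inversions in mu)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if mu[i] > mu[j])
        sign = -1.0 if inversions % 2 else 1.0
        prod = 1.0
        for i in range(n):
            prod *= A[i][mu[i]]
        total += sign * prod
    return total
```

For a 2x2 matrix this reduces to the familiar A11*A22 - A12*A21.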
352
00:21:51,400 --> 00:21:54,860
If you've studied linear algebra
and you're at all
353
00:21:54,860 --> 00:21:58,190
interested in computation, the
first thing that everybody
354
00:21:58,190 --> 00:22:02,610
tells you is that this is a
god-awful way to ever compute
355
00:22:02,610 --> 00:22:07,760
a determinant, because the
number of permutations grows
356
00:22:07,760 --> 00:22:10,680
very, very fast with the
size of the matrix.
357
00:22:10,680 --> 00:22:12,310
And therefore you don't
want to use this
358
00:22:12,310 --> 00:22:14,230
formula very often.
359
00:22:14,230 --> 00:22:18,590
It's a very useful formula
conceptually, though, because
360
00:22:18,590 --> 00:22:23,620
if we look at the determinant
of p minus lambda i, if we
361
00:22:23,620 --> 00:22:27,740
want to ask the question, how
many eigenvalues does this
362
00:22:27,740 --> 00:22:30,140
transition matrix have?
363
00:22:30,140 --> 00:22:33,190
well, the number of eigenvalues
it has is the
364
00:22:33,190 --> 00:22:36,940
number of values of lambda such
that the determinant of p
365
00:22:36,940 --> 00:22:42,270
minus lambda i is 0.
366
00:22:42,270 --> 00:22:44,840
Now, how many such
values are there?
367
00:22:44,840 --> 00:22:56,300
Well, you look at the matrix for
that, and you get A11 minus
368
00:22:56,300 --> 00:23:13,460
lambda, A12, and A22 minus
lambda, down to Ajj minus lambda.
369
00:23:13,460 --> 00:23:16,430
And none of the other elements
have lambda in it.
370
00:23:16,430 --> 00:23:20,450
So when you're looking at this
formula for finding the
371
00:23:20,450 --> 00:23:24,840
determinant, one of the
permutations is this permutation,
372
00:23:24,840 --> 00:23:30,520
which is a polynomial of
degree j in lambda.
373
00:23:30,520 --> 00:23:33,590
All of the others are
polynomials of degree less
374
00:23:33,590 --> 00:23:35,270
than j in lambda.
375
00:23:35,270 --> 00:23:38,750
And therefore this whole
bloody mess here is a
376
00:23:38,750 --> 00:23:44,140
polynomial of degree
j in lambda.
377
00:23:44,140 --> 00:23:48,410
So the equation, determinant of
p minus lambda i, which is
378
00:23:48,410 --> 00:23:53,070
a polynomial of degree j
in lambda, equals 0.
379
00:23:53,070 --> 00:23:55,440
How many roots does it have?
380
00:23:55,440 --> 00:23:58,120
Well, the fundamental theorem
of algebra says that a
381
00:23:58,120 --> 00:24:04,510
polynomial of degree j,
over the complex numbers--
382
00:24:04,510 --> 00:24:07,130
and real is a special
case of complex--
383
00:24:07,130 --> 00:24:12,360
that it has exactly j roots.
384
00:24:12,360 --> 00:24:16,630
So there are exactly,
in this case, M--
385
00:24:16,630 --> 00:24:17,780
excuse me, I've been
calling it j
386
00:24:17,780 --> 00:24:19,030
sometimes and M sometimes.
387
00:24:21,870 --> 00:24:26,810
This equation here has exactly
M roots to it.
388
00:24:26,810 --> 00:24:30,200
And since it has exactly M
roots, that's the number of
389
00:24:30,200 --> 00:24:32,370
eigenvalues there are.
390
00:24:32,370 --> 00:24:35,710
There's one flaw in
that argument.
391
00:24:35,710 --> 00:24:40,020
And that is, some of the roots
might be repeated.
392
00:24:40,020 --> 00:24:44,070
Say you have M roots
altogether.
393
00:24:44,070 --> 00:24:48,460
Some of them appear more than
one time, so you'll have roots
394
00:24:48,460 --> 00:24:51,240
of multiplicity, something
or other.
395
00:24:51,240 --> 00:24:54,740
And when you add up the
multiplicities of each of the
396
00:24:54,740 --> 00:24:58,860
distinct eigenvalues, you
get capital M, which is
397
00:24:58,860 --> 00:25:00,700
the number of states.
398
00:25:00,700 --> 00:25:04,580
So the number of different
eigenvalues is less than or
399
00:25:04,580 --> 00:25:09,940
equal to M. And the multiplicities
400
00:25:09,940 --> 00:25:17,350
of the distinct eigenvalues
add up to M.
401
00:25:17,350 --> 00:25:19,910
That's a simple, straightforward
fact.
402
00:25:19,910 --> 00:25:22,910
And it's worth remembering.
403
00:25:22,910 --> 00:25:24,850
So there are M roots
to the equation.
404
00:25:24,850 --> 00:25:28,100
Determinant p minus
lambda i equals 0.
405
00:25:28,100 --> 00:25:33,220
And therefore there are
M eigenvalues of p.
406
00:25:33,220 --> 00:25:38,210
And therefore you might think
that there are M eigenvectors.
407
00:25:38,210 --> 00:25:43,530
That, unfortunately, is
not true necessarily.
408
00:25:43,530 --> 00:25:46,460
That's one of the really--
409
00:25:46,460 --> 00:25:50,380
it's probably the only really
ugly thing in linear algebra.
410
00:25:50,380 --> 00:25:52,505
I mean, linear algebra is
a beautiful theory.
411
00:25:55,380 --> 00:25:58,260
I mean, it's like the
Poisson process.
412
00:25:58,260 --> 00:26:01,190
Everything that can
be true is true.
413
00:26:01,190 --> 00:26:03,070
And if something isn't true,
there's a simple
414
00:26:03,070 --> 00:26:05,580
counter-example of why
it can't be true.
415
00:26:05,580 --> 00:26:09,260
This thing is just
a bloody mess.
416
00:26:09,260 --> 00:26:15,570
But unfortunately, if you have
M states in a finite-state
417
00:26:15,570 --> 00:26:20,380
Markov chain, you might not have
M different eigenvectors.
418
00:26:20,380 --> 00:26:24,790
And that's unfortunate, but we
will forget about that for as
419
00:26:24,790 --> 00:26:28,780
long as we can, and we'll
finally come back to it
420
00:26:28,780 --> 00:26:31,130
towards the end.
421
00:26:31,130 --> 00:26:32,380
AUDIENCE: [INAUDIBLE]?
422
00:26:38,600 --> 00:26:38,870
PROFESSOR: What?
423
00:26:38,870 --> 00:26:41,158
AUDIENCE: Why would we care
about all the eigenvectors if
424
00:26:41,158 --> 00:26:45,790
we are only concerned with the
ones that [INAUDIBLE]?
425
00:26:45,790 --> 00:26:47,890
PROFESSOR: Well, because we're
interested in the other ones
426
00:26:47,890 --> 00:26:51,960
because that tells us how fast
p to the M converges to what
427
00:26:51,960 --> 00:26:54,760
it should be.
428
00:26:54,760 --> 00:26:58,700
I mean, all those other
eigenvalues, as we'll see, are
429
00:26:58,700 --> 00:27:04,884
the error terms in p to the
M as it approaches this
430
00:27:04,884 --> 00:27:06,868
asymptotic value.
431
00:27:06,868 --> 00:27:10,416
And therefore we want to know
what those eigenvalues are.
432
00:27:10,416 --> 00:27:11,630
At least we want to
know what the
433
00:27:11,630 --> 00:27:13,258
second-biggest eigenvalue is.
434
00:27:19,820 --> 00:27:23,850
Now, let's look at just
a case of two states.
435
00:27:23,850 --> 00:27:28,020
Most of the things that can
happen will happen with two
436
00:27:28,020 --> 00:27:31,280
states, except for this ugly
thing that I told you about
437
00:27:31,280 --> 00:27:33,330
that can't happen
with two states.
438
00:27:33,330 --> 00:27:36,370
And therefore two states is a
good thing to look at, because
439
00:27:36,370 --> 00:27:38,930
with two states, you can
calculate everything very
440
00:27:38,930 --> 00:27:43,210
easily and you don't have to
use any linear algebra.
441
00:27:43,210 --> 00:27:48,010
So if we look at a Markov chain
with two states, P sub
442
00:27:48,010 --> 00:27:54,620
ij is this set of transition
probabilities.
443
00:27:54,620 --> 00:28:03,880
The left eigenvector equation
is pi 1 times P11 plus pi 2
444
00:28:03,880 --> 00:28:07,490
times P21 is equal
to lambda pi 1.
445
00:28:07,490 --> 00:28:12,500
And so this is writing out
what we said before.
446
00:28:12,500 --> 00:28:17,770
The vector pi times the matrix
P is equal to lambda
447
00:28:17,770 --> 00:28:19,780
times the vector pi.
448
00:28:19,780 --> 00:28:21,560
That covers both of
these equations.
449
00:28:21,560 --> 00:28:23,810
Since M is only 2,
we only have to
450
00:28:23,810 --> 00:28:25,830
write things out twice.
451
00:28:25,830 --> 00:28:29,250
Same thing for the right
eigenvector equation.
452
00:28:29,250 --> 00:28:30,990
That's this.
453
00:28:30,990 --> 00:28:34,230
The determinant of P minus
lambda i, if we use this
454
00:28:34,230 --> 00:28:39,940
formula that we talked about
here, you put A11 minus
455
00:28:39,940 --> 00:28:42,430
lambda, A22 minus lambda.
456
00:28:42,430 --> 00:28:44,620
Well, then you're done.
457
00:28:44,620 --> 00:28:50,220
So all you need is P11 minus
lambda times P22 minus lambda.
458
00:28:50,220 --> 00:28:53,560
That's this permutation there.
459
00:28:53,560 --> 00:28:59,150
And then you have an odd
permutation, A12 times A21.
460
00:28:59,150 --> 00:29:01,470
How do you know which
permutations are even and
461
00:29:01,470 --> 00:29:04,160
which permutations are odd?
462
00:29:04,160 --> 00:29:07,070
It's how many flips
you have to do.
463
00:29:07,070 --> 00:29:09,930
But to see that that's
consistent, you really have to
464
00:29:09,930 --> 00:29:13,800
look at Strang or some book on
linear algebra, because it's
465
00:29:13,800 --> 00:29:15,590
not relevant here.
466
00:29:15,590 --> 00:29:18,100
But anyway, that determinant
is equal to
467
00:29:18,100 --> 00:29:19,970
this quantity here.
468
00:29:19,970 --> 00:29:24,850
That's a polynomial of
degree 2 in lambda.
469
00:29:24,850 --> 00:29:30,650
If you solve it, you find
out that one solution is
470
00:29:30,650 --> 00:29:32,690
lambda 1 equals 1.
471
00:29:32,690 --> 00:29:39,710
The other solution is lambda
2 is 1 minus P12 minus P21.
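A quick numeric sketch of that two-state calculation (not from the lecture; the transition probabilities 0.3 and 0.6 are assumed purely for illustration) solves the degree-2 characteristic polynomial directly and recovers lambda 1 = 1 and lambda 2 = 1 - P12 - P21:

```python
# Characteristic polynomial of the 2-state transition matrix
# P = [[1-p12, p12], [p21, 1-p21]]:
#   det(P - lambda*I) = (P11 - lambda)(P22 - lambda) - P12*P21.
# The two roots should be lambda1 = 1 and lambda2 = 1 - p12 - p21.

def eigenvalues_2state(p12, p21):
    """Solve lambda^2 - (P11 + P22)*lambda + det(P) = 0 by the quadratic formula."""
    p11, p22 = 1.0 - p12, 1.0 - p21
    trace = p11 + p22
    det = p11 * p22 - p12 * p21
    disc = (trace * trace - 4.0 * det) ** 0.5
    return (trace + disc) / 2.0, (trace - disc) / 2.0

lam1, lam2 = eigenvalues_2state(0.3, 0.6)
# lam1 is 1, and lam2 is 1 - 0.3 - 0.6 = 0.1, up to floating-point error.
```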
472
00:29:39,710 --> 00:29:44,020
Now, there are a bunch of
cases to look at here.
473
00:29:44,020 --> 00:29:48,770
If the off-diagonal transition
probabilities are both 0, what
474
00:29:48,770 --> 00:29:49,380
does that mean?
475
00:29:49,380 --> 00:29:52,520
It means whatever state you
start in, you stay there.
476
00:29:52,520 --> 00:29:56,450
If you start in state 1,
you stay there forever.
477
00:29:56,450 --> 00:29:59,590
If you start in state 2,
you stay there forever.
478
00:29:59,590 --> 00:30:04,520
That's a very boring Markov
chain, but it's not very nice
479
00:30:04,520 --> 00:30:06,780
for the theory.
480
00:30:06,780 --> 00:30:11,120
So we're going to leave that
case out for the time being.
481
00:30:11,120 --> 00:30:14,530
But anyway, if you have that
case, then the chain has two
482
00:30:14,530 --> 00:30:16,520
recurrent classes.
483
00:30:16,520 --> 00:30:19,740
Lambda equals 1, has
multiplicity 2.
484
00:30:19,740 --> 00:30:27,430
You have one eigenvalue of
algebraic multiplicity 2.
485
00:30:27,430 --> 00:30:30,660
I mean, it's just one number,
but it appears twice in this
486
00:30:30,660 --> 00:30:32,920
determinant equation.
487
00:30:32,920 --> 00:30:35,960
And it also appears twice in
the sense that you have two
488
00:30:35,960 --> 00:30:37,510
recurrent classes.
489
00:30:37,510 --> 00:30:43,930
And you will find that there are
two linearly independent
490
00:30:43,930 --> 00:30:47,210
left eigenvectors, two linearly
independent right
491
00:30:47,210 --> 00:30:48,360
eigenvectors.
492
00:30:48,360 --> 00:30:50,400
And how do you find those?
493
00:30:50,400 --> 00:30:54,160
You use your common sense and
you say, well, if you start in
494
00:30:54,160 --> 00:30:55,910
state 1, you're always there.
495
00:30:55,910 --> 00:30:58,400
If you start in state 2,
you're always there.
496
00:30:58,400 --> 00:31:01,130
Why do I even look at
these two states?
497
00:31:01,130 --> 00:31:05,330
This is a crazy thing where
wherever I start, I stay there
498
00:31:05,330 --> 00:31:09,130
and I only look at state
1 or state 2.
499
00:31:09,130 --> 00:31:13,820
It's scarcely even
a Markov chain.
500
00:31:13,820 --> 00:31:19,630
If P12 and P21 are both 1, what
it means is you can never
501
00:31:19,630 --> 00:31:21,610
go from state 1 to state 1.
502
00:31:21,610 --> 00:31:24,200
You always go from state
1 to state 2.
503
00:31:24,200 --> 00:31:27,220
And you always go from
state 2 to state 1.
504
00:31:27,220 --> 00:31:30,830
It means you have a two-state
periodic chain.
505
00:31:30,830 --> 00:31:33,130
And that's the other
crazy case.
506
00:31:33,130 --> 00:31:35,170
The other case is not
very interesting.
507
00:31:35,170 --> 00:31:38,800
There's nothing stochastic
about it at all.
508
00:31:38,800 --> 00:31:40,970
So the chain is periodic.
509
00:31:40,970 --> 00:31:45,170
And if you look at this equation
here, the second
510
00:31:45,170 --> 00:31:48,270
eigenvalue is equal
to minus 1.
511
00:31:48,270 --> 00:31:51,790
I might as well tell you that,
in general, if you have a
512
00:31:51,790 --> 00:31:57,520
periodic Markov chain, just one
recurrent class and it's
513
00:31:57,520 --> 00:32:04,760
periodic with period d, then the
eigenvalues turn out to be the
514
00:32:04,760 --> 00:32:08,890
d-th roots of unity, uniformly spaced
around the unit circle.
515
00:32:08,890 --> 00:32:10,920
One is one of the eigenvalues.
516
00:32:10,920 --> 00:32:12,490
We've already seen that.
517
00:32:12,490 --> 00:32:15,940
And the other d minus 1
eigenvalues are those
518
00:32:15,940 --> 00:32:18,690
uniformly spaced around
the unit circle.
519
00:32:18,690 --> 00:32:24,410
So they add up to 360 degrees
when you get all done with it.
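That roots-of-unity claim is easy to check on a small example. The chain below is a hypothetical deterministic 3-cycle (1 goes to 2, 2 goes to 3, 3 goes to 1), not one discussed in the lecture; its period is d = 3, and each cube root of unity makes the characteristic determinant vanish:

```python
import cmath

# Deterministic 3-cycle: state 1 -> 2 -> 3 -> 1, so the chain has period d = 3.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]

def char_det(lam):
    """det(P - lam*I) for the 3x3 matrix P, expanded by cofactors."""
    a = [[P[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

# The d-th roots of unity, uniformly spaced around the unit circle.
roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
for lam in roots:
    assert abs(char_det(lam)) < 1e-9  # each root is an eigenvalue of P
```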
520
00:32:24,410 --> 00:32:26,270
So that's an easy case.
521
00:32:26,270 --> 00:32:29,780
Proving that is tedious.
522
00:32:29,780 --> 00:32:31,420
It's done in the notes.
523
00:32:31,420 --> 00:32:32,690
It's not even done
in the notes.
524
00:32:32,690 --> 00:32:34,680
It's done in one of
the exercises.
525
00:32:34,680 --> 00:32:38,260
And you can do it
if you choose.
526
00:32:41,470 --> 00:32:46,540
So let's look at these
eigenvector equations and the
527
00:32:46,540 --> 00:32:48,400
eigenvalue equations.
528
00:32:48,400 --> 00:32:52,200
Incidentally, if you don't know
what the eigenvalues are,
529
00:32:52,200 --> 00:32:57,010
is this a linear set
of equations?
530
00:32:57,010 --> 00:32:59,380
No, it's a nonlinear
set of equations.
531
00:32:59,380 --> 00:33:02,770
This is a nonlinear set
of equations in pi
532
00:33:02,770 --> 00:33:06,930
1, pi 2, and lambda.
533
00:33:06,930 --> 00:33:11,390
How do you solve non-linear
equations like that?
534
00:33:11,390 --> 00:33:14,750
Well, if you have much sense,
you first find out what lambda
535
00:33:14,750 --> 00:33:17,350
is and then you solve
linear equations.
536
00:33:20,105 --> 00:33:21,390
And you can always do that.
537
00:33:21,390 --> 00:33:25,880
We've said that these solutions
for lambda, there
538
00:33:25,880 --> 00:33:28,660
can only be M of them.
539
00:33:28,660 --> 00:33:30,210
And you can find
them by solving
540
00:33:30,210 --> 00:33:32,220
this polynomial equation.
541
00:33:32,220 --> 00:33:35,910
Then you can solve the linear
equation by finding the
542
00:33:35,910 --> 00:33:37,100
eigenvectors.
543
00:33:37,100 --> 00:33:39,750
There are packages to do all
of these things, so there's
544
00:33:39,750 --> 00:33:44,650
nothing you should waste
time on doing here.
545
00:33:44,650 --> 00:33:49,090
It's just knowing what the
results are that's important.
546
00:33:49,090 --> 00:33:55,100
From now on, I'm going to assume
that P12 or P21 are
547
00:33:55,100 --> 00:33:56,140
greater than 0.
548
00:33:56,140 --> 00:33:58,650
In other words, I'm going to
assume that we don't have the
549
00:33:58,650 --> 00:34:05,010
periodic case and we don't have
the case where you have
550
00:34:05,010 --> 00:34:07,000
two classes of states.
551
00:34:07,000 --> 00:34:11,760
In other words, I'm going to
assume that our Markov chain
552
00:34:11,760 --> 00:34:13,080
is actually ergodic.
553
00:34:13,080 --> 00:34:17,530
That's the assumption that
I'm making here.
554
00:34:17,530 --> 00:34:22,500
If you then solve these
equations using lambda 1
555
00:34:22,500 --> 00:34:27,380
equals 1, you'll find
out that pi is the vector whose
556
00:34:27,380 --> 00:34:29,350
components sum to 1.
557
00:34:29,350 --> 00:34:32,670
First component is
P21 over the sum.
558
00:34:32,670 --> 00:34:36,700
Second component is
P12 over the sum.
559
00:34:36,700 --> 00:34:37,950
Not very interesting.
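A sketch of that steady-state computation, again with assumed illustrative probabilities rather than values from the lecture:

```python
def steady_state_2state(p12, p21):
    """Left eigenvector for lambda = 1, scaled so the components sum to 1.

    Requires p12 + p21 > 0, i.e. the non-degenerate case discussed above.
    """
    s = p12 + p21
    return (p21 / s, p12 / s)

pi1, pi2 = steady_state_2state(0.3, 0.6)
# Since P21 > P12, the chain spends more of its time in state 1:
# pi1 = 0.6/0.9 = 2/3 and pi2 = 0.3/0.9 = 1/3.
```

One can also confirm that this pi satisfies the eigenvector equation pi times P equals pi.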
560
00:34:40,440 --> 00:34:44,520
Why is the steady state
probability weighted towards
561
00:34:44,520 --> 00:34:47,839
the largest of these transition
probabilities?
562
00:34:47,839 --> 00:34:54,330
If P21 is bigger than P12, how
do you know intuitively that
563
00:34:54,330 --> 00:34:59,196
you're going to be in state 1
more than you're in state 2?
564
00:34:59,196 --> 00:35:03,131
Is this intuitively
obvious to-- yeah?
565
00:35:03,131 --> 00:35:04,381
AUDIENCE: [INAUDIBLE].
566
00:35:06,220 --> 00:35:08,810
PROFESSOR: Because you make more
transitions from 2 to 1.
567
00:35:08,810 --> 00:35:11,970
Well, actually you don't make
more transitions from 2 to 1.
568
00:35:11,970 --> 00:35:14,890
You make exactly the same number
of transitions, but
569
00:35:14,890 --> 00:35:17,260
since the probability is higher,
it means you have to
570
00:35:17,260 --> 00:35:20,160
be in state 1 more
of the time.
571
00:35:20,160 --> 00:35:21,410
Good.
572
00:35:23,840 --> 00:35:25,520
So these are the two.
573
00:35:28,740 --> 00:35:35,920
And this is the left eigenvector
for the second
574
00:35:35,920 --> 00:35:36,480
eigenvalue--
575
00:35:36,480 --> 00:35:39,230
namely, the smaller
eigenvalue.
576
00:35:39,230 --> 00:35:47,000
Now, if you look at these
equations, you'll notice that
577
00:35:47,000 --> 00:35:54,390
the vector pi, the i-th left
eigenvector, multiplied by the
578
00:35:54,390 --> 00:35:59,650
right j-th eigenvector, is
always equal to delta ij.
579
00:35:59,650 --> 00:36:05,790
In other words, the left
eigenvectors are orthogonal to
580
00:36:05,790 --> 00:36:08,670
the right eigenvectors.
581
00:36:08,670 --> 00:36:11,840
I mean, you can see this just
by multiplying it out.
582
00:36:11,840 --> 00:36:16,310
You multiply pi 1 times nu
1, and what do you get?
583
00:36:16,310 --> 00:36:20,460
You get this plus this,
which is 1.
584
00:36:20,460 --> 00:36:25,310
Delta ij means there's something
which is 1 when i is
585
00:36:25,310 --> 00:36:29,165
equal to j and 0 when i
is unequal to j.
586
00:36:29,165 --> 00:36:36,170
You take this and you
multiply it by this,
587
00:36:36,170 --> 00:36:36,950
and what do you get?
588
00:36:36,950 --> 00:36:41,160
You get P21 times P12
over the square.
589
00:36:41,160 --> 00:36:45,830
Minus P12 times P21, it's 0.
590
00:36:45,830 --> 00:36:47,040
Same thing here.
591
00:36:47,040 --> 00:36:53,160
1 minus 1, that vector times
this vector, is 0 again.
592
00:36:53,160 --> 00:36:56,160
So the cross-terms are 0.
593
00:36:56,160 --> 00:36:58,680
The diagonal terms are 1.
594
00:37:07,150 --> 00:37:08,400
That's the way it is.
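That delta-ij relationship between the left and right eigenvectors can be checked numerically for the two-state chain. The probabilities below are assumed for illustration; the eigenvector forms are the ones on the slides:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

p12, p21 = 0.3, 0.6
s = p12 + p21
pi1, pi2 = p21 / s, p12 / s

# Left eigenvectors as rows, right eigenvectors as columns:
# pi^1 = (pi1, pi2), pi^2 = (1, -1); nu^1 = (1, 1), nu^2 = (pi2, -pi1).
left = [(pi1, pi2), (1.0, -1.0)]
right = [(1.0, 1.0), (pi2, -pi1)]

for i in range(2):
    for j in range(2):
        delta = 1.0 if i == j else 0.0
        # Cross-terms vanish, diagonal terms are 1.
        assert abs(dot(left[i], right[j]) - delta) < 1e-9
```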
595
00:37:11,500 --> 00:37:13,410
So let's move on with this.
596
00:37:17,580 --> 00:37:21,440
These right eigenvector
equations, you can write them
597
00:37:21,440 --> 00:37:23,530
in matrix form.
598
00:37:23,530 --> 00:37:24,720
I'm doing this slowly.
599
00:37:24,720 --> 00:37:27,740
I hope I'm not boring those who
have done a lot of linear
600
00:37:27,740 --> 00:37:29,830
algebra too much.
601
00:37:29,830 --> 00:37:37,130
But they won't go on forever,
and it gets us to where we
602
00:37:37,130 --> 00:37:38,270
want to go.
603
00:37:38,270 --> 00:37:41,830
So if you take these two
equations and you write them
604
00:37:41,830 --> 00:37:48,170
in matrix form, what you get
is P times u, where u is a
605
00:37:48,170 --> 00:37:53,770
matrix whose columns are
the vector nu 1 and
606
00:37:53,770 --> 00:37:56,120
the vector nu 2.
607
00:37:56,120 --> 00:38:01,090
And capital lambda is the
diagonal matrix of the
608
00:38:01,090 --> 00:38:01,980
eigenvalues.
609
00:38:01,980 --> 00:38:07,540
If you multiply P times the
first column of u, and then
610
00:38:07,540 --> 00:38:11,940
you look at the first column of
this matrix, what you get--
611
00:38:11,940 --> 00:38:13,680
yes, that's exactly the
right way to do it.
612
00:38:17,100 --> 00:38:20,510
And if you're not doing that,
you're probably not
613
00:38:20,510 --> 00:38:21,330
understanding it.
614
00:38:21,330 --> 00:38:25,220
But if you just think of
ordinary matrix vector
615
00:38:25,220 --> 00:38:29,110
multiplication, this
all works out.
616
00:38:32,320 --> 00:38:36,700
Because of this orthogonality
relationship, we see that the
617
00:38:36,700 --> 00:38:52,530
matrix whose rows are the left
eigenvectors times the matrix
618
00:38:52,530 --> 00:38:56,980
whose columns are the
right eigenvectors,
619
00:38:56,980 --> 00:38:59,720
that's equal to i.
620
00:38:59,720 --> 00:39:01,740
Namely, it's equal to
the identity matrix.
621
00:39:01,740 --> 00:39:05,580
That's what this orthogonality
relationship means.
622
00:39:05,580 --> 00:39:12,730
This means that this matrix is
the inverse of this matrix.
623
00:39:12,730 --> 00:39:16,310
This proves that u
is invertible.
624
00:39:16,310 --> 00:39:20,250
And in fact, we've done this
just for m equals 2.
625
00:39:20,250 --> 00:39:24,390
But in fact, this proof is
general and holds for
626
00:39:24,390 --> 00:39:31,520
arbitrary Markov chains if the
eigenvectors span the space.
627
00:39:31,520 --> 00:39:32,960
And we'll see that later.
628
00:39:32,960 --> 00:39:38,030
We're doing this for m equals
2 now, so we how to proceed
629
00:39:38,030 --> 00:39:41,130
when we have an arbitrary
Markov chain.
630
00:39:41,130 --> 00:39:42,610
u is invertible.
631
00:39:42,610 --> 00:39:46,790
u to the minus 1 has pi
1 and pi 2 as rows.
632
00:39:46,790 --> 00:39:49,690
And thus P is going
to be equal to--
633
00:39:52,540 --> 00:39:54,020
I guess we should--
634
00:39:54,020 --> 00:39:56,180
oh, we set it up here.
635
00:39:56,180 --> 00:39:59,220
P times u is equal to
u times lambda.
636
00:39:59,220 --> 00:40:02,780
We've shown here that u is
invertible, therefore we can
637
00:40:02,780 --> 00:40:06,830
multiply this equation
by u to the minus 1.
638
00:40:06,830 --> 00:40:11,830
And we get the transition matrix
P is equal to u times
639
00:40:11,830 --> 00:40:15,730
the diagonal matrix lambda
times the matrix u
640
00:40:15,730 --> 00:40:18,300
to the minus 1.
641
00:40:18,300 --> 00:40:21,350
What happens if we try
to find P squared?
642
00:40:21,350 --> 00:40:25,470
Well, it's u times lambda
times u to the minus 1.
643
00:40:25,470 --> 00:40:28,470
One of the nice things about
matrices is you can multiply
644
00:40:28,470 --> 00:40:30,310
them, if you don't
worry about the
645
00:40:30,310 --> 00:40:32,960
details, almost like numbers.
646
00:40:32,960 --> 00:40:36,540
Times u times lambda times
u to the minus 1.
647
00:40:36,540 --> 00:40:41,580
Except you don't have
commutativity.
648
00:40:41,580 --> 00:40:44,310
That's the only thing
that you don't have.
649
00:40:44,310 --> 00:40:47,100
But anyway, you have u times
lambda times u to the minus 1
650
00:40:47,100 --> 00:40:50,600
times u times lambda times
u to the minus 1.
651
00:40:50,600 --> 00:40:54,840
This and this turn out to be
the identity matrix, so you
652
00:40:54,840 --> 00:40:58,220
have u times lambda times
lambda, which is lambda
653
00:40:58,220 --> 00:41:00,870
squared, times u
to the minus 1.
654
00:41:00,870 --> 00:41:03,580
You still have this diagonal
matrix here, but the
655
00:41:03,580 --> 00:41:06,610
eigenvalues have all
been doubled.
656
00:41:06,610 --> 00:41:12,660
If you keep doing that
repeatedly, you find out that
657
00:41:12,660 --> 00:41:17,410
P to the n-- namely, this
long-term transition matrix,
658
00:41:17,410 --> 00:41:20,290
which is the thing we're
interested in--
659
00:41:20,290 --> 00:41:25,860
is the matrix u times this
diagonal matrix, lambda to the
660
00:41:25,860 --> 00:41:29,710
n, times u to the minus 1.
661
00:41:29,710 --> 00:41:34,910
Equation 329 in the text has a
typo, and it should be this.
662
00:41:34,910 --> 00:41:39,650
It's given as u to the minus 1
times lambda to the n times u,
663
00:41:39,650 --> 00:41:43,030
which is not at all right.
664
00:41:43,030 --> 00:41:47,730
That's probably the worst typo,
because if you try to
665
00:41:47,730 --> 00:41:51,076
say something from that, you'll
get very confused.
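The corrected formula, P to the n equals u times lambda to the n times u to the minus 1, can be sanity-checked for the two-state chain by comparing it against brute-force repeated multiplication (probabilities assumed for illustration):

```python
def matmul(a, b):
    """2x2 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

p12, p21 = 0.3, 0.6
s = p12 + p21
pi1, pi2 = p21 / s, p12 / s
lam2 = 1.0 - s

P = [[1.0 - p12, p12], [p21, 1.0 - p21]]
U = [[1.0, pi2], [1.0, -pi1]]     # columns are the right eigenvectors nu^1, nu^2
V = [[pi1, pi2], [1.0, -1.0]]     # rows are the left eigenvectors; V = U^{-1}

n = 8
Lam_n = [[1.0, 0.0], [0.0, lam2 ** n]]   # Lambda^n: eigenvalues to the nth power
Pn_eigen = matmul(matmul(U, Lam_n), V)   # U * Lambda^n * U^{-1}

Pn_direct = P
for _ in range(n - 1):                   # brute-force P * P * ... * P
    Pn_direct = matmul(Pn_direct, P)
# The two results agree entry by entry, up to floating-point error.
```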
666
00:41:51,076 --> 00:41:54,700
You can solve this in general if
all the M eigenvalues are
667
00:41:54,700 --> 00:41:57,730
distinct as easily as
for M equals 2.
668
00:41:57,730 --> 00:42:01,020
This is still valid
so long as the
669
00:42:01,020 --> 00:42:04,780
eigenvectors span the space.
670
00:42:04,780 --> 00:42:09,600
So now the thing we want to
do is relatively simple.
671
00:42:09,600 --> 00:42:14,460
This lambda to the n is
a diagonal matrix.
672
00:42:14,460 --> 00:42:19,900
I can represent it as the sum
of M different matrices.
673
00:42:19,900 --> 00:42:23,750
And each of those matrices
has only one
674
00:42:23,750 --> 00:42:26,590
diagonal element, non-0.
675
00:42:26,590 --> 00:42:30,670
In other words, for the case
here, what we're doing is
676
00:42:30,670 --> 00:42:40,650
taking lambda 1, 0 to the n,
0 lambda 2 to the n, and
677
00:42:40,650 --> 00:42:54,090
representing this as lambda 1 to
the n, 0, 0, 0, plus 0, 0,
678
00:42:54,090 --> 00:42:57,890
0 lambda 2 to the n.
679
00:43:01,240 --> 00:43:07,940
So we have those trivial
matrices with u on the left
680
00:43:07,940 --> 00:43:11,620
side and u to the minus
1 on the right side.
681
00:43:11,620 --> 00:43:17,920
And we think of how to multiply
the matrix u, which
682
00:43:17,920 --> 00:43:23,910
is a matrix whose columns are
the eigenvectors, times this
683
00:43:23,910 --> 00:43:27,730
matrix with only one non-0
element, times the matrix
684
00:43:27,730 --> 00:43:33,620
here, whose elements are
the left eigenvectors.
685
00:43:33,620 --> 00:43:36,050
And how do you do that?
686
00:43:36,050 --> 00:43:40,200
Well, if you do this for a
while, and you think of what
687
00:43:40,200 --> 00:43:46,720
this one element here times
a matrix whose rows are
688
00:43:46,720 --> 00:43:52,280
eigenvectors does, this non-0
term in here picks out the
689
00:43:52,280 --> 00:43:54,830
appropriate row here.
690
00:43:54,830 --> 00:43:57,310
And this non-0 element
picks out the
691
00:43:57,310 --> 00:43:59,620
appropriate column here.
692
00:43:59,620 --> 00:44:05,770
So what that gives you is p to
the n is equal to the sum over
693
00:44:05,770 --> 00:44:10,550
the number of states in the
Markov chain times lambda sub
694
00:44:10,550 --> 00:44:13,780
i-- the i-th value to
the nth power--
695
00:44:13,780 --> 00:44:17,090
times nu to the i times
pi to the i.
696
00:44:17,090 --> 00:44:21,560
pi to the i is the i-th
left eigenvector of p.
697
00:44:21,560 --> 00:44:26,150
nu to the i is the i-th right
eigenvector of p.
698
00:44:26,150 --> 00:44:28,010
They have nothing
to do with n.
699
00:44:28,010 --> 00:44:33,370
The only thing that n affects
is this eigenvalue here.
700
00:44:33,370 --> 00:44:37,650
And what this is saying is that
p to the n is just the
701
00:44:37,650 --> 00:44:46,160
sum of eigenvalues which are,
if lambda is bigger than 1,
702
00:44:46,160 --> 00:44:47,720
this is exploding.
703
00:44:47,720 --> 00:44:51,540
If lambda i is less than
1, it's going to 0.
704
00:44:51,540 --> 00:44:55,410
And if lambda i is equal to
1, it's staying constant.
705
00:44:55,410 --> 00:45:00,790
If lambda i is complex but has
magnitude 1, then it's just
706
00:45:00,790 --> 00:45:04,770
gradually rotating around and
not doing much of interest at
707
00:45:04,770 --> 00:45:06,930
all, but it's going away.
708
00:45:06,930 --> 00:45:08,520
So that's what this
equation means.
709
00:45:08,520 --> 00:45:13,730
It says that we've converted the
problem of finding the nth
710
00:45:13,730 --> 00:45:18,030
power of p just to this problem
of finding the nth
711
00:45:18,030 --> 00:45:20,740
power of these eigenvalues.
712
00:45:20,740 --> 00:45:22,070
So we've made some
real progress.
713
00:45:22,070 --> 00:45:24,505
AUDIENCE: Professor, what
is nu i right here?
714
00:45:24,505 --> 00:45:24,992
PROFESSOR: What?
715
00:45:24,992 --> 00:45:26,453
AUDIENCE: What is nu i?
716
00:45:29,635 --> 00:45:34,680
PROFESSOR: nu sub i is the i-th
of the right eigenvectors
717
00:45:34,680 --> 00:45:37,624
of the matrix p.
718
00:45:37,624 --> 00:45:38,940
AUDIENCE: And pi i?
719
00:45:38,940 --> 00:45:44,020
PROFESSOR: And pi i is the
i-th left eigenvector.
720
00:45:44,020 --> 00:45:48,060
And what we've shown is that
these are orthogonal to each
721
00:45:48,060 --> 00:45:51,554
other, orthonormal.
722
00:45:51,554 --> 00:45:53,932
AUDIENCE: Can you please say
again what happens when lambda
723
00:45:53,932 --> 00:45:54,890
is complex?
724
00:45:54,890 --> 00:45:55,290
PROFESSOR: What?
725
00:45:55,290 --> 00:45:57,094
AUDIENCE: When lambda is
complex, what exactly happens?
726
00:46:01,110 --> 00:46:04,160
PROFESSOR: Oh, if lambda i is
complex and the magnitude is
727
00:46:04,160 --> 00:46:07,190
less than 1, it just
dies away.
728
00:46:07,190 --> 00:46:09,730
if the magnitude is bigger than
1, it explodes, which
729
00:46:09,730 --> 00:46:10,860
will be very strange.
730
00:46:10,860 --> 00:46:12,800
And we'll see that
can't happen.
731
00:46:12,800 --> 00:46:17,540
And if the magnitude is 1, as
you take powers of a complex
732
00:46:17,540 --> 00:46:22,320
number of magnitude 1, I mean,
it starts out here, it goes
733
00:46:22,320 --> 00:46:23,820
here, then here.
734
00:46:23,820 --> 00:46:27,620
I mean, it just rotates around
in some crazy way.
735
00:46:27,620 --> 00:46:31,220
But it maintains its magnitude
as being equal
736
00:46:31,220 --> 00:46:32,470
to 1 all the time.
737
00:46:38,290 --> 00:46:40,850
So this is just repeating
what we had before.
738
00:46:40,850 --> 00:46:42,100
These are the eigenvectors.
739
00:46:46,350 --> 00:46:51,690
If you calculate this very
quickly using this and this,
740
00:46:51,690 --> 00:46:59,970
and if you recognize that the
right eigenvector, nu 2, has
741
00:46:59,970 --> 00:47:07,060
first component pi sub
2 and second component minus
742
00:47:07,060 --> 00:47:14,450
pi sub 1, where pi is just this
first eigenvector here.
743
00:47:14,450 --> 00:47:17,010
So if you do this
multiplication, you find that
744
00:47:17,010 --> 00:47:19,960
nu to the 1--
745
00:47:19,960 --> 00:47:21,560
oh, I thought I had all
of these things out.
746
00:47:21,560 --> 00:47:22,810
This should be nu.
747
00:47:26,580 --> 00:47:32,270
The first right eigenvector
times the first left
748
00:47:32,270 --> 00:47:32,630
eigenvector.
749
00:47:32,630 --> 00:47:35,750
Oh, but this is all right,
because I'm saying the first
750
00:47:35,750 --> 00:47:39,080
left eigenvector is a steady
state vector, which is the
751
00:47:39,080 --> 00:47:40,570
thing we're interested in.
752
00:47:40,570 --> 00:47:46,210
That's pi 1, pi 2, pi 1,
pi 2, where pi 1 is
753
00:47:46,210 --> 00:47:49,070
this and pi 2 is this.
754
00:47:49,070 --> 00:47:53,480
nu 2 times pi 2 is just this.
755
00:47:53,480 --> 00:47:59,910
So when we calculate P to the n,
we get pi 1 plus pi 2 times
756
00:47:59,910 --> 00:48:03,530
this eigenvalue to
the nth power.
757
00:48:03,530 --> 00:48:07,000
Pi 1 minus pi 1, lambda
2 to the nth power.
758
00:48:07,000 --> 00:48:12,110
pi 2 and pi 2 is what we get
for the main eigenvalue.
759
00:48:12,110 --> 00:48:14,980
This is what we get for
the little eigenvalue.
760
00:48:14,980 --> 00:48:20,790
This little eigenvalue here is
1 minus P12 minus P21, which
761
00:48:20,790 --> 00:48:29,690
has magnitude less than 1,
unless we either have the
762
00:48:29,690 --> 00:48:33,840
situation where P12 is equal to
P21 is equal to 0, or both
763
00:48:33,840 --> 00:48:35,140
of them are 1.
764
00:48:35,140 --> 00:48:38,500
So these are the terms
that go to 0.
765
00:48:38,500 --> 00:48:39,755
This solution is exact.
766
00:48:39,755 --> 00:48:41,990
There were no approximations
in here.
767
00:48:41,990 --> 00:48:47,140
Before, when we analyzed what
happened to P to the n, we saw
768
00:48:47,140 --> 00:48:49,690
that we converged, but
we didn't really
769
00:48:49,690 --> 00:48:51,250
see how fast we converged.
770
00:48:51,250 --> 00:48:53,750
Now we know how fast
we converge.
771
00:48:53,750 --> 00:48:59,010
The rate of convergence is
the value of this second
772
00:48:59,010 --> 00:49:01,790
eigenvalue here.
773
00:49:01,790 --> 00:49:04,230
And that's a pretty
general result.
774
00:49:04,230 --> 00:49:08,150
You converged like the
second-largest eigenvalue.
775
00:49:08,150 --> 00:49:10,900
And we'll see how
that works out.
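That convergence-rate claim can be seen numerically in the two-state case: the deviation of an entry of P to the n from its limit shrinks by exactly the factor lambda 2 at each step (transition probabilities below are assumed for illustration):

```python
p12, p21 = 0.3, 0.6
s = p12 + p21
pi1 = p21 / s                  # steady-state probability of state 1
lam2 = 1.0 - s                 # second eigenvalue, magnitude less than 1 here

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = [[1.0 - p12, p12], [p21, 1.0 - p21]]
Pn = P
devs = []
for n in range(1, 6):
    devs.append(Pn[0][0] - pi1)   # deviation of the (1,1) entry from its limit
    Pn = matmul(Pn, P)

# Each extra power of P multiplies the deviation by lambda 2.
for a, b in zip(devs, devs[1:]):
    assert abs(b / a - lam2) < 1e-6
```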
776
00:49:15,210 --> 00:49:18,810
Now, let's go on to the case
where you have an arbitrary
777
00:49:18,810 --> 00:49:20,230
number of states.
778
00:49:20,230 --> 00:49:23,820
We've almost solved that
already, because as we were
779
00:49:23,820 --> 00:49:29,870
looking at the case with two
states, we were doing most of
780
00:49:29,870 --> 00:49:32,000
the things in general.
781
00:49:32,000 --> 00:49:36,430
If you have an M state Markov
chain, the determinant of P
782
00:49:36,430 --> 00:49:40,760
minus lambda I is a polynomial
of degree M in lambda.
783
00:49:40,760 --> 00:49:42,790
That was what we said
a while ago.
784
00:49:42,790 --> 00:49:45,480
It has M roots, eigenvalues.
785
00:49:45,480 --> 00:49:48,740
And here, we're going to assume
that those roots are
786
00:49:48,740 --> 00:49:49,520
all distinct.
787
00:49:49,520 --> 00:49:52,590
So we don't have to worry
about what happens with
788
00:49:52,590 --> 00:49:54,320
repeated roots.
789
00:49:54,320 --> 00:49:58,010
Each eigenvalue lambda sub i--
there are M of them now--
790
00:49:58,010 --> 00:50:03,160
has a right eigenvector,
nu sub i, and a left
791
00:50:03,160 --> 00:50:06,010
eigenvector, pi sub i.
792
00:50:06,010 --> 00:50:10,030
And we have seen that--
793
00:50:10,030 --> 00:50:11,220
well, we haven't seen it yet.
794
00:50:11,220 --> 00:50:13,140
We're going to show
it in a second.
795
00:50:13,140 --> 00:50:18,060
pi super i times nu super
j is equal to 0 for each
796
00:50:18,060 --> 00:50:20,420
j unequal to i.
797
00:50:20,420 --> 00:50:24,660
If you scale either this or
that, when you solve this
798
00:50:24,660 --> 00:50:30,160
eigenvector equation, you have
a pi on both sides or a nu on
799
00:50:30,160 --> 00:50:34,280
both sides, and you have a scale
factor which can't be
800
00:50:34,280 --> 00:50:37,040
determined from the eigenvector
equation.
801
00:50:37,040 --> 00:50:41,020
So you have to choose that
scaling factor somehow.
802
00:50:41,020 --> 00:50:45,070
If we choose the scaling factor
appropriately, we get
803
00:50:45,070 --> 00:50:51,810
pi, the i-th left eigenvector,
times the i-th right
804
00:50:51,810 --> 00:50:52,075
eigenvector.
805
00:50:52,075 --> 00:50:53,610
This is just a number now.
806
00:50:53,610 --> 00:50:56,520
It's that times that.
807
00:50:56,520 --> 00:51:00,810
We can scale things, so
that's equal to 1.
808
00:51:00,810 --> 00:51:05,930
Then as before, let u be the
matrix with columns nu 1 to nu
809
00:51:05,930 --> 00:51:12,090
M, and let v have the rows, pi
1 to pi M. Because of this
810
00:51:12,090 --> 00:51:16,340
orthogonality relationship we've
set up, v times u is
811
00:51:16,340 --> 00:51:17,530
equal to i.
812
00:51:17,530 --> 00:51:26,310
So again, the left eigenvector
rows form a matrix which is
813
00:51:26,310 --> 00:51:30,910
the inverse of the right
eigenvector columns.
814
00:51:30,910 --> 00:51:35,400
So that says v is equal
to u to the minus 1.
815
00:51:35,400 --> 00:51:41,040
Thus the eigenvectors, nu 1, the
first right eigenvector, up
816
00:51:41,040 --> 00:51:44,870
to nu M, the M-th right eigenvector,
these are linearly
817
00:51:44,870 --> 00:51:46,100
independent.
818
00:51:46,100 --> 00:51:47,350
And they span M space.
819
00:51:50,040 --> 00:51:53,480
That's a very peculiar
thing we've done.
820
00:51:53,480 --> 00:51:57,600
We've said we have all these
M right eigenvectors.
821
00:51:57,600 --> 00:52:02,690
We don't know anything about
them, but what we do know is
822
00:52:02,690 --> 00:52:08,030
we also have M left
eigenvectors.
823
00:52:08,030 --> 00:52:13,070
And the left eigenvectors, as
we're going to show in just a
824
00:52:13,070 --> 00:52:17,800
second, are orthogonal to
the right eigenvectors.
825
00:52:17,800 --> 00:52:20,500
And therefore, when we look at
these two matrices, we can
826
00:52:20,500 --> 00:52:23,610
multiply them and get
the identity matrix.
827
00:52:23,610 --> 00:52:29,370
And that means that the right
eigenvectors have to be--
828
00:52:29,370 --> 00:52:31,970
when we look at the matrix of
the right eigenvectors, is
829
00:52:31,970 --> 00:52:33,220
non-singular.
830
00:52:34,920 --> 00:52:37,870
Very, very peculiar argument.
831
00:52:37,870 --> 00:52:41,350
I mean, we find out that those
right eigenvectors span the
832
00:52:41,350 --> 00:52:44,600
space, not by looking at the
right eigenvectors, but by
833
00:52:44,600 --> 00:52:48,220
looking at how they relate
to the left eigenvectors.
834
00:52:48,220 --> 00:52:51,370
But anyway, that's perfectly
all right.
835
00:52:51,370 --> 00:52:54,890
And so long as we can show
that we can satisfy this
836
00:52:54,890 --> 00:53:01,330
orthogonality condition, then in
fact all this works out. v
837
00:53:01,330 --> 00:53:03,560
is equal to u to the minus 1.
838
00:53:03,560 --> 00:53:06,210
These eigenvectors are linearly
independent and they
839
00:53:06,210 --> 00:53:07,380
span M space.
840
00:53:07,380 --> 00:53:08,630
Same here.
841
00:53:12,980 --> 00:53:17,010
And putting these equations
together, P times u equals u
842
00:53:17,010 --> 00:53:17,810
times lambda.
843
00:53:17,810 --> 00:53:19,960
This is exactly what
we did before.
844
00:53:19,960 --> 00:53:24,680
Post-multiplying by u to the
minus 1, we get P equals u
845
00:53:24,680 --> 00:53:27,210
times lambda times
u to the minus 1.
846
00:53:27,210 --> 00:53:30,430
P to the n is then u times
lambda to the n times u
847
00:53:30,430 --> 00:53:32,230
to the minus 1.
848
00:53:32,230 --> 00:53:35,670
All this stuff about convergence
is all revolving
849
00:53:35,670 --> 00:53:39,010
down to simply the question
of what happens to these
850
00:53:39,010 --> 00:53:40,200
eigenvalues.
851
00:53:40,200 --> 00:53:43,420
I mean, there's a mess first,
finding out what all these
852
00:53:43,420 --> 00:53:47,580
right eigenvectors are and
what all these left
853
00:53:47,580 --> 00:53:48,740
eigenvectors are.
854
00:53:48,740 --> 00:53:54,870
But once you do that, P to the
n is just looking at this
855
00:53:54,870 --> 00:53:57,300
quantity, breaking up
lambda to the n
856
00:53:57,300 --> 00:53:59,660
the way we did before.
857
00:53:59,660 --> 00:54:04,550
P to the n is just
this sum here.
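A quick numerical sketch of this step, using a hypothetical two-state chain (the matrix below is an illustration, not one from the lecture): P to the n computed through the eigendecomposition u times lambda to the n times u inverse agrees with direct matrix powers.

```python
import numpy as np

# Hypothetical two-state ergodic chain, just to illustrate P^n = U Lambda^n U^{-1}.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

lam, U = np.linalg.eig(P)      # columns of U are right eigenvectors
U_inv = np.linalg.inv(U)       # rows of U_inv are (scaled) left eigenvectors

n = 10
Pn_eig = U @ np.diag(lam**n) @ U_inv       # P^n via the eigendecomposition
Pn_direct = np.linalg.matrix_power(P, n)   # P^n by repeated multiplication

print(np.allclose(Pn_eig, Pn_direct))      # the two agree
```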
858
00:54:04,550 --> 00:54:08,670
Now, each row of P sums to 1, so
e is a right eigenvector of
859
00:54:08,670 --> 00:54:10,960
eigenvalue 1.
860
00:54:10,960 --> 00:54:15,040
So we have a theorem that says
the left eigenvector pi of
861
00:54:15,040 --> 00:54:20,060
eigenvalue 1 is a steady state
vector if it's normalized to
862
00:54:20,060 --> 00:54:22,510
pi times e equals 1.
863
00:54:26,050 --> 00:54:30,760
So we almost did that before,
but now we want to be a little
864
00:54:30,760 --> 00:54:32,010
more careful about it.
865
00:54:38,768 --> 00:54:42,180
Oh, excuse me.
866
00:54:42,180 --> 00:54:45,640
The theorem is that the left
eigenvector pi is a steady
867
00:54:45,640 --> 00:54:48,040
state vector if it's normalized
in this way.
868
00:54:48,040 --> 00:54:53,250
In other words, we know that
there is a left eigenvector
869
00:54:53,250 --> 00:54:57,830
pi, which has eigenvalue 1,
because there's a right
870
00:54:57,830 --> 00:54:58,260
eigenvector.
871
00:54:58,260 --> 00:55:00,850
If there's a right eigenvector,
there has to be a
872
00:55:00,850 --> 00:55:02,320
left eigenvector.
873
00:55:02,320 --> 00:55:06,190
What we don't know is
that pi actually has
874
00:55:06,190 --> 00:55:08,340
non-negative terms.
875
00:55:08,340 --> 00:55:11,320
So that's the thing
we want to show.
876
00:55:11,320 --> 00:55:15,790
The proof is, there must be
a left eigenvector pi for
877
00:55:15,790 --> 00:55:16,960
eigenvalue 1.
878
00:55:16,960 --> 00:55:18,540
We already know that.
879
00:55:18,540 --> 00:55:25,275
For every j, pi sub j is equal
to the sum over k of pi sub
880
00:55:25,275 --> 00:55:27,490
k times p sub kj.
881
00:55:27,490 --> 00:55:29,860
We don't know whether these
are complex or real.
882
00:55:29,860 --> 00:55:32,140
We don't know whether they're
positive or negative, if
883
00:55:32,140 --> 00:55:33,270
they're real.
884
00:55:33,270 --> 00:55:37,960
But we do know that since they
satisfy this eigenvector
885
00:55:37,960 --> 00:55:41,500
equation, they satisfy
this equation.
886
00:55:41,500 --> 00:55:43,670
If I take the magnitudes
of all of these
887
00:55:43,670 --> 00:55:45,220
things, what do I get?
888
00:55:45,220 --> 00:55:51,440
The magnitude on this side
is pi sub j magnitude.
889
00:55:51,440 --> 00:55:55,700
This is less than or equal to
the sum of the magnitudes of
890
00:55:55,700 --> 00:55:56,720
these terms.
891
00:55:56,720 --> 00:56:03,690
If you take two complex numbers
and you add them up,
892
00:56:03,690 --> 00:56:07,250
you get something which, in
magnitude, is less than or
893
00:56:07,250 --> 00:56:10,120
equal to the sum of
the magnitudes.
894
00:56:12,804 --> 00:56:16,030
It might sound strange,
but if you look
895
00:56:16,030 --> 00:56:20,070
in the complex plane--
896
00:56:20,070 --> 00:56:23,480
imaginary, real--
897
00:56:23,480 --> 00:56:27,250
and you look at one complex
number, and you add it to
898
00:56:27,250 --> 00:56:33,380
another complex number, this
distance here is less than or
899
00:56:33,380 --> 00:56:36,700
equal to this magnitude
plus this magnitude.
900
00:56:36,700 --> 00:56:38,940
That's all that equation
is saying.
901
00:56:38,940 --> 00:56:44,160
And this is equal to this
distance plus this distance if
902
00:56:44,160 --> 00:56:51,080
and only if each of these
components of the eigenvector
903
00:56:51,080 --> 00:56:55,110
that we're talking about, if and
only if those components
904
00:56:55,110 --> 00:56:57,630
are all heading off in
the same direction
905
00:56:57,630 --> 00:57:00,620
in the complex plane.
906
00:57:00,620 --> 00:57:02,404
Now what do we do?
907
00:57:02,404 --> 00:57:05,950
Well, you look at this for a
while and you say, OK, what
908
00:57:05,950 --> 00:57:11,031
happens if I sum this
inequality over j?
909
00:57:11,031 --> 00:57:15,320
Well, if I sum this
over j, I get one.
910
00:57:15,320 --> 00:57:28,410
And therefore when I sum both
sides over j, the sum over j
911
00:57:28,410 --> 00:57:33,240
of the magnitudes of these
eigenvector components is less
912
00:57:33,240 --> 00:57:36,570
than or equal to the sum over
k of the magnitude.
913
00:57:36,570 --> 00:57:38,760
This is the same as this.
914
00:57:38,760 --> 00:57:42,220
This j is just a dummy
index of summation.
915
00:57:42,220 --> 00:57:45,030
This is a dummy index
of summation.
916
00:57:45,030 --> 00:57:47,810
Obviously, this is less
than or equal to this.
917
00:57:47,810 --> 00:57:52,470
But what's interesting here is
that this is equal to this.
918
00:57:52,470 --> 00:57:56,290
And the only way this can be
equal to this is if every one
919
00:57:56,290 --> 00:58:00,450
of these things are satisfied
with equality.
920
00:58:00,450 --> 00:58:03,720
If any one of these are
satisfied with inequality,
921
00:58:03,720 --> 00:58:07,690
then when you add them all up,
this will be satisfied with
922
00:58:07,690 --> 00:58:10,120
inequality also, which
is impossible.
923
00:58:10,120 --> 00:58:15,080
So all of these are satisfied
with equality, which says that
924
00:58:15,080 --> 00:58:25,060
the magnitude of pi sub j, the
vector whose elements are the
925
00:58:25,060 --> 00:58:31,010
magnitudes of this thing we
started with, in fact form a
926
00:58:31,010 --> 00:58:35,700
steady state vector if we
normalize them to 1.
927
00:58:35,700 --> 00:58:38,600
It says these magnitudes
satisfy the
928
00:58:38,600 --> 00:58:41,270
steady state equation.
929
00:58:41,270 --> 00:58:45,010
These magnitudes are real
and they're positive.
930
00:58:45,010 --> 00:58:48,470
So when we normalize them to
sum to 1, we have a steady
931
00:58:48,470 --> 00:58:50,940
state vector.
932
00:58:50,940 --> 00:58:53,780
And therefore the left
eigenvector pi of eigenvalue 1
933
00:58:53,780 --> 00:58:57,790
is a steady state vector if it's
normalized to pi times e
934
00:58:57,790 --> 00:59:03,120
equals 1, which is the way we
want to normalize them.
935
00:59:03,120 --> 00:59:07,140
So there always is a steady
state vector for every
936
00:59:07,140 --> 00:59:08,840
finite-state Markov chain.
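As a sketch of this existence theorem, pi can be computed numerically as a left eigenvector of eigenvalue 1 (an eigenvector of P transpose), normalized so that pi times e equals 1. The three-state chain below is a made-up example.

```python
import numpy as np

# Illustrative ergodic chain (rows sum to 1); not from the lecture.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

lam, V = np.linalg.eig(P.T)        # eigenvectors of P.T are left eigenvectors of P
k = np.argmin(np.abs(lam - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(V[:, k])
pi = pi / pi.sum()                 # normalize so that pi . e = 1

print(pi)                          # non-negative, sums to 1, satisfies pi P = pi
```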
937
00:59:12,440 --> 00:59:15,580
So this is a non-negative vector
satisfying a steady
938
00:59:15,580 --> 00:59:16,960
state vector equation.
939
00:59:16,960 --> 00:59:20,420
And normalizing it, we have
a steady state vector.
940
00:59:20,420 --> 00:59:24,300
So we've demonstrated the
existence of a left
941
00:59:24,300 --> 00:59:27,480
eigenvector which is a
steady state vector.
942
00:59:27,480 --> 00:59:34,180
Another theorem is that every
eigenvalue satisfies lambda,
943
00:59:34,180 --> 00:59:37,520
magnitude of the eigenvalue is
less than or equal to 1.
944
00:59:37,520 --> 00:59:41,370
This, again, is sort of obvious,
because if you have
945
00:59:41,370 --> 00:59:45,190
an eigenvalue which is bigger
than 1 and you start taking
946
00:59:45,190 --> 00:59:49,020
powers of it, it starts marching
off to infinity.
947
00:59:49,020 --> 00:59:50,920
Now, you might say, maybe
something else
948
00:59:50,920 --> 00:59:52,140
is balancing that.
949
00:59:52,140 --> 00:59:55,760
But since you only have a finite
number of these things,
950
00:59:55,760 --> 00:59:58,050
that sounds pretty weird.
951
00:59:58,050 --> 00:59:59,790
And in fact, it is.
952
00:59:59,790 --> 01:00:09,140
So the proof of this is, we want
to assume that pi super l
953
01:00:09,140 --> 01:00:14,985
is the l-th of these
eigenvectors of P. Its
954
01:00:14,985 --> 01:00:18,820
eigenvalue is lambda sub l.
955
01:00:18,820 --> 01:00:25,170
It also is a left eigenvector of
P to the n with eigenvalue
956
01:00:25,170 --> 01:00:26,370
lambda to the n.
957
01:00:26,370 --> 01:00:29,120
That's what we've
shown before.
958
01:00:29,120 --> 01:00:33,070
I mean, you can multiply this
matrix P, and all you're doing
959
01:00:33,070 --> 01:00:37,710
is just taking powers
of the eigenvalue.
960
01:00:37,710 --> 01:00:43,160
So if we start out with lambda
to the n, let's forget about
961
01:00:43,160 --> 01:00:46,290
the l's, because we're just
looking at a fixed l now.
962
01:00:46,290 --> 01:00:54,870
Lambda to the nth power times
the j-th component of pi is
963
01:00:54,870 --> 01:01:04,900
equal to the sum over i of the
i-th component of pi times Pij
964
01:01:04,900 --> 01:01:06,640
to the n, for all j.
965
01:01:11,080 --> 01:01:14,430
Now I take the magnitude of
everything as before.
966
01:01:14,430 --> 01:01:17,510
The magnitude of this is, again,
less than or equal to
967
01:01:17,510 --> 01:01:19,380
the magnitude of this.
968
01:01:19,380 --> 01:01:25,510
I want to let beta be the
largest of these quantities.
969
01:01:25,510 --> 01:01:32,240
And when I put that maximizing
j in here, lambda sub l to the n
970
01:01:32,240 --> 01:01:40,550
times beta is less than or equal
to the sum over i of--
971
01:01:40,550 --> 01:01:43,810
I can upper-bound
these by beta.
972
01:01:43,810 --> 01:01:47,340
So I wind up with lambda sub l
to the n times beta is less than
973
01:01:47,340 --> 01:01:51,800
or equal to the sum over i of
beta times Pij to the n.
974
01:01:51,800 --> 01:01:54,680
I don't know what these powers
are, but they're certainly
975
01:01:54,680 --> 01:01:57,340
less than or equal to 1.
976
01:01:57,340 --> 01:02:03,920
So lambda sub l to the n is less
than or equal to M.
977
01:02:03,920 --> 01:02:05,260
That's what this said.
978
01:02:05,260 --> 01:02:14,680
When you take the magnitude of
the l-th eigenvalue to the n, it's
979
01:02:14,680 --> 01:02:17,210
less than or equal
to this number M.
980
01:02:17,210 --> 01:02:22,310
Now, if this number were larger
than 1, if it was 1
981
01:02:22,310 --> 01:02:27,300
plus 10 to the minus sixth,
and you raised it to a
982
01:02:27,300 --> 01:02:31,410
large enough power n, then
this would grow to be
983
01:02:31,410 --> 01:02:33,330
arbitrarily large.
984
01:02:33,330 --> 01:02:36,880
It can't grow to be arbitrarily
large, therefore
985
01:02:36,880 --> 01:02:39,890
the magnitude of lambda
sub l has to be less
986
01:02:39,890 --> 01:02:41,880
than or equal to 1.
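The theorem is easy to check numerically: for any row-stochastic matrix, including a randomly generated one, every eigenvalue comes out with magnitude at most 1, and eigenvalue 1 is always present.

```python
import numpy as np

# Random stochastic matrix: normalize each row of a random matrix to sum to 1.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
P = A / A.sum(axis=1, keepdims=True)

lam = np.linalg.eigvals(P)
print(np.max(np.abs(lam)))   # largest magnitude is exactly 1 (the eigenvalue of e)
```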
987
01:02:41,880 --> 01:02:48,980
Tedious proof, but
unfortunately, the notes just
988
01:02:48,980 --> 01:02:50,230
assume this.
989
01:02:53,630 --> 01:02:56,610
Maybe I had some good, simple
reason for it before.
990
01:02:56,610 --> 01:02:59,600
I don't have any now, so I have
to go through a proof.
991
01:02:59,600 --> 01:03:04,440
Anyway, these two theorems, if
you look at them, are valid
992
01:03:04,440 --> 01:03:06,880
for all finite-state
Markov chains.
993
01:03:06,880 --> 01:03:12,190
There was no place that we
used the fact that we had
994
01:03:12,190 --> 01:03:14,790
anything with distinct
eigenvalues or anything.
995
01:03:14,790 --> 01:03:20,840
But now when we had distinct
eigenvalues, we have the nth
996
01:03:20,840 --> 01:03:28,500
power of P is the sum here again
over right eigenvectors
997
01:03:28,500 --> 01:03:32,600
times left eigenvectors.
998
01:03:32,600 --> 01:03:35,340
When you take a right
eigenvector, which is a column
999
01:03:35,340 --> 01:03:39,720
vector, times a left
eigenvector, which is a row
1000
01:03:39,720 --> 01:03:44,080
vector, you get an
M by M matrix.
1001
01:03:44,080 --> 01:03:47,330
I don't know what that matrix
is, but it's a matrix.
1002
01:03:47,330 --> 01:03:50,980
It's a fixed matrix
independent of n.
1003
01:03:50,980 --> 01:03:53,390
And the only thing that's
varying with n is these
1004
01:03:53,390 --> 01:03:56,000
eigenvalues.
1005
01:03:56,000 --> 01:03:59,220
These quantities are less
than or equal to 1.
1006
01:03:59,220 --> 01:04:03,270
So if the chain is an ergodic
unit chain, we've already seen
1007
01:04:03,270 --> 01:04:07,250
that one eigenvalue is 1, and
the rest of the eigenvalues
1008
01:04:07,250 --> 01:04:09,260
are strictly less than
1 in magnitude.
1009
01:04:09,260 --> 01:04:13,600
We saw that by showing that for
an ergodic unit chain, P
1010
01:04:13,600 --> 01:04:16,060
to the n converged.
1011
01:04:16,060 --> 01:04:21,280
So the rate at which P to the
n approaches e times pi is
1012
01:04:21,280 --> 01:04:24,190
going to be determined
by the second-largest
1013
01:04:24,190 --> 01:04:27,710
eigenvalue in here.
1014
01:04:27,710 --> 01:04:31,180
And that second-largest
eigenvalue is going to be less
1015
01:04:31,180 --> 01:04:33,880
than 1, strictly less than 1.
1016
01:04:33,880 --> 01:04:35,170
We don't know what it is.
1017
01:04:35,170 --> 01:04:39,040
Before, we knew this convergence
here for an
1018
01:04:39,040 --> 01:04:43,050
ergodic unit chain
is exponential.
1019
01:04:43,050 --> 01:04:45,380
Now we know that it's
exponential and we know
1020
01:04:45,380 --> 01:04:48,740
exactly how fast it goes,
because the speed of
1021
01:04:48,740 --> 01:04:52,480
convergence is just the
second-largest eigenvalue.
1022
01:04:52,480 --> 01:04:58,530
If you want to know how fast P
to the n approaches e times
1023
01:04:58,530 --> 01:05:02,170
the steady state vector pi,
all you have to do is find
1024
01:05:02,170 --> 01:05:05,220
that second-largest eigenvalue,
and that tells you
1025
01:05:05,220 --> 01:05:09,560
how fast the convergence is,
except for calculating these
1026
01:05:09,560 --> 01:05:11,015
things, which are just fixed.
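A sketch of the convergence-rate claim, on an illustrative two-state chain whose steady state works out to (0.8, 0.2): the gap between P to the n and e times pi shrinks like the second-largest eigenvalue magnitude to the n.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]  # magnitudes, descending
lam2 = mags[1]                                      # second-largest, here 0.5

pi = np.array([0.8, 0.2])          # solves pi P = pi with pi . e = 1
e_pi = np.outer(np.ones(2), pi)    # limit matrix e * pi: every row equals pi

for n in (5, 10, 20):
    err = np.max(np.abs(np.linalg.matrix_power(P, n) - e_pi))
    print(n, err)                  # err decays geometrically, at rate lam2**n
```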
1027
01:05:13,580 --> 01:05:19,200
If P is a periodic unit chain
with period d, then if you
1028
01:05:19,200 --> 01:05:20,110
read the notes--
1029
01:05:20,110 --> 01:05:22,110
you should read the notes--
1030
01:05:22,110 --> 01:05:24,420
there are d eigenvalues
equally spaced
1031
01:05:24,420 --> 01:05:26,160
around the unit circle.
1032
01:05:26,160 --> 01:05:28,470
P to the n doesn't converge.
1033
01:05:28,470 --> 01:05:33,040
The only thing you can say here
is, what happens if you
1034
01:05:33,040 --> 01:05:37,280
look at P to the d-th power?
1035
01:05:37,280 --> 01:05:39,890
And you can imagine what happens
if you look at P to
1036
01:05:39,890 --> 01:05:44,070
the d-th power without
doing any analysis.
1037
01:05:44,070 --> 01:05:49,290
I mean, we know that what
happens in a periodic chain is
1038
01:05:49,290 --> 01:05:53,170
that you rotate from one set
of states to another set of
1039
01:05:53,170 --> 01:05:56,220
states to another set of states
to another set of
1040
01:05:56,220 --> 01:05:57,910
states, and then back
to the set of
1041
01:05:57,910 --> 01:05:59,110
states you started with.
1042
01:05:59,110 --> 01:06:01,220
And you keep rotating around.
1043
01:06:01,220 --> 01:06:05,520
Now, there are d sets of states
going around here.
1044
01:06:05,520 --> 01:06:08,860
What happens if I
take P to the d?
1045
01:06:08,860 --> 01:06:12,320
P to the d is looking at
the d-step transitions.
1046
01:06:12,320 --> 01:06:16,840
So it's looking at, if you start
here, after d steps,
1047
01:06:16,840 --> 01:06:19,310
you're back here again,
after d steps,
1048
01:06:19,310 --> 01:06:20,960
you're back here again.
1049
01:06:20,960 --> 01:06:31,700
So the matrix, P to the d, is
in fact the matrix of d
1050
01:06:31,700 --> 01:06:35,180
ergodic subclasses.
1051
01:06:37,940 --> 01:06:41,090
And for each one of them,
whatever subclass you start
1052
01:06:41,090 --> 01:06:44,200
in, you stay in that
subclass forever.
1053
01:06:44,200 --> 01:06:49,130
So the analysis of a periodic
unit chain, really the classy
1054
01:06:49,130 --> 01:06:52,730
way to do it is to look
at P to the d and see
1055
01:06:52,730 --> 01:06:54,980
what happens there.
1056
01:06:54,980 --> 01:06:58,800
And you see that you get
convergence within each
1057
01:06:58,800 --> 01:07:03,030
subclass, but you just keep
rotating among subclasses.
1058
01:07:03,030 --> 01:07:06,060
So there's nothing very
fancy going on there.
1059
01:07:06,060 --> 01:07:09,860
You just rotate from one
subclass to another.
1060
01:07:09,860 --> 01:07:12,350
And that's the way it is.
1061
01:07:12,350 --> 01:07:14,720
And P to the n doesn't
converge.
1062
01:07:14,720 --> 01:07:18,495
But P to the d times
n does converge.
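A minimal period-2 illustration of all this: the two-state swap chain has eigenvalues +1 and -1, equally spaced on the unit circle; P to the n never settles, but P to the d (here d = 2) is the identity, i.e. d ergodic subclasses, each trivially converged.

```python
import numpy as np

# Deterministic period-2 chain: state 1 -> state 2 -> state 1 -> ...
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

P2 = np.linalg.matrix_power(P, 2)        # P^d with d = 2: two ergodic subclasses
print(P2)                                # the identity matrix

lam = np.sort(np.linalg.eigvals(P))
print(lam)                               # the d-th roots of unity: -1 and +1
```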
1063
01:07:22,900 --> 01:07:30,050
Now, let's look at the next-most
complicated state.
1064
01:07:30,050 --> 01:07:34,460
Suppose we have M states and
we have M independent
1065
01:07:34,460 --> 01:07:35,180
eigenvectors.
1066
01:07:35,180 --> 01:07:38,650
OK, remember I told you that
there was a very ugly thing in
1067
01:07:38,650 --> 01:07:43,140
linear algebra that said, when
you had an eigenvalue of
1068
01:07:43,140 --> 01:07:50,590
multiplicity k, you might not
have k linearly independent
1069
01:07:50,590 --> 01:07:51,060
eigenvectors.
1070
01:07:51,060 --> 01:07:52,720
You might have a smaller
number of them.
1071
01:07:52,720 --> 01:07:55,070
We'll look at an example
of that later.
1072
01:07:55,070 --> 01:07:58,730
But here, I'm saying, let's
forget about that case,
1073
01:07:58,730 --> 01:08:00,760
because it's ugly.
1074
01:08:00,760 --> 01:08:04,640
Let's assume that whatever
multiplicity each of these
1075
01:08:04,640 --> 01:08:08,950
eigenvalues has, if you have
an eigenvalue with
1076
01:08:08,950 --> 01:08:15,010
multiplicity k, then you have
k linearly independent right
1077
01:08:15,010 --> 01:08:19,279
eigenvectors and k linearly
independent left eigenvectors
1078
01:08:19,279 --> 01:08:20,960
to correspond to that.
1079
01:08:20,960 --> 01:08:26,420
And then when you add up all of
the eigenvectors, you have
1080
01:08:26,420 --> 01:08:30,020
M linearly independent
eigenvectors.
1081
01:08:30,020 --> 01:08:36,029
And what happens when you have M
linearly independent vectors
1082
01:08:36,029 --> 01:08:39,710
in a space of dimension M?
1083
01:08:39,710 --> 01:08:42,649
If you have M linearly
independent vectors in a space
1084
01:08:42,649 --> 01:08:48,880
of dimension M, you span the
whole space, which says that
1085
01:08:48,880 --> 01:08:53,490
the matrix of these
is in fact
1086
01:08:53,490 --> 01:08:56,920
non-singular, which says, again,
we can do all of the
1087
01:08:56,920 --> 01:08:58,700
stuff we did before.
1088
01:08:58,700 --> 01:09:01,830
There's a little bit of a trick
in showing that the left
1089
01:09:01,830 --> 01:09:04,460
eigenvectors and the right
eigenvectors can be made
1090
01:09:04,460 --> 01:09:06,490
orthogonal.
1091
01:09:06,490 --> 01:09:10,359
But aside from that,
P to the n is again
1092
01:09:10,359 --> 01:09:13,960
equal to the same form.
1093
01:09:13,960 --> 01:09:23,550
And what this form says is, if
all of the eigenvalues except
1094
01:09:23,550 --> 01:09:27,250
one are less than 1, then you're
again going to approach
1095
01:09:27,250 --> 01:09:28,649
steady state.
1096
01:09:28,649 --> 01:09:29,899
What does that mean?
1097
01:09:32,870 --> 01:09:39,729
Suppose I have more than one
ergodic chain, more than one
1098
01:09:39,729 --> 01:09:44,350
ergodic class, or suppose I
have a periodic class or
1099
01:09:44,350 --> 01:09:45,130
something else.
1100
01:09:45,130 --> 01:09:49,399
Is it possible to have one
eigenvalue equal to 1 and all
1101
01:09:49,399 --> 01:09:52,040
the other eigenvalues
be smaller?
1102
01:09:52,040 --> 01:09:55,670
If there's one eigenvalue that's
equal to 1, according
1103
01:09:55,670 --> 01:09:59,740
to this formula here, eventually
P to the n
1104
01:09:59,740 --> 01:10:05,090
converges to the one term whose
eigenvalue is equal to 1.
1105
01:10:05,090 --> 01:10:09,290
And right eigenvector
can be taken as e.
1106
01:10:09,290 --> 01:10:13,230
Left eigenvector can be taken
as a steady state vector pi.
1107
01:10:13,230 --> 01:10:16,250
And we have the case
of convergence.
1108
01:10:16,250 --> 01:10:20,830
Can you have convergence to all
the rows being the same if
1109
01:10:20,830 --> 01:10:24,830
you have multiple
ergodic classes?
1110
01:10:24,830 --> 01:10:25,900
No.
1111
01:10:25,900 --> 01:10:28,820
If you have multiple ergodic
classes and you start out in
1112
01:10:28,820 --> 01:10:30,040
one class, you stay there.
1113
01:10:30,040 --> 01:10:32,350
You can't get out of it.
1114
01:10:32,350 --> 01:10:35,190
If you have a periodic class
and you start out in that
1115
01:10:35,190 --> 01:10:39,120
periodic class, you can't
have convergence there.
1116
01:10:39,120 --> 01:10:47,100
So in this situation here, where
all the eigenvalues are
1117
01:10:47,100 --> 01:10:51,180
distinct, you can only have
one eigenvalue equal to 1.
1118
01:10:51,180 --> 01:10:55,270
Here, when we're going to this
more general case, we might
1119
01:10:55,270 --> 01:10:58,470
have more than one eigenvalue
equal to 1.
1120
01:10:58,470 --> 01:11:02,960
But if in fact we only have one
eigenvalue equal to 1, and
1121
01:11:02,960 --> 01:11:06,440
all the others are strictly
smaller in magnitude, then in
1122
01:11:06,440 --> 01:11:09,620
fact you're just talking about
this case of an ergodic unit
1123
01:11:09,620 --> 01:11:10,505
chain again.
1124
01:11:10,505 --> 01:11:14,490
It's the only place
you can be.
1125
01:11:14,490 --> 01:11:19,350
So let's look at an
example of this.
1126
01:11:19,350 --> 01:11:23,050
Suppose you have a Markov
chain which has l
1127
01:11:23,050 --> 01:11:26,610
ergodic sets of states.
1128
01:11:26,610 --> 01:11:29,420
You have one set of states.
1129
01:11:40,990 --> 01:11:47,610
So we have one set of states
over here, which will all go
1130
01:11:47,610 --> 01:11:50,480
back and forth to each other.
1131
01:11:50,480 --> 01:11:52,850
Then another set of
states over here.
1132
01:11:58,260 --> 01:12:03,840
Let's let l equal
2 in this case.
1133
01:12:03,840 --> 01:12:05,945
So what happens in
this situation?
1134
01:12:16,840 --> 01:12:18,660
We'll have to work quickly
before it gets up.
1135
01:12:25,400 --> 01:12:29,860
Anybody with any sense, faced
with a Markov chain like this,
1136
01:12:29,860 --> 01:12:32,800
would say if we start here,
we're going to stay here, if
1137
01:12:32,800 --> 01:12:35,020
we start here, we're
going to stay here.
1138
01:12:35,020 --> 01:12:37,150
Let's just analyze this first.
1139
01:12:37,150 --> 01:12:39,390
And then after we're done
analyzing this,
1140
01:12:39,390 --> 01:12:40,960
we'll analyze this.
1141
01:12:40,960 --> 01:12:43,160
And then we'll put the
whole thing together.
1142
01:12:43,160 --> 01:12:48,180
And what we will find is
a transition matrix
1143
01:12:48,180 --> 01:12:49,510
which looks like this.
1144
01:12:54,540 --> 01:12:56,420
And if you're here,
you stay here.
1145
01:12:56,420 --> 01:12:57,990
If you're here, you stay here.
1146
01:12:57,990 --> 01:13:01,630
We can find the eigenvalues
and eigenvectors of this.
1147
01:13:01,630 --> 01:13:05,030
We can find the eigenvalues
and eigenvectors of this.
1148
01:13:05,030 --> 01:13:08,530
If you look at this crazy
formula for finding
1149
01:13:08,530 --> 01:13:12,940
determinants, what you're stuck
with is permutations
1150
01:13:12,940 --> 01:13:16,500
within here times permutations
within here.
1151
01:13:16,500 --> 01:13:20,490
So the characteristic polynomial
you wind up with is the product
1152
01:13:20,490 --> 01:13:21,960
of the two characteristic polynomials.
1153
01:13:21,960 --> 01:13:29,970
Or any eigenvalue here is an
eigenvalue of the whole thing.
1154
01:13:29,970 --> 01:13:32,715
Any eigenvalue here is an
eigenvalue of the whole thing.
1155
01:13:32,715 --> 01:13:36,120
And we just look at the sum of
the number of eigenvalues here
1156
01:13:36,120 --> 01:13:37,300
and the number there.
1157
01:13:37,300 --> 01:13:40,490
So we have a very boring
case here.
1158
01:13:40,490 --> 01:13:44,750
Each ergodic set has an
eigenvalue equal to 1, has a
1159
01:13:44,750 --> 01:13:47,580
right eigenvector equal to 1.
1160
01:13:47,580 --> 01:13:53,090
within the states of that set
and 0 elsewhere.
1161
01:13:53,090 --> 01:13:56,290
There's also a steady state
vector on that set of states.
1162
01:13:56,290 --> 01:13:58,120
We've already seen that.
1163
01:13:58,120 --> 01:14:03,940
So P to the n converges to a
block diagonal matrix, where
1164
01:14:03,940 --> 01:14:08,270
for each ergodic set, the rows
within that set are the same.
1165
01:14:08,270 --> 01:14:21,400
So P to the n then
is pi 1, pi 1.
1166
01:14:21,400 --> 01:14:27,095
And then here, we have
pi 2, pi 2, pi 2.
1167
01:14:29,610 --> 01:14:34,000
So that's all that
can happen here.
1168
01:14:34,000 --> 01:14:35,250
This is the limit.
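A sketch with l = 2 ergodic classes, using made-up block-diagonal numbers: P to the n converges to a block-diagonal limit in which every row of a block equals that block's steady-state vector pi 1 or pi 2.

```python
import numpy as np

# Two ergodic classes: states {0,1} and states {2,3}, no transitions between them.
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.4, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.0, 0.0, 0.6, 0.4]])

Pn = np.linalg.matrix_power(P, 50)   # essentially the limit matrix
pi1 = Pn[0, :2]                      # steady state of the first class
pi2 = Pn[2, 2:]                      # steady state of the second class
print(pi1, pi2)
```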
1169
01:14:42,090 --> 01:14:47,220
So one message of this is that,
after you understand
1170
01:14:47,220 --> 01:14:51,740
ergodic unit chains, you
understand almost everything.
1171
01:14:51,740 --> 01:14:55,310
You still have to worry about
periodic unit chains.
1172
01:14:55,310 --> 01:14:58,220
But you just take a power of
them, and then you have
1173
01:14:58,220 --> 01:15:00,400
ergodic sets of states.
1174
01:15:04,650 --> 01:15:07,250
One final thing.
1175
01:15:07,250 --> 01:15:09,640
Good, I have five minutes
to talk about this.
1176
01:15:09,640 --> 01:15:12,480
I don't want any more time to
talk about it, because I'll
1177
01:15:12,480 --> 01:15:15,490
get terribly confused if I do.
1178
01:15:15,490 --> 01:15:21,030
And it's a topic which, if you
want to read more about it,
1179
01:15:21,030 --> 01:15:24,610
read about it in Strang.
1180
01:15:24,610 --> 01:15:27,010
He obviously doesn't like
the topic either.
1181
01:15:27,010 --> 01:15:28,710
Nobody likes the topic.
1182
01:15:28,710 --> 01:15:33,320
Strang at least was driven to
say something clear about it.
1183
01:15:33,320 --> 01:15:36,330
Most people don't even
bother to say
1184
01:15:36,330 --> 01:15:38,260
something clear about it.
1185
01:15:38,260 --> 01:15:42,190
There's a theorem, due to, I
guess, Jordan, because it's
1186
01:15:42,190 --> 01:15:45,320
called a Jordan form.
1187
01:15:45,320 --> 01:15:51,210
And what Jordan said is, in
the nice cases we talked
1188
01:15:51,210 --> 01:15:57,860
about, you have this
decomposition of the
1189
01:15:57,860 --> 01:16:04,090
transition matrix in P into a
matrix here whose columns are
1190
01:16:04,090 --> 01:16:09,480
the right eigenvectors times
a matrix here, which is a
1191
01:16:09,480 --> 01:16:13,140
diagonal matrix with the
eigenvalues along it.
1192
01:16:13,140 --> 01:16:19,980
And this, finally, is a matrix
which is the inverse of this,
1193
01:16:19,980 --> 01:16:24,200
and, which properly normalized,
is the left
1194
01:16:24,200 --> 01:16:33,400
eigenvectors of P. And you can
replace this form by what's
1195
01:16:33,400 --> 01:16:39,040
called a Jordan form, where P
is equal to some matrix u
1196
01:16:39,040 --> 01:16:45,720
times the Jordan form matrix
j times the inverse of u.
1197
01:16:45,720 --> 01:16:49,870
Now, u is no longer the
right eigenvectors.
1198
01:16:49,870 --> 01:16:52,480
It can't be the right
eigenvectors, because when we
1199
01:16:52,480 --> 01:16:56,090
needed Jordan form, we don't
have enough right eigenvectors
1200
01:16:56,090 --> 01:16:58,030
to span the space.
1201
01:16:58,030 --> 01:17:00,910
So it has to be something
else.
1202
01:17:00,910 --> 01:17:04,450
And like everyone else,
we say, I don't care
1203
01:17:04,450 --> 01:17:06,320
what that matrix is.
1204
01:17:06,320 --> 01:17:09,940
Jordan proved that there is such
a matrix, and that's all
1205
01:17:09,940 --> 01:17:11,270
we want to know.
1206
01:17:11,270 --> 01:17:17,230
The important thing is that this
matrix j in here is as
1207
01:17:17,230 --> 01:17:19,860
close as you can get it.
1208
01:17:19,860 --> 01:17:25,400
It's a matrix, which along the
main diagonal, has all the
1209
01:17:25,400 --> 01:17:28,310
eigenvalues with their
appropriate multiplicity.
1210
01:17:28,310 --> 01:17:31,670
Namely, lambda 1 is
an eigenvalue with
1211
01:17:31,670 --> 01:17:33,550
multiplicity 2.
1212
01:17:33,550 --> 01:17:38,700
Lambda 2 is an eigenvalue
of multiplicity 3.
1213
01:17:38,700 --> 01:17:43,210
And in this situation, you have
two eigenvectors here, so
1214
01:17:43,210 --> 01:17:46,180
nothing appears up there.
1215
01:17:46,180 --> 01:17:53,530
With this multiplicity 3
eigenvalue, there are only two
1216
01:17:53,530 --> 01:17:56,370
linearly independent
eigenvectors.
1217
01:17:56,370 --> 01:18:00,640
And therefore Jordan says, why
don't we stick a 1 in here and
1218
01:18:00,640 --> 01:18:03,270
then solve everything else?
1219
01:18:03,270 --> 01:18:08,770
And his theorem says, if you
do that, it in fact works.
1220
01:18:08,770 --> 01:18:11,190
So every time--
1221
01:18:11,190 --> 01:18:17,850
well, the eigenvalue is on the
main diagonal, the ones on the
1222
01:18:17,850 --> 01:18:22,190
next diagonal up, the only place
where anything can be non-0
1223
01:18:22,190 --> 01:18:25,850
is on the main diagonal in this
form, and on the next
1224
01:18:25,850 --> 01:18:29,400
diagonal up, where you
occasionally have a 1.
1225
01:18:29,400 --> 01:18:33,420
And the 1 is there to make up
for the missing
1226
01:18:33,420 --> 01:18:33,900
eigenvectors.
1227
01:18:33,900 --> 01:18:37,230
So every time you have a
deficient eigenvector, you
1228
01:18:37,230 --> 01:18:39,260
have some 1 appearing there.
1229
01:18:39,260 --> 01:18:40,960
And then there's a way
to solve for u.
1230
01:18:40,960 --> 01:18:44,650
And I don't have any idea what
it is, and I don't care.
1231
01:18:44,650 --> 01:18:49,390
But if you get interested in it,
I think that's wonderful.
1232
01:18:49,390 --> 01:18:53,075
But please don't tell
me about it.
1233
01:18:59,250 --> 01:19:04,400
Nice example of this is
this matrix here.
1234
01:19:04,400 --> 01:19:10,160
What happens if you try to
take the determinant of P
1235
01:19:10,160 --> 01:19:11,835
minus lambda i?
1236
01:19:11,835 --> 01:19:16,850
Well, you have 1/2 minus lambda,
1/2 minus lambda, 1
1237
01:19:16,850 --> 01:19:19,250
minus lambda.
1238
01:19:19,250 --> 01:19:25,180
What are all the permutations
here that you can take?
1239
01:19:25,180 --> 01:19:29,200
There's the permutation of
the main diagonal itself.
1240
01:19:29,200 --> 01:19:33,380
If I try to include that
element, there's nothing I can
1241
01:19:33,380 --> 01:19:35,880
do but have some element
down here.
1242
01:19:35,880 --> 01:19:37,150
And all these elements are 0.
1243
01:19:39,870 --> 01:19:43,480
So those elements don't
contribute to a
1244
01:19:43,480 --> 01:19:45,140
determinant at all.
1245
01:19:45,140 --> 01:19:49,100
So I have one eigenvalue
which is equal to 1.
1246
01:19:49,100 --> 01:19:53,020
I have one eigenvalue of
multiplicity 2, an eigenvalue
1247
01:19:53,020 --> 01:19:54,600
which is 1/2.
1248
01:19:54,600 --> 01:19:58,070
If you try to find the
eigenvector here, you find
1249
01:19:58,070 --> 01:19:59,930
there is only one.
1250
01:19:59,930 --> 01:20:03,700
So in fact, this corresponds
to a Jordan form,
1251
01:20:03,700 --> 01:20:07,180
where you have 1/2.
1252
01:20:15,300 --> 01:20:22,010
1, and a 0, and a 1 here,
and 0 everywhere else.
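The deficiency for this example matrix can be checked numerically: the eigenvalue 1/2 has algebraic multiplicity 2, but the null space of P minus one-half the identity is only one-dimensional, so a Jordan block with a 1 above the diagonal is needed.

```python
import numpy as np

# The matrix from the lecture example.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

lam = np.linalg.eigvals(P)
print(np.sort(lam))                              # 0.5, 0.5, 1

rank = np.linalg.matrix_rank(P - 0.5 * np.eye(3))
print(rank)                                      # rank 2, so nullity is only 1
```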
1253
01:20:29,650 --> 01:20:37,110
And now if I want to find P to
the n, I have u times this j
1254
01:20:37,110 --> 01:20:39,320
times u to the minus 1, times u
times j times u to the minus 1, and so on.
1255
01:20:39,320 --> 01:20:42,140
All the u's in the middle
cancel out, so I wind up
1256
01:20:42,140 --> 01:20:46,640
eventually with u times j
to the nth power times u
1257
01:20:46,640 --> 01:20:48,020
to the minus 1.
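That telescoping can be verified with one explicit choice of U. The particular columns used below (an eigenvector for 1/2, a generalized eigenvector, and the eigenvector for 1) are worked out for this example and are not given in the lecture; any valid Jordan basis would do.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
J = np.array([[0.5, 1.0, 0.0],      # 2x2 Jordan block for lambda = 1/2 ...
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 1.0]])     # ... plus the simple eigenvalue 1
U = np.array([[1.0, 0.0, 1.0],      # columns: eigenvector for 1/2,
              [0.0, 2.0, 1.0],      # generalized eigenvector,
              [0.0, 0.0, 1.0]])     # eigenvector for 1

# All the interior U^{-1} U factors cancel, so P^n = U J^n U^{-1}.
n = 8
lhs = np.linalg.matrix_power(P, n)
rhs = U @ np.linalg.matrix_power(J, n) @ np.linalg.inv(U)
print(np.allclose(lhs, rhs))        # True
```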
1258
01:20:48,020 --> 01:20:49,490
What is j to the nth power?
1259
01:20:49,490 --> 01:20:56,260
What happens if I multiply this
matrix by itself n times?
1260
01:20:56,260 --> 01:20:59,970
Well, it turns out that on
this main
1261
01:20:59,970 --> 01:21:03,880
diagonal here, you wind
up with a 1/4 and
1262
01:21:03,880 --> 01:21:06,350
then 1/8 and so forth.
1263
01:21:06,350 --> 01:21:13,190
This term here goes
down exponentially.
1264
01:21:13,190 --> 01:21:24,270
Well, if you multiply this by
itself, eventually, you can
1265
01:21:24,270 --> 01:21:27,920
see what's going on here more
easily if you draw the Markov
1266
01:21:27,920 --> 01:21:29,160
chain for it.
1267
01:21:29,160 --> 01:21:34,590
You have state 1, state
2, and state 3.
1268
01:21:34,590 --> 01:21:40,680
State 1, there's a transition
1/2 and a transition 1/2.
1269
01:21:40,680 --> 01:21:47,530
State 2, there's a transition
1/2 and a transition 1/2. And
1270
01:21:47,530 --> 01:21:50,810
state 3, you just stay there.
1271
01:21:50,810 --> 01:21:53,600
So the amount of time that it
takes you to get to steady
1272
01:21:53,600 --> 01:21:56,820
state is the amount of
time it takes you--
1273
01:21:56,820 --> 01:21:58,690
you start in state 1.
1274
01:21:58,690 --> 01:22:01,930
You've got to make this
transition eventually, and
1275
01:22:01,930 --> 01:22:05,690
then you've got to make this
transition eventually.
1276
01:22:05,690 --> 01:22:08,800
And the amount of time that it
takes you to do that is the
1277
01:22:08,800 --> 01:22:12,170
sum of the amount of time it
takes you to go there, plus
1278
01:22:12,170 --> 01:22:15,220
the amount of time that
it takes to go there.
1279
01:22:15,220 --> 01:22:16,960
So you have two random
variables.
1280
01:22:16,960 --> 01:22:19,400
One is the time to go here.
1281
01:22:19,400 --> 01:22:22,320
The other is the time
to go here.
1282
01:22:22,320 --> 01:22:24,590
Both of those are geometrically
distributed
1283
01:22:24,590 --> 01:22:25,960
random variables.
1284
01:22:25,960 --> 01:22:30,470
When we convolve those things
with each other, what we get
1285
01:22:30,470 --> 01:22:31,940
is an extra term n.
1286
01:22:31,940 --> 01:22:40,070
So we get an n times
1/2 to the n.
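That n times (1/2) to the n behavior shows up directly in the powers of P: the (1, 2) entry of P to the n is the probability of being in state 2 at time n starting from state 1, and it works out to exactly n times (1/2) to the n. A quick numerical check (illustrative, not from the lecture):

```python
import numpy as np

# Same chain as in the lecture: state 3 absorbing.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

# P^n[0, 1]: stay in state 1 for j-1 steps, jump, stay in state 2 --
# the convolution of two geometric(1/2) terms -- giving n * (1/2)^n.
for n in (1, 5, 10, 20):
    Pn = np.linalg.matrix_power(P, n)
    print(n, Pn[0, 1], n * 0.5**n)  # the last two columns agree
```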
1287
01:22:40,070 --> 01:22:43,200
So the thing which is different
in the Jordan form
1288
01:22:43,200 --> 01:22:47,200
is, instead of having an
eigenvalue to the nth power,
1289
01:22:47,200 --> 01:22:50,840
you have an eigenvalue times--
1290
01:22:50,840 --> 01:22:54,610
if there's only a single 1 off
the diagonal, there's an n there.
1291
01:22:54,610 --> 01:22:58,450
If there are two 1s both
together, you get an n times n
1292
01:22:58,450 --> 01:23:00,230
minus 1, and so forth.
1293
01:23:00,230 --> 01:23:04,690
So worst case, you've got a
polynomial in n
1294
01:23:04,690 --> 01:23:07,020
times an eigenvalue to the nth power.
1295
01:23:07,020 --> 01:23:10,130
For all practical purposes, this
is still the eigenvalue
1296
01:23:10,130 --> 01:23:11,950
going down exponentially.
1297
01:23:11,950 --> 01:23:17,090
So for all practical purposes,
what you wind up with is that the
1298
01:23:17,090 --> 01:23:22,180
second-largest eigenvalue still
determines how fast you
1299
01:23:22,180 --> 01:23:23,430
get convergence.
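A sketch of that convergence claim for this chain (illustrative, not from the lecture): the worst-case deviation of P to the n from its limit, where all probability is absorbed in state 3, is (n+1) times (1/2) to the n, so the per-step shrink factor approaches the second-largest eigenvalue, 1/2.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
# Limit of P^n: every row puts all its mass on the absorbing state 3.
P_inf = np.array([[0.0, 0.0, 1.0]] * 3)

# Deviation shrinks like (n+1) * (1/2)^n, so the per-step ratio of
# successive deviations tends to 1/2, the second-largest eigenvalue.
prev = None
for n in range(5, 30, 5):
    dev = np.abs(np.linalg.matrix_power(P, n) - P_inf).max()
    if prev is not None:
        print(n, round((dev / prev) ** (1 / 5), 3))  # approaches 0.5
    prev = dev
```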
1300
01:23:26,490 --> 01:23:29,120
Sorry, I took eight minutes
talking about the Jordan form.
1301
01:23:29,120 --> 01:23:32,020
I wanted to take five minutes
talking about it.
1302
01:23:32,020 --> 01:23:34,030
You can read more about
it in the notes.