1
00:00:00,530 --> 00:00:02,960
The following content is
provided under a Creative
2
00:00:02,960 --> 00:00:04,370
Commons license.
3
00:00:04,370 --> 00:00:07,410
Your support will help MIT
OpenCourseWare continue to
4
00:00:07,410 --> 00:00:11,060
offer high quality educational
resources for free.
5
00:00:11,060 --> 00:00:13,960
To make a donation or view
additional materials from
6
00:00:13,960 --> 00:00:17,890
hundreds of MIT courses, visit
MIT OpenCourseWare at
7
00:00:17,890 --> 00:00:19,140
ocw.mit.edu.
8
00:00:24,010 --> 00:00:26,560
PROFESSOR: I'm going to spend
most of my time talking about
9
00:00:26,560 --> 00:00:29,400
chapters one, two, and three.
10
00:00:29,400 --> 00:00:32,220
A little bit talking about
chapter four, because we've
11
00:00:32,220 --> 00:00:36,370
been doing so much with chapter
four in the last
12
00:00:36,370 --> 00:00:39,980
couple of weeks that you
probably remember that more.
13
00:00:39,980 --> 00:00:40,580
OK.
14
00:00:40,580 --> 00:00:44,310
The basics, which we started out
with, and which you should
15
00:00:44,310 --> 00:00:48,800
never forget, is that any time
you develop a probability
16
00:00:48,800 --> 00:00:53,840
model, you've got to specify
what the sample space is and
17
00:00:53,840 --> 00:00:57,920
what the probability measure
on that sample space is.
18
00:00:57,920 --> 00:01:01,850
And in practice, and in almost
everything we've talked about
19
00:01:01,850 --> 00:01:05,800
so far, there's really a basic
countable set of random
20
00:01:05,800 --> 00:01:08,490
variables which determine
everything else.
21
00:01:08,490 --> 00:01:12,030
In other words, when you find
the joint probability
22
00:01:12,030 --> 00:01:16,730
distribution on that set of
random variables, that tells
23
00:01:16,730 --> 00:01:20,570
you everything else
of interest.
24
00:01:20,570 --> 00:01:25,200
And a sample point or a sample
path on that set of random
25
00:01:25,200 --> 00:01:29,520
variables is a collection of
sample values, one sample
26
00:01:29,520 --> 00:01:33,980
value for each random
variable.
27
00:01:33,980 --> 00:01:37,740
It's very convenient, especially
when you're in an
28
00:01:37,740 --> 00:01:43,630
exam and a little bit rushed,
to confuse random variables
29
00:01:43,630 --> 00:01:47,250
with the sample values for
the random variables.
30
00:01:47,250 --> 00:01:48,920
And that's fine.
31
00:01:48,920 --> 00:01:51,900
I just want to caution you
again, and I've done this many
32
00:01:51,900 --> 00:01:58,410
times, that about half the
mistakes that people make--
33
00:01:58,410 --> 00:02:01,980
half of the conceptual mistakes
that people make
34
00:02:01,980 --> 00:02:06,200
doing problems and doing quizzes
are connected with
35
00:02:06,200 --> 00:02:09,810
getting confused at some point
about what's a random variable
36
00:02:09,810 --> 00:02:12,210
and what's a sample value
of that random variable.
37
00:02:12,210 --> 00:02:17,210
And you start thinking about
sample values as just numbers.
38
00:02:17,210 --> 00:02:19,090
And I do that too.
39
00:02:19,090 --> 00:02:21,220
It's convenient for thinking
about things.
40
00:02:21,220 --> 00:02:26,790
But you have to know that that's
not the whole story.
41
00:02:26,790 --> 00:02:29,740
Often, we have uncountable
sets of random variables.
42
00:02:29,740 --> 00:02:34,720
Like in renewal processes, we
have the counting renewal
43
00:02:34,720 --> 00:02:38,690
process, which typically has an
uncountable set of random
44
00:02:38,690 --> 00:02:43,860
variables, the number of arrivals
up to each time, t,
45
00:02:43,860 --> 00:02:48,750
where t is a continuous valued
variable.
46
00:02:48,750 --> 00:02:52,810
But in almost all of those
cases, you can define things
47
00:02:52,810 --> 00:02:56,195
in terms of simpler sets of
random variables, like the
48
00:02:56,195 --> 00:02:59,480
interarrival times,
which are IID.
49
00:03:02,530 --> 00:03:05,960
Most of the processes we've
talked about really have a
50
00:03:05,960 --> 00:03:08,600
pretty simple description if
you look for the simplest
51
00:03:08,600 --> 00:03:09,850
description of them.
52
00:03:13,730 --> 00:03:17,680
If you have a sequence of
IID random variables--
53
00:03:17,680 --> 00:03:25,270
which is what we have for
Poisson and renewal processes,
54
00:03:25,270 --> 00:03:28,680
and what we have for Markov
chains is not that much more
55
00:03:28,680 --> 00:03:30,310
complicated--
56
00:03:30,310 --> 00:03:35,500
the laws of large numbers are
useful to specify what the
57
00:03:35,500 --> 00:03:38,500
long term behavior is.
58
00:03:38,500 --> 00:03:47,280
The sample time average, as
we all know by now, is the sum
59
00:03:47,280 --> 00:03:49,960
of the random variables
divided by n.
60
00:03:49,960 --> 00:03:53,090
So it's a sample average
of these quantities.
61
00:03:53,090 --> 00:03:57,570
It's a random variable with
mean x bar, the expected
62
00:03:57,570 --> 00:04:00,140
value of x, that's
almost obvious.
63
00:04:00,140 --> 00:04:03,350
You just take the expected value
of s sub n, and it's n
64
00:04:03,350 --> 00:04:08,360
times the expected value of x
divided by n, and you're done.
65
00:04:08,360 --> 00:04:11,680
And the variance, since these
random variables are
66
00:04:11,680 --> 00:04:15,540
independent, you find that
almost as easily.
67
00:04:15,540 --> 00:04:18,810
That has this very
simple-minded
68
00:04:18,810 --> 00:04:20,850
distribution function.
69
00:04:20,850 --> 00:04:24,340
Remember, we usually work
with distribution
70
00:04:24,340 --> 00:04:26,960
functions in this class.
71
00:04:26,960 --> 00:04:32,580
And often, the exercises are
much easier when you do them
72
00:04:32,580 --> 00:04:36,500
in terms of the distribution
function than if you use
73
00:04:36,500 --> 00:04:40,760
formulas you remember from
elementary courses, which are
74
00:04:40,760 --> 00:04:44,260
specialized to--
75
00:04:44,260 --> 00:04:47,140
which are specialized to
probability density and
76
00:04:47,140 --> 00:04:51,170
probability mass functions, and
often have more special
77
00:04:51,170 --> 00:04:53,110
conditions on them than that.
78
00:04:53,110 --> 00:04:57,470
But anyway, the distribution
function starts
79
00:04:57,470 --> 00:04:58,570
to look like this.
80
00:04:58,570 --> 00:05:03,250
As n gets bigger, you notice
that what's happening is that
81
00:05:03,250 --> 00:05:08,860
you get a distribution which
is scrunching in this way,
82
00:05:08,860 --> 00:05:10,820
which is starting to
look smoother.
83
00:05:10,820 --> 00:05:13,450
The jumps in it get smaller.
84
00:05:13,450 --> 00:05:18,630
And you start out with this
thing which is kind of crazy.
85
00:05:18,630 --> 00:05:21,370
And by the time n is even 50,
86
00:05:21,370 --> 00:05:25,770
you get something which
almost looks like a--
87
00:05:25,770 --> 00:05:26,840
I don't know how we tell
the difference
88
00:05:26,840 --> 00:05:28,460
between those two things.
89
00:05:28,460 --> 00:05:30,060
I thought we could,
but we can't.
90
00:05:30,060 --> 00:05:31,670
I certainly can't up there.
91
00:05:31,670 --> 00:05:37,650
But anyway, the one that's
tightest in is the one
92
00:05:37,650 --> 00:05:39,880
for n equals 50.
93
00:05:39,880 --> 00:05:44,150
And what these laws of large
numbers all say in some sense
94
00:05:44,150 --> 00:05:51,380
is that this distribution
function gets crunched in
95
00:05:51,380 --> 00:05:54,550
towards an impulse
at the mean.
96
00:05:54,550 --> 00:05:58,260
And then they say other more
specialized things about how
97
00:05:58,260 --> 00:06:02,580
this happens, about sample
paths and all of that.
98
00:06:02,580 --> 00:06:06,270
But the idea is that this
distribution function is
99
00:06:06,270 --> 00:06:10,760
heading towards a
unit impulse.
100
00:06:10,760 --> 00:06:14,440
The weak law of large numbers
then says that if the expected
101
00:06:14,440 --> 00:06:18,840
value of the magnitude of x
is less than infinity--
102
00:06:18,840 --> 00:06:21,660
and usually when we talk about
random variables having a
103
00:06:21,660 --> 00:06:25,630
mean, that's exactly
what we mean.
104
00:06:25,630 --> 00:06:31,220
If that condition is not
satisfied, then we usually say
105
00:06:31,220 --> 00:06:33,690
that the random variable
doesn't have a mean.
106
00:06:33,690 --> 00:06:37,300
And you'll see that every time
you look at anything in
107
00:06:37,300 --> 00:06:38,520
probability theory.
108
00:06:38,520 --> 00:06:41,940
When people say the mean exists,
that's what they
109
00:06:41,940 --> 00:06:43,830
always mean.
110
00:06:43,830 --> 00:06:47,950
And what the theorem says then
is exactly what we were
111
00:06:47,950 --> 00:06:49,060
talking about before.
112
00:06:49,060 --> 00:06:54,940
The probability that the
difference between s n over n,
113
00:06:54,940 --> 00:06:58,570
and the mean x bar, the
probability that it's greater
114
00:06:58,570 --> 00:07:03,090
than or equal to epsilon
equals 0 in the limit.
115
00:07:03,090 --> 00:07:06,020
So it's saying that you put
epsilon limits on that
116
00:07:06,020 --> 00:07:10,860
distribution function and let
n get bigger and bigger, it
117
00:07:10,860 --> 00:07:14,570
goes to 1 above the mean and to 0 below.
118
00:07:14,570 --> 00:07:18,120
It says the probability of s n
over n, less than or equal to
119
00:07:18,120 --> 00:07:23,240
x, approaches a unit step as
n approaches infinity.
120
00:07:23,240 --> 00:07:27,660
This says this is the condition
for convergence in
121
00:07:27,660 --> 00:07:30,440
probability.
122
00:07:30,440 --> 00:07:33,880
What we're saying is that that
also means convergence in
123
00:07:33,880 --> 00:07:38,740
distribution function, and in
distribution, for this case.
124
00:07:38,740 --> 00:07:42,520
And then we also, when we got
to renewal processes, we
125
00:07:42,520 --> 00:07:45,330
talked about the strong
law of large numbers.
126
00:07:45,330 --> 00:07:49,760
And that says that if the expected
value of x is finite.
127
00:07:49,760 --> 00:07:56,630
Then this limit approaches
x bar on a sample path basis.
128
00:07:56,630 --> 00:07:59,770
In other words, for every sample
path, except for a set
129
00:07:59,770 --> 00:08:05,020
of probability 0, this
condition holds true.
130
00:08:05,020 --> 00:08:08,260
That doesn't seem like it's
very different or very
131
00:08:08,260 --> 00:08:10,610
important for the time being.
132
00:08:10,610 --> 00:08:14,060
But when we started studying
renewal processes, which is
133
00:08:14,060 --> 00:08:19,120
where we actually talked about
this, we saw that in fact, it
134
00:08:19,120 --> 00:08:24,830
let us talk about this, which
says that if you take any
135
00:08:24,830 --> 00:08:28,700
function of s n over n--
136
00:08:28,700 --> 00:08:31,590
in other words, a function
of a real value--
137
00:08:31,590 --> 00:08:33,830
a function of a--
138
00:08:33,830 --> 00:08:35,720
a real valued function of a--
139
00:08:40,010 --> 00:08:43,570
a real valued function
of a real value, yes.
140
00:08:43,570 --> 00:08:46,470
What you get is that
same function
141
00:08:46,470 --> 00:08:49,100
applied to the mean here.
142
00:08:49,100 --> 00:08:50,260
And that's the thing
which is so
143
00:08:50,260 --> 00:08:52,630
useful for renewal processes.
144
00:08:52,630 --> 00:08:55,740
And it's what usually makes
the strong law of large
145
00:08:55,740 --> 00:08:58,730
numbers so much easier to
use than the weak law.
146
00:09:04,220 --> 00:09:06,170
That's a plug for
the strong law.
147
00:09:06,170 --> 00:09:08,745
There are many extensions of the
weak law, telling how fast
148
00:09:08,745 --> 00:09:10,910
the convergence is.
149
00:09:10,910 --> 00:09:14,350
One thing you should always
remember about the central
150
00:09:14,350 --> 00:09:17,510
limit theorem is that it really
tells you something about the
151
00:09:17,510 --> 00:09:18,790
weak law of large numbers.
152
00:09:18,790 --> 00:09:22,260
It tells you how fast that
convergence is and what the
153
00:09:22,260 --> 00:09:24,720
convergence looks like.
154
00:09:24,720 --> 00:09:28,170
It says that if the variance
of this underlying random
155
00:09:28,170 --> 00:09:34,000
variable is finite, then this
limit here is equal to the
156
00:09:34,000 --> 00:09:37,290
normal distribution function,
the Gaussian with
157
00:09:37,290 --> 00:09:41,350
variance 1 and mean 0.
158
00:09:41,350 --> 00:09:45,070
And that becomes a little easier
to see what it's saying
159
00:09:45,070 --> 00:09:46,870
if you look at it this way.
160
00:09:46,870 --> 00:09:51,510
It says probability that s
n over n minus x bar--
161
00:09:51,510 --> 00:09:56,890
namely the difference between
the sample average and the mean which
162
00:09:56,890 --> 00:09:58,380
it's converging to--
163
00:09:58,380 --> 00:10:01,340
the probability that that's less
than or equal to y sigma
164
00:10:01,340 --> 00:10:04,010
over square root of
n approaches this normal
165
00:10:04,010 --> 00:10:05,480
Gaussian distribution function.
166
00:10:05,480 --> 00:10:11,740
It says that as n gets bigger
and bigger, this quantity here
167
00:10:11,740 --> 00:10:13,030
gets tighter and tighter.
168
00:10:13,030 --> 00:10:18,620
What it says in terms of the
picture here, in terms of this
169
00:10:18,620 --> 00:10:22,900
picture, it says that as n gets
bigger and bigger, this
170
00:10:22,900 --> 00:10:28,560
picture here scrunches down as
1 over the square root of n.
171
00:10:28,560 --> 00:10:30,970
And it also becomes Gaussian.
172
00:10:30,970 --> 00:10:33,760
It tells you exactly what
kind of convergence you
173
00:10:33,760 --> 00:10:34,770
actually have here.
174
00:10:34,770 --> 00:10:39,200
It's not only saying that this
does converge to a unit step.
175
00:10:39,200 --> 00:10:42,010
It says how it converges.
176
00:10:42,010 --> 00:10:48,240
And that's a nice thing,
conceptually.
177
00:10:48,240 --> 00:10:51,780
You don't always need
it in problems.
178
00:10:51,780 --> 00:10:54,600
But you need it for
understanding what's going on.
179
00:10:59,890 --> 00:11:01,690
We're moving backwards,
it seems.
180
00:11:06,180 --> 00:11:09,420
Now, chapter 2: Poisson processes.
181
00:11:09,420 --> 00:11:12,630
We talked about arrival
processes.
182
00:11:12,630 --> 00:11:15,260
You'd almost think that all
processes are arrival
183
00:11:15,260 --> 00:11:17,080
processes at this point.
184
00:11:17,080 --> 00:11:19,770
But any time you start to think
about that, think of a
185
00:11:19,770 --> 00:11:21,270
Markov chain.
186
00:11:21,270 --> 00:11:26,150
And a Markov chain is not an
arrival process, ordinarily.
187
00:11:26,150 --> 00:11:28,470
Some of them can be
viewed that way.
188
00:11:28,470 --> 00:11:29,690
But most of them can't.
189
00:11:29,690 --> 00:11:31,990
An arrival process
is an increasing
190
00:11:31,990 --> 00:11:34,650
sequence of random variables.
191
00:11:34,650 --> 00:11:40,020
0 less than s1, which is the
time of the first arrival, s2,
192
00:11:40,020 --> 00:11:42,810
which is the time of the second
arrival, and so forth.
193
00:11:42,810 --> 00:11:48,220
Interarrival times are x1 equals
s1, and x i equals s i
194
00:11:48,220 --> 00:11:51,150
minus s i minus 1.
195
00:11:51,150 --> 00:11:55,480
The picture, which you should
have indelibly printed on the
196
00:11:55,480 --> 00:11:58,850
back of your brain someplace
by this time, is
197
00:11:58,850 --> 00:12:00,430
this picture here.
198
00:12:00,430 --> 00:12:04,930
s1, s2, s3, are the times
at which arrivals occur.
199
00:12:04,930 --> 00:12:07,590
These are random variables, so
these arrivals come in at
200
00:12:07,590 --> 00:12:09,320
random times.
201
00:12:09,320 --> 00:12:14,690
x1, x2, x3 are the intervals
between arrivals.
202
00:12:14,690 --> 00:12:18,280
And N of t is the number of
arrivals that have occurred up
203
00:12:18,280 --> 00:12:19,860
until time t.
204
00:12:19,860 --> 00:12:26,800
So every time t passes one
of these arrival times, N of t
205
00:12:26,800 --> 00:12:31,140
pops up by one, pops up by one
again, pops up by one again.
206
00:12:31,140 --> 00:12:34,200
The sample value
pops up by one.
207
00:12:34,200 --> 00:12:36,920
An arrival process can model
arrivals to a queue,
208
00:12:36,920 --> 00:12:40,320
departures from a queue,
locations of breaks in an oil
209
00:12:40,320 --> 00:12:43,960
line, an enormous number
of things.
210
00:12:43,960 --> 00:12:46,260
It's not just arrivals
we're talking about.
211
00:12:46,260 --> 00:12:48,070
It's all of these other
things, also.
212
00:12:48,070 --> 00:12:54,330
But it's something laid out on
a one-dimensional axis where
213
00:12:54,330 --> 00:12:58,390
things happen at various
places on that
214
00:12:58,390 --> 00:12:59,700
one-dimensional axis.
215
00:12:59,700 --> 00:13:05,100
So that's the way to view it.
216
00:13:05,100 --> 00:13:07,540
OK, same picture again.
217
00:13:07,540 --> 00:13:11,510
The process can be specified by the
joint distribution of the
218
00:13:11,510 --> 00:13:15,570
arrival epochs or the
interarrival times, or, in
219
00:13:15,570 --> 00:13:18,090
fact, of the counting process.
220
00:13:18,090 --> 00:13:25,200
If you see a sample path of
the counting process, then
221
00:13:25,200 --> 00:13:29,180
from that you can determine the
sample path of the arrival
222
00:13:29,180 --> 00:13:33,220
times and the sample path of
the interarrival times.
223
00:13:33,220 --> 00:13:38,320
And since any set of these
random variables specifies all
224
00:13:38,320 --> 00:13:43,220
three of these things, the
three are all equivalent.
225
00:13:43,220 --> 00:13:47,150
OK, we have this important
condition here.
226
00:13:47,150 --> 00:13:55,960
And I always sort of forget
this, but when these arrivals
227
00:13:55,960 --> 00:13:59,700
are highly delayed, when there's
a long period of time
228
00:13:59,700 --> 00:14:05,380
between each arrival, what that
says is the counting
229
00:14:05,380 --> 00:14:08,480
process is getting small.
230
00:14:08,480 --> 00:14:12,570
So big interarrival times
corresponds to a small
231
00:14:12,570 --> 00:14:14,180
value of N of t.
232
00:14:14,180 --> 00:14:16,420
And you can see that in
the picture here.
233
00:14:16,420 --> 00:14:20,020
If you spread out these
arrivals, you make s1 all the
234
00:14:20,020 --> 00:14:21,290
way out here.
235
00:14:21,290 --> 00:14:26,190
Then N of t doesn't become
1 until way out here.
236
00:14:26,190 --> 00:14:32,930
So N of t as a function of t is
getting smaller as s sub n
237
00:14:32,930 --> 00:14:36,030
is getting larger.
238
00:14:36,030 --> 00:14:41,560
S sub n is the minimum of the
set of t, such that N of t is
239
00:14:41,560 --> 00:14:45,830
greater than or equal to n.
Sounds like an unpleasantly
240
00:14:45,830 --> 00:14:49,460
complicated expression.
241
00:14:49,460 --> 00:14:52,210
If any of you can find a simpler
way to say it than
242
00:14:52,210 --> 00:14:55,950
that, I would be absolutely
delighted to hear it.
243
00:14:55,950 --> 00:14:57,530
But I don't think there is.
244
00:14:57,530 --> 00:15:01,150
I think the simpler way to say
it is this picture here.
245
00:15:01,150 --> 00:15:03,230
And the picture says it.
246
00:15:03,230 --> 00:15:08,770
And you can sort of figure out
all those logical statements
247
00:15:08,770 --> 00:15:11,670
from the picture, which
is intuitively a
248
00:15:11,670 --> 00:15:12,942
lot clearer, I think.
249
00:15:17,270 --> 00:15:23,380
So now, a renewal process is
an arrival process with IID
250
00:15:23,380 --> 00:15:25,100
interarrival times.
251
00:15:25,100 --> 00:15:28,800
And a Poisson process is a
renewal process where the
252
00:15:28,800 --> 00:15:32,130
interarrival random variables
are exponential.
253
00:15:32,130 --> 00:15:35,290
So, Poisson process
is a special
254
00:15:35,290 --> 00:15:37,200
case of renewal process.
255
00:15:37,200 --> 00:15:40,920
Why are these exponential
interarrival
256
00:15:40,920 --> 00:15:43,350
times so important?
257
00:15:43,350 --> 00:15:46,550
Well, it's because they're
memoryless.
258
00:15:46,550 --> 00:15:50,360
And the memoryless property says
that the probability that
259
00:15:50,360 --> 00:15:54,535
x is greater than t plus x is
equal to the probability that
260
00:15:54,535 --> 00:15:58,190
it's greater than x times the
probability that it's greater
261
00:15:58,190 --> 00:16:01,830
than t for all x and t greater
than or equal to 0.
262
00:16:01,830 --> 00:16:04,860
This makes better sense if
you say it conditionally.
263
00:16:04,860 --> 00:16:09,040
The probability that x is
greater than t plus x, given
264
00:16:09,040 --> 00:16:12,700
that it's greater than t, is
the same as the probability
265
00:16:12,700 --> 00:16:14,800
that x is greater than--
266
00:16:14,800 --> 00:16:17,460
capital X is greater
than little x.
267
00:16:17,460 --> 00:16:20,420
This really gives you
the memoryless
268
00:16:20,420 --> 00:16:21,780
property in a nutshell.
269
00:16:21,780 --> 00:16:25,860
It says if you're looking at
this process as it evolves,
270
00:16:25,860 --> 00:16:29,010
and you see an arrival, and then
you start looking for the
271
00:16:29,010 --> 00:16:32,160
next arrival, it says that no
matter how long you've been
272
00:16:32,160 --> 00:16:36,240
looking, the distribution
function of the time to wait
273
00:16:36,240 --> 00:16:38,930
until the next arrival,
is the same
274
00:16:38,930 --> 00:16:40,580
exponential random variable.
275
00:16:40,580 --> 00:16:44,220
So you never gain anything
by waiting.
276
00:16:44,220 --> 00:16:46,390
You might as well
be impatient.
277
00:16:46,390 --> 00:16:48,790
But it doesn't do any good
to be impatient.
278
00:16:48,790 --> 00:16:51,130
It doesn't do any good to wait.
279
00:16:51,130 --> 00:16:52,850
It doesn't do any good
to not wait.
280
00:16:52,850 --> 00:16:56,280
No matter what you do, this
damn thing always takes an
281
00:16:56,280 --> 00:16:59,780
exponential amount
of time to occur.
282
00:16:59,780 --> 00:17:01,410
OK, that's what it means
to be memoryless.
283
00:17:01,410 --> 00:17:03,910
And the exponential is the only
284
00:17:03,910 --> 00:17:05,835
memoryless random variable.
285
00:17:10,775 --> 00:17:14,910
How about a geometric
random variable?
286
00:17:14,910 --> 00:17:19,190
The geometric random variable
is memoryless if you're only
287
00:17:19,190 --> 00:17:22,150
looking at integer times.
288
00:17:22,150 --> 00:17:32,180
Here we're talking about
times on a continuum.
289
00:17:32,180 --> 00:17:35,090
That's what this says.
290
00:17:35,090 --> 00:17:38,410
Well, that's what this says.
291
00:17:38,410 --> 00:17:46,590
And if you look at discrete
times, then a geometric random
292
00:17:46,590 --> 00:17:49,860
variable is memoryless also.
293
00:17:55,020 --> 00:17:58,210
We're given a Poisson process
of rate lambda.
294
00:17:58,210 --> 00:18:01,290
The interval from any given t
greater than 0 until the first
295
00:18:01,290 --> 00:18:04,190
arrival after t is a
random variable.
296
00:18:04,190 --> 00:18:06,010
Let's call it z1.
297
00:18:06,010 --> 00:18:08,650
We already said that that
random variable was
298
00:18:08,650 --> 00:18:11,430
exponential.
299
00:18:11,430 --> 00:18:17,040
And it's independent of all
arrivals which occur before
300
00:18:17,040 --> 00:18:18,630
that starting time t.
301
00:18:18,630 --> 00:18:23,220
So looking at any starting
time t, it doesn't make any
302
00:18:23,220 --> 00:18:25,530
difference what has happened
back here.
303
00:18:25,530 --> 00:18:27,450
That's not only the
last arrival, but
304
00:18:27,450 --> 00:18:29,630
all the other arrivals.
305
00:18:29,630 --> 00:18:32,880
The time until the next arrival
is exponential.
306
00:18:32,880 --> 00:18:36,520
The time until each arrival
after that is exponential
307
00:18:36,520 --> 00:18:41,690
also, which says that if you
look at this process starting
308
00:18:41,690 --> 00:18:47,250
at time t, it's a Poisson
process again, where all the
309
00:18:47,250 --> 00:18:50,450
times have to be shifted, of
course, but it's a Poisson
310
00:18:50,450 --> 00:18:52,830
process starting at time t.
311
00:18:52,830 --> 00:19:00,570
The corresponding counting
process, we can call it n
312
00:19:00,570 --> 00:19:04,950
tilde of t and tau, where tau is
greater than or equal to t,
313
00:19:04,950 --> 00:19:09,690
where this is the number of
arrivals in the original
314
00:19:09,690 --> 00:19:14,610
process up until time tau minus
the number of arrivals
315
00:19:14,610 --> 00:19:16,340
up until time t.
316
00:19:16,340 --> 00:19:19,330
If you look at that difference,
so many arrivals
317
00:19:19,330 --> 00:19:26,550
up until t, so many more
up until time tau.
318
00:19:26,550 --> 00:19:29,030
You look at the difference
between tau and t.
319
00:19:29,030 --> 00:19:37,080
The number of arrivals in that
interval is the same Poisson
320
00:19:37,080 --> 00:19:39,800
distributed random
variable again.
321
00:19:39,800 --> 00:19:43,080
So, it has the same
distribution as N
322
00:19:43,080 --> 00:19:45,020
of tau minus t.
323
00:19:45,020 --> 00:19:47,650
And that's called the stationary
increment property.
324
00:19:47,650 --> 00:19:50,720
It says that no matter where you
start a Poisson process,
325
00:19:50,720 --> 00:19:53,030
it always looks exactly
the same.
326
00:19:53,030 --> 00:19:58,370
It says that if you wait for one
hour and start then, it's
327
00:19:58,370 --> 00:20:01,750
exactly the same as what
it was before.
328
00:20:01,750 --> 00:20:05,960
If we had Poisson processes in
the world, it wouldn't do any
329
00:20:05,960 --> 00:20:09,720
good to travel on certain days
rather than other days.
330
00:20:09,720 --> 00:20:13,170
It wouldn't do any good to leave
to drive home at one
331
00:20:13,170 --> 00:20:14,850
hour rather than another hour.
332
00:20:14,850 --> 00:20:17,670
You'd have the same travel
all the time.
333
00:20:17,670 --> 00:20:18,980
It's all equal.
334
00:20:18,980 --> 00:20:21,140
It would be an awful world
if it were stationary.
335
00:20:23,770 --> 00:20:26,750
The independent increment
properties for counting
336
00:20:26,750 --> 00:20:33,170
process is that for all
sequences of ordered times--
337
00:20:33,170 --> 00:20:37,490
0 less than t1 less than
t2 up to t k--
338
00:20:37,490 --> 00:20:40,310
the random variables n of t1--
339
00:20:40,310 --> 00:20:44,440
and now we're talking about the
number of arrivals between
340
00:20:44,440 --> 00:20:47,510
t1 and t2, the number
of arrivals between
341
00:20:47,510 --> 00:20:49,600
t n minus 1 and t n.
342
00:20:49,600 --> 00:20:52,330
These are all independent
of each other.
343
00:20:52,330 --> 00:20:55,390
That's what this independent
increment property says.
344
00:20:55,390 --> 00:20:58,110
And we see from what we've said
about this memoryless
345
00:20:58,110 --> 00:21:02,680
property that the Poisson
process does indeed have this
346
00:21:02,680 --> 00:21:04,750
independent increment
property.
347
00:21:04,750 --> 00:21:08,720
Poisson processes have both the
stationary and independent
348
00:21:08,720 --> 00:21:11,240
increment properties.
349
00:21:11,240 --> 00:21:15,760
And this looks like an immediate
consequence of that.
350
00:21:15,760 --> 00:21:16,370
It's not.
351
00:21:16,370 --> 00:21:19,630
Remember, we had to struggle
with this for a bit.
352
00:21:19,630 --> 00:21:22,500
But it says that Poisson
processes can be defined by
353
00:21:22,500 --> 00:21:26,450
the stationary and independent
increment properties, plus
354
00:21:26,450 --> 00:21:32,730
either the Poisson PMF for N
of t, or this incremental
355
00:21:32,730 --> 00:21:38,660
property, the probability that N
tilde of t and t plus delta,
356
00:21:38,660 --> 00:21:43,320
namely the number of arrivals
between t and t plus delta,
357
00:21:43,320 --> 00:21:46,170
the probability that that's
1 is equal to
358
00:21:46,170 --> 00:21:47,600
lambda times delta.
359
00:21:47,600 --> 00:21:53,040
In other words, this view of a
Poisson process is the view
360
00:21:53,040 --> 00:21:56,850
that you get when you sort
of forget about time.
361
00:21:56,850 --> 00:22:00,220
And you think of arrivals from
outer space coming down and
362
00:22:00,220 --> 00:22:01,470
hitting on a line.
363
00:22:01,470 --> 00:22:03,760
And they hit on that
line randomly.
364
00:22:03,760 --> 00:22:05,860
And each one of them
is independent
365
00:22:05,860 --> 00:22:07,780
of every other one.
366
00:22:07,780 --> 00:22:15,350
And that's what you get if you
wind up with a density of
367
00:22:15,350 --> 00:22:18,770
lambda arrivals per unit time.
368
00:22:18,770 --> 00:22:22,120
OK, we talked about all
of that, of course.
369
00:22:22,120 --> 00:22:23,400
The probability distributions--
370
00:22:26,050 --> 00:22:29,380
there are many of them for
a Poisson process.
371
00:22:29,380 --> 00:22:32,470
The Poisson process is
remarkable in the sense that
372
00:22:32,470 --> 00:22:35,320
anything you want to find,
there's generally a simple
373
00:22:35,320 --> 00:22:37,070
formula for it.
374
00:22:37,070 --> 00:22:39,530
If it's complicated, you're
probably not looking at
375
00:22:39,530 --> 00:22:42,010
it the right way.
376
00:22:42,010 --> 00:22:45,360
So many things come out
very, very simply.
377
00:22:45,360 --> 00:22:46,660
The probability--
378
00:22:46,660 --> 00:22:50,580
the joint probability
distribution of all of the
379
00:22:50,580 --> 00:22:58,670
arrival times up to the n-th is
an exponential just in the
380
00:22:58,670 --> 00:23:05,080
last one, which says that the
intermediate arrival epochs
381
00:23:05,080 --> 00:23:09,140
are equally likely to be
anywhere, just as long as they
382
00:23:09,140 --> 00:23:13,440
satisfy this ordering
restriction, s1 less than s2.
383
00:23:13,440 --> 00:23:15,430
That's what this formula says.
384
00:23:15,430 --> 00:23:20,490
It says that the joint density
of these arrival times doesn't
385
00:23:20,490 --> 00:23:23,010
depend on anything except the
time of the last one.
386
00:23:25,740 --> 00:23:28,520
But it does depend on the fact
that they're [INAUDIBLE].
387
00:23:28,520 --> 00:23:31,435
From that, you can find
virtually everything else if
388
00:23:31,435 --> 00:23:32,900
you want to.
389
00:23:32,900 --> 00:23:36,600
That really is saying exactly
the same thing as we were just
390
00:23:36,600 --> 00:23:38,440
saying a while ago.
391
00:23:38,440 --> 00:23:41,740
This is the viewpoint of looking
at this line from
392
00:23:41,740 --> 00:23:47,040
outer space with arrivals coming
in, coming in uniformly
393
00:23:47,040 --> 00:23:51,630
distributed over this line
interval, and each of them
394
00:23:51,630 --> 00:23:54,080
independent of each other one.
395
00:23:54,080 --> 00:23:57,740
That's what you wind
up saying.
396
00:23:57,740 --> 00:24:01,490
This density, then, of the
n-th arrival, if you just
397
00:24:01,490 --> 00:24:05,620
integrate all this stuff, you
get the Erlang formula.
398
00:24:05,620 --> 00:24:12,940
Probability of arrival n in
t to t plus delta is--
399
00:24:12,940 --> 00:24:17,820
now this is the derivation that
we went through before,
400
00:24:17,820 --> 00:24:20,310
going from Erlang to Poisson.
401
00:24:20,310 --> 00:24:24,370
You can go from Poisson to
Erlang too, if you want to.
402
00:24:24,370 --> 00:24:26,320
But it's a little easier
to go this way.
403
00:24:26,320 --> 00:24:30,500
The probability of arrival n in
t to t plus delta is the
404
00:24:30,500 --> 00:24:35,890
probability that n of t is
equal to n minus 1 times
405
00:24:35,890 --> 00:24:40,670
lambda delta plus an o
of delta, of course.
406
00:24:40,670 --> 00:24:46,270
And the probability that n of
t is equal to n minus 1 from
407
00:24:46,270 --> 00:24:53,050
this formula here is going to be
the density of when s sub n
408
00:24:53,050 --> 00:24:55,040
appears, divided by lambda.
409
00:24:55,040 --> 00:24:58,910
That's exactly what this
formula here says.
410
00:24:58,910 --> 00:25:01,980
So that's just the Poisson
distribution.
411
00:25:01,980 --> 00:25:04,910
We've been through
that derivation.
412
00:25:04,910 --> 00:25:08,420
It's almost a derivation worth
remembering, because it just
413
00:25:08,420 --> 00:25:11,940
appears so often.
414
00:25:11,940 --> 00:25:16,160
As you've seen from the problem
sets we've done,
415
00:25:16,160 --> 00:25:20,970
almost every problem you can
dream of, dealing with Poisson
416
00:25:20,970 --> 00:25:27,150
processes, the easy way to do
them comes from this property
417
00:25:27,150 --> 00:25:30,730
of combining and splitting
Poisson processes.
418
00:25:30,730 --> 00:25:35,170
It says if n1 of t, n2 of t,
up to n sub k of t are
419
00:25:35,170 --> 00:25:37,500
independent Poisson
processes--
420
00:25:37,500 --> 00:25:39,880
what do you mean by
a process being
421
00:25:39,880 --> 00:25:42,200
independent of another process?
422
00:25:42,200 --> 00:25:46,660
Well, the process is specified
by the interarrival times for
423
00:25:46,660 --> 00:25:47,660
that process.
424
00:25:47,660 --> 00:25:50,950
So what we're saying here is the
interarrival times for the
425
00:25:50,950 --> 00:25:54,470
first process are independent
of the interarrival times of
426
00:25:54,470 --> 00:25:56,770
the second process,
independent of the
427
00:25:56,770 --> 00:26:00,620
interarrival times for the third
process, and so forth.
428
00:26:00,620 --> 00:26:02,990
Again, this is a view of someone
from outer space,
429
00:26:02,990 --> 00:26:06,180
throwing darts onto a line.
430
00:26:06,180 --> 00:26:09,750
And if you have multiple people
throwing darts on a
431
00:26:09,750 --> 00:26:13,450
line, but they're all equally
distributed, all uniformly
432
00:26:13,450 --> 00:26:16,600
distributed over the line,
this is exactly
433
00:26:16,600 --> 00:26:20,670
the model you get.
434
00:26:20,670 --> 00:26:22,180
So we have two views here.
435
00:26:22,180 --> 00:26:26,480
The first one is to look at
the arrival epochs that are
436
00:26:26,480 --> 00:26:28,420
generated from each process.
437
00:26:28,420 --> 00:26:31,710
And then combine all arrivals
into one Poisson process.
438
00:26:31,710 --> 00:26:34,900
So we look at all these Poisson
processes, and then
439
00:26:34,900 --> 00:26:38,340
take the sum of them, and we
get a Poisson process.
440
00:26:38,340 --> 00:26:40,190
The other way to look at it--
441
00:26:40,190 --> 00:26:43,120
and going back and forth between
these two views is the
442
00:26:43,120 --> 00:26:45,060
way you solve problems--
443
00:26:45,060 --> 00:26:46,770
you look at the combined
sequence of
444
00:26:46,770 --> 00:26:48,900
arrival epochs first.
445
00:26:48,900 --> 00:26:52,400
And then for each arrival that
comes in, you think of an IID
446
00:26:52,400 --> 00:26:55,450
random variable independent
of all the other random
447
00:26:55,450 --> 00:27:02,860
variables, which decides for
each arrival which of the
448
00:27:02,860 --> 00:27:04,710
sub-processes it goes to.
449
00:27:04,710 --> 00:27:08,680
So there's this hidden
process--
450
00:27:08,680 --> 00:27:09,890
well, it's not hidden.
451
00:27:09,890 --> 00:27:12,100
You can see what it's doing
from looking at all the
452
00:27:12,100 --> 00:27:14,340
sub-processes.
453
00:27:14,340 --> 00:27:20,670
And each arrival then is
associated with the given
454
00:27:20,670 --> 00:27:24,700
sub-process, with the
probability mass function
455
00:27:24,700 --> 00:27:28,160
lambda sub i over the
sum of lambda sub j.
456
00:27:28,160 --> 00:27:30,460
So this is the workhorse
of Poisson
457
00:27:30,460 --> 00:27:32,270
type queueing problems.
458
00:27:32,270 --> 00:27:35,990
If you study queueing theory,
on every page you
459
00:27:35,990 --> 00:27:37,980
see this thing used.
460
00:27:37,980 --> 00:27:41,480
If you look at Kleinrock's books
on queueing, they're
461
00:27:41,480 --> 00:27:45,120
very nice books because they
cover so many different
462
00:27:45,120 --> 00:27:47,040
queueing situations.
463
00:27:47,040 --> 00:27:50,230
You find him using this
on every page.
464
00:27:50,230 --> 00:27:54,060
And he never tells you that he's
using it, but that's what
465
00:27:54,060 --> 00:27:54,670
he's doing.
466
00:27:54,670 --> 00:27:59,360
So that's a useful
thing to know.
467
00:27:59,360 --> 00:28:02,840
We then talked about conditional
arrivals and order
468
00:28:02,840 --> 00:28:05,590
statistics.
469
00:28:05,590 --> 00:28:12,280
The conditional distribution
of the first n arrivals--
470
00:28:12,280 --> 00:28:17,670
namely, s sub 1 s sub
2 up to s sub n--
471
00:28:17,670 --> 00:28:24,250
given that the number of arrivals
N of t is n, is just n factorial
472
00:28:24,250 --> 00:28:25,430
over t to the n.
473
00:28:25,430 --> 00:28:29,380
Again, it doesn't depend on
where these arrivals are.
474
00:28:29,380 --> 00:28:33,215
It's just a function which is
independent of each arrival.
475
00:28:33,215 --> 00:28:36,660
It's the same kind of
conditioning we had before.
476
00:28:36,660 --> 00:28:40,080
It's n factorial divided
by t to the n.
477
00:28:40,080 --> 00:28:44,360
That's because if
you order these random
478
00:28:44,360 --> 00:28:49,450
variables, t1 less than t2 less
than t3, and so forth, up
479
00:28:49,450 --> 00:28:53,540
until time t, then you ask
how many different ways can I
480
00:28:53,540 --> 00:29:01,590
arrange a set of numbers, each
between 0 and t so that we
481
00:29:01,590 --> 00:29:03,630
have different orderings
of them.
482
00:29:03,630 --> 00:29:06,700
And you can choose any one
of the n to be the first.
483
00:29:06,700 --> 00:29:09,560
You can choose any one
of the remaining n
484
00:29:09,560 --> 00:29:11,510
minus 1 to be the second.
485
00:29:11,510 --> 00:29:14,670
And that's where this n
factorial comes from here.
486
00:29:14,670 --> 00:29:18,140
And that, again we've
been over.
487
00:29:18,140 --> 00:29:21,660
The probability that s1 is
greater than tau, given that
488
00:29:21,660 --> 00:29:27,540
there are n arrivals in the
overall interval t, comes from
489
00:29:27,540 --> 00:29:31,390
just looking at N uniformly
distributed random variables
490
00:29:31,390 --> 00:29:33,190
between 0 and t.
491
00:29:33,190 --> 00:29:35,840
And then what do you do with
those uniformly distributed
492
00:29:35,840 --> 00:29:37,670
random variables?
493
00:29:37,670 --> 00:29:40,490
Well, you ask the question,
what's the probability that
494
00:29:40,490 --> 00:29:44,140
all of them occur
after time tau?
495
00:29:44,140 --> 00:29:47,820
And that's just t minus tau
divided by t raised to the
496
00:29:47,820 --> 00:29:48,910
n-th power.
497
00:29:48,910 --> 00:29:51,980
And see, all of these formulas
just come from particular
498
00:29:51,980 --> 00:29:54,360
viewpoints about what's
going on.
499
00:29:54,360 --> 00:29:55,760
You have a number
of viewpoints.
500
00:29:55,760 --> 00:29:58,550
One of them is throwing
darts at a line.
501
00:29:58,550 --> 00:30:01,140
One of them is having
exponential
502
00:30:01,140 --> 00:30:02,510
interarrival times.
503
00:30:02,510 --> 00:30:06,660
One of them is these uniformly
distributed arrivals.
504
00:30:06,660 --> 00:30:08,880
It's only a very small
number of tricks.
505
00:30:08,880 --> 00:30:13,600
And you just use them in
various combinations.
506
00:30:13,600 --> 00:30:17,800
So the joint distribution of s1
to s n, given N of t equals
507
00:30:17,800 --> 00:30:21,250
n, is the same as the joint
distribution of N uniform
508
00:30:21,250 --> 00:30:24,070
random variables after
they've been ordered.
509
00:30:28,650 --> 00:30:32,115
So let's go on to
finite-state Markov chains.
510
00:30:35,240 --> 00:30:37,670
Seems like we're covering an
enormous amount of material in
511
00:30:37,670 --> 00:30:38,350
this course.
512
00:30:38,350 --> 00:30:40,150
And I think we are.
513
00:30:40,150 --> 00:30:44,290
But as I'm trying to say, as
we go along, it's all--
514
00:30:44,290 --> 00:30:46,850
I mean, everything follows from
a relatively small set of
515
00:30:46,850 --> 00:30:48,620
principles.
516
00:30:48,620 --> 00:30:51,100
Of course, it's harder to
understand the small set of
517
00:30:51,100 --> 00:30:54,580
principles and how to apply them
than it is to understand
518
00:30:54,580 --> 00:30:55,460
all the details.
519
00:30:55,460 --> 00:30:56,710
But that's--
520
00:30:58,970 --> 00:31:01,560
but on the other hand, if you
understand the principles,
521
00:31:01,560 --> 00:31:04,620
then all those details,
including the ones we haven't
522
00:31:04,620 --> 00:31:08,280
talked about, are easy
to deal with.
523
00:31:08,280 --> 00:31:11,750
An integer-time stochastic
process--
524
00:31:11,750 --> 00:31:14,450
x1, x2, x3, blah, blah, blah--
525
00:31:14,450 --> 00:31:19,220
is a Markov chain if for all n,
namely the number of them
526
00:31:19,220 --> 00:31:21,770
that we're looking at--
527
00:31:21,770 --> 00:31:23,020
well--
528
00:31:25,880 --> 00:31:30,190
for all n, i, j, k, l, and so
forth, the probability that
529
00:31:30,190 --> 00:31:35,770
the n-th of these random
variables is equal to j, given
530
00:31:35,770 --> 00:31:39,340
what all of the others are-- and
these are not ordered now.
531
00:31:39,340 --> 00:31:41,460
I mean, in a Markov chain,
nothing is ordered.
532
00:31:41,460 --> 00:31:44,430
We're not talking about
an arrival process.
533
00:31:44,430 --> 00:31:47,220
We're just talking about a frog
jumping around on lily
534
00:31:47,220 --> 00:31:52,660
pads, if you arrange the lily
pads in a linear way, if these
535
00:31:52,660 --> 00:31:54,430
are random variables.
536
00:31:54,430 --> 00:32:00,530
The probability that the n-th
location is equal to j, given
537
00:32:00,530 --> 00:32:06,410
that the previous locations are
i, k, back to m, is just
538
00:32:06,410 --> 00:32:11,010
some probability p sub
i j, a conditional
539
00:32:11,010 --> 00:32:14,120
probability of j given i.
540
00:32:14,120 --> 00:32:17,670
In other words, if you're
looking at what happens at
541
00:32:17,670 --> 00:32:22,340
time n, once you know what
happened at time n minus 1,
542
00:32:22,340 --> 00:32:24,830
everything else is
of no concern.
543
00:32:24,830 --> 00:32:29,400
This process evolves by having
a history of only one time
544
00:32:29,400 --> 00:32:31,980
unit, a little like the
Poisson process.
545
00:32:31,980 --> 00:32:36,070
The Poisson process evolves
by being totally
546
00:32:36,070 --> 00:32:37,880
independent of the past.
547
00:32:37,880 --> 00:32:40,600
Here, you put a little
dependence in the past.
548
00:32:40,600 --> 00:32:44,150
But the dependence is only to
look at the last thing that
549
00:32:44,150 --> 00:32:49,040
happened, and nothing before the
last time that happened.
550
00:32:49,040 --> 00:32:53,850
So p sub i j depends
only on i and j.
551
00:32:53,850 --> 00:32:59,170
And the initial probability mass
function is arbitrary.
552
00:32:59,170 --> 00:33:02,470
A Markov chain is finite-state if
the sample space for each x
553
00:33:02,470 --> 00:33:07,400
i is a finite set S. And the
sample space S is usually
554
00:33:07,400 --> 00:33:10,530
taken to be integers
1 up to M.
555
00:33:10,530 --> 00:33:13,490
In all these formulas we write,
we're always summing
556
00:33:13,490 --> 00:33:17,230
from one to M. And the reason
for that is we've assumed the
557
00:33:17,230 --> 00:33:22,120
states are 1, 2, 3, up to M.
Sometimes it's more convenient
558
00:33:22,120 --> 00:33:23,765
to think of different
state spaces.
559
00:33:26,730 --> 00:33:29,040
But all the formulas
we use are based on
560
00:33:29,040 --> 00:33:31,290
this state space here.
561
00:33:31,290 --> 00:33:36,500
A Markov chain is completely
described by these transition
562
00:33:36,500 --> 00:33:41,200
probabilities plus the initial
probabilities.
563
00:33:41,200 --> 00:33:44,390
If you want to write down the
probability of what x is at
564
00:33:44,390 --> 00:33:49,030
some time n given what it was at
some time 0, all you have to
565
00:33:49,030 --> 00:33:52,890
do is trace all the paths from
0 out to N, add up the
566
00:33:52,890 --> 00:33:56,890
probabilities of all of those
paths, and that tells you the
567
00:33:56,890 --> 00:33:58,020
probability you want.
568
00:33:58,020 --> 00:34:01,820
All probabilities can be
calculated just from knowing
569
00:34:01,820 --> 00:34:06,240
what these transition
probabilities are.
570
00:34:06,240 --> 00:34:10,980
Note that when we're dealing
with Poisson processes, we
571
00:34:10,980 --> 00:34:15,520
defined everything in
terms of how many--
572
00:34:15,520 --> 00:34:20,250
how many variables are there in
defining a Poisson process?
573
00:34:20,250 --> 00:34:25,020
How many things do you have to
specify before I know exactly
574
00:34:25,020 --> 00:34:27,320
what Poisson process
I'm talking about?
575
00:34:30,540 --> 00:34:31,760
Only the Poisson rate.
576
00:34:31,760 --> 00:34:35,650
Only one parameter is necessary
577
00:34:35,650 --> 00:34:37,639
for a Poisson process.
578
00:34:37,639 --> 00:34:43,219
For a finite-state Markov
process, you need a lot more.
579
00:34:43,219 --> 00:34:48,310
What you need is all of these
values, p sub i j.
580
00:34:48,310 --> 00:34:52,409
If you sum p sub i j over
j, you have to get 1.
581
00:34:52,409 --> 00:34:54,830
So that removes one of them.
582
00:34:54,830 --> 00:34:58,360
But as soon as you specify that
transition matrix, you've
583
00:34:58,360 --> 00:34:59,960
specified everything.
584
00:34:59,960 --> 00:35:01,260
So there's nothing more to know
585
00:35:01,260 --> 00:35:03,220
about the Markov chain.
586
00:35:03,220 --> 00:35:06,060
There's only all these gruesome
derivations that we
587
00:35:06,060 --> 00:35:07,580
go through.
588
00:35:07,580 --> 00:35:11,600
But everything is initially
determined.
589
00:35:11,600 --> 00:35:13,960
The set of transition
is usually
590
00:35:13,960 --> 00:35:16,030
viewed as the Markov chain.
591
00:35:16,030 --> 00:35:19,760
And the initial probabilities
are usually viewed as just a
592
00:35:19,760 --> 00:35:21,740
parameter that we deal with.
593
00:35:21,740 --> 00:35:23,840
In other words, we--
594
00:35:23,840 --> 00:35:28,250
in other words, what we study
is the particular Markov
595
00:35:28,250 --> 00:35:31,550
chain, whether it's recurrent,
whether it's transient,
596
00:35:31,550 --> 00:35:32,800
whatever it is.
597
00:35:32,800 --> 00:35:35,770
How you break it up into
classes, all of that stuff
598
00:35:35,770 --> 00:35:39,060
only depends on these transition
probabilities and
599
00:35:39,060 --> 00:35:40,815
doesn't depend on
where you start.
600
00:35:46,920 --> 00:35:51,490
Now, a finite-state Markov chain
can be described either
601
00:35:51,490 --> 00:35:54,230
as a directed graph
or as a matrix.
602
00:35:54,230 --> 00:35:58,300
I hope you've seen by this
time that some things are
603
00:35:58,300 --> 00:36:03,040
easier to look at if you look at
things in terms of a graph.
604
00:36:03,040 --> 00:36:07,180
Some things are easier to look
at if you look at something
605
00:36:07,180 --> 00:36:08,660
like this matrix.
606
00:36:08,660 --> 00:36:13,230
And some problems can be solved
by inspection, if you
607
00:36:13,230 --> 00:36:14,700
draw a graph of it.
608
00:36:14,700 --> 00:36:17,890
Some can be solved almost
by inspection if
609
00:36:17,890 --> 00:36:19,480
you look at the matrix.
610
00:36:19,480 --> 00:36:23,460
If you're doing things by
computer, usually computers
611
00:36:23,460 --> 00:36:27,450
deal with matrices more easily
than with graphs.
612
00:36:27,450 --> 00:36:31,070
If you're dealing with a Markov
chain with 100,000
613
00:36:31,070 --> 00:36:35,290
states, you're not going to
look at the graph and
614
00:36:35,290 --> 00:36:38,330
determine very much from it,
because it's typically going
615
00:36:38,330 --> 00:36:39,650
to be fairly complicated--
616
00:36:39,650 --> 00:36:42,020
unless it has some very
simple structure.
617
00:36:42,020 --> 00:36:46,440
And sometimes that simple
structure is determined.
618
00:36:46,440 --> 00:36:48,780
If it's something where
you can only--
619
00:36:48,780 --> 00:36:52,190
where you have the states
numbered from 1 to 100,000,
620
00:36:52,190 --> 00:36:56,270
and you can only go from state
i to state i plus 1, or from
621
00:36:56,270 --> 00:36:59,910
state i to
i minus 1, then it
622
00:36:59,910 --> 00:37:01,380
becomes very simple.
623
00:37:01,380 --> 00:37:04,320
And you like to look at
it as a graph again.
624
00:37:04,320 --> 00:37:07,670
But ordinarily, you don't
like to do that.
625
00:37:07,670 --> 00:37:15,000
But the nice thing about this
graph is that it tells you
626
00:37:15,000 --> 00:37:19,090
very simply and visually which
transition probabilities are
627
00:37:19,090 --> 00:37:23,810
zero, and which transition
probabilities are non-zero.
628
00:37:23,810 --> 00:37:26,690
And that's the thing that
specifies which states are
629
00:37:26,690 --> 00:37:31,650
recurrent, which states are
transient, and all of that.
630
00:37:31,650 --> 00:37:35,400
All of that kind of elementary
analysis about a Markov chain
631
00:37:35,400 --> 00:37:40,300
all comes from looking at this
graph and seeing whether you
632
00:37:40,300 --> 00:37:46,290
can get from one state to
another state by some process.
633
00:37:46,290 --> 00:37:50,520
So let's move on from that.
634
00:37:50,520 --> 00:37:53,620
Talk about the classification
of states.
635
00:37:53,620 --> 00:37:57,500
We started out with the
idea of a walk and
636
00:37:57,500 --> 00:37:59,370
a path and a cycle.
637
00:37:59,370 --> 00:38:03,610
I'm not sure these terms are
uniform throughout the field.
638
00:38:03,610 --> 00:38:07,550
But a walk is an ordered
string of nodes, like
639
00:38:07,550 --> 00:38:10,020
i0, i1, up to i n.
640
00:38:10,020 --> 00:38:14,960
You can have repeated elements
here, but you need a directed
641
00:38:14,960 --> 00:38:18,170
arc from i sub m minus
1 to i sub m.
642
00:38:18,170 --> 00:38:23,035
Like for example, in this stupid
Markov chain here--
643
00:38:25,870 --> 00:38:28,880
I mean, when you're drawing
things in LaTeX, it's kind of
644
00:38:28,880 --> 00:38:31,760
hard to draw those nice
little curves there.
645
00:38:31,760 --> 00:38:34,610
And because of that, once you
draw a Markov chain, you
646
00:38:34,610 --> 00:38:36,050
never want to change it.
647
00:38:36,050 --> 00:38:39,210
And that's why these notes
have a very small set of
648
00:38:39,210 --> 00:38:40,530
Markov chains in them.
649
00:38:40,530 --> 00:38:46,580
It's just to save me some work,
drawing and drawing
650
00:38:46,580 --> 00:38:47,830
these diagrams.
651
00:38:50,030 --> 00:38:55,700
An example of a walk: you
start in 4, you take the self
652
00:38:55,700 --> 00:38:58,800
loop, go back to 4 at time 2.
653
00:38:58,800 --> 00:39:01,660
Then you go to state
1 at time 3.
654
00:39:01,660 --> 00:39:05,240
Then you go to state
2 at time 4.
655
00:39:05,240 --> 00:39:08,140
Then you go to state
3, time 5.
656
00:39:08,140 --> 00:39:11,010
And back to state 2 at time 6.
657
00:39:11,010 --> 00:39:13,300
You have repeated nodes there.
658
00:39:13,300 --> 00:39:17,230
You have repeated nodes
separated here.
659
00:39:17,230 --> 00:39:20,630
Another example of a
walk is 4, 1, 2, 3.
660
00:39:20,630 --> 00:39:24,120
An example of a path: a path
can't have any repeated nodes.
661
00:39:24,120 --> 00:39:27,060
We'd like to look at paths,
because if you're going to be
662
00:39:27,060 --> 00:39:30,280
able to get from one node to
another node, and there's some
663
00:39:30,280 --> 00:39:33,420
walk that goes all around the
place and gets to that final
664
00:39:33,420 --> 00:39:36,770
node, there's also path
that goes there.
665
00:39:36,770 --> 00:39:39,900
If you look at the walk, you
just leave out all the cycles
666
00:39:39,900 --> 00:39:42,570
along the way, and
you get to the end.
667
00:39:42,570 --> 00:39:45,980
And a cycle, of course, which I
didn't define, is something
668
00:39:45,980 --> 00:39:49,820
which starts at one node, goes
through a path, and then
669
00:39:49,820 --> 00:39:52,730
finally comes back to the same
node that it started at.
670
00:39:52,730 --> 00:39:56,800
And it doesn't make any
difference for the cycle 2, 3,
671
00:39:56,800 --> 00:40:01,610
2 whether you call it
2, 3, 2 or 3, 2, 3.
672
00:40:01,610 --> 00:40:04,390
That's the same cycle, and
it's not even worth
673
00:40:04,390 --> 00:40:07,200
distinguishing between
those two ideas.
674
00:40:07,200 --> 00:40:12,723
OK. That's that.
675
00:40:15,360 --> 00:40:20,010
If there's a path from--
676
00:40:20,010 --> 00:40:21,260
where did I--
677
00:40:26,110 --> 00:40:31,800
node j is accessible from i,
which we abbreviate as i
678
00:40:31,800 --> 00:40:33,680
has a path to j.
679
00:40:33,680 --> 00:40:38,010
If there's a walk from i to
j, which means that p
680
00:40:38,010 --> 00:40:40,650
sub i j to the n--
681
00:40:40,650 --> 00:40:44,150
this is the transition
probability, the probability
682
00:40:44,150 --> 00:40:49,160
that x sub n is equal to
j, given that x sub
683
00:40:49,160 --> 00:40:50,710
0 is equal to i.
684
00:40:50,710 --> 00:40:53,380
And we use this all the time.
685
00:40:53,380 --> 00:40:57,370
If this is greater than zero
for some n greater than 0.
686
00:40:57,370 --> 00:41:06,950
In other words, j is accessible
from i if there's a
687
00:41:06,950 --> 00:41:09,240
path from i that goes to j.
688
00:41:12,300 --> 00:41:17,170
And trivially, if there's a path from i to j, and
there's a path from j to
689
00:41:17,170 --> 00:41:21,520
k, then there has to be
a path from i to k.
690
00:41:21,520 --> 00:41:25,730
If you've ever tried to make up
a mapping program to find
691
00:41:25,730 --> 00:41:28,910
how to get from here to there,
this is one of the most useful
692
00:41:28,910 --> 00:41:29,740
things you use.
693
00:41:29,740 --> 00:41:32,320
If there's a way to get here
to there, and a way to get
694
00:41:32,320 --> 00:41:35,330
from here to there, then there's
a way to get from here
695
00:41:35,330 --> 00:41:37,560
all the way to the end.
696
00:41:37,560 --> 00:41:42,650
And if you look up what most of
these map programs do, you
697
00:41:42,650 --> 00:41:47,040
see that they overuse this
enormously and they wind up
698
00:41:47,040 --> 00:41:50,910
taking you from here to there
by some bizarre path just
699
00:41:50,910 --> 00:41:53,880
because it happens to go through
some intermediate node
700
00:41:53,880 --> 00:41:55,460
on the way.
701
00:41:55,460 --> 00:41:58,680
So two nodes communicate--
702
00:41:58,680 --> 00:42:01,890
i double arrow j--
703
00:42:01,890 --> 00:42:08,860
if j is accessible from i, and
if i is accessible from j.
704
00:42:08,860 --> 00:42:12,450
That means there's a path from
i to j, and another path from
705
00:42:12,450 --> 00:42:16,260
j back to i, if you shorten
them as much as you can.
706
00:42:16,260 --> 00:42:17,040
There's a cycle.
707
00:42:17,040 --> 00:42:23,530
It starts at i, goes through j,
and comes back to i again.
708
00:42:23,530 --> 00:42:29,810
I didn't say that quite right,
so delete that from what
709
00:42:29,810 --> 00:42:31,200
you've just heard.
710
00:42:31,200 --> 00:42:35,630
A class C of states is a
non-empty set, such that i and
711
00:42:35,630 --> 00:42:40,370
j communicate for each
i j in this class.
712
00:42:40,370 --> 00:42:45,330
But i does not communicate
with j for each i in C--
713
00:42:49,420 --> 00:42:53,210
for i in C and j not in C.
714
00:42:53,210 --> 00:42:55,870
The convenient way to think
about this-- and I should have
715
00:42:55,870 --> 00:42:59,670
stated this as a theorem in
the notes, because it's--
716
00:43:03,990 --> 00:43:06,130
I think it's something that
we all use without even
717
00:43:06,130 --> 00:43:07,750
thinking about it.
718
00:43:07,750 --> 00:43:12,480
It says that the entire set of
states, or the entire set of
719
00:43:12,480 --> 00:43:16,500
nodes in a graph, is partitioned
into classes.
720
00:43:16,500 --> 00:43:22,860
The class C containing i is i
in union with all of the j's
721
00:43:22,860 --> 00:43:24,110
that communicate with i.
722
00:43:24,110 --> 00:43:27,580
So if you want to find this
partition, you start out with
723
00:43:27,580 --> 00:43:31,280
an arbitrary node, you find all
of the other nodes that it
724
00:43:31,280 --> 00:43:34,590
communicates with, and you
find them by picking
725
00:43:34,590 --> 00:43:36,320
them one at a time.
726
00:43:36,320 --> 00:43:41,050
You pick all of the nodes
for which p sub i j is
727
00:43:41,050 --> 00:43:42,540
greater than 0.
728
00:43:42,540 --> 00:43:44,100
Then you pick--
729
00:43:44,100 --> 00:43:46,530
and p sub j i is great--
730
00:43:46,530 --> 00:43:47,780
well-- blah.
731
00:43:50,030 --> 00:43:55,400
If you want to find the set of
nodes that are accessible from
732
00:43:55,400 --> 00:43:57,640
i, you start out looking at i.
733
00:43:57,640 --> 00:44:00,640
You look at all the states
which are accessible
734
00:44:00,640 --> 00:44:03,300
from i in one step.
735
00:44:03,300 --> 00:44:06,870
Then you look at all the steps,
all of the states,
736
00:44:06,870 --> 00:44:09,380
which you can access from
any one of those.
737
00:44:09,380 --> 00:44:12,720
Those are the states which are
accessible in two states--
738
00:44:12,720 --> 00:44:16,150
in two steps, then in three
steps, and so forth.
739
00:44:16,150 --> 00:44:21,380
So you find all the nodes that
are accessible from node i.
740
00:44:21,380 --> 00:44:24,640
And then you turn around and
do it the other way.
741
00:44:24,640 --> 00:44:29,600
And presto, you have all of
these classes of states all
742
00:44:29,600 --> 00:44:30,910
very simply.
743
00:44:30,910 --> 00:44:34,990
For a finite-state chain, the
state i is transient if
744
00:44:34,990 --> 00:44:40,200
there's a j in S such that
i goes into j, but j
745
00:44:40,200 --> 00:44:41,420
does not go into i.
746
00:44:41,420 --> 00:44:46,900
In other words, if I'm a state
i, and I can get to you, but
747
00:44:46,900 --> 00:44:55,450
you can't get back to me,
then I'm transient.
748
00:44:55,450 --> 00:45:01,600
Because the way Markov chains
work, we keep going from one
749
00:45:01,600 --> 00:45:04,720
step to the next step to the
next step to the next step.
750
00:45:04,720 --> 00:45:09,710
And if I keep returning to
myself, then eventually I'm
751
00:45:09,710 --> 00:45:11,010
going to go to you.
752
00:45:11,010 --> 00:45:14,040
And once I go to you, I'll
never get back again.
753
00:45:14,040 --> 00:45:18,540
So because of that, these
transient states are states
754
00:45:18,540 --> 00:45:21,450
where eventually you
leave them and you
755
00:45:21,450 --> 00:45:23,160
never get back again.
756
00:45:23,160 --> 00:45:26,190
As soon as we start talking
about countable state Markov
757
00:45:26,190 --> 00:45:28,270
chains, you'll see that
this definition
758
00:45:28,270 --> 00:45:30,250
doesn't work anymore.
759
00:45:30,250 --> 00:45:32,620
You can--
760
00:45:32,620 --> 00:45:36,520
it is very possible to just
wander away in a countable
761
00:45:36,520 --> 00:45:40,390
state Markov chain, and you
never get back again that way.
762
00:45:40,390 --> 00:45:43,640
After you wander away too far,
the probability of getting
763
00:45:43,640 --> 00:45:45,540
back gets smaller and smaller.
764
00:45:45,540 --> 00:45:47,830
You keep getting further
and further away.
765
00:45:47,830 --> 00:45:52,810
The probability of returning
gets smaller and smaller, so
766
00:45:52,810 --> 00:45:56,360
that you have transience
that way also.
767
00:45:56,360 --> 00:45:59,470
But here, the situation is
simpler for a finite-state
768
00:45:59,470 --> 00:46:01,030
Markov chain.
769
00:46:01,030 --> 00:46:05,570
And you can define transience if
there's a j in S such that
770
00:46:05,570 --> 00:46:09,440
i goes into j, but j
doesn't go into i.
771
00:46:09,440 --> 00:46:13,160
If i's not transient,
then it's recurrent.
772
00:46:13,160 --> 00:46:16,240
Usually you define recurrence
first and transience later,
773
00:46:16,240 --> 00:46:19,470
but it's a little simpler
this way.
774
00:46:19,470 --> 00:46:22,310
All states in a class are
transient, or all are
775
00:46:22,310 --> 00:46:26,330
recurrent, and a finite-state
Markov chain contains at least
776
00:46:26,330 --> 00:46:27,990
one recurrent class.
777
00:46:27,990 --> 00:46:29,770
You did that in your homework.
778
00:46:29,770 --> 00:46:33,040
And you were surprised at how
complicated it was to do it.
779
00:46:33,040 --> 00:46:36,350
I hope that after you wrote
down a proof of this, you
780
00:46:36,350 --> 00:46:41,800
stopped and thought about what
you were actually proving,
781
00:46:41,800 --> 00:46:46,030
which intuitively is something
very, very simple.
782
00:46:46,030 --> 00:46:48,960
It's just looking at all of
the transient classes.
783
00:46:48,960 --> 00:46:51,480
Starting at one transient
class, you
784
00:46:51,480 --> 00:46:54,950
find if there's another--
785
00:46:54,950 --> 00:46:59,190
if there's another state you can
get to from i which is
786
00:46:59,190 --> 00:47:02,170
also transient, and then you
find if there's another state
787
00:47:02,170 --> 00:47:04,910
you get to from there which
is also transient.
788
00:47:04,910 --> 00:47:08,500
And eventually, you have to come
to a state from which you
789
00:47:08,500 --> 00:47:13,325
can't go to some other state,
from which you can't get back.
790
00:47:17,350 --> 00:47:20,410
That was explaining it almost
as badly as the problem
791
00:47:20,410 --> 00:47:22,120
statement explained it.
792
00:47:22,120 --> 00:47:25,460
And I hope that after you did
the problem, even if you can't
793
00:47:25,460 --> 00:47:27,910
explain it to someone,
you have an
794
00:47:27,910 --> 00:47:30,430
understanding of why it's true.
795
00:47:30,430 --> 00:47:34,920
It shouldn't be surprising
after you do that.
796
00:47:34,920 --> 00:47:38,950
So the finite-state Markov chain
contains at least one
797
00:47:38,950 --> 00:47:40,200
recurrent class.
798
00:47:42,800 --> 00:47:46,720
OK, the period of a state
i is the greatest common
799
00:47:46,720 --> 00:47:51,730
divisor of the n such that
P sub i i to the n is greater than 0.
800
00:47:51,730 --> 00:47:54,580
Again, a very complicated
definition for a
801
00:47:54,580 --> 00:47:56,280
simple kind of idea.
802
00:47:56,280 --> 00:47:58,670
Namely, you start out
in a state i.
803
00:47:58,670 --> 00:48:02,440
You look at all of the times at
which you can get back to
804
00:48:02,440 --> 00:48:03,940
state i again.
805
00:48:03,940 --> 00:48:08,780
If you find that that set of
times has a period in it,
806
00:48:08,780 --> 00:48:19,550
namely, if every return time
is a multiple of some
807
00:48:19,550 --> 00:48:25,410
d, then you know that the state
is periodic if d is
808
00:48:25,410 --> 00:48:26,720
greater than 1.
809
00:48:26,720 --> 00:48:30,060
And what you have to do is to
find the largest such number.
810
00:48:30,060 --> 00:48:32,040
And that's the period
of the state.
811
00:48:32,040 --> 00:48:35,170
All states in the same class
have the same period.
812
00:48:35,170 --> 00:48:38,690
A recurrent class with period
d greater than 1 can be
813
00:48:38,690 --> 00:48:40,550
partitioned into sub-classes--
814
00:48:40,550 --> 00:48:42,640
this is the best way
of looking at
815
00:48:42,640 --> 00:48:45,820
periodic classes of states.
816
00:48:45,820 --> 00:48:49,780
If you have a periodic class of
states, then you can always
817
00:48:49,780 --> 00:48:53,960
separate it into
d sub-classes.
818
00:48:53,960 --> 00:48:59,300
And in such a set of
sub-classes, transitions from
819
00:48:59,300 --> 00:49:03,770
the states in
S1 only go to S2.
820
00:49:03,770 --> 00:49:07,710
Transitions from states
in S2 only go to S3.
821
00:49:07,710 --> 00:49:12,430
Up to transitions from S sub
d, which only go back to S1.
822
00:49:12,430 --> 00:49:16,050
They have to go someplace,
so they go back to S1.
823
00:49:16,050 --> 00:49:22,500
So as you cycle around, it takes
d steps to cycle from 1
824
00:49:22,500 --> 00:49:24,000
back to 1 again.
825
00:49:24,000 --> 00:49:28,410
It takes d steps to cycle
from 2 back to 2 again.
826
00:49:28,410 --> 00:49:31,300
So you can see the structure of
the Markov chain and why,
827
00:49:31,300 --> 00:49:34,810
in fact, it does have to be--
828
00:49:34,810 --> 00:49:38,480
why that class has
to be periodic.
829
00:49:38,480 --> 00:49:41,870
An ergodic class is a recurrent
aperiodic class.
830
00:49:41,870 --> 00:49:44,760
In other words, it's a class
where the period is equal to
831
00:49:44,760 --> 00:49:48,450
1, which means there really
isn't any period.
832
00:49:48,450 --> 00:49:52,550
A Markov chain with only one
class is ergodic if the class
833
00:49:52,550 --> 00:49:54,640
is ergodic.
834
00:49:54,640 --> 00:49:56,880
And the big theorem here--
835
00:49:56,880 --> 00:49:59,670
I mean, this is probably the
most important theorem about
836
00:49:59,670 --> 00:50:01,820
finite-state Markov chains.
837
00:50:01,820 --> 00:50:05,100
You have an ergodic,
finite-state Markov chain.
838
00:50:05,100 --> 00:50:12,300
Then the limit as n goes to
infinity of the probability of
839
00:50:12,300 --> 00:50:16,700
arriving in state j after n
steps, given that you started
840
00:50:16,700 --> 00:50:20,780
in state i, is just some
function of j.
841
00:50:20,780 --> 00:50:24,400
In other words, when n gets very
large, it doesn't depend
842
00:50:24,400 --> 00:50:27,370
on how large n is.
843
00:50:27,370 --> 00:50:28,480
It stays the same.
844
00:50:28,480 --> 00:50:30,570
It becomes independent of n.
845
00:50:30,570 --> 00:50:32,450
It doesn't depend on
where you started.
846
00:50:32,450 --> 00:50:34,860
No matter where you start
in a finite-state
847
00:50:34,860 --> 00:50:36,570
ergodic Markov chain.
848
00:50:36,570 --> 00:50:40,580
After a very long time, the
probability of being in a
849
00:50:40,580 --> 00:50:44,620
state j is independent of where
you started, and it's
850
00:50:44,620 --> 00:50:48,170
independent of how long
you've been running.
851
00:50:48,170 --> 00:50:52,200
So that's a very strong
kind of--
852
00:50:52,200 --> 00:50:54,890
it's a very strong kind
of limit theorem.
853
00:50:54,890 --> 00:50:58,690
It's very much like the law of
large numbers and all of these
854
00:50:58,690 --> 00:51:00,030
other things.
855
00:51:00,030 --> 00:51:03,120
I'm going to talk a little bit
at the end about what that
856
00:51:03,120 --> 00:51:04,820
relationship really is.
857
00:51:07,360 --> 00:51:10,850
Except what it says is, after a
long time, you're in steady
858
00:51:10,850 --> 00:51:12,670
state, which is why
it's called the
859
00:51:12,670 --> 00:51:13,760
steady state theorem.
860
00:51:13,760 --> 00:51:14,440
Yes?
861
00:51:14,440 --> 00:51:17,386
AUDIENCE: Could you define the
steady states for periodic
862
00:51:17,386 --> 00:51:18,636
chains [INAUDIBLE]?
863
00:51:21,320 --> 00:51:26,460
PROFESSOR: I try to avoid doing
that because you have
864
00:51:26,460 --> 00:51:28,650
steady state probabilities.
865
00:51:28,650 --> 00:51:31,810
The steady state probabilities
that you have are, you take--
866
00:51:34,990 --> 00:51:38,760
is if you have these
sub-classes.
867
00:51:38,760 --> 00:51:42,690
Then you wind up with a steady
state within each sub-class.
868
00:51:42,690 --> 00:51:46,900
If you assign to each state the
steady state probability within the
869
00:51:46,900 --> 00:51:51,870
sub-class, divided by d, then
you get what is the steady
870
00:51:51,870 --> 00:51:52,930
state probability.
871
00:51:52,930 --> 00:51:56,870
If you start out in that steady
state, then you're in
872
00:51:56,870 --> 00:52:00,130
each sub-class with probability
1 over d.
873
00:52:00,130 --> 00:52:04,230
And you shift to the next
sub-class and you're still in
874
00:52:04,230 --> 00:52:08,340
steady state, because you have
a probability, 1 over d, of
875
00:52:08,340 --> 00:52:12,230
being in each of those
sub-classes to start with.
876
00:52:12,230 --> 00:52:16,970
You shift and you're still in
one of the sub-classes with
877
00:52:16,970 --> 00:52:19,130
probability 1 over d.
878
00:52:19,130 --> 00:52:22,690
So there still is a steady
state in that sense, but
879
00:52:22,690 --> 00:52:24,830
there's not a steady state
in any nice sense.
880
00:52:31,940 --> 00:52:39,470
So anyway, that's
the way it is.
881
00:52:39,470 --> 00:52:44,860
But you see, if you understand
this theorem for ergodic
882
00:52:44,860 --> 00:52:48,550
finite-state Markov
chains, and then you
883
00:52:48,550 --> 00:52:52,540
understand about periodic
chains and this set of
884
00:52:52,540 --> 00:52:56,070
sub-classes, you can
see within each
885
00:52:56,070 --> 00:52:59,450
sub-class, if you look at--
886
00:52:59,450 --> 00:53:00,700
if you look at--
887
00:53:04,440 --> 00:53:11,500
if you look at time 0, time d,
time 2d, time 3d and 4d, then
888
00:53:11,500 --> 00:53:14,470
whatever state you start in,
you're going to be in the same
889
00:53:14,470 --> 00:53:19,380
class after d steps, the same
class after 2d steps.
890
00:53:19,380 --> 00:53:21,480
You're going to have
a transition
891
00:53:21,480 --> 00:53:24,280
matrix over d steps.
892
00:53:24,280 --> 00:53:27,360
And this theorem still applies
to these sub-classes over
893
00:53:27,360 --> 00:53:29,200
periods of d.
894
00:53:29,200 --> 00:53:32,030
So the hard part of it
is proving this.
895
00:53:32,030 --> 00:53:35,180
After you prove this, then you
see that the same thing
896
00:53:35,180 --> 00:53:38,200
happens over each sub-class
after that.
897
00:53:43,650 --> 00:53:45,290
That's a pretty major theorem.
898
00:53:45,290 --> 00:53:46,990
It's difficult to prove.
899
00:53:46,990 --> 00:53:50,890
A sub-step is to show that for
an ergodic M state Markov
900
00:53:50,890 --> 00:53:56,380
chain, the probability of being
in state j at time n,
901
00:53:56,380 --> 00:54:00,930
given that you're in state i at
time 0, is positive for all
902
00:54:00,930 --> 00:54:05,870
i j, and all n greater than or
equal to M minus 1 squared plus 1.
903
00:54:05,870 --> 00:54:10,900
It's very surprising that you
have to go this many states--
904
00:54:10,900 --> 00:54:14,980
this many steps before you get
to the point that all these
905
00:54:14,980 --> 00:54:18,440
transition probabilities
are positive.
906
00:54:18,440 --> 00:54:22,450
You look at this particular kind
of Markov chain in the
907
00:54:22,450 --> 00:54:26,660
homework, and I hope what you
found out from it was that if
908
00:54:26,660 --> 00:54:32,040
you start, say, in state two,
then at the next time, you
909
00:54:32,040 --> 00:54:33,640
have to be in 3.
910
00:54:33,640 --> 00:54:37,020
Next time, you have to be in
4, you have to be in 5, you
911
00:54:37,020 --> 00:54:38,560
have to be in 6.
912
00:54:38,560 --> 00:54:41,300
In other words, the size of
the set that you can be in
913
00:54:41,300 --> 00:54:46,550
after one step is just 1.
914
00:54:46,550 --> 00:54:51,170
One possible state here, one
possible state here, one
915
00:54:51,170 --> 00:54:52,640
possible state here.
916
00:54:52,640 --> 00:54:57,250
The next step, you're in either
1 or 2, and as you
917
00:54:57,250 --> 00:55:01,600
travel around, the size of the
set of states you can be in at
918
00:55:01,600 --> 00:55:06,510
these different steps, is 2,
until you get all the way
919
00:55:06,510 --> 00:55:07,510
around again.
920
00:55:07,510 --> 00:55:09,800
And then there's
a way to get--
921
00:55:09,800 --> 00:55:15,050
when you get to state 6 again,
the set of states enlarges.
922
00:55:15,050 --> 00:55:18,970
So finally you get up to a
set of states, which is
923
00:55:18,970 --> 00:55:20,800
up to M minus 1.
924
00:55:20,800 --> 00:55:25,630
And that's why you get the M
minus 1 squared here, plus 1.
925
00:55:25,630 --> 00:55:28,710
And this is the only Markov
chain that behaves this way.
926
00:55:28,710 --> 00:55:31,850
You can have as many
states going around
927
00:55:31,850 --> 00:55:33,770
here as you want to.
928
00:55:33,770 --> 00:55:36,020
But you have to have this
structure at the end, where
929
00:55:36,020 --> 00:55:39,930
there's one special state and
one way of circumventing it,
930
00:55:39,930 --> 00:55:43,930
which means there's one cycle
of size M minus 1, and one
931
00:55:43,930 --> 00:55:48,440
cycle of size M. And that's the
only way you can get it.
932
00:55:48,440 --> 00:55:52,780
And that's the only Markov chain
that meets this bound
933
00:55:52,780 --> 00:55:53,640
with equality.
934
00:55:53,640 --> 00:56:01,470
In all other cases, you get this
property much earlier.
935
00:56:01,470 --> 00:56:05,200
And often, you get it after just
a linear amount of time.
936
00:56:09,360 --> 00:56:13,350
The other part of this major
theorem that you reach steady
937
00:56:13,350 --> 00:56:17,350
state says, let P be
greater than 0.
938
00:56:17,350 --> 00:56:19,150
In other words, let
all the transition
939
00:56:19,150 --> 00:56:22,410
probabilities be positive.
940
00:56:22,410 --> 00:56:28,040
And then define some quantity
alpha as the minimum of the
941
00:56:28,040 --> 00:56:30,160
transition probabilities.
942
00:56:30,160 --> 00:56:34,110
And then the theorem says, for
all states j and all n greater
943
00:56:34,110 --> 00:56:38,470
than or equal to 1, the maximum
over the initial
944
00:56:38,470 --> 00:56:43,180
states minus the minimum over
the initial states of P sub i
945
00:56:43,180 --> 00:56:49,040
j at the n plus 1st step,
that difference is less than
946
00:56:49,040 --> 00:56:52,470
or equal to the difference
at the n-th step,
947
00:56:52,470 --> 00:56:54,300
times 1 minus 2 alpha.
948
00:56:54,300 --> 00:56:58,970
Now 1 minus 2 alpha is
a positive number less than 1.
949
00:56:58,970 --> 00:57:03,700
And this says that this maximum
minus minimum is 1
950
00:57:03,700 --> 00:57:07,860
minus 2 alpha to the n, which
says that the limit of the
951
00:57:07,860 --> 00:57:11,220
maximizing term is equal
to the limit of
952
00:57:11,220 --> 00:57:12,640
the minimizing term.
953
00:57:12,640 --> 00:57:13,850
And what does that say?
954
00:57:13,850 --> 00:57:18,740
It says that everything in the
middle gets squeezed together.
955
00:57:18,740 --> 00:57:24,200
And it says exactly what we want
it to say, that the limit
956
00:57:24,200 --> 00:57:30,380
of P sub l j to the n is
independent of l, after n gets
957
00:57:30,380 --> 00:57:31,310
very large.
958
00:57:31,310 --> 00:57:34,090
Because the maximum and
the minimum get very
959
00:57:34,090 --> 00:57:37,560
close to each other.
960
00:57:37,560 --> 00:57:40,170
We also showed that it
approaches that limit
961
00:57:40,170 --> 00:57:41,780
exponentially.
962
00:57:41,780 --> 00:57:43,640
That's what this says.
963
00:57:43,640 --> 00:57:49,860
The exponent here is just this
alpha, determined in that way.
964
00:57:49,860 --> 00:57:54,630
And the theorem for ergodic
Markov chains then follows by
965
00:57:54,630 --> 00:58:01,380
just looking at successive h
steps in the Markov chain when
966
00:58:01,380 --> 00:58:06,110
h is large enough so that all
these transition probabilities
967
00:58:06,110 --> 00:58:07,360
are positive.
968
00:58:09,300 --> 00:58:12,220
So you go out far enough
that all the transition
969
00:58:12,220 --> 00:58:13,860
probabilities are positive.
970
00:58:13,860 --> 00:58:16,980
And then you look at repetitions
of that, and apply
971
00:58:16,980 --> 00:58:18,230
this theorem.
972
00:58:18,230 --> 00:58:21,570
And suddenly you have this
general theorem,
973
00:58:21,570 --> 00:58:22,900
which is what we wanted.
974
00:58:27,200 --> 00:58:30,530
An ergodic unichain is a Markov
chain with one
975
00:58:30,530 --> 00:58:33,870
ergodic recurrent class,
plus perhaps a set
976
00:58:33,870 --> 00:58:36,550
of transient states.
977
00:58:36,550 --> 00:58:39,600
And most of the things we talk
about in this course are for
978
00:58:39,600 --> 00:58:45,870
unichains, usually ergodic
unichains, because if you have
979
00:58:45,870 --> 00:58:49,160
multiple recurrent classes,
it just makes a mess.
980
00:58:49,160 --> 00:58:51,780
You wind up in this recurrent
class, or
981
00:58:51,780 --> 00:58:53,950
this recurrent class.
982
00:58:53,950 --> 00:59:00,080
And aside from the question of
which one you get to, you
983
00:59:00,080 --> 00:59:01,730
don't much care about it.
984
00:59:01,730 --> 00:59:05,790
And the theorem here is for an
ergodic finite-state unichain.
985
00:59:05,790 --> 00:59:10,370
The limit of P sub i j to the
n-- the probability of being in
986
00:59:10,370 --> 00:59:15,130
state j at time n, given that
you're in state i at time 0,
987
00:59:15,130 --> 00:59:17,290
is equal to pi sub j.
988
00:59:17,290 --> 00:59:22,330
In other words, this limit
here exists for all i j.
989
00:59:22,330 --> 00:59:25,210
And the limit is independent
of i.
990
00:59:25,210 --> 00:59:27,900
And it's independent of n
as n gets big enough.
991
00:59:32,820 --> 00:59:42,970
And then also, we can choose
this so that this set of
992
00:59:42,970 --> 00:59:47,680
probabilities here satisfies
this, what's called the steady
993
00:59:47,680 --> 00:59:51,780
state condition, the sum
of pi i times P sub i j
994
00:59:51,780 --> 00:59:53,140
is equal to pi j.
995
00:59:53,140 --> 00:59:56,380
In other words, if you start out
in steady state, and you
996
00:59:56,380 --> 01:00:00,300
look at the probabilities of
being in the different states
997
01:00:00,300 --> 01:00:06,610
at the next time unit, this is
the probability of being in
998
01:00:06,610 --> 01:00:11,610
state j at time n plus 1, if
this is the probability of
999
01:00:11,610 --> 01:00:14,420
being in state i at time n.
1000
01:00:14,420 --> 01:00:17,790
So that condition
gets satisfied.
1001
01:00:17,790 --> 01:00:19,280
That condition is satisfied.
1002
01:00:19,280 --> 01:00:22,760
You just stay in steady
state forever.
1003
01:00:22,760 --> 01:00:29,210
And pi i has to be positive for
a recurrent i, and pi i is
1004
01:00:29,210 --> 01:00:31,680
equal to 0 otherwise.
1005
01:00:31,680 --> 01:00:35,230
So this is just a
generalization
1006
01:00:35,230 --> 01:00:38,090
of the ergodic theorem.
1007
01:00:38,090 --> 01:00:43,400
And this is not what people
refer to as the ergodic
1008
01:00:43,400 --> 01:00:48,160
theorem, which is a much more
general theorem than this.
1009
01:00:48,160 --> 01:00:50,900
This is the ergodic theorem for
the case of finite-state
1010
01:00:50,900 --> 01:00:53,110
Markov chains.
1011
01:00:53,110 --> 01:00:59,190
You can restate this in matrix
form as the limit of the
1012
01:00:59,190 --> 01:01:02,900
matrix P to the n-th power.
1013
01:01:02,900 --> 01:01:06,680
What I didn't mention here and
what I probably didn't mention
1014
01:01:06,680 --> 01:01:11,880
enough in the notes is
that P sub i j to the n--
1015
01:01:32,360 --> 01:01:47,560
but also, if you take the matrix
P times P times P, n
1016
01:01:47,560 --> 01:01:53,880
times, namely, you take the
matrix, P to the n.
1017
01:01:53,880 --> 01:02:00,720
This says that P sub i j to the n
is the i j element of P to the n.
1018
01:02:09,900 --> 01:02:12,530
I'm sure all of you know that
by now, because you've been
1019
01:02:12,530 --> 01:02:15,310
using it all the time.
1020
01:02:15,310 --> 01:02:18,820
And what this says here--
1021
01:02:18,820 --> 01:02:26,150
what we've said before is that
every row of this matrix, P to
1022
01:02:26,150 --> 01:02:28,600
the n, is the same.
1023
01:02:28,600 --> 01:02:31,290
Every row is equal to pi.
1024
01:02:31,290 --> 01:02:47,786
P to the n tends to a matrix
which is pi 1, pi 2,
1025
01:02:47,786 --> 01:02:52,120
up to pi sub M.
1026
01:02:52,120 --> 01:02:57,000
Pi 1, pi 2, up to pi sub M.
1027
01:03:00,760 --> 01:03:06,770
Pi 1, pi 2, up to pi sub M.
1028
01:03:06,770 --> 01:03:14,660
And the easiest way to express
this is the vector e times pi,
1029
01:03:14,660 --> 01:03:24,960
where e is a column vector of 1's.
1030
01:03:24,960 --> 01:03:32,755
In other words, if you take a
column matrix, column 1, 1, 1,
1031
01:03:32,755 --> 01:03:40,670
1, 1, and you multiply this by
a row vector, pi 1 up to pi
1032
01:03:40,670 --> 01:03:48,030
sub M, what you get is, for this
first row multiplied by
1033
01:03:48,030 --> 01:03:51,210
this, this gives you--
1034
01:03:51,210 --> 01:03:53,480
well, in fact, if you
multiply this out,
1035
01:03:53,480 --> 01:03:56,360
this is what you get.
1036
01:03:56,360 --> 01:03:58,650
And if you've never gone through
the trouble of seeing
1037
01:03:58,650 --> 01:04:03,880
that this multiplication leads
to this, please do it, because
1038
01:04:03,880 --> 01:04:07,170
it's important to notice
that correspondence.
1039
01:04:14,530 --> 01:04:18,080
We got specific results by
looking at the eigenvalues and
1040
01:04:18,080 --> 01:04:20,880
eigenvectors of stochastic
matrices.
1041
01:04:20,880 --> 01:04:24,720
And a stochastic matrix is the
transition matrix of a Markov chain.
1042
01:04:28,500 --> 01:04:31,290
So some of these things
are sort of obvious.
1043
01:04:31,290 --> 01:04:36,870
Lambda is an eigenvalue of P, if
and only if P minus lambda
1044
01:04:36,870 --> 01:04:38,120
I is singular.
1045
01:04:41,670 --> 01:04:45,040
This set of relationships
is not obvious.
1046
01:04:45,040 --> 01:04:48,130
This is obvious linear
algebra.
1047
01:04:48,130 --> 01:04:51,250
This is something that when
you study eigenvalues and
1048
01:04:51,250 --> 01:04:55,430
eigenvectors in linear algebra,
you recognize that
1049
01:04:55,430 --> 01:04:57,270
this is a summary of
a lot of things.
1050
01:04:57,270 --> 01:05:01,440
If and only if this determinant
is equal to 0,
1051
01:05:01,440 --> 01:05:05,650
which is true if and only if
there's some vector nu for
1052
01:05:05,650 --> 01:05:12,560
which P times nu equals lambda
times nu for nu unequal to 0.
1053
01:05:12,560 --> 01:05:16,920
And if and only if pi P equals
lambda pi for some
1054
01:05:16,920 --> 01:05:18,210
pi unequal to 0.
1055
01:05:18,210 --> 01:05:23,250
In other words, if this
determinant is equal to 0, it
1056
01:05:23,250 --> 01:05:32,040
means that the matrix P minus
lambda I is singular.
1057
01:05:32,040 --> 01:05:35,950
If the matrix is singular, there
has to be some solution
1058
01:05:35,950 --> 01:05:38,370
to this equation here.
1059
01:05:38,370 --> 01:05:40,220
There has to be some
solution to this
1060
01:05:40,220 --> 01:05:44,530
left eigenvector equation.
1061
01:05:44,530 --> 01:05:48,740
Now, once you see this, you
notice that e is always a
1062
01:05:48,740 --> 01:05:53,750
right eigenvector of P. Every
stochastic matrix in the world
1063
01:05:53,750 --> 01:05:58,920
has the property that e is a
right eigenvector of it.
1064
01:05:58,920 --> 01:05:59,800
Why is that?
1065
01:05:59,800 --> 01:06:05,230
Because all of the rows of a
stochastic matrix sum to 1.
1066
01:06:05,230 --> 01:06:10,070
If you start off in state i, the
sum of the probabilities of the states
1067
01:06:10,070 --> 01:06:14,530
you can be at in the next
step is equal to 1.
1068
01:06:14,530 --> 01:06:17,120
You have to go somewhere.
1069
01:06:17,120 --> 01:06:21,650
So e is always a right
eigenvector of P with
1070
01:06:21,650 --> 01:06:23,300
eigenvalue 1.
1071
01:06:23,300 --> 01:06:26,510
Since e is also a right
eigenvector of P with
1072
01:06:26,510 --> 01:06:29,850
eigenvalue 1, we go up here.
1073
01:06:29,850 --> 01:06:32,460
We look at these if and
only if statements.
1074
01:06:32,460 --> 01:06:34,890
We see, then, P minus I must
be singular.
1075
01:06:34,890 --> 01:06:38,410
And then pi times P
equals lambda pi.
1076
01:06:38,410 --> 01:06:41,410
So no matter how many recurrent
classes we have, no
1077
01:06:41,410 --> 01:06:46,430
matter what periodicity we have
in each of them, there's
1078
01:06:46,430 --> 01:06:53,170
always a solution to pi
times P equals pi.
1079
01:06:53,170 --> 01:06:55,550
There's always at least one
steady state vector.
1080
01:06:59,320 --> 01:07:03,580
This determinant is an M-th
degree polynomial in lambda.
1081
01:07:03,580 --> 01:07:08,150
M-th degree polynomials
have M roots.
1082
01:07:08,150 --> 01:07:10,400
They aren't necessarily
distinct.
1083
01:07:10,400 --> 01:07:14,040
The multiplicity of an
eigenvalue is the number of roots
1084
01:07:14,040 --> 01:07:15,500
of that value.
1085
01:07:15,500 --> 01:07:19,780
And the multiplicity
of lambda equals 1.
1086
01:07:19,780 --> 01:07:22,530
How many different roots
are there which have
1087
01:07:22,530 --> 01:07:24,360
lambda equals 1?
1088
01:07:24,360 --> 01:07:26,940
Well it turns out to be just
the number of recurrent
1089
01:07:26,940 --> 01:07:29,550
classes that you have.
1090
01:07:29,550 --> 01:07:32,750
If you have a bunch of recurrent
classes, within each
1091
01:07:32,750 --> 01:07:37,330
recurrent class, there's a
solution to pi P equals pi,
1092
01:07:37,330 --> 01:07:41,540
which is non-zero only on
that recurrent class.
1093
01:07:41,540 --> 01:07:46,340
Namely, you take this huge
Markov chain and you say, I
1094
01:07:46,340 --> 01:07:48,650
don't care about any
of this except this
1095
01:07:48,650 --> 01:07:50,890
one recurrent class.
1096
01:07:50,890 --> 01:07:53,990
If we look at this one recurrent
class, and solve for
1097
01:07:53,990 --> 01:07:57,500
the steady state probability in
that one recurrent class,
1098
01:07:57,500 --> 01:08:01,220
then we get an eigenvector
which is non-zero on that
1099
01:08:01,220 --> 01:08:05,990
class, 0 everywhere else, that
has an eigenvalue 1.
1100
01:08:05,990 --> 01:08:08,050
And for every other recurrent
class, we
1101
01:08:08,050 --> 01:08:10,590
get the same situation.
1102
01:08:10,590 --> 01:08:14,150
So the multiplicity of lambda
equals 1 is equal to the
1103
01:08:14,150 --> 01:08:17,260
number of recurrent classes.
1104
01:08:17,260 --> 01:08:21,950
If you didn't get that proof
on the fly, it gets
1105
01:08:21,950 --> 01:08:23,310
proved in the notes.
1106
01:08:23,310 --> 01:08:27,130
And if you don't get the proof,
just remember that
1107
01:08:27,130 --> 01:08:28,380
that's the way it is.
1108
01:08:30,859 --> 01:08:34,859
For the special case where all
M eigenvalues are distinct,
1109
01:08:34,859 --> 01:08:38,640
the right eigenvectors are
linearly independent.
1110
01:08:38,640 --> 01:08:42,620
You remember that proof we went
through that all of the
1111
01:08:42,620 --> 01:08:46,470
left eigenvectors and all the
right eigenvectors are all
1112
01:08:46,470 --> 01:08:49,870
orthonormal to each other,
or you can make them all
1113
01:08:49,870 --> 01:08:52,270
orthonormal to each other?
1114
01:08:52,270 --> 01:08:57,380
That says that if the right
eigenvectors are linearly
1115
01:08:57,380 --> 01:09:01,120
independent, you can represent
them as the columns of an
1116
01:09:01,120 --> 01:09:04,750
invertible matrix U.
Then P times U is
1117
01:09:04,750 --> 01:09:06,819
equal to U times lambda.
1118
01:09:06,819 --> 01:09:09,800
What does this equation say?
1119
01:09:09,800 --> 01:09:12,460
You split it up into a
bunch of equations.
1120
01:09:16,500 --> 01:09:46,080
P times U and we look at it as
nu 1, nu 2, up to nu sub M.
1121
01:09:46,080 --> 01:09:52,580
I guess I'd better put the
superscripts on it.
1122
01:09:56,100 --> 01:10:01,270
If I take the matrix U and just
view it as M different
1123
01:10:01,270 --> 01:10:05,190
columns, then what this
is saying is that
1124
01:10:05,190 --> 01:10:06,545
this is equal to--
1125
01:10:17,290 --> 01:10:35,540
nu 1, nu 2, up to nu M, times lambda
1, lambda 2, up to lambda M.
1126
01:10:35,540 --> 01:10:38,500
Now you multiply this out,
and what do you get?
1127
01:10:38,500 --> 01:10:41,860
You get nu 1 times lambda 1.
1128
01:10:41,860 --> 01:10:46,190
You get nu 2 times lambda 2
for the second column, nu M
1129
01:10:46,190 --> 01:10:49,820
times lambda M for the last
column, and here you get P
1130
01:10:49,820 --> 01:10:54,360
times nu 1 is equal to nu 1
times lambda 1, and so forth.
1131
01:10:54,360 --> 01:10:59,240
So all this vector equation says
is the same thing that
1132
01:10:59,240 --> 01:11:04,760
these M individual eigenvector
equations say.
1133
01:11:04,760 --> 01:11:11,160
It's just a more compact way
of saying the same thing.
1134
01:11:11,160 --> 01:11:17,300
And if these eigenvectors span
this space, then this set of
1135
01:11:17,300 --> 01:11:20,710
eigenvectors are linearly
independent of each other.
1136
01:11:20,710 --> 01:11:24,860
And when you look at the set of
them, this matrix here has
1137
01:11:24,860 --> 01:11:26,440
to have an inverse.
1138
01:11:26,440 --> 01:11:34,890
So you can also express this
as P equals this vector--
1139
01:11:34,890 --> 01:11:40,820
this matrix of right
eigenvectors times the
1140
01:11:40,820 --> 01:11:46,630
diagonal matrix lambda, times
the inverse of this matrix.
1141
01:11:46,630 --> 01:11:49,880
Matrix U to the minus 1 turns
out to have rows equal to the
1142
01:11:49,880 --> 01:11:51,730
left eigenvectors.
1143
01:11:51,730 --> 01:11:54,330
That's because these
eigenvectors--
1144
01:11:54,330 --> 01:11:57,440
that's because the right
eigenvectors and the left
1145
01:11:57,440 --> 01:12:01,270
eigenvectors are orthogonal
to each other.
1146
01:12:04,670 --> 01:12:09,690
When we then split up this
matrix into a sum of M
1147
01:12:09,690 --> 01:12:13,830
different matrices, each matrix
having only one--
1148
01:12:41,270 --> 01:12:43,460
and so forth.
1149
01:12:43,460 --> 01:12:45,710
Then what you get--
1150
01:12:45,710 --> 01:12:48,490
here's this--
1151
01:12:48,490 --> 01:12:54,730
this nice equation here, which
says that if all the
1152
01:12:54,730 --> 01:12:58,870
eigenvalues are distinct, then
you can always represent a
1153
01:12:58,870 --> 01:13:03,420
stochastic matrix as the sum of
lambda i times nu to the i
1154
01:13:03,420 --> 01:13:04,670
times pi to the i.
1155
01:13:04,670 --> 01:13:10,000
More importantly, if you take
this equation here and look at
1156
01:13:10,000 --> 01:13:14,470
P to the n, P to the n is U
times lambda times U to the
1157
01:13:14,470 --> 01:13:18,820
minus 1, times U times lambda
times U to the minus 1, blah,
1158
01:13:18,820 --> 01:13:20,270
blah, blah forever.
1159
01:13:20,270 --> 01:13:24,030
Each U to the minus 1 cancels
out with the following U. And
1160
01:13:24,030 --> 01:13:29,330
you wind up with P to the n
equals U times lambda to the
1161
01:13:29,330 --> 01:13:33,170
n, U to the minus 1.
1162
01:13:33,170 --> 01:13:40,250
Which says that P to the
n is just a sum here.
1163
01:13:40,250 --> 01:13:44,650
It's the sum of the eigenvalues
to the n-th power
1164
01:13:44,650 --> 01:13:47,320
times these pairs of
eigenvectors here.
1165
01:13:47,320 --> 01:13:51,660
So this is a general
decomposition for P to the n.
1166
01:13:51,660 --> 01:13:56,010
What we're interested in is what
happens as n gets large.
1167
01:13:56,010 --> 01:13:59,360
If we have a unichain, we
already know what happens as n
1168
01:13:59,360 --> 01:14:00,570
gets large.
1169
01:14:00,570 --> 01:14:07,110
We know that as n gets large,
we wind up with just 1 times
1170
01:14:07,110 --> 01:14:12,480
this eigenvector e times
this eigenvector pi.
1171
01:14:12,480 --> 01:14:15,760
Which says that all of the other
eigenvalues have to go
1172
01:14:15,760 --> 01:14:19,670
to 0, which says that the
magnitude of these other
1173
01:14:19,670 --> 01:14:22,200
eigenvalues is less than 1.
1174
01:14:22,200 --> 01:14:23,450
So they're all going away.
1175
01:14:26,600 --> 01:14:32,300
So the facts here are that all
eigenvalues lambda have to
1176
01:14:32,300 --> 01:14:35,310
satisfy the magnitude
of lambda is less
1177
01:14:35,310 --> 01:14:36,740
than or equal to 1.
1178
01:14:36,740 --> 01:14:39,680
That's what I just argued.
1179
01:14:39,680 --> 01:14:44,530
For each recurrent class C,
there's one lambda equals 1,
1180
01:14:44,530 --> 01:14:47,750
with a left eigenvector
equal to the steady state on
1181
01:14:47,750 --> 01:14:51,190
that recurrent class
and 0 elsewhere.
1182
01:14:51,190 --> 01:14:55,230
The right eigenvector nu
satisfies the limit as n goes
1183
01:14:55,230 --> 01:14:56,410
to infinity.
1184
01:14:56,410 --> 01:15:00,930
So the probability that x sub n
is in this recurring class,
1185
01:15:00,930 --> 01:15:04,850
given that x sub 0 is equal
to i, is equal to the i-th
1186
01:15:04,850 --> 01:15:08,700
component of that right
eigenvector.
1187
01:15:08,700 --> 01:15:13,200
In other words, if you have a
Markov chain which has several
1188
01:15:13,200 --> 01:15:16,480
recurrent classes, and you
want to find out what the
1189
01:15:16,480 --> 01:15:23,630
probability is, starting in a
transient state, of going
1190
01:15:23,630 --> 01:15:29,170
to one of those classes, this is
what tells you that answer.
1191
01:15:29,170 --> 01:15:33,510
This says that the probability
that you go to a particular
1192
01:15:33,510 --> 01:15:37,530
recurrent class C, given that
you start off in a particular
1193
01:15:37,530 --> 01:15:41,340
transient state i, is whatever
that right eigenvector
1194
01:15:41,340 --> 01:15:42,690
turns out to be.
1195
01:15:42,690 --> 01:15:46,170
And you can solve that right
eigenvector problem just as an
1196
01:15:46,170 --> 01:15:48,920
M by M set of linear
equations.
1197
01:15:48,920 --> 01:15:51,170
So you can find the
probabilities of going to the class from
1198
01:15:51,170 --> 01:15:56,370
each transient state just by
solving that set of linear
1199
01:15:56,370 --> 01:16:01,650
equations and finding those
eigenvector equations.
1200
01:16:01,650 --> 01:16:05,770
For each recurrent periodic
class of period d, there are d
1201
01:16:05,770 --> 01:16:09,140
eigenvalues equally spaced
on the unit circle.
1202
01:16:09,140 --> 01:16:13,330
There are no other eigenvalues
with lambda equals 1-- with a
1203
01:16:13,330 --> 01:16:15,080
magnitude of lambda equals 1.
1204
01:16:15,080 --> 01:16:19,070
In other words, for each
recurrent class, you get one
1205
01:16:19,070 --> 01:16:20,700
eigenvalue that's equal to 1.
1206
01:16:20,700 --> 01:16:25,260
If that recurrent class is
periodic, you get a bunch of
1207
01:16:25,260 --> 01:16:30,640
other eigenvalues put around
the unit circle.
1208
01:16:30,640 --> 01:16:35,380
And those are all the
eigenvalues there are.
1209
01:16:35,380 --> 01:16:36,296
Oh my God.
1210
01:16:36,296 --> 01:16:38,000
It's--
1211
01:16:38,000 --> 01:16:39,930
I thought I was talking
quickly.
1212
01:16:39,930 --> 01:16:44,870
But anyway, if the eigenvectors
don't span the
1213
01:16:44,870 --> 01:16:50,360
space, then P to the n is equal
to U times this Jordan
1214
01:16:50,360 --> 01:16:55,350
form, U to the minus 1, where
J is a Jordan form.
1215
01:16:55,350 --> 01:16:58,320
What you saw in the homework
when you looked at the--
1216
01:17:02,030 --> 01:17:04,075
when you looked at the
Markov chain--
1217
01:17:28,120 --> 01:17:28,620
OK.
1218
01:17:28,620 --> 01:17:35,020
This is one recurrent class
with this one node in it.
1219
01:17:35,020 --> 01:17:38,030
These two nodes are
both transient.
1220
01:17:38,030 --> 01:17:41,720
If you look at how long it takes
to get from here over to
1221
01:17:41,720 --> 01:17:45,120
there, those transition
probabilities do not
1222
01:17:45,120 --> 01:17:51,620
correspond to this
equation here.
1223
01:17:51,620 --> 01:17:54,075
Instead, P sub 1 2--
1224
01:17:57,400 --> 01:18:00,230
P sub 2 3, the way I've
drawn it here.
1225
01:18:00,230 --> 01:18:07,140
P sub 1 3 to the n grows like n times this
eigenvalue to the n, which
1226
01:18:07,140 --> 01:18:09,760
is 1/2 in this case.
1227
01:18:09,760 --> 01:18:12,820
And it doesn't correspond to
this, which is why you need a
1228
01:18:12,820 --> 01:18:14,290
Jordan form.
1229
01:18:14,290 --> 01:18:17,860
I said that Jordan forms
are excessively ugly.
1230
01:18:17,860 --> 01:18:22,120
Jordan forms are really very
classy and nice ways of
1231
01:18:22,120 --> 01:18:24,460
dealing with a problem
which is very ugly.
1232
01:18:24,460 --> 01:18:26,340
So don't blame Jordan.
1233
01:18:26,340 --> 01:18:29,670
Jordan simplified
things for us.
1234
01:18:29,670 --> 01:18:36,840
So that's roughly as far as we
went with Markov chains.
1235
01:18:40,970 --> 01:18:44,910
Renewal processes, we don't have
to review them because
1236
01:18:44,910 --> 01:18:47,400
you're already intimately
familiar with them.
1237
01:18:50,610 --> 01:18:55,910
I will do one thing next time
with renewal processes and
1238
01:18:55,910 --> 01:19:00,290
Markov chains, which is to
explain to you why the
1239
01:19:00,290 --> 01:19:04,660
expected amount of time to get
from one state back to itself
1240
01:19:04,660 --> 01:19:07,380
is equal to 1 over pi--
1241
01:19:07,380 --> 01:19:09,160
1 over pi sub i.
1242
01:19:09,160 --> 01:19:10,790
You did that in the homework.
1243
01:19:10,790 --> 01:19:12,900
And it was an awful
way to do it.
1244
01:19:12,900 --> 01:19:14,340
And there's a nice
way to do it.
1245
01:19:14,340 --> 01:19:15,860
I'll talk about that next time.