1
00:00:00,090 --> 00:00:02,500
The following content is
provided under a Creative
2
00:00:02,500 --> 00:00:04,019
Commons license.
3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare
4
00:00:06,360 --> 00:00:10,730
continue to offer high-quality
educational resources for free.
5
00:00:10,730 --> 00:00:13,330
To make a donation or
view additional materials
6
00:00:13,330 --> 00:00:17,210
from hundreds of MIT courses,
visit MIT OpenCourseWare
7
00:00:17,210 --> 00:00:17,835
at ocw.mit.edu.
8
00:00:21,650 --> 00:00:24,030
PROFESSOR: We introduced
the data last time.
9
00:00:24,030 --> 00:00:27,700
These were some
macroeconomic variables
10
00:00:27,700 --> 00:00:33,990
that can be used for forecasting
the economy in terms of growth
11
00:00:33,990 --> 00:00:39,330
and factors such as
inflation or unemployment.
12
00:00:39,330 --> 00:00:44,020
The case note goes through
analyzing just three
13
00:00:44,020 --> 00:00:47,690
of these economic time
series-- the unemployment rate,
14
00:00:47,690 --> 00:00:51,360
the federal funds rate,
and a measure of the CPI,
15
00:00:51,360 --> 00:00:52,530
or Consumer Price Index.
16
00:00:56,450 --> 00:01:00,520
When one fits a vector
autoregression model
17
00:01:00,520 --> 00:01:08,940
to this data, it turns
out that the roots
18
00:01:08,940 --> 00:01:16,800
of the characteristic polynomial
are 1.002, then 0.9863.
19
00:01:16,800 --> 00:01:19,090
And you recall from our
discussion of vector
20
00:01:19,090 --> 00:01:23,140
autoregressive models, there's
a characteristic equation
21
00:01:23,140 --> 00:01:25,425
in matrix
form, where the determinant
22
00:01:25,425 --> 00:01:29,720
plays the same role as in the
univariate autoregressive case.
23
00:01:29,720 --> 00:01:44,120
And in order for the process
to be stationary, basically,
24
00:01:44,120 --> 00:01:46,150
the roots of the
characteristic polynomial
25
00:01:46,150 --> 00:01:50,370
need to be less
than 1 in magnitude.
26
00:01:50,370 --> 00:01:54,110
In this implementation of the
vector autoregression model,
27
00:01:54,110 --> 00:01:57,220
the characteristic
roots are the inverses
28
00:01:57,220 --> 00:01:59,620
of the characteristic roots
that we've been discussing.
29
00:01:59,620 --> 00:02:03,770
So anyway, this particular fit
of the vector autoregression
30
00:02:03,770 --> 00:02:11,370
model suggests that the
process is non-stationary.
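[A minimal sketch in R of this step, using the vars package; the data frame name macro and its column names are placeholders, not necessarily those in the case note.]

```r
# Fit a VAR(2) to the three macro series and inspect the
# characteristic roots reported by the package.
library(vars)

y   <- na.omit(macro[, c("UNRATE", "FEDFUNDS", "CPI")])
fit <- VAR(y, p = 2, type = "const")

# vars reports the moduli of the companion-matrix eigenvalues,
# i.e., the inverses of the characteristic roots discussed above;
# stationarity requires all of them to be strictly less than 1.
roots(fit)
```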
31
00:02:11,370 --> 00:02:17,580
And so one should be
considering different series
32
00:02:17,580 --> 00:02:20,400
to model this as a
stationary time series.
33
00:02:20,400 --> 00:02:26,520
But in terms of interpreting
the regression model,
34
00:02:26,520 --> 00:02:36,320
one can see-- to accommodate
the non-stationarity,
35
00:02:36,320 --> 00:02:41,020
we can take differences
of all the series
36
00:02:41,020 --> 00:02:43,360
and fit the vector
autoregression
37
00:02:43,360 --> 00:02:45,550
to the difference series.
38
00:02:45,550 --> 00:02:49,210
So one way of eliminating any
non-stationarity in time series
39
00:02:49,210 --> 00:02:52,810
models, basically
eliminating the random walk
40
00:02:52,810 --> 00:02:57,290
aspect of the processes, is to
model first differences.
41
00:02:57,290 --> 00:03:06,180
And so doing that with
this series-- let's see.
42
00:03:06,180 --> 00:03:10,220
Here is just a graph of
the time series properties
43
00:03:10,220 --> 00:03:11,800
of the difference series.
44
00:03:15,210 --> 00:03:19,180
So with our original series, we
take differences and eliminate
45
00:03:19,180 --> 00:03:22,820
missing values in this R code.
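[A rough sketch of the R code being described, continuing from the VAR fit above; diff and na.omit are the relevant base-R tools.]

```r
# First differences of each series, dropping the missing value
# created at the start of the sample.
dy <- na.omit(diff(as.matrix(y)))

# Grid of autocorrelations (diagonal panels) and cross-correlations
# (off-diagonal panels); the dashed lines are roughly +/- 2 standard
# errors under the hypothesis of zero correlation.
acf(dy)
```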
46
00:03:22,820 --> 00:03:25,300
And this
autocorrelation function
47
00:03:25,300 --> 00:03:31,100
shows us basically
the autocorrelations
48
00:03:31,100 --> 00:03:33,420
of the individual series
49
00:03:33,420 --> 00:03:36,950
and the cross-correlations
across the different series.
50
00:03:36,950 --> 00:03:41,680
So along the diagonals are
the autocorrelation functions.
51
00:03:41,680 --> 00:03:43,800
And one can see
that every series
52
00:03:43,800 --> 00:03:47,280
has correlation one with itself.
53
00:03:47,280 --> 00:03:52,380
But then at the first lag,
the autocorrelation is positive for the Fed
54
00:03:52,380 --> 00:03:56,450
funds and the CPI measure.
55
00:03:56,450 --> 00:03:58,980
And there's also some
cross-correlations
56
00:03:58,980 --> 00:04:01,550
that are strong.
57
00:04:01,550 --> 00:04:04,180
And whether a
correlation is strong or not
58
00:04:04,180 --> 00:04:06,125
depends upon how much
uncertainty there
59
00:04:06,125 --> 00:04:08,250
is in our estimate
of the correlation.
60
00:04:08,250 --> 00:04:11,750
And these dashed
lines here correspond
61
00:04:11,750 --> 00:04:16,980
to plus or minus two standard
deviations of the correlation
62
00:04:16,980 --> 00:04:23,440
coefficient when the correlation
coefficient is equal to 0.
63
00:04:23,440 --> 00:04:28,470
So any correlations that sort
of go beyond those bounds
64
00:04:28,470 --> 00:04:29,715
are statistically significant.
65
00:04:33,180 --> 00:04:39,210
The partial autocorrelation
function is graphed here.
66
00:04:39,210 --> 00:04:42,730
And let's say our
time series problem
67
00:04:42,730 --> 00:04:46,040
set goes through some discussion
of the partial autocorrelation
68
00:04:46,040 --> 00:04:48,600
coefficients and the
interpretation of those.
69
00:04:48,600 --> 00:04:51,910
The partial autocorrelation
coefficients
70
00:04:51,910 --> 00:04:57,450
are the correlation
between one variable
71
00:04:57,450 --> 00:04:59,330
and the lag of another
after accounting
72
00:04:59,330 --> 00:05:02,110
for all lower-order lags.
73
00:05:02,110 --> 00:05:06,480
So it's like the incremental
correlation of a variable
74
00:05:06,480 --> 00:05:10,760
with an additional lag term.
75
00:05:10,760 --> 00:05:13,830
And so if we are actually
fitting regression models where
76
00:05:13,830 --> 00:05:18,460
we include extra lags
of a given variable,
77
00:05:18,460 --> 00:05:20,570
that partial
autocorrelation coefficient
78
00:05:20,570 --> 00:05:25,260
is essentially the correlation
associated with the addition
79
00:05:25,260 --> 00:05:27,620
of the final lagged variable.
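[The corresponding one-liner in R; pacf plots the partial auto- and cross-correlations just described.]

```r
# Incremental correlation at each lag after accounting for all
# lower-order lags of the differenced series.
pacf(dy)
```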
80
00:05:27,620 --> 00:05:30,230
So here, we can see that
each of these series
81
00:05:30,230 --> 00:05:33,950
is quite strongly
correlated with itself.
82
00:05:33,950 --> 00:05:37,470
But there are also
some cross-correlations
83
00:05:37,470 --> 00:05:42,750
with, like, the unemployment
rate and the Fed funds rate.
84
00:05:42,750 --> 00:05:46,700
Basically, the Fed
funds rate tends
85
00:05:46,700 --> 00:05:50,400
to go down when the
unemployment rate goes up.
86
00:05:50,400 --> 00:05:54,610
And so this data is
indicating the association
87
00:05:54,610 --> 00:05:56,640
between these
macroeconomic variables
88
00:05:56,640 --> 00:05:59,100
and the evidence
of that behavior.
89
00:05:59,100 --> 00:06:02,100
In terms of modeling the
actual structural relations
90
00:06:02,100 --> 00:06:05,930
between these, we would need
more variables, up to about 10
91
00:06:05,930 --> 00:06:08,380
or 12, beyond
these three.
92
00:06:08,380 --> 00:06:12,710
And then one can have
a better understanding
93
00:06:12,710 --> 00:06:15,750
of the drivers of various
macroeconomic features.
94
00:06:15,750 --> 00:06:17,250
But this sort of
illustrates the use
95
00:06:17,250 --> 00:06:19,950
of these methods with this
reduced variable case.
96
00:06:22,830 --> 00:06:25,650
Let me also go
down here and just
97
00:06:25,650 --> 00:06:33,710
comment on the unemployment
rate or the Fed funds rate.
98
00:06:46,050 --> 00:06:48,460
When fitting these vector
autoregressive models
99
00:06:48,460 --> 00:06:52,070
with the packages
that exist in R,
100
00:06:52,070 --> 00:06:56,320
they give us output which
provides the specification
101
00:06:56,320 --> 00:07:01,440
of each of the
autoregressive models
102
00:07:01,440 --> 00:07:05,260
for the different dependent
variables, the different series
103
00:07:05,260 --> 00:07:07,620
of the process.
104
00:07:07,620 --> 00:07:13,610
And so here is the case of the
regression model for Fed funds
105
00:07:13,610 --> 00:07:17,720
as a function of
unemployment rate lagged,
106
00:07:17,720 --> 00:07:21,040
Fed funds rate lagged,
and CPI lagged.
107
00:07:21,040 --> 00:07:25,240
These are all on
different scales.
108
00:07:25,240 --> 00:07:27,730
When you're looking at
these results, what's
109
00:07:27,730 --> 00:07:31,340
important is
basically how strong
110
00:07:31,340 --> 00:07:33,850
the signal-to-noise
ratio is for estimating
111
00:07:33,850 --> 00:07:37,590
these autoregressive
parameters, vector
112
00:07:37,590 --> 00:07:39,130
autoregressive parameters.
113
00:07:39,130 --> 00:07:43,540
And so with the Fed funds,
you can look at the t values.
114
00:07:43,540 --> 00:07:45,920
And t values that
are larger than 2
115
00:07:45,920 --> 00:07:49,210
are certainly quite significant.
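[A sketch of how one pulls up this output in R with the vars package; the column name FEDFUNDS is a placeholder.]

```r
# Each equation of the fitted VAR is an ordinary least squares
# regression; the summary for the Fed funds equation reports the
# coefficients together with their t values.
fit_d <- VAR(dy, p = 2, type = "const")
summary(fit_d)$varresult$FEDFUNDS
```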
116
00:07:49,210 --> 00:07:53,540
And you can see that basically
the unemployment rate
117
00:07:53,540 --> 00:07:59,250
coefficient is a negative
0.71, so if the unemployment
118
00:07:59,250 --> 00:08:05,270
rate goes up, we expect to
see the Fed rate going down
119
00:08:05,270 --> 00:08:07,080
the next month.
120
00:08:07,080 --> 00:08:15,650
And the Fed funds rate for the
lag 1 has a t value of 7.97.
121
00:08:15,650 --> 00:08:18,790
So these are now models
on the differences.
122
00:08:18,790 --> 00:08:21,480
So if the Fed funds
rate was increased
123
00:08:21,480 --> 00:08:25,880
last month or last quarter, it's
likely to be increased again.
124
00:08:25,880 --> 00:08:31,560
And that's partly a factor
of how slow the economy is
125
00:08:31,560 --> 00:08:34,049
in reacting to changes
and how the Fed doesn't
126
00:08:34,049 --> 00:08:40,200
want to shock the economy with
large changes in their policy
127
00:08:40,200 --> 00:08:42,909
rates.
128
00:08:42,909 --> 00:08:46,600
Another thing to notice here
is that there's actually
129
00:08:46,600 --> 00:08:50,230
a negative coefficient
on the lag 2
130
00:08:50,230 --> 00:08:54,490
Fed funds term, a negative 0.17.
131
00:08:54,490 --> 00:08:58,870
And in interpreting
these kinds of models,
132
00:08:58,870 --> 00:09:02,510
I think it's helpful
just to think of,
133
00:09:02,510 --> 00:09:06,210
if you have Fed
funds sub t, that's
134
00:09:06,210 --> 00:09:13,970
equal to minus 0.71 times the
unemployment rate at t minus 1.
135
00:09:13,970 --> 00:09:24,050
And then we have plus 0.37 times
the Fed funds, so t minus 1.
136
00:09:24,050 --> 00:09:24,820
And this is delta.
137
00:09:24,820 --> 00:09:31,330
And then minus 0.18
times the Fed funds.
138
00:09:31,330 --> 00:09:35,000
So t minus 2.
139
00:09:35,000 --> 00:09:39,290
In interpreting
these coefficients,
140
00:09:39,290 --> 00:09:43,020
notice that these
two terms correspond
141
00:09:43,020 --> 00:09:57,110
to 0.19 times the Fed funds
change 1 lag ago plus 0.18
142
00:09:57,110 --> 00:09:59,445
times the change in that change, the second difference.
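[The algebra behind this rearrangement, with the coefficients as quoted in the lecture (rounded), is simply:]

```latex
\Delta \mathrm{FF}_t \approx -0.71\,\Delta \mathrm{UNRATE}_{t-1}
  + 0.37\,\Delta \mathrm{FF}_{t-1} - 0.18\,\Delta \mathrm{FF}_{t-2},
\qquad
0.37\,\Delta \mathrm{FF}_{t-1} - 0.18\,\Delta \mathrm{FF}_{t-2}
  = 0.19\,\Delta \mathrm{FF}_{t-1}
  + 0.18\bigl(\Delta \mathrm{FF}_{t-1} - \Delta \mathrm{FF}_{t-2}\bigr).
```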
143
00:10:03,550 --> 00:10:06,360
So when you see
multiple lags coming
144
00:10:06,360 --> 00:10:11,720
into play in these models,
the interpretation of them
145
00:10:11,720 --> 00:10:17,560
can be made by considering
different transformations
146
00:10:17,560 --> 00:10:20,210
essentially of the
underlying variables.
147
00:10:20,210 --> 00:10:23,130
In this form, you can see
that OK, the Fed funds
148
00:10:23,130 --> 00:10:30,180
tends to change the way it
changed the previous month.
149
00:10:30,180 --> 00:10:38,644
But it also may change
depending on the double change
150
00:10:38,644 --> 00:10:39,560
in the previous month.
151
00:10:39,560 --> 00:10:42,620
So there's a degree of
acceleration in the Fed funds
152
00:10:42,620 --> 00:10:44,450
that is being captured here.
153
00:10:44,450 --> 00:10:47,640
So the interpretation
of these models
154
00:10:47,640 --> 00:10:51,930
sometimes requires some care.
155
00:10:51,930 --> 00:10:55,560
This kind of analysis,
I find it quite useful.
156
00:11:02,600 --> 00:11:09,710
So let's push on
to the next topic.
157
00:11:09,710 --> 00:11:13,230
So today's topics are going
to begin with a discussion
158
00:11:13,230 --> 00:11:15,640
of cointegration.
159
00:11:15,640 --> 00:11:18,980
Cointegration is a major topic
in time series analysis, which
160
00:11:18,980 --> 00:11:23,980
is dealing with the analysis
of non-stationary time series.
161
00:11:23,980 --> 00:11:28,060
And in the previous
discussion, we
162
00:11:28,060 --> 00:11:29,910
addressed
non-stationarity of series
163
00:11:29,910 --> 00:11:32,214
by taking first
differences to eliminate
164
00:11:32,214 --> 00:11:33,130
that non-stationarity.
165
00:11:36,440 --> 00:11:40,140
But we may be losing
some information
166
00:11:40,140 --> 00:11:41,450
with that differencing.
167
00:11:41,450 --> 00:11:44,940
And cointegration
provides a framework
168
00:11:44,940 --> 00:11:47,440
within which we
characterize all available
169
00:11:47,440 --> 00:11:49,680
information for
statistical modeling,
170
00:11:49,680 --> 00:11:52,920
in a very systematic way.
171
00:11:52,920 --> 00:11:58,580
So let's introduce the
context within which
172
00:11:58,580 --> 00:12:00,630
cointegration is relevant.
173
00:12:00,630 --> 00:12:05,810
It's relevant when we
have a stochastic process,
174
00:12:05,810 --> 00:12:08,620
a multivariate
stochastic process, which
175
00:12:08,620 --> 00:12:12,060
is integrated of some order d.
176
00:12:12,060 --> 00:12:15,810
And to be integrated
of order d means
177
00:12:15,810 --> 00:12:18,920
that if we take the
d-th difference,
178
00:12:18,920 --> 00:12:21,395
then that d-th
difference is stationary.
179
00:12:23,980 --> 00:12:33,720
So if you look
at a time series
180
00:12:33,720 --> 00:12:38,630
and you plot that over time,
well, OK, a stationary time
181
00:12:38,630 --> 00:12:43,010
series we know should be
something that basically
182
00:12:43,010 --> 00:12:45,010
has a constant mean over time.
183
00:12:45,010 --> 00:12:48,580
There's some steady
mean that it has.
184
00:12:48,580 --> 00:12:51,470
And the variability
is also constant.
185
00:12:51,470 --> 00:12:59,000
With some other time series,
it might increase linearly
186
00:12:59,000 --> 00:13:00,940
over time.
187
00:13:00,940 --> 00:13:03,600
And a series that increases
linearly over time, well,
188
00:13:03,600 --> 00:13:05,070
if you take first
differences, that
189
00:13:05,070 --> 00:13:07,650
tends to take out
that linear trend.
190
00:13:07,650 --> 00:13:10,230
If higher-order
differencing is required, then
191
00:13:10,230 --> 00:13:14,160
that means that there's some
curvature, quadratic say,
192
00:13:14,160 --> 00:13:18,760
that may exist in the data
that is being taken out.
193
00:13:18,760 --> 00:13:25,460
So this differencing is required
to result in stationarity.
194
00:13:25,460 --> 00:13:32,430
If the process does have a vector
autoregressive representation
195
00:13:32,430 --> 00:13:35,330
in spite of its
non-stationarity,
196
00:13:35,330 --> 00:13:43,920
then it can be represented by
a polynomial lag of the X's set
197
00:13:43,920 --> 00:13:48,690
equal to white noise epsilon.
198
00:13:48,690 --> 00:13:53,590
And the polynomial
phi of L is going
199
00:13:53,590 --> 00:13:59,180
to have a factor term
in there of 1 minus L,
200
00:13:59,180 --> 00:14:02,100
basically the first
difference to the d power.
201
00:14:02,100 --> 00:14:06,300
So if taking the
d-th order difference
202
00:14:06,300 --> 00:14:12,430
reduces it to
stationarity, then we
203
00:14:12,430 --> 00:14:16,630
can express this vector
autoregression in this way.
204
00:14:16,630 --> 00:14:26,620
So the phi star of L
basically represents
205
00:14:26,620 --> 00:14:31,110
the stationary vector
autoregressive process
206
00:14:31,110 --> 00:14:33,255
on the d-th difference series.
207
00:14:47,730 --> 00:14:52,780
Now, as it says here, each
of the component series
208
00:14:52,780 --> 00:14:57,090
may be non-stationary and
integrated, say of order one.
209
00:14:57,090 --> 00:15:02,770
But the process itself may
not be jointly integrated,
210
00:15:02,770 --> 00:15:08,900
in that there may
be linear combinations
211
00:15:08,900 --> 00:15:13,800
of our multivariate series
which are stationary.
212
00:15:13,800 --> 00:15:20,570
And so these linear
combinations basically
213
00:15:20,570 --> 00:15:25,050
represent the stationary
features of the process.
214
00:15:25,050 --> 00:15:31,160
And those features can be
apparent without looking
215
00:15:31,160 --> 00:15:32,490
at differences.
216
00:15:32,490 --> 00:15:35,350
So in a sense, if
you just focused
217
00:15:35,350 --> 00:15:38,880
on differences of these
non-stationary multivariate
218
00:15:38,880 --> 00:15:43,560
series, you would be
losing out on information
219
00:15:43,560 --> 00:15:49,900
of the stationary structure
of contemporaneous components
220
00:15:49,900 --> 00:15:52,230
of the multivariate series.
221
00:15:52,230 --> 00:15:56,130
And so cointegration
deals with this situation
222
00:15:56,130 --> 00:16:01,480
where some linear combinations
of the multivariate series
223
00:16:01,480 --> 00:16:02,996
in fact are stationary.
224
00:16:08,810 --> 00:16:15,090
So how do we represent
that mathematically?
225
00:16:15,090 --> 00:16:19,020
Well, we say that this
multivariate time series
226
00:16:19,020 --> 00:16:24,360
process is cointegrated if
there exists an m-vector beta
227
00:16:24,360 --> 00:16:29,470
such that, defining linear
weights on the X's,
228
00:16:29,470 --> 00:16:32,225
beta prime X_t is a
stationary process.
229
00:16:37,920 --> 00:16:42,610
The cointegration vector
beta can be scaled arbitrarily.
230
00:16:42,610 --> 00:16:49,110
So it's common
practice, if one has
231
00:16:49,110 --> 00:16:51,200
an interest, some primary
interest, perhaps,
232
00:16:51,200 --> 00:16:53,580
in the first component
series of the process,
233
00:16:53,580 --> 00:16:56,680
to set that equal to 1.
234
00:16:56,680 --> 00:17:01,020
And the expression
basically says
235
00:17:01,020 --> 00:17:06,470
that our time t value
of the first series
236
00:17:06,470 --> 00:17:11,930
is related in a stationary
way to a linear combination
237
00:17:11,930 --> 00:17:15,550
of the other m minus 1 series.
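[In symbols, with the first coefficient normalized to 1, the cointegrating relation reads as a long-run equilibrium for the first series:]

```latex
\beta = (1, -\beta_2, \ldots, -\beta_m)', \qquad
\beta' X_t = X_{1,t} - \beta_2 X_{2,t} - \cdots - \beta_m X_{m,t} = u_t,
\quad u_t \ \text{stationary}.
```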
238
00:17:15,550 --> 00:17:21,859
And this is a long-run
equilibrium type relationship.
239
00:17:21,859 --> 00:17:25,510
How does this arise?
240
00:17:25,510 --> 00:17:30,570
Well, it arises in many, many
ways in economics and finance.
241
00:17:33,100 --> 00:17:36,000
The term structure of interest
rates, purchasing power parity.
242
00:17:38,820 --> 00:17:42,660
In the term structure
of interest rates,
243
00:17:42,660 --> 00:17:47,100
basically the differences
between yields
244
00:17:47,100 --> 00:17:50,260
on interest rates over
different maturities,
245
00:17:50,260 --> 00:17:52,600
those differences
might be stationary.
246
00:17:52,600 --> 00:17:56,780
The overall level of interest rates
might not be stationary,
247
00:17:56,780 --> 00:18:01,350
but the spreads ought
to be stationary.
248
00:18:01,350 --> 00:18:04,680
Purchasing power parity
in foreign exchange,
249
00:18:04,680 --> 00:18:10,940
if you look at the
value of currencies
250
00:18:10,940 --> 00:18:14,830
for different countries,
basically different countries
251
00:18:14,830 --> 00:18:19,710
ought to be able to purchase
the same goods for roughly
252
00:18:19,710 --> 00:18:20,720
the same price.
253
00:18:20,720 --> 00:18:23,860
And so if there are
disparities in currency values,
254
00:18:23,860 --> 00:18:27,740
purchasing power parity suggests
that things will revert back
255
00:18:27,740 --> 00:18:32,900
to some norm where everybody
is paying on average over time
256
00:18:32,900 --> 00:18:34,960
the same amount for
different goods.
257
00:18:34,960 --> 00:18:37,460
Otherwise, there
would be arbitrage.
258
00:18:40,030 --> 00:18:41,890
Money demand, covered
interest rate parity,
259
00:18:41,890 --> 00:18:44,340
law of one price,
spot and futures.
260
00:18:44,340 --> 00:18:48,470
Let me show you
another example that
261
00:18:48,470 --> 00:18:54,820
will be in the case
study for this chapter.
262
00:19:00,290 --> 00:19:06,410
View, full screen.
263
00:19:06,410 --> 00:19:09,900
Let's think about
energy futures.
264
00:19:09,900 --> 00:19:13,450
In fact, next Tuesday's
talk from Morgan Stanley
265
00:19:13,450 --> 00:19:18,490
is going to be by an expert in
commodity futures and options.
266
00:19:18,490 --> 00:19:21,090
And that should be
very interesting.
267
00:19:21,090 --> 00:19:28,920
Anyway, here, I'm
looking at energy futures
268
00:19:28,920 --> 00:19:31,136
from the Energy
Information Administration.
269
00:19:31,136 --> 00:19:32,510
Actually, for this
course, trying
270
00:19:32,510 --> 00:19:36,970
to get data that's freely
available to students
271
00:19:36,970 --> 00:19:40,560
is one of the things we do.
272
00:19:40,560 --> 00:19:42,646
So this data is actually
available from the Energy
273
00:19:42,646 --> 00:19:44,770
Information Administration
of the government, which
274
00:19:44,770 --> 00:19:48,960
is now open, so I guess
that'll be updated over time.
275
00:19:48,960 --> 00:19:52,070
But basically these
energy futures
276
00:19:52,070 --> 00:19:55,570
are traded on the Chicago
Mercantile Exchange.
277
00:19:55,570 --> 00:20:03,290
And basically CL is crude,
West Texas Intermediate crude,
278
00:20:03,290 --> 00:20:08,760
light crude, which we have
here, a time series from 2006
279
00:20:08,760 --> 00:20:12,670
to basically yesterday.
280
00:20:12,670 --> 00:20:16,340
And you can see how it was around
$60 at the start of the period,
281
00:20:16,340 --> 00:20:19,080
and then went up
to close to $140,
282
00:20:19,080 --> 00:20:22,440
and then it dropped
down to around $40.
283
00:20:22,440 --> 00:20:26,110
And it's been hovering
around $100 lately.
284
00:20:26,110 --> 00:20:33,040
The second series here is
gasoline, RBOB gasoline.
285
00:20:33,040 --> 00:20:36,240
Always have to look this up.
286
00:20:36,240 --> 00:20:42,690
This stands for reformulated blendstock
for oxygenated blending
287
00:20:42,690 --> 00:20:43,250
gasoline.
288
00:20:43,250 --> 00:20:48,030
Anyway, futures on this product
are traded at the CME as well.
289
00:20:48,030 --> 00:20:50,750
And then heating oil.
290
00:20:50,750 --> 00:20:56,780
And what's happening
with these data
291
00:20:56,780 --> 00:21:08,880
is that we have basically
a refinery which processes
292
00:21:08,880 --> 00:21:15,990
crude oil as an input.
293
00:21:15,990 --> 00:21:20,180
And it basically
refines it, distills it,
294
00:21:20,180 --> 00:21:36,600
and generates outputs, which
include heating oil, gasoline,
295
00:21:36,600 --> 00:21:41,680
and various other things
like jet fuel and others.
296
00:21:41,680 --> 00:21:46,460
So if we're looking
at the prices,
297
00:21:46,460 --> 00:21:49,510
the futures prices of, say,
gasoline and heating oil,
298
00:21:49,510 --> 00:21:55,710
relating those to crude
oil, well, certainly,
299
00:21:55,710 --> 00:21:59,140
the cost of producing these
products should depend
300
00:21:59,140 --> 00:22:01,820
on the cost of the input.
301
00:22:01,820 --> 00:22:10,480
So I've got in the next plot,
a translation of these futures
302
00:22:10,480 --> 00:22:15,510
contracts into their
price per barrel.
303
00:22:15,510 --> 00:22:19,320
Turns out crude is quoted
in dollars per barrel.
304
00:22:19,320 --> 00:22:24,390
And the gasoline and heating
oil are in cents per gallon.
305
00:22:24,390 --> 00:22:26,490
So one multiplies.
306
00:22:26,490 --> 00:22:28,310
There are 42
gallons in a barrel.
307
00:22:28,310 --> 00:22:30,960
So you multiply those
price series by 42.
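[A minimal sketch of that unit conversion and the resulting spreads in R; the series names CL, RB, and HO are placeholders for the near-month futures prices.]

```r
gal_per_bbl <- 42

# Put the per-gallon product quotes on the same per-barrel scale
# as crude, then form the simple 1:1 spreads of output over input.
rb_bbl <- RB * gal_per_bbl   # gasoline, per barrel
ho_bbl <- HO * gal_per_bbl   # heating oil, per barrel

rb_crack <- rb_bbl - CL      # gasoline crack spread
ho_crack <- ho_bbl - CL      # heating oil crack spread
```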
308
00:22:30,960 --> 00:22:33,549
And this shows the plot of
the prices of the futures
309
00:22:33,549 --> 00:22:35,590
where we're looking at
essentially the same units
310
00:22:35,590 --> 00:22:40,600
of output relative to input.
311
00:22:40,600 --> 00:22:45,700
And what's evident here is that
the futures for gasoline,
312
00:22:45,700 --> 00:22:50,450
the blue, are consistently above
the green, the input, and the same
313
00:22:50,450 --> 00:22:52,520
for heating oil.
314
00:22:52,520 --> 00:22:55,680
And those vary depending
on which is greater.
315
00:22:55,680 --> 00:23:02,600
So if we look at the
difference between, say,
316
00:23:02,600 --> 00:23:07,020
the price of the heating
oil future and the crude oil
317
00:23:07,020 --> 00:23:11,625
future, what does
that represent?
318
00:23:14,380 --> 00:23:20,780
That's the spread in value of
the output minus the input.
319
00:23:20,780 --> 00:23:21,546
Ray?
320
00:23:21,546 --> 00:23:24,282
AUDIENCE: [INAUDIBLE] cost
of running the refinery?
321
00:23:27,146 --> 00:23:31,940
PROFESSOR: So cost of refining.
322
00:23:31,940 --> 00:23:39,700
So let's look at, say,
heating oil minus CL and, say,
323
00:23:39,700 --> 00:23:43,930
this RBOB minus CL.
324
00:23:43,930 --> 00:23:46,670
So it's cost of refining.
325
00:23:46,670 --> 00:23:49,487
What else could
be a factor here?
326
00:23:49,487 --> 00:23:51,820
AUDIENCE: Supply and demand
characteristics [INAUDIBLE].
327
00:23:51,820 --> 00:23:52,736
PROFESSOR: Definitely.
328
00:23:52,736 --> 00:23:54,165
Supply and demand.
329
00:23:54,165 --> 00:23:56,290
If one product is demanded
a lot more than another.
330
00:23:58,280 --> 00:23:59,030
Supply and demand.
331
00:24:05,820 --> 00:24:08,215
Anything else?
332
00:24:08,215 --> 00:24:09,840
AUDIENCE: Maybe for
the outputs, if you
333
00:24:09,840 --> 00:24:11,340
were to find the difference
between the outputs,
334
00:24:11,340 --> 00:24:13,060
it would be something cyclical.
335
00:24:13,060 --> 00:24:15,640
For example, in the
winter, heating oil
336
00:24:15,640 --> 00:24:17,840
is going to get far more
valuable as gasoline,
337
00:24:17,840 --> 00:24:19,840
because people drive less
and people demand more
338
00:24:19,840 --> 00:24:20,950
for heating homes.
339
00:24:20,950 --> 00:24:22,080
PROFESSOR: Absolutely.
340
00:24:22,080 --> 00:24:25,670
That's a very significant
factor with these.
341
00:24:25,670 --> 00:24:29,230
There are seasonal effects
that drive supply and demand.
342
00:24:29,230 --> 00:24:35,460
And so we can put
seasonal effects in there
343
00:24:35,460 --> 00:24:36,980
as affecting supply and demand.
344
00:24:36,980 --> 00:24:40,280
But certainly, you might expect
to see seasonal structure here.
345
00:24:40,280 --> 00:24:43,720
Anything else?
346
00:24:43,720 --> 00:24:47,070
Put on your trader's hat.
347
00:24:47,070 --> 00:24:49,310
Profit, yes.
348
00:24:49,310 --> 00:24:53,160
The refinery needs
to make some profit.
349
00:24:53,160 --> 00:24:58,520
So there has to be some
level of profit that's
350
00:24:58,520 --> 00:25:02,240
acceptable and appropriate.
351
00:25:02,240 --> 00:25:05,250
So we have all these
things driving basically
352
00:25:05,250 --> 00:25:07,630
these differences.
353
00:25:07,630 --> 00:25:10,220
Let's just take a look
at those differences.
354
00:25:10,220 --> 00:25:14,880
These are actually
called the crack spreads.
355
00:25:14,880 --> 00:25:19,250
Cracking in the
business of refining
356
00:25:19,250 --> 00:25:22,220
is basically the
breaking down of oil
357
00:25:22,220 --> 00:25:26,250
into components, products.
358
00:25:26,250 --> 00:25:31,800
And on the top is the
gasoline crack spread.
359
00:25:31,800 --> 00:25:35,460
And the bottom is the
heating oil crack spread.
360
00:25:35,460 --> 00:25:37,720
And one can see
that as time series,
361
00:25:37,720 --> 00:25:41,860
these actually look stationary.
362
00:25:41,860 --> 00:25:45,920
There certainly doesn't appear
to be a linear trend up.
363
00:25:45,920 --> 00:25:51,390
But there are, of course, many
factors that could affect this.
364
00:25:51,390 --> 00:25:59,110
So with that as motivation, how
would we model such a series?
365
00:25:59,110 --> 00:26:01,230
So let's go back to
our lecture here.
366
00:26:06,420 --> 00:26:08,775
All right, View, full size.
367
00:26:15,760 --> 00:26:18,430
This is going to be a
very technical discussion,
368
00:26:18,430 --> 00:26:25,460
but it's, at the end of the day,
I think fairly straightforward.
369
00:26:25,460 --> 00:26:27,210
And the objective
actually of this lecture
370
00:26:27,210 --> 00:26:31,240
is to provide an introduction
to the notation here, which
371
00:26:31,240 --> 00:26:35,860
should make it seem like it's a
very straightforward derivation
372
00:26:35,860 --> 00:26:37,800
process of these models.
373
00:26:37,800 --> 00:26:42,890
So let's begin with just a recap
of the vector autoregressive
374
00:26:42,890 --> 00:26:45,350
model of order p.
375
00:26:45,350 --> 00:26:47,570
This is the extension of
the univariate case where
376
00:26:47,570 --> 00:26:52,870
we have a vector C of
constants, m constants,
377
00:26:52,870 --> 00:26:56,960
and matrices phi_1 to
phi_p corresponding
378
00:26:56,960 --> 00:27:01,650
to basically how the
autoregression of one series
379
00:27:01,650 --> 00:27:04,810
depends on all the other series.
380
00:27:04,810 --> 00:27:08,270
And then there's multivariate
white noise eta_t,
381
00:27:08,270 --> 00:27:13,630
which has mean 0 and some
covariance structure in it.
382
00:27:13,630 --> 00:27:19,830
And the stationarity-- if
this series were stationary,
383
00:27:19,830 --> 00:27:28,050
then the determinant of
this matrix polynomial
384
00:27:28,050 --> 00:27:33,360
would have roots outside the
unit circle for complex z.
385
00:27:33,360 --> 00:27:39,290
And if it's not stationary,
then some of those roots
386
00:27:39,290 --> 00:27:41,680
will be on the unit
circle or beyond.
387
00:27:41,680 --> 00:27:45,125
So let's actually go to
that non-stationary case
388
00:27:45,125 --> 00:27:50,540
and suppose that the process
is integrated of order one.
389
00:27:50,540 --> 00:27:53,050
So if we were to take
first differences,
390
00:27:53,050 --> 00:27:54,175
we would have stationarity.
391
00:28:02,690 --> 00:28:06,500
Well, the derivation
of the model
392
00:28:06,500 --> 00:28:12,150
proceeds by converting the
original vector autoregressive
393
00:28:12,150 --> 00:28:16,050
equation into an
equation that's mostly
394
00:28:16,050 --> 00:28:19,560
relating to differences but
with also some extra terms.
395
00:28:19,560 --> 00:28:24,130
So let's begin the process
by just subtracting
396
00:28:24,130 --> 00:28:26,620
the lagged value of
the multivariate vector
397
00:28:26,620 --> 00:28:29,030
from the original series.
398
00:28:29,030 --> 00:28:31,290
So we subtract X_(t-1)
from both sides,
399
00:28:31,290 --> 00:28:37,330
and we get delta X_t is equal to
C plus (phi_1 minus I_m) X_(t-1)
400
00:28:37,330 --> 00:28:38,200
plus the rest.
401
00:28:38,200 --> 00:28:41,960
So that's a very simple step.
402
00:28:41,960 --> 00:28:46,220
We're just subtracting the
lagged multivariate series
403
00:28:46,220 --> 00:28:49,370
from both sides.
404
00:28:49,370 --> 00:28:53,290
Now, what we want
to do is convert
405
00:28:53,290 --> 00:28:59,930
the second term in the middle
line into a difference term.
406
00:28:59,930 --> 00:29:00,990
So what do we do?
407
00:29:00,990 --> 00:29:07,900
Well, we can subtract and add
(phi_1 minus I_m) times X_(t-2).
408
00:29:07,900 --> 00:29:10,440
If we do that,
subtract and add that,
409
00:29:10,440 --> 00:29:13,810
we then get that delta X_t is
C plus a multiple of delta
410
00:29:13,810 --> 00:29:19,530
X_(t-1) plus this
multiple of X_(t-2).
411
00:29:19,530 --> 00:29:22,240
So we basically
reduced the equations
412
00:29:22,240 --> 00:29:25,290
to differences in
the first two terms
413
00:29:25,290 --> 00:29:29,520
or in the current
series and the lagged.
414
00:29:29,520 --> 00:29:33,550
But then we have the original
series at lag t minus 2.
415
00:29:33,550 --> 00:29:38,660
We can continue this
process with the third.
416
00:29:38,660 --> 00:29:42,460
And then at the
end of the day, we
417
00:29:42,460 --> 00:29:46,150
end up getting this equation
for the difference of the series
418
00:29:46,150 --> 00:29:49,300
is equal to a constant
plus a matrix multiple
419
00:29:49,300 --> 00:29:53,880
of the first difference
multivariate series,
420
00:29:53,880 --> 00:29:56,920
plus another matrix times
the second difference,
421
00:29:56,920 --> 00:30:01,720
all the way down to
the p-th difference,
422
00:30:01,720 --> 00:30:03,760
or the p minus first difference.
423
00:30:03,760 --> 00:30:07,400
But at the end,
we're left with terms
424
00:30:07,400 --> 00:30:11,320
at p lags that have no
differences in them.
425
00:30:11,320 --> 00:30:14,440
So we've been able to
represent this series
426
00:30:14,440 --> 00:30:19,090
as an autoregressive
function of differences.
427
00:30:19,090 --> 00:30:24,010
But there's also a term on
the undifferenced series
428
00:30:24,010 --> 00:30:27,470
at the end that's left over.
429
00:30:27,470 --> 00:30:34,900
Or this argument
can actually
430
00:30:34,900 --> 00:30:38,330
proceed by eliminating
differences in the reverse way,
431
00:30:38,330 --> 00:30:42,650
starting with the
p-th lag and going up.
432
00:30:42,650 --> 00:30:47,200
And one then can represent
this as delta X_t
433
00:30:47,200 --> 00:30:50,170
is C plus some
matrix times just the
434
00:30:50,170 --> 00:30:56,000
lagged series plus various
matrices times the differences
435
00:30:56,000 --> 00:30:58,880
going back p minus 1 lags.
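[Collecting the terms, this is the error-correction form of the VAR(p); under the usual convention the coefficient matrices work out to:]

```latex
\Delta X_t = C + \Pi X_{t-1}
  + \sum_{j=1}^{p-1} \Gamma_j\, \Delta X_{t-j} + \eta_t,
\qquad
\Pi = -\Bigl(I_m - \sum_{j=1}^{p} \Phi_j\Bigr),
\qquad
\Gamma_j = -\sum_{k=j+1}^{p} \Phi_k .
```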
436
00:31:05,460 --> 00:31:10,200
And so at the end of
the day, this model
437
00:31:10,200 --> 00:31:14,270
basically for delta
X_t is a constant
438
00:31:14,270 --> 00:31:20,760
plus a matrix times the
previous lagged series
439
00:31:20,760 --> 00:31:25,660
or the first lag of the
multivariate time series,
440
00:31:25,660 --> 00:31:30,320
plus various autoregressive
lags of the differenced series.
441
00:31:32,960 --> 00:31:36,130
So these notes give you
the formulas for those,
442
00:31:36,130 --> 00:31:40,840
and they're very easy to
verify if you go through them
443
00:31:40,840 --> 00:31:41,594
one by one.
444
00:31:45,730 --> 00:31:51,760
And when we look at this
expression for the model,
445
00:31:51,760 --> 00:31:57,270
this expresses the
stochastic process model
446
00:31:57,270 --> 00:31:59,560
for the difference series.
447
00:31:59,560 --> 00:32:03,780
This difference
series is stationary.
448
00:32:03,780 --> 00:32:05,970
We've eliminated
the non-stationarity
449
00:32:05,970 --> 00:32:06,630
in the process.
450
00:32:06,630 --> 00:32:09,160
So that means the
right-hand side
451
00:32:09,160 --> 00:32:12,890
has to be stationary as well.
452
00:32:12,890 --> 00:32:19,890
And so while the terms which
are matrix multiples of lags
453
00:32:19,890 --> 00:32:21,390
of the differenced
series, those are
454
00:32:21,390 --> 00:32:23,750
going to be stationary
because we're just
455
00:32:23,750 --> 00:32:27,680
taking lags of the
stationary multivariate time
456
00:32:27,680 --> 00:32:29,540
series, the difference series.
457
00:32:29,540 --> 00:32:36,880
But this pi X_t term has
to be stationary as well.
458
00:32:36,880 --> 00:32:41,640
So this pi X_t contains
the cointegrating terms.
459
00:32:41,640 --> 00:32:46,600
And fitting a sort of
cointegrated vector
460
00:32:46,600 --> 00:32:53,490
autoregression model involves
identifying this term, pi X_t.
461
00:32:53,490 --> 00:33:00,870
And given that the original
series had unit roots,
462
00:33:00,870 --> 00:33:06,195
it has to be the case that
pi, the matrix, is singular.
463
00:33:09,550 --> 00:33:12,080
So it's basically
a transformation
464
00:33:12,080 --> 00:33:15,310
of the data that
eliminates that unit
465
00:33:15,310 --> 00:33:19,880
root in the overall series.
466
00:33:19,880 --> 00:33:24,440
So the matrix pi
is of reduced rank,
467
00:33:24,440 --> 00:33:27,676
and it's either rank
zero, in which case
468
00:33:27,676 --> 00:33:29,300
there are no cointegrating
relationships,
469
00:33:29,300 --> 00:33:34,500
or its rank is less than m.
470
00:33:34,500 --> 00:33:39,060
And the matrix pi does
define the cointegrating
471
00:33:39,060 --> 00:33:40,550
relationships.
472
00:33:40,550 --> 00:33:43,080
Now, these cointegrating
relationships
473
00:33:43,080 --> 00:33:48,990
are the relationships in the
process that are stationary.
474
00:33:48,990 --> 00:33:53,200
And so basically there's
a lot of information
475
00:33:53,200 --> 00:33:57,880
in that multivariate series
with contemporaneous values
476
00:33:57,880 --> 00:33:59,470
of the series.
477
00:33:59,470 --> 00:34:02,500
There is stationary structure
at every single time
478
00:34:02,500 --> 00:34:08,199
point, which can be the
target of the modeling.
479
00:34:08,199 --> 00:34:16,250
So this matrix pi is
of rank r less than m.
480
00:34:16,250 --> 00:34:22,100
And so it can be expressed
as basically alpha beta
481
00:34:22,100 --> 00:34:30,540
prime, where the matrices
alpha and beta are of rank r.
482
00:34:30,540 --> 00:34:33,199
And the columns of beta define
linearly independent vectors
483
00:34:33,199 --> 00:34:34,770
which cointegrate x.
484
00:34:34,770 --> 00:34:37,909
And the decomposition
of pi isn't unique.
485
00:34:37,909 --> 00:34:43,389
You can basically, for any
invertible r by r matrix g,
486
00:34:43,389 --> 00:34:46,350
define another set of
cointegrating relationships.
487
00:34:46,350 --> 00:34:50,340
So in the linear algebra
structure of these problems,
488
00:34:50,340 --> 00:34:52,800
there's basically an
r-dimensional space
489
00:34:52,800 --> 00:34:56,360
where the process is
stationary, and how
490
00:34:56,360 --> 00:35:02,020
you define the coordinate system
in that space is up to you
491
00:35:02,020 --> 00:35:08,130
or subject to some choice.
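[In symbols: with rank r, 0 < r < m,]

```latex
\Pi = \alpha\,\beta', \qquad \alpha,\,\beta \in \mathbb{R}^{m \times r},
\qquad
\Pi = \bigl(\alpha G^{-1}\bigr)\bigl(\beta G'\bigr)'
\ \text{for any invertible } G \in \mathbb{R}^{r \times r},
```

so the columns of beta span the cointegrating space but are only identified up to such a change of coordinates.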
492
00:35:08,130 --> 00:35:09,780
So how do we estimate
these models?
493
00:35:09,780 --> 00:35:15,520
Well, there's a rather nice result
of Sims, Stock, and Watson.
494
00:35:15,520 --> 00:35:17,800
Actually, Sims,
Christopher Sims,
495
00:35:17,800 --> 00:35:21,790
he got the Nobel Prize a
few years ago for his work
496
00:35:21,790 --> 00:35:23,730
in econometrics.
497
00:35:23,730 --> 00:35:33,850
And so this is a rather
significant work that he did.
498
00:35:33,850 --> 00:35:36,740
Anyway, he, together
with Stock and Watson,
499
00:35:36,740 --> 00:35:41,120
proved that if you're estimating
a vector autoregression model,
500
00:35:41,120 --> 00:35:45,490
then the least squares
estimator of the original model
501
00:35:45,490 --> 00:35:49,150
is basically sufficient
to do an analysis
502
00:35:49,150 --> 00:35:56,600
of this cointegrated vector
autoregression process.
503
00:35:56,600 --> 00:35:58,960
The parameter estimates
from just fitting
504
00:35:58,960 --> 00:36:03,610
the vector autoregression are
consistent for the underlying
505
00:36:03,610 --> 00:36:04,657
parameters.
506
00:36:04,657 --> 00:36:06,240
And they have
asymptotic distributions
507
00:36:06,240 --> 00:36:09,980
that are identical to those of
maximum likelihood estimators.
508
00:36:09,980 --> 00:36:18,360
And so what ends up happening
is the least squares estimates
509
00:36:18,360 --> 00:36:21,960
of the vector autoregression
parameters lead
510
00:36:21,960 --> 00:36:27,270
to an estimation
of the pi matrix.
511
00:36:27,270 --> 00:36:40,290
And the constraints on the pi
matrix, which are basically that pi
512
00:36:40,290 --> 00:36:44,430
is of reduced rank,
will hold asymptotically.
513
00:36:44,430 --> 00:36:49,240
So let's just go back
to the equation before,
514
00:36:49,240 --> 00:36:54,490
to see if that
looks familiar here.
515
00:36:58,930 --> 00:37:03,070
So what that work says
is that if we basically
516
00:37:03,070 --> 00:37:07,110
fit the linear regression
model regressing the difference
517
00:37:07,110 --> 00:37:13,930
series on the lag of the series
plus lags of differences,
518
00:37:13,930 --> 00:37:18,590
the least squares estimates
of these underlying parameters
519
00:37:18,590 --> 00:37:21,690
will give us asymptotically
efficient estimates
520
00:37:21,690 --> 00:37:24,060
of this overall process.
521
00:37:24,060 --> 00:37:31,635
So we don't need to use any new
tools to specify these models.
522
00:37:43,800 --> 00:37:48,110
There's an advanced literature
on estimation methods
523
00:37:48,110 --> 00:37:49,950
for these models.
524
00:37:49,950 --> 00:37:55,050
Johansen does describe
maximum likelihood estimation
525
00:37:55,050 --> 00:38:01,260
when the innovation terms
are normally distributed.
526
00:38:01,260 --> 00:38:07,270
And that methodology applies
reduced rank regression
527
00:38:07,270 --> 00:38:13,150
methodology and
yields tests for what
528
00:38:13,150 --> 00:38:17,130
the rank is of the
cointegrating relationship.
529
00:38:17,130 --> 00:38:20,270
And these methods are
implemented in R packages.
530
00:38:25,710 --> 00:38:26,420
Let's see.
531
00:38:26,420 --> 00:38:40,890
Let me just go back now
to the-- so let's see.
532
00:38:40,890 --> 00:38:47,690
The case study on
the crack spread data
533
00:38:47,690 --> 00:38:51,370
actually goes through sort of
testing for non-stationarity
534
00:38:51,370 --> 00:38:54,040
in these underlying series.
535
00:38:54,040 --> 00:38:58,360
And actually, why don't
I just show you that?
536
00:38:58,360 --> 00:38:59,450
Let's go back here.
537
00:39:17,522 --> 00:39:23,460
If you can see this, for
the crack spread data,
538
00:39:23,460 --> 00:39:25,230
looking at the
crude oil futures,
539
00:39:25,230 --> 00:39:28,450
basically the crude oil
future can be evaluated
540
00:39:28,450 --> 00:39:30,790
to see if it's non-stationary.
541
00:39:30,790 --> 00:39:33,800
And there's this augmented
Dickey-Fuller test
542
00:39:33,800 --> 00:39:36,350
for non-stationarity.
543
00:39:36,350 --> 00:39:43,160
And it basically has a null
hypothesis that the model
544
00:39:43,160 --> 00:39:46,850
or the series is non-stationary,
or it has a unit root,
545
00:39:46,850 --> 00:39:49,040
versus the alternative
that it doesn't.
546
00:39:49,040 --> 00:39:52,180
And so testing that
null hypothesis
547
00:39:52,180 --> 00:39:56,121
that it's non-stationary
yields a p-value of 0.164
548
00:39:56,121 --> 00:40:01,690
for CLC1, the first
nearest contract,
549
00:40:01,690 --> 00:40:07,400
near month contract of
the futures for crude.
550
00:40:07,400 --> 00:40:11,230
And so the data
suggests that crude
551
00:40:11,230 --> 00:40:14,060
has a distribution that's
non-stationary, integrated
552
00:40:14,060 --> 00:40:16,490
of order 1.
553
00:40:16,490 --> 00:40:23,950
And the HOC1 also basically
has a p-value
554
00:40:23,950 --> 00:40:27,550
for non-stationarity of 0.3265.
555
00:40:27,550 --> 00:40:31,000
So we can't reject
non-stationarity or unit root
556
00:40:31,000 --> 00:40:34,150
in those series with
these test statistics.
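[A sketch of these tests in R; the tseries package's adf.test is one implementation, and the series names CL1 and HO1 are placeholders for the near-month contracts.]

```r
library(tseries)

# Null hypothesis: the series has a unit root (is non-stationary).
adf.test(CL1)   # case note reports p = 0.164 for crude
adf.test(HO1)   # case note reports p = 0.3265 for heating oil
```

With p-values this large, the unit root cannot be rejected for either series.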
557
00:40:34,150 --> 00:40:39,260
In analyzing the data, this
suggests that we basically
558
00:40:39,260 --> 00:40:41,380
need to accommodate that
non-stationarity when
559
00:40:41,380 --> 00:40:43,150
we specify the models.
560
00:40:46,925 --> 00:40:49,130
Let me just see if
there's some results here.
561
00:41:55,180 --> 00:41:59,060
For this series,
actually the case notes
562
00:41:59,060 --> 00:42:01,270
will go through actually
conducting this Johansen
563
00:42:01,270 --> 00:42:03,360
procedure for
testing for the rank
564
00:42:03,360 --> 00:42:05,700
of the cointegrated process.
565
00:42:05,700 --> 00:42:11,630
And that test basically has
different test statistics
566
00:42:11,630 --> 00:42:15,260
for testing whether the rank is
0, less than or equal to 1,
567
00:42:15,260 --> 00:42:16,870
or less than or equal to 2.
568
00:42:16,870 --> 00:42:19,650
And one can see that
there's marginal-- the test
569
00:42:19,650 --> 00:42:25,930
statistic is almost
significant at the 10% level
570
00:42:25,930 --> 00:42:29,780
for the overall series.
571
00:42:29,780 --> 00:42:32,670
It's not significant
for the rank
572
00:42:32,670 --> 00:42:34,460
being less than or equal to 1.
573
00:42:34,460 --> 00:42:38,390
And so these results
don't suggest there's
574
00:42:38,390 --> 00:42:40,880
strong evidence of cointegration.
575
00:42:40,880 --> 00:42:45,360
But certainly any
cointegration
576
00:42:45,360 --> 00:42:48,620
is of rank no more than
one for these series.
577
00:42:48,620 --> 00:42:52,030
And the eigenvector
corresponding
578
00:42:52,030 --> 00:42:54,070
to the stationary
relationship is
579
00:42:54,070 --> 00:43:00,940
given by these coefficients
of 1 on the crude oil future,
580
00:43:00,940 --> 00:43:05,710
1.3 on the RBOB and minus
1.7 on the heating oil.
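[A sketch of the Johansen procedure in R using the urca package; prices is a placeholder for the matrix of the three per-barrel futures series, and the lag order K is an assumption.]

```r
library(urca)

# Trace test for the cointegrating rank: statistics for r = 0,
# r <= 1, and r <= 2 against their critical values, plus the
# estimated eigenvectors (candidate cointegrating vectors).
jo <- ca.jo(prices, type = "trace", ecdet = "const", K = 2)
summary(jo)
```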
581
00:43:08,640 --> 00:43:13,360
So what this suggests
is that there's
582
00:43:13,360 --> 00:43:20,880
considerable variability in
these energy futures contracts.
583
00:43:20,880 --> 00:43:24,390
What appears to be stationary
is some linear combination
584
00:43:24,390 --> 00:43:28,670
of crude plus gasoline
minus heating oil.
585
00:43:28,670 --> 00:43:33,090
And in terms of why does
it combine that way,
586
00:43:33,090 --> 00:43:35,280
well, there are all
kinds of factors
587
00:43:35,280 --> 00:43:38,760
that we went through-- cost of
refining, supply and demand,
588
00:43:38,760 --> 00:43:41,370
seasonality, which
affect things.
589
00:43:41,370 --> 00:43:45,970
And so when analyzed, sort
of ignoring seasonality,
590
00:43:45,970 --> 00:43:50,000
these would be the linear
combinations that appear
591
00:43:50,000 --> 00:43:51,312
to be stationary over time.
592
00:43:51,312 --> 00:43:51,812
Yeah?
593
00:43:53,722 --> 00:43:55,680
AUDIENCE: Why did you
choose to use the futures
594
00:43:55,680 --> 00:43:56,929
prices as opposed to the spot?
595
00:43:56,929 --> 00:44:00,170
And how did you combine the
data with actual [INAUDIBLE]?
596
00:44:00,170 --> 00:44:07,820
PROFESSOR: I chose this
because if refiners are wanting
597
00:44:07,820 --> 00:44:12,130
to hedge their risks, then they
will go to the futures market
598
00:44:12,130 --> 00:44:14,060
to hedge those.
599
00:44:14,060 --> 00:44:17,090
And so working with
these data, one
600
00:44:17,090 --> 00:44:24,370
can then consider problems of
hedging refinery production
601
00:44:24,370 --> 00:44:25,460
risks.
602
00:44:25,460 --> 00:44:28,620
And so that's why.
603
00:44:28,620 --> 00:44:30,960
AUDIENCE: [INAUDIBLE]
604
00:44:30,960 --> 00:44:33,800
PROFESSOR: OK, well, the Energy
Information Administration
605
00:44:33,800 --> 00:44:39,270
provides historical data
which gives the first month,
606
00:44:39,270 --> 00:44:42,030
the second month, the third
month available for each
607
00:44:42,030 --> 00:44:43,400
of these contracts.
608
00:44:43,400 --> 00:44:47,720
And so I chose the
first month contract
609
00:44:47,720 --> 00:44:49,680
for each of these futures.
610
00:44:49,680 --> 00:44:51,980
Those tend to be the most liquid.
611
00:44:51,980 --> 00:44:54,440
Depending on what
one is hedging,
612
00:44:54,440 --> 00:44:58,550
one would use perhaps
longer periods for those.
613
00:44:58,550 --> 00:45:02,450
There's some very
nice finance problems
614
00:45:02,450 --> 00:45:04,690
dealing with hedging,
hedging these kinds of risks,
615
00:45:04,690 --> 00:45:07,150
as well as trading
these kinds of risks.
616
00:45:07,150 --> 00:45:11,030
Traders can try to exploit
short term movements in these.
617
00:45:29,870 --> 00:45:31,820
Anyway, I'll let you
look through these,
618
00:45:31,820 --> 00:45:32,760
the case note later.
619
00:45:32,760 --> 00:45:36,810
And it does provide some detail
on the coefficient estimates.
620
00:45:36,810 --> 00:45:39,119
And one can basically
get a handle
621
00:45:39,119 --> 00:45:40,785
on how these things
are being specified.
622
00:45:43,980 --> 00:45:46,170
So let's go back.
623
00:45:58,260 --> 00:46:06,490
The next topic I want to cover
is linear state-space models.
624
00:46:06,490 --> 00:46:12,725
It turns out that many
of these time series
625
00:46:12,725 --> 00:46:15,090
models appropriate in
economics and finance
626
00:46:15,090 --> 00:46:20,290
can be expressed as a
linear state-space model.
627
00:46:28,590 --> 00:46:32,250
I'm going to introduce the
general notation first and then
628
00:46:32,250 --> 00:46:35,100
provide illustrations
of this general notation
629
00:46:35,100 --> 00:46:38,480
with a number of
different examples.
630
00:46:38,480 --> 00:46:46,205
So the formulation is we have
basically an observation vector
631
00:46:46,205 --> 00:46:47,420
at time t, y_t.
632
00:46:47,420 --> 00:46:50,730
This is our multivariate time
series that we're modeling.
633
00:46:50,730 --> 00:46:53,930
Now, I've chosen it
to be k-dimensional
634
00:46:53,930 --> 00:46:57,900
for the observations.
635
00:46:57,900 --> 00:47:00,720
There's an underlying
state vector
636
00:47:00,720 --> 00:47:04,390
that's of m dimensions,
which basically characterizes
637
00:47:04,390 --> 00:47:11,740
the state of the
process at time t.
638
00:47:11,740 --> 00:47:15,240
There's an observation error
vector at time t, epsilon_t.
639
00:47:15,240 --> 00:47:18,830
So it's k by 1 as well,
corresponding to y.
640
00:47:18,830 --> 00:47:22,200
And there's a state transition
innovation error vector,
641
00:47:22,200 --> 00:47:31,240
which is n by 1,
which actually can
642
00:47:31,240 --> 00:47:36,040
be different from m, the
dimension of the state vector.
643
00:47:36,040 --> 00:47:41,300
So we have-- in the state
space specification,
644
00:47:41,300 --> 00:47:43,720
we're going to specify
two equations, one
645
00:47:43,720 --> 00:47:47,640
for how the states evolve
over time and another for how
646
00:47:47,640 --> 00:47:50,090
the observations or
measurements evolve,
647
00:47:50,090 --> 00:47:51,910
depending on the
underlying states.
648
00:47:51,910 --> 00:47:55,400
So let's first focus
on a state equation
649
00:47:55,400 --> 00:47:58,490
which describes how
the state progresses
650
00:47:58,490 --> 00:48:05,680
from the state at time t to
the state at time t plus 1.
651
00:48:05,680 --> 00:48:09,030
Because this is a linear
state-space model,
652
00:48:09,030 --> 00:48:10,710
basically the state
at t plus 1 is
653
00:48:10,710 --> 00:48:13,400
going to be some linear
function of the states at time
654
00:48:13,400 --> 00:48:16,640
t plus some noise.
655
00:48:16,640 --> 00:48:22,570
And that noise is
given by eta_t,
656
00:48:22,570 --> 00:48:26,670
being independent identically
distributed white noise,
657
00:48:26,670 --> 00:48:31,600
or normally distributed
with some covariance matrix
658
00:48:31,600 --> 00:48:33,910
Q_t, positive definite.
659
00:48:33,910 --> 00:48:37,740
And R_t is some
linear transformation
660
00:48:37,740 --> 00:48:41,180
of those, which
characterize the uncertainty
661
00:48:41,180 --> 00:48:42,880
in the particular states.
662
00:48:42,880 --> 00:48:45,160
So there's a great
deal of flexibility
663
00:48:45,160 --> 00:48:47,830
here in how things
depend on each other.
664
00:48:47,830 --> 00:48:53,090
And right now, it will appear
just like a lot of notation.
665
00:48:53,090 --> 00:48:54,700
But as we see it
in different cases,
666
00:48:54,700 --> 00:48:57,750
you'll see how these
terms come into play.
667
00:48:57,750 --> 00:48:59,260
And they're very
straightforward.
668
00:49:02,510 --> 00:49:04,800
So we're considering simple
linear transformations
669
00:49:04,800 --> 00:49:07,080
of the states plus noise.
670
00:49:07,080 --> 00:49:09,690
And then the observation
equation or measurement
671
00:49:09,690 --> 00:49:13,080
equation is a linear
transformation
672
00:49:13,080 --> 00:49:14,665
of the underlying
states plus noise.
673
00:49:17,230 --> 00:49:20,230
So the matrix Z_t is the
observation coefficients
674
00:49:20,230 --> 00:49:21,500
matrix.
675
00:49:21,500 --> 00:49:25,792
And the noise or innovations
epsilon_t are, we'll assume,
676
00:49:25,792 --> 00:49:27,250
independent
identically distributed
677
00:49:27,250 --> 00:49:29,083
normal, multivariate
normal random variables
678
00:49:29,083 --> 00:49:33,550
with some covariance matrix H_t.
679
00:49:33,550 --> 00:49:35,760
To be fully general,
the subscript t
680
00:49:35,760 --> 00:49:40,800
means the covariance
can depend on time t.
681
00:49:40,800 --> 00:49:44,780
It doesn't have to, but it can.
682
00:49:44,780 --> 00:49:48,600
These two equations
can be written together
683
00:49:48,600 --> 00:49:52,830
in a joint equation where
we see that the underlying
684
00:49:52,830 --> 00:49:59,370
state at time t, s, gets
transformed with T sub t
685
00:49:59,370 --> 00:50:04,550
to the state at t plus 1 plus
residual innovation term.
686
00:50:04,550 --> 00:50:08,720
And the observation equation
y_t is Z_t s_t plus that.
687
00:50:08,720 --> 00:50:12,430
So we're representing how
the states evolve over time
688
00:50:12,430 --> 00:50:14,910
and how the observations
depend on the underlying
689
00:50:14,910 --> 00:50:16,815
states in this joint equation.
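[The two equations side by side:]

```latex
s_{t+1} = T_t\, s_t + R_t\, \eta_t, \qquad \eta_t \sim N(0, Q_t)
  \quad \text{(state equation)},
\qquad
y_t = Z_t\, s_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t)
  \quad \text{(observation equation)}.
```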
690
00:50:19,770 --> 00:50:23,950
And the structure of
basically this sort
691
00:50:23,950 --> 00:50:28,400
of linear function of states
plus error, the error term u_t
692
00:50:28,400 --> 00:50:33,740
here is normally distributed
with covariance matrix omega,
693
00:50:33,740 --> 00:50:36,690
which has this structure.
694
00:50:36,690 --> 00:50:38,850
It's a block diagonal.
695
00:50:38,850 --> 00:50:42,942
We have the covariance
of the epsilons as the H.
696
00:50:42,942 --> 00:50:48,860
And the covariance of R_t
eta_t is R_t Q_t R_t transpose.
697
00:50:48,860 --> 00:50:54,660
So you may recall when we
take a covariance matrix
698
00:50:54,660 --> 00:51:01,210
of a linear function of random
variables given by a matrix,
699
00:51:01,210 --> 00:51:05,310
then it's that linear function
R times the covariance matrix
700
00:51:05,310 --> 00:51:07,970
times the transpose.
701
00:51:07,970 --> 00:51:12,910
So that term comes into play.
702
00:51:12,910 --> 00:51:16,860
So let's see how a
capital asset pricing
703
00:51:16,860 --> 00:51:19,720
model with time-varying
betas can be represented
704
00:51:19,720 --> 00:51:21,540
as a linear state-space model.
705
00:51:24,220 --> 00:51:29,180
You'll recall, we discussed
this model a few lectures ago,
706
00:51:29,180 --> 00:51:33,870
where we have the excess
return of a given stock, r_t,
707
00:51:33,870 --> 00:51:39,150
is a linear function of the
excess return of the market
708
00:51:39,150 --> 00:51:43,710
portfolio, r_(m,t), plus error.
709
00:51:43,710 --> 00:51:48,310
What we're going to do now
is extend that previous model
710
00:51:48,310 --> 00:51:54,170
by adding time dependence, t,
to the regression parameters.
711
00:51:54,170 --> 00:51:56,320
The alpha is not a constant.
712
00:51:56,320 --> 00:51:58,060
It is going to vary by time.
713
00:51:58,060 --> 00:52:02,700
And the beta is also
going to vary by time.
714
00:52:02,700 --> 00:52:04,810
And how will they vary by time?
715
00:52:04,810 --> 00:52:10,030
Well, we're going to
assume that the alpha_t is
716
00:52:10,030 --> 00:52:13,520
a Gaussian random walk.
717
00:52:13,520 --> 00:52:17,982
And the beta is also a
Gaussian random walk.
718
00:52:28,810 --> 00:52:33,670
And with that set up, we
have the following expression
719
00:52:33,670 --> 00:52:35,450
for the state equation.
720
00:52:35,450 --> 00:52:38,460
OK, the state equation, which
is just the unknown parameters--
721
00:52:38,460 --> 00:52:40,990
it's the alpha and the
beta at a given time t.
722
00:52:43,660 --> 00:52:45,720
The state at time
t gets adjusted
723
00:52:45,720 --> 00:52:49,340
to the state at time t plus 1
by just adding these random walk
724
00:52:49,340 --> 00:52:50,100
terms to it.
725
00:52:50,100 --> 00:52:52,290
So it's a very simple process.
726
00:52:52,290 --> 00:52:55,270
We have the identity
times the previous state
727
00:52:55,270 --> 00:52:58,930
plus the identity times this
vector of these innovations.
728
00:52:58,930 --> 00:53:04,120
So s_(t+1) is equal to
T_t s_t plus R_t eta_t,
729
00:53:04,120 --> 00:53:08,720
where this matrix, T sub
t and R sub t are trivial;
730
00:53:08,720 --> 00:53:10,290
they're just the identity.
731
00:53:10,290 --> 00:53:15,710
And eta_t has a
covariance matrix
732
00:53:15,710 --> 00:53:18,985
which is just given by
Q_t, sigma squared nu,
733
00:53:18,985 --> 00:53:22,560
sigma squared epsilon.
734
00:53:22,560 --> 00:53:28,680
This is a complex way, perhaps,
of representing this model.
735
00:53:28,680 --> 00:53:32,610
But it puts this simple model
into that linear state-space
736
00:53:32,610 --> 00:53:33,110
framework.
737
00:53:36,670 --> 00:53:45,660
Now, the observation equation
is given by this expression
738
00:53:45,660 --> 00:53:52,250
defining the Z_t matrix as the
unit element and r_(m,t). So
739
00:53:52,250 --> 00:53:58,150
it's basically a row vector, or
a row matrix, one-row matrix.
740
00:53:58,150 --> 00:54:02,180
And epsilon_t is the
white noise process.
741
00:54:02,180 --> 00:54:05,570
Now, putting these
equations together,
742
00:54:05,570 --> 00:54:09,270
we basically have the equation
for the state transition
743
00:54:09,270 --> 00:54:13,230
and the observation
equation together.
744
00:54:13,230 --> 00:54:16,120
We have this form for that.
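To make that concrete, here is a small R sketch of this state-space setup (all variance values and the market series are invented for illustration): the state is (alpha_t, beta_t), the transition and R matrices are identities, and Z_t = (1, r_(m,t)) varies with each observation.

# Simulate the CAPM with random-walk alpha_t and beta_t.
set.seed(2)
n <- 250
r_m <- rnorm(n, 0.0003, 0.01)              # hypothetical market excess returns
sd_alpha <- 1e-4; sd_beta <- 1e-3          # random-walk innovation std devs
sd_eps <- 0.01                             # observation noise std dev
alpha <- cumsum(rnorm(n, 0, sd_alpha))     # alpha_t: Gaussian random walk
beta  <- 1 + cumsum(rnorm(n, 0, sd_beta))  # beta_t: Gaussian random walk
r <- alpha + beta * r_m + rnorm(n, 0, sd_eps)  # observation equation
# State-space pieces: s_t = (alpha_t, beta_t)', T_t = R_t = I,
# Q_t = diag(sd_alpha^2, sd_beta^2), and Z_t is row t of cbind(1, r_m).
T_t <- diag(2)
Q_t <- diag(c(sd_alpha^2, sd_beta^2))
Z   <- cbind(1, r_m)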
745
00:54:25,780 --> 00:54:28,522
So now, let's
consider a second case
746
00:54:28,522 --> 00:54:31,360
of linear regression
models where
747
00:54:31,360 --> 00:54:33,780
we have a time-varying beta.
748
00:54:33,780 --> 00:54:37,140
In a way, this case
we just looked at
749
00:54:37,140 --> 00:54:39,999
is a simple case of that.
750
00:54:39,999 --> 00:54:41,540
But let's look at
a more general case
751
00:54:41,540 --> 00:54:45,270
where we have p independent
variables, which
752
00:54:45,270 --> 00:54:47,190
could be time-varying.
753
00:54:47,190 --> 00:54:51,670
So we have a
regression model almost
754
00:54:51,670 --> 00:54:54,040
as we've considered
it previously.
755
00:54:54,040 --> 00:54:58,400
y_t is equal to x_t transpose
beta_t plus epsilon_t.
756
00:54:58,400 --> 00:55:00,850
The difference now is our
regression coefficients
757
00:55:00,850 --> 00:55:03,580
beta are allowed to
change over time.
758
00:55:09,880 --> 00:55:11,180
How do they change over time?
759
00:55:11,180 --> 00:55:14,120
Well, we're going to
assume that those also
760
00:55:14,120 --> 00:55:19,120
follow independent random
walks with variances
761
00:55:19,120 --> 00:55:23,090
of the random walks that
may depend on the component.
762
00:55:23,090 --> 00:55:24,770
So the joint
state-space equation
763
00:55:24,770 --> 00:55:32,530
here is given by the identity
times s_t plus eta_t.
764
00:55:32,530 --> 00:55:36,360
That's basically the random
walk process for the underlying
765
00:55:36,360 --> 00:55:37,600
regression parameters.
766
00:55:37,600 --> 00:55:42,360
And y_t is equal
to x_t transpose
767
00:55:42,360 --> 00:55:46,081
times the same regression
parameters plus the observation
768
00:55:46,081 --> 00:55:46,580
error.
769
00:55:56,480 --> 00:55:59,770
I guess needless to say, if we
consider the special case where
770
00:55:59,770 --> 00:56:04,610
the random walk
process is degenerate
771
00:56:04,610 --> 00:56:07,320
and they're basically
steps of size zero,
772
00:56:07,320 --> 00:56:10,410
then we get the normal linear
regression model coming out
773
00:56:10,410 --> 00:56:11,870
of this.
774
00:56:11,870 --> 00:56:17,950
If we specify
the linear state-space
775
00:56:17,950 --> 00:56:22,810
implementation of this model and
consider successive estimates
776
00:56:22,810 --> 00:56:25,270
of the model
parameters over time,
777
00:56:25,270 --> 00:56:28,970
then these equations would
give us recursive estimates
778
00:56:28,970 --> 00:56:34,080
for updating
regressions as we add
779
00:56:34,080 --> 00:56:37,500
additional values to the
data, additional observations
780
00:56:37,500 --> 00:56:38,000
to the data.
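As a sketch of that point (the function and variable names here are my own): when the random-walk innovations are shut off, the Kalman updating step reduces to recursive least squares, refitting the regression one observation at a time.

# Recursive least squares: Kalman updating with a degenerate state equation,
# s_{t+1} = s_t, so the coefficients are constant and only the updates remain.
rls <- function(y, X, sigma2_eps = 1, C0 = diag(1e6, ncol(X))) {
  m <- rep(0, ncol(X)); C <- C0
  for (t in seq_along(y)) {
    x   <- X[t, ]
    F_t <- c(t(x) %*% C %*% x) + sigma2_eps   # predictive variance of y_t
    K   <- (C %*% x) / F_t                    # gain vector
    m   <- drop(m + K * (y[t] - sum(x * m)))  # revise coefficient estimates
    C   <- C - K %*% t(x) %*% C               # shrink their covariance
  }
  list(coef = m, cov = C)
}
# With a diffuse C0, the final estimate matches ordinary least squares:
# rls(y, cbind(1, x))$coef is essentially coef(lm(y ~ x)).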
781
00:56:43,880 --> 00:56:49,960
Let's look at autoregressive
models of order p.
782
00:56:49,960 --> 00:56:55,780
The autoregressive model of
order p for a univariate time
783
00:56:55,780 --> 00:57:01,670
series has the setup given here.
784
00:57:01,670 --> 00:57:07,470
It's a polynomial
lag of the response
785
00:57:07,470 --> 00:57:10,940
variable y_t is equal to
the innovation epsilon_t.
786
00:57:10,940 --> 00:57:16,130
And we can define
the state vector
787
00:57:16,130 --> 00:57:24,980
to be equal to the vector of
p values, p successive values
788
00:57:24,980 --> 00:57:27,650
of the process.
789
00:57:27,650 --> 00:57:33,710
And so we basically
get a combination
790
00:57:33,710 --> 00:57:38,700
here of the observation equation
and state equation joining
791
00:57:38,700 --> 00:57:46,720
where basically
one of the states
792
00:57:46,720 --> 00:57:48,760
is actually equal
to the observation.
793
00:57:48,760 --> 00:57:52,600
And basically, with
this definition
794
00:57:52,600 --> 00:57:59,160
for a state of the vector
at the next time point t,
795
00:57:59,160 --> 00:58:03,730
that is equal to this
linear transformation
796
00:58:03,730 --> 00:58:09,114
of the lagged state vector
plus that innovation term.
797
00:58:09,114 --> 00:58:10,608
I dropped the mic.
798
00:58:16,600 --> 00:58:21,480
So the notation here
shows the structure
799
00:58:21,480 --> 00:58:26,240
for how this linear
state-space model is evolving.
800
00:58:26,240 --> 00:58:29,090
Basically, the
observation equation
801
00:58:29,090 --> 00:58:32,410
is the linear
combination of the phi
802
00:58:32,410 --> 00:58:36,500
multiples of lags of the
values plus the residual.
803
00:58:36,500 --> 00:58:40,240
And the previous
lags of the states
804
00:58:40,240 --> 00:58:46,200
are just simply the identities
times those values, shifted.
805
00:58:46,200 --> 00:58:51,690
So it's a very simple structure
for the autoregressive process
806
00:58:51,690 --> 00:58:53,431
as a linear state-space model.
807
00:58:56,660 --> 00:59:02,470
We have, as I was just saying,
for the transition matrix T sub
808
00:59:02,470 --> 00:59:09,750
t, this matrix. And the
observation equation
809
00:59:09,750 --> 00:59:13,730
is essentially picking out
the first element of the state
810
00:59:13,730 --> 00:59:16,540
vector, which has no
measurement error.
811
00:59:16,540 --> 00:59:18,490
So that simplifies that.
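Here is that structure in R for a small example (the coefficients are arbitrary): the phi's fill the first row of the transition matrix, a shifted identity block carries the lagged values, and only the first state receives the innovation.

# Companion-form transition matrix for an AR(p), p >= 2,
# with state s_t = (y_t, y_{t-1}, ..., y_{t-p+1})'.
ar_companion <- function(phi) {
  p <- length(phi)
  T_mat <- rbind(phi, cbind(diag(p - 1), rep(0, p - 1)))
  unname(T_mat)
}
phi   <- c(0.5, -0.2, 0.1)                # arbitrary AR(3) coefficients
T_mat <- ar_companion(phi)
R_vec <- c(1, rep(0, length(phi) - 1))    # innovation enters the first state only
# s_{t+1} = T_mat %*% s_t + R_vec * eta_t; y_t = s_t[1], with no obs. noise.
# Stationarity check: all eigenvalues of T_mat lie inside the unit circle.
Mod(eigen(T_mat)$values)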
812
00:59:21,940 --> 00:59:27,210
The moving average
model of order q
813
00:59:27,210 --> 00:59:29,700
could also be expressed as
a linear state-space model.
814
00:59:37,240 --> 00:59:38,820
Remember, the
moving average model
815
00:59:38,820 --> 00:59:43,030
is one where our response
variable, y, is simply
816
00:59:43,030 --> 00:59:48,290
some linear combination
of innovations,
817
00:59:48,290 --> 00:59:50,500
q past innovations.
818
00:59:50,500 --> 00:59:55,350
And if we consider
819
00:59:55,350 --> 01:00:00,180
the state vector just
being basically q
820
01:00:00,180 --> 01:00:04,400
lags of the innovations,
then the transition
821
01:00:04,400 --> 01:00:08,780
of those underlying states is
given by this expression here.
822
01:00:14,690 --> 01:00:17,770
And we have a state equation,
an observation equation,
823
01:00:17,770 --> 01:00:23,500
which has these forms for these
various transition matrices
824
01:00:23,500 --> 01:00:30,615
and for how the innovation
terms are related.
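And the matching construction for the MA(q) in R (the theta values are arbitrary): the state stacks the current innovation and its q lags, the transition matrix simply shifts them down, and the observation row applies (1, theta_1, ..., theta_q).

# State-space matrices for an MA(q), with state s_t = (eps_t, ..., eps_{t-q})'.
theta <- c(0.4, 0.25)                       # arbitrary MA(2) coefficients
q     <- length(theta)
T_mat <- rbind(rep(0, q + 1),               # new innovation replaces the top slot
               cbind(diag(q), rep(0, q)))   # older innovations shift down one place
R_vec <- c(1, rep(0, q))                    # eta_{t+1} enters the first state
Z_vec <- c(1, theta)                        # y_t = Z_vec %*% s_t, with no obs. noise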
825
01:00:40,840 --> 01:00:43,160
Let me just finish
up with an example
826
01:00:43,160 --> 01:00:47,780
showing the autoregressive
moving average model.
827
01:00:47,780 --> 01:00:49,340
And many years ago,
it was actually
828
01:00:49,340 --> 01:00:55,490
very difficult to
specify the estimation
829
01:00:55,490 --> 01:00:58,902
methods for autoregressive
moving average models.
830
01:00:58,902 --> 01:01:00,800
But the implementation
of these models
831
01:01:00,800 --> 01:01:05,590
as linear state-space models
facilitated that greatly.
832
01:01:05,590 --> 01:01:13,030
And with the ARMA model,
the setup basically
833
01:01:13,030 --> 01:01:14,730
is a combination of
the autoregressive and
834
01:01:14,730 --> 01:01:16,900
moving average processes.
835
01:01:16,900 --> 01:01:20,280
We have an
autoregression of the y's
836
01:01:20,280 --> 01:01:24,719
is equal to a moving
average of the residuals
837
01:01:24,719 --> 01:01:25,510
or the innovations.
838
01:01:28,170 --> 01:01:32,550
And it's convenient in the setup
for linear state-space models
839
01:01:32,550 --> 01:01:37,720
to define the dimension m,
which is the maximum of p and q
840
01:01:37,720 --> 01:01:45,860
plus 1, that is, m = max(p, q + 1), and think
of having basically an order-m
841
01:01:45,860 --> 01:01:50,860
polynomial lag for each
of those two series.
842
01:01:50,860 --> 01:01:55,060
And we can basically
constrain those values
843
01:01:55,060 --> 01:01:59,134
to be 0: phi_j is 0 for j greater than
p, and theta_j is 0 for j greater than q.
844
01:02:06,880 --> 01:02:11,240
And Harvey, in a very
important work in '93,
845
01:02:11,240 --> 01:02:17,080
actually defined a particular
state-space representation
846
01:02:17,080 --> 01:02:19,350
for this process.
847
01:02:19,350 --> 01:02:20,980
And I guess it's
important to know
848
01:02:20,980 --> 01:02:24,310
that with these linear
state-space models,
849
01:02:24,310 --> 01:02:29,030
we're dealing with
characterizing structure
850
01:02:29,030 --> 01:02:31,750
in m-dimensional space.
851
01:02:31,750 --> 01:02:35,510
There's often some choice in how
you represent your underlying
852
01:02:35,510 --> 01:02:37,670
states.
853
01:02:37,670 --> 01:02:42,430
You can basically
re-parametrize the models
854
01:02:42,430 --> 01:02:47,080
by considering invertible
linear transformations
855
01:02:47,080 --> 01:02:49,760
of the underlying states.
856
01:02:49,760 --> 01:02:52,820
So let me go back here.
857
01:02:56,700 --> 01:02:59,990
We express the state
equation generally
858
01:02:59,990 --> 01:03:04,190
as T sub t s_t plus R_t eta_t.
859
01:03:04,190 --> 01:03:08,540
This matrix T sub t
and s_t-- basically, s_t
860
01:03:08,540 --> 01:03:11,280
can be replaced by an invertible linear
transformation M of s_t,
861
01:03:11,280 --> 01:03:16,730
so long as we conjugate T sub t
by M, replacing it with M T_t times the inverse
862
01:03:16,730 --> 01:03:17,850
of that transformation.
863
01:03:17,850 --> 01:03:19,810
So there's flexibility
in the choice
864
01:03:19,810 --> 01:03:22,340
of our linear state-space
specification.
865
01:03:22,340 --> 01:03:28,820
And so there really are many
different equivalent linear
866
01:03:28,820 --> 01:03:33,380
state-space models for a
given process depending
867
01:03:33,380 --> 01:03:35,600
on exactly how you
define the states
868
01:03:35,600 --> 01:03:39,490
and the underlying
transformation matrix T.
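A quick numerical illustration of that equivalence in R (the matrices are invented for the example): transform the state by any invertible M, conjugate T, and the observable output is unchanged.

# Re-parametrize a state-space model: s*_t = M s_t, T* = M T M^{-1},
# Z* = Z M^{-1}; both parametrizations imply the same observations.
T_mat <- matrix(c(0.9, 0, 0.3, 0.5), 2, 2)
Z     <- matrix(c(1, 0), 1, 2)
M     <- matrix(c(1, 1, 0, 2), 2, 2)      # any invertible transformation
T_star <- M %*% T_mat %*% solve(M)
Z_star <- Z %*% solve(M)
s <- c(1, -1); s_star <- M %*% s
Z %*% (T_mat %*% s)                       # one step of the original system
Z_star %*% (T_star %*% s_star)            # identical observation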
869
01:03:39,490 --> 01:03:44,900
And the beauty of Harvey's
work was coming up
870
01:03:44,900 --> 01:03:47,490
with a nice representation
for the states,
871
01:03:47,490 --> 01:03:53,100
where we had very simple forms
for the various matrices.
872
01:03:53,100 --> 01:03:57,000
And the lecture notes here
go through the derivation
873
01:03:57,000 --> 01:03:59,430
of that for the ARMA process.
874
01:03:59,430 --> 01:04:04,490
And this derivation
is-- I just want
875
01:04:04,490 --> 01:04:08,240
to go through the
first case just
876
01:04:08,240 --> 01:04:11,020
to highlight how
the argument goes.
877
01:04:11,020 --> 01:04:15,090
We basically have this equation,
which is the original equation
878
01:04:15,090 --> 01:04:17,345
for an ARMA(p,q) process.
879
01:04:20,180 --> 01:04:25,810
And Harvey says, well,
define the first--
880
01:04:25,810 --> 01:04:29,460
or the first state at time t, to
be equal to the observation
881
01:04:29,460 --> 01:04:31,820
at time t.
882
01:04:31,820 --> 01:04:38,250
If we do that, then how
does this equation relate
883
01:04:38,250 --> 01:04:46,000
to the state equation? Basically, the
state at the next time point, t
884
01:04:46,000 --> 01:04:50,610
plus 1, is equal to phi_1
times the state at time t,
885
01:04:50,610 --> 01:05:00,340
plus a second state at time
t, plus a residual innovation
886
01:05:00,340 --> 01:05:01,420
eta_t.
887
01:05:01,420 --> 01:05:09,110
So by choosing the first state
to be the observation value
888
01:05:09,110 --> 01:05:16,680
at that time, we can then
solve for the second state,
889
01:05:16,680 --> 01:05:19,810
which is given by
this expression,
890
01:05:19,810 --> 01:05:25,730
just by rewriting our model
equation in terms of s_(1,t),
891
01:05:25,730 --> 01:05:27,880
s_(2,t) and eta_t.
892
01:05:27,880 --> 01:05:36,950
So this s_(2,t) is this function
of the observations and eta_t.
893
01:05:36,950 --> 01:05:39,440
So it's a very
simple specification
894
01:05:39,440 --> 01:05:41,820
of the second state.
895
01:05:41,820 --> 01:05:48,020
Just what is that
second state element
896
01:05:48,020 --> 01:05:50,520
given this definition
of the first one?
897
01:05:50,520 --> 01:05:54,650
And one can do this
process iteratively
898
01:05:54,650 --> 01:05:59,180
getting rid of the
observations and replacing them
899
01:05:59,180 --> 01:06:01,290
by underlying states.
900
01:06:01,290 --> 01:06:03,770
And at the end of
the day, you end up
901
01:06:03,770 --> 01:06:09,490
with this very simple form
for the transition matrix T.
902
01:06:09,490 --> 01:06:13,950
Basically, the T has the
autoregressive components
903
01:06:13,950 --> 01:06:16,410
as the first column
of the T matrix.
904
01:06:16,410 --> 01:06:20,440
And this R matrix has
this vector of the moving
905
01:06:20,440 --> 01:06:22,550
average components.
906
01:06:22,550 --> 01:06:28,330
So it's a very nice way
to represent the model.
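A sketch of Harvey's construction in R (the helper name and example coefficients are mine): pad the phi's and theta's with zeros out to m = max(p, q + 1), put the phi's in the first column of T with an identity block beside them, and stack (1, theta_1, ..., theta_{m-1}) in R.

# Harvey's state-space form for an ARMA(p, q), with m = max(p, q + 1) >= 2.
harvey_arma <- function(phi, theta) {
  m       <- max(length(phi), length(theta) + 1)
  phi_m   <- c(phi, rep(0, m - length(phi)))           # phi_j = 0 for j > p
  theta_m <- c(theta, rep(0, m - 1 - length(theta)))   # theta_j = 0 for j > q
  T_mat   <- cbind(phi_m, rbind(diag(m - 1), rep(0, m - 1)))
  list(T = unname(T_mat),          # phi's in the first column, identity above
       R = c(1, theta_m),          # moving average weights on the innovation
       Z = c(1, rep(0, m - 1)))    # y_t is the first state, no obs. noise
}
harvey_arma(phi = c(0.6, -0.3), theta = 0.4)   # an ARMA(2,1) example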
907
01:06:28,330 --> 01:06:32,990
Coming up with it was something
very clever that he did.
908
01:06:32,990 --> 01:06:36,580
But what one can see is
that this basic model where
909
01:06:36,580 --> 01:06:41,620
you have the states
transitioning according
910
01:06:41,620 --> 01:06:45,540
to a linear transformation of
the previous state plus error,
911
01:06:45,540 --> 01:06:49,910
and the observation being some
function of the current states,
912
01:06:49,910 --> 01:06:54,119
plus error or not, depending
on the formulation,
913
01:06:54,119 --> 01:06:55,035
is the representation.
914
01:06:58,200 --> 01:07:03,770
Now, with all of
these models, a reason
915
01:07:03,770 --> 01:07:08,860
why linear state-space
modeling is in fact effective
916
01:07:08,860 --> 01:07:19,711
is that estimation and
inference are fully specified by the Kalman
917
01:07:19,711 --> 01:07:20,210
filter.
918
01:07:22,730 --> 01:07:32,100
So with this formulation of
linear state-space models,
919
01:07:32,100 --> 01:07:37,000
the Kalman filter
as a methodology is
920
01:07:37,000 --> 01:07:41,380
the recursive computation
of the probability density
921
01:07:41,380 --> 01:07:48,535
functions for the underlying
states at basically
922
01:07:48,535 --> 01:07:52,420
t plus 1 given
information up to time t,
923
01:07:52,420 --> 01:07:56,710
as well as the joint
density of the future state
924
01:07:56,710 --> 01:07:59,800
and the future observation at
t plus 1, given information up
925
01:07:59,800 --> 01:08:02,370
to time t.
926
01:08:02,370 --> 01:08:05,520
And also just the
marginal distribution
927
01:08:05,520 --> 01:08:10,380
of the next observation given
the information up to time t.
928
01:08:20,490 --> 01:08:26,510
So what I want to do is
just go through with you
929
01:08:26,510 --> 01:08:31,550
how the Kalman filter is
implemented and defined.
930
01:08:31,550 --> 01:08:35,370
And the implementation
of the Kalman filter
931
01:08:35,370 --> 01:08:40,939
requires us to have some
notation that's a bit involved,
932
01:08:40,939 --> 01:08:46,710
but we'll hopefully explain it
so it's very straightforward.
933
01:08:46,710 --> 01:08:49,474
There are basically conditional
means of the states.
934
01:08:52,090 --> 01:08:55,450
s sub t given t
is the mean value
935
01:08:55,450 --> 01:08:59,510
of the state at time t given
the information up to time t.
936
01:08:59,510 --> 01:09:02,069
If we condition
on t minus 1, then
937
01:09:02,069 --> 01:09:03,500
it's the expectation
of the state
938
01:09:03,500 --> 01:09:06,300
at time t given the
information up to t minus 1.
939
01:09:09,460 --> 01:09:12,100
And then y sub t given t minus
1 is the expectation
940
01:09:12,100 --> 01:09:16,880
of the observation given
information up to t minus 1.
941
01:09:16,880 --> 01:09:18,780
There's also
conditional covariances
942
01:09:18,780 --> 01:09:22,260
and mean squared errors.
943
01:09:22,260 --> 01:09:26,620
All these covariances
are denoted by omegas.
944
01:09:26,620 --> 01:09:33,240
The subscript corresponds to
states s, or observation y.
945
01:09:33,240 --> 01:09:35,060
And basically, the
conditioning set
946
01:09:35,060 --> 01:09:39,149
is either information up to
time t, or time t minus 1
947
01:09:39,149 --> 01:09:40,479
in the second case.
948
01:09:40,479 --> 01:09:45,370
And we want to compute
basically the covariance matrix
949
01:09:45,370 --> 01:09:49,999
of the states given whatever
the information is, information
950
01:09:49,999 --> 01:09:52,439
up to time t or t minus 1.
951
01:09:52,439 --> 01:09:57,810
So these covariance
matrices are the expectation
952
01:09:57,810 --> 01:10:01,990
of the state minus
their expectation
953
01:10:01,990 --> 01:10:06,850
under the conditioning times
the state minus the expectation
954
01:10:06,850 --> 01:10:07,950
transpose.
955
01:10:07,950 --> 01:10:10,810
That's the definition of
that covariance matrix.
956
01:10:10,810 --> 01:10:12,230
So the different
definitions here
957
01:10:12,230 --> 01:10:14,300
correspond to just
whether we're conditioning
958
01:10:14,300 --> 01:10:15,345
on different information.
959
01:10:17,900 --> 01:10:23,170
And then the observation
innovations or residuals
960
01:10:23,170 --> 01:10:29,510
are the difference
between an observation y_t
961
01:10:29,510 --> 01:10:33,847
and its estimate given
information up to t minus 1.
962
01:10:37,190 --> 01:10:41,370
So the residuals in this process
are the innovation residuals,
963
01:10:41,370 --> 01:10:44,200
one period ahead.
964
01:10:44,200 --> 01:10:50,780
And the Kalman filter
consists of four steps.
965
01:10:50,780 --> 01:11:00,800
We basically want to, first,
predict the state vector
966
01:11:00,800 --> 01:11:01,780
one step ahead.
967
01:11:01,780 --> 01:11:10,140
So given our estimate of the
state vector at time t minus 1,
968
01:11:10,140 --> 01:11:14,800
we want to predict this
state vector at time t.
969
01:11:14,800 --> 01:11:18,220
And we also want to
predict the observation
970
01:11:18,220 --> 01:11:23,820
at time t given our estimate
of the state vector at time t minus 1.
971
01:11:23,820 --> 01:11:31,674
And so at time t minus 1, we
can estimate these quantities.
972
01:11:31,674 --> 01:11:32,174
[INAUDIBLE]
973
01:11:35,646 --> 01:11:40,969
At t minus 1, we can
basically predict
974
01:11:40,969 --> 01:11:42,760
what the state is going to
be and predict what
975
01:11:42,760 --> 01:11:44,750
the observation is going to be.
976
01:11:44,750 --> 01:11:47,166
And we can estimate
how much error there's
977
01:11:47,166 --> 01:11:49,707
going to be in those estimates,
by these covariance matrices.
978
01:11:59,420 --> 01:12:05,140
The second step is
updating these predictions
979
01:12:05,140 --> 01:12:11,900
to get our estimate of the state
given the observation at time t
980
01:12:11,900 --> 01:12:15,480
and to update our uncertainty
about that state given
981
01:12:15,480 --> 01:12:16,380
this new observation.
982
01:12:16,380 --> 01:12:21,350
So basically, our estimate
of the state at time t
983
01:12:21,350 --> 01:12:25,310
is an adjustment to our
estimate given information up
984
01:12:25,310 --> 01:12:31,164
to t minus 1, plus a function of
the difference between what we
985
01:12:31,164 --> 01:12:32,455
observed and what we predicted.
986
01:12:35,020 --> 01:12:42,870
And the matrix multiplying that difference
is called the filter gain matrix.
987
01:12:42,870 --> 01:12:45,120
And basically, it
characterizes how
988
01:12:45,120 --> 01:12:50,070
do we adjust our prediction
of the underlying state
989
01:12:50,070 --> 01:12:52,760
depending on what happened.
990
01:12:52,760 --> 01:12:54,440
So that's the
filter gain matrix.
991
01:12:57,150 --> 01:13:00,470
So we actually do
gain information
992
01:13:00,470 --> 01:13:03,160
with each observation about what
the new value of the process
993
01:13:03,160 --> 01:13:04,320
is.
994
01:13:04,320 --> 01:13:06,830
And that information
is characterized
995
01:13:06,830 --> 01:13:09,190
by the filter gain matrix.
996
01:13:09,190 --> 01:13:11,580
You'll notice that
the uncertainty
997
01:13:11,580 --> 01:13:15,720
in the state at time t, this
omega_s of t given t, that's
998
01:13:15,720 --> 01:13:19,630
equal to the covariance
matrix given t minus 1, minus an adjustment.
999
01:13:19,630 --> 01:13:23,330
So it's our beginning level
of uncertainty adjusted
1000
01:13:23,330 --> 01:13:27,790
by a term that tells us
how much information did we
1001
01:13:27,790 --> 01:13:29,580
get from that new observation.
1002
01:13:29,580 --> 01:13:33,590
So notice that there's
a minus sign there.
1003
01:13:33,590 --> 01:13:35,600
We're basically
reducing our uncertainty
1004
01:13:35,600 --> 01:13:44,602
about the state given the
information in the innovation
1005
01:13:44,602 --> 01:13:45,685
that we now have observed.
1006
01:13:48,800 --> 01:13:51,870
Then, there's a
forecasting step which
1007
01:13:51,870 --> 01:13:59,310
is used to forecast the
state one period forward; that forecast
1008
01:13:59,310 --> 01:14:01,400
is simply given by this
linear transformation
1009
01:14:01,400 --> 01:14:03,170
of the previous state.
1010
01:14:03,170 --> 01:14:05,890
And we can also update
our covariance matrix
1011
01:14:05,890 --> 01:14:09,580
for future states given
the previous state
1012
01:14:09,580 --> 01:14:13,530
by applying this formula
which is a recursive formula
1013
01:14:13,530 --> 01:14:17,580
for estimating covariances.
1014
01:14:17,580 --> 01:14:24,760
So we have
forecasting algorithms
1015
01:14:24,760 --> 01:14:29,520
that are simple linear
functions of these estimates.
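The prediction and updating formulas above translate almost line-for-line into R. This is a minimal, univariate-observation sketch in base R (no packages; H denotes the observation-noise variance, a name I am supplying):

# One pass of the Kalman filter: predict, update, and accumulate the
# log-likelihood.  Model: s_{t+1} = T s_t + R eta_t, eta_t ~ N(0, Q);
# y_t = Z s_t + eps_t, eps_t ~ N(0, H).
kalman_filter <- function(y, T_mat, Z, Q, H, R_mat, m0, C0) {
  m <- m0; C <- C0; loglik <- 0
  for (t in seq_along(y)) {
    # Prediction step: s_{t|t-1} and Omega_{t|t-1}.
    m_pred <- T_mat %*% m
    C_pred <- T_mat %*% C %*% t(T_mat) + R_mat %*% Q %*% t(R_mat)
    # Innovation: observed y_t minus its one-step-ahead prediction.
    v   <- y[t] - c(Z %*% m_pred)
    F_t <- c(Z %*% C_pred %*% t(Z)) + H
    # Updating step: gain, revised state mean, reduced state covariance.
    K <- (C_pred %*% t(Z)) / F_t
    m <- m_pred + K * v
    C <- C_pred - K %*% Z %*% C_pred
    loglik <- loglik - 0.5 * (log(2 * pi) + log(F_t) + v^2 / F_t)
  }
  list(m = m, C = C, loglik = loglik)
}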
1016
01:14:29,520 --> 01:14:35,650
And then finally,
there's a smoothing step
1017
01:14:35,650 --> 01:14:43,960
which is characterizing
the conditional expectation
1018
01:14:43,960 --> 01:14:49,950
of underlying states, given
information in the whole time
1019
01:14:49,950 --> 01:14:51,150
series.
1020
01:14:51,150 --> 01:14:55,440
And so ordinarily, Kalman
filters
1021
01:14:55,440 --> 01:14:58,210
are applied
sequentially over time
1022
01:14:58,210 --> 01:15:01,090
where one basically
is predicting ahead
1023
01:15:01,090 --> 01:15:03,550
one step, updating
that prediction,
1024
01:15:03,550 --> 01:15:08,320
predicting ahead another
step, updating the information
1025
01:15:08,320 --> 01:15:10,930
on the states.
1026
01:15:10,930 --> 01:15:19,410
And that overall
process is also the basis
1027
01:15:19,410 --> 01:15:21,550
of actually computing
the likelihood
1028
01:15:21,550 --> 01:15:25,210
function for these linear
state-space models.
1029
01:15:25,210 --> 01:15:32,140
And so the Kalman filter is
ultimately applied
1030
01:15:32,140 --> 01:15:35,010
for successive
forecasting of the process
1031
01:15:35,010 --> 01:15:39,600
but also for helping us identify
what the underlying model
1032
01:15:39,600 --> 01:15:43,430
parameters are using
maximum likelihood methods.
1033
01:15:43,430 --> 01:15:48,290
And so the likelihood function
for the linear state-space
1034
01:15:48,290 --> 01:15:52,050
model is basically the--
or the log-likelihood
1035
01:15:52,050 --> 01:15:54,920
is the log-likelihood of
the entire data series,
1036
01:15:54,920 --> 01:15:56,980
given the unknown parameters.
1037
01:15:56,980 --> 01:16:00,020
But that can be
expressed as the product
1038
01:16:00,020 --> 01:16:04,290
of the conditional distributions
of each successive observation,
1039
01:16:04,290 --> 01:16:07,150
given the history.
1040
01:16:07,150 --> 01:16:09,750
And so basically, the
likelihood of theta
1041
01:16:09,750 --> 01:16:12,390
is the likelihood of
the first observation
1042
01:16:12,390 --> 01:16:15,240
times the density of the
second observation given
1043
01:16:15,240 --> 01:16:18,990
the first, and so
forth for the whole series.
1044
01:16:18,990 --> 01:16:22,650
And so the likelihood
function is basically
1045
01:16:22,650 --> 01:16:25,490
a function of all these
terms that we were computing
1046
01:16:25,490 --> 01:16:26,490
with the Kalman filter.
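Since the filter above already accumulates the innovations v_t and their variances F_t, maximum likelihood estimation is just an optimizer wrapped around it. A hypothetical AR(1) example using the kalman_filter sketch from before:

# Fit an AR(1), y_t = phi y_{t-1} + eta_t, by maximizing the likelihood
# that the Kalman filter computes via the prediction error decomposition.
set.seed(3)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 500))
negloglik <- function(par) {
  phi    <- tanh(par[1])                  # keeps |phi| < 1
  sigma2 <- exp(par[2])                   # keeps the variance positive
  -kalman_filter(y, T_mat = matrix(phi), Z = matrix(1, 1, 1),
                 Q = matrix(sigma2), H = 0, R_mat = matrix(1),
                 m0 = 0, C0 = matrix(sigma2 / (1 - phi^2)))$loglik
}
fit <- optim(c(0, 0), negloglik)
tanh(fit$par[1])                          # estimate of phi, close to 0.7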
1047
01:16:29,260 --> 01:16:33,470
And the Kalman
filter basically
1048
01:16:33,470 --> 01:16:36,760
provides all the terms
necessary for this estimation.
1049
01:16:36,760 --> 01:16:42,270
If the error terms are
normally distributed,
1050
01:16:42,270 --> 01:16:46,550
then the means and
variances of these estimates
1051
01:16:46,550 --> 01:16:52,750
are in fact characterizing
the exact distributions
1052
01:16:52,750 --> 01:16:54,300
of the process.
1053
01:16:54,300 --> 01:16:56,850
Basically, we're taking--
if the innovation series are
1054
01:16:56,850 --> 01:16:59,290
all normal random
variables, then
1055
01:16:59,290 --> 01:17:00,980
the linear
state-space model, all
1056
01:17:00,980 --> 01:17:03,750
it's doing is taking linear
combinations of normals
1057
01:17:03,750 --> 01:17:07,410
for the underlying states and
for the actual observations.
1058
01:17:07,410 --> 01:17:08,890
And normal
distributions are fully
1059
01:17:08,890 --> 01:17:10,610
characterized by
their mean vectors
1060
01:17:10,610 --> 01:17:12,310
and covariance matrices.
1061
01:17:12,310 --> 01:17:14,050
And the Kalman
filter provides a way
1062
01:17:14,050 --> 01:17:21,570
to update these distributions
for all these features
1063
01:17:21,570 --> 01:17:23,000
of a model, the
underlying states
1064
01:17:23,000 --> 01:17:26,520
as well as the distributions
of the observations.
1065
01:17:26,520 --> 01:17:35,250
So that's a brief introduction
to the Kalman filter.
1066
01:17:35,250 --> 01:17:36,940
Let's finish there.
1067
01:17:36,940 --> 01:17:38,490
Thank you.