1
00:00:00,090 --> 00:00:02,500
The following content is
provided under a Creative
2
00:00:02,500 --> 00:00:04,019
Commons license.
3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare
4
00:00:06,360 --> 00:00:10,730
continue to offer high-quality
educational resources for free.
5
00:00:10,730 --> 00:00:13,330
To make a donation or
view additional materials
6
00:00:13,330 --> 00:00:17,210
from hundreds of MIT courses,
visit MIT OpenCourseWare
7
00:00:17,210 --> 00:00:17,835
at ocw.mit.edu.
8
00:00:21,650 --> 00:00:24,030
PROFESSOR: We introduced
the data last time.
9
00:00:24,030 --> 00:00:27,700
These were some
macroeconomic variables
10
00:00:27,700 --> 00:00:33,990
that can be used for forecasting
the economy in terms of growth
11
00:00:33,990 --> 00:00:39,330
and factors such as
inflation or unemployment.
12
00:00:39,330 --> 00:00:44,020
The case note goes through
analyzing just three
13
00:00:44,020 --> 00:00:47,690
of these economic time
series-- the unemployment rate,
14
00:00:47,690 --> 00:00:51,360
the federal funds rate,
and a measure of the CPI,
15
00:00:51,360 --> 00:00:52,530
or Consumer Price Index.
16
00:00:56,450 --> 00:01:00,520
When one fits a vector
autoregression model
17
00:01:00,520 --> 00:01:08,940
to this data, it turns
out that the roots
18
00:01:08,940 --> 00:01:16,800
of the characteristic polynomial
are 1.002, then 0.9863.
19
00:01:16,800 --> 00:01:19,090
And you recall from our
discussion of vector
20
00:01:19,090 --> 00:01:23,140
autoregressive models, there's
a characteristic equation
21
00:01:23,140 --> 00:01:25,425
in matrix
form, where the determinant
22
00:01:25,425 --> 00:01:29,720
plays the same role as in the
univariate autoregressive case.
23
00:01:29,720 --> 00:01:44,120
And in order for the process
to be stationary, basically,
24
00:01:44,120 --> 00:01:46,150
the roots of the
characteristic polynomial
25
00:01:46,150 --> 00:01:50,370
need to be less
than 1 in magnitude.
26
00:01:50,370 --> 00:01:54,110
In this implementation of the
vector autoregression model,
27
00:01:54,110 --> 00:01:57,220
the characteristic
roots are the inverses
28
00:01:57,220 --> 00:01:59,620
of the characteristic roots
that we've been discussing.
29
00:01:59,620 --> 00:02:03,770
So anyway, this particular fit
of the vector autoregression
30
00:02:03,770 --> 00:02:11,370
model suggests that the
process is non-stationary.
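[A minimal sketch in R of this step, using the vars package; the data frame name macro and its column names are placeholders, not necessarily those in the case note.]

```r
# Fit a VAR(2) to the three macro series and inspect the
# characteristic roots reported by the package.
library(vars)

y   <- na.omit(macro[, c("UNRATE", "FEDFUNDS", "CPI")])
fit <- VAR(y, p = 2, type = "const")

# vars reports the moduli of the companion-matrix eigenvalues,
# i.e., the inverses of the characteristic roots discussed above;
# stationarity requires all of them to be strictly less than 1.
roots(fit)
```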
31
00:02:11,370 --> 00:02:17,580
And so one should be
considering different series
32
00:02:17,580 --> 00:02:20,400
to model this as a
stationary time series.
33
00:02:20,400 --> 00:02:26,520
But in terms of interpreting
the regression model,
34
00:02:26,520 --> 00:02:36,320
one can see-- to accommodate
the non-stationarity,
35
00:02:36,320 --> 00:02:41,020
we can take differences
of all the series
36
00:02:41,020 --> 00:02:43,360
and fit the vector
autoregression
37
00:02:43,360 --> 00:02:45,550
to the difference series.
38
00:02:45,550 --> 00:02:49,210
So one way of eliminating any
non-stationarity in time series
39
00:02:49,210 --> 00:02:52,810
models, basically
eliminating the random walk
40
00:02:52,810 --> 00:02:57,290
aspect of the processes, is to
model first differences.
41
00:02:57,290 --> 00:03:06,180
And so doing that with
this series-- let's see.
42
00:03:06,180 --> 00:03:10,220
Here is just a graph of
the time series properties
43
00:03:10,220 --> 00:03:11,800
of the difference series.
44
00:03:15,210 --> 00:03:19,180
So with our original series, we
take differences and eliminate
45
00:03:19,180 --> 00:03:22,820
missing values in this R code.
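[A rough sketch of the R code being described, continuing from the VAR fit above; diff and na.omit are the relevant base-R tools.]

```r
# First differences of each series, dropping the missing value
# created at the start of the sample.
dy <- na.omit(diff(as.matrix(y)))

# Grid of autocorrelations (diagonal panels) and cross-correlations
# (off-diagonal panels); the dashed lines are roughly +/- 2 standard
# errors under the hypothesis of zero correlation.
acf(dy)
```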
46
00:03:22,820 --> 00:03:25,300
And this
autocorrelation function
47
00:03:25,300 --> 00:03:31,100
shows us basically
the autocorrelations
48
00:03:31,100 --> 00:03:33,420
of the individual series
49
00:03:33,420 --> 00:03:36,950
and the cross-correlations
across the different series.
50
00:03:36,950 --> 00:03:41,680
So along the diagonals are
the autocorrelation functions.
51
00:03:41,680 --> 00:03:43,800
And one can see
that every series
52
00:03:43,800 --> 00:03:47,280
has correlation one with itself.
53
00:03:47,280 --> 00:03:52,380
But then at the first lag,
the autocorrelation is positive for the Fed
54
00:03:52,380 --> 00:03:56,450
funds and the CPI measure.
55
00:03:56,450 --> 00:03:58,980
And there's also some
cross-correlations
56
00:03:58,980 --> 00:04:01,550
that are strong.
57
00:04:01,550 --> 00:04:04,180
And whether a
correlation is strong or not
58
00:04:04,180 --> 00:04:06,125
depends upon how much
uncertainty there
59
00:04:06,125 --> 00:04:08,250
is in our estimate
of the correlation.
60
00:04:08,250 --> 00:04:11,750
And these dashed
lines here correspond
61
00:04:11,750 --> 00:04:16,980
to plus or minus two standard
deviations of the correlation
62
00:04:16,980 --> 00:04:23,440
coefficient when the correlation
coefficient is equal to 0.
63
00:04:23,440 --> 00:04:28,470
So any correlations that sort
of go beyond those bounds
64
00:04:28,470 --> 00:04:29,715
are statistically significant.
65
00:04:33,180 --> 00:04:39,210
The partial autocorrelation
function is graphed here.
66
00:04:39,210 --> 00:04:42,730
And let's say our
time series problem
67
00:04:42,730 --> 00:04:46,040
set goes through some discussion
of the partial autocorrelation
68
00:04:46,040 --> 00:04:48,600
coefficients and the
interpretation of those.
69
00:04:48,600 --> 00:04:51,910
The partial autocorrelation
coefficients
70
00:04:51,910 --> 00:04:57,450
are the correlation
between one variable
71
00:04:57,450 --> 00:04:59,330
and the lag of another
after accounting
72
00:04:59,330 --> 00:05:02,110
for all lower-order lags.
73
00:05:02,110 --> 00:05:06,480
So it's like the incremental
correlation of a variable
74
00:05:06,480 --> 00:05:10,760
with an additional lag term.
75
00:05:10,760 --> 00:05:13,830
And so if we are actually
fitting regression models where
76
00:05:13,830 --> 00:05:18,460
we include extra lags
of a given variable,
77
00:05:18,460 --> 00:05:20,570
that partial
autocorrelation coefficient
78
00:05:20,570 --> 00:05:25,260
is essentially the correlation
associated with the addition
79
00:05:25,260 --> 00:05:27,620
of the final lagged variable.
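[The corresponding one-liner in R; pacf plots the partial auto- and cross-correlations just described.]

```r
# Incremental correlation at each lag after accounting for all
# lower-order lags of the differenced series.
pacf(dy)
```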
80
00:05:27,620 --> 00:05:30,230
So here, we can see that
each of these series
81
00:05:30,230 --> 00:05:33,950
is quite strongly
correlated with itself.
82
00:05:33,950 --> 00:05:37,470
But there are also
some cross-correlations
83
00:05:37,470 --> 00:05:42,750
with, like, the unemployment
rate and the Fed funds rate.
84
00:05:42,750 --> 00:05:46,700
Basically, the Fed
funds rate tends
85
00:05:46,700 --> 00:05:50,400
to go down when the
unemployment rate goes up.
86
00:05:50,400 --> 00:05:54,610
And so this data is
indicating the association
87
00:05:54,610 --> 00:05:56,640
between these
macroeconomic variables
88
00:05:56,640 --> 00:05:59,100
and the evidence
of that behavior.
89
00:05:59,100 --> 00:06:02,100
In terms of modeling the
actual structural relations
90
00:06:02,100 --> 00:06:05,930
between these, we would need
more variables, up to about 10
91
00:06:05,930 --> 00:06:08,380
or 12, beyond
these three.
92
00:06:08,380 --> 00:06:12,710
And then one can have
a better understanding
93
00:06:12,710 --> 00:06:15,750
of the drivers of various
macroeconomic features.
94
00:06:15,750 --> 00:06:17,250
But this sort of
illustrates the use
95
00:06:17,250 --> 00:06:19,950
of these methods with this
reduced variable case.
96
00:06:22,830 --> 00:06:25,650
Let me also go
down here and just
97
00:06:25,650 --> 00:06:33,710
comment on the unemployment
rate or the Fed funds rate.
98
00:06:46,050 --> 00:06:48,460
When fitting these vector
autoregressive models
99
00:06:48,460 --> 00:06:52,070
with the packages
that exist in R,
100
00:06:52,070 --> 00:06:56,320
they give us output which
provides the specification
101
00:06:56,320 --> 00:07:01,440
of each of the
autoregressive models
102
00:07:01,440 --> 00:07:05,260
for the different dependent
variables, the different series
103
00:07:05,260 --> 00:07:07,620
of the process.
104
00:07:07,620 --> 00:07:13,610
And so here is the case of the
regression model for Fed funds
105
00:07:13,610 --> 00:07:17,720
as a function of
unemployment rate lagged,
106
00:07:17,720 --> 00:07:21,040
Fed funds rate lagged,
and CPI lagged.
107
00:07:21,040 --> 00:07:25,240
These are all on
different scales.
108
00:07:25,240 --> 00:07:27,730
When you're looking at
these results, what's
109
00:07:27,730 --> 00:07:31,340
important is
basically how strong
110
00:07:31,340 --> 00:07:33,850
the signal-to-noise
ratio is for estimating
111
00:07:33,850 --> 00:07:37,590
these autoregressive
parameters, vector
112
00:07:37,590 --> 00:07:39,130
autoregressive parameters.
113
00:07:39,130 --> 00:07:43,540
And so with the Fed funds,
you can look at the t values.
114
00:07:43,540 --> 00:07:45,920
And t values that
are larger than 2
115
00:07:45,920 --> 00:07:49,210
are certainly quite significant.
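[A sketch of how one pulls up this output in R with the vars package; the column name FEDFUNDS is a placeholder.]

```r
# Each equation of the fitted VAR is an ordinary least squares
# regression; the summary for the Fed funds equation reports the
# coefficients together with their t values.
fit_d <- VAR(dy, p = 2, type = "const")
summary(fit_d)$varresult$FEDFUNDS
```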
116
00:07:49,210 --> 00:07:53,540
And you can see that basically
the unemployment rate
117
00:07:53,540 --> 00:07:59,250
coefficient is a negative
0.71, so if the unemployment
118
00:07:59,250 --> 00:08:05,270
rate goes up, we expect to
see the Fed rate going down
119
00:08:05,270 --> 00:08:07,080
the next month.
120
00:08:07,080 --> 00:08:15,650
And the Fed funds rate for the
lag 1 has a t value of 7.97.
121
00:08:15,650 --> 00:08:18,790
So these are now models
on the differences.
122
00:08:18,790 --> 00:08:21,480
So if the Fed funds
rate was increased
123
00:08:21,480 --> 00:08:25,880
last month or last quarter, it's
likely to be increased again.
124
00:08:25,880 --> 00:08:31,560
And that's partly a factor
of how slow the economy is
125
00:08:31,560 --> 00:08:34,049
in reacting to changes
and how the Fed doesn't
126
00:08:34,049 --> 00:08:40,200
want to shock the economy with
large changes in their policy
127
00:08:40,200 --> 00:08:42,909
rates.
128
00:08:42,909 --> 00:08:46,600
Another thing to notice here
is that there's actually
129
00:08:46,600 --> 00:08:50,230
a negative coefficient
on the lag 2
130
00:08:50,230 --> 00:08:54,490
Fed funds term, a negative 0.17.
131
00:08:54,490 --> 00:08:58,870
And in interpreting
these kinds of models,
132
00:08:58,870 --> 00:09:02,510
I think it's helpful
just to think of,
133
00:09:02,510 --> 00:09:06,210
if you have Fed
funds sub t, that's
134
00:09:06,210 --> 00:09:13,970
equal to minus 0.71 times the
unemployment rate at t minus 1.
135
00:09:13,970 --> 00:09:24,050
And then we have plus 0.37 times
the Fed funds, so t minus 1.
136
00:09:24,050 --> 00:09:24,820
And this is delta.
137
00:09:24,820 --> 00:09:31,330
And then minus 0.18
times the Fed funds.
138
00:09:31,330 --> 00:09:35,000
So t minus 2.
139
00:09:35,000 --> 00:09:39,290
In interpreting
these coefficients,
140
00:09:39,290 --> 00:09:43,020
notice that these
two terms correspond
141
00:09:43,020 --> 00:09:57,110
to 0.19 times the Fed funds
change 1 lag ago plus 0.18
142
00:09:57,110 --> 00:09:59,445
times the change in that change, the second difference.
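[The algebra behind this rearrangement, with the coefficients as quoted in the lecture (rounded), is simply:]

```latex
\Delta \mathrm{FF}_t \approx -0.71\,\Delta \mathrm{UNRATE}_{t-1}
  + 0.37\,\Delta \mathrm{FF}_{t-1} - 0.18\,\Delta \mathrm{FF}_{t-2},
\qquad
0.37\,\Delta \mathrm{FF}_{t-1} - 0.18\,\Delta \mathrm{FF}_{t-2}
  = 0.19\,\Delta \mathrm{FF}_{t-1}
  + 0.18\bigl(\Delta \mathrm{FF}_{t-1} - \Delta \mathrm{FF}_{t-2}\bigr).
```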
143
00:10:03,550 --> 00:10:06,360
So when you see
multiple lags coming
144
00:10:06,360 --> 00:10:11,720
into play in these models,
the interpretation of them
145
00:10:11,720 --> 00:10:17,560
can be made by considering
different transformations
146
00:10:17,560 --> 00:10:20,210
essentially of the
underlying variables.
147
00:10:20,210 --> 00:10:23,130
In this form, you can see
that OK, the Fed funds
148
00:10:23,130 --> 00:10:30,180
tends to change the way it
changed the previous month.
149
00:10:30,180 --> 00:10:38,644
But it also may change
depending on the double change
150
00:10:38,644 --> 00:10:39,560
in the previous month.
151
00:10:39,560 --> 00:10:42,620
So there's a degree of
acceleration in the Fed funds
152
00:10:42,620 --> 00:10:44,450
that is being captured here.
153
00:10:44,450 --> 00:10:47,640
So the interpretation
of these models
154
00:10:47,640 --> 00:10:51,930
sometimes requires some care.
155
00:10:51,930 --> 00:10:55,560
This kind of analysis,
I find it quite useful.
156
00:11:02,600 --> 00:11:09,710
So let's push on
to the next topic.
157
00:11:09,710 --> 00:11:13,230
So today's topics are going
to begin with a discussion
158
00:11:13,230 --> 00:11:15,640
of cointegration.
159
00:11:15,640 --> 00:11:18,980
Cointegration is a major topic
in time series analysis, which
160
00:11:18,980 --> 00:11:23,980
is dealing with the analysis
of non-stationary time series.
161
00:11:23,980 --> 00:11:28,060
And in the previous
discussion, we
162
00:11:28,060 --> 00:11:29,910
addressed
non-stationarity of series
163
00:11:29,910 --> 00:11:32,214
by taking first
differences to eliminate
164
00:11:32,214 --> 00:11:33,130
that non-stationarity.
165
00:11:36,440 --> 00:11:40,140
But we may be losing
some information
166
00:11:40,140 --> 00:11:41,450
with that differencing.
167
00:11:41,450 --> 00:11:44,940
And cointegration
provides a framework
168
00:11:44,940 --> 00:11:47,440
within which we
characterize all available
169
00:11:47,440 --> 00:11:49,680
information for
statistical modeling,
170
00:11:49,680 --> 00:11:52,920
in a very systematic way.
171
00:11:52,920 --> 00:11:58,580
So let's introduce the
context within which
172
00:11:58,580 --> 00:12:00,630
cointegration is relevant.
173
00:12:00,630 --> 00:12:05,810
It's relevant when we
have a stochastic process,
174
00:12:05,810 --> 00:12:08,620
a multivariate
stochastic process, which
175
00:12:08,620 --> 00:12:12,060
is integrated of some order d.
176
00:12:12,060 --> 00:12:15,810
And to be integrated
of order d means
177
00:12:15,810 --> 00:12:18,920
that if we take the
d-th difference,
178
00:12:18,920 --> 00:12:21,395
then that d-th
difference is stationary.
179
00:12:23,980 --> 00:12:33,720
So if you look
at a time series
180
00:12:33,720 --> 00:12:38,630
and you plot that over time,
well, OK, a stationary time
181
00:12:38,630 --> 00:12:43,010
series we know should be
something that basically
182
00:12:43,010 --> 00:12:45,010
has a constant mean over time.
183
00:12:45,010 --> 00:12:48,580
There's some steady
mean that it has.
184
00:12:48,580 --> 00:12:51,470
And the variability
is also constant.
185
00:12:51,470 --> 00:12:59,000
With some other time series,
it might increase linearly
186
00:12:59,000 --> 00:13:00,940
over time.
187
00:13:00,940 --> 00:13:03,600
And a series that increases
linearly over time, well,
188
00:13:03,600 --> 00:13:05,070
if you take first
differences, that
189
00:13:05,070 --> 00:13:07,650
tends to take out
that linear trend.
190
00:13:07,650 --> 00:13:10,230
If higher-order
differencing is required, then
191
00:13:10,230 --> 00:13:14,160
that means that there's some
curvature, quadratic say,
192
00:13:14,160 --> 00:13:18,760
that may exist in the data
that is being taken out.
193
00:13:18,760 --> 00:13:25,460
So this differencing is required
to result in stationarity.
194
00:13:25,460 --> 00:13:32,430
If the process does have a vector
autoregressive representation
195
00:13:32,430 --> 00:13:35,330
in spite of its
non-stationarity,
196
00:13:35,330 --> 00:13:43,920
then it can be represented by
a polynomial lag of the X's set
197
00:13:43,920 --> 00:13:48,690
equal to white noise epsilon.
198
00:13:48,690 --> 00:13:53,590
And the polynomial
phi of L is going
199
00:13:53,590 --> 00:13:59,180
to have a factor term
in there of 1 minus L,
200
00:13:59,180 --> 00:14:02,100
basically the first
difference to the d power.
201
00:14:02,100 --> 00:14:06,300
So if taking the
d-th order difference
202
00:14:06,300 --> 00:14:12,430
reduces it to
stationarity, then we
203
00:14:12,430 --> 00:14:16,630
can express this vector
autoregression in this way.
204
00:14:16,630 --> 00:14:26,620
So the phi star of L
basically represents
205
00:14:26,620 --> 00:14:31,110
the stationary vector
autoregressive process
206
00:14:31,110 --> 00:14:33,255
on the d-th difference series.
207
00:14:47,730 --> 00:14:52,780
Now, as it says here, each
of the component series
208
00:14:52,780 --> 00:14:57,090
may be non-stationary and
integrated, say of order one.
209
00:14:57,090 --> 00:15:02,770
But the process itself may
not be jointly integrated,
210
00:15:02,770 --> 00:15:08,900
in that there may
be linear combinations
211
00:15:08,900 --> 00:15:13,800
of our multivariate series
which are stationary.
212
00:15:13,800 --> 00:15:20,570
And so these linear
combinations basically
213
00:15:20,570 --> 00:15:25,050
represent the stationary
features of the process.
214
00:15:25,050 --> 00:15:31,160
And those features can be
apparent without looking
215
00:15:31,160 --> 00:15:32,490
at differences.
216
00:15:32,490 --> 00:15:35,350
So in a sense, if
you just focused
217
00:15:35,350 --> 00:15:38,880
on differences of these
non-stationary multivariate
218
00:15:38,880 --> 00:15:43,560
series, you would be
losing out on information
219
00:15:43,560 --> 00:15:49,900
of the stationary structure
of contemporaneous components
220
00:15:49,900 --> 00:15:52,230
of the multivariate series.
221
00:15:52,230 --> 00:15:56,130
And so cointegration
deals with this situation
222
00:15:56,130 --> 00:16:01,480
where some linear combinations
of the multivariate series
223
00:16:01,480 --> 00:16:02,996
in fact are stationary.
224
00:16:08,810 --> 00:16:15,090
So how do we represent
that mathematically?
225
00:16:15,090 --> 00:16:19,020
Well, we say that this
multivariate time series
226
00:16:19,020 --> 00:16:24,360
process is cointegrated if
there exists an m-vector beta
227
00:16:24,360 --> 00:16:29,470
such that, defining linear
weights on the X's,
228
00:16:29,470 --> 00:16:32,225
beta prime X_t is a
stationary process.
229
00:16:37,920 --> 00:16:42,610
The cointegration vector
beta can be scaled arbitrarily.
230
00:16:42,610 --> 00:16:49,110
So it's common
practice, if one has
231
00:16:49,110 --> 00:16:51,200
an interest, some primary
interest, perhaps,
232
00:16:51,200 --> 00:16:53,580
in the first component
series of the process,
233
00:16:53,580 --> 00:16:56,680
to set that equal to 1.
234
00:16:56,680 --> 00:17:01,020
And the expression
basically says
235
00:17:01,020 --> 00:17:06,470
that our time t value
of the first series
236
00:17:06,470 --> 00:17:11,930
is related in a stationary
way to a linear combination
237
00:17:11,930 --> 00:17:15,550
of the other m minus 1 series.
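[In symbols, with the first coefficient normalized to 1, the cointegrating relation reads as a long-run equilibrium for the first series:]

```latex
\beta = (1, -\beta_2, \ldots, -\beta_m)', \qquad
\beta' X_t = X_{1,t} - \beta_2 X_{2,t} - \cdots - \beta_m X_{m,t} = u_t,
\quad u_t \ \text{stationary}.
```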
238
00:17:15,550 --> 00:17:21,859
And this is a long-run
equilibrium type relationship.
239
00:17:21,859 --> 00:17:25,510
How does this arise?
240
00:17:25,510 --> 00:17:30,570
Well, it arises in many, many
ways in economics and finance.
241
00:17:33,100 --> 00:17:36,000
The term structure of interest
rates, purchasing power parity.
242
00:17:38,820 --> 00:17:42,660
In the term structure
of interest rates,
243
00:17:42,660 --> 00:17:47,100
basically the differences
between yields
244
00:17:47,100 --> 00:17:50,260
on interest rates over
different maturities,
245
00:17:50,260 --> 00:17:52,600
those differences
might be stationary.
246
00:17:52,600 --> 00:17:56,780
The overall level of interest rates
might not be stationary,
247
00:17:56,780 --> 00:18:01,350
but the spreads ought
to be stationary.
248
00:18:01,350 --> 00:18:04,680
Purchasing power parity
in foreign exchange,
249
00:18:04,680 --> 00:18:10,940
if you look at the
value of currencies
250
00:18:10,940 --> 00:18:14,830
for different countries,
basically different countries
251
00:18:14,830 --> 00:18:19,710
ought to be able to purchase
the same goods for roughly
252
00:18:19,710 --> 00:18:20,720
the same price.
253
00:18:20,720 --> 00:18:23,860
And so if there are
disparities in currency values,
254
00:18:23,860 --> 00:18:27,740
purchasing power parity suggests
that things will revert back
255
00:18:27,740 --> 00:18:32,900
to some norm where everybody
is paying on average over time
256
00:18:32,900 --> 00:18:34,960
the same amount for
different goods.
257
00:18:34,960 --> 00:18:37,460
Otherwise, there
would be arbitrage.
258
00:18:40,030 --> 00:18:41,890
Money demand, covered
interest rate parity,
259
00:18:41,890 --> 00:18:44,340
law of one price,
spot and futures.
260
00:18:44,340 --> 00:18:48,470
Let me show you
another example that
261
00:18:48,470 --> 00:18:54,820
will be in the case
study for this chapter.
262
00:19:00,290 --> 00:19:06,410
View, full screen.
263
00:19:06,410 --> 00:19:09,900
Let's think about
energy futures.
264
00:19:09,900 --> 00:19:13,450
In fact, next Tuesday's
talk from Morgan Stanley
265
00:19:13,450 --> 00:19:18,490
is going to be by an expert in
commodity futures and options.
266
00:19:18,490 --> 00:19:21,090
And that should be
very interesting.
267
00:19:21,090 --> 00:19:28,920
Anyway, here, I'm
looking at energy futures
268
00:19:28,920 --> 00:19:31,136
from the Energy
Information Administration.
269
00:19:31,136 --> 00:19:32,510
Actually, for this
course, trying
270
00:19:32,510 --> 00:19:36,970
to get data that's freely
available to students
271
00:19:36,970 --> 00:19:40,560
is one of the things we do.
272
00:19:40,560 --> 00:19:42,646
So this data is actually
available from the Energy
273
00:19:42,646 --> 00:19:44,770
Information Administration
of the government, which
274
00:19:44,770 --> 00:19:48,960
is now open, so I guess
that'll be updated over time.
275
00:19:48,960 --> 00:19:52,070
But basically these
energy futures
276
00:19:52,070 --> 00:19:55,570
are traded on the Chicago
Mercantile Exchange.
277
00:19:55,570 --> 00:20:03,290
And basically CL is crude,
West Texas Intermediate crude,
278
00:20:03,290 --> 00:20:08,760
light crude, which we have
here, a time series from 2006
279
00:20:08,760 --> 00:20:12,670
to basically yesterday.
280
00:20:12,670 --> 00:20:16,340
And you can see how it was around
$60 at the start of the period,
281
00:20:16,340 --> 00:20:19,080
and then went up
to close to $140,
282
00:20:19,080 --> 00:20:22,440
and then it dropped
down to around $40.
283
00:20:22,440 --> 00:20:26,110
And it's been hovering
around $100 lately.
284
00:20:26,110 --> 00:20:33,040
The second series here is
gasoline, RBOB gasoline.
285
00:20:33,040 --> 00:20:36,240
Always have to look this up.
286
00:20:36,240 --> 00:20:42,690
This stands for reformulated blendstock
for oxygenated blending
287
00:20:42,690 --> 00:20:43,250
gasoline.
288
00:20:43,250 --> 00:20:48,030
Anyway, futures on this product
are traded at the CME as well.
289
00:20:48,030 --> 00:20:50,750
And then heating oil.
290
00:20:50,750 --> 00:20:56,780
And what's happening
with these data
291
00:20:56,780 --> 00:21:08,880
is that we have basically
a refinery which processes
292
00:21:08,880 --> 00:21:15,990
crude oil as an input.
293
00:21:15,990 --> 00:21:20,180
And it basically
refines it, distills it,
294
00:21:20,180 --> 00:21:36,600
and generates outputs, which
include heating oil, gasoline,
295
00:21:36,600 --> 00:21:41,680
and various other things
like jet fuel and others.
296
00:21:41,680 --> 00:21:46,460
So if we're looking
at the prices,
297
00:21:46,460 --> 00:21:49,510
the futures prices of, say,
gasoline and heating oil,
298
00:21:49,510 --> 00:21:55,710
relating those to crude
oil, well, certainly,
299
00:21:55,710 --> 00:21:59,140
the cost of producing these
products should depend
300
00:21:59,140 --> 00:22:01,820
on the cost of the input.
301
00:22:01,820 --> 00:22:10,480
So I've got in the next plot,
a translation of these futures
302
00:22:10,480 --> 00:22:15,510
contracts into their
price per barrel.
303
00:22:15,510 --> 00:22:19,320
Turns out crude is quoted
in dollars per barrel.
304
00:22:19,320 --> 00:22:24,390
And the gasoline and heating
oil are in cents per gallon.
305
00:22:24,390 --> 00:22:26,490
So one multiplies.
306
00:22:26,490 --> 00:22:28,310
There are 42
gallons in a barrel.
307
00:22:28,310 --> 00:22:30,960
So you multiply those
price series by 42.
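[A minimal sketch of that unit conversion and the resulting spreads in R; the series names CL, RB, and HO are placeholders for the near-month futures prices.]

```r
gal_per_bbl <- 42

# Put the per-gallon product quotes on the same per-barrel scale
# as crude, then form the simple 1:1 spreads of output over input.
rb_bbl <- RB * gal_per_bbl   # gasoline, per barrel
ho_bbl <- HO * gal_per_bbl   # heating oil, per barrel

rb_crack <- rb_bbl - CL      # gasoline crack spread
ho_crack <- ho_bbl - CL      # heating oil crack spread
```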
308
00:22:30,960 --> 00:22:33,549
And this shows the plot of
the prices of the futures
309
00:22:33,549 --> 00:22:35,590
where we're looking at
essentially the same units
310
00:22:35,590 --> 00:22:40,600
of output relative to input.
311
00:22:40,600 --> 00:22:45,700
And what's evident here is that
the futures for gasoline,
312
00:22:45,700 --> 00:22:50,450
the blue, are consistently above
the green, the input, and the same
313
00:22:50,450 --> 00:22:52,520
for heating oil.
314
00:22:52,520 --> 00:22:55,680
And those vary depending
on which is greater.
315
00:22:55,680 --> 00:23:02,600
So if we look at the
difference between, say,
316
00:23:02,600 --> 00:23:07,020
the price of the heating
oil future and the crude oil
317
00:23:07,020 --> 00:23:11,625
future, what does
that represent?
318
00:23:14,380 --> 00:23:20,780
That's the spread in value of
the output minus the input.
319
00:23:20,780 --> 00:23:21,546
Ray?
320
00:23:21,546 --> 00:23:24,282
AUDIENCE: [INAUDIBLE] cost
of running the refinery?
321
00:23:27,146 --> 00:23:31,940
PROFESSOR: So cost of refining.
322
00:23:31,940 --> 00:23:39,700
So let's look at, say,
heating oil minus CL and, say,
323
00:23:39,700 --> 00:23:43,930
this RBOB minus CL.
324
00:23:43,930 --> 00:23:46,670
So it's cost of refining.
325
00:23:46,670 --> 00:23:49,487
What else could
be a factor here?
326
00:23:49,487 --> 00:23:51,820
AUDIENCE: Supply and demand
characteristics [INAUDIBLE].
327
00:23:51,820 --> 00:23:52,736
PROFESSOR: Definitely.
328
00:23:52,736 --> 00:23:54,165
Supply and demand.
329
00:23:54,165 --> 00:23:56,290
If one product is demanded
a lot more than another.
330
00:23:58,280 --> 00:23:59,030
Supply and demand.
331
00:24:05,820 --> 00:24:08,215
Anything else?
332
00:24:08,215 --> 00:24:09,840
AUDIENCE: Maybe for
the outputs, if you
333
00:24:09,840 --> 00:24:11,340
were to find the difference
between the outputs,
334
00:24:11,340 --> 00:24:13,060
it would be something cyclical.
335
00:24:13,060 --> 00:24:15,640
For example, in the
winter, heating oil
336
00:24:15,640 --> 00:24:17,840
is going to get far more
valuable as gasoline,
337
00:24:17,840 --> 00:24:19,840
because people drive less
and people demand more
338
00:24:19,840 --> 00:24:20,950
for heating homes.
339
00:24:20,950 --> 00:24:22,080
PROFESSOR: Absolutely.
340
00:24:22,080 --> 00:24:25,670
That's a very significant
factor with these.
341
00:24:25,670 --> 00:24:29,230
There are seasonal effects
that drive supply and demand.
342
00:24:29,230 --> 00:24:35,460
And so we can put
seasonal effects in there
343
00:24:35,460 --> 00:24:36,980
as affecting supply and demand.
344
00:24:36,980 --> 00:24:40,280
But certainly, you might expect
to see seasonal structure here.
345
00:24:40,280 --> 00:24:43,720
Anything else?
346
00:24:43,720 --> 00:24:47,070
Put on your trader's hat.
347
00:24:47,070 --> 00:24:49,310
Profit, yes.
348
00:24:49,310 --> 00:24:53,160
The refinery needs
to make some profit.
349
00:24:53,160 --> 00:24:58,520
So there has to be some
level of profit that's
350
00:24:58,520 --> 00:25:02,240
acceptable and appropriate.
351
00:25:02,240 --> 00:25:05,250
So we have all these
things driving basically
352
00:25:05,250 --> 00:25:07,630
these differences.
353
00:25:07,630 --> 00:25:10,220
Let's just take a look
at those differences.
354
00:25:10,220 --> 00:25:14,880
These are actually
called the crack spreads.
355
00:25:14,880 --> 00:25:19,250
Cracking in the
business of refining
356
00:25:19,250 --> 00:25:22,220
is basically the
breaking down of oil
357
00:25:22,220 --> 00:25:26,250
into components, products.
358
00:25:26,250 --> 00:25:31,800
And on the top is the
gasoline crack spread.
359
00:25:31,800 --> 00:25:35,460
And the bottom is the
heating oil crack spread.
360
00:25:35,460 --> 00:25:37,720
And one can see
that as time series,
361
00:25:37,720 --> 00:25:41,860
these actually look stationary.
362
00:25:41,860 --> 00:25:45,920
There certainly doesn't appear
to be a linear trend up.
363
00:25:45,920 --> 00:25:51,390
But there are, of course, many
factors that could affect this.
364
00:25:51,390 --> 00:25:59,110
So with that as motivation, how
would we model such a series?
365
00:25:59,110 --> 00:26:01,230
So let's go back to
our lecture here.
366
00:26:06,420 --> 00:26:08,775
All right, View, full size.
367
00:26:15,760 --> 00:26:18,430
This is going to be a
very technical discussion,
368
00:26:18,430 --> 00:26:25,460
but it's, at the end of the day,
I think fairly straightforward.
369
00:26:25,460 --> 00:26:27,210
And the objective
actually of this lecture
370
00:26:27,210 --> 00:26:31,240
is to provide an introduction
to the notation here, which
371
00:26:31,240 --> 00:26:35,860
should make it seem like it's a
very straightforward derivation
372
00:26:35,860 --> 00:26:37,800
process of these models.
373
00:26:37,800 --> 00:26:42,890
So let's begin with just a recap
of the vector autoregressive
374
00:26:42,890 --> 00:26:45,350
model of order p.
375
00:26:45,350 --> 00:26:47,570
This is the extension of
the univariate case where
376
00:26:47,570 --> 00:26:52,870
we have a vector C of
constants, m constants,
377
00:26:52,870 --> 00:26:56,960
and matrices phi_1 to
phi_p corresponding
378
00:26:56,960 --> 00:27:01,650
to basically how the
autoregression of one series
379
00:27:01,650 --> 00:27:04,810
depends on all the other series.
380
00:27:04,810 --> 00:27:08,270
And then there's multivariate
white noise eta_t,
381
00:27:08,270 --> 00:27:13,630
which has mean 0 and some
covariance structure in it.
382
00:27:13,630 --> 00:27:19,830
And the stationarity-- if
this series were stationary,
383
00:27:19,830 --> 00:27:28,050
then the determinant of
this matrix polynomial
384
00:27:28,050 --> 00:27:33,360
would have roots outside the
unit circle for complex z.
385
00:27:33,360 --> 00:27:39,290
And if it's not stationary,
then some of those roots
386
00:27:39,290 --> 00:27:41,680
will be on the unit
circle or beyond.
387
00:27:41,680 --> 00:27:45,125
So let's actually go to
that non-stationary case
388
00:27:45,125 --> 00:27:50,540
and suppose that the process
is integrated of order one.
389
00:27:50,540 --> 00:27:53,050
So if we were to take
first differences,
390
00:27:53,050 --> 00:27:54,175
we would have stationarity.
391
00:28:02,690 --> 00:28:06,500
Well, the derivation
of the model
392
00:28:06,500 --> 00:28:12,150
proceeds by converting the
original vector autoregressive
393
00:28:12,150 --> 00:28:16,050
equation into an
equation that's mostly
394
00:28:16,050 --> 00:28:19,560
relating to differences but
with also some extra terms.
395
00:28:19,560 --> 00:28:24,130
So let's begin the process
by just subtracting
396
00:28:24,130 --> 00:28:26,620
the lagged value of
the multivariate vector
397
00:28:26,620 --> 00:28:29,030
from the original series.
398
00:28:29,030 --> 00:28:31,290
So we subtract X_(t-1)
from both sides,
399
00:28:31,290 --> 00:28:37,330
and we get delta X_t is equal to
C plus (phi_1 minus I_m) X_(t-1)
400
00:28:37,330 --> 00:28:38,200
plus the rest.
401
00:28:38,200 --> 00:28:41,960
So that's a very simple step.
402
00:28:41,960 --> 00:28:46,220
We're just subtracting the
lagged multivariate series
403
00:28:46,220 --> 00:28:49,370
from both sides.
404
00:28:49,370 --> 00:28:53,290
Now, what we want
to do is convert
405
00:28:53,290 --> 00:28:59,930
the second term in the middle
line into a difference term.
406
00:28:59,930 --> 00:29:00,990
So what do we do?
407
00:29:00,990 --> 00:29:07,900
Well, we can subtract and add
(phi_1 minus I_m) times X_(t-2).
408
00:29:07,900 --> 00:29:10,440
If we do that,
subtract and add that,
409
00:29:10,440 --> 00:29:13,810
we then get that delta X_t is
C plus a multiple of delta
410
00:29:13,810 --> 00:29:19,530
X_(t-1) plus this
multiple of X_(t-2).
411
00:29:19,530 --> 00:29:22,240
So we basically
reduced the equations
412
00:29:22,240 --> 00:29:25,290
to differences in
the first two terms
413
00:29:25,290 --> 00:29:29,520
or in the current
series and the lagged.
414
00:29:29,520 --> 00:29:33,550
But then we have the original
series at lag t minus 2.
415
00:29:33,550 --> 00:29:38,660
We can continue this
process with the third.
416
00:29:38,660 --> 00:29:42,460
And then at the
end of the day, we
417
00:29:42,460 --> 00:29:46,150
end up getting this equation
for the difference of the series
418
00:29:46,150 --> 00:29:49,300
is equal to a constant
plus a matrix multiple
419
00:29:49,300 --> 00:29:53,880
of the first difference
multivariate series,
420
00:29:53,880 --> 00:29:56,920
plus another matrix times
the second difference,
421
00:29:56,920 --> 00:30:01,720
all the way down to
the p-th difference,
422
00:30:01,720 --> 00:30:03,760
or the p minus first difference.
423
00:30:03,760 --> 00:30:07,400
But at the end,
we're left with terms
424
00:30:07,400 --> 00:30:11,320
at p lags that have no
differences in them.
425
00:30:11,320 --> 00:30:14,440
So we've been able to
represent this series
426
00:30:14,440 --> 00:30:19,090
as an autoregressive
function of differences.
427
00:30:19,090 --> 00:30:24,010
But there's also a term on
the undifferenced series
428
00:30:24,010 --> 00:30:27,470
at the end that's left over.
429
00:30:27,470 --> 00:30:34,900
Or this argument
can actually
430
00:30:34,900 --> 00:30:38,330
proceed by eliminating
differences in the reverse way,
431
00:30:38,330 --> 00:30:42,650
starting with the
p-th lag and going up.
432
00:30:42,650 --> 00:30:47,200
And one then can represent
this as delta X_t
433
00:30:47,200 --> 00:30:50,170
is C plus some
matrix times just the
434
00:30:50,170 --> 00:30:56,000
lagged series plus various
matrices times the differences
435
00:30:56,000 --> 00:30:58,880
going back p minus 1 lags.
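[Collecting the terms, this is the error-correction form of the VAR(p); under the usual convention the coefficient matrices work out to:]

```latex
\Delta X_t = C + \Pi X_{t-1}
  + \sum_{j=1}^{p-1} \Gamma_j\, \Delta X_{t-j} + \eta_t,
\qquad
\Pi = -\Bigl(I_m - \sum_{j=1}^{p} \Phi_j\Bigr),
\qquad
\Gamma_j = -\sum_{k=j+1}^{p} \Phi_k .
```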
436
00:31:05,460 --> 00:31:10,200
And so at the end of
the day, this model
437
00:31:10,200 --> 00:31:14,270
basically for delta
X_t is a constant
438
00:31:14,270 --> 00:31:20,760
plus a matrix times the
previous lagged series
439
00:31:20,760 --> 00:31:25,660
or the first lag of the
multivariate time series,
440
00:31:25,660 --> 00:31:30,320
plus various autoregressive
lags of the differenced series.
441
00:31:32,960 --> 00:31:36,130
So these notes give you
the formulas for those,
442
00:31:36,130 --> 00:31:40,840
and they're very easy to
verify if you go through them
443
00:31:40,840 --> 00:31:41,594
one by one.
444
00:31:45,730 --> 00:31:51,760
And when we look at this
expression for the model,
445
00:31:51,760 --> 00:31:57,270
this expresses the
stochastic process model
446
00:31:57,270 --> 00:31:59,560
for the difference series.
447
00:31:59,560 --> 00:32:03,780
This difference
series is stationary.
448
00:32:03,780 --> 00:32:05,970
We've eliminated
the non-stationarity
449
00:32:05,970 --> 00:32:06,630
in the process.
450
00:32:06,630 --> 00:32:09,160
So that means the
right-hand side
451
00:32:09,160 --> 00:32:12,890
has to be stationary as well.
452
00:32:12,890 --> 00:32:19,890
And so while the terms which
are matrix multiples of lags
453
00:32:19,890 --> 00:32:21,390
of the differenced
series, those are
454
00:32:21,390 --> 00:32:23,750
going to be stationary
because we're just
455
00:32:23,750 --> 00:32:27,680
taking lags of the
stationary multivariate time
456
00:32:27,680 --> 00:32:29,540
series, the difference series.
457
00:32:29,540 --> 00:32:36,880
But this pi X_t term has
to be stationary as well.
458
00:32:36,880 --> 00:32:41,640
So this pi X_t contains
the cointegrating terms.
459
00:32:41,640 --> 00:32:46,600
And fitting a sort of
cointegrated vector
460
00:32:46,600 --> 00:32:53,490
autoregression model involves
identifying this term, pi X_t.
461
00:32:53,490 --> 00:33:00,870
And given that the original
series had unit roots,
462
00:33:00,870 --> 00:33:06,195
it has to be the case that
pi, the matrix, is singular.
463
00:33:09,550 --> 00:33:12,080
So it's basically
a transformation
464
00:33:12,080 --> 00:33:15,310
of the data that
eliminates that unit
465
00:33:15,310 --> 00:33:19,880
root in the overall series.
466
00:33:19,880 --> 00:33:24,440
So the matrix pi
is of reduced rank,
467
00:33:24,440 --> 00:33:27,676
and it's either rank
zero, in which case
468
00:33:27,676 --> 00:33:29,300
there are no cointegrating
relationships,
469
00:33:29,300 --> 00:33:34,500
or its rank is less than m.
470
00:33:34,500 --> 00:33:39,060
And the matrix pi does
define the cointegrating
471
00:33:39,060 --> 00:33:40,550
relationships.
472
00:33:40,550 --> 00:33:43,080
Now, these cointegrating
relationships
473
00:33:43,080 --> 00:33:48,990
are the relationships in the
process that are stationary.
474
00:33:48,990 --> 00:33:53,200
And so basically there's
a lot of information
475
00:33:53,200 --> 00:33:57,880
in that multivariate series
with contemporaneous values
476
00:33:57,880 --> 00:33:59,470
of the series.
477
00:33:59,470 --> 00:34:02,500
There is stationary structure
at every single time
478
00:34:02,500 --> 00:34:08,199
point, which can be the
target of the modeling.
479
00:34:08,199 --> 00:34:16,250
So this matrix pi is
of rank r less than m.
480
00:34:16,250 --> 00:34:22,100
And so it can be expressed
as basically alpha beta
481
00:34:22,100 --> 00:34:30,540
prime, where the matrices
alpha and beta are of rank r.
482
00:34:30,540 --> 00:34:33,199
And the columns of beta define
linearly independent vectors
483
00:34:33,199 --> 00:34:34,770
which cointegrate x.
484
00:34:34,770 --> 00:34:37,909
And the decomposition
of pi isn't unique.
485
00:34:37,909 --> 00:34:43,389
You can basically, for any
invertible r by r matrix g,
486
00:34:43,389 --> 00:34:46,350
define another set of
cointegrating relationships.
487
00:34:46,350 --> 00:34:50,340
So in the linear algebra
structure of these problems,
488
00:34:50,340 --> 00:34:52,800
there's basically an
r-dimensional space
489
00:34:52,800 --> 00:34:56,360
where the process is
stationary, and how
490
00:34:56,360 --> 00:35:02,020
you define the coordinate system
in that space is up to you
491
00:35:02,020 --> 00:35:08,130
or subject to some choice.
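[In symbols: with rank r, 0 < r < m,]

```latex
\Pi = \alpha\,\beta', \qquad \alpha,\,\beta \in \mathbb{R}^{m \times r},
\qquad
\Pi = \bigl(\alpha G^{-1}\bigr)\bigl(\beta G'\bigr)'
\ \text{for any invertible } G \in \mathbb{R}^{r \times r},
```

so the columns of beta span the cointegrating space but are only identified up to such a change of coordinates.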
492
00:35:08,130 --> 00:35:09,780
So how do we estimate
these models?
493
00:35:09,780 --> 00:35:15,520
Well, there's a rather nice result
of Sims, Stock, and Watson.
494
00:35:15,520 --> 00:35:17,800
Actually, Sims,
Christopher Sims,
495
00:35:17,800 --> 00:35:21,790
he got the Nobel Prize a
few years ago for his work
496
00:35:21,790 --> 00:35:23,730
in econometrics.
497
00:35:23,730 --> 00:35:33,850
And so this is a rather
significant work that he did.
498
00:35:33,850 --> 00:35:36,740
Anyway, he, together
with Stock and Watson,
499
00:35:36,740 --> 00:35:41,120
proved that if you're estimating
a vector autoregression model,
500
00:35:41,120 --> 00:35:45,490
then the least squares
estimator of the original model
501
00:35:45,490 --> 00:35:49,150
is basically sufficient
to do an analysis
502
00:35:49,150 --> 00:35:56,600
of this cointegrated vector
autoregression process.
503
00:35:56,600 --> 00:35:58,960
The parameter estimates
from just fitting
504
00:35:58,960 --> 00:36:03,610
the vector autoregression are
consistent for the underlying
505
00:36:03,610 --> 00:36:04,657
parameters.
506
00:36:04,657 --> 00:36:06,240
And they have
asymptotic distributions
507
00:36:06,240 --> 00:36:09,980
that are identical to those of
maximum likelihood estimators.
508
00:36:09,980 --> 00:36:18,360
And so what ends up happening
is the least squares estimates
509
00:36:18,360 --> 00:36:21,960
of the vector autoregression
parameters lead
510
00:36:21,960 --> 00:36:27,270
to an estimation
of the pi matrix.
511
00:36:27,270 --> 00:36:40,290
And the constraints on the pi
matrix, which are basically that pi
512
00:36:40,290 --> 00:36:44,430
is of reduced rank,
will hold asymptotically.
513
00:36:44,430 --> 00:36:49,240
So let's just go back
to the equation before,
514
00:36:49,240 --> 00:36:54,490
to see if that
looks familiar here.
515
00:36:58,930 --> 00:37:03,070
So what that work says
is that if we basically
516
00:37:03,070 --> 00:37:07,110
fit the linear regression
model regressing the difference
517
00:37:07,110 --> 00:37:13,930
series on the lag of the series
plus lags of differences,
518
00:37:13,930 --> 00:37:18,590
the least squares estimates
of these underlying parameters
519
00:37:18,590 --> 00:37:21,690
will give us asymptotically
efficient estimates
520
00:37:21,690 --> 00:37:24,060
of this overall process.
521
00:37:24,060 --> 00:37:31,635
So we don't need to use any new
tools to specify these models.
522
00:37:43,800 --> 00:37:48,110
There's an advanced literature
on estimation methods
523
00:37:48,110 --> 00:37:49,950
for these models.
524
00:37:49,950 --> 00:37:55,050
Johansen does describe
maximum likelihood estimation
525
00:37:55,050 --> 00:38:01,260
when the innovation terms
are normally distributed.
526
00:38:01,260 --> 00:38:07,270
And that methodology applies
reduced rank regression
527
00:38:07,270 --> 00:38:13,150
methodology and
yields tests for what
528
00:38:13,150 --> 00:38:17,130
the rank is of the
cointegrating relationship.
529
00:38:17,130 --> 00:38:20,270
And these methods are
implemented in R packages.
530
00:38:25,710 --> 00:38:26,420
Let's see.
531
00:38:26,420 --> 00:38:40,890
Let me just go back now
to the-- so let's see.
532
00:38:40,890 --> 00:38:47,690
The case study on
the crack spread data
533
00:38:47,690 --> 00:38:51,370
actually goes through sort of
testing for non-stationarity
534
00:38:51,370 --> 00:38:54,040
in these underlying series.
535
00:38:54,040 --> 00:38:58,360
And actually, why don't
I just show you that?
536
00:38:58,360 --> 00:38:59,450
Let's go back here.
537
00:39:17,522 --> 00:39:23,460
If you can see this, for
the crack spread data,
538
00:39:23,460 --> 00:39:25,230
looking at the
crude oil futures,
539
00:39:25,230 --> 00:39:28,450
basically the crude oil
future can be evaluated
540
00:39:28,450 --> 00:39:30,790
to see if it's non-stationary.
541
00:39:30,790 --> 00:39:33,800
And there's this augmented
Dickey-Fuller test
542
00:39:33,800 --> 00:39:36,350
for non-stationarity.
543
00:39:36,350 --> 00:39:43,160
And it basically has a null
hypothesis that the model
544
00:39:43,160 --> 00:39:46,850
or the series is non-stationary,
or it has a unit root,
545
00:39:46,850 --> 00:39:49,040
versus the alternative
that it doesn't.
546
00:39:49,040 --> 00:39:52,180
And so testing that
null hypothesis
547
00:39:52,180 --> 00:39:56,121
that it's non-stationary
yields a p-value of 0.164
548
00:39:56,121 --> 00:40:01,690
for CLC1, the first
nearest contract,
549
00:40:01,690 --> 00:40:07,400
near month contract of
the futures for crude.
550
00:40:07,400 --> 00:40:11,230
And so the data
suggests that crude
551
00:40:11,230 --> 00:40:14,060
has a distribution that's
non-stationary, integrated
552
00:40:14,060 --> 00:40:16,490
of order 1.
553
00:40:16,490 --> 00:40:23,950
And the HOC1 also basically
has a p-value
554
00:40:23,950 --> 00:40:27,550
for non-stationarity of 0.3265.
555
00:40:27,550 --> 00:40:31,000
So we can't reject
non-stationarity or unit root
556
00:40:31,000 --> 00:40:34,150
in those series with
these test statistics.
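[A sketch of these tests in R; the tseries package's adf.test is one implementation, and the series names CL1 and HO1 are placeholders for the near-month contracts.]

```r
library(tseries)

# Null hypothesis: the series has a unit root (is non-stationary).
adf.test(CL1)   # case note reports p = 0.164 for crude
adf.test(HO1)   # case note reports p = 0.3265 for heating oil
```

With p-values this large, the unit root cannot be rejected for either series.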
557
00:40:34,150 --> 00:40:39,260
In analyzing the data, this
suggests that we basically
558
00:40:39,260 --> 00:40:41,380
need to accommodate that
non-stationarity when
559
00:40:41,380 --> 00:40:43,150
we specify the models.
560
00:40:46,925 --> 00:40:49,130
Let me just see if
there's some results here.
561
00:41:55,180 --> 00:41:59,060
For this series,
actually the case notes
562
00:41:59,060 --> 00:42:01,270
will go through actually
conducting this Johansen
563
00:42:01,270 --> 00:42:03,360
procedure for
testing for the rank
564
00:42:03,360 --> 00:42:05,700
of the cointegrated process.
565
00:42:05,700 --> 00:42:11,630
And that test basically has
different test statistics
566
00:42:11,630 --> 00:42:15,260
for testing whether the rank is
0, less than or equal to 1,
567
00:42:15,260 --> 00:42:16,870
or less than or equal to 2.
568
00:42:16,870 --> 00:42:19,650
And one can see that
there's marginal-- the test
569
00:42:19,650 --> 00:42:25,930
statistic is almost
significant at the 10% level
570
00:42:25,930 --> 00:42:29,780
for the overall series.
571
00:42:29,780 --> 00:42:32,670
It's not significant
for the rank
572
00:42:32,670 --> 00:42:34,460
being less than or equal to 1.
573
00:42:34,460 --> 00:42:38,390
And so these results
don't suggest there's
574
00:42:38,390 --> 00:42:40,880
strong evidence of cointegration.
575
00:42:40,880 --> 00:42:45,360
But certainly any
cointegration
576
00:42:45,360 --> 00:42:48,620
is of rank no more than
one for these series.
577
00:42:48,620 --> 00:42:52,030
And the eigenvector
corresponding
578
00:42:52,030 --> 00:42:54,070
to the stationary
relationship is
579
00:42:54,070 --> 00:43:00,940
given by these coefficients
of 1 on the crude oil future,
580
00:43:00,940 --> 00:43:05,710
1.3 on the RBOB and minus
1.7 on the heating oil.
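[A sketch of the Johansen procedure in R using the urca package; prices is a placeholder for the matrix of the three per-barrel futures series, and the lag order K is an assumption.]

```r
library(urca)

# Trace test for the cointegrating rank: statistics for r = 0,
# r <= 1, and r <= 2 against their critical values, plus the
# estimated eigenvectors (candidate cointegrating vectors).
jo <- ca.jo(prices, type = "trace", ecdet = "const", K = 2)
summary(jo)
```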
581
00:43:08,640 --> 00:43:13,360
So what this suggests
is that there's
582
00:43:13,360 --> 00:43:20,880
considerable variability in
these energy futures contracts.
583
00:43:20,880 --> 00:43:24,390
What appears to be stationary
is some linear combination
584
00:43:24,390 --> 00:43:28,670
of crude plus gasoline
minus heating oil.
585
00:43:28,670 --> 00:43:33,090
And in terms of why does
it combine that way,
586
00:43:33,090 --> 00:43:35,280
well, there are all
kinds of factors
587
00:43:35,280 --> 00:43:38,760
that we went through-- cost of
refining, supply and demand,
588
00:43:38,760 --> 00:43:41,370
seasonality, which
affect things.
589
00:43:41,370 --> 00:43:45,970
And so when analyzed, sort
of ignoring seasonality,
590
00:43:45,970 --> 00:43:50,000
these would be the linear
combinations that appear
591
00:43:50,000 --> 00:43:51,312
to be stationary over time.
592
00:43:51,312 --> 00:43:51,812
Yeah?
593
00:43:53,722 --> 00:43:55,680
AUDIENCE: Why did you
choose to use the futures
594
00:43:55,680 --> 00:43:56,929
prices as opposed to the spot?
595
00:43:56,929 --> 00:44:00,170
And how did you combine the
data with actual [INAUDIBLE]?
596
00:44:00,170 --> 00:44:07,820
PROFESSOR: I chose this
because if refiners are wanting
597
00:44:07,820 --> 00:44:12,130
to hedge their risks, then they
will go to the futures market
598
00:44:12,130 --> 00:44:14,060
to hedge those.
599
00:44:14,060 --> 00:44:17,090
And so working with
these data, one
600
00:44:17,090 --> 00:44:24,370
can then consider problems of
hedging refinery production
601
00:44:24,370 --> 00:44:25,460
risks.
602
00:44:25,460 --> 00:44:28,620
And so that's why.
603
00:44:28,620 --> 00:44:30,960
AUDIENCE: [INAUDIBLE]
604
00:44:30,960 --> 00:44:33,800
PROFESSOR: OK, well, the Energy
Information Administration
605
00:44:33,800 --> 00:44:39,270
provides historical data
which gives the first month,
606
00:44:39,270 --> 00:44:42,030
the second month, the third
month available for each
607
00:44:42,030 --> 00:44:43,400
of these contracts.
608
00:44:43,400 --> 00:44:47,720
And so I chose the
first month contract
609
00:44:47,720 --> 00:44:49,680
for each of these futures.
610
00:44:49,680 --> 00:44:51,980
Those tend to be the most liquid.
611
00:44:51,980 --> 00:44:54,440
Depending on what
one is hedging,
612
00:44:54,440 --> 00:44:58,550
one would use perhaps
longer periods for those.
613
00:44:58,550 --> 00:45:02,450
There's some very
nice finance problems
614
00:45:02,450 --> 00:45:04,690
dealing with hedging,
hedging these kinds of risks,
615
00:45:04,690 --> 00:45:07,150
as well as trading
these kinds of risks.
616
00:45:07,150 --> 00:45:11,030
Traders can try to exploit
short term movements in these.
617
00:45:29,870 --> 00:45:31,820
Anyway, I'll let you
look through these,
618
00:45:31,820 --> 00:45:32,760
the case note later.
619
00:45:32,760 --> 00:45:36,810
And it does provide some detail
on the coefficient estimates.
620
00:45:36,810 --> 00:45:39,119
And one can basically
get a handle
621
00:45:39,119 --> 00:45:40,785
on how these things
are being specified.
622
00:45:43,980 --> 00:45:46,170
So let's go back.
623
00:45:58,260 --> 00:46:06,490
The next topic I want to cover
is linear state-space models.
624
00:46:06,490 --> 00:46:12,725
It turns out that many
of these time series
625
00:46:12,725 --> 00:46:15,090
models appropriate in
economics and finance
626
00:46:15,090 --> 00:46:20,290
can be expressed as a
linear state-space model.
627
00:46:28,590 --> 00:46:32,250
I'm going to introduce the
general notation first and then
628
00:46:32,250 --> 00:46:35,100
provide illustrations
of this general notation
629
00:46:35,100 --> 00:46:38,480
with a number of
different examples.
630
00:46:38,480 --> 00:46:46,205
So the formulation is we have
basically an observation vector
631
00:46:46,205 --> 00:46:47,420
at time t, y_t.
632
00:46:47,420 --> 00:46:50,730
This is our multivariate time
series that we're modeling.
633
00:46:50,730 --> 00:46:53,930
Now, I've chosen it
to be k-dimensional
634
00:46:53,930 --> 00:46:57,900
for the observations.
635
00:46:57,900 --> 00:47:00,720
There's an underlying
state vector
636
00:47:00,720 --> 00:47:04,390
that's of m dimensions,
which basically characterizes
637
00:47:04,390 --> 00:47:11,740
the state of the
process at time t.
638
00:47:11,740 --> 00:47:15,240
There's an observation error
vector at time t, epsilon_t.
639
00:47:15,240 --> 00:47:18,830
So it's k by 1 as well,
corresponding to y.
640
00:47:18,830 --> 00:47:22,200
And there's a state transition
innovation error vector,
641
00:47:22,200 --> 00:47:31,240
which is n by 1,
which actually can
642
00:47:31,240 --> 00:47:36,040
be different from m, the
dimension of the state vector.
643
00:47:36,040 --> 00:47:41,300
So we have-- in the state
space specification,
644
00:47:41,300 --> 00:47:43,720
we're going to specify
two equations, one
645
00:47:43,720 --> 00:47:47,640
for how the states evolve
over time and another for how
646
00:47:47,640 --> 00:47:50,090
the observations or
measurements evolve,
647
00:47:50,090 --> 00:47:51,910
depending on the
underlying states.
648
00:47:51,910 --> 00:47:55,400
So let's first focus
on a state equation
649
00:47:55,400 --> 00:47:58,490
which describes how
the state progresses
650
00:47:58,490 --> 00:48:05,680
from the state at time t to
the state at time t plus 1.
651
00:48:05,680 --> 00:48:09,030
Because this is a linear
state-space model,
652
00:48:09,030 --> 00:48:10,710
basically the state
at t plus 1 is
653
00:48:10,710 --> 00:48:13,400
going to be some linear
function of the states at time
654
00:48:13,400 --> 00:48:16,640
t plus some noise.
655
00:48:16,640 --> 00:48:22,570
And that noise is
given by eta_t,
656
00:48:22,570 --> 00:48:26,670
being independent identically
distributed white noise,
657
00:48:26,670 --> 00:48:31,600
or normally distributed
with some covariance matrix
658
00:48:31,600 --> 00:48:33,910
Q_t, positive definite.
659
00:48:33,910 --> 00:48:37,740
And R_t is some
linear transformation
660
00:48:37,740 --> 00:48:41,180
of those, which
characterize the uncertainty
661
00:48:41,180 --> 00:48:42,880
in the particular states.
662
00:48:42,880 --> 00:48:45,160
So there's a great
deal of flexibility
663
00:48:45,160 --> 00:48:47,830
here in how things
depend on each other.
664
00:48:47,830 --> 00:48:53,090
And right now, it will appear
just like a lot of notation.
665
00:48:53,090 --> 00:48:54,700
But as we see it
in different cases,
666
00:48:54,700 --> 00:48:57,750
you'll see how these
terms come into play.
667
00:48:57,750 --> 00:48:59,260
And they're very
straightforward.
668
00:49:02,510 --> 00:49:04,800
So we're considering simple
linear transformations
669
00:49:04,800 --> 00:49:07,080
of the states plus noise.
670
00:49:07,080 --> 00:49:09,690
And then the observation
equation or measurement
671
00:49:09,690 --> 00:49:13,080
equation is a linear
transformation
672
00:49:13,080 --> 00:49:14,665
of the underlying
states plus noise.
673
00:49:17,230 --> 00:49:20,230
So the matrix Z_t is the
observation coefficients
674
00:49:20,230 --> 00:49:21,500
matrix.
675
00:49:21,500 --> 00:49:25,792
And the noise or innovations
epsilon_t are, we'll assume,
676
00:49:25,792 --> 00:49:27,250
independent
identically distributed
677
00:49:27,250 --> 00:49:29,083
normal, multivariate
normal random variables
678
00:49:29,083 --> 00:49:33,550
with some covariance matrix H_t.
679
00:49:33,550 --> 00:49:35,760
To be fully general,
the subscript t
680
00:49:35,760 --> 00:49:40,800
means the covariance
can depend on time t.
681
00:49:40,800 --> 00:49:44,780
It doesn't have to, but it can.
682
00:49:44,780 --> 00:49:48,600
These two equations
can be written together
683
00:49:48,600 --> 00:49:52,830
in a joint equation where
we see that the underlying
684
00:49:52,830 --> 00:49:59,370
state at time t, s, gets
transformed with T sub t
685
00:49:59,370 --> 00:50:04,550
to the state at t plus 1 plus
residual innovation term.
686
00:50:04,550 --> 00:50:08,720
And the observation equation
y_t is Z_t s_t plus that.
687
00:50:08,720 --> 00:50:12,430
So we're representing how
the states evolve over time
688
00:50:12,430 --> 00:50:14,910
and how the observations
depend on the underlying
689
00:50:14,910 --> 00:50:16,815
states in this joint equation.
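[The two equations side by side:]

```latex
s_{t+1} = T_t\, s_t + R_t\, \eta_t, \qquad \eta_t \sim N(0, Q_t)
  \quad \text{(state equation)},
\qquad
y_t = Z_t\, s_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t)
  \quad \text{(observation equation)}.
```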
690
00:50:19,770 --> 00:50:23,950
And the structure of
basically this sort
691
00:50:23,950 --> 00:50:28,400
of linear function of states
plus error, the error term u_t
692
00:50:28,400 --> 00:50:33,740
here is normally distributed
with covariance matrix omega,
693
00:50:33,740 --> 00:50:36,690
which has this structure.
694
00:50:36,690 --> 00:50:38,850
It's a block diagonal.
695
00:50:38,850 --> 00:50:42,942
We have the covariance
of the epsilons as the H.
696
00:50:42,942 --> 00:50:48,860
And the covariance of R_t
eta_t is R_t Q_t R_t transpose.
697
00:50:48,860 --> 00:50:54,660
So you may recall when we
take a covariance matrix
698
00:50:54,660 --> 00:51:01,210
of a linear function of random
variables given by a matrix,
699
00:51:01,210 --> 00:51:05,310
then it's that linear function
R times the covariance matrix
700
00:51:05,310 --> 00:51:07,970
times the transpose.
701
00:51:07,970 --> 00:51:12,910
So that term comes into play.
702
00:51:12,910 --> 00:51:16,860
So let's see how a
capital asset pricing
703
00:51:16,860 --> 00:51:19,720
model with time-varying
betas can be represented
704
00:51:19,720 --> 00:51:21,540
as a linear state-space model.
705
00:51:24,220 --> 00:51:29,180
You'll recall, we discussed
this model a few lectures ago,
706
00:51:29,180 --> 00:51:33,870
where we have the excess
return of a given stock, r_t,
707
00:51:33,870 --> 00:51:39,150
is a linear function of the
excess return of the market
708
00:51:39,150 --> 00:51:43,710
portfolio, r_(m,t), plus error.
709
00:51:43,710 --> 00:51:48,310
What we're going to do now
is extend that previous model
710
00:51:48,310 --> 00:51:54,170
by adding time dependence, t,
to the regression parameters.
711
00:51:54,170 --> 00:51:56,320
The alpha is not a constant.
712
00:51:56,320 --> 00:51:58,060
It is going to vary by time.
713
00:51:58,060 --> 00:52:02,700
And the beta is also
going to vary by time.
714
00:52:02,700 --> 00:52:04,810
And how will they vary by time?
715
00:52:04,810 --> 00:52:10,030
Well, we're going to
assume that the alpha_t is
716
00:52:10,030 --> 00:52:13,520
a Gaussian random walk.
717
00:52:13,520 --> 00:52:17,982
And the beta is also a
Gaussian random walk.
718
00:52:28,810 --> 00:52:33,670
And with that set up, we
have the following expression
719
00:52:33,670 --> 00:52:35,450
for the state equation.
720
00:52:35,450 --> 00:52:38,460
OK, the state equation, which
is just the unknown parameters--
721
00:52:38,460 --> 00:52:40,990
it's the alpha and the
beta at a given time t.
722
00:52:43,660 --> 00:52:45,720
The state at time
t gets adjusted
723
00:52:45,720 --> 00:52:49,340
to the state at time t plus 1
by just adding these random walk
724
00:52:49,340 --> 00:52:50,100
terms to it.
725
00:52:50,100 --> 00:52:52,290
So it's a very simple process.
726
00:52:52,290 --> 00:52:55,270
We have the identity
times the previous state
727
00:52:55,270 --> 00:52:58,930
plus the identity times this
vector of these innovations.
728
00:52:58,930 --> 00:53:04,120
So s_(t+1) is equal to
T_t s_t plus R_t eta_t,
729
00:53:04,120 --> 00:53:08,720
where this matrix, T sub
t and R sub t are trivial;
730
00:53:08,720 --> 00:53:10,290
they're just the identity.
731
00:53:10,290 --> 00:53:15,710
And eta_t has a
covariance matrix
732
00:53:15,710 --> 00:53:18,985
which is just given by
Q_t, sigma squared nu,
733
00:53:18,985 --> 00:53:22,560
sigma squared epsilon.
734
00:53:22,560 --> 00:53:28,680
This is a complex way, perhaps,
of representing this model.
735
00:53:28,680 --> 00:53:32,610
But it puts this simple model
into that linear state-space
736
00:53:32,610 --> 00:53:33,110
framework.
737
00:53:36,670 --> 00:53:45,660
Now, the observation equation
is given by this expression
738
00:53:45,660 --> 00:53:52,250
defining the Z_t matrix as the
unit element and r_(m,t). So
739
00:53:52,250 --> 00:53:58,150
it's basically a row vector, or
a row matrix, one-row matrix.
740
00:53:58,150 --> 00:54:02,180
And epsilon_t is the
white noise process.
741
00:54:02,180 --> 00:54:05,570
Now, putting these
equations together,
742
00:54:05,570 --> 00:54:09,270
we basically have the equation
for the state transition
743
00:54:09,270 --> 00:54:13,230
and the observation
equation together.
744
00:54:13,230 --> 00:54:16,120
We have this form for that.
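To make that concrete, here is a small R sketch of this state-space setup (all variance values and the market series are invented for illustration): the state is (alpha_t, beta_t), the transition and R matrices are identities, and Z_t = (1, r_(m,t)) varies with each observation.

# Simulate the CAPM with random-walk alpha_t and beta_t.
set.seed(2)
n <- 250
r_m <- rnorm(n, 0.0003, 0.01)              # hypothetical market excess returns
sd_alpha <- 1e-4; sd_beta <- 1e-3          # random-walk innovation std devs
sd_eps <- 0.01                             # observation noise std dev
alpha <- cumsum(rnorm(n, 0, sd_alpha))     # alpha_t: Gaussian random walk
beta  <- 1 + cumsum(rnorm(n, 0, sd_beta))  # beta_t: Gaussian random walk
r <- alpha + beta * r_m + rnorm(n, 0, sd_eps)  # observation equation
# State-space pieces: s_t = (alpha_t, beta_t)', T_t = R_t = I,
# Q_t = diag(sd_alpha^2, sd_beta^2), and Z_t is row t of cbind(1, r_m).
T_t <- diag(2)
Q_t <- diag(c(sd_alpha^2, sd_beta^2))
Z   <- cbind(1, r_m)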
745
00:54:25,780 --> 00:54:28,522
So now, let's
consider a second case
746
00:54:28,522 --> 00:54:31,360
of linear regression
models where
747
00:54:31,360 --> 00:54:33,780
we have a time-varying beta.
748
00:54:33,780 --> 00:54:37,140
In a way, this case
we just looked at
749
00:54:37,140 --> 00:54:39,999
is a simple case of that.
750
00:54:39,999 --> 00:54:41,540
But let's look at
a more general case
751
00:54:41,540 --> 00:54:45,270
where we have p independent
variables, which
752
00:54:45,270 --> 00:54:47,190
could be time-varying.
753
00:54:47,190 --> 00:54:51,670
So we have a
regression model almost
754
00:54:51,670 --> 00:54:54,040
as we've considered
it previously.
755
00:54:54,040 --> 00:54:58,400
y_t is equal to x_t transpose
beta_t plus epsilon_t.
756
00:54:58,400 --> 00:55:00,850
The difference now is our
regression coefficients
757
00:55:00,850 --> 00:55:03,580
beta are allowed to
change over time.
758
00:55:09,880 --> 00:55:11,180
How do they change over time?
759
00:55:11,180 --> 00:55:14,120
Well, we're going to
assume that those also
760
00:55:14,120 --> 00:55:19,120
follow independent random
walks with variances
761
00:55:19,120 --> 00:55:23,090
of the random walks that
may depend on the component.
762
00:55:23,090 --> 00:55:24,770
So the joint
state-space equation
763
00:55:24,770 --> 00:55:32,530
here is given by the identity
times s_t plus eta_t.
764
00:55:32,530 --> 00:55:36,360
That's basically the random
walk process for the underlying
765
00:55:36,360 --> 00:55:37,600
regression parameters.
766
00:55:37,600 --> 00:55:42,360
And y_t is equal
to x_t transpose
767
00:55:42,360 --> 00:55:46,081
times the same regression
parameters plus the observation
768
00:55:46,081 --> 00:55:46,580
error.
769
00:55:56,480 --> 00:55:59,770
I guess needless to say, if we
consider the special case where
770
00:55:59,770 --> 00:56:04,610
the random walk
process is degenerate
771
00:56:04,610 --> 00:56:07,320
and they're basically
steps of size zero,
772
00:56:07,320 --> 00:56:10,410
then we get the normal linear
regression model coming out
773
00:56:10,410 --> 00:56:11,870
of this.
774
00:56:11,870 --> 00:56:17,950
If we specify
the linear state-space
775
00:56:17,950 --> 00:56:22,810
implementation of this model and
consider successive estimates
776
00:56:22,810 --> 00:56:25,270
of the model
parameters over time,
777
00:56:25,270 --> 00:56:28,970
then these equations would
give us recursive estimates
778
00:56:28,970 --> 00:56:34,080
for updating
regressions as we add
779
00:56:34,080 --> 00:56:37,500
additional values to the
data, additional observations
780
00:56:37,500 --> 00:56:38,000
to the data.
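As a sketch of that point (the function and variable names here are my own): when the random-walk innovations are shut off, the Kalman updating step reduces to recursive least squares, refitting the regression one observation at a time.

# Recursive least squares: Kalman updating with a degenerate state equation,
# s_{t+1} = s_t, so the coefficients are constant and only the updates remain.
rls <- function(y, X, sigma2_eps = 1, C0 = diag(1e6, ncol(X))) {
  m <- rep(0, ncol(X)); C <- C0
  for (t in seq_along(y)) {
    x   <- X[t, ]
    F_t <- c(t(x) %*% C %*% x) + sigma2_eps   # predictive variance of y_t
    K   <- (C %*% x) / F_t                    # gain vector
    m   <- drop(m + K * (y[t] - sum(x * m)))  # revise coefficient estimates
    C   <- C - K %*% t(x) %*% C               # shrink their covariance
  }
  list(coef = m, cov = C)
}
# With a diffuse C0, the final estimate matches ordinary least squares:
# rls(y, cbind(1, x))$coef is essentially coef(lm(y ~ x)).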
781
00:56:43,880 --> 00:56:49,960
Let's look at autoregressive
models of order p.
782
00:56:49,960 --> 00:56:55,780
The autoregressive model of
order p for a univariate time
783
00:56:55,780 --> 00:57:01,670
series has the setup given here.
784
00:57:01,670 --> 00:57:07,470
It's a polynomial
lag of the response
785
00:57:07,470 --> 00:57:10,940
variable y_t is equal to
the innovation epsilon_t.
786
00:57:10,940 --> 00:57:16,130
And we can define
the state vector
787
00:57:16,130 --> 00:57:24,980
to be equal to the vector of
p values, p successive values
788
00:57:24,980 --> 00:57:27,650
of the process.
789
00:57:27,650 --> 00:57:33,710
And so we basically
get a combination
790
00:57:33,710 --> 00:57:38,700
here of the observation equation
and state equation joining
791
00:57:38,700 --> 00:57:46,720
where basically
one of the states
792
00:57:46,720 --> 00:57:48,760
is actually equal
to the observation.
793
00:57:48,760 --> 00:57:52,600
And basically, with
this definition
794
00:57:52,600 --> 00:57:59,160
for a state of the vector
at the next time point t,
795
00:57:59,160 --> 00:58:03,730
that is equal to this
linear transformation
796
00:58:03,730 --> 00:58:09,114
of the lagged state vector
plus that innovation term.
797
00:58:09,114 --> 00:58:10,608
I dropped the mic.
798
00:58:16,600 --> 00:58:21,480
So the notation here
shows the structure
799
00:58:21,480 --> 00:58:26,240
for how this linear
state-space model is evolving.
800
00:58:26,240 --> 00:58:29,090
Basically, the
observation equation
801
00:58:29,090 --> 00:58:32,410
is the linear
combination of the phi
802
00:58:32,410 --> 00:58:36,500
multiples of lags of the
values plus the residual.
803
00:58:36,500 --> 00:58:40,240
And the previous
lags of the states
804
00:58:40,240 --> 00:58:46,200
are just simply the identities
times those values, shifted.
805
00:58:46,200 --> 00:58:51,690
So it's a very simple structure
for the autoregressive process
806
00:58:51,690 --> 00:58:53,431
as a linear state-space model.
807
00:58:56,660 --> 00:59:02,470
We have, as I was just saying,
for the transition matrix T sub
808
00:59:02,470 --> 00:59:09,750
t, this matrix. And the
observation equation
809
00:59:09,750 --> 00:59:13,730
is essentially picking out
the first element of the state
810
00:59:13,730 --> 00:59:16,540
vector, which has no
measurement error.
811
00:59:16,540 --> 00:59:18,490
So that simplifies that.
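Here is that structure in R for a small example (the coefficients are arbitrary): the phi's fill the first row of the transition matrix, a shifted identity block carries the lagged values, and only the first state receives the innovation.

# Companion-form transition matrix for an AR(p), p >= 2,
# with state s_t = (y_t, y_{t-1}, ..., y_{t-p+1})'.
ar_companion <- function(phi) {
  p <- length(phi)
  T_mat <- rbind(phi, cbind(diag(p - 1), rep(0, p - 1)))
  unname(T_mat)
}
phi   <- c(0.5, -0.2, 0.1)                # arbitrary AR(3) coefficients
T_mat <- ar_companion(phi)
R_vec <- c(1, rep(0, length(phi) - 1))    # innovation enters the first state only
# s_{t+1} = T_mat %*% s_t + R_vec * eta_t; y_t = s_t[1], with no obs. noise.
# Stationarity check: all eigenvalues of T_mat lie inside the unit circle.
Mod(eigen(T_mat)$values)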
812
00:59:21,940 --> 00:59:27,210
The moving average
model of order q
813
00:59:27,210 --> 00:59:29,700
could also be expressed as
a linear state-space model.
814
00:59:37,240 --> 00:59:38,820
Remember, the
moving average model
815
00:59:38,820 --> 00:59:43,030
is one where our response
variable, y, is simply
816
00:59:43,030 --> 00:59:48,290
some linear combination
of innovations,
817
00:59:48,290 --> 00:59:50,500
q past innovations.
818
00:59:50,500 --> 00:59:55,350
And if we consider
819
00:59:55,350 --> 01:00:00,180
the state vector just
being basically q
820
01:00:00,180 --> 01:00:04,400
lags of the innovations,
then the transition
821
01:00:04,400 --> 01:00:08,780
of those underlying states is
given by this expression here.
822
01:00:14,690 --> 01:00:17,770
And we have a state equation,
an observation equation,
823
01:00:17,770 --> 01:00:23,500
which has these forms for these
various transition matrices
824
01:00:23,500 --> 01:00:30,615
and for how the innovation
terms are related.
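And the matching construction for the MA(q) in R (the theta values are arbitrary): the state stacks the current innovation and its q lags, the transition matrix simply shifts them down, and the observation row applies (1, theta_1, ..., theta_q).

# State-space matrices for an MA(q), with state s_t = (eps_t, ..., eps_{t-q})'.
theta <- c(0.4, 0.25)                       # arbitrary MA(2) coefficients
q     <- length(theta)
T_mat <- rbind(rep(0, q + 1),               # new innovation replaces the top slot
               cbind(diag(q), rep(0, q)))   # older innovations shift down one place
R_vec <- c(1, rep(0, q))                    # eta_{t+1} enters the first state
Z_vec <- c(1, theta)                        # y_t = Z_vec %*% s_t, with no obs. noise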
825
01:00:40,840 --> 01:00:43,160
Let me just finish
up with an example
826
01:00:43,160 --> 01:00:47,780
showing the autoregressive
moving average model.
827
01:00:47,780 --> 01:00:49,340
And many years ago,
it was actually
828
01:00:49,340 --> 01:00:55,490
very difficult to
specify the estimation
829
01:00:55,490 --> 01:00:58,902
methods for autoregressive
moving average models.
830
01:00:58,902 --> 01:01:00,800
But the implementation
of these models
831
01:01:00,800 --> 01:01:05,590
as linear state-space models
facilitated that greatly.
832
01:01:05,590 --> 01:01:13,030
And with the ARMA model,
the setup basically
833
01:01:13,030 --> 01:01:14,730
is a combination of
the autoregressive and
834
01:01:14,730 --> 01:01:16,900
moving average processes.
835
01:01:16,900 --> 01:01:20,280
We have an
autoregression of the y's
836
01:01:20,280 --> 01:01:24,719
is equal to a moving
average of the residuals
837
01:01:24,719 --> 01:01:25,510
or the innovations.
838
01:01:28,170 --> 01:01:32,550
And it's convenient in the setup
for linear state-space models
839
01:01:32,550 --> 01:01:37,720
to define the dimension m,
which is the maximum of p and q
840
01:01:37,720 --> 01:01:45,860
plus 1, that is, m = max(p, q + 1), and think
of having basically an order-m
841
01:01:45,860 --> 01:01:50,860
polynomial lag for each
of those two series.
842
01:01:50,860 --> 01:01:55,060
And we can basically
constrain those values
843
01:01:55,060 --> 01:01:59,134
to be 0: phi_j is 0 for j greater than
p, and theta_j is 0 for j greater than q.
844
01:02:06,880 --> 01:02:11,240
And Harvey, in a very
important work in '93,
845
01:02:11,240 --> 01:02:17,080
actually defined a particular
state-space representation
846
01:02:17,080 --> 01:02:19,350
for this process.
847
01:02:19,350 --> 01:02:20,980
And I guess it's
important to know
848
01:02:20,980 --> 01:02:24,310
that with these linear
state-space models,
849
01:02:24,310 --> 01:02:29,030
we're dealing with
characterizing structure
850
01:02:29,030 --> 01:02:31,750
in m-dimensional space.
851
01:02:31,750 --> 01:02:35,510
There's often some choice in how
you represent your underlying
852
01:02:35,510 --> 01:02:37,670
states.
853
01:02:37,670 --> 01:02:42,430
You can basically
re-parametrize the models
854
01:02:42,430 --> 01:02:47,080
by considering invertible
linear transformations
855
01:02:47,080 --> 01:02:49,760
of the underlying states.
856
01:02:49,760 --> 01:02:52,820
So let me go back here.
857
01:02:56,700 --> 01:02:59,990
We express the state
equation generally
858
01:02:59,990 --> 01:03:04,190
as T sub t s_t plus R_t eta_t.
859
01:03:04,190 --> 01:03:08,540
This matrix T sub t
and s_t-- basically, s_t
860
01:03:08,540 --> 01:03:11,280
can be replaced by an invertible linear
transformation M of s_t,
861
01:03:11,280 --> 01:03:16,730
so long as we conjugate T sub t
by M, replacing it with M T_t times the inverse
862
01:03:16,730 --> 01:03:17,850
of that transformation.
863
01:03:17,850 --> 01:03:19,810
So there's flexibility
in the choice
864
01:03:19,810 --> 01:03:22,340
of our linear state-space
specification.
865
01:03:22,340 --> 01:03:28,820
And so there really are many
different equivalent linear
866
01:03:28,820 --> 01:03:33,380
state-space models for a
given process depending
867
01:03:33,380 --> 01:03:35,600
on exactly how you
define the states
868
01:03:35,600 --> 01:03:39,490
and the underlying
transformation matrix T.
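A quick numerical illustration of that equivalence in R (the matrices are invented for the example): transform the state by any invertible M, conjugate T, and the observable output is unchanged.

# Re-parametrize a state-space model: s*_t = M s_t, T* = M T M^{-1},
# Z* = Z M^{-1}; both parametrizations imply the same observations.
T_mat <- matrix(c(0.9, 0, 0.3, 0.5), 2, 2)
Z     <- matrix(c(1, 0), 1, 2)
M     <- matrix(c(1, 1, 0, 2), 2, 2)      # any invertible transformation
T_star <- M %*% T_mat %*% solve(M)
Z_star <- Z %*% solve(M)
s <- c(1, -1); s_star <- M %*% s
Z %*% (T_mat %*% s)                       # one step of the original system
Z_star %*% (T_star %*% s_star)            # identical observation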
869
01:03:39,490 --> 01:03:44,900
And the beauty of Harvey's
work was coming up
870
01:03:44,900 --> 01:03:47,490
with a nice representation
for the states,
871
01:03:47,490 --> 01:03:53,100
where we had very simple forms
for the various matrices.
872
01:03:53,100 --> 01:03:57,000
And the lecture notes here
go through the derivation
873
01:03:57,000 --> 01:03:59,430
of that for the ARMA process.
874
01:03:59,430 --> 01:04:04,490
And this derivation
is-- I just want
875
01:04:04,490 --> 01:04:08,240
to go through the
first case just
876
01:04:08,240 --> 01:04:11,020
to highlight how
the argument goes.
877
01:04:11,020 --> 01:04:15,090
We basically have this equation,
which is the original equation
878
01:04:15,090 --> 01:04:17,345
for an ARMA(p,q) process.
879
01:04:20,180 --> 01:04:25,810
And Harvey says, well,
define the first--
880
01:04:25,810 --> 01:04:29,460
or the first state at time t, to
be equal to the observation
881
01:04:29,460 --> 01:04:31,820
at time t.
882
01:04:31,820 --> 01:04:38,250
If we do that, then how
does this equation relate
883
01:04:38,250 --> 01:04:46,000
to the state equation? Basically, the
state at the next time point, t
884
01:04:46,000 --> 01:04:50,610
plus 1, is equal to phi_1
times the state at time t,
885
01:04:50,610 --> 01:05:00,340
plus a second state at time
t, plus a residual innovation
886
01:05:00,340 --> 01:05:01,420
eta_t.
887
01:05:01,420 --> 01:05:09,110
So by choosing the first state
to be the observation value
888
01:05:09,110 --> 01:05:16,680
at that time, we can then
solve for the second state,
889
01:05:16,680 --> 01:05:19,810
which is given by
this expression,
890
01:05:19,810 --> 01:05:25,730
just by rewriting our model
equation in terms of s_(1,t),
891
01:05:25,730 --> 01:05:27,880
s_(2,t) and eta_t.
892
01:05:27,880 --> 01:05:36,950
So this s_(2,t) is this function
of the observations and eta_t.
893
01:05:36,950 --> 01:05:39,440
So it's a very
simple specification
894
01:05:39,440 --> 01:05:41,820
of the second state.
895
01:05:41,820 --> 01:05:48,020
Just what is that
second state element
896
01:05:48,020 --> 01:05:50,520
given this definition
of the first one?
897
01:05:50,520 --> 01:05:54,650
And one can do this
process iteratively
898
01:05:54,650 --> 01:05:59,180
getting rid of the
observations and replacing them
899
01:05:59,180 --> 01:06:01,290
by underlying states.
900
01:06:01,290 --> 01:06:03,770
And at the end of
the day, you end up
901
01:06:03,770 --> 01:06:09,490
with this very simple form
for the transition matrix T.
902
01:06:09,490 --> 01:06:13,950
Basically, the T has the
autoregressive components
903
01:06:13,950 --> 01:06:16,410
as the first column
of the T matrix.
904
01:06:16,410 --> 01:06:20,440
And this R matrix has
this vector of the moving
905
01:06:20,440 --> 01:06:22,550
average components.
906
01:06:22,550 --> 01:06:28,330
So it's a very nice way
to represent the model.
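A sketch of Harvey's construction in R (the helper name and example coefficients are mine): pad the phi's and theta's with zeros out to m = max(p, q + 1), put the phi's in the first column of T with an identity block beside them, and stack (1, theta_1, ..., theta_{m-1}) in R.

# Harvey's state-space form for an ARMA(p, q), with m = max(p, q + 1) >= 2.
harvey_arma <- function(phi, theta) {
  m       <- max(length(phi), length(theta) + 1)
  phi_m   <- c(phi, rep(0, m - length(phi)))           # phi_j = 0 for j > p
  theta_m <- c(theta, rep(0, m - 1 - length(theta)))   # theta_j = 0 for j > q
  T_mat   <- cbind(phi_m, rbind(diag(m - 1), rep(0, m - 1)))
  list(T = unname(T_mat),          # phi's in the first column, identity above
       R = c(1, theta_m),          # moving average weights on the innovation
       Z = c(1, rep(0, m - 1)))    # y_t is the first state, no obs. noise
}
harvey_arma(phi = c(0.6, -0.3), theta = 0.4)   # an ARMA(2,1) example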
907
01:06:28,330 --> 01:06:32,990
Coming up with it was something
very clever that he did.
908
01:06:32,990 --> 01:06:36,580
But what one can see is
that this basic model where
909
01:06:36,580 --> 01:06:41,620
you have the states
transitioning according
910
01:06:41,620 --> 01:06:45,540
to a linear transformation of
the previous state plus error,
911
01:06:45,540 --> 01:06:49,910
and the observation being some
function of the current states,
912
01:06:49,910 --> 01:06:54,119
plus error or not, depending
on the formulation,
913
01:06:54,119 --> 01:06:55,035
is the representation.
914
01:06:58,200 --> 01:07:03,770
Now, with all of
these models, a reason
915
01:07:03,770 --> 01:07:08,860
why linear state-space
modeling is in fact effective
916
01:07:08,860 --> 01:07:19,711
is that estimation and
inference are fully specified by the Kalman
917
01:07:19,711 --> 01:07:20,210
filter.
918
01:07:22,730 --> 01:07:32,100
So with this formulation of
linear state-space models,
919
01:07:32,100 --> 01:07:37,000
the Kalman filter
as a methodology is
920
01:07:37,000 --> 01:07:41,380
the recursive computation
of the probability density
921
01:07:41,380 --> 01:07:48,535
functions for the underlying
states at basically
922
01:07:48,535 --> 01:07:52,420
t plus 1 given
information up to time t,
923
01:07:52,420 --> 01:07:56,710
as well as the joint
density of the future state
924
01:07:56,710 --> 01:07:59,800
and the future observation at
t plus 1, given information up
925
01:07:59,800 --> 01:08:02,370
to time t.
926
01:08:02,370 --> 01:08:05,520
And also just the
marginal distribution
927
01:08:05,520 --> 01:08:10,380
of the next observation given
the information up to time t.
928
01:08:20,490 --> 01:08:26,510
So what I want to do is
just go through with you
929
01:08:26,510 --> 01:08:31,550
how the Kalman filter is
implemented and defined.
930
01:08:31,550 --> 01:08:35,370
And the implementation
of the Kalman filter
931
01:08:35,370 --> 01:08:40,939
requires us to have some
notation that's a bit involved,
932
01:08:40,939 --> 01:08:46,710
but we'll hopefully explain it
so it's very straightforward.
933
01:08:46,710 --> 01:08:49,474
There are basically conditional
means of the states.
934
01:08:52,090 --> 01:08:55,450
s sub t given t
is the mean value
935
01:08:55,450 --> 01:08:59,510
of the state at time t given
the information up to time t.
936
01:08:59,510 --> 01:09:02,069
If we condition
on t minus 1, then
937
01:09:02,069 --> 01:09:03,500
it's the expectation
of the state
938
01:09:03,500 --> 01:09:06,300
at time t given the
information up to t minus 1.
939
01:09:09,460 --> 01:09:12,100
And then y sub t given t minus
1 is the expectation
940
01:09:12,100 --> 01:09:16,880
of the observation given
information up to t minus 1.
941
01:09:16,880 --> 01:09:18,780
There's also
conditional covariances
942
01:09:18,780 --> 01:09:22,260
and mean squared errors.
943
01:09:22,260 --> 01:09:26,620
All these covariances
are denoted by omegas.
944
01:09:26,620 --> 01:09:33,240
The subscript corresponds to
states s, or observation y.
945
01:09:33,240 --> 01:09:35,060
And basically, the
conditioning set
946
01:09:35,060 --> 01:09:39,149
is either information up to
time t, or time t minus 1
947
01:09:39,149 --> 01:09:40,479
in the second case.
948
01:09:40,479 --> 01:09:45,370
And we want to compute
basically the covariance matrix
949
01:09:45,370 --> 01:09:49,999
of the states given whatever
the information is, information
950
01:09:49,999 --> 01:09:52,439
up to time t or t minus 1.
951
01:09:52,439 --> 01:09:57,810
So these covariance
matrices are the expectation
952
01:09:57,810 --> 01:10:01,990
of the state minus
their expectation
953
01:10:01,990 --> 01:10:06,850
under the conditioning times
the state minus the expectation
954
01:10:06,850 --> 01:10:07,950
transpose.
955
01:10:07,950 --> 01:10:10,810
That's the definition of
that covariance matrix.
956
01:10:10,810 --> 01:10:12,230
So the different
definitions here
957
01:10:12,230 --> 01:10:14,300
correspond to just
whether we're conditioning
958
01:10:14,300 --> 01:10:15,345
on different information.
959
01:10:17,900 --> 01:10:23,170
And then the observation
innovations or residuals
960
01:10:23,170 --> 01:10:29,510
are the difference
between an observation y_t
961
01:10:29,510 --> 01:10:33,847
and its estimate given
information up to t minus 1.
962
01:10:37,190 --> 01:10:41,370
So the residuals in this process
are the innovation residuals,
963
01:10:41,370 --> 01:10:44,200
one period ahead.
964
01:10:44,200 --> 01:10:50,780
And the Kalman filter
consists of four steps.
965
01:10:50,780 --> 01:11:00,800
We basically want to, first,
predict the state vector
966
01:11:00,800 --> 01:11:01,780
one step ahead.
967
01:11:01,780 --> 01:11:10,140
So given our estimate of the
state vector at time t minus 1,
968
01:11:10,140 --> 01:11:14,800
we want to predict this
state vector at time t.
969
01:11:14,800 --> 01:11:18,220
And we also want to
predict the observation
970
01:11:18,220 --> 01:11:23,820
at time t given our estimate
of the state vector at time t minus 1.
971
01:11:23,820 --> 01:11:31,674
And so at time t minus 1, we
can estimate these quantities.
972
01:11:31,674 --> 01:11:32,174
[INAUDIBLE]
973
01:11:35,646 --> 01:11:40,969
At t minus 1, we can
basically predict
974
01:11:40,969 --> 01:11:42,760
what the state is going to
be and predict what
975
01:11:42,760 --> 01:11:44,750
the observation is going to be.
976
01:11:44,750 --> 01:11:47,166
And we can estimate
how much error there's
977
01:11:47,166 --> 01:11:49,707
going to be in those estimates,
by these covariance matrices.
978
01:11:59,420 --> 01:12:05,140
The second step is
updating these predictions
979
01:12:05,140 --> 01:12:11,900
to get our estimate of the state
given the observation at time t
980
01:12:11,900 --> 01:12:15,480
and to update our uncertainty
about that state given
981
01:12:15,480 --> 01:12:16,380
this new observation.
982
01:12:16,380 --> 01:12:21,350
So basically, our estimate
of the state at time t
983
01:12:21,350 --> 01:12:25,310
is an adjustment to our
estimate given information up
984
01:12:25,310 --> 01:12:31,164
to t minus 1, plus a function of
the difference between what we
985
01:12:31,164 --> 01:12:32,455
observed and what we predicted.
986
01:12:35,020 --> 01:12:42,870
And the matrix multiplying that difference
is called the filter gain matrix.
987
01:12:42,870 --> 01:12:45,120
And basically, it
characterizes how
988
01:12:45,120 --> 01:12:50,070
do we adjust our prediction
of the underlying state
989
01:12:50,070 --> 01:12:52,760
depending on what happened.
990
01:12:52,760 --> 01:12:54,440
So that's the
filter gain matrix.
991
01:12:57,150 --> 01:13:00,470
So we actually do
gain information
992
01:13:00,470 --> 01:13:03,160
with each observation about what
the new value of the process
993
01:13:03,160 --> 01:13:04,320
is.
994
01:13:04,320 --> 01:13:06,830
And that information
is characterized
995
01:13:06,830 --> 01:13:09,190
by the filter gain matrix.
996
01:13:09,190 --> 01:13:11,580
You'll notice that
the uncertainty
997
01:13:11,580 --> 01:13:15,720
in the state at time t, this
omega_s of t given t, that's
998
01:13:15,720 --> 01:13:19,630
equal to the covariance
matrix given t minus 1, minus an adjustment.
999
01:13:19,630 --> 01:13:23,330
So it's our beginning level
of uncertainty adjusted
1000
01:13:23,330 --> 01:13:27,790
by a term that tells us
how much information did we
1001
01:13:27,790 --> 01:13:29,580
get from that new observation.
1002
01:13:29,580 --> 01:13:33,590
So notice that there's
a minus sign there.
1003
01:13:33,590 --> 01:13:35,600
We're basically
reducing our uncertainty
1004
01:13:35,600 --> 01:13:44,602
about the state given the
information in the innovation
1005
01:13:44,602 --> 01:13:45,685
that we now have observed.
1006
01:13:48,800 --> 01:13:51,870
Then, there's a
forecasting step which
1007
01:13:51,870 --> 01:13:59,310
is used to forecast the
state one period forward; that forecast
1008
01:13:59,310 --> 01:14:01,400
is simply given by this
linear transformation
1009
01:14:01,400 --> 01:14:03,170
of the previous state.
1010
01:14:03,170 --> 01:14:05,890
And we can also update
our covariance matrix
1011
01:14:05,890 --> 01:14:09,580
for future states given
the previous state
1012
01:14:09,580 --> 01:14:13,530
by applying this formula
which is a recursive formula
1013
01:14:13,530 --> 01:14:17,580
for estimating covariances.
1014
01:14:17,580 --> 01:14:24,760
So we have
forecasting algorithms
1015
01:14:24,760 --> 01:14:29,520
that are simple linear
functions of these estimates.
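The prediction and updating formulas above translate almost line-for-line into R. This is a minimal, univariate-observation sketch in base R (no packages; H denotes the observation-noise variance, a name I am supplying):

# One pass of the Kalman filter: predict, update, and accumulate the
# log-likelihood.  Model: s_{t+1} = T s_t + R eta_t, eta_t ~ N(0, Q);
# y_t = Z s_t + eps_t, eps_t ~ N(0, H).
kalman_filter <- function(y, T_mat, Z, Q, H, R_mat, m0, C0) {
  m <- m0; C <- C0; loglik <- 0
  for (t in seq_along(y)) {
    # Prediction step: s_{t|t-1} and Omega_{t|t-1}.
    m_pred <- T_mat %*% m
    C_pred <- T_mat %*% C %*% t(T_mat) + R_mat %*% Q %*% t(R_mat)
    # Innovation: observed y_t minus its one-step-ahead prediction.
    v   <- y[t] - c(Z %*% m_pred)
    F_t <- c(Z %*% C_pred %*% t(Z)) + H
    # Updating step: gain, revised state mean, reduced state covariance.
    K <- (C_pred %*% t(Z)) / F_t
    m <- m_pred + K * v
    C <- C_pred - K %*% Z %*% C_pred
    loglik <- loglik - 0.5 * (log(2 * pi) + log(F_t) + v^2 / F_t)
  }
  list(m = m, C = C, loglik = loglik)
}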
1016
01:14:29,520 --> 01:14:35,650
And then finally,
there's a smoothing step
1017
01:14:35,650 --> 01:14:43,960
which is characterizing
the conditional expectation
1018
01:14:43,960 --> 01:14:49,950
of underlying states, given
information in the whole time
1019
01:14:49,950 --> 01:14:51,150
series.
1020
01:14:51,150 --> 01:14:55,440
And so ordinarily, Kalman
filters
1021
01:14:55,440 --> 01:14:58,210
are applied
sequentially over time
1022
01:14:58,210 --> 01:15:01,090
where one basically
is predicting ahead
1023
01:15:01,090 --> 01:15:03,550
one step, updating
that prediction,
1024
01:15:03,550 --> 01:15:08,320
predicting ahead another
step, updating the information
1025
01:15:08,320 --> 01:15:10,930
on the states.
1026
01:15:10,930 --> 01:15:19,410
And that overall
process is also the basis
1027
01:15:19,410 --> 01:15:21,550
of actually computing
the likelihood
1028
01:15:21,550 --> 01:15:25,210
function for these linear
state-space models.
1029
01:15:25,210 --> 01:15:32,140
And so the Kalman filter is
ultimately applied
1030
01:15:32,140 --> 01:15:35,010
for successive
forecasting of the process
1031
01:15:35,010 --> 01:15:39,600
but also for helping us identify
what the underlying model
1032
01:15:39,600 --> 01:15:43,430
parameters are using
maximum likelihood methods.
1033
01:15:43,430 --> 01:15:48,290
And so the likelihood function
for the linear state-space
1034
01:15:48,290 --> 01:15:52,050
model is basically the--
or the log-likelihood
1035
01:15:52,050 --> 01:15:54,920
is the log-likelihood of
the entire data series,
1036
01:15:54,920 --> 01:15:56,980
given the unknown parameters.
1037
01:15:56,980 --> 01:16:00,020
But that can be
expressed as the product
1038
01:16:00,020 --> 01:16:04,290
of the conditional distributions
of each successive observation,
1039
01:16:04,290 --> 01:16:07,150
given the history.
1040
01:16:07,150 --> 01:16:09,750
And so basically, the
likelihood of theta
1041
01:16:09,750 --> 01:16:12,390
is the likelihood of
the first observation
1042
01:16:12,390 --> 01:16:15,240
times the density of the
second observation given
1043
01:16:15,240 --> 01:16:18,990
the first, and so
forth for the whole series.
1044
01:16:18,990 --> 01:16:22,650
And so the likelihood
function is basically
1045
01:16:22,650 --> 01:16:25,490
a function of all these
terms that we were computing
1046
01:16:25,490 --> 01:16:26,490
with the Kalman filter.
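Since the filter above already accumulates the innovations v_t and their variances F_t, maximum likelihood estimation is just an optimizer wrapped around it. A hypothetical AR(1) example using the kalman_filter sketch from before:

# Fit an AR(1), y_t = phi y_{t-1} + eta_t, by maximizing the likelihood
# that the Kalman filter computes via the prediction error decomposition.
set.seed(3)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 500))
negloglik <- function(par) {
  phi    <- tanh(par[1])                  # keeps |phi| < 1
  sigma2 <- exp(par[2])                   # keeps the variance positive
  -kalman_filter(y, T_mat = matrix(phi), Z = matrix(1, 1, 1),
                 Q = matrix(sigma2), H = 0, R_mat = matrix(1),
                 m0 = 0, C0 = matrix(sigma2 / (1 - phi^2)))$loglik
}
fit <- optim(c(0, 0), negloglik)
tanh(fit$par[1])                          # estimate of phi, close to 0.7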
1047
01:16:29,260 --> 01:16:33,470
And the Kalman
filter basically
1048
01:16:33,470 --> 01:16:36,760
provides all the terms
necessary for this estimation.
1049
01:16:36,760 --> 01:16:42,270
If the error terms are
normally distributed,
1050
01:16:42,270 --> 01:16:46,550
then the means and
variances of these estimates
1051
01:16:46,550 --> 01:16:52,750
are in fact characterizing
the exact distributions
1052
01:16:52,750 --> 01:16:54,300
of the process.
1053
01:16:54,300 --> 01:16:56,850
Basically, we're taking--
if the innovation series are
1054
01:16:56,850 --> 01:16:59,290
all normal random
variables, then
1055
01:16:59,290 --> 01:17:00,980
the linear
state-space model, all
1056
01:17:00,980 --> 01:17:03,750
it's doing is taking linear
combinations of normals
1057
01:17:03,750 --> 01:17:07,410
for the underlying states and
for the actual observations.
1058
01:17:07,410 --> 01:17:08,890
And normal
distributions are fully
1059
01:17:08,890 --> 01:17:10,610
characterized by
their mean vectors
1060
01:17:10,610 --> 01:17:12,310
and covariance matrices.
1061
01:17:12,310 --> 01:17:14,050
And the Kalman
filter provides a way
1062
01:17:14,050 --> 01:17:21,570
to update these distributions
for all these features
1063
01:17:21,570 --> 01:17:23,000
of a model, the
underlying states
1064
01:17:23,000 --> 01:17:26,520
as well as the distributions
of the observations.
1065
01:17:26,520 --> 01:17:35,250
So that's a brief introduction
to the Kalman filter.
1066
01:17:35,250 --> 01:17:36,940
Let's finish there.
1067
01:17:36,940 --> 01:17:38,490
Thank you.