The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: All right. I want to complete the discussion on volatility modeling in the first part of the lecture today. Last time we addressed the definition of ARCH models, which allow for time-varying volatility in modeling the returns of a financial time series. We were looking at modeling the euro-dollar exchange rate returns, and we went through fitting ARCH models to those returns, and also looked at fitting the GARCH model to those returns.

To recap, the GARCH model extends the ARCH model by adding some extra terms. If you look at this expression for the GARCH model, the first terms for the time-varying volatility sigma squared t are a linear combination of the past squared residual returns. That's the ARCH model of order p: the current volatility depends on what has happened in excess returns over the last p periods. But then we add extra terms corresponding to q lags of the previous volatility. So what we're doing with GARCH models is adding extra parameters to the ARCH model, but an advantage of considering these extra parameters, which relate the current volatility sigma squared t to the lagged values sigma squared t minus j for lags j, is that we may be able to have a model with many fewer parameters.

So indeed, if we fit these models to the exchange rate returns, what we found last time -- let me go through and show that -- was basically this: here are fits of three ARCH models, of orders 1, 2, and 10, thinking we may need many lags to fit volatility, and then the GARCH(1,1) model, where we have only one ARCH term and one GARCH term.
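As a rough sketch of that GARCH(1,1) recursion -- added here for illustration, not code from the lecture, and with made-up parameter values -- one could simulate a path in R like this:

```r
# Sketch: simulate a Gaussian GARCH(1,1) path (illustrative parameter values only)
set.seed(1)
n      <- 1000
alpha0 <- 1e-6; alpha1 <- 0.05; beta1 <- 0.90    # alpha1 + beta1 < 1
eps    <- numeric(n)                             # the "excess returns"
sig2   <- numeric(n)
sig2[1] <- alpha0 / (1 - alpha1 - beta1)         # start at the long-run variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] <- alpha0 + alpha1 * eps[t - 1]^2 + beta1 * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)            # epsilon_t = sigma_t * z_t, z_t ~ N(0,1)
}
plot(sqrt(sig2), type = "l", ylab = "sigma_t")   # volatility clustering is visible
```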
So the blue line in this graph shows the plot of the fitted GARCH(1,1) model as compared with the ARCH models. Now, in looking at this graph, one can actually see some features of how these models are fitting volatility, which is important to understand. One is that the ARCH models have a hard lower bound on the volatility. There's a constant term in the volatility equation, and because the additional terms are squared excess returns, the volatility has a lower bound at that intercept. So depending on what range you fit the data over, that lower bound is going to be determined by the data you're fitting to. As you increase the ARCH order, you basically allow for a lower lower bound. And with the GARCH model you can see that this blue line is actually predicting very different levels of volatility over the entire range of the series, so it really is much more flexible.

Now, in these fits we are assuming Gaussian distributions for the innovations in the return series. We'll soon pursue looking at alternatives to that, but let me talk just a little bit more about the GARCH model, going back to the lecture notes here. So let me expand this. OK. So there's the specification, the GARCH(1,1) model. One thing to note is that this GARCH(1,1) model does relate to an ARMA, an autoregressive moving average, process in the squared residuals. So if we look at the top line, which is the equation for the GARCH(1,1) model, consider eliminating sigma squared t by using a new innovation term, little u t, which is the difference between the squared residual and the true volatility given by the model.
So if you plug in the difference between our squared excess return and the current volatility, that should have mean 0, because sigma squared t, the time-t volatility, is equal to the expectation of the squared excess residual return, epsilon t squared. So if we plug that in, we basically get an ARMA model for the squared residuals: epsilon t squared is alpha 0, plus alpha 1 plus beta 1 times the squared residual at lag t minus 1, plus u t minus beta 1 times u t minus 1. And what this implies is an ARMA(1,1) model with white noise that has mean 0 and variance 2 sigma to the fourth. Just plugging things in.

And through our knowledge and understanding of univariate time series models, ARMA models, we can express this ARMA model for the squared residuals as a polynomial lag of the squared residuals equal to a polynomial lag of the innovations. And so we have this expression for what the innovations are. It's required that this a(L) operator, when thought of on the complex plane, has roots outside the unit circle, which corresponds to alpha 1 plus beta 1 being less than 1 in magnitude. So in order for these volatility models not to blow up, and to be covariance stationary, we have these bounds on the parameters.

OK, let's look at the unconditional volatility, or long-run variance, of the GARCH model. If you take expectations on both sides of the GARCH model equation, you basically have that the expectation of sigma squared sub t -- in the long run this is sigma star squared -- is alpha 0 plus alpha 1 plus beta 1 times sigma star squared. That sigma star squared there is also the expectation of the t minus 1 volatility in the limit. And then you can just solve for this and see that sigma star squared is equal to alpha 0 over 1 minus alpha 1 minus beta 1. And in terms of the stationarity conditions for the process, in order for that long-run variance to be finite, you need alpha 1 plus beta 1 to be less than 1 in magnitude.
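Restating that algebra compactly (this just rewrites the slide's equations in LaTeX, with u_t as defined above):

```latex
% GARCH(1,1):  \sigma_t^2 = \alpha_0 + \alpha_1\,\epsilon_{t-1}^2 + \beta_1\,\sigma_{t-1}^2
% With u_t = \epsilon_t^2 - \sigma_t^2 (mean zero), substitution gives the ARMA(1,1) form
\epsilon_t^2 = \alpha_0 + (\alpha_1 + \beta_1)\,\epsilon_{t-1}^2 + u_t - \beta_1\,u_{t-1},
% and taking expectations in the limit gives the long-run variance
\sigma_*^2 = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}, \qquad \alpha_1 + \beta_1 < 1.
```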
And if you consider the general GARCH(p,q) model, then the same argument leads to a long-run variance equal to alpha 0, the intercept term in the GARCH model, divided by 1 minus the sum of all the alpha and beta parameters. So these GARCH models lead to constraints on the parameters that are important to incorporate when we're doing any estimation of these underlying parameters. And it does complicate things, actually.

So with maximum likelihood estimation, the routine is the same for all models. We basically want to determine the likelihood function of our data given the unknown parameters, and the likelihood function is the probability density function of the data conditional on the parameters. So our likelihood function, as a function of the unknown parameters c, alpha, and beta, is the value of the joint probability density of all the data conditional on those parameters. And that joint density function can be expressed as the product of successive conditional densities of the time series. Those conditional densities are densities of normal random variables, so we can just plug in what we know to be the probability densities of normals for the t-th innovation epsilon t, and we just optimize that function.

Now, the challenge with estimating these GARCH models is in part the constraints on the underlying parameters. Those need to be enforced. So we have to have that the alpha i are greater than 0, the beta j are greater than 0, and the sum of all of them is between 0 and 1.

Who in this class has had courses in numerical analysis and done some optimization of functions? Non-linear functions? Anybody? OK. Well, in addressing this kind of problem, which will come up with any complex model that you need to estimate, say via maximum likelihood, the optimization methods do really well if you're optimizing a convex function, finding the minimum of a convex function.
And it's always nice to do minimization over an unconstrained range of the underlying parameters. So one of the tricks in solving these problems is to transform the parameters to a scale where they're unlimited in range. If you have a positive parameter, you might use the log of that parameter as the thing to be optimizing over. If the parameter is between 0 and 1, then you might take that parameter divided by 1 minus that parameter and then take the log of that, and that's unconstrained. So there are tricks for how you do this optimization which come into play. Anyway, that's the likelihood with the normal distribution, and we have computer programs that will solve that directly, so we don't have to worry about this particular case.

Once we fit this model, we want to evaluate how good it is, and the evaluation is based upon looking at the residuals from the model. So what we have are these innovations, epsilon hat t, which should have volatility sigma hat t. The standardized residuals should be uncorrelated with themselves, or at least to the extent that they can be, and the squared standardized residuals should also be uncorrelated. What we're trying to do with these models is to capture the dependence in the squared residuals, which measure the magnitude of the excess returns. So those should be uncorrelated.

There are various tests for normality; I've listed some of the most popular here. And then there are issues of model selection for deciding which GARCH model to apply. I wanted to go through an example of this analysis with the euro-dollar exchange rate, so let me go to this case study note. Let's see. There's a package in R called rugarch for univariate GARCH models, which fits various GARCH models by maximum likelihood.
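Before turning to the package, here is a minimal sketch, not from the lecture, of what those transformation tricks look like for a Gaussian GARCH(1,1): the negative log likelihood is written over unconstrained parameters, with a log scale for alpha 0 and logistic transforms keeping alpha 1 plus beta 1 inside (0, 1); `returns` is a placeholder name for the return series.

```r
# Sketch: Gaussian GARCH(1,1) negative log likelihood over unconstrained parameters
garch11.nll <- function(theta, eps) {
  alpha0 <- exp(theta[1])                        # positive via the log scale
  persis <- plogis(theta[2])                     # alpha1 + beta1 in (0,1) via the logit scale
  w      <- plogis(theta[3])                     # share of the persistence going to alpha1
  alpha1 <- persis * w
  beta1  <- persis * (1 - w)
  n    <- length(eps)
  sig2 <- numeric(n)
  sig2[1] <- var(eps)                            # simple initialization of the recursion
  for (t in 2:n)
    sig2[t] <- alpha0 + alpha1 * eps[t - 1]^2 + beta1 * sig2[t - 1]
  0.5 * sum(log(2 * pi) + log(sig2) + eps^2 / sig2)   # minus the sum of Gaussian log densities
}
# fit <- optim(c(log(1e-6), qlogis(0.9), qlogis(0.1)), garch11.nll, eps = returns)
```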
So with this particular library in R, I fit the GARCH model after actually fitting the mean process for the exchange rate returns. Now, when we looked at things last time, we basically looked at modeling the squared returns. In fact, there may be an underlying mean process that needs to be specified as well. So in this section of the case note, I initially fit an autoregressive process, using the Akaike information criterion to choose the order of the autoregressive process, and then fit a GARCH model with Gaussian innovations.

And this is the normal q-q plot of the autoregressive residuals. What you can see is that the points lie along a straight line in the middle of the range, but on the extremes they depart from that straight line. This is basically a plot of standardized quantiles. So in terms of standard units away from the mean for the residuals, we tend to get many more high values and many more low values than the Gaussian distribution would predict. So that really isn't fitting very well.

If we proceed and fit -- OK, actually that plot was just for the simple ARCH model with no GARCH terms. And then this is the q-q plot for the GARCH model under the Gaussian assumption. So here we can see that the residuals from this model suggest it may do a pretty good job when things are only a few standard deviations away from the mean, less than 2 or 2.5. But when we get to more extreme values, this isn't modeling things well. So one alternative is to consider a heavier-tailed distribution than the normal, namely the t distribution, and to consider identifying what t distribution best fits the data.

So let's just look at what ends up being the maximum likelihood estimate for the degrees of freedom parameter, which is 10 degrees of freedom. This shows the q-q plot when you have a non-Gaussian distribution that's t with 10 degrees of freedom.
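The corresponding rugarch call would look roughly like the following. This is a sketch of the package usage rather than the script from the case note: `eurusd.ret` is a placeholder name for the return series, and the AR order shown here simply stands in for the order chosen by AIC.

```r
# Sketch: AR mean plus GARCH(1,1) with Student-t innovations via rugarch
library(rugarch)
spec.t <- ugarchspec(
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model         = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std")                  # "std" is the Student-t distribution
fit.t <- ugarchfit(spec = spec.t, data = eurusd.ret)
coef(fit.t)                                    # "shape" is the fitted degrees of freedom
vol.t <- sigma(fit.t)                          # fitted conditional volatility series
```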
That t distribution basically explains these residuals quite well, so that's accommodating the heavier-tailed distribution of these values.

With this GARCH model, let's see -- let's compare the estimates of volatility under the GARCH models with the t distribution versus the Gaussian. Here's a graph showing time series plots of the estimated volatility over time, which actually look quite close. But when you look at the differences, there really are differences. It turns out that the volatility estimates from GARCH models with Gaussian errors versus GARCH with t-distributed errors are really very, very similar. The heavier tails of the t distribution mean that the distribution of the returns about that volatility has more mass in the extremes, but in terms of estimating the volatility, you get quite similar estimates of the volatility coming out. And this display -- which you'll be able to see more clearly in the case notes that I'll post -- shows that these are really quite similar in magnitude.

And the value at risk concept that was discussed by Ken a couple of weeks ago in his lecture from Morgan Stanley concerns the issue of estimating the likelihood of returns exceeding some threshold. If we use the t distribution for measuring the variability of the excess returns, then the computations in the notes indicate how you would compute these value at risk limits. If you compare the t distribution with a Gaussian distribution at nominal levels for value at risk like 2.5% or 5%, surprisingly you won't get too much difference. It's really in looking at the extreme tails of the distribution that things come into play. And so I wanted to show you how that plays out by showing you another graph here.

Those of you who have had a statistics course before have heard that a t distribution can be approximated well by a normal if the degrees of freedom for the t are at some level.
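Looping back for a moment to the value-at-risk comparison just mentioned, a small sketch of that computation, with an invented one-day volatility and the fitted 10 degrees of freedom; the t quantiles are rescaled so the t has unit variance.

```r
# Sketch: one-day VaR thresholds under normal vs. unit-variance t(10) innovations
sig <- 0.006                                               # hypothetical volatility estimate
nu  <- 10
p   <- c(0.05, 0.025, 0.001)
var.normal <- qnorm(p) * sig                               # Gaussian quantiles
var.t      <- qt(p, df = nu) * sqrt((nu - 2) / nu) * sig   # rescaled t quantiles
round(cbind(p, var.normal, var.t), 5)
# at 5% and 2.5% the two are close; the gap opens up far out in the tail (0.1%)
```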
And who wants to suggest a degrees of freedom that you might need before you're comfortable approximating a t with a normal? Danny?

AUDIENCE: 30 or 40.

PROFESSOR: 30 or 40. Sometimes people say even 25. Above 25, you can almost expect the t distribution to be a good approximation to the normal. Well, this is a graph of the PDF for a standard normal versus a t with 30 degrees of freedom, and you can see that the density functions are very, very close. The CDFs, the cumulative distribution functions -- the likelihood of being less than or equal to the horizontal value, ranging between 0 and 1 -- are almost indistinguishable. But if you look at the tails of the distribution, where here I've computed the log of the CDF, you basically have to move much more than two standard deviations away from the mean before there's really a difference with the t distribution with 30 degrees of freedom.

Now I'm going to page up, reducing the degrees of freedom. Let's see, if we could do a page down here. Page down. Oh, page up. OK. So here is 20 degrees of freedom. And here's 10 degrees of freedom, which in our case turns out to be the best fit of the t distribution. What you can see is that, in terms of standard deviation units, up to about two standard deviations below the mean we're basically getting virtually the same probability mass in the lower tail. But as we go to four or six standard deviations, then we get heavier mass with the t distribution.

In discussions of results in finance, when you fit models, people talk about, oh, there was a six standard deviation move -- which under a normal model is virtually impossible to occur. Well, with t distributions, a six standard deviation move occurs about 1 in 10,000 times according to this fit. So it actually is something that does happen.
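A quick check of that remark, in the same spirit (comparing the distributions on their own scales rather than reproducing the lecture's exact numbers):

```r
# Sketch: probability of a move at least 6 units below the mean
pnorm(-6)          # about 1e-9 under the normal: essentially never
pt(-6, df = 10)    # roughly 1e-4 under a t with 10 df: on the order of 1 in 10,000
pt(-6, df = 30)    # a t with 30 df is already much closer to the normal
```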
And so it's important to know that these t distributions are benefiting us by giving us a much better gauge of what the tail of the distribution is like. We call these distributions leptokurtic, meaning they're heavier-tailed than a normal distribution. Actually, lepto means slender, I believe, in the Greek origin of the word. And you can see that the blue curve, which is the t distribution, is a bit more slender in the center of the distribution, which allows it to have heavier tails.

All right. So t distributions are very useful. Let's go back to this case note here, which goes through, actually, fitting the t distribution -- identifying the degrees of freedom for this t model. And so with the rugarch package, we can get the log likelihood of the data fit under the t distribution assumption. Here's a graph of the negative log likelihood versus the degrees of freedom in the t model. So with maximum likelihood we identify the value which minimizes the negative log likelihood, and that comes out at the value of 10.

All right. Let's go back to these notes and see what else we want to talk about.
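If you wanted to reproduce roughly that profile over the degrees of freedom without refitting the whole model each time, a bare-bones sketch is below; it assumes `z` holds the standardized residuals from the fitted model and holds everything else fixed, so it is only an approximation to the profile likelihood shown in the case note.

```r
# Sketch: negative log likelihood of standardized residuals under a unit-variance t(nu)
nll.t <- function(nu, z) {
  s <- sqrt(nu / (nu - 2))                 # rescale so the t has variance 1
  -sum(dt(z * s, df = nu, log = TRUE) + log(s))
}
df.grid <- 4:40
nll     <- sapply(df.grid, nll.t, z = z)
plot(df.grid, nll, type = "b", xlab = "degrees of freedom", ylab = "negative log likelihood")
df.grid[which.min(nll)]                    # the minimum is at 10 in the case note
```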
OK, with these GARCH models we actually are able to model volatility clustering. Volatility clustering is where, over time, you expect volatility to be high during some periods and to be low during other periods, and the GARCH model can accommodate that. So large volatilities tend to be followed by large ones, and small volatilities tend to be followed by small ones.

OK. The returns have heavier tails than Gaussian distributions. Actually, even if we have Gaussian errors in the GARCH model, the returns are still heavier-tailed than a Gaussian; the homework goes into that a little bit. And one of the original papers, by Engle and by Bollerslev, who introduced these models, discusses these features and how useful they are for modeling financial time series.

Now, a property of these models that may be obvious, perhaps, is this: these are models that are appropriate for modeling covariance stationary time series. So the volatility, which is a measure of the squared excess return, is basically a covariance stationary process. What does that mean? That means it's going to have a long-term mean. So with these GARCH models that are covariance stationary, there's going to be a long-term mean of the GARCH process. And this discussion here details how this GARCH process essentially exhibits mean reversion of the volatility to that value. So basically, the excess volatility of the squared residuals relative to their long-term average is some multiple of the previous period's excess volatility.

So if we build forecasting models of volatility with GARCH models, what's going to happen? Basically, in the long run we predict that any volatility value is going to revert to this long-run average, and in the short run it's going to move incrementally toward that value. So these GARCH models are very good for describing volatility relative to the long-term average. In terms of their usefulness for prediction, well, they really predict that volatility is going to revert back to the mean at some rate. And the rate at which the volatility reverts back is given by alpha 1 plus beta 1. So that number, which is less than 1 for covariance stationarity, is measuring, basically, how quickly you revert back to the mean. That sum is actually called the persistence parameter in GARCH models as well. So is volatility persistent or not? The larger alpha 1 plus beta 1 is, the more persistent volatility is, meaning it reverts back to that long-run average very, very slowly.
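A small sketch of that mean-reversion behavior, with made-up parameter values: the h-step-ahead variance forecast decays geometrically toward the long-run level at the rate alpha 1 plus beta 1.

```r
# Sketch: multi-step GARCH(1,1) variance forecasts revert to the long-run level
alpha0 <- 1e-6; alpha1 <- 0.05; beta1 <- 0.90           # illustrative values
persistence <- alpha1 + beta1
sig2.star   <- alpha0 / (1 - persistence)               # long-run variance
sig2.next   <- 4 * sig2.star                            # suppose current variance is elevated
h        <- 1:60
forecast <- sig2.star + persistence^(h - 1) * (sig2.next - sig2.star)
plot(h, sqrt(forecast), type = "l", ylab = "forecast volatility")
abline(h = sqrt(sig2.star), lty = 2)                    # reversion to the long-run level
```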
In the implementation of volatility estimates with the RiskMetrics methodology, they actually don't assume that there is a long-run volatility. So basically alpha 0 is 0, and alpha 1 and beta 1 will actually sum to 1, with beta 1 equal to, say, 0.95. And so you actually are tracking a potentially non-stationary volatility, which allows you to be estimating the volatility without presuming that a long-run average is consistent with the past.

There are many extensions of the GARCH models, and there's a wide literature on that. For this course, I think it's important to understand the fundamentals of these models in terms of how they're specified under Gaussian and t assumptions. Extending them can be very interesting, and there are many papers to look at for that. OK, let's pause for a minute and get to the next topic.

All right. The next topic is multivariate time series. Two lectures ago we talked about univariate time series and basic methodologies there. We're now going to be extending that to multivariate time series. It turns out there's a multivariate Wold representation theorem, an extension of the univariate one. There are autoregressive processes for the multivariate case, which are vector autoregressive processes. Least squares estimation comes into play, and we'll see how our understanding of regression analysis allows us to specify these vector autoregressive processes nicely. There's an optimality property of ordinary least squares estimates component-wise, which we'll highlight in about half an hour. And we'll go through the maximum likelihood estimation and model selection methods, which are just very straightforward extensions of the same concepts for univariate time series and univariate regressions.

So let's introduce the notation for multivariate time series.
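Looping back to the RiskMetrics-style recursion just described, a quick sketch with alpha 0 equal to 0 and the two weights summing to 1; the 0.95 simply echoes the number used above and `eurusd.ret` is again a placeholder series.

```r
# Sketch: exponentially weighted (RiskMetrics-style) volatility, no long-run anchor
ewma.vol <- function(returns, lambda = 0.95) {
  n    <- length(returns)
  sig2 <- numeric(n)
  sig2[1] <- var(returns)                              # initialize the recursion
  for (t in 2:n)
    sig2[t] <- lambda * sig2[t - 1] + (1 - lambda) * returns[t - 1]^2
  sqrt(sig2)
}
# vol <- ewma.vol(eurusd.ret)
```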
We have a stochastic process which now is multivariate. So we have bold x of t, an m-dimensional random vector, and it's a stochastic process that varies over time t. We can think of this as m different time series corresponding to the m components of the given process. So, say, with exchange rates we could be modeling m different exchange rate values and want to model those jointly as a time series. Or we could have collections of stocks that we're modeling. And each of the components individually can be treated as a univariate series with univariate methods.

In the multivariate case, we extend the definition of covariance stationarity to correspond to finite, bounded first and second order moments. So we need to talk about the first order moment of the multivariate time series. Mu now is an m-vector, which is the vector of expected values of the individual components, which we can denote by mu 1 through mu m. So we basically have an m-vector for our mean.

Then, for the variance/covariance matrix, let's define gamma 0 to be the variance/covariance matrix of the t-th observation of our multivariate process. So that's equal to the expected value of xt minus mu times xt minus mu prime. When we write that down, xt minus mu is basically an m by 1 vector, and xt minus mu prime is a 1 by m vector, and so the product of those is an m by m matrix. The 1,1 element of that product is the variance of x 1t, and the diagonal entries are the variances of the component series. And the off-diagonal values are the covariances between the i-th component series and the j-th component series, as given by the i-th row of xt minus mu and the j-th column of xt minus mu transpose.

So we're just collecting together all the variances and covariances, and the notation is very straightforward and simple with the matrix notation given here.
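In sample terms, these are just the ordinary mean vector and covariance matrix; a tiny sketch in R, with `X` standing for a T by m matrix of observations (one row per time point):

```r
# Sketch: sample versions of mu and Gamma_0 for a T x m data matrix X
mu.hat     <- colMeans(X)      # m-vector of sample means
Gamma0.hat <- cov(X)           # m x m sample variance/covariance matrix
```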
Now, the correlation matrix, r 0, is obtained by pre- and post-multiplying this covariance matrix gamma 0 by a diagonal matrix of the reciprocals of the square roots of the diagonal of gamma 0. Now, what's a correlation? A correlation is the correlation between two random variables where we've standardized the variables to have mean 0 and variance 1. So what we want to do is basically divide each of these variables by its standard deviation and compute the covariance matrix on that new scaling. That's equivalent to just pre- and post-multiplying by that diagonal matrix of the inverses of the standard deviations. So with matrix algebra, that formula is, I think, very clear.

Now, the previous discussion was just looking at the contemporaneous covariance matrix of the time series values at the given time t with themselves. We also want to look at the cross-covariance matrices. So how do the current values of the multivariate time series xt covary with the k-th lag of those values? Gamma k is looking at how the current period's vector of values covaries with the k-th lag of those values. So this covariance matrix has the covariance elements given in this display. And we can define the cross-correlation matrix by similarly pre- and post-multiplying by the inverses of the standard deviations, where the diagonal of gamma 0 is the matrix of the variances.

Now, a property of these matrices: gamma 0 is a symmetric matrix, as we had before. But gamma k, for k different from 0, is not symmetric. Basically, you may have lags of some variables that are positively correlated with others and not vice versa. So the off-diagonal entries here aren't necessarily even of the same sign, let alone equal and symmetric.
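The sample versions of these quantities are again short to write down; the sketch below (with the same hypothetical T by m matrix `X`) computes the correlation matrix, a lag-k cross-covariance matrix, and checks that the latter is generally not symmetric.

```r
# Sketch: sample correlation, lag-k cross-covariance, and cross-correlation matrices
R0.hat <- cov2cor(cov(X))                         # pre/post multiply by diag(1/sd)
cross.cov <- function(X, k) {
  X <- scale(X, center = TRUE, scale = FALSE)     # subtract the mean vector
  T <- nrow(X)
  crossprod(X[(k + 1):T, , drop = FALSE], X[1:(T - k), , drop = FALSE]) / T
}
Gamma.k <- cross.cov(X, k = 1)
D.inv   <- diag(1 / sqrt(diag(cross.cov(X, 0))))
R.k     <- D.inv %*% Gamma.k %*% D.inv            # cross-correlations at lag k
isTRUE(all.equal(Gamma.k, t(Gamma.k)))            # generally FALSE for k >= 1
```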
So with these covariance matrices, one can look at how things covary and whether there is, basically, a dependence between them. And you can define leading relationships: the j-star component of the multivariate time series may lead the j-th one if the covariance between the k-th lag of x j-star and the current value of x j is different from 0. So x t, j-star will lead x t, j: basically, there's information in the lagged values of component j-star for the component j.

So if we're trying to build models -- linear regression models, even -- where we're trying to predict values, then if there's a non-zero covariance, we can use those variables' information to actually project what the one variable is given the other. Now, it can be the case that you have non-zero covariances in both directions, and so that suggests that there can be feedback between these variables. It's not just that one variable causes another; there can actually be feedback.

In economics and finance, there's a notion of Granger causality. Granger and Engle got the Nobel Prize a number of years ago based on their work, and that work deals in part with identifying judgments of Granger causality between economic time series. And Granger causality is basically non-zero correlation between variables where lags of one variable help explain -- in that sense cause -- changes in another.

All right. I want to just alert you to the existence of this Wold decomposition theorem. This is an advanced theorem, but it's a useful theorem to know exists.
And this extends the univariate Wold decomposition theorem, which says that whenever we have a covariance stationary process, there exists a representation of that process as the sum of a deterministic process and a moving average process of white noise.

So if you're modeling a time series and you're going to be specifying a covariance stationary process for that, there does exist a Wold decomposition representation of it. You can basically identify the deterministic process that the series might follow -- it might be a linear trend over time, or an exponential trend -- and if you remove that deterministic process vt, then what remains is a process that can be modeled with a moving average of white noise.

Now, here everything is changed from the univariate case to the multivariate case, so we have matrices in place of the constants from before. So the new concepts here are these: we have a multivariate white noise process. That's a process eta t which is m-dimensional and which has mean 0. The variance matrix of this m-vector is sigma, which is now the m by m variance/covariance matrix of the components, and that must be positive semi-definite. And for white noise, the covariances between the current innovation eta t and any lag of its value are 0. So these are uncorrelated multivariate white noise terms; they're uncorrelated with each other at all lags. And the innovation eta t has a covariance of 0 with the deterministic process -- actually, that's pretty much a given if we have a deterministic process.

Now, the terms psi k: basically, we have this vector xt equal to the m-vector deterministic process vt plus this weighted average of the innovations. What's required is that the sum over k of each term psi k times its transpose converges.
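Symbolically, the multivariate Wold representation being described is the following (restated in LaTeX; the convention that the leading coefficient matrix is the identity is the usual one):

```latex
X_t = V_t + \sum_{k=0}^{\infty} \Psi_k\, \eta_{t-k}, \qquad \Psi_0 = I_m,
% where V_t is the deterministic component and \eta_t is m-dimensional white noise:
% E[\eta_t] = 0,\quad \mathrm{Cov}(\eta_t) = \Sigma \ (m \times m,\ \text{positive semi-definite}),
% \mathrm{Cov}(\eta_t, \eta_{t-k}) = 0 \ \text{for } k \neq 0,\quad
% \text{and } \sum_{k=0}^{\infty} \Psi_k \Psi_k' \ \text{converges.}
```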
Now, if you were to take that xt process and say, let me compute the variance/covariance matrix of that representation, then you would basically get terms in the covariance matrix which include this sum of terms. So that sum has to be finite in order for this to be covariance stationary.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yes?

AUDIENCE: Could you define what you mean by innovation?

PROFESSOR: Oh, OK. Well, the innovation is -- let's see. Let me go back up here. OK. The innovation process. If we have, as in this case, our xt stochastic process, and we have, say, F sub t minus 1 equal to the information in x t minus 1, x t minus 2, and so on -- basically the information set available before time t -- then we can model xt to be the expected value of xt given F t minus 1, plus an innovation. And so our objective in these models is to be thinking of how that process is evolving, where we can model the process as well as possible using information up to the time before t, and then there's some disturbance about that model. There's something new that's happened at time t that wasn't available before, and that's this innovation process. So this representation with the Wold decomposition is representing, basically, the bits of information that are affecting the process that occur at time t and weren't available prior to that.

All right. Well, let's move on to vector autoregressive processes. OK, this representation for a vector autoregressive process is an extension of the univariate autoregressive process to m dimensions. And so our xt is an m-vector. That's going to be equal to some constant vector c, plus a matrix phi 1 times the first lag of xt, which is xt minus 1, plus another matrix phi 2 times the second lag of xt, xt minus 2.
And so on, up to the p-th term, which is an m by m matrix phi p times x t minus p, plus this innovation term. So this is basically how a univariate autoregressive process extends to an m-variate case. And what this allows one to do is model how a given component of the multivariate series -- like one exchange rate -- varies depending on how other exchange rates might vary. Exchange rates tend to co-move together, in that example.

So if we look at what this represents in terms of a component series, we can consider fixing j, a component of the multivariate process. It could be the first, the last, or the j-th, somewhere in the middle. And that component time series -- a particular exchange rate series, or whatever we're focused on in our modeling -- follows a generalization of the autoregressive model, where we have the autoregressive terms of the j-th series on lags of the j-th series up to order p. So we have the univariate autoregressive model, but we also add terms corresponding to the relationship between x j and the other components x j-star: how does the j-th component depend on the other components of the multivariate series? Those terms are given here. So it's a convenient way to allow for interdependence among the components and to model that.

OK. This slide deals with representing a p-th order vector autoregression as a first order process. Now, the concept here is really a very powerful concept that's applied in time series methods, which is that when you are modeling dependence that goes back, say, a number of lags like p lags, the structure can actually be re-expressed as a first order dependence only. And it's much easier to deal with just a lag-one dependence than to consider p-lag dependence and the complications involved with that.

And this technique is one where, in the early days of fitting autoregressive moving average processes and various smoothing methods, accommodating p lags complicated the analysis enormously, but one can actually re-express it just as a first order lag problem. So in this case, what one does for a vector autoregressive process of order p is simply to stack the values of the process. Let me just highlight what's going on there.

So if we have x1, x2, up to xn, which are all m by 1 vectors of the stochastic process, then consider defining zt to be the stacked vector with blocks xt, x t minus 1, down to x t minus (p minus 1). So there are p blocks, and zt is mp by 1; in the lecture notes the primes indicate the transposes. And then the lagged value zt minus 1 stacks x t minus 1, x t minus 2, down to x t minus p.

Well, if you define zt and zt minus 1 this way, then zt is equal to d plus A times zt minus 1 plus an error vector f. Basically, the constant term d has c entering in its first block and 0's everywhere else; the first block row of the A matrix is phi 1, phi 2, up to phi p, with identity blocks below that shifting the lags down and 0's elsewhere; and f has the innovation in its first block and 0's below. So the zt vector is this linear transformation of zt minus 1, and we have a very simple form for the constant term and a very simple form for the f vector. And this renders the model into a first order time series model for a larger multivariate series, basically mp by 1.
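As a concrete sketch of that stacking -- added for illustration, with invented bivariate VAR(2) coefficients, so m equals 2 and p equals 2 -- the companion matrix puts phi 1 and phi 2 in its first block row and an identity block below:

```r
# Sketch: companion (stacked) form of a bivariate VAR(2): z_t = d + A z_{t-1} + f_t
m <- 2; p <- 2
c.vec <- c(0.1, 0.05)                              # invented constant vector
Phi1  <- matrix(c(0.5, 0.1, 0.2, 0.4), m, m)       # invented coefficient matrices
Phi2  <- matrix(c(0.2, 0.0, 0.0, 0.1), m, m)
A <- rbind(cbind(Phi1, Phi2),
           cbind(diag(m), matrix(0, m, m)))        # identity block shifts x_{t-1} down
d <- c(c.vec, rep(0, m * (p - 1)))                 # c in the first block, 0's elsewhere
A                                                  # the (mp) x (mp) companion matrix
```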
764 00:49:17,560 --> 00:49:22,500 So-- and this technique is one where, 765 00:49:22,500 --> 00:49:26,390 in the early days of fitting, like auto regressive 766 00:49:26,390 --> 00:49:34,520 moving average processes and various smoothing methods, 767 00:49:34,520 --> 00:49:40,520 the model-- basically accommodating 768 00:49:40,520 --> 00:49:44,960 p lags complicated the analysis enormously. 769 00:49:44,960 --> 00:49:46,740 But one can actually re-express it just 770 00:49:46,740 --> 00:49:48,930 as a first order lag problem. 771 00:49:48,930 --> 00:49:53,860 So in this case, what one does is one considers 772 00:49:53,860 --> 00:49:57,580 for a vector auto regressive process of order of p, 773 00:49:57,580 --> 00:50:08,090 simply stacking the values of the process. 774 00:50:08,090 --> 00:50:11,640 So let me just highlight what's going on there. 775 00:50:11,640 --> 00:50:17,620 776 00:50:17,620 --> 00:50:29,500 So if we have basically-- OK, so if we have x1, 777 00:50:29,500 --> 00:50:38,680 x2, xn, which are all m by 1 values, 778 00:50:38,680 --> 00:50:42,640 m vectors of the stochastic process. 779 00:50:42,640 --> 00:50:56,770 Then consider defining zt to be equal to xt transpose xt 780 00:50:56,770 --> 00:51:01,889 minus 1 transpose up to x t minus p minus 1 transpose. 781 00:51:01,889 --> 00:51:07,378 782 00:51:07,378 --> 00:51:09,930 Or this is t minus p minus 1. 783 00:51:09,930 --> 00:51:10,915 So there are p terms. 784 00:51:10,915 --> 00:51:13,630 785 00:51:13,630 --> 00:51:21,130 And then if we consider the lagged value of that, that's 786 00:51:21,130 --> 00:51:30,470 x2 minus 1, x2 minus 2, x2 minus p transpose. 787 00:51:30,470 --> 00:51:35,380 So what we've done is we're considering zt. 788 00:51:35,380 --> 00:51:40,380 This is going to be m times p. 789 00:51:40,380 --> 00:51:46,892 It's actually 1 by m times p in this notation. 790 00:51:46,892 --> 00:51:50,850 Well, actually I guess I should put transpose here. 791 00:51:50,850 --> 00:51:54,740 So m minus p by 1. 792 00:51:54,740 --> 00:51:57,040 OK, in the lecture notes it actually 793 00:51:57,040 --> 00:52:00,640 is primed there to indicate the transpose. 794 00:52:00,640 --> 00:52:03,660 Well, if you define zt and zt minus 1 this way, 795 00:52:03,660 --> 00:52:09,050 then zt is equal to d plus a of zt minus 1 796 00:52:09,050 --> 00:52:12,230 plus f, where this is d. 797 00:52:12,230 --> 00:52:15,410 Basically the constant term has the c entering and then 0's 798 00:52:15,410 --> 00:52:16,660 everywhere else. 799 00:52:16,660 --> 00:52:23,820 And the a matrix is phi 1 phi 2 up to phi p. 800 00:52:23,820 --> 00:52:36,410 And so basically the zt vector transforms the zt-- 801 00:52:36,410 --> 00:52:40,590 or is the transpose-- this linear transformation 802 00:52:40,590 --> 00:52:43,290 of the zt minus 1. 803 00:52:43,290 --> 00:52:46,460 And we have sort of a very simple form 804 00:52:46,460 --> 00:52:52,640 for the constant term and a very simple form for the f vector. 805 00:52:52,640 --> 00:52:59,700 And this is-- renders the model into a sort of a first order 806 00:52:59,700 --> 00:53:06,270 time series model with a larger multivariate series, 807 00:53:06,270 --> 00:53:09,590 basically mp by 1. 808 00:53:09,590 --> 00:53:21,380 Now, with this representation we basically have-- we 809 00:53:21,380 --> 00:53:31,040 can demonstrate that the process is going to be stationary 810 00:53:31,040 --> 00:53:34,750 if all eigenvalues of the companion matrix a 811 00:53:34,750 --> 00:53:38,060 have modulus less than 1. 
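As a sketch of the stacking just described, again with made-up dimensions, the companion matrix A and the stacked intercept d can be assembled in base R as follows.

companion <- function(Phi) {
  # A = [ Phi_1  Phi_2 ... Phi_p ]
  #     [  I      0   ...   0    ]
  #     [  0      I   ...   0    ]   (identity blocks shift z_{t-1} down by one lag)
  m <- nrow(Phi[[1]]); p <- length(Phi)
  A <- matrix(0, m * p, m * p)
  A[1:m, ] <- do.call(cbind, Phi)
  if (p > 1) A[(m + 1):(m * p), 1:(m * (p - 1))] <- diag(m * (p - 1))
  A
}
stack_intercept <- function(c0, p) c(c0, rep(0, length(c0) * (p - 1)))   # the d vector
# z_t = d + A %*% z_{t-1} + f_t, where z_t stacks x_t, x_{t-1}, ..., x_{t-p+1}

Only the first m rows of A carry free parameters; the identity blocks simply copy the lagged values down, which is what lets a p-th order model be treated as a first order one.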
812 00:53:38,060 --> 00:53:43,320 And let's see-- if we go back to the expression. 813 00:53:43,320 --> 00:53:50,750 OK, if the eigenvalues of this matrix A are less than 1, 814 00:53:50,750 --> 00:53:55,440 then we won't get sort of an explosive behavior 815 00:53:55,440 --> 00:54:00,320 of the process when this basically increments over time 816 00:54:00,320 --> 00:54:03,790 with every previous value getting multiplied by the A 817 00:54:03,790 --> 00:54:10,880 matrix and scaling the process over time by the A-th power. 818 00:54:10,880 --> 00:54:12,450 So that is required. 819 00:54:12,450 --> 00:54:14,620 All eigenvalues of A have to be less than 1. 820 00:54:14,620 --> 00:54:17,090 And equivalently, all roots of this equation 821 00:54:17,090 --> 00:54:21,300 need to be outside the unit circle. 822 00:54:21,300 --> 00:54:25,560 You remember there was a constraint of-- or a condition 823 00:54:25,560 --> 00:54:30,030 for univariate auto regressive models 824 00:54:30,030 --> 00:54:35,100 to be stationary, that the roots of the characteristic equation 825 00:54:35,100 --> 00:54:38,100 are all outside the unit circle. 826 00:54:38,100 --> 00:54:40,170 And the class notes go through and went 827 00:54:40,170 --> 00:54:41,760 through the derivation of that. 828 00:54:41,760 --> 00:54:46,820 This is the extension of that to the multivariate case. 829 00:54:46,820 --> 00:54:50,290 And so basically one needs to solve 830 00:54:50,290 --> 00:54:53,460 for roots of a polynomial in Z and determine 831 00:54:53,460 --> 00:54:59,120 whether those are outside the unit circle. 832 00:54:59,120 --> 00:55:01,460 Who can tell me what the order of the polynomial 833 00:55:01,460 --> 00:55:07,880 is here for this sort of determinant equation? 834 00:55:07,880 --> 00:55:09,720 AUDIENCE: [INAUDIBLE] mp. 835 00:55:09,720 --> 00:55:11,100 PROFESSOR: mp. 836 00:55:11,100 --> 00:55:11,600 Yes. 837 00:55:11,600 --> 00:55:13,670 It's basically of power mp. 838 00:55:13,670 --> 00:55:15,870 So in a determinant you basically 839 00:55:15,870 --> 00:55:19,910 are taking products of the m components 840 00:55:19,910 --> 00:55:24,050 in the matrix, various linear combinations of those. 841 00:55:24,050 --> 00:55:28,610 So that's going to be an mp dimensional polynomial. 842 00:55:28,610 --> 00:55:29,110 All right. 843 00:55:29,110 --> 00:55:32,680 Well, the mean of the stationary VAR process 844 00:55:32,680 --> 00:55:37,220 can be computed rather easily by taking expectations 845 00:55:37,220 --> 00:55:41,660 of this on both sides. 846 00:55:41,660 --> 00:55:44,720 So if we take the expectation of xt 847 00:55:44,720 --> 00:55:48,710 and take expectations across both sides, 848 00:55:48,710 --> 00:55:57,400 we get that mu is the c vector plus the product of the phi 849 00:55:57,400 --> 00:55:59,670 case times mu plus 0. 850 00:55:59,670 --> 00:56:05,620 So mu, the unconditional mean of the process, 851 00:56:05,620 --> 00:56:10,640 actually has this formula just solving 852 00:56:10,640 --> 00:56:18,810 for mu in the top-- in the second line to the third line. 853 00:56:18,810 --> 00:56:27,080 So here we can see that basically this expression 854 00:56:27,080 --> 00:56:33,040 1 minus phi 1 through phi p, that inverse has to exist. 855 00:56:33,040 --> 00:56:36,430 And actually, if we then plug in the value 856 00:56:36,430 --> 00:56:39,050 of c in terms of the unconditional mean, 857 00:56:39,050 --> 00:56:43,860 we get this expression for the original process. 
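Continuing the illustration, the eigenvalue check and the unconditional mean can be computed directly; the coefficient matrices below are the same made-up values as before.

Phi <- list(matrix(c(0.5, 0.1, 0.0, 0.4), 2, 2),
            matrix(c(0.2, 0.0, 0.1, 0.1), 2, 2))
c0 <- c(0.1, 0.2)
A <- rbind(do.call(cbind, Phi),                    # companion matrix for m = 2, p = 2
           cbind(diag(2), matrix(0, 2, 2)))
Mod(eigen(A)$values)                               # all moduli < 1  =>  covariance stationary
mu <- solve(diag(2) - Reduce(`+`, Phi), c0)        # mu = (I - Phi_1 - ... - Phi_p)^{-1} c

If any modulus were 1 or larger, powers of A would not die out and the process would be explosive or integrated rather than stationary, which is exactly the point being made about the roots of the characteristic polynomial.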
858 00:56:43,860 --> 00:56:49,920 So with the unconditional mean c, if we demean the process, 859 00:56:49,920 --> 00:56:52,290 there's basically no mean term. 860 00:56:52,290 --> 00:56:53,730 It's 0. 861 00:56:53,730 --> 00:56:58,710 And so basically the mean adjusted process x 862 00:56:58,710 --> 00:57:02,460 follows this multivariate vector auto regression 863 00:57:02,460 --> 00:57:08,430 with no mean, which is actually used when this is specified. 864 00:57:08,430 --> 00:57:11,450 865 00:57:11,450 --> 00:57:18,850 Now, this vector auto regression model 866 00:57:18,850 --> 00:57:25,760 can be expressed as a system of regression equations. 867 00:57:25,760 --> 00:57:33,820 And so what we have with the multivariate series, if we have 868 00:57:33,820 --> 00:57:38,130 multivariate data, we'll have n sample observations, 869 00:57:38,130 --> 00:57:40,765 xt, which is basically the m vector 870 00:57:40,765 --> 00:57:45,710 of the multivariate process observed for n time points. 871 00:57:45,710 --> 00:57:48,000 And for the computations here, we're 872 00:57:48,000 --> 00:57:52,050 going to assume that we have p sort of-- we 873 00:57:52,050 --> 00:57:56,100 have pre-sample observations available to us. 874 00:57:56,100 --> 00:57:58,980 So we're essentially going to be considering models 875 00:57:58,980 --> 00:58:01,610 where we condition on the first p time 876 00:58:01,610 --> 00:58:07,660 points in order to facilitate the estimation methodology. 877 00:58:07,660 --> 00:58:12,040 Then we can set up m regression models corresponding 878 00:58:12,040 --> 00:58:16,080 to each component of the m variate series. 879 00:58:16,080 --> 00:58:32,190 And so what we have is our original-- 880 00:58:32,190 --> 00:58:39,100 we have our collection of data values, which is x1 transpose, 881 00:58:39,100 --> 00:58:45,750 x2 transpose, down to xn transpose, 882 00:58:45,750 --> 00:58:52,290 which is an n by m matrix. 883 00:58:52,290 --> 00:58:54,540 OK, this is our multivariate time series 884 00:58:54,540 --> 00:58:56,720 where we were just-- the first row corresponds 885 00:58:56,720 --> 00:58:59,515 to the first time values, the n-th row to the n-th time values. 886 00:58:59,515 --> 00:59:02,180 887 00:59:02,180 --> 00:59:05,580 And we can set up m regression models 888 00:59:05,580 --> 00:59:11,410 where we're going to consider modeling 889 00:59:11,410 --> 00:59:15,920 the j-th column of this matrix. 890 00:59:15,920 --> 00:59:19,180 So we're just picking out the univariate time series 891 00:59:19,180 --> 00:59:21,320 corresponding to the j-th component. 892 00:59:21,320 --> 00:59:23,990 That's yj. 893 00:59:23,990 --> 00:59:31,150 And we're going to model that as Z beta j plus epsilon j, 894 00:59:31,150 --> 00:59:41,380 where Z is given by the vector of lagged values 895 00:59:41,380 --> 00:59:46,320 of the multivariate process where, 896 00:59:46,320 --> 00:59:49,040 for the t-th case, 897 00:59:49,040 --> 00:59:52,510 we have the t 898 00:59:52,510 --> 00:59:54,890 minus first, t minus second, up to t minus p values. 899 00:59:54,890 --> 01:00:01,210 So we have basically p m-vectors here. 900 01:00:01,210 --> 01:00:09,100 And so this j-th time series has elements 901 01:00:09,100 --> 01:00:14,190 that follow a linear regression model 902 01:00:14,190 --> 01:00:18,100 on the lags of the entire multivariate series up to p 903 01:00:18,100 --> 01:00:23,380 lags with the regression parameter given by beta j.
904 01:00:23,380 --> 01:00:28,250 And basically the beta j regression parameters 905 01:00:28,250 --> 01:00:36,049 correspond to the various elements of the phi matrices. 906 01:00:36,049 --> 01:00:38,590 So now there's a one-to-one correspondence between those. 907 01:00:38,590 --> 01:00:50,800 908 01:00:50,800 --> 01:00:51,300 All right. 909 01:00:51,300 --> 01:00:59,280 So I'm using now a notation where superscript j corresponds 910 01:00:59,280 --> 01:01:02,930 to the j-th component of the series, 911 01:01:02,930 --> 01:01:07,760 of the multivariate stochastic process. 912 01:01:07,760 --> 01:01:12,550 So we have an mp plus 1 vector of regression parameters 913 01:01:12,550 --> 01:01:16,160 for each series j, and we have an epsilon j, 914 01:01:16,160 --> 01:01:22,100 an n-vector of innovation errors, for each series. 915 01:01:22,100 --> 01:01:31,970 And so basically if this, the j-th column, is yj, 916 01:01:31,970 --> 01:01:35,740 we're modeling that to be equal to the simple matrix 917 01:01:35,740 --> 01:01:44,540 Z times beta j plus epsilon j, where this is n by 1. 918 01:01:44,540 --> 01:01:47,790 This is n by mp plus 1. 919 01:01:47,790 --> 01:01:51,520 920 01:01:51,520 --> 01:01:55,920 And this beta j is the mp plus 1 regression parameter vector. 921 01:01:55,920 --> 01:02:04,845 922 01:02:04,845 --> 01:02:05,345 OK. 923 01:02:05,345 --> 01:02:10,140 924 01:02:10,140 --> 01:02:12,030 One might think, OK, one can consider 925 01:02:12,030 --> 01:02:17,320 each of these regressions for each of the component series, 926 01:02:17,320 --> 01:02:19,630 and you could consider them separately. 927 01:02:19,630 --> 01:02:23,940 But to consider them all together, 928 01:02:23,940 --> 01:02:29,270 we can define the multivariate regression model, 929 01:02:29,270 --> 01:02:33,420 which has the following form. 930 01:02:33,420 --> 01:02:40,277 We basically have the n-vectors for the first component, 931 01:02:40,277 --> 01:02:42,360 and then the second component, up to the m-th component. 932 01:02:42,360 --> 01:02:46,730 So an n by m matrix of dependent variables, 933 01:02:46,730 --> 01:02:53,540 where each column corresponds to a different component series, 934 01:02:53,540 --> 01:02:55,820 follows a linear regression model 935 01:02:55,820 --> 01:02:59,990 with the same Z matrix with different regression 936 01:02:59,990 --> 01:03:03,460 coefficient parameters, beta 1 through beta m, corresponding 937 01:03:03,460 --> 01:03:08,040 to the different components of the multivariate series. 938 01:03:08,040 --> 01:03:14,330 And we have epsilon 1, epsilon 2, up to epsilon m. 939 01:03:14,330 --> 01:03:22,220 So we're thinking of taking-- so basically the y1, y2, up to ym 940 01:03:22,220 --> 01:03:27,150 is essentially this original matrix of our multivariate time 941 01:03:27,150 --> 01:03:35,820 series, because it's the first component in the first column 942 01:03:35,820 --> 01:03:37,670 and the m-th component in the m-th column. 943 01:03:37,670 --> 01:03:42,230 And the-- this explanatory 944 01:03:42,230 --> 01:03:47,090 variables matrix X, which is Z in this case, 945 01:03:47,090 --> 01:03:53,210 corresponds to lags of the whole process up to p lags. 946 01:03:53,210 --> 01:03:58,230 So we're having lags of all of the m-variate process up to p lags. 947 01:03:58,230 --> 01:04:02,700 So that's mp, and then plus 1 for our constant. 948 01:04:02,700 --> 01:04:05,790 So this is the set up for a multivariate regression model.
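One way to lay out these data matrices in base R is sketched below. X stands for a generic T-by-m data matrix and p for the chosen lag order; both are placeholders, not objects from the lecture's code.

build_var_design <- function(X, p) {
  # condition on the first p observations; rows t = p+1, ..., T remain
  T <- nrow(X); idx <- (p + 1):T
  Z <- cbind(1, do.call(cbind, lapply(1:p, function(k) X[idx - k, , drop = FALSE])))
  Y <- X[idx, , drop = FALSE]
  list(Y = Y, Z = Z)     # Y is (T - p) x m,  Z is (T - p) x (mp + 1)
}

The same Z serves every component regression; only the response column and the coefficient vector beta j change from one component to the next.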
949 01:04:05,790 --> 01:04:12,880 950 01:04:12,880 --> 01:04:14,910 In terms of how one specifies this, 951 01:04:14,910 --> 01:04:17,630 well, actually, in economic theory 952 01:04:17,630 --> 01:04:20,930 this is also related to seemingly unrelated 953 01:04:20,930 --> 01:04:23,750 regressions, which you'll find in econometrics. 954 01:04:23,750 --> 01:04:26,730 955 01:04:26,730 --> 01:04:32,850 If we want to specify this multivariate model, well, 956 01:04:32,850 --> 01:04:35,520 what we could do is we could actually 957 01:04:35,520 --> 01:04:37,550 specify each of the component models 958 01:04:37,550 --> 01:04:42,430 separately, because we basically have sort of-- can think 959 01:04:42,430 --> 01:04:44,890 of the univariate regression model for each component 960 01:04:44,890 --> 01:04:47,010 series. 961 01:04:47,010 --> 01:04:52,580 And this slide indicates basically what 962 01:04:52,580 --> 01:04:53,810 the formulas are for that. 963 01:04:53,810 --> 01:04:58,189 So if we don't know anything about multivariate regression 964 01:04:58,189 --> 01:04:59,730 we can say, well, let's start by just 965 01:04:59,730 --> 01:05:03,620 doing the univariate regression of each component series 966 01:05:03,620 --> 01:05:04,820 on the lags. 967 01:05:04,820 --> 01:05:07,540 And so we get our beta hat j least squares 968 01:05:07,540 --> 01:05:10,330 estimates given by the usual formula, where 969 01:05:10,330 --> 01:05:14,510 the independent variables matrix is Z, so beta hat j is Z transpose Z 970 01:05:14,510 --> 01:05:17,280 inverse, Z transpose yj, along with the residuals. 971 01:05:17,280 --> 01:05:20,090 So these are familiar formulas. 972 01:05:20,090 --> 01:05:28,680 And if we did this for each of the component series j, 973 01:05:28,680 --> 01:05:33,750 then we would actually get sample estimates 974 01:05:33,750 --> 01:05:37,351 of the innovation process, eta 1. 975 01:05:37,351 --> 01:05:40,970 Basically the whole eta series. 976 01:05:40,970 --> 01:05:45,980 And we could actually define from these estimates 977 01:05:45,980 --> 01:05:49,110 of the innovations our covariance matrix 978 01:05:49,110 --> 01:05:52,500 for the innovations as the sample covariance 979 01:05:52,500 --> 01:05:54,640 matrix of these etas. 980 01:05:54,640 --> 01:05:58,170 So all of these formulas are-- you're basically 981 01:05:58,170 --> 01:06:00,830 applying very straightforward estimation 982 01:06:00,830 --> 01:06:05,440 methods for the parameters of a linear regression 983 01:06:05,440 --> 01:06:08,855 and then estimating variances/covariances 984 01:06:08,855 --> 01:06:11,440 of these innovation terms. 985 01:06:11,440 --> 01:06:14,470 So from this, we actually have estimates 986 01:06:14,470 --> 01:06:20,420 of this process in terms of the sigma and the beta hats. 987 01:06:20,420 --> 01:06:24,220 But it's made assuming that we can 988 01:06:24,220 --> 01:06:26,755 treat each of these component regressions separately. 989 01:06:26,755 --> 01:06:33,410 990 01:06:33,410 --> 01:06:35,310 A rather remarkable result is that 991 01:06:35,310 --> 01:06:40,300 these component-wise regressions are actually 992 01:06:40,300 --> 01:06:44,470 the optimal estimates for the multivariate regression 993 01:06:44,470 --> 01:06:46,030 as well. 994 01:06:46,030 --> 01:06:51,840 And as mathematicians, this kind of result 995 01:06:51,840 --> 01:06:54,610 is, I think, rather neat and elegant.
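Writing out the familiar formulas referred to here: beta hat j equals (Z transpose Z) inverse times Z transpose y j, and the innovation covariance is the sample covariance of the residuals. A minimal sketch, reusing build_var_design from the earlier snippet, with X again a placeholder data matrix:

d <- build_var_design(X, p = 2)               # X is your T-by-m data matrix (placeholder)
Y <- d$Y; Z <- d$Z
Bhat <- solve(crossprod(Z), crossprod(Z, Y))  # (Z'Z)^{-1} Z'Y; column j is beta hat j
E <- Y - Z %*% Bhat                           # fitted innovations (residuals)
Sigma_hat <- crossprod(E) / nrow(Y)           # sample covariance of the residual series
                                              # (a degrees-of-freedom divisor is another common choice)

Because every component regression shares the same Z, a single matrix solve produces all m coefficient vectors at once; that is the same component-wise computation the slide describes, just done in one call.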
996 01:06:54,610 --> 01:06:58,900 And maybe some of you will think this is very obvious, 997 01:06:58,900 --> 01:07:05,720 but it actually-- it isn't quite obvious. 998 01:07:05,720 --> 01:07:08,010 That said, this component-wise estimation 999 01:07:08,010 --> 01:07:10,430 should be optimal as well. 1000 01:07:10,430 --> 01:07:13,100 And the next section of the lecture notes 1001 01:07:13,100 --> 01:07:16,965 goes through this argument. 1002 01:07:16,965 --> 01:07:20,140 1003 01:07:20,140 --> 01:07:22,480 And I'm going to, in the interest of time, 1004 01:07:22,480 --> 01:07:26,590 go through this-- just sort of highlight what the results are. 1005 01:07:26,590 --> 01:07:29,580 The details are in these notes that you can go through. 1006 01:07:29,580 --> 01:07:34,750 And I will be happy to go into more detail about them 1007 01:07:34,750 --> 01:07:37,260 during office hours. 1008 01:07:37,260 --> 01:07:41,520 But if we're fitting a vector auto regression model where 1009 01:07:41,520 --> 01:07:44,750 there are no constraints on the coefficient matrices, 1010 01:07:44,750 --> 01:07:52,060 phi 1 through phi p, then these component-wise estimates, 1011 01:07:52,060 --> 01:07:56,950 accounting for arbitrary covariance matrix sigma 1012 01:07:56,950 --> 01:08:01,300 for the innovations, those basically 1013 01:08:01,300 --> 01:08:03,810 are equal to the generalized least squares estimates 1014 01:08:03,810 --> 01:08:06,030 of these underlying parameters. 1015 01:08:06,030 --> 01:08:09,280 You'll recall we talked about the Gauss Markov theorem 1016 01:08:09,280 --> 01:08:14,480 where we were able to extend the assumption of sort 1017 01:08:14,480 --> 01:08:16,380 equal variances across observations 1018 01:08:16,380 --> 01:08:20,189 to unequal variances and covariances. 1019 01:08:20,189 --> 01:08:23,832 Well, it turns out to these component-wise OLS 1020 01:08:23,832 --> 01:08:26,040 estimates are, in fact, the generalized least squared 1021 01:08:26,040 --> 01:08:27,180 estimates. 1022 01:08:27,180 --> 01:08:30,160 And under the assumption of Gaussian distributions 1023 01:08:30,160 --> 01:08:32,410 for the innovations, they, in fact, 1024 01:08:32,410 --> 01:08:34,569 are maximum likelihood estimates. 1025 01:08:34,569 --> 01:08:41,210 And this theory applies Kronecker products. 1026 01:08:41,210 --> 01:08:43,609 We're not going to have any homework with Kronecker 1027 01:08:43,609 --> 01:08:44,580 products. 1028 01:08:44,580 --> 01:08:47,160 These notes really are for those who 1029 01:08:47,160 --> 01:08:51,120 have some more extensive background in linear algebra. 1030 01:08:51,120 --> 01:08:55,130 But it's a very nice use of these Kronecker product 1031 01:08:55,130 --> 01:08:56,180 operators. 1032 01:08:56,180 --> 01:09:02,479 Basically, this notation-- or no, x circle-- 1033 01:09:02,479 --> 01:09:04,970 I'll call it Kronecker-- is one where 1034 01:09:04,970 --> 01:09:08,560 you take a matrix A and a matrix B 1035 01:09:08,560 --> 01:09:11,170 and you consider the matrix which 1036 01:09:11,170 --> 01:09:14,950 takes each element of A times the whole matrix B. 1037 01:09:14,950 --> 01:09:18,370 So we start with an m by n matrix A 1038 01:09:18,370 --> 01:09:22,220 and end up with an mp by qn matrix 1039 01:09:22,220 --> 01:09:25,550 by taking each element of A times the whole matrix B. 1040 01:09:25,550 --> 01:09:29,010 So it's, they say, has this block structure. 1041 01:09:29,010 --> 01:09:32,510 So this is very simple definition. 
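A tiny R illustration of the Kronecker product and the kind of transposition and inverse properties being referred to; A and B are arbitrary small invertible matrices chosen only for the demonstration.

A <- matrix(c(1, 3, 2, 4), 2, 2)
B <- matrix(c(2, 0, 1, 3), 2, 2)
A %x% B                                            # 4 x 4: each a_ij multiplies the whole of B
dim(A %x% B)                                       # (2*2) by (2*2), the block structure
all.equal(t(A %x% B), t(A) %x% t(B))               # transpose distributes over the product
all.equal(solve(A %x% B), solve(A) %x% solve(B))   # so does the inverse, when both factors are invertible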
1042 01:09:32,510 --> 01:09:37,080 If you look at properties of transposition of matrices, 1043 01:09:37,080 --> 01:09:38,540 you can prove these results. 1044 01:09:38,540 --> 01:09:42,850 These are properties of the Kronecker product. 1045 01:09:42,850 --> 01:09:54,320 And there's a vec operator which takes a matrix 1046 01:09:54,320 --> 01:09:58,470 and simply stacks the columns together. 1047 01:09:58,470 --> 01:10:04,700 And in the talk last Tuesday of Ivan's, talking about modeling 1048 01:10:04,700 --> 01:10:07,845 the volatility surface, he basically, he 1049 01:10:07,845 --> 01:10:11,410 was modeling a two dimensional surface-- or a surface 1050 01:10:11,410 --> 01:10:13,910 in three dimensions, but there were 1051 01:10:13,910 --> 01:10:16,830 two dimensions explaining it. 1052 01:10:16,830 --> 01:10:22,140 You basically can stack columns of the matrix 1053 01:10:22,140 --> 01:10:27,030 and be modeling a vector instead of a matrix of values. 1054 01:10:27,030 --> 01:10:32,130 So the vectorizing operator allows us to manipulate terms 1055 01:10:32,130 --> 01:10:35,400 into a more convenient form. 1056 01:10:35,400 --> 01:10:39,040 And this multivariate regression model 1057 01:10:39,040 --> 01:10:50,950 is one where it's set up as sort of an n by m matrix Y, 1058 01:10:50,950 --> 01:10:53,490 having that structure. 1059 01:10:53,490 --> 01:10:57,110 It can be expressed in terms of the linear regression form 1060 01:10:57,110 --> 01:11:06,380 as y star equaling the vector, the vec of Y. 1061 01:11:06,380 --> 01:11:15,340 So we basically have y1, y2, down to ym all lined up. 1062 01:11:15,340 --> 01:11:18,480 So this is nm by 1. 1063 01:11:18,480 --> 01:11:21,600 1064 01:11:21,600 --> 01:11:30,055 That's going to be equal to some matrix plus the epsilon 1, 1065 01:11:30,055 --> 01:11:33,920 epsilon 2, down to epsilon m. 1066 01:11:33,920 --> 01:11:38,850 And then there's going to be a matrix 1067 01:11:38,850 --> 01:11:43,970 and a regression coefficient matrix beta 1, beta 1068 01:11:43,970 --> 01:11:47,360 2, down to beta m. 1069 01:11:47,360 --> 01:11:51,320 So we consider vectorizing the beta matrix, 1070 01:11:51,320 --> 01:11:55,340 vectorizing epsilon, and vectorizing y. 1071 01:11:55,340 --> 01:11:59,465 And then in order to define this sort 1072 01:11:59,465 --> 01:12:03,370 of simple linear regression model, univariate regression 1073 01:12:03,370 --> 01:12:08,750 model, well, we need to have a Z in the first diagonal block 1074 01:12:08,750 --> 01:12:14,800 here, corresponding to beta 1 for y1, and 0's everywhere else. 1075 01:12:14,800 --> 01:12:20,160 In the second block row we want to have 1076 01:12:20,160 --> 01:12:25,920 a Z in the second diagonal block with 0's everywhere else, and so 1077 01:12:25,920 --> 01:12:27,100 forth. 1078 01:12:27,100 --> 01:12:30,960 So this is just re-expressing everything in this notation. 1079 01:12:30,960 --> 01:12:34,290 But the notation is very nice because, at the end of the day, 1080 01:12:34,290 --> 01:12:36,760 we basically have a regression model like we 1081 01:12:36,760 --> 01:12:39,030 had when we were doing our regression analysis. 1082 01:12:39,030 --> 01:12:42,930 So all the theory we have for specifying these models 1083 01:12:42,930 --> 01:12:46,270 plays through with univariate regression.
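The vec operator and the block-diagonal regressor matrix can be checked numerically. In R, c(M) stacks the columns of a matrix M, which is exactly vec; the sketch below verifies that vec(Z B) equals (I_m Kronecker Z) times vec(B) on made-up dimensions.

Z <- matrix(rnorm(6 * 3), 6, 3)        # n = 6 observations, mp + 1 = 3 regressors (made up)
B <- matrix(rnorm(3 * 2), 3, 2)        # m = 2 component series, so B = [beta_1  beta_2]
ystar <- c(Z %*% B)                    # vec of the n x m response surface, stacked by column
Xstar <- diag(2) %x% Z                 # I_m Kronecker Z: block diagonal with Z repeated m times
all.equal(ystar, c(Xstar %*% c(B)))    # vec(Z B) = (I_m %x% Z) vec(B)

That identity is what turns the multivariate regression into one tall univariate-style regression of y star on X star.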
1084 01:12:46,270 --> 01:12:50,309 And one can go through this technical argument 1085 01:12:50,309 --> 01:12:52,600 to show that the generalized least squares estimate is, 1086 01:12:52,600 --> 01:12:59,430 in fact, the equivalent to the component-wise values. 1087 01:12:59,430 --> 01:13:03,980 And that's very, very good. 1088 01:13:03,980 --> 01:13:07,000 Maximum likelihood estimation with these models. 1089 01:13:07,000 --> 01:13:12,130 Well, we actually use this vectorized notation 1090 01:13:12,130 --> 01:13:15,050 to define the likelihood function. 1091 01:13:15,050 --> 01:13:20,560 And if these assumptions are made 1092 01:13:20,560 --> 01:13:24,610 about the linear regression model, 1093 01:13:24,610 --> 01:13:28,740 we basically have an n times m vector 1094 01:13:28,740 --> 01:13:34,780 of dependent variable values, whereas your multivariate 1095 01:13:34,780 --> 01:13:38,700 normal with mean given by x star beta star 1096 01:13:38,700 --> 01:13:41,870 and then a covariance matrix epsilon. 1097 01:13:41,870 --> 01:13:47,380 The covariance matrix of epsilon star is sigma star. 1098 01:13:47,380 --> 01:13:50,900 Well, sigma star is In Kronecker product sigma. 1099 01:13:50,900 --> 01:13:54,130 So if you go through the math of this, 1100 01:13:54,130 --> 01:13:59,250 everything matches up in terms of what the assumptions are. 1101 01:13:59,250 --> 01:14:05,340 And the conditional probability density function of this data 1102 01:14:05,340 --> 01:14:14,930 is the usual functions of log normal or of a normal sample. 1103 01:14:14,930 --> 01:14:20,850 So we have unknown parameters beta star sigma, 1104 01:14:20,850 --> 01:14:26,960 which are equal to the joint density 1105 01:14:26,960 --> 01:14:29,740 of this normal linear regression model. 1106 01:14:29,740 --> 01:14:33,250 So this corresponds to what we had 1107 01:14:33,250 --> 01:14:34,980 before in our regression analysis. 1108 01:14:34,980 --> 01:14:37,050 We just had this more complicated definition 1109 01:14:37,050 --> 01:14:40,900 of the independent variables matrix X star. 1110 01:14:40,900 --> 01:14:42,470 And a more complicated definition 1111 01:14:42,470 --> 01:14:47,270 of our variance/covariance matrix sigma star. 1112 01:14:47,270 --> 01:14:50,250 But the log likelihood function ends up 1113 01:14:50,250 --> 01:14:54,390 being equal to a term proportional 1114 01:14:54,390 --> 01:14:59,090 to the log of the determinant of our sigma matrix 1115 01:14:59,090 --> 01:15:03,790 and minus one half q of beta sigma, where q of beta sigma 1116 01:15:03,790 --> 01:15:08,860 is the least squares criterion for each of the component 1117 01:15:08,860 --> 01:15:12,960 models summed up. 1118 01:15:12,960 --> 01:15:16,660 So the component-wise maximum likelihood estimation 1119 01:15:16,660 --> 01:15:19,115 is-- for the underlying parameters, 1120 01:15:19,115 --> 01:15:23,260 is the same as the large one. 1121 01:15:23,260 --> 01:15:31,880 And in terms of estimating the covariance matrix, 1122 01:15:31,880 --> 01:15:37,420 there's a notion called the concentrated log likelihood, 1123 01:15:37,420 --> 01:15:45,200 which comes into play in models with many parameters. 1124 01:15:45,200 --> 01:15:48,390 In this model, we have unknown parameters-- 1125 01:15:48,390 --> 01:15:52,230 our regression parameters beta and our covariance matrix 1126 01:15:52,230 --> 01:15:55,190 for the innovations sigma. 
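To pin down the quantities in that log likelihood, here is a small R function evaluating it for given B and Sigma; Q(B, Sigma) is the least squares criterion summed over time points, as described above.

var_loglik <- function(Y, Z, B, Sigma) {
  # Gaussian log likelihood for Y = Z B + E, rows of E independent N(0, Sigma)
  n <- nrow(Y); m <- ncol(Y)
  E <- Y - Z %*% B
  Q <- sum(diag(E %*% solve(Sigma) %*% t(E)))   # Q(B, Sigma) = sum over t of e_t' Sigma^{-1} e_t
  -(n * m / 2) * log(2 * pi) - (n / 2) * log(det(Sigma)) - Q / 2
}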
1127 01:15:55,190 --> 01:15:59,850 It turns out that our estimate of the regression parameter 1128 01:15:59,850 --> 01:16:05,190 beta is independent, doesn't depend-- not statistically 1129 01:16:05,190 --> 01:16:06,570 independent-- but does not depend 1130 01:16:06,570 --> 01:16:10,700 on the value of the covariance matrix sigma. 1131 01:16:10,700 --> 01:16:14,110 So whatever sigma is, we have the same maximum likelihood 1132 01:16:14,110 --> 01:16:15,620 estimate for the betas. 1133 01:16:15,620 --> 01:16:19,760 So we can consider the log likelihood 1134 01:16:19,760 --> 01:16:24,540 setting the beta parameter equal to its maximum likelihood 1135 01:16:24,540 --> 01:16:25,410 estimate. 1136 01:16:25,410 --> 01:16:27,270 And then we have a function that just 1137 01:16:27,270 --> 01:16:31,350 depends on the data and the unknown parameter sigma. 1138 01:16:31,350 --> 01:16:34,230 So that's a concentrated likelihood function 1139 01:16:34,230 --> 01:16:36,210 that needs to be maximized. 1140 01:16:36,210 --> 01:16:40,570 And the maximization of the log of a determinant of a matrix 1141 01:16:40,570 --> 01:16:43,900 minus n over 2, the trace of that matrix times an estimate 1142 01:16:43,900 --> 01:16:47,210 of it, that has been solved. 1143 01:16:47,210 --> 01:16:50,240 It's a bit involved. 1144 01:16:50,240 --> 01:16:52,760 But if you're interested in the mathematics for how that's 1145 01:16:52,760 --> 01:16:55,470 actually solved and how you take derivatives of determinants 1146 01:16:55,470 --> 01:16:58,152 and so forth, there's a paper by Anderson and Olkin 1147 01:16:58,152 --> 01:16:59,860 that goes through all the details of that 1148 01:16:59,860 --> 01:17:01,622 that you can Google on the web. 1149 01:17:01,622 --> 01:17:05,910 1150 01:17:05,910 --> 01:17:07,550 Finally, let's see. 1151 01:17:07,550 --> 01:17:09,270 There's-- well, not finally. 1152 01:17:09,270 --> 01:17:12,240 There's model selection criteria that can be applied. 1153 01:17:12,240 --> 01:17:14,720 These have been applied before for regression models 1154 01:17:14,720 --> 01:17:18,780 for univariate time series model, the Akaike Information 1155 01:17:18,780 --> 01:17:22,770 Criterion, the Bayes Information Criterion, Hannan-Quinn 1156 01:17:22,770 --> 01:17:24,640 Criterion. 1157 01:17:24,640 --> 01:17:27,060 These definitions are all consistent 1158 01:17:27,060 --> 01:17:29,680 with the other definitions. 1159 01:17:29,680 --> 01:17:33,330 They basically take the likelihood function 1160 01:17:33,330 --> 01:17:36,260 and you try to maximize that plus a penalty 1161 01:17:36,260 --> 01:17:39,330 for the number of unknown parameters. 1162 01:17:39,330 --> 01:17:43,380 And that's given here. 1163 01:17:43,380 --> 01:17:45,920 1164 01:17:45,920 --> 01:17:47,780 OK, then the last section goes through 1165 01:17:47,780 --> 01:17:53,500 an asymptotic distribution of least squares estimates. 1166 01:17:53,500 --> 01:17:56,950 And I'll let you read that on your own. 1167 01:17:56,950 --> 01:17:57,450 Let's see. 1168 01:17:57,450 --> 01:18:03,750 For this lecture I put together an example of fitting vector 1169 01:18:03,750 --> 01:18:09,330 auto regressions with some macroeconomic variables. 1170 01:18:09,330 --> 01:18:15,360 And I just wanted to point that out to you. 1171 01:18:15,360 --> 01:18:23,738 So let me go to this document here. 1172 01:18:23,738 --> 01:18:25,690 What have we got here? 1173 01:18:25,690 --> 01:18:29,594 1174 01:18:29,594 --> 01:18:30,580 All right. 1175 01:18:30,580 --> 01:18:31,310 Well, OK. 
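Before turning to the macroeconomic example, here is a rough sketch of computing the concentrated estimate of Sigma and the three selection criteria over a range of lag orders. The penalty counts m(mp + 1) free regression coefficients, which is one common convention; other treatments count only the lag coefficients, so the exact formula should be treated as an assumption.

var_order_select <- function(X, p.max = 8) {
  T <- nrow(X); m <- ncol(X)
  sapply(1:p.max, function(p) {
    idx <- (p.max + 1):T                     # common estimation sample across orders
    Z <- cbind(1, do.call(cbind, lapply(1:p, function(k) X[idx - k, , drop = FALSE])))
    Y <- X[idx, , drop = FALSE]
    E <- Y - Z %*% solve(crossprod(Z), crossprod(Z, Y))
    n <- nrow(Y)
    Sig <- crossprod(E) / n                  # concentrated (maximum likelihood) estimate of Sigma
    k <- m * (m * p + 1)                     # free regression coefficients
    c(AIC = log(det(Sig)) + 2 * k / n,
      BIC = log(det(Sig)) + log(n) * k / n,
      HQ  = log(det(Sig)) + 2 * log(log(n)) * k / n)
  })
}
# smaller is better for each criterion, e.g. which.min(var_order_select(X)["AIC", ])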
1176 01:18:31,310 --> 01:18:37,410 Modeling macroeconomic time series is an important topic. 1177 01:18:37,410 --> 01:18:39,940 It's what sort of central bankers do. 1178 01:18:39,940 --> 01:18:42,550 They want to understand what factors are affecting 1179 01:18:42,550 --> 01:18:45,880 the economy in terms of growth, inflation, unemployment. 1180 01:18:45,880 --> 01:18:50,600 And what's the impact of interest rate policies. 1181 01:18:50,600 --> 01:18:52,940 There are some really important papers 1182 01:18:52,940 --> 01:18:56,750 by Robert Litterman and Christopher Sims dealing 1183 01:18:56,750 --> 01:18:58,780 with fitting vector auto regression 1184 01:18:58,780 --> 01:19:01,590 models to macroeconomic time series. 1185 01:19:01,590 --> 01:19:03,420 And actually, the framework within which 1186 01:19:03,420 --> 01:19:07,680 they specified these models was a Bayesian framework, 1187 01:19:07,680 --> 01:19:11,320 which is an extension of the maximum likelihood method where 1188 01:19:11,320 --> 01:19:14,860 you incorporate reasonable sorts of 1189 01:19:14,860 --> 01:19:18,190 prior assumptions about what the parameters ought to be. 1190 01:19:18,190 --> 01:19:26,130 But in this note, I sort of basically 1191 01:19:26,130 --> 01:19:29,870 go through collecting various macroeconomic variables 1192 01:19:29,870 --> 01:19:33,240 directly off the web using the package R. 1193 01:19:33,240 --> 01:19:36,550 All this stuff is-- these are data 1194 01:19:36,550 --> 01:19:39,040 that you can get your hands on. 1195 01:19:39,040 --> 01:19:43,900 Here's the unemployment rate from January 1946 1196 01:19:43,900 --> 01:19:47,030 up through this past month. 1197 01:19:47,030 --> 01:19:52,670 Anyone can see how that's varied between much less than 4% 1198 01:19:52,670 --> 01:19:56,470 to over 10%, as it was recently. 1199 01:19:56,470 --> 01:19:59,550 And there's also the Fed funds rate, 1200 01:19:59,550 --> 01:20:02,100 which is one of the key variables 1201 01:20:02,100 --> 01:20:06,610 that the Federal Reserve Open Market Committee controls, 1202 01:20:06,610 --> 01:20:08,880 or I should say controlled in the past, 1203 01:20:08,880 --> 01:20:10,840 to try and affect the economy. 1204 01:20:10,840 --> 01:20:14,720 Now the value of that rate is set almost at zero 1205 01:20:14,720 --> 01:20:19,340 and other means are applied to have an impact 1206 01:20:19,340 --> 01:20:24,940 on economic growth and the economic situation 1207 01:20:24,940 --> 01:20:31,340 of the market-- of the economy, rather. 1208 01:20:31,340 --> 01:20:32,120 Let's see. 1209 01:20:32,120 --> 01:20:34,330 There's also-- anyway, a bunch of other variables. 1210 01:20:34,330 --> 01:20:38,470 CPI, which is a measure of inflation. 1211 01:20:38,470 --> 01:20:45,502 What this note goes through is the specification 1212 01:20:45,502 --> 01:20:52,070 of vector auto regression models for these series. 1213 01:20:52,070 --> 01:20:54,490 And I use just a small set of cases. 1214 01:20:54,490 --> 01:20:58,640 I look at unemployment rate, federal funds, 1215 01:20:58,640 --> 01:21:02,470 and the CPI, which is a measure of inflation. 1216 01:21:02,470 --> 01:21:06,580 And there's-- if one goes through, 1217 01:21:06,580 --> 01:21:10,780 there are multivariate versions of the autocorrelation 1218 01:21:10,780 --> 01:21:14,670 function, as given on the top right panel here, 1219 01:21:14,670 --> 01:21:17,110 between these variables. 1220 01:21:17,110 --> 01:21:20,350 And one can also do the partial autocorrelation function.
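The data collection described here can be reproduced in rough outline as follows. This is a sketch assuming the quantmod and vars packages and the FRED series codes UNRATE, FEDFUNDS, and CPIAUCSL; the lecture's own note may use different sources, transformations, and lag orders.

library(quantmod)                                   # for getSymbols(..., src = "FRED")
library(vars)                                       # for VARselect() and VAR()
unrate <- getSymbols("UNRATE",   src = "FRED", auto.assign = FALSE)   # unemployment rate
fedfun <- getSymbols("FEDFUNDS", src = "FRED", auto.assign = FALSE)   # federal funds rate
cpi    <- getSymbols("CPIAUCSL", src = "FRED", auto.assign = FALSE)   # CPI price level
infl   <- 100 * diff(log(cpi))                      # CPI level turned into monthly inflation
y <- na.omit(merge(unrate, fedfun, infl))
colnames(y) <- c("UNRATE", "FEDFUNDS", "INFL")
VARselect(as.data.frame(y), lag.max = 12, type = "const")$selection  # suggested lag orders
fit <- VAR(as.data.frame(y), p = 2, type = "const")                  # illustrative VAR(2) with intercept
summary(fit)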
1221 01:21:20,350 --> 01:21:23,289 You'll recall that autocorrelation functions 1222 01:21:23,289 --> 01:21:24,830 and partial autocorrelation functions 1223 01:21:24,830 --> 01:21:29,040 are related to what kind of-- or help us understand what kind 1224 01:21:29,040 --> 01:21:31,390 of order ARMA processes might be appropriate 1225 01:21:31,390 --> 01:21:32,670 for univariate series. 1226 01:21:32,670 --> 01:21:36,750 For multivariate series, then there are basically 1227 01:21:36,750 --> 01:21:39,760 cross lags between variables that are important, 1228 01:21:39,760 --> 01:21:42,750 and these can call be captured with vector auto regression 1229 01:21:42,750 --> 01:21:43,520 models. 1230 01:21:43,520 --> 01:21:47,370 So this goes through and shows how 1231 01:21:47,370 --> 01:21:50,610 these things are correlated with themselves. 1232 01:21:50,610 --> 01:21:51,970 And let's see. 1233 01:21:51,970 --> 01:21:59,550 At the end of this note, there are some impulse 1234 01:21:59,550 --> 01:22:02,660 response functions graphed, which 1235 01:22:02,660 --> 01:22:07,370 are looking at what is the impact of an innovation in one 1236 01:22:07,370 --> 01:22:11,090 of the components of the multivariate time series. 1237 01:22:11,090 --> 01:22:16,570 So like if Fed funds were to be increased by a certain value, 1238 01:22:16,570 --> 01:22:20,140 what would the likely impact be on the unemployment rate? 1239 01:22:20,140 --> 01:22:22,240 Or on GNP? 1240 01:22:22,240 --> 01:22:25,540 Basically, the production level of the economy. 1241 01:22:25,540 --> 01:22:30,790 And this looks at-- let's see. 1242 01:22:30,790 --> 01:22:32,260 Well, actually here we're looking 1243 01:22:32,260 --> 01:22:34,555 at the impulse function. 1244 01:22:34,555 --> 01:22:36,680 You can look at the impulse function of innovations 1245 01:22:36,680 --> 01:22:40,000 on any of the component variables on all the others. 1246 01:22:40,000 --> 01:22:42,150 And in this case, on the left panel 1247 01:22:42,150 --> 01:22:47,790 here is-- it shows what happens when unemployment 1248 01:22:47,790 --> 01:22:50,360 has a spike up, or unit spike. 1249 01:22:50,360 --> 01:22:51,760 A unit impulse up. 1250 01:22:51,760 --> 01:22:55,460 Well, this second panel shows what's 1251 01:22:55,460 --> 01:22:57,190 likely to happen to the Fed funds rate. 1252 01:22:57,190 --> 01:22:59,730 It turns out that's likely to go down. 1253 01:22:59,730 --> 01:23:01,670 And that sort of is indicating-- it's sort 1254 01:23:01,670 --> 01:23:03,370 of reflecting what, historically, 1255 01:23:03,370 --> 01:23:07,490 was the policy of the Fed to basically reduce interest 1256 01:23:07,490 --> 01:23:11,550 rates if unemployment was rising. 1257 01:23:11,550 --> 01:23:16,400 And then-- so anyway, these impulse response functions 1258 01:23:16,400 --> 01:23:18,690 correspond to essentially those innovation 1259 01:23:18,690 --> 01:23:20,450 terms on the Wold decomposition. 1260 01:23:20,450 --> 01:23:22,360 And why are these important? 1261 01:23:22,360 --> 01:23:26,720 Well, this indicates a connection, basically, 1262 01:23:26,720 --> 01:23:30,260 between that sort of moving average representation 1263 01:23:30,260 --> 01:23:31,870 and these time series models. 1264 01:23:31,870 --> 01:23:35,480 And the way these graphs are generated 1265 01:23:35,480 --> 01:23:39,090 is by essentially finding the Wold decomposition 1266 01:23:39,090 --> 01:23:43,880 and then incorporating that into these values. 
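The link drawn here between impulse responses and the Wold moving average representation can be sketched directly: given estimated coefficient matrices Phi 1 through Phi p (however they were extracted from the fit, which is left as a placeholder), the MA coefficient matrices Psi s follow a simple recursion, and their columns give the responses to unit innovations.

psi_matrices <- function(Phi, n.ahead = 24) {
  # Psi_0 = I;  Psi_s = Phi_1 Psi_{s-1} + ... + Phi_{min(s,p)} Psi_{s-min(s,p)}
  m <- nrow(Phi[[1]]); p <- length(Phi)
  Psi <- vector("list", n.ahead + 1)
  Psi[[1]] <- diag(m)
  for (s in 1:n.ahead) {
    Psi[[s + 1]] <- matrix(0, m, m)
    for (k in 1:min(s, p))
      Psi[[s + 1]] <- Psi[[s + 1]] + Phi[[k]] %*% Psi[[s - k + 1]]
  }
  Psi
}
# Element (i, j) of Psi_s is the response of component i, s periods after a unit
# innovation in component j; orthogonalized responses, as in plotted panels of this
# kind, would additionally multiply each Psi_s by a Cholesky factor of Sigma_hat.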
1267 01:23:43,880 --> 01:23:47,540 So-- OK, we'll finish there for today.