1 00:00:00,060 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,217 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,217 --> 00:00:17,842 at ocw.mit.edu. 8 00:00:22,370 --> 00:00:23,500 PROFESSOR: All right. 9 00:00:23,500 --> 00:00:24,000 Let's see. 10 00:00:24,000 --> 00:00:25,650 We're going to start today with a wrap 11 00:00:25,650 --> 00:00:29,130 up of our discussion of univariate time series 12 00:00:29,130 --> 00:00:29,990 analysis. 13 00:00:29,990 --> 00:00:35,140 And last time we went through the Wold representation 14 00:00:35,140 --> 00:00:37,830 theorem, which applies to covariance 15 00:00:37,830 --> 00:00:41,090 stationary processes, a very powerful theorem. 16 00:00:41,090 --> 00:00:44,880 And implementations of the covariance stationary 17 00:00:44,880 --> 00:00:47,700 processes with ARMA models. 18 00:00:47,700 --> 00:00:50,430 And we discussed estimation of those models 19 00:00:50,430 --> 00:00:54,840 with maximum likelihood. 20 00:00:54,840 --> 00:00:57,010 And here in this slide I just wanted 21 00:00:57,010 --> 00:01:01,070 to highlight how when we estimate models 22 00:01:01,070 --> 00:01:03,340 with maximum likelihood we need to have 23 00:01:03,340 --> 00:01:07,280 an assumption of a probability distribution for what's random, 24 00:01:07,280 --> 00:01:11,270 and in the ARMA structure we consider the simple case 25 00:01:11,270 --> 00:01:14,120 where the innovations, the eta_t, 26 00:01:14,120 --> 00:01:17,437 are normally distributed white noise. 27 00:01:17,437 --> 00:01:19,520 So they're independent and identically distributed 28 00:01:19,520 --> 00:01:21,500 normal random variables. 29 00:01:21,500 --> 00:01:24,000 And the likelihood function can be 30 00:01:24,000 --> 00:01:28,240 maximized at the maximum likelihood parameters. 31 00:01:28,240 --> 00:01:34,450 And it's simple to implement the limited information maximum 32 00:01:34,450 --> 00:01:38,200 likelihood where one conditions on the first few observations 33 00:01:38,200 --> 00:01:40,590 in the time series. 34 00:01:40,590 --> 00:01:46,870 If you look at the likelihood structure for ARMA models, 35 00:01:46,870 --> 00:01:50,520 the density of an outcome at a given time point 36 00:01:50,520 --> 00:01:53,700 depends on lags of that dependent variable. 37 00:01:53,700 --> 00:01:58,000 So if those are unavailable, then that can be a problem. 38 00:01:58,000 --> 00:02:01,930 One can implement limited information maximum likelihood 39 00:02:01,930 --> 00:02:04,680 where you're just conditioning on those initial values, 40 00:02:04,680 --> 00:02:07,740 or there are full information maximum likelihood methods 41 00:02:07,740 --> 00:02:09,750 that you can apply as well. 42 00:02:09,750 --> 00:02:13,170 Generally though the limited information case 43 00:02:13,170 --> 00:02:16,300 is what's applied. 44 00:02:16,300 --> 00:02:18,800 Then the issue is model selection. 45 00:02:18,800 --> 00:02:22,020 And with model selection the issues 46 00:02:22,020 --> 00:02:23,870 that arise with time series are issues 47 00:02:23,870 --> 00:02:27,480 that arise in fitting any kind of statistical model. 
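Before turning to model selection, here is a minimal sketch in R of the estimation step just described. The simulated series and the ARMA(1,1) orders are illustrative only; method = "CSS" conditions on the first few observations (the limited-information case), while method = "ML" maximizes the exact Gaussian likelihood.

    # Simulated ARMA(1,1) with Gaussian white noise innovations (illustration only)
    set.seed(1)
    x <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)

    # Limited-information (conditional) ML: conditions on initial observations
    fit.css <- arima(x, order = c(1, 0, 1), method = "CSS")

    # Full-information (exact) Gaussian ML
    fit.ml <- arima(x, order = c(1, 0, 1), method = "ML")

    fit.css$coef   # AR, MA, and intercept estimates under the conditional likelihood
    fit.ml$coef    # estimates under the exact likelihood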
48 00:02:27,480 --> 00:02:30,040 Ordinarily one will have multiple candidates 49 00:02:30,040 --> 00:02:32,270 for the model you want to fit to data. 50 00:02:32,270 --> 00:02:34,240 And the issue is how do you judge which 51 00:02:34,240 --> 00:02:36,390 ones are better than others. 52 00:02:36,390 --> 00:02:38,700 Why would you prefer one over the other? 53 00:02:38,700 --> 00:02:43,230 And if we're considering a collection of different ARMA 54 00:02:43,230 --> 00:02:49,670 models then we could say, fit all ARMA models of order p,q 55 00:02:49,670 --> 00:02:53,100 with p and q varying over some range. 56 00:02:53,100 --> 00:02:57,130 p from 0 up to p_max, q from q up to q_max. 57 00:02:57,130 --> 00:03:02,270 And evaluate those p,q different models. 58 00:03:02,270 --> 00:03:05,570 And if we consider sigma tilde squared of p, 59 00:03:05,570 --> 00:03:09,270 q being the MLE of the error variance, 60 00:03:09,270 --> 00:03:12,490 then there are these model selection criteria 61 00:03:12,490 --> 00:03:14,150 that are very popular. 62 00:03:14,150 --> 00:03:16,450 Akaike information criterion, and Bayes information 63 00:03:16,450 --> 00:03:18,060 criterion, and Hannan-Quinn. 64 00:03:18,060 --> 00:03:22,850 Now these criteria all have the same term, 65 00:03:22,850 --> 00:03:27,120 log of the MLE of the error variance. 66 00:03:27,120 --> 00:03:30,520 So these criteria don't vary at all with that. 67 00:03:30,520 --> 00:03:32,260 They just vary with this second term, 68 00:03:32,260 --> 00:03:35,780 but let's focus first on the AIC criterion. 69 00:03:35,780 --> 00:03:37,970 A given model is going to be better 70 00:03:37,970 --> 00:03:44,760 if the log of the MLE for the error variance is smaller. 71 00:03:44,760 --> 00:03:48,130 Now is that a good thing? 72 00:03:48,130 --> 00:03:50,186 Meaning, what is the interpretation 73 00:03:50,186 --> 00:03:52,560 of that practically when you're fitting different models? 74 00:03:55,720 --> 00:03:58,060 Well, the practical interpretation 75 00:03:58,060 --> 00:04:02,980 is the variability of the model about where you're 76 00:04:02,980 --> 00:04:05,380 predicting things, our estimate of the error variance 77 00:04:05,380 --> 00:04:06,120 is smaller. 78 00:04:06,120 --> 00:04:09,940 So we have essentially a model with a smaller error 79 00:04:09,940 --> 00:04:12,300 variance is better. 80 00:04:12,300 --> 00:04:15,720 So we're trying to minimize the log of that variance. 81 00:04:15,720 --> 00:04:18,040 Minimizing that is a good thing. 82 00:04:18,040 --> 00:04:23,095 Now what happens when you have many sort 83 00:04:23,095 --> 00:04:26,120 of independent variables to include in a model? 84 00:04:26,120 --> 00:04:28,930 Well, if you were doing a Taylor series approximation 85 00:04:28,930 --> 00:04:30,990 of a continuous function, eventually you'd 86 00:04:30,990 --> 00:04:34,150 sort of get to probably the smooth function 87 00:04:34,150 --> 00:04:39,800 with enough terms, but suppose that the actual model, it does 88 00:04:39,800 --> 00:04:43,230 have a finite number of parameters. 89 00:04:43,230 --> 00:04:44,990 And you're considering new factors, 90 00:04:44,990 --> 00:04:46,990 new lags of independent variables 91 00:04:46,990 --> 00:04:49,200 in the autoregressions. 
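To make the criteria concrete before continuing, here is a minimal R sketch of the grid search described above. It assumes the standard textbook forms of the criteria, consistent with the verbal description here: the log of the error-variance MLE plus a per-parameter penalty of 2/n for AIC, log(n)/n for BIC, and 2 log(log n)/n for Hannan-Quinn. The simulated data and the small grid are purely illustrative.

    # Fit ARMA(p, q) over a grid and compute AIC, BIC, and Hannan-Quinn criteria
    set.seed(1)
    x <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)
    n <- length(x)
    crit <- NULL
    for (p in 0:2) for (q in 0:2) {
      fit  <- arima(x, order = c(p, 0, q), method = "ML")
      sig2 <- fit$sigma2                     # MLE of the error variance
      crit <- rbind(crit, data.frame(p = p, q = q,
        AIC = log(sig2) + 2 * (p + q) / n,
        BIC = log(sig2) + log(n) * (p + q) / n,
        HQ  = log(sig2) + 2 * log(log(n)) * (p + q) / n))
    }
    crit[which.min(crit$BIC), ]              # model preferred by the BIC criterion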
92 00:04:49,200 --> 00:04:51,190 As you add more and more variables, 93 00:04:51,190 --> 00:04:55,480 well, there really should be a penalty 94 00:04:55,480 --> 00:05:00,140 for adding extra variables that aren't adding 95 00:05:00,140 --> 00:05:03,620 real value to the model in terms of reducing the error variance. 96 00:05:03,620 --> 00:05:06,190 So the Akaike information criterion 97 00:05:06,190 --> 00:05:11,020 is penalizing different models by a factor that 98 00:05:11,020 --> 00:05:14,490 depends on the size of the model in terms of the dimensionality 99 00:05:14,490 --> 00:05:17,380 of the model parameters. 100 00:05:17,380 --> 00:05:19,340 So p plus q is the dimensionality 101 00:05:19,340 --> 00:05:24,020 of the autoregression model. 102 00:05:24,020 --> 00:05:29,770 So let's see. 103 00:05:29,770 --> 00:05:34,270 With the BIC criterion the difference between that 104 00:05:34,270 --> 00:05:39,030 and the AIC criterion is that this factor two is 105 00:05:39,030 --> 00:05:43,090 replaced by log n. 106 00:05:43,090 --> 00:05:49,120 So rather than having a sort of unit increment of penalty 107 00:05:49,120 --> 00:05:54,460 for adding an extra parameter, the Bayes information criterion 108 00:05:54,460 --> 00:05:59,730 is adding a log n penalty times the number of parameters. 109 00:05:59,730 --> 00:06:03,700 And so as the sample size gets larger and larger, 110 00:06:03,700 --> 00:06:07,030 that penalty gets higher and higher. 111 00:06:07,030 --> 00:06:12,610 Now the practical interpretation of the Akaike information 112 00:06:12,610 --> 00:06:18,140 criterion is that it is very similar to applying 113 00:06:18,140 --> 00:06:21,520 a rule which says, we're going to include variables 114 00:06:21,520 --> 00:06:28,740 in our model if the square of the t statistic for estimating 115 00:06:28,740 --> 00:06:34,850 the additional parameter in the model is greater than 2 or not. 116 00:06:34,850 --> 00:06:41,710 So in terms of when does the Akaike information criterion 117 00:06:41,710 --> 00:06:45,670 become lower from adding additional terms to a model? 118 00:06:45,670 --> 00:06:49,150 If you're considering two models that differ by just one factor, 119 00:06:49,150 --> 00:06:52,530 it's basically if the t statistic for the model 120 00:06:52,530 --> 00:06:56,890 coefficient on that factor is a squared value greater than two 121 00:06:56,890 --> 00:06:57,830 or not. 122 00:06:57,830 --> 00:07:03,310 Now many of you who have seen regression models before 123 00:07:03,310 --> 00:07:05,560 and applied them, in particular applications 124 00:07:05,560 --> 00:07:08,270 would probably say, I really don't 125 00:07:08,270 --> 00:07:11,590 believe in the value of an additional factor 126 00:07:11,590 --> 00:07:15,590 unless the t statistic is greater than 1.96, 127 00:07:15,590 --> 00:07:18,410 or 2 or something. 128 00:07:18,410 --> 00:07:20,320 But the Akaike information criterion 129 00:07:20,320 --> 00:07:22,410 says the t statistic should be greater 130 00:07:22,410 --> 00:07:24,450 than the square root of 2. 131 00:07:24,450 --> 00:07:27,290 So it's sort of a weaker constraint for adding variables 132 00:07:27,290 --> 00:07:28,510 into the model. 133 00:07:28,510 --> 00:07:31,551 And now why is it called an information criterion? 134 00:07:31,551 --> 00:07:33,050 I won't go into this in the lecture. 
135 00:07:33,050 --> 00:07:34,940 I am happy to go into it during office hours, 136 00:07:34,940 --> 00:07:38,085 but there's notions of information theory 137 00:07:38,085 --> 00:07:41,480 and Kullback-Leibler information of the model 138 00:07:41,480 --> 00:07:43,500 versus the true model, and trying 139 00:07:43,500 --> 00:07:47,730 to basically maximize the closeness of our fitted model 140 00:07:47,730 --> 00:07:48,960 to that. 141 00:07:48,960 --> 00:07:50,770 Now the Hannan-Quinn criterion, let's 142 00:07:50,770 --> 00:07:52,270 just look at how that differs. 143 00:07:52,270 --> 00:07:57,480 Well, that basically has a penalty midway between the log 144 00:07:57,480 --> 00:07:58,460 n and two. 145 00:07:58,460 --> 00:08:01,320 It's 2*log(log n). 146 00:08:01,320 --> 00:08:03,890 So this has a penalty that's increasing with size n, 147 00:08:03,890 --> 00:08:07,680 but not as fast as log n. 148 00:08:07,680 --> 00:08:11,480 This becomes relevant when we have 149 00:08:11,480 --> 00:08:16,240 models that get to be very large because we have a lot of data. 150 00:08:16,240 --> 00:08:17,780 Basically the more data you have, 151 00:08:17,780 --> 00:08:19,860 the more parameters you should be 152 00:08:19,860 --> 00:08:22,220 able to incorporate in the model if they're 153 00:08:22,220 --> 00:08:27,360 sort of statistically valid factors, important factors. 154 00:08:27,360 --> 00:08:29,960 And the Hannan-Quinn criterion basically 155 00:08:29,960 --> 00:08:36,179 allows for modeling processes where really an infinite number 156 00:08:36,179 --> 00:08:38,870 of variables might be appropriate, 157 00:08:38,870 --> 00:08:40,860 but you need larger and larger sample sizes 158 00:08:40,860 --> 00:08:44,640 to effectively estimate those. 159 00:08:44,640 --> 00:08:51,530 So those are the criteria that can be applied with time series 160 00:08:51,530 --> 00:08:52,330 models. 161 00:08:52,330 --> 00:08:55,210 And I should point out that, let's see, 162 00:08:55,210 --> 00:08:59,720 if you took sort of this factor 2 over n 163 00:08:59,720 --> 00:09:03,530 and inverted it to n over two log sigma squared, 164 00:09:03,530 --> 00:09:07,560 that term is basically one of the terms in the likelihood 165 00:09:07,560 --> 00:09:09,190 function of the fitted model. 166 00:09:09,190 --> 00:09:11,320 So you can see how this criterion is basically 167 00:09:11,320 --> 00:09:16,280 manipulating the maximum likelihood 168 00:09:16,280 --> 00:09:21,845 value by adjusting it for a penalty for extra parameters. 169 00:09:28,080 --> 00:09:28,590 Let's see. 170 00:09:28,590 --> 00:09:29,360 OK. 171 00:09:29,360 --> 00:09:31,595 Next topic is just test for stationarity 172 00:09:31,595 --> 00:09:32,470 and non-stationarity. 173 00:09:35,540 --> 00:09:40,870 There's a famous test called the Dickey-Fuller test, which 174 00:09:40,870 --> 00:09:47,550 is essentially to evaluate the time series to see if it's 175 00:09:47,550 --> 00:09:49,010 consistent with a random walk. 176 00:09:49,010 --> 00:09:51,490 We know that we've been discussing sort of lecture 177 00:09:51,490 --> 00:09:56,880 after lecture how simple random walks are non-stationary. 178 00:09:56,880 --> 00:10:02,960 And the simple random walk is given by the model up here, 179 00:10:02,960 --> 00:10:05,920 x_t equals phi x_(t-1) plus eta_t. 180 00:10:05,920 --> 00:10:09,345 If phi is equal to 1, right, that 181 00:10:09,345 --> 00:10:11,740 is a non-stationary process. 
182 00:10:11,740 --> 00:10:14,190 Well, in the Dickey-Fuller test we 183 00:10:14,190 --> 00:10:17,640 want to test whether phi equals 1 or not. 184 00:10:17,640 --> 00:10:23,090 And so we can fit the AR(1) model by least squares 185 00:10:23,090 --> 00:10:29,110 and define the test statistic to be the estimate of phi minus 1 186 00:10:29,110 --> 00:10:33,667 over its standard error where phi is the least squares 187 00:10:33,667 --> 00:10:36,250 estimate and the standard error is the least squares estimate, 188 00:10:36,250 --> 00:10:39,630 the standard error of that. 189 00:10:39,630 --> 00:10:44,970 If our coefficient phi is less than 1 in modulus, 190 00:10:44,970 --> 00:10:47,430 so this really is a stationary series, 191 00:10:47,430 --> 00:10:56,080 then the estimate phi converges in distribution to a normal 0, 192 00:10:56,080 --> 00:10:58,950 1 minus phi squared. 193 00:10:58,950 --> 00:11:04,920 And let's see. 194 00:11:04,920 --> 00:11:09,990 But if phi is equal to 1, OK, so just 195 00:11:09,990 --> 00:11:12,450 to recap that second to last bullet point 196 00:11:12,450 --> 00:11:17,580 is basically the property that when norm phi is less than 1, 197 00:11:17,580 --> 00:11:22,510 then our least squares estimates are asymptotically 198 00:11:22,510 --> 00:11:26,420 normally distributed with mean 0 if we 199 00:11:26,420 --> 00:11:29,740 normalize by the true value, and 1 minus phi squared. 200 00:11:29,740 --> 00:11:34,220 If phi is equal to 1, then it turns out 201 00:11:34,220 --> 00:11:38,470 that phi hat is super-consistent with rate 1 over t. 202 00:11:38,470 --> 00:11:44,520 Now this super-consistency is related 203 00:11:44,520 --> 00:11:50,900 to statistics converging to some value, 204 00:11:50,900 --> 00:11:55,590 and what is the rate of convergence of those statistics 205 00:11:55,590 --> 00:11:56,490 to different values. 206 00:11:56,490 --> 00:12:04,840 So in normal samples we can estimate sort of the mean 207 00:12:04,840 --> 00:12:06,440 by the sample mean. 208 00:12:06,440 --> 00:12:13,960 And that will converge to the true mean at rate of 1 209 00:12:13,960 --> 00:12:14,570 over root n. 210 00:12:17,660 --> 00:12:23,070 When we have a non-stationary random walk, 211 00:12:23,070 --> 00:12:28,890 the independent variables matrix is such 212 00:12:28,890 --> 00:12:35,250 that X transpose X over n grows without bound. 213 00:12:35,250 --> 00:12:42,270 So if we have y is equal to X beta plus epsilon, 214 00:12:42,270 --> 00:12:46,750 and beta hat is equal to X transpose X inverse X 215 00:12:46,750 --> 00:12:55,940 transpose y, the problem is-- well, 216 00:12:55,940 --> 00:13:01,080 and beta hat is distributed as ultimately 217 00:13:01,080 --> 00:13:03,500 normal with mean beta and variance sigma 218 00:13:03,500 --> 00:13:07,430 squared, X transpose X inverse. 219 00:13:07,430 --> 00:13:10,160 This X transpose X inverse matrix, 220 00:13:10,160 --> 00:13:14,750 when the process is non-stationary, a random walk, 221 00:13:14,750 --> 00:13:15,650 it grows infinitely. 222 00:13:19,320 --> 00:13:23,980 X transpose X over n actually grows 223 00:13:23,980 --> 00:13:31,280 to infinity in magnitude just because it becomes unbounded. 224 00:13:31,280 --> 00:13:34,560 Whereas X transpose X over n, when it's stationary 225 00:13:34,560 --> 00:13:36,130 is bounded. 
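As a small illustration of the statistic just defined, the sketch below simulates a random walk, fits the AR(1) regression by least squares, and forms the estimate of phi minus 1 over its standard error. Under the unit-root null this statistic should be referred to the Dickey-Fuller distribution rather than to normal quantiles; adf.test in the tseries package provides a packaged (augmented) version. Everything here is simulated purely for illustration.

    # Simulate a random walk (phi = 1) and compute the Dickey-Fuller t-type statistic
    set.seed(1)
    n <- 500
    x <- cumsum(rnorm(n))               # random walk: x_t = x_{t-1} + eta_t
    fit <- lm(x[-1] ~ x[-n] - 1)        # AR(1) fit by least squares, no intercept (as in the model above)
    phi.hat <- coef(fit)[1]
    se.phi  <- sqrt(vcov(fit)[1, 1])
    df.stat <- (phi.hat - 1) / se.phi   # refer to the Dickey-Fuller distribution, not N(0,1)
    df.stat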
226 00:13:36,130 --> 00:13:39,210 So anyway, so that leads to the super-consistency, 227 00:13:39,210 --> 00:13:41,930 meaning that it converges to the value much faster 228 00:13:41,930 --> 00:13:44,590 and so this normal distribution isn't appropriate. 229 00:13:44,590 --> 00:13:47,730 And it turns out there's Dickey-Fuller distribution 230 00:13:47,730 --> 00:13:50,910 for this test statistic, which is based on integrals 231 00:13:50,910 --> 00:13:55,150 of diffusions and one can read about that 232 00:13:55,150 --> 00:14:01,990 in the literature on unit roots and test for non-stationarity. 233 00:14:01,990 --> 00:14:05,760 So there's a very rich literature on this problem. 234 00:14:05,760 --> 00:14:12,470 If you're into econometrics, basically a lot of time's 235 00:14:12,470 --> 00:14:15,870 been spent in that field on this topic. 236 00:14:15,870 --> 00:14:22,370 And the mathematics gets very, very involved, 237 00:14:22,370 --> 00:14:26,190 but good results are available. 238 00:14:26,190 --> 00:14:30,230 So let's see an application of some of these time series 239 00:14:30,230 --> 00:14:30,730 methods. 240 00:14:37,550 --> 00:14:40,740 Let me go to the desktop here if I can. 241 00:14:40,740 --> 00:14:46,690 In this supplemental material that'll be on the website, 242 00:14:46,690 --> 00:14:49,540 I just wanted you to be able to work 243 00:14:49,540 --> 00:14:51,150 with time series, real time series 244 00:14:51,150 --> 00:14:53,250 and implement these autoregressive moving 245 00:14:53,250 --> 00:14:58,220 average fits and understand basically how things work. 246 00:14:58,220 --> 00:15:03,440 So in this, it introduces loading the R libraries 247 00:15:03,440 --> 00:15:06,190 and Federal Reserve data into R, basically collecting it 248 00:15:06,190 --> 00:15:07,370 off the web. 249 00:15:07,370 --> 00:15:11,300 Creating weekly and monthly time series from a daily series, 250 00:15:11,300 --> 00:15:14,160 and it's a trivial thing to do, but when you sit down and try 251 00:15:14,160 --> 00:15:16,460 to do it gets involved. 252 00:15:16,460 --> 00:15:20,210 So there's some nice tools that are available. 253 00:15:20,210 --> 00:15:22,080 There's the ACF and the PACF, which 254 00:15:22,080 --> 00:15:25,710 is the auto-correlation function and the partial 255 00:15:25,710 --> 00:15:29,030 auto-correlation function, which are 256 00:15:29,030 --> 00:15:30,690 used for interpreting series. 257 00:15:30,690 --> 00:15:35,350 Then we conduct Dickey-Fuller test for unit roots 258 00:15:35,350 --> 00:15:39,920 and determine, evaluate stationarity, non-stationarity 259 00:15:39,920 --> 00:15:42,170 of the 10-year yield. 260 00:15:42,170 --> 00:15:48,010 And then we evaluate stationarity and cyclicality 261 00:15:48,010 --> 00:15:51,240 in the fitted autoregressive model of order 2 262 00:15:51,240 --> 00:15:53,350 to monthly data. 263 00:15:53,350 --> 00:15:56,940 And actually 1.7 there, that cyclicality issue, 264 00:15:56,940 --> 00:15:59,980 relates to one of the problems on the problem set 265 00:15:59,980 --> 00:16:02,850 for time series, which is looking at, 266 00:16:02,850 --> 00:16:06,030 with second order autoregressive models, 267 00:16:06,030 --> 00:16:10,510 is there cyclicality in the process? 268 00:16:10,510 --> 00:16:12,890 And then finally looking at identifying 269 00:16:12,890 --> 00:16:16,960 the best autoregressive model using the AIC criterion. 
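A minimal sketch of that workflow, assuming the quantmod and tseries packages; the FRED series code DGS10 for the 10-year constant-maturity Treasury yield, and the choice of month-end values, are assumptions of this sketch and may differ from the posted case study.

    library(quantmod)                  # getSymbols() pulls data from FRED
    library(tseries)                   # adf.test()

    getSymbols("DGS10", src = "FRED")  # daily 10-year Treasury yield
    y         <- na.omit(DGS10)
    y.monthly <- Cl(to.monthly(y))     # month-end values of the daily series

    acf(as.numeric(y.monthly))         # sample autocorrelation function
    pacf(as.numeric(y.monthly))        # sample partial autocorrelation function

    adf.test(as.numeric(y.monthly))    # augmented Dickey-Fuller unit-root test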
270 00:16:16,960 --> 00:16:21,500 So let me just page through and show you a couple of plots 271 00:16:21,500 --> 00:16:22,470 here. 272 00:16:22,470 --> 00:16:22,970 OK. 273 00:16:22,970 --> 00:16:26,379 Well, there's the original 10-year yield 274 00:16:26,379 --> 00:16:28,170 collected directly from the Federal Reserve 275 00:16:28,170 --> 00:16:32,360 website over a 10 year period. 276 00:16:32,360 --> 00:16:34,600 And, oh, here we go. 277 00:16:34,600 --> 00:16:35,540 This is nice. 278 00:16:35,540 --> 00:16:36,040 OK. 279 00:16:42,580 --> 00:16:43,730 OK. 280 00:16:43,730 --> 00:16:46,870 Let's see, this section 1.4 conducts 281 00:16:46,870 --> 00:16:49,930 the Dickey-Fuller test. 282 00:16:49,930 --> 00:17:03,080 And it basically determines that the p-value 283 00:17:03,080 --> 00:17:06,420 for non-stationarity is not rejected. 284 00:17:06,420 --> 00:17:12,819 And so, with the augmented Dickey-Fuller test, 285 00:17:12,819 --> 00:17:15,089 the test statistic is computed. 286 00:17:15,089 --> 00:17:19,849 Its significance is evaluated by the distribution 287 00:17:19,849 --> 00:17:21,760 for that statistic. 288 00:17:21,760 --> 00:17:24,790 And the p-value tells you how extreme the value 289 00:17:24,790 --> 00:17:28,910 of the statistic is, meaning how unusual is it. 290 00:17:28,910 --> 00:17:33,950 The smaller the p-value, the more unlikely the value is. 291 00:17:33,950 --> 00:17:35,910 The p-value is what's the likelihood of getting 292 00:17:35,910 --> 00:17:39,690 as extreme or more extreme a value of the test statistic, 293 00:17:39,690 --> 00:17:41,150 and the test statistic is evidence 294 00:17:41,150 --> 00:17:43,075 against the null hypothesis. 295 00:17:43,075 --> 00:17:48,850 So in this case the p-values range basically 0.2726 296 00:17:48,850 --> 00:18:00,760 for the monthly data, which says that basically there 297 00:18:00,760 --> 00:18:03,345 is evidence of a unit root in the process. 298 00:18:06,530 --> 00:18:08,980 Let's see. 299 00:18:08,980 --> 00:18:09,480 OK. 300 00:18:09,480 --> 00:18:10,896 There's a section on understanding 301 00:18:10,896 --> 00:18:12,815 partial auto-correlation coefficients. 302 00:18:16,740 --> 00:18:20,180 And let me just state what the partial correlation 303 00:18:20,180 --> 00:18:21,010 coefficients are. 304 00:18:21,010 --> 00:18:22,676 You have the auto-correlation functions, 305 00:18:22,676 --> 00:18:25,850 which are simply the correlations of the time 306 00:18:25,850 --> 00:18:28,190 series with lags of its values. 307 00:18:28,190 --> 00:18:30,020 The partial auto-correlation coefficient 308 00:18:30,020 --> 00:18:36,640 is the correlation that's between the time series 309 00:18:36,640 --> 00:18:42,180 and say, it's p-th lag that is not explained by all lags lower 310 00:18:42,180 --> 00:18:42,720 than p. 311 00:18:42,720 --> 00:18:45,690 So it's basically the incremental correlation 312 00:18:45,690 --> 00:18:50,460 of the time series variable with the p-th lag after controlling 313 00:18:50,460 --> 00:18:51,540 for the others. 314 00:18:55,650 --> 00:18:57,420 And then let's see. 315 00:18:57,420 --> 00:19:01,480 With this, in section eight here there's 316 00:19:01,480 --> 00:19:07,220 a function in R called ar, for autoregressive, which basically 317 00:19:07,220 --> 00:19:11,170 will fit all autoregressive models up to a given order 318 00:19:11,170 --> 00:19:14,230 and provide diagnostic statistics for that. 
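For example, a sketch of that function applied to the monthly yield series y.monthly from the sketch above; the maximum order of 12 is arbitrary. The aic component returned by ar is already expressed relative to the best-fitting order, which is exactly the relative AIC statistic plotted next.

    # Fit AR(p) models for p = 0, 1, ..., order.max and compare them by AIC
    fit.ar <- ar(as.numeric(y.monthly), order.max = 12, aic = TRUE)
    fit.ar$order      # order selected by minimum AIC
    fit.ar$aic        # AIC of each order relative to the minimum (0 at the best order)
    plot(0:12, fit.ar$aic, type = "h",
         xlab = "AR order p", ylab = "relative AIC")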
319 00:19:14,230 --> 00:19:18,110 And here is a plot of the relative AIC statistic 320 00:19:18,110 --> 00:19:20,640 for models of the monthly data. 321 00:19:20,640 --> 00:19:25,100 And you can see that basically it takes all the AIC statistics 322 00:19:25,100 --> 00:19:28,950 and subtracts the smallest one from all the others. 323 00:19:28,950 --> 00:19:33,495 So one can see that according to the AIC statistic 324 00:19:33,495 --> 00:19:40,110 a model of order seven is suggested for this treasury 325 00:19:40,110 --> 00:19:40,892 yield data. 326 00:19:43,670 --> 00:19:46,140 OK. 327 00:19:46,140 --> 00:19:49,500 Then finally because these autoregressive models 328 00:19:49,500 --> 00:19:52,920 are implemented with regression models, 329 00:19:52,920 --> 00:19:56,780 one can apply regression diagnostics 330 00:19:56,780 --> 00:20:02,180 that we had introduced earlier to look at those data as well. 331 00:20:02,180 --> 00:20:04,140 All right. 332 00:20:04,140 --> 00:20:07,495 So let's go down now. 333 00:20:14,978 --> 00:20:16,125 [INAUDIBLE] 334 00:20:16,125 --> 00:20:16,625 OK. 335 00:20:25,770 --> 00:20:27,970 [INAUDIBLE] 336 00:20:27,970 --> 00:20:28,660 Full screen. 337 00:20:28,660 --> 00:20:31,170 Here we go. 338 00:20:31,170 --> 00:20:31,670 All right. 339 00:20:36,700 --> 00:20:41,070 So let's move on to the topic of volatility modeling. 340 00:20:44,350 --> 00:20:50,290 The discussion in this section is 341 00:20:50,290 --> 00:20:53,640 going to begin with just defining volatility. 342 00:20:53,640 --> 00:20:56,450 So we know what we're talking about. 343 00:20:56,450 --> 00:21:01,740 And then measuring volatility with historical data 344 00:21:01,740 --> 00:21:05,190 where we don't really apply sort of statistical models so much, 345 00:21:05,190 --> 00:21:07,810 but we're concerned with just historical measures 346 00:21:07,810 --> 00:21:10,180 of volatility and their prediction. 347 00:21:10,180 --> 00:21:11,450 Then there are formal models. 348 00:21:11,450 --> 00:21:14,230 We'll introduce Geometric Brownian Motion, of course. 349 00:21:14,230 --> 00:21:17,080 That's one of the standard models in finance. 350 00:21:17,080 --> 00:21:18,710 But also Poisson jump-diffusions, 351 00:21:18,710 --> 00:21:22,240 which is an extension of Geometric Brownian Motion 352 00:21:22,240 --> 00:21:24,300 to allow for discontinuities. 353 00:21:24,300 --> 00:21:28,410 And then there's a property of these Brownian motion 354 00:21:28,410 --> 00:21:30,860 and jump-diffusion models which is models 355 00:21:30,860 --> 00:21:33,400 with independent increments. 356 00:21:33,400 --> 00:21:43,620 Basically you have disjoint increments of the process, 357 00:21:43,620 --> 00:21:45,750 basically are independent of each other, which 358 00:21:45,750 --> 00:21:51,270 is a key property when there's time dependence in the models. 359 00:21:51,270 --> 00:21:54,040 There can be time dependence actually in the volatility. 360 00:21:54,040 --> 00:21:55,980 And ARCH models were introduced initially 361 00:21:55,980 --> 00:21:57,084 to try and capture that. 362 00:21:57,084 --> 00:21:58,500 And were extended to GARCH models, 363 00:21:58,500 --> 00:22:00,910 and these are the sort of simplest cases 364 00:22:00,910 --> 00:22:03,530 of time-dependent volatility models 365 00:22:03,530 --> 00:22:06,680 that we can work with and introduce. 
366 00:22:06,680 --> 00:22:11,630 And in all of these the sort of mathematical framework 367 00:22:11,630 --> 00:22:14,820 for defining these models and the statistical framework 368 00:22:14,820 --> 00:22:18,050 for estimating their parameters is going to be highlighted. 369 00:22:18,050 --> 00:22:22,100 And while it's a very simple setting 370 00:22:22,100 --> 00:22:24,710 in terms of what these models are, 371 00:22:24,710 --> 00:22:28,090 these issues that we'll be covering 372 00:22:28,090 --> 00:22:33,200 relate to virtually all statistical modeling as well. 373 00:22:33,200 --> 00:22:36,120 So let's define volatility. 374 00:22:36,120 --> 00:22:36,620 OK. 375 00:22:36,620 --> 00:22:40,480 In finance it's defined as the annualized standard deviation 376 00:22:40,480 --> 00:22:43,380 of the change in price or value of a financial security, 377 00:22:43,380 --> 00:22:45,280 or an index. 378 00:22:45,280 --> 00:22:49,630 So we're interested in the variability 379 00:22:49,630 --> 00:22:55,220 of this process, a price process or a value process. 380 00:22:55,220 --> 00:22:59,240 And we consider it on an annualized time scale. 381 00:22:59,240 --> 00:23:03,910 Now because of that, when you talk about volatility 382 00:23:03,910 --> 00:23:10,550 it really is meaningful to communicate, levels of 10%. 383 00:23:10,550 --> 00:23:17,500 If you think of, at what level do sort of absolute bond yields 384 00:23:17,500 --> 00:23:19,480 vary over a year? 385 00:23:22,440 --> 00:23:25,120 It's probably less than 5%. 386 00:23:25,120 --> 00:23:26,242 Bond yields don't-- 387 00:23:26,242 --> 00:23:27,950 When you think of currencies, how much do 388 00:23:27,950 --> 00:23:30,860 those vary over a year. 389 00:23:30,860 --> 00:23:32,790 Maybe 10%. 390 00:23:32,790 --> 00:23:35,480 With equity markets, how do those vary? 391 00:23:35,480 --> 00:23:39,700 Well, maybe 30%, 40% or more. 392 00:23:39,700 --> 00:23:43,170 With the estimation and prediction approaches, 393 00:23:43,170 --> 00:23:46,030 OK, these are what we'll be discussing. 394 00:23:46,030 --> 00:23:47,930 There's different cases. 395 00:23:47,930 --> 00:23:52,830 So let's go on to historical volatility. 396 00:23:52,830 --> 00:23:56,270 In terms of computing the historical volatility 397 00:23:56,270 --> 00:23:59,350 we'll be considering basically a price 398 00:23:59,350 --> 00:24:02,080 series of T plus 1 points. 399 00:24:02,080 --> 00:24:06,811 And then we can get T period returns 400 00:24:06,811 --> 00:24:08,310 corresponding to those prices, which 401 00:24:08,310 --> 00:24:12,450 is the difference in the logs of the prices, 402 00:24:12,450 --> 00:24:14,300 or the log of the price relatives. 403 00:24:14,300 --> 00:24:18,370 So R_t is going to be the return for the asset. 404 00:24:18,370 --> 00:24:22,710 And one could use other definitions, 405 00:24:22,710 --> 00:24:26,340 like sort of the absolute return, not take logs. 406 00:24:26,340 --> 00:24:30,160 It's convenient in much empirical analysis, 407 00:24:30,160 --> 00:24:34,250 I guess, to work with the logs because if you sum 408 00:24:34,250 --> 00:24:37,990 logs you get sort of log of the product. 409 00:24:37,990 --> 00:24:41,830 And so total cumulative returns can be computed easily 410 00:24:41,830 --> 00:24:43,670 with sums of logs. 411 00:24:43,670 --> 00:24:47,140 But anyway, we'll work with that scale for now. 412 00:24:47,140 --> 00:24:47,640 OK. 
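For example, a quick check of that additivity with a hypothetical price path:

    # Log returns add across periods: their sum recovers the total log return
    p <- c(100, 102, 101, 105)            # hypothetical prices P_0, ..., P_3
    r <- diff(log(p))                     # one-period log returns R_t
    all.equal(sum(r), log(p[4] / p[1]))   # TRUE: cumulative return as a sum of logs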
413 00:24:47,640 --> 00:24:52,080 Now the process R_t, the return series process, 414 00:24:52,080 --> 00:24:55,400 is going to be assumed to be covariance stationary, 415 00:24:55,400 --> 00:24:59,820 meaning that it does have a finite variance. 416 00:24:59,820 --> 00:25:04,900 And the sample estimate of that is just 417 00:25:04,900 --> 00:25:10,730 given by the square root of the sample variance. 418 00:25:10,730 --> 00:25:13,445 And we're also considering an unbiased estimate of that. 419 00:25:16,360 --> 00:25:20,770 And if we want to basically convert these 420 00:25:20,770 --> 00:25:22,570 to annualized values so that we're 421 00:25:22,570 --> 00:25:24,410 dealing with a volatility, then if we 422 00:25:24,410 --> 00:25:28,672 have daily prices of which in financial markets 423 00:25:28,672 --> 00:25:30,130 they're usually-- in the US they're 424 00:25:30,130 --> 00:25:33,550 open roughly 252 days a year on average. 425 00:25:33,550 --> 00:25:37,580 We multiply that sigma hat by 252 square root. 426 00:25:37,580 --> 00:25:44,110 And for weekly, root 52, and root 12 for monthly data. 427 00:25:44,110 --> 00:25:48,870 So regardless of the periodicity of our original data 428 00:25:48,870 --> 00:25:51,700 we can get them onto that volatility scale. 429 00:25:56,410 --> 00:26:00,960 Now in terms of prediction methods 430 00:26:00,960 --> 00:26:05,980 that one can make with historical volatility, 431 00:26:05,980 --> 00:26:12,230 and there's a lot of work done in finance by people 432 00:26:12,230 --> 00:26:15,060 who aren't sort of trained as econometricians 433 00:26:15,060 --> 00:26:18,570 or statisticians, they basically just work with the data. 434 00:26:18,570 --> 00:26:23,840 And there's a standard for risk analysis called the risk 435 00:26:23,840 --> 00:26:30,780 metrics approach, where the approach defines volatility 436 00:26:30,780 --> 00:26:33,470 and volatility estimates, historical estimates, just 437 00:26:33,470 --> 00:26:35,750 using simple methodologies. 438 00:26:35,750 --> 00:26:39,870 And so that's just go through what those are here. 439 00:26:39,870 --> 00:26:46,940 One can-- basically for any period t, 440 00:26:46,940 --> 00:26:49,710 one can define the sample volatility, 441 00:26:49,710 --> 00:26:53,670 just to be the sample standard deviation of the period t 442 00:26:53,670 --> 00:26:55,100 returns. 443 00:26:55,100 --> 00:26:58,430 And so with daily data that might just 444 00:26:58,430 --> 00:27:00,800 be the square of that daily return. 445 00:27:00,800 --> 00:27:05,150 With monthly data it could be the sample standard deviation 446 00:27:05,150 --> 00:27:08,240 of the returns over the month and with yearly it 447 00:27:08,240 --> 00:27:10,860 would be the sample over the year. 448 00:27:10,860 --> 00:27:15,150 Also with intraday data, it could be the sample standard 449 00:27:15,150 --> 00:27:22,900 deviation over intraday periods of say, half hours or hours. 450 00:27:22,900 --> 00:27:26,810 And the historical average is simply 451 00:27:26,810 --> 00:27:30,320 the mean of those estimates, which 452 00:27:30,320 --> 00:27:32,490 uses all the available data. 453 00:27:32,490 --> 00:27:34,830 One can consider the simple moving average 454 00:27:34,830 --> 00:27:38,280 of these realized volatilities. 455 00:27:38,280 --> 00:27:44,330 And so that basically is using the last m, for some finite m, 456 00:27:44,330 --> 00:27:46,260 values to average. 
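A minimal sketch of these historical measures, assuming daily closing prices (simulated below just so the code runs), the 252-trading-day convention, and an arbitrary 21-day window for the simple moving average:

    # Simulated daily prices, for illustration only
    set.seed(1)
    price <- 100 * exp(cumsum(rnorm(1000, mean = 0, sd = 0.01)))

    r <- diff(log(price))                  # daily log returns
    vol.annual <- sqrt(252) * sd(r)        # annualized historical volatility

    # Trailing m-day simple moving average of the daily squared returns
    m <- 21
    var.sma <- stats::filter(r^2, rep(1/m, m), sides = 1)
    vol.sma <- sqrt(252 * var.sma)         # annualized rolling volatility estimate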
457 00:27:46,260 --> 00:27:53,296 And one could also consider an exponential moving average 458 00:27:53,296 --> 00:27:57,170 of these sample volatilities where 459 00:27:57,170 --> 00:28:02,060 we have-- our estimate of the volatility is 1 minus beta 460 00:28:02,060 --> 00:28:05,600 times the current period volatility 461 00:28:05,600 --> 00:28:08,550 plus beta times the previous estimate. 462 00:28:08,550 --> 00:28:10,780 And these exponential moving averages 463 00:28:10,780 --> 00:28:15,990 are really very nice ways to estimate 464 00:28:15,990 --> 00:28:19,740 processes that change over time. 465 00:28:19,740 --> 00:28:23,440 And they're able to track the changes quite well 466 00:28:23,440 --> 00:28:27,214 and they will tend to come up again and again. 467 00:28:27,214 --> 00:28:28,880 This exponential moving average actually 468 00:28:28,880 --> 00:28:31,990 uses all available data. 469 00:28:31,990 --> 00:28:34,770 And there can be discrete versions of those where 470 00:28:34,770 --> 00:28:37,615 you say, well let's use not an equal weighted average 471 00:28:37,615 --> 00:28:39,490 like the simple moving average, but let's use 472 00:28:39,490 --> 00:28:44,220 a geometric average of the last m values in an exponential way. 473 00:28:44,220 --> 00:28:46,790 And that's the exponential weighted moving average 474 00:28:46,790 --> 00:28:47,780 that uses the last m. 475 00:28:54,191 --> 00:28:54,690 OK. 476 00:28:54,690 --> 00:28:55,190 There we go. 477 00:29:03,109 --> 00:29:03,609 OK. 478 00:29:06,610 --> 00:29:11,870 Well, with these different measures of sample volatility, 479 00:29:11,870 --> 00:29:17,610 one can basically build models to estimate them 480 00:29:17,610 --> 00:29:23,990 with regression models and evaluate. 481 00:29:23,990 --> 00:29:26,650 And in terms of the risk metrics benchmark, 482 00:29:26,650 --> 00:29:30,140 they consider a variety of different methodologies 483 00:29:30,140 --> 00:29:32,080 for estimating volatility. 484 00:29:32,080 --> 00:29:35,000 And sort of determine what methods are best 485 00:29:35,000 --> 00:29:38,320 for different kinds of financial instruments. 486 00:29:38,320 --> 00:29:42,030 And different financial indexes. 487 00:29:42,030 --> 00:29:44,140 And there are different performance measures 488 00:29:44,140 --> 00:29:45,000 one can apply. 489 00:29:45,000 --> 00:29:47,740 Sort of mean squared error of prediction, 490 00:29:47,740 --> 00:29:51,020 mean absolute error of prediction, 491 00:29:51,020 --> 00:29:53,360 mean absolute prediction error, and so forth 492 00:29:53,360 --> 00:29:55,680 to evaluate different methodologies. 493 00:29:55,680 --> 00:30:00,640 And on the web you can actually look at the technical documents 494 00:30:00,640 --> 00:30:03,530 for risk metrics and they go through these analyses 495 00:30:03,530 --> 00:30:06,700 and if your interest is in a particular area of finance, 496 00:30:06,700 --> 00:30:09,810 whether it's fixed income or equities, commodities, 497 00:30:09,810 --> 00:30:13,220 or currencies, reviewing their work 498 00:30:13,220 --> 00:30:15,160 there is very interesting because it 499 00:30:15,160 --> 00:30:20,740 does highlight different aspects of those markets. 500 00:30:20,740 --> 00:30:25,690 And it turns out that basically the exponential moving average 501 00:30:25,690 --> 00:30:30,040 is generally a very good method for many instruments. 
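A sketch of the exponential moving average recursion just described, applied to squared daily returns (r as in the previous sketch). The smoothing parameter beta = 0.94 is only an illustrative value, not a recommendation from the lecture.

    # sigma2.hat[t] = (1 - beta) * r[t]^2 + beta * sigma2.hat[t-1]
    ewma.var <- function(r, beta = 0.94) {
      v <- numeric(length(r))
      v[1] <- r[1]^2                       # initialize with the first squared return
      for (t in 2:length(r))
        v[t] <- (1 - beta) * r[t]^2 + beta * v[t - 1]
      v
    }
    vol.ewma <- sqrt(252 * ewma.var(r))    # annualized EWMA volatility estimate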
502 00:30:30,040 --> 00:30:38,050 And the sort of discounting of the values over time 503 00:30:38,050 --> 00:30:41,340 corresponds to having roughly between, I guess, a 45 504 00:30:41,340 --> 00:30:45,910 and a 90 day period in estimating your volatility. 505 00:30:45,910 --> 00:30:50,690 And in these approaches which are, I guess, 506 00:30:50,690 --> 00:30:52,930 they're a bit ad hoc. 507 00:30:52,930 --> 00:30:54,250 There's the formalism. 508 00:30:54,250 --> 00:30:57,530 And defining them is basically just empirically 509 00:30:57,530 --> 00:30:58,750 what has worked in the past. 510 00:31:03,760 --> 00:31:04,260 Let's see. 511 00:31:08,610 --> 00:31:12,170 While these things are ad hoc, they actually 512 00:31:12,170 --> 00:31:13,840 have been very, very effective. 513 00:31:13,840 --> 00:31:23,970 So let's move on to formal statistical models 514 00:31:23,970 --> 00:31:25,940 of volatility. 515 00:31:25,940 --> 00:31:30,740 And the first class is-- model is the Geometric Brownian 516 00:31:30,740 --> 00:31:31,240 Motion. 517 00:31:31,240 --> 00:31:37,700 So here we have basically a stochastic differential 518 00:31:37,700 --> 00:31:41,960 equation defining the model for Geometric Brownian Motion. 519 00:31:41,960 --> 00:31:44,950 And Choongbum will be going in some detail 520 00:31:44,950 --> 00:31:49,360 about stochastic differential equations, 521 00:31:49,360 --> 00:31:52,300 and stochastic calculus for representing 522 00:31:52,300 --> 00:31:55,590 different processes, continuous processes. 523 00:31:55,590 --> 00:32:00,910 And the formulation is basically looking 524 00:32:00,910 --> 00:32:08,470 at increments of the price process S is equal to basically 525 00:32:08,470 --> 00:32:14,910 a mu S of t, sort of a drift term, plus a sigma S of t, 526 00:32:14,910 --> 00:32:18,930 a multiple of d W of t, where sigma 527 00:32:18,930 --> 00:32:21,130 is the volatility of the security price, 528 00:32:21,130 --> 00:32:25,810 mu is the mean return per unit time, d W of t 529 00:32:25,810 --> 00:32:29,830 is the increment of a standard Brownian motion processor, 530 00:32:29,830 --> 00:32:31,350 Wiener process. 531 00:32:31,350 --> 00:32:38,210 And this W process is such that it's increments, 532 00:32:38,210 --> 00:32:42,160 basically the change in value of the process between two time 533 00:32:42,160 --> 00:32:46,410 points is normally distributed, with mean 0 534 00:32:46,410 --> 00:32:51,720 and variance equal to the length of the interval. 535 00:32:54,354 --> 00:32:56,770 And increments on disjoint time intervals are independent. 536 00:33:01,690 --> 00:33:10,810 And well, if you divide both sides 537 00:33:10,810 --> 00:33:16,535 of that equation by S of t then you have d S of t over S of t 538 00:33:16,535 --> 00:33:20,120 is equal to mu dt plus sigma d W of t. 539 00:33:20,120 --> 00:33:25,495 And so the increments d S of t normalized by S of t 540 00:33:25,495 --> 00:33:29,600 are a standard Brownian motion with drift mu and volatility 541 00:33:29,600 --> 00:33:30,100 sigma. 542 00:33:36,200 --> 00:33:44,570 Now with sample data from this process, 543 00:33:44,570 --> 00:33:46,890 now suppose we have prices observed 544 00:33:46,890 --> 00:33:50,820 at times t_0 up to t_n. 545 00:33:50,820 --> 00:33:53,960 And for now we're not going to make any assumptions 546 00:33:53,960 --> 00:33:57,950 about what those time increments are, what those times are. 547 00:33:57,950 --> 00:33:59,724 They could be equally spaced. 
548 00:33:59,724 --> 00:34:01,015 They could be unequally spaced. 549 00:34:03,550 --> 00:34:10,420 The returns, the log of the relative price change from time 550 00:34:10,420 --> 00:34:15,880 t_(j-1) to t_j are independent random variables. 551 00:34:15,880 --> 00:34:19,610 And they are independent. 552 00:34:19,610 --> 00:34:21,800 Their distribution is normally distributed 553 00:34:21,800 --> 00:34:27,330 with mean given by mu times the length of the time increment, 554 00:34:27,330 --> 00:34:31,580 and variance sigma squared times the length of the increment. 555 00:34:31,580 --> 00:34:35,909 And these properties will be covered by Choongbum 556 00:34:35,909 --> 00:34:38,139 in some later lectures. 557 00:34:38,139 --> 00:34:41,750 So for now what we can just know that this is true 558 00:34:41,750 --> 00:34:46,420 and apply this result. If we fix various time 559 00:34:46,420 --> 00:34:49,130 points for the observation and compute returns this way. 560 00:34:49,130 --> 00:34:51,260 If it's a Geometric Brownian Motion 561 00:34:51,260 --> 00:34:55,610 we know that this is the distribution of the returns. 562 00:34:55,610 --> 00:34:58,190 Now knowing that distribution we can now 563 00:34:58,190 --> 00:35:01,620 engage in maximum likelihood estimation. 564 00:35:01,620 --> 00:35:02,120 OK. 565 00:35:02,120 --> 00:35:06,030 If the increments are all just equal to 1, 566 00:35:06,030 --> 00:35:09,140 so we're thinking of daily data, say. 567 00:35:09,140 --> 00:35:13,600 Then the maximum likelihood estimates are simple. 568 00:35:13,600 --> 00:35:17,570 It's basically the sample mean and the sample variance with 1 569 00:35:17,570 --> 00:35:20,340 over n instead of 1 over n minus 1 in the MLE's. 570 00:35:20,340 --> 00:35:26,520 If delta_j varies then, well, that's 571 00:35:26,520 --> 00:35:30,810 actually a case in the exercises. 572 00:35:30,810 --> 00:35:39,100 Now does anyone, in terms of, well, 573 00:35:39,100 --> 00:35:46,730 in the class exercise the issue that is important to think 574 00:35:46,730 --> 00:35:53,640 about is if you consider a given interval of time over which 575 00:35:53,640 --> 00:35:57,660 we're observing this Geometric Brownian Motion process, 576 00:35:57,660 --> 00:36:03,440 if we increase the sampling rate of prices over a given 577 00:36:03,440 --> 00:36:06,990 interval, how does that change the properties 578 00:36:06,990 --> 00:36:09,400 of our estimates? 579 00:36:09,400 --> 00:36:11,840 Basically, do we obtain more accurate estimates 580 00:36:11,840 --> 00:36:14,450 of the underlying parameters? 581 00:36:14,450 --> 00:36:19,420 And as you increase the sampling frequency, 582 00:36:19,420 --> 00:36:21,830 it turns out that some parameters are estimated much, 583 00:36:21,830 --> 00:36:26,190 much better and you get basically much 584 00:36:26,190 --> 00:36:28,730 lower standard errors on those estimates. 585 00:36:28,730 --> 00:36:31,900 With other parameters you don't necessarily. 586 00:36:31,900 --> 00:36:35,140 And the exercise is to evaluate that. 587 00:36:35,140 --> 00:36:37,350 Now another issue that's important 588 00:36:37,350 --> 00:36:42,550 is the issue of sort of what is the appropriate time scale 589 00:36:42,550 --> 00:36:46,910 for Geometric Brownian Motion. 590 00:36:46,910 --> 00:36:48,750 Right now we're thinking of, you collect 591 00:36:48,750 --> 00:36:52,055 data, whatever the periodicity is of the data 592 00:36:52,055 --> 00:36:54,430 is you think that's your period for your Brownian Motion. 
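A sketch of the maximum likelihood estimates in the equally spaced case, taking delta equal to one trading day and 252 trading days per year; the true mu and sigma used to simulate the returns are arbitrary.

    # Under GBM with unit spacing, the log returns are i.i.d. N(mu, sigma^2)
    set.seed(1)
    mu <- 0.0004; sigma <- 0.006; n <- 2500
    r  <- rnorm(n, mean = mu, sd = sigma)    # simulated daily log returns

    mu.hat     <- mean(r)                    # MLE of mu (per day)
    sigma2.hat <- mean((r - mu.hat)^2)       # MLE of sigma^2 uses 1/n, not 1/(n-1)

    c(mu.annual  = 252 * mu.hat,             # annualized drift estimate
      vol.annual = sqrt(252 * sigma2.hat))   # annualized volatility estimate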
593 00:36:54,430 --> 00:36:56,630 Let's evaluate that. 594 00:36:56,630 --> 00:37:01,655 Let me go to another example. 595 00:37:08,200 --> 00:37:09,060 Let's see here. 596 00:37:13,515 --> 00:37:15,350 Yep. 597 00:37:15,350 --> 00:37:15,850 OK. 598 00:37:15,850 --> 00:37:17,360 Let's go control-minus here. 599 00:37:24,830 --> 00:37:25,424 OK. 600 00:37:25,424 --> 00:37:25,924 All right. 601 00:37:31,026 --> 00:37:32,060 Let's see. 602 00:37:32,060 --> 00:37:33,640 With this second case study there 603 00:37:33,640 --> 00:37:41,200 was data on exchange rates, looking for regime changes 604 00:37:41,200 --> 00:37:43,920 in exchange rate relationships. 605 00:37:43,920 --> 00:37:46,560 And so we have data from that case study 606 00:37:46,560 --> 00:37:49,880 on different foreign exchange rates. 607 00:37:49,880 --> 00:37:57,390 And here in the top panel I've graphed the euro/dollar 608 00:37:57,390 --> 00:38:01,460 exchange rate from the beginning of 1999 609 00:38:01,460 --> 00:38:05,370 through just a few months ago. 610 00:38:05,370 --> 00:38:12,830 And the second panel is a plot of the daily returns 611 00:38:12,830 --> 00:38:14,730 for that series. 612 00:38:14,730 --> 00:38:21,860 And here is a histogram of those daily returns. 613 00:38:21,860 --> 00:38:28,990 And a fit of the Gaussian distribution for the daily 614 00:38:28,990 --> 00:38:33,270 returns if our sort of time scale is correct. 615 00:38:33,270 --> 00:38:37,350 Basically daily returns are normally distributed. 616 00:38:37,350 --> 00:38:41,630 Days are disjoint in terms of the price change. 617 00:38:41,630 --> 00:38:45,160 And so they're independent and identically distributed 618 00:38:45,160 --> 00:38:46,960 under the model. 619 00:38:46,960 --> 00:38:49,480 And they all have the same normal distribution 620 00:38:49,480 --> 00:38:52,330 with mean mu and variance sigma squared. 621 00:38:55,220 --> 00:38:55,870 OK. 622 00:38:55,870 --> 00:39:00,000 This analysis assumes basically that we're 623 00:39:00,000 --> 00:39:03,340 dealing with trading days for the appropriate time scale, 624 00:39:03,340 --> 00:39:04,630 the Geometric Brownian Motion. 625 00:39:09,640 --> 00:39:10,440 Let's see. 626 00:39:10,440 --> 00:39:15,300 One can ask, well, what if trading dates really 627 00:39:15,300 --> 00:39:19,240 isn't the right time scale, but it's more calendar time. 628 00:39:19,240 --> 00:39:22,060 The change in value over the weekends 629 00:39:22,060 --> 00:39:26,050 maybe correspond to price changes, or value changes 630 00:39:26,050 --> 00:39:28,150 over a longer period of time. 631 00:39:28,150 --> 00:39:30,980 And so this model really needs to be 632 00:39:30,980 --> 00:39:35,270 adjusted for that time scale. 633 00:39:35,270 --> 00:39:41,190 The exercise that allows you to consider 634 00:39:41,190 --> 00:39:45,660 different delta t's shows you what the maximum likelihood 635 00:39:45,660 --> 00:39:47,429 estimates-- you'll be deriving maximum 636 00:39:47,429 --> 00:39:49,470 likely estimates if we have different definitions 637 00:39:49,470 --> 00:39:52,180 of time scale there. 638 00:39:52,180 --> 00:40:02,992 But if you apply the calendar time scale to this euro, 639 00:40:02,992 --> 00:40:05,200 let me just show you what the different estimates are 640 00:40:05,200 --> 00:40:09,590 of the annualized mean return and the annualized volatility. 641 00:40:09,590 --> 00:40:16,030 So if we consider trading days for euro it's 10.25% or 0.1025. 
642 00:40:16,030 --> 00:40:22,390 If you consider clock time, it actually turns out to be 12.2%. 643 00:40:22,390 --> 00:40:25,070 So depending on how you specify the model 644 00:40:25,070 --> 00:40:28,640 you get a different definition of volatility here. 645 00:40:28,640 --> 00:40:36,170 And it's important to basically understand 646 00:40:36,170 --> 00:40:40,650 sort of what the assumptions are of your model 647 00:40:40,650 --> 00:40:47,480 and whether perhaps things ought to be different. 648 00:40:47,480 --> 00:40:53,700 In stochastic modeling, there's an area 649 00:40:53,700 --> 00:40:57,030 called subordinated stochastic processes. 650 00:40:57,030 --> 00:41:04,220 And basically the idea is, if you have a stochastic process 651 00:41:04,220 --> 00:41:08,770 like Geometric Brownian Motion of simple Brownian motion, 652 00:41:08,770 --> 00:41:14,005 maybe you're observing that on the wrong time scale. 653 00:41:14,005 --> 00:41:15,963 You may fit the Geometric Brownian Motion model 654 00:41:15,963 --> 00:41:17,560 and it doesn't look right. 655 00:41:17,560 --> 00:41:19,740 But it could be that there's a different time 656 00:41:19,740 --> 00:41:21,180 scale that's appropriate. 657 00:41:21,180 --> 00:41:24,990 And it's really Brownian motion on that time scale. 658 00:41:24,990 --> 00:41:29,830 And so formally it's called a subordinated stochastic 659 00:41:29,830 --> 00:41:30,330 process. 660 00:41:30,330 --> 00:41:32,160 You have a different time function 661 00:41:32,160 --> 00:41:35,970 for how to model the stochastic process. 662 00:41:35,970 --> 00:41:40,530 And the evaluation of subordinated stochastic 663 00:41:40,530 --> 00:41:43,750 processes leads to consideration of different time scales. 664 00:41:43,750 --> 00:41:48,320 With, say, equity markets, and futures markets, 665 00:41:48,320 --> 00:41:50,987 sort of the volume of trading, sort of cumulative volume 666 00:41:50,987 --> 00:41:53,070 of training might be really an appropriate measure 667 00:41:53,070 --> 00:41:54,880 of the real time scale. 668 00:41:54,880 --> 00:41:56,820 Because that's a measure of, in a sense, 669 00:41:56,820 --> 00:41:59,000 information flow coming into the market 670 00:41:59,000 --> 00:42:01,870 through the level of activity. 671 00:42:01,870 --> 00:42:06,720 So anyway I wanted to highlight how with different time scales 672 00:42:06,720 --> 00:42:08,320 you can get different results. 673 00:42:08,320 --> 00:42:11,660 And so that's something to be evaluated. 674 00:42:11,660 --> 00:42:13,620 In looking at these different models, 675 00:42:13,620 --> 00:42:15,420 OK, these first few graphs here show 676 00:42:15,420 --> 00:42:18,880 the fit of the normal model with the trading day time scale. 677 00:42:22,400 --> 00:42:22,966 Let's see. 678 00:42:22,966 --> 00:42:25,340 Those of you who've ever taken a statistics class before, 679 00:42:25,340 --> 00:42:29,780 or an applied statistics, may know about normal q-q plots. 680 00:42:29,780 --> 00:42:33,830 Basically if you want to evaluate 681 00:42:33,830 --> 00:42:37,960 the consistency of the returns here 682 00:42:37,960 --> 00:42:41,620 with a Gaussian distribution, what we can do 683 00:42:41,620 --> 00:42:49,410 is plot the observed ordered, sorted returns 684 00:42:49,410 --> 00:42:52,790 against what we would expect the sorted returns 685 00:42:52,790 --> 00:42:56,200 to be if it were from a Gaussian sample. 
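Such a normal q-q plot takes a couple of lines in R; here r is a vector of daily log returns as in the earlier sketches, standardized so that agreement with the Gaussian fit shows up as points along the reference line.

    z <- (r - mean(r)) / sd(r)    # standardized daily returns
    qqnorm(z)                     # sample quantiles vs. theoretical N(0,1) quantiles
    qqline(z)                     # reference line through the quartiles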
686 00:42:56,200 --> 00:42:58,940 So under the Geometric Brownian Motion model 687 00:42:58,940 --> 00:43:04,530 the daily returns are a sample, independent and identically 688 00:43:04,530 --> 00:43:06,740 distributed random variable sampled from a Gaussian 689 00:43:06,740 --> 00:43:07,750 distribution. 690 00:43:07,750 --> 00:43:11,510 So the smallest return should be consistent with the smallest 691 00:43:11,510 --> 00:43:14,080 of the sample size n. 692 00:43:14,080 --> 00:43:18,870 And what's being plotted here is the theoretical quantiles 693 00:43:18,870 --> 00:43:21,980 or percentiles versus the actual ones. 694 00:43:21,980 --> 00:43:24,930 And one would expect that to lie along a straight line 695 00:43:24,930 --> 00:43:30,100 if the theoretical quantiles were well-predicting 696 00:43:30,100 --> 00:43:32,670 the actual extreme values. 697 00:43:32,670 --> 00:43:37,760 What we see here is that as the theoretical quantiles get high, 698 00:43:37,760 --> 00:43:40,990 and it's in units of standard deviation units, 699 00:43:40,990 --> 00:43:45,080 the realized sample returns are in fact 700 00:43:45,080 --> 00:43:47,540 much higher than would be predicted by the Gaussian 701 00:43:47,540 --> 00:43:49,210 distribution. 702 00:43:49,210 --> 00:43:52,400 And similarly, on the low end side. 703 00:43:52,400 --> 00:43:54,550 So there's a normal q-q plot that's 704 00:43:54,550 --> 00:43:57,800 used often in the diagnostics of these models. 705 00:43:57,800 --> 00:44:04,910 Then down here I've actually plotted a fitted percentile 706 00:44:04,910 --> 00:44:06,160 distribution. 707 00:44:06,160 --> 00:44:12,470 Now what's been done here is if we modeled the series 708 00:44:12,470 --> 00:44:16,790 as a series of Gaussian random variables 709 00:44:16,790 --> 00:44:24,960 then we can evaluate the percentile 710 00:44:24,960 --> 00:44:27,130 of the fitted Gaussian distribution that 711 00:44:27,130 --> 00:44:29,430 was realized by every point. 712 00:44:29,430 --> 00:44:38,780 So if we have a return of say negative 2%, what percentile 713 00:44:38,780 --> 00:44:40,540 is the normal fit of that? 714 00:44:45,720 --> 00:44:50,410 And you can evaluate the cumulative distribution 715 00:44:50,410 --> 00:44:54,750 function of the fitted model at that value to get that point. 716 00:44:54,750 --> 00:44:59,100 And what should the distribution of percentiles 717 00:44:59,100 --> 00:45:04,410 be for fitted percentiles if we have a really good model? 718 00:45:04,410 --> 00:45:07,370 OK. 719 00:45:07,370 --> 00:45:08,180 Well, OK. 720 00:45:08,180 --> 00:45:09,860 Let's think. 721 00:45:09,860 --> 00:45:14,890 If you consider the 50th percentile you would expect, 722 00:45:14,890 --> 00:45:18,800 I guess, 50% of the data to lie above the 50th percentile 723 00:45:18,800 --> 00:45:21,930 and 50% to lie below the 50th percentile, right? 724 00:45:21,930 --> 00:45:22,530 OK. 725 00:45:22,530 --> 00:45:24,160 Let's consider, here I divided up 726 00:45:24,160 --> 00:45:27,840 into 100 bins between zero and one 727 00:45:27,840 --> 00:45:31,955 so this bin is the 99th percentile. 728 00:45:38,630 --> 00:45:40,460 How many observations would you expect 729 00:45:40,460 --> 00:45:45,590 to find in between the 99th and 100 percentile? 730 00:45:49,800 --> 00:45:51,170 This is an easy question. 731 00:45:51,170 --> 00:45:52,150 AUDIENCE: 1%. 732 00:45:52,150 --> 00:45:53,070 PROFESSOR: 1%. 733 00:45:53,070 --> 00:45:53,790 Right. 
734 00:45:53,790 --> 00:45:55,290 And so in any of these bins we would 735 00:45:55,290 --> 00:46:01,450 expect to see 1% if the Gaussian model were fitting. 736 00:46:01,450 --> 00:46:06,690 And what we see is that, well, at the extremes 737 00:46:06,690 --> 00:46:08,600 they're more extreme values. 738 00:46:08,600 --> 00:46:13,720 And actually inside there are some fewer values. 739 00:46:13,720 --> 00:46:17,660 And actually this is exhibiting a leptokurtic distribution 740 00:46:17,660 --> 00:46:20,070 for the actually realized samples; 741 00:46:20,070 --> 00:46:22,080 basically the middle of the distribution 742 00:46:22,080 --> 00:46:24,280 is a little thinner and it's compensated 743 00:46:24,280 --> 00:46:26,660 for by fatter tails. 744 00:46:26,660 --> 00:46:29,440 But with this particular model we 745 00:46:29,440 --> 00:46:33,900 can basically expect to see a uniform distribution 746 00:46:33,900 --> 00:46:39,690 of percentiles in this graph. 747 00:46:39,690 --> 00:46:46,990 If we compare this with a fit of the clock time 748 00:46:46,990 --> 00:46:51,770 we actually see that clock time does 749 00:46:51,770 --> 00:46:59,490 a bit of a better job at getting the extreme values closer 750 00:46:59,490 --> 00:47:01,110 to what we would expect them to be. 751 00:47:01,110 --> 00:47:07,506 So in terms of being a better model for the returns process, 752 00:47:07,506 --> 00:47:09,380 if we're concerned with these extreme values, 753 00:47:09,380 --> 00:47:12,320 we're actually getting a slightly better value 754 00:47:12,320 --> 00:47:13,720 with those. 755 00:47:13,720 --> 00:47:16,590 So all right. 756 00:47:16,590 --> 00:47:20,890 Let's move on back to the notes. 757 00:47:20,890 --> 00:47:28,410 And talk about the Garman-Klass Estimator. 758 00:47:28,410 --> 00:47:30,905 So let me do this. 759 00:47:34,625 --> 00:47:36,080 All right. 760 00:47:36,080 --> 00:47:37,123 View full screen. 761 00:47:43,040 --> 00:47:45,334 OK. 762 00:47:45,334 --> 00:47:46,120 All right. 763 00:47:46,120 --> 00:47:48,090 So, OK. 764 00:47:48,090 --> 00:47:50,990 The Garman-Klass Estimator is one 765 00:47:50,990 --> 00:47:55,410 where we consider the situation where we actually 766 00:47:55,410 --> 00:47:59,410 have much more information than simply sort of closing 767 00:47:59,410 --> 00:48:01,980 prices at different intervals. 768 00:48:01,980 --> 00:48:05,374 Basically all transaction data's collected 769 00:48:05,374 --> 00:48:06,290 in a financial market. 770 00:48:06,290 --> 00:48:08,150 So really we have virtually all of the data 771 00:48:08,150 --> 00:48:11,280 available if we want it, or can pay for it. 772 00:48:11,280 --> 00:48:14,270 But let's consider a case where we 773 00:48:14,270 --> 00:48:18,300 expand upon just having closing prices to having 774 00:48:18,300 --> 00:48:22,190 additional information over increments of time that 775 00:48:22,190 --> 00:48:27,230 include the open, high, and low price 776 00:48:27,230 --> 00:48:28,355 over the different periods. 777 00:48:33,500 --> 00:48:35,940 So those of you who are familiar with bar data 778 00:48:35,940 --> 00:48:41,000 graphs that you see whenever you plot stock prices over periods 779 00:48:41,000 --> 00:48:46,870 of weeks or months you'll be familiar with having 780 00:48:46,870 --> 00:48:48,610 seen those. 
Now the Garman-Klass paper addressed how we can exploit this additional information to improve upon close-to-close estimates. So let's set up some assumptions and notation. We'll assume that mu equals 0 in our Geometric Brownian Motion model, so we don't have to worry about the mean; we're just concerned with volatility. We'll take the time increments to be one, corresponding to daily data. And we'll let little f, between zero and one, be the fraction of the day at which the market opens.

So over a day, from day zero to day one, we assume the market opens at time f. The Geometric Brownian Motion process might have closed on day zero here, so this would be C_0, and it may have opened on day one at this value, which would be O_1. It might have gone up and down and then closed here. This value would correspond to the high on day one, this value to the low on day one, and the closing value here would be C_1. So the model is that the underlying Brownian Motion process is working in continuous time, but we only observe it while the market is open. It can move between the close and the open on any given day, and we have the additional information: instead of just the close, we also have the high and the low. So let's look at how we might exploit that information to estimate volatility.

Using data from the first period, as graphed here, let's first highlight the close-to-close return, which gives an estimate of the one-period variance. So sigma hat 0 squared is the single-period squared return, (C_1 minus C_0) squared. C_1 minus C_0 is normally distributed with mean 0 and variance sigma squared. And if we square that, what's the distribution?
That's the square of a normal random variable, which is chi-squared, or rather a multiple of a chi-squared: it's sigma squared times a chi-squared random variable with one degree of freedom. A chi-squared random variable with one degree of freedom has expected value 1 and variance 2. Knowing those facts tells us that we have an unbiased estimate of the variance parameter sigma squared, and the variance of that estimate is 2 sigma to the fourth. So that's the precision of the close-to-close estimate.

Let's look at two other estimates, built from the close-to-open and open-to-close returns, each squared and normalized by the length of its interval. I'll just write down a few facts and then you can see that the results are clear. O_1 minus C_0 is distributed normal with mean 0 and variance f sigma squared, and C_1 minus O_1 is distributed normal with mean 0 and variance (1 minus f) sigma squared. This simply uses the properties of the diffusion process over different lengths of time. So if we normalize the squared values by the lengths of the intervals, taking sigma hat 1 squared equal to (O_1 minus C_0) squared over f and sigma hat 2 squared equal to (C_1 minus O_1) squared over (1 minus f), we get estimates of the variance. What's particularly significant about estimates one and two is that they're independent. So we have two independent estimates of the same underlying parameter, and they have the same mean and the same variance. If we consider a new estimate that averages the two, then the new estimate is unbiased as well, but its variance is the variance of that weighted sum: 1/2 squared times this variance plus 1/2 squared times this variance, which is half the variance of each of them. So this estimate has lower variance than our close-to-close estimate.
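(A quick simulation sketch of the comparison just described, with made-up parameter values and the true sigma set to 1; the variable names are illustrative.)

```python
# Compare the close-to-close variance estimate with the average of the two
# independent pieces (O1 - C0)^2 / f and (C1 - O1)^2 / (1 - f).
import numpy as np

rng = np.random.default_rng(1)
sigma, f, n_days = 1.0, 0.3, 200_000

overnight = rng.normal(0.0, np.sqrt(f) * sigma, n_days)        # O1 - C0
intraday  = rng.normal(0.0, np.sqrt(1 - f) * sigma, n_days)    # C1 - O1
close_to_close = overnight + intraday                          # C1 - C0

est0 = close_to_close**2                  # variance of this estimate: 2 * sigma^4
est1 = overnight**2 / f                   # variance 2 * sigma^4
est2 = intraday**2 / (1 - f)              # variance 2 * sigma^4
est_avg = 0.5 * (est1 + est2)             # variance sigma^4 (independence)

print(est0.mean(), est_avg.mean())        # both near sigma^2 = 1
print(est0.var() / est_avg.var())         # efficiency ratio, near 2
```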
And we can define the efficiency of this particular estimate relative to the close-to-close estimate as 2; basically we get double the precision. Suppose you had the open, high, and close for one day. How many days of close-to-close data would you need to get the same variance as this estimate?

AUDIENCE: [INAUDIBLE]. Because of the three data points [INAUDIBLE].

PROFESSOR: No. Anyone else? One more guess? Four? OK, let's see. The ratio of the variances is two, and the variance of an average of n close-to-close estimates scales like 1/n, so it actually is two. I was thinking in standard deviation units instead of squared units, so I was trying to be clever there, but it is basically two days. Sampling with this information gives you as much as two days' worth of information. So what does that mean? Well, if you want something that's as efficient as the daily estimates, you'll only need to look back one day instead of two days to get the same efficiency with the estimate.

All right. The motivation for the Garman-Klass paper was actually a paper written by Parkinson in 1976, which dealt with using the extremes of a Brownian Motion to estimate the underlying parameters. When Choongbum talks about Brownian Motion a bit later, I don't know if you'll derive this result, but in courses on stochastic processes one does derive properties of the maximum and minimum of a Brownian Motion over a given interval. It turns out that the squared difference between the high and the low, divided by 4 log 2, is an estimate of the variance of the process. And the efficiency of this estimate turns out to be 5.2, which is better yet.
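(A minimal sketch of the Parkinson range estimator just mentioned, checked by simulation against the close-to-close estimate; the discretization of the path introduces a small bias, and all names and parameter values are illustrative.)

```python
# Parkinson (1976) range estimator: (log H - log L)^2 / (4 * log 2).
import numpy as np

rng = np.random.default_rng(2)
sigma, n_steps, n_days = 1.0, 2_000, 5_000

# Simulate n_days independent one-day log-price paths starting at 0.
increments = rng.normal(0.0, sigma / np.sqrt(n_steps), (n_days, n_steps))
paths = np.hstack([np.zeros((n_days, 1)), increments.cumsum(axis=1)])

day_range = paths.max(axis=1) - paths.min(axis=1)   # high minus low of the log path
parkinson = day_range**2 / (4 * np.log(2))          # range-based variance estimate
close_close = paths[:, -1]**2                       # squared full-day return

print(parkinson.mean(), close_close.mean())         # both near sigma^2 = 1
print(close_close.var() / parkinson.var())          # roughly 5 (continuous-time figure: 5.2)
```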
Well, Garman and Klass were excited by that and wanted to find even better ones. So they wrote a paper that evaluated all different kinds of estimates, and I encourage you to Google that paper and read it, because it's very accessible and it highlights the statistical and probability issues associated with these problems. What they did was derive the best analytic scale-invariant estimator, which is a somewhat bizarre-looking combination of terms, but it essentially uses the high, low, and close normalized by the open. They were able to get an efficiency of 7.4 with that combination.

Now, about scale-invariant estimates: in statistical theory there are different principles that guide the development of methodologies, and one of them is scale invariance. If you're estimating a scale parameter, and volatility is essentially telling you how large the variability of the process is, then if you multiply all of your original data by a given constant, a scale-invariant estimator should change only by that same scale factor. The estimator doesn't depend on how you scale the data. That's the notion of scale invariance. The Garman-Klass paper actually goes further and finds a particular estimator with an efficiency of 8.4, which is a very substantial gain. So if you're working with a modeling process where the underlying parameters can reasonably be assumed constant over short periods of time, then over those short periods these extended estimators give you much more precise measures of the underlying parameters than simple close-to-close data.

All right. Let's introduce Poisson Jump Diffusions.
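(Before the jump-diffusion discussion, a brief aside: the Garman-Klass combination alluded to above is not written out in the lecture. The form commonly quoted in the literature is sigma hat squared = 0.5 (log(H/L))^2 minus (2 log 2 minus 1) (log(C/O))^2, and the sketch below assumes that textbook form; the function name and prices are made up for illustration.)

```python
# Hedged sketch of the commonly quoted Garman-Klass variance estimator,
# built from high, low, and close normalized by the open.
import numpy as np

def garman_klass(open_, high, low, close):
    """Per-period variance estimates from OHLC price arrays."""
    hl = np.log(high / low)
    co = np.log(close / open_)
    return 0.5 * hl**2 - (2.0 * np.log(2.0) - 1.0) * co**2

# Hypothetical one-day example with made-up prices.
print(garman_klass(np.array([100.0]), np.array([101.5]),
                   np.array([99.2]), np.array([100.7])))
```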
With Poisson Jump Diffusions we have a stochastic differential equation representing the model, and it's just like the Geometric Brownian Motion model except for an additional term, gamma sigma Z d pi of t. That's a lot of different symbols, but essentially the idea is this. A Brownian Motion process is fully continuous over time; there are no jumps in it. To allow for jumps, we assume there is some process pi of t, which is a Poisson process. It's a counting process that counts when jumps occur and how many have occurred. It might start at the value 0; if there's a jump here it goes up by one, and if there's another jump here it goes up by one again, and so forth.

So the Poisson Jump Diffusion model says that the diffusion process is going to experience shocks, and those shocks arrive according to a Poisson process. If you've taken stochastic modeling you know that that's essentially a purely random process: shocks arrive with exponentially distributed interarrival times, and you can't predict them. When a shock occurs, d pi of t changes by a unit increment, so d pi of t is 1, and we then realize gamma sigma Z of t. So at this point we'd have a shock, gamma sigma Z_1; at this point, maybe a negative shock, gamma sigma Z_2; elsewhere it's 0. With this overall process we have shifts in the diffusion, up or down, according to these values. So the model allows the arrival times of the shocks to be random according to the Poisson process, and the magnitudes of the shocks to be random as well. And like the Geometric Brownian Motion model, this process has independent increments, which helps with the estimation.
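(A minimal simulation sketch of the jump-diffusion idea just described. The parameter names, the yearly jump intensity, and the GBM-style log-price drift are assumptions for illustration, not values from the lecture.)

```python
# Per small step dt the log price gets a Gaussian diffusion increment plus,
# for each Poisson-arriving jump, an extra gamma * sigma * Z shock.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, lam, gamma = 0.05, 0.20, 10.0, 2.0       # lam: assumed jumps per year
dt, n_steps = 1.0 / 252, 252

diffusion = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.normal(size=n_steps)
n_jumps = rng.poisson(lam * dt, size=n_steps)        # jump counts per step (d pi)
jumps = np.array([gamma * sigma * rng.normal(size=k).sum() for k in n_jumps])

log_price = np.cumsum(diffusion + jumps)             # log-price path with jumps
print(log_price[-1], n_jumps.sum())
```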
One could estimate this model by maximum likelihood, but it gets tricky, because over any increment of time the change in the process is the diffusion increment plus the sum of the jumps that occurred over that same increment. So the model is ultimately a Poisson mixture of Gaussian distributions. To evaluate the model's properties, moment generating functions can be computed rather directly, so one can understand how the moments of the process vary with the different model parameters. The likelihood function is a product of Poisson sums, and there's a closed form for the EM algorithm, which can be used to implement the estimation of the unknown parameters.

If you think about observing a Poisson Jump Diffusion process, then if you knew where the jumps occurred and how many there were per increment in your data, the maximum likelihood estimation would be very simple, because the estimation of the Gaussian parameters would separate from the estimation of the Poisson parameters. When you haven't observed those values, you need methods appropriate for missing data. The EM algorithm is a very famous algorithm developed by the people up at Harvard, Rubin, Laird, and Dempster. The idea is that if the problem would be much simpler were certain unobserved variables observed, then you expand the problem to include your observed data plus the missing data, in this case where the jumps have occurred. You then take conditional expectations to estimate those jump quantities, and then, treating the jumps as having occurred with those frequencies, you estimate the underlying parameters, iterating between the two steps. So the EM algorithm is very powerful and has extensive applications in all kinds of different models.
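(A hedged sketch of the "Poisson mixture of Gaussians" density for one increment, conditioning on the unobserved number of jumps k and truncating the Poisson sum. The parameterization, with jump mean zero and extra variance (gamma sigma)^2 per jump, is an assumption consistent with the gamma sigma Z description above; the full likelihood would be the product of this density over increments.)

```python
import numpy as np
from scipy import stats

def jump_diffusion_density(y, dt, mu, sigma, lam, gamma, k_max=20):
    """Mixture density of one increment y over time dt, truncated at k_max jumps."""
    ks = np.arange(k_max + 1)
    weights = stats.poisson.pmf(ks, lam * dt)              # P(k jumps in the increment)
    means = mu * dt * np.ones_like(ks, dtype=float)        # jumps assumed mean-zero
    variances = sigma**2 * dt + ks * (gamma * sigma)**2    # extra variance per jump
    # sum_k P(k) * Normal(y; mean_k, var_k)
    comps = stats.norm.pdf(np.asarray(y)[..., None], means, np.sqrt(variances))
    return comps @ weights

print(jump_diffusion_density(np.array([0.0, 0.05]), 1 / 252, 0.05, 0.2, 10.0, 2.0))
```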
I'll put up on the website a paper that I wrote with David Pickard and his student Arshad Zakaria, which goes through the maximum likelihood methodology for this. Looking at that, you can see how maximum likelihood gets implemented for an extended model, and I think that's useful to see.

All right. Let's turn next to ARCH models. Just as a bit of motivation, the Geometric Brownian Motion model and the Poisson Jump Diffusion model both assume that volatility is essentially stationary over time; with the independent increments of those processes, the volatility over different increments is essentially the same. The ARCH models were introduced to accommodate the possibility of time dependence in volatility. At the very end I'll go through an example showing that time dependence with our euro/dollar exchange rates.

The setup for this model is that we look at the log of the price relatives, y_t, and we model the residuals not as having constant volatility, but as sigma_t times white noise with mean 0 and variance 1, where sigma_t squared is given by the ARCH function, which says that the variance at a given period t is a weighted sum of the squared residuals over the last p lags. So if there's a large residual, that can persist and make the next observation have a large variance, and this accommodates some time dependence. Now, this model has parameter constraints, which are never a nice thing to have when you're fitting models. In this case the parameters alpha_1 through alpha_p all have to be positive. Why do they have to be positive?

AUDIENCE: [INAUDIBLE].

PROFESSOR: Right. The variance has to be positive.
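(A minimal sketch of the ARCH recursion just described, simulated for an ARCH(1); the parameter values are made up. A negative alpha could push the conditional variance negative, which is the constraint being discussed.)

```python
# sigma_t^2 = alpha_0 + alpha_1 * eps_{t-1}^2, with eps_t = sigma_t * z_t.
import numpy as np

rng = np.random.default_rng(4)
alpha0, alpha1, n = 1e-5, 0.5, 1000

eps = np.zeros(n)
sigma2 = np.full(n, alpha0 / (1 - alpha1))        # start at the unconditional variance
for t in range(1, n):
    sigma2[t] = alpha0 + alpha1 * eps[t - 1]**2   # conditional variance
    eps[t] = np.sqrt(sigma2[t]) * rng.normal()    # residual = sigma_t * z_t

print(eps.var(), alpha0 / (1 - alpha1))           # sample vs. unconditional variance
```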
If any of those alphas were negative, then under this model there would be a possibility of negative variance, which you can't have. So when we estimate this model we estimate it with the constraint that all of these parameter values are non-negative, and that does complicate the estimation a bit.

In terms of understanding how this process works, one can see that the ARCH model implies an autoregressive model for the squared residuals, which turns out to be useful. The top line here is the ARCH model, saying that the variance of the period-t return is this weighted sum of the past squared residuals. If we then add a new variable u_t, which is the squared residual minus its variance, to both sides, we get the next line, which says that epsilon_t squared follows an autoregression on itself, with u_t being the disturbance in that autoregression. Now u_t, which is epsilon_t squared minus sigma_t squared, what is its mean? The mean is 0, so it's almost white noise, but its variance may change over time. So it's not standard white noise, though it has expectation 0 and is conditionally independent; there's just some extra variability. What this implies is that we essentially have an autoregressive model with time-varying variances in the underlying disturbances.

Because of that, one can quickly evaluate whether there's ARCH structure in data by simply fitting an autoregressive model to the squared residuals and testing whether that regression is significant or not. Formally, that is a Lagrange multiplier test; some of the original papers by Engle go through that analysis. The test statistic turns out to be a multiple of the R squared from that regression fit. Under a null hypothesis of no ARCH structure, this regression should have no predictability.
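(A minimal sketch of the regression-based check just described: regress the squared residuals on p of their own lags and form n times R squared; its chi-square calibration is discussed next. Plain least squares via numpy; the Gaussian stand-in residuals have no ARCH effects, so the statistic should be small.)

```python
import numpy as np

def arch_lm_stat(eps, p=5):
    """n * R^2 from regressing eps_t^2 on its first p lags (with an intercept)."""
    e2 = eps**2
    y = e2[p:]
    X = np.column_stack([np.ones(len(y))] + [e2[p - i:-i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return len(y) * r2                       # compare to chi-square(p) quantiles

eps = np.random.default_rng(5).normal(size=1000)   # stand-in residuals, no ARCH
print(arch_lm_stat(eps, p=5))
```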
If there's no time dependence in those squared residuals, that's evidence of an absence of ARCH structure, and so under the null hypothesis of no ARCH structure the R squared statistic should be small. It turns out that n times the R squared statistic from the regression on p lags is asymptotically chi-squared distributed with p degrees of freedom, and that's where the test statistic comes into play. In implementing this, we're applying least squares to the autoregression, implicitly invoking the Gauss-Markov assumptions, to carry out the Lagrange multiplier test. This corresponds to the notion of quasi-maximum likelihood estimates of the unknown parameters. Quasi-maximum likelihood estimates are used extensively in some stochastic volatility models: essentially situations where you use the normal, or second-order, approximation to get your estimates, and those estimates turn out to be consistent and reasonably good.

All right, let's go to maximum likelihood estimation. The hard part of maximum likelihood estimation is defining the likelihood function, which is the density of the data given the unknown parameters. In this case the data are conditionally independent: the joint density is the product over t of the density of y_t given the information through t minus 1. So the joint probability density is the density at each time point conditional on the past, times the density of the next time point conditional on the past, and so on. Those conditional densities are all normal, so these are normal PDFs coming into play here. What we want to do is maximize this likelihood function subject to the constraints, and we already went through the fact that the alpha_i's have to be non-negative.
And it turns out you also need the sum of the alphas to be less than one. What would happen if the sum of the alphas were not less than one?

AUDIENCE: [INAUDIBLE].

PROFESSOR: Right. The process could start diverging; these autoregressions can explode.

In the remaining few minutes let me introduce the GARCH models. The GARCH model adds to the equation for the variance sigma_t squared a sum of q past squared volatilities, on top of the ARCH terms in the past squared residuals. It may be that very high order ARCH terms are found to be significant when you fit ARCH models, and much of that need can be explained instead by adding these GARCH terms. So let's consider a simple GARCH model with only a first-order ARCH term and a first-order GARCH term. It says that the current variance is a weighted combination of the previous conditional variance and the new squared residual. This is a very parsimonious representation that ends up fitting data quite well. There are various properties of this GARCH model which we'll go through next time, but I want to close this lecture by showing you fits of the ARCH models and of this GARCH model to the euro/dollar exchange rate process.

With the euro/dollar exchange rate, there's a graph here which shows the auto-correlation function and the partial auto-correlation function of the squared returns.
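(Plots of that kind can be reproduced with something like the following sketch, assuming statsmodels and matplotlib are installed; the return series here is a simulated stand-in, not the euro/dollar data.)

```python
# ACF and PACF of squared returns, the standard check for volatility clustering.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

r = np.random.default_rng(6).normal(size=1500) * 0.006   # stand-in daily returns
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(r**2, lags=40, ax=axes[0], title="ACF of squared returns")
plot_pacf(r**2, lags=40, ax=axes[1], title="PACF of squared returns")
plt.tight_layout()
plt.show()
```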
So is there dependence in these daily volatilities? These blue lines are plus or minus two standard deviations for the correlation coefficients. We have highly significant auto-correlations and very highly significant partial auto-correlations, which suggests, if you're familiar with ARMA processes, that you would need a very high order ARMA process to fit the squared residuals. This highlights how, with these statistical tools, you can identify the time dependence quite quickly.

Here's a plot of the ARCH order-one model and the ARCH order-two model. On each of these I've drawn a solid line where the constant-variance model would be, so ARCH is saying that we have a lot of variability about that constant level. A property of these ARCH models is that they all have a minimum value for the volatility they estimate: if you look at the ARCH function, the constant term alpha_0 is essentially the minimum value the conditional variance can take, so there's a constraint on the lower value. Then here's an ARCH(10) fit, which doesn't have quite as uniform a lower bound, and one could go on with higher and higher order ARCH terms. But rather than doing that, one can fit just a GARCH(1,1) model, and this is what it looks like. The time-varying volatility in this process is captured really well with just this GARCH model with a couple of parameters, as compared with a high-order autoregressive model. It highlights the issue with the Wold decomposition, where a potentially infinite-order autoregressive model will effectively fit most time series: that's nice to know, but it's nicer to have a parsimonious way of defining that infinite collection of parameters, and with the GARCH model a couple of parameters do a good job.
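(A minimal sketch of the GARCH(1,1) recursion just discussed, with assumed notation sigma_t^2 = alpha_0 + alpha_1 eps_{t-1}^2 + beta_1 sigma_{t-1}^2. Given parameter values, it filters a conditional-volatility path from a residual series; actually estimating the parameters would wrap this recursion in a (quasi-)likelihood maximization. Parameter values and names are illustrative.)

```python
import numpy as np

def garch11_volatility(eps, alpha0, alpha1, beta1):
    """Conditional volatility path implied by GARCH(1,1) parameters."""
    sigma2 = np.empty_like(eps)
    sigma2[0] = eps.var()                         # simple starting value
    for t in range(1, len(eps)):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1]**2 + beta1 * sigma2[t - 1]
    return np.sqrt(sigma2)

eps = np.random.default_rng(7).normal(size=1000) * 0.01   # stand-in residual series
vol = garch11_volatility(eps, alpha0=1e-6, alpha1=0.05, beta1=0.90)
print(vol[:5])
```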
And then finally, here's a simultaneous plot of all of those volatility estimates on the same graph, where one can see the increased flexibility of the GARCH models compared to the ARCH models for capturing time-varying volatility.

All right, I'll stop there for today. Next Tuesday is a presentation from Morgan Stanley, and today is the last day to sign up for the field trip.