The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: All right. I want to complete the discussion on volatility modeling in the first part of the lecture today. Last time we addressed the definition of ARCH models, which allow for time-varying volatility in modeling the returns of a financial time series. We were looking at modeling the euro-dollar exchange rate returns, and we went through fitting ARCH models to those returns, and also looked at fitting the GARCH model to those returns.

To recap, the GARCH model extends the ARCH model by adding some extra terms. If you look at this expression for the GARCH model, the first terms for the time-varying volatility sigma squared t are a linear combination of the past squared residual returns. That's the ARCH model of order p: the current volatility depends on what has happened in excess returns over the last p periods. But then we add extra terms corresponding to q lags of the previous volatility. So what we're doing with GARCH models is adding extra parameters to the ARCH model, but an advantage of considering these extra parameters, which relate the current volatility sigma squared t to the lagged values sigma squared t minus j for lags j, is that we may be able to have a model with many fewer parameters.

So indeed, if we fit these models to the exchange rate returns, what we found last time -- let me go through and show that -- was basically this: here are fits of three ARCH models, of orders 1, 2, and 10, thinking we may need many lags to fit volatility, and then the GARCH(1,1) model, where we have only one ARCH term and one GARCH term.
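As a rough sketch of that GARCH(1,1) recursion -- added here for illustration, not code from the lecture, and with made-up parameter values -- one could simulate a path in R like this:

```r
# Sketch: simulate a Gaussian GARCH(1,1) path (illustrative parameter values only)
set.seed(1)
n      <- 1000
alpha0 <- 1e-6; alpha1 <- 0.05; beta1 <- 0.90    # alpha1 + beta1 < 1
eps    <- numeric(n)                             # the "excess returns"
sig2   <- numeric(n)
sig2[1] <- alpha0 / (1 - alpha1 - beta1)         # start at the long-run variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] <- alpha0 + alpha1 * eps[t - 1]^2 + beta1 * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)            # epsilon_t = sigma_t * z_t, z_t ~ N(0,1)
}
plot(sqrt(sig2), type = "l", ylab = "sigma_t")   # volatility clustering is visible
```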
So the blue line in this graph shows the plot of the fitted GARCH(1,1) model as compared with the ARCH models. Now, in looking at this graph, one can actually see some features of how these models are fitting volatility, which is important to understand. One is that the ARCH models have a hard lower bound on the volatility. There's a constant term in the volatility equation, and because the additional terms are squared excess returns, the volatility has a lower bound at that intercept. So depending on what range you fit the data over, that lower bound is going to be determined by the data you're fitting to. As you increase the ARCH order, you basically allow for a lower lower bound. And with the GARCH model you can see that this blue line is actually predicting very different levels of volatility over the entire range of the series, so it really is much more flexible.

Now, in these fits we are assuming Gaussian distributions for the innovations in the return series. We'll soon pursue looking at alternatives to that, but let me talk just a little bit more about the GARCH model, going back to the lecture notes here. So let me expand this. OK. So there's the specification, the GARCH(1,1) model. One thing to note is that this GARCH(1,1) model does relate to an ARMA, an autoregressive moving average, process in the squared residuals. So if we look at the top line, which is the equation for the GARCH(1,1) model, consider eliminating sigma squared t by using a new innovation term, little u t, which is the difference between the squared residual and the true volatility given by the model.
So if you plug in the difference between our squared excess return and the current volatility, that should have mean 0, because sigma squared t, the time-t volatility, is equal to the expectation of the squared excess residual return, epsilon t squared. So if we plug that in, we basically get an ARMA model for the squared residuals: epsilon t squared is alpha 0, plus alpha 1 plus beta 1 times the squared residual at lag t minus 1, plus u t minus beta 1 times u t minus 1. And what this implies is an ARMA(1,1) model with white noise that has mean 0 and variance 2 sigma to the fourth. Just plugging things in.

And through our knowledge and understanding of univariate time series models, ARMA models, we can express this ARMA model for the squared residuals as a polynomial lag of the squared residuals equal to a polynomial lag of the innovations. And so we have this expression for what the innovations are. It's required that this a(L) operator, when thought of on the complex plane, has roots outside the unit circle, which corresponds to alpha 1 plus beta 1 being less than 1 in magnitude. So in order for these volatility models not to blow up, and to be covariance stationary, we have these bounds on the parameters.

OK, let's look at the unconditional volatility, or long-run variance, of the GARCH model. If you take expectations on both sides of the GARCH model equation, you basically have that the expectation of sigma squared sub t -- in the long run this is sigma star squared -- is alpha 0 plus alpha 1 plus beta 1 times sigma star squared. That sigma star squared there is also the expectation of the t minus 1 volatility in the limit. And then you can just solve for this and see that sigma star squared is equal to alpha 0 over 1 minus alpha 1 minus beta 1. And in terms of the stationarity conditions for the process, in order for that long-run variance to be finite, you need alpha 1 plus beta 1 to be less than 1 in magnitude.
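Restating that algebra compactly (this just rewrites the slide's equations in LaTeX, with u_t as defined above):

```latex
% GARCH(1,1):  \sigma_t^2 = \alpha_0 + \alpha_1\,\epsilon_{t-1}^2 + \beta_1\,\sigma_{t-1}^2
% With u_t = \epsilon_t^2 - \sigma_t^2 (mean zero), substitution gives the ARMA(1,1) form
\epsilon_t^2 = \alpha_0 + (\alpha_1 + \beta_1)\,\epsilon_{t-1}^2 + u_t - \beta_1\,u_{t-1},
% and taking expectations in the limit gives the long-run variance
\sigma_*^2 = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}, \qquad \alpha_1 + \beta_1 < 1.
```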
And if you consider the general GARCH(p,q) model, then the same argument leads to a long-run variance equal to alpha 0, the intercept term in the GARCH model, divided by 1 minus the sum of all the alpha and beta parameters. So these GARCH models lead to constraints on the parameters that are important to incorporate when we're doing any estimation of these underlying parameters. And it does complicate things, actually.

So with maximum likelihood estimation, the routine is the same for all models. We basically want to determine the likelihood function of our data given the unknown parameters, and the likelihood function is the probability density function of the data conditional on the parameters. So our likelihood function, as a function of the unknown parameters c, alpha, and beta, is the value of the joint probability density of all the data conditional on those parameters. And that joint density function can be expressed as the product of successive conditional densities of the time series. Those conditional densities are densities of normal random variables, so we can just plug in what we know to be the probability densities of normals for the t-th innovation epsilon t, and we just optimize that function.

Now, the challenge with estimating these GARCH models is in part the constraints on the underlying parameters. Those need to be enforced. So we have to have that the alpha i are greater than 0, the beta j are greater than 0, and the sum of all of them is between 0 and 1.

Who in this class has had courses in numerical analysis and done some optimization of functions? Non-linear functions? Anybody? OK. Well, in addressing this kind of problem, which will come up with any complex model that you need to estimate, say via maximum likelihood, the optimization methods do really well if you're optimizing a convex function, finding the minimum of a convex function.
And it's always nice to do minimization over an unconstrained range of the underlying parameters. So one of the tricks in solving these problems is to transform the parameters to a scale where they're unlimited in range. If you have a positive parameter, you might use the log of that parameter as the thing to be optimizing over. If the parameter is between 0 and 1, then you might take that parameter divided by 1 minus that parameter and then take the log of that, and that's unconstrained. So there are tricks for how you do this optimization which come into play. Anyway, that's the likelihood with the normal distribution, and we have computer programs that will solve that directly, so we don't have to worry about this particular case.

Once we fit this model, we want to evaluate how good it is, and the evaluation is based upon looking at the residuals from the model. So what we have are these innovations, epsilon hat t, which should have volatility sigma hat t. The standardized residuals should be uncorrelated with themselves, or at least to the extent that they can be, and the squared standardized residuals should also be uncorrelated. What we're trying to do with these models is to capture the dependence in the squared residuals, which measure the magnitude of the excess returns. So those should be uncorrelated.

There are various tests for normality; I've listed some of the most popular here. And then there are issues of model selection for deciding which GARCH model to apply. I wanted to go through an example of this analysis with the euro-dollar exchange rate, so let me go to this case study note. Let's see. There's a package in R called rugarch for univariate GARCH models, which fits various GARCH models by maximum likelihood.
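Before turning to the package, here is a minimal sketch, not from the lecture, of what those transformation tricks look like for a Gaussian GARCH(1,1): the negative log likelihood is written over unconstrained parameters, with a log scale for alpha 0 and logistic transforms keeping alpha 1 plus beta 1 inside (0, 1); `returns` is a placeholder name for the return series.

```r
# Sketch: Gaussian GARCH(1,1) negative log likelihood over unconstrained parameters
garch11.nll <- function(theta, eps) {
  alpha0 <- exp(theta[1])                        # positive via the log scale
  persis <- plogis(theta[2])                     # alpha1 + beta1 in (0,1) via the logit scale
  w      <- plogis(theta[3])                     # share of the persistence going to alpha1
  alpha1 <- persis * w
  beta1  <- persis * (1 - w)
  n    <- length(eps)
  sig2 <- numeric(n)
  sig2[1] <- var(eps)                            # simple initialization of the recursion
  for (t in 2:n)
    sig2[t] <- alpha0 + alpha1 * eps[t - 1]^2 + beta1 * sig2[t - 1]
  0.5 * sum(log(2 * pi) + log(sig2) + eps^2 / sig2)   # minus the sum of Gaussian log densities
}
# fit <- optim(c(log(1e-6), qlogis(0.9), qlogis(0.1)), garch11.nll, eps = returns)
```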
So with this particular library in R, I fit the GARCH model after actually fitting the mean process for the exchange rate returns. Now, when we looked at things last time, we basically looked at modeling the squared returns. In fact, there may be an underlying mean process that needs to be specified as well. So in this section of the case note, I initially fit an autoregressive process, using the Akaike information criterion to choose the order of the autoregressive process, and then fit a GARCH model with Gaussian innovations.

And this is the normal q-q plot of the autoregressive residuals. What you can see is that the points lie along a straight line in the middle of the range, but on the extremes they depart from that straight line. This is basically a plot of standardized quantiles. So in terms of standard units away from the mean for the residuals, we tend to get many more high values and many more low values than the Gaussian distribution would predict. So that really isn't fitting very well.

If we proceed and fit -- OK, actually that plot was just for the simple ARCH model with no GARCH terms. And then this is the q-q plot for the GARCH model under the Gaussian assumption. So here we can see that the residuals from this model suggest it may do a pretty good job when things are only a few standard deviations away from the mean, less than 2 or 2.5. But when we get to more extreme values, this isn't modeling things well. So one alternative is to consider a heavier-tailed distribution than the normal, namely the t distribution, and to consider identifying what t distribution best fits the data.

So let's just look at what ends up being the maximum likelihood estimate for the degrees of freedom parameter, which is 10 degrees of freedom. This shows the q-q plot when you have a non-Gaussian distribution that's t with 10 degrees of freedom.
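The corresponding rugarch call would look roughly like the following. This is a sketch of the package usage rather than the script from the case note: `eurusd.ret` is a placeholder name for the return series, and the AR order shown here simply stands in for the order chosen by AIC.

```r
# Sketch: AR mean plus GARCH(1,1) with Student-t innovations via rugarch
library(rugarch)
spec.t <- ugarchspec(
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model         = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std")                  # "std" is the Student-t distribution
fit.t <- ugarchfit(spec = spec.t, data = eurusd.ret)
coef(fit.t)                                    # "shape" is the fitted degrees of freedom
vol.t <- sigma(fit.t)                          # fitted conditional volatility series
```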
That t distribution basically explains these residuals quite well, so that's accommodating the heavier-tailed distribution of these values.

With this GARCH model, let's see -- let's compare the estimates of volatility under the GARCH models with the t distribution versus the Gaussian. Here's a graph showing time series plots of the estimated volatility over time, which actually look quite close. But when you look at the differences, there really are differences. It turns out that the volatility estimates from GARCH models with Gaussian errors versus GARCH with t-distributed errors are really very, very similar. The heavier tails of the t distribution mean that the distribution of the returns about that volatility has more mass in the extremes, but in terms of estimating the volatility, you get quite similar estimates of the volatility coming out. And this display -- which you'll be able to see more clearly in the case notes that I'll post -- shows that these are really quite similar in magnitude.

And the value at risk concept that was discussed by Ken a couple of weeks ago in his lecture from Morgan Stanley concerns the issue of estimating the likelihood of returns exceeding some threshold. If we use the t distribution for measuring the variability of the excess returns, then the computations in the notes indicate how you would compute these value at risk limits. If you compare the t distribution with a Gaussian distribution at nominal levels for value at risk like 2.5% or 5%, surprisingly you won't get too much difference. It's really in looking at the extreme tails of the distribution that things come into play. And so I wanted to show you how that plays out by showing you another graph here.

Those of you who have had a statistics course before have heard that a t distribution can be approximated well by a normal if the degrees of freedom for the t are at some level.
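Looping back for a moment to the value-at-risk comparison just mentioned, a small sketch of that computation, with an invented one-day volatility and the fitted 10 degrees of freedom; the t quantiles are rescaled so the t has unit variance.

```r
# Sketch: one-day VaR thresholds under normal vs. unit-variance t(10) innovations
sig <- 0.006                                               # hypothetical volatility estimate
nu  <- 10
p   <- c(0.05, 0.025, 0.001)
var.normal <- qnorm(p) * sig                               # Gaussian quantiles
var.t      <- qt(p, df = nu) * sqrt((nu - 2) / nu) * sig   # rescaled t quantiles
round(cbind(p, var.normal, var.t), 5)
# at 5% and 2.5% the two are close; the gap opens up far out in the tail (0.1%)
```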
And who wants to suggest a degrees of freedom that you might need before you're comfortable approximating a t with a normal? Danny?

AUDIENCE: 30 or 40.

PROFESSOR: 30 or 40. Sometimes people say even 25. Above 25, you can almost expect the t distribution to be a good approximation to the normal. Well, this is a graph of the PDF for a standard normal versus a t with 30 degrees of freedom, and you can see that the density functions are very, very close. The CDFs, the cumulative distribution functions -- the likelihood of being less than or equal to the horizontal value, ranging between 0 and 1 -- are almost indistinguishable. But if you look at the tails of the distribution, where here I've computed the log of the CDF, you basically have to move much more than two standard deviations away from the mean before there's really a difference with the t distribution with 30 degrees of freedom.

Now I'm going to page up, reducing the degrees of freedom. Let's see, if we could do a page down here. Page down. Oh, page up. OK. So here is 20 degrees of freedom. And here's 10 degrees of freedom, which in our case turns out to be the best fit of the t distribution. What you can see is that, in terms of standard deviation units, up to about two standard deviations below the mean we're basically getting virtually the same probability mass in the lower tail. But as we go to four or six standard deviations, then we get heavier mass with the t distribution.

In discussions of results in finance, when you fit models, people talk about, oh, there was a six standard deviation move -- which under a normal model is virtually impossible to occur. Well, with t distributions, a six standard deviation move occurs about 1 in 10,000 times according to this fit. So it actually is something that does happen.
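A quick check of that remark, in the same spirit (comparing the distributions on their own scales rather than reproducing the lecture's exact numbers):

```r
# Sketch: probability of a move at least 6 units below the mean
pnorm(-6)          # about 1e-9 under the normal: essentially never
pt(-6, df = 10)    # roughly 1e-4 under a t with 10 df: on the order of 1 in 10,000
pt(-6, df = 30)    # a t with 30 df is already much closer to the normal
```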
And so it's important to know that these t distributions are benefiting us by giving us a much better gauge of what the tail of the distribution is like. We call these distributions leptokurtic, meaning they're heavier-tailed than a normal distribution. Actually, lepto means slender, I believe, in the Greek origin of the word. And you can see that the blue curve, which is the t distribution, is a bit more slender in the center of the distribution, which allows it to have heavier tails.

All right. So t distributions are very useful. Let's go back to this case note here, which goes through, actually, fitting the t distribution -- identifying the degrees of freedom for this t model. And so with the rugarch package, we can get the log likelihood of the data fit under the t distribution assumption. Here's a graph of the negative log likelihood versus the degrees of freedom in the t model. So with maximum likelihood we identify the value which minimizes the negative log likelihood, and that comes out at the value of 10.

All right. Let's go back to these notes and see what else we want to talk about.
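If you wanted to reproduce roughly that profile over the degrees of freedom without refitting the whole model each time, a bare-bones sketch is below; it assumes `z` holds the standardized residuals from the fitted model and holds everything else fixed, so it is only an approximation to the profile likelihood shown in the case note.

```r
# Sketch: negative log likelihood of standardized residuals under a unit-variance t(nu)
nll.t <- function(nu, z) {
  s <- sqrt(nu / (nu - 2))                 # rescale so the t has variance 1
  -sum(dt(z * s, df = nu, log = TRUE) + log(s))
}
df.grid <- 4:40
nll     <- sapply(df.grid, nll.t, z = z)
plot(df.grid, nll, type = "b", xlab = "degrees of freedom", ylab = "negative log likelihood")
df.grid[which.min(nll)]                    # the minimum is at 10 in the case note
```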
OK, with these GARCH models we actually are able to model volatility clustering. Volatility clustering is where, over time, you expect volatility to be high during some periods and to be low during other periods, and the GARCH model can accommodate that. So large volatilities tend to be followed by large ones, and small volatilities tend to be followed by small ones.

OK. The returns have heavier tails than Gaussian distributions. Actually, even if we have Gaussian errors in the GARCH model, the returns are still heavier-tailed than a Gaussian; the homework goes into that a little bit. And one of the original papers, by Engle and by Bollerslev, who introduced these models, discusses these features and how useful they are for modeling financial time series.

Now, a property of these models that may be obvious, perhaps, is this: these are models that are appropriate for modeling covariance stationary time series. So the volatility, which is a measure of the squared excess return, is basically a covariance stationary process. What does that mean? That means it's going to have a long-term mean. So with these GARCH models that are covariance stationary, there's going to be a long-term mean of the GARCH process. And this discussion here details how this GARCH process essentially exhibits mean reversion of the volatility to that value. So basically, the excess volatility of the squared residuals relative to their long-term average is some multiple of the previous period's excess volatility.

So if we build forecasting models of volatility with GARCH models, what's going to happen? Basically, in the long run we predict that any volatility value is going to revert to this long-run average, and in the short run it's going to move incrementally toward that value. So these GARCH models are very good for describing volatility relative to the long-term average. In terms of their usefulness for prediction, well, they really predict that volatility is going to revert back to the mean at some rate. And the rate at which the volatility reverts back is given by alpha 1 plus beta 1. So that number, which is less than 1 for covariance stationarity, is measuring, basically, how quickly you revert back to the mean. That sum is actually called the persistence parameter in GARCH models as well. So is volatility persistent or not? The larger alpha 1 plus beta 1 is, the more persistent volatility is, meaning it reverts back to that long-run average very, very slowly.
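A small sketch of that mean-reversion behavior, with made-up parameter values: the h-step-ahead variance forecast decays geometrically toward the long-run level at the rate alpha 1 plus beta 1.

```r
# Sketch: multi-step GARCH(1,1) variance forecasts revert to the long-run level
alpha0 <- 1e-6; alpha1 <- 0.05; beta1 <- 0.90           # illustrative values
persistence <- alpha1 + beta1
sig2.star   <- alpha0 / (1 - persistence)               # long-run variance
sig2.next   <- 4 * sig2.star                            # suppose current variance is elevated
h        <- 1:60
forecast <- sig2.star + persistence^(h - 1) * (sig2.next - sig2.star)
plot(h, sqrt(forecast), type = "l", ylab = "forecast volatility")
abline(h = sqrt(sig2.star), lty = 2)                    # reversion to the long-run level
```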
In the implementation of volatility estimates with the RiskMetrics methodology, they actually don't assume that there is a long-run volatility. So basically alpha 0 is 0, and alpha 1 and beta 1 will actually sum to 1, with beta 1 equal to, say, 0.95. And so you actually are tracking a potentially non-stationary volatility, which allows you to be estimating the volatility without presuming that a long-run average is consistent with the past.

There are many extensions of the GARCH models, and there's a wide literature on that. For this course, I think it's important to understand the fundamentals of these models in terms of how they're specified under Gaussian and t assumptions. Extending them can be very interesting, and there are many papers to look at for that. OK, let's pause for a minute and get to the next topic.

All right. The next topic is multivariate time series. Two lectures ago we talked about univariate time series and basic methodologies there. We're now going to be extending that to multivariate time series. It turns out there's a multivariate Wold representation theorem, an extension of the univariate one. There are autoregressive processes for the multivariate case, which are vector autoregressive processes. Least squares estimation comes into play, and we'll see how our understanding of regression analysis allows us to specify these vector autoregressive processes nicely. There's an optimality property of ordinary least squares estimates component-wise, which we'll highlight in about half an hour. And we'll go through the maximum likelihood estimation and model selection methods, which are just very straightforward extensions of the same concepts for univariate time series and univariate regressions.

So let's introduce the notation for multivariate time series.
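Looping back to the RiskMetrics-style recursion just described, a quick sketch with alpha 0 equal to 0 and the two weights summing to 1; the 0.95 simply echoes the number used above and `eurusd.ret` is again a placeholder series.

```r
# Sketch: exponentially weighted (RiskMetrics-style) volatility, no long-run anchor
ewma.vol <- function(returns, lambda = 0.95) {
  n    <- length(returns)
  sig2 <- numeric(n)
  sig2[1] <- var(returns)                              # initialize the recursion
  for (t in 2:n)
    sig2[t] <- lambda * sig2[t - 1] + (1 - lambda) * returns[t - 1]^2
  sqrt(sig2)
}
# vol <- ewma.vol(eurusd.ret)
```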
We have a stochastic process which now is multivariate. So we have bold x of t, an m-dimensional random vector, and it's a stochastic process that varies over time t. We can think of this as m different time series corresponding to the m components of the given process. So, say, with exchange rates we could be modeling m different exchange rate values and want to model those jointly as a time series. Or we could have collections of stocks that we're modeling. And each of the components individually can be treated as a univariate series with univariate methods.

In the multivariate case, we extend the definition of covariance stationarity to correspond to finite, bounded first and second order moments. So we need to talk about the first order moment of the multivariate time series. Mu now is an m-vector, which is the vector of expected values of the individual components, which we can denote by mu 1 through mu m. So we basically have an m-vector for our mean.

Then, for the variance/covariance matrix, let's define gamma 0 to be the variance/covariance matrix of the t-th observation of our multivariate process. So that's equal to the expected value of xt minus mu times xt minus mu prime. When we write that down, xt minus mu is basically an m by 1 vector, and xt minus mu prime is a 1 by m vector, and so the product of those is an m by m matrix. The 1,1 element of that product is the variance of x 1t, and the diagonal entries are the variances of the component series. And the off-diagonal values are the covariances between the i-th component series and the j-th component series, as given by the i-th row of xt minus mu and the j-th column of xt minus mu transpose.

So we're just collecting together all the variances and covariances, and the notation is very straightforward and simple with the matrix notation given here.
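In sample terms, these are just the ordinary mean vector and covariance matrix; a tiny sketch in R, with `X` standing for a T by m matrix of observations (one row per time point):

```r
# Sketch: sample versions of mu and Gamma_0 for a T x m data matrix X
mu.hat     <- colMeans(X)      # m-vector of sample means
Gamma0.hat <- cov(X)           # m x m sample variance/covariance matrix
```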
Now, the correlation matrix, r 0, is obtained by pre- and post-multiplying this covariance matrix gamma 0 by a diagonal matrix of the reciprocals of the square roots of the diagonal of gamma 0. Now, what's a correlation? A correlation is the correlation between two random variables where we've standardized the variables to have mean 0 and variance 1. So what we want to do is basically divide each of these variables by its standard deviation and compute the covariance matrix on that new scaling. That's equivalent to just pre- and post-multiplying by that diagonal matrix of the inverses of the standard deviations. So with matrix algebra, that formula is, I think, very clear.

Now, the previous discussion was just looking at the contemporaneous covariance matrix of the time series values at the given time t with themselves. We also want to look at the cross-covariance matrices. So how do the current values of the multivariate time series xt covary with the k-th lag of those values? Gamma k is looking at how the current period's vector of values covaries with the k-th lag of those values. So this covariance matrix has the covariance elements given in this display. And we can define the cross-correlation matrix by similarly pre- and post-multiplying by the inverses of the standard deviations, where the diagonal of gamma 0 is the matrix of the variances.

Now, a property of these matrices: gamma 0 is a symmetric matrix, as we had before. But gamma k, for k different from 0, is not symmetric. Basically, you may have lags of some variables that are positively correlated with others and not vice versa. So the off-diagonal entries here aren't necessarily even of the same sign, let alone equal and symmetric.
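The sample versions of these quantities are again short to write down; the sketch below (with the same hypothetical T by m matrix `X`) computes the correlation matrix, a lag-k cross-covariance matrix, and checks that the latter is generally not symmetric.

```r
# Sketch: sample correlation, lag-k cross-covariance, and cross-correlation matrices
R0.hat <- cov2cor(cov(X))                         # pre/post multiply by diag(1/sd)
cross.cov <- function(X, k) {
  X <- scale(X, center = TRUE, scale = FALSE)     # subtract the mean vector
  T <- nrow(X)
  crossprod(X[(k + 1):T, , drop = FALSE], X[1:(T - k), , drop = FALSE]) / T
}
Gamma.k <- cross.cov(X, k = 1)
D.inv   <- diag(1 / sqrt(diag(cross.cov(X, 0))))
R.k     <- D.inv %*% Gamma.k %*% D.inv            # cross-correlations at lag k
isTRUE(all.equal(Gamma.k, t(Gamma.k)))            # generally FALSE for k >= 1
```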
So with these covariance matrices, one can look at how things covary and whether there is, basically, a dependence between them. And you can define leading relationships: the j-star component of the multivariate time series may lead the j-th one if the covariance between the k-th lag of x j-star and the current value of x j is different from 0. So x t, j-star will lead x t, j: basically, there's information in the lagged values of component j-star for the component j.

So if we're trying to build models -- linear regression models, even -- where we're trying to predict values, then if there's a non-zero covariance, we can use those variables' information to actually project what the one variable is given the other. Now, it can be the case that you have non-zero covariances in both directions, and so that suggests that there can be feedback between these variables. It's not just that one variable causes another; there can actually be feedback.

In economics and finance, there's a notion of Granger causality. Granger and Engle got the Nobel Prize a number of years ago based on their work, and that work deals in part with identifying judgments of Granger causality between economic time series. And Granger causality is basically non-zero correlation between variables where lags of one variable help explain -- in that sense cause -- changes in another.

All right. I want to just alert you to the existence of this Wold decomposition theorem. This is an advanced theorem, but it's a useful theorem to know exists.
And this extends the univariate Wold decomposition theorem, which says that whenever we have a covariance stationary process, there exists a representation of that process as the sum of a deterministic process and a moving average process of white noise.

So if you're modeling a time series and you're going to be specifying a covariance stationary process for that, there does exist a Wold decomposition representation of it. You can basically identify the deterministic process that the series might follow -- it might be a linear trend over time, or an exponential trend -- and if you remove that deterministic process vt, then what remains is a process that can be modeled with a moving average of white noise.

Now, here everything is changed from the univariate case to the multivariate case, so we have matrices in place of the constants from before. So the new concepts here are these: we have a multivariate white noise process. That's a process eta t which is m-dimensional and which has mean 0. The variance matrix of this m-vector is sigma, which is now the m by m variance/covariance matrix of the components, and that must be positive semi-definite. And for white noise, the covariances between the current innovation eta t and any lag of its value are 0. So these are uncorrelated multivariate white noise terms; they're uncorrelated with each other at all lags. And the innovation eta t has a covariance of 0 with the deterministic process -- actually, that's pretty much a given if we have a deterministic process.

Now, the terms psi k: basically, we have this vector xt equal to the m-vector deterministic process vt plus this weighted average of the innovations. What's required is that the sum over k of each term psi k times its transpose converges.
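Symbolically, the multivariate Wold representation being described is the following (restated in LaTeX; the convention that the leading coefficient matrix is the identity is the usual one):

```latex
X_t = V_t + \sum_{k=0}^{\infty} \Psi_k\, \eta_{t-k}, \qquad \Psi_0 = I_m,
% where V_t is the deterministic component and \eta_t is m-dimensional white noise:
% E[\eta_t] = 0,\quad \mathrm{Cov}(\eta_t) = \Sigma \ (m \times m,\ \text{positive semi-definite}),
% \mathrm{Cov}(\eta_t, \eta_{t-k}) = 0 \ \text{for } k \neq 0,\quad
% \text{and } \sum_{k=0}^{\infty} \Psi_k \Psi_k' \ \text{converges.}
```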
Now, if you were to take that xt process and say, let me compute the variance/covariance matrix of that representation, then you would basically get terms in the covariance matrix which include this sum of terms. So that sum has to be finite in order for this to be covariance stationary.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yes?

AUDIENCE: Could you define what you mean by innovation?

PROFESSOR: Oh, OK. Well, the innovation is -- let's see. Let me go back up here. OK. The innovation process. If we have, as in this case, our xt stochastic process, and we have, say, F sub t minus 1 equal to the information in x t minus 1, x t minus 2, and so on -- basically the information set available before time t -- then we can model xt to be the expected value of xt given F t minus 1, plus an innovation. And so our objective in these models is to be thinking of how that process is evolving, where we can model the process as well as possible using information up to the time before t, and then there's some disturbance about that model. There's something new that's happened at time t that wasn't available before, and that's this innovation process. So this representation with the Wold decomposition is representing, basically, the bits of information that are affecting the process that occur at time t and weren't available prior to that.

All right. Well, let's move on to vector autoregressive processes. OK, this representation for a vector autoregressive process is an extension of the univariate autoregressive process to m dimensions. And so our xt is an m-vector. That's going to be equal to some constant vector c, plus a matrix phi 1 times the first lag of xt, which is xt minus 1, plus another matrix phi 2 times the second lag of xt, xt minus 2.
And so on, up to the p-th term, which is an m by m matrix phi p times x t minus p, plus this innovation term. So this is basically how a univariate autoregressive process extends to an m-variate case. And what this allows one to do is model how a given component of the multivariate series -- like one exchange rate -- varies depending on how other exchange rates might vary. Exchange rates tend to co-move together, in that example.

So if we look at what this represents in terms of a component series, we can consider fixing j, a component of the multivariate process. It could be the first, the last, or the j-th, somewhere in the middle. And that component time series -- a particular exchange rate series, or whatever we're focused on in our modeling -- follows a generalization of the autoregressive model, where we have the autoregressive terms of the j-th series on lags of the j-th series up to order p. So we have the univariate autoregressive model, but we also add terms corresponding to the relationship between x j and the other components x j-star: how does the j-th component depend on the other components of the multivariate series? Those terms are given here. So it's a convenient way to allow for interdependence among the components and to model that.

OK. This slide deals with representing a p-th order vector autoregression as a first order process. Now, the concept here is really a very powerful concept that's applied in time series methods, which is that when you are modeling dependence that goes back, say, a number of lags like p lags, the structure can actually be re-expressed as a first order dependence only. And it's much easier to deal with just a lag-one dependence than to consider p-lag dependence and the complications involved with that.

And this technique is one where, in the early days of fitting autoregressive moving average processes and various smoothing methods, accommodating p lags complicated the analysis enormously, but one can actually re-express it just as a first order lag problem. So in this case, what one does for a vector autoregressive process of order p is simply to stack the values of the process. Let me just highlight what's going on there.

So if we have x1, x2, up to xn, which are all m by 1 vectors of the stochastic process, then consider defining zt to be the stacked vector with blocks xt, x t minus 1, down to x t minus (p minus 1). So there are p blocks, and zt is mp by 1; in the lecture notes the primes indicate the transposes. And then the lagged value zt minus 1 stacks x t minus 1, x t minus 2, down to x t minus p.

Well, if you define zt and zt minus 1 this way, then zt is equal to d plus A times zt minus 1 plus an error vector f. Basically, the constant term d has c entering in its first block and 0's everywhere else; the first block row of the A matrix is phi 1, phi 2, up to phi p, with identity blocks below that shifting the lags down and 0's elsewhere; and f has the innovation in its first block and 0's below. So the zt vector is this linear transformation of zt minus 1, and we have a very simple form for the constant term and a very simple form for the f vector. And this renders the model into a first order time series model for a larger multivariate series, basically mp by 1.
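As a concrete sketch of that stacking -- added for illustration, with invented bivariate VAR(2) coefficients, so m equals 2 and p equals 2 -- the companion matrix puts phi 1 and phi 2 in its first block row and an identity block below:

```r
# Sketch: companion (stacked) form of a bivariate VAR(2): z_t = d + A z_{t-1} + f_t
m <- 2; p <- 2
c.vec <- c(0.1, 0.05)                              # invented constant vector
Phi1  <- matrix(c(0.5, 0.1, 0.2, 0.4), m, m)       # invented coefficient matrices
Phi2  <- matrix(c(0.2, 0.0, 0.0, 0.1), m, m)
A <- rbind(cbind(Phi1, Phi2),
           cbind(diag(m), matrix(0, m, m)))        # identity block shifts x_{t-1} down
d <- c(c.vec, rep(0, m * (p - 1)))                 # c in the first block, 0's elsewhere
A                                                  # the (mp) x (mp) companion matrix
```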
764 00:49:17,560 --> 00:49:22,500 So-- and this technique is one where, 765 00:49:22,500 --> 00:49:26,390 in the early days of fitting, like auto regressive 766 00:49:26,390 --> 00:49:34,520 moving average processes and various smoothing methods, 767 00:49:34,520 --> 00:49:40,520 the model-- basically accommodating 768 00:49:40,520 --> 00:49:44,960 p lags complicated the analysis enormously. 769 00:49:44,960 --> 00:49:46,740 But one can actually re-express it just 770 00:49:46,740 --> 00:49:48,930 as a first order lag problem. 771 00:49:48,930 --> 00:49:53,860 So in this case, what one does is one considers 772 00:49:53,860 --> 00:49:57,580 for a vector auto regressive process of order of p, 773 00:49:57,580 --> 00:50:08,090 simply stacking the values of the process. 774 00:50:08,090 --> 00:50:11,640 So let me just highlight what's going on there. 775 00:50:11,640 --> 00:50:17,620 776 00:50:17,620 --> 00:50:29,500 So if we have basically-- OK, so if we have x1, 777 00:50:29,500 --> 00:50:38,680 x2, xn, which are all m by 1 values, 778 00:50:38,680 --> 00:50:42,640 m vectors of the stochastic process. 779 00:50:42,640 --> 00:50:56,770 Then consider defining zt to be equal to xt transpose xt 780 00:50:56,770 --> 00:51:01,889 minus 1 transpose up to x t minus p minus 1 transpose. 781 00:51:01,889 --> 00:51:07,378 782 00:51:07,378 --> 00:51:09,930 Or this is t minus p minus 1. 783 00:51:09,930 --> 00:51:10,915 So there are p terms. 784 00:51:10,915 --> 00:51:13,630 785 00:51:13,630 --> 00:51:21,130 And then if we consider the lagged value of that, that's 786 00:51:21,130 --> 00:51:30,470 x2 minus 1, x2 minus 2, x2 minus p transpose. 787 00:51:30,470 --> 00:51:35,380 So what we've done is we're considering zt. 788 00:51:35,380 --> 00:51:40,380 This is going to be m times p. 789 00:51:40,380 --> 00:51:46,892 It's actually 1 by m times p in this notation. 790 00:51:46,892 --> 00:51:50,850 Well, actually I guess I should put transpose here. 791 00:51:50,850 --> 00:51:54,740 So m minus p by 1. 792 00:51:54,740 --> 00:51:57,040 OK, in the lecture notes it actually 793 00:51:57,040 --> 00:52:00,640 is primed there to indicate the transpose. 794 00:52:00,640 --> 00:52:03,660 Well, if you define zt and zt minus 1 this way, 795 00:52:03,660 --> 00:52:09,050 then zt is equal to d plus a of zt minus 1 796 00:52:09,050 --> 00:52:12,230 plus f, where this is d. 797 00:52:12,230 --> 00:52:15,410 Basically the constant term has the c entering and then 0's 798 00:52:15,410 --> 00:52:16,660 everywhere else. 799 00:52:16,660 --> 00:52:23,820 And the a matrix is phi 1 phi 2 up to phi p. 800 00:52:23,820 --> 00:52:36,410 And so basically the zt vector transforms the zt-- 801 00:52:36,410 --> 00:52:40,590 or is the transpose-- this linear transformation 802 00:52:40,590 --> 00:52:43,290 of the zt minus 1. 803 00:52:43,290 --> 00:52:46,460 And we have sort of a very simple form 804 00:52:46,460 --> 00:52:52,640 for the constant term and a very simple form for the f vector. 805 00:52:52,640 --> 00:52:59,700 And this is-- renders the model into a sort of a first order 806 00:52:59,700 --> 00:53:06,270 time series model with a larger multivariate series, 807 00:53:06,270 --> 00:53:09,590 basically mp by 1. 808 00:53:09,590 --> 00:53:21,380 Now, with this representation we basically have-- we 809 00:53:21,380 --> 00:53:31,040 can demonstrate that the process is going to be stationary 810 00:53:31,040 --> 00:53:34,750 if all eigenvalues of the companion matrix a 811 00:53:34,750 --> 00:53:38,060 have modulus less than 1. 
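As a sketch of the stacking just described, again with made-up dimensions, the companion matrix A and the stacked intercept d can be assembled in base R as follows.

companion <- function(Phi) {
  # A = [ Phi_1  Phi_2 ... Phi_p ]
  #     [  I      0   ...   0    ]
  #     [  0      I   ...   0    ]   (identity blocks shift z_{t-1} down by one lag)
  m <- nrow(Phi[[1]]); p <- length(Phi)
  A <- matrix(0, m * p, m * p)
  A[1:m, ] <- do.call(cbind, Phi)
  if (p > 1) A[(m + 1):(m * p), 1:(m * (p - 1))] <- diag(m * (p - 1))
  A
}
stack_intercept <- function(c0, p) c(c0, rep(0, length(c0) * (p - 1)))   # the d vector
# z_t = d + A %*% z_{t-1} + f_t, where z_t stacks x_t, x_{t-1}, ..., x_{t-p+1}

Only the first m rows of A carry free parameters; the identity blocks simply copy the lagged values down, which is what lets a p-th order model be treated as a first order one.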
812 00:53:38,060 --> 00:53:43,320 And let's see-- if we go back to the expression. 813 00:53:43,320 --> 00:53:50,750 OK, if the eigenvalues of this matrix A are less than 1, 814 00:53:50,750 --> 00:53:55,440 then we won't get sort of an explosive behavior 815 00:53:55,440 --> 00:54:00,320 of the process when this basically increments over time 816 00:54:00,320 --> 00:54:03,790 with every previous value getting multiplied by the A 817 00:54:03,790 --> 00:54:10,880 matrix and scaling the process over time by the A-th power. 818 00:54:10,880 --> 00:54:12,450 So that is required. 819 00:54:12,450 --> 00:54:14,620 All eigenvalues of A have to be less than 1. 820 00:54:14,620 --> 00:54:17,090 And equivalently, all roots of this equation 821 00:54:17,090 --> 00:54:21,300 need to be outside the unit circle. 822 00:54:21,300 --> 00:54:25,560 You remember there was a constraint of-- or a condition 823 00:54:25,560 --> 00:54:30,030 for univariate auto regressive models 824 00:54:30,030 --> 00:54:35,100 to be stationary, that the roots of the characteristic equation 825 00:54:35,100 --> 00:54:38,100 are all outside the unit circle. 826 00:54:38,100 --> 00:54:40,170 And the class notes go through and went 827 00:54:40,170 --> 00:54:41,760 through the derivation of that. 828 00:54:41,760 --> 00:54:46,820 This is the extension of that to the multivariate case. 829 00:54:46,820 --> 00:54:50,290 And so basically one needs to solve 830 00:54:50,290 --> 00:54:53,460 for roots of a polynomial in Z and determine 831 00:54:53,460 --> 00:54:59,120 whether those are outside the unit circle. 832 00:54:59,120 --> 00:55:01,460 Who can tell me what the order of the polynomial 833 00:55:01,460 --> 00:55:07,880 is here for this sort of determinant equation? 834 00:55:07,880 --> 00:55:09,720 AUDIENCE: [INAUDIBLE] mp. 835 00:55:09,720 --> 00:55:11,100 PROFESSOR: mp. 836 00:55:11,100 --> 00:55:11,600 Yes. 837 00:55:11,600 --> 00:55:13,670 It's basically of power mp. 838 00:55:13,670 --> 00:55:15,870 So in a determinant you basically 839 00:55:15,870 --> 00:55:19,910 are taking products of the m components 840 00:55:19,910 --> 00:55:24,050 in the matrix, various linear combinations of those. 841 00:55:24,050 --> 00:55:28,610 So that's going to be an mp dimensional polynomial. 842 00:55:28,610 --> 00:55:29,110 All right. 843 00:55:29,110 --> 00:55:32,680 Well, the mean of the stationary VAR process 844 00:55:32,680 --> 00:55:37,220 can be computed rather easily by taking expectations 845 00:55:37,220 --> 00:55:41,660 of this on both sides. 846 00:55:41,660 --> 00:55:44,720 So if we take the expectation of xt 847 00:55:44,720 --> 00:55:48,710 and take expectations across both sides, 848 00:55:48,710 --> 00:55:57,400 we get that mu is the c vector plus the product of the phi 849 00:55:57,400 --> 00:55:59,670 case times mu plus 0. 850 00:55:59,670 --> 00:56:05,620 So mu, the unconditional mean of the process, 851 00:56:05,620 --> 00:56:10,640 actually has this formula just solving 852 00:56:10,640 --> 00:56:18,810 for mu in the top-- in the second line to the third line. 853 00:56:18,810 --> 00:56:27,080 So here we can see that basically this expression 854 00:56:27,080 --> 00:56:33,040 1 minus phi 1 through phi p, that inverse has to exist. 855 00:56:33,040 --> 00:56:36,430 And actually, if we then plug in the value 856 00:56:36,430 --> 00:56:39,050 of c in terms of the unconditional mean, 857 00:56:39,050 --> 00:56:43,860 we get this expression for the original process. 
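Continuing the illustration, the eigenvalue check and the unconditional mean can be computed directly; the coefficient matrices below are the same made-up values as before.

Phi <- list(matrix(c(0.5, 0.1, 0.0, 0.4), 2, 2),
            matrix(c(0.2, 0.0, 0.1, 0.1), 2, 2))
c0 <- c(0.1, 0.2)
A <- rbind(do.call(cbind, Phi),                    # companion matrix for m = 2, p = 2
           cbind(diag(2), matrix(0, 2, 2)))
Mod(eigen(A)$values)                               # all moduli < 1  =>  covariance stationary
mu <- solve(diag(2) - Reduce(`+`, Phi), c0)        # mu = (I - Phi_1 - ... - Phi_p)^{-1} c

If any modulus were 1 or larger, powers of A would not die out and the process would be explosive or integrated rather than stationary, which is exactly the point being made about the roots of the characteristic polynomial.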
858 00:56:43,860 --> 00:56:49,920 So with the unconditional mean c, if we demean the process, 859 00:56:49,920 --> 00:56:52,290 there's basically no mean term. 860 00:56:52,290 --> 00:56:53,730 It's 0. 861 00:56:53,730 --> 00:56:58,710 And so basically the mean adjusted process x 862 00:56:58,710 --> 00:57:02,460 follows this multivariate vector auto regression 863 00:57:02,460 --> 00:57:08,430 with no mean, which is actually used when this is specified. 864 00:57:08,430 --> 00:57:11,450 865 00:57:11,450 --> 00:57:18,850 Now, this vector auto regression model 866 00:57:18,850 --> 00:57:25,760 can be expressed as a system of regression equations. 867 00:57:25,760 --> 00:57:33,820 And so what we have with the multivariate series, if we have 868 00:57:33,820 --> 00:57:38,130 multivariate data, we'll have n sample observations, 869 00:57:38,130 --> 00:57:40,765 xt, which is basically the m vector 870 00:57:40,765 --> 00:57:45,710 of the multivariate process observed for n time points. 871 00:57:45,710 --> 00:57:48,000 And for the computations here, we're 872 00:57:48,000 --> 00:57:52,050 going to assume that we have p sort of-- we 873 00:57:52,050 --> 00:57:56,100 have pre-sample observations available to us. 874 00:57:56,100 --> 00:57:58,980 So we're essentially going to be considering models 875 00:57:58,980 --> 00:58:01,610 where we condition on the first p time 876 00:58:01,610 --> 00:58:07,660 points in order to facilitate the estimation methodology. 877 00:58:07,660 --> 00:58:12,040 Then we can set up m regression models corresponding 878 00:58:12,040 --> 00:58:16,080 to each component of the m variate series. 879 00:58:16,080 --> 00:58:32,190 And so what we have is our original-- 880 00:58:32,190 --> 00:58:39,100 we have our collection of data values, which is x1 transpose, 881 00:58:39,100 --> 00:58:45,750 x2 transpose, down to xn transpose, 882 00:58:45,750 --> 00:58:52,290 which is an n by m matrix. 883 00:58:52,290 --> 00:58:54,540 OK, this is our multivariate time series 884 00:58:54,540 --> 00:58:56,720 where we were just-- the first row corresponds 885 00:58:56,720 --> 00:58:59,515 to the first time values, the n-th row to the n-th time values. 886 00:58:59,515 --> 00:59:02,180 887 00:59:02,180 --> 00:59:05,580 And we can set up m regression models 888 00:59:05,580 --> 00:59:11,410 where we're going to consider modeling 889 00:59:11,410 --> 00:59:15,920 the j-th column of this matrix. 890 00:59:15,920 --> 00:59:19,180 So we're just picking out the univariate time series 891 00:59:19,180 --> 00:59:21,320 corresponding to the j-th component. 892 00:59:21,320 --> 00:59:23,990 That's yj. 893 00:59:23,990 --> 00:59:31,150 And we're going to model that as Z beta j plus epsilon j, 894 00:59:31,150 --> 00:59:41,380 where Z is given by the vector of lagged values 895 00:59:41,380 --> 00:59:46,320 of the multivariate process where, 896 00:59:46,320 --> 00:59:49,040 for the t-th case, 897 00:59:49,040 --> 00:59:52,510 we have the t 898 00:59:52,510 --> 00:59:54,890 minus first, t minus second, up to t minus p values. 899 00:59:54,890 --> 01:00:01,210 So we have basically p m-vectors here. 900 01:00:01,210 --> 01:00:09,100 And so this j-th time series has elements 901 01:00:09,100 --> 01:00:14,190 that follow a linear regression model 902 01:00:14,190 --> 01:00:18,100 on the lags of the entire multivariate series up to p 903 01:00:18,100 --> 01:00:23,380 lags with the regression parameter given by beta j.
904 01:00:23,380 --> 01:00:28,250 And basically the beta j regression parameters 905 01:00:28,250 --> 01:00:36,049 correspond to the various elements of the phi matrices. 906 01:00:36,049 --> 01:00:38,590 So now there's a one-to-one correspondence between those. 907 01:00:38,590 --> 01:00:50,800 908 01:00:50,800 --> 01:00:51,300 All right. 909 01:00:51,300 --> 01:00:59,280 So I'm using now a notation where superscript j corresponds 910 01:00:59,280 --> 01:01:02,930 to the j-th component of the series, 911 01:01:02,930 --> 01:01:07,760 of the multivariate stochastic process. 912 01:01:07,760 --> 01:01:12,550 So we have an mp plus 1 vector of regression parameters 913 01:01:12,550 --> 01:01:16,160 for each series j, and we have an epsilon j, 914 01:01:16,160 --> 01:01:22,100 an n-vector of innovation errors, for each series. 915 01:01:22,100 --> 01:01:31,970 And so basically if this, the j-th column, is yj, 916 01:01:31,970 --> 01:01:35,740 we're modeling that to be equal to the simple matrix 917 01:01:35,740 --> 01:01:44,540 Z times beta j plus epsilon j, where this is n by 1. 918 01:01:44,540 --> 01:01:47,790 This is n by mp plus 1. 919 01:01:47,790 --> 01:01:51,520 920 01:01:51,520 --> 01:01:55,920 And this beta j is the mp plus 1 regression parameter vector. 921 01:01:55,920 --> 01:02:04,845 922 01:02:04,845 --> 01:02:05,345 OK. 923 01:02:05,345 --> 01:02:10,140 924 01:02:10,140 --> 01:02:12,030 One might think, OK, one can consider 925 01:02:12,030 --> 01:02:17,320 each of these regressions for each of the component series, 926 01:02:17,320 --> 01:02:19,630 and you could consider them separately. 927 01:02:19,630 --> 01:02:23,940 But to consider them all together, 928 01:02:23,940 --> 01:02:29,270 we can define the multivariate regression model, 929 01:02:29,270 --> 01:02:33,420 which has the following form. 930 01:02:33,420 --> 01:02:40,277 We basically have the n-vectors for the first component, 931 01:02:40,277 --> 01:02:42,360 and then the second component, up to the m-th component. 932 01:02:42,360 --> 01:02:46,730 So an n by m matrix of dependent variables, 933 01:02:46,730 --> 01:02:53,540 where each column corresponds to a different component series, 934 01:02:53,540 --> 01:02:55,820 follows a linear regression model 935 01:02:55,820 --> 01:02:59,990 with the same Z matrix with different regression 936 01:02:59,990 --> 01:03:03,460 coefficient parameters, beta 1 through beta m, corresponding 937 01:03:03,460 --> 01:03:08,040 to the different components of the multivariate series. 938 01:03:08,040 --> 01:03:14,330 And we have epsilon 1, epsilon 2, up to epsilon m. 939 01:03:14,330 --> 01:03:22,220 So we're thinking of taking-- so basically the y1, y2, up to ym 940 01:03:22,220 --> 01:03:27,150 is essentially this original matrix of our multivariate time 941 01:03:27,150 --> 01:03:35,820 series, because it's the first component in the first column 942 01:03:35,820 --> 01:03:37,670 and the m-th component in the m-th column. 943 01:03:37,670 --> 01:03:42,230 And the-- this explanatory 944 01:03:42,230 --> 01:03:47,090 variables matrix X, which is Z in this case, 945 01:03:47,090 --> 01:03:53,210 corresponds to lags of the whole process up to p lags. 946 01:03:53,210 --> 01:03:58,230 So we're having lags of all of the m-variate process up to p lags. 947 01:03:58,230 --> 01:04:02,700 So that's mp, and then plus 1 for our constant. 948 01:04:02,700 --> 01:04:05,790 So this is the set up for a multivariate regression model.
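One way to lay out these data matrices in base R is sketched below. X stands for a generic T-by-m data matrix and p for the chosen lag order; both are placeholders, not objects from the lecture's code.

build_var_design <- function(X, p) {
  # condition on the first p observations; rows t = p+1, ..., T remain
  T <- nrow(X); idx <- (p + 1):T
  Z <- cbind(1, do.call(cbind, lapply(1:p, function(k) X[idx - k, , drop = FALSE])))
  Y <- X[idx, , drop = FALSE]
  list(Y = Y, Z = Z)     # Y is (T - p) x m,  Z is (T - p) x (mp + 1)
}

The same Z serves every component regression; only the response column and the coefficient vector beta j change from one component to the next.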
949 01:04:05,790 --> 01:04:12,880 950 01:04:12,880 --> 01:04:14,910 In terms of how one specifies this, 951 01:04:14,910 --> 01:04:17,630 well, actually, in economic theory 952 01:04:17,630 --> 01:04:20,930 this is also related to seemingly unrelated 953 01:04:20,930 --> 01:04:23,750 regressions, which you'll find in econometrics. 954 01:04:23,750 --> 01:04:26,730 955 01:04:26,730 --> 01:04:32,850 If we want to specify this multivariate model, well, 956 01:04:32,850 --> 01:04:35,520 what we could do is we could actually 957 01:04:35,520 --> 01:04:37,550 specify each of the component models 958 01:04:37,550 --> 01:04:42,430 separately, because we basically have sort of-- can think 959 01:04:42,430 --> 01:04:44,890 of the univariate regression model for each component 960 01:04:44,890 --> 01:04:47,010 series. 961 01:04:47,010 --> 01:04:52,580 And this slide indicates basically what 962 01:04:52,580 --> 01:04:53,810 the formulas are for that. 963 01:04:53,810 --> 01:04:58,189 So if we don't know anything about multivariate regression 964 01:04:58,189 --> 01:04:59,730 we can say, well, let's start by just 965 01:04:59,730 --> 01:05:03,620 doing the univariate regression of each component series 966 01:05:03,620 --> 01:05:04,820 on the lags. 967 01:05:04,820 --> 01:05:07,540 And so we get our beta hat j least squares 968 01:05:07,540 --> 01:05:10,330 estimates given by the usual formula, where 969 01:05:10,330 --> 01:05:14,510 the independent variables matrix is Z, so beta hat j is Z transpose Z 970 01:05:14,510 --> 01:05:17,280 inverse, Z transpose yj, along with the residuals. 971 01:05:17,280 --> 01:05:20,090 So these are familiar formulas. 972 01:05:20,090 --> 01:05:28,680 And if we did this for each of the component series j, 973 01:05:28,680 --> 01:05:33,750 then we would actually get sample estimates 974 01:05:33,750 --> 01:05:37,351 of the innovation process, eta 1. 975 01:05:37,351 --> 01:05:40,970 Basically the whole eta series. 976 01:05:40,970 --> 01:05:45,980 And we could actually define from these estimates 977 01:05:45,980 --> 01:05:49,110 of the innovations our covariance matrix 978 01:05:49,110 --> 01:05:52,500 for the innovations as the sample covariance 979 01:05:52,500 --> 01:05:54,640 matrix of these etas. 980 01:05:54,640 --> 01:05:58,170 So all of these formulas are-- you're basically 981 01:05:58,170 --> 01:06:00,830 applying very straightforward estimation 982 01:06:00,830 --> 01:06:05,440 methods for the parameters of a linear regression 983 01:06:05,440 --> 01:06:08,855 and then estimating variances/covariances 984 01:06:08,855 --> 01:06:11,440 of these innovation terms. 985 01:06:11,440 --> 01:06:14,470 So from this, we actually have estimates 986 01:06:14,470 --> 01:06:20,420 of this process in terms of the sigma and the beta hats. 987 01:06:20,420 --> 01:06:24,220 But it's made assuming that we can 988 01:06:24,220 --> 01:06:26,755 treat each of these component regressions separately. 989 01:06:26,755 --> 01:06:33,410 990 01:06:33,410 --> 01:06:35,310 A rather remarkable result is that 991 01:06:35,310 --> 01:06:40,300 these component-wise regressions are actually 992 01:06:40,300 --> 01:06:44,470 the optimal estimates for the multivariate regression 993 01:06:44,470 --> 01:06:46,030 as well. 994 01:06:46,030 --> 01:06:51,840 And as mathematicians, this kind of result 995 01:06:51,840 --> 01:06:54,610 is, I think, rather neat and elegant.
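Writing out the familiar formulas referred to here: beta hat j equals (Z transpose Z) inverse times Z transpose y j, and the innovation covariance is the sample covariance of the residuals. A minimal sketch, reusing build_var_design from the earlier snippet, with X again a placeholder data matrix:

d <- build_var_design(X, p = 2)               # X is your T-by-m data matrix (placeholder)
Y <- d$Y; Z <- d$Z
Bhat <- solve(crossprod(Z), crossprod(Z, Y))  # (Z'Z)^{-1} Z'Y; column j is beta hat j
E <- Y - Z %*% Bhat                           # fitted innovations (residuals)
Sigma_hat <- crossprod(E) / nrow(Y)           # sample covariance of the residual series
                                              # (a degrees-of-freedom divisor is another common choice)

Because every component regression shares the same Z, a single matrix solve produces all m coefficient vectors at once; that is the same component-wise computation the slide describes, just done in one call.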
996 01:06:54,610 --> 01:06:58,900 And maybe some of you will think this is very obvious, 997 01:06:58,900 --> 01:07:05,720 but it actually-- it isn't quite obvious. 998 01:07:05,720 --> 01:07:08,010 That said, this component-wise estimation 999 01:07:08,010 --> 01:07:10,430 should be optimal as well. 1000 01:07:10,430 --> 01:07:13,100 And the next section of the lecture notes 1001 01:07:13,100 --> 01:07:16,965 goes through this argument. 1002 01:07:16,965 --> 01:07:20,140 1003 01:07:20,140 --> 01:07:22,480 And I'm going to, in the interest of time, 1004 01:07:22,480 --> 01:07:26,590 go through this-- just sort of highlight what the results are. 1005 01:07:26,590 --> 01:07:29,580 The details are in these notes that you can go through. 1006 01:07:29,580 --> 01:07:34,750 And I will be happy to go into more detail about them 1007 01:07:34,750 --> 01:07:37,260 during office hours. 1008 01:07:37,260 --> 01:07:41,520 But if we're fitting a vector auto regression model where 1009 01:07:41,520 --> 01:07:44,750 there are no constraints on the coefficient matrices, 1010 01:07:44,750 --> 01:07:52,060 phi 1 through phi p, then these component-wise estimates, 1011 01:07:52,060 --> 01:07:56,950 accounting for arbitrary covariance matrix sigma 1012 01:07:56,950 --> 01:08:01,300 for the innovations, those basically 1013 01:08:01,300 --> 01:08:03,810 are equal to the generalized least squares estimates 1014 01:08:03,810 --> 01:08:06,030 of these underlying parameters. 1015 01:08:06,030 --> 01:08:09,280 You'll recall we talked about the Gauss Markov theorem 1016 01:08:09,280 --> 01:08:14,480 where we were able to extend the assumption of sort 1017 01:08:14,480 --> 01:08:16,380 equal variances across observations 1018 01:08:16,380 --> 01:08:20,189 to unequal variances and covariances. 1019 01:08:20,189 --> 01:08:23,832 Well, it turns out to these component-wise OLS 1020 01:08:23,832 --> 01:08:26,040 estimates are, in fact, the generalized least squared 1021 01:08:26,040 --> 01:08:27,180 estimates. 1022 01:08:27,180 --> 01:08:30,160 And under the assumption of Gaussian distributions 1023 01:08:30,160 --> 01:08:32,410 for the innovations, they, in fact, 1024 01:08:32,410 --> 01:08:34,569 are maximum likelihood estimates. 1025 01:08:34,569 --> 01:08:41,210 And this theory applies Kronecker products. 1026 01:08:41,210 --> 01:08:43,609 We're not going to have any homework with Kronecker 1027 01:08:43,609 --> 01:08:44,580 products. 1028 01:08:44,580 --> 01:08:47,160 These notes really are for those who 1029 01:08:47,160 --> 01:08:51,120 have some more extensive background in linear algebra. 1030 01:08:51,120 --> 01:08:55,130 But it's a very nice use of these Kronecker product 1031 01:08:55,130 --> 01:08:56,180 operators. 1032 01:08:56,180 --> 01:09:02,479 Basically, this notation-- or no, x circle-- 1033 01:09:02,479 --> 01:09:04,970 I'll call it Kronecker-- is one where 1034 01:09:04,970 --> 01:09:08,560 you take a matrix A and a matrix B 1035 01:09:08,560 --> 01:09:11,170 and you consider the matrix which 1036 01:09:11,170 --> 01:09:14,950 takes each element of A times the whole matrix B. 1037 01:09:14,950 --> 01:09:18,370 So we start with an m by n matrix A 1038 01:09:18,370 --> 01:09:22,220 and end up with an mp by qn matrix 1039 01:09:22,220 --> 01:09:25,550 by taking each element of A times the whole matrix B. 1040 01:09:25,550 --> 01:09:29,010 So it's, they say, has this block structure. 1041 01:09:29,010 --> 01:09:32,510 So this is very simple definition. 
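A tiny R illustration of the Kronecker product and the kind of transposition and inverse properties being referred to; A and B are arbitrary small invertible matrices chosen only for the demonstration.

A <- matrix(c(1, 3, 2, 4), 2, 2)
B <- matrix(c(2, 0, 1, 3), 2, 2)
A %x% B                                            # 4 x 4: each a_ij multiplies the whole of B
dim(A %x% B)                                       # (2*2) by (2*2), the block structure
all.equal(t(A %x% B), t(A) %x% t(B))               # transpose distributes over the product
all.equal(solve(A %x% B), solve(A) %x% solve(B))   # so does the inverse, when both factors are invertible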
1042 01:09:32,510 --> 01:09:37,080 If you look at properties of transposition of matrices, 1043 01:09:37,080 --> 01:09:38,540 you can prove these results. 1044 01:09:38,540 --> 01:09:42,850 These are properties of the Kronecker product. 1045 01:09:42,850 --> 01:09:54,320 And there's a vec operator which takes a matrix 1046 01:09:54,320 --> 01:09:58,470 and simply stacks the columns together. 1047 01:09:58,470 --> 01:10:04,700 And in the talk last Tuesday of Ivan's, talking about modeling 1048 01:10:04,700 --> 01:10:07,845 the volatility surface, he basically, he 1049 01:10:07,845 --> 01:10:11,410 was modeling a two dimensional surface-- or a surface 1050 01:10:11,410 --> 01:10:13,910 in three dimensions, but there were 1051 01:10:13,910 --> 01:10:16,830 two dimensions explaining it. 1052 01:10:16,830 --> 01:10:22,140 You basically can stack columns of the matrix 1053 01:10:22,140 --> 01:10:27,030 and be modeling a vector instead of a matrix of values. 1054 01:10:27,030 --> 01:10:32,130 So the vectorizing operator allows us to manipulate terms 1055 01:10:32,130 --> 01:10:35,400 into a more convenient form. 1056 01:10:35,400 --> 01:10:39,040 And this multivariate regression model 1057 01:10:39,040 --> 01:10:50,950 is one where it's set up as sort of an n by m matrix Y, 1058 01:10:50,950 --> 01:10:53,490 having that structure. 1059 01:10:53,490 --> 01:10:57,110 It can be expressed in terms of the linear regression form 1060 01:10:57,110 --> 01:11:06,380 as y star equaling the vector, the vec of Y. 1061 01:11:06,380 --> 01:11:15,340 So we basically have y1, y2, down to ym all lined up. 1062 01:11:15,340 --> 01:11:18,480 So this is nm by 1. 1063 01:11:18,480 --> 01:11:21,600 1064 01:11:21,600 --> 01:11:30,055 That's going to be equal to some matrix plus the epsilon 1, 1065 01:11:30,055 --> 01:11:33,920 epsilon 2, down to epsilon m. 1066 01:11:33,920 --> 01:11:38,850 And then there's going to be a matrix 1067 01:11:38,850 --> 01:11:43,970 and a regression coefficient matrix beta 1, beta 1068 01:11:43,970 --> 01:11:47,360 2, down to beta m. 1069 01:11:47,360 --> 01:11:51,320 So we consider vectorizing the beta matrix, 1070 01:11:51,320 --> 01:11:55,340 vectorizing epsilon, and vectorizing y. 1071 01:11:55,340 --> 01:11:59,465 And then in order to define this sort 1072 01:11:59,465 --> 01:12:03,370 of simple linear regression model, univariate regression 1073 01:12:03,370 --> 01:12:08,750 model, well, we need to have a Z in the first diagonal block 1074 01:12:08,750 --> 01:12:14,800 here, corresponding to beta 1 for y1, and 0's everywhere else. 1075 01:12:14,800 --> 01:12:20,160 In the second block row we want to have 1076 01:12:20,160 --> 01:12:25,920 a Z in the second diagonal block with 0's everywhere else, and so 1077 01:12:25,920 --> 01:12:27,100 forth. 1078 01:12:27,100 --> 01:12:30,960 So this is just re-expressing everything in this notation. 1079 01:12:30,960 --> 01:12:34,290 But the notation is very nice because, at the end of the day, 1080 01:12:34,290 --> 01:12:36,760 we basically have a regression model like we 1081 01:12:36,760 --> 01:12:39,030 had when we were doing our regression analysis. 1082 01:12:39,030 --> 01:12:42,930 So all the theory we have for specifying these models 1083 01:12:42,930 --> 01:12:46,270 plays through with univariate regression.
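The vec operator and the block-diagonal regressor matrix can be checked numerically. In R, c(M) stacks the columns of a matrix M, which is exactly vec; the sketch below verifies that vec(Z B) equals (I_m Kronecker Z) times vec(B) on made-up dimensions.

Z <- matrix(rnorm(6 * 3), 6, 3)        # n = 6 observations, mp + 1 = 3 regressors (made up)
B <- matrix(rnorm(3 * 2), 3, 2)        # m = 2 component series, so B = [beta_1  beta_2]
ystar <- c(Z %*% B)                    # vec of the n x m response surface, stacked by column
Xstar <- diag(2) %x% Z                 # I_m Kronecker Z: block diagonal with Z repeated m times
all.equal(ystar, c(Xstar %*% c(B)))    # vec(Z B) = (I_m %x% Z) vec(B)

That identity is what turns the multivariate regression into one tall univariate-style regression of y star on X star.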
1084 01:12:46,270 --> 01:12:50,309 And one can go through this technical argument 1085 01:12:50,309 --> 01:12:52,600 to show that the generalized least squares estimate is, 1086 01:12:52,600 --> 01:12:59,430 in fact, the equivalent to the component-wise values. 1087 01:12:59,430 --> 01:13:03,980 And that's very, very good. 1088 01:13:03,980 --> 01:13:07,000 Maximum likelihood estimation with these models. 1089 01:13:07,000 --> 01:13:12,130 Well, we actually use this vectorized notation 1090 01:13:12,130 --> 01:13:15,050 to define the likelihood function. 1091 01:13:15,050 --> 01:13:20,560 And if these assumptions are made 1092 01:13:20,560 --> 01:13:24,610 about the linear regression model, 1093 01:13:24,610 --> 01:13:28,740 we basically have an n times m vector 1094 01:13:28,740 --> 01:13:34,780 of dependent variable values, whereas your multivariate 1095 01:13:34,780 --> 01:13:38,700 normal with mean given by x star beta star 1096 01:13:38,700 --> 01:13:41,870 and then a covariance matrix epsilon. 1097 01:13:41,870 --> 01:13:47,380 The covariance matrix of epsilon star is sigma star. 1098 01:13:47,380 --> 01:13:50,900 Well, sigma star is In Kronecker product sigma. 1099 01:13:50,900 --> 01:13:54,130 So if you go through the math of this, 1100 01:13:54,130 --> 01:13:59,250 everything matches up in terms of what the assumptions are. 1101 01:13:59,250 --> 01:14:05,340 And the conditional probability density function of this data 1102 01:14:05,340 --> 01:14:14,930 is the usual functions of log normal or of a normal sample. 1103 01:14:14,930 --> 01:14:20,850 So we have unknown parameters beta star sigma, 1104 01:14:20,850 --> 01:14:26,960 which are equal to the joint density 1105 01:14:26,960 --> 01:14:29,740 of this normal linear regression model. 1106 01:14:29,740 --> 01:14:33,250 So this corresponds to what we had 1107 01:14:33,250 --> 01:14:34,980 before in our regression analysis. 1108 01:14:34,980 --> 01:14:37,050 We just had this more complicated definition 1109 01:14:37,050 --> 01:14:40,900 of the independent variables matrix X star. 1110 01:14:40,900 --> 01:14:42,470 And a more complicated definition 1111 01:14:42,470 --> 01:14:47,270 of our variance/covariance matrix sigma star. 1112 01:14:47,270 --> 01:14:50,250 But the log likelihood function ends up 1113 01:14:50,250 --> 01:14:54,390 being equal to a term proportional 1114 01:14:54,390 --> 01:14:59,090 to the log of the determinant of our sigma matrix 1115 01:14:59,090 --> 01:15:03,790 and minus one half q of beta sigma, where q of beta sigma 1116 01:15:03,790 --> 01:15:08,860 is the least squares criterion for each of the component 1117 01:15:08,860 --> 01:15:12,960 models summed up. 1118 01:15:12,960 --> 01:15:16,660 So the component-wise maximum likelihood estimation 1119 01:15:16,660 --> 01:15:19,115 is-- for the underlying parameters, 1120 01:15:19,115 --> 01:15:23,260 is the same as the large one. 1121 01:15:23,260 --> 01:15:31,880 And in terms of estimating the covariance matrix, 1122 01:15:31,880 --> 01:15:37,420 there's a notion called the concentrated log likelihood, 1123 01:15:37,420 --> 01:15:45,200 which comes into play in models with many parameters. 1124 01:15:45,200 --> 01:15:48,390 In this model, we have unknown parameters-- 1125 01:15:48,390 --> 01:15:52,230 our regression parameters beta and our covariance matrix 1126 01:15:52,230 --> 01:15:55,190 for the innovations sigma. 
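To pin down the quantities in that log likelihood, here is a small R function evaluating it for given B and Sigma; Q(B, Sigma) is the least squares criterion summed over time points, as described above.

var_loglik <- function(Y, Z, B, Sigma) {
  # Gaussian log likelihood for Y = Z B + E, rows of E independent N(0, Sigma)
  n <- nrow(Y); m <- ncol(Y)
  E <- Y - Z %*% B
  Q <- sum(diag(E %*% solve(Sigma) %*% t(E)))   # Q(B, Sigma) = sum over t of e_t' Sigma^{-1} e_t
  -(n * m / 2) * log(2 * pi) - (n / 2) * log(det(Sigma)) - Q / 2
}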
1127 01:15:55,190 --> 01:15:59,850 It turns out that our estimate of the regression parameter 1128 01:15:59,850 --> 01:16:05,190 beta is independent, doesn't depend-- not statistically 1129 01:16:05,190 --> 01:16:06,570 independent-- but does not depend 1130 01:16:06,570 --> 01:16:10,700 on the value of the covariance matrix sigma. 1131 01:16:10,700 --> 01:16:14,110 So whatever sigma is, we have the same maximum likelihood 1132 01:16:14,110 --> 01:16:15,620 estimate for the betas. 1133 01:16:15,620 --> 01:16:19,760 So we can consider the log likelihood 1134 01:16:19,760 --> 01:16:24,540 setting the beta parameter equal to its maximum likelihood 1135 01:16:24,540 --> 01:16:25,410 estimate. 1136 01:16:25,410 --> 01:16:27,270 And then we have a function that just 1137 01:16:27,270 --> 01:16:31,350 depends on the data and the unknown parameter sigma. 1138 01:16:31,350 --> 01:16:34,230 So that's a concentrated likelihood function 1139 01:16:34,230 --> 01:16:36,210 that needs to be maximized. 1140 01:16:36,210 --> 01:16:40,570 And the maximization of the log of a determinant of a matrix 1141 01:16:40,570 --> 01:16:43,900 minus n over 2, the trace of that matrix times an estimate 1142 01:16:43,900 --> 01:16:47,210 of it, that has been solved. 1143 01:16:47,210 --> 01:16:50,240 It's a bit involved. 1144 01:16:50,240 --> 01:16:52,760 But if you're interested in the mathematics for how that's 1145 01:16:52,760 --> 01:16:55,470 actually solved and how you take derivatives of determinants 1146 01:16:55,470 --> 01:16:58,152 and so forth, there's a paper by Anderson and Olkin 1147 01:16:58,152 --> 01:16:59,860 that goes through all the details of that 1148 01:16:59,860 --> 01:17:01,622 that you can Google on the web. 1149 01:17:01,622 --> 01:17:05,910 1150 01:17:05,910 --> 01:17:07,550 Finally, let's see. 1151 01:17:07,550 --> 01:17:09,270 There's-- well, not finally. 1152 01:17:09,270 --> 01:17:12,240 There's model selection criteria that can be applied. 1153 01:17:12,240 --> 01:17:14,720 These have been applied before for regression models 1154 01:17:14,720 --> 01:17:18,780 for univariate time series model, the Akaike Information 1155 01:17:18,780 --> 01:17:22,770 Criterion, the Bayes Information Criterion, Hannan-Quinn 1156 01:17:22,770 --> 01:17:24,640 Criterion. 1157 01:17:24,640 --> 01:17:27,060 These definitions are all consistent 1158 01:17:27,060 --> 01:17:29,680 with the other definitions. 1159 01:17:29,680 --> 01:17:33,330 They basically take the likelihood function 1160 01:17:33,330 --> 01:17:36,260 and you try to maximize that plus a penalty 1161 01:17:36,260 --> 01:17:39,330 for the number of unknown parameters. 1162 01:17:39,330 --> 01:17:43,380 And that's given here. 1163 01:17:43,380 --> 01:17:45,920 1164 01:17:45,920 --> 01:17:47,780 OK, then the last section goes through 1165 01:17:47,780 --> 01:17:53,500 an asymptotic distribution of least squares estimates. 1166 01:17:53,500 --> 01:17:56,950 And I'll let you read that on your own. 1167 01:17:56,950 --> 01:17:57,450 Let's see. 1168 01:17:57,450 --> 01:18:03,750 For this lecture I put together an example of fitting vector 1169 01:18:03,750 --> 01:18:09,330 auto regressions with some macroeconomic variables. 1170 01:18:09,330 --> 01:18:15,360 And I just wanted to point that out to you. 1171 01:18:15,360 --> 01:18:23,738 So let me go to this document here. 1172 01:18:23,738 --> 01:18:25,690 What have we got here? 1173 01:18:25,690 --> 01:18:29,594 1174 01:18:29,594 --> 01:18:30,580 All right. 1175 01:18:30,580 --> 01:18:31,310 Well, OK. 
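Before turning to the macroeconomic example, here is a rough sketch of computing the concentrated estimate of Sigma and the three selection criteria over a range of lag orders. The penalty counts m(mp + 1) free regression coefficients, which is one common convention; other treatments count only the lag coefficients, so the exact formula should be treated as an assumption.

var_order_select <- function(X, p.max = 8) {
  T <- nrow(X); m <- ncol(X)
  sapply(1:p.max, function(p) {
    idx <- (p.max + 1):T                     # common estimation sample across orders
    Z <- cbind(1, do.call(cbind, lapply(1:p, function(k) X[idx - k, , drop = FALSE])))
    Y <- X[idx, , drop = FALSE]
    E <- Y - Z %*% solve(crossprod(Z), crossprod(Z, Y))
    n <- nrow(Y)
    Sig <- crossprod(E) / n                  # concentrated (maximum likelihood) estimate of Sigma
    k <- m * (m * p + 1)                     # free regression coefficients
    c(AIC = log(det(Sig)) + 2 * k / n,
      BIC = log(det(Sig)) + log(n) * k / n,
      HQ  = log(det(Sig)) + 2 * log(log(n)) * k / n)
  })
}
# smaller is better for each criterion, e.g. which.min(var_order_select(X)["AIC", ])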
1176 01:18:31,310 --> 01:18:37,410 Modeling macroeconomic time series is an important topic. 1177 01:18:37,410 --> 01:18:39,940 It's what sort of central bankers do. 1178 01:18:39,940 --> 01:18:42,550 They want to understand what factors are affecting 1179 01:18:42,550 --> 01:18:45,880 the economy in terms of growth, inflation, unemployment. 1180 01:18:45,880 --> 01:18:50,600 And what's the impact of interest rate policies. 1181 01:18:50,600 --> 01:18:52,940 There are some really important papers 1182 01:18:52,940 --> 01:18:56,750 by Robert Litterman and Christopher Sims dealing 1183 01:18:56,750 --> 01:18:58,780 with fitting vector auto regression 1184 01:18:58,780 --> 01:19:01,590 models to macroeconomic time series. 1185 01:19:01,590 --> 01:19:03,420 And actually, the framework within which 1186 01:19:03,420 --> 01:19:07,680 they specified these models was a Bayesian framework, 1187 01:19:07,680 --> 01:19:11,320 which is an extension of the maximum likelihood method where 1188 01:19:11,320 --> 01:19:14,860 you incorporate reasonable sorts of 1189 01:19:14,860 --> 01:19:18,190 prior assumptions about what the parameters ought to be. 1190 01:19:18,190 --> 01:19:26,130 But in this note, I sort of basically 1191 01:19:26,130 --> 01:19:29,870 go through collecting various macroeconomic variables 1192 01:19:29,870 --> 01:19:33,240 directly off the web using the package R. 1193 01:19:33,240 --> 01:19:36,550 All this stuff is-- these are data 1194 01:19:36,550 --> 01:19:39,040 that you can get your hands on. 1195 01:19:39,040 --> 01:19:43,900 Here's the unemployment rate from January 1946 1196 01:19:43,900 --> 01:19:47,030 up through this past month. 1197 01:19:47,030 --> 01:19:52,670 Anyone can see how that's varied between much less than 4% 1198 01:19:52,670 --> 01:19:56,470 to over 10%, as it was recently. 1199 01:19:56,470 --> 01:19:59,550 And there's also the Fed funds rate, 1200 01:19:59,550 --> 01:20:02,100 which is one of the key variables 1201 01:20:02,100 --> 01:20:06,610 that the Federal Reserve Open Market Committee controls, 1202 01:20:06,610 --> 01:20:08,880 or I should say controlled in the past, 1203 01:20:08,880 --> 01:20:10,840 to try and affect the economy. 1204 01:20:10,840 --> 01:20:14,720 Now the value of that rate is set almost at zero 1205 01:20:14,720 --> 01:20:19,340 and other means are applied to have an impact 1206 01:20:19,340 --> 01:20:24,940 on economic growth and the economic situation 1207 01:20:24,940 --> 01:20:31,340 of the market-- of the economy, rather. 1208 01:20:31,340 --> 01:20:32,120 Let's see. 1209 01:20:32,120 --> 01:20:34,330 There's also-- anyway, a bunch of other variables. 1210 01:20:34,330 --> 01:20:38,470 CPI, which is a measure of inflation. 1211 01:20:38,470 --> 01:20:45,502 What this note goes through is the specification 1212 01:20:45,502 --> 01:20:52,070 of vector auto regression models for these series. 1213 01:20:52,070 --> 01:20:54,490 And I use just a small set of cases. 1214 01:20:54,490 --> 01:20:58,640 I look at unemployment rate, federal funds, 1215 01:20:58,640 --> 01:21:02,470 and the CPI, which is a measure of inflation. 1216 01:21:02,470 --> 01:21:06,580 And there's-- if one goes through, 1217 01:21:06,580 --> 01:21:10,780 there are multivariate versions of the autocorrelation 1218 01:21:10,780 --> 01:21:14,670 function, as given on the top right panel here, 1219 01:21:14,670 --> 01:21:17,110 between these variables. 1220 01:21:17,110 --> 01:21:20,350 And one can also do the partial autocorrelation function.
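The data collection described here can be reproduced in rough outline as follows. This is a sketch assuming the quantmod and vars packages and the FRED series codes UNRATE, FEDFUNDS, and CPIAUCSL; the lecture's own note may use different sources, transformations, and lag orders.

library(quantmod)                                   # for getSymbols(..., src = "FRED")
library(vars)                                       # for VARselect() and VAR()
unrate <- getSymbols("UNRATE",   src = "FRED", auto.assign = FALSE)   # unemployment rate
fedfun <- getSymbols("FEDFUNDS", src = "FRED", auto.assign = FALSE)   # federal funds rate
cpi    <- getSymbols("CPIAUCSL", src = "FRED", auto.assign = FALSE)   # CPI price level
infl   <- 100 * diff(log(cpi))                      # CPI level turned into monthly inflation
y <- na.omit(merge(unrate, fedfun, infl))
colnames(y) <- c("UNRATE", "FEDFUNDS", "INFL")
VARselect(as.data.frame(y), lag.max = 12, type = "const")$selection  # suggested lag orders
fit <- VAR(as.data.frame(y), p = 2, type = "const")                  # illustrative VAR(2) with intercept
summary(fit)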
1221 01:21:20,350 --> 01:21:23,289 You'll recall that autocorrelation functions 1222 01:21:23,289 --> 01:21:24,830 and partial autocorrelation functions 1223 01:21:24,830 --> 01:21:29,040 are related to what kind of-- or help us understand what kind 1224 01:21:29,040 --> 01:21:31,390 of order ARMA processes might be appropriate 1225 01:21:31,390 --> 01:21:32,670 for univariate series. 1226 01:21:32,670 --> 01:21:36,750 For multivariate series, then there are basically 1227 01:21:36,750 --> 01:21:39,760 cross lags between variables that are important, 1228 01:21:39,760 --> 01:21:42,750 and these can call be captured with vector auto regression 1229 01:21:42,750 --> 01:21:43,520 models. 1230 01:21:43,520 --> 01:21:47,370 So this goes through and shows how 1231 01:21:47,370 --> 01:21:50,610 these things are correlated with themselves. 1232 01:21:50,610 --> 01:21:51,970 And let's see. 1233 01:21:51,970 --> 01:21:59,550 At the end of this note, there are some impulse 1234 01:21:59,550 --> 01:22:02,660 response functions graphed, which 1235 01:22:02,660 --> 01:22:07,370 are looking at what is the impact of an innovation in one 1236 01:22:07,370 --> 01:22:11,090 of the components of the multivariate time series. 1237 01:22:11,090 --> 01:22:16,570 So like if Fed funds were to be increased by a certain value, 1238 01:22:16,570 --> 01:22:20,140 what would the likely impact be on the unemployment rate? 1239 01:22:20,140 --> 01:22:22,240 Or on GNP? 1240 01:22:22,240 --> 01:22:25,540 Basically, the production level of the economy. 1241 01:22:25,540 --> 01:22:30,790 And this looks at-- let's see. 1242 01:22:30,790 --> 01:22:32,260 Well, actually here we're looking 1243 01:22:32,260 --> 01:22:34,555 at the impulse function. 1244 01:22:34,555 --> 01:22:36,680 You can look at the impulse function of innovations 1245 01:22:36,680 --> 01:22:40,000 on any of the component variables on all the others. 1246 01:22:40,000 --> 01:22:42,150 And in this case, on the left panel 1247 01:22:42,150 --> 01:22:47,790 here is-- it shows what happens when unemployment 1248 01:22:47,790 --> 01:22:50,360 has a spike up, or unit spike. 1249 01:22:50,360 --> 01:22:51,760 A unit impulse up. 1250 01:22:51,760 --> 01:22:55,460 Well, this second panel shows what's 1251 01:22:55,460 --> 01:22:57,190 likely to happen to the Fed funds rate. 1252 01:22:57,190 --> 01:22:59,730 It turns out that's likely to go down. 1253 01:22:59,730 --> 01:23:01,670 And that sort of is indicating-- it's sort 1254 01:23:01,670 --> 01:23:03,370 of reflecting what, historically, 1255 01:23:03,370 --> 01:23:07,490 was the policy of the Fed to basically reduce interest 1256 01:23:07,490 --> 01:23:11,550 rates if unemployment was rising. 1257 01:23:11,550 --> 01:23:16,400 And then-- so anyway, these impulse response functions 1258 01:23:16,400 --> 01:23:18,690 correspond to essentially those innovation 1259 01:23:18,690 --> 01:23:20,450 terms on the Wold decomposition. 1260 01:23:20,450 --> 01:23:22,360 And why are these important? 1261 01:23:22,360 --> 01:23:26,720 Well, this indicates a connection, basically, 1262 01:23:26,720 --> 01:23:30,260 between that sort of moving average representation 1263 01:23:30,260 --> 01:23:31,870 and these time series models. 1264 01:23:31,870 --> 01:23:35,480 And the way these graphs are generated 1265 01:23:35,480 --> 01:23:39,090 is by essentially finding the Wold decomposition 1266 01:23:39,090 --> 01:23:43,880 and then incorporating that into these values. 
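The link drawn here between impulse responses and the Wold moving average representation can be sketched directly: given estimated coefficient matrices Phi 1 through Phi p (however they were extracted from the fit, which is left as a placeholder), the MA coefficient matrices Psi s follow a simple recursion, and their columns give the responses to unit innovations.

psi_matrices <- function(Phi, n.ahead = 24) {
  # Psi_0 = I;  Psi_s = Phi_1 Psi_{s-1} + ... + Phi_{min(s,p)} Psi_{s-min(s,p)}
  m <- nrow(Phi[[1]]); p <- length(Phi)
  Psi <- vector("list", n.ahead + 1)
  Psi[[1]] <- diag(m)
  for (s in 1:n.ahead) {
    Psi[[s + 1]] <- matrix(0, m, m)
    for (k in 1:min(s, p))
      Psi[[s + 1]] <- Psi[[s + 1]] + Phi[[k]] %*% Psi[[s - k + 1]]
  }
  Psi
}
# Element (i, j) of Psi_s is the response of component i, s periods after a unit
# innovation in component j; orthogonalized responses, as in plotted panels of this
# kind, would additionally multiply each Psi_s by a Cholesky factor of Sigma_hat.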
1267 01:23:43,880 --> 01:23:47,540 So-- OK, we'll finish there for today.