1
00:00:00,090 --> 00:00:02,500
The following content is
provided under a Creative
2
00:00:02,500 --> 00:00:04,019
Commons license.
3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare
4
00:00:06,360 --> 00:00:10,730
continue to offer high-quality
educational resources for free.
5
00:00:10,730 --> 00:00:13,330
To make a donation or
view additional materials
6
00:00:13,330 --> 00:00:17,210
from hundreds of MIT courses,
visit MIT OpenCourseWare
7
00:00:17,210 --> 00:00:17,835
at ocw.mit.edu.
8
00:00:21,650 --> 00:00:24,030
PROFESSOR: We introduced
the data last time.
9
00:00:24,030 --> 00:00:27,700
These were some
macroeconomic variables
10
00:00:27,700 --> 00:00:33,990
that can be used for forecasting
the economy in terms of growth
11
00:00:33,990 --> 00:00:39,330
and factors such as
inflation or unemployment.
12
00:00:39,330 --> 00:00:44,020
The case note goes through
analyzing just three
13
00:00:44,020 --> 00:00:47,690
of these economic time
series-- the unemployment rate,
14
00:00:47,690 --> 00:00:51,360
the federal funds rate,
and a measure of the CPI,
15
00:00:51,360 --> 00:00:52,530
or Consumer Price Index.
16
00:00:56,450 --> 00:01:00,520
When one fits a vector
autoregression model
17
00:01:00,520 --> 00:01:08,940
to this data, it turns
out that the roots
18
00:01:08,940 --> 00:01:16,800
of the characteristic polynomial
are 1.002, then 0.9863.
19
00:01:16,800 --> 00:01:19,090
And you recall from our
discussion of vector
20
00:01:19,090 --> 00:01:23,140
autoregressive models, there's
a characteristic equation
21
00:01:23,140 --> 00:01:25,425
in matrix
form, where the determinant
22
00:01:25,425 --> 00:01:29,720
plays the same role as in the
univariate autoregressive case.
23
00:01:29,720 --> 00:01:44,120
And in order for the process
to be stationary, basically,
24
00:01:44,120 --> 00:01:46,150
the roots of the
characteristic polynomial
25
00:01:46,150 --> 00:01:50,370
need to be less
than 1 in magnitude.
26
00:01:50,370 --> 00:01:54,110
In this implementation of the
vector autoregression model,
27
00:01:54,110 --> 00:01:57,220
the characteristic
roots are the inverses
28
00:01:57,220 --> 00:01:59,620
of the characteristic roots
that we've been discussing.
29
00:01:59,620 --> 00:02:03,770
So anyway, this particular fit
of the vector autoregression
30
00:02:03,770 --> 00:02:11,370
model suggests that the
process is non-stationary.
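[A minimal sketch in R of this step, using the vars package; the data frame name macro and its column names are placeholders, not necessarily those in the case note.]

```r
# Fit a VAR(2) to the three macro series and inspect the
# characteristic roots reported by the package.
library(vars)

y   <- na.omit(macro[, c("UNRATE", "FEDFUNDS", "CPI")])
fit <- VAR(y, p = 2, type = "const")

# vars reports the moduli of the companion-matrix eigenvalues,
# i.e., the inverses of the characteristic roots discussed above;
# stationarity requires all of them to be strictly less than 1.
roots(fit)
```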
31
00:02:11,370 --> 00:02:17,580
And so one should be
considering different series
32
00:02:17,580 --> 00:02:20,400
to model this as a
stationary time series.
33
00:02:20,400 --> 00:02:26,520
But in terms of interpreting
the regression model,
34
00:02:26,520 --> 00:02:36,320
one can see-- to accommodate
the non-stationarity,
35
00:02:36,320 --> 00:02:41,020
we can take differences
of all the series
36
00:02:41,020 --> 00:02:43,360
and fit the vector
autoregression
37
00:02:43,360 --> 00:02:45,550
to the difference series.
38
00:02:45,550 --> 00:02:49,210
So one way of eliminating any
non-stationarity in time series
39
00:02:49,210 --> 00:02:52,810
models, basically
eliminating the random walk
40
00:02:52,810 --> 00:02:57,290
aspect of the processes, is to
model first differences.
41
00:02:57,290 --> 00:03:06,180
And so doing that with
this series-- let's see.
42
00:03:06,180 --> 00:03:10,220
Here is just a graph of
the time series properties
43
00:03:10,220 --> 00:03:11,800
of the difference series.
44
00:03:15,210 --> 00:03:19,180
So with our original series, we
take differences and eliminate
45
00:03:19,180 --> 00:03:22,820
missing values in this R code.
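[A rough sketch of the R code being described, continuing from the VAR fit above; diff and na.omit are the relevant base-R tools.]

```r
# First differences of each series, dropping the missing value
# created at the start of the sample.
dy <- na.omit(diff(as.matrix(y)))

# Grid of autocorrelations (diagonal panels) and cross-correlations
# (off-diagonal panels); the dashed lines are roughly +/- 2 standard
# errors under the hypothesis of zero correlation.
acf(dy)
```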
46
00:03:22,820 --> 00:03:25,300
And this
autocorrelation function
47
00:03:25,300 --> 00:03:31,100
shows us basically
the autocorrelations
48
00:03:31,100 --> 00:03:33,420
of the individual series
49
00:03:33,420 --> 00:03:36,950
and the cross-correlations
across the different series.
50
00:03:36,950 --> 00:03:41,680
So along the diagonals are
the autocorrelation functions.
51
00:03:41,680 --> 00:03:43,800
And one can see
that every series
52
00:03:43,800 --> 00:03:47,280
has correlation one with itself.
53
00:03:47,280 --> 00:03:52,380
But then at the first lag,
the autocorrelation is positive for the Fed
54
00:03:52,380 --> 00:03:56,450
funds and the CPI measure.
55
00:03:56,450 --> 00:03:58,980
And there's also some
cross-correlations
56
00:03:58,980 --> 00:04:01,550
that are strong.
57
00:04:01,550 --> 00:04:04,180
And whether a
correlation is strong or not
58
00:04:04,180 --> 00:04:06,125
depends upon how much
uncertainty there
59
00:04:06,125 --> 00:04:08,250
is in our estimate
of the correlation.
60
00:04:08,250 --> 00:04:11,750
And these dashed
lines here correspond
61
00:04:11,750 --> 00:04:16,980
to plus or minus two standard
deviations of the correlation
62
00:04:16,980 --> 00:04:23,440
coefficient when the correlation
coefficient is equal to 0.
63
00:04:23,440 --> 00:04:28,470
So any correlations that sort
of go beyond those bounds
64
00:04:28,470 --> 00:04:29,715
are statistically significant.
65
00:04:33,180 --> 00:04:39,210
The partial autocorrelation
function is graphed here.
66
00:04:39,210 --> 00:04:42,730
And let's say our
time series problem
67
00:04:42,730 --> 00:04:46,040
set goes through some discussion
of the partial autocorrelation
68
00:04:46,040 --> 00:04:48,600
coefficients and the
interpretation of those.
69
00:04:48,600 --> 00:04:51,910
The partial autocorrelation
coefficients
70
00:04:51,910 --> 00:04:57,450
are the correlation
between one variable
71
00:04:57,450 --> 00:04:59,330
and the lag of another
after accounting
72
00:04:59,330 --> 00:05:02,110
for all lower-order lags.
73
00:05:02,110 --> 00:05:06,480
So it's like the incremental
correlation of a variable
74
00:05:06,480 --> 00:05:10,760
with an additional lag term.
75
00:05:10,760 --> 00:05:13,830
And so if we are actually
fitting regression models where
76
00:05:13,830 --> 00:05:18,460
we include extra lags
of a given variable,
77
00:05:18,460 --> 00:05:20,570
that partial
autocorrelation coefficient
78
00:05:20,570 --> 00:05:25,260
is essentially the correlation
associated with the addition
79
00:05:25,260 --> 00:05:27,620
of the final lagged variable.
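[The corresponding one-liner in R; pacf plots the partial auto- and cross-correlations just described.]

```r
# Incremental correlation at each lag after accounting for all
# lower-order lags of the differenced series.
pacf(dy)
```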
80
00:05:27,620 --> 00:05:30,230
So here, we can see that
each of these series
81
00:05:30,230 --> 00:05:33,950
is quite strongly
correlated with itself.
82
00:05:33,950 --> 00:05:37,470
But there are also
some cross-correlations
83
00:05:37,470 --> 00:05:42,750
with, like, the unemployment
rate and the Fed funds rate.
84
00:05:42,750 --> 00:05:46,700
Basically, the Fed
funds rate tends
85
00:05:46,700 --> 00:05:50,400
to go down when the
unemployment rate goes up.
86
00:05:50,400 --> 00:05:54,610
And so this data is
indicating the association
87
00:05:54,610 --> 00:05:56,640
between these
macroeconomic variables
88
00:05:56,640 --> 00:05:59,100
and the evidence
of that behavior.
89
00:05:59,100 --> 00:06:02,100
In terms of modeling the
actual structural relations
90
00:06:02,100 --> 00:06:05,930
between these, we would need
more variables, up to about 10
91
00:06:05,930 --> 00:06:08,380
or 12, beyond
these three.
92
00:06:08,380 --> 00:06:12,710
And then one can have
a better understanding
93
00:06:12,710 --> 00:06:15,750
of the drivers of various
macroeconomic features.
94
00:06:15,750 --> 00:06:17,250
But this sort of
illustrates the use
95
00:06:17,250 --> 00:06:19,950
of these methods with this
reduced variable case.
96
00:06:22,830 --> 00:06:25,650
Let me also go
down here and just
97
00:06:25,650 --> 00:06:33,710
comment on the unemployment
rate or the Fed funds rate.
98
00:06:46,050 --> 00:06:48,460
When fitting these vector
autoregressive models
99
00:06:48,460 --> 00:06:52,070
with the packages
that exist in R,
100
00:06:52,070 --> 00:06:56,320
they give us output which
provides the specification
101
00:06:56,320 --> 00:07:01,440
of each of the
autoregressive models
102
00:07:01,440 --> 00:07:05,260
for the different dependent
variables, the different series
103
00:07:05,260 --> 00:07:07,620
of the process.
104
00:07:07,620 --> 00:07:13,610
And so here is the case of the
regression model for Fed funds
105
00:07:13,610 --> 00:07:17,720
as a function of
unemployment rate lagged,
106
00:07:17,720 --> 00:07:21,040
Fed funds rate lagged,
and CPI lagged.
107
00:07:21,040 --> 00:07:25,240
These are all on
different scales.
108
00:07:25,240 --> 00:07:27,730
When you're looking at
these results, what's
109
00:07:27,730 --> 00:07:31,340
important is
basically how strong
110
00:07:31,340 --> 00:07:33,850
the signal-to-noise
ratio is for estimating
111
00:07:33,850 --> 00:07:37,590
these autoregressive
parameters, vector
112
00:07:37,590 --> 00:07:39,130
autoregressive parameters.
113
00:07:39,130 --> 00:07:43,540
And so with the Fed funds,
you can look at the t values.
114
00:07:43,540 --> 00:07:45,920
And t values that
are larger than 2
115
00:07:45,920 --> 00:07:49,210
are certainly quite significant.
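[A sketch of how one pulls up this output in R with the vars package; the column name FEDFUNDS is a placeholder.]

```r
# Each equation of the fitted VAR is an ordinary least squares
# regression; the summary for the Fed funds equation reports the
# coefficients together with their t values.
fit_d <- VAR(dy, p = 2, type = "const")
summary(fit_d)$varresult$FEDFUNDS
```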
116
00:07:49,210 --> 00:07:53,540
And you can see that basically
the unemployment rate
117
00:07:53,540 --> 00:07:59,250
coefficient is a negative
0.71, so if the unemployment
118
00:07:59,250 --> 00:08:05,270
rate goes up, we expect to
see the Fed rate going down
119
00:08:05,270 --> 00:08:07,080
the next month.
120
00:08:07,080 --> 00:08:15,650
And the Fed funds rate for the
lag 1 has a t value of 7.97.
121
00:08:15,650 --> 00:08:18,790
So these are now models
on the differences.
122
00:08:18,790 --> 00:08:21,480
So if the Fed funds
rate was increased
123
00:08:21,480 --> 00:08:25,880
last month or last quarter, it's
likely to be increased again.
124
00:08:25,880 --> 00:08:31,560
And that's partly a factor
of how slow the economy is
125
00:08:31,560 --> 00:08:34,049
in reacting to changes
and how the Fed doesn't
126
00:08:34,049 --> 00:08:40,200
want to shock the economy with
large changes in their policy
127
00:08:40,200 --> 00:08:42,909
rates.
128
00:08:42,909 --> 00:08:46,600
Another thing to notice here
is that there's actually
129
00:08:46,600 --> 00:08:50,230
a negative coefficient
on the lag 2
130
00:08:50,230 --> 00:08:54,490
Fed funds term, a negative 0.17.
131
00:08:54,490 --> 00:08:58,870
And in interpreting
these kinds of models,
132
00:08:58,870 --> 00:09:02,510
I think it's helpful
just to think of,
133
00:09:02,510 --> 00:09:06,210
if you have Fed
funds sub t, that's
134
00:09:06,210 --> 00:09:13,970
equal to minus 0.71 times the
unemployment rate at t minus 1.
135
00:09:13,970 --> 00:09:24,050
And then we have plus 0.37 times
the Fed funds, so t minus 1.
136
00:09:24,050 --> 00:09:24,820
And this is delta.
137
00:09:24,820 --> 00:09:31,330
And then minus 0.18
times the Fed funds.
138
00:09:31,330 --> 00:09:35,000
So t minus 2.
139
00:09:35,000 --> 00:09:39,290
In interpreting
these coefficients,
140
00:09:39,290 --> 00:09:43,020
notice that these
two terms correspond
141
00:09:43,020 --> 00:09:57,110
to 0.19 times the Fed funds
change 1 lag ago plus 0.18
142
00:09:57,110 --> 00:09:59,445
times the change in that change, the second difference.
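[The algebra behind this rearrangement, with the coefficients as quoted in the lecture (rounded), is simply:]

```latex
\Delta \mathrm{FF}_t \approx -0.71\,\Delta \mathrm{UNRATE}_{t-1}
  + 0.37\,\Delta \mathrm{FF}_{t-1} - 0.18\,\Delta \mathrm{FF}_{t-2},
\qquad
0.37\,\Delta \mathrm{FF}_{t-1} - 0.18\,\Delta \mathrm{FF}_{t-2}
  = 0.19\,\Delta \mathrm{FF}_{t-1}
  + 0.18\bigl(\Delta \mathrm{FF}_{t-1} - \Delta \mathrm{FF}_{t-2}\bigr).
```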
143
00:10:03,550 --> 00:10:06,360
So when you see
multiple lags coming
144
00:10:06,360 --> 00:10:11,720
into play in these models,
the interpretation of them
145
00:10:11,720 --> 00:10:17,560
can be made by considering
different transformations
146
00:10:17,560 --> 00:10:20,210
essentially of the
underlying variables.
147
00:10:20,210 --> 00:10:23,130
In this form, you can see
that OK, the Fed funds
148
00:10:23,130 --> 00:10:30,180
tends to change the way it
changed the previous month.
149
00:10:30,180 --> 00:10:38,644
But it also may change
depending on the double change
150
00:10:38,644 --> 00:10:39,560
in the previous month.
151
00:10:39,560 --> 00:10:42,620
So there's a degree of
acceleration in the Fed funds
152
00:10:42,620 --> 00:10:44,450
that is being captured here.
153
00:10:44,450 --> 00:10:47,640
So the interpretation
of these models
154
00:10:47,640 --> 00:10:51,930
sometimes requires some care.
155
00:10:51,930 --> 00:10:55,560
This kind of analysis,
I find it quite useful.
156
00:11:02,600 --> 00:11:09,710
So let's push on
to the next topic.
157
00:11:09,710 --> 00:11:13,230
So today's topics are going
to begin with a discussion
158
00:11:13,230 --> 00:11:15,640
of cointegration.
159
00:11:15,640 --> 00:11:18,980
Cointegration is a major topic
in time series analysis, which
160
00:11:18,980 --> 00:11:23,980
is dealing with the analysis
of non-stationary time series.
161
00:11:23,980 --> 00:11:28,060
And in the previous
discussion, we
162
00:11:28,060 --> 00:11:29,910
addressed
non-stationarity of series
163
00:11:29,910 --> 00:11:32,214
by taking first
differences to eliminate
164
00:11:32,214 --> 00:11:33,130
that non-stationarity.
165
00:11:36,440 --> 00:11:40,140
But we may be losing
some information
166
00:11:40,140 --> 00:11:41,450
with that differencing.
167
00:11:41,450 --> 00:11:44,940
And cointegration
provides a framework
168
00:11:44,940 --> 00:11:47,440
within which we
characterize all available
169
00:11:47,440 --> 00:11:49,680
information for
statistical modeling,
170
00:11:49,680 --> 00:11:52,920
in a very systematic way.
171
00:11:52,920 --> 00:11:58,580
So let's introduce the
context within which
172
00:11:58,580 --> 00:12:00,630
cointegration is relevant.
173
00:12:00,630 --> 00:12:05,810
It's relevant when we
have a stochastic process,
174
00:12:05,810 --> 00:12:08,620
a multivariate
stochastic process, which
175
00:12:08,620 --> 00:12:12,060
is integrated of some order d.
176
00:12:12,060 --> 00:12:15,810
And to be integrated
of order d means
177
00:12:15,810 --> 00:12:18,920
that if we take the
d-th difference,
178
00:12:18,920 --> 00:12:21,395
then that d-th
difference is stationary.
179
00:12:23,980 --> 00:12:33,720
So if you look
at a time series
180
00:12:33,720 --> 00:12:38,630
and you plot that over time,
well, OK, a stationary time
181
00:12:38,630 --> 00:12:43,010
series we know should be
something that basically
182
00:12:43,010 --> 00:12:45,010
has a constant mean over time.
183
00:12:45,010 --> 00:12:48,580
There's some steady
mean that it has.
184
00:12:48,580 --> 00:12:51,470
And the variability
is also constant.
185
00:12:51,470 --> 00:12:59,000
With some other time series,
it might increase linearly
186
00:12:59,000 --> 00:13:00,940
over time.
187
00:13:00,940 --> 00:13:03,600
And a series that increases
linearly over time, well,
188
00:13:03,600 --> 00:13:05,070
if you take first
differences, that
189
00:13:05,070 --> 00:13:07,650
tends to take out
that linear trend.
190
00:13:07,650 --> 00:13:10,230
If higher-order
differencing is required, then
191
00:13:10,230 --> 00:13:14,160
that means that there's some
curvature, quadratic say,
192
00:13:14,160 --> 00:13:18,760
that may exist in the data
that is being taken out.
193
00:13:18,760 --> 00:13:25,460
So this differencing is required
to result in stationarity.
194
00:13:25,460 --> 00:13:32,430
If the process does have a vector
autoregressive representation
195
00:13:32,430 --> 00:13:35,330
in spite of its
non-stationarity,
196
00:13:35,330 --> 00:13:43,920
then it can be represented by
a polynomial lag of the X's set
197
00:13:43,920 --> 00:13:48,690
equal to white noise epsilon.
198
00:13:48,690 --> 00:13:53,590
And the polynomial
phi of L is going
199
00:13:53,590 --> 00:13:59,180
to have a factor term
in there of 1 minus L,
200
00:13:59,180 --> 00:14:02,100
basically the first
difference to the d power.
201
00:14:02,100 --> 00:14:06,300
So if taking the
d-th order difference
202
00:14:06,300 --> 00:14:12,430
reduces it to
stationarity, then we
203
00:14:12,430 --> 00:14:16,630
can express this vector
autoregression in this way.
204
00:14:16,630 --> 00:14:26,620
So the phi star of L
basically represents
205
00:14:26,620 --> 00:14:31,110
the stationary vector
autoregressive process
206
00:14:31,110 --> 00:14:33,255
on the d-th difference series.
207
00:14:47,730 --> 00:14:52,780
Now, as it says here, each
of the component series
208
00:14:52,780 --> 00:14:57,090
may be non-stationary and
integrated, say of order one.
209
00:14:57,090 --> 00:15:02,770
But the process itself may
not be jointly integrated,
210
00:15:02,770 --> 00:15:08,900
in that there may
be linear combinations
211
00:15:08,900 --> 00:15:13,800
of our multivariate series
which are stationary.
212
00:15:13,800 --> 00:15:20,570
And so these linear
combinations basically
213
00:15:20,570 --> 00:15:25,050
represent the stationary
features of the process.
214
00:15:25,050 --> 00:15:31,160
And those features can be
apparent without looking
215
00:15:31,160 --> 00:15:32,490
at differences.
216
00:15:32,490 --> 00:15:35,350
So in a sense, if
you just focused
217
00:15:35,350 --> 00:15:38,880
on differences of these
non-stationary multivariate
218
00:15:38,880 --> 00:15:43,560
series, you would be
losing out on information
219
00:15:43,560 --> 00:15:49,900
of the stationary structure
of contemporaneous components
220
00:15:49,900 --> 00:15:52,230
of the multivariate series.
221
00:15:52,230 --> 00:15:56,130
And so cointegration
deals with this situation
222
00:15:56,130 --> 00:16:01,480
where some linear combinations
of the multivariate series
223
00:16:01,480 --> 00:16:02,996
in fact are stationary.
224
00:16:08,810 --> 00:16:15,090
So how do we represent
that mathematically?
225
00:16:15,090 --> 00:16:19,020
Well, we say that this
multivariate time series
226
00:16:19,020 --> 00:16:24,360
process is cointegrated if
there exists an m-vector beta
227
00:16:24,360 --> 00:16:29,470
such that, defining linear
weights on the X's,
228
00:16:29,470 --> 00:16:32,225
beta prime X_t is a
stationary process.
229
00:16:37,920 --> 00:16:42,610
The cointegration vector
beta can be scaled arbitrarily.
230
00:16:42,610 --> 00:16:49,110
So it's common
practice, if one has
231
00:16:49,110 --> 00:16:51,200
an interest, some primary
interest, perhaps,
232
00:16:51,200 --> 00:16:53,580
in the first component
series of the process,
233
00:16:53,580 --> 00:16:56,680
to set that equal to 1.
234
00:16:56,680 --> 00:17:01,020
And the expression
basically says
235
00:17:01,020 --> 00:17:06,470
that our time t value
of the first series
236
00:17:06,470 --> 00:17:11,930
is related in a stationary
way to a linear combination
237
00:17:11,930 --> 00:17:15,550
of the other m minus 1 series.
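[In symbols, with the first coefficient normalized to 1, the cointegrating relation reads as a long-run equilibrium for the first series:]

```latex
\beta = (1, -\beta_2, \ldots, -\beta_m)', \qquad
\beta' X_t = X_{1,t} - \beta_2 X_{2,t} - \cdots - \beta_m X_{m,t} = u_t,
\quad u_t \ \text{stationary}.
```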
238
00:17:15,550 --> 00:17:21,859
And this is a long-run
equilibrium type relationship.
239
00:17:21,859 --> 00:17:25,510
How does this arise?
240
00:17:25,510 --> 00:17:30,570
Well, it arises in many, many
ways in economics and finance.
241
00:17:33,100 --> 00:17:36,000
The term structure of interest
rates, purchasing power parity.
242
00:17:38,820 --> 00:17:42,660
In the term structure
of interest rates,
243
00:17:42,660 --> 00:17:47,100
basically the differences
between yields
244
00:17:47,100 --> 00:17:50,260
on interest rates over
different maturities,
245
00:17:50,260 --> 00:17:52,600
those differences
might be stationary.
246
00:17:52,600 --> 00:17:56,780
The overall level of interest rates
might not be stationary,
247
00:17:56,780 --> 00:18:01,350
but the spreads ought
to be stationary.
248
00:18:01,350 --> 00:18:04,680
Purchasing power parity
in foreign exchange,
249
00:18:04,680 --> 00:18:10,940
if you look at the
value of currencies
250
00:18:10,940 --> 00:18:14,830
for different countries,
basically different countries
251
00:18:14,830 --> 00:18:19,710
ought to be able to purchase
the same goods for roughly
252
00:18:19,710 --> 00:18:20,720
the same price.
253
00:18:20,720 --> 00:18:23,860
And so if there are
disparities in currency values,
254
00:18:23,860 --> 00:18:27,740
purchasing power parity suggests
that things will revert back
255
00:18:27,740 --> 00:18:32,900
to some norm where everybody
is paying on average over time
256
00:18:32,900 --> 00:18:34,960
the same amount for
different goods.
257
00:18:34,960 --> 00:18:37,460
Otherwise, there
would be arbitrage.
258
00:18:40,030 --> 00:18:41,890
Money demand, covered
interest rate parity,
259
00:18:41,890 --> 00:18:44,340
law of one price,
spot and futures.
260
00:18:44,340 --> 00:18:48,470
Let me show you
another example that
261
00:18:48,470 --> 00:18:54,820
will be in the case
study for this chapter.
262
00:19:00,290 --> 00:19:06,410
View, full screen.
263
00:19:06,410 --> 00:19:09,900
Let's think about
energy futures.
264
00:19:09,900 --> 00:19:13,450
In fact, next Tuesday's
talk from Morgan Stanley
265
00:19:13,450 --> 00:19:18,490
is going to be by an expert in
commodity futures and options.
266
00:19:18,490 --> 00:19:21,090
And that should be
very interesting.
267
00:19:21,090 --> 00:19:28,920
Anyway, here, I'm
looking at energy futures
268
00:19:28,920 --> 00:19:31,136
from the Energy
Information Administration.
269
00:19:31,136 --> 00:19:32,510
Actually, for this
course, trying
270
00:19:32,510 --> 00:19:36,970
to get data that's freely
available to students
271
00:19:36,970 --> 00:19:40,560
is one of the things we do.
272
00:19:40,560 --> 00:19:42,646
So this data is actually
available from the Energy
273
00:19:42,646 --> 00:19:44,770
Information Administration
of the government, which
274
00:19:44,770 --> 00:19:48,960
is now open, so I guess
that'll be updated over time.
275
00:19:48,960 --> 00:19:52,070
But basically these
energy futures
276
00:19:52,070 --> 00:19:55,570
are traded on the Chicago
Mercantile Exchange.
277
00:19:55,570 --> 00:20:03,290
And basically CL is crude,
West Texas Intermediate crude,
278
00:20:03,290 --> 00:20:08,760
light crude, which we have
here, a time series from 2006
279
00:20:08,760 --> 00:20:12,670
to basically yesterday.
280
00:20:12,670 --> 00:20:16,340
And you can see how it was around
$60 at the start of the period,
281
00:20:16,340 --> 00:20:19,080
and then went up
to close to $140,
282
00:20:19,080 --> 00:20:22,440
and then it dropped
down to around $40.
283
00:20:22,440 --> 00:20:26,110
And it's been hovering
around $100 lately.
284
00:20:26,110 --> 00:20:33,040
The second series here is
gasoline, RBOB gasoline.
285
00:20:33,040 --> 00:20:36,240
Always have to look this up.
286
00:20:36,240 --> 00:20:42,690
This stands for reformulated blendstock
for oxygenated blending
287
00:20:42,690 --> 00:20:43,250
gasoline.
288
00:20:43,250 --> 00:20:48,030
Anyway, futures on this product
are traded at the CME as well.
289
00:20:48,030 --> 00:20:50,750
And then heating oil.
290
00:20:50,750 --> 00:20:56,780
And what's happening
with these data
291
00:20:56,780 --> 00:21:08,880
is that we have basically
a refinery which processes
292
00:21:08,880 --> 00:21:15,990
crude oil as an input.
293
00:21:15,990 --> 00:21:20,180
And it basically
refines it, distills it,
294
00:21:20,180 --> 00:21:36,600
and generates outputs, which
include heating oil, gasoline,
295
00:21:36,600 --> 00:21:41,680
and various other things
like jet fuel and others.
296
00:21:41,680 --> 00:21:46,460
So if we're looking
at the prices,
297
00:21:46,460 --> 00:21:49,510
the futures prices of, say,
gasoline and heating oil,
298
00:21:49,510 --> 00:21:55,710
relating those to crude
oil, well, certainly,
299
00:21:55,710 --> 00:21:59,140
the cost of producing these
products should depend
300
00:21:59,140 --> 00:22:01,820
on the cost of the input.
301
00:22:01,820 --> 00:22:10,480
So I've got in the next plot,
a translation of these futures
302
00:22:10,480 --> 00:22:15,510
contracts into their
price per barrel.
303
00:22:15,510 --> 00:22:19,320
Turns out crude is quoted
in dollars per barrel.
304
00:22:19,320 --> 00:22:24,390
And the gasoline and heating
oil are in cents per gallon.
305
00:22:24,390 --> 00:22:26,490
So one multiplies.
306
00:22:26,490 --> 00:22:28,310
There are 42
gallons in a barrel.
307
00:22:28,310 --> 00:22:30,960
So you multiply those
price series by 42.
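[A minimal sketch of that unit conversion and the resulting spreads in R; the series names CL, RB, and HO are placeholders for the near-month futures prices.]

```r
gal_per_bbl <- 42

# Put the per-gallon product quotes on the same per-barrel scale
# as crude, then form the simple 1:1 spreads of output over input.
rb_bbl <- RB * gal_per_bbl   # gasoline, per barrel
ho_bbl <- HO * gal_per_bbl   # heating oil, per barrel

rb_crack <- rb_bbl - CL      # gasoline crack spread
ho_crack <- ho_bbl - CL      # heating oil crack spread
```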
308
00:22:30,960 --> 00:22:33,549
And this shows the plot of
the prices of the futures
309
00:22:33,549 --> 00:22:35,590
where we're looking at
essentially the same units
310
00:22:35,590 --> 00:22:40,600
of output relative to input.
311
00:22:40,600 --> 00:22:45,700
And what's evident here is that
the futures for gasoline,
312
00:22:45,700 --> 00:22:50,450
the blue, are consistently above
the green, the input, and the same
313
00:22:50,450 --> 00:22:52,520
for heating oil.
314
00:22:52,520 --> 00:22:55,680
And those vary depending
on which is greater.
315
00:22:55,680 --> 00:23:02,600
So if we look at the
difference between, say,
316
00:23:02,600 --> 00:23:07,020
the price of the heating
oil future and the crude oil
317
00:23:07,020 --> 00:23:11,625
future, what does
that represent?
318
00:23:14,380 --> 00:23:20,780
That's the spread in value of
the output minus the input.
319
00:23:20,780 --> 00:23:21,546
Ray?
320
00:23:21,546 --> 00:23:24,282
AUDIENCE: [INAUDIBLE] cost
of running the refinery?
321
00:23:27,146 --> 00:23:31,940
PROFESSOR: So cost of refining.
322
00:23:31,940 --> 00:23:39,700
So let's look at, say,
heating oil minus CL and, say,
323
00:23:39,700 --> 00:23:43,930
this RBOB minus CL.
324
00:23:43,930 --> 00:23:46,670
So it's cost of refining.
325
00:23:46,670 --> 00:23:49,487
What else could
be a factor here?
326
00:23:49,487 --> 00:23:51,820
AUDIENCE: Supply and demand
characteristics [INAUDIBLE].
327
00:23:51,820 --> 00:23:52,736
PROFESSOR: Definitely.
328
00:23:52,736 --> 00:23:54,165
Supply and demand.
329
00:23:54,165 --> 00:23:56,290
If one product is demanded
a lot more than another.
330
00:23:58,280 --> 00:23:59,030
Supply and demand.
331
00:24:05,820 --> 00:24:08,215
Anything else?
332
00:24:08,215 --> 00:24:09,840
AUDIENCE: Maybe for
the outputs, if you
333
00:24:09,840 --> 00:24:11,340
were to find the difference
between the outputs,
334
00:24:11,340 --> 00:24:13,060
it would be something cyclical.
335
00:24:13,060 --> 00:24:15,640
For example, in the
winter, heating oil
336
00:24:15,640 --> 00:24:17,840
is going to get far more
valuable as gasoline,
337
00:24:17,840 --> 00:24:19,840
because people drive less
and people demand more
338
00:24:19,840 --> 00:24:20,950
for heating homes.
339
00:24:20,950 --> 00:24:22,080
PROFESSOR: Absolutely.
340
00:24:22,080 --> 00:24:25,670
That's a very significant
factor with these.
341
00:24:25,670 --> 00:24:29,230
There are seasonal effects
that drive supply and demand.
342
00:24:29,230 --> 00:24:35,460
And so we can put
seasonal effects in there
343
00:24:35,460 --> 00:24:36,980
as affecting supply and demand.
344
00:24:36,980 --> 00:24:40,280
But certainly, you might expect
to see seasonal structure here.
345
00:24:40,280 --> 00:24:43,720
Anything else?
346
00:24:43,720 --> 00:24:47,070
Put on your trader's hat.
347
00:24:47,070 --> 00:24:49,310
Profit, yes.
348
00:24:49,310 --> 00:24:53,160
The refinery needs
to make some profit.
349
00:24:53,160 --> 00:24:58,520
So there has to be some
level of profit that's
350
00:24:58,520 --> 00:25:02,240
acceptable and appropriate.
351
00:25:02,240 --> 00:25:05,250
So we have all these
things driving basically
352
00:25:05,250 --> 00:25:07,630
these differences.
353
00:25:07,630 --> 00:25:10,220
Let's just take a look
at those differences.
354
00:25:10,220 --> 00:25:14,880
These are actually
called the crack spreads.
355
00:25:14,880 --> 00:25:19,250
Cracking in the
business of refining
356
00:25:19,250 --> 00:25:22,220
is basically the
breaking down of oil
357
00:25:22,220 --> 00:25:26,250
into components, products.
358
00:25:26,250 --> 00:25:31,800
And on the top is the
gasoline crack spread.
359
00:25:31,800 --> 00:25:35,460
And the bottom is the
heating oil crack spread.
360
00:25:35,460 --> 00:25:37,720
And one can see
that as time series,
361
00:25:37,720 --> 00:25:41,860
these actually look stationary.
362
00:25:41,860 --> 00:25:45,920
There certainly doesn't appear
to be a linear trend up.
363
00:25:45,920 --> 00:25:51,390
But there are, of course, many
factors that could affect this.
364
00:25:51,390 --> 00:25:59,110
So with that as motivation, how
would we model such a series?
365
00:25:59,110 --> 00:26:01,230
So let's go back to
our lecture here.
366
00:26:06,420 --> 00:26:08,775
All right, View, full size.
367
00:26:15,760 --> 00:26:18,430
This is going to be a
very technical discussion,
368
00:26:18,430 --> 00:26:25,460
but it's, at the end of the day,
I think fairly straightforward.
369
00:26:25,460 --> 00:26:27,210
And the objective
actually of this lecture
370
00:26:27,210 --> 00:26:31,240
is to provide an introduction
to the notation here, which
371
00:26:31,240 --> 00:26:35,860
should make it seem like it's a
very straightforward derivation
372
00:26:35,860 --> 00:26:37,800
process of these models.
373
00:26:37,800 --> 00:26:42,890
So let's begin with just a recap
of the vector autoregressive
374
00:26:42,890 --> 00:26:45,350
model of order p.
375
00:26:45,350 --> 00:26:47,570
This is the extension of
the univariate case where
376
00:26:47,570 --> 00:26:52,870
we have a vector C of
constants, m constants,
377
00:26:52,870 --> 00:26:56,960
and matrices phi_1 to
phi_p corresponding
378
00:26:56,960 --> 00:27:01,650
to basically how the
autoregression of one series
379
00:27:01,650 --> 00:27:04,810
depends on all the other series.
380
00:27:04,810 --> 00:27:08,270
And then there's multivariate
white noise eta_t,
381
00:27:08,270 --> 00:27:13,630
which has mean 0 and some
covariance structure in it.
382
00:27:13,630 --> 00:27:19,830
And the stationarity-- if
this series were stationary,
383
00:27:19,830 --> 00:27:28,050
then the determinant of
this matrix polynomial
384
00:27:28,050 --> 00:27:33,360
would have roots outside the
unit circle for complex z.
385
00:27:33,360 --> 00:27:39,290
And if it's not stationary,
then some of those roots
386
00:27:39,290 --> 00:27:41,680
will be on the unit
circle or beyond.
387
00:27:41,680 --> 00:27:45,125
So let's actually go to
that non-stationary case
388
00:27:45,125 --> 00:27:50,540
and suppose that the process
is integrated of order one.
389
00:27:50,540 --> 00:27:53,050
So if we were to take
first differences,
390
00:27:53,050 --> 00:27:54,175
we would have stationarity.
391
00:28:02,690 --> 00:28:06,500
Well, the derivation
of the model
392
00:28:06,500 --> 00:28:12,150
proceeds by converting the
original vector autoregressive
393
00:28:12,150 --> 00:28:16,050
equation into an
equation that's mostly
394
00:28:16,050 --> 00:28:19,560
relating to differences but
with also some extra terms.
395
00:28:19,560 --> 00:28:24,130
So let's begin the process
by just subtracting
396
00:28:24,130 --> 00:28:26,620
the lagged value of
the multivariate vector
397
00:28:26,620 --> 00:28:29,030
from the original series.
398
00:28:29,030 --> 00:28:31,290
So we subtract X_(t-1)
from both sides,
399
00:28:31,290 --> 00:28:37,330
and we get delta X_t is equal to
C plus (phi_1 minus I_m) X_(t-1)
400
00:28:37,330 --> 00:28:38,200
plus the rest.
401
00:28:38,200 --> 00:28:41,960
So that's a very simple step.
402
00:28:41,960 --> 00:28:46,220
We're just subtracting the
lagged multivariate series
403
00:28:46,220 --> 00:28:49,370
from both sides.
404
00:28:49,370 --> 00:28:53,290
Now, what we want
to do is convert
405
00:28:53,290 --> 00:28:59,930
the second term in the middle
line into a difference term.
406
00:28:59,930 --> 00:29:00,990
So what do we do?
407
00:29:00,990 --> 00:29:07,900
Well, we can subtract and add
(phi_1 minus I_m) times X_(t-2).
408
00:29:07,900 --> 00:29:10,440
If we do that,
subtract and add that,
409
00:29:10,440 --> 00:29:13,810
we then get that delta X_t is
C plus a multiple of delta
410
00:29:13,810 --> 00:29:19,530
X_(t-1) plus this
multiple of X_(t-2).
411
00:29:19,530 --> 00:29:22,240
So we basically
reduced the equations
412
00:29:22,240 --> 00:29:25,290
to differences in
the first two terms
413
00:29:25,290 --> 00:29:29,520
or in the current
series and the lagged.
414
00:29:29,520 --> 00:29:33,550
But then we have the original
series at lag t minus 2.
415
00:29:33,550 --> 00:29:38,660
We can continue this
process with the third.
416
00:29:38,660 --> 00:29:42,460
And then at the
end of the day, we
417
00:29:42,460 --> 00:29:46,150
end up getting this equation
for the difference of the series
418
00:29:46,150 --> 00:29:49,300
is equal to a constant
plus a matrix multiple
419
00:29:49,300 --> 00:29:53,880
of the first difference
multivariate series,
420
00:29:53,880 --> 00:29:56,920
plus another matrix times
the second difference,
421
00:29:56,920 --> 00:30:01,720
all the way down to
the p-th difference,
422
00:30:01,720 --> 00:30:03,760
or the p minus first difference.
423
00:30:03,760 --> 00:30:07,400
But at the end,
we're left with terms
424
00:30:07,400 --> 00:30:11,320
at p lags that have no
differences in them.
425
00:30:11,320 --> 00:30:14,440
So we've been able to
represent this series
426
00:30:14,440 --> 00:30:19,090
as an autoregressive
function of differences.
427
00:30:19,090 --> 00:30:24,010
But there's also a term on
the undifferenced series
428
00:30:24,010 --> 00:30:27,470
at the end that's left over.
429
00:30:27,470 --> 00:30:34,900
Or this argument
can actually
430
00:30:34,900 --> 00:30:38,330
proceed by eliminating
differences in the reverse way,
431
00:30:38,330 --> 00:30:42,650
starting with the
p-th lag and going up.
432
00:30:42,650 --> 00:30:47,200
And one then can represent
this as delta X_t
433
00:30:47,200 --> 00:30:50,170
is C plus some
matrix times just the
434
00:30:50,170 --> 00:30:56,000
lagged series plus various
matrices times the differences
435
00:30:56,000 --> 00:30:58,880
going back p minus 1 lags.
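[Collecting the terms, this is the error-correction form of the VAR(p); under the usual convention the coefficient matrices work out to:]

```latex
\Delta X_t = C + \Pi X_{t-1}
  + \sum_{j=1}^{p-1} \Gamma_j\, \Delta X_{t-j} + \eta_t,
\qquad
\Pi = -\Bigl(I_m - \sum_{j=1}^{p} \Phi_j\Bigr),
\qquad
\Gamma_j = -\sum_{k=j+1}^{p} \Phi_k .
```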
436
00:31:05,460 --> 00:31:10,200
And so at the end of
the day, this model
437
00:31:10,200 --> 00:31:14,270
basically for delta
X_t is a constant
438
00:31:14,270 --> 00:31:20,760
plus a matrix times the
previous lagged series
439
00:31:20,760 --> 00:31:25,660
or the first lag of the
multivariate time series,
440
00:31:25,660 --> 00:31:30,320
plus various autoregressive
lags of the differenced series.
441
00:31:32,960 --> 00:31:36,130
So these notes give you
the formulas for those,
442
00:31:36,130 --> 00:31:40,840
and they're very easy to
verify if you go through them
443
00:31:40,840 --> 00:31:41,594
one by one.
444
00:31:45,730 --> 00:31:51,760
And when we look at this
expression for the model,
445
00:31:51,760 --> 00:31:57,270
this expresses the
stochastic process model
446
00:31:57,270 --> 00:31:59,560
for the difference series.
447
00:31:59,560 --> 00:32:03,780
This difference
series is stationary.
448
00:32:03,780 --> 00:32:05,970
We've eliminated
the non-stationarity
449
00:32:05,970 --> 00:32:06,630
in the process.
450
00:32:06,630 --> 00:32:09,160
So that means the
right-hand side
451
00:32:09,160 --> 00:32:12,890
has to be stationary as well.
452
00:32:12,890 --> 00:32:19,890
And so while the terms which
are matrix multiples of lags
453
00:32:19,890 --> 00:32:21,390
of the differenced
series, those are
454
00:32:21,390 --> 00:32:23,750
going to be stationary
because we're just
455
00:32:23,750 --> 00:32:27,680
taking lags of the
stationary multivariate time
456
00:32:27,680 --> 00:32:29,540
series, the difference series.
457
00:32:29,540 --> 00:32:36,880
But this pi X_t term has
to be stationary as well.
458
00:32:36,880 --> 00:32:41,640
So this pi X_t contains
the cointegrating terms.
459
00:32:41,640 --> 00:32:46,600
And fitting a sort of
cointegrated vector
460
00:32:46,600 --> 00:32:53,490
autoregression model involves
identifying this term, pi X_t.
461
00:32:53,490 --> 00:33:00,870
And given that the original
series had unit roots,
462
00:33:00,870 --> 00:33:06,195
it has to be the case that
pi, the matrix, is singular.
463
00:33:09,550 --> 00:33:12,080
So it's basically
a transformation
464
00:33:12,080 --> 00:33:15,310
of the data that
eliminates that unit
465
00:33:15,310 --> 00:33:19,880
root in the overall series.
466
00:33:19,880 --> 00:33:24,440
So the matrix pi
is of reduced rank,
467
00:33:24,440 --> 00:33:27,676
and it's either rank
zero, in which case
468
00:33:27,676 --> 00:33:29,300
there are no cointegrating
relationships,
469
00:33:29,300 --> 00:33:34,500
or its rank is less than m.
470
00:33:34,500 --> 00:33:39,060
And the matrix pi does
define the cointegrating
471
00:33:39,060 --> 00:33:40,550
relationships.
472
00:33:40,550 --> 00:33:43,080
Now, these cointegrating
relationships
473
00:33:43,080 --> 00:33:48,990
are the relationships in the
process that are stationary.
474
00:33:48,990 --> 00:33:53,200
And so basically there's
a lot of information
475
00:33:53,200 --> 00:33:57,880
in that multivariate series
with contemporaneous values
476
00:33:57,880 --> 00:33:59,470
of the series.
477
00:33:59,470 --> 00:34:02,500
There is stationary structure
at every single time
478
00:34:02,500 --> 00:34:08,199
point, which can be the
target of the modeling.
479
00:34:08,199 --> 00:34:16,250
So this matrix pi is
of rank r less than m.
480
00:34:16,250 --> 00:34:22,100
And so it can be expressed
as basically alpha beta
481
00:34:22,100 --> 00:34:30,540
prime, where the matrices
alpha and beta are of rank r.
482
00:34:30,540 --> 00:34:33,199
And the columns of beta define
linearly independent vectors
483
00:34:33,199 --> 00:34:34,770
which cointegrate x.
484
00:34:34,770 --> 00:34:37,909
And the decomposition
of pi isn't unique.
485
00:34:37,909 --> 00:34:43,389
You can basically, for any
invertible r by r matrix g,
486
00:34:43,389 --> 00:34:46,350
define another set of
cointegrating relationships.
487
00:34:46,350 --> 00:34:50,340
So in the linear algebra
structure of these problems,
488
00:34:50,340 --> 00:34:52,800
there's basically an
r-dimensional space
489
00:34:52,800 --> 00:34:56,360
where the process is
stationary, and how
490
00:34:56,360 --> 00:35:02,020
you define the coordinate system
in that space is up to you
491
00:35:02,020 --> 00:35:08,130
or subject to some choice.
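[In symbols: with rank r, 0 < r < m,]

```latex
\Pi = \alpha\,\beta', \qquad \alpha,\,\beta \in \mathbb{R}^{m \times r},
\qquad
\Pi = \bigl(\alpha G^{-1}\bigr)\bigl(\beta G'\bigr)'
\ \text{for any invertible } G \in \mathbb{R}^{r \times r},
```

so the columns of beta span the cointegrating space but are only identified up to such a change of coordinates.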
492
00:35:08,130 --> 00:35:09,780
So how do we estimate
these models?
493
00:35:09,780 --> 00:35:15,520
Well, there's a rather nice result
of Sims, Stock, and Watson.
494
00:35:15,520 --> 00:35:17,800
Actually, Sims,
Christopher Sims,
495
00:35:17,800 --> 00:35:21,790
he got the Nobel Prize a
few years ago for his work
496
00:35:21,790 --> 00:35:23,730
in econometrics.
497
00:35:23,730 --> 00:35:33,850
And so this is a rather
significant work that he did.
498
00:35:33,850 --> 00:35:36,740
Anyway, he, together
with Stock and Watson,
499
00:35:36,740 --> 00:35:41,120
proved that if you're estimating
a vector autoregression model,
500
00:35:41,120 --> 00:35:45,490
then the least squares
estimator of the original model
501
00:35:45,490 --> 00:35:49,150
is basically sufficient
to do an analysis
502
00:35:49,150 --> 00:35:56,600
of this cointegrated vector
autoregression process.
503
00:35:56,600 --> 00:35:58,960
The parameter estimates
from just fitting
504
00:35:58,960 --> 00:36:03,610
the vector autoregression are
consistent for the underlying
505
00:36:03,610 --> 00:36:04,657
parameters.
506
00:36:04,657 --> 00:36:06,240
And they have
asymptotic distributions
507
00:36:06,240 --> 00:36:09,980
that are identical to those of
maximum likelihood estimators.
508
00:36:09,980 --> 00:36:18,360
And so what ends up happening
is the least squares estimates
509
00:36:18,360 --> 00:36:21,960
of the vector autoregression
parameters lead
510
00:36:21,960 --> 00:36:27,270
to an estimation
of the pi matrix.
511
00:36:27,270 --> 00:36:40,290
And the constraints on the pi
matrix, which are basically that pi
512
00:36:40,290 --> 00:36:44,430
is of reduced rank,
will hold asymptotically.
513
00:36:44,430 --> 00:36:49,240
So let's just go back
to the equation before,
514
00:36:49,240 --> 00:36:54,490
to see if that
looks familiar here.
515
00:36:58,930 --> 00:37:03,070
So what that work says
is that if we basically
516
00:37:03,070 --> 00:37:07,110
fit the linear regression
model regressing the difference
517
00:37:07,110 --> 00:37:13,930
series on the lag of the series
plus lags of differences,
518
00:37:13,930 --> 00:37:18,590
the least squares estimates
of these underlying parameters
519
00:37:18,590 --> 00:37:21,690
will give us asymptotically
efficient estimates
520
00:37:21,690 --> 00:37:24,060
of this overall process.
521
00:37:24,060 --> 00:37:31,635
So we don't need to use any new
tools to specify these models.
522
00:37:43,800 --> 00:37:48,110
There's an advanced literature
on estimation methods
523
00:37:48,110 --> 00:37:49,950
for these models.
524
00:37:49,950 --> 00:37:55,050
Johansen does describe
maximum likelihood estimation
525
00:37:55,050 --> 00:38:01,260
when the innovation terms
are normally distributed.
526
00:38:01,260 --> 00:38:07,270
And that methodology applies
reduced rank regression
527
00:38:07,270 --> 00:38:13,150
methodology and
yields tests for what
528
00:38:13,150 --> 00:38:17,130
the rank is of the
cointegrating relationship.
529
00:38:17,130 --> 00:38:20,270
And these methods are
implemented in R packages.
530
00:38:25,710 --> 00:38:26,420
Let's see.
531
00:38:26,420 --> 00:38:40,890
Let me just go back now
to the-- so let's see.
532
00:38:40,890 --> 00:38:47,690
The case study on
the crack spread data
533
00:38:47,690 --> 00:38:51,370
actually goes through sort of
testing for non-stationarity
534
00:38:51,370 --> 00:38:54,040
in these underlying series.
535
00:38:54,040 --> 00:38:58,360
And actually, why don't
I just show you that?
536
00:38:58,360 --> 00:38:59,450
Let's go back here.
537
00:39:17,522 --> 00:39:23,460
If you can see this, for
the crack spread data,
538
00:39:23,460 --> 00:39:25,230
looking at the
crude oil futures,
539
00:39:25,230 --> 00:39:28,450
basically the crude oil
future can be evaluated
540
00:39:28,450 --> 00:39:30,790
to see if it's non-stationary.
541
00:39:30,790 --> 00:39:33,800
And there's this augmented
Dickey-Fuller test
542
00:39:33,800 --> 00:39:36,350
for non-stationarity.
543
00:39:36,350 --> 00:39:43,160
And it basically has a null
hypothesis that the model
544
00:39:43,160 --> 00:39:46,850
or the series is non-stationary,
or it has a unit root,
545
00:39:46,850 --> 00:39:49,040
versus the alternative
that it doesn't.
546
00:39:49,040 --> 00:39:52,180
And so testing that
null hypothesis
547
00:39:52,180 --> 00:39:56,121
that it's non-stationary
yields a p-value of 0.164
548
00:39:56,121 --> 00:40:01,690
for CLC1, the first
nearest contract,
549
00:40:01,690 --> 00:40:07,400
near month contract of
the futures for crude.
550
00:40:07,400 --> 00:40:11,230
And so the data
suggests that crude
551
00:40:11,230 --> 00:40:14,060
has a distribution that's
non-stationary, integrated
552
00:40:14,060 --> 00:40:16,490
of order 1.
553
00:40:16,490 --> 00:40:23,950
And the HOC1 also basically
has a p-value
554
00:40:23,950 --> 00:40:27,550
for non-stationarity of 0.3265.
555
00:40:27,550 --> 00:40:31,000
So we can't reject
non-stationarity or unit root
556
00:40:31,000 --> 00:40:34,150
in those series with
these test statistics.
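[A sketch of these tests in R; the tseries package's adf.test is one implementation, and the series names CL1 and HO1 are placeholders for the near-month contracts.]

```r
library(tseries)

# Null hypothesis: the series has a unit root (is non-stationary).
adf.test(CL1)   # case note reports p = 0.164 for crude
adf.test(HO1)   # case note reports p = 0.3265 for heating oil
```

With p-values this large, the unit root cannot be rejected for either series.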
557
00:40:34,150 --> 00:40:39,260
In analyzing the data, this
suggests that we basically
558
00:40:39,260 --> 00:40:41,380
need to accommodate that
non-stationarity when
559
00:40:41,380 --> 00:40:43,150
we specify the models.
560
00:40:46,925 --> 00:40:49,130
Let me just see if
there's some results here.
561
00:41:55,180 --> 00:41:59,060
For this series,
actually the case notes
562
00:41:59,060 --> 00:42:01,270
will go through actually
conducting this Johansen
563
00:42:01,270 --> 00:42:03,360
procedure for
testing for the rank
564
00:42:03,360 --> 00:42:05,700
of the cointegrated process.
565
00:42:05,700 --> 00:42:11,630
And that test basically has
different test statistics
566
00:42:11,630 --> 00:42:15,260
for testing whether the rank is
0, less than or equal to 1,
567
00:42:15,260 --> 00:42:16,870
or less than or equal to 2.
568
00:42:16,870 --> 00:42:19,650
And one can see that
there's marginal-- the test
569
00:42:19,650 --> 00:42:25,930
statistic is almost
significant at the 10% level
570
00:42:25,930 --> 00:42:29,780
for the overall series.
571
00:42:29,780 --> 00:42:32,670
It's not significant
for the rank
572
00:42:32,670 --> 00:42:34,460
being less than or equal to 1.
573
00:42:34,460 --> 00:42:38,390
And so these results
don't suggest there's
574
00:42:38,390 --> 00:42:40,880
strong evidence of cointegration.
575
00:42:40,880 --> 00:42:45,360
But certainly any
cointegration
576
00:42:45,360 --> 00:42:48,620
is of rank no more than
one for these series.
577
00:42:48,620 --> 00:42:52,030
And the eigenvector
corresponding
578
00:42:52,030 --> 00:42:54,070
to the stationary
relationship is
579
00:42:54,070 --> 00:43:00,940
given by these coefficients
of 1 on the crude oil future,
580
00:43:00,940 --> 00:43:05,710
1.3 on the RBOB and minus
1.7 on the heating oil.
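[A sketch of the Johansen procedure in R using the urca package; prices is a placeholder for the matrix of the three per-barrel futures series, and the lag order K is an assumption.]

```r
library(urca)

# Trace test for the cointegrating rank: statistics for r = 0,
# r <= 1, and r <= 2 against their critical values, plus the
# estimated eigenvectors (candidate cointegrating vectors).
jo <- ca.jo(prices, type = "trace", ecdet = "const", K = 2)
summary(jo)
```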
581
00:43:08,640 --> 00:43:13,360
So what this suggests
is that there's
582
00:43:13,360 --> 00:43:20,880
considerable variability in
these energy futures contracts.
583
00:43:20,880 --> 00:43:24,390
What appears to be stationary
is some linear combination
584
00:43:24,390 --> 00:43:28,670
of crude plus gasoline
minus heating oil.
585
00:43:28,670 --> 00:43:33,090
And in terms of why does
it combine that way,
586
00:43:33,090 --> 00:43:35,280
well, there are all
kinds of factors
587
00:43:35,280 --> 00:43:38,760
that we went through-- cost of
refining, supply and demand,
588
00:43:38,760 --> 00:43:41,370
seasonality, which
affect things.
589
00:43:41,370 --> 00:43:45,970
And so when analyzed, sort
of ignoring seasonality,
590
00:43:45,970 --> 00:43:50,000
these would be the linear
combinations that appear
591
00:43:50,000 --> 00:43:51,312
to be stationary over time.
592
00:43:51,312 --> 00:43:51,812
Yeah?
593
00:43:53,722 --> 00:43:55,680
AUDIENCE: Why did you
choose to use the futures
594
00:43:55,680 --> 00:43:56,929
prices as opposed to the spot?
595
00:43:56,929 --> 00:44:00,170
And how did you combine the
data with actual [INAUDIBLE]?
596
00:44:00,170 --> 00:44:07,820
PROFESSOR: I chose this
because if refiners are wanting
597
00:44:07,820 --> 00:44:12,130
to hedge their risks, then they
will go to the futures market
598
00:44:12,130 --> 00:44:14,060
to hedge those.
599
00:44:14,060 --> 00:44:17,090
And so working with
these data, one
600
00:44:17,090 --> 00:44:24,370
can then consider problems of
hedging refinery production
601
00:44:24,370 --> 00:44:25,460
risks.
602
00:44:25,460 --> 00:44:28,620
And so that's why.
603
00:44:28,620 --> 00:44:30,960
AUDIENCE: [INAUDIBLE]
604
00:44:30,960 --> 00:44:33,800
PROFESSOR: OK, well, the Energy
Information Administration
605
00:44:33,800 --> 00:44:39,270
provides historical data
which gives the first month,
606
00:44:39,270 --> 00:44:42,030
the second month, the third
month available for each
607
00:44:42,030 --> 00:44:43,400
of these contracts.
608
00:44:43,400 --> 00:44:47,720
And so I chose the
first month contract
609
00:44:47,720 --> 00:44:49,680
for each of these futures.
610
00:44:49,680 --> 00:44:51,980
Those tend to be the most liquid.
611
00:44:51,980 --> 00:44:54,440
Depending on what
one is hedging,
612
00:44:54,440 --> 00:44:58,550
one would use perhaps
longer periods for those.
613
00:44:58,550 --> 00:45:02,450
There's some very
nice finance problems
614
00:45:02,450 --> 00:45:04,690
dealing with hedging,
hedging these kinds of risks,
615
00:45:04,690 --> 00:45:07,150
as well as trading
these kinds of risks.
616
00:45:07,150 --> 00:45:11,030
Traders can try to exploit
short term movements in these.
617
00:45:29,870 --> 00:45:31,820
Anyway, I'll let you
look through these,
618
00:45:31,820 --> 00:45:32,760
the case note later.
619
00:45:32,760 --> 00:45:36,810
And it does provide some detail
on the coefficient estimates.
620
00:45:36,810 --> 00:45:39,119
And one can basically
get a handle
621
00:45:39,119 --> 00:45:40,785
on how these things
are being specified.
622
00:45:43,980 --> 00:45:46,170
So let's go back.
623
00:45:58,260 --> 00:46:06,490
The next topic I want to cover
is linear state-space models.
624
00:46:06,490 --> 00:46:12,725
It turns out that many
of these time series
625
00:46:12,725 --> 00:46:15,090
models appropriate in
economics and finance
626
00:46:15,090 --> 00:46:20,290
can be expressed as a
linear state-space model.
627
00:46:28,590 --> 00:46:32,250
I'm going to introduce the
general notation first and then
628
00:46:32,250 --> 00:46:35,100
provide illustrations
of this general notation
629
00:46:35,100 --> 00:46:38,480
with a number of
different examples.
630
00:46:38,480 --> 00:46:46,205
So the formulation is we have
basically an observation vector
631
00:46:46,205 --> 00:46:47,420
at time t, y_t.
632
00:46:47,420 --> 00:46:50,730
This is our multivariate time
series that we're modeling.
633
00:46:50,730 --> 00:46:53,930
Now, I've chosen it
to be k-dimensional
634
00:46:53,930 --> 00:46:57,900
for the observations.
635
00:46:57,900 --> 00:47:00,720
There's an underlying
state vector
636
00:47:00,720 --> 00:47:04,390
that's of m dimensions,
which basically characterizes
637
00:47:04,390 --> 00:47:11,740
the state of the
process at time t.
638
00:47:11,740 --> 00:47:15,240
There's an observation error
vector at time t, epsilon_t.
639
00:47:15,240 --> 00:47:18,830
So it's k by 1 as well,
corresponding to y.
640
00:47:18,830 --> 00:47:22,200
And there's a state transition
innovation error vector,
641
00:47:22,200 --> 00:47:31,240
which is n by 1,
which actually can
642
00:47:31,240 --> 00:47:36,040
be different from m, the
dimension of the state vector.
643
00:47:36,040 --> 00:47:41,300
So we have-- in the state
space specification,
644
00:47:41,300 --> 00:47:43,720
we're going to specify
two equations, one
645
00:47:43,720 --> 00:47:47,640
for how the states evolve
over time and another for how
646
00:47:47,640 --> 00:47:50,090
the observations or
measurements evolve,
647
00:47:50,090 --> 00:47:51,910
depending on the
underlying states.
648
00:47:51,910 --> 00:47:55,400
So let's first focus
on a state equation
649
00:47:55,400 --> 00:47:58,490
which describes how
the state progresses
650
00:47:58,490 --> 00:48:05,680
from the state at time t to
the state at time t plus 1.
651
00:48:05,680 --> 00:48:09,030
Because this is a linear
state-space model,
652
00:48:09,030 --> 00:48:10,710
basically the state
at t plus 1 is
653
00:48:10,710 --> 00:48:13,400
going to be some linear
function of the states at time
654
00:48:13,400 --> 00:48:16,640
t plus some noise.
655
00:48:16,640 --> 00:48:22,570
And that noise is
given by eta_t,
656
00:48:22,570 --> 00:48:26,670
being independent identically
distributed white noise,
657
00:48:26,670 --> 00:48:31,600
or normally distributed
with some covariance matrix
658
00:48:31,600 --> 00:48:33,910
Q_t, positive definite.
659
00:48:33,910 --> 00:48:37,740
And R_t is some
linear transformation
660
00:48:37,740 --> 00:48:41,180
of those, which
characterize the uncertainty
661
00:48:41,180 --> 00:48:42,880
in the particular states.
662
00:48:42,880 --> 00:48:45,160
So there's a great
deal of flexibility
663
00:48:45,160 --> 00:48:47,830
here in how things
depend on each other.
664
00:48:47,830 --> 00:48:53,090
And right now, it will appear
just like a lot of notation.
665
00:48:53,090 --> 00:48:54,700
But as we see it
in different cases,
666
00:48:54,700 --> 00:48:57,750
you'll see how these
terms come into play.
667
00:48:57,750 --> 00:48:59,260
And they're very
straightforward.
668
00:49:02,510 --> 00:49:04,800
So we're considering simple
linear transformations
669
00:49:04,800 --> 00:49:07,080
of the states plus noise.
670
00:49:07,080 --> 00:49:09,690
And then the observation
equation or measurement
671
00:49:09,690 --> 00:49:13,080
equation is a linear
transformation
672
00:49:13,080 --> 00:49:14,665
of the underlying
states plus noise.
673
00:49:17,230 --> 00:49:20,230
So the matrix Z_t is the
observation coefficients
674
00:49:20,230 --> 00:49:21,500
matrix.
675
00:49:21,500 --> 00:49:25,792
And the noise or innovations
epsilon_t are, we'll assume,
676
00:49:25,792 --> 00:49:27,250
independent
identically distributed
677
00:49:27,250 --> 00:49:29,083
normal, multivariate
normal random variables
678
00:49:29,083 --> 00:49:33,550
with some covariance matrix H_t.
679
00:49:33,550 --> 00:49:35,760
To be fully general,
the subscript t
680
00:49:35,760 --> 00:49:40,800
means the covariance
can depend on time t.
681
00:49:40,800 --> 00:49:44,780
It doesn't have to, but it can.
682
00:49:44,780 --> 00:49:48,600
These two equations
can be written together
683
00:49:48,600 --> 00:49:52,830
in a joint equation where
we see that the underlying
684
00:49:52,830 --> 00:49:59,370
state at time t, s, gets
transformed with T sub t
685
00:49:59,370 --> 00:50:04,550
to the state at t plus 1 plus
residual innovation term.
686
00:50:04,550 --> 00:50:08,720
And the observation equation
y_t is Z_t s_t plus that.
687
00:50:08,720 --> 00:50:12,430
So we're representing how
the states evolve over time
688
00:50:12,430 --> 00:50:14,910
and how the observations
depend on the underlying
689
00:50:14,910 --> 00:50:16,815
states in this joint equation.
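[The two equations side by side:]

```latex
s_{t+1} = T_t\, s_t + R_t\, \eta_t, \qquad \eta_t \sim N(0, Q_t)
  \quad \text{(state equation)},
\qquad
y_t = Z_t\, s_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t)
  \quad \text{(observation equation)}.
```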
690
00:50:19,770 --> 00:50:23,950
And the structure of
basically this sort
691
00:50:23,950 --> 00:50:28,400
of linear function of states
plus error, the error term u_t
692
00:50:28,400 --> 00:50:33,740
here is normally distributed
with covariance matrix omega,
693
00:50:33,740 --> 00:50:36,690
which has this structure.
694
00:50:36,690 --> 00:50:38,850
It's a block diagonal.
695
00:50:38,850 --> 00:50:42,942
We have the covariance
of the epsilons as the H.
696
00:50:42,942 --> 00:50:48,860
And the covariance of R_t
eta_t is R_t Q_t R_t transpose.
697
00:50:48,860 --> 00:50:54,660
So you may recall when we
take a covariance matrix
698
00:50:54,660 --> 00:51:01,210
of a linear function of random
variables given by a matrix,
699
00:51:01,210 --> 00:51:05,310
then it's that linear function
R times the covariance matrix
700
00:51:05,310 --> 00:51:07,970
times the transpose.
701
00:51:07,970 --> 00:51:12,910
So that term comes into play.
702
00:51:12,910 --> 00:51:16,860
So let's see how a
capital asset pricing
703
00:51:16,860 --> 00:51:19,720
model with time-varying
betas can be represented
704
00:51:19,720 --> 00:51:21,540
as a linear state-space model.
705
00:51:24,220 --> 00:51:29,180
You'll recall, we discussed
this model a few lectures ago,
706
00:51:29,180 --> 00:51:33,870
where we have the excess
return of a given stock, r_t,
707
00:51:33,870 --> 00:51:39,150
is a linear function of the
excess return of the market
708
00:51:39,150 --> 00:51:43,710
portfolio, r_(m,t), plus error.
709
00:51:43,710 --> 00:51:48,310
What we're going to do now
is extend that previous model
710
00:51:48,310 --> 00:51:54,170
by adding time dependence, t,
to the regression parameters.
711
00:51:54,170 --> 00:51:56,320
The alpha is not a constant.
712
00:51:56,320 --> 00:51:58,060
It is going to vary by time.
713
00:51:58,060 --> 00:52:02,700
And the beta is also
going to vary by time.
714
00:52:02,700 --> 00:52:04,810
And how will they vary by time?
715
00:52:04,810 --> 00:52:10,030
Well, we're going to
assume that the alpha_t is
716
00:52:10,030 --> 00:52:13,520
a Gaussian random walk.
717
00:52:13,520 --> 00:52:17,982
And the beta is also a
Gaussian random walk.
718
00:52:28,810 --> 00:52:33,670
And with that set up, we
have the following expression
719
00:52:33,670 --> 00:52:35,450
for the state equation.
720
00:52:35,450 --> 00:52:38,460
OK, the state equation, which
is just the unknown parameters--
721
00:52:38,460 --> 00:52:40,990
it's the alpha and the
beta at a given time t.
722
00:52:43,660 --> 00:52:45,720
The state at time
t gets adjusted
723
00:52:45,720 --> 00:52:49,340
to the state at time t plus 1
by just adding these random walk
724
00:52:49,340 --> 00:52:50,100
terms to it.
725
00:52:50,100 --> 00:52:52,290
So it's a very simple process.
726
00:52:52,290 --> 00:52:55,270
We have the identity
times the previous state
727
00:52:55,270 --> 00:52:58,930
plus the identity times this
vector of these innovations.
728
00:52:58,930 --> 00:53:04,120
So s_(t+1) is equal to
T_t s_t plus R_t eta_t,
729
00:53:04,120 --> 00:53:08,720
where this matrix, T sub
t and R sub t are trivial;
730
00:53:08,720 --> 00:53:10,290
they're just the identity.
731
00:53:10,290 --> 00:53:15,710
And eta_t has a
covariance matrix
732
00:53:15,710 --> 00:53:18,985
which is just given by
Q_t, sigma squared nu,
733
00:53:18,985 --> 00:53:22,560
sigma squared epsilon.
734
00:53:22,560 --> 00:53:28,680
This is a complex way, perhaps,
of representing this model.
735
00:53:28,680 --> 00:53:32,610
But it puts this simple model
into that linear state-space
736
00:53:32,610 --> 00:53:33,110
framework.
737
00:53:36,670 --> 00:53:45,660
Now, the observation equation
is given by this expression
738
00:53:45,660 --> 00:53:52,250
defining the Z_t matrix as the
unit element and r_(m,t). So
739
00:53:52,250 --> 00:53:58,150
it's basically a row vector, or
a row matrix, one-row matrix.
740
00:53:58,150 --> 00:54:02,180
And epsilon_t is the
white noise process.
741
00:54:02,180 --> 00:54:05,570
Now, putting these
equations together,
742
00:54:05,570 --> 00:54:09,270
we basically have the equation
for the state transition
743
00:54:09,270 --> 00:54:13,230
and the observation
equation together.
744
00:54:13,230 --> 00:54:16,120
We have this form for that.
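To make that concrete, here is a small R sketch of this state-space setup (all variance values and the market series are invented for illustration): the state is (alpha_t, beta_t), the transition and R matrices are identities, and Z_t = (1, r_(m,t)) varies with each observation.

# Simulate the CAPM with random-walk alpha_t and beta_t.
set.seed(2)
n <- 250
r_m <- rnorm(n, 0.0003, 0.01)              # hypothetical market excess returns
sd_alpha <- 1e-4; sd_beta <- 1e-3          # random-walk innovation std devs
sd_eps <- 0.01                             # observation noise std dev
alpha <- cumsum(rnorm(n, 0, sd_alpha))     # alpha_t: Gaussian random walk
beta  <- 1 + cumsum(rnorm(n, 0, sd_beta))  # beta_t: Gaussian random walk
r <- alpha + beta * r_m + rnorm(n, 0, sd_eps)  # observation equation
# State-space pieces: s_t = (alpha_t, beta_t)', T_t = R_t = I,
# Q_t = diag(sd_alpha^2, sd_beta^2), and Z_t is row t of cbind(1, r_m).
T_t <- diag(2)
Q_t <- diag(c(sd_alpha^2, sd_beta^2))
Z   <- cbind(1, r_m)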
745
00:54:25,780 --> 00:54:28,522
So now, let's
consider a second case
746
00:54:28,522 --> 00:54:31,360
of linear regression
models where
747
00:54:31,360 --> 00:54:33,780
we have a time-varying beta.
748
00:54:33,780 --> 00:54:37,140
In a way, this case
we just looked at
749
00:54:37,140 --> 00:54:39,999
is a simple case of that.
750
00:54:39,999 --> 00:54:41,540
But let's look at
a more general case
751
00:54:41,540 --> 00:54:45,270
where we have p independent
variables, which
752
00:54:45,270 --> 00:54:47,190
could be time-varying.
753
00:54:47,190 --> 00:54:51,670
So we have a
regression model almost
754
00:54:51,670 --> 00:54:54,040
as we've considered
it previously.
755
00:54:54,040 --> 00:54:58,400
y_t is equal to x_t transpose
beta_t plus epsilon_t.
756
00:54:58,400 --> 00:55:00,850
The difference now is our
regression coefficients
757
00:55:00,850 --> 00:55:03,580
beta are allowed to
change over time.
758
00:55:09,880 --> 00:55:11,180
How do they change over time?
759
00:55:11,180 --> 00:55:14,120
Well, we're going to
assume that those also
760
00:55:14,120 --> 00:55:19,120
follow independent random
walks with variances
761
00:55:19,120 --> 00:55:23,090
of the random walks that
may depend on the component.
762
00:55:23,090 --> 00:55:24,770
So the joint
state-space equation
763
00:55:24,770 --> 00:55:32,530
here is given by the identity
times s_t plus eta_t.
764
00:55:32,530 --> 00:55:36,360
That's basically the random
walk process for the underlying
765
00:55:36,360 --> 00:55:37,600
regression parameters.
766
00:55:37,600 --> 00:55:42,360
And y_t is equal
to x_t transpose
767
00:55:42,360 --> 00:55:46,081
times the same regression
parameters plus the observation
768
00:55:46,081 --> 00:55:46,580
error.
769
00:55:56,480 --> 00:55:59,770
I guess needless to say, if we
consider the special case where
770
00:55:59,770 --> 00:56:04,610
the random walk
process is degenerate
771
00:56:04,610 --> 00:56:07,320
and they're basically
steps of size zero,
772
00:56:07,320 --> 00:56:10,410
then we get the normal linear
regression model coming out
773
00:56:10,410 --> 00:56:11,870
of this.
774
00:56:11,870 --> 00:56:17,950
If we specify
the linear state-space
775
00:56:17,950 --> 00:56:22,810
implementation of this model and
consider successive estimates
776
00:56:22,810 --> 00:56:25,270
of the model
parameters over time,
777
00:56:25,270 --> 00:56:28,970
then these equations would
give us recursive estimates
778
00:56:28,970 --> 00:56:34,080
for updating
regressions as we add
779
00:56:34,080 --> 00:56:37,500
additional values to the
data, additional observations
780
00:56:37,500 --> 00:56:38,000
to the data.
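As a sketch of that point (the function and variable names here are my own): when the random-walk innovations are shut off, the Kalman updating step reduces to recursive least squares, refitting the regression one observation at a time.

# Recursive least squares: Kalman updating with a degenerate state equation,
# s_{t+1} = s_t, so the coefficients are constant and only the updates remain.
rls <- function(y, X, sigma2_eps = 1, C0 = diag(1e6, ncol(X))) {
  m <- rep(0, ncol(X)); C <- C0
  for (t in seq_along(y)) {
    x   <- X[t, ]
    F_t <- c(t(x) %*% C %*% x) + sigma2_eps   # predictive variance of y_t
    K   <- (C %*% x) / F_t                    # gain vector
    m   <- drop(m + K * (y[t] - sum(x * m)))  # revise coefficient estimates
    C   <- C - K %*% t(x) %*% C               # shrink their covariance
  }
  list(coef = m, cov = C)
}
# With a diffuse C0, the final estimate matches ordinary least squares:
# rls(y, cbind(1, x))$coef is essentially coef(lm(y ~ x)).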
781
00:56:43,880 --> 00:56:49,960
Let's look at autoregressive
models of order p.
782
00:56:49,960 --> 00:56:55,780
The autoregressive model of
order p for a univariate time
783
00:56:55,780 --> 00:57:01,670
series has the setup given here.
784
00:57:01,670 --> 00:57:07,470
It's a polynomial
lag of the response
785
00:57:07,470 --> 00:57:10,940
variable y_t is equal to
the innovation epsilon_t.
786
00:57:10,940 --> 00:57:16,130
And we can define
the state vector
787
00:57:16,130 --> 00:57:24,980
to be equal to the vector of
p values, p successive values
788
00:57:24,980 --> 00:57:27,650
of the process.
789
00:57:27,650 --> 00:57:33,710
And so we basically
get a combination
790
00:57:33,710 --> 00:57:38,700
here of the observation equation
and state equation joining
791
00:57:38,700 --> 00:57:46,720
where basically
one of the states
792
00:57:46,720 --> 00:57:48,760
is actually equal
to the observation.
793
00:57:48,760 --> 00:57:52,600
And basically, with
this definition
794
00:57:52,600 --> 00:57:59,160
for a state of the vector
at the next time point t,
795
00:57:59,160 --> 00:58:03,730
that is equal to this
linear transformation
796
00:58:03,730 --> 00:58:09,114
of the lagged state vector
plus that innovation term.
797
00:58:09,114 --> 00:58:10,608
I dropped the mic.
798
00:58:16,600 --> 00:58:21,480
So the notation here
shows the structure
799
00:58:21,480 --> 00:58:26,240
for how this linear
state-space model is evolving.
800
00:58:26,240 --> 00:58:29,090
Basically, the
observation equation
801
00:58:29,090 --> 00:58:32,410
is the linear
combination of the phi
802
00:58:32,410 --> 00:58:36,500
multiples of lags of the
values plus the residual.
803
00:58:36,500 --> 00:58:40,240
And the previous
lags of the states
804
00:58:40,240 --> 00:58:46,200
are just simply the identities
times those values, shifted.
805
00:58:46,200 --> 00:58:51,690
So it's a very simple structure
for the autoregressive process
806
00:58:51,690 --> 00:58:53,431
as a linear state-space model.
807
00:58:56,660 --> 00:59:02,470
We have, as I was just saying,
for the transition matrix T sub
808
00:59:02,470 --> 00:59:09,750
t, this matrix. And the
observation equation
809
00:59:09,750 --> 00:59:13,730
is essentially picking out
the first element of the state
810
00:59:13,730 --> 00:59:16,540
vector, which has no
measurement error.
811
00:59:16,540 --> 00:59:18,490
So that simplifies that.
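Here is that structure in R for a small example (the coefficients are arbitrary): the phi's fill the first row of the transition matrix, a shifted identity block carries the lagged values, and only the first state receives the innovation.

# Companion-form transition matrix for an AR(p), p >= 2,
# with state s_t = (y_t, y_{t-1}, ..., y_{t-p+1})'.
ar_companion <- function(phi) {
  p <- length(phi)
  T_mat <- rbind(phi, cbind(diag(p - 1), rep(0, p - 1)))
  unname(T_mat)
}
phi   <- c(0.5, -0.2, 0.1)                # arbitrary AR(3) coefficients
T_mat <- ar_companion(phi)
R_vec <- c(1, rep(0, length(phi) - 1))    # innovation enters the first state only
# s_{t+1} = T_mat %*% s_t + R_vec * eta_t; y_t = s_t[1], with no obs. noise.
# Stationarity check: all eigenvalues of T_mat lie inside the unit circle.
Mod(eigen(T_mat)$values)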
812
00:59:21,940 --> 00:59:27,210
The moving average
model of order q
813
00:59:27,210 --> 00:59:29,700
could also be expressed as
a linear state-space model.
814
00:59:37,240 --> 00:59:38,820
Remember, the
moving average model
815
00:59:38,820 --> 00:59:43,030
is one where our response
variable, y, is simply
816
00:59:43,030 --> 00:59:48,290
some linear combination
of innovations,
817
00:59:48,290 --> 00:59:50,500
q past innovations.
818
00:59:50,500 --> 00:59:55,350
And if we consider
819
00:59:55,350 --> 01:00:00,180
the state vector just
being basically q
820
01:00:00,180 --> 01:00:04,400
lags of the innovations,
then the transition
821
01:00:04,400 --> 01:00:08,780
of those underlying states is
given by this expression here.
822
01:00:14,690 --> 01:00:17,770
And we have a state equation,
an observation equation,
823
01:00:17,770 --> 01:00:23,500
which has these forms for these
various transition matrices
824
01:00:23,500 --> 01:00:30,615
and for how the innovation
terms are related.
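And the matching construction for the MA(q) in R (the theta values are arbitrary): the state stacks the current innovation and its q lags, the transition matrix simply shifts them down, and the observation row applies (1, theta_1, ..., theta_q).

# State-space matrices for an MA(q), with state s_t = (eps_t, ..., eps_{t-q})'.
theta <- c(0.4, 0.25)                       # arbitrary MA(2) coefficients
q     <- length(theta)
T_mat <- rbind(rep(0, q + 1),               # new innovation replaces the top slot
               cbind(diag(q), rep(0, q)))   # older innovations shift down one place
R_vec <- c(1, rep(0, q))                    # eta_{t+1} enters the first state
Z_vec <- c(1, theta)                        # y_t = Z_vec %*% s_t, with no obs. noise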
825
01:00:40,840 --> 01:00:43,160
Let me just finish
up with an example
826
01:00:43,160 --> 01:00:47,780
showing the autoregressive
moving average model.
827
01:00:47,780 --> 01:00:49,340
And many years ago,
it was actually
828
01:00:49,340 --> 01:00:55,490
very difficult to
specify the estimation
829
01:00:55,490 --> 01:00:58,902
methods for autoregressive
moving average models.
830
01:00:58,902 --> 01:01:00,800
But the implementation
of these models
831
01:01:00,800 --> 01:01:05,590
as linear state-space models
facilitated that greatly.
832
01:01:05,590 --> 01:01:13,030
And with the ARMA model,
the setup basically
833
01:01:13,030 --> 01:01:14,730
is a combination of
the autoregressive and
834
01:01:14,730 --> 01:01:16,900
moving average processes.
835
01:01:16,900 --> 01:01:20,280
We have an
autoregression of the y's
836
01:01:20,280 --> 01:01:24,719
is equal to a moving
average of the residuals
837
01:01:24,719 --> 01:01:25,510
or the innovations.
838
01:01:28,170 --> 01:01:32,550
And it's convenient in the setup
for linear state-space models
839
01:01:32,550 --> 01:01:37,720
to define the dimension m,
which is the maximum of p and q
840
01:01:37,720 --> 01:01:45,860
plus 1, that is, m = max(p, q + 1), and think
of having basically an order-m
841
01:01:45,860 --> 01:01:50,860
polynomial lag for each
of those two series.
842
01:01:50,860 --> 01:01:55,060
And we can basically
constrain those values
843
01:01:55,060 --> 01:01:59,134
to be 0: phi_j is 0 for j greater than
p, and theta_j is 0 for j greater than q.
844
01:02:06,880 --> 01:02:11,240
And Harvey, in a very
important work in '93,
845
01:02:11,240 --> 01:02:17,080
actually defined a particular
state-space representation
846
01:02:17,080 --> 01:02:19,350
for this process.
847
01:02:19,350 --> 01:02:20,980
And I guess it's
important to know
848
01:02:20,980 --> 01:02:24,310
that with these linear
state-space models,
849
01:02:24,310 --> 01:02:29,030
we're dealing with
characterizing structure
850
01:02:29,030 --> 01:02:31,750
in m-dimensional space.
851
01:02:31,750 --> 01:02:35,510
There's often some choice in how
you represent your underlying
852
01:02:35,510 --> 01:02:37,670
states.
853
01:02:37,670 --> 01:02:42,430
You can basically
re-parametrize the models
854
01:02:42,430 --> 01:02:47,080
by considering invertible
linear transformations
855
01:02:47,080 --> 01:02:49,760
of the underlying states.
856
01:02:49,760 --> 01:02:52,820
So let me go back here.
857
01:02:56,700 --> 01:02:59,990
We express the state
equation generally
858
01:02:59,990 --> 01:03:04,190
as T sub t s_t plus R_t eta_t.
859
01:03:04,190 --> 01:03:08,540
This matrix T sub t
and s_t-- basically, s_t
860
01:03:08,540 --> 01:03:11,280
can be replaced by an invertible linear
transformation M of s_t,
861
01:03:11,280 --> 01:03:16,730
so long as we conjugate T sub t
by M, replacing it with M T_t times the inverse
862
01:03:16,730 --> 01:03:17,850
of that transformation.
863
01:03:17,850 --> 01:03:19,810
So there's flexibility
in the choice
864
01:03:19,810 --> 01:03:22,340
of our linear state-space
specification.
865
01:03:22,340 --> 01:03:28,820
And so there really are many
different equivalent linear
866
01:03:28,820 --> 01:03:33,380
state-space models for a
given process depending
867
01:03:33,380 --> 01:03:35,600
on exactly how you
define the states
868
01:03:35,600 --> 01:03:39,490
and the underlying
transformation matrix T.
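A quick numerical illustration of that equivalence in R (the matrices are invented for the example): transform the state by any invertible M, conjugate T, and the observable output is unchanged.

# Re-parametrize a state-space model: s*_t = M s_t, T* = M T M^{-1},
# Z* = Z M^{-1}; both parametrizations imply the same observations.
T_mat <- matrix(c(0.9, 0, 0.3, 0.5), 2, 2)
Z     <- matrix(c(1, 0), 1, 2)
M     <- matrix(c(1, 1, 0, 2), 2, 2)      # any invertible transformation
T_star <- M %*% T_mat %*% solve(M)
Z_star <- Z %*% solve(M)
s <- c(1, -1); s_star <- M %*% s
Z %*% (T_mat %*% s)                       # one step of the original system
Z_star %*% (T_star %*% s_star)            # identical observation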
869
01:03:39,490 --> 01:03:44,900
And the beauty of Harvey's
work was coming up
870
01:03:44,900 --> 01:03:47,490
with a nice representation
for the states,
871
01:03:47,490 --> 01:03:53,100
where we had very simple forms
for the various matrices.
872
01:03:53,100 --> 01:03:57,000
And the lecture notes here
go through the derivation
873
01:03:57,000 --> 01:03:59,430
of that for the ARMA process.
874
01:03:59,430 --> 01:04:04,490
And this derivation
is-- I just want
875
01:04:04,490 --> 01:04:08,240
to go through the
first case just
876
01:04:08,240 --> 01:04:11,020
to highlight how
the argument goes.
877
01:04:11,020 --> 01:04:15,090
We basically have this equation,
which is the original equation
878
01:04:15,090 --> 01:04:17,345
for an ARMA(p,q) process.
879
01:04:20,180 --> 01:04:25,810
And Harvey says, well,
define the first--
880
01:04:25,810 --> 01:04:29,460
or the first state at time t, to
be equal to the observation
881
01:04:29,460 --> 01:04:31,820
at time t.
882
01:04:31,820 --> 01:04:38,250
If we do that, then how
does this equation relate
883
01:04:38,250 --> 01:04:46,000
to the state equation? Basically, the
state at the next time point, t
884
01:04:46,000 --> 01:04:50,610
plus 1, is equal to phi_1
times the state at time t,
885
01:04:50,610 --> 01:05:00,340
plus a second state at time
t, plus a residual innovation
886
01:05:00,340 --> 01:05:01,420
eta_t.
887
01:05:01,420 --> 01:05:09,110
So by choosing the first state
to be the observation value
888
01:05:09,110 --> 01:05:16,680
at that time, we can then
solve for the second state,
889
01:05:16,680 --> 01:05:19,810
which is given by
this expression,
890
01:05:19,810 --> 01:05:25,730
just by rewriting our model
equation in terms of s_(1,t),
891
01:05:25,730 --> 01:05:27,880
s_(2,t) and eta_t.
892
01:05:27,880 --> 01:05:36,950
So this s_(2,t) is this function
of the observations and eta_t.
893
01:05:36,950 --> 01:05:39,440
So it's a very
simple specification
894
01:05:39,440 --> 01:05:41,820
of the second state.
895
01:05:41,820 --> 01:05:48,020
Just what is that
second state element
896
01:05:48,020 --> 01:05:50,520
given this definition
of the first one?
897
01:05:50,520 --> 01:05:54,650
And one can do this
process iteratively
898
01:05:54,650 --> 01:05:59,180
getting rid of the
observations and replacing them
899
01:05:59,180 --> 01:06:01,290
by underlying states.
900
01:06:01,290 --> 01:06:03,770
And at the end of
the day, you end up
901
01:06:03,770 --> 01:06:09,490
with this very simple form
for the transition matrix T.
902
01:06:09,490 --> 01:06:13,950
Basically, the T has the
autoregressive components
903
01:06:13,950 --> 01:06:16,410
as the first column
of the T matrix.
904
01:06:16,410 --> 01:06:20,440
And this R matrix has
this vector of the moving
905
01:06:20,440 --> 01:06:22,550
average components.
906
01:06:22,550 --> 01:06:28,330
So it's a very nice way
to represent the model.
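A sketch of Harvey's construction in R (the helper name and example coefficients are mine): pad the phi's and theta's with zeros out to m = max(p, q + 1), put the phi's in the first column of T with an identity block beside them, and stack (1, theta_1, ..., theta_{m-1}) in R.

# Harvey's state-space form for an ARMA(p, q), with m = max(p, q + 1) >= 2.
harvey_arma <- function(phi, theta) {
  m       <- max(length(phi), length(theta) + 1)
  phi_m   <- c(phi, rep(0, m - length(phi)))           # phi_j = 0 for j > p
  theta_m <- c(theta, rep(0, m - 1 - length(theta)))   # theta_j = 0 for j > q
  T_mat   <- cbind(phi_m, rbind(diag(m - 1), rep(0, m - 1)))
  list(T = unname(T_mat),          # phi's in the first column, identity above
       R = c(1, theta_m),          # moving average weights on the innovation
       Z = c(1, rep(0, m - 1)))    # y_t is the first state, no obs. noise
}
harvey_arma(phi = c(0.6, -0.3), theta = 0.4)   # an ARMA(2,1) example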
907
01:06:28,330 --> 01:06:32,990
Coming up with it was something
very clever that he did.
908
01:06:32,990 --> 01:06:36,580
But what one can see is
that this basic model where
909
01:06:36,580 --> 01:06:41,620
you have the states
transitioning according
910
01:06:41,620 --> 01:06:45,540
to a linear transformation of
the previous state plus error,
911
01:06:45,540 --> 01:06:49,910
and the observation being some
function of the current states,
912
01:06:49,910 --> 01:06:54,119
plus error or not, depending
on the formulation,
913
01:06:54,119 --> 01:06:55,035
is the representation.
914
01:06:58,200 --> 01:07:03,770
Now, with all of
these models, a reason
915
01:07:03,770 --> 01:07:08,860
why linear state-space
modeling is in fact effective
916
01:07:08,860 --> 01:07:19,711
is that estimation and
inference are fully specified by the Kalman
917
01:07:19,711 --> 01:07:20,210
filter.
918
01:07:22,730 --> 01:07:32,100
So with this formulation of
linear state-space models,
919
01:07:32,100 --> 01:07:37,000
the Kalman filter
as a methodology is
920
01:07:37,000 --> 01:07:41,380
the recursive computation
of the probability density
921
01:07:41,380 --> 01:07:48,535
functions for the underlying
states at basically
922
01:07:48,535 --> 01:07:52,420
t plus 1 given
information up to time t,
923
01:07:52,420 --> 01:07:56,710
as well as the joint
density of the future state
924
01:07:56,710 --> 01:07:59,800
and the future observation at
t plus 1, given information up
925
01:07:59,800 --> 01:08:02,370
to time t.
926
01:08:02,370 --> 01:08:05,520
And also just the
marginal distribution
927
01:08:05,520 --> 01:08:10,380
of the next observation given
the information up to time t.
928
01:08:20,490 --> 01:08:26,510
So what I want to do is
just go through with you
929
01:08:26,510 --> 01:08:31,550
how the Kalman filter is
implemented and defined.
930
01:08:31,550 --> 01:08:35,370
And the implementation
of the Kalman filter
931
01:08:35,370 --> 01:08:40,939
requires us to have some
notation that's a bit involved,
932
01:08:40,939 --> 01:08:46,710
but we'll hopefully explain it
so it's very straightforward.
933
01:08:46,710 --> 01:08:49,474
There are basically conditional
means of the states.
934
01:08:52,090 --> 01:08:55,450
s sub t given t
is the mean value
935
01:08:55,450 --> 01:08:59,510
of the state at time t given
the information up to time t.
936
01:08:59,510 --> 01:09:02,069
If we condition
on t minus 1, then
937
01:09:02,069 --> 01:09:03,500
it's the expectation
of the state
938
01:09:03,500 --> 01:09:06,300
at time t given the
information up to t minus 1.
939
01:09:09,460 --> 01:09:12,100
And then y sub t given t minus
1 is the expectation
940
01:09:12,100 --> 01:09:16,880
of the observation given
information up to t minus 1.
941
01:09:16,880 --> 01:09:18,780
There's also
conditional covariances
942
01:09:18,780 --> 01:09:22,260
and mean squared errors.
943
01:09:22,260 --> 01:09:26,620
All these covariances
are denoted by omegas.
944
01:09:26,620 --> 01:09:33,240
The subscript corresponds to
states s, or observation y.
945
01:09:33,240 --> 01:09:35,060
And basically, the
conditioning set
946
01:09:35,060 --> 01:09:39,149
is either information up to
time t, or time t minus 1
947
01:09:39,149 --> 01:09:40,479
in the second case.
948
01:09:40,479 --> 01:09:45,370
And we want to compute
basically the covariance matrix
949
01:09:45,370 --> 01:09:49,999
of the states given whatever
the information is, information
950
01:09:49,999 --> 01:09:52,439
up to time t or t minus 1.
951
01:09:52,439 --> 01:09:57,810
So these covariance
matrices are the expectation
952
01:09:57,810 --> 01:10:01,990
of the state minus
their expectation
953
01:10:01,990 --> 01:10:06,850
under the conditioning times
the state minus the expectation
954
01:10:06,850 --> 01:10:07,950
transpose.
955
01:10:07,950 --> 01:10:10,810
That's the definition of
that covariance matrix.
956
01:10:10,810 --> 01:10:12,230
So the different
definitions here
957
01:10:12,230 --> 01:10:14,300
correspond to just
whether we're conditioning
958
01:10:14,300 --> 01:10:15,345
on different information.
959
01:10:17,900 --> 01:10:23,170
And then the observation
innovations or residuals
960
01:10:23,170 --> 01:10:29,510
are the difference
between an observation y_t
961
01:10:29,510 --> 01:10:33,847
and its estimate given
information up to t minus 1.
962
01:10:37,190 --> 01:10:41,370
So the residuals in this process
are the innovation residuals,
963
01:10:41,370 --> 01:10:44,200
one period ahead.
964
01:10:44,200 --> 01:10:50,780
And the Kalman filter
consists of four steps.
965
01:10:50,780 --> 01:11:00,800
We basically want to, first,
predict the state vector
966
01:11:00,800 --> 01:11:01,780
one step ahead.
967
01:11:01,780 --> 01:11:10,140
So given our estimate of the
state vector at time t minus 1,
968
01:11:10,140 --> 01:11:14,800
we want to predict this
state vector at time t.
969
01:11:14,800 --> 01:11:18,220
And we also want to
predict the observation
970
01:11:18,220 --> 01:11:23,820
at time t given our estimate
of the state vector at time t minus 1.
971
01:11:23,820 --> 01:11:31,674
And so at time t minus 1, we
can estimate these quantities.
972
01:11:31,674 --> 01:11:32,174
[INAUDIBLE]
973
01:11:35,646 --> 01:11:40,969
At t minus 1, we can
basically predict
974
01:11:40,969 --> 01:11:42,760
what the state is going to
be and predict what
975
01:11:42,760 --> 01:11:44,750
the observation is going to be.
976
01:11:44,750 --> 01:11:47,166
And we can estimate
how much error there's
977
01:11:47,166 --> 01:11:49,707
going to be in those estimates,
by these covariance matrices.
978
01:11:59,420 --> 01:12:05,140
The second step is
updating these predictions
979
01:12:05,140 --> 01:12:11,900
to get our estimate of the state
given the observation at time t
980
01:12:11,900 --> 01:12:15,480
and to update our uncertainty
about that state given
981
01:12:15,480 --> 01:12:16,380
this new observation.
982
01:12:16,380 --> 01:12:21,350
So basically, our estimate
of the state at time t
983
01:12:21,350 --> 01:12:25,310
is an adjustment to our
estimate given information up
984
01:12:25,310 --> 01:12:31,164
to t minus 1, plus a function of
the difference between what we
985
01:12:31,164 --> 01:12:32,455
observed and what we predicted.
986
01:12:35,020 --> 01:12:42,870
And the matrix multiplying that difference
is called the filter gain matrix.
987
01:12:42,870 --> 01:12:45,120
And basically, it
characterizes how
988
01:12:45,120 --> 01:12:50,070
do we adjust our prediction
of the underlying state
989
01:12:50,070 --> 01:12:52,760
depending on what happened.
990
01:12:52,760 --> 01:12:54,440
So that's the
filter gain matrix.
991
01:12:57,150 --> 01:13:00,470
So we actually do
gain information
992
01:13:00,470 --> 01:13:03,160
with each observation about what
the new value of the process
993
01:13:03,160 --> 01:13:04,320
is.
994
01:13:04,320 --> 01:13:06,830
And that information
is characterized
995
01:13:06,830 --> 01:13:09,190
by the filter gain matrix.
996
01:13:09,190 --> 01:13:11,580
You'll notice that
the uncertainty
997
01:13:11,580 --> 01:13:15,720
in the state at time t, this
omega_s of t given t, that's
998
01:13:15,720 --> 01:13:19,630
equal to the covariance
matrix given t minus 1, minus an adjustment.
999
01:13:19,630 --> 01:13:23,330
So it's our beginning level
of uncertainty adjusted
1000
01:13:23,330 --> 01:13:27,790
by a term that tells us
how much information did we
1001
01:13:27,790 --> 01:13:29,580
get from that new observation.
1002
01:13:29,580 --> 01:13:33,590
So notice that there's
a minus sign there.
1003
01:13:33,590 --> 01:13:35,600
We're basically
reducing our uncertainty
1004
01:13:35,600 --> 01:13:44,602
about the state given the
information in the innovation
1005
01:13:44,602 --> 01:13:45,685
that we now have observed.
1006
01:13:48,800 --> 01:13:51,870
Then, there's a
forecasting step which
1007
01:13:51,870 --> 01:13:59,310
is used to forecast the
state one period forward; that forecast
1008
01:13:59,310 --> 01:14:01,400
is simply given by this
linear transformation
1009
01:14:01,400 --> 01:14:03,170
of the previous state.
1010
01:14:03,170 --> 01:14:05,890
And we can also update
our covariance matrix
1011
01:14:05,890 --> 01:14:09,580
for future states given
the previous state
1012
01:14:09,580 --> 01:14:13,530
by applying this formula
which is a recursive formula
1013
01:14:13,530 --> 01:14:17,580
for estimating covariances.
1014
01:14:17,580 --> 01:14:24,760
So we have
forecasting algorithms
1015
01:14:24,760 --> 01:14:29,520
that are simple linear
functions of these estimates.
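The prediction and updating formulas above translate almost line-for-line into R. This is a minimal, univariate-observation sketch in base R (no packages; H denotes the observation-noise variance, a name I am supplying):

# One pass of the Kalman filter: predict, update, and accumulate the
# log-likelihood.  Model: s_{t+1} = T s_t + R eta_t, eta_t ~ N(0, Q);
# y_t = Z s_t + eps_t, eps_t ~ N(0, H).
kalman_filter <- function(y, T_mat, Z, Q, H, R_mat, m0, C0) {
  m <- m0; C <- C0; loglik <- 0
  for (t in seq_along(y)) {
    # Prediction step: s_{t|t-1} and Omega_{t|t-1}.
    m_pred <- T_mat %*% m
    C_pred <- T_mat %*% C %*% t(T_mat) + R_mat %*% Q %*% t(R_mat)
    # Innovation: observed y_t minus its one-step-ahead prediction.
    v   <- y[t] - c(Z %*% m_pred)
    F_t <- c(Z %*% C_pred %*% t(Z)) + H
    # Updating step: gain, revised state mean, reduced state covariance.
    K <- (C_pred %*% t(Z)) / F_t
    m <- m_pred + K * v
    C <- C_pred - K %*% Z %*% C_pred
    loglik <- loglik - 0.5 * (log(2 * pi) + log(F_t) + v^2 / F_t)
  }
  list(m = m, C = C, loglik = loglik)
}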
1016
01:14:29,520 --> 01:14:35,650
And then finally,
there's a smoothing step
1017
01:14:35,650 --> 01:14:43,960
which is characterizing
the conditional expectation
1018
01:14:43,960 --> 01:14:49,950
of underlying states, given
information in the whole time
1019
01:14:49,950 --> 01:14:51,150
series.
1020
01:14:51,150 --> 01:14:55,440
And so ordinarily, Kalman
filters
1021
01:14:55,440 --> 01:14:58,210
are applied
sequentially over time
1022
01:14:58,210 --> 01:15:01,090
where one basically
is predicting ahead
1023
01:15:01,090 --> 01:15:03,550
one step, updating
that prediction,
1024
01:15:03,550 --> 01:15:08,320
predicting ahead another
step, updating the information
1025
01:15:08,320 --> 01:15:10,930
on the states.
1026
01:15:10,930 --> 01:15:19,410
And that overall
process is also the basis
1027
01:15:19,410 --> 01:15:21,550
of actually computing
the likelihood
1028
01:15:21,550 --> 01:15:25,210
function for these linear
state-space models.
1029
01:15:25,210 --> 01:15:32,140
And so the Kalman filter is
ultimately applied
1030
01:15:32,140 --> 01:15:35,010
for successive
forecasting of the process
1031
01:15:35,010 --> 01:15:39,600
but also for helping us identify
what the underlying model
1032
01:15:39,600 --> 01:15:43,430
parameters are using
maximum likelihood methods.
1033
01:15:43,430 --> 01:15:48,290
And so the likelihood function
for the linear state-space
1034
01:15:48,290 --> 01:15:52,050
model is basically the--
or the log-likelihood
1035
01:15:52,050 --> 01:15:54,920
is the log-likelihood of
the entire data series,
1036
01:15:54,920 --> 01:15:56,980
given the unknown parameters.
1037
01:15:56,980 --> 01:16:00,020
But that can be
expressed as the product
1038
01:16:00,020 --> 01:16:04,290
of the conditional distributions
of each successive observation,
1039
01:16:04,290 --> 01:16:07,150
given the history.
1040
01:16:07,150 --> 01:16:09,750
And so basically, the
likelihood of theta
1041
01:16:09,750 --> 01:16:12,390
is the likelihood of
the first observation
1042
01:16:12,390 --> 01:16:15,240
times the density of the
second observation given
1043
01:16:15,240 --> 01:16:18,990
the first, and so
forth for the whole series.
1044
01:16:18,990 --> 01:16:22,650
And so the likelihood
function is basically
1045
01:16:22,650 --> 01:16:25,490
a function of all these
terms that we were computing
1046
01:16:25,490 --> 01:16:26,490
with the Kalman filter.
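Since the filter above already accumulates the innovations v_t and their variances F_t, maximum likelihood estimation is just an optimizer wrapped around it. A hypothetical AR(1) example using the kalman_filter sketch from before:

# Fit an AR(1), y_t = phi y_{t-1} + eta_t, by maximizing the likelihood
# that the Kalman filter computes via the prediction error decomposition.
set.seed(3)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 500))
negloglik <- function(par) {
  phi    <- tanh(par[1])                  # keeps |phi| < 1
  sigma2 <- exp(par[2])                   # keeps the variance positive
  -kalman_filter(y, T_mat = matrix(phi), Z = matrix(1, 1, 1),
                 Q = matrix(sigma2), H = 0, R_mat = matrix(1),
                 m0 = 0, C0 = matrix(sigma2 / (1 - phi^2)))$loglik
}
fit <- optim(c(0, 0), negloglik)
tanh(fit$par[1])                          # estimate of phi, close to 0.7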
1047
01:16:29,260 --> 01:16:33,470
And the Kalman
filter basically
1048
01:16:33,470 --> 01:16:36,760
provides all the terms
necessary for this estimation.
1049
01:16:36,760 --> 01:16:42,270
If the error terms are
normally distributed,
1050
01:16:42,270 --> 01:16:46,550
then the means and
variances of these estimates
1051
01:16:46,550 --> 01:16:52,750
are in fact characterizing
the exact distributions
1052
01:16:52,750 --> 01:16:54,300
of the process.
1053
01:16:54,300 --> 01:16:56,850
Basically, we're taking--
if the innovation series are
1054
01:16:56,850 --> 01:16:59,290
all normal random
variables, then
1055
01:16:59,290 --> 01:17:00,980
the linear
state-space model, all
1056
01:17:00,980 --> 01:17:03,750
it's doing is taking linear
combinations of normals
1057
01:17:03,750 --> 01:17:07,410
for the underlying states and
for the actual observations.
1058
01:17:07,410 --> 01:17:08,890
And normal
distributions are fully
1059
01:17:08,890 --> 01:17:10,610
characterized by
their mean vectors
1060
01:17:10,610 --> 01:17:12,310
and covariance matrices.
1061
01:17:12,310 --> 01:17:14,050
And the Kalman
filter provides a way
1062
01:17:14,050 --> 01:17:21,570
to update these distributions
for all these features
1063
01:17:21,570 --> 01:17:23,000
of a model, the
underlying states
1064
01:17:23,000 --> 01:17:26,520
as well as the distributions
of the observations.
1065
01:17:26,520 --> 01:17:35,250
So that's a brief introduction
to the Kalman filter.
1066
01:17:35,250 --> 01:17:36,940
Let's finish there.
1067
01:17:36,940 --> 01:17:38,490
Thank you.