WEBVTT
00:00:00.090 --> 00:00:02.500
The following content is
provided under a Creative
00:00:02.500 --> 00:00:04.019
Commons license.
00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare
00:00:06.360 --> 00:00:10.730
continue to offer high-quality
educational resources for free.
00:00:10.730 --> 00:00:13.330
To make a donation or
view additional materials
00:00:13.330 --> 00:00:17.210
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:17.210 --> 00:00:17.835
at ocw.mit.edu.
00:00:21.650 --> 00:00:24.030
PROFESSOR: We introduced
the data last time.
00:00:24.030 --> 00:00:27.700
These were some
macroeconomic variables
00:00:27.700 --> 00:00:33.990
that can be used for forecasting
the economy in terms of growth
00:00:33.990 --> 00:00:39.330
and factors such as
inflation or unemployment.
00:00:39.330 --> 00:00:44.020
The case note goes through
analyzing just three
00:00:44.020 --> 00:00:47.690
of these economic time
series-- the unemployment rate,
00:00:47.690 --> 00:00:51.360
the federal funds rate,
and a measure of the CPI,
00:00:51.360 --> 00:00:52.530
or Consumer Price Index.
00:00:56.450 --> 00:01:00.520
When one fits a vector
autoregression model
00:01:00.520 --> 00:01:08.940
to this data, it turns
out that the roots
00:01:08.940 --> 00:01:16.800
of the characteristic polynomial
are 1.002 and 0.9863.
00:01:16.800 --> 00:01:19.090
And you recall from our
discussion of vector
00:01:19.090 --> 00:01:23.140
autoregressive models, there's
a characteristic equation
00:01:23.140 --> 00:01:25.425
sort of in matrix
form, where the determinant
00:01:25.425 --> 00:01:29.720
plays the same role as in the
univariate autoregressive case.
00:01:29.720 --> 00:01:44.120
And in order for the process
to be stationary, basically,
00:01:44.120 --> 00:01:46.150
the roots of the
characteristic polynomial
00:01:46.150 --> 00:01:50.370
need to be greater than 1
in magnitude, outside the unit circle.
00:01:50.370 --> 00:01:54.110
In this implementation of the
vector autoregression model,
00:01:54.110 --> 00:01:57.220
the reported characteristic
roots are the inverses
00:01:57.220 --> 00:01:59.620
of those roots, so they need to
be less than 1 in magnitude.
00:01:59.620 --> 00:02:03.770
So anyway, this particular fit
of the vector autoregression
00:02:03.770 --> 00:02:11.370
model suggests that the
process is non-stationary.
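A minimal R sketch of this check (the matrix name macro and the lag order are assumptions; the case note's actual code may differ):

    library(vars)  # CRAN package for vector autoregressions
    # Fit a VAR to the levels of the three series; p = 2 is illustrative.
    var.lev <- VAR(na.omit(macro), p = 2, type = "const")
    # Moduli of the characteristic roots in this implementation's
    # convention (inverses of the textbook roots); a value at or
    # above 1 indicates non-stationarity.
    roots(var.lev)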
00:02:11.370 --> 00:02:17.580
And so one should be
considering differenced series
00:02:17.580 --> 00:02:20.400
to model this as a
stationary time series.
00:02:20.400 --> 00:02:26.520
But in terms of interpreting
the regression model,
00:02:26.520 --> 00:02:36.320
one can see-- to accommodate
the non-stationarity,
00:02:36.320 --> 00:02:41.020
we can take differences
of all the series
00:02:41.020 --> 00:02:43.360
and fit the vector
autoregression
00:02:43.360 --> 00:02:45.550
to the difference series.
00:02:45.550 --> 00:02:49.210
So one way of eliminating any
non-stationarity in time series
00:02:49.210 --> 00:02:52.810
models, basically
eliminating the random walk
00:02:52.810 --> 00:02:57.290
aspect of the processes, is
to model first differences.
00:02:57.290 --> 00:03:06.180
And so doing that with
this series-- let's see.
00:03:06.180 --> 00:03:10.220
Here is just a graph of
the time series properties
00:03:10.220 --> 00:03:11.800
of the difference series.
00:03:15.210 --> 00:03:19.180
So with our original series, we
take differences and eliminate
00:03:19.180 --> 00:03:22.820
missing values in this R code.
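A minimal R sketch of these steps (assuming the three series are columns of a matrix called macro; the name is an assumption):

    # First differences, then drop rows with missing values.
    macro.diff <- na.omit(diff(as.matrix(macro)))
    # Auto- and cross-correlations of the differenced series; the
    # dashed lines are the +/- 2 standard error bounds under the
    # null hypothesis of zero correlation.
    acf(macro.diff)
    # Partial autocorrelations, discussed below.
    acf(macro.diff, type = "partial")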
00:03:22.820 --> 00:03:25.300
And this
autocorrelation function
00:03:25.300 --> 00:03:31.100
shows us basically
the correlations
00:03:31.100 --> 00:03:33.420
and autocorrelations
of individual series
00:03:33.420 --> 00:03:36.950
and the cross-correlations
across the different series.
00:03:36.950 --> 00:03:41.680
So along the diagonals are
the autocorrelation function.
00:03:41.680 --> 00:03:43.800
And one can see
that every series
00:03:43.800 --> 00:03:47.280
has correlation one with itself.
00:03:47.280 --> 00:03:52.380
But then at the first lag, the
autocorrelation is positive for the Fed
00:03:52.380 --> 00:03:56.450
funds and the CPI measure.
00:03:56.450 --> 00:03:58.980
And there's also some
cross-correlations
00:03:58.980 --> 00:04:01.550
that are strong.
00:04:01.550 --> 00:04:04.180
And whether a
correlation is strong or not
00:04:04.180 --> 00:04:06.125
depends upon how much
uncertainty there
00:04:06.125 --> 00:04:08.250
is in our estimate
of the correlation.
00:04:08.250 --> 00:04:11.750
And these dashed
lines here correspond
00:04:11.750 --> 00:04:16.980
to plus or minus two standard
deviations of the correlation
00:04:16.980 --> 00:04:23.440
coefficient when the correlation
coefficient is equal to 0.
00:04:23.440 --> 00:04:28.470
So any correlations that sort
of go beyond those bounds
00:04:28.470 --> 00:04:29.715
are statistically significant.
00:04:33.180 --> 00:04:39.210
The partial autocorrelation
function is graphed here.
00:04:39.210 --> 00:04:42.730
And our
time series problem
00:04:42.730 --> 00:04:46.040
set goes through some discussion
of the partial autocorrelation
00:04:46.040 --> 00:04:48.600
coefficients and the
interpretation of those.
00:04:48.600 --> 00:04:51.910
The partial autocorrelation
coefficients
00:04:51.910 --> 00:04:57.450
are the correlation
between one variable
00:04:57.450 --> 00:04:59.330
and the lag of another
after accounting
00:04:59.330 --> 00:05:02.110
for all lower-order lags.
00:05:02.110 --> 00:05:06.480
So it's like the incremental
correlation of a variable
00:05:06.480 --> 00:05:10.760
with a given lagged term, beyond
what the lower-order lags explain.
00:05:10.760 --> 00:05:13.830
And so if we are actually
fitting regression models where
00:05:13.830 --> 00:05:18.460
we include extra lags
of a given variable,
00:05:18.460 --> 00:05:20.570
that partial
autocorrelation coefficient
00:05:20.570 --> 00:05:25.260
is essentially the correlation
associated with the addition
00:05:25.260 --> 00:05:27.620
of the final lagged variable.
00:05:27.620 --> 00:05:30.230
So here, we can see that
each of these series
00:05:30.230 --> 00:05:33.950
is quite strongly
correlated with itself.
00:05:33.950 --> 00:05:37.470
But there are also
some cross-correlations
00:05:37.470 --> 00:05:42.750
between, say, the unemployment
rate and the Fed funds rate.
00:05:42.750 --> 00:05:46.700
Basically, the Fed
funds rate tends
00:05:46.700 --> 00:05:50.400
to go down when the
unemployment rate goes up.
00:05:50.400 --> 00:05:54.610
And so this data is
indicating the association
00:05:54.610 --> 00:05:56.640
between these
macroeconomic variables
00:05:56.640 --> 00:05:59.100
and the evidence
of that behavior.
00:05:59.100 --> 00:06:02.100
In terms of modeling the
actual structural relations
00:06:02.100 --> 00:06:05.930
between these, we would need
several more variables, up to about 10
00:06:05.930 --> 00:06:08.380
or 12 beyond
these three.
00:06:08.380 --> 00:06:12.710
And then one can have
a better understanding
00:06:12.710 --> 00:06:15.750
of the drivers of various
macroeconomic features.
00:06:15.750 --> 00:06:17.250
But this sort of
illustrates the use
00:06:17.250 --> 00:06:19.950
of these methods with this
reduced-variable case.
00:06:22.830 --> 00:06:25.650
Let me also go
down here and just
00:06:25.650 --> 00:06:33.710
comment on the unemployment
rate or the Fed funds rate.
00:06:46.050 --> 00:06:48.460
When fitting these vector
autoregressive models
00:06:48.460 --> 00:06:52.070
with the packages
that exist in R,
00:06:52.070 --> 00:06:56.320
they give us output which
provides the specification
00:06:56.320 --> 00:07:01.440
of each of the
autoregressive models
00:07:01.440 --> 00:07:05.260
for the different dependent
variables, the different series
00:07:05.260 --> 00:07:07.620
of the process.
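Continuing the R sketch from above (the lag order is again illustrative):

    # Fit the VAR to the differenced series and print each
    # equation's coefficients, standard errors, and t values.
    var.diff <- VAR(macro.diff, p = 2, type = "const")
    summary(var.diff)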
00:07:07.620 --> 00:07:13.610
And so here is the case of the
regression model for Fed funds
00:07:13.610 --> 00:07:17.720
as a function of
unemployment rate lagged,
00:07:17.720 --> 00:07:21.040
Fed funds rate lagged,
and CPI lagged.
00:07:21.040 --> 00:07:25.240
These are all on
different scales.
00:07:25.240 --> 00:07:27.730
When you're looking at
these results, what's
00:07:27.730 --> 00:07:31.340
important is
basically how strong
00:07:31.340 --> 00:07:33.850
the signal-to-noise
ratio is for estimating
00:07:33.850 --> 00:07:37.590
these autoregressive
parameters, vector
00:07:37.590 --> 00:07:39.130
autoregressive parameters.
00:07:39.130 --> 00:07:43.540
And so with the Fed funds,
you can look at the t values.
00:07:43.540 --> 00:07:45.920
And t values that
are larger than 2
00:07:45.920 --> 00:07:49.210
are certainly quite significant.
00:07:49.210 --> 00:07:53.540
And you can see that basically
when the unemployment rate
00:07:53.540 --> 00:07:59.250
coefficient is a negative
0.71, so if the unemployment
00:07:59.250 --> 00:08:05.270
rate goes up, we expect to
see the Fed funds rate going down
00:08:05.270 --> 00:08:07.080
the next month.
00:08:07.080 --> 00:08:15.650
And the Fed funds rate at
lag 1 has a t value of 7.97.
00:08:15.650 --> 00:08:18.790
So these are now models
on the differences.
00:08:18.790 --> 00:08:21.480
So if the Fed funds
rate was increased
00:08:21.480 --> 00:08:25.880
last month or last quarter, it's
likely to be increased again.
00:08:25.880 --> 00:08:31.560
And that's partly a factor
of how slow the economy is
00:08:31.560 --> 00:08:34.049
in reacting to changes
and how the Fed doesn't
00:08:34.049 --> 00:08:40.200
want to shock the economy with
large changes in their policy
00:08:40.200 --> 00:08:42.909
rates.
00:08:42.909 --> 00:08:46.600
Another thing to notice here
is that there's actually
00:08:46.600 --> 00:08:50.230
a negative coefficient
on the lag 2
00:08:50.230 --> 00:08:54.490
Fed funds term, a negative 0.17.
00:08:54.490 --> 00:08:58.870
And in interpreting
these kinds of models,
00:08:58.870 --> 00:09:02.510
I think it's helpful
just to think of,
00:09:02.510 --> 00:09:06.210
if you have delta Fed
funds at time t, that's
00:09:06.210 --> 00:09:13.970
equal to minus 0.71 times the change
in the unemployment rate at t minus 1.
00:09:13.970 --> 00:09:24.050
And then we have plus 0.37 times the
change in the Fed funds at t minus 1.
00:09:24.050 --> 00:09:24.820
These are all deltas, differences.
00:09:24.820 --> 00:09:31.330
And then minus 0.18 times
the Fed funds change.
00:09:31.330 --> 00:09:35.000
So at t minus 2.
00:09:35.000 --> 00:09:39.290
In interpreting
these coefficients,
00:09:39.290 --> 00:09:43.020
notice that these
two terms correspond
00:09:43.020 --> 00:09:57.110
to 0.19 times the Fed funds
change 1 lag ago plus 0.18
00:09:57.110 --> 00:09:59.445
times the change in that change--
the second difference.
00:10:03.550 --> 00:10:06.360
So when you see
multiple lags coming
00:10:06.360 --> 00:10:11.720
into play in these models,
the interpretation of them
00:10:11.720 --> 00:10:17.560
can be made by considering
different transformations
00:10:17.560 --> 00:10:20.210
essentially of the
underlying variables.
00:10:20.210 --> 00:10:23.130
In this form, you can see
that OK, the Fed funds
00:10:23.130 --> 00:10:30.180
tends to change the way it
changed the previous month.
00:10:30.180 --> 00:10:38.644
But it also may change
depending on the double change
00:10:38.644 --> 00:10:39.560
in the previous month.
00:10:39.560 --> 00:10:42.620
So there's a degree of
acceleration in the Fed funds
00:10:42.620 --> 00:10:44.450
that is being captured here.
00:10:44.450 --> 00:10:47.640
So the interpretation
of these models
00:10:47.640 --> 00:10:51.930
sometimes requires some care.
00:10:51.930 --> 00:10:55.560
This kind of analysis,
I find it quite useful.
00:11:02.600 --> 00:11:09.710
So let's push on
to the next topic.
00:11:09.710 --> 00:11:13.230
So today's topics are going
to begin with a discussion
00:11:13.230 --> 00:11:15.640
of cointegration.
00:11:15.640 --> 00:11:18.980
Cointegration is a major topic
in time series analysis, which
00:11:18.980 --> 00:11:23.980
deals with the analysis
of non-stationary time series.
00:11:23.980 --> 00:11:28.060
And in the previous
discussion, we
00:11:28.060 --> 00:11:29.910
addressed
non-stationarity of series
00:11:29.910 --> 00:11:32.214
by taking first
differences to eliminate
00:11:32.214 --> 00:11:33.130
that non-stationarity.
00:11:36.440 --> 00:11:40.140
But we may be losing
some information
00:11:40.140 --> 00:11:41.450
with that differencing.
00:11:41.450 --> 00:11:44.940
And cointegration
provides a framework
00:11:44.940 --> 00:11:47.440
within which we
characterize all available
00:11:47.440 --> 00:11:49.680
information for
statistical modeling,
00:11:49.680 --> 00:11:52.920
in a very systematic way.
00:11:52.920 --> 00:11:58.580
So let's introduce the
context within which
00:11:58.580 --> 00:12:00.630
cointegration is relevant.
00:12:00.630 --> 00:12:05.810
It's relevant when we
have a stochastic process,
00:12:05.810 --> 00:12:08.620
a multivariate
stochastic process, which
00:12:08.620 --> 00:12:12.060
is integrated of some order d.
00:12:12.060 --> 00:12:15.810
And to be integrated
of order d means
00:12:15.810 --> 00:12:18.920
that if we take the
d-th difference,
00:12:18.920 --> 00:12:21.395
then that d-th
difference is stationary.
00:12:23.980 --> 00:12:33.720
So if you look
at a time series
00:12:33.720 --> 00:12:38.630
and you plot that over time,
well, OK, a stationary time
00:12:38.630 --> 00:12:43.010
series we know should be
something that basically
00:12:43.010 --> 00:12:45.010
has a constant mean over time.
00:12:45.010 --> 00:12:48.580
There's some steady
mean level that it has.
00:12:48.580 --> 00:12:51.470
And the variability
is also constant.
00:12:51.470 --> 00:12:59.000
With some other time series,
it might increase linearly
00:12:59.000 --> 00:13:00.940
over time.
00:13:00.940 --> 00:13:03.600
And a series that increases
linearly over time, well,
00:13:03.600 --> 00:13:05.070
if you take first
differences, that
00:13:05.070 --> 00:13:07.650
tends to take out
that linear trend.
00:13:07.650 --> 00:13:10.230
If higher-order
differencing is required, then
00:13:10.230 --> 00:13:14.160
that means that there's some
curvature, quadratic say,
00:13:14.160 --> 00:13:18.760
that may exist in the data
that is being taken out.
00:13:18.760 --> 00:13:25.460
So this differencing is required
to result in stationarity.
00:13:25.460 --> 00:13:32.430
If the process does have a vector
autoregressive representation
00:13:32.430 --> 00:13:35.330
in spite of its
non-stationarity,
00:13:35.330 --> 00:13:43.920
then it can be represented by a polynomial
in the lag operator applied to the X's being
00:13:43.920 --> 00:13:48.690
equal to white noise epsilon.
00:13:48.690 --> 00:13:53.590
And the polynomial
phi of L is going
00:13:53.590 --> 00:13:59.180
to have a factor term
in there of 1 minus L,
00:13:59.180 --> 00:14:02.100
basically the first difference
operator, raised to the d-th power.
00:14:02.100 --> 00:14:06.300
So if taking the
d-th order difference
00:14:06.300 --> 00:14:12.430
reduces it to
stationarity, then we
00:14:12.430 --> 00:14:16.630
can express this vector
autoregression in this way.
00:14:16.630 --> 00:14:26.620
So the phi star of L
basically represents
00:14:26.620 --> 00:14:31.110
the stationary vector
autoregressive process
00:14:31.110 --> 00:14:33.255
on the d-th difference series.
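In symbols, this factorization is:

    \Phi(L) X_t = \epsilon_t, \qquad \Phi(L) = \Phi^*(L)\,(1 - L)^d

so that \Phi^*(L) applied to the d-th differenced series \Delta^d X_t gives white noise.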
00:14:47.730 --> 00:14:52.780
Now, as it says here, each
of the component series
00:14:52.780 --> 00:14:57.090
may be non-stationary and
integrated, say of order one.
00:14:57.090 --> 00:15:02.770
But the process itself may
not be jointly integrated.
00:15:02.770 --> 00:15:08.900
That is, it may be that there
are linear combinations
00:15:08.900 --> 00:15:13.800
of our multivariate series
which are stationary.
00:15:13.800 --> 00:15:20.570
And so these linear
combinations basically
00:15:20.570 --> 00:15:25.050
represent the stationary
features of the process.
00:15:25.050 --> 00:15:31.160
And those features can be
apparent without looking
00:15:31.160 --> 00:15:32.490
at differences.
00:15:32.490 --> 00:15:35.350
So in a sense, if
you just focused
00:15:35.350 --> 00:15:38.880
on differences of these
non-stationary multivariate
00:15:38.880 --> 00:15:43.560
series, you would be
losing out on information
00:15:43.560 --> 00:15:49.900
about the stationary structure
of contemporaneous components
00:15:49.900 --> 00:15:52.230
of the multivariate series.
00:15:52.230 --> 00:15:56.130
And so cointegration
deals with this situation
00:15:56.130 --> 00:16:01.480
where some linear combinations
of the multivariate series
00:16:01.480 --> 00:16:02.996
in fact are stationary.
00:16:08.810 --> 00:16:15.090
So how do we represent
that mathematically?
00:16:15.090 --> 00:16:19.020
Well, we say that this
multivariate time series
00:16:19.020 --> 00:16:24.360
process is cointegrated if
there exists an m-vector beta
00:16:24.360 --> 00:16:29.470
such that, with beta defining
linear weights on the X's,
00:16:29.470 --> 00:16:32.225
beta prime X_t is a
stationary process.
00:16:37.920 --> 00:16:42.610
The cointegration vector
beta can be scaled arbitrarily.
00:16:42.610 --> 00:16:49.110
So it's common
practice, if one has
00:16:49.110 --> 00:16:51.200
an interest, some primary
interest, perhaps,
00:16:51.200 --> 00:16:53.580
in the first component
series of the process,
00:16:53.580 --> 00:16:56.680
to set that coefficient equal to 1.
00:16:56.680 --> 00:17:01.020
And the expression
basically says
00:17:01.020 --> 00:17:06.470
that our time t value
of the first series
00:17:06.470 --> 00:17:11.930
is related in a stationary
way to a linear combination
00:17:11.930 --> 00:17:15.550
of the other m minus 1 series.
00:17:15.550 --> 00:17:21.859
And this is a long-run
equilibrium type relationship.
00:17:21.859 --> 00:17:25.510
How does this arise?
00:17:25.510 --> 00:17:30.570
Well, it arises in many, many
ways in economics and finance.
00:17:33.100 --> 00:17:36.000
The term structure of interest
rates, purchasing power parity.
00:17:38.820 --> 00:17:42.660
In the term structure
of interest rates,
00:17:42.660 --> 00:17:47.100
basically the differences
between yields
00:17:47.100 --> 00:17:50.260
on interest rates over
different maturities,
00:17:50.260 --> 00:17:52.600
those differences
might be stationary.
00:17:52.600 --> 00:17:56.780
The overall level of interest rates
might not be stationary,
00:17:56.780 --> 00:18:01.350
but the spreads ought
to be stationary.
00:18:01.350 --> 00:18:04.680
For purchasing power parity
in foreign exchange,
00:18:04.680 --> 00:18:10.940
if you look at the
value of currencies
00:18:10.940 --> 00:18:14.830
for different countries,
basically different countries
00:18:14.830 --> 00:18:19.710
ought to be able to purchase
the same goods for roughly
00:18:19.710 --> 00:18:20.720
the same price.
00:18:20.720 --> 00:18:23.860
And so if there are
disparities in currency values,
00:18:23.860 --> 00:18:27.740
purchasing power parity suggests
that things will revert back
00:18:27.740 --> 00:18:32.900
to some norm where everybody
is paying on average over time
00:18:32.900 --> 00:18:34.960
the same amount for
different goods.
00:18:34.960 --> 00:18:37.460
Otherwise, there
would be arbitrage.
00:18:40.030 --> 00:18:41.890
Money demand, covered
interest rate parity,
00:18:41.890 --> 00:18:44.340
law of one price,
spot and futures.
00:18:44.340 --> 00:18:48.470
Let me show you
another example that
00:18:48.470 --> 00:18:54.820
will be in the case
study for this chapter.
00:19:00.290 --> 00:19:06.410
View, full screen.
00:19:06.410 --> 00:19:09.900
Let's think about
energy futures.
00:19:09.900 --> 00:19:13.450
In fact, next Tuesday's
talk from Morgan Stanley
00:19:13.450 --> 00:19:18.490
is going to be by an expert in
commodity futures and options.
00:19:18.490 --> 00:19:21.090
And that should be
very interesting.
00:19:21.090 --> 00:19:28.920
Anyway, here, I'm
looking at energy futures
00:19:28.920 --> 00:19:31.136
from the Energy
Information Administration.
00:19:31.136 --> 00:19:32.510
Actually, for this
course, trying
00:19:32.510 --> 00:19:36.970
to get data that's freely
available to students
00:19:36.970 --> 00:19:40.560
is one of the things we do.
00:19:40.560 --> 00:19:42.646
So this data is actually
available from the Energy
00:19:42.646 --> 00:19:44.770
Information Administration
of the government, which
00:19:44.770 --> 00:19:48.960
is now open, so I guess
that'll be updated over time.
00:19:48.960 --> 00:19:52.070
But basically these
energy futures
00:19:52.070 --> 00:19:55.570
are traded on the Chicago
Mercantile Exchange.
00:19:55.570 --> 00:20:03.290
And basically CL is crude,
West Texas Intermediate crude,
00:20:03.290 --> 00:20:08.760
light crude, which we have
here, a time series from 2006
00:20:08.760 --> 00:20:12.670
to basically yesterday.
00:20:12.670 --> 00:20:16.340
And you can see how it was at the
start of the period around $60
00:20:16.340 --> 00:20:19.080
and then went up
to close to $140,
00:20:19.080 --> 00:20:22.440
and then it dropped
down to around $40.
00:20:22.440 --> 00:20:26.110
And it's been hovering
around $100 lately.
00:20:26.110 --> 00:20:33.040
The second series here is
gasoline, RBOB gasoline.
00:20:33.040 --> 00:20:36.240
Always have to look this up.
00:20:36.240 --> 00:20:42.690
This is reformulated blendstock
for oxygenated blending--
00:20:42.690 --> 00:20:43.250
gasoline.
00:20:43.250 --> 00:20:48.030
Anyway, futures on this product
are traded at the CME as well.
00:20:48.030 --> 00:20:50.750
And then heating oil.
00:20:50.750 --> 00:20:56.780
And what's happening
with these data
00:20:56.780 --> 00:21:08.880
is that we have basically
a refinery which processes
00:21:08.880 --> 00:21:15.990
crude oil as an input.
00:21:15.990 --> 00:21:20.180
And it basically
refines it, distills it,
00:21:20.180 --> 00:21:36.600
and generates outputs, which
include heating oil, gasoline,
00:21:36.600 --> 00:21:41.680
and various other things
like jet fuel and others.
00:21:41.680 --> 00:21:46.460
So if we're looking
at the prices,
00:21:46.460 --> 00:21:49.510
the futures prices of, say,
gasoline and heating oil,
00:21:49.510 --> 00:21:55.710
relating those to crude
oil, well, certainly,
00:21:55.710 --> 00:21:59.140
the cost of producing these
products should depend
00:21:59.140 --> 00:22:01.820
on the cost of the input.
00:22:01.820 --> 00:22:10.480
So I've got in the next plot,
a translation of these futures
00:22:10.480 --> 00:22:15.510
contracts into their
price per barrel.
00:22:15.510 --> 00:22:19.320
Turns out crude is quoted
in dollars per barrel.
00:22:19.320 --> 00:22:24.390
And gasoline and heating
oil are quoted per gallon.
00:22:24.390 --> 00:22:26.490
So one multiplies.
00:22:26.490 --> 00:22:28.310
There are 42
gallons in a barrel.
00:22:28.310 --> 00:22:30.960
So you multiply those
per-gallon prices by 42.
00:22:30.960 --> 00:22:33.549
And this shows the plot of
the prices of the futures
00:22:33.549 --> 00:22:35.590
where we're looking at
essentially the same units
00:22:35.590 --> 00:22:40.600
of output relative to input.
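A minimal R sketch of this conversion and of the spreads discussed below (assuming a data frame fut with columns CL in dollars per barrel and RB, HO per gallon; the names are assumptions):

    # 42 gallons per barrel puts all three on a per-barrel basis.
    fut$RB.bbl <- fut$RB * 42
    fut$HO.bbl <- fut$HO * 42
    # Crack spreads: value of each output minus the input.
    crack.gas  <- fut$RB.bbl - fut$CL
    crack.heat <- fut$HO.bbl - fut$CL
    plot(crack.gas,  type = "l", ylab = "$/barrel")  # gasoline crack spread
    plot(crack.heat, type = "l", ylab = "$/barrel")  # heating oil crack spread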
00:22:40.600 --> 00:22:45.700
And what's evident here is that
the futures for gasoline,
00:22:45.700 --> 00:22:50.450
the blue, are consistently above
the green, the input, and the same
00:22:50.450 --> 00:22:52.520
for heating oil.
00:22:52.520 --> 00:22:55.680
And the spreads vary over
time in which one is greater.
00:22:55.680 --> 00:23:02.600
So if we look at the
difference between, say,
00:23:02.600 --> 00:23:07.020
the price of the heating
oil future and the crude oil
00:23:07.020 --> 00:23:11.625
future, what does
that represent?
00:23:14.380 --> 00:23:20.780
That's the spread in value of
the output minus the input.
00:23:20.780 --> 00:23:21.546
Ray?
00:23:21.546 --> 00:23:24.282
AUDIENCE: [INAUDIBLE] cost
of running the refinery?
00:23:27.146 --> 00:23:31.940
PROFESSOR: So cost of refining.
00:23:31.940 --> 00:23:39.700
So let's look at, say,
heating oil minus CL and, say,
00:23:39.700 --> 00:23:43.930
this RBOB minus CL.
00:23:43.930 --> 00:23:46.670
So it's cost of refining.
00:23:46.670 --> 00:23:49.487
What else could
be a factor here?
00:23:49.487 --> 00:23:51.820
AUDIENCE: Supply and demand
characteristics [INAUDIBLE].
00:23:51.820 --> 00:23:52.736
PROFESSOR: Definitely.
00:23:52.736 --> 00:23:54.165
Supply and demand.
00:23:54.165 --> 00:23:56.290
If one product is demanded
a lot more than another.
00:23:58.280 --> 00:23:59.030
Supply and demand.
00:24:05.820 --> 00:24:08.215
Anything else?
00:24:08.215 --> 00:24:09.840
AUDIENCE: Maybe for
the outputs, if you
00:24:09.840 --> 00:24:11.340
were to find the difference
between the outputs,
00:24:11.340 --> 00:24:13.060
it would be something cyclical.
00:24:13.060 --> 00:24:15.640
For example, in the
winter, heating oil
00:24:15.640 --> 00:24:17.840
is going to get far more
valuable than gasoline,
00:24:17.840 --> 00:24:19.840
because people drive less
and people demand more
00:24:19.840 --> 00:24:20.950
for heating homes.
00:24:20.950 --> 00:24:22.080
PROFESSOR: Absolutely.
00:24:22.080 --> 00:24:25.670
That's a very significant
factor with these.
00:24:25.670 --> 00:24:29.230
There are seasonal effects
that drive supply and demand.
00:24:29.230 --> 00:24:35.460
And so we can put
seasonal effects in there
00:24:35.460 --> 00:24:36.980
as affecting supply and demand.
00:24:36.980 --> 00:24:40.280
But certainly, you might expect
to see seasonal structure here.
00:24:40.280 --> 00:24:43.720
Anything else?
00:24:43.720 --> 00:24:47.070
Put on your trader's hat.
00:24:47.070 --> 00:24:49.310
Profit, yes.
00:24:49.310 --> 00:24:53.160
The refinery needs
to make some profit.
00:24:53.160 --> 00:24:58.520
So there has to be some
level of profit that's
00:24:58.520 --> 00:25:02.240
acceptable and appropriate.
00:25:02.240 --> 00:25:05.250
So we have all these
things driving basically
00:25:05.250 --> 00:25:07.630
these differences.
00:25:07.630 --> 00:25:10.220
Let's just take a look
at those differences.
00:25:10.220 --> 00:25:14.880
These are actually
called the crack spreads.
00:25:14.880 --> 00:25:19.250
Cracking in the
business of refining
00:25:19.250 --> 00:25:22.220
is basically the
breaking down of oil
00:25:22.220 --> 00:25:26.250
into components, products.
00:25:26.250 --> 00:25:31.800
And on the top is the
gasoline crack spread.
00:25:31.800 --> 00:25:35.460
And the bottom is the
heating oil crack spread.
00:25:35.460 --> 00:25:37.720
And one can see
that as time series,
00:25:37.720 --> 00:25:41.860
these actually look stationary.
00:25:41.860 --> 00:25:45.920
There certainly doesn't appear
to be a linear trend up.
00:25:45.920 --> 00:25:51.390
But there are, of course, many
factors that could affect this.
00:25:51.390 --> 00:25:59.110
So with that as motivation, how
would we model such a series?
00:25:59.110 --> 00:26:01.230
So let's go back to
our lecture here.
00:26:06.420 --> 00:26:08.775
All right, View, full size.
00:26:15.760 --> 00:26:18.430
This is going to be a
very technical discussion,
00:26:18.430 --> 00:26:25.460
but it's, at the end of the day,
I think fairly straightforward.
00:26:25.460 --> 00:26:27.210
And the objective
actually of this lecture
00:26:27.210 --> 00:26:31.240
is to provide an introduction
to the notation here, which
00:26:31.240 --> 00:26:35.860
should make it seem like it's a
very straightforward derivation
00:26:35.860 --> 00:26:37.800
process of these models.
00:26:37.800 --> 00:26:42.890
So let's begin with just a recap
of the vector autoregressive
00:26:42.890 --> 00:26:45.350
model of order p.
00:26:45.350 --> 00:26:47.570
This is the extension of
the univariate case where
00:26:47.570 --> 00:26:52.870
we have a vector C of
constants, m constants,
00:26:52.870 --> 00:26:56.960
and matrices phi_1 to
phi_p corresponding
00:26:56.960 --> 00:27:01.650
to basically how the
autoregression of one series
00:27:01.650 --> 00:27:04.810
depends on all the other series.
00:27:04.810 --> 00:27:08.270
And then there's multivariate
white noise eta_t,
00:27:08.270 --> 00:27:13.630
which has mean 0 and some
covariance structure in it.
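In symbols, the VAR(p) model just recapped is:

    X_t = C + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \eta_t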
00:27:13.630 --> 00:27:19.830
And the stationarity-- if
this series were stationary,
00:27:19.830 --> 00:27:28.050
then the determinant of
this matrix polynomial
00:27:28.050 --> 00:27:33.360
would have roots outside the
unit circle for complex z.
00:27:33.360 --> 00:27:39.290
And if it's not stationary,
then some of those roots
00:27:39.290 --> 00:27:41.680
will be on the unit
circle or inside it.
00:27:41.680 --> 00:27:45.125
So let's actually go to
that non-stationary case
00:27:45.125 --> 00:27:50.540
and suppose that the process
is integrated of order one.
00:27:50.540 --> 00:27:53.050
So if we were to take
first differences,
00:27:53.050 --> 00:27:54.175
we would have stationarity.
00:28:02.690 --> 00:28:06.500
Well, the derivation
of the model
00:28:06.500 --> 00:28:12.150
proceeds by converting the
original vector autoregressive
00:28:12.150 --> 00:28:16.050
equation into an
equation that's mostly
00:28:16.050 --> 00:28:19.560
relating to differences but
with also some extra terms.
00:28:19.560 --> 00:28:24.130
So let's begin the process
by just subtracting
00:28:24.130 --> 00:28:26.620
the lagged value of
the multivariate vector
00:28:26.620 --> 00:28:29.030
from the original series.
00:28:29.030 --> 00:28:31.290
So we subtract X_(t-1)
from both sides,
00:28:31.290 --> 00:28:37.330
and we get delta X_t is equal to
C plus (phi_1 minus I_m) X_(t-1)
00:28:37.330 --> 00:28:38.200
plus the rest.
00:28:38.200 --> 00:28:41.960
So that's a very simple step.
00:28:41.960 --> 00:28:46.220
We're just subtracting the
lagged multivariate series
00:28:46.220 --> 00:28:49.370
from both sides.
00:28:49.370 --> 00:28:53.290
Now, what we want
to do is convert
00:28:53.290 --> 00:28:59.930
the second term in the middle
line into a difference term.
00:28:59.930 --> 00:29:00.990
So what do we do?
00:29:00.990 --> 00:29:07.900
Well, we can subtract and add
(phi_1 minus I_m) times X_(t-2).
00:29:07.900 --> 00:29:10.440
If we do that,
subtract and add that,
00:29:10.440 --> 00:29:13.810
we then get that delta X_t is
C plus a multiple of delta
00:29:13.810 --> 00:29:19.530
X_(t-1) plus this
multiple of X_(t-2).
00:29:19.530 --> 00:29:22.240
So we basically
reduced the equations
00:29:22.240 --> 00:29:25.290
to differences in
the first two terms
00:29:25.290 --> 00:29:29.520
or in the current
series and the lagged.
00:29:29.520 --> 00:29:33.550
But then we have the original
series at lag t minus 2.
00:29:33.550 --> 00:29:38.660
We can continue this
process with the third lag, and so on.
00:29:38.660 --> 00:29:42.460
And then at the
end of the day, we
00:29:42.460 --> 00:29:46.150
end up getting this equation
for the difference of the series
00:29:46.150 --> 00:29:49.300
is equal to a constant
plus a matrix multiple
00:29:49.300 --> 00:29:53.880
of the first difference
multivariate series,
00:29:53.880 --> 00:29:56.920
plus another matrix times
the second difference,
00:29:56.920 --> 00:30:01.720
all the way down to
the p-th difference,
00:30:01.720 --> 00:30:03.760
or the p minus first difference.
00:30:03.760 --> 00:30:07.400
But at the end,
we're left with terms
00:30:07.400 --> 00:30:11.320
at p lags that have no
differences in them.
00:30:11.320 --> 00:30:14.440
So we've been able to
represent this series
00:30:14.440 --> 00:30:19.090
as an autoregressive
function of differences.
00:30:19.090 --> 00:30:24.010
But there's also a term on
the undifferenced series
00:30:24.010 --> 00:30:27.470
at the end that's left over.
00:30:27.470 --> 00:30:34.900
And this argument
can actually
00:30:34.900 --> 00:30:38.330
proceed by eliminating
differences in the reverse way,
00:30:38.330 --> 00:30:42.650
starting with the
p-th lag and going up.
00:30:42.650 --> 00:30:47.200
And one then can represent
this as delta X_t
00:30:47.200 --> 00:30:50.170
is C plus some
matrix times just the
00:30:50.170 --> 00:30:56.000
lagged series plus various
matrices times the differences
00:30:56.000 --> 00:30:58.880
going back p minus 1 lags.
00:31:05.460 --> 00:31:10.200
And so at the end of
the day, this model
00:31:10.200 --> 00:31:14.270
basically for delta
X_t is a constant
00:31:14.270 --> 00:31:20.760
plus a matrix times the
previous lagged series
00:31:20.760 --> 00:31:25.660
or the first lag of the
multivariate time series,
00:31:25.660 --> 00:31:30.320
plus various autoregressive
lags of the differenced series.
00:31:32.960 --> 00:31:36.130
So these notes give you
the formulas for those,
00:31:36.130 --> 00:31:40.840
and they're very easy to
verify if you go through them
00:31:40.840 --> 00:31:41.594
one by one.
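In the standard parameterization (a sketch of the formulas in the notes; the notation there may differ slightly):

    \Delta X_t = C + \Pi X_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \,\Delta X_{t-j} + \eta_t,
    \qquad \Pi = -\Big(I_m - \sum_{j=1}^{p} \phi_j\Big),
    \qquad \Gamma_j = -\sum_{k=j+1}^{p} \phi_k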
00:31:45.730 --> 00:31:51.760
And when we look at this
expression for the model,
00:31:51.760 --> 00:31:57.270
this expresses the
stochastic process model
00:31:57.270 --> 00:31:59.560
for the difference series.
00:31:59.560 --> 00:32:03.780
This difference
series is stationary.
00:32:03.780 --> 00:32:05.970
We've eliminated
the non-stationarity
00:32:05.970 --> 00:32:06.630
in the process.
00:32:06.630 --> 00:32:09.160
So that means the
right-hand side
00:32:09.160 --> 00:32:12.890
has to be stationary as well.
00:32:12.890 --> 00:32:19.890
And so the terms which
are matrix multiples of lags
00:32:19.890 --> 00:32:21.390
of the differenced
series, those are
00:32:21.390 --> 00:32:23.750
going to be stationary
because we're just
00:32:23.750 --> 00:32:27.680
taking lags of the
stationary multivariate time
00:32:27.680 --> 00:32:29.540
series, the difference series.
00:32:29.540 --> 00:32:36.880
But this pi X_(t-1) term has
to be stationary as well.
00:32:36.880 --> 00:32:41.640
So this pi X_(t-1) contains
the cointegrating terms.
00:32:41.640 --> 00:32:46.600
And fitting a sort of
cointegrated vector
00:32:46.600 --> 00:32:53.490
autoregression model involves
identifying this term, pi X_(t-1).
00:32:53.490 --> 00:33:00.870
And given that the original
series had unit roots,
00:33:00.870 --> 00:33:06.195
it has to be the case that
pi, the matrix, is singular.
00:33:09.550 --> 00:33:12.080
So it's basically
a transformation
00:33:12.080 --> 00:33:15.310
of the data that
eliminates that unit
00:33:15.310 --> 00:33:19.880
root in the overall series.
00:33:19.880 --> 00:33:24.440
So the matrix pi
is of reduced rank,
00:33:24.440 --> 00:33:27.676
and it's either rank
zero, in which case
00:33:27.676 --> 00:33:29.300
there are no cointegrating
relationships,
00:33:29.300 --> 00:33:34.500
or its rank is less than m.
00:33:34.500 --> 00:33:39.060
And the matrix pi does
define the cointegrating
00:33:39.060 --> 00:33:40.550
relationships.
00:33:40.550 --> 00:33:43.080
Now, these cointegrating
relationships
00:33:43.080 --> 00:33:48.990
are the relationships in the
process that are stationary.
00:33:48.990 --> 00:33:53.200
And so basically there's
a lot of information
00:33:53.200 --> 00:33:57.880
in that multivariate series
with contemporaneous values
00:33:57.880 --> 00:33:59.470
of the series.
00:33:59.470 --> 00:34:02.500
There is stationary structure
at every single time
00:34:02.500 --> 00:34:08.199
point, which can be the
target of the modeling.
00:34:08.199 --> 00:34:16.250
So this matrix pi is
of rank r less than m.
00:34:16.250 --> 00:34:22.100
And so it can be expressed
as basically alpha beta
00:34:22.100 --> 00:34:30.540
prime, where these matrices
are of rank r, alpha and beta.
00:34:30.540 --> 00:34:33.199
And the columns of beta define
linearly independent vectors
00:34:33.199 --> 00:34:34.770
which cointegrate x.
00:34:34.770 --> 00:34:37.909
And the decomposition
of pi isn't unique.
00:34:37.909 --> 00:34:43.389
You can basically, for any
invertible r by r matrix g,
00:34:43.389 --> 00:34:46.350
define another set of
cointegrating relationships.
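In symbols: if \Pi = \alpha \beta' with \alpha and \beta both m \times r of full column rank, then for any invertible r \times r matrix G,

    \Pi = (\alpha G)\,\big(\beta (G^{-1})'\big)'

so the pair (\alpha G, \beta (G^{-1})') defines the same \Pi.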
00:34:46.350 --> 00:34:50.340
So in the linear algebra
structure of these problems,
00:34:50.340 --> 00:34:52.800
there's basically an
r-dimensional space
00:34:52.800 --> 00:34:56.360
where the process is
stationary, and how
00:34:56.360 --> 00:35:02.020
you define the coordinate system
in that space is up to you
00:35:02.020 --> 00:35:08.130
or subject to some choice.
00:35:08.130 --> 00:35:09.780
So how do we estimate
these models?
00:35:09.780 --> 00:35:15.520
Well, there's a rather nice result
of Sims, Stock, and Watson.
00:35:15.520 --> 00:35:17.800
Actually, Sims,
Christopher Sims,
00:35:17.800 --> 00:35:21.790
he got the Nobel Prize a
few years ago for his work
00:35:21.790 --> 00:35:23.730
in econometrics.
00:35:23.730 --> 00:35:33.850
And so this is a rather
significant work that he did.
00:35:33.850 --> 00:35:36.740
Anyway, he, together
with Stock and Watson,
00:35:36.740 --> 00:35:41.120
proved that if you're estimating
a vector autoregression model,
00:35:41.120 --> 00:35:45.490
then the least squares
estimator of the original model
00:35:45.490 --> 00:35:49.150
is basically sufficient
to do an analysis
00:35:49.150 --> 00:35:56.600
of this cointegrated vector
autoregression process.
00:35:56.600 --> 00:35:58.960
The parameter estimates
from just fitting
00:35:58.960 --> 00:36:03.610
the vector autoregression are
consistent for the underlying
00:36:03.610 --> 00:36:04.657
parameters.
00:36:04.657 --> 00:36:06.240
And they have
asymptotic distributions
00:36:06.240 --> 00:36:09.980
that are identical to those of
maximum likelihood estimators.
00:36:09.980 --> 00:36:18.360
And so what ends up happening
is the least squares estimates
00:36:18.360 --> 00:36:21.960
of the vector autoregression
parameters lead
00:36:21.960 --> 00:36:27.270
to an estimation
of the pi matrix.
00:36:27.270 --> 00:36:40.290
And the constraints on the pi
matrix, basically that pi
00:36:40.290 --> 00:36:44.430
is of reduced rank, will
hold asymptotically.
00:36:44.430 --> 00:36:49.240
So let's just go back
to the equation before,
00:36:49.240 --> 00:36:54.490
to see if that
looks familiar here.
00:36:58.930 --> 00:37:03.070
So what that work says
is that if we basically
00:37:03.070 --> 00:37:07.110
fit the linear regression
model regressing the difference
00:37:07.110 --> 00:37:13.930
series on the lag of the series
plus lags of differences,
00:37:13.930 --> 00:37:18.590
the least squares estimates
of these underlying parameters
00:37:18.590 --> 00:37:21.690
will give us asymptotically
efficient estimates
00:37:21.690 --> 00:37:24.060
of this overall process.
00:37:24.060 --> 00:37:31.635
So we don't need to use any new
tools to specify these models.
00:37:43.800 --> 00:37:48.110
There's an advanced literature
on estimation methods
00:37:48.110 --> 00:37:49.950
for these models.
00:37:49.950 --> 00:37:55.050
Johansen does describe
maximum likelihood estimation
00:37:55.050 --> 00:38:01.260
when the innovation terms
are normally distributed.
00:38:01.260 --> 00:38:07.270
And that methodology applies
reduced rank regression
00:38:07.270 --> 00:38:13.150
methodology and
yields tests for what
00:38:13.150 --> 00:38:17.130
the rank is of the
cointegrating relationship.
00:38:17.130 --> 00:38:20.270
And these methods are
implemented in R packages.
00:38:25.710 --> 00:38:26.420
Let's see.
00:38:26.420 --> 00:38:40.890
Let me just go back now
to the-- so let's see.
00:38:40.890 --> 00:38:47.690
The case study on
the crack spread data
00:38:47.690 --> 00:38:51.370
actually goes through sort of
testing for non-stationarity
00:38:51.370 --> 00:38:54.040
in these underlying series.
00:38:54.040 --> 00:38:58.360
And actually, why don't
I just show you that?
00:38:58.360 --> 00:38:59.450
Let's go back here.
00:39:17.522 --> 00:39:23.460
If you can see this, for
the crack spread data,
00:39:23.460 --> 00:39:25.230
looking at the
crude oil futures,
00:39:25.230 --> 00:39:28.450
basically the crude oil
future can be evaluated
00:39:28.450 --> 00:39:30.790
to see if it's non-stationary.
00:39:30.790 --> 00:39:33.800
And there's this augmented
Dickey-Fuller test
00:39:33.800 --> 00:39:36.350
for non-stationarity.
00:39:36.350 --> 00:39:43.160
And it basically has a null
hypothesis that the model
00:39:43.160 --> 00:39:46.850
or the series is non-stationary,
or it has a unit root,
00:39:46.850 --> 00:39:49.040
versus the alternative
that it doesn't.
00:39:49.040 --> 00:39:52.180
And so testing that
null hypothesis
00:39:52.180 --> 00:39:56.121
that it's non-stationary
yields a p-value of 0.164
00:39:56.121 --> 00:40:01.690
for CLC1, the first
nearest contract,
00:40:01.690 --> 00:40:07.400
near month contract of
the futures for crude.
00:40:07.400 --> 00:40:11.230
And so the data
suggests that crude
00:40:11.230 --> 00:40:14.060
has a distribution that's
non-stationary, integrated
00:40:14.060 --> 00:40:16.490
of order 1.
00:40:16.490 --> 00:40:23.950
And the HOC1 also basically
has a test for-- p-value
00:40:23.950 --> 00:40:27.550
for non-stationarity of 0.3265.
00:40:27.550 --> 00:40:31.000
So we can't reject
non-stationarity or unit root
00:40:31.000 --> 00:40:34.150
in those series with
these test statistics.
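A minimal R sketch of these tests (assuming the futures prices are columns CLC1 and HOC1 of the data frame fut; the names are assumptions):

    library(tseries)  # provides adf.test
    # Null hypothesis: the series has a unit root (non-stationary).
    adf.test(fut$CLC1)  # p-value of 0.164 reported in the case note
    adf.test(fut$HOC1)  # p-value of 0.3265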
00:40:34.150 --> 00:40:39.260
In analyzing the data, this
suggests that we basically
00:40:39.260 --> 00:40:41.380
need to accommodate that
non-stationarity when
00:40:41.380 --> 00:40:43.150
we specify the models.
00:40:46.925 --> 00:40:49.130
Let me just see if
there's some results here.
00:41:55.180 --> 00:41:59.060
For this series,
actually the case notes
00:41:59.060 --> 00:42:01.270
will go through
conducting this Johansen
00:42:01.270 --> 00:42:03.360
procedure for
testing for the rank
00:42:03.360 --> 00:42:05.700
of the cointegrated process.
00:42:05.700 --> 00:42:11.630
And that test basically has
different test statistics
00:42:11.630 --> 00:42:15.260
for testing whether the rank is
0, less than or equal to 1,
00:42:15.260 --> 00:42:16.870
or less than or equal to 2.
00:42:16.870 --> 00:42:19.650
And one can see that
there's marginal-- the test
00:42:19.650 --> 00:42:25.930
statistic is almost
significant at the 10% level
00:42:25.930 --> 00:42:29.780
for the overall series.
00:42:29.780 --> 00:42:32.670
It's not significant
for the rank
00:42:32.670 --> 00:42:34.460
being less than or equal to 1.
00:42:34.460 --> 00:42:38.390
And so these results don't
suggest strong evidence
00:42:38.390 --> 00:42:40.880
of cointegration.
00:42:40.880 --> 00:42:45.360
But certainly, whatever
cointegration there is
00:42:45.360 --> 00:42:48.620
is of no more than rank
one for these series.
00:42:48.620 --> 00:42:52.030
And the eigenvector
corresponding
00:42:52.030 --> 00:42:54.070
to the stationary
relationship is
00:42:54.070 --> 00:43:00.940
given by these coefficients
of 1 on the crude oil future,
00:43:00.940 --> 00:43:05.710
1.3 on the RBOB and minus
1.7 on the heating oil.
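A minimal R sketch of the Johansen procedure (using the urca package; the lag order K = 2 is an assumption):

    library(urca)
    # Trace test for the cointegrating rank of the three
    # per-barrel futures series.
    jo <- ca.jo(fut[, c("CL", "RB.bbl", "HO.bbl")],
                type = "trace", ecdet = "const", K = 2)
    summary(jo)  # test statistics for r = 0, r <= 1, r <= 2,
                 # plus the estimated cointegrating eigenvectors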
00:43:08.640 --> 00:43:13.360
So what this suggests
is that there's
00:43:13.360 --> 00:43:20.880
considerable variability in
these energy futures contracts.
00:43:20.880 --> 00:43:24.390
What appears to be stationary
is some linear combination
00:43:24.390 --> 00:43:28.670
of crude plus gasoline
minus heating oil.
00:43:28.670 --> 00:43:33.090
And in terms of why does
it combine that way,
00:43:33.090 --> 00:43:35.280
well, there are all
kinds of factors
00:43:35.280 --> 00:43:38.760
that we went through-- cost of
refining, supply and demand,
00:43:38.760 --> 00:43:41.370
seasonality, which
affect things.
00:43:41.370 --> 00:43:45.970
And so when analyzed, sort
of ignoring seasonality,
00:43:45.970 --> 00:43:50.000
these would be the linear
combinations that appear
00:43:50.000 --> 00:43:51.312
to be stationary over time.
00:43:51.312 --> 00:43:51.812
Yeah?
00:43:53.722 --> 00:43:55.680
AUDIENCE: Why did you
choose to use the futures
00:43:55.680 --> 00:43:56.929
prices as opposed to the spot?
00:43:56.929 --> 00:44:00.170
And how did you combine the
data with actual [INAUDIBLE]?
00:44:00.170 --> 00:44:07.820
PROFESSOR: I chose this
because if refiners are wanting
00:44:07.820 --> 00:44:12.130
to hedge their risks, then they
will go to the futures market
00:44:12.130 --> 00:44:14.060
to hedge those.
00:44:14.060 --> 00:44:17.090
And so working with
these data, one
00:44:17.090 --> 00:44:24.370
can then consider problems of
hedging refinery production
00:44:24.370 --> 00:44:25.460
risks.
00:44:25.460 --> 00:44:28.620
And so that's why.
00:44:28.620 --> 00:44:30.960
AUDIENCE: [INAUDIBLE]
00:44:30.960 --> 00:44:33.800
PROFESSOR: OK, well, the Energy
Information Administration
00:44:33.800 --> 00:44:39.270
provides historical data
which gives the first month,
00:44:39.270 --> 00:44:42.030
the second month, the third
month available for each
00:44:42.030 --> 00:44:43.400
of these contracts.
00:44:43.400 --> 00:44:47.720
And so I chose the
first month contract
00:44:47.720 --> 00:44:49.680
for each of these futures.
00:44:49.680 --> 00:44:51.980
Those tend to be the most liquid.
00:44:51.980 --> 00:44:54.440
Depending on what
one is hedging,
00:44:54.440 --> 00:44:58.550
one would use perhaps
longer periods for those.
00:44:58.550 --> 00:45:02.450
There's some very
nice finance problems
00:45:02.450 --> 00:45:04.690
dealing with hedging,
hedging these kinds of risks,
00:45:04.690 --> 00:45:07.150
as well as trading
these kinds of risks.
00:45:07.150 --> 00:45:11.030
Traders can try to exploit
short term movements in these.
00:45:29.870 --> 00:45:31.820
Anyway, I'll let you
look through these,
00:45:31.820 --> 00:45:32.760
the case note later.
00:45:32.760 --> 00:45:36.810
And it does provide some detail
on the coefficient estimates.
00:45:36.810 --> 00:45:39.119
And one can basically
get a handle
00:45:39.119 --> 00:45:40.785
on how these things
are being specified.
00:45:43.980 --> 00:45:46.170
So let's go back.
00:45:58.260 --> 00:46:06.490
The next topic I want to cover
is linear state-space models.
00:46:06.490 --> 00:46:12.725
It turns out that many
of these time series
00:46:12.725 --> 00:46:15.090
models appropriate in
economics and finance
00:46:15.090 --> 00:46:20.290
can be expressed as a
linear state-space model.
00:46:28.590 --> 00:46:32.250
I'm going to introduce the
general notation first and then
00:46:32.250 --> 00:46:35.100
provide illustrations
of this general notation
00:46:35.100 --> 00:46:38.480
with a number of
different examples.
00:46:38.480 --> 00:46:46.205
So the formulation is we have
basically an observation vector
00:46:46.205 --> 00:46:47.420
at time t, y_t.
00:46:47.420 --> 00:46:50.730
This is our multivariate time
series that we're modeling.
00:46:50.730 --> 00:46:53.930
Now, I've chosen it
to be k-dimensional
00:46:53.930 --> 00:46:57.900
for the observations.
00:46:57.900 --> 00:47:00.720
There's an underlying
state vector
00:47:00.720 --> 00:47:04.390
that's of m dimensions,
which basically characterizes
00:47:04.390 --> 00:47:11.740
the state of the
process at time t.
00:47:11.740 --> 00:47:15.240
There's an observation error
vector at time t, epsilon_t.
00:47:15.240 --> 00:47:18.830
So it's k by 1 as well,
corresponding to y.
00:47:18.830 --> 00:47:22.200
And there's a state transition
innovation error vector,
00:47:22.200 --> 00:47:31.240
which is n by 1,
which actually can
00:47:31.240 --> 00:47:36.040
be different from m, the
dimension of the state vector.
00:47:36.040 --> 00:47:41.300
So we have-- in the state
space specification,
00:47:41.300 --> 00:47:43.720
we're going to specify
two equations, one
00:47:43.720 --> 00:47:47.640
for how the states evolve
over time and another for how
00:47:47.640 --> 00:47:50.090
the observations or
measurements evolve,
00:47:50.090 --> 00:47:51.910
depending on the
underlying states.
00:47:51.910 --> 00:47:55.400
So let's first focus
on a state equation
00:47:55.400 --> 00:47:58.490
which describes how
the state progresses
00:47:58.490 --> 00:48:05.680
from the state at time t to
the state at time t plus 1.
00:48:05.680 --> 00:48:09.030
Because this is a linear
state-space model,
00:48:09.030 --> 00:48:10.710
basically the state
at t plus 1 is
00:48:10.710 --> 00:48:13.400
going to be some linear
function of the states at time
00:48:13.400 --> 00:48:16.640
t plus some noise.
00:48:16.640 --> 00:48:22.570
And that noise is
given by eta_t,
00:48:22.570 --> 00:48:26.670
being independent identically
distributed white noise,
00:48:26.670 --> 00:48:31.600
or normally distributed
with some covariance matrix
00:48:31.600 --> 00:48:33.910
Q_t, positive definite.
00:48:33.910 --> 00:48:37.740
And R_t is some
linear transformation
00:48:37.740 --> 00:48:41.180
of those, which
characterizes the uncertainty
00:48:41.180 --> 00:48:42.880
in the particular states.
00:48:42.880 --> 00:48:45.160
So there's a great
deal of flexibility
00:48:45.160 --> 00:48:47.830
here in how things
depend on each other.
00:48:47.830 --> 00:48:53.090
And right now, it will appear
just like a lot of notation.
00:48:53.090 --> 00:48:54.700
But as we see it
in different cases,
00:48:54.700 --> 00:48:57.750
you'll see how these
terms come into play.
00:48:57.750 --> 00:48:59.260
And they're very
straightforward.
00:49:02.510 --> 00:49:04.800
So we're considering simple
linear transformations
00:49:04.800 --> 00:49:07.080
of the states plus noise.
00:49:07.080 --> 00:49:09.690
And then the observation
equation or measurement
00:49:09.690 --> 00:49:13.080
equation is a linear
transformation
00:49:13.080 --> 00:49:14.665
of the underlying
states plus noise.
00:49:17.230 --> 00:49:20.230
So the matrix Z_t is the
observation coefficients
00:49:20.230 --> 00:49:21.500
matrix.
00:49:21.500 --> 00:49:25.792
And the noise or innovations
epsilon_t are, we'll assume,
00:49:25.792 --> 00:49:27.250
independent
identically distributed
00:49:27.250 --> 00:49:29.083
normal, multivariate
normal random variables
00:49:29.083 --> 00:49:33.550
with some covariance matrix H_t.
00:49:33.550 --> 00:49:35.760
To be fully general,
the subscript t
00:49:35.760 --> 00:49:40.800
means the covariance
can depend on time t.
00:49:40.800 --> 00:49:44.780
It doesn't have to, but it can.
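Collecting this notation, the two equations are:

    s_{t+1} = T_t s_t + R_t \eta_t, \qquad \eta_t \sim N(0, Q_t)
    y_t = Z_t s_t + \epsilon_t, \qquad \epsilon_t \sim N(0, H_t)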
00:49:44.780 --> 00:49:48.600
These two equations
can be written together
00:49:48.600 --> 00:49:52.830
in a joint equation where
we see that the underlying
00:49:52.830 --> 00:49:59.370
state at time t, s, gets
transformed with T sub t
00:49:59.370 --> 00:50:04.550
to the state at t plus 1 plus
a residual innovation term.
00:50:04.550 --> 00:50:08.720
And the observation equation
y_t is Z_t s_t plus that.
00:50:08.720 --> 00:50:12.430
So we're representing how
the states evolve over time
00:50:12.430 --> 00:50:14.910
and how the observations
depend on the underlying
00:50:14.910 --> 00:50:16.815
states in this joint equation.
00:50:19.770 --> 00:50:23.950
And the structure of
basically this sort
00:50:23.950 --> 00:50:28.400
of linear function of states
plus error, the error term u_t
00:50:28.400 --> 00:50:33.740
here is normally distributed
with covariance matrix omega,
00:50:33.740 --> 00:50:36.690
which has this structure.
00:50:36.690 --> 00:50:38.850
It's a block diagonal.
00:50:38.850 --> 00:50:42.942
We have the covariance
of the epsilons as the H.
00:50:42.942 --> 00:50:48.860
And the covariance of R_t
eta_t is R_t Q_t R_t transpose.
00:50:48.860 --> 00:50:54.660
So you may recall when we
take a covariance matrix
00:50:54.660 --> 00:51:01.210
of linear function of random
variables given by a matrix,
00:51:01.210 --> 00:51:05.310
then it's that linear function
R times the covariance matrix
00:51:05.310 --> 00:51:07.970
times the transpose.
00:51:07.970 --> 00:51:12.910
So that term comes into play.
00:51:12.910 --> 00:51:16.860
So let's see how a
capital asset pricing
00:51:16.860 --> 00:51:19.720
model with time-varying
betas can be represented
00:51:19.720 --> 00:51:21.540
as a linear state-space model.
00:51:24.220 --> 00:51:29.180
You'll recall, we discussed
this model a few lectures ago,
00:51:29.180 --> 00:51:33.870
where the excess
return of a given stock, r_t,
00:51:33.870 --> 00:51:39.150
is a linear function of the
excess return of the market
00:51:39.150 --> 00:51:43.710
portfolio, r_(m,t), plus error.
00:51:43.710 --> 00:51:48.310
What we're going to do now
is extend that previous model
00:51:48.310 --> 00:51:54.170
by adding time dependence, t,
to the regression parameters.
00:51:54.170 --> 00:51:56.320
The alpha is not a constant.
00:51:56.320 --> 00:51:58.060
It is going to vary by time.
00:51:58.060 --> 00:52:02.700
And the beta is also
going to vary by time.
00:52:02.700 --> 00:52:04.810
And how will they vary by time?
00:52:04.810 --> 00:52:10.030
Well, we're going to
assume that the alpha_t is
00:52:10.030 --> 00:52:13.520
a Gaussian random walk.
00:52:13.520 --> 00:52:17.982
And the beta is also a
Gaussian random walk.
00:52:28.810 --> 00:52:33.670
And with that set up, we
have the following expression
00:52:33.670 --> 00:52:35.450
for the state equation.
00:52:35.450 --> 00:52:38.460
OK, the state equation, which
is just the unknown parameters--
00:52:38.460 --> 00:52:40.990
it's the alpha and the
beta at a given time t.
00:52:43.660 --> 00:52:45.720
The state at time
t gets adjusted
00:52:45.720 --> 00:52:49.340
to the state at time t plus 1
by just adding these random walk
00:52:49.340 --> 00:52:50.100
terms to it.
00:52:50.100 --> 00:52:52.290
So it's a very simple process.
00:52:52.290 --> 00:52:55.270
We have the identity
times the previous state
00:52:55.270 --> 00:52:58.930
plus the identity times this
vector of these innovations.
00:52:58.930 --> 00:53:04.120
So s_(t+1) is equal to
T_t s_t plus R_t eta_t,
00:53:04.120 --> 00:53:08.720
where the matrices T sub
t and R sub t are trivial;
00:53:08.720 --> 00:53:10.290
they're just the identity.
00:53:10.290 --> 00:53:15.710
And eta_t has a
covariance matrix
00:53:15.710 --> 00:53:18.985
which is just given by
Q_t, with diagonal entries sigma squared nu
00:53:18.985 --> 00:53:22.560
and sigma squared epsilon.
00:53:22.560 --> 00:53:28.680
This is a complex way, perhaps,
of representing this model.
00:53:28.680 --> 00:53:32.610
But it puts this simple model
into that linear state-space
00:53:32.610 --> 00:53:33.110
framework.
00:53:36.670 --> 00:53:45.660
Now, the observation equation
is given by this expression
00:53:45.660 --> 00:53:52.250
defining the Z_t matrix as the
unit element and r_(m,t). So
00:53:52.250 --> 00:53:58.150
it's basically a row vector, or
a one-row matrix.
00:53:58.150 --> 00:54:02.180
And epsilon_t is the
white noise process.
00:54:02.180 --> 00:54:05.570
Now, putting these
equations together,
00:54:05.570 --> 00:54:09.270
we basically have the equation
for the state transition
00:54:09.270 --> 00:54:13.230
and the observation
equation together.
00:54:13.230 --> 00:54:16.120
We have this form for that.
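A minimal R sketch of this model with the dlm package (the return vectors r and rm and the variance values are assumptions, not the case note's estimates):

    library(dlm)
    # Regression of r on rm with intercept; the two states are
    # alpha_t and beta_t, each following a random walk with the
    # given dW variances; dV is the observation variance.
    capm.mod  <- dlmModReg(rm, addInt = TRUE, dV = 1, dW = c(0.01, 0.01))
    capm.filt <- dlmFilter(r, capm.mod)
    capm.sm   <- dlmSmooth(capm.filt)
    # Smoothed paths of alpha_t and beta_t (first row is the prior).
    matplot(dropFirst(capm.sm$s), type = "l")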
00:54:25.780 --> 00:54:28.522
So now, let's
consider a second case
00:54:28.522 --> 00:54:31.360
of linear regression
models where
00:54:31.360 --> 00:54:33.780
we have a time-varying beta.
00:54:33.780 --> 00:54:37.140
In a way, this case
we just looked at
00:54:37.140 --> 00:54:39.999
is a simple case of that.
00:54:39.999 --> 00:54:41.540
But let's look at
a more general case
00:54:41.540 --> 00:54:45.270
where we have p independent
variables, which
00:54:45.270 --> 00:54:47.190
could be time-varying.
00:54:47.190 --> 00:54:51.670
So we have a
regression model almost
00:54:51.670 --> 00:54:54.040
as we've considered
it previously.
00:54:54.040 --> 00:54:58.400
y_t is equal to x_t transpose
beta_t plus epsilon_t.
00:54:58.400 --> 00:55:00.850
The difference now is our
regression coefficients
00:55:00.850 --> 00:55:03.580
beta are allowed to
change over time.
00:55:09.880 --> 00:55:11.180
How do they change over time?
00:55:11.180 --> 00:55:14.120
Well, we're going to
assume that those also
00:55:14.120 --> 00:55:19.120
follow independent random
walks with variances
00:55:19.120 --> 00:55:23.090
of the random walks that
may depend on the component.
00:55:23.090 --> 00:55:24.770
So the joint
state-space equation
00:55:24.770 --> 00:55:32.530
here is given by the identity
times s_t plus eta_t.
00:55:32.530 --> 00:55:36.360
That's basically the random
walk process for the underlying
00:55:36.360 --> 00:55:37.600
regression parameters.
00:55:37.600 --> 00:55:42.360
And y_t is equal
to x_t transpose
00:55:42.360 --> 00:55:46.081
times the same regression
parameters plus the observation
00:55:46.081 --> 00:55:46.580
error.
00:55:56.480 --> 00:55:59.770
I guess needless to say, if we
consider the special case where
00:55:59.770 --> 00:56:04.610
the random walk
process is degenerate
00:56:04.610 --> 00:56:07.320
and they're basically
steps of size zero,
00:56:07.320 --> 00:56:10.410
then we get the normal linear
regression model coming out
00:56:10.410 --> 00:56:11.870
of this.
00:56:11.870 --> 00:56:17.950
If we were to be specifying
the linear state-space
00:56:17.950 --> 00:56:22.810
implementation of this model and
consider successive estimates
00:56:22.810 --> 00:56:25.270
of the model
parameters over time,
00:56:25.270 --> 00:56:28.970
then these equations would
give us recursive estimates
00:56:28.970 --> 00:56:34.080
for updating
regressions as we add
00:56:34.080 --> 00:56:37.500
additional values to the
data, additional observations
00:56:37.500 --> 00:56:38.000
to the data.
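For instance, in the dlm sketch above, setting the state innovation variances to zero makes the coefficients constant, and the Kalman filter essentially reproduces recursive least squares (x and y assumed to be a regressor vector and response vector):

    rls.mod  <- dlmModReg(x, addInt = TRUE, dV = 1, dW = c(0, 0))
    rls.filt <- dlmFilter(y, rls.mod)
    # Each row of rls.filt$m is the updated coefficient estimate
    # after one more observation has been added.
    rls.filt$m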
00:56:43.880 --> 00:56:49.960
Let's look at autoregressive
models of order p.
00:56:49.960 --> 00:56:55.780
The autoregressive model of
order p for a univariate time
00:56:55.780 --> 00:57:01.670
series has the setup given here.
00:57:01.670 --> 00:57:07.470
A polynomial
lag of the response
00:57:07.470 --> 00:57:10.940
variable y_t is set equal to
the innovation epsilon_t.
00:57:10.940 --> 00:57:16.130
And we can define
the state vector
00:57:16.130 --> 00:57:24.980
to be equal to the vector of
p values, p successive values
00:57:24.980 --> 00:57:27.650
of the process.
00:57:27.650 --> 00:57:33.710
And so we basically
get a combination
00:57:33.710 --> 00:57:38.700
here of the observation equation
and the state equation, joined
00:57:38.700 --> 00:57:46.720
so that basically
one of the states
00:57:46.720 --> 00:57:48.760
is actually equal
to the observation.
00:57:48.760 --> 00:57:52.600
And basically, with
this definition,
00:57:52.600 --> 00:57:59.160
the state vector
at the next time point t
00:57:59.160 --> 00:58:03.730
is equal to this
linear transformation
00:58:03.730 --> 00:58:09.114
of the lagged state vector
plus that innovation term.
00:58:09.114 --> 00:58:10.608
I dropped the mic.
00:58:16.600 --> 00:58:21.480
So the notation here
shows the structure
00:58:21.480 --> 00:58:26.240
for how this linear
state-space model is evolving.
00:58:26.240 --> 00:58:29.090
Basically, the
observation equation
00:58:29.090 --> 00:58:32.410
is the linear
combination of the phi
00:58:32.410 --> 00:58:36.500
multiples of lags of the
values plus the residual.
00:58:36.500 --> 00:58:40.240
And the previous
lags of the states
00:58:40.240 --> 00:58:46.200
are simply the identity
times those values, shifted.
00:58:46.200 --> 00:58:51.690
So it's a very simple structure
for the autoregressive process
00:58:51.690 --> 00:58:53.431
as a linear state-space model.
00:58:56.660 --> 00:59:02.470
We have, as I was just saying,
for the transition matrix T sub
00:59:02.470 --> 00:59:09.750
t, this matrix. And the
observation equation
00:59:09.750 --> 00:59:13.730
essentially picks out
the first element of the state
00:59:13.730 --> 00:59:16.540
vector, which has no
measurement error.
00:59:16.540 --> 00:59:18.490
So that simplifies that.
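A minimal sketch in R of that transition structure (a hypothetical helper; the phi coefficients sit in the first row, with a shifted identity below):

companion <- function(phi) {
  p <- length(phi)
  if (p == 1) return(matrix(phi, 1, 1))
  unname(rbind(phi, cbind(diag(p - 1), 0)))  # shifted identity under the phi row
}
companion(c(0.5, -0.2, 0.1))
# The observation picks out the first state: Z <- c(1, rep(0, p - 1))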
00:59:21.940 --> 00:59:27.210
The moving average
model of order q
00:59:27.210 --> 00:59:29.700
could also be expressed as
a linear state-space model.
00:59:37.240 --> 00:59:38.820
Remember, the
moving average model
00:59:38.820 --> 00:59:43.030
is one where our response
variable, y, is simply
00:59:43.030 --> 00:59:48.290
some linear combination
of innovations,
00:59:48.290 --> 00:59:50.500
q past innovations.
00:59:50.500 --> 00:59:55.350
And if we consider
00:59:55.350 --> 01:00:00.180
the state vector as just
being basically q
01:00:00.180 --> 01:00:04.400
lags of the innovations,
then the transition
01:00:04.400 --> 01:00:08.780
of those underlying states is
given by this expression here.
01:00:14.690 --> 01:00:17.770
And we have a state equation,
an observation equation,
01:00:17.770 --> 01:00:23.500
which has these forms for these
various transition matrices
01:00:23.500 --> 01:00:30.615
and for how the innovation
terms are related.
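A corresponding sketch in R for the MA(q) matrices (again a hypothetical helper; the state stacks the current and q lagged innovations):

ma_ss <- function(theta) {
  q <- length(theta)
  list(T = rbind(0, cbind(diag(q), 0)),  # shift the stored innovations down
       R = c(1, rep(0, q)),              # the new innovation enters on top
       Z = c(1, theta))                  # y_t = eps_t + theta_1 eps_(t-1) + ...
}
ma_ss(c(0.4, 0.2))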
01:00:40.840 --> 01:00:43.160
Let me just finish
up with an example
01:00:43.160 --> 01:00:47.780
involving the autoregressive
moving average model.
01:00:47.780 --> 01:00:49.340
And many years ago,
it was actually
01:00:49.340 --> 01:00:55.490
very difficult to
specify the estimation
01:00:55.490 --> 01:00:58.902
methods for autoregressive
moving average models.
01:00:58.902 --> 01:01:00.800
But the implementation
of these models
01:01:00.800 --> 01:01:05.590
as linear state-space models
facilitated that greatly.
01:01:05.590 --> 01:01:13.030
And with the ARMA model,
the setup basically
01:01:13.030 --> 01:01:14.730
is a combination of
the autoregressive
01:01:14.730 --> 01:01:16.900
and moving average processes.
01:01:16.900 --> 01:01:20.280
We have an
autoregression of the y's
01:01:20.280 --> 01:01:24.719
set equal to a moving
average of the residuals
01:01:24.719 --> 01:01:25.510
or the innovations.
01:01:28.170 --> 01:01:32.550
And it's convenient in the setup
for linear state-space models
01:01:32.550 --> 01:01:37.720
to define the dimension m,
which is the maximum of p and of q
01:01:37.720 --> 01:01:45.860
plus 1, and think of having
basically an m-th order
01:01:45.860 --> 01:01:50.860
polynomial lag for each
of those two series.
01:01:50.860 --> 01:01:55.060
And we can basically
constrain the extra coefficients
01:01:55.060 --> 01:01:59.134
to be 0 where the index
exceeds p or exceeds q.
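In R, that zero-padding convention might look like this sketch (names assumed):

pad_arma <- function(phi, theta) {
  m <- max(length(phi), length(theta) + 1)   # m = max(p, q + 1)
  list(phi   = c(phi,   rep(0, m - length(phi))),
       theta = c(theta, rep(0, m - length(theta))),
       m     = m)
}
pad_arma(phi = c(0.5, -0.2), theta = 0.3)    # here m = 2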
01:02:06.880 --> 01:02:11.240
And Harvey, in a very
important work in '93,
01:02:11.240 --> 01:02:17.080
actually defined a particular
state-space representation
01:02:17.080 --> 01:02:19.350
for this process.
01:02:19.350 --> 01:02:20.980
And I guess it's
important to know
01:02:20.980 --> 01:02:24.310
that with these linear
state-space models,
01:02:24.310 --> 01:02:29.030
we're dealing with
characterizing structure
01:02:29.030 --> 01:02:31.750
in m-dimensional space.
01:02:31.750 --> 01:02:35.510
There's often some choice in how
you represent your underlying
01:02:35.510 --> 01:02:37.670
states.
01:02:37.670 --> 01:02:42.430
You can basically
re-parametrize the models
01:02:42.430 --> 01:02:47.080
by considering invertible
linear transformations
01:02:47.080 --> 01:02:49.760
of the underlying states.
01:02:49.760 --> 01:02:52.820
So let me go back here.
01:02:56.700 --> 01:02:59.990
The state equation
is expressed generally
01:02:59.990 --> 01:03:04.190
as T sub t s_t plus R_t eta_t.
01:03:04.190 --> 01:03:08.540
In this equation,
basically the state s_t
01:03:08.540 --> 01:03:11.280
can be replaced by a linear
transformation of s_t,
01:03:11.280 --> 01:03:16.730
so long as we multiply
the T sub t by the inverse
01:03:16.730 --> 01:03:17.850
of that transformation.
01:03:17.850 --> 01:03:19.810
So there's flexibility
in the choice
01:03:19.810 --> 01:03:22.340
of our linear state-space
specification.
01:03:22.340 --> 01:03:28.820
And so there really are many
different equivalent linear
01:03:28.820 --> 01:03:33.380
state-space models for a
given process depending
01:03:33.380 --> 01:03:35.600
on exactly how you
define the states
01:03:35.600 --> 01:03:39.490
and the underlying
transition matrix T.
01:03:39.490 --> 01:03:44.900
And the beauty of Harvey's
work was coming up
01:03:44.900 --> 01:03:47.490
with a nice representation
for the states,
01:03:47.490 --> 01:03:53.100
where we had very simple forms
for the various matrices.
01:03:53.100 --> 01:03:57.000
And the lecture notes here
go through the derivation
01:03:57.000 --> 01:03:59.430
of that for the ARMA process.
01:03:59.430 --> 01:04:04.490
For this derivation,
I just want
01:04:04.490 --> 01:04:08.240
to go through the
first case
01:04:08.240 --> 01:04:11.020
to highlight how
the argument goes.
01:04:11.020 --> 01:04:15.090
We basically have this equation,
which is the original equation
01:04:15.090 --> 01:04:17.345
for an ARMA(p,q) process.
01:04:20.180 --> 01:04:25.810
And Harvey says, well,
define the first
01:04:25.810 --> 01:04:29.460
state at time t to
be equal to the observation
01:04:29.460 --> 01:04:31.820
at time t.
01:04:31.820 --> 01:04:38.250
If we do that, then how
does this equation relate
01:04:38.250 --> 01:04:46.000
to the model? Basically, the
state at the next time point, t
01:04:46.000 --> 01:04:50.610
plus 1, is equal to phi_1
times the state at time t,
01:04:50.610 --> 01:05:00.340
plus a second state at time
t and a residual innovation
01:05:00.340 --> 01:05:01.420
eta_t.
01:05:01.420 --> 01:05:09.110
So by choosing the first state
to be the observation value
01:05:09.110 --> 01:05:16.680
at that time, we can then
solve for the second state,
01:05:16.680 --> 01:05:19.810
which is given by
this expression,
01:05:19.810 --> 01:05:25.730
just by rewriting our model
equation in terms of s_(1,t),
01:05:25.730 --> 01:05:27.880
s_(2,t) and eta_t.
01:05:27.880 --> 01:05:36.950
So this s_(2,t) is this function
of the observations and eta_t.
01:05:36.950 --> 01:05:39.440
So it's a very
simple specification
01:05:39.440 --> 01:05:41.820
of the second state.
01:05:41.820 --> 01:05:48.020
Just what is that
second state element
01:05:48.020 --> 01:05:50.520
given this definition
of the first one?
01:05:50.520 --> 01:05:54.650
And one can do this
process iteratively
01:05:54.650 --> 01:05:59.180
getting rid of the
observations and replacing them
01:05:59.180 --> 01:06:01.290
by underlying states.
01:06:01.290 --> 01:06:03.770
And at the end of
the day, you end up
01:06:03.770 --> 01:06:09.490
with this very simple form
for the transition matrix T.
01:06:09.490 --> 01:06:13.950
Basically, the T has the
autoregressive components
01:06:13.950 --> 01:06:16.410
as the first column
of the T matrix.
01:06:16.410 --> 01:06:20.440
And this R matrix has
this vector of the moving
01:06:20.440 --> 01:06:22.550
average components.
01:06:22.550 --> 01:06:28.330
So it's a very nice way
to represent the model.
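A minimal sketch in R of Harvey's matrices, reusing the zero-padding sketch above (the helper names are assumptions):

harvey_ss <- function(phi, theta) {
  cf <- pad_arma(phi, theta)
  m  <- cf$m
  T_mat <- if (m == 1) matrix(cf$phi, 1, 1) else
    cbind(cf$phi, rbind(diag(m - 1), 0))     # phi down the first column
  list(T = unname(T_mat),
       R = c(1, cf$theta[seq_len(m - 1)]),   # (1, theta_1, ..., theta_(m-1))
       Z = c(1, rep(0, m - 1)))              # observation picks the first state
}
harvey_ss(phi = c(0.5, -0.2), theta = 0.3)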
01:06:28.330 --> 01:06:32.990
Coming up with it was something
very clever that he did.
01:06:32.990 --> 01:06:36.580
But what one can see is
that this basic model where
01:06:36.580 --> 01:06:41.620
you have the states
transitioning according
01:06:41.620 --> 01:06:45.540
to a linear transformation of
the previous state plus error,
01:06:45.540 --> 01:06:49.910
and the observation being some
function of the current states,
01:06:49.910 --> 01:06:54.119
plus error or not, depending
on the formulation,
01:06:54.119 --> 01:06:55.035
is the general representation.
01:06:58.200 --> 01:07:03.770
Now, with all of
these models, a reason
01:07:03.770 --> 01:07:08.860
why linear state-space
modeling is in fact effective
01:07:08.860 --> 01:07:19.711
is that their estimation is
fully specified by the Kalman
01:07:19.711 --> 01:07:20.210
filter.
01:07:22.730 --> 01:07:32.100
So with this formulation of
linear state-space models,
01:07:32.100 --> 01:07:37.000
the Kalman filter
as a methodology is
01:07:37.000 --> 01:07:41.380
the recursive computation
of the probability density
01:07:41.380 --> 01:07:48.535
functions for the underlying
states at basically
01:07:48.535 --> 01:07:52.420
t plus 1 given
information up to time t,
01:07:52.420 --> 01:07:56.710
as well as the joint
density of the future state
01:07:56.710 --> 01:07:59.800
and the future observation at
t plus 1, given information up
01:07:59.800 --> 01:08:02.370
to time t.
01:08:02.370 --> 01:08:05.520
And also just the
marginal distribution
01:08:05.520 --> 01:08:10.380
of the next observation given
the information up to time t.
01:08:20.490 --> 01:08:26.510
So what I want to do is
just go through with you
01:08:26.510 --> 01:08:31.550
how the Kalman filter is
implemented and defined.
01:08:31.550 --> 01:08:35.370
And the implementation
of the Kalman filter
01:08:35.370 --> 01:08:40.939
requires us to have some
notation that's a bit involved,
01:08:40.939 --> 01:08:46.710
but we'll hopefully explain it
so it's very straightforward.
01:08:46.710 --> 01:08:49.474
There are basically conditional
means of the states.
01:08:52.090 --> 01:08:55.450
s sub t given t
is the mean value
01:08:55.450 --> 01:08:59.510
of the state at time t given
the information up to time t.
01:08:59.510 --> 01:09:02.069
If we condition
on t minus 1, then
01:09:02.069 --> 01:09:03.500
it's the expectation
of the state
01:09:03.500 --> 01:09:06.300
at time t given the
information up to t minus 1.
01:09:09.460 --> 01:09:12.100
And then y_(t|t-1)
is the expectation
01:09:12.100 --> 01:09:16.880
of the observation given
information up to t minus 1.
01:09:16.880 --> 01:09:18.780
There's also
conditional covariances
01:09:18.780 --> 01:09:22.260
and mean squared errors.
01:09:22.260 --> 01:09:26.620
All these covariances
are denoted by omegas.
01:09:26.620 --> 01:09:33.240
The subscript corresponds to
states s, or observation y.
01:09:33.240 --> 01:09:35.060
And basically, the
conditioning set
01:09:35.060 --> 01:09:39.149
is either information up to
time t or up to t minus 1,
01:09:39.149 --> 01:09:40.479
depending on the case.
01:09:40.479 --> 01:09:45.370
And we want to compute
basically the covariance matrix
01:09:45.370 --> 01:09:49.999
of the states given whatever
the information is, information
01:09:49.999 --> 01:09:52.439
up to time t, t minus 1.
01:09:52.439 --> 01:09:57.810
So these covariance
matrices are the expectation
01:09:57.810 --> 01:10:01.990
of the state minus
their expectation
01:10:01.990 --> 01:10:06.850
under the conditioning times
the state minus the expectation
01:10:06.850 --> 01:10:07.950
transpose.
01:10:07.950 --> 01:10:10.810
That's the definition of
that covariance matrix.
01:10:10.810 --> 01:10:12.230
So the different
definitions here
01:10:12.230 --> 01:10:14.300
correspond to just
whether we're conditioning
01:10:14.300 --> 01:10:15.345
on different information.
01:10:17.900 --> 01:10:23.170
And then the observation
innovations or residuals
01:10:23.170 --> 01:10:29.510
are the difference
between an observation y_t
01:10:29.510 --> 01:10:33.847
and its estimate given
information up to t minus 1.
01:10:37.190 --> 01:10:41.370
So the residuals in this process
are the innovation residuals,
01:10:41.370 --> 01:10:44.200
one period ahead.
01:10:44.200 --> 01:10:50.780
And the Kalman filter
consists of four steps.
01:10:50.780 --> 01:11:00.800
We basically want to, first,
predict the state vector
01:11:00.800 --> 01:11:01.780
one step ahead.
01:11:01.780 --> 01:11:10.140
So given our estimate of the
state vector at time t minus 1,
01:11:10.140 --> 01:11:14.800
we want to predict this
state vector at time t.
01:11:14.800 --> 01:11:18.220
And we also want to
predict the observation
01:11:18.220 --> 01:11:23.820
at time t given our estimate
at state vector time t minus 1.
01:11:23.820 --> 01:11:31.674
And so at time t minus 1, we
can estimate these quantities.
01:11:31.674 --> 01:11:32.174
[INAUDIBLE]
01:11:35.646 --> 01:11:40.969
At t minus 1, we can
basically predict
01:11:40.969 --> 01:11:42.760
what the state is going
to be and predict what
01:11:42.760 --> 01:11:44.750
the observation is going to be.
01:11:44.750 --> 01:11:47.166
And we can estimate
how much error there's
01:11:47.166 --> 01:11:49.707
going to be in those estimates,
by these covariance matrices.
01:11:59.420 --> 01:12:05.140
The second step is
updating these predictions
01:12:05.140 --> 01:12:11.900
to get our estimate of the state
given the observation at time t
01:12:11.900 --> 01:12:15.480
and to update our uncertainty
about that state given
01:12:15.480 --> 01:12:16.380
this new observation.
01:12:16.380 --> 01:12:21.350
So basically, our estimate
of the state at time t
01:12:21.350 --> 01:12:25.310
is an adjustment to our
estimate given information up
01:12:25.310 --> 01:12:31.164
to t minus 1, plus a function of
the difference between what we
01:12:31.164 --> 01:12:32.455
observed and what we predicted.
01:12:35.020 --> 01:12:42.870
And this matrix T_t is
called the filter gain matrix.
01:12:42.870 --> 01:12:45.120
And basically, it
characterizes how
01:12:45.120 --> 01:12:50.070
we adjust our prediction
of the underlying state
01:12:50.070 --> 01:12:52.760
depending on what happened.
01:12:52.760 --> 01:12:54.440
So that's the
filter gain matrix.
01:12:57.150 --> 01:13:00.470
So we actually do
gain information
01:13:00.470 --> 01:13:03.160
with each observation about what
the new value of the process
01:13:03.160 --> 01:13:04.320
is.
01:13:04.320 --> 01:13:06.830
And that information
is characterized
01:13:06.830 --> 01:13:09.190
by the filter gain matrix.
01:13:09.190 --> 01:13:11.580
You'll notice that
the uncertainty
01:13:11.580 --> 01:13:15.720
in the state at time t, this
omega_s of t given t, that's
01:13:15.720 --> 01:13:19.630
equal to the covariance
matrix given t minus 1, minus a correction.
01:13:19.630 --> 01:13:23.330
So it's our beginning level
of uncertainty adjusted
01:13:23.330 --> 01:13:27.790
by a term that tells us
how much information we
01:13:27.790 --> 01:13:29.580
got from that new observation.
01:13:29.580 --> 01:13:33.590
So notice that there's
a minus sign there.
01:13:33.590 --> 01:13:35.600
We're basically
reducing our uncertainty
01:13:35.600 --> 01:13:44.602
about the state given the
information in the innovation
01:13:44.602 --> 01:13:45.685
that we now have observed.
01:13:48.800 --> 01:13:51.870
Then, there's a
forecasting step which
01:13:51.870 --> 01:13:59.310
forecasts the
state one period forward;
01:13:59.310 --> 01:14:01.400
it is simply given by this
linear transformation
01:14:01.400 --> 01:14:03.170
of the previous state.
01:14:03.170 --> 01:14:05.890
And we can also update
our covariance matrix
01:14:05.890 --> 01:14:09.580
for future states given
the previous state
01:14:09.580 --> 01:14:13.530
by applying this formula
which is a recursive formula
01:14:13.530 --> 01:14:17.580
for estimating covariances.
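Putting the prediction and updating steps together, here is a minimal univariate sketch in R; the function name, the time-invariant system matrices Z, T_mat, H, Q, and the starting values are all assumptions rather than the lecture's own implementation:

kalman_filter <- function(y, Z, T_mat, H, Q, s0, P0) {
  n <- length(y); m <- length(s0)
  s <- matrix(s0, m, 1); P <- P0
  filtered  <- matrix(NA_real_, n, m)
  innov     <- numeric(n)
  innov_var <- numeric(n)
  for (t in seq_len(n)) {
    s_pred <- T_mat %*% s                   # predict the state one step ahead
    P_pred <- T_mat %*% P %*% t(T_mat) + Q
    y_pred <- Z %*% s_pred                  # predict the observation
    F_t <- c(Z %*% P_pred %*% t(Z) + H)     # innovation variance
    v_t <- y[t] - c(y_pred)                 # the innovation
    K   <- P_pred %*% t(Z) / F_t            # filter gain matrix
    s   <- s_pred + K * v_t                 # update the state estimate
    P   <- P_pred - K %*% Z %*% P_pred      # the minus sign: uncertainty is reduced
    filtered[t, ] <- s; innov[t] <- v_t; innov_var[t] <- F_t
  }
  list(states = filtered, innov = innov, innov_var = innov_var)
}
# For the AR(p) form above, for instance: Z <- matrix(c(1, rep(0, p - 1)), 1, p).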
01:14:17.580 --> 01:14:24.760
So we have
forecasting algorithms
01:14:24.760 --> 01:14:29.520
that are simple linear
functions of these estimates.
01:14:29.520 --> 01:14:35.650
And then finally,
there's a smoothing step
01:14:35.650 --> 01:14:43.960
which is characterizing
the conditional expectation
01:14:43.960 --> 01:14:49.950
of underlying states, given
information in the whole time
01:14:49.950 --> 01:14:51.150
series.
01:14:51.150 --> 01:14:55.440
And so ordinarily, Kalman
filters
01:14:55.440 --> 01:14:58.210
are applied
sequentially over time
01:14:58.210 --> 01:15:01.090
where one basically
is predicting ahead
01:15:01.090 --> 01:15:03.550
one step, updating
that prediction,
01:15:03.550 --> 01:15:08.320
predicting ahead another
step, updating the information
01:15:08.320 --> 01:15:10.930
on the states.
01:15:10.930 --> 01:15:19.410
And that overall
recursion is the process
01:15:19.410 --> 01:15:21.550
of actually computing
the likelihood
01:15:21.550 --> 01:15:25.210
function for these linear
state-space models.
01:15:25.210 --> 01:15:32.140
And so the Kalman filter is
basically ultimately applied
01:15:32.140 --> 01:15:35.010
for successive
forecasting of the process
01:15:35.010 --> 01:15:39.600
but also for helping us identify
what the underlying model
01:15:39.600 --> 01:15:43.430
parameters are using
maximum likelihood methods.
01:15:43.430 --> 01:15:48.290
And so the likelihood function
for the linear state-space
01:15:48.290 --> 01:15:52.050
model, or rather
its log-likelihood,
01:15:52.050 --> 01:15:54.920
is the log-likelihood of
the entire data series,
01:15:54.920 --> 01:15:56.980
given the unknown parameters.
01:15:56.980 --> 01:16:00.020
But that can be
expressed as the product
01:16:00.020 --> 01:16:04.290
of the conditional distributions
of each successive observation,
01:16:04.290 --> 01:16:07.150
given the history.
01:16:07.150 --> 01:16:09.750
And so basically, the
likelihood of theta
01:16:09.750 --> 01:16:12.390
is the likelihood of
the first observation
01:16:12.390 --> 01:16:15.240
times the density of the
second observation given
01:16:15.240 --> 01:16:18.990
the first, and so
forth for the whole series.
01:16:18.990 --> 01:16:22.650
And so the likelihood
function is basically
01:16:22.650 --> 01:16:25.490
a function of all these
terms that we were computing
01:16:25.490 --> 01:16:26.490
with the Kalman filter.
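Under Gaussian innovations, that prediction-error decomposition follows directly from the filter output; here is a sketch using the hypothetical kalman_filter() above:

loglik <- function(kf) {
  sum(dnorm(kf$innov, mean = 0, sd = sqrt(kf$innov_var), log = TRUE))
}
# Maximum likelihood estimation then amounts to rerunning the filter inside
# optim() over the unknown parameters theta and maximizing this sum.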
01:16:29.260 --> 01:16:33.470
And the Kalman
filter basically
01:16:33.470 --> 01:16:36.760
provides all the terms
necessary for this estimation.
01:16:36.760 --> 01:16:42.270
If the error terms are
normally distributed,
01:16:42.270 --> 01:16:46.550
then the means and
variances of these estimates
01:16:46.550 --> 01:16:52.750
are in fact characterizing
the exact distributions
01:16:52.750 --> 01:16:54.300
of the process.
01:16:54.300 --> 01:16:56.850
Basically,
if the innovation series are
01:16:56.850 --> 01:16:59.290
all normal random
variables, then
01:16:59.290 --> 01:17:00.980
the linear
state-space model, all
01:17:00.980 --> 01:17:03.750
it's doing is taking linear
combinations of normals
01:17:03.750 --> 01:17:07.410
for the underlying states and
for the actual observations.
01:17:07.410 --> 01:17:08.890
And normal
distributions are fully
01:17:08.890 --> 01:17:10.610
characterized by
their mean vectors
01:17:10.610 --> 01:17:12.310
and covariance matrices.
01:17:12.310 --> 01:17:14.050
And the Kalman
filter provides a way
01:17:14.050 --> 01:17:21.570
to update these distributions
for all these features
01:17:21.570 --> 01:17:23.000
of a model, the
underlying states
01:17:23.000 --> 01:17:26.520
as well as the distributions
of the observations.
01:17:26.520 --> 01:17:35.250
So that's a brief introduction
to the Kalman filter.
01:17:35.250 --> 01:17:36.940
Let's finish there.
01:17:36.940 --> 01:17:38.490
Thank you.