WEBVTT

00:00:00.060 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.019
Commons license.

00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.730
continue to offer high quality
educational resources for free.

00:00:10.730 --> 00:00:13.340
To make a donation or
view additional materials

00:00:13.340 --> 00:00:17.217
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.217 --> 00:00:17.842
at ocw.mit.edu.

00:00:22.240 --> 00:00:25.970
PROFESSOR: Today's topic
is factor modeling,

00:00:25.970 --> 00:00:32.420
and the subject here basically
applies multivariate analysis

00:00:32.420 --> 00:00:37.910
in statistics to financial
markets where our concern is

00:00:37.910 --> 00:00:42.790
using factors to model
returns and variances,

00:00:42.790 --> 00:00:44.900
covariances, correlations.

00:00:44.900 --> 00:00:48.970
And with these models,
there are two basic cases.

00:00:48.970 --> 00:00:52.150
There's one where the
factors are observable.

00:00:52.150 --> 00:00:55.150
Those can be
macroeconomic factors.

00:00:55.150 --> 00:00:59.690
They can be fundamental
factors on assets or securities

00:00:59.690 --> 00:01:03.070
that might explain
returns and covariances.

00:01:03.070 --> 00:01:06.490
A second class of models
is where these factors

00:01:06.490 --> 00:01:08.930
are hidden or latent.

00:01:08.930 --> 00:01:11.850
And statistical
factor models are then

00:01:11.850 --> 00:01:15.240
used to specify these models.

00:01:15.240 --> 00:01:17.110
In particular, there
are two methodologies.

00:01:17.110 --> 00:01:21.310
There's factor analysis and
principal components analysis,

00:01:21.310 --> 00:01:24.930
which we'll go into in some
detail during the lecture.

00:01:24.930 --> 00:01:31.200
So let's proceed to talk about
the setup for a linear factor

00:01:31.200 --> 00:01:33.410
model.

00:01:33.410 --> 00:01:38.540
We have m assets, or
instruments, or indexes

00:01:38.540 --> 00:01:42.710
whose values correspond to a
multivariate stochastic process

00:01:42.710 --> 00:01:44.030
we're modeling.

00:01:44.030 --> 00:01:47.530
And we have T time periods, indexed by t.

00:01:47.530 --> 00:01:52.840
And with the factor model
we model the t-th value

00:01:52.840 --> 00:01:58.140
for the i-th object-- whether
it's a stock price, futures

00:01:58.140 --> 00:02:04.750
price, currency-- as a
linear function of factors

00:02:04.750 --> 00:02:07.360
f_1 through f_k.

00:02:07.360 --> 00:02:10.690
So there's basically
like a state-space model

00:02:10.690 --> 00:02:12.845
for the value of the
stochastic process,

00:02:12.845 --> 00:02:16.020
as it depends on these
underlying factors.

00:02:16.020 --> 00:02:20.080
And the dependence is given
by coefficients beta_1

00:02:20.080 --> 00:02:27.600
through beta_k, which
depend upon i, the asset.

00:02:27.600 --> 00:02:31.730
So we allow each asset, say
if we're thinking of stocks,

00:02:31.730 --> 00:02:34.770
to depend on factors
in different ways.

00:02:34.770 --> 00:02:38.900
If a certain underlying
factor changes in value,

00:02:38.900 --> 00:02:44.340
the beta corresponds to the
impact of that underlying

00:02:44.340 --> 00:02:46.330
factor.

00:02:46.330 --> 00:02:49.440
So we have common factors.

00:02:52.080 --> 00:02:54.250
Now these common factors
f, this is really

00:02:54.250 --> 00:02:58.150
going to be a nice model if the
number of factors that we're

00:02:58.150 --> 00:03:01.300
using is relatively small.

00:03:01.300 --> 00:03:05.360
So the number k
of common factors

00:03:05.360 --> 00:03:09.490
is generally very, very
small relative to m.

00:03:09.490 --> 00:03:13.010
And if you think about modeling,
say asset-- equity asset

00:03:13.010 --> 00:03:16.000
returns in a market, there
can be hundreds or thousands

00:03:16.000 --> 00:03:17.570
of securities.

00:03:17.570 --> 00:03:22.530
And so in terms of modeling
those returns and covariances,

00:03:22.530 --> 00:03:24.360
what we're trying to
do is characterize

00:03:24.360 --> 00:03:28.230
those in terms of a modest
number of underlying factors

00:03:28.230 --> 00:03:30.510
which simplifies
the problem greatly.

00:03:33.450 --> 00:03:37.190
The vectors beta_i are
termed the factor loadings

00:03:37.190 --> 00:03:38.610
of an asset.

00:03:38.610 --> 00:03:43.680
And the epsilon_(i,t)'s are
the specific factors of asset i,

00:03:43.680 --> 00:03:44.470
period t.

00:03:44.470 --> 00:03:48.260
So in factor modeling,
we talk about there

00:03:48.260 --> 00:03:53.340
being common factors affecting
the dynamics of the system,

00:03:53.340 --> 00:03:59.210
and the factors associated
with individual assets

00:03:59.210 --> 00:04:02.450
are the specific factors.

00:04:02.450 --> 00:04:05.120
So this setup is
really very familiar.

00:04:05.120 --> 00:04:08.430
It just looks like a standard
sort of regression type model

00:04:08.430 --> 00:04:11.240
that we've seen before.

00:04:11.240 --> 00:04:14.270
And so let's see how
this can be set up

00:04:14.270 --> 00:04:18.100
as a set of cross-sectional
regressions.

00:04:18.100 --> 00:04:25.870
So now we're going to fix
the period t, the time t,

00:04:25.870 --> 00:04:31.040
and consider the
m-variate x variable

00:04:31.040 --> 00:04:38.460
as satisfying a regression model
with intercept given by alphas.

00:04:38.460 --> 00:04:43.140
And then the independent
variables matrix

00:04:43.140 --> 00:04:48.710
is B, given by the coefficients
of the factor loadings.

00:04:48.710 --> 00:04:54.210
And then we have the residuals
epsilon_t for the m assets.

00:04:54.210 --> 00:04:57.640
So the cross-sectional
terminology

00:04:57.640 --> 00:05:00.700
means we're fixing time
and looking across all

00:05:00.700 --> 00:05:02.970
the assets for one fixed time.

00:05:02.970 --> 00:05:09.310
And we're trying to explain
how, say, the returns of assets

00:05:09.310 --> 00:05:12.240
are varying depending upon
the underlying factors.

00:05:12.240 --> 00:05:19.990
And so the-- well OK, what's
random in this process?

00:05:19.990 --> 00:05:23.770
Well certainly the residual
term is considered to be random.

00:05:23.770 --> 00:05:26.410
That's basically
going to be assumed

00:05:26.410 --> 00:05:29.660
to be white noise with mean 0.

00:05:29.660 --> 00:05:35.490
There's going to be possibly
a covariance matrix psi.

00:05:35.490 --> 00:05:38.010
And it's going to
be uncorrelated

00:05:38.010 --> 00:05:41.860
across different
time cross sections.

00:05:41.860 --> 00:05:44.550
Let's see if I can move the
mouse, if this is what's

00:05:44.550 --> 00:05:46.700
causing the problem down here.

00:05:46.700 --> 00:05:54.450
So in this model we have the
realizations on the underlying

00:05:54.450 --> 00:05:56.600
factors being random variables.

00:05:56.600 --> 00:05:59.710
The returns on the assets depend
on the underlying factors.

00:05:59.710 --> 00:06:04.540
Those are going to be assumed
to have some mean, mu_f,

00:06:04.540 --> 00:06:07.010
and some covariance matrix.

00:06:07.010 --> 00:06:09.760
And basically the
dimension of that

00:06:09.760 --> 00:06:13.250
covariance matrix omega_f
is going to be k by k.

00:06:13.250 --> 00:06:16.975
So in terms of modeling this
problem, we go from an m

00:06:16.975 --> 00:06:22.130
by m system of
covariances, correlations,

00:06:22.130 --> 00:06:27.360
to focusing initially on a
k by k system of covariances

00:06:27.360 --> 00:06:30.730
and correlations between
the underlying factors.

00:06:30.730 --> 00:06:38.380
Psi is a diagonal matrix
with the specific variances

00:06:38.380 --> 00:06:40.310
of the underlying assets.

00:06:40.310 --> 00:06:50.270
So we have basically epsilon--
the covariance matrix

00:06:50.270 --> 00:06:53.010
of the epsilons is
a diagonal matrix,

00:06:53.010 --> 00:06:59.500
and the covariance matrix of
f is given by this omega_f.

00:06:59.500 --> 00:07:01.690
Well, with those
specifications we

00:07:01.690 --> 00:07:09.070
can get the covariance
for the overall vector

00:07:09.070 --> 00:07:13.880
of the m-variate
stochastic process.

00:07:13.880 --> 00:07:19.880
And we have this model here
for the conditional moments.

00:07:19.880 --> 00:07:23.270
Basically, the
conditional expectation

00:07:23.270 --> 00:07:25.810
of the process given
the underlying factors

00:07:25.810 --> 00:07:30.310
is this linear model in terms
of the underlying factors f.

00:07:30.310 --> 00:07:34.025
And the conditional covariance matrix is the
psi matrix, which is diagonal.

00:07:38.040 --> 00:07:42.840
And the unconditional
moments are

00:07:42.840 --> 00:07:46.290
obtained by just taking
the expectations of these.

00:07:46.290 --> 00:07:50.130
Well actually, the unconditional
expectation of x is this.

00:07:50.130 --> 00:07:52.860
The unconditional
covariance of x

00:07:52.860 --> 00:07:56.340
is actually equal
to the expectation

00:07:56.340 --> 00:08:02.129
of this plus the variance of
the conditional expectation,

00:08:02.129 --> 00:08:04.170
or the covariance of the
conditional expectation.

00:08:04.170 --> 00:08:08.690
So one of the formulas that's
important to realize here

00:08:08.690 --> 00:08:13.620
is that if we're considering
the covariance of x_t, which

00:08:13.620 --> 00:08:21.530
is equal to covariance of B
f_t plus epsilon_t, that's

00:08:21.530 --> 00:08:27.715
equal to the covariance of
B f_t plus the covariance

00:08:27.715 --> 00:08:35.100
of epsilon_t plus
the cross-covariance terms

00:08:35.100 --> 00:08:39.600
between the two, but those are zero
because they are uncorrelated.

00:08:39.600 --> 00:08:47.520
And so this is equal to B
covariance of f_t B transpose

00:08:47.520 --> 00:08:49.240
plus psi.
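
NOTE A minimal numerical sketch of the decomposition above, assuming NumPy and made-up values for B, Omega_f and Psi (m = 4 assets, k = 2 factors; none of these numbers come from the lecture). It builds Sigma = B Omega_f B^T + Psi and compares it with the sample covariance of simulated x_t = B f_t + epsilon_t; the intercept alpha is omitted because it does not affect the covariance.
```python
import numpy as np
def factor_covariance(B, omega_f, psi_diag):
    """Return B @ Omega_f @ B.T + diag(psi) for an m x k loading matrix B."""
    B = np.asarray(B, dtype=float)
    return B @ np.asarray(omega_f, dtype=float) @ B.T + np.diag(psi_diag)
# Hypothetical example values (not from the lecture).
B = np.array([[1.0, 0.2], [0.8, -0.5], [1.2, 0.0], [0.5, 1.0]])
omega_f = np.array([[0.04, 0.01], [0.01, 0.02]])
psi_diag = np.array([0.01, 0.02, 0.015, 0.03])
sigma = factor_covariance(B, omega_f, psi_diag)
# Monte Carlo check: simulate factors and independent specific terms.
rng = np.random.default_rng(0)
T = 200_000
f = rng.multivariate_normal(np.zeros(2), omega_f, size=T)
eps = rng.normal(size=(T, 4)) * np.sqrt(psi_diag)
x = f @ B.T + eps
sample_sigma = np.cov(x, rowvar=False)
max_gap = np.max(np.abs(sample_sigma - sigma))  # shrinks as T grows
print(sigma.round(4))
print(max_gap)
```
The remaining gap is pure sampling error, since the simulated factors and specific terms are uncorrelated by construction.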

00:08:54.700 --> 00:08:56.865
With m assets, how
many parameters

00:08:56.865 --> 00:08:59.890
are in the covariance
matrix if there's

00:08:59.890 --> 00:09:02.987
no constraints on the
covariance matrix?

00:09:02.987 --> 00:09:03.903
AUDIENCE: [INAUDIBLE].

00:09:07.340 --> 00:09:08.670
PROFESSOR: How many parameters?

00:09:08.670 --> 00:09:09.560
Right.

00:09:09.560 --> 00:09:11.370
So this is sigma.

00:09:11.370 --> 00:09:15.214
So the number of
parameters in sigma.

00:09:15.214 --> 00:09:16.130
AUDIENCE: [INAUDIBLE].

00:09:19.954 --> 00:09:21.595
PROFESSOR: m plus what?

00:09:21.595 --> 00:09:23.970
AUDIENCE: [INAUDIBLE].

00:09:23.970 --> 00:09:29.250
PROFESSOR: OK, this is
a square matrix, m by m.

00:09:29.250 --> 00:09:32.440
So there's possibly m
squared, but it's symmetric.

00:09:32.440 --> 00:09:36.660
So we're double-counting
off the diagonal.

00:09:36.660 --> 00:09:39.950
So it's m times (m plus 1), over 2.

00:09:43.490 --> 00:09:47.540
How many parameters do we
have with the factor model?

00:09:52.150 --> 00:09:57.210
So if we think of a--
let's call this sigma star.

00:09:57.210 --> 00:10:01.640
The number of parameters
in sigma star is what?

00:10:05.810 --> 00:10:10.986
Well, B is an m by k matrix.

00:10:15.920 --> 00:10:22.315
This is m by k, so we have
possibly m times k values.

00:10:25.870 --> 00:10:39.030
The covariance of f_t
contributes

00:10:39.030 --> 00:10:45.460
the number of elements in
the covariance matrix of f,

00:10:45.460 --> 00:10:48.360
which is k by k, so k times
(k plus 1) over 2 by symmetry.

00:10:48.360 --> 00:10:58.470
And then we have psi, which
is diagonal, with m entries.

00:10:58.470 --> 00:11:00.680
So depending on how
we structure things,

00:11:00.680 --> 00:11:03.970
we can have many, many fewer
parameters in this factor model

00:11:03.970 --> 00:11:05.609
than in the unconstrained case.

00:11:05.609 --> 00:11:07.400
And we're going to see
that we can actually

00:11:07.400 --> 00:11:12.630
reduce this number in the
covariance matrix of f

00:11:12.630 --> 00:11:15.637
dramatically because
of flexibility

00:11:15.637 --> 00:11:16.720
in choosing those factors.
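
NOTE The parameter counts in this discussion can be written out directly. This sketch counts B as m times k values, the symmetric covariance of f_t as k(k+1)/2 values (the same symmetry argument used for sigma), and Psi as m values; it ignores any further normalization of the factors, so it is an upper bound for the factor model. The sizes m = 500 and k = 10 are only illustrative.
```python
def n_params_unconstrained(m):
    # Symmetric m x m covariance matrix: m(m+1)/2 distinct entries.
    return m * (m + 1) // 2
def n_params_factor(m, k):
    # B (m*k) + symmetric Cov(f) (k(k+1)/2) + diagonal Psi (m).
    return m * k + k * (k + 1) // 2 + m
print(n_params_unconstrained(500))  # 125250
print(n_params_factor(500, 10))     # 5555
```
With 500 assets and 10 factors, that is 5,555 parameters instead of 125,250.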

00:11:21.940 --> 00:11:27.990
Well let's also look at the
interpretation of the factor

00:11:27.990 --> 00:11:30.110
model as a series of
time series regressions.

00:11:30.110 --> 00:11:35.410
You remember when we talked
about multivariate regression

00:11:35.410 --> 00:11:38.490
a few lectures ago, we
talked about cross-sectional

00:11:38.490 --> 00:11:41.760
regressions and time
series regressions,

00:11:41.760 --> 00:11:45.760
and looking at the collection
of all the regressions

00:11:45.760 --> 00:11:47.770
in a multivariate
regression setting.

00:11:47.770 --> 00:11:50.460
Here we can do the same thing.

00:11:50.460 --> 00:11:52.620
In contrast to the
cross-sectional regression

00:11:52.620 --> 00:11:55.680
where we're fixing time and
looking at all the assets,

00:11:55.680 --> 00:12:01.570
here we're looking at fixing
the asset i and the regression

00:12:01.570 --> 00:12:04.590
over time for that single asset.

00:12:04.590 --> 00:12:09.980
So the values of x_i,
ranging from time 1

00:12:09.980 --> 00:12:16.130
up to time capital T, basically
follow a regression model

00:12:16.130 --> 00:12:22.890
that's equal to the intercept
alpha_i plus this matrix F

00:12:22.890 --> 00:12:30.055
times beta_i, where beta_i
corresponds to the regression

00:12:30.055 --> 00:12:31.680
parameters in this
regression, but they

00:12:31.680 --> 00:12:35.985
are the factor loadings of
asset i on the k different

00:12:35.985 --> 00:12:36.485
factors.

00:12:39.430 --> 00:12:45.470
In this setting, the covariance
matrix of the epsilon_i vector

00:12:45.470 --> 00:12:50.640
is the diagonal matrix sigma_i
squared times the identity.

00:12:50.640 --> 00:12:54.580
And so these are the classic
Gauss-Markov assumptions

00:12:54.580 --> 00:12:58.180
for a single linear
regression model.

00:13:04.530 --> 00:13:09.600
Well, as we did previously,
we can group together

00:13:09.600 --> 00:13:13.700
all of these time series
regressions for all the m

00:13:13.700 --> 00:13:19.220
assets by simply
stacking them side by side.

00:13:19.220 --> 00:13:28.620
So we start off with x_i
equal to basically F beta_i

00:13:28.620 --> 00:13:31.030
plus epsilon_i.

00:13:31.030 --> 00:13:39.980
And we can basically
consider x_1, x_2, up to x_m.

00:13:39.980 --> 00:13:46.260
So we have a T by m
matrix for the m assets.

00:13:46.260 --> 00:13:56.230
And that's equal to a regression
model given by basically

00:13:56.230 --> 00:13:58.470
what's on the slides here.

00:13:58.470 --> 00:14:01.370
So basically, we're able to
group everything together

00:14:01.370 --> 00:14:05.900
and deal with everything all
at once, which is convenient

00:14:05.900 --> 00:14:08.530
computationally when fitting these.
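
NOTE A sketch of fitting all m time series regressions at once, assuming NumPy and an observed T x k factor matrix F: stacking the columns x_1, ..., x_m into a T x m matrix X, one least-squares solve against the design [1, F] returns every alpha_i and beta_i together. The function name fit_time_series_regressions and the simulated data are hypothetical.
```python
import numpy as np
def fit_time_series_regressions(X, F):
    """Fit x_i = alpha_i + F beta_i + eps_i for all m columns of X at once.
    X: T x m returns; F: T x k observed factors.
    Returns alpha (m,), B (m x k) with rows beta_i, and residuals (T x m)."""
    T = X.shape[0]
    Z = np.column_stack([np.ones(T), F])          # T x (k+1) design matrix
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)  # (k+1) x m, one column per asset
    resid = X - Z @ coef
    return coef[0], coef[1:].T, resid
# Simulated check with known parameters (illustrative values only).
rng = np.random.default_rng(1)
T, m, k = 1000, 5, 2
F = rng.normal(size=(T, k))
alpha_true = rng.normal(scale=0.01, size=m)
B_true = rng.normal(size=(m, k))
X = alpha_true + F @ B_true.T + 0.001 * rng.normal(size=(T, m))
alpha_hat, B_hat, resid = fit_time_series_regressions(X, F)
```
Because every asset shares the same design matrix, the stacked solve gives the same estimates as m separate OLS fits.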

00:14:16.630 --> 00:14:21.780
Let's look at the simplest
example of a factor model.

00:14:21.780 --> 00:14:24.610
This is the single-factor
model of Sharpe.

00:14:24.610 --> 00:14:27.640
We went through the capital
asset pricing model,

00:14:27.640 --> 00:14:33.382
how returns on assets and
stocks are basically--

00:14:33.382 --> 00:14:35.090
the excess return on a
stock can be modeled

00:14:35.090 --> 00:14:39.360
as a linear
regression on the excess return

00:14:39.360 --> 00:14:40.530
of the market.

00:14:40.530 --> 00:14:43.860
And the regression
parameter beta_i

00:14:43.860 --> 00:14:48.760
corresponds to the level of
risk associated with the asset.

00:14:48.760 --> 00:14:54.110
And all we're doing
in this model is,

00:14:54.110 --> 00:14:57.050
by choosing different
assets we're choosing assets

00:14:57.050 --> 00:15:01.800
with different levels of
risk scaled by the beta_i.

00:15:01.800 --> 00:15:04.510
And they may have
returns that vary

00:15:04.510 --> 00:15:08.760
across assets given by alpha_i.

00:15:08.760 --> 00:15:16.380
The covariance
matrix of the assets

00:15:16.380 --> 00:15:18.600
has-- the unconditional
covariance matrix

00:15:18.600 --> 00:15:20.540
has this structure.

00:15:20.540 --> 00:15:25.190
It's basically equal to the
variance of the market times

00:15:25.190 --> 00:15:28.580
beta beta prime plus psi.

00:15:28.580 --> 00:15:33.780
And so that equation
is really very simple.

00:15:37.070 --> 00:15:41.270
It's really self-evident from
what we've discussed, but let

00:15:41.270 --> 00:15:45.580
me just highlight what it is.

00:15:45.580 --> 00:15:53.276
Sigma squared beta beta
transposed plus psi.

00:15:53.276 --> 00:15:55.170
And that's equal
to sigma squared

00:15:55.170 --> 00:15:58.720
times basically a column vector
of all the betas, beta_1 down

00:15:58.720 --> 00:16:08.460
to beta_m times its transpose
plus a diagonal matrix

00:16:08.460 --> 00:16:09.740
with the psi.

00:16:09.740 --> 00:16:12.460
So this is really a very,
very simple structure

00:16:12.460 --> 00:16:14.790
for the covariance.
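
NOTE For the single-index case the whole m x m matrix follows from 2m + 1 numbers: the m betas, the m specific variances, and the market variance. A minimal sketch, assuming NumPy and illustrative (hypothetical) inputs:
```python
import numpy as np
def single_index_covariance(beta, sigma_m2, psi_diag):
    """Sigma = sigma_M^2 * beta beta^T + diag(psi) for Sharpe's single-index model."""
    beta = np.asarray(beta, dtype=float)
    return sigma_m2 * np.outer(beta, beta) + np.diag(psi_diag)
beta = np.array([0.8, 1.0, 1.3])         # hypothetical asset betas
psi_diag = np.array([0.02, 0.01, 0.03])  # hypothetical specific variances
sigma = single_index_covariance(beta, 0.04, psi_diag)
# Each off-diagonal entry is sigma_M^2 * beta_i * beta_j, e.g. 0.04 * 0.8 * 1.0 = 0.032.
print(sigma)
```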

00:16:14.790 --> 00:16:18.610
And if you wanted to
apply this model to thousands

00:16:18.610 --> 00:16:20.850
of securities, it's
basically no problem.

00:16:20.850 --> 00:16:23.270
You can construct a
covariance matrix.

00:16:23.270 --> 00:16:26.510
And if this were appropriate
for a large collection

00:16:26.510 --> 00:16:30.110
of securities, then the
amount-- the reduction

00:16:30.110 --> 00:16:35.810
in terms of what you're
estimating is enormous.

00:16:35.810 --> 00:16:39.103
Rather than estimating
each cross-correlation

00:16:39.103 --> 00:16:44.190
and covariance of
all the assets,

00:16:44.190 --> 00:16:49.190
the factor model tells us what
those cross covariances are.

00:16:49.190 --> 00:16:54.141
So that's really where the
power of the model comes in.

00:16:54.141 --> 00:16:58.310
And in terms of why
is this so useful,

00:16:58.310 --> 00:17:03.980
well in portfolio management
one of the key drivers

00:17:03.980 --> 00:17:07.450
of asset allocation is
the covariance matrix

00:17:07.450 --> 00:17:08.460
of the assets.

00:17:08.460 --> 00:17:09.910
So if you have an
effective model

00:17:09.910 --> 00:17:12.319
for modeling the
covariance, then that

00:17:12.319 --> 00:17:14.920
simplifies the portfolio
allocation problem

00:17:14.920 --> 00:17:17.800
because you can specify
a covariance matrix

00:17:17.800 --> 00:17:20.510
that you are confident with.

00:17:20.510 --> 00:17:28.089
And also in risk
management, effective models

00:17:28.089 --> 00:17:32.010
of risk management
deal with, how

00:17:32.010 --> 00:17:37.820
do we anticipate what would
happen if different scenarios

00:17:37.820 --> 00:17:38.750
occur in the market?

00:17:38.750 --> 00:17:41.320
Well, the different
scenarios that can occur

00:17:41.320 --> 00:17:45.900
can be associated with what's
happening to underlying factors

00:17:45.900 --> 00:17:48.460
that affect the system.

00:17:48.460 --> 00:17:51.580
And so we can consider
risk management approaches

00:17:51.580 --> 00:17:54.200
that vary these underlying
factors, and look at

00:17:54.200 --> 00:17:57.172
how that has an impact
on the covariance matrix

00:17:57.172 --> 00:17:57.755
very directly.

00:18:04.640 --> 00:18:08.350
Estimation of Sharpe's
single index model.

00:18:08.350 --> 00:18:11.460
We went through
before how we estimate

00:18:11.460 --> 00:18:14.970
the alphas and the betas.

00:18:14.970 --> 00:18:17.440
In terms of estimating
the sigmas--

00:18:17.440 --> 00:18:20.800
the specific
variances-- basically,

00:18:20.800 --> 00:18:23.640
that comes from the
simple regression as well.

00:18:23.640 --> 00:18:26.920
Basically, the sum of the
squared estimated residuals

00:18:26.920 --> 00:18:28.840
divided by T minus 2.

00:18:28.840 --> 00:18:31.870
We use T minus 2 to make it unbiased
because we have two parameters,

00:18:31.870 --> 00:18:34.220
alpha_i and beta_i, estimated per model.

00:18:34.220 --> 00:18:40.700
Then for the market
portfolio, its variance

00:18:40.700 --> 00:18:42.580
has a simple estimate as well.

00:18:42.580 --> 00:18:46.470
The psi hat matrix is
just the diagonal matrix

00:18:46.470 --> 00:18:53.680
of the estimated
specific variances.

00:18:53.680 --> 00:18:56.620
And then the unconditional
covariance matrix

00:18:56.620 --> 00:19:00.940
is estimated by simply plugging
in these parameter estimates.

00:19:00.940 --> 00:19:08.490
So, very simple and effective
if that single-factor

00:19:08.490 --> 00:19:09.760
model is appropriate.
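
NOTE The estimation steps just described can be sketched as follows, assuming NumPy, a T x m array R of asset excess returns and a length-T array r_m of market excess returns. Using the sample variance (ddof=1) for the market variance is an assumption; the lecture only says it has a simple estimate. The data are simulated and hypothetical.
```python
import numpy as np
def fit_single_index(R, r_m):
    """OLS fit of R[:, i] = alpha_i + beta_i * r_m + eps_i for every asset,
    then plug-in estimate Sigma_hat = sigma_M^2 beta beta^T + Psi_hat."""
    T = R.shape[0]
    Z = np.column_stack([np.ones(T), r_m])
    coef, *_ = np.linalg.lstsq(Z, R, rcond=None)
    alpha, beta = coef[0], coef[1]
    resid = R - Z @ coef
    psi = (resid ** 2).sum(axis=0) / (T - 2)  # T - 2: two parameters per regression
    sigma_m2 = r_m.var(ddof=1)                # sample variance of the market
    sigma_hat = sigma_m2 * np.outer(beta, beta) + np.diag(psi)
    return alpha, beta, psi, sigma_m2, sigma_hat
rng = np.random.default_rng(3)
T = 5000
r_m = rng.normal(0.005, 0.04, size=T)
beta_true = np.array([0.8, 1.0, 1.3])
spec_sd = np.array([0.02, 0.01, 0.03])
R = np.outer(r_m, beta_true) + rng.normal(size=(T, 3)) * spec_sd
alpha, beta, psi, sigma_m2, sigma_hat = fit_single_index(R, r_m)
```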

00:19:09.760 --> 00:19:13.660
Now needless to say,
a single-factor model

00:19:13.660 --> 00:19:18.860
doesn't characterize the
structure of the covariances

00:19:18.860 --> 00:19:22.220
and/or the returns typically.

00:19:22.220 --> 00:19:26.860
And so we want to consider
more general models,

00:19:26.860 --> 00:19:28.880
multi-factor models.

00:19:28.880 --> 00:19:31.160
And the first set of models
we're going to talk about

00:19:31.160 --> 00:19:36.640
use common factor variables
that can actually be observed.

00:19:39.630 --> 00:19:44.430
Basically any factor
that you can observe

00:19:44.430 --> 00:19:49.480
is a potential candidate
for being a relevant factor

00:19:49.480 --> 00:19:51.240
in a linear factor model.

00:19:51.240 --> 00:19:54.100
The effectiveness of
a potential factor

00:19:54.100 --> 00:19:56.490
is determined by
fitting the model

00:19:56.490 --> 00:20:00.050
and seeing how much
contribution that factor

00:20:00.050 --> 00:20:03.567
makes to the explanation
of the returns

00:20:03.567 --> 00:20:04.775
and the covariance structure.

00:20:07.300 --> 00:20:12.970
Chen, Roll, and Ross wrote
a classic paper in 1986.

00:20:12.970 --> 00:20:17.460
Now Ross is actually
here at MIT.

00:20:17.460 --> 00:20:30.580
And with their paper,
they looked at modeling--

00:20:30.580 --> 00:20:32.180
rather than looking
at these factors

00:20:32.180 --> 00:20:34.560
directly, including
those in the model,

00:20:34.560 --> 00:20:39.940
they looked at
transforming these factors

00:20:39.940 --> 00:20:43.080
into surprise factors.

00:20:43.080 --> 00:20:47.550
So rather than having
interest rates just

00:20:47.550 --> 00:20:50.230
as a simple factor directly
plugged into the model,

00:20:50.230 --> 00:20:54.100
it would be the change
in interest rates.

00:20:54.100 --> 00:20:56.890
And additionally, not only just
the change in interest rate,

00:20:56.890 --> 00:20:59.180
but the unanticipated
change in interest

00:20:59.180 --> 00:21:01.550
rates given market information.

00:21:01.550 --> 00:21:07.480
So their paper goes
through modeling different

00:21:07.480 --> 00:21:12.130
macroeconomic variables with
vector autoregression models,

00:21:12.130 --> 00:21:17.270
and then using those to
specify unanticipated changes

00:21:17.270 --> 00:21:19.540
in these underlying factors.

00:21:19.540 --> 00:21:22.680
And so that's where
the power comes in.

00:21:22.680 --> 00:21:27.680
And that highlights how when
you're applying these models,

00:21:27.680 --> 00:21:30.960
it does involve some
creativity to get the most bang

00:21:30.960 --> 00:21:32.340
for the buck with your models.

00:21:32.340 --> 00:21:36.780
And the idea they had of
incorporating unanticipated

00:21:36.780 --> 00:21:39.060
changes was really
a very good one

00:21:39.060 --> 00:21:41.555
and is applied quite widely now.
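
NOTE A minimal sketch of the unanticipated-change idea, assuming NumPy: fit a first-order vector autoregression Y_t = c + A Y_(t-1) + u_t to a T x d panel of macro variables by least squares and treat the residuals u_t as the surprises. One lag and plain OLS are assumptions for illustration; this is not Chen, Roll, and Ross's exact specification, and the simulated series is hypothetical.
```python
import numpy as np
def var1_surprises(Y):
    """Fit Y_t = c + A Y_(t-1) + u_t by least squares; return the (T-1) x d residuals."""
    Z = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])  # regressors: constant and lagged Y
    coef, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)
    return Y[1:] - Z @ coef                             # unanticipated part of Y_t
# Simulate a hypothetical two-variable VAR(1).
rng = np.random.default_rng(2)
A = np.array([[0.5, 0.1], [0.0, 0.8]])
Y = np.zeros((500, 2))
for t in range(1, 500):
    Y[t] = A @ Y[t - 1] + rng.normal(scale=0.1, size=2)
surprises = var1_surprises(Y)
```
These surprise series would then enter the factor model as the observed factors f_t in place of the raw levels or changes.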

00:21:55.050 --> 00:22:03.380
Now with this setup,
one basically--

00:22:03.380 --> 00:22:10.040
if one has empirical data over
times 1 through capital T,

00:22:10.040 --> 00:22:13.580
then if one wants to
specify these models,

00:22:13.580 --> 00:22:17.740
one has their observations
on the x_i process.

00:22:17.740 --> 00:22:22.120
You basically have observed
all the returns historically.

00:22:22.120 --> 00:22:24.940
We also, because the
factors are observable,

00:22:24.940 --> 00:22:29.290
have the F matrix as a
set of observed variables.

00:22:29.290 --> 00:22:36.300
So we can basically use those to
estimate the beta_i's and also

00:22:36.300 --> 00:22:40.970
estimate the variances
of the residual terms

00:22:40.970 --> 00:22:44.310
with simple regression methods.

00:22:44.310 --> 00:22:49.970
So implementing these
is quite feasible,

00:22:49.970 --> 00:22:53.860
and basically applies methods
that we have from before.

00:22:53.860 --> 00:23:01.110
So what this slide now discusses
is how we basically estimate

00:23:01.110 --> 00:23:02.700
the underlying parameters.

00:23:02.700 --> 00:23:06.990
We need to be a little bit
careful about the Gauss-Markov

00:23:06.990 --> 00:23:08.210
assumptions.

00:23:08.210 --> 00:23:15.100
You'll remember that if we
have a regression model where

00:23:15.100 --> 00:23:18.650
the residual terms are
uncorrelated with constant

00:23:18.650 --> 00:23:22.560
variance, then the
ordinary least squares

00:23:22.560 --> 00:23:25.210
estimates are the best linear unbiased ones.

00:23:25.210 --> 00:23:32.740
If there are unequal
variances of the residuals,

00:23:32.740 --> 00:23:36.650
and maybe even
covariances, then we

00:23:36.650 --> 00:23:40.250
need to use generalized
least squares.

00:23:40.250 --> 00:23:46.850
So the notes go through those
computations and the formulas,

00:23:46.850 --> 00:23:51.380
which are just simple extensions
of our regression model theory

00:23:51.380 --> 00:23:53.755
that we had in
previous lectures.
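
NOTE As one example of those extensions, in a cross-sectional regression x_t = B f_t + epsilon_t with diagonal Cov(epsilon_t) = Psi, generalized least squares reduces to weighted least squares: f_hat_t = (B^T Psi^-1 B)^-1 B^T Psi^-1 x_t. The sketch below, assuming NumPy, gets it by rescaling each asset's row by 1/sqrt(psi_i) and running OLS; the intercept is left out for brevity (it could be added as a column of B), and the inputs are made up.
```python
import numpy as np
def gls_diagonal(B, x, psi_diag):
    """GLS estimate of f in x = B f + eps with Cov(eps) = diag(psi_diag)."""
    w = 1.0 / np.sqrt(np.asarray(psi_diag, dtype=float))
    f_hat, *_ = np.linalg.lstsq(B * w[:, None], x * w, rcond=None)
    return f_hat
rng = np.random.default_rng(4)
B = rng.normal(size=(6, 2))
psi_diag = np.array([0.01, 0.04, 0.02, 0.09, 0.01, 0.05])
x = B @ np.array([0.01, -0.02]) + rng.normal(size=6) * np.sqrt(psi_diag)
f_hat = gls_diagonal(B, x, psi_diag)
# Same answer from the closed form (B^T Psi^-1 B)^-1 B^T Psi^-1 x.
W = np.diag(1.0 / psi_diag)
f_closed = np.linalg.solve(B.T @ W @ B, B.T @ W @ x)
```
Assets with large specific variance get less weight, which is why ordinary least squares is no longer the best choice here.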

00:24:04.240 --> 00:24:11.433
Let me go through an example.

00:24:17.720 --> 00:24:19.790
With common factor
variables that

00:24:19.790 --> 00:24:25.560
are using either fundamental
or asset-specific attributes,

00:24:25.560 --> 00:24:29.047
there's the approach of-- well,
it's called the BARRA Approach.

00:24:29.047 --> 00:24:30.213
This is from Barr Rosenberg.

00:24:33.470 --> 00:24:36.090
Actually, I have to say, he
was one of the inspirations

00:24:36.090 --> 00:24:41.040
to me for going into statistical
modeling and finance.

00:24:41.040 --> 00:24:44.910
He was a professor at UC
Berkeley who left academia

00:24:44.910 --> 00:24:51.770
very early to basically
apply models and trade money.

00:24:51.770 --> 00:24:55.250
As an anecdote, his
current situation

00:24:55.250 --> 00:24:57.210
is a little different.

00:24:57.210 --> 00:24:58.620
I'll let you look that up.

00:24:58.620 --> 00:25:04.170
But anyway, his
work basically

00:25:04.170 --> 00:25:09.260
provided the BARRA Approach
for factor modeling and risk

00:25:09.260 --> 00:25:11.950
analysis, which is still
used extensively today.

00:25:15.360 --> 00:25:23.710
So with common factor
variables using

00:25:23.710 --> 00:25:28.740
asset-specific
attributes-- in fact,

00:25:28.740 --> 00:25:33.890
the factor realizations
are unobserved

00:25:33.890 --> 00:25:38.960
but are estimated in the
application of these models.

00:25:38.960 --> 00:25:41.930
So let's see how that goes.

00:25:41.930 --> 00:25:50.410
Oh, OK, this slide talks about
the Fama-French approach, which

00:25:50.410 --> 00:25:54.610
concerns-- OK, Fama and
French, Fama of course

00:25:54.610 --> 00:25:56.780
we talked about him
in the last lecture.

00:25:56.780 --> 00:25:58.920
He got the Nobel
Prize for his work

00:25:58.920 --> 00:26:05.220
in modeling asset price
returns and market efficiency.

00:26:05.220 --> 00:26:08.230
Fama and French
found that there were

00:26:08.230 --> 00:26:11.860
some very important
factors affecting

00:26:11.860 --> 00:26:14.330
asset returns in equities.

00:26:14.330 --> 00:26:18.300
Basically, returns
tended to vary depending

00:26:18.300 --> 00:26:20.910
upon the size of firms.

00:26:20.910 --> 00:26:25.680
So if you consider small
firms versus large firms,

00:26:25.680 --> 00:26:27.516
small firms tended to
have returns that were

00:26:27.516 --> 00:26:28.640
more similar to each other.

00:26:28.640 --> 00:26:30.660
Large firms tended to
have returns that were

00:26:30.660 --> 00:26:32.110
more similar to each other.

00:26:32.110 --> 00:26:35.920
So there's basically a
big versus small factor

00:26:35.920 --> 00:26:38.500
that is operating in the market.

00:26:38.500 --> 00:26:40.610
Sometimes the market
prefers small stocks,

00:26:40.610 --> 00:26:42.790
sometimes it prefers
large stocks.

00:26:42.790 --> 00:26:48.580
And similarly,
there's another factor

00:26:48.580 --> 00:26:50.950
which is value versus growth.

00:26:54.030 --> 00:26:58.410
Basically, stocks that
are considered good values

00:26:58.410 --> 00:27:02.914
are stocks which are cheap,
basically, for what they have.

00:27:02.914 --> 00:27:04.955
So you're basically getting
a stock at a discount

00:27:04.955 --> 00:27:08.390
if you're getting a good value.

00:27:08.390 --> 00:27:12.500
And value stocks can be
measured by looking at the price

00:27:12.500 --> 00:27:13.165
to book equity.

00:27:13.165 --> 00:27:15.940
If that's low, then
the price you're

00:27:15.940 --> 00:27:20.600
paying for that equity in the
firm is low, and it's cheap.

00:27:20.600 --> 00:27:24.150
And that compares
with stocks for which

00:27:24.150 --> 00:27:28.110
the price relative to the
book value is very, very high.

00:27:28.110 --> 00:27:32.240
Why are people willing
to pay a lot for such stocks?

00:27:32.240 --> 00:27:35.000
In that case, well it's
because the growth prospects

00:27:35.000 --> 00:27:39.030
of those stocks are high,
and there's an anticipation

00:27:39.030 --> 00:27:41.580
basically that the
current price is just

00:27:41.580 --> 00:27:47.610
reflecting a projection of how
much growth potential there is.

00:27:47.610 --> 00:27:51.670
Now the Fama-French approach
is for each of these factors

00:27:51.670 --> 00:27:57.080
to basically rank order all
the stocks by this factor

00:27:57.080 --> 00:28:01.800
and divide them
up into quintiles.

00:28:01.800 --> 00:28:06.550
So say this is market cap.

00:28:06.550 --> 00:28:12.030
We can divide up all the
stocks in-- basically

00:28:12.030 --> 00:28:15.000
consider a histogram,
or whatever,

00:28:15.000 --> 00:28:18.230
of all the market caps of all
the stocks in our universe.

00:28:18.230 --> 00:28:23.500
And then divide it up into
basically the bottom fifth,

00:28:23.500 --> 00:28:27.026
the next fifth, and
so on, up

00:28:27.026 --> 00:28:29.830
to the top fifth.

00:28:33.120 --> 00:28:35.960
And the Fama-French
approach says, well,

00:28:35.960 --> 00:28:41.080
let's look at an equal-weighted
average of the top fifth.

00:28:41.080 --> 00:28:50.920
And basically, buy that
and sell the bottom fifth.

00:28:50.920 --> 00:28:55.620
And so that would be the
big versus small market

00:28:55.620 --> 00:28:57.640
factor of Fama and French.

00:28:57.640 --> 00:29:00.300
Now, if you look at
their work, they actually

00:29:00.300 --> 00:29:03.080
do the bottom minus the
top, because the bottom group

00:29:03.080 --> 00:29:04.670
tends to outperform the top.

00:29:04.670 --> 00:29:07.010
So they have a factor
whose values are more often

00:29:07.010 --> 00:29:08.510
positive, associated
more generally

00:29:08.510 --> 00:29:10.660
with positive returns.

00:29:10.660 --> 00:29:14.780
But that factor can
be applied and used

00:29:14.780 --> 00:29:20.400
to correlate with individual
asset returns as well.
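
NOTE A simplified single-period sketch of the quintile long-short construction just described, assuming NumPy: rank stocks on a characteristic such as market cap, take equal-weighted average returns of the bottom and top fifths, and subtract. The function name and data are hypothetical, and the published Fama-French factors are built differently (for example, from double sorts on size and book-to-market), so this only illustrates the idea.
```python
import numpy as np
def quintile_long_short(returns, characteristic, long="bottom"):
    """Equal-weighted bottom-quintile return minus top-quintile return (or the reverse)."""
    returns = np.asarray(returns, dtype=float)
    order = np.argsort(characteristic)       # ascending by characteristic
    n5 = len(order) // 5                     # stocks per quintile (remainder ignored)
    bottom = returns[order[:n5]].mean()
    top = returns[order[-n5:]].mean()
    return bottom - top if long == "bottom" else top - bottom
market_cap = np.arange(1, 11)                # 10 hypothetical stocks, smallest first
rets = np.array([0.05, 0.03, 0, 0, 0, 0, 0, 0, 0.01, 0.01])
print(quintile_long_short(rets, market_cap))  # about 0.03 (small minus big)
```
Repeating this each period gives a factor time series that can serve as an observed factor f_t in a time series regression.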

00:29:26.590 --> 00:29:28.580
Now, with the BARRA
Industry Factor--

00:29:28.580 --> 00:29:35.960
this is just getting back
to the BARRA Approach--

00:29:35.960 --> 00:29:37.840
the simplest way
to understand

00:29:37.840 --> 00:29:40.580
the BARRA industry
factor models is

00:29:40.580 --> 00:29:42.820
to consider
dividing stocks up

00:29:42.820 --> 00:29:45.020
into different industry groups.

00:29:45.020 --> 00:29:56.610
So we might expect that,
say oil stocks will

00:29:56.610 --> 00:30:02.100
tend to move together and
have greater variability

00:30:02.100 --> 00:30:04.790
or common variability.

00:30:04.790 --> 00:30:10.640
And that could be very different
from utility stocks, which

00:30:10.640 --> 00:30:13.105
tend to actually be
quite low-risk stocks.

00:30:17.749 --> 00:30:19.290
Utility companies
are companies which

00:30:19.290 --> 00:30:21.910
are very highly regulated.

00:30:21.910 --> 00:30:26.360
And the profitability
of those firms

00:30:26.360 --> 00:30:30.850
is basically overseen
by the regulators.

00:30:30.850 --> 00:30:37.110
They don't want the
utilities to gouge consumers

00:30:37.110 --> 00:30:41.890
and make too much profit from
delivering power to customers.

00:30:41.890 --> 00:30:46.555
So utilities tend to have
fairly low volatility

00:30:46.555 --> 00:30:50.330
but very consistent
returns, which

00:30:50.330 --> 00:30:55.530
are based on levels of profitability
that are reasonable

00:30:55.530 --> 00:30:58.110
from a regulatory standpoint
for those companies.

00:30:58.110 --> 00:31:03.270
Well, with an industry
factor model, what we can do

00:31:03.270 --> 00:31:08.710
is associate factor
loadings, which basically

00:31:08.710 --> 00:31:13.570
load each asset according
to which industry group

00:31:13.570 --> 00:31:14.520
it's in.

00:31:14.520 --> 00:31:20.140
So we actually know the beta
values for these stocks,

00:31:20.140 --> 00:31:23.080
but we don't know
the underlying factor

00:31:23.080 --> 00:31:26.400
realizations for these stocks.

00:31:26.400 --> 00:31:29.480
But in terms of the
betas, with these factors

00:31:29.480 --> 00:31:34.641
we can basically get well-defined
beta vectors and a B

00:31:34.641 --> 00:31:37.390
matrix for all the stocks.

00:31:37.390 --> 00:31:40.650
And the problem
then is, how do we

00:31:40.650 --> 00:31:44.540
specify the realization
of the underlying factors?

00:31:44.540 --> 00:31:51.000
Well, the realization of
the underlying factors

00:31:51.000 --> 00:31:56.190
basically is just estimated
with a regression model.

00:31:56.190 --> 00:32:06.300
And so if we have all of our
assets x_i for different times

00:32:06.300 --> 00:32:13.700
t, those would have a model
given by factor realizations

00:32:13.700 --> 00:32:21.380
corresponding to the k industry
groups with known beta_(i,j)

00:32:21.380 --> 00:32:21.880
values.

00:32:29.010 --> 00:32:34.030
To estimate
these, we basically

00:32:34.030 --> 00:32:36.940
have a simple
regression model where

00:32:36.940 --> 00:32:43.060
the realizations of
the factor returns f_t

00:32:43.060 --> 00:32:45.840
are given by essentially
a regression coefficient

00:32:45.840 --> 00:32:50.270
in this regression, where
we have the asset returns

00:32:50.270 --> 00:32:54.840
x_t, the known
factor loadings B,

00:32:54.840 --> 00:32:58.520
and the unknown factor
realizations f_t.

00:32:58.520 --> 00:33:01.930
And just plugging
into the regression,

00:33:01.930 --> 00:33:05.500
if we do it very simply
we get this expression

00:33:05.500 --> 00:33:10.710
for f hat t, which is the
simple linear regression model

00:33:10.710 --> 00:33:13.310
estimating those realizations.
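
NOTE
A numpy sketch of that cross-sectional regression at a single date, assuming the loadings are 0/1 industry-membership dummies, so B is m by k and f_hat_t = (B'B)^(-1) B' x_t. The function `industry_factor_ols` and the returns are hypothetical.
```python
import numpy as np
def industry_factor_ols(B, x_t):
    """OLS factor realizations at one date: f_hat_t = (B'B)^{-1} B' x_t.
    B: (m, k) known loadings; x_t: (m,) asset returns. Returns shape (k,)."""
    return np.linalg.solve(B.T @ B, B.T @ x_t)
industry = np.array([0, 0, 0, 1, 1, 1])     # hypothetical: 3 oil stocks, 3 utilities
B = np.eye(2)[industry]                      # (6, 2) 0/1 membership dummies
x_t = np.array([0.03, 0.01, 0.02, 0.004, 0.006, 0.005])
f_hat_t = industry_factor_ols(B, x_t)        # the two industry averages
```
With pure dummy loadings, B'B is diagonal with the industry counts, so each estimated factor realization is just that industry's average return.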

00:33:13.310 --> 00:33:20.660
Now this particular estimate
of the factor realizations

00:33:20.660 --> 00:33:29.020
assumes that all the
components of x

00:33:29.020 --> 00:33:31.960
have the same variance.

00:33:31.960 --> 00:33:33.830
This is like the
linear regression

00:33:33.830 --> 00:33:36.940
estimates under the standard
Gauss-Markov assumptions.

00:33:36.940 --> 00:33:42.970
But basically the
epsilon_i's will

00:33:42.970 --> 00:33:44.420
vary across the
different assets.

00:33:44.420 --> 00:33:47.240
The different assets will
have different variabilities,

00:33:47.240 --> 00:33:48.500
different specific variances.

00:33:48.500 --> 00:33:53.630
So there's actually going
to be heteroscedasticity

00:33:53.630 --> 00:33:54.720
in these models.

00:33:54.720 --> 00:33:57.840
So this particular vector
of industry averages

00:33:57.840 --> 00:34:06.670
should actually be extended
to account for that.

00:34:06.670 --> 00:34:10.940
So the covariance matrix

00:34:10.940 --> 00:34:14.900
of the factors can
then be estimated

00:34:14.900 --> 00:34:17.909
using these estimates
of the realizations.

00:34:17.909 --> 00:34:21.599
And the residual
covariance matrix

00:34:21.599 --> 00:34:22.515
can then be estimated.

00:34:25.310 --> 00:34:29.639
So I guess an initial estimate
of the covariance matrix sigma

00:34:29.639 --> 00:34:34.409
hat is given by this known
matrix B times our sample

00:34:34.409 --> 00:34:39.340
covariance of the factor
realizations times B transpose,

00:34:39.340 --> 00:34:42.330
plus the diagonal matrix psi hat.
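
NOTE
A sketch of this initial covariance estimate, assuming a T by m panel X of returns (one row per date), known loadings B, and the OLS factor estimates from every date; psi hat is taken to be the residual (specific) variances. The function name and simulated data are hypothetical.
```python
import numpy as np
def factor_cov_estimate(X, B):
    """X: (T, m) returns, one row per date; B: (m, k) known loadings.
    Returns (Sigma_hat, F_hat, psi_hat) with Sigma_hat = B Omega_f B' + diag(psi_hat)."""
    k = B.shape[1]
    F_hat = np.linalg.solve(B.T @ B, B.T @ X.T).T         # (T, k): OLS f_hat_t for every date
    resid = X - F_hat @ B.T                               # (T, m) specific returns
    omega_f = np.cov(F_hat, rowvar=False).reshape(k, k)   # sample factor covariance
    psi_hat = resid.var(axis=0, ddof=1)                   # specific variances
    return B @ omega_f @ B.T + np.diag(psi_hat), F_hat, psi_hat
rng = np.random.default_rng(0)
industry = np.array([0, 0, 0, 1, 1, 1])                   # hypothetical industry labels
B = np.eye(2)[industry]
X = 0.02 * rng.normal(size=(500, 2)) @ B.T + 0.01 * rng.normal(size=(500, 6))
Sigma_hat, F_hat, psi_hat = factor_cov_estimate(X, B)
```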

00:34:42.330 --> 00:34:46.310
And a second step
in this process

00:34:46.310 --> 00:34:49.659
can incorporate
information about there

00:34:49.659 --> 00:34:53.659
being heteroscedasticity along
the diagonal of the psi's

00:34:53.659 --> 00:34:56.699
to adjust the
regression estimates.

00:34:56.699 --> 00:35:00.950
So we basically get a
refinement of the estimates

00:35:00.950 --> 00:35:05.640
that does account for the
non-constant variability.
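
NOTE
One way to carry out that second step, assuming the refinement is weighted least squares with the diagonal psi hat as weights, f_hat_t = (B' psi^(-1) B)^(-1) B' psi^(-1) x_t, which is generalized least squares for a diagonal error covariance. Names and numbers are hypothetical.
```python
import numpy as np
def factor_wls(B, x_t, psi):
    """Weighted least squares factor estimate at one date.
    psi: (m,) specific variances; noisier assets get less weight."""
    W = B / psi[:, None]                      # Psi^{-1} B, shape (m, k)
    return np.linalg.solve(B.T @ W, W.T @ x_t)
industry = np.array([0, 0, 0, 1, 1, 1])       # hypothetical industry labels
B = np.eye(2)[industry]
x_t = np.array([0.03, 0.01, 0.02, 0.004, 0.006, 0.005])
psi = np.array([1e-4, 4e-4, 1e-4, 1e-5, 1e-5, 4e-5])   # hypothetical specific variances
f_wls = factor_wls(B, x_t, psi)
```
With equal entries in psi this reduces to the OLS industry averages; unequal entries tilt each average toward the less noisy stocks.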

00:35:05.640 --> 00:35:13.750
Now this property of
heteroscedasticity versus

00:35:13.750 --> 00:35:20.270
homoscedasticity in estimating
the regression parameters,

00:35:20.270 --> 00:35:22.460
it may seem like
this is a nicety

00:35:22.460 --> 00:35:27.550
of statistical theory, something
we just have to check,

00:35:27.550 --> 00:35:29.180
but not too big a deal.

00:35:29.180 --> 00:35:36.820
But let me highlight where this
issue comes up again and again.

00:35:36.820 --> 00:35:43.880
With portfolio optimization,
which we went through last time--

00:35:43.880 --> 00:35:46.300
for mean-variance
optimization, we

00:35:46.300 --> 00:35:50.550
want to consider a weighting of
assets that basically weights

00:35:50.550 --> 00:35:56.640
the assets by the expected
returns, pre-multiplied

00:35:56.640 --> 00:36:00.500
by the inverse of the
covariance matrix.

00:36:00.500 --> 00:36:04.150
And so we basically in
portfolio allocation

00:36:04.150 --> 00:36:07.010
want to allocate to
assets with high return,

00:36:07.010 --> 00:36:10.970
but we're going to penalize
those with high variance.
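
NOTE
A sketch of that weighting, w proportional to Sigma^(-1) mu, assuming an unconstrained problem and rescaling the weights to sum to one; the two-asset inputs are hypothetical. Two assets with the same expected return but different variances end up with very different weights.
```python
import numpy as np
def mean_variance_weights(mu, Sigma):
    """Direction w proportional to Sigma^{-1} mu, rescaled so the weights sum to 1."""
    raw = np.linalg.solve(Sigma, mu)
    return raw / raw.sum()
mu = np.array([0.05, 0.05])                  # hypothetical: equal expected returns
Sigma = np.diag([0.04, 0.01])                # asset 0 has four times the variance
w = mean_variance_weights(mu, Sigma)
```
Here the raw weights are 1.25 and 5, so the low-variance asset receives 80% of the capital.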

00:36:10.970 --> 00:36:21.450
And so the impact of discounting
values with high variability

00:36:21.450 --> 00:36:23.810
arises in asset allocation.

00:36:23.810 --> 00:36:27.480
And then, of course, it arises
in statistical estimation.

00:36:27.480 --> 00:36:30.160
Basically, with signals
that have high noise,

00:36:30.160 --> 00:36:33.070
you want to normalize
by the level of noise

00:36:33.070 --> 00:36:37.900
before you incorporate the
impact of that variable

00:36:37.900 --> 00:36:38.900
in the particular model.

00:36:45.400 --> 00:36:47.560
So here are just some notes
about the inefficiency

00:36:47.560 --> 00:36:50.170
of estimates due to
heteroscedasticity.

00:36:50.170 --> 00:36:53.032
We can apply generalized
least squares.

00:36:53.032 --> 00:36:56.470
A second bullet here is
that factor realizations

00:36:56.470 --> 00:37:00.063
can be scaled to represent
factor mimicking portfolios.

00:37:02.690 --> 00:37:06.360
Now with the
Fama-French factors,

00:37:06.360 --> 00:37:09.360
where we have, say, big
versus small stocks or value

00:37:09.360 --> 00:37:11.410
versus growth
stocks, it would be

00:37:11.410 --> 00:37:16.060
nice to know, well, what's
the real value of trading

00:37:16.060 --> 00:37:17.390
that factor?

00:37:17.390 --> 00:37:21.550
If you were to put unit
weight on trading that factor,

00:37:21.550 --> 00:37:22.740
would you make money or not?

00:37:22.740 --> 00:37:26.040
Or under what periods
would you make money?

00:37:26.040 --> 00:37:31.010
And the notion of factor
mimicking portfolios

00:37:31.010 --> 00:37:31.590
is important.

00:37:31.590 --> 00:37:38.060
Let's go back to the
specification of the factor

00:37:38.060 --> 00:37:41.440
realizations here.

00:37:41.440 --> 00:37:48.210
f hat t, the t-th realization
of the factors (there are k factors),

00:37:48.210 --> 00:37:50.680
is given by essentially
the regression estimate

00:37:50.680 --> 00:37:54.400
of those factors from the
realizations of the asset

00:37:54.400 --> 00:37:55.150
returns.

00:37:55.150 --> 00:37:57.230
And if we're doing
this in the proper way,

00:37:57.230 --> 00:38:01.370
we'd be correcting for
the heteroscedasticity.

00:38:01.370 --> 00:38:06.810
Well this realization
of the factor returns

00:38:06.810 --> 00:38:17.430
is a weighted average or
a weighted sum of the x_t.

00:38:17.430 --> 00:38:29.100
So basically f hat t
is equal to a matrix times

00:38:29.100 --> 00:38:39.409
x_t, where this matrix is B prime B
to the minus one, times B prime.

00:38:42.250 --> 00:38:49.216
So our k-dimensional
realizations-- let's see,

00:38:49.216 --> 00:38:54.000
this is basically k by 1.

00:38:57.470 --> 00:39:04.100
Each of these k factors is
a weighted sum of these x's.

00:39:04.100 --> 00:39:06.570
Now the x's, if these are
returns on the underlying

00:39:06.570 --> 00:39:13.770
assets, then we can consider
normalizing these factors.

00:39:13.770 --> 00:39:16.740
Or basically normalizing
this matrix here

00:39:16.740 --> 00:39:23.880
so that the row weights sum to
1, say, for a unit of capital.

00:39:23.880 --> 00:39:28.380
So if we were to invest
a net one unit of capital

00:39:28.380 --> 00:39:32.550
in these assets, then
this factor realization

00:39:32.550 --> 00:39:38.510
would give us the return on
a portfolio of the assets

00:39:38.510 --> 00:39:43.290
that is perfectly correlated
with the factor realization.

00:39:43.290 --> 00:39:49.820
So factor mimicking portfolios
can be defined that way.
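
NOTE
A sketch of that normalization, assuming OLS weights (B'B)^(-1) B' and that every row has a positive sum; a row summing to zero (a pure long-short combination) would need a different scaling, and a negative sum would flip the sign. The loadings and returns are hypothetical.
```python
import numpy as np
def mimicking_weights(B):
    """Rows of (B'B)^{-1} B', each rescaled to sum to one unit of capital.
    Assumes every row has a positive sum."""
    W = np.linalg.solve(B.T @ B, B.T)          # (k, m) regression weights
    return W / W.sum(axis=1, keepdims=True)
# Hypothetical loadings for m = 8 assets on k = 2 factors, with some overlap.
B = np.array([[1, 0], [1, 0], [1, 0], [1, 0.5],
              [0, 1], [0, 1], [0, 1], [0.5, 1]], dtype=float)
rng = np.random.default_rng(1)
X = rng.normal(scale=0.02, size=(250, 8))      # 250 dates of hypothetical returns
F_hat = X @ np.linalg.solve(B.T @ B, B.T).T    # factor estimates, (250, 2)
P = X @ mimicking_weights(B).T                 # mimicking-portfolio returns, (250, 2)
corr = [np.corrcoef(F_hat[:, j], P[:, j])[0, 1] for j in range(2)]
```
Each mimicking return is the factor estimate divided by a positive constant, so the correlations come out as 1 up to rounding.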

00:39:49.820 --> 00:39:54.220
And they have a
good interpretation

00:39:54.220 --> 00:39:57.080
in terms of the realization
of potential investments.

00:40:02.980 --> 00:40:03.990
So let's go back.

00:40:18.000 --> 00:40:21.870
The next subject is
statistical factor models.

00:40:21.870 --> 00:40:26.540
This is the case where
we begin the analysis

00:40:26.540 --> 00:40:31.740
with just our collection of
outcomes for the process x_t.

00:40:31.740 --> 00:40:34.150
So basically our
time series of asset

00:40:34.150 --> 00:40:40.100
returns for m assets
over T time units.

00:40:40.100 --> 00:40:44.300
And we have no clue initially
what the underlying factors

00:40:44.300 --> 00:40:47.360
are, but we hypothesize
that there are factors that

00:40:47.360 --> 00:40:49.570
do characterize the returns.

00:40:49.570 --> 00:40:52.090
And factor analysis and
principal components analysis

00:40:52.090 --> 00:40:57.890
provide ways of uncovering
those underlying factors

00:40:57.890 --> 00:41:01.335
and defining them in terms
of the data themselves.

00:41:15.540 --> 00:41:18.020
So we'll first talk
about factor analysis.

00:41:18.020 --> 00:41:21.290
Then we'll turn to principal
components analysis.

00:41:21.290 --> 00:41:26.290
Both of these
methods are efforts

00:41:26.290 --> 00:41:29.360
to model the covariance matrix.

00:41:29.360 --> 00:41:37.710
And the underlying covariance
matrix for the assets x

00:41:37.710 --> 00:41:40.810
can be estimated with sample
data in terms of the sample

00:41:40.810 --> 00:41:42.100
covariance matrix.

00:41:42.100 --> 00:41:45.090
So here I've just written
out in matrix form

00:41:45.090 --> 00:41:47.700
how that would be computed.

00:41:47.700 --> 00:41:57.260
And so with this m by T
matrix x, we basically

00:41:57.260 --> 00:42:03.280
take that matrix, take
out the means by computing

00:42:03.280 --> 00:42:06.380
the means with multiplying
by this matrix,

00:42:06.380 --> 00:42:10.480
and then take the sum
of products of deviations

00:42:10.480 --> 00:42:14.280
about the means for
all the m assets

00:42:14.280 --> 00:42:17.470
individually and
across each other,

00:42:17.470 --> 00:42:40.170
and divide that by capital T.
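
NOTE
The matrix form in numpy, assuming X is m by T (assets in rows, dates in columns) and the divisor is T as stated rather than T - 1. The centering matrix I - (1/T) 1 1' removes each asset's mean. The function name is hypothetical.
```python
import numpy as np
def sample_cov(X):
    """X: (m, T) with assets in rows. Returns S = (1/T) X (I - (1/T) 1 1') X'."""
    _, T = X.shape
    centering = np.eye(T) - np.ones((T, T)) / T   # removes each row's mean
    return X @ centering @ X.T / T
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 50))                       # hypothetical: 4 assets, 50 dates
S = sample_cov(X)
```
Building the T by T centering matrix mirrors the formula; for large T, subtracting X.mean(axis=1, keepdims=True) is far cheaper and gives the same result.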

00:42:40.170 --> 00:42:42.750
Now, the setup for
statistical factor models

00:42:42.750 --> 00:42:48.810
is exactly the same as
before, except the only thing

00:42:48.810 --> 00:42:51.620
that we observe is x_t.

00:42:51.620 --> 00:42:59.490
So we're hypothesizing a
model where alpha is basically

00:42:59.490 --> 00:43:06.580
the vector of mean returns

00:43:06.580 --> 00:43:08.420
of the individual assets.

00:43:08.420 --> 00:43:14.500
B is a matrix of factor
loadings on k factors f_t.

00:43:14.500 --> 00:43:18.240
And epsilon_t is white
noise with mean 0,

00:43:18.240 --> 00:43:20.910
covariance matrix given
by the diagonal matrix psi.

00:43:20.910 --> 00:43:25.410
So the setup here is the
same basic setup as before,

00:43:25.410 --> 00:43:33.100
but we don't have the
matrix B or the vector f_t.

00:43:35.790 --> 00:43:37.030
Or, of course, alpha.

00:43:39.920 --> 00:43:43.240
Now in this setup,
it's important to note

00:43:43.240 --> 00:43:49.920
that there is an
indeterminacy of this model,

00:43:49.920 --> 00:43:57.450
because for any given
specification of the matrix B

00:43:57.450 --> 00:44:04.540
or the factors f, we can
actually transform those

00:44:04.540 --> 00:44:09.840
by a k by k invertible
matrix H. So for a given

00:44:09.840 --> 00:44:13.900
specification of this model,
if we transform the underlying

00:44:13.900 --> 00:44:19.230
factor realizations f by the
matrix H, which is k by k,

00:44:19.230 --> 00:44:25.890
then if we transform the
factor loadings B by H inverse,

00:44:25.890 --> 00:44:28.290
we get the same model.

00:44:28.290 --> 00:44:31.800
So there is an indeterminacy
here, or a-- OK,

00:44:31.800 --> 00:44:37.940
there's an indeterminacy of
these particular variables,

00:44:37.940 --> 00:44:41.970
but there's basically
flexibility in how

00:44:41.970 --> 00:44:44.630
we define the factor model.

00:44:44.630 --> 00:44:48.050
So in trying to uncover a factor
model with statistical factor

00:44:48.050 --> 00:44:50.990
analysis, there is
some flexibility

00:44:50.990 --> 00:44:53.130
in defining our factors.

00:44:53.130 --> 00:44:57.030
We can arbitrarily
transform the factors

00:44:57.030 --> 00:45:00.500
by an invertible
transformation in the k space.
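
NOTE
A quick numerical check of this indeterminacy with hypothetical random B and f and an arbitrary invertible H: the pair (B H^(-1), H f) reproduces B f exactly, and if f has covariance Omega, the transformed factors have covariance H Omega H', which leaves the implied covariance of x unchanged.
```python
import numpy as np
rng = np.random.default_rng(3)
m, k = 5, 2
B = rng.normal(size=(m, k))                   # hypothetical loadings
f = rng.normal(size=k)                        # hypothetical factor realization
H = np.array([[2.0, 1.0], [0.5, 3.0]])        # any invertible k x k matrix
B_star = B @ np.linalg.inv(H)                 # transformed loadings B H^{-1}
f_star = H @ f                                # transformed factors H f
Omega = np.array([[1.0, 0.3], [0.3, 2.0]])    # hypothetical covariance of f
Omega_star = H @ Omega @ H.T                  # covariance of H f
```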

00:45:15.050 --> 00:45:19.560
And I guess it's important
to note what changes

00:45:19.560 --> 00:45:22.550
when we do that transformation.

00:45:22.550 --> 00:45:24.570
Well, the linear
function B f stays the same,

00:45:24.570 --> 00:45:28.040
but that's not so in terms of the covariance
matrix of the underlying

00:45:28.040 --> 00:45:29.180
factors.

00:45:29.180 --> 00:45:31.880
If we have a covariance
matrix for those underlying

00:45:31.880 --> 00:45:36.350
factors, we need to accommodate
the matrix transformation

00:45:36.350 --> 00:45:37.690
H in that.

00:45:37.690 --> 00:45:39.640
So that has an impact there.

00:45:39.640 --> 00:45:44.030
But one of the
things we can do is

00:45:44.030 --> 00:45:49.040
consider trying to
define a matrix H that

00:45:49.040 --> 00:45:50.780
diagonalizes the factors.

00:45:50.780 --> 00:45:53.790
So in some settings, it's useful
to consider factor models where

00:45:53.790 --> 00:46:00.260
you have uncorrelated
factor components.

00:46:00.260 --> 00:46:04.140
And it's possible to
define linear factor

00:46:04.140 --> 00:46:09.290
models with uncorrelated factor
components by a choice of H.

00:46:09.290 --> 00:46:12.440
So with any linear
factor model, in fact,

00:46:12.440 --> 00:46:17.571
we can have uncorrelated factor
components if that's useful.

00:46:21.720 --> 00:46:26.300
So this first bullet
highlights that point

00:46:26.300 --> 00:46:30.200
that we can get
orthonormal factors.

00:46:32.930 --> 00:46:37.490
And we can also have
zero-mean factors

00:46:37.490 --> 00:46:41.530
by adjusting the
data to incorporate

00:46:41.530 --> 00:46:43.340
the mean of these factors.

00:46:45.930 --> 00:46:53.290
And if we make these
particular assumptions,

00:46:53.290 --> 00:46:55.840
then the model
simplifies to just

00:46:55.840 --> 00:47:02.400
the covariance matrix
sigma_x being the factor

00:47:02.400 --> 00:47:08.020
loadings B times its transpose
plus a diagonal matrix.

00:47:08.020 --> 00:47:11.060
And just to reiterate,
the power of this

00:47:11.060 --> 00:47:19.130
is that no matter how
large m is, as m increases

00:47:19.130 --> 00:47:28.004
the B matrix just grows
by k entries for every increment in m.

00:47:28.004 --> 00:47:32.490
And we also have an additional
diagonal entry in psi.

00:47:32.490 --> 00:47:39.320
So as we add more and more
assets to our modeling,

00:47:39.320 --> 00:47:42.660
the complexity basically
doesn't increase very much.
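
NOTE
To make the scaling concrete: an unrestricted m by m covariance matrix has m(m+1)/2 free entries, while B B' + psi has m k loadings plus m specific variances (k(k-1)/2 fewer once the rotational freedom is removed). A sketch with hypothetical values of m and k = 3:
```python
def n_params_unrestricted(m):
    # distinct entries of a symmetric m x m matrix
    return m * (m + 1) // 2
def n_params_factor(m, k):
    # m*k loadings in B plus m diagonal entries of psi
    return m * k + m
for m in (10, 100, 1000):
    print(m, n_params_unrestricted(m), n_params_factor(m, k=3))
# 10 55 40
# 100 5050 400
# 1000 500500 4000
```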

00:47:51.720 --> 00:47:55.520
With all of our statistical
models, one of the questions

00:47:55.520 --> 00:47:59.850
is how do we specify the
particular parameters?

00:47:59.850 --> 00:48:05.930
Maximum likelihood estimation is
the first thing to go through,

00:48:05.930 --> 00:48:12.050
and with normal
linear factor models

00:48:12.050 --> 00:48:13.580
we have normal
distributions for all

00:48:13.580 --> 00:48:16.890
the underlying random variables.

00:48:16.890 --> 00:48:19.755
So the residuals
epsilon_t are independent

00:48:19.755 --> 00:48:23.970
and identically distributed,
multivariate normal of dimension m

00:48:23.970 --> 00:48:30.760
with diagonal matrix psi given
by the individual elements'

00:48:30.760 --> 00:48:31.740
variances.

00:48:31.740 --> 00:48:36.950
f_t, the realization
of the factors,

00:48:36.950 --> 00:48:40.490
the k-dimensional
factors can have mean 0,

00:48:40.490 --> 00:48:43.970
and just to have the
identity covariance

00:48:43.970 --> 00:48:48.550
we can scale them and
make them uncorrelated.

00:48:48.550 --> 00:48:53.050
And then x_t will be
normally distributed

00:48:53.050 --> 00:48:55.360
with mean alpha and
covariance matrix

00:48:55.360 --> 00:48:59.210
sigma_x given by the formulas
in the previous slide.

00:49:03.020 --> 00:49:05.370
With these assumptions,
we can write down

00:49:05.370 --> 00:49:08.130
the model likelihood.

00:49:08.130 --> 00:49:10.440
The model likelihood
is the joint density

00:49:10.440 --> 00:49:12.195
of our data given the
unknown parameters.

00:49:22.670 --> 00:49:28.720
And the standard setup actually
for statistical factor modeling

00:49:28.720 --> 00:49:31.290
is to assume
independence over time.

00:49:31.290 --> 00:49:34.900
Now we know that there can
be time series dependence.

00:49:34.900 --> 00:49:37.240
We won't deal with
that at this point.

00:49:37.240 --> 00:49:41.340
Let's just assume that they
are independent across time.

00:49:41.340 --> 00:49:46.280
Then we can consider this
as simply the product

00:49:46.280 --> 00:49:51.570
of the conditional density
of x_t given alpha and sigma,

00:49:51.570 --> 00:49:54.020
which has this form.

00:49:54.020 --> 00:50:00.120
This form for the density
function of a multivariate

00:50:00.120 --> 00:50:05.210
normal should be very
familiar to you at this point.

00:50:05.210 --> 00:50:07.950
It's basically the extension
of the univariate normal

00:50:07.950 --> 00:50:10.010
distribution to m-variate.

00:50:10.010 --> 00:50:14.370
So we have 1 over the square
root of 2 pi to the m power.

00:50:14.370 --> 00:50:16.380
There are m components.

00:50:16.380 --> 00:50:23.170
And then we divide by the square
root of, instead of the individual variance,

00:50:23.170 --> 00:50:26.430
the determinant of
the covariance matrix.

00:50:26.430 --> 00:50:31.970
And then the exponential
of this term here,

00:50:31.970 --> 00:50:41.370
which for the t-th case is
a quadratic form in the x's.

00:50:41.370 --> 00:50:46.050
So this multivariate normal
x, we take off its mean

00:50:46.050 --> 00:50:48.759
and look at the quadratic
form of that with the inverse

00:50:48.759 --> 00:50:49.800
of its covariance matrix.

00:50:57.650 --> 00:50:59.660
So there's the
log-likelihood function.

00:50:59.660 --> 00:51:06.400
It reduces to this form here.
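
NOTE
A sketch of that density and the resulting log-likelihood in numpy, assuming independence over time; the function names are hypothetical. slogdet and solve avoid forming the inverse explicitly, and the normalizing constant is (2 pi)^(m/2).
```python
import numpy as np
def mvn_logpdf(x, alpha, Sigma):
    """Log density of N(alpha, Sigma) at x:
    -0.5 * (m log(2 pi) + log|Sigma| + (x - alpha)' Sigma^{-1} (x - alpha))."""
    m = x.shape[0]
    d = x - alpha
    _, logdet = np.linalg.slogdet(Sigma)
    quad = d @ np.linalg.solve(Sigma, d)     # quadratic form without an explicit inverse
    return -0.5 * (m * np.log(2 * np.pi) + logdet + quad)
def log_likelihood(X, alpha, Sigma):
    """Sum of log densities over dates, assuming independence over time. X: (T, m)."""
    return sum(mvn_logpdf(x_t, alpha, Sigma) for x_t in X)
```
For m = 1 this reduces to the familiar univariate normal log density, and for a diagonal Sigma it is the sum of univariate terms.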

00:51:06.400 --> 00:51:09.170
And maximum likelihood
estimation methods

00:51:09.170 --> 00:51:16.550
can be applied to estimate all
the parameters of B and psi.

00:51:16.550 --> 00:51:23.620
And there's an EM algorithm,
which is applied in this case.

00:51:23.620 --> 00:51:26.070
I think I may have
highlighted it before,

00:51:26.070 --> 00:51:30.000
but the EM algorithm is a very
powerful estimation methodology

00:51:30.000 --> 00:51:33.850
for maximum likelihood
in statistics.

00:51:33.850 --> 00:51:40.520
When one has very
complicated models which

00:51:40.520 --> 00:51:44.530
can be simplified-- well,
models that are complicated

00:51:44.530 --> 00:51:47.830
by the fact that we have
hidden variables-- basically

00:51:47.830 --> 00:51:51.760
the hidden variables lead
to very complex likelihood

00:51:51.760 --> 00:51:54.550
functions.

00:51:54.550 --> 00:51:56.330
The simplification
behind the EM algorithm

00:51:56.330 --> 00:52:00.450
is that if we could observe
the hidden variables,

00:52:00.450 --> 00:52:02.325
then our likelihood
functions are very simple

00:52:02.325 --> 00:52:05.070
and can be computed directly.

00:52:05.070 --> 00:52:10.820
And the EM algorithm
alternates between estimating

00:52:10.820 --> 00:52:14.620
the hidden variables and, treating
the hidden variables as known,

00:52:14.620 --> 00:52:18.362
doing the simple estimates with
those estimated hidden variables,

00:52:18.362 --> 00:52:20.320
and then estimating the
hidden variables again,

00:52:20.320 --> 00:52:22.860
and just iterating that
process again and again.

00:52:22.860 --> 00:52:24.100
And it converges, typically
to a local maximum of the likelihood.

00:52:24.100 --> 00:52:26.460
And their paper
demonstrates that this

00:52:26.460 --> 00:52:29.980
applies in many, many
different application settings.

00:52:29.980 --> 00:52:33.610
And it's just a very, very
powerful estimation methodology

00:52:33.610 --> 00:52:39.900
that is applied here with
statistical factor analysis.
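
NOTE
A compact sketch of EM for this model, assuming independence over time, factors with mean 0 and identity covariance, and the standard E- and M-step updates for normal factor analysis. The function name, starting values, fixed iteration count, and simulated data are hypothetical choices, not the lecture's code. Because of the rotational indeterminacy, the check compares the fitted covariance B B' + psi rather than B itself.
```python
import numpy as np
def factor_analysis_em(X, k, n_iter=1000, seed=0):
    """EM for x_t = alpha + B f_t + eps_t with f_t ~ N(0, I_k), eps_t ~ N(0, diag(psi)),
    independent over t. X: (T, m) returns. Returns alpha_hat, B_hat (m, k), psi_hat (m,)."""
    T, m = X.shape
    alpha = X.mean(axis=0)                    # MLE of alpha: the sample mean
    Xc = X - alpha
    S = Xc.T @ Xc / T                         # sample covariance, divisor T
    rng = np.random.default_rng(seed)
    B = rng.normal(scale=0.1, size=(m, k))    # hypothetical starting values
    psi = np.diag(S).copy()
    for _ in range(n_iter):
        # E-step: f_t given x_t is normal with mean beta (x_t - alpha).
        Sigma = B @ B.T + np.diag(psi)
        beta = np.linalg.solve(Sigma, B).T    # (k, m), equals B' Sigma^{-1}
        Eff = np.eye(k) - beta @ B + beta @ S @ beta.T   # average of E[f_t f_t' | x_t]
        # M-step: regress x on the filled-in factors as if they were observed.
        B = S @ beta.T @ np.linalg.inv(Eff)
        psi = np.maximum(np.diag(S - B @ beta @ S), 1e-8)
    return alpha, B, psi
# Hypothetical simulated data: m = 6 assets, k = 2 factors, T = 20000 dates.
rng = np.random.default_rng(7)
m, k, T = 6, 2, 20000
B_true = rng.normal(size=(m, k))
psi_true = rng.uniform(0.2, 0.5, size=m)
X = 0.01 + rng.normal(size=(T, k)) @ B_true.T + rng.normal(size=(T, m)) * np.sqrt(psi_true)
alpha_hat, B_hat, psi_hat = factor_analysis_em(X, k)
Sigma_fit = B_hat @ B_hat.T + np.diag(psi_hat)
Sigma_true = B_true @ B_true.T + np.diag(psi_true)
```
In practice one would stop when the log-likelihood stops improving rather than after a fixed number of iterations.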

00:52:39.900 --> 00:52:45.460
I indicated that for now we
could just assume independence

00:52:45.460 --> 00:52:49.970
over time of the data points
in computing its likelihood.

00:52:49.970 --> 00:52:53.060
You recall our discussion
a couple of lectures back

00:52:53.060 --> 00:52:57.260
about the state-space models,
linear state-space models.

00:52:57.260 --> 00:53:00.710
Essentially, that linear
state-space model framework

00:53:00.710 --> 00:53:03.830
can be applied here as
well to incorporate time

00:53:03.830 --> 00:53:06.840
dependence in the data as well.

00:53:10.220 --> 00:53:16.190
So that simplification is not
binding in terms of holding us

00:53:16.190 --> 00:53:17.970
up in estimating these models.

00:53:25.555 --> 00:53:28.320
Let me go back here, OK.

00:53:28.320 --> 00:53:32.160
So the maximum likelihood
estimation process

00:53:32.160 --> 00:53:37.920
will give us estimates of the
B matrix and the psi matrix.

00:53:37.920 --> 00:53:43.630
So applying this EM
algorithm, a good computer

00:53:43.630 --> 00:53:51.880
can actually get estimates of
B and psi and the underlying

00:53:51.880 --> 00:53:53.880
alpha, of course.

00:53:53.880 --> 00:54:03.660
Now from these we can estimate
the factor realizations f_t.

00:54:03.660 --> 00:54:11.560
And these can be estimated by
simply this regression formula,

00:54:11.560 --> 00:54:13.640
using our estimates for
the factor loadings B

00:54:13.640 --> 00:54:17.720
hat, our estimates of
alpha, we can actually

00:54:17.720 --> 00:54:20.540
estimate the factor
realizations.
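
NOTE
A sketch of that second step, assuming the plain regression formula f_hat_t = (B'B)^(-1) B' (x_t - alpha_hat); a psi-weighted version, as in the earlier heteroscedasticity discussion, is a common alternative. The names and the noise-free check data are hypothetical.
```python
import numpy as np
def factor_scores(X, alpha_hat, B_hat):
    """Regression estimates of f_t for every date: (B'B)^{-1} B' (x_t - alpha_hat).
    X: (T, m) returns. Returns (T, k)."""
    Xc = X - alpha_hat                              # remove the estimated means
    return np.linalg.solve(B_hat.T @ B_hat, B_hat.T @ Xc.T).T
rng = np.random.default_rng(4)
B_hat = rng.normal(size=(6, 2))                     # hypothetical fitted loadings
alpha_hat = rng.normal(size=6)                      # hypothetical fitted means
F_true = rng.normal(size=(100, 2))
X = alpha_hat + F_true @ B_hat.T                    # noise-free, so f_t is recovered exactly
F_est = factor_scores(X, alpha_hat, B_hat)
```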

00:54:20.540 --> 00:54:24.390
So with statistical
factor analysis,

00:54:24.390 --> 00:54:27.360
we use the EM algorithm to
estimate the covariance matrix

00:54:27.360 --> 00:54:28.510
parameters.

00:54:28.510 --> 00:54:32.455
Then the next step, we
can estimate the factor

00:54:32.455 --> 00:54:32.996
realizations.

00:54:37.240 --> 00:54:41.310
So as the output
from factor analysis,

00:54:41.310 --> 00:54:45.830
we can work with these
factor realizations.

00:54:45.830 --> 00:54:50.610
And those realizations
or those estimates

00:54:50.610 --> 00:54:52.590
of the realizations
of the factors

00:54:52.590 --> 00:55:00.570
can then be used basically
for risk modeling as well.

00:55:00.570 --> 00:55:10.150
So we could do a statistical
factor analysis of returns

00:55:10.150 --> 00:55:15.980
in, say, the
commodities markets.

00:55:15.980 --> 00:55:21.610
And identify what factors are
driving returns and covariances

00:55:21.610 --> 00:55:23.910
in commodity markets.

00:55:23.910 --> 00:55:26.120
We can then get estimates
of those underlying

00:55:26.120 --> 00:55:29.570
factors from the methodology.

00:55:29.570 --> 00:55:32.610
We could then use those
as inputs to other models.

00:55:32.610 --> 00:55:35.210
Certain stocks may depend
on significant factors

00:55:35.210 --> 00:55:36.900
in the commodity markets.

00:55:36.900 --> 00:55:41.310
And what they depend on, well
we can use statistical modeling

00:55:41.310 --> 00:55:44.880
to identify where
the dependencies are.

00:55:44.880 --> 00:55:49.530
So getting these realizations
of the statistical factors

00:55:49.530 --> 00:55:52.610
is very useful, not
only to understand

00:55:52.610 --> 00:55:55.330
what happened in the
past with the process

00:55:55.330 --> 00:55:57.030
and how these
underlying factors vary,

00:55:57.030 --> 00:56:00.080
but you can also use those
as inputs to other models.

00:56:11.770 --> 00:56:16.950
Finally, let's see,
there was a lot

00:56:16.950 --> 00:56:19.050
of interest with
statistical factor

00:56:19.050 --> 00:56:23.030
analysis on the interpretation
of the underlying factors.

00:56:23.030 --> 00:56:28.320
Of course, in terms
of using any model,

00:56:28.320 --> 00:56:32.460
one's confidence
rises when you have

00:56:32.460 --> 00:56:34.580
highly interpretable results.

00:56:34.580 --> 00:56:37.630
One of the initial applications
of statistical factor analysis

00:56:37.630 --> 00:56:40.310
was in measuring IQ.

00:56:40.310 --> 00:56:45.070
And how many people here
have taken an IQ test?

00:56:45.070 --> 00:56:47.580
Probably everybody
or almost everybody?

00:56:47.580 --> 00:56:51.240
Well actually if you want to
work for some hedge funds,

00:56:51.240 --> 00:56:54.690
you'll have to
take some IQ tests.

00:56:54.690 --> 00:57:00.402
But basically in an IQ test
there are 20, 30, 40 questions.

00:57:00.402 --> 00:57:02.360
And they're trying to
measure different aspects

00:57:02.360 --> 00:57:04.630
of your ability.

00:57:04.630 --> 00:57:09.820
And statistical
factor analysis has

00:57:09.820 --> 00:57:12.990
been used to try and understand
what are the underlying

00:57:12.990 --> 00:57:14.930
dimensions of intelligence.

00:57:14.930 --> 00:57:21.930
And one has the
flexibility of considering

00:57:21.930 --> 00:57:25.350
different transformations
of any given

00:57:25.350 --> 00:57:30.290
set of estimated factors by this
H matrix for transformation.

00:57:30.290 --> 00:57:35.230
And so there has been work in
statistical factor analysis

00:57:35.230 --> 00:57:38.520
to find rotations of
the factor loadings

00:57:38.520 --> 00:57:42.220
that make the factors
more interpretable.

00:57:42.220 --> 00:57:48.650
So I just raise that because there's
potential to get insight

00:57:48.650 --> 00:57:51.390
into these underlying factors
if that's appropriate.

00:57:51.390 --> 00:57:54.100
In the IQ setting, the
effort was actually

00:57:54.100 --> 00:57:57.979
to try and find
interpretations

00:57:57.979 --> 00:57:59.645
of different dimensions
of intelligence.

00:58:07.400 --> 00:58:10.940
We previously talked about
factor mimicking portfolios.

00:58:10.940 --> 00:58:13.280
The same thing applies.

00:58:13.280 --> 00:58:18.460
One final thing is with
likelihood ratio tests,

00:58:18.460 --> 00:58:23.700
one can test for whether
the linear factor model is

00:58:23.700 --> 00:58:25.870
a good description of the data.

00:58:25.870 --> 00:58:29.950
And so with likelihood
ratio tests,

00:58:29.950 --> 00:58:32.950
we compare the
likelihood of the data

00:58:32.950 --> 00:58:36.650
where we fit our unknown
parameters, the mean vector

00:58:36.650 --> 00:58:41.190
alpha and covariance matrix
sigma, without any constraints.

00:58:41.190 --> 00:58:45.850
And then we compare that
to the likelihood function

00:58:45.850 --> 00:58:50.070
under the factor model
with, say, k factors.

00:58:50.070 --> 00:58:56.930
And the likelihood
ratio tests are

00:58:56.930 --> 00:59:00.510
computed by looking at twice the
difference in log likelihoods.

00:59:00.510 --> 00:59:04.710
If you take an advanced
course in statistics,

00:59:04.710 --> 00:59:08.790
you'll see that basically twice this
difference in the log-likelihood

00:59:08.790 --> 00:59:13.280
functions under many conditions
is approximately a chi

00:59:13.280 --> 00:59:16.030
squared random
variable with degrees

00:59:16.030 --> 00:59:18.030
of freedom equal to the
difference in parameters

00:59:18.030 --> 00:59:20.070
under the two models.

00:59:20.070 --> 00:59:25.230
So that's why it's
specified this way.

00:59:25.230 --> 00:59:29.035
But anyway, one can test for
the dimensionality of the factor

00:59:29.035 --> 00:59:29.535
model.
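
NOTE
A sketch of the test statistic, assuming Gaussian maximum likelihood with alpha at the sample mean, so both log-likelihoods have closed forms in the sample covariance S (divisor T) and the fitted Sigma_hat = B B' + psi; the constants cancel in the difference. The degrees of freedom compare m(m+1)/2 free covariance entries with m k + m - k(k-1)/2 factor-model parameters. The statistic would then be compared with a chi-squared quantile, for example via scipy.stats.chi2.sf. Function names and the matrix S are hypothetical.
```python
import numpy as np
def lr_statistic(S, Sigma_hat, T):
    """2 * (loglik_unrestricted - loglik_factor) for Gaussian data.
    S: sample covariance (divisor T), the unrestricted MLE; Sigma_hat: fitted B B' + psi."""
    m = S.shape[0]
    _, logdet_fit = np.linalg.slogdet(Sigma_hat)
    _, logdet_S = np.linalg.slogdet(S)
    trace_term = np.trace(np.linalg.solve(Sigma_hat, S))
    return T * (logdet_fit + trace_term - logdet_S - m)
def lr_degrees_of_freedom(m, k):
    # free covariance entries minus factor-model parameters (loadings + psi - rotations)
    return m * (m + 1) // 2 - (m * k + m - k * (k - 1) // 2)
S = np.array([[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]])   # hypothetical
lr_perfect = lr_statistic(S, S, T=250)                       # 0 up to rounding
lr_no_factors = lr_statistic(S, np.diag(np.diag(S)), T=250)  # diagonal fit, k = 0
```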

00:59:33.940 --> 00:59:36.280
Before going into an
example of factor modeling,

00:59:36.280 --> 00:59:39.890
I want to cover principal
components analysis.

00:59:42.606 --> 00:59:44.230
Actually, principal
components analysis

00:59:44.230 --> 00:59:46.990
comes up in factor
modeling, but it's also

00:59:46.990 --> 00:59:52.700
a methodology that's appropriate
for modeling multivariate data

00:59:52.700 --> 00:59:56.410
and considering
dimensionality reduction.

00:59:56.410 --> 00:59:59.750
You're dealing with data
in very many dimensions.

00:59:59.750 --> 01:00:05.680
You're wondering is there
a simple characterization

01:00:05.680 --> 01:00:08.720
of the multivariate
structure that lies

01:00:08.720 --> 01:00:10.770
in a smaller dimensional space?

01:00:10.770 --> 01:00:14.130
And principal components
analysis gives us that.

01:00:14.130 --> 01:00:18.320
The theoretical framework for
principal components analysis

01:00:18.320 --> 01:00:23.300
is to consider an
m-variate random variable.

01:00:23.300 --> 01:00:27.650
So this is like a single
realization of asset returns

01:00:27.650 --> 01:00:31.620
at a given time, which has
some mean and covariance matrix

01:00:31.620 --> 01:00:32.120
sigma.

01:00:34.876 --> 01:00:36.250
The principal
components analysis

01:00:36.250 --> 01:00:41.190
is going to exploit the
eigenvalues and eigenvectors

01:00:41.190 --> 01:00:42.490
of the covariance matrix.

01:00:45.530 --> 01:00:50.320
Choongbum went through
eigenvalues and singular value

01:00:50.320 --> 01:00:51.370
decompositions.

01:00:51.370 --> 01:00:55.640
So here we basically have
the eigenvalue/eigenvector

01:00:55.640 --> 01:00:58.930
decomposition of our
covariance matrix sigma, which

01:00:58.930 --> 01:01:04.700
is the sum of the scalar eigenvalues
times the eigenvector

01:01:04.700 --> 01:01:08.270
gamma_i times its transpose.

01:01:08.270 --> 01:01:12.900
So we actually are able to
decompose our covariance matrix

01:01:12.900 --> 01:01:15.450
with eigenvalues, eigenvectors.

01:01:15.450 --> 01:01:20.980
The principal
component variables

01:01:20.980 --> 01:01:28.190
are obtained by taking away the
mean from the random vector x,

01:01:28.190 --> 01:01:29.390
namely alpha.

01:01:29.390 --> 01:01:35.800
And then taking the weighted
sum of those de-meaned x's

01:01:35.800 --> 01:01:39.630
given by the i-th eigenvector.

01:01:39.630 --> 01:01:42.210
So these are going to be
called the principal component

01:01:42.210 --> 01:01:46.710
variables, where gamma_1 is
the first one corresponding

01:01:46.710 --> 01:01:48.450
to the largest eigenvalue.

01:01:48.450 --> 01:01:51.609
Gamma m is going to be the
m-th, or last, corresponding

01:01:51.609 --> 01:01:52.275
to the smallest.

01:01:59.690 --> 01:02:07.650
The properties of these
principal component variables

01:02:07.650 --> 01:02:14.030
are that they have mean 0,
and their covariance matrix

01:02:14.030 --> 01:02:17.610
is given by the diagonal
matrix of eigenvalues.

01:02:17.610 --> 01:02:21.670
So the principal
component variables

01:02:21.670 --> 01:02:25.210
are a very simple sort
of affine transformation

01:02:25.210 --> 01:02:29.760
of the original variable x.

01:02:29.760 --> 01:02:38.450
We translate x to a new origin,
basically to the 0 origin,

01:02:38.450 --> 01:02:41.260
by subtracting the means from it.

01:02:41.260 --> 01:02:46.710
And then we multiply
that de-meaned x value

01:02:46.710 --> 01:02:51.335
by an orthogonal
matrix gamma prime.

01:02:54.004 --> 01:02:54.920
And what does that do?

01:02:54.920 --> 01:02:59.450
That simply rotates
the coordinate axes.

01:02:59.450 --> 01:03:04.330
So what we're doing is creating
a new coordinate system

01:03:04.330 --> 01:03:07.860
for our data, which
hasn't changed

01:03:07.860 --> 01:03:11.380
the relative position of the
data or the random variable

01:03:11.380 --> 01:03:14.600
at all in the space.

01:03:14.600 --> 01:03:18.280
Basically, it's just using
a new coordinate system

01:03:18.280 --> 01:03:22.389
with no change in the
overall variability of what

01:03:22.389 --> 01:03:23.886
we're working with.

01:03:38.170 --> 01:03:46.350
In matrix form, we can express
these principal component

01:03:46.350 --> 01:03:48.660
variables as a vector p.

01:03:51.540 --> 01:03:54.830
Let's consider partitioning
p into the first k

01:03:54.830 --> 01:03:59.750
elements, p_1, and the last
m minus k elements, p_2.

01:03:59.750 --> 01:04:05.463
Then our original random vector
x has this decomposition.

01:04:09.320 --> 01:04:13.530
And we can think of this
as being approximately

01:04:13.530 --> 01:04:14.850
a linear factor model.

01:04:19.790 --> 01:04:24.260
We can consider from
principal components analysis

01:04:24.260 --> 01:04:26.720
essentially that if p_1, the first k
principal component variables,

01:04:26.720 --> 01:04:32.900
correspond to our factors,
then our linear factor model

01:04:32.900 --> 01:04:37.940
would have B as given by
gamma_1, F as given by p_1.

01:04:37.940 --> 01:04:42.400
And our epsilon vector would
be given by gamma_2 p_2.

01:04:42.400 --> 01:04:45.110
So the principal
components decomposition

01:04:45.110 --> 01:04:48.830
is almost a linear factor model.

01:04:48.830 --> 01:04:59.910
The only issue is this
gamma_2 p_2 is an m-vector,

01:04:59.910 --> 01:05:06.340
but it may not have a
diagonal covariance matrix.

01:05:06.340 --> 01:05:10.360
Under the linear factor model
with a given number of factors,

01:05:10.360 --> 01:05:12.630
k less than m, we
always assume

01:05:12.630 --> 01:05:17.810
that the residual vector
has a covariance matrix

01:05:17.810 --> 01:05:19.140
that is diagonal.

01:05:19.140 --> 01:05:21.530
With a principal
components analysis,

01:05:21.530 --> 01:05:25.810
that may or may not be true.

01:05:25.810 --> 01:05:29.870
So this is like an
approximate factor model,

01:05:29.870 --> 01:05:32.540
and that's why this is called
principal components analysis.

01:05:32.540 --> 01:05:35.792
It's not called principal
factor analysis yet.

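NOTE The decomposition is written out below in LaTeX, as a sketch in the lecture's notation. Gamma_1 (m x k) holds the first k eigenvectors and Gamma_2 holds the remaining m - k. Lambda_2 = diag(lambda_{k+1}, ..., lambda_m) is a label introduced here for the trailing eigenvalues. The first line uses Gamma Gamma' = I. The factors and the residual are uncorrelated, but the residual covariance has rank at most m - k. In general it is therefore not diagonal, which is the gap between principal components and a strict linear factor model.
```latex
\begin{aligned}
x &= \alpha + \Gamma p = \alpha + \Gamma_1 p_1 + \Gamma_2 p_2, \qquad \Gamma = [\,\Gamma_1 \;\; \Gamma_2\,], \quad p = (p_1^{\top}, p_2^{\top})^{\top} \\
B &= \Gamma_1, \qquad F = p_1, \qquad \epsilon = \Gamma_2 p_2 \\
\operatorname{Cov}(F, \epsilon) &= \operatorname{Cov}(p_1, p_2)\,\Gamma_2^{\top} = 0, \qquad
\operatorname{Cov}(\epsilon) = \Gamma_2 \Lambda_2 \Gamma_2^{\top}, \quad \Lambda_2 = \operatorname{diag}(\lambda_{k+1}, \ldots, \lambda_m)
\end{aligned}
```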
01:05:45.130 --> 01:05:49.870
Now let's turn to empirical
principal components analysis.

01:05:49.870 --> 01:05:51.870
We've gone through
just a description

01:05:51.870 --> 01:05:54.670
of theoretical principal
components, where

01:05:54.670 --> 01:05:58.454
given a mean vector alpha and
covariance matrix sigma, we saw how

01:05:58.454 --> 01:06:00.620
we would define these
principal component variables.

01:06:00.620 --> 01:06:05.400
If we just have sample
data, then this slide

01:06:05.400 --> 01:06:08.782
goes through the computations
of the empirical principal

01:06:08.782 --> 01:06:09.970
components results.

01:06:09.970 --> 01:06:14.120
So all we're doing is
substituting in estimates

01:06:14.120 --> 01:06:17.220
of the means and
covariance matrix,

01:06:17.220 --> 01:06:19.110
and computing the
eigenvalue/eigenvector

01:06:19.110 --> 01:06:21.060
decomposition of that.

01:06:21.060 --> 01:06:25.180
And we get sample principal
component variables:

01:06:25.180 --> 01:06:31.720
we basically take x, the
de-meaned vector,

01:06:31.720 --> 01:06:38.780
or matrix of realizations, and
pre-multiply that by gamma hat

01:06:38.780 --> 01:06:44.470
prime, which is the
matrix of eigenvectors

01:06:44.470 --> 01:06:46.570
corresponding to the
eigenvalue/eigenvector

01:06:46.570 --> 01:06:48.964
decomposition of the
sample covariance matrix.

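NOTE This is a minimal empirical sketch in two dimensions using only the Python standard library. It substitutes the sample means and sample covariance for alpha and Sigma, gets the eigenpairs in closed form (valid only for a symmetric 2 x 2 matrix), and forms p_t = Gamma_hat' (x_t - alpha_hat). The simulated data and all helper names are hypothetical. The printed checks show mean-zero components whose sample covariance is the diagonal matrix of estimated eigenvalues.
```python
import math
import random
def mean(xs):
    """Arithmetic mean of a list."""
    return sum(xs) / len(xs)
def sample_cov(a, b):
    """Unbiased (n - 1 divisor) sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)
def eig_sym_2x2(s11, s12, s22):
    """Closed-form eigenpairs of [[s11, s12], [s12, s22]], largest eigenvalue first."""
    mid, rad = (s11 + s22) / 2.0, math.hypot((s11 - s22) / 2.0, s12)
    lam1, lam2 = mid + rad, mid - rad
    if s12 == 0.0:  # already diagonal: eigenvectors are the coordinate axes
        g1 = (1.0, 0.0) if s11 >= s22 else (0.0, 1.0)
    else:  # (s12, lam1 - s11) solves (Sigma - lam1 I) g = 0
        n = math.hypot(s12, lam1 - s11)
        g1 = (s12 / n, (lam1 - s11) / n)
    g2 = (-g1[1], g1[0])  # unit vector orthogonal to g1
    return (lam1, lam2), (g1, g2)
# Hypothetical simulated data (not the lecture's): 500 draws of (x1, x2)
rng = random.Random(0)
x1, x2 = [], []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1.append(1.0 + z1)
    x2.append(2.0 + 0.8 * z1 + 0.5 * z2)
# Substitute estimates: alpha_hat (sample means) and Sigma_hat (sample covariance)
m1, m2 = mean(x1), mean(x2)
(l1, l2), (g1, g2) = eig_sym_2x2(sample_cov(x1, x1), sample_cov(x1, x2), sample_cov(x2, x2))
# Sample principal component variables: p_t = Gamma_hat' (x_t - alpha_hat)
p1 = [g1[0] * (a - m1) + g1[1] * (b - m2) for a, b in zip(x1, x2)]
p2 = [g2[0] * (a - m1) + g2[1] * (b - m2) for a, b in zip(x1, x2)]
print("means:", mean(p1), mean(p2))
print("variances vs eigenvalues:", sample_cov(p1, p1), l1, sample_cov(p2, p2), l2)
print("covariance of p1 and p2:", sample_cov(p1, p2))
```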
01:06:54.790 --> 01:06:59.960
This slide goes through the
singular value decomposition.

01:06:59.960 --> 01:07:03.840
You don't have to go through and
compute variances and covariances

01:07:03.840 --> 01:07:08.340
to derive estimates of the
principal component variables.

01:07:08.340 --> 01:07:11.600
You can work simply with the
singular value decomposition.

01:07:11.600 --> 01:07:15.804
I'll let you go through
that on your own.

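NOTE Here is a LaTeX sketch of the singular value route. It assumes the n observations are stacked as rows of the de-meaned n x m data matrix X-tilde, a thin SVD with n >= m, and a sample covariance with the n - 1 divisor. With observations as columns, as in the lecture's formula, one pre-multiplies by Gamma-hat prime instead. Eigenvectors and singular vectors are only determined up to sign.
```latex
\begin{aligned}
\tilde{X} &= U D V^{\top}, \qquad D = \operatorname{diag}(d_1, \ldots, d_m), \quad d_1 \ge \cdots \ge d_m \ge 0 \\
\hat{\Sigma} &= \tfrac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} = V \Bigl(\tfrac{D^{2}}{n-1}\Bigr) V^{\top}
  \quad\Longrightarrow\quad \hat{\Gamma} = V, \qquad \hat{\lambda}_i = \tfrac{d_i^{2}}{n-1} \\
\hat{P} &= \tilde{X}\hat{\Gamma} = U D
\end{aligned}
```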
01:07:15.804 --> 01:07:18.220
There's an alternative definition
of the principal component

01:07:18.220 --> 01:07:19.803
variables, though,
that's very important.

01:07:27.270 --> 01:07:32.470
If we consider a
linear combination

01:07:32.470 --> 01:07:40.850
of the components of x, x_1
through x_m, with weights w,

01:07:40.850 --> 01:07:45.150
and choose the linear
combination which

01:07:45.150 --> 01:07:48.390
maximizes its
variability

01:07:48.390 --> 01:07:56.040
subject to the norm of the
coefficients w being equal to 1,

01:07:56.040 --> 01:08:00.340
then this is the first
principal component variable.

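NOTE The sketch below checks the maximum-variance definition numerically in two dimensions. Since Var(w'x) = w' Sigma w, it brute-forces unit vectors w on a fine grid of angles and compares the best variance with the largest eigenvalue. The 2 x 2 Sigma and the grid size are hypothetical choices, and the brute-force search only illustrates the definition rather than serving as a practical algorithm.
```python
import math
# Hypothetical 2 x 2 covariance matrix Sigma (illustration only)
S11, S12, S22 = 2.0, 1.2, 1.0
def variance_of_combination(w1, w2):
    """Var(w1*x1 + w2*x2) = w' Sigma w."""
    return S11 * w1 * w1 + 2.0 * S12 * w1 * w2 + S22 * w2 * w2
# Unit vectors w = (cos t, sin t); t in [0, pi) covers every direction up to sign
angles = [math.pi * k / 100_000 for k in range(100_000)]
best = max(angles, key=lambda t: variance_of_combination(math.cos(t), math.sin(t)))
best_var = variance_of_combination(math.cos(best), math.sin(best))
# Closed-form largest eigenvalue and its eigenvector (S12, lam1 - S11)
lam1 = (S11 + S22) / 2.0 + math.hypot((S11 - S22) / 2.0, S12)
g = (S12, lam1 - S11)
alignment = abs(math.cos(best) * g[0] + math.sin(best) * g[1]) / math.hypot(*g)
print("max variance on grid:", best_var, "largest eigenvalue:", lam1)
print("|cos(angle to gamma_1)|:", alignment)
```
The maximizing direction lines up with gamma_1, and the maximal variance matches lambda_1 up to the grid resolution.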
01:08:00.340 --> 01:08:08.250
So if we have, in two
dimensions, x_1 and x_2,

01:08:08.250 --> 01:08:21.540
and points that look like
an ellipsoidal distribution,

01:08:21.540 --> 01:08:28.850
this would correspond to having
alpha_1 there, alpha_2 there,

01:08:28.850 --> 01:08:32.410
and some degree of variability.

01:08:32.410 --> 01:08:35.770
The principal
components analysis

01:08:35.770 --> 01:08:42.620
says, let's shift to the origin
being at (alpha_1, alpha_2).

01:08:42.620 --> 01:08:50.370
And then let's rotate the axes
to align with the eigenvectors.

01:08:50.370 --> 01:08:54.170
Well, the first principal
component variable

01:08:54.170 --> 01:09:02.550
finds the direction, the
coordinate axis, along which

01:09:02.550 --> 01:09:04.800
the variability is a maximum.

01:09:04.800 --> 01:09:07.350
And basically along
this dimension

01:09:07.350 --> 01:09:11.790
here, this is where
the variability

01:09:11.790 --> 01:09:12.800
would be the maximum.

01:09:12.800 --> 01:09:15.529
And that's the first
principal component variable.

01:09:15.529 --> 01:09:18.660
So this principal
components analysis

01:09:18.660 --> 01:09:20.930
is identifying
essentially where

01:09:20.930 --> 01:09:23.620
there is the most
variability in the data,

01:09:23.620 --> 01:09:28.390
and it does so without
making any change

01:09:28.390 --> 01:09:30.270
in the scaling of the data.

01:09:30.270 --> 01:09:33.816
All we're doing is
shifting and rotating.

01:09:33.816 --> 01:09:35.649
Then the second principal
component variable

01:09:35.649 --> 01:09:38.420
is basically the
direction,

01:09:38.420 --> 01:09:42.529
orthogonal to the first, that
has the maximum variance.

01:09:42.529 --> 01:09:46.400
And we continue that
process to define all

01:09:46.400 --> 01:09:48.160
m principal component variables.

01:09:56.780 --> 01:09:58.180
In principal
components analysis,

01:09:58.180 --> 01:10:01.600
there are discussions of the
total variability of the data

01:10:01.600 --> 01:10:05.460
and how well that's explained
by principal components.

01:10:05.460 --> 01:10:09.030
If we have a covariance
matrix sigma,

01:10:09.030 --> 01:10:13.390
the total variance
can be defined

01:10:13.390 --> 01:10:17.960
and is defined as the sum
of the diagonal entries.

01:10:17.960 --> 01:10:21.040
So it's the trace of
the covariance matrix.

01:10:21.040 --> 01:10:25.220
We'll call that the total
variance of this multivariate

01:10:25.220 --> 01:10:27.160
x.

01:10:27.160 --> 01:10:31.630
That is equal to the sum
of the eigenvalues as well.

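NOTE A small 2 x 2 check that the total variance, trace(Sigma), is unchanged by any rotation of the coordinate axes. The rotation onto the eigenvectors diagonalizes Sigma, so the trace also equals lambda_1 + lambda_2. The matrix is hypothetical, and the closed-form eigenvector is valid only for a symmetric 2 x 2 matrix.
```python
import math
# Hypothetical 2 x 2 covariance matrix Sigma (illustration only)
S11, S12, S22 = 2.0, 1.2, 1.0
total_variance = S11 + S22  # trace(Sigma)
def rotated_cov(t):
    """Entries (v11, v12, v22) of R' Sigma R, where R rotates the axes by angle t."""
    c, s = math.cos(t), math.sin(t)
    v11 = S11 * c * c + 2.0 * S12 * c * s + S22 * s * s
    v22 = S11 * s * s - 2.0 * S12 * c * s + S22 * c * c
    v12 = (S22 - S11) * c * s + S12 * (c * c - s * s)
    return v11, v12, v22
# Rotating the coordinate axes never changes the total variance
for t in (0.3, 1.0, 2.5):
    v11, _, v22 = rotated_cov(t)
    print(t, v11 + v22, total_variance)
# Rotating onto the eigenvectors diagonalizes Sigma: diagonal = (lambda_1, lambda_2)
lam1 = (S11 + S22) / 2.0 + math.hypot((S11 - S22) / 2.0, S12)
t_star = math.atan2(lam1 - S11, S12)  # direction of the eigenvector (S12, lam1 - S11)
lam1_rot, off_diag, lam2_rot = rotated_cov(t_star)
print("eigenvalues:", lam1_rot, lam2_rot, "off-diagonal:", off_diag)
print("share of total variance, first component:", lam1_rot / total_variance)
```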
01:10:31.630 --> 01:10:36.520
So we have a decomposition
of the total variability

01:10:36.520 --> 01:10:40.050
into the variability of
different principal component

01:10:40.050 --> 01:10:42.070
variables.

01:10:42.070 --> 01:10:44.250
And the principal
component variables

01:10:44.250 --> 01:10:48.459
themselves are uncorrelated.

01:10:48.459 --> 01:10:49.875
You'll remember the
covariance matrix

01:10:49.875 --> 01:10:51.374
of the principal
component variables

01:10:51.374 --> 01:10:56.750
was lambda, the diagonal
matrix of eigenvalues.

01:10:56.750 --> 01:11:00.060
So the off-diagonal
entries are 0.

01:11:00.060 --> 01:11:01.590
So the principal
component variables

01:11:01.590 --> 01:11:05.100
are uncorrelated, and
have variance lambda_k,

01:11:05.100 --> 01:11:07.610
and basically decompose
the variability.

01:11:07.610 --> 01:11:09.760
So principal components
analysis provides

01:11:09.760 --> 01:11:14.140
this very nice
decomposition of the data

01:11:14.140 --> 01:11:18.020
into different
dimensions, with highest

01:11:18.020 --> 01:11:22.450
to lowest information content
as given by the eigenvalues.

01:11:28.690 --> 01:11:34.140
I want to go
through a case study

01:11:34.140 --> 01:11:41.295
here of doing factor modeling
with the U.S. Treasury yields.

01:11:43.922 --> 01:11:49.040
I loaded data into R, which
ranged from the beginning

01:11:49.040 --> 01:11:54.280
of 2000 to the end of May 2013.

01:11:54.280 --> 01:11:58.750
And here are the yields on
constant maturity U.S. Treasury

01:11:58.750 --> 01:12:01.100
securities ranging from
3 months, 6 months,

01:12:01.100 --> 01:12:03.050
up to 20 years.

01:12:03.050 --> 01:12:06.100
So this is essentially
the term structure

01:12:06.100 --> 01:12:12.858
of U.S. Government [INAUDIBLE]
of varying levels of risk.

01:12:18.170 --> 01:12:25.292
Here's a plot of [INAUDIBLE]
over that period.

01:12:33.148 --> 01:12:36.585
So starting in the
[INAUDIBLE], we

01:12:36.585 --> 01:12:41.570
can see the rather
dramatic evolution of the term

01:12:41.570 --> 01:12:44.891
structure over
this entire period.

01:12:44.891 --> 01:12:47.320
If we wanted to have
set any [INAUDIBLE].

01:12:52.732 --> 01:12:55.190
If we wanted to do a principal
components analysis of this,

01:12:55.190 --> 01:12:57.900
well if we did the
entire period we'd

01:12:57.900 --> 01:13:01.040
be measuring variability
of all kinds of things,

01:13:01.040 --> 01:13:03.580
as things go down, up, and down.

01:13:03.580 --> 01:13:07.380
What I've done in this
note is just initially

01:13:07.380 --> 01:13:15.750
to look at the period
from 2001 up through 2005.

01:13:15.750 --> 01:13:20.620
So we have five years of data
on basically the early part

01:13:20.620 --> 01:13:23.380
of this period that I
want to focus on and do

01:13:23.380 --> 01:13:32.340
a principal components analysis
of the yields on this data.

01:13:32.340 --> 01:13:40.845
So here's basically the series
over that five year period.

01:13:44.470 --> 01:13:47.110
To begin with, this
analysis

01:13:47.110 --> 01:13:50.590
is on the actual yield changes.

01:13:50.590 --> 01:13:53.940
So just as we might be modeling
say asset prices over time

01:13:53.940 --> 01:13:58.515
and then doing an analysis
of the changes, the returns,

01:13:58.515 --> 01:14:00.015
here we're looking
at yield changes.

01:14:07.080 --> 01:14:12.000
So first, you can see
that basically

01:14:12.000 --> 01:14:17.170
the average daily change for the
different yield tenors, ranging

01:14:17.170 --> 01:14:20.250
from 3 months up to 20 years,
is negative in every case.

01:14:20.250 --> 01:14:24.360
That corresponds to the time
series over that five year

01:14:24.360 --> 01:14:25.160
period.

01:14:25.160 --> 01:14:29.480
Basically, each series
ended the period lower

01:14:29.480 --> 01:14:31.800
than it started, so the
average change is negative.

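NOTE Here is why a negative average daily change means a series finished the window below its starting level. The daily changes telescope, so their sum is the last yield minus the first. The yields below are made-up numbers, not the lecture's data.
```python
# Hypothetical daily yields in percent (illustration only, not the lecture's data)
yields = [5.10, 5.05, 5.12, 4.98, 4.90, 4.95]
changes = [b - a for a, b in zip(yields, yields[1:])]  # daily yield changes
avg_change = sum(changes) / len(changes)
# The changes telescope: their sum is (last yield - first yield)
print(avg_change, (yields[-1] - yields[0]) / len(changes))
```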
01:14:31.800 --> 01:14:36.400
The daily volatility is the
daily standard deviation.

01:14:36.400 --> 01:14:42.590
Those vary from
0.0384 up to 0.0698

01:14:42.590 --> 01:14:45.650
for-- is that the three year?

01:14:45.650 --> 01:14:49.920
And this is the standard
deviation of daily yield

01:14:49.920 --> 01:14:55.650
changes, where 1 means 1%.

01:14:55.650 --> 01:15:02.310
And so basically it's
roughly four to seven basis

01:15:02.310 --> 01:15:05.407
points a day of variation
in the yield changes.

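NOTE The unit conversion behind the basis-point figures, applied to the two daily standard deviations quoted above. With changes measured in percentage points (1 = 1%), one percentage point is 100 basis points, so 0.0384 and 0.0698 are about 3.8 and 7.0 bp. The helper name is hypothetical.
```python
def pct_points_to_bp(x):
    """Convert a yield change quoted in percentage points (1 = 1%) to basis points."""
    return x * 100.0
# The lowest and highest daily standard deviations quoted in the lecture
for sd in (0.0384, 0.0698):
    print(f"{sd} percentage points = {pct_points_to_bp(sd):.2f} bp")
```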
01:15:05.407 --> 01:15:06.990
So that's something
that's reasonable.

01:15:06.990 --> 01:15:10.720
When you look at the
news or a newspaper

01:15:10.720 --> 01:15:13.520
and see how interest
rates change from one day

01:15:13.520 --> 01:15:15.780
to the next, it's generally
a few basis points

01:15:15.780 --> 01:15:17.250
from one day to the next.

01:15:20.680 --> 01:15:26.560
This next matrix is
the correlation matrix

01:15:26.560 --> 01:15:27.885
of the yield changes.

01:15:30.440 --> 01:15:32.650
If you look at
this closely, which

01:15:32.650 --> 01:15:38.480
you can when you
download these results,

01:15:38.480 --> 01:15:42.310
you'll see that
near the diagonal

01:15:42.310 --> 01:15:47.870
the values are very high, like
above 90% for correlation.

01:15:47.870 --> 01:15:51.930
And as you move across,
away from the diagonal,

01:15:51.930 --> 01:15:53.918
the correlations
get lower and lower.

01:15:58.180 --> 01:16:02.870
Mathematically that
is what is happening.

01:16:02.870 --> 01:16:05.800
We can look at these things
graphically, which I always

01:16:05.800 --> 01:16:06.810
like to do.

01:16:06.810 --> 01:16:11.840
Here is just a graph, a bar
chart of the mean yield changes

01:16:11.840 --> 01:16:14.290
and the standard
deviations of the yield

01:16:14.290 --> 01:16:18.910
changes, daily volatilities
ranging from very short yields

01:16:18.910 --> 01:16:22.020
to long-tenor yields,
up to 20 years.

01:16:25.956 --> 01:16:28.416
So there's variability there.

01:16:35.840 --> 01:16:40.500
Here is a pairs
plot of the data.

01:16:40.500 --> 01:16:45.460
So what I've done is
take a set of tenors,

01:16:45.460 --> 01:16:50.390
here the 5 year,
7 year,

01:16:50.390 --> 01:16:53.015
10 year, and 20 year.

01:16:53.015 --> 01:16:55.970
I basically plotted
the yield changes

01:16:55.970 --> 01:16:57.800
of each of those
against each other.

01:16:57.800 --> 01:17:01.245
We could do this with basically
all nine different tenors,

01:17:01.245 --> 01:17:08.690
and we'd have a very dense
page of a pairs plot.

01:17:08.690 --> 01:17:10.950
So I split it up
into looking just

01:17:10.950 --> 01:17:14.890
at the top and bottom
block diagonals.

01:17:14.890 --> 01:17:18.000
But you can see basically
how the correlation

01:17:18.000 --> 01:17:23.190
between these yield
changes is very tight

01:17:23.190 --> 01:17:26.110
and then gets less tight
as you move further away.

01:17:26.110 --> 01:17:33.250
With the long
tenors-- let's see,

01:17:33.250 --> 01:17:39.030
the short tenors--
one, one more.

01:17:39.030 --> 01:17:44.070
Here the short tenors, ranging
from 3 year, 2 year, 1 year,

01:17:44.070 --> 01:17:45.240
6 month, and so forth.

01:17:45.240 --> 01:17:48.660
So here you can see how it
gets less and less correlated

01:17:48.660 --> 01:17:50.950
as you move away
from a given tenor.

01:17:53.730 --> 01:17:58.100
Well the principal
components analysis

01:17:58.100 --> 01:18:11.700
gives us-- if you conduct
a principal components analysis,

01:18:11.700 --> 01:18:14.200
basically the standard
output is first

01:18:14.200 --> 01:18:18.610
a table of how the
variability of the series

01:18:18.610 --> 01:18:22.210
is broken down across the
different component variables.

01:18:22.210 --> 01:18:24.990
And so there's
basically the importance

01:18:24.990 --> 01:18:29.640
of components for each of
the nine component variables

01:18:29.640 --> 01:18:36.260
where it's measured in terms
of the squared

01:18:36.260 --> 01:18:41.140
standard deviations of these
variables relative to their sum.

01:18:41.140 --> 01:18:43.400
And the proportion
of variance explained

01:18:43.400 --> 01:18:47.030
by the first component
variable is 0.849.

01:18:47.030 --> 01:18:50.300
So basically 85% of
the total variability

01:18:50.300 --> 01:18:54.000
is explained by the first
principal component variable.

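NOTE The proportions in an "importance of components" table (the kind R prints when summarizing a principal components fit) come from the component standard deviations. Each proportion is that component's variance divided by the total variance, and the cumulative proportion is the running sum. The sketch below uses hypothetical standard deviations, not the lecture's output, and the function name is illustrative.
```python
from itertools import accumulate
def importance(sdevs):
    """Proportion and cumulative proportion of total variance per component,
    given the components' standard deviations (variance = sd squared)."""
    variances = [s * s for s in sdevs]
    total = sum(variances)  # total variance = trace = sum of eigenvalues
    props = [v / total for v in variances]
    return props, list(accumulate(props))
# Hypothetical standard deviations for three components (not the lecture's output)
props, cumulative = importance([2.0, 1.0, 0.5])
print([round(p, 4) for p in props])
print([round(c, 4) for c in cumulative])
```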
01:18:54.000 --> 01:18:57.990
Looking at the second
row, second column, 0.0919,

01:18:57.990 --> 01:19:02.042
that's the proportion
of total variability

01:19:02.042 --> 01:19:04.250
explained by the second
principal component variable.

01:19:04.250 --> 01:19:05.700
So 9%.

01:19:05.700 --> 01:19:08.920
And then for the third
it's around 3%.

01:19:08.920 --> 01:19:20.940
And it just goes
down closer to 0.

01:19:20.940 --> 01:19:23.800
There's a scree plot for
principal components analysis,

01:19:23.800 --> 01:19:26.352
which is just a plot
of the variability

01:19:26.352 --> 01:19:28.310
of the different principal
component variables.

01:19:28.310 --> 01:19:34.510
So you can see whether
the principal components analysis

01:19:34.510 --> 01:19:37.830
is explaining much variability
in the first few components

01:19:37.830 --> 01:19:38.350
or not.

01:19:38.350 --> 01:19:41.280
Here there's a huge
amount of variability

01:19:41.280 --> 01:19:43.910
explained by the first
principal component variable.

01:19:43.910 --> 01:19:47.930
I've plotted here the
standard deviations

01:19:47.930 --> 01:19:50.616
of the original yield
changes in green,

01:19:50.616 --> 01:19:52.990
versus the standard deviations
of the principal component

01:19:52.990 --> 01:19:55.620
variables in blue.

01:19:55.620 --> 01:20:02.090
So with principal component
variables, we basically capture

01:20:02.090 --> 01:20:03.900
most of
the variability

01:20:03.900 --> 01:20:06.620
in the first few
principal components.

01:20:06.620 --> 01:20:08.489
Now let's look at
the interpretation

01:20:08.489 --> 01:20:10.030
of the principal
component variables.

01:20:10.030 --> 01:20:12.400
There's the loadings
matrix, which

01:20:12.400 --> 01:20:16.280
is the gamma matrix for the
principal component variables.

01:20:19.440 --> 01:20:25.200
Looking at numbers is
less informative for me

01:20:25.200 --> 01:20:26.160
than looking at graphs.

01:20:26.160 --> 01:20:30.620
Here's a plot of the loadings
on the different yield

01:20:30.620 --> 01:20:34.070
changes for the first
principal component variable.

01:20:34.070 --> 01:20:36.120
So the first principal
component variable

01:20:36.120 --> 01:20:39.760
is a weighted average of
all the yield changes,

01:20:39.760 --> 01:20:44.690
giving greatest weight
to the five year.

01:20:44.690 --> 01:20:45.210
What's that?

01:20:45.210 --> 01:20:51.220
Well that's just a measure of a
level shift in the yield curve.

01:20:51.220 --> 01:20:52.750
It's like, what's
the average yield

01:20:52.750 --> 01:20:55.747
change across the whole range?

01:20:55.747 --> 01:20:57.580
So that's what the first
principal component

01:20:57.580 --> 01:20:59.720
variable is measuring.

01:20:59.720 --> 01:21:03.440
The second principal component
variable gives positive weight

01:21:03.440 --> 01:21:07.250
to the long tenors, negative
weight to the short tenors.

01:21:07.250 --> 01:21:11.540
So it's looking at the
difference between the yield

01:21:11.540 --> 01:21:13.920
changes on the long
tenors versus the yield

01:21:13.920 --> 01:21:15.610
changes on the short tenors.

01:21:15.610 --> 01:21:19.774
So that's looking at how the
spread in yields is changing.

01:21:27.090 --> 01:21:32.250
Then the third principal
component variable

01:21:32.250 --> 01:21:36.190
has this structure.

01:21:36.190 --> 01:21:38.780
And this structure
for the weights

01:21:38.780 --> 01:21:40.050
is like a double difference.

01:21:40.050 --> 01:21:44.570
It's looking at the difference
between the long tenor

01:21:44.570 --> 01:21:48.150
and the medium tenor, minus the
difference between the medium tenor

01:21:48.150 --> 01:21:50.710
and the short tenor.

01:21:50.710 --> 01:21:54.100
So that's giving us a measure
of the curvature of the term

01:21:54.100 --> 01:21:57.440
structure and how that's
changing over time.

01:21:57.440 --> 01:21:59.350
So these principal
component variables

01:21:59.350 --> 01:22:01.600
are measuring the level
shift for the first,

01:22:01.600 --> 01:22:04.710
the spread for the second, and
the curvature for the third.

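NOTE The level, spread, and curvature readings can be illustrated with idealized weight vectors for three tenors (short, medium, long). These textbook shapes are an assumption chosen for illustration, NOT the fitted loadings from the case study. The curvature weights (1, -2, 1) are exactly the double difference (long - medium) - (medium - short), up to scaling. The yield changes are hypothetical.
```python
import math
# Idealized weights for (short, medium, long); illustrative shapes, NOT fitted loadings
level = [1 / math.sqrt(3)] * 3
slope = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
curvature = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))
# Hypothetical one-day yield changes for (short, medium, long), in percent
short_chg, mid_chg, long_chg = 0.02, 0.05, 0.06
changes = [short_chg, mid_chg, long_chg]
double_difference = (long_chg - mid_chg) - (mid_chg - short_chg)  # weights (1, -2, 1)
for name, w in (("level", level), ("slope", slope), ("curvature", curvature)):
    print(name, dot(w, changes))
print("curvature * sqrt(6) vs double difference:",
      dot(curvature, changes) * math.sqrt(6), double_difference)
```
The three weight vectors are orthonormal, mirroring the orthogonality of the principal component loadings.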
01:22:07.350 --> 01:22:09.250
With principal
components analysis,

01:22:09.250 --> 01:22:11.879
many times I think
people focus just

01:22:11.879 --> 01:22:14.170
on the first few principal
component variables and then

01:22:14.170 --> 01:22:16.480
say they're done.

01:22:16.480 --> 01:22:19.137
The last principal component
variable, and the last few,

01:22:19.137 --> 01:22:20.720
can be very, very
interesting as well,

01:22:20.720 --> 01:22:27.640
because these are the linear
combinations of the original variables

01:22:27.640 --> 01:22:33.420
that have
the least variability.

01:22:33.420 --> 01:22:35.760
And if you look at the
ninth principal component

01:22:35.760 --> 01:22:37.500
variable-- there were
nine yield changes

01:22:37.500 --> 01:22:43.810
here-- it's basically looking
at a weighted average of the 5

01:22:43.810 --> 01:22:47.580
and 10 year minus the 7 year.

01:22:47.580 --> 01:22:53.240
So this is like the hedge of the
7 year yield with the 5 and 10

01:22:53.240 --> 01:22:53.739
year.

01:22:56.910 --> 01:23:00.600
So that combination
of yield changes

01:23:00.600 --> 01:23:03.005
is the one that's
going to have

01:23:03.005 --> 01:23:04.630
the least
variability.

01:23:07.310 --> 01:23:08.720
The principal
component variables

01:23:08.720 --> 01:23:10.860
have zero correlation.

01:23:10.860 --> 01:23:14.840
Here's just a pairs plot of the
first three principal component

01:23:14.840 --> 01:23:16.010
variables and the ninth.

01:23:16.010 --> 01:23:18.670
And you can see
that those have been

01:23:18.670 --> 01:23:22.240
transformed to have zero
correlations with each other.

01:23:24.750 --> 01:23:31.540
One can plot the cumulative
principal component variables

01:23:31.540 --> 01:23:35.570
over time to see how these
underlying factors

01:24:35.570 --> 01:24:38.820
have evolved
over the time period.

01:23:38.820 --> 01:23:42.300
And you'll recall that
we talked about the first

01:23:42.300 --> 01:23:43.720
being the level shift.

01:23:43.720 --> 01:23:49.750
Basically from 2001 to 2005, the
overall level of interest rates

01:23:49.750 --> 01:23:51.150
went down and then went up.

01:23:51.150 --> 01:23:54.030
And this is captured by this
first principal component

01:23:54.030 --> 01:24:00.770
variable accumulating from 0
down to minus 8, back up to 0.

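NOTE Sample principal component variables are computed from de-meaned data, so each has sample mean zero. Any cumulative plot of them therefore ends at 0 by construction, and the shape of the path, rather than its end point, carries the information. The sketch below illustrates this with made-up daily scores.
```python
from itertools import accumulate
# Hypothetical daily values of one principal component variable, before centering
raw = [-0.3, -0.5, 0.1, -0.2, 0.4, 0.6, 0.2]
center = sum(raw) / len(raw)
scores = [s - center for s in raw]  # de-meaned, as sample PC scores are
path = list(accumulate(scores))  # cumulative PC variable through time
print([round(v, 3) for v in path])
```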
01:24:06.170 --> 01:24:11.920
And the scale of this
change from 0 to minus 8

01:24:11.920 --> 01:24:16.270
is the amount of
greatest variability.

01:24:16.270 --> 01:24:19.340
The second principal
component variable

01:24:19.340 --> 01:24:24.130
accumulates from 0 up to
just under 6, and back down to 0.

01:24:24.130 --> 01:24:27.092
So this is a measure
of the spread

01:24:27.092 --> 01:24:28.300
between long and short rates.

01:24:28.300 --> 01:24:31.470
So the spread
increased, and then it

01:24:31.470 --> 01:24:33.805
decreased over the period.

01:24:39.700 --> 01:24:46.560
And then the curvature varies
from 0 down to minus 1.5

01:24:46.560 --> 01:24:47.560
and back up to 0.

01:24:47.560 --> 01:24:50.590
So the change in curvature
over this entire period

01:24:50.590 --> 01:24:57.260
was much, much smaller, which
is perhaps as it should be.

01:24:57.260 --> 01:24:59.170
But these graphs
indicate basically

01:24:59.170 --> 01:25:03.710
how these underlying factors
evolved over the time period.

01:25:03.710 --> 01:25:10.410
In the case note I go through
and fit a statistical factor

01:25:10.410 --> 01:25:13.600
analysis model to
these same data

01:25:13.600 --> 01:25:16.980
and look at identifying
the number of factors.

01:25:16.980 --> 01:25:19.970
I also compare the results
over this five-year period

01:25:19.970 --> 01:25:24.030
with the period
from 2009 to 2013,

01:25:24.030 --> 01:25:27.330
and contrast those
different results.

01:25:27.330 --> 01:25:29.970
They are different,
and so it really

01:25:29.970 --> 01:25:33.890
matters over what period one
specifies these models.

01:25:33.890 --> 01:25:37.620
And fitting these models is
really just a starting point

01:25:37.620 --> 01:25:41.490
where you want to
ultimately model

01:25:41.490 --> 01:25:44.450
the dynamics in these
factors and their structural

01:25:44.450 --> 01:25:47.150
relationships.

01:25:47.150 --> 01:25:49.000
So we'll finish there.