WEBVTT

00:00:00.040 --> 00:00:02.390
The following content is
provided under a Creative

00:00:02.390 --> 00:00:03.680
Commons license.

00:00:03.680 --> 00:00:06.640
Your support will help MIT
OpenCourseWare continue to

00:00:06.640 --> 00:00:09.980
offer high quality educational
resources for free.

00:00:09.980 --> 00:00:12.820
To make a donation or to view
additional materials from

00:00:12.820 --> 00:00:16.750
hundreds of MIT courses, visit
MIT OpenCourseWare at

00:00:16.750 --> 00:00:18.000
ocw.mit.edu.

00:00:21.170 --> 00:00:21.800
SHAWN COLE: Great.

00:00:21.800 --> 00:00:25.890
It's a real pleasure to be
here and thank you for

00:00:25.890 --> 00:00:27.350
listening to me.

00:00:27.350 --> 00:00:30.430
This is perhaps the capstone
lecture of the course or at

00:00:30.430 --> 00:00:32.150
least the last lecture
of the course.

00:00:32.150 --> 00:00:34.950
And I'm going to try to pick
up right where Michael left

00:00:34.950 --> 00:00:37.190
off talking about intention
to treat and moving on to

00:00:37.190 --> 00:00:38.880
treatment of the treated.

00:00:38.880 --> 00:00:41.420
But with any luck, there'll be
some time at the end where we

00:00:41.420 --> 00:00:44.620
can have more general
discussions or questions if

00:00:44.620 --> 00:00:48.310
people are still interested
about particular topics or

00:00:48.310 --> 00:00:49.450
have questions.

00:00:49.450 --> 00:00:51.630
And I'll stay after class
to talk to people.

00:00:51.630 --> 00:00:54.490
And I think you have my
email on the slides.

00:00:54.490 --> 00:00:57.970
So feel free to get in touch
with me at any time.

00:00:57.970 --> 00:00:59.540
We've got this great project.

00:00:59.540 --> 00:01:02.850
We want to evaluate
it or write JPAL.

00:01:02.850 --> 00:01:04.920
So what are we going
to do today?

00:01:04.920 --> 00:01:08.520
Look at some challenges to
randomized evaluations.

00:01:08.520 --> 00:01:10.740
So these problems.

00:01:10.740 --> 00:01:13.700
So basically, when people don't
do what you assign them

00:01:13.700 --> 00:01:17.080
to do because you can't
control them.

00:01:17.080 --> 00:01:20.270
You can control undergraduates
in a lab, but in the real

00:01:20.270 --> 00:01:21.660
world it's a lot harder.

00:01:21.660 --> 00:01:23.570
Then we're going to talk about
sort of how do you choose

00:01:23.570 --> 00:01:25.030
which effects to report
in your study?

00:01:25.030 --> 00:01:27.390
So you've got your study, you
did a bunch of household

00:01:27.390 --> 00:01:29.450
surveying, what do you
want to report?

00:01:29.450 --> 00:01:31.240
How credible are your results?

00:01:31.240 --> 00:01:32.990
Then we'll spend some time
talking about external

00:01:32.990 --> 00:01:36.530
validity, which is sort of the
question, OK, I have a study

00:01:36.530 --> 00:01:38.600
that I think is internally
valid.

00:01:38.600 --> 00:01:41.760
We did the randomization
correctly and the results we

00:01:41.760 --> 00:01:42.490
think are legitimate.

00:01:42.490 --> 00:01:44.550
But how much can that tell
us about the greater

00:01:44.550 --> 00:01:45.920
world around us?

00:01:45.920 --> 00:01:48.750
And finally, we'll conclude
by talking about cost

00:01:48.750 --> 00:01:49.440
effectiveness.

00:01:49.440 --> 00:01:52.420
Which, as economists, is
very important to us.

00:01:52.420 --> 00:01:55.220
So we may have a program that's
effective, but how do

00:01:55.220 --> 00:01:59.310
we compare whether we should
spend our precious aid or

00:01:59.310 --> 00:02:02.930
budget dollars on that
particular program as opposed

00:02:02.930 --> 00:02:04.620
to a host of other programs?

00:02:04.620 --> 00:02:08.800
And so I usually teach at HBS
in the case method, which is

00:02:08.800 --> 00:02:11.320
very different than a lecture
format because it's basically

00:02:11.320 --> 00:02:13.770
the students always talking
and me refereeing.

00:02:13.770 --> 00:02:16.880
I'm not going to do that
today, but I'm very

00:02:16.880 --> 00:02:18.760
comfortable with interruptions,
or questions,

00:02:18.760 --> 00:02:23.430
or requests for clarification,
et cetera.

00:02:23.430 --> 00:02:26.060
And I think we all stay more
engaged if there's more

00:02:26.060 --> 00:02:26.750
discussion.

00:02:26.750 --> 00:02:29.790
And so that's super valuable.

00:02:29.790 --> 00:02:32.000
So that's the outline
for today.

00:02:32.000 --> 00:02:34.540
And the slides up here are going
to differ a little bit

00:02:34.540 --> 00:02:37.050
from the slides that you have
printed out because there was

00:02:37.050 --> 00:02:42.330
some last minute optimization
and re-coordination.

00:02:42.330 --> 00:02:45.350
So the basic problem, which I
think Michael talked about, is

00:02:45.350 --> 00:02:47.890
that individuals are allocated
to treatment groups and they

00:02:47.890 --> 00:02:50.640
don't receive treatment, or
individuals are allocated to

00:02:50.640 --> 00:02:51.840
the comparison group,
but somehow

00:02:51.840 --> 00:02:52.890
manage to get treatment.

00:02:52.890 --> 00:02:57.140
So you talked about students
didn't show up at school, so

00:02:57.140 --> 00:02:58.740
they didn't get the treatment
on the treatment day.

00:02:58.740 --> 00:03:02.610
Or the program said you can't
give this program, you can't

00:03:02.610 --> 00:03:05.430
give de-worming medicine to
girls over the age of 13

00:03:05.430 --> 00:03:07.170
because of health reasons.

00:03:07.170 --> 00:03:09.680
They may be pregnant and we
don't know how the de-worming

00:03:09.680 --> 00:03:11.870
medicine affects pregnancy.

00:03:11.870 --> 00:03:13.720
So what do you do?

00:03:13.720 --> 00:03:17.600
You came up with the solution
of estimating the program

00:03:17.600 --> 00:03:20.780
effect, ITT.

00:03:20.780 --> 00:03:24.680
Which is to use the original
assignment.

00:03:24.680 --> 00:03:27.770
So we have our baseline survey,
our baseline list of

00:03:27.770 --> 00:03:29.780
schools or people and we flip
our coins or run our

00:03:29.780 --> 00:03:31.130
[UNINTELLIGIBLE] code
to randomize.

00:03:31.130 --> 00:03:33.470
So then we would just evaluate
them no matter what actually

00:03:33.470 --> 00:03:33.950
happened to them.

00:03:33.950 --> 00:03:36.170
We count them in the treatment
group or we count them in the

00:03:36.170 --> 00:03:37.090
control group.

00:03:37.090 --> 00:03:40.870
And that gives us our intention
to treat estimate.

00:03:40.870 --> 00:03:43.710
And so the interpretation of
that is, what was the average

00:03:43.710 --> 00:03:47.560
effect on an individual in the
treated population relative to

00:03:47.560 --> 00:03:50.650
an individual in the comparison
population?

00:03:50.650 --> 00:03:55.900
And you've already covered
this with Michael?

00:03:55.900 --> 00:03:57.240
This is just a review.

00:03:57.240 --> 00:03:59.420
So is this the right
number to look for?

00:03:59.420 --> 00:04:01.820
Well, if you're thinking about
putting in a de-worming

00:04:01.820 --> 00:04:03.820
program, you have to realize
that people aren't going to be

00:04:03.820 --> 00:04:06.570
at school when the government
shows up to administer the

00:04:06.570 --> 00:04:09.250
de-worming program to
all their students.

00:04:09.250 --> 00:04:10.550
And that's that.

00:04:10.550 --> 00:04:12.970
So maybe we can just pause for
a second and think about some

00:04:12.970 --> 00:04:15.630
programs that if you're
an [? IPARA ?]

00:04:15.630 --> 00:04:18.480
you think you're going to be
involved with, or if you're

00:04:18.480 --> 00:04:20.740
involved in an NGO, the type
of program you're running.

00:04:20.740 --> 00:04:24.530
And maybe a few people can
volunteer what the intent to

00:04:24.530 --> 00:04:27.590
treat estimate might look like
in their evaluation and

00:04:27.590 --> 00:04:30.580
whether it's something
they care about.

00:04:30.580 --> 00:04:31.830
Put this into practice.

00:04:39.244 --> 00:04:41.570
I should be comfortable with
long pregnant pauses as well.

00:04:45.660 --> 00:04:46.140
Any examples?

00:04:46.140 --> 00:04:46.450
Excellent.

00:04:46.450 --> 00:04:49.348
AUDIENCE: So one of the projects
that we're working on

00:04:49.348 --> 00:04:52.246
is looking at the impact of
financial education in low

00:04:52.246 --> 00:04:54.661
income communities
in New York City.

00:04:57.559 --> 00:05:00.210
Obviously we're trying to
measure the impact that

00:05:00.210 --> 00:05:03.620
financial education has, but
we're working specifically

00:05:03.620 --> 00:05:07.350
with a certain NGO to implement
that education.

00:05:12.260 --> 00:05:15.206
Whatever estimate we get that's
the intention to treat

00:05:15.206 --> 00:05:17.661
will just be measuring the
impact of the program.

00:05:17.661 --> 00:05:19.500
SHAWN COLE: And so what are the
compliance problems you

00:05:19.500 --> 00:05:20.750
anticipate there?

00:05:23.700 --> 00:05:30.034
AUDIENCE: Maybe people not
showing up for classes or not

00:05:30.034 --> 00:05:31.657
following through on whatever
they're asked

00:05:31.657 --> 00:05:35.760
to do in the education.

00:05:35.760 --> 00:05:37.330
SHAWN COLE: Or people in the
control group could get on the

00:05:37.330 --> 00:05:41.150
internet and download the
government's financial

00:05:41.150 --> 00:05:43.520
literacy program and study
very industriously

00:05:43.520 --> 00:05:45.780
themselves and learn.

00:05:45.780 --> 00:05:47.990
The intention to treat will tell
you what the program does

00:05:47.990 --> 00:05:51.580
and you could imagine that if
only 15% of the people turn up

00:05:51.580 --> 00:05:54.770
for your meetings, you're going
to have a pretty small

00:05:54.770 --> 00:05:56.210
difference between the treatment
group and the

00:05:56.210 --> 00:05:57.480
comparison group.

00:05:57.480 --> 00:06:00.600
That'll be an accurate measure
of the program, but that's not

00:06:00.600 --> 00:06:02.690
going to tell us much about
how important financial

00:06:02.690 --> 00:06:04.540
literacy is in making
financial decisions.

00:06:04.540 --> 00:06:06.760
So if maybe your outcome
variable is what interest rate

00:06:06.760 --> 00:06:08.960
do people borrow at, or do they
pay their credit cards

00:06:08.960 --> 00:06:11.860
back on time, you might
not find much effect.

00:06:11.860 --> 00:06:15.250
From that we can't conclude
that financial education

00:06:15.250 --> 00:06:18.300
doesn't affect credit card
repayment behavior.

00:06:18.300 --> 00:06:20.070
We just have to conclude that
this particular program we

00:06:20.070 --> 00:06:22.440
delivered wasn't
very effective.

00:06:22.440 --> 00:06:23.580
So that's an example.

00:06:23.580 --> 00:06:24.830
Any other examples?

00:06:27.020 --> 00:06:30.210
The point is hopefully
pretty clear.

00:06:30.210 --> 00:06:30.800
Great.

00:06:30.800 --> 00:06:34.520
So I don't know if you went
through these calculations.

00:06:34.520 --> 00:06:35.840
Yeah, OK.

00:06:35.840 --> 00:06:36.930
It's absolutely simple.

00:06:36.930 --> 00:06:39.060
You just take the average in
this one, average in this one,

00:06:39.060 --> 00:06:40.080
and it's the difference.

00:06:40.080 --> 00:06:44.260
So it's not rocket science.

00:06:44.260 --> 00:06:49.370
So it relates to the actual
program and it gives us an

00:06:49.370 --> 00:06:50.620
estimate of the program's
impact.
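
NOTE
Editor's sketch of the intention-to-treat calculation described above: average
the outcome by original assignment, then take the difference. The function
name itt_estimate and the six records are made up for illustration and do not
come from the lecture slides.
```python
from statistics import mean
def itt_estimate(records):
    """records: iterable of (assigned_to_treatment, outcome) pairs."""
    records = list(records)
    # Group by ORIGINAL assignment, ignoring who actually took the treatment.
    treatment = [y for assigned, y in records if assigned]
    comparison = [y for assigned, y in records if not assigned]
    return mean(treatment) - mean(comparison)
# Hypothetical outcomes (for example, weight gain) for six pupils.
records = [(True, 4.0), (True, 2.0), (True, 3.0),
           (False, 1.0), (False, 0.5), (False, 1.5)]
itt = itt_estimate(records)
print(itt)
```
Nothing about actual take-up enters the calculation, which is why the ITT
describes the program as delivered rather than the treatment itself.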

00:06:59.550 --> 00:07:03.580
I guess the second method we're
going to talk about,

00:07:03.580 --> 00:07:06.410
which you talked about in your
learning groups this morning

00:07:06.410 --> 00:07:07.710
too, is treatment
on the treated.

00:07:07.710 --> 00:07:11.310
And maybe you could motivate
this by telling the joke about

00:07:11.310 --> 00:07:12.650
the econometricians or

00:07:12.650 --> 00:07:15.110
statisticians who went hunting.

00:07:15.110 --> 00:07:16.320
Have you heard this?

00:07:16.320 --> 00:07:19.490
So the first one aims at a deer
and shoots and misses 10

00:07:19.490 --> 00:07:20.910
meters to the left.

00:07:20.910 --> 00:07:24.660
And the second one aims at
the deer and misses 10

00:07:24.660 --> 00:07:26.080
meters to the right.

00:07:26.080 --> 00:07:29.940
And the third one says,
yes, we got it.

00:07:29.940 --> 00:07:33.740
So the intention to treat is
giving us the average program

00:07:33.740 --> 00:07:36.780
effect, but maybe we care more
about what's the effect of

00:07:36.780 --> 00:07:38.050
knowing financial literacy?

00:07:38.050 --> 00:07:39.650
What's the effect of actually
changing people's

00:07:39.650 --> 00:07:41.680
understanding of financial
literacy?

00:07:41.680 --> 00:07:45.490
And that's where the treatment
on the treated estimate can

00:07:45.490 --> 00:07:48.770
provide some assistance.

00:07:48.770 --> 00:07:52.410
So again, we went back to the
worming example which you

00:07:52.410 --> 00:07:53.080
talked about.

00:07:53.080 --> 00:07:58.750
And so we have 76% of the
people in the treatment

00:07:58.750 --> 00:08:00.760
schools got some treatment
in the first round.

00:08:00.760 --> 00:08:03.190
And in the next round
it was 72%.

00:08:03.190 --> 00:08:06.370
So that's actually nowhere
near 100%.

00:08:06.370 --> 00:08:07.900
One-fifth of the students
are not getting

00:08:07.900 --> 00:08:09.150
their de-worming medicine.

00:08:13.080 --> 00:08:18.840
And some students in the
comparison group received

00:08:18.840 --> 00:08:19.890
treatment also.

00:08:19.890 --> 00:08:22.760
So for example, I think when you
were testing the children

00:08:22.760 --> 00:08:25.220
and you found that they had
worms in the comparison group,

00:08:25.220 --> 00:08:27.680
the sort of medical protocol
required you

00:08:27.680 --> 00:08:30.530
to give them treatment.

00:08:30.530 --> 00:08:33.159
So what would you do if you
wanted to know the effect of

00:08:33.159 --> 00:08:37.270
medicine on the children
who took the medicine?

00:08:37.270 --> 00:08:40.530
And so you can't just compare
children who took the medicine

00:08:40.530 --> 00:08:42.500
with children who didn't
take the medicine.

00:08:42.500 --> 00:08:46.910
That leads to all the same
selection problems we had in

00:08:46.910 --> 00:08:51.430
the first few days of class
where the people who decided

00:08:51.430 --> 00:08:53.800
not to come to school that day
or weren't able to make it to

00:08:53.800 --> 00:08:55.930
school that day are different
than the people who did.

00:08:55.930 --> 00:08:59.450
And the people in the comparison
group who went out

00:08:59.450 --> 00:09:01.440
to the pharmacy and bought the
de-worming medicine are going

00:09:01.440 --> 00:09:06.950
to be different than the
people who didn't.

00:09:06.950 --> 00:09:09.630
So what we do in this case is
something really quite simple

00:09:09.630 --> 00:09:15.550
and it's at the foundation of
the entire field of modern

00:09:15.550 --> 00:09:16.480
empirical research.

00:09:16.480 --> 00:09:20.380
But we won't go into all the
details, we'll just talk about

00:09:20.380 --> 00:09:24.400
this treatment on the treated
estimator, or ToT.

00:09:24.400 --> 00:09:27.820
And so what you don't do is just
take the change of the

00:09:27.820 --> 00:09:31.040
people who were treated and the
change of the people who

00:09:31.040 --> 00:09:32.970
were not treated and
compare them.

00:09:32.970 --> 00:09:36.630
That would be just silly because
people who are treated

00:09:36.630 --> 00:09:38.980
are different than the people
who are not treated.

00:09:42.760 --> 00:09:46.480
So I think conceptually, this
is actually a really simple

00:09:46.480 --> 00:09:47.150
thing that we're doing.

00:09:47.150 --> 00:09:49.890
So I don't want to get bogged
down or confused in the math.

00:09:49.890 --> 00:09:55.090
But in the ideal experiment, in
the treatment group 100% of

00:09:55.090 --> 00:09:55.850
the people are treated.

00:09:55.850 --> 00:09:58.250
In the comparison group, 0%
of the people are treated.

00:09:58.250 --> 00:10:00.820
And then the average difference
is just the average

00:10:00.820 --> 00:10:03.260
treatment effect.

00:10:03.260 --> 00:10:06.380
But in real life when we do our
experiments, we very often

00:10:06.380 --> 00:10:08.550
have leakage across groups.

00:10:08.550 --> 00:10:12.050
So the treatment control
difference is not 1, 100% in

00:10:12.050 --> 00:10:14.460
the treatment group treated
and 0% in the

00:10:14.460 --> 00:10:15.930
control group treated.

00:10:15.930 --> 00:10:17.180
But it's smaller.

00:10:21.430 --> 00:10:24.270
The formal econometric
phrase for this is

00:10:24.270 --> 00:10:25.520
instrumental variables.

00:10:29.220 --> 00:10:31.330
We instrument the probability of
treatment with the original

00:10:31.330 --> 00:10:34.510
assignment and this will rescale
the difference in

00:10:34.510 --> 00:10:37.890
means to give us a better
estimate of what the effect of

00:10:37.890 --> 00:10:41.140
the treatment on the people
who were treated is.
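
NOTE
Editor's sketch contrasting the naive treated-versus-untreated comparison with
the rescaled (instrumented) estimate. Everything here is an assumption made
for illustration: the data are stylised, assignment alternates between pupils
instead of being drawn at random (only so the arithmetic is exact), take-up
depends on a made-up baseline score, and the treatment effect is a constant 5.
```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)
EFFECT = 5.0  # assumed constant treatment effect
pupils = []
for i in range(1000):
    assigned = i % 2 == 1      # both arms get the same mix of baselines
    baseline = (i // 2) % 10   # stand-in for motivation or health
    # Imperfect compliance that depends on the baseline (selection):
    # 60% take-up in the treatment arm, 20% in the comparison arm.
    took = baseline >= 4 if assigned else baseline >= 8
    outcome = baseline + EFFECT * took
    pupils.append((assigned, took, outcome))
# Naive: compare pupils who took the treatment with pupils who did not.
naive = mean(y for a, t, y in pupils if t) - mean(y for a, t, y in pupils if not t)
# ITT: compare by original assignment.
itt = mean(y for a, t, y in pupils if a) - mean(y for a, t, y in pupils if not a)
# First stage: difference in take-up between the two arms.
first_stage = mean(t for a, t, y in pupils if a) - mean(t for a, t, y in pupils if not a)
iv = itt / first_stage
print(naive, itt, first_stage, iv)
```
In this constructed example the naive comparison overstates the effect because
pupils with a high baseline are the ones who take the treatment, while the
rescaled estimate recovers the assumed effect of 5.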

00:10:58.910 --> 00:11:00.080
So this is a simple example.

00:11:00.080 --> 00:11:02.020
And it turns out it gets
more complicated.

00:11:02.020 --> 00:11:03.510
We're not going to go into
all the nuances.

00:11:03.510 --> 00:11:05.330
But just suppose for simplicity
that children who

00:11:05.330 --> 00:11:07.880
get the treatment have a weight
gain of A, irrespective

00:11:07.880 --> 00:11:08.916
of whether they're in
the treatment or

00:11:08.916 --> 00:11:10.000
in the control school.

00:11:10.000 --> 00:11:11.820
And children who get no
treatment have a weight gain

00:11:11.820 --> 00:11:14.380
of B. Again, in both schools.

00:11:14.380 --> 00:11:16.990
And we want to know A minus B,
the difference between getting

00:11:16.990 --> 00:11:18.520
treated and not getting
treated.
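
NOTE
Editor's sketch of the algebra behind this setup, under the lecture's
simplifying assumption that every treated child gains A and every untreated
child gains B in both schools. The symbols p_T and p_C (share actually treated
in the treatment and control school) and \bar{Y}_T, \bar{Y}_C (average
observed weight gain in each school) are notation introduced here and may
differ from the slide.
```latex
\begin{aligned}
\bar{Y}_T &= p_T A + (1 - p_T) B \\
\bar{Y}_C &= p_C A + (1 - p_C) B \\
\bar{Y}_T - \bar{Y}_C &= (p_T - p_C)(A - B) \\
A - B &= \frac{\bar{Y}_T - \bar{Y}_C}{p_T - p_C}
\end{aligned}
```
With perfect compliance p_T = 1 and p_C = 0, so the denominator is 1 and the
difference in school averages is A - B itself.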

00:11:24.540 --> 00:11:26.630
This is the math.

00:11:26.630 --> 00:11:28.680
And maybe it looks complicated,

00:11:28.680 --> 00:11:31.010
but it's really not.

00:11:34.410 --> 00:11:36.880
I think if we work through the
Excel worksheet explaining

00:11:36.880 --> 00:11:39.880
what we're doing and then go
back to the math, it'll become

00:11:39.880 --> 00:11:41.130
pretty clear.

00:11:43.190 --> 00:11:46.860
Imagine we run this experiment
with pupils in School 1 and

00:11:46.860 --> 00:11:49.970
pupils in School 2.

00:11:49.970 --> 00:11:51.830
We intended to treat everybody
in School 1.

00:11:51.830 --> 00:11:53.450
We intended for everybody
in School 2 to be

00:11:53.450 --> 00:11:56.000
in the control group.

00:11:56.000 --> 00:11:58.470
Unfortunately, we were only able
to treat 6 out of the 10

00:11:58.470 --> 00:12:02.670
people in this group and 2 out
of the 10 people in this group

00:12:02.670 --> 00:12:03.990
managed to get treatment
somehow.

00:12:07.140 --> 00:12:09.840
The formula sort of guides us
through what we need to do and

00:12:09.840 --> 00:12:14.120
then we can talk about the
intuition behind what it is.

00:12:14.120 --> 00:12:17.120
Is there a heroic soul who's
willing to try and just talk

00:12:17.120 --> 00:12:18.370
through the math
and figure out?

00:12:22.889 --> 00:12:24.139
A non-heroic soul?

00:12:28.290 --> 00:12:29.540
Cold call.

00:12:32.630 --> 00:12:35.930
So, Will, let's start out
with the easy stuff.

00:12:35.930 --> 00:12:39.150
So the formula we want is the
average in the treatment group

00:12:39.150 --> 00:12:42.180
minus the average in the control
group divided by the

00:12:42.180 --> 00:12:43.960
probability of treatment in the
treatment group minus the

00:12:43.960 --> 00:12:46.050
probability of treatment
in the control group.

00:12:46.050 --> 00:12:50.055
OK, so how do we calculate
all of these things?

00:12:50.055 --> 00:12:52.470
AUDIENCE: So first you want to
look at the change in the

00:12:52.470 --> 00:12:53.960
treatment group.

00:12:53.960 --> 00:12:59.100
So you average out the observed
change in weight for

00:12:59.100 --> 00:13:01.540
the treatment group.

00:13:01.540 --> 00:13:02.790
SHAWN COLE: Great.

00:13:04.472 --> 00:13:07.102
OK, so that's three.

00:13:07.102 --> 00:13:09.910
AUDIENCE: You do the same
calculation for the control

00:13:09.910 --> 00:13:12.994
group looking at the average
change in weight.

00:13:15.800 --> 00:13:18.907
You take the difference between
those two numbers and

00:13:18.907 --> 00:13:20.110
that's the numerator of
the proper fraction.

00:13:20.110 --> 00:13:24.380
SHAWN COLE: OK, so
that's yt minus--

00:13:24.380 --> 00:13:26.490
OK.

00:13:26.490 --> 00:13:29.370
AUDIENCE: And then to do the
second half of the calculation

00:13:29.370 --> 00:13:30.330
you've got the denominator.

00:13:30.330 --> 00:13:35.879
Compare the rate of compliance
in the treatment group to the

00:13:35.879 --> 00:13:39.252
rate of compliance in
the control group.

00:13:39.252 --> 00:13:44.196
So the percentage in the
treatment group that actually

00:13:44.196 --> 00:13:47.770
complied would be 0.6.

00:13:47.770 --> 00:13:49.780
SHAWN COLE: 1, 2, 3, 4, 5, 6.

00:13:49.780 --> 00:13:50.370
Awesome.

00:13:50.370 --> 00:13:51.670
OK.

00:13:51.670 --> 00:13:54.975
AUDIENCE: And for the control
group, it would be

00:13:54.975 --> 00:13:56.406
the 2 out of 10.

00:13:56.406 --> 00:13:59.400
So the rate that received
the treatment.

00:13:59.400 --> 00:14:02.820
So 0.6 minus 0.2.

00:14:02.820 --> 00:14:05.120
SHAWN COLE: 0.6 minus 0.2.

00:14:05.120 --> 00:14:06.860
OK.

00:14:06.860 --> 00:14:09.220
And now?

00:14:09.220 --> 00:14:10.930
AUDIENCE: Just divide
the top and bottom.

00:14:13.580 --> 00:14:15.350
SHAWN COLE: OK, so
that's the math.

00:14:15.350 --> 00:14:17.170
See if we got it right.

00:14:17.170 --> 00:14:18.810
Yes, we got it right.

00:14:18.810 --> 00:14:19.620
Excellent.
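
NOTE
Editor's sketch of the calculation just talked through, using the figures
quoted in this walkthrough: a treatment-school average of 3, a control-school
average of 0.9, 6 of 10 pupils treated in the treatment school and 2 of 10 in
the control school. The function and parameter names are made up for
illustration.
```python
def tot_estimate(mean_t, mean_c, p_treated_t, p_treated_c):
    """Difference in group averages divided by difference in take-up rates."""
    return (mean_t - mean_c) / (p_treated_t - p_treated_c)
tot = tot_estimate(mean_t=3.0, mean_c=0.9, p_treated_t=6 / 10, p_treated_c=2 / 10)
print(round(tot, 2))  # 5.25
```
The numerator 2.1 is the intention-to-treat difference, and dividing by the
0.4 gap in take-up scales it up to about 5.25.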

00:14:19.620 --> 00:14:21.990
So what's the intuition behind
what we're doing here?

00:14:29.718 --> 00:14:33.740
AUDIENCE: You're sort of doing
a weighted average.

00:14:37.670 --> 00:14:39.420
SHAWN COLE: OK, we're certainly
taking the two

00:14:39.420 --> 00:14:43.390
averages and we're taking
the difference of them.

00:14:43.390 --> 00:14:44.810
And what do you mean
by weighting?

00:14:44.810 --> 00:14:49.165
AUDIENCE: You're weighting
them by the degree of

00:14:49.165 --> 00:14:50.500
compliance.

00:14:50.500 --> 00:14:54.800
SHAWN COLE: So if the difference
in compliance--

00:14:54.800 --> 00:14:58.670
keeping the top term the
same, so suppose

00:14:58.670 --> 00:14:59.530
that this is still--

00:14:59.530 --> 00:15:03.312
AUDIENCE: If compliance were
really horrible you'd end up

00:15:03.312 --> 00:15:05.080
with no effect.

00:15:05.080 --> 00:15:07.208
It could swamp the difference
between the

00:15:07.208 --> 00:15:09.400
control and the treatment.

00:15:09.400 --> 00:15:12.630
SHAWN COLE: OK, so let's take
the mental exercise, keeping

00:15:12.630 --> 00:15:19.040
this yt minus yc the
same at 2.1.

00:15:19.040 --> 00:15:21.350
If the compliance in the
treatment group goes down from

00:15:21.350 --> 00:15:25.970
0.6 to 0.3, what's going to
happen to the ToT effect?

00:15:25.970 --> 00:15:27.490
Is it going to go up or down?

00:15:27.490 --> 00:15:27.960
AUDIENCE: Up.

00:15:27.960 --> 00:15:30.910
SHAWN COLE: Why?

00:15:30.910 --> 00:15:32.160
AUDIENCE:
[UNINTELLIGIBLE PHRASE]

00:15:34.070 --> 00:15:37.750
The people that you were
targeting are not receiving

00:15:37.750 --> 00:15:38.700
the treatment.

00:15:38.700 --> 00:15:44.090
So in a sense, the effect that
you are trying to describe is

00:15:44.090 --> 00:15:47.510
not what we would expect would
happen because some of these

00:15:47.510 --> 00:15:51.205
people did not comply with what
you were expecting from

00:15:51.205 --> 00:15:53.480
the beginning.

00:15:53.480 --> 00:15:57.190
AUDIENCE: It could go either
up or down depending on the

00:15:57.190 --> 00:15:58.070
second parameter.

00:15:58.070 --> 00:16:00.470
SHAWN COLE: OK, so what I want
to do is I want to say there

00:16:00.470 --> 00:16:02.030
are four parameters here.

00:16:02.030 --> 00:16:03.830
There's the average in
the treatment group.

00:16:03.830 --> 00:16:05.060
There's the average in
the control group.

00:16:05.060 --> 00:16:06.500
There's the probability of
treatment in the treatment

00:16:06.500 --> 00:16:08.000
group and the probability
of treatment

00:16:08.000 --> 00:16:09.490
in the control group.

00:16:09.490 --> 00:16:15.870
If this goes down, what's the
ToT estimate going to do?

00:16:15.870 --> 00:16:16.760
It's going to go?

00:16:16.760 --> 00:16:17.620
AUDIENCE: Increase.

00:16:17.620 --> 00:16:19.380
SHAWN COLE: Why?

00:16:19.380 --> 00:16:24.280
AUDIENCE: Because you previously
underestimated the

00:16:24.280 --> 00:16:27.660
effect because not that many
people received the treatment.

00:16:27.660 --> 00:16:29.910
SHAWN COLE: So mathematically,
if we're making the

00:16:29.910 --> 00:16:33.490
denominator smaller, the size of
the fraction has to go up.

00:16:33.490 --> 00:16:36.300
That's just a law
of mathematics.

00:16:36.300 --> 00:16:38.560
But the intuition, what I'm
looking for is the intuition.

00:16:38.560 --> 00:16:41.480
So what's going on?

00:16:41.480 --> 00:16:46.600
So suppose I first tell you
that this difference is 3.

00:16:46.600 --> 00:16:49.083
Can you guys not see
these numbers here?

00:16:49.083 --> 00:16:50.030
AUDIENCE: No.

00:16:50.030 --> 00:16:51.310
SHAWN COLE: That's excellent.

00:16:57.680 --> 00:17:00.420
OK, so let me use a blackboard,
I think this'll be

00:17:00.420 --> 00:17:01.670
a little bit helpful.

00:17:06.020 --> 00:17:10.690
AUDIENCE: So were you asking
just basically that because

00:17:10.690 --> 00:17:12.297
not everybody is being
treated it's

00:17:12.297 --> 00:17:15.550
diluting the overall result?

00:17:15.550 --> 00:17:16.069
SHAWN COLE: Explain this.

00:17:16.069 --> 00:17:19.220
Give us the sort of intuition.

00:17:19.220 --> 00:17:30.100
I said that this treatment group
has an average of 3 and

00:17:30.100 --> 00:17:37.010
the control group has an
average of 0.1 I think.

00:17:37.010 --> 00:17:41.210
No, 0.9.

00:17:41.210 --> 00:17:46.360
And first I told you that the
probability of treatment

00:17:46.360 --> 00:17:51.900
conditional on being in the
treatment group is 0.6.

00:17:51.900 --> 00:17:53.510
But now I tell you actually
it's much lower.

00:17:53.510 --> 00:17:54.880
It's only 0.3.

00:17:54.880 --> 00:17:56.420
So we know from the math
that it's going to

00:17:56.420 --> 00:17:57.230
be higher, but why?

00:17:57.230 --> 00:17:57.880
What's the intuition?

00:17:57.880 --> 00:18:00.128
Why does it have to be higher?

00:18:00.128 --> 00:18:02.508
AUDIENCE: Because the gap in
outcomes is actually caused by

00:18:02.508 --> 00:18:02.990
fewer people.

00:18:02.990 --> 00:18:05.620
Meaning that for those two
people it must have been a

00:18:05.620 --> 00:18:08.670
really big gap to balance out
the average to still be a lot

00:18:08.670 --> 00:18:10.930
higher when everyone else
in the treated group

00:18:10.930 --> 00:18:12.380
actually got zeros.

00:18:12.380 --> 00:18:13.900
SHAWN COLE: Yeah, absolutely.

00:18:13.900 --> 00:18:18.820
So we've got some difference
in observed outcomes.

00:18:18.820 --> 00:18:21.760
And that has to be caused by the
fact that there were more

00:18:21.760 --> 00:18:24.580
people in the treatment group
treated than in the control

00:18:24.580 --> 00:18:25.860
group who were treated.

00:18:25.860 --> 00:18:28.240
So if there were a whole lot
of people in the treatment

00:18:28.240 --> 00:18:32.240
group treated, then this rate
difference could be just that

00:18:32.240 --> 00:18:36.110
the average score is two higher
for each person in the

00:18:36.110 --> 00:18:37.140
treatment group, if
everybody in the

00:18:37.140 --> 00:18:38.500
treatment group were treated.

00:18:38.500 --> 00:18:40.375
But if only three people in
the treatment group were

00:18:40.375 --> 00:18:44.800
treated, to raise the average
from 0.9 to 3, you have to

00:18:44.800 --> 00:18:48.810
have a really big effect
on those three people.

00:18:48.810 --> 00:18:51.940
So that's all the treatment on
the treated estimator is

00:18:51.940 --> 00:18:56.960
doing, is it's rescaling this
observed effect to account for

00:18:56.960 --> 00:19:03.630
the fact that the difference
is not 1 here.

00:19:03.630 --> 00:19:06.760
If we had perfect compliance,
if this were 1 and this were

00:19:06.760 --> 00:19:10.490
0, then the denominator of the
fraction would just be 1.

00:19:10.490 --> 00:19:12.910
This would look just like the
intention to treat and the

00:19:12.910 --> 00:19:14.680
intention to treat and the
treatment on the treated

00:19:14.680 --> 00:19:16.710
estimators would be the same.

00:19:16.710 --> 00:19:21.050
But if we have imperfect
compliance, then we can scale

00:19:21.050 --> 00:19:24.440
up the effect to account for
the fact that all of the

00:19:24.440 --> 00:19:27.710
change has to be due to the fact
that a smaller number of

00:19:27.710 --> 00:19:30.153
people were getting
the treatment.
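
NOTE
Editor's sketch of the rescaling argument: hold the observed difference fixed
at 2.1 and vary take-up in the treatment group. Control-group take-up is
assumed to stay at 0.2, as in the worksheet. The names below are made up for
illustration.
```python
def tot_estimate(itt, p_treated_t, p_treated_c):
    first_stage = p_treated_t - p_treated_c
    if first_stage == 0:
        # With no difference in take-up there is nothing to rescale by.
        raise ValueError("no difference in take-up between the groups")
    return itt / first_stage
ITT = 2.1        # observed difference in group averages
P_CONTROL = 0.2  # share of the control group that got treated
results = {p: tot_estimate(ITT, p, P_CONTROL) for p in (0.6, 0.3)}
# Perfect compliance: the denominator is 1, so ToT equals ITT.
perfect = tot_estimate(ITT, 1.0, 0.0)
for p, tot in results.items():
    print(f"take-up {p:.1f} in treatment group: ToT is about {tot:.2f}")
print(f"perfect compliance: ToT is {perfect}")
```
Halving take-up from 0.6 to 0.3 shrinks the denominator from 0.4 to 0.1, so
the same observed gap has to come from a much larger effect on fewer people.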

00:19:30.153 --> 00:19:34.750
AUDIENCE: Now you can only do
this because you believe there

00:19:34.750 --> 00:19:37.780
is no systematic difference for
why people were treated or

00:19:37.780 --> 00:19:40.420
not in between these
two groups?

00:19:40.420 --> 00:19:40.720
SHAWN COLE: Right.

00:19:40.720 --> 00:19:46.900
So there's some technical
assumptions on how you

00:19:46.900 --> 00:19:48.470
interpret these effects.

00:19:48.470 --> 00:19:52.040
But if we agree that the effect
is basically a constant

00:19:52.040 --> 00:19:54.580
effect, so our literacy program
has the same effect on

00:19:54.580 --> 00:19:58.530
everybody, then this
is perfectly fine.

00:19:58.530 --> 00:20:01.950
And we can do this because
we randomly assigned the

00:20:01.950 --> 00:20:03.830
treatment and the control group, so ex ante the two
so ex ante the two

00:20:03.830 --> 00:20:05.080
groups are the same.

00:20:07.570 --> 00:20:09.510
Will?

00:20:09.510 --> 00:20:10.440
AUDIENCE: So why do you not have
to worry that the people

00:20:10.440 --> 00:20:13.780
in the treatment group who
actually comply are the more

00:20:13.780 --> 00:20:16.350
motivated batch of the
treatment group?

00:20:16.350 --> 00:20:17.110
SHAWN COLE: Right.

00:20:17.110 --> 00:20:27.650
That is a concern and it's
covered in the papers, which

00:20:27.650 --> 00:20:30.020
I'll get to at the end.

00:20:30.020 --> 00:20:34.270
Basically, the sort of
aggressive intuition is that

00:20:34.270 --> 00:20:37.520
this tends to measure the
effect on people who are

00:20:37.520 --> 00:20:41.180
affected by the program.

00:20:41.180 --> 00:20:45.490
But in general, this is a
pretty good way of just

00:20:45.490 --> 00:20:48.870
scaling up your program effects
to account for the

00:20:48.870 --> 00:20:50.120
possibility of noncompliance.

00:20:53.470 --> 00:20:57.441
So moving back to--

00:20:57.441 --> 00:20:58.320
sure.

00:20:58.320 --> 00:21:00.048
AUDIENCE: Did you answer
his question?

00:21:03.160 --> 00:21:06.560
SHAWN COLE: Why do we have
to worry about it?

00:21:06.560 --> 00:21:08.150
AUDIENCE: Yeah.

00:21:08.150 --> 00:21:12.110
It seemed like it's
a serious problem.

00:21:12.110 --> 00:21:14.340
AUDIENCE: It started random, but
then people who actually

00:21:14.340 --> 00:21:17.060
got treated, if they were
treated because they were

00:21:17.060 --> 00:21:20.500
systematically different
then it's no longer--

00:21:20.500 --> 00:21:23.412
your control group is no longer
a good representation

00:21:23.412 --> 00:21:24.250
of the [UNINTELLIGIBLE].

00:21:24.250 --> 00:21:24.870
SHAWN COLE: Right.

00:21:24.870 --> 00:21:27.540
There are technical assumptions
about monotonicity

00:21:27.540 --> 00:21:31.910
and independence that I sort of
would rather not go into.

00:21:31.910 --> 00:21:38.040
But if you'll at least just
grant that there's a constant

00:21:38.040 --> 00:21:42.960
treatment effect, then I think
we're fine and by scaling up

00:21:42.960 --> 00:21:48.476
the impact to account for the
non-compliers, we'll be OK.

00:21:48.476 --> 00:21:50.690
AUDIENCE: Is the idea-- sorry
not to belabor the point.

00:21:50.690 --> 00:21:53.070
But is the idea that this method
will get the correct

00:21:53.070 --> 00:21:55.980
magnitude of the treatment on
those who are treated in this

00:21:55.980 --> 00:21:58.450
study, but you can't necessarily
extrapolate that

00:21:58.450 --> 00:22:00.620
to say, that would have
been the same

00:22:00.620 --> 00:22:01.570
treatment on the treated.

00:22:01.570 --> 00:22:03.550
Would you be able to force these
other people who might

00:22:03.550 --> 00:22:05.460
be different in some
way to get treated.

00:22:05.460 --> 00:22:05.790
SHAWN COLE: Right.

00:22:05.790 --> 00:22:09.430
This will tell you the correct
magnitude for the people who,

00:22:09.430 --> 00:22:11.960
because they were assigned to
treatment, get the treatment.

00:22:11.960 --> 00:22:14.010
So that's again, a relevant
parameter when you're running

00:22:14.010 --> 00:22:18.050
a program that causes people
to take the treatment, you

00:22:18.050 --> 00:22:19.870
have the effect.

00:22:19.870 --> 00:22:21.500
If on the other hand you're the
government and you have

00:22:21.500 --> 00:22:25.985
some ability to compel
compliance, then you might not

00:22:25.985 --> 00:22:26.970
get the same effect.

00:22:26.970 --> 00:22:28.240
So you may worry about this.

00:22:28.240 --> 00:22:31.130
Some people have done studies to
try and sort of see how big

00:22:31.130 --> 00:22:32.640
a problem this is.

00:22:32.640 --> 00:22:36.630
And one example I can cite is
some studies in the education

00:22:36.630 --> 00:22:37.020
literature.

00:22:37.020 --> 00:22:40.520
So in the US people have looked
at mandatory school

00:22:40.520 --> 00:22:44.120
attendance laws that say in some
states you can drop out

00:22:44.120 --> 00:22:46.330
of school at the age of 16 and
in some states you can drop

00:22:46.330 --> 00:22:49.400
out of school at
the age of 15.

00:22:49.400 --> 00:22:52.130
And so changes in these
laws induce some

00:22:52.130 --> 00:22:53.460
people to stay in longer.

00:22:53.460 --> 00:22:56.720
But probably nobody here would
have been affected by one of

00:22:56.720 --> 00:22:59.040
these mandatory schooling laws
because we were all planning

00:22:59.040 --> 00:23:01.340
to go on to college anyway.

00:23:01.340 --> 00:23:05.550
And so people have compared
estimates in the US with those

00:23:05.550 --> 00:23:08.280
in the UK where the mandatory
schooling laws were very

00:23:08.280 --> 00:23:10.700
binding and they found that
the point estimates were

00:23:10.700 --> 00:23:11.370
pretty similar.

00:23:11.370 --> 00:23:15.080
So that's an example of where
it can be reasonable.

00:23:15.080 --> 00:23:18.210
But this is something that you
have to treat with a little

00:23:18.210 --> 00:23:20.850
bit of caution.

00:23:20.850 --> 00:23:23.100
So there are other challenges
with this.

00:23:25.610 --> 00:23:28.680
To get the ToT, we need to
know the probability of

00:23:28.680 --> 00:23:30.750
treatment conditional on being in
the treatment group and the

00:23:30.750 --> 00:23:32.750
probability of being treated
conditional on being in the

00:23:32.750 --> 00:23:34.020
control group.

00:23:34.020 --> 00:23:35.480
So why might these
be hard to get?

00:23:38.896 --> 00:23:40.848
AUDIENCE: I actually
had a question.

00:23:40.848 --> 00:23:44.150
I don't know if it's
directly related.

00:23:44.150 --> 00:23:48.008
Does this work if the
probability that you're

00:23:48.008 --> 00:23:50.372
treated in both the treatment
and control turn out to be

00:23:50.372 --> 00:23:51.710
equal or is there probability
that--

00:23:51.710 --> 00:23:53.690
SHAWN COLE: Great question.

00:23:53.690 --> 00:23:56.780
So you're anticipating a slide
three slides from now.

00:23:56.780 --> 00:23:58.100
So let's go there and
then we'll come

00:23:58.100 --> 00:23:59.420
back to my other problem.

00:23:59.420 --> 00:24:01.230
But this is the equation.

00:24:01.230 --> 00:24:03.070
So what happens if the
probability of treatment in

00:24:03.070 --> 00:24:05.838
the treatment and the control
group is the same?

00:24:05.838 --> 00:24:07.275
AUDIENCE: Mathematically
you have problems.

00:24:10.630 --> 00:24:12.830
SHAWN COLE: Mathematically
you're dividing something by 0

00:24:12.830 --> 00:24:16.160
and you're going to get infinity
or negative infinity.

00:24:16.160 --> 00:24:19.320
So yeah, that's not
successful.
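
NOTE
A minimal sketch of the ratio being discussed, assuming the slide's equation is the usual Wald form: the difference in mean outcomes divided by the difference in take-up. The function name, argument names, and example numbers are hypothetical and not from the lecture.
```python
def wald_estimate(mean_y_treat, mean_y_control, p_treated_treat, p_treated_control):
    """Treatment-on-the-treated via the Wald ratio: ITT divided by the take-up gap."""
    itt = mean_y_treat - mean_y_control                 # intention-to-treat effect
    take_up_gap = p_treated_treat - p_treated_control   # difference in share actually treated
    if take_up_gap == 0:
        # Assignment moved nobody into treatment, so the ratio is undefined.
        raise ValueError("take-up is identical in both groups; ToT is undefined")
    return itt / take_up_gap
# Hypothetical numbers: mean outcomes of 4.0 and 3.0, take-up of 75% and 25%.
print(wald_estimate(4.0, 3.0, 0.75, 0.25))
```
Raising an error rather than returning infinity is a design choice for this sketch: equal take-up means the experiment cannot speak to the effect of treatment at all.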

00:24:19.320 --> 00:24:20.740
I mean, another way of
putting that is that

00:24:20.740 --> 00:24:22.300
your experiment failed.

00:24:22.300 --> 00:24:24.910
You randomly assigned treatment
and you assumed that

00:24:24.910 --> 00:24:27.540
your program, a financial
literacy education program,

00:24:27.540 --> 00:24:31.250
de-worming medicine, is going
to deliver a product.

00:24:31.250 --> 00:24:33.850
But if, in fact, it doesn't
change the probability of

00:24:33.850 --> 00:24:36.060
treatment at all, then
your program failed.

00:24:36.060 --> 00:24:39.280
It's sort of like if I
were to sit here and

00:24:39.280 --> 00:24:41.210
working with an [? MFI, ?]

00:24:41.210 --> 00:24:43.810
send them an email and say,
I'm assigning these 50

00:24:43.810 --> 00:24:46.540
villages to treatment and these
50 villages to control.

00:24:46.540 --> 00:24:49.560
And then the email gets lost and
they don't do any sort of

00:24:49.560 --> 00:24:50.810
treatment or control.

00:24:50.810 --> 00:24:52.570
And then they send me the
data back a year later

00:24:52.570 --> 00:24:53.430
and I do the analysis.

00:24:53.430 --> 00:24:55.940
Well, I'm going to find that
there's no difference in

00:24:55.940 --> 00:24:58.030
probability of treatment between
the two groups and I

00:24:58.030 --> 00:24:59.700
won't have anything interesting
to say.

00:25:03.190 --> 00:25:05.300
So that's definitely
one problem.

00:25:05.300 --> 00:25:10.270
The experiment doesn't work
unless the treatment induces a

00:25:10.270 --> 00:25:11.980
change in probability of
treatment between the

00:25:11.980 --> 00:25:13.230
treatment and the comparison
groups.

00:25:16.200 --> 00:25:17.610
So that's one problem.

00:25:17.610 --> 00:25:19.600
There's a second problem with
this estimation since we're on

00:25:19.600 --> 00:25:23.540
this slide, which is perhaps,
slightly more technical.

00:25:23.540 --> 00:25:27.830
But if the difference between
the treatment and the

00:25:27.830 --> 00:25:29.560
comparison group is small.

00:25:29.560 --> 00:25:33.360
So say it's 10% of the treatment
group are treated

00:25:33.360 --> 00:25:36.720
and none of the comparison
group is treated.

00:25:36.720 --> 00:25:40.660
Then we're estimating the mean
of all of the treated people

00:25:40.660 --> 00:25:43.260
minus the mean of all the
control people and dividing by

00:25:43.260 --> 00:25:47.280
0.1, which is the same thing
as multiplying by 10.

00:25:47.280 --> 00:25:51.510
So if there's a little bit of
noise in these measures, then

00:25:51.510 --> 00:25:52.550
instead of--

00:25:52.550 --> 00:25:54.300
suppose the true mean
is 3, but you

00:25:54.300 --> 00:25:57.410
happened to measure 3.1.

00:25:57.410 --> 00:25:59.890
Then that noise is going
to be blown up by a

00:25:59.890 --> 00:26:02.690
factor of 10 as well.

00:26:02.690 --> 00:26:08.020
So the estimate is going to be
much less precise when the

00:26:08.020 --> 00:26:11.540
difference in treatment between
the treatment and the

00:26:11.540 --> 00:26:15.740
comparison group is low.

00:26:15.740 --> 00:26:19.333
And as you said, if it gets
to 0, you're in a pickle.
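
NOTE
A rough illustration of the precision point, assuming the 3 versus 3.1 in the example refers to the difference in means that gets divided by the take-up gap. The helper name and the list of gaps are hypothetical.
```python
def tot_from_itt(itt, take_up_gap):
    # Dividing by the take-up gap multiplies the ITT, and any error in it, by 1 / gap.
    return itt / take_up_gap
true_itt = 3.0       # the "true" value in the spoken example
measured_itt = 3.1   # what a noisy sample happened to give
for gap in (1.0, 0.5, 0.1):
    error = tot_from_itt(measured_itt, gap) - tot_from_itt(true_itt, gap)
    print(f"take-up gap {gap:.1f}: error in ToT = {error:.2f}")
```
The same measurement error of 0.1 grows as the gap shrinks, which is why low take-up calls for a larger sample.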

00:26:19.333 --> 00:26:22.005
AUDIENCE: That's a pretty
extreme problem.

00:26:22.005 --> 00:26:25.740
It means I had a treatment
group and

00:26:25.740 --> 00:26:27.850
only 10% of the people.

00:26:27.850 --> 00:26:30.160
SHAWN COLE: Sure.

00:26:30.160 --> 00:26:33.040
That's true, but it's not
implausible that you'd want to

00:26:33.040 --> 00:26:34.470
run that type of experiment.

00:26:34.470 --> 00:26:36.700
So if you think of--

00:26:36.700 --> 00:26:38.960
I don't know the exact numbers
on the flu encouragement

00:26:38.960 --> 00:26:41.440
design papers.

00:26:41.440 --> 00:26:43.960
To try and understand how
effective a flu shot is, you

00:26:43.960 --> 00:26:47.750
can send out letters to a bunch
of people and some of

00:26:47.750 --> 00:26:49.720
the people-- the control group,
you just send a letter

00:26:49.720 --> 00:26:52.200
saying, just for your
information, the influenza

00:26:52.200 --> 00:26:52.970
season is coming.

00:26:52.970 --> 00:26:54.860
Please wash your hands a lot.

00:26:54.860 --> 00:26:57.330
The treatment group you send a
letter saying that information

00:26:57.330 --> 00:27:00.180
plus we're offering free flu
shots down at the clinic, why

00:27:00.180 --> 00:27:01.470
don't you come down?

00:27:01.470 --> 00:27:03.070
And you may only get
a 10% response

00:27:03.070 --> 00:27:05.280
rate from that letter.

00:27:05.280 --> 00:27:08.760
But in that case, the treatment
is very cheap.

00:27:08.760 --> 00:27:12.670
It's just $0.32, or whatever
a stamp costs these days.

00:27:12.670 --> 00:27:16.290
So you could do a study
with 100,000 people.

00:27:16.290 --> 00:27:19.090
Then you would estimate both of
these things very precisely

00:27:19.090 --> 00:27:21.230
because you'd have 50,000 people
in the treatment group,

00:27:21.230 --> 00:27:23.540
50,000 people in the
control group.

00:27:23.540 --> 00:27:26.590
And you'd have these numbers and
so you could come up with

00:27:26.590 --> 00:27:27.300
an estimate.

00:27:27.300 --> 00:27:29.690
It's not like we should give up
on an experiment that has a

00:27:29.690 --> 00:27:33.680
low probability of treatment,
it's just if we think we're

00:27:33.680 --> 00:27:35.370
going to be there we want
to do our power

00:27:35.370 --> 00:27:36.190
calculations carefully.

00:27:36.190 --> 00:27:39.880
And I think you would have
seen when you did your--

00:27:39.880 --> 00:27:42.940
did your power calculations
include noncompliance?

00:27:42.940 --> 00:27:45.660
OK, so if you start adjusting
the power calculations for

00:27:45.660 --> 00:27:48.700
noncompliance, you see that you
need larger sample sizes.

00:27:48.700 --> 00:27:51.440
And so that's an important
lesson as well.

00:27:55.460 --> 00:27:56.750
Sticking with that
flu example.

00:28:00.400 --> 00:28:02.280
Why might this be hard?

00:28:02.280 --> 00:28:04.480
When we sent out these letters
to all these people either

00:28:04.480 --> 00:28:06.170
encouraging them to wash their
hands or encouraging them to

00:28:06.170 --> 00:28:06.705
get flu shots.

00:28:06.705 --> 00:28:09.240
AUDIENCE: We have
to observe both.

00:28:09.240 --> 00:28:12.494
Whether there's treatment in
your control in particular,

00:28:12.494 --> 00:28:15.180
that could be really
hard to do.

00:28:15.180 --> 00:28:17.700
SHAWN COLE: So maybe if you're
directing them to your flu

00:28:17.700 --> 00:28:20.020
clinic, you're going to sort of
observe everybody you send a

00:28:20.020 --> 00:28:23.230
letter who comes in
for a flu shot.

00:28:23.230 --> 00:28:25.760
And maybe they're employees of
Blue Cross Blue Shield or

00:28:25.760 --> 00:28:28.060
something and you hope you'll
get a good estimate.

00:28:28.060 --> 00:28:30.350
But what about this, you've got
50,000 people you've sent

00:28:30.350 --> 00:28:32.940
letters to in the
control group.

00:28:32.940 --> 00:28:33.480
What can you do?

00:28:33.480 --> 00:28:40.480
Do you have to phone up all
50,000 people and try and--

00:28:40.480 --> 00:28:42.690
AUDIENCE: And say, oh, by
the way, did you happen

00:28:42.690 --> 00:28:43.210
to get a flu shot?

00:28:43.210 --> 00:28:43.320
SHAWN COLE: Exactly.

00:28:43.320 --> 00:28:45.690
So do we need to make 50,000
telephone calls

00:28:45.690 --> 00:28:47.402
to solve this problem?

00:28:47.402 --> 00:28:49.250
AUDIENCE: Randomly sample.

00:28:49.250 --> 00:28:49.970
SHAWN COLE: Randomly sample?

00:28:49.970 --> 00:28:53.140
So pick 500, even 1,000,
or who knows?

00:28:53.140 --> 00:28:55.520
Maybe even 200 people, 300
people will give you a pretty

00:28:55.520 --> 00:28:57.040
good estimate of what this is.

00:28:57.040 --> 00:28:59.590
You just need to make sure that
it is a randomly selected

00:28:59.590 --> 00:29:01.670
sample and you're not just
asking all of the people in

00:29:01.670 --> 00:29:02.920
this particular clinic.
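
NOTE
A sketch of the random-sampling idea, under made-up assumptions: a control group of 50,000 in which 10% got a flu shot on their own, and a phone survey of 500 of them. The function name and the population are hypothetical, and the standard error ignores the finite-population correction.
```python
import math
import random
def estimate_take_up(population, sample_size, seed=0):
    """Estimate the share treated from a simple random sample, with a standard error."""
    rng = random.Random(seed)                      # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)   # simple random sample, no replacement
    share = sum(sample) / sample_size
    std_error = math.sqrt(share * (1 - share) / sample_size)
    return share, std_error
# Hypothetical control group: 1 means the person got a flu shot, 0 means they did not.
control_group = [1] * 5_000 + [0] * 45_000
share, std_error = estimate_take_up(control_group, 500)
print(f"estimated take-up: {share:.3f} +/- {std_error:.3f}")
```
With a few hundred calls the standard error is already around a percentage point or two, which is the sense in which 50,000 calls are unnecessary.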

00:29:08.500 --> 00:29:12.540
We're getting a little
bit out of order.

00:29:12.540 --> 00:29:14.900
I've gone through the math, but
PowerPoint wouldn't let

00:29:14.900 --> 00:29:15.420
you see it.

00:29:15.420 --> 00:29:19.800
But at least suffice it to say
Will talked us through it.

00:29:19.800 --> 00:29:22.470
It's not a lot of fancy
econometrics we're doing here.

00:29:22.470 --> 00:29:24.740
This is called the Wald estimate
if you've taken

00:29:24.740 --> 00:29:26.170
econometrics.

00:29:26.170 --> 00:29:32.100
But it's a very simple
method for coming up.

00:29:32.100 --> 00:29:35.630
So there's some problems or
there's some areas where this

00:29:35.630 --> 00:29:36.330
could be a problem.

00:29:36.330 --> 00:29:37.280
And Will hinted at one.

00:29:37.280 --> 00:29:39.170
We can mention a
couple others.

00:29:39.170 --> 00:29:44.140
So how might this treatment on
the treated design fail,

00:29:44.140 --> 00:29:46.740
let's say, in the
letters example,

00:29:46.740 --> 00:29:47.870
the influenza example?

00:29:47.870 --> 00:29:50.170
So we're sending out
a bunch of letters.

00:29:50.170 --> 00:29:52.290
So suppose the treatment group
is you send out a bunch of

00:29:52.290 --> 00:29:55.290
letters saying, it's flu season,
come get a flu shot.

00:29:55.290 --> 00:29:57.425
And the control group is you
don't send out any letters.

00:30:00.150 --> 00:30:03.640
And so what happens?

00:30:03.640 --> 00:30:07.990
Suppose you get 50%
compliance.

00:30:07.990 --> 00:30:15.290
So treated and the
control group,

00:30:15.290 --> 00:30:19.070
compliance means flu shot.

00:30:19.070 --> 00:30:21.020
You get 50%.

00:30:21.020 --> 00:30:23.600
Let's make it easy, in the
control group you get 0%.

00:30:23.600 --> 00:30:26.120
You do a sample and just
nobody gets a flu shot.

00:30:26.120 --> 00:30:28.650
Maybe this is in a developing
country where people don't

00:30:28.650 --> 00:30:30.260
tend to think about flu shots.

00:30:30.260 --> 00:30:38.480
And suppose you get the flu
rate to be 10% here.

00:30:38.480 --> 00:30:41.180
We could say this
is in Mexico.

00:30:41.180 --> 00:30:42.510
And 15% here.

00:30:45.070 --> 00:30:46.460
Do I have my math right?

00:30:46.460 --> 00:30:46.960
Yeah.

00:30:46.960 --> 00:30:49.940
So what would the treatment on
the treated estimate here be?

00:30:55.700 --> 00:30:57.620
AUDIENCE: 20%?

00:30:57.620 --> 00:30:59.280
SHAWN COLE: So what's
the formula?

00:31:04.840 --> 00:31:08.814
AUDIENCE: 10 minus 15.

00:31:08.814 --> 00:31:12.140
Divided by 0.5.

00:31:12.140 --> 00:31:13.810
SHAWN COLE: Minus 0.

00:31:13.810 --> 00:31:15.060
So it's minus 10.

00:31:19.200 --> 00:31:23.520
So giving the flu shot to a
population will reduce the

00:31:23.520 --> 00:31:27.350
incidence of flu by 10
percentage points.
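
NOTE
The arithmetic from the board, written out as a small sketch. The numbers are the ones stated in the lecture; the variable names are made up for illustration.
```python
# Flu rates are in percentage points; flu-shot rates are fractions.
flu_rate_treatment = 10     # % who got the flu among people sent the letter
flu_rate_control = 15       # % who got the flu among people sent no letter
shot_rate_treatment = 0.5   # share of the letter group who got a flu shot
shot_rate_control = 0.0     # share of the no-letter group who got a flu shot
itt = flu_rate_treatment - flu_rate_control             # effect of being sent the letter
tot = itt / (shot_rate_treatment - shot_rate_control)   # Wald ratio
print(itt, tot)  # -5 -10.0
```
The intent-to-treat effect is 5 percentage points, and scaling by the 50% take-up gap gives the 10-point treatment-on-the-treated figure.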

00:31:27.350 --> 00:31:31.860
So what's an example of how this
experiment could fail, or

00:31:31.860 --> 00:31:35.366
how this number could
not be correct?

00:31:35.366 --> 00:31:38.824
AUDIENCE: By reminding people to
get a flu shot, you remind

00:31:38.824 --> 00:31:41.788
them that the flu is out there
and they might do other things

00:31:41.788 --> 00:31:42.290
besides get a flu shot.

00:31:42.290 --> 00:31:42.710
SHAWN COLE: Absolutely.

00:31:42.710 --> 00:31:44.830
So now they wash their hands, or
they don't go out in public

00:31:44.830 --> 00:31:46.990
places, or they wear masks.

00:31:46.990 --> 00:31:51.570
And so you think that you're
reducing the flu a lot by

00:31:51.570 --> 00:31:52.470
these flu shots.

00:31:52.470 --> 00:31:54.530
But maybe in fact, washing
your hands is a lot more

00:31:54.530 --> 00:31:55.380
important than--

00:31:55.380 --> 00:31:57.330
in fact, I think it probably
is-- washing your hands often

00:31:57.330 --> 00:31:58.990
is a lot more important
than getting a flu shot.

00:31:58.990 --> 00:32:00.800
And that's what's giving
you the effect.

00:32:00.800 --> 00:32:03.670
So if you're scaling up this
impact in the treatment on the

00:32:03.670 --> 00:32:07.440
treated to give yourself credit
for the fact that 50%

00:32:07.440 --> 00:32:08.710
of the people didn't
get a flu shot.

00:32:08.710 --> 00:32:10.980
If instead they're actually
washing their hands a lot,

00:32:10.980 --> 00:32:13.610
then this won't be the
correct estimate of

00:32:13.610 --> 00:32:15.900
treatment on the treated.

00:32:15.900 --> 00:32:19.096
Is that reasonably clear?

00:32:19.096 --> 00:32:20.578
AUDIENCE: I have a question.

00:32:20.578 --> 00:32:24.010
Will it be a good estimate of
the impact of that specific

00:32:24.010 --> 00:32:25.400
intervention on the treated?

00:32:25.400 --> 00:32:29.524
So instead of measuring the
impact of the flu shot, you're

00:32:29.524 --> 00:32:32.512
measuring the impact of
reminding people about flu

00:32:32.512 --> 00:32:34.520
shots and giving them
access to free ones?

00:32:34.520 --> 00:32:37.090
SHAWN COLE: So would you want
to scale that up or not?

00:32:37.090 --> 00:32:38.948
Which estimate would you
want to take there?

00:32:38.948 --> 00:32:40.945
AUDIENCE: Depends on what
you're interested in.

00:32:40.945 --> 00:32:42.310
SHAWN COLE: So suppose we're
interested in, what's the

00:32:42.310 --> 00:32:43.980
impact of sending a letter
to somebody and

00:32:43.980 --> 00:32:46.780
offering them a flu shot?

00:32:46.780 --> 00:32:50.250
Is the correct estimate 5
percentage points or 10

00:32:50.250 --> 00:32:51.500
percentage points?

00:32:54.380 --> 00:32:55.420
AUDIENCE: 5.

00:32:55.420 --> 00:32:56.120
SHAWN COLE: Why?

00:32:56.120 --> 00:32:58.625
AUDIENCE: Then your intent
to treat is maybe more

00:32:58.625 --> 00:33:01.365
interesting because you want
to take into account that

00:33:01.365 --> 00:33:04.116
people change their behavior and
their compliance rates are

00:33:04.116 --> 00:33:04.280
[UNINTELLIGIBLE].

00:33:04.280 --> 00:33:04.590
SHAWN COLE: Right.

00:33:04.590 --> 00:33:07.030
So then we're really interested
in the package of

00:33:07.030 --> 00:33:11.540
effects that sending a letter
causes, which includes

00:33:11.540 --> 00:33:12.940
sometimes getting a
shot, more likely.

00:33:12.940 --> 00:33:15.330
But also washing your hands
or being more careful.

00:33:15.330 --> 00:33:16.890
So the intent to treat--

00:33:16.890 --> 00:33:18.633
and as we say, so if you have
this situation, you should

00:33:18.633 --> 00:33:19.850
only use intent to treat.

00:33:19.850 --> 00:33:21.170
And intent to treat is very
interesting and it tells us

00:33:21.170 --> 00:33:23.480
the effect of sending out
a letter, but it doesn't

00:33:23.480 --> 00:33:26.720
necessarily tell us the effect
of the flu shot.

00:33:26.720 --> 00:33:31.300
Anybody have any examples from
their own programs or

00:33:31.300 --> 00:33:33.410
projected projects where they're
concerned about this?

00:33:37.300 --> 00:33:40.260
Maybe you guys can be at least
sure to work this in on your

00:33:40.260 --> 00:33:43.470
presentations tomorrow.

00:33:43.470 --> 00:33:46.350
What about when you have
spillovers or externalities?

00:33:46.350 --> 00:33:48.270
So I think Michael talked
about spillovers and

00:33:48.270 --> 00:33:49.100
externalities this morning.

00:33:49.100 --> 00:33:50.800
But he might not have
integrated that with

00:33:50.800 --> 00:33:53.100
intention to treat.

00:33:53.100 --> 00:33:54.090
Is that a correct

00:33:54.090 --> 00:33:55.570
characterization of his lecture?

00:33:55.570 --> 00:33:56.820
Excellent.

00:34:01.130 --> 00:34:03.440
How could we go wrong-- let's
stick to something I know

00:34:03.440 --> 00:34:06.450
well-- with the Balsakhi
Program.

00:34:06.450 --> 00:34:07.860
In the Balsakhi Program,
we have

00:34:07.860 --> 00:34:09.980
treatment and control schools.

00:34:09.980 --> 00:34:17.300
And sort of the compliance is
20% in the treatment schools

00:34:17.300 --> 00:34:20.010
and 0% in the control schools.

00:34:20.010 --> 00:34:23.420
And the change in test
scores is, let's

00:34:23.420 --> 00:34:26.050
say, 1 standard deviation.

00:34:26.050 --> 00:34:28.690
No, that's going to
be way too high.

00:34:28.690 --> 00:34:33.510
0.2 standard deviations
here and 0

00:34:33.510 --> 00:34:36.489
standard deviations here.

00:34:36.489 --> 00:34:39.880
So quickly, as a review, how do
we get the ITT estimator?

00:34:45.159 --> 00:34:47.810
This is what we call
a chip shot at HBS.

00:34:47.810 --> 00:34:50.520
But since we're not awarding
points for participation there

00:34:50.520 --> 00:34:52.344
aren't a lot of golfers
out there.

00:34:52.344 --> 00:34:53.312
AUDIENCE: Intention to treat?

00:34:53.312 --> 00:34:54.280
It's just 0.2.

00:34:54.280 --> 00:34:57.430
SHAWN COLE: OK and how
did you get that?

00:34:57.430 --> 00:34:59.335
You just saw it?

00:34:59.335 --> 00:35:00.670
AUDIENCE: You just
subtract it.

00:35:00.670 --> 00:35:01.560
You don't have to do anything.

00:35:01.560 --> 00:35:02.885
SHAWN COLE: So it's 0.2.

00:35:02.885 --> 00:35:03.920
AUDIENCE: Minus 0.

00:35:03.920 --> 00:35:05.340
SHAWN COLE: Minus 0.

00:35:05.340 --> 00:35:06.363
Over?

00:35:06.363 --> 00:35:09.470
AUDIENCE: Wait, are
you saying--

00:35:09.470 --> 00:35:09.910
SHAWN COLE: Oh, sorry.

00:35:09.910 --> 00:35:11.320
Intent to treat.

00:35:11.320 --> 00:35:11.780
Yeah, exactly.

00:35:11.780 --> 00:35:12.350
Yeah, you're right.

00:35:12.350 --> 00:35:13.030
Sorry, my bad.

00:35:13.030 --> 00:35:15.500
So the intent to treat is 0.2.

00:35:15.500 --> 00:35:18.143
And what's the treatment
on the treated?

00:35:18.143 --> 00:35:19.690
AUDIENCE: 0.2 minus 0.

00:35:19.690 --> 00:35:20.944
SHAWN COLE: Great.

00:35:20.944 --> 00:35:25.540
AUDIENCE: Over 0.2 minus 0.

00:35:25.540 --> 00:35:27.650
SHAWN COLE: This is confusing.

00:35:27.650 --> 00:35:29.570
This is the standard deviations
in test score and

00:35:29.570 --> 00:35:31.590
this is the percentage
compliance.

00:35:31.590 --> 00:35:33.440
And so what's that
going to give us?

00:35:33.440 --> 00:35:33.870
AUDIENCE: 1.

00:35:33.870 --> 00:35:35.680
SHAWN COLE: 1.

00:35:35.680 --> 00:35:38.520
So wow, that's a spectacularly
effective program.

00:35:38.520 --> 00:35:42.600
I was very proud to be
associated with it.

00:35:42.600 --> 00:35:44.520
It raised test scores by
1 standard deviation.

00:35:44.520 --> 00:35:46.730
Which if you know the education
literature, is a

00:35:46.730 --> 00:35:47.980
pretty big impact.

00:35:50.970 --> 00:35:55.100
What might we be concerned
about in this case?

00:35:55.100 --> 00:35:56.570
So what did the Balsakhi
Program do?

00:35:56.570 --> 00:35:57.490
Refresh ourselves.

00:35:57.490 --> 00:36:00.070
It takes 20% of the students,
pulls them out of the

00:36:00.070 --> 00:36:03.150
classroom during the
regular class, and

00:36:03.150 --> 00:36:04.440
sends them to a tutor.

00:36:04.440 --> 00:36:06.360
And these are the kids who are
often in the back of the

00:36:06.360 --> 00:36:08.240
class, the teacher's not paying
attention to them

00:36:08.240 --> 00:36:11.040
because they're only teaching
to the top of the class.

00:36:11.040 --> 00:36:12.830
Maybe they're making
trouble, throwing

00:36:12.830 --> 00:36:14.080
things around, et cetera.

00:36:17.150 --> 00:36:19.716
What could go wrong?

00:36:19.716 --> 00:36:22.999
AUDIENCE: You're attributing the
fact to just taking this

00:36:22.999 --> 00:36:25.701
class to just those students
in particular.

00:36:25.701 --> 00:36:28.106
Whereas you're taking them out
of the class, so you're making

00:36:28.106 --> 00:36:30.511
the class that you left
behind smaller.

00:36:30.511 --> 00:36:32.440
So there's effects--

00:36:32.440 --> 00:36:34.010
SHAWN COLE: Right, so how does
making the class that you left

00:36:34.010 --> 00:36:34.950
behind smaller matter?

00:36:34.950 --> 00:36:39.340
AUDIENCE: Usually it's easier
to learn in a smaller group.

00:36:39.340 --> 00:36:41.960
SHAWN COLE: That's why we teach
a JPAL maximum class

00:36:41.960 --> 00:36:44.960
size of 15 or 20.

00:36:44.960 --> 00:36:49.330
We hope that small classes
are more effective.

00:36:49.330 --> 00:36:52.180
And there's also a tracking
argument that maybe now the

00:36:52.180 --> 00:36:56.080
teacher can focus really on the
homogeneous group rather

00:36:56.080 --> 00:36:58.510
than having to teach to multiple
levels, et cetera.

00:36:58.510 --> 00:37:00.520
So there are all these other
reasons to think that it may

00:37:00.520 --> 00:37:02.920
be the case that there are going
to be some spillovers.

00:37:02.920 --> 00:37:07.830
So how would you explain in
words to a policymaker why

00:37:07.830 --> 00:37:09.710
you're not sure that 1 is
the right effect of

00:37:09.710 --> 00:37:10.960
treatment on treated?

00:37:15.460 --> 00:37:17.610
This is particularly for the IPA
folks who will have to be

00:37:17.610 --> 00:37:20.370
doing this to earn
their paychecks.

00:37:23.457 --> 00:37:24.707
But anybody is welcome.

00:37:27.222 --> 00:37:32.020
AUDIENCE: There were sidebar
benefits to the control group.

00:37:32.020 --> 00:37:33.890
SHAWN COLE: Right.

00:37:33.890 --> 00:37:35.100
Well, not the control group.

00:37:35.100 --> 00:37:37.510
AUDIENCE: I mean to the--

00:37:37.510 --> 00:37:39.510
SHAWN COLE: Untreated students
in the treatment group.

00:37:39.510 --> 00:37:41.710
So we were attributing all of
this test gain to sending

00:37:41.710 --> 00:37:44.260
these kids out to class.

00:37:44.260 --> 00:37:47.020
But in fact, there could've
been some test gain in the

00:37:47.020 --> 00:37:48.440
other groups.

00:37:48.440 --> 00:37:51.650
In the extreme, I suppose you
could imagine that there's no

00:37:51.650 --> 00:37:53.770
performance gain for the
children who go out to the

00:37:53.770 --> 00:37:57.520
Balsakhis, but getting those
misbehaving children out of

00:37:57.520 --> 00:38:00.750
the class causes everybody else
to learn so much more

00:38:00.750 --> 00:38:03.750
effectively that it raises
the test score.
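
NOTE
A small sketch of why spillovers make the scaled-up number ambiguous. Both scenarios are hypothetical and chosen only so that the school-wide average gain is 0.2 standard deviations in each; the function name is made up.
```python
def school_average_gain(share_tutored, gain_tutored, gain_others):
    """Average test-score gain in a treatment school, in standard deviations."""
    return share_tutored * gain_tutored + (1 - share_tutored) * gain_others
# Scenario A: only the tutored 20% gain, by 1 standard deviation each.
no_spillover = school_average_gain(0.2, 1.0, 0.0)
# Scenario B: tutored children gain nothing; the other 80% gain 0.25 each.
all_spillover = school_average_gain(0.2, 0.0, 0.25)
print(round(no_spillover, 3), round(all_spillover, 3))
```
Both scenarios give the same intent-to-treat estimate, so dividing it by 20% compliance recovers the effect on tutored children only in Scenario A.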

00:38:03.750 --> 00:38:06.820
So the program still has an
effect, but the way it has an

00:38:06.820 --> 00:38:10.460
effect is by getting
the misbehaving

00:38:10.460 --> 00:38:11.760
children out of the class.

00:38:11.760 --> 00:38:14.140
Now it turns out that that's
not the case. There are

00:38:14.140 --> 00:38:17.050
sort of some interesting ways
to try and tease out how big

00:38:17.050 --> 00:38:18.280
the spillovers were.

00:38:18.280 --> 00:38:19.750
And I encourage you to read
the paper if you're

00:38:19.750 --> 00:38:20.800
interested.

00:38:20.800 --> 00:38:23.800
And it turns out that there's
not really any evidence of

00:38:23.800 --> 00:38:26.380
spillovers, so it really does
seem like the effect happened

00:38:26.380 --> 00:38:29.150
through the Balsakhi Program.

00:38:29.150 --> 00:38:32.080
But it's definitely something we
want to be aware of when

00:38:32.080 --> 00:38:33.330
we're doing our analysis.

00:38:38.060 --> 00:38:40.420
OK, so we've already
talked about this.

00:38:40.420 --> 00:38:41.580
If you have partial
compliance, your

00:38:41.580 --> 00:38:45.560
power may be affected.

00:38:45.560 --> 00:38:49.930
So the intention to treat is
often appropriate for program

00:38:49.930 --> 00:38:51.270
evaluations.

00:38:51.270 --> 00:38:52.730
It's simple to calculate.

00:38:52.730 --> 00:38:54.840
It's easy to explain.

00:38:54.840 --> 00:38:57.880
If you do this program, you'll
get a mean change in your

00:38:57.880 --> 00:39:00.240
outcome of 0.3, 0.4, 0.5.

00:39:00.240 --> 00:39:02.570
There are a lot of
advantages to it.

00:39:02.570 --> 00:39:05.620
But sometimes you may be
interested as well in the

00:39:05.620 --> 00:39:06.520
program itself.

00:39:06.520 --> 00:39:09.580
And as we said, it measures the
treatment effect for those

00:39:09.580 --> 00:39:12.980
who take the treatment because
they're assigned to it.

00:39:12.980 --> 00:39:15.260
If you have people who
will never ever

00:39:15.260 --> 00:39:16.180
ever take the treatment.

00:39:16.180 --> 00:39:20.260
If you tried to run a randomized
evaluation on

00:39:20.260 --> 00:39:22.280
Christian Scientists who refuse
medical treatment and

00:39:22.280 --> 00:39:26.830
you assign them free medical
care, you're not going to find

00:39:26.830 --> 00:39:27.380
any effect.

00:39:27.380 --> 00:39:31.780
But we can find the effect of
the treatment on people who

00:39:31.780 --> 00:39:34.350
take the treatment when
they're offered it.

00:39:34.350 --> 00:39:36.600
And so when you're doing the
design of your experiment,

00:39:36.600 --> 00:39:39.270
it's important to think through
these issues and think

00:39:39.270 --> 00:39:42.040
at the end of the day, at the
end of the study, in two years'

00:39:42.040 --> 00:39:45.150
time when we've collected all
the data and analyzed it, what

00:39:45.150 --> 00:39:46.480
sort of results are we
going to report?

00:39:46.480 --> 00:39:47.910
How are we going
to report them?

00:39:47.910 --> 00:39:50.588
And how are we going to explain
them to other folks?

00:39:53.980 --> 00:39:58.650
So that is intention to treat
and treatment on the treated.

00:39:58.650 --> 00:40:02.230
And let's move briefly to the
choice of outcomes and

00:40:02.230 --> 00:40:03.480
covariates.

00:40:07.640 --> 00:40:12.080
We always look forward to your
feedback from the course.

00:40:12.080 --> 00:40:15.580
But in my view, the course might
benefit from a little

00:40:15.580 --> 00:40:18.820
bit more focus on some of the
practical aspects of doing an

00:40:18.820 --> 00:40:19.460
evaluation.

00:40:19.460 --> 00:40:24.690
So for example, survey design
and determining what outcomes.

00:40:24.690 --> 00:40:27.100
Although maybe many of you are
already familiar with that.

00:40:27.100 --> 00:40:30.220
So often when you do these
randomized evaluations, you do

00:40:30.220 --> 00:40:32.940
like a household survey or you
have administrative data on

00:40:32.940 --> 00:40:35.980
the individual and you have 100
or you have 150 variables

00:40:35.980 --> 00:40:36.370
about them.

00:40:36.370 --> 00:40:38.820
So you go into the field with
a 20 page survey and you sit

00:40:38.820 --> 00:40:40.890
and you ask, how many
cows do you have?

00:40:40.890 --> 00:40:42.330
How many goats do you have?

00:40:42.330 --> 00:40:43.840
How much land do you have?

00:40:43.840 --> 00:40:46.040
What was your monthly income?

00:40:46.040 --> 00:40:48.540
Are your children going
to school, et cetera.

00:40:48.540 --> 00:40:52.290
And so if you imagine an outcome
like microfinance,

00:40:52.290 --> 00:40:57.400
which we hope causes people to
be more productive and engaged

00:40:57.400 --> 00:41:00.470
and boosts household income,
arguably boosts women's

00:41:00.470 --> 00:41:01.930
empowerment, et cetera.

00:41:01.930 --> 00:41:04.420
There are sort of a lot of
things that could plausibly

00:41:04.420 --> 00:41:09.420
happen because you offer
people microfinance.

00:41:09.420 --> 00:41:12.070
But from a statistical side,
this offers some challenges.

00:41:12.070 --> 00:41:15.390
So what's the problem when
you're interested in how

00:41:15.390 --> 00:41:17.480
microfinance affects 40
different outcomes.

00:41:17.480 --> 00:41:22.350
So education, consumption,
feelings of empowerment,

00:41:22.350 --> 00:41:23.330
number of hours worked.

00:41:23.330 --> 00:41:25.310
You know, you can come up with a
lot of plausible things that

00:41:25.310 --> 00:41:27.320
you think microfinance
would affect.

00:41:33.240 --> 00:41:36.250
AUDIENCE: At a 0.05 level, two of
those will come out

00:41:36.250 --> 00:41:38.980
significant having
40 even though--

00:41:38.980 --> 00:41:39.910
SHAWN COLE: Right.

00:41:39.910 --> 00:41:45.110
So for people who haven't done
hypothesis testing, hypothesis

00:41:45.110 --> 00:41:48.880
testing, which is what we use
to analyze data says, how

00:41:48.880 --> 00:41:50.510
likely is it that the difference
between the

00:41:50.510 --> 00:41:52.710
treatment and the control
group is because of the

00:41:52.710 --> 00:41:55.990
program or simply because
of random chance?

00:41:55.990 --> 00:41:58.710
And so you can look at the
distribution of outcomes in

00:41:58.710 --> 00:42:00.690
the treatment group and the
control group and observe

00:42:00.690 --> 00:42:02.820
their means and their variances
and the number of

00:42:02.820 --> 00:42:07.480
observations, and you can come
up with a rule that says it's

00:42:07.480 --> 00:42:08.500
very, very unlikely.

00:42:08.500 --> 00:42:11.680
Only 1 in 100 times would this
thing have happened because of

00:42:11.680 --> 00:42:13.240
random chance.

00:42:13.240 --> 00:42:16.310
The standard that tends to be
used in economics and the

00:42:16.310 --> 00:42:19.290
medical literature is this
5% p value, which

00:42:19.290 --> 00:42:22.190
is 1 out of 20 times.

00:42:22.190 --> 00:42:26.450
But if you're looking at 40
outcomes, then on average, 2

00:42:26.450 --> 00:42:28.580
of them are going to be
statistically significant just

00:42:28.580 --> 00:42:29.240
out of random chance.
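
NOTE
Editor's sketch of the arithmetic in the preceding cue, not part of the lecture. It assumes the 40 tests are independent and that no outcome has a true effect; the function names are illustrative.
```python
# Expected number of false positives when testing many outcomes at one level.
def expected_false_positives(n_outcomes: int, alpha: float) -> float:
    return n_outcomes * alpha
# Probability that at least one test is significant purely by chance,
# assuming the tests are independent (an assumption, rarely exact in surveys).
def prob_at_least_one(n_outcomes: int, alpha: float) -> float:
    return 1.0 - (1.0 - alpha) ** n_outcomes
if __name__ == "__main__":
    print(expected_false_positives(40, 0.05))
    print(round(prob_at_least_one(40, 0.05), 3))
```
Under those assumptions, 40 outcomes at the 5% level give about 2 expected false positives and roughly an 87% chance of at least one.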

00:42:29.240 --> 00:42:33.030
So if I were to just randomly
divide this group, this class

00:42:33.030 --> 00:42:35.810
into two groups, and started
looking at things like US born

00:42:35.810 --> 00:42:41.210
or foreign born, or Yankees fan
or Red Sox fan, or what

00:42:41.210 --> 00:42:43.880
have you, it wouldn't be too
hard to eventually find

00:42:43.880 --> 00:42:46.160
something that was
statistically significantly

00:42:46.160 --> 00:42:49.500
different between
the two groups.

00:42:49.500 --> 00:42:54.770
This is a challenge.

00:42:54.770 --> 00:42:57.240
There are a few ways to deal

00:42:57.240 --> 00:42:58.140
effectively with this challenge.

00:42:58.140 --> 00:43:01.020
So this isn't a particularly
difficult challenge.

00:43:01.020 --> 00:43:03.780
So what the medical literature
does, and which, I think, we're

00:43:03.780 --> 00:43:07.310
sort of slowly moving towards
in social sciences, is you

00:43:07.310 --> 00:43:11.090
have to state in advance where
you expect to find effects.

00:43:11.090 --> 00:43:13.380
So if you're going to the FDA
to test the efficacy of a

00:43:13.380 --> 00:43:17.030
drug, when you apply for the
phase III evaluation you have

00:43:17.030 --> 00:43:19.110
to say, I think this is
going to cure Shawn's

00:43:19.110 --> 00:43:20.790
male pattern baldness.

00:43:20.790 --> 00:43:27.760
And I think that this is going
to result in a more charming

00:43:27.760 --> 00:43:28.480
personality.

00:43:28.480 --> 00:43:32.400
And then you find Shawn and all
of his brothers and you

00:43:32.400 --> 00:43:33.170
run the experiment.

00:43:33.170 --> 00:43:37.180
And then if the outcome turns
out that the treatment group

00:43:37.180 --> 00:43:43.030
now never gets sick, is simply
immune from all diseases.

00:43:43.030 --> 00:43:46.450
In the past year, nobody in the
treatment group got sick,

00:43:46.450 --> 00:43:48.680
you can't then go out and market
that drug as something

00:43:48.680 --> 00:43:51.520
that cures diseases because
that wasn't your stated

00:43:51.520 --> 00:43:54.470
hypothesis test ahead of time.

00:43:54.470 --> 00:43:57.320
So oftentimes, we have a
lot of guidance on what

00:43:57.320 --> 00:43:58.420
hypotheses to test.

00:43:58.420 --> 00:44:01.470
So if we're doing a financial
literacy program, we expect

00:44:01.470 --> 00:44:04.400
that to affect financial
outcomes and not employment

00:44:04.400 --> 00:44:08.830
and not divorce rates.

00:44:08.830 --> 00:44:10.500
Although you could
tell a story that

00:44:10.500 --> 00:44:12.250
eventually gets you there.

00:44:12.250 --> 00:44:14.920
But when you're reporting your
results to the world then,

00:44:14.920 --> 00:44:16.950
what you want to do is report
the results on all the

00:44:16.950 --> 00:44:18.770
measured outcomes,
even the ones for

00:44:18.770 --> 00:44:20.330
which you find no effect.

00:44:20.330 --> 00:44:23.790
So then, anybody who takes the
study can look and say, OK,

00:44:23.790 --> 00:44:27.070
they're saying that their
program has a great effect on

00:44:27.070 --> 00:44:30.610
income and children's schooling
and health.

00:44:30.610 --> 00:44:35.560
But they tested 200 things, so
I'm a little bit skeptical.

00:44:35.560 --> 00:44:38.050
Or, they only tested 6 things
and half of them had

00:44:38.050 --> 00:44:40.400
large statistically significant
impacts, so I

00:44:40.400 --> 00:44:41.840
really believe that study.

00:44:46.950 --> 00:44:49.020
Maybe it's a little unfortunate
the last class of

00:44:49.020 --> 00:44:51.470
the course is about sort of
challenges with randomized

00:44:51.470 --> 00:44:56.000
evaluations because I should
just take a sidebar to

00:44:56.000 --> 00:44:57.800
emphasize that these problems
we're talking about are not

00:44:57.800 --> 00:44:59.370
unique to randomized
evaluations.

00:44:59.370 --> 00:45:02.310
Any evaluation you
run these risks.

00:45:02.310 --> 00:45:06.310
So if it's just sort of the
standard, what we like to

00:45:06.310 --> 00:45:08.170
think of as a pretty bad
evaluation where you just find

00:45:08.170 --> 00:45:10.820
some treatment people and find
some associated comparison

00:45:10.820 --> 00:45:13.460
people and do a survey, you're
going to run into this exact

00:45:13.460 --> 00:45:13.940
same problem.

00:45:13.940 --> 00:45:17.640
So don't take many of these as
a criticism of randomized

00:45:17.640 --> 00:45:21.010
evaluation, but just take
them as good science.

00:45:21.010 --> 00:45:22.620
What to do for good science.

00:45:22.620 --> 00:45:25.360
And there are other things you
can do which is to adjust your

00:45:25.360 --> 00:45:26.440
standard errors.

00:45:26.440 --> 00:45:28.840
So we have a very simple way of

00:45:28.840 --> 00:45:30.180
calculating standard errors.

00:45:30.180 --> 00:45:32.630
But if you're testing multiple
hypotheses, then you can

00:45:32.630 --> 00:45:35.430
actually statistically take that
into account and come up

00:45:35.430 --> 00:45:38.180
with sort of corrected
or bounds on

00:45:38.180 --> 00:45:38.820
your standard errors.

00:45:38.820 --> 00:45:42.460
And those are described in the
literature that I'm going to

00:45:42.460 --> 00:45:44.572
refer you to at the end
of the talk as well.
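
NOTE
Editor's sketch, not part of the lecture. The speaker does not name a specific correction; Bonferroni and Holm are swapped in here as two standard ones. Both tighten the significance threshold rather than changing the standard errors themselves. The p-values are hypothetical.
```python
# Bonferroni: compare every p-value with alpha divided by the number of tests.
def bonferroni_reject(p_values, alpha=0.05):
    m = len(p_values)
    return [p <= alpha / m for p in p_values]
# Holm step-down: test the smallest p-value against alpha/m, the next against
# alpha/(m-1), and so on, stopping at the first one that fails.
def holm_reject(p_values, alpha=0.05):
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject
if __name__ == "__main__":
    p = [0.001, 0.01, 0.02, 0.04]  # hypothetical p-values for four outcomes
    print(bonferroni_reject(p))
    print(holm_reject(p))
```
Each of these four p-values would pass an unadjusted 5% test. Bonferroni keeps only the two smallest, while Holm, which is never more conservative than Bonferroni, keeps all four in this particular example.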

00:45:44.572 --> 00:45:46.600
AUDIENCE: What about taking half
your data and mining it?

00:45:49.250 --> 00:45:51.610
SHAWN COLE: And throwing
the rest away?

00:45:51.610 --> 00:45:53.860
AUDIENCE: No, I mine
half of my data.

00:45:53.860 --> 00:45:57.010
I just figure out what really
matters and that's how I

00:45:57.010 --> 00:45:58.260
generate my hypothesis.

00:46:01.142 --> 00:46:04.258
SHAWN COLE: That sounds
reasonable to me.

00:46:04.258 --> 00:46:06.750
AUDIENCE: I better have had
a big enough sample.

00:46:06.750 --> 00:46:08.710
SHAWN COLE: Yeah, you better
have had a big enough sample.

00:46:08.710 --> 00:46:12.460
I think sort of that
intuition is--

00:46:12.460 --> 00:46:15.810
data mining is problematic, but
it is useful when you run

00:46:15.810 --> 00:46:18.200
a study to report all of your
results because if there are

00:46:18.200 --> 00:46:21.190
some surprising things there, we
don't know whether they're

00:46:21.190 --> 00:46:23.260
there because of chance
or because the

00:46:23.260 --> 00:46:24.130
program had that effect.

00:46:24.130 --> 00:46:27.400
But that gives us sort of a
view on what to do when we

00:46:27.400 --> 00:46:28.110
test again.

00:46:28.110 --> 00:46:30.590
So if we're going to try this
microfinance program out in

00:46:30.590 --> 00:46:32.910
some other country or some other
area of the country, we

00:46:32.910 --> 00:46:36.350
said it had this surprising
effect on girls' empowerment.

00:46:36.350 --> 00:46:39.280
So now let's make that one of
our key outcomes and let's

00:46:39.280 --> 00:46:39.810
test it again.

00:46:39.810 --> 00:46:41.700
So I think that sounds
pretty reasonable.

00:46:44.660 --> 00:46:47.010
There's a pretty advanced,
developed statistical

00:46:47.010 --> 00:46:50.000
literature on hypothesis testing
and how you develop

00:46:50.000 --> 00:46:51.540
hypotheses.

00:46:51.540 --> 00:46:53.976
But that sounds not unreasonable
to me.

00:46:53.976 --> 00:46:55.226
Other thoughts?

00:47:00.670 --> 00:47:06.490
So another possibility is
heterogeneous treatment

00:47:06.490 --> 00:47:08.600
effects, which is what
we often look for.

00:47:08.600 --> 00:47:11.180
So you might think that the
financial literacy program is

00:47:11.180 --> 00:47:14.240
more effective with women than
men because women started out

00:47:14.240 --> 00:47:16.770
with lower levels of initial
financial literacy, or you

00:47:16.770 --> 00:47:19.200
might think that the de-worming
medication is more

00:47:19.200 --> 00:47:23.800
effective for children who live
near the river because

00:47:23.800 --> 00:47:25.890
then they play outside more
often, or something like that.

00:47:25.890 --> 00:47:28.170
And so it's very tempting to
run your regressions for

00:47:28.170 --> 00:47:29.580
different subgroups.

00:47:29.580 --> 00:47:33.900
But again, there's this risk
that you're data mining.

00:47:33.900 --> 00:47:39.430
Suppose I wanted to show that
I randomly assigned you this

00:47:39.430 --> 00:47:42.430
group into a treatment
and control group.

00:47:42.430 --> 00:47:46.750
And I wanted to show that
Yankees and Red Sox

00:47:46.750 --> 00:47:49.170
preferences were significantly
different in the two groups.

00:47:49.170 --> 00:47:52.590
I could probably cut the data
in different ways and

00:47:52.590 --> 00:47:56.100
eventually find some subset of
you for whom the treatment and

00:47:56.100 --> 00:47:59.080
the control variables were
actually different.

00:47:59.080 --> 00:48:03.200
And so you want to be aware
of that possibility.

00:48:03.200 --> 00:48:09.570
And again, like the FDA drug
trial way to avoid this is

00:48:09.570 --> 00:48:12.170
just to announce in advance
which subgroups you expect

00:48:12.170 --> 00:48:14.550
this product to be more or less
effective for and make

00:48:14.550 --> 00:48:18.140
sure you have a sufficient
sample size to test

00:48:18.140 --> 00:48:20.010
statistical significance
within those subgroups.

00:48:24.160 --> 00:48:28.010
Again, as a service to all the
consumers of your studies,

00:48:28.010 --> 00:48:29.500
report the results on
all the subgroups.

00:48:29.500 --> 00:48:34.370
Even the subgroups for whom the
program's not effective.

00:48:34.370 --> 00:48:35.890
So there's another
problem that--

00:48:35.890 --> 00:48:39.320
did we talk about clustering and
grouped data in the power

00:48:39.320 --> 00:48:39.870
calculations?

00:48:39.870 --> 00:48:46.370
OK, so that is sort of the bane
of statistical analysis.

00:48:46.370 --> 00:48:50.050
Or as we liked to say when I was
a graduate student, sort

00:48:50.050 --> 00:48:52.890
of people used to not really
appreciate the importance of

00:48:52.890 --> 00:48:54.230
these grouped errors.

00:48:54.230 --> 00:48:56.370
It was much easier to write a
paper back then because you

00:48:56.370 --> 00:48:58.740
found lots of statistically
significant results.

00:48:58.740 --> 00:49:01.980
And once you start using
this cluster

00:49:01.980 --> 00:49:04.180
adjustment, it's a lot harder.

00:49:04.180 --> 00:49:06.700
Now only 5% of results are
statistically significant.

00:49:06.700 --> 00:49:10.110
So we whined that if only we
had graduated five years

00:49:10.110 --> 00:49:11.630
earlier, it would have
been much easier to

00:49:11.630 --> 00:49:12.680
get our thesis done.

00:49:12.680 --> 00:49:17.140
But we here don't care about
getting the thesis done.

00:49:17.140 --> 00:49:22.390
We here care about finding out
the truth, and the

00:49:22.390 --> 00:49:25.110
potential for clustering in
standard errors, or for

00:49:25.110 --> 00:49:28.830
common shocks within the
group, is very strong.

00:49:28.830 --> 00:49:31.840
You can often get estimates of
the correlation within groups

00:49:31.840 --> 00:49:33.670
using survey data that
you already have.

00:49:33.670 --> 00:49:36.090
That will inform your
power calculations.

00:49:36.090 --> 00:49:39.760
But when you're doing your
analysis, you need to adjust

00:49:39.760 --> 00:49:43.600
your statistical results
for this clustering.

00:49:43.600 --> 00:49:47.130
And in particular, you run into
problems when your groups

00:49:47.130 --> 00:49:48.030
are very small.

00:49:48.030 --> 00:49:50.810
So if you're thinking, I want
to do an evaluation where I

00:49:50.810 --> 00:49:54.090
had 15 treatment villages
and 15 control villages.

00:49:54.090 --> 00:49:55.760
Well, it's very likely that
outcomes are going to be

00:49:55.760 --> 00:49:57.610
correlated within
that village.

00:49:57.610 --> 00:50:00.200
And then, all of a sudden, you
only have 30 clusters.

00:50:00.200 --> 00:50:02.990
And even the statistical
technique isn't great for

00:50:02.990 --> 00:50:04.820
sample sizes that
are that small.

00:50:04.820 --> 00:50:07.320
You have to use other
statistical techniques, which

00:50:07.320 --> 00:50:10.950
are sort of valid, but
less powerful.

00:50:10.950 --> 00:50:14.650
So we won't go into the
randomization inference.

00:50:14.650 --> 00:50:18.410
Again, it's mentioned in this
paper I was talking about.

00:50:22.000 --> 00:50:25.140
You should be aware of this and
I think the most important

00:50:25.140 --> 00:50:27.300
time to think about this is
when you're designing your

00:50:27.300 --> 00:50:30.890
evaluation, to make sure you
get enough clusters.

00:50:30.890 --> 00:50:34.670
And if you can, it's almost
always preferable to randomize

00:50:34.670 --> 00:50:37.320
by individual than group.

00:50:37.320 --> 00:50:40.010
Because randomizing by group
requires typically, much

00:50:40.010 --> 00:50:41.070
larger sample sizes.
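
NOTE
Editor's sketch, not part of the lecture. It uses the standard design-effect approximation for equal-sized clusters to show why group-level randomization needs larger samples. The 30 villages come from the example above; 50 households per village and an intra-cluster correlation of 0.1 are hypothetical.
```python
# Design effect for equal-sized clusters: 1 + (m - 1) * rho, where m is the
# cluster size and rho is the intra-cluster correlation.
def design_effect(cluster_size: int, rho: float) -> float:
    return 1.0 + (cluster_size - 1) * rho
# Number of independent observations the clustered sample is "worth".
def effective_sample_size(n_clusters: int, cluster_size: int, rho: float) -> float:
    return n_clusters * cluster_size / design_effect(cluster_size, rho)
if __name__ == "__main__":
    # 30 villages as in the lecture; cluster size and rho are assumptions.
    print(round(effective_sample_size(30, 50, 0.1), 1))
```
With these assumed numbers, 1,500 surveyed households carry about as much information as roughly 254 independently randomized ones. With rho equal to zero the two sample sizes coincide.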

00:50:41.070 --> 00:50:43.502
AUDIENCE: Can you just say one
more thing or give us a

00:50:43.502 --> 00:50:46.525
reference on what you meant by
other statistical techniques

00:50:46.525 --> 00:50:49.550
that are valid, but
not as powerful?

00:50:49.550 --> 00:50:50.480
SHAWN COLE: Sure.

00:50:50.480 --> 00:50:52.660
Let me just skip to this.

00:50:52.660 --> 00:50:57.730
This should be on your
slide package.

00:50:57.730 --> 00:51:00.930
The ultimate slide is additional
resources.

00:51:00.930 --> 00:51:04.200
And so there's a paper called
"Using Randomization in

00:51:04.200 --> 00:51:05.890
Development Economics Research:
A Toolkit," by

00:51:05.890 --> 00:51:07.410
Esther, Rachel, and Michael.

00:51:07.410 --> 00:51:09.390
And this goes through everything
we've talked about

00:51:09.390 --> 00:51:12.560
this week in pretty careful
detail, works out the math,

00:51:12.560 --> 00:51:14.630
and gives you references
to what you need.

00:51:14.630 --> 00:51:17.470
So it's in there.

00:51:17.470 --> 00:51:20.950
Josh Angrist, who's on the
faculty here, has a very good

00:51:20.950 --> 00:51:23.840
book called "Mostly Harmless
Econometrics." But it's

00:51:23.840 --> 00:51:24.840
designed for academics.

00:51:24.840 --> 00:51:26.110
It's not really a textbook.

00:51:26.110 --> 00:51:28.900
But it goes through these things
in very, very, very

00:51:28.900 --> 00:51:30.870
good detail and it's
fun to read.

00:51:30.870 --> 00:51:33.110
But specifically, the
technique is called

00:51:33.110 --> 00:51:34.825
randomization inference.

00:51:34.825 --> 00:51:37.260
It was developed by Fisher.

00:51:37.260 --> 00:51:39.225
And what you do is
you basically--

00:51:45.420 --> 00:51:47.210
so you have your treatment and
your control group and your

00:51:47.210 --> 00:51:49.760
mean between the treatment and
the mean in the control and

00:51:49.760 --> 00:51:51.510
you test the statistical
significance.

00:51:51.510 --> 00:51:54.190
And then what you do is you
just randomly reassign

00:51:54.190 --> 00:51:57.320
everybody to either treatment or
control, regardless of what

00:51:57.320 --> 00:51:59.920
they actually did, and see if
there's a difference between

00:51:59.920 --> 00:52:01.330
the treatment and
control group.

00:52:01.330 --> 00:52:05.470
And if you do that a hundred
times, you can sort of get a

00:52:05.470 --> 00:52:09.090
sense for how often you find
statistically significant

00:52:09.090 --> 00:52:10.070
differences or not.

00:52:10.070 --> 00:52:11.340
It's related to bootstrapping.

00:52:16.940 --> 00:52:19.040
That's a reasonable method, but
the problem with that is

00:52:19.040 --> 00:52:21.160
the statistical power's
not very good.

00:52:21.160 --> 00:52:23.050
So you need a larger sample.

00:52:23.050 --> 00:52:25.140
And then once you have a larger
sample, then you don't

00:52:25.140 --> 00:52:27.070
need to worry about it because
you can cluster.

00:52:30.630 --> 00:52:32.980
So another question that's
maybe a little bit more

00:52:32.980 --> 00:52:35.670
technical is when you're doing
your analysis, your regression

00:52:35.670 --> 00:52:40.690
analysis, is what covariates
do you want to control for?

00:52:40.690 --> 00:52:43.510
So we're looking at the effect
of financial literacy

00:52:43.510 --> 00:52:46.960
education on credit
card repayment.

00:52:46.960 --> 00:52:48.760
When we do our statistical
analysis, do we want to

00:52:48.760 --> 00:52:53.310
control for the age of the
person, for their gender, for

00:52:53.310 --> 00:52:55.420
their initial measured
level of financial

00:52:55.420 --> 00:52:57.660
literacy, et cetera.

00:52:57.660 --> 00:52:59.860
Now the beauty of randomization
is that it

00:52:59.860 --> 00:53:00.920
doesn't matter.

00:53:00.920 --> 00:53:04.190
Even if you don't have data on
any of these covariates, as

00:53:04.190 --> 00:53:06.680
long as the program was
initially randomly assigned

00:53:06.680 --> 00:53:12.670
and the sample size is large
enough, then you'll be OK.

00:53:12.670 --> 00:53:16.340
But what the controls can do is
that they can help you get

00:53:16.340 --> 00:53:17.560
a more precise estimate.

00:53:17.560 --> 00:53:20.400
So if a lot of people's credit
card repayment behavior is

00:53:20.400 --> 00:53:24.920
explained by whether they ate a
cookie as a child or not and

00:53:24.920 --> 00:53:26.710
you happen to have that
particular data for people who

00:53:26.710 --> 00:53:31.370
have been reading The New
Yorker, then you can soak up

00:53:31.370 --> 00:53:32.780
some of the variation
and come up with a

00:53:32.780 --> 00:53:34.480
more precise estimate.

00:53:34.480 --> 00:53:35.970
Or you can control for
age, or control for

00:53:35.970 --> 00:53:37.340
income, or other things.

00:53:37.340 --> 00:53:41.760
So it's often desirable to
have additional controls.

00:53:41.760 --> 00:53:44.070
But what you don't want to do is
control for a variable that

00:53:44.070 --> 00:53:46.480
might have been affected by
the treatment itself.

00:53:46.480 --> 00:53:49.620
So if you're looking at the
effect of microfinance on

00:53:49.620 --> 00:53:53.060
women's empowerment, so that's
the goal of your study.

00:53:53.060 --> 00:53:55.990
And then you would say, well,
women who have higher levels

00:53:55.990 --> 00:53:59.760
of income report higher levels
of feeling empowered.

00:53:59.760 --> 00:54:00.910
So that's an important

00:54:00.910 --> 00:54:02.400
determinant of feeling empowered.

00:54:02.400 --> 00:54:04.610
We should include that
in our control

00:54:04.610 --> 00:54:06.100
when we do our analysis.

00:54:06.100 --> 00:54:09.930
But if it turns out that
microfinance increased income

00:54:09.930 --> 00:54:13.330
and increased control, then we
might conclude that there's no

00:54:13.330 --> 00:54:15.910
effect because we're attributing
the effect to the

00:54:15.910 --> 00:54:18.420
differences in income.

00:54:18.420 --> 00:54:19.670
Is that clear?
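
NOTE
Editor's sketch of the "bad control" problem just described, not part of the lecture. The data are constructed (hypothetically) so that treatment raises income by 1 and empowerment depends only on income, an extreme case chosen to make the point visible.
```python
from statistics import mean
# Slope and intercept of a one-variable least-squares fit of y on x.
def ols(y, x):
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx
# What is left of y after removing the part explained by x.
def residuals(y, x):
    slope, intercept = ols(y, x)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]
# Hypothetical data: 50 matched pairs, one treated and one control per pair.
treated = [i % 2 for i in range(100)]
baseline_income = [i // 2 for i in range(100)]
income = [b + d for b, d in zip(baseline_income, treated)]  # treatment raises income by 1
empowerment = [2 * inc for inc in income]  # treatment acts only through income
# Raw difference in means: the total effect of treatment on empowerment.
raw_effect = ols(empowerment, treated)[0]
# Coefficient on treatment after controlling for income (income partialled out first).
adjusted_effect = ols(residuals(empowerment, income), residuals(treated, income))[0]
if __name__ == "__main__":
    print(round(raw_effect, 6), round(adjusted_effect, 6))
```
In this constructed case the raw comparison recovers an effect of 2, while the income-adjusted coefficient is essentially zero, because the control absorbs the channel through which the program works.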

00:54:23.660 --> 00:54:28.890
A lot of these are fairly
nuanced issues and it's often

00:54:28.890 --> 00:54:34.920
worth consulting an academic,
or PhD students, who often are

00:54:34.920 --> 00:54:38.930
eager to work on projects
like this as well.

00:54:38.930 --> 00:54:41.750
Just as a rule, it's important
to report the raw differences

00:54:41.750 --> 00:54:43.310
and the regression
adjusted results.

00:54:48.510 --> 00:54:51.760
I think we advance a very strong
view that randomized

00:54:51.760 --> 00:54:55.780
evaluation is a very credible
method of evaluation.

00:54:55.780 --> 00:54:59.700
But even still, there are always
ways to tweak or twist

00:54:59.700 --> 00:55:01.860
things a little bit to try and
get the results you want.

00:55:01.860 --> 00:55:05.160
So you could have a survey with
a hundred people or a

00:55:05.160 --> 00:55:08.150
hundred outcomes and only
report seven of them.

00:55:08.150 --> 00:55:10.940
That might make your program
look better than it is.

00:55:10.940 --> 00:55:14.110
So these rules we're proposing
are ways to give people an

00:55:14.110 --> 00:55:17.700
honest and a thorough
view of the

00:55:17.700 --> 00:55:18.800
effectiveness of your program.

00:55:18.800 --> 00:55:21.280
So another rule is that when
you're reporting your

00:55:21.280 --> 00:55:24.750
regression results, you should
include the results with the

00:55:24.750 --> 00:55:26.600
covariates, as well as the
results without the

00:55:26.600 --> 00:55:27.850
covariates.

00:55:32.140 --> 00:55:36.550
So now let's talk about threats
to external validity.

00:55:36.550 --> 00:55:39.710
So we spent a lot of time so
far talking about internal

00:55:39.710 --> 00:55:43.880
validity, which is
sort of was the

00:55:43.880 --> 00:55:45.130
treatment randomly assigned?

00:55:45.130 --> 00:55:46.760
Did enough people in the
treatment group comply that

00:55:46.760 --> 00:55:47.600
you have a difference
between the

00:55:47.600 --> 00:55:49.130
treatment and control group?

00:55:49.130 --> 00:55:52.560
But it's not sufficient to know
that we've learned a lot

00:55:52.560 --> 00:55:55.432
from a randomized evaluation.

00:55:55.432 --> 00:56:01.190
So there's some threats that
just doing the evaluation

00:56:01.190 --> 00:56:05.070
itself may have some impact
above and beyond the program.

00:56:05.070 --> 00:56:07.630
And so these are called
Hawthorne and John Henry effects.

00:56:07.630 --> 00:56:11.270
And maybe we'll go back to the
audience for some examples of

00:56:11.270 --> 00:56:12.760
a Hawthorne effect.

00:56:12.760 --> 00:56:15.570
So a Hawthorne effect is when
the treatment group behavior

00:56:15.570 --> 00:56:16.990
changes because of
the experiment.

00:56:16.990 --> 00:56:18.550
What's an example of that?

00:56:18.550 --> 00:56:21.881
Anybody familiar with the
original Hawthorne study?

00:56:21.881 --> 00:56:24.290
AUDIENCE: Was it the lights
being dimmed or different

00:56:24.290 --> 00:56:27.850
levels of lights and workers
felt like they were getting

00:56:27.850 --> 00:56:30.407
attention paid to them
no matter what the

00:56:30.407 --> 00:56:31.590
light level was at?

00:56:31.590 --> 00:56:33.300
It's just that there was
something going on and so

00:56:33.300 --> 00:56:35.710
their productivity was higher.

00:56:35.710 --> 00:56:37.140
SHAWN COLE: I actually don't
remember the study.

00:56:37.140 --> 00:56:38.040
Can I just try and
rephrase that?

00:56:38.040 --> 00:56:38.810
You can tell me whether
I'm right.

00:56:38.810 --> 00:56:41.900
So the experiment was to try and
figure out how the level

00:56:41.900 --> 00:56:44.575
of lighting in a factory
affects productivity?

00:56:44.575 --> 00:56:45.920
I suppose.

00:56:45.920 --> 00:56:48.770
And so they said we're going to
raise the level of lighting

00:56:48.770 --> 00:56:51.610
in this sort of select, maybe
even randomly selected.

00:56:51.610 --> 00:56:55.560
We randomly select 50 out of 100
of our work groups and we

00:56:55.560 --> 00:56:57.440
roll in an extra light.

00:56:57.440 --> 00:56:59.250
And they're like, oh
great, management

00:56:59.250 --> 00:57:00.800
really cares about us.

00:57:00.800 --> 00:57:02.230
They're including us
in this survey.

00:57:02.230 --> 00:57:03.530
They're giving us extra light.

00:57:03.530 --> 00:57:04.450
We're going to work
extra hard.

00:57:04.450 --> 00:57:08.660
And so you find a higher
output from that group.

00:57:08.660 --> 00:57:10.580
Alternatively, if you had said,
maybe people are getting

00:57:10.580 --> 00:57:12.060
distracted by having
too much light.

00:57:12.060 --> 00:57:14.780
So management picked these 50
groups, went in and unscrewed

00:57:14.780 --> 00:57:15.410
some light bulbs.

00:57:15.410 --> 00:57:16.770
They're working in sort
of a dim area, they'd

00:57:16.770 --> 00:57:17.740
be like, oh, wow.

00:57:17.740 --> 00:57:20.150
Management really cares
about our well being.

00:57:20.150 --> 00:57:22.350
Now we focus on the natural
light and we're going to work

00:57:22.350 --> 00:57:23.150
really hard.

00:57:23.150 --> 00:57:26.420
So just the act of sort of being
in the treatment causes

00:57:26.420 --> 00:57:27.645
your behavior to change.

00:57:27.645 --> 00:57:30.810
And so what's wrong with that?

00:57:30.810 --> 00:57:32.680
I mean we're trying to measure
the effect of treatment.

00:57:32.680 --> 00:57:34.220
If the effect of treatment
is to increase

00:57:34.220 --> 00:57:35.850
productivity, fine.

00:57:35.850 --> 00:57:39.110
AUDIENCE: Maybe you cannot
make a generalization to a

00:57:39.110 --> 00:57:42.700
population that doesn't have
that change in behavior because

00:57:42.700 --> 00:57:44.690
it doesn't get the treatment
already.

00:57:49.680 --> 00:57:52.030
AUDIENCE: You run around
changing the light levels at

00:57:52.030 --> 00:57:55.210
different factories and you
don't get the effect because

00:57:55.210 --> 00:58:00.570
the real treatment was people
running around and testing and

00:58:00.570 --> 00:58:03.340
observing and measuring
the change in light.

00:58:03.340 --> 00:58:03.600
SHAWN COLE: Right.

00:58:03.600 --> 00:58:05.100
And saying, we're really glad
you're part of this study.

00:58:05.100 --> 00:58:06.632
It's very important.

00:58:06.632 --> 00:58:07.920
AUDIENCE: Which was
really what got

00:58:07.920 --> 00:58:09.260
people to work harder.

00:58:09.260 --> 00:58:10.510
SHAWN COLE: Right.

00:58:12.400 --> 00:58:14.500
You might be able to generalize
that if we decided

00:58:14.500 --> 00:58:18.650
to run this study in every
factory in our firm, then we

00:58:18.650 --> 00:58:20.500
might get similar results
in different factories.

00:58:20.500 --> 00:58:22.710
But probably, within a few
months, people would sort of

00:58:22.710 --> 00:58:24.960
just catch on that this is
just kind of wacky, and

00:58:24.960 --> 00:58:27.550
what's going on?

00:58:27.550 --> 00:58:30.240
It wouldn't really be the
effect of the program.

00:58:30.240 --> 00:58:34.292
Any other examples from your
programs of Hawthorne effects?

00:58:34.292 --> 00:58:36.900
Or things you might
be worried about?

00:58:40.540 --> 00:58:42.325
AUDIENCE: It seems like a lot
of behavior situations,

00:58:42.325 --> 00:58:43.200
there's a threat to this.

00:58:43.200 --> 00:58:47.105
Especially if you have some in
developing country context

00:58:47.105 --> 00:58:49.855
where you have foreigners coming
in or people from the

00:58:49.855 --> 00:58:51.410
capital coming in.

00:58:51.410 --> 00:58:53.650
Especially if it's something
that-- again, with a behavior

00:58:53.650 --> 00:58:56.560
change where OK, I know I'm
supposed to be washing my

00:58:56.560 --> 00:58:57.230
hands with soap.

00:58:57.230 --> 00:58:59.630
I normally don't, but I know
that the white people get

00:58:59.630 --> 00:59:02.520
really happy when I do it and
they're coming in to evaluate.

00:59:02.520 --> 00:59:02.800
So I'm going to go ahead--

00:59:02.800 --> 00:59:03.031
SHAWN COLE: Right.

00:59:03.031 --> 00:59:07.090
So if you show up from abroad
and put posters encouraging

00:59:07.090 --> 00:59:09.890
people to wash their hands,
people may pay more

00:59:09.890 --> 00:59:12.260
attention to that.

00:59:12.260 --> 00:59:14.330
What does that invalidate
or not invalidate?

00:59:17.010 --> 00:59:19.380
So suppose we did this and we
found that the effect was it

00:59:19.380 --> 00:59:24.310
reduces reported incidence
of diarrhea by 15%.

00:59:24.310 --> 00:59:27.045
AUDIENCE: If you then don't
still have white people coming

00:59:27.045 --> 00:59:30.840
into the village, then the same
effect might not happen.

00:59:30.840 --> 00:59:31.170
SHAWN COLE: Right.

00:59:31.170 --> 00:59:34.550
So I guess it's a little bit
nuanced because we should

00:59:34.550 --> 00:59:37.550
distinguish between the program
generalizability,

00:59:37.550 --> 00:59:41.260
which is the program could be
white people come into the

00:59:41.260 --> 00:59:44.030
village, and Hawthorne effects,
which is because I

00:59:44.030 --> 00:59:46.250
know I'm in the treatment group
in the study, I'm going

00:59:46.250 --> 00:59:47.850
to act differently.

00:59:47.850 --> 00:59:52.890
So what's another example of
really, a Hawthorne effect?

00:59:52.890 --> 00:59:55.450
I'm sympathetic to yours as a
Hawthorne effect, but I want

00:59:55.450 --> 00:59:56.700
to really sort of nail it.

00:59:56.700 --> 01:00:00.330
AUDIENCE: There's one for hand
washing where every week

01:00:00.330 --> 01:00:02.490
people go into the villages
and then tell them the

01:00:02.490 --> 01:00:05.571
importance of hand washing as
a way to prevent malaria.

01:00:05.571 --> 01:00:07.320
And every time they ask them,
did you wash hands?

01:00:07.320 --> 01:00:09.000
Do you wash hands?

01:00:09.000 --> 01:00:12.350
But that's not sustainable on
a long-term basis because--

01:00:12.350 --> 01:00:14.900
and at the same time, you're
distributing free soap.

01:00:14.900 --> 01:00:17.140
So how do you separate
everything?

01:00:17.140 --> 01:00:18.980
SHAWN COLE: I would say again,
that's the program.

01:00:18.980 --> 01:00:21.740
And so we would say if we scaled
that program up to all

01:00:21.740 --> 01:00:24.270
of the country, we'd be fine.

01:00:24.270 --> 01:00:26.330
If we go in every week and
tell people to wash their

01:00:26.330 --> 01:00:30.029
hands and distribute
soap, that's fine.

01:00:30.029 --> 01:00:32.494
AUDIENCE: So sometimes on
sexual behavior studies,

01:00:32.494 --> 01:00:36.109
you'll find that a treatment
group that is encouraged to

01:00:36.109 --> 01:00:36.931
adopt condoms, or something.

01:00:36.931 --> 01:00:38.725
And then in the post-measurement
period, they

01:00:38.725 --> 01:00:40.970
know what the intervention
is about.

01:00:40.970 --> 01:00:42.840
And so if you ask them, what has
your sexual behavior been

01:00:42.840 --> 01:00:45.275
in the last week, they're much
more likely to say that

01:00:45.275 --> 01:00:47.460
they've been using condoms.

01:00:47.460 --> 01:00:51.340
Or change their partnering
habits or these other sorts of

01:00:51.340 --> 01:00:53.180
things that have nothing
to do with--

01:00:53.180 --> 01:00:53.440
SHAWN COLE: Right.

01:00:53.440 --> 01:00:57.000
So the effect of being in the
treatment group and knowing

01:00:57.000 --> 01:00:59.440
that you're getting this
treatment might change how you

01:00:59.440 --> 01:01:01.010
answer the survey questions,
even if you

01:01:01.010 --> 01:01:03.440
didn't behave that way.

01:01:03.440 --> 01:01:05.660
Any other?

01:01:05.660 --> 01:01:08.770
AUDIENCE: For the Hawthorne
effect?

01:01:08.770 --> 01:01:12.680
SHAWN COLE: So it's certainly
a problem with your survey.

01:01:12.680 --> 01:01:13.752
AUDIENCE: It might change
other aspects of their

01:01:13.752 --> 01:01:17.820
behavior other than condom use
that would be simply because

01:01:17.820 --> 01:01:19.200
they know that you're looking.

01:01:21.910 --> 01:01:24.110
SHAWN COLE: So I mean I would've
said something like I

01:01:24.110 --> 01:01:27.220
know that they really care about
me and so because I'm

01:01:27.220 --> 01:01:31.320
part of this MIT Poverty Action
Lab Study, it must be

01:01:31.320 --> 01:01:32.350
really important
that I do this.

01:01:32.350 --> 01:01:35.690
But if you were to generalize
the program, not as part of a

01:01:35.690 --> 01:01:39.542
study, then people would react
to it differently.

01:01:39.542 --> 01:01:43.900
The other side of the coin is
the John Henry effect, which

01:01:43.900 --> 01:01:45.700
is people in the comparison
group behave differently.

01:01:45.700 --> 01:01:47.350
What are examples of that?

01:01:52.580 --> 01:01:59.400
AUDIENCE: The village in the
control group is resentful to

01:01:59.400 --> 01:02:02.952
the politician who they
perceived as determining who

01:02:02.952 --> 01:02:06.880
got treatment status and so
they don't try as hard on

01:02:06.880 --> 01:02:07.730
whatever's being measured.

01:02:07.730 --> 01:02:12.580
Or they intentionally turn in
a poor performance as a sign

01:02:12.580 --> 01:02:14.035
of protest.

01:02:14.035 --> 01:02:15.290
SHAWN COLE: Right.

01:02:15.290 --> 01:02:17.460
They're like, why am I in the
control group of this study.

01:02:17.460 --> 01:02:19.250
I wanted to be in the
treatment group.

01:02:19.250 --> 01:02:21.520
I'm not going to use fertilizer
because I know this

01:02:21.520 --> 01:02:22.940
study's about fertilizer
or something.

01:02:26.530 --> 01:02:27.060
Other thoughts?

01:02:27.060 --> 01:02:29.310
AUDIENCE: You could
do the opposite.

01:02:29.310 --> 01:02:33.440
I could say, oh, those guys,
they got the treatment.

01:02:33.440 --> 01:02:35.680
But I don't need that.

01:02:35.680 --> 01:02:37.710
I can do just as well,
so I'm going to--

01:02:37.710 --> 01:02:38.790
SHAWN COLE: I'm going to pull
myself up by my bootstraps.

01:02:38.790 --> 01:02:40.560
AUDIENCE: [UNINTELLIGIBLE] going
to start studying and

01:02:40.560 --> 01:02:43.870
double up and I'll show them.

01:02:43.870 --> 01:02:45.400
SHAWN COLE: That's an
interesting problem, is we

01:02:45.400 --> 01:02:49.900
don't really know which way
these effects go ex ante.

01:02:49.900 --> 01:02:52.810
The Hawthorne or John Henry
effects could be positive or

01:02:52.810 --> 01:02:55.630
could be negative and could
be a challenge for the

01:02:55.630 --> 01:02:55.910
evaluation.

01:02:55.910 --> 01:02:58.230
So how do you sort of try
and address this to

01:02:58.230 --> 01:02:59.480
resolve these problems?

01:03:06.766 --> 01:03:08.365
AUDIENCE: It'd be hard
to do statistically.

01:03:14.930 --> 01:03:16.900
AUDIENCE: This could be
dangerous I suppose.

01:03:16.900 --> 01:03:19.660
But if you try to make the
people in the control group,

01:03:19.660 --> 01:03:21.950
for instance, feel special
in some other way.

01:03:21.950 --> 01:03:26.540
But in a way that you can say
is not related to anything

01:03:26.540 --> 01:03:28.120
you're measuring from
the treatment.

01:03:28.120 --> 01:03:28.255
SHAWN COLE: Right.

01:03:28.255 --> 01:03:30.115
So we're doing a financial
literacy evaluation where

01:03:30.115 --> 01:03:31.920
we're showing financial
literacy videos to the

01:03:31.920 --> 01:03:32.940
treatment group.

01:03:32.940 --> 01:03:34.570
In the control group we're
bringing them in and we're

01:03:34.570 --> 01:03:37.610
showing them films about health
or something that we

01:03:37.610 --> 01:03:39.350
don't think will have any effect
on financial literacy,

01:03:39.350 --> 01:03:42.720
but lots of the things
are the same.

01:03:42.720 --> 01:03:45.170
In the medical literature they
do often double blind studies,

01:03:45.170 --> 01:03:46.790
where you don't even know
whether you're in the

01:03:46.790 --> 01:03:47.710
treatment group or the
control group.

01:03:47.710 --> 01:03:50.230
So you can't get despondent for
being in the control group

01:03:50.230 --> 01:03:54.610
because you don't know
you're there.

01:03:54.610 --> 01:03:57.210
Sometimes these are sort of
inevitable and you can't get

01:03:57.210 --> 01:03:59.520
around them, but you should
think about them carefully and

01:03:59.520 --> 01:04:02.780
try to minimize the risk.

01:04:02.780 --> 01:04:04.680
So another problem with
evaluations is sort of

01:04:04.680 --> 01:04:07.270
behavioral responses
to evaluations.

01:04:07.270 --> 01:04:09.380
So we assign some schools to
treatment schools and some

01:04:09.380 --> 01:04:11.250
schools to comparison schools.

01:04:11.250 --> 01:04:17.310
And the people in the comparison
school say, oh.

01:04:17.310 --> 01:04:19.490
So we give textbooks to
the treatment school.

01:04:19.490 --> 01:04:20.990
So lots of people
say, hey, this

01:04:20.990 --> 01:04:22.110
school's got new textbooks.

01:04:22.110 --> 01:04:24.450
I'm going to go to
this school.

01:04:24.450 --> 01:04:26.100
And so that increases
the class size.

01:04:26.100 --> 01:04:29.470
So the textbooks may benefit
the test score, but the

01:04:29.470 --> 01:04:32.610
increased class size offsets
that and you find no effect of

01:04:32.610 --> 01:04:34.780
textbooks because
the behavioral

01:04:34.780 --> 01:04:35.950
responses undid this.

01:04:35.950 --> 01:04:38.480
Whereas if you were to do it
throughout the country and

01:04:38.480 --> 01:04:40.480
give every school new textbooks,
then there'd be no

01:04:40.480 --> 01:04:43.150
transferring around because
there'd be no reason to change

01:04:43.150 --> 01:04:45.070
schools because your school
would have free

01:04:45.070 --> 01:04:46.720
textbooks as well.

01:04:46.720 --> 01:04:48.420
So that's sort of another.

01:04:48.420 --> 01:04:50.478
AUDIENCE: I just wanted to go
back to your question about

01:04:50.478 --> 01:04:54.503
how to minimize the Hawthorne
and John Henry effect.

01:04:54.503 --> 01:04:59.340
We're doing an impact study,
impact evaluation on

01:04:59.340 --> 01:05:02.735
microfinance product
in Mexico.

01:05:02.735 --> 01:05:05.650
We try not to talk
about the study.

01:05:05.650 --> 01:05:08.868
The people at the top know that
they're implementing this

01:05:08.868 --> 01:05:10.110
study, but the participants
don't know

01:05:10.110 --> 01:05:10.610
anything about the study.

01:05:10.610 --> 01:05:15.601
And it's very kind of hush
hush, basically to try to

01:05:15.601 --> 01:05:17.450
avoid changing behavior.

01:05:17.450 --> 01:05:20.110
And obviously that's not
always possible.

01:05:20.110 --> 01:05:23.150
But to the extent that it is,
people don't have to know that

01:05:23.150 --> 01:05:24.040
they're part of a study.

01:05:24.040 --> 01:05:26.460
SHAWN COLE: I'm sure everybody
here who's lived in the US has

01:05:26.460 --> 01:05:32.070
participated in a study
sponsored by a credit card

01:05:32.070 --> 01:05:37.390
company that does randomized
evaluations to figure out how

01:05:37.390 --> 01:05:38.880
to get people to sign up
for their credit cards.

01:05:38.880 --> 01:05:41.740
So they randomly send some
people 10 point font on the

01:05:41.740 --> 01:05:43.440
outside letter, some people
12 point font

01:05:43.440 --> 01:05:44.030
on the outside letter.

01:05:44.030 --> 01:05:46.690
Some people 16 point font
on the outside letter.

01:05:46.690 --> 01:05:48.570
And they keep track of who
responds and they figure out

01:05:48.570 --> 01:05:50.110
that this is the right
font size.

01:05:50.110 --> 01:05:51.650
And then they say, what
color should it be?

01:05:51.650 --> 01:05:52.650
This is the right color.

01:05:52.650 --> 01:05:54.790
What should the teaser
interest rate be?

01:05:54.790 --> 01:06:00.290
And so lots of firms do this
without you even knowing it.

01:06:00.290 --> 01:06:02.740
And then you won't get any John
Henry or Hawthorne

01:06:02.740 --> 01:06:03.800
effects because people won't
even know they're in

01:06:03.800 --> 01:06:04.670
experiments.

01:06:04.670 --> 01:06:07.530
Sometimes there are consent
issues: you need

01:06:07.530 --> 01:06:11.165
people's informed consent, and
that precludes this from happening.

01:06:14.280 --> 01:06:16.450
There's some issues that we were
touching on before, sort

01:06:16.450 --> 01:06:18.550
of the generalizability
of results.

01:06:18.550 --> 01:06:20.670
So can the program be
replicated on a

01:06:20.670 --> 01:06:22.100
large national scale?

01:06:22.100 --> 01:06:24.700
So we're going in and giving
free soap to these villages,

01:06:24.700 --> 01:06:26.860
but it would get expensive to
give free soap to every

01:06:26.860 --> 01:06:28.080
village in the country.

01:06:28.080 --> 01:06:29.860
The study sample, is
it representative?

01:06:29.860 --> 01:06:32.020
So what's a problem you
might run into here?

01:06:36.710 --> 01:06:39.765
AUDIENCE: If for logistical
reasons, you're only doing it

01:06:39.765 --> 01:06:44.130
in one state in a country and
it's randomized within the

01:06:44.130 --> 01:06:48.110
state, but then there's just
a different culture in the

01:06:48.110 --> 01:06:50.994
state, or there's a really
strong history or traditions

01:06:50.994 --> 01:06:53.922
in the state that then are not
generalizable up to the

01:06:53.922 --> 01:06:55.880
country as a whole.

01:06:55.880 --> 01:07:00.990
SHAWN COLE: A problem you often
run into is NGOs will, I

01:07:00.990 --> 01:07:05.910
think for reasonable reasons
say, OK, let's do this study

01:07:05.910 --> 01:07:08.480
at our best bank branch or
at our best district.

01:07:08.480 --> 01:07:10.820
Because doing a study
is pretty hard.

01:07:10.820 --> 01:07:12.970
You have to have treatment and
you have to have control and

01:07:12.970 --> 01:07:14.600
keep track of who's
in which group.

01:07:14.600 --> 01:07:16.600
And so these people have been
with us for five years and

01:07:16.600 --> 01:07:18.110
they can do the study
really well.

01:07:18.110 --> 01:07:23.150
And so you do the study and you
find some nice effect, but

01:07:23.150 --> 01:07:24.760
that is the effect of putting
your best people into the

01:07:24.760 --> 01:07:26.660
program and you only
have 20 of them.

01:07:26.660 --> 01:07:30.110
And now when you try to scale
it up to 500 villages, you

01:07:30.110 --> 01:07:32.170
just don't have that level of
human capital to implement the

01:07:32.170 --> 01:07:34.020
same quality of program
elsewhere.

01:07:37.540 --> 01:07:39.860
So sensitivity of results.

01:07:39.860 --> 01:07:46.560
This is sort of important, but
may be second order important.

01:07:46.560 --> 01:07:49.110
The state of the art in the
sciences, we're still looking

01:07:49.110 --> 01:07:51.110
for things that work
and work well.

01:07:51.110 --> 01:07:56.960
So we're not as worried about
figuring out if we give the

01:07:56.960 --> 01:08:00.320
de-worming tablet every month
versus every three weeks,

01:08:00.320 --> 01:08:01.470
which one is more effective.

01:08:01.470 --> 01:08:03.970
I mean that's a useful important
question and it

01:08:03.970 --> 01:08:05.450
probably deserves a study.

01:08:05.450 --> 01:08:09.190
But it's hard enough to get the
big picture studies done

01:08:09.190 --> 01:08:12.740
to then move on to the
sensitivity of the results.

01:08:12.740 --> 01:08:15.270
That said, there are
often interesting economic

01:08:15.270 --> 01:08:16.020
questions you have.

01:08:16.020 --> 01:08:18.800
So you want to know whether
microfinance has an impact on

01:08:18.800 --> 01:08:19.770
people's wealth.

01:08:19.770 --> 01:08:22.109
But you might also care about
the interest rate.

01:08:22.109 --> 01:08:25.384
And so for microfinance to be
sustainable, the interest rate

01:08:25.384 --> 01:08:25.979
has to be high.

01:08:25.979 --> 01:08:28.290
But for it to generate a lot of
income for the borrowers,

01:08:28.290 --> 01:08:29.500
the interest rate
has to be low.

01:08:29.500 --> 01:08:32.250
So you could try your program
at different interest rates

01:08:32.250 --> 01:08:35.140
and see whether you find the
same effect at different

01:08:35.140 --> 01:08:36.460
interest rates.

01:08:36.460 --> 01:08:37.710
That would be very
interesting.

01:08:40.620 --> 01:08:43.029
So there's often a trade-off
between internal

01:08:43.029 --> 01:08:46.500
and external validity.

01:08:46.500 --> 01:08:52.149
In my experience, I think it's
probably reasonable to focus

01:08:52.149 --> 01:08:54.450
on the first pass on the
internal validity.

01:08:54.450 --> 01:08:58.399
Because the advantage of picking
your best branch or

01:08:58.399 --> 01:09:00.939
picking your good people to get
the study done and done

01:09:00.939 --> 01:09:04.220
well and have a large treatment
effect is that we

01:09:04.220 --> 01:09:06.470
are sure we know what
we're measuring.

01:09:06.470 --> 01:09:10.229
It's often hard to measure
effects in the real world.

01:09:10.229 --> 01:09:13.120
It's not as if the hundred
people in this room are the

01:09:13.120 --> 01:09:14.270
first people to think
we should do

01:09:14.270 --> 01:09:16.520
something to reduce poverty.

01:09:16.520 --> 01:09:18.520
It's a difficult and
thorny problem.

01:09:18.520 --> 01:09:21.310
And so if we can throw sort of
our best program and show that

01:09:21.310 --> 01:09:26.710
our best program is effective,
then we can sort of work on

01:09:26.710 --> 01:09:31.000
expanding and testing our
second best program.

01:09:31.000 --> 01:09:34.120
Statistical power is often
stronger if you have a very

01:09:34.120 --> 01:09:35.149
homogeneous sample.

01:09:35.149 --> 01:09:41.609
So if you can randomize in a set
of twins or something like

01:09:41.609 --> 01:09:43.130
that, you have very good
statistical power.

01:09:43.130 --> 01:09:45.470
But twins might not be
representative of the general

01:09:45.470 --> 01:09:46.770
population.

01:09:46.770 --> 01:09:48.899
And then, of course, the
study location is

01:09:48.899 --> 01:09:49.655
almost never random.

01:09:49.655 --> 01:09:53.710
In Indonesia we did manage to do
a nationally representative

01:09:53.710 --> 01:09:56.860
randomized evaluation of a
financial literacy program,

01:09:56.860 --> 01:09:59.480
but that was just because the
World Bank was doing a

01:09:59.480 --> 01:10:01.940
nationally representative survey
and we persuaded them

01:10:01.940 --> 01:10:03.470
to tack the experiment
on at the end.

01:10:03.470 --> 01:10:06.080
But otherwise it's almost always
prohibitively expensive

01:10:06.080 --> 01:10:09.640
to travel around to hundreds
of locations.

01:10:09.640 --> 01:10:11.450
But at the end of the day,
you do care a lot

01:10:11.450 --> 01:10:12.550
about external validity.

01:10:12.550 --> 01:10:15.550
You want to know that before
you throw a lot of money at

01:10:15.550 --> 01:10:19.390
the program, can you get
the same effect when

01:10:19.390 --> 01:10:20.920
you scale it up?

01:10:20.920 --> 01:10:24.320
And is this program effective
for large populations?

01:10:27.440 --> 01:10:30.710
So in the last 5 or 10 minutes,
we'll talk a little

01:10:30.710 --> 01:10:32.020
bit about cost effectiveness.

01:10:32.020 --> 01:10:33.870
So you've done your program,
you've done your evaluation,

01:10:33.870 --> 01:10:35.160
you've got the efficacy.

01:10:35.160 --> 01:10:36.640
You know how much it costs
to deliver the program.

01:10:36.640 --> 01:10:40.400
Now how do you decide which
program to pursue?

01:10:46.360 --> 01:10:48.150
I guess the important thing
in this is pretty obvious.

01:10:48.150 --> 01:10:50.380
It's just finding a metric that
you can use to compare

01:10:50.380 --> 01:10:51.340
different programs.

01:10:51.340 --> 01:10:54.670
So in educational programs
we often look at years of

01:10:54.670 --> 01:10:55.920
schooling as an output.

01:10:58.540 --> 01:11:00.840
Having an extra teacher causes
people to stay in school

01:11:00.840 --> 01:11:02.940
longer, but extra teachers
are expensive.

01:11:02.940 --> 01:11:07.450
You can figure out how much
it costs per child year of

01:11:07.450 --> 01:11:08.910
schooling you create.

01:11:08.910 --> 01:11:11.200
In health programs, they have
something called a disability

01:11:11.200 --> 01:11:14.100
adjusted life year, which I'm
sure some of you know a lot

01:11:14.100 --> 01:11:14.780
better than I do.

01:11:14.780 --> 01:11:18.810
But it's basically an unimpaired
year of life with

01:11:18.810 --> 01:11:22.390
no disability counts as one.

01:11:22.390 --> 01:11:25.450
If your legs are immobile,
then maybe

01:11:25.450 --> 01:11:26.830
it'd be 0.6 or something.

01:11:26.830 --> 01:11:30.930
And it sort of gets adjusted
down to figure out which

01:11:30.930 --> 01:11:33.170
health interventions are more
or less cost effective.

01:11:33.170 --> 01:11:35.230
Or you could do cost
per death averted.

01:11:42.080 --> 01:11:47.210
I think the interesting takeaway
here is that doing

01:11:47.210 --> 01:11:49.150
these types of comparisons can
sometimes lead to pretty

01:11:49.150 --> 01:11:50.650
surprising results.

01:11:50.650 --> 01:11:59.640
So we know how to get people in
school by reducing the cost

01:11:59.640 --> 01:12:02.820
of education, so the PROGRESA
program in Mexico made

01:12:02.820 --> 01:12:04.980
conditional cash transfers
to parents of students

01:12:04.980 --> 01:12:06.260
who attended school.

01:12:06.260 --> 01:12:08.030
Providing free uniforms
increases attendance.

01:12:08.030 --> 01:12:11.280
Providing school meals
increases attendance.

01:12:11.280 --> 01:12:13.270
We've looked at incentives
to increase learning.

01:12:13.270 --> 01:12:15.010
But we've also looked at
the de-worming case.

01:12:19.380 --> 01:12:21.780
If you'd said five years ago
to educational people who

01:12:21.780 --> 01:12:24.180
specialize in education in
developing countries, what do

01:12:24.180 --> 01:12:28.020
you think a very high impact
intervention would be?

01:12:28.020 --> 01:12:30.850
I think very few people would
have suggested de-worming.

01:12:30.850 --> 01:12:35.680
But if you do the math, you can
figure out that an extra

01:12:35.680 --> 01:12:38.540
teacher will induce let's say
one year of additional

01:12:38.540 --> 01:12:41.790
schooling, but costs
$55 per pupil.

01:12:41.790 --> 01:12:44.395
So the cost per additional year
of schooling is here for

01:12:44.395 --> 01:12:45.410
extra teacher.

01:12:45.410 --> 01:12:47.120
Iron supplements here.

01:12:47.120 --> 01:12:48.850
School meals here.

01:12:48.850 --> 01:12:49.600
De-worming here.

01:12:49.600 --> 01:12:53.070
So it's just tremendously cost
effective to provide this

01:12:53.070 --> 01:12:56.860
de-worming medicine
as a means of

01:12:56.860 --> 01:13:00.450
increasing years of education.

01:13:00.450 --> 01:13:02.580
Much cheaper than scholarships
for girls, et cetera, or

01:13:02.580 --> 01:13:03.830
school uniforms.

01:13:07.330 --> 01:13:10.980
And you could do this
calculation not just for

01:13:10.980 --> 01:13:12.720
education, you could say, there
are a lot of things that

01:13:12.720 --> 01:13:13.260
we care about.

01:13:13.260 --> 01:13:18.120
We care about health outcomes,
human capital investment,

01:13:18.120 --> 01:13:19.650
externalities.

01:13:19.650 --> 01:13:22.990
And so an interesting thing
that came out of the

01:13:22.990 --> 01:13:27.380
de-worming study was that if you
did the old studies that

01:13:27.380 --> 01:13:30.165
didn't take into account the
externalities and just sort of

01:13:30.165 --> 01:13:31.935
treated some people in a school
but not other people in

01:13:31.935 --> 01:13:34.170
a school, it didn't look like
a very good intervention.

01:13:34.170 --> 01:13:36.940
Because the kids would keep
reinfecting each other even

01:13:36.940 --> 01:13:38.840
though they'd just been treated,
and so it wasn't that

01:13:38.840 --> 01:13:39.780
cost effective.

01:13:39.780 --> 01:13:43.540
But once you did the school
level randomization and took

01:13:43.540 --> 01:13:46.440
into account the
externalities, then the

01:13:46.440 --> 01:13:50.060
program turned out to look very,
very cheap as a way of

01:13:50.060 --> 01:13:51.310
providing education.

01:13:54.660 --> 01:13:57.580
It's also an incredibly
effective way of improving

01:13:57.580 --> 01:14:00.560
health outcomes, de-worming.

01:14:00.560 --> 01:14:05.540
And much more effective, for
example, than treating

01:14:05.540 --> 01:14:06.790
schistosomiasis.

01:14:09.880 --> 01:14:11.790
You can do even more
calculations.

01:14:11.790 --> 01:14:14.150
You can say OK, so we know that
the de-worming medicine is

01:14:14.150 --> 01:14:16.810
going to increase years
of education by 0.2.

01:14:16.810 --> 01:14:19.590
Well, what's 0.2 years
of education worth?

01:14:19.590 --> 01:14:22.150
There are economists who have
done estimates of the returns

01:14:22.150 --> 01:14:23.080
to schooling in Kenya.

01:14:23.080 --> 01:14:25.310
They say if you get an extra
year of schooling, you're

01:14:25.310 --> 01:14:29.010
going to get 7% more income
throughout your life.

01:14:29.010 --> 01:14:31.360
And then so you've got
40 years of life

01:14:31.360 --> 01:14:32.510
at 7% higher income.

01:14:32.510 --> 01:14:36.430
You can take the present value
of that stream of additional

01:14:36.430 --> 01:14:40.460
wage payments and you can see
that, wow, by investing only

01:14:40.460 --> 01:14:46.170
$0.49, we're going to generate
$20 more in wages at net

01:14:46.170 --> 01:14:50.180
present value, on average.

01:14:50.180 --> 01:14:53.560
And so if you have a tax rate
of 10%, that's clearly a

01:14:53.560 --> 01:14:55.660
profitable intervention
for the government if

01:14:55.660 --> 01:14:57.260
it's patient enough.

01:14:57.260 --> 01:15:00.040
Because it'll take in $2 in net
present value in taxes at

01:15:00.040 --> 01:15:02.180
a cost of only $0.49
of delivering.
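
NOTE
A minimal sketch of the cost-effectiveness arithmetic in this part of the
lecture, using only the rounded figures quoted aloud: $55 per pupil for one
extra year of schooling from an extra teacher, $0.49 for 0.2 extra years from
de-worming, $20 of extra wages in present value, and a 10% tax rate. Treating
the $0.49 as a per-pupil cost is an assumption, and the function name is
illustrative.
```python
def cost_per_extra_year(cost_per_pupil, extra_years_per_pupil):
    """Dollars spent for each additional year of schooling a program induces."""
    if extra_years_per_pupil <= 0:
        raise ValueError("program must induce a positive amount of schooling")
    return cost_per_pupil / extra_years_per_pupil
# Common metric: dollars per additional year of schooling.
teacher_cost = cost_per_extra_year(55.00, 1.0)
deworming_cost = cost_per_extra_year(0.49, 0.2)
# Fiscal view: 10% tax on $20 of extra wages (present value) versus $0.49 spent.
tax_revenue_pv = 0.10 * 20.00
net_fiscal_gain = tax_revenue_pv - 0.49
print(f"Extra teacher: ${teacher_cost:.2f} per extra year of schooling")
print(f"De-worming:    ${deworming_cost:.2f} per extra year of schooling")
print(f"Tax revenue (PV): ${tax_revenue_pv:.2f}, net of cost: ${net_fiscal_gain:.2f}")
```
Putting both programs on the same denominator is the whole point: the
comparison only works once each one is expressed per extra year of schooling.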

01:15:05.310 --> 01:15:06.600
Of course, there may
not be any taxes on

01:15:06.600 --> 01:15:09.660
informal labor in Kenya.

01:15:09.660 --> 01:15:14.170
So I think this is an example we
like to cite first, because

01:15:14.170 --> 01:15:16.200
Michael helped prepare this
particular lecture the first

01:15:16.200 --> 01:15:18.360
time around and is fond
of that paper.

01:15:18.360 --> 01:15:22.300
But second, I think it's a very
nice example of a program

01:15:22.300 --> 01:15:24.660
that has a really big
macro effect.

01:15:24.660 --> 01:15:29.660
So basically, it's been adopted
nationwide in Uganda,

01:15:29.660 --> 01:15:32.460
and they're expanding
it a lot in Kenya.

01:15:32.460 --> 01:15:36.230
It's been tried in India and
many other countries have

01:15:36.230 --> 01:15:36.980
realized that this is a

01:15:36.980 --> 01:15:39.040
tremendously effective program.

01:15:39.040 --> 01:15:42.350
And the ability to have this
randomized evaluation, there's

01:15:42.350 --> 01:15:45.070
very credible evidence that
said, we had these treatment

01:15:45.070 --> 01:15:45.970
groups, these control groups.

01:15:45.970 --> 01:15:49.490
We followed people for three
years and the reason why our

01:15:49.490 --> 01:15:51.440
results are so different than
the other results that you

01:15:51.440 --> 01:15:54.410
were citing as a reason not to
provide de-worming is because

01:15:54.410 --> 01:15:55.350
of these externalities.

01:15:55.350 --> 01:15:58.310
And we can show you why these
externalities matter.

01:15:58.310 --> 01:16:01.320
The credibility of that study
really helped to transform

01:16:01.320 --> 01:16:05.800
policy and literally save
thousands, tens of

01:16:05.800 --> 01:16:07.800
thousands of lives.

01:16:07.800 --> 01:16:09.810
Maybe more.

01:16:09.810 --> 01:16:10.870
Other examples.

01:16:10.870 --> 01:16:13.430
PROGRESA, which some of you
might be familiar with.

01:16:13.430 --> 01:16:17.110
It's actually the government of
Mexico decided to integrate

01:16:17.110 --> 01:16:19.260
a bunch of randomized
evaluations into its social

01:16:19.260 --> 01:16:21.770
welfare programs.

01:16:21.770 --> 01:16:24.510
That methodology and the results
of what's been shown

01:16:24.510 --> 01:16:26.810
effective in that program have
been adopted throughout Latin

01:16:26.810 --> 01:16:28.020
America and elsewhere.

01:16:28.020 --> 01:16:31.510
And Ben Olken, whom you saw
earlier, did some experiments

01:16:31.510 --> 01:16:34.180
on threat of audits in Indonesia
for corruption.

01:16:34.180 --> 01:16:36.790
And the government of Indonesia
is increasing the

01:16:36.790 --> 01:16:39.910
probability of audits as a way
of fighting corruption in the

01:16:39.910 --> 01:16:44.100
world's fourth most
populous country.

01:16:44.100 --> 01:16:49.760
So I think the conclusion is
that these evaluations, which

01:16:49.760 --> 01:16:51.410
take a lot of time and
take a lot of effort,

01:16:51.410 --> 01:16:52.350
let's not kid ourselves.

01:16:52.350 --> 01:16:54.690
If you sign up for one
of these, it's

01:16:54.690 --> 01:16:57.370
going to be a big affair.

01:16:57.370 --> 01:16:59.750
But it can have a tremendous
impact.

01:16:59.750 --> 01:17:03.040
It's very important to know from
your own perspective how

01:17:03.040 --> 01:17:04.640
effective your program
is, but you can

01:17:04.640 --> 01:17:06.220
influence policy a lot.

01:17:06.220 --> 01:17:09.180
So I'm just going to conclude
with two things.

01:17:09.180 --> 01:17:12.960
One is this mention of the
additional resources, which

01:17:12.960 --> 01:17:14.500
should be on the JPAL website.

01:17:14.500 --> 01:17:17.780
This is a book you have
to buy for $60.

01:17:17.780 --> 01:17:19.880
Only buy this if you're
already familiar with

01:17:19.880 --> 01:17:20.860
econometrics.

01:17:20.860 --> 01:17:23.010
But they're both great
treatments of the material

01:17:23.010 --> 01:17:26.260
we've covered this week.

01:17:26.260 --> 01:17:29.800
And I believe JPAL is in the
process of developing a

01:17:29.800 --> 01:17:31.970
practitioner's guide as well.

01:17:31.970 --> 01:17:33.170
This is a much more
technical guide

01:17:33.170 --> 01:17:34.220
that's full of equations.

01:17:34.220 --> 01:17:39.080
But hopefully, as you've seen
throughout the last week,

01:17:39.080 --> 01:17:41.980
we've tried to explain things
in ways that are accessible

01:17:41.980 --> 01:17:44.500
that you can explain to people
who haven't taken econometrics

01:17:44.500 --> 01:17:47.020
and that will hopefully be
coming out pretty soon.

01:17:47.020 --> 01:17:50.250
And so if I were just to at
least take two seconds to give

01:17:50.250 --> 01:17:54.160
my perspective, I'm young but
I've done a few of these.

01:17:54.160 --> 01:17:56.000
It's probably helpful when
you're doing one of these

01:17:56.000 --> 01:17:58.460
evaluations to engage
with academics.

01:17:58.460 --> 01:18:00.330
If you're thinking of doing
a study, you just

01:18:00.330 --> 01:18:01.900
send an email to JPAL.

01:18:01.900 --> 01:18:06.050
They'll send it out to their
network of 15 or 20 people and

01:18:06.050 --> 01:18:07.340
I don't know what the
response rate is.

01:18:07.340 --> 01:18:11.170
But I think there's a lot of
interest in doing these

01:18:11.170 --> 01:18:12.730
experiments.

01:18:12.730 --> 01:18:14.230
IPA is another organization
that does this.

01:18:14.230 --> 01:18:16.850
But there's some subtle nuanced
issues that require

01:18:16.850 --> 01:18:18.000
careful thinking through.

01:18:18.000 --> 01:18:21.430
Or you could just call me up
and say, we're going to be

01:18:21.430 --> 01:18:22.160
doing this study.

01:18:22.160 --> 01:18:23.800
We're spending a lot of money
on it, can we just talk

01:18:23.800 --> 01:18:25.700
through these issues with you?

01:18:25.700 --> 01:18:30.310
I'd be perfectly happy
to do that.

01:18:30.310 --> 01:18:32.580
And then, one thought.

01:18:32.580 --> 01:18:34.080
Just a few thoughts
on keeping your

01:18:34.080 --> 01:18:35.140
evaluation cost effective.

01:18:35.140 --> 01:18:36.640
And maybe even think about this
when you're proposing

01:18:36.640 --> 01:18:40.230
your program tomorrow, is that
randomizing at the individual

01:18:40.230 --> 01:18:46.550
level where possible and
appropriate is a tremendously

01:18:46.550 --> 01:18:48.590
useful way to keep costs down.

01:18:48.590 --> 01:18:51.270
Because the statistical power
is so much higher, so to

01:18:51.270 --> 01:18:54.340
detect any given effect size you
can do it with many fewer

01:18:54.340 --> 01:18:56.010
observations.
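
NOTE
One way to see why individual-level randomization needs fewer observations is
the standard design effect for clustered designs, 1 + (m - 1) * rho, where m
is the cluster size and rho is the intra-cluster correlation. This is a sketch
assuming equal-sized clusters. The sample size, cluster size, and rho below are
hypothetical, not figures from the lecture.
```python
import math
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing whole clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc
def individuals_needed_clustered(n_individual, cluster_size, icc):
    """Sample size a clustered design needs to match an individual-level design."""
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(inflated, 6))
# Hypothetical planning numbers.
N_INDIVIDUAL = 400   # observations needed if randomizing individuals
CLUSTER_SIZE = 20    # people per village, school, or branch
ICC = 0.10           # assumed intra-cluster correlation
n_clustered = individuals_needed_clustered(N_INDIVIDUAL, CLUSTER_SIZE, ICC)
print(f"Design effect: {design_effect(CLUSTER_SIZE, ICC):.2f}")
print(f"Observations needed with cluster randomization: {n_clustered}")
```
With a cluster size of 1 or a correlation of 0 the design effect is 1, which
is the individual-randomization case described above.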

01:18:56.010 --> 01:18:57.990
Another way to save a lot
of money is to use

01:18:57.990 --> 01:18:58.850
administrative data.

01:18:58.850 --> 01:19:02.790
So if you can get people to give
you access in New York to

01:19:02.790 --> 01:19:06.090
their checking accounts and
credit card statements, and

01:19:06.090 --> 01:19:09.040
get those reported, at a
low cost to you, the

01:19:09.040 --> 01:19:11.520
administrative data is often of
very high quality and can

01:19:11.520 --> 01:19:14.020
be collected for little money.

01:19:14.020 --> 01:19:16.550
And then the third trick is
sort of using lotteries to

01:19:16.550 --> 01:19:17.610
ensure high compliance.

01:19:17.610 --> 01:19:19.250
So if you sort of announce
that you've got this new

01:19:19.250 --> 01:19:22.440
program and you let everybody
who's interested apply, and

01:19:22.440 --> 01:19:26.850
then you sort of randomly select
60 and have 60 as a

01:19:26.850 --> 01:19:29.380
control group, then the
compliance is going to be

01:19:29.380 --> 01:19:32.230
pretty high in the
treatment group.

01:19:32.230 --> 01:19:35.200
That Wald estimator, the
difference between these two

01:19:35.200 --> 01:19:37.740
things, is going to
be pretty high.
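
NOTE
A small sketch of the Wald estimator mentioned here, assuming its usual form:
the difference in mean outcomes between the assigned groups (the intention to
treat effect) divided by the difference in take-up rates. All numbers are
hypothetical. The point is only that a lottery among applicants makes the
take-up gap large, so the denominator sits far from zero.
```python
def wald_estimate(mean_outcome_assigned, mean_outcome_control,
                  takeup_assigned, takeup_control):
    """Intention-to-treat effect scaled by the difference in take-up rates."""
    itt = mean_outcome_assigned - mean_outcome_control
    takeup_gap = takeup_assigned - takeup_control
    if takeup_gap == 0:
        raise ValueError("take-up is identical in both groups; estimator undefined")
    return itt / takeup_gap
# Hypothetical: the same 5-point difference in outcomes, two compliance scenarios.
high_compliance = wald_estimate(55.0, 50.0, takeup_assigned=0.9, takeup_control=0.1)
low_compliance = wald_estimate(55.0, 50.0, takeup_assigned=0.3, takeup_control=0.1)
print(f"Take-up gap 0.8: effect on the treated = {high_compliance:.2f}")
print(f"Take-up gap 0.2: effect on the treated = {low_compliance:.2f}")
```
When the take-up gap is small, the same outcome difference is divided by a
small number, so any noise in it is magnified. That is the cost of low
compliance.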

01:19:37.740 --> 01:19:42.700
So we're at 2:25, which at MIT
is the end of the class.

01:19:42.700 --> 01:19:45.890
And I guess the end of the
course, at least from the

01:19:45.890 --> 01:19:46.520
lecturing side.

01:19:46.520 --> 01:19:49.000
So thank you very much and I
will be around here to answer

01:19:49.000 --> 01:19:51.110
questions for the next
15 minutes if people

01:19:51.110 --> 01:19:52.360
would like to talk.