WEBVTT

00:00:00.040 --> 00:00:02.390
SPEAKER: The following content
is provided under a Creative

00:00:02.390 --> 00:00:03.680
Commons license.

00:00:03.680 --> 00:00:06.640
Your support will help MIT
OpenCourseWare continue to

00:00:06.640 --> 00:00:09.980
offer high-quality educational
resources for free.

00:00:09.980 --> 00:00:12.820
To make a donation or view
additional materials from

00:00:12.820 --> 00:00:16.750
hundreds of MIT courses, visit
MIT OpenCourseWare at

00:00:16.750 --> 00:00:18.000
ocw.mit.edu.

00:00:22.390 --> 00:00:22.760
PROFESSOR: OK.

00:00:22.760 --> 00:00:25.340
So before we begin, I would
like to just ask

00:00:25.340 --> 00:00:27.500
a very simple question.

00:00:27.500 --> 00:00:31.140
Do you think randomized
evaluations are the best way to

00:00:31.140 --> 00:00:32.729
conduct an impact evaluation?

00:00:32.729 --> 00:00:36.550
Please raise your hand
if you think so.

00:00:36.550 --> 00:00:38.700
Just be honest.

00:00:38.700 --> 00:00:42.260
All right, the TAs, you
guys don't count.

00:00:42.260 --> 00:00:43.100
All right.

00:00:43.100 --> 00:00:44.080
OK.

00:00:44.080 --> 00:00:46.700
So I have a job to do now.

00:00:46.700 --> 00:00:48.780
Whereas I thought
that maybe not.

00:00:48.780 --> 00:00:54.050
One of the things I would
like to do is to--

00:00:54.050 --> 00:00:56.760
this is one thing I've
discovered about teaching.

00:00:56.760 --> 00:00:59.180
We have about an hour
and 25 minutes.

00:00:59.180 --> 00:01:02.410
And if I speak for an hour and
25 minutes, I know two things

00:01:02.410 --> 00:01:02.990
will happen.

00:01:02.990 --> 00:01:05.080
One, you will get very
bored, and two, you

00:01:05.080 --> 00:01:06.290
will not learn anything.

00:01:06.290 --> 00:01:10.960
So I want you to make sure
that you interrupt with

00:01:10.960 --> 00:01:12.550
questions that you have.

00:01:12.550 --> 00:01:15.370
If they can be on the topic,
that would be very good.

00:01:15.370 --> 00:01:18.460
If they're off-topic, I may
delay the question or I

00:01:18.460 --> 00:01:21.320
may postpone the question, at
least until I get there.

00:01:21.320 --> 00:01:25.610
The other thing I would like
to say about the way this

00:01:25.610 --> 00:01:28.140
would work, is I have
no power over you.

00:01:28.140 --> 00:01:30.860
Whereas with my students, I have
a grade to give, with

00:01:30.860 --> 00:01:31.800
you I have no power.

00:01:31.800 --> 00:01:34.580
But I will still ask you to do
certain things during the

00:01:34.580 --> 00:01:35.230
presentation.

00:01:35.230 --> 00:01:37.070
So I hope you'll collaborate.

00:01:37.070 --> 00:01:39.170
So my session is called
Why Randomize?

00:01:39.170 --> 00:01:42.600
And the idea of Why Randomize?

00:01:42.600 --> 00:01:45.240
comes, for those of you who are
convinced, I hope you can

00:01:45.240 --> 00:01:50.970
use this session to help
convince others why this

00:01:50.970 --> 00:01:53.775
method is a very good method
to do an impact evaluation.

00:01:53.775 --> 00:01:56.330
And for those of you who are not
convinced, I would like to

00:01:56.330 --> 00:02:00.230
actually welcome you to raise
any objections you have.

00:02:00.230 --> 00:02:05.230
And I'm not here to tell you
randomization is a panacea or

00:02:05.230 --> 00:02:08.690
it's a solution to all the
problems of mankind.

00:02:08.690 --> 00:02:11.470
But I think in terms of impact
evaluations, it's a very

00:02:11.470 --> 00:02:13.320
powerful method.

00:02:13.320 --> 00:02:15.640
So the outline of the talk, I'll
give you a little bit of

00:02:15.640 --> 00:02:16.090
background.

00:02:16.090 --> 00:02:18.590
We'll define, what is a
randomized evaluation?

00:02:18.590 --> 00:02:20.030
It's going to be important
to make sure we

00:02:20.030 --> 00:02:21.050
have a common language.

00:02:21.050 --> 00:02:25.830
Then advantages and
disadvantages of experiments.

00:02:25.830 --> 00:02:28.620
Then we're going to do the get
out the vote, and then finally

00:02:28.620 --> 00:02:31.910
conclude in hopefully an
hour and 20 minutes.

00:02:31.910 --> 00:02:33.610
So how to measure impact?

00:02:33.610 --> 00:02:36.440
This is something that
Rachel referred to.

00:02:36.440 --> 00:02:39.360
The idea for measuring impact
is, we want to compare what

00:02:39.360 --> 00:02:43.190
happened to the beneficiaries of
a program versus what would

00:02:43.190 --> 00:02:45.350
have happened in the absence
of the program.

00:02:45.350 --> 00:02:46.400
This is really key.

00:02:46.400 --> 00:02:48.760
What would have happened in the
absence of a program is

00:02:48.760 --> 00:02:51.830
what's called a counterfactual,
and it's key

00:02:51.830 --> 00:02:55.350
for you to evaluate any method
to estimate program impact.

00:02:55.350 --> 00:02:57.040
Not just randomized
evaluation.

00:02:57.040 --> 00:03:00.090
So when you are trying to assess
how someone is going to

00:03:00.090 --> 00:03:02.990
do an impact evaluation,
always ask yourself the

00:03:02.990 --> 00:03:06.040
question, what is the
counterfactual here?

00:03:06.040 --> 00:03:09.700
How are they planning to think
about this counterfactual?

00:03:09.700 --> 00:03:14.090
What would these people look like in
the absence of the program?

00:03:14.090 --> 00:03:16.600
In the case of the Kenya
textbooks that Rachel was

00:03:16.600 --> 00:03:18.810
referring to this morning,
we thought about the

00:03:18.810 --> 00:03:22.750
counterfactual in terms of how
these children fared after

00:03:22.750 --> 00:03:25.460
this textbook program was
implemented versus how they

00:03:25.460 --> 00:03:28.590
would have fared at the same
moment in time had the program

00:03:28.590 --> 00:03:29.970
not been implemented.

00:03:29.970 --> 00:03:34.170
This is crucial, because even with
before-and-after methodologies

00:03:34.170 --> 00:03:36.470
or any other of those
methodologies, you are

00:03:36.470 --> 00:03:38.890
implicitly assuming
a counterfactual.

00:03:38.890 --> 00:03:41.330
And the question is, what
counterfactual are you

00:03:41.330 --> 00:03:45.640
assuming, and then is that
assumption realistic?

00:03:45.640 --> 00:03:47.150
And in some cases, it may be.

00:03:47.150 --> 00:03:49.430
In other cases, it may not.

00:03:49.430 --> 00:03:52.030
So the problem is, the
counterfactual is not

00:03:52.030 --> 00:03:52.720
observable.

00:03:52.720 --> 00:03:55.030
So the key goal of this
impact evaluation

00:03:55.030 --> 00:03:57.140
methodology is to mimic it.

00:03:57.140 --> 00:04:00.050
You can't observe how these
children in Kenya would have

00:04:00.050 --> 00:04:03.940
fared if the textbook program
had not been implemented.

00:04:03.940 --> 00:04:06.600
The truth is, the textbook
program was implemented, these

00:04:06.600 --> 00:04:10.310
textbooks were sent, and so you
can't observe what that

00:04:10.310 --> 00:04:13.520
alternative reality
would have been.

00:04:13.520 --> 00:04:17.130
And so constructing the
counterfactual is usually done

00:04:17.130 --> 00:04:19.810
by selecting a group
of people--

00:04:19.810 --> 00:04:22.810
in this case, children, in the
case of the Kenya example--

00:04:22.810 --> 00:04:26.320
that have not been exposed to
the program, or were not

00:04:26.320 --> 00:04:27.980
affected by the program.

00:04:27.980 --> 00:04:31.680
And so in a randomized
evaluation, the key goal here

00:04:31.680 --> 00:04:33.650
of the randomized evaluation
is that you

00:04:33.650 --> 00:04:35.470
do it from the beginning.

00:04:35.470 --> 00:04:38.300
And this is a question that I
think Logan had in the first

00:04:38.300 --> 00:04:40.190
session with Rachel.

00:04:40.190 --> 00:04:43.680
You can't do a randomized
evaluation three years after

00:04:43.680 --> 00:04:45.400
the program was implemented.

00:04:45.400 --> 00:04:49.200
And the reason you can't do it
is that you need to create,

00:04:49.200 --> 00:04:52.120
through this randomized
experiment, the treatment and

00:04:52.120 --> 00:04:52.860
the control group.

00:04:52.860 --> 00:04:55.600
You need to decide early on
who's going to get the

00:04:55.600 --> 00:04:57.870
treatment or who's going to be
offered the treatment and who

00:04:57.870 --> 00:04:59.830
is not going to be offered
the treatment.

00:04:59.830 --> 00:05:03.070
There are some opportunities,
as Rachel referred to, and

00:05:03.070 --> 00:05:07.010
your get out the vote case is a
good example, where someone

00:05:07.010 --> 00:05:08.610
already did this.

00:05:08.610 --> 00:05:11.580
And so you may be lucky, and you
may step into the room and

00:05:11.580 --> 00:05:12.520
say, oh, look.

00:05:12.520 --> 00:05:13.670
Someone did it.

00:05:13.670 --> 00:05:18.310
But the thing is, someone
should have taken care so that

00:05:18.310 --> 00:05:20.890
the assignment to this treatment
and control group

00:05:20.890 --> 00:05:23.130
was done in a random manner.

00:05:23.130 --> 00:05:25.950
And in effect, and we'll see
what exactly is random, but

00:05:25.950 --> 00:05:30.085
what I can tell you for now is
if someone doesn't say, we did

00:05:30.085 --> 00:05:33.100
it randomized, we did a
deliberate process so that it

00:05:33.100 --> 00:05:35.930
was random, it's probably
not random.

00:05:35.930 --> 00:05:38.950
Random is not what people
say in the real world.

00:05:38.950 --> 00:05:39.120
Oh!

00:05:39.120 --> 00:05:40.560
This is just a random event.

00:05:40.560 --> 00:05:43.250
Random has a very specific
definition which we're going

00:05:43.250 --> 00:05:44.430
to see in a second.

00:05:44.430 --> 00:05:47.450
So it's not enough to
just say, oh, look.

00:05:47.450 --> 00:05:49.210
We didn't do anything
systematic.

00:05:49.210 --> 00:05:51.560
Just people enrolled, and
that's what happened.

00:05:51.560 --> 00:05:55.200
If they didn't do something
deliberate to make it random,

00:05:55.200 --> 00:05:56.810
then it probably
wasn't random.

00:05:56.810 --> 00:06:01.640
You can try to check this,
but it's not always possible.

00:06:01.640 --> 00:06:06.320
The non-randomized methods basically
argue that some excluded

00:06:06.320 --> 00:06:10.570
group, the group of people
you're going to use as this

00:06:10.570 --> 00:06:14.130
comparison group, is mimicking
this counterfactual.

00:06:14.130 --> 00:06:19.490
And the non-randomized methods
rely on the strength of the

00:06:19.490 --> 00:06:20.730
assumption that you're making.

00:06:20.730 --> 00:06:25.580
So the methods will be strong
if the assumption that the

00:06:25.580 --> 00:06:27.960
comparison group mimics the
counterfactual is a good

00:06:27.960 --> 00:06:28.960
assumption.

00:06:28.960 --> 00:06:32.490
There's not any sense in which
you say, well, this method is

00:06:32.490 --> 00:06:35.860
better than this other one
in some absolute manner.

00:06:35.860 --> 00:06:39.240
It is better or it's not better
if the assumptions

00:06:39.240 --> 00:06:41.690
needed to mimic the
counterfactual hold.

00:06:41.690 --> 00:06:45.560
If they hold, then that's great,
you have a good method.

00:06:45.560 --> 00:06:47.660
The key distinction
between this--

00:06:47.660 --> 00:06:48.563
yes?

00:06:48.563 --> 00:06:50.910
AUDIENCE: Could you give us
an example of when the

00:06:50.910 --> 00:06:54.330
assumptions were just
obviously untrue?

00:06:54.330 --> 00:06:54.810
PROFESSOR: Sure.

00:06:54.810 --> 00:07:00.130
So suppose that you had this
textbook program and it was

00:07:00.130 --> 00:07:03.940
happening in Kenya,
where many--

00:07:03.940 --> 00:07:05.590
and this program happened--

00:07:05.590 --> 00:07:07.370
where many other things
were happening in

00:07:07.370 --> 00:07:08.670
this education system.

00:07:08.670 --> 00:07:11.550
So textbooks were being
distributed, different

00:07:11.550 --> 00:07:13.000
teachers were being hired.

00:07:13.000 --> 00:07:15.520
A lot of activities
were happening.

00:07:15.520 --> 00:07:19.310
And so if you just compare what
test scores of children were

00:07:19.310 --> 00:07:22.880
before the program and then
what test scores of children

00:07:22.880 --> 00:07:27.420
were after the program, you
would suspect that--

00:07:27.420 --> 00:07:30.570
well, first of all, if you did
that, the counterfactual you

00:07:30.570 --> 00:07:33.760
would be assuming is that in
the absence of the program,

00:07:33.760 --> 00:07:36.870
test scores would have
remained flat.

00:07:36.870 --> 00:07:38.700
And that may be a reasonable

00:07:38.700 --> 00:07:40.530
counterfactual in some contexts.

00:07:40.530 --> 00:07:42.340
Not many, to be honest.

00:07:42.340 --> 00:07:43.340
But not in others.

00:07:43.340 --> 00:07:46.200
So in a context in which other
things are happening in the

00:07:46.200 --> 00:07:50.200
education system in Kenya, it's
very hard to argue that

00:07:50.200 --> 00:07:52.220
nothing would have changed
in test scores.

00:07:52.220 --> 00:07:54.060
Because test scores would have
increased, because there are

00:07:54.060 --> 00:07:55.740
lots of things that happen.

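NOTE
A minimal sketch of the point above, assuming made-up numbers throughout:
a hypothetical background trend of 5 points and a hypothetical true program
effect of 2 points. The function name and parameters are illustrative only.
It suggests why a before-and-after comparison, which implicitly assumes flat
scores, can overstate impact when other things are changing.
```python
import random
import statistics
def simulate(n=4000, trend=5.0, true_effect=2.0, seed=0):
    """Return (before_after_estimate, treatment_vs_control_estimate)."""
    rng = random.Random(seed)
    # Hypothetical baseline test scores.
    baseline = [rng.gauss(50, 5) for _ in range(n)]
    # Coin-flip assignment to the program.
    treated = [rng.random() < 0.5 for _ in range(n)]
    # Every child gains `trend`; only treated children also gain `true_effect`.
    endline = [b + trend + (true_effect if t else 0.0) + rng.gauss(0, 1)
               for b, t in zip(baseline, treated)]
    before_t = statistics.mean(b for b, t in zip(baseline, treated) if t)
    after_t = statistics.mean(e for e, t in zip(endline, treated) if t)
    after_c = statistics.mean(e for e, t in zip(endline, treated) if not t)
    # Before-after implicitly assumes scores would have stayed flat.
    return after_t - before_t, after_t - after_c
before_after, treat_vs_control = simulate()
print(round(before_after, 2), round(treat_vs_control, 2))
```
In this toy setup the first number should sit near trend plus effect, and
the second near the effect alone, because the control group absorbs the trend.
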
00:07:55.740 --> 00:07:59.130
Now suppose you implemented this
same program in a very

00:07:59.130 --> 00:08:03.280
remote village, very secluded
area where nothing else would

00:08:03.280 --> 00:08:04.110
have happened.

00:08:04.110 --> 00:08:07.350
You sort of have a pretty
good sense that no other

00:08:07.350 --> 00:08:09.960
intervention was happening
for one group or the

00:08:09.960 --> 00:08:11.430
other at the same time.

00:08:11.430 --> 00:08:13.240
The assumption may be
more plausible.

00:08:13.240 --> 00:08:15.970
I think in this case, the
textbook example, it's still

00:08:15.970 --> 00:08:18.790
questionable, because there are
other educational inputs

00:08:18.790 --> 00:08:19.970
that may be happening.

00:08:19.970 --> 00:08:23.120
But the key is that the context
and the method are the

00:08:23.120 --> 00:08:25.980
ones that together can
tell you how good

00:08:25.980 --> 00:08:27.360
the assumption is.

00:08:27.360 --> 00:08:29.380
The method by itself
cannot tell you.

00:08:29.380 --> 00:08:32.510
The method by itself may be
reasonable under certain

00:08:32.510 --> 00:08:34.500
conditions but not
under others.

00:08:34.500 --> 00:08:36.870
AUDIENCE: But aren't there any
sort of big famous studies

00:08:36.870 --> 00:08:39.836
that weren't randomized, that
everybody thinks are

00:08:39.836 --> 00:08:40.140
pretty good?

00:08:40.140 --> 00:08:40.700
PROFESSOR: Yes.

00:08:40.700 --> 00:08:45.270
So I don't want to get a lot
into this, but there's a whole

00:08:45.270 --> 00:08:50.780
debate now in economics
literature as to whether

00:08:50.780 --> 00:08:53.680
randomized experiments
are the only way to

00:08:53.680 --> 00:08:55.490
estimate causal effects.

00:08:55.490 --> 00:08:59.200
This is a big, big debate, and
there are very respectable

00:08:59.200 --> 00:09:02.160
people on both sides
of the debate.

00:09:02.160 --> 00:09:05.060
What I can tell you is that
debate has not been resolved,

00:09:05.060 --> 00:09:08.240
but I think more and more
people are sort of

00:09:08.240 --> 00:09:11.460
recognizing, at least, that
the randomized experiment

00:09:11.460 --> 00:09:12.670
should be a first best.

00:09:12.670 --> 00:09:17.120
I think even the opponents of
the method do say that.

00:09:17.120 --> 00:09:19.190
But the other thing I would say
is there have been many

00:09:19.190 --> 00:09:23.210
studies trying to compare the
results of an experiment with

00:09:23.210 --> 00:09:25.920
some of the other
non-experimental methods.

00:09:25.920 --> 00:09:28.990
You have one in your
get out the vote.

00:09:28.990 --> 00:09:32.950
That was not a study in which
the non-experimental methods

00:09:32.950 --> 00:09:35.840
fared very well, but there are
other studies in which they

00:09:35.840 --> 00:09:36.880
fared well.

00:09:36.880 --> 00:09:40.000
The key thing is we haven't been
able to figure out under

00:09:40.000 --> 00:09:44.650
what conditions the
non-randomized evaluations

00:09:44.650 --> 00:09:45.560
fare well.

00:09:45.560 --> 00:09:47.750
If we knew, then it
would be nice.

00:09:47.750 --> 00:09:50.080
But I think so far,
the answer--

00:09:50.080 --> 00:09:51.300
we don't know.

00:09:51.300 --> 00:09:53.660
We know the theoretical answer,
which is, if the

00:09:53.660 --> 00:09:57.860
assumptions hold,
we're golden.

00:09:57.860 --> 00:10:03.290
The problem, key problem, is
that this is relying on the

00:10:03.290 --> 00:10:07.040
assumptions, and you cannot
test these assumptions.

00:10:07.040 --> 00:10:09.210
If you could test this
assumption, if you could test

00:10:09.210 --> 00:10:13.380
under what assumption this
mimics the counterfactual,

00:10:13.380 --> 00:10:14.220
we'd be all done.

00:10:14.220 --> 00:10:17.230
We'd be able to say, from the
very beginning, we can use

00:10:17.230 --> 00:10:18.050
this method.

00:10:18.050 --> 00:10:20.990
You cannot, no matter how
sophisticated and how good the

00:10:20.990 --> 00:10:23.320
non-experimental method is.

00:10:23.320 --> 00:10:24.220
Yes?

00:10:24.220 --> 00:10:24.895
You seem skeptical.

00:10:24.895 --> 00:10:26.860
AUDIENCE: No, no, no.

00:10:26.860 --> 00:10:27.800
PROFESSOR: You're--?

00:10:27.800 --> 00:10:28.950
OK.

00:10:28.950 --> 00:10:32.380
So this is very confusing.

00:10:32.380 --> 00:10:33.905
It's like twice they're
showing--

00:10:36.850 --> 00:10:38.560
you should do a randomized
evaluation

00:10:38.560 --> 00:10:39.590
to see if this helps.

00:10:39.590 --> 00:10:41.400
Two boards.

00:10:41.400 --> 00:10:41.870
All right.

00:10:41.870 --> 00:10:44.620
So the randomized evaluations
here, you have a bunch of

00:10:44.620 --> 00:10:47.030
other names in which they are
known-- random assignment

00:10:47.030 --> 00:10:50.180
studies, randomized field
trials, just in case--

00:10:50.180 --> 00:10:53.360
RCTs are the way that they were
known very early in the

00:10:53.360 --> 00:10:57.310
literature, and still nowadays
in other disciplines.

00:10:57.310 --> 00:11:01.030
And then the non-experimental
methods, all of this that you

00:11:01.030 --> 00:11:03.560
have here, some of which
are in your get

00:11:03.560 --> 00:11:07.310
out the vote study.

00:11:07.310 --> 00:11:07.610
All right.

00:11:07.610 --> 00:11:10.490
So before we go into what is a
randomized experiment, I want

00:11:10.490 --> 00:11:12.640
to introduce the notion
of validity.

00:11:12.640 --> 00:11:14.130
And Rachel raised
it a little bit.

00:11:14.130 --> 00:11:18.640
But we usually think in terms
of two kinds of validity

00:11:18.640 --> 00:11:20.570
when you assess a study.

00:11:20.570 --> 00:11:22.440
The first one is internal
validity.

00:11:22.440 --> 00:11:24.150
This has to do with
your ability

00:11:24.150 --> 00:11:25.670
to draw causal inference.

00:11:25.670 --> 00:11:29.260
So your ability to attribute
your impact

00:11:29.260 --> 00:11:30.980
estimates to the program.

00:11:30.980 --> 00:11:34.440
So if you said, this difference
is my impact

00:11:34.440 --> 00:11:38.470
estimate, the study has strong
internal validity if you can

00:11:38.470 --> 00:11:41.610
reliably attribute that to the
program and not to something

00:11:41.610 --> 00:11:48.310
else for whatever population is
represented in your study.

00:11:48.310 --> 00:11:53.400
So if you did the textbook
project in Kenya, in a rural

00:11:53.400 --> 00:11:57.410
village in Kenya, well,
that study--

00:11:57.410 --> 00:12:00.210
if it's internally valid, or
if it has strong internal

00:12:00.210 --> 00:12:03.620
validity, then it's going to
be valid for the population

00:12:03.620 --> 00:12:06.755
represented by the sample you
drew in Kenya, in that rural

00:12:06.755 --> 00:12:07.760
village in Kenya.

00:12:07.760 --> 00:12:10.680
External validity, on the other
hand, has to do with the

00:12:10.680 --> 00:12:14.400
ability to generalize to other
populations, other settings,

00:12:14.400 --> 00:12:15.650
other time periods.

00:12:18.370 --> 00:12:23.660
The reason I mention this is
that these two things often

00:12:23.660 --> 00:12:27.490
trade off against each other
when you are sort of trying to

00:12:27.490 --> 00:12:29.050
commission or conduct a study.

00:12:29.050 --> 00:12:32.750
So you may decide, I'm going to
do this randomized trial in

00:12:32.750 --> 00:12:36.920
this very small place to
test out my model.

00:12:36.920 --> 00:12:39.850
And you may be concerned with,
how do I know if it

00:12:39.850 --> 00:12:42.830
generalizes to other settings?

00:12:42.830 --> 00:12:45.990
On the other hand, you may
decide, well, I'm going to use

00:12:45.990 --> 00:12:50.260
other kinds of methods and be
representative of the whole

00:12:50.260 --> 00:12:52.630
of Kenya, or the whole of India, or
whatever country you're

00:12:52.630 --> 00:12:53.790
working in.

00:12:53.790 --> 00:12:56.050
The key thing is to distinguish
two things.

00:12:56.050 --> 00:12:59.110
The first one has to do with
causal inference for your own

00:12:59.110 --> 00:13:02.050
sample, or for the population
represented in your sample.

00:13:02.050 --> 00:13:05.920
The second one has to do
with generalizability.

00:13:05.920 --> 00:13:08.150
And Rachel talked a little bit
about how much you can

00:13:08.150 --> 00:13:10.650
generalize from experiments,
and we can talk

00:13:10.650 --> 00:13:12.500
about that if you want.

00:13:12.500 --> 00:13:13.010
All right.

00:13:13.010 --> 00:13:15.960
So what is a randomized
evaluation?

00:13:15.960 --> 00:13:19.360
So the very basics--

00:13:19.360 --> 00:13:21.880
can someone tell me what
the basics are?

00:13:21.880 --> 00:13:24.640
Randomized experiments?

00:13:24.640 --> 00:13:26.800
How do you do it?

00:13:26.800 --> 00:13:28.050
How does it work?

00:13:31.600 --> 00:13:36.420
There's one thing that
you should know.

00:13:36.420 --> 00:13:39.760
When I first started teaching,
I used to be very, very

00:13:39.760 --> 00:13:42.720
nervous when there was
silence in the room.

00:13:42.720 --> 00:13:45.650
But now I'm very comfortable.

00:13:45.650 --> 00:13:46.990
So you tell me.

00:13:46.990 --> 00:13:51.120
So how does a randomized
trial work?

00:13:51.120 --> 00:13:53.390
AUDIENCE: Allocate the subjects
into the treatment or the

00:13:53.390 --> 00:13:55.680
control group based on
a random assignment.

00:13:55.680 --> 00:13:56.810
PROFESSOR: OK.

00:13:56.810 --> 00:13:57.970
Random assignment.

00:13:57.970 --> 00:14:00.140
Sort of like a flip
of a coin, right?

00:14:00.140 --> 00:14:03.940
So in the simple scenario, we
take a sample of program

00:14:03.940 --> 00:14:04.460
applicants--

00:14:04.460 --> 00:14:06.370
just like we do with
drug trials--

00:14:06.370 --> 00:14:09.240
take a sample of program
applicants and we randomly

00:14:09.240 --> 00:14:11.670
assign them either to a
treatment group which is

00:14:11.670 --> 00:14:14.330
offered the treatment
or to a control group.

00:14:14.330 --> 00:14:15.740
They're not offered
the treatment.

00:14:15.740 --> 00:14:20.440
This is a very simple setting,
but the idea here is that by

00:14:20.440 --> 00:14:23.550
doing this, the treatment and
the control group are

00:14:23.550 --> 00:14:25.650
comparable to each other.

00:14:25.650 --> 00:14:28.290
And so any differences you
observe between these two

00:14:28.290 --> 00:14:31.970
groups should be attributable
to the program.

00:14:31.970 --> 00:14:33.880
The key about this method--

00:14:33.880 --> 00:14:37.840
so these do not differ
systematically at the outset

00:14:37.840 --> 00:14:39.070
of the experiment.

00:14:39.070 --> 00:14:42.320
The key about this method is
that this control group is

00:14:42.320 --> 00:14:43.930
mimicking the counterfactual.

00:14:43.930 --> 00:14:47.100
It's mimicking what would happen
to the treatment group in the

00:14:47.100 --> 00:14:48.560
absence of the treatment.

00:14:48.560 --> 00:14:52.160
And the reason it's mimicking
the counterfactual is that on

00:14:52.160 --> 00:14:55.110
average, this group should
be exactly like

00:14:55.110 --> 00:14:55.915
the treatment group.

00:14:55.915 --> 00:14:59.950
So if we took all of you and
we flipped coins, for each of

00:14:59.950 --> 00:15:02.710
you we flipped a coin, and then you
ended up in two different

00:15:02.710 --> 00:15:07.110
groups, the two groups would
have, on average, the same

00:15:07.110 --> 00:15:08.440
characteristics.

00:15:08.440 --> 00:15:11.370
So the same share of people
that come from a

00:15:11.370 --> 00:15:12.830
certain area of the world.

00:15:12.830 --> 00:15:14.320
The same percent of females.

00:15:14.320 --> 00:15:15.810
The same average intelligence.

00:15:15.810 --> 00:15:17.520
The same average income.

00:15:17.520 --> 00:15:19.330
The same average education.

00:15:19.330 --> 00:15:20.090
You name it.

00:15:20.090 --> 00:15:22.540
We're going to do an exercise
where you can see this.

00:15:22.540 --> 00:15:25.470
The beauty of this method
is that the two groups

00:15:25.470 --> 00:15:28.940
statistically are going to be
identical to each other.

00:15:28.940 --> 00:15:32.620
If they're not identical to each
other statistically then

00:15:32.620 --> 00:15:34.030
you don't have random
assignment.

00:15:34.030 --> 00:15:35.250
It has failed.

00:15:35.250 --> 00:15:37.340
Random assignment.

00:15:37.340 --> 00:15:40.110
So the random assignment is
the process you employ to

00:15:40.110 --> 00:15:42.280
create these two comparable
groups.

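NOTE
A minimal sketch of the coin-flip idea, assuming a hypothetical population
of 10,000 people with a made-up income and a made-up female indicator. The
names coin_flip_assign, income, and female are illustrative only. It shows
that the two groups should end up with similar averages without anyone
matching them by hand.
```python
import random
import statistics
def coin_flip_assign(people, seed=1):
    """Split people into treatment and control with one coin flip each."""
    rng = random.Random(seed)
    treatment, control = [], []
    for person in people:
        (treatment if rng.random() < 0.5 else control).append(person)
    return treatment, control
# Hypothetical population; these values are invented for illustration.
pop_rng = random.Random(42)
people = [{"income": pop_rng.gauss(1000, 200),
           "female": 1.0 if pop_rng.random() < 0.5 else 0.0}
          for _ in range(10000)]
treatment, control = coin_flip_assign(people)
for key in ("income", "female"):
    t_mean = statistics.mean(p[key] for p in treatment)
    c_mean = statistics.mean(p[key] for p in control)
    print(key, round(t_mean, 3), round(c_mean, 3))
```
The group averages will not match exactly, only up to chance variation,
and that variation shrinks as the sample grows.
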
00:15:42.280 --> 00:15:45.950
The huge advantage of this
random assignment is that you

00:15:45.950 --> 00:15:49.780
don't need to think about, are
the two groups the same on

00:15:49.780 --> 00:15:52.350
this characteristic
that I care about?

00:15:52.350 --> 00:15:54.180
You don't need to think
about that.

00:15:54.180 --> 00:15:57.680
The two groups should be the
same on those characteristics.

00:15:57.680 --> 00:15:58.620
AUDIENCE: So that's
theoretically.

00:15:58.620 --> 00:16:01.590
So now thinking in terms of a
program where you have, say,

00:16:01.590 --> 00:16:03.480
selection criteria.

00:16:03.480 --> 00:16:05.830
So let's say you want to do
a program in a particular

00:16:05.830 --> 00:16:09.490
district, and you're looking
for people that have three

00:16:09.490 --> 00:16:11.750
characteristics that
are all the same.

00:16:11.750 --> 00:16:13.840
Let's say for whatever reason,
the number of people that

00:16:13.840 --> 00:16:17.065
present themselves in that way
is a relatively small number.

00:16:19.670 --> 00:16:22.980
Then you can randomly select
within that small number.

00:16:22.980 --> 00:16:25.900
But then you're challenged by
the size of your group.

00:16:25.900 --> 00:16:26.710
PROFESSOR: Absolutely.

00:16:26.710 --> 00:16:29.420
And on Thursday, you'll
get to that

00:16:29.420 --> 00:16:31.220
minimum sample size detected.

00:16:31.220 --> 00:16:34.090
But the key there, if those
three characteristics are your

00:16:34.090 --> 00:16:37.250
selection criteria, you don't
want to modify your selection

00:16:37.250 --> 00:16:39.180
criteria because someone is
going to come and do an

00:16:39.180 --> 00:16:40.040
experiment.

00:16:40.040 --> 00:16:42.500
You want to offer the program
to whoever you're going to

00:16:42.500 --> 00:16:43.640
offer the program.

00:16:43.640 --> 00:16:46.010
So if those three characteristics
are key for your program,

00:16:46.010 --> 00:16:49.050
because you decide those are the
people you want to serve,

00:16:49.050 --> 00:16:52.040
then you need to find a way
to do your evaluation that

00:16:52.040 --> 00:16:54.270
doesn't involve relaxing
those criteria.

00:16:54.270 --> 00:16:57.070
Unless you really are thinking,
well, it would be

00:16:57.070 --> 00:16:59.250
interesting to know if I served
this other group,

00:16:59.250 --> 00:17:01.590
whether the program has a
different effect or not.

00:17:01.590 --> 00:17:03.620
AUDIENCE: But you can't mix and
match among the criteria.

00:17:03.620 --> 00:17:05.280
You can't say-- or could you?

00:17:05.280 --> 00:17:06.640
Let's say you have trouble.

00:17:06.640 --> 00:17:07.710
You're not getting
enough people

00:17:07.710 --> 00:17:08.710
with those three criteria.

00:17:08.710 --> 00:17:11.090
So you say, OK, now we're going
to make it six criteria,

00:17:11.090 --> 00:17:13.339
and we'll be happy if they only
meet four of the six.

00:17:13.339 --> 00:17:16.630
That right there would not make
it possible to do this.

00:17:16.630 --> 00:17:22.160
PROFESSOR: So if, at the end
of your process, where

00:17:22.160 --> 00:17:24.859
you're saying three criteria,
six criteria, five, four,

00:17:24.859 --> 00:17:27.490
whatever you say-- if at the end
of this process, you end

00:17:27.490 --> 00:17:32.340
up with a large enough pool to
be able to randomly assign

00:17:32.340 --> 00:17:34.930
into two groups, treatment
and control?

00:17:34.930 --> 00:17:36.190
No problem.

00:17:36.190 --> 00:17:37.495
You could have relaxed
the criteria.

00:17:37.495 --> 00:17:41.990
You could have said six, five,
four, whatever you want.

00:17:41.990 --> 00:17:45.960
My previous answer is more to,
don't change the criteria just

00:17:45.960 --> 00:17:47.580
because you want to do
a randomized trial.

00:17:47.580 --> 00:17:49.180
You want to evaluate
the program

00:17:49.180 --> 00:17:50.380
that you want to evaluate.

00:17:50.380 --> 00:17:52.790
You don't want to evaluate the
program that you think will

00:17:52.790 --> 00:17:55.180
fit the randomized design.

00:17:55.180 --> 00:17:56.800
Make sense?

00:17:56.800 --> 00:17:59.703
Other questions, comments?

00:17:59.703 --> 00:18:01.180
No?

00:18:01.180 --> 00:18:01.520
OK.

00:18:01.520 --> 00:18:04.630
So the two groups do not differ
systematically at the

00:18:04.630 --> 00:18:05.610
outset of the experiment.

00:18:05.610 --> 00:18:06.990
I want to emphasize this.

00:18:06.990 --> 00:18:09.070
And again, there's going to be
an exercise where you can see

00:18:09.070 --> 00:18:10.180
this in Excel.

00:18:10.180 --> 00:18:13.850
But the key is that the two
groups will be identical both

00:18:13.850 --> 00:18:15.875
on observable characteristics
and non-observable.

00:18:15.875 --> 00:18:17.920
And when I say identical,
they're identical

00:18:17.920 --> 00:18:18.730
statistically.

00:18:18.730 --> 00:18:20.910
It's not like the means
of these two groups

00:18:20.910 --> 00:18:21.980
are exactly the same.

00:18:21.980 --> 00:18:26.290
They are statistically identical
in the sense that

00:18:26.290 --> 00:18:28.560
you should not observe a pattern
of statistically

00:18:28.560 --> 00:18:31.210
significant differences between
the two groups.

00:18:31.210 --> 00:18:34.370
If you were to test 100
characteristics, then five of

00:18:34.370 --> 00:18:36.670
them may end up being
statistically significant,

00:18:36.670 --> 00:18:40.240
just because of the luck of the
draw or multiple testing.

00:18:40.240 --> 00:18:43.690
But they shouldn't differ
systematically at the outset

00:18:43.690 --> 00:18:46.130
of the experiment.

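NOTE
A minimal sketch of the "five out of 100" remark, assuming 100 made-up
baseline traits that are unrelated to assignment by construction and a
simple two-sample z-test at the 5 percent level. The function name and its
parameters are illustrative only.
```python
import math
import random
import statistics
def count_chance_differences(n_people=400, n_traits=100, seed=7):
    """Count traits that differ 'significantly' (|z| > 1.96) purely by chance."""
    rng = random.Random(seed)
    ids = list(range(n_people))
    rng.shuffle(ids)
    treated = set(ids[: n_people // 2])  # a random half goes to treatment
    significant = 0
    for _ in range(n_traits):
        # Hypothetical trait values, drawn independently of assignment.
        values = [rng.gauss(0, 1) for _ in range(n_people)]
        t = [v for i, v in enumerate(values) if i in treated]
        c = [v for i, v in enumerate(values) if i not in treated]
        se = math.sqrt(statistics.variance(t) / len(t)
                       + statistics.variance(c) / len(c))
        z = (statistics.mean(t) - statistics.mean(c)) / se
        if abs(z) > 1.96:
            significant += 1
    return significant
print(count_chance_differences())
```
A handful of flagged traits is expected from chance alone. What would be
worrying is a systematic pattern, or far more flags than about 5 percent.
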
00:18:46.130 --> 00:18:49.240
And this is the key.

00:18:49.240 --> 00:18:51.940
The whole key of impact
evaluation is that then you

00:18:51.940 --> 00:18:54.660
can take that difference and
attribute it to the program.

00:18:54.660 --> 00:18:57.510
And then you're not thinking,
is it the program, or is it

00:18:57.510 --> 00:19:00.650
some pre-existing differences
between the groups?

00:19:00.650 --> 00:19:04.710
If you reach the end of an
impact evaluation and you're

00:19:04.710 --> 00:19:08.180
wondering, is it the program,
or is it something else?

00:19:08.180 --> 00:19:10.560
Unfortunately, that's not a very
good impact evaluation.

00:19:16.780 --> 00:19:18.680
So there are some variations
on the basics.

00:19:18.680 --> 00:19:20.530
You could assign to multiple
treatment groups.

00:19:20.530 --> 00:19:23.490
So rather than having only one
treatment, you could have

00:19:23.490 --> 00:19:25.230
multiple treatments.

00:19:25.230 --> 00:19:27.460
And this happens a lot if
you're trying to test

00:19:27.460 --> 00:19:30.080
different ways of implementing
a program.

00:19:30.080 --> 00:19:35.100
So you may have a program that
you're thinking, well, I don't

00:19:35.100 --> 00:19:37.450
know if the best way to deliver
it is method number

00:19:37.450 --> 00:19:38.920
one or method number two.

00:19:38.920 --> 00:19:41.570
And you may randomize
into three groups.

00:19:41.570 --> 00:19:43.360
Method number one, method
number two,

00:19:43.360 --> 00:19:44.490
and a control group.

00:19:44.490 --> 00:19:47.090
Or you may decide to do away
with the control group and

00:19:47.090 --> 00:19:50.320
only randomize into, say, three
methods, three ways of

00:19:50.320 --> 00:19:51.640
delivering an intervention.

00:19:51.640 --> 00:19:54.180
If you do away with the control
group, you're going to

00:19:54.180 --> 00:19:56.320
be able to answer the question,
is one treatment

00:19:56.320 --> 00:19:57.400
better than the other?

00:19:57.400 --> 00:19:59.790
But you're not going to be able
to answer the question,

00:19:59.790 --> 00:20:02.580
is any of these treatments better
than what would have

00:20:02.580 --> 00:20:04.950
happened in the absence
of the program?
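
NOTE
A minimal sketch of assigning units to several arms, with or without a
control group. The helper assign_arms and the arm labels are hypothetical;
equal arm sizes are an assumption, not something the lecture requires.
```python
import random
# Hypothetical helper: deal units into arms in equal shares, in random order.
def assign_arms(units, arms, seed=0):
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Dealing the shuffled list round-robin keeps arm sizes within one unit.
    return {unit: arms[i % len(arms)] for i, unit in enumerate(shuffled)}
people = [f"person_{i}" for i in range(9)]
# Two delivery methods plus a control group.
with_control = assign_arms(people, ["method_1", "method_2", "control"])
# Three delivery methods and no control group.
without_control = assign_arms(people, ["method_1", "method_2", "method_3"])
print(with_control)
print(without_control)
```
Only the first design can say whether any method beats no program at
all; the second can only rank the methods against each other.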

00:20:04.950 --> 00:20:06.210
So this is one variation.

00:20:06.210 --> 00:20:09.190
And the other variation, we were
talking about when Iqbal

00:20:09.190 --> 00:20:10.470
answered the question.

00:20:10.470 --> 00:20:12.120
He said, well, you have
a bunch of people.

00:20:12.120 --> 00:20:13.490
You assign some to
the treatment or

00:20:13.490 --> 00:20:14.720
to the control group.

00:20:14.720 --> 00:20:17.800
You can assign units other than
people or households.

00:20:17.800 --> 00:20:21.770
Health centers, schools, local
government, villages.

00:20:21.770 --> 00:20:23.740
And you can see on
J-PAL's website.

00:20:23.740 --> 00:20:25.950
There are a bunch of examples
where each of these have been

00:20:25.950 --> 00:20:29.910
used as units for random
assignment.

00:20:29.910 --> 00:20:31.400
Yes?

00:20:31.400 --> 00:20:32.280
Your name, please?

00:20:32.280 --> 00:20:34.340
We don't have name tags,
but I like to call

00:20:34.340 --> 00:20:35.630
people by their name.

00:20:35.630 --> 00:20:36.550
Wendy?

00:20:36.550 --> 00:20:37.920
Go ahead.

00:20:37.920 --> 00:20:45.630
AUDIENCE: So if we pick schools,
my conclusions will

00:20:45.630 --> 00:20:47.380
be about schools.

00:20:47.380 --> 00:20:50.427
They won't be about the students
in the school.

00:20:50.427 --> 00:20:51.590
Or is that wrong?

00:20:51.590 --> 00:20:54.970
PROFESSOR: So it depends on--
you say your conclusions will

00:20:54.970 --> 00:20:57.530
be about the schools?

00:20:57.530 --> 00:21:01.320
The key thing is, what is the
unit of intervention here?

00:21:01.320 --> 00:21:04.990
So is it a program that's
directed at all the children

00:21:04.990 --> 00:21:08.110
in the school, or only some
children in the school?

00:21:08.110 --> 00:21:10.970
In part, the decision of what
you randomize, whether it's

00:21:10.970 --> 00:21:13.990
schools or children within
schools, depends on what's the

00:21:13.990 --> 00:21:16.040
nature of the treatment.

00:21:16.040 --> 00:21:19.660
So if you have a program
that serves everyone

00:21:19.660 --> 00:21:21.280
in the school, yes.

00:21:21.280 --> 00:21:24.740
Your assignment should be
at the school level.

00:21:24.740 --> 00:21:27.020
That is, you should have some
schools that receive the

00:21:27.020 --> 00:21:28.970
program and others that don't.

00:21:28.970 --> 00:21:31.540
But if you have a program that
is only going to serve some

00:21:31.540 --> 00:21:35.820
children in the school, then
your assignment could be

00:21:35.820 --> 00:21:38.570
within the school, and you have
some children who receive

00:21:38.570 --> 00:21:41.840
the treatment, and others
that do not.

00:21:41.840 --> 00:21:44.630
The key, though, is if you're
using your second method, you

00:21:44.630 --> 00:21:46.610
want to make sure there
are no spillovers.

00:21:46.610 --> 00:21:49.070
You want to make sure that
someone receiving the

00:21:49.070 --> 00:21:53.020
treatment is not going to affect
the outcomes of someone

00:21:53.020 --> 00:21:55.030
not receiving the treatment.

00:21:55.030 --> 00:21:56.550
And so you're going to
see the spillovers.

00:21:56.550 --> 00:21:59.700
That's something you're going
to see on Friday.

00:21:59.700 --> 00:22:03.500
But the basic idea is, what
level of randomization you

00:22:03.500 --> 00:22:06.890
have depends on, what is the
level of your treatment?

00:22:06.890 --> 00:22:09.240
If you're treating schools, if
you're treating individuals

00:22:09.240 --> 00:22:10.775
within schools, et cetera.
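
NOTE
A minimal sketch contrasting the two levels of randomization discussed
here, assuming a made-up roster of four schools. The function names and
the roster are hypothetical.
```python
import random
# Hypothetical roster: each school maps to the children enrolled in it.
roster = {
    "school_A": ["a1", "a2", "a3", "a4"],
    "school_B": ["b1", "b2", "b3", "b4"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3", "d4"],
}
def randomize_schools(roster, seed=0):
    # School-level assignment: every child inherits the school's status.
    rng = random.Random(seed)
    schools = sorted(roster)
    rng.shuffle(schools)
    treated = set(schools[: len(schools) // 2])
    return {
        child: ("treatment" if school in treated else "control")
        for school, children in roster.items()
        for child in children
    }
def randomize_within_schools(roster, seed=0):
    # Child-level assignment, done separately inside each school.
    rng = random.Random(seed)
    assignment = {}
    for school in sorted(roster):
        children = list(roster[school])
        rng.shuffle(children)
        half = len(children) // 2
        for i, child in enumerate(children):
            assignment[child] = "treatment" if i < half else "control"
    return assignment
by_school = randomize_schools(roster)
by_child = randomize_within_schools(roster)
print(by_school)
print(by_child)
```
The within-school version is only appropriate when treated children are
unlikely to affect their untreated classmates, which is the spillover
concern raised above.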

00:22:10.775 --> 00:22:14.060
AUDIENCE: So statistically I
want them to be the same.

00:22:14.060 --> 00:22:18.310
PROFESSOR: You want them
to be the same, yes.

00:22:18.310 --> 00:22:19.555
AUDIENCE: My name is Manuel.

00:22:19.555 --> 00:22:21.485
Please talk a little bit
about the unobserved

00:22:21.485 --> 00:22:22.590
characteristics.

00:22:22.590 --> 00:22:23.930
PROFESSOR: Yes.

00:22:23.930 --> 00:22:26.960
So the unobserved
characteristics--

00:22:26.960 --> 00:22:30.220
this is something that a lot
of the non-experimental

00:22:30.220 --> 00:22:32.430
methods wrestle with.

00:22:32.430 --> 00:22:38.390
And the idea is, the randomized
experiment creates

00:22:38.390 --> 00:22:42.680
these two groups that, by pure
laws of statistics, are

00:22:42.680 --> 00:22:46.480
identical in every single
characteristic,

00:22:46.480 --> 00:22:48.060
statistically speaking.

00:22:48.060 --> 00:22:49.920
So both the ones you
observe and the

00:22:49.920 --> 00:22:51.390
ones you don't observe.

00:22:51.390 --> 00:22:54.410
So if we were trying to do an
experiment in this classroom

00:22:54.410 --> 00:22:58.575
and I randomly assigned you
into two groups, I can be

00:22:58.575 --> 00:23:02.430
confident that even things I
don't observe about you,

00:23:02.430 --> 00:23:04.920
you're going to be balanced
across those two groups.

00:23:04.920 --> 00:23:09.480
If instead I try to match you,
I use all the information you

00:23:09.480 --> 00:23:14.450
gave me on your application
forms and say OK, these people

00:23:14.450 --> 00:23:15.450
are from this--

00:23:15.450 --> 00:23:19.200
I'm going to be able to do so
with the observables, but not

00:23:19.200 --> 00:23:20.520
with the unobservables.
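
NOTE
A minimal simulation of why random assignment balances traits nobody
measured. The trait "motivation" and the self-selection rule are invented
stand-ins for any non-random way of entering a program; this is not the
matching procedure itself.
```python
import random
import statistics
rng = random.Random(1)
# "motivation" stands in for any trait the evaluator never observes.
motivation = [rng.gauss(0.0, 1.0) for _ in range(10000)]
def gap(in_treatment):
    # Difference in mean motivation between the treated and the untreated.
    treated = [m for m, flag in zip(motivation, in_treatment) if flag]
    untreated = [m for m, flag in zip(motivation, in_treatment) if not flag]
    return statistics.fmean(treated) - statistics.fmean(untreated)
# Random assignment ignores motivation entirely.
random_flags = [rng.random() < 0.5 for _ in motivation]
# Self-selection: more motivated people are more likely to sign up.
selected_flags = [rng.random() < (0.8 if m > 0 else 0.2) for m in motivation]
random_gap = gap(random_flags)
selected_gap = gap(selected_flags)
print(f"gap under random assignment: {random_gap:.3f}")
print(f"gap under self-selection:    {selected_gap:.3f}")
```
Under random assignment the gap in the hidden trait is close to zero;
under self-selection it is large, and no amount of adjusting for
observed characteristics can be checked against it.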

00:23:20.520 --> 00:23:23.540
And again, depending on how
important these unobservables

00:23:23.540 --> 00:23:27.080
are in explaining the outcomes,
that may be a big

00:23:27.080 --> 00:23:29.810
disadvantage or not so
big disadvantage.

00:23:29.810 --> 00:23:33.640
And this is what happened in the
get out the vote example.

00:23:33.640 --> 00:23:37.730
You were able to observe some
characteristics of people.

00:23:37.730 --> 00:23:40.660
And then non-experimental
methods, all of them--

00:23:40.660 --> 00:23:42.250
I mean, not all of them,
but most of them--

00:23:42.250 --> 00:23:45.710
can address those.

00:23:45.710 --> 00:23:48.450
Some of the methods can also
address some unobservables,

00:23:48.450 --> 00:23:53.170
but again, they always rely on
some assumption about how

00:23:53.170 --> 00:23:55.060
those unobservables behave.

00:23:55.060 --> 00:23:58.090
Here you're not relying
on any assumptions.

00:23:58.090 --> 00:24:00.750
You need to do the random
assignment properly, but once

00:24:00.750 --> 00:24:05.700
it's done properly, you're not
relying on any assumption.

00:24:05.700 --> 00:24:07.220
AUDIENCE: Is that the
general dichotomy?

00:24:07.220 --> 00:24:12.992
There's randomized tests, and
then matched pairs tests?

00:24:12.992 --> 00:24:15.066
Or is there other-- is
it generally broken

00:24:15.066 --> 00:24:17.100
down into those two?

00:24:17.100 --> 00:24:21.220
PROFESSOR: So the way that I
think most people break it

00:24:21.220 --> 00:24:25.520
down is randomized, where you
use this random assignment,

00:24:25.520 --> 00:24:28.860
and then non-experimental
methods.

00:24:28.860 --> 00:24:31.760
But I don't mean to imply that
all the non-experimental

00:24:31.760 --> 00:24:33.440
methods are the same.

00:24:33.440 --> 00:24:35.620
And in fact, there are some
people who call them

00:24:35.620 --> 00:24:37.120
quasi-experimental methods.

00:24:37.120 --> 00:24:40.840
Those people tend to think of
them a little bit higher than

00:24:40.840 --> 00:24:42.460
the non-experimental methods.

00:24:42.460 --> 00:24:46.100
Non-experimental people tend
to say, this is not good.

00:24:46.100 --> 00:24:49.990
Quasi-experimental, oh, this
gets closer to the experiment.

00:24:49.990 --> 00:24:55.470
But the key thing here is that
whatever method you use, the

00:24:55.470 --> 00:25:00.010
key is how are the people
getting into the program being

00:25:00.010 --> 00:25:03.500
selected, and how are you
forming that comparison group,

00:25:03.500 --> 00:25:07.060
and what statistical techniques
are you using to

00:25:07.060 --> 00:25:10.550
adjust for whether that
comparison group is the same

00:25:10.550 --> 00:25:11.810
as the treatment group or not?

00:25:11.810 --> 00:25:13.290
So the dichotomy is not between

00:25:13.290 --> 00:25:14.310
randomized and matching.

00:25:14.310 --> 00:25:17.100
The dichotomy is usually
between randomized and

00:25:17.100 --> 00:25:18.530
everything else.

00:25:18.530 --> 00:25:20.660
But within everything else,
there are methods that are

00:25:20.660 --> 00:25:23.010
much better than others.

00:25:23.010 --> 00:25:24.260
Yes?

00:25:26.110 --> 00:25:27.674
[? Holgo? ?]

00:25:27.674 --> 00:25:30.536
AUDIENCE: How do we randomize
when we assign people into

00:25:30.536 --> 00:25:31.967
treatment and control groups,
besides a lottery?

00:25:31.967 --> 00:25:33.217
[INAUDIBLE]

00:25:35.310 --> 00:25:36.750
PROFESSOR: You mean
the process?

00:25:36.750 --> 00:25:39.220
So tomorrow, the whole
day is going to be

00:25:39.220 --> 00:25:40.650
about how to randomize.

00:25:40.650 --> 00:25:44.650
But the basic idea is, you can
do it in a variety of ways.

00:25:44.650 --> 00:25:47.250
You can do it in a computer,
which allows you a lot more

00:25:47.250 --> 00:25:48.640
flexibility.

00:25:48.640 --> 00:25:52.630
But if for any reason, you
need to show people that

00:25:52.630 --> 00:25:55.210
you're doing it in a random,
transparent manner, that can

00:25:55.210 --> 00:25:56.070
also be done.

00:25:56.070 --> 00:26:01.910
We just did one in Niger in
West Africa where we used

00:26:01.910 --> 00:26:03.060
bingo balls.

00:26:03.060 --> 00:26:05.780
So literally, people would
draw from there, and then

00:26:05.780 --> 00:26:07.140
everyone could see.

00:26:07.140 --> 00:26:10.430
If we had brought a computer
into their room in Niger and

00:26:10.430 --> 00:26:13.300
tried to do things, it just
wouldn't have worked.

00:26:13.300 --> 00:26:16.080
People would have said, what
are you doing here?

00:26:16.080 --> 00:26:19.770
So there are many
different ways of randomizing.

00:26:19.770 --> 00:26:22.070
The key-- and this is something
we're going to talk

00:26:22.070 --> 00:26:23.200
about in a little bit--

00:26:23.200 --> 00:26:26.590
is what exactly is the process
that you use to make sure that

00:26:26.590 --> 00:26:29.510
it's random assignment, not
the how, you know, whether

00:26:29.510 --> 00:26:34.640
it's bingo balls or a lottery
or a coin or whatever it is.

00:26:34.640 --> 00:26:35.090
Yes?

00:26:35.090 --> 00:26:38.520
AUDIENCE: So at what point this
week will we talk about

00:26:38.520 --> 00:26:42.590
the ethical dimensions of
denying treatment to someone?

00:26:42.590 --> 00:26:43.040
PROFESSOR: OK.

00:26:43.040 --> 00:26:46.530
Like in three slides,
you can jump at me

00:26:46.530 --> 00:26:49.360
with the ethical issues.

00:26:49.360 --> 00:26:52.550
And then if I don't satisfy you,
you have four more days

00:26:52.550 --> 00:26:55.340
to jump at every single person
who comes into this room.

00:26:58.560 --> 00:27:01.210
So what I want to give you
is a little bit of

00:27:01.210 --> 00:27:02.170
the nuts and bolts.

00:27:02.170 --> 00:27:05.040
Rather than keep this discussion
in the abstract,

00:27:05.040 --> 00:27:06.730
this is what happens
in the experiment.

00:27:06.730 --> 00:27:09.160
The nuts and bolts, if you
wanted to do a randomized

00:27:09.160 --> 00:27:12.240
experiment tomorrow, these are
sort of eight key steps that

00:27:12.240 --> 00:27:14.650
you need to think about.

00:27:14.650 --> 00:27:17.410
This is a very simplified
description of the process.

00:27:17.410 --> 00:27:21.390
As those people sitting in the
back will tell you, this is

00:27:21.390 --> 00:27:22.420
very simplified.

00:27:22.420 --> 00:27:25.900
Their daily lives are consumed
with many of the steps, and

00:27:25.900 --> 00:27:30.520
they work months, if not
years, in each of these.

00:27:30.520 --> 00:27:33.530
The first step, and I can't
emphasize this enough, is to

00:27:33.530 --> 00:27:35.870
design the study carefully.

00:27:35.870 --> 00:27:41.550
So no matter what you do, what
you do at the beginning is

00:27:41.550 --> 00:27:44.480
going to affect your study for
the rest of the study.

00:27:44.480 --> 00:27:47.450
This is true for some things
in life and not others.

00:27:47.450 --> 00:27:51.420
For evaluations, impact
evaluations, if you don't do

00:27:51.420 --> 00:27:53.900
it right at the beginning,
you're going to be in trouble.

00:27:53.900 --> 00:27:55.360
That's going to come
back to haunt you.

00:27:55.360 --> 00:27:58.690
So anything you can do to spend
time at the beginning,

00:27:58.690 --> 00:28:02.120
making sure that the study is
designed properly, is going to

00:28:02.120 --> 00:28:03.890
be very helpful.

00:28:03.890 --> 00:28:07.690
What that means, in very
practical terms, is if you are

00:28:07.690 --> 00:28:10.860
in a position where you are
commissioning a study, and you

00:28:10.860 --> 00:28:14.280
don't have people on your staff
who are expert at this,

00:28:14.280 --> 00:28:16.470
make sure that whoever is
going to help you do the

00:28:16.470 --> 00:28:19.600
evaluation is involved from
the very beginning.

00:28:19.600 --> 00:28:24.010
What this also means is that
calling someone three years

00:28:24.010 --> 00:28:26.100
after the program was
implemented, saying, can you

00:28:26.100 --> 00:28:28.060
come and evaluate?

00:28:28.060 --> 00:28:31.980
That leaves the evaluator
with very few options.

00:28:31.980 --> 00:28:37.060
So the earlier the evaluators
are involved, the better the

00:28:37.060 --> 00:28:39.350
options are in terms of
how you can do this.

00:28:39.350 --> 00:28:42.550
Both in terms of the validity of
the evaluation, but also in

00:28:42.550 --> 00:28:47.010
terms of how it will interact
with the program in a way that

00:28:47.010 --> 00:28:49.140
it doesn't disrupt
the program.

00:28:49.140 --> 00:28:50.360
So this is key.

00:28:50.360 --> 00:28:53.870
And we can talk about design a
little bit now, but you will

00:28:53.870 --> 00:28:56.670
learn a little bit about design
when you speak about

00:28:56.670 --> 00:28:59.960
sample size, about measurement
issues, and all of those

00:28:59.960 --> 00:29:01.090
sessions are coming.

00:29:01.090 --> 00:29:02.110
How to randomize.

00:29:02.110 --> 00:29:05.930
So Wednesday and Thursday
are really about that.

00:29:05.930 --> 00:29:08.750
The second one is to randomly
assign people to treatment or

00:29:08.750 --> 00:29:11.790
control or more groups, if there
are more than those.

00:29:11.790 --> 00:29:13.840
The third one is to collect
baseline data.

00:29:13.840 --> 00:29:15.660
So this is a big question
that comes up.

00:29:15.660 --> 00:29:17.680
Should you collect
baseline data?

00:29:17.680 --> 00:29:23.300
I think my answer to that is, in
general, if you don't have

00:29:23.300 --> 00:29:27.250
a randomized evaluation, it's
going to be very, very, very

00:29:27.250 --> 00:29:30.530
difficult to get away without
baseline data.

00:29:30.530 --> 00:29:32.180
There are some methods
that work, but

00:29:32.180 --> 00:29:33.170
it's going to be difficult.

00:29:33.170 --> 00:29:36.220
By baseline, I mean, before
the intervention started.

00:29:36.220 --> 00:29:41.110
If you have a randomized trial
it would be highly preferable

00:29:41.110 --> 00:29:43.390
to have baseline data.

00:29:43.390 --> 00:29:44.530
Highly preferable.

00:29:44.530 --> 00:29:47.630
But not as critical as
with other methods.

00:29:47.630 --> 00:29:49.400
And it's preferable
in two ways.

00:29:49.400 --> 00:29:53.240
The first one is if you have
baseline data, you can verify,

00:29:53.240 --> 00:29:55.910
at least in terms of those
characteristics you collected

00:29:55.910 --> 00:29:58.730
in the baseline survey,
you can verify that

00:29:58.730 --> 00:30:00.040
the two groups look alike.

00:30:00.040 --> 00:30:03.280
This is a nice thing to verify
at the beginning and not at

00:30:03.280 --> 00:30:04.660
the end of the evaluation.

00:30:04.660 --> 00:30:06.600
So if you can do it, that
would be helpful.

00:30:06.600 --> 00:30:08.470
And the second thing you
have to do is-- yes?

00:30:08.470 --> 00:30:09.370
AUDIENCE: Sorry.

00:30:09.370 --> 00:30:11.880
What happens if, at the baseline
data, you realize

00:30:11.880 --> 00:30:14.370
that the two groups that you
made were not random?

00:30:14.370 --> 00:30:16.820
Do you go and keep randomizing
until you get there?

00:30:16.820 --> 00:30:18.400
PROFESSOR: So it depends.

00:30:18.400 --> 00:30:20.950
It depends on when you
discovered this.

00:30:20.950 --> 00:30:23.620
If you discover this when the
treatment is already being

00:30:23.620 --> 00:30:27.130
implemented, it is too late to
do anything else in terms of

00:30:27.130 --> 00:30:28.200
re-randomizing.

00:30:28.200 --> 00:30:31.980
The ideal scenario is one in
which you can do this, collect

00:30:31.980 --> 00:30:35.830
the baseline data, randomize,
verify that they are similar,

00:30:35.830 --> 00:30:38.570
and then if they are not
similar, then you can

00:30:38.570 --> 00:30:41.070
re-randomize again.

00:30:41.070 --> 00:30:44.120
There's controversy about how
many times you should do this,

00:30:44.120 --> 00:30:47.980
but for the most part, in
general, if you randomize, the

00:30:47.980 --> 00:30:49.440
two groups should
look similar.

00:30:49.440 --> 00:30:53.720
There are very few scenarios,
but they exist, where they

00:30:53.720 --> 00:30:55.040
don't look similar
to each other.

00:30:55.040 --> 00:30:57.770
And if you reach one of those
scenarios, you can

00:30:57.770 --> 00:30:59.420
re-randomize.

00:30:59.420 --> 00:31:02.910
What you can't do is
re-randomize when the

00:31:02.910 --> 00:31:04.580
treatment is already
being distributed.

00:31:04.580 --> 00:31:06.740
So if you already decided,
you're in the treatment group,

00:31:06.740 --> 00:31:08.260
you're in the control
group, you can't

00:31:08.260 --> 00:31:11.200
re-randomize at that phase.
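
NOTE
A minimal sketch of re-randomizing before anyone is told their status,
assuming one baseline score per person. The balance threshold (0.1
standard deviations) and the cap of 20 draws are arbitrary illustrative
choices, not recommendations from the lecture.
```python
import random
import statistics
def standardized_gap(values, treated_ids):
    # Difference in group means, in units of the overall standard deviation.
    t = [v for i, v in enumerate(values) if i in treated_ids]
    c = [v for i, v in enumerate(values) if i not in treated_ids]
    return (statistics.fmean(t) - statistics.fmean(c)) / statistics.pstdev(values)
def randomize_with_balance_check(baseline, max_gap=0.1, max_draws=20, seed=0):
    # Redraw the assignment until the baseline gap is small enough
    # or the cap on the number of draws is reached.
    rng = random.Random(seed)
    ids = list(range(len(baseline)))
    for draw in range(1, max_draws + 1):
        treated_ids = set(rng.sample(ids, len(ids) // 2))
        if abs(standardized_gap(baseline, treated_ids)) <= max_gap:
            break
    return treated_ids, draw
rng = random.Random(42)
baseline_scores = [rng.gauss(50.0, 10.0) for _ in range(200)]
treated_ids, draws_used = randomize_with_balance_check(baseline_scores)
print(f"accepted an assignment after {draws_used} draw(s)")
```
The cap makes the procedure explicit and reportable, which matters given
the disagreement about how many redraws are acceptable.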

00:31:11.200 --> 00:31:13.350
The second reason you want to
collect data, and this is

00:31:13.350 --> 00:31:15.530
going to be important
particularly in a setting like

00:31:15.530 --> 00:31:18.310
yours, if you are worried about
sample size, is that it

00:31:18.310 --> 00:31:19.960
buys a lot of statistical
power.

00:31:19.960 --> 00:31:22.540
Particularly if you can collect
data on the baseline

00:31:22.540 --> 00:31:24.800
version of the outcomes
that you care about.

00:31:24.800 --> 00:31:27.730
If you can do that, it's
highly desirable.

00:31:27.730 --> 00:31:31.310
The reality is that sometimes
it's feasible to collect

00:31:31.310 --> 00:31:34.080
baseline data and sometimes the
nature of implementation

00:31:34.080 --> 00:31:35.930
of the program makes
it difficult.

00:31:35.930 --> 00:31:42.804
But you will do well if you
can collect baseline data.

00:31:42.804 --> 00:31:46.020
AUDIENCE: Wouldn't it seem
that by the very fact of

00:31:46.020 --> 00:31:48.990
collecting the baseline data,
once we have already

00:31:48.990 --> 00:31:57.547
randomized, can bias this
randomized by collecting the

00:31:57.547 --> 00:31:58.720
baseline data?

00:31:58.720 --> 00:32:05.120
PROFESSOR: Because you're
affecting the people who are

00:32:05.120 --> 00:32:07.000
answering the survey?

00:32:07.000 --> 00:32:10.210
Well, this has to do a little
bit more with survey design

00:32:10.210 --> 00:32:11.600
than with any other thing.

00:32:11.600 --> 00:32:15.530
The key is, you're going to
collect baseline data for both

00:32:15.530 --> 00:32:17.520
the participant or
the treatment

00:32:17.520 --> 00:32:19.280
and the control group.

00:32:19.280 --> 00:32:21.910
So if you feel that when
people answer a

00:32:21.910 --> 00:32:24.810
survey, they somehow--

00:32:24.810 --> 00:32:25.910
I don't know--

00:32:25.910 --> 00:32:29.260
get optimistic about life and
do better or the other way

00:32:29.260 --> 00:32:34.440
around, as long as it happens
in the same way for both

00:32:34.440 --> 00:32:37.070
treatment and control groups,
it's not a problem for the

00:32:37.070 --> 00:32:38.780
randomized trials.

00:32:38.780 --> 00:32:41.200
The problem would be if, for
some reason, you think that

00:32:41.200 --> 00:32:43.400
administering a survey is going
to affect the treatment

00:32:43.400 --> 00:32:44.900
and the control group
differently.

00:32:44.900 --> 00:32:47.450
If that's the case, then you
need to be careful about how

00:32:47.450 --> 00:32:50.134
you do the survey.

00:32:50.134 --> 00:32:55.660
AUDIENCE: Can you explain how
[INAUDIBLE] statistical power?

00:32:55.660 --> 00:32:59.890
PROFESSOR: So in technical
terms, what happens is, you,

00:32:59.890 --> 00:33:03.680
in your regression, where you
estimate an impact, you have

00:33:03.680 --> 00:33:05.780
an outcome of interest.

00:33:05.780 --> 00:33:10.530
And that outcome has a variance,
has some variations.

00:33:10.530 --> 00:33:13.710
And then if you can add into
your regressions statistical

00:33:13.710 --> 00:33:17.610
controls, things you collected
at baseline, what essentially

00:33:17.610 --> 00:33:20.720
happens is, in technical terms,
the standard errors of

00:33:20.720 --> 00:33:22.910
your coefficients, particularly
if these

00:33:22.910 --> 00:33:25.990
variables have a lot of
explanatory power, those

00:33:25.990 --> 00:33:27.480
standard errors should
drop, and you get

00:33:27.480 --> 00:33:28.730
more statistical power.
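
NOTE
A minimal simulation of the power argument, assuming a baseline score
that strongly predicts the endline score and a true effect of 2 points
(all numbers are made up). Removing the baseline-predicted part of the
outcome and then comparing means is used here as a simple stand-in for
adding the baseline score as a regression control.
```python
import random
import statistics
rng = random.Random(7)
n = 1000
baseline = [rng.gauss(50.0, 10.0) for _ in range(n)]
# Half of the sample is treated, in random order.
treated = [i % 2 == 0 for i in range(n)]
rng.shuffle(treated)
endline = [b + (2.0 if t else 0.0) + rng.gauss(0.0, 3.0) for b, t in zip(baseline, treated)]
def diff_and_se(outcome):
    # Treatment-control difference in means and its standard error.
    t = [y for y, flag in zip(outcome, treated) if flag]
    c = [y for y, flag in zip(outcome, treated) if not flag]
    se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
    return statistics.fmean(t) - statistics.fmean(c), se
# Strip out the part of the outcome that the baseline score predicts.
mean_b, mean_y = statistics.fmean(baseline), statistics.fmean(endline)
slope = sum((b - mean_b) * (y - mean_y) for b, y in zip(baseline, endline)) / sum(
    (b - mean_b) ** 2 for b in baseline
)
adjusted = [y - slope * (b - mean_b) for b, y in zip(baseline, endline)]
raw_effect, raw_se = diff_and_se(endline)
adj_effect, adj_se = diff_and_se(adjusted)
print(f"without baseline control: {raw_effect:.2f} (SE {raw_se:.2f})")
print(f"with baseline control:    {adj_effect:.2f} (SE {adj_se:.2f})")
```
Both estimates target the same effect, but the adjusted one has a much
smaller standard error because the baseline score absorbs most of the
variation in the outcome.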

00:33:31.220 --> 00:33:32.210
Yes, Jessica?

00:33:32.210 --> 00:33:34.110
AUDIENCE: Do you mean to say
that you have to collect the

00:33:34.110 --> 00:33:36.960
baseline data after you do the
first round of randomization?

00:33:36.960 --> 00:33:38.180
Does it matter what order
you do those steps in?

00:33:38.180 --> 00:33:39.010
PROFESSOR: Sorry.

00:33:39.010 --> 00:33:42.290
Steps two and three
can be inverted.

00:33:42.290 --> 00:33:47.110
In fact, it would be ideal
if you could invert them.

00:33:47.110 --> 00:33:49.000
It would be ideal, because
then you can do

00:33:49.000 --> 00:33:50.010
what Iqbal is saying.

00:33:50.010 --> 00:33:52.570
Which is, you collect the
baseline data, you do the

00:33:52.570 --> 00:33:55.230
randomization, and
then you say, OK.

00:33:55.230 --> 00:33:57.160
Are they the same or not?

00:33:57.160 --> 00:34:00.100
Then if they're not the same,
you re-randomize.

00:34:00.100 --> 00:34:03.290
If you collect the baseline data
after randomly assigning,

00:34:03.290 --> 00:34:06.360
unless you have not communicated
to people who

00:34:06.360 --> 00:34:09.139
gets the treatment and who gets
the control, your options

00:34:09.139 --> 00:34:11.429
for re-randomizing are
not very good.

00:34:11.429 --> 00:34:13.880
So very good point.

00:34:13.880 --> 00:34:14.360
All right.

00:34:14.360 --> 00:34:16.850
So the fourth step is to verify
that the assignment

00:34:16.850 --> 00:34:17.630
looks random.

00:34:17.630 --> 00:34:19.770
By verifying that the assignment
looks random, this

00:34:19.770 --> 00:34:22.760
is something that if you were
to commission an evaluation,

00:34:22.760 --> 00:34:24.590
you should make sure
that your evaluator

00:34:24.590 --> 00:34:26.790
provides this to you.

00:34:26.790 --> 00:34:29.580
Which is at the very least a
table that says, here's the

00:34:29.580 --> 00:34:33.080
treatment group, here's the
control group, and here's how

00:34:33.080 --> 00:34:35.110
they look in terms
of these baseline

00:34:35.110 --> 00:34:36.350
characteristics.

00:34:36.350 --> 00:34:40.290
And ideally those two groups,
those tables should have very,

00:34:40.290 --> 00:34:44.159
very few differences
between the groups.

00:34:44.159 --> 00:34:47.370
When I say differences, they
cannot be, in practical terms,

00:34:47.370 --> 00:34:48.699
large differences.

00:34:48.699 --> 00:34:50.239
There could be some differences
that are

00:34:50.239 --> 00:34:53.199
statistically significant,
because either you have a lot

00:34:53.199 --> 00:34:57.920
of statistical power, or more
likely, if you compare 10

00:34:57.920 --> 00:35:00.770
variables, some of them will
end up being significant.

00:35:00.770 --> 00:35:03.210
The key is, there are no
systematic differences between

00:35:03.210 --> 00:35:03.880
the groups.

00:35:03.880 --> 00:35:06.410
If you observe systematic
differences,

00:35:06.410 --> 00:35:07.800
then you're in trouble.

00:35:07.800 --> 00:35:09.150
This didn't work well.

00:35:09.150 --> 00:35:12.350
But I can tell you from
experience, from the law of

00:35:12.350 --> 00:35:17.360
statistics, these two groups
will look the same

00:35:17.360 --> 00:35:20.430
a lot of the time.
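
NOTE
A minimal sketch of the balance table an evaluator might hand over,
assuming a made-up baseline survey with three characteristics. The
column layout and the helper balance_table are hypothetical.
```python
import random
import statistics
rng = random.Random(3)
n = 400
# Made-up baseline survey: one list of values per characteristic.
baseline = {
    "age": [rng.gauss(35.0, 8.0) for _ in range(n)],
    "household_size": [rng.gauss(5.0, 1.5) for _ in range(n)],
    "test_score": [rng.gauss(60.0, 12.0) for _ in range(n)],
}
treated = [i < n // 2 for i in range(n)]
rng.shuffle(treated)
def balance_table(baseline, treated):
    # One row per characteristic: name, group means, difference, z statistic.
    rows = []
    for name, values in baseline.items():
        t = [v for v, flag in zip(values, treated) if flag]
        c = [v for v, flag in zip(values, treated) if not flag]
        diff = statistics.fmean(t) - statistics.fmean(c)
        se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
        rows.append((name, statistics.fmean(t), statistics.fmean(c), diff, diff / se))
    return rows
table = balance_table(baseline, treated)
print(f"{'characteristic':<16}{'treatment':>10}{'control':>10}{'diff':>8}{'z':>7}")
for name, mean_t, mean_c, diff, z in table:
    print(f"{name:<16}{mean_t:>10.2f}{mean_c:>10.2f}{diff:>8.2f}{z:>7.2f}")
```
Read the table for both the size of each difference and its z
statistic: a difference can be statistically significant yet too small
to matter in practice, or the reverse.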

00:35:20.430 --> 00:35:20.770
OK.

00:35:20.770 --> 00:35:24.490
So obviously you can only do
that verification if you have

00:35:24.490 --> 00:35:27.290
some data on the two
groups before.

00:35:27.290 --> 00:35:30.110
Now, when I say "collect
baseline data," if maybe you

00:35:30.110 --> 00:35:33.360
already have baseline data--
for some reason this is a

00:35:33.360 --> 00:35:35.760
population that you're already
serving, you already did

00:35:35.760 --> 00:35:38.790
surveys on these people--

00:35:38.790 --> 00:35:42.750
if that's the case, then
all the better.

00:35:42.750 --> 00:35:45.950
It may be that you don't have
baseline data, but you may be

00:35:45.950 --> 00:35:47.390
able to get baseline data.

00:35:47.390 --> 00:35:50.840
So for example, if you're
randomly assigning schools,

00:35:50.840 --> 00:35:53.770
you may have, from the
government or from some

00:35:53.770 --> 00:35:56.220
agency, some census
of schools.

00:35:56.220 --> 00:35:58.670
And you may be able to compare
schools in terms of

00:35:58.670 --> 00:36:01.050
socioeconomic characteristics
of the students.

00:36:01.050 --> 00:36:03.540
You may be able to compare
schools, you know, percent of

00:36:03.540 --> 00:36:04.800
private, public.

00:36:04.800 --> 00:36:07.040
If there was a test done
nationally for all the

00:36:07.040 --> 00:36:08.720
schools, you may be able
to compare test

00:36:08.720 --> 00:36:10.480
scores on those schools.

00:36:10.480 --> 00:36:13.490
The key thing is, anything you
can do to verify that the

00:36:13.490 --> 00:36:14.910
random assignment worked

00:36:14.910 --> 00:36:15.600
is good.

00:36:15.600 --> 00:36:19.910
It would be useful to do
it at the beginning.

00:36:19.910 --> 00:36:22.940
The fifth step is to monitor
the process so that the

00:36:22.940 --> 00:36:24.830
integrity of the experiment
is not compromised.

00:36:24.830 --> 00:36:28.300
This is something that's
really, really key.

00:36:28.300 --> 00:36:30.660
When you do a randomized
experiment, designing the

00:36:30.660 --> 00:36:32.230
study carefully is
very important.

00:36:32.230 --> 00:36:34.480
Doing the random assignment
is very important.

00:36:34.480 --> 00:36:37.690
But you can't just relax and
then wait for two years until

00:36:37.690 --> 00:36:39.270
you collect the outcomes.

00:36:39.270 --> 00:36:41.840
And the people who are sitting
at the back of the room know

00:36:41.840 --> 00:36:43.540
this much better than I do.

00:36:43.540 --> 00:36:46.510
If you are not following exactly
what's happening in

00:36:46.510 --> 00:36:52.280
the field, the opportunities for
this experiment to not go

00:36:52.280 --> 00:36:56.160
well are very, very big.

00:36:56.160 --> 00:36:59.220
You're going to have a whole
session on Friday on threats

00:36:59.220 --> 00:37:01.470
to an experiment.

00:37:01.470 --> 00:37:05.290
The only thing I will say now
is that the best way to deal

00:37:05.290 --> 00:37:09.450
with threats to an experiment is
to avoid those threats, and

00:37:09.450 --> 00:37:12.950
to avoid them at this stage
of implementation.

00:37:12.950 --> 00:37:14.145
One very quick threat.

00:37:14.145 --> 00:37:16.890
If you assign people to a
treatment group and people to

00:37:16.890 --> 00:37:20.310
a control group, that means
that people in the control

00:37:20.310 --> 00:37:22.190
group are not offered
the treatment.

00:37:22.190 --> 00:37:24.890
But that also means, they
shouldn't get the treatment.

00:37:24.890 --> 00:37:29.160
And as some of you know, that
doesn't always happen.

00:37:29.160 --> 00:37:31.300
So some people in the control
group find their

00:37:31.300 --> 00:37:33.420
way into the program.

00:37:33.420 --> 00:37:36.960
Having systems to monitor that
this doesn't happen, and that

00:37:36.960 --> 00:37:42.180
if it does happen, that it
happens in very, very few

00:37:42.180 --> 00:37:44.810
exceptional cases, is going
to be very important.
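
NOTE
A minimal sketch of a monitoring check for control-group members who
end up in the program, assuming field records of the form
(person, assigned arm, actually received the program). The record
layout and the helper compliance_report are hypothetical.
```python
# Hypothetical monitoring records collected in the field.
records = [
    ("p1", "treatment", True),
    ("p2", "treatment", True),
    ("p3", "treatment", False),
    ("p4", "control", False),
    ("p5", "control", True),
    ("p6", "control", False),
]
def compliance_report(records):
    assigned_t = [r for r in records if r[1] == "treatment"]
    assigned_c = [r for r in records if r[1] == "control"]
    return {
        # Share of each assigned group that actually received the program.
        "treated_share_in_treatment": sum(r[2] for r in assigned_t) / len(assigned_t),
        "treated_share_in_control": sum(r[2] for r in assigned_c) / len(assigned_c),
        # Control-group members who found their way into the program.
        "crossovers": [r[0] for r in assigned_c if r[2]],
    }
report = compliance_report(records)
print(report)
```
Running a check like this throughout implementation, rather than at the
end, is what allows crossovers to be caught while they are still few.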

00:37:44.810 --> 00:37:46.450
Yes, Logan?

00:37:46.450 --> 00:37:49.580
AUDIENCE: One of the arguments
for the superiority of the

00:37:49.580 --> 00:37:54.530
matched pairs is that if one
treatment group ends up not

00:37:54.530 --> 00:37:58.040
getting the treatment because of
lack of capacity in that

00:37:58.040 --> 00:38:00.733
region, or vice versa, the
scenario you described, you

00:38:00.733 --> 00:38:02.210
can just drop that pair.

00:38:02.210 --> 00:38:03.140
PROFESSOR: Yes.

00:38:03.140 --> 00:38:06.810
The problem when you drop
that pair is that it may

00:38:06.810 --> 00:38:08.300
be costly to you.

00:38:08.300 --> 00:38:09.700
Dropping that pair.

00:38:09.700 --> 00:38:13.880
And you have to assume that
that-- well, first of all, you

00:38:13.880 --> 00:38:17.030
have to assume that pair was
comparable to begin with.

00:38:17.030 --> 00:38:22.000
And then even if you were to
drop that pair, well, first of

00:38:22.000 --> 00:38:23.990
all, matching doesn't always
work on one-to-one.

00:38:23.990 --> 00:38:26.940
But even if you had one-to-one
matching, suppose you had to

00:38:26.940 --> 00:38:31.910
drop 10% or 20% or 30% of your
pairs, then you lose

00:38:31.910 --> 00:38:34.040
statistical power, and then
you also lose external

00:38:34.040 --> 00:38:36.700
validity to begin with.
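
NOTE
A minimal worked calculation of the power cost of dropping pairs,
assuming standard errors scale with one over the square root of the
number of pairs and that dropped pairs resemble the kept ones (often
untrue in practice). The helper se_inflation is hypothetical.
```python
import math
# Factor by which the standard error grows when a share of pairs is dropped.
def se_inflation(share_dropped):
    return 1.0 / math.sqrt(1.0 - share_dropped)
for share in (0.10, 0.20, 0.30):
    print(f"drop {share:.0%} of pairs: standard error x{se_inflation(share):.2f}")
```
Dropping 30% of pairs raises the standard error by roughly 20%, before
counting any loss of external validity.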

00:38:36.700 --> 00:38:37.230
Yes?

00:38:37.230 --> 00:38:39.940
AUDIENCE: So there's also the
issue of spillover effect,

00:38:39.940 --> 00:38:40.910
which isn't the same.

00:38:40.910 --> 00:38:43.730
So one might be that somebody
sneaks into the program who

00:38:43.730 --> 00:38:44.888
wasn't supposed to be
in the program.

00:38:44.888 --> 00:38:47.710
But the other is, if you do
things in the same community,

00:38:47.710 --> 00:38:50.460
which is often the case in the
work that we do, or in a

00:38:50.460 --> 00:38:53.370
similar environment, the
mere effect of having

00:38:53.370 --> 00:38:54.410
something going on--

00:38:54.410 --> 00:38:55.300
PROFESSOR: Yes.

00:38:55.300 --> 00:38:58.460
And this is why the first
stage is very important.

00:38:58.460 --> 00:39:02.360
If you think spillovers will
occur, the moment to think

00:39:02.360 --> 00:39:04.980
about them is at the design
stage of the evaluation.

00:39:04.980 --> 00:39:08.460
Because then you can decide on
how you're going to randomize

00:39:08.460 --> 00:39:11.120
in a way that minimizes
the effect that

00:39:11.120 --> 00:39:12.890
spillovers would have.

00:39:12.890 --> 00:39:16.410
So there's some statistical
techniques to deal with some

00:39:16.410 --> 00:39:17.050
of these problems.

00:39:17.050 --> 00:39:20.020
But the best way to deal with
these problems is to avoid

00:39:20.020 --> 00:39:20.980
them in the first place.

00:39:20.980 --> 00:39:23.400
And you avoid them by good
design, where the evaluator

00:39:23.400 --> 00:39:27.200
can help, and by a good
monitoring system to make sure

00:39:27.200 --> 00:39:31.860
that the evaluation is being
implemented as intended.

00:39:31.860 --> 00:39:32.500
Makes sense?

00:39:32.500 --> 00:39:35.640
Yes, your name please?

00:39:35.640 --> 00:39:38.800
Are you also filming from
this camera here?

00:39:38.800 --> 00:39:40.160
OK.

00:39:40.160 --> 00:39:41.200
I'm nervous now.

00:39:41.200 --> 00:39:44.170
Two cameras.

00:39:44.170 --> 00:39:46.240
AUDIENCE: What's on
the [INAUDIBLE]

00:39:46.240 --> 00:39:50.010
to avoid [INAUDIBLE]?

00:39:50.010 --> 00:39:50.730
PROFESSOR: Yes.

00:39:50.730 --> 00:39:57.490
So I think one important thing
is to have people in the field

00:39:57.490 --> 00:40:02.300
who can help monitor, and who
know about the evaluation.

00:40:02.300 --> 00:40:04.120
Two is to have a clear
commitment.

00:40:04.120 --> 00:40:05.950
This is something that Rachel
said this morning that's

00:40:05.950 --> 00:40:07.810
really, really key.

00:40:07.810 --> 00:40:11.850
Very clear commitment from
whoever is organizing.

00:40:11.850 --> 00:40:13.470
That's very creative.

00:40:13.470 --> 00:40:18.990
For whoever is implementing
the program.

00:40:18.990 --> 00:40:20.350
So I'll give you an example.

00:40:20.350 --> 00:40:23.770
We were evaluating this
program in Jamaica.

00:40:23.770 --> 00:40:28.000
And we were telling them,
we need to monitor the

00:40:28.000 --> 00:40:28.670
crossovers.

00:40:28.670 --> 00:40:31.220
We can't have people who are
not supposed to receive the

00:40:31.220 --> 00:40:32.790
program, get into the program.

00:40:32.790 --> 00:40:35.850
Yes, yes, yes.

00:40:35.850 --> 00:40:37.730
Is it OK if a few do it?

00:40:37.730 --> 00:40:41.270
We say, well, only if a few, but
really, this has to be the

00:40:41.270 --> 00:40:44.160
exception, and you really have
to monitor this rate, and we

00:40:44.160 --> 00:40:46.540
asked them for a report on
this rate, and so on.

00:40:46.540 --> 00:40:49.340
This is a government
agency in Jamaica.

00:40:49.340 --> 00:40:51.050
And so they were all the
time asking, OK.

00:40:51.050 --> 00:40:54.200
How many is too many?

00:40:54.200 --> 00:40:55.740
And we were like,
no, no, no, no.

00:40:55.740 --> 00:40:57.960
You have to keep that
rate to a minimum.

00:40:57.960 --> 00:41:00.590
There's no way you can
have crossovers.

00:41:00.590 --> 00:41:01.760
Just keep--

00:41:01.760 --> 00:41:03.680
no, but how many, how many?

00:41:03.680 --> 00:41:07.990
In one day of weakness,
we said, OK.

00:41:07.990 --> 00:41:10.790
If it's more than 10%, this
is completely ruined.

00:41:10.790 --> 00:41:13.190
We can't do anything with it.

00:41:13.190 --> 00:41:17.000
So end of the evaluation
arrived.

00:41:17.000 --> 00:41:19.110
We compute the crossover rate.

00:41:19.110 --> 00:41:20.360
9.6%.

00:41:21.990 --> 00:41:25.460
So what I want to say here is
that if they didn't want to

00:41:25.460 --> 00:41:29.120
comply with our request, they
could have made this rate be

00:41:29.120 --> 00:41:33.430
30% or 40% and we would have not
heard anything about it.

00:41:33.430 --> 00:41:35.320
I'm not saying 10% is
the right threshold.

00:41:35.320 --> 00:41:38.190
It of course depends on the
program and on other things.

00:41:38.190 --> 00:41:41.680
But the key thing here is, you
need to have full cooperation

00:41:41.680 --> 00:41:44.520
between the people in the field
who are implementing and

00:41:44.520 --> 00:41:46.580
the people in the field
who are evaluating.

00:41:46.580 --> 00:41:49.370
If you don't have that, then
it's very difficult.

00:41:49.370 --> 00:41:52.700
Because people find a way to get
to a program if they hear

00:41:52.700 --> 00:41:55.880
that this program is serving,
is doing some good.

00:41:55.880 --> 00:42:00.630
So I mean, who's a parent
in this room?

00:42:00.630 --> 00:42:00.940
All right.

00:42:00.940 --> 00:42:02.680
So now, confess.

00:42:02.680 --> 00:42:06.970
If your child, in your school,
there was a randomized trial

00:42:06.970 --> 00:42:09.700
on this very promising,
you name it.

00:42:09.700 --> 00:42:11.150
After school program.

00:42:11.150 --> 00:42:15.110
And your child fell in
the control group.

00:42:15.110 --> 00:42:18.200
Would you be at least tempted
to go to the principal and

00:42:18.200 --> 00:42:22.070
say, I want my child
in that program?

00:42:22.070 --> 00:42:23.260
Tempted?

00:42:23.260 --> 00:42:23.810
All right.

00:42:23.810 --> 00:42:26.090
I can tell you that other
parents are more than tempted,

00:42:26.090 --> 00:42:27.740
and will find a way.

00:42:27.740 --> 00:42:29.070
All right.

00:42:29.070 --> 00:42:30.180
AUDIENCE: What do you do
with the spillovers?

00:42:30.180 --> 00:42:31.880
Do you just exclude them
and put them in

00:42:31.880 --> 00:42:33.630
the comparison group?

00:42:33.630 --> 00:42:36.100
PROFESSOR: So these are called
crossovers, because they cross

00:42:36.100 --> 00:42:37.900
from the control to
the treatment.

00:42:37.900 --> 00:42:41.050
The key thing-- this comes at
the analysis stage, and this

00:42:41.050 --> 00:42:42.700
you'll do on Friday.

00:42:42.700 --> 00:42:46.530
But the key thing is, what
random assignment buys you is

00:42:46.530 --> 00:42:49.400
that the two groups are
comparable as a whole.

00:42:49.400 --> 00:42:52.460
The whole treatment group with
the whole control group.

00:42:52.460 --> 00:42:54.470
You can't then just say,
oh, I don't like this

00:42:54.470 --> 00:42:55.360
control group member.

00:42:55.360 --> 00:42:56.670
I'm just going to
throw it out.

00:42:56.670 --> 00:42:59.230
That completely destroys
the comparability.

00:42:59.230 --> 00:43:02.660
You still need to compare the
full two groups, and you do

00:43:02.660 --> 00:43:05.890
some statistical adjustments
to deal with the crossover.

00:43:05.890 --> 00:43:08.590
But once a treatment,
always a treatment.

00:43:08.590 --> 00:43:10.930
Once a control, always
a control.

00:43:10.930 --> 00:43:13.940
The random assignment buys you
that two groups are the same.

00:43:13.940 --> 00:43:17.370
If you throw away-- suppose
then, 10% of crossovers.

00:43:17.370 --> 00:43:21.280
If you throw them away you will
be comparing the whole

00:43:21.280 --> 00:43:24.540
treatment group with this 90%
of the control group.

00:43:24.540 --> 00:43:28.140
And let's just assume for a
second that that 10% who

00:43:28.140 --> 00:43:31.060
crossover are people who are
particularly motivated, and

00:43:31.060 --> 00:43:32.820
that's why they switch over.

00:43:32.820 --> 00:43:35.770
Well then, the average
motivation of the two groups

00:43:35.770 --> 00:43:37.840
was the same at the beginning,
but once you throw

00:43:37.840 --> 00:43:41.030
that 10% away, the average
motivation of the treatment

00:43:41.030 --> 00:43:43.220
group is going to be higher than
the average motivation of

00:43:43.220 --> 00:43:43.960
the control group.

00:43:43.960 --> 00:43:46.400
So any difference you find in
outcomes between these two

00:43:46.400 --> 00:43:49.400
groups could be due to the
program, but could also be due

00:43:49.400 --> 00:43:51.700
to differences in motivation.

00:43:51.700 --> 00:43:53.140
You can't throw them away.

00:43:53.140 --> 00:43:56.020
There's statistical ways
of dealing with them.
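
NOTE
A minimal sketch, not part of the lecture, of why throwing crossovers away
breaks comparability. Everything in it is an assumption made for illustration:
motivation is a random number, the outcome is motivation plus noise, the
program has zero true effect, and the most motivated 10% of the control group
are the ones who cross over.
```python
import random
import statistics
def mean_outcome(group):
    return statistics.fmean(person["outcome"] for person in group)
def simulate(n=20000, crossover_share=0.10, seed=0):
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        # Hypothetical model: the program adds nothing, so outcome = motivation + noise.
        people.append({"motivation": motivation, "outcome": motivation + rng.gauss(0, 1)})
    rng.shuffle(people)  # random assignment: first half treatment, second half control
    treatment, control = people[: n // 2], people[n // 2:]
    # Assumption: the most motivated controls are the ones who cross over.
    by_motivation = sorted(control, key=lambda person: person["motivation"])
    n_crossovers = int(len(control) * crossover_share)
    stayers = by_motivation[: len(control) - n_crossovers]
    full_groups = mean_outcome(treatment) - mean_outcome(control)
    crossovers_dropped = mean_outcome(treatment) - mean_outcome(stayers)
    return full_groups, crossovers_dropped
full_groups, crossovers_dropped = simulate()
print(f"whole treatment vs whole control: {full_groups:+.3f}")
print(f"whole treatment vs 90% of control: {crossovers_dropped:+.3f}")
```
With no true effect, the whole-group comparison should land near zero, while
the comparison that drops the crossovers shows a gap that comes from
motivation alone.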

00:43:56.020 --> 00:43:57.965
Yes?

00:43:57.965 --> 00:44:00.090
AUDIENCE: Turns out, I guess I
didn't understand the answer

00:44:00.090 --> 00:44:02.260
to the earlier question.

00:44:02.260 --> 00:44:05.630
So we're worried about
spillover, and we're going to

00:44:05.630 --> 00:44:08.450
deliver books to--

00:44:08.450 --> 00:44:11.990
clearly the intervention is that
the kids get books that

00:44:11.990 --> 00:44:14.350
they can take home to
study at night.

00:44:14.350 --> 00:44:17.750
But I've decided that because
I'm worried about spillover

00:44:17.750 --> 00:44:20.660
and because it's more
administratively convenient,

00:44:20.660 --> 00:44:24.370
I'm going to deliver
to some schools.

00:44:24.370 --> 00:44:28.360
So I'm going to draw the schools
at random, but I'm

00:44:28.360 --> 00:44:31.040
looking at the kids,
impact on the kids.

00:44:31.040 --> 00:44:33.100
PROFESSOR: That's OK.

00:44:33.100 --> 00:44:37.810
AUDIENCE: So even so, I haven't
damaged my ability to

00:44:37.810 --> 00:44:42.900
look at the students' effects,
because my unit of

00:44:42.900 --> 00:44:46.660
randomization was at
a different level.

00:44:46.660 --> 00:44:49.340
PROFESSOR: That's
perfectly fine.

00:44:49.340 --> 00:44:55.650
However, the higher the unit
of randomization, the more

00:44:55.650 --> 00:44:58.920
trouble you're going to have in
having enough statistical

00:44:58.920 --> 00:45:00.310
power to detect effects.

00:45:00.310 --> 00:45:02.880
But that's a topic that I want
to leave up to Thursday.

00:45:02.880 --> 00:45:04.510
But yes.

00:45:04.510 --> 00:45:06.750
I mean, when we say the
schools are treated--

00:45:06.750 --> 00:45:08.230
I mean, the schools
are buildings.

00:45:08.230 --> 00:45:10.130
They're not being treated
in any way.

00:45:10.130 --> 00:45:12.480
Unless you paint them or do
something to them, they're not

00:45:12.480 --> 00:45:13.150
being treated--

00:45:13.150 --> 00:45:14.010
AUDIENCE: Ours got paint.

00:45:14.010 --> 00:45:15.260
PROFESSOR: OK.

00:45:15.260 --> 00:45:19.790
So if it's just painting them,
then the schools--

00:45:19.790 --> 00:45:20.770
no, but seriously.

00:45:20.770 --> 00:45:24.150
When I say treated, who's being

00:45:24.150 --> 00:45:25.540
affected by the treatment?

00:45:25.540 --> 00:45:27.685
AUDIENCE: Well, I
can't have a--

00:45:27.685 --> 00:45:28.850
it's going to hurt my power.

00:45:28.850 --> 00:45:31.870
But I can randomize at a
different level than

00:45:31.870 --> 00:45:32.700
[INAUDIBLE].

00:45:32.700 --> 00:45:33.720
PROFESSOR: You can.

00:45:33.720 --> 00:45:35.910
Particularly if you want to
avoid spillovers, that's

00:45:35.910 --> 00:45:39.100
exactly what you should
be doing.

00:45:39.100 --> 00:45:39.840
All right.

00:45:39.840 --> 00:45:40.792
Yes?

00:45:40.792 --> 00:45:43.100
AUDIENCE: My name is Cesar.

00:45:43.100 --> 00:45:46.270
What happened when the
intervention is something

00:45:46.270 --> 00:45:47.010
about knowledge?

00:45:47.010 --> 00:45:50.900
For example, that some nurse
trained to a treatment group

00:45:50.900 --> 00:45:55.850
about wash your hands, and
this knowledge can--

00:45:55.850 --> 00:45:56.710
PROFESSOR: Can spillover.

00:45:56.710 --> 00:45:57.270
Yeah.

00:45:57.270 --> 00:45:58.190
That's exactly right.

00:45:58.190 --> 00:46:01.250
So again, you need to think
about the design of the study.

00:46:01.250 --> 00:46:04.170
If you really think it's going
to spill over, then you need

00:46:04.170 --> 00:46:07.970
to think about randomizing at
a higher level so that the

00:46:07.970 --> 00:46:09.940
spillover doesn't occur.

00:46:09.940 --> 00:46:11.560
I do have to say one thing.

00:46:11.560 --> 00:46:13.830
There's some interventions
where the

00:46:13.830 --> 00:46:15.290
spillover is evident.

00:46:15.290 --> 00:46:17.550
And you're going to see that
in the deworming case.

00:46:17.550 --> 00:46:18.880
I think it's case number 4.

00:46:18.880 --> 00:46:21.400
So it's very clear that
this is happening.

00:46:21.400 --> 00:46:24.570
There's a human biological
transmission of disease that

00:46:24.570 --> 00:46:26.070
makes spillovers very clear.

00:46:28.820 --> 00:46:30.070
This is my own bias.

00:46:30.070 --> 00:46:34.110
But there are tons of
programs out there that have

00:46:34.110 --> 00:46:36.620
difficulty affecting
the people that

00:46:36.620 --> 00:46:38.620
they're intended to.

00:46:38.620 --> 00:46:41.880
So thinking that they're going
to affect other people they

00:46:41.880 --> 00:46:45.800
haven't been intending to help,
in some cases at least,

00:46:45.800 --> 00:46:46.750
is a stretch.

00:46:46.750 --> 00:46:50.080
Having said that, if you think
spillovers will occur, then

00:46:50.080 --> 00:46:52.520
you need to think about
that at the design

00:46:52.520 --> 00:46:54.130
stage of the study.

00:46:54.130 --> 00:46:54.630
Yes?

00:46:54.630 --> 00:46:55.365
Your name please?

00:46:55.365 --> 00:46:56.006
AUDIENCE: Yes, sir.

00:46:56.006 --> 00:46:57.956
Raj.

00:46:57.956 --> 00:46:59.888
Just getting back to the example
where you were saying

00:46:59.888 --> 00:47:02.061
if you took each of us, and
you assigned us to two

00:47:02.061 --> 00:47:04.718
different groups, it would
adjust for the unobservable

00:47:04.718 --> 00:47:05.684
characteristics.

00:47:05.684 --> 00:47:08.300
Would that work out in a
sample size so small?

00:47:08.300 --> 00:47:11.280
PROFESSOR: In a sample size
like this, you will have

00:47:11.280 --> 00:47:12.480
trouble with statistical--

00:47:12.480 --> 00:47:14.290
I want to leave all those
questions of--

00:47:14.290 --> 00:47:17.340
you have our superstar, Esther
Duflo, who's going to speak

00:47:17.340 --> 00:47:18.690
about statistical power.

00:47:18.690 --> 00:47:26.390
But the key thing here is, if
you have a small group, then

00:47:26.390 --> 00:47:28.970
what happens is the sampling
error is bigger.

00:47:28.970 --> 00:47:31.300
So you may observe differences
between the groups.

00:47:31.300 --> 00:47:33.930
You may not declare them to be
statistically significant

00:47:33.930 --> 00:47:35.910
because you have very
little power.

00:47:35.910 --> 00:47:39.610
So in general, you want
larger sample sizes.

00:47:39.610 --> 00:47:41.180
This group is probably small.

00:47:41.180 --> 00:47:43.470
But even if you did it with this
group, and I challenge

00:47:43.470 --> 00:47:44.440
you to do it--

00:47:44.440 --> 00:47:46.570
just take an Excel spreadsheet
and take five

00:47:46.570 --> 00:47:48.060
characteristics of you.

00:47:48.060 --> 00:47:50.040
And the random assignment,
you're going to see some

00:47:50.040 --> 00:47:51.640
differences.

00:47:51.640 --> 00:47:53.660
But it's really amazing
how the two

00:47:53.660 --> 00:47:54.910
groups will look alike.

00:47:54.910 --> 00:47:55.620
And the other thing.

00:47:55.620 --> 00:47:59.680
You're not accounting for
unobservable differences like

00:47:59.680 --> 00:48:02.260
some non-experimental
methods do.

00:48:02.260 --> 00:48:04.630
The key thing about this is, you
don't need to account for

00:48:04.630 --> 00:48:07.090
anything, because the two groups
are balanced across

00:48:07.090 --> 00:48:08.060
these two things.

00:48:08.060 --> 00:48:11.750
So they have the same average
level of motivation, and so I

00:48:11.750 --> 00:48:14.270
don't need to control
statistically for motivation.

00:48:14.270 --> 00:48:16.580
Because that cannot be a
confounding factor if the two

00:48:16.580 --> 00:48:18.040
groups are the same.
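
NOTE
The spreadsheet exercise can also be sketched in code. This is an illustration
under made-up assumptions: a class of 30, five invented characteristics, and a
"motivation" column that is visible only because the data are generated here.
```python
import random
import statistics
TRAITS = ["age", "years_experience", "height_cm", "commute_minutes", "motivation"]
def make_class(size=30, seed=0):
    rng = random.Random(seed)
    # Invented data; in a real class these would be the participants' own characteristics.
    return [
        {
            "age": rng.randint(22, 60),
            "years_experience": rng.randint(0, 30),
            "height_cm": rng.gauss(168, 9),
            "commute_minutes": rng.randint(5, 90),
            "motivation": rng.gauss(0, 1),
        }
        for _ in range(size)
    ]
def balance_table(people, seed=1):
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)  # random assignment: first half treatment, second half control
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    table = {}
    for trait in TRAITS:
        treatment_mean = statistics.fmean(person[trait] for person in treatment)
        control_mean = statistics.fmean(person[trait] for person in control)
        table[trait] = (treatment_mean, control_mean, treatment_mean - control_mean)
    return table
people = make_class()
for trait, (treatment_mean, control_mean, gap) in balance_table(people).items():
    print(f"{trait:18s} treatment={treatment_mean:8.2f} control={control_mean:8.2f} gap={gap:+.2f}")
```
With only 30 people some gaps will show up. Rerunning with other seeds or a
larger class gives a feel for how quickly they shrink.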

00:48:18.040 --> 00:48:19.590
OK?

00:48:19.590 --> 00:48:20.110
All right.

00:48:20.110 --> 00:48:22.820
So step number six.

00:48:22.820 --> 00:48:25.060
If you're going to measure the
impact of a program on an

00:48:25.060 --> 00:48:27.110
outcome of interest,
you need to collect

00:48:27.110 --> 00:48:28.200
data on that outcome.

00:48:28.200 --> 00:48:29.780
And that's called
follow-up data.

00:48:29.780 --> 00:48:32.290
And the key thing is, you need
to collect that for both

00:48:32.290 --> 00:48:34.380
treatment and control groups.

00:48:34.380 --> 00:48:37.640
And it's important that it be
done in identical ways.

00:48:37.640 --> 00:48:42.690
So you can't, or it would not
be a good idea, to have

00:48:42.690 --> 00:48:46.370
treatment group data come from
one source, say, a survey, and

00:48:46.370 --> 00:48:48.620
control group data come from
another source, say,

00:48:48.620 --> 00:48:51.370
administrative data, because
data sources are generally not

00:48:51.370 --> 00:48:54.400
very compatible with each other.

00:48:54.400 --> 00:48:55.860
The seventh step.

00:48:55.860 --> 00:48:57.870
Of course, estimate the
program impact.

00:48:57.870 --> 00:49:00.280
And if the experiment is
properly done, what you should

00:49:00.280 --> 00:49:02.880
be doing is just compare the
outcomes-- the mean outcomes

00:49:02.880 --> 00:49:04.780
of the treatment group with
the mean outcomes of the

00:49:04.780 --> 00:49:06.170
control group.

00:49:06.170 --> 00:49:09.560
Now, there are versions of the
experiments where they are

00:49:09.560 --> 00:49:12.790
more sophisticated, and then you
need to use the multiple

00:49:12.790 --> 00:49:14.920
regression framework to
control for things,

00:49:14.920 --> 00:49:16.850
particularly if you have
stratified your

00:49:16.850 --> 00:49:18.480
sample, and so on.

00:49:18.480 --> 00:49:21.990
But in general, the basic idea
is, there are no differences

00:49:21.990 --> 00:49:23.730
between these two groups.

00:49:23.730 --> 00:49:27.090
Then the simple differences in
outcomes between those groups

00:49:27.090 --> 00:49:30.170
should give you the impact
of the program.

00:49:30.170 --> 00:49:33.140
There are other reasons you may
want to use the regression

00:49:33.140 --> 00:49:35.630
framework, such as statistical
power, that we were talking

00:49:35.630 --> 00:49:38.110
about before, but this
is the basic idea.

00:49:38.110 --> 00:49:41.460
If the difference between the
two groups is very different

00:49:41.460 --> 00:49:44.870
than what you get with the
regression, you should start

00:49:44.870 --> 00:49:47.710
thinking about what's
going on.

00:49:47.710 --> 00:49:48.440
And then eight.

00:49:48.440 --> 00:49:51.200
And I think this is very
important for practitioners.

00:49:51.200 --> 00:49:53.440
You should assess whether
the program's impacts are

00:49:53.440 --> 00:49:56.150
statistically significant, but
also if they're practically

00:49:56.150 --> 00:49:56.910
significant.

00:49:56.910 --> 00:49:58.910
So statistically significant
means we're

00:49:58.910 --> 00:50:01.500
confident that this impact
is different from 0 in a

00:50:01.500 --> 00:50:03.150
statistical sense.

00:50:03.150 --> 00:50:06.070
Having said that, the impact
may still be very small for

00:50:06.070 --> 00:50:07.250
any practical purposes.

00:50:07.250 --> 00:50:10.250
So it may be that a program
affects some outcome of

00:50:10.250 --> 00:50:14.010
interest, but the effect is so
small that you won't decide

00:50:14.010 --> 00:50:16.760
that this program was a success
on the basis of that.

00:50:16.760 --> 00:50:18.980
So both of those things
are important.

00:50:18.980 --> 00:50:22.240
The stars or the asterisks for
statistical significance are

00:50:22.240 --> 00:50:26.380
not enough for you to conclude
that a program is successful.
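
NOTE
A back-of-the-envelope sketch of why the stars are not enough. It assumes two
equal-sized groups and an outcome measured in standard-deviation units, so the
standard error of the difference in means is sqrt(2 / n). The effect sizes and
sample sizes are invented, and 1.96 is the usual two-sided 5% cutoff.
```python
import math
def z_statistic(effect_in_sd, n_per_group):
    # Outcome SD normalised to 1, equal group sizes: SE of the difference is sqrt(2 / n).
    standard_error = math.sqrt(2 / n_per_group)
    return effect_in_sd / standard_error
# A tiny effect in a huge study versus a sizeable effect in a small one (made-up numbers).
scenarios = [("tiny effect, huge sample", 0.01, 1_000_000), ("large effect, small sample", 0.30, 50)]
for label, effect, n in scenarios:
    z = z_statistic(effect, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{label}: effect={effect} SD, z={z:.2f}, {verdict} at the 5% level")
```
The first scenario clears the statistical bar with an effect of one hundredth
of a standard deviation, which may still be too small to matter in practice.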

00:50:26.380 --> 00:50:27.480
Yes?

00:50:27.480 --> 00:50:28.676
Your name please.

00:50:28.676 --> 00:50:30.222
AUDIENCE: Ashu.

00:50:30.222 --> 00:50:30.684
Yeah.

00:50:30.684 --> 00:50:33.120
I understand we can get the
mean just by seeing the

00:50:33.120 --> 00:50:34.760
difference between the
two sample sets.

00:50:34.760 --> 00:50:37.974
How do we get a handle on this
trend of standard error and

00:50:37.974 --> 00:50:40.334
consequently the statistical
significance?

00:50:40.334 --> 00:50:40.810
PROFESSOR: Yeah.

00:50:40.810 --> 00:50:47.340
So again, in the simplest, very,
very simple, you just do

00:50:47.340 --> 00:50:50.830
a comparison of two groups, this
is the standard t-test,

00:50:50.830 --> 00:50:52.530
there's nothing else to do.

00:50:52.530 --> 00:50:57.230
In practice, a lot of this
impact estimation is done

00:50:57.230 --> 00:50:58.970
through the regression
framework.

00:50:58.970 --> 00:51:01.250
However you're going to do it,
you're going to let your

00:51:01.250 --> 00:51:03.480
statistical software calculate
those standard errors.

00:51:03.480 --> 00:51:06.960
Of course you need to be careful
about things you learn

00:51:06.960 --> 00:51:08.820
on Thursday, such as clustering
and so on.

00:51:08.820 --> 00:51:11.530
You need to make sure that those
errors reflect that.

00:51:11.530 --> 00:51:16.570
But the basic idea is, you let
your statistical software or

00:51:16.570 --> 00:51:18.770
the evaluator calculate
those impacts.

00:51:18.770 --> 00:51:24.220
But as a proxy, if the two means
are not different, then

00:51:24.220 --> 00:51:26.330
it's going to be hard
to argue that this

00:51:26.330 --> 00:51:27.580
program had a big effect.
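
NOTE
What the software is doing in the simplest case can be sketched as follows.
The scores are invented, the function name is hypothetical, and the standard
error uses the unequal-variance (Welch) form of the two-sample t statistic. It
ignores clustering, so it would typically understate the standard error in a
design randomized at the school or village level.
```python
import math
import statistics
def difference_in_means(treatment, control):
    difference = statistics.fmean(treatment) - statistics.fmean(control)
    # Welch standard error: each group's sample variance divided by its own size.
    standard_error = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return difference, standard_error, difference / standard_error
# Made-up follow-up test scores for six treatment and six control students.
treatment_scores = [62, 70, 68, 75, 71, 66]
control_scores = [60, 64, 59, 67, 63, 61]
difference, standard_error, t_statistic = difference_in_means(treatment_scores, control_scores)
print(f"difference={difference:.2f}  standard error={standard_error:.2f}  t={t_statistic:.2f}")
```
The t statistic is the estimated impact divided by its standard error, and it
is compared against a t distribution to get the significance level.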

00:51:30.280 --> 00:51:31.670
OK.

00:51:31.670 --> 00:51:32.950
So random.

00:51:32.950 --> 00:51:37.060
As I said at the beginning,
anyone can tell me, what does

00:51:37.060 --> 00:51:38.310
the term "random" mean?

00:51:42.590 --> 00:51:42.970
Yes?

00:51:42.970 --> 00:51:44.540
AUDIENCE: Chosen by chance.

00:51:44.540 --> 00:51:48.330
PROFESSOR: Oh, you work for
public opinion polls.

00:51:48.330 --> 00:51:50.410
I should have asked you.

00:51:50.410 --> 00:51:51.040
All right.

00:51:51.040 --> 00:51:52.890
So "chosen by chance."
What does that mean?

00:51:55.794 --> 00:51:57.044
AUDIENCE: [INAUDIBLE]

00:52:00.634 --> 00:52:03.860
One can say random if there's
no systematic

00:52:03.860 --> 00:52:06.650
trend behind the selection.

00:52:06.650 --> 00:52:07.520
PROFESSOR: OK.

00:52:07.520 --> 00:52:08.480
Systematic trends.

00:52:08.480 --> 00:52:10.690
So you don't have someone
saying, you

00:52:10.690 --> 00:52:13.740
go here, you go there.

00:52:13.740 --> 00:52:16.400
So suppose I wanted
to do a random

00:52:16.400 --> 00:52:18.930
assignment in this classroom.

00:52:18.930 --> 00:52:23.370
And I went here, and I closed
my eyes, and I throw a ball

00:52:23.370 --> 00:52:24.700
right here.

00:52:24.700 --> 00:52:25.870
I don't see where
I'm throwing.

00:52:25.870 --> 00:52:27.360
I just throw it.

00:52:27.360 --> 00:52:29.700
Person gets it, falls
into the treatment.

00:52:29.700 --> 00:52:31.252
Is that random?

00:52:31.252 --> 00:52:31.710
AUDIENCE: No.

00:52:31.710 --> 00:52:32.960
PROFESSOR: Why not?

00:52:35.460 --> 00:52:36.980
I already turned that
way, right?

00:52:36.980 --> 00:52:38.580
AUDIENCE: Maybe you
like the sun.

00:52:38.580 --> 00:52:40.810
PROFESSOR: Maybe
I like the sun.

00:52:40.810 --> 00:52:43.170
And the people sitting near the
sun may be different from

00:52:43.170 --> 00:52:44.800
the people who are not.

00:52:44.800 --> 00:52:46.160
Who knows.

00:52:46.160 --> 00:52:49.290
The key thing is that when we
say random, particularly in a

00:52:49.290 --> 00:52:53.780
simple randomized experiment,
what we mean is that everyone,

00:52:53.780 --> 00:52:58.070
every single one of you, has the
same probability of being

00:52:58.070 --> 00:53:00.920
selected into the
treatment group.

00:53:00.920 --> 00:53:02.000
Or into one of the groups.

00:53:02.000 --> 00:53:03.460
Let's say the treatment group.

00:53:03.460 --> 00:53:10.140
So the key thing here is that
Iqbal, Brook, Jamie, Jessica,

00:53:10.140 --> 00:53:13.620
everyone, Farah, everyone in
this room, if we do a simple

00:53:13.620 --> 00:53:16.660
random assignment, you should
have the same probability of

00:53:16.660 --> 00:53:18.810
being assigned to the
treatment group.

00:53:18.810 --> 00:53:21.440
So it has a precise statistical
definition.

00:53:21.440 --> 00:53:24.530
It's not just someone
saying, oh, yeah.

00:53:24.530 --> 00:53:26.250
We can't remember
how we did it.

00:53:26.250 --> 00:53:27.120
It must have been random.

00:53:27.120 --> 00:53:27.330
No.

00:53:27.330 --> 00:53:30.120
It has a very, very precise
definition.

00:53:30.120 --> 00:53:34.390
Because if you trust someone
telling you, it was random,

00:53:34.390 --> 00:53:36.980
and then you trust that word,
and then you start doing your

00:53:36.980 --> 00:53:39.970
study, and three years later,
you discover it wasn't random,

00:53:39.970 --> 00:53:42.910
you are not going to be very
happy with yourself.

00:53:42.910 --> 00:53:47.050
So there are variations
on this.

00:53:47.050 --> 00:53:49.360
If you have stratified, it
doesn't mean that everyone

00:53:49.360 --> 00:53:50.340
must have the same
probability.

00:53:50.340 --> 00:53:52.040
It means everyone
within a stratum.

00:53:52.040 --> 00:53:56.050
But the basic idea is, before
we do random assignments, we

00:53:56.050 --> 00:53:59.380
should know the probability of
everyone being selected.

00:53:59.380 --> 00:54:01.990
When I say the same probability
of being selected

00:54:01.990 --> 00:54:04.080
into a treatment group,
that probability

00:54:04.080 --> 00:54:05.630
doesn't need to be half.

00:54:05.630 --> 00:54:07.210
So it could be a third.

00:54:07.210 --> 00:54:08.670
It could be two thirds.

00:54:08.670 --> 00:54:10.530
From a statistical power
perspective, you

00:54:10.530 --> 00:54:12.230
prefer half and half.

00:54:12.230 --> 00:54:15.700
But whatever it is, all of
you should have the same

00:54:15.700 --> 00:54:17.400
probability of being selected.

00:54:17.400 --> 00:54:19.160
Make sense?

00:54:19.160 --> 00:54:20.410
OK.

00:54:22.800 --> 00:54:26.043
AUDIENCE: In your example of
drawing the ball, is that a

00:54:26.043 --> 00:54:27.890
random assignment?

00:54:27.890 --> 00:54:28.150
PROFESSOR: Right.

00:54:28.150 --> 00:54:30.940
So again, it depends on the
details on how you do it.

00:54:30.940 --> 00:54:35.320
But suppose we have balls for,
I don't know, 30 participants

00:54:35.320 --> 00:54:39.150
or however many you are, and you
have balls from 1 to 30,

00:54:39.150 --> 00:54:41.890
and you mix the bag, and you
really trusted the physics

00:54:41.890 --> 00:54:44.700
that by mixing, that all the
balls would have the same

00:54:44.700 --> 00:54:47.470
chance of being selected,
and you draw one

00:54:47.470 --> 00:54:49.300
ball from the bag--

00:54:49.300 --> 00:54:51.480
all the balls had the same
chance of being selected.

00:54:51.480 --> 00:54:53.796
All of you had the same chance
of being selected.

00:54:53.796 --> 00:54:55.014
AUDIENCE: But the
second person--

00:54:55.014 --> 00:54:59.030
so when you draw one,
that's 1 out of 30.

00:54:59.030 --> 00:55:00.450
PROFESSOR: Yes.

00:55:00.450 --> 00:55:06.160
AUDIENCE: But the second time
you do it, you could have a--

00:55:06.160 --> 00:55:09.200
PROFESSOR: So if the sample size
is very, very small, you

00:55:09.200 --> 00:55:11.990
worry about sampling with
replacement and without

00:55:11.990 --> 00:55:13.110
replacing--

00:55:13.110 --> 00:55:18.550
if the population from which
you're drawing is very small,

00:55:18.550 --> 00:55:19.970
you may have an issue
with that.

00:55:19.970 --> 00:55:23.170
If the population is large, the
difference between 1 in

00:55:23.170 --> 00:55:28.820
1000 and 1 in 999, it's going
to be pretty small.

00:55:28.820 --> 00:55:30.260
If you do it sequentially
like that.

00:55:30.260 --> 00:55:34.200
If you do it in a computer,
you can have a randomizing

00:55:34.200 --> 00:55:38.060
device that just generates a
random number, and then you

00:55:38.060 --> 00:55:39.310
pick the first half.
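
NOTE
A minimal sketch of the computer version described here: give everyone a
random number, sort by it, and take the first half as the treatment group. The
participant names, the seed and the number of repeats are placeholders. The
loop at the end reruns the lottery many times to check that every person ends
up in treatment about half the time.
```python
import random
def assign(names, rng):
    # Give everyone a random number, sort by it, and put the first half in treatment.
    draws = {name: rng.random() for name in names}
    ordered = sorted(names, key=draws.get)
    half = len(ordered) // 2
    return set(ordered[:half]), set(ordered[half:])
names = [f"participant_{i}" for i in range(1, 31)]
rng = random.Random(42)
treatment, control = assign(names, rng)
print(sorted(treatment))
# Repeat the lottery: each person's share of treatment assignments should be near 0.5.
repeats = 4000
counts = dict.fromkeys(names, 0)
for _ in range(repeats):
    for name in assign(names, rng)[0]:
        counts[name] += 1
shares = [counts[name] / repeats for name in names]
print(f"lowest share={min(shares):.3f}  highest share={max(shares):.3f}")
```
Because the whole ordering is drawn at once, nobody's chance depends on who
was picked before them, which sidesteps the worry about drawing one at a time.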

00:55:41.980 --> 00:55:42.670
OK.

00:55:42.670 --> 00:55:46.040
So is random assignment the
same as random sampling?

00:55:53.730 --> 00:55:56.196
I see no, yes?

00:55:56.196 --> 00:55:56.630
AUDIENCE: No.

00:55:56.630 --> 00:55:58.130
PROFESSOR: No.

00:55:58.130 --> 00:55:59.440
I need a little bit
more than that.

00:55:59.440 --> 00:56:01.141
AUDIENCE: A random assignment,
you would have already

00:56:01.141 --> 00:56:04.300
narrowed down to a smaller
sample, and assigned within

00:56:04.300 --> 00:56:05.713
that sample.

00:56:05.713 --> 00:56:08.431
Random sampling would be taking
a group out of a whole

00:56:08.431 --> 00:56:09.790
population.

00:56:09.790 --> 00:56:10.420
PROFESSOR: OK.

00:56:10.420 --> 00:56:10.860
Very good.

00:56:10.860 --> 00:56:17.110
So one way to think about this
is you have your target

00:56:17.110 --> 00:56:20.730
population, then you have
potential participants.

00:56:20.730 --> 00:56:24.870
This may be children you're
targeting in your

00:56:24.870 --> 00:56:26.220
intervention.

00:56:26.220 --> 00:56:28.920
And then you have your
evaluation sample.

00:56:28.920 --> 00:56:34.170
Here's where the random
sampling could occur.

00:56:34.170 --> 00:56:36.150
So--

00:56:36.150 --> 00:56:37.450
sorry, I forgot your name.

00:56:37.450 --> 00:56:38.270
AUDIENCE: I didn't tell you.

00:56:38.270 --> 00:56:39.120
PROFESSOR: You didn't tell me.

00:56:39.120 --> 00:56:41.800
This is even worse.

00:56:41.800 --> 00:56:42.600
Jean.

00:56:42.600 --> 00:56:46.100
So what Jean is saying is,
random sampling happened at

00:56:46.100 --> 00:56:47.070
this stage.

00:56:47.070 --> 00:56:49.400
Or could have happened
in this stage.

00:56:49.400 --> 00:56:53.870
What random sampling is buying
you is the ability to

00:56:53.870 --> 00:56:56.440
generalize from your
evaluation to

00:56:56.440 --> 00:56:57.790
this population here.

00:56:57.790 --> 00:57:00.080
And whether this is a population
of policy interest

00:57:00.080 --> 00:57:01.320
or not, that's a different
matter.

00:57:01.320 --> 00:57:04.730
But that's what random sampling
is buying you.

00:57:04.730 --> 00:57:07.760
What random assignment is doing
is once you have the

00:57:07.760 --> 00:57:10.540
samples-- so suppose there
are 100,000 potential

00:57:10.540 --> 00:57:11.640
participants.

00:57:11.640 --> 00:57:15.000
You don't have money to enroll
100,000 people in a program or

00:57:15.000 --> 00:57:16.350
in an evaluation.

00:57:16.350 --> 00:57:20.510
You pick, out of this 100,000,
5,000 at random, the results

00:57:20.510 --> 00:57:23.990
of your study are going to be
generalizable to this 100,000.

00:57:23.990 --> 00:57:27.380
Now, within this 5,000, you do
random assignment and you

00:57:27.380 --> 00:57:29.670
assign to a treatment group
and to a control group.

00:57:29.670 --> 00:57:34.800
Maybe of this 5,000, 2,500 fall
here, 2,500 fall here.

00:57:34.800 --> 00:57:38.770
What random assignment buys you
is these two groups are

00:57:38.770 --> 00:57:41.490
identical, and so any difference
you observe in

00:57:41.490 --> 00:57:43.670
outcomes is due to
the program.

00:57:43.670 --> 00:57:45.140
That's internal validity.

00:57:45.140 --> 00:57:48.760
That has to do with causal
inference that is about this

00:57:48.760 --> 00:57:50.920
5,000 that are here.

00:57:50.920 --> 00:57:56.340
So where the 5,000 generalize to
is a question of external validity.

00:57:56.340 --> 00:57:59.610
So they both have the word
"random," but these are two

00:57:59.610 --> 00:58:01.980
different concepts.

00:58:01.980 --> 00:58:04.370
Again, random assignment relates
to internal validity,

00:58:04.370 --> 00:58:05.320
causal inference.

00:58:05.320 --> 00:58:09.020
Random sampling refers
to external validity.
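
NOTE
The two steps can be sketched with the numbers used in this example: 100,000
potential participants, 5,000 sampled for the evaluation, then 2,500 assigned
to each group. The participant IDs and the seed are placeholders.
```python
import random
rng = random.Random(7)
potential_participants = range(100_000)  # placeholder IDs for the 100,000
# Random sampling: who is in the evaluation at all (external validity).
evaluation_sample = rng.sample(potential_participants, 5_000)
# Random assignment: who in the evaluation sample gets the program (internal validity).
rng.shuffle(evaluation_sample)
treatment = evaluation_sample[:2_500]
control = evaluation_sample[2_500:]
print(len(evaluation_sample), len(treatment), len(control))
```
The sample already comes back in random order, so the extra shuffle is only
there to keep the two steps visibly separate.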

00:58:09.020 --> 00:58:09.480
Yes?

00:58:09.480 --> 00:58:12.040
AUDIENCE: My name is Cornelia.

00:58:12.040 --> 00:58:13.590
PROFESSOR: I should
know it by now.

00:58:13.590 --> 00:58:14.840
AUDIENCE: I haven't
said it yet.

00:58:16.900 --> 00:58:18.100
Can you do one and
not the other?

00:58:18.100 --> 00:58:18.780
Not really.

00:58:18.780 --> 00:58:19.610
Do you have to--?

00:58:19.610 --> 00:58:20.430
PROFESSOR: You can, you can.

00:58:20.430 --> 00:58:21.130
In fact--

00:58:21.130 --> 00:58:22.230
well, sorry.

00:58:22.230 --> 00:58:25.040
If it's called a randomized
experiment, this

00:58:25.040 --> 00:58:28.440
one has to be there.

00:58:28.440 --> 00:58:30.610
This is what defines a
randomized experiment.

00:58:30.610 --> 00:58:31.860
There was random assignment.

00:58:34.250 --> 00:58:37.230
AUDIENCE: So you can do a
randomized assignment, even if

00:58:37.230 --> 00:58:38.600
your sampling is not random.

00:58:38.600 --> 00:58:39.760
PROFESSOR: That's right.

00:58:39.760 --> 00:58:43.350
So what that means is that then
you need to think about

00:58:43.350 --> 00:58:44.600
who you generalize to.

00:58:47.880 --> 00:58:48.580
All right.

00:58:48.580 --> 00:58:51.225
So advantages and limitations
of experiments.

00:58:53.790 --> 00:58:57.830
For those of you who are a
little bit more statistically

00:58:57.830 --> 00:59:03.170
inclined, the key thing about
random assignment is that not

00:59:03.170 --> 00:59:06.510
only on average the two groups
are the same, but the

00:59:06.510 --> 00:59:09.550
distribution, the statistical
distribution of the two

00:59:09.550 --> 00:59:13.220
groups, is the same.

00:59:13.220 --> 00:59:16.310
And this is very powerful for a
lot of the adjustments that

00:59:16.310 --> 00:59:19.010
come at a later stage,
particularly when there are

00:59:19.010 --> 00:59:21.200
crossovers and similar things.

00:59:21.200 --> 00:59:24.450
The idea is that the two groups
not only on average

00:59:24.450 --> 00:59:26.900
on both observable and
unobservable characteristics

00:59:26.900 --> 00:59:29.320
look the same, but the
whole distribution.

00:59:29.320 --> 00:59:31.480
So they have the same variance,
they have the same

00:59:31.480 --> 00:59:35.530
25th percentile, the same
75th percentile.

00:59:35.530 --> 00:59:38.720
And of course, when I say the
same, again, it's in a

00:59:38.720 --> 00:59:43.430
statistical sense, subject to
sampling error, which we can

00:59:43.430 --> 00:59:45.000
account for.
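
NOTE A small simulation of the point about distributions, offered as a sketch rather than anything shown in the lecture.
The baseline characteristic, its mean of 50 and standard deviation of 10, and the helper name summary are all assumptions made for illustration.
After a random split, the mean, variance, and quartiles of the two groups should be close, up to sampling error.
```python
import random
import statistics
rng = random.Random(1)  # fixed seed, for reproducibility only
# Hypothetical baseline characteristic (say, a test score) for 5,000 people.
baseline = [rng.gauss(50, 10) for _ in range(5_000)]
# Random assignment: shuffle, then split in half.
rng.shuffle(baseline)
treatment, control = baseline[:2_500], baseline[2_500:]
def summary(values):
    """Mean, variance, and the 25th and 75th percentiles of a group."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "p25": q1,
        "p75": q3,
    }
treatment_summary = summary(treatment)
control_summary = summary(control)
for key in treatment_summary:
    print(key, round(treatment_summary[key], 2), round(control_summary[key], 2))
```
The printed pairs will not match exactly; the remaining gaps are the sampling error mentioned above.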

00:59:45.000 --> 00:59:45.970
And so there are--

00:59:45.970 --> 00:59:46.730
yes?

00:59:46.730 --> 00:59:49.287
AUDIENCE: That doesn't
necessarily mean that they're

00:59:49.287 --> 00:59:51.010
both anomalies.

00:59:51.010 --> 00:59:51.770
PROFESSOR: No, no, no.

00:59:51.770 --> 00:59:52.640
AUDIENCE: [INAUDIBLE]

00:59:52.640 --> 00:59:53.986
PROFESSOR: Anything.

00:59:53.986 --> 00:59:54.420
Yeah.

00:59:54.420 --> 00:59:56.760
Anything.

00:59:56.760 --> 00:59:58.910
But the distribution should
look the same.

01:00:01.510 --> 01:00:01.860
OK.

01:00:01.860 --> 01:00:04.790
So no systematic differences
between the two groups.

01:00:07.520 --> 01:00:12.610
This is deliberately
a repeated slide.

01:00:12.610 --> 01:00:15.340
I didn't forget to take it
out of the presentation.

01:00:15.340 --> 01:00:17.940
Key advantage, key takeaway
message--

01:00:17.940 --> 01:00:21.950
these two groups do not differ
systematically at the outset,

01:00:21.950 --> 01:00:25.620
so any difference you observe
should be attributable to the

01:00:25.620 --> 01:00:26.240
experiment.

01:00:26.240 --> 01:00:28.960
And this is under the big
assumption that the experiment

01:00:28.960 --> 01:00:31.160
was properly designed
and conducted.

01:00:31.160 --> 01:00:33.755
It's not like any experiment
will reach this.

01:00:38.660 --> 01:00:41.540
So other advantages
of experiments.

01:00:41.540 --> 01:00:45.330
Relative to results from
non-experimental studies,

01:00:45.330 --> 01:00:48.040
they're less subject to
methodological debates.

01:00:48.040 --> 01:00:51.760
So a lot more boring
conversations in academic

01:00:51.760 --> 01:01:00.140
seminars because there may be
some questions about what

01:01:00.140 --> 01:01:03.070
question is being answered,
there may be some questions

01:01:03.070 --> 01:01:05.370
about things that happen in
the field that may have

01:01:05.370 --> 01:01:06.710
threatened the experiment.

01:01:06.710 --> 01:01:09.830
But the basic notion that if it
was done properly, the two

01:01:09.830 --> 01:01:12.580
groups should look alike,
it's never debated.

01:01:12.580 --> 01:01:17.340
Whereas with non-experimental
methods, that's the whole sort

01:01:17.340 --> 01:01:22.500
of central claim of the seminar
and of the presenter.

01:01:22.500 --> 01:01:23.780
They're easier to convey.

01:01:23.780 --> 01:01:25.250
You can explain to
people, look.

01:01:25.250 --> 01:01:27.570
These two groups look alike
at the beginning.

01:01:27.570 --> 01:01:29.240
Now there's a difference.

01:01:29.240 --> 01:01:31.100
It must have been the program.

01:01:31.100 --> 01:01:34.240
And they're more likely to be
convincing to program funders

01:01:34.240 --> 01:01:35.920
and/or policymakers.

01:01:35.920 --> 01:01:39.460
If they find it more credible,
easier to convey, it's more

01:01:39.460 --> 01:01:40.960
likely that they will
take action.

01:01:40.960 --> 01:01:44.210
Although in this respect, I
can't emphasize enough what

01:01:44.210 --> 01:01:46.870
Rachel said, which is, look.

01:01:46.870 --> 01:01:50.090
If you have the right question,
then answering that

01:01:50.090 --> 01:01:53.650
question is going to be
important to lead to change.

01:01:53.650 --> 01:01:55.490
If you have the wrong question,
even if you did a

01:01:55.490 --> 01:01:58.500
nice experiment, it's not going
to help you that much.

01:01:58.500 --> 01:02:00.964
Yes?

01:02:00.964 --> 01:02:06.420
AUDIENCE: I've been to the
conference two months ago.

01:02:06.420 --> 01:02:12.372
Some people were arguing that
last first advantage that is

01:02:12.372 --> 01:02:15.950
with randomization--

01:02:15.950 --> 01:02:18.105
that's random assignment--

01:02:18.105 --> 01:02:28.200
how to build two groups that are
identical to each other.

01:02:28.200 --> 01:02:33.030
And some people argue that you
will almost never find a

01:02:33.030 --> 01:02:39.380
context where you will have
that situation occur.

01:02:39.380 --> 01:02:43.690
The way the government programs
operating in most

01:02:43.690 --> 01:02:51.710
cases, it is almost impossible
that you find an exact

01:02:51.710 --> 01:02:55.630
identical treatment group
and control group.

01:02:55.630 --> 01:03:00.730
PROFESSOR: See, the key thing
here is that you don't

01:03:00.730 --> 01:03:01.520
need to find it.

01:03:01.520 --> 01:03:04.270
It's not like you have a
treatment group and now let's

01:03:04.270 --> 01:03:06.980
look in the whole country, where
is the control group?

01:03:06.980 --> 01:03:08.400
No.

01:03:08.400 --> 01:03:12.270
This method forces the two
groups to be the same.

01:03:12.270 --> 01:03:14.680
As long as there are some people
who are going to be

01:03:14.680 --> 01:03:18.430
served by the program and some
that are not, if you randomly

01:03:18.430 --> 01:03:21.000
assign to these two
groups, the two

01:03:21.000 --> 01:03:22.260
groups should be identical.

01:03:22.260 --> 01:03:24.890
Not because you were very
smart and looked for the

01:03:24.890 --> 01:03:25.670
other group, no.

01:03:25.670 --> 01:03:32.370
It's like random assignment is
for those of us who precisely

01:03:32.370 --> 01:03:34.130
don't think we can come
up with that other

01:03:34.130 --> 01:03:37.340
group on our own.

01:03:37.340 --> 01:03:44.080
So there may be issues with
whether you have enough

01:03:44.080 --> 01:03:47.130
program applicants to be able
to divide them into two

01:03:47.130 --> 01:03:50.540
groups, participants and
non-participants.

01:03:50.540 --> 01:03:52.060
But in contexts where you're not

01:03:52.060 --> 01:03:55.240
serving all the two groups--

01:03:55.240 --> 01:03:58.470
so if you don't have money to
serve 1,000 people, and 1,000

01:03:58.470 --> 01:04:00.990
people applied to your program,
and you only have 400

01:04:00.990 --> 01:04:04.510
slots, that's not going to--
this goes to the ethical

01:04:04.510 --> 01:04:06.110
issue, which we'll discuss
in a second.

01:04:10.140 --> 01:04:12.990
The only thing that changes is
how you select those 400.

01:04:12.990 --> 01:04:15.520
But once you've selected
randomly, those two groups

01:04:15.520 --> 01:04:16.860
should look identical.

01:04:16.860 --> 01:04:20.940
Again, not because you were
incredibly astute at saying,

01:04:20.940 --> 01:04:22.260
oh, here's another group.

01:04:22.260 --> 01:04:22.500
No.

01:04:22.500 --> 01:04:25.490
This happens through
the flip of a coin.

01:04:25.490 --> 01:04:30.380
This is not a kind of, oh,
can the researcher go

01:04:30.380 --> 01:04:31.500
and find a group?

01:04:31.500 --> 01:04:35.140
Or whether the context is a developing
versus a developed country.

01:04:35.140 --> 01:04:36.780
This has to do with
the technique

01:04:36.780 --> 01:04:38.880
applied to any setting.

01:04:38.880 --> 01:04:43.240
Again, you're going to have
a case where you see a

01:04:43.240 --> 01:04:45.440
spreadsheet and you can see,
you can do the random

01:04:45.440 --> 01:04:47.430
assignment and see for yourself
that the two groups

01:04:47.430 --> 01:04:49.200
will look similar.

01:04:49.200 --> 01:04:50.450
OK?

01:04:52.480 --> 01:04:54.680
AUDIENCE: Is it necessary that
the size of the two groups

01:04:54.680 --> 01:04:56.120
have to be the same?

01:04:56.120 --> 01:04:57.470
PROFESSOR: No, it's
not necessary.

01:04:57.470 --> 01:05:01.150
And in fact in practice, what
happens is, suppose you had

01:05:01.150 --> 01:05:07.200
1,000 applicants and you
had money to serve 600.

01:05:07.200 --> 01:05:11.760
Then no matter what the
statistician says-- oh, it

01:05:11.760 --> 01:05:13.840
would be nice to have
500 and 500--

01:05:13.840 --> 01:05:18.140
you're not going to have 100
people not being served just

01:05:18.140 --> 01:05:23.170
because you want to keep
the half-half ratio.

01:05:23.170 --> 01:05:26.450
From a statistical perspective
it's ideal to have 50-50

01:05:26.450 --> 01:05:30.740
ratio, but only from a
statistical perspective.

01:05:30.740 --> 01:05:33.360
If you deviate too much
from that 50-50,

01:05:33.360 --> 01:05:34.750
then you get in trouble.

01:05:34.750 --> 01:05:36.683
So if you get to--

01:05:36.683 --> 01:05:37.410
I don't know.

01:05:37.410 --> 01:05:38.810
The rule of thumb
may be different

01:05:38.810 --> 01:05:39.540
for different people.

01:05:39.540 --> 01:05:44.040
But if you get over 70-30, I
would say probably you're

01:05:44.040 --> 01:05:45.300
going to lose a lot
of statistical

01:05:45.300 --> 01:05:46.400
power by doing that.
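
NOTE A back-of-the-envelope sketch of why uneven splits cost precision, under assumptions not stated in the lecture:
a fixed total sample, a simple difference in means, and equal outcome variance in both groups.
Under those assumptions the standard error is proportional to sqrt(1 / (p * (1 - p))), where p is the share assigned to treatment.
The function name relative_standard_error is hypothetical.
```python
import math
def relative_standard_error(treated_share: float) -> float:
    """Standard error of a difference in means relative to a 50-50 split.
    Assumes a fixed total sample and equal outcome variance in both groups."""
    if not 0 < treated_share < 1:
        raise ValueError("treated_share must be strictly between 0 and 1")
    return math.sqrt(0.25 / (treated_share * (1 - treated_share)))
for share in (0.5, 0.6, 0.7, 0.9):
    inflation = relative_standard_error(share)
    print(f"{share:.0%} treated: standard error x {inflation:.2f}")
```
Under these assumptions the penalty is small near 50-50 and grows as the split becomes more lopsided; where to draw the line remains a judgment call.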

01:05:46.400 --> 01:05:51.450
AUDIENCE: Yeah, but in some
cases, for example, a country

01:05:51.450 --> 01:05:57.870
needs to make priority
in aid with about 200

01:05:57.870 --> 01:05:59.690
hospitals, for example.

01:05:59.690 --> 01:06:06.320
And in my country, there are one
hospital that is the most

01:06:06.320 --> 01:06:08.830
important public hospital
in Honduras.

01:06:08.830 --> 01:06:13.630
So you can apply this
randomized process.

01:06:13.630 --> 01:06:20.390
But if you don't include this
particular hospital, you

01:06:20.390 --> 01:06:24.300
cannot include this particular
hospital

01:06:24.300 --> 01:06:26.740
because it's too important.

01:06:26.740 --> 01:06:30.750
We call that [UNINTELLIGIBLE]

01:06:30.750 --> 01:06:32.940
[? represented ?]

01:06:32.940 --> 01:06:36.090
subject for this type of
problem, who have the

01:06:36.090 --> 01:06:38.960
possibility of 1.

01:06:38.960 --> 01:06:42.570
Should be in the sample.

01:06:42.570 --> 01:06:44.960
I don't know if you understand
my Spanglish.

01:06:44.960 --> 01:06:45.750
PROFESSOR: No, no.

01:06:45.750 --> 01:06:46.520
I speak Spanish.

01:06:46.520 --> 01:06:48.010
We can communicate here.

01:06:48.010 --> 01:06:55.320
So the key thing is, again,
you're trying to create

01:06:55.320 --> 01:06:57.220
comparable groups.

01:06:57.220 --> 01:07:01.350
If for some reason you need to
serve a hospital because the

01:07:01.350 --> 01:07:04.260
president of your country says,
you need to serve this

01:07:04.260 --> 01:07:06.300
hospital, that's fine.

01:07:06.300 --> 01:07:07.650
One slot.

01:07:07.650 --> 01:07:10.605
But that hospital should not
be a part of your study,

01:07:10.605 --> 01:07:14.920
because that hospital was
not randomly assigned.

01:07:14.920 --> 01:07:15.570
That's all.

01:07:15.570 --> 01:07:16.550
As simple as that.

01:07:16.550 --> 01:07:17.690
And you may have
a few of those.

01:07:17.690 --> 01:07:21.890
I mean, I can tell you, in my
own experience, we're trying

01:07:21.890 --> 01:07:25.680
to implement random assignment
in Niger, a program financed

01:07:25.680 --> 01:07:28.156
by the Millennium Challenge
Corporation.

01:07:28.156 --> 01:07:31.560
A program about building
schools.

01:07:31.560 --> 01:07:33.250
We said, we're going to do
a random assignment.

01:07:33.250 --> 01:07:35.230
And they say, yes, yes, yes.

01:07:35.230 --> 01:07:38.380
Well, the US ambassador visited
two of the villages,

01:07:38.380 --> 01:07:41.740
and he promised them they
were getting schools.

01:07:41.740 --> 01:07:43.850
Now, you tell me if you want to
be the evaluator and tell

01:07:43.850 --> 01:07:44.870
those schools, no, no.

01:07:44.870 --> 01:07:47.780
We're going to put you
in the pool of--

01:07:47.780 --> 01:07:48.440
no way.

01:07:48.440 --> 01:07:51.520
Those two villages are going
to get their schools, but

01:07:51.520 --> 01:07:52.820
they're not part of
our evaluation.
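
NOTE A sketch of setting aside units that were promised the program before randomizing the rest.
The village names, the count of 50 villages, and the 20 available schools are made-up numbers for illustration, not figures from the Niger program.
```python
import random
rng = random.Random(2)  # fixed seed, for reproducibility only
# Hypothetical data: 50 candidate villages, 2 already promised a school.
villages = [f"village_{i:02d}" for i in range(50)]
promised = {"village_03", "village_17"}
SCHOOLS_AVAILABLE = 20
# Promised villages are served but set aside from the evaluation.
eligible = [v for v in villages if v not in promised]
slots_left = SCHOOLS_AVAILABLE - len(promised)
# Random assignment only among the villages that can still go either way.
rng.shuffle(eligible)
treatment = eligible[:slots_left]
control = eligible[slots_left:]
print(len(promised), len(treatment), len(control))
```
The promised villages still get schools; they simply never enter the treatment-control comparison.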

01:07:57.552 --> 01:08:01.960
AUDIENCE: Is there an
acceptable margin?

01:08:01.960 --> 01:08:03.920
PROFESSOR: See, that's again
the Jamaica question.

01:08:03.920 --> 01:08:05.710
I won't make that
mistake again.

01:08:05.710 --> 01:08:06.960
I won't tell you.

01:08:09.210 --> 01:08:11.080
You're going to see on Thursday
a whole session on

01:08:11.080 --> 01:08:13.110
statistical power, and you're
going to get a sense

01:08:13.110 --> 01:08:14.610
of where you are.

01:08:14.610 --> 01:08:16.870
You don't want to have too many,
first because you lose

01:08:16.870 --> 01:08:19.410
sample size, and second
because you lose

01:08:19.410 --> 01:08:20.500
representativeness.

01:08:20.500 --> 01:08:23.380
I mean, in the case of the
hospital in Honduras, if

01:08:23.380 --> 01:08:28.700
that's the hospital where 90% of
things are happening, then

01:08:28.700 --> 01:08:31.950
it's a little bit hard to have
that as a hospital that's out

01:08:31.950 --> 01:08:32.850
of your study.

01:08:32.850 --> 01:08:36.279
So that is an important issue.

01:08:36.279 --> 01:08:36.830
All right.

01:08:36.830 --> 01:08:38.510
There are limitations
of experiments,

01:08:38.510 --> 01:08:39.760
believe it or not.

01:08:42.279 --> 01:08:47.800
So the first one is, huge
methodological advantages.

01:08:47.800 --> 01:08:50.700
But you still need to worry
about these issues of internal

01:08:50.700 --> 01:08:53.410
validity and external
validity.

01:08:53.410 --> 01:08:56.210
And what I would say about this
is, on Friday you're

01:08:56.210 --> 01:08:59.040
going to learn a lot about how
to deal with these internal

01:08:59.040 --> 01:09:00.540
validity issues.

01:09:00.540 --> 01:09:02.270
And I'm not going to
go over them now.

01:09:02.270 --> 01:09:05.060
But the key thing is, if you
can avoid them from the

01:09:05.060 --> 01:09:07.790
beginning in terms of how you
design your program and how

01:09:07.790 --> 01:09:10.229
you implement them,
then much better.

01:09:10.229 --> 01:09:11.810
External validity issues--

01:09:11.810 --> 01:09:14.970
as Rachel said, any impact
evaluation conducted in a

01:09:14.970 --> 01:09:17.880
particular setting is going to
have external validity issues.

01:09:17.880 --> 01:09:20.279
But experiments are particularly
prone to them

01:09:20.279 --> 01:09:23.819
because they're sometimes done
in particularly concentrated

01:09:23.819 --> 01:09:26.350
areas where you really want to
find out, does this program

01:09:26.350 --> 01:09:28.490
work before expanding
it, so the external

01:09:28.490 --> 01:09:30.889
validity issue is there.

01:09:30.889 --> 01:09:33.810
As Rachel said, if you can
design an experiment to test

01:09:33.810 --> 01:09:40.760
each thing in your theory of
change, that usually helps

01:09:40.760 --> 01:09:41.600
with external validity.

01:09:41.600 --> 01:09:43.279
And of course, if you
can replicate

01:09:43.279 --> 01:09:45.220
evaluation in other settings.

01:09:45.220 --> 01:09:48.410
AUDIENCE: So OK, you're going
to have 10 variables with

01:09:48.410 --> 01:09:52.450
internal validity, equal
internal validity, but only

01:09:52.450 --> 01:09:54.910
three variables with
external validity?

01:09:54.910 --> 01:09:56.790
PROFESSOR: When you say three
variables, what do you mean

01:09:56.790 --> 01:09:58.950
with variables?

01:09:58.950 --> 01:10:02.310
AUDIENCE: The variables that
you are--variables.

01:10:02.310 --> 01:10:03.790
The study variables.

01:10:03.790 --> 01:10:08.310
I mean, when you're going to
evaluate internal validity,

01:10:08.310 --> 01:10:11.320
you're going to have
10 variables or 20.

01:10:11.320 --> 01:10:13.440
PROFESSOR: Well, internal
validity, the two groups

01:10:13.440 --> 01:10:14.450
should be the same.

01:10:14.450 --> 01:10:17.250
And you have pretty strong
internal validity if you can

01:10:17.250 --> 01:10:19.460
deal with this problem.

01:10:19.460 --> 01:10:24.190
AUDIENCE: When you're going to
the external validity, maybe

01:10:24.190 --> 01:10:28.090
not the whole 20 variables will
have external validity.

01:10:28.090 --> 01:10:33.620
But maybe your three or four
where you have been made

01:10:33.620 --> 01:10:35.390
different experiment in--

01:10:35.390 --> 01:10:39.100
PROFESSOR: So it really
depends on the

01:10:39.100 --> 01:10:40.880
context of your project.

01:10:40.880 --> 01:10:44.030
Again, I think the good
example is deworming.

01:10:44.030 --> 01:10:48.950
So deworming, you
take out worms.

01:10:48.950 --> 01:10:54.700
Well, in Honduras, if children
who go to school, there are no

01:10:54.700 --> 01:10:56.690
worms, and that's not the
reason they don't go to

01:10:56.690 --> 01:11:00.420
school, then that program in
Kenya doesn't have much

01:11:00.420 --> 01:11:03.020
external validity or
generalizability to Honduras.

01:11:03.020 --> 01:11:06.020
So you need to be thinking
about how the effect is

01:11:06.020 --> 01:11:06.875
supposed to be happening.

01:11:06.875 --> 01:11:09.930
And here there was the anemia
thing, which may work in the

01:11:09.930 --> 01:11:12.240
case of Honduras or not.

01:11:12.240 --> 01:11:15.600
You need to be seeing,
what is the chain?

01:11:15.600 --> 01:11:18.360
And seeing whether that chain is
likely to hold in whatever

01:11:18.360 --> 01:11:20.350
other contexts you
want to apply.

01:11:20.350 --> 01:11:22.260
There's no magic formula here.

01:11:22.260 --> 01:11:24.890
AUDIENCE: Yeah, but you are
going to control the

01:11:24.890 --> 01:11:28.880
theoretical framework with just
three, four variables

01:11:28.880 --> 01:11:33.700
because that variable will be
common in different countries?

01:11:33.700 --> 01:11:36.290
PROFESSOR: Yeah, but you
can have 200 variables.

01:11:36.290 --> 01:11:39.530
You can say, it depends
on so many things.

01:11:39.530 --> 01:11:42.810
But there's a limit
to how much--

01:11:42.810 --> 01:11:45.520
the external validity issue is
an issue that you can always

01:11:45.520 --> 01:11:46.370
hide behind it.

01:11:46.370 --> 01:11:49.520
You can always say, oh, this
program worked in Kenya.

01:11:49.520 --> 01:11:51.670
Who knows whether it would
work somewhere else?

01:11:51.670 --> 01:11:54.530
And then if you take that
attitude, then you can't learn

01:11:54.530 --> 01:11:57.880
anything from a randomized
experiment, or from any impact

01:11:57.880 --> 01:12:00.140
evaluation that's done in
a specific setting.

01:12:00.140 --> 01:12:03.340
Because even if you did it in
Kenya, in a particular point

01:12:03.340 --> 01:12:07.000
in time, you can always say,
well, it worked in Kenya ten

01:12:07.000 --> 01:12:09.370
years ago, but maybe it
won't work today.

01:12:09.370 --> 01:12:12.080
So I lean to the middle
ground here.

01:12:12.080 --> 01:12:15.290
You sort of think about what
are the critical steps or

01:12:15.290 --> 01:12:19.880
stages in which it can work, and
then go implement it, and

01:12:19.880 --> 01:12:21.630
maybe evaluate it.

01:12:21.630 --> 01:12:25.370
I think my answer here is,
external validity issues are

01:12:25.370 --> 01:12:27.090
going to be present for
both experiments and

01:12:27.090 --> 01:12:27.730
non-experiments.

01:12:27.730 --> 01:12:29.370
There is no magic
formula here.

01:12:29.370 --> 01:12:32.060
As long as you evaluate in a
particular setting, you're

01:12:32.060 --> 01:12:35.400
still going to be subject to the
question, does it work in

01:12:35.400 --> 01:12:38.610
some other setting?

01:12:38.610 --> 01:12:40.980
Some of these threats also
affect the validity of

01:12:40.980 --> 01:12:43.080
non-experimental studies.

01:12:43.080 --> 01:12:45.930
The key thing, though, is that
some of this, in the

01:12:45.930 --> 01:12:49.290
non-experimental studies, you
may not even realize that you

01:12:49.290 --> 01:12:50.370
have the threat.

01:12:50.370 --> 01:12:53.160
Because you've already done
something that allows you to

01:12:53.160 --> 01:12:54.960
be blind to the threat.

01:12:59.600 --> 01:13:03.820
So other limitations, the
experiment measures the impact

01:13:03.820 --> 01:13:07.070
of the offer of the treatment.

01:13:07.070 --> 01:13:13.770
So when we implement the
program, and we say, OK, you

01:13:13.770 --> 01:13:15.600
are in the treatment group,
you're going to get the

01:13:15.600 --> 01:13:18.510
program, as you know from
implementing these programs in

01:13:18.510 --> 01:13:21.580
the field, not all of the people
you offer the program

01:13:21.580 --> 01:13:23.670
are going to take
up the program.

01:13:23.670 --> 01:13:27.680
So what the experiment buys you
is, the whole treatment

01:13:27.680 --> 01:13:29.860
group is comparable to the
whole control group.

01:13:29.860 --> 01:13:33.030
So the experiment is going to
tell you, this is the impact

01:13:33.030 --> 01:13:36.000
for every, on average, for the
whole treatment group.

01:13:36.000 --> 01:13:39.940
So some of them may not have
received the program, and some

01:13:39.940 --> 01:13:41.730
of them may be diluting
the impact of the

01:13:41.730 --> 01:13:43.310
program when you estimate.

01:13:43.310 --> 01:13:48.270
But technically, that's the
impact that the experiment is

01:13:48.270 --> 01:13:49.170
estimating.

01:13:49.170 --> 01:13:53.470
So if you have a program with a
very low take-up rate, then

01:13:53.470 --> 01:13:56.710
you need to worry about the
issue that the non-takers are

01:13:56.710 --> 01:13:58.950
going to dilute the effect
of the program.

01:13:58.950 --> 01:14:01.670
You can then go and calculate,
what is the effect of the

01:14:01.670 --> 01:14:04.070
program for those who
participated?

01:14:04.070 --> 01:14:07.810
But then you start relying on
non-experimental assumptions.

01:14:07.810 --> 01:14:11.220
You've lost a bit of the advantage
of the experiment.
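
NOTE A worked arithmetic sketch of the dilution described above, under two assumptions that go beyond the experiment itself:
the offer has no effect on people who decline it, and nobody in the control group gets the program.
The function names and the effect size of 10 are hypothetical.
```python
def intention_to_treat(effect_on_takers: float, take_up_rate: float) -> float:
    """Average effect of the offer when only some people take it up.
    Assumes the offer does nothing for people who decline it."""
    return effect_on_takers * take_up_rate
def effect_on_takers_from_itt(itt: float, take_up_rate: float) -> float:
    """Back out the effect on takers from the intention-to-treat estimate.
    Relies on the same assumption, plus nobody in control getting the program."""
    if not 0 < take_up_rate <= 1:
        raise ValueError("take_up_rate must be in (0, 1]")
    return itt / take_up_rate
# Hypothetical numbers: the program raises an outcome by 10 for those who take it.
for take_up in (1.0, 0.6, 0.3):
    itt = intention_to_treat(10.0, take_up)
    print(f"take-up {take_up:.0%}: offer effect {itt:.1f}")
```
The first function is what the experiment estimates directly; the second is only as credible as the added assumptions.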

01:14:11.220 --> 01:14:14.750
So that's something that you
need to think about when you

01:14:14.750 --> 01:14:16.000
do an experiment.

01:14:18.840 --> 01:14:20.440
There's a limitation
in terms of these

01:14:20.440 --> 01:14:22.726
experiments can be costly.

01:14:22.726 --> 01:14:25.250
I'll sort of just say two things
about being costly.

01:14:29.210 --> 01:14:31.460
I'll say three things
about being costly.

01:14:31.460 --> 01:14:33.930
And I did learn that I should
never say "I'll say three

01:14:33.930 --> 01:14:35.960
things," and I'll forget what
those three things are.

01:14:35.960 --> 01:14:37.360
But I think I'll keep
them in mind.

01:14:37.360 --> 01:14:39.310
The first thing--

01:14:39.310 --> 01:14:42.340
a lot of the cost of an
experiment is data collection.

01:14:42.340 --> 01:14:45.150
So if you are trying to evaluate
the impact of a

01:14:45.150 --> 01:14:48.540
program through some other
non-experimental method that

01:14:48.540 --> 01:14:54.280
involves data collection, you've
already made the two

01:14:54.280 --> 01:14:55.540
costs pretty comparable.

01:14:55.540 --> 01:14:58.340
Because again, data collection
is a big cost.

01:14:58.340 --> 01:15:00.760
If you had a non-experimental
method where you don't have to

01:15:00.760 --> 01:15:04.150
collect data, obviously there's
no question that that

01:15:04.150 --> 01:15:05.870
is going to be cheaper.

01:15:05.870 --> 01:15:07.330
So it can be costly.

01:15:07.330 --> 01:15:09.790
But again, the main cost is data
collection, which may be the

01:15:09.790 --> 01:15:13.490
same for non-experimental
studies that collect data.

01:15:13.490 --> 01:15:17.540
But the other thing about the
experiment in terms of cost is

01:15:17.540 --> 01:15:22.000
that the same sample size buys
you more statistical power.

01:15:22.000 --> 01:15:24.570
And you may see some of
this on Thursday.

01:15:24.570 --> 01:15:27.750
So if you have a sample size
of 1,000 people for an

01:15:27.750 --> 01:15:31.590
experimental study and a sample
size of 1,000 people

01:15:31.590 --> 01:15:35.310
for a non-experimental study,
those data collection costs

01:15:35.310 --> 01:15:38.010
will be identical, but they will
be buying you different

01:15:38.010 --> 01:15:39.310
statistical power.

01:15:39.310 --> 01:15:42.690
So that's one thing to keep
in mind about the cost of

01:15:42.690 --> 01:15:43.880
experiments.

01:15:43.880 --> 01:15:47.580
And the last thing is, you need
to factor in, what is the

01:15:47.580 --> 01:15:49.140
cost of getting the
wrong answers?

01:15:49.140 --> 01:15:51.510
If you really think that
non-experimental methods are

01:15:51.510 --> 01:15:54.710
not going to work in your
particular context, then it's

01:15:54.710 --> 01:15:57.540
not so useful to invest less
money if you don't think

01:15:57.540 --> 01:15:58.870
you're going to get
the same answer.

01:15:58.870 --> 01:16:01.430
And again, I don't want to push
the notion that only with

01:16:01.430 --> 01:16:02.930
an experiment you'll get
the right answer.

01:16:02.930 --> 01:16:05.880
But if you think with a
non-experiment, you won't get

01:16:05.880 --> 01:16:08.700
the right answer, then the cost
of the wrong answer, the

01:16:08.700 --> 01:16:10.440
risk of a wrong answer.

01:16:10.440 --> 01:16:13.550
Ethical issues.

01:16:13.550 --> 01:16:15.720
Throw them at me.

01:16:15.720 --> 01:16:18.380
AUDIENCE: How do you say no to
people who come to you, saying

01:16:18.380 --> 01:16:20.450
I want to put myself
in this program.

01:16:20.450 --> 01:16:23.090
I have all the characteristics
you're asking for.

01:16:23.090 --> 01:16:25.710
You're offering it
to my neighbor.

01:16:25.710 --> 01:16:27.080
How come you're not
offering it to me?

01:16:27.080 --> 01:16:28.600
PROFESSOR: OK.

01:16:28.600 --> 01:16:34.120
The first thing to think about
here is experiments are

01:16:34.120 --> 01:16:39.180
typically done in contexts where
there's excess demand.

01:16:39.180 --> 01:16:42.340
Where there are more people who
want to be in your program

01:16:42.340 --> 01:16:45.670
than can be served
by your program.

01:16:45.670 --> 01:16:48.740
And if that's the case, suppose
you had 1,000 people

01:16:48.740 --> 01:16:54.440
who applied to your program,
and you can only serve 400.

01:16:54.440 --> 01:16:56.800
The question I ask
you, Cornelia--

01:16:56.800 --> 01:16:58.490
and only you--

01:16:58.490 --> 01:17:02.620
is how many people are you going
to have to say, sorry, I

01:17:02.620 --> 01:17:04.840
can't serve you?

01:17:04.840 --> 01:17:06.030
600.

01:17:06.030 --> 01:17:09.030
Both in the context of an
experiment and in the context

01:17:09.030 --> 01:17:10.730
of a non-experimental study.

01:17:10.730 --> 01:17:14.990
The only thing that changes is
how you decide who those 600

01:17:14.990 --> 01:17:16.020
people are.

01:17:16.020 --> 01:17:17.470
It's the only thing
that changes.

01:17:17.470 --> 01:17:22.200
And in fact, in some contexts,
the flip of the coin can seem

01:17:22.200 --> 01:17:27.190
more fair than you deciding,
I think this person is more

01:17:27.190 --> 01:17:29.990
deserving, or this person--

01:17:29.990 --> 01:17:33.190
So in that context, in the
context where you're going to

01:17:33.190 --> 01:17:37.070
have to turn away people, then
the ethical issues, in my

01:17:37.070 --> 01:17:40.200
mind, are much harder
to justify.

01:17:40.200 --> 01:17:43.120
I'm not saying there are no
ethical issues in experiments.

01:17:43.120 --> 01:17:44.380
There are some contexts in which

01:17:44.380 --> 01:17:45.390
there are ethical issues.

01:17:45.390 --> 01:17:48.760
So if you are completely
convinced that your program

01:17:48.760 --> 01:17:54.330
works, then why are you going
to do this whole randomized

01:17:54.330 --> 01:17:55.070
experiment?

01:17:55.070 --> 01:17:57.450
The only thing I can tell you
is that a lot of people have

01:17:57.450 --> 01:18:00.060
been very convinced that some
programs work, and then they

01:18:00.060 --> 01:18:01.600
turn out not to work.

01:18:01.600 --> 01:18:03.520
But if you are completely
convinced that the program

01:18:03.520 --> 01:18:06.860
works, then you shouldn't
be doing it.

01:18:06.860 --> 01:18:11.560
And then the other thing is,
if you are testing an

01:18:11.560 --> 01:18:16.210
intervention that you think can
harm people, then there

01:18:16.210 --> 01:18:18.000
are ethical issues involved.

01:18:18.000 --> 01:18:22.610
So I don't think anyone will
be very fond of doing an

01:18:22.610 --> 01:18:27.720
experiment to try to find out
whether smoking causes lung

01:18:27.720 --> 01:18:30.590
cancer, for example.

01:18:30.590 --> 01:18:33.270
Because we don't have
experimental evidence, but the

01:18:33.270 --> 01:18:34.920
medical evidence seems
to be pretty

01:18:34.920 --> 01:18:36.692
strongly in favor of that.

01:18:36.692 --> 01:18:37.942
Maria Teresa?

01:18:39.996 --> 01:18:42.230
AUDIENCE: A consequence of that
ethical question, was

01:18:42.230 --> 01:18:45.134
hard for me, was people who are
indeed chosen to be in the

01:18:45.134 --> 01:18:47.325
program and people
who are not.

01:18:47.325 --> 01:18:48.708
You have to come back to these
people who are not and follow

01:18:48.708 --> 01:18:50.355
up with them.

01:18:50.355 --> 01:18:53.455
And how willing to cooperate
were they to collect more

01:18:53.455 --> 01:18:55.442
data, to talk with them.

01:18:55.442 --> 01:18:56.649
And you know, working
[UNINTELLIGIBLE] is really

01:18:56.649 --> 01:19:01.410
hard, because you take time from
the farmer for two hours

01:19:01.410 --> 01:19:04.050
every couple months, and come
back, and standing there.

01:19:04.050 --> 01:19:06.760
I mean, while the other guy
received something for these

01:19:06.760 --> 01:19:08.072
two hours that are
given to you.

01:19:08.072 --> 01:19:09.245
So I think that that is the--

01:19:09.245 --> 01:19:11.610
Maybe you need to apply
this more often.

01:19:11.610 --> 01:19:12.280
PROFESSOR: Yeah.

01:19:12.280 --> 01:19:16.250
So I mean, again, I think there
are things you try to do

01:19:16.250 --> 01:19:19.110
to deal with them.

01:19:19.110 --> 01:19:22.790
That has to do more with the
implementation of any study in

01:19:22.790 --> 01:19:23.920
which you have a comparison
group.

01:19:23.920 --> 01:19:25.010
It's not the experiment.

01:19:25.010 --> 01:19:26.200
Experiment has a
control group.

01:19:26.200 --> 01:19:28.480
Any other study that has
a comparison group where

01:19:28.480 --> 01:19:31.300
you're collecting data
faces this issue.

01:19:31.300 --> 01:19:32.650
And then there are things
you can do.

01:19:35.180 --> 01:19:36.470
It depends on the program.

01:19:36.470 --> 01:19:40.100
But certainly sometimes offering
some small incentive

01:19:40.100 --> 01:19:44.070
for people in both groups
to fill in the survey is

01:19:44.070 --> 01:19:46.180
certainly one thing
that could help.

01:19:46.180 --> 01:19:49.490
The other thing that I think
is very important is data

01:19:49.490 --> 01:19:50.710
collection.

01:19:50.710 --> 01:19:58.140
The average researcher, when
they are asked the question,

01:19:58.140 --> 01:20:00.480
do you want to add one more
question to the survey?

01:20:00.480 --> 01:20:03.940
The probability of saying yes
is 99% for the average

01:20:03.940 --> 01:20:04.440
researcher.

01:20:04.440 --> 01:20:07.950
So if you have two hours in the
field, you have to start

01:20:07.950 --> 01:20:11.580
thinking, well, how many of these
questions do I really need

01:20:11.580 --> 01:20:12.780
to be asking?

01:20:12.780 --> 01:20:15.780
I mean, that's an issue of
implementation versus--

01:20:15.780 --> 01:20:18.350
So I think there are ways
to deal with this.

01:20:18.350 --> 01:20:20.150
But again, it's not unique
to experiments.

01:20:20.150 --> 01:20:23.760
It really has to do with how
you implement any study in

01:20:23.760 --> 01:20:26.140
which you're going to collect
data on people who are not

01:20:26.140 --> 01:20:29.520
receiving any benefit.

01:20:29.520 --> 01:20:29.900
Yes?

01:20:29.900 --> 01:20:30.750
Ethical issues?

01:20:30.750 --> 01:20:31.730
AUDIENCE: Nigel.

01:20:31.730 --> 01:20:34.180
I think an answer which--

01:20:34.180 --> 01:20:34.640
PROFESSOR: Nigel.

01:20:34.640 --> 01:20:35.910
You are from the
Kennedy School.

01:20:35.910 --> 01:20:36.810
Very nice to meet you.

01:20:36.810 --> 01:20:38.060
AUDIENCE: I'm leaving
next week.

01:20:40.130 --> 01:20:43.090
The issue of, even if you had
as much money as you needed to

01:20:43.090 --> 01:20:45.190
give to all those 1,000
people, you

01:20:45.190 --> 01:20:46.620
can't do them all today.

01:20:46.620 --> 01:20:49.870
So the way to do it is say, OK,
we'll do 500 this year and

01:20:49.870 --> 01:20:50.830
500 next year.

01:20:50.830 --> 01:20:56.410
So you're getting all 1,000
people, but you do your

01:20:56.410 --> 01:20:58.670
randomized evaluation
year one.

01:20:58.670 --> 01:20:59.700
PROFESSOR: Exactly.

01:20:59.700 --> 01:21:02.660
And tomorrow there are going to
be two sessions on how to

01:21:02.660 --> 01:21:06.560
do roll out design-- there's
a bunch of designs that are

01:21:06.560 --> 01:21:09.504
applying the same principle.
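
NOTE
Editor's sketch, not part of the lecture: the phase-in idea discussed above
(500 people this year, 500 next year) can be illustrated with a random split.
The function name assign_phase_in, the integer IDs, and the seed are all
hypothetical choices made for illustration; real roll-out designs are covered
in the later sessions the professor mentions.
```python
import random
def assign_phase_in(unit_ids, seed=0):
    """Randomly split units into a year-one group and a year-two group."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Year one gets the program now; year two is the comparison group in year one.
    return {"year_one": sorted(shuffled[:half]), "year_two": sorted(shuffled[half:])}
groups = assign_phase_in(range(1, 1001), seed=42)
print(len(groups["year_one"]), len(groups["year_two"]))
```
Everyone is eventually served, but during year one the year-two group acts as
a randomly chosen control group.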

01:21:09.504 --> 01:21:14.150
AUDIENCE: When you think about
the cost of the study, don't

01:21:14.150 --> 01:21:17.970
you think a question you should
deal with way early on

01:21:17.970 --> 01:21:22.080
is the size of the impact
that you're looking for?

01:21:22.080 --> 01:21:24.418
PROFESSOR: Absolutely.

01:21:24.418 --> 01:21:27.840
AUDIENCE: If the study is going
to cost me a lot of

01:21:27.840 --> 01:21:35.280
money, and there's a significant
probability that

01:21:35.280 --> 01:21:37.890
it might have only a small
effect, then that maybe isn't

01:21:37.890 --> 01:21:40.144
worth bothering with.

01:21:40.144 --> 01:21:44.440
And so you talked about looking
up the size of the

01:21:44.440 --> 01:21:48.080
effect and the statistics, and
whether it's statistically

01:21:48.080 --> 01:21:48.886
significant.

01:21:48.886 --> 01:21:53.270
But that size question,
it seems to me, gets

01:21:53.270 --> 01:21:55.730
looked at very late.

01:21:55.730 --> 01:22:01.253
And it should be way up front in
the very early days because

01:22:01.253 --> 01:22:05.015
of the impact, whether the
program is really of interest,

01:22:05.015 --> 01:22:07.000
and worth following.

01:22:07.000 --> 01:22:08.880
PROFESSOR: So two
quick reactions.

01:22:08.880 --> 01:22:11.190
The first one is what
Rachel said.

01:22:11.190 --> 01:22:13.270
Think strategically about
impact evaluations.

01:22:13.270 --> 01:22:16.330
You don't want to evaluate every
single thing that's in

01:22:16.330 --> 01:22:19.640
your organization or every
single thing under the sun.

01:22:19.640 --> 01:22:22.270
You're not going to be able to
do an impact evaluation on all

01:22:22.270 --> 01:22:23.180
of those things.

01:22:23.180 --> 01:22:25.800
You may do other kinds of
evaluations on hopefully most

01:22:25.800 --> 01:22:28.480
of your programs, but an impact
evaluation, you should

01:22:28.480 --> 01:22:30.810
be very strategic on
where you do it.

01:22:30.810 --> 01:22:33.350
And if you think this is a
program that is not generating

01:22:33.350 --> 01:22:36.010
much impact and it's not costing
you that much money,

01:22:36.010 --> 01:22:39.410
then you may say, I'm not
going to evaluate it.

01:22:39.410 --> 01:22:45.710
The second thing I would say
with regard to that is

01:22:45.710 --> 01:22:48.660
thinking about the effect of the
program is something you

01:22:48.660 --> 01:22:51.460
need to do at stage one, the
designing of the study.

01:22:51.460 --> 01:22:54.770
And this will connect with your
session on sample size

01:22:54.770 --> 01:22:57.200
that Esther will speak
about on Thursday.

01:22:57.200 --> 01:23:00.910
Because thinking about how
large that impact is, that

01:23:00.910 --> 01:23:03.770
affects your calculations
of sample size.

01:23:03.770 --> 01:23:07.690
The paradox in all of this,
despite what you said, the

01:23:07.690 --> 01:23:10.400
paradox in all of this is
that the bigger the

01:23:10.400 --> 01:23:13.110
effect of the program--

01:23:13.110 --> 01:23:14.990
so if you expect this
program is going

01:23:14.990 --> 01:23:17.580
to have a huge effect--

01:23:17.580 --> 01:23:20.240
the smaller the sample size
you need, and hence the

01:23:20.240 --> 01:23:22.100
smaller the data collection
costs.

01:23:22.100 --> 01:23:25.450
So paradoxically, if the
program is extremely

01:23:25.450 --> 01:23:29.250
important, the data collection
cost should actually be lower

01:23:29.250 --> 01:23:31.740
than a program where you want
to detect effects that are

01:23:31.740 --> 01:23:32.530
very small.
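
NOTE
Editor's sketch, not part of the lecture: the relationship described above
(larger expected effect, smaller required sample) can be seen in a standard
normal-approximation formula for comparing two group means. It assumes equal
group sizes, a two-sided test, and an effect expressed in standard-deviation
units; the function name and the default alpha and power values are
illustrative assumptions, not figures from the course.
```python
import math
from statistics import NormalDist
def sample_size_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate units needed in each group to detect a standardized effect."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)
for d in (0.1, 0.2, 0.5):
    print(d, sample_size_per_arm(d))
```
Under these assumptions, halving the expected effect roughly quadruples the
sample needed per group, which is why small effects are costly to detect.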

01:23:32.530 --> 01:23:35.970
Having said that, you want to
evaluate the programs that

01:23:35.970 --> 01:23:38.630
make strategic sense for
you to evaluate.

01:23:38.630 --> 01:23:41.390
I mean, one thing I think you
should try to avoid, despite

01:23:41.390 --> 01:23:44.390
all our enthusiasm with
randomized experiment, you

01:23:44.390 --> 01:23:46.670
shouldn't leave this course
thinking, OK.

01:23:46.670 --> 01:23:49.420
Where do I see an opportunity
to randomize?

01:23:49.420 --> 01:23:53.840
And then forget about what is
it that you're trying to do.

01:23:53.840 --> 01:23:56.880
You know, you may find a great
opportunity to randomize, but

01:23:56.880 --> 01:23:58.890
if it doesn't answer a question
you care about,

01:23:58.890 --> 01:24:02.470
you've just wasted money.

01:24:02.470 --> 01:24:04.970
All right, so--

01:24:04.970 --> 01:24:06.160
you have a question?

01:24:06.160 --> 01:24:08.950
This is very interesting.

01:24:08.950 --> 01:24:14.770
AUDIENCE: I want to know, do you
think that in any context,

01:24:14.770 --> 01:24:16.470
one is able to carry out
an impact evaluation?

01:24:20.390 --> 01:24:22.350
For any type of program--

01:24:22.350 --> 01:24:26.860
PROFESSOR: So my answer
to that is

01:24:26.860 --> 01:24:29.340
no, not in any context.

01:24:29.340 --> 01:24:33.600
But probably in more contexts
than you think about.

01:24:33.600 --> 01:24:34.540
That is my short answer.

01:24:34.540 --> 01:24:39.420
AUDIENCE: What about, for
example, infrastructure--?

01:24:39.420 --> 01:24:40.220
PROFESSOR: There have been.

01:24:40.220 --> 01:24:41.600
It's harder to do.

01:24:41.600 --> 01:24:42.920
There have been some studies.

01:24:42.920 --> 01:24:44.970
This is actually, I think,
a growing area.

01:24:44.970 --> 01:24:48.760
This is an area where people are
trying to do some impact

01:24:48.760 --> 01:24:49.670
evaluation.

01:24:49.670 --> 01:24:51.780
I mean, if you're building a
road in the middle of the

01:24:51.780 --> 01:24:55.180
country, and this is one road
for the whole country--

01:24:55.180 --> 01:24:56.270
you can't do it.

01:24:56.270 --> 01:24:57.070
But it's OK.

01:24:57.070 --> 01:25:00.820
You don't need to do an impact
evaluation for everything you

01:25:00.820 --> 01:25:04.130
do, and you don't need to do a
randomized impact evaluation

01:25:04.130 --> 01:25:05.510
for everything you do.

01:25:05.510 --> 01:25:09.450
The message I do hope comes
across clearly is, if you decide

01:25:09.450 --> 01:25:12.100
to do an impact evaluation,
then thinking about a

01:25:12.100 --> 01:25:15.290
randomized design should
be your first choice.

01:25:15.290 --> 01:25:19.460
If you can't do it-- and can't
do it is not just, oh, there's

01:25:19.460 --> 01:25:20.190
some issues--

01:25:20.190 --> 01:25:20.740
no, no.

01:25:20.740 --> 01:25:23.510
Can't do it, really trying,
given all these advantages,

01:25:23.510 --> 01:25:27.490
really trying-- if you can't do
it, then you may consider

01:25:27.490 --> 01:25:28.810
doing other things.

01:25:28.810 --> 01:25:31.680
But this should be your first
option if you decide to do an

01:25:31.680 --> 01:25:32.930
impact evaluation.

01:25:34.990 --> 01:25:35.310
All right.

01:25:35.310 --> 01:25:37.200
Partial equilibrium.

01:25:37.200 --> 01:25:38.470
It's a little bit
more technical.

01:25:38.470 --> 01:25:42.050
But if you have a program that
only affects some people

01:25:42.050 --> 01:25:43.750
differentially.

01:25:43.750 --> 01:25:47.440
So suppose you had a program
that was going to train people

01:25:47.440 --> 01:25:50.740
on how to have better resumes.

01:25:50.740 --> 01:25:53.960
And if you only do it for a few
people, then this program

01:25:53.960 --> 01:25:55.040
may have a huge effect.

01:25:55.040 --> 01:25:58.270
But if you do it for everyone in
your town, there's going to

01:25:58.270 --> 01:26:01.000
be little advantage that's
gained from this.

01:26:01.000 --> 01:26:05.200
And so the randomized experiment
estimates a partial

01:26:05.200 --> 01:26:06.320
equilibrium effect.

01:26:06.320 --> 01:26:09.140
You don't know what would
happen if everyone in a

01:26:09.140 --> 01:26:11.270
particular setting got
the treatment.

01:26:11.270 --> 01:26:15.290
I think this is important in
some settings, but not enough.

01:26:15.290 --> 01:26:15.610
All right.

01:26:15.610 --> 01:26:18.960
So I'm not going to go too much
into get out the vote,

01:26:18.960 --> 01:26:22.260
because we're already a
minute away from time.

01:26:22.260 --> 01:26:29.180
What I want to do is just show
you this table here.

01:26:29.180 --> 01:26:30.450
You already discussed it.

01:26:40.110 --> 01:26:43.670
So this is what the
case study shows.

01:26:43.670 --> 01:26:46.970
This is a situation
where you had five

01:26:46.970 --> 01:26:49.950
methods to estimate impacts.

01:26:49.950 --> 01:26:52.540
The first four methods
found out that the

01:26:52.540 --> 01:26:54.380
program had an effect.

01:26:54.380 --> 01:26:57.130
The last method, the randomized
experiment, found

01:26:57.130 --> 01:26:59.590
no statistically significant
effect.

01:26:59.590 --> 01:27:02.470
I'm not saying that in every
single-- this goes back to

01:27:02.470 --> 01:27:03.070
your question.

01:27:03.070 --> 01:27:04.170
I'm not saying that
in every single

01:27:04.170 --> 01:27:06.320
setting, this will happen.

01:27:06.320 --> 01:27:09.630
But this is a good example of a
setting in which if you had

01:27:09.630 --> 01:27:11.680
gone with any of these
techniques, you would have

01:27:11.680 --> 01:27:14.250
concluded the program had an
effect when it didn't.

01:27:14.250 --> 01:27:17.050
And there are other settings
where the reverse may happen.

01:27:17.050 --> 01:27:21.830
And so if we were able to
say ex ante, before the

01:27:21.830 --> 01:27:24.900
evaluation, this method is going
to be just as good as

01:27:24.900 --> 01:27:27.070
the experiment, that's great.

01:27:27.070 --> 01:27:29.790
We may be able to save some
money if there's no data

01:27:29.790 --> 01:27:31.820
collection involved, and
that would be great.

01:27:31.820 --> 01:27:34.440
But I think the bottom
line here is, we

01:27:34.440 --> 01:27:35.980
are not always able--

01:27:35.980 --> 01:27:39.150
and I think very few people will
tell you, we know when

01:27:39.150 --> 01:27:40.900
this method will work.

01:27:40.900 --> 01:27:46.090
Because the assumption behind
each of these methods on how

01:27:46.090 --> 01:27:47.660
they work is untestable--

01:27:47.660 --> 01:27:50.890
you can't statistically
test that assumption.

01:27:50.890 --> 01:27:53.680
So you may argue
in favor of it.

01:27:53.680 --> 01:27:56.640
You may show evidence
in favor of it.

01:27:56.640 --> 01:27:58.520
But you can't specifically
test it.

01:27:58.520 --> 01:28:04.040
And that's the big advantage
of the experiment.

01:28:04.040 --> 01:28:09.250
So let me just close with
what I hope are the

01:28:09.250 --> 01:28:10.660
bottom lines from this.

01:28:10.660 --> 01:28:12.940
The first thing, what's
underlined there.

01:28:12.940 --> 01:28:15.290
If properly designed and
conducted, the social

01:28:15.290 --> 01:28:17.290
experiments provide
the most credible

01:28:17.290 --> 01:28:19.770
assessment of the program.

01:28:19.770 --> 01:28:22.440
But the "if" is a very important
"if." Don't leave

01:28:22.440 --> 01:28:25.400
this course thinking,
if it's a randomized

01:28:25.400 --> 01:28:26.870
experiment, piece of cake.

01:28:26.870 --> 01:28:27.850
Everything will work.

01:28:27.850 --> 01:28:29.930
That's not the message that
we want to give you here.

01:28:29.930 --> 01:28:33.180
It needs to be properly designed
and conducted.

01:28:33.180 --> 01:28:35.602
And for that, you really need
a partnership between the

01:28:35.602 --> 01:28:38.990
evaluators and the agencies
implementing it.

01:28:38.990 --> 01:28:41.480
They're easy to understand,
much less subject to the

01:28:41.480 --> 01:28:45.700
methodological quibbles, and
more likely to convince

01:28:45.700 --> 01:28:47.060
policymakers.

01:28:47.060 --> 01:28:50.890
These advantages are only
present if they are properly

01:28:50.890 --> 01:28:54.310
conducted and implemented, and
you must assess the validity

01:28:54.310 --> 01:28:56.540
of an experiment in the same
way you assess the

01:28:56.540 --> 01:28:57.760
validity of any study.

01:28:57.760 --> 01:29:00.150
Because you're going to have
threats to an experiment

01:29:00.150 --> 01:29:02.750
anyway, and on Friday, you're
going to learn how to deal

01:29:02.750 --> 01:29:04.430
with some of them.

01:29:04.430 --> 01:29:06.790
I hope this was moderately
helpful.

01:29:06.790 --> 01:29:09.990
I think I have one of the
toughest sessions to teach,

01:29:09.990 --> 01:29:13.740
because you guys, some of you
come completely convinced of

01:29:13.740 --> 01:29:16.300
why you want to randomize,
some of you come very

01:29:16.300 --> 01:29:18.410
skeptical, and I have to
reach a middle ground.

01:29:18.410 --> 01:29:19.650
I hope I did.

01:29:19.650 --> 01:29:22.380
If you have one more question,
I'll take it.

01:29:22.380 --> 01:29:23.329
Yes?

01:29:23.329 --> 01:29:26.263
AUDIENCE: Have you found that
it's possible to teach

01:29:26.263 --> 01:29:29.360
organizations to run their own
randomized trials from start

01:29:29.360 --> 01:29:32.510
to finish, even if there are
no economists on staff?

01:29:32.510 --> 01:29:36.220
Or does this always sort of
require the intervention or

01:29:36.220 --> 01:29:39.740
assistance of outside
evaluators?

01:29:39.740 --> 01:29:42.710
PROFESSOR: I think, as you
will see throughout this

01:29:42.710 --> 01:29:47.470
course, conducting an impact
evaluation, even a randomized

01:29:47.470 --> 01:29:50.530
one, does involve some technical
skills and does

01:29:50.530 --> 01:29:53.420
involve some practical
experience in doing it.

01:29:53.420 --> 01:29:58.180
I'm not saying those cannot be
found in organizations that

01:29:58.180 --> 01:29:59.510
are in the field.

01:29:59.510 --> 01:30:02.130
But if those skills are not
there, it's going to be very

01:30:02.130 --> 01:30:04.000
hard to do it.

01:30:04.000 --> 01:30:07.710
Now, you can do a lot
of training on

01:30:07.710 --> 01:30:09.180
how to do these things.

01:30:09.180 --> 01:30:13.620
But I think it'd be hard to do
it without someone who has at

01:30:13.620 --> 01:30:16.330
least done a few of these
and seen some of the

01:30:16.330 --> 01:30:17.590
problems that arise.

01:30:17.590 --> 01:30:19.200
Because problems will arise--

01:30:19.200 --> 01:30:21.830
I mean, no question about it.

01:30:21.830 --> 01:30:27.030
You will be asking the
evaluator, how far can we go?

01:30:27.030 --> 01:30:30.100
And the evaluator, whoever it
is, whether they're in the

01:30:30.100 --> 01:30:34.490
agency or not, needs to be able
to answer that question

01:30:34.490 --> 01:30:38.200
in a way that at the end, you
have a credible evaluation.

01:30:38.200 --> 01:30:42.970
I'm not saying you need an
expert outside of the

01:30:42.970 --> 01:30:44.160
organization.

01:30:44.160 --> 01:30:46.630
But I am saying you need
an expert somewhere.

01:30:46.630 --> 01:30:49.880
And whether you have it inside
or outside, there's a whole

01:30:49.880 --> 01:30:52.830
issue of independence versus
objectivity that

01:30:52.830 --> 01:30:54.931
I won't speak to.

01:30:54.931 --> 01:30:58.700
AUDIENCE: Consumer
companies do it.

01:30:58.700 --> 01:31:00.072
PROFESSOR: Consumer companies?

01:31:00.072 --> 01:31:00.554
AUDIENCE: Yeah.

01:31:00.554 --> 01:31:03.205
Procter & Gamble and big
companies like that do

01:31:03.205 --> 01:31:05.615
experiments all the time, build
their capability into

01:31:05.615 --> 01:31:10.080
the organization, how
they make decisions.

01:31:10.080 --> 01:31:12.510
I'm just wondering if
someone leaving this course

01:31:12.510 --> 01:31:14.880
with a few experiments under
their belt could implement

01:31:14.880 --> 01:31:18.000
something like this, or whether
you need to go as far

01:31:18.000 --> 01:31:22.074
as getting an economics degree
in order to be able to do the

01:31:22.074 --> 01:31:25.000
coordinating an evaluation
of this type.

01:31:25.000 --> 01:31:29.450
PROFESSOR: So I think to do an
impact evaluation, there is

01:31:29.450 --> 01:31:32.290
usually more than one
person involved.

01:31:32.290 --> 01:31:35.050
And there are different roles
for different people.

01:31:35.050 --> 01:31:38.100
There are some roles where
having good training in

01:31:38.100 --> 01:31:40.540
economics is particularly
useful.

01:31:40.540 --> 01:31:44.480
There are other roles where I
would say it's particularly

01:31:44.480 --> 01:31:46.920
un-useful to be an economist.

01:31:46.920 --> 01:31:55.360
So I really think it depends on
what role a person leaving

01:31:55.360 --> 01:32:01.640
this course would like to sort
of play in the evaluation.

01:32:01.640 --> 01:32:07.550
And you know, whether leaving
this course, you'll be able to

01:32:07.550 --> 01:32:12.430
run your experiments
on your own--

01:32:12.430 --> 01:32:14.690
I think it would be an extremely
successful

01:32:14.690 --> 01:32:17.610
course if that happened.

01:32:17.610 --> 01:32:20.740
We have no way to measure the
impact of this program, but if

01:32:20.740 --> 01:32:23.670
that were to happen, relative to
what would have happened if

01:32:23.670 --> 01:32:25.860
you had not come to this
course, that would be

01:32:25.860 --> 01:32:27.840
phenomenal.

01:32:27.840 --> 01:32:31.460
I think my sense is unless you
have prior training in this

01:32:31.460 --> 01:32:35.280
kind of thing, what this course
will hopefully give you

01:32:35.280 --> 01:32:39.940
is the ability to be involved
in an evaluation and to be

01:32:39.940 --> 01:32:45.180
pretty good at interacting with
whoever is also involved

01:32:45.180 --> 01:32:47.000
in the evaluation, at asking
the right

01:32:47.000 --> 01:32:48.590
questions of the evaluator.

01:32:48.590 --> 01:32:50.800
This is extremely important.

01:32:50.800 --> 01:32:54.330
And being very aware in the
field of what may be

01:32:54.330 --> 01:32:56.440
threatening an evaluation.

01:32:56.440 --> 01:32:59.970
If you're able to do it on your
own after this, I hate to

01:32:59.970 --> 01:33:02.850
say it, but I don't think it's
because of this session that you

01:33:02.850 --> 01:33:06.370
heard from me today.

01:33:06.370 --> 01:33:06.770
All right.

01:33:06.770 --> 01:33:10.120
I think I already ate a few
minutes into your time.

01:33:10.120 --> 01:33:10.970
It was a pleasure.

01:33:10.970 --> 01:33:13.440
I'll be here for a few more
minutes if you want.

01:33:13.440 --> 01:33:17.630
I hope you have a wonderful rest
of the course, and see

01:33:17.630 --> 01:33:18.880
you somewhere.