So before we begin, I would like to ask a very simple question. Do you think randomized evaluations are the best way to conduct an impact evaluation? Please raise your hand if you think so. Just be honest. All right, the TAs, you guys don't count. All right. OK. So I have a job to do now; I had thought that maybe I wouldn't.

One thing I've discovered about teaching: we have about an hour and 25 minutes, and if I speak for an hour and 25 minutes, I know two things will happen. One, you will get very bored, and two, you will not learn anything. So please make sure you interrupt with any questions you have. If they are on topic, that would be very good. If they're off topic, I may postpone them, at least until I get there. The other thing I would like to say about the way this will work is that I have no power over you. With my students, I have a grade to give; with you, I have no power. But I will still ask you to do certain things during the presentation, so I hope you'll cooperate.

My session is called Why Randomize? For those of you who are already convinced, I hope you can use this session to help convince others that this is a very good method for doing an impact evaluation. And for those of you who are not convinced, I welcome you to raise any objections you have. I'm not here to tell you randomization is a panacea or a solution to all the problems of mankind. But I think that for impact evaluations, it's a very powerful method. So, the outline of the talk: I'll give you a little bit of background. We'll define what a randomized evaluation is; it's going to be important to make sure we have a common language. Then the advantages and disadvantages of experiments. Then we're going to do the Get Out the Vote case, and finally conclude, hopefully in an hour and 20 minutes.
So how do we measure impact? This is something that Rachel referred to. The idea is that we want to compare what happened to the beneficiaries of a program with what would have happened to them in the absence of the program. This is really key. What would have happened in the absence of a program is called the counterfactual, and it is the key to evaluating any method for estimating program impact, not just randomized evaluation. So when you are assessing how someone plans to do an impact evaluation, always ask yourself: what is the counterfactual here? How are they planning to construct it? What would these people look like in the absence of the program? In the Kenya textbook example Rachel was referring to this morning, we thought about the counterfactual in terms of how these children fared after the textbook program was implemented versus how they would have fared at the same moment in time had the program not been implemented. This is crucial, because even with before-and-after comparisons, or any of those other methodologies, you are implicitly assuming a counterfactual. The question is, what counterfactual are you assuming, and is that assumption realistic? In some cases it may be; in other cases it may not. The problem is that the counterfactual is not observable, so the key goal of any impact evaluation methodology is to mimic it. You can't observe how these children in Kenya would have fared if the textbook program had not been implemented. The truth is, the textbook program was implemented, the textbooks were sent, and so you can't observe what that alternative reality would have been. So the counterfactual is usually constructed by selecting a group of people (in the Kenya example, children) who were not exposed to, or not affected by, the program.
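To make the implicit counterfactual of a before-and-after comparison concrete, here is a minimal Python sketch with made-up numbers (not data from the Kenya study). It assumes test scores would have risen by 5 points anyway and that the program adds 3 points on top of that; `before_after_estimate` and `true_impact` are hypothetical helper names. In a real evaluation, `counterfactual_after` is exactly the quantity you cannot observe.

```python
# Hypothetical numbers for illustration only; not data from any real study.

def before_after_estimate(mean_before: float, mean_after: float) -> float:
    """Naive impact estimate: the change in participants' mean outcome.

    Implicitly assumes that, without the program, the outcome would have
    stayed at mean_before (a flat counterfactual).
    """
    return mean_after - mean_before


def true_impact(mean_after: float, counterfactual_after: float) -> float:
    """Impact = outcome with the program minus outcome without it."""
    return mean_after - counterfactual_after


# Assumed scenario: other inputs push scores up 5 points regardless,
# and the program adds 3 more points.
baseline = 50.0
secular_trend = 5.0
program_effect = 3.0

observed_after = baseline + secular_trend + program_effect  # what we see
counterfactual_after = baseline + secular_trend             # never observed in reality

naive = before_after_estimate(baseline, observed_after)
truth = true_impact(observed_after, counterfactual_after)
print(f"before-after estimate: {naive:.1f}")          # 8.0
print(f"true impact:           {truth:.1f}")          # 3.0
print(f"bias (= trend):        {naive - truth:.1f}")  # 5.0
```

The before-and-after estimate matches the true impact only when the secular trend is zero, that is, only when the "scores would have stayed flat" counterfactual is correct.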
In a randomized evaluation, the key is that you set it up from the beginning. This is a question I think Logan had in the first session with Rachel. You can't do a randomized evaluation three years after the program was implemented, because you need to create the treatment and control groups through the randomized experiment itself. You need to decide early on who is going to get the treatment, or be offered the treatment, and who is not. There are some opportunities, as Rachel mentioned (your Get Out the Vote case is a good example), where someone has already done this. So you may be lucky: you step into the room and say, oh, look, someone did it. But the thing is, someone must have taken care that the assignment to the treatment and control groups was done in a random manner. We'll see exactly what counts as random, but what I can tell you for now is this: if someone can't say "we randomized; we followed a deliberate process so that it was random," it's probably not random. Random here is not what people mean in everyday speech, as in "Oh! This is just a random event." Random has a very specific definition, which we're going to see in a second. So it's not enough to say, "Look, we didn't do anything systematic; people just enrolled, and that's what happened." If they didn't do something deliberate to make it random, then it probably wasn't random. You can try to check this, but it's not always possible. Non-randomized methods basically use some excluded group, the group of people you're going to use as the comparison group, to mimic the counterfactual. So non-randomized methods rely on the strength of the assumption you're making: a method is only as strong as the assumption that its comparison group is a good counterfactual.
There's no sense in which you can say this method is better than that one in some absolute manner. A method is better or not depending on whether the assumptions needed to mimic the counterfactual hold. If they hold, that's great: you have a good method. The key distinction between this-- yes?

AUDIENCE: Could you give us an example of when the assumptions were just obviously untrue?

PROFESSOR: Sure. Suppose you had this textbook program happening in Kenya (and this program did happen) where many other things were going on in the education system at the same time. Textbooks were being distributed, new teachers were being hired; a lot of activities were happening. If you just compared children's test scores before the program with their test scores after the program... well, first of all, if you did that, the counterfactual you would be assuming is that in the absence of the program, test scores would have remained flat. And that may be a reasonable counterfactual in some contexts.
Not many, to be honest. But not in others. In a context where other things are happening in the Kenyan education system, it's very hard to argue that test scores would not have changed; they likely would have increased, because lots of other things were happening. Now suppose you implemented the same program in a very remote, secluded village where nothing else was going on. If you have a pretty good sense that no other intervention was happening for either group at the same time, the assumption may be more plausible. Even then, in the textbook example, I think it's still questionable, because other educational inputs may be changing. But the key is that the context and the method together tell you how good the assumption is. The method by itself cannot tell you: it may be reasonable under certain conditions but not under others.

AUDIENCE: But aren't there any big, famous studies that weren't randomized that everybody thinks are pretty good?
PROFESSOR: Yes. I don't want to get too far into this, but there's a whole debate now in the economics literature as to whether randomized experiments are the only way to estimate causal effects. This is a big, big debate, and there are very respectable people on both sides. What I can tell you is that the debate has not been settled, but I think more and more people are recognizing, at least, that the randomized experiment should be the first best. I think even the opponents of the method say that. The other thing I would say is that there have been many studies comparing the results of an experiment with those of non-experimental methods. You have one in your Get Out the Vote case. That was not a study in which the non-experimental methods fared very well, but there are other studies in which they fared well. The key thing is that we haven't been able to figure out under what conditions non-randomized evaluations fare well. If we knew, that would be nice. But so far, I think the answer is: we don't know.
We know the theoretical answer, which is: if the assumptions hold, we're golden. The key problem is that these methods rely on assumptions, and you cannot test those assumptions. If you could test them, if you could test under what assumptions a comparison group mimics the counterfactual, we'd be all done. We'd be able to say, from the very beginning, we can use this method. But you cannot, no matter how sophisticated and how good the non-experimental method is. Yes? You seem skeptical.

AUDIENCE: No, no, no.

PROFESSOR: You're--? OK. This is very confusing; the slide is showing twice, on two boards. You should do a randomized evaluation to see if this helps. All right. Randomized evaluations are known by a bunch of other names: random assignment studies, randomized field trials, just in case. RCTs (randomized controlled trials) is what they were called very early in the literature, and still are in other disciplines.
And then the non-experimental methods, all of these that you have here, some of which are in your Get Out the Vote study. All right. Before we go into what a randomized experiment is, I want to introduce the notion of validity. Rachel raised it a little bit. We usually think in terms of two kinds of validity when assessing a study. The first is internal validity. This has to do with your ability to draw causal inferences, that is, your ability to attribute your impact estimates to the program. If you say, this difference is my impact estimate, the study has strong internal validity if you can reliably attribute that difference to the program, and not to something else, for whatever population is represented in your study. So if you did the textbook project in a rural village in Kenya and the study has strong internal validity, then it's going to be valid for the population represented by the sample you drew in that rural village in Kenya.
External validity, on the other hand, has to do with the ability to generalize to other populations, other settings, other time periods. The reason I mention this is that the two often trade off against each other when you are trying to commission or conduct a study. You may decide, I'm going to do this randomized trial in this very small place to test out my model, and then you may be concerned with, how do I know if it generalizes to other settings? On the other hand, you may decide to use other kinds of methods and be representative of the whole of Kenya, or the whole of India, or whatever country you're working in. The key thing is to distinguish the two. The first has to do with causal inference for your own sample, or for the population represented by your sample. The second has to do with generalizability. Rachel talked a little bit about how much you can generalize from experiments, and we can talk about that if you want. All right. So what is a randomized evaluation? The very basics: can someone tell me what the basics are?
Randomized experiments? How do you do it? How does it work? There's one thing you should know. When I first started teaching, I used to be very, very nervous when there was silence in the room. But now I'm very comfortable. So you tell me: how does a randomized trial work?

AUDIENCE: You allocate subjects to the treatment or the control group based on random assignment.

PROFESSOR: OK. Random assignment. Sort of like a flip of a coin, right? In the simple scenario, we take a sample of program applicants, just like we do with drug trials, and we randomly assign them either to a treatment group, which is offered the treatment, or to a control group, which is not. This is a very simple setting, but the idea is that by doing this, the treatment and control groups are comparable to each other. So any differences you observe between these two groups should be attributable to the program.
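The simple scenario above can be sketched in a few lines of Python: shuffle a list of hypothetical applicants with a recorded seed, split it in half, and estimate impact as the difference in mean outcomes. The applicant IDs, the simulated scores, and the assumed effect of 3 points are all invented for illustration; this is one way to randomize, not the only one.

```python
import random
from statistics import mean


def randomly_assign(applicants, seed):
    """Randomly split applicants into equal-size treatment and control groups.

    Shuffling and halving is one way to randomize; an independent coin flip per
    applicant also works but can produce unequal group sizes. Recording the seed
    keeps the assignment reproducible, i.e. a deliberate, documented process.
    """
    rng = random.Random(seed)
    pool = list(applicants)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]


def difference_in_means(outcomes, treatment, control):
    """Impact estimate: mean outcome in the treatment group minus the control group."""
    return mean(outcomes[a] for a in treatment) - mean(outcomes[a] for a in control)


# Hypothetical example: 1,000 applicants, assumed true effect of 3 points.
applicants = [f"applicant_{k}" for k in range(1000)]
treatment, control = randomly_assign(applicants, seed=2024)

noise = random.Random(7)
score_without_program = {a: noise.gauss(50, 10) for a in applicants}
treated = set(treatment)
outcomes = {a: score_without_program[a] + (3.0 if a in treated else 0.0) for a in applicants}

print(f"estimated impact: {difference_in_means(outcomes, treatment, control):.2f}")
```

With 500 people per group, the estimate should land near the assumed 3 points but not exactly on it; that sampling noise is what significance tests and sample-size calculations deal with.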
The key about this method-- so these groups do not differ systematically at the outset of the experiment. The key about this method is that the control group mimics the counterfactual. It mimics what would happen to the treatment group in the absence of the treatment. And the reason it mimics the counterfactual is that, on average, this group should be exactly like the treatment group. If we took all of you and flipped a coin for each of you, and you ended up in two different groups, the two groups would have, on average, the same characteristics. The same share of people from a certain area of the world. The same percentage of women. The same average intelligence. The same average income. The same average education. You name it. We're going to do an exercise where you can see this. The beauty of this method is that the two groups are going to be statistically identical to each other. If they're not statistically identical, then you don't have random assignment. It has failed.
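The coin-flip thought experiment can be simulated in Python. This is a sketch with an invented population: the three characteristics, their distributions, and the seeds are all assumptions. It flips a fair coin for each person and prints each group's mean on every characteristic, which should come out close to each other but not exactly equal.

```python
import random
from statistics import mean

# Hypothetical population with a few baseline characteristics.
rng = random.Random(42)
population = [
    {
        "female": rng.random() < 0.5,           # True/False
        "income": rng.gauss(1000, 300),         # arbitrary units
        "years_of_school": rng.randint(0, 16),  # education
    }
    for _ in range(10_000)
]

# Flip a fair coin for each person.
coin = random.Random(2024)
treatment, control = [], []
for person in population:
    (treatment if coin.random() < 0.5 else control).append(person)


def group_mean(group, key):
    """Average of one characteristic within a group (booleans count as 0/1)."""
    return mean(float(p[key]) for p in group)


for key in ("female", "income", "years_of_school"):
    print(f"{key:16s} treatment={group_mean(treatment, key):8.2f} "
          f"control={group_mean(control, key):8.2f}")
```

Nothing in the assignment looked at these characteristics, which is why the same balance would hold for characteristics that were never recorded.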
Random assignment, then, is the process you employ to create these two comparable groups. The huge advantage of random assignment is that you don't need to ask whether the two groups are the same on some characteristic you care about. You don't need to think about that. The two groups should be the same on those characteristics.

AUDIENCE: So that's in theory. Now think of a program where you have, say, selection criteria. Let's say you want to run a program in a particular district, and you're looking for people who meet three specific criteria. And let's say that, for whatever reason, the number of people who meet them is relatively small. Then you can randomly select within that small number, but then you're challenged by the size of your group.

PROFESSOR: Absolutely. On Thursday, you'll get to sample size and the minimum detectable effect.
But the key there is that if those three characteristics are your selection criteria, you don't want to modify them just because someone is going to come and do an experiment. You want to offer the program to whoever you were going to offer it to. If those three characteristics are key for your program, because you decided those are the people you want to serve, then you need to find a way to do your evaluation that doesn't involve relaxing those criteria. Unless you really are thinking, well, it would be interesting to know whether the program has a different effect if I served this other group.

AUDIENCE: But you can't mix and match among the criteria. You can't say-- or could you? Let's say you have trouble: you're not getting enough people who meet those three criteria. So you say, OK, now we're going to have six criteria, and we'll be happy if people meet only four of the six. Wouldn't that, right there, make it impossible to do this?
PROFESSOR: If, at the end of your process, whether you're saying three criteria, six criteria, five, four, whatever, you end up with a pool large enough to randomly assign into two groups, treatment and control, no problem. You could have relaxed the criteria; you could have said six, five, four, whatever you want. My previous answer was more: don't change the criteria just because you want to do a randomized trial. You want to evaluate the program you want to evaluate. You don't want to evaluate the program you think will fit the randomized design. Make sense? Other questions, comments? No? OK. So the two groups do not differ systematically at the outset of the experiment. I want to emphasize this. And again, there's going to be an exercise where you can see this in Excel. The key is that the two groups will be identical on both observable and unobservable characteristics. And when I say identical, I mean statistically identical.
It's not that the means of these two groups are exactly the same. They are statistically identical in the sense that you should not observe a pattern of statistically significant differences between the two groups. If you were to test 100 characteristics at the 5 percent significance level, about five of them may turn out statistically significant just from the luck of the draw, because of multiple testing. But the groups shouldn't differ systematically at the outset of the experiment. And this is the key. The whole key of impact evaluation is that you can then take the difference and attribute it to the program, rather than wondering, is it the program, or is it some pre-existing difference between the groups? If you reach the end of an impact evaluation still wondering whether it's the program or something else, unfortunately, that's not a very good impact evaluation.

There are some variations on the basics. You could assign to multiple treatment groups: rather than having only one treatment, you could have several. This happens a lot when you're trying to test different ways of implementing a program.
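The point about testing 100 characteristics can be checked with a small simulation. This Python sketch uses hypothetical data and only the standard library: it assigns 400 people by coin flip, draws 100 baseline characteristics that are unrelated to assignment by construction, and runs a large-sample two-sided z-test on each. `two_sample_z_pvalue` is a helper written for this example, not a library function. Roughly five tests should come out significant at the 5 percent level, though the exact count varies with the seed.

```python
import random
from statistics import NormalDist, mean, stdev


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)
n_people, n_characteristics = 400, 100

# Coin-flip assignment, fixed before any characteristic is examined.
in_treatment = [rng.random() < 0.5 for _ in range(n_people)]

p_values = []
for _ in range(n_characteristics):
    # A baseline characteristic that, by construction, has nothing to do with assignment.
    x = [rng.gauss(0, 1) for _ in range(n_people)]
    t = [v for v, d in zip(x, in_treatment) if d]
    c = [v for v, d in zip(x, in_treatment) if not d]
    p_values.append(two_sample_z_pvalue(t, c))

significant = sum(p < 0.05 for p in p_values)
print(f"{significant} of {n_characteristics} characteristics differ at the 5% level")
```

A handful of "significant" imbalances is what chance alone produces; a systematic pattern of them is the warning sign that the assignment may not have been random.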
You may have a program where you're thinking, well, I don't know whether the best way to deliver it is method number one or method number two. So you may randomize into three groups: method number one, method number two, and a control group. Or you may decide to do away with the control group and only randomize among, say, three methods, three ways of delivering an intervention. If you do away with the control group, you'll be able to answer the question, is one treatment better than another? But you won't be able to answer the question, is any of these treatments better than what would have happened in the absence of the program? So that's one variation. The other variation came up when Iqbal answered the question. He said, well, you have a bunch of people, and you assign some to the treatment group and some to the control group. You can assign units other than people or households: health centers, schools, local governments, villages. On J-PAL's website, you can see a bunch of examples where each of these has been used as the unit of random assignment.
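Randomizing into more than two groups works the same way. A minimal Python sketch, assuming hypothetical arm labels `method_1`, `method_2`, and `control` and 300 units: shuffle once, then deal units out round-robin so the arm sizes differ by at most one. The units could just as well be schools, health centers, or villages.

```python
import random
from collections import Counter


def assign_arms(units, arms, seed):
    """Randomly assign units to several arms in (near-)equal numbers.

    Shuffle once, then deal units out round-robin like cards, so arm
    sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: arms[k % len(arms)] for k, unit in enumerate(shuffled)}


# Hypothetical design: two delivery methods plus a pure control group.
assignment = assign_arms(range(300), ["method_1", "method_2", "control"], seed=11)
print(Counter(assignment.values()))  # 100 units per arm
```

Passing only the two method labels gives the design without a control group: it can compare the methods with each other, but not with the absence of the program.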
Yes? Your name, please? We don't have name tags, but I like to call people by their names. Wendy? Go ahead.

AUDIENCE: So if we pick schools, my conclusions will be about schools. They won't be about the students in the schools. Or is that wrong?

PROFESSOR: It depends. You say your conclusions will be about the schools? The key question is, what is the unit of intervention here? Is the program directed at all the children in the school, or only at some of them? In part, the decision of what you randomize, whether it's schools or children within schools, depends on the nature of the treatment. If you have a program that serves everyone in the school, then yes, your assignment should be at the school level. That is, some schools should receive the program and others should not. But if you have a program that is only going to serve some children in each school, then your assignment could be within the school: some children receive the treatment, and others do not.
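When the program serves whole schools, assignment happens at the school level and every student inherits the school's status. A minimal Python sketch with a made-up roster of 12 schools of 30 students each; `assign_schools` and the ID formats are hypothetical.

```python
import random


def assign_schools(student_school, seed):
    """Randomize at the school level; each student inherits the school's status.

    student_school maps a student ID to a school ID. Half of the schools
    (rounded down) are assigned to treatment. Sorting before shuffling makes
    the result depend only on the seed, not on set iteration order.
    """
    schools = sorted(set(student_school.values()))
    rng = random.Random(seed)
    rng.shuffle(schools)
    treated_schools = set(schools[: len(schools) // 2])
    return {student: school in treated_schools for student, school in student_school.items()}


# Hypothetical roster: 12 schools with 30 students each.
roster = {f"s{school}_{k}": f"school_{school}" for school in range(12) for k in range(30)}
status = assign_schools(roster, seed=5)
treated = sorted({school for student, school in roster.items() if status[student]})
print("treated schools:", treated)
```

One consequence worth keeping in mind: with school-level assignment, statistical precision depends heavily on the number of schools, not just the number of students.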
The key, though, if you're using the second approach, is that you want to make sure there are no spillovers. You want to make sure that someone receiving the treatment is not going to affect the outcomes of someone not receiving the treatment. Spillovers are something you're going to see on Friday. But the basic idea is, the level at which you randomize depends on the level of your treatment: whether you're treating schools, or individuals within schools, et cetera.
AUDIENCE: So statistically I want them to be the same.
PROFESSOR: You want them to be the same, yes.
AUDIENCE: My name is Manuel. Please talk a little bit about the unobserved characteristics.
PROFESSOR: Yes. So the unobserved characteristics-- this is something that a lot of the non-experimental methods wrestle with. And the idea is, the randomized experiment creates two groups that, by pure laws of statistics, are identical in every single characteristic, statistically speaking, that is, on average. So both the ones you observe and the ones you don't observe.
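A toy simulation can make "statistically speaking" concrete. Everything here is made up for illustration: `motivation` stands in for a trait the evaluator never observes. When a fair coin decides who is treated, the two groups end up with nearly the same average motivation; when the more motivated people select themselves into the program, the groups differ sharply, and matching on observed variables would not reveal it. Balance under random assignment holds on average and tightens as the sample grows; any single small sample can still show chance differences.

```python
import random
import statistics


def group_gap(trait, in_group):
    """Mean of `trait` inside the group minus the mean outside it."""
    inside = [v for v, g in zip(trait, in_group) if g]
    outside = [v for v, g in zip(trait, in_group) if not g]
    return statistics.mean(inside) - statistics.mean(outside)


rng = random.Random(7)
n = 10_000
# Hypothetical unobserved trait (e.g. motivation); the evaluator never sees it.
motivation = [rng.gauss(0, 1) for _ in range(n)]

# Random assignment: a fair coin flip, independent of motivation.
coin = [rng.random() < 0.5 for _ in range(n)]
# Self-selection: only people with above-average motivation sign up.
signed_up = [m > 0 for m in motivation]

random_gap = group_gap(motivation, coin)
selection_gap = group_gap(motivation, signed_up)
print(f"gap under random assignment: {random_gap:+.3f}")
print(f"gap under self-selection:    {selection_gap:+.3f}")
```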
So if we were trying to do an experiment in this classroom and I randomly assigned you into two groups, I can be confident that even the things I don't observe about you are going to be balanced across those two groups. If instead I try to match you, using all the information you gave me on your application forms and saying, OK, these people are from this-- I'm going to be able to do so with the observables, but not with the unobservables. And again, depending on how important these unobservables are in explaining the outcomes, that may be a big disadvantage or not so big a disadvantage. And this is what happened in the get out the vote example. You were able to observe some characteristics of people, and non-experimental methods, all of them-- I mean, not all of them, but most of them-- can address those. Some of the methods can also address some unobservables, but again, they always rely on some assumption about how those unobservables behave. Here you're not relying on any such assumptions. You need to do the random assignment properly, but once it's done properly, you're not relying on any assumption about the unobservables.
AUDIENCE: Is that the general dichotomy?
554 00:24:07,220 --> 00:24:12,992 There's randomized tests, and then matched pairs tests? 555 00:24:12,992 --> 00:24:15,066 Or is there other , is it generally broken 556 00:24:15,066 --> 00:24:17,100 down into those two? 557 00:24:17,100 --> 00:24:21,220 PROFESSOR: So the way that I think most people break it 558 00:24:21,220 --> 00:24:25,520 down is randomized, where you use this random assignment, 559 00:24:25,520 --> 00:24:28,860 and then non-experimental methods. 560 00:24:28,860 --> 00:24:31,760 But I don't mean to imply that all the non-experimental 561 00:24:31,760 --> 00:24:33,440 methods are the same. 562 00:24:33,440 --> 00:24:35,620 And in fact, there are some people who called them 563 00:24:35,620 --> 00:24:37,120 quasi-experimental methods. 564 00:24:37,120 --> 00:24:40,840 Those people tend to think of them a little bit higher than 565 00:24:40,840 --> 00:24:42,460 the non-experimental methods. 566 00:24:42,460 --> 00:24:46,100 Non-experimental people tend to say, this is not good. 567 00:24:46,100 --> 00:24:49,990 Quasi-experimental, oh, this gets closer to the experiment. 568 00:24:49,990 --> 00:24:55,470 But the key thing here is that whatever method you use, the 569 00:24:55,470 --> 00:25:00,010 key is how are the people getting into the program being 570 00:25:00,010 --> 00:25:03,500 selected, and how are you forming that comparison group, 571 00:25:03,500 --> 00:25:07,060 and what statistical techniques are you using to 572 00:25:07,060 --> 00:25:10,550 adjust for whether that comparison group is the same 573 00:25:10,550 --> 00:25:11,810 or not than the treatment? 574 00:25:11,810 --> 00:25:13,290 So the dichotomy is not between 575 00:25:13,290 --> 00:25:14,310 randomized and matching. 576 00:25:14,310 --> 00:25:17,100 The dichotomy is usually between randomized and 577 00:25:17,100 --> 00:25:18,530 everything else. 
But within everything else, there are methods that are much better than others. Yes? [? Holgo? ?]
AUDIENCE: How do we randomize when we assign people into treatment and control groups, besides a lottery? [INAUDIBLE]
PROFESSOR: You mean the process? So tomorrow, the whole day is going to be about how to randomize. But the basic idea is, you can do it in a variety of ways. You can do it on a computer, which allows you a lot more flexibility. But if, for any reason, you need to show people that you're doing it in a random, transparent manner, that can also be done. We just did one in Niger, in West Africa, where we used bingo balls. So literally, people would draw from there, and everyone could see. If we had brought a computer into the room in Niger and tried to do things that way, it just wouldn't have worked. People would have said, what are you doing here? So there are many different ways of randomizing.
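The lecture stresses that the process, not the tool, is what makes an assignment random. If a computer draw needs to be as verifiable as bingo balls, one option (an assumption on our part, not something described in the session) is to publish the list of applicants first, then generate the seed in public, for example from dice rolls, so that anyone can rerun the draw. The function name, applicant labels, and seed below are hypothetical. Python's documentation does not guarantee that functions such as `random.sample` give identical results across Python versions, so the version used should be published along with the seed.

```python
import random


def public_lottery(applicants, n_winners, seed):
    """Draw `n_winners` from a published applicant list using a public seed.

    Sorting first means the result depends only on who applied and on the
    seed, not on the order in which names were typed in.
    """
    ordered = sorted(applicants)
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, n_winners))


applicants = [f"household_{k}" for k in range(1, 7)]
announced_seed = 4215  # hypothetical: e.g. digits rolled on dice in front of everyone
winners = public_lottery(applicants, 3, announced_seed)
print("winners:", winners)
```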
The key-- and this is something we're going to talk about in a little bit-- is exactly what process you use to make sure that it's random assignment, not the tool, you know, whether it's bingo balls or a lottery or a coin or whatever it is. Yes?
AUDIENCE: So at what point this week will we talk about the ethical dimensions of denying treatment to someone?
PROFESSOR: OK. In about three slides, you can jump at me with the ethical issues. And then if I don't satisfy you, you have four more days to jump at every single person who comes into this room.
So what I want to give you is a little bit of the nuts and bolts. Rather than keep this discussion in the abstract, this is what happens in an experiment. If you wanted to do a randomized experiment tomorrow, these are roughly the eight key steps that you need to think about. This is a very simplified description of the process. As those people sitting in the back will tell you, it's very simplified. Their daily lives are consumed by many of these steps, and they work months, if not years, on each of them.
630 00:27:30,520 --> 00:27:33,530 The first step, and I can't emphasize this enough, is to 631 00:27:33,530 --> 00:27:35,870 design the study carefully. 632 00:27:35,870 --> 00:27:41,550 So no matter what you do, what you do at the beginning is 633 00:27:41,550 --> 00:27:44,480 going to affect you study for the rest of the study. 634 00:27:44,480 --> 00:27:47,450 This is true for some things in life and not others. 635 00:27:47,450 --> 00:27:51,420 For evaluations, impact evaluations, if you don't do 636 00:27:51,420 --> 00:27:53,900 it right at the beginning, you're going to be in trouble. 637 00:27:53,900 --> 00:27:55,360 That's going to come down to haunt you. 638 00:27:55,360 --> 00:27:58,690 So anything you can do to spend time at the beginning, 639 00:27:58,690 --> 00:28:02,120 making sure that the study is designed properly, is going to 640 00:28:02,120 --> 00:28:03,890 be very helpful. 641 00:28:03,890 --> 00:28:07,690 What that means, in very practical terms, is if you are 642 00:28:07,690 --> 00:28:10,860 in a position where you are commissioning a study, and you 643 00:28:10,860 --> 00:28:14,280 don't have people in your staff who are expert at this, 644 00:28:14,280 --> 00:28:16,470 make sure that whoever is going to help you do the 645 00:28:16,470 --> 00:28:19,600 evaluation is involved from the very beginning. 646 00:28:19,600 --> 00:28:24,010 What this also means is that calling someone three years 647 00:28:24,010 --> 00:28:26,100 after the program was implemented, saying, can you 648 00:28:26,100 --> 00:28:28,060 come and evaluate? 649 00:28:28,060 --> 00:28:31,980 That leaves the evaluator with very few options. 650 00:28:31,980 --> 00:28:37,060 So the earlier the evaluators are involved, the better the 651 00:28:37,060 --> 00:28:39,350 options are in terms of how you can do this. 
652 00:28:39,350 --> 00:28:42,550 Both in terms of the validity of the evaluation, but also in 653 00:28:42,550 --> 00:28:47,010 terms of how it will interact with the program in a way that 654 00:28:47,010 --> 00:28:49,140 it doesn't disrupt the program. 655 00:28:49,140 --> 00:28:50,360 So this is key. 656 00:28:50,360 --> 00:28:53,870 And we can talk about design a little bit now, but you will 657 00:28:53,870 --> 00:28:56,670 learn a little bit about design when you speak about 658 00:28:56,670 --> 00:28:59,960 sample size, about measurement issues, and all of those 659 00:28:59,960 --> 00:29:01,090 sessions are coming. 660 00:29:01,090 --> 00:29:02,110 How to randomize. 661 00:29:02,110 --> 00:29:05,930 So Wednesday and Thursday are really about that. 662 00:29:05,930 --> 00:29:08,750 The second one is to randomly assign people to treatment or 663 00:29:08,750 --> 00:29:11,790 control or more groups, if there are more than those. 664 00:29:11,790 --> 00:29:13,840 The third one is to collect baseline data. 665 00:29:13,840 --> 00:29:15,660 So this is a big question that comes up. 666 00:29:15,660 --> 00:29:17,680 Should you collect baseline data? 667 00:29:17,680 --> 00:29:23,300 I think my answer to that is, in general, if you don't have 668 00:29:23,300 --> 00:29:27,250 a randomized evaluation, it's going to be very, very, very 669 00:29:27,250 --> 00:29:30,530 difficult to get away without baseline data. 670 00:29:30,530 --> 00:29:32,180 There are some methods that work, but 671 00:29:32,180 --> 00:29:33,170 it's going to be difficult. 672 00:29:33,170 --> 00:29:36,220 By baseline, I mean, before the intervention started. 673 00:29:36,220 --> 00:29:41,110 If you have a randomized trial it would be highly preferable 674 00:29:41,110 --> 00:29:43,390 to have baseline data. 675 00:29:43,390 --> 00:29:44,530 Highly preferable. 676 00:29:44,530 --> 00:29:47,630 But not as critical as with other methods. 
677 00:29:47,630 --> 00:29:49,400 And it's preferable in two ways. 678 00:29:49,400 --> 00:29:53,240 The first one is if you have a baseline data, you can verify, 679 00:29:53,240 --> 00:29:55,910 at least in terms of those characteristics you collected 680 00:29:55,910 --> 00:29:58,730 in the baseline survey, you can verify that 681 00:29:58,730 --> 00:30:00,040 two groups look like. 682 00:30:00,040 --> 00:30:03,280 This is a nice thing to verify at the beginning and not at 683 00:30:03,280 --> 00:30:04,660 the end of the evaluation. 684 00:30:04,660 --> 00:30:06,600 So if you can do it, that would be helpful. 685 00:30:06,600 --> 00:30:08,470 And the second thing you have to do is-- yes? 686 00:30:08,470 --> 00:30:09,370 AUDIENCE: Sorry. 687 00:30:09,370 --> 00:30:11,880 What happens if, at the baseline data, you realize 688 00:30:11,880 --> 00:30:14,370 that the two groups that you made were not random? 689 00:30:14,370 --> 00:30:16,820 Do you go and keep randomizing until you get there? 690 00:30:16,820 --> 00:30:18,400 PROFESSOR: So it depends. 691 00:30:18,400 --> 00:30:20,950 It depends on when you discovered this. 692 00:30:20,950 --> 00:30:23,620 If you discover this when the treatment is already being 693 00:30:23,620 --> 00:30:27,130 implemented, it is too late to do anything else in terms of 694 00:30:27,130 --> 00:30:28,200 re-randomizing. 695 00:30:28,200 --> 00:30:31,980 The ideal scenario is one in which you can do this, collect 696 00:30:31,980 --> 00:30:35,830 the baseline data, randomize, verify that they are similar, 697 00:30:35,830 --> 00:30:38,570 and then if they are not similar, then you can 698 00:30:38,570 --> 00:30:41,070 re-randomize again. 699 00:30:41,070 --> 00:30:44,120 There's controversy about how many times you should do this, 700 00:30:44,120 --> 00:30:47,980 but for the most part, in general, if you randomize, the 701 00:30:47,980 --> 00:30:49,440 two groups should look similar. 
702 00:30:49,440 --> 00:30:53,720 There are very few scenarios, but they exist, where they 703 00:30:53,720 --> 00:30:55,040 don't look similar to each other. 704 00:30:55,040 --> 00:30:57,770 And if you reach one of those scenarios, you can 705 00:30:57,770 --> 00:30:59,420 re-randomize. 706 00:30:59,420 --> 00:31:02,910 What you can't do is re-randomize when the 707 00:31:02,910 --> 00:31:04,580 treatment is already being distributed. 708 00:31:04,580 --> 00:31:06,740 So if you already decided, you're in the treatment group, 709 00:31:06,740 --> 00:31:08,260 you're in the control group, you can't 710 00:31:08,260 --> 00:31:11,200 re-randomize at that phase. 711 00:31:11,200 --> 00:31:13,350 The second reason you want to collect data, and this is 712 00:31:13,350 --> 00:31:15,530 going to be important particularly in a setting like 713 00:31:15,530 --> 00:31:18,310 yours, if you are worried about sample size, is that it 714 00:31:18,310 --> 00:31:19,960 buys a lot of statistical power. 715 00:31:19,960 --> 00:31:22,540 Particularly if you can collect data on the baseline 716 00:31:22,540 --> 00:31:24,800 version of the outcomes that you care about. 717 00:31:24,800 --> 00:31:27,730 If you can do that, it's highly desirable. 718 00:31:27,730 --> 00:31:31,310 The reality is that sometimes it's feasible to collect 719 00:31:31,310 --> 00:31:34,080 baseline data and sometimes the nature of implementation 720 00:31:34,080 --> 00:31:35,930 of the program makes it difficult. 721 00:31:35,930 --> 00:31:42,804 But you will do well if you can collect baseline data. 722 00:31:42,804 --> 00:31:46,020 AUDIENCE: Wouldn't it seem that by the very fact of 723 00:31:46,020 --> 00:31:48,990 collecting the baseline data, once we have already 724 00:31:48,990 --> 00:31:57,547 randomized, can bias this randomized by collecting the 725 00:31:57,547 --> 00:31:58,720 baseline data? 
726 00:31:58,720 --> 00:32:05,120 PROFESSOR: Because you're affecting the people who are 727 00:32:05,120 --> 00:32:07,000 answering the survey? 728 00:32:07,000 --> 00:32:10,210 Well, this has to do a little bit more with survey design 729 00:32:10,210 --> 00:32:11,600 than with any other thing. 730 00:32:11,600 --> 00:32:15,530 The key is, you're going to collect baseline data for both 731 00:32:15,530 --> 00:32:17,520 the participant or the treatment 732 00:32:17,520 --> 00:32:19,280 and the control group. 733 00:32:19,280 --> 00:32:21,910 So if you feel that when people answer a 734 00:32:21,910 --> 00:32:24,810 survey, they somehow-- 735 00:32:24,810 --> 00:32:25,910 I don't know-- 736 00:32:25,910 --> 00:32:29,260 get optimistic about life and do better or the other way 737 00:32:29,260 --> 00:32:34,440 around, as long as it happens in the same way for both 738 00:32:34,440 --> 00:32:37,070 treatment and control groups, it's not a problem for the 739 00:32:37,070 --> 00:32:38,780 randomized trials. 740 00:32:38,780 --> 00:32:41,200 The problem would be if, for some reason, you think that 741 00:32:41,200 --> 00:32:43,400 administering a survey is going to affect the treatment 742 00:32:43,400 --> 00:32:44,900 and the control group differently. 743 00:32:44,900 --> 00:32:47,450 If that's the case, then you need to be careful about how 744 00:32:47,450 --> 00:32:50,134 you do the survey. 745 00:32:50,134 --> 00:32:55,660 AUDIENCE: Can you explain how [INAUDIBLE] statistical power? 746 00:32:55,660 --> 00:32:59,890 PROFESSOR: So in technical terms, what happens is, you, 747 00:32:59,890 --> 00:33:03,680 in your regression, where you estimate an impact, you have 748 00:33:03,680 --> 00:33:05,780 an outcome of interest. 749 00:33:05,780 --> 00:33:10,530 And that outcome has a variance, has some variations. 
If you can add statistical controls to your regressions, things you collected at baseline, what essentially happens, in technical terms, is that the standard errors of your coefficients should drop, particularly if these variables have a lot of explanatory power, because they soak up part of that variation. And so you get more statistical power.
Yes, Jessica?
AUDIENCE: Do you mean to say that you have to collect the baseline data after you do the first round of randomization? Does it matter what order you do those steps in?
PROFESSOR: Sorry. Steps two and three can be inverted. In fact, it would be ideal if you could invert them, because then you can do what Iqbal is saying. That is, you collect the baseline data, you do the randomization, and then you ask, OK, are they the same or not? If they're not the same, you re-randomize. If you collect the baseline data after randomly assigning, then unless you have not yet told people who gets the treatment and who gets the control, your options for re-randomizing are not very good. So, very good point. All right.
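The power argument can be checked with a small simulation. This is a sketch under made-up assumptions (endline outcome = 10 + 0.9 × baseline + 2 × treatment + noise, 400 units, half treated), written in plain Python rather than with a statistics package; the helper names are hypothetical. It estimates the treatment effect twice, once without controls and once controlling for the baseline outcome (using the Frisch-Waugh-Lovell shortcut for a single covariate), and reports classical OLS standard errors. Because the baseline score explains most of the outcome's variation, the adjusted standard error should come out well below the unadjusted one.

```python
import math
import random


def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def residualize(y, x):
    """Residuals from regressing y on an intercept and one regressor x."""
    y_d, x_d = demean(y), demean(x)
    beta = sum(a * b for a, b in zip(x_d, y_d)) / sum(a * a for a in x_d)
    return [b - beta * a for a, b in zip(x_d, y_d)]


def treatment_effect(y, treat, covariate=None):
    """OLS coefficient on `treat` and its classical standard error.

    Without a covariate this is y ~ 1 + treat; with one it is
    y ~ 1 + treat + covariate, computed by partialling the covariate out
    of both y and treat (Frisch-Waugh-Lovell).
    """
    t = [float(d) for d in treat]
    if covariate is None:
        y_r, t_r, k = demean(y), demean(t), 2
    else:
        y_r, t_r, k = residualize(y, covariate), residualize(t, covariate), 3
    stt = sum(a * a for a in t_r)
    coef = sum(a * b for a, b in zip(t_r, y_r)) / stt
    ssr = sum((b - coef * a) ** 2 for a, b in zip(t_r, y_r))
    return coef, math.sqrt(ssr / (len(y) - k) / stt)


# Simulated data; all parameters are illustrative assumptions.
rng = random.Random(3)
n = 400
base = [rng.gauss(50, 10) for _ in range(n)]
treat = [True] * (n // 2) + [False] * (n // 2)
rng.shuffle(treat)
endline = [10 + 0.9 * b + 2.0 * d + rng.gauss(0, 4) for b, d in zip(base, treat)]

coef_raw, se_raw = treatment_effect(endline, treat)
coef_adj, se_adj = treatment_effect(endline, treat, covariate=base)
print(f"no controls:      {coef_raw:.2f} (SE {se_raw:.2f})")
print(f"baseline control: {coef_adj:.2f} (SE {se_adj:.2f})")
```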
So the fourth step is to verify that the assignment looks random. If you were to commission an evaluation, this verification is something you should make sure your evaluator provides to you: at the very least, a table that says, here's the treatment group, here's the control group, and here's what they look like in terms of these baseline characteristics. Ideally, those tables should show very, very few differences between the groups. And when I say differences, they should not be large in practical terms. There could be some differences that are statistically significant, either because you have a lot of statistical power or, more likely, because if you compare 10 variables, one or two of them may well come out significant just by chance. The key is that there are no systematic differences between the groups. If you observe systematic differences, then you're in trouble: this didn't work well. But I can tell you from experience, and from the laws of statistics, these two groups will look the same most of the time. OK.
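Here is a sketch of the balance table the professor describes, in plain Python. The variable names, the eight hypothetical schools, and the fixed illustrative assignment vector are invented (in practice the assignment would come from the random draw), and the t-statistic uses an unequal-variance (Welch-style) standard error, one common choice among several. The second function puts a number on the point about comparing 10 variables: if the tests were independent and each used a 5% significance level, the chance that at least one comes out "significant" by luck alone is about 40%.

```python
import math
import statistics


def balance_table(data, treated):
    """Return rows of (variable, treatment mean, control mean, difference, t-stat).

    `data` maps a variable name to one value per unit; `treated` is a list of
    booleans in the same order. The t-statistic uses a Welch-style standard error.
    """
    rows = []
    for name, values in data.items():
        t = [v for v, d in zip(values, treated) if d]
        c = [v for v, d in zip(values, treated) if not d]
        diff = statistics.mean(t) - statistics.mean(c)
        se = math.sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))
        rows.append((name, statistics.mean(t), statistics.mean(c), diff, diff / se))
    return rows


def chance_of_false_alarm(num_tests, alpha=0.05):
    """P(at least one of `num_tests` independent tests is significant by chance)."""
    return 1 - (1 - alpha) ** num_tests


# Hypothetical census data for 8 schools, with a fixed illustrative assignment.
schools = {
    "national_test_score": [60, 61, 58, 59, 62, 60, 59, 60],
    "private_school": [0, 1, 0, 0, 1, 0, 1, 1],
}
treated = [True, False] * 4
print(f"{'variable':<22}{'treat':>8}{'control':>9}{'diff':>8}{'t':>7}")
for name, mt, mc, diff, tstat in balance_table(schools, treated):
    print(f"{name:<22}{mt:>8.2f}{mc:>9.2f}{diff:>8.2f}{tstat:>7.2f}")
print(f"chance of >=1 false alarm among 10 tests: {chance_of_false_alarm(10):.2f}")
```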
Obviously you can only do that verification if you have some data on the two groups from before the program. Now, when I say "collect baseline data," maybe you already have baseline data-- for some reason this is a population you're already serving, and you already did surveys on these people. If that's the case, then all the better. It may be that you don't have baseline data, but you may be able to get it. For example, if you're randomly assigning schools, you may have, from the government or from some agency, a census of schools. And you may be able to compare schools in terms of the socioeconomic characteristics of their students, or, you know, the share of private versus public schools in each group. If there was a national test taken by all the schools, you may be able to compare test scores across those schools. The key thing is, anything you can do to verify that the random assignment worked is good, and it is most useful to do it at the beginning. The fifth step is to monitor the process so that the integrity of the experiment is not compromised.
This is really, really key. When you do a randomized experiment, designing the study carefully is very important, and doing the random assignment is very important. But you can't just relax and then wait two years until you collect the outcomes. The people sitting at the back of the room know this much better than I do. If you are not following exactly what's happening in the field, the opportunities for the experiment to go wrong are very, very big. You're going to have a whole session on Friday on threats to an experiment. The only thing I will say now is that the best way to deal with threats to an experiment is to avoid them, and to avoid them at this stage of implementation. One very quick threat: if you assign some people to a treatment group and others to a control group, the people in the control group are not offered the treatment. But that also means they shouldn't get the treatment, and as some of you know, that doesn't always happen. Some people in the control group find their way into the program.
Having systems to monitor that this doesn't happen, and that if it does, it happens only in very, very few exceptional cases, is going to be very important. Yes, Logan?
AUDIENCE: One of the arguments for the superiority of matched pairs is that if one treatment group ends up not getting the treatment because of lack of capacity in that region, or vice versa, the scenario you described, you can just drop that pair.
PROFESSOR: Yes. The problem is that dropping that pair may be costly to you. And you have to assume-- well, first of all, you have to assume that pair was comparable to begin with. And even if you were to drop that pair-- well, first of all, matching doesn't always work one-to-one. But even if you had one-to-one matching, suppose you had to drop 10% or 20% or 30% of your pairs. Then you lose statistical power, and you also lose external validity. Yes?
AUDIENCE: So there's also the issue of spillover effects, which isn't the same thing.
One might be that somebody who was not supposed to be in the program sneaks into it. But the other is, if you do things in the same community, which is often the case in the work that we do, or in a similar environment, the mere effect of having something going on--
PROFESSOR: Yes. And this is why the first step is very important. If you think spillovers will occur, the moment to think about them is at the design stage of the evaluation, because then you can decide how to randomize in a way that minimizes the effect spillovers would have. There are some statistical techniques to deal with some of these problems, but the best way to deal with them is to avoid them in the first place. And you avoid them by good design, where the evaluator can help, and by a good monitoring system to make sure that the evaluation is being implemented as intended. Makes sense? Yes, your name please? Are you also filming from this camera here? OK. I'm nervous now. Two cameras.
AUDIENCE: What's on the [INAUDIBLE] to avoid [INAUDIBLE]?
PROFESSOR: Yes. So I think one important thing is to have people in the field who can help monitor, and who know about the evaluation. Two is to have a clear commitment. This is something Rachel said this morning that's really, really key: a very clear commitment from whoever is organizing-- that's very creative-- from whoever is implementing the program. So I'll give you an example. We were evaluating a program in Jamaica, and we were telling them, we need to monitor the crossovers. We can't have people who are not supposed to receive the program get into the program. Yes, yes, yes. Is it OK if a few do it? We said, well, only if a few, but really, this has to be the exception, and you really have to monitor this rate. We asked them for a report on this rate, and so on. This was a government agency in Jamaica. And they kept asking, OK, how many is too many? And we were like, no, no, no, no. You have to keep that rate to a minimum.
"There's no way you can have crossovers. Just keep--" "No, but how many, how many?" In one day of weakness, we said, "OK. If it's more than 10%, this is completely ruined. We can't do anything with it." So the end of the evaluation arrived. We computed the crossover rate: 9.6%. So what I want to say here is that if they hadn't wanted to comply with our request, they could have made this rate 30% or 40%, and we would not have heard anything about it. I'm not saying 10% is the right threshold; of course it depends on the program and on other things. But the key thing here is, you need full cooperation between the people in the field who are implementing and the people in the field who are evaluating. If you don't have that, then it's very difficult, because people find a way to get into a program if they hear that this program is doing some good. So, who's a parent in this room? All right. So now, confess. Suppose there was a randomized trial in your child's school of this very promising, you name it, after-school program.
And your child fell in the control group. Would you be at least tempted to go to the principal and say, "I want my child in that program"? Tempted? All right. I can tell you that other parents are more than tempted, and will find a way. All right.
AUDIENCE: What do you do with the spillovers? Do you just exclude them and put them in the comparison group?
PROFESSOR: So these are called crossovers, because they cross from the control group to the treatment group. This comes at the analysis stage, and you'll do it on Friday. But the key thing is, what random assignment buys you is that the two groups are comparable as a whole: the whole treatment group with the whole control group. You can't then just say, "Oh, I don't like this control group member. I'm just going to throw it out." That completely destroys the comparability. You still need to compare the two full groups, and you make some statistical adjustments to deal with the crossovers. But once a treatment, always a treatment. Once a control, always a control.
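The crossover point can be illustrated with a small simulation. The sketch below is a hypothetical example, not the analysis covered in the later session: it assumes, as in the scenario the professor describes next, that roughly the most motivated 10% of the control group cross over, and that the program raises the outcome by a made-up constant of 2. It compares three estimates: the intention-to-treat (ITT) difference between the full groups as assigned, the comparison after dropping the crossovers, and the ITT divided by the difference in take-up between the groups (the Wald, or instrumental-variables, rescaling). The Wald step is one standard adjustment for crossovers; it assumes that assignment affects outcomes only through participation, and that no one would take the program if assigned to control but refuse it if assigned to treatment. `crossover_demo` and all its parameters are illustrative.

```python
import random
from statistics import mean


def crossover_demo(n=20000, effect=2.0, seed=1):
    """Simulate a trial in which the most motivated controls cross over.

    All numbers are hypothetical. Returns a dict of estimates.
    """
    rng = random.Random(seed)
    treat, control = [], []  # rows: (received, crossed, motivation, outcome)
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        assigned = rng.random() < 0.5
        # Assumption: controls more than ~1.28 SD above average (top ~10%) cross over.
        crossed = (not assigned) and motivation > 1.28
        received = 1 if (assigned or crossed) else 0
        outcome = 10 + 3 * motivation + effect * received + rng.gauss(0, 1)
        (treat if assigned else control).append((received, crossed, motivation, outcome))

    kept = [r for r in control if not r[1]]  # controls left after dropping crossovers
    itt = mean(r[3] for r in treat) - mean(r[3] for r in control)
    dropped = mean(r[3] for r in treat) - mean(r[3] for r in kept)
    take_up_gap = mean(r[0] for r in treat) - mean(r[0] for r in control)
    return {
        "itt": itt,                 # full groups, as assigned
        "dropped": dropped,         # treatment vs. the ~90% of controls kept
        "wald": itt / take_up_gap,  # ITT rescaled by the difference in take-up
        "motivation_gap_full": mean(r[2] for r in treat) - mean(r[2] for r in control),
        "motivation_gap_dropped": mean(r[2] for r in treat) - mean(r[2] for r in kept),
    }


results = crossover_demo()
```

Under these assumptions the ITT should come out roughly at 1.8 (diluted by the crossovers), dropping the crossovers overstates the effect because the remaining controls are less motivated on average, and the Wald estimate should land close to the simulated effect of 2.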
Random assignment buys you that the two groups are the same. Suppose there are 10% crossovers. If you throw them away, you will be comparing the whole treatment group with 90% of the control group. And let's just assume for a second that the 10% who cross over are people who are particularly motivated, and that's why they switch over. Well then, the average motivation of the two groups was the same at the beginning, but once you throw that 10% away, the average motivation of the treatment group is going to be higher than the average motivation of the control group. So any difference you find in outcomes between these two groups could be due to the program, but it could also be due to differences in motivation. You can't throw them away. There are statistical ways of dealing with them. Yes?
AUDIENCE: I guess I didn't understand the answer to the earlier question. So we're worried about spillover, and we're going to deliver books to-- clearly the intervention is that the kids get books that they can take home to study at night.
But I've decided that, because I'm worried about spillover and because it's more administratively convenient, I'm going to deliver to some schools. So I'm going to draw the schools at random, but I'm looking at the impact on the kids.
PROFESSOR: That's OK.
AUDIENCE: So even so, I haven't damaged my ability to look at the effects on students, because my unit of randomization was at a different level.
PROFESSOR: That's perfectly fine. However, the higher the unit of randomization, the more trouble you're going to have getting enough statistical power to detect effects. But that's a topic I want to leave until Thursday. But yes. I mean, when we say the schools are treated-- the schools are buildings. They're not being treated in any way. Unless you paint them or do something to them, they're not being treated--
AUDIENCE: Ours got paint.
PROFESSOR: OK. So if it's just painting them, then the schools-- no, but seriously. When I say treated, who's being affected by the treatment?
AUDIENCE: Well, I can't have a-- it's going to hurt my power. But I can randomize at a different level than [INAUDIBLE].
PROFESSOR: You can. Particularly if you want to avoid spillovers, that's exactly what you should be doing. All right. Yes?
AUDIENCE: My name is Cesar. What happens when the intervention is about knowledge? For example, a nurse trains a treatment group to wash their hands, and this knowledge can--
PROFESSOR: Can spill over. Yeah. That's exactly right. So again, you need to think about the design of the study. If you really think it's going to spill over, then you need to think about randomizing at a higher level so that the spillover doesn't occur. I do have to say one thing. There are some interventions where the spillover is evident, and you're going to see that in the deworming case. I think it's case number 4. There, it's very clear that this is happening.
There's biological transmission of disease from person to person that makes spillovers very clear. This is my own bias, but there are tons of programs out there that have difficulty affecting the people they're intended to reach. So thinking that they're going to affect other people they weren't intended to help is, in some cases at least, a stretch. Having said that, if you think spillovers will occur, then you need to think about that at the design stage of the study. Yes? Your name please?
AUDIENCE: Yes, sir. Raj. Just getting back to the example where you were saying that if you took each of us and assigned us to two different groups, it would adjust for the unobservable characteristics. Would that work out in a sample size so small?
PROFESSOR: In a sample size like this, you will have trouble with statistical-- I want to leave all those questions for-- you have our superstar, Esther Duflo, who's going to speak about statistical power.
But the key thing here is, if you have a small group, then the sampling error is bigger. So you may observe differences between the groups, but you may not be able to declare them statistically significant, because you have very little power. So in general, you want larger sample sizes. This group is probably small. But even if you did it with this group-- and I challenge you to do it: just take an Excel spreadsheet with five of your characteristics and do the random assignment-- you're going to see some differences, but it's really amazing how alike the two groups will look. And the other thing: you're not accounting for unobservable differences the way some non-experimental methods try to. The key thing about this is, you don't need to account for anything, because the two groups are balanced on these characteristics. So they have the same average level of motivation, and so I don't need to control statistically for motivation, because it cannot be a confounding factor if the two groups are the same. OK? All right.
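The spreadsheet exercise suggested above can also be sketched in Python. In this minimal sketch, `make_people` generates five made-up characteristics (the names and distributions are assumptions, with `motivation` standing in for something a real study would usually not observe), and `balance_table` randomly splits the group in half and reports each treatment-minus-control difference in means in standard-deviation units. Both function names are hypothetical.

```python
import random
from statistics import mean, stdev


def make_people(n, seed=0):
    """Generate n people with five made-up characteristics."""
    rng = random.Random(seed)
    return [
        {
            "age": rng.gauss(35, 8),
            "years_schooling": rng.gauss(16, 2),
            "motivation": rng.gauss(0, 1),  # usually unobserved in a real study
            "income": rng.lognormvariate(10, 0.5),
            "female": 1.0 if rng.random() < 0.5 else 0.0,
        }
        for _ in range(n)
    ]


def balance_table(people, seed=7):
    """Randomly split people in half; return the standardized difference
    (treatment mean minus control mean, divided by the overall SD) per characteristic."""
    rng = random.Random(seed)
    order = list(range(len(people)))
    rng.shuffle(order)
    half = len(order) // 2
    treat = [people[i] for i in order[:half]]
    control = [people[i] for i in order[half:]]
    table = {}
    for key in people[0]:
        sd = stdev(p[key] for p in people)
        diff = mean(p[key] for p in treat) - mean(p[key] for p in control)
        table[key] = diff / sd if sd > 0 else 0.0
    return table


print(balance_table(make_people(30)))    # a class-sized group
print(balance_table(make_people(2000)))  # a larger sample
```

With 30 people some characteristics will typically show visible gaps; with 2,000 people the standardized differences shrink toward zero, which is the sampling-error point made above.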
So, step number six. If you're going to measure the impact of a program on an outcome of interest, you need to collect data on that outcome. That's called follow-up data. And the key thing is, you need to collect it for both the treatment and control groups, and it's important that it be done in identical ways. So it would not be a good idea to have treatment group data come from one source, say, a survey, and control group data come from another source, say, administrative data, because different data sources are generally not very compatible with each other. The seventh step, of course, is to estimate the program impact. If the experiment is properly done, what you should be doing is just comparing the mean outcomes of the treatment group with the mean outcomes of the control group. Now, there are more sophisticated versions of experiments where you need to use the multiple regression framework to control for things, particularly if you have stratified your sample, and so on.
But in general, the basic idea is that there are no differences between these two groups, so the simple difference in outcomes between them should give you the impact of the program. There are other reasons you may want to use the regression framework, such as the statistical power we were talking about before, but this is the basic idea. If the difference between the two groups is very different from what you get with the regression, you should start thinking about what's going on. And then step eight, which I think is very important for practitioners. You should assess whether the program's impacts are statistically significant, but also whether they're practically significant. Statistically significant means we're confident that this impact is different from 0 in a statistical sense. Having said that, the impact may still be very small for any practical purpose. So it may be that a program affects some outcome of interest, but the effect is so small that you wouldn't decide that this program was a success on the basis of that. So both of those things are important.
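For the simplest case described above, the difference in means and the regression estimate coincide: regressing the outcome on a constant and a 0/1 treatment indicator, with no other covariates, gives a slope equal to the treatment-minus-control difference in means. The Python sketch below checks this on a tiny made-up data set; the function names are illustrative, and the covariate or stratum controls mentioned above for stratified designs are not shown.

```python
from statistics import mean


def difference_in_means(y, d):
    """Mean outcome of the treated (d == 1) minus mean outcome of controls (d == 0)."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return mean(treated) - mean(control)


def ols_slope(y, d):
    """OLS slope b in y = a + b * d + error (one regressor plus a constant)."""
    y_bar, d_bar = mean(y), mean(d)
    cov = sum((di - d_bar) * (yi - y_bar) for yi, di in zip(y, d))
    var = sum((di - d_bar) ** 2 for di in d)
    return cov / var


# Made-up data: three controls, then three treated units.
y = [3, 5, 4, 8, 9, 7]
d = [0, 0, 0, 1, 1, 1]
print(difference_in_means(y, d), ols_slope(y, d))  # both equal 4
```

Once covariates or stratum indicators are added, the two numbers can differ somewhat; a large gap is the kind of discrepancy that, as noted above, should prompt a closer look.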
The stars, or asterisks, for statistical significance are not enough for you to conclude that a program is successful. Yes? Your name, please?
AUDIENCE: Ashu. Yeah. I understand we can get the mean impact just from the difference between the two samples. How do we get a handle on the standard error and, consequently, the statistical significance?
PROFESSOR: Yeah. So again, in the very simplest case, where you just compare two groups, this is the standard t-test; there's nothing else to do. In practice, a lot of this impact estimation is done through the regression framework. However you do it, you're going to let your statistical software calculate those standard errors. Of course, you need to be careful about things you'll learn on Thursday, such as clustering, and make sure that the standard errors reflect that. But the basic idea is, you let your statistical software or the evaluator calculate those impacts.
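For the plain two-group comparison described above, the standard error and t statistic can be computed by hand. This sketch uses the unequal-variance (Welch) form of the standard error, which gives the same standard error as the pooled form when the two groups are the same size, and a normal approximation for the two-sided p-value that is only reasonable for large samples. It does not account for clustering, which the lecture leaves to Thursday; in practice, as said above, statistical software would do this. `welch_t` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(treated, control):
    """Difference in means, its standard error, t statistic, and an approximate p-value."""
    diff = mean(treated) - mean(control)
    # Unequal-variance (Welch) standard error of the difference in means.
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    t = diff / se
    # Two-sided p-value from a normal approximation; reasonable only for large samples.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, se, t, p


diff, se, t, p = welch_t([5, 6, 7], [1, 2, 3])
```

The t statistic says nothing about size: with very large samples, a difference too small to matter in practice can still produce a large t, which is the practical-significance caution from step eight.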
But as a proxy, if the two means are not different, then it's going to be hard to argue that this program had a big effect. OK. So, random. As I said at the beginning, can anyone tell me what the term "random" means? Yes?
AUDIENCE: Chosen by chance.
PROFESSOR: Oh, you work for public opinion polls. I should have asked you. All right. So, "chosen by chance." What does that mean?
AUDIENCE: [INAUDIBLE] One can say random if there's no systematic trend behind the selection.
PROFESSOR: OK. Systematic trends. So you don't have someone saying, "You go here, you go there." So suppose I wanted to do a random assignment in this classroom, and I went here, closed my eyes, and threw a ball right here. I don't see where I'm throwing. I just throw it. The person who gets it falls into the treatment group. Is that random?
AUDIENCE: No.
PROFESSOR: Why not? I already turned that way, right?
AUDIENCE: Maybe you like the sun.
PROFESSOR: Maybe I like the sun. And the people sitting near the sun may be different from the people who are not. Who knows. The key thing is that when we say random, particularly in a simple randomized experiment, what we mean is that everyone, every single one of you, has the same probability of being selected into the treatment group, or into one of the groups; let's say the treatment group. So the key thing here is that Iqbal, Brook, Jamie, Jessica, Farah, everyone in this room, if we do a simple random assignment, should have the same probability of being assigned to the treatment group. So it has a precise statistical definition. It's not just someone saying, "Oh, yeah. We can't remember how we did it. It must have been random." No. It has a very, very precise definition.
Because if someone tells you it was random, and you trust that word and start doing your study, and three years later you discover it wasn't random, you are not going to be very happy with yourself. So there are variations on this. If you have stratified, it doesn't mean that everyone must have the same probability; it means everyone within a stratum does. But the basic idea is, before we do random assignment, we should know everyone's probability of being selected. And when I say the same probability of being selected into the treatment group, that probability doesn't need to be a half. It could be a third. It could be two thirds. From a statistical power perspective, you prefer half and half. But whatever it is, all of you should have the same probability of being selected. Make sense? OK.
AUDIENCE: In your example of drawing the ball, is that a random assignment?
PROFESSOR: Right. So again, it depends on the details of how you do it.
But suppose we have balls for, I don't know, 30 participants, or however many you are, numbered from 1 to 30, and you mix the bag, and you really trust the physics that, after mixing, all the balls have the same chance of being selected, and you draw one ball from the bag-- all the balls had the same chance of being selected, so all of you had the same chance of being selected.
AUDIENCE: But the second person-- so when you draw one, that's 1 out of 30.
PROFESSOR: Yes.
AUDIENCE: But the second time you do it, you could have a--
PROFESSOR: So if the sample size is very, very small, you worry about sampling with replacement versus without replacement. If the population from which you're drawing is very small, you may have an issue with that if you do it sequentially like that. If the population is large, the difference between 1 in 1,000 and 1 in 999 is going to be pretty small. If you do it on a computer, you can have a randomizing device that just generates a random number for each person, and then you pick the first half. OK.
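The computer-based procedure just described, drawing a random number for each person and treating the first part of the sorted list, can be sketched in Python, along with the stratified variant mentioned earlier, in which the same draw is done separately within each stratum. Both function names and the `p_treat` parameter are illustrative. With `n` people, each person's chance of treatment is `round(p_treat * n) / n`, which equals `p_treat` when `p_treat * n` is a whole number; the fixed seed is only there to make a draw reproducible.

```python
import random


def assign_complete(ids, p_treat=0.5, seed=42):
    """Give each id a random number, sort by it, and treat the first
    round(p_treat * n) ids. Every id has the same chance of treatment."""
    rng = random.Random(seed)
    keyed = sorted(ids, key=lambda _: rng.random())
    n_treat = round(p_treat * len(ids))
    treated = set(keyed[:n_treat])
    return {i: int(i in treated) for i in ids}


def assign_stratified(ids, stratum_of, p_treat=0.5, seed=42):
    """Run the same draw separately within each stratum, so that everyone in
    a given stratum has the same probability of treatment."""
    rng = random.Random(seed)
    assignment = {}
    for s in sorted({stratum_of[i] for i in ids}):
        members = [i for i in ids if stratum_of[i] == s]
        keyed = sorted(members, key=lambda _: rng.random())
        n_treat = round(p_treat * len(members))
        for rank, i in enumerate(keyed):
            assignment[i] = int(rank < n_treat)
    return assignment


ids = list(range(30))
halves = assign_complete(ids)                 # 15 treated
thirds = assign_complete(ids, p_treat=1 / 3)  # 10 treated
strata = {i: "A" if i < 20 else "B" for i in ids}
by_stratum = assign_stratified(ids, strata)   # 10 treated in A, 5 in B
```

Keeping the seed and the code on record is one way to avoid having to take it on trust, years later, that an assignment was random.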
So is random assignment the same as random sampling? I see "no"? Yes?
AUDIENCE: No.
PROFESSOR: No. I need a little bit more than that.
AUDIENCE: With random assignment, you would have already narrowed down to a smaller sample, and you assign within that sample. Random sampling would be taking a group out of a whole population.
PROFESSOR: OK. Very good. So one way to think about this is that you have your target population, and then you have potential participants. These may be the children you're targeting in your intervention. And then you have your evaluation sample. Here's where the random sampling could occur. So-- sorry, I forgot your name.
AUDIENCE: I didn't tell you.
PROFESSOR: You didn't tell me. This is even worse.
AUDIENCE: Jean.
PROFESSOR: So what Jean is saying is that random sampling happened at this stage, or could have happened at this stage.
What random sampling buys you is the ability to generalize from your evaluation to this population here. Whether this is a population of policy interest or not is a different matter, but that's what random sampling buys you. What random assignment does comes once you have the sample. So suppose there are 100,000 potential participants. You don't have money to enroll 100,000 people in a program or in an evaluation. If you pick 5,000 of these 100,000 at random, the results of your study are going to be generalizable to the 100,000. Now, within these 5,000, you do random assignment, and you assign them to a treatment group and to a control group. Maybe of these 5,000, 2,500 fall here and 2,500 fall here. What random assignment buys you is that these two groups are identical, so any difference you observe in outcomes is due to the program. That's internal validity. It has to do with causal inference about the 5,000 that are here. Where the 5,000 generalize to is a question of external validity. So they both have the word "random," but these are two different concepts.
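The two steps in the 100,000 / 5,000 / 2,500 example can be written as two separate draws. In this minimal sketch, `sample_then_assign` and its seed are hypothetical: `random.Random.sample` picks the evaluation sample from the population (the step that supports external validity), and a separate shuffle splits that sample into treatment and control halves (the step that supports internal validity).

```python
import random


def sample_then_assign(population_ids, n_sample, seed=3):
    """Step 1: randomly sample n_sample ids from the population.
    Step 2: randomly assign that sample to treatment and control halves."""
    rng = random.Random(seed)
    # Random sampling: who is in the evaluation (external validity).
    sample = rng.sample(population_ids, n_sample)
    # Random assignment: who gets the program within the sample (internal validity).
    # rng.sample already returns a random order; the separate shuffle just keeps
    # the two steps visibly distinct.
    rng.shuffle(sample)
    half = n_sample // 2
    return sample[:half], sample[half:]


treatment, control = sample_then_assign(list(range(100_000)), 5_000)
```

Only the second step is what makes this a randomized experiment; as the next exchange notes, the first step can be skipped, at the cost of being less sure who the results generalize to.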
Again, random assignment relates to internal validity, causal inference. Random sampling relates to external validity. Yes?
AUDIENCE: My name is Cornelia.
PROFESSOR: I should know it by now.
AUDIENCE: I haven't said it yet. Can you do one and not the other? Not really? Do you have to--?
PROFESSOR: You can, you can. In fact-- well, sorry. If it's called a randomized experiment, this one has to be there. This is what defines a randomized experiment: there was random assignment.
AUDIENCE: So you can do random assignment even if your sampling is not random.
PROFESSOR: That's right. What that means is that you then need to think about who you generalize to. All right. So, advantages and limitations of experiments.
For those of you who are a little bit more statistically inclined, the key thing about random assignment is that not only are the two groups the same on average, but the statistical distribution of the two groups is the same. And this is very powerful for a lot of the adjustments that come at a later stage, particularly when there are crossovers and similar things. The idea is that the two groups look the same on both observable and unobservable characteristics, not only on average but across the whole distribution. So they have the same variance, the same 25th percentile, the same 75th percentile. And of course, when I say the same, again, it's in a statistical sense, subject to sampling error, which we can account for. And so there are-- yes?
AUDIENCE: That doesn't necessarily mean that they're both anomalies.
PROFESSOR: No, no, no.
AUDIENCE: [INAUDIBLE]
PROFESSOR: Anything. Yeah. Anything. But the distribution should look the same. OK.
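The claim that random assignment balances whole distributions, not just means, can be checked in a simulation. The sketch below assumes a made-up skewed characteristic (log-normal, deliberately not normal), assigns each person to treatment with probability one half, and compares the quartiles, via `statistics.quantiles`, and the standard deviations of the two groups. The function name, distribution, and sample size are assumptions for illustration.

```python
import random
from statistics import pstdev, quantiles


def compare_distributions(n=20000, seed=11):
    """Randomly assign n people and compare a skewed characteristic across groups."""
    rng = random.Random(seed)
    # Hypothetical skewed characteristic (log-normal, e.g. income), deliberately not normal.
    income = [rng.lognormvariate(10, 0.8) for _ in range(n)]
    assigned = [rng.random() < 0.5 for _ in range(n)]
    treat = [x for x, a in zip(income, assigned) if a]
    control = [x for x, a in zip(income, assigned) if not a]
    return {
        # quantiles(..., n=4) returns the 25th, 50th and 75th percentiles.
        "quartiles_treat": quantiles(treat, n=4),
        "quartiles_control": quantiles(control, n=4),
        "sd_treat": pstdev(treat),
        "sd_control": pstdev(control),
    }


summary = compare_distributions()
```

With 20,000 people the two groups' quartiles should typically be within a few percent of each other; with small samples the gaps are larger, which is the sampling error mentioned above.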
So, no systematic differences between the two groups. This is deliberately a repeated slide; I didn't forget to take it out of the presentation. Key advantage, key takeaway message: these two groups do not differ systematically at the outset, so any difference you observe should be attributable to the program. And this is under the big assumption that the experiment was properly designed and conducted. Not every experiment will achieve this.
So, other advantages of experiments. Relative to results from non-experimental studies, they're less subject to methodological debates. So, a lot more boring conversations in academic seminars. There may be some questions about what question is being answered, and there may be some questions about things that happened in the field that may have threatened the experiment. But the basic notion that, if it was done properly, the two groups should look alike is never debated. Whereas with non-experimental methods, that is the whole central claim of the seminar and of the presenter. They're easier to convey.
1427 01:01:23,780 --> 01:01:25,250 You can explain to people, look. 1428 01:01:25,250 --> 01:01:27,570 These two groups look alike at the beginning. 1429 01:01:27,570 --> 01:01:29,240 Now there's a difference. 1430 01:01:29,240 --> 01:01:31,100 It must have been the program. 1431 01:01:31,100 --> 01:01:34,240 And they're more likely to be convincing to program funders 1432 01:01:34,240 --> 01:01:35,920 and/or policymakers. 1433 01:01:35,920 --> 01:01:39,460 If they find it more credible, easier to convey, it's more 1434 01:01:39,460 --> 01:01:40,960 likely that they will take action. 1435 01:01:40,960 --> 01:01:44,210 Although in this respect, I can't emphasize enough what 1436 01:01:44,210 --> 01:01:46,870 Rachel said, which is, look. 1437 01:01:46,870 --> 01:01:50,090 If you have the right question, then answering that 1438 01:01:50,090 --> 01:01:53,650 question is going to be important to lead to change. 1439 01:01:53,650 --> 01:01:55,490 If you have the wrong question, even if you did a 1440 01:01:55,490 --> 01:01:58,500 nice experiment, it's not going to help you that much. 1441 01:01:58,500 --> 01:02:00,964 Yes? 1442 01:02:00,964 --> 01:02:06,420 AUDIENCE: I've been to the conference two months ago. 1443 01:02:06,420 --> 01:02:12,372 Some people were arguing that last first advantage that is 1444 01:02:12,372 --> 01:02:15,950 with randomization-- 1445 01:02:15,950 --> 01:02:18,105 that's random assignment-- 1446 01:02:18,105 --> 01:02:28,200 how to build two groups that are identical to each other. 1447 01:02:28,200 --> 01:02:33,030 And some people argue that you will almost never find a 1448 01:02:33,030 --> 01:02:39,380 context where you will have that situation occur. 1449 01:02:39,380 --> 01:02:43,690 The way the government programs operating in most 1450 01:02:43,690 --> 01:02:51,710 cases, it is almost impossible that you find an exact 1451 01:02:51,710 --> 01:02:55,630 identical treatment group and control group. 
PROFESSOR: See, the key thing here is that you don't need to find it. It's not like you have a treatment group and then you search the whole country for the control group. No. This method forces the two groups to be the same. As long as there are some people who are going to be served by the program and some who are not, if you randomly assign people to these two groups, the two groups should be identical. Not because you were very smart and searched for the other group, no. Random assignment is precisely for those of us who don't think we can come up with that other group on our own.
So there may be issues with whether you have enough program applicants to divide them into two groups, participants and non-participants. But in contexts where you're not serving everyone-- so if you don't have money to serve 1,000 people, and 1,000 people applied to your program, and you only have 400 slots, that's not going to-- this goes to the ethical issue, which we'll discuss in a second. The only thing that changes is how you select those 400.
But once you've selected them randomly, those two groups should look identical. Again, not because you were incredibly astute and said, oh, here's another group. No. This happens through the flip of a coin. It's not a question of whether the researcher can find a group, or of whether the context is a developing or a developed country. It's a technique that can be applied in any setting. Again, you're going to have a case where you see a spreadsheet, and you can do the random assignment and see for yourself that the two groups look similar. OK?
AUDIENCE: Do the two groups have to be the same size?
PROFESSOR: No, it's not necessary. In fact, in practice, suppose you had 1,000 applicants and money to serve 600. Then no matter what the statistician says-- oh, it would be nice to have 500 and 500-- you're not going to leave 100 people unserved just because you want to keep a half-half ratio.
From a statistical perspective, it's ideal to have a 50-50 ratio, but only from a statistical perspective. If you deviate too much from 50-50, then you get into trouble. So if you get to-- I don't know, the rule of thumb may be different for different people. But if you go beyond 70-30, I would say you're probably going to lose a lot of statistical power by doing that.
AUDIENCE: Yeah, but in some cases, for example, a country needs to set priorities for aid across about 200 hospitals. And in my country, there is one hospital that is the most important public hospital in Honduras. So you can apply this randomized process, but you cannot leave out this particular hospital, because it's too important. We call that a [UNINTELLIGIBLE] [? represented ?] subject for this type of problem: one that has a probability of 1 of being in the sample. I don't know if you understand my Spanglish.
PROFESSOR: No, no. I speak Spanish.
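The cost of an unbalanced split can be sketched with a standard textbook result, assuming a simple difference in means with the same outcome variance in both arms: for a fixed total sample, the variance of the estimated effect is proportional to 1 / (p(1 - p)), where p is the share treated. The helper names below are hypothetical, and the 70-30 threshold in the lecture is the speaker's rule of thumb, not something this calculation settles.

```python
def variance_factor(p):
    """Variance of a difference in means is proportional to 1 / (p * (1 - p))
    for a fixed total sample size, where p is the share assigned to treatment."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))

def relative_efficiency(p):
    """Precision of a p:(1-p) split relative to 50-50 (1.0 means no loss)."""
    return variance_factor(0.5) / variance_factor(p)

def sample_inflation(p):
    """How many times larger the total sample must be to match 50-50 precision."""
    return variance_factor(p) / variance_factor(0.5)

for share in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{share:.0%} treated: efficiency={relative_efficiency(share):.2f}, "
          f"sample needed x{sample_inflation(share):.2f}")
```

Under these assumptions the 600-of-1,000 example loses only about 4% of precision, a 70-30 split needs about 19% more observations than 50-50, and a 90-10 split needs almost 2.8 times as many.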
We can communicate here. So the key thing is, again, you're trying to create comparable groups. If for some reason you need to serve a hospital because the president of your country says you need to serve this hospital, that's fine. One slot. But that hospital should not be part of your study, because that hospital was not randomly assigned. That's all; as simple as that. And you may have a few of those. I can tell you from my own experience: we're trying to implement random assignment in Niger, in a program financed by the Millennium Challenge Corporation, a program about building schools. We said we're going to do random assignment, and they said, yes, yes, yes. Well, the US ambassador visited two of the villages, and he promised them they were getting schools. Now, you tell me if you want to be the evaluator who tells those villages, no, no, we're going to put you in the pool of-- No way. Those two villages are going to get their schools, but they're not part of our evaluation.
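The rule described here, serving the politically promised units but keeping them out of the study, can be sketched as follows. The village names, the slot count, and the function name `assign_with_exemptions` are hypothetical; the point is only that exempt units use up program slots without entering either evaluation arm, and the lottery runs over everyone else.

```python
import random

def assign_with_exemptions(applicants, slots, exempt, seed=None):
    """Serve exempt units outside the study, then run a lottery for the rest.

    Returns (served_outside_study, treatment, control). Only treatment and
    control are used to estimate impact; exempt units were not randomized.
    """
    exempt_set = set(exempt)
    outside = [a for a in applicants if a in exempt_set]
    if len(outside) > slots:
        raise ValueError("more exempt units than available slots")
    pool = [a for a in applicants if a not in exempt_set]
    rng = random.Random(seed)
    rng.shuffle(pool)
    remaining = slots - len(outside)          # slots left for the lottery
    return outside, pool[:remaining], pool[remaining:]

villages = [f"village_{i:03d}" for i in range(1, 101)]
promised = ["village_007", "village_042"]     # hypothetical promised units
outside, treatment, control = assign_with_exemptions(
    villages, slots=40, exempt=promised, seed=3)
print(len(outside), len(treatment), len(control))  # 2 38 60
```

The two promised villages still receive the program, but the impact estimate compares only the 38 lottery winners with the 60 lottery losers.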
AUDIENCE: Is there an acceptable margin?
PROFESSOR: See, that's again the Jamaica question. I won't make that mistake again. I won't tell you. On Thursday you're going to see a whole session on statistical power, and you're going to get a sense of where you are. You don't want to have too many of those: first, because you lose sample size, and second, because you lose representativeness. In the case of the hospital in Honduras, if that's the hospital where 90% of the activity is happening, then it's a little hard to leave that hospital out of your study. So that is an important issue.
All right. There are limitations of experiments, believe it or not. The first one is that, despite the huge methodological advantages, you still need to worry about these issues of internal validity and external validity. And what I would say about this is, on Friday you're going to learn a lot about how to deal with these internal validity issues, and I'm not going to go over them now.
But the key thing is, if you can avoid them from the beginning, in terms of how you design your program and how you implement it, then much better. External validity issues: as Rachel said, any impact evaluation conducted in a particular setting is going to have external validity issues. But experiments are particularly prone to them, because they're sometimes done in particularly concentrated areas where you really want to find out whether the program works before expanding it. So the external validity issue is there. As Rachel said, if you can design an experiment to test each step in your theory of change, that usually helps with external validity. And of course, so does replicating the evaluation in other settings.
AUDIENCE: So OK, you're going to have 10 variables with internal validity, equal internal validity, but only three variables with external validity?
PROFESSOR: When you say three variables, what do you mean by variables?
AUDIENCE: The variables that you are-- variables. The study variables. I mean, when you evaluate internal validity, you're going to have 10 variables, or 20.
PROFESSOR: Well, for internal validity, the two groups should be the same. And you have pretty strong internal validity if you can deal with this problem.
AUDIENCE: When you get to external validity, maybe not all 20 variables will have external validity. But maybe three or four will, where you have done different experiments in--
PROFESSOR: It really depends on the context of your project. Again, I think the good example is deworming. Deworming takes out worms. Well, in Honduras, if school-age children don't have worms, and worms are not the reason they don't go to school, then that program in Kenya doesn't have much external validity, or generalizability, to Honduras. So you need to think about how the effect is supposed to happen. And here there was the anemia channel, which may or may not work in the case of Honduras. You need to ask: what is the chain? And then see whether that chain is likely to hold in whatever other context you want to apply the program to. There's no magic formula here.
AUDIENCE: Yeah, but are you going to control the theoretical framework with just three or four variables, because those variables will be common across different countries?
PROFESSOR: Yeah, but you could have 200 variables. You can say it depends on so many things. But there's a limit to how much-- external validity is an issue you can always hide behind. You can always say, oh, this program worked in Kenya; who knows whether it would work somewhere else? And if you take that attitude, then you can't learn anything from a randomized experiment, or from any impact evaluation that's done in a specific setting. Because even if you did it in Kenya at a particular point in time, you can always say, well, it worked in Kenya ten years ago, but maybe it won't work today. So I lean toward the middle ground here. You think about the critical steps or stages through which the program can work, and then go implement it, and maybe evaluate it. I think my answer here is that external validity issues are going to be present for both experiments and non-experiments. There is no magic formula here.
As long as you evaluate in a particular setting, you're still going to be subject to the question: does it work in some other setting?
Some of these threats also affect the validity of non-experimental studies. The key thing, though, is that in non-experimental studies you may not even realize that you have the threat, because you've already done something that blinds you to it.
So, other limitations: the experiment measures the impact of the offer of the treatment. When we implement the program and say, OK, you are in the treatment group, you're going to get the program, then, as you know from implementing these programs in the field, not all of the people you offer the program to are going to take it up. What the experiment buys you is that the whole treatment group is comparable to the whole control group. So the experiment is going to tell you the average impact for the whole treatment group. Some of them may not have received the program, and they will dilute the estimated impact of the program.
But technically, that's the impact the experiment is estimating. So if you have a program with a very low take-up rate, you need to worry that the non-takers are going to dilute the effect of the program. You can then go and calculate the effect of the program for those who participated. But then you start relying on non-experimental assumptions; you've lost a bit of the advantage of the experiment. So that's something you need to think about when you do an experiment.
There's also a limitation in that experiments can be costly. I'll just say two things about cost. I'll say three things about cost. And I did learn that I should never say "I'll say three things," because I'll forget what those three things are. But I think I'll keep them in mind. The first thing: a lot of the cost of an experiment is data collection. So if you are trying to evaluate the impact of a program through some other non-experimental method that involves data collection, the two costs are already pretty comparable.
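A simulation sketch makes the dilution concrete. It assumes made-up numbers: a true effect of 10 for people who take up the program, 40% take-up among those offered, no effect of the offer on anyone who declines, and no way for the control group to enroll. The intent-to-treat (ITT) estimate compares everyone offered with everyone not offered. Dividing it by the take-up rate (a scaling often called the Bloom or Wald adjustment) recovers the effect on participants only under that extra no-effect-on-decliners assumption, which is the kind of additional assumption the lecture warns about.

```python
import random
import statistics

def simulate_offer(n=20000, take_up=0.4, effect_on_takers=10.0, seed=0):
    """Simulate a hypothetical trial where only some of those offered take up.

    Assumptions (all illustrative): the offer is randomized 50-50, the
    program raises the outcome by `effect_on_takers` for takers only, the
    offer does nothing for people who decline, and controls cannot enroll.
    Returns (itt_estimate, observed_take_up_rate).
    """
    rng = random.Random(seed)
    offered, control = [], []
    n_takers = 0
    for _ in range(n):
        outcome = rng.gauss(50.0, 15.0)          # outcome without the program
        if rng.random() < 0.5:                   # randomly offered the program
            if rng.random() < take_up:           # chooses to participate
                outcome += effect_on_takers
                n_takers += 1
            offered.append(outcome)
        else:
            control.append(outcome)
    itt = statistics.mean(offered) - statistics.mean(control)
    return itt, n_takers / len(offered)

itt, rate = simulate_offer()
print(f"ITT (effect of the offer): {itt:.2f}")
print(f"Take-up rate among those offered: {rate:.2f}")
print(f"ITT / take-up (needs the extra assumption): {itt / rate:.2f}")
```

With these inputs the ITT lands near 4 (10 times 0.4), which is exactly the dilution described above; the rescaled figure lands near 10 only because the simulation builds in the assumption that decliners are unaffected.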
Because, again, data collection is a big cost. If you had a non-experimental method where you don't have to collect data, obviously there's no question that it is going to be cheaper. So experiments can be costly, but again, the main cost is data collection, which may be the same for non-experimental studies that collect data. The other thing about experiments in terms of cost is that the same sample size buys you more statistical power. You may see some of this on Thursday. So if you have a sample of 1,000 people for an experimental study and a sample of 1,000 people for a non-experimental study, the data collection costs will be identical, but they will buy you different statistical power. So that's one thing to keep in mind about the cost of experiments. And the last thing is, you need to factor in the cost of getting the wrong answer. If you really think that non-experimental methods are not going to work in your particular context, then there's little point in spending less money if you don't think you're going to get the right answer.
And again, I don't want to push the notion that only with an experiment will you get the right answer. But if you think that with a non-experiment you won't get the right answer, then you have to weigh the cost of the wrong answer, the risk of a wrong answer.
Ethical issues. Throw them at me.
AUDIENCE: How do you say no to people who come to you saying, I want to be in this program, I have all the characteristics you're asking for, you're offering it to my neighbor, so how come you're not offering it to me?
PROFESSOR: OK. The first thing to think about here is that experiments are typically done in contexts where there's excess demand, where more people want to be in your program than can be served by it. And if that's the case, suppose you had 1,000 people who applied to your program, and you can only serve 400. The question I ask you, Cornelia-- and only you-- is how many people are you going to have to tell, sorry, I can't serve you? 600. Both in the context of an experiment and in the context of a non-experimental study.
The only thing that changes is how you decide who those 600 people are. It's the only thing that changes. And in fact, in some contexts, the flip of a coin can seem fairer than you deciding, I think this person is more deserving, or this person-- So in that context, where you're going to have to turn people away anyway, the ethical objections are, in my mind, much harder to justify. I'm not saying there are no ethical issues in experiments. There are some contexts in which there are ethical issues. If you are completely convinced that your program works, then why are you going to do this whole randomized experiment? The only thing I can tell you is that a lot of people have been very convinced that some programs work, and then they turned out not to work. But if you are completely convinced that the program works, then you shouldn't be doing an experiment. And the other thing is, if you are testing an intervention that you think can harm people, then there are ethical issues involved.
I don't think anyone would be very fond of doing an experiment to find out whether smoking causes lung cancer, for example. We don't have experimental evidence, but the medical evidence seems to be pretty strongly in favor of that. Maria Teresa?
AUDIENCE: A consequence of that ethical question, which was hard for me, concerned the people who are chosen to be in the program and the people who are not. You have to come back to the people who are not and follow up with them. And how willing are they to cooperate, to let you collect more data, to talk with you? And you know, working [UNINTELLIGIBLE] is really hard, because you take two hours of the farmer's time every couple of months, coming back and standing there, while the other guy received something for the two hours he gave you. So I think that is the-- Maybe you need to apply this more often.
PROFESSOR: Yeah. So, again, I think there are things you can try to do to deal with that.
That has more to do with the implementation of any study in which you have a comparison group. It's not specific to experiments. An experiment has a control group, but any other study with a comparison group where you're collecting data faces this issue. And there are things you can do. It depends on the program, but sometimes offering a small incentive to people in both groups to fill in the survey is certainly one thing that could help. The other thing that I think is very important is data collection itself. When the average researcher is asked, do you want to add one more question to the survey?, the probability of saying yes is 99%. So if you have two hours in the field, you have to start thinking: how many of these questions do I really need to be asking? I mean, that's an issue of implementation versus-- So I think there are ways to deal with this. But again, it's not unique to experiments.
It really has to do with how you implement any study in which you're going to collect data on people who are not receiving any benefit. Yes? Ethical issues? AUDIENCE: Nigel. I think an answer which-- PROFESSOR: Nigel. You are from the Kennedy School. Very nice to meet you. AUDIENCE: I'm leaving next week. On the issue itself: even if you had enough money to give to all 1,000 people, you can't reach them all today. So the way to do it is to say, OK, we'll do 500 this year and 500 next year. That way you're reaching all 1,000 people, but you do your randomized evaluation in year one. PROFESSOR: Exactly. And tomorrow there are going to be two sessions on how to do rollout designs-- there's a bunch of designs that apply the same principle. AUDIENCE: When you think about the cost of the study, don't you think a question you should deal with very early on is the size of the impact that you're looking for? PROFESSOR: Absolutely.
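The phase-in idea described here, treating 500 people this year and the other 500 next year, amounts to randomizing the order of the rollout. A minimal Python sketch, assuming individual-level assignment and equal-sized phases; the function name `assign_phases`, the participant IDs, and the seed are hypothetical and not taken from any particular study:

```python
import random


def assign_phases(ids, n_phases=2, seed=2024):
    """Randomly assign participants to rollout phases of (near-)equal size.

    Phase 1 is treated first; participants in later phases serve as the
    comparison group until their own phase begins.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    shuffled = list(ids)
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), n_phases)
    phases = {}
    start = 0
    for phase in range(1, n_phases + 1):
        # The first `extra` phases each absorb one leftover participant.
        end = start + base + (1 if phase <= extra else 0)
        for person in shuffled[start:end]:
            phases[person] = phase
        start = end
    return phases


participants = [f"person_{i:04d}" for i in range(1000)]
phases = assign_phases(participants)
year_one = [p for p, ph in phases.items() if ph == 1]  # treated in year one
year_two = [p for p, ph in phases.items() if ph == 2]  # comparison group during year one
print(len(year_one), len(year_two))  # 500 500
```

Comparing year-one outcomes between the two groups gives the impact estimate. Once the second group is phased in, the comparison group disappears, so in this sketch the design only measures effects over the first year.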
AUDIENCE: If the study is going to cost me a lot of money, and there's a significant probability that the program might have only a small effect, then maybe it isn't worth bothering with. You talked about looking at the size of the effect and the statistics, and whether it's statistically significant. But that size question, it seems to me, gets looked at very late. It should be way up front, in the very early days, because it determines whether the program is really of interest and worth following. PROFESSOR: So, two quick reactions. The first one is what Rachel said: think strategically about impact evaluations. You don't want to evaluate every single thing in your organization or every single thing under the sun. You're not going to be able to do an impact evaluation on all of those things. You may do other kinds of evaluations on, hopefully, most of your programs, but with impact evaluations you should be very strategic about where you do them. And if you think a program is not generating much impact and is not costing you that much money, then you may say, I'm not going to evaluate it.
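The link raised in this exchange between the expected size of the impact and the cost of the study runs through sample size. A minimal Python sketch using the standard normal-approximation formula for comparing two means, assuming a two-sided test, equal-sized arms, individual (not clustered) randomization, and an effect expressed in standard-deviation units; `sample_size_per_arm` is a hypothetical helper, and the details are left to the sample-size session:

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect a difference in means.

    effect_size: smallest effect worth detecting, in standard-deviation units.
    Normal approximation, two-sided test, equal-sized arms, no clustering:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / effect_size**2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.5, 0.2, 0.1):
    print(f"effect of {d} SD -> {sample_size_per_arm(d)} per arm")
# effect of 0.5 SD -> 63 per arm
# effect of 0.2 SD -> 393 per arm
# effect of 0.1 SD -> 1570 per arm
```

Because the required sample scales with one over the squared effect, halving the effect you want to detect roughly quadruples the sample, which is why a program expected to have a large effect can be evaluated with less data collection. Clustered designs need a further design-effect adjustment that this sketch ignores.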
The second thing I would say is that thinking about the effect of the program is something you need to do at stage one, when designing the study. This connects with the session on sample size that Esther will give on Thursday, because how large you expect the impact to be affects your sample size calculations. The paradox in all of this, despite what you said, is that the bigger the effect of the program-- so if you expect this program to have a huge effect-- the smaller the sample size you need, and hence the lower the data collection costs. So, paradoxically, if the program is expected to have a very large effect, data collection should actually cost less than for a program where you want to detect very small effects. Having said that, you want to evaluate the programs that make strategic sense for you to evaluate. One thing I think you should try to avoid, despite all our enthusiasm for randomized experiments: you shouldn't leave this course thinking, OK, where do I see an opportunity to randomize?
And then forget about what it is you're trying to do. You know, you may find a great opportunity to randomize, but if it doesn't answer a question you care about, you've just wasted money. All right, so-- you have a question? This is very interesting. AUDIENCE: I want to know, do you think one can carry out an impact evaluation in any context? For any type of program-- PROFESSOR: So my answer to that is no, not in every context. But probably in more contexts than you think. That is my short answer. AUDIENCE: What about, for example, infrastructure--? PROFESSOR: There have been some. It's harder to do, but there have been some studies. This is actually, I think, a growing area, where people are trying to do impact evaluations. I mean, if you're building a road in the middle of the country, and this is the one road for the whole country-- you can't do it. But it's OK.
You don't need to do an impact evaluation for everything you do, and you don't need to do a randomized impact evaluation for everything you do. What I do hope comes across clearly is that if you decide to do an impact evaluation, a randomized design should be your first choice. If you can't do it-- and "can't do it" doesn't mean, oh, there are some issues-- no, no. "Can't do it" means you really tried, given all these advantages, really tried-- if you still can't do it, then you may consider doing other things. But this should be your first option if you decide to do an impact evaluation. All right. Partial equilibrium. It's a little bit more technical. It matters when a program's effect on one person depends on how many other people receive it. So suppose you had a program that was going to train people to write better resumes. If you only do it for a few people, then this program may have a huge effect. But if you do it for everyone in your town, there's going to be little advantage gained from it.
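The resume example can be made concrete with a toy simulation. This is a hypothetical model, not from the lecture: assume the town has a fixed number of jobs that go to the applicants with the best resumes, and training raises resume quality by a fixed `boost`. The function name and all the numbers are illustrative:

```python
import random


def simulate(treated_share, n_people=2000, n_jobs=400, boost=1.0, seed=7):
    """Toy labor market (hypothetical): a fixed number of jobs goes to the best resumes.

    Returns employment rates for trained (treated) people, untrained
    (control) people, and the whole town.
    """
    rng = random.Random(seed)
    base = [rng.gauss(0, 1) for _ in range(n_people)]  # resume quality without training
    n_treated = int(treated_share * n_people)
    treated = set(rng.sample(range(n_people), n_treated))
    scores = [q + (boost if i in treated else 0.0) for i, q in enumerate(base)]

    # Employers hire the n_jobs applicants with the highest scores.
    ranked = sorted(range(n_people), key=lambda i: scores[i], reverse=True)
    hired = set(ranked[:n_jobs])

    def rate(group):
        # Share of the group that got a job; None if the group is empty.
        return sum(i in hired for i in group) / len(group) if group else None

    control = [i for i in range(n_people) if i not in treated]
    return {
        "treated_rate": rate(list(treated)),
        "control_rate": rate(control),
        "overall_rate": len(hired) / n_people,
    }


small = simulate(0.10)  # an experiment that trains 10% of the town
full = simulate(1.00)   # training rolled out to everyone
print(small)
print(full)  # treated_rate 0.2, control_rate None, overall_rate 0.2
```

In the 10% experiment the trained group is hired far more often than the control group, yet town-wide employment is 20% in every scenario because the number of jobs never changes: in this model the experimental gain comes entirely at the expense of untrained applicants.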
And so the randomized experiment estimates a partial equilibrium effect. You don't know what would happen if everyone in a particular setting got the treatment. I think this is important in some settings, but not enough. All right. So I'm not going to go too much into get out the vote, because we're already a minute away from time. What I want to do is just show you this table here. You already discussed it. So this is what the case study shows. This is a situation where you had five methods to estimate impacts. The first four methods found that the program had an effect. The last method, the randomized experiment, found no statistically significant effect. I'm not saying that in every single-- this goes back to your question. I'm not saying that this will happen in every single setting. But this is a good example of a setting in which, if you had gone with any of these other techniques, you would have concluded the program had an effect when it didn't. And there are other settings where the reverse may happen.
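One way non-experimental comparisons can find an effect that the experiment does not is self-selection: whatever drives participation also drives the outcome. A minimal Python simulation of that mechanism, assuming a hypothetical, unobserved "motivation" that raises both enrollment and the outcome, and a true program effect of zero; this illustrates the mechanism only and does not reproduce the get-out-the-vote data:

```python
import random
from statistics import mean


def compare_designs(n=20000, true_effect=0.0, seed=11):
    """Estimate a program's effect two ways when the true effect is known.

    Returns (self_selected_estimate, randomized_estimate), each a simple
    difference in mean outcomes between participants and non-participants.
    """
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]  # unobserved by the evaluator

    # Self-selection: motivated people are far more likely to enroll.
    self_selected = [rng.random() < (0.8 if m > 0 else 0.2) for m in motivation]
    # Randomization: enrollment decided by a coin flip, unrelated to motivation.
    randomized = [rng.random() < 0.5 for _ in range(n)]

    def diff_in_means(enrolled):
        # Motivation raises the outcome whether or not someone enrolls.
        outcomes = [2.0 * m + (true_effect if e else 0.0) + rng.gauss(0, 1)
                    for m, e in zip(motivation, enrolled)]
        treated = [y for y, e in zip(outcomes, enrolled) if e]
        control = [y for y, e in zip(outcomes, enrolled) if not e]
        return mean(treated) - mean(control)

    return diff_in_means(self_selected), diff_in_means(randomized)


naive, experimental = compare_designs()
print(f"self-selected comparison: {naive:.2f}")        # far from zero
print(f"randomized comparison:    {experimental:.2f}")  # close to the true effect of 0
```

The self-selected comparison attributes the motivation gap to the program; the randomized comparison does not, because the coin flip is independent of motivation by construction. In real data the evaluator cannot observe `motivation`, which is why the assumption behind a non-experimental method cannot be checked directly.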
And so if we were able to say ex ante, before the evaluation, that a given method is going to be just as good as the experiment, that's great. We may be able to save some money if there's no data collection involved, and that would be great. But I think the bottom line here is, we are not always able-- and I think very few people will tell you, we know when this method will work. Because the assumption each of these methods relies on is untestable-- you can't statistically test that assumption. So you may argue in favor of it. You may show evidence in favor of it. But you can't specifically test it. And that's the big advantage of the experiment. So let me just close with what I hope are the bottom lines from this. The first thing is what's underlined there: if properly designed and conducted, social experiments provide the most credible assessment of the program. But the "if" is a very important "if." Don't leave this course thinking, if it's a randomized experiment, piece of cake, everything will work. That's not the message that we want to give you here. It needs to be properly designed and conducted.
And for that, you really need a partnership between the evaluators and the agencies implementing it. Experiments are easy to understand, much less subject to methodological quibbles, and more likely to convince policymakers. These advantages are only present if they are properly conducted and implemented, and you must assess the validity of an experiment in the same way you assess the validity of any other study. You're going to face threats to an experiment anyway, and on Friday you're going to learn how to deal with some of them. I hope this was moderately helpful. I think I have one of the toughest sessions to teach, because some of you come completely convinced of why you want to randomize, some of you come very skeptical, and I have to reach a middle ground. I hope I did. If you have one more question, I'll take it. Yes? AUDIENCE: Have you found that it's possible to teach organizations to run their own randomized trials from start to finish, even if there are no economists on staff? Or does this always require the intervention or assistance of outside modulators? PROFESSOR: I think, as you will see throughout this course, conducting an impact evaluation, even a randomized one, does involve some technical skills and some practical experience. I'm not saying those cannot be found in organizations that are in the field. But if those skills are not there, it's going to be very hard to do it. Now, you can do a lot of training on how to do these things. But I think it'd be hard to do it without someone who has at least done a few of these and seen some of the problems that arise. Because problems will arise, no question about it. You will be asking the evaluator, how far can we go? And the evaluator, whoever it is, whether they're in the agency or not, needs to be able to answer that question in a way that, at the end, you have a credible evaluation. I'm not saying you need an expert outside of the organization.
But I am saying you need an expert somewhere. And whether you have that person inside or outside, there's a whole issue of independence versus objectivity that I won't speak to. AUDIENCE: Consumer companies do it. PROFESSOR: Consumer companies? AUDIENCE: Yeah. Procter & Gamble and big companies like that do experiments all the time; they build the capability into the organization, into how they make decisions. I'm just wondering whether someone leaving this course with a few experiments under their belt could implement something like this, or whether you need to go as far as getting an economics degree to be able to coordinate and evaluate this type of thing. PROFESSOR: So I think an impact evaluation usually involves more than one person, and there are different roles for different people. For some roles, good training in economics is particularly useful. For other roles, I would say it's particularly unhelpful to be an economist. So I really think it depends on what role a person leaving this course would like to play in the evaluation. And as for whether, after this course, you'll be able to run experiments on your own: I think it would be an extremely successful course if that happened. We have no way to measure the impact of this program, but if that were to happen, relative to what would have happened if you had not come to this course, that would be phenomenal. My sense is that unless you have prior training in this kind of thing, what this course will hopefully give you is the ability to be involved in an evaluation, to interact well with whoever else is involved, and to ask the evaluator the right questions. This is extremely important. And to be very aware, in the field, of what may be threatening an evaluation. If you're able to do it on your own after this, I hate to say it, but I don't think it will be because of the session you heard from me today. All right.
I think I already ate a few minutes into your time. It was a pleasure. I'll be here for a few more minutes if you want. I hope you have a wonderful rest of the course, and see you somewhere.