The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

STEFANIE TELLEX: So today I'm going to talk about human-robot collaboration. How can we make robots that can work together with people just as if they were another person, and achieve the kind of fluid dynamic that people have when they work together? These are my human collaborators; this work was done with a lot of collaborating students and postdocs.

We're really in an exciting time in robotics, because robots are becoming more and more capable and they're able to operate in unstructured environments. Russ gave a great talk about Atlas doing things like driving a car and opening doors. This is another robot that I've worked with: a robotic forklift that can drive around autonomously in warehouse environments. It can detect where pallets are, track people, pick things up, put things down. And it's designed to do this in collaboration with people who also share the environment.

There are robots that can assemble IKEA furniture. This was work I did with Ross Knepper and Daniela Rus at MIT. They made a team of robots that can autonomously assemble tables and chairs produced by IKEA. And what would be nice is if people could work with these robots. Sometimes the robots encounter failures, and the person might be able to intervene in a way that enables the robot to recover.

And the dream is robots that operate in household environments. This is my son, about nine months old when we shot this picture with a PR2. You'd really like to imagine a robot, Rosie the Robot from The Jetsons, that lives in your house with you and helps you in all kinds of ways.
Anything from doing the laundry to cleaning up your room, emptying the dishwasher, helping you cook. And this could have applications for people in all aspects of life: elders, people who are disabled, or even people who are really busy and don't feel like doing all the chores in their house.

So the aim of my research program is to enable humans and robots to collaborate on complex tasks. And I'm going to talk about the three big problems that I think we need to solve to make this happen.

The first problem is that you need a robot that can robustly perform actions in real-world environments. We're seeing more and more progress in this area, but the house is kind of a grand challenge, and John was talking about all these edge cases. So I'm going to talk about an approach we're taking to try to increase the robustness and also the diversity of actions that a robot can take in a real-world environment, by taking an instance-based approach.

Next, you need robots that can carry out complex sequences of actions. They need to be able to plan in really, really large combinatorial state-action spaces. There might be hundreds or thousands of objects in a home that a robot might need to manipulate, and depending on whether the person is doing laundry, cooking broccoli, or making dessert, the set of objects that are relevant and useful, that the robot needs to worry about in order to help the person, is wildly different. So we need new algorithms for planning in this really large state-action space.

And finally, the robot needs to be able to figure out what people want in the first place. People communicate using language and gesture, but also just by walking around the environment and doing things from which you can infer something about their intentions. And critically, when people communicate with other people, it's not an open-loop kind of communication.
It's not like you send a message and then close your eyes and hope for the best. When you're talking with other people, you engage in a closed-loop dialogue. There's feedback going on in both directions that acts to detect and reduce errors in the communication. And this is a critical thing for robots to exploit, because robots have a lot more problems than people do in terms of perceiving and acting in the environment. So it's really important that we establish some kind of feedback loop between the human and the robot, so that the robot can infer what the person wants and carry out helpful actions. The three parts of the talk are going to be about each of these three things.

So this is my dad's pantry at home. And it's kind of like John's pictures of the Google car: most robots can't pick up most objects most of the time. It's really hard to imagine a robot doing anything with a scene like this one. There was just the Amazon Picking Challenge, and the team that won used a vacuum cleaner, not a gripper, to pick up the objects. They literally sucked the objects up into the gripper and then turned the vacuum cleaner off to put things down. And the Amazon challenge had much, much sparser stuff on the shelves. We'd really like to be able to do things like this.

For what we're doing now, I'm really going to focus on a sub-problem, which is object delivery. From my perspective, a really important baseline capability for a manipulator robot is to be able to pick something up and move it somewhere else. We'd obviously love a lot more than that; we were also talking in the car about buttoning shirts, and you can go on with all the things you might want your robot to do. But at least we'd like to be able to do pick and place: pick it up and put it down. So maybe you're in a factory delivering tools, or maybe you're in the kitchen delivering ingredients or cooking utensils.
So to do pick and place in response to natural language commands-- let's say, "hand me the knife" or something-- you need to know a few things about the object. First of all, you need to know what it is. If they said, "hand me the ruler," you need to know whether or not this object is a ruler: some kind of label that can hook up to some kind of language model. Second, you have to know where the object is in the world, because you're going to actually move your gripper and the object and yourself through 3D space in order to reach it. Here I'm highlighting the pixels of the object, but you have to register those pixels into some kind of coordinate system that lets you move your gripper over to that object. And third, you have to know where on that object you're going to put your gripper. In the case of this ruler, it's pretty heavy, and it's this funny shape that doesn't have very good friction, so for our robot the best place to pick it up is in the middle of the object. There might be more than one good place, and it might depend on the gripper. And different objects might have complex things going on that change where the right place is to pick them up.

Conventional approaches to this problem fall into two general categories. The first high-level approach is what I'm going to call category-based grasping. This is the dream: you walk up to your robot, you hand it an object that it's never seen before, and the robot infers all three of those things-- what it is, where it is, and where to put the gripper. There's a line of work that does this; this is one paper from Ashutosh Saxena, and there are a bunch of others. The problem is that it doesn't work well enough. We are not at the accuracy rates that John was alluding to that we need for driving. In Ashutosh's paper, I think they got a 70% or 80% pick success rate on their particular test set doing category-based grasping.
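As an aside, here is a minimal sketch of how those three pieces of information might be bundled per object: a label that language can hook into, a pose registered in the robot's coordinate frame, and a list of candidate grasp points. The class and field names are illustrative assumptions, not the representation the actual system uses.

```python
from dataclasses import dataclass, field

@dataclass
class GraspCandidate:
    position: tuple          # (x, y, z) on the object, in the robot's base frame
    success_estimate: float  # how reliable this grasp has been so far

@dataclass
class ObjectModel:
    label: str        # e.g. "ruler" -- the handle that language can refer to
    pose: tuple       # (x, y, z, roll, pitch, yaw) registered in the robot's frame
    grasps: list = field(default_factory=list)  # candidate grasp points

    def best_grasp(self) -> GraspCandidate:
        """Return the grasp point currently believed to be most reliable."""
        return max(self.grasps, key=lambda g: g.success_estimate)
```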
And I think you have to expect that success rate to fall if you actually give it a wider array of objects in the home. And even if it doesn't fall, 80% means it's dropping things 20% of the time, and that's not so good.

The second approach is instance-based grasping. I was talking to Eric Sudderth in my department, who does machine learning, and he said instance recognition is a solved problem in computer vision. Instance recognition is: I give you a training set of this slide flipper, lots of images of it, and then your job, given a new picture, is to draw a little box around the slide flipper. This is considered a solved problem in computer vision; there's a data set and a corpus, the performance maxed out, and people have stopped working on it. And a lot of the work in robotics uses this kind of approach. We were talking about how you have some kind of geometric model-- that's the instance-based model. These models can take a lot of different forms; they can be an image or a 3D model or whatever. The problem is, where do you get that model? If I'm in my house and there are thousands of different objects, you're most likely not going to have the 3D model for the object that you want to pick up right now for the person. So there's this sort of data gap. But if you do have the model, it can be really, really accurate, because you can know a lot about the object that you're trying to pick up.

So the contribution of our approach is to try to bridge these two by enabling a robot to get the accuracy of the instance-based approach by autonomously collecting the data that it needs in order to robustly manipulate objects. We're going to get the accuracy of the instance-based approach and the generality of the category-based approach, at the cost of not human time but robot time to build the model.

Here's what it looks like on our Baxter. It's going to build a point cloud. It's got a one-pixel Kinect in its gripper, so you're seeing it do a sort of raster scan.
The video is sped up; that scan gives us a point cloud. Now it's taking images of the object. It's got an RGB camera in its wrist, and it's taking pictures of the object from lots of different perspectives. So the data looks like this: you segment out the object from the background, and you get lots and lots of images. You do completely standard computer vision stuff, SIFT and kNN, to make a detector out of this data. You also get a point cloud; this is what the point cloud looks like at one-centimeter resolution.

And after we do this, we're able to pick up lots of stuff. This is showing our robot with these two objects, localizing the object and picking things up. It's going to pick up the egg, and that's a practice EpiPen. There's a little shake to make sure it's got a good grasp.

Now, this works on a lot of objects, so let's see how it does on the ruler. The way the system works is it uses the point cloud to infer where to grasp the object, but we don't really have a model of physics or friction or slippage. So it infers a grasp near the end, because that fits in the gripper and it kind of looks like it's going to work. And it does fit in the gripper, but when we do that shake, the ruler is going to pop right out, because it's got relatively low friction. There it goes and falls out. That's bad, right? We don't really like it when our robots drop things. So before training, what happens is it falls out of the gripper. In the case of the ruler, there's physics going on: things are slipping out, and maybe we should be doing physical reasoning. I think we should be doing physical reasoning-- I won't say "maybe" about that-- but we're not doing it right now. And there are lots of reasons things can fail.
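As a side note, here is a minimal sketch of the kind of completely standard instance detector mentioned above, built from SIFT features over the scanned views and a kNN match against a new scene. It assumes OpenCV and is illustrative only, not the actual Baxter pipeline; the thresholds are placeholder values.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def build_model(scan_images):
    """Stack SIFT descriptors from every segmented view of the scanned object."""
    descriptors = []
    for img in scan_images:
        _, des = sift.detectAndCompute(img, None)
        if des is not None:
            descriptors.append(des)
    return np.vstack(descriptors)

def contains_instance(model_descriptors, scene_image, ratio=0.75, min_matches=15):
    """kNN-match scene features against the object's model (Lowe's ratio test)."""
    _, des = sift.detectAndCompute(scene_image, None)
    if des is None:
        return False
    matches = cv2.BFMatcher().knnMatch(des, model_descriptors, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) >= min_matches
```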
Other objects are problematic for different reasons. This is one of those salt shakers. It's got black handles that are great for our robot to pick up, but they're black, so they absorb the IR light, so we can't see them, and we can't figure out that we're supposed to grab there. That round bulb looks awesome, but it's transparent, so you get all these weird reflections. So our inference algorithm says, oh, that bulb, that's where we should pick it up. But it doesn't fit in the gripper, so it will very often slip out.

Our approach to solving this problem is to let the robot practice. I'm not going to go through the algorithm, but we have a multi-armed bandit algorithm that lets us systematically decide where we should pick objects up. You can give it a prior on where you think good grasps are, and you can use whatever information you want in that prior. If the prior were perfect, this would be boring: it would just work the first time, and life would go on. But if the prior is wrong for any reason, the robot will be able to detect it, fix things up, and learn where the most reliable places are to pick up those objects.

Here's an example of what happens when we use this algorithm. We practice picking up the ruler; I forget how many picks it took, maybe 20 on this particular object. One twist in the algorithm is that it decides when to stop. We go a maximum of 50 picks, but we might stop after three if all three of them work, so that you can go on to the next object to train. So here it picks it up in the middle and does a nice shake.

OK, so what we're doing now is scaling this whole thing up. This is showing our robot practicing on lots and lots of different objects. A lot of them are toys; my son likes to watch this video because he likes to see the robot playing with all of his toys. And I think playing-- I mean, it's one of those loaded cognitive science words, but I think that's an interesting way to think about what the robots are actually doing right now.
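A minimal sketch of that kind of bandit loop: each candidate grasp point is treated as an arm with a Beta posterior over pick success, the robot samples an arm (Thompson sampling), attempts the pick, updates, and stops early once a grasp has worked several times in a row. The stopping rule, the prior counts, and the try_pick interface are illustrative assumptions, not the algorithm from the actual system.

```python
import random

def practice_grasps(grasp_candidates, try_pick, max_picks=50, stop_after=3):
    """Thompson-sampling bandit over candidate grasp points (illustrative sketch).

    grasp_candidates: hashable grasp poses (e.g. tuples).
    try_pick(grasp) -> bool: the robot physically attempts the pick (hypothetical hook).
    """
    stats = {g: [1.0, 1.0] for g in grasp_candidates}   # Beta(successes+1, failures+1)
    streak = {g: 0 for g in grasp_candidates}           # consecutive successes per grasp

    for _ in range(max_picks):
        # Sample a plausible success rate for each arm; try the most promising one.
        g = max(grasp_candidates, key=lambda c: random.betavariate(*stats[c]))
        if try_pick(g):
            stats[g][0] += 1
            streak[g] += 1
            if streak[g] >= stop_after:  # grasp looks reliable; move to the next object
                break
        else:
            stats[g][1] += 1
            streak[g] = 0

    # Report the grasp with the highest posterior mean success rate.
    return max(grasp_candidates, key=lambda c: stats[c][0] / sum(stats[c]))
```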
It's doing little experiments: trying to pick up these objects in different places and recording where it works and where it doesn't. This is showing 16 of them, 32 objects counting one in each hand, being run in our initial evaluation. And at the end of this, basically, it works.

This is all the objects in our test set. Before learning, with the proposal system, which uses the depth information, we get about a 50% pick success rate. After learning, that goes up to 75%. And the other really cool thing is that this is a bimodal distribution; it's not that every object succeeds 75% of the time. The chart goes from worst to best, so the good stuff is all at one end and the hard stuff is all at the other. A lot of these objects worked eight or nine out of 10 times, or 10 out of 10 times. A lot of other objects were really hard. That garlic press, I think we picked it up one time; it's really, really heavy, so it slips out a lot. That gyro-ball thing has a lot of reflections, so we had trouble localizing it accurately, and we picked it up very few times. I think everything from about the EpiPen over was eight out of 10 or better. So not only are there a lot of objects that we can pick up, we also know which ones we can pick up and which ones we can't.

We are right now taking an aggressively instance-based approach, and the reason we're doing that is that I think there's something magic when the robot actually picks something up. So where I wanted to start is: let's cheat in every way we can. Let's make a model that's totally specific to this particular object. But the next step is to scale this whole thing up and then start to think about more general models, to go back to that dream of category-based recognition.

If you look at computer vision success stories, one of the things that makes a lot of algorithms successful is data sets, and the size of those data sets is immense.
A lot of the computer vision data sets, like the COCO database from Microsoft, have millions of images labeled with where the objects are. But most of those images were taken by a human photographer on a cell phone or uploaded to Flickr, wherever they got them from. And you get to see each object once, maybe twice, from one perspective that a human carefully chose. You don't get to play with it; you don't get to manipulate it. In robotics, there are some data sets of object instances, and the largest ones have a few hundred objects. Computer vision people that I've talked to laugh at that, because it's just so much smaller than the data sets they're working with. I think it's also so much smaller than what a human child gets to play with over the course of going from zero to two years old. I guess my son became a mobile manipulator around a year, or a year and a half or so; I'm not sure exactly when.

So one of my goals is to scale this whole thing up, to change this data equation to be more in our favor. There are about 300 of these-- this is the Baxter robot. Rod Brooks, who we were talking about in the previous talk, founded the company Rethink Robotics, and they've sold about 300 of these to the robotics research community. That's a very high penetration rate in robotics research: everybody has a Baxter or a friend with a Baxter.

So we're starting something we're calling the Million Object Challenge. The goal is to enlist all of those Baxters, which are sitting around doing nothing a lot of the time, to change this data equation. What we're doing is trying to get everybody to scan objects for us, so that we can get models-- perceptual models, visual models, and also manipulation experiences with these objects-- to try to train new and better category models.
And I think even existing algorithms may work way better simply because they have better data. But I think it also opens the door to thinking about better models that we maybe couldn't even think about before, because we just didn't have the data to play with.

Where we are right now is that we've installed our stack at MIT on Daniela Rus's Baxter-- that's this one. We went down to Yale a couple of weeks ago, to Scaz's lab, and we have our software on their Baxter. We're going to Rethink tomorrow; they're going to give us three Baxters that we're going to play with and install there. And I have a verbal yes from WPI. A few other people have been interested-- I pitched this at RSS, so a lot of people have said they were interested; I don't know if that will actually translate to robot time. Our goal is to get about 500 or 1,000 objects across these sites-- four sites including us, so Rethink, Yale, MIT, and us, plus WPI if they get on board-- and then do a larger press release about the project, advertise it, and push all of our friends with Baxters to help us scan. And then have yearly scan-a-thons, where you download the latest software and spend a couple of days scanning objects for the glory of robotics, or something. And really try to change this data equation for the better, so we can manipulate lots of things.

So that's our plan for making robots that can robustly perform actions in real-world environments. More generally, I imagine a mobile robot walking around your house at night and scanning stuff completely autonomously: taking these pictures, building these models, hopefully not breaking too much of your stuff. And not only learning about your particular house and the things that are in it, but also collecting data that will enable other robots to perform better over time. All right, so that's our attack on making robots robustly perform actions in real-world environments.
The next problem that I think is important for language understanding and human-robot collaboration is making robots that can carry out complex sequences of actions. For example, this is that pantry again. There might be hundreds or thousands of objects that the robot could potentially manipulate, and it might need to do a sequence of 10 or 20 manipulations in order to solve a problem such as "clean up the kitchen" or "put away the groceries."

For work that I had done in the past on the forklift, a lot of the commands that we studied and thought about were at the level of abstraction of "put the pallet on the truck." But one of our annotators-- we collected a lot of data on Amazon Mechanical Turk-- gave us this problem that I never forgot. He was an actual forklift operator who worked in a warehouse, and he said, if you pay me extra money, I'll tell you how to pick up a dime-- a dime, like a little coin-- with a forklift. Here are the instructions that he eventually gave us, without making us pay him, for how to solve this problem: raise the forks 12 inches, line up in front of the dime, tilt the forks forward, drive over a little bit, lower the fork on top of the dime, put it in reverse and travel backward, and the dime kind of flips up backwards on top of the fork. Maybe you don't know how to drive a forklift, but you can see how that would work. And if you did know how to drive a forklift, you could follow those instructions and have it happen.

But I knew that if we gave our system these commands, there is no way that it would work. It would completely fall apart. And the reason it would fall apart is that we gave the robot a model of actions at a different level of abstraction than this language is using. We gave it very high-level abstract actions, like picking things up, moving them to particular locations, and putting things down.
And if we gave it these low-level actions, like raising the forks 12 inches, the search that would be required to find a high-level thing like "put the pallet on the truck" would be prohibitively expensive. But the thing is, people don't like to stick to any fixed level of abstraction. People move up and down the tree freely; they give very high-level, mid-level, and low-level commands. So I think we need new planning algorithms that support this kind of thing.

To think about this, we decided to look at a version of the problem in simulation. The simulator that we chose is a game called Minecraft. Five minutes? OK. This is a picture from a Minecraft world, and we're trying to figure out new planning algorithms. The problem here is that the agent needs to cross the trench, so it needs to make a bridge to get across. It's got some blocks that it can manipulate, and you have this combinatorial explosion of where the blocks can go-- they can go anywhere. A naive algorithm will spend a lot of time putting the blocks everywhere, which doesn't really make progress toward solving the problem, whereas what you really need to do is focus on putting these blocks in the trench. Of course, on a different day you might be asked to make a tower or a castle or a staircase, and then those might be good things to do. So you don't just throw out those actions; you want to keep them all and figure out what to do based on your high-level goal.

We have some work about learning how to do this. We have an agent that practices solving small Minecraft problems and then learns how to solve bigger problems from experience. This is showing transfer from small problems to big problems in a decision-theoretic framework, an MDP framework.
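For reference, here is a minimal sketch of the baseline MDP machinery being described, as plain tabular Q-learning over a toy problem like trench-crossing. It is illustrative only: it is not BURLAP (a Java framework) and not the transfer method from this work, and the step and is_goal hooks are hypothetical stand-ins for a small simulator.

```python
import random
from collections import defaultdict

def q_learning(actions, step, is_goal, start, episodes=500, max_steps=200,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over hashable states (illustrative sketch).

    step(state, action) -> (next_state, reward); is_goal(state) -> bool.
    In a trench-crossing toy world, only block placements inside the trench
    eventually lead to reward, so a naive agent wastes most of its steps.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if is_goal(s):
                break
            # Epsilon-greedy over the (combinatorially large) action set.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r = step(s, a)
            # One-step Q-learning backup.
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

The point made above is exactly that this naive loop scales badly as block-placement actions multiply, which is why learning on small worlds and transferring what matters to larger ones helps.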
And a couple of weeks ago we released a mod for Minecraft called BurlapCraft. BURLAP is our reinforcement learning and planning framework that James MacGlashan and Michael Littman developed in Java. You can run BURLAP inside the Minecraft JVM, get the state of the real Minecraft world, make small toy problems if you want, or let your agent go in the real thing and explore the whole space of possible Minecraft worlds, if you're interested in that simulation.

OK, I'm almost out of time, so I'm not going to go too much into robots coordinating with people, but maybe I will show some of the videos about this work. The idea is that a lot of the previous work on language understanding works in batch mode: the person says something, the robot thinks for a long time, and then the robot does something-- hopefully the right thing. And as I said before, this is not how people work. So we're working on new models-- this is the graphical model; we were talking about it in the car-- that let the robot incrementally interpret language and gesture, updating at very high frequency.

This is showing the belief about which objects the person wants, updating from their language and gesture in an animated kind of way; it's updating at 14 hertz. So the robot has the information. This is from language on its own: "I would like a bowl," and both bowls go up. Then he points, and the one that he's pointing at goes up. So the robot knows very, very quickly: every time we get a new word from the [INAUDIBLE] condition, every time we get a new observation from the gesture system, we update our belief. And just a couple of weeks ago, we had our first pilot results showing that we can use this information to enable the robot to produce real-time feedback that increases the human's accuracy at getting the robot to select the right object.
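A minimal sketch of that kind of incremental belief update: the distribution over candidate objects is multiplied by a likelihood each time a new word or gesture observation arrives, then renormalized, which is cheap enough to run many times per second. The likelihood function below is a made-up placeholder, not the model from this work.

```python
def update_belief(belief, observation, likelihood):
    """One incremental Bayes update over candidate objects.

    belief: dict mapping object name -> probability.
    likelihood(observation, obj) -> P(observation | the person wants obj).
    """
    posterior = {obj: p * likelihood(observation, obj) for obj, p in belief.items()}
    total = sum(posterior.values())
    if total == 0:
        return belief  # uninformative observation; keep the previous belief
    return {obj: p / total for obj, p in posterior.items()}

# Illustrative usage: the word "bowl" raises both bowls; a later gesture
# observation would then favor the bowl the person is pointing at.
belief = {"red_bowl": 0.25, "blue_bowl": 0.25, "spoon": 0.25, "knife": 0.25}

def word_likelihood(word, obj):
    return 0.9 if word in obj else 0.1

belief = update_belief(belief, "bowl", word_likelihood)
```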
There are some quantitative results here, but I'll skip them.

OK, so those are the three main thrusts that I'm working on in my research group: trying to make robots that can robustly perform actions in real-world environments; thinking about planning in the really large state-action spaces that result when you have a capable and powerful robot; and then thinking about how you can make the robot coordinate with people, so that they can figure out what to do in those really large state-action spaces. Thank you.