The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

REBECCA SAXE: There's a whole bunch of limitations of Haxby-style correlations. One of them is that all the tests are binary. The answer you get for anything you test is that there is or is not information about that distinction, so there's no continuous measure here. It's just that two things are different from one another or they are not different from one another. And so once people started thinking about this method, it became clear that it's actually just a special case of a much more general way of thinking about fMRI data. So this particular method-- using spatial correlations-- is very stable and robust, but it's a special case of a much more general set of techniques. And here's the more general idea.
The more general idea is that we can think of the response pattern to a stimulus in a set of voxels-- for example, the voxels in a region-- as a vector in voxel space. So every time you present a stimulus, you get the response of all the voxels. Now, instead of thinking of that as a spatial pattern, think of it as a vector in voxel space. Every voxel defines a dimension, and the position in voxel space is how much activity there was in each of those voxels. Can everybody do that mental transformation? This is the key insight that people had about MVPA: we had been thinking about everything in space-- in the space of cortex-- but instead of thinking of a spatial pattern on cortex, treat each voxel as a dimension of a very multi-dimensional space. Now, the response to every stimulus is one point in voxel space. OK? As soon as you think of it that way, your mental representation of fMRI data looks like that. Right?
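The "vector in voxel space" idea above can be made concrete with a few lines of numpy. This is a toy sketch with made-up numbers, not real fMRI data: each row is one stimulus presentation, each column one voxel, so each row is a point in a 4-dimensional voxel space, and distances between points measure how similar two response patterns are.

```python
import numpy as np

# Hypothetical toy data: responses of 4 voxels to 3 stimulus presentations.
# Each row is one presentation; each column is one voxel, so every row
# is a vector (a point) in a 4-dimensional "voxel space".
responses = np.array([
    [1.2, 0.3, 0.8, 0.1],   # stimulus A
    [1.1, 0.4, 0.9, 0.2],   # stimulus A again (a nearby vector)
    [0.2, 1.5, 0.1, 1.3],   # stimulus B (far away in voxel space)
])

# Distances in voxel space quantify how similar two response patterns are.
dist_AA = np.linalg.norm(responses[0] - responses[1])
dist_AB = np.linalg.norm(responses[0] - responses[2])
print(dist_AA < dist_AB)  # the two A presentations lie closer together
```

The same geometry scales up unchanged when a region has hundreds of voxels; only the dimensionality of the space grows.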
So your mental representation of fMRI data used to be a BOLD response, and then it was a spatial pattern on cortex, and now it's a point in voxel space. And if you can follow those three transformations, then you realize that a set of points in a multi-dimensional space is the kind of problem that all of machine learning for the last 20 years has been working on. Right? And so everything that has ever happened in machine learning could now be used in fMRI-- well, almost-- because machine learning has absolutely proliferated in techniques, problems, and solutions to those problems for handling data sets where you have no idea where the data came from, but it's represented as multiple points in a multi-dimensional space. And so that's what happened about five years ago: people realized that we could think of the fMRI response to every stimulus as a point in voxel space. A set of data is a set of points in voxel space. Now, do anything you want with that. And the first, most obvious thing to do is to think of this as a classification problem. OK?
So we created conditions or dimensions in our stimuli, so now we can ask: can we decode those conditions? Can we find clusters? Can we find dimensions? Right? All the standard things people have done when you have points in multi-dimensional spaces. And so, again, the most common thing people now do, once you think of fMRI data that way, is to try linear classification of the categories or dimensions that you're interested in, typically using standard machine learning techniques. So think of training a classifier on some of your data, testing it on independent data, and trying to find the right classification techniques that can identify whatever distinction you're interested in in the data set that you built. And the way this one looks is that you take some-- now, voxels are on the y-axis of this heat map, so we have whatever that is, 80 or 100 voxels in a region-- maybe more-- and for every stimulus you have the response in every voxel to that stimulus. Right?
So each of those columns is now a representation of where that stimulus landed in voxel space, and you have a whole bunch of instances. And so now what you're going to do is use the training data to learn a potential linear classifier that tells you the best way to separate the stimuli that came from one labeled set versus the stimuli that came from some other labeled set. And the test of that is going to be: take a new stimulus or new response, use the classification you learned to try to decode which stimulus it came from, and measure your accuracy. And so the new measure of the information in fMRI is going to be classification accuracy. Does that make sense-- the people with me? OK, because that's where a lot of fMRI is right now: thinking about responses to stimuli as points in voxel space, and the problem as one of classification accuracy in independent data. OK. Here's one experiment that we did where we used classification. Now, another thing to note is that in this context you're often trying to classify a single trial. Right?
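The train-then-test procedure just described can be sketched in a few lines. This is a simulation with invented numbers, and it uses a nearest-centroid rule as a minimal stand-in for the linear classifiers the lecture mentions (for two classes, nearest-centroid is itself a simple linear classifier): train on some trials, then classify held-out single trials and report accuracy against the 50% chance level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: trials x voxels, two conditions whose mean patterns
# differ slightly. All values are toy numbers, not real fMRI.
n_voxels = 100
mean_a = rng.normal(0, 1, n_voxels)
mean_b = mean_a + rng.normal(0, 0.5, n_voxels)
train_a = mean_a + rng.normal(0, 1, (10, n_voxels))
train_b = mean_b + rng.normal(0, 1, (10, n_voxels))
test_a = mean_a + rng.normal(0, 1, (10, n_voxels))
test_b = mean_b + rng.normal(0, 1, (10, n_voxels))

# "Training": estimate each condition's mean pattern (its centroid
# in voxel space) from the training trials only.
ca, cb = train_a.mean(axis=0), train_b.mean(axis=0)

def classify(trial):
    """Label a single held-out trial by its nearer centroid."""
    return 'a' if np.linalg.norm(trial - ca) < np.linalg.norm(trial - cb) else 'b'

# "Testing": decode each independent trial and measure accuracy.
preds = [classify(t) for t in test_a] + [classify(t) for t in test_b]
truth = ['a'] * 10 + ['b'] * 10
accuracy = np.mean([p == t for p, t in zip(preds, truth)])
print(accuracy)  # well above the 0.5 chance level in this simulation
```

In practice people typically use off-the-shelf linear classifiers (e.g., linear SVMs) and cross-validation, but the logic is the same: fit on one partition, score single trials from another.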
So in our case, we're always trying to classify a single trial, so we've gone from partitioning the data into two halves and asking about similarity, to training on some of the data and classifying single independent trials. OK. So here's a case where we tried to do that, and it was an extension of the stuff that I just showed you, that you could classify seeing versus hearing, and so we tried to replicate and extend that. So we told people stories like this. There's a background: Bella is pouring sleeping potion into Ardwin's soup, while her sister, Jen, is waiting. They're holding their breath while he starts to eat. The conclusion of the story is always going to be the same-- Bella concludes that the potion worked-- and then we tell you on what evidence she based that conclusion. One case here is going to be: Bella stared through the secret peephole and waited. In the bright light she saw his eyes close and his head droop. So that's her evidence for the conclusion that the potion has worked. That's OK evidence, and we can vary it in a bunch of ways.
So one is we can change the modality of the evidence. Instead of seeing something, she can hear something. So for example: she pressed her ear against the door and waited. In the quiet she heard the spoon drop and a soft snore. So that's similar content of information, but arrived at through a different modality. Or we can change how good her evidence is, and in this case we did it by saying: she tried to peer through a crack in the door. In the dim light she squinted to see his eyes closed. OK, so that's less strong perceptual evidence for the conclusion that the potion has worked. OK. And so now what we're going to ask is: if we train on the pattern of activity in a brain region for one set of stories that vary on either of these dimensions-- one at a time, either modality or quality-- can we decode that dimension in a new test set? And the first answer is we can-- both of them. One thing about this is that this measure isn't binary anymore.
Since for every stimulus we're asking whether we can classify that stimulus or not, we can get, for each subject and each item, a measure of whether it was classified correctly or not. So across items, we know for every item the probability of it being correctly classified. And then we can ask: is that related to other continuous features of that item? So in this case what we can say, for example, is that the quality dimension-- how good your evidence is for the belief that you conclude-- is a continuous feature. It can be judged continuously by human observers, so for each item we can ask, how good is the evidence for the conclusion in this specific story? That judgment by human observers of how good the evidence is continuously predicts the probability of that item being classified as good evidence or bad evidence-- even over and above the label that we gave it. So if you regress out the labels, there's still a continuous predictor.
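The item-level analysis just described-- relating each item's probability of correct classification to a continuous human rating-- can be sketched as follows. Everything here is simulated (the ratings and per-item accuracies are invented numbers generated so that stronger evidence yields better decoding), just to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-ins: for 40 items, a continuous human rating of
# evidence quality, and that item's probability of being classified
# correctly (generated here so stronger evidence -> better decoding).
quality = rng.uniform(0, 1, 40)
p_correct = np.clip(0.4 + 0.5 * quality + rng.normal(0, 0.05, 40), 0, 1)

# The item-level question from the lecture: does the continuous
# rating predict classification accuracy across items?
r = np.corrcoef(quality, p_correct)[0, 1]
print(round(r, 2))  # strongly positive in this simulation
```

In the real analysis one would also regress out the binary condition labels first and ask whether the continuous rating still predicts residual classification accuracy.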
So imagine something like a neural population-- a sub-population-- that responds more the better the evidence, continuously, so that classification gets better as you get further out on that dimension. It's also not redundant across brain regions-- there's different information in different brain regions. And this is just to show you that in two other brain regions: in the right STS we can decode quality but not modality, and in the left TPJ we can decode modality but not quality. And the left TPJ result we've replicated a bunch of times. In the DMPFC we can't decode modality or quality, but we can decode valence, which is the thing I told you the right TPJ doesn't decode. And if we go back and look at valence in this dataset, we can only decode valence in the DMPFC. So to me, this is starting to get cool, right? Three features of other people's mental states, represented differentially in different brain regions.
This distinction between the more epistemic stuff-- like modality and quality, which are represented in the TPJ-- and valence, which is represented in the DMPFC, I think is real and deep, and hints at one of the most important distinctions within our theory of mind that I mentioned at the very beginning: between epistemic states and affective or motivational states. So what's cool about classification analyses? They have all the same properties as the Haxby-style analyses in principle, because they're actually just a generalization of the Haxby analyses, except that they're a lot less robust, because what you're trying to classify are single trials or single items. And so noisy data collapses faster in these classification strategies than in Haxby-style analyses, where you're averaging. But otherwise, those are the same two techniques. What's nice about the classification analyses is you can get item-specific outcomes, right? So you can say, for a specific item, how likely is it to be classified as one thing or another?
And this is where I started the talk before, which is that in both of these cases we think of a hypothesis and test it sequentially, whereas the representational similarity matrix tests whole hypothesis spaces instead of single features. Classification and Haxby-style analyses are ways to think of a feature or dimension that might be represented in a brain region you care about, and test whether or not it's represented. So they're a way of thinking of a hypothesis and testing it, then thinking of another hypothesis and testing it-- that's what I mean by sequentially. So you can ask: does the right TPJ represent the difference, for example, between Grace poisoning the person knowingly and poisoning the person unknowingly? The answer to that is yes, it does, but that's one hypothesis. And then we can come up with another hypothesis, and then another hypothesis. And what's interesting about representational dissimilarity matrices-- one of the versions of MVPA people use these days-- is that they take a different approach.
So instead of trying to think of one hypothesis and test it, this approach proposes a hypothesis space and tests the space as a whole, and that gives you different sensitivities and strengths and different weaknesses. So I'll work through an example in which we did this. I told you that I would come back to thinking about other people's feelings, and in this experiment we took the different kinds of things that people can feel as one subspace of theory of mind. So our stimuli, in this case, are 200 stories about people having an emotional experience. And we're going to look at what we can understand about how your brain represents the knowledge that lets you sort out people's experiences in those cases. OK, it's hard in the abstract, so let's do it in the concrete. In the behavioral version of this test I give you a list of 20 different emotions-- jealous, disappointed, devastated, embarrassed, disgusted, guilty, impressed, proud, excited, hopeful, joyful, et cetera-- so you have 20 different choices.
And I'm going to tell you a single story about a character you don't know and something they experienced, very briefly, and what I want you to think to yourself is: which emotion did they experience in that case? OK? So here's one. After an 18-hour flight, Alice arrived at her vacation destination to learn that her baggage, including camping gear for her trip, hadn't made the flight. After waiting at the airport for two nights, she was informed that the airline had lost her luggage and wouldn't provide any compensation. How many people think she felt joyful? How many people think that she felt annoyed? How about furious? OK, so furious is the modal answer, and annoyed is the most likely second choice for that case. Here's a different one. Sarah swore to her roommate that she would keep her new diet. Later, while she was in the kitchen, she took a bite of a cake she had bought for a dinner party. Then her roommate arrived home to find that she'd eaten half the cake and broken her diet. How many people think that she would feel disgusted? Terrified?
Embarrassed? OK. And just to give you a sense of how fine-grained your knowledge is in this case, think about this difference. In this case she swore she would keep her diet and then broke it, right? What about the difference between that and: she first ate the cake and then swore she would keep her diet? That's a totally different texture to the story. OK. So we have incredibly fine-grained knowledge of how a description of a situation predicts an overall emotion. You can see that in a behavioral experiment, so what I'm showing you here is, on the y-axis, the emotion that we intended when we wrote the story-- ten stories for each category, for 200 stories-- and on the x-axis, the percent of participants picking each label. And so the first thing is that 65% of the time, people pick the label we intended. If instead you take half the subjects to determine a modal answer and use the other half of the subjects as the test set, you get the same answer: there's about 65% agreement on the single right label out of 20.
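The behavioral analysis just described-- a confusion matrix of intended emotion versus chosen label, with agreement read off the diagonal-- can be sketched with simulated choices. The numbers below are toy values chosen to echo the roughly 65%-agreement, 5%-chance figures from the lecture, not the actual data.

```python
import numpy as np

rng = np.random.default_rng(2)

n_emotions, n_stories_per, n_subjects = 20, 10, 50

# Simulated choices: each subject picks the intended label with
# probability 0.65, otherwise a random label (toy numbers only).
intended = np.repeat(np.arange(n_emotions), n_stories_per)  # 200 stories
confusion = np.zeros((n_emotions, n_emotions))
for story_label in intended:
    for _ in range(n_subjects):
        if rng.random() < 0.65:
            choice = story_label
        else:
            choice = rng.integers(n_emotions)
        confusion[story_label, choice] += 1

# Row-normalize: fraction of participants picking each label, per emotion.
confusion /= confusion.sum(axis=1, keepdims=True)
agreement = np.mean(np.diag(confusion))
print(round(agreement, 2))  # near 0.65 + 0.35/20, i.e. about 0.67
```

The off-diagonal cells of `confusion` are where the "second best answer" structure (annoyed versus furious, for example) would show up in real data.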
That's, of course, way above chance, which is 5%, so people are quite good at this. And the off-diagonal is also meaningful-- it also contains information, in the second best answer, right? So annoyed as opposed to furious, for example. OK, so that's a huge amount of rich knowledge about other people's experiences, going from these very brief descriptions of events to a very fine-grained classification of which emotion they're experiencing. And one way to look at these data is to ask: OK, that's knowledge that we have-- where is that knowledge in the brain? That's the first question you could ask, and you could ask it by doing train and test. So we train a classifier on a patch of cortex, based on five examples from each condition, and then we test on the remaining half of the data. And we just ask: based on the pattern of activity in a patch, can you get above-chance classification in the independent data?
For every patch where that's true we put a sort of bright mark, and then ask: where in the brain is the relevant decoding that would let you be above chance on this distinction? The answer is in exactly the same brain regions that I have been talking about and showed you before. That is where there's above-chance classification, overlaid on the standard belief-versus-photo task in green. So within the brain regions involved in theory of mind or social cognition are the brain regions that can classify above chance in this 20-way distinction. And this is just looking inside each one of those-- four of the regions in that group that I showed you before. Using just the pattern of activity in that brain region, you can do above-chance classification on this 20-way distinction. And there's a hint that that information is somewhat non-redundant, because if you combine information across all of them you do slightly better than if you use any one of them alone.
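The "bright mark for every patch" procedure is usually called a searchlight analysis: slide a small patch across the brain, train and test a classifier inside each patch, and map where accuracy beats chance. Here is a minimal one-dimensional sketch on simulated data (a strip of 60 "voxels", only some of which carry condition information), again using a nearest-centroid rule as a stand-in classifier.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 1-D "cortex": 40 trials x 60 voxels, two conditions.
# Only voxels 20-40 carry condition information (all simulated).
n_trials, n_voxels = 40, 60
labels = np.array([0, 1] * 20)
data = rng.normal(0, 1, (n_trials, n_voxels))
data[labels == 1, 20:40] += 1.5   # the informative patch

def patch_accuracy(X_train, y_train, X_test, y_test):
    """Nearest-centroid decoding accuracy for one searchlight patch."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    pred = (np.linalg.norm(X_test - c1, axis=1)
            < np.linalg.norm(X_test - c0, axis=1)).astype(int)
    return (pred == y_test).mean()

# Slide a 5-voxel searchlight along the strip: train on the first
# half of trials, test on the second half, record accuracy per center.
train, test = np.arange(20), np.arange(20, 40)
acc = np.array([
    patch_accuracy(data[train][:, v:v + 5], labels[train],
                   data[test][:, v:v + 5], labels[test])
    for v in range(n_voxels - 5)
])
print(acc[:10].mean() < acc[25:30].mean())  # informative region decodes better
```

In real analyses the searchlight is a small sphere moved through the 3-D volume and each center's accuracy is tested against chance across subjects, but the loop has exactly this structure.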
OK, so now the question is: how can we study what knowledge is represented in each of these brain regions? Right? So we know that there's some information about that 20-way classification, but can we learn anything about the representation of emotions in those brain regions using fMRI? And that's where the representational dissimilarity matrices come in as a strategy. OK, so the question is: how might you represent the knowledge that you have of what Alice is experiencing, for example, in this story? What's a possible hypothesis? And the way representational dissimilarity matrices work as a strategy for fMRI analyses is that you should think of multiple different hypotheses about how that knowledge could be represented. So a first hypothesis, which is deep in the literature on emotions, is that we represent other people's emotional experience in terms of two fundamental dimensions of emotional experience: valence and arousal. Have you guys heard of valence and arousal as the two fundamental-- OK.
So this hypothesis says: when we think about emotions-- our own or other people's-- we put emotions in a two-dimensional space, which is, how good or bad did it make you feel, and how intense was it? OK. So terrified is negative and very intense. Lonely is negative, but not that intense. Right? That's the idea. Happy is positive and somewhat intense. Thrilled is positive and more intense. So the idea is that there are these two basic dimensions of emotional experience, and so one thing we can do is take each of our stories, like this one, and have people tell us: in that story, was she feeling positive or negative? How positive or negative, and how intensely? And so, for each individual story we can have a representation of it as a point in that space. And if you use just that, you can classify our 200 stories reasonably well-- not as well as people can, but still reasonably well. OK, so the 200 stories do clump into lumps in that two-dimensional space.
444 00:18:43,210 --> 00:18:46,630 But another idea is that valence and arousal 445 00:18:46,630 --> 00:18:51,210 seem not to capture the full texture of the 20 categories 446 00:18:51,210 --> 00:18:52,210 that we originally have. 447 00:18:52,210 --> 00:18:54,160 It's not that we can't embed 20 categories 448 00:18:54,160 --> 00:18:55,690 in two dimensions-- you obviously 449 00:18:55,690 --> 00:18:58,940 can have 20 clusters in a two dimensional space. 450 00:18:58,940 --> 00:19:01,626 But we had the intuition that it's not a two dimensional 451 00:19:01,626 --> 00:19:03,250 space-- that those two dimensions don't 452 00:19:03,250 --> 00:19:05,200 capture all the features that people have 453 00:19:05,200 --> 00:19:07,370 and know about when they use the stimuli. 454 00:19:07,370 --> 00:19:09,400 And so, based on another literature 455 00:19:09,400 --> 00:19:10,860 called appraisal theory-- 456 00:19:10,860 --> 00:19:15,220 what we tried to do is capture some of the abstract knowledge 457 00:19:15,220 --> 00:19:18,460 that people have about these situations that lets them 458 00:19:18,460 --> 00:19:21,190 identify which emotion it is. 459 00:19:21,190 --> 00:19:23,680 And we did that by having them rate 460 00:19:23,680 --> 00:19:26,770 each of these stories on a bunch of abstract event features. 461 00:19:26,770 --> 00:19:29,160 So those event features are things like-- 462 00:19:29,160 --> 00:19:31,630 was this situation caused by a person 463 00:19:31,630 --> 00:19:33,106 or some other external force. 464 00:19:33,106 --> 00:19:34,480 So I hope you guys have the sense 465 00:19:34,480 --> 00:19:36,813 that if your luggage gets lost on your way to the trip-- 466 00:19:36,813 --> 00:19:38,860 it's different if that was airline incompetence 467 00:19:38,860 --> 00:19:40,190 versus a tornado, right? 468 00:19:40,190 --> 00:19:41,270 Does everybody have that intuition? 469 00:19:41,270 --> 00:19:42,190 The emotion is different. 470 00:19:42,190 --> 00:19:42,689 OK. 
471 00:19:42,689 --> 00:19:44,440 So that's an important abstract feature 472 00:19:44,440 --> 00:19:45,880 of our knowledge of other people. 473 00:19:45,880 --> 00:19:47,560 Was it caused by you yourself? 474 00:19:47,560 --> 00:19:49,200 If you left your luggage at home, 475 00:19:49,200 --> 00:19:51,700 that's different from if airline incompetence caused you not 476 00:19:51,700 --> 00:19:52,574 to have your luggage. 477 00:19:52,574 --> 00:19:54,779 And does it refer to something in her past? 478 00:19:54,779 --> 00:19:56,320 Is she interacting with other people? 479 00:19:56,320 --> 00:19:57,850 That makes a really big difference, for example, 480 00:19:57,850 --> 00:19:59,891 in pride and embarrassment-- whether other people 481 00:19:59,891 --> 00:20:01,390 are around. 482 00:20:01,390 --> 00:20:03,400 How will it affect her future relationships? 483 00:20:03,400 --> 00:20:06,161 So things that potentially cause harm to future relationships 484 00:20:06,161 --> 00:20:08,410 feel very different from things that are just annoying 485 00:20:08,410 --> 00:20:09,940 right now but will end. 486 00:20:09,940 --> 00:20:13,900 So these are abstract features and they encapsulate things 487 00:20:13,900 --> 00:20:16,630 we know about emotion relevant features of the situations 488 00:20:16,630 --> 00:20:18,160 people find themselves in. 489 00:20:18,160 --> 00:20:20,440 So we came up with 42 of these and we 490 00:20:20,440 --> 00:20:23,380 had every story rated on all of those dimensions. 491 00:20:23,380 --> 00:20:26,080 And of course, we can, again, classify the stories 492 00:20:26,080 --> 00:20:29,080 as 20 clusters in a 42 dimensional space, right? 493 00:20:29,080 --> 00:20:30,730 Again, of course we can. 494 00:20:30,730 --> 00:20:34,310 But the question is [INAUDIBLE] this is those data. 
495 00:20:34,310 --> 00:20:37,060 This is just every set of 10 stories and their average 496 00:20:37,060 --> 00:20:38,370 rating on our-- oh, 38-- 497 00:20:38,370 --> 00:20:40,510 on our 38 appraisal features, so that 498 00:20:40,510 --> 00:20:43,840 creates a 38 dimensional space. 499 00:20:43,840 --> 00:20:45,130 Here the idea is-- 500 00:20:45,130 --> 00:20:47,800 for each category-- like, for all the stories about being 501 00:20:47,800 --> 00:20:49,480 jealous-- 502 00:20:49,480 --> 00:20:52,810 you can get-- for, let's say, for the two dimensions 503 00:20:52,810 --> 00:20:55,510 of valence and arousal-- the average value of valence 504 00:20:55,510 --> 00:20:57,414 and the average value of arousal, right? 505 00:20:57,414 --> 00:20:59,330 So that's a point in a two dimensional space-- 506 00:20:59,330 --> 00:21:01,010 the stories about being jealous. 507 00:21:01,010 --> 00:21:01,930 OK. 508 00:21:01,930 --> 00:21:04,210 Then you take the stories about being terrified. 509 00:21:04,210 --> 00:21:06,320 What's their valence and arousal? 510 00:21:06,320 --> 00:21:08,740 So that's another point in a two dimensional space. 511 00:21:08,740 --> 00:21:11,410 And then you take the distance between them, 512 00:21:11,410 --> 00:21:13,750 and that number goes in a representational dissimilarity 513 00:21:13,750 --> 00:21:14,600 matrix. 514 00:21:14,600 --> 00:21:16,600 So the further away you are in a two dimensional 515 00:21:16,600 --> 00:21:19,044 space, the more dissimilar. 516 00:21:19,044 --> 00:21:21,460 And you could do the same thing in a 42 dimensional space, 517 00:21:21,460 --> 00:21:23,334 a 38 dimensional space, any dimensional space 518 00:21:23,334 --> 00:21:26,960 you want-- what you need to know is just how far away you are. 519 00:21:26,960 --> 00:21:29,890 And so what a representational dissimilarity matrix has in it 520 00:21:29,890 --> 00:21:30,950 is for every pair. 
521 00:21:30,950 --> 00:21:33,910 So the jealous stories versus the grateful stories-- 522 00:21:33,910 --> 00:21:36,760 the number in that cell is the distance 523 00:21:36,760 --> 00:21:38,810 from the mean position in your space 524 00:21:38,810 --> 00:21:42,040 of all the jealous stories to the mean position in your space 525 00:21:42,040 --> 00:21:43,780 of all the grateful stories. 526 00:21:43,780 --> 00:21:46,660 Does that make sense? 527 00:21:46,660 --> 00:21:49,390 And that could be true of any dimensionality. 528 00:21:49,390 --> 00:21:51,310 When you know these 38 features-- 529 00:21:51,310 --> 00:21:52,990 so this is behavioral data-- 530 00:21:52,990 --> 00:21:56,470 when you know the 38 features of these emotions, 531 00:21:56,470 --> 00:22:00,280 the green bar is how well you can classify new items, just 532 00:22:00,280 --> 00:22:00,850 behaviorally. 533 00:22:00,850 --> 00:22:02,800 So if I give you a new item and all I 534 00:22:02,800 --> 00:22:05,962 tell you is its value in these 38 dimensions, 535 00:22:05,962 --> 00:22:07,420 how well can you tell me back which 536 00:22:07,420 --> 00:22:09,280 emotion category it comes from? 537 00:22:09,280 --> 00:22:12,310 The best you could possibly do is 65%, 538 00:22:12,310 --> 00:22:15,440 because that's what human observers do in all of our-- 539 00:22:15,440 --> 00:22:17,255 so the reality is the human observers-- 540 00:22:17,255 --> 00:22:18,880 the features come from human observers, 541 00:22:18,880 --> 00:22:21,340 so our ceiling's going to be 65%, 542 00:22:21,340 --> 00:22:23,280 and the answer is about 55%. 543 00:22:23,280 --> 00:22:23,860 OK. 544 00:22:23,860 --> 00:22:26,330 And you can take that in two different ways. 545 00:22:26,330 --> 00:22:29,860 One tendency is to say, wow, we know 546 00:22:29,860 --> 00:22:34,210 a lot of the key features that go into emotion attribution. 
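The construction described above-- average each category's stories in the feature space, then fill each cell of the matrix with the distance between two category means-- can be sketched as follows. A minimal sketch: the category names, the number of stories and features, the random ratings, and the choice of Euclidean distance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

categories = ["jealous", "grateful", "terrified"]
n_stories, n_features = 10, 38   # e.g., 10 stories per category, 38 appraisal features

# ratings[c] is an (n_stories, n_features) array of per-story feature ratings.
ratings = {c: rng.random((n_stories, n_features)) for c in categories}

# Mean position of each category in the 38-dimensional feature space.
means = np.stack([ratings[c].mean(axis=0) for c in categories])

# RDM cell (i, j) = distance between category i's mean and category j's mean.
rdm = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)

print(rdm.shape)   # (3, 3)
print(rdm[0, 0])   # 0.0 -- a category is not dissimilar to itself
```

The same few lines work unchanged for 2 dimensions (valence and arousal) or 38: only the width of the rating arrays changes, and what comes out is always one number per pair of categories.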
547 00:22:34,210 --> 00:22:36,040 I think, Amy, who I did this work with, 548 00:22:36,040 --> 00:22:37,840 had a tendency to feel that way. 549 00:22:37,840 --> 00:22:40,661 And I think, wow, we thought of 38 things 550 00:22:40,661 --> 00:22:42,910 and we still didn't think of all the important things. 551 00:22:42,910 --> 00:22:44,530 Like, what are those other things 552 00:22:44,530 --> 00:22:46,360 that we didn't think of that explain 553 00:22:46,360 --> 00:22:47,520 the rest of the variation? 554 00:22:47,520 --> 00:22:49,895 So you could feel either way about this, but in any case, 555 00:22:49,895 --> 00:22:53,020 once you know the position of one of these stories in the 38 556 00:22:53,020 --> 00:22:55,120 dimensional space of these features, 557 00:22:55,120 --> 00:22:59,230 you know a lot about which emotion category it came from. 558 00:22:59,230 --> 00:23:01,750 And then this is the correlation to the neural RDM data 559 00:23:01,750 --> 00:23:02,500 that I showed you. 560 00:23:02,500 --> 00:23:05,083 And so, again, what I showed you is, so observer's knowledge-- 561 00:23:05,083 --> 00:23:08,120 that's everything that we know that lets us classify a story. 562 00:23:08,120 --> 00:23:10,000 Valence and arousal is the yellow bar-- 563 00:23:10,000 --> 00:23:12,100 that's just these two features of the story, 564 00:23:12,100 --> 00:23:14,683 and they're both less good than this intermediate thing, which 565 00:23:14,683 --> 00:23:16,330 is the 38 dimensional space. 566 00:23:16,330 --> 00:23:17,980 And one question is, like, do I really 567 00:23:17,980 --> 00:23:19,390 think it's 38 dimensions? 568 00:23:19,390 --> 00:23:20,380 No, definitely not. 569 00:23:20,380 --> 00:23:23,290 That was just the set of all the things that we could think of. 570 00:23:23,290 --> 00:23:25,480 How many dimensions is it, really? 
571 00:23:25,480 --> 00:23:28,930 Again, I don't know, really, but I can tell you 572 00:23:28,930 --> 00:23:31,570 that the best ten dimensions capture 573 00:23:31,570 --> 00:23:34,550 most of the information from the 38 dimensions. 574 00:23:34,550 --> 00:23:36,730 So what we've discovered so far is 575 00:23:36,730 --> 00:23:39,697 ten really important dimensions of your knowledge of emotion. 576 00:23:39,697 --> 00:23:42,280 I don't, again, think that means that our knowledge of emotion 577 00:23:42,280 --> 00:23:43,450 is ten dimensional. 578 00:23:43,450 --> 00:23:46,090 Lots of this is limited by the set of stimuli 579 00:23:46,090 --> 00:23:48,880 that we chose, the resolution of the data that we have, 580 00:23:48,880 --> 00:23:50,510 and so forth and so on. 581 00:23:50,510 --> 00:23:52,540 But in these data you need something 582 00:23:52,540 --> 00:23:54,790 on the order of ten dimensions to get 583 00:23:54,790 --> 00:23:56,950 close to human performance or close 584 00:23:56,950 --> 00:24:03,250 to the genuinely differential signal in the neural data. 585 00:24:03,250 --> 00:24:06,400 If you take one thing away from this talk about the methods 586 00:24:06,400 --> 00:24:08,470 used in representational dissimilarity matrix-- 587 00:24:08,470 --> 00:24:10,040 really only one thing. 588 00:24:10,040 --> 00:24:11,980 Here's the one thing I want you to know-- 589 00:24:11,980 --> 00:24:13,990 the dimensionality of the theory that 590 00:24:13,990 --> 00:24:16,600 generated your representational dissimilarity matrix 591 00:24:16,600 --> 00:24:19,480 does nothing for you in the fit to your data. 592 00:24:19,480 --> 00:24:20,590 Nothing at all. 593 00:24:20,590 --> 00:24:22,410 It's a parameter-free fit. 594 00:24:22,410 --> 00:24:23,020 OK? 595 00:24:23,020 --> 00:24:26,800 So anybody to whom those words mean anything, 596 00:24:26,800 --> 00:24:30,185 this will be important, so I want you to actually know this. 
597 00:24:30,185 --> 00:24:31,810 Representational dissimilarity matrices 598 00:24:31,810 --> 00:24:34,546 provide a parameter-free fit to the data, 599 00:24:34,546 --> 00:24:35,920 and therefore, the dimensionality 600 00:24:35,920 --> 00:24:38,740 of the theory that generated the representational dissimilarity 601 00:24:38,740 --> 00:24:43,224 matrix has nothing to do with the fit of the data. 602 00:24:43,224 --> 00:24:45,640 You can probably notice I should have ordered this better. 603 00:24:45,640 --> 00:24:47,599 Valence has two dimensions, the observers 604 00:24:47,599 --> 00:24:48,640 have a lot of dimensions-- 605 00:24:48,640 --> 00:24:50,560 I don't know how many, but a lot more than 38. 606 00:24:50,560 --> 00:24:53,510 We know that because 38 doesn't explain all their data. 607 00:24:53,510 --> 00:24:57,700 So as you go up in-- and in principle, 608 00:24:57,700 --> 00:25:00,160 having more dimensions doesn't help with the fit. 609 00:25:00,160 --> 00:25:02,740 You might worry that they would overfit rather than fitting 610 00:25:02,740 --> 00:25:05,330 the data, and here's why they can't. 611 00:25:05,330 --> 00:25:09,070 Because the way you build a representational dissimilarity 612 00:25:09,070 --> 00:25:12,590 matrix is, out of however many dimensions you 613 00:25:12,590 --> 00:25:15,680 have in your data set, for every pair of stimuli, 614 00:25:15,680 --> 00:25:19,280 you take one number, and then a representational dissimilarity 615 00:25:19,280 --> 00:25:23,360 matrix encodes the relationships among those numbers. 616 00:25:23,360 --> 00:25:24,200 OK? 617 00:25:24,200 --> 00:25:28,280 So jealousy is more similar to irritation than it is to pride. 618 00:25:28,280 --> 00:25:29,460 By how much? 619 00:25:29,460 --> 00:25:30,410 OK? 620 00:25:30,410 --> 00:25:34,250 And those relative differences are all you have. 621 00:25:34,250 --> 00:25:37,820 You have nothing else, and so there's no parameters. 622 00:25:37,820 --> 00:25:38,600 Right? 
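The parameter-free point can be made concrete: embed the very same category positions in spaces of wildly different dimensionality and the RDM does not change at all, so neither can its fit to the neural data. This is a sketch under invented numbers; the orthonormal embedding is just one convenient way to move between dimensionalities without changing distances.

```python
import numpy as np

rng = np.random.default_rng(2)

points_2d = rng.random((20, 2))   # 20 emotion categories under a 2-D theory

# Embed the same points in 40 dimensions with an orthonormal map
# (columns of q are orthonormal), which preserves all pairwise distances.
q, _ = np.linalg.qr(rng.standard_normal((40, 2)))
points_40d = points_2d @ q.T      # shape (20, 40)

def rdm(points):
    """Pairwise Euclidean distances between category positions."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rdm_2d, rdm_40d = rdm(points_2d), rdm(points_40d)
print(np.allclose(rdm_2d, rdm_40d))   # True: same RDM at 2 or 40 dimensions
```

In practice the fit to a neural RDM is then something like a rank correlation between the off-diagonal cells of the model RDM and of the neural RDM; since both embeddings yield identical cells, they yield identical correlations, with no parameters fitted along the way.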
623 00:25:38,600 --> 00:25:40,810 You have the same amount of information 624 00:25:40,810 --> 00:25:42,560 in a representational dissimilarity matrix 625 00:25:42,560 --> 00:25:44,851 that you generated from a one dimensional theory, a two 626 00:25:44,851 --> 00:25:46,340 dimensional theory, a 38 dimensional 627 00:25:46,340 --> 00:25:48,145 theory, and an infinite dimensional theory. 628 00:25:48,145 --> 00:25:50,270 The size of the theory doesn't make any difference, 629 00:25:50,270 --> 00:25:53,720 because what you get in the end is exactly the same thing-- 630 00:25:53,720 --> 00:25:58,160 the relative distance between every two points in the set. 631 00:25:58,160 --> 00:26:01,460 There's a few things to say about-- 632 00:26:01,460 --> 00:26:05,330 so one thing to say about the representational dissimilarity 633 00:26:05,330 --> 00:26:07,490 analysis that I just showed you is 634 00:26:07,490 --> 00:26:11,330 that it tells you that the 38 dimensional theory is 635 00:26:11,330 --> 00:26:12,985 better than the valence theory. 636 00:26:12,985 --> 00:26:14,360 Like, the event feature theory is 637 00:26:14,360 --> 00:26:18,260 better than the valence theory, but it doesn't tell you why. 638 00:26:18,260 --> 00:26:18,860 Right? 639 00:26:18,860 --> 00:26:22,010 It doesn't tell you whether any specific one of those features 640 00:26:22,010 --> 00:26:25,190 is capturing variance in any specific one of those regions. 641 00:26:25,190 --> 00:26:28,010 It tells you that that whole set was better 642 00:26:28,010 --> 00:26:30,020 than this other whole set, and maybe this 643 00:26:30,020 --> 00:26:31,144 is what you're getting at. 644 00:26:31,144 --> 00:26:35,000 It's much less good for trying post-hoc things-- for saying, 645 00:26:35,000 --> 00:26:35,630 but why? 646 00:26:35,630 --> 00:26:38,390 Which aspect of that theory was better 647 00:26:38,390 --> 00:26:40,460 than the valence and arousal? 
648 00:26:40,460 --> 00:26:42,650 It gives you an all things considered answer, 649 00:26:42,650 --> 00:26:45,020 not a dimension specific answer. 650 00:26:45,020 --> 00:26:47,780 That's one thing that is a limit in the way you 651 00:26:47,780 --> 00:26:50,960 should use representational dissimilarity analyses. 652 00:26:54,270 --> 00:26:57,890 There's two key problems that I think 653 00:26:57,890 --> 00:27:02,120 bear reflecting on about MVPA, and one of them 654 00:27:02,120 --> 00:27:04,070 is a catastrophe and the other one 655 00:27:04,070 --> 00:27:07,304 is an incredibly deep puzzle. 656 00:27:07,304 --> 00:27:08,720 And I think I should just say them 657 00:27:08,720 --> 00:27:10,885 right away before you get too excited, because all 658 00:27:10,885 --> 00:27:12,260 of this stuff was really exciting 659 00:27:12,260 --> 00:27:14,970 and now I'm going to tell you a catastrophe and a puzzle. 660 00:27:14,970 --> 00:27:17,060 Here's the catastrophe. 661 00:27:17,060 --> 00:27:19,880 The catastrophe is that you can't 662 00:27:19,880 --> 00:27:22,700 make anything of null results. 663 00:27:22,700 --> 00:27:24,560 OK, now, here's why. 664 00:27:24,560 --> 00:27:26,660 Because when I say that you can decode something 665 00:27:26,660 --> 00:27:28,400 from an MVPA analysis, what I mean 666 00:27:28,400 --> 00:27:30,050 is that at the scale of voxels, there's 667 00:27:30,050 --> 00:27:31,850 some signal in terms of which voxels 668 00:27:31,850 --> 00:27:34,700 are relatively higher or relatively lower in response 669 00:27:34,700 --> 00:27:35,340 to the stimuli. 670 00:27:35,340 --> 00:27:35,840 Right? 671 00:27:35,840 --> 00:27:38,750 So in voxel space or in spatial space, whichever one of those 672 00:27:38,750 --> 00:27:40,010 you find helpful-- 673 00:27:40,010 --> 00:27:43,820 it's that at the level of voxels we could cluster these stimuli. 
674 00:27:43,820 --> 00:27:45,800 And what that says is that they are something 675 00:27:45,800 --> 00:27:48,500 like distinct populations in this region, 676 00:27:48,500 --> 00:27:50,819 responding across that feature dimension, 677 00:27:50,819 --> 00:27:53,360 and they're spatially segregated enough that we could pick up 678 00:27:53,360 --> 00:27:55,145 on them with fMRI. 679 00:27:55,145 --> 00:27:57,020 But who cares if they're spatially segregated 680 00:27:57,020 --> 00:27:59,353 enough that we could pick up on them with fMRI, right? 681 00:27:59,353 --> 00:28:01,550 fMRI is the scale of a millimeter. 682 00:28:01,550 --> 00:28:04,340 And there could be many, many, many things 683 00:28:04,340 --> 00:28:06,290 that are represented by populations of neurons 684 00:28:06,290 --> 00:28:08,800 within a region that are not spatially organized 685 00:28:08,800 --> 00:28:10,220 at the scale of a millimeter. 686 00:28:10,220 --> 00:28:11,620 Not only could there be-- 687 00:28:11,620 --> 00:28:13,032 there absolutely, definitely are. 688 00:28:13,032 --> 00:28:14,990 There's a whole bunch of things that we already 689 00:28:14,990 --> 00:28:17,420 know are really important properties 690 00:28:17,420 --> 00:28:19,619 of neural representations of things we care about, 691 00:28:19,619 --> 00:28:21,410 and we know that their spatial scale is not 692 00:28:21,410 --> 00:28:23,599 high enough that they can be picked up on with fMRI. 693 00:28:23,599 --> 00:28:25,640 So two cases that I'll tell you about because you 694 00:28:25,640 --> 00:28:26,940 should care about them-- 695 00:28:26,940 --> 00:28:31,160 one is face responses in the middle temporal region 696 00:28:31,160 --> 00:28:35,090 that Doris and Winrich study for face representations 697 00:28:35,090 --> 00:28:37,130 in monkeys. 698 00:28:37,130 --> 00:28:39,085 It's one of the middle ones. 
699 00:28:39,085 --> 00:28:41,210 In that one there's face features that can tell you 700 00:28:41,210 --> 00:28:43,370 how far apart the pupils are, how high 701 00:28:43,370 --> 00:28:46,130 the eyebrows are-- did Winrich show you this amazing data? 702 00:28:46,130 --> 00:28:49,700 Totally amazing, beautiful feature space of face identity 703 00:28:49,700 --> 00:28:50,780 representation? 704 00:28:50,780 --> 00:28:53,150 One of the most strikingly beautiful things I've 705 00:28:53,150 --> 00:28:54,170 ever seen. 706 00:28:54,170 --> 00:28:56,360 And he already knows-- 707 00:28:56,360 --> 00:28:57,620 he and Doris already know-- 708 00:28:57,620 --> 00:28:59,960 that there's no spatial relationship 709 00:28:59,960 --> 00:29:03,740 at all between the property that one neuron signals 710 00:29:03,740 --> 00:29:05,330 and its distance from other neurons 711 00:29:05,330 --> 00:29:06,538 that signal other properties. 712 00:29:06,538 --> 00:29:08,352 There's no spatial organization at all. 713 00:29:08,352 --> 00:29:10,310 So if you know that right here is a neuron that 714 00:29:10,310 --> 00:29:14,150 responds to eye width, you know nothing more about the preferred 715 00:29:14,150 --> 00:29:16,490 property of the neuron next to it than about a neuron 716 00:29:16,490 --> 00:29:17,390 a centimeter away. 717 00:29:17,390 --> 00:29:21,165 There's no spatial structure to which feature a given neuron 718 00:29:21,165 --> 00:29:24,890 responds, which means that you absolutely could not 719 00:29:24,890 --> 00:29:27,930 and cannot pick up on that in fMRI, which Doris has shown. 720 00:29:27,930 --> 00:29:30,320 This feature structure information cannot be picked up 721 00:29:30,320 --> 00:29:34,491 on with fMRI, even though it is there and really important. 722 00:29:34,491 --> 00:29:36,740 Another example is valence coding in the amygdala. 
723 00:29:36,740 --> 00:29:38,364 The amygdala contains some neurons that 724 00:29:38,364 --> 00:29:39,980 respond to positively valenced events 725 00:29:39,980 --> 00:29:42,530 and other neurons that respond to negatively valenced events, 726 00:29:42,530 --> 00:29:44,510 and they are as spatially interleaved 727 00:29:44,510 --> 00:29:47,130 as physically possible-- that's what Kay Tye's data shows. 728 00:29:47,130 --> 00:29:49,130 You couldn't get them more spatially interleaved 729 00:29:49,130 --> 00:29:49,750 than they are. 730 00:29:49,750 --> 00:29:53,090 They are as close together as the size of the neurons allows. 731 00:29:53,090 --> 00:29:55,430 So you absolutely will never be able to decode 732 00:29:55,430 --> 00:29:57,890 with fMRI in those populations-- the amygdala-- that there 733 00:29:57,890 --> 00:30:00,098 are different populations for positively and negatively 734 00:30:00,098 --> 00:30:02,120 valenced events, but there are. 735 00:30:02,120 --> 00:30:02,750 OK. 736 00:30:02,750 --> 00:30:05,900 So that means that when you see something in fMRI it's probably 737 00:30:05,900 --> 00:30:07,790 there, but when you don't see it in fMRI 738 00:30:07,790 --> 00:30:09,410 you don't know that it's not there. 739 00:30:09,410 --> 00:30:11,715 And the reason why that's a total catastrophe 740 00:30:11,715 --> 00:30:14,180 is that it means that when I tell you that a region codes 741 00:30:14,180 --> 00:30:15,420 A and not B-- 742 00:30:15,420 --> 00:30:17,130 I don't know that it doesn't code B. 743 00:30:17,130 --> 00:30:19,440 And when I tell you that this thing is 744 00:30:19,440 --> 00:30:22,935 coded in region A and it's not coded in region B-- 745 00:30:22,935 --> 00:30:25,290 I don't know that it's not coded in region B. 746 00:30:25,290 --> 00:30:27,340 So I can never show you a double dissociation. 747 00:30:27,340 --> 00:30:29,131 I can never show you a single dissociation. 
748 00:30:29,131 --> 00:30:30,930 I can never show you a dissociation at all. 749 00:30:30,930 --> 00:30:33,780 All I can say for sure is that the spatial scale 750 00:30:33,780 --> 00:30:37,000 of the information is different between one region and another, 751 00:30:37,000 --> 00:30:39,434 or between one piece of information and another, 752 00:30:39,434 --> 00:30:41,850 and we have no reason to believe that that matters at all. 753 00:30:41,850 --> 00:30:42,349 Right? 754 00:30:42,349 --> 00:30:44,970 Really important things are encoded at very fine spatial 755 00:30:44,970 --> 00:30:45,748 scales. 756 00:30:45,748 --> 00:30:47,956 And so any time I tell you-- which I told you a bunch 757 00:30:47,956 --> 00:30:49,770 of times because I think it's really cool-- 758 00:30:49,770 --> 00:30:52,920 that there's a difference in what feature is encoded where, 759 00:30:52,920 --> 00:30:54,390 you have no reason to believe me. 760 00:30:54,390 --> 00:30:55,910 And that's the catastrophe. 761 00:30:55,910 --> 00:30:56,980 It's a total catastrophe. 762 00:30:56,980 --> 00:31:00,510 If you can't make distinctions, you can't make any conclusions 763 00:31:00,510 --> 00:31:01,680 at all. 764 00:31:01,680 --> 00:31:03,810 I'll just briefly say the other thing 765 00:31:03,810 --> 00:31:05,490 that's a problem with this, which 766 00:31:05,490 --> 00:31:08,500 is that this idea of similarity space-- 767 00:31:08,500 --> 00:31:11,310 the idea that you should think of a concept, like jealous, 768 00:31:11,310 --> 00:31:13,349 as a point in a multidimensional space, 769 00:31:13,349 --> 00:31:15,390 and what it means to think of somebody as jealous 770 00:31:15,390 --> 00:31:17,670 is to think of them as a certain distance 771 00:31:17,670 --> 00:31:20,550 from irritated and angry and proud and impressed-- 772 00:31:20,550 --> 00:31:23,520 that idea has been thoroughly undermined 773 00:31:23,520 --> 00:31:29,180 in psychology and psychophysics and computational cognition. 
774 00:31:29,180 --> 00:31:31,150 It's really a bad theory of concepts. 775 00:31:31,150 --> 00:31:33,930 It can't do any of the work that concepts are supposed to do. 776 00:31:33,930 --> 00:31:35,520 One of the most important things they can't do 777 00:31:35,520 --> 00:31:36,353 is compositionality. 778 00:31:36,353 --> 00:31:38,440 It can't explain the way concepts compose, 779 00:31:38,440 --> 00:31:40,110 which is absolutely critical to the way that we think 780 00:31:40,110 --> 00:31:41,790 and even more critical to the way that we think 781 00:31:41,790 --> 00:31:43,950 about other people's minds, because every thought you have 782 00:31:43,950 --> 00:31:45,510 about somebody else's mental state 783 00:31:45,510 --> 00:31:48,480 is a composition of an agent, a mental state, and a content. 784 00:31:48,480 --> 00:31:52,650 And so, this whole way of thinking about concepts 785 00:31:52,650 --> 00:31:55,350 as points in multi-dimensional spaces 786 00:31:55,350 --> 00:31:58,560 works, but shouldn't work. 787 00:31:58,560 --> 00:32:02,800 And that's the other problem with this whole endeavor. 788 00:32:02,800 --> 00:32:04,890 OK. 789 00:32:04,890 --> 00:32:06,390 There's a bunch of things that we're 790 00:32:06,390 --> 00:32:08,459 doing with this that I will just briefly mention 791 00:32:08,459 --> 00:32:10,750 in case people want to think about it or know about it. 792 00:32:10,750 --> 00:32:12,480 The two things I'm really excited about-- one 793 00:32:12,480 --> 00:32:14,040 is adding temporal information, so looking 794 00:32:14,040 --> 00:32:16,410 at the change in information in brain regions over time, 795 00:32:16,410 --> 00:32:18,100 and how they influence one another. 796 00:32:18,100 --> 00:32:21,480 And that's my post-doc Stefano Anzellotti's project. 
797 00:32:21,480 --> 00:32:23,849 And another thing that I'm excited about is that-- 798 00:32:23,849 --> 00:32:25,890 to the degree that you take these positive claims 799 00:32:25,890 --> 00:32:28,389 as something interesting, which I actually still do in spite 800 00:32:28,389 --> 00:32:31,099 of all my end of the world talk-- 801 00:32:31,099 --> 00:32:32,640 one thing that I think is really neat 802 00:32:32,640 --> 00:32:37,380 is the idea of increasingly differentiable representational 803 00:32:37,380 --> 00:32:40,070 spaces. 804 00:32:40,070 --> 00:32:42,240 So two sets of stimuli that produce clusters 805 00:32:42,240 --> 00:32:43,920 that are not separable-- 806 00:32:43,920 --> 00:32:46,080 for example, in voxel or neural space-- 807 00:32:46,080 --> 00:32:48,180 and making them increasingly distinct. 808 00:32:48,180 --> 00:32:50,890 So Jim DiCarlo calls this unfolding a manifold. 809 00:32:50,890 --> 00:32:51,390 Right? 810 00:32:51,390 --> 00:32:53,880 That idea, which is Jim DiCarlo's model 811 00:32:53,880 --> 00:32:57,030 of the successive processing in stages from V1 to V2 812 00:32:57,030 --> 00:32:59,034 to V4 to IT-- 813 00:32:59,034 --> 00:33:00,450 I think that's a really cool model 814 00:33:00,450 --> 00:33:01,870 of conceptual development. 815 00:33:01,870 --> 00:33:03,570 That what you might have is originally 816 00:33:03,570 --> 00:33:05,640 neural responses that can't separate stimuli 817 00:33:05,640 --> 00:33:07,530 along some interesting dimension-- that 818 00:33:07,530 --> 00:33:10,200 unfold that representational space 819 00:33:10,200 --> 00:33:13,560 to make them more dissimilar as you get that concept more-- 820 00:33:13,560 --> 00:33:16,140 or that dimension or feature of the stimuli 821 00:33:16,140 --> 00:33:18,060 more distinctively represented. 
822 00:33:18,060 --> 00:33:19,680 And so we've tried a first version 823 00:33:19,680 --> 00:33:22,170 of this with justification-- 824 00:33:22,170 --> 00:33:24,660 so kids between age seven and 12 get better and better 825 00:33:24,660 --> 00:33:26,284 at distinguishing people's beliefs that 826 00:33:26,284 --> 00:33:27,630 have good and bad evidence. 827 00:33:27,630 --> 00:33:29,490 And we've shown that that's correlated 828 00:33:29,490 --> 00:33:31,950 with a neural signature in the right TPJ getting 829 00:33:31,950 --> 00:33:35,880 more and more distinct over that same time in those same kids. 830 00:33:35,880 --> 00:33:39,660 And so I think thinking of representational dissimilarity 831 00:33:39,660 --> 00:33:42,840 as a model of conceptual change, while certainly wrong, 832 00:33:42,840 --> 00:33:46,369 is probably really powerful, and I'm very excited about it. 833 00:33:46,369 --> 00:33:47,910 And the last thing I will do is thank 834 00:33:47,910 --> 00:33:51,210 the people who did the work, especially everybody in my lab, 835 00:33:51,210 --> 00:33:52,760 and two PhD students-- 836 00:33:52,760 --> 00:33:55,600 Jorie Koster-Hale and Amy Skerry and you guys. 837 00:33:55,600 --> 00:33:57,470 Thank you.