The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

NANCY KANWISHER: Auditory cortex is fun to study, because very few people do it. If you study vision, you have to read hundreds of papers before you get off the ground. If you study audition, you can read three papers, and then you get to play. It's great.

So there's consensus about tonotopy; I mentioned this before. This is an inflated brain, oriented like this, top of the temporal lobe. High, low, high frequencies. That's like retinotopy, but for primary auditory cortex. This has been known forever; many people have reported it in animals and humans. Oops, we'll skip the high and low sounds.

Now, there are lots of claims about the organization of the rest of auditory cortex, outside that. But basically, there's no consensus; nobody knows how it's organized. So what we set out to do was take a very different approach from everything I've talked about so far. We said, let's try to figure out how high-level auditory cortex is organized -- not by coming up with one fancy little hypothesis and a beautifully designed pair of contrast conditions to test each little hypothesis, which is what we usually do. Let's just scan people listening to lots of stuff, and use some data-driven method to kind of shake the data and see what falls out. To be really technical about it.

When I say "we," this is really Sam Norman-Haignere, who did all of this. And as this project got more and more mathematically sophisticated, it got more and more taken over by Josh McDermott, who knows about audition and knows a lot of fancy math, much more than I do. They're fabulous collaborators.
So basically, what do we do? The first thing to recognize, especially when you're using these data-driven methods to broadly characterize a region of the brain, is that a major source of bias in the structure you discover is the stimuli you use. With vision, you have a real problem: you can't just scan people looking at everything, because there's too much. You can't keep people in the scanner for 20 hours. So you have a problem of how to choose, and your selection of stimuli shapes what you find. It's kind of a mess.

With audition, it turns out that there's a relatively small number of basic-level sounds that people can recognize. So if you all write down -- you don't have to do this, but just to illustrate -- three frequently heard sounds that are easily recognizable, that you encounter regularly in your life, the three sounds you wrote down are on our list of 165. Because, in fact, there are not that many different sounds. And so we did this on the web; we played people sounds. Obviously, it depends on the grain: if it's this person's voice versus that person's voice, there are hundreds of thousands. But at the grain of person speaking, dog barking, toilet flushing, ambulance siren -- at that grain, there are only a couple hundred sounds that everyone can pretty much recognize in a two-second clip, and that they hear frequently. And that's really lovely, because it means we can scan subjects listening to all of them, and we don't have this selection bias. So we basically tile the space of recognizable, frequently heard natural sounds, and we scan subjects listening to all of it.

OK, so here are some of our sounds.

[VIDEO PLAYBACK]

It's supposed to either rain or snow.

[END PLAYBACK]

This is our list, ordered by frequency. Most common, man speaking. Second most common, toilet flushing. And so forth.
[VIDEO PLAYBACK]

Hannah is good at compromising.

[VAROOM]

[END PLAYBACK]

So we pop subjects in the scanner, and we scan them while they listen to these sounds.

[VIDEO PLAYBACK]

[CLACK, CLACK, CLACK]

[VAROOM]

[END PLAYBACK]

Anyway, you get the idea.

[VIDEO PLAYBACK]

[WATER RUSHING]

[GASP]

[END PLAYBACK]

OK. So we scan them while they listen to these sounds, and what we get is a 165-dimensional vector describing the response profile of each voxel. That is, for each voxel in the brain, we ask: how strong was the response to each of those sounds? And we get something like this. Everybody with me? Sort of? OK.

So now what we do is take all of the voxels that are in greater suburban auditory cortex, which is a whole big region around, including but extending far beyond, primary auditory cortex. Anything in that zone that responds to any of these sounds is in the net. We take all of those voxels, and we put them into a huge matrix. So this is now all of the voxels from auditory cortex in 10 different subjects: 11,000 voxels. So we've got 11,000 voxels by 165 sounds.

Now the cool thing is, we throw away the labels on the matrix and just apply math, and ask: what is the dominant structure in here? And what I love about that is, this is a way to ask, in a very theory-neutral way, what are the basic dimensions of representation that we have in auditory cortex? Not, can I find evidence for my hypothesis -- but, let's look broadly and let the data tell us what the major structure is in there.

So basically, what we do is factorise this matrix. And probably half of you would understand this better than me.
But just to describe it: we do -- it's not exactly independent component analysis, but it's a version of that. Actually, we ran multiple versions of this, with slightly different constraints. Because of course, there are many ways to factorise this matrix; it's an underconstrained problem, so you need to bring some constraints. We tried to bring in minimalist ones, in several different ways. It turns out the results really don't depend strongly on this.

And so the cool thing is that the structure that emerges is not based on any hypothesis about functional profiles, because the labels are not even used in the analysis. And it's not based on any assumption about the anatomy of auditory cortex, because the locations of the voxels are not known to the analysis.

OK, so basically, the assumption of this analysis goes like this. Each voxel, as I lamented earlier, contains hundreds of thousands of neurons. So the hope here is that there's a relatively small number of kinds of neural populations; that each kind has a distinctive response profile over those 165 sounds; and that voxels contain different ratios of the different neural population types.

And so, further, we assume that there's this smallish number of canonical response profiles, such that we can model the response of each voxel as a linear weighted sum of some small number of components. The goal, then, is to discover what those components are. And the idea is that each component is basically the response profile and the anatomical distribution of some neural population.

So let me just say that one other way. We're going to take this matrix, and we're going to factorise it into some set of N components. Each of those components is going to have a 165-dimensional vector: its response profile.
Each component will also have a vector of weights across the voxels, telling us how much that component contributes to each voxel. And then we use, sort of, ICA to find these components.

The first thing you have to decide, of course, in any of these problems is: how many components? Sam did a beautiful analysis here, the details of which I'll skip -- because they're complicated, and because I actually don't remember all of them. But essentially, you can split the data in half, model one whole half, and measure how much variance is accounted for in the left-out data. And what you find is that variance accounted for goes up until six components, and then goes down, because you start overfitting. So we know that there are six components in there. Now, that doesn't mean there are only six kinds of neural populations; that's in part a statement about what we can resolve with functional MRI. But we know that with this method, we're looking for six components. That's what it finds.

And so, to remind you, the cool thing about the components that we're going to get out, which I'll tell you about in a second, is that nothing about this analysis constrained those components. There are no assumptions that went in there. If you think about it: even if all we can resolve for the response of each voxel to each sound is, say, high versus low -- and that's conservative; I think we can resolve a finer grain of response magnitude -- there are 2 to the 165 possible response profiles in here. We're searching a massive space; anything is possible. And similarly, the anatomical weight distributions are completely unconstrained with respect to whether they're clustered, overlapping, a speckly mess, any of those things.
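To make the shape of that analysis concrete, here is a minimal sketch in Python. It uses off-the-shelf FastICA as a stand-in for the customized decomposition the study actually used, and the particular split-half scheme and all variable names here are my assumptions for illustration, not the paper's code.

```python
# Minimal sketch, assuming plain FastICA in place of the study's own
# decomposition: factor the voxel-by-sound matrix D (n_voxels x 165)
# into k components, each with a 165-dim response profile and one
# weight per voxel, so that D ~= W @ R.
import numpy as np
from sklearn.decomposition import FastICA

def factorize(D, k):
    """Return voxel weights W (n_voxels x k), response profiles
    R (k x n_sounds), and the fitted model."""
    ica = FastICA(n_components=k, random_state=0, max_iter=1000)
    W = ica.fit_transform(D)  # how much each component contributes to each voxel
    R = ica.mixing_.T         # each component's response profile over the sounds
    return W, R, ica

def heldout_variance_explained(D1, D2, k):
    """Split-half test: factor one independent measurement of the matrix
    (D1), then score the reconstruction against a second measurement of
    the same voxels and sounds (D2). Past the true number of components,
    the factorization starts fitting noise in D1 and this score drops."""
    W, R, ica = factorize(D1, k)
    D_hat = ica.inverse_transform(W)  # W @ R plus the fitted mean
    ss_res = ((D2 - D_hat) ** 2).sum()
    ss_tot = ((D2 - D2.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Illustrative use with stand-in data; real D1 and D2 might be odd vs.
# even scan sessions, roughly 11,000 voxels x 165 sounds.
rng = np.random.default_rng(0)
signal = rng.standard_normal((11000, 165))
D1 = signal + 0.5 * rng.standard_normal(signal.shape)
D2 = signal + 0.5 * rng.standard_normal(signal.shape)
for k in (2, 4, 6, 8):
    print(k, round(heldout_variance_explained(D1, D2, k), 3))
```

On the real data, this kind of held-out-variance curve is what rose to a peak at six components and then fell.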
So what did we find? Again, we're looking for the response profiles and their anatomical distributions. What we get with the response profiles is that four of them are things we already knew about auditory cortex. One is high-frequency selectivity, and one is low-frequency selectivity: that's tonotopic cortex, the one thing we knew really solidly. A third thing we find is a response to pitch, which is different from frequency. I'll skip the details, but we'd actually published a paper the year before showing that there's a patch of cortex that likes pitch in particular, and that that's not the same as frequency. And it popped out as one of the components. A fourth fits a somewhat controversial claim about spectrotemporal modulation, which many people have written about -- the idea that this is somehow a useful basis set for auditory representations. And we found what seems to be a response that fits that. So all of those are either totally expected or roughly in line with a number of prior papers. It's the last two that are the cool ones.

And the numbers -- actually, the numbers refer to -- never mind. The numbers are largely arbitrary; they're for dramatic effect, really. Component four.

OK, so here's what one of these last two components is. What we have now is the magnitude of response of that component. Remember, a component is two things: it's got this profile here, and it's got its distribution over the cortex. So this is the profile over the 165 sounds. The colors refer to different categories of sound: we put the sounds on Mechanical Turk and had people assign each sound one of ten familiar category labels. Dark green is English speech, and light green is foreign speech, not understood by the subjects.
This just pops right out. We didn't twiddle, we didn't fuss, we didn't look for this; it just popped out. And light blue is singing. So this is a response to speech -- a really selective response to speech. It's not language, because it doesn't care whether it's English or foreign. So this is not something about representing the meaning of language; it's about representing the sounds of speech, which are present here and, to some extent, in vocal music. Pretty amazing.

Now, there have been a number of reports from functional MRI and from intracranial recordings suggesting cortical regions selective for speech, so this wasn't completely unprecedented, although it's certainly the strongest evidence for specificity. You can see that in this profile here. Dark purple is the next thing you get to after the speech and the singing -- you get all the way down before you're at dark purple. And dark purple is non-speech human vocalizations, stuff like laughing and crying, which are similar in some ways. Not exactly speech, but similar. So that's damn selective. Pretty cool. Yeah?

OK, the other component is even cooler, and here it is. Here's the color code: non-vocal music, and vocal music, or singing. This is a music-selective response. This has never been reported before. Many people have looked -- we have looked -- and it hadn't been found. We think that we were able to find this music-selective response -- in fact, we have evidence that we were able to find it -- in large part because of the use of this linear weighting model. I've got to show you where these things are in the brain. Running out of time, so I'm accelerating here.
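As a side note on what "damn selective" means for a profile like this, here is a hypothetical sketch of how one could quantify it: average the component's 165-sound profile within each Mechanical Turk category, then contrast the preferred category against the runner-up. The function, its names (component4_profile, turk_labels), and the index itself are illustrative assumptions, not the analysis from the paper.

```python
# Hypothetical selectivity summary for one component's response profile.
# Assumes non-negative response magnitudes; all names are illustrative.
import numpy as np

def category_selectivity(profile, labels):
    """profile: (165,) array of component responses, one per sound.
    labels: length-165 list of category names ('english_speech', ...)."""
    labels = np.array(labels)
    means = {c: profile[labels == c].mean() for c in sorted(set(labels))}
    ranked = sorted(means, key=means.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Contrast index: 0 = no preference between the top two categories;
    # approaching 1 = responds almost exclusively to the preferred one.
    index = (means[best] - means[runner_up]) / (means[best] + means[runner_up])
    return best, means, index

# e.g. best, means, idx = category_selectivity(component4_profile, turk_labels)
```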
We did a bunch of low-level acoustic controls to show that these responses really are selective: you can't account for them with low-level acoustics, and the sounds don't get the same response if you scramble them. They really have to do with the structure of speech and music. I'll skip all that.

So now we can take those components, project them back onto the brain, and ask: where are they? First, let's do the reality check. Here's tonotopic cortex, mapped in the usual hypothesis-driven way. Now we put outlines, just as landmarks, on the high- and low-frequency parts of tonotopic cortex. I mentioned before that one of the components was low frequencies. Here it is, aligning perfectly with the frequency mapping -- and remember, this one pops out of the natural-sound experiment, the ICA on the natural sounds, while that one is based on hypothesis-driven mapping. So that's a nice reality check.

But what about speech cortex? Well, here it is. The white and black outlines are primary auditory cortex, and you see this band of speech selectivity right below it, situated strategically between auditory cortex and language cortex, which is right below that -- not shown here, but we know it from other studies. So that's pretty cool.

Here's where the music stuff is: anterior of primary auditory cortex, with a little bit behind it as well.

So we think we were able to find the music selectivity, when it wasn't found before with functional MRI, because this method enables us to discover selective components even if they overlap, within voxels, with other components. Our linear weighting model takes the response apart and discovers the underlying latent component, which may be very selective even if, in all of the voxels, it's mixed in with something else. So actually, if you go in and you look at the same data,
and you look for voxels that are individually very music-selective, you can't really find them, because the music response overlaps a little bit with the pitch response and with some of the other stuff. So the standard methods can't find the selectivity the way we can with this kind of mathematical decomposition, which is really thrilling.

I can say one more thing, and then I'll take a question. The final thing is, we have recently had the opportunity to reality-check all this using intracranial recording from patients who have electrodes right on the surface of their brains. We've done this in three subjects now, and in each subject we see -- sorry, this is hard to see. These are the responses of two different electrodes over time. The stimulus lasts two seconds, so that's 0 to 2 seconds; this axis is time. And this is a speech-selective electrode, responding to native speech, foreign speech, and singing. And here is another electrode that responds to instrumental music, in purple, and singing, in blue.

And so what this shows is that we can validate the selectivity: with intracranial recording, we can see that selectivity in individual electrodes, even though you can't see it in individual voxels. So that sort of validates having to go through the tunnel of math to infer the latent selective components underneath, because we can see them in the raw data in the intracranial recordings.

So this is cool, because nobody even knows why people have music in the first place. And so the very idea that there are, apparently, bits of brain that are selectively engaged in processing music is radical, and fascinating, and deeply puzzling. So, you know, one of the speculations about why we have music -- Steve Pinker famously wrote in one of his books that music is auditory cheesecake. By which he meant that music is not some special-purpose thing; it just pings a bunch of preexisting mechanisms -- like the ones for fat, and sweet, and all that stuff.
Right. So that idea is that music makes use of mechanisms that exist for other reasons. And I think this argues otherwise. If you have selective brain regions -- and we don't know that they're innate; they're quite possibly learned -- they sure aren't piggybacking on other mechanisms. Those regions are pretty selective for music, as far as we can tell.