1 00:00:00,000 --> 00:00:01,944 [SQUEAKING] 2 00:00:01,944 --> 00:00:03,402 [RUSTLING] 3 00:00:03,402 --> 00:00:05,346 [CLICKING] 4 00:00:09,720 --> 00:00:12,620 NANCY KANWISHER: All right, OK, so let's start. 5 00:00:12,620 --> 00:00:16,070 We're talking about music today, which is fun and awesome. 6 00:00:16,070 --> 00:00:20,120 But first, let me give you a brief whirlwind reminder 7 00:00:20,120 --> 00:00:21,440 of what we did last time. 8 00:00:21,440 --> 00:00:23,960 We talked about hearing in general and speech 9 00:00:23,960 --> 00:00:25,070 in particular. 10 00:00:25,070 --> 00:00:28,280 And we started, as usual, with computational theory, 11 00:00:28,280 --> 00:00:31,730 thinking about what is the problem of audition 12 00:00:31,730 --> 00:00:32,903 and what is sound. 13 00:00:32,903 --> 00:00:34,070 It's the first step of that. 14 00:00:34,070 --> 00:00:36,920 And sound is pressure waves traveling through the air. 15 00:00:36,920 --> 00:00:38,690 And the cool thing about hearing is 16 00:00:38,690 --> 00:00:40,700 that we extract lots of information 17 00:00:40,700 --> 00:00:43,250 from this very, very simple signal of pressure 18 00:00:43,250 --> 00:00:44,480 waves arriving at the ear. 19 00:00:44,480 --> 00:00:48,320 We use it to recognize sounds, to localize sounds, 20 00:00:48,320 --> 00:00:50,270 to figure out what things are made of, 21 00:00:50,270 --> 00:00:54,150 and to understand events around us, and all kinds of things. 22 00:00:54,150 --> 00:00:58,700 And these problems are a major computational challenge. 23 00:00:58,700 --> 00:01:00,920 And in particular, they are ill-posed. 24 00:01:00,920 --> 00:01:05,570 That means that the available information doesn't give you 25 00:01:05,570 --> 00:01:07,190 a unique solution if you consider 26 00:01:07,190 --> 00:01:09,680 the computational problem narrowly. 27 00:01:09,680 --> 00:01:12,510 And that's true for separating sound sources. 28 00:01:12,510 --> 00:01:15,290 So if you have two sound sources at once, 29 00:01:15,290 --> 00:01:18,140 say, two people speaking or a person speaking and a lot 30 00:01:18,140 --> 00:01:21,080 of background noise, that's known as the cocktail party 31 00:01:21,080 --> 00:01:21,710 problem. 32 00:01:21,710 --> 00:01:23,960 Those sounds add on top of each other. 33 00:01:23,960 --> 00:01:26,570 And there's no way to pull them apart without bringing 34 00:01:26,570 --> 00:01:29,660 in other information, knowledge about the world 35 00:01:29,660 --> 00:01:32,630 or knowledge about the nature of voices or speaking 36 00:01:32,630 --> 00:01:33,380 or who's speaking. 37 00:01:33,380 --> 00:01:36,050 Or you need something else, or else it's ill-posed. 38 00:01:36,050 --> 00:01:39,710 That is, not solvable just from the basic input. 39 00:01:39,710 --> 00:01:41,720 Another case of an ill-posed problem in audition 40 00:01:41,720 --> 00:01:43,370 is the case of reverb. 41 00:01:43,370 --> 00:01:45,980 So the sound that I'm making right now that's coming out 42 00:01:45,980 --> 00:01:47,690 of my mouth is bouncing off the walls 43 00:01:47,690 --> 00:01:51,740 and is arriving at your ears, and each little piece of sound 44 00:01:51,740 --> 00:01:54,380 that I make is arriving at different latencies 45 00:01:54,380 --> 00:01:57,830 after I say it, as it travels different paths bouncing 46 00:01:57,830 --> 00:01:58,590 around the room. 47 00:01:58,590 --> 00:02:00,090 There's not too much reverb in here, 48 00:02:00,090 --> 00:02:02,390 so it's not that noticeable.
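A quick aside, not part of the lecture: the superposition point above can be made concrete in a few lines of code. Everything here, the sample rate, the signals, and the names, is invented purely for illustration.

```python
import numpy as np

# Two made-up "sources": a 200 Hz tone standing in for a voice, plus broadband noise.
fs = 16000                                   # sample rate in Hz (arbitrary)
t = np.arange(fs) / fs                       # one second of samples
rng = np.random.default_rng(0)
voice = 0.5 * np.sin(2 * np.pi * 200 * t)    # pretend talker
noise = 0.3 * rng.standard_normal(fs)        # pretend background noise

mixture = voice + noise                      # all the ear receives is the sum

# The inverse problem is ill-posed: any split of the form (voice + g, noise - g)
# yields exactly the same mixture, so the observation alone cannot decide.
g = 0.1 * np.sin(2 * np.pi * 50 * t)         # an arbitrary "ghost" signal
print(np.allclose(mixture, (voice + g) + (noise - g)))   # True
```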
49 00:02:02,390 --> 00:02:04,010 But if we did this in a cathedral, 50 00:02:04,010 --> 00:02:07,130 you'd hear all these echoes. 51 00:02:07,130 --> 00:02:09,410 OK, and so that makes another ill-posed problem, 52 00:02:09,410 --> 00:02:11,120 because all of those different sounds 53 00:02:11,120 --> 00:02:15,570 are added on top of themselves diminished in volume over time. 54 00:02:15,570 --> 00:02:17,210 And you get the sum of all of those, 55 00:02:17,210 --> 00:02:19,043 and you have to pull it apart and figure out 56 00:02:19,043 --> 00:02:20,930 what that sound is. 57 00:02:20,930 --> 00:02:23,480 So both problems are solved by using 58 00:02:23,480 --> 00:02:24,680 knowledge of the real world. 59 00:02:24,680 --> 00:02:27,410 In the case of reverb, it's actual implicit knowledge 60 00:02:27,410 --> 00:02:29,150 that you all have that you didn't 61 00:02:29,150 --> 00:02:31,603 know you have about the physics of reverb. 62 00:02:31,603 --> 00:02:33,770 Because if we play you sounds with the wrong physics 63 00:02:33,770 --> 00:02:35,960 of reverb, you won't be able to deal with reverb. 64 00:02:35,960 --> 00:02:38,390 And that says it's implicit knowledge in your head, which 65 00:02:38,390 --> 00:02:40,640 is pretty cool, that you use to constrain 66 00:02:40,640 --> 00:02:42,710 the ill-posed problem. 67 00:02:42,710 --> 00:02:43,880 We talked about speech. 68 00:02:43,880 --> 00:02:46,370 Phonemes are sounds that distinguish 69 00:02:46,370 --> 00:02:49,562 two different words in a language, like make and bake. 70 00:02:49,562 --> 00:02:51,020 Those are two different sounds that 71 00:02:51,020 --> 00:02:52,603 make the difference between two words. 72 00:02:55,220 --> 00:02:57,350 Each possible speech sound is not a phoneme 73 00:02:57,350 --> 00:02:59,240 in every language of the world. 74 00:02:59,240 --> 00:03:02,750 Languages have some subset of the space of possible phonemes 75 00:03:02,750 --> 00:03:05,720 that distinguish words in their language. 76 00:03:05,720 --> 00:03:09,980 Phonemes include vowels that have these stacked harmonics 77 00:03:09,980 --> 00:03:12,260 in the spectrogram, and consonants 78 00:03:12,260 --> 00:03:14,900 which are the quick transitions in the vertical stripes 79 00:03:14,900 --> 00:03:19,730 in the spectrogram, leading into the harmonic stacks of vowels. 80 00:03:19,730 --> 00:03:23,000 We talked about the problem of talker variability, 81 00:03:23,000 --> 00:03:26,085 that a given phoneme or word sounds very different, 82 00:03:26,085 --> 00:03:27,710 looks very different in the spectrogram 83 00:03:27,710 --> 00:03:29,630 if spoken by two different people. 84 00:03:29,630 --> 00:03:33,650 And conversely, the same person speaking two different words 85 00:03:33,650 --> 00:03:35,510 looks very different in the spectrogram. 86 00:03:35,510 --> 00:03:37,730 And so that means that the identity 87 00:03:37,730 --> 00:03:40,430 of the speaker and the identity of the word being said 88 00:03:40,430 --> 00:03:42,227 are all mushed up together. 89 00:03:42,227 --> 00:03:44,060 And that means that if you want to recognize 90 00:03:44,060 --> 00:03:46,490 the voice independent of what's being said, 91 00:03:46,490 --> 00:03:49,670 or recognize the word independent of who's saying it, 92 00:03:49,670 --> 00:03:52,760 you have a big computational challenge, a classic invariance 93 00:03:52,760 --> 00:03:53,270 problem. 94 00:03:53,270 --> 00:03:53,883 Yeah, Ben. 95 00:03:53,883 --> 00:03:55,425 AUDIENCE: I don't mean to hold us up. 
96 00:03:55,425 --> 00:03:58,880 I just wanted to make sure that I'm understanding. 97 00:03:58,880 --> 00:04:02,540 So the difference between consonants and vowels, 98 00:04:02,540 --> 00:04:05,750 are vowels just harmonic, like connective elements 99 00:04:05,750 --> 00:04:06,860 between consonants? 100 00:04:06,860 --> 00:04:09,080 And are consonants the percussive? 101 00:04:09,080 --> 00:04:10,640 Or are they actual-- 102 00:04:10,640 --> 00:04:12,350 like, I just didn't understand that. 103 00:04:12,350 --> 00:04:17,198 NANCY KANWISHER: Yeah, so in the spectrogram, those-- 104 00:04:17,198 --> 00:04:18,740 I didn't put that on the slide here-- 105 00:04:18,740 --> 00:04:21,440 but those horizontal red stripes in the slides 106 00:04:21,440 --> 00:04:24,440 that I showed you last time, those in the spectrogram, those 107 00:04:24,440 --> 00:04:27,680 are bands of energy at different frequencies 108 00:04:27,680 --> 00:04:29,450 that are sustained over a chunk of time. 109 00:04:29,450 --> 00:04:33,860 And those are typical of vowels, or singing, or musical sounds-- 110 00:04:33,860 --> 00:04:35,870 those harmonic sounds that have pitch. 111 00:04:35,870 --> 00:04:38,540 And so vowels have those sustained chunks 112 00:04:38,540 --> 00:04:40,422 that look like this in the spectrogram. 113 00:04:40,422 --> 00:04:42,380 And then there are these weird vertical stripes 114 00:04:42,380 --> 00:04:44,360 and transitions in and out of the vowels 115 00:04:44,360 --> 00:04:47,590 that are the consonants. 116 00:04:47,590 --> 00:04:49,550 AUDIENCE: Vowels are when you don't 117 00:04:49,550 --> 00:04:52,810 have [INAUDIBLE] spectrographs because air is just 118 00:04:52,810 --> 00:04:54,810 flowing through and you're filtering it somehow, 119 00:04:54,810 --> 00:04:57,150 like positioning your vocal tract in a certain way. 120 00:04:57,150 --> 00:04:59,330 And consonants are when you close off that air 121 00:04:59,330 --> 00:05:01,790 or restrict it in some way. 122 00:05:01,790 --> 00:05:04,882 So like S's and F's, you're not closing all the way off, 123 00:05:04,882 --> 00:05:06,840 but you're really constricting the vocal tract. 124 00:05:06,840 --> 00:05:08,215 And in a lot of other consonants, 125 00:05:08,215 --> 00:05:09,874 you're actually fully closing it. 126 00:05:13,025 --> 00:05:14,900 NANCY KANWISHER: OK, and then we talked a bit 127 00:05:14,900 --> 00:05:16,310 about the brain basis. 128 00:05:16,310 --> 00:05:18,920 And I pointed out that the neural anatomy 129 00:05:18,920 --> 00:05:21,320 of sound processing-- the subcortical neuroanatomy 130 00:05:21,320 --> 00:05:24,230 is much more complicated than the subcortical neuroanatomy 131 00:05:24,230 --> 00:05:25,640 of vision. 132 00:05:25,640 --> 00:05:27,682 In vision, you have one stop in the LGN, 133 00:05:27,682 --> 00:05:30,140 and then you go up to the cortex coming up from the retina. 134 00:05:30,140 --> 00:05:33,260 In audition, you have many stops between the cochlea, where 135 00:05:33,260 --> 00:05:38,660 you pick up sounds in the inner ear, and auditory cortex. 136 00:05:38,660 --> 00:05:40,280 Some of those stops are shown up here. 137 00:05:40,280 --> 00:05:42,810 And we didn't discuss them. 138 00:05:42,810 --> 00:05:45,200 So then we talked about primary auditory cortex. 139 00:05:45,200 --> 00:05:47,150 That's on the top of the temporal lobes, 140 00:05:47,150 --> 00:05:49,070 like right in there medially. 141 00:05:49,070 --> 00:05:50,750 You went in.
142 00:05:50,750 --> 00:05:53,520 And it has this tonotopic property, 143 00:05:53,520 --> 00:05:56,780 and that is a map of frequency space with this systematic 144 00:05:56,780 --> 00:06:01,130 high-low-high mapping of frequency space that you can 145 00:06:01,130 --> 00:06:01,940 see here-- 146 00:06:01,940 --> 00:06:03,620 high, low, high, like that. 147 00:06:03,620 --> 00:06:06,590 This is the top of the temporal lobe right there. 148 00:06:09,140 --> 00:06:13,850 And I pointed out that in animals and in one recent MRI 149 00:06:13,850 --> 00:06:18,140 study, the response properties of primary auditory cortex 150 00:06:18,140 --> 00:06:23,540 are well modeled by these fairly simple linear filters, known 151 00:06:23,540 --> 00:06:28,100 as spectrotemporal receptive fields or STRFs, shown here. 152 00:06:28,100 --> 00:06:31,460 So they're simple acoustic properties 153 00:06:31,460 --> 00:06:34,088 of a given band of frequencies rising or falling 154 00:06:34,088 --> 00:06:34,880 at different rates. 155 00:06:38,250 --> 00:06:42,470 So today, we're going to talk about music. 156 00:06:42,470 --> 00:06:44,780 And this is also an important moment in the course. 157 00:06:44,780 --> 00:06:47,750 Because up to now, we've been talking about functions that 158 00:06:47,750 --> 00:06:49,550 are mostly shared with animals. 159 00:06:49,550 --> 00:06:51,080 Speech is kind of on the cusp. 160 00:06:51,080 --> 00:06:53,437 I was going to make this point before speech. 161 00:06:53,437 --> 00:06:55,520 And that's actually muddy, because lots of animals 162 00:06:55,520 --> 00:06:57,830 are really good at speech perception. 163 00:06:57,830 --> 00:07:00,260 Chinchillas can distinguish ba from pa. 164 00:07:00,260 --> 00:07:02,150 Go figure, anyway. 165 00:07:02,150 --> 00:07:04,490 So they can perceive speech, but obviously they 166 00:07:04,490 --> 00:07:05,720 don't use it in the same way. 167 00:07:05,720 --> 00:07:09,273 But music is most definitely uniquely human. 168 00:07:09,273 --> 00:07:11,690 And so most of the things we'll be talking about from here 169 00:07:11,690 --> 00:07:15,200 on out are things about the human brain, in particular. 170 00:07:15,200 --> 00:07:16,925 And I think these are the coolest things 171 00:07:16,925 --> 00:07:19,550 in human cognitive neuroscience, because they tell us something 172 00:07:19,550 --> 00:07:23,010 about who we are as human beings. 173 00:07:23,010 --> 00:07:25,620 But they are also the hardest ones to study. 174 00:07:25,620 --> 00:07:28,088 Why is that? 175 00:07:28,088 --> 00:07:28,982 AUDIENCE: [INAUDIBLE] 176 00:07:28,982 --> 00:07:31,530 NANCY KANWISHER: No animal models. 177 00:07:31,530 --> 00:07:34,140 And I'm always lamenting how-- about the shortcomings 178 00:07:34,140 --> 00:07:36,450 of each of the methods in human cognitive neuroscience. 179 00:07:36,450 --> 00:07:38,850 And we have lots of them, and they complement each other, 180 00:07:38,850 --> 00:07:40,350 but there's a whole host of things 181 00:07:40,350 --> 00:07:42,460 that none of those methods are good for. 182 00:07:42,460 --> 00:07:44,400 And so now we're really out on thin ice 183 00:07:44,400 --> 00:07:46,890 trying to understand these things with a weaker 184 00:07:46,890 --> 00:07:49,380 set of methods where we can't go back and validate them 185 00:07:49,380 --> 00:07:50,430 with animal models. 186 00:07:50,430 --> 00:07:51,990 And that's just life. 187 00:07:51,990 --> 00:07:54,150 That's what we do. 
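Before the lecture turns to music, here is a minimal sketch of what the linear STRF model from the recap above amounts to; none of this is from the lecture, and the spectrogram and filter weights are toy values invented for illustration. The point is just that in this kind of model, the predicted response at each moment is a weighted sum of the recent past of the spectrogram.

```python
import numpy as np

# Toy spectrogram: 64 frequency channels by 200 time frames (made-up values).
rng = np.random.default_rng(1)
spec = rng.random((64, 200))

# Toy STRF: a frequency-by-time-lag weight matrix. A real STRF is estimated from
# neural data; this invented one simply prefers an upward frequency sweep.
n_freq, n_lags = 64, 10
strf = np.zeros((n_freq, n_lags))
for lag in range(n_lags):
    strf[6 * lag : 6 * lag + 4, lag] = 1.0   # excitation climbs in frequency toward recent lags

# Linear-filter prediction: at each time step, dot the STRF with the spectrogram
# patch covering the most recent n_lags frames.
pred = np.array([
    np.sum(strf * spec[:, t - n_lags : t]) for t in range(n_lags, spec.shape[1])
])
print(pred.shape)   # (190,) -- one predicted response per time frame
```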
188 00:07:54,150 --> 00:07:55,800 So now let's back up for a second 189 00:07:55,800 --> 00:07:58,380 and consider, why am I allocating a whole lecture 190 00:07:58,380 --> 00:08:03,660 for such a fluffy, frivolous topic as music. 191 00:08:03,660 --> 00:08:08,240 And I would say, that's because it's not fluffy. 192 00:08:08,240 --> 00:08:10,610 It's actually fundamental. 193 00:08:10,610 --> 00:08:12,740 And it's fundamental in the sense 194 00:08:12,740 --> 00:08:15,680 that music is both uniquely human-- 195 00:08:15,680 --> 00:08:19,280 no other animal has anything remotely like human music-- 196 00:08:19,280 --> 00:08:21,740 and it's also universally human. 197 00:08:21,740 --> 00:08:24,500 That is, every human culture that's been studied 198 00:08:24,500 --> 00:08:26,030 has some kind of music. 199 00:08:26,030 --> 00:08:28,520 So music is really an essential part 200 00:08:28,520 --> 00:08:30,450 of what it means to be a human being. 201 00:08:30,450 --> 00:08:32,659 It's really at the core of humanity. 202 00:08:32,659 --> 00:08:36,140 And that alone makes it interesting. 203 00:08:36,140 --> 00:08:37,385 But further-- question? 204 00:08:37,385 --> 00:08:38,990 AUDIENCE: So, like, birdsong-- 205 00:08:38,990 --> 00:08:40,657 NANCY KANWISHER: Birdsong doesn't count. 206 00:08:40,657 --> 00:08:43,710 No, birdsong doesn't count in all kinds of ways. 207 00:08:43,710 --> 00:08:46,580 One, it doesn't have anywhere near the flexibility 208 00:08:46,580 --> 00:08:47,670 and variability. 209 00:08:47,670 --> 00:08:54,482 There are like narrow domains in which each male zebra 210 00:08:54,482 --> 00:08:56,690 finch makes a slightly different version of the call, 211 00:08:56,690 --> 00:08:59,607 but within an extremely narrow range. 212 00:08:59,607 --> 00:09:01,190 There's actually a brain imaging study 213 00:09:01,190 --> 00:09:06,320 in songbirds that asks, 214 00:09:06,320 --> 00:09:14,180 do they have reward brain region responses to music. 215 00:09:14,180 --> 00:09:16,400 And the answer is, yes, in some cases. 216 00:09:16,400 --> 00:09:19,220 Like, do they enjoy it, right, is that part of-- 217 00:09:19,220 --> 00:09:22,003 and the answer is yes, but only when 218 00:09:22,003 --> 00:09:24,170 the significance of the birdsong is something that's 219 00:09:24,170 --> 00:09:26,900 relevant to them, like, there's a potential mate right here, 220 00:09:26,900 --> 00:09:28,130 then they like it. 221 00:09:28,130 --> 00:09:30,410 But they don't like it just for the sound. 222 00:09:30,410 --> 00:09:32,420 And that makes it very different from humans. 223 00:09:32,420 --> 00:09:34,087 And there are other differences as well. 224 00:09:36,610 --> 00:09:38,770 So it's further really important to us 225 00:09:38,770 --> 00:09:40,700 humans in a whole bunch of ways. 226 00:09:40,700 --> 00:09:45,020 One, we have been doing it for a very long time. 227 00:09:45,020 --> 00:09:48,310 And so, for example, the archaeological record shows 228 00:09:48,310 --> 00:09:52,840 these 40,000-year-old bone flutes that you can see, from 229 00:09:52,840 --> 00:09:58,600 the structure of the flute, make particular sets of possible 230 00:09:58,600 --> 00:10:00,100 pitches. 231 00:10:00,100 --> 00:10:02,710 And further, most people who've thought about this 232 00:10:02,710 --> 00:10:05,830 have argued that singing probably goes back much farther 233 00:10:05,830 --> 00:10:06,790 than the bone flutes. 234 00:10:06,790 --> 00:10:08,957 After all, you don't have to make anything to do it.
235 00:10:08,957 --> 00:10:10,960 You can just sing. 236 00:10:10,960 --> 00:10:13,810 Some have even speculated that singing 237 00:10:13,810 --> 00:10:16,090 evolved before language. 238 00:10:16,090 --> 00:10:18,700 It's just speculation, but that's possible. 239 00:10:18,700 --> 00:10:22,600 In any case, it goes way back evolutionarily. 240 00:10:22,600 --> 00:10:25,340 It also arises early in development. 241 00:10:25,340 --> 00:10:30,340 So very young infants are extremely interested in music. 242 00:10:30,340 --> 00:10:33,370 They're sensitive to beat and melody, independent of pitch. 243 00:10:33,370 --> 00:10:35,950 We'll talk more about that in a little bit. 244 00:10:35,950 --> 00:10:37,450 And finally, if you're not impressed 245 00:10:37,450 --> 00:10:39,220 with any of those arguments, people 246 00:10:39,220 --> 00:10:41,680 spend a lot of money on music. 247 00:10:41,680 --> 00:10:43,720 And if that's your index of importance, 248 00:10:43,720 --> 00:10:45,940 it's really important. 249 00:10:45,940 --> 00:10:50,470 Last year, $43 billion in sales. 250 00:10:50,470 --> 00:10:52,870 So I'd say it's not a frivolous topic. 251 00:10:52,870 --> 00:10:54,340 It's a fundamental topic. 252 00:10:54,340 --> 00:10:57,700 It's near the core of what it means to be a human being. 253 00:10:57,700 --> 00:11:01,270 And all of this raises a really obvious question. 254 00:11:01,270 --> 00:11:04,160 Why do we create and like music in the first place? 255 00:11:04,160 --> 00:11:06,640 What is it for? 256 00:11:06,640 --> 00:11:12,010 And this is a puzzle that people have thought about for at least 257 00:11:12,010 --> 00:11:14,800 centuries, probably millennia. 258 00:11:14,800 --> 00:11:17,980 And this includes all kinds of major thinkers, 259 00:11:17,980 --> 00:11:22,060 like Darwin, who said, "As neither the enjoyment 260 00:11:22,060 --> 00:11:25,030 nor the capacity of producing musical notes 261 00:11:25,030 --> 00:11:27,190 are faculties of the least direct use 262 00:11:27,190 --> 00:11:29,980 to man in reference to his ordinary habits of life, 263 00:11:29,980 --> 00:11:32,710 they must be ranked amongst the most mysterious with which 264 00:11:32,710 --> 00:11:34,990 he is endowed." 265 00:11:34,990 --> 00:11:39,250 So Darwin is implicitly assuming here 266 00:11:39,250 --> 00:11:41,710 that music is an evolved capacity. 267 00:11:41,710 --> 00:11:44,680 It's not something that we just learn and that cultures invent, 268 00:11:44,680 --> 00:11:47,440 if they feel like it or don't feel like it. 269 00:11:47,440 --> 00:11:52,480 But it's actually evolved and shaped by natural selection. 270 00:11:52,480 --> 00:11:55,120 And that means there must be some function 271 00:11:55,120 --> 00:11:57,130 that natural selection was acting on that 272 00:11:57,130 --> 00:11:59,740 was relevant to survival. 273 00:11:59,740 --> 00:12:05,110 So people have speculated about what that function might be. 274 00:12:05,110 --> 00:12:07,690 Those who think that music is an evolved function 275 00:12:07,690 --> 00:12:11,410 include Darwin, who speculated that it's for sexual selection. 276 00:12:11,410 --> 00:12:13,900 And his writing is so beautiful, I won't paraphrase it.
277 00:12:13,900 --> 00:12:18,520 He says, "It appears probable that the progenitors of man, 278 00:12:18,520 --> 00:12:21,190 either the males or females or both sexes, 279 00:12:21,190 --> 00:12:24,310 before acquiring the power of expressing their mutual love 280 00:12:24,310 --> 00:12:26,500 in articulate language, endeavored 281 00:12:26,500 --> 00:12:30,368 to charm each other with musical notes and rhythm." 282 00:12:30,368 --> 00:12:31,660 So that's Darwin's speculation. 283 00:12:31,660 --> 00:12:33,820 It's just a speculation, but a lovely one. 284 00:12:33,820 --> 00:12:38,290 Also, note that he threw in this radical idea here: 285 00:12:38,290 --> 00:12:42,370 "before acquiring the power to express their mutual love 286 00:12:42,370 --> 00:12:43,390 in articulate language." 287 00:12:43,390 --> 00:12:48,160 So he's speculating that music came before language. 288 00:12:48,160 --> 00:12:51,850 Again, all speculation, but interesting speculation. 289 00:12:51,850 --> 00:12:54,880 More recently, up the street, there's 290 00:12:54,880 --> 00:12:57,820 a bunch of people who've been thinking about this a lot. 291 00:12:57,820 --> 00:13:00,130 And Sam Mehr at Harvard has been arguing 292 00:13:00,130 --> 00:13:02,170 that the function of music and song, 293 00:13:02,170 --> 00:13:03,910 in particular, which he thinks is really 294 00:13:03,910 --> 00:13:08,890 the fundamental basic kind of native form of music, 295 00:13:08,890 --> 00:13:10,960 has an evolutionary role in managing 296 00:13:10,960 --> 00:13:12,442 parent-offspring conflict. 297 00:13:12,442 --> 00:13:14,650 And that's something that many evolutionary theorists 298 00:13:14,650 --> 00:13:15,730 have written about. 299 00:13:15,730 --> 00:13:18,640 The genetic interests of a parent and an offspring 300 00:13:18,640 --> 00:13:21,700 are highly overlapping, but not completely overlapping. 301 00:13:21,700 --> 00:13:23,590 The parent has other offspring to take care 302 00:13:23,590 --> 00:13:25,300 of besides this one right here. 303 00:13:25,300 --> 00:13:28,360 That one right there wants 100% of the parent's effort. 304 00:13:28,360 --> 00:13:30,950 Therein lies the conflict. 305 00:13:30,950 --> 00:13:35,020 And so Mehr has proposed that infant-directed 306 00:13:35,020 --> 00:13:37,840 song arose in this kind of arms race 307 00:13:37,840 --> 00:13:40,390 between the somewhat competing interests of the parent 308 00:13:40,390 --> 00:13:41,480 and the offspring. 309 00:13:41,480 --> 00:13:44,020 And it manages this need the infant has 310 00:13:44,020 --> 00:13:46,450 to know the parent is there with the fact 311 00:13:46,450 --> 00:13:48,460 that the parent has other needs, so I guess the idea is 312 00:13:48,460 --> 00:13:54,080 they can sing while attending to other offspring, and on and on. 313 00:13:54,080 --> 00:13:56,320 So there's other kinds of speculations like this. 314 00:13:56,320 --> 00:13:59,380 But importantly, this is not the only kind of view. 315 00:13:59,380 --> 00:14:01,810 It's not necessarily the case that music 316 00:14:01,810 --> 00:14:03,910 is an evolved capacity. 317 00:14:03,910 --> 00:14:06,520 So others have argued that it's not. 318 00:14:06,520 --> 00:14:09,160 So Steve Pinker, also up the street, 319 00:14:09,160 --> 00:14:11,500 has argued that music is "auditory 320 00:14:11,500 --> 00:14:15,010 cheesecake, an exquisite confection crafted 321 00:14:15,010 --> 00:14:17,290 to tickle the sensitive spots of at least six 322 00:14:17,290 --> 00:14:19,480 of our mental faculties.
323 00:14:19,480 --> 00:14:22,630 If it vanished from our species, the rest of our lifestyle 324 00:14:22,630 --> 00:14:24,628 would be virtually unchanged." 325 00:14:24,628 --> 00:14:26,920 I think that might say a little more about Steve Pinker 326 00:14:26,920 --> 00:14:29,350 than it does about music. 327 00:14:29,350 --> 00:14:33,640 Nonetheless, it's a possible view. 328 00:14:33,640 --> 00:14:36,130 What he's saying is that music is not 329 00:14:36,130 --> 00:14:38,620 an evolutionary adaptation at all, 330 00:14:38,620 --> 00:14:41,320 but an alternate use of neural machinery 331 00:14:41,320 --> 00:14:43,090 that evolved for some other function. 332 00:14:43,090 --> 00:14:45,400 And then once you have this neural machinery, what 333 00:14:45,400 --> 00:14:47,470 the hell, you can invent cultural forms 334 00:14:47,470 --> 00:14:51,430 and use it to do other things like music. 335 00:14:51,430 --> 00:14:54,305 And the most obvious kind of neural machinery 336 00:14:54,305 --> 00:14:55,930 that you might co-opt for that function 337 00:14:55,930 --> 00:14:58,360 would be neural machinery for speech or neural machinery 338 00:14:58,360 --> 00:15:02,020 for language, which, as I argued briefly last time, 339 00:15:02,020 --> 00:15:03,260 are not the same thing. 340 00:15:03,260 --> 00:15:05,770 One is the auditory perception of speech sounds 341 00:15:05,770 --> 00:15:07,540 and the other is the understanding 342 00:15:07,540 --> 00:15:09,850 of linguistic meaning. 343 00:15:09,850 --> 00:15:12,460 So the nice thing about this is, finally 344 00:15:12,460 --> 00:15:17,740 after all this entertaining but speculative stuff, 345 00:15:17,740 --> 00:15:19,750 we have an empirical question. 346 00:15:19,750 --> 00:15:21,790 This is something we can ask empirically. 347 00:15:21,790 --> 00:15:24,580 Does music actually use the same machinery 348 00:15:24,580 --> 00:15:27,887 as speech or language, or does it not? 349 00:15:27,887 --> 00:15:29,470 Some of the rest of these speculations 350 00:15:29,470 --> 00:15:30,760 are very hard to test. 351 00:15:30,760 --> 00:15:31,600 So stay tuned. 352 00:15:31,600 --> 00:15:34,670 We'll get back to that shortly. 353 00:15:34,670 --> 00:15:38,180 But first, let's step back and think, OK, 354 00:15:38,180 --> 00:15:41,260 if music is an evolved capacity, it should 355 00:15:41,260 --> 00:15:44,680 be innate in some sense, at least genetically 356 00:15:44,680 --> 00:15:46,930 specified, right, because that's what evolution does: 357 00:15:46,930 --> 00:15:53,380 natural selection acts on the genome to produce things 358 00:15:53,380 --> 00:15:57,010 that are genetically specified. 359 00:15:57,010 --> 00:15:59,950 And it should be present in all human societies, 360 00:15:59,950 --> 00:16:01,840 since the branching out of human societies 361 00:16:01,840 --> 00:16:04,820 is very recent in human evolution. 362 00:16:04,820 --> 00:16:06,970 So is it? 363 00:16:06,970 --> 00:16:11,140 Well, is music innate? 364 00:16:11,140 --> 00:16:15,280 So, suppose we found specialized machinery in the brain 365 00:16:15,280 --> 00:16:17,290 in adults for music. 366 00:16:17,290 --> 00:16:18,910 And we showed really definitively, 367 00:16:18,910 --> 00:16:21,520 it's really, really, really specialized for music. 368 00:16:21,520 --> 00:16:24,390 Would that prove innateness? 369 00:16:24,390 --> 00:16:25,460 No, why not? 370 00:16:25,460 --> 00:16:29,933 AUDIENCE: Might have [INAUDIBLE].
371 00:16:29,933 --> 00:16:32,830 NANCY KANWISHER: Bingo, thank you, very good. 372 00:16:32,830 --> 00:16:34,450 Yup, exactly. 373 00:16:34,450 --> 00:16:37,240 So this is something that many, many people are confused about, 374 00:16:37,240 --> 00:16:39,340 including colleagues of mine, most 375 00:16:39,340 --> 00:16:41,440 of the popular scientific press. 376 00:16:41,440 --> 00:16:43,570 Just because there's a specialized bit of brain 377 00:16:43,570 --> 00:16:46,270 that does x doesn't mean x is innate. 378 00:16:46,270 --> 00:16:47,530 It could be learned. 379 00:16:47,530 --> 00:16:50,638 And the clearest example of that is the visual word form area. 380 00:16:50,638 --> 00:16:51,430 Everybody get that? 381 00:16:54,280 --> 00:16:57,410 OK, so we've got to try something else. 382 00:16:57,410 --> 00:17:00,700 What if we find sensitivity to music, in some very music 383 00:17:00,700 --> 00:17:04,060 particular way, in newborns? 384 00:17:04,060 --> 00:17:07,569 Now that will get closer, but here's the problem. 385 00:17:07,569 --> 00:17:10,390 Fetuses can hear pretty well in the womb. 386 00:17:10,390 --> 00:17:13,060 And if the mom is singing or even if there's 387 00:17:13,060 --> 00:17:15,970 music in the ambient room, some of that sound 388 00:17:15,970 --> 00:17:17,540 gets into the womb. 389 00:17:17,540 --> 00:17:21,310 So that means that even if you show sensitivity to music, even 390 00:17:21,310 --> 00:17:24,619 in some very particular way, in a newborn, 391 00:17:24,619 --> 00:17:27,409 it's not a really tight argument that it wasn't, in part, 392 00:17:27,409 --> 00:17:27,909 learned. 393 00:17:30,520 --> 00:17:32,450 So this is a real challenge. 394 00:17:32,450 --> 00:17:34,540 It may just be impossible to answer. 395 00:17:34,540 --> 00:17:35,170 I'm not sure. 396 00:17:35,170 --> 00:17:35,920 I don't know how-- 397 00:17:35,920 --> 00:17:38,087 I don't know what method could actually answer this. 398 00:17:38,087 --> 00:17:40,060 But at the very least, it's really difficult 399 00:17:40,060 --> 00:17:41,500 and nobody's nailed it. 400 00:17:41,500 --> 00:17:45,820 So we can backtrack and ask the related, not quite 401 00:17:45,820 --> 00:17:49,570 as definitive question: "But OK, how early developing is it?" 402 00:17:49,570 --> 00:17:52,960 So often, developmental psychologists take this hedge. 403 00:17:52,960 --> 00:17:54,910 It's like, we can't exactly establish 404 00:17:54,910 --> 00:17:55,840 definitive innateness. 405 00:17:55,840 --> 00:17:58,660 But if things are really there very early 406 00:17:58,660 --> 00:18:00,730 and develop very fast, that's a suggestion 407 00:18:00,730 --> 00:18:04,610 that at least the system is designed to pick it up quickly. 408 00:18:04,610 --> 00:18:06,550 So even if there's a role for experience, 409 00:18:06,550 --> 00:18:09,008 there's some things that are picked up really fast and some 410 00:18:09,008 --> 00:18:10,280 things that aren't. 411 00:18:10,280 --> 00:18:13,450 And so how quickly is it picked up? 412 00:18:13,450 --> 00:18:15,430 So it turns out there's a bunch of studies 413 00:18:15,430 --> 00:18:16,600 that have looked at this. 414 00:18:16,600 --> 00:18:20,650 And young infants are in fact highly attuned to music. 415 00:18:20,650 --> 00:18:24,700 They're sensitive to pitch and to rhythm. 
416 00:18:24,700 --> 00:18:27,670 And in one charming study, they took two 417 00:18:27,670 --> 00:18:30,640 to three-day-old infants who were sleeping, 418 00:18:30,640 --> 00:18:35,380 put EEG electrodes on them, and played them sounds. 419 00:18:35,380 --> 00:18:37,750 They wanted to test beat induction, which is 420 00:18:37,750 --> 00:18:39,583 when you hear a rhythmic beat. 421 00:18:39,583 --> 00:18:40,750 You get trained to the beat. 422 00:18:40,750 --> 00:18:42,730 And you know when the next beat is. 423 00:18:42,730 --> 00:18:45,280 And that's true even if it's not just a single pulse. 424 00:18:45,280 --> 00:18:50,170 So they played these infants sounds like this. 425 00:18:50,170 --> 00:18:51,580 Oh, but the audio is not on. 426 00:18:54,820 --> 00:18:56,630 Now it's going to blast everyone. 427 00:19:02,030 --> 00:19:02,960 All right, hang on. 428 00:19:02,960 --> 00:19:04,151 AUDIENCE: It's playing. 429 00:19:04,151 --> 00:19:05,030 NANCY KANWISHER: Oh, it is playing? 430 00:19:05,030 --> 00:19:05,540 Turn up more? 431 00:19:05,540 --> 00:19:06,040 OK. 432 00:19:09,040 --> 00:19:12,680 Didn't want to deafen people. 433 00:19:12,680 --> 00:19:13,180 OK, here. 434 00:19:13,180 --> 00:19:14,300 AUDIENCE: It's going a little [INAUDIBLE].. 435 00:19:14,300 --> 00:19:15,837 Just turn it up so you can hear it. 436 00:19:26,102 --> 00:19:27,477 AUDIENCE: Go to HDMI, [INAUDIBLE] 437 00:19:27,477 --> 00:19:28,451 plugged in [INAUDIBLE]. 438 00:19:28,451 --> 00:19:32,230 NANCY KANWISHER: It's not, but that's supposed to work, right? 439 00:19:32,230 --> 00:19:35,655 It has worked before. 440 00:19:35,655 --> 00:19:36,989 AUDIENCE: In there. 441 00:19:36,989 --> 00:19:39,614 AUDIENCE: Let's just check your system settings really quickly. 442 00:19:46,530 --> 00:19:48,430 So I can hear you from my system. 443 00:19:48,430 --> 00:19:49,847 NANCY KANWISHER: Yeah, it's weird. 444 00:19:52,223 --> 00:19:54,640 AUDIENCE: Wait, if I can hear you from my system, you're-- 445 00:19:54,640 --> 00:19:57,032 NANCY KANWISHER: Then, it is going out, yeah. 446 00:19:57,032 --> 00:19:59,790 AUDIENCE: Oh, somebody unplugged both. 447 00:19:59,790 --> 00:20:01,010 OK, let's try [INAUDIBLE]. 448 00:20:01,010 --> 00:20:01,885 NANCY KANWISHER: Aah. 449 00:20:01,885 --> 00:20:03,530 AUDIENCE: OK, try it one more time. 450 00:20:03,530 --> 00:20:06,710 NANCY KANWISHER: OK, here we go. 451 00:20:06,710 --> 00:20:10,650 [MUSIC PLAYING] 452 00:20:10,650 --> 00:20:11,940 Did you hear that glitch? 453 00:20:11,940 --> 00:20:14,740 Let me do it again. 454 00:20:14,740 --> 00:20:16,987 Take it back here. 455 00:20:16,987 --> 00:20:21,680 [MUSIC PLAYING] 456 00:20:21,680 --> 00:20:24,770 Everybody hear the hiccup in the beat? 457 00:20:24,770 --> 00:20:26,750 So that's what these guys tested. 458 00:20:26,750 --> 00:20:33,080 They played rhythms like that to two to three-day-old infants. 459 00:20:33,080 --> 00:20:35,864 And-- 460 00:20:35,864 --> 00:20:36,535 [MUSIC PLAYING] 461 00:20:36,535 --> 00:20:37,410 Oh, now it's working. 462 00:20:37,410 --> 00:20:37,910 OK, great. 463 00:20:37,910 --> 00:20:40,980 OK, anyway, so here's what they find with their ERPs. 464 00:20:40,980 --> 00:20:45,330 This is the onset of that little hiccup, the time when 465 00:20:45,330 --> 00:20:47,970 that beat was supposed to happen and didn't, the missing 466 00:20:47,970 --> 00:20:49,410 beat right there.
467 00:20:49,410 --> 00:20:52,020 And this is an ERP response happening 468 00:20:52,020 --> 00:20:58,260 about 200 milliseconds later for that missing but expected beat. 469 00:20:58,260 --> 00:21:00,720 And let's see, this is a standard 470 00:21:00,720 --> 00:21:02,050 where the beat keeps going. 471 00:21:02,050 --> 00:21:03,760 Now you might say, well, of course they're different. 472 00:21:03,760 --> 00:21:05,302 One has a beat there and one doesn't. 473 00:21:05,302 --> 00:21:06,670 They're acoustically different. 474 00:21:06,670 --> 00:21:08,760 So they have a control condition which 475 00:21:08,760 --> 00:21:11,640 has a beat, but a different preceding context. 476 00:21:11,640 --> 00:21:14,995 So where that beat is not-- 477 00:21:14,995 --> 00:21:16,620 I'm sorry, where it has a missing beat, 478 00:21:16,620 --> 00:21:19,290 but that's expected by the previous context. 479 00:21:19,290 --> 00:21:22,620 So that's just evidence that even young infants 480 00:21:22,620 --> 00:21:26,190 have some sense of beat. 481 00:21:26,190 --> 00:21:29,670 So moving a little later, by five to six months, 482 00:21:29,670 --> 00:21:32,040 infants can recognize a familiar melody, 483 00:21:32,040 --> 00:21:34,710 even if it's shifted in pitch from the version 484 00:21:34,710 --> 00:21:36,120 that they learned. 485 00:21:36,120 --> 00:21:38,280 And that's really cool, because that 486 00:21:38,280 --> 00:21:41,130 means they use relative pitch, not absolute pitch. 487 00:21:41,130 --> 00:21:43,650 And that's something that adults do in music. 488 00:21:43,650 --> 00:21:44,740 We're very good at that. 489 00:21:44,740 --> 00:21:46,230 But no animal can do that. 490 00:21:46,230 --> 00:21:49,410 You can train animals to do various things like recognize 491 00:21:49,410 --> 00:21:52,260 a particular pair of sounds or even 492 00:21:52,260 --> 00:21:54,180 a few sounds, a few pitches. 493 00:21:54,180 --> 00:21:56,680 But if you transpose it, they don't recognize that. 494 00:21:56,680 --> 00:21:57,180 Yeah, Ben. 495 00:22:00,066 --> 00:22:02,540 AUDIENCE: Isn't it possible that we're 496 00:22:02,540 --> 00:22:04,982 just sensitive to rhythm and pitch 497 00:22:04,982 --> 00:22:06,815 rather than being sensitive to music itself? 498 00:22:06,815 --> 00:22:09,000 NANCY KANWISHER: Yes, hang on to that thought. 499 00:22:09,000 --> 00:22:11,820 It takes more work to show that it's music per se 500 00:22:11,820 --> 00:22:14,220 rather than just rhythm and pitch. 501 00:22:14,220 --> 00:22:16,200 We'd have to say what we meant by rhythm. 502 00:22:16,200 --> 00:22:18,510 If we load enough into the idea of rhythm, then 503 00:22:18,510 --> 00:22:20,230 it's like most of music right there. 504 00:22:20,230 --> 00:22:23,160 But we might say just even beat. 505 00:22:23,160 --> 00:22:24,660 How about that, right? 506 00:22:24,660 --> 00:22:26,670 And actually, already this study already 507 00:22:26,670 --> 00:22:28,230 is not just an even beat, because it 508 00:22:28,230 --> 00:22:29,730 has more context than that. 509 00:22:29,730 --> 00:22:35,520 That is, for example, the beats in this ERP infant study 510 00:22:35,520 --> 00:22:37,657 were not emphasized louder. 511 00:22:37,657 --> 00:22:39,990 The infants have to be able to pick out what the beat is 512 00:22:39,990 --> 00:22:41,910 from that complex sound. 513 00:22:41,910 --> 00:22:44,160 It's not automatically there in the acoustic signal 514 00:22:44,160 --> 00:22:46,725 as the louder onset sound. 
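To make the relative-pitch point from a moment ago concrete, here is a tiny sketch; the melody and the 3-semitone shift are invented for illustration, not taken from any study. Transposing multiplies every frequency by the same factor, so the absolute pitches all change while the ratios between successive notes, the intervals, stay exactly the same.

```python
import numpy as np

# A made-up melody as fundamental frequencies in Hz (roughly C4, E4, G4, E4, C4).
melody = np.array([261.63, 329.63, 392.00, 329.63, 261.63])

# Transpose up by 3 semitones: multiply every note by 2**(3/12).
transposed = melody * 2 ** (3 / 12)

# The absolute pitches differ...
print(np.allclose(melody, transposed))                 # False
# ...but relative pitch -- the ratio between successive notes -- is unchanged.
print(np.allclose(melody[1:] / melody[:-1],
                  transposed[1:] / transposed[:-1]))   # True
```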
515 00:22:51,810 --> 00:22:53,880 Five-month-old infants, if you play 516 00:22:53,880 --> 00:22:56,280 them a melody for one or two weeks, so they 517 00:22:56,280 --> 00:22:58,845 get really familiar with it and learn it, 518 00:22:58,845 --> 00:23:01,470 and then you don't play it again and you come back eight months 519 00:23:01,470 --> 00:23:04,340 later, they remember it. 520 00:23:04,340 --> 00:23:08,000 So music is really salient to infants. 521 00:23:08,000 --> 00:23:11,535 On the other hand, newborn infants' appreciation of music 522 00:23:11,535 --> 00:23:12,035 is not-- 523 00:23:16,930 --> 00:23:19,570 what is that not doing there? 524 00:23:19,570 --> 00:23:21,700 Oh, yeah, that's right. 525 00:23:21,700 --> 00:23:28,360 So they don't prefer consonance over dissonance, right. 526 00:23:28,360 --> 00:23:31,900 And they're insensitive to key. 527 00:23:35,020 --> 00:23:41,740 And they detect timing changes in rhythms, 528 00:23:41,740 --> 00:23:44,110 whether they are timing changes that 529 00:23:44,110 --> 00:23:46,000 are typical in the kind of music they've 530 00:23:46,000 --> 00:23:50,470 heard or typical in a more foreign kind of music. 531 00:23:50,470 --> 00:23:54,760 And so a really nice study that shows this 532 00:23:54,760 --> 00:23:58,030 is that in Western music, it's really common to have-- 533 00:23:58,030 --> 00:24:00,880 most Western music has isochronous beat. 534 00:24:00,880 --> 00:24:02,350 So you can see that over here. 535 00:24:02,350 --> 00:24:03,940 Here's an isochronous beat. 536 00:24:03,940 --> 00:24:05,980 Those are even, temporal intervals. 537 00:24:05,980 --> 00:24:08,930 And there's a whole note here and then half notes. 538 00:24:08,930 --> 00:24:11,860 And they're all multiples of each other, just wholes 539 00:24:11,860 --> 00:24:20,230 and halves, with the beat happening every four notes. 540 00:24:20,230 --> 00:24:22,630 Non-isochronous beat has this funny business where 541 00:24:22,630 --> 00:24:29,028 there's a whole note and a half note, making up just three-- 542 00:24:29,028 --> 00:24:30,320 what do you call those things-- 543 00:24:30,320 --> 00:24:31,040 they're not beats. 544 00:24:31,040 --> 00:24:31,670 What are they called? 545 00:24:31,670 --> 00:24:32,630 AUDIENCE: Three-beat notes. 546 00:24:32,630 --> 00:24:33,710 NANCY KANWISHER: Sorry, three notes, I guess. 547 00:24:33,710 --> 00:24:35,330 But it's not even notes, because it's whatever. 548 00:24:35,330 --> 00:24:36,872 I don't know what the terminology is. 549 00:24:36,872 --> 00:24:40,520 But anyway, this sound here followed by 4. 550 00:24:40,520 --> 00:24:42,410 This is non-isochronous rhythm. 551 00:24:42,410 --> 00:24:46,640 Those are really common in Balkan music 552 00:24:46,640 --> 00:24:50,392 where they do all kinds of crazy things, like 8/22 553 00:24:50,392 --> 00:24:51,350 or something like that. 554 00:24:51,350 --> 00:24:54,110 I mean, like really, really crazy musical meters. 555 00:24:54,110 --> 00:24:55,760 They're awesome, I love them. 556 00:24:55,760 --> 00:24:57,230 But they are very other. 557 00:24:57,230 --> 00:25:01,280 Like, if you grew up in Western society 558 00:25:01,280 --> 00:25:03,020 when you first hear Balkan rhythms, 559 00:25:03,020 --> 00:25:05,240 it's very hard to copy them. 560 00:25:05,240 --> 00:25:08,540 But six-month-old infants get rhythms 561 00:25:08,540 --> 00:25:14,030 equally well if they're isochronous or non-isochronous. 
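One concrete way to write down the two kinds of meter just described, with durations invented for illustration: an isochronous cycle divides time into equal intervals, while a non-isochronous cycle, such as a common Balkan-style 7/8, mixes groups of two and three pulses.

```python
# Inter-onset intervals of the strong beats, in units of the shortest pulse.
isochronous = [2, 2, 2, 2]        # e.g., a 4/4-like cycle: equal intervals
non_isochronous = [2, 2, 3]       # e.g., a Balkan-style 7/8: short, short, long

# Both repeat as perfectly regular cycles; what differs is whether the strong
# beats are equally spaced in time.
print(sum(isochronous), sum(non_isochronous))   # 8 and 7 pulses per cycle
print(len(set(isochronous)) == 1)               # True: equal intervals
print(len(set(non_isochronous)) == 1)           # False: unequal intervals
```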
562 00:25:14,030 --> 00:25:18,020 By 12 months, they can only automatically, 563 00:25:18,020 --> 00:25:21,020 like immediately, perceive and appreciate 564 00:25:21,020 --> 00:25:24,560 rhythms that are familiar from their cultural exposure. 565 00:25:24,560 --> 00:25:27,440 That is, isochronous if they're from a Western society 566 00:25:27,440 --> 00:25:32,730 or non-isochronous if they're from a Balkan country. 567 00:25:32,730 --> 00:25:33,230 Yeah? 568 00:25:33,230 --> 00:25:35,465 AUDIENCE: Just what is getting a meter again? 569 00:25:35,465 --> 00:25:36,840 NANCY KANWISHER: Well, so there's 570 00:25:36,840 --> 00:25:37,690 a whole bunch of studies. 571 00:25:37,690 --> 00:25:38,773 I'm just summarizing here. 572 00:25:38,773 --> 00:25:41,940 That is, they're sensitive to violations by all kinds 573 00:25:41,940 --> 00:25:45,000 of measures of little whatever behavioral thing you can get 574 00:25:45,000 --> 00:25:47,730 out of a five-month-old, whether it's how much they're kicking 575 00:25:47,730 --> 00:25:49,050 their legs or how much-- 576 00:25:49,050 --> 00:25:51,300 often, it's how hard they're sucking 577 00:25:51,300 --> 00:25:52,830 on a pacifier is another measure. 578 00:25:52,830 --> 00:25:54,870 So you just see, can they detect changes 579 00:25:54,870 --> 00:25:57,560 in a stimulus or violations by any of those measures. 580 00:25:57,560 --> 00:25:58,935 Or you could do it with the ERPs. 581 00:26:02,790 --> 00:26:08,880 So brief exposure to a previously unfamiliar rhythm 582 00:26:08,880 --> 00:26:12,120 is enough for a 12-month-old to appreciate 583 00:26:12,120 --> 00:26:14,610 the relevant distinctions in that rhythm, 584 00:26:14,610 --> 00:26:16,800 but not for adults. 585 00:26:16,800 --> 00:26:20,340 So if you haven't heard non-isochronous Balkan rhythms 586 00:26:20,340 --> 00:26:25,842 until now and you try dancing to them, good luck to you. 587 00:26:25,842 --> 00:26:27,300 You can probably get it eventually, 588 00:26:27,300 --> 00:26:30,270 but it will take you a long time. 589 00:26:30,270 --> 00:26:31,710 So does this sound familiar? 590 00:26:34,470 --> 00:26:36,540 Perceptual narrowing, right? 591 00:26:36,540 --> 00:26:38,370 So we keep encountering this. 592 00:26:38,370 --> 00:26:41,400 We encountered this with face recognition, 593 00:26:41,400 --> 00:26:46,140 with same versus other races, same versus other species. 594 00:26:46,140 --> 00:26:48,090 You see it in face recognition. 595 00:26:48,090 --> 00:26:49,920 We encountered it with phoneme perception. 596 00:26:49,920 --> 00:26:53,190 The phonemes-- remember, newborn infants can distinguish 597 00:26:53,190 --> 00:26:55,230 all the phonemes of the world's languages, 598 00:26:55,230 --> 00:26:57,990 even those exotic clicks that I played last time 599 00:26:57,990 --> 00:27:00,240 from Southern African languages. 600 00:27:00,240 --> 00:27:03,780 And you guys can't distinguish all those clicks now. 601 00:27:03,780 --> 00:27:06,420 So that's perceptual narrowing. 602 00:27:06,420 --> 00:27:08,520 It makes sense, of course, because the reason we 603 00:27:08,520 --> 00:27:11,745 have perceptual narrowing is you want to have invariants. 604 00:27:11,745 --> 00:27:13,620 You want to appreciate the sameness of things 605 00:27:13,620 --> 00:27:15,120 across transformations.
606 00:27:15,120 --> 00:27:19,020 And if your speech culture or your music culture 607 00:27:19,020 --> 00:27:22,027 is telling you these two things, this variation, doesn't count, 608 00:27:22,027 --> 00:27:23,610 you want to throw away that difference 609 00:27:23,610 --> 00:27:25,060 and treat them as the same. 610 00:27:25,060 --> 00:27:28,200 And then once you do that, you can't make that discrimination 611 00:27:28,200 --> 00:27:31,380 anymore. 612 00:27:31,380 --> 00:27:33,240 So on this question we started with, 613 00:27:33,240 --> 00:27:35,130 is music an evolved capacity. 614 00:27:35,130 --> 00:27:36,900 If so, it should be innate. 615 00:27:36,900 --> 00:27:39,420 And we haven't really answered that question, maybe. 616 00:27:39,420 --> 00:27:42,810 But as I said, it's really hard, and maybe ultimately 617 00:27:42,810 --> 00:27:43,410 unanswerable. 618 00:27:43,410 --> 00:27:45,688 But certainly it's early developing. 619 00:27:45,688 --> 00:27:46,980 What about this other question? 620 00:27:46,980 --> 00:27:50,970 Is it present in all human societies? 621 00:27:50,970 --> 00:27:53,910 Well, I said before briefly that it is. 622 00:27:53,910 --> 00:27:56,400 Oh yeah, sorry, we have to back up and say, OK, 623 00:27:56,400 --> 00:28:00,450 to answer this question, we have to say what music is, 624 00:28:00,450 --> 00:28:02,580 to answer whether it's present in all societies. 625 00:28:02,580 --> 00:28:05,910 And this has been a real problem, because music 626 00:28:05,910 --> 00:28:07,890 is notoriously hard to define. 627 00:28:07,890 --> 00:28:10,200 And many people have made a point 628 00:28:10,200 --> 00:28:13,950 of stretching the definition of music, including 629 00:28:13,950 --> 00:28:17,970 the ridiculous and hilarious John Cage. 630 00:28:23,990 --> 00:28:27,650 So this is his 1960 TV appearance. 631 00:28:27,650 --> 00:28:28,490 [VIDEO PLAYBACK] 632 00:28:28,490 --> 00:28:30,770 - Over here, Mr. Cage has a tape recording machine, 633 00:28:30,770 --> 00:28:33,283 which will provide much of the-- will you touch the machine 634 00:28:33,283 --> 00:28:35,450 so we can know where it is-- which will provide much 635 00:28:35,450 --> 00:28:37,220 of the background. 636 00:28:37,220 --> 00:28:39,860 Also, he works with a stopwatch. 637 00:28:39,860 --> 00:28:41,930 The reason he does this is because these sounds 638 00:28:41,930 --> 00:28:45,800 are in no sense accidental in their sequence. 639 00:28:45,800 --> 00:28:47,840 They each must fall mathematically 640 00:28:47,840 --> 00:28:49,070 at a precise point. 641 00:28:49,070 --> 00:28:51,170 So he wants to watch as he works. 642 00:28:51,170 --> 00:28:52,740 He takes it seriously. 643 00:28:52,740 --> 00:28:53,990 I think it's interesting. 644 00:28:53,990 --> 00:28:56,790 If you are amused, you may laugh. 645 00:28:56,790 --> 00:28:59,490 If you like it, you may buy the recording. 646 00:28:59,490 --> 00:29:02,666 John Cage and "Water Walk." 647 00:29:08,618 --> 00:29:15,562 [EXPERIMENTAL MUSICAL SOUNDS] 648 00:29:44,739 --> 00:29:45,322 [END PLAYBACK] 649 00:29:45,322 --> 00:29:48,290 NANCY KANWISHER: Anyway, it goes on and on like that. 650 00:29:48,290 --> 00:29:52,710 I guess it was a little edgier in 1959 than it is now. 651 00:29:52,710 --> 00:29:54,560 But he's making a point. 652 00:29:54,560 --> 00:29:58,940 The point he's making is, what the hell is music. 653 00:29:58,940 --> 00:30:02,990 And he's saying, I can call this music if I want. 654 00:30:02,990 --> 00:30:05,925 And everybody's enjoying it.
655 00:30:05,925 --> 00:30:06,425 Anyway. 656 00:30:11,420 --> 00:30:13,490 So you can watch the YouTube video, if you want. 657 00:30:13,490 --> 00:30:16,130 It's quite entertaining. 658 00:30:16,130 --> 00:30:18,182 Despite this kind of nihilistic view 659 00:30:18,182 --> 00:30:19,640 that anything could count as music, 660 00:30:19,640 --> 00:30:21,860 there are some things we can say. 661 00:30:21,860 --> 00:30:24,548 First thing I'd say is, if you want to study music, 662 00:30:24,548 --> 00:30:26,090 one of the first things you run into 663 00:30:26,090 --> 00:30:27,420 is, oh, what's going to count. 664 00:30:27,420 --> 00:30:28,753 You run into this problem here. 665 00:30:28,753 --> 00:30:30,170 But actually, I think that doesn't 666 00:30:30,170 --> 00:30:32,540 need to be so paralyzing as it feels at first. 667 00:30:32,540 --> 00:30:35,030 You can just take the most canonical forms where all 668 00:30:35,030 --> 00:30:38,150 of your subjects will agree that this is music and this isn't. 669 00:30:38,150 --> 00:30:40,630 And then someday you can study the edge cases later, 670 00:30:40,630 --> 00:30:43,130 but you don't need to agonize about them in order to get off 671 00:30:43,130 --> 00:30:45,020 the ground and study it. 672 00:30:45,020 --> 00:30:49,850 Further, we can ask what is music cross-culturally. 673 00:30:49,850 --> 00:30:52,580 Oh, right, I keep forgetting my next point. 674 00:30:52,580 --> 00:30:55,400 And let me make another point, which is that music is not just 675 00:30:55,400 --> 00:30:57,800 about a set of acoustic properties. 676 00:30:57,800 --> 00:31:01,250 You may think of music as just an auditory thing, 677 00:31:01,250 --> 00:31:06,570 a solitary experience, because a lot of the time it's like that. 678 00:31:06,570 --> 00:31:10,670 But remember that that's a very recent cultural invention. 679 00:31:10,670 --> 00:31:13,250 And throughout most of human evolution, 680 00:31:13,250 --> 00:31:16,970 music has been a fundamentally social phenomenon, more 681 00:31:16,970 --> 00:31:21,050 like this, experienced in groups of people 682 00:31:21,050 --> 00:31:25,130 as a kind of deeply social, communicative, interactive kind 683 00:31:25,130 --> 00:31:25,970 of enterprise. 684 00:31:25,970 --> 00:31:28,340 Or even if not in a large group, music 685 00:31:28,340 --> 00:31:30,410 is very social in this sense here. 686 00:31:30,410 --> 00:31:33,530 There's a whole bunch of cool studies about the role of song 687 00:31:33,530 --> 00:31:36,530 in infants and how infants use song to glean information 688 00:31:36,530 --> 00:31:38,900 about their social environment. 689 00:31:38,900 --> 00:31:41,310 And the point is just music is extremely social. 690 00:31:41,310 --> 00:31:45,200 It's not just defined by its acoustic properties. 691 00:31:45,200 --> 00:31:48,440 But in addition, we can ask, OK, let's 692 00:31:48,440 --> 00:31:52,320 look across the cultures of the world and ask, 693 00:31:52,320 --> 00:31:53,810 are there universals of music? 694 00:31:53,810 --> 00:31:57,740 Is there anything in common across all the different kinds 695 00:31:57,740 --> 00:32:00,980 of music that people experience in different cultures? 696 00:32:00,980 --> 00:32:04,370 For example, are there always discrete pitches or always 697 00:32:04,370 --> 00:32:05,163 isochronous beats? 698 00:32:05,163 --> 00:32:06,830 I already showed you there aren't always 699 00:32:06,830 --> 00:32:07,940 isochronous beats.
700 00:32:07,940 --> 00:32:10,940 And this is nice because it's an empirical question. 701 00:32:10,940 --> 00:32:13,700 There's a really cool paper from a few years ago 702 00:32:13,700 --> 00:32:16,190 where they took recordings of music 703 00:32:16,190 --> 00:32:19,340 from all over the world, all those colored dots, 704 00:32:19,340 --> 00:32:22,430 and they asked, what are the properties that 705 00:32:22,430 --> 00:32:26,540 are present in most of those musics 706 00:32:26,540 --> 00:32:28,010 and how prevalent are they. 707 00:32:28,010 --> 00:32:30,680 And what they found is there's no single property of music 708 00:32:30,680 --> 00:32:32,880 that's present in all of those cultures, 709 00:32:32,880 --> 00:32:35,150 but there's many that are present in most, 710 00:32:35,150 --> 00:32:37,260 and there are a lot of regularities. 711 00:32:37,260 --> 00:32:40,790 So this is a huge table from their paper 712 00:32:40,790 --> 00:32:44,570 where they list many different possible universals. 713 00:32:44,570 --> 00:32:48,080 And what you see is, the relevant column is this one here. 714 00:32:48,080 --> 00:32:52,550 And the white is the percent of those 304 cultures 715 00:32:52,550 --> 00:32:56,720 that they looked at that have that property in their music. 716 00:32:56,720 --> 00:32:59,242 So these top ones are very prevalent, 717 00:32:59,242 --> 00:33:00,950 just not quite universal, because there's 718 00:33:00,950 --> 00:33:04,580 a couple of cases that don't have it. 719 00:33:04,580 --> 00:33:07,070 So one of the most common ones is the idea 720 00:33:07,070 --> 00:33:10,280 that melodies are made from a limited set 721 00:33:10,280 --> 00:33:13,520 of discrete pitches, seven or fewer, 722 00:33:13,520 --> 00:33:17,210 and that those pitches are arranged in some kind of scale 723 00:33:17,210 --> 00:33:20,630 with unequal intervals between the notes. 724 00:33:20,630 --> 00:33:23,240 So that's as close to a universal of music 725 00:33:23,240 --> 00:33:24,740 as you can get, although you can see 726 00:33:24,740 --> 00:33:26,360 from that little teeny black snip 727 00:33:26,360 --> 00:33:29,780 that it's not quite perfectly universal. 728 00:33:29,780 --> 00:33:33,170 And the second thing is that most music has 729 00:33:33,170 --> 00:33:36,680 some kind of regular pulse, either an isochronous 730 00:33:36,680 --> 00:33:39,950 beat or even the non-isochronous ones 731 00:33:39,950 --> 00:33:42,890 have different subdivisions with different numbers of beats 732 00:33:42,890 --> 00:33:46,070 so that there's a systematic rhythmic pattern. 733 00:33:46,070 --> 00:33:49,340 So there's something kind of like melody and something 734 00:33:49,340 --> 00:33:54,410 kind of like rhythm in almost all the world's musics. 735 00:33:54,410 --> 00:33:57,110 They did find some pretty weird ones, one 736 00:33:57,110 --> 00:33:58,530 I can't resist playing for you. 737 00:33:58,530 --> 00:34:00,753 This is from Papua New Guinea. 738 00:34:00,753 --> 00:34:03,170 So as they say, the closest thing to an absolute universal 739 00:34:03,170 --> 00:34:06,140 was song containing discrete pitches, 740 00:34:06,140 --> 00:34:09,500 or regular rhythmic patterns, or both, which applied 741 00:34:09,500 --> 00:34:11,010 to almost the entire sample.
742 00:34:11,010 --> 00:34:13,850 However, music examples from Papua New Guinea 743 00:34:13,850 --> 00:34:18,949 contain combinations of friction blocks, swung slats, ribbon 744 00:34:18,949 --> 00:34:20,980 reeds, and moaning voices-- 745 00:34:20,980 --> 00:34:22,730 I don't know what those things are either, 746 00:34:22,730 --> 00:34:24,770 but I'll play them for you in a second-- 747 00:34:24,770 --> 00:34:28,489 that contained neither discrete pitches nor an isochronous 748 00:34:28,489 --> 00:34:29,060 beat. 749 00:34:29,060 --> 00:34:29,752 OK, here we go. 750 00:34:29,752 --> 00:34:30,419 [VIDEO PLAYBACK] 751 00:34:30,419 --> 00:34:37,290 [PAPUA NEW GUINEAN MUSIC] 752 00:34:46,132 --> 00:34:47,090 [END PLAYBACK] 753 00:34:47,090 --> 00:34:50,210 OK, pretty wild, huh? 754 00:34:50,210 --> 00:34:53,210 So maybe wilder, arguably, than John Cage. 755 00:34:53,210 --> 00:34:57,710 But anyway, so there are some like pretty remote edges 756 00:34:57,710 --> 00:35:01,640 to the concept of music. 757 00:35:01,640 --> 00:35:05,240 I mentioned before the case of consonance and dissonance 758 00:35:05,240 --> 00:35:09,130 and that infants don't prefer one over the other. 759 00:35:09,130 --> 00:35:12,770 In fact, this links to a really cool recent study 760 00:35:12,770 --> 00:35:14,360 from Josh McDermott's lab. 761 00:35:14,360 --> 00:35:18,230 And so the question he asked is, why do we like consonant sounds 762 00:35:18,230 --> 00:35:19,910 like this-- 763 00:35:19,910 --> 00:35:21,860 oops, [INAUDIBLE] play. 764 00:35:21,860 --> 00:35:22,400 Here we go. 765 00:35:22,400 --> 00:35:24,310 [RHYTHMIC SOUND] 766 00:35:24,310 --> 00:35:26,230 Kind of nice, right? 767 00:35:26,230 --> 00:35:28,810 But we're not so hot about this. 768 00:35:28,810 --> 00:35:30,600 [OFF TUNE SOUND] 769 00:35:30,600 --> 00:35:32,770 Right, everybody get that intuition? 770 00:35:32,770 --> 00:35:34,960 OK so what's up with that? 771 00:35:34,960 --> 00:35:37,240 So many people have hypothesized for a long time 772 00:35:37,240 --> 00:35:39,640 that that difference is based in biology, 773 00:35:39,640 --> 00:35:42,250 or even it's like a physical analog of it, 774 00:35:42,250 --> 00:35:45,040 beats and stuff like that. 775 00:35:45,040 --> 00:35:48,530 But actually, it's an empirical question. 776 00:35:48,530 --> 00:35:50,140 And so one way to ask that question 777 00:35:50,140 --> 00:35:53,950 is to go to a culture that's had minimal exposure 778 00:35:53,950 --> 00:35:57,303 to Western music, all of which really prefers consonance 779 00:35:57,303 --> 00:35:57,970 over dissonance. 780 00:35:57,970 --> 00:35:58,715 Yes, [? Carly? ?] 781 00:35:58,715 --> 00:36:00,173 AUDIENCE: Is consonants [INAUDIBLE] 782 00:36:00,173 --> 00:36:01,556 differentiated [INAUDIBLE]? 783 00:36:01,556 --> 00:36:03,070 NANCY KANWISHER: Oh, yeah, yeah. 784 00:36:03,070 --> 00:36:04,690 I'm sorry, totally different word-- 785 00:36:04,690 --> 00:36:09,450 consonance, C-E, has no relationship 786 00:36:09,450 --> 00:36:13,500 to consonants as distinguished from vowels. 787 00:36:13,500 --> 00:36:15,750 A consonant and a vowel, those are two different kinds 788 00:36:15,750 --> 00:36:16,692 of phonemes. 789 00:36:16,692 --> 00:36:18,900 Here, consonance is that difference between those two 790 00:36:18,900 --> 00:36:20,940 sounds I just played. 791 00:36:20,940 --> 00:36:24,810 And it has to do with the precise intervals 792 00:36:24,810 --> 00:36:29,700 of those harmonics in the harmonic stack. 
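A small sketch of that last point about harmonic intervals; the base frequency and the harmonic count are arbitrary choices for illustration. For a consonant interval like a perfect fifth (frequency ratio 3:2), many harmonics of the two notes land on exactly the same frequencies; for a dissonant interval like a minor second (ratio 16:15), essentially none do, and nearby partials beat against each other.

```python
import numpy as np

def shared_harmonics(ratio, n_harmonics=12, tol=1e-6):
    """Count pairs of harmonics of two tones that land on the same frequency."""
    f0 = 220.0                                         # arbitrary lower-note fundamental (Hz)
    low = f0 * np.arange(1, n_harmonics + 1)           # harmonic stack of the lower note
    high = f0 * ratio * np.arange(1, n_harmonics + 1)  # harmonic stack of the upper note
    return sum(1 for a in low for b in high if abs(a - b) < tol)

print(shared_harmonics(3 / 2))    # perfect fifth: several coinciding harmonics
print(shared_harmonics(16 / 15))  # minor second: none coincide within 12 harmonics
```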
793 00:36:29,700 --> 00:36:36,470 All right, so what McDermott and his co-workers did is to go 794 00:36:36,470 --> 00:36:41,900 to a Bolivian culture in the rainforest in a very remote 795 00:36:41,900 --> 00:36:45,680 location to test these people here, the Tsimane'. 796 00:36:45,680 --> 00:36:48,590 And the Tsimane' lack televisions 797 00:36:48,590 --> 00:36:53,360 and have very little access to recorded music and radio. 798 00:36:53,360 --> 00:36:56,510 Their village doesn't have electricity or tap water. 799 00:36:56,510 --> 00:37:01,010 You can't get there by road and you have to get there by canoe. 800 00:37:01,010 --> 00:37:03,410 So that's what McDermott and his team did. 801 00:37:03,410 --> 00:37:05,900 They went down there to visit the Tsimane'. 802 00:37:05,900 --> 00:37:09,050 And what they found, they played them 803 00:37:09,050 --> 00:37:12,290 consonant sounds and dissonant sounds, and with a translator, 804 00:37:12,290 --> 00:37:13,820 and spent a lot of time making sure 805 00:37:13,820 --> 00:37:16,580 that they really understood the difference between liking 806 00:37:16,580 --> 00:37:17,360 and not liking. 807 00:37:17,360 --> 00:37:18,877 And they tested their understanding 808 00:37:18,877 --> 00:37:20,960 of what it means to like something or not like it, 809 00:37:20,960 --> 00:37:22,340 and all kinds of other ways. 810 00:37:22,340 --> 00:37:25,010 And the upshot is, the Tsimane' do not 811 00:37:25,010 --> 00:37:29,600 have a preference for consonance over dissonance. 812 00:37:29,600 --> 00:37:32,150 So it's not a cultural universal. 813 00:37:32,150 --> 00:37:33,680 And that's consistent with the idea 814 00:37:33,680 --> 00:37:38,120 that it's not a preference in infants either. 815 00:37:38,120 --> 00:37:42,650 So this is something specific to Western music. 816 00:37:42,650 --> 00:37:47,030 So that's kind of introduction to some stuff about what music 817 00:37:47,030 --> 00:37:49,100 is and what its variability is and the fact 818 00:37:49,100 --> 00:37:51,680 that its presence is universal. 819 00:37:51,680 --> 00:37:55,370 And there are many very common properties across the world's 820 00:37:55,370 --> 00:37:59,710 musics, and it developed early. 821 00:37:59,710 --> 00:38:02,680 So let's ask, is music a separate capacity 822 00:38:02,680 --> 00:38:04,330 in the mind and brain. 823 00:38:04,330 --> 00:38:06,580 All right, so let's start with the classic way 824 00:38:06,580 --> 00:38:08,920 this has been asked for many decades, 825 00:38:08,920 --> 00:38:11,350 and that's to study patients with brain damage. 826 00:38:11,350 --> 00:38:13,510 And it turns out there is such a thing 827 00:38:13,510 --> 00:38:17,560 as amusia, the loss of music ability after brain damage. 828 00:38:21,130 --> 00:38:23,290 And so there are both sides of this. 829 00:38:23,290 --> 00:38:25,090 There are people who have impaired ability 830 00:38:25,090 --> 00:38:29,200 to recognize melodies without impaired speech perception. 831 00:38:29,200 --> 00:38:31,030 And there's the opposite-- people 832 00:38:31,030 --> 00:38:34,090 who have impaired speech recognition without impaired 833 00:38:34,090 --> 00:38:36,580 melody recognition. 834 00:38:36,580 --> 00:38:39,730 So that is, of course, a double dissociation, sort of, 835 00:38:39,730 --> 00:38:41,260 it's a little mucky in there. 836 00:38:41,260 --> 00:38:43,150 If you state the word simply like that, 837 00:38:43,150 --> 00:38:47,320 if you look in detail, there's some muck, as there often is. 
838 00:38:47,320 --> 00:38:49,930 So let's look in a little more detail at these two cases, 839 00:38:49,930 --> 00:38:54,580 the most interesting ones who seem to have problems 840 00:38:54,580 --> 00:38:59,230 with auditory tunes but not with words or other familiar sounds. 841 00:38:59,230 --> 00:39:02,080 So here is a horizontal slice. 842 00:39:02,080 --> 00:39:03,010 This is an old study. 843 00:39:03,010 --> 00:39:05,800 So it's a CAT scan showing you something's 844 00:39:05,800 --> 00:39:09,890 up with the anterior temporal lobes in this patient. 845 00:39:09,890 --> 00:39:14,110 And this was true of these two classic patients, CN and GL. 846 00:39:14,110 --> 00:39:17,530 Both of them were very bad at recognizing melodies, even 847 00:39:17,530 --> 00:39:21,160 highly familiar melodies, happy birthday and stuff like that, 848 00:39:21,160 --> 00:39:23,530 they don't recognize. 849 00:39:23,530 --> 00:39:26,330 They mostly have intact rhythm perception. 850 00:39:26,330 --> 00:39:28,960 And this is a core question we'll come back to. 851 00:39:28,960 --> 00:39:31,390 It's a complicated non-resolved situation. 852 00:39:31,390 --> 00:39:34,570 But these guys had intact rhythm perception 853 00:39:34,570 --> 00:39:37,810 and relatively intact language and speech perception. 854 00:39:40,360 --> 00:39:43,960 However, upon further testing, it 855 00:39:43,960 --> 00:39:46,690 becomes clear that these guys have a more general problem 856 00:39:46,690 --> 00:39:52,090 with pitch perception, even if it's not 857 00:39:52,090 --> 00:39:54,220 in the context of music. 858 00:39:54,220 --> 00:39:56,830 So this is a question that I asked all of you guys 859 00:39:56,830 --> 00:39:59,590 to think about, in the opposite direction, 860 00:39:59,590 --> 00:40:02,830 in your assignment for Sunday night. 861 00:40:02,830 --> 00:40:06,310 I asked you whether those electrodes 862 00:40:06,310 --> 00:40:08,770 in the brains of epilepsy patients 863 00:40:08,770 --> 00:40:13,420 that are sensitive to speech prosody, 864 00:40:13,420 --> 00:40:16,210 to the intonation contour in speech, 865 00:40:16,210 --> 00:40:18,370 whether you thought they would also 866 00:40:18,370 --> 00:40:21,310 be sensitive to the intonation contour in melodies. 867 00:40:21,310 --> 00:40:25,370 And most of you said, yes, it's pitch, pitch contour, must be. 868 00:40:25,370 --> 00:40:27,340 Well, it's a perfectly reasonable speculation, 869 00:40:27,340 --> 00:40:28,870 but not necessarily. 870 00:40:28,870 --> 00:40:31,420 Maybe we have special pitch contour processing 871 00:40:31,420 --> 00:40:35,380 for speech and different pitch contour processing for music. 872 00:40:35,380 --> 00:40:36,175 It's possible. 873 00:40:36,175 --> 00:40:37,300 It's an empirical question. 874 00:40:37,300 --> 00:40:40,120 Was there a question back there a second? 875 00:40:40,120 --> 00:40:46,720 OK, so maybe this is about pitch for both speech and music, not 876 00:40:46,720 --> 00:40:49,540 music per se. 877 00:40:49,540 --> 00:40:51,850 And so there are more detailed studies 878 00:40:51,850 --> 00:40:54,220 of patients with congenital amusia.
879 00:40:54,220 --> 00:40:57,610 And just like the case with acquired prosopagnosia 880 00:40:57,610 --> 00:41:00,010 versus congenital prosopagnosia, whether you get it 881 00:41:00,010 --> 00:41:02,650 from brain damage as an adult or whether you just always 882 00:41:02,650 --> 00:41:05,110 had it your whole life, and nobody knows exactly why 883 00:41:05,110 --> 00:41:07,480 and there's no evidence of any brain damage, 884 00:41:07,480 --> 00:41:10,960 the same thing happens with congenital amusia. 885 00:41:10,960 --> 00:41:14,860 So something like 4% of the population, 886 00:41:14,860 --> 00:41:16,528 they might say they're tone deaf. 887 00:41:16,528 --> 00:41:18,070 But just to tell you what that means, 888 00:41:18,070 --> 00:41:19,720 it can be really quite extreme. 889 00:41:19,720 --> 00:41:23,110 They can just completely fail to recognize familiar melodies 890 00:41:23,110 --> 00:41:26,080 that anyone else could recognize. 891 00:41:26,080 --> 00:41:29,800 They may be unable to detect really obvious wrong notes 892 00:41:29,800 --> 00:41:32,260 in a canonical melody. 893 00:41:32,260 --> 00:41:34,990 They're just really bad at all of this. 894 00:41:34,990 --> 00:41:38,950 And further, they don't have whopping obvious problems 895 00:41:38,950 --> 00:41:40,130 with speech perception. 896 00:41:40,130 --> 00:41:44,660 So at first, it was thought that speech perception was fine. 897 00:41:44,660 --> 00:41:47,830 But if you look closer, it looks like actually, 898 00:41:47,830 --> 00:41:52,540 even outside of music, there is a finer-grained deficit 899 00:41:52,540 --> 00:41:58,782 in pitch contour perception that shows up even in speech. 900 00:41:58,782 --> 00:42:01,240 So, as I mentioned before, we can ask this in the other direction. 901 00:42:01,240 --> 00:42:03,740 This is sort of the reverse case of the ones you considered. 902 00:42:03,740 --> 00:42:06,310 Now we have people who have this problem with pitch contour 903 00:42:06,310 --> 00:42:07,720 perception in music. 904 00:42:07,720 --> 00:42:10,300 Are they going to have a problem also with pitch contour 905 00:42:10,300 --> 00:42:12,280 perception in speech? 906 00:42:12,280 --> 00:42:14,950 So that's what this study looked at. 907 00:42:14,950 --> 00:42:16,630 So they played sounds like this. 908 00:42:16,630 --> 00:42:18,040 And you have to listen carefully. 909 00:42:18,040 --> 00:42:21,430 There will be sentences spoken. 910 00:42:21,430 --> 00:42:23,680 And you have to see if they're identical or different. 911 00:42:23,680 --> 00:42:24,513 So listen carefully. 912 00:42:24,513 --> 00:42:25,180 [VIDEO PLAYBACK] 913 00:42:25,180 --> 00:42:27,310 - She looks like Ann. 914 00:42:27,310 --> 00:42:29,088 She looks like Ann? 915 00:42:29,088 --> 00:42:29,671 [END PLAYBACK] 916 00:42:29,671 --> 00:42:32,171 NANCY KANWISHER: How many people thought that was different? 917 00:42:32,171 --> 00:42:33,130 Good, you got it. 918 00:42:33,130 --> 00:42:35,500 So one is the statement and one is-- 919 00:42:35,500 --> 00:42:37,060 it's sort of a question. 920 00:42:37,060 --> 00:42:38,840 It's in a sort of British accent. 921 00:42:38,840 --> 00:42:41,290 It's a little harder to detect, but different intonation 922 00:42:41,290 --> 00:42:42,130 contour. 923 00:42:42,130 --> 00:42:44,500 So that's the distinction that the Tang et al. 924 00:42:44,500 --> 00:42:47,530 paper was talking about.
925 00:42:47,530 --> 00:42:50,680 So we can then ask, that subtle distinction, 926 00:42:50,680 --> 00:42:53,980 are people with congenital amusia impaired at that. 927 00:42:53,980 --> 00:42:56,020 So if it's specific to music, they shouldn't be. 928 00:42:56,020 --> 00:43:00,250 But if it's any intonation contour, they should be. 929 00:43:00,250 --> 00:43:02,920 Yeah, I'll play the other ones. 930 00:43:02,920 --> 00:43:04,690 So they are in fact impaired. 931 00:43:04,690 --> 00:43:07,840 This is accuracy here, the controls are way up there, 932 00:43:07,840 --> 00:43:09,770 the amusics are down there. 933 00:43:09,770 --> 00:43:13,720 So they are impaired at this pitch contour perception thing, 934 00:43:13,720 --> 00:43:16,510 even in the context of music. 935 00:43:16,510 --> 00:43:20,050 I'm sorry, I said that wrong-- even in the context of speech. 936 00:43:20,050 --> 00:43:22,060 So it's not just about music. 937 00:43:22,060 --> 00:43:25,930 And in the controls, they have sounds like this, 938 00:43:25,930 --> 00:43:28,120 which are just tones. 939 00:43:32,600 --> 00:43:33,100 Got that? 940 00:43:33,100 --> 00:43:34,933 It's the same kind of thing, but not speech. 941 00:43:34,933 --> 00:43:38,080 And you see a similar deficit in the amusics 942 00:43:38,080 --> 00:43:39,700 compared to the controls. 943 00:43:39,700 --> 00:43:41,673 And then they have a nonsense speech version. 944 00:43:41,673 --> 00:43:42,340 [VIDEO PLAYBACK] 945 00:43:42,340 --> 00:43:45,382 - [INAUDIBLE] 946 00:43:45,382 --> 00:43:45,965 [END PLAYBACK] 947 00:43:45,965 --> 00:43:47,720 NANCY KANWISHER: Same deal-- 948 00:43:47,720 --> 00:43:52,350 the amusics are impaired compared to the controls. 949 00:43:52,350 --> 00:43:55,100 So that shows that the deficit for these guys 950 00:43:55,100 --> 00:43:57,620 is not specific to music per se but it 951 00:43:57,620 --> 00:44:02,600 seems to be a pitch contour problem in general that 952 00:44:02,600 --> 00:44:04,310 extends to speech. 953 00:44:04,310 --> 00:44:07,225 Yeah? 954 00:44:07,225 --> 00:44:08,380 AUDIENCE: Which of those-- 955 00:44:14,425 --> 00:44:17,100 NANCY KANWISHER: We'll get there, sort of. 956 00:44:17,100 --> 00:44:20,910 It would have been nice if the Tang et. al. paper had included 957 00:44:20,910 --> 00:44:22,462 some musical contour stuff. 958 00:44:22,462 --> 00:44:24,420 They didn't, but I'll show you some of our data 959 00:44:24,420 --> 00:44:28,130 shortly that gets close to this. 960 00:44:28,130 --> 00:44:33,410 OK, so all of that suggests that this amusia is really more 961 00:44:33,410 --> 00:44:35,660 about pitch than speech. 962 00:44:35,660 --> 00:44:37,370 I'm sorry, what's the matter with me. 963 00:44:37,370 --> 00:44:39,290 It's really more about pitch than music. 964 00:44:42,110 --> 00:44:45,650 But the reading that I assigned for today 965 00:44:45,650 --> 00:44:50,103 is a very new twist in this evolving story. 966 00:44:50,103 --> 00:44:51,770 So this used to be a nice, clean lecture 967 00:44:51,770 --> 00:44:53,088 with a simple conclusion. 968 00:44:53,088 --> 00:44:55,130 And now all of a sudden, I ran across that paper. 969 00:44:55,130 --> 00:44:59,740 It's like, wow, OK, that might not be quite the case. 970 00:44:59,740 --> 00:45:01,490 So what did you guys get from the reading? 971 00:45:01,490 --> 00:45:06,270 In what way does that slightly complicate the story here? 972 00:45:06,270 --> 00:45:07,020 Yeah, [INAUDIBLE]? 
973 00:45:07,020 --> 00:45:11,062 AUDIENCE: [INAUDIBLE] 974 00:45:12,978 --> 00:45:14,570 NANCY KANWISHER: Yeah, what they found 975 00:45:14,570 --> 00:45:18,740 is that amusics, not all of them, 976 00:45:18,740 --> 00:45:20,780 also have problems with rhythm. 977 00:45:20,780 --> 00:45:22,400 And that is inconsistent with the idea 978 00:45:22,400 --> 00:45:29,180 that amusia is just about pitch, whether in speech or music. 979 00:45:29,180 --> 00:45:34,890 And that says, OK, many amusics also have problems with rhythm. 980 00:45:34,890 --> 00:45:35,390 Yeah? 981 00:45:35,390 --> 00:45:39,078 AUDIENCE: [INAUDIBLE] 982 00:45:39,078 --> 00:45:41,190 NANCY KANWISHER: So there's a standard battery 983 00:45:41,190 --> 00:45:43,170 that people use that asks-- 984 00:45:43,170 --> 00:45:43,860 Dana, help me. 985 00:45:43,860 --> 00:45:45,610 What does the standard battery ask people? 986 00:45:45,610 --> 00:45:47,230 AUDIENCE: There's a lot of stuff, tests, 987 00:45:47,230 --> 00:45:49,942 things like listening to like a clip of a symphony 988 00:45:49,942 --> 00:45:54,722 and having to decide whether [INAUDIBLE] or they're too 989 00:45:54,722 --> 00:45:55,297 slow. 990 00:45:55,297 --> 00:45:57,130 NANCY KANWISHER: Kinds of things that people 991 00:45:57,130 --> 00:45:59,860 without musical training answer fine, 992 00:45:59,860 --> 00:46:01,180 although there's quite a range. 993 00:46:01,180 --> 00:46:03,100 I'm at the way bottom end of Dana's scale 994 00:46:03,100 --> 00:46:04,000 when she gives these. 995 00:46:07,199 --> 00:46:08,812 AUDIENCE: That rhythm falls apart, 996 00:46:08,812 --> 00:46:10,520 might not be able to tell the difference. 997 00:46:10,520 --> 00:46:14,248 NANCY KANWISHER: Just that this prior evidence, the stuff 998 00:46:14,248 --> 00:46:16,040 I showed and a whole bunch of other studies, 999 00:46:16,040 --> 00:46:19,310 seemed to suggest that amusia, both in acquired brain 1000 00:46:19,310 --> 00:46:22,310 damage and congenital amusia, is really, 1001 00:46:22,310 --> 00:46:25,220 when you drill down, more of a problem with pitch per se, 1002 00:46:25,220 --> 00:46:27,830 even pitch in speech. 1003 00:46:27,830 --> 00:46:31,820 And so then if it's about pitch, why would it also go along 1004 00:46:31,820 --> 00:46:33,237 with rhythm? 1005 00:46:33,237 --> 00:46:34,820 And so when it goes along with rhythm, 1006 00:46:34,820 --> 00:46:38,330 that starts to sound more like this is something about music. 1007 00:46:38,330 --> 00:46:39,960 It gums up the story. 1008 00:46:39,960 --> 00:46:40,770 Talia? 1009 00:46:40,770 --> 00:46:44,860 AUDIENCE: So I don't really know if this could be a confound, 1010 00:46:44,860 --> 00:46:48,010 but when it comes to natural speech 1011 00:46:48,010 --> 00:46:49,900 when you have some kind of intonation, 1012 00:46:49,900 --> 00:46:52,510 like pitch differences when you emphasize, 1013 00:46:52,510 --> 00:46:55,120 like especially in terms of a question, 1014 00:46:55,120 --> 00:46:58,378 aren't there also some kind of rhythmic differences as well? 1015 00:46:58,378 --> 00:46:59,295 NANCY KANWISHER: Yeah. 1016 00:46:59,295 --> 00:47:01,257 AUDIENCE: So how do you separate the two out? 1017 00:47:01,257 --> 00:47:03,340 NANCY KANWISHER: You just have to do a lot of work 1018 00:47:03,340 --> 00:47:05,140 to try to separate those out. 1019 00:47:05,140 --> 00:47:08,140 And so the paper I assigned to you guys did some of that work. 1020 00:47:08,140 --> 00:47:10,743 There's still room to quibble, but they did.
1021 00:47:10,743 --> 00:47:12,160 There was experiment two, and they 1022 00:47:12,160 --> 00:47:14,493 tried to deal with exactly that kind of thing of saying, 1023 00:47:14,493 --> 00:47:17,360 OK, let's try to make sure that-- 1024 00:47:17,360 --> 00:47:19,360 well, actually the controls that they were doing 1025 00:47:19,360 --> 00:47:20,277 are slightly different. 1026 00:47:20,277 --> 00:47:24,820 They were trying to make sure that the beat task didn't require pitch. 1027 00:47:24,820 --> 00:47:27,995 So it's very, very tricky to pull these things apart, 1028 00:47:27,995 --> 00:47:28,495 which is-- 1029 00:47:28,495 --> 00:47:31,010 AUDIENCE: Yes, so like the beat task doesn't make sense, 1030 00:47:31,010 --> 00:47:32,843 but I was just, like, in the verb first one, 1031 00:47:32,843 --> 00:47:36,940 even from the paper that was assigned Sunday. 1032 00:47:36,940 --> 00:47:39,940 I don't know, so you're saying that it's totally possible 1033 00:47:39,940 --> 00:47:43,750 to separate out rhythmic differences from when 1034 00:47:43,750 --> 00:47:45,045 you're just changing pitch. 1035 00:47:45,045 --> 00:47:47,160 NANCY KANWISHER: It's really, really difficult. 1036 00:47:47,160 --> 00:47:48,660 It's really difficult. Dana's trying 1037 00:47:48,660 --> 00:47:50,550 to do experiments to do this right now. 1038 00:47:50,550 --> 00:47:53,580 And she's invented some delightful and crazy stimuli 1039 00:47:53,580 --> 00:47:55,380 that try to have one and not the other. 1040 00:47:55,380 --> 00:47:57,240 It's very tricky. 1041 00:47:57,240 --> 00:48:00,180 You can have rhythm without pitch change. 1042 00:48:00,180 --> 00:48:02,640 That you can totally do. 1043 00:48:02,640 --> 00:48:06,150 It's really hard or impossible to have a melodic contour 1044 00:48:06,150 --> 00:48:08,790 without some beat or other. 1045 00:48:08,790 --> 00:48:11,200 We have some crazy stimuli that sort of do that, 1046 00:48:11,200 --> 00:48:13,050 but they're pretty crazy. 1047 00:48:13,050 --> 00:48:16,050 So anyway, these are very tricky things to pull apart. 1048 00:48:16,050 --> 00:48:18,360 And this is all right at the cutting edge. 1049 00:48:18,360 --> 00:48:20,550 These things have not been cleanly separated. 1050 00:48:20,550 --> 00:48:21,550 I'm running out of time. 1051 00:48:21,550 --> 00:48:24,130 So do you have a quick question? 1052 00:48:24,130 --> 00:48:26,800 OK, sorry about that. 1053 00:48:26,800 --> 00:48:28,840 So conclusions from the patient literature, 1054 00:48:28,840 --> 00:48:32,290 there's suggestive evidence for specialization for music, 1055 00:48:32,290 --> 00:48:35,170 but no really clear dissociations. 1056 00:48:35,170 --> 00:48:37,270 Music deficits are frequently but not 1057 00:48:37,270 --> 00:48:41,620 always associated with just more general pitch deficits. 1058 00:48:41,620 --> 00:48:43,540 And all of this is complicated because there's 1059 00:48:43,540 --> 00:48:47,450 lots of possible components of music, right. 1060 00:48:47,450 --> 00:48:49,630 When there's pitch deficits, is it 1061 00:48:49,630 --> 00:48:52,610 pitch or relative pitch, interval, key, melody, beat, 1062 00:48:52,610 --> 00:48:53,110 meter? 1063 00:48:53,110 --> 00:48:55,780 All of these things are different facets of music. 1064 00:48:55,780 --> 00:48:59,410 And so it's really not resolved exactly what's going on here. 1065 00:48:59,410 --> 00:49:01,750 It's kind of encouraging that there's a space in there, 1066 00:49:01,750 --> 00:49:03,520 but not resolved.
1067 00:49:03,520 --> 00:49:05,852 So let's go on to functional MRI. 1068 00:49:05,852 --> 00:49:07,310 And we're going to run out of time. 1069 00:49:07,310 --> 00:49:09,060 So let me just take a moment to figure out 1070 00:49:09,060 --> 00:49:11,560 how I'm going to do this. 1071 00:49:11,560 --> 00:49:13,960 What the hell am I going to do here? 1072 00:49:13,960 --> 00:49:16,070 Well, I hate to-- 1073 00:49:18,640 --> 00:49:21,880 OK, you guys are going to tell me at 12:05. 1074 00:49:21,880 --> 00:49:23,320 Yeah, OK. 1075 00:49:23,320 --> 00:49:24,880 Maybe we can get all through this. 1076 00:49:24,880 --> 00:49:28,600 So here's a really charming study from a few years ago 1077 00:49:28,600 --> 00:49:32,110 that tried to ask whether there are systematic brain 1078 00:49:32,110 --> 00:49:34,390 regions that are engaged in processing music. 1079 00:49:34,390 --> 00:49:38,317 And they used a really fun perceptual illusion 1080 00:49:38,317 --> 00:49:39,400 that you're going to hear. 1081 00:49:39,400 --> 00:49:41,890 I'm going to play a speech clip. 1082 00:49:41,890 --> 00:49:44,740 And it's part of it is going to be repeated many times. 1083 00:49:44,740 --> 00:49:48,786 And just listen to it and think about what it sounds like. 1084 00:49:48,786 --> 00:49:49,740 [VIDEO PLAYBACK] 1085 00:49:49,740 --> 00:49:53,250 - For it had never been his good luck to own and eat one. 1086 00:49:53,250 --> 00:49:55,410 There was a cold drizzle of rain. 1087 00:49:55,410 --> 00:49:58,400 The atmosphere was murky. 1088 00:49:58,400 --> 00:50:00,200 There was a cold drizzle. 1089 00:50:00,200 --> 00:50:02,090 There was a cold drizzle. 1090 00:50:02,090 --> 00:50:03,920 There was a cold drizzle. 1091 00:50:03,920 --> 00:50:05,720 There was a cold drizzle. 1092 00:50:05,720 --> 00:50:07,550 There was a cold drizzle. 1093 00:50:07,550 --> 00:50:09,139 There was a cold drizzle. 1094 00:50:09,139 --> 00:50:09,722 [END PLAYBACK] 1095 00:50:09,722 --> 00:50:11,826 NANCY KANWISHER: What happened? 1096 00:50:11,826 --> 00:50:13,305 AUDIENCE: [INAUDIBLE] 1097 00:50:13,305 --> 00:50:15,300 NANCY KANWISHER: Yeah? 1098 00:50:15,300 --> 00:50:16,398 What happened? 1099 00:50:16,398 --> 00:50:18,888 AUDIENCE: [INAUDIBLE] 1100 00:50:18,888 --> 00:50:21,040 NANCY KANWISHER: You start to hear a melody. 1101 00:50:21,040 --> 00:50:23,020 And you didn't hear the melody the first time he said it. 1102 00:50:23,020 --> 00:50:24,395 It was just normal speech, right. 1103 00:50:24,395 --> 00:50:26,830 Speech has this kind of intonation contour. 1104 00:50:26,830 --> 00:50:28,810 And he's speaking with an intonation contour. 1105 00:50:28,810 --> 00:50:30,560 But then somehow when you keep hearing it, 1106 00:50:30,560 --> 00:50:32,860 it turns into a melody. 1107 00:50:32,860 --> 00:50:35,200 So it turns out that doesn't work for all speech clips. 1108 00:50:35,200 --> 00:50:37,750 In fact, it's really hard to find speech clips for which it 1109 00:50:37,750 --> 00:50:38,250 works. 1110 00:50:38,250 --> 00:50:39,970 But there are some. 1111 00:50:39,970 --> 00:50:43,090 But everyone has that experience, or most people do. 1112 00:50:43,090 --> 00:50:45,310 And that gives us a really nice lever, 1113 00:50:45,310 --> 00:50:48,760 because we can take that same acoustic sound when you hear it 1114 00:50:48,760 --> 00:50:51,670 as speech and when you hear it as melody and we can ask, 1115 00:50:51,670 --> 00:50:54,100 are there brain regions that respond differentially. 
1116 00:50:54,100 --> 00:50:56,832 It's sort of analogous to upright versus inverted faces. 1117 00:50:56,832 --> 00:50:57,790 Well, it's even better. 1118 00:50:57,790 --> 00:50:59,590 It's the exact same sound clip that's 1119 00:50:59,590 --> 00:51:02,830 construed one way at first and another way afterwards. 1120 00:51:02,830 --> 00:51:06,070 Everybody get that? 1121 00:51:06,070 --> 00:51:07,510 So that's what these guys did. 1122 00:51:07,510 --> 00:51:09,590 They used a standard block design. 1123 00:51:09,590 --> 00:51:11,715 They just listened to those sounds 1124 00:51:11,715 --> 00:51:13,090 and they just looked in the brain 1125 00:51:13,090 --> 00:51:17,230 to see what bits respond more after the sound starts getting 1126 00:51:17,230 --> 00:51:19,060 perceived as music than before when 1127 00:51:19,060 --> 00:51:20,920 it was being heard as speech. 1128 00:51:20,920 --> 00:51:23,350 And they got a bunch of blobs in the brain. 1129 00:51:23,350 --> 00:51:25,330 It's a bit of a mess, but they got some stuff. 1130 00:51:27,880 --> 00:51:30,200 And so that's fun. 1131 00:51:30,200 --> 00:51:32,530 But it's also ambiguous. 1132 00:51:32,530 --> 00:51:35,620 We still don't know if this is about some kind 1133 00:51:35,620 --> 00:51:38,440 of pitch processing, which becomes more salient-- 1134 00:51:38,440 --> 00:51:41,680 you hear it as abstract pitch-- or whether it's really 1135 00:51:41,680 --> 00:51:43,750 about melodic contour or what. 1136 00:51:43,750 --> 00:51:45,970 So that's a cool study, but I think it doesn't really 1137 00:51:45,970 --> 00:51:48,140 nail what's going on. 1138 00:51:48,140 --> 00:51:53,890 So another angle at this is to ask whether music recruits 1139 00:51:53,890 --> 00:51:56,080 neural machinery for language. 1140 00:51:56,080 --> 00:51:59,920 So let me say why this has been such a pervasive question 1141 00:51:59,920 --> 00:52:00,470 in the field. 1142 00:52:00,470 --> 00:52:04,150 So there's a lot of people who have pointed out for 30 years, 1143 00:52:04,150 --> 00:52:07,540 or probably more, there are many deep commonalities 1144 00:52:07,540 --> 00:52:09,310 between language and music. 1145 00:52:09,310 --> 00:52:12,280 So they're both distinctively or uniquely human. 1146 00:52:12,280 --> 00:52:14,530 They're natively auditory. 1147 00:52:14,530 --> 00:52:16,930 That is, we can read language, but that's very recent. 1148 00:52:16,930 --> 00:52:20,770 Really, language is all about hearing, evolutionarily. 1149 00:52:20,770 --> 00:52:22,990 They unfold over time. 1150 00:52:22,990 --> 00:52:26,530 And they have complex hierarchical structure. 1151 00:52:26,530 --> 00:52:28,570 So you can parse a sentence in various ways 1152 00:52:28,570 --> 00:52:29,945 and there are all kinds of people 1153 00:52:29,945 --> 00:52:34,060 who've come up with ways to have hierarchical parsings of pieces 1154 00:52:34,060 --> 00:52:35,390 of music as well. 1155 00:52:35,390 --> 00:52:37,300 So there's a lot of deep connections 1156 00:52:37,300 --> 00:52:39,670 between language and music. 1157 00:52:39,670 --> 00:52:41,620 And so many people have hypothesized that they 1158 00:52:41,620 --> 00:52:44,170 use common brain machinery. 1159 00:52:44,170 --> 00:52:47,800 And there, in fact, many reports from neuroimaging 1160 00:52:47,800 --> 00:52:50,230 that argue that in fact they do use common machinery. 1161 00:52:50,230 --> 00:52:53,410 Like, we found overlapping activation in Broca's area 1162 00:52:53,410 --> 00:52:57,670 for people listening to music and speech. 
1163 00:52:57,670 --> 00:53:01,350 However, those studies are all group analyses. 1164 00:53:01,350 --> 00:53:03,370 I forget if I've gone on my tirade in here 1165 00:53:03,370 --> 00:53:04,330 about group analyses. 1166 00:53:04,330 --> 00:53:06,583 Have I done the group analysis tirade in here? 1167 00:53:06,583 --> 00:53:07,750 You'll get more of it later. 1168 00:53:07,750 --> 00:53:10,042 I'll do a brief version now, and you'll get more later. 1169 00:53:10,042 --> 00:53:11,770 Here's the problem-- group analysis 1170 00:53:11,770 --> 00:53:14,230 is you scan 12 subjects. 1171 00:53:14,230 --> 00:53:16,760 You align their brains as best you can. 1172 00:53:16,760 --> 00:53:19,900 And you do an analysis that goes across them. 1173 00:53:19,900 --> 00:53:23,110 And you find some blob, say, right here, 1174 00:53:23,110 --> 00:53:26,230 for listening to sentences versus listening 1175 00:53:26,230 --> 00:53:27,490 to non-word strings. 1176 00:53:27,490 --> 00:53:29,470 OK, that's a standard finding. 1177 00:53:29,470 --> 00:53:31,030 Then you do it again for listening 1178 00:53:31,030 --> 00:53:34,930 to melodies versus listening to scrambled melodies. 1179 00:53:34,930 --> 00:53:37,800 And you find the blob overlaps. 1180 00:53:37,800 --> 00:53:40,590 And then you say, hey, common neural machinery 1181 00:53:40,590 --> 00:53:46,560 for sentence understanding and for music perception. 1182 00:53:46,560 --> 00:53:48,880 Now that's an interesting question to ask. 1183 00:53:48,880 --> 00:53:50,500 It's close to the right way to do it, 1184 00:53:50,500 --> 00:53:52,140 but there's a fundamental problem. 1185 00:53:52,140 --> 00:53:55,800 And that is, you can find an overlap in a group analysis, 1186 00:53:55,800 --> 00:53:59,050 even if no single subject shows that overlap at all. 1187 00:53:59,050 --> 00:53:59,550 Why? 1188 00:53:59,550 --> 00:54:02,220 Because those regions vary in their exact location. 1189 00:54:02,220 --> 00:54:04,710 And if you mush across a whole bunch of individuals, 1190 00:54:04,710 --> 00:54:08,730 you're essentially blurring your activation pattern. 1191 00:54:08,730 --> 00:54:12,180 And so all of the prior studies, until a few years ago, 1192 00:54:12,180 --> 00:54:14,700 had been group analyses and they found overlap. 1193 00:54:14,700 --> 00:54:17,100 And who the hell knows if there was actually 1194 00:54:17,100 --> 00:54:20,130 overlapping activation within individual subjects, which 1195 00:54:20,130 --> 00:54:22,410 there would have to be if it's common machinery. 1196 00:54:22,410 --> 00:54:25,090 Or if they're just nearby and you muck them up 1197 00:54:25,090 --> 00:54:27,090 with a group analysis and they look like they're 1198 00:54:27,090 --> 00:54:28,422 on top of each other. 1199 00:54:28,422 --> 00:54:29,880 If you didn't quite get that, we'll 1200 00:54:29,880 --> 00:54:31,088 be coming back to that point. 1201 00:54:31,088 --> 00:54:33,060 For now, all you need to know is many people 1202 00:54:33,060 --> 00:54:34,770 ask this question and the methods 1203 00:54:34,770 --> 00:54:37,650 were close but problematic. 1204 00:54:37,650 --> 00:54:39,570 But luckily, Ev Fedorenko 1205 00:54:39,570 --> 00:54:42,310 did this experiment right a few years ago. 1206 00:54:42,310 --> 00:54:44,280 So here's Ev and here's what she did: 1207 00:54:44,280 --> 00:54:47,370 she functionally identified language regions 1208 00:54:47,370 --> 00:54:49,620 in each subject individually.
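[A toy simulation, not from the lecture or any of the studies mentioned, can make the group-analysis worry above concrete. Assume a 1-D "cortex" of 100 voxels and 12 simulated subjects in which the "language" and "music" regions are adjacent but never overlap within any individual; all widths, jitter ranges, and thresholds below are made up for illustration.]

import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels, width = 12, 100, 6

lang_maps = np.zeros((n_subjects, n_voxels), dtype=bool)
music_maps = np.zeros((n_subjects, n_voxels), dtype=bool)
for s in range(n_subjects):
    start = rng.integers(40, 50)  # per-subject anatomical jitter
    lang_maps[s, start:start + width] = True                # "language" region
    music_maps[s, start + width:start + 2 * width] = True   # adjacent "music" region

# Within every individual subject the two regions are disjoint.
assert not np.any(lang_maps & music_maps)

# But averaging across subjects blurs both maps, and thresholding the
# group maps yields voxels that look "active" for both contrasts.
group_lang = lang_maps.mean(axis=0)
group_music = music_maps.mean(axis=0)
overlap = np.sum((group_lang > 0.25) & (group_music > 0.25))
print(f"group-map overlap voxels: {overlap}, within-subject overlap: 0")

[The apparent overlap comes entirely from anatomical jitter plus averaging, which is why identifying the regions within each subject individually, as described next, is the cleaner test.]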
1209 00:54:49,620 --> 00:54:51,720 And we'll talk more about exactly how you do that. 1210 00:54:51,720 --> 00:54:54,030 You listen to sentences versus non-word strings. 1211 00:54:54,030 --> 00:54:57,210 You find a systematic set of brain regions 1212 00:54:57,210 --> 00:55:00,510 that you can identify in each individual that look like this. 1213 00:55:00,510 --> 00:55:01,740 Here is in three subjects. 1214 00:55:01,740 --> 00:55:03,600 Those red bits are the bits that respond 1215 00:55:03,600 --> 00:55:06,270 more when you listen to a sentence 1216 00:55:06,270 --> 00:55:08,010 versus listen to non-word strings 1217 00:55:08,010 --> 00:55:10,703 or read sentences versus non-word strings. 1218 00:55:10,703 --> 00:55:12,870 Then what she could do is she said, now that I found 1219 00:55:12,870 --> 00:55:14,760 those exact regions in each subject, 1220 00:55:14,760 --> 00:55:16,960 I can ask of those exact regions, 1221 00:55:16,960 --> 00:55:20,880 how do they respond to music versus scrambled music. 1222 00:55:20,880 --> 00:55:23,139 So she played stuff like this. 1223 00:55:23,139 --> 00:55:27,230 [MUSIC PLAYING] 1224 00:55:27,730 --> 00:55:30,690 OK, so nice canonical and nothing crazy, weird. 1225 00:55:30,690 --> 00:55:32,440 We're not going with the New Guinean music 1226 00:55:32,440 --> 00:55:33,520 and asking edgy questions. 1227 00:55:33,520 --> 00:55:35,353 We're just saying something everybody agrees 1228 00:55:35,353 --> 00:55:37,330 that's music, versus you scramble it 1229 00:55:37,330 --> 00:55:38,812 and it sounds like this. 1230 00:55:38,812 --> 00:55:42,260 [MUSIC PLAYING] 1231 00:55:42,260 --> 00:55:43,640 OK, it's actually the same notes. 1232 00:55:43,640 --> 00:55:44,150 I know, I know. 1233 00:55:44,150 --> 00:55:46,400 A lot of people that go, that's cool, that's really edgy. 1234 00:55:46,400 --> 00:55:46,940 Yeah, it is. 1235 00:55:46,940 --> 00:55:52,160 But to most people, it's not canonical music. 1236 00:55:52,160 --> 00:55:55,370 And so what Ev found is that none of those language regions 1237 00:55:55,370 --> 00:56:00,240 responded more to the intact than scrambled music. 1238 00:56:00,240 --> 00:56:02,280 So language regions are not interested in music. 1239 00:56:02,280 --> 00:56:05,820 We'll talk more about that next week or the week after. 1240 00:56:05,820 --> 00:56:07,330 Then she did the opposite. 1241 00:56:07,330 --> 00:56:10,967 She identified brain regions here in a group analysis 1242 00:56:10,967 --> 00:56:12,675 just to show you where they are, anterior 1243 00:56:12,675 --> 00:56:15,960 in the temporal lobes, that respond more to intact 1244 00:56:15,960 --> 00:56:18,150 than scrambled music. 1245 00:56:18,150 --> 00:56:20,230 She identified those in each subject 1246 00:56:20,230 --> 00:56:22,230 and measured the response of those regions 1247 00:56:22,230 --> 00:56:25,890 to language, sentences and non-word strings. 1248 00:56:25,890 --> 00:56:29,550 And each of those regions respond exactly the same 1249 00:56:29,550 --> 00:56:31,620 to sentences and non-word strings. 1250 00:56:31,620 --> 00:56:34,800 So basically, the language regions 1251 00:56:34,800 --> 00:56:37,380 are not interested in music, and the music regions 1252 00:56:37,380 --> 00:56:38,970 are not interested in language. 1253 00:56:38,970 --> 00:56:41,395 And therein, we have a-- 1254 00:56:41,395 --> 00:56:42,270 AUDIENCE: [INAUDIBLE] 1255 00:56:42,270 --> 00:56:46,490 NANCY KANWISHER: Thank you, exactly. 
1256 00:56:46,490 --> 00:56:50,932 So music is not using machinery for language. 1257 00:56:50,932 --> 00:56:52,890 That was one of the hypotheses we started with. 1258 00:56:52,890 --> 00:56:53,540 And it was not. 1259 00:56:56,320 --> 00:57:00,070 So that's true, at least for high-level language processing, 1260 00:57:00,070 --> 00:57:02,770 that computes the meaning of a sentence. 1261 00:57:02,770 --> 00:57:04,447 But what about speech perception? 1262 00:57:04,447 --> 00:57:07,030 Remember, last time I made the distinction between the sounds, 1263 00:57:07,030 --> 00:57:10,150 like ba and pa, which have a whole set 1264 00:57:10,150 --> 00:57:13,648 of computational challenges, just perceiving those sounds, 1265 00:57:13,648 --> 00:57:15,190 which is quite different than knowing 1266 00:57:15,190 --> 00:57:17,080 the meaning of a sentence. 1267 00:57:17,080 --> 00:57:19,520 So what about speech perception or, in fact, 1268 00:57:19,520 --> 00:57:22,420 any other aspect of hearing? 1269 00:57:22,420 --> 00:57:25,420 So what I'm going to try to do is briefly tell you about one 1270 00:57:25,420 --> 00:57:26,260 of our experiments. 1271 00:57:26,260 --> 00:57:28,420 I'm sorry, I try not to turn this whole course into stuff 1272 00:57:28,420 --> 00:57:30,790 we've done in my lab, but it's one of my favorite ever. 1273 00:57:30,790 --> 00:57:34,720 And it's a cool, different way to go at this question 1274 00:57:34,720 --> 00:57:38,790 from the other MRI experiments we've talked about before. 1275 00:57:38,790 --> 00:57:41,060 So the background is, OK, let's step back. 1276 00:57:41,060 --> 00:57:44,480 What's the overall organization of auditory cortex? 1277 00:57:44,480 --> 00:57:46,970 And when we did this experiment five or six years ago, 1278 00:57:46,970 --> 00:57:48,590 not a whole lot was known. 1279 00:57:48,590 --> 00:57:50,600 Basically, everybody agrees. 1280 00:57:50,600 --> 00:57:52,460 Whoops, I put the wrong slide in here. 1281 00:57:52,460 --> 00:57:56,390 Everybody agrees that primary auditory cortex is right there 1282 00:57:56,390 --> 00:57:58,130 with that high-low-high frequency thing 1283 00:57:58,130 --> 00:57:59,450 we talked about from there. 1284 00:57:59,450 --> 00:58:02,690 But from there on out, in the last couple of years, 1285 00:58:02,690 --> 00:58:05,540 there's an agreement about speech selective cortex 1286 00:58:05,540 --> 00:58:07,190 that I showed you briefly last time 1287 00:58:07,190 --> 00:58:09,410 and other people have seen that. 1288 00:58:09,410 --> 00:58:13,130 But there's lots of hypotheses and no agreement with anything 1289 00:58:13,130 --> 00:58:19,430 else and no real evidence for really music-selective cortex. 1290 00:58:19,430 --> 00:58:21,350 But there's a problem with all the prior work 1291 00:58:21,350 --> 00:58:23,270 where you sit around and make a hypothesis 1292 00:58:23,270 --> 00:58:25,790 and say, oh, let's see, are we going to get a higher 1293 00:58:25,790 --> 00:58:29,990 response to, say, intact versus scrambled music, or faces 1294 00:58:29,990 --> 00:58:33,410 versus objects, or whatever. 1295 00:58:33,410 --> 00:58:36,300 All of those are scientists making up hypotheses, 1296 00:58:36,300 --> 00:58:37,790 and then testing them. 1297 00:58:37,790 --> 00:58:39,290 And there's nothing wrong with that. 1298 00:58:39,290 --> 00:58:40,550 That's what scientists are supposed 1299 00:58:40,550 --> 00:58:42,860 to do-- invent hypotheses, and then make good designs 1300 00:58:42,860 --> 00:58:44,190 and go test them. 
1301 00:58:44,190 --> 00:58:47,750 But the problem with that is, we can only discover things 1302 00:58:47,750 --> 00:58:49,950 that we can think to test. 1303 00:58:49,950 --> 00:58:52,040 What if deep facts about mind and brain 1304 00:58:52,040 --> 00:58:55,260 are things that nobody would think up in the first place? 1305 00:58:55,260 --> 00:58:58,100 And so that's where we can get real power from what 1306 00:58:58,100 --> 00:59:01,100 are known as data-driven studies, where you collect 1307 00:59:01,100 --> 00:59:03,860 a boatload of data and then use some fancy math 1308 00:59:03,860 --> 00:59:06,680 and say, tell me what the structure is in this data. 1309 00:59:06,680 --> 00:59:10,550 Not, is this hypothesis that I love true in these data. 1310 00:59:10,550 --> 00:59:12,582 And I'll do anything to pull it out if I can. 1311 00:59:12,582 --> 00:59:14,540 See it in there, find evidence for it in there. 1312 00:59:14,540 --> 00:59:15,860 But yeah, exactly. 1313 00:59:15,860 --> 00:59:19,340 But if we collect a whole bunch of data and do some math 1314 00:59:19,340 --> 00:59:23,070 and see what the structure is, what do we see? 1315 00:59:23,070 --> 00:59:25,440 So that's what we did in this study. 1316 00:59:25,440 --> 00:59:28,470 I'm going to speed up to try to give you the gist here. 1317 00:59:28,470 --> 00:59:31,185 So "we" is Sam Norman-Haignere here and Josh McDermott. 1318 00:59:31,185 --> 00:59:32,727 [SOUND RECORDING EXPERIMENT PLAYING] 1319 00:59:32,727 --> 00:59:34,200 And so we scanned people while they 1320 00:59:34,200 --> 00:59:36,070 were hearing stuff like this. 1321 00:59:36,070 --> 00:59:40,710 We first collected the 165 categories of sounds 1322 00:59:40,710 --> 00:59:42,480 that people hear most commonly. 1323 00:59:42,480 --> 00:59:45,090 This is classic cocktail party effect you guys are doing. 1324 00:59:45,090 --> 00:59:47,970 You have to separate me speaking from all this crazy, weird, 1325 00:59:47,970 --> 00:59:50,400 changing background. 1326 00:59:50,400 --> 00:59:52,530 And so anyway, we scan people listening 1327 00:59:52,530 --> 00:59:57,360 to these sounds, which broadly sample auditory experience. 1328 00:59:57,360 --> 00:59:59,645 And so we collected sounds people hear most often 1329 00:59:59,645 --> 01:00:01,770 and that they can recognize from a two-second clip. 1330 01:00:01,770 --> 01:00:02,730 OK, enough already. 1331 01:00:02,730 --> 01:00:03,522 [CELLPHONE RINGING] 1332 01:00:03,522 --> 01:00:06,930 Oh, yeah, just to wake everyone up. 1333 01:00:06,930 --> 01:00:10,650 So we scan them listening to those 165 sounds, broad sample 1334 01:00:10,650 --> 01:00:12,270 of auditory experience. 1335 01:00:12,270 --> 01:00:15,150 Then, from each voxel in the brain, 1336 01:00:15,150 --> 01:00:18,750 we measure the exact magnitude of response of that voxel 1337 01:00:18,750 --> 01:00:22,200 to each of the 165 sounds and we get a vector like this. 1338 01:00:22,200 --> 01:00:23,190 Everybody with me? 1339 01:00:23,190 --> 01:00:27,600 That's one voxel right there, another voxel, another voxel. 1340 01:00:27,600 --> 01:00:30,060 We do this in all of kind of greater, 1341 01:00:30,060 --> 01:00:31,710 suburban, auditory cortex. 1342 01:00:31,710 --> 01:00:34,920 That is not just primary cortex, but all this stuff around it 1343 01:00:34,920 --> 01:00:37,980 that might even remotely, that responds in any systematic way 1344 01:00:37,980 --> 01:00:38,940 to auditory stimuli. 1345 01:00:38,940 --> 01:00:41,430 They grabbed the whole damn thing. 
1346 01:00:41,430 --> 01:00:43,830 So you do that in 10 subjects. 1347 01:00:43,830 --> 01:00:46,140 You have a big matrix like this-- 1348 01:00:46,140 --> 01:00:49,020 1,000 voxels in each subject, 11,000 voxels 1349 01:00:49,020 --> 01:00:51,930 across the top, 165 sounds. 1350 01:00:51,930 --> 01:00:54,820 That's our data. 1351 01:00:54,820 --> 01:01:01,060 So each column is the response of one voxel 1352 01:01:01,060 --> 01:01:04,420 in one person's brain to each of the 165 sounds. 1353 01:01:04,420 --> 01:01:06,420 Everybody got it? 1354 01:01:06,420 --> 01:01:08,520 Now, we have this lovely matrix, which 1355 01:01:08,520 --> 01:01:10,155 is basically all the data we care about 1356 01:01:10,155 --> 01:01:11,280 from this whole experiment. 1357 01:01:11,280 --> 01:01:14,040 Then, we throw away all the labels. 1358 01:01:14,040 --> 01:01:14,670 Poof. 1359 01:01:14,670 --> 01:01:16,260 It's just a matrix. 1360 01:01:16,260 --> 01:01:19,890 And then we do some math, which essentially says, 1361 01:01:19,890 --> 01:01:21,990 let's boil down the structure in this matrix 1362 01:01:21,990 --> 01:01:24,840 and discover its fundamental components. 1363 01:01:24,840 --> 01:01:26,498 That math happens to be a variant 1364 01:01:26,498 --> 01:01:27,915 of independent component analysis, 1365 01:01:27,915 --> 01:01:29,370 if that means anything to you. 1366 01:01:29,370 --> 01:01:30,870 If it doesn't, don't worry about it. 1367 01:01:30,870 --> 01:01:33,000 The gist is, we're doing math to say 1368 01:01:33,000 --> 01:01:34,530 what's the structure in here. 1369 01:01:34,530 --> 01:01:36,720 And we're doing it without any labels. 1370 01:01:36,720 --> 01:01:40,080 So this analysis doesn't even know where the voxels are 1371 01:01:40,080 --> 01:01:42,750 or which of your 10 subjects that voxel came from. 1372 01:01:42,750 --> 01:01:44,740 It doesn't know which sound is which. 1373 01:01:44,740 --> 01:01:46,860 And so it's very hypothesis neutral. 1374 01:01:46,860 --> 01:01:50,640 It's a way to say, show me structure with almost no kind 1375 01:01:50,640 --> 01:01:52,140 of prior biases. 1376 01:01:52,140 --> 01:01:54,000 Just show me the structure. 1377 01:01:54,000 --> 01:01:57,390 So everybody get how that's kind of a totally different thing 1378 01:01:57,390 --> 01:02:00,450 to do from everything we've talked about so far? 1379 01:02:00,450 --> 01:02:01,470 So that's what we did. 1380 01:02:01,470 --> 01:02:03,630 I'm going to skip the math and the modeling assumption. 1381 01:02:03,630 --> 01:02:05,005 It's not really that complicated, 1382 01:02:05,005 --> 01:02:07,990 but I think I'm going to run out of time, 1383 01:02:07,990 --> 01:02:09,850 so very hypothesis neutral. 1384 01:02:09,850 --> 01:02:13,110 And what we find is six components 1385 01:02:13,110 --> 01:02:15,450 account for most of the replicable variance 1386 01:02:15,450 --> 01:02:16,663 in that whole matrix. 1387 01:02:16,663 --> 01:02:18,580 I'll tell you what a component is in a second. 1388 01:02:18,580 --> 01:02:19,580 Did you have a question? 1389 01:02:19,580 --> 01:02:23,460 AUDIENCE: Is it just like with ICA, but [INAUDIBLE] PCA 1390 01:02:23,460 --> 01:02:25,060 [INAUDIBLE]? 1391 01:02:25,060 --> 01:02:28,060 NANCY KANWISHER: With PCA, you assume orthogonal axes. 1392 01:02:28,060 --> 01:02:30,790 With ICA, you don't assume orthogonal axes. 1393 01:02:30,790 --> 01:02:33,280 And so it's very, very similar to PCA. 
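[For anyone who wants to see the shape of this kind of decomposition, here is a minimal Python sketch. It is not the actual method from the study, which used its own ICA-like algorithm with additional assumptions; scikit-learn's FastICA is used here only as a stand-in, and the matrix is random placeholder data with the shapes described in the lecture.]

import numpy as np
from sklearn.decomposition import FastICA

# Illustrative shapes: ~11,000 voxels pooled across subjects, 165 sounds.
n_voxels, n_sounds, n_components = 11000, 165, 6

# response_matrix[v, s] = response magnitude of voxel v to sound s
# (placeholder random data standing in for the measured fMRI responses).
response_matrix = np.random.rand(n_voxels, n_sounds)

# Factorize: each voxel's response profile is modeled as a weighted sum
# of a small number of shared component response profiles.
ica = FastICA(n_components=n_components, random_state=0)
voxel_weights = ica.fit_transform(response_matrix)   # shape (n_voxels, n_components)
component_profiles = ica.mixing_.T                   # shape (n_components, n_sounds)

[Each row of component_profiles plays the role of a component's response to the 165 sounds, and each column of voxel_weights is the map you would project back onto the brain; no labels about sounds, voxels, or subjects enter the factorization, which is the "hypothesis neutral" point being made here.]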
1394 01:02:33,280 --> 01:02:34,963 And it starts out as PCA and then 1395 01:02:34,963 --> 01:02:36,130 it does some more rigmarole. 1396 01:02:36,130 --> 01:02:38,260 Yeah, it's the same idea. 1397 01:02:38,260 --> 01:02:40,790 Like basically, tell me the main dimensions of variation. 1398 01:02:40,790 --> 01:02:41,290 Yeah? 1399 01:02:41,290 --> 01:02:43,897 AUDIENCE: And are these matrices sparse and [INAUDIBLE]?? 1400 01:02:43,897 --> 01:02:45,480 NANCY KANWISHER: Yes, they are sparse. 1401 01:02:45,480 --> 01:02:48,090 And that is one of the assumptions you use. 1402 01:02:48,090 --> 01:02:51,240 There isn't only one way to factorize a matrix. 1403 01:02:51,240 --> 01:02:52,630 It's an ill-posed problem. 1404 01:02:52,630 --> 01:02:54,520 So you need to make some assumptions. 1405 01:02:54,520 --> 01:02:58,290 And that's one of the ones we made, but you can test them. 1406 01:02:58,290 --> 01:03:00,630 So what we find is six components 1407 01:03:00,630 --> 01:03:03,150 account for most of the data. 1408 01:03:03,150 --> 01:03:07,770 And four of those reflected acoustic properties 1409 01:03:07,770 --> 01:03:10,080 of the stimuli. 1410 01:03:10,080 --> 01:03:14,740 One was high for all the sounds with lots of low frequencies. 1411 01:03:14,740 --> 01:03:17,880 Another was high for all the sounds with high frequencies. 1412 01:03:17,880 --> 01:03:18,570 What is that? 1413 01:03:23,863 --> 01:03:24,530 Sorry, speak up? 1414 01:03:24,530 --> 01:03:26,310 AUDIENCE: [INAUDIBLE] 1415 01:03:26,310 --> 01:03:28,310 NANCY KANWISHER: They're sensitive to frequency, 1416 01:03:28,310 --> 01:03:29,900 but where is that in the brain that you've already 1417 01:03:29,900 --> 01:03:30,470 heard about? 1418 01:03:30,470 --> 01:03:31,370 AUDIENCE: Primary-- 1419 01:03:31,370 --> 01:03:33,230 NANCY KANWISHER: Primary auditory cortex 1420 01:03:33,230 --> 01:03:34,520 as a tonotopic map. 1421 01:03:34,520 --> 01:03:35,840 So this is awesome. 1422 01:03:35,840 --> 01:03:37,670 Because if you go invent some crazy math 1423 01:03:37,670 --> 01:03:40,160 and you apply it to your data and you discover something 1424 01:03:40,160 --> 01:03:42,860 you know to be true, that's very reassuring. 1425 01:03:42,860 --> 01:03:44,930 The math isn't just inventing crazy stuff. 1426 01:03:44,930 --> 01:03:47,810 It's discovering stuff we already know to be true. 1427 01:03:47,810 --> 01:03:50,000 That's known in more biological parts of the field 1428 01:03:50,000 --> 01:03:51,530 as a positive control. 1429 01:03:51,530 --> 01:03:53,630 Invent a new method, make sure it can discover 1430 01:03:53,630 --> 01:03:55,470 the stuff you know to be true. 1431 01:03:55,470 --> 01:03:59,360 So check, check, OK. 1432 01:03:59,360 --> 01:04:01,618 But then it discovered some other stuff. 1433 01:04:01,618 --> 01:04:03,660 And I'm just going to tell you about two of them. 1434 01:04:03,660 --> 01:04:04,463 So here's one. 1435 01:04:04,463 --> 01:04:06,380 So I was just loose about what a component is. 1436 01:04:06,380 --> 01:04:10,820 A component is a magnitude of response for each of the 165 1437 01:04:10,820 --> 01:04:14,660 sounds and a separate distribution 1438 01:04:14,660 --> 01:04:17,130 in the brain, which I'll show you in a moment. 1439 01:04:17,130 --> 01:04:19,020 So here's one of those components. 1440 01:04:19,020 --> 01:04:23,150 And we've taken the 165 sounds and added basic category 1441 01:04:23,150 --> 01:04:23,870 labels on them. 
1442 01:04:23,870 --> 01:04:26,030 We put them on Mechanical Turk and people told us 1443 01:04:26,030 --> 01:04:28,050 which category they belong to. 1444 01:04:28,050 --> 01:04:31,370 So that enables us to look at this mysterious thing 1445 01:04:31,370 --> 01:04:34,430 and average within a category. 1446 01:04:34,430 --> 01:04:37,040 So this is its component. 1447 01:04:37,040 --> 01:04:39,940 And if you look at it, you see that it's 1448 01:04:39,940 --> 01:04:44,290 really high for English speech and foreign speech 1449 01:04:44,290 --> 01:04:46,660 that our subjects don't understand. 1450 01:04:46,660 --> 01:04:49,450 And then, oh, what's that intermediate thing? 1451 01:04:49,450 --> 01:04:51,430 Oh, that's music with vocals. 1452 01:04:51,430 --> 01:04:54,910 It has a kind of speech. 1453 01:04:54,910 --> 01:04:56,320 And way down here-- 1454 01:04:56,320 --> 01:04:59,350 that's non-speech vocalizations, stuff like laughing 1455 01:04:59,350 --> 01:05:00,790 and crying and sighing. 1456 01:05:00,790 --> 01:05:03,670 So there's a voice but no speech content. 1457 01:05:03,670 --> 01:05:07,040 So that's a speech component. 1458 01:05:07,040 --> 01:05:09,080 And as I mentioned, this had been seen 1459 01:05:09,080 --> 01:05:10,690 before in the last few years. 1460 01:05:10,690 --> 01:05:12,140 So it wasn't completely new. 1461 01:05:12,140 --> 01:05:15,830 But what's cool about this is it just emerged spontaneously 1462 01:05:15,830 --> 01:05:17,630 from this very broad screen. 1463 01:05:17,630 --> 01:05:19,070 We didn't go and say, hey, can we 1464 01:05:19,070 --> 01:05:20,870 find a speech selective region of cortex, 1465 01:05:20,870 --> 01:05:21,860 if we try really hard. 1466 01:05:21,860 --> 01:05:24,150 Oh, yeah, we validate our hypothesis. 1467 01:05:24,150 --> 01:05:26,810 This is like, let's sample auditory experience-- and wow, 1468 01:05:26,810 --> 01:05:27,330 there it is. 1469 01:05:27,330 --> 01:05:27,830 Yeah? 1470 01:05:27,830 --> 01:05:30,650 AUDIENCE: I mean, you assigned [INAUDIBLE]. 1471 01:05:30,650 --> 01:05:32,150 NANCY KANWISHER: We put them on Turk 1472 01:05:32,150 --> 01:05:34,170 and had people say what category they fit into. 1473 01:05:34,170 --> 01:05:34,670 Yeah? 1474 01:05:34,670 --> 01:05:36,077 AUDIENCE: [INAUDIBLE]. 1475 01:05:39,830 --> 01:05:43,644 Categorizing by speech is a very good way [INAUDIBLE] 1476 01:05:43,644 --> 01:05:45,080 better way than [INAUDIBLE]. 1477 01:05:45,080 --> 01:05:46,940 NANCY KANWISHER: Absolutely, absolutely. 1478 01:05:46,940 --> 01:05:47,990 This is a first pass. 1479 01:05:47,990 --> 01:05:50,210 And one hopes to go deeper and deeper. 1480 01:05:50,210 --> 01:05:56,900 If we could separate different aspects of speech, consonants 1481 01:05:56,900 --> 01:05:58,520 and vowels, fricatives, whatever, 1482 01:05:58,520 --> 01:06:00,020 there could be much more to be done. 1483 01:06:00,020 --> 01:06:02,000 Yeah, I got to-- 1484 01:06:02,000 --> 01:06:04,170 oh, boy, OK. 1485 01:06:04,170 --> 01:06:07,120 And when do I have to give them the quiz? 1486 01:06:07,120 --> 01:06:08,080 It's shortish. 1487 01:06:08,080 --> 01:06:10,190 They don't need a full 10 minutes. 1488 01:06:10,190 --> 01:06:10,690 What is it? 1489 01:06:10,690 --> 01:06:11,710 Seven questions? 1490 01:06:11,710 --> 01:06:12,570 AUDIENCE: Eight. 1491 01:06:12,570 --> 01:06:14,195 NANCY KANWISHER: Eight-- eight minutes?
1492 01:06:14,195 --> 01:06:15,310 AUDIENCE: [INAUDIBLE] 1493 01:06:15,310 --> 01:06:19,450 NANCY KANWISHER: OK, make me stop definitively at 12:18. 1494 01:06:19,450 --> 01:06:23,690 OK, so that's cool. 1495 01:06:23,690 --> 01:06:25,780 It's not exactly new, but it's a really nice way 1496 01:06:25,780 --> 01:06:30,310 to rediscover things that we thought to be true. 1497 01:06:30,310 --> 01:06:35,007 All right, then there's component 6 that popped out. 1498 01:06:35,007 --> 01:06:35,840 What is component 6? 1499 01:06:35,840 --> 01:06:38,920 Well, if we average within a category, 1500 01:06:38,920 --> 01:06:41,710 it's really high for instrumental music and music with vocals, 1501 01:06:41,710 --> 01:06:44,470 and everything else is really low. 1502 01:06:44,470 --> 01:06:46,720 We didn't go looking for this. 1503 01:06:46,720 --> 01:06:51,070 Boom-- music selectivity. 1504 01:06:51,070 --> 01:06:52,060 That's pretty amazing. 1505 01:06:52,060 --> 01:06:54,712 Never really been seen before. 1506 01:06:54,712 --> 01:06:56,170 People have looked and they've made 1507 01:06:56,170 --> 01:06:58,780 some kind of sort of smoke and mirrors, like, not really. 1508 01:06:58,780 --> 01:07:00,197 This is the first time it was seen 1509 01:07:00,197 --> 01:07:01,655 and it just popped out of the data. 1510 01:07:01,655 --> 01:07:03,405 And that says that it's not just something 1511 01:07:03,405 --> 01:07:06,160 you can find if you try really hard and go fishing for it. 1512 01:07:06,160 --> 01:07:08,680 It's actually a significant part of the variance 1513 01:07:08,680 --> 01:07:10,120 in this whole response. 1514 01:07:10,120 --> 01:07:12,640 I'm going to skip everything except clarification questions 1515 01:07:12,640 --> 01:07:14,020 now, because I'm-- 1516 01:07:14,020 --> 01:07:14,690 go ahead. 1517 01:07:14,690 --> 01:07:17,092 AUDIENCE: Did these voxels correspond 1518 01:07:17,092 --> 01:07:18,990 to the music [INAUDIBLE]? 1519 01:07:18,990 --> 01:07:21,000 NANCY KANWISHER: Sort of, it's complicated. 1520 01:07:21,000 --> 01:07:23,070 Sorry, it's a long answer. 1521 01:07:23,070 --> 01:07:25,290 So this really looks like it's music. 1522 01:07:25,290 --> 01:07:28,650 And so now, I was vague about what a component is, 1523 01:07:28,650 --> 01:07:30,300 but it's both that response profile 1524 01:07:30,300 --> 01:07:32,140 and it's a set of weights in the brain. 1525 01:07:32,140 --> 01:07:34,020 So if you project this one back in the brain, 1526 01:07:34,020 --> 01:07:36,510 you get this band of speech selective cortex 1527 01:07:36,510 --> 01:07:40,320 right below primary auditory cortex, like that. 1528 01:07:40,320 --> 01:07:43,170 And if you project the music stuff back in the brain, 1529 01:07:43,170 --> 01:07:44,138 you get a patch. 1530 01:07:44,138 --> 01:07:45,930 This is sort of an answer to your question. 1531 01:07:45,930 --> 01:07:48,900 You get a patch up in front of primary auditory 1532 01:07:48,900 --> 01:07:50,250 cortex and a patch behind. 1533 01:07:55,020 --> 01:07:57,480 So here we have a double dissociation 1534 01:07:57,480 --> 01:08:01,350 of speech selectivity and music selectivity in the brain, OK? 1535 01:08:01,350 --> 01:08:04,590 So music doesn't just use mechanisms for speech 1536 01:08:04,590 --> 01:08:05,800 as many people have proposed. 1537 01:08:05,800 --> 01:08:08,400 It's not true, right. 1538 01:08:08,400 --> 01:08:11,130 So when you see dramatic data like this, 1539 01:08:11,130 --> 01:08:15,420 a natural reaction is to say, like, really, get out, come on.
1540 01:08:15,420 --> 01:08:18,510 Like, music specificity, like what? 1541 01:08:18,510 --> 01:08:21,149 So very briefly, Dana has just replicated this 1542 01:08:21,149 --> 01:08:22,859 in a new sample of subjects. 1543 01:08:22,859 --> 01:08:27,060 It does not matter if those subjects have musical training, 1544 01:08:27,060 --> 01:08:28,740 like students from Berklee School who 1545 01:08:28,740 --> 01:08:30,930 spend like six hours a day practicing, 1546 01:08:30,930 --> 01:08:33,060 versus people who have essentially 1547 01:08:33,060 --> 01:08:35,880 zero music lessons ever in their life, 1548 01:08:35,880 --> 01:08:39,458 you get those components in both groups, maybe slightly stronger 1549 01:08:39,458 --> 01:08:40,500 in the trained musicians. 1550 01:08:40,500 --> 01:08:41,550 We're not quite sure yet. 1551 01:08:41,550 --> 01:08:43,620 But in any case, it is totally present in people 1552 01:08:43,620 --> 01:08:45,285 with zero musical training. 1553 01:08:45,285 --> 01:08:47,160 That doesn't mean it's innate, because people 1554 01:08:47,160 --> 01:08:49,859 without musical training have musical experience 1555 01:08:49,859 --> 01:08:52,779 but no explicit training. 1556 01:08:52,779 --> 01:08:54,090 Skip all of this. 1557 01:08:54,090 --> 01:08:55,410 Here is her replication. 1558 01:08:55,410 --> 01:08:56,057 Boom, boom. 1559 01:08:56,057 --> 01:08:57,599 It's there with and without training. 1560 01:08:59,865 --> 01:09:00,990 I'm going to skip all this. 1561 01:09:00,990 --> 01:09:02,823 You can read it on the slides, if I lost you 1562 01:09:02,823 --> 01:09:05,160 in here, because I want to show you something else. 1563 01:09:05,160 --> 01:09:08,850 That music selectivity was not evident 1564 01:09:08,850 --> 01:09:11,420 if you just do a direct contrast in the same data. 1565 01:09:11,420 --> 01:09:13,920 Take all the music conditions, all the non-music conditions, 1566 01:09:13,920 --> 01:09:15,149 you get a blurry mess. 1567 01:09:15,149 --> 01:09:16,200 It's not strong. 1568 01:09:16,200 --> 01:09:19,290 You have to do the math to siphon it off. 1569 01:09:19,290 --> 01:09:20,670 And that's OK. 1570 01:09:20,670 --> 01:09:23,040 But I like to see things in the raw data. 1571 01:09:23,040 --> 01:09:24,569 And so probably what that means is 1572 01:09:24,569 --> 01:09:28,060 that the music is overlapping with other things in the brain. 1573 01:09:28,060 --> 01:09:29,939 And so the direct contrast doesn't work well, 1574 01:09:29,939 --> 01:09:31,600 the math can pull them apart. 1575 01:09:31,600 --> 01:09:33,640 But wouldn't it be nice to see them separately? 1576 01:09:33,640 --> 01:09:35,609 And so we've been doing intracranial recordings 1577 01:09:35,609 --> 01:09:37,649 from patients with electrodes in their brain. 1578 01:09:37,649 --> 01:09:41,490 And I'll just show you a few very cool responses. 1579 01:09:41,490 --> 01:09:44,490 So this is a single electrode in a single patient. 1580 01:09:44,490 --> 01:09:47,160 These are the 165 sounds, same ones. 1581 01:09:47,160 --> 01:09:48,510 This is the time course. 1582 01:09:48,510 --> 01:09:51,420 And this is a speech selective electrode. 1583 01:09:51,420 --> 01:09:53,290 It responds to native and foreign music. 1584 01:09:53,290 --> 01:09:55,080 Those are the two green ones-- 1585 01:09:55,080 --> 01:09:56,760 I'm sorry, native and foreign speech. 1586 01:09:56,760 --> 01:09:59,670 And it responds to music with vocals in pink. 
1587 01:09:59,670 --> 01:10:03,390 Everybody see how that's a speech selective electrode? 1588 01:10:03,390 --> 01:10:04,960 So there's loads of those. 1589 01:10:04,960 --> 01:10:06,360 But we also found these. 1590 01:10:06,360 --> 01:10:08,010 Here is a single electrode. 1591 01:10:08,010 --> 01:10:13,240 Look, each row is a single stimulus. 1592 01:10:13,240 --> 01:10:16,260 Here's a histogram of responses to all the music with vocals, 1593 01:10:16,260 --> 01:10:19,830 music without vocals, much stronger than to anything else. 1594 01:10:19,830 --> 01:10:22,360 You might be saying, well, what about those things. 1595 01:10:22,360 --> 01:10:24,210 Let's look at what those things are. 1596 01:10:24,210 --> 01:10:27,360 Oh, even the violations aren't really violations. 1597 01:10:27,360 --> 01:10:30,570 Whistling, humming, computer jingle, ringtone-- 1598 01:10:30,570 --> 01:10:32,490 those are sort of musicy. 1599 01:10:32,490 --> 01:10:35,280 So that is an extremely music-selective 1600 01:10:35,280 --> 01:10:38,100 individual electrode in a single subject's brain. 1601 01:10:38,100 --> 01:10:41,640 No fancy math that might have invented it somehow. 1602 01:10:41,640 --> 01:10:45,780 It's just there right in the raw data. 1603 01:10:45,780 --> 01:10:47,280 Further, and here's the time course, 1604 01:10:47,280 --> 01:10:50,370 you can see the time course of music with instruments, music 1605 01:10:50,370 --> 01:10:52,080 with vocals, everything else. 1606 01:10:52,080 --> 01:10:54,630 Really selective. 1607 01:10:54,630 --> 01:10:56,520 So this is the strongest evidence yet 1608 01:10:56,520 --> 01:10:59,380 for music specificity in the human brain. 1609 01:10:59,380 --> 01:11:04,390 But there's one more cool thing that came out of this analysis. 1610 01:11:04,390 --> 01:11:06,660 And that is we found some electrodes 1611 01:11:06,660 --> 01:11:08,310 that are not just selected for music, 1612 01:11:08,310 --> 01:11:13,320 but selected for vocal music, selected for song. 1613 01:11:13,320 --> 01:11:14,450 And that's really amazing. 1614 01:11:14,450 --> 01:11:16,200 Because as I started off at the beginning, 1615 01:11:16,200 --> 01:11:17,820 many people have said that song is 1616 01:11:17,820 --> 01:11:19,440 a kind of native form of music. 1617 01:11:19,440 --> 01:11:22,900 The first one to evolve and all that kind of stuff. 1618 01:11:22,900 --> 01:11:24,760 And so we did all the controls. 1619 01:11:24,760 --> 01:11:27,210 It's not the low-level stuff. 1620 01:11:27,210 --> 01:11:32,040 And there's lots of open questions. 1621 01:11:32,040 --> 01:11:36,360 We started with this puzzle of how did 1622 01:11:36,360 --> 01:11:38,700 music evolve, if it did evolve. 1623 01:11:38,700 --> 01:11:41,400 And we made a little bit of progress. 1624 01:11:41,400 --> 01:11:44,355 It doesn't share music machinery with speech and language. 1625 01:11:47,460 --> 01:11:50,430 If it's auditory cheesecake, as Pinker said, 1626 01:11:50,430 --> 01:11:53,880 it's auditory cheesecake that not only uses machinery that 1627 01:11:53,880 --> 01:11:55,680 evolved for something else, but changes it 1628 01:11:55,680 --> 01:12:00,480 throughout development and makes it very selective. 1629 01:12:00,480 --> 01:12:02,640 These guys speculated that song is special. 1630 01:12:02,640 --> 01:12:03,990 Maybe it is. 1631 01:12:03,990 --> 01:12:05,940 And sexual selection, who knows? 1632 01:12:05,940 --> 01:12:07,910 We have no data.