1 00:00:00,000 --> 00:00:01,944 [SQUEAKING] 2 00:00:01,944 --> 00:00:03,402 [RUSTLING] 3 00:00:03,402 --> 00:00:05,346 [CLICKING] 4 00:00:09,720 --> 00:00:12,620 NANCY KANWISHER: All right, OK, so let's start. 5 00:00:12,620 --> 00:00:16,070 We're talking about music today, which is fun and awesome. 6 00:00:16,070 --> 00:00:20,120 But first, let me give you a brief whirlwind reminder 7 00:00:20,120 --> 00:00:21,440 of what we did last time. 8 00:00:21,440 --> 00:00:23,960 We talked about hearing in general and speech 9 00:00:23,960 --> 00:00:25,070 in particular. 10 00:00:25,070 --> 00:00:28,280 And we started, as usual, with computational theory, 11 00:00:28,280 --> 00:00:31,730 thinking about what is the problem of audition 12 00:00:31,730 --> 00:00:32,903 and what is sound. 13 00:00:32,903 --> 00:00:34,070 It's the first step of that. 14 00:00:34,070 --> 00:00:36,920 And sound is pressure waves traveling through the air. 15 00:00:36,920 --> 00:00:38,690 And the cool thing about hearing is 16 00:00:38,690 --> 00:00:40,700 that we extract lots of information 17 00:00:40,700 --> 00:00:43,250 from this very, very simple signal of pressure 18 00:00:43,250 --> 00:00:44,480 waves arriving at the ear. 19 00:00:44,480 --> 00:00:48,320 We use it to recognize sounds, to localize sounds, 20 00:00:48,320 --> 00:00:50,270 to figure out what things are made of, 21 00:00:50,270 --> 00:00:54,150 and to understand events around us, and all kinds of things. 22 00:00:54,150 --> 00:00:58,700 And these problems are a major computational challenge. 23 00:00:58,700 --> 00:01:00,920 And in particular, they are ill-posed. 24 00:01:00,920 --> 00:01:05,570 That means that the available information doesn't give you 25 00:01:05,570 --> 00:01:07,190 a unique solution if you consider 26 00:01:07,190 --> 00:01:09,680 the computational problem narrowly. 27 00:01:09,680 --> 00:01:12,510 And that's true for separating sound sources. 28 00:01:12,510 --> 00:01:15,290 So if you have two sound sources at once, 29 00:01:15,290 --> 00:01:18,140 say, two people speaking or a person speaking and a lot 30 00:01:18,140 --> 00:01:21,080 of background noise, that's known as the cocktail party 31 00:01:21,080 --> 00:01:21,710 problem. 32 00:01:21,710 --> 00:01:23,960 Those sounds add on top of each other. 33 00:01:23,960 --> 00:01:26,570 And there's no way to pull them apart without bringing 34 00:01:26,570 --> 00:01:29,660 in other information, knowledge about the world 35 00:01:29,660 --> 00:01:32,630 or knowledge about the nature of voices or speaking 36 00:01:32,630 --> 00:01:33,380 or who's speaking. 37 00:01:33,380 --> 00:01:36,050 Or you need something else, or else it's ill-posed. 38 00:01:36,050 --> 00:01:39,710 That is, not solvable just from the basic input. 39 00:01:39,710 --> 00:01:41,720 Another case of an ill-posed problem in audition 40 00:01:41,720 --> 00:01:43,370 is the case of reverb. 41 00:01:43,370 --> 00:01:45,980 So the sound that I'm making right now that's coming out 42 00:01:45,980 --> 00:01:47,690 of my mouth is bouncing off the walls 43 00:01:47,690 --> 00:01:51,740 and is arriving at your ears, and each little piece of sound 44 00:01:51,740 --> 00:01:54,380 that I make is arriving at different latencies 45 00:01:54,380 --> 00:01:57,830 after I say it, as it travels different paths bouncing 46 00:01:57,830 --> 00:01:58,590 around the room. 47 00:01:58,590 --> 00:02:00,090 There's not too much reverb in here, 48 00:02:00,090 --> 00:02:02,390 so it's not that noticeable.
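A quick aside, not part of the lecture: the superposition point above can be made concrete in a few lines of code. Everything here, the sample rate, the signals, and the names, is invented purely for illustration.

```python
import numpy as np

# Two made-up "sources": a 200 Hz tone standing in for a voice, plus broadband noise.
fs = 16000                                   # sample rate in Hz (arbitrary)
t = np.arange(fs) / fs                       # one second of samples
rng = np.random.default_rng(0)
voice = 0.5 * np.sin(2 * np.pi * 200 * t)    # pretend talker
noise = 0.3 * rng.standard_normal(fs)        # pretend background noise

mixture = voice + noise                      # all the ear receives is the sum

# The inverse problem is ill-posed: any split of the form (voice + g, noise - g)
# yields exactly the same mixture, so the observation alone cannot decide.
g = 0.1 * np.sin(2 * np.pi * 50 * t)         # an arbitrary "ghost" signal
print(np.allclose(mixture, (voice + g) + (noise - g)))   # True
```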
49 00:02:02,390 --> 00:02:04,010 But if we did this in a cathedral, 50 00:02:04,010 --> 00:02:07,130 you'd hear all these echoes. 51 00:02:07,130 --> 00:02:09,410 OK, and so that makes another ill-posed problem, 52 00:02:09,410 --> 00:02:11,120 because all of those different sounds 53 00:02:11,120 --> 00:02:15,570 are added on top of themselves diminished in volume over time. 54 00:02:15,570 --> 00:02:17,210 And you get the sum of all of those, 55 00:02:17,210 --> 00:02:19,043 and you have to pull it apart and figure out 56 00:02:19,043 --> 00:02:20,930 what that sound is. 57 00:02:20,930 --> 00:02:23,480 So both problems are solved by using 58 00:02:23,480 --> 00:02:24,680 knowledge of the real world. 59 00:02:24,680 --> 00:02:27,410 In the case of reverb, it's actual implicit knowledge 60 00:02:27,410 --> 00:02:29,150 that you all have that you didn't 61 00:02:29,150 --> 00:02:31,603 know you have about the physics of reverb. 62 00:02:31,603 --> 00:02:33,770 Because if we play you sounds with the wrong physics 63 00:02:33,770 --> 00:02:35,960 of reverb, you won't be able to deal with reverb. 64 00:02:35,960 --> 00:02:38,390 And that says it's implicit knowledge in your head, which 65 00:02:38,390 --> 00:02:40,640 is pretty cool, that you use to constrain 66 00:02:40,640 --> 00:02:42,710 the ill-posed problem. 67 00:02:42,710 --> 00:02:43,880 We talked about speech. 68 00:02:43,880 --> 00:02:46,370 Phonemes are sounds that distinguish 69 00:02:46,370 --> 00:02:49,562 two different words in a language, like make and bake. 70 00:02:49,562 --> 00:02:51,020 Those are two different sounds that 71 00:02:51,020 --> 00:02:52,603 make the difference between two words. 72 00:02:55,220 --> 00:02:57,350 Each possible speech sound is not a phoneme 73 00:02:57,350 --> 00:02:59,240 in every language of the world. 74 00:02:59,240 --> 00:03:02,750 Languages have some subset of the space of possible phonemes 75 00:03:02,750 --> 00:03:05,720 that distinguish words in their language. 76 00:03:05,720 --> 00:03:09,980 Phonemes include vowels that have these stacked harmonics 77 00:03:09,980 --> 00:03:12,260 in the spectrogram, and consonants 78 00:03:12,260 --> 00:03:14,900 which are the quick transitions in the vertical stripes 79 00:03:14,900 --> 00:03:19,730 in the spectrogram, leading into the harmonic stacks of vowels. 80 00:03:19,730 --> 00:03:23,000 We talked about the problem of talker variability, 81 00:03:23,000 --> 00:03:26,085 that a given phoneme or word sounds very different, 82 00:03:26,085 --> 00:03:27,710 looks very different in the spectrogram 83 00:03:27,710 --> 00:03:29,630 if spoken by two different people. 84 00:03:29,630 --> 00:03:33,650 And conversely, the same person speaking two different words 85 00:03:33,650 --> 00:03:35,510 looks very different in the spectrogram. 86 00:03:35,510 --> 00:03:37,730 And so that means that the identity 87 00:03:37,730 --> 00:03:40,430 of the speaker and the identity of the word being said 88 00:03:40,430 --> 00:03:42,227 are all mushed up together. 89 00:03:42,227 --> 00:03:44,060 And that means that if you want to recognize 90 00:03:44,060 --> 00:03:46,490 the voice independent of what's being said, 91 00:03:46,490 --> 00:03:49,670 or recognize the word independent of who's saying it, 92 00:03:49,670 --> 00:03:52,760 you have a big computational challenge, a classic invariance 93 00:03:52,760 --> 00:03:53,270 problem. 94 00:03:53,270 --> 00:03:53,883 Yeah, Ben. 95 00:03:53,883 --> 00:03:55,425 AUDIENCE: I don't mean to hold us up. 
96 00:03:55,425 --> 00:03:58,880 I just wanted to make sure that I'm understanding. 97 00:03:58,880 --> 00:04:02,540 So the difference between consonants and vowels, 98 00:04:02,540 --> 00:04:05,750 are vowels just harmonic, like connective elements 99 00:04:05,750 --> 00:04:06,860 between consonants? 100 00:04:06,860 --> 00:04:09,080 And are consonants the percussive? 101 00:04:09,080 --> 00:04:10,640 Or are they actual-- 102 00:04:10,640 --> 00:04:12,350 like, I just didn't understand that. 103 00:04:12,350 --> 00:04:17,198 NANCY KANWISHER: Yeah, so in the spectrogram, those-- 104 00:04:17,198 --> 00:04:18,740 I didn't put that on the slide here-- 105 00:04:18,740 --> 00:04:21,440 but those horizontal red stripes in the slides 106 00:04:21,440 --> 00:04:24,440 that I showed you last time, those in the spectrogram, those 107 00:04:24,440 --> 00:04:27,680 are bands of energy at different frequencies 108 00:04:27,680 --> 00:04:29,450 that are sustained over a chunk of time. 109 00:04:29,450 --> 00:04:33,860 And those are typical of vowels, or singing, or musical sounds-- 110 00:04:33,860 --> 00:04:35,870 those harmonic sounds that have pitch. 111 00:04:35,870 --> 00:04:38,540 And so vowels have those sustained chunks 112 00:04:38,540 --> 00:04:40,422 that look like this in the spectrogram. 113 00:04:40,422 --> 00:04:42,380 And then there are these weird vertical stripes 114 00:04:42,380 --> 00:04:44,360 and transitions in and out of the vowels 115 00:04:44,360 --> 00:04:47,590 that are the consonants. 116 00:04:47,590 --> 00:04:49,550 AUDIENCE: Vowels are when you don't 117 00:04:49,550 --> 00:04:52,810 have [INAUDIBLE] spectrographs because air is just 118 00:04:52,810 --> 00:04:54,810 flowing through and you're filtering it somehow, 119 00:04:54,810 --> 00:04:57,150 like positioning your vocal tract in a certain way. 120 00:04:57,150 --> 00:04:59,330 And consonants are when you close off that air 121 00:04:59,330 --> 00:05:01,790 or restrict it in some way. 122 00:05:01,790 --> 00:05:04,882 So like S's and F's, you're not closing all the way off, 123 00:05:04,882 --> 00:05:06,840 but you're really constricting the vocal tract. 124 00:05:06,840 --> 00:05:08,215 And in a lot of other consonants, 125 00:05:08,215 --> 00:05:09,874 you're actually fully closing it. 126 00:05:13,025 --> 00:05:14,900 NANCY KANWISHER: OK, and then we talked a bit 127 00:05:14,900 --> 00:05:16,310 about the brain basis. 128 00:05:16,310 --> 00:05:18,920 And I pointed out that the neural anatomy 129 00:05:18,920 --> 00:05:21,320 of sound processing-- the subcortical neuroanatomy 130 00:05:21,320 --> 00:05:24,230 is much more complicated than the subcortical neuroanatomy 131 00:05:24,230 --> 00:05:25,640 of vision. 132 00:05:25,640 --> 00:05:27,682 In vision, you have one stop in the LGN, 133 00:05:27,682 --> 00:05:30,140 and then you go up to the cortex coming up from the retina. 134 00:05:30,140 --> 00:05:33,260 In audition, you have many stops between the cochlea, where 135 00:05:33,260 --> 00:05:38,660 you pick up sounds in the inner ear, and auditory cortex. 136 00:05:38,660 --> 00:05:40,280 Some of those stops are shown up here. 137 00:05:40,280 --> 00:05:42,810 And we didn't discuss them. 138 00:05:42,810 --> 00:05:45,200 So then we talked about primary auditory cortex. 139 00:05:45,200 --> 00:05:47,150 That's on the top of the temporal lobes, 140 00:05:47,150 --> 00:05:49,070 like right in there medially. 141 00:05:49,070 --> 00:05:50,750 You went in.
142 00:05:50,750 --> 00:05:53,520 And it has this tonotopic property, 143 00:05:53,520 --> 00:05:56,780 and that is a map of frequency space with this systematic 144 00:05:56,780 --> 00:06:01,130 high-low-high mapping of frequency space that you can 145 00:06:01,130 --> 00:06:01,940 see here-- 146 00:06:01,940 --> 00:06:03,620 high, low, high, like that. 147 00:06:03,620 --> 00:06:06,590 This is the top of the temporal lobe right there. 148 00:06:09,140 --> 00:06:13,850 And I pointed out that in animals and in one recent MRI 149 00:06:13,850 --> 00:06:18,140 study, the response properties of primary auditory cortex 150 00:06:18,140 --> 00:06:23,540 are well modeled by these fairly simple linear filters, known 151 00:06:23,540 --> 00:06:28,100 as spectrotemporal receptive fields or STRFs, shown here. 152 00:06:28,100 --> 00:06:31,460 So they're simple acoustic properties 153 00:06:31,460 --> 00:06:34,088 of a given band of frequencies rising or falling 154 00:06:34,088 --> 00:06:34,880 at different rates. 155 00:06:38,250 --> 00:06:42,470 So today, we're going to talk about music. 156 00:06:42,470 --> 00:06:44,780 And this is also an important moment in the course. 157 00:06:44,780 --> 00:06:47,750 Because up to now, we've been talking about functions that 158 00:06:47,750 --> 00:06:49,550 are mostly shared with animals. 159 00:06:49,550 --> 00:06:51,080 Speech is kind of on the cusp. 160 00:06:51,080 --> 00:06:53,437 I was going to make this point before speech. 161 00:06:53,437 --> 00:06:55,520 And that's actually muddy, because lots of animals 162 00:06:55,520 --> 00:06:57,830 are really good at speech perception. 163 00:06:57,830 --> 00:07:00,260 Chinchillas can distinguish ba from pa. 164 00:07:00,260 --> 00:07:02,150 Go figure, anyway. 165 00:07:02,150 --> 00:07:04,490 So they can perceive speech, but obviously they 166 00:07:04,490 --> 00:07:05,720 don't use it in the same way. 167 00:07:05,720 --> 00:07:09,273 But music is most definitely uniquely human. 168 00:07:09,273 --> 00:07:11,690 And so most of the things we'll be talking about from here 169 00:07:11,690 --> 00:07:15,200 on out are things about the human brain, in particular. 170 00:07:15,200 --> 00:07:16,925 And I think these are the coolest things 171 00:07:16,925 --> 00:07:19,550 in human cognitive neuroscience, because they tell us something 172 00:07:19,550 --> 00:07:23,010 about who we are as human beings. 173 00:07:23,010 --> 00:07:25,620 But they are also the hardest ones to study. 174 00:07:25,620 --> 00:07:28,088 Why is that? 175 00:07:28,088 --> 00:07:28,982 AUDIENCE: [INAUDIBLE] 176 00:07:28,982 --> 00:07:31,530 NANCY KANWISHER: No animal models. 177 00:07:31,530 --> 00:07:34,140 And I'm always lamenting how-- about the shortcomings 178 00:07:34,140 --> 00:07:36,450 of each of the methods in human cognitive neuroscience. 179 00:07:36,450 --> 00:07:38,850 And we have lots of them, and they complement each other, 180 00:07:38,850 --> 00:07:40,350 but there's a whole host of things 181 00:07:40,350 --> 00:07:42,460 that none of those methods are good for. 182 00:07:42,460 --> 00:07:44,400 And so now we're really out on thin ice 183 00:07:44,400 --> 00:07:46,890 trying to understand these things with a weaker 184 00:07:46,890 --> 00:07:49,380 set of methods where we can't go back and validate them 185 00:07:49,380 --> 00:07:50,430 with animal models. 186 00:07:50,430 --> 00:07:51,990 And that's just life. 187 00:07:51,990 --> 00:07:54,150 That's what we do. 
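Before the lecture turns to music, here is a minimal sketch of what the linear STRF model from the recap above amounts to; none of this is from the lecture, and the spectrogram and filter weights are toy values invented for illustration. The point is just that in this kind of model, the predicted response at each moment is a weighted sum of the recent past of the spectrogram.

```python
import numpy as np

# Toy spectrogram: 64 frequency channels by 200 time frames (made-up values).
rng = np.random.default_rng(1)
spec = rng.random((64, 200))

# Toy STRF: a frequency-by-time-lag weight matrix. A real STRF is estimated from
# neural data; this invented one simply prefers an upward frequency sweep.
n_freq, n_lags = 64, 10
strf = np.zeros((n_freq, n_lags))
for lag in range(n_lags):
    strf[6 * lag : 6 * lag + 4, lag] = 1.0   # excitation climbs in frequency toward recent lags

# Linear-filter prediction: at each time step, dot the STRF with the spectrogram
# patch covering the most recent n_lags frames.
pred = np.array([
    np.sum(strf * spec[:, t - n_lags : t]) for t in range(n_lags, spec.shape[1])
])
print(pred.shape)   # (190,) -- one predicted response per time frame
```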
188 00:07:54,150 --> 00:07:55,800 So now let's back up for a second 189 00:07:55,800 --> 00:07:58,380 and consider, why am I allocating a whole lecture 190 00:07:58,380 --> 00:08:03,660 for such a fluffy, frivolous topic as music. 191 00:08:03,660 --> 00:08:08,240 And I would say, that's because it's not fluffy. 192 00:08:08,240 --> 00:08:10,610 It's actually fundamental. 193 00:08:10,610 --> 00:08:12,740 And it's fundamental in the sense 194 00:08:12,740 --> 00:08:15,680 that music is both uniquely human-- 195 00:08:15,680 --> 00:08:19,280 no other animal has anything remotely like human music-- 196 00:08:19,280 --> 00:08:21,740 and it's also universally human. 197 00:08:21,740 --> 00:08:24,500 That is, every human culture that's been studied 198 00:08:24,500 --> 00:08:26,030 has some kind of music. 199 00:08:26,030 --> 00:08:28,520 So music is really an essential part 200 00:08:28,520 --> 00:08:30,450 of what it means to be a human being. 201 00:08:30,450 --> 00:08:32,659 It's really at the core of humanity. 202 00:08:32,659 --> 00:08:36,140 And that alone makes it interesting. 203 00:08:36,140 --> 00:08:37,385 But further-- question? 204 00:08:37,385 --> 00:08:38,990 AUDIENCE: So, like, birdsong-- 205 00:08:38,990 --> 00:08:40,657 NANCY KANWISHER: Birdsong doesn't count. 206 00:08:40,657 --> 00:08:43,710 No, birdsong doesn't count in all kinds of ways. 207 00:08:43,710 --> 00:08:46,580 One, it doesn't have anywhere near the flexibility 208 00:08:46,580 --> 00:08:47,670 and variability. 209 00:08:47,670 --> 00:08:54,482 There are like narrow domains in which each male zebra 210 00:08:54,482 --> 00:08:56,690 finch makes a slightly different version of the call, 211 00:08:56,690 --> 00:08:59,607 but within an extremely narrow range. 212 00:08:59,607 --> 00:09:01,190 There's actually a brain imaging study 213 00:09:01,190 --> 00:09:06,320 in songbirds that asks, 214 00:09:06,320 --> 00:09:14,180 do they have reward brain region responses to music. 215 00:09:14,180 --> 00:09:16,400 And the answer is, yes, in some cases. 216 00:09:16,400 --> 00:09:19,220 Like, do they enjoy it, right, is that part of-- 217 00:09:19,220 --> 00:09:22,003 and the answer is yes, but only when 218 00:09:22,003 --> 00:09:24,170 the significance of the birdsong is something that's 219 00:09:24,170 --> 00:09:26,900 relevant to them, like, there's a potential mate right here, 220 00:09:26,900 --> 00:09:28,130 then they like it. 221 00:09:28,130 --> 00:09:30,410 But they don't like it just for the sound. 222 00:09:30,410 --> 00:09:32,420 And that makes it very different from humans. 223 00:09:32,420 --> 00:09:34,087 And there are other differences as well. 224 00:09:36,610 --> 00:09:38,770 So it's further really important to us 225 00:09:38,770 --> 00:09:40,700 humans in a whole bunch of ways. 226 00:09:40,700 --> 00:09:45,020 One, we have been doing it for a very long time. 227 00:09:45,020 --> 00:09:48,310 And so, for example, the archaeological record shows 228 00:09:48,310 --> 00:09:52,840 these 40,000-year-old bone flutes that you can see, from 229 00:09:52,840 --> 00:09:58,600 the structure of the flute, make particular sets of possible 230 00:09:58,600 --> 00:10:00,100 pitches. 231 00:10:00,100 --> 00:10:02,710 And further, most people who've thought about this 232 00:10:02,710 --> 00:10:05,830 have argued that singing probably goes back much farther 233 00:10:05,830 --> 00:10:06,790 than the bone flutes. 234 00:10:06,790 --> 00:10:08,957 After all, you don't have to make anything to do it.
235 00:10:08,957 --> 00:10:10,960 You can just sing. 236 00:10:10,960 --> 00:10:13,810 Some have even speculated that singing 237 00:10:13,810 --> 00:10:16,090 evolved before language. 238 00:10:16,090 --> 00:10:18,700 It's just speculation, but that's possible. 239 00:10:18,700 --> 00:10:22,600 In any case, it goes way back evolutionarily. 240 00:10:22,600 --> 00:10:25,340 It also arises early in development. 241 00:10:25,340 --> 00:10:30,340 So very young infants are extremely interested in music. 242 00:10:30,340 --> 00:10:33,370 They're sensitive to beat and melody, independent of pitch. 243 00:10:33,370 --> 00:10:35,950 We'll talk more about that in a little bit. 244 00:10:35,950 --> 00:10:37,450 And finally, if you're not impressed 245 00:10:37,450 --> 00:10:39,220 with any of those arguments, people 246 00:10:39,220 --> 00:10:41,680 spend a lot of money on music. 247 00:10:41,680 --> 00:10:43,720 And if that's your index of importance, 248 00:10:43,720 --> 00:10:45,940 it's really important. 249 00:10:45,940 --> 00:10:50,470 Last year, $43 billion in sales. 250 00:10:50,470 --> 00:10:52,870 So I'd say it's not a frivolous topic. 251 00:10:52,870 --> 00:10:54,340 It's a fundamental topic. 252 00:10:54,340 --> 00:10:57,700 It's near the core of what it means to be a human being. 253 00:10:57,700 --> 00:11:01,270 And all of this raises a really obvious question. 254 00:11:01,270 --> 00:11:04,160 Why do we create and like music in the first place? 255 00:11:04,160 --> 00:11:06,640 What is it for? 256 00:11:06,640 --> 00:11:12,010 And this is a puzzle that people have thought about for at least 257 00:11:12,010 --> 00:11:14,800 centuries, probably millennia. 258 00:11:14,800 --> 00:11:17,980 And this includes all kinds of major thinkers, 259 00:11:17,980 --> 00:11:22,060 like Darwin, who said, "As neither the enjoyment 260 00:11:22,060 --> 00:11:25,030 nor the capacity of producing musical notes 261 00:11:25,030 --> 00:11:27,190 are faculties of the least direct use 262 00:11:27,190 --> 00:11:29,980 to man in reference to his ordinary habits of life, 263 00:11:29,980 --> 00:11:32,710 they must be ranked amongst the most mysterious with which 264 00:11:32,710 --> 00:11:34,990 he is endowed." 265 00:11:34,990 --> 00:11:39,250 So Darwin is implicitly assuming here 266 00:11:39,250 --> 00:11:41,710 that music is an evolved capacity. 267 00:11:41,710 --> 00:11:44,680 It's not something that we just learn and that cultures invent, 268 00:11:44,680 --> 00:11:47,440 if they feel like it or don't feel like it. 269 00:11:47,440 --> 00:11:52,480 But it's actually evolved and shaped by natural selection. 270 00:11:52,480 --> 00:11:55,120 And that means there must be some function 271 00:11:55,120 --> 00:11:57,130 that natural selection was acting on that 272 00:11:57,130 --> 00:11:59,740 was relevant to survival. 273 00:11:59,740 --> 00:12:05,110 So people have speculated about what that function might be. 274 00:12:05,110 --> 00:12:07,690 Those who think that music is an evolved function 275 00:12:07,690 --> 00:12:11,410 include Darwin, who speculated that it's for sexual selection. 276 00:12:11,410 --> 00:12:13,900 And his writing is so beautiful, I won't paraphrase it.
277 00:12:13,900 --> 00:12:18,520 He says, "It appears probable that the progenitors of man, 278 00:12:18,520 --> 00:12:21,190 either the males or females or both sexes, 279 00:12:21,190 --> 00:12:24,310 before acquiring the power of expressing their mutual love 280 00:12:24,310 --> 00:12:26,500 in articulate language, endeavored 281 00:12:26,500 --> 00:12:30,368 to charm each other with musical notes and rhythm." 282 00:12:30,368 --> 00:12:31,660 So that's Darwin's speculation. 283 00:12:31,660 --> 00:12:33,820 It's just a speculation, but a lovely one. 284 00:12:33,820 --> 00:12:38,290 Also, note that he threw in this radical idea here: 285 00:12:38,290 --> 00:12:42,370 "before acquiring the power to express their mutual love 286 00:12:42,370 --> 00:12:43,390 in articulate language." 287 00:12:43,390 --> 00:12:48,160 So he's speculating that music came before language. 288 00:12:48,160 --> 00:12:51,850 Again, all speculation, but interesting speculation. 289 00:12:51,850 --> 00:12:54,880 More recently, up the street, there's 290 00:12:54,880 --> 00:12:57,820 a bunch of people who've been thinking about this a lot. 291 00:12:57,820 --> 00:13:00,130 And Sam Mehr at Harvard has been arguing 292 00:13:00,130 --> 00:13:02,170 that the function of music and song, 293 00:13:02,170 --> 00:13:03,910 in particular, which he thinks is really 294 00:13:03,910 --> 00:13:08,890 the fundamental basic kind of native form of music, 295 00:13:08,890 --> 00:13:10,960 has an evolutionary role in managing 296 00:13:10,960 --> 00:13:12,442 parent-offspring conflict. 297 00:13:12,442 --> 00:13:14,650 And that's something that many evolutionary theorists 298 00:13:14,650 --> 00:13:15,730 have written about. 299 00:13:15,730 --> 00:13:18,640 The genetic interests of a parent and an offspring 300 00:13:18,640 --> 00:13:21,700 are highly overlapping, but not completely overlapping. 301 00:13:21,700 --> 00:13:23,590 The parent has other offspring to take care 302 00:13:23,590 --> 00:13:25,300 of besides this one right here. 303 00:13:25,300 --> 00:13:28,360 That one right there wants 100% of the parent's effort. 304 00:13:28,360 --> 00:13:30,950 Therein lies the conflict. 305 00:13:30,950 --> 00:13:35,020 And so Mehr has proposed that infant-directed 306 00:13:35,020 --> 00:13:37,840 song arose in this kind of arms race 307 00:13:37,840 --> 00:13:40,390 between the somewhat competing interests of the parent 308 00:13:40,390 --> 00:13:41,480 and the offspring. 309 00:13:41,480 --> 00:13:44,020 And it manages this need the infant has 310 00:13:44,020 --> 00:13:46,450 to know the parent is there with the fact 311 00:13:46,450 --> 00:13:48,460 that the parent has other needs, so I guess the idea is 312 00:13:48,460 --> 00:13:54,080 they can sing while attending to other offspring, and on and on. 313 00:13:54,080 --> 00:13:56,320 So there's other kinds of speculations like this. 314 00:13:56,320 --> 00:13:59,380 But importantly, this is not the only kind of view. 315 00:13:59,380 --> 00:14:01,810 It's not necessarily the case that music 316 00:14:01,810 --> 00:14:03,910 is an evolved capacity. 317 00:14:03,910 --> 00:14:06,520 So others have argued that it's not. 318 00:14:06,520 --> 00:14:09,160 So Steve Pinker, also up the street, 319 00:14:09,160 --> 00:14:11,500 has argued that music is "auditory 320 00:14:11,500 --> 00:14:15,010 cheesecake, an exquisite confection crafted 321 00:14:15,010 --> 00:14:17,290 to tickle the sensitive spots of at least six 322 00:14:17,290 --> 00:14:19,480 of our mental faculties.
323 00:14:19,480 --> 00:14:22,630 If it vanished from our species, the rest of our lifestyle 324 00:14:22,630 --> 00:14:24,628 would be virtually unchanged." 325 00:14:24,628 --> 00:14:26,920 I think that might say a little more about Steve Pinker 326 00:14:26,920 --> 00:14:29,350 than it does about music. 327 00:14:29,350 --> 00:14:33,640 Nonetheless, it's a possible view. 328 00:14:33,640 --> 00:14:36,130 What he's saying is that music is not 329 00:14:36,130 --> 00:14:38,620 an evolutionary adaptation at all, 330 00:14:38,620 --> 00:14:41,320 but an alternate use of neural machinery 331 00:14:41,320 --> 00:14:43,090 that evolved for some other function. 332 00:14:43,090 --> 00:14:45,400 And then once you have this neural machinery, what 333 00:14:45,400 --> 00:14:47,470 the hell, you can invent cultural forms 334 00:14:47,470 --> 00:14:51,430 and use it to do other things like music. 335 00:14:51,430 --> 00:14:54,305 And the most obvious kind of neural machinery 336 00:14:54,305 --> 00:14:55,930 that you might co-opt for that function 337 00:14:55,930 --> 00:14:58,360 would be neural machinery for speech or neural machinery 338 00:14:58,360 --> 00:15:02,020 for language, which, as I argued briefly last time, 339 00:15:02,020 --> 00:15:03,260 are not the same thing. 340 00:15:03,260 --> 00:15:05,770 One is the auditory perception of speech sounds 341 00:15:05,770 --> 00:15:07,540 and the other is the understanding 342 00:15:07,540 --> 00:15:09,850 of linguistic meaning. 343 00:15:09,850 --> 00:15:12,460 So the nice thing about this is, finally 344 00:15:12,460 --> 00:15:17,740 after all this entertaining but speculative stuff, 345 00:15:17,740 --> 00:15:19,750 we have an empirical question. 346 00:15:19,750 --> 00:15:21,790 This is something we can ask empirically. 347 00:15:21,790 --> 00:15:24,580 Does music actually use the same machinery 348 00:15:24,580 --> 00:15:27,887 as speech or language, or does it not? 349 00:15:27,887 --> 00:15:29,470 Some of the rest of these speculations 350 00:15:29,470 --> 00:15:30,760 are very hard to test. 351 00:15:30,760 --> 00:15:31,600 So stay tuned. 352 00:15:31,600 --> 00:15:34,670 We'll get back to that shortly. 353 00:15:34,670 --> 00:15:38,180 But first, let's step back and think, OK, 354 00:15:38,180 --> 00:15:41,260 if music is an evolved capacity, it should 355 00:15:41,260 --> 00:15:44,680 be innate in some sense, at least genetically 356 00:15:44,680 --> 00:15:46,930 specified, right, because that's what evolution does: 357 00:15:46,930 --> 00:15:53,380 natural selection acts on the genome to produce things 358 00:15:53,380 --> 00:15:57,010 that are genetically specified. 359 00:15:57,010 --> 00:15:59,950 And it should be present in all human societies, 360 00:15:59,950 --> 00:16:01,840 since the branching out of human societies 361 00:16:01,840 --> 00:16:04,820 is very recent in human evolution. 362 00:16:04,820 --> 00:16:06,970 So is it? 363 00:16:06,970 --> 00:16:11,140 Well, is music innate? 364 00:16:11,140 --> 00:16:15,280 So, suppose we found specialized machinery in the brain 365 00:16:15,280 --> 00:16:17,290 in adults for music. 366 00:16:17,290 --> 00:16:18,910 And we showed really definitively, 367 00:16:18,910 --> 00:16:21,520 it's really, really, really specialized for music. 368 00:16:21,520 --> 00:16:24,390 Would that prove innateness? 369 00:16:24,390 --> 00:16:25,460 No, why not? 370 00:16:25,460 --> 00:16:29,933 AUDIENCE: Might have [INAUDIBLE].
371 00:16:29,933 --> 00:16:32,830 NANCY KANWISHER: Bingo, thank you, very good. 372 00:16:32,830 --> 00:16:34,450 Yup, exactly. 373 00:16:34,450 --> 00:16:37,240 So this is something that many, many people are confused about, 374 00:16:37,240 --> 00:16:39,340 including colleagues of mine, most 375 00:16:39,340 --> 00:16:41,440 of the popular scientific press. 376 00:16:41,440 --> 00:16:43,570 Just because there's a specialized bit of brain 377 00:16:43,570 --> 00:16:46,270 that does x doesn't mean x is innate. 378 00:16:46,270 --> 00:16:47,530 It could be learned. 379 00:16:47,530 --> 00:16:50,638 And the clearest example of that is the visual word form area. 380 00:16:50,638 --> 00:16:51,430 Everybody get that? 381 00:16:54,280 --> 00:16:57,410 OK, so we've got to try something else. 382 00:16:57,410 --> 00:17:00,700 What if we find sensitivity to music, in some very music 383 00:17:00,700 --> 00:17:04,060 particular way, in newborns? 384 00:17:04,060 --> 00:17:07,569 Now that will get closer, but here's the problem. 385 00:17:07,569 --> 00:17:10,390 Fetuses can hear pretty well in the womb. 386 00:17:10,390 --> 00:17:13,060 And if the mom is singing or even if there's 387 00:17:13,060 --> 00:17:15,970 music in the ambient room, some of that sound 388 00:17:15,970 --> 00:17:17,540 gets into the womb. 389 00:17:17,540 --> 00:17:21,310 So that means that even if you show sensitivity to music, even 390 00:17:21,310 --> 00:17:24,619 in some very particular way, in a newborn, 391 00:17:24,619 --> 00:17:27,409 it's not a really tight argument that it wasn't, in part, 392 00:17:27,409 --> 00:17:27,909 learned. 393 00:17:30,520 --> 00:17:32,450 So this is a real challenge. 394 00:17:32,450 --> 00:17:34,540 It may just be impossible to answer. 395 00:17:34,540 --> 00:17:35,170 I'm not sure. 396 00:17:35,170 --> 00:17:35,920 I don't know how-- 397 00:17:35,920 --> 00:17:38,087 I don't know what method could actually answer this. 398 00:17:38,087 --> 00:17:40,060 But at the very least, it's really difficult 399 00:17:40,060 --> 00:17:41,500 and nobody's nailed it. 400 00:17:41,500 --> 00:17:45,820 So we can backtrack and ask the related, not quite 401 00:17:45,820 --> 00:17:49,570 as definitive question: "But OK, how early developing is it?" 402 00:17:49,570 --> 00:17:52,960 So often, developmental psychologists take this hedge. 403 00:17:52,960 --> 00:17:54,910 It's like, we can't exactly establish 404 00:17:54,910 --> 00:17:55,840 definitive innateness. 405 00:17:55,840 --> 00:17:58,660 But if things are really there very early 406 00:17:58,660 --> 00:18:00,730 and develop very fast, that's a suggestion 407 00:18:00,730 --> 00:18:04,610 that at least the system is designed to pick it up quickly. 408 00:18:04,610 --> 00:18:06,550 So even if there's a role for experience, 409 00:18:06,550 --> 00:18:09,008 there's some things that are picked up really fast and some 410 00:18:09,008 --> 00:18:10,280 things that aren't. 411 00:18:10,280 --> 00:18:13,450 And so how quickly is it picked up? 412 00:18:13,450 --> 00:18:15,430 So it turns out there's a bunch of studies 413 00:18:15,430 --> 00:18:16,600 that have looked at this. 414 00:18:16,600 --> 00:18:20,650 And young infants are in fact highly attuned to music. 415 00:18:20,650 --> 00:18:24,700 They're sensitive to pitch and to rhythm. 
416 00:18:24,700 --> 00:18:27,670 And in one charming study, they took two 417 00:18:27,670 --> 00:18:30,640 to three-day-old infants who were sleeping, 418 00:18:30,640 --> 00:18:35,380 put EEG electrodes on them, and played them sounds. 419 00:18:35,380 --> 00:18:37,750 They wanted to test beat induction, which is 420 00:18:37,750 --> 00:18:39,583 when you hear a rhythmic beat. 421 00:18:39,583 --> 00:18:40,750 You get trained to the beat. 422 00:18:40,750 --> 00:18:42,730 And you know when the next beat is. 423 00:18:42,730 --> 00:18:45,280 And that's true even if it's not just a single pulse. 424 00:18:45,280 --> 00:18:50,170 So they played these infants sounds like this. 425 00:18:50,170 --> 00:18:51,580 Oh, but the audio is not on. 426 00:18:54,820 --> 00:18:56,630 Now it's going to blast everyone. 427 00:19:02,030 --> 00:19:02,960 All right, hang on. 428 00:19:02,960 --> 00:19:04,151 AUDIENCE: It's playing. 429 00:19:04,151 --> 00:19:05,030 NANCY KANWISHER: Oh, it is playing? 430 00:19:05,030 --> 00:19:05,540 Turn up more? 431 00:19:05,540 --> 00:19:06,040 OK. 432 00:19:09,040 --> 00:19:12,680 Didn't want to deafen people. 433 00:19:12,680 --> 00:19:13,180 OK, here. 434 00:19:13,180 --> 00:19:14,300 AUDIENCE: It's going a little [INAUDIBLE].. 435 00:19:14,300 --> 00:19:15,837 Just turn it up so you can hear it. 436 00:19:26,102 --> 00:19:27,477 AUDIENCE: Go to HDMI, [INAUDIBLE] 437 00:19:27,477 --> 00:19:28,451 plugged in [INAUDIBLE]. 438 00:19:28,451 --> 00:19:32,230 NANCY KANWISHER: It's not, but that's supposed to work, right? 439 00:19:32,230 --> 00:19:35,655 It has worked before. 440 00:19:35,655 --> 00:19:36,989 AUDIENCE: In there. 441 00:19:36,989 --> 00:19:39,614 AUDIENCE: Let's just check your system settings really quickly. 442 00:19:46,530 --> 00:19:48,430 So I can hear you from my system. 443 00:19:48,430 --> 00:19:49,847 NANCY KANWISHER: Yeah, it's weird. 444 00:19:52,223 --> 00:19:54,640 AUDIENCE: Wait, if I can hear you from my system, you're-- 445 00:19:54,640 --> 00:19:57,032 NANCY KANWISHER: Then, it is going out, yeah. 446 00:19:57,032 --> 00:19:59,790 AUDIENCE: Oh, somebody unplugged both. 447 00:19:59,790 --> 00:20:01,010 OK, let's try [INAUDIBLE]. 448 00:20:01,010 --> 00:20:01,885 NANCY KANWISHER: Aah. 449 00:20:01,885 --> 00:20:03,530 AUDIENCE: OK, try it one more time. 450 00:20:03,530 --> 00:20:06,710 NANCY KANWISHER: OK, here we go. 451 00:20:06,710 --> 00:20:10,650 [MUSIC PLAYING] 452 00:20:10,650 --> 00:20:11,940 Did you hear that glitch? 453 00:20:11,940 --> 00:20:14,740 Let me do it again. 454 00:20:14,740 --> 00:20:16,987 Take it back here. 455 00:20:16,987 --> 00:20:21,680 [MUSIC PLAYING] 456 00:20:21,680 --> 00:20:24,770 Everybody hear the hiccup in the beat? 457 00:20:24,770 --> 00:20:26,750 So that's what these guys tested. 458 00:20:26,750 --> 00:20:33,080 They played rhythms like that to two to three-day-old infants. 459 00:20:33,080 --> 00:20:35,864 And-- 460 00:20:35,864 --> 00:20:36,535 [MUSIC PLAYING] 461 00:20:36,535 --> 00:20:37,410 Oh, now it's working. 462 00:20:37,410 --> 00:20:37,910 OK, great. 463 00:20:37,910 --> 00:20:40,980 OK, anyway, so here's what they find with their ERPs. 464 00:20:40,980 --> 00:20:45,330 This is the onset of that little hiccup, the time when 465 00:20:45,330 --> 00:20:47,970 that beat was supposed to happen and didn't, the missing 466 00:20:47,970 --> 00:20:49,410 beat right there.
467 00:20:49,410 --> 00:20:52,020 And this is an ERP response happening 468 00:20:52,020 --> 00:20:58,260 about 200 milliseconds later for that missing but expected beat. 469 00:20:58,260 --> 00:21:00,720 And let's see, this is a standard 470 00:21:00,720 --> 00:21:02,050 where the beat keeps going. 471 00:21:02,050 --> 00:21:03,760 Now you might say, well, of course they're different. 472 00:21:03,760 --> 00:21:05,302 One has a beat there and one doesn't. 473 00:21:05,302 --> 00:21:06,670 They're acoustically different. 474 00:21:06,670 --> 00:21:08,760 So they have a control condition which 475 00:21:08,760 --> 00:21:11,640 has a beat, but a different preceding context. 476 00:21:11,640 --> 00:21:14,995 So where that beat is not-- 477 00:21:14,995 --> 00:21:16,620 I'm sorry, where it has a missing beat, 478 00:21:16,620 --> 00:21:19,290 but that's expected by the previous context. 479 00:21:19,290 --> 00:21:22,620 So that's just evidence that even young infants 480 00:21:22,620 --> 00:21:26,190 have some sense of beat. 481 00:21:26,190 --> 00:21:29,670 So moving a little later, by five to six months, 482 00:21:29,670 --> 00:21:32,040 infants can recognize a familiar melody, 483 00:21:32,040 --> 00:21:34,710 even if it's shifted in pitch from the version 484 00:21:34,710 --> 00:21:36,120 that they learned. 485 00:21:36,120 --> 00:21:38,280 And that's really cool, because that 486 00:21:38,280 --> 00:21:41,130 means they use relative pitch, not absolute pitch. 487 00:21:41,130 --> 00:21:43,650 And that's something that adults do in music. 488 00:21:43,650 --> 00:21:44,740 We're very good at that. 489 00:21:44,740 --> 00:21:46,230 But no animal can do that. 490 00:21:46,230 --> 00:21:49,410 You can train animals to do various things like recognize 491 00:21:49,410 --> 00:21:52,260 a particular pair of sounds or even 492 00:21:52,260 --> 00:21:54,180 a few sounds, a few pitches. 493 00:21:54,180 --> 00:21:56,680 But if you transpose it, they don't recognize that. 494 00:21:56,680 --> 00:21:57,180 Yeah, Ben. 495 00:22:00,066 --> 00:22:02,540 AUDIENCE: Isn't it possible that we're 496 00:22:02,540 --> 00:22:04,982 just sensitive to rhythm and pitch 497 00:22:04,982 --> 00:22:06,815 rather than being sensitive to music itself? 498 00:22:06,815 --> 00:22:09,000 NANCY KANWISHER: Yes, hang on to that thought. 499 00:22:09,000 --> 00:22:11,820 It takes more work to show that it's music per se 500 00:22:11,820 --> 00:22:14,220 rather than just rhythm and pitch. 501 00:22:14,220 --> 00:22:16,200 We'd have to say what we meant by rhythm. 502 00:22:16,200 --> 00:22:18,510 If we load enough into the idea of rhythm, then 503 00:22:18,510 --> 00:22:20,230 it's like most of music right there. 504 00:22:20,230 --> 00:22:23,160 But we might say just even beat. 505 00:22:23,160 --> 00:22:24,660 How about that, right? 506 00:22:24,660 --> 00:22:26,670 And actually, already this study already 507 00:22:26,670 --> 00:22:28,230 is not just an even beat, because it 508 00:22:28,230 --> 00:22:29,730 has more context than that. 509 00:22:29,730 --> 00:22:35,520 That is, for example, the beats in this ERP infant study 510 00:22:35,520 --> 00:22:37,657 were not emphasized louder. 511 00:22:37,657 --> 00:22:39,990 The infants have to be able to pick out what the beat is 512 00:22:39,990 --> 00:22:41,910 from that complex sound. 513 00:22:41,910 --> 00:22:44,160 It's not automatically there in the acoustic signal 514 00:22:44,160 --> 00:22:46,725 as the louder onset sound. 
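To make the relative-pitch point from a moment ago concrete, here is a tiny sketch; the melody and the 3-semitone shift are invented for illustration, not taken from any study. Transposing multiplies every frequency by the same factor, so the absolute pitches all change while the ratios between successive notes, the intervals, stay exactly the same.

```python
import numpy as np

# A made-up melody as fundamental frequencies in Hz (roughly C4, E4, G4, E4, C4).
melody = np.array([261.63, 329.63, 392.00, 329.63, 261.63])

# Transpose up by 3 semitones: multiply every note by 2**(3/12).
transposed = melody * 2 ** (3 / 12)

# The absolute pitches differ...
print(np.allclose(melody, transposed))                 # False
# ...but relative pitch -- the ratio between successive notes -- is unchanged.
print(np.allclose(melody[1:] / melody[:-1],
                  transposed[1:] / transposed[:-1]))   # True
```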
515 00:22:51,810 --> 00:22:53,880 Five-month-old infants, if you play 516 00:22:53,880 --> 00:22:56,280 them a melody for one or two weeks, so they 517 00:22:56,280 --> 00:22:58,845 get really familiar with it and learn it, 518 00:22:58,845 --> 00:23:01,470 and then you don't play it again and you come back eight months 519 00:23:01,470 --> 00:23:04,340 later, they remember it. 520 00:23:04,340 --> 00:23:08,000 So music is really salient to infants. 521 00:23:08,000 --> 00:23:11,535 On the other hand, newborn infants' appreciation of music 522 00:23:11,535 --> 00:23:12,035 is not-- 523 00:23:16,930 --> 00:23:19,570 what is that not doing there? 524 00:23:19,570 --> 00:23:21,700 Oh, yeah, that's right. 525 00:23:21,700 --> 00:23:28,360 So they don't prefer consonance over dissonance, right. 526 00:23:28,360 --> 00:23:31,900 And they're insensitive to key. 527 00:23:35,020 --> 00:23:41,740 And they detect timing changes in rhythms, 528 00:23:41,740 --> 00:23:44,110 whether they are timing changes that 529 00:23:44,110 --> 00:23:46,000 are typical in the kind of music they've 530 00:23:46,000 --> 00:23:50,470 heard or typical in a more foreign kind of music. 531 00:23:50,470 --> 00:23:54,760 And so a really nice study that shows this 532 00:23:54,760 --> 00:23:58,030 is that in Western music, it's really common to have-- 533 00:23:58,030 --> 00:24:00,880 most Western music has isochronous beat. 534 00:24:00,880 --> 00:24:02,350 So you can see that over here. 535 00:24:02,350 --> 00:24:03,940 Here's an isochronous beat. 536 00:24:03,940 --> 00:24:05,980 Those are even, temporal intervals. 537 00:24:05,980 --> 00:24:08,930 And there's a whole note here and then half notes. 538 00:24:08,930 --> 00:24:11,860 And they're all multiples of each other, just wholes 539 00:24:11,860 --> 00:24:20,230 and halves, with the beat happening every four notes. 540 00:24:20,230 --> 00:24:22,630 Non-isochronous beat has this funny business where 541 00:24:22,630 --> 00:24:29,028 there's a whole note and a half note, making up just three-- 542 00:24:29,028 --> 00:24:30,320 what do you call those things-- 543 00:24:30,320 --> 00:24:31,040 they're not beats. 544 00:24:31,040 --> 00:24:31,670 What are they called? 545 00:24:31,670 --> 00:24:32,630 AUDIENCE: Three-beat notes. 546 00:24:32,630 --> 00:24:33,710 NANCY KANWISHER: Sorry, three notes, I guess. 547 00:24:33,710 --> 00:24:35,330 But it's not even notes, because it's whatever. 548 00:24:35,330 --> 00:24:36,872 I don't know what the terminology is. 549 00:24:36,872 --> 00:24:40,520 But anyway, this sound here followed by 4. 550 00:24:40,520 --> 00:24:42,410 This is non-isochronous rhythm. 551 00:24:42,410 --> 00:24:46,640 Those are really common in Balkan music 552 00:24:46,640 --> 00:24:50,392 where they do all kinds of crazy things, like 8/22 553 00:24:50,392 --> 00:24:51,350 or something like that. 554 00:24:51,350 --> 00:24:54,110 I mean, like really, really crazy musical meters. 555 00:24:54,110 --> 00:24:55,760 They're awesome, I love them. 556 00:24:55,760 --> 00:24:57,230 But they are very other. 557 00:24:57,230 --> 00:25:01,280 Like, if you grew up in Western society 558 00:25:01,280 --> 00:25:03,020 when you first hear Balkan rhythms, 559 00:25:03,020 --> 00:25:05,240 it's very hard to copy them. 560 00:25:05,240 --> 00:25:08,540 But six-month-old infants get rhythms 561 00:25:08,540 --> 00:25:14,030 equally well if they're isochronous or non-isochronous. 
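One concrete way to write down the two kinds of meter just described, with durations invented for illustration: an isochronous cycle divides time into equal intervals, while a non-isochronous cycle, such as a common Balkan-style 7/8, mixes groups of two and three pulses.

```python
# Inter-onset intervals of the strong beats, in units of the shortest pulse.
isochronous = [2, 2, 2, 2]        # e.g., a 4/4-like cycle: equal intervals
non_isochronous = [2, 2, 3]       # e.g., a Balkan-style 7/8: short, short, long

# Both repeat as perfectly regular cycles; what differs is whether the strong
# beats are equally spaced in time.
print(sum(isochronous), sum(non_isochronous))   # 8 and 7 pulses per cycle
print(len(set(isochronous)) == 1)               # True: equal intervals
print(len(set(non_isochronous)) == 1)           # False: unequal intervals
```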
562 00:25:14,030 --> 00:25:18,020 By 12 months, they can only automatically, 563 00:25:18,020 --> 00:25:21,020 like immediately, perceive and appreciate 564 00:25:21,020 --> 00:25:24,560 rhythms that are familiar from their cultural exposure. 565 00:25:24,560 --> 00:25:27,440 That is, isochronous if they're from a Western society 566 00:25:27,440 --> 00:25:32,730 or non-isochronous if they're from a Balkan country. 567 00:25:32,730 --> 00:25:33,230 Yeah? 568 00:25:33,230 --> 00:25:35,465 AUDIENCE: Just what is getting a meter again? 569 00:25:35,465 --> 00:25:36,840 NANCY KANWISHER: Well, so there's 570 00:25:36,840 --> 00:25:37,690 a whole bunch of studies. 571 00:25:37,690 --> 00:25:38,773 I'm just summarizing here. 572 00:25:38,773 --> 00:25:41,940 That is, they're sensitive to violations by all kinds 573 00:25:41,940 --> 00:25:45,000 of measures of little whatever behavioral thing you can get 574 00:25:45,000 --> 00:25:47,730 out of a five-month-old, whether it's how much they're kicking 575 00:25:47,730 --> 00:25:49,050 their legs or how much-- 576 00:25:49,050 --> 00:25:51,300 often, it's how hard they're sucking 577 00:25:51,300 --> 00:25:52,830 on a pacifier is another measure. 578 00:25:52,830 --> 00:25:54,870 So you just see, can they detect changes 579 00:25:54,870 --> 00:25:57,560 in a stimulus or violations by any of those measures. 580 00:25:57,560 --> 00:25:58,935 Or you could do it with the ERPs. 581 00:26:02,790 --> 00:26:08,880 So brief exposure to a previously unfamiliar rhythm 582 00:26:08,880 --> 00:26:12,120 is enough for a 12-month-old to appreciate 583 00:26:12,120 --> 00:26:14,610 the relevant distinctions in that rhythm, 584 00:26:14,610 --> 00:26:16,800 but not for adults. 585 00:26:16,800 --> 00:26:20,340 So if you haven't heard non-isochronous Balkan rhythms 586 00:26:20,340 --> 00:26:25,842 until now and you try dancing to them, good luck to you. 587 00:26:25,842 --> 00:26:27,300 You can probably get it eventually, 588 00:26:27,300 --> 00:26:30,270 but it will take you a long time. 589 00:26:30,270 --> 00:26:31,710 So does this sound familiar? 590 00:26:34,470 --> 00:26:36,540 Perceptual narrowing, right? 591 00:26:36,540 --> 00:26:38,370 So we keep encountering this. 592 00:26:38,370 --> 00:26:41,400 We encountered this with face recognition, 593 00:26:41,400 --> 00:26:46,140 with same versus other races, same versus other species. 594 00:26:46,140 --> 00:26:48,090 You see it in face recognition. 595 00:26:48,090 --> 00:26:49,920 We encountered it with phoneme perception. 596 00:26:49,920 --> 00:26:53,190 The phonemes-- remember, newborn infants can distinguish 597 00:26:53,190 --> 00:26:55,230 all the phonemes of the world's languages, 598 00:26:55,230 --> 00:26:57,990 even those exotic clicks that I played last time 599 00:26:57,990 --> 00:27:00,240 from Southern African languages. 600 00:27:00,240 --> 00:27:03,780 And you guys can't distinguish all those clicks now. 601 00:27:03,780 --> 00:27:06,420 So that's perceptual narrowing. 602 00:27:06,420 --> 00:27:08,520 It makes sense, of course, because the reason we 603 00:27:08,520 --> 00:27:11,745 have perceptual narrowing is you want to have invariants. 604 00:27:11,745 --> 00:27:13,620 You want to appreciate the sameness of things 605 00:27:13,620 --> 00:27:15,120 across transformations.
606 00:27:15,120 --> 00:27:19,020 And if your speech culture or your music culture 607 00:27:19,020 --> 00:27:22,027 is telling you these two things, this variation, doesn't count, 608 00:27:22,027 --> 00:27:23,610 you want to throw away that difference 609 00:27:23,610 --> 00:27:25,060 and treat them as the same. 610 00:27:25,060 --> 00:27:28,200 And then once you do that, you can't make that discrimination 611 00:27:28,200 --> 00:27:31,380 anymore. 612 00:27:31,380 --> 00:27:33,240 So on this question we started with, 613 00:27:33,240 --> 00:27:35,130 is music an evolved capacity. 614 00:27:35,130 --> 00:27:36,900 If so, it should be innate. 615 00:27:36,900 --> 00:27:39,420 And we haven't really answered that question, maybe. 616 00:27:39,420 --> 00:27:42,810 But as I said, it's really hard, and maybe ultimately 617 00:27:42,810 --> 00:27:43,410 unanswerable. 618 00:27:43,410 --> 00:27:45,688 But certainly it's early developing. 619 00:27:45,688 --> 00:27:46,980 What about this other question? 620 00:27:46,980 --> 00:27:50,970 Is it present in all human societies? 621 00:27:50,970 --> 00:27:53,910 Well, I said before briefly that it is. 622 00:27:53,910 --> 00:27:56,400 Oh yeah, sorry, we have to back up and say, OK, 623 00:27:56,400 --> 00:28:00,450 to answer this question, we have to say what music is, 624 00:28:00,450 --> 00:28:02,580 to answer whether it's present in all societies. 625 00:28:02,580 --> 00:28:05,910 And this has been a real problem, because music 626 00:28:05,910 --> 00:28:07,890 is notoriously hard to define. 627 00:28:07,890 --> 00:28:10,200 And many people have made a point 628 00:28:10,200 --> 00:28:13,950 of stretching the definition of music, including 629 00:28:13,950 --> 00:28:17,970 the ridiculous and hilarious John Cage. 630 00:28:23,990 --> 00:28:27,650 So this is his 1960 TV appearance. 631 00:28:27,650 --> 00:28:28,490 [VIDEO PLAYBACK] 632 00:28:28,490 --> 00:28:30,770 - Over here, Mr. Cage has a tape recording machine, 633 00:28:30,770 --> 00:28:33,283 which will provide much of the-- will you touch the machine 634 00:28:33,283 --> 00:28:35,450 so we can know where it is-- which will provide much 635 00:28:35,450 --> 00:28:37,220 of the background. 636 00:28:37,220 --> 00:28:39,860 Also, he works with a stopwatch. 637 00:28:39,860 --> 00:28:41,930 The reason he does this is because these sounds 638 00:28:41,930 --> 00:28:45,800 are in no sense accidental in their sequence. 639 00:28:45,800 --> 00:28:47,840 They each must fall mathematically 640 00:28:47,840 --> 00:28:49,070 at a precise point. 641 00:28:49,070 --> 00:28:51,170 So he wants to watch as he works. 642 00:28:51,170 --> 00:28:52,740 He takes it seriously. 643 00:28:52,740 --> 00:28:53,990 I think it's interesting. 644 00:28:53,990 --> 00:28:56,790 If you are amused, you may laugh. 645 00:28:56,790 --> 00:28:59,490 If you like it, you may buy the recording. 646 00:28:59,490 --> 00:29:02,666 John Cage and "Water Walk." 647 00:29:08,618 --> 00:29:15,562 [EXPERIMENTAL MUSICAL SOUNDS] 648 00:29:44,739 --> 00:29:45,322 [END PLAYBACK] 649 00:29:45,322 --> 00:29:48,290 NANCY KANWISHER: Anyway, it goes on and on like that. 650 00:29:48,290 --> 00:29:52,710 I guess it was a little edgier in 1959 than it is now. 651 00:29:52,710 --> 00:29:54,560 But he's making a point. 652 00:29:54,560 --> 00:29:58,940 The point he's making is, what the hell is music. 653 00:29:58,940 --> 00:30:02,990 And he's saying, I can call this music if I want. 654 00:30:02,990 --> 00:30:05,925 And everybody's enjoying it.
655 00:30:05,925 --> 00:30:06,425 Anyway. 656 00:30:11,420 --> 00:30:13,490 So you can watch the YouTube video, if you want. 657 00:30:13,490 --> 00:30:16,130 It's quite entertaining. 658 00:30:16,130 --> 00:30:18,182 Despite this kind of nihilistic view 659 00:30:18,182 --> 00:30:19,640 that anything could count as music, 660 00:30:19,640 --> 00:30:21,860 there are some things we can say. 661 00:30:21,860 --> 00:30:24,548 First thing I'd say is, if you want to study music, 662 00:30:24,548 --> 00:30:26,090 one of the first things you run into 663 00:30:26,090 --> 00:30:27,420 is, oh, what's going to count. 664 00:30:27,420 --> 00:30:28,753 You run into this problem here. 665 00:30:28,753 --> 00:30:30,170 But actually, I think that doesn't 666 00:30:30,170 --> 00:30:32,540 need to be so paralyzing as it feels at first. 667 00:30:32,540 --> 00:30:35,030 You can just take the most canonical forms where all 668 00:30:35,030 --> 00:30:38,150 of your subjects will agree that this is music and this isn't. 669 00:30:38,150 --> 00:30:40,630 And then someday you can study the edge cases later, 670 00:30:40,630 --> 00:30:43,130 but you don't need to agonize about them in order to get off 671 00:30:43,130 --> 00:30:45,020 the ground and study it. 672 00:30:45,020 --> 00:30:49,850 Further, we can ask what is music cross-culturally. 673 00:30:49,850 --> 00:30:52,580 Oh, right, I keep forgetting my next point. 674 00:30:52,580 --> 00:30:55,400 And let me make another point, which is that music is not just 675 00:30:55,400 --> 00:30:57,800 about a set of acoustic properties. 676 00:30:57,800 --> 00:31:01,250 You may think of music as just an auditory thing, 677 00:31:01,250 --> 00:31:06,570 a solitary experience, because a lot of the time it's like that. 678 00:31:06,570 --> 00:31:10,670 But remember that that's a very recent cultural invention. 679 00:31:10,670 --> 00:31:13,250 And throughout most of human evolution, 680 00:31:13,250 --> 00:31:16,970 music has been a fundamentally social phenomenon, more 681 00:31:16,970 --> 00:31:21,050 like this, experienced in groups of people 682 00:31:21,050 --> 00:31:25,130 as a kind of deeply social, communicative, interactive kind 683 00:31:25,130 --> 00:31:25,970 of enterprise. 684 00:31:25,970 --> 00:31:28,340 Or even if not in a large group, music 685 00:31:28,340 --> 00:31:30,410 is very social in this sense here. 686 00:31:30,410 --> 00:31:33,530 There's a whole bunch of cool studies about the role of song 687 00:31:33,530 --> 00:31:36,530 in infants and how infants use song to glean information 688 00:31:36,530 --> 00:31:38,900 about their social environment. 689 00:31:38,900 --> 00:31:41,310 And the point is just music is extremely social. 690 00:31:41,310 --> 00:31:45,200 It's not just defined by its acoustic properties. 691 00:31:45,200 --> 00:31:48,440 But in addition, we can ask, OK, let's 692 00:31:48,440 --> 00:31:52,320 look across the cultures of the world and ask, 693 00:31:52,320 --> 00:31:53,810 are there universals of music? 694 00:31:53,810 --> 00:31:57,740 Is there anything in common across all the different kinds 695 00:31:57,740 --> 00:32:00,980 of music that people experience in different cultures? 696 00:32:00,980 --> 00:32:04,370 For example, are there always discrete pitches or always 697 00:32:04,370 --> 00:32:05,163 isochronous beats? 698 00:32:05,163 --> 00:32:06,830 I already showed you there aren't always 699 00:32:06,830 --> 00:32:07,940 isochronous beats.
700 00:32:07,940 --> 00:32:10,940 And this is nice because it's an empirical question. 701 00:32:10,940 --> 00:32:13,700 There's a really cool paper from a few years ago 702 00:32:13,700 --> 00:32:16,190 where they took recordings of music 703 00:32:16,190 --> 00:32:19,340 from all over the world, all those colored dots, 704 00:32:19,340 --> 00:32:22,430 and they asked, what are the properties that 705 00:32:22,430 --> 00:32:26,540 are present in most of those musics 706 00:32:26,540 --> 00:32:28,010 and how prevalent are they. 707 00:32:28,010 --> 00:32:30,680 And what they found is there's no single property of music 708 00:32:30,680 --> 00:32:32,880 that's present in all of those cultures, 709 00:32:32,880 --> 00:32:35,150 but there's many that are present in most, 710 00:32:35,150 --> 00:32:37,260 and there are a lot of regularities. 711 00:32:37,260 --> 00:32:40,790 So this is a huge table from their paper 712 00:32:40,790 --> 00:32:44,570 where they list many different possible universals. 713 00:32:44,570 --> 00:32:48,080 And what you see is, the relevant column is this one here. 714 00:32:48,080 --> 00:32:52,550 And the white is the percent of those 304 cultures 715 00:32:52,550 --> 00:32:56,720 that they looked at that have that property in their music. 716 00:32:56,720 --> 00:32:59,242 So these top ones are very prevalent, 717 00:32:59,242 --> 00:33:00,950 just not quite universal, because there's 718 00:33:00,950 --> 00:33:04,580 a couple of cases that don't have it. 719 00:33:04,580 --> 00:33:07,070 So one of the most common ones is the idea 720 00:33:07,070 --> 00:33:10,280 that melodies are made from a limited set 721 00:33:10,280 --> 00:33:13,520 of discrete pitches, seven or fewer, 722 00:33:13,520 --> 00:33:17,210 and that those pitches are arranged in some kind of scale 723 00:33:17,210 --> 00:33:20,630 with unequal intervals between the notes. 724 00:33:20,630 --> 00:33:23,240 So that's as close to a universal of music 725 00:33:23,240 --> 00:33:24,740 as you can get, although you can see 726 00:33:24,740 --> 00:33:26,360 from that little teeny black snip 727 00:33:26,360 --> 00:33:29,780 that it's not quite perfectly universal. 728 00:33:29,780 --> 00:33:33,170 And the second thing is that most music has 729 00:33:33,170 --> 00:33:36,680 some kind of regular pulse, either an isochronous 730 00:33:36,680 --> 00:33:39,950 beat or even the non-isochronous ones 731 00:33:39,950 --> 00:33:42,890 have different subdivisions with different numbers of beats 732 00:33:42,890 --> 00:33:46,070 so that there's a systematic rhythmic pattern. 733 00:33:46,070 --> 00:33:49,340 So there's something kind of like melody and something 734 00:33:49,340 --> 00:33:54,410 kind of like rhythm in almost all the world's musics. 735 00:33:54,410 --> 00:33:57,110 They did find some pretty weird ones, one 736 00:33:57,110 --> 00:33:58,530 I can't resist playing for you. 737 00:33:58,530 --> 00:34:00,753 This is from Papua New Guinea. 738 00:34:00,753 --> 00:34:03,170 So as they say, the closest thing to an absolute universal 739 00:34:03,170 --> 00:34:06,140 was song containing discrete pitches, 740 00:34:06,140 --> 00:34:09,500 or regular rhythmic patterns, or both, which applied 741 00:34:09,500 --> 00:34:11,010 to almost the entire sample.
742 00:34:11,010 --> 00:34:13,850 However, music examples from Papua New Guinea 743 00:34:13,850 --> 00:34:18,949 contain combinations of friction blocks, swung slats, ribbon 744 00:34:18,949 --> 00:34:20,980 reeds, and moaning voices-- 745 00:34:20,980 --> 00:34:22,730 I don't know what those things are either, 746 00:34:22,730 --> 00:34:24,770 but I'll play them for you in a second-- 747 00:34:24,770 --> 00:34:28,489 that contained neither discrete pitches nor an isochronous 748 00:34:28,489 --> 00:34:29,060 beat. 749 00:34:29,060 --> 00:34:29,752 OK, here we go. 750 00:34:29,752 --> 00:34:30,419 [VIDEO PLAYBACK] 751 00:34:30,419 --> 00:34:37,290 [PAPUA NEW GUINEAN MUSIC] 752 00:34:46,132 --> 00:34:47,090 [END PLAYBACK] 753 00:34:47,090 --> 00:34:50,210 OK, pretty wild, huh? 754 00:34:50,210 --> 00:34:53,210 So maybe wilder, arguably, than John Cage. 755 00:34:53,210 --> 00:34:57,710 But anyway, so there are some like pretty remote edges 756 00:34:57,710 --> 00:35:01,640 to the concept of music. 757 00:35:01,640 --> 00:35:05,240 I mentioned before the case of consonance and dissonance 758 00:35:05,240 --> 00:35:09,130 and that infants don't prefer one over the other. 759 00:35:09,130 --> 00:35:12,770 In fact, this links to a really cool recent study 760 00:35:12,770 --> 00:35:14,360 from Josh McDermott's lab. 761 00:35:14,360 --> 00:35:18,230 And so the question he asked is, why do we like consonant sounds 762 00:35:18,230 --> 00:35:19,910 like this-- 763 00:35:19,910 --> 00:35:21,860 oops, [INAUDIBLE] play. 764 00:35:21,860 --> 00:35:22,400 Here we go. 765 00:35:22,400 --> 00:35:24,310 [RHYTHMIC SOUND] 766 00:35:24,310 --> 00:35:26,230 Kind of nice, right? 767 00:35:26,230 --> 00:35:28,810 But we're not so hot about this. 768 00:35:28,810 --> 00:35:30,600 [OFF TUNE SOUND] 769 00:35:30,600 --> 00:35:32,770 Right, everybody get that intuition? 770 00:35:32,770 --> 00:35:34,960 OK so what's up with that? 771 00:35:34,960 --> 00:35:37,240 So many people have hypothesized for a long time 772 00:35:37,240 --> 00:35:39,640 that that difference is based in biology, 773 00:35:39,640 --> 00:35:42,250 or even it's like a physical analog of it, 774 00:35:42,250 --> 00:35:45,040 beats and stuff like that. 775 00:35:45,040 --> 00:35:48,530 But actually, it's an empirical question. 776 00:35:48,530 --> 00:35:50,140 And so one way to ask that question 777 00:35:50,140 --> 00:35:53,950 is to go to a culture that's had minimal exposure 778 00:35:53,950 --> 00:35:57,303 to Western music, all of which really prefers consonance 779 00:35:57,303 --> 00:35:57,970 over dissonance. 780 00:35:57,970 --> 00:35:58,715 Yes, [? Carly? ?] 781 00:35:58,715 --> 00:36:00,173 AUDIENCE: Is consonants [INAUDIBLE] 782 00:36:00,173 --> 00:36:01,556 differentiated [INAUDIBLE]? 783 00:36:01,556 --> 00:36:03,070 NANCY KANWISHER: Oh, yeah, yeah. 784 00:36:03,070 --> 00:36:04,690 I'm sorry, totally different word-- 785 00:36:04,690 --> 00:36:09,450 consonance, C-E, has no relationship 786 00:36:09,450 --> 00:36:13,500 to consonants as distinguished from vowels. 787 00:36:13,500 --> 00:36:15,750 A consonant and a vowel, those are two different kinds 788 00:36:15,750 --> 00:36:16,692 of phonemes. 789 00:36:16,692 --> 00:36:18,900 Here, consonance is that difference between those two 790 00:36:18,900 --> 00:36:20,940 sounds I just played. 791 00:36:20,940 --> 00:36:24,810 And it has to do with the precise intervals 792 00:36:24,810 --> 00:36:29,700 of those harmonics in the harmonic stack. 
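A small sketch of that last point about harmonic intervals; the base frequency and the harmonic count are arbitrary choices for illustration. For a consonant interval like a perfect fifth (frequency ratio 3:2), many harmonics of the two notes land on exactly the same frequencies; for a dissonant interval like a minor second (ratio 16:15), essentially none do, and nearby partials beat against each other.

```python
import numpy as np

def shared_harmonics(ratio, n_harmonics=12, tol=1e-6):
    """Count pairs of harmonics of two tones that land on the same frequency."""
    f0 = 220.0                                         # arbitrary lower-note fundamental (Hz)
    low = f0 * np.arange(1, n_harmonics + 1)           # harmonic stack of the lower note
    high = f0 * ratio * np.arange(1, n_harmonics + 1)  # harmonic stack of the upper note
    return sum(1 for a in low for b in high if abs(a - b) < tol)

print(shared_harmonics(3 / 2))    # perfect fifth: several coinciding harmonics
print(shared_harmonics(16 / 15))  # minor second: none coincide within 12 harmonics
```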
793 00:36:29,700 --> 00:36:36,470 All right, so what McDermott and his co-workers did is to go 794 00:36:36,470 --> 00:36:41,900 to a Bolivian culture in the rainforest in a very remote 795 00:36:41,900 --> 00:36:45,680 location to test these people here, the Tsimane'. 796 00:36:45,680 --> 00:36:48,590 And the Tsimane' lack televisions 797 00:36:48,590 --> 00:36:53,360 and have very little access to recorded music and radio. 798 00:36:53,360 --> 00:36:56,510 Their village doesn't have electricity or tap water. 799 00:36:56,510 --> 00:37:01,010 You can't get there by road and you have to get there by canoe. 800 00:37:01,010 --> 00:37:03,410 So that's what McDermott and his team did. 801 00:37:03,410 --> 00:37:05,900 They went down there to visit the Tsimane'. 802 00:37:05,900 --> 00:37:09,050 And what they found, they played them 803 00:37:09,050 --> 00:37:12,290 consonant sounds and dissonant sounds, and with a translator, 804 00:37:12,290 --> 00:37:13,820 and spent a lot of time making sure 805 00:37:13,820 --> 00:37:16,580 that they really understood the difference between liking 806 00:37:16,580 --> 00:37:17,360 and not liking. 807 00:37:17,360 --> 00:37:18,877 And they tested their understanding 808 00:37:18,877 --> 00:37:20,960 of what it means to like something or not like it, 809 00:37:20,960 --> 00:37:22,340 and all kinds of other ways. 810 00:37:22,340 --> 00:37:25,010 And the upshot is, the Tsimane' do not 811 00:37:25,010 --> 00:37:29,600 have a preference for consonance over dissonance. 812 00:37:29,600 --> 00:37:32,150 So it's not a cultural universal. 813 00:37:32,150 --> 00:37:33,680 And that's consistent with the idea 814 00:37:33,680 --> 00:37:38,120 that it's not a preference in infants either. 815 00:37:38,120 --> 00:37:42,650 So this is something specific to Western music. 816 00:37:42,650 --> 00:37:47,030 So that's kind of introduction to some stuff about what music 817 00:37:47,030 --> 00:37:49,100 is and what its variability is and the fact 818 00:37:49,100 --> 00:37:51,680 that its presence is universal. 819 00:37:51,680 --> 00:37:55,370 And there are many very common properties across the world's 820 00:37:55,370 --> 00:37:59,710 musics, and it developed early. 821 00:37:59,710 --> 00:38:02,680 So let's ask, is music a separate capacity 822 00:38:02,680 --> 00:38:04,330 in the mind and brain. 823 00:38:04,330 --> 00:38:06,580 All right, so let's start with the classic way 824 00:38:06,580 --> 00:38:08,920 this has been asked for many decades, 825 00:38:08,920 --> 00:38:11,350 and that's to study patients with brain damage. 826 00:38:11,350 --> 00:38:13,510 And it turns out there is such a thing 827 00:38:13,510 --> 00:38:17,560 as amusia, the loss of music ability after brain damage. 828 00:38:21,130 --> 00:38:23,290 And so there are both sides of this. 829 00:38:23,290 --> 00:38:25,090 There are people who have impaired ability 830 00:38:25,090 --> 00:38:29,200 to recognize melodies without impaired speech perception. 831 00:38:29,200 --> 00:38:31,030 And there's the opposite-- people 832 00:38:31,030 --> 00:38:34,090 who have impaired speech recognition without impaired 833 00:38:34,090 --> 00:38:36,580 melody recognition. 834 00:38:36,580 --> 00:38:39,730 So that is, of course, a double dissociation, sort of, 835 00:38:39,730 --> 00:38:41,260 it's a little mucky in there. 836 00:38:41,260 --> 00:38:43,150 If you state the word simply like that, 837 00:38:43,150 --> 00:38:47,320 if you look in detail, there's some muck, as there often is. 
838 00:38:47,320 --> 00:38:49,930 So let's look in a little more detail at these two cases, 839 00:38:49,930 --> 00:38:54,580 the most interesting ones who seem to have problems 840 00:38:54,580 --> 00:38:59,230 with auditory tunes but not with words or other familiar sounds. 841 00:38:59,230 --> 00:39:02,080 So here is a horizontal slice. 842 00:39:02,080 --> 00:39:03,010 This is an old study. 843 00:39:03,010 --> 00:39:05,800 So it's a CAT scan showing you something's 844 00:39:05,800 --> 00:39:09,890 up with the anterior temporal lobes in this patient. 845 00:39:09,890 --> 00:39:14,110 And this was true of these two classic patients, CN and GL. 846 00:39:14,110 --> 00:39:17,530 Both of them were very bad at recognizing melodies, even 847 00:39:17,530 --> 00:39:21,160 highly familiar melodies, happy birthday and stuff like that, 848 00:39:21,160 --> 00:39:23,530 they don't recognize. 849 00:39:23,530 --> 00:39:26,330 They mostly have intact rhythm perception. 850 00:39:26,330 --> 00:39:28,960 And this is a core question we'll come back to. 851 00:39:28,960 --> 00:39:31,390 It's a complicated non-resolved situation. 852 00:39:31,390 --> 00:39:34,570 But these guys had intact rhythm perception 853 00:39:34,570 --> 00:39:37,810 and relatively intact language and speech perception. 854 00:39:40,360 --> 00:39:43,960 However, upon further testing, it 855 00:39:43,960 --> 00:39:46,690 becomes clear that these guys have a more general problem 856 00:39:46,690 --> 00:39:52,090 with pitch perception, even if it's not 857 00:39:52,090 --> 00:39:54,220 in the context of music. 858 00:39:54,220 --> 00:39:56,830 So this is a question that I asked all of you guys 859 00:39:56,830 --> 00:39:59,590 to think about, in the opposite direction, 860 00:39:59,590 --> 00:40:02,830 in your assignment for Sunday night. 861 00:40:02,830 --> 00:40:06,310 I asked you whether those electrodes 862 00:40:06,310 --> 00:40:08,770 in the brains of epilepsy patients 863 00:40:08,770 --> 00:40:13,420 that are sensitive to speech prosody, 864 00:40:13,420 --> 00:40:16,210 to the intonation contour in speech, 865 00:40:16,210 --> 00:40:18,370 whether you thought they would also 866 00:40:18,370 --> 00:40:21,310 be sensitive to the intonation contour in melodies. 867 00:40:21,310 --> 00:40:25,370 And most of you said, yes, it's pitch, pitch contour, must be. 868 00:40:25,370 --> 00:40:27,340 Well, it's a perfectly reasonable speculation, 869 00:40:27,340 --> 00:40:28,870 but not necessarily. 870 00:40:28,870 --> 00:40:31,420 Maybe we have special pitch contour processing 871 00:40:31,420 --> 00:40:35,380 for speech and different pitch contour processing for music. 872 00:40:35,380 --> 00:40:36,175 It's possible. 873 00:40:36,175 --> 00:40:37,300 It's an empirical question. 874 00:40:37,300 --> 00:40:40,120 Was there a question back there a second? 875 00:40:40,120 --> 00:40:46,720 OK, so maybe this is about pitch for both speech and music, not 876 00:40:46,720 --> 00:40:49,540 music per se. 877 00:40:49,540 --> 00:40:51,850 And so there are more detailed studies 878 00:40:51,850 --> 00:40:54,220 of patients with congenital amusia.
879 00:40:54,220 --> 00:40:57,610 And just like the case with acquired prosopagnosia 880 00:40:57,610 --> 00:41:00,010 versus congenital prosopagnosia, whether you get it 881 00:41:00,010 --> 00:41:02,650 from brain damage as an adult or whether you just always 882 00:41:02,650 --> 00:41:05,110 had it your whole life, and nobody knows exactly why 883 00:41:05,110 --> 00:41:07,480 and there's no evidence of any brain damage, 884 00:41:07,480 --> 00:41:10,960 the same thing happens with congenital amusia. 885 00:41:10,960 --> 00:41:14,860 So something like 4% of the population, 886 00:41:14,860 --> 00:41:16,528 they might say they're tone deaf. 887 00:41:16,528 --> 00:41:18,070 But just to tell you what that means, 888 00:41:18,070 --> 00:41:19,720 it can be really quite extreme. 889 00:41:19,720 --> 00:41:23,110 They can just completely fail to recognize familiar melodies 890 00:41:23,110 --> 00:41:26,080 that anyone else could recognize. 891 00:41:26,080 --> 00:41:29,800 They may be unable to detect really obvious wrong notes 892 00:41:29,800 --> 00:41:32,260 in a canonical melody. 893 00:41:32,260 --> 00:41:34,990 They're just really bad at all of this. 894 00:41:34,990 --> 00:41:38,950 And further, they don't have whopping obvious problems 895 00:41:38,950 --> 00:41:40,130 with speech perception. 896 00:41:40,130 --> 00:41:44,660 So at first, it was thought that speech perception was fine. 897 00:41:44,660 --> 00:41:47,830 But if you look closer, it looks like actually, 898 00:41:47,830 --> 00:41:52,540 even outside of music, there is a finer-grained deficit 899 00:41:52,540 --> 00:41:58,782 in pitch contour perception that shows up even in speech. 900 00:41:58,782 --> 00:42:01,240 So, as I mentioned before, we can ask this in the other direction. 901 00:42:01,240 --> 00:42:03,740 This is sort of the reverse case of the ones you considered. 902 00:42:03,740 --> 00:42:06,310 Now we have people who have this problem with pitch contour 903 00:42:06,310 --> 00:42:07,720 perception in music. 904 00:42:07,720 --> 00:42:10,300 Are they going to have a problem also with pitch contour 905 00:42:10,300 --> 00:42:12,280 perception in speech? 906 00:42:12,280 --> 00:42:14,950 So that's what this study looked at. 907 00:42:14,950 --> 00:42:16,630 So they played sounds like this. 908 00:42:16,630 --> 00:42:18,040 And you have to listen carefully. 909 00:42:18,040 --> 00:42:21,430 There will be sentences spoken. 910 00:42:21,430 --> 00:42:23,680 And you have to see if they're identical or different. 911 00:42:23,680 --> 00:42:24,513 So listen carefully. 912 00:42:24,513 --> 00:42:25,180 [VIDEO PLAYBACK] 913 00:42:25,180 --> 00:42:27,310 - She looks like Ann. 914 00:42:27,310 --> 00:42:29,088 She looks like Ann? 915 00:42:29,088 --> 00:42:29,671 [END PLAYBACK] 916 00:42:29,671 --> 00:42:32,171 NANCY KANWISHER: How many people thought that was different? 917 00:42:32,171 --> 00:42:33,130 Good, you got it. 918 00:42:33,130 --> 00:42:35,500 So one is the statement and one is-- 919 00:42:35,500 --> 00:42:37,060 it's sort of a question. 920 00:42:37,060 --> 00:42:38,840 It's in a sort of British accent. 921 00:42:38,840 --> 00:42:41,290 It's a little harder to detect, but different intonation 922 00:42:41,290 --> 00:42:42,130 contour. 923 00:42:42,130 --> 00:42:44,500 So that's the distinction that the Tang et al. 924 00:42:44,500 --> 00:42:47,530 paper was talking about.
925 00:42:47,530 --> 00:42:50,680 So we can then ask, that subtle distinction, 926 00:42:50,680 --> 00:42:53,980 are people with congenital amusia impaired at that. 927 00:42:53,980 --> 00:42:56,020 So if it's specific to music, they shouldn't be. 928 00:42:56,020 --> 00:43:00,250 But if it's any intonation contour, they should be. 929 00:43:00,250 --> 00:43:02,920 Yeah, I'll play the other ones. 930 00:43:02,920 --> 00:43:04,690 So they are in fact impaired. 931 00:43:04,690 --> 00:43:07,840 This is accuracy here, the controls are way up there, 932 00:43:07,840 --> 00:43:09,770 the amusics are down there. 933 00:43:09,770 --> 00:43:13,720 So they are impaired at this pitch contour perception thing, 934 00:43:13,720 --> 00:43:16,510 even in the context of music. 935 00:43:16,510 --> 00:43:20,050 I'm sorry, I said that wrong-- even in the context of speech. 936 00:43:20,050 --> 00:43:22,060 So it's not just about music. 937 00:43:22,060 --> 00:43:25,930 And in the controls, they have sounds like this, 938 00:43:25,930 --> 00:43:28,120 which are just tones. 939 00:43:32,600 --> 00:43:33,100 Got that? 940 00:43:33,100 --> 00:43:34,933 It's the same kind of thing, but not speech. 941 00:43:34,933 --> 00:43:38,080 And you see a similar deficit in the amusics 942 00:43:38,080 --> 00:43:39,700 compared to the controls. 943 00:43:39,700 --> 00:43:41,673 And then they have a nonsense speech version. 944 00:43:41,673 --> 00:43:42,340 [VIDEO PLAYBACK] 945 00:43:42,340 --> 00:43:45,382 - [INAUDIBLE] 946 00:43:45,382 --> 00:43:45,965 [END PLAYBACK] 947 00:43:45,965 --> 00:43:47,720 NANCY KANWISHER: Same deal-- 948 00:43:47,720 --> 00:43:52,350 the amusics are impaired compared to the controls. 949 00:43:52,350 --> 00:43:55,100 So that shows that the deficit for these guys 950 00:43:55,100 --> 00:43:57,620 is not specific to music per se but it 951 00:43:57,620 --> 00:44:02,600 seems to be a pitch contour problem in general that 952 00:44:02,600 --> 00:44:04,310 extends to speech. 953 00:44:04,310 --> 00:44:07,225 Yeah? 954 00:44:07,225 --> 00:44:08,380 AUDIENCE: Which of those-- 955 00:44:14,425 --> 00:44:17,100 NANCY KANWISHER: We'll get there, sort of. 956 00:44:17,100 --> 00:44:20,910 It would have been nice if the Tang et. al. paper had included 957 00:44:20,910 --> 00:44:22,462 some musical contour stuff. 958 00:44:22,462 --> 00:44:24,420 They didn't, but I'll show you some of our data 959 00:44:24,420 --> 00:44:28,130 shortly that gets close to this. 960 00:44:28,130 --> 00:44:33,410 OK, so all of that suggests that this amusia is really more 961 00:44:33,410 --> 00:44:35,660 about pitch than speech. 962 00:44:35,660 --> 00:44:37,370 I'm sorry, what's the matter with me. 963 00:44:37,370 --> 00:44:39,290 It's really more about pitch than music. 964 00:44:42,110 --> 00:44:45,650 But the reading that I assigned for today 965 00:44:45,650 --> 00:44:50,103 is a very new twist in this evolving story. 966 00:44:50,103 --> 00:44:51,770 So this used to be a nice, clean lecture 967 00:44:51,770 --> 00:44:53,088 with a simple conclusion. 968 00:44:53,088 --> 00:44:55,130 And now all of a sudden, I ran across that paper. 969 00:44:55,130 --> 00:44:59,740 It's like, wow, OK, that might not be quite the case. 970 00:44:59,740 --> 00:45:01,490 So what did you guys get from the reading? 971 00:45:01,490 --> 00:45:06,270 In what way does that slightly complicate the story here? 972 00:45:06,270 --> 00:45:07,020 Yeah, [INAUDIBLE]? 
973 00:45:07,020 --> 00:45:11,062 AUDIENCE: [INAUDIBLE] 974 00:45:12,978 --> 00:45:14,570 NANCY KANWISHER: Yeah, what they found 975 00:45:14,570 --> 00:45:18,740 is that amusics, not all of them, 976 00:45:18,740 --> 00:45:20,780 also have problems with rhythm. 977 00:45:20,780 --> 00:45:22,400 And that is inconsistent with the idea 978 00:45:22,400 --> 00:45:29,180 that amusia is just about pitch, whether in speech or music. 979 00:45:29,180 --> 00:45:34,890 And that says, OK, many amusics also have problems with rhythm. 980 00:45:34,890 --> 00:45:35,390 Yeah? 981 00:45:35,390 --> 00:45:39,078 AUDIENCE: [INAUDIBLE] 982 00:45:39,078 --> 00:45:41,190 NANCY KANWISHER: So there's a standard battery 983 00:45:41,190 --> 00:45:43,170 that people use that asks-- 984 00:45:43,170 --> 00:45:43,860 Dana, help me. 985 00:45:43,860 --> 00:45:45,610 What does the standard battery ask people? 986 00:45:45,610 --> 00:45:47,230 AUDIENCE: There's a lot of stuff, tests, 987 00:45:47,230 --> 00:45:49,942 things like listening to like a clip of a symphony 988 00:45:49,942 --> 00:45:54,722 and having to decide whether [INAUDIBLE] or they're too 989 00:45:54,722 --> 00:45:55,297 slow. 990 00:45:55,297 --> 00:45:57,130 NANCY KANWISHER: Kinds of things that people 991 00:45:57,130 --> 00:45:59,860 without musical training answer fine, 992 00:45:59,860 --> 00:46:01,180 although there's quite a range. 993 00:46:01,180 --> 00:46:03,100 I'm at the way bottom end of Dana's scale 994 00:46:03,100 --> 00:46:04,000 when she gives these. 995 00:46:07,199 --> 00:46:08,812 AUDIENCE: That rhythm falls apart, 996 00:46:08,812 --> 00:46:10,520 might not be able to tell the difference. 997 00:46:10,520 --> 00:46:14,248 NANCY KANWISHER: Just that this prior evidence, the stuff 998 00:46:14,248 --> 00:46:16,040 I showed and a whole bunch of other studies, 999 00:46:16,040 --> 00:46:19,310 seemed to suggest that amusia, both in acquired brain 1000 00:46:19,310 --> 00:46:22,310 damage and congenital amusia, is really, 1001 00:46:22,310 --> 00:46:25,220 when you drill down, more of a problem with pitch per se, 1002 00:46:25,220 --> 00:46:27,830 even pitch in speech. 1003 00:46:27,830 --> 00:46:31,820 And so then if it's about pitch, why would it also go along 1004 00:46:31,820 --> 00:46:33,237 with rhythm? 1005 00:46:33,237 --> 00:46:34,820 And so when it goes along with rhythm, 1006 00:46:34,820 --> 00:46:38,330 that starts to sound more like this is something about music. 1007 00:46:38,330 --> 00:46:39,960 It gums up the story. 1008 00:46:39,960 --> 00:46:40,770 Talia? 1009 00:46:40,770 --> 00:46:44,860 AUDIENCE: So I don't really know if this could be a confound, 1010 00:46:44,860 --> 00:46:48,010 but when it comes to natural speech 1011 00:46:48,010 --> 00:46:49,900 when you have some kind of intonation, 1012 00:46:49,900 --> 00:46:52,510 like pitch differences when you emphasize, 1013 00:46:52,510 --> 00:46:55,120 like especially in terms of a question, 1014 00:46:55,120 --> 00:46:58,378 aren't there also some kind of rhythmic differences as well? 1015 00:46:58,378 --> 00:46:59,295 NANCY KANWISHER: Yeah. 1016 00:46:59,295 --> 00:47:01,257 AUDIENCE: So how do you separate the two out? 1017 00:47:01,257 --> 00:47:03,340 NANCY KANWISHER: You just have to do a lot of work 1018 00:47:03,340 --> 00:47:05,140 to try to separate those out. 1019 00:47:05,140 --> 00:47:08,140 And so the paper I assigned to you guys did some of that work. 1020 00:47:08,140 --> 00:47:10,743 There's still room to quibble, but they did.
1021 00:47:10,743 --> 00:47:12,160 There was experiment two, and they 1022 00:47:12,160 --> 00:47:14,493 tried to deal with exactly that kind of thing of saying, 1023 00:47:14,493 --> 00:47:17,360 OK, let's try to make sure that-- 1024 00:47:17,360 --> 00:47:19,360 well, actually the controls that they were doing 1025 00:47:19,360 --> 00:47:20,277 are slightly different. 1026 00:47:20,277 --> 00:47:24,820 They were trying to make sure that the beat task didn't require pitch. 1027 00:47:24,820 --> 00:47:27,995 So it's very, very tricky to pull these things apart, 1028 00:47:27,995 --> 00:47:28,495 which is-- 1029 00:47:28,495 --> 00:47:31,010 AUDIENCE: Yes, so like the beat task doesn't make sense, 1030 00:47:31,010 --> 00:47:32,843 but I was just, like, in the verb first one, 1031 00:47:32,843 --> 00:47:36,940 even from the paper that was assigned Sunday. 1032 00:47:36,940 --> 00:47:39,940 I don't know, so you're saying that it's totally possible 1033 00:47:39,940 --> 00:47:43,750 to separate out rhythmic differences from when 1034 00:47:43,750 --> 00:47:45,045 you're just changing pitch. 1035 00:47:45,045 --> 00:47:47,160 NANCY KANWISHER: It's really, really difficult. 1036 00:47:47,160 --> 00:47:48,660 It's really difficult. Dana's trying 1037 00:47:48,660 --> 00:47:50,550 to do experiments to do this right now. 1038 00:47:50,550 --> 00:47:53,580 And she's invented some delightful and crazy stimuli 1039 00:47:53,580 --> 00:47:55,380 that try to have one and not the other. 1040 00:47:55,380 --> 00:47:57,240 It's very tricky. 1041 00:47:57,240 --> 00:48:00,180 You can have rhythm without pitch change. 1042 00:48:00,180 --> 00:48:02,640 That you can totally do. 1043 00:48:02,640 --> 00:48:06,150 It's really hard or impossible to have a melodic contour 1044 00:48:06,150 --> 00:48:08,790 without some beat or other. 1045 00:48:08,790 --> 00:48:11,200 We have some crazy stimuli that sort of do that, 1046 00:48:11,200 --> 00:48:13,050 but they're pretty crazy. 1047 00:48:13,050 --> 00:48:16,050 So anyway, these are very tricky things to pull apart. 1048 00:48:16,050 --> 00:48:18,360 And this is all right at the cutting edge. 1049 00:48:18,360 --> 00:48:20,550 These things have not been cleanly separated. 1050 00:48:20,550 --> 00:48:21,550 I'm running out of time. 1051 00:48:21,550 --> 00:48:24,130 So do you have a quick question? 1052 00:48:24,130 --> 00:48:26,800 OK, sorry about that. 1053 00:48:26,800 --> 00:48:28,840 So conclusions from the patient literature, 1054 00:48:28,840 --> 00:48:32,290 there's suggestive evidence for specialization for music, 1055 00:48:32,290 --> 00:48:35,170 but no really clear dissociations. 1056 00:48:35,170 --> 00:48:37,270 Music deficits are frequently but not 1057 00:48:37,270 --> 00:48:41,620 always associated with just more general pitch deficits. 1058 00:48:41,620 --> 00:48:43,540 And all of this is complicated because there's 1059 00:48:43,540 --> 00:48:47,450 lots of possible components of music, right. 1060 00:48:47,450 --> 00:48:49,630 When there's pitch deficits, is it 1061 00:48:49,630 --> 00:48:52,610 pitch or relative pitch, interval, key, melody, beat, 1062 00:48:52,610 --> 00:48:53,110 meter? 1063 00:48:53,110 --> 00:48:55,780 All of these things are different facets of music. 1064 00:48:55,780 --> 00:48:59,410 And so it's really not resolved exactly what's going on here. 1065 00:48:59,410 --> 00:49:01,750 It's kind of encouraging that there's a space in there, 1066 00:49:01,750 --> 00:49:03,520 but not resolved.
1067 00:49:03,520 --> 00:49:05,852 So let's go on to functional MRI. 1068 00:49:05,852 --> 00:49:07,310 And we're going to run out of time. 1069 00:49:07,310 --> 00:49:09,060 So let me just take a moment to figure out 1070 00:49:09,060 --> 00:49:11,560 how I'm going to do this. 1071 00:49:11,560 --> 00:49:13,960 What the hell am I going to do here? 1072 00:49:13,960 --> 00:49:16,070 Well, I hate to-- 1073 00:49:18,640 --> 00:49:21,880 OK, you guys are going to tell me at 12:05. 1074 00:49:21,880 --> 00:49:23,320 Yeah, OK. 1075 00:49:23,320 --> 00:49:24,880 Maybe we can get all through this. 1076 00:49:24,880 --> 00:49:28,600 So here's a really charming study from a few years ago 1077 00:49:28,600 --> 00:49:32,110 that tried to ask whether there are systematic brain 1078 00:49:32,110 --> 00:49:34,390 regions that are engaged in processing music. 1079 00:49:34,390 --> 00:49:38,317 And they used a really fun perceptual illusion 1080 00:49:38,317 --> 00:49:39,400 that you're going to hear. 1081 00:49:39,400 --> 00:49:41,890 I'm going to play a speech clip. 1082 00:49:41,890 --> 00:49:44,740 And it's part of it is going to be repeated many times. 1083 00:49:44,740 --> 00:49:48,786 And just listen to it and think about what it sounds like. 1084 00:49:48,786 --> 00:49:49,740 [VIDEO PLAYBACK] 1085 00:49:49,740 --> 00:49:53,250 - For it had never been his good luck to own and eat one. 1086 00:49:53,250 --> 00:49:55,410 There was a cold drizzle of rain. 1087 00:49:55,410 --> 00:49:58,400 The atmosphere was murky. 1088 00:49:58,400 --> 00:50:00,200 There was a cold drizzle. 1089 00:50:00,200 --> 00:50:02,090 There was a cold drizzle. 1090 00:50:02,090 --> 00:50:03,920 There was a cold drizzle. 1091 00:50:03,920 --> 00:50:05,720 There was a cold drizzle. 1092 00:50:05,720 --> 00:50:07,550 There was a cold drizzle. 1093 00:50:07,550 --> 00:50:09,139 There was a cold drizzle. 1094 00:50:09,139 --> 00:50:09,722 [END PLAYBACK] 1095 00:50:09,722 --> 00:50:11,826 NANCY KANWISHER: What happened? 1096 00:50:11,826 --> 00:50:13,305 AUDIENCE: [INAUDIBLE] 1097 00:50:13,305 --> 00:50:15,300 NANCY KANWISHER: Yeah? 1098 00:50:15,300 --> 00:50:16,398 What happened? 1099 00:50:16,398 --> 00:50:18,888 AUDIENCE: [INAUDIBLE] 1100 00:50:18,888 --> 00:50:21,040 NANCY KANWISHER: You start to hear a melody. 1101 00:50:21,040 --> 00:50:23,020 And you didn't hear the melody the first time he said it. 1102 00:50:23,020 --> 00:50:24,395 It was just normal speech, right. 1103 00:50:24,395 --> 00:50:26,830 Speech has this kind of intonation contour. 1104 00:50:26,830 --> 00:50:28,810 And he's speaking with an intonation contour. 1105 00:50:28,810 --> 00:50:30,560 But then somehow when you keep hearing it, 1106 00:50:30,560 --> 00:50:32,860 it turns into a melody. 1107 00:50:32,860 --> 00:50:35,200 So it turns out that doesn't work for all speech clips. 1108 00:50:35,200 --> 00:50:37,750 In fact, it's really hard to find speech clips for which it 1109 00:50:37,750 --> 00:50:38,250 works. 1110 00:50:38,250 --> 00:50:39,970 But there are some. 1111 00:50:39,970 --> 00:50:43,090 But everyone has that experience, or most people do. 1112 00:50:43,090 --> 00:50:45,310 And that gives us a really nice lever, 1113 00:50:45,310 --> 00:50:48,760 because we can take that same acoustic sound when you hear it 1114 00:50:48,760 --> 00:50:51,670 as speech and when you hear it as melody and we can ask, 1115 00:50:51,670 --> 00:50:54,100 are there brain regions that respond differentially. 
1116 00:50:54,100 --> 00:50:56,832 It's sort of analogous to upright versus inverted faces. 1117 00:50:56,832 --> 00:50:57,790 Well, it's even better. 1118 00:50:57,790 --> 00:50:59,590 It's the exact same sound clip that's 1119 00:50:59,590 --> 00:51:02,830 construed one way at first and another way afterwards. 1120 00:51:02,830 --> 00:51:06,070 Everybody get that? 1121 00:51:06,070 --> 00:51:07,510 So that's what these guys did. 1122 00:51:07,510 --> 00:51:09,590 They used a standard block design. 1123 00:51:09,590 --> 00:51:11,715 They just listened to those sounds 1124 00:51:11,715 --> 00:51:13,090 and they just looked in the brain 1125 00:51:13,090 --> 00:51:17,230 to see what bits respond more after the sound starts getting 1126 00:51:17,230 --> 00:51:19,060 perceived as music than before when 1127 00:51:19,060 --> 00:51:20,920 it was being heard as speech. 1128 00:51:20,920 --> 00:51:23,350 And they got a bunch of blobs in the brain. 1129 00:51:23,350 --> 00:51:25,330 It's a bit of a mess, but they got some stuff. 1130 00:51:27,880 --> 00:51:30,200 And so that's fun. 1131 00:51:30,200 --> 00:51:32,530 But it's also ambiguous. 1132 00:51:32,530 --> 00:51:35,620 We still don't know if this is about some kind 1133 00:51:35,620 --> 00:51:38,440 of pitch processing, which becomes more salient-- 1134 00:51:38,440 --> 00:51:41,680 you hear it as abstract pitch-- or whether it's really 1135 00:51:41,680 --> 00:51:43,750 about melodic contour or what. 1136 00:51:43,750 --> 00:51:45,970 So that's a cool study, but I think it doesn't really 1137 00:51:45,970 --> 00:51:48,140 nail what's going on. 1138 00:51:48,140 --> 00:51:53,890 So another angle at this is to ask whether music recruits 1139 00:51:53,890 --> 00:51:56,080 neural machinery for language. 1140 00:51:56,080 --> 00:51:59,920 So let me say why this has been such a pervasive question 1141 00:51:59,920 --> 00:52:00,470 in the field. 1142 00:52:00,470 --> 00:52:04,150 So there's a lot of people who have pointed out for 30 years, 1143 00:52:04,150 --> 00:52:07,540 or probably more, there are many deep commonalities 1144 00:52:07,540 --> 00:52:09,310 between language and music. 1145 00:52:09,310 --> 00:52:12,280 So they're both distinctively or uniquely human. 1146 00:52:12,280 --> 00:52:14,530 They're natively auditory. 1147 00:52:14,530 --> 00:52:16,930 That is, we can read language, but that's very recent. 1148 00:52:16,930 --> 00:52:20,770 Really, language is all about hearing, evolutionarily. 1149 00:52:20,770 --> 00:52:22,990 They unfold over time. 1150 00:52:22,990 --> 00:52:26,530 And they have complex hierarchical structure. 1151 00:52:26,530 --> 00:52:28,570 So you can parse a sentence in various ways 1152 00:52:28,570 --> 00:52:29,945 and there are all kinds of people 1153 00:52:29,945 --> 00:52:34,060 who've come up with ways to have hierarchical parsings of pieces 1154 00:52:34,060 --> 00:52:35,390 of music as well. 1155 00:52:35,390 --> 00:52:37,300 So there's a lot of deep connections 1156 00:52:37,300 --> 00:52:39,670 between language and music. 1157 00:52:39,670 --> 00:52:41,620 And so many people have hypothesized that they 1158 00:52:41,620 --> 00:52:44,170 use common brain machinery. 1159 00:52:44,170 --> 00:52:47,800 And there, in fact, many reports from neuroimaging 1160 00:52:47,800 --> 00:52:50,230 that argue that in fact they do use common machinery. 1161 00:52:50,230 --> 00:52:53,410 Like, we found overlapping activation in Broca's area 1162 00:52:53,410 --> 00:52:57,670 for people listening to music and speech. 
1163 00:52:57,670 --> 00:53:01,350 However, those studies are all group analyses. 1164 00:53:01,350 --> 00:53:03,370 I forget if I've gone on my tirade in here 1165 00:53:03,370 --> 00:53:04,330 about group analyses. 1166 00:53:04,330 --> 00:53:06,583 Have I done the group analysis tirade in here? 1167 00:53:06,583 --> 00:53:07,750 You'll get more of it later. 1168 00:53:07,750 --> 00:53:10,042 I'll do a brief version now, and you'll get more later. 1169 00:53:10,042 --> 00:53:11,770 Here's the problem-- group analysis 1170 00:53:11,770 --> 00:53:14,230 is you scan 12 subjects. 1171 00:53:14,230 --> 00:53:16,760 You align their brains as best you can. 1172 00:53:16,760 --> 00:53:19,900 And you do an analysis that goes across them. 1173 00:53:19,900 --> 00:53:23,110 And you find some blob, say, right here, 1174 00:53:23,110 --> 00:53:26,230 for listening to sentences versus listening 1175 00:53:26,230 --> 00:53:27,490 to non-word strings. 1176 00:53:27,490 --> 00:53:29,470 OK, that's a standard finding. 1177 00:53:29,470 --> 00:53:31,030 Then you do it again for listening 1178 00:53:31,030 --> 00:53:34,930 to melodies versus listening to scrambled melodies. 1179 00:53:34,930 --> 00:53:37,800 And you find the blob overlaps. 1180 00:53:37,800 --> 00:53:40,590 And then you say, hey, common neural machinery 1181 00:53:40,590 --> 00:53:46,560 for sentence understanding and for music perception. 1182 00:53:46,560 --> 00:53:48,880 Now that's an interesting question to ask. 1183 00:53:48,880 --> 00:53:50,500 It's close to the right way to do it, 1184 00:53:50,500 --> 00:53:52,140 but there's a fundamental problem. 1185 00:53:52,140 --> 00:53:55,800 And that is, you can find an overlap in a group analysis, 1186 00:53:55,800 --> 00:53:59,050 even if no single subject shows that overlap at all. 1187 00:53:59,050 --> 00:53:59,550 Why? 1188 00:53:59,550 --> 00:54:02,220 Because those regions vary in their exact location. 1189 00:54:02,220 --> 00:54:04,710 And if you mush across a whole bunch of individuals, 1190 00:54:04,710 --> 00:54:08,730 you're essentially blurring your activation pattern. 1191 00:54:08,730 --> 00:54:12,180 And so all of the prior studies, until a few years ago, 1192 00:54:12,180 --> 00:54:14,700 had been group analyses and they found overlap. 1193 00:54:14,700 --> 00:54:17,100 And who the hell knows if there was actually 1194 00:54:17,100 --> 00:54:20,130 overlapping activation within individual subjects, which 1195 00:54:20,130 --> 00:54:22,410 there would have to be if it's common machinery. 1196 00:54:22,410 --> 00:54:25,090 Or if they're just nearby and you muck them up 1197 00:54:25,090 --> 00:54:27,090 with a group analysis and they look like they're 1198 00:54:27,090 --> 00:54:28,422 on top of each other. 1199 00:54:28,422 --> 00:54:29,880 If you didn't quite get that, we'll 1200 00:54:29,880 --> 00:54:31,088 be coming back to that point. 1201 00:54:31,088 --> 00:54:33,060 For now, all you need to know is many people 1202 00:54:33,060 --> 00:54:34,770 ask this question and the methods 1203 00:54:34,770 --> 00:54:37,650 were close but problematic. 1204 00:54:37,650 --> 00:54:39,570 But luckily, Ev Fedorenko 1205 00:54:39,570 --> 00:54:42,310 did this experiment right a few years ago. 1206 00:54:42,310 --> 00:54:44,280 So here's Ev and here's what she did: 1207 00:54:44,280 --> 00:54:47,370 she functionally identified language regions 1208 00:54:47,370 --> 00:54:49,620 in each subject individually.
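[A toy simulation, not from the lecture or any of the studies mentioned, can make the group-analysis worry above concrete. Assume a 1-D "cortex" of 100 voxels and 12 simulated subjects in which the "language" and "music" regions are adjacent but never overlap within any individual; all widths, jitter ranges, and thresholds below are made up for illustration.]

import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels, width = 12, 100, 6

lang_maps = np.zeros((n_subjects, n_voxels), dtype=bool)
music_maps = np.zeros((n_subjects, n_voxels), dtype=bool)
for s in range(n_subjects):
    start = rng.integers(40, 50)  # per-subject anatomical jitter
    lang_maps[s, start:start + width] = True                # "language" region
    music_maps[s, start + width:start + 2 * width] = True   # adjacent "music" region

# Within every individual subject the two regions are disjoint.
assert not np.any(lang_maps & music_maps)

# But averaging across subjects blurs both maps, and thresholding the
# group maps yields voxels that look "active" for both contrasts.
group_lang = lang_maps.mean(axis=0)
group_music = music_maps.mean(axis=0)
overlap = np.sum((group_lang > 0.25) & (group_music > 0.25))
print(f"group-map overlap voxels: {overlap}, within-subject overlap: 0")

[The apparent overlap comes entirely from anatomical jitter plus averaging, which is why identifying the regions within each subject individually, as described next, is the cleaner test.]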
1209 00:54:49,620 --> 00:54:51,720 And we'll talk more about exactly how you do that. 1210 00:54:51,720 --> 00:54:54,030 You listen to sentences versus non-word strings. 1211 00:54:54,030 --> 00:54:57,210 You find a systematic set of brain regions 1212 00:54:57,210 --> 00:55:00,510 that you can identify in each individual that look like this. 1213 00:55:00,510 --> 00:55:01,740 Here is in three subjects. 1214 00:55:01,740 --> 00:55:03,600 Those red bits are the bits that respond 1215 00:55:03,600 --> 00:55:06,270 more when you listen to a sentence 1216 00:55:06,270 --> 00:55:08,010 versus listen to non-word strings 1217 00:55:08,010 --> 00:55:10,703 or read sentences versus non-word strings. 1218 00:55:10,703 --> 00:55:12,870 Then what she could do is she said, now that I found 1219 00:55:12,870 --> 00:55:14,760 those exact regions in each subject, 1220 00:55:14,760 --> 00:55:16,960 I can ask of those exact regions, 1221 00:55:16,960 --> 00:55:20,880 how do they respond to music versus scrambled music. 1222 00:55:20,880 --> 00:55:23,139 So she played stuff like this. 1223 00:55:23,139 --> 00:55:27,230 [MUSIC PLAYING] 1224 00:55:27,730 --> 00:55:30,690 OK, so nice canonical and nothing crazy, weird. 1225 00:55:30,690 --> 00:55:32,440 We're not going with the New Guinean music 1226 00:55:32,440 --> 00:55:33,520 and asking edgy questions. 1227 00:55:33,520 --> 00:55:35,353 We're just saying something everybody agrees 1228 00:55:35,353 --> 00:55:37,330 that's music, versus you scramble it 1229 00:55:37,330 --> 00:55:38,812 and it sounds like this. 1230 00:55:38,812 --> 00:55:42,260 [MUSIC PLAYING] 1231 00:55:42,260 --> 00:55:43,640 OK, it's actually the same notes. 1232 00:55:43,640 --> 00:55:44,150 I know, I know. 1233 00:55:44,150 --> 00:55:46,400 A lot of people that go, that's cool, that's really edgy. 1234 00:55:46,400 --> 00:55:46,940 Yeah, it is. 1235 00:55:46,940 --> 00:55:52,160 But to most people, it's not canonical music. 1236 00:55:52,160 --> 00:55:55,370 And so what Ev found is that none of those language regions 1237 00:55:55,370 --> 00:56:00,240 responded more to the intact than scrambled music. 1238 00:56:00,240 --> 00:56:02,280 So language regions are not interested in music. 1239 00:56:02,280 --> 00:56:05,820 We'll talk more about that next week or the week after. 1240 00:56:05,820 --> 00:56:07,330 Then she did the opposite. 1241 00:56:07,330 --> 00:56:10,967 She identified brain regions here in a group analysis 1242 00:56:10,967 --> 00:56:12,675 just to show you where they are, anterior 1243 00:56:12,675 --> 00:56:15,960 in the temporal lobes, that respond more to intact 1244 00:56:15,960 --> 00:56:18,150 than scrambled music. 1245 00:56:18,150 --> 00:56:20,230 She identified those in each subject 1246 00:56:20,230 --> 00:56:22,230 and measured the response of those regions 1247 00:56:22,230 --> 00:56:25,890 to language, sentences and non-word strings. 1248 00:56:25,890 --> 00:56:29,550 And each of those regions respond exactly the same 1249 00:56:29,550 --> 00:56:31,620 to sentences and non-word strings. 1250 00:56:31,620 --> 00:56:34,800 So basically, the language regions 1251 00:56:34,800 --> 00:56:37,380 are not interested in music, and the music regions 1252 00:56:37,380 --> 00:56:38,970 are not interested in language. 1253 00:56:38,970 --> 00:56:41,395 And therein, we have a-- 1254 00:56:41,395 --> 00:56:42,270 AUDIENCE: [INAUDIBLE] 1255 00:56:42,270 --> 00:56:46,490 NANCY KANWISHER: Thank you, exactly. 
1256 00:56:46,490 --> 00:56:50,932 So music is not using machinery for language. 1257 00:56:50,932 --> 00:56:52,890 That was one of the hypotheses we started with. 1258 00:56:52,890 --> 00:56:53,540 And it was not. 1259 00:56:56,320 --> 00:57:00,070 So that's true, at least for high-level language processing, 1260 00:57:00,070 --> 00:57:02,770 that computes the meaning of a sentence. 1261 00:57:02,770 --> 00:57:04,447 But what about speech perception? 1262 00:57:04,447 --> 00:57:07,030 Remember, last time I made the distinction between the sounds, 1263 00:57:07,030 --> 00:57:10,150 like ba and pa, which have a whole set 1264 00:57:10,150 --> 00:57:13,648 of computational challenges, just perceiving those sounds, 1265 00:57:13,648 --> 00:57:15,190 which is quite different than knowing 1266 00:57:15,190 --> 00:57:17,080 the meaning of a sentence. 1267 00:57:17,080 --> 00:57:19,520 So what about speech perception or, in fact, 1268 00:57:19,520 --> 00:57:22,420 any other aspect of hearing? 1269 00:57:22,420 --> 00:57:25,420 So what I'm going to try to do is briefly tell you about one 1270 00:57:25,420 --> 00:57:26,260 of our experiments. 1271 00:57:26,260 --> 00:57:28,420 I'm sorry, I try not to turn this whole course into stuff 1272 00:57:28,420 --> 00:57:30,790 we've done in my lab, but it's one of my favorite ever. 1273 00:57:30,790 --> 00:57:34,720 And it's a cool, different way to go at this question 1274 00:57:34,720 --> 00:57:38,790 from the other MRI experiments we've talked about before. 1275 00:57:38,790 --> 00:57:41,060 So the background is, OK, let's step back. 1276 00:57:41,060 --> 00:57:44,480 What's the overall organization of auditory cortex? 1277 00:57:44,480 --> 00:57:46,970 And when we did this experiment five or six years ago, 1278 00:57:46,970 --> 00:57:48,590 not a whole lot was known. 1279 00:57:48,590 --> 00:57:50,600 Basically, everybody agrees. 1280 00:57:50,600 --> 00:57:52,460 Whoops, I put the wrong slide in here. 1281 00:57:52,460 --> 00:57:56,390 Everybody agrees that primary auditory cortex is right there 1282 00:57:56,390 --> 00:57:58,130 with that high-low-high frequency thing 1283 00:57:58,130 --> 00:57:59,450 we talked about from there. 1284 00:57:59,450 --> 00:58:02,690 But from there on out, in the last couple of years, 1285 00:58:02,690 --> 00:58:05,540 there's an agreement about speech selective cortex 1286 00:58:05,540 --> 00:58:07,190 that I showed you briefly last time 1287 00:58:07,190 --> 00:58:09,410 and other people have seen that. 1288 00:58:09,410 --> 00:58:13,130 But there's lots of hypotheses and no agreement with anything 1289 00:58:13,130 --> 00:58:19,430 else and no real evidence for really music-selective cortex. 1290 00:58:19,430 --> 00:58:21,350 But there's a problem with all the prior work 1291 00:58:21,350 --> 00:58:23,270 where you sit around and make a hypothesis 1292 00:58:23,270 --> 00:58:25,790 and say, oh, let's see, are we going to get a higher 1293 00:58:25,790 --> 00:58:29,990 response to, say, intact versus scrambled music, or faces 1294 00:58:29,990 --> 00:58:33,410 versus objects, or whatever. 1295 00:58:33,410 --> 00:58:36,300 All of those are scientists making up hypotheses, 1296 00:58:36,300 --> 00:58:37,790 and then testing them. 1297 00:58:37,790 --> 00:58:39,290 And there's nothing wrong with that. 1298 00:58:39,290 --> 00:58:40,550 That's what scientists are supposed 1299 00:58:40,550 --> 00:58:42,860 to do-- invent hypotheses, and then make good designs 1300 00:58:42,860 --> 00:58:44,190 and go test them. 
1301 00:58:44,190 --> 00:58:47,750 But the problem with that is, we can only discover things 1302 00:58:47,750 --> 00:58:49,950 that we can think to test. 1303 00:58:49,950 --> 00:58:52,040 What if deep facts about mind and brain 1304 00:58:52,040 --> 00:58:55,260 are things that nobody would think up in the first place? 1305 00:58:55,260 --> 00:58:58,100 And so that's where we can get real power from what 1306 00:58:58,100 --> 00:59:01,100 are known as data-driven studies, where you collect 1307 00:59:01,100 --> 00:59:03,860 a boatload of data and then use some fancy math 1308 00:59:03,860 --> 00:59:06,680 and say, tell me what the structure is in this data. 1309 00:59:06,680 --> 00:59:10,550 Not, is this hypothesis that I love true in these data. 1310 00:59:10,550 --> 00:59:12,582 And I'll do anything to pull it out if I can. 1311 00:59:12,582 --> 00:59:14,540 See it in there, find evidence for it in there. 1312 00:59:14,540 --> 00:59:15,860 But yeah, exactly. 1313 00:59:15,860 --> 00:59:19,340 But if we collect a whole bunch of data and do some math 1314 00:59:19,340 --> 00:59:23,070 and see what the structure is, what do we see? 1315 00:59:23,070 --> 00:59:25,440 So that's what we did in this study. 1316 00:59:25,440 --> 00:59:28,470 I'm going to speed up to try to give you the gist here. 1317 00:59:28,470 --> 00:59:31,185 So "we" is Sam Norman-Haignere here and Josh McDermott. 1318 00:59:31,185 --> 00:59:32,727 [SOUND RECORDING EXPERIMENT PLAYING] 1319 00:59:32,727 --> 00:59:34,200 And so we scanned people while they 1320 00:59:34,200 --> 00:59:36,070 were hearing stuff like this. 1321 00:59:36,070 --> 00:59:40,710 We first collected the 165 categories of sounds 1322 00:59:40,710 --> 00:59:42,480 that people hear most commonly. 1323 00:59:42,480 --> 00:59:45,090 This is classic cocktail party effect you guys are doing. 1324 00:59:45,090 --> 00:59:47,970 You have to separate me speaking from all this crazy, weird, 1325 00:59:47,970 --> 00:59:50,400 changing background. 1326 00:59:50,400 --> 00:59:52,530 And so anyway, we scan people listening 1327 00:59:52,530 --> 00:59:57,360 to these sounds, which broadly sample auditory experience. 1328 00:59:57,360 --> 00:59:59,645 And so we collected sounds people hear most often 1329 00:59:59,645 --> 01:00:01,770 and that they can recognize from a two-second clip. 1330 01:00:01,770 --> 01:00:02,730 OK, enough already. 1331 01:00:02,730 --> 01:00:03,522 [CELLPHONE RINGING] 1332 01:00:03,522 --> 01:00:06,930 Oh, yeah, just to wake everyone up. 1333 01:00:06,930 --> 01:00:10,650 So we scan them listening to those 165 sounds, broad sample 1334 01:00:10,650 --> 01:00:12,270 of auditory experience. 1335 01:00:12,270 --> 01:00:15,150 Then, from each voxel in the brain, 1336 01:00:15,150 --> 01:00:18,750 we measure the exact magnitude of response of that voxel 1337 01:00:18,750 --> 01:00:22,200 to each of the 165 sounds and we get a vector like this. 1338 01:00:22,200 --> 01:00:23,190 Everybody with me? 1339 01:00:23,190 --> 01:00:27,600 That's one voxel right there, another voxel, another voxel. 1340 01:00:27,600 --> 01:00:30,060 We do this in all of kind of greater, 1341 01:00:30,060 --> 01:00:31,710 suburban, auditory cortex. 1342 01:00:31,710 --> 01:00:34,920 That is not just primary cortex, but all this stuff around it 1343 01:00:34,920 --> 01:00:37,980 that might even remotely, that responds in any systematic way 1344 01:00:37,980 --> 01:00:38,940 to auditory stimuli. 1345 01:00:38,940 --> 01:00:41,430 They grabbed the whole damn thing. 
1346 01:00:41,430 --> 01:00:43,830 So you do that in 10 subjects. 1347 01:00:43,830 --> 01:00:46,140 You have a big matrix like this-- 1348 01:00:46,140 --> 01:00:49,020 1,000 voxels in each subject, 11,000 voxels 1349 01:00:49,020 --> 01:00:51,930 across the top, 165 sounds. 1350 01:00:51,930 --> 01:00:54,820 That's our data. 1351 01:00:54,820 --> 01:01:01,060 So each column is the response of one voxel 1352 01:01:01,060 --> 01:01:04,420 in one person's brain to each of the 165 sounds. 1353 01:01:04,420 --> 01:01:06,420 Everybody got it? 1354 01:01:06,420 --> 01:01:08,520 Now, we have this lovely matrix, which 1355 01:01:08,520 --> 01:01:10,155 is basically all the data we care about 1356 01:01:10,155 --> 01:01:11,280 from this whole experiment. 1357 01:01:11,280 --> 01:01:14,040 Then, we throw away all the labels. 1358 01:01:14,040 --> 01:01:14,670 Poof. 1359 01:01:14,670 --> 01:01:16,260 It's just a matrix. 1360 01:01:16,260 --> 01:01:19,890 And then we do some math, which essentially says, 1361 01:01:19,890 --> 01:01:21,990 let's boil down the structure in this matrix 1362 01:01:21,990 --> 01:01:24,840 and discover its fundamental components. 1363 01:01:24,840 --> 01:01:26,498 That math happens to be a variant 1364 01:01:26,498 --> 01:01:27,915 of independent component analysis, 1365 01:01:27,915 --> 01:01:29,370 if that means anything to you. 1366 01:01:29,370 --> 01:01:30,870 If it doesn't, don't worry about it. 1367 01:01:30,870 --> 01:01:33,000 The gist is, we're doing math to say 1368 01:01:33,000 --> 01:01:34,530 what's the structure in here. 1369 01:01:34,530 --> 01:01:36,720 And we're doing it without any labels. 1370 01:01:36,720 --> 01:01:40,080 So this analysis doesn't even know where the voxels are 1371 01:01:40,080 --> 01:01:42,750 or which of your 10 subjects that voxel came from. 1372 01:01:42,750 --> 01:01:44,740 It doesn't know which sound is which. 1373 01:01:44,740 --> 01:01:46,860 And so it's very hypothesis neutral. 1374 01:01:46,860 --> 01:01:50,640 It's a way to say, show me structure with almost no kind 1375 01:01:50,640 --> 01:01:52,140 of prior biases. 1376 01:01:52,140 --> 01:01:54,000 Just show me the structure. 1377 01:01:54,000 --> 01:01:57,390 So everybody get how that's kind of a totally different thing 1378 01:01:57,390 --> 01:02:00,450 to do from everything we've talked about so far? 1379 01:02:00,450 --> 01:02:01,470 So that's what we did. 1380 01:02:01,470 --> 01:02:03,630 I'm going to skip the math and the modeling assumption. 1381 01:02:03,630 --> 01:02:05,005 It's not really that complicated, 1382 01:02:05,005 --> 01:02:07,990 but I think I'm going to run out of time, 1383 01:02:07,990 --> 01:02:09,850 so very hypothesis neutral. 1384 01:02:09,850 --> 01:02:13,110 And what we find is six components 1385 01:02:13,110 --> 01:02:15,450 account for most of the replicable variance 1386 01:02:15,450 --> 01:02:16,663 in that whole matrix. 1387 01:02:16,663 --> 01:02:18,580 I'll tell you what a component is in a second. 1388 01:02:18,580 --> 01:02:19,580 Did you have a question? 1389 01:02:19,580 --> 01:02:23,460 AUDIENCE: Is it just like with ICA, but [INAUDIBLE] PCA 1390 01:02:23,460 --> 01:02:25,060 [INAUDIBLE]? 1391 01:02:25,060 --> 01:02:28,060 NANCY KANWISHER: With PCA, you assume orthogonal axes. 1392 01:02:28,060 --> 01:02:30,790 With ICA, you don't assume orthogonal axes. 1393 01:02:30,790 --> 01:02:33,280 And so it's very, very similar to PCA. 
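[For anyone who wants to see the shape of this kind of decomposition, here is a minimal Python sketch. It is not the actual method from the study, which used its own ICA-like algorithm with additional assumptions; scikit-learn's FastICA is used here only as a stand-in, and the matrix is random placeholder data with the shapes described in the lecture.]

import numpy as np
from sklearn.decomposition import FastICA

# Illustrative shapes: ~11,000 voxels pooled across subjects, 165 sounds.
n_voxels, n_sounds, n_components = 11000, 165, 6

# response_matrix[v, s] = response magnitude of voxel v to sound s
# (placeholder random data standing in for the measured fMRI responses).
response_matrix = np.random.rand(n_voxels, n_sounds)

# Factorize: each voxel's response profile is modeled as a weighted sum
# of a small number of shared component response profiles.
ica = FastICA(n_components=n_components, random_state=0)
voxel_weights = ica.fit_transform(response_matrix)   # shape (n_voxels, n_components)
component_profiles = ica.mixing_.T                   # shape (n_components, n_sounds)

[Each row of component_profiles plays the role of a component's response to the 165 sounds, and each column of voxel_weights is the map you would project back onto the brain; no labels about sounds, voxels, or subjects enter the factorization, which is the "hypothesis neutral" point being made here.]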
1394 01:02:33,280 --> 01:02:34,963 And it starts out as PCA and then 1395 01:02:34,963 --> 01:02:36,130 it does some more rigmarole. 1396 01:02:36,130 --> 01:02:38,260 Yeah, it's the same idea. 1397 01:02:38,260 --> 01:02:40,790 Like basically, tell me the main dimensions of variation. 1398 01:02:40,790 --> 01:02:41,290 Yeah? 1399 01:02:41,290 --> 01:02:43,897 AUDIENCE: And are these matrices sparse and [INAUDIBLE]?? 1400 01:02:43,897 --> 01:02:45,480 NANCY KANWISHER: Yes, they are sparse. 1401 01:02:45,480 --> 01:02:48,090 And that is one of the assumptions you use. 1402 01:02:48,090 --> 01:02:51,240 There isn't only one way to factorize a matrix. 1403 01:02:51,240 --> 01:02:52,630 It's an ill-posed problem. 1404 01:02:52,630 --> 01:02:54,520 So you need to make some assumptions. 1405 01:02:54,520 --> 01:02:58,290 And that's one of the ones we made, but you can test them. 1406 01:02:58,290 --> 01:03:00,630 So what we find is six components 1407 01:03:00,630 --> 01:03:03,150 account for most of the data. 1408 01:03:03,150 --> 01:03:07,770 And four of those reflected acoustic properties 1409 01:03:07,770 --> 01:03:10,080 of the stimuli. 1410 01:03:10,080 --> 01:03:14,740 One was high for all the sounds with lots of low frequencies. 1411 01:03:14,740 --> 01:03:17,880 Another was high for all the sounds with high frequencies. 1412 01:03:17,880 --> 01:03:18,570 What is that? 1413 01:03:23,863 --> 01:03:24,530 Sorry, speak up? 1414 01:03:24,530 --> 01:03:26,310 AUDIENCE: [INAUDIBLE] 1415 01:03:26,310 --> 01:03:28,310 NANCY KANWISHER: They're sensitive to frequency, 1416 01:03:28,310 --> 01:03:29,900 but where is that in the brain that you've already 1417 01:03:29,900 --> 01:03:30,470 heard about? 1418 01:03:30,470 --> 01:03:31,370 AUDIENCE: Primary-- 1419 01:03:31,370 --> 01:03:33,230 NANCY KANWISHER: Primary auditory cortex 1420 01:03:33,230 --> 01:03:34,520 as a tonotopic map. 1421 01:03:34,520 --> 01:03:35,840 So this is awesome. 1422 01:03:35,840 --> 01:03:37,670 Because if you go invent some crazy math 1423 01:03:37,670 --> 01:03:40,160 and you apply it to your data and you discover something 1424 01:03:40,160 --> 01:03:42,860 you know to be true, that's very reassuring. 1425 01:03:42,860 --> 01:03:44,930 The math isn't just inventing crazy stuff. 1426 01:03:44,930 --> 01:03:47,810 It's discovering stuff we already know to be true. 1427 01:03:47,810 --> 01:03:50,000 That's known in more biological parts of the field 1428 01:03:50,000 --> 01:03:51,530 as a positive control. 1429 01:03:51,530 --> 01:03:53,630 Invent a new method, make sure it can discover 1430 01:03:53,630 --> 01:03:55,470 the stuff you know to be true. 1431 01:03:55,470 --> 01:03:59,360 So check, check, OK. 1432 01:03:59,360 --> 01:04:01,618 But then it discovered some other stuff. 1433 01:04:01,618 --> 01:04:03,660 And I'm just going to tell you about two of them. 1434 01:04:03,660 --> 01:04:04,463 So here's one. 1435 01:04:04,463 --> 01:04:06,380 So I was just loose about what a component is. 1436 01:04:06,380 --> 01:04:10,820 A component is a magnitude of response for each of the 165 1437 01:04:10,820 --> 01:04:14,660 sounds and a separate distribution 1438 01:04:14,660 --> 01:04:17,130 in the brain, which I'll show you in a moment. 1439 01:04:17,130 --> 01:04:19,020 So here's one of those components. 1440 01:04:19,020 --> 01:04:23,150 And we've taken the 165 sounds and added basic category 1441 01:04:23,150 --> 01:04:23,870 labels on them. 
1442 01:04:23,870 --> 01:04:26,030 We put them on Mechanical Turk and people told us 1443 01:04:26,030 --> 01:04:28,050 which category they belong to. 1444 01:04:28,050 --> 01:04:31,370 So that enables us to look at this mysterious thing 1445 01:04:31,370 --> 01:04:34,430 and average within a category. 1446 01:04:34,430 --> 01:04:37,040 So this is its component. 1447 01:04:37,040 --> 01:04:39,940 And if you look at it, you see that it's 1448 01:04:39,940 --> 01:04:44,290 really high for English speech and foreign speech 1449 01:04:44,290 --> 01:04:46,660 that our subjects don't understand. 1450 01:04:46,660 --> 01:04:49,450 And then, oh, what's that intermediate thing? 1451 01:04:49,450 --> 01:04:51,430 Oh, that's music with vocals. 1452 01:04:51,430 --> 01:04:54,910 It has a kind of speech. 1453 01:04:54,910 --> 01:04:56,320 And way down here-- 1454 01:04:56,320 --> 01:04:59,350 that's non-speech vocalizations, stuff like laughing 1455 01:04:59,350 --> 01:05:00,790 and crying and sighing. 1456 01:05:00,790 --> 01:05:03,670 So there's a voice but no speech content. 1457 01:05:03,670 --> 01:05:07,040 So that's a speech component. 1458 01:05:07,040 --> 01:05:09,080 And as I mentioned, this had been seen 1459 01:05:09,080 --> 01:05:10,690 before in the last few years. 1460 01:05:10,690 --> 01:05:12,140 So it wasn't completely new. 1461 01:05:12,140 --> 01:05:15,830 But what's cool about this is it just emerged spontaneously 1462 01:05:15,830 --> 01:05:17,630 from this very broad screen. 1463 01:05:17,630 --> 01:05:19,070 We didn't go and say, hey, can we 1464 01:05:19,070 --> 01:05:20,870 find a speech selective region of cortex, 1465 01:05:20,870 --> 01:05:21,860 if we try really hard. 1466 01:05:21,860 --> 01:05:24,150 Oh, yeah, we validate our hypothesis. 1467 01:05:24,150 --> 01:05:26,810 This is like, let's sample auditory experience-- and wow, 1468 01:05:26,810 --> 01:05:27,330 there it is. 1469 01:05:27,330 --> 01:05:27,830 Yeah? 1470 01:05:27,830 --> 01:05:30,650 AUDIENCE: I mean, you assigned [INAUDIBLE]. 1471 01:05:30,650 --> 01:05:32,150 NANCY KANWISHER: We put them on Turk 1472 01:05:32,150 --> 01:05:34,170 and had people say what category they fit into. 1473 01:05:34,170 --> 01:05:34,670 Yeah? 1474 01:05:34,670 --> 01:05:36,077 AUDIENCE: [INAUDIBLE]. 1475 01:05:39,830 --> 01:05:43,644 Categorizing by speech is a very good way [INAUDIBLE] 1476 01:05:43,644 --> 01:05:45,080 better way than [INAUDIBLE]. 1477 01:05:45,080 --> 01:05:46,940 NANCY KANWISHER: Absolutely, absolutely. 1478 01:05:46,940 --> 01:05:47,990 This is a first pass. 1479 01:05:47,990 --> 01:05:50,210 And one hopes to go deeper and deeper. 1480 01:05:50,210 --> 01:05:56,900 If we could separate different aspects of speech, consonants 1481 01:05:56,900 --> 01:05:58,520 and vowels, fricatives, whatever, 1482 01:05:58,520 --> 01:06:00,020 there could be much more to be done. 1483 01:06:00,020 --> 01:06:02,000 Yeah, I got to-- 1484 01:06:02,000 --> 01:06:04,170 oh, boy, OK. 1485 01:06:04,170 --> 01:06:07,120 And when do I have to give them the quiz? 1486 01:06:07,120 --> 01:06:08,080 It's shortish. 1487 01:06:08,080 --> 01:06:10,190 They don't need a full 10 minutes. 1488 01:06:10,190 --> 01:06:10,690 What is it? 1489 01:06:10,690 --> 01:06:11,710 Seven questions? 1490 01:06:11,710 --> 01:06:12,570 AUDIENCE: Eight. 1491 01:06:12,570 --> 01:06:14,195 NANCY KANWISHER: Eight-- eight minutes?
1492 01:06:14,195 --> 01:06:15,310 AUDIENCE: [INAUDIBLE] 1493 01:06:15,310 --> 01:06:19,450 NANCY KANWISHER: OK, make me stop definitively at 12:18. 1494 01:06:19,450 --> 01:06:23,690 OK, so that's cool. 1495 01:06:23,690 --> 01:06:25,780 It's not exactly new, but it's a really nice way 1496 01:06:25,780 --> 01:06:30,310 to rediscover things that we thought to be true. 1497 01:06:30,310 --> 01:06:35,007 All right, then there's component 6 that popped out. 1498 01:06:35,007 --> 01:06:35,840 What is component 6? 1499 01:06:35,840 --> 01:06:38,920 Well, if we average within a category, 1500 01:06:38,920 --> 01:06:41,710 it's really high for instrumental music and music with vocals, 1501 01:06:41,710 --> 01:06:44,470 and everything else is really low. 1502 01:06:44,470 --> 01:06:46,720 We didn't go looking for this. 1503 01:06:46,720 --> 01:06:51,070 Boom-- music selectivity. 1504 01:06:51,070 --> 01:06:52,060 That's pretty amazing. 1505 01:06:52,060 --> 01:06:54,712 Never really been seen before. 1506 01:06:54,712 --> 01:06:56,170 People have looked and they've made 1507 01:06:56,170 --> 01:06:58,780 some kind of sort of smoke and mirrors, like, not really. 1508 01:06:58,780 --> 01:07:00,197 This is the first time it was seen 1509 01:07:00,197 --> 01:07:01,655 and it just popped out of the data. 1510 01:07:01,655 --> 01:07:03,405 And that says that it's not just something 1511 01:07:03,405 --> 01:07:06,160 you can find if you try really hard and go fishing for it. 1512 01:07:06,160 --> 01:07:08,680 It's actually a significant part of the variance 1513 01:07:08,680 --> 01:07:10,120 in this whole response. 1514 01:07:10,120 --> 01:07:12,640 I'm going to skip everything except clarification questions 1515 01:07:12,640 --> 01:07:14,020 now, because I'm-- 1516 01:07:14,020 --> 01:07:14,690 go ahead. 1517 01:07:14,690 --> 01:07:17,092 AUDIENCE: Did these voxels correspond 1518 01:07:17,092 --> 01:07:18,990 to the music [INAUDIBLE]? 1519 01:07:18,990 --> 01:07:21,000 NANCY KANWISHER: Sort of, it's complicated. 1520 01:07:21,000 --> 01:07:23,070 Sorry, it's a long answer. 1521 01:07:23,070 --> 01:07:25,290 So this really looks like it's music. 1522 01:07:25,290 --> 01:07:28,650 And so now, I was vague about what a component is, 1523 01:07:28,650 --> 01:07:30,300 but it's both that response profile 1524 01:07:30,300 --> 01:07:32,140 and it's a set of weights in the brain. 1525 01:07:32,140 --> 01:07:34,020 So if you project this one back in the brain, 1526 01:07:34,020 --> 01:07:36,510 you get this band of speech selective cortex 1527 01:07:36,510 --> 01:07:40,320 right below primary auditory cortex, like that. 1528 01:07:40,320 --> 01:07:43,170 And if you project the music stuff back in the brain, 1529 01:07:43,170 --> 01:07:44,138 you get a patch. 1530 01:07:44,138 --> 01:07:45,930 This is sort of an answer to your question. 1531 01:07:45,930 --> 01:07:48,900 You get a patch up in front of primary auditory 1532 01:07:48,900 --> 01:07:50,250 cortex and a patch behind. 1533 01:07:55,020 --> 01:07:57,480 So here we have a double dissociation 1534 01:07:57,480 --> 01:08:01,350 of speech selectivity and music selectivity in the brain, OK? 1535 01:08:01,350 --> 01:08:04,590 So music doesn't just use mechanisms for speech 1536 01:08:04,590 --> 01:08:05,800 as many people have proposed. 1537 01:08:05,800 --> 01:08:08,400 It's not true, right. 1538 01:08:08,400 --> 01:08:11,130 So when you see dramatic data like this, 1539 01:08:11,130 --> 01:08:15,420 a natural reaction is to say, like, really, get out, come on.
1540 01:08:15,420 --> 01:08:18,510 Like, music specificity, like what? 1541 01:08:18,510 --> 01:08:21,149 So very briefly, Dana has just replicated this 1542 01:08:21,149 --> 01:08:22,859 in a new sample of subjects. 1543 01:08:22,859 --> 01:08:27,060 It does not matter if those subjects have musical training, 1544 01:08:27,060 --> 01:08:28,740 like students from Berklee School who 1545 01:08:28,740 --> 01:08:30,930 spend like six hours a day practicing, 1546 01:08:30,930 --> 01:08:33,060 versus people who have essentially 1547 01:08:33,060 --> 01:08:35,880 zero music lessons ever in their life, 1548 01:08:35,880 --> 01:08:39,458 you get those components in both groups, maybe slightly stronger 1549 01:08:39,458 --> 01:08:40,500 in the trained musicians. 1550 01:08:40,500 --> 01:08:41,550 We're not quite sure yet. 1551 01:08:41,550 --> 01:08:43,620 But in any case, it is totally present in people 1552 01:08:43,620 --> 01:08:45,285 with zero musical training. 1553 01:08:45,285 --> 01:08:47,160 That doesn't mean it's innate, because people 1554 01:08:47,160 --> 01:08:49,859 without musical training have musical experience 1555 01:08:49,859 --> 01:08:52,779 but no explicit training. 1556 01:08:52,779 --> 01:08:54,090 Skip all of this. 1557 01:08:54,090 --> 01:08:55,410 Here is her replication. 1558 01:08:55,410 --> 01:08:56,057 Boom, boom. 1559 01:08:56,057 --> 01:08:57,599 It's there with and without training. 1560 01:08:59,865 --> 01:09:00,990 I'm going to skip all this. 1561 01:09:00,990 --> 01:09:02,823 You can read it on the slides, if I lost you 1562 01:09:02,823 --> 01:09:05,160 in here, because I want to show you something else. 1563 01:09:05,160 --> 01:09:08,850 That music selectivity was not evident 1564 01:09:08,850 --> 01:09:11,420 if you just do a direct contrast in the same data. 1565 01:09:11,420 --> 01:09:13,920 Take all the music conditions, all the non-music conditions, 1566 01:09:13,920 --> 01:09:15,149 you get a blurry mess. 1567 01:09:15,149 --> 01:09:16,200 It's not strong. 1568 01:09:16,200 --> 01:09:19,290 You have to do the math to siphon it off. 1569 01:09:19,290 --> 01:09:20,670 And that's OK. 1570 01:09:20,670 --> 01:09:23,040 But I like to see things in the raw data. 1571 01:09:23,040 --> 01:09:24,569 And so probably what that means is 1572 01:09:24,569 --> 01:09:28,060 that the music is overlapping with other things in the brain. 1573 01:09:28,060 --> 01:09:29,939 And so the direct contrast doesn't work well, 1574 01:09:29,939 --> 01:09:31,600 the math can pull them apart. 1575 01:09:31,600 --> 01:09:33,640 But wouldn't it be nice to see them separately? 1576 01:09:33,640 --> 01:09:35,609 And so we've been doing intracranial recordings 1577 01:09:35,609 --> 01:09:37,649 from patients with electrodes in their brain. 1578 01:09:37,649 --> 01:09:41,490 And I'll just show you a few very cool responses. 1579 01:09:41,490 --> 01:09:44,490 So this is a single electrode in a single patient. 1580 01:09:44,490 --> 01:09:47,160 These are the 165 sounds, same ones. 1581 01:09:47,160 --> 01:09:48,510 This is the time course. 1582 01:09:48,510 --> 01:09:51,420 And this is a speech selective electrode. 1583 01:09:51,420 --> 01:09:53,290 It responds to native and foreign music. 1584 01:09:53,290 --> 01:09:55,080 Those are the two green ones-- 1585 01:09:55,080 --> 01:09:56,760 I'm sorry, native and foreign speech. 1586 01:09:56,760 --> 01:09:59,670 And it responds to music with vocals in pink. 
1587 01:09:59,670 --> 01:10:03,390 Everybody see how that's a speech selective electrode? 1588 01:10:03,390 --> 01:10:04,960 So there's loads of those. 1589 01:10:04,960 --> 01:10:06,360 But we also found these. 1590 01:10:06,360 --> 01:10:08,010 Here is a single electrode. 1591 01:10:08,010 --> 01:10:13,240 Look, each row is a single stimulus. 1592 01:10:13,240 --> 01:10:16,260 Here's a histogram of responses to all the music with vocals, 1593 01:10:16,260 --> 01:10:19,830 music without vocals, much stronger than to anything else. 1594 01:10:19,830 --> 01:10:22,360 You might be saying, well, what about those things. 1595 01:10:22,360 --> 01:10:24,210 Let's look at what those things are. 1596 01:10:24,210 --> 01:10:27,360 Oh, even the violations aren't really violations. 1597 01:10:27,360 --> 01:10:30,570 Whistling, humming, computer jingle, ringtone-- 1598 01:10:30,570 --> 01:10:32,490 those are sort of musicy. 1599 01:10:32,490 --> 01:10:35,280 So that is an extremely music-selective 1600 01:10:35,280 --> 01:10:38,100 individual electrode in a single subject's brain. 1601 01:10:38,100 --> 01:10:41,640 No fancy math that might have invented it somehow. 1602 01:10:41,640 --> 01:10:45,780 It's just there right in the raw data. 1603 01:10:45,780 --> 01:10:47,280 Further, and here's the time course, 1604 01:10:47,280 --> 01:10:50,370 you can see the time course of music with instruments, music 1605 01:10:50,370 --> 01:10:52,080 with vocals, everything else. 1606 01:10:52,080 --> 01:10:54,630 Really selective. 1607 01:10:54,630 --> 01:10:56,520 So this is the strongest evidence yet 1608 01:10:56,520 --> 01:10:59,380 for music specificity in the human brain. 1609 01:10:59,380 --> 01:11:04,390 But there's one more cool thing that came out of this analysis. 1610 01:11:04,390 --> 01:11:06,660 And that is we found some electrodes 1611 01:11:06,660 --> 01:11:08,310 that are not just selected for music, 1612 01:11:08,310 --> 01:11:13,320 but selected for vocal music, selected for song. 1613 01:11:13,320 --> 01:11:14,450 And that's really amazing. 1614 01:11:14,450 --> 01:11:16,200 Because as I started off at the beginning, 1615 01:11:16,200 --> 01:11:17,820 many people have said that song is 1616 01:11:17,820 --> 01:11:19,440 a kind of native form of music. 1617 01:11:19,440 --> 01:11:22,900 The first one to evolve and all that kind of stuff. 1618 01:11:22,900 --> 01:11:24,760 And so we did all the controls. 1619 01:11:24,760 --> 01:11:27,210 It's not the low-level stuff. 1620 01:11:27,210 --> 01:11:32,040 And there's lots of open questions. 1621 01:11:32,040 --> 01:11:36,360 We started with this puzzle of how did 1622 01:11:36,360 --> 01:11:38,700 music evolve, if it did evolve. 1623 01:11:38,700 --> 01:11:41,400 And we made a little bit of progress. 1624 01:11:41,400 --> 01:11:44,355 It doesn't share music machinery with speech and language. 1625 01:11:47,460 --> 01:11:50,430 If it's auditory cheesecake, as Pinker said, 1626 01:11:50,430 --> 01:11:53,880 it's auditory cheesecake that not only uses machinery that 1627 01:11:53,880 --> 01:11:55,680 evolved for something else, but changes it 1628 01:11:55,680 --> 01:12:00,480 throughout development and makes it very selective. 1629 01:12:00,480 --> 01:12:02,640 These guys speculated that song is special. 1630 01:12:02,640 --> 01:12:03,990 Maybe it is. 1631 01:12:03,990 --> 01:12:05,940 And sexual selection, who knows? 1632 01:12:05,940 --> 01:12:07,910 We have no data.