The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

REBECCA SAXE: There's a whole bunch of limitations of Haxby-style correlations. One of them is that all the tests are binary. The answer you get for anything you test is that there is or is not information about that distinction, so there's no continuous measure here. It's just that two things are different from one another or they are not different from one another. And so once people started thinking about this method, it became clear that it's actually just a special case of a much more general way of thinking about fMRI data. So this particular method-- using spatial correlations-- is very stable and robust, but it's a special case of a much more general set of techniques. And here's the more general idea.
The more general idea is that we can think of the response pattern to a stimulus in a set of voxels-- for example, the voxels in a region-- as a vector in voxel space. So every time you present a stimulus, you get the response of all the voxels. Now, instead of thinking of that as a spatial pattern, think of it as a vector in voxel space. Every voxel defines a dimension, and the position in voxel space is how much activity there was in each of those voxels. Can everybody do that mental transformation? This is the key insight that people had about MVPA: we had been thinking about everything in space-- in the space of cortex-- but instead of thinking of a spatial pattern on cortex, treat each voxel as a dimension of a very multi-dimensional space. Now, the response to every stimulus is one point in voxel space. OK? As soon as you think of it that way, your mental representation of fMRI data looks like that. Right?
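The "vector in voxel space" idea above can be made concrete with a few lines of numpy. This is a toy sketch with made-up numbers, not real fMRI data: each row is one stimulus presentation, each column one voxel, so each row is a point in a 4-dimensional voxel space, and distances between points measure how similar two response patterns are.

```python
import numpy as np

# Hypothetical toy data: responses of 4 voxels to 3 stimulus presentations.
# Each row is one presentation; each column is one voxel, so every row
# is a vector (a point) in a 4-dimensional "voxel space".
responses = np.array([
    [1.2, 0.3, 0.8, 0.1],   # stimulus A
    [1.1, 0.4, 0.9, 0.2],   # stimulus A again (a nearby vector)
    [0.2, 1.5, 0.1, 1.3],   # stimulus B (far away in voxel space)
])

# Distances in voxel space quantify how similar two response patterns are.
dist_AA = np.linalg.norm(responses[0] - responses[1])
dist_AB = np.linalg.norm(responses[0] - responses[2])
print(dist_AA < dist_AB)  # the two A presentations lie closer together
```

The same geometry scales up unchanged when a region has hundreds of voxels; only the dimensionality of the space grows.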
So your mental representation of fMRI data used to be a BOLD response, and then it was a spatial pattern on cortex, and now it's a point in voxel space. And if you can follow those three transformations, then you realize that a set of points in a multi-dimensional space is the kind of problem that all of machine learning for the last 20 years has been working on. Right? And so everything that has ever happened in machine learning could now be used in fMRI-- well, almost-- because machine learning has absolutely proliferated in techniques, problems, and solutions to those problems for handling data sets where you have no idea where the data came from, but it's represented as multiple points in a multi-dimensional space. And so that's what happened about five years ago: people realized that we could think of the fMRI response to every stimulus as a point in voxel space. A set of data is a set of points in voxel space. Now, do anything you want with that. And the first, most obvious thing to do is to think of this as a classification problem. OK?
So we created conditions or dimensions in our stimuli, so now we can ask: can we decode those conditions? Can we find clusters? Can we find dimensions? Right? All the standard things people have done when you have points in multi-dimensional spaces. And so, again, the most common thing people now do, once you think of fMRI data that way, is to try linear classification of the categories or dimensions that you're interested in, typically using standard machine learning techniques. So think of training a classifier on some of your data, testing it on independent data, and trying to find the right classification techniques that can identify whatever distinction you're interested in in the data set that you built. And the way this one looks is that you take some-- now, voxels are on the y-axis of this heat map, so we have whatever that is, 80 or 100 voxels in a region-- maybe more-- and for every stimulus you have the response in every voxel to that stimulus. Right?
So each of those columns is now a representation of where that stimulus landed in voxel space, and you have a whole bunch of instances. And so now what you're going to do is use the training data to learn a potential linear classifier that tells you the best way to separate the stimuli that came from one labeled set versus the stimuli that came from some other labeled set. And the test of that is going to be: take a new stimulus or new response, use the classification you learned to try to decode which stimulus it came from, and measure your accuracy. And so the new measure of the information in fMRI is going to be classification accuracy. Does that make sense-- the people with me? OK, because that's where a lot of fMRI is right now: thinking about responses to stimuli as points in voxel space, and the problem as one of classification accuracy in independent data. OK. Here's one experiment that we did where we used classification. Now, another thing to note is that in this context you're often trying to classify a single trial. Right?
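The train-then-test procedure just described can be sketched in a few lines. This is a simulation with invented numbers, and it uses a nearest-centroid rule as a minimal stand-in for the linear classifiers the lecture mentions (for two classes, nearest-centroid is itself a simple linear classifier): train on some trials, then classify held-out single trials and report accuracy against the 50% chance level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: trials x voxels, two conditions whose mean patterns
# differ slightly. All values are toy numbers, not real fMRI.
n_voxels = 100
mean_a = rng.normal(0, 1, n_voxels)
mean_b = mean_a + rng.normal(0, 0.5, n_voxels)
train_a = mean_a + rng.normal(0, 1, (10, n_voxels))
train_b = mean_b + rng.normal(0, 1, (10, n_voxels))
test_a = mean_a + rng.normal(0, 1, (10, n_voxels))
test_b = mean_b + rng.normal(0, 1, (10, n_voxels))

# "Training": estimate each condition's mean pattern (its centroid
# in voxel space) from the training trials only.
ca, cb = train_a.mean(axis=0), train_b.mean(axis=0)

def classify(trial):
    """Label a single held-out trial by its nearer centroid."""
    return 'a' if np.linalg.norm(trial - ca) < np.linalg.norm(trial - cb) else 'b'

# "Testing": decode each independent trial and measure accuracy.
preds = [classify(t) for t in test_a] + [classify(t) for t in test_b]
truth = ['a'] * 10 + ['b'] * 10
accuracy = np.mean([p == t for p, t in zip(preds, truth)])
print(accuracy)  # well above the 0.5 chance level in this simulation
```

In practice people typically use off-the-shelf linear classifiers (e.g., linear SVMs) and cross-validation, but the logic is the same: fit on one partition, score single trials from another.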
So in our case, we're always trying to classify a single trial, so we've gone from partitioning the data into two halves and asking about similarity, to training on some of the data and classifying single independent trials. OK. So here's a case where we tried to do that, and it was an extension of the stuff that I just showed you, that you could classify seeing versus hearing, and so we tried to replicate and extend that. So we told people stories like this. There's a background: Bella is pouring sleeping potion into Ardwin's soup, while her sister, Jen, is waiting. They're holding their breath while he starts to eat. The conclusion of the story is always going to be the same-- Bella concludes that the potion worked-- and then we tell you on what evidence she based that conclusion. One case here is going to be: Bella stared through the secret peephole and waited. In the bright light she saw his eyes close and his head droop. So that's her evidence for the conclusion that the potion has worked. That's OK evidence, and we can vary it in a bunch of ways.
So one is we can change the modality of the evidence. Instead of seeing something, she can hear something. So for example: she pressed her ear against the door and waited. In the quiet she heard the spoon drop and a soft snore. So that's similar content of information, but arrived at through a different modality. Or we can change how good her evidence is, and in this case we did it by saying: she tried to peer through a crack in the door. In the dim light she squinted to see his eyes closed. OK, so that's less strong perceptual evidence for the conclusion that the potion has worked. OK. And so now what we're going to ask is: if we train on the pattern of activity in a brain region for one set of stories that vary on either of these dimensions-- one at a time, either modality or quality-- can we decode that dimension in a new test set? And the first answer is we can-- both of them. One thing about this is that this measure isn't binary anymore.
Since for every stimulus we're asking whether we can classify that stimulus or not, we can get, for each subject and each item, a measure of whether it was classified correctly or not. So across items, we know for every item the probability of it being correctly classified. And then we can ask: is that related to other continuous features of that item? So in this case what we can say, for example, is that the quality dimension-- how good your evidence is for the belief that you conclude-- is a continuous feature. It can be judged continuously by human observers, so for each item we can ask, how good is the evidence for the conclusion in this specific story? That judgment by human observers of how good the evidence is continuously predicts the probability of that item being classified as good evidence or bad evidence-- even over and above the label that we gave it. So if you regress out the labels, there's still a continuous predictor.
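The item-level analysis just described-- relating each item's probability of correct classification to a continuous human rating-- can be sketched as follows. Everything here is simulated (the ratings and per-item accuracies are invented numbers generated so that stronger evidence yields better decoding), just to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-ins: for 40 items, a continuous human rating of
# evidence quality, and that item's probability of being classified
# correctly (generated here so stronger evidence -> better decoding).
quality = rng.uniform(0, 1, 40)
p_correct = np.clip(0.4 + 0.5 * quality + rng.normal(0, 0.05, 40), 0, 1)

# The item-level question from the lecture: does the continuous
# rating predict classification accuracy across items?
r = np.corrcoef(quality, p_correct)[0, 1]
print(round(r, 2))  # strongly positive in this simulation
```

In the real analysis one would also regress out the binary condition labels first and ask whether the continuous rating still predicts residual classification accuracy.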
So imagine something like a neural population-- a sub-population-- that responds more the better the evidence, continuously, so that classification gets better as you get further out on that dimension. It's also not redundant across brain regions-- there's different information in different brain regions. And this is just to show you that in two other brain regions: in the right STS we can decode quality but not modality, and in the left TPJ we can decode modality but not quality. And the left TPJ result we've replicated a bunch of times. In the DMPFC we can't decode modality or quality, but we can decode valence, which is the thing I told you the right TPJ doesn't decode. And if we go back and look at valence in this dataset, we can only decode valence in the DMPFC. So to me, this is starting to get cool, right? Three features of other people's mental states, represented differentially in different brain regions.
This distinction between the more epistemic stuff-- like modality and quality, which are represented in the TPJ-- and valence, which is represented in the DMPFC, I think is real and deep, and hints at one of the most important distinctions within our theory of mind that I mentioned at the very beginning: between epistemic states and affective or motivational states. So what's cool about classification analyses? They have all the same properties as the Haxby-style analyses in principle, because they're actually just a generalization of the Haxby analyses, except that they're a lot less robust, because what you're trying to classify are single trials or single items. And so noisy data collapses faster in these classification strategies than in Haxby-style analyses, where you're averaging. But otherwise, those are the same two techniques. What's nice about the classification analyses is you can get item-specific outcomes, right? So you can say, for a specific item, how likely is it to be classified as one thing or another?
And this is where I started the talk before, which is that in both of these cases we think of a hypothesis and test it sequentially, whereas the representational similarity matrix tests whole hypothesis spaces instead of single features. Classification and Haxby-style analyses are ways to think of a feature or dimension that might be represented in a brain region you care about, and test whether or not it's represented. So they're a way of thinking of a hypothesis and testing it, then thinking of another hypothesis and testing it-- that's what I mean by sequentially. So you can ask: does the right TPJ represent the difference, for example, between Grace poisoning the person knowingly and poisoning the person unknowingly? The answer to that is yes, it does, but that's one hypothesis. And then we can come up with another hypothesis, and then another hypothesis. And what's interesting about representational dissimilarity matrices-- one of the versions of MVPA people use these days-- is that they take a different approach.
So instead of trying to think of one hypothesis and test it, this approach proposes a hypothesis space and tests the space as a whole, and that gives you different sensitivities and strengths and different weaknesses. So I'll work through an example in which we did this. I told you that I would come back to thinking about other people's feelings, and in this experiment we took the different kinds of things that people can feel as one subspace of theory of mind. So our stimuli, in this case, are 200 stories about people having an emotional experience. And we're going to look at what we can understand about how your brain represents the knowledge that lets you sort out people's experiences in those cases. OK, it's hard in the abstract, so let's do it in the concrete. In the behavioral version of this test I give you a list of 20 different emotions-- jealous, disappointed, devastated, embarrassed, disgusted, guilty, impressed, proud, excited, hopeful, joyful, et cetera-- so you have 20 different choices.
And I'm going to tell you a single story about a character you don't know and something they experienced, very briefly, and what I want you to think to yourself is: which emotion did they experience in that case? OK? So here's one. After an 18-hour flight, Alice arrived at her vacation destination to learn that her baggage, including camping gear for her trip, hadn't made the flight. After waiting at the airport for two nights, she was informed that the airline had lost her luggage and wouldn't provide any compensation. How many people think she felt joyful? How many people think that she felt annoyed? How about furious? OK, so furious is the modal answer, and annoyed is the most likely second choice for that case. Here's a different one. Sarah swore to her roommate that she would keep her new diet. Later, while she was in the kitchen, she took a bite of a cake she had bought for a dinner party. Then her roommate arrived home to find that she'd eaten half the cake and broken her diet. How many people think that she would feel disgusted? Terrified?
Embarrassed? OK. And just to give you a sense of how fine-grained your knowledge is in this case, think about this difference. In this case she swore she would keep her diet and then broke it, right? What about the difference between that and: she first ate the cake and then swore she would keep her diet? That's a totally different texture to the story. OK. So we have incredibly fine-grained knowledge of how a description of a situation predicts an overall emotion. You can see that in a behavioral experiment, so what I'm showing you here is, on the y-axis, the emotion that we intended when we wrote the story-- ten stories for each category, for 200 stories-- and on the x-axis, the percent of participants picking each label. And so the first thing is that 65% of the time, people pick the label we intended. If instead you take half the subjects to determine a modal answer and use the other half of the subjects as the test set, you get the same answer: there's about 65% agreement on the single right label out of 20.
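The behavioral analysis just described-- a confusion matrix of intended emotion versus chosen label, with agreement read off the diagonal-- can be sketched with simulated choices. The numbers below are toy values chosen to echo the roughly 65%-agreement, 5%-chance figures from the lecture, not the actual data.

```python
import numpy as np

rng = np.random.default_rng(2)

n_emotions, n_stories_per, n_subjects = 20, 10, 50

# Simulated choices: each subject picks the intended label with
# probability 0.65, otherwise a random label (toy numbers only).
intended = np.repeat(np.arange(n_emotions), n_stories_per)  # 200 stories
confusion = np.zeros((n_emotions, n_emotions))
for story_label in intended:
    for _ in range(n_subjects):
        if rng.random() < 0.65:
            choice = story_label
        else:
            choice = rng.integers(n_emotions)
        confusion[story_label, choice] += 1

# Row-normalize: fraction of participants picking each label, per emotion.
confusion /= confusion.sum(axis=1, keepdims=True)
agreement = np.mean(np.diag(confusion))
print(round(agreement, 2))  # near 0.65 + 0.35/20, i.e. about 0.67
```

The off-diagonal cells of `confusion` are where the "second best answer" structure (annoyed versus furious, for example) would show up in real data.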
That's, of course, way above chance, which is 5%, so people are quite good at this. And the off-diagonal is also meaningful-- it also contains information, in the second best answer, right? So annoyed as opposed to furious, for example. OK, so that's a huge amount of rich knowledge about other people's experiences, going from these very brief descriptions of events to a very fine-grained classification of which emotion they're experiencing. And one way to look at these data is to ask: OK, that's knowledge that we have-- where is that knowledge in the brain? That's the first question you could ask, and you could ask it by doing train and test. So we train a classifier on a patch of cortex, based on five examples from each condition, and then we test on the remaining half of the data. And we just ask: based on the pattern of activity in a patch, can you get above-chance classification in the independent data?
For every patch where that's true we put a sort of bright mark, and then ask: where in the brain is the relevant decoding that would let you be above chance on this distinction? The answer is in exactly the same brain regions that I have been talking about and showed you before. That is where there's above-chance classification, overlaid on the standard belief-versus-photo task in green. So within the brain regions involved in theory of mind or social cognition are the brain regions that can classify above chance in this 20-way distinction. And this is just looking inside each one of those-- four of the regions in that group that I showed you before. Using just the pattern of activity in that brain region, you can do above-chance classification on this 20-way distinction. And there's a hint that that information is somewhat non-redundant, because if you combine information across all of them you do slightly better than if you use any one of them alone.
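The "bright mark for every patch" procedure is usually called a searchlight analysis: slide a small patch across the brain, train and test a classifier inside each patch, and map where accuracy beats chance. Here is a minimal one-dimensional sketch on simulated data (a strip of 60 "voxels", only some of which carry condition information), again using a nearest-centroid rule as a stand-in classifier.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 1-D "cortex": 40 trials x 60 voxels, two conditions.
# Only voxels 20-40 carry condition information (all simulated).
n_trials, n_voxels = 40, 60
labels = np.array([0, 1] * 20)
data = rng.normal(0, 1, (n_trials, n_voxels))
data[labels == 1, 20:40] += 1.5   # the informative patch

def patch_accuracy(X_train, y_train, X_test, y_test):
    """Nearest-centroid decoding accuracy for one searchlight patch."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    pred = (np.linalg.norm(X_test - c1, axis=1)
            < np.linalg.norm(X_test - c0, axis=1)).astype(int)
    return (pred == y_test).mean()

# Slide a 5-voxel searchlight along the strip: train on the first
# half of trials, test on the second half, record accuracy per center.
train, test = np.arange(20), np.arange(20, 40)
acc = np.array([
    patch_accuracy(data[train][:, v:v + 5], labels[train],
                   data[test][:, v:v + 5], labels[test])
    for v in range(n_voxels - 5)
])
print(acc[:10].mean() < acc[25:30].mean())  # informative region decodes better
```

In real analyses the searchlight is a small sphere moved through the 3-D volume and each center's accuracy is tested against chance across subjects, but the loop has exactly this structure.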
OK, so now the question is: how can we study what knowledge is represented in each of these brain regions? Right? So we know that there's some information about that 20-way classification, but can we learn anything about the representation of emotions in those brain regions using fMRI? And that's where the representational dissimilarity matrices come in as a strategy. OK, so the question is: how might you represent the knowledge that you have of what Alice is experiencing, for example, in this story? What's a possible hypothesis? And the way representational dissimilarity matrices work as a strategy for fMRI analyses is that you should think of multiple different hypotheses about how that knowledge could be represented. So a first hypothesis, which is deep in the literature on emotions, is that we represent other people's emotional experience in terms of two fundamental dimensions of emotional experience: valence and arousal. Have you guys heard of valence and arousal as the two fundamental-- OK.
So this hypothesis says: when we think about emotions-- our own or other people's-- we put emotions in a two-dimensional space, which is, how good or bad did it make you feel, and how intense was it? OK. So terrified is negative and very intense. Lonely is negative, but not that intense. Right? That's the idea. Happy is positive and somewhat intense. Thrilled is positive and more intense. So the idea is that there are these two basic dimensions of emotional experience, and so one thing we can do is take each of our stories, like this one, and have people tell us: in that story, was she feeling positive or negative? How positive or negative, and how intensely? And so, for each individual story we can have a representation of it as a point in that space. And if you use just that, you can classify our 200 stories reasonably well-- not as well as people can, but still reasonably well. OK, so the 200 stories do clump into lumps in that two-dimensional space.
444 00:18:43,210 --> 00:18:46,630 But another idea is that valence and arousal 445 00:18:46,630 --> 00:18:51,210 seem not to capture the full texture of the 20 categories 446 00:18:51,210 --> 00:18:52,210 that we originally have. 447 00:18:52,210 --> 00:18:54,160 It's not that we can't embed 20 categories 448 00:18:54,160 --> 00:18:55,690 in two dimensions-- you obviously 449 00:18:55,690 --> 00:18:58,940 can have 20 clusters in a two dimensional space. 450 00:18:58,940 --> 00:19:01,626 But we had the intuition that it's not a two dimensional 451 00:19:01,626 --> 00:19:03,250 space-- that those two dimensions don't 452 00:19:03,250 --> 00:19:05,200 capture all the features that people have 453 00:19:05,200 --> 00:19:07,370 and know about when they use the stimuli. 454 00:19:07,370 --> 00:19:09,400 And so, based on another literature 455 00:19:09,400 --> 00:19:10,860 called appraisal theory-- 456 00:19:10,860 --> 00:19:15,220 what we tried to do is capture some of the abstract knowledge 457 00:19:15,220 --> 00:19:18,460 that people have about these situations that lets them 458 00:19:18,460 --> 00:19:21,190 identify which emotion it is. 459 00:19:21,190 --> 00:19:23,680 And we did that by having them rate 460 00:19:23,680 --> 00:19:26,770 each of these stories on a bunch of abstract event features. 461 00:19:26,770 --> 00:19:29,160 So those event features are things like-- 462 00:19:29,160 --> 00:19:31,630 was this situation caused by a person 463 00:19:31,630 --> 00:19:33,106 or some other external force. 464 00:19:33,106 --> 00:19:34,480 So I hope you guys have the sense 465 00:19:34,480 --> 00:19:36,813 that if your luggage gets lost on your way to the trip-- 466 00:19:36,813 --> 00:19:38,860 it's different if that was airline incompetence 467 00:19:38,860 --> 00:19:40,190 versus a tornado, right? 468 00:19:40,190 --> 00:19:41,270 Does everybody have that intuition? 469 00:19:41,270 --> 00:19:42,190 The emotion is different. 470 00:19:42,190 --> 00:19:42,689 OK. 
471 00:19:42,689 --> 00:19:44,440 So that's an important abstract feature 472 00:19:44,440 --> 00:19:45,880 of our knowledge of other people. 473 00:19:45,880 --> 00:19:47,560 Was it caused by you yourself? 474 00:19:47,560 --> 00:19:49,200 If you left your luggage at home, 475 00:19:49,200 --> 00:19:51,700 that's different from if airline incompetence caused you not 476 00:19:51,700 --> 00:19:52,574 to have your luggage. 477 00:19:52,574 --> 00:19:54,779 And does it refer to something in her past? 478 00:19:54,779 --> 00:19:56,320 Is she interacting with other people? 479 00:19:56,320 --> 00:19:57,850 That makes a really big difference, for example, 480 00:19:57,850 --> 00:19:59,891 in pride and embarrassment-- whether other people 481 00:19:59,891 --> 00:20:01,390 are around. 482 00:20:01,390 --> 00:20:03,400 How will it affect her future relationships? 483 00:20:03,400 --> 00:20:06,161 So things that potentially cause harm to future relationships 484 00:20:06,161 --> 00:20:08,410 feel very different from things that are just annoying 485 00:20:08,410 --> 00:20:09,940 right now but will end. 486 00:20:09,940 --> 00:20:13,900 So these are abstract features and they encapsulate things 487 00:20:13,900 --> 00:20:16,630 we know about emotion relevant features of the situations 488 00:20:16,630 --> 00:20:18,160 people find themselves in. 489 00:20:18,160 --> 00:20:20,440 So we came up with 42 of these and we 490 00:20:20,440 --> 00:20:23,380 had every story rated on all of those dimensions. 491 00:20:23,380 --> 00:20:26,080 And of course, we can, again, classify the stories 492 00:20:26,080 --> 00:20:29,080 as 20 clusters in a 42 dimensional space, right? 493 00:20:29,080 --> 00:20:30,730 Again, of course we can. 494 00:20:30,730 --> 00:20:34,310 But the question is [INAUDIBLE] this is those data. 
495 00:20:34,310 --> 00:20:37,060 This is just every set of 10 stories and their average 496 00:20:37,060 --> 00:20:38,370 rating on our-- oh, 38-- 497 00:20:38,370 --> 00:20:40,510 on our 38 appraisal features, so that 498 00:20:40,510 --> 00:20:43,840 creates a 38 dimensional space. 499 00:20:43,840 --> 00:20:45,130 Here the idea is-- 500 00:20:45,130 --> 00:20:47,800 for each category-- like, for all the stories about being 501 00:20:47,800 --> 00:20:49,480 jealous-- 502 00:20:49,480 --> 00:20:52,810 you can get-- for, let's say, for the two dimensions 503 00:20:52,810 --> 00:20:55,510 of valence and arousal-- the average value of valence 504 00:20:55,510 --> 00:20:57,414 and the average value of arousal, right? 505 00:20:57,414 --> 00:20:59,330 So that's a point in a two dimensional space-- 506 00:20:59,330 --> 00:21:01,010 the stories about being jealous. 507 00:21:01,010 --> 00:21:01,930 OK. 508 00:21:01,930 --> 00:21:04,210 Then you take the stories about being terrified. 509 00:21:04,210 --> 00:21:06,320 What's their valence and arousal? 510 00:21:06,320 --> 00:21:08,740 So that's another point in a two dimensional space. 511 00:21:08,740 --> 00:21:11,410 And then you take the distance between them, 512 00:21:11,410 --> 00:21:13,750 and that number goes in a representational dissimilarity 513 00:21:13,750 --> 00:21:14,600 matrix. 514 00:21:14,600 --> 00:21:16,600 So the further away you are in a two dimensional 515 00:21:16,600 --> 00:21:19,044 space, the more dissimilar. 516 00:21:19,044 --> 00:21:21,460 And you could do the same thing in a 42 dimensional space, 517 00:21:21,460 --> 00:21:23,334 a 38 dimensional space, any dimensional space 518 00:21:23,334 --> 00:21:26,960 you want-- what you need to know is just how far away you are. 519 00:21:26,960 --> 00:21:29,890 And so what a representational dissimilarity matrix has in it 520 00:21:29,890 --> 00:21:30,950 is for every pair. 
521 00:21:30,950 --> 00:21:33,910 So the jealous stories versus the grateful stories-- 522 00:21:33,910 --> 00:21:36,760 the number in that cell is the distance 523 00:21:36,760 --> 00:21:38,810 from the mean position in your space 524 00:21:38,810 --> 00:21:42,040 of all the jealous stories to the mean position in your space 525 00:21:42,040 --> 00:21:43,780 of all the grateful stories. 526 00:21:43,780 --> 00:21:46,660 Does that make sense? 527 00:21:46,660 --> 00:21:49,390 And that could be true of any dimensionality. 528 00:21:49,390 --> 00:21:51,310 When you know these 38 features-- 529 00:21:51,310 --> 00:21:52,990 so this is behavioral data-- 530 00:21:52,990 --> 00:21:56,470 when you know the 38 features of these emotions, 531 00:21:56,470 --> 00:22:00,280 the green bar is how well you can classify new items, just 532 00:22:00,280 --> 00:22:00,850 behaviorally. 533 00:22:00,850 --> 00:22:02,800 So if I give you a new item and all I 534 00:22:02,800 --> 00:22:05,962 tell you is its value in these 38 dimensions, 535 00:22:05,962 --> 00:22:07,420 how well can you tell me back which 536 00:22:07,420 --> 00:22:09,280 emotion category it comes from? 537 00:22:09,280 --> 00:22:12,310 The best you could possibly do is 65%, 538 00:22:12,310 --> 00:22:15,440 because that's what human observers do in all of our-- 539 00:22:15,440 --> 00:22:17,255 so the reality is the human observers-- 540 00:22:17,255 --> 00:22:18,880 the features come from human observers, 541 00:22:18,880 --> 00:22:21,340 so our ceiling's going to be 65%, 542 00:22:21,340 --> 00:22:23,280 and the answer is about 55%. 543 00:22:23,280 --> 00:22:23,860 OK. 544 00:22:23,860 --> 00:22:26,330 And you can take that in two different ways. 545 00:22:26,330 --> 00:22:29,860 One tendency is to say, wow, we know 546 00:22:29,860 --> 00:22:34,210 a lot of the key features that go into emotion attribution. 
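The construction described above-- average each category's stories in the feature space, then fill each cell of the matrix with the distance between two category means-- can be sketched as follows. A minimal sketch: the category names, the number of stories and features, the random ratings, and the choice of Euclidean distance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

categories = ["jealous", "grateful", "terrified"]
n_stories, n_features = 10, 38   # e.g., 10 stories per category, 38 appraisal features

# ratings[c] is an (n_stories, n_features) array of per-story feature ratings.
ratings = {c: rng.random((n_stories, n_features)) for c in categories}

# Mean position of each category in the 38-dimensional feature space.
means = np.stack([ratings[c].mean(axis=0) for c in categories])

# RDM cell (i, j) = distance between category i's mean and category j's mean.
rdm = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)

print(rdm.shape)   # (3, 3)
print(rdm[0, 0])   # 0.0 -- a category is not dissimilar to itself
```

The same few lines work unchanged for 2 dimensions (valence and arousal) or 38: only the width of the rating arrays changes, and what comes out is always one number per pair of categories.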
547 00:22:34,210 --> 00:22:36,040 I think, Amy, who I did this work with, 548 00:22:36,040 --> 00:22:37,840 had a tendency to feel that way. 549 00:22:37,840 --> 00:22:40,661 And I think, wow, we thought of 38 things 550 00:22:40,661 --> 00:22:42,910 and we still didn't think of all the important things. 551 00:22:42,910 --> 00:22:44,530 Like, what are those other things 552 00:22:44,530 --> 00:22:46,360 that we didn't think of that explain 553 00:22:46,360 --> 00:22:47,520 the rest of the variation? 554 00:22:47,520 --> 00:22:49,895 So you could feel either way about this, but in any case, 555 00:22:49,895 --> 00:22:53,020 once you know the position of one of these stories in the 38 556 00:22:53,020 --> 00:22:55,120 dimensional space of these features, 557 00:22:55,120 --> 00:22:59,230 you know a lot about which emotion category it came from. 558 00:22:59,230 --> 00:23:01,750 And then this is the correlation to the neural RDM data 559 00:23:01,750 --> 00:23:02,500 that I showed you. 560 00:23:02,500 --> 00:23:05,083 And so, again, what I showed you is, so observer's knowledge-- 561 00:23:05,083 --> 00:23:08,120 that's everything that we know that lets us classify a story. 562 00:23:08,120 --> 00:23:10,000 Valence and arousal is the yellow bar-- 563 00:23:10,000 --> 00:23:12,100 that's just these two features of the story, 564 00:23:12,100 --> 00:23:14,683 and they're both less good than this intermediate thing, which 565 00:23:14,683 --> 00:23:16,330 is the 38 dimensional space. 566 00:23:16,330 --> 00:23:17,980 And one question is, like, do I really 567 00:23:17,980 --> 00:23:19,390 think it's 38 dimensions? 568 00:23:19,390 --> 00:23:20,380 No, definitely not. 569 00:23:20,380 --> 00:23:23,290 That was just the set of all the things that we could think of. 570 00:23:23,290 --> 00:23:25,480 How many dimensions is it, really? 
571 00:23:25,480 --> 00:23:28,930 Again, I don't know, really, but I can tell you 572 00:23:28,930 --> 00:23:31,570 that the best ten dimensions capture 573 00:23:31,570 --> 00:23:34,550 most of the information from the 38 dimensions. 574 00:23:34,550 --> 00:23:36,730 So what we've discovered so far is 575 00:23:36,730 --> 00:23:39,697 ten really important dimensions of your knowledge of emotion. 576 00:23:39,697 --> 00:23:42,280 I don't, again, think that means that our knowledge of emotion 577 00:23:42,280 --> 00:23:43,450 is ten dimensional. 578 00:23:43,450 --> 00:23:46,090 Lots of this is limited by the set of stimuli 579 00:23:46,090 --> 00:23:48,880 that we chose, the resolution of the data that we have, 580 00:23:48,880 --> 00:23:50,510 and so forth and so on. 581 00:23:50,510 --> 00:23:52,540 But in these data you need something 582 00:23:52,540 --> 00:23:54,790 on the order of ten dimensions to get 583 00:23:54,790 --> 00:23:56,950 close to human performance or close 584 00:23:56,950 --> 00:24:03,250 to the genuinely differential signal in the neural data. 585 00:24:03,250 --> 00:24:06,400 If you take one thing away from this talk about the methods 586 00:24:06,400 --> 00:24:08,470 used in representational dissimilarity matrix-- 587 00:24:08,470 --> 00:24:10,040 really only one thing. 588 00:24:10,040 --> 00:24:11,980 Here's the one thing I want you to know-- 589 00:24:11,980 --> 00:24:13,990 the dimensionality of the theory that 590 00:24:13,990 --> 00:24:16,600 generated your representational dissimilarity matrix 591 00:24:16,600 --> 00:24:19,480 does nothing for you in the fit to your data. 592 00:24:19,480 --> 00:24:20,590 Nothing at all. 593 00:24:20,590 --> 00:24:22,410 It's a parameter-free fit. 594 00:24:22,410 --> 00:24:23,020 OK? 595 00:24:23,020 --> 00:24:26,800 So anybody to whom those words mean anything, 596 00:24:26,800 --> 00:24:30,185 this will be important, so I want you to actually know this. 
597 00:24:30,185 --> 00:24:31,810 Representational dissimilarity matrices 598 00:24:31,810 --> 00:24:34,546 provide a parameter-free fit to the data, 599 00:24:34,546 --> 00:24:35,920 and therefore, the dimensionality 600 00:24:35,920 --> 00:24:38,740 of the theory that generated the representational dissimilarity 601 00:24:38,740 --> 00:24:43,224 matrix has nothing to do with the fit of the data. 602 00:24:43,224 --> 00:24:45,640 You can probably notice I should have ordered this better. 603 00:24:45,640 --> 00:24:47,599 Valence has two dimensions, the observers 604 00:24:47,599 --> 00:24:48,640 have a lot of dimensions-- 605 00:24:48,640 --> 00:24:50,560 I don't know how many, but a lot more than 38. 606 00:24:50,560 --> 00:24:53,510 We know that because 38 doesn't explain all their data. 607 00:24:53,510 --> 00:24:57,700 So as you go up in-- and in principle, 608 00:24:57,700 --> 00:25:00,160 having more dimensions doesn't help with the fit. 609 00:25:00,160 --> 00:25:02,740 You might worry that they would overfit rather than fitting 610 00:25:02,740 --> 00:25:05,330 the data, and here's why they can't. 611 00:25:05,330 --> 00:25:09,070 Because the way you build a representational dissimilarity 612 00:25:09,070 --> 00:25:12,590 matrix is, out of however many dimensions you 613 00:25:12,590 --> 00:25:15,680 have in your data set, for every pair of stimuli, 614 00:25:15,680 --> 00:25:19,280 you take one number, and then a representational dissimilarity 615 00:25:19,280 --> 00:25:23,360 matrix encodes the relationships among those numbers. 616 00:25:23,360 --> 00:25:24,200 OK? 617 00:25:24,200 --> 00:25:28,280 So jealousy is more similar to irritation than it is to pride. 618 00:25:28,280 --> 00:25:29,460 By how much? 619 00:25:29,460 --> 00:25:30,410 OK? 620 00:25:30,410 --> 00:25:34,250 And those relative differences are all you have. 621 00:25:34,250 --> 00:25:37,820 You have nothing else, and so there's no parameters. 622 00:25:37,820 --> 00:25:38,600 Right? 
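The parameter-free point can be made concrete: embed the very same category positions in spaces of wildly different dimensionality and the RDM does not change at all, so neither can its fit to the neural data. This is a sketch under invented numbers; the orthonormal embedding is just one convenient way to move between dimensionalities without changing distances.

```python
import numpy as np

rng = np.random.default_rng(2)

points_2d = rng.random((20, 2))   # 20 emotion categories under a 2-D theory

# Embed the same points in 40 dimensions with an orthonormal map
# (columns of q are orthonormal), which preserves all pairwise distances.
q, _ = np.linalg.qr(rng.standard_normal((40, 2)))
points_40d = points_2d @ q.T      # shape (20, 40)

def rdm(points):
    """Pairwise Euclidean distances between category positions."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rdm_2d, rdm_40d = rdm(points_2d), rdm(points_40d)
print(np.allclose(rdm_2d, rdm_40d))   # True: same RDM at 2 or 40 dimensions
```

In practice the fit to a neural RDM is then something like a rank correlation between the off-diagonal cells of the model RDM and of the neural RDM; since both embeddings yield identical cells, they yield identical correlations, with no parameters fitted along the way.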
623 00:25:38,600 --> 00:25:40,810 You have the same amount of information 624 00:25:40,810 --> 00:25:42,560 in a representational dissimilarity matrix 625 00:25:42,560 --> 00:25:44,851 that you generated from a one dimensional theory, a two 626 00:25:44,851 --> 00:25:46,340 dimensional theory, a 38 dimensional 627 00:25:46,340 --> 00:25:48,145 theory, and an infinite dimensional theory. 628 00:25:48,145 --> 00:25:50,270 The size of the theory doesn't make any difference, 629 00:25:50,270 --> 00:25:53,720 because what you get in the end is exactly the same thing-- 630 00:25:53,720 --> 00:25:58,160 the relative distance between every two points in the set. 631 00:25:58,160 --> 00:26:01,460 There's a few things to say about-- 632 00:26:01,460 --> 00:26:05,330 so one thing to say about the representational dissimilarity 633 00:26:05,330 --> 00:26:07,490 analysis that I just showed you is 634 00:26:07,490 --> 00:26:11,330 that it tells you that the 38 dimensional theory is 635 00:26:11,330 --> 00:26:12,985 better than the valence theory. 636 00:26:12,985 --> 00:26:14,360 Like, the event feature theory is 637 00:26:14,360 --> 00:26:18,260 better than the valence theory, but it doesn't tell you why. 638 00:26:18,260 --> 00:26:18,860 Right? 639 00:26:18,860 --> 00:26:22,010 It doesn't tell you whether any specific one of those features 640 00:26:22,010 --> 00:26:25,190 is capturing variance in any specific one of those regions. 641 00:26:25,190 --> 00:26:28,010 It tells you that that whole set was better 642 00:26:28,010 --> 00:26:30,020 than this other whole set, and maybe this 643 00:26:30,020 --> 00:26:31,144 is what you're getting at. 644 00:26:31,144 --> 00:26:35,000 It's much less good for trying post-hoc things-- for saying, 645 00:26:35,000 --> 00:26:35,630 but why? 646 00:26:35,630 --> 00:26:38,390 Which aspect of that theory was better 647 00:26:38,390 --> 00:26:40,460 than the valence and arousal? 
648 00:26:40,460 --> 00:26:42,650 It gives you an all things considered answer, 649 00:26:42,650 --> 00:26:45,020 not a dimension specific answer. 650 00:26:45,020 --> 00:26:47,780 That's one thing that is a limit in the way you 651 00:26:47,780 --> 00:26:50,960 should use representational dissimilarity analyses. 652 00:26:54,270 --> 00:26:57,890 There's two key problems that I think 653 00:26:57,890 --> 00:27:02,120 bear reflecting on about MVPA, and one of them 654 00:27:02,120 --> 00:27:04,070 is a catastrophe and the other one 655 00:27:04,070 --> 00:27:07,304 is an incredibly deep puzzle. 656 00:27:07,304 --> 00:27:08,720 And I think I should just say them 657 00:27:08,720 --> 00:27:10,885 right away before you get too excited, because all 658 00:27:10,885 --> 00:27:12,260 of this stuff was really exciting 659 00:27:12,260 --> 00:27:14,970 and now I'm going to tell you a catastrophe and a puzzle. 660 00:27:14,970 --> 00:27:17,060 Here's the catastrophe. 661 00:27:17,060 --> 00:27:19,880 The catastrophe is that you can't 662 00:27:19,880 --> 00:27:22,700 make anything of null results. 663 00:27:22,700 --> 00:27:24,560 OK, now, here's why. 664 00:27:24,560 --> 00:27:26,660 Because when I say that you can decode something 665 00:27:26,660 --> 00:27:28,400 from an MVPA analysis, what I mean 666 00:27:28,400 --> 00:27:30,050 is that at the scale of voxels, there's 667 00:27:30,050 --> 00:27:31,850 some signal in terms of which voxels 668 00:27:31,850 --> 00:27:34,700 are relatively higher or relatively lower in response 669 00:27:34,700 --> 00:27:35,340 to the stimuli. 670 00:27:35,340 --> 00:27:35,840 Right? 671 00:27:35,840 --> 00:27:38,750 So in voxel space or in spatial space, whichever one of those 672 00:27:38,750 --> 00:27:40,010 you find helpful-- 673 00:27:40,010 --> 00:27:43,820 it's that at the level of voxels we could cluster these stimuli. 
674 00:27:43,820 --> 00:27:45,800 And what that says is that they are something 675 00:27:45,800 --> 00:27:48,500 like distinct populations in this region, 676 00:27:48,500 --> 00:27:50,819 responding across that feature dimension, 677 00:27:50,819 --> 00:27:53,360 and they're spatially segregated enough that we could pick up 678 00:27:53,360 --> 00:27:55,145 on them with fMRI. 679 00:27:55,145 --> 00:27:57,020 But who cares if they're spatially segregated 680 00:27:57,020 --> 00:27:59,353 enough that we could pick up on them with fMRI, right? 681 00:27:59,353 --> 00:28:01,550 fMRI is the scale of a millimeter. 682 00:28:01,550 --> 00:28:04,340 And there could be many, many, many things 683 00:28:04,340 --> 00:28:06,290 that are represented by populations of neurons 684 00:28:06,290 --> 00:28:08,800 within a region that are not spatially organized 685 00:28:08,800 --> 00:28:10,220 at the scale of a millimeter. 686 00:28:10,220 --> 00:28:11,620 Not only could there be-- 687 00:28:11,620 --> 00:28:13,032 there absolutely, definitely are. 688 00:28:13,032 --> 00:28:14,990 There's a whole bunch of things that we already 689 00:28:14,990 --> 00:28:17,420 know are really important properties 690 00:28:17,420 --> 00:28:19,619 of neural representations of things we care about, 691 00:28:19,619 --> 00:28:21,410 and we know that their spatial scale is not 692 00:28:21,410 --> 00:28:23,599 high enough that they can be picked up on with fMRI. 693 00:28:23,599 --> 00:28:25,640 So two cases that I'll tell you about because you 694 00:28:25,640 --> 00:28:26,940 should care about them-- 695 00:28:26,940 --> 00:28:31,160 one is face responses in the middle temporal region 696 00:28:31,160 --> 00:28:35,090 that Doris and Winrich study for face representations 697 00:28:35,090 --> 00:28:37,130 in monkeys. 698 00:28:37,130 --> 00:28:39,085 It's one of the middle ones. 
699 00:28:39,085 --> 00:28:41,210 In that one there's face features that can tell you 700 00:28:41,210 --> 00:28:43,370 how far apart the pupils are, how high 701 00:28:43,370 --> 00:28:46,130 the eyebrows are-- did Winrich show you this amazing data? 702 00:28:46,130 --> 00:28:49,700 Totally amazing, beautiful feature space of face identity 703 00:28:49,700 --> 00:28:50,780 representation? 704 00:28:50,780 --> 00:28:53,150 One of the most strikingly beautiful things I've 705 00:28:53,150 --> 00:28:54,170 ever seen. 706 00:28:54,170 --> 00:28:56,360 And he already knows-- 707 00:28:56,360 --> 00:28:57,620 he and Doris already know-- 708 00:28:57,620 --> 00:28:59,960 that there's no spatial relationship 709 00:28:59,960 --> 00:29:03,740 at all between the property that one neuron signals 710 00:29:03,740 --> 00:29:05,330 and its distance from other neurons 711 00:29:05,330 --> 00:29:06,538 that signal other properties. 712 00:29:06,538 --> 00:29:08,352 There's no spatial organization at all. 713 00:29:08,352 --> 00:29:10,310 So if you know that right here is a neuron that 714 00:29:10,310 --> 00:29:14,150 responds to eye width, you know nothing more about the preferred 715 00:29:14,150 --> 00:29:16,490 property of the neuron next to it than about a neuron 716 00:29:16,490 --> 00:29:17,390 a centimeter away. 717 00:29:17,390 --> 00:29:21,165 There's no spatial structure to which feature a given neuron 718 00:29:21,165 --> 00:29:24,890 responds, which means that you absolutely could not 719 00:29:24,890 --> 00:29:27,930 and cannot pick up on that in fMRI, which Doris has shown. 720 00:29:27,930 --> 00:29:30,320 This feature structure information cannot be picked up 721 00:29:30,320 --> 00:29:34,491 on with fMRI, even though it is there and really important. 722 00:29:34,491 --> 00:29:36,740 Another example is valence coding in the amygdala. 
723 00:29:36,740 --> 00:29:38,364 The amygdala contains some neurons that 724 00:29:38,364 --> 00:29:39,980 respond to positively valenced events 725 00:29:39,980 --> 00:29:42,530 and other neurons that respond to negatively valenced events, 726 00:29:42,530 --> 00:29:44,510 and they are as spatially interleaved 727 00:29:44,510 --> 00:29:47,130 as physically possible-- that's what Kay Tye's data shows. 728 00:29:47,130 --> 00:29:49,130 You couldn't get them more spatially interleaved 729 00:29:49,130 --> 00:29:49,750 than they are. 730 00:29:49,750 --> 00:29:53,090 They are as close together as the size of the neurons allows. 731 00:29:53,090 --> 00:29:55,430 So you absolutely will never be able to decode 732 00:29:55,430 --> 00:29:57,890 with fMRI in those populations-- the amygdala-- that there 733 00:29:57,890 --> 00:30:00,098 are different populations for positively and negatively 734 00:30:00,098 --> 00:30:02,120 valenced events, but there are. 735 00:30:02,120 --> 00:30:02,750 OK. 736 00:30:02,750 --> 00:30:05,900 So that means that when you see something in fMRI it's probably 737 00:30:05,900 --> 00:30:07,790 there, but when you don't see it in fMRI 738 00:30:07,790 --> 00:30:09,410 you don't know that it's not there. 739 00:30:09,410 --> 00:30:11,715 And the reason why that's a total catastrophe 740 00:30:11,715 --> 00:30:14,180 is that it means that when I tell you that a region codes 741 00:30:14,180 --> 00:30:15,420 A and not B-- 742 00:30:15,420 --> 00:30:17,130 I don't know that it doesn't code B. 743 00:30:17,130 --> 00:30:19,440 And when I tell you that this thing is 744 00:30:19,440 --> 00:30:22,935 coded in region A and it's not coded in region B-- 745 00:30:22,935 --> 00:30:25,290 I don't know that it's not coded in region B. 746 00:30:25,290 --> 00:30:27,340 So I can never show you a double dissociation. 747 00:30:27,340 --> 00:30:29,131 I can never show you a single dissociation. 
748 00:30:29,131 --> 00:30:30,930 I can never show you a dissociation at all. 749 00:30:30,930 --> 00:30:33,780 All I can say for sure is that the spatial scale 750 00:30:33,780 --> 00:30:37,000 of the information is different between one region and another, 751 00:30:37,000 --> 00:30:39,434 or between one piece of information and another, 752 00:30:39,434 --> 00:30:41,850 and we have no reason to believe that that matters at all. 753 00:30:41,850 --> 00:30:42,349 Right? 754 00:30:42,349 --> 00:30:44,970 Really important things are encoded at very fine spatial 755 00:30:44,970 --> 00:30:45,748 scales. 756 00:30:45,748 --> 00:30:47,956 And so any time I tell you-- which I told you a bunch 757 00:30:47,956 --> 00:30:49,770 of times because I think it's really cool-- 758 00:30:49,770 --> 00:30:52,920 that there's a difference in what feature is encoded where, 759 00:30:52,920 --> 00:30:54,390 you have no reason to believe me. 760 00:30:54,390 --> 00:30:55,910 And that's the catastrophe. 761 00:30:55,910 --> 00:30:56,980 It's a total catastrophe. 762 00:30:56,980 --> 00:31:00,510 If you can't make distinctions, you can't make any conclusions 763 00:31:00,510 --> 00:31:01,680 at all. 764 00:31:01,680 --> 00:31:03,810 I'll just briefly say the other thing 765 00:31:03,810 --> 00:31:05,490 that's a problem with this, which 766 00:31:05,490 --> 00:31:08,500 is that this idea of similarity space-- 767 00:31:08,500 --> 00:31:11,310 the idea that you should think of a concept, like jealous, 768 00:31:11,310 --> 00:31:13,349 as a point in a multidimensional space, 769 00:31:13,349 --> 00:31:15,390 and what it means to think of somebody as jealous 770 00:31:15,390 --> 00:31:17,670 is to think of them as a certain distance 771 00:31:17,670 --> 00:31:20,550 from irritated and angry and proud and impressed-- 772 00:31:20,550 --> 00:31:23,520 that idea has been thoroughly undermined 773 00:31:23,520 --> 00:31:29,180 in psychology and psychophysics and computational cognition. 
774 00:31:29,180 --> 00:31:31,150 It's really a bad theory of concepts. 775 00:31:31,150 --> 00:31:33,930 It can't do any of the work that concepts are supposed to do. 776 00:31:33,930 --> 00:31:35,520 One of the most important things they can't do 777 00:31:35,520 --> 00:31:36,353 is compositionality. 778 00:31:36,353 --> 00:31:38,440 It can't explain the way concepts compose, 779 00:31:38,440 --> 00:31:40,110 which is absolutely critical to the way that we think 780 00:31:40,110 --> 00:31:41,790 and even more critical to the way that we think 781 00:31:41,790 --> 00:31:43,950 about other people's minds, because every thought you have 782 00:31:43,950 --> 00:31:45,510 about somebody else's mental state 783 00:31:45,510 --> 00:31:48,480 is a composition of an agent, a mental state, and a content. 784 00:31:48,480 --> 00:31:52,650 And so, this whole way of thinking about concepts 785 00:31:52,650 --> 00:31:55,350 as points in multi-dimensional spaces 786 00:31:55,350 --> 00:31:58,560 works, but shouldn't work. 787 00:31:58,560 --> 00:32:02,800 And that's the other problem with this whole endeavor. 788 00:32:02,800 --> 00:32:04,890 OK. 789 00:32:04,890 --> 00:32:06,390 There's a bunch of things that we're 790 00:32:06,390 --> 00:32:08,459 doing with this that I will just briefly mention 791 00:32:08,459 --> 00:32:10,750 in case people want to think about it or know about it. 792 00:32:10,750 --> 00:32:12,480 The two things I'm really excited about-- one 793 00:32:12,480 --> 00:32:14,040 is adding temporal information, so looking 794 00:32:14,040 --> 00:32:16,410 at the change in information in brain regions over time, 795 00:32:16,410 --> 00:32:18,100 and how they influence one another. 796 00:32:18,100 --> 00:32:21,480 And that's my post-doc Stefano Anzellotti's project. 
797 00:32:21,480 --> 00:32:23,849 And another thing that I'm excited about is that-- 798 00:32:23,849 --> 00:32:25,890 to the degree that you take these positive claims 799 00:32:25,890 --> 00:32:28,389 as something interesting, which I actually still do in spite 800 00:32:28,389 --> 00:32:31,099 of all my end of the world talk-- 801 00:32:31,099 --> 00:32:32,640 one thing that I think is really neat 802 00:32:32,640 --> 00:32:37,380 is the idea of increasingly differentiable representational 803 00:32:37,380 --> 00:32:40,070 spaces. 804 00:32:40,070 --> 00:32:42,240 So two sets of stimuli that produce clusters 805 00:32:42,240 --> 00:32:43,920 that are not separable-- 806 00:32:43,920 --> 00:32:46,080 for example, in voxel or neural space-- 807 00:32:46,080 --> 00:32:48,180 and making them increasingly distinct. 808 00:32:48,180 --> 00:32:50,890 So Jim DiCarlo calls this unfolding a manifold. 809 00:32:50,890 --> 00:32:51,390 Right? 810 00:32:51,390 --> 00:32:53,880 That idea, which is Jim DiCarlo's model 811 00:32:53,880 --> 00:32:57,030 of the successive processing in stages from V1 to V2 812 00:32:57,030 --> 00:32:59,034 to V4 to IT-- 813 00:32:59,034 --> 00:33:00,450 I think that's a really cool model 814 00:33:00,450 --> 00:33:01,870 of conceptual development. 815 00:33:01,870 --> 00:33:03,570 That what you might have is originally 816 00:33:03,570 --> 00:33:05,640 neural responses that can't separate stimuli 817 00:33:05,640 --> 00:33:07,530 along some interesting dimension-- that 818 00:33:07,530 --> 00:33:10,200 unfold that representational space 819 00:33:10,200 --> 00:33:13,560 to make them more dissimilar as you get that concept more-- 820 00:33:13,560 --> 00:33:16,140 or that dimension or feature of the stimuli 821 00:33:16,140 --> 00:33:18,060 more distinctively represented. 
822 00:33:18,060 --> 00:33:19,680 And so we've tried a first version 823 00:33:19,680 --> 00:33:22,170 of this with justification-- 824 00:33:22,170 --> 00:33:24,660 so kids between age seven and 12 get better and better 825 00:33:24,660 --> 00:33:26,284 at distinguishing people's beliefs that 826 00:33:26,284 --> 00:33:27,630 have good and bad evidence. 827 00:33:27,630 --> 00:33:29,490 And we've shown that that's correlated 828 00:33:29,490 --> 00:33:31,950 with a neural signature in the right TPJ getting 829 00:33:31,950 --> 00:33:35,880 more and more distinct over that same time in those same kids. 830 00:33:35,880 --> 00:33:39,660 And so I think thinking of representational dissimilarity 831 00:33:39,660 --> 00:33:42,840 as a model of conceptual change, while certainly wrong, 832 00:33:42,840 --> 00:33:46,369 is probably really powerful, and I'm very excited about it. 833 00:33:46,369 --> 00:33:47,910 And the last thing I will do is thank 834 00:33:47,910 --> 00:33:51,210 the people who did the work, especially everybody in my lab, 835 00:33:51,210 --> 00:33:52,760 and two PhD students-- 836 00:33:52,760 --> 00:33:55,600 Jorie Koster-Hale and Amy Skerry and you guys. 837 00:33:55,600 --> 00:33:57,470 Thank you.