1
00:00:00,000 --> 00:00:01,976
[SQUEAKING]

2
00:00:01,976 --> 00:00:03,952
[RUSTLING]

3
00:00:03,952 --> 00:00:07,410
[CLICKING]

4
00:00:07,410 --> 00:00:09,707


5
00:00:09,707 --> 00:00:11,040
KIKI GUTIERREZ: My name is Kiki.

6
00:00:11,040 --> 00:00:13,170
I'm a visiting
scholar here at MIT.

7
00:00:13,170 --> 00:00:14,780
I'm actually an
assistant professor

8
00:00:14,780 --> 00:00:18,530
at Polytechnical
University of Madrid.

9
00:00:18,530 --> 00:00:21,030
My background is
actually on engineering.

10
00:00:21,030 --> 00:00:23,490
I am aerospace engineer.

11
00:00:23,490 --> 00:00:28,140
I did my PhD on numerical
methods, but after my postdoc,

12
00:00:28,140 --> 00:00:30,660
I decided to start doing things
that really fascinates me,

13
00:00:30,660 --> 00:00:33,710
and that's how I end up doing
something related with music.

14
00:00:33,710 --> 00:00:36,350
So I'm relatively
new in this field.

15
00:00:36,350 --> 00:00:39,470
Today I'm going to show you
an algorithm that I came up

16
00:00:39,470 --> 00:00:42,200
for pattern detection
in music, which

17
00:00:42,200 --> 00:00:45,240
is what I've been working on
for the last year and a half,

18
00:00:45,240 --> 00:00:46,040
more or less.

19
00:00:46,040 --> 00:00:50,270
So as running example,
I'm going to use

20
00:00:50,270 --> 00:00:57,620
the most listened song of the
history, which is Baby Shark.

21
00:00:57,620 --> 00:01:02,180
So I don't know
if I can play it.

22
00:01:02,180 --> 00:01:07,135
At least, it's the most listened
song of the history of YouTube.

23
00:01:07,135 --> 00:01:07,760
[MUSIC PLAYING]

24
00:01:07,760 --> 00:01:10,380
Baby shark doo doo
doo doo doo doo.

25
00:01:10,380 --> 00:01:12,450
Baby Shark doo doo
doo doo doo doo.

26
00:01:12,450 --> 00:01:14,550
Baby Shark doo doo
doo doo doo doo.

27
00:01:14,550 --> 00:01:15,780
Baby shark.

28
00:01:15,780 --> 00:01:18,720
Mommy shark doo doo doo doo doo.

29
00:01:18,720 --> 00:01:20,790
Mommy shark doo doo doo doo.

30
00:01:20,790 --> 00:01:22,860
Mommy shark doo doo doo doo.

31
00:01:22,860 --> 00:01:23,690
Mommy shark--

32
00:01:23,690 --> 00:01:26,450
OK, is pretty much
like that of the song.

33
00:01:26,450 --> 00:01:28,350
This is the melody.

34
00:01:28,350 --> 00:01:32,030
So what's the problem of
pattern detection in music?

35
00:01:32,030 --> 00:01:36,770
We would like to have a
tool, an automated tool that

36
00:01:36,770 --> 00:01:40,910
helps us to identify over
the score or over a corpus

37
00:01:40,910 --> 00:01:43,280
the important musical
ideas there, you know.

38
00:01:43,280 --> 00:01:46,790
So I'm highlighting
here some fragments,

39
00:01:46,790 --> 00:01:49,740
have a look first at the
blue fragment over there.

40
00:01:49,740 --> 00:01:53,040
There are three fragments,
three occurrences of them.

41
00:01:53,040 --> 00:01:54,870
All of them are
exactly the same.

42
00:01:54,870 --> 00:01:57,590
They have the same notes
with the same pitches

43
00:01:57,590 --> 00:01:59,150
and same durations.

44
00:01:59,150 --> 00:02:00,950
The only difference
is that they occur

45
00:02:00,950 --> 00:02:04,380
at different moments of the
score, but it seems logical.

46
00:02:04,380 --> 00:02:07,880
It seems reasonable to group
them as the same musical idea

47
00:02:07,880 --> 00:02:13,190
and say that they
represent the same pattern.

48
00:02:13,190 --> 00:02:15,170
We can call them--

49
00:02:15,170 --> 00:02:18,210
we can put them a
label, blue pattern.

50
00:02:18,210 --> 00:02:20,780
For the green one,
it's a little trickier

51
00:02:20,780 --> 00:02:24,740
because the occurrence number
2 and the occurrence number 3

52
00:02:24,740 --> 00:02:26,330
are exactly the same.

53
00:02:26,330 --> 00:02:28,560
Probably makes sense
to group them together.

54
00:02:28,560 --> 00:02:31,250
Occurrence number 1
has different durations

55
00:02:31,250 --> 00:02:34,820
for the notes and the
occurrence number 4

56
00:02:34,820 --> 00:02:40,490
is even trickier because they
just start with the rest that--

57
00:02:40,490 --> 00:02:43,950
it's three notes
also and the lyrics.

58
00:02:43,950 --> 00:02:46,160
But it's pretty different.

59
00:02:46,160 --> 00:02:49,760
So that's one of the main
difficulties of the problem

60
00:02:49,760 --> 00:02:51,050
of pattern mining in music.

61
00:02:51,050 --> 00:02:55,310
That even though objects that
we want to group together--

62
00:02:55,310 --> 00:02:59,490
because in our mind represent
the same musical idea--

63
00:02:59,490 --> 00:03:02,740
over the table, they might
look pretty different.

64
00:03:02,740 --> 00:03:05,490
And there are two
mechanisms involved here.

65
00:03:05,490 --> 00:03:07,560
The first one is something
that we have already

66
00:03:07,560 --> 00:03:09,220
studied, transformations.

67
00:03:09,220 --> 00:03:12,960
So if we have a
music fragment, if we

68
00:03:12,960 --> 00:03:16,590
apply a mathematical
transformation over the notes

69
00:03:16,590 --> 00:03:20,970
there, we obtain other fragments
that under some circumstances,

70
00:03:20,970 --> 00:03:23,910
our mind will process
them as equivalent.

71
00:03:23,910 --> 00:03:28,270
And how to deal with
this computationally

72
00:03:28,270 --> 00:03:31,330
from the point of view of
pattern mining algorithms?

73
00:03:31,330 --> 00:03:34,780
My approach was to use
the concept of viewpoint.

74
00:03:34,780 --> 00:03:37,570
I'm going to show you an
example of what is this about,

75
00:03:37,570 --> 00:03:40,080
but for the moment,
it's enough to know

76
00:03:40,080 --> 00:03:43,770
that it has a double task.

77
00:03:43,770 --> 00:03:46,890
On the one hand,
it directly takes

78
00:03:46,890 --> 00:03:48,613
into account these
transformations.

79
00:03:48,613 --> 00:03:49,780
You will see it in a minute.

80
00:03:49,780 --> 00:03:53,950
And probably more important, it
simplifies the representation,

81
00:03:53,950 --> 00:03:57,330
because one of the constants in
this course, in these classes,

82
00:03:57,330 --> 00:04:00,160
is that music is
something complex.

83
00:04:00,160 --> 00:04:04,020
It has complex,
logical structures

84
00:04:04,020 --> 00:04:06,040
among the elements of the score.

85
00:04:06,040 --> 00:04:10,650
And if we find a way to
simplify that representation,

86
00:04:10,650 --> 00:04:13,930
would be helpful from the
computational point of view.

87
00:04:13,930 --> 00:04:17,769
So the viewpoint representation
is pretty simple.

88
00:04:17,769 --> 00:04:19,800
It's just I'm
going to substitute

89
00:04:19,800 --> 00:04:26,830
a complex tune like this one by
a single sequence of symbols.

90
00:04:26,830 --> 00:04:30,060
So actually, here I'm showing
three viewpoints representation.

91
00:04:30,060 --> 00:04:34,240
The pitch viewpoint, it's the
sequence of these symbols 83,

92
00:04:34,240 --> 00:04:35,800
83, 83, 85.

93
00:04:35,800 --> 00:04:37,300
It's the MIDI note.

94
00:04:37,300 --> 00:04:41,100
Then we have the duration
and the onset time.

95
00:04:41,100 --> 00:04:44,050
I'm going to
highlight a fragment.

96
00:04:44,050 --> 00:04:46,680
So you can see the different
viewpoints representations

97
00:04:46,680 --> 00:04:47,800
for that fragment.

98
00:04:47,800 --> 00:04:50,640
And it's nice that the
idea of constructing

99
00:04:50,640 --> 00:04:54,000
the score by overlapping
layers of information.

100
00:04:54,000 --> 00:04:55,230
That's the idea.
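A minimal Python sketch of this layering idea, under stated assumptions: the `Note` class and the `viewpoint` helper are hypothetical names, not the speaker's code. The pitches follow the 83, 83, 83, 85 example from the slide; the durations and onsets are made-up values in quarter notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI note number
    duration: float  # length in quarter notes (assumed unit)
    onset: float     # start time in quarter notes (assumed unit)


def viewpoint(notes, attribute):
    """Project a list of notes onto one attribute, giving a flat symbol sequence."""
    return [getattr(note, attribute) for note in notes]


# Pitches from the talk's example; durations and onsets are illustrative only.
fragment = [
    Note(83, 0.5, 0.0),
    Note(83, 0.5, 0.5),
    Note(83, 0.5, 1.0),
    Note(85, 1.0, 1.5),
]

pitch_view = viewpoint(fragment, "pitch")
duration_view = viewpoint(fragment, "duration")
onset_view = viewpoint(fragment, "onset")
```

Each viewpoint is one layer of the score; a pattern miner can then work on any single layer as an ordinary sequence.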

101
00:04:55,230 --> 00:04:58,830
Formally, a viewpoint
is a mapping

102
00:04:58,830 --> 00:05:03,960
between the current event, the
current time slice, and all

103
00:05:03,960 --> 00:05:08,550
the former ones into a symbol,
usually an integer or float

104
00:05:08,550 --> 00:05:10,420
but could be others.

105
00:05:10,420 --> 00:05:13,560
Those are the
derived viewpoints.

106
00:05:13,560 --> 00:05:17,950
And perhaps for your own
particular music application,

107
00:05:17,950 --> 00:05:21,780
you need to develop your own
viewpoint representation.

108
00:05:21,780 --> 00:05:25,230
And here is clearly highlighted
how the transformations

109
00:05:25,230 --> 00:05:28,440
could be directly taken into
account by using this technique.

110
00:05:28,440 --> 00:05:32,400
For instance, if you apply
the octave transformation

111
00:05:32,400 --> 00:05:35,680
to that pattern
there, that fragment,

112
00:05:35,680 --> 00:05:42,510
you obtain that one over there,
which by the interval viewpoint

113
00:05:42,510 --> 00:05:45,640
representation, we
have the same symbol.

114
00:05:45,640 --> 00:05:48,180
So for sure, any
pattern mining algorithm

115
00:05:48,180 --> 00:05:52,290
that is trying to
find this sequence

116
00:05:52,290 --> 00:05:54,840
will see that that
one is the same.
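A short Python sketch of why a derived viewpoint absorbs the transformation. The function name `interval_viewpoint` is hypothetical, and the pitch values are illustrative MIDI numbers; the point is only that shifting every pitch by an octave leaves the interval sequence unchanged.

```python
def interval_viewpoint(pitches):
    """Derived viewpoint: each symbol is the current pitch minus the previous one."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


original = [83, 83, 83, 85]                # illustrative MIDI pitches
octave_down = [p - 12 for p in original]   # octave transposition

# Both fragments map to the same symbol sequence, so a miner working on
# the interval viewpoint treats them as the same pattern.
same = interval_viewpoint(original) == interval_viewpoint(octave_down)
```

The same holds for any transposition, not just an octave, because a constant shift cancels out in each difference.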

117
00:05:54,840 --> 00:05:58,260
Is that enough to deal with
the concept of dissimilarity

118
00:05:58,260 --> 00:05:59,140
in music?

119
00:05:59,140 --> 00:06:00,580
Unfortunately, no.

120
00:06:00,580 --> 00:06:04,110
So have a look at these two
occurrences of the green pattern

121
00:06:04,110 --> 00:06:05,640
of Baby Shark.

122
00:06:05,640 --> 00:06:08,520
There is no viewpoint
representation

123
00:06:08,520 --> 00:06:11,110
that can take us from
one to the other.

124
00:06:11,110 --> 00:06:13,210
So someone said
at some point, OK,

125
00:06:13,210 --> 00:06:16,500
it would be nice to
have a kind of measure,

126
00:06:16,500 --> 00:06:19,770
a mathematical tool that help
us to evaluate how different two

127
00:06:19,770 --> 00:06:21,480
fragments are.

128
00:06:21,480 --> 00:06:27,180
Like the aim here is that,
OK, if I use that metric

129
00:06:27,180 --> 00:06:31,980
with this guy and this guy,
I would expect a lower value

130
00:06:31,980 --> 00:06:34,080
than the one that
I would obtain when

131
00:06:34,080 --> 00:06:35,920
taking this guy and this guy.

132
00:06:35,920 --> 00:06:39,240
Because they are more far.

133
00:06:39,240 --> 00:06:42,550
They represent further ideas.

134
00:06:42,550 --> 00:06:46,050
So in computer science, they
use what is called the distance

135
00:06:46,050 --> 00:06:46,930
functions.

136
00:06:46,930 --> 00:06:52,260
This in a general scenario is
that you can give to a function

137
00:06:52,260 --> 00:06:55,710
two objects, two random objects,
for instance, me, myself,

138
00:06:55,710 --> 00:06:58,020
and this computer,
and it returns

139
00:06:58,020 --> 00:07:01,800
a non-negative real
number, evaluating

140
00:07:01,800 --> 00:07:05,040
the degree of dissimilarity
between the two objects.

141
00:07:05,040 --> 00:07:07,680
In the context of
sequences, because we

142
00:07:07,680 --> 00:07:09,580
are using viewpoint
representation,

143
00:07:09,580 --> 00:07:12,160
so now this fragment
will be a sequence.

144
00:07:12,160 --> 00:07:13,510
For instance.

145
00:07:13,510 --> 00:07:16,840
Let's think that I'm using
the duration viewpoint.

146
00:07:16,840 --> 00:07:22,000
So in the literature, we have
plenty of distance functions.

147
00:07:22,000 --> 00:07:26,820
This was a very trending topic
in music information retrieval

148
00:07:26,820 --> 00:07:28,930
15 years ago, more or less.

149
00:07:28,930 --> 00:07:32,010
There were many
studies and papers

150
00:07:32,010 --> 00:07:36,880
trying to figure out which was
the best distance function,

151
00:07:36,880 --> 00:07:39,240
depending on the style.

152
00:07:39,240 --> 00:07:42,210
The Levenshtein distance
function, perhaps, it's

153
00:07:42,210 --> 00:07:43,980
not the most advanced
one, but this

154
00:07:43,980 --> 00:07:46,750
is one of the simplest one and
probably the most used one.

155
00:07:46,750 --> 00:07:50,710
And it accounts for the number
of substitutions, deletions,

156
00:07:50,710 --> 00:07:54,710
or insertions to go from
one sequence to the other.

157
00:07:54,710 --> 00:08:01,790
So to go from the sequence 0.5,
0.5, 0.5 to the sequence 0.5,

158
00:08:01,790 --> 00:08:05,120
0.5, 2, we just need a
single substitution.

159
00:08:05,120 --> 00:08:08,830
So the Levenshtein distance
function between the fragment

160
00:08:08,830 --> 00:08:11,038
2 and fragment 4 is 1.

161
00:08:11,038 --> 00:08:13,330
Probably the Levenshtein
distance function between this

162
00:08:13,330 --> 00:08:14,650
guy and this guy it's--

163
00:08:14,650 --> 00:08:17,500
I don't know-- two
or three, at least.
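The edit-counting idea can be sketched in Python. This is the textbook dynamic-programming form of the Levenshtein distance, not necessarily the implementation used in the talk, and the duration sequences are illustrative values rather than the exact ones on the slide.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x by y, or keep a match
            ))
        previous = current
    return previous[-1]


# Illustrative duration viewpoints: one changed value, then three changed values.
one_change = levenshtein([0.5, 0.5, 0.5], [0.5, 0.5, 2.0])
three_changes = levenshtein([0.5, 0.5, 0.5], [1.0, 1.0, 2.0])
```

Because the function only compares symbols for equality, it works on any viewpoint sequence (pitches, intervals, durations) without change.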

164
00:08:17,500 --> 00:08:18,920
So cool.

165
00:08:18,920 --> 00:08:21,730
Now we have all the
ingredients needed

166
00:08:21,730 --> 00:08:26,240
to understand the inputs
and outputs of my algorithm.

167
00:08:26,240 --> 00:08:29,780
We have a corpus or
a piece of music.

168
00:08:29,780 --> 00:08:31,660
We have also the
viewpoint representation

169
00:08:31,660 --> 00:08:33,500
that the user wants to choose.

170
00:08:33,500 --> 00:08:36,020
And some parameters
at the input.

171
00:08:36,020 --> 00:08:39,070
The minimum support is the
minimum number of repetitions

172
00:08:39,070 --> 00:08:43,610
that we require to a pattern
to be considered frequent

173
00:08:43,610 --> 00:08:45,820
and to be included
in the result list.

174
00:08:45,820 --> 00:08:48,410
These two guys, the minimum
length and maximum length,

175
00:08:48,410 --> 00:08:50,980
are the parameters to control
the lengths of the patterns

176
00:08:50,980 --> 00:08:53,590
that we want to mine
and some parameters

177
00:08:53,590 --> 00:08:56,410
to control the degree
of dissimilarity

178
00:08:56,410 --> 00:08:57,740
from the former slide.

179
00:08:57,740 --> 00:09:02,440
And the result is just like
the list of frequent patterns

180
00:09:02,440 --> 00:09:04,040
together with the position.

181
00:09:04,040 --> 00:09:08,650
This is an example with the
piece of the former slide.

182
00:09:08,650 --> 00:09:13,540
Now I'm selecting the interval
viewpoint and those parameters

183
00:09:13,540 --> 00:09:14,240
over there.

184
00:09:14,240 --> 00:09:18,140
And this is an example of the
output that we might obtain.

185
00:09:18,140 --> 00:09:23,590
So we have a yellow pattern
that has four occurrences

186
00:09:23,590 --> 00:09:26,800
and have a look that
each of these fragments

187
00:09:26,800 --> 00:09:33,070
are at most distance one
from any of the rest,

188
00:09:33,070 --> 00:09:35,780
taking the Levenshtein
distance function.

189
00:09:35,780 --> 00:09:38,320
And we might have,
for instance, also

190
00:09:38,320 --> 00:09:42,080
this guy, the purple
pattern, and this green one,

191
00:09:42,080 --> 00:09:43,910
which is of length 4.

192
00:09:43,910 --> 00:09:47,920
So in a real case,
the pattern structure

193
00:09:47,920 --> 00:09:49,810
might be complex, as
you see, because you

194
00:09:49,810 --> 00:09:50,990
have nested patterns.

195
00:09:50,990 --> 00:09:52,330
You have many overlappings.

196
00:09:52,330 --> 00:09:54,010
I won't go into
the details of how

197
00:09:54,010 --> 00:09:57,760
I implemented the algorithm,
just a brief overview.

198
00:09:57,760 --> 00:10:00,080
I divided in four steps.

199
00:10:00,080 --> 00:10:03,510
The one is the creation
of an initial database.

200
00:10:03,510 --> 00:10:05,980
A vertical database
is called sometimes,

201
00:10:05,980 --> 00:10:10,340
where I just take a sliding
window of minimum length,

202
00:10:10,340 --> 00:10:12,620
the minimum length
selected by the user,

203
00:10:12,620 --> 00:10:16,450
and I just create a
database of all the patterns

204
00:10:16,450 --> 00:10:18,770
together with their
positions within the score.

205
00:10:18,770 --> 00:10:23,020
Then in the next step,
I compare every pattern

206
00:10:23,020 --> 00:10:27,910
against all the rest, trying
to see if they satisfies

207
00:10:27,910 --> 00:10:29,410
the distance constraint.

208
00:10:29,410 --> 00:10:32,020
And in that case, I
group them together

209
00:10:32,020 --> 00:10:37,000
into what I call metric
patterns, which are like groups

210
00:10:37,000 --> 00:10:38,730
of fragments of music.

211
00:10:38,730 --> 00:10:41,020
At this stage,
it's safe to delete

212
00:10:41,020 --> 00:10:44,775
the entries that doesn't reach
the minimum support constraint.

213
00:10:44,775 --> 00:10:46,480
They don't have
enough occurrences

214
00:10:46,480 --> 00:10:48,280
to be considered frequent.

215
00:10:48,280 --> 00:10:53,917
And finally, as here, I
still have patterns of length

216
00:10:53,917 --> 00:10:55,750
equal to the minimum
length, and we actually

217
00:10:55,750 --> 00:10:57,020
want longer patterns.

218
00:10:57,020 --> 00:11:00,430
So the last step is
overlapping different patterns

219
00:11:00,430 --> 00:11:03,260
to form longer ones.
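A rough Python sketch of the first three steps, under stated assumptions. The talk gives no implementation details, so the function name `mine_fixed_length`, the parameter `max_distance`, and the choice to group each window with every window within that distance are all hypothetical simplifications. The fourth step (overlapping patterns to form longer ones) is not shown.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Textbook edit distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (x != y)))
        previous = current
    return previous[-1]


def mine_fixed_length(sequence, min_length, min_support, max_distance):
    # Step 1: slide a window of min_length over the viewpoint sequence and
    # record every window together with its start positions.
    database = defaultdict(list)
    for start in range(len(sequence) - min_length + 1):
        database[tuple(sequence[start:start + min_length])].append(start)

    # Step 2: compare every window against all the rest and pool the
    # positions of those within max_distance (assumed grouping rule).
    groups = {}
    for pattern in database:
        positions = []
        for other, other_positions in database.items():
            if levenshtein(pattern, other) <= max_distance:
                positions.extend(other_positions)
        groups[pattern] = sorted(positions)

    # Step 3: drop groups that do not reach the minimum support.
    return {p: pos for p, pos in groups.items() if len(pos) >= min_support}


# Illustrative interval viewpoint; the last window differs by one symbol.
intervals = [0, 0, 2, 0, 0, 2, 0, 0, 3]
found = mine_fixed_length(intervals, min_length=3, min_support=3, max_distance=1)
```

With these made-up values, the windows (0, 0, 2) and (0, 0, 3) end up in the same group at positions 0, 3 and 6, while the other windows are pruned for lack of support.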

220
00:11:03,260 --> 00:11:06,530
And to finish this I'm going
to show you some results.

221
00:11:06,530 --> 00:11:08,200
This is the last slide.

222
00:11:08,200 --> 00:11:10,690
I'm actually already using this.

223
00:11:10,690 --> 00:11:13,300
I started collaborating
with these research groups

224
00:11:13,300 --> 00:11:15,650
like seven months ago.

225
00:11:15,650 --> 00:11:22,490
This is a research group from
another university of Madrid.

226
00:11:22,490 --> 00:11:26,170
They got a good amount of
money of the European Union

227
00:11:26,170 --> 00:11:28,760
to study the Italian operas.

228
00:11:28,760 --> 00:11:34,100
They have a huge database with
many features for each piece.

229
00:11:34,100 --> 00:11:37,180
And in particular, they are
interested in the emotion

230
00:11:37,180 --> 00:11:39,320
that the piece evokes.

231
00:11:39,320 --> 00:11:43,480
So here, we are trying
to find some correlations

232
00:11:43,480 --> 00:11:46,840
between the emotions and
the pattern structure

233
00:11:46,840 --> 00:11:48,560
of the different pieces.

234
00:11:48,560 --> 00:11:51,800
These two other projects
are pretty similar.

235
00:11:51,800 --> 00:11:58,250
The number 2 is on a corpus of
folk music from a very concrete

236
00:11:58,250 --> 00:12:01,650
region of Europe,
around the Pyrenees,

237
00:12:01,650 --> 00:12:07,740
and the second one is the
database of jazz solos.

238
00:12:07,740 --> 00:12:10,940
So here we are building
classifiers, similar to what

239
00:12:10,940 --> 00:12:13,350
we did in the last class.

240
00:12:13,350 --> 00:12:17,480
Here I'm showing the number
of patterns per style

241
00:12:17,480 --> 00:12:21,440
for the different
styles in this corpus,

242
00:12:21,440 --> 00:12:25,520
but perhaps here, we can see
a cool thing of the algorithm.

243
00:12:25,520 --> 00:12:28,440
That it returns also the
position of the patterns,

244
00:12:28,440 --> 00:12:31,370
not only the fact that a
piece contains the pattern.

245
00:12:31,370 --> 00:12:33,200
So here, I'm plotting--

246
00:12:33,200 --> 00:12:35,550
it's called coverage,
this variable,

247
00:12:35,550 --> 00:12:39,770
but it's actually like
the probability function

248
00:12:39,770 --> 00:12:42,680
of encountering a
pattern within a solo.

249
00:12:42,680 --> 00:12:47,240
So I normalize the length
of all the solos by style,

250
00:12:47,240 --> 00:12:50,670
and the thin line is the
average for the solos.
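One way such a coverage curve could be computed, sketched in Python. This is an assumption about the computation, not the speaker's code: the function name `coverage_curve`, the bin count, and the input layout (solo length in notes plus a list of `(start, size)` pattern occurrences) are all hypothetical.

```python
def coverage_curve(solos, bins=4):
    """Average, over solos, of the fraction of notes in each slice of
    normalized solo time that lie inside at least one pattern occurrence."""
    totals = [0.0] * bins
    for length, occurrences in solos:
        # Mark every note index covered by some occurrence.
        covered = [False] * length
        for start, size in occurrences:
            for i in range(start, min(start + size, length)):
                covered[i] = True
        hits = [0] * bins
        counts = [0] * bins
        for i, flag in enumerate(covered):
            b = i * bins // length  # normalized position mapped to a bin index
            counts[b] += 1
            hits[b] += flag
        for b in range(bins):
            if counts[b]:
                totals[b] += hits[b] / counts[b]
    return [t / len(solos) for t in totals]


# Two made-up solos of different lengths, each with pattern occurrences.
solos = [
    (8, [(0, 3)]),          # 8 notes, one occurrence covering notes 0-2
    (4, [(0, 1), (2, 1)]),  # 4 notes, occurrences at notes 0 and 2
]
curve = coverage_curve(solos, bins=4)
```

Normalizing positions to bins is what lets solos of different lengths be averaged on the same axis.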

251
00:12:50,670 --> 00:12:55,220
We can see, for instance, that
the concentration of patterns

252
00:12:55,220 --> 00:12:57,930
tends to be at the beginning
and not in the end.

253
00:12:57,930 --> 00:13:02,360
This is intuitive because
the improvisers usually

254
00:13:02,360 --> 00:13:05,550
start their solos in
a more organized way,

255
00:13:05,550 --> 00:13:08,610
taking thematic material
from the original melody,

256
00:13:08,610 --> 00:13:11,660
and then they start to
do more crazy stuff.

257
00:13:11,660 --> 00:13:14,780
Then, I'm also trying
to build up a pattern

258
00:13:14,780 --> 00:13:18,270
dictionary in an Irish corpus.

259
00:13:18,270 --> 00:13:21,020
These are the five
most frequent patterns

260
00:13:21,020 --> 00:13:25,050
in the subset of minor reels in
the corpus that I'm working on.

261
00:13:25,050 --> 00:13:27,410
So I don't know
for a performer

262
00:13:27,410 --> 00:13:29,550
that is interested in
playing this music.

263
00:13:29,550 --> 00:13:32,900
Perhaps, the take away
of this is, OK, you

264
00:13:32,900 --> 00:13:35,090
might want to start
studying these patterns

265
00:13:35,090 --> 00:13:39,180
because they appear quite
often in the corpus.

266
00:13:39,180 --> 00:13:42,590
So better to first master them.

267
00:13:42,590 --> 00:13:46,880
For the future with
Michael, we want

268
00:13:46,880 --> 00:13:49,760
to study how the
algorithm performs

269
00:13:49,760 --> 00:13:52,380
in a nice corpus of
early music that he has,

270
00:13:52,380 --> 00:13:55,890
and also, we want to analyze
some solos of Charlie Parker,

271
00:13:55,890 --> 00:13:59,420
who was one of the most
mathematical improvisers

272
00:13:59,420 --> 00:14:00,500
of the history.

273
00:14:00,500 --> 00:14:02,660
And also for the
close future, I would

274
00:14:02,660 --> 00:14:06,080
like to see if this tool can
be helpful for plagiarism

275
00:14:06,080 --> 00:14:06,980
detection.

276
00:14:06,980 --> 00:14:12,530
I suspect that two pieces
that are very similar

277
00:14:12,530 --> 00:14:16,190
also have some similarities
in their pattern structure.

278
00:14:16,190 --> 00:14:17,550
And that's all.

279
00:14:17,550 --> 00:14:19,100
Thank you very much guys.

280
00:14:19,100 --> 00:14:20,020
Any questions?

281
00:14:20,020 --> 00:14:23,250
[APPLAUSE]

282
00:14:23,250 --> 00:14:24,793


283
00:14:24,793 --> 00:14:26,210
MICHAEL CUTHBERT:
By the way, just

284
00:14:26,210 --> 00:14:27,930
so you get a sense
for your final things,

285
00:14:27,930 --> 00:14:29,010
that was 12 minutes.

286
00:14:29,010 --> 00:14:32,875
So a tiny bit longer than
what you have as a maximum.

287
00:14:32,875 --> 00:14:37,940


288
00:14:37,940 --> 00:14:39,170
KIKI GUTIERREZ: Cool.

289
00:14:39,170 --> 00:14:41,570
If any of you are interested
in a particular thing

290
00:14:41,570 --> 00:14:43,530
of the algorithm,
just drop me an email,

291
00:14:43,530 --> 00:14:46,190
and we can talk about it.

292
00:14:46,190 --> 00:14:47,277
Thank you.

293
00:14:47,277 --> 00:14:48,860
MICHAEL CUTHBERT:
So what I want to do

294
00:14:48,860 --> 00:14:51,830
is I want to immediately
start by putting

295
00:14:51,830 --> 00:14:54,750
some of these thoughts to work.

296
00:14:54,750 --> 00:15:00,206
So get with a
partner next to you.

297
00:15:00,206 --> 00:15:00,900
You can do that.

298
00:15:00,900 --> 00:15:02,400
Over here, we need
a group of three.

299
00:15:02,400 --> 00:15:03,775
Or unless you want
to participate

300
00:15:03,775 --> 00:15:09,140
with a friend or jump over
there or jump over with Jordan,

301
00:15:09,140 --> 00:15:10,970
if you do.

302
00:15:10,970 --> 00:15:14,990
And here are six melodies.

303
00:15:14,990 --> 00:15:18,090
I'm going to play
each one of them.

304
00:15:18,090 --> 00:15:20,370
I'm going to play A a
bunch of times also.

305
00:15:20,370 --> 00:15:23,840
So I'll tell you which one I'm
playing, and you're going to--

306
00:15:23,840 --> 00:15:27,870
and you can talk during this or
wait till just after playing.

307
00:15:27,870 --> 00:15:33,457
You're going to
tell me which melody

308
00:15:33,457 --> 00:15:37,006
is most or least similar to A.

309
00:15:37,006 --> 00:15:43,677
[PLAYING INSTRUMENT]

310
00:15:43,677 --> 00:15:46,020
OK.

311
00:15:46,020 --> 00:15:51,640
And this is B. And I
want you to say why.

312
00:15:51,640 --> 00:15:54,996
[PLAYING INSTRUMENT]

313
00:15:54,996 --> 00:15:59,870


314
00:15:59,870 --> 00:16:02,823
This is C.

315
00:16:02,823 --> 00:16:06,190
[PLAYING INSTRUMENT]

316
00:16:06,190 --> 00:16:09,560


317
00:16:09,560 --> 00:16:11,390
And I'm going to
play A one more time

318
00:16:11,390 --> 00:16:13,636
to get it back into our heads.

319
00:16:13,636 --> 00:16:16,898
[PLAYING INSTRUMENT]

320
00:16:16,898 --> 00:16:19,700


321
00:16:19,700 --> 00:16:23,210
And now D--

322
00:16:23,210 --> 00:16:24,950
Oh, and by the way,
this isn't a right--

323
00:16:24,950 --> 00:16:26,180
isn't a right answer one.

324
00:16:26,180 --> 00:16:29,470
[PLAYING INSTRUMENT]

325
00:16:29,470 --> 00:16:32,300


326
00:16:32,300 --> 00:16:35,000
Great.

327
00:16:35,000 --> 00:16:36,180
Now E.

328
00:16:36,180 --> 00:16:39,484
[PLAYING INSTRUMENT]

329
00:16:39,484 --> 00:16:44,220


330
00:16:44,220 --> 00:16:49,688
And now A one more time before
doing F. So this is A again.

331
00:16:49,688 --> 00:16:53,146
[PLAYING INSTRUMENT]

332
00:16:53,146 --> 00:16:56,610


333
00:16:56,610 --> 00:17:03,240
And now F.

334
00:17:03,240 --> 00:17:04,680
[PLAYING INSTRUMENT]

335
00:17:04,680 --> 00:17:06,970
Oops, that's F and
E simultaneously,

336
00:17:06,970 --> 00:17:07,980
which is very different.

337
00:17:07,980 --> 00:17:09,304
Now F.

338
00:17:09,304 --> 00:17:12,736
[PLAYING INSTRUMENT]

339
00:17:12,736 --> 00:17:16,710


340
00:17:16,710 --> 00:17:17,520
Great.

341
00:17:17,520 --> 00:17:18,609
Talk amongst yourselves.

342
00:17:18,609 --> 00:17:21,062
I want a ranking
from your group,

343
00:17:21,062 --> 00:17:22,145
so somebody write it down.

344
00:17:22,145 --> 00:17:27,900


345
00:17:27,900 --> 00:17:29,663
And I want to know why.

346
00:17:29,663 --> 00:17:33,000


347
00:17:33,000 --> 00:17:34,823
OK, let's bring it
all back together.

348
00:17:34,823 --> 00:17:37,650


349
00:17:37,650 --> 00:17:40,350
Let's bring it
all back together.

350
00:17:40,350 --> 00:17:45,570
So looking at
your list, and we'll

351
00:17:45,570 --> 00:17:47,050
count like normal human beings.

352
00:17:47,050 --> 00:17:50,440
Whichever one is the
top in your list is one.

353
00:17:50,440 --> 00:17:52,390
Whichever one
second, two, stuff.

354
00:17:52,390 --> 00:17:58,000
And we'll vote by fingers first
off for the general ranking.

355
00:17:58,000 --> 00:18:02,775
So if you hold up the
number of fingers that--

356
00:18:02,775 --> 00:18:05,710
if it's closest,
hold up one finger.

357
00:18:05,710 --> 00:18:07,600
If not, two-- whatever.

358
00:18:07,600 --> 00:18:11,350
And full palm means I
have no idea what that is.

359
00:18:11,350 --> 00:18:22,410
OK, B. Oh, B's doing pretty well.
A lot of twos, a couple of ones.

360
00:18:22,410 --> 00:18:27,112
C. OK, well, a little
bit more variety.

361
00:18:27,112 --> 00:18:29,070
Hold them up so that
other people can see also.

362
00:18:29,070 --> 00:18:29,770
Look around the room.

363
00:18:29,770 --> 00:18:30,450
It's not just for me.

364
00:18:30,450 --> 00:18:30,950
Great.

365
00:18:30,950 --> 00:18:31,740
Great.

366
00:18:31,740 --> 00:18:38,110
D. Oh, OK, we got a variety
who don't know, whatever.

367
00:18:38,110 --> 00:18:39,090
Great.

368
00:18:39,090 --> 00:18:45,490
E. Oh, still see
some palms or things.

369
00:18:45,490 --> 00:18:45,990
Great.

370
00:18:45,990 --> 00:18:50,280
And F. OK.

371
00:18:50,280 --> 00:18:52,120
Oh, we have one four.

372
00:18:52,120 --> 00:18:54,120
Good, good.

373
00:18:54,120 --> 00:18:55,380
What do you-- you put it down.

374
00:18:55,380 --> 00:18:58,095
What do you guys--
those who put F as four,

375
00:18:58,095 --> 00:18:59,095
what do you put as five?

376
00:18:59,095 --> 00:18:59,597
AUDIENCE: D.

377
00:18:59,597 --> 00:19:00,430
MICHAEL CUTHBERT: B.

378
00:19:00,430 --> 00:19:00,930
AUDIENCE: D.

379
00:19:00,930 --> 00:19:02,820
MICHAEL CUTHBERT:
D. OK, D. Good.

380
00:19:02,820 --> 00:19:03,640
Good.

381
00:19:03,640 --> 00:19:05,100
Great.

382
00:19:05,100 --> 00:19:13,065
Somebody who had C above
B justify your answer.

383
00:19:13,065 --> 00:19:14,260
Who had C above B?

384
00:19:14,260 --> 00:19:15,510
There were a couple people.

385
00:19:15,510 --> 00:19:16,980
Yeah, go ahead.

386
00:19:16,980 --> 00:19:18,960
Jake first.

387
00:19:18,960 --> 00:19:20,580
Yeah, groups.

388
00:19:20,580 --> 00:19:23,940
AUDIENCE: C was basically just a
transposition, whereas B like--

389
00:19:23,940 --> 00:19:26,700
B changed a lot of
the rhythms a bit.

390
00:19:26,700 --> 00:19:31,170
B, I think, fits the exact
pitches a little better, but--

391
00:19:31,170 --> 00:19:34,900
or aside from a couple places
where it has some accidentals,

392
00:19:34,900 --> 00:19:37,600
but it changes the
rhythm quite a bit.

393
00:19:37,600 --> 00:19:39,460
Whereas C is just
a transposition.

394
00:19:39,460 --> 00:19:41,400
So for relative
pitch essentially.

395
00:19:41,400 --> 00:19:43,495
We weighted transposition
equivalence more.

396
00:19:43,495 --> 00:19:44,620
MICHAEL CUTHBERT: OK, good.

397
00:19:44,620 --> 00:19:44,790
Good.

398
00:19:44,790 --> 00:19:47,020
I'm already hearing words
that I'm liking stuff.

399
00:19:47,020 --> 00:19:47,558
Great.

400
00:19:47,558 --> 00:19:50,100
Somebody who had it the other
way around justify your answer.

401
00:19:50,100 --> 00:19:53,265
Who had it the other way around?

402
00:19:53,265 --> 00:19:58,140
Who had-- a bunch of people
had B above C. Yeah, John.

403
00:19:58,140 --> 00:20:03,360
AUDIENCE: I'm looking for
the sound feel that A had.

404
00:20:03,360 --> 00:20:07,180
Because a lot of the comments
are probably similar to A and B.

405
00:20:07,180 --> 00:20:09,790
That's why we put it a
bit higher relative to C.

406
00:20:09,790 --> 00:20:13,620
And when you take transposition
into account, only part of C's

407
00:20:13,620 --> 00:20:15,420
becoming transposed,
so it doesn't even

408
00:20:15,420 --> 00:20:17,540
feel like it's like
a full transposition.

409
00:20:17,540 --> 00:20:18,540
MICHAEL CUTHBERT: Great.

410
00:20:18,540 --> 00:20:19,870
Only part of it's transposed.

411
00:20:19,870 --> 00:20:22,950
Yeah, yeah, it kind of
gets back on for a bit

412
00:20:22,950 --> 00:20:24,700
and then comes back off.

413
00:20:24,700 --> 00:20:26,810
Great.

414
00:20:26,810 --> 00:20:30,880
Who had D-- who had D one two?

415
00:20:30,880 --> 00:20:31,410
Anybody?

416
00:20:31,410 --> 00:20:33,660
I can't remember.

417
00:20:33,660 --> 00:20:36,300
Why do you have one or two?

418
00:20:36,300 --> 00:20:38,770
AUDIENCE: Because it
basically has this--

419
00:20:38,770 --> 00:20:42,120
so it has all of the same
notes in the right positions--

420
00:20:42,120 --> 00:20:45,026
or basically--

421
00:20:45,026 --> 00:20:47,040
yeah, all you have to
do is remove notes,

422
00:20:47,040 --> 00:20:49,650
and then you get the
same thing, and--

423
00:20:49,650 --> 00:20:51,190
I think with a few exceptions.

424
00:20:51,190 --> 00:20:52,980
But that never happened
in any of them.

425
00:20:52,980 --> 00:20:53,980
MICHAEL CUTHBERT: Great.

426
00:20:53,980 --> 00:20:57,000
Who had D very low?

427
00:20:57,000 --> 00:20:58,700
Yeah, go ahead, Tony.

428
00:20:58,700 --> 00:20:59,900
AUDIENCE: Like before--

429
00:20:59,900 --> 00:21:01,150
MICHAEL CUTHBERT: Sorry, Tony.

430
00:21:01,150 --> 00:21:02,025
Can you speak louder?

431
00:21:02,025 --> 00:21:06,773
AUDIENCE: Before, like a
rhythm, I guess, the same.

432
00:21:06,773 --> 00:21:07,940
MICHAEL CUTHBERT: OK, great.

433
00:21:07,940 --> 00:21:12,330


434
00:21:12,330 --> 00:21:18,030
Did anybody have E above four?

435
00:21:18,030 --> 00:21:19,446
OK.

436
00:21:19,446 --> 00:21:21,870
AUDIENCE: We have D at three.

437
00:21:21,870 --> 00:21:26,173
And we put it there because
the pattern was very similar,

438
00:21:26,173 --> 00:21:27,840
if not identical,
even though the melody

439
00:21:27,840 --> 00:21:29,110
wasn't all that close.

440
00:21:29,110 --> 00:21:31,205
So we figured that probably
counted for something.

441
00:21:31,205 --> 00:21:33,330
MICHAEL CUTHBERT: So when
you say pattern, what's--

442
00:21:33,330 --> 00:21:33,720
AUDIENCE: It's like the rhythm.

443
00:21:33,720 --> 00:21:34,320
MICHAEL CUTHBERT: The rhythm.

444
00:21:34,320 --> 00:21:36,070
The rhythmic pattern,
it's about the same.

445
00:21:36,070 --> 00:21:37,768
Good.

446
00:21:37,768 --> 00:21:39,310
AUDIENCE: That's an
inversion, right?

447
00:21:39,310 --> 00:21:41,090
MICHAEL CUTHBERT:
Is it an inversion?

448
00:21:41,090 --> 00:21:41,960
No.

449
00:21:41,960 --> 00:21:43,600
No.

450
00:21:43,600 --> 00:21:45,640
Would I do that?

451
00:21:45,640 --> 00:21:47,150
AUDIENCE: The inversion.

452
00:21:47,150 --> 00:21:49,692
MICHAEL CUTHBERT: By the way,
I can't remember where melody--

453
00:21:49,692 --> 00:21:54,490
I think melody A comes from
a Huron book and then gives--

454
00:21:54,490 --> 00:21:58,630
there's some search book, and
I should have my notes better,

455
00:21:58,630 --> 00:22:01,210
and I'll try to make sure
it gets annotated later.

456
00:22:01,210 --> 00:22:02,870
That had three other melodies.

457
00:22:02,870 --> 00:22:05,140
I tried this in the past,
and it was so obvious

458
00:22:05,140 --> 00:22:06,928
that everybody had the
exact same ranking.

459
00:22:06,928 --> 00:22:07,970
I had to agree with them.

460
00:22:07,970 --> 00:22:10,580
But I think this is better
for making some arguments.

461
00:22:10,580 --> 00:22:11,760
Good.

462
00:22:11,760 --> 00:22:14,920
Who had F anything but--

463
00:22:14,920 --> 00:22:16,820
actually, somebody
who gave F five?

464
00:22:16,820 --> 00:22:19,095
Jonathan, why would you--
did you give F five?

465
00:22:19,095 --> 00:22:19,720
AUDIENCE: Yeah.

466
00:22:19,720 --> 00:22:21,580
MICHAEL CUTHBERT: Why
did you give it five?

467
00:22:21,580 --> 00:22:26,660
AUDIENCE: I mean, it didn't have
any noticeable similarities.

468
00:22:26,660 --> 00:22:30,550
Like at first, it seemed
closer to E in terms of-- it

469
00:22:30,550 --> 00:22:33,400
might have been inversion
but then not really.

470
00:22:33,400 --> 00:22:36,890
The rhythm is also completely
different-- or not completely,

471
00:22:36,890 --> 00:22:38,420
but it's fairly different.

472
00:22:38,420 --> 00:22:39,700
MICHAEL CUTHBERT: Great.

473
00:22:39,700 --> 00:22:44,464
So I'm just going to
point, and we'll get some--

474
00:22:44,464 --> 00:22:47,560


475
00:22:47,560 --> 00:22:49,550
just what your ranking is.

476
00:22:49,550 --> 00:22:50,830
So say them from--

477
00:22:50,830 --> 00:22:52,960
Matthew, what was yours?

478
00:22:52,960 --> 00:22:54,220
AUDIENCE: Alphabetical order.

479
00:22:54,220 --> 00:22:55,790
MICHAEL CUTHBERT:
B, C, D, E, F--

480
00:22:55,790 --> 00:22:58,590
OK, good.

481
00:22:58,590 --> 00:22:59,590
AUDIENCE: B, C, D, E, F.

482
00:22:59,590 --> 00:23:01,535
MICHAEL CUTHBERT: B, C, D, E, F.

483
00:23:01,535 --> 00:23:02,870
AUDIENCE: [INAUDIBLE]

484
00:23:02,870 --> 00:23:05,860
MICHAEL CUTHBERT: B,
C, E, D, F. Great.

485
00:23:05,860 --> 00:23:07,510
Great.

486
00:23:07,510 --> 00:23:09,080
Vincent?

487
00:23:09,080 --> 00:23:10,270
AUDIENCE: C, B, D--

488
00:23:10,270 --> 00:23:12,400
MICHAEL CUTHBERT:
C, B, D-- good.

489
00:23:12,400 --> 00:23:14,650
And anything it
feels like you're not

490
00:23:14,650 --> 00:23:16,220
being represented on there?

491
00:23:16,220 --> 00:23:18,850
Hannah, what's your group have?

492
00:23:18,850 --> 00:23:22,270
AUDIENCE: I think we
put C, B, D, E, F.

493
00:23:22,270 --> 00:23:24,945
MICHAEL CUTHBERT: C, B,
D, E, F-- great, super.

494
00:23:24,945 --> 00:23:26,320
Now what I want
you to do-- we're

495
00:23:26,320 --> 00:23:29,150
not going to get through
all of the exercises today,

496
00:23:29,150 --> 00:23:31,215
but I think this is the
most important part.

497
00:23:31,215 --> 00:23:35,763
What I want you to do is
think about what ways--

498
00:23:35,763 --> 00:23:38,037


499
00:23:38,037 --> 00:23:39,620
I'll give you a
little bit of things--

500
00:23:39,620 --> 00:23:42,640
what are some ways you can
make sure that your computer

501
00:23:42,640 --> 00:23:46,250
system that is going to
classify things by similarity

502
00:23:46,250 --> 00:23:49,790
follows your intuition
of what is similar,

503
00:23:49,790 --> 00:23:54,020
and not somebody else's
intuition for what is similar?

504
00:23:54,020 --> 00:23:57,080
So that's going to be the main
theme for the rest of this.

505
00:23:57,080 --> 00:23:59,570
So we are intelligent people.

506
00:23:59,570 --> 00:24:01,100
We are intelligent musicians.

507
00:24:01,100 --> 00:24:04,330
We make these
choices, and yet we

508
00:24:04,330 --> 00:24:09,540
are making differences on how
far and how similar they are.

509
00:24:09,540 --> 00:24:11,290
So, in fact, I'm going
to blank the screen

510
00:24:11,290 --> 00:24:16,820
and say the one takeaway
from today's lecture, I hope,

511
00:24:16,820 --> 00:24:19,150
and from all these
things, is that there

512
00:24:19,150 --> 00:24:25,610
is no right answer for the
similarity between two melodies,

513
00:24:25,610 --> 00:24:28,460
between the similarity
between two pieces.

514
00:24:28,460 --> 00:24:30,350
There may be wrong answers.

515
00:24:30,350 --> 00:24:33,100
I will not deny that,
that if somebody

516
00:24:33,100 --> 00:24:37,510
said that F was closer to, I
don't know, than the same thing

517
00:24:37,510 --> 00:24:39,200
with one note
changed or something,

518
00:24:39,200 --> 00:24:43,420
I would think that that might be
wrong, that your program might

519
00:24:43,420 --> 00:24:44,360
be malfunctioning.

520
00:24:44,360 --> 00:24:45,820
But there isn't a right answer.

521
00:24:45,820 --> 00:24:47,200
And a lot of it--

522
00:24:47,200 --> 00:24:50,290
what's different
between good answers

523
00:24:50,290 --> 00:24:53,980
are what we think
of as important

524
00:24:53,980 --> 00:24:55,370
when thinking similarity.

525
00:24:55,370 --> 00:24:57,410
There is a yearly competition--

526
00:24:57,410 --> 00:24:59,683
I think it's been
suspended since COVID,

527
00:24:59,683 --> 00:25:01,100
so I don't know
if it's restarted,

528
00:25:01,100 --> 00:25:06,460
but for the algorithm
that can classify songs

529
00:25:06,460 --> 00:25:07,970
as the most similar.

530
00:25:07,970 --> 00:25:09,800
And here is a place
where I would say,

531
00:25:09,800 --> 00:25:11,780
what are your ground truths?

532
00:25:11,780 --> 00:25:14,720
How do we trust that you
have gotten it right?

533
00:25:14,720 --> 00:25:20,120
And are we just trying
to recreate the views--

534
00:25:20,120 --> 00:25:22,390
I won't say biases, but
the views of the people

535
00:25:22,390 --> 00:25:28,940
who organize the conference and
what's that going to do for us?

536
00:25:28,940 --> 00:25:32,540
So I want you to
start thinking about that.

537
00:25:32,540 --> 00:25:36,710
And I will tell you
what F is beforehand.

538
00:25:36,710 --> 00:25:40,890
F is one that a lot of
computer programs--

539
00:25:40,890 --> 00:25:46,580
in fact, what I went aha
during one of these algorithms,

540
00:25:46,580 --> 00:25:47,120
they could--

541
00:25:47,120 --> 00:25:50,220
F is one that a number
of algorithms,

542
00:25:50,220 --> 00:25:54,590
especially older ones, will
classify as the most similar.

543
00:25:54,590 --> 00:25:57,050
Because what is F?

544
00:25:57,050 --> 00:26:03,740
Unlike any of the other
lines, F has every single note

545
00:26:03,740 --> 00:26:06,420
and every single rhythm,
if I did it right.

546
00:26:06,420 --> 00:26:07,430
I was doing it in my head.

547
00:26:07,430 --> 00:26:10,630
Every single note and every
single rhythm from A--

548
00:26:10,630 --> 00:26:13,670
just order didn't matter.

549
00:26:13,670 --> 00:26:20,630
A is the counterset function,
the unordered version, the P--

550
00:26:20,630 --> 00:26:25,550
yeah, the permutation does
not matter version of F.

551
00:26:25,550 --> 00:26:28,590
Or F is the permutation does
not matter version of A.
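
A minimal Python sketch of the "order does not matter" comparison described here. The two melodies below are made-up stand-ins, not the lecture's melodies A and F, and writing each note as a (pitch, duration) pair is an assumption for illustration. `collections.Counter` treats each melody as a multiset, so two melodies match whenever they contain the same notes and rhythms the same number of times.

```python
from collections import Counter

# Hypothetical melodies: (pitch name, duration in quarter notes).
melody_a = [("C4", 1.0), ("D4", 0.5), ("E4", 0.5), ("C4", 1.0)]
melody_f = [("E4", 0.5), ("C4", 1.0), ("C4", 1.0), ("D4", 0.5)]  # same notes, reordered


def same_multiset(x, y):
    """True when x and y contain the same notes the same number of times."""
    return Counter(x) == Counter(y)


unordered_match = same_multiset(melody_a, melody_f)  # order is ignored
ordered_match = melody_a == melody_f                 # order matters
```

An algorithm that only compares these multisets would rate the reordered melody as identical to the original, which is why such a measure can disagree with a listener's ranking.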

552
00:26:28,590 --> 00:26:30,218
Did I get it right?

553
00:26:30,218 --> 00:26:31,260
AUDIENCE: It looks right.

554
00:26:31,260 --> 00:26:32,302
AUDIENCE: It looks right.

555
00:26:32,302 --> 00:26:34,608


556
00:26:34,608 --> 00:26:37,588
MICHAEL CUTHBERT: So
we're going to go quickly

557
00:26:37,588 --> 00:26:39,380
through some things I
think you've probably

558
00:26:39,380 --> 00:26:42,390
seen before, some ways
of measuring distance.

559
00:26:42,390 --> 00:26:47,000
You all learned this at some
point, the Euclidean distance

560
00:26:47,000 --> 00:26:48,900
between two points.

561
00:26:48,900 --> 00:26:51,390
Take the square
root of the x terms.

562
00:26:51,390 --> 00:26:53,190
Take the square
root of the y term.

563
00:26:53,190 --> 00:26:57,350
Add, what, difference squared
plus difference squared

564
00:26:57,350 --> 00:26:59,610
square root.

565
00:26:59,610 --> 00:27:02,720
Square root of x squared
plus difference between x

566
00:27:02,720 --> 00:27:03,925
and difference between y.

567
00:27:03,925 --> 00:27:06,800


568
00:27:06,800 --> 00:27:11,120
So anyone seen this thing, where
the distance between these two

569
00:27:11,120 --> 00:27:17,000
points, 3 comma 2
and 7 comma 8, is 10.

570
00:27:17,000 --> 00:27:21,200
Taxicab distance--
Manhattan distance,

571
00:27:21,200 --> 00:27:23,480
we'll go with taxicabs
since not all of us

572
00:27:23,480 --> 00:27:27,630
have been to Manhattan and had
the joys of taking a taxi there.

573
00:27:27,630 --> 00:27:29,840
And so why this--

574
00:27:29,840 --> 00:27:32,180
here, the distance was 10.

575
00:27:32,180 --> 00:27:34,923
Here, it's approximately--
that's not a negative sign.

576
00:27:34,923 --> 00:27:36,090
That's an approximate sign--

577
00:27:36,090 --> 00:27:39,296
approximately 7.2.
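
As a worked check of the two numbers on the slide, here is a short Python sketch that computes both distances for the points (3, 2) and (7, 8). The function names are chosen for illustration only.

```python
import math

p, q = (3, 2), (7, 8)


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def taxicab(a, b):
    """Manhattan distance: sum of the absolute differences along each axis."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


d_euclid = euclidean(p, q)  # sqrt(4**2 + 6**2) = sqrt(52)
d_taxi = taxicab(p, q)      # 4 + 6
```

The taxicab value is larger because movement is restricted to the axes, while the Euclidean value allows the diagonal.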

578
00:27:39,296 --> 00:27:42,620
Why is the distance
greater here?

579
00:27:42,620 --> 00:27:46,610
Somebody who's done this
triangle inequality.

580
00:27:46,610 --> 00:27:50,210
Talk English to me for a second.

581
00:27:50,210 --> 00:27:54,588
Talk like you're talking
to your cab driver

582
00:27:54,588 --> 00:27:55,880
who you're explaining to this--

583
00:27:55,880 --> 00:27:57,088
cab drivers are really smart.

584
00:27:57,088 --> 00:28:00,180
Talk, but who may not have
heard of the triangle inequality.

585
00:28:00,180 --> 00:28:03,410
What is represented
by the term Manhattan

586
00:28:03,410 --> 00:28:07,730
distance or taxicab distance?

587
00:28:07,730 --> 00:28:13,220
What's the notion-- intuition?

588
00:28:13,220 --> 00:28:14,133
Yeah?

589
00:28:14,133 --> 00:28:15,300
AUDIENCE: Go along the axes.

590
00:28:15,300 --> 00:28:19,020
MICHAEL CUTHBERT: Go along
the axes or that go along--

591
00:28:19,020 --> 00:28:21,290
let's get more literal.

592
00:28:21,290 --> 00:28:23,210
One of the things that
we don't do so well

593
00:28:23,210 --> 00:28:24,780
is step back into
the real world.

594
00:28:24,780 --> 00:28:26,940
What is the distance traveled?

595
00:28:26,940 --> 00:28:33,420
What constrains the taxicab
to a distance of 10

596
00:28:33,420 --> 00:28:34,170
rather than 7.2?

597
00:28:34,170 --> 00:28:34,760
Yeah?

598
00:28:34,760 --> 00:28:37,550
AUDIENCE: You get straight
up and down to the side.

599
00:28:37,550 --> 00:28:38,450
MICHAEL CUTHBERT: You
can only go straight

600
00:28:38,450 --> 00:28:39,180
up and down to the side.

601
00:28:39,180 --> 00:28:41,250
You can only go on-- let's
go even further back.

602
00:28:41,250 --> 00:28:44,010
What, in Manhattan, if you
don't want to get arrested,

603
00:28:44,010 --> 00:28:45,320
you can only drive on?

604
00:28:45,320 --> 00:28:46,070
AUDIENCE: Streets.

605
00:28:46,070 --> 00:28:47,153
MICHAEL CUTHBERT: Streets.

606
00:28:47,153 --> 00:28:49,400
And the streets
in Manhattan go--

607
00:28:49,400 --> 00:28:50,360
AUDIENCE: Orthogonal.

608
00:28:50,360 --> 00:28:51,690
MICHAEL CUTHBERT: Yeah,
they're orthogonal.

609
00:28:51,690 --> 00:28:53,220
There are these little lines.

610
00:28:53,220 --> 00:28:56,400
So you are constrained
in where you can go.

611
00:28:56,400 --> 00:28:59,340
So if there are constraints
on your distance,

612
00:28:59,340 --> 00:29:05,150
and the most common one is you
can go up, down, left, or right.

613
00:29:05,150 --> 00:29:08,150
You can't always do that
in Manhattan because

614
00:29:08,150 --> 00:29:08,790
of one-ways.

615
00:29:08,790 --> 00:29:11,340
But let's assume that we
have certain constraints.

616
00:29:11,340 --> 00:29:12,690
You can be brought down.

617
00:29:12,690 --> 00:29:13,190
Good.

618
00:29:13,190 --> 00:29:15,530
I wanted to make sure
that we all have that,

619
00:29:15,530 --> 00:29:18,380
and so that we can
start thinking about--

620
00:29:18,380 --> 00:29:21,420


621
00:29:21,420 --> 00:29:23,540
first off, that what
operations are allowed

622
00:29:23,540 --> 00:29:25,890
determines the distance metric.

623
00:29:25,890 --> 00:29:28,340
What operations are
allowed determines

624
00:29:28,340 --> 00:29:29,790
how far the distances are.

625
00:29:29,790 --> 00:29:32,223
What are some operations
we do in music?

626
00:29:32,223 --> 00:29:36,350


627
00:29:36,350 --> 00:29:37,350
That's a question.

628
00:29:37,350 --> 00:29:40,290
What operations do we
allow and not allow?

629
00:29:40,290 --> 00:29:40,790
Adam?

630
00:29:40,790 --> 00:29:42,600
AUDIENCE: We could look
at MIDI difference between notes.

631
00:29:42,600 --> 00:29:45,225
MICHAEL CUTHBERT: We can look at
MIDI difference between notes.

632
00:29:45,225 --> 00:29:50,460
So therefore, we can take notes
and bring them higher and lower.

633
00:29:50,460 --> 00:29:52,890
We can raise and lower notes.

634
00:29:52,890 --> 00:29:54,535
What are other things we can do?

635
00:29:54,535 --> 00:29:58,190


636
00:29:58,190 --> 00:30:00,890
What are some things you've
ever done with a piece

637
00:30:00,890 --> 00:30:04,910
to make it a little bit
different or interesting?

638
00:30:04,910 --> 00:30:05,810
Yeah?

639
00:30:05,810 --> 00:30:08,030
AUDIENCE: You can
subdivide or combine notes.

640
00:30:08,030 --> 00:30:11,000
MICHAEL CUTHBERT: You can
subdivide or combine notes.

641
00:30:11,000 --> 00:30:12,330
Maybe you can, maybe you can't.

642
00:30:12,330 --> 00:30:13,740
But yeah, quite often you can.

643
00:30:13,740 --> 00:30:16,850
This is a context where you can.

644
00:30:16,850 --> 00:30:17,510
Yeah, other--

645
00:30:17,510 --> 00:30:18,427
AUDIENCE: --durations.

646
00:30:18,427 --> 00:30:22,400
MICHAEL CUTHBERT: You can
change durations-- great, super.

647
00:30:22,400 --> 00:30:24,800
How about this?

648
00:30:24,800 --> 00:30:29,220
Which of these two chords
are closer to the first one?

649
00:30:29,220 --> 00:30:36,103
The first one is going
to be C major versus--

650
00:30:36,103 --> 00:30:43,800


651
00:30:43,800 --> 00:30:46,500
another little
similarity problem.

652
00:30:46,500 --> 00:30:50,730
The second one was G major.

653
00:30:50,730 --> 00:30:53,590
Sorry, the first
one was G major.

654
00:30:53,590 --> 00:30:58,170
The second one, I went from
C major to C augmented.

655
00:30:58,170 --> 00:30:59,770
Great, C augmented triad.

656
00:30:59,770 --> 00:31:02,290
So those are two things.

657
00:31:02,290 --> 00:31:02,830
Which one?

658
00:31:02,830 --> 00:31:08,130
Who votes that going from
C major to G major is closer?

659
00:31:08,130 --> 00:31:12,330
Who votes that C major and
C augmented are closer?

660
00:31:12,330 --> 00:31:12,850
Two people.

661
00:31:12,850 --> 00:31:13,510
OK, great.

662
00:31:13,510 --> 00:31:18,210
So a lot of it has to do
with your thought about--

663
00:31:18,210 --> 00:31:21,990
well, on the augmented,
you're only changing one note,

664
00:31:21,990 --> 00:31:25,830
and you're only changing by a
half step, the minimum distance

665
00:31:25,830 --> 00:31:30,780
in our Manhattanized musical
world of MIDI and piano

666
00:31:30,780 --> 00:31:31,530
keyboards.

667
00:31:31,530 --> 00:31:34,560
That is, their minimum
distance is one half step--

668
00:31:34,560 --> 00:31:36,040
not for all music in the world.

669
00:31:36,040 --> 00:31:36,540
Great.

670
00:31:36,540 --> 00:31:38,290
C major to G major--

671
00:31:38,290 --> 00:31:40,432
you're also just moving one.

672
00:31:40,432 --> 00:31:41,890
If you think of
something this way,

673
00:31:41,890 --> 00:31:45,720
you're moving one
distance in what space?

674
00:31:45,720 --> 00:31:46,735
AUDIENCE: [INAUDIBLE]

675
00:31:46,735 --> 00:31:48,360
MICHAEL CUTHBERT:
Oh, what's that word?

676
00:31:48,360 --> 00:31:48,970
AUDIENCE: Circle of fifths.

677
00:31:48,970 --> 00:31:50,910
MICHAEL CUTHBERT: It's
circle of fifths space.

678
00:31:50,910 --> 00:31:53,490
C and G are about
as close as you

679
00:31:53,490 --> 00:31:55,410
can get without
being the identity,

680
00:31:55,410 --> 00:31:57,947
C and F probably the other
way, although, I don't know,

681
00:31:57,947 --> 00:31:59,530
maybe it's a one way
circle of fifths.

682
00:31:59,530 --> 00:32:02,110
You only go around one
direction or another.
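
The two ways of voting on the chord question can be sketched as two different distance functions on pitch classes. This is only an illustration: it compares single notes (chord roots or the one moving voice) rather than whole chords, and the function names are assumptions.

```python
def semitone_distance(a, b):
    """Shortest distance in half steps between two pitch classes (0-11)."""
    diff = (b - a) % 12
    return min(diff, 12 - diff)


def fifths_distance(a, b):
    """Shortest number of steps between two pitch classes on the circle of fifths."""
    # A perfect fifth is 7 half steps, and 7 * 7 = 49 = 1 (mod 12), so
    # multiplying the semitone offset by 7 gives the position on the circle.
    steps = ((b - a) * 7) % 12
    return min(steps, 12 - steps)


C, F, G, G_SHARP = 0, 5, 7, 8

# C major -> C augmented: one voice moves G -> G sharp.
voice_move_semitones = semitone_distance(G, G_SHARP)
voice_move_fifths = fifths_distance(G, G_SHARP)

# C major -> G major: the root moves C -> G.
root_move_semitones = semitone_distance(C, G)
root_move_fifths = fifths_distance(C, G)
```

Under the half-step metric the augmented chord is the close one; under the circle-of-fifths metric the move from C to G (or C to F) is the close one. Which answer comes out depends on which moves the metric allows.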

683
00:32:02,110 --> 00:32:04,380
Great.

684
00:32:04,380 --> 00:32:06,750
Based on the time,
I'm not going to go

685
00:32:06,750 --> 00:32:09,570
through all these other
measurements of distance

686
00:32:09,570 --> 00:32:10,420
that people can do.

687
00:32:10,420 --> 00:32:12,430
Who has heard of
earth mover's distance?

688
00:32:12,430 --> 00:32:14,790
That is the amount
of work that it

689
00:32:14,790 --> 00:32:20,260
takes to move one mound of
things over to another place.

690
00:32:20,260 --> 00:32:23,280
And sometimes, you're
optimizing depending

691
00:32:23,280 --> 00:32:26,700
on how much it costs
to move distance

692
00:32:26,700 --> 00:32:29,760
and how much it costs
to move material.

693
00:32:29,760 --> 00:32:31,513
You can end up with
different results.

694
00:32:31,513 --> 00:32:35,183


695
00:32:35,183 --> 00:32:36,600
This was one of
the charts I think

696
00:32:36,600 --> 00:32:39,240
I showed early in the
semester is here's

697
00:32:39,240 --> 00:32:41,490
one place where
earth mover's distance might

698
00:32:41,490 --> 00:32:46,840
be a good use of
things of distances.

699
00:32:46,840 --> 00:32:49,860
And then Levenshtein
or edit distance

700
00:32:49,860 --> 00:32:51,810
is what was mentioned in--

701
00:32:51,810 --> 00:32:53,550
do you use it in your work?

702
00:32:53,550 --> 00:32:56,400
Yep, so that's where
you're talking about,

703
00:32:56,400 --> 00:33:02,370
so the idea of how to change
the word Hyundai into Honda,

704
00:33:02,370 --> 00:33:08,230
and no international East Asian
politics please for a second.

705
00:33:08,230 --> 00:33:13,510
And you can think of every
time, OK, H and H are the same.

706
00:33:13,510 --> 00:33:15,130
So it has a cost of 0.

707
00:33:15,130 --> 00:33:19,300
Or we can delete the H and start
an O, and we have a cost of 1.

708
00:33:19,300 --> 00:33:22,380
But we can find the
pattern of as we

709
00:33:22,380 --> 00:33:25,560
change, we're going to
insert a Y after the H.

710
00:33:25,560 --> 00:33:28,590
We're going to
substitute a U for an O.

711
00:33:28,590 --> 00:33:30,570
This should be
symmetrical the other way

712
00:33:30,570 --> 00:33:32,100
around-- different operations.

713
00:33:32,100 --> 00:33:34,480
N is the same, so that's good.

714
00:33:34,480 --> 00:33:36,610
D is the same, so it
doesn't cost anything.

715
00:33:36,610 --> 00:33:38,050
So our cost function goes here.

716
00:33:38,050 --> 00:33:41,490
And so we're trying to find
the minimum cost from going

717
00:33:41,490 --> 00:33:43,770
from one end to another.
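
For reference, a compact Python sketch of the standard dynamic-programming algorithm for Levenshtein distance, with every insertion, deletion, and substitution costing 1. It is one common formulation, not necessarily the one shown on the slide.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # cost of building b[:j] from the empty string
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # keep (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


hyundai_to_honda = levenshtein("Hyundai", "Honda")
honda_to_hyundai = levenshtein("Honda", "Hyundai")
```

Going from Honda to Hyundai takes an inserted y, a substituted u, and an inserted i, so both directions come out to 3; the operations differ, but the cost is symmetrical.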

718
00:33:43,770 --> 00:33:47,650
We don't have time to go through
all of the algorithms for this.

719
00:33:47,650 --> 00:33:50,070
But Levenshtein
distance, edit distance,

720
00:33:50,070 --> 00:33:53,550
has a lot of good
qualities that make

721
00:33:53,550 --> 00:33:58,170
it useful for a lot of
musical similarity tasks.

722
00:33:58,170 --> 00:34:00,960
Just so that you can
say your professor

723
00:34:00,960 --> 00:34:04,780
at least put the algorithm
up on the hand for a second.

724
00:34:04,780 --> 00:34:07,350
But more importantly,
I think a lot of times

725
00:34:07,350 --> 00:34:10,770
is thinking about the
particular costs of things

726
00:34:10,770 --> 00:34:13,330
in a musical space,
in a musical world.

727
00:34:13,330 --> 00:34:18,330
So for instance,
is deleting what--

728
00:34:18,330 --> 00:34:23,219
we're trying to think about
two pieces, two melodies.

729
00:34:23,219 --> 00:34:26,580
One of them deletes
the first note.

730
00:34:26,580 --> 00:34:28,980
What would you call
the cost on that?

731
00:34:28,980 --> 00:34:30,850
Well, maybe 1.

732
00:34:30,850 --> 00:34:36,630
But then, if it doesn't make
up the total rhythm later,

733
00:34:36,630 --> 00:34:38,770
and everything from here
on is going to be off,

734
00:34:38,770 --> 00:34:41,969
and it's one line within
an orchestral piece,

735
00:34:41,969 --> 00:34:46,630
that might be a higher
cost, maybe some--

736
00:34:46,630 --> 00:34:49,830
and the classic debate
is whether changing

737
00:34:49,830 --> 00:34:53,820
a note, or changing a letter

738
00:34:53,820 --> 00:34:55,210
in something like this, is one operation.

739
00:34:55,210 --> 00:34:56,980
Is this the same?

740
00:34:56,980 --> 00:34:59,860
Does this cost 1
or does this cost--

741
00:34:59,860 --> 00:35:02,530
well, one way you can change
a letter is you delete it,

742
00:35:02,530 --> 00:35:06,220
and then you add a new
letter back with a cost of 2.
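
The substitution debate can be made concrete by turning the costs into parameters. This is a sketch under assumed names (`ins`, `dele`, `sub` are illustrative, not from the lecture); setting `sub=2` makes a substitution cost the same as a delete followed by an insert.

```python
def edit_distance(a, b, ins=1, dele=1, sub=1):
    """Edit distance from a to b with configurable operation costs."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * dele]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + dele,                          # delete ca
                curr[j - 1] + ins,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub),  # keep or substitute
            ))
        prev = curr
    return prev[-1]


cost_sub_1 = edit_distance("Hyundai", "Honda", sub=1)
cost_sub_2 = edit_distance("Hyundai", "Honda", sub=2)
```

The same pair of words is 3 apart with a unit substitution cost and 4 apart when a substitution counts as two operations. In a musical setting the costs could also depend on what is being changed, for example charging more for deleting a note that shifts everything after it.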

743
00:35:06,220 --> 00:35:09,000
And these are things
that come up quite a bit

744
00:35:09,000 --> 00:35:11,490
in similarity search.

745
00:35:11,490 --> 00:35:15,220
And just really want to say that
it comes up a lot in music--

746
00:35:15,220 --> 00:35:19,240
don't borrow your distance
metric from somebody else.

747
00:35:19,240 --> 00:35:21,880
Different ones might be used
for different situations.

748
00:35:21,880 --> 00:35:25,540
So the distance
between dog and gato--

749
00:35:25,540 --> 00:35:29,260
well, we can substitute d for g.

750
00:35:29,260 --> 00:35:31,490
I don't know, or maybe
we add other things.

751
00:35:31,490 --> 00:35:34,670
But I'm going to assert
that, in some situations,

752
00:35:34,670 --> 00:35:38,570
the distance might be
2 between dog and gato.

753
00:35:38,570 --> 00:35:42,460
What we do is we use the
substitute closely related

754
00:35:42,460 --> 00:35:44,960
pet function for cost of 1.

755
00:35:44,960 --> 00:35:48,700
So dog becomes cat, and then
translate English to Spanish

756
00:35:48,700 --> 00:35:50,500
might cost 1.
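
One way to sketch the dog-to-gato idea is as a shortest-path search over whatever operations are allowed. Both lookup tables below are toy data invented for illustration, and each operation is assumed to cost 1.

```python
from collections import deque

# Hypothetical unit-cost operations (toy data, not a real lexicon).
RELATED_PET = {"dog": "cat", "cat": "dog"}
ENGLISH_TO_SPANISH = {"dog": "perro", "cat": "gato"}


def neighbors(word):
    """Words reachable from `word` with one allowed operation."""
    for table in (RELATED_PET, ENGLISH_TO_SPANISH):
        if word in table:
            yield table[word]


def operation_distance(start, goal):
    """Fewest allowed operations from start to goal, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        word, cost = queue.popleft()
        if word == goal:
            return cost
        for nxt in neighbors(word):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, cost + 1))
    return None


dog_to_gato = operation_distance("dog", "gato")
gato_to_dog = operation_distance("gato", "dog")
```

With these operations dog and gato are 2 apart (dog to cat, then cat to gato). Because the toy tables have no Spanish-to-English operation, the reverse direction is unreachable, which shows how the allowed operations determine the metric.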

757
00:35:50,500 --> 00:35:53,200
And if you think about
large language

758
00:35:53,200 --> 00:35:55,840
models and things,
you might want

759
00:35:55,840 --> 00:35:57,710
to have functions like this.

760
00:35:57,710 --> 00:36:02,690
And, in fact, this is not a
digital humanities text class.

761
00:36:02,690 --> 00:36:04,640
But if it were and we
were doing computation,

762
00:36:04,640 --> 00:36:08,300
we'd definitely be talking about
an algorithm called word2vec,

763
00:36:08,300 --> 00:36:11,950
which was one of the earlier
successful algorithms

764
00:36:11,950 --> 00:36:16,120
for trying to predict what words
are similar to other words, what

765
00:36:16,120 --> 00:36:21,040
words are synonyms, so you can
create a kind of cost function

766
00:36:21,040 --> 00:36:24,830
that is for this word is a
synonym for this one that

767
00:36:24,830 --> 00:36:29,460
has been substituted
that is lower than this--

768
00:36:29,460 --> 00:36:31,490
than this sentence is
different from this one

769
00:36:31,490 --> 00:36:34,403
because it's using a
completely different concept.

770
00:36:34,403 --> 00:36:37,400


771
00:36:37,400 --> 00:36:39,830
I'll skip that.

772
00:36:39,830 --> 00:36:44,810
So when we're thinking
about these distances

773
00:36:44,810 --> 00:36:49,040
and these weird things
like substitute dog for cat

774
00:36:49,040 --> 00:36:51,950
at low cost, substitute
cat for gato

775
00:36:51,950 --> 00:36:57,450
at low cost, what's the term
that we spent a lot of time,

776
00:36:57,450 --> 00:36:59,360
maybe even too much
time for-- it felt

777
00:36:59,360 --> 00:37:02,480
like at the time-- talking about
earlier in this semester, that

778
00:37:02,480 --> 00:37:06,480
helps to think about things
that are not the same,

779
00:37:06,480 --> 00:37:08,800
but might be closely
related to each other?

780
00:37:08,800 --> 00:37:13,173


781
00:37:13,173 --> 00:37:14,090
AUDIENCE: Equivalence.

782
00:37:14,090 --> 00:37:16,760
MICHAEL CUTHBERT: Equivalence,
or equivalence classes, yes.

783
00:37:16,760 --> 00:37:18,950
So one of the things
you might want to do

784
00:37:18,950 --> 00:37:21,840
is define what equivalence
classes it could be.

785
00:37:21,840 --> 00:37:23,560
I mean, I think last--

786
00:37:23,560 --> 00:37:27,400
other times, I've given the
exact same melody up an octave,

787
00:37:27,400 --> 00:37:29,000
and everybody
immediately said, oh,

788
00:37:29,000 --> 00:37:31,160
that is basically
the same thing.

789
00:37:31,160 --> 00:37:33,310
So everybody was
very quickly putting

790
00:37:33,310 --> 00:37:36,730
in an oh, equivalence
class things.

791
00:37:36,730 --> 00:37:41,600
So I wanted to make
sure that we had that.

792
00:37:41,600 --> 00:37:44,680
And so once you have
these distances,

793
00:37:44,680 --> 00:37:48,407
we tend to go through-- and this
is if you're in a biology class,

794
00:37:48,407 --> 00:37:50,240
you'll spend a lot of
computational biology,

795
00:37:50,240 --> 00:37:53,140
a lot of time on this--
sequence alignment,

796
00:37:53,140 --> 00:37:55,120
a kind of distance
metric where you're

797
00:37:55,120 --> 00:37:58,720
trying to find the minimum
distance between two things

798
00:37:58,720 --> 00:38:03,290
that you believe might represent
the same type of thing.

799
00:38:03,290 --> 00:38:06,620
Or you might say it's
innocent until proven guilty.

800
00:38:06,620 --> 00:38:10,060
We'll first try to see if they
can be changed into another

801
00:38:10,060 --> 00:38:15,610
thing at a low cost and then
discard once we realize the cost

802
00:38:15,610 --> 00:38:17,860
cannot be minimized.

803
00:38:17,860 --> 00:38:21,940
I will say that algorithms that
can be short circuited, that you

804
00:38:21,940 --> 00:38:24,250
can prove at a certain
point you can't

805
00:38:24,250 --> 00:38:31,070
do better than this cost will
speed up a lot of your run times

806
00:38:31,070 --> 00:38:32,050
because once--

807
00:38:32,050 --> 00:38:36,880
you might say that
there's no way

808
00:38:36,880 --> 00:38:44,030
that this could be better than
20% or it could be-- yeah,

809
00:38:44,030 --> 00:38:47,080
there's no way that this could
possibly be better than 90%

810
00:38:47,080 --> 00:38:48,260
similar to this.

811
00:38:48,260 --> 00:38:51,280
So I'm going to stop looking
at the rest of the piece

812
00:38:51,280 --> 00:38:54,280
or whatever your cutoff is.
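
A sketch of what short-circuiting can look like for edit distance, assuming unit costs and a caller-supplied `max_cost` cutoff (both the function name and the parameter are illustrative). It relies on the fact that the smallest value in any row of the dynamic-programming table is a lower bound on the final distance.

```python
def bounded_levenshtein(a, b, max_cost):
    """Edit distance between a and b, or None once it must exceed max_cost."""
    if abs(len(a) - len(b)) > max_cost:
        return None  # the length difference alone already costs too much
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete
                curr[j - 1] + 1,           # insert
                prev[j - 1] + (ca != cb),  # keep or substitute
            ))
        # Costs never decrease along an alignment path, so if every entry in
        # this row is over the cutoff, the final answer must be as well.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None


within_cutoff = bounded_levenshtein("Hyundai", "Honda", max_cost=3)
over_cutoff = bounded_levenshtein("Hyundai", "Honda", max_cost=2)
too_long = bounded_levenshtein("Hyundai", "H", max_cost=2)
```

When searching a large collection, most candidates are rejected after a few rows, or by the length check before any table is built.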

813
00:38:54,280 --> 00:38:58,060
So one of the classic things
for sequence alignment

814
00:38:58,060 --> 00:38:59,680
is trying to find--

815
00:38:59,680 --> 00:39:02,530
this is Google's data
set they released

816
00:39:02,530 --> 00:39:06,370
at the height of Britney
Spears's popularity of all

817
00:39:06,370 --> 00:39:09,910
the number of searches
that they believed

818
00:39:09,910 --> 00:39:14,740
were trying to find the top
left one, Britney Spears--

819
00:39:14,740 --> 00:39:16,270
actually really,
really impressed

820
00:39:16,270 --> 00:39:20,620
that the number of correct
spellings of a hard name

821
00:39:20,620 --> 00:39:22,460
to spell outweighs the rest.

822
00:39:22,460 --> 00:39:23,870
Anyhow, that's
all we're talking.

823
00:39:23,870 --> 00:39:27,310
And the people who are
really, really good

824
00:39:27,310 --> 00:39:30,610
at this-- and any time
I'm trying to figure out

825
00:39:30,610 --> 00:39:34,090
a similarity sequence
alignment or a similarity task

826
00:39:34,090 --> 00:39:37,300
that I don't know is to
look at the people who

827
00:39:37,300 --> 00:39:42,130
are trying to align
base pairs in biology

828
00:39:42,130 --> 00:39:52,760
or trying to align genes because
they have many, many options.

829
00:39:52,760 --> 00:39:55,120
So I'm just going
to keep pounding

830
00:39:55,120 --> 00:39:58,510
this term in as many
different ways I can do.

831
00:39:58,510 --> 00:40:00,670
All the things that
we're just working with--

832
00:40:00,670 --> 00:40:06,050
Hyundai, Honda, Britney Spears,
genes, those are all strings.

833
00:40:06,050 --> 00:40:10,100
But we work on notes and clefs
and things like notes and stuff,

834
00:40:10,100 --> 00:40:11,060
but things like that.

835
00:40:11,060 --> 00:40:13,480
So how do we get them in?

836
00:40:13,480 --> 00:40:16,610
So this is great to get this
from two different people,

837
00:40:16,610 --> 00:40:19,740
same thing-- we use
things called hashes,

838
00:40:19,740 --> 00:40:23,840
which are very similar to
the concept of viewpoints

839
00:40:23,840 --> 00:40:25,370
to the rescue, so that--

840
00:40:25,370 --> 00:40:28,670
try to convert things--

841
00:40:28,670 --> 00:40:29,610
hash and note.

842
00:40:29,610 --> 00:40:33,290
We might say that here are
equivalence classes all notes

843
00:40:33,290 --> 00:40:34,560
that are names with octave.

844
00:40:34,560 --> 00:40:39,510
And so we might hash a stream by
just joining all the hash notes

845
00:40:39,510 --> 00:40:42,410
for all the notes in there.
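
A plain-Python sketch of the note-hash and stream-hash idea. The `Note` class here is a hypothetical stand-in, not the music21 class, and its `name_with_octave` field only mirrors the `nameWithOctave` property mentioned in the lecture; the function names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical stand-in for a note object (not the music21 class)."""
    name_with_octave: str        # e.g. "C4"
    quarter_length: float = 1.0  # duration in quarter notes


def hash_note(n):
    """Viewpoint: keep the pitch name and octave, ignore the duration."""
    return n.name_with_octave


def hash_stream(notes):
    """Join the per-note hashes into one searchable string."""
    return " ".join(hash_note(n) for n in notes)


melody = [Note("C4"), Note("E4", 0.5), Note("G4", 0.5), Note("C5", 2.0)]
same_pitches_new_rhythm = [Note("C4", 2.0), Note("E4"), Note("G4"), Note("C5")]

hashed = hash_stream(melody)
hashed_other = hash_stream(same_pitches_new_rhythm)
```

Because this hash discards duration, the two melodies land in the same equivalence class. Choosing what the hash keeps is the same decision as choosing what counts as similar.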

846
00:40:42,410 --> 00:40:46,050
You will find, in music21,
if you're working on it,

847
00:40:46,050 --> 00:40:48,800
there's a bunch of
tools for this already.

848
00:40:48,800 --> 00:40:51,380
They're in
music21.search, a module

849
00:40:51,380 --> 00:40:52,970
that we have not
talked about until now

850
00:40:52,970 --> 00:40:54,480
and we will not
talk about again.

851
00:40:54,480 --> 00:40:56,310
But if you're doing
a lot of searching,

852
00:40:56,310 --> 00:41:01,920
it's probably worth reading
the module reference for it.

853
00:41:01,920 --> 00:41:03,990
I think that there
might be a user's guide,

854
00:41:03,990 --> 00:41:05,900
but I can't remember
if I finished it

855
00:41:05,900 --> 00:41:08,220
or if it just trails
off after a few words.

856
00:41:08,220 --> 00:41:13,980
So we might take a string,
convert it to a stream--

857
00:41:13,980 --> 00:41:15,710
that's hard to say very fast--

858
00:41:15,710 --> 00:41:17,910
and translate it.

859
00:41:17,910 --> 00:41:20,940
And we might have
some sort of hash function

860
00:41:20,940 --> 00:41:27,990
that tries to make everything
into an ASCII character.

861
00:41:27,990 --> 00:41:30,060
Though there's no
reason that everything

862
00:41:30,060 --> 00:41:34,860
needs to be turned into a
string like nameWithOctave.

863
00:41:34,860 --> 00:41:41,183
In a lot of ways, strings
are just arrays of ints.

864
00:41:41,183 --> 00:41:43,740


865
00:41:43,740 --> 00:41:53,880
We're talking about that A--

866
00:41:53,880 --> 00:41:59,010
no, lowercase a is generally
represented internally

867
00:41:59,010 --> 00:42:01,048
as anyone-- remember number?

868
00:42:01,048 --> 00:42:01,590
AUDIENCE: 97.

869
00:42:01,590 --> 00:42:03,090
MICHAEL CUTHBERT: What's that?

870
00:42:03,090 --> 00:42:05,580
97 or 96, I can't remember.

871
00:42:05,580 --> 00:42:06,370
96?

872
00:42:06,370 --> 00:42:08,772
Yep, and capital
A-- is that one 60--

873
00:42:08,772 --> 00:42:09,314
AUDIENCE: 65.

874
00:42:09,314 --> 00:42:10,189
MICHAEL CUTHBERT: 65.

875
00:42:10,189 --> 00:42:12,250
OK, so some people
know these.

876
00:42:12,250 --> 00:42:13,930
I used to have them
all off the top of my head.

877
00:42:13,930 --> 00:42:16,020
So all the letters
you're doing have

878
00:42:16,020 --> 00:42:17,540
a particular representation.
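
For reference, in ASCII (and Unicode) lowercase a is 97 and capital A is 65. The short Python sketch below checks this and shows a string viewed as a list of ints. The last part, mapping MIDI numbers to single characters, is a hypothetical hash added for illustration and works only because MIDI note numbers fall in 0 to 127.

```python
lower_a = ord("a")  # code point of lowercase a
upper_a = ord("A")  # code point of capital A

# A string viewed as an array of ints, and back again.
codes = [ord(ch) for ch in "C4"]
round_trip = "".join(chr(c) for c in codes)

# Hypothetical hash: one character per MIDI note number (0-127),
# so a melody becomes a string that string-search tools can handle.
midi_numbers = [60, 64, 67]
midi_as_string = "".join(chr(m) for m in midi_numbers)
recovered = [ord(ch) for ch in midi_as_string]
```

The characters produced this way are not meant to be readable; they only need to be distinct per note so that string algorithms such as edit distance can operate on them.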

879
00:42:17,540 --> 00:42:22,597
And back in the bad, bad days
of the '60s and '70s, different

880
00:42:22,597 --> 00:42:24,930
computers would have different
representations for this,

881
00:42:24,930 --> 00:42:28,950
and then we all agreed on the
same representation for letters.

882
00:42:28,950 --> 00:42:33,890
And then we remembered that
there are other things in the--

883
00:42:33,890 --> 00:42:36,290
other characters in the world.

884
00:42:36,290 --> 00:42:38,180
That looks too much
like an A. How do I

885
00:42:38,180 --> 00:42:43,020
do a jin or something, or an
alpha, beta, things like that.

886
00:42:43,020 --> 00:42:45,425
And then, for a while,
we had a big problem

887
00:42:45,425 --> 00:42:47,550
that they weren't all
converging to the same thing.

888
00:42:47,550 --> 00:42:51,230
Anyhow, digression aside,
maybe we'll get to the point

889
00:42:51,230 --> 00:42:54,680
where we can start converting
things besides MIDI numbers

890
00:42:54,680 --> 00:42:58,260
and notes into something
more standardized

891
00:42:58,260 --> 00:43:00,740
because, right now,
MIDI numbers are basically

892
00:43:00,740 --> 00:43:03,290
the only standardized
note representation, which is probably

893
00:43:03,290 --> 00:43:10,430
why MIDI keeps being used for a
lot of computational projects.

894
00:43:10,430 --> 00:43:14,720
So the hard part is
always finding out

895
00:43:14,720 --> 00:43:18,230
what numbers we should
use to represent a note.

896
00:43:18,230 --> 00:43:21,060


897
00:43:21,060 --> 00:43:24,210
So if we're going to
convert nameWithOctave

898
00:43:24,210 --> 00:43:26,070
and we want to make
a string, and then

899
00:43:26,070 --> 00:43:27,990
we want to make it
a number, and then

900
00:43:27,990 --> 00:43:30,100
we want to have a
whole bunch of numbers,

901
00:43:30,100 --> 00:43:35,640
what have we just recently
seen that looks like a tool

902
00:43:35,640 --> 00:43:39,820
to take a score or
a part or something

903
00:43:39,820 --> 00:43:42,120
and works like a
hash or a viewpoint

904
00:43:42,120 --> 00:43:44,413
that tries to convert it
to a bunch of numbers?

905
00:43:44,413 --> 00:43:53,460


906
00:43:53,460 --> 00:43:58,440
Not asking you to think too
far back, but farther back

907
00:43:58,440 --> 00:43:58,970
than today.

908
00:43:58,970 --> 00:44:02,100


909
00:44:02,100 --> 00:44:02,792
Yeah?

910
00:44:02,792 --> 00:44:04,750
AUDIENCE: Um, getting a
feature representation.

911
00:44:04,750 --> 00:44:05,730
MICHAEL CUTHBERT:
Yeah, extracting

912
00:44:05,730 --> 00:44:07,690
features, getting a
feature representation.

913
00:44:07,690 --> 00:44:10,860
Yeah, so feature
extraction and this kind

914
00:44:10,860 --> 00:44:15,790
of viewpoint searching go
hand-in-hand with each other.

915
00:44:15,790 --> 00:44:20,312
So that's partially why, once
you finish up a search function,

916
00:44:20,312 --> 00:44:22,020
you're just going to
want to probably try

917
00:44:22,020 --> 00:44:25,410
to see if AI or machine
learning can do it

918
00:44:25,410 --> 00:44:27,970
better because you have
everything ready to go for it.

919
00:44:27,970 --> 00:44:32,520
But sometimes what we extract
is different from others.

920
00:44:32,520 --> 00:44:39,960
I want to give a little bit
of a caution that back then--

921
00:44:39,960 --> 00:44:41,680
of course, you had
to do a little final,

922
00:44:41,680 --> 00:44:43,350
a final project called the UAP--

923
00:44:43,350 --> 00:44:45,310
and we thought that
making these viewpoints,

924
00:44:45,310 --> 00:44:50,580
making a hashing system for
music21 for comparisons

925
00:44:50,580 --> 00:44:52,750
would be a nice senior project.

926
00:44:52,750 --> 00:44:56,520
And then we both
realized that, no,

927
00:44:56,520 --> 00:44:59,100
it's a lot bigger than we
thought and a lot more complex

928
00:44:59,100 --> 00:45:02,040
than we thought, and so
it needed to be an M. Eng.

929
00:45:02,040 --> 00:45:04,650
Emily Zhang was great at
creating this and great,

930
00:45:04,650 --> 00:45:06,940
oh, we did a really
great M. Eng project.

931
00:45:06,940 --> 00:45:10,710
And then we realized,
no, this really

932
00:45:10,710 --> 00:45:12,280
needs to be a PhD project.

933
00:45:12,280 --> 00:45:15,190
We did not continue on--

934
00:45:15,190 --> 00:45:19,200
there are so many difficult
parts of hash algorithms

935
00:45:19,200 --> 00:45:23,670
because you want to
think about things like--

936
00:45:23,670 --> 00:45:27,060
yeah, we'll not get to it--

937
00:45:27,060 --> 00:45:32,730
going all the way
back to the beginning,

938
00:45:32,730 --> 00:45:36,720
how can we create a
viewpoint or something

939
00:45:36,720 --> 00:45:42,120
that allows D not to be totally,
totally different for anybody

940
00:45:42,120 --> 00:45:48,240
who didn't put D as the last
of all possible results?

941
00:45:48,240 --> 00:45:51,580
What kinds of hashes--

942
00:45:51,580 --> 00:45:54,840
what kinds of numbers would we
need to represent a piece on

943
00:45:54,840 --> 00:46:02,220
to make D not the worst
and really make sure

944
00:46:02,220 --> 00:46:05,070
that F isn't the best?

945
00:46:05,070 --> 00:46:07,990
So that's going to be our
last 5-6 minutes of class.

946
00:46:07,990 --> 00:46:11,010
I want 5 minutes of
y'all talking with

947
00:46:11,010 --> 00:46:14,390
each other, and 5 minutes of
y'all talking to me.

948
00:46:14,390 --> 00:46:18,493
So what kinds of
feature extraction,

949
00:46:18,493 --> 00:46:20,660
what kind of hash function,
what kind of viewpoints?

950
00:46:20,660 --> 00:46:22,400
These are all slightly
different concepts,

951
00:46:22,400 --> 00:46:23,775
but they're all
in the same area.

952
00:46:23,775 --> 00:46:25,330
What kinds of
equivalence classes

953
00:46:25,330 --> 00:46:31,360
will you need in order
to make this happen?

954
00:46:31,360 --> 00:46:34,090
Go ahead.

955
00:46:34,090 --> 00:46:38,900
OK, I hear words continuing
but less frequently.

956
00:46:38,900 --> 00:46:41,140
Let's talk about what
are some of the ways

957
00:46:41,140 --> 00:46:45,880
that people thought to create
a strategy that doesn't

958
00:46:45,880 --> 00:46:48,810
make D and F about the same?

959
00:46:48,810 --> 00:46:52,840


960
00:46:52,840 --> 00:46:53,537
Yeah, go ahead.

961
00:46:53,537 --> 00:46:55,870
AUDIENCE: You could look at
the sequence of local maxima

962
00:46:55,870 --> 00:46:56,740
and minima.

963
00:46:56,740 --> 00:46:58,490
MICHAEL CUTHBERT: Local
maxima and minima.

964
00:46:58,490 --> 00:47:00,365
OK, I think I know what
you're talking about,

965
00:47:00,365 --> 00:47:02,230
but let's give you
a little example.

966
00:47:02,230 --> 00:47:04,730
Let's talk about
A. What do you--

967
00:47:04,730 --> 00:47:06,760
AUDIENCE: So you
could maybe argue

968
00:47:06,760 --> 00:47:08,770
that the C is a
local minimum, and then

969
00:47:08,770 --> 00:47:11,630
the F is higher than both of its
neighbors, so it's a maximum.

970
00:47:11,630 --> 00:47:14,380
And then a D is a minimum,
the E's a maximum,

971
00:47:14,380 --> 00:47:18,720
and then the D and C after that
are not really anything until

972
00:47:18,720 --> 00:47:21,902
you hit the A on the 16th note.

973
00:47:21,902 --> 00:47:22,860
MICHAEL CUTHBERT: Cool.

974
00:47:22,860 --> 00:47:24,568
So yeah, we're just
looking at every time

975
00:47:24,568 --> 00:47:26,620
the direction changes
of the pitches.

976
00:47:26,620 --> 00:47:27,120
Great.

977
00:47:27,120 --> 00:47:32,730
And compare that-- so beginning
A has G, F, D, E. Here,

978
00:47:32,730 --> 00:47:35,490
we have D, F, G--

979
00:47:35,490 --> 00:47:39,572
a tiny bit different, D,
F, but then going down to--

980
00:47:39,572 --> 00:47:41,280
a little bit different,
but it's at least

981
00:47:41,280 --> 00:47:45,150
giving some numbers we have.

982
00:47:45,150 --> 00:47:48,990
Always the question is,
does your current streak

983
00:47:48,990 --> 00:47:51,790
end when you hit a rest or not?

984
00:47:51,790 --> 00:47:54,640
And maybe it depends on how
long the rest is, so good.

985
00:47:54,640 --> 00:47:56,810
Other strategies?

986
00:47:56,810 --> 00:47:57,310
Adam?

987
00:47:57,310 --> 00:47:58,920
AUDIENCE: I would look at
where offsets are the same

988
00:47:58,920 --> 00:48:01,000
and then check that their
notes are the same or not.

989
00:48:01,000 --> 00:48:02,000
MICHAEL CUTHBERT: Great.

990
00:48:02,000 --> 00:48:04,230
So we're going to look at
offsets that are the same

991
00:48:04,230 --> 00:48:07,180
and see if notes
are the same or not.

992
00:48:07,180 --> 00:48:08,410
That works really well.

993
00:48:08,410 --> 00:48:12,460
And what I'd love to do, if
this were, what do you call it,

994
00:48:12,460 --> 00:48:15,000
the generative
adversarial problem

995
00:48:15,000 --> 00:48:17,388
set, the GAN problem
set, where one team

996
00:48:17,388 --> 00:48:18,430
has to solve the problem.

997
00:48:18,430 --> 00:48:20,070
The other team has
to keep giving them

998
00:48:20,070 --> 00:48:21,960
things that break that.

999
00:48:21,960 --> 00:48:23,938
I think it was a great idea.

1000
00:48:23,938 --> 00:48:25,480
And I think it would
work in general.

1001
00:48:25,480 --> 00:48:32,250
But I could generate something
where all I do is

1002
00:48:32,250 --> 00:48:36,360
insert a 64th rest at
the beginning and then put all

1003
00:48:36,360 --> 00:48:37,830
random notes.

1004
00:48:37,830 --> 00:48:39,850
And you're going
to end up with--

1005
00:48:39,850 --> 00:48:42,940
and then maybe we'll put one
note that's the same at the end.

1006
00:48:42,940 --> 00:48:47,130
And you could end up with 100%
of the notes on the same offset

1007
00:48:47,130 --> 00:48:50,310
are the same.
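
A small sketch of the same-offset comparison and of the adversarial case just described. It assumes each melody is a plain dict from offset (in quarter lengths) to MIDI pitch; the function name and all note values are made up for illustration.

```python
from fractions import Fraction

def offset_match_score(a, b):
    """Fraction of shared offsets where both melodies have the same pitch."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    same = sum(1 for off in shared if a[off] == b[off])
    return same / len(shared)

original = {Fraction(0): 60, Fraction(1): 62, Fraction(2): 64, Fraction(3): 65}

# Adversarial variant: a 64th rest (1/16 of a quarter) shifts the random
# notes off every shared offset, then one matching note lands at the end.
shift = Fraction(1, 16)
adversary = {shift: 71, 1 + shift: 49, 2 + shift: 58, Fraction(3): 65}

score = offset_match_score(original, adversary)
print(score)  # 1.0
```

Only one offset is shared, and it matches, so the score is 100% even though the two melodies have almost nothing in common. Reporting how many offsets were shared alongside the score is one possible guard.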

1008
00:48:50,310 --> 00:48:53,980
I think it would really
work in the real world,

1009
00:48:53,980 --> 00:48:57,480
but we might want to always
think about something like that,

1010
00:48:57,480 --> 00:48:58,020
too.

1011
00:48:58,020 --> 00:48:58,830
Great idea.

1012
00:48:58,830 --> 00:48:59,410
John?

1013
00:48:59,410 --> 00:49:00,600
Then two people.

1014
00:49:00,600 --> 00:49:02,798
AUDIENCE: [INAUDIBLE] so first--

1015
00:49:02,798 --> 00:49:04,090
MICHAEL CUTHBERT: You can say--

1016
00:49:04,090 --> 00:49:06,900
AUDIENCE: --builds
up on what Adam said

1017
00:49:06,900 --> 00:49:09,480
AUDIENCE: But first, you
take a look at the notes

1018
00:49:09,480 --> 00:49:14,560
and do a set, kind of like
crossover between a set of D's

1019
00:49:14,560 --> 00:49:16,450
and aces and then ASes.

1020
00:49:16,450 --> 00:49:19,910
So both of those would still
show up relatively high.

1021
00:49:19,910 --> 00:49:22,150
And then compare
the offsets, which

1022
00:49:22,150 --> 00:49:24,370
would obviously take
up a bit, but still

1023
00:49:24,370 --> 00:49:26,680
keep D relatively high.
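
One way to read the "set crossover" suggestion is as an overlap between the sets of pitches in two melodies, ignoring order and timing. The sketch below uses intersection over union; that particular measure, the function name, and the pitch lists are all assumptions for illustration.

```python
def pitch_set_overlap(a, b):
    """Size of the shared pitch set divided by the size of the combined set."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Hypothetical MIDI pitch lists for two short melodies.
melody_a = [60, 65, 62, 64]
melody_d = [62, 65, 67, 64]

overlap = pitch_set_overlap(melody_a, melody_d)
print(overlap)  # 0.6
```

Because timing is ignored, this would be a first pass only; the offset comparison would then refine it, as suggested.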

1024
00:49:26,680 --> 00:49:27,680
MICHAEL CUTHBERT: Super.

1025
00:49:27,680 --> 00:49:30,040
When we're talking about
offsets, are we talking about--

1026
00:49:30,040 --> 00:49:33,460
what kind of offsets?

1027
00:49:33,460 --> 00:49:34,960
AUDIENCE: We're in the--

1028
00:49:34,960 --> 00:49:37,420
I guess within the two
measures that the notes being--

1029
00:49:37,420 --> 00:49:37,900
MICHAEL CUTHBERT: Great.

1030
00:49:37,900 --> 00:49:38,983
Where in the two measures?

1031
00:49:38,983 --> 00:49:40,360
Where in the measure?

1032
00:49:40,360 --> 00:49:43,750
We sometimes want to do global
offset from the beginning

1033
00:49:43,750 --> 00:49:44,390
of the piece.

1034
00:49:44,390 --> 00:49:49,180
But then you can't identify
similar phrases. All it takes

1035
00:49:49,180 --> 00:49:53,330
is to put in a repeat: take the first
four measures, repeat them once,

1036
00:49:53,330 --> 00:49:55,580
and suddenly the whole rest
of the piece is different.
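
A minimal sketch of why the choice of offset matters. It assumes 4/4 throughout, so a measure is 4 quarter lengths; the offsets are made-up values for notes in what was originally measure 5.

```python
BAR = 4.0  # quarter lengths per measure, assuming 4/4 throughout

def in_measure(offsets):
    """Convert offsets from the start of the piece to offsets within a measure."""
    return [off % BAR for off in offsets]

# Global offsets of some notes in measure 5 of the original.
tail = [16.0, 17.5, 18.0, 19.0]

# Repeating the first four measures once pushes everything 4 bars later.
tail_after_repeat = [off + 4 * BAR for off in tail]

globals_differ = tail != tail_after_repeat
locals_agree = in_measure(tail) == in_measure(tail_after_repeat)
print(globals_differ, locals_agree)  # True True
```

Offsets within the measure survive the repeat, while offsets from the start of the piece no longer line up at all.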

1037
00:49:55,580 --> 00:49:57,730
So yeah, that's great.

1038
00:49:57,730 --> 00:50:02,080
AUDIENCE: If you
just start at time

1039
00:50:02,080 --> 00:50:04,220
equals 0 and go all the
way through the piece,

1040
00:50:04,220 --> 00:50:08,090
like anytime the two pieces
have the same pitch, you score--

1041
00:50:08,090 --> 00:50:10,930
so the F that they
both have on beat two

1042
00:50:10,930 --> 00:50:13,828
would be like a
quarter point because--

1043
00:50:13,828 --> 00:50:16,120
MICHAEL CUTHBERT: The F that
they both have on beat two

1044
00:50:16,120 --> 00:50:17,740
would be a quarter
point because--

1045
00:50:17,740 --> 00:50:20,450
AUDIENCE: Their duration
is only for that 16th note.

1046
00:50:20,450 --> 00:50:21,533
MICHAEL CUTHBERT: Got you.

1047
00:50:21,533 --> 00:50:23,060
So we look at shared duration.
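
A sketch of scoring by shared duration, assuming each melody is a list of (offset, duration, MIDI pitch) tuples in quarter lengths. The function name and the note values are hypothetical; they are chosen so that an F on beat two is a quarter note in one melody and a 16th in the other.

```python
def shared_duration(a, b):
    """Total time during which both melodies sound the same pitch."""
    total = 0.0
    for off_a, dur_a, pitch_a in a:
        for off_b, dur_b, pitch_b in b:
            if pitch_a != pitch_b:
                continue
            # Length of the overlap between the two time spans, if any.
            overlap = min(off_a + dur_a, off_b + dur_b) - max(off_a, off_b)
            if overlap > 0:
                total += overlap
    return total

first = [(0.0, 1.0, 60), (1.0, 1.0, 65)]
second = [(0.0, 1.0, 62), (1.0, 0.25, 65), (1.25, 0.75, 64)]

points = shared_duration(first, second)
print(points)  # 0.25
```

The shared F earns a quarter of a point because the two notes only overlap for the length of the 16th note.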

1048
00:50:23,060 --> 00:50:24,320
Great.

1049
00:50:24,320 --> 00:50:25,760
I like that a lot.

1050
00:50:25,760 --> 00:50:28,850
Did anybody try to come
up with an equivalence class?

1051
00:50:28,850 --> 00:50:29,350
Yeah?

1052
00:50:29,350 --> 00:50:32,620
The contour ends up being a
kind of a new equivalence class

1053
00:50:32,620 --> 00:50:34,900
that we hadn't talked about,
which kind of works out

1054
00:50:34,900 --> 00:50:38,620
in thinking that everything
that doesn't change directions

1055
00:50:38,620 --> 00:50:43,480
is a kind of passing tone, even
though not in the proper music

1056
00:50:43,480 --> 00:50:48,250
theory term, and
so can be ignored.

1057
00:50:48,250 --> 00:50:50,860
By the way, the one
I use quite often

1058
00:50:50,860 --> 00:50:54,100
is I'm just going to look
on downbeats or on beats

1059
00:50:54,100 --> 00:50:57,730
and ignore everything else,
and that works pretty well

1060
00:50:57,730 --> 00:51:01,020
for a lot of things.
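
A minimal sketch of the "only look on beats" reduction, assuming notes are (offset, MIDI pitch) pairs in quarter lengths, a quarter-note beat, and 4/4 for downbeats. The function name and the note values are made up for illustration.

```python
def on_beats(notes, beat=1.0):
    """Keep only the notes whose offset falls exactly on a multiple of beat."""
    return [(off, pitch) for off, pitch in notes if off % beat == 0]

notes = [(0.0, 60), (0.5, 62), (1.0, 64), (1.75, 65), (2.0, 67)]

beat_notes = on_beats(notes)            # every quarter-note beat
downbeat_notes = on_beats(notes, 4.0)   # downbeats only, assuming 4/4
print(beat_notes)      # [(0.0, 60), (1.0, 64), (2.0, 67)]
print(downbeat_notes)  # [(0.0, 60)]
```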

1061
00:51:01,020 --> 00:51:09,000