[SQUEAKING]
[PAPERS RUSTLING]
[CLICKING]

JEREMY KEPNER: All right. Welcome. Great to see everyone here. We're really excited about this opportunity. As you know, our AI accelerator has officially kicked off, and all of your teams are ready to go. We wanted this to be an opportunity for us, as a team, to come together and develop some common foundation -- some common technological foundation, some common language -- for talking about these very challenging AI problems. And so with that, I'll hand it over to Vijay.

VIJAY GADEPALLY: All right.

JEREMY KEPNER: Who will kick off with the first lecture, which basically provides some overview AI context for this.

VIJAY GADEPALLY: Again, welcome to the class. We're really looking forward to this. What we're going to present this morning is really a lot of overview material, right? Many of you here know a lot about AI and machine learning. This is really meant to level set before we start the program, before we start these classes.

So you can see this generic title -- Artificial Intelligence and Machine Learning -- and we're going to try and cover all of that in about an hour. So some details might be skipped, but we'll try and hit the salient features. All of these slides are available for you to use, so if you're presenting back to your own teams, please feel free to pull from them. We've actually spent some time putting a good set of survey and overview slides together. If any of these are useful to you, just email us, or we'll make them available to you. You're more than welcome to use any and all of these slides if you're trying to present this back to other people.

So with that, let's begin. We're going to do a quick overview of artificial intelligence. Again, a lot of level setting going on here. We're going to do a few quick deep dives -- these aren't the deepest of dives, given the amount of time that we have -- and talk very quickly about supervised, unsupervised, and reinforcement learning, and then summarize. We can certainly stop for questions, philosophical debates, et cetera, towards the end. We'll try not to get a lot of the philosophical debates on camera if we can.

All right. So, first question: what is artificial intelligence? This is a question that probably a lot of you get, and I certainly have received this from a number of people.
And it actually took us a lot of time to come up with an answer. We were very fortunate to have Professor Winston spend some time with us out at Lincoln Laboratory, and we brainstormed for a good hour or two, really trying to come up with a good definition for what we call artificial intelligence. What we came up with is that there are two aspects to artificial intelligence that we should not confuse with each other. One is the concept of narrow AI, and the other is the concept of general AI; sometimes in conversation we tend to conflate or mix the two. Narrow AI, according to our definition, is the theory and development of computer systems that perform tasks that augment human intelligence, such as perceiving, classifying, learning, abstracting, reasoning, and/or acting. Certainly in a lot of the programs that we work in, we're very focused on narrow AI and not necessarily the more general AI, which we define as full autonomy. So that's a very high-level definition of what we mean by AI.

Now, many of you in the crowd are probably saying, well, AI has been around for a while. People have been talking about this for 50, 60-plus years. Why now? What is so special about it now? Why is this the conversation piece now? From what we've seen, it really is the convergence of three different communities that have come together. The first is the community around big data. The second is the community around computing and computing technologies. And finally, a lot of research and results in machine learning algorithms. The other one I forgot to put up here is dollar signs: people have basically figured out how to make money off of selling advertisements, labeling cat pictures, et cetera. So that's maybe the hidden "why now" in particular. But these are the three large technical areas that have evolved over the past decade or so to really make AI something we discuss a lot today.

So when we talk about AI, there are a number of different pieces which make up an AI system. And we love the algorithms, people, but there is a lot more going on outside of that. We've spent a significant amount of effort just trying to figure out what goes into an AI system, and this is what we call a canonical architecture. Very much in line with Lincoln Laboratory thinking, we like to think of an end-to-end pipeline: what are the various components, and what are the interconnections between them? Within our AI canonical architecture shown here, we go all the way from sensors to the end user or mission. A lot of the projects that you all are working on are going to go all the way from here to there. A lot of our class, however, for the next few weeks is going to focus on step one, where a lot of people get stuck.

So we take data that comes in through either structured or unstructured sources. It is typically passed into some data conditioning or data curation step, and through that process it is typically converted into some form of information. That information is then passed into a series of algorithms -- maybe one, maybe many. There are lots of them; there is life beyond neural networks. Once we pass the information through the algorithms, it is converted into knowledge. That knowledge is typically then passed into some module that interacts with the end user, or a human, or the mission, and that's what we call the human-machine teaming step. And finally, that knowledge, with the human complement, becomes insight that can then be used to execute the mission that the AI system was created for.

All of these components sit on the bedrock of modern computing. Many different technologies make up modern computing, and the system that we're using today has a combination of some of these computing hardware elements. And certainly within the context of a lot of the projects that we are interested in, all of this also needs to be wrapped in a layer that we call robust AI, which consists of explainable artificial intelligence, metrics and bias assessment, verification, validation, security, policy, ethics, safety, and training. We'll talk very briefly about each of these pieces in a little bit.
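To make that end-to-end flow concrete, here is a minimal sketch of the canonical architecture as composable stages. Every function name and the toy "curation" and "classification" logic are hypothetical placeholders, not anything from the actual lecture materials:

```python
# Toy sketch of the AI canonical architecture:
# sensors -> data conditioning -> algorithms -> human-machine teaming -> mission.
# All stage implementations are invented placeholders.

def data_conditioning(raw_records):
    """Turn raw structured/unstructured records into information."""
    return [r.strip().lower() for r in raw_records if r]  # toy cleanup

def algorithms(information):
    """Turn information into knowledge (here: a trivial 'classifier')."""
    return [("alert" if "error" in item else "normal", item)
            for item in information]

def human_machine_teaming(knowledge):
    """Surface only what a human should review: knowledge -> insight."""
    return [item for label, item in knowledge if label == "alert"]

def pipeline(raw_records):
    return human_machine_teaming(algorithms(data_conditioning(raw_records)))

print(pipeline(["ERROR: sensor 7 offline", "status ok", "", "Error: retry"]))
# -> ['error: sensor 7 offline', 'error: retry']
```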
As I mentioned, AI has an extremely rich history. This is just a very Lincoln- and MIT-specific view of the history of artificial intelligence, but certainly there has been great work since folks like Minsky, Clark, Dineen, and Oliver Selfridge in the '50s. We've seen a lot of work in the '80s and '90s, and recently there has been, again, a resurgence of AI in our parlance and in our thinking about the way AI works. So without going into too much detail about each of these eras and why the winters came about, et cetera, I think John Launchbury at DARPA actually put it very well when he talked about the different waves of AI technology that have come about.

When he talks about it, he talks about three or four waves of AI. The first wave, which you can think of as the first decades of AI technology, resulted in a lot of reasoning-based systems built on handcrafted knowledge. An example of an output of this would be an expert system, right? So, a lot of work in that. If we take the four dimensions that John Launchbury suggests -- the ability of the system to perceive, learn, abstract, and reason -- these systems are typically pretty good at reasoning, because they encoded human knowledge. A human expert sat down, asked what's going on in the system, and tried to write a series of rules. Tax software, for example, does a pretty reasonable job of that: a chartered accountant or a tax expert sits down and encodes a series of rules. We have a question in the back.

AUDIENCE: Yeah. [INAUDIBLE] I just wanted an example of an expert system from the '50s to [INAUDIBLE].

VIJAY GADEPALLY: Yep. So the question is, are there examples of expert systems? Certainly, one would be tax software. My graduate research was actually in autonomous vehicles, and some of the early autonomous vehicles used a form of expert system, where the states of a finite state machine were handcrafted and the transitions between them were designed that way. There was some machine learning wrapped around it, but expert systems certainly played a large part in some of the early autonomous vehicle research that went on.

All right. So over time, we were able to use these expert systems. And don't get me wrong: these systems are still extremely valid in cases where you have limited data availability or limited compute power -- a lot of expert systems are still being used -- or in cases where explainability is a very important factor. You still see expert systems there, because they do have the ability to explain how they came up with an answer. They can typically point to a set of rules that somebody wrote, which is usually quite interpretable by a human.
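As a concrete illustration of that first-wave, handcrafted-knowledge idea, here is a toy rule-based sketch in the spirit of the tax-software example. The rules and thresholds are invented for illustration; they are not real tax logic:

```python
# Toy first-wave "expert system": a human expert encodes the rules by
# hand; the program only applies them. Conditions and thresholds are
# hypothetical.

RULES = [
    (lambda income, dependents: income < 10_000, "no tax owed"),
    (lambda income, dependents: dependents >= 3, "apply dependent credit"),
    (lambda income, dependents: income >= 10_000, "apply standard rate"),
]

def expert_system(income, dependents):
    for condition, action in RULES:
        if condition(income, dependents):
            return action  # the rule that fired doubles as the explanation
    return "no rule fired"

print(expert_system(8_000, 0))   # -> no tax owed
print(expert_system(50_000, 3))  # -> apply dependent credit
```

Note that the explainability the lecture mentions comes for free: whichever rule fired is itself the human-readable justification.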
However, as we were able to collect more data, we were able to understand a little bit more about the underlying process, and we were able to apply statistical learning. This led to the next era or wave of AI technologies, which is often called the learning wave, and it was really enabled by lots of data: data-enabled rather than expert systems. What we mean by that is that we were able to dial back the amount of expert knowledge we encoded into the algorithm -- maybe put a higher level of expert knowledge into it -- but then use data to learn what some of these rules could be.

An example of that, in case someone wants to ask, would be speech processing. We were able to say, well, I believe that speech follows this Gaussian mixture model. So I can encode that level of statistical knowledge, but I'm going to let the system figure out the details of how all that actually works out. (There's a small sketch of this idea just after this passage.) And there are many other cases. Again, coming back to some of the research I did on autonomous vehicles, we were able to use some high-level expert rules -- here is the set of states that a car may be in -- but let the algorithm figure out when transitions occur and what constitutes a transition between different states.

So, looking at the four vectors again, these systems had a little bit more on perception, and obviously we're doing a lot more learning. But their ability to abstract and reason was still pretty low. And by reasoning, we mean: can you explain? Can you tell us what's going on when you give me an output?

The next wave, which we're maybe at the beginning stages of, is what we call contextual learning or contextual adaptation. This is where an AI system can actually add context to what it's doing. I'm not sure I have too many examples of people doing this very well; I think most of the work today probably falls into the end stage of the learning wave of AI, but we're able to combine a bunch of these learning pieces to make it look contextual in nature. The key concept here is having the system automatically abstract and reason -- the way that we think about things. If I see a chair over here and I put a chair somewhere else, I still know it's a chair, because I'm using other context: maybe it's next to a table, or things like that. There's some early research going on in that area.
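Returning to the learning-wave speech example, where the expert supplies the model family (a Gaussian mixture) and the data supplies the parameters, here is a minimal sketch using scikit-learn. The one-dimensional "features" are synthetic stand-ins, not real acoustic data:

```python
# Second-wave idea in miniature: the expert encodes the distribution
# family (a Gaussian mixture); the data determines its parameters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for acoustic features drawn from two regimes.
features = np.concatenate([
    rng.normal(-2.0, 0.5, size=(200, 1)),
    rng.normal(3.0, 1.0, size=(200, 1)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
print("learned means:", gmm.means_.ravel())   # roughly [-2, 3]
print("learned weights:", gmm.weights_)       # roughly [0.5, 0.5]
```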
And certainly, the wave after that is what we call abstraction. There is very little work on this, but it's something to think about out in the future. This is really the ability of an AI system to actually abstract the information that it's learning. So instead of learning that a chair or a table is something with a leg at the bottom, it learns that a table is something you put things on, and it's able to carry that information or knowledge over to any other domain or field. Do we have any questions before I continue from here? OK, great.

So that's a little bit on the evolution of AI. The reason we like to go through this is that there is great work going on in each of these waves; nothing that people are doing in any of these waves is any lesser or greater. It's typically dependent on what you have at your disposal. What I like to tell people is that the way to think about all of this is that you have a few dials at your disposal. The first dial is, how much compute do you have -- how much ability to crunch data? The second is, how much data do you actually have available? In many cases this means labeled data. And the third dial is, how much knowledge are you able to embed into an algorithm?

In certain cases, where maybe you have very little computing and very little labeled data availability, but a lot of expert knowledge -- a lot of ability to encode information into an algorithm -- you might be able to use an expert system, right? That's a very good use case for it. An example on another dimension, where you want to encode very little human knowledge but you have a lot of computing and data available, would be where neural networks fall in: they're essentially learning what that encoded information should be. A lot of statistical techniques also fall into that camp, where maybe you encode a little bit of information about what the background distribution of the process is, but the system learns the details of exactly how that distribution is modeled, based on the data that it sees.

So you have a lot of different settings that you can use, and there are a number of different techniques within the broader AI context that you can use to achieve your mission. I'm sure many of you are going to be trying different types of algorithms, and a lot of that decision will depend on: how much data was I given? How good is this data that I'm using? Is there an ability to learn anything from this?
And if not, you might have to encode some knowledge of your own into it, saying, well, I know that this process looks like that, so let me tell the algorithm not to waste too much time crunching the data to learn the underlying distribution, which I can tell you -- why don't you learn the parameters of the distribution instead? Does that make sense? All right.

And as you know, there's just a lot going on in AI and machine learning. You can't walk two steps without running into somebody who's either starting something up or working for one of these organizations. So it really is an exciting time to be in the field. All right.

So that's a little bit of the overview, but let's now talk in a little more detail about some of the critical components within this AI architecture. One thing that we like to note -- and there's a reason that, as we've been reaching out to a number of you, we've been talking about getting data, right? Work with stakeholders to get your data in place. The reason we talk about that is that data is critical to breakthroughs in AI. A lot of the press may be on the algorithms that have been designed, but really, when we've looked back in history, we've seen that the availability of a good canonical data set is equally, if not more, critical to a breakthrough in AI.

So what we've done here is pick a select number of breakthroughs in AI. Our definition of a breakthrough in this particular example is something that made a lot of press or something that we thought was really cool, and here are some examples in different years. We've noted the canonical data set that maybe led to, or was cited in, that breakthrough, as well as the algorithms that were used and when they were first proposed. This is notional in nature; clearly, you could adjust these dates a few years here or there. But what we really want to get across is that the average number of years to a breakthrough from the availability of a very important, well-structured, well-labeled data set is much smaller than from when the algorithm was first proposed or first came out.
So as you're developing your challenge problems and your interactions with stakeholders, this is certainly something to keep in mind: there's clearly a lot of algorithmic research that's going to go on, but having a good, strong, well-labeled, and well-documented data set can be equally important. And making that available to the wider AI community, the wider AI ecosystem, can be very, very valuable to your work and the work of many other people. All right.

So, back to the AI architecture. We're going to go very briefly through the different pieces of this architecture. The first piece is data conditioning, which is converting unstructured and structured data. Within the structured data, you might have sources such as sensors, network logs for some of you, metadata associated with sensors, and maybe speech or other such signals. There's also a lot of unstructured data: think of things that you might collect from the internet or download from, say, a social media site, maybe reports, or other types of sensors that don't have strong structure within the data set itself.

Typically, this first step, what we call the data conditioning step, consists of a number of different elements. You might want to first figure out where to put this data. That can often take a lot of time, and there have been religious wars fought on this topic. We're here to tell you that you're probably OK picking most technologies, but if you have any questions, feel free to reach out to me or to others on the team; we have a lot of opinions on the right infrastructure for a given problem. Typically, these infrastructures or databases might provide capabilities such as indexing, organization, and structure -- very important with unstructured data, to convert it into some format that you can do things with. They may allow you to connect to them using domain-specific languages, so you can interact with the data in a language you're used to. And they can provide high-performance data access and, in many cases, a declarative interface, because maybe you don't really care about how the data is being accessed. You want to just say, select the data, give it to me, and then move forward from there.
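As a small sketch of that conditioning step, here is semi-structured data pulled into an indexed, queryable form with pandas. The records, field names, and values are made up for illustration:

```python
# Toy data conditioning: semi-structured records -> indexed table that
# supports declarative-style access ("select the data, give it to me").
import pandas as pd

records = [  # hypothetical sensor metadata, e.g. parsed from JSON logs
    {"sensor": "A1", "time": "2019-01-01T00:00", "reading": 3.2},
    {"sensor": "B7", "time": "2019-01-01T00:05", "reading": None},
    {"sensor": "A1", "time": "2019-01-01T00:10", "reading": 2.9},
]

df = pd.DataFrame.from_records(records)
df["time"] = pd.to_datetime(df["time"])
df = df.set_index(["sensor", "time"]).sort_index()  # indexing/organization

# Declarative-style query: all readings from sensor A1.
print(df.loc["A1"])
```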
Another important part of the data conditioning step is data curation. This, unfortunately, will probably take you a very long time, and it requires a lot of knowledge of the data itself, what you want to do with the data, and how you received the data. In the data curation step, you might perform some unsupervised learning: maybe reduce the dimensionality of your problem, or do some clustering or pattern recognition to remove certain pieces of your data or to highlight certain pieces that look important. You might do some outlier detection. You might highlight missing values. And the list goes on, et cetera, et cetera. A lot goes on in the data curation step; we could certainly spend hours just talking about that.

And the final thing, especially within the context of supervised machine learning, but even in the world of unsupervised learning, would be spending some time on data labeling, right? This is taking data that you've received and typically doing an initial data exploration. That could be as simple as opening it up in Excel to see what the different columns and rows look like, if that's a suitable place to open it. You might highlight missing or incomplete data just from that initial exploration. You might be able to go back to the data provider or to the sensor and say, can you reorient the sensors or recapture the data? I noticed that every time you've measured this particular quantity, it always shows up as 3. I can't imagine that that's correct. Can you go back and tell me if that sensor is actually working? Or is it actually 3? In which case, you might want to know that. And you might look for errors and biases in collection, of course, on top of the actual labeling process that you're doing to highlight phenomenology within the data that you'd like to then look for through your machine learning algorithms.
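A minimal sketch of the kind of curation checks just described, flagging missing values and crude outliers so you know what to take back to the data provider. The column name, values, and 2-sigma threshold are illustrative only:

```python
# Toy data curation: report missing values and flag crude outliers.
# The 2-sigma rule here is purely illustrative, not a recommendation.
import pandas as pd

df = pd.DataFrame({"reading": [3.0, 3.0, 3.0, 3.1, 97.0, None, 2.9]})

missing = df["reading"].isna()
z = (df["reading"] - df["reading"].mean()) / df["reading"].std()
outliers = z.abs() > 2

print("missing rows:", df.index[missing].tolist())   # -> [5]
print("outlier rows:", df.index[outliers].tolist())  # -> [4]
```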
I'll pause for a second. Yes?

AUDIENCE: I have a quick question.

VIJAY GADEPALLY: Yeah?

AUDIENCE: What's the ratio that you see between structured data and unstructured data?

VIJAY GADEPALLY: So the question is, what's the ratio we see between structured and unstructured data? That's a great question. Do you mean the ratio in terms of the volume, or in terms of what you can do with it? Because those are actually almost opposites. Again, speaking of a few data sets that I'm very familiar with, the unstructured data can often be 90% of the volume, and maybe the 10% is the metadata associated with the unstructured data. Most of the value, however, comes from the structured data, where people really analyze the crap out of it, because they know how to.

There is certainly a lot of potential within the unstructured data, and that's why, when we talk to people, we talk a lot about infrastructure and databases as an important first step. If you can just take the unstructured data and put it into a structured or semi-structured form, that itself can provide a lot of value. Very often, in the problems that we see, that 90% of the data volume is largely untapped, because people don't know how to get into it, or don't know what to do with it, or it's not in a form that you can really deal with. So next class, we're going to be talking about how to organize your data -- strategies for organizing data that can get you a lot more value out of the unstructured data. Does that answer your question? Yes?

AUDIENCE: [INAUDIBLE]

VIJAY GADEPALLY: So the question is, when you apply AI or machine learning techniques to a problem domain, is it typically a single modality or multiple modalities? I'd say the answer is both. Certainly, there's a lot of research -- back there, we have Matthew, who's actually doing research right now on how to fuse multiple modalities of data -- and I know a lot of the projects being discussed here are certainly looking at multiple modalities. If I had to characterize things as of today, a lot of the published work may be focused on a single modality. But that's not to say -- I mean, I think there is a lot of value in multiple modalities. The challenge still comes up of how you integrate the data, especially if it's collected from different systems. Yep?

AUDIENCE: Just on structured versus unstructured. It's not really my area, but I am surprised to see speech in the structured [INAUDIBLE]. And I wonder, is that just because the technologies that can process the [INAUDIBLE] and all of this data conditioning are mature enough that you can basically treat it [INAUDIBLE]?

VIJAY GADEPALLY: So the question is, why would speech or something like that fall into structured versus unstructured? And you're absolutely right. I'm sure there are others in the room that might disagree with that and might stick it over here. When we look at the type of acquisition processes and software that are used, they typically come out with some known metadata.
They follow a certain pattern that we can then use, right? There is a clear range, a known frequency at which the data is collected. And that's why we stuck it in the structured data category. Of course, if you're collecting data out in the field without all of that, you could probably stick it into the unstructured world as well. But speech is probably a good example of something that can fall between the two. OK. All right.

Now, for the part everyone's really interested in: machine learning, right? So, all right, you got through the boring data conditioning stuff, which will take you a couple of years or something like that -- nothing serious -- and now you're ready to do the machine learning. And now you're given a choice: which algorithm do you use? Neural networks, you might say, right? There is a lot more, though, beyond the neural network world. There are numerous taxonomies, and I'm going to give you two of them today for describing machine learning algorithms. One really interesting one is from Pedro Domingos at the University of Washington, in which he says that there are five tribes of machine learning.

There are the symbolists; an example of that would be expert systems. There are the Bayesians; an example of an algorithm within that tribe might be naive Bayes. There are the analogizers; an example of that would be a support vector machine. There are the connectionists, an example of which would be deep neural networks. And there are the evolutionaries, an example of which might be genetic programming. What I'm really trying to get across -- and I'm sure the author is trying to get across -- is that there are lots and lots of different algorithms, each with its relative merits and strengths. Apply the right one for your application. Again, use those dials I talked about earlier: the amount of computing you have available, the amount of data you have available, and the amount of expert knowledge that you're able to encode into your algorithm that you think is generalizable enough.
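To ground the taxonomy a little, here is a hedged sketch fitting representatives of two of the tribes -- a Bayesian method (naive Bayes) and an analogizer (a support vector machine) -- on the same synthetic data with scikit-learn; nothing here is from the lecture's own benchmarks:

```python
# Two of Domingos's five tribes on the same toy problem:
# a Bayesian (GaussianNB) and an analogizer (SVC). Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (GaussianNB(), SVC()):
    score = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, "accuracy:", round(score, 3))
```

Which one wins depends on the data, which is exactly the point about applying the right algorithm for your application.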
This next chart is one I've found very useful in describing things to folks who are not familiar with AI and might say, isn't AI just neural networks? Neural networks are a part of AI, but not necessarily all of it. If we think of the big circle as the broad field of artificial intelligence, within that is the world of machine learning. Within machine learning are the connectionists, or neural networks, which fall into a small camp within that. And deep neural networks are a part of neural networks. So can anyone maybe give me an example -- although I've said it numerous times -- of something that might fall outside of machine learning, but inside artificial intelligence, from an algorithmic point of view? Yes?

AUDIENCE: Graph search.

VIJAY GADEPALLY: Graph search could be an example. I would maybe stick that in with some of the connectionists, however.

AUDIENCE: Expert systems.

VIJAY GADEPALLY: Yes, exactly. Expert systems -- that's the one that comes to my mind. Knowledge-based systems are an example of something that falls outside the realm of machine learning, in the very strict sense, but within the realm of artificial intelligence from an algorithmic point of view.

OK, so that's a little bit on the algorithms. Next, let's talk about some of the modern computing engines that are out there. I mentioned that data and compute, as well as algorithms, have been key drivers of the resurgence of AI over the past few years. What are some of these computing technologies? Clearly, CPUs and GPUs: they're very popular computing platforms, with lots of software written to work with them. But what we're seeing now is that with the end of Moore's Law and a lot more performance engineering going on, there's a lot more work and research in hardware architectures that are custom in nature. Custom architectures are almost the new commercial off-the-shelf solutions.

An example of a custom architecture could be Google's Tensor Processing Unit, or TPU. There is some very exciting research going on in the world of neuromorphic computing; I'm happy to chat with you all later if you're interested in what's going on in that area and maybe our role in some of that work. And there is some work that we would still just call custom. This is people looking at an algorithm and saying, OK, here's the data layout, here is the movement of data or information within this algorithm -- let's create a custom processor that does exactly that. An example of that could be the graph processor, which is being developed at Lincoln Laboratory.
And obviously, no slide on computing architectures or computing technologies would be complete without mentioning the word quantum. There are some early results on solving linear systems of equations, but applied to AI, it's still unknown or unproven where quantum may play a part. Certainly, though, it's a technology that all of us, I'm sure, have heard of, continue to track, or are just interested in seeing where it goes. The first few of these, however, are all products that you can buy today. You can go out to your favorite computing store and just purchase these off-the-shelf solutions. A lot of software has been written to work with these different technologies, and it's a really nice time to be involved. Yeah?

AUDIENCE: Can you give a brief concept of what is attached to the [INAUDIBLE]? I see [INAUDIBLE], but I don't really have a high-level concept of why I should associate with that.

VIJAY GADEPALLY: OK, so the question is, what should I think about when I'm thinking about neuromorphic? There are a few features which I'd say fall into the camp of what people are calling neuromorphic computing. One is what they're calling a brain-inspired architecture, which often means it's clockless. A lot of these other technologies have clocked movement of information; these might be clockless in nature. They typically sit on top of different types of memory architectures. And I'm trying to think of another parameter that would be useful -- I can probably send you a couple of things that help highlight that. I certainly wouldn't call myself an expert in this area.

AUDIENCE: OK, thanks.

VIJAY GADEPALLY: But, yeah, I think the term that's used is that it's supposed to mimic the brain in the way that the computing architecture actually performs or functions.

So, lots of research as well. And this is work that we've done here at the lab on actually trying to map the performance of these different processors and how they perform for different types of functions. What we're doing here is basically looking at power on the x-axis, while the y-axis is the peak performance in giga operations per second. Different types of precision are noted by the different shapes of the boxes, and then there are different form factors. The idea here is basically to say: there's so much going on in the world of computing -- how can we compare them?
They all have their own individual areas where they're strong. So one can't just come up and say, well, the GPU is better than the CPU; it depends on what you're trying to do and what the goals of the operation are. Some of the key lines to note here: there seem to be a lot of existing systems on this 100 giga operations per watt line over here, this dashed line. Some of the newer offerings maybe fit on the 1 tera-op per watt line. And some of the research chips, like IBM's TrueNorth or Intel's Arria, fall just a bit under the 10 tera operations per watt line that we see there.

But depending on the type of application, you may be OK with a certain amount of peak power. If you're looking at embedded applications, you're probably somewhere over here, right? If you're trying to get something onto a little drone or something like that, you might want to go here. And if you have a data center, you're probably OK with that type of peak power utilization, but you do need the performance that goes along with it. So I'd say the most important parts to look at are essentially these different lines. Those are the trajectories, from some of the existing systems all the way up to some of the more research-oriented processors out there. OK? All right.
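Those iso-efficiency lines are just peak performance divided by power. A small worked example with hypothetical, made-up numbers shows how two very different processors can sit on the same giga-ops-per-watt line:

```python
# Reading the peak-performance-vs-power chart: efficiency is
# ops/second divided by watts. All numbers below are hypothetical.
processors = {
    "embedded chip":   {"giga_ops": 10,     "watts": 0.1},  # low power
    "data-center GPU": {"giga_ops": 30_000, "watts": 300},  # high power
}

for name, p in processors.items():
    gops_per_watt = p["giga_ops"] / p["watts"]
    print(f"{name}: {gops_per_watt:,.0f} giga-ops/watt")
# Both land on the same 100 giga-ops/watt line despite a ~3000x
# difference in absolute peak performance.
```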
So we talked about modern computing. Let's talk a little bit about the robust AI side of things. The basic idea behind robust AI is that it's extremely important, and the reason it's important is that the consequence of actions in certain applications of AI can be quite high. So what we've done here is think about where humans and machines have their relative strengths. On the x-axis, we have the consequence of action. This could range from, does somebody get hurt if the system doesn't do the right thing? all the way down to, no worries if the system doesn't do the right thing -- which could be, say, some of the labeling of images that we see online. I'm sure people disagree with me on that. But a lot of national security applications, and certainly health applications, fall into the area of high consequence of action: if you give someone the wrong treatment, that's a big deal. And on the y-axis, we have the confidence level in the machine making the decision: how much confidence do we have in the system that's actually making the decision?

In certain cases, we might have very high confidence in the system that's making a decision, and obviously in certain cases we do not. So in areas where you have a low consequence of action and maybe a high confidence level in the machine making the decision, we might say those are best matched to machines -- good candidates for automation. On the contrary, there might be areas where the consequence of action is very high and we have very little confidence in the system making the decision; that's probably an area where we want humans to be intimately, if not solely, involved or responsible. And the area in between is where machines might be augmenting humans.

Does anybody want to venture a couple of examples that we might put into each of these categories? Maybe, what's a good problem you can think of that might be best matched to machines, beyond labeling images for advertisements?

AUDIENCE: Assembly lines.

VIJAY GADEPALLY: Assembly lines? Yep, that's a good example. I'm thinking spam filtering could be another example where -- I mean, there is some machine augmenting human; it does send you an email saying, this is spam, are you sure? But for the most part, it's largely automated.

I'd say a lot of the work that many of us are probably doing falls into the category -- maybe at different points on the spectrum -- of machines augmenting humans. So the system can be providing data back to a human, who can then select. It might filter information out for humans, so that the human can say, OK, instead of looking at a thousand documents, I only have to look at 10, which is much better. And then there are certain things -- anything kinetic, anything that involves life or death -- where we probably want humans to be heavily involved. There are probably legal reasons, also, that we want humans involved with things like that.

One of the examples that we often get is autonomous vehicles, and it's always a little confusing where autonomous vehicles fall in this. Certainly, the consequence of action of a mistake in an autonomous vehicle can be pretty high. And as of today, the confidence in the decision-making is medium at best. But people still seem to somehow be OK with fully automating.
That just shows how terrible Boston roads, or driving in general, are -- we're like, I'm not really sure if this thing will kill me or not, but it's totally worth trying out.

AUDIENCE: Do you think the trend in this chart is to slowly expand the yellow out?

VIJAY GADEPALLY: The question is, is the yellow expanding? I think so. One could make the argument that it's shifting in that direction, that we're finding areas where -- and I think that's maybe the direction. We are probably looking at automating certain things a little bit more as confidence in the decision-making goes up. So you might think about this frontier moving down, with the green expanding slightly and the yellow taking over a little bit of the red. There might be some places where, over time, we're more open to the machine making a decision and the human having a largely supervisory role, which I would put right at the frontier between the yellow and the red.

AUDIENCE: Again, I guess it depends on what augmenting means. But I guess [INAUDIBLE] is truly red without any -- even cognitive -- augmenting.

VIJAY GADEPALLY: I can think of some examples, but maybe I'll share them with you later.

So certainly, robust artificial intelligence plays a very important part in the development and deployment of AI systems. I won't go through the details of each of these; I'm sure many of you are very familiar with them, and I know a few of you are far more knowledgeable about this than I am. But some of the key pieces would be explainable AI, which is a system being able to describe what it's doing in an interpretable fashion. Metrics: being able to provide the right metric if you want to go beyond accuracy or performance. Validation and verification: there might be cases where you're not really concerned about explainability, but you just want to know that when I pass in an input, I get a known output -- and is there a way to confirm that? Another is security; an example of not having security would be counter-AI, right? When we talk about security within the context of robust AI, it's almost the cryptographic way of thinking about it: can I protect the confidentiality, integrity, and availability of my algorithm, the data sets, the outputs, the weights, the biases, et cetera? And finally, of significant importance are policy, ethics, safety, and training.
This is actually very important in some of those applications from the previous slide -- the yellow and the red regions, where machines are augmenting humans. A lot of that might be governed by policy, ethics, safety, and training; in some of the examples I can think of, there are policy reasons that dictate that only a human can be involved, maybe with minimal input from a system. OK.

And the final component of our AI architecture -- we've gone through conditioning, algorithms, computing, and robust AI -- is human-machine teaming. I think what we want to get across with human-machine teaming is that it really depends on the application and what you're trying to do, but it is important to think about the human and the machine working together. There is a spectrum, from where the machine plays a large part and the human is largely supervisory, to where the human plays a large part and the use of the machine, or the AI of the system, is very targeted.

A couple of ways to think about it: of course, we talked about confidence level versus consequence of action, but there's also scale versus application complexity. On the top chart over there, we have on the x-axis the application complexity -- how complex is this application? -- and on the y-axis the scale -- how many times do you need to keep doing this thing? Places where machines might be more effective than humans are where we have low application complexity but very, very high scale. Again, spam filtering falls into this. The complexity of spam filtering has gone up over time, but it's something that is manageable for these systems, and the scale is so high that we just don't want a human involved in that process.

On the other end of the spectrum is where you have very high application complexity that will only happen a couple of times. This could be, say, reviewing a situation where a company is trying to make an acquisition; it's not going to happen over and over. So you might have a human involved who works through a lot of that. Maybe they target the system to go look for specific pieces of information, but really, it's the human that might be more effective there, especially given that the situation changes every time. All right.

So with that, we're going to take a quick tour of the world of machine learning.
I'll 524 00:42:45,400 --> 00:42:54,070 stop there for a second. Any questions? OK. All right. So what is machine learning? Always 525 00:42:54,070 --> 00:42:59,450 a good place to start. It's the study of algorithms that improve their performance at some task 526 00:42:59,450 --> 00:43:04,059 with experience. In this context, experience is data. 527 00:43:04,059 --> 00:43:09,829 And they typically do this by optimizing based on some performance criterion that uses example 528 00:43:09,829 --> 00:43:15,580 data or past experience. So in the world of supervised learning, that example 529 00:43:15,580 --> 00:43:21,390 data or past experience could be the correct label, given an input data set or input data 530 00:43:21,390 --> 00:43:23,630 point. 531 00:43:23,630 --> 00:43:28,240 Machine learning is a combination of techniques from the statistics and computer science 532 00:43:28,240 --> 00:43:34,150 communities. And it's the idea of getting computers to program themselves. Common tasks 533 00:43:34,150 --> 00:43:37,950 within the world of machine learning could be things like classification, regression, 534 00:43:37,950 --> 00:43:41,210 prediction, clustering, et cetera. 535 00:43:41,210 --> 00:43:46,109 For those who are maybe making the shift to machine learning from traditional programming, 536 00:43:46,109 --> 00:43:52,260 I found this, again, from Pedro Domingos to be a very useful way of describing it to people. 537 00:43:52,260 --> 00:43:56,309 So in traditional programming, you have a data set. You write a program, which would 538 00:43:56,309 --> 00:44:01,650 be if you see this, do that. When you see this, do that. For this many instances, do 539 00:44:01,650 --> 00:44:04,480 the following thing on it and then write an output out, right? 540 00:44:04,480 --> 00:44:09,530 So you input the data and the program into a computer, and the computer produces an output 541 00:44:09,530 --> 00:44:14,849 where it says, OK, I've applied this program on that data. And this gives me the output. 542 00:44:14,849 --> 00:44:19,289 Machine learning is a very different way of thinking about it, in which you're almost inputting 543 00:44:19,289 --> 00:44:21,289 the data as well as the output. 544 00:44:21,289 --> 00:44:25,810 So in this case, the data could be unlabeled images. The output could be the labels associated 545 00:44:25,810 --> 00:44:30,839 with those images. And you tell the computer, figure out what the program would look like. 546 00:44:30,839 --> 00:44:34,599 And this is a slightly different way of thinking about machine learning versus traditional 547 00:44:34,599 --> 00:44:36,019 programming. 548 00:44:36,019 --> 00:44:41,759 What are some of these programs or algorithms that the computer might use to figure it out? 549 00:44:41,759 --> 00:44:47,549 So within the large realm of machine learning, we have supervised, unsupervised, and reinforcement 550 00:44:47,549 --> 00:44:53,030 learning. What we have in the brackets is essentially what you're providing. 551 00:44:53,030 --> 00:44:56,589 In the case of supervised learning, you're providing labels, which is the correct label 552 00:44:56,589 --> 00:45:02,410 associated with an input feature or with an input data set or data point. In unsupervised 553 00:45:02,410 --> 00:45:06,080 learning, you typically have no labels, but also are limited by what the algorithm itself 554 00:45:06,080 --> 00:45:07,080 can do.
555 00:45:07,080 --> 00:45:12,109 And in the world of reinforcement learning, instead of a label per data point, you're 556 00:45:12,109 --> 00:45:15,859 providing reward information to the system that says, if you're 557 00:45:15,859 --> 00:45:18,760 doing the right thing, I'm going to give you some points. If you're doing the wrong thing, 558 00:45:18,760 --> 00:45:23,480 I'm going to take away some points-- very useful in very complex applications where 559 00:45:23,480 --> 00:45:28,210 you can't really figure out the labels associated with each data point. 560 00:45:28,210 --> 00:45:32,579 Within the world of supervised learning, the typical tasks that people have-- and I should 561 00:45:32,579 --> 00:45:36,460 note, before I go through this, there's a lot of overlap between all of these different 562 00:45:36,460 --> 00:45:42,069 pieces. So this is a high-level view. But we can certainly argue about the specific 563 00:45:42,069 --> 00:45:45,849 positioning of everything. I'm sure we can. 564 00:45:45,849 --> 00:45:50,099 So within supervised learning, you can fall into classification and regression. Unsupervised 565 00:45:50,099 --> 00:45:55,420 learning is typically clustering and dimensionality reduction. And within these, there are different 566 00:45:55,420 --> 00:46:00,760 algorithms that fall into place. So examples could be things like neural nets, which cover 567 00:46:00,760 --> 00:46:02,349 all of these spaces. 568 00:46:02,349 --> 00:46:10,589 You've got techniques like regression, or PCA, which might fall into dimensionality reduction-- 569 00:46:10,589 --> 00:46:14,960 lots and lots of different techniques, and also some in the reinforcement learning world. 570 00:46:14,960 --> 00:46:20,609 And there's just more and more and more. If you open up a survey of machine learning, 571 00:46:20,609 --> 00:46:25,140 it'll give you even more than all of these techniques over here. 572 00:46:25,140 --> 00:46:28,990 And the thing to remember when you're using machine learning is that there are some common 573 00:46:28,990 --> 00:46:34,309 pitfalls that you can fall into. An example of that would be overfitting versus underfitting, 574 00:46:34,309 --> 00:46:38,690 where you come up with this awesome model that does really, really well on your training 575 00:46:38,690 --> 00:46:39,690 data. 576 00:46:39,690 --> 00:46:46,900 You apply it to your test data, and you get terrible results. You might have done a really 577 00:46:46,900 --> 00:46:51,309 good job learning the training data, but not necessarily been able to generalize 578 00:46:51,309 --> 00:46:59,130 beyond that. Sometimes it could be just that the algorithm itself is unable to correctly model 579 00:46:59,130 --> 00:47:01,950 the behavior that's exhibited by the training and test data. 580 00:47:01,950 --> 00:47:06,351 I won't go through each of these again, but there might just be bad, noisy, missing data. 581 00:47:06,351 --> 00:47:09,690 That certainly happens, where you end up with an algorithm with terrible results. And you 582 00:47:09,690 --> 00:47:12,700 look at it and you're like, well, why is that? And you actually look at the data that you 583 00:47:12,700 --> 00:47:17,599 used. And it was incorrect, or there were just missing features. Or it was noisy in 584 00:47:17,599 --> 00:47:23,059 nature, such that the actual phenomenology that you were trying to look for was hidden 585 00:47:23,059 --> 00:47:24,059 within the noise. 586 00:47:24,059 --> 00:47:28,730 You might have picked the wrong model.
You might have used a linear model in a non-linear 587 00:47:28,730 --> 00:47:34,839 case, where the phenomenology you're trying to describe is non-linear in nature, but maybe 588 00:47:34,839 --> 00:47:40,249 you've used a linear model. You've not done a good job of separating training versus testing 589 00:47:40,249 --> 00:47:44,510 data, et cetera, et cetera. 590 00:47:44,510 --> 00:47:49,220 So we'll just take a quick view into each of these different learning paradigms. So 591 00:47:49,220 --> 00:47:55,089 the first is on supervised learning. And you basically start with labeled data, or what is 592 00:47:55,089 --> 00:47:59,460 often referred to as ground truth. And you build a model that predicts 593 00:47:59,460 --> 00:48:02,349 labels, given new pieces of data. 594 00:48:02,349 --> 00:48:07,400 And you have two general goals. One is regression, which is to predict some continuous 595 00:48:07,400 --> 00:48:12,839 variable; the other is classification, which is to predict a class or label. So if we look at 596 00:48:12,839 --> 00:48:21,130 this, the diagram on the right, we have training data that we provide, which is data and labels. 597 00:48:21,130 --> 00:48:27,349 That goes into a trained model. That's typically an iterative process where we find out, well, 598 00:48:27,349 --> 00:48:32,650 did we do a good job? That is now called a supervised learning model, to which we then apply 599 00:48:32,650 --> 00:48:37,539 new data, or test data, or unseen data, and look at the predicted labels. 600 00:48:37,539 --> 00:48:40,960 Typically, when you are designing an algorithm like this, you'd separate out. You'd take 601 00:48:40,960 --> 00:48:45,610 your training data. You'd remove a small portion of it that you do know the labels for. That's 602 00:48:45,610 --> 00:48:50,060 your test data over here. And then you run that. And you can see, well, is it working 603 00:48:50,060 --> 00:48:52,690 well or not? 604 00:48:52,690 --> 00:48:57,400 And most of these algorithms have a training step that forms a model. So when we talk about 605 00:48:57,400 --> 00:49:02,499 machine learning, in both the supervised and unsupervised sense, we'll often talk about 606 00:49:02,499 --> 00:49:08,279 training the model, which is this process, and then inference, which is the second step, 607 00:49:08,279 --> 00:49:13,660 which is where you apply unseen data. So this is the trained model in deployment or in the 608 00:49:13,660 --> 00:49:16,579 field. It's performing inference at that point. 609 00:49:16,579 --> 00:49:23,650 Of course, no class these days on machine learning and AI could go without talking about 610 00:49:23,650 --> 00:49:30,130 neural networks. And as I mentioned, neural networks do form a very important part of 611 00:49:30,130 --> 00:49:33,680 machine learning. And they certainly are an algorithm that many of you, I'm sure, are 612 00:49:33,680 --> 00:49:38,810 familiar with. And they fall well within the supervised and unsupervised worlds. And they've been 613 00:49:38,810 --> 00:49:42,009 used for so many different applications at this point. 614 00:49:42,009 --> 00:49:46,990 So what's a neural network? A computing system inspired by biological networks. And the system 615 00:49:46,990 --> 00:49:52,710 essentially learns by repetitive training to do tasks based on examples.
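To make the train/test/inference pipeline just described concrete, here is a minimal sketch in Python using scikit-learn. The data set, classifier, and 80/20 split are illustrative assumptions, not choices from the lecture:

```python
# Minimal sketch of the supervised pipeline: split labeled data into
# train/test, fit a model, then run inference on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # data and ground-truth labels

# Remove a small labeled portion as test data so we can check whether
# the trained model generalizes beyond the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)  # the "trained model" box
model.fit(X_train, y_train)                # training step

y_pred = model.predict(X_test)             # inference on unseen data
print("held-out accuracy:", accuracy_score(y_test, y_pred))
```

Scoring on the held-out portion is also how the overfitting pitfall mentioned earlier shows up in practice: a model that has merely memorized the training data will do well in training and poorly here.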
Much of the 616 00:49:52,710 --> 00:49:56,410 work that we've seen typically applies them to supervised learning, though I'll 617 00:49:56,410 --> 00:50:00,509 mention some research that we are doing that actually applies them to unsupervised 618 00:50:00,509 --> 00:50:04,099 learning as well. And they're quite powerful. 619 00:50:04,099 --> 00:50:09,339 The components of a neural network include inputs, layers, outputs, and weights. So these 620 00:50:09,339 --> 00:50:14,039 are often the terms that someone will use. And a deep neural network has lots of hidden 621 00:50:14,039 --> 00:50:20,799 layers. Does anyone here have a better definition for what deep neural network means beyond 622 00:50:20,799 --> 00:50:22,645 lots? I've heard definitions anywhere from 3 layers and above. Yes? 623 00:50:22,645 --> 00:50:24,560 AUDIENCE: [INAUDIBLE] deep neural network will occur at any recurrent networks. Because 624 00:50:24,560 --> 00:50:36,299 that has more than one layer, but not necessarily more than one layer after you have actually 625 00:50:36,299 --> 00:50:37,742 written the code for it. 626 00:50:37,742 --> 00:50:41,580 VIJAY GADEPALLY: OK, so one definition here for deep is-- and this is-- anyone have a 627 00:50:41,580 --> 00:50:48,539 better-- no. So the one to beat right now is-- a feature of a deep neural network could 628 00:50:48,539 --> 00:50:54,980 be recurrence within the network architecture, which implies that there is some depth to 629 00:50:54,980 --> 00:51:02,710 the overall network. So above 3 with recurrence-- deep. All right. 630 00:51:02,710 --> 00:51:08,040 Lots of variants within the supervised world of neural networks, such as convolutional 631 00:51:08,040 --> 00:51:12,541 neural networks, recursive neural networks, deep belief networks. One, I think, in my 632 00:51:12,541 --> 00:51:18,630 opinion-- again, since you've all asked me to opine here. I know you've not, but I think 633 00:51:18,630 --> 00:51:22,390 a reason that these are so popular these days is there are so many tools out there that are 634 00:51:22,390 --> 00:51:24,609 very easy to use. 635 00:51:24,609 --> 00:51:28,369 You can just go online and, within about five minutes, write your first neural network. 636 00:51:28,369 --> 00:51:36,130 Try writing a hidden Markov model that quickly. Maybe there are people who can, but in general. 637 00:51:36,130 --> 00:51:41,650 So what are the features of a deep neural network? So you have some input features. 638 00:51:41,650 --> 00:51:47,359 You have weights, which are essentially associated with each line over here, as well as biases 639 00:51:47,359 --> 00:51:51,789 for each of the layers that govern the interaction between the layers, and then an output layer. 640 00:51:51,789 --> 00:51:57,039 So these input features can often be combined with each other. So these feature vectors that 641 00:51:57,039 --> 00:52:01,559 are coming in can often be combined. I think Jeremy will talk a little bit about the 642 00:52:01,559 --> 00:52:04,520 matrix view of all of this. 643 00:52:04,520 --> 00:52:09,839 But you can think of it as-- an example could be, if you have an image, 644 00:52:09,839 --> 00:52:17,799 the RGB values of each pixel in that image could be the input features. So you 645 00:52:17,799 --> 00:52:22,510 could have large numbers of input features. If you have a time series signal, it could 646 00:52:22,510 --> 00:52:28,410 be the amplitude or the magnitude at a particular frequency or at a particular step.
647 00:52:28,410 --> 00:52:33,140 There's often a combination of features that you might use. So in addition to the pixel 648 00:52:33,140 --> 00:52:39,000 intensities for an image, you might also then combine in the spatial distance between two pixels. 649 00:52:39,000 --> 00:52:43,369 Or a pixel's position within the image may also be another input feature. And you can really 650 00:52:43,369 --> 00:52:47,280 go hog wild over here, just trying to come up with new features. 651 00:52:47,280 --> 00:52:52,030 And there's a lot of research just in that area, which is, I take a data set that everyone 652 00:52:52,030 --> 00:52:56,740 knows. And I'm just going to spend a lot of time doing feature engineering, which is coming 653 00:52:56,740 --> 00:53:01,730 up with, well, what is the right way to do the features? So coming back to an earlier 654 00:53:01,730 --> 00:53:07,400 question, this is an area where people are often looking at supplementing maybe a given 655 00:53:07,400 --> 00:53:09,799 data set with additional data. 656 00:53:09,799 --> 00:53:17,249 And then fusing those two pieces together-- for example, audio and text together 657 00:53:17,249 --> 00:53:24,579 as input features to a network that you can then train-- might do a better job. But 658 00:53:24,579 --> 00:53:30,600 all of this is governed by this really, really simple, but powerful equation, which is that 659 00:53:30,600 --> 00:53:39,749 the output at the (i+1)-th layer is given by some non-linear function of the weights 660 00:53:39,749 --> 00:53:45,650 multiplied by the inputs from the previous layer, plus some bias term. 661 00:53:45,650 --> 00:53:49,109 And when you're learning-- when you're training a machine learning model, you're essentially 662 00:53:49,109 --> 00:53:54,000 trying to figure out what the Ws are and what the Bs are. That's really what a model is 663 00:53:54,000 --> 00:54:00,640 defined as. So if we zoom into one of these pieces, it's actually pretty straightforward 664 00:54:00,640 --> 00:54:02,240 what's going on over here. 665 00:54:02,240 --> 00:54:05,749 So you have your inputs that are coming from the previous layers, so this could be your 666 00:54:05,749 --> 00:54:11,550 Y sub i. Here are the different weights, so W1, W2, W3. These are the connections or the 667 00:54:11,550 --> 00:54:18,819 weights going into a neuron or a node. And you're performing some function on these inputs. 668 00:54:18,819 --> 00:54:22,700 And that function is referred to as an activation function. 669 00:54:22,700 --> 00:54:26,130 So let's just take an example where we have some actual numbers. Maybe I've gone through. 670 00:54:26,130 --> 00:54:31,230 I've trained my models. I figured out that, just for this one dot in that big network 671 00:54:31,230 --> 00:54:40,079 that we saw earlier, my weights are 2.7, 8.6, and 0.002. My inputs from the previous 672 00:54:40,079 --> 00:54:46,779 layer are maybe -0.06, 2.5, 1.4. 673 00:54:46,779 --> 00:54:56,069 And all I'm doing is coming up with this x, which is -0.06 multiplied by 2.7, plus 2.5 674 00:54:56,069 --> 00:55:03,780 times 8.6, plus 1.4 times 0.002. That gives me some number-- 21.34. I apply my non-linear 675 00:55:03,780 --> 00:55:08,829 function, which in this case is a sigmoid, governed by that equation at the top right. 676 00:55:08,829 --> 00:55:15,670 And I say f of 21.34, so somewhere way over there, is approximately 1, right? So this-- 677 00:55:15,670 --> 00:55:21,789 probably a little less than 1, but approximately 1 for the purpose of this.
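In other words, each layer computes y_(i+1) = f(W_i * y_i + b_i), and a single neuron is just a dot product followed by the activation f. Here is that worked example in a few lines of Python; the zero bias is an assumption, since the example on the slide doesn't state one:

```python
import numpy as np

def sigmoid(x):
    # the non-linear activation function f (the equation at the top right)
    return 1.0 / (1.0 + np.exp(-x))

w = np.array([2.7, 8.6, 0.002])       # trained weights into this one neuron
y_prev = np.array([-0.06, 2.5, 1.4])  # inputs from the previous layer
b = 0.0                               # bias (assumed zero; not given on the slide)

x = np.dot(w, y_prev) + b  # -0.06*2.7 + 2.5*8.6 + 1.4*0.002 = 21.3408
print(round(x, 2))         # 21.34
print(sigmoid(x))          # 0.9999999994... -- approximately 1
```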
678 00:55:21,789 --> 00:55:26,220 And you just do that over and over. So really, a neural network-- I think the power of a 679 00:55:26,220 --> 00:55:30,609 neural network is it allows you to encode a lot less information than many of the other 680 00:55:30,609 --> 00:55:35,109 machine learning algorithms out there, at the cost, typically, of a lot more data being 681 00:55:35,109 --> 00:55:40,519 used and a lot more computing being used. But for many people, that's perfectly fine, 682 00:55:40,519 --> 00:55:41,519 right? 683 00:55:41,519 --> 00:55:45,249 But it does take-- it's just over and over, back and forth, back and forth, back and forth 684 00:55:45,249 --> 00:55:54,670 to come up with, what are the right Ws in order for this to give me a result that looks reasonable? 685 00:55:54,670 --> 00:55:58,989 Lots of work going on in just deciding the right activation function. 686 00:55:58,989 --> 00:56:06,200 I showed you a sigmoid over there. We do a lot of work with ReLU units. The choices-- 687 00:56:06,200 --> 00:56:10,369 there are certain applications-- certain, I should say, domains or applications where 688 00:56:10,369 --> 00:56:16,560 people have found that a particular activation function tends to work well. 689 00:56:16,560 --> 00:56:21,440 But that choice is something I leave to domain experts, to maybe look at their problem and 690 00:56:21,440 --> 00:56:25,989 figure out what are the relative advantages. Each of these has its own advantages. I 691 00:56:25,989 --> 00:56:30,230 know, for example, one of the big advantages of the rectified linear unit is that, since you're 692 00:56:30,230 --> 00:56:35,900 not limiting yourself to between a 0 and 1 range, you don't run into 693 00:56:35,900 --> 00:56:39,839 the problem of vanishing gradients. If that doesn't mean much to you, that's OK. We're not 694 00:56:39,839 --> 00:56:43,980 going to spend too much time talking about that anyhow. 695 00:56:43,980 --> 00:56:49,089 AUDIENCE: Vijay? 696 00:56:49,089 --> 00:56:56,760 VIJAY GADEPALLY: Yeah? 697 00:56:56,760 --> 00:56:59,383 AUDIENCE: So in general, [INAUDIBLE]. 698 00:56:59,383 --> 00:57:03,799 VIJAY GADEPALLY: So the question is picking the activation functions, picking the number 699 00:57:03,799 --> 00:57:09,750 of layers. We'll talk about that in a couple of slides. But there is a lot of art. Trial 700 00:57:09,750 --> 00:57:14,580 and error-- yes, but also, we'll call it art as well, that's involved with coming up with 701 00:57:14,580 --> 00:57:15,580 that. 702 00:57:15,580 --> 00:57:20,539 A lot of what happens in practice, however, is you find an application area which looks 703 00:57:20,539 --> 00:57:24,680 very similar to the problem that you are trying to solve. And you might borrow the architecture 704 00:57:24,680 --> 00:57:29,210 from there and use that as a starting point for coming up with where you start. Yeah? 705 00:57:29,210 --> 00:57:38,140 AUDIENCE: Are you aware of any research of some type of parameterizing the activation 706 00:57:38,140 --> 00:57:43,200 function and then trying to learn the activation function? 707 00:57:43,200 --> 00:57:48,079 VIJAY GADEPALLY: I'm sure people are doing it. I'm personally not familiar with that 708 00:57:48,079 --> 00:57:50,440 research. I don't know if anyone else in the room has-- yep? 709 00:57:50,440 --> 00:57:51,839 AUDIENCE: [INAUDIBLE] DARPA D3M program, so data-driven machine learning.
You're trying 710 00:57:51,839 --> 00:57:55,250 to learn both the architecture of the network and the activation function, and therefore 711 00:57:55,250 --> 00:58:07,900 all the other attributes. Because you're trying to just go from data set to machine learning 712 00:58:07,900 --> 00:58:10,019 system with no human intervention. 713 00:58:10,019 --> 00:58:16,609 VIJAY GADEPALLY: So the question was, is there any research into parameterizing the activation 714 00:58:16,609 --> 00:58:23,599 function? So I guess the model as a whole. So, yeah, there is. And one of the responses 715 00:58:23,599 --> 00:58:28,770 was that there is a program run by DARPA, which is the D3M program, which is really 716 00:58:28,770 --> 00:58:37,440 looking at, can you go from data to result with no or almost no human intervention? 717 00:58:37,440 --> 00:58:42,220 I'm not familiar with activation function parameterization. But certainly, network model 718 00:58:42,220 --> 00:58:47,730 parameterization is absolutely there. So people are running optimization models to basically 719 00:58:47,730 --> 00:58:52,660 look for-- I have this particular set of resources. What is the best model architecture that fits 720 00:58:52,660 --> 00:58:53,910 into that? 721 00:58:53,910 --> 00:58:59,190 Maybe I want to deploy this on a really tiny processor that only gives me 16 megabytes 722 00:58:59,190 --> 00:59:03,859 of memory. I want to make sure that my model and data can fit on that. Can you find what 723 00:59:03,859 --> 00:59:08,099 would be the ideal model for that? So that's absolutely something that people are doing 724 00:59:08,099 --> 00:59:14,050 right now. But I'm not sure if people are trying to come up with, I guess, brand new 725 00:59:14,050 --> 00:59:19,180 activation functions. All right. 726 00:59:19,180 --> 00:59:25,250 So lots of stuff in the neural network landscape. And as I mentioned earlier, neural network 727 00:59:25,250 --> 00:59:29,029 training is essentially adjusting weights until the function represented by the neural 728 00:59:29,029 --> 00:59:34,259 network does what you would like it to do. And the key idea here is to iteratively 729 00:59:34,259 --> 00:59:36,900 adjust weights to reduce the error. 730 00:59:36,900 --> 00:59:42,040 So what you do is you take some random instantiation of your neural network, or maybe, based on another 731 00:59:42,040 --> 00:59:47,809 domain or another problem, you might borrow that. And you start there. And then you pass 732 00:59:47,809 --> 00:59:52,539 a data set in. You look at the output and you say, that's not right. What went wrong 733 00:59:52,539 --> 00:59:53,539 over here? 734 00:59:53,539 --> 00:59:58,480 And you go back and adjust things and do that again, and again, and again, and again, and 735 00:59:58,480 --> 01:00:02,680 again, over and over, until you get something that looks reasonable. That's really what's 736 01:00:02,680 --> 01:00:07,769 going on over there. And so real neural networks can have thousands of input data points, 737 01:00:07,769 --> 01:00:10,799 hundreds of layers, and millions to billions of weight changes per iteration. Yes? 738 01:00:10,799 --> 01:00:11,799 AUDIENCE: So what you're talking about is [INAUDIBLE] adjustment [INAUDIBLE]. Do you 739 01:00:11,799 --> 01:00:14,299 know of any [INAUDIBLE] this process?
740 01:00:14,299 --> 01:00:24,999 VIJAY GADEPALLY: Yes, there's a lot of work being done to parallelize this-- 741 01:00:24,999 --> 01:00:25,999 AUDIENCE: Like, for-- 742 01:00:25,999 --> 01:00:26,999 VIJAY GADEPALLY: --and by default. 743 01:00:26,999 --> 01:00:27,999 AUDIENCE: [INAUDIBLE]? 744 01:00:27,999 --> 01:00:32,740 VIJAY GADEPALLY: So the question is-- as I just described it right now, it's a serial 745 01:00:32,740 --> 01:00:38,400 process where I pass one data point in. It goes all the way to the end. It says, oh, 746 01:00:38,400 --> 01:00:42,980 this is the output-- goes back and adjusts. Are there techniques that people are using 747 01:00:42,980 --> 01:00:48,420 to do this in a distributed fashion? And the answer to that is a strong yes. It's a very 748 01:00:48,420 --> 01:00:53,700 active area, especially in high-performance computing and machine learning. 749 01:00:53,700 --> 01:00:59,809 We might talk about this in-- are we talking about this on day three? We might talk a little 750 01:00:59,809 --> 01:01:04,609 bit about it. But there is model parallelism, which is where I have the model itself distributed 751 01:01:04,609 --> 01:01:09,579 across multiple pieces. And I want to adjust different pieces of the model at the same 752 01:01:09,579 --> 01:01:15,039 time. There's research and lots of results. I think we might even have some examples that 753 01:01:15,039 --> 01:01:16,880 people are doing with that. 754 01:01:16,880 --> 01:01:25,819 AUDIENCE: Have you got some examples on the [INAUDIBLE] approach, the [INAUDIBLE] approach? 755 01:01:25,819 --> 01:01:27,470 VIJAY GADEPALLY: A little bit earlier. 756 01:01:27,470 --> 01:01:28,470 AUDIENCE: Communication [INAUDIBLE]. 757 01:01:28,470 --> 01:01:33,829 VIJAY GADEPALLY: So there are many different ways to parallelize it. One would be data 758 01:01:33,829 --> 01:01:40,270 parallelism, which is where I take my big data set or big data point, and I distribute that across 759 01:01:40,270 --> 01:01:44,359 my different nodes. And each one independently learns a model that works well. And then I 760 01:01:44,359 --> 01:01:47,750 do some synchronization across these different pieces. 761 01:01:47,750 --> 01:01:51,539 There are also techniques where you have-- the model itself may be too big to sit on 762 01:01:51,539 --> 01:01:55,700 a single node or a single processing element. And you might have to distribute that. So, 763 01:01:55,700 --> 01:02:02,609 yes, a lot of very interesting research going on in that area. And by default, when you 764 01:02:02,609 --> 01:02:07,809 do run things, they are running in parallel, just even on your GPU. They're using multiple 765 01:02:07,809 --> 01:02:12,529 cores at once. So there is some level of parallelism-- within the node itself-- that runs 766 01:02:12,529 --> 01:02:18,130 by default on most machine learning software. 767 01:02:18,130 --> 01:02:22,140 So inference-- as I mentioned, is just using the trained model again. And the power of 768 01:02:22,140 --> 01:02:25,950 neural networks really falls within their non-linearity. So you have that non-linear 769 01:02:25,950 --> 01:02:31,230 F function that you're applying over and over and over across your layers. And in this crudely 770 01:02:31,230 --> 01:02:38,509 drawn diagram on my iPad-- this is not clear at all-- 771 01:02:38,509 --> 01:02:44,538 you have Xs and Os, right? It reminds me of a song. And you have features over here. 772 01:02:44,538 --> 01:02:48,579 And you're trying to basically classify each point. Which is an X? And which is an O?
A linear 773 01:02:48,579 --> 01:02:54,019 classifier could do a pretty good job in this type of situation. And you could apply a neural 774 01:02:54,019 --> 01:02:57,099 network to this, but maybe it's overkill in that type of situation. 775 01:02:57,099 --> 01:03:02,339 But in some feature space, if this is how your Xs and Os are divided amongst each other 776 01:03:02,339 --> 01:03:05,910 and you're trying to come up with the right label, one thing I might suggest is to maybe 777 01:03:05,910 --> 01:03:09,411 find another feature space where you could get a better separation between the 778 01:03:09,411 --> 01:03:13,599 two. Or a technique like a neural network might do a very good job. Or any of these 779 01:03:13,599 --> 01:03:17,299 non-linear machine learning techniques might do a very good job of looking for these really 780 01:03:17,299 --> 01:03:20,980 complex decision boundaries that are out there. All right. 781 01:03:20,980 --> 01:03:26,999 So, coming back to the earlier question: when you're designing a neural network, what do you have to do? 782 01:03:26,999 --> 01:03:30,770 What are the different choices, et cetera? There is a lot going on here. So you have 783 01:03:30,770 --> 01:03:35,410 to pick the depth or number of layers, what the inputs are, the type 784 01:03:35,410 --> 01:03:39,910 of network that you're using, the types of layers, and the training algorithm and metrics 785 01:03:39,910 --> 01:03:42,839 that you're using to assess the performance of this neural network. 786 01:03:42,839 --> 01:03:46,910 The good thing, however, is it's so expensive to train a neural network that you largely 787 01:03:46,910 --> 01:03:50,920 are not making these decisions in many cases. You just pick up what somebody else has done, 788 01:03:50,920 --> 01:03:54,660 and you start from there. That might be-- I don't know if that's a good 789 01:03:54,660 --> 01:04:00,749 or a bad thing. But that's often the way in practice that people end up doing this. 790 01:04:00,749 --> 01:04:07,430 But there is some theory on the general approach. I think in this short amount of time, which 791 01:04:07,430 --> 01:04:11,740 I'm already over, we won't be able to get into it. But I'm happy to-- actually, these 792 01:04:11,740 --> 01:04:15,990 slides have backups on them. So when I share them with you, they do have a lot more detail 793 01:04:15,990 --> 01:04:18,730 on each of these different pieces. All right. 794 01:04:18,730 --> 01:04:22,720 Very quickly, we'll talk about unsupervised learning. And the basic idea is the task of 795 01:04:22,720 --> 01:04:27,641 describing hidden structure from unlabeled data. So in contrast to supervised learning, 796 01:04:27,641 --> 01:04:31,579 we are not providing labels. We're just giving the algorithm a data set and saying, tell 797 01:04:31,579 --> 01:04:34,330 me something cool that's going on over here. 798 01:04:34,330 --> 01:04:39,630 Now, clearly, you can't label the data if you do that. But what you can do is maybe 799 01:04:39,630 --> 01:04:46,440 look for clusters, or look for dimensions or pieces of the data that are unimportant or 800 01:04:46,440 --> 01:04:52,049 extraneous. So if we observe certain features, we would like to observe the patterns amongst 801 01:04:52,049 --> 01:04:53,349 these features. 802 01:04:53,349 --> 01:04:57,970 And the typical tasks that one would do in unsupervised learning are clustering and data 803 01:04:57,970 --> 01:05:02,869 projection, or data pre-processing, or dimensionality reduction.
And the goal is to discover interesting 804 01:05:02,869 --> 01:05:10,049 things about the data set, such as subgroups, patterns, clusters, et cetera. 805 01:05:10,049 --> 01:05:13,900 One of the difficulties-- in supervised learning, we know, right? We 806 01:05:13,900 --> 01:05:17,650 have an input. We have a label. And we're like, OK, if my algorithm 807 01:05:17,650 --> 01:05:24,259 doesn't give me the label, bad. Go retrain. I can go back and use that as 808 01:05:24,259 --> 01:05:25,829 my performance metric. 809 01:05:25,829 --> 01:05:30,279 In unsupervised learning, there is no simple goal, such as maximizing a certain probability, 810 01:05:30,279 --> 01:05:34,730 for the algorithm. Some of that is something that you have to work out. Is 811 01:05:34,730 --> 01:05:39,549 it the inter-cluster or intra-cluster distance-- how much separation I'm getting-- that is going 812 01:05:39,549 --> 01:05:43,260 to be my performance metric? Is it the number of clusters that I'm creating? Is that the 813 01:05:43,260 --> 01:05:46,069 metric that I'm using? 814 01:05:46,069 --> 01:05:50,690 But it is very popular, because it works on unlabeled data. And I'm sure many of us work 815 01:05:50,690 --> 01:05:55,359 on data sets which are just too large or too difficult to sit and label. An example 816 01:05:55,359 --> 01:06:00,640 that comes to my mind, certainly, is in the world of cybersecurity, where you're collecting 817 01:06:00,640 --> 01:06:05,140 billions and billions of network packets. And you're trying to look for anomalous behavior. 818 01:06:05,140 --> 01:06:09,400 You're not going to go through and look at each packet and be like, bad, good, what it 819 01:06:09,400 --> 01:06:13,779 is. But you might use an unsupervised technique to maybe extract out some of the relevant 820 01:06:13,779 --> 01:06:18,180 pieces, then go through the trouble of labeling that data, and then 821 01:06:18,180 --> 01:06:21,490 pass that on to a supervised learning technique. And I'm happy to share some research that 822 01:06:21,490 --> 01:06:23,740 we've been doing on that front. 823 01:06:23,740 --> 01:06:29,550 Some common techniques are within clustering and data projection. Clustering is the basic 824 01:06:29,550 --> 01:06:33,269 idea that we want to group objects or sets of features, such that objects in the same 825 01:06:33,269 --> 01:06:37,730 cluster are more similar to each other than to those in another cluster. And what you typically do for that 826 01:06:37,730 --> 01:06:45,549 is you put your data in some feature space, and you try to optimize some intra-cluster 827 01:06:45,549 --> 01:06:49,999 measure, which is basically saying, I want the points within my cluster to be closer 828 01:06:49,999 --> 01:06:52,339 than anything outside of my cluster, right? 829 01:06:52,339 --> 01:06:57,830 So that's a metric. And you iteratively adjust the membership of each point. You set a number 830 01:06:57,830 --> 01:07:02,109 of clusters, saying, I need five clusters. It'll randomly assign things. And it'll keep 831 01:07:02,109 --> 01:07:06,930 adjusting the membership of a particular data point within a cluster, based on a metric 832 01:07:06,930 --> 01:07:09,720 such as the squared error. 833 01:07:09,720 --> 01:07:13,809 So in this example, we might say that, OK, these are three clusters that I get out of 834 01:07:13,809 --> 01:07:19,499 it.
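As a concrete sketch of that procedure, here is k-means clustering in Python with scikit-learn. The synthetic two-dimensional data and the choice of three clusters are illustrative assumptions, not data from the lecture:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data: three blobs in a 2-D feature space, with no labels provided.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(4, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(2, 3), scale=0.5, size=(50, 2)),
])

# Fix the number of clusters, then iteratively adjust memberships to
# minimize the within-cluster squared error, as described above.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # cluster membership for the first ten points
print(kmeans.inertia_)      # within-cluster sum of squared errors
```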
Dimensionality reduction is the idea of reducing the number of random variables under 835 01:07:19,499 --> 01:07:24,150 consideration. Very often, you'll collect a data set that has hundreds to thousands 836 01:07:24,150 --> 01:07:25,750 of different features. 837 01:07:25,750 --> 01:07:30,009 Maybe some of these features are not that important. Maybe they're unchanging. Or even 838 01:07:30,009 --> 01:07:35,940 if they are changing, it's not by much. And so maybe you want to remove them from consideration. 839 01:07:35,940 --> 01:07:41,479 That's when you use a technique like dimensionality reduction. And this is really, really important 840 01:07:41,479 --> 01:07:47,109 when you're doing feature selection and feature extraction in your real data sets. And you 841 01:07:47,109 --> 01:07:50,779 might also use it for other purposes, such as compression or visualization. 842 01:07:50,779 --> 01:07:55,999 So if you want to show things in Excel, showing a thousand-dimensional object may be difficult. 843 01:07:55,999 --> 01:08:02,369 You might try to project it down to the two or three dimensions that are easiest to visualize. 844 01:08:02,369 --> 01:08:08,109 And of course, you can use neural networks for unsupervised learning as well. Surprise, 845 01:08:08,109 --> 01:08:09,339 surprise. 846 01:08:09,339 --> 01:08:13,510 So as much as a lot of the press you've seen has been on things like image classification 847 01:08:13,510 --> 01:08:18,560 using nice labeled data sets, there's a lot of work where you can apply them in an unsupervised 848 01:08:18,560 --> 01:08:23,899 case. And these are largely used to find better representations for data, for tasks such as clustering 849 01:08:23,899 --> 01:08:27,689 and dimensionality reduction. And they're really powerful because of their non-linear 850 01:08:27,689 --> 01:08:28,689 capabilities. 851 01:08:28,689 --> 01:08:33,618 So one example-- I won't spend way too much time on this-- is an autoencoder. And the 852 01:08:33,618 --> 01:08:37,421 basic idea behind an autoencoder is you're trying to find some compressed representation 853 01:08:37,421 --> 01:08:43,849 for data. And the way we do this is by changing the metric that we use to say that the system 854 01:08:43,849 --> 01:08:45,658 has done a good job. 855 01:08:45,658 --> 01:08:49,488 And the metric is basically-- if I have a set of input features that I'm passing in, 856 01:08:49,488 --> 01:08:55,529 I would like to do the best job of reconstructing that input at my output. And what I do is 857 01:08:55,529 --> 01:09:00,421 I squeeze it through a smaller number of layers, which forms this compressed representation 858 01:09:00,421 --> 01:09:02,339 of my data set. 859 01:09:02,339 --> 01:09:08,380 And so the idea here is, how can I pass my inputs through this narrow waist to come up 860 01:09:08,380 --> 01:09:13,068 with a reconstructed input that's very similar to my original input? And so my metric in 861 01:09:13,068 --> 01:09:19,679 this particular case is essentially the difference between the reconstructed input, or the output, 862 01:09:19,679 --> 01:09:27,960 and the input. And the compressed representation you can think of as the reduced-dimensionality 863 01:09:27,960 --> 01:09:31,920 version of my problem. 864 01:09:31,920 --> 01:09:37,219 We've also done some work on replicator networks, which are also really, really cool. Happy 865 01:09:37,219 --> 01:09:41,929 to chat about that as well.
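Here is a minimal sketch of that autoencoder idea, written in PyTorch as an assumed framework choice; the layer sizes and synthetic data are illustrative. Note that the loss compares the output to the input itself rather than to a label, and the training loop is the same iterative weight adjustment described earlier:

```python
import torch
import torch.nn as nn

n_features, n_bottleneck = 64, 8  # illustrative sizes

# Encoder squeezes the input through a narrow waist; decoder reconstructs it.
model = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(),
    nn.Linear(32, n_bottleneck), nn.ReLU(),  # compressed representation
    nn.Linear(n_bottleneck, 32), nn.ReLU(),
    nn.Linear(32, n_features),               # reconstructed input
)

loss_fn = nn.MSELoss()  # metric: difference between reconstruction and input
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, n_features)  # synthetic unlabeled data
for step in range(100):           # iteratively adjust weights to reduce error
    opt.zero_grad()
    loss = loss_fn(model(X), X)   # reconstruct the input at the output
    loss.backward()
    opt.step()

codes = model[:4](X)  # first four modules = encoder; the 8-D representation
print(codes.shape)    # torch.Size([256, 8])
```

The bottleneck activations can then be treated as a non-linear, reduced-dimensionality version of the data, in the same spirit as the dimensionality reduction techniques above.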
And finally, we have to talk very briefly on reinforcement 866 01:09:41,929 --> 01:09:48,649 learning. And the basic-- again, at a very high level, the reason reinforcement learning 867 01:09:48,649 --> 01:09:53,219 is fundamentally different than supervised or unsupervised learning is that you're not 868 01:09:53,219 --> 01:09:57,550 passing in a label associated with an input feature. 869 01:09:57,550 --> 01:10:02,610 So there is no supervisor or a person that can label it, but just a reward signal. And 870 01:10:02,610 --> 01:10:08,460 the feedback is often delayed. And time is important, so it steps through a process. 871 01:10:08,460 --> 01:10:15,050 And the agent's actions often change the input data that it receives. So just to maybe-- 872 01:10:15,050 --> 01:10:19,260 in the interest of time, just to give you examples of where reinforcement learning could 873 01:10:19,260 --> 01:10:22,760 work and why you would use a technique like this. 874 01:10:22,760 --> 01:10:29,599 So, flying stunt maneuvers in a helicopter. So if your helicopter is straight, you say 875 01:10:29,599 --> 01:10:33,540 keep doing more of whatever you're doing to keep it there. If the helicopter tips over, 876 01:10:33,540 --> 01:10:38,530 you say stop doing whatever you just did to do that. Could you create a supervised learning 877 01:10:38,530 --> 01:10:41,710 algorithm for doing this? Sure. Right? 878 01:10:41,710 --> 01:10:47,030 You would basically look for all the configurations of your entire system every time the helicopter 879 01:10:47,030 --> 01:10:51,330 was upright. And you would look for all the examples where your helicopter was tipping 880 01:10:51,330 --> 01:10:57,510 over or falling. And you would basically say, OK, my engine speed was this much. My rotor 881 01:10:57,510 --> 01:10:58,860 speed was this much. 882 01:10:58,860 --> 01:11:04,980 And there are probably people here who fly helicopters, so pardon me if I am completely 883 01:11:04,980 --> 01:11:10,860 oversimplifying this problem here. However, you could certainly label it that way and 884 01:11:10,860 --> 01:11:16,210 say all these configurations of the helicopter meant the helicopter was upright. All these 885 01:11:16,210 --> 01:11:19,630 configurations of the helicopter meant the helicopter was not upright. 886 01:11:19,630 --> 01:11:24,110 That would be pretty expensive and difficult data collection to do. Not sure how many people 887 01:11:24,110 --> 01:11:30,040 want to volunteer for-- let's do all the ones that are at fault. And lots of other applications 888 01:11:30,040 --> 01:11:36,260 beyond that. So these are really useful, especially in cases where what you're trying to model 889 01:11:36,260 --> 01:11:41,970 is just extremely complex. And the other really powerful thing is this tends to mimic human 890 01:11:41,970 --> 01:11:47,070 behavior. And so they're very useful in those types of applications. 891 01:11:47,070 --> 01:11:50,480 AUDIENCE: Can you explain, shortly, what a reward would look like? 892 01:11:50,480 --> 01:11:55,660 VIJAY GADEPALLY: So a reward would just be-- it's very similar to when you get points. 893 01:11:55,660 --> 01:11:59,080 So you have your algorithm that's basically trying to maximize the number of points that 894 01:11:59,080 --> 01:12:05,270 it receives, for example. And as you do-- it's very similar to what you or I would consider 895 01:12:05,270 --> 01:12:07,690 a reward playing a video game, right?
896 01:12:07,690 --> 01:12:14,380 Every time I get points, I do more of the activities that make me get points. And it's 897 01:12:14,380 --> 01:12:23,409 essentially the same concept over here. All right. So with that, I will conclude, only 898 01:12:23,409 --> 01:12:30,349 20 minutes behind schedule. So I guess the long story short is there's lots of exciting 899 01:12:30,349 --> 01:12:33,889 research into AI and machine learning techniques out there. 900 01:12:33,889 --> 01:12:39,969 We did a one-hour view of this broad field that researchers have dedicated about six to seven 901 01:12:39,969 --> 01:12:46,170 decades of work towards, so my apologies to anyone watching this or in the room whose 902 01:12:46,170 --> 01:12:52,280 work I just jumped over. The key ingredients, however-- and I think this is most important 903 01:12:52,280 --> 01:12:58,650 to this group-- come from looking at the problems where AI has done really well. 904 01:12:58,650 --> 01:13:03,480 These are some of the key ingredients-- data availability, computing infrastructure, and 905 01:13:03,480 --> 01:13:06,659 the domain expertise and algorithms. And I think it's very exciting to see this group 906 01:13:06,659 --> 01:13:11,909 over here, because we do have all of these pieces coming together. So great things are 907 01:13:11,909 --> 01:13:13,659 bound to happen. 908 01:13:13,659 --> 01:13:18,840 There are, I think, large challenges in data availability and readiness for AI, which is 909 01:13:18,840 --> 01:13:24,390 something we're just going to scratch the surface of during this class. And some of the computing 910 01:13:24,390 --> 01:13:28,389 infrastructure is something that we'll be talking to you about in a couple of minutes. 911 01:13:28,389 --> 01:13:33,309 And if you're interested in a more detailed look at any of these things, a number 912 01:13:33,309 --> 01:13:39,150 of us actually wrote-- maybe I'm biased. I think it's a great, great, great write-up. 913 01:13:39,150 --> 01:13:45,909 But, no, I think it's useful. It has its places. Obviously, there's a lot of material in here. But 914 01:13:45,909 --> 01:13:51,099 we tried to do our best job to at least cite some of this really, really interesting work 915 01:13:51,099 --> 01:13:56,300 that's going on in the field. So with that, I'll pause for any additional questions, but 916 01:13:56,300 --> 01:13:57,300 thank you very much for your attention.