1
00:00:00,000 --> 00:00:03,888
[SQUEAKING][RUSTLING][CLICKING]

2
00:00:12,650 --> 00:00:15,860
ERIK DEMAINE: Today we're
going to, in one lecture,

3
00:00:15,860 --> 00:00:19,390
cover an entire field, which
is computational complexity.

4
00:00:19,390 --> 00:00:23,410
It's sort of-- it
meets algorithms

5
00:00:23,410 --> 00:00:25,780
in an interesting way, which
is, algorithms is mostly

6
00:00:25,780 --> 00:00:28,090
about showing how to
solve problems well

7
00:00:28,090 --> 00:00:30,730
and showing that you can
solve a problem well.

8
00:00:30,730 --> 00:00:33,250
And computational complexity
is more about the lower bound

9
00:00:33,250 --> 00:00:35,260
side, proving that
you can't prove--

10
00:00:35,260 --> 00:00:37,210
you can't solve a
problem very well,

11
00:00:37,210 --> 00:00:39,912
you can't find a good
algorithm to solve it.

12
00:00:39,912 --> 00:00:42,370
We've seen a little bit about
lower bounds several lectures

13
00:00:42,370 --> 00:00:45,810
ago, proving search and
sorting lower bounds

14
00:00:45,810 --> 00:00:51,070
in a bounded branching
decision tree model.

15
00:00:51,070 --> 00:00:54,460
But these are much stronger
notions of badness.

16
00:00:54,460 --> 00:00:58,370
This is not about n versus n
log n or constant versus log n.

17
00:00:58,370 --> 00:01:02,380
This is about polynomial
versus exponential,

18
00:01:02,380 --> 00:01:05,170
which has been the sort
of bread-and-butter model

19
00:01:05,170 --> 00:01:06,010
in this class.

20
00:01:06,010 --> 00:01:07,600
Polynomial is a
good running time,

21
00:01:07,600 --> 00:01:09,058
and we're always
striving for that.

22
00:01:09,058 --> 00:01:12,220
Exponential is usually
pretty trivial to get.

23
00:01:12,220 --> 00:01:14,290
And so we're going to talk
about some different--

24
00:01:14,290 --> 00:01:16,210
they're called
complexity classes

25
00:01:16,210 --> 00:01:18,460
that talk about this
issue and different ways

26
00:01:18,460 --> 00:01:20,500
to prove hardness.

27
00:01:20,500 --> 00:01:22,000
This is a pretty
high-level lecture,

28
00:01:22,000 --> 00:01:23,417
so you're not going
to be expected

29
00:01:23,417 --> 00:01:24,850
to be able to prove hardness.

30
00:01:24,850 --> 00:01:26,600
But you'll get a flavor
of what it's like,

31
00:01:26,600 --> 00:01:29,330
and this will segue nicely
into other follow-on classes,

32
00:01:29,330 --> 00:01:32,290
which is-- we're at pretty
much the end of 006,

33
00:01:32,290 --> 00:01:37,450
so natural to talk about what
other things you might study.

34
00:01:37,450 --> 00:01:41,650
One result we'll prove today
is that most problems actually

35
00:01:41,650 --> 00:01:44,770
have no algorithm, which
is kind of shocking,

36
00:01:44,770 --> 00:01:46,720
and lots of other fun things.

37
00:01:46,720 --> 00:01:50,710
So let's get started
with the notion of P.

38
00:01:50,710 --> 00:01:55,660
This is the set of all problems
solvable in polynomial time.

39
00:02:01,110 --> 00:02:02,720
We talked about
what polynomial time

40
00:02:02,720 --> 00:02:06,000
means a bunch last lecture.

41
00:02:06,000 --> 00:02:08,060
So just recall that
polynomial time

42
00:02:08,060 --> 00:02:14,540
means polynomial in the problem
size, which I'll denote as n

43
00:02:14,540 --> 00:02:21,924
here, the number of
words in your input.

44
00:02:28,560 --> 00:02:32,970
OK, so these are the problems
that are efficiently solvable.

45
00:02:32,970 --> 00:02:35,910
P is the set of all of them.

46
00:02:35,910 --> 00:02:40,200
And for contrast, EXP is the
set of all problems solvable

47
00:02:40,200 --> 00:02:42,926
in exponential time.

48
00:02:42,926 --> 00:02:53,880
It's the problems solvable
in exponential time.

49
00:02:53,880 --> 00:02:57,210
Exponential here
means something like 2

50
00:02:57,210 --> 00:03:00,135
to the n to the constant.

51
00:03:00,135 --> 00:03:02,690
That's one reasonable
definition of exponential,

52
00:03:02,690 --> 00:03:06,990
so just the exponentiation
of this-- of polynomial.

53
00:03:06,990 --> 00:03:10,875
So as you might expect, most--

54
00:03:10,875 --> 00:03:13,000
every problem that we've
talked about in this class

55
00:03:13,000 --> 00:03:16,350
so far can be solved in
exponential time rather easily.

56
00:03:16,350 --> 00:03:18,240
And algorithms, in
some sense, is about

57
00:03:18,240 --> 00:03:20,790
distinguishing these two,
which problems are in P

58
00:03:20,790 --> 00:03:24,890
versus are in say EXP minus P.

59
00:03:24,890 --> 00:03:27,570
So to formalize
this a little bit,

60
00:03:27,570 --> 00:03:32,970
I'm going to draw
a picture, which

61
00:03:32,970 --> 00:03:35,250
is a bit of a
simplification of reality,

62
00:03:35,250 --> 00:03:38,112
but for the purposes of
this class will suffice,

63
00:03:38,112 --> 00:03:39,570
and I think is a
really helpful way

64
00:03:39,570 --> 00:03:43,740
to think about things, which
is to have a big axis for--

65
00:03:43,740 --> 00:03:46,680
a single axis for, how
hard is your problem,

66
00:03:46,680 --> 00:03:50,190
what is the difficulty
of solving your problem?

67
00:03:50,190 --> 00:03:52,990
And I want to be sure to leave--

68
00:03:52,990 --> 00:03:55,080
so the easiest
problems are over here.

69
00:03:55,080 --> 00:03:57,660
And each problem is
a dot on this axis.

70
00:03:57,660 --> 00:04:01,263
Hardest problems are
way down the line.

71
00:04:01,263 --> 00:04:03,930
And I want to make sure to leave
enough space for all the things

72
00:04:03,930 --> 00:04:05,080
that I care about.

73
00:04:05,080 --> 00:04:10,470
So P, I'm just going to
call this segment up front.

74
00:04:10,470 --> 00:04:13,940
And then I'm going
to have a bigger

75
00:04:13,940 --> 00:04:17,149
thing for exponential time.

76
00:04:20,399 --> 00:04:24,385
So this is just to say that
P is nested inside EXP.

77
00:04:24,385 --> 00:04:26,510
Every problem that can be
solved in polynomial time

78
00:04:26,510 --> 00:04:28,190
can also be solved
in exponential time

79
00:04:28,190 --> 00:04:30,740
because polynomial is less
than or equal to exponential.

80
00:04:30,740 --> 00:04:33,720
These are just upper bounds.

81
00:04:33,720 --> 00:04:36,270
Being in EXP means you're
somewhere from this line

82
00:04:36,270 --> 00:04:36,780
to the left.

83
00:04:36,780 --> 00:04:39,210
Being in P means you're
somewhere from this line

84
00:04:39,210 --> 00:04:41,550
to the left, in
terms of difficulty.

85
00:04:41,550 --> 00:04:48,120
But formally, we would write
P is contained in EXP as sets.

86
00:04:48,120 --> 00:04:52,827
In fact, they're also known to
be different from each other.

87
00:04:52,827 --> 00:04:55,410
There are problems that can be
solved in exponential time that

88
00:04:55,410 --> 00:04:57,210
cannot be solved
in polynomial time.

89
00:04:57,210 --> 00:05:02,010
For example-- I'll
put that here, sure.

90
00:05:09,480 --> 00:05:18,510
For example, n by n chess
is in exponential time,

91
00:05:18,510 --> 00:05:19,960
but not polynomial time.

92
00:05:19,960 --> 00:05:22,170
So what is the n
by n chess problem?

93
00:05:22,170 --> 00:05:24,960
This is, I give you
an n by n chessboard,

94
00:05:24,960 --> 00:05:28,327
and I describe to
you a position.

95
00:05:28,327 --> 00:05:29,910
Here's where all the
white pieces are.

96
00:05:29,910 --> 00:05:31,530
Here's where all the
black pieces are.

97
00:05:31,530 --> 00:05:34,890
You can have an arbitrary number
of queens and bishops and pawns

98
00:05:34,890 --> 00:05:37,020
of each color, of
course, up to n

99
00:05:37,020 --> 00:05:39,540
squared of them so they
don't overlap each other.

100
00:05:39,540 --> 00:05:42,382
And I want to know, does
white win from this position?

101
00:05:42,382 --> 00:05:43,590
Let's say it's white to move.

102
00:05:43,590 --> 00:05:45,000
Can white win?

103
00:05:45,000 --> 00:05:47,700
And that problem can be
solved in an exponential time

104
00:05:47,700 --> 00:05:51,703
by exploring the entire
tree of all possible games.

105
00:05:51,703 --> 00:05:54,120
But it cannot-- but you can
prove that it cannot be solved

106
00:05:54,120 --> 00:05:56,110
in polynomial time.

107
00:05:56,110 --> 00:05:57,840
So that's a nice example.

108
00:05:57,840 --> 00:06:01,630
A more positive
example, so to speak,

109
00:06:01,630 --> 00:06:04,530
is negative weight
cycle detection.

110
00:06:04,530 --> 00:06:08,269
I guess it's literally negative,
but it's morally positive.

111
00:06:13,465 --> 00:06:15,840
Negative weight cycle detection
is the following problem.

112
00:06:15,840 --> 00:06:18,497
I give you a graph, a
directed graph with weights,

113
00:06:18,497 --> 00:06:20,580
and I want to know, does
it have a negative weight

114
00:06:20,580 --> 00:06:22,380
cycle, yes or no?

115
00:06:22,380 --> 00:06:24,520
And this problem is in?

116
00:06:24,520 --> 00:06:25,200
AUDIENCE: P.

117
00:06:25,200 --> 00:06:28,870
ERIK DEMAINE: P, because we
saw a polynomial time algorithm

118
00:06:28,870 --> 00:06:29,370
for this.

119
00:06:29,370 --> 00:06:31,987
You run Bellman-Ford
on an augmented graph.
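
The negative weight cycle check mentioned here can be sketched in Python. This is a minimal sketch, not the course's reference code: it assumes the augmentation is a hypothetical extra "supersource" vertex with a 0-weight edge to every vertex, that vertices are numbered 0..n-1, and that edges are (u, v, weight) triples. The name `has_negative_cycle` is made up for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); this representation is an assumption


def has_negative_cycle(num_vertices: int, edges: List[Edge]) -> bool:
    """Decide (yes/no) whether a directed weighted graph has a negative-weight cycle."""
    # Assumed augmentation: a supersource with a 0-weight edge to every vertex,
    # so that every cycle in the graph is reachable from the source.
    source = num_vertices
    augmented = edges + [(source, v, 0) for v in range(num_vertices)]

    dist = [float("inf")] * (num_vertices + 1)
    dist[source] = 0

    # The augmented graph has num_vertices + 1 vertices, so Bellman-Ford
    # needs num_vertices rounds of relaxing every edge.
    for _ in range(num_vertices):
        changed = False
        for u, v, w in augmented:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged early: no negative cycle

    # If any edge can still be relaxed, some negative-weight cycle exists.
    return any(dist[u] + w < dist[v] for u, v, w in augmented)
```

Each round scans every edge once, so the running time is about O(|V| * |E|), polynomial in the input size, which is what places this decision problem in P.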

120
00:06:31,987 --> 00:06:34,320
So this is an example of a
problem we know how to solve.

121
00:06:34,320 --> 00:06:36,120
This whole class
is full of examples

122
00:06:36,120 --> 00:06:38,250
that we know how to
solve in polynomial time.

123
00:06:38,250 --> 00:06:44,550
But this is a nice, non-trivial
and succinct one to phrase.

124
00:06:44,550 --> 00:06:46,590
It's also an example
of a decision problem.

125
00:06:46,590 --> 00:06:49,890
A lot of-- basically all the
problems I'll talk about today

126
00:06:49,890 --> 00:06:52,560
are decision problems, like
we talked about last class,

127
00:06:52,560 --> 00:06:54,810
meaning, the answer
is just yes or no.

128
00:06:54,810 --> 00:06:57,720
Can white win from this
position, yes or no?

129
00:06:57,720 --> 00:07:00,000
Is there a negative
weight cycle, yes or no?

130
00:07:00,000 --> 00:07:04,950
Tetris we can also
formulate as a problem.

131
00:07:04,950 --> 00:07:07,170
This is a version of
Tetris that we might call

132
00:07:07,170 --> 00:07:09,670
perfect information Tetris.

133
00:07:09,670 --> 00:07:12,120
Suppose I give you
a Tetris board.

134
00:07:12,120 --> 00:07:16,350
It has some garbage left
over from your past playing,

135
00:07:16,350 --> 00:07:17,580
or maybe it started that way.

136
00:07:17,580 --> 00:07:20,610
And I give you the sequence of
n pieces that are going to come.

137
00:07:20,610 --> 00:07:22,500
And I want to
know, can I survive

138
00:07:22,500 --> 00:07:23,940
this sequence of n pieces?

139
00:07:23,940 --> 00:07:25,710
Can you place each
of these pieces

140
00:07:25,710 --> 00:07:29,820
as they fall such that
you never overflow

141
00:07:29,820 --> 00:07:34,130
the top of the board
on an n by n board?

142
00:07:34,130 --> 00:07:37,850
This problem can be solved
in exponential time.

143
00:07:37,850 --> 00:07:40,820
But we don't know whether it can
be solved in polynomial time.

144
00:07:48,412 --> 00:07:50,120
We will talk about
that more in a moment.

145
00:07:50,120 --> 00:07:53,750
It's a problem that
very likely is not in P,

146
00:07:53,750 --> 00:07:57,080
but we can't actually
prove it yet.

147
00:07:57,080 --> 00:07:59,778
All right, so there's
one other class

148
00:07:59,778 --> 00:08:01,070
I want to define at this point.

149
00:08:01,070 --> 00:08:03,470
And we'll get to
a fourth one also.

150
00:08:03,470 --> 00:08:07,490
But R is the class
of all problems that

151
00:08:07,490 --> 00:08:10,360
can be solved in finite time.

152
00:08:10,360 --> 00:08:11,335
R stands for finite.

153
00:08:23,660 --> 00:08:25,520
R stands for
recursive, actually.

154
00:08:25,520 --> 00:08:27,620
This is a notion
by Church way back

155
00:08:27,620 --> 00:08:29,690
in the foundations of computing.

156
00:08:29,690 --> 00:08:32,570
As we know, we write recursive
algorithms to solve problems.

157
00:08:32,570 --> 00:08:34,070
In the beginning, that
was the only way to do it.

158
00:08:34,070 --> 00:08:35,487
Now we have other
ways with loops.

159
00:08:35,487 --> 00:08:38,210
But they're all effectively
recursion in the end.

160
00:08:38,210 --> 00:08:40,669
So R is all the problems
that can be solved

161
00:08:40,669 --> 00:08:43,130
in finite time on any computer.

162
00:08:43,130 --> 00:08:45,320
So very general,
this should include

163
00:08:45,320 --> 00:08:47,420
everything we care about.

164
00:08:47,420 --> 00:08:49,010
And it's bigger than
EXP, but includes

165
00:08:49,010 --> 00:08:53,750
problems that take doubly
exponential time or whatever.

166
00:08:53,750 --> 00:09:00,330
So I will draw a region
for R. So everything--

167
00:09:00,330 --> 00:09:03,000
it includes P. It includes EXP.

168
00:09:03,000 --> 00:09:05,580
And so we also have
containment but not

169
00:09:05,580 --> 00:09:10,003
equal R. There's, of course,
many classes in between.

170
00:09:10,003 --> 00:09:11,420
You could talk
about problems that

171
00:09:11,420 --> 00:09:14,150
take double A exponential time,
and that would have a thing

172
00:09:14,150 --> 00:09:15,230
in between here.

173
00:09:15,230 --> 00:09:17,450
Or there's also--
between P and EXP

174
00:09:17,450 --> 00:09:18,920
there's a lot of
different things.

175
00:09:18,920 --> 00:09:22,170
We will talk about one of them.

176
00:09:22,170 --> 00:09:25,310
But before we get to the
finer side of things,

177
00:09:25,310 --> 00:09:30,305
let me talk in particular about
R. So we have a nice example,

178
00:09:30,305 --> 00:09:34,580
we being computational
complexity theory--

179
00:09:34,580 --> 00:09:37,460
or I guess this is usually just
called theoretical computer

180
00:09:37,460 --> 00:09:38,120
science--

181
00:09:38,120 --> 00:09:40,250
has a problem.

182
00:09:40,250 --> 00:09:49,070
And if you're interested
in this, you can take 6041,

183
00:09:49,070 --> 00:09:50,908
I think.

184
00:09:50,908 --> 00:09:51,950
That doesn't sound right.

185
00:09:51,950 --> 00:09:53,480
That's a probability.

186
00:09:53,480 --> 00:09:54,230
It'll come to me.

187
00:10:02,470 --> 00:10:07,153
We have an explicit
problem that is not in R.

188
00:10:07,153 --> 00:10:09,070
So this class has been
all about problems that

189
00:10:09,070 --> 00:10:11,190
are in P. You have the number?

190
00:10:11,190 --> 00:10:12,130
AUDIENCE: 6045.

191
00:10:12,130 --> 00:10:13,480
ERIK DEMAINE: 6045, thank you.

192
00:10:13,480 --> 00:10:15,340
It's so close to this class.

193
00:10:15,340 --> 00:10:18,340
Or it's so close to 6046,
which is the natural successor

194
00:10:18,340 --> 00:10:19,700
to this class.

195
00:10:19,700 --> 00:10:24,040
So in 6045 we talk about this.

196
00:10:24,040 --> 00:10:25,890
So this class is all
about problems that

197
00:10:25,890 --> 00:10:27,720
are in P, which is very easy.

198
00:10:27,720 --> 00:10:34,740
But in fact, there are
problems way out here beyond R.

199
00:10:34,740 --> 00:10:38,310
And here is one
such problem, which

200
00:10:38,310 --> 00:10:39,780
we won't prove here today.

201
00:10:39,780 --> 00:10:43,380
It takes a whole
lecture to prove this.

202
00:10:43,380 --> 00:10:49,340
Given a computer program,
does it ever halt?

203
00:10:59,160 --> 00:11:00,563
Does it ever terminate?

204
00:11:00,563 --> 00:11:02,730
This would be a great thing
if we knew how to solve.

205
00:11:02,730 --> 00:11:04,848
It's basically an
infinite loop detector.

206
00:11:04,848 --> 00:11:06,390
If your program
doesn't halt, then it

207
00:11:06,390 --> 00:11:07,920
has an infinite
loop of some sort.

208
00:11:07,920 --> 00:11:10,150
And you'd like to
tell your user, hey,

209
00:11:10,150 --> 00:11:11,860
you have a bug in your program.

210
00:11:11,860 --> 00:11:13,890
So this is one part
of bug detection.

211
00:11:13,890 --> 00:11:14,980
And it's impossible.

212
00:11:14,980 --> 00:11:17,550
There is no algorithm
that always--

213
00:11:17,550 --> 00:11:19,800
that solves all inputs
to this problem.

214
00:11:19,800 --> 00:11:23,157
Maybe given one program that,
say, has 0 lines of code,

215
00:11:23,157 --> 00:11:23,990
it could solve that.

216
00:11:23,990 --> 00:11:25,950
It says, yeah, that
one terminates.

217
00:11:25,950 --> 00:11:28,870
And maybe you can detect
simple kinds of infinite loops.

218
00:11:28,870 --> 00:11:30,960
So there's some inputs,
some computer programs

219
00:11:30,960 --> 00:11:32,040
that you could detect.

220
00:11:32,040 --> 00:11:36,970
But there's no one algorithm
that solves all inputs.

221
00:11:36,970 --> 00:11:39,760
This is kind of sad news.

222
00:11:39,760 --> 00:11:42,110
We call such problems
uncomputable.

223
00:11:42,110 --> 00:11:51,450
This is just another
word for being not in R.

224
00:11:51,450 --> 00:11:54,300
OK, and next thing
I'd like to do

225
00:11:54,300 --> 00:11:57,600
is prove to you
that most decision

226
00:11:57,600 --> 00:12:00,630
problems are uncomputable,
or sketch a proof.

227
00:12:08,740 --> 00:12:10,910
So remember, decision
problems are problems where

228
00:12:10,910 --> 00:12:12,920
the answer is just yes or no.

229
00:12:12,920 --> 00:12:14,630
This is a very special
kind of problem.

230
00:12:18,620 --> 00:12:23,660
And even those, almost all
of them, cannot be solved.

231
00:12:23,660 --> 00:12:27,200
So halting is an example
of a problem we want--

232
00:12:27,200 --> 00:12:28,460
we can't solve.

233
00:12:28,460 --> 00:12:32,552
This whole class, this 006, is
about problems we can solve.

234
00:12:32,552 --> 00:12:34,760
But today I'm going to show
you that, actually, those

235
00:12:34,760 --> 00:12:35,640
are in the minority.

236
00:12:35,640 --> 00:12:37,400
Most problems
cannot be computed.

237
00:12:37,400 --> 00:12:41,810
This is very strange and
also a little depressing.

238
00:12:41,810 --> 00:12:43,620
So we'll talk more
about that in a moment.

239
00:12:43,620 --> 00:12:45,840
First let me argue
why this is the case.

240
00:12:45,840 --> 00:12:48,230
So I'm going to be
a little informal

241
00:12:48,230 --> 00:12:51,650
about what exactly
is a computer program

242
00:12:51,650 --> 00:12:53,720
and what exactly is
a decision problem.

243
00:12:53,720 --> 00:12:58,370
But roughly, all I need
to do, the only level

244
00:12:58,370 --> 00:13:01,670
of precision I need is just
to count how many are there.

245
00:13:01,670 --> 00:13:03,440
What is a computer program?

246
00:13:03,440 --> 00:13:04,940
Well, it's usually a file.

247
00:13:04,940 --> 00:13:05,630
What's a file?

248
00:13:05,630 --> 00:13:07,290
It's like a string
of characters.

249
00:13:07,290 --> 00:13:08,120
What's a character?

250
00:13:08,120 --> 00:13:09,280
It's a string of bits.

251
00:13:09,280 --> 00:13:12,200
So a program is just, in
the end, a string of bits,

252
00:13:12,200 --> 00:13:14,000
finite string of bits.

253
00:13:18,112 --> 00:13:19,070
We all understand that.

254
00:13:19,070 --> 00:13:23,360
Whatever language you
define, in the end,

255
00:13:23,360 --> 00:13:26,300
every program is just
a string of bits.

256
00:13:26,300 --> 00:13:30,030
And a string of bits we can
translate into a number.

257
00:13:30,030 --> 00:13:36,110
So we can convert between
strings of bits and numbers.
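
One way to make the conversion between finite bit strings and natural numbers concrete is sketched below in Python. The particular encoding (prepend a 1, read as binary, subtract 1) is an assumption chosen for illustration, since the lecture does not fix one; the function names are hypothetical.

```python
def bits_to_number(bits: str) -> int:
    """Map a finite bit string (possibly empty) to a natural number."""
    # Prepending "1" keeps leading zeros from being lost; subtracting 1
    # makes the empty string map to 0, so every natural number is hit.
    return int("1" + bits, 2) - 1


def number_to_bits(n: int) -> str:
    """Inverse of bits_to_number: map a natural number back to its bit string."""
    # bin(n + 1) looks like "0b1..."; drop the "0b" prefix and the leading 1.
    return bin(n + 1)[3:]
```

Under this encoding the strings "", "0", "1", "00", "01", ... map to 0, 1, 2, 3, 4, ..., so every finite bit string, and hence every program, gets its own natural number.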

258
00:13:36,110 --> 00:13:38,300
When I say number, I
mean what's usually

259
00:13:38,300 --> 00:13:43,320
called a natural number
or a non-negative integer.

260
00:13:43,320 --> 00:13:51,470
This is usually represented
by bold board bold--

261
00:13:51,470 --> 00:13:54,740
blackboard bold
capital N. So this

262
00:13:54,740 --> 00:13:56,870
is just 0, 1, 2, and so on.

263
00:14:02,270 --> 00:14:03,905
Now, what about
decision problems?

264
00:14:10,850 --> 00:14:13,820
Decision problem
is a specification

265
00:14:13,820 --> 00:14:14,940
of what we want to solve.

266
00:14:14,940 --> 00:14:17,360
So we can think of it as
saying, for every input,

267
00:14:17,360 --> 00:14:19,190
is the answer yes or no?

268
00:14:19,190 --> 00:14:21,710
That's literally what
a decision problem is.

269
00:14:21,710 --> 00:14:23,870
The only question
is, what is an input?

270
00:14:23,870 --> 00:14:26,900
And we've talked about inputs
and the size of inputs.

271
00:14:26,900 --> 00:14:30,090
And there's lots of different
ways to measure them.

272
00:14:30,090 --> 00:14:32,660
But in the end, we can think
of an input as a string of bits

273
00:14:32,660 --> 00:14:33,170
also.

274
00:14:33,170 --> 00:14:34,930
It's just a file.

275
00:14:34,930 --> 00:14:41,980
So a decision
problem is a function

276
00:14:41,980 --> 00:14:46,480
from inputs to yes or no.

277
00:14:50,410 --> 00:14:53,320
And inputs we're going
to say, well, that's

278
00:14:53,320 --> 00:15:03,506
a string of bits, which we can
associate with a number in N.

279
00:15:03,506 --> 00:15:05,680
So here we can start
to tie things together.

280
00:15:08,230 --> 00:15:12,060
So in other words, a program
is a finite string of bits,

281
00:15:12,060 --> 00:15:17,310
and a problem is, in some sense,
an infinite string of bits

282
00:15:17,310 --> 00:15:20,050
because there are infinitely
many possible inputs.

283
00:15:20,050 --> 00:15:23,280
And for each of them,
we specify yes or no.

284
00:15:23,280 --> 00:15:31,940
So this is basically an
infinite string of bits.

285
00:15:38,050 --> 00:15:46,140
So we can imagine
011010001110, infinitely.

286
00:15:46,140 --> 00:15:48,720
Just some-- for every string
of bits, we can say, OK,

287
00:15:48,720 --> 00:15:52,050
if your input is the number
0, here's the answer--

288
00:15:52,050 --> 00:15:53,070
no.

289
00:15:53,070 --> 00:15:57,110
If your input is the number
1, then the answer is yes.

290
00:15:57,110 --> 00:15:58,860
If your input is the
number 2, your answer

291
00:15:58,860 --> 00:16:01,080
is yes, and so on
down this line.

292
00:16:01,080 --> 00:16:02,970
Every infinite string
of bits corresponds

293
00:16:02,970 --> 00:16:05,040
to exactly one
decision problem, which

294
00:16:05,040 --> 00:16:07,980
specifies for every possible
input integer, which

295
00:16:07,980 --> 00:16:12,100
corresponds to a string of bits,
what is the answer, yes or no?

296
00:16:12,100 --> 00:16:15,010
So this may seem subtle, or it
may seem like not a big deal.

297
00:16:15,010 --> 00:16:16,390
This is a finite string of bits.

298
00:16:16,390 --> 00:16:17,850
This is an infinite
string of bits.

299
00:16:17,850 --> 00:16:21,030
But mathematics has well
studied this problem.

300
00:16:21,030 --> 00:16:26,040
And infinite strings of bits,
there are very many of them,

301
00:16:26,040 --> 00:16:26,737
infinitely many.

302
00:16:26,737 --> 00:16:27,570
It's not surprising.

303
00:16:27,570 --> 00:16:29,830
There are also
infinitely many integers.

304
00:16:29,830 --> 00:16:32,190
So maybe it doesn't
seem that deep.

305
00:16:32,190 --> 00:16:37,990
But there's a difference
in infinitude.

306
00:16:37,990 --> 00:16:42,761
Programs and integers
are countably infinite.

307
00:16:47,580 --> 00:16:50,640
And infinite strings of bits
are what's called uncountable.

308
00:16:56,210 --> 00:16:58,250
I think the most
intuitive way to see

309
00:16:58,250 --> 00:17:00,540
this is, an infinite
string of bits,

310
00:17:00,540 --> 00:17:04,849
if I put a decimal or a
binary point in front,

311
00:17:04,849 --> 00:17:09,530
this encodes a real
number between 0 and 1.

312
00:17:09,530 --> 00:17:17,369
So this is roughly
a real number in [0, 1].

313
00:17:17,369 --> 00:17:19,849
And when I'm writing
approximately equal here,

314
00:17:19,849 --> 00:17:21,349
this really goes
in both directions.

315
00:17:21,349 --> 00:17:24,109
Given a decision problem, I
can define a string of bits,

316
00:17:24,109 --> 00:17:26,030
of course giving me the
answer for all inputs.

317
00:17:26,030 --> 00:17:28,447
And I can convert that into a
real number between 0 and 1.

318
00:17:28,447 --> 00:17:31,070
But also the other direction,
if I take any real number,

319
00:17:31,070 --> 00:17:33,030
that is a corresponding
decision problem.

320
00:17:33,030 --> 00:17:36,740
There's a 1-to-1
bijection between them.

321
00:17:36,740 --> 00:17:39,710
And the bad news is, real
numbers are uncountable,

322
00:17:39,710 --> 00:17:42,050
and natural numbers
are countable,

323
00:17:42,050 --> 00:17:47,660
which means there's a lot more
of these than there are these.

324
00:17:47,660 --> 00:17:55,490
So one way you might
phrase this is, informally,

325
00:17:55,490 --> 00:17:57,350
the number of natural
numbers is way smaller

326
00:17:57,350 --> 00:17:59,340
than the number of real numbers.

327
00:17:59,340 --> 00:18:05,120
And so from that, we derive that
most problems are unsolvable,

328
00:18:05,120 --> 00:18:10,220
because every program solves
exactly one decision problem.

329
00:18:10,220 --> 00:18:12,590
We can also run a
program, conceptually,

330
00:18:12,590 --> 00:18:14,810
on all possible inputs,
and we will figure out

331
00:18:14,810 --> 00:18:16,290
what function it's solving.

332
00:18:16,290 --> 00:18:18,980
And if we don't allow random
numbers in our program, which

333
00:18:18,980 --> 00:18:23,490
I'm not here, then every program
solves exactly one decision

334
00:18:23,490 --> 00:18:23,990
problem.

335
00:18:23,990 --> 00:18:27,080
Possibly, it's even worse for
us because multiple programs

336
00:18:27,080 --> 00:18:29,150
probably solve the
same decision problem.

337
00:18:29,150 --> 00:18:31,200
They're just-- they add
irrelevant lines of code

338
00:18:31,200 --> 00:18:32,700
or they don't do
anything different.

339
00:18:32,700 --> 00:18:35,690
Or you run Bellman-Ford
versus running Bellman-Ford

340
00:18:35,690 --> 00:18:39,780
five times, you'll
get the same result.

341
00:18:39,780 --> 00:18:41,748
And that's actually the
bad direction for us.

342
00:18:41,748 --> 00:18:43,790
We'd like to know whether
there is a program that

343
00:18:43,790 --> 00:18:46,290
solves every decision problem.

344
00:18:46,290 --> 00:18:50,130
And because there are
only this many programs

345
00:18:50,130 --> 00:18:52,130
and this many decision
problems, it just-- there

346
00:18:52,130 --> 00:18:53,930
aren't enough to go around.

347
00:18:53,930 --> 00:18:59,670
So most-- what's the phrasing?

348
00:18:59,670 --> 00:19:05,460
Not nearly enough programs for
all problems, and so there's

349
00:19:05,460 --> 00:19:17,980
no assignment of
programs to problems

350
00:19:17,980 --> 00:19:19,605
because there's just
too many problems.
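
The lecture states the countable-versus-uncountable gap without proof. The standard justification is Cantor's diagonal argument, sketched here in Python as an illustration rather than as something shown in the lecture. A decision problem is modeled as a function from natural numbers to yes/no, and `listed` is a hypothetical numbering of decision problems, for example "the problem solved by program number i".

```python
from typing import Callable

Problem = Callable[[int], bool]  # decision problem: input number -> yes/no


def diagonal_problem(listed: Callable[[int], Problem]) -> Problem:
    """Build a decision problem that differs from listed(i) on input i, for every i."""
    # Flip the i-th answer of the i-th listed problem.
    return lambda i: not listed(i)(i)


def example_listing(i: int) -> Problem:
    """A made-up numbering: problem i asks 'is the input divisible by i + 1?'."""
    return lambda n, i=i: n % (i + 1) == 0


missed = diagonal_problem(example_listing)
```

Whatever numbering is passed in, the returned problem disagrees with problem number i on input i, so it cannot appear anywhere in the list. Since programs can be numbered this way, some decision problem is always left without a program.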

351
00:19:22,610 --> 00:19:25,880
More money, more
problems, I guess.

352
00:19:25,880 --> 00:19:27,470
So when I first
saw this result, I

353
00:19:27,470 --> 00:19:29,488
was shocked and dismayed that--

354
00:19:29,488 --> 00:19:32,030
why are we even doing computer
science if most problems can't

355
00:19:32,030 --> 00:19:32,780
be solved?

356
00:19:32,780 --> 00:19:35,480
Luckily, it seems like most
of the problems we care about

357
00:19:35,480 --> 00:19:36,350
can be solved.

358
00:19:36,350 --> 00:19:38,020
That's what this
class is all about.

359
00:19:38,020 --> 00:19:40,020
And in fact, even the
problems that seem really,

360
00:19:40,020 --> 00:19:43,460
really hard for us to solve,
like n by n chess, where we can

361
00:19:43,460 --> 00:19:45,380
prove it takes
exponential time, there

362
00:19:45,380 --> 00:19:46,700
is an algorithm to solve chess.

363
00:19:46,700 --> 00:19:48,497
It's just really slow.

364
00:19:48,497 --> 00:19:50,330
And this is a statement
about, most problems

365
00:19:50,330 --> 00:19:51,980
can't even be solved
in finite time

366
00:19:51,980 --> 00:19:54,830
no matter how much
time you give them.

367
00:19:54,830 --> 00:19:58,310
So it's not all bad.

368
00:19:58,310 --> 00:20:13,380
Luckily, most
problems we care about

369
00:20:13,380 --> 00:20:17,370
are in R. I don't know why.

370
00:20:17,370 --> 00:20:19,770
This is sort of a
mystery of life.

371
00:20:19,770 --> 00:20:22,470
But it's good news.

372
00:20:22,470 --> 00:20:28,230
Or it's why we keep persevering
trying to solve problems

373
00:20:28,230 --> 00:20:30,322
with algorithms.

374
00:20:30,322 --> 00:20:32,280
AUDIENCE: Is it because
when we state problems,

375
00:20:32,280 --> 00:20:35,830
the statement tends to be small?

376
00:20:35,830 --> 00:20:37,030
ERIK DEMAINE: Well, this--

377
00:20:37,030 --> 00:20:38,530
so the question
was, maybe it's just

378
00:20:38,530 --> 00:20:40,950
because these short
statement problems are easy.

379
00:20:40,950 --> 00:20:44,820
But this is a pretty short
statement, and it's hard.

380
00:20:44,820 --> 00:20:49,310
I think-- I don't have
a great reason why.

381
00:20:49,310 --> 00:20:50,750
I wish I understood.

382
00:20:50,750 --> 00:20:52,250
There's a general
result that if you

383
00:20:52,250 --> 00:20:55,355
have any question about
the program itself, then

384
00:20:55,355 --> 00:20:56,730
there's no algorithm
to solve it.

385
00:20:56,730 --> 00:20:59,090
Basically, any non-trivial
question about programs is

386
00:20:59,090 --> 00:21:04,190
hard, is not in R. And
I guess if you took--

387
00:21:04,190 --> 00:21:07,168
if you imagine taking a
random statement of a problem,

388
00:21:07,168 --> 00:21:08,960
then maybe this will
be in the middle of it

389
00:21:08,960 --> 00:21:10,790
with some probability.

390
00:21:10,790 --> 00:21:11,990
Maybe that's why most.

391
00:21:11,990 --> 00:21:14,150
But this is a very
strong notion of most.

392
00:21:14,150 --> 00:21:16,520
There are so many more real
numbers than natural numbers

393
00:21:16,520 --> 00:21:17,750
that--

394
00:21:17,750 --> 00:21:18,380
I don't know.

395
00:21:23,780 --> 00:21:31,580
I want to add one more class
to this picture, which is NP.

396
00:21:31,580 --> 00:21:35,150
It nestles in between P and EXP.

397
00:21:35,150 --> 00:21:42,980
So we know that P is
contained in or equal to NP.

398
00:21:42,980 --> 00:21:46,780
And NP is contained
in or equal to EXP.

399
00:21:46,780 --> 00:21:49,400
We don't know whether there's
equality here or here.

400
00:21:49,400 --> 00:21:52,370
Probably not, but
we can't prove it.

401
00:21:52,370 --> 00:21:53,865
But what is this class?

402
00:21:53,865 --> 00:21:56,420
A couple of different
ways to define it--

403
00:21:56,420 --> 00:22:01,340
you might find one way or
the other more intuitive.

404
00:22:01,340 --> 00:22:02,160
They're equivalent.

405
00:22:02,160 --> 00:22:07,820
So as long as you understand at
least one of them, it's good.

406
00:22:07,820 --> 00:22:12,010
NP is just a class
of decision problems.

407
00:22:12,010 --> 00:22:15,980
So I defined P and
EXP and R arbitrarily.

408
00:22:15,980 --> 00:22:18,350
They can be problems
with any kind of output.

409
00:22:18,350 --> 00:22:21,440
But NP only makes sense
for decision problems.

410
00:22:21,440 --> 00:22:27,980
And it's going to look almost
like the definition of P--

411
00:22:32,420 --> 00:22:34,340
problems solvable
in polynomial time.

412
00:22:34,340 --> 00:22:36,260
We've just restricted
to decision problems.

413
00:22:36,260 --> 00:22:40,220
But we're going to
allow a strange kind

414
00:22:40,220 --> 00:22:44,000
of computer or
algorithm, which I

415
00:22:44,000 --> 00:22:46,659
like to call a lucky algorithm.

416
00:22:50,880 --> 00:22:53,150
And this is going to relate
to the notion of guessing

417
00:22:53,150 --> 00:22:55,190
that we talked about for
the last four lectures

418
00:22:55,190 --> 00:22:57,320
in dynamic programming.

419
00:22:57,320 --> 00:22:59,330
With dynamic programming,
we said, oh, there

420
00:22:59,330 --> 00:23:01,210
are all these different
choices I could make.

421
00:23:01,210 --> 00:23:02,210
What's the right choice?

422
00:23:02,210 --> 00:23:05,840
I don't know, so I'd
like to make a guess.

423
00:23:05,840 --> 00:23:08,180
And what that meant in terms
of a real algorithm is,

424
00:23:08,180 --> 00:23:09,950
we tried all of
the possibilities,

425
00:23:09,950 --> 00:23:12,170
and then took the max
or the OR or whatever

426
00:23:12,170 --> 00:23:14,750
over all those possibilities.

427
00:23:14,750 --> 00:23:15,620
And so we were--

428
00:23:15,620 --> 00:23:17,690
but what we were
simulating is something

429
00:23:17,690 --> 00:23:20,570
that I call a lucky algorithm,
which can make guesses

430
00:23:20,570 --> 00:23:22,790
and always makes
the right guess.

431
00:23:22,790 --> 00:23:25,590
This is a computer that
is impossible to buy.

432
00:23:25,590 --> 00:23:28,310
It would be great if you could
buy a computer that's lucky.

433
00:23:28,310 --> 00:23:30,268
But we don't know how to
build such a computer.

434
00:23:32,460 --> 00:23:35,820
So what does this mean?

435
00:23:35,820 --> 00:23:37,790
So informally, it
means your algorithm

436
00:23:37,790 --> 00:23:41,330
can make lucky guesses, and it
always makes the right guess.

437
00:23:41,330 --> 00:23:43,790
And whereas in DP, we had
to try all the options

438
00:23:43,790 --> 00:23:46,400
and spend time for all of
them, the lucky algorithm only

439
00:23:46,400 --> 00:23:49,940
has to spend time on the lucky
guess, on the correct guess.

440
00:23:49,940 --> 00:23:54,410
More formally, this is called
a non-deterministic model

441
00:23:54,410 --> 00:23:55,778
of computation.

442
00:24:02,330 --> 00:24:04,310
And this N is the--

443
00:24:04,310 --> 00:24:06,830
the N in non-determinism
is the N for NP.

444
00:24:06,830 --> 00:24:09,110
So this is non-deterministic
polynomial time.

445
00:24:11,800 --> 00:24:17,670
So the algorithm can make guesses.

446
00:24:23,610 --> 00:24:27,870
And then in the end, it
should output yes or no.

447
00:24:33,750 --> 00:24:37,162
Like say if you're exploring a
maze, this algorithm could say,

448
00:24:37,162 --> 00:24:38,370
should I go left or go right?

449
00:24:38,370 --> 00:24:40,470
I'm going to guess whether
to go left or go right.

450
00:24:40,470 --> 00:24:42,010
And let's say it guesses left.

451
00:24:42,010 --> 00:24:43,260
And so then it just goes left.

452
00:24:43,260 --> 00:24:45,232
And then it reaches
another junction.

453
00:24:45,232 --> 00:24:46,690
It says, should I
go left or right?

454
00:24:46,690 --> 00:24:48,190
And it'll say, I'll
guess, and it'll

455
00:24:48,190 --> 00:24:49,590
say, guess right this time.

456
00:24:49,590 --> 00:24:53,393
And in the end, if I get to some
dead end, maybe I say no,

457
00:24:53,393 --> 00:24:55,560
or if I get to the destination
I'm trying to get to,

458
00:24:55,560 --> 00:24:56,280
I say yes.

459
00:24:56,280 --> 00:24:58,720
So that's a
non-deterministic algorithm.

460
00:24:58,720 --> 00:25:02,550
And what does it mean
to run that algorithm?

461
00:25:02,550 --> 00:25:06,090
What does it mean for
the guesses to be lucky?

462
00:25:06,090 --> 00:25:07,920
Here's what it means.

463
00:25:07,920 --> 00:25:11,490
These guesses are guaranteed--

464
00:25:11,490 --> 00:25:15,372
which way you end up going is
guaranteed to lead you to a yes

465
00:25:15,372 --> 00:25:16,080
if there is one--

466
00:25:25,910 --> 00:25:27,960
if possible.

467
00:25:27,960 --> 00:25:32,420
So in my maze analogy,
if my destination

468
00:25:32,420 --> 00:25:34,640
is reachable from
my source, then

469
00:25:34,640 --> 00:25:38,420
I'm guaranteed, whenever
I guess left or right,

470
00:25:38,420 --> 00:25:42,410
I will choose a path that
leads me to my destination.

471
00:25:42,410 --> 00:25:44,870
Whereas, if the destination
is in some disconnected part

472
00:25:44,870 --> 00:25:47,520
of the maze and I
can't get there,

473
00:25:47,520 --> 00:25:49,168
then I don't know
what the guesses do.

474
00:25:49,168 --> 00:25:50,210
It doesn't really matter.

475
00:25:50,210 --> 00:25:52,460
Because no matter what I do,
I'll end up in a dead end

476
00:25:52,460 --> 00:25:54,230
and say no.

477
00:25:54,230 --> 00:25:56,610
That's the model.

478
00:25:56,610 --> 00:25:59,802
As long as you have an algorithm
that always outputs yes or no

479
00:25:59,802 --> 00:26:02,010
in polynomial time-- because
we're only talking about

480
00:26:02,010 --> 00:26:04,560
polynomial time,
lucky algorithms--

481
00:26:04,560 --> 00:26:06,360
if there's any way
to get to a yes,

482
00:26:06,360 --> 00:26:10,220
then your machine
will magically find it

483
00:26:10,220 --> 00:26:13,480
without having to spend any
time to make these decisions.
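
A minimal Python sketch of how an ordinary machine could simulate the lucky algorithm described above, assuming binary guesses: it tries every sequence of guesses and answers yes if any one of them leads to yes. The names `lucky_run`, `walk`, and the tiny `MAZE` are hypothetical, made up for illustration. The simulation takes time exponential in the number of guesses, which is exactly the cost the lucky machine gets to skip.

```python
from itertools import product


def lucky_run(algorithm, num_guesses):
    """Simulate a lucky algorithm deterministically.

    Answers yes (True) if ANY sequence of binary guesses makes
    `algorithm` return True, and no (False) otherwise.
    """
    return any(algorithm(bits) for bits in product((0, 1), repeat=num_guesses))


# Hypothetical maze: at each junction, guess 0 goes left and guess 1 goes right.
MAZE = {
    "start": ("dead_end", "hall"),
    "hall": ("goal", "dead_end"),
}


def walk(bits):
    """Follow one fixed sequence of guesses and report whether it reaches the goal."""
    place = "start"
    for bit in bits:
        if place not in MAZE:  # dead end or goal: nothing left to guess
            break
        place = MAZE[place][bit]
    return place == "goal"


print(lucky_run(walk, 2))  # True: the guesses (right, left) reach the goal
```

If the goal were unreachable, every guess sequence would end in a dead end and `lucky_run` would return False, matching the "no matter what I do, I say no" case.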

484
00:26:13,480 --> 00:26:16,100
So it's a pretty
magical computer,

485
00:26:16,100 --> 00:26:18,620
and it's not a computer
that exists in real life.

486
00:26:18,620 --> 00:26:21,430
But it's a computer that's
great to program on.

487
00:26:21,430 --> 00:26:22,345
It's very powerful.

488
00:26:22,345 --> 00:26:23,970
You could solve lots
of things with it.

489
00:26:23,970 --> 00:26:24,590
Yeah.

490
00:26:24,590 --> 00:26:27,800
AUDIENCE: If you had
this magical computer,

491
00:26:27,800 --> 00:26:30,080
it can guess whether
it's yes or no,

492
00:26:30,080 --> 00:26:33,117
why doesn't it just
answer the question?

493
00:26:33,117 --> 00:26:33,950
ERIK DEMAINE: Right.

494
00:26:33,950 --> 00:26:37,200
So what if we-- so
a nice check is,

495
00:26:37,200 --> 00:26:39,745
does this make all problems
trivial, all decision problems?

496
00:26:39,745 --> 00:26:41,120
Maybe I should
say, well, I don't

497
00:26:41,120 --> 00:26:42,703
know whether the
answer to the problem

498
00:26:42,703 --> 00:26:45,590
is yes or no, so I'll
just guess yes or no.

499
00:26:45,590 --> 00:26:49,470
This is problematic because--

500
00:26:49,470 --> 00:26:51,930
so I might say, it
will guess A or B,

501
00:26:51,930 --> 00:26:54,750
and if I choose the A
option, I will output yes,

502
00:26:54,750 --> 00:26:56,910
and if I choose the B
option, I will output no.

503
00:26:56,910 --> 00:27:00,055
In this model, that algorithm
will always output yes.

504
00:27:00,055 --> 00:27:01,680
Because what it's
saying is, if there's

505
00:27:01,680 --> 00:27:05,800
any way to get to a yes
answer, I will go that way.

506
00:27:05,800 --> 00:27:08,598
And so such an algorithm
that tries to cheat and just

507
00:27:08,598 --> 00:27:10,140
guess the whole
answer to the problem

508
00:27:10,140 --> 00:27:11,310
will actually end
up always saying

509
00:27:11,310 --> 00:27:12,600
yes, which means
it doesn't solve

510
00:27:12,600 --> 00:27:13,725
a very interesting problem.

511
00:27:13,725 --> 00:27:15,780
It only solves
the problem, which

512
00:27:15,780 --> 00:27:20,220
is represented by the
bit vector 1111111,

513
00:27:20,220 --> 00:27:21,930
where all the answers are yes.
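
A small Python sketch of why the cheating algorithm is useless, assuming the model is simulated by trying every binary guess. The helper `lucky_run` and the function `cheat` are hypothetical names for illustration. Because the model answers yes whenever some guess leads to yes, the cheat answers yes on every input.

```python
from itertools import product


def lucky_run(algorithm, problem_input, num_guesses):
    """Answer yes (True) if any sequence of binary guesses leads to yes."""
    return any(
        algorithm(problem_input, bits)
        for bits in product((0, 1), repeat=num_guesses)
    )


def cheat(problem_input, bits):
    """Ignore the input entirely and just guess the answer: 1 means yes, 0 means no."""
    return bits[0] == 1


for x in ["some input", "another input"]:
    print(lucky_run(cheat, x, 1))  # True both times, whatever the input is
```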

514
00:27:21,930 --> 00:27:23,520
But good check.

515
00:27:23,520 --> 00:27:24,610
Yeah.

516
00:27:24,610 --> 00:27:26,235
AUDIENCE: Does there
have to be a bound

517
00:27:26,235 --> 00:27:30,330
of a number of things it has
to choose between when it

518
00:27:30,330 --> 00:27:31,075
[AUDIO OUT]

519
00:27:31,075 --> 00:27:31,825
ERIK DEMAINE: Yes.

520
00:27:31,825 --> 00:27:34,110
AUDIENCE: Does it have an
exponential number of them?

521
00:27:36,750 --> 00:27:39,690
ERIK DEMAINE: Exponential
number of choices is OK.

522
00:27:39,690 --> 00:27:42,240
I usually like to think
of it as, you can only

523
00:27:42,240 --> 00:27:45,270
guess one bit at a time.

524
00:27:45,270 --> 00:27:46,932
But we're allowed
polynomial time,

525
00:27:46,932 --> 00:27:48,390
so you're actually
allowed to guess

526
00:27:48,390 --> 00:27:49,870
polynomial number of bits.

527
00:27:49,870 --> 00:27:52,710
At that point, you can guess
over an exponential size space,

528
00:27:52,710 --> 00:27:55,840
but not more than exponential.

529
00:27:55,840 --> 00:27:59,070
So it's-- yeah, polynomial
time let's say in the one-bit

530
00:27:59,070 --> 00:28:00,570
guessing model.

531
00:28:00,570 --> 00:28:02,430
What did I say?

532
00:28:02,430 --> 00:28:07,440
Makes guesses-- let's
add binary here.

533
00:28:07,440 --> 00:28:12,210
Otherwise we get some other
class, which I don't want.

534
00:28:12,210 --> 00:28:16,020
OK, let's do an example, a real
example of such an algorithm

535
00:28:16,020 --> 00:28:20,550
that's useful, which is Tetris.

536
00:28:20,550 --> 00:28:27,960
So I claim Tetris is
in NP because there

537
00:28:27,960 --> 00:28:31,140
is a lucky algorithm, a
non-deterministic polynomial

538
00:28:31,140 --> 00:28:34,260
time algorithm that can
solve the Tetris game.

539
00:28:34,260 --> 00:28:36,600
So again, you're
given a board, you're

540
00:28:36,600 --> 00:28:39,870
given some sequence
of pieces, and you

541
00:28:39,870 --> 00:28:41,730
want to know whether
there's some way

542
00:28:41,730 --> 00:28:44,920
to place the pieces
that lets you survive.

543
00:28:44,920 --> 00:28:53,790
And so what I'm going to
do is, for each piece,

544
00:28:53,790 --> 00:28:56,820
I'm going to guess
how to place it.

545
00:28:59,915 --> 00:29:01,290
So for the first
piece, I'm going

546
00:29:01,290 --> 00:29:04,440
to guess how far left
or right do I move it.

547
00:29:04,440 --> 00:29:05,820
Then I let it fall one step.

548
00:29:05,820 --> 00:29:07,290
Maybe I rotate it.

549
00:29:07,290 --> 00:29:10,560
I choose a sequence of moves
among left, right, down,

550
00:29:10,560 --> 00:29:13,440
rotate right, rotate left.

551
00:29:13,440 --> 00:29:17,370
And all along the way, I
check, is that move valid?

552
00:29:17,370 --> 00:29:21,190
If the move is invalid at any
point, I just say, return no.

553
00:29:21,190 --> 00:29:24,160
And then if the piece gets
nestled into a good spot,

554
00:29:24,160 --> 00:29:25,570
I continue to the next piece.

555
00:29:25,570 --> 00:29:27,653
I do the same thing, guess
all the possible things

556
00:29:27,653 --> 00:29:28,540
I could do to that.

557
00:29:28,540 --> 00:29:31,750
Again, I only need to
guess one bit at a time.

558
00:29:31,750 --> 00:29:34,090
And I'll only need to do a
polynomial number of guesses,

559
00:29:34,090 --> 00:29:37,060
like a linear number of guesses,
for each piece about where

560
00:29:37,060 --> 00:29:39,220
it falls in, so maybe a
quadratic number of guesses

561
00:29:39,220 --> 00:29:40,600
overall.

562
00:29:40,600 --> 00:29:42,955
And then at the
end, if I survived--

563
00:29:42,955 --> 00:29:45,260
oh, I also have to
check if a line clears.

564
00:29:45,260 --> 00:29:47,380
Then I clear the line.

565
00:29:47,380 --> 00:29:50,260
And if in the end I
survive, I return yes.

566
00:29:50,260 --> 00:29:53,800
So this is a
non-deterministic algorithm.

567
00:29:53,800 --> 00:29:58,220
So I would say, check
the rules of the game.

568
00:29:58,220 --> 00:30:06,460
And if we survive, return yes.

569
00:30:06,460 --> 00:30:10,270
And if at any point we violate
the rules-- for example,

570
00:30:10,270 --> 00:30:12,220
we go off the top of the board--

571
00:30:12,220 --> 00:30:13,933
we return no.

572
00:30:13,933 --> 00:30:15,850
So this is an algorithm
that sometimes returns

573
00:30:15,850 --> 00:30:17,920
no and sometimes
returns yes depending

574
00:30:17,920 --> 00:30:19,330
on what choices you make.

575
00:30:19,330 --> 00:30:21,070
And this model
guarantees, if there's

576
00:30:21,070 --> 00:30:24,660
any way to get to a
yes, it will find it.

577
00:30:24,660 --> 00:30:28,980
If I swapped these
answers, if I returned yes

578
00:30:28,980 --> 00:30:32,192
when I violated the rules and
returned no if I survived,

579
00:30:32,192 --> 00:30:33,900
this would be an
uninteresting algorithm.

580
00:30:33,900 --> 00:30:36,460
Because it's very easy
to lose in Tetris.

581
00:30:36,460 --> 00:30:38,850
The hard part is to survive.

582
00:30:38,850 --> 00:30:41,760
If I say, is there any way to
play the game in such a way

583
00:30:41,760 --> 00:30:45,000
that I violate the rules, then,
of course, the answer is yes.

584
00:30:45,000 --> 00:30:47,980
You can just stack pieces
and go off the top.

585
00:30:47,980 --> 00:30:52,180
There's an asymmetry in this
definition of yes versus no.

586
00:30:52,180 --> 00:30:55,500
It always finds yes
answers if possible.

587
00:30:55,500 --> 00:30:58,090
It doesn't always find
no answers if possible.

588
00:30:58,090 --> 00:31:01,230
So it's very important the way
that I wrote these questions.

589
00:31:01,230 --> 00:31:03,540
It's important that I define
Tetris as the problem of,

590
00:31:03,540 --> 00:31:04,890
can I survive?

591
00:31:04,890 --> 00:31:08,490
The problem of
can I not survive,

592
00:31:08,490 --> 00:31:11,330
is it impossible to survive,
that's a different question.

593
00:31:11,330 --> 00:31:14,130
That problem is not
in NP, probably.

594
00:31:14,130 --> 00:31:17,370
OK, so slight subtlety
there, yes versus no.

595
00:31:17,370 --> 00:31:19,230
Let me give you the
other definition of NP.

596
00:31:19,230 --> 00:31:21,240
So if this one's
confusing, which--

597
00:31:21,240 --> 00:31:23,820
although I prefer
this definition.

598
00:31:23,820 --> 00:31:26,290
Most people do not.

599
00:31:26,290 --> 00:31:28,200
So this is confusing.

600
00:31:28,200 --> 00:31:31,630
Let's do the other definition.

601
00:31:31,630 --> 00:32:04,300
So another definition is that
NP is a set of decision problems

602
00:32:04,300 --> 00:32:07,120
that can be checked
in polynomial time.

603
00:32:18,672 --> 00:32:20,380
This actually came up
in the last lecture

604
00:32:20,380 --> 00:32:22,060
where we talked
about subset sum.

605
00:32:22,060 --> 00:32:24,820
I said, here's a
bunch of integers,

606
00:32:24,820 --> 00:32:26,920
here's a target integer,
and I can prove to you

607
00:32:26,920 --> 00:32:30,310
that this integer can be
represented as a sum of numbers

608
00:32:30,310 --> 00:32:34,210
from a subset of the numbers in
my set, because here they are.

609
00:32:34,210 --> 00:32:38,590
I gave you this plus this plus
this equals the target sum.

610
00:32:38,590 --> 00:32:41,890
And so that is a solution,
in some sense, that can

611
00:32:41,890 --> 00:32:44,720
be checked for a yes example.

612
00:32:44,720 --> 00:32:48,790
If I can represent my number
as a subset sum of a given set,

613
00:32:48,790 --> 00:32:50,590
it's easy for me to
prove that to you.

614
00:32:50,590 --> 00:32:53,200
And you can check it just
by adding up the numbers

615
00:32:53,200 --> 00:32:56,810
and checking that each
number was in the set.
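
The check described here can be sketched in Python as follows. This is a hedged illustration, not code from the lecture: the function name `verify_subset_sum` and the choice to pass the certificate as a list of chosen numbers are assumptions. It adds up the chosen numbers and confirms each one comes from the given set, using each element no more often than it appears.

```python
from collections import Counter


def verify_subset_sum(numbers, target, certificate):
    """Check a claimed yes-certificate for subset sum in polynomial time.

    `certificate` is the list of numbers claimed to add up to `target`.
    """
    available = Counter(numbers)
    chosen = Counter(certificate)
    # Every chosen number must be in the set, and not used more often than it occurs.
    if any(count > available[value] for value, count in chosen.items()):
        return False
    # The chosen numbers must add up to exactly the target.
    return sum(certificate) == target


print(verify_subset_sum([3, 5, 8, 13], 16, [3, 13]))  # True
print(verify_subset_sum([3, 5, 8, 13], 16, [8, 8]))   # False: 8 appears only once
```

A False result only says this particular certificate failed; it does not show that no valid certificate exists.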

616
00:32:56,810 --> 00:33:00,647
Whereas no instances, I had
an example of a target sum

617
00:33:00,647 --> 00:33:01,730
that could not be reached.

618
00:33:01,730 --> 00:33:03,563
And the only reason I
knew that is because I

619
00:33:03,563 --> 00:33:05,780
had brute-forced the thing.

620
00:33:05,780 --> 00:33:09,500
And there's no succinct way to
prove to you that that number

621
00:33:09,500 --> 00:33:11,410
can't be represented.

622
00:33:11,410 --> 00:33:16,880
A similar thing with Tetris,
what I would say is--

623
00:33:16,880 --> 00:33:20,150
so this is version
1, version 2--

624
00:33:20,150 --> 00:33:29,060
for Tetris is
that, a certificate

625
00:33:29,060 --> 00:33:38,990
for a yes input of
Tetris is a sequence

626
00:33:38,990 --> 00:33:40,475
of moves for the pieces.

627
00:33:48,520 --> 00:33:50,730
OK, if it's possible
to survive in Tetris,

628
00:33:50,730 --> 00:33:51,810
I can prove it to you.

629
00:33:51,810 --> 00:33:55,710
I can just play the game and
show you that I survived.

630
00:33:55,710 --> 00:33:58,230
No answers, I don't
know, it's hard to prove

631
00:33:58,230 --> 00:34:00,750
to you that I can't survive
a given sequence of pieces.

632
00:34:00,750 --> 00:34:02,010
But yes answers are easy.

633
00:34:02,010 --> 00:34:04,947
I just show you, here's the
sequence of button presses

634
00:34:04,947 --> 00:34:06,780
I'll do for this piece,
then for this piece,

635
00:34:06,780 --> 00:34:07,613
then for this piece.

636
00:34:07,613 --> 00:34:09,840
Notice it's exactly
the same thing

637
00:34:09,840 --> 00:34:12,989
that I guessed in the
beginning of this algorithm.

638
00:34:12,989 --> 00:34:15,328
And then I did some other
work to implement the rules.

639
00:34:15,328 --> 00:34:17,370
And similarly, if I gave
you a certificate, which

640
00:34:17,370 --> 00:34:20,760
is the things that I wanted to
guess of how to play the game,

641
00:34:20,760 --> 00:34:23,790
I can check this certificate
by just implementing

642
00:34:23,790 --> 00:34:26,409
the rules of Tetris and
seeing whether I survived.

643
00:34:26,409 --> 00:34:29,219
And if you violate the rules
at any point, you say no.

644
00:34:29,219 --> 00:34:32,219
And if you survive,
you return yes.

645
00:34:32,219 --> 00:34:34,620
That's what's called a
verification algorithm.

646
00:34:34,620 --> 00:34:38,070
So let me formalize this notion.

647
00:34:38,070 --> 00:34:48,480
Given a problem input
plus a certificate,

648
00:34:48,480 --> 00:34:56,790
like that one over there,
there is a polynomial time--

649
00:34:56,790 --> 00:34:58,550
so this is yet
another definition.

650
00:34:58,550 --> 00:35:01,370
This is what I mean by
this definition of NP--

651
00:35:01,370 --> 00:35:12,250
verification algorithm that
satisfies two properties.

652
00:35:12,250 --> 00:35:15,670
One is, for every yes input--

653
00:35:15,670 --> 00:35:19,770
so every input where the
answer is yes to the problem--

654
00:35:19,770 --> 00:35:28,440
there exists a certificate such
that the verifier says yes.

655
00:35:33,720 --> 00:35:36,150
So this is saying, it's
possible to prove to me

656
00:35:36,150 --> 00:35:38,700
that an answer is yes,
because if you ever

657
00:35:38,700 --> 00:35:41,940
have an input where the
answer happens to be yes,

658
00:35:41,940 --> 00:35:44,290
you can prove it to me by
giving me a certificate.

659
00:35:44,290 --> 00:35:47,610
There's always some certificate
that proves the answer's yes.

660
00:35:47,610 --> 00:35:50,770
Because the verifier, which runs
in regular polynomial time--

661
00:35:50,770 --> 00:35:53,100
this is a regular,
old-fashioned,

662
00:35:53,100 --> 00:35:55,440
down-to-earth
verification algorithm,

663
00:35:55,440 --> 00:35:57,870
polynomial time in
our usual sense--

664
00:35:57,870 --> 00:35:59,340
it will say yes.

665
00:35:59,340 --> 00:36:01,500
And furthermore, the yes
answers from the verifier

666
00:36:01,500 --> 00:36:06,060
are actually meaningful, because
if I ever give it a no input,

667
00:36:06,060 --> 00:36:10,230
it always says no, no matter
what certificate I give it.
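
The two properties can be written in symbols as a rough sketch. The notation is an assumption introduced here: V is the polynomial-time verifier, x is a problem input, and c is a certificate. The polynomial bound on the certificate's length is the standard convention, not something stated explicitly in the lecture.

```latex
\begin{align*}
x \text{ is a yes input} &\implies \exists\, c \text{ with } |c| \le \mathrm{poly}(|x|) \text{ such that } V(x, c) = \text{yes} \\
x \text{ is a no input}  &\implies \forall\, c:\ V(x, c) = \text{no}
\end{align*}
```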

668
00:36:18,730 --> 00:36:21,600
So this should really
formalize what all this means.

669
00:36:21,600 --> 00:36:24,030
It's equivalent to the
previous definition.

670
00:36:24,030 --> 00:36:27,210
This is saying that proofs
exist for yes instances.

671
00:36:27,210 --> 00:36:29,740
And this is saying that proofs
don't exist for no instances,

672
00:36:29,740 --> 00:36:31,440
meaning there are
no false proofs.

673
00:36:31,440 --> 00:36:33,450
So if the verifier
ever outputs yes,

674
00:36:33,450 --> 00:36:36,797
you know that the answer
to your problem is yes.

675
00:36:36,797 --> 00:36:38,380
But if it outputs
no, you're not sure.

676
00:36:38,380 --> 00:36:40,080
Maybe you got the
certificate wrong

677
00:36:40,080 --> 00:36:41,790
because we only know there's
some certificate where

678
00:36:41,790 --> 00:36:42,810
the verifier will say yes.

679
00:36:42,810 --> 00:36:44,435
Or maybe it was a no
input, and then it

680
00:36:44,435 --> 00:36:46,620
didn't matter what
certificate you used.

681
00:36:46,620 --> 00:36:49,390
But it's nice, because
it says on, say, Tetris,

682
00:36:49,390 --> 00:36:51,150
if I give you the
sequence of pieces,

683
00:36:51,150 --> 00:36:53,692
it's very easy to write down a
verifier which just implements

684
00:36:53,692 --> 00:36:55,740
the rules of Tetris.

685
00:36:55,740 --> 00:36:57,960
And so then you can at least
check whether a solution

686
00:36:57,960 --> 00:36:59,880
is valid in the yes case.

687
00:36:59,880 --> 00:37:03,040
In the no case, we don't
have anything useful.

688
00:37:03,040 --> 00:37:06,660
So NP is a structure,
some additional structure

689
00:37:06,660 --> 00:37:09,810
about the yes inputs
in your problem.

690
00:37:09,810 --> 00:37:12,600
And a lot of decision
problems are in NP.

691
00:37:12,600 --> 00:37:14,850
A lot of the problems
that we care about

692
00:37:14,850 --> 00:37:16,380
can be phrased as an NP problem.

693
00:37:16,380 --> 00:37:18,030
As long as it's a
decision problem,

694
00:37:18,030 --> 00:37:25,070
usually, answering yes or no
is provable, like subset sum,

695
00:37:25,070 --> 00:37:26,335
like Tetris.

696
00:37:26,335 --> 00:37:28,460
These are all problems
where, if the answer is yes,

697
00:37:28,460 --> 00:37:31,805
I can give you a
convincing proof why.

698
00:37:31,805 --> 00:37:33,680
And it turns out a lot--
so a lot of problems

699
00:37:33,680 --> 00:37:35,700
fall into this NP setting.

700
00:37:35,700 --> 00:37:39,350
And so we have some tools
for talking about problems

701
00:37:39,350 --> 00:37:45,510
being hard with respect to NP.

702
00:37:45,510 --> 00:37:50,220
Let me first talk a
little bit about P

703
00:37:50,220 --> 00:37:55,580
does not equal
NP, question mark.

704
00:37:55,580 --> 00:37:58,470
A lot of people conjecture
that P does not equal NP.

705
00:37:58,470 --> 00:38:00,750
It's sort of a
standard conjecture

706
00:38:00,750 --> 00:38:02,130
in theoretical computer science.

707
00:38:02,130 --> 00:38:06,690
But we don't know how to prove
whether P equals NP or does not

708
00:38:06,690 --> 00:38:08,130
equal NP.

709
00:38:08,130 --> 00:38:12,160
And so in this picture,
I've drawn the hypothesis,

710
00:38:12,160 --> 00:38:16,513
which is that NP is a strictly
bigger region than P is.

711
00:38:16,513 --> 00:38:18,180
But we don't actually
know whether there

712
00:38:18,180 --> 00:38:20,203
are problems in this region.

713
00:38:20,203 --> 00:38:21,870
We don't know whether
there are problems

714
00:38:21,870 --> 00:38:24,690
in this region
between NP and EXP.

715
00:38:24,690 --> 00:38:26,340
We conjecture there
are problems here

716
00:38:26,340 --> 00:38:28,170
and there are problems here.

717
00:38:28,170 --> 00:38:30,690
There's definitely problems
here or problems here,

718
00:38:30,690 --> 00:38:31,998
but we don't know which one.

719
00:38:31,998 --> 00:38:33,540
Because we know P
does not equal EXP,

720
00:38:33,540 --> 00:38:35,123
but we don't know
whether P equals NP,

721
00:38:35,123 --> 00:38:37,440
and we don't know
whether NP equals EXP.

722
00:38:37,440 --> 00:38:41,250
If you could prove that P does
not equal NP, or disprove it,

723
00:38:41,250 --> 00:38:44,460
you would win $1 million, which
is not that much money these days.

724
00:38:44,460 --> 00:38:47,880
But you would be famous
for the rest of time

725
00:38:47,880 --> 00:38:49,710
if you could ever prove this.

726
00:38:49,710 --> 00:38:53,130
Every year, there's
usually a crackpot proof

727
00:38:53,130 --> 00:38:56,340
that doesn't work out.

728
00:38:56,340 --> 00:38:57,960
Some of them go to me.

729
00:38:57,960 --> 00:38:59,790
Please don't send them.

730
00:38:59,790 --> 00:39:02,490
And anyway, it's a
very hard problem.

731
00:39:02,490 --> 00:39:04,740
It is sort of the core problem
in theoretical computer

732
00:39:04,740 --> 00:39:07,380
science, how to prove
P does not equal NP.

733
00:39:07,380 --> 00:39:11,470
But for the most part,
we just assume it.

734
00:39:11,470 --> 00:39:12,970
Now, what does this
conjecture mean?

735
00:39:12,970 --> 00:39:15,300
It essentially means-- the
way I like to say it is,

736
00:39:15,300 --> 00:39:17,850
you cannot engineer luck.

737
00:39:17,850 --> 00:39:19,530
Because NP problems
are problems you

738
00:39:19,530 --> 00:39:20,910
can solve by lucky algorithms.

739
00:39:20,910 --> 00:39:24,990
P are problems you can solve
by regular old algorithms.

740
00:39:24,990 --> 00:39:27,930
And so if P equalled
NP, it means

741
00:39:27,930 --> 00:39:30,810
luck doesn't buy you
anything, which seems weird.

742
00:39:30,810 --> 00:39:34,650
If I can magically make
these super powerful guesses,

743
00:39:34,650 --> 00:39:36,570
then I can solve the
problems that are in

744
00:39:36,570 --> 00:39:39,120
NP, that seems
super powerful, way

745
00:39:39,120 --> 00:39:41,055
more powerful than
regular algorithms,

746
00:39:41,055 --> 00:39:42,930
where we have to actually
brute-force and try

747
00:39:42,930 --> 00:39:44,970
all the choices.

748
00:39:44,970 --> 00:39:48,550
And so it seems pretty solid
that P does not equal NP.

749
00:39:48,550 --> 00:39:51,090
That's my-- of course, we
don't know how to prove it.

750
00:39:51,090 --> 00:39:52,680
Another phrasing
is that it's harder

751
00:39:52,680 --> 00:39:56,190
to come up with proofs
than it is to check them,

752
00:39:56,190 --> 00:39:57,690
from a mathematical perspective.

753
00:39:57,690 --> 00:40:00,300
This is equivalent to
P does not equal NP.

754
00:40:00,300 --> 00:40:03,030
So that's why you
should believe it.

755
00:40:03,030 --> 00:40:06,430
Now, let's go over here.

756
00:40:11,740 --> 00:40:16,960
The next notion is NP-hardness.

757
00:40:27,910 --> 00:40:30,730
So in particular,
I want to claim--

758
00:40:30,730 --> 00:40:33,160
this is a theorem that
exists in the literature--

759
00:40:33,160 --> 00:40:40,740
that if P does not equal
NP, then Tetris is not in P.

760
00:40:44,180 --> 00:40:46,300
So I said right here,
Tetris is in EXP,

761
00:40:46,300 --> 00:40:47,920
but we don't know
whether it's in P.

762
00:40:47,920 --> 00:40:50,140
But in fact, we
conjecture it is not in P

763
00:40:50,140 --> 00:40:52,210
because we conjecture
that P does not equal NP.

764
00:40:52,210 --> 00:40:53,710
If you could prove
this conjecture--

765
00:40:53,710 --> 00:40:56,252
and there's a lot of theorems
that are conditioned assuming P

766
00:40:56,252 --> 00:40:57,190
does not equal NP--

767
00:40:57,190 --> 00:41:00,160
then we get some nice results,
like Tetris cannot be solved

768
00:41:00,160 --> 00:41:01,030
in polynomial time.

769
00:41:01,030 --> 00:41:05,320
You cannot figure out whether
I can win a Tetris game

770
00:41:05,320 --> 00:41:08,380
in polynomial time
in the input size.

771
00:41:08,380 --> 00:41:10,850
Why?

772
00:41:10,850 --> 00:41:15,590
This is a consequence of
another theorem, which

773
00:41:15,590 --> 00:41:21,620
is that Tetris is NP-hard.

774
00:41:21,620 --> 00:41:24,410
I'm going to define
NP-hard informally first,

775
00:41:24,410 --> 00:41:27,500
and then I'll define it slightly
more formally in a second.

776
00:41:27,500 --> 00:41:32,420
But this means,
roughly, that Tetris is

777
00:41:32,420 --> 00:41:38,390
as hard as all problems in NP.

778
00:41:41,780 --> 00:41:46,080
So let me draw this
in the picture.

779
00:41:46,080 --> 00:41:52,170
So NP-hard is this part.

780
00:41:54,810 --> 00:41:56,580
Did I leave myself enough room?

781
00:41:56,580 --> 00:41:57,150
Maybe not.

782
00:42:02,310 --> 00:42:05,060
Well, we'll squeeze it in.

783
00:42:05,060 --> 00:42:07,790
There's another region
here for EXP-hard.

784
00:42:14,290 --> 00:42:18,670
So your problem being in
NP was a positive result.

785
00:42:18,670 --> 00:42:22,090
It says you're no more
difficult than this line.

786
00:42:22,090 --> 00:42:24,880
You're either at this
position or to the left.

787
00:42:24,880 --> 00:42:26,680
Being in P was also
a positive statement.

788
00:42:26,680 --> 00:42:28,690
It says you're here
or to the left.

789
00:42:28,690 --> 00:42:30,970
Being in P is better
than being in NP

790
00:42:30,970 --> 00:42:33,940
because this is
a subset of that.

791
00:42:33,940 --> 00:42:36,130
NP-hard is a lower bound.

792
00:42:36,130 --> 00:42:40,210
It says, you are at this point,
at this level of difficulty,

793
00:42:40,210 --> 00:42:42,730
or to the right.

794
00:42:42,730 --> 00:42:46,540
And so it goes from here off
to infinity in difficulty.

795
00:42:46,540 --> 00:42:48,250
And EXP-hard says
you're at least

796
00:42:48,250 --> 00:42:53,260
as hard as the right
extent of the EXP set,

797
00:42:53,260 --> 00:42:55,720
or you're harder than
that, in a sense that we

798
00:42:55,720 --> 00:42:58,660
will formalize in a moment.

799
00:42:58,660 --> 00:43:01,470
And this place right here,
as you might imagine,

800
00:43:01,470 --> 00:43:02,670
is kind of interesting.

801
00:43:02,670 --> 00:43:05,610
It's exactly where
NP meets NP-hard.

802
00:43:05,610 --> 00:43:09,433
This thing is
called NP-complete.

803
00:43:09,433 --> 00:43:11,350
You probably have heard
about NP-completeness,

804
00:43:11,350 --> 00:43:12,690
a famous notion.

805
00:43:12,690 --> 00:43:14,610
And this is what it means.

806
00:43:14,610 --> 00:43:17,840
It is, the problems
that are in NP--

807
00:43:17,840 --> 00:43:19,840
so they have a lucky
algorithm that solves them,

808
00:43:19,840 --> 00:43:21,673
they can be verified,
there are certificates

809
00:43:21,673 --> 00:43:25,170
that can be verified--
and they are NP-hard.

810
00:43:25,170 --> 00:43:28,080
So they're in NP, and they
are the hardest among problems

811
00:43:28,080 --> 00:43:29,310
in NP.

812
00:43:29,310 --> 00:43:31,200
Now, they're not
the hardest problem.

813
00:43:31,200 --> 00:43:33,030
There are actually many
problems right here

814
00:43:33,030 --> 00:43:36,750
at this single level of
difficulty called NP-complete.

815
00:43:36,750 --> 00:43:39,130
Among them is Tetris.

816
00:43:39,130 --> 00:43:43,120
There are many others, which
I will list in a moment.

817
00:43:43,120 --> 00:43:45,790
So that is NP-completeness.

818
00:43:45,790 --> 00:43:52,810
So because these problems are
the hardest problems in NP,

819
00:43:52,810 --> 00:43:55,090
if there's any problems
here in between--

820
00:43:55,090 --> 00:43:59,550
in NP minus P, then
these must be among them.

821
00:43:59,550 --> 00:44:01,880
And so if you assume
that P does not

822
00:44:01,880 --> 00:44:03,830
equal NP, as most
people do, then

823
00:44:03,830 --> 00:44:06,500
you know that all problems
at this right-most extreme

824
00:44:06,500 --> 00:44:08,660
of NP, the hardest of
the problems in NP,

825
00:44:08,660 --> 00:44:10,580
they must not be in P.

826
00:44:10,580 --> 00:44:12,980
And that's why I can say,
if P does not equal NP,

827
00:44:12,980 --> 00:44:17,700
Tetris is not in P, and also, any
NP-complete problem is not in P.

828
00:44:17,700 --> 00:44:21,300
OK, what does "as hard as" mean?

829
00:44:26,110 --> 00:44:30,430
This is our good
friend reductions.

830
00:44:33,010 --> 00:44:35,095
We talked about reductions
a lot in this class.

831
00:44:38,260 --> 00:44:40,600
Reductions are the easy
way to use algorithms.

832
00:44:40,600 --> 00:44:43,600
You just take your problem
and reduce it to a problem

833
00:44:43,600 --> 00:44:45,490
you already know how to solve.

834
00:44:45,490 --> 00:44:52,150
You take the input to some
problem that you want to solve,

835
00:44:52,150 --> 00:44:56,530
and you convert it into an
input to some other problem,

836
00:44:56,530 --> 00:44:58,960
like single source shortest
paths or something like that

837
00:44:58,960 --> 00:45:01,810
that you already have an
algorithm for solving.

838
00:45:01,810 --> 00:45:05,050
So if you have an algorithm
that solves problem B,

839
00:45:05,050 --> 00:45:09,760
you can convert that
into a solution for A.

840
00:45:09,760 --> 00:45:11,980
And a reduction should
also tell me how to--

841
00:45:11,980 --> 00:45:14,350
given a solution to B,
how to convert it back

842
00:45:14,350 --> 00:45:18,610
into a solution for A. And
when I say solution here,

843
00:45:18,610 --> 00:45:21,370
I actually mean certificate
from over there.

844
00:45:21,370 --> 00:45:28,280
So how-- so if I--

845
00:45:28,280 --> 00:45:32,360
so if I have-- so reduction
consists of these two pieces--

846
00:45:32,360 --> 00:45:35,045
how to convert an input
of A to an input for B,

847
00:45:35,045 --> 00:45:36,980
and given a solution
to B, how to convert it

848
00:45:36,980 --> 00:45:37,853
to a solution to A.

849
00:45:37,853 --> 00:45:40,520
Let me give you some examples of
reductions you've already seen.

850
00:45:40,520 --> 00:45:42,620
You've seen a lot of them.

851
00:45:42,620 --> 00:45:47,180
If I have unweighted
shortest paths on the left--

852
00:45:47,180 --> 00:45:52,580
unweighted single
source shortest paths--

853
00:45:52,580 --> 00:45:55,400
I can reduce that to
weighted shortest paths.

854
00:45:59,990 --> 00:46:01,262
How?

855
00:46:01,262 --> 00:46:02,970
AUDIENCE: Set all
the weights to 1.

856
00:46:02,970 --> 00:46:04,080
ERIK DEMAINE: Set
all the weights to 1.

857
00:46:04,080 --> 00:46:06,060
So here I'm given a
graph without weights.

858
00:46:06,060 --> 00:46:07,800
If I set all the
weights to 1, that

859
00:46:07,800 --> 00:46:10,330
turns it into an input for a
weighted single source shortest

860
00:46:10,330 --> 00:46:10,830
paths.

861
00:46:10,830 --> 00:46:12,497
So if you didn't know
how to solve this,

862
00:46:12,497 --> 00:46:14,170
you could solve it
by converting it.

863
00:46:14,170 --> 00:46:16,450
If you've already written,
say, a Dijkstra algorithm,

864
00:46:16,450 --> 00:46:18,815
you could apply it to solve
unweighted single source

865
00:46:18,815 --> 00:46:19,440
shortest paths.

866
00:46:19,440 --> 00:46:21,240
Now, we know a faster
way to solve this,

867
00:46:21,240 --> 00:46:23,340
but it's only a
log factor faster.

868
00:46:23,340 --> 00:46:26,452
And here we're talking about
polynomial versus exponential.

869
00:46:26,452 --> 00:46:27,660
So this is a valid reduction.

870
00:46:27,660 --> 00:46:29,035
It's not the most
interesting one

871
00:46:29,035 --> 00:46:32,280
from an algorithmic standpoint,
but it is an algorithm.
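The unweighted-to-weighted reduction described here can be sketched in Python. This is only an illustration under assumptions: the function names (`dijkstra`, `unweighted_sssp`) and the adjacency-dictionary input format are hypothetical, not from the lecture.

```python
import heapq


def dijkstra(adj, source):
    """Weighted single-source shortest paths.

    adj maps each vertex u to a list of (v, weight) pairs, weight >= 0.
    Returns a dict of distances for the vertices reachable from source.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter distance was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def unweighted_sssp(adj_unweighted, source):
    """Solve problem A (unweighted) by reducing it to problem B (weighted)."""
    # Step 1: convert the input of A into an input of B by setting all weights to 1.
    adj_weighted = {u: [(v, 1) for v in nbrs] for u, nbrs in adj_unweighted.items()}
    # Step 2: solve B. Its distances are already the answer to A,
    # so converting the solution back is trivial here.
    return dijkstra(adj_weighted, source)
```

For example, on the graph `{"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}` with source `"a"`, vertex `"d"` ends up at distance 2.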

872
00:46:32,280 --> 00:46:35,640
Another one we've
seen is, if you

873
00:46:35,640 --> 00:46:37,470
have integer
weights on the left,

874
00:46:37,470 --> 00:46:39,630
you can convert
that to unweighted

875
00:46:39,630 --> 00:46:42,780
on the right, positive
integer weights,

876
00:46:42,780 --> 00:46:45,510
by subdividing each
edge of weight W

877
00:46:45,510 --> 00:46:50,040
into W edges of no weight.

878
00:46:50,040 --> 00:46:52,230
So that's maybe a little
bit less efficient.

879
00:46:52,230 --> 00:46:55,200
It depends what the
sum of the weights is.
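A minimal sketch of the subdivision reduction, assuming directed edges given as `(u, v, w)` triples with positive integer `w`, string vertex names, and no parallel edges. The helper names are hypothetical. The number of dummy vertices grows with the sum of the weights, which is why the efficiency depends on that sum.

```python
from collections import deque


def subdivide(edges):
    """Turn each edge of weight w into a chain of w unweighted edges."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, [])
        adj.setdefault(v, [])
        prev = u
        for i in range(1, w):
            mid = (u, v, i)  # fresh dummy vertex placed along the edge
            adj.setdefault(mid, [])
            adj[prev].append(mid)
            prev = mid
        adj[prev].append(v)
    return adj


def bfs(adj, source):
    """Unweighted single-source shortest paths by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_sssp_via_bfs(edges, source):
    """Solve positive-integer-weighted SSSP by reducing to unweighted SSSP."""
    original = {x for u, v, _ in edges for x in (u, v)}
    dist = bfs(subdivide(edges), source)
    # Convert the solution back: drop the dummy vertices.
    return {v: d for v, d in dist.items() if v in original}
```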

880
00:46:55,200 --> 00:46:59,950
Another version that we've seen
is longest path in a graph.

881
00:46:59,950 --> 00:47:04,470
We can-- weighted path we
can reduce to shortest path

882
00:47:04,470 --> 00:47:10,290
in a graph, weighted by
negating all the weights.

883
00:47:10,290 --> 00:47:12,540
We did this in some of the
dynamic programming things.

884
00:47:12,540 --> 00:47:14,362
Like oh, longest path on a DAG?

885
00:47:14,362 --> 00:47:16,320
We can convert that into
shortest path on a DAG

886
00:47:16,320 --> 00:47:18,120
just by negating
all the weights.
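The negation trick can be sketched as follows. The assumptions here are mine: the caller supplies the vertices already in topological order, and the function names are hypothetical.

```python
def dag_shortest_paths(order, edges, source):
    """Shortest paths on a DAG by relaxing edges in topological order.

    order: all vertices in topological order.
    edges: list of (u, v, w) triples; weights may be negative.
    """
    out = {}
    for u, v, w in edges:
        out.setdefault(u, []).append((v, w))
    dist = {v: float("inf") for v in order}
    dist[source] = 0
    for u in order:
        if dist[u] == float("inf"):
            continue  # u is not reachable from the source
        for v, w in out.get(u, []):
            dist[v] = min(dist[v], dist[u] + w)
    return dist


def dag_longest_paths(order, edges, source):
    """Longest paths on a DAG, reduced to shortest paths."""
    # Convert the input: negate every weight.
    negated = [(u, v, -w) for u, v, w in edges]
    shortest = dag_shortest_paths(order, negated, source)
    # Convert the solution back: negate every distance.
    return {v: -d for v, d in shortest.items()}
```

Unreachable vertices come back with distance negative infinity, since no path to them exists.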

887
00:47:18,120 --> 00:47:20,700
So these are all examples
of converting one problem

888
00:47:20,700 --> 00:47:21,630
to another.

889
00:47:21,630 --> 00:47:23,400
Usually, you convert from--

890
00:47:23,400 --> 00:47:25,440
for algorithms, you
convert from a problem

891
00:47:25,440 --> 00:47:28,200
you want to solve into a
problem that you already

892
00:47:28,200 --> 00:47:30,070
know how to solve.

893
00:47:30,070 --> 00:47:32,610
But it turns out the same
tool, reductions, can be used

894
00:47:32,610 --> 00:47:35,510
to prove negative results too.

895
00:47:35,510 --> 00:47:38,640
And in this case, we're going
to reduce from a problem that we

896
00:47:38,640 --> 00:47:42,750
think cannot be solved and
reduce it to the problem that

897
00:47:42,750 --> 00:47:45,330
we're interested in solving.

898
00:47:45,330 --> 00:47:48,390
So let me write more
precisely what this means.

899
00:47:48,390 --> 00:47:50,710
If you can find a
reduction like this,

900
00:47:50,710 --> 00:47:53,880
it means that
solving A is at least

901
00:47:53,880 --> 00:48:04,300
as easy as solving B. Because
I could solve A, in particular,

902
00:48:04,300 --> 00:48:08,080
by converting it into B, solving
B, and then converting it back

903
00:48:08,080 --> 00:48:11,380
to a solution to A. So in
other words, if I can solve B,

904
00:48:11,380 --> 00:48:14,065
I can solve A, which I
can phrase informally as,

905
00:48:14,065 --> 00:48:16,270
A is at least as easy as B.

906
00:48:16,270 --> 00:48:19,660
And now using grammar,
contrapositive

907
00:48:19,660 --> 00:48:23,770
whatever, this is the same thing
as saying that B is at least as

908
00:48:23,770 --> 00:48:34,430
hard as A. And this is what I
mean by at least as hard as.

909
00:48:34,430 --> 00:48:37,120
So this is my definition
of at least as hard,

910
00:48:37,120 --> 00:48:40,810
in this notion of NP-hardness.

911
00:48:40,810 --> 00:48:44,020
So what NP-hard means
is that I'm at least as

912
00:48:44,020 --> 00:48:46,000
hard as all problems in NP.

913
00:48:46,000 --> 00:48:49,420
So what that means is,
every problem in NP

914
00:48:49,420 --> 00:48:53,845
can be reduced to Tetris,
which is kind of funny.

915
00:48:53,845 --> 00:48:55,720
But in particular, that
means that if there's

916
00:48:55,720 --> 00:48:57,595
an algorithm for Tetris,
there's an algorithm

917
00:48:57,595 --> 00:48:59,950
for all problems in NP.

918
00:48:59,950 --> 00:49:01,720
And so that's actually
the contrapositive

919
00:49:01,720 --> 00:49:02,350
of this statement.

920
00:49:02,350 --> 00:49:04,160
So this is saying, if
there's a polynomial--

921
00:49:04,160 --> 00:49:06,368
if I take the contrapositive
of this, this is saying,

922
00:49:06,368 --> 00:49:08,590
if there's a polynomial
time algorithm for Tetris,

923
00:49:08,590 --> 00:49:10,960
then P equals NP, there's
a polynomial time algorithm

924
00:49:10,960 --> 00:49:12,670
for every problem in NP.
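Written symbolically, the argument looks roughly like this. The notation is an assumption, not something on the board: A \le_p B is taken to mean that A reduces to B by a polynomial-time reduction.

```latex
% Assumed notation: A \le_p B means "A reduces to B in polynomial time".
\[
\bigl(A \le_p B \ \text{and}\ B \in \mathrm{P}\bigr) \implies A \in \mathrm{P}
\qquad\text{and, as the contrapositive,}\qquad
\bigl(A \le_p B \ \text{and}\ A \notin \mathrm{P}\bigr) \implies B \notin \mathrm{P}
\]
\[
X \text{ is NP-hard} \iff A \le_p X \ \text{for every } A \in \mathrm{NP},
\qquad\text{so}\qquad
\text{Tetris} \in \mathrm{P} \implies \mathrm{P} = \mathrm{NP}.
\]
```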

925
00:49:12,670 --> 00:49:14,830
And the way we prove
that is by reductions.

926
00:49:14,830 --> 00:49:19,000
We take an arbitrary problem in
NP, and we reduce it to Tetris.

927
00:49:19,000 --> 00:49:22,090
Luckily, that's not as hard as
it sounds because it's already

928
00:49:22,090 --> 00:49:22,900
been done once.

929
00:49:22,900 --> 00:49:27,250
There is already a
reduction from NP to--

930
00:49:27,250 --> 00:49:31,570
from all problems in NP to
single problems out there,

931
00:49:31,570 --> 00:49:32,710
the NP-complete problems.

932
00:49:32,710 --> 00:49:35,110
There is some first
NP-complete problem, which

933
00:49:35,110 --> 00:49:38,240
I guess is the Turing machine.

934
00:49:38,240 --> 00:49:40,250
It's basically simulating
a lucky algorithm,

935
00:49:40,250 --> 00:49:42,558
so it's kind of a not
very interesting problem.

936
00:49:42,558 --> 00:49:44,350
But from that problem,
if you can reduce it

937
00:49:44,350 --> 00:49:46,142
to any other problem,
you know that problem

938
00:49:46,142 --> 00:49:48,050
is NP-hard as well.

939
00:49:48,050 --> 00:49:56,250
And so briefly, I want to show
you some examples of that here.

940
00:49:56,250 --> 00:49:59,090
So I want to start
out with a problem

941
00:49:59,090 --> 00:50:02,510
that I'm just going to
assume is NP-complete.

942
00:50:02,510 --> 00:50:05,510
And it's called 3-partition.

943
00:50:05,510 --> 00:50:08,600
One way to phrase it is, I
give you a bunch of integers--

944
00:50:08,600 --> 00:50:14,360
I think I have it written down
over here, also the board.

945
00:50:14,360 --> 00:50:17,870
I give you n integers, and
I'd like to divide them up

946
00:50:17,870 --> 00:50:22,310
into n over 3 groups of size 3,
such that each group of size 3

947
00:50:22,310 --> 00:50:24,740
has the same sum.
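To make the problem statement concrete, here is a sketch of a checker for a proposed 3-partition solution, in the sense of the certificates discussed earlier. The representation (groups as triples of indices into the input list) and the function name are assumptions for illustration.

```python
def verify_3_partition(numbers, groups):
    """Check a proposed 3-partition certificate in polynomial time.

    numbers: list of n integers.
    groups: list of triples of indices into numbers.
    """
    n = len(numbers)
    if n == 0 or n % 3 != 0 or len(groups) != n // 3:
        return False
    if any(len(g) != 3 for g in groups):
        return False
    # Every index must be used exactly once across all groups.
    if sorted(i for g in groups for i in g) != list(range(n)):
        return False
    total = sum(numbers)
    if total % (n // 3) != 0:
        return False
    target = total // (n // 3)  # the common sum each triple must reach
    return all(sum(numbers[i] for i in g) == target for g in groups)
```

Checking a certificate like this is fast. The hard part is finding the grouping in the first place.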

948
00:50:24,740 --> 00:50:27,668
And it's written
there on the board.

949
00:50:27,668 --> 00:50:29,960
So you can also think of this
as the following problem.

950
00:50:29,960 --> 00:50:33,560
I give you a bunch of rectangles
that are a side length--

951
00:50:33,560 --> 00:50:36,920
or a bunch of sticks, let's
say, of varying lengths,

952
00:50:36,920 --> 00:50:41,010
and I want to group them up
like on the right diagram,

953
00:50:41,010 --> 00:50:44,720
so in groups of 3, such that
the total length of each group

954
00:50:44,720 --> 00:50:46,970
is exactly the same.

955
00:50:46,970 --> 00:50:48,530
This is just a problem.

956
00:50:48,530 --> 00:50:51,350
And just believe for now
that it is NP-complete.

957
00:50:51,350 --> 00:50:52,490
I won't prove that.

958
00:50:52,490 --> 00:50:57,170
But what I'd like to show you
is a reduction from this problem

959
00:50:57,170 --> 00:50:59,000
to another problem--

960
00:50:59,000 --> 00:51:01,520
solving jigsaw puzzles.

961
00:51:01,520 --> 00:51:03,110
So you might think
jigsaw puzzles

962
00:51:03,110 --> 00:51:09,590
are really easy, and especially
easy if I lose the projector.

963
00:51:09,590 --> 00:51:11,480
But in fact, if you
have a jigsaw puzzle

964
00:51:11,480 --> 00:51:14,540
where some of the
matches are ambiguous,

965
00:51:14,540 --> 00:51:17,810
if there's multiple pieces that
could fit against a given tab

966
00:51:17,810 --> 00:51:22,250
or pocket, then I
claim I can represent

967
00:51:22,250 --> 00:51:26,030
this 3-partition problem
by building little sticks,

968
00:51:26,030 --> 00:51:27,360
like here.

969
00:51:27,360 --> 00:51:30,650
So if I want to represent
a stick of length ai,

970
00:51:30,650 --> 00:51:32,090
I'm just going to build an ai--

971
00:51:32,090 --> 00:51:33,920
I didn't mention
they're all integers,

972
00:51:33,920 --> 00:51:38,110
and they're
polynomial-sized integers.

973
00:51:38,110 --> 00:51:43,360
I'm going to represent that
by ai different pieces here.

974
00:51:43,360 --> 00:51:47,140
And the red tabs and
pockets are designed

975
00:51:47,140 --> 00:51:48,850
to be unique global
to the puzzle,

976
00:51:48,850 --> 00:51:50,680
like a regular jigsaw puzzle.

977
00:51:50,680 --> 00:51:56,320
Given this piece on the left
and this tab on the right,

978
00:51:56,320 --> 00:51:59,320
there's a unique pocket, there's
a piece with unique pocket

979
00:51:59,320 --> 00:52:01,250
that fits perfectly
into that piece.

980
00:52:01,250 --> 00:52:03,760
So this joining is forced.

981
00:52:03,760 --> 00:52:05,470
And also this joining is forced.

982
00:52:05,470 --> 00:52:07,510
But the blue tabs and
pockets are different.

983
00:52:07,510 --> 00:52:09,610
They're all the same.

984
00:52:09,610 --> 00:52:11,030
They're all identical.

985
00:52:11,030 --> 00:52:14,230
And so if I build
this frame using

986
00:52:14,230 --> 00:52:20,110
the red unique assignments,
and I build these rectangles,

987
00:52:20,110 --> 00:52:23,140
if I want to pack these
rectangles into this rectangle,

988
00:52:23,140 --> 00:52:26,110
that's exactly the 3-partition
problem, with some details

989
00:52:26,110 --> 00:52:27,400
that I didn't fill in.

990
00:52:27,400 --> 00:52:30,820
But it turns out you'd
be forced to group these

991
00:52:30,820 --> 00:52:35,461
into groups of size 3,
something like this,

992
00:52:35,461 --> 00:52:39,110
with varying lengths.

993
00:52:39,110 --> 00:52:40,860
OK, so that's an
example of a reduction.

994
00:52:40,860 --> 00:52:42,680
If you believe that
3-partition is NP-hard,

995
00:52:42,680 --> 00:52:45,770
this proves to you that
jigsaw puzzles are NP-hard,

996
00:52:45,770 --> 00:52:47,568
something you may
not have known.

997
00:52:47,568 --> 00:52:49,110
Every time you solve
a jigsaw puzzle,

998
00:52:49,110 --> 00:52:50,652
you can feel good
about yourself now,

999
00:52:50,652 --> 00:52:52,950
especially if it
has ambiguous mates.

1000
00:52:52,950 --> 00:52:54,890
Next is Tetris.

1001
00:52:54,890 --> 00:52:58,940
So here is a reduction from
the same 3-partition problem,

1002
00:52:58,940 --> 00:53:01,220
which is one of my favorite
problems, to Tetris.

1003
00:53:01,220 --> 00:53:03,740
It starts out with
this strange board.

1004
00:53:03,740 --> 00:53:10,230
It has a bunch of columns
here where I could put pieces.

1005
00:53:10,230 --> 00:53:13,310
So I'm not allowed to put
pieces in these dark regions.

1006
00:53:13,310 --> 00:53:16,670
They all have height
T. T is the target sum

1007
00:53:16,670 --> 00:53:18,770
that we want all
of the numbers to--

1008
00:53:18,770 --> 00:53:21,200
all of the triples of
numbers to add up to.

1009
00:53:21,200 --> 00:53:24,260
And there's n over
3 of these slots

1010
00:53:24,260 --> 00:53:26,730
where I can try to put pieces.

1011
00:53:26,730 --> 00:53:29,470
And it's-- because of this
thing over on the right,

1012
00:53:29,470 --> 00:53:32,940
there's no way to clear
lines in this game.

1013
00:53:32,940 --> 00:53:35,820
And now to represent
a single number ai,

1014
00:53:35,820 --> 00:53:38,250
I'm going to give you
this sequence of pieces,

1015
00:53:38,250 --> 00:53:40,380
which starts with an L piece.

1016
00:53:40,380 --> 00:53:43,410
And then it has ai
repetitions of this pattern,

1017
00:53:43,410 --> 00:53:45,780
and then it ends with
these two pieces.

1018
00:53:45,780 --> 00:53:49,500
And so what ends up
happening is that--

1019
00:53:49,500 --> 00:53:51,300
this is in the
intended solution--

1020
00:53:51,300 --> 00:53:56,220
you first place an L at the
bottom of one of these buckets,

1021
00:53:56,220 --> 00:53:58,950
and then you repeat this
pattern in this nice way.

1022
00:53:58,950 --> 00:54:03,990
And it fills up the ai,
roughly, height of this bucket.

1023
00:54:03,990 --> 00:54:08,260
And then at the end, you
have to put the I here.

1024
00:54:08,260 --> 00:54:10,750
And what this ends
up guaranteeing

1025
00:54:10,750 --> 00:54:14,290
is that all of these pieces
go into a single bucket.

1026
00:54:14,290 --> 00:54:14,950
You can check.

1027
00:54:14,950 --> 00:54:15,910
It's tedious.

1028
00:54:15,910 --> 00:54:18,700
But if you tried to put
some of these pieces

1029
00:54:18,700 --> 00:54:20,980
in one bucket and other
pieces in a different bucket,

1030
00:54:20,980 --> 00:54:23,862
you would lose some space,
and then you would die.

1031
00:54:23,862 --> 00:54:26,320
So if you want to survive, you
have to put all these pieces

1032
00:54:26,320 --> 00:54:26,995
into one bucket.

1033
00:54:26,995 --> 00:54:28,870
And so again, we're just
stacking rectangles.

1034
00:54:28,870 --> 00:54:31,303
We're putting a whole bunch
of rectangles in one pocket

1035
00:54:31,303 --> 00:54:33,220
and then a bunch of
rectangles in another pocket.

1036
00:54:33,220 --> 00:54:35,300
We can switch back and
forth however we want.

1037
00:54:35,300 --> 00:54:37,210
But the only way to
win, it turns out,

1038
00:54:37,210 --> 00:54:40,060
is if you get all
of those rectangles

1039
00:54:40,060 --> 00:54:42,040
to add up to exactly
the right height.

1040
00:54:42,040 --> 00:54:43,930
Then you get a
picture like this.

1041
00:54:43,930 --> 00:54:45,730
If you don't get a
picture like this,

1042
00:54:45,730 --> 00:54:47,710
you can prove you end up dying.

1043
00:54:47,710 --> 00:54:49,360
Then I'll give
you a bunch of Ls.

1044
00:54:49,360 --> 00:54:52,300
Then I'll finally give you this
T, which clears some lines.

1045
00:54:52,300 --> 00:54:53,327
And then I'll give you--

1046
00:54:53,327 --> 00:54:54,910
the most satisfying
Tetris game ever--

1047
00:54:54,910 --> 00:54:56,530
I'll give you a
ton of I's, and you

1048
00:54:56,530 --> 00:54:59,290
get Tetris, Tetris, Tetris,
and you clear the entire board.

1049
00:54:59,290 --> 00:55:01,883
And so if you can solve
the 3-partition problem

1050
00:55:01,883 --> 00:55:03,550
you can clear the
board and win the game

1051
00:55:03,550 --> 00:55:06,650
and be the best
Tetris player ever.

1052
00:55:06,650 --> 00:55:09,340
And if there is no
solution to 3-partition,

1053
00:55:09,340 --> 00:55:11,540
you're guaranteed to lose.

1054
00:55:11,540 --> 00:55:14,384
And so this proves
Tetris is NP-hard.

1055
00:55:14,384 --> 00:55:16,120
Cool.

1056
00:55:16,120 --> 00:55:20,040
So what else do I
want to say, briefly?

1057
00:55:20,040 --> 00:55:24,230
I think that's the main idea.

1058
00:55:24,230 --> 00:55:27,490
So another example--
so this spot

1059
00:55:27,490 --> 00:55:29,380
is called EXP-completeness.

1060
00:55:35,440 --> 00:55:38,800
And this includes problems
such as n by n chess.

1061
00:55:38,800 --> 00:55:42,023
So we know that chess
requires exponential time

1062
00:55:42,023 --> 00:55:43,690
because, in fact,
it's among the hardest

1063
00:55:43,690 --> 00:55:45,340
problems in exponential time.

1064
00:55:45,340 --> 00:55:47,605
But most common are the--

1065
00:55:47,605 --> 00:55:51,390
that's somehow because of the
two-player nature of the game.

1066
00:55:51,390 --> 00:55:53,610
Most common are
NP-complete problems.

1067
00:55:53,610 --> 00:55:56,760
And we have a bunch of
example NP-complete problems

1068
00:55:56,760 --> 00:55:59,170
I'll just briefly mention here.

1069
00:55:59,170 --> 00:56:01,920
So we saw the
subset sum problem,

1070
00:56:01,920 --> 00:56:04,690
which we had a polynomial
time algorithm for--

1071
00:56:04,690 --> 00:56:07,600
sorry, a pseudo polynomial
time algorithm for last class--

1072
00:56:07,600 --> 00:56:09,660
in fact has no polynomial
time algorithm,

1073
00:56:09,660 --> 00:56:11,250
assuming P does not equal NP.

1074
00:56:11,250 --> 00:56:15,000
So pseudo poly is the best you
can hope for for subset sum.
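As a reminder of what pseudopolynomial means here, a minimal sketch of the standard subset sum dynamic program, assuming nonnegative integer inputs and a nonnegative target. The running time is proportional to n times the target value, which is exponential in the number of bits needed to write the target down.

```python
def subset_sum(numbers, target):
    """Return True if some subset of numbers sums exactly to target."""
    # reachable[s] is True when some subset of the numbers seen so far sums to s.
    reachable = [True] + [False] * target
    for a in numbers:
        # Go downward so each number is used at most once.
        for s in range(target, a - 1, -1):
            if reachable[s - a]:
                reachable[s] = True
    return reachable[target]
```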

1075
00:56:15,000 --> 00:56:17,310
There's a related notion
called weak NP-hardness,

1076
00:56:17,310 --> 00:56:19,140
which I won't get into here.

1077
00:56:19,140 --> 00:56:20,460
3-partition is one we saw.

1078
00:56:20,460 --> 00:56:23,080
We saw some reductions
to other problems.

1079
00:56:23,080 --> 00:56:24,930
So these are all NP-complete.

1080
00:56:24,930 --> 00:56:27,450
Longest common subsequence is
another dynamic programming

1081
00:56:27,450 --> 00:56:28,933
problem we saw
with two sequences.

1082
00:56:28,933 --> 00:56:30,600
I mentioned you could
solve it for three

1083
00:56:30,600 --> 00:56:32,320
or four or any constant number.

1084
00:56:32,320 --> 00:56:34,620
But if I give you n
sequences each of length n,

1085
00:56:34,620 --> 00:56:37,830
that problem is
NP-hard, NP-complete.

1086
00:56:37,830 --> 00:56:39,958
Longest simple path
in a graph-- we

1087
00:56:39,958 --> 00:56:41,250
know how to solve longest path.

1088
00:56:41,250 --> 00:56:43,830
You just solve shortest
path with negated weights.

1089
00:56:43,830 --> 00:56:46,500
But longest simple path, where
you don't repeat vertices,

1090
00:56:46,500 --> 00:56:47,820
that's NP-complete.

1091
00:56:47,820 --> 00:56:51,270
Relatedly, one of the most
famous NP-complete problems

1092
00:56:51,270 --> 00:56:53,640
is traveling salesman
problem, finding

1093
00:56:53,640 --> 00:56:58,330
the shortest path that visits
all vertices in a given graph.

1094
00:56:58,330 --> 00:57:00,090
So instead of just
going from A to B,

1095
00:57:00,090 --> 00:57:03,420
I want to visit all the
vertices in the graph.

1096
00:57:03,420 --> 00:57:05,010
A lot of these
problems I'm phrasing

1097
00:57:05,010 --> 00:57:07,140
as optimization problems.

1098
00:57:07,140 --> 00:57:08,815
But when I say
NP-complete, I actually

1099
00:57:08,815 --> 00:57:10,440
mean a decision
version of the problem.

1100
00:57:10,440 --> 00:57:12,690
For example, with this
one, the decision question

1101
00:57:12,690 --> 00:57:15,450
is, is the shortest path that
visits all vertices in a graph

1102
00:57:15,450 --> 00:57:17,940
less than or equal
to a given value x.

1103
00:57:17,940 --> 00:57:19,830
If you can solve this,
then by binary search

1104
00:57:19,830 --> 00:57:22,500
you can solve the
overall weight.
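The binary search step can be sketched like this, assuming integer weights and a hypothetical decision procedure `decide(x)` that answers "is there a solution of weight at most x?". It needs only about log(hi - lo) calls to the decision procedure.

```python
def min_value_via_decision(decide, lo, hi):
    """Find the smallest integer x in [lo, hi] for which decide(x) is True.

    Assumes decide is monotone (False up to some threshold, True from
    there on) and that decide(hi) is True.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid  # a solution of weight <= mid exists, so look lower
        else:
            lo = mid + 1  # no solution that cheap, so look higher
    return lo
```

For instance, if the true optimum were 37, calling this with `decide = lambda x: x >= 37` over the range 0 to 100 would recover 37.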

1105
00:57:22,500 --> 00:57:25,230
3-coloring a graph is hard,
even though 2-coloring a graph

1106
00:57:25,230 --> 00:57:26,130
is polynomial.

1107
00:57:26,130 --> 00:57:27,720
3-coloring is NP-complete.

1108
00:57:27,720 --> 00:57:30,150
Assigning three
colors to the vertices

1109
00:57:30,150 --> 00:57:32,460
so that no adjacent vertices
have the same color,

1110
00:57:32,460 --> 00:57:34,750
finding the largest clique
in a given graph, which

1111
00:57:34,750 --> 00:57:38,040
would be useful for analyzing
social networks, whatever.

1112
00:57:38,040 --> 00:57:40,170
This is a fun one
for me as a geometer.

1113
00:57:40,170 --> 00:57:43,200
If you're in a three-dimensional
world, which I am,

1114
00:57:43,200 --> 00:57:47,850
and I want to find the shortest
path from here to there that

1115
00:57:47,850 --> 00:57:49,680
doesn't collide
with any obstacles,

1116
00:57:49,680 --> 00:57:53,250
like this desk and all the
chairs and so on, in 3D,

1117
00:57:53,250 --> 00:57:53,970
this problem--

1118
00:57:53,970 --> 00:57:56,400
if you can fly, so if
you're a drone flying

1119
00:57:56,400 --> 00:57:58,170
among all these
obstacles, you want

1120
00:57:58,170 --> 00:58:02,025
to find the shortest path from
A to B, this is NP-complete.

1121
00:58:02,025 --> 00:58:02,970
It's quite surprising.

1122
00:58:02,970 --> 00:58:04,560
In two dimensions,
it's polynomial.

1123
00:58:04,560 --> 00:58:06,400
You can reduce it to
graph shortest paths.

1124
00:58:06,400 --> 00:58:08,100
But in 3D, it's NP-hard.

1125
00:58:08,100 --> 00:58:10,350
This is a formula problem
that comes up a lot.

1126
00:58:10,350 --> 00:58:12,240
Given a Boolean formula
with AND, OR, and NOT,

1127
00:58:12,240 --> 00:58:14,940
can you ever make it true,
if it has some variables that

1128
00:58:14,940 --> 00:58:15,930
are not assigned?
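A brute-force sketch of this formula problem, assuming the formula is supplied as a Python function of its Boolean variables (a hypothetical representation chosen for brevity). It tries all 2^n assignments, which is the easy exponential-time algorithm. Whether a polynomial-time one exists is exactly the P versus NP question.

```python
from itertools import product


def satisfiable(formula, num_vars):
    """Return a satisfying assignment as a tuple of booleans, or None."""
    for assignment in product([False, True], repeat=num_vars):
        if formula(*assignment):
            return assignment
    return None
```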

1129
00:58:15,930 --> 00:58:18,810
And some more fun examples
are Minesweeper or Sudoku.

1130
00:58:18,810 --> 00:58:21,443
Basically any paper and pencil
puzzle you've ever played,

1131
00:58:21,443 --> 00:58:22,860
there's probably
a paper out there

1132
00:58:22,860 --> 00:58:24,390
proving that it's NP-complete.

1133
00:58:24,390 --> 00:58:27,240
And on the video game
side, Super Mario Brothers

1134
00:58:27,240 --> 00:58:28,180
is NP-hard.

1135
00:58:28,180 --> 00:58:29,460
Legend of Zelda is NP-hard.

1136
00:58:29,460 --> 00:58:30,780
Pokemon is NP-hard.

1137
00:58:30,780 --> 00:58:33,960
These problems are actually all
a little bit harder than NP,

1138
00:58:33,960 --> 00:58:37,540
in a different class called
PSPACE, which I won't go into.

1139
00:58:37,540 --> 00:58:39,630
But if you're interested
in this stuff,

1140
00:58:39,630 --> 00:58:42,690
there is a whole class devoted
to it, which has online video

1141
00:58:42,690 --> 00:58:45,900
lectures, so you can watch
them whenever you want,

1142
00:58:45,900 --> 00:58:50,640
called 6.892, that gives
a bunch of especially fun

1143
00:58:50,640 --> 00:58:53,850
examples of NP-hardness
and other types of hardness

1144
00:58:53,850 --> 00:58:56,880
proofs from a sort of
algorithm perspective for lots

1145
00:58:56,880 --> 00:59:01,200
of games and puzzles
you might care about.

1146
00:59:01,200 --> 00:59:03,280
And that's it.