WEBVTT
00:00:00.040 --> 00:00:02.480
The following content is
provided under a Creative
00:00:02.480 --> 00:00:04.010
Commons license.
00:00:04.010 --> 00:00:06.340
Your support will help
MIT OpenCourseWare
00:00:06.340 --> 00:00:10.690
continue to offer high quality
educational resources for free.
00:00:10.690 --> 00:00:13.320
To make a donation or
view additional materials
00:00:13.320 --> 00:00:17.035
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:17.035 --> 00:00:17.660
at ocw.mit.edu.
00:00:26.350 --> 00:00:32.340
ERIC DEMAINE: All right,
today we do NP-completeness,
00:00:32.340 --> 00:00:35.420
an entire field in one lecture.
00:00:35.420 --> 00:00:36.010
Should be fun.
00:00:36.010 --> 00:00:38.590
I actually taught an entire
class about this topic
00:00:38.590 --> 00:00:42.620
last semester, but now we're
going to do it in 80 minutes.
00:00:42.620 --> 00:00:45.610
And we're going to look at
lots of different problems,
00:00:45.610 --> 00:00:48.440
from Super Mario Brothers
to jigsaw puzzles,
00:00:48.440 --> 00:00:51.000
and show that
they're NP-complete.
00:00:51.000 --> 00:00:51.930
This is a fun area.
00:00:51.930 --> 00:00:54.499
As Srini mentioned last class,
it's all about reductions.
00:00:54.499 --> 00:00:56.040
It's all about
converting one problem
00:00:56.040 --> 00:00:59.380
into another, which is a fun
kind of puzzle in itself.
00:00:59.380 --> 00:01:01.170
It's an algorithmic challenge.
00:01:01.170 --> 00:01:03.410
And we're going to do it a lot.
00:01:03.410 --> 00:01:06.600
But first I'm going to remind
you of some of the things you
00:01:06.600 --> 00:01:10.010
learned from 006, and tell you
what we need to do in order
00:01:10.010 --> 00:01:14.830
to prove all of these relations,
what exactly we need to show
00:01:14.830 --> 00:01:18.750
for each of those arrows,
and why it's interesting.
00:01:18.750 --> 00:01:23.490
So this is generally around
the P versus NP problem.
00:01:23.490 --> 00:01:26.630
So remember, P is all
the problems we know how
00:01:26.630 --> 00:01:27.932
to solve in polynomial time.
00:01:27.932 --> 00:01:30.140
Well not just the ones we
know how to solve, but also
00:01:30.140 --> 00:01:36.430
the ones that can
be solved, which
00:01:36.430 --> 00:01:44.490
is pretty much-- which is the
topic of 6.006, and 6.046 up
00:01:44.490 --> 00:01:45.250
till now.
00:01:45.250 --> 00:01:47.210
But for now, in the
next few lectures,
00:01:47.210 --> 00:01:49.480
we'll be talking about
problems that are probably not
00:01:49.480 --> 00:01:52.940
polynomially solvable,
and what to do about them.
00:01:52.940 --> 00:01:57.590
Polynomial, as you know, is
like n to some constant.
00:01:57.590 --> 00:01:59.350
Polynomial good, exponential bad.
00:02:03.490 --> 00:02:04.300
What is n?
00:02:04.300 --> 00:02:08.630
I guess n is the size
of the problem, which
00:02:08.630 --> 00:02:11.150
we'll have to be a little
bit careful about today.
00:02:11.150 --> 00:02:17.240
And then NP does not mean
problems not solvable
00:02:17.240 --> 00:02:19.480
in polynomial time;
it means problems solvable
00:02:19.480 --> 00:02:21.800
in nondeterministic
polynomial time.
00:02:24.720 --> 00:02:27.060
And in this case
we need to focus
00:02:27.060 --> 00:02:32.010
on a particular type of problem,
which is decision problems.
00:02:32.010 --> 00:02:35.520
Decision just means that the
answer is either yes or no.
00:02:35.520 --> 00:02:37.450
So it's a single bit answer.
00:02:43.130 --> 00:02:44.935
We will see why we
need to restrict
00:02:44.935 --> 00:02:46.435
to that kind of
problem in a moment.
00:03:11.030 --> 00:03:14.370
So this is problems you can
solve in polynomial time.
00:03:14.370 --> 00:03:16.960
Same notion of polynomials,
same notion of n,
00:03:16.960 --> 00:03:19.640
but in a totally unrealistic
model of computation.
00:03:19.640 --> 00:03:22.080
Which is a
nondeterministic model.
00:03:22.080 --> 00:03:24.060
In a nondeterministic
model, what
00:03:24.060 --> 00:03:29.150
you can do is say
instead of computing
00:03:29.150 --> 00:03:33.010
something from something you
know, you could make a guess.
00:03:33.010 --> 00:03:58.840
So you can guess one out of
polynomially many options
00:03:58.840 --> 00:04:02.090
in constant time.
00:04:02.090 --> 00:04:04.680
So normally a constant time
operation, in regular models,
00:04:04.680 --> 00:04:08.230
like you add two numbers, or you
do an if, that sort of thing.
00:04:08.230 --> 00:04:10.870
Here we can make a guess.
00:04:10.870 --> 00:04:15.290
I give the computer polynomially
many options I'm interested in.
00:04:15.290 --> 00:04:17.050
Computer's going to
give me one of them.
00:04:17.050 --> 00:04:20.570
It's going to give
me a good guess.
00:04:20.570 --> 00:04:21.959
Guess is guaranteed to be good.
00:04:21.959 --> 00:04:24.290
And good means here
that I want to get
00:04:24.290 --> 00:04:27.110
to a yes answer if I can.
00:04:27.110 --> 00:04:35.630
So the formal statement
is, if any guess
00:04:35.630 --> 00:04:46.960
would lead to a yes answer,
then we get such a guess.
00:04:51.096 --> 00:04:52.580
OK, this is weird.
00:04:52.580 --> 00:04:53.610
And it's asymmetric.
00:04:53.610 --> 00:04:55.060
It's biased towards yes.
00:04:55.060 --> 00:04:58.010
And this is why we can only
think about decision problems,
00:04:58.010 --> 00:04:59.090
yes or no.
00:04:59.090 --> 00:05:00.360
You could bias towards no.
00:05:00.360 --> 00:05:02.060
You get something
else called coNP.
00:05:02.060 --> 00:05:04.560
But we'll focus here just on NP.
00:05:04.560 --> 00:05:08.820
So the idea is I'd really
like to find a guess that
00:05:08.820 --> 00:05:09.980
leads to a yes answer.
00:05:09.980 --> 00:05:12.670
And the machine magically
gives me one if there is one.
00:05:12.670 --> 00:05:15.170
Which means if I end
up saying no, that
00:05:15.170 --> 00:05:18.790
means there was absolutely no
path that would lead to a yes.
00:05:18.790 --> 00:05:21.300
So when you get a no, you
get a lot of information.
00:05:21.300 --> 00:05:23.250
When you get a yes, you
get some information.
00:05:23.250 --> 00:05:24.249
But hey, you were lucky.
00:05:24.249 --> 00:05:25.700
Hard to complain.
00:05:25.700 --> 00:05:30.340
So in 006, I often call this
the lucky model of computation.
00:05:30.340 --> 00:05:31.960
That's the informal version.
00:05:31.960 --> 00:05:35.740
But nondeterminism is
what's really going on here.
00:05:35.740 --> 00:05:41.820
So maybe it's useful
to get an example.
00:05:41.820 --> 00:05:47.240
So here's a problem we'll--
this is sort of the granddaddy
00:05:47.240 --> 00:05:49.080
of all NP-complete problems.
00:05:49.080 --> 00:05:51.560
We'll get to
completeness in a moment.
00:05:51.560 --> 00:06:01.855
3SAT-- SAT stands
for satisfiability.
00:06:10.030 --> 00:06:12.590
So in 3SAT, the
input to the problem
00:06:12.590 --> 00:06:14.470
looks something like this.
00:06:14.470 --> 00:06:16.016
I'm just going to
give an example.
00:06:26.930 --> 00:06:30.870
And in case you've forgotten
your weird logic notation,
00:06:30.870 --> 00:06:33.520
this is an and.
00:06:33.520 --> 00:06:36.580
These are ORs.
00:06:36.580 --> 00:06:42.680
And I'm using this
for negation, not.
00:06:42.680 --> 00:06:48.050
So in other words, I'm given
a formula which is an AND of ORs.
00:06:48.050 --> 00:06:52.340
And each OR clause only
has three things in it.
00:06:52.340 --> 00:06:53.715
These things are
called literals.
00:07:00.300 --> 00:07:03.990
And a literal is either
a variable x sub i,
00:07:03.990 --> 00:07:08.020
or it's the negation of
a variable, not x sub i.
00:07:08.020 --> 00:07:09.500
So this is a typical example.
00:07:09.500 --> 00:07:11.380
You could have no negations.
00:07:11.380 --> 00:07:13.510
You could here have one
negation, two negations,
00:07:13.510 --> 00:07:15.950
any number of
negations per clause.
00:07:15.950 --> 00:07:19.580
These groups of three--
these ORs of three
00:07:19.580 --> 00:07:23.900
things, three literals,
are called clauses.
00:07:23.900 --> 00:07:26.960
And they're all ANDed together.
00:07:26.960 --> 00:07:30.100
And my goal is, this should
be a decision question,
00:07:30.100 --> 00:07:31.960
so I have a yes or no question.
00:07:31.960 --> 00:07:44.030
And that question is, can
you set the variables--
00:07:44.030 --> 00:07:52.010
x1, x2, and so on-- to true or false?
00:07:52.010 --> 00:07:56.020
So each variable I get to choose
a true or false designation
00:07:56.020 --> 00:07:57.860
such that the formula
comes out true.
00:08:04.870 --> 00:08:07.390
I use T and F for
true and false.
00:08:07.390 --> 00:08:09.650
So I want to set
these variables such
00:08:09.650 --> 00:08:12.230
that every clause comes
out true, because they're
00:08:12.230 --> 00:08:12.970
ANDed together.
00:08:12.970 --> 00:08:15.910
So I have to satisfy this
clause in one of three ways.
00:08:15.910 --> 00:08:17.700
Maybe I satisfy
it all three ways.
00:08:17.700 --> 00:08:19.700
Doesn't matter, as long
as at least one of these
00:08:19.700 --> 00:08:22.490
should be true, and at least
one of these should be true,
00:08:22.490 --> 00:08:25.940
and at least one of each
clause should be true.
00:08:25.940 --> 00:08:28.300
So that's the 3SAT problem.
00:08:28.300 --> 00:08:29.421
This is a hard problem.
00:08:29.421 --> 00:08:31.170
We don't know a
polynomial time algorithm.
00:08:31.170 --> 00:08:32.520
There probably isn't one.
00:08:32.520 --> 00:08:37.669
But there is a polynomial time
nondeterministic algorithm.
00:08:37.669 --> 00:08:54.140
So this problem is in NP
because if I have lucky guesses,
00:08:54.140 --> 00:08:56.460
it's kind of designed to
solve this kind of problem.
00:08:56.460 --> 00:09:03.730
What I'm going to do is guess
whether x1 is true or false.
00:09:03.730 --> 00:09:05.490
So I have two choices.
00:09:05.490 --> 00:09:08.240
And I'm going to ask my machine
to make the right choice,
00:09:08.240 --> 00:09:10.120
whether it should
be true or false.
00:09:10.120 --> 00:09:13.760
Then I'll guess x2.
00:09:13.760 --> 00:09:17.480
Each of these guess operations
takes constant time.
00:09:17.480 --> 00:09:19.250
So I do it for every variable.
00:09:19.250 --> 00:09:21.000
And then I'm going
to check whether I
00:09:21.000 --> 00:09:22.670
happen to satisfy the formula.
00:09:28.960 --> 00:09:33.140
And if it comes out true,
then I'll return yes.
00:09:33.140 --> 00:09:35.267
And if it comes out
false, I'll return no.
00:09:43.720 --> 00:09:47.720
And because NP is biased
towards yes answers,
00:09:47.720 --> 00:09:50.824
it always finds a yes
answer if you can.
00:09:50.824 --> 00:09:53.560
If there's some way to
satisfy the formula,
00:09:53.560 --> 00:09:55.160
then I will get it.
00:09:55.160 --> 00:09:57.550
If there's some way to make
the formula come out true,
00:09:57.550 --> 00:09:59.890
then this algorithm
will return yes.
00:09:59.890 --> 00:10:03.930
If there's no way
to satisfy it, then
00:10:03.930 --> 00:10:05.996
this nondeterministic
algorithm will return no.
00:10:05.996 --> 00:10:07.370
That's just the
definition of how
00:10:07.370 --> 00:10:09.340
nondeterministic machines work.
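The guess-then-check algorithm just described can be imitated on an ordinary machine, at exponential cost, by trying every possible sequence of guesses. The sketch below is only an illustration under assumptions not in the lecture: a clause is a tuple of three integers, where +i stands for x_i and -i for NOT x_i, and the function names and example formulas are made up for this sketch.

```python
from itertools import product


def satisfies(formula, assignment):
    """Deterministic check: every clause must contain at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_3sat(formula, num_vars):
    """Answer yes (True) if any sequence of true/false guesses satisfies the formula.

    A nondeterministic machine would be handed a good guess directly; here we
    enumerate all 2**num_vars possibilities, which takes exponential time.
    """
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        if satisfies(formula, assignment):
            return True
    # No guess leads to yes, so the answer is no.
    return False


# Hypothetical inputs, not the formula written on the board.
satisfiable = [(1, 2, -3), (-1, 3, 3), (-2, -3, 1)]
unsatisfiable = [(1, 1, 1), (-1, -1, -1)]  # x1 AND NOT x1

print(brute_force_3sat(satisfiable, 3))
print(brute_force_3sat(unsatisfiable, 1))
```

The asymmetry from the lecture shows up in the loop: a yes can stop at the first good guess, while a no requires that every guess has failed.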
00:10:09.340 --> 00:10:10.410
It's a little weird.
00:10:10.410 --> 00:10:13.670
But you can see from
this kind of prototype
00:10:13.670 --> 00:10:17.800
of a nondeterministic algorithm,
you can actually always
00:10:17.800 --> 00:10:20.920
arrange for your guessing
to be at the beginning.
00:10:20.920 --> 00:10:25.830
And then you do some regular
polynomial time checking
00:10:25.830 --> 00:10:29.010
or deterministic checking.
00:10:29.010 --> 00:10:30.440
So when you rewrite
your algorithm
00:10:30.440 --> 00:10:32.820
like this with guesses up
front and then checking,
00:10:32.820 --> 00:10:36.360
you can also think of it as
a verification algorithm.
00:10:36.360 --> 00:10:41.050
So you can say, your friend
claims that this 3SAT formula
00:10:41.050 --> 00:10:43.420
is satisfiable,
meaning there's a way
00:10:43.420 --> 00:10:46.140
to set the variables so
that it comes out true.
00:10:46.140 --> 00:10:48.264
So this is called a
satisfying assignment.
00:10:50.990 --> 00:10:52.627
Satisfying just means make true.
00:10:56.830 --> 00:10:59.355
And you're like, no,
I don't believe you.
00:10:59.355 --> 00:11:01.940
And your friend says no,
no, no, really, it's true.
00:11:01.940 --> 00:11:03.460
And here's how I can prove it.
00:11:03.460 --> 00:11:04.720
You set x1 to false.
00:11:04.720 --> 00:11:05.725
You set x2 to true.
00:11:05.725 --> 00:11:09.260
You set x3-- basically
they give you the guesses.
00:11:09.260 --> 00:11:11.605
And then you don't
have to be convinced
00:11:11.605 --> 00:11:12.980
that those are
the right guesses,
00:11:12.980 --> 00:11:14.646
you can check that
it's the right guess.
00:11:14.646 --> 00:11:17.090
You can compute this
formula in linear time,
00:11:17.090 --> 00:11:18.510
see what the outcome is.
00:11:18.510 --> 00:11:20.420
If someone tells you
what the xi's are,
00:11:20.420 --> 00:11:21.980
you can very quickly
see whether that
00:11:21.980 --> 00:11:24.310
was a satisfying assignment.
00:11:24.310 --> 00:11:26.440
So you could call
this a solution,
00:11:26.440 --> 00:11:28.650
and then there's a
polynomial time verification
00:11:28.650 --> 00:11:32.700
algorithm that checks
that solutions are valid.
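A verifier of this kind might look like the following minimal sketch. The encoding is an assumption for illustration (a clause is a tuple of integers, +i for x_i and -i for NOT x_i), and the name verify_3sat and the sample values are hypothetical.

```python
def verify_3sat(formula, certificate):
    """Return True only if the claimed assignment makes every clause true.

    `certificate` maps variable index -> bool. The check is a single pass over
    the clauses, so it runs in time linear in the size of the formula.
    A variable missing from the certificate never satisfies a literal.
    """
    for clause in formula:
        if not any(certificate.get(abs(lit)) == (lit > 0) for lit in clause):
            return False  # this clause has no true literal
    return True


# Hypothetical formula: (x1 or x2 or not x3) and (not x1 or x3 or x3)
#                       and (not x2 or not x3 or x1)
formula = [(1, 2, -3), (-1, 3, 3), (-2, -3, 1)]

friend_claim = {1: True, 2: False, 3: True}
bad_claim = {1: False, 2: False, 3: True}

print(verify_3sat(formula, friend_claim))
print(verify_3sat(formula, bad_claim))
```

A rejected certificate says nothing about whether the formula is satisfiable. It only says that this particular claim does not prove it.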
00:11:32.700 --> 00:11:36.170
But, you can only do
that for yes answers.
00:11:36.170 --> 00:11:39.510
Your friend says no,
this is not satisfiable,
00:11:39.510 --> 00:11:42.270
they have no way of
proving it to you.
00:11:42.270 --> 00:11:46.450
I mean, other than checking
all the assignments separately,
00:11:46.450 --> 00:11:48.290
which would take
exponential time,
00:11:48.290 --> 00:11:51.140
there's no easy way to confirm
that the answer to this problem
00:11:51.140 --> 00:11:51.670
is no.
00:11:51.670 --> 00:11:53.470
But there is an
easy way to check
00:11:53.470 --> 00:11:55.830
that the answer is yes, namely
I give you the satisfying
00:11:55.830 --> 00:11:57.600
assignment.
00:11:57.600 --> 00:12:01.130
So this definition of NP
is what I'll stick to.
00:12:01.130 --> 00:12:05.100
It's this sort of-- I
like guessing because it's
00:12:05.100 --> 00:12:06.500
like dynamic programming.
00:12:06.500 --> 00:12:09.329
With dynamic programming
we also guess,
00:12:09.329 --> 00:12:11.620
and guessing actually originally
comes from this world,
00:12:11.620 --> 00:12:13.390
nondeterminism.
00:12:13.390 --> 00:12:16.790
In dynamic programming, we
don't allow this kind of model.
00:12:16.790 --> 00:12:19.030
And so we have to check
the guesses separately.
00:12:19.030 --> 00:12:20.790
And so we spend lots of time.
00:12:20.790 --> 00:12:22.680
Here, magically, you
always get the right
00:12:22.680 --> 00:12:24.320
guess in only constant time.
00:12:24.320 --> 00:12:26.070
So this is a much
more powerful model.
00:12:26.070 --> 00:12:30.251
Of course there's no computers
that work like this, sadly,
00:12:30.251 --> 00:12:32.280
or I guess more interestingly.
00:12:32.280 --> 00:12:36.870
So this is more about confirming
that your problem is not
00:12:36.870 --> 00:12:38.830
totally impossible.
00:12:38.830 --> 00:12:42.560
At least you can check the
answers in polynomial time.
00:12:42.560 --> 00:12:46.830
So that's one thing.
00:13:31.110 --> 00:13:33.210
So this is an equivalent
definition of NP
00:13:33.210 --> 00:13:35.430
because you can take a
nondeterministic algorithm
00:13:35.430 --> 00:13:37.300
and put the guessing up top.
00:13:37.300 --> 00:13:39.200
You can call the
results of those guesses
00:13:39.200 --> 00:13:42.060
a certificate that
an answer is yes.
00:13:42.060 --> 00:13:45.940
And then you have a regular old
deterministic polynomial time
00:13:45.940 --> 00:13:48.430
algorithm that, given
that certificate,
00:13:48.430 --> 00:13:54.516
will verify that it actually
proves that the answer is yes.
00:13:57.820 --> 00:14:00.730
It's just that certificate
has to be polynomial size.
00:14:00.730 --> 00:14:02.960
You can't guess something
of exponential size.
00:14:02.960 --> 00:14:07.010
You can only guess something of
polynomial size in this model.
00:14:07.010 --> 00:14:10.410
So seems a little weird.
00:14:10.410 --> 00:14:16.610
But we'll see why this is
useful in a little bit.
00:14:16.610 --> 00:14:18.795
So let me go to NP-completeness.
00:14:24.740 --> 00:14:37.320
So if I have a problem X,
it's NP-complete if X is in NP
00:14:37.320 --> 00:14:38.830
and X is NP-hard.
00:14:41.760 --> 00:14:44.130
But I haven't told
you what NP-hard is.
00:14:44.130 --> 00:15:24.860
Maybe you remember from
006, but let me remind you.
00:15:24.860 --> 00:15:28.840
So, I need to define reduce.
00:15:28.840 --> 00:15:30.684
So maybe I'll do
that as well, then
00:15:30.684 --> 00:15:31.850
we can talk about all these.
00:16:26.130 --> 00:16:29.110
OK, a lot of definitions.
00:16:29.110 --> 00:16:32.970
But the idea of
NP-hardness is very simple.
00:16:32.970 --> 00:16:35.140
If problem X is
NP-hard, it means
00:16:35.140 --> 00:16:39.890
that it's at least
as hard as-- sorry,
00:16:39.890 --> 00:16:46.280
that is a Y-- it's at least
as hard as all problems in NP.
00:16:46.280 --> 00:16:48.140
Intuitively, NP-hard means
it's at least as
00:16:48.140 --> 00:16:50.280
hard as everything in NP.
00:16:50.280 --> 00:16:53.310
Whereas being in NP is
a positive statement.
00:16:53.310 --> 00:16:55.240
That says it's not
too hard, at least
00:16:55.240 --> 00:16:57.760
there's a polynomial time
verification algorithm.
00:16:57.760 --> 00:17:00.060
So being in NP is good news.
00:17:00.060 --> 00:17:02.650
It says you're no
harder than NP.
00:17:02.650 --> 00:17:05.920
NP-hard says you're at least
as hard as everything in NP.
00:17:05.920 --> 00:17:08.230
And so NP-complete
is a nice answer
00:17:08.230 --> 00:17:11.579
because this says you're exactly
as hard as everything in NP--
00:17:11.579 --> 00:17:13.849
no harder, no easier.
00:17:13.849 --> 00:17:17.099
If you draw, in
this vague sense,
00:17:17.099 --> 00:17:23.589
computational difficulty
on one axis-- which is not
00:17:23.589 --> 00:17:26.380
really accurate, but I
like to do it anyway--
00:17:26.380 --> 00:17:31.320
and you have P is all of
these easy problems down here.
00:17:31.320 --> 00:17:35.460
And NP is some
larger set like this.
00:17:35.460 --> 00:17:38.139
NP-hard is from here over.
00:17:42.540 --> 00:17:45.040
And this point right
here is NP-complete.
00:17:49.500 --> 00:17:53.350
Being in NP means you're left
of this line, or on the line.
00:17:53.350 --> 00:17:55.600
And being NP-hard means
you're right of this line,
00:17:55.600 --> 00:17:56.480
or on the line.
00:17:56.480 --> 00:17:58.610
NP-complete means
you're right there.
00:17:58.610 --> 00:18:02.510
So that's a very definitive
sense of hardness.
00:18:02.510 --> 00:18:04.690
Now there is this
slight catch, which
00:18:04.690 --> 00:18:06.550
is we don't know
whether P equals NP.
00:18:06.550 --> 00:18:11.450
So maybe this is the same
as this, but probably not.
00:18:11.450 --> 00:18:14.530
Unless you believe
in luck, basically,
00:18:14.530 --> 00:18:16.400
unless you imagine
that a computer could
00:18:16.400 --> 00:18:19.830
engineer luck and always
guess the right things
00:18:19.830 --> 00:18:23.630
without spending a lot of
time, then P does not equal NP.
00:18:23.630 --> 00:18:26.680
And in that world, what
we get is that if you have
00:18:26.680 --> 00:18:29.650
an NP-complete problem, or
actually any NP-hard problem,
00:18:29.650 --> 00:18:33.590
you know it cannot be in P.
00:18:33.590 --> 00:18:38.240
So if you have
that X is NP-hard,
00:18:38.240 --> 00:18:44.405
then you know that X is
not in P unless all of NP
00:18:44.405 --> 00:18:47.690
is in P. So unless P equals NP.
00:18:47.690 --> 00:18:51.040
And most reasonable people
do not believe this.
00:18:51.040 --> 00:18:53.160
And so instead they
have to believe this,
00:18:53.160 --> 00:18:57.580
that your problem is not
polynomially solvable.
00:18:57.580 --> 00:18:59.740
So why is this true?
00:18:59.740 --> 00:19:01.680
Because if your
problem is NP-hard,
00:19:01.680 --> 00:19:05.880
it is at least as hard
as every problem in NP.
00:19:05.880 --> 00:19:08.449
And if you believe that
there is some problem in NP--
00:19:08.449 --> 00:19:09.990
we don't necessarily
know which one--
00:19:09.990 --> 00:19:14.280
but if there is any problem out
there in NP that is not in P,
00:19:14.280 --> 00:19:18.710
then X has to be at
least as hard as it.
00:19:18.710 --> 00:19:22.730
So it also requires
nonpolynomial time, something
00:19:22.730 --> 00:19:25.060
larger than polynomial time.
00:19:25.060 --> 00:19:27.090
What does at least
as hard mean though?
00:19:27.090 --> 00:19:29.690
We're going to define it
in terms of reductions.
00:19:29.690 --> 00:19:31.530
Reduction from one
problem to another
00:19:31.530 --> 00:19:33.910
is just a polynomial
time algorithm,
00:19:33.910 --> 00:19:36.130
regular deterministic
polynomial time,
00:19:36.130 --> 00:19:38.440
that converts an
input to the problem A
00:19:38.440 --> 00:19:40.980
into an equivalent
input to problem
00:19:40.980 --> 00:19:45.554
B. Equivalent means that it
has the same yes or no answer.
00:19:48.105 --> 00:19:50.480
And we'll just be thinking
about decision problems today.
00:19:55.410 --> 00:19:59.610
So why would I care
about a reduction?
00:19:59.610 --> 00:20:03.960
Because what it tells me is that
if I know how to solve problem
00:20:03.960 --> 00:20:07.860
B, then I also know how
to solve problem A. If I
00:20:07.860 --> 00:20:11.180
have a, say, a polynomial
time algorithm for solving B
00:20:11.180 --> 00:20:13.960
and I want one for A,
I just take my A input.
00:20:13.960 --> 00:20:16.440
I convert it into the
equivalent B input.
00:20:16.440 --> 00:20:18.190
Then I run my algorithm
for B, and then it
00:20:18.190 --> 00:20:19.880
gives me the answer
to the A problem
00:20:19.880 --> 00:20:22.920
because the answers
are the same.
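As a concrete sketch of "convert the A input, then run the algorithm for B", take A to be Partition (can a list of nonnegative integers be split into two groups with equal sums?) and B to be Subset Sum (does some subset add up to a target?). This pair is chosen here only for illustration and is not the direction of any reduction done in this lecture. The brute-force solver stands in for whatever algorithm for B you happen to have.

```python
from itertools import combinations


def subset_sum(numbers, target):
    """Stand-in solver for problem B (brute force, exponential time)."""
    return any(
        sum(combo) == target
        for size in range(len(numbers) + 1)
        for combo in combinations(numbers, size)
    )


def partition_to_subset_sum(numbers):
    """Reduction: turn an input of A into an input of B with the same answer.

    Runs in linear time. Assumes nonnegative integers.
    """
    total = sum(numbers)
    if total % 2 == 1:
        # An odd total can never be split evenly, so emit a fixed "no" input.
        return [], 1
    return list(numbers), total // 2


def partition(numbers):
    """Solve A by converting to B and reusing B's solver."""
    b_numbers, b_target = partition_to_subset_sum(numbers)
    return subset_sum(b_numbers, b_target)


print(partition([3, 1, 1, 2, 2, 1]))
print(partition([1, 5]))
```

The conversion only goes one way: a solver for Partition would not, by this reduction alone, give a solver for Subset Sum.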
00:20:22.920 --> 00:20:27.480
So if you have a reduction
like this and if say,
00:20:27.480 --> 00:20:29.940
B, has a polynomial
time algorithm,
00:20:29.940 --> 00:20:35.410
then so does A, because you
can just convert A into B,
00:20:35.410 --> 00:20:39.870
and then solve B.
Also this works
00:20:39.870 --> 00:20:42.990
for nondeterministic algorithms.
00:20:48.540 --> 00:20:50.288
Not too important.
00:20:53.360 --> 00:20:56.710
So what this tells us is
that in a certain sense--
00:20:56.710 --> 00:21:00.850
get this right--
well this is saying,
00:21:00.850 --> 00:21:06.320
if I can solve B,
then I can solve A.
00:21:06.320 --> 00:21:12.020
So this is saying
that B is at least as
00:21:12.020 --> 00:21:18.020
hard as A. I think I got
that right, a little tricky.
00:21:18.020 --> 00:21:21.210
So if we want to prove
the problem is NP-hard,
00:21:21.210 --> 00:21:24.560
what we do is show that
every problem in NP
00:21:24.560 --> 00:21:28.150
can be reduced to the problem
X. So now we can go back
00:21:28.150 --> 00:21:31.390
and say well, if we believe
that there is some problem Y,
00:21:31.390 --> 00:21:34.520
that is in NP minus P, if
there's something out here that
00:21:34.520 --> 00:21:38.780
is not in P, then we
can take that problem Y,
00:21:38.780 --> 00:21:40.790
and by this definition,
we can reduce it
00:21:40.790 --> 00:21:43.750
to X, because everything
in NP reduces to X.
00:21:43.750 --> 00:21:49.250
And so then I can
solve my problem
00:21:49.250 --> 00:21:51.420
Y, which is in NP minus P,
00:21:51.420 --> 00:21:52.810
by converting it
to X and solving
00:21:52.810 --> 00:21:55.770
X. So that means X better
not have a polynomial time
00:21:55.770 --> 00:21:59.240
algorithm, because if
it did, Y would also
00:21:59.240 --> 00:22:00.650
have a polynomial
time algorithm.
00:22:00.650 --> 00:22:02.740
And then in general,
P would equal
00:22:02.740 --> 00:22:05.730
NP, because every problem
in NP can be converted to X.
00:22:05.730 --> 00:22:07.780
So if X has a polynomial
time algorithm,
00:22:07.780 --> 00:22:10.470
then every problem Y does.
00:22:10.470 --> 00:22:11.667
Question?
00:22:11.667 --> 00:22:13.250
AUDIENCE: For the
second if statement,
00:22:13.250 --> 00:22:17.690
why can't you say that if
A is in NP, B is in NP?
00:22:17.690 --> 00:22:20.190
ERIC DEMAINE: So you're asking
about the reverse question.
00:22:20.190 --> 00:22:23.900
If A is in NP, can we
conclude that B is in NP?
00:22:23.900 --> 00:22:25.314
And the answer is no.
00:22:30.890 --> 00:22:33.880
Because this reduction only
lets us convert from A to B.
00:22:33.880 --> 00:22:37.100
It doesn't let us do anything
for converting from B to A.
00:22:37.100 --> 00:22:39.580
So if we know how to
solve A and we also
00:22:39.580 --> 00:22:43.180
know how to convert A into B,
it doesn't tell us anything.
00:22:43.180 --> 00:22:47.345
It could be B is a much
harder problem than A,
00:22:47.345 --> 00:22:48.450
in that situation.
00:22:53.394 --> 00:22:55.310
That's, I think, as good
as I can do for that.
00:22:55.310 --> 00:22:55.976
Other questions?
00:22:58.891 --> 00:22:59.390
All right.
00:22:59.390 --> 00:23:01.860
It is really tricky to get
these directions right.
00:23:01.860 --> 00:23:09.410
So let me give you a handy guide
on how to not make a mistake.
00:23:09.410 --> 00:23:11.028
So maybe over here.
00:23:20.410 --> 00:23:24.430
What we care about, from
an algorithmic perspective,
00:23:24.430 --> 00:23:27.780
is proving that problems
are NP-complete.
00:23:37.716 --> 00:23:39.590
Because if we prove
NP-completeness-- I mean,
00:23:39.590 --> 00:23:41.044
really we care
about NP-hardness,
00:23:41.044 --> 00:23:42.710
but we might as well
do NP-completeness.
00:23:42.710 --> 00:23:45.250
Most of the problems that
we'll see that are NP-hard
00:23:45.250 --> 00:23:47.950
are also NP-complete.
00:23:47.950 --> 00:23:51.450
So when we prove this, we
prove that there is basically
00:23:51.450 --> 00:23:53.990
no polynomial time
algorithm for that problem.
00:23:53.990 --> 00:23:55.810
So that's good to
know, because then we
00:23:55.810 --> 00:23:59.840
can just give up searching for
a polynomial time algorithm.
00:23:59.840 --> 00:24:02.300
So all the problems
we've seen so far
00:24:02.300 --> 00:24:05.050
have polynomial time algorithms,
except a couple in your problem
00:24:05.050 --> 00:24:07.089
sets, which were
actually NP-complete.
00:24:07.089 --> 00:24:09.130
And the best you could
have done was exponential,
00:24:09.130 --> 00:24:11.420
unless P equals NP.
00:24:11.420 --> 00:24:15.550
So here's how you can prove
this kind of lower bound
00:24:15.550 --> 00:24:17.550
to say look, I don't need
to look for algorithms
00:24:17.550 --> 00:24:19.970
any more because my
problem is just too hard.
00:24:19.970 --> 00:24:22.580
It's as hard as
everything in NP.
00:24:22.580 --> 00:24:25.700
So this is just a summary
of those definitions.
00:24:25.700 --> 00:24:29.820
The first thing you do
is prove that X is in NP.
00:24:29.820 --> 00:24:33.140
The second thing you do is
prove that X is NP-hard.
00:24:33.140 --> 00:24:39.460
And to do that, you reduce
from some known NP-complete
00:24:39.460 --> 00:24:44.430
problem-- or I guess
NP-hard, but we'll
00:24:44.430 --> 00:24:55.600
use NP-complete--
to your problem
00:24:55.600 --> 00:25:06.030
X. Maybe I'll give
this a name Y.
00:25:06.030 --> 00:25:08.080
OK, so to prove
that X is in NP, you
00:25:08.080 --> 00:25:09.970
do something like
what we did over here,
00:25:09.970 --> 00:25:12.390
which is to give a
nondeterministic algorithm.
00:25:12.390 --> 00:25:13.970
Or you can think
of it as defining
00:25:13.970 --> 00:25:19.290
what the certificate is and
then giving a polynomial time
00:25:19.290 --> 00:25:22.050
verification algorithm.
00:25:22.050 --> 00:25:24.812
So sort of two approaches.
00:25:24.812 --> 00:25:29.710
You can give a nondeterministic
polynomial time algorithm,
00:25:29.710 --> 00:25:33.234
or you give a certificate
and a verifier.
00:25:39.360 --> 00:25:41.230
There's no right or
wrong certificate.
00:25:41.230 --> 00:25:43.730
I mean, a certificate, you
can define however you want,
00:25:43.730 --> 00:25:46.190
as long as the verifier
can actually check it
00:25:46.190 --> 00:25:49.480
and when it says yes, then the
answer to the problem was yes.
00:25:49.480 --> 00:25:51.900
So it's really the same thing.
00:25:51.900 --> 00:25:53.710
Just want to say
there's some certificate
00:25:53.710 --> 00:25:57.236
that a verifier
could actually check.
00:25:57.236 --> 00:25:59.110
So that's proving that
your problem is in NP.
00:25:59.110 --> 00:26:01.660
It's sort of an
algorithmic thing.
00:26:01.660 --> 00:26:03.490
The second part is
all about reductions.
00:26:03.490 --> 00:26:05.270
Now the definition
says that I should
00:26:05.270 --> 00:26:08.776
reduce every problem
in NP to my problem X.
00:26:08.776 --> 00:26:10.150
That's tedious,
because there are
00:26:10.150 --> 00:26:11.610
a lot of problems in the world.
00:26:11.610 --> 00:26:14.530
So I don't want to do it
for every problem in NP.
00:26:14.530 --> 00:26:16.030
I'd like to just do it for one.
00:26:16.030 --> 00:26:17.994
Now if I reduce
sorting to my problem,
00:26:17.994 --> 00:26:19.160
that's not very interesting.
00:26:19.160 --> 00:26:22.980
It says my problem is at
least as hard as sorting.
00:26:22.980 --> 00:26:24.940
But I already know
how to solve sorting.
00:26:24.940 --> 00:26:28.960
But if I start from an
NP-complete problem,
00:26:28.960 --> 00:26:32.720
then I know, by the definition,
that every problem in NP
00:26:32.720 --> 00:26:34.550
can be reduced to that problem.
00:26:34.550 --> 00:26:37.770
And if I show how to reduce
the NP-complete problem to me,
00:26:37.770 --> 00:26:39.830
then I know that
I'm NP-complete too.
00:26:39.830 --> 00:26:44.180
Because if I have
any problem Z in NP,
00:26:44.180 --> 00:26:49.510
by the definition of NP-completeness
of Y, I can reduce that to Y.
00:26:49.510 --> 00:26:53.330
And then if I can build
a reduction from Y to X,
00:26:53.330 --> 00:26:55.520
then I get this reduction.
00:26:55.520 --> 00:26:58.380
And so that means I can
convert any problem in NP
00:26:58.380 --> 00:27:01.310
to my problem X, which
means X is NP-hard.
00:27:01.310 --> 00:27:03.190
That's the definition.
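The chaining argument is just function composition: a reduction from Z to Y followed by a reduction from Y to X is a reduction from Z to X, and a polynomial-time conversion followed by another is still polynomial time. The sketch below illustrates only the mechanics, with three stand-in problems picked for this example rather than taken from the lecture: Partition as Z, Subset Sum as Y, and the decision version of Knapsack as X. All names are hypothetical, and nonnegative integers are assumed.

```python
from itertools import combinations


def compose(z_to_y, y_to_x):
    """Chain two reductions: Z input -> Y input -> X input."""
    return lambda instance: y_to_x(z_to_y(instance))


def partition_to_subset_sum(numbers):
    """Z -> Y: ask whether some subset sums to half the total."""
    total = sum(numbers)
    if total % 2 == 1:
        return ([], 1)  # fixed "no" input for Subset Sum
    return (list(numbers), total // 2)


def subset_sum_to_knapsack(instance):
    """Y -> X: each number becomes an item whose weight equals its value."""
    numbers, target = instance
    items = [(n, n) for n in numbers]  # (weight, value)
    # Weight at most target and value at least target forces a sum of exactly target.
    return (items, target, target)


def knapsack(instance):
    """Stand-in brute-force solver for X: weight <= capacity and value >= goal?"""
    items, capacity, goal = instance
    return any(
        sum(w for w, _ in combo) <= capacity and sum(v for _, v in combo) >= goal
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )


partition_to_knapsack = compose(partition_to_subset_sum, subset_sum_to_knapsack)

print(knapsack(partition_to_knapsack([3, 1, 1, 2, 2, 1])))
print(knapsack(partition_to_knapsack([1, 5])))
```

In a hardness proof only the two conversion functions matter. The exponential solver is here just so the example can be run end to end.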
00:27:03.190 --> 00:27:06.650
So all this is to
say the first time
00:27:06.650 --> 00:27:09.740
you prove a problem is
NP-complete in the world-- this
00:27:09.740 --> 00:27:14.700
happened in the '70s by Cook.
00:27:14.700 --> 00:27:16.900
Basically he proved that
3SAT is NP-complete.
00:27:19.725 --> 00:27:21.100
That was annoying,
because he had
00:27:21.100 --> 00:27:23.106
to start from any
problem in NP, and he
00:27:23.106 --> 00:27:25.230
had to show that you could
reduce any problem in NP
00:27:25.230 --> 00:27:26.810
to 3SAT.
00:27:26.810 --> 00:27:29.820
But now that that hard work is
done, our life is much easier.
00:27:29.820 --> 00:27:31.780
And in this class all
you need to think about
00:27:31.780 --> 00:27:34.620
is picking your favorite
NP-complete problem.
00:27:34.620 --> 00:27:36.900
3SAT's a good choice
for almost anything,
00:27:36.900 --> 00:27:40.190
but we'll see a bunch of other
problems today from here.
00:27:40.190 --> 00:27:44.520
And then reduce from that
known problem to your problem
00:27:44.520 --> 00:27:46.690
that you're trying
to prove is NP-hard.
00:27:46.690 --> 00:27:50.220
If you can do that, you know
your problem is NP-hard.
00:27:50.220 --> 00:27:52.970
So we only need one reduction
for each hardness result, which
00:27:52.970 --> 00:27:53.470
is nice.
00:27:53.470 --> 00:27:56.060
And this picture is a
collection of reductions.
00:27:56.060 --> 00:27:57.360
We're going to start from 3SAT.
00:27:57.360 --> 00:27:59.010
I'm not going to prove
that it's NP-complete,
00:27:59.010 --> 00:28:01.150
although I'll give you a
hint as to why that's true.
00:28:01.150 --> 00:28:02.850
We're going to reduce it
to Super Mario Brothers.
00:28:02.850 --> 00:28:04.890
We're going to reduce it to
three dimensional matching.
00:28:04.890 --> 00:28:06.290
We're going to reduce
three dimensional matching
00:28:06.290 --> 00:28:08.490
to subset sum, to partition,
to rectangle packing,
00:28:08.490 --> 00:28:10.165
to jigsaw puzzles.
00:28:10.165 --> 00:28:13.310
And we're going to do all
those reductions, hopefully.
00:28:13.310 --> 00:28:18.920
And that's proving NP-hardness
of all those problems.
00:28:18.920 --> 00:28:20.070
They're also all in NP.
00:28:24.990 --> 00:28:31.610
So 30 second intuition
why 3SAT is NP-hard.
00:28:31.610 --> 00:28:35.310
Well, if you have
any problem in NP,
00:28:35.310 --> 00:28:39.580
that means there is one of these
nondeterministic polynomial
00:28:39.580 --> 00:28:44.870
time algorithms, or there
is some verifier given
00:28:44.870 --> 00:28:46.740
a polynomial size certificate.
00:28:46.740 --> 00:28:49.250
So that verifier is
just some algorithm.
00:28:49.250 --> 00:28:52.070
And software and hardware
are basically the same thing,
00:28:52.070 --> 00:28:52.570
right?
00:28:52.570 --> 00:28:54.200
So you can convert
that algorithm
00:28:54.200 --> 00:28:57.150
into a circuit that
implements the algorithm.
00:28:57.150 --> 00:28:59.840
And if I have a circuit with
like ANDs and ORs and NOTs,
00:28:59.840 --> 00:29:02.469
I can convert that
into a Boolean formula
00:29:02.469 --> 00:29:03.510
with ANDs, ORs, and NOTs.
00:29:03.510 --> 00:29:06.030
Circuits and formulas
are about the same.
00:29:06.030 --> 00:29:08.374
And if I have a
formula-- fun fact,
00:29:08.374 --> 00:29:10.040
although this is a
little less obvious--
00:29:10.040 --> 00:29:17.120
you can convert it into this
form, an AND of triple ORs.
00:29:17.120 --> 00:29:18.850
And once you've done
that, that formula
00:29:18.850 --> 00:29:22.120
is equivalent to the
original algorithm.
00:29:22.120 --> 00:29:25.840
And the inputs to that
verification algorithm,
00:29:25.840 --> 00:29:29.220
the certificate, are represented
by these variables, the xi's.
00:29:29.220 --> 00:29:31.040
And so deciding whether
there's some way
00:29:31.040 --> 00:29:33.160
to set the xi's to
make the formula true
00:29:33.160 --> 00:29:37.040
is the same thing as saying is
there some certificate where
00:29:37.040 --> 00:29:40.220
the verifier says yes, which
is the same thing as saying
00:29:40.220 --> 00:29:44.120
that the problem has answer yes.
00:29:44.120 --> 00:29:47.670
So given an NP algorithm, one
of these nondeterministic funny
00:29:47.670 --> 00:29:50.700
algorithms, we can convert it
into a formula satisfaction
00:29:50.700 --> 00:29:51.670
problem.
00:29:51.670 --> 00:29:53.800
And that's how you prove
3SAT is NP-complete.
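NOTE A minimal sketch of the certificate idea just described, not taken from the lecture. It assumes a hypothetical encoding in which a literal is a nonzero integer (k means x_k, -k means "not x_k") and a formula is a list of three-literal clauses. The function names verify and has_certificate are illustrative only. The verifier runs in polynomial time; the search over all certificates is exponential and is there only to show what "some certificate makes the verifier say yes" means.

```python
from itertools import product


def verify(formula, assignment):
    """Polynomial-time verifier: does the certificate satisfy every clause?"""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def has_certificate(formula, n):
    """Try all 2^n assignments of x_1..x_n (exponential, illustration only)."""
    for values in product([False, True], repeat=n):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if verify(formula, assignment):
            return True
    return False


# The lecture's first clause (x1 or x3 or not x6) plus one made-up clause.
formula = [(1, 3, -6), (-1, -3, 6)]
print(has_certificate(formula, 6))
```

The formula is a yes-instance exactly when has_certificate returns True; NP only promises that verify is fast, not the search.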
00:29:53.800 --> 00:29:55.680
But to do that can
take many lectures,
00:29:55.680 --> 00:29:58.212
so I'm not going
to do the details.
00:30:01.195 --> 00:30:03.570
The main annoying part is
being formal about what exactly
00:30:03.570 --> 00:30:08.220
an algorithm is, which we
don't do in this class.
00:30:08.220 --> 00:30:11.820
If you're interested,
take 6.045,
00:30:11.820 --> 00:30:13.350
which is some
people are actually
00:30:13.350 --> 00:30:16.050
in the overlap this semester.
00:30:16.050 --> 00:30:16.550
Cool.
00:30:16.550 --> 00:30:17.850
Let's do some reductions.
00:30:17.850 --> 00:30:19.500
This is where things get fun.
00:30:19.500 --> 00:30:23.230
So we're going to start
with reducing 3SAT
00:30:23.230 --> 00:30:24.490
to Super Mario Brothers.
00:30:27.400 --> 00:30:31.320
So how many people have
played Super Mario Brothers?
00:30:31.320 --> 00:30:31.880
Easy one.
00:30:31.880 --> 00:30:33.769
I hope if you haven't
played, you've seen it,
00:30:33.769 --> 00:30:36.310
because we're going to rely very
much on Super Mario Brothers
00:30:36.310 --> 00:30:38.720
physics, which I hope
is fairly intuitive.
00:30:38.720 --> 00:30:42.050
But if you haven't played,
you should, obviously.
00:30:46.180 --> 00:30:50.610
And we're going to reduce
3SAT to Super Mario Brothers.
00:31:04.480 --> 00:31:13.440
Now this is a theorem by a
bunch of people, one MIT grad
00:31:13.440 --> 00:31:19.550
student, myself, and a couple
other collaborators not at MIT.
00:31:19.550 --> 00:31:23.220
And of course this result holds
for all versions of Super Mario
00:31:23.220 --> 00:31:26.569
Brothers so far
released, I think.
00:31:26.569 --> 00:31:28.110
The proofs are a
little bit different
00:31:28.110 --> 00:31:31.200
for each one, especially Mario
2, which is its own universe.
00:31:33.710 --> 00:31:36.430
What I'm going to talk about is the
original Super Mario Brothers,
00:31:36.430 --> 00:31:39.380
NES classic which
I grew up with.
00:31:42.020 --> 00:31:45.240
Now the real Super Mario
Brothers is on a 320
00:31:45.240 --> 00:31:46.450
by 240 screen.
00:31:46.450 --> 00:31:47.480
It's a little bit small.
00:31:47.480 --> 00:31:51.110
Once you go right, you can't go
back left, except in the maze
00:31:51.110 --> 00:31:53.020
levels anyway.
00:31:53.020 --> 00:31:55.190
So I need to generalize
a little bit.
00:31:55.190 --> 00:31:58.540
Because if you assume that
the screen size of Super Mario
00:31:58.540 --> 00:32:00.270
Brothers is
constant, in fact you
00:32:00.270 --> 00:32:01.770
can dynamic program
your way through
00:32:01.770 --> 00:32:04.670
and find the optimal
solution in polynomial time.
00:32:04.670 --> 00:32:13.080
So I need to generalize a little
bit to arbitrary board size,
00:32:13.080 --> 00:32:15.640
arbitrary screen size.
00:32:15.640 --> 00:32:22.010
So in fact, my entire level will
be in one screen, no scrolling.
00:32:22.010 --> 00:32:24.890
Never mind this is a
side scrolling adventure.
00:32:24.890 --> 00:32:27.160
And so that's my
generalized problem.
00:32:27.160 --> 00:32:28.400
And I claim this is NP-hard.
00:32:28.400 --> 00:32:30.820
If I give you a
level and I ask you,
00:32:30.820 --> 00:32:34.070
can you get to the
end of this level?
00:32:34.070 --> 00:32:36.890
That problem is NP-hard.
00:32:36.890 --> 00:32:38.284
Also no time limit.
00:32:38.284 --> 00:32:40.700
The time limit would be OK,
but you have to generalize it.
00:32:40.700 --> 00:32:43.240
Instead of 300
seconds or whatever,
00:32:43.240 --> 00:32:47.119
it has to be an arbitrary value.
00:32:47.119 --> 00:32:48.410
So how are we going to do this?
00:32:48.410 --> 00:32:52.100
We're going to reduce from
3SAT to Super Mario Brothers.
00:32:52.100 --> 00:32:54.480
So that means I'm given--
I don't get to choose.
00:32:54.480 --> 00:32:56.470
I'm given one of these formulas.
00:32:56.470 --> 00:32:59.800
And I have to convert it into an
equivalent Super Mario Brothers
00:32:59.800 --> 00:33:00.550
instance.
00:33:00.550 --> 00:33:04.300
So I have to convert it into
a level, a hypothetical level
00:33:04.300 --> 00:33:05.310
of Super Mario Brothers.
00:33:05.310 --> 00:33:08.000
Given a formula, I have
to build a level that
00:33:08.000 --> 00:33:11.150
implements that formula.
00:33:11.150 --> 00:33:13.230
So here's what it's
going to look like.
00:33:13.230 --> 00:33:15.300
I'm going to start
out somewhere.
00:33:15.300 --> 00:33:19.180
Here's my drawing of Mario.
00:33:19.180 --> 00:33:21.840
Mario-- or you could play Luigi.
00:33:21.840 --> 00:33:23.857
It doesn't matter.
00:33:23.857 --> 00:33:26.190
First thing it's going to do
is enter a little black box
00:33:26.190 --> 00:33:29.220
called a variable.
00:33:29.220 --> 00:33:33.800
This is supposed to
represent, let's call it x1.
00:33:33.800 --> 00:33:35.035
And so it's some black box.
00:33:35.035 --> 00:33:36.910
I'm going to tell you
what it is in a moment.
00:33:36.910 --> 00:33:38.180
And it has two outputs.
00:33:38.180 --> 00:33:41.020
There's the true output
and the false output.
00:33:41.020 --> 00:33:44.040
And the idea is that Mario has
to choose whether to set x1
00:33:44.040 --> 00:33:46.470
to true or false.
00:33:46.470 --> 00:33:47.676
Let me show you that gadget.
00:33:52.326 --> 00:33:56.880
So here's the-- whoops,
upside down-- here
00:33:56.880 --> 00:33:59.330
is the variable gadget.
00:33:59.330 --> 00:34:00.950
So here's Mario.
00:34:00.950 --> 00:34:02.630
Could enter from
this way or that way.
00:34:02.630 --> 00:34:04.610
We'll need a couple of
entrances in a moment.
00:34:04.610 --> 00:34:06.510
And then falls down.
00:34:06.510 --> 00:34:08.760
Once Mario is down here, if
you check the jump height,
00:34:08.760 --> 00:34:10.199
you cannot get back up to here.
00:34:10.199 --> 00:34:11.451
So this is like a one way.
00:34:11.451 --> 00:34:13.159
Once you're down here,
you have a choice.
00:34:13.159 --> 00:34:15.270
Should I fall to the left
or fall to the right?
00:34:15.270 --> 00:34:19.150
And if you make these falls
large enough, once you fall,
00:34:19.150 --> 00:34:21.550
you can't unfall.
00:34:21.550 --> 00:34:23.389
So once you make a
choice of whether I
00:34:23.389 --> 00:34:26.780
leave on the true exit
or the false exit,
00:34:26.780 --> 00:34:28.230
that's a permanent choice.
00:34:28.230 --> 00:34:30.730
So you can't undo it, unless
you can come back to here.
00:34:30.730 --> 00:34:33.487
But we'll set up so
that never happens.
00:34:33.487 --> 00:34:35.320
I mean, if you're trying
to solve the level,
00:34:35.320 --> 00:34:37.100
you don't know which way to go.
00:34:37.100 --> 00:34:38.212
You have to guess.
00:34:38.212 --> 00:34:40.170
Can I go fall to the left
or fall to the right,
00:34:40.170 --> 00:34:43.370
or do something.
00:34:43.370 --> 00:34:47.100
So the existence of a
play through, this level,
00:34:47.100 --> 00:34:50.870
is the same as saying there is
a choice for the x1 variable.
00:34:50.870 --> 00:34:53.573
Now we have to do this
for lots of variables.
00:34:53.573 --> 00:34:58.760
So there's x2 variable,
x3 variable, and so on.
00:34:58.760 --> 00:35:03.950
Each one has a true
exit and a false exit.
00:35:03.950 --> 00:35:06.890
So the actual level will
have n instances of this
00:35:06.890 --> 00:35:08.530
if we have n variables.
00:35:08.530 --> 00:35:11.750
Now, what do I do
once Mario decides
00:35:11.750 --> 00:35:14.220
that this is a true thing?
00:35:14.220 --> 00:35:15.680
What I'm going to
do is have-- this
00:35:15.680 --> 00:35:17.150
is called a gadget by the way.
00:35:17.150 --> 00:35:19.680
In general, most
NP-hardness proofs
00:35:19.680 --> 00:35:21.950
use these things called
gadgets, which is just saying,
00:35:21.950 --> 00:35:24.790
we take various
features of the input,
00:35:24.790 --> 00:35:26.940
and we convert them into
corresponding features
00:35:26.940 --> 00:35:27.890
on the output.
00:35:27.890 --> 00:35:30.770
So here I'm taking each
variable, x1, x2, x3,
00:35:30.770 --> 00:35:33.270
and so on, and building
this little gadget
00:35:33.270 --> 00:35:35.330
for each of those variables.
00:35:35.330 --> 00:35:39.260
Now the other main thing you
have in 3SAT are the clauses.
00:35:39.260 --> 00:35:42.100
We have triples of variables
or their negations.
00:35:42.100 --> 00:35:45.640
They have to come
together and be satisfied.
00:35:45.640 --> 00:35:47.320
One of them has to be true.
00:35:47.320 --> 00:35:56.220
So down here I'm going to have
some clause gadgets, which
00:35:56.220 --> 00:35:57.560
I will show you in a moment.
00:36:05.630 --> 00:36:08.510
OK, and I think
I'll switch colors.
00:36:08.510 --> 00:36:10.220
This is about to get messy.
00:36:10.220 --> 00:36:16.132
So the idea is that some of
the clauses have x1 in them.
00:36:16.132 --> 00:36:19.440
The true version
of x1, not x1 bar.
00:36:19.440 --> 00:36:22.920
So for those clauses,
I want to connect.
00:36:22.920 --> 00:36:25.010
I'm going to dip into
the clause briefly.
00:36:25.010 --> 00:36:27.960
So from this wire going to
dip into the clause here.
00:36:27.960 --> 00:36:30.950
And then I'm going to go to
the next clause that has x1.
00:36:30.950 --> 00:36:35.020
Maybe it's this one, and
the next one, and so on.
00:36:35.020 --> 00:36:38.360
All the clauses that have
x1 in it, I dip into.
00:36:38.360 --> 00:36:40.240
The other ones I don't.
00:36:40.240 --> 00:36:41.640
And then once I'm
done, I'm going
00:36:41.640 --> 00:36:44.290
to come back and feed into x2.
00:36:48.620 --> 00:36:52.180
Next, I look at this
false wire for x1.
00:36:52.180 --> 00:36:55.260
So all the clauses that
have x1 bar in them,
00:36:55.260 --> 00:36:56.720
I'm going to connect.
00:36:56.720 --> 00:36:58.770
So I don't know
which ones they are.
00:36:58.770 --> 00:37:05.420
Maybe this one, or
this one, something.
00:37:05.420 --> 00:37:08.270
And then I come here.
00:37:08.270 --> 00:37:11.150
And so the idea is that
Mario makes a choice
00:37:11.150 --> 00:37:12.540
whether x1 is true or false.
00:37:12.540 --> 00:37:17.640
If x1 is true, Mario is going
to visit all of the clauses that
00:37:17.640 --> 00:37:19.400
have x1 true in them.
00:37:19.400 --> 00:37:21.260
And then it's going to
go to the x2 choice.
00:37:21.260 --> 00:37:23.510
Then it's going to choose
whether x2 is true or false,
00:37:23.510 --> 00:37:25.620
and repeat.
00:37:25.620 --> 00:37:27.900
Or Mario decides
x1 should be false.
00:37:27.900 --> 00:37:30.420
That will satisfy
all the clauses
00:37:30.420 --> 00:37:34.550
that have x1 bar in them.
00:37:34.550 --> 00:37:37.220
And then again, we
feed back into x2.
00:37:37.220 --> 00:37:39.950
So this is why we have two
inputs into the x2 gadget.
00:37:39.950 --> 00:37:42.400
One of them is when the
previous variable was true.
00:37:42.400 --> 00:37:45.200
The other is when the
previous variable was false.
00:37:45.200 --> 00:37:47.780
The choice of x2 doesn't
depend on the choice of x1.
00:37:47.780 --> 00:37:49.495
So they feed into
the same thing.
00:37:49.495 --> 00:37:50.870
And you have to
make your choice.
00:37:53.960 --> 00:37:55.330
So far, so good.
00:37:55.330 --> 00:37:58.970
Now the question is, what's
happening in these clauses.
00:37:58.970 --> 00:38:03.580
And then there's
one other aspect,
00:38:03.580 --> 00:38:07.740
which is after you've set all of
the variables, at the very end,
00:38:07.740 --> 00:38:13.350
after this last variable
xn, at the very end,
00:38:13.350 --> 00:38:19.650
what we're going to do is come
and go through all the clauses.
00:38:19.650 --> 00:38:21.440
And then this is the flag.
00:38:21.440 --> 00:38:22.970
This is where you win the level.
00:38:22.970 --> 00:38:24.110
Sorry, I drew it backwards.
00:38:24.110 --> 00:38:29.600
But the goal is for Mario to
start here and get to here.
00:38:29.600 --> 00:38:31.070
In order to do
that, you have to be
00:38:31.070 --> 00:38:32.694
able to traverse
through these clauses.
00:38:32.694 --> 00:38:34.750
So what do the
clauses look like?
00:38:34.750 --> 00:38:37.780
This is a little
bit more elaborate.
00:38:37.780 --> 00:38:39.250
So here we are.
00:38:43.095 --> 00:38:44.970
This is a clause gadget.
00:38:44.970 --> 00:38:47.906
So there are three ways
to dip into the clause.
00:38:47.906 --> 00:38:50.030
It's actually upside down
relative to that picture,
00:38:50.030 --> 00:38:51.500
but that's not a problem.
00:38:54.180 --> 00:38:58.830
So if Mario comes here, then
he can hit the question mark
00:38:58.830 --> 00:38:59.850
from below.
00:38:59.850 --> 00:39:02.742
And inside this question mark
is an invincibility star.
00:39:02.742 --> 00:39:04.950
And the invincibility star
will come up here and just
00:39:04.950 --> 00:39:07.710
bounce around forever.
00:39:07.710 --> 00:39:08.610
We checked.
00:39:08.610 --> 00:39:12.350
The star will just stay there
for as long as you let it sit.
00:39:12.350 --> 00:39:14.290
Unfortunately, all of
these are solid blocks,
00:39:14.290 --> 00:39:18.420
so Mario can't actually get
up to here to get the star.
00:39:18.420 --> 00:39:20.459
But as long as Mario
can visit this question
00:39:20.459 --> 00:39:22.500
mark or this question mark
or this question mark,
00:39:22.500 --> 00:39:25.275
then there will be at
least one star up here.
00:39:25.275 --> 00:39:26.650
So the idea is
that each of these
00:39:26.650 --> 00:39:30.140
represents one of the
literals that's in the clause.
00:39:30.140 --> 00:39:34.560
And if we choose-- so let's look
at this first clause, x1 or x3
00:39:34.560 --> 00:39:35.860
or x6 bar.
00:39:35.860 --> 00:39:40.150
So if we choose x1 to be true,
then we'll follow the path
00:39:40.150 --> 00:39:42.040
and we'll be able
to hit the star.
00:39:42.040 --> 00:39:46.150
Or if we choose x3 to be
true, then we'll come in here
00:39:46.150 --> 00:39:47.910
and hit this star.
00:39:47.910 --> 00:39:52.199
Or if we choose x6 to
be false, then that path
00:39:52.199 --> 00:39:54.740
will lead to here and we'll be
able to hit this question mark
00:39:54.740 --> 00:39:55.800
and get the star up here.
00:39:55.800 --> 00:39:58.570
So as long as we
satisfy the clause,
00:39:58.570 --> 00:39:59.990
there will be at least one star.
00:39:59.990 --> 00:40:02.880
Won't help if you
have multiple stars.
00:40:02.880 --> 00:40:07.220
Then the final traversal part--
so that was this first clause.
00:40:07.220 --> 00:40:08.960
And now we're
traversing through.
00:40:08.960 --> 00:40:10.960
Actually in this picture,
it's left to right.
00:40:10.960 --> 00:40:13.420
Just turn your head.
00:40:13.420 --> 00:40:16.040
And so now Mario
is going to have
00:40:16.040 --> 00:40:19.450
to traverse this gadget from
left to right on this top part.
00:40:19.450 --> 00:40:22.810
And if Mario comes in here and
you can barely jump over that.
00:40:22.810 --> 00:40:25.120
If there's a star, you
can collect the star
00:40:25.120 --> 00:40:29.700
and then run through all of
these flaming bars of death.
00:40:29.700 --> 00:40:32.030
If there's no star, you can't.
00:40:32.030 --> 00:40:34.170
You'll die if you
try to traverse.
00:40:34.170 --> 00:40:38.530
So in order to be able to
traverse all these clauses,
00:40:38.530 --> 00:40:40.630
they must all be true.
00:40:40.630 --> 00:40:44.180
And them all being true is the
same as their AND being true.
00:40:44.180 --> 00:40:47.300
So you will be able to survive
through all these clauses
00:40:47.300 --> 00:40:51.404
if and only if this formula
has a satisfying assignment.
00:40:51.404 --> 00:40:52.820
The satisfying
assignment would be
00:40:52.820 --> 00:40:57.990
given to you by the level play.
00:40:57.990 --> 00:41:00.400
The choices that Mario
makes in this gadget
00:41:00.400 --> 00:41:02.750
will tell you
whether each variable
00:41:02.750 --> 00:41:05.310
should be true or false.
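NOTE A rough abstract model of the level, assuming the gadgets behave exactly as described and ignoring all actual game physics. The integer encoding of literals (k for x_k, -k for x_k bar) and the name play_level are assumptions made for illustration. Each choice releases a star in every clause gadget its wire dips into, and the final traversal survives only if every clause gadget holds at least one star.

```python
def play_level(formula, choices):
    """choices[k] is True if Mario takes the true exit of variable gadget x_k."""
    stars = [0] * len(formula)  # stars released in each clause gadget
    for var, value in choices.items():
        literal = var if value else -var
        for j, clause in enumerate(formula):
            if literal in clause:  # this wire dips into clause gadget j
                stars[j] += 1      # Mario hits a question mark block
    # Final traversal: every clause gadget needs at least one star.
    return all(count >= 1 for count in stars)


# The lecture's first clause (x1 or x3 or not x6) plus one made-up clause.
formula = [(1, 3, -6), (-1, -3, 6)]
winning = {1: True, 2: False, 3: False, 4: False, 5: False, 6: True}
losing = {1: True, 2: False, 3: True, 4: False, 5: False, 6: False}
print(play_level(formula, winning), play_level(formula, losing))
```

In this model a play-through succeeds exactly when the choices form a satisfying assignment, which is the equivalence the reduction needs.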
00:41:05.310 --> 00:41:08.520
So to elaborate just
a little bit more
00:41:08.520 --> 00:41:10.600
in general, when you have
a reduction like this,
00:41:10.600 --> 00:41:13.600
to prove that it actually works,
you need to check two things.
00:41:13.600 --> 00:41:15.690
You need to check
that if there is a way
00:41:15.690 --> 00:41:17.670
to satisfy this
formula, then there
00:41:17.670 --> 00:41:19.300
is a way to play this level.
00:41:19.300 --> 00:41:21.716
And then conversely you need
to show that if there's a way
00:41:21.716 --> 00:41:23.610
to play this level,
then the formula
00:41:23.610 --> 00:41:25.150
has a satisfying assignment.
00:41:25.150 --> 00:41:28.980
So for that latter
part, in order
00:41:28.980 --> 00:41:31.437
to convert a level play into
a satisfying assignment,
00:41:31.437 --> 00:41:34.020
you just check which way Mario
falls in each of these gadgets,
00:41:34.020 --> 00:41:34.820
left or right.
00:41:34.820 --> 00:41:36.700
That tells you the
variable assignment.
00:41:36.700 --> 00:41:38.860
And because of the
way the clauses work,
00:41:38.860 --> 00:41:41.200
you'll only be able to
finish the level if there
00:41:41.200 --> 00:41:42.910
was at least one star here.
00:41:42.910 --> 00:41:44.780
And stars run out
after some time.
00:41:44.780 --> 00:41:48.530
So you can barely make it
through all the flaming bars
00:41:48.530 --> 00:41:49.067
of death.
00:41:49.067 --> 00:41:50.400
Then you get to the next clause.
00:41:50.400 --> 00:41:53.230
You need another
star for each one.
00:41:53.230 --> 00:41:56.300
Conversely, if there is
a satisfying assignment,
00:41:56.300 --> 00:41:58.060
you can actually play
through the level,
00:41:58.060 --> 00:41:59.934
you just make these
choices according to what
00:41:59.934 --> 00:42:01.180
the satisfying assignment is.
00:42:01.180 --> 00:42:03.330
So either way it's equivalent.
00:42:03.330 --> 00:42:07.090
We always get a yes or
no answer here whenever
00:42:07.090 --> 00:42:11.460
we get a corresponding yes or
no answer to the 3SAT problem.
00:42:11.460 --> 00:42:15.330
You also need to check that this
reduction is polynomial size.
00:42:15.330 --> 00:42:17.690
It can be computed
in polynomial time.
00:42:17.690 --> 00:42:18.530
So there's an issue.
00:42:18.530 --> 00:42:21.420
Given this thing, you have
to lay this out in a grid
00:42:21.420 --> 00:42:23.780
and draw all these wires.
00:42:23.780 --> 00:42:28.290
And there's one
problem here, which is,
00:42:28.290 --> 00:42:31.010
these wires cross each other.
00:42:31.010 --> 00:42:33.870
And that's a little
awkward, because these wires
00:42:33.870 --> 00:42:36.660
are basically just long tunnels
for Mario to walk through.
00:42:36.660 --> 00:42:38.910
But what does it mean
to have a crossing wire?
00:42:38.910 --> 00:42:41.400
Really, if Mario's
coming this way,
00:42:41.400 --> 00:42:44.630
I don't want him to
be able to go up here.
00:42:44.630 --> 00:42:46.100
He has to go straight.
00:42:46.100 --> 00:42:47.900
Otherwise this
reduction won't work.
00:42:47.900 --> 00:42:50.800
So I need what's called
a crossover gadget.
00:42:50.800 --> 00:42:56.940
And everywhere here I have a
crossing, I have a crossover.
00:42:56.940 --> 00:42:59.360
And this gadget has
to guarantee that I
00:42:59.360 --> 00:43:01.220
can go through one
way or the other way,
00:43:01.220 --> 00:43:04.860
but there's no leakage from
one path to the other path.
00:43:04.860 --> 00:43:08.550
Actually, if I first
traverse through here,
00:43:08.550 --> 00:43:10.950
and then I traverse through
here, it's OK if I leak back.
00:43:10.950 --> 00:43:14.310
Because once I visit a
wire, it's kind of done.
00:43:14.310 --> 00:43:17.430
But I can't have leakage if
only one of them is traversed.
00:43:17.430 --> 00:43:23.570
So this is the last gadget, the
most complicated of them all.
00:43:23.570 --> 00:43:25.295
So this took a
while to construct,
00:43:25.295 --> 00:43:26.170
as you might imagine.
00:43:28.710 --> 00:43:32.470
So this is what we call a
unidirectional crossover.
00:43:32.470 --> 00:43:38.690
You can either go from left to
right or from bottom to top,
00:43:38.690 --> 00:43:42.270
but you cannot go from bottom to
right or bottom to left or left
00:43:42.270 --> 00:43:44.480
to bottom, that kind of thing.
00:43:44.480 --> 00:43:45.980
So I'm told that
Mario is only going
00:43:45.980 --> 00:43:48.450
to enter from here to here,
because all of these wires,
00:43:48.450 --> 00:43:50.010
I can make one way wires.
00:43:50.010 --> 00:43:53.440
I only have to think about
going in a particular direction.
00:43:53.440 --> 00:43:55.620
I can have falls to
force Mario to only go
00:43:55.620 --> 00:43:57.840
one way along these wires.
00:43:57.840 --> 00:44:01.790
And so let me show you
the valid traversals.
00:44:01.790 --> 00:44:03.900
Maybe the simplest
one is from here.
00:44:03.900 --> 00:44:05.910
So let's say Mario
comes in here, falls.
00:44:05.910 --> 00:44:08.480
So I can't backtrack,
can jump up here.
00:44:08.480 --> 00:44:11.960
And then if Mario's big,
he can break this block,
00:44:11.960 --> 00:44:12.890
break this block.
00:44:12.890 --> 00:44:17.490
But if he's big-- there should
be a couple more zig zags here.
00:44:17.490 --> 00:44:18.790
Let's try to run.
00:44:18.790 --> 00:44:21.729
You can crouch
slide through here.
00:44:21.729 --> 00:44:23.520
But then you'll sort
of lose your momentum,
00:44:23.520 --> 00:44:25.144
and you won't be able
to go through all
00:44:25.144 --> 00:44:27.990
these traversals as big Mario.
00:44:27.990 --> 00:44:31.800
So you can break these blocks
and then get up to the top
00:44:31.800 --> 00:44:32.800
and leave.
00:44:32.800 --> 00:44:35.520
Or, if big Mario comes
from over this way,
00:44:35.520 --> 00:44:38.940
you can first take a
damage, become small Mario.
00:44:38.940 --> 00:44:42.010
Then you can fit through
these wiggly blocks.
00:44:42.010 --> 00:44:45.090
But you cannot break blocks
anymore as small Mario.
00:44:45.090 --> 00:44:48.230
So once you've committed
to going small,
00:44:48.230 --> 00:44:50.520
you have to stay small,
until you get to here.
00:44:50.520 --> 00:44:52.402
And then there's a
mushroom in this block.
00:44:52.402 --> 00:44:54.860
So you can get big again, and
then you can break this block
00:44:54.860 --> 00:44:56.020
and leave.
00:44:56.020 --> 00:44:57.700
But once you're big,
you can't backtrack
00:44:57.700 --> 00:45:00.690
because big Mario can't fit
through these tiny tubes.
00:45:00.690 --> 00:45:03.430
See it clear, right?
00:45:03.430 --> 00:45:07.260
So slight detail, which
is at the beginning,
00:45:07.260 --> 00:45:10.660
we need to make Mario big--
so there's a little mushroom.
00:45:10.660 --> 00:45:14.450
I think they have three
spots-- at the beginning.
00:45:14.450 --> 00:45:16.180
And also at the
end, there has to be
00:45:16.180 --> 00:45:18.600
something like this that
checks that you actually
00:45:18.600 --> 00:45:19.640
have a mushroom.
00:45:19.640 --> 00:45:22.180
So the only time you're
allowed to take damage
00:45:22.180 --> 00:45:24.232
is briefly in this
gadget you take damage.
00:45:24.232 --> 00:45:26.190
If you tried to backtrack,
you would get stuck.
00:45:26.190 --> 00:45:28.120
There's a long fall here.
00:45:28.120 --> 00:45:29.820
And then you have
to get the mushroom
00:45:29.820 --> 00:45:32.610
so you can escape again.
00:45:32.610 --> 00:45:39.250
So at the end there's
like a mushroom check.
00:45:39.250 --> 00:45:40.950
Make sure you have it.
00:45:40.950 --> 00:45:42.790
So most of the
time Mario is big.
00:45:42.790 --> 00:45:44.260
And just in these
little crossovers
00:45:44.260 --> 00:45:45.635
you have to make
these decisions.
00:45:45.635 --> 00:45:47.740
This would make a
giant level, but it
00:45:47.740 --> 00:45:54.650
is polynomial size, probably
quadratic or something.
00:45:54.650 --> 00:45:56.400
Therefore Super Mario
Brothers is NP-hard.
00:45:59.050 --> 00:46:01.420
So if you want more
fun examples like this,
00:46:01.420 --> 00:46:04.650
you should check out 6.890, the
class I taught last semester,
00:46:04.650 --> 00:46:08.100
which has online video lectures,
soon to be on OpenCourseWare.
00:46:08.100 --> 00:46:12.310
So you can play with that.
00:46:12.310 --> 00:46:14.310
Any questions about Mario?
00:46:17.340 --> 00:46:18.740
All right, I hope you all play.
00:46:27.380 --> 00:46:30.730
So the next topic is
a problem you probably
00:46:30.730 --> 00:46:33.410
haven't heard about, three
dimensional matching.
00:46:43.880 --> 00:46:46.980
This is a kind of a
graph theory problem.
00:46:46.980 --> 00:46:50.590
We're going to call
it 3DM for short.
00:46:50.590 --> 00:46:53.640
And you've seen matching
problems based on flow.
00:46:53.640 --> 00:46:56.030
Matching problems are usually
about pairs of things.
00:46:56.030 --> 00:46:58.220
You're pairing them
up, which you might
00:46:58.220 --> 00:46:59.470
call two dimensional matching.
00:46:59.470 --> 00:47:01.700
That can be solved
in polynomial time.
00:47:01.700 --> 00:47:05.380
But if you change two to three
and you're tripling things up,
00:47:05.380 --> 00:47:08.420
then suddenly the problem
becomes NP-complete.
00:47:08.420 --> 00:47:13.290
So it's a useful starting
point, similar to 3SAT.
00:47:20.090 --> 00:47:28.250
So you're given a
set X of elements,
00:47:28.250 --> 00:47:30.810
a set Y of elements,
a set Z of elements.
00:47:30.810 --> 00:47:32.770
None of them are shared.
00:47:32.770 --> 00:47:37.020
But more importantly, you
are given a bunch of triples.
00:47:37.020 --> 00:47:40.850
These are the allowable triples.
00:47:40.850 --> 00:47:44.790
So we'll call the set
of allowable triples T.
00:47:44.790 --> 00:47:47.100
And so we're looking
at the cross product.
00:47:47.100 --> 00:47:50.890
This is the set of all triples
(x, y, z), where x is in X,
00:47:50.890 --> 00:47:54.620
y is in Y, and z is in Z. But
not all triples are allowed.
00:47:54.620 --> 00:47:58.210
Only some subset of
triples is allowed.
00:47:58.210 --> 00:48:04.800
And your goal is to choose
among those subsets-- sorry,
00:48:04.800 --> 00:48:09.440
among those triples a
subset of the triples.
00:48:09.440 --> 00:48:11.020
So we're trying
to choose a subset
00:48:11.020 --> 00:48:26.667
S of T such that every element--
so the things in X, Y, and Z
00:48:26.667 --> 00:48:27.500
are called elements.
00:48:27.500 --> 00:48:33.240
So I'm just taking
somebody in the union of X, Y, and Z.
00:48:33.240 --> 00:48:48.225
It should be in exactly
one triple s in big S.
00:48:48.225 --> 00:48:50.600
This is a little weird, but
you can think of this problem
00:48:50.600 --> 00:48:56.140
as you have an alien
race with three genders--
00:48:56.140 --> 00:48:58.254
male, female, neuter I guess.
00:48:58.254 --> 00:48:59.420
Those are the X, Y, and Z's.
00:48:59.420 --> 00:49:02.420
There's an equal number of each.
00:49:02.420 --> 00:49:06.010
And every triple reports
to you whether that
00:49:06.010 --> 00:49:08.270
is a compatible matching.
00:49:08.270 --> 00:49:11.130
Who knows what they're
doing, all three of them?
00:49:11.130 --> 00:49:14.110
So you're told up front--
you take a survey.
00:49:14.110 --> 00:49:15.790
There's only n cubed
different triples.
00:49:15.790 --> 00:49:19.992
For each of them they
say, yeah, I'd do that.
00:49:19.992 --> 00:49:22.630
So you were given that subset.
00:49:22.630 --> 00:49:26.400
And now your goal is to
permanently triple up
00:49:26.400 --> 00:49:27.380
these guys.
00:49:27.380 --> 00:49:30.630
And everybody wants to
be in exactly one triple.
00:49:30.630 --> 00:49:34.860
So it's a monogamous
race, imagine.
00:49:34.860 --> 00:49:38.190
So everybody wants to be put
in one triple, but only one
00:49:38.190 --> 00:49:39.317
triple.
00:49:39.317 --> 00:49:40.900
And the question is,
is this possible?
00:49:40.900 --> 00:49:42.358
This is three
dimensional matching.
00:49:42.358 --> 00:49:45.436
Certainly not always going to be
possible, but sometimes it is.
00:49:45.436 --> 00:49:46.810
If it is, you want
to answer yes.
00:49:46.810 --> 00:49:50.060
If it's not possible,
you want to answer no.
00:49:50.060 --> 00:49:52.180
This problem is NP-complete.
00:49:52.180 --> 00:49:54.150
Why is it in NP?
00:49:54.150 --> 00:49:58.140
Because I can basically guess
which elements of T are in S.
00:49:58.140 --> 00:50:00.730
There's only at most
n cubed of them.
00:50:00.730 --> 00:50:02.510
So for each one, I guess yes or no,
is guess yes or no,
00:50:02.510 --> 00:50:04.310
is that element of T in S?
00:50:04.310 --> 00:50:07.270
And then I check whether this
coverage constraint holds.
00:50:07.270 --> 00:50:09.617
So it's very easy to
prove this is in NP.
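NOTE The check behind "in NP" can be sketched as follows. This is an illustrative verifier, not code from the lecture: the name is_3d_matching and the string element labels are made up, and X, Y, and Z are assumed to be disjoint sets, as stated in the problem definition. Given a guessed subset S of T, it confirms that every chosen triple is allowed and that every element is covered exactly once.

```python
from collections import Counter


def is_3d_matching(X, Y, Z, T, S):
    """Verify a certificate S: allowed triples covering each element once."""
    if not set(S) <= set(T):  # every chosen triple must be allowed
        return False
    covered = Counter(element for triple in S for element in triple)
    # Each element of X, Y, and Z must appear in exactly one chosen triple.
    return covered == Counter(X | Y | Z)


X, Y, Z = {"x1", "x2"}, {"y1", "y2"}, {"z1", "z2"}
T = [("x1", "y1", "z1"), ("x2", "y2", "z2"), ("x1", "y2", "z2")]
print(is_3d_matching(X, Y, Z, T, [T[0], T[1]]))  # each element covered once
print(is_3d_matching(X, Y, Z, T, [T[0], T[2]]))  # x1 is covered twice
```

The check takes time polynomial in the size of T, which is all the definition of NP asks of a verifier.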
00:50:12.540 --> 00:50:15.940
The challenge is to
prove that it's NP-hard.
00:50:15.940 --> 00:50:20.610
And we're going to do that,
again, by reducing from 3SAT.
00:50:32.510 --> 00:50:36.640
So we're going to make
a reduction from 3SAT
00:50:36.640 --> 00:50:38.850
to three dimensional matching.
00:50:38.850 --> 00:50:39.840
Direction is important.
00:50:39.840 --> 00:50:42.030
Always reduce from the
thing you know is hard
00:50:42.030 --> 00:50:45.230
and reduce to the thing
you don't know is hard.
00:50:45.230 --> 00:50:47.320
So again, we're given a formula.
00:50:47.320 --> 00:50:49.640
And we want to
convert that formula
00:50:49.640 --> 00:50:53.670
into an equivalent three
dimensional matching input.
00:50:53.670 --> 00:50:56.965
So the formula has
variables and clauses.
00:50:56.965 --> 00:50:58.590
For each variable,
we're going to build
00:50:58.590 --> 00:51:01.777
a gadget that looks like this.
00:51:01.777 --> 00:51:03.860
And for each clause we're
going to build a gadget.
00:51:03.860 --> 00:51:06.720
So here's what they look like.
00:51:06.720 --> 00:51:13.730
If we have a variable x1,
we're going to convert that
00:51:13.730 --> 00:51:15.650
into this picture.
00:51:30.490 --> 00:51:32.053
Stay monochromatic for now.
00:51:36.660 --> 00:51:41.176
Looks pretty crazy at the
moment, but it's not so crazy.
00:51:56.510 --> 00:51:58.490
This is not supposed
to be obvious.
00:51:58.490 --> 00:52:00.330
You have to think for a while.
00:52:00.330 --> 00:52:03.310
It's a puzzle to figure
out this kind of thing.
00:52:03.310 --> 00:52:08.610
But I call this thing a variable
gadget because locally--
00:52:08.610 --> 00:52:11.450
so there's basically a
wheel in the center here.
00:52:11.450 --> 00:52:13.740
And then there's
these extra dots
00:52:13.740 --> 00:52:15.810
for every pair of dots,
consecutive pairs of dots
00:52:15.810 --> 00:52:17.100
in a wheel.
00:52:17.100 --> 00:52:19.960
And what I've drawn is the set
of triples that are allowed.
00:52:19.960 --> 00:52:22.620
There's tons of other
triples which are forbidden.
00:52:22.620 --> 00:52:24.670
The triples that are
in T are the ones
00:52:24.670 --> 00:52:27.490
that I draw as little triangles.
00:52:27.490 --> 00:52:30.480
And I two-color them because
there are exactly two ways
00:52:30.480 --> 00:52:31.850
to solve this gadget locally.
00:52:31.850 --> 00:52:35.310
Now these dots are going to
be connected to other gadgets.
00:52:35.310 --> 00:52:39.407
But these dots only exist
in this gadget, which means
00:52:39.407 --> 00:52:40.490
they've got to be covered.
00:52:40.490 --> 00:52:42.115
They've got to be
covered exactly once.
00:52:42.115 --> 00:52:45.470
So either you choose
the blue triangles,
00:52:45.470 --> 00:52:47.050
or you choose the red triangles.
00:52:47.050 --> 00:52:51.840
Each of them will exactly
cover each of these guys once.
00:52:51.840 --> 00:52:53.890
You cannot mix and
match red and blue,
00:52:53.890 --> 00:52:56.810
because you either get overlap
if you choose two guys that
00:52:56.810 --> 00:52:59.250
share a point, or
you'd miss one.
00:52:59.250 --> 00:53:01.770
If I choose like this
blue and this red,
00:53:01.770 --> 00:53:04.550
then I can't cover this
point because both of these
00:53:04.550 --> 00:53:05.910
would overlap those two.
00:53:05.910 --> 00:53:09.080
And over here you have to
choose [INAUDIBLE] triples.
00:53:09.080 --> 00:53:10.590
They can't overlap at all.
00:53:10.590 --> 00:53:13.270
And everybody has
to get covered.
00:53:13.270 --> 00:53:15.640
So just given those
constraints, locally you
00:53:15.640 --> 00:53:18.260
can see you have to
choose red or blue.
00:53:18.260 --> 00:53:18.920
Guess what?
00:53:18.920 --> 00:53:21.800
One of them is true,
the other one is false.
00:53:21.800 --> 00:53:25.414
Let's say that red is
true and blue is false.
00:53:25.414 --> 00:53:27.830
In general, when you're trying
to build a variable gadget,
00:53:27.830 --> 00:53:30.180
you build something
that has exactly
00:53:30.180 --> 00:53:33.220
two solutions, one representing
true, one representing false.
00:53:33.220 --> 00:53:36.850
Now how big do I
make this wheel?
00:53:36.850 --> 00:53:39.800
Big enough.
00:53:39.800 --> 00:53:41.970
You could make it as big
as the number of clauses.
00:53:41.970 --> 00:53:46.890
I'm going to make
it 2 times nx1.
00:53:46.890 --> 00:53:59.110
So wheel-- and this number
is the number of occurrences
00:53:59.110 --> 00:54:03.680
of x1 in the formula.
00:54:03.680 --> 00:54:06.900
So this is the number of clauses
that contain either xi or xi
00:54:06.900 --> 00:54:08.720
bar.
00:54:08.720 --> 00:54:09.585
That's nxi.
00:54:09.585 --> 00:54:11.580
I'm going to double that.
00:54:11.580 --> 00:54:14.960
Because what I get over
here is basically xi
00:54:14.960 --> 00:54:21.540
being true for those guys.
00:54:21.540 --> 00:54:25.020
Actually, yeah,
that's actually right.
00:54:25.020 --> 00:54:27.220
It looks backwards.
00:54:27.220 --> 00:54:28.650
And false for these guys.
00:54:31.670 --> 00:54:35.380
One way or the other,
we'll figure it out.
00:54:35.380 --> 00:54:40.800
So in order for xi to appear
in, say, five different clauses,
00:54:40.800 --> 00:54:45.590
I want five of the true things
and five of the false things.
00:54:45.590 --> 00:54:49.854
And so I need to double in
order to get-- potentially
00:54:49.854 --> 00:54:51.520
I have twice as many
as I actually need,
00:54:51.520 --> 00:54:53.603
but this way I'm guaranteed
to have false or true,
00:54:53.603 --> 00:54:55.830
whichever I need.
00:54:55.830 --> 00:54:57.560
In reality I have
some true occurrences.
00:54:57.560 --> 00:55:01.210
I have some false occurrences,
some x1's, some x1 bars.
00:55:01.210 --> 00:55:06.600
This will guarantee that I have
enough of these free points
00:55:06.600 --> 00:55:09.720
to connect into
my clause gadgets.
00:55:09.720 --> 00:55:11.359
How do I do a clause gadget?
00:55:11.359 --> 00:55:12.442
It's actually really easy.
00:55:16.854 --> 00:55:18.770
So these would be pretty
boring by themselves.
00:55:25.590 --> 00:55:27.180
So a clause always
looks like this.
00:55:27.180 --> 00:55:28.388
Maybe there's some negations.
00:55:30.860 --> 00:55:34.620
Yeah, let's do
something like that.
00:55:34.620 --> 00:55:36.795
I'm going to convert it
into a very simple picture.
00:55:43.810 --> 00:55:50.700
It's going to be xi dot,
and xj bar dot, and xk dot.
00:55:50.700 --> 00:55:56.363
And then-- well maybe I'll
stick to these colors.
00:56:02.400 --> 00:56:06.940
Again, these two points only
appear in this clause gadget.
00:56:06.940 --> 00:56:10.410
These dots are
actually these dots.
00:56:10.410 --> 00:56:12.570
So there's one of
these pictures for x1.
00:56:12.570 --> 00:56:14.330
There's another one for x2, x3.
00:56:14.330 --> 00:56:16.770
And so xi has one
of these wheels.
00:56:16.770 --> 00:56:20.190
I want this dot to be one
of these dots of the wheel.
00:56:20.190 --> 00:56:22.900
And then I want
this dot to be one
00:56:22.900 --> 00:56:26.300
of the dots in the xj wheel
with the false setting, one
00:56:26.300 --> 00:56:27.600
of the red dots.
00:56:27.600 --> 00:56:34.981
I want this one to be xk
true setting in the xk wheel.
00:56:34.981 --> 00:56:36.730
So these things are
all connected together
00:56:36.730 --> 00:56:38.310
in a complicated pattern.
00:56:38.310 --> 00:56:40.110
But the point is that
within this gadget,
00:56:40.110 --> 00:56:42.910
I only have three
allowed triples.
00:56:42.910 --> 00:56:44.790
And these points only
appear in this gadget,
00:56:44.790 --> 00:56:47.734
which means they have to
be covered in this gadget.
00:56:47.734 --> 00:56:49.150
They can be covered
by this triple
00:56:49.150 --> 00:56:51.200
or this triple or this triple.
00:56:51.200 --> 00:56:54.480
But once you choose one,
you can't choose the others.
00:56:54.480 --> 00:56:59.540
What this means is if
I set x1 to be true,
00:56:59.540 --> 00:57:03.540
it leaves behind these
points marked true.
00:57:03.540 --> 00:57:05.640
If I choose the red
things, then it's
00:57:05.640 --> 00:57:08.184
the blue points that
are left behind.
00:57:08.184 --> 00:57:09.600
Leaving points
behind in this case
00:57:09.600 --> 00:57:11.772
is going to be good,
because this clause,
00:57:11.772 --> 00:57:13.480
in order to satisfy
this clause, in order
00:57:13.480 --> 00:57:17.330
to choose one of these three
triples, at least one of these
00:57:17.330 --> 00:57:22.500
must be left behind
by the wheel.
00:57:22.500 --> 00:57:24.457
If all of these are
covered by their wheels,
00:57:24.457 --> 00:57:25.290
then there's no way.
00:57:25.290 --> 00:57:27.750
I can't choose
any of these guys.
00:57:27.750 --> 00:57:30.810
But if at least one of these
is left behind by the wheel,
00:57:30.810 --> 00:57:33.780
then I can choose the
corresponding triple
00:57:33.780 --> 00:57:34.760
and cover these points.
00:57:34.760 --> 00:57:36.880
So I'll be able to cover
these points if and only
00:57:36.880 --> 00:57:39.947
if at least one
of these is true.
00:57:39.947 --> 00:57:40.780
And that's a clause.
00:57:40.780 --> 00:57:43.440
That's what a clause is
supposed to do in 3SAT.
00:57:43.440 --> 00:57:45.200
If at least one
of these is true,
00:57:45.200 --> 00:57:46.590
then the clause is satisfied.
00:57:46.590 --> 00:57:48.380
I need all the clauses
to be satisfied
00:57:48.380 --> 00:57:50.720
because I need to cover
these points for all
00:57:50.720 --> 00:57:53.760
the instances of these clauses.
00:57:53.760 --> 00:57:55.310
And that's how it works.
00:57:55.310 --> 00:57:57.540
Now, slight catch.
00:57:57.540 --> 00:57:59.710
If you do this,
not all the points
00:57:59.710 --> 00:58:01.740
will be covered, even so.
00:58:04.650 --> 00:58:05.970
Maybe all of these are true.
00:58:05.970 --> 00:58:07.890
And so they're all left behind.
00:58:07.890 --> 00:58:10.580
And I can only cover one
of them with the clause.
00:58:10.580 --> 00:58:12.560
It's a little messy.
00:58:12.560 --> 00:58:17.710
You need another gadget, which
is called garbage collection.
00:58:23.910 --> 00:58:26.650
I don't want to spend
too much time on it.
00:58:26.650 --> 00:58:30.900
But you have two dots.
00:58:30.900 --> 00:58:42.610
And then you have every
single xi-- these dots,
00:58:42.610 --> 00:58:47.490
all true and false dots.
00:58:47.490 --> 00:58:49.410
And you're going to
have this triple,
00:58:49.410 --> 00:58:52.690
and this triple, and this
triple, and this triple,
00:58:52.690 --> 00:58:53.220
and so on.
00:58:53.220 --> 00:58:56.230
It looks an awful
lot like a clause.
00:58:56.230 --> 00:58:59.190
But this is like a clause
that's connected to everybody
00:58:59.190 --> 00:59:00.820
in the entire universe.
00:59:00.820 --> 00:59:03.260
And you repeat this
the appropriate number
00:59:03.260 --> 00:59:09.380
of times, which is
something like sum of nx
00:59:09.380 --> 00:59:10.670
minus the number of clauses.
00:59:14.100 --> 00:59:14.850
OK, why?
00:59:14.850 --> 00:59:16.880
Because if you
look at a wheel, it
00:59:16.880 --> 00:59:21.780
has size 2 times nx
for a variable x.
00:59:21.780 --> 00:59:26.300
And half of the points
will be left uncovered.
00:59:26.300 --> 00:59:29.630
So that means nx of
them will be uncovered.
00:59:29.630 --> 00:59:33.110
Then the clause, if everything
works out correctly,
00:59:33.110 --> 00:59:37.710
the clause will cover
exactly one of those points.
00:59:37.710 --> 00:59:40.090
So for each clause we
cover one of the points.
00:59:40.090 --> 00:59:43.270
That means this
difference is exactly how
00:59:43.270 --> 00:59:45.470
many points are left uncovered.
00:59:45.470 --> 00:59:48.720
And so we make this gadget
exactly that many times.
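To make the counting concrete, here is a small sketch. It assumes each clause is a tuple of nonzero integers, where k stands for xk and -k for xk bar; that encoding and the function name are my own assumptions, not something from the lecture. It computes the wheel size 2 times nx for each variable and the number of garbage collection gadgets, sum of nx minus the number of clauses.

```python
from collections import Counter


def gadget_sizes(clauses):
    """Return (wheel_sizes, garbage_copies) for a 3SAT formula.

    clauses: list of tuples of nonzero ints; k means x_k, -k means not x_k.
    """
    occurrences = Counter()
    for clause in clauses:
        # n_x counts the clauses that contain x or its negation
        for var in {abs(literal) for literal in clause}:
            occurrences[var] += 1
    wheel_sizes = {var: 2 * n for var, n in occurrences.items()}
    # each wheel leaves n_x points uncovered; each clause covers one of them
    garbage_copies = sum(occurrences.values()) - len(clauses)
    return wheel_sizes, garbage_copies
```

For the two clauses (x1 or x2 bar or x3) and (x1 bar or x2 or x4), the wheels for x1 and x2 get size 4, the wheels for x3 and x4 get size 2, and you need 4 garbage collectors.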
00:59:48.720 --> 00:59:50.530
And it's free to cover anybody.
00:59:50.530 --> 00:59:53.860
So whatever is left over,
this garbage collector
00:59:53.860 --> 00:59:55.160
will clean up.
00:59:55.160 --> 00:59:57.200
And if we use exactly
the right number of them,
00:59:57.200 --> 01:00:01.170
this garbage collector won't
run out of things to collect.
01:00:01.170 --> 01:00:03.830
So this makes the proof messy.
01:00:03.830 --> 01:00:07.712
But I want to move on to
somewhat simpler proofs
01:00:07.712 --> 01:00:08.670
for other problems.
01:00:08.670 --> 01:00:09.832
Yeah?
01:00:09.832 --> 01:00:12.262
AUDIENCE: Real quick,
what about the t or f
01:00:12.262 --> 01:00:14.692
points that we didn't cover
because we didn't actually
01:00:14.692 --> 01:00:15.670
need that many?
01:00:15.670 --> 01:00:16.503
ERIC DEMAINE: Right.
01:00:16.503 --> 01:00:19.210
So this also includes the points
that weren't even connected
01:00:19.210 --> 01:00:20.950
to clauses.
01:00:20.950 --> 01:00:23.330
I think this is the right
number no matter what,
01:00:23.330 --> 01:00:26.180
because this is counting the
total number of uncovered guys,
01:00:26.180 --> 01:00:28.310
whether they're connected
to clauses or not.
01:00:28.310 --> 01:00:30.890
Each clause will, in
a satisfied situation,
01:00:30.890 --> 01:00:33.341
it will cover exactly
one of those points.
01:00:33.341 --> 01:00:35.090
The ones that aren't
connected to the clauses
01:00:35.090 --> 01:00:37.010
won't be covered at
all, but that will still
01:00:37.010 --> 01:00:38.000
be in this difference.
01:00:38.000 --> 01:00:39.750
So yeah, it's good
to check that.
01:00:39.750 --> 01:00:42.250
The first time I wrote this
down I forgot about those points
01:00:42.250 --> 01:00:42.958
and got it wrong.
01:00:42.958 --> 01:00:45.930
But I think this is
right, hopefully.
01:00:45.930 --> 01:00:47.980
I did not come up
with this proof.
01:00:47.980 --> 01:00:49.950
Garey and Johnson
I think-- or no.
01:00:49.950 --> 01:00:52.906
This is-- I forgot.
01:00:52.906 --> 01:00:54.710
Yeah, this is a Garey
and Johnson proof.
01:00:54.710 --> 01:00:58.610
There's a cool book from the
late '70s by Garey and Johnson,
01:00:58.610 --> 01:01:04.320
does a lot of NP-completeness,
if you're curious.
01:01:04.320 --> 01:01:06.702
All right, so
hopefully you believe
01:01:06.702 --> 01:01:08.160
three dimensional
matching is hard.
01:01:08.160 --> 01:01:10.520
Now I'm going to use it to prove
that some very different types
01:01:10.520 --> 01:01:11.650
of problems are hard.
01:01:11.650 --> 01:01:14.270
This is a kind of
graph theory problem.
01:01:14.270 --> 01:01:18.070
You'll see more graph theory
problems in recitation.
01:01:18.070 --> 01:01:21.800
This one, I can
erase 3SAT and Mario.
01:01:21.800 --> 01:01:27.660
So in the world, most
NP-hardness proofs
01:01:27.660 --> 01:01:31.520
are reductions from 3SAT,
or some variation of 3SAT.
01:01:31.520 --> 01:01:34.450
In some sense, you can think
of three dimensional matching
01:01:34.450 --> 01:01:37.870
as kind of like a
version of 3SAT,
01:01:37.870 --> 01:01:39.840
but it's a little
bit more stringent.
01:01:39.840 --> 01:01:44.980
And that stringency helps
us to do other reductions.
01:01:44.980 --> 01:01:47.070
So here's another
problem where we'll
01:01:47.070 --> 01:01:51.710
reduce from three
dimensional matching.
01:01:51.710 --> 01:01:52.940
It's called subset sum.
01:02:00.220 --> 01:02:09.830
So you're given n
integers, a1 up to an.
01:02:09.830 --> 01:02:15.090
And you're given a target
sum, also an integer.
01:02:15.090 --> 01:02:17.150
Call it t.
01:02:17.150 --> 01:02:21.160
What you'd like to know is, is
there a subset of the integers
01:02:21.160 --> 01:02:22.550
that adds up to that target.
01:02:34.380 --> 01:02:36.230
Can you choose a
subset of the integers
01:02:36.230 --> 01:02:39.800
so that-- I'll write
it the sum of S.
01:02:39.800 --> 01:02:43.670
But what this means is
the sum over the ai's that
01:02:43.670 --> 01:02:47.576
are in S of the value ai.
01:02:47.576 --> 01:02:51.540
I want that to equal t.
01:02:51.540 --> 01:02:52.980
So this is the definition.
01:02:52.980 --> 01:02:55.206
This is the constraint.
01:02:55.206 --> 01:02:56.580
So I give you a
bunch of numbers.
01:02:56.580 --> 01:02:58.480
Do any subset of
them add up to t?
01:02:58.480 --> 01:02:59.720
That's all this is asking.
01:02:59.720 --> 01:03:02.740
This problem is NP-hard.
01:03:02.740 --> 01:03:06.265
It's NP-complete, in fact, because
you can guess which integers
01:03:06.265 --> 01:03:08.140
should go in the subset,
and then add them up
01:03:08.140 --> 01:03:09.264
to see if you got it right.
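As a sketch of that guess-and-check idea, assuming the guess is given as a collection of indices into the list (the function names are illustrative), the check is a single sum. Trying every possible guess deterministically is the exponential brute force:

```python
from itertools import combinations


def check_guess(a, t, chosen):
    """Verify a guessed subset; chosen holds indices into the list a."""
    return sum(a[i] for i in chosen) == t


def subset_sum_brute_force(a, t):
    """Try every subset of indices; exponential in len(a)."""
    for size in range(len(a) + 1):
        for chosen in combinations(range(len(a)), size):
            if check_guess(a, t, chosen):
                return True
    return False
```

Only check_guess has to run in polynomial time for the problem to be in NP; the brute force loop is just there to show what the guessing replaces.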
01:03:11.886 --> 01:03:14.320
It is NP-hard, but
it's something special
01:03:14.320 --> 01:03:15.960
we call weakly NP-hard.
01:03:22.430 --> 01:03:28.687
And why don't I come back to the
definition of that in a moment?
01:03:28.687 --> 01:03:30.020
Let me first show you the proof.
01:03:30.020 --> 01:03:33.510
It's actually really easy
now that we have this three
01:03:33.510 --> 01:03:37.120
dimensional matching problem.
01:03:37.120 --> 01:03:38.820
It's pretty cool.
01:03:38.820 --> 01:03:42.530
So these numbers are
going to be huge.
01:03:42.530 --> 01:03:48.600
What we're going to say
is, let's view-- so again,
01:03:48.600 --> 01:03:51.587
we're given a three
dimensional matching instance.
01:03:51.587 --> 01:03:52.670
Get the directions right.
01:03:52.670 --> 01:03:54.520
We're given a set of triples.
01:03:54.520 --> 01:03:59.520
We want to solve this problem
by reducing it to a subset sum.
01:03:59.520 --> 01:04:04.688
So we get to construct integers
that represent triples.
01:04:04.688 --> 01:04:06.220
That's what we're going to do.
01:04:06.220 --> 01:04:09.477
So here we go.
01:04:09.477 --> 01:04:10.560
We get to choose a number.
01:04:10.560 --> 01:04:16.440
So I'm going to think of
them in a particular base, b,
01:04:16.440 --> 01:04:23.420
which is going to be 1
plus the max of the nxi's.
01:04:23.420 --> 01:04:26.310
So again, this is the number
of occurrences of variable xi
01:04:26.310 --> 01:04:28.610
in a true or false form.
01:04:28.610 --> 01:04:31.490
So I take the maximum occurrence
of any variable, add 1.
01:04:31.490 --> 01:04:32.490
That's my base.
01:04:32.490 --> 01:04:34.170
It just has to be large enough.
01:04:37.010 --> 01:04:43.420
And this is basically the
entire reduction, is one line.
01:04:43.420 --> 01:04:48.350
If I have three triples--
if I have a triple xi, xj,
01:04:48.350 --> 01:04:53.530
xk, I'm going to convert
that into a number that
01:04:53.530 --> 01:05:03.150
looks like this where the one
positions are-- I don't really
01:05:03.150 --> 01:05:06.650
know the order, but
they are i, j, and k.
01:05:06.650 --> 01:05:08.190
Everything else is zero.
01:05:08.190 --> 01:05:11.300
And this is in
base b, not base 2.
01:05:11.300 --> 01:05:12.460
It's a little weird.
01:05:12.460 --> 01:05:16.120
All my digits are 0 or
1, but I'm in base b.
01:05:16.120 --> 01:05:18.500
And three of the digits are 1.
01:05:18.500 --> 01:05:19.720
And the rest are zero.
01:05:19.720 --> 01:05:20.440
Why?
01:05:20.440 --> 01:05:22.680
Because of my target sum.
01:05:22.680 --> 01:05:28.470
Target sum is going
to be 1111111111.
01:05:28.470 --> 01:05:31.580
So this number,
in algebra, you'd
01:05:31.580 --> 01:05:36.280
write this as b to the i plus
b to the j plus b to the k.
01:05:36.280 --> 01:05:43.437
This you would write as the
sum of b to the i for all i.
01:05:43.437 --> 01:05:44.520
Do you see why this works?
01:05:44.520 --> 01:05:45.992
It's actually really simple.
01:05:51.890 --> 01:05:54.100
For this instance,
my goal is to choose
01:05:54.100 --> 01:05:57.890
a subset of these numbers
that add up to this number.
01:05:57.890 --> 01:05:59.330
How could that possibly happen?
01:05:59.330 --> 01:06:01.810
Well, I've got to choose--
every time I choose
01:06:01.810 --> 01:06:06.930
one of the numbers, those three
digits get set to 1 in my sum.
01:06:06.930 --> 01:06:10.790
If I ever have a collision,
if I add two 1s together,
01:06:10.790 --> 01:06:12.654
I'm going to get a 2.
01:06:12.654 --> 01:06:14.320
That's not good,
because once I get a 2,
01:06:14.320 --> 01:06:16.390
I'll never be able
to get back to a 1,
01:06:16.390 --> 01:06:20.150
because my base is really big.
01:06:20.150 --> 01:06:22.140
This base is designed
so that the total-- this
01:06:22.140 --> 01:06:26.060
is the total number
of colliding 1s.
01:06:26.060 --> 01:06:27.504
So we set it one
larger than that,
01:06:27.504 --> 01:06:29.920
which means you'll never get
a carry when you're adding up
01:06:29.920 --> 01:06:31.000
in this base.
01:06:31.000 --> 01:06:34.420
That's why I set the base to
be something large, not base 2.
01:06:34.420 --> 01:06:39.040
Base 2 might work, but
this is much safer.
01:06:39.040 --> 01:06:44.540
So what that means is for each
of these 1s in the target sum,
01:06:44.540 --> 01:06:47.140
I've got to find a
triple that has those 1s.
01:06:47.140 --> 01:06:48.902
And those triples can't overlap.
01:06:48.902 --> 01:06:51.360
So that means choosing a set
of numbers that add up to this
01:06:51.360 --> 01:06:54.460
is exactly the same as
choosing a set of triples that
01:06:54.460 --> 01:06:56.970
covers all of the elements.
01:06:56.970 --> 01:07:03.150
Done, super easy once you
have the right problem.
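Here is a minimal sketch of that one-line reduction. It assumes the elements of X union Y union Z are given as one list, so each element gets a digit position, and it takes the base to be 1 plus the largest number of triples that any single element appears in. That is my reading of "large enough so there is never a carry"; the function name is hypothetical.

```python
from collections import Counter


def three_dm_to_subset_sum(elements, triples):
    """Map a 3DM instance to (numbers, target) for subset sum.

    elements: list of all elements in X, Y, and Z (distinct labels).
    triples: list of allowed triples over those elements.
    """
    position = {element: i for i, element in enumerate(elements)}
    counts = Counter(e for triple in triples for e in triple)
    # base is one more than the most 1s that can pile up in any digit
    b = 1 + max(counts.values(), default=0)
    # each triple becomes b^i + b^j + b^k: three 1 digits in base b
    numbers = [sum(b ** position[e] for e in triple) for triple in triples]
    # target is 111...1 in base b, one digit per element
    target = sum(b ** i for i in range(len(elements)))
    return numbers, target
```

With six elements and the triples (x1, y1, z1), (x2, y2, z2), (x1, y2, z2), the base comes out to 3, and the first two numbers add up to the target exactly, matching the one valid matching.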
01:07:03.150 --> 01:07:05.690
OK, good.
01:07:05.690 --> 01:07:08.750
Now why do I call
this weakly NP-hard?
01:07:08.750 --> 01:07:12.010
Because these numbers are giant.
01:07:12.010 --> 01:07:18.240
If I have n elements
in X, Y, Z over there--
01:07:18.240 --> 01:07:22.130
I guess here they're
called xi, yk, zk.
01:07:22.130 --> 01:07:24.490
Sorry, maybe I should've
called them that here.
01:07:24.490 --> 01:07:26.980
Doesn't matter.
01:07:26.980 --> 01:07:34.120
If I have n of those elements
in X union Y union Z,
01:07:34.120 --> 01:07:38.760
the number of digits here is n.
01:07:38.760 --> 01:07:46.030
So the number of
digits is order n.
01:07:46.030 --> 01:07:48.430
This is fine from an
NP-completeness standpoint.
01:07:48.430 --> 01:07:49.600
This is polynomial size.
01:07:49.600 --> 01:07:55.170
The number of digits in my
numbers is a polynomial.
01:07:55.170 --> 01:07:56.730
And this base is
also pretty small.
01:07:56.730 --> 01:07:58.124
So if you wrote
it out in binary,
01:07:58.124 --> 01:07:59.290
it would also be polynomial.
01:07:59.290 --> 01:08:01.600
So you just lose a log factor.
01:08:01.600 --> 01:08:10.460
But the size of the numbers, the
actual values of the numbers,
01:08:10.460 --> 01:08:12.020
is exponential.
01:08:17.350 --> 01:08:20.550
With weak NP-hardness,
that's allowed.
01:08:20.550 --> 01:08:23.920
With strong NP-hardness,
that's forbidden.
01:08:23.920 --> 01:08:26.460
In strong NP-hardness, you
want the values of the numbers
01:08:26.460 --> 01:08:28.370
to be polynomial.
01:08:28.370 --> 01:08:30.950
So in this case, the
number of bits is small,
01:08:30.950 --> 01:08:34.220
but the actual values
are giant, because you
01:08:34.220 --> 01:08:36.271
have to exponentiate.
01:08:36.271 --> 01:08:37.550
It would be cool.
01:08:37.550 --> 01:08:39.279
And this problem is
only weakly NP-hard.
01:08:39.279 --> 01:08:41.700
Maybe you actually know
a pseudo-polynomial time
01:08:41.700 --> 01:08:42.660
algorithm for this.
01:08:42.660 --> 01:08:44.500
It's basically a knapsack.
01:08:44.500 --> 01:08:51.660
If these numbers have polynomial
value, then you can basically,
01:08:51.660 --> 01:08:53.800
in your subproblems in
dynamic programming,
01:08:53.800 --> 01:08:57.609
you can write down
the number t and just
01:08:57.609 --> 01:08:59.149
solve it for all values of t.
01:08:59.149 --> 01:09:01.790
And it's easy to solve
it in polynomial time,
01:09:01.790 --> 01:09:05.302
polynomial in the
integer values.
01:09:05.302 --> 01:09:07.260
So we call that
pseudo-polynomial, because it's
01:09:07.260 --> 01:09:08.229
not really polynomial.
01:09:08.229 --> 01:09:10.065
It's not polynomial in
the number of digits
01:09:10.065 --> 01:09:11.689
that you have to
write down the number.
01:09:11.689 --> 01:09:13.800
It's polynomial in the values.
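One common way to write that dynamic program, sketched here under the assumption that the integers and the target are nonnegative (the function name is illustrative), keeps a table of which sums from 0 up to t are reachable:

```python
def subset_sum_dp(a, t):
    """Pseudo-polynomial subset sum for nonnegative integers.

    Runs in O(len(a) * t) time, which is polynomial in the value t,
    not in the number of digits needed to write t down.
    """
    reachable = [False] * (t + 1)
    reachable[0] = True               # the empty subset sums to 0
    for value in a:
        # go downward so each integer is used at most once
        for s in range(t, value - 1, -1):
            if reachable[s - value]:
                reachable[s] = True
    return reachable[t]
```

The table has t plus 1 entries, so if t is exponential in the number of digits, as it is in the reduction from three dimensional matching, this takes exponential time in the input size.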
01:09:13.800 --> 01:09:18.180
Weak NP-hardness goes together
with pseudo-polynomial.
01:09:18.180 --> 01:09:20.540
That's kind of a matching
result. It says, look,
01:09:20.540 --> 01:09:22.890
pseudo-polynomial is
the best you can do.
01:09:22.890 --> 01:09:25.520
You can't hope for a
polynomial because if you
01:09:25.520 --> 01:09:30.330
let the numbers get huge, then
the problem is NP-complete.
01:09:30.330 --> 01:09:34.029
But if you force the numbers to
be small, this problem is easy.
01:09:34.029 --> 01:09:37.010
So subset sum is a little
funny in that sense.
01:09:40.131 --> 01:09:40.630
Cool.
01:10:11.400 --> 01:10:15.327
Let me tell you about
another problem, partition.
01:10:19.360 --> 01:10:21.560
So partition is pretty
much the same set up.
01:10:21.560 --> 01:10:23.200
I'm given n integers.
01:10:31.880 --> 01:10:33.860
Let's say they're positive.
01:10:33.860 --> 01:10:39.040
And I want to know, is
there a subset-- I'm not
01:10:39.040 --> 01:10:41.200
given a target sum t.
01:10:41.200 --> 01:10:42.655
Target sum is basically forced.
01:10:50.240 --> 01:10:53.880
What I would like is the
sum of all the values in S
01:10:53.880 --> 01:10:55.950
to equal the sum
of all the values
01:10:55.950 --> 01:11:00.490
not in S. That's A minus
S, which in other words
01:11:00.490 --> 01:11:04.790
is going to be the sum of
all values in A divided by 2.
01:11:04.790 --> 01:11:08.610
So this is called partition
because you're taking a set,
01:11:08.610 --> 01:11:11.990
you're splitting it into
two halves of equal sum.
01:11:11.990 --> 01:11:14.290
Every element has to go
in one of the two halves.
01:11:14.290 --> 01:11:19.510
And they're called S and A minus
S, like cuts in the flow stuff.
01:11:19.510 --> 01:11:21.227
And you want those
two halves to have
01:11:21.227 --> 01:11:22.810
exactly the same
sum, which means they
01:11:22.810 --> 01:11:24.620
will be the sum divided by 2.
01:11:24.620 --> 01:11:26.320
So that better be
even, otherwise
01:11:26.320 --> 01:11:27.840
it's not going to be possible.
01:11:27.840 --> 01:11:30.190
So again, you want to
decide whether this
01:11:30.190 --> 01:11:34.720
is possible or
impossible, yes or no.
01:11:34.720 --> 01:11:38.200
I claim this problem is
also weakly NP-complete,
01:11:38.200 --> 01:11:42.934
and we can reduce from
subset sum to partition.
01:11:42.934 --> 01:11:45.350
This is a little interesting
because partition is actually
01:11:45.350 --> 01:11:48.560
a special case of subset sum.
01:11:48.560 --> 01:11:53.500
It is the case
where t equals this.
01:11:53.500 --> 01:11:56.490
Subset sum, you're trying to
solve it no matter what t is.
01:11:56.490 --> 01:11:58.280
t is a given input.
01:11:58.280 --> 01:12:00.750
So there's more
instances over here.
01:12:00.750 --> 01:12:03.150
Some of them, some
of these instances
01:12:03.150 --> 01:12:05.650
are the case where t
equals the sum over 2.
01:12:05.650 --> 01:12:07.010
Those are partition instances.
01:12:07.010 --> 01:12:09.550
So this is like a subset
of the possible inputs
01:12:09.550 --> 01:12:11.900
as over there, which
means this problem is
01:12:11.900 --> 01:12:16.470
easier than this one--
no harder anyway.
01:12:16.470 --> 01:12:21.580
In other words, I can reduce
partition to subset sum.
01:12:21.580 --> 01:12:23.750
I just compute this
value and set that to t,
01:12:23.750 --> 01:12:25.880
and then leave the a's alone.
01:12:25.880 --> 01:12:28.350
That will reduce
partition to subset sum.
01:12:28.350 --> 01:12:30.130
But that's not the
direction I want.
01:12:30.130 --> 01:12:32.690
I want to reduce from subset
sum, a problem I can prove
01:12:32.690 --> 01:12:35.360
is NP-complete, to
partition, because I
01:12:35.360 --> 01:12:37.400
want to prove that
partition is NP-complete.
01:12:37.400 --> 01:12:42.090
So in this case, there's an easy
reduction in both directions.
01:12:42.090 --> 01:12:44.750
This direction is
a little harder.
01:12:44.750 --> 01:12:51.410
So reduction from subset sum.
01:12:55.440 --> 01:12:57.700
So I'm given a bunch of ai's.
01:12:57.700 --> 01:12:59.380
I'm not going to touch them.
01:12:59.380 --> 01:13:01.010
And I'm given a target sum t.
01:13:01.010 --> 01:13:04.820
And I basically want to make
that target sum into this half.
01:13:04.820 --> 01:13:09.480
To do that, I'm going to
add two numbers to my set.
01:13:09.480 --> 01:13:17.660
So I'm going to let sigma
be the sum of the given a's.
01:13:17.660 --> 01:13:21.300
And then I'm going to add--
so I'm given a1 through an.
01:13:21.300 --> 01:13:30.720
I'm going to add an plus 1,
is going to be sigma plus t.
01:13:30.720 --> 01:13:40.822
And I'm going to add an plus
2 to be 2 sigma minus t.
01:13:45.050 --> 01:13:45.990
Why?
01:13:45.990 --> 01:13:48.040
So these are two
basically huge numbers.
01:13:51.100 --> 01:13:52.720
Because sigma is
bigger than-- I mean,
01:13:52.720 --> 01:13:54.386
it's the sum of all
the numbers, so it's
01:13:54.386 --> 01:13:56.140
bigger than all of them.
01:13:56.140 --> 01:13:58.320
And so imagine for
a moment that I
01:13:58.320 --> 01:14:02.020
put these two in the same
side of the partition.
01:14:02.020 --> 01:14:04.100
I put them both in S
or I put them both out
01:14:04.100 --> 01:14:09.060
of S. Their sum by
themselves is 3 sigma.
01:14:09.060 --> 01:14:10.450
The t's cancel.
01:14:10.450 --> 01:14:13.940
Whereas all the other
items, their sum is sigma.
01:14:13.940 --> 01:14:15.622
So I'm hosed.
01:14:15.622 --> 01:14:17.830
If I have 3 sigma on one
side and sigma on the other,
01:14:17.830 --> 01:14:19.830
I'm not going to
make them equal.
01:14:19.830 --> 01:14:23.670
So in fact, these two elements
have to be on opposite sides.
01:14:23.670 --> 01:14:25.690
So there's a side
that has sigma plus t.
01:14:25.690 --> 01:14:28.877
There's a side that has
2 sigma minus t.
01:14:28.877 --> 01:14:31.210
And then there's all the other
n items, and some of them
01:14:31.210 --> 01:14:32.990
are going to go to
this side, some of them
01:14:32.990 --> 01:14:35.660
are going to go to this side.
01:14:35.660 --> 01:14:38.310
Their total value is sigma.
01:14:38.310 --> 01:14:40.260
Right now this is
close to sigma.
01:14:40.260 --> 01:14:41.540
This is close to 2 sigma.
01:14:41.540 --> 01:14:43.650
So they have to kind
of meet in the middle.
01:14:43.650 --> 01:14:54.540
In fact, what you'll have to do
is add sigma minus t over here
01:14:54.540 --> 01:15:01.194
and add t over here.
01:15:01.194 --> 01:15:02.360
Think about it for a second.
01:15:02.360 --> 01:15:04.910
If I add sigma minus t,
this comes out to 2 sigma.
01:15:04.910 --> 01:15:07.622
If I add t to this, this
comes out to 2 sigma.
01:15:07.622 --> 01:15:09.330
That would be good
because they're equal.
01:15:09.330 --> 01:15:11.490
And notice that this
is sigma minus t.
01:15:11.490 --> 01:15:12.130
This is t.
01:15:12.130 --> 01:15:13.770
Their sum is sigma.
01:15:13.770 --> 01:15:16.870
So in fact, it has
to be like this.
01:15:16.870 --> 01:15:20.210
You add something over here, and
sigma minus something over here
01:15:20.210 --> 01:15:21.980
for all the other ai's.
01:15:21.980 --> 01:15:24.920
And the something has to be t
in order for these two values
01:15:24.920 --> 01:15:26.390
to equalize.
01:15:26.390 --> 01:15:30.420
So in order to solve this
slightly larger partition
01:15:30.420 --> 01:15:33.640
problem, you have to actually
solve the subset sum problem
01:15:33.640 --> 01:15:37.540
because you have to construct
a subset that adds up to t.
01:15:37.540 --> 01:15:40.260
t was an arbitrary given value.
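A short sketch of this reduction, assuming positive integers and a target t between 0 and sigma. The brute force partition checker is only there to try small cases, and both function names are hypothetical:

```python
from itertools import combinations


def subset_sum_to_partition(a, t):
    """Add the two big numbers sigma + t and 2 * sigma - t to the set."""
    sigma = sum(a)
    return list(a) + [sigma + t, 2 * sigma - t]


def has_partition(values):
    """Exponential brute force check, for small examples only."""
    total = sum(values)
    if total % 2:
        return False                  # odd total can't split evenly
    half = total // 2
    return any(
        sum(side) == half
        for size in range(len(values) + 1)
        for side in combinations(values, size)
    )
```

For a = 3, 5, 7 and t = 10, sigma is 15 and the new set is 3, 5, 7, 25, 20. The split 25 plus 5 against 20 plus 3 plus 7 gives 30 on each side, and the 3 and 7 sitting with 2 sigma minus t are exactly the subset that adds up to t.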
01:15:40.260 --> 01:15:43.820
So this is pretty nifty.
01:15:43.820 --> 01:15:47.990
We're adding some values so that
the new target sum is the 50/50
01:15:47.990 --> 01:15:51.330
split when we're
given some values that
01:15:51.330 --> 01:15:55.140
have an arbitrary target sum.
01:15:55.140 --> 01:15:59.380
So partition is
weakly NP-complete.
01:15:59.380 --> 01:16:03.220
Let me go to rectangle packing.
01:16:27.690 --> 01:16:30.810
So rectangle packing-- I'm
going to draw a picture.
01:16:30.810 --> 01:16:34.780
I give you a bunch of
rectangles of varying sizes.
01:16:34.780 --> 01:16:37.692
And I give you a
target rectangle.
01:16:37.692 --> 01:16:42.240
Let's call it T.
These are the Ri's.
01:16:42.240 --> 01:16:48.900
I want to put these
rectangles into this picture
01:16:48.900 --> 01:16:52.000
without any overlaps.
01:16:52.000 --> 01:16:54.630
Each of these rectangles
here corresponds to one
01:16:54.630 --> 01:16:56.510
of the rectangles over here.
01:16:56.510 --> 01:17:01.180
So I'll tell you that the sum
of the areas of these rectangles
01:17:01.180 --> 01:17:04.370
is equal to the area of T.
And the question is, can you
01:17:04.370 --> 01:17:08.080
pack those rectangles into
T without any overlaps,
01:17:08.080 --> 01:17:10.850
and therefore without any gaps,
because the areas are exactly
01:17:10.850 --> 01:17:12.090
the same.
01:17:12.090 --> 01:17:16.050
I claim this problem
is weakly NP-hard-- I
01:17:16.050 --> 01:17:21.065
guess NP-complete by
reduction from partition.
01:17:24.740 --> 01:17:31.800
This will be super easy
if you followed what
01:17:31.800 --> 01:17:33.270
the definition of partition is.
01:17:33.270 --> 01:17:36.490
We're given some integers ai.
01:17:36.490 --> 01:17:41.100
And we're going to take each of
them and convert them into a,
01:17:41.100 --> 01:17:46.330
let's say, 1 by 3ai rectangle.
01:17:46.330 --> 01:17:49.490
Three is to avoid some
rotation we'll see.
01:17:49.490 --> 01:17:51.225
And then we're also
given the target sum.
01:17:51.225 --> 01:17:53.330
Oh no, the target sum isn't given.
01:17:53.330 --> 01:17:55.210
Target sum is the sum over 2.
01:17:55.210 --> 01:17:59.520
But anyway, we're going to
build our target rectangle
01:17:59.520 --> 01:18:06.970
to be-- it's actually
going to be really big.
01:18:06.970 --> 01:18:13.820
It's going to be 2 by 3 times t.
01:18:13.820 --> 01:18:16.020
So this is that thing.
01:18:16.020 --> 01:18:21.580
So this is 3/2 sum of the a's.
01:18:21.580 --> 01:18:24.720
OK, that's about it.
01:18:24.720 --> 01:18:26.600
In order to pack these
rectangles into here,
01:18:26.600 --> 01:18:28.627
because each of them
is at least three long,
01:18:28.627 --> 01:18:29.960
you cannot pack them vertically.
01:18:29.960 --> 01:18:31.790
They have to be horizontal.
01:18:31.790 --> 01:18:35.320
So in fact what your
packing will look like is
01:18:35.320 --> 01:18:38.070
there'll be the top half
and the bottom half.
01:18:38.070 --> 01:18:40.620
And the top half, the total
length of those rectangles
01:18:40.620 --> 01:18:45.940
has to add up to 3/2 sum of A.
Everything was scaled up by 3,
01:18:45.940 --> 01:18:49.690
so that's 1/2 the sum of A on
the top and the bottom.
01:18:49.690 --> 01:18:50.790
That's a partition.
01:18:50.790 --> 01:18:52.620
In order to pack the
rectangles into here,
01:18:52.620 --> 01:18:55.570
you have to solve the partition
problem, and vice versa.
01:18:55.570 --> 01:18:57.580
Easy.
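A rough sketch of this partition-to-rectangle-packing construction: each integer ai becomes a 1 by 3ai rectangle, and the target is 2 by 3t with t equal to half the sum. The function names and the (height, width) tuple layout are assumptions made for illustration. The second function only checks one direction, namely that a given partition yields a two-row packing.

```python
def partition_to_rectangle_packing(a):
    """Build the rectangle packing instance for a Partition instance a.

    Returns (rects, target), each rectangle given as (height, width).
    Assumes sum(a) is even; an odd sum is trivially a "no" instance.
    """
    total = sum(a)
    if total % 2:
        raise ValueError("odd sum: no partition exists")
    t = total // 2
    rects = [(1, 3 * ai) for ai in a]  # scaling by 3 rules out rotation
    target = (2, 3 * t)                # two rows, each of length 3t
    return rects, target


def packing_from_partition(a, top):
    """Place rectangles whose indices are in `top` in row 0, the rest in row 1.

    Returns a list of (index, row, x_offset), or None if the two rows
    do not both fill the target width exactly.
    """
    rects, (_, width) = partition_to_rectangle_packing(a)
    top = set(top)
    x = [0, 0]  # next free x position in each row
    placements = []
    for i, (_, w) in enumerate(rects):
        row = 0 if i in top else 1
        placements.append((i, row, x[row]))
        x[row] += w
    return placements if x[0] == x[1] == width else None
```

With `a = [1, 2, 3, 4]` the target is 2 by 15, and choosing indices 0 and 3 for the top row fills both rows exactly.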
01:18:57.580 --> 01:19:11.815
OK, let me show you one
more thing, jigsaw puzzles.
01:19:16.790 --> 01:19:21.360
This is not the jigsaw puzzles
you grew up on, somewhat more
01:19:21.360 --> 01:19:23.150
generalized.
01:19:23.150 --> 01:19:27.286
So a piece is going to
look something like this.
01:19:30.480 --> 01:19:33.860
I drew them
intentionally different.
01:19:33.860 --> 01:19:36.820
So on each, you
have a unit square.
01:19:36.820 --> 01:19:39.620
Some of the sides can be flat.
01:19:39.620 --> 01:19:40.840
Some of them can be tabs.
01:19:40.840 --> 01:19:42.500
Some of them can be pockets.
01:19:42.500 --> 01:19:45.000
Each tab and pocket has a shape.
01:19:45.000 --> 01:19:48.110
And they're not in a perfect
matching with each other.
01:19:48.110 --> 01:19:50.950
So there could be
seven of these tabs
01:19:50.950 --> 01:19:53.550
and seven of these pockets,
all the same shape.
01:19:53.550 --> 01:19:55.810
This is what you might call
ambiguous jigsaw puzzles.
01:19:55.810 --> 01:19:58.240
Plus, there is no
image on the piece,
01:19:58.240 --> 01:20:01.900
so this is like
hardcore jigsaw puzzles.
01:20:01.900 --> 01:20:05.100
This is NP-complete.
01:20:05.100 --> 01:20:10.860
And what I'd like to do
is to simulate a rectangle
01:20:10.860 --> 01:20:13.110
with a bunch of jigsaw pieces.
01:20:13.110 --> 01:20:15.790
So it would look
something like this.
01:20:24.080 --> 01:20:27.350
If I have a 1 by
something rectangle,
01:20:27.350 --> 01:20:31.370
I'm going to simulate it
with that same something,
01:20:31.370 --> 01:20:34.020
little jigsaw pieces.
01:20:34.020 --> 01:20:38.290
And I'm going to make these
shapes only match each other.
01:20:38.290 --> 01:20:41.140
And so for every
rectangle, they're
01:20:41.140 --> 01:20:43.920
going to have a different shape.
01:20:43.920 --> 01:20:45.694
This one will be squares.
01:20:45.694 --> 01:20:47.860
At that point I ran out of
shapes I can easily draw,
01:20:47.860 --> 01:20:48.970
but you get the idea.
01:20:48.970 --> 01:20:50.980
Each rectangle has
a different shape.
01:20:50.980 --> 01:20:52.930
And so these have to
match to each other.
01:20:52.930 --> 01:20:55.000
You can't mix the
tiles, which means you
01:20:55.000 --> 01:20:56.470
have to build this rectangle.
01:20:56.470 --> 01:20:58.300
You have to build
this rectangle.
01:20:58.300 --> 01:21:00.850
And then if the jigsaw
problem is, can you
01:21:00.850 --> 01:21:02.830
fit these into a
given rectangle,
01:21:02.830 --> 01:21:04.460
then you get rectangle packing.
01:21:04.460 --> 01:21:07.185
But this is not a
valid reduction.
01:21:10.610 --> 01:21:15.050
You can't reduce from partition.
01:21:18.550 --> 01:21:20.150
Why?
01:21:20.150 --> 01:21:26.040
Because these numbers are huge.
01:21:26.040 --> 01:21:28.430
Remember, the values of
the numbers in my partition
01:21:28.430 --> 01:21:30.550
instance can be exponential.
01:21:30.550 --> 01:21:34.970
So if I have a value ai and it's
exponential in my problem size,
01:21:34.970 --> 01:21:38.030
and I tried to make
ai have little tiles,
01:21:38.030 --> 01:21:40.040
that means the number
of jigsaw pieces
01:21:40.040 --> 01:21:42.814
will be exponential in n.
01:21:42.814 --> 01:21:43.480
That's not good.
01:21:43.480 --> 01:21:45.910
That's not allowed.
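To make the blowup concrete, here is a small sketch comparing the size of a partition instance written in binary with the number of unit jigsaw pieces a unary encoding would need. Both helper names are hypothetical, and counting one piece per unit of value is an assumption that follows the picture above.

```python
def input_bits(a):
    """Bits needed to write the integers of a in binary."""
    return sum(max(1, ai.bit_length()) for ai in a)


def unary_piece_count(a):
    """Unit jigsaw pieces needed if each ai becomes a row of ai pieces."""
    return sum(a)


# Four numbers of about 41 bits each: a small input, a huge puzzle.
big = [2**40] * 4
# Four numbers bounded by a polynomial in n = 4 (here n**2): both stay small.
small = [16, 9, 4, 1]
```

For `big`, the input has 164 bits but the puzzle would need 4 * 2**40 pieces, so the reduction cannot even be written down in polynomial time. For `small`, the piece count is 30.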
01:21:45.910 --> 01:21:49.780
This is why weak
NP-hardness is annoying.
01:21:49.780 --> 01:21:54.550
So instead, we need a
strong NP-hard problem.
01:22:00.020 --> 01:22:01.810
This is a problem
that's NP-hard even when
01:22:01.810 --> 01:22:06.090
the numbers are polynomial
in value, not just in size.
01:22:06.090 --> 01:22:07.400
And it's called 4-partition.
01:22:11.670 --> 01:22:16.970
4-partition, you're given
n integers, as usual.
01:22:20.380 --> 01:22:28.110
Say the set is A. And you want
to split those integers
01:22:28.110 --> 01:22:39.990
into n over 4 quadruples
of the same sum.
01:22:46.860 --> 01:22:51.400
So this would be the sum of
A divided by n over 4.
01:22:51.400 --> 01:22:53.240
That's your target sum.
01:22:53.240 --> 01:22:55.300
So before we had to
split into two parts that
01:22:55.300 --> 01:22:56.500
had the same sum.
01:22:56.500 --> 01:22:57.600
That was partition.
01:22:57.600 --> 01:23:00.330
Now we have to split
into n over 4 parts.
01:23:00.330 --> 01:23:04.040
Each part will have exactly
four numbers, four integers.
01:23:04.040 --> 01:23:06.840
And they should all
have the same sum.
01:23:06.840 --> 01:23:09.660
This problem is hard
even when the integers
01:23:09.660 --> 01:23:12.345
have polynomial value.
01:23:12.345 --> 01:23:20.765
So the values are at most
some polynomial in n.
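As a sketch of the problem statement only, here is a brute-force 4-partition solver. It takes exponential time and says nothing about the hardness proof; the function name and the list-of-lists return format are assumptions for illustration.

```python
from itertools import combinations


def four_partition(a):
    """Split a into len(a)/4 quadruples of equal sum, or return None.

    Brute force: always group the first remaining number with some
    three others that reach the target, then recurse on the rest.
    """
    n = len(a)
    if n == 0:
        return []
    if n % 4 or (4 * sum(a)) % n:
        return None
    target = 4 * sum(a) // n  # sum(A) divided by n/4

    def solve(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        for trio in combinations(range(len(rest)), 3):
            if first + sum(rest[i] for i in trio) == target:
                left = [v for i, v in enumerate(rest) if i not in trio]
                tail = solve(left)
                if tail is not None:
                    return [[first] + [rest[i] for i in trio]] + tail
        return None

    return solve(list(a))
```

For `[1, 2, 3, 4, 5, 6, 7, 8]` the target is 18, and one valid answer is the quadruples `[1, 2, 7, 8]` and `[3, 4, 5, 6]`.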
01:23:20.765 --> 01:23:22.890
I won't prove it here, but
it's in my lecture notes
01:23:22.890 --> 01:23:23.639
if you're curious.
01:23:23.639 --> 01:23:27.290
It's like this
proof, but harder.
01:23:27.290 --> 01:23:30.090
You end up, instead of
having n digit numbers,
01:23:30.090 --> 01:23:32.080
you have five digit numbers.
01:23:32.080 --> 01:23:36.130
Each digit only has a polynomial
in n different values.
01:23:36.130 --> 01:23:40.470
So the total value of the
numbers is only polynomial.
01:23:40.470 --> 01:23:43.380
It's like n to the
fifth or something.
01:23:43.380 --> 01:23:46.160
Good news is that
this reduction I just
01:23:46.160 --> 01:23:56.710
gave you is also a
reduction from 4-partition
01:23:56.710 --> 01:23:59.950
because it's the same set up.
01:23:59.950 --> 01:24:01.260
Again, I'm given integers.
01:24:01.260 --> 01:24:05.340
Each integer I'm going to
represent by that many tiles.
01:24:05.340 --> 01:24:07.390
Now the number of tiles
is only polynomial,
01:24:07.390 --> 01:24:09.560
so this is a valid reduction.
01:24:09.560 --> 01:24:11.650
And again, if I have to
pack all of these tiles
01:24:11.650 --> 01:24:14.360
into a rectangular board,
that's exactly the same
01:24:14.360 --> 01:24:17.530
as packing these integers.
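A minimal sketch of the unary encoding, assuming a simplified piece model: each piece is a (left, right) pair where 0 is a flat side, a positive number is a tab of that shape, and a negative number is a pocket of that shape. The top and bottom edges are not modeled, and the names are hypothetical.

```python
def jigsaw_pieces_for_rectangles(lengths):
    """Turn each 1-by-k rectangle into a row of k unit pieces.

    Rectangle number `shape` (starting at 1) uses its own tab/pocket
    shape, so its pieces can only join pieces of the same rectangle.
    """
    pieces = []
    for shape, k in enumerate(lengths, start=1):
        for j in range(k):
            left = 0 if j == 0 else -shape      # pocket, or flat at the start
            right = 0 if j == k - 1 else shape  # tab, or flat at the end
            pieces.append((left, right))
    return pieces


def fits(left_piece, right_piece):
    """True if left_piece's right tab matches right_piece's left pocket."""
    return left_piece[1] > 0 and left_piece[1] + right_piece[0] == 0
```

For lengths `[3, 2]` this produces five pieces, `[(0, 1), (-1, 1), (-1, 0), (0, 2), (-2, 0)]`. The piece count is the sum of the lengths, which is why the values must be polynomial for the reduction to run in polynomial time.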
01:24:17.530 --> 01:24:19.670
Well, I guess I should do
rectangle packing again.
01:24:19.670 --> 01:24:22.590
So this was a proof that rectangle
packing is weakly NP-hard.
01:24:22.590 --> 01:24:24.786
But in fact it's
strongly NP-hard.
01:24:24.786 --> 01:24:26.160
You just change
these dimensions.
01:24:26.160 --> 01:24:33.470
You say well, I need whatever,
n over 4 different parts, each
01:24:33.470 --> 01:24:36.820
of size the sum over n over 4.
01:24:36.820 --> 01:24:38.260
You need some scale factor here.
01:24:38.260 --> 01:24:39.360
Three doesn't work.
01:24:39.360 --> 01:24:43.950
Use n or something-- n and n.
01:24:43.950 --> 01:24:46.330
That will prove that
rectangle packing is actually
01:24:46.330 --> 01:24:49.510
strongly NP-hard because
we're reducing from 4-partition
01:24:49.510 --> 01:24:50.612
instead of partition.
01:24:50.612 --> 01:24:52.320
And then you can reduce
rectangle packing
01:24:52.320 --> 01:24:55.187
to jigsaw puzzles because you
have strong hardness over here.
01:24:55.187 --> 01:24:56.520
Over here we don't have numbers.
01:24:56.520 --> 01:24:59.800
We just have these pieces.
01:24:59.800 --> 01:25:01.710
So whenever you convert
from a number problem
01:25:01.710 --> 01:25:05.110
to a non-number problem, if
you're representing the numbers
01:25:05.110 --> 01:25:07.060
in unary, which is
what's going on here,
01:25:07.060 --> 01:25:09.620
you need strong
NP-hardness for it to work.
01:25:09.620 --> 01:25:11.340
Weak NP-hardness isn't enough.
01:25:11.340 --> 01:25:13.970
Then we get that jigsaw puzzles,
which we know and love,
01:25:13.970 --> 01:25:15.220
are NP-complete.
01:25:15.220 --> 01:25:17.070
That's it.