WEBVTT

00:00:24.910 --> 00:00:26.390
MICHAEL SIPSER:
Welcome, everyone.

00:00:26.390 --> 00:00:29.800
Welcome back to
theory of computation.

00:00:29.800 --> 00:00:38.130
And just to recap
where we are, we

00:00:38.130 --> 00:00:40.470
have been looking at
time complexity and space

00:00:40.470 --> 00:00:41.790
complexity.

00:00:41.790 --> 00:00:48.570
And we just finished
proving what

00:00:48.570 --> 00:00:53.790
are called the hierarchy
theorems, which, in a nutshell,

00:00:53.790 --> 00:00:58.360
basically say that, if you
allow the computational model

00:00:58.360 --> 00:01:00.360
to have a little bit more
resource, a little bit

00:01:00.360 --> 00:01:03.780
more time, a little
bit more space, then

00:01:03.780 --> 00:01:09.500
you can do more things
with certain conditions.

00:01:09.500 --> 00:01:11.890
So we proved that last time.

00:01:11.890 --> 00:01:14.600
It was a proof, basically,
by a diagonalization.

00:01:14.600 --> 00:01:17.620
I don't know if you recognized
the diagonalization there,

00:01:17.620 --> 00:01:21.730
but when you're encoding
a machine by an input

00:01:21.730 --> 00:01:24.550
and then basically running all
possible different machines,

00:01:24.550 --> 00:01:28.910
that's essentially
a diagonalization.

00:01:28.910 --> 00:01:33.050
So today, we're going
to build on that work

00:01:33.050 --> 00:01:35.810
to give an example
of what we call

00:01:35.810 --> 00:01:39.260
a natural intractable problem.

00:01:39.260 --> 00:01:41.755
We'll say a bit more
about what that means.

00:01:41.755 --> 00:01:43.880
And then, we're going to
talk about something which

00:01:43.880 --> 00:01:47.030
is a different topic,
but nevertheless related,

00:01:47.030 --> 00:01:52.220
having to do with oracles and
methods which may or may not

00:01:52.220 --> 00:01:59.750
work to solve the P versus NP
problem, which, of course, is

00:01:59.750 --> 00:02:02.390
a big open problem in the field.

00:02:02.390 --> 00:02:03.470
OK.

00:02:03.470 --> 00:02:07.100
So the time and space
hierarchy theorems--

00:02:07.100 --> 00:02:09.560
because we're going to
be using those today--

00:02:09.560 --> 00:02:13.670
they say that if you give a
little bit more space here-- so

00:02:13.670 --> 00:02:16.190
for space constructible
functions, functions

00:02:16.190 --> 00:02:19.400
that you can actually compute
within the amount of space

00:02:19.400 --> 00:02:24.230
that they specify, you can
show that the things that you

00:02:24.230 --> 00:02:29.060
can do in that much space
is properly larger than what

00:02:29.060 --> 00:02:30.500
you can do in less space.

00:02:30.500 --> 00:02:33.390
And you can prove a similar
slightly weaker fact

00:02:33.390 --> 00:02:36.270
about the time
complexity classes.

00:02:36.270 --> 00:02:43.050
So what that means is that
these classes form a hierarchy.

00:02:43.050 --> 00:02:46.220
So as you add more
time, or let's

00:02:46.220 --> 00:02:49.010
say, in this case, space,
from n squared, to n cubed,

00:02:49.010 --> 00:02:53.810
to n to the 4th, you get
larger and larger classes,

00:02:53.810 --> 00:02:57.660
which I'm kind of illustrating
here by putting a dot there,

00:02:57.660 --> 00:02:59.360
which shows that
there's something

00:02:59.360 --> 00:03:02.180
that we know that's
new in those classes

00:03:02.180 --> 00:03:07.830
as you go up these
different bounds.

00:03:07.830 --> 00:03:09.890
And this is going to be
true for space complexity

00:03:09.890 --> 00:03:12.500
and it's also going to be
true for time complexity.

00:03:17.900 --> 00:03:21.470
And one of the corollaries
that we pointed out last time

00:03:21.470 --> 00:03:30.320
is that, PSPACE is a-- properly
includes non-deterministic log

00:03:30.320 --> 00:03:31.340
space, NL.

00:03:31.340 --> 00:03:33.920
So NL is a proper
subset of PSPACE.

00:03:33.920 --> 00:03:38.750
So there's stuff in
PSPACE that is not in NL.

00:03:38.750 --> 00:03:41.420
And remember this notation
here, this means proper subset.

00:03:44.470 --> 00:03:46.600
One of the things that--

00:03:46.600 --> 00:03:49.520
a follow-on corollary that
we didn't mention last time,

00:03:49.520 --> 00:03:52.450
but that's something
that you should know,

00:03:52.450 --> 00:03:57.970
is that the TQBF problem, our
PSPACE-complete problem,

00:03:57.970 --> 00:04:02.140
is an example of a problem
that's in PSPACE, obviously,

00:04:02.140 --> 00:04:05.620
but we know it's also not in NL.

00:04:05.620 --> 00:04:07.540
And in order to get
that conclusion,

00:04:07.540 --> 00:04:14.020
you have to look, again, at
the proof that TQBF is PSPACE

00:04:14.020 --> 00:04:19.300
complete, and observe that
the reductions that we gave

00:04:19.300 --> 00:04:23.410
in that proof can be carried
out not only in polynomial time,

00:04:23.410 --> 00:04:26.110
but they can be carried
out in log space.

00:04:26.110 --> 00:04:32.200
And therefore, if TQBF
turned out to go down to NL,

00:04:32.200 --> 00:04:34.660
then because
everything in PSPACE

00:04:34.660 --> 00:04:38.650
is log space reducible to TQBF,
that would bring all of PSPACE

00:04:38.650 --> 00:04:40.480
down to NL.

00:04:40.480 --> 00:04:42.950
But that we just
proved is not the case.

00:04:42.950 --> 00:04:46.530
So therefore, TQBF
could not be in NL.

00:04:46.530 --> 00:04:50.340
OK, and we're going to be
using that kind of reasoning

00:04:50.340 --> 00:04:54.070
again in this lecture.

00:04:54.070 --> 00:04:56.740
So just a quick check-in.

00:04:56.740 --> 00:05:00.030
These are a few,
more or less easy,

00:05:00.030 --> 00:05:04.370
maybe more or less
tricky, follow-ons

00:05:04.370 --> 00:05:09.357
that you can conclude from
the time and space hierarchy

00:05:09.357 --> 00:05:10.940
theorems plus some
of the other things

00:05:10.940 --> 00:05:12.450
we've proven along the way.

00:05:12.450 --> 00:05:16.957
And so just as a check of
your understanding, maybe

00:05:16.957 --> 00:05:18.540
these are a little bit
on the tricky side,

00:05:18.540 --> 00:05:21.020
so you have to read
them carefully.

00:05:21.020 --> 00:05:25.520
Which of these are known to
be true based on the material

00:05:25.520 --> 00:05:26.510
that we've presented?

00:05:26.510 --> 00:05:29.210
And this is also
just material that's

00:05:29.210 --> 00:05:35.300
the facts that we know to be
true in complexity theory.

00:05:35.300 --> 00:05:36.640
So let me launch that poll.

00:05:36.640 --> 00:05:43.290
And just check off the
ones that we can prove.

00:05:43.290 --> 00:05:44.970
Hmm.

00:05:44.970 --> 00:05:46.710
OK.

00:05:46.710 --> 00:05:48.220
I'm going to close it down.

00:05:48.220 --> 00:05:53.010
So please answer quickly
if you're going to.

00:05:53.010 --> 00:05:54.360
OK, 1, 2, 3, end.

00:05:57.120 --> 00:05:59.220
OK.

00:05:59.220 --> 00:06:04.620
Well, the two leading
candidates are correct.

00:06:04.620 --> 00:06:06.750
And the two that are
the laggards here

00:06:06.750 --> 00:06:09.960
are, in fact, the ones
that are not true.

00:06:09.960 --> 00:06:14.910
So A and D are not true,
based on what we know.

00:06:14.910 --> 00:06:18.070
And B and C are true.

00:06:18.070 --> 00:06:20.250
So let's understand,
first of all, A, we

00:06:20.250 --> 00:06:25.620
know it's false because 2 to
the n plus 1 is just 2 times 2

00:06:25.620 --> 00:06:27.140
to the n.

00:06:27.140 --> 00:06:32.250
And so these two bounds differ
only by a constant factor.

00:06:32.250 --> 00:06:35.570
And so in fact, they're
the same complexity class.

00:06:35.570 --> 00:06:37.940
And so you don't get
proper containment for A.

00:06:37.940 --> 00:06:39.755
So that one we
absolutely know is false.

00:06:42.820 --> 00:06:49.440
D, well, if we could
prove that, then we

00:06:49.440 --> 00:06:51.120
would have solved
the famous problem,

00:06:51.120 --> 00:06:56.530
because we don't know
whether even P equals PSPACE.

00:06:56.530 --> 00:06:59.280
So if P equals PSPACE,
then certainly PSPACE

00:06:59.280 --> 00:07:02.200
would equal NP, which
is in between the two.

00:07:02.200 --> 00:07:04.890
And so we don't
know how to prove

00:07:04.890 --> 00:07:08.130
PSPACE is different
from NP, that's

00:07:08.130 --> 00:07:11.777
based on the current state
of knowledge of the field.

00:07:11.777 --> 00:07:13.360
So this would not
be something that we

00:07:13.360 --> 00:07:17.530
know to be true based on
the things that we've said.

00:07:17.530 --> 00:07:23.980
Now, B follows directly from
the time hierarchy theorem,

00:07:23.980 --> 00:07:28.210
because 2 to the 2n is
the square of 2 to the n.

00:07:28.210 --> 00:07:33.650
And that is, asymptotically,
a significantly larger bound.

00:07:33.650 --> 00:07:41.660
And so you can prove that
time 2 to the 2n properly

00:07:41.660 --> 00:07:44.560
contains time 2 to the n.
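
NOTE
Editorial sketch, not part of the lecture: a quick arithmetic check of the
two bounds discussed for poll options A and B. The function names are
hypothetical. It only illustrates that 2^(n+1) is a constant multiple of 2^n,
while 2^(2n) exceeds 2^n by a factor that itself grows exponentially.
```python
# Compare the bounds from poll options A and B (illustrative arithmetic only).
def ratio_a(n: int) -> int:
    # 2^(n+1) / 2^n: always exactly 2, a constant factor
    return 2 ** (n + 1) // 2 ** n
def ratio_b(n: int) -> int:
    # 2^(2n) / 2^n = 2^n: grows without bound
    return 2 ** (2 * n) // 2 ** n
for n in (1, 10, 20):
    print(n, ratio_a(n), ratio_b(n))
```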

00:07:44.560 --> 00:07:49.970
C is a little trickier
because you need

00:07:49.970 --> 00:07:52.630
to remember Savitch's theorem.

00:07:52.630 --> 00:07:54.970
Savitch's theorem
applies to space.

00:07:54.970 --> 00:07:56.650
But you also need to
remember that what

00:07:56.650 --> 00:07:59.440
you can do in time, in
non-deterministic time

00:07:59.440 --> 00:08:02.230
n squared, you can also do
in non-deterministic space

00:08:02.230 --> 00:08:04.720
n squared, which,
then, in turn, you

00:08:04.720 --> 00:08:09.550
can do in deterministic space
n to the 4th, which is properly

00:08:09.550 --> 00:08:12.380
contained within
space n to the 5th.

00:08:12.380 --> 00:08:15.800
So you can prove
that PSPACE properly

00:08:15.800 --> 00:08:19.940
contains non-deterministic
time n squared.
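
NOTE
Editorial sketch, not part of the lecture: the chain of containments just
described for option C, written out in one line. The class names NTIME,
NSPACE, and SPACE are notation assumed here for the bounds the speaker names.
```latex
% time bounds space, then Savitch's theorem, then the space hierarchy theorem
\[
\mathrm{NTIME}(n^2) \subseteq \mathrm{NSPACE}(n^2) \subseteq \mathrm{SPACE}(n^4)
\subsetneq \mathrm{SPACE}(n^5) \subseteq \mathrm{PSPACE}
\]
```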

00:08:19.940 --> 00:08:22.850
OK, just a bunch of
containments there.

00:08:22.850 --> 00:08:27.470
A and C are perhaps, in a
sense, it may be the most tricky

00:08:27.470 --> 00:08:29.380
of this group.

00:08:29.380 --> 00:08:32.049
OK.

00:08:32.049 --> 00:08:34.460
So let's move on.

00:08:34.460 --> 00:08:38.350
So we're going to introduce,
today, two new classes.

00:08:38.350 --> 00:08:41.069
And actually, I want
to go back to here.

00:08:43.750 --> 00:08:46.930
What are we going to
be trying to accomplish

00:08:46.930 --> 00:08:48.620
in today's lecture?

00:08:48.620 --> 00:08:53.170
So we're going to be looking
at provable intractability.

00:08:53.170 --> 00:08:58.780
So a problem being intractable
for us means it's outside of P.

00:08:58.780 --> 00:09:02.560
So we can't solve it
in polynomial time.

00:09:02.560 --> 00:09:06.160
For our perspective,
we're going to call that

00:09:06.160 --> 00:09:08.710
an intractable problem.

00:09:08.710 --> 00:09:13.870
Now, this problem over here,
that's sitting in time 2 to the n,

00:09:13.870 --> 00:09:17.740
but not in smaller classes, so
this is an intractable problem.

00:09:17.740 --> 00:09:25.420
That's outside of
P. But this example

00:09:25.420 --> 00:09:30.280
of a language, if you remember
how the time hierarchy

00:09:30.280 --> 00:09:32.560
theorem or the space
hierarchy theorem

00:09:32.560 --> 00:09:36.430
was proved, basically,
this language itself is not

00:09:36.430 --> 00:09:39.490
an interesting language
other than for the purpose

00:09:39.490 --> 00:09:43.270
that it serves, to be in that
class and not in a lower class.

00:09:43.270 --> 00:09:45.670
But it's not a language that
anyone would care about.

00:09:45.670 --> 00:09:48.470
And it's not even a language
that is easy to describe.

00:09:48.470 --> 00:09:53.110
It's just the language that
some Turing machine decides,

00:09:53.110 --> 00:09:54.790
where that Turing
machine is especially

00:09:54.790 --> 00:09:58.210
designed to have the
property that its language is

00:09:58.210 --> 00:10:01.610
at a particular
complexity level.

00:10:01.610 --> 00:10:04.388
But otherwise, there's no nice
description of that language.

00:10:04.388 --> 00:10:05.930
It's not like a to
the n, b to the n,

00:10:05.930 --> 00:10:13.320
or some equivalence of two
DFAs or something like that.

00:10:13.320 --> 00:10:16.190
So I would say that that
language is, in a sense,

00:10:16.190 --> 00:10:18.830
it serves its purpose, but it's
not a natural language that you

00:10:18.830 --> 00:10:19.950
really care about.

00:10:19.950 --> 00:10:23.150
So one of the goals
of today's lecture

00:10:23.150 --> 00:10:26.090
is to give an example
of a natural language,

00:10:26.090 --> 00:10:30.260
a naturally-occurring
language, in a sense, that's

00:10:30.260 --> 00:10:33.830
easy to describe, where you
can prove that that language is

00:10:33.830 --> 00:10:39.170
intractable, is
actually outside of P.

00:10:39.170 --> 00:10:42.840
So that's a bit of
motivation for where we're going.

00:10:42.840 --> 00:10:45.230
So along the way, we're
going to introduce

00:10:45.230 --> 00:10:48.810
these exponential complexity
classes, exponential time

00:10:48.810 --> 00:10:52.490
and exponential space,
which are exponentially

00:10:52.490 --> 00:10:57.170
bigger than polynomial time
and polynomial space classes.

00:10:57.170 --> 00:11:00.200
So it's 2 to the n to
the k in both cases.

00:11:00.200 --> 00:11:03.930
2 to a polynomial.

00:11:03.930 --> 00:11:08.790
And the first five
of these classes, L

00:11:08.790 --> 00:11:10.430
through PSPACE
we've already seen,

00:11:10.430 --> 00:11:12.890
and exponential time
and exponential space

00:11:12.890 --> 00:11:17.880
extend the containments
that we've already seen.

00:11:17.880 --> 00:11:24.200
So you have to double check that
you understand why PSPACE is

00:11:24.200 --> 00:11:27.090
a subset of exponential time.

00:11:27.090 --> 00:11:30.170
But that's because
that, as we showed,

00:11:30.170 --> 00:11:33.110
going from space to
time, you can do that

00:11:33.110 --> 00:11:35.190
with an exponential increase.

00:11:35.190 --> 00:11:37.130
That's the cost
of the simulation.

00:11:37.130 --> 00:11:39.050
And going from
time to space, you

00:11:39.050 --> 00:11:40.370
don't need any increase at all.

00:11:40.370 --> 00:11:42.578
Anything that you can do in
a certain amount of time,

00:11:42.578 --> 00:11:44.293
you can do in that much space.

00:11:44.293 --> 00:11:46.460
So anything you can do in
a certain amount of space,

00:11:46.460 --> 00:11:50.290
you can also do in exponentially
more time.

00:11:50.290 --> 00:11:53.090
OK, so those were
simple theorems

00:11:53.090 --> 00:11:57.590
that we proved right
at the very beginning.

00:11:57.590 --> 00:11:59.560
Now, the hierarchy
theorems allow

00:11:59.560 --> 00:12:02.890
us to conclude some separations
among these classes.

00:12:02.890 --> 00:12:06.880
So we already looked at
this one, NL versus PSPACE.

00:12:06.880 --> 00:12:11.800
And we saw that because NL
is, by Savitch's theorem,

00:12:11.800 --> 00:12:15.550
in deterministic log squared
space, which is properly

00:12:15.550 --> 00:12:19.270
contained in
polynomial space, you

00:12:19.270 --> 00:12:24.260
get a separation between
those two classes, provably.

00:12:24.260 --> 00:12:26.780
And for similar reasons,
polynomial space

00:12:26.780 --> 00:12:28.490
to exponential
space, you're going

00:12:28.490 --> 00:12:32.000
to get a separation from
the space hierarchy theorem.

00:12:32.000 --> 00:12:34.280
And polynomial time
to exponential time,

00:12:34.280 --> 00:12:37.480
you get a provable
separation by virtue

00:12:37.480 --> 00:12:38.980
of the hierarchy theorem.
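
NOTE
Editorial sketch, not part of the lecture: the containments and the provable
separations mentioned so far, collected in one place. EXPTIME and EXPSPACE
are names assumed here for the exponential time and exponential space
classes, with bounds of the form 2^(n^k).
```latex
\[
\mathrm{L} \subseteq \mathrm{NL} \subseteq \mathrm{P} \subseteq \mathrm{NP}
\subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME} \subseteq \mathrm{EXPSPACE}
\]
% separations that follow from the hierarchy theorems (with Savitch for NL)
\[
\mathrm{NL} \subsetneq \mathrm{PSPACE}, \qquad
\mathrm{P} \subsetneq \mathrm{EXPTIME}, \qquad
\mathrm{PSPACE} \subsetneq \mathrm{EXPSPACE}
\]
```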

00:12:41.930 --> 00:12:45.710
Now we're going to
define complete problems

00:12:45.710 --> 00:12:48.333
for these two classes,
exponential time

00:12:48.333 --> 00:12:49.250
and exponential space.

00:12:49.250 --> 00:12:51.500
So we have exponential
time complete.

00:12:51.500 --> 00:12:56.880
It's going to be analogous
to what we showed before,

00:12:56.880 --> 00:13:01.860
which is that it's a
member of exponential time.

00:13:01.860 --> 00:13:06.173
And every problem in exponential
time is reducible to it,

00:13:06.173 --> 00:13:08.090
let's say, in polynomial
time, though it's not

00:13:08.090 --> 00:13:09.680
going to really turn
out to matter.

00:13:09.680 --> 00:13:12.320
It could be in log space.

00:13:12.320 --> 00:13:14.135
Some simple method of
doing the reduction

00:13:14.135 --> 00:13:15.260
is going to be good enough.

00:13:15.260 --> 00:13:18.800
Let's say polynomial time
is the typical definition.

00:13:18.800 --> 00:13:21.350
And the same thing for
exponential space complete.

00:13:21.350 --> 00:13:23.630
We'll say it's exponential
space complete,

00:13:23.630 --> 00:13:25.040
if it's in exponential space.

00:13:25.040 --> 00:13:27.230
And anything else
in exponential space

00:13:27.230 --> 00:13:31.050
is polynomial time
reducible to it.

00:13:31.050 --> 00:13:31.890
OK.

00:13:31.890 --> 00:13:37.320
But the important thing
is that if something

00:13:37.320 --> 00:13:40.156
is exponential
time complete, you

00:13:40.156 --> 00:13:45.770
know it's outside of P, for
the same reasons we've now

00:13:45.770 --> 00:13:46.745
seen several times.

00:13:49.860 --> 00:13:55.110
Namely, that if an exponential
time complete problem ended up

00:13:55.110 --> 00:14:00.220
being in P, then
because everything

00:14:00.220 --> 00:14:03.190
else in exponential time is
reducible to the complete

00:14:03.190 --> 00:14:05.560
problem, they
would also be in P.

00:14:05.560 --> 00:14:08.650
And so exponential time
and P would be equal.

00:14:08.650 --> 00:14:12.820
But we just said they're not
equal because of the hierarchy

00:14:12.820 --> 00:14:15.140
theorem.

00:14:15.140 --> 00:14:19.550
So the logic is the hierarchy
theorem separates the classes,

00:14:19.550 --> 00:14:24.020
and then the complete problem
inherits the difficulty

00:14:24.020 --> 00:14:26.010
of the larger class.

00:14:26.010 --> 00:14:30.202
So the complete problem cannot
be any lower than the other

00:14:30.202 --> 00:14:32.660
problems in the class, because
they're all reducible to it.

00:14:36.150 --> 00:14:39.390
So the same thing is going to
be true for an exponential space

00:14:39.390 --> 00:14:40.170
complete problem.

00:14:40.170 --> 00:14:43.650
Can't be even in PSPACE because
exponential space and PSPACE

00:14:43.650 --> 00:14:45.120
are different.

00:14:45.120 --> 00:14:47.940
And if it's not in PSPACE,
it's not going to be in P.

00:14:47.940 --> 00:14:51.630
And so in both cases, if
you have a problem that's

00:14:51.630 --> 00:14:54.780
complete for exponential
space or exponential time,

00:14:54.780 --> 00:14:59.110
we know that those
problems are intractable.

00:14:59.110 --> 00:15:02.080
And our strategy,
then, for giving

00:15:02.080 --> 00:15:07.420
a natural intractable
problem is to show

00:15:07.420 --> 00:15:09.760
it's complete for
one of these classes.

00:15:09.760 --> 00:15:11.290
And it's actually
going to turn out

00:15:11.290 --> 00:15:14.800
to be an exponential space
complete problem that we're

00:15:14.800 --> 00:15:17.880
going to give as our example.

00:15:17.880 --> 00:15:19.890
OK, so that is the plan.

00:15:22.500 --> 00:15:24.390
I think it's a good time to--

00:15:24.390 --> 00:15:27.120
let's just take a
few questions here

00:15:27.120 --> 00:15:33.300
to make sure we're all on the
same page as what we're doing.

00:15:33.300 --> 00:15:34.140
So let me just read.

00:15:34.140 --> 00:15:36.060
I got a couple of
questions already in here.

00:15:45.360 --> 00:15:46.830
So this is a little
bit of a side

00:15:46.830 --> 00:15:49.230
comment that somebody-- that's
an interesting question.

00:15:49.230 --> 00:15:53.910
Basically, is it
possible that we may not

00:15:53.910 --> 00:15:57.270
be able to prove, solve
the P versus NP problem,

00:15:57.270 --> 00:16:01.050
that it's not a problem
that one can answer

00:16:01.050 --> 00:16:02.830
from the basic axioms
of mathematics,

00:16:02.830 --> 00:16:06.740
if I'm interpreting
the question correctly.

00:16:06.740 --> 00:16:10.280
There are certain
problems in mathematics--

00:16:10.280 --> 00:16:11.810
and I think I,
perhaps, I mentioned

00:16:11.810 --> 00:16:14.150
earlier in the term, the
problem of whether there

00:16:14.150 --> 00:16:21.080
is a set whose size is
in between the integers

00:16:21.080 --> 00:16:22.670
and the real numbers.

00:16:22.670 --> 00:16:24.890
We know the real numbers
are larger in size

00:16:24.890 --> 00:16:26.540
than the integers.

00:16:26.540 --> 00:16:28.790
That was our first example
of a diagonalization.

00:16:28.790 --> 00:16:32.660
And is there a set of size
strictly in between the two?

00:16:32.660 --> 00:16:35.850
Bigger than the integers,
smaller than the real numbers.

00:16:35.850 --> 00:16:39.680
So that's a problem that
was posed a long time ago.

00:16:39.680 --> 00:16:41.240
It was one of
Hilbert's problems.

00:16:41.240 --> 00:16:46.300
And was eventually
shown to be unanswerable

00:16:46.300 --> 00:16:49.310
using the basic
axioms of mathematics.

00:16:49.310 --> 00:16:51.175
So the question is,
maybe P versus NP

00:16:51.175 --> 00:16:52.510
is in the same category.

00:16:55.370 --> 00:16:55.870
Could be.

00:16:55.870 --> 00:16:58.350
That could be true of
any unsolved problems

00:16:58.350 --> 00:17:00.000
in mathematics.

00:17:00.000 --> 00:17:02.130
But at least our
experience has shown

00:17:02.130 --> 00:17:04.980
that the kinds of problems
that, at least, have been shown

00:17:04.980 --> 00:17:09.420
to be unsolvable from
mathematical axioms

00:17:09.420 --> 00:17:12.270
tend to involve
infinities and very large

00:17:12.270 --> 00:17:14.940
things, things that are very
far from our intuitions.

00:17:14.940 --> 00:17:18.810
And something as down to earth
as P versus NP, at least,

00:17:18.810 --> 00:17:20.700
it would be very
surprising to me

00:17:20.700 --> 00:17:23.430
if that turned out
to be unanswerable

00:17:23.430 --> 00:17:25.530
using our mathematical axioms.

00:17:25.530 --> 00:17:26.692
But, who knows?

00:17:26.692 --> 00:17:28.109
Oh, this is another
good question.

00:17:28.109 --> 00:17:31.260
Do the time and space
hierarchy theorems

00:17:31.260 --> 00:17:33.130
have non-deterministic variants?

00:17:33.130 --> 00:17:34.380
Yes, they do.

00:17:34.380 --> 00:17:36.000
They're much harder
to prove, however,

00:17:36.000 --> 00:17:37.417
and we're not going
to cover that.

00:17:37.417 --> 00:17:42.900
But you can also prove that
non-deterministic time, n cubed

00:17:42.900 --> 00:17:45.180
properly includes
non-deterministic time

00:17:45.180 --> 00:17:45.750
n squared.

00:17:45.750 --> 00:17:47.583
You're not going to be
responsible for that.

00:17:47.583 --> 00:17:48.990
Don't worry.

00:17:48.990 --> 00:17:51.480
If you try to
actually prove that,

00:17:51.480 --> 00:17:58.520
you'll see the diagonalization
doesn't directly work.

00:17:58.520 --> 00:18:01.730
And so you have to
do something fancier.

00:18:04.560 --> 00:18:07.320
People are asking about which
reduction method to use.

00:18:07.320 --> 00:18:11.550
Again, the kinds of
reductions that we encounter

00:18:11.550 --> 00:18:13.210
are always very simple.

00:18:13.210 --> 00:18:16.200
So we're just going to be
working with very weak notions

00:18:16.200 --> 00:18:17.460
of reductions.

00:18:17.460 --> 00:18:20.130
It's not interesting, generally,
to consider powerful kinds

00:18:20.130 --> 00:18:25.037
of reductions like
exponential time reductions

00:18:25.037 --> 00:18:25.870
or things like that.

00:18:25.870 --> 00:18:30.210
So it's just not something that
people really think about much.

00:18:30.210 --> 00:18:33.360
I mean, I can talk about
it at length offline.

00:18:33.360 --> 00:18:36.870
But let's just assume that
our reduction strength

00:18:36.870 --> 00:18:38.370
is something very low.

00:18:38.370 --> 00:18:39.930
Log space is going
to be good enough

00:18:39.930 --> 00:18:41.745
to do all of the
reductions in this class.

00:18:45.330 --> 00:18:47.490
OK, so let's move on, then.

00:18:47.490 --> 00:18:51.090
So here is the
problem that we're

00:18:51.090 --> 00:18:53.250
going to spend the
next 20 minutes

00:18:53.250 --> 00:18:59.040
or so proving to be
exponential space complete.

00:18:59.040 --> 00:19:01.080
I have got to do a little
introduction first.

00:19:01.080 --> 00:19:07.130
So this is not the problem, but
this is related to the problem.

00:19:07.130 --> 00:19:10.060
So the problem of testing
if two regular expressions

00:19:10.060 --> 00:19:12.083
are equivalent.

00:19:12.083 --> 00:19:13.500
Write down two
regular expressions,

00:19:13.500 --> 00:19:15.830
do they generate
the same language?

00:19:15.830 --> 00:19:18.915
So that problem actually
turns out to be in PSPACE.

00:19:18.915 --> 00:19:21.040
So it's not going to be
exponential space complete.

00:19:21.040 --> 00:19:22.732
It's actually in PSPACE.

00:19:22.732 --> 00:19:24.190
I don't think we're
going to have--

00:19:24.190 --> 00:19:26.260
I thought about presenting
it in the lecture.

00:19:26.260 --> 00:19:27.820
It's not that hard to show.

00:19:27.820 --> 00:19:30.190
But it just took too much
time and doesn't really

00:19:30.190 --> 00:19:31.780
introduce new methods.

00:19:31.780 --> 00:19:35.890
It's a good exercise, actually,
using Savitch's theorem.

00:19:35.890 --> 00:19:38.740
But maybe we'll do
it in recitation,

00:19:38.740 --> 00:19:43.150
or if the lecture
miraculously ends earlier,

00:19:43.150 --> 00:19:44.073
I'll do it at the end.

00:19:44.073 --> 00:19:45.490
But I don't think
we'll have time.

00:19:50.570 --> 00:19:55.520
But that's a setup for
the intractable problem

00:19:55.520 --> 00:19:58.850
that we're going to talk
about, which is very related.

00:19:58.850 --> 00:20:01.310
Now, OK, before
we get to that, so

00:20:01.310 --> 00:20:04.370
if I have a regular
expression, I'm

00:20:04.370 --> 00:20:11.450
going to enhance our regular
expression in one simple way,

00:20:11.450 --> 00:20:15.490
by allowing exponents
or exponentiation.

00:20:15.490 --> 00:20:21.060
And that means if I have
a regular expression R,

00:20:21.060 --> 00:20:25.170
I can write R to the k to mean
R concatenated with itself k

00:20:25.170 --> 00:20:26.190
times.

00:20:26.190 --> 00:20:28.770
We've been sort of informally
using that all the way along

00:20:28.770 --> 00:20:31.350
anyway, like when we talk
about 0 to the k, 1 to the k.

00:20:34.260 --> 00:20:36.330
So if we're going to
formally allow that

00:20:36.330 --> 00:20:40.080
when we write down regular
expressions, in some cases,

00:20:40.080 --> 00:20:42.090
that might allow the
regular expression

00:20:42.090 --> 00:20:46.030
to be much smaller,
especially if we're

00:20:46.030 --> 00:20:48.800
writing down k in binary.

00:20:48.800 --> 00:20:52.220
Because I can write R to
the million with just a few

00:20:52.220 --> 00:20:55.490
symbols if I have
exponentiation.

00:20:55.490 --> 00:20:57.500
But if I don't have
exponentiation,

00:20:57.500 --> 00:20:59.900
then I have to
write R concatenated

00:20:59.900 --> 00:21:03.750
with R out a million
times, and I get a much,

00:21:03.750 --> 00:21:07.340
much longer, an exponentially
longer expression

00:21:07.340 --> 00:21:11.300
if I don't have that exponent
as a way of describing

00:21:11.300 --> 00:21:13.010
regular expressions.

00:21:13.010 --> 00:21:15.870
And that's going to
make a big difference.
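
NOTE
Editorial sketch, not part of the lecture: a size comparison for R to the k
when k is written in binary versus when the concatenation is written out.
The caret notation, the string encoding, and both function names are
illustrative assumptions, not the lecture's formal syntax.
```python
# Written length of R^k with a binary exponent versus the expanded form.
def with_exponent(r: str, k: int) -> str:
    # k is written in binary, so it costs about log2(k) symbols
    return f"({r})^{bin(k)[2:]}"
def expanded(r: str, k: int) -> str:
    # plain concatenation: k copies of r
    return f"({r})" * k
k = 1_000_000
print(len(with_exponent("a|b", k)))  # a few dozen symbols
print(len(expanded("a|b", k)))       # millions of symbols
```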

00:21:15.870 --> 00:21:21.410
So now, the equivalence problem
for regular expressions with

00:21:21.410 --> 00:21:25.890
exponentiation-- that's what
that little up arrow means,

00:21:25.890 --> 00:21:27.840
what it signifies--

00:21:27.840 --> 00:21:30.390
now I'm giving you two
regular expressions.

00:21:30.390 --> 00:21:33.780
But they're allowed to
have the exponentiation

00:21:33.780 --> 00:21:41.860
operation in addition to the
standard regular operations.

00:21:41.860 --> 00:21:46.560
So now, testing whether two of
these regular expressions that

00:21:46.560 --> 00:21:49.140
have exponentiation,
that problem

00:21:49.140 --> 00:21:51.975
turns out to be
exponential space complete.

00:21:56.495 --> 00:21:58.870
So here's the equivalence
problem for regular expressions

00:21:58.870 --> 00:22:00.250
with exponentiation.

00:22:00.250 --> 00:22:02.840
That's an exponential
space complete problem.

00:22:02.840 --> 00:22:05.500
And as we pointed out,
that means this problem

00:22:05.500 --> 00:22:08.360
is provably intractable.

00:22:08.360 --> 00:22:11.930
So there's just no
way, in general,

00:22:11.930 --> 00:22:14.180
to solve that problem
in polynomial time.

00:22:14.180 --> 00:22:15.740
That's proven, that's known.

00:22:19.120 --> 00:22:23.170
So we're going to go
through the reduction.

00:22:23.170 --> 00:22:25.690
I think it's going to be our
last reduction of the term,

00:22:25.690 --> 00:22:28.660
of proving problems
complete for some class.

00:22:28.660 --> 00:22:34.660
But each one of those has
their own kind of thing

00:22:34.660 --> 00:22:37.120
that makes it special.

00:22:37.120 --> 00:22:40.870
So first of all, we have to show
that it's in exponential space.

00:22:40.870 --> 00:22:42.820
That's really going to
rely on this other fact

00:22:42.820 --> 00:22:43.870
that we didn't prove.

00:22:43.870 --> 00:22:47.360
So I'm going to go
over that very quickly.

00:22:47.360 --> 00:22:49.400
But the interesting part
is doing the reduction.

00:22:49.400 --> 00:22:51.970
So if I have something
in exponential space

00:22:51.970 --> 00:22:54.820
that I can show
that I can reduce it

00:22:54.820 --> 00:22:58.630
to the equivalence problem
for regular expressions

00:22:58.630 --> 00:23:00.870
with exponentiation.

00:23:00.870 --> 00:23:04.530
OK, so quickly arguing
part one that we're

00:23:04.530 --> 00:23:07.950
in exponential space,
basically, what you do

00:23:07.950 --> 00:23:09.690
is you take your two
regular expressions

00:23:09.690 --> 00:23:12.120
that you want to test to
see if they're equivalent,

00:23:12.120 --> 00:23:14.010
but now they have
exponentiation.

00:23:14.010 --> 00:23:17.400
And as a first step, you get
rid of the exponentiation.

00:23:17.400 --> 00:23:22.620
You just expand things out
by repeating the parts that

00:23:22.620 --> 00:23:25.050
have the exponents.

00:23:25.050 --> 00:23:27.900
And of course, as I
said, that's going

00:23:27.900 --> 00:23:31.050
to make the expressions
themselves exponentially

00:23:31.050 --> 00:23:33.040
bigger.

00:23:33.040 --> 00:23:36.610
But now, you run
the PSPACE algorithm

00:23:36.610 --> 00:23:40.220
on those two exponentially
larger expressions.

00:23:40.220 --> 00:23:42.970
So the input to the
PSPACE algorithm is now

00:23:42.970 --> 00:23:47.620
exponential in the
original input size,

00:23:47.620 --> 00:23:50.330
but it's PSPACE in
that enlarged input.

00:23:50.330 --> 00:23:52.690
So that's going to give
you an exponential space

00:23:52.690 --> 00:23:57.070
algorithm in the original input
size, because you expanded

00:23:57.070 --> 00:23:58.570
it to become
exponentially bigger,

00:23:58.570 --> 00:24:04.655
and then you run the PSPACE
algorithm on that expanded

00:24:04.655 --> 00:24:05.155
problem.

00:24:08.620 --> 00:24:10.750
So that gives you an
exponential space algorithm

00:24:10.750 --> 00:24:15.380
for this problem.
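
NOTE
Editorial sketch, not part of the lecture: the first step of the exponential
space algorithm, unrolling every exponent into repeated concatenation. The
tuple encoding and the names expand and size are hypothetical. The PSPACE
equivalence test that would then run on the expanded expressions is not
shown. Exponents are assumed to be at least 1.
```python
# Tuple-based regular expressions (a hypothetical encoding):
#   ("sym", c), ("cat", r, s), ("union", r, s), ("star", r), ("exp", r, k)
def expand(r):
    """Return an equivalent expression with every ("exp", r, k) unrolled, k >= 1."""
    op = r[0]
    if op == "sym":
        return r
    if op == "star":
        return ("star", expand(r[1]))
    if op in ("cat", "union"):
        return (op, expand(r[1]), expand(r[2]))
    # op == "exp": concatenate k copies of the expanded body
    body, k = expand(r[1]), r[2]
    out = body
    for _ in range(k - 1):
        out = ("cat", out, body)
    return out
def size(r):
    """Count nodes, ignoring the digits of exponents."""
    return 1 + sum(size(x) for x in r[1:] if isinstance(x, tuple))
nested = ("exp", ("exp", ("sym", "a"), 8), 8)  # (a^8)^8
print(size(nested), size(expand(nested)))      # nested exponents multiply
```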

00:24:15.380 --> 00:24:17.170
But now, what
we're going to do--

00:24:17.170 --> 00:24:20.480
the interesting part
is the reduction.

00:24:20.480 --> 00:24:24.280
So given some language A in
exponential space, say,

00:24:24.280 --> 00:24:27.880
decided by some Turing machine
in that amount of space,

00:24:27.880 --> 00:24:33.730
2 to the n to the k, we're going
to give a reduction that maps A

00:24:33.730 --> 00:24:38.710
to this equivalence problem.

00:24:38.710 --> 00:24:40.150
Got it?

00:24:40.150 --> 00:24:41.530
That is the plan.

00:24:44.640 --> 00:24:47.780
So let's make sure we're
all together on the plan

00:24:47.780 --> 00:24:51.320
before we go ahead and
carry out that plan.

00:24:56.160 --> 00:24:57.680
We just sort of
set things up here,

00:24:57.680 --> 00:25:01.130
in a sense, for what
we're going to be doing.

00:25:01.130 --> 00:25:08.240
So feel free to ask a
question on just the plan.

00:25:08.240 --> 00:25:09.890
It's going to get technical.

00:25:09.890 --> 00:25:12.740
Because, as doing these
reductions always is,

00:25:12.740 --> 00:25:15.080
there's a simulation
involved, and you

00:25:15.080 --> 00:25:19.290
have to kind of describe that
simulation in its own way.

00:25:19.290 --> 00:25:22.100
So now, we're going
to be simulating,

00:25:22.100 --> 00:25:28.290
in a certain sense,
M on w, the decider

00:25:28.290 --> 00:25:32.970
for this exponential
space, problem A,

00:25:32.970 --> 00:25:34.470
we're going to take
M on w and we're

00:25:34.470 --> 00:25:39.130
going to somehow have to express
the fact that M accepts w using

00:25:39.130 --> 00:25:41.130
this equivalence problem
for regular expressions

00:25:41.130 --> 00:25:42.367
with exponentiation.

00:25:47.260 --> 00:25:48.365
So no questions?

00:25:48.365 --> 00:25:49.240
Why don't we move on?

00:25:52.120 --> 00:25:56.320
I have three slides on this,
but they're kind of dense,

00:25:56.320 --> 00:25:57.340
I'm sorry to say.

00:26:00.810 --> 00:26:04.920
So here is the plan as usual.

00:26:04.920 --> 00:26:09.420
We're going to map A with
a polynomial time reduction

00:26:09.420 --> 00:26:11.910
to the equivalence problem
for regular expressions

00:26:11.910 --> 00:26:13.590
with exponentiation.

00:26:13.590 --> 00:26:16.680
So that means we're going to
have to take an input, which

00:26:16.680 --> 00:26:21.480
may or may not be in A, and
produce two regular expressions

00:26:21.480 --> 00:26:28.160
with exponentiation, which are
going to be equivalent when

00:26:28.160 --> 00:26:33.220
w is in A. Or when M accepts w.

00:26:40.220 --> 00:26:45.570
So it's going to be, as
these things always are,

00:26:45.570 --> 00:26:47.660
these are going to be in
terms of the computation

00:26:47.660 --> 00:26:49.760
history for M on w.

00:26:49.760 --> 00:26:51.590
But in this case,
it's going to turn out

00:26:51.590 --> 00:26:57.230
to be convenient to work with
the rejecting computation

00:26:57.230 --> 00:26:59.270
history for M on w.

00:26:59.270 --> 00:27:04.240
So remember, now we
have a Turing machine M.

00:27:04.240 --> 00:27:08.140
It's a decider, so that
means it always halts--

00:27:08.140 --> 00:27:11.140
for the strings in the language,
it ends up at a q accept state,

00:27:11.140 --> 00:27:15.340
for things not in the language,
it ends up at a q reject state.

00:27:15.340 --> 00:27:17.710
So a rejecting
computation history

00:27:17.710 --> 00:27:19.330
is the sequence
of configurations

00:27:19.330 --> 00:27:22.870
the machine goes through
from the start configuration

00:27:22.870 --> 00:27:25.240
until it ends up
at a configuration

00:27:25.240 --> 00:27:29.890
with a reject state, a
rejecting configuration.

00:27:29.890 --> 00:27:32.830
And we're going to make
a regular expression that

00:27:32.830 --> 00:27:38.640
describes all strings
except for that one.

00:27:38.640 --> 00:27:43.050
It's going to avoid describing
a rejecting computation

00:27:43.050 --> 00:27:44.670
history for M on w.

00:27:44.670 --> 00:27:47.445
Otherwise, it's going to
describe all possible strings.

00:27:50.480 --> 00:27:54.530
Now, if M does not
reject w, so there

00:27:54.530 --> 00:27:57.170
is no rejecting
computation history--

00:27:57.170 --> 00:27:59.000
namely, M accepts w, by the way.

00:27:59.000 --> 00:28:01.790
So if M accepts w,
does not reject w,

00:28:01.790 --> 00:28:05.270
it does not have a rejecting
computation history,

00:28:05.270 --> 00:28:09.470
what is R1 describing?

00:28:09.470 --> 00:28:12.610
Well, it's describing,
in that case, everything,

00:28:12.610 --> 00:28:16.450
because there is no rejecting
computation history.

00:28:16.450 --> 00:28:19.150
So it's describing every
other string besides.

00:28:19.150 --> 00:28:23.170
So that means it's describing
all strings, if there

00:28:23.170 --> 00:28:25.390
is no rejecting computation
history in the case

00:28:25.390 --> 00:28:27.370
that M accepts w.

00:28:27.370 --> 00:28:30.890
So what does that suggest
we should use for R2?

00:28:30.890 --> 00:28:33.740
R2 is going to be the
regular expression that

00:28:33.740 --> 00:28:36.480
just generates all strings.

00:28:36.480 --> 00:28:40.920
So we'll be testing whether R1
generates all strings or not,

00:28:40.920 --> 00:28:49.410
which is the same as saying
does M accept w or not.

00:28:49.410 --> 00:28:52.470
So R2 is going to be--

00:28:52.470 --> 00:28:55.710
I would like to say sigma
star, but sigma is really

00:28:55.710 --> 00:29:00.622
the input alphabet for M, and gamma
is the tape alphabet for M.

00:29:00.622 --> 00:29:02.580
So we have a lot of Greek
letters to play with,

00:29:02.580 --> 00:29:07.190
so we're going to use
delta for the alphabet

00:29:07.190 --> 00:29:10.910
that we write the
computation histories in.

00:29:10.910 --> 00:29:17.760
If you want to get reminded what
that delta is, a computation

00:29:17.760 --> 00:29:20.760
history can have a tape
alphabet symbol for M,

00:29:20.760 --> 00:29:23.970
it can have a
state symbol for M,

00:29:23.970 --> 00:29:25.590
or it can have a
delimiter pound--

00:29:25.590 --> 00:29:26.880
hashtag.

00:29:26.880 --> 00:29:32.530
So each symbol of the
capital delta alphabet

00:29:32.530 --> 00:29:37.330
is a tape alphabet symbol, or
state, something representing

00:29:37.330 --> 00:29:40.740
a state symbol, or a hashtag.

00:29:40.740 --> 00:29:41.490
That's just delta.

00:29:41.490 --> 00:29:44.190
So don't get-- I always
feel bad if somebody

00:29:44.190 --> 00:29:45.840
gets confused by
something that's

00:29:45.840 --> 00:29:47.140
supposed to be very simple.

00:29:47.140 --> 00:29:49.140
Don't get confused
by delta star.

00:29:49.140 --> 00:29:51.150
This is just all
possible strings

00:29:51.150 --> 00:29:52.275
over the alphabet delta.

00:29:56.450 --> 00:29:58.820
OK, so what does R1--

00:29:58.820 --> 00:30:01.130
so my job is to do R1.

00:30:01.130 --> 00:30:04.160
R2, I already told you.

00:30:04.160 --> 00:30:07.380
R1 now has to describe
all those strings

00:30:07.380 --> 00:30:10.830
except for the rejecting
computation history.

00:30:10.830 --> 00:30:16.260
So everything that fails to be a
rejecting computation history--

00:30:16.260 --> 00:30:19.380
so it fails either
because it started wrong,

00:30:19.380 --> 00:30:22.320
or it ended wrong, or it's
wrong somewhere in the middle.

00:30:22.320 --> 00:30:27.510
And by wrong I mean, it
fails to correctly describe

00:30:27.510 --> 00:30:32.235
the way the machine operates
if it's ending up rejecting w.

00:30:35.550 --> 00:30:36.050
All right.

00:30:36.050 --> 00:30:40.970
So I'm going to describe
all those possible strings

00:30:40.970 --> 00:30:44.780
by breaking it down into
those three categories.

00:30:44.780 --> 00:30:47.870
Starts wrong, ends wrong,
or somewhere computes

00:30:47.870 --> 00:30:51.180
wrong along the way.

00:30:51.180 --> 00:30:51.680
OK.

00:30:51.680 --> 00:30:56.780
So a rejecting computation history
looks something like this.

00:30:56.780 --> 00:31:04.400
Here's the start configuration
as we usually envision it.

00:31:04.400 --> 00:31:07.160
It's a start state looking at
the first symbol of the input,

00:31:07.160 --> 00:31:09.650
and there's the
rest of the input.

00:31:09.650 --> 00:31:10.955
So let me just write this out.

00:31:13.615 --> 00:31:15.760
This is a rejecting
computation history now.

00:31:15.760 --> 00:31:19.310
So the first configuration,
the second one,

00:31:19.310 --> 00:31:21.790
and so on and so on, until
we end up at a rejecting

00:31:21.790 --> 00:31:26.350
computation-- rejecting
configuration.

00:31:26.350 --> 00:31:32.530
Now, for convenience,
I'm going to insist

00:31:32.530 --> 00:31:38.960
that all of these configurations
are the same length.

00:31:38.960 --> 00:31:44.620
It's going to make my life
easier in doing the proof.

00:31:44.620 --> 00:31:47.630
But why can I do that?

00:31:47.630 --> 00:31:49.193
Well, I'm just
going to take them--

00:31:49.193 --> 00:31:51.610
you know, because usually you
think of the configurations,

00:31:51.610 --> 00:31:54.340
they start small because they're
just basically of length n,

00:31:54.340 --> 00:31:56.380
but this is using
exponential space,

00:31:56.380 --> 00:31:57.820
they're getting
longer and longer.

00:31:57.820 --> 00:32:00.280
Let's just pad them
all out with blanks

00:32:00.280 --> 00:32:02.780
so that they're
all the same size.

00:32:02.780 --> 00:32:04.900
So as I've indicated
over here, we're

00:32:04.900 --> 00:32:06.340
adding in a bunch of blanks.

00:32:06.340 --> 00:32:09.436
It's going to be a
lot of blanks here,

00:32:09.436 --> 00:32:12.280
to make sure they all
have length 2 to the n

00:32:12.280 --> 00:32:15.260
to the k, which is the maximum
size of a configuration

00:32:15.260 --> 00:32:16.510
when you have that much space.

00:32:24.440 --> 00:32:26.720
I'm going to construct-- so
basically, that's my job.

00:32:26.720 --> 00:32:29.000
I'm going to construct
R1 so that it

00:32:29.000 --> 00:32:30.980
generates all those strings.

00:32:30.980 --> 00:32:37.500
I wrote a little box around
that thing I'm trying to--

00:32:41.703 --> 00:32:44.030
that's my to do.

00:32:44.030 --> 00:32:46.400
It's going to help me
in the coming slides

00:32:46.400 --> 00:32:49.370
because they're a
little bit dense.

00:32:49.370 --> 00:32:52.550
When I'm going to draw this
sort of reddish, pinkish box

00:32:52.550 --> 00:32:55.760
around something,
that means that I'm

00:32:55.760 --> 00:32:58.970
going to try to describe all
strings except for that one.

00:33:10.770 --> 00:33:12.600
I want to avoid
describing that one,

00:33:12.600 --> 00:33:14.730
because that's the rejecting
computation history,

00:33:14.730 --> 00:33:16.410
but I want to describe
everything else.

00:33:16.410 --> 00:33:18.585
That's my wish.

00:33:21.870 --> 00:33:25.180
So here's a check in
before we move forward.

00:33:25.180 --> 00:33:26.790
But we can also--
maybe we should just

00:33:26.790 --> 00:33:30.463
take some questions, even
before we launch the check in.

00:33:30.463 --> 00:33:31.380
How are we doing here?

00:33:36.000 --> 00:33:40.380
So, is R1 describing--

00:33:40.380 --> 00:33:43.520
well, R1 is a
regular expression.

00:33:43.520 --> 00:33:45.620
Over here, we're
talking about a--

00:33:45.620 --> 00:33:47.630
this is just an ordinary
computation history,

00:33:47.630 --> 00:33:48.720
but it ends with a reject.

00:33:48.720 --> 00:33:49.220
That's all.

00:33:49.220 --> 00:33:52.040
A rejecting computation
history is just one that's

00:33:52.040 --> 00:33:53.540
a little different at the end.

00:33:53.540 --> 00:33:56.180
The machine just ended up
rejecting instead of accepting.

00:33:56.180 --> 00:34:03.620
Otherwise everything has to
be spelled out in accordance

00:34:03.620 --> 00:34:06.254
with the rules of the machine
and the start configuration.

00:34:09.710 --> 00:34:13.597
Yeah, we were assuming
one rejecting state.

00:34:13.597 --> 00:34:15.889
Yeah, that's the way we
actually define Turing machines

00:34:15.889 --> 00:34:16.681
in the first place.

00:34:16.681 --> 00:34:19.159
But, who's arguing.

00:34:19.159 --> 00:34:21.460
Yeah, there's one reject state.

00:34:21.460 --> 00:34:24.460
We're all
deterministic, correct.

00:34:24.460 --> 00:34:25.960
Why do we need the padding?

00:34:25.960 --> 00:34:29.560
Because I want to make these
all the same size, all of these

00:34:29.560 --> 00:34:30.550
configurations.

00:34:30.550 --> 00:34:32.739
That's going to help
me later in terms

00:34:32.739 --> 00:34:37.810
of describing the invalid
configurations, the ones that

00:34:37.810 --> 00:34:42.980
are not legal configurations,
legal rejecting configurations.

00:34:42.980 --> 00:34:45.409
So just simply a
matter of convenience,

00:34:45.409 --> 00:34:47.449
but just accept it for now.

00:34:47.449 --> 00:34:49.580
I just want all of
those configurations

00:34:49.580 --> 00:34:55.540
to be the same length in my
rejecting computation history.

00:34:55.540 --> 00:34:57.370
Otherwise I'm not going to--

00:34:57.370 --> 00:34:59.980
I'm just coding that
rejecting computation history

00:34:59.980 --> 00:35:01.210
in this particular way.

00:35:06.610 --> 00:35:09.340
So people are asking about
the details of bad start.

00:35:09.340 --> 00:35:10.540
That's yet to come.

00:35:10.540 --> 00:35:13.010
I have two more slides on this.

00:35:13.010 --> 00:35:16.950
So I'll tell you about how
we're going to do those.

00:35:16.950 --> 00:35:21.480
So R bad-start-- that's a good
question-- is R bad-start all--

00:35:21.480 --> 00:35:25.770
these are all the strings
that don't start this way.

00:35:25.770 --> 00:35:27.060
We'll see it in a second.

00:35:27.060 --> 00:35:30.480
But R bad-start are all the
things that don't start with

00:35:30.480 --> 00:35:31.890
the--

00:35:31.890 --> 00:35:33.000
they start bad.

00:35:36.720 --> 00:35:39.577
They're not starting with
the start configuration.

00:35:39.577 --> 00:35:41.160
They're starting
with some other junk.

00:35:46.130 --> 00:35:48.850
Do we need only one rejecting
computation history?

00:35:48.850 --> 00:35:51.410
What about the other ones?

00:35:51.410 --> 00:35:54.500
This is a deterministic machine,
so there's only going to be--

00:35:54.500 --> 00:35:57.060
if I prescribe the
lengths as I've done,

00:35:57.060 --> 00:35:59.607
there's going to be only one
rejecting computation history.

00:35:59.607 --> 00:36:01.190
Because it's
deterministic, everything

00:36:01.190 --> 00:36:06.380
is going to be forced
from the beginning.

00:36:06.380 --> 00:36:09.020
Should R1 be the
not of those three?

00:36:09.020 --> 00:36:09.880
No.

00:36:09.880 --> 00:36:11.870
R1 is describing
all of the strings

00:36:11.870 --> 00:36:18.660
except, except this one string.

00:36:18.660 --> 00:36:21.800
So I'm capturing all the
different possible ways

00:36:21.800 --> 00:36:24.140
a string could fail
to be the string.

00:36:24.140 --> 00:36:26.180
It could start wrong.

00:36:26.180 --> 00:36:28.760
Could be wrong along
the middle somewhere.

00:36:28.760 --> 00:36:31.870
So I have to union
them together.

00:36:31.870 --> 00:36:35.290
Because I'm describing--
as I always believe,

00:36:35.290 --> 00:36:38.420
negations are the most
confusing thing to everybody,

00:36:38.420 --> 00:36:41.510
including me.

00:36:41.510 --> 00:36:43.460
So we're describing
all the things

00:36:43.460 --> 00:36:47.143
that are not this string.

00:36:47.143 --> 00:36:48.810
We're trying to stay
away from that one.

00:36:48.810 --> 00:36:50.580
We want to describe
everything else.

00:36:55.308 --> 00:36:57.100
All right, I think I'd
better move on here.

00:36:57.100 --> 00:36:58.740
We've got a lot of questions.

00:36:58.740 --> 00:37:01.470
Talk to the TAs.

00:37:01.470 --> 00:37:02.700
All right, check in.

00:37:09.190 --> 00:37:15.110
How big is this rejecting
computation history anyway?

00:37:15.110 --> 00:37:16.430
Interesting.

00:37:16.430 --> 00:37:18.620
There's a lesson here.

00:37:18.620 --> 00:37:21.680
I got a big burst of answers
right at the very beginning.

00:37:21.680 --> 00:37:24.550
All wrong.

00:37:24.550 --> 00:37:26.020
But then the bright--

00:37:26.020 --> 00:37:29.770
the people who took a little
bit more time to think

00:37:29.770 --> 00:37:35.620
started getting the
right answer, which is--

00:37:35.620 --> 00:37:36.130
let's look.

00:37:36.130 --> 00:37:39.622
We've got a close election here
folks, so now I have to report.

00:37:39.622 --> 00:37:41.080
Hope we don't have
to do a recount.

00:37:45.680 --> 00:37:47.450
OK, come on guys.

00:37:47.450 --> 00:37:48.500
Answer up.

00:37:48.500 --> 00:37:50.152
10 seconds.

00:37:50.152 --> 00:37:51.110
This is not super hard.

00:37:53.690 --> 00:37:54.395
Stop the count.

00:37:57.500 --> 00:38:00.905
Yeah, I think we'd better stop
at this, we're on the edge.

00:38:03.880 --> 00:38:05.845
OK, 3 seconds.

00:38:10.030 --> 00:38:12.780
End polling.

00:38:12.780 --> 00:38:13.470
Share results.

00:38:20.670 --> 00:38:23.100
The correct answer
is, in fact, c.

00:38:23.100 --> 00:38:23.730
Why is that?

00:38:23.730 --> 00:38:27.900
Because each configuration
is 2 to the n to the k.

00:38:27.900 --> 00:38:31.770
So that's how much space the
machine has, exponential space.

00:38:31.770 --> 00:38:35.328
But the amount of time,
which is each one--

00:38:35.328 --> 00:38:36.870
the number of
configurations is going

00:38:36.870 --> 00:38:39.600
to be the amount of
time that's used.

00:38:39.600 --> 00:38:42.300
It's going to be exponentially
more even than that.

00:38:42.300 --> 00:38:45.450
So it's going to be 2 to
the 2 to the n to the k,

00:38:45.450 --> 00:38:47.550
is how many steps
the machine can run.

00:38:47.550 --> 00:38:51.420
And that's going to be how long
the computation history could

00:38:51.420 --> 00:38:52.380
be.

00:38:52.380 --> 00:38:55.650
So it's a very long thing.

00:38:55.650 --> 00:39:02.070
And when you think about
it, the regular expression

00:39:02.070 --> 00:39:03.860
we are generating,
how big is that?

00:39:06.880 --> 00:39:09.490
The regular expression--
again, a lot

00:39:09.490 --> 00:39:11.830
of people playing
off my comments here.

00:39:15.660 --> 00:39:17.100
Were the votes legal or not?

00:39:17.100 --> 00:39:18.810
OK.

00:39:18.810 --> 00:39:20.050
Let's focus here.

00:39:22.990 --> 00:39:26.260
So this is doubly
exponentially large.

00:39:26.260 --> 00:39:28.360
How big is the
regular expression

00:39:28.360 --> 00:39:29.680
that we're generating?

00:39:29.680 --> 00:39:32.780
Well that has to be
produced in polynomial time,

00:39:32.780 --> 00:39:34.330
so it's only polynomially big.

00:39:34.330 --> 00:39:38.260
So we have this little teensy
weensy, relatively speaking,

00:39:38.260 --> 00:39:42.100
regular expression,
which is only n to the k.

00:39:42.100 --> 00:39:44.830
It's having to
describe all strings

00:39:44.830 --> 00:39:49.300
except for this particular
string, which is 2 to the 2

00:39:49.300 --> 00:39:51.050
to the n to the k.

00:39:51.050 --> 00:39:54.410
So in a sense,
this string that is

00:39:54.410 --> 00:39:56.270
related to that
regular expression

00:39:56.270 --> 00:39:58.520
is doubly exponentially
larger than that.

00:39:58.520 --> 00:40:01.070
And that kind of presents
some of the challenge

00:40:01.070 --> 00:40:05.000
in doing the reduction,
in constructing

00:40:05.000 --> 00:40:07.640
that regular expression.
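
NOTE
A rough sketch of the three sizes compared in this check-in, not from the lecture.
history_scale is a hypothetical helper; it ignores constant factors and only tracks orders of growth.
```python
def history_scale(n, k):
    """Orders of growth for input length n and space bound 2^(n^k)."""
    regex_size = n ** k            # the reduction's output is polynomial in n
    config_len = 2 ** regex_size   # symbols in one padded configuration
    max_steps = 2 ** config_len    # order of the number of configurations
    return regex_size, config_len, max_steps
#
regex_size, config_len, max_steps = history_scale(10, 1)
# max_steps is too long to print usefully, so report its number of decimal digits.
print(regex_size, config_len, len(str(max_steps)))
```
Even for n = 10 and k = 1, the history length has hundreds of digits while the regular expression stays tiny.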

00:40:07.640 --> 00:40:09.740
So let's move on
and start doing--

00:40:09.740 --> 00:40:12.660
this is the hard stuff.

00:40:12.660 --> 00:40:17.430
Here is the bad start,
which is challenging enough.

00:40:17.430 --> 00:40:20.160
Even this little piece is going
to be a little bit challenging

00:40:20.160 --> 00:40:22.680
to describe.

00:40:22.680 --> 00:40:26.230
Just rewriting from
the previous slide.

00:40:26.230 --> 00:40:28.050
So we're trying
to make R1, which

00:40:28.050 --> 00:40:30.540
is generating all the
strings except the rejecting

00:40:30.540 --> 00:40:34.230
computation history for M on w.

00:40:34.230 --> 00:40:35.940
It's in those three parts.

00:40:35.940 --> 00:40:39.150
Right now I'm describing
the bad start piece.

00:40:39.150 --> 00:40:41.820
So that's going to
describe all strings that

00:40:41.820 --> 00:40:46.630
don't start with this C1.

00:40:46.630 --> 00:40:47.880
So let me write that out here.

00:40:47.880 --> 00:40:49.650
This is going to
generate all strings that

00:40:49.650 --> 00:40:56.520
don't start with C start or
C1, which is as specified.

00:40:56.520 --> 00:40:57.460
Looks like this.

00:40:57.460 --> 00:41:01.320
So any string that doesn't start
with these symbols, doesn't

00:41:01.320 --> 00:41:06.610
start exactly like
this, should be

00:41:06.610 --> 00:41:12.220
described by bad start,
that regular expression.

00:41:12.220 --> 00:41:17.620
So that, in itself, is going
to be further subdivided.

00:41:17.620 --> 00:41:22.600
And the reason for that is
not that hard to understand.

00:41:22.600 --> 00:41:24.490
I'm going to--
bad start is going

00:41:24.490 --> 00:41:31.600
to accomplish its
goal by saying, well,

00:41:31.600 --> 00:41:34.630
anything that doesn't
start this way either

00:41:34.630 --> 00:41:38.260
doesn't start with a q0,
or doesn't

00:41:38.260 --> 00:41:40.570
have a w1 in the next
place, or doesn't

00:41:40.570 --> 00:41:42.460
have a w2 in the next place.

00:41:42.460 --> 00:41:48.220
Or somewhere along the
way, it has a wrong symbol.

00:41:48.220 --> 00:41:52.960
Each one of these guys
is going to be about one

00:41:52.960 --> 00:41:58.973
of those symbols being wrong
in some particular place.

00:41:58.973 --> 00:42:00.890
So I'm going to show you
what those look like.

00:42:00.890 --> 00:42:07.750
So right now, I'm going
to focus my attention

00:42:07.750 --> 00:42:12.960
on describing all strings
except for this one.

00:42:12.960 --> 00:42:18.150
All strings that start with
something except for this one.

00:42:20.970 --> 00:42:23.310
So just remember,
delta is the alphabet

00:42:23.310 --> 00:42:26.070
for the competition histories.

00:42:26.070 --> 00:42:28.950
And some notation here,
delta sub epsilon,

00:42:28.950 --> 00:42:30.450
we've seen this
before, is you're

00:42:30.450 --> 00:42:35.490
going to add in epsilon as
an allowed thing for delta.

00:42:35.490 --> 00:42:39.600
So it's all the
symbols, or epsilon, now

00:42:39.600 --> 00:42:41.740
thought of as a set here.

00:42:41.740 --> 00:42:45.390
And furthermore, it's going to
be convenient to talk about all

00:42:45.390 --> 00:42:49.080
of the symbols in delta,
except for some symbol.

00:42:49.080 --> 00:42:51.570
So like at the
very beginning, q0.

00:42:51.570 --> 00:42:55.092
I want to talk about all of the
symbols except for q0 symbol.

00:42:55.092 --> 00:42:56.550
Because that's what
I'm going to be

00:42:56.550 --> 00:43:02.130
using to start off R bad-start.

00:43:02.130 --> 00:43:05.590
It's going to be
anything except for q0.

00:43:05.590 --> 00:43:07.370
So let's just see
how that looks.

00:43:07.370 --> 00:43:11.890
So here is S0, the very
first part of R bad-start.

00:43:11.890 --> 00:43:14.420
It's going to say--

00:43:14.420 --> 00:43:18.430
I'm trying to color the
active ingredient here

00:43:18.430 --> 00:43:22.020
in the pink color.

00:43:22.020 --> 00:43:29.980
So delta, with q0 removed,
followed by anything.

00:43:29.980 --> 00:43:31.570
So this little
regular expression

00:43:31.570 --> 00:43:36.460
here describes all strings
that don't start with a q0,

00:43:36.460 --> 00:43:38.110
as I'm indicating over here.

00:43:38.110 --> 00:43:40.930
All strings that
don't start with a q0

00:43:40.930 --> 00:43:45.750
is what S0 describes.

00:43:45.750 --> 00:43:47.782
You have to understand
that, because it's just

00:43:47.782 --> 00:43:48.990
going to build up from there.

00:43:51.880 --> 00:43:53.820
So what do we want
to say for S1?

00:43:53.820 --> 00:43:55.860
What's going to be
all strings that don't

00:43:55.860 --> 00:43:58.990
have w1 in the second place?

00:43:58.990 --> 00:44:01.960
So I'm going to
write that over here.

00:44:01.960 --> 00:44:06.520
S1 is anything in
the first place--

00:44:06.520 --> 00:44:09.400
I mean, if the first place
was wrong, S0 took care of it.

00:44:09.400 --> 00:44:11.400
So I'm just going to
keep my life simple.

00:44:11.400 --> 00:44:15.030
All I want to do is
describe all of the places

00:44:15.030 --> 00:44:17.400
where the second
symbol is wrong.

00:44:17.400 --> 00:44:19.530
Namely, it's not w1.

00:44:19.530 --> 00:44:22.600
So anything in the
first place, something

00:44:22.600 --> 00:44:27.490
besides w1 in the next place,
and then anything at all

00:44:27.490 --> 00:44:28.150
afterward.

00:44:28.150 --> 00:44:30.460
Those are all strings
that don't have--

00:44:30.460 --> 00:44:34.840
[AUDIO CUTS]

00:44:34.840 --> 00:44:37.090
So I'll write it
over here like that.

00:44:37.090 --> 00:44:41.840
Now S2 similarly is going to
be-- since I have exponentiation,

00:44:41.840 --> 00:44:45.100
let's use that for convenience.

00:44:45.100 --> 00:44:47.930
Delta delta, or
just delta squared.

00:44:47.930 --> 00:44:53.645
So anything in the first two
places, then not w2 in

00:44:53.645 --> 00:44:55.640
the next place,
and then anything.

00:44:55.640 --> 00:44:57.540
So that's going to
capture this part.

00:44:57.540 --> 00:44:59.630
So this is what
these S's do, and you

00:44:59.630 --> 00:45:01.622
can sort of get the idea.

00:45:01.622 --> 00:45:02.330
So dot, dot, dot.

00:45:02.330 --> 00:45:04.790
This Sn is going to
describe everything

00:45:04.790 --> 00:45:08.675
except for wn in
that location, which

00:45:08.675 --> 00:45:12.590
is going to be the n plus
first location, actually.

00:45:12.590 --> 00:45:18.620
And now I have to continue
on doing that for the blanks.

00:45:18.620 --> 00:45:24.380
So now, if you think
with me, let's just

00:45:24.380 --> 00:45:26.120
take a look how that could go.

00:45:29.630 --> 00:45:34.790
The next symbol, which is
skipping over the n plus 1

00:45:34.790 --> 00:45:38.440
that I've already
taken care of, I

00:45:38.440 --> 00:45:41.890
want to say it's not a
blank symbol in this very

00:45:41.890 --> 00:45:44.450
first location after the input.

00:45:44.450 --> 00:45:46.660
So again, I'm
describing these non--

00:45:46.660 --> 00:45:49.420
these strings which are not
the start configuration.

00:45:49.420 --> 00:45:53.198
It could fail because
there's not a blank where

00:45:53.198 --> 00:45:54.490
there's supposed to be a blank.

00:45:57.190 --> 00:45:59.380
Suppose I do that for
each one of these guys.

00:46:02.790 --> 00:46:05.570
That would work.

00:46:05.570 --> 00:46:06.560
But.

00:46:06.560 --> 00:46:07.370
But what?

00:46:10.250 --> 00:46:13.060
Think.

00:46:13.060 --> 00:46:17.260
This is actually not going
to be a good solution for us.

00:46:17.260 --> 00:46:21.700
Because there are exponentially
many blanks over here.

00:46:21.700 --> 00:46:24.652
This is a hugely
long configuration.

00:46:24.652 --> 00:46:26.360
And so there's
exponentially many blanks.

00:46:26.360 --> 00:46:30.890
If I do it this way, I'm going
to end up with an exponentially

00:46:30.890 --> 00:46:32.990
large regular expression.

00:46:32.990 --> 00:46:35.970
And that's not doable
in polynomial time.

00:46:35.970 --> 00:46:39.380
So I have a more complicated
way of getting the same effect.

00:46:39.380 --> 00:46:40.970
Which is-- I don't
really expect you

00:46:40.970 --> 00:46:43.010
to fully parse
through this right

00:46:43.010 --> 00:46:46.250
now, in real time in lecture,
but let me try to help you.

00:46:46.250 --> 00:46:50.030
What I'm going to do is skip
over these first initial n

00:46:50.030 --> 00:46:53.120
plus 1 places, and
then a variable number

00:46:53.120 --> 00:46:58.310
of places, which is indicated
by the next piece here.

00:46:58.310 --> 00:47:00.040
And the way that works is--

00:47:00.040 --> 00:47:02.110
these are all
strings of length n

00:47:02.110 --> 00:47:09.370
plus 1 through the end
of the configuration.

00:47:09.370 --> 00:47:13.930
And to understand that,
it's almost a little

00:47:13.930 --> 00:47:16.630
too technical to even
try, but let's see.

00:47:16.630 --> 00:47:19.840
If I put delta to the 7,
that's all strings of length 7.

00:47:19.840 --> 00:47:22.840
But if I put delta
sub epsilon to the 7,

00:47:22.840 --> 00:47:24.400
if you think about
what that means,

00:47:24.400 --> 00:47:28.120
that's all strings of
length between 0 and 7.

00:47:31.530 --> 00:47:33.660
Because I can either
have it as epsilon

00:47:33.660 --> 00:47:37.110
as my variable or a
symbol from delta.
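
NOTE
A small brute-force check of the claim just made, not from the lecture: raising
(delta with epsilon added) to the m gives exactly the strings of length 0 through m.
The function names and the two-symbol alphabet are assumptions chosen for illustration.
```python
from itertools import product
#
def delta_eps_power(delta, m):
    """Strings formed by m picks, each a symbol of delta or the empty string."""
    choices = list(delta) + [""]
    return {"".join(pick) for pick in product(choices, repeat=m)}
#
def strings_up_to(delta, m):
    """All strings over delta of length 0 through m."""
    return {"".join(p) for j in range(m + 1) for p in product(delta, repeat=j)}
#
print(delta_eps_power("ab", 3) == strings_up_to("ab", 3))
```
For the alphabet {a, b} and m = 3, both sets hold the 15 strings of length 0 through 3.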

00:47:37.110 --> 00:47:39.360
And so that's what
I'm doing over here.

00:47:39.360 --> 00:47:45.240
I'm getting a variable length
space, spacer of deltas,

00:47:45.240 --> 00:47:48.420
that are going to then end
up at a certain location--

00:47:48.420 --> 00:47:50.670
I'm going to say at that place.

00:47:50.670 --> 00:47:53.430
Then I have a non-blank.

00:47:53.430 --> 00:47:56.220
Because all I need
to do is describe

00:47:56.220 --> 00:48:02.440
the strings that fail to have a
blank somewhere in this range.

00:48:02.440 --> 00:48:04.680
So we've got to sort of
have a variable spacer

00:48:04.680 --> 00:48:10.800
out to that spot, where
that missing blank might be.

00:48:10.800 --> 00:48:12.850
So that's what this describes.

00:48:12.850 --> 00:48:15.420
If you didn't get
that, don't worry.

00:48:15.420 --> 00:48:17.010
That is a technical
point and you can

00:48:17.010 --> 00:48:19.620
try to think about it offline.

00:48:19.620 --> 00:48:25.380
And then at the very end, I'm
going to describe what happens.

00:48:25.380 --> 00:48:27.600
Describe the strings
that fail to have

00:48:27.600 --> 00:48:30.510
a hashtag in that location.

00:48:30.510 --> 00:48:36.180
It's how I describe all
strings that don't start right.
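
NOTE
A toy sketch of R bad-start using Python's re module, not from the lecture.
Assumptions: a five-symbol alphabet, input w = ab, and configuration length L = 5
standing in for 2^(n^k). Bounded repetition {lo,hi} plays the role of delta-sub-epsilon
to a power. All names are hypothetical. In the real reduction the exponents are written
in binary, which is what keeps the expression polynomial in size.
```python
import re
from itertools import product
DELTA = "qab_#"   # toy alphabet: q plays q0, _ is the blank, # is the delimiter
W = "ab"          # toy input w, so n = 2
L = 5             # toy configuration length, standing in for 2^(n^k)
N = len(W)
START = "q" + W + "_" * (L - 1 - N) + "#"   # padded start configuration plus delimiter
#
def sym(exclude=""):
    """Character class for DELTA with the symbols in `exclude` removed."""
    return "[" + "".join(re.escape(c) for c in DELTA if c not in exclude) + "]"
#
def power(expr, lo, hi=None):
    """expr^lo, or between lo and hi copies when hi is given."""
    return expr + ("{%d}" % lo if hi is None else "{%d,%d}" % (lo, hi))
#
ANY = sym()
# S_0 .. S_n: a wrong symbol at position i of q0 w1 ... wn.
parts = [power(ANY, i) + sym(START[i]) + ANY + "*" for i in range(N + 1)]
# S_blanks: fixed spacer of n+1, variable spacer of 0..L-n-2, then a non-blank.
parts.append(power(ANY, N + 1) + power(ANY, 0, L - N - 2) + sym("_") + ANY + "*")
# S_#: anything other than the delimiter at position L.
parts.append(power(ANY, L) + sym("#") + ANY + "*")
BAD_START = re.compile("|".join("(?:%s)" % p for p in parts))
#
def starts_badly(s):
    return BAD_START.fullmatch(s) is not None
#
def agrees_on_length(m):
    """Brute force over length-m strings (m >= L+1): match iff the prefix is wrong."""
    words = map("".join, product(DELTA, repeat=m))
    return all(starts_badly(s) != s.startswith(START) for s in words)
```
Strings shorter than L + 1 are outside this check; in the toy setup they contain no reject symbol, so the bad-reject piece would cover them.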

00:48:36.180 --> 00:48:39.330
That's a lot of work, just
to do that little piece.

00:48:39.330 --> 00:48:42.540
Fortunately, the next two
pieces are easier, surprisingly.

00:48:46.110 --> 00:48:50.160
You can jump in with a
question, but maybe I

00:48:50.160 --> 00:48:53.580
should move, push on.

00:48:53.580 --> 00:48:58.580
So now I'm going to describe the
bad move and bad reject pieces.

00:48:58.580 --> 00:49:02.150
And bad reject
generates all strings

00:49:02.150 --> 00:49:06.200
that don't contain
the q reject symbol.

00:49:06.200 --> 00:49:07.850
So that's going to
certainly describe

00:49:07.850 --> 00:49:11.580
all of the strings that
don't end correctly.

00:49:11.580 --> 00:49:15.870
And that's just simply
the delta with the q

00:49:15.870 --> 00:49:19.730
reject symbol removed, and
then any string of those.

00:49:19.730 --> 00:49:22.140
That's all strings that
don't have q reject.

00:49:22.140 --> 00:49:26.360
So that's going to
describe all strings that

00:49:26.360 --> 00:49:28.713
don't end with a q reject,
plus some other junk

00:49:28.713 --> 00:49:29.630
strings along the way.

00:49:29.630 --> 00:49:33.500
But that's
never a problem,

00:49:33.500 --> 00:49:36.737
to put in other strings
that you might be capturing

00:49:36.737 --> 00:49:38.570
in some other part of
the regular expression

00:49:38.570 --> 00:49:40.220
that you know are bad strings.

00:49:40.220 --> 00:49:43.400
You just want to make sure you
don't put in that one uniquely

00:49:43.400 --> 00:49:45.500
good string, which is
the rejecting computation

00:49:45.500 --> 00:49:47.700
history, good string.

00:49:47.700 --> 00:49:51.470
And lastly, we're
going to use the notion

00:49:51.470 --> 00:49:54.512
of the neighborhoods.

00:49:54.512 --> 00:49:56.220
You might think this
is the hardest part,

00:49:56.220 --> 00:49:57.820
but in fact not that hard.

00:49:57.820 --> 00:50:01.650
So these are all
of the strings that

00:50:01.650 --> 00:50:08.160
have somewhere along the
way a violation according

00:50:08.160 --> 00:50:09.480
to M's rules.

00:50:09.480 --> 00:50:11.580
You want to describe
all of those as well.

00:50:11.580 --> 00:50:13.720
I'm going to do that in
terms of the neighborhoods.

00:50:13.720 --> 00:50:19.320
But the neighborhoods are
going to be stretched out.

00:50:19.320 --> 00:50:22.460
We don't have a tableau
anymore, so they're not

00:50:22.460 --> 00:50:25.910
so easily visualizable,
but it's the same idea,

00:50:25.910 --> 00:50:26.820
the neighborhood.

00:50:26.820 --> 00:50:28.700
So this is abc and def.

00:50:28.700 --> 00:50:32.630
But now it's an
illegal neighborhood.

00:50:32.630 --> 00:50:34.745
def does not follow from abc.

00:50:39.170 --> 00:50:41.330
If all the
neighborhoods are legal,

00:50:41.330 --> 00:50:45.710
then the whole computation
is a legitimate computation,

00:50:45.710 --> 00:50:48.060
provided it starts
and ends correctly.

00:50:48.060 --> 00:50:50.030
So if it's not a
legitimate computation,

00:50:50.030 --> 00:50:52.520
there's got to be an illegal
neighborhood somewhere.

00:50:52.520 --> 00:50:55.250
And I'm going to just
describe all strings that

00:50:55.250 --> 00:50:57.530
have an illegal neighborhood.

00:50:57.530 --> 00:51:00.350
And the interesting part is
that you have to describe--

00:51:00.350 --> 00:51:07.430
you have to place that
separator between abc and def.

00:51:07.430 --> 00:51:10.550
So this is another place where
we're going to critically use

00:51:10.550 --> 00:51:17.780
the exponentiation, and the fact
that all of the configurations

00:51:17.780 --> 00:51:19.020
are the same length.

00:51:19.020 --> 00:51:20.420
That's what we're using there.

00:51:20.420 --> 00:51:23.720
We know exactly how
far apart the bottom

00:51:23.720 --> 00:51:26.960
of the 2 by 3 neighborhood
is from the top of the 2

00:51:26.960 --> 00:51:28.640
by 3 neighborhood.

00:51:28.640 --> 00:51:33.110
So we're going to take a
union over all illegal 2

00:51:33.110 --> 00:51:35.900
by 3 neighborhoods.

00:51:35.900 --> 00:51:37.520
Neighborhood settings,
I should say.

00:51:37.520 --> 00:51:39.620
And there's only a fixed number
of those, for the same reason

00:51:39.620 --> 00:51:41.360
that we had in the
Cook-Levin theorem.

00:51:41.360 --> 00:51:44.210
There's a fixed number of those,
depending upon the machine.

00:51:44.210 --> 00:51:45.920
And now we're going
to have, say, we're

00:51:45.920 --> 00:51:48.050
going to start with anything.

00:51:48.050 --> 00:51:49.730
Here's the top of
the neighborhood.

00:51:49.730 --> 00:51:53.480
Here is the separator
that separates the top

00:51:53.480 --> 00:51:57.020
from the bottom in the two
consecutive configurations,

00:51:57.020 --> 00:52:03.410
here's C i going to C i plus 1
inside my computation history.

00:52:03.410 --> 00:52:06.350
And then after that
separator, I put

00:52:06.350 --> 00:52:09.365
in the second part of the
neighborhood, which is the def.

00:52:13.020 --> 00:52:16.870
You have to really be
comfortable with the way we've

00:52:16.870 --> 00:52:19.060
been presenting these
other reductions up

00:52:19.060 --> 00:52:22.090
till now, to really get this.

00:52:22.090 --> 00:52:25.240
Anyway, I think
we're at the break.

00:52:25.240 --> 00:52:27.940
So we can just take
questions during the break,

00:52:27.940 --> 00:52:29.730
if you have any.

00:52:29.730 --> 00:52:35.910
And I will, otherwise,
see you in five minutes.

00:52:35.910 --> 00:52:38.760
In my description back here--

00:52:38.760 --> 00:52:40.020
let me just take this off.

00:52:42.810 --> 00:52:47.910
For bad reject, it looks like
I'm doing kind of overkill,

00:52:47.910 --> 00:52:50.160
and maybe doing
something wrong here.

00:52:50.160 --> 00:52:53.310
I'm describing all strings that
don't have a reject anywhere.

00:52:53.310 --> 00:52:57.930
But as long as I don't describe
the legitimate rejecting

00:52:57.930 --> 00:53:02.200
computation history, and I do
describe all strings that

00:53:02.200 --> 00:53:07.490
don't end correctly, I'm good.

00:53:07.490 --> 00:53:10.670
I could go through more
effort to make sure

00:53:10.670 --> 00:53:15.380
that I'm only describing the
very last configuration here

00:53:15.380 --> 00:53:17.270
as not having the reject.

00:53:17.270 --> 00:53:22.465
But that would
just be more work,

00:53:22.465 --> 00:53:23.840
and I don't need
to do that work.

00:53:23.840 --> 00:53:25.673
So maybe it would be
good just to understand

00:53:25.673 --> 00:53:28.172
why this is sufficient,
what I've described here,

00:53:28.172 --> 00:53:30.005
and it's not going to
cause me any problems.

00:53:37.010 --> 00:53:41.270
I'm getting a note from
one of my TAs, Thomas,

00:53:41.270 --> 00:53:43.910
saying that the
notion "bad" perhaps

00:53:43.910 --> 00:53:46.610
is confusing, because bad
sounds like rejecting.

00:53:46.610 --> 00:53:47.720
Yes.

00:53:47.720 --> 00:53:52.490
I mean bad in the sense of not
describing a legal computation

00:53:52.490 --> 00:53:53.150
history.

00:53:53.150 --> 00:53:54.525
If you can think
of another name,

00:53:54.525 --> 00:53:59.090
I'm happy to switch
that for future years.

00:53:59.090 --> 00:54:00.020
Too late for now.

00:54:00.020 --> 00:54:00.950
But, yeah.

00:54:00.950 --> 00:54:03.290
I don't mean rejecting,
I mean that it's--

00:54:07.050 --> 00:54:08.910
well, I don't know
what the right term is.

00:54:08.910 --> 00:54:11.040
Illegal?

00:54:11.040 --> 00:54:14.515
Or-- I'm not sure what a good--

00:54:14.515 --> 00:54:16.140
How are the neighborhoods
defined here?

00:54:16.140 --> 00:54:18.980
What is the tableau here?

00:54:18.980 --> 00:54:22.790
I think you do need to think
about it after lecture.

00:54:22.790 --> 00:54:25.580
But the tableau, you
can think of the tableau

00:54:25.580 --> 00:54:28.030
now here just
written out linearly.

00:54:28.030 --> 00:54:30.240
There are all of the
rows now, instead of

00:54:30.240 --> 00:54:32.760
nicely organized into a table.

00:54:32.760 --> 00:54:35.557
They just appear
consecutively, because I'm just

00:54:35.557 --> 00:54:36.390
trying to describe--

00:54:36.390 --> 00:54:38.070
I need to do it to
describe a string,

00:54:38.070 --> 00:54:40.080
with my regular
expression. It doesn't really

00:54:40.080 --> 00:54:41.520
make sense to think about.

00:54:41.520 --> 00:54:43.950
I mean you can fold it up
into a tableau, if you like.

00:54:43.950 --> 00:54:47.220
And then abc and
def will line up.

00:54:47.220 --> 00:54:50.280
But here, if you think about
them written consecutively,

00:54:50.280 --> 00:54:52.740
this is exactly how far
apart they end up being.

00:54:56.750 --> 00:55:02.570
Are there only polynomially
many illegal neighborhoods?

00:55:02.570 --> 00:55:04.360
That's why I kind
of corrected myself.

00:55:04.360 --> 00:55:07.030
It's not illegal
neighborhoods that we're

00:55:07.030 --> 00:55:10.300
talking-- because the number of
neighborhoods in this picture

00:55:10.300 --> 00:55:11.410
is vast.

00:55:11.410 --> 00:55:14.650
But the number of
neighborhood settings,

00:55:14.650 --> 00:55:17.050
the way to set these
values to abc, def.

00:55:20.950 --> 00:55:22.810
I mean these are
symbols that can

00:55:22.810 --> 00:55:29.790
appear in a configuration
of the machine.

00:55:29.790 --> 00:55:34.740
There's only a fixed number of
symbols that can appear here,

00:55:34.740 --> 00:55:38.310
that depend upon the
definition of the machine.

00:55:38.310 --> 00:55:39.720
So it's not only polynomial.

00:55:39.720 --> 00:55:42.420
There's a constant number
of these things, that

00:55:42.420 --> 00:55:45.190
only depends on the machine.

00:55:45.190 --> 00:55:47.560
So you have to think
about what's going on.

00:55:47.560 --> 00:55:49.830
There's a lot-- this
is a lot on the slide.

00:55:54.480 --> 00:55:56.040
Bad history for reject.

00:55:56.040 --> 00:55:57.900
It's a bad history
for rejecting,

00:55:57.900 --> 00:55:59.310
somebody's suggesting.

00:55:59.310 --> 00:56:01.765
Yeah, it's a bad history.

00:56:04.930 --> 00:56:05.590
Fake news.

00:56:05.590 --> 00:56:07.360
Maybe it should be fake.

00:56:07.360 --> 00:56:09.010
Fake would be a good term.

00:56:09.010 --> 00:56:10.360
No, that's not so good.

00:56:10.360 --> 00:56:10.960
I don't know.

00:56:17.600 --> 00:56:18.740
Yeah, 2 by 3.

00:56:18.740 --> 00:56:21.380
The reason 2 by
3 is the right--

00:56:21.380 --> 00:56:22.880
Somebody's asking why 2 by 3.

00:56:22.880 --> 00:56:25.670
2 by 3 is exactly
the size you need

00:56:25.670 --> 00:56:28.280
to say that, if all the
2 by 3 neighborhoods

00:56:28.280 --> 00:56:33.620
are correct everywhere in
the computation history,

00:56:33.620 --> 00:56:36.590
then the whole
history is going to be

00:56:36.590 --> 00:56:38.480
consistent with the
rules of M. It's

00:56:38.480 --> 00:56:41.030
going to be a legal
representation of a computation

00:56:41.030 --> 00:56:42.350
of M.

00:56:42.350 --> 00:56:48.500
So if the string, which
is allegedly a computation

00:56:48.500 --> 00:56:51.710
history, has a bad
neighborhood somewhere,

00:56:51.710 --> 00:56:55.140
bad 2 by 3 neighborhood
somewhere, then--

00:56:55.140 --> 00:56:57.470
well if it's not a legal
computation history,

00:56:57.470 --> 00:57:00.920
it's got to have a
bad neighborhood, 2

00:57:00.920 --> 00:57:02.406
by 3 neighborhood somewhere.

00:57:06.390 --> 00:57:10.130
OK, let's move on.

00:57:10.130 --> 00:57:13.970
Because I think we're
out of time here.

00:57:13.970 --> 00:57:16.220
Our timer is up.

00:57:16.220 --> 00:57:18.440
We're going to shift
gears now anyway.

00:57:18.440 --> 00:57:22.803
So if you got a little
lost in the previous proof,

00:57:22.803 --> 00:57:24.720
we're going to talk about
something different.

00:57:24.720 --> 00:57:25.820
And in some ways,
a little bit, I

00:57:25.820 --> 00:57:27.500
think a little lighter,
a little less technical.

00:57:27.500 --> 00:57:28.542
And that's about oracles.

00:57:36.570 --> 00:57:37.710
What are oracles?

00:57:37.710 --> 00:57:39.000
Oracles are a simple thing.

00:57:41.700 --> 00:57:48.008
But they are a useful concept
for a number of reasons.

00:57:48.008 --> 00:57:49.800
Especially because
they're going to tell us

00:57:49.800 --> 00:57:52.500
something interesting about
methods, which may or may not

00:57:52.500 --> 00:57:55.560
be useful for proving
the P versus NP question,

00:57:55.560 --> 00:57:57.990
when someday somebody
hopefully does that.

00:58:01.170 --> 00:58:03.030
What is an oracle?

00:58:03.030 --> 00:58:05.340
An oracle is free
information you're

00:58:05.340 --> 00:58:07.050
going to give to a
Turing machine, which

00:58:07.050 --> 00:58:11.443
might affect the difficulty
of solving problems.

00:58:11.443 --> 00:58:13.860
And the way we're going to
represent that free information

00:58:13.860 --> 00:58:17.400
is, we're going to allow
the Turing machine to test

00:58:17.400 --> 00:58:21.740
membership in some
specified language,

00:58:21.740 --> 00:58:27.230
without charging for
the work involved.

00:58:30.970 --> 00:58:34.090
I'm going to allow you to have any
language at all, some language

00:58:34.090 --> 00:58:38.810
A. And say a Turing machine
with an oracle for A

00:58:38.810 --> 00:58:43.085
is written this way, M
with a superscript A. It's

00:58:43.085 --> 00:58:46.370
a machine that has a black
box that can answer questions.

00:58:46.370 --> 00:58:50.540
Is some string, which
the machine can choose,

00:58:50.540 --> 00:58:52.680
in A or not?

00:58:52.680 --> 00:58:56.340
And it gets that answer in one
step, effectively for free.

00:59:01.020 --> 00:59:04.320
So you can imagine, depending
upon the language that you're

00:59:04.320 --> 00:59:07.515
providing to the machine,
that may or may not be useful.

00:59:11.250 --> 00:59:15.800
For example, suppose I give you
an oracle for the SAT language.

00:59:15.800 --> 00:59:17.880
That can be very useful.

00:59:17.880 --> 00:59:21.440
It could be very useful for
deciding SAT, for example.

00:59:21.440 --> 00:59:24.110
Because now you don't have to
go through a brute force search

00:59:24.110 --> 00:59:25.310
to solve SAT.

00:59:25.310 --> 00:59:26.720
You just ask the oracle.

00:59:26.720 --> 00:59:29.630
And the oracle is going to
say, yes it's satisfiable,

00:59:29.630 --> 00:59:31.430
or no it's not satisfiable.

00:59:31.430 --> 00:59:36.230
But you can use that to solve
other languages too, quickly.

00:59:36.230 --> 00:59:40.670
Because anything that you can
do in NP, you can reduce to SAT.

00:59:40.670 --> 00:59:43.383
So you can convert it to a SAT
question, which you can then

00:59:43.383 --> 00:59:45.050
ship up to the oracle,
and the oracle is

00:59:45.050 --> 00:59:47.540
going to tell you the answer.

00:59:47.540 --> 00:59:51.480
The word "oracle" already sort
of conveys something magical.

00:59:51.480 --> 00:59:53.268
We're not really
going to be concerned

00:59:53.268 --> 00:59:55.310
with the operation of the
oracle, so don't ask me

00:59:55.310 --> 00:59:57.602
how does the oracle work, or
what does it correspond to

00:59:57.602 --> 00:59:58.320
in reality.

00:59:58.320 --> 00:59:59.090
It doesn't.

00:59:59.090 --> 01:00:01.280
It's just a mathematical
device which

01:00:01.280 --> 01:00:03.770
provides this free information
to the Turing machine, which

01:00:03.770 --> 01:00:05.870
enables it to compute
certain things.

01:00:05.870 --> 01:00:07.657
It turns out to be
a useful concept.

01:00:07.657 --> 01:00:09.740
It's used in cryptography,
where you might imagine

01:00:09.740 --> 01:00:13.250
the oracle could provide
the factors to some number,

01:00:13.250 --> 01:00:16.340
or the password to some
system or something.

01:00:16.340 --> 01:00:17.510
Free information.

01:00:17.510 --> 01:00:19.680
And then what can
you do with that?

01:00:19.680 --> 01:00:22.475
So this is a notion that
comes up in other places.

01:00:27.660 --> 01:00:34.370
If we have an oracle, we can
think of all of the things

01:00:34.370 --> 01:00:37.220
that you can compute
in polynomial time

01:00:37.220 --> 01:00:39.020
relative to that oracle.

01:00:39.020 --> 01:00:41.600
So that's what we--

01:00:41.600 --> 01:00:43.445
the terminology that
people usually use

01:00:43.445 --> 01:00:46.490
is sometimes called
relativization, or computation

01:00:46.490 --> 01:00:49.020
relative to having
this extra information.

01:00:49.020 --> 01:00:52.850
So P with an A oracle
is all of the languages

01:00:52.850 --> 01:00:54.740
that you can decide
in polynomial time

01:00:54.740 --> 01:00:59.450
if you have an oracle
for A. Let's see.

01:01:05.190 --> 01:01:05.785
Yeah.

01:01:05.785 --> 01:01:07.410
Somebody's asking
me, is it really free

01:01:07.410 --> 01:01:10.326
or does it cost one unit?

01:01:10.326 --> 01:01:12.947
Even just setting up the oracle
and writing down the question

01:01:12.947 --> 01:01:15.280
to the oracle is going to
take you some number of steps.

01:01:15.280 --> 01:01:17.447
So you're not going to be
able to do an infinite number

01:01:17.447 --> 01:01:19.690
of oracle calls in zero time.

01:01:19.690 --> 01:01:22.198
So charging one
step or zero steps,

01:01:22.198 --> 01:01:23.490
not going to make a difference.

01:01:23.490 --> 01:01:25.532
Because you still have to
formulate the question.

01:01:32.230 --> 01:01:34.608
As I pointed out, P
with a SAT oracle--

01:01:34.608 --> 01:01:36.400
so all the things you
do in polynomial time

01:01:36.400 --> 01:01:39.530
with a SAT oracle includes NP.

01:01:39.530 --> 01:01:42.770
Does it perhaps
include other stuff?

01:01:42.770 --> 01:01:46.192
Or does it equal NP?

01:01:46.192 --> 01:01:47.900
Would have been a good
check-in question,

01:01:47.900 --> 01:01:50.330
but I'm not going to ask that.

01:01:50.330 --> 01:01:54.320
In fact, it seems like it
contains other things too.

01:01:54.320 --> 01:02:00.590
Because co-NP is also contained
within P, given a SAT oracle.

01:02:00.590 --> 01:02:04.310
Because the SAT oracle
answers either yes or no,

01:02:04.310 --> 01:02:07.050
depending upon the answer.

01:02:07.050 --> 01:02:09.890
So if the formula
is unsatisfiable,

01:02:09.890 --> 01:02:12.980
the oracle is going to say
no, it's not in the language.

01:02:12.980 --> 01:02:16.640
And now you can do the
complement of the SAT problem

01:02:16.640 --> 01:02:17.190
as well.

01:02:17.190 --> 01:02:18.580
The unsatisfiability problem.

01:02:18.580 --> 01:02:21.620
So you can do all of
co-NP in the same way.

01:02:21.620 --> 01:02:25.840
You can also define NP
relative to some oracle.

01:02:25.840 --> 01:02:28.600
So all the things you can do
with a non-deterministic Turing

01:02:28.600 --> 01:02:30.910
machine, where all
of the branches

01:02:30.910 --> 01:02:33.400
have separate access.

01:02:33.400 --> 01:02:35.710
And they can ask multiple
questions, by the way,

01:02:35.710 --> 01:02:38.840
of the oracle.

01:02:38.840 --> 01:02:39.770
Independently.

01:02:43.450 --> 01:02:46.690
Let's do another, a little bit
of a more challenging example.

01:02:46.690 --> 01:02:49.363
The MIN-FORMULA language,
which I hope you

01:02:49.363 --> 01:02:50.530
remember from your homework.

01:02:53.510 --> 01:02:57.710
So those are all of the
formulas that do not have

01:02:57.710 --> 01:03:01.850
a shorter equivalent formula.

01:03:01.850 --> 01:03:03.360
They are minimal.

01:03:03.360 --> 01:03:06.030
You cannot make a smaller
formula that's equivalent that

01:03:06.030 --> 01:03:09.950
gives you the same
Boolean function.

01:03:09.950 --> 01:03:14.170
So you showed, for example, that
that language is in PSPACE,

01:03:14.170 --> 01:03:15.430
as I recall.

01:03:15.430 --> 01:03:17.440
And there was some other--
you had another two

01:03:17.440 --> 01:03:18.648
problems about that language.

01:03:23.200 --> 01:03:26.580
The complement of the
MIN-FORMULA problem

01:03:26.580 --> 01:03:28.800
is in NP with a SAT oracle.

01:03:31.640 --> 01:03:34.310
So mull that over for a
second and then we'll see why.

01:03:42.040 --> 01:03:49.900
Here's an algorithm, an NP
with a SAT oracle algorithm.

01:03:49.900 --> 01:03:57.470
So in other words, now I
want to kind of implement

01:03:57.470 --> 01:03:59.810
that strategy, which I argued
in the homework problem

01:03:59.810 --> 01:04:00.710
was not legal.

01:04:00.710 --> 01:04:02.330
But now that I have
the SAT oracle,

01:04:02.330 --> 01:04:05.600
it's going to make it
possible where before it

01:04:05.600 --> 01:04:06.980
was not possible.

01:04:10.230 --> 01:04:12.950
So let's just understand
what I mean by that.

01:04:15.730 --> 01:04:20.920
If I'm trying to do the
non-minimal formulas,

01:04:20.920 --> 01:04:29.050
namely the formulas that do have
a shorter equivalent formula.

01:04:29.050 --> 01:04:31.630
I'm going to guess that
shorter formula, called psi.

01:04:35.050 --> 01:04:38.300
The challenge before was testing
whether that shorter formula

01:04:38.300 --> 01:04:41.400
was actually equivalent to phi.

01:04:41.400 --> 01:04:46.040
Because that's not obviously
doable in polynomial time.

01:04:46.040 --> 01:04:51.620
But the equivalence problem for
formulas is a co-NP problem.

01:04:51.620 --> 01:04:53.630
Or if you like to think
about it the other way,

01:04:53.630 --> 01:04:57.440
formula inequivalence
is an NP problem,

01:04:57.440 --> 01:04:59.360
because you just
have to-- the witness

01:04:59.360 --> 01:05:01.460
is the assignment on
which they disagree.

01:05:05.370 --> 01:05:09.940
So two formulas are equivalent
if they never disagree.

01:05:09.940 --> 01:05:11.295
And so that's a co-NP problem.

01:05:15.560 --> 01:05:18.320
A SAT oracle can
solve a co-NP problem.

01:05:18.320 --> 01:05:22.310
Namely, the equivalence of the
two formulas, the input formula

01:05:22.310 --> 01:05:25.820
and the one that you now
non-deterministically guessed.

01:05:25.820 --> 01:05:28.240
And if it turns out that
they are equivalent,

01:05:28.240 --> 01:05:30.820
a smaller formula is equivalent
to the input formula,

01:05:30.820 --> 01:05:34.568
you know the input
formula is not minimal.

01:05:34.568 --> 01:05:35.485
And so you can accept.

01:05:39.210 --> 01:05:40.710
And if it gets
the wrong formula,

01:05:40.710 --> 01:05:43.350
it turns out not
to be equivalent,

01:05:43.350 --> 01:05:45.630
then you reject on that
branch of the non-determinism,

01:05:45.630 --> 01:05:47.200
just like we did before.

01:05:47.200 --> 01:05:51.810
And if the formula really was
minimal, none of the branches

01:05:51.810 --> 01:05:54.340
is going to find a shorter
equivalent formula.

01:05:54.340 --> 01:05:58.920
So that's why this problem here
is in NP with a SAT oracle.

01:06:04.510 --> 01:06:08.920
So now we're going to try
to investigate this on my--

01:06:08.920 --> 01:06:13.040
we're getting near the
end of the lecture.

01:06:13.040 --> 01:06:20.270
We're going to look at
problems like, well,

01:06:20.270 --> 01:06:22.700
suppose I compare
P with a SAT oracle

01:06:22.700 --> 01:06:25.980
and NP with a SAT oracle.

01:06:25.980 --> 01:06:28.810
Could those be the same?

01:06:28.810 --> 01:06:31.440
Well, there's reasons to
believe those are not the same.

01:06:34.050 --> 01:06:41.330
But could there be any
A where P with an A oracle

01:06:41.330 --> 01:06:43.610
is the same as NP
with an A oracle?

01:06:43.610 --> 01:06:48.050
It seems like no, but
actually that's wrong.

01:06:48.050 --> 01:06:51.800
There is a language, there
are languages for which

01:06:51.800 --> 01:06:55.460
NP with that oracle
and P with that oracle

01:06:55.460 --> 01:06:57.260
are exactly the same.

01:06:57.260 --> 01:06:59.660
And that actually is an
interest-- it's not just

01:06:59.660 --> 01:07:03.860
a curiosity, it actually
has relevance to strategies

01:07:03.860 --> 01:07:05.450
for solving P versus NP.

01:07:08.330 --> 01:07:11.720
Hopefully I'll have
time to get to that.

01:07:11.720 --> 01:07:15.130
Can we think of an
oracle like a hash table?

01:07:15.130 --> 01:07:17.335
I think hashing is somehow
different in spirit.

01:07:20.920 --> 01:07:23.080
I understand there's
some similarity there,

01:07:23.080 --> 01:07:24.190
but I don't see the--

01:07:24.190 --> 01:07:33.550
hashing is a way of finding sort
of a short name for objects,

01:07:33.550 --> 01:07:36.460
which has a variety
of different purposes

01:07:36.460 --> 01:07:38.340
why you might want to do that.

01:07:38.340 --> 01:07:40.520
So I don't really
think it's the same.

01:07:40.520 --> 01:07:43.360
Let's see, an oracle
question, OK, let's see.

01:07:43.360 --> 01:07:46.030
How do we use SAT oracle to
solve whether two formulas are

01:07:46.030 --> 01:07:47.128
equivalent?

01:07:51.195 --> 01:07:52.820
OK, this is getting
back to this point.

01:07:52.820 --> 01:07:55.790
How can we use a SAT oracle to
solve whether two formulas are

01:07:55.790 --> 01:07:58.690
equivalent?

01:07:58.690 --> 01:08:03.010
Well, we can use a SAT oracle
to solve any NP problem,

01:08:03.010 --> 01:08:04.480
because it's reducible to SAT.

01:08:07.080 --> 01:08:08.470
In other words--

01:08:08.470 --> 01:08:12.407
P with a SAT oracle
contains all of NP,

01:08:12.407 --> 01:08:14.490
so you have to make sure
you understand that part.

01:08:21.010 --> 01:08:23.569
If you have the clique
problem, you can reduce.

01:08:23.569 --> 01:08:27.010
If I give you a clique problem,
which is an NP problem,

01:08:27.010 --> 01:08:31.510
and I want to use the oracle
to test if the formula--

01:08:31.510 --> 01:08:34.960
if the graph has a clique,
I reduce that problem

01:08:34.960 --> 01:08:37.029
to a SAT problem using
the Cook-Levin theorem.

01:08:41.240 --> 01:08:43.520
And knowing that a
clique of a certain size

01:08:43.520 --> 01:08:46.500
is going to correspond to having
a formula which is satisfiable,

01:08:46.500 --> 01:08:49.770
now I can ask the oracle.

01:08:49.770 --> 01:08:52.319
And if I can do NP problems,
I can do co-NP problems,

01:08:52.319 --> 01:08:54.527
because P is a
deterministic class.

01:08:54.527 --> 01:08:56.819
Even though it has an oracle,
it's still deterministic.

01:08:56.819 --> 01:08:58.830
It can invert the answer.

01:08:58.830 --> 01:09:03.520
Something that non-deterministic
machines cannot necessarily do.

01:09:03.520 --> 01:09:05.229
So I don't know, maybe that's--

01:09:05.229 --> 01:09:08.109
let's move on.

01:09:08.109 --> 01:09:11.939
So there's an oracle
where P to the A

01:09:11.939 --> 01:09:15.827
equals NP to the A, which
seems kind of amazing

01:09:15.827 --> 01:09:16.410
at some level.

01:09:16.410 --> 01:09:19.740
Because here's an oracle where
the non-determinism-- if I

01:09:19.740 --> 01:09:23.520
give you that oracle,
non-determinism doesn't help.

01:09:26.399 --> 01:09:28.939
And it's actually a
language we've seen.

01:09:28.939 --> 01:09:29.594
It's TQBF.

01:09:32.470 --> 01:09:34.021
Why is that?

01:09:34.021 --> 01:09:35.229
Well, here's the whole proof.

01:09:38.220 --> 01:09:41.294
If I have NP with a TQBF oracle.

01:09:44.840 --> 01:09:46.429
Let's just check
each of these steps.

01:09:49.380 --> 01:09:54.860
I claim I can do that with a
non-deterministic polynomial

01:09:54.860 --> 01:10:02.760
space machine, which no
longer has an oracle.

01:10:02.760 --> 01:10:06.210
The reason is that, if
I have polynomial space,

01:10:06.210 --> 01:10:10.170
I can answer questions about
TQBF without needing an oracle.

01:10:10.170 --> 01:10:12.510
I have enough space just to
answer the question directly

01:10:12.510 --> 01:10:14.370
myself.

01:10:14.370 --> 01:10:15.990
And I use my
non-determinism here

01:10:15.990 --> 01:10:20.610
to simulate the non-determinism
of the NP machine.

01:10:20.610 --> 01:10:24.450
So every time the NP machine
branches non-deterministically,

01:10:24.450 --> 01:10:26.850
so do I. Every time
one of those branches

01:10:26.850 --> 01:10:30.060
asks the oracle a
TQBF question, I just

01:10:30.060 --> 01:10:34.810
do my polynomial space algorithm
to solve that question myself.

01:10:34.810 --> 01:10:38.700
But now NPSPACE equals
PSPACE by Savitch's theorem.

01:10:38.700 --> 01:10:43.170
And because TQBF
is PSPACE complete,

01:10:43.170 --> 01:10:47.070
for the very same reason
that a SAT oracle allows

01:10:47.070 --> 01:10:50.190
me to do every NP
problem, a TQBF oracle

01:10:50.190 --> 01:10:51.780
allows me to do
every PSPACE problem.

01:10:54.400 --> 01:10:59.310
And so I get NP contained
within P for a TQBF oracle.

01:10:59.310 --> 01:11:02.190
And of course, you get the
containment the other way

01:11:02.190 --> 01:11:03.340
immediately.

01:11:03.340 --> 01:11:06.290
So they're equal.

01:11:06.290 --> 01:11:08.035
What does that have to do with--

01:11:11.470 --> 01:11:14.707
somebody said--
well, I'll just--

01:11:14.707 --> 01:11:16.040
I don't want to run out of time.

01:11:16.040 --> 01:11:19.810
So I'll take any
questions at the end.

01:11:19.810 --> 01:11:24.050
What has this got to do
with the P versus NP problem?

01:11:24.050 --> 01:11:27.140
OK, so this is a very
interesting connection.

01:11:35.380 --> 01:11:39.600
Remember, we just showed
through a combination

01:11:39.600 --> 01:11:41.850
of today's lecture and
yesterday's lecture,

01:11:41.850 --> 01:11:49.470
and I guess Thursday's
lecture, that this problem,

01:11:49.470 --> 01:11:52.530
this equivalence problem,
is not in PSPACE,

01:11:52.530 --> 01:11:56.140
and therefore it's not in P,
and therefore it's intractable.

01:11:56.140 --> 01:11:57.140
That's what we just did.

01:12:01.700 --> 01:12:04.280
We showed it's complete
for a class, which

01:12:04.280 --> 01:12:13.345
is provably outside of P,
provably bigger than P.

01:12:13.345 --> 01:12:14.720
That's the kind
of thing we would

01:12:14.720 --> 01:12:17.660
like to be able to do
to separate P and NP.

01:12:17.660 --> 01:12:20.510
We would like to show that
some other problem is not

01:12:20.510 --> 01:12:23.480
in P. Some other
problem is intractable.

01:12:23.480 --> 01:12:24.380
Namely, SAT.

01:12:24.380 --> 01:12:27.560
If we could do SAT,
then we're good.

01:12:27.560 --> 01:12:29.990
We've solved P and NP.

01:12:29.990 --> 01:12:33.260
So we already have an example
of being able to do that.

01:12:33.260 --> 01:12:37.020
Could we use the same method?

01:12:37.020 --> 01:12:39.780
Which is something people
did try to do many years ago,

01:12:39.780 --> 01:12:45.900
to show that SAT is not in P.
So what is that method really?

01:12:45.900 --> 01:12:49.647
The guts of that method really
comes from the hierarchy theorem.

01:12:49.647 --> 01:12:52.230
That's where you were actually
proving problems that are hard.

01:12:52.230 --> 01:12:56.460
You're getting this problem
through the hierarchy

01:12:56.460 --> 01:13:02.830
construction that's
provably outside of PSPACE.

01:13:02.830 --> 01:13:05.490
And outside of P.

01:13:05.490 --> 01:13:07.340
That's a diagonalization.

01:13:07.340 --> 01:13:10.840
And if you look carefully
at what's going on there--

01:13:10.840 --> 01:13:12.450
so we're going to
say, no, we're not

01:13:12.450 --> 01:13:15.240
going to be able to solve
SAT, show SAT's outside of P

01:13:15.240 --> 01:13:16.180
in the same way.

01:13:16.180 --> 01:13:19.290
And the reason is,
suppose we could.

01:13:19.290 --> 01:13:22.450
Well the hierarchy theorems
are proved by diagonalization.

01:13:25.550 --> 01:13:30.260
What I mean by that is that
in the hierarchy theorem,

01:13:30.260 --> 01:13:34.430
there's a machine D, which is
simulating some other machine,

01:13:34.430 --> 01:13:39.210
M. To remember what's
going on there,

01:13:39.210 --> 01:13:41.700
remember that we
made a machine which

01:13:41.700 --> 01:13:44.580
is going to make its language
different from the language

01:13:44.580 --> 01:13:47.700
of every machine that's
running with less space

01:13:47.700 --> 01:13:48.600
or with less time.

01:13:52.160 --> 01:13:54.680
That's how D was defined.

01:13:54.680 --> 01:14:00.360
It wants to make sure its
language cannot be done in less

01:14:00.360 --> 01:14:01.090
space.

01:14:01.090 --> 01:14:04.180
So it makes sure that its
language is different.

01:14:04.180 --> 01:14:07.860
It simulates the machines
that use less space,

01:14:07.860 --> 01:14:12.080
and does something
different from what they do.

01:14:12.080 --> 01:14:18.240
Well, that simulation--
if we had an oracle,

01:14:18.240 --> 01:14:22.650
if we're trying to
show that if we provide

01:14:22.650 --> 01:14:25.860
both a simulator and the
machine being simulated

01:14:25.860 --> 01:14:29.340
with the same oracle, the
simulation still works.

01:14:29.340 --> 01:14:33.600
Every time the machine you're
simulating asks a question,

01:14:33.600 --> 01:14:35.220
the simulator has
the same oracle

01:14:35.220 --> 01:14:37.320
so it can also ask
the same question,

01:14:37.320 --> 01:14:39.490
and can still do the simulation.

01:14:39.490 --> 01:14:43.830
So in other words, if you
could prove P different from NP

01:14:43.830 --> 01:14:46.290
using basically a
simulation, which

01:14:46.290 --> 01:14:48.990
is what a diagonalization
is, then you

01:14:48.990 --> 01:14:51.570
would be able to prove
that P is different from NP

01:14:51.570 --> 01:14:52.530
for every oracle.

01:14:55.820 --> 01:14:58.670
So if you can prove P different
from NP by a diagonalization,

01:14:58.670 --> 01:15:00.290
that would also
immediately prove

01:15:00.290 --> 01:15:03.620
that P is different from
NP for every oracle.

01:15:03.620 --> 01:15:10.850
Because the argument is
transparent to the oracle.

01:15:10.850 --> 01:15:15.360
If you just put the oracle
down, everything still works.

01:15:15.360 --> 01:15:24.120
But-- here is the big
but, it can't be that--

01:15:24.120 --> 01:15:27.990
we know that P to the A is--

01:15:27.990 --> 01:15:29.250
we know this is false.

01:15:29.250 --> 01:15:33.220
We just exhibited an oracle
for which they're equal.

01:15:33.220 --> 01:15:35.280
It's not the case that
P is different from NP

01:15:35.280 --> 01:15:36.120
for every oracle.

01:15:38.810 --> 01:15:41.480
Sometimes they're
equal, for some oracles.

01:15:41.480 --> 01:15:44.180
So something that's
just basically a very

01:15:44.180 --> 01:15:46.670
straightforward diagonalization,
something that

01:15:46.670 --> 01:15:48.860
at its core is a
diagonalization,

01:15:48.860 --> 01:15:51.320
is not going to be
enough to solve P versus NP.

01:15:51.320 --> 01:15:53.210
Because otherwise it
would prove that they're

01:15:53.210 --> 01:15:54.335
different for every oracle.

01:15:54.335 --> 01:15:57.320
And sometimes they're not
different, for some oracles.

01:16:00.460 --> 01:16:04.240
That's an important insight
for what kind of a method

01:16:04.240 --> 01:16:09.340
will not be adequate to
prove P different from NP.

01:16:09.340 --> 01:16:12.490
And this comes up all the time.

01:16:12.490 --> 01:16:16.870
People propose hypothetical
solutions where they're trying

01:16:16.870 --> 01:16:18.220
to show P different from NP.

01:16:18.220 --> 01:16:21.430
One of the very first
things people ask is, well,

01:16:21.430 --> 01:16:27.400
would that argument still work
if you put an oracle there.

01:16:27.400 --> 01:16:31.750
Often it does, which points
out there was a flaw.

01:16:31.750 --> 01:16:35.720
Anyway, last check-in.

01:16:35.720 --> 01:16:38.830
So this is just a little test
of your knowledge about oracles.

01:16:41.780 --> 01:16:43.655
Why don't we-- in our
remaining minute here.

01:16:46.400 --> 01:16:48.230
Let's say 30 seconds.

01:16:48.230 --> 01:16:51.770
And then we'll do
a wrap on this,

01:16:51.770 --> 01:16:54.290
and I'll point out
which ones are right.

01:16:54.290 --> 01:16:56.780
Oh boy, we're all over
the place on this one.

01:17:00.820 --> 01:17:02.170
You're liking them all.

01:17:02.170 --> 01:17:08.100
Well, I guess the ones that
are false are lagging slightly.

01:17:15.630 --> 01:17:16.950
OK, let's conclude.

01:17:21.310 --> 01:17:23.800
Did I give you
enough time there?

01:17:23.800 --> 01:17:24.430
Share results.

01:17:30.160 --> 01:17:31.020
So, in fact--

01:17:37.510 --> 01:17:41.427
Yeah, so having an
oracle for the complement

01:17:41.427 --> 01:17:42.760
is the same as having an oracle
for the language itself.

01:17:42.760 --> 01:17:46.160
So this is certainly true.
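
NOTE
Editor's sketch, not part of the lecture: a minimal Python illustration of
why an oracle for a language and an oracle for its complement are
interchangeable. The names Oracle, complement, in_A and in_co_A are
assumptions, and the example language is an arbitrary stand-in.
```python
from typing import Callable
Oracle = Callable[[str], bool]
def complement(oracle: Oracle) -> Oracle:
    # One query to A answers one query to the complement of A: flip the answer.
    return lambda q: not oracle(q)
# Hypothetical stand-in language: binary strings with an even number of 1s.
in_A: Oracle = lambda q: q.count("1") % 2 == 0
in_co_A = complement(in_A)
```
Applying complement twice gives back an oracle that answers exactly as the
original does, so either oracle can be built from the other at no extra cost.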

01:17:46.160 --> 01:17:50.220
NP with a SAT oracle equals coNP
with a SAT oracle: we have no reason

01:17:50.220 --> 01:17:53.808
to believe that
would be true, and we

01:17:53.808 --> 01:17:54.850
don't know it to be true.

01:17:54.850 --> 01:18:00.660
So B is not a good choice
and that's the laggard here.

01:18:00.660 --> 01:18:05.700
MIN-FORMULA, well, is in
PSPACE, and anything in PSPACE

01:18:05.700 --> 01:18:09.660
is reducible to TQBF, so
this is certainly true.

01:18:09.660 --> 01:18:14.370
And same thing for NP with
TQBF and coNP with TQBF.

01:18:14.370 --> 01:18:22.640
Once you have TQBF, you're
going to get all of PSPACE.

01:18:22.640 --> 01:18:27.450
And as we pointed out, this
is going to be equal as well.

01:18:27.450 --> 01:18:30.500
So why don't we end here.

01:18:30.500 --> 01:18:34.910
And I think that's
my last slide.

01:18:34.910 --> 01:18:36.800
Oh no, there's my summary here.

01:18:36.800 --> 01:18:38.150
This is what we've done.

01:18:38.150 --> 01:18:43.888
And I will send you
all off on your way.

01:18:43.888 --> 01:18:45.680
How does the interaction
between the Turing

01:18:45.680 --> 01:18:46.847
machine and the oracle look?

01:18:46.847 --> 01:18:49.490
Yeah, I didn't define exactly
how the machine interacts

01:18:49.490 --> 01:18:50.510
with an oracle.

01:18:50.510 --> 01:18:52.580
You can imagine
having a separate tape

01:18:52.580 --> 01:18:54.110
where it writes
the oracle question

01:18:54.110 --> 01:18:56.960
and then goes into a
special query state.

01:18:56.960 --> 01:18:59.212
You can formalize
it however you like.

01:18:59.212 --> 01:19:01.670
They're all going to be-- any
reasonable way of formalizing

01:19:01.670 --> 01:19:06.500
it is going to come up with
the same notion in the end.
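
NOTE
Editor's sketch, not part of the lecture: one possible way to model the
separate query tape and the special query state in Python. The class
OracleMachine and its method names are assumptions made for illustration,
and the palindrome oracle is an arbitrary example language.
```python
class OracleMachine:
    def __init__(self, oracle):
        self.oracle = oracle   # membership test for the oracle language
        self.query_tape = ""   # separate tape that holds the current question
        self.answer = None     # set each time the query state is entered
        self.queries = 0       # number of oracle questions asked so far
    def write_query(self, s):
        # Write the oracle question on the query tape.
        self.query_tape = s
    def enter_query_state(self):
        # Entering the query state replaces the question with the oracle's answer.
        self.answer = self.oracle(self.query_tape)
        self.queries += 1
        return self.answer
# Hypothetical usage with a palindrome oracle.
m = OracleMachine(lambda q: q == q[::-1])
m.write_query("abba")
first_answer = m.enter_query_state()
```
Other formalizations, such as a dedicated answer state, would serve equally
well here; this one is only meant to make the tape and the state concrete.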

01:19:06.500 --> 01:19:10.070
It does show that P with a
TQBF oracle equals PSPACE.

01:19:10.070 --> 01:19:11.330
Yes, that is correct.

01:19:11.330 --> 01:19:13.970
Good point.

01:19:13.970 --> 01:19:16.220
Why do we need the
oracle to be TQBF?

01:19:16.220 --> 01:19:19.170
Wouldn't SAT also work because
it could solve any NP problem?

01:19:19.170 --> 01:19:23.010
So you're asking, does
P with a SAT oracle

01:19:23.010 --> 01:19:25.170
equal NP with a SAT oracle?

01:19:25.170 --> 01:19:26.190
Not known.

01:19:26.190 --> 01:19:29.670
And believed not to be
true, but we don't have

01:19:29.670 --> 01:19:32.460
a compelling reason for that.

01:19:32.460 --> 01:19:34.290
No one has any idea
how to do that.

01:19:34.290 --> 01:19:42.630
Because, for example, we showed
the complement of MIN-FORMULA

01:19:42.630 --> 01:19:44.235
is in NP with a SAT oracle.

01:19:47.050 --> 01:19:48.820
But no one knows how to do--

01:19:48.820 --> 01:19:51.070
because there's sort of two
levels of non-determinism

01:19:51.070 --> 01:19:51.570
there.

01:19:51.570 --> 01:19:54.160
There's guessing
the smaller formula,

01:19:54.160 --> 01:19:59.070
and then guessing again
to check the equivalence.

01:19:59.070 --> 01:20:01.320
And they really can't be
combined, because one of them

01:20:01.320 --> 01:20:03.750
is sort of an exists-type
guessing,

01:20:03.750 --> 01:20:05.970
the other one is kind of
a for-all-type guessing.

01:20:11.010 --> 01:20:13.320
No one knows how to do
that in polynomial time

01:20:13.320 --> 01:20:14.190
with a SAT oracle.

01:20:18.085 --> 01:20:19.960
Generally believed that
they're not the same.

01:20:23.250 --> 01:20:24.990
In the check-in,
why was B false?

01:20:24.990 --> 01:20:29.900
B is the same question.

01:20:29.900 --> 01:20:35.750
Does NP with a SAT oracle
equal coNP with a SAT oracle?

01:20:35.750 --> 01:20:39.820
I'm not saying it's false,
it's just not known to be true.

01:20:39.820 --> 01:20:43.870
It doesn't follow from anything
that we've shown so far.

01:20:43.870 --> 01:20:47.500
And I think that would
be something that--

01:20:47.500 --> 01:20:49.510
well, I guess it
doesn't immediately

01:20:49.510 --> 01:20:51.565
imply any famous open problem.

01:20:54.790 --> 01:20:57.280
I wouldn't
necessarily expect you

01:20:57.280 --> 01:21:00.430
to know that it's an
unsolved problem, but it is.

01:21:04.870 --> 01:21:06.850
Could we have oracles
for undecidable languages?

01:21:06.850 --> 01:21:08.410
Absolutely.

01:21:08.410 --> 01:21:09.370
Would it be helpful?

01:21:09.370 --> 01:21:12.160
Well, if you're trying to
solve an undecidable problem,

01:21:12.160 --> 01:21:13.690
it would be helpful.

01:21:13.690 --> 01:21:16.240
But people do study that.

01:21:16.240 --> 01:21:22.690
In fact, the original concept
of oracles was presented,

01:21:22.690 --> 01:21:27.670
was derived in
computability theory.

01:21:27.670 --> 01:21:31.413
And a side note, you can
talk about reducibility.

01:21:31.413 --> 01:21:32.830
No, I don't want
to even go there.

01:21:32.830 --> 01:21:33.910
Too complicated.

01:21:37.130 --> 01:21:41.000
What is not known to be true?

01:21:41.000 --> 01:21:44.210
What is not known to be true
is that NP with a SAT oracle

01:21:44.210 --> 01:21:48.980
equals coNP with a SAT oracle,
or equals P with a SAT oracle.

01:21:48.980 --> 01:21:53.120
Nothing is known except the
obvious relations among those.

01:21:53.120 --> 01:21:57.485
Those are all unknown, and just
not known to be true or false.

01:22:05.020 --> 01:22:08.730
Is NP with a SAT
oracle equal to NP?

01:22:08.730 --> 01:22:10.140
Probably not.

01:22:10.140 --> 01:22:15.050
NP with a SAT oracle, for
one thing, contains coNP.

01:22:15.050 --> 01:22:16.700
Because it's even more powerful.

01:22:16.700 --> 01:22:21.050
We pointed out that P with
a SAT oracle contains coNP.

01:22:21.050 --> 01:22:24.870
And so NP with a SAT oracle is
going to be at least as good.

01:22:24.870 --> 01:22:27.350
And so it's going
to contain coNP.

01:22:27.350 --> 01:22:29.810
And so, probably not
going to be equal to

01:22:29.810 --> 01:22:34.380
NP unless shockingly
unexpected things

01:22:34.380 --> 01:22:37.460
happen in our complexity world.