WEBVTT

00:00:00.090 --> 00:00:02.430
The following content is
provided under a Creative

00:00:02.430 --> 00:00:03.810
Commons license.

00:00:03.810 --> 00:00:06.050
Your support will help
MIT OpenCourseWare

00:00:06.050 --> 00:00:10.160
continue to offer high quality
educational resources for free.

00:00:10.160 --> 00:00:12.690
To make a donation or to
view additional materials

00:00:12.690 --> 00:00:16.590
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:16.590 --> 00:00:17.260
at ocw.mit.edu.

00:00:26.700 --> 00:00:28.955
PROFESSOR: All right,
guys, let's get started.

00:00:28.955 --> 00:00:31.330
So today, we're going to talk
about side-channel attacks,

00:00:31.330 --> 00:00:36.360
which is a general class
of problems that comes up

00:00:36.360 --> 00:00:38.870
in all kinds of systems.

00:00:38.870 --> 00:00:40.320
Broadly, side-channel
attacks are

00:00:40.320 --> 00:00:42.778
situations where you haven't
thought about some information

00:00:42.778 --> 00:00:44.810
that your system
might be revealing.

00:00:44.810 --> 00:00:47.860
So typically, you have multiple
components that you [INAUDIBLE]

00:00:47.860 --> 00:00:50.480
maybe a user talking
to some server.

00:00:50.480 --> 00:00:53.387
And you're thinking, great,
I know exactly all the bits

00:00:53.387 --> 00:00:57.600
going over some wire [INAUDIBLE]
server, and those are secure.

00:00:57.600 --> 00:01:00.796
But it's often easy to miss
some information revealed,

00:01:00.796 --> 00:01:03.830
either by user or by server.

00:01:03.830 --> 00:01:07.800
So the example that the
paper for today talks about

00:01:07.800 --> 00:01:10.465
is a situation where the
timing of the messages

00:01:10.465 --> 00:01:12.900
between the user and
the server reveals

00:01:12.900 --> 00:01:16.070
some additional information
that you wouldn't have otherwise

00:01:16.070 --> 00:01:19.390
learned by just observing the
bits flowing between these two

00:01:19.390 --> 00:01:20.930
guys.

00:01:20.930 --> 00:01:24.650
But in fact, there's a much
broader class of side-channels

00:01:24.650 --> 00:01:25.790
you might worry about.

00:01:25.790 --> 00:01:28.550
Originally,
side-channels showed up,

00:01:28.550 --> 00:01:31.360
or people discovered them in
the '40s when they discovered

00:01:31.360 --> 00:01:33.440
that when you start
typing characters

00:01:33.440 --> 00:01:37.110
on a teletype the electronics,
or the electrical machinery

00:01:37.110 --> 00:01:39.580
in the teletype, would
emit RF radiation.

00:01:39.580 --> 00:01:41.920
And you can hook up
an oscilloscope nearby

00:01:41.920 --> 00:01:44.490
and just watch the
characters being typed out

00:01:44.490 --> 00:01:48.230
by monitoring the frequency
or RF frequencies that

00:01:48.230 --> 00:01:49.800
are going out of this machine.

00:01:49.800 --> 00:01:54.410
So RF radiation is a classic
example of a side-channel

00:01:54.410 --> 00:01:57.490
that you might worry about.

00:01:57.490 --> 00:02:00.880
And there are lots of
other examples

00:02:00.880 --> 00:02:02.900
that people have looked
at, almost anything.

00:02:02.900 --> 00:02:07.343
So power usage is
another side-channel

00:02:07.343 --> 00:02:08.259
you might worry about.

00:02:08.259 --> 00:02:09.750
So your computer
is probably going

00:02:09.750 --> 00:02:12.230
to use different amounts of
power depending on what exactly

00:02:12.230 --> 00:02:13.970
it's computing.

00:02:13.970 --> 00:02:17.200
Another clever example
is sound, which

00:02:17.200 --> 00:02:19.330
turns out to also leak stuff.

00:02:19.330 --> 00:02:21.740
There's a cute paper
that you can look at.

00:02:21.740 --> 00:02:25.344
People listen to a
printer, and based on the sound

00:02:25.344 --> 00:02:26.760
the printer is
making you can tell

00:02:26.760 --> 00:02:28.670
what characters it's printing.

00:02:28.670 --> 00:02:31.695
This is especially easy to do
for dot matrix printers that

00:02:31.695 --> 00:02:35.180
make this very annoying
sound when they're printing.

00:02:35.180 --> 00:02:38.690
And in general, a good
thing to think about,

00:02:38.690 --> 00:02:40.681
Kevin on Monday's
lecture also mentioned

00:02:40.681 --> 00:02:43.014
some interesting side-channels
that he's running into

00:02:43.014 --> 00:02:45.700
in his research.

00:02:45.700 --> 00:02:49.090
But, in particular,
here we're going

00:02:49.090 --> 00:02:51.880
to look at the
specific side-channel

00:02:51.880 --> 00:02:56.240
that David Brumley and Dan Boneh
looked at in their paper-- I

00:02:56.240 --> 00:02:59.095
guess about 10 years ago now--
where they were able to extract

00:02:59.095 --> 00:03:03.170
a cryptographic key out of
a web server running Apache

00:03:03.170 --> 00:03:06.310
by measuring the timing
of different responses

00:03:06.310 --> 00:03:11.520
to different input packets
from the adversarial client.

00:03:11.520 --> 00:03:14.330
And in this particular
case, they're

00:03:14.330 --> 00:03:15.990
going after a cryptographic key.

00:03:15.990 --> 00:03:17.860
In fact, many
side-channel attacks

00:03:17.860 --> 00:03:21.440
target cryptographic keys
partly because it's a little bit

00:03:21.440 --> 00:03:24.744
tricky to get lots of data
through a side-channel.

00:03:24.744 --> 00:03:26.410
And cryptographic
keys are one situation

00:03:26.410 --> 00:03:30.050
where getting a small number
of bits helps you a lot.

00:03:30.050 --> 00:03:32.870
So in their attack they're
able to extract maybe

00:03:32.870 --> 00:03:36.760
about 200 to 256 bits or so.

00:03:36.760 --> 00:03:38.970
And just from those
200ish bits, they're

00:03:38.970 --> 00:03:42.300
able to break the cryptographic
key of this web server.

00:03:42.300 --> 00:03:43.890
Whereas, if you're
trying to leak

00:03:43.890 --> 00:03:46.140
some database full of
Social Security numbers,

00:03:46.140 --> 00:03:48.340
then that'll be
a lot of bits you

00:03:48.340 --> 00:03:51.082
have to leak to get
out of this database.

00:03:51.082 --> 00:03:53.290
So that's why many of
these side-channels,

00:03:53.290 --> 00:03:55.670
as you'll see
later on, often

00:03:55.670 --> 00:03:59.240
focus on getting
small secrets out,

00:03:59.240 --> 00:04:02.850
which might be cryptographic
keys or passwords.

00:04:02.850 --> 00:04:04.970
But in general, this
is applicable to lots

00:04:04.970 --> 00:04:09.210
of other situations as well.

00:04:09.210 --> 00:04:11.230
And one cool thing
about this paper,

00:04:11.230 --> 00:04:13.410
before we jump into
the details, is

00:04:13.410 --> 00:04:16.459
that they show that you can actually
do this over the network.

00:04:16.459 --> 00:04:18.890
So as you probably figured
out from reading this paper,

00:04:18.890 --> 00:04:20.560
they have to do a
lot of careful work

00:04:20.560 --> 00:04:23.150
to tease out these
minute differences

00:04:23.150 --> 00:04:24.670
in timing information.

00:04:24.670 --> 00:04:28.290
So if you actually compute out
the numbers from this paper,

00:04:28.290 --> 00:04:33.190
it turns out that each request
that they sent to the server

00:04:33.190 --> 00:04:35.365
differs potentially from
another request

00:04:35.365 --> 00:04:39.480
by on the order of 1 to
2 microseconds, which

00:04:39.480 --> 00:04:41.280
is pretty tiny.

00:04:41.280 --> 00:04:47.000
So you have to be quite
careful, and over a network

00:04:47.000 --> 00:04:50.080
it might be hard to tell
whether some server took

00:04:50.080 --> 00:04:53.750
1 or 2 microseconds longer to
process your request or not.

00:04:53.750 --> 00:04:58.150
And as a result, it was not
so clear whether you

00:04:58.150 --> 00:05:01.060
could mount this kind of attack
over a very noisy network.

00:05:01.060 --> 00:05:03.690
And these guys were
among the first people

00:05:03.690 --> 00:05:06.620
to show that you can actually
do this over a real Ethernet

00:05:06.620 --> 00:05:09.600
network with a server sitting
in one place, a client sitting

00:05:09.600 --> 00:05:10.461
somewhere else.

00:05:10.461 --> 00:05:12.460
And you could actually
measure these differences

00:05:12.460 --> 00:05:16.740
partly by averaging, partly
through other tricks.

00:05:16.740 --> 00:05:21.270
All right, does that make sense,
the overall side-channel stuff?

00:05:21.270 --> 00:05:21.770
All right.

00:05:21.770 --> 00:05:23.860
So the plan for the
rest of this lecture

00:05:23.860 --> 00:05:27.990
is we'll first dive into
the details of this RSA

00:05:27.990 --> 00:05:29.800
cryptosystem that
these guys use.

00:05:29.800 --> 00:05:32.480
Then we'll not look at
exactly why it's secure

00:05:32.480 --> 00:05:34.900
or not but we'll look at
how do you implement it

00:05:34.900 --> 00:05:37.980
because that turns out to
be critical for exploiting

00:05:37.980 --> 00:05:39.350
this particular side-channel.

00:05:39.350 --> 00:05:42.800
They carefully leverage various
details of the implementation

00:05:42.800 --> 00:05:46.164
to figure out when
some things are faster or slower.

00:05:46.164 --> 00:05:48.080
And then we'll pop back
out once we understand

00:05:48.080 --> 00:05:49.210
how RSA is implemented.

00:05:49.210 --> 00:05:52.125
Then we'll come back and figure
out how do you attack it,

00:05:52.125 --> 00:05:54.250
how do you attack all these
different optimizations

00:05:54.250 --> 00:05:56.040
that RSA has.

00:05:56.040 --> 00:05:57.580
Sounds good?

00:05:57.580 --> 00:05:58.710
All right.

00:05:58.710 --> 00:06:00.760
So I guess let's
start off by looking

00:06:00.760 --> 00:06:04.200
at the high level plan for RSA.

00:06:04.200 --> 00:06:08.940
So RSA is a pretty widely
used public key cryptosystem.

00:06:08.940 --> 00:06:10.800
We've mentioned these
guys a couple of weeks

00:06:10.800 --> 00:06:14.690
ago in general in certificates,
in the context of certificates.

00:06:14.690 --> 00:06:17.100
But now we're going to look
at actually how it works.

00:06:17.100 --> 00:06:20.710
So typically there's 3 things
you have to worry about.

00:06:20.710 --> 00:06:25.290
So there's generating a key,
encrypting, and decrypting.

00:06:25.290 --> 00:06:29.220
So for RSA, the way you
generate a key is you actually

00:06:29.220 --> 00:06:32.220
pick 2 large prime integers.

00:06:32.220 --> 00:06:35.500
So you're going to
pick 2 primes, p and q.

00:06:35.500 --> 00:06:42.020
And in the paper, these
guys focus on p and q,

00:06:42.020 --> 00:06:45.810
which are about 512 bits each.

00:06:45.810 --> 00:06:49.730
So this is typically
called 1,024 bit RSA

00:06:49.730 --> 00:06:52.570
because the resulting product
of these primes that you're

00:06:52.570 --> 00:06:56.500
going to use in a second is
a 1,024-bit integer.

00:06:56.500 --> 00:06:59.360
These days, that's probably
not a particularly good choice

00:06:59.360 --> 00:07:02.170
for the size of your
RSA key because it

00:07:02.170 --> 00:07:06.860
makes it relatively easy for
attackers to factor this-- not

00:07:06.860 --> 00:07:09.080
trivial but certainly viable.

00:07:09.080 --> 00:07:12.170
So while 10 years ago this seemed
like a potentially sensible

00:07:12.170 --> 00:07:14.520
parameter, now if you're
actually building a system,

00:07:14.520 --> 00:07:16.780
you should probably
pick a 2,000 or 3,000

00:07:16.780 --> 00:07:19.866
or even 4,000 bit RSA key.

00:07:19.866 --> 00:07:22.590
Well, what the
RSA key size means

00:07:22.590 --> 00:07:24.620
is the size of this product.

00:07:24.620 --> 00:07:26.480
And then, for
convenience, we're going

00:07:26.480 --> 00:07:28.140
to talk about the
number n, which

00:07:28.140 --> 00:07:33.010
is just the product of
these 2 primes, p times q.

00:07:33.010 --> 00:07:33.510
All right.

00:07:33.510 --> 00:07:35.490
So now we know how
to generate a key,

00:07:35.490 --> 00:07:38.440
now we need to figure
out-- well this is at least

00:07:38.440 --> 00:07:40.100
part of a key-- now
we're going to have

00:07:40.100 --> 00:07:45.060
to figure out how we're going
to encrypt and decrypt messages.

00:07:45.060 --> 00:07:48.280
And the way we're going to
encrypt and decrypt messages

00:07:48.280 --> 00:07:54.320
is by exponentiating numbers
modulo this number n.

00:07:54.320 --> 00:07:57.790
So it seems a little weird, but
let's go with it for a second.

00:07:57.790 --> 00:08:00.520
So if you want to
encrypt a message,

00:08:00.520 --> 00:08:03.560
then we're going
to take a message m

00:08:03.560 --> 00:08:11.920
and transform it into
m to the power e mod n.

00:08:11.920 --> 00:08:14.570
So e is going to be some
exponent-- we'll talk about how

00:08:14.570 --> 00:08:15.640
to choose it in a second.

00:08:15.640 --> 00:08:17.880
But this is how we're
going to encrypt a message.

00:08:17.880 --> 00:08:21.230
We'll just take this
message as an integer number

00:08:21.230 --> 00:08:23.260
and just exponentiate it.

00:08:23.260 --> 00:08:25.610
And then we'll see why
this works in a second,

00:08:25.610 --> 00:08:30.500
but let's call this
guy c, ciphertext.

00:08:30.500 --> 00:08:36.039
Then to decrypt it, we're
going to somehow find

00:08:36.039 --> 00:08:37.940
an interesting
other exponent where

00:08:37.940 --> 00:08:41.336
you can take a ciphertext c
and if you exponentiate it

00:08:41.336 --> 00:08:46.440
to some power d mod n,
then you'll magically

00:08:46.440 --> 00:08:49.500
get back the same message m.

00:08:49.500 --> 00:08:52.290
So this is the general plan:
To encrypt, you exponentiate.

00:08:52.290 --> 00:08:56.687
To decrypt, you exponentiate
by another exponent.

00:08:56.687 --> 00:08:58.270
And in general, it
seems a little hard

00:08:58.270 --> 00:09:00.561
to figure out how we're going
to come up with these two

00:09:00.561 --> 00:09:02.800
magic numbers that
somehow end up giving us

00:09:02.800 --> 00:09:04.390
back the same message.

00:09:04.390 --> 00:09:06.890
But it turns out
that if you look

00:09:06.890 --> 00:09:12.000
at how exponentiation works
or multiplication works,

00:09:12.000 --> 00:09:14.340
modulo this number n,

00:09:14.340 --> 00:09:22.670
then there's this cool property
that if you have any number x,

00:09:22.670 --> 00:09:26.000
and you raise it to what's
called the Euler phi

00:09:26.000 --> 00:09:32.215
function of n-- maybe I'll
use more board space for this.

00:09:32.215 --> 00:09:33.790
This seems important.

00:09:33.790 --> 00:09:37.998
So if you take x and you
raise it to phi of n,

00:09:37.998 --> 00:09:44.370
then this is going to
be equal to 1 mod n.

00:09:44.370 --> 00:09:48.260
And this phi function for
our particular choice of n

00:09:48.260 --> 00:09:49.960
is pretty straightforward,
it's actually

00:09:49.960 --> 00:09:54.600
p minus 1 times q minus 1.

00:09:54.600 --> 00:10:01.560
So this gives us hope that maybe
if we pick e and d so that e times

00:10:01.560 --> 00:10:06.370
d is phi of n plus 1, then
we're in good shape.

00:10:06.370 --> 00:10:11.200
Because then for any message m, if we
exponentiate it to e and then d,

00:10:11.200 --> 00:10:16.380
we get back 1 times m
because our ed product

00:10:16.380 --> 00:10:19.420
is going to be
roughly phi of n plus 1,

00:10:19.420 --> 00:10:25.445
or maybe some constant
alpha times phi of n, plus 1.

00:10:25.445 --> 00:10:26.320
Does this make sense?

00:10:26.320 --> 00:10:30.800
This is why the message is going
to get decrypted correctly.
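
NOTE Editor's sketch, not part of the spoken lecture: the decryption argument above written out as one chain of congruences. It assumes e times d equals alpha times phi of n plus 1 for some integer alpha, and that m is coprime to n so that Euler's theorem applies; the symbol \varphi for the phi function is a notational choice made here.
```latex
c^{d} \equiv (m^{e})^{d} \equiv m^{ed} \equiv m^{\alpha\,\varphi(n) + 1} \equiv \left(m^{\varphi(n)}\right)^{\alpha} \cdot m \equiv 1^{\alpha} \cdot m \equiv m \pmod{n}
```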

00:10:30.800 --> 00:10:33.900
And it turns out that there's
a reasonably straightforward

00:10:33.900 --> 00:10:39.880
algorithm if you know this
phi value for how to compute

00:10:39.880 --> 00:10:42.430
d given an e or e given a d.

00:10:42.430 --> 00:10:42.930
All right.

00:10:42.930 --> 00:10:43.770
Question.

00:10:43.770 --> 00:10:45.640
AUDIENCE: Isn't 1 mod n just 1?

00:10:45.640 --> 00:10:48.710
PROFESSOR: Yeah, so
far we add one more.

00:10:48.710 --> 00:10:50.048
Sorry?

00:10:50.048 --> 00:10:52.388
AUDIENCE: Like, up over there.

00:10:52.388 --> 00:10:53.471
PROFESSOR: Yeah, this one?

00:10:53.471 --> 00:10:55.430
AUDIENCE: Yeah.

00:10:55.430 --> 00:10:57.200
PROFESSOR: Isn't 1 mod n just 1?

00:10:57.200 --> 00:10:58.820
Sorry, I mean this.

00:10:58.820 --> 00:11:02.462
So when I say this mod n, it
means that both sides taken mod n

00:11:02.462 --> 00:11:04.820
are equal.

00:11:04.820 --> 00:11:07.990
So what this means
is if you want

00:11:07.990 --> 00:11:10.046
to think of mod as
literally an operator,

00:11:10.046 --> 00:11:13.816
you would write this guy
mod n equals 1 mod n.

00:11:13.816 --> 00:11:15.440
So that's what mod
n on the side means.

00:11:15.440 --> 00:11:18.325
Like, the whole
equality is mod n.

00:11:18.325 --> 00:11:21.175
Sorry for the [INAUDIBLE].

00:11:21.175 --> 00:11:22.610
Make sense?

00:11:22.610 --> 00:11:24.120
All right.

00:11:24.120 --> 00:11:27.665
So what this basically
means for RSA is that we're

00:11:27.665 --> 00:11:32.150
going to pick some value e.

00:11:32.150 --> 00:11:34.558
So e is going to be
our encryption value.

00:11:34.558 --> 00:11:41.180
And then from e we're going
to generate d to be basically

00:11:41.180 --> 00:11:45.826
1 over e mod phi of n.

00:11:45.826 --> 00:11:47.665
And there's the extended
Euclidean algorithm

00:11:47.665 --> 00:11:51.460
you can use to do this
computation efficiently.

00:11:51.460 --> 00:11:53.390
But in order to do
this you actually

00:11:53.390 --> 00:11:56.180
have to know this
phi of n, which

00:11:56.180 --> 00:11:59.485
requires knowing the
factorization of our number n

00:11:59.485 --> 00:12:01.910
into p and q.
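
NOTE Editor's sketch, not part of the spoken lecture: one way to compute d as the inverse of e modulo phi of n with the extended Euclidean algorithm. The function names extended_gcd and mod_inverse and the toy values p = 61, q = 53, e = 17 are assumptions chosen for illustration; real keys use primes hundreds of bits long.
```python
def extended_gcd(a, b):
    # Returns (g, s, t) with g = gcd(a, b) and a*s + b*t == g.
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t
def mod_inverse(e, phi):
    # Returns d with (e * d) % phi == 1; this exists only when gcd(e, phi) == 1.
    g, s, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e has no inverse modulo phi")
    return s % phi
p, q = 61, 53              # toy primes, far too small for real use
phi = (p - 1) * (q - 1)    # computing this requires knowing p and q
e = 17                     # public exponent, coprime to phi
d = mod_inverse(e, phi)    # private exponent
```
Without p and q an attacker has no obvious way to obtain phi, which is why d stays hard to compute from the public values alone.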

00:12:01.910 --> 00:12:02.410
All right.

00:12:02.410 --> 00:12:08.600
So finally, RSA ends
up being a system where

00:12:08.600 --> 00:12:13.132
the public key is this number n
and this encryption exponent e.

00:12:13.132 --> 00:12:16.750
So n and e are public,
and d should be private.

00:12:16.750 --> 00:12:18.820
So then anyone can
exponentiate a message

00:12:18.820 --> 00:12:20.320
to encrypt it for you.

00:12:20.320 --> 00:12:22.914
But only you know this
value d and therefore

00:12:22.914 --> 00:12:25.230
can decrypt messages.

00:12:25.230 --> 00:12:30.090
And as long as you don't know
the factorization

00:12:30.090 --> 00:12:32.660
of n into p and q,
then you don't know

00:12:32.660 --> 00:12:33.785
what this phi of n is.

00:12:33.785 --> 00:12:35.910
And as a result, it's
actually difficult to compute

00:12:35.910 --> 00:12:37.470
this d value.

00:12:37.470 --> 00:12:41.580
So this is roughly what RSA is.

00:12:41.580 --> 00:12:43.370
High level.

00:12:43.370 --> 00:12:45.450
Does this make sense?
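
NOTE Editor's sketch, not part of the spoken lecture: the three steps (generate a key, encrypt, decrypt) with toy numbers. The values p = 61, q = 53, e = 17 and the function names encrypt and decrypt are assumptions for illustration only. This is the unpadded math and should not be used as-is; pow with a negative exponent needs Python 3.8 or later.
```python
p, q = 61, 53                  # toy primes; the paper's keys use primes of about 512 bits
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler phi of n; needs the factorization of n
e = 17                         # public exponent, chosen coprime to phi
d = pow(e, -1, phi)            # private exponent, the inverse of e modulo phi
def encrypt(m):
    return pow(m, e, n)        # c = m^e mod n, uses only the public key (n, e)
def decrypt(c):
    return pow(c, d, n)        # m = c^d mod n, uses the private exponent d
m = 65
c = encrypt(m)
assert decrypt(c) == m
```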

00:12:45.450 --> 00:12:45.950
All right.

00:12:45.950 --> 00:12:48.140
So there's 2 things I
want to talk about now

00:12:48.140 --> 00:12:52.590
that we at least have the basic
[? implementation ?] for RSA.

00:12:52.590 --> 00:12:55.850
There's tricks to use it
correctly and pitfalls

00:12:55.850 --> 00:12:57.085
in how to use RSA.

00:12:57.085 --> 00:12:59.210
And then there's all kinds
of implementation tricks

00:12:59.210 --> 00:13:02.440
on how do you actually
implement [? root ?]

00:13:02.440 --> 00:13:07.360
code to do these exponentiations
and do them efficiently.

00:13:07.360 --> 00:13:10.010
This is actually
nontrivial because these are all

00:13:10.010 --> 00:13:13.110
large numbers, these are 1,000
bit integers that you can't just

00:13:13.110 --> 00:13:15.730
do a multiply instruction for.

00:13:15.730 --> 00:13:18.156
Probably going to take
a fair amount of time

00:13:18.156 --> 00:13:20.430
to do these operations.

00:13:20.430 --> 00:13:20.930
All right.

00:13:20.930 --> 00:13:22.430
So the first thing
I want to mention

00:13:22.430 --> 00:13:26.470
is the various RSA pitfalls.

00:13:26.470 --> 00:13:31.310
One of them we're actually going
to rely on in a little bit.

00:13:31.310 --> 00:13:35.360
One property is, that
it's multiplicative.

00:13:38.827 --> 00:13:43.600
So what I mean by this is that
suppose we have 2 messages.

00:13:43.600 --> 00:13:46.950
Suppose we have m0 and m1.

00:13:46.950 --> 00:13:49.196
And suppose I
encrypt these guys,

00:13:49.196 --> 00:13:55.612
so I encrypt m0, I'm going to
get m0 to the power e mod n.

00:13:55.612 --> 00:14:02.840
And if I encrypt m1, then
I'd get m1 to the e mod n.

00:14:02.840 --> 00:14:06.220
The problem is-- not
necessarily a problem

00:14:06.220 --> 00:14:08.940
but could be a
surprise to someone

00:14:08.940 --> 00:14:11.300
using RSA-- it's
very easy to generate

00:14:11.300 --> 00:14:14.480
an encryption of m0
times m1 because you just

00:14:14.480 --> 00:14:15.940
multiply these 2 numbers.

00:14:15.940 --> 00:14:18.480
If you multiply these
guys out, you're

00:14:18.480 --> 00:14:26.500
going to get m0
m1 to the e mod n.

00:14:26.500 --> 00:14:29.840
This is a correct encryption
under this simplistic use

00:14:29.840 --> 00:14:34.512
of RSA for the
value m0 times m1.
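
NOTE Editor's sketch, not part of the spoken lecture: the multiplicative property checked on a toy key. The key values (n = 3233, e = 17, d = 2753, from p = 61 and q = 53) and the messages 7 and 11 are assumptions for illustration.
```python
n, e, d = 3233, 17, 2753       # toy key; d is included only to check the result
m0, m1 = 7, 11
c0 = pow(m0, e, n)             # encryption of m0
c1 = pow(m1, e, n)             # encryption of m1
c_product = (c0 * c1) % n      # built from the two ciphertexts alone, no key needed
assert c_product == pow(m0 * m1, e, n)           # it is a valid encryption of m0 * m1
assert pow(c_product, d, n) == (m0 * m1) % n     # and it decrypts to the product
```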

00:14:34.512 --> 00:14:36.847
I mean at this point,
it's not a huge problem

00:14:36.847 --> 00:14:38.555
because if you aren't
able to decrypt it,

00:14:38.555 --> 00:14:41.940
you're just able to construct
this encrypted message.

00:14:41.940 --> 00:14:45.620
But it might be that the
overall system maybe allows you

00:14:45.620 --> 00:14:46.786
to decrypt certain messages.

00:14:46.786 --> 00:14:50.110
And if it allows you to decrypt
this message that you construct

00:14:50.110 --> 00:14:52.670
yourself, maybe you can
now go back and figure out

00:14:52.670 --> 00:14:53.820
what are these messages.

00:14:53.820 --> 00:15:00.310
So it's maybe not a great plan
to be ignorant of this fact.

00:15:00.310 --> 00:15:04.000
This has certainly come back
to bite a number of protocols

00:15:04.000 --> 00:15:05.450
that use RSA.

00:15:05.450 --> 00:15:06.950
This is one property
we'll actually

00:15:06.950 --> 00:15:11.450
use it as a defensive mechanism
towards the end of the lecture.

00:15:11.450 --> 00:15:15.910
Another property of RSA that you
probably want to watch out for

00:15:15.910 --> 00:15:18.566
is the fact that
it's deterministic.

00:15:21.350 --> 00:15:23.695
So in this naive
implementation

00:15:23.695 --> 00:15:27.072
that I just described here,
if you take a message m

00:15:27.072 --> 00:15:29.165
and you encrypt it,
you're going to get m

00:15:29.165 --> 00:15:32.100
to the e mod n, which is
a deterministic function

00:15:32.100 --> 00:15:33.296
of the message.

00:15:33.296 --> 00:15:35.303
So if you encrypt
it again, you'll

00:15:35.303 --> 00:15:36.870
get exactly the same encryption.

00:15:36.870 --> 00:15:38.590
This is not surprising
but it might not

00:15:38.590 --> 00:15:40.510
be a desirable
property because if I

00:15:40.510 --> 00:15:44.090
see you send some
message encrypted with RSA,

00:15:44.090 --> 00:15:46.495
and I want to know what
it is, it might be hard

00:15:46.495 --> 00:15:47.370
for me to decrypt it.

00:15:47.370 --> 00:15:48.890
But I can try different
things and I can see,

00:15:48.890 --> 00:15:50.306
well are you sending
this message?

00:15:50.306 --> 00:15:52.600
I'll encrypt it and see if
I get the same ciphertext.

00:15:52.600 --> 00:15:54.820
And if so, then I'll know
that's what you encrypted.

00:15:54.820 --> 00:15:56.790
Because all I need to
encrypt a message is

00:15:56.790 --> 00:16:01.850
the publicly known public key,
which is n and the number e.

00:16:01.850 --> 00:16:04.104
So that's not so great.

00:16:04.104 --> 00:16:06.145
And you might want to
watch out for this property

00:16:06.145 --> 00:16:08.640
if you're actually using RSA.
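
NOTE Editor's sketch, not part of the spoken lecture: the guessing attack that deterministic encryption allows. The toy public key (n = 3233, e = 17), the secret value 42, and the candidate range are assumptions for illustration; the attacker uses only the public key.
```python
n, e = 3233, 17                # toy public key, known to everyone
def encrypt(m):
    return pow(m, e, n)        # unpadded RSA, same input always gives the same output
observed = encrypt(42)         # ciphertext seen on the wire; 42 stands in for the secret
candidates = range(100)        # the attacker's guesses for the plaintext
recovered = [m for m in candidates if encrypt(m) == observed]
```
This only works when the set of plausible messages is small enough to enumerate, which is often the case for things like yes/no answers or short PINs.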

00:16:08.640 --> 00:16:10.140
So all of these
primitives are

00:16:10.140 --> 00:16:14.340
probably a little bit
hard to use directly.

00:16:14.340 --> 00:16:17.320
What people do in
practice in order

00:16:17.320 --> 00:16:20.024
to avoid these
problems with RSA is

00:16:20.024 --> 00:16:21.690
they encode the message
in a certain way

00:16:21.690 --> 00:16:23.030
before encrypting it.

00:16:23.030 --> 00:16:25.790
Instead of directly
exponentiating a message,

00:16:25.790 --> 00:16:28.020
they actually take some
function f of the message,

00:16:28.020 --> 00:16:31.680
and then they encrypt that,

00:16:31.680 --> 00:16:33.096
mod n.

00:16:33.096 --> 00:16:38.190
And this function f, the
right one to use these days,

00:16:38.190 --> 00:16:41.526
is probably something called
optimal asymmetric encryption

00:16:41.526 --> 00:16:45.595
padding, OAEP.
You can look it up.

00:16:45.595 --> 00:16:49.310
It's an encoding that has
two interesting properties.

00:16:49.310 --> 00:16:51.390
First of all, it
injects randomness.

00:16:51.390 --> 00:16:57.230
You can think of f of m as
generating a 1,000-bit message

00:16:57.230 --> 00:16:58.580
that you're going to encrypt.

00:16:58.580 --> 00:17:01.566
Part of this message is going to
be your message m in the middle

00:17:01.566 --> 00:17:02.065
here.

00:17:02.065 --> 00:17:03.420
So that you can get it back
when you decrypt, of course.

00:17:03.420 --> 00:17:04.641
[INAUDIBLE].

00:17:04.641 --> 00:17:06.599
So there's 2 interesting
things you want to do.

00:17:06.599 --> 00:17:08.339
You want to put in
some randomness here,

00:17:08.339 --> 00:17:10.640
some value r so that when
you encrypt the message

00:17:10.640 --> 00:17:12.839
multiple times, you'll
get different results out

00:17:12.839 --> 00:17:16.069
each time, so then it's
not deterministic anymore.

00:17:16.069 --> 00:17:18.390
And in order to defeat this
multiplicative property

00:17:18.390 --> 00:17:20.840
and other kinds of
problems, you're

00:17:20.840 --> 00:17:23.010
going to put in some
fixed padding here.

00:17:23.010 --> 00:17:25.510
You can think of this as
an alternating sequence of 1 0

00:17:25.510 --> 00:17:27.003
1 0 1 0.

00:17:27.003 --> 00:17:28.044
You can do better things.

00:17:28.044 --> 00:17:30.134
But roughly it's some
predictable sequence

00:17:30.134 --> 00:17:33.395
that you put in here and
whenever you decrypt,

00:17:33.395 --> 00:17:35.590
you make sure the
sequence is still there.

00:17:35.590 --> 00:17:37.560
Even a multiplication
is going

00:17:37.560 --> 00:17:40.570
to destroy this bit pattern.

00:17:40.570 --> 00:17:43.597
And then it should be
clear that someone tampered

00:17:43.597 --> 00:17:46.082
with the message, and you reject it.

00:17:46.082 --> 00:17:51.220
And if it's still there, then
presumably, sometimes provably,

00:17:51.220 --> 00:17:53.621
no one tampered with your
message, and as a result

00:17:53.621 --> 00:17:55.004
you should be able to accept it.

00:17:55.004 --> 00:17:59.140
And treat message m as
correctly encrypted by someone.

00:17:59.140 --> 00:18:00.721
Make sense?

00:18:00.721 --> 00:18:01.220
Yeah?

00:18:01.220 --> 00:18:05.250
AUDIENCE: If the attacker knows
how big the pad is, can't they

00:18:05.250 --> 00:18:10.960
put a 1 in the lowest
place and then [INAUDIBLE]

00:18:10.960 --> 00:18:13.207
under multiplication?

00:18:13.207 --> 00:18:14.165
PROFESSOR: Yeah, maybe.

00:18:14.165 --> 00:18:16.552
It's a little bit tricky
because this randomness

00:18:16.552 --> 00:18:17.510
is going to bleed over.

00:18:17.510 --> 00:18:20.170
So the particular
construction of this OAEP

00:18:20.170 --> 00:18:22.740
is a little bit more
sophisticated than this.

00:18:22.740 --> 00:18:25.210
But if you imagine
this is integer

00:18:25.210 --> 00:18:28.160
multiplication not
bit-wise multiplication.

00:18:28.160 --> 00:18:31.530
And so this randomness is
going to bleed over somewhere,

00:18:31.530 --> 00:18:34.700
and you can construct
an OAEP scheme such

00:18:34.700 --> 00:18:37.896
that this doesn't happen.

00:18:37.896 --> 00:18:41.720
[INAUDIBLE] Make sense?

00:18:41.720 --> 00:18:42.390
All right.

00:18:42.390 --> 00:18:44.514
So it turns out that
basically you shouldn't really

00:18:44.514 --> 00:18:46.170
use this RSA math
directly, you should

00:18:46.170 --> 00:18:48.760
use some library in
practice that implements all

00:18:48.760 --> 00:18:51.340
those things correctly for you.

00:18:51.340 --> 00:18:53.980
And use it just as an
encrypt/decrypt primitive.

00:18:53.980 --> 00:18:56.390
But it turns out these details
will come in and matter

00:18:56.390 --> 00:18:58.473
for us because we're
actually trying to figure out

00:18:58.473 --> 00:19:03.300
how to break or how to attack
an existing RSA implementation.

00:19:03.300 --> 00:19:07.100
So in particular the
attack from this paper

00:19:07.100 --> 00:19:10.080
is going to exploit the
fact that the server is

00:19:10.080 --> 00:19:13.210
going to check for this padding
when they get a message.

00:19:13.210 --> 00:19:17.130
So this is how we're going to
time how long it takes a server

00:19:17.130 --> 00:19:17.770
to decrypt.

00:19:17.770 --> 00:19:21.690
We're going to send some random
message, or some carefully

00:19:21.690 --> 00:19:22.545
constructed message.

00:19:22.545 --> 00:19:26.243
But the message wasn't
constructed by taking a real m

00:19:26.243 --> 00:19:27.330
and encrypting it.

00:19:27.330 --> 00:19:29.980
We're going to construct a
careful ciphertext integer

00:19:29.980 --> 00:19:31.300
value.

00:19:31.300 --> 00:19:33.020
And the server is
going to decrypt it,

00:19:33.020 --> 00:19:34.700
it's going to decrypt
to some nonsense,

00:19:34.700 --> 00:19:36.590
and the padding is
going to not match

00:19:36.590 --> 00:19:37.820
with a very high probability.

00:19:37.820 --> 00:19:40.090
And immediately the server
is going to reject it.

00:19:40.090 --> 00:19:41.720
And the reason this
is going to be good

00:19:41.720 --> 00:19:44.340
for us is because it will tell
us exactly how long it took

00:19:44.340 --> 00:19:47.250
the server to get to this point,
just do the RSA decryption,

00:19:47.250 --> 00:19:50.281
get this message, check
the padding, and reject it.

00:19:50.281 --> 00:19:52.030
So that's what we're
going to be measuring

00:19:52.030 --> 00:19:54.290
in this attack from the paper.

00:19:54.290 --> 00:19:55.450
Does that make sense?

00:19:55.450 --> 00:19:57.700
So there's some integrity
component to the message

00:19:57.700 --> 00:20:02.800
that allows us to time the
decryption leading up to it.

00:20:02.800 --> 00:20:03.625
All right.

00:20:03.625 --> 00:20:07.180
So now let's talk about how
do you actually implement RSA.

00:20:07.180 --> 00:20:09.940
So the core of it is
really this exponentiation,

00:20:09.940 --> 00:20:12.485
which is not exactly
trivial to do

00:20:12.485 --> 00:20:14.860
as I was mentioning earlier
because all these numbers are

00:20:14.860 --> 00:20:15.880
very large integers.

00:20:15.880 --> 00:20:18.820
So the message itself
is going to be at least,

00:20:18.820 --> 00:20:20.830
in this paper,
1,000 bit integer.

00:20:20.830 --> 00:20:23.810
And the exponent itself is
also going to be pretty large.

00:20:23.810 --> 00:20:26.180
The encryption exponent
is at least well known.

00:20:26.180 --> 00:20:27.596
But the decryption
exponent better

00:20:27.596 --> 00:20:30.255
also be a large integer,
on the order of 1,000 bits.

00:20:30.255 --> 00:20:32.126
So you have a 1,000
bit integer you

00:20:32.126 --> 00:20:35.900
want to exponentiate to another
1,000 bit integer power modulo

00:20:35.900 --> 00:20:38.030
some other 1,000
bit integer n. That's

00:20:38.030 --> 00:20:39.830
going to be a little
messy, if you just do

00:20:39.830 --> 00:20:42.210
the naive thing. So almost everyone has
So almost everyone has

00:20:42.210 --> 00:20:45.530
lots of optimizations in
their RSA implementations

00:20:45.530 --> 00:20:48.640
to make this go a
little bit faster.

00:20:48.640 --> 00:20:51.970
And there's four
optimizations that matter

00:20:51.970 --> 00:20:53.420
for the purpose of this attack.

00:20:53.420 --> 00:20:55.420
There is actually more
tricks that you can play,

00:20:55.420 --> 00:20:57.100
but the most important
ones are these.

00:20:57.100 --> 00:21:02.130
So first there's something
called the Chinese remainder

00:21:02.130 --> 00:21:06.640
theorem, or CRT.
And just to remind you

00:21:06.640 --> 00:21:10.250
from grade school or
high school maybe what

00:21:10.250 --> 00:21:12.330
this remainder theorem says.

00:21:12.330 --> 00:21:16.380
It actually says that
if you have two numbers

00:21:16.380 --> 00:21:20.170
and you have some
value x and you know

00:21:20.170 --> 00:21:25.360
that x is equal to a1 mod p.

00:21:25.360 --> 00:21:31.200
And you know that x is
equal to a2 mod q, where

00:21:31.200 --> 00:21:33.350
p and q are prime numbers.

00:21:33.350 --> 00:21:38.790
And this modular equality
applies to the whole equation.

00:21:38.790 --> 00:21:42.920
Then it turns out that there's
a unique solution to this

00:21:42.920 --> 00:21:43.650
mod pq.

00:21:43.650 --> 00:21:52.210
So there is some x prime such
that x equals x prime mod pq.

00:21:52.210 --> 00:21:55.050
And in fact, there's
a unique such x prime,

00:21:55.050 --> 00:21:57.170
and it's actually very
efficient to compute.

00:21:57.170 --> 00:21:59.450
So the Chinese
remainder theorem also

00:21:59.450 --> 00:22:03.070
comes with an algorithm for
how to compute this unique x

00:22:03.070 --> 00:22:09.300
prime that's equal to x mod pq
given the values a1 and a2 mod

00:22:09.300 --> 00:22:12.570
p and q, respectively.

00:22:12.570 --> 00:22:15.170
Make sense?

00:22:15.170 --> 00:22:17.495
OK, so how can you use this
Chinese remainder theorem

00:22:17.495 --> 00:22:22.580
to speed up modular
exponentiation?

00:22:22.580 --> 00:22:24.130
So the way this is
going to help us

00:22:24.130 --> 00:22:26.350
is that if you
notice all the time

00:22:26.350 --> 00:22:31.400
we're doing this computation
of some bunch of stuff modulo

00:22:31.400 --> 00:22:33.710
n, which is p times q.

00:22:33.710 --> 00:22:35.135
And the Chinese
remainder theorem

00:22:35.135 --> 00:22:39.100
says that if you want the value
of something mod p times q,

00:22:39.100 --> 00:22:42.320
it suffices to compute the
value of that thing mod p

00:22:42.320 --> 00:22:44.746
and the value of
that thing mod q.

00:22:44.746 --> 00:22:46.610
And then use the Chinese
remainder theorem

00:22:46.610 --> 00:22:48.960
to figure out the
unique solution to what

00:22:48.960 --> 00:22:53.220
this thing is mod p times q.
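
NOTE
Editor's sketch of the CRT plan just described, under stated assumptions.
The function names crt_combine and modexp_crt are hypothetical, and the
recombination shown is Garner's formula, one standard way to do the CRT step
(the lecture does not say which one is used). pow(q, -1, p) needs Python 3.8+.
```python
def crt_combine(a1, a2, p, q):
    # Return the unique x in [0, p*q) with x = a1 (mod p) and x = a2 (mod q).
    q_inv = pow(q, -1, p)              # inverse of q modulo p
    h = (q_inv * (a1 - a2)) % p
    return a2 + h * q
def modexp_crt(c, d, p, q):
    # Two half-size exponentiations, then one recombination.
    m_p = pow(c % p, d, p)
    m_q = pow(c % q, d, q)
    return crt_combine(m_p, m_q, p, q)
# Toy primes for illustration only; real RSA primes are hundreds of bits.
example = modexp_crt(65, 2753, 61, 53)
```
The result agrees with pow(65, 2753, 61 * 53), computed directly mod n.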

00:22:53.220 --> 00:22:55.516
All right, why is this faster?

00:22:55.516 --> 00:22:58.335
Seems like you're basically
doing the same thing twice,

00:22:58.335 --> 00:23:00.854
and there's more
work to recombine it.

00:23:00.854 --> 00:23:02.270
Is this going to
save me anything?

00:23:02.270 --> 00:23:02.770
Yeah?

00:23:02.770 --> 00:23:03.746
AUDIENCE: [INAUDIBLE]

00:23:06.479 --> 00:23:08.270
PROFESSOR: Well, they're
certainly smaller,

00:23:08.270 --> 00:23:09.311
they're not that much smaller.

00:23:09.311 --> 00:23:11.950
And so p and q, so n
is 1,000 bits, p and q

00:23:11.950 --> 00:23:15.600
are both 500 bits, they're not
quite to the machine word size

00:23:15.600 --> 00:23:16.360
yet.

00:23:16.360 --> 00:23:18.980
But it is going to
help us because most

00:23:18.980 --> 00:23:21.340
of the stuff we're doing
in this computation

00:23:21.340 --> 00:23:23.160
is all these multiplications.

00:23:23.160 --> 00:23:26.315
And roughly multiplication
is quadratic in the size

00:23:26.315 --> 00:23:29.960
of the thing you're multiplying
because in the grade school

00:23:29.960 --> 00:23:31.980
method of multiplication
you take all the digits

00:23:31.980 --> 00:23:34.910
and multiply them by all the
other digits in the number.

00:23:34.910 --> 00:23:38.785
And as a result, doing
exponentiation multiplication

00:23:38.785 --> 00:23:40.650
is roughly quadratic
in the input size.

00:23:40.650 --> 00:23:46.460
So if we shrink the value of p,
we basically go from 1,000 bits

00:23:46.460 --> 00:23:49.204
to 512 bits, we reduce the
size of our input by a factor of 2.

00:23:49.204 --> 00:23:51.370
So this means all this
multiplication and exponentiation

00:23:51.370 --> 00:23:54.930
is going to be roughly
4 times cheaper.

00:23:54.930 --> 00:23:58.530
So even though we do it twice,
each time is 4 times faster.

00:23:58.530 --> 00:24:01.300
So overall, the
CRT optimization is

00:24:01.300 --> 00:24:04.120
going to give us
basically a 2x performance

00:24:04.120 --> 00:24:08.080
boost for doing any
RSA operation, both

00:24:08.080 --> 00:24:10.694
in the encryption
and decryption side.

00:24:10.694 --> 00:24:14.220
That make sense?

00:24:14.220 --> 00:24:15.570
All right.

00:24:15.570 --> 00:24:20.250
So that's the first optimization
that most people use.

00:24:20.250 --> 00:24:24.550
The second thing that
most implementations do

00:24:24.550 --> 00:24:27.195
is a technique called
sliding windows.

00:24:32.620 --> 00:24:36.200
And we'll look at this
implementation in 2 steps.

00:24:36.200 --> 00:24:40.199
So this implementation is
going to be concerned with what

00:24:40.199 --> 00:24:41.740
basic operations
we're going to perform

00:24:41.740 --> 00:24:44.390
to do this exponentiation.

00:24:44.390 --> 00:24:49.000
Suppose you have some
ciphertext c that's now 500 bits

00:24:49.000 --> 00:24:52.155
because you're now
doing mod p or mod q.

00:24:52.155 --> 00:24:58.270
We have a 500 bit c and,
similarly, roughly a 500 bit d

00:24:58.270 --> 00:25:00.185
as well.

00:25:00.185 --> 00:25:04.070
So how do we raise
c to the power d?

00:25:04.070 --> 00:25:07.040
I guess the stupid way
is to take c and keep

00:25:07.040 --> 00:25:08.740
multiplying by c, d times.

00:25:08.740 --> 00:25:10.770
But d is very big,
it's 2 to the 500.

00:25:10.770 --> 00:25:12.940
So that's never going to finish.

00:25:12.940 --> 00:25:16.780
So a more amenable,
or more performant,

00:25:16.780 --> 00:25:20.810
plan is to do what's
called repeated squaring.

00:25:20.810 --> 00:25:24.880
So that's the step
before sliding windows.

00:25:24.880 --> 00:25:31.360
So this technique called
repeated squaring looks

00:25:31.360 --> 00:25:31.860
like this.

00:25:31.860 --> 00:25:40.580
So if you want to compute
c to the power 2x,

00:25:40.580 --> 00:25:46.080
then you can actually compute
c to the x and then square it.

00:25:46.080 --> 00:25:48.600
So in our naive plan,
computing c to the 2x

00:25:48.600 --> 00:25:50.850
would have involved us making
twice as many iterations

00:25:50.850 --> 00:25:53.449
of multiplying because it's
multiplying by c twice as many times.

00:25:53.449 --> 00:25:55.490
But in fact, you could be
clever and just compute

00:25:55.490 --> 00:25:58.336
c to the x and then
square it later.

00:25:58.336 --> 00:26:00.610
So this works well,
and this means

00:26:00.610 --> 00:26:06.810
that if you're computing c to
some even exponent, this works.

00:26:06.810 --> 00:26:10.412
And conversely, if you're
computing c to some 2x plus 1,

00:26:10.412 --> 00:26:11.870
then you could
imagine this is just

00:26:11.870 --> 00:26:16.461
c to the x squared
times another c.

00:26:16.461 --> 00:26:18.770
So this is what's called
repeated squaring.

00:26:18.770 --> 00:26:23.375
And this now allows us to
compute these exponentiations,

00:26:23.375 --> 00:26:27.600
or modular exponentiations,
in a time that's

00:26:27.600 --> 00:26:31.200
basically linear in the
size of the exponent.

00:26:31.200 --> 00:26:34.110
So for every bit
in the exponent,

00:26:34.110 --> 00:26:37.090
we're going to either
square something

00:26:37.090 --> 00:26:40.760
or square something then
do an extra multiplication.

00:26:40.760 --> 00:26:43.920
So that's the plan
for repeated squaring.
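
NOTE
Editor's sketch of repeated squaring using the two rules from the board.
The name modexp_square is hypothetical, and reducing mod n after every
step is an assumption here (the lecture gets to that cost later).
```python
def modexp_square(c, d, n):
    # c**d mod n using:
    #   c^(2x)   = (c^x)^2
    #   c^(2x+1) = (c^x)^2 * c
    if d == 0:
        return 1 % n
    half = modexp_square(c, d // 2, n)   # c^x, where x is d without its low bit
    result = (half * half) % n           # always square
    if d % 2 == 1:
        result = (result * c) % n        # extra multiply only for a 1 bit
    return result
```
One recursive call per exponent bit, so the work is linear in the bit length.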

00:26:43.920 --> 00:26:47.290
So now we can at least have
non-embarrassing run times

00:26:47.290 --> 00:26:50.045
for computing modular exponents.

00:26:50.045 --> 00:26:54.652
Does this make sense, why this
is working and why it's faster?

00:26:54.652 --> 00:26:56.610
All right, so what's this
sliding windows trick

00:26:56.610 --> 00:26:58.930
that the paper talks about?

00:26:58.930 --> 00:27:02.500
So this is a little bit
more sophisticated than this

00:27:02.500 --> 00:27:04.050
repeated squaring business.

00:27:04.050 --> 00:27:08.020
And basically the
squaring is going

00:27:08.020 --> 00:27:09.690
to be pretty much inevitable.

00:27:09.690 --> 00:27:13.450
But what the sliding windows
optimization is trying to do

00:27:13.450 --> 00:27:17.570
is reduce the overhead of
multiplying by this extra c

00:27:17.570 --> 00:27:18.656
down here.

00:27:18.656 --> 00:27:21.300
So suppose you
have some number that

00:27:21.300 --> 00:27:25.470
has several 1 bits in the
exponent, for every 1 bit

00:27:25.470 --> 00:27:27.485
in the exponent in the
binary representation,

00:27:27.485 --> 00:27:30.670
you're going to have to do this
step instead of this step.

00:27:30.670 --> 00:27:33.130
Because for every
odd number, you're

00:27:33.130 --> 00:27:34.610
going to have to multiply by c.

00:27:34.610 --> 00:27:37.930
So these guys would like to not
multiply by this c as often.

00:27:37.930 --> 00:27:44.754
So the plan is to precompute
different powers of c.

00:27:44.754 --> 00:27:46.170
So what we're going
to do is we're

00:27:46.170 --> 00:27:48.340
going to generate
a table that says,

00:27:48.340 --> 00:27:53.020
well, here's the value of c
to the x-- sorry, c to the 1--

00:27:53.020 --> 00:27:56.460
here's the value of c
to the 3, c to the 7.

00:27:56.460 --> 00:27:57.960
And I think
in OpenSSL,

00:27:57.960 --> 00:28:02.020
it goes up to c to the 31st.

00:28:02.020 --> 00:28:04.780
So this table is
going to just be

00:28:04.780 --> 00:28:08.640
precomputed when you want to
do some modular exponentiation.

00:28:08.640 --> 00:28:11.660
You're going to precompute
all the slots in this table.

00:28:11.660 --> 00:28:14.340
And then when you want to do
this exponentiation, instead

00:28:14.340 --> 00:28:16.850
of doing the repeated squaring
and multiplying by this c

00:28:16.850 --> 00:28:18.754
every time,

00:28:18.754 --> 00:28:20.420
you're going to use
a different formula.

00:28:20.420 --> 00:28:26.580
It says, well, if you have
c to the 32x plus some y,

00:28:26.580 --> 00:28:29.075
well you can do c
to the x, and you

00:28:29.075 --> 00:28:33.665
can do repeated squaring--
very much like before-- this

00:28:33.665 --> 00:28:38.250
is to get the 32, there's
like 5 powers of 2 here

00:28:38.250 --> 00:28:41.560
times c to the y.

00:28:41.560 --> 00:28:44.055
And c to the y, you can
get out of this table.

00:28:44.055 --> 00:28:46.770
So you can see that we're doing
the same number of squarings

00:28:46.770 --> 00:28:48.280
as before here.

00:28:48.280 --> 00:28:52.270
But we don't have to
multiply by c as many times.

00:28:52.270 --> 00:28:54.400
You're going to fish
it out of this table

00:28:54.400 --> 00:28:56.580
and do several multiplies
by c for the cost

00:28:56.580 --> 00:28:59.030
of a single multiply.
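
NOTE
Editor's sketch of the c^(32x + y) = (c^x)^32 * c^y formula. This is a
simplified fixed-window variant, not OpenSSL's code: it is an assumption of
this sketch that the table holds every power c^0 through c^31, and that zero
windows are not skipped. The name modexp_window is hypothetical.
```python
def modexp_window(c, d, n, w=5):
    # Precompute c^0 .. c^(2^w - 1) mod n once.
    table = [1 % n]
    for _ in range((1 << w) - 1):
        table.append((table[-1] * c) % n)
    # Split d into w-bit chunks, least significant first.
    chunks = []
    while d > 0:
        chunks.append(d & ((1 << w) - 1))
        d >>= w
    result = 1 % n
    for y in reversed(chunks):               # process the high chunks first
        for _ in range(w):
            result = (result * result) % n   # w squarings: exponent x becomes (2^w) * x
        result = (result * table[y]) % n     # one table multiply adds y to the exponent
    return result
```
With w=5 the inner loop is the "5 powers of 2" that produce the factor 32.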

00:28:59.030 --> 00:29:00.484
This make sense?

00:29:00.484 --> 00:29:00.983
Yeah?

00:29:00.983 --> 00:29:03.876
AUDIENCE: How do you determine
x and y in the first place?

00:29:03.876 --> 00:29:05.125
PROFESSOR: How do you determine y?

00:29:05.125 --> 00:29:06.156
AUDIENCE: X and y.

00:29:06.156 --> 00:29:07.000
PROFESSOR: Oh, OK.

00:29:07.000 --> 00:29:08.380
So let's look at that.

00:29:08.380 --> 00:29:13.290
So for repeated
squaring, well actually

00:29:13.290 --> 00:29:14.940
in both cases,
what you want to do

00:29:14.940 --> 00:29:17.240
is you want to look
at the exponent

00:29:17.240 --> 00:29:21.830
that you're trying to use
in a binary representation.

00:29:21.830 --> 00:29:26.180
So suppose I'm trying to compute
the value of c to the exponent,

00:29:26.180 --> 00:29:32.755
I don't know, 1 0 1 1 0 1 0,
and maybe there's more bits.

00:29:32.755 --> 00:29:35.310
OK, so if we wanted to
do repeated squaring,

00:29:35.310 --> 00:29:38.410
then you look at the
lowest bit here-- it's 0.

00:29:38.410 --> 00:29:39.910
So what you're
going to write down

00:29:39.910 --> 00:29:46.346
is this is equal to c to
the 1 0 1 1 0 1 squared.

00:29:46.346 --> 00:29:49.205
OK, so now if only
you knew this value,

00:29:49.205 --> 00:29:50.812
then you could just square it.

00:29:50.812 --> 00:29:54.816
OK, now we're going to compute
this guy, so c to the 1 0 1 1

00:29:54.816 --> 00:29:57.850
0 1 is equal to-- well
here we can't use this rule

00:29:57.850 --> 00:30:00.400
because it's not 2x-- it's
going to be to the x plus 1.

00:30:00.400 --> 00:30:06.030
So now we're going to write
this is c to the 1 0 1 1 0

00:30:06.030 --> 00:30:09.430
squared times another c.

00:30:09.430 --> 00:30:15.020
Because it's this prefix
times 2 plus this 1 at the end.

00:30:15.020 --> 00:30:17.140
That's how you fish it
out for repeated squaring.

00:30:17.140 --> 00:30:19.950
And for sliding window,
you just grab more bits

00:30:19.950 --> 00:30:20.680
from the low end.

00:30:20.680 --> 00:30:24.090
So if you wanted to do the
sliding window trick here

00:30:24.090 --> 00:30:27.130
instead of taking
one c out, suppose

00:30:27.130 --> 00:30:29.880
we do-- instead of this
giant table-- maybe

00:30:29.880 --> 00:30:30.980
we do 3 bits at a time.

00:30:30.980 --> 00:30:32.785
So we go up to c to the 7th.

00:30:32.785 --> 00:30:36.620
So here you would
grab the first 3 bits,

00:30:36.620 --> 00:30:40.448
and that's what you would
compute here: c to the 1

00:30:40.448 --> 00:30:42.700
0 1 to the 8th power.

00:30:42.700 --> 00:30:47.995
And then, the rest is c
to the 1 0 1 power here.

00:30:47.995 --> 00:30:50.120
It's a little unfortunate
these are the same thing,

00:30:50.120 --> 00:30:53.001
but really there's
more bits here.

00:30:53.001 --> 00:30:54.625
But here, this is
the thing that you're

00:30:54.625 --> 00:30:55.875
going to look up in the table.

00:30:55.875 --> 00:30:57.760
This is c to the 5th in decimal.

00:30:57.760 --> 00:31:00.590
And this says you're going to
keep doing the sliding window

00:31:00.590 --> 00:31:03.310
to compute this value.
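
NOTE
Editor's check of the 3-bit example from the board: the exponent 1 0 1 1 0 1
splits into a prefix x = 1 0 1 and a low window y = 1 0 1. The base c = 3
and modulus n = 1009 are arbitrary values chosen for illustration.
```python
c, n = 3, 1009
d = 0b101101                    # 45 in decimal
x, y = d >> 3, d & 0b111        # prefix and low 3 bits, both 0b101 = 5
lhs = pow(c, d, n)
rhs = (pow(pow(c, x, n), 8, n) * pow(c, y, n)) % n   # (c^x)^8 * c^y
assert d == 8 * x + y
assert lhs == rhs
```
Here c^y is the table lookup (c to the 5th) and c^x is computed recursively.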

00:31:03.310 --> 00:31:05.036
Make sense?

00:31:05.036 --> 00:31:06.410
This just saves
on how many times

00:31:06.410 --> 00:31:08.760
you have to multiply
by c by pre-multiplying

00:31:08.760 --> 00:31:10.910
it a bunch of times.

00:31:10.910 --> 00:31:12.870
And the OpenSSL guys,
at least 10 years ago,

00:31:12.870 --> 00:31:16.520
thought that going
up to the 32nd power

00:31:16.520 --> 00:31:18.229
was the best plan in
terms of efficiency

00:31:18.229 --> 00:31:20.020
because there's some
trade off here, right?

00:31:20.020 --> 00:31:21.728
You spend time
precomputing this table,

00:31:21.728 --> 00:31:24.109
but then if this
table is too giant,

00:31:24.109 --> 00:31:25.650
you're not going to
use some entries,

00:31:25.650 --> 00:31:28.190
because if you run
this table out to,

00:31:28.190 --> 00:31:31.700
I don't know, c to the 128
but you're computing just

00:31:31.700 --> 00:31:33.191
like 500-bit
exponents,

00:31:33.191 --> 00:31:35.190
maybe you're not going
to use all these entries.

00:31:35.190 --> 00:31:36.670
So it's gonna be
a waste of time.

00:31:36.670 --> 00:31:37.170
Question.

00:31:37.170 --> 00:31:41.156
AUDIENCE: [INAUDIBLE]
Is there a reason

00:31:41.156 --> 00:31:44.128
not to compute the
table [INAUDIBLE]?

00:31:44.128 --> 00:31:44.628
[INAUDIBLE]

00:31:49.460 --> 00:31:52.240
PROFESSOR: It ends
up being the case

00:31:52.240 --> 00:31:57.740
that you don't want to-- well
there's two things going on.

00:31:57.740 --> 00:32:01.850
One is that you'll now have code
to check whether the entry is

00:32:01.850 --> 00:32:05.440
filled in or not, and that'll
probably reduce your branch

00:32:05.440 --> 00:32:07.232
predictor accuracy
on the CPU. So it

00:32:07.232 --> 00:32:09.010
will run slower
in the common case

00:32:09.010 --> 00:32:11.903
because if you [INAUDIBLE]
with the entries there.

00:32:11.903 --> 00:32:13.319
Another slightly
annoying thing is

00:32:13.319 --> 00:32:15.850
that it turns out
this entry leaks stuff

00:32:15.850 --> 00:32:18.440
through a different
side-channel, namely

00:32:18.440 --> 00:32:20.670
cache access patterns.

00:32:20.670 --> 00:32:23.610
So if you have some other
process on the same CPU,

00:32:23.610 --> 00:32:26.650
you can sort of see which
cache addresses are getting

00:32:26.650 --> 00:32:30.910
evicted out of the cache or are
slower because someone accessed

00:32:30.910 --> 00:32:32.730
this entry or this entry.

00:32:32.730 --> 00:32:35.400
And the bigger this
table gets, the easier

00:32:35.400 --> 00:32:38.630
it is to tell what the
exponent bits were.

00:32:38.630 --> 00:32:42.930
In the limit, this table is
gigantic and just telling,

00:32:42.930 --> 00:32:47.680
just being able to tell which
cache address on this CPU

00:32:47.680 --> 00:32:50.345
had a miss tells you that
the encryption process must

00:32:50.345 --> 00:32:51.965
have accessed that
entry in the table.

00:32:51.965 --> 00:32:55.450
And that tells you that, oh, that long
bit sequence appears somewhere

00:32:55.450 --> 00:32:58.170
in your secret key exponent.

00:32:58.170 --> 00:33:00.930
So I guess the answer
is, mathematically,

00:33:00.930 --> 00:33:03.080
you could totally fill
this in on demand.

00:33:03.080 --> 00:33:06.550
In practice, you probably
don't want it to be that giant.

00:33:06.550 --> 00:33:08.810
And also, if
it's particularly giant,

00:33:08.810 --> 00:33:12.350
you aren't going to be able to
use entries as efficiently as

00:33:12.350 --> 00:33:13.250
well.

00:33:13.250 --> 00:33:14.910
You can reuse these
entries as you're

00:33:14.910 --> 00:33:16.576
computing. [INAUDIBLE]
It's not actually

00:33:16.576 --> 00:33:19.460
that expensive because
you use c cubed

00:33:19.460 --> 00:33:23.330
when you're computing c to the
7th and so on and so forth.

00:33:23.330 --> 00:33:25.644
It's not that bad.

00:33:25.644 --> 00:33:26.800
Make sense?

00:33:26.800 --> 00:33:30.040
Other questions?

00:33:30.040 --> 00:33:31.260
All right.

00:33:31.260 --> 00:33:35.250
So this is the repeated
squaring and sliding

00:33:35.250 --> 00:33:41.384
window optimization that
OpenSSL implements

00:33:41.384 --> 00:33:43.550
[INAUDIBLE] I don't actually
know whether they still

00:33:43.550 --> 00:33:46.252
have the same size of the
sliding window or not.

00:33:46.252 --> 00:33:48.460
But it does actually give
you a fair bit of speed up.

00:33:48.460 --> 00:33:53.135
So before you had to square
for every bit in the exponent.

00:33:53.135 --> 00:33:57.060
And then you'd have to have
a multiply for every 1 bit.

00:33:57.060 --> 00:33:59.990
So if you have a 500
bit exponent then

00:33:59.990 --> 00:34:02.880
you're going to do 500
squarings and, on average,

00:34:02.880 --> 00:34:06.349
roughly 256
multiplications by c.

00:34:06.349 --> 00:34:07.890
So with sliding
windows, you're going

00:34:07.890 --> 00:34:11.469
to still do the 512
squarings because there's

00:34:11.469 --> 00:34:13.280
no getting around that.

00:34:13.280 --> 00:34:16.050
But instead of doing
256 multiplies by c,

00:34:16.050 --> 00:34:19.214
you're going to
hopefully do way fewer,

00:34:19.214 --> 00:34:21.130
maybe something on the
order of 32 [INAUDIBLE]

00:34:21.130 --> 00:34:24.900
multiplies by some
entry in this table.
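
NOTE
Editor's sketch for counting operations under two simple models. These are
assumptions, not OpenSSL's exact algorithm: count_binary models plain repeated
squaring, count_window models fixed w-bit windows with one table multiply per
nonzero window. Table precomputation cost is not included. Names are hypothetical.
```python
def count_binary(d):
    # Repeated squaring: one squaring per exponent bit, one multiply per 1 bit.
    return d.bit_length(), bin(d).count("1")
def count_window(d, w=5):
    # Fixed windows: w squarings per window, one multiply per nonzero window.
    windows = []
    while d > 0:
        windows.append(d & ((1 << w) - 1))
        d >>= w
    squarings = w * len(windows)
    multiplies = sum(1 for y in windows if y != 0)
    return squarings, multiplies
worst_case = (1 << 512) - 1          # a 512-bit exponent with every bit set
binary_counts = count_binary(worst_case)
window_counts = count_window(worst_case)
```
The squaring count barely moves; only the number of multiplies drops.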

00:34:24.900 --> 00:34:27.489
So that's the general plan.

00:34:27.489 --> 00:34:31.400
[INAUDIBLE] Not as
dramatic as CRT, not 2x,

00:34:31.400 --> 00:34:33.760
but it could save
you like almost 1.5x.

00:34:37.516 --> 00:34:40.660
All depending on exactly
what [INAUDIBLE].

00:34:40.660 --> 00:34:42.870
Make sense?

00:34:42.870 --> 00:34:45.888
Another question about this?

00:34:45.888 --> 00:34:47.260
All right.

00:34:47.260 --> 00:34:50.360
So these are the [? roughly ?]
easier optimizations.

00:34:50.360 --> 00:34:53.040
And then there's
two clever tricks

00:34:53.040 --> 00:34:57.290
playing with numbers for how to
do just a multiplication more

00:34:57.290 --> 00:34:59.150
efficiently.

00:34:59.150 --> 00:35:01.690
So the first one of
these optimizations

00:35:01.690 --> 00:35:04.080
that we're going to
look at-- I think

00:35:04.080 --> 00:35:08.060
I'll raise this board--
is called this Montgomery

00:35:08.060 --> 00:35:09.820
representation.

00:35:09.820 --> 00:35:13.190
And we'll see in
a second why it's

00:35:13.190 --> 00:35:14.800
particularly important for us.

00:35:23.820 --> 00:35:26.700
So the problem that this
Montgomery representation

00:35:26.700 --> 00:35:29.150
optimization is
trying to solve for us

00:35:29.150 --> 00:35:33.170
is the fact that every
time we do a multiply,

00:35:33.170 --> 00:35:34.880
we get a number
that keeps growing

00:35:34.880 --> 00:35:36.650
and growing and growing.

00:35:36.650 --> 00:35:40.690
In particular, both
in sliding windows

00:35:40.690 --> 00:35:43.750
and in repeated
squaring, actually when

00:35:43.750 --> 00:35:46.010
you square you multiply
2 numbers together,

00:35:46.010 --> 00:35:47.510
when you multiply
by c to the y, you

00:35:47.510 --> 00:35:48.685
multiply 2 numbers together.

00:35:48.685 --> 00:35:53.010
And the problem is that if the
inputs to the multiplication

00:35:53.010 --> 00:35:56.910
were, let's say, 512 bits each.

00:35:56.910 --> 00:35:59.140
Then the result of
the multiplication

00:35:59.140 --> 00:36:01.130
is going to be 1,000 bits.

00:36:01.130 --> 00:36:03.120
And then you'd take
this 1,000 bit result

00:36:03.120 --> 00:36:04.746
and you multiply it
again by something

00:36:04.746 --> 00:36:05.870
like 500 bits.

00:36:05.870 --> 00:36:08.910
And now it's 1,500 bits,
2,000 bits, 2,500 bits,

00:36:08.910 --> 00:36:10.790
and it keeps
growing and growing.

00:36:10.790 --> 00:36:13.430
And you really don't want
this because multiplication is

00:36:13.430 --> 00:36:17.670
quadratic in the size of
the number we're multiplying.

00:36:17.670 --> 00:36:19.430
So we have to keep
the size of our number

00:36:19.430 --> 00:36:21.985
as small as possible,
which means basically 512

00:36:21.985 --> 00:36:27.360
bits because all this
computation is mod p or mod q.

00:36:27.360 --> 00:36:28.045
Yeah?

00:36:28.045 --> 00:36:29.670
AUDIENCE: What do
you want [INAUDIBLE]?

00:36:31.960 --> 00:36:33.210
PROFESSOR: That's right, yeah.

00:36:33.210 --> 00:36:36.240
So the cool thing is that
we can keep this number down

00:36:36.240 --> 00:36:37.640
because what we
do is, let's say,

00:36:37.640 --> 00:36:40.730
we want to compute c to the
x just for this example.

00:36:40.730 --> 00:36:41.524
Squared.

00:36:41.524 --> 00:36:43.270
Squared again.

00:36:43.270 --> 00:36:44.350
Squared again.

00:36:44.350 --> 00:36:46.610
What you could do is
you compute c to the x

00:36:46.610 --> 00:36:49.740
then you take mod
p, let's say, right.

00:36:49.740 --> 00:36:53.110
Then you square it then
you do mod p again.

00:36:53.110 --> 00:36:56.820
Then you square it again,
and then you do mod p again.

00:36:56.820 --> 00:36:57.539
And so on.

00:36:57.539 --> 00:36:59.330
So this is basically
what you're proposing.

00:36:59.330 --> 00:37:00.100
So this is great.

00:37:00.100 --> 00:37:02.830
In fact, this keeps
the size of our numbers

00:37:02.830 --> 00:37:05.260
to basically 512
bits, which is about as

00:37:05.260 --> 00:37:06.890
small as we can get.

00:37:06.890 --> 00:37:08.710
This is good in
terms of keeping down

00:37:08.710 --> 00:37:11.940
the size of these numbers
for multiplication.

00:37:11.940 --> 00:37:15.310
But it's actually kind of
expensive to do this mod p

00:37:15.310 --> 00:37:16.920
operation.

00:37:16.920 --> 00:37:19.240
Because the way that you
do something mod p is

00:37:19.240 --> 00:37:21.740
you basically have
to do division.

00:37:21.740 --> 00:37:24.510
And division is way worse
than multiplication.

00:37:24.510 --> 00:37:27.730
I'm not going to go through
the algorithms for division,

00:37:27.730 --> 00:37:30.520
but it's really slow.

00:37:30.520 --> 00:37:33.907
You usually want to avoid
division as much as possible.

00:37:33.907 --> 00:37:36.240
Because it's not even just a
straightforward programming

00:37:36.240 --> 00:37:39.290
thing, you have to do some
approximation algorithm,

00:37:39.290 --> 00:37:41.780
something like Newton's
method,

00:37:41.780 --> 00:37:43.330
and just keep it [INAUDIBLE].

00:37:43.330 --> 00:37:44.790
It's going to be slow.

00:37:44.790 --> 00:37:47.290
And in the naive
implementation, this actually

00:37:47.290 --> 00:37:50.640
turns out to be the slowest
part of doing multiplication.

00:37:50.640 --> 00:37:52.230
The multiplication is cheap.

00:37:52.230 --> 00:37:56.210
But then doing mod p or mod q
to bring it back down in size

00:37:56.210 --> 00:37:59.190
is going to be actually more
expensive than the multiplying.

00:37:59.190 --> 00:38:01.480
So that's actually
kind of a bummer.

00:38:01.480 --> 00:38:04.560
So the way that we're
going to get around this

00:38:04.560 --> 00:38:08.590
is by doing this multiplication
in this clever other

00:38:08.590 --> 00:38:13.280
representation, and also
I'll show you the trick here.

00:38:13.280 --> 00:38:14.780
Let's see.

00:38:14.780 --> 00:38:16.680
Bear with me for a
second, and then we'll

00:38:16.680 --> 00:38:21.082
see why it's so fast
to use this Montgomery trick.

00:38:21.082 --> 00:38:26.190
And the basic idea is
to represent numbers,

00:38:26.190 --> 00:38:29.570
these are regular numbers
that you might actually

00:38:29.570 --> 00:38:30.852
want to multiply.

00:38:30.852 --> 00:38:32.980
And we're going to have a
different representation

00:38:32.980 --> 00:38:35.313
for these numbers, called the
Montgomery representation.

00:38:37.530 --> 00:38:41.190
And that representation
is actually very easy.

00:38:41.190 --> 00:38:43.990
We just take the value
a and we multiply it

00:38:43.990 --> 00:38:46.000
by some magic value R.

00:38:46.000 --> 00:38:48.250
I'll tell you what
this R is in a second.

00:38:48.250 --> 00:38:51.710
But let's first figure out if
you pick some arbitrary value

00:38:51.710 --> 00:38:53.820
R, what's going to happen here?

00:38:53.820 --> 00:38:56.200
So we take 2 numbers, a and b.

00:38:56.200 --> 00:39:00.075
Their Montgomery representations
are sort of as expected.

00:39:00.075 --> 00:39:02.840
a is aR, b is bR.

00:39:02.840 --> 00:39:05.920
And if you want to compute
the product of a times b,

00:39:05.920 --> 00:39:08.100
well in Montgomery
space, you can also

00:39:08.100 --> 00:39:09.160
multiply these guys out.

00:39:09.160 --> 00:39:13.310
You can take aR
multiply it by bR.

00:39:13.310 --> 00:39:17.130
And what you get here
is ab times R squared.

00:39:17.130 --> 00:39:18.770
So there are two Rs now.

00:39:18.770 --> 00:39:22.570
That's kind of annoying, but
you can divide that by R.

00:39:22.570 --> 00:39:29.610
And we get ab times R. So this
is probably weird in a sense

00:39:29.610 --> 00:39:32.190
that why would you
multiply by this extra number?

00:39:32.190 --> 00:39:34.525
But let's first figure out
whether this is correct.

00:39:34.525 --> 00:39:37.179
And then we'll figure out why
this is going to be faster.

00:39:37.179 --> 00:39:39.220
So it's correct in the
sense that it's very easy.

00:39:39.220 --> 00:39:40.840
If you want to
multiply some numbers,

00:39:40.840 --> 00:39:43.364
we just multiply by this R
value and get the Montgomery

00:39:43.364 --> 00:39:44.208
representation.

00:39:44.208 --> 00:39:45.980
Then we can do all
these multiplications

00:39:45.980 --> 00:39:47.920
to these Montgomery forms.

00:39:47.920 --> 00:39:50.264
And every time we
multiply 2 numbers,

00:39:50.264 --> 00:39:52.180
we have to divide by R,
to get the Montgomery

00:39:52.180 --> 00:39:54.550
form of the
multiplication result.

00:39:54.550 --> 00:39:56.360
And then when we're
done doing all

00:39:56.360 --> 00:39:58.780
of our squarings,
multiplications, all this stuff,

00:39:58.780 --> 00:40:01.180
we're going to move back
to the normal, regular form

00:40:01.180 --> 00:40:04.890
by just dividing
by R one last time.

00:40:04.890 --> 00:40:06.586
AUDIENCE: [INAUDIBLE]

00:40:06.586 --> 00:40:08.086
PROFESSOR: We're
now going to pick R

00:40:08.086 --> 00:40:09.560
to be a very nice number.

00:40:09.560 --> 00:40:11.900
And in particular,
we're going to pick R

00:40:11.900 --> 00:40:17.780
to be a very nice number to make
this division by R very fast.

00:40:17.780 --> 00:40:21.320
And the cool thing is
that if this division by R

00:40:21.320 --> 00:40:24.499
is going to be very
fast, then this

00:40:24.499 --> 00:40:26.290
is going to be a small
number and we're not

00:40:26.290 --> 00:40:29.460
going to have to do
this mod q very often.

00:40:29.460 --> 00:40:32.120
In particular, aR,
let's say, is also

00:40:32.120 --> 00:40:34.530
going to be roughly 500 bits
because it's all actually

00:40:34.530 --> 00:40:36.630
mod p or mod q.

00:40:36.630 --> 00:40:39.320
So aR is 500 bits.

00:40:39.320 --> 00:40:41.230
bR is also going to be 500 bits.

00:40:41.230 --> 00:40:44.160
So this product is
going to be 1,000 bits.

00:40:44.160 --> 00:40:46.830
This R is going to be
this nice, roughly 500-bit

00:40:46.830 --> 00:40:48.630
number, same size as p.

00:40:48.630 --> 00:40:50.925
And if we can make this
division fast,

00:40:50.925 --> 00:40:55.744
then the result is going to be
a roughly 500 bit number here.

00:40:55.744 --> 00:40:57.910
So we were able to do the
multiplying without having

00:40:57.910 --> 00:40:59.400
to do an extra divide.

00:40:59.400 --> 00:41:03.920
Dividing by R cheaply gives us
this small result, getting us

00:41:03.920 --> 00:41:08.360
out of doing a mod p
for most situations.

00:41:08.360 --> 00:41:11.670
OK, so what is this weird number
that I keep talking about?

00:41:11.670 --> 00:41:17.944
Well R is just going
to be 2 to the 512.

00:41:17.944 --> 00:41:22.930
It's going to be 1
followed by a ton of zeros.

00:41:22.930 --> 00:41:25.260
So multiplying by
this is easy, you just

00:41:25.260 --> 00:41:27.320
append a bunch of
zeros to a number.

00:41:27.320 --> 00:41:32.960
Dividing could be easy if
the low bits of the result

00:41:32.960 --> 00:41:34.547
are all zeros.

00:41:34.547 --> 00:41:37.750
So if you have a value
that's a bunch of bits

00:41:37.750 --> 00:41:41.460
followed by 512 zeros, then
dividing by 2 to the 512

00:41:41.460 --> 00:41:41.960
is cheap.

00:41:41.960 --> 00:41:44.337
You just discard the zeros
on the right-hand side.

00:41:44.337 --> 00:41:47.140
And that's actually
the correct division.

00:41:47.140 --> 00:41:48.650
Does that make sense?

00:41:48.650 --> 00:41:50.311
The slight problem
is that we actually

00:41:50.311 --> 00:41:51.664
don't have zeros on
the right hand side

00:41:51.664 --> 00:41:53.110
when you do this multiplication.

00:41:53.110 --> 00:41:56.750
These are like real 512 bit
numbers with all the 512 bits

00:41:56.750 --> 00:41:57.460
used.

00:41:57.460 --> 00:41:58.890
So this will be a
1,000 bit number

00:41:58.890 --> 00:42:02.352
with all these bits
also set randomly to 0 or 1,

00:42:02.352 --> 00:42:03.560
depending on what's going on.

00:42:03.560 --> 00:42:06.460
So we can't just
discard the low bits.

00:42:06.460 --> 00:42:09.144
But the cleverness
comes from the fact

00:42:09.144 --> 00:42:11.210
that the only
thing we care about

00:42:11.210 --> 00:42:14.370
is the value of
this thing mod p.

00:42:14.370 --> 00:42:18.610
So you can always add
multiples of p to this value

00:42:18.610 --> 00:42:22.380
without changing what
it's equivalent to mod p.

00:42:22.380 --> 00:42:25.130
And as a result, we
can add multiples of p

00:42:25.130 --> 00:42:28.020
to get the low bits
to all be zeros.
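
NOTE
Editor's sketch of the add-multiples-of-p trick as one function. This is the
textbook Montgomery reduction, offered as an assumption about what the
optimization looks like in code, not as OpenSSL's implementation. The name
montgomery_reduce is hypothetical; pow(p, -1, R) needs Python 3.8+.
```python
def montgomery_reduce(t, p, k):
    # Compute t / 2^k mod p for odd p and 0 <= t < p * 2^k, with no division by p.
    R = 1 << k
    p_neg_inv = (-pow(p, -1, R)) % R     # precomputable: p * p_neg_inv = -1 (mod R)
    m = ((t % R) * p_neg_inv) % R        # how many copies of p to add
    u = (t + m * p) >> k                 # low k bits are now zero, so the shift is exact
    return u - p if u >= p else u        # u < 2p, so at most one subtraction is needed
toy = montgomery_reduce(0b11010, 7, 4)   # 26 / 16 mod 7
```
The "% R" and ">> k" steps only mask or drop low bits, which is why this is cheap.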

00:42:28.020 --> 00:42:30.510
So let's look through
some simple examples.

00:42:30.510 --> 00:42:33.390
I'm not going to write
out 512 bits on the board.

00:42:33.390 --> 00:42:37.325
But suppose that--
here's a short example.

00:42:40.200 --> 00:42:42.710
Suppose that we have
a situation where

00:42:42.710 --> 00:42:46.340
our value R is 2 to the 4th.

00:42:46.340 --> 00:42:49.810
So it's 1 followed
by four zeros.

00:42:49.810 --> 00:42:53.170
So this is a much smaller
example than the real thing.

00:42:53.170 --> 00:42:55.140
But let's see how this
Montgomery division

00:42:55.140 --> 00:42:57.170
is going to work out.

00:42:57.170 --> 00:43:02.600
So suppose we're going to try
to compute stuff mod q, where

00:43:02.600 --> 00:43:05.570
q, let's say, is maybe 7.

00:43:05.570 --> 00:43:10.000
So this is 1 1 1 in binary form.

00:43:10.000 --> 00:43:12.970
And what we're
going to try to do

00:43:12.970 --> 00:43:16.360
is maybe we did
some multiplication.

00:43:16.360 --> 00:43:19.700
And this value aR
times bR is equal

00:43:19.700 --> 00:43:26.520
to this binary
representation 1 1 0 1 0.

00:43:26.520 --> 00:43:31.060
So this is going to be
the value of aR times bR.

00:43:31.060 --> 00:43:32.780
How do we divide it by R?

00:43:32.780 --> 00:43:35.175
So clearly the low
four bits aren't all 0,

00:43:35.175 --> 00:43:37.472
so we can't just divide it out.

00:43:37.472 --> 00:43:40.680
But we can add multiples of q.

00:43:40.680 --> 00:43:45.510
In particular, we
can add 2 times q.

00:43:45.510 --> 00:43:49.700
So 2q is equal to 1 1 1 0.

00:43:49.700 --> 00:43:56.740
And now what we get
is 0 0, carry a 1, 0,

00:43:56.740 --> 00:44:01.520
carry a 1, 1, carry a 1, 0 1.

00:44:01.520 --> 00:44:02.520
I hope I did that right.

00:44:02.520 --> 00:44:03.530
So this is what we get.

00:44:03.530 --> 00:44:07.207
So now we get aR
bR plus 2q.

00:44:07.207 --> 00:44:09.290
But we actually don't care
about the plus 2q.

00:44:09.290 --> 00:44:11.123
It's actually fine
because all we care about

00:44:11.123 --> 00:44:12.190
is the value mod q.

00:44:15.190 --> 00:44:18.070
And now we're closer, we have
three 0 bits at the bottom.

00:44:18.070 --> 00:44:20.190
Now we can add
another multiple of q.

00:44:20.190 --> 00:44:23.000
This time it's going
to be probably 8q.

00:44:23.000 --> 00:44:26.680
So we add 1 1 1 here 0 0.

00:44:26.680 --> 00:44:29.905
And if we add it, we're
going to get, let's say,

00:44:29.905 --> 00:44:37.120
0 0 0 then add these two guys
0, carry a 1, 0, carry a 1, 1 1.

00:44:37.120 --> 00:44:38.250
I think that's right.

00:44:38.250 --> 00:44:41.390
But now we have
our original aR bR

00:44:41.390 --> 00:44:45.030
plus 2q plus 8q is
equal to this thing.

00:44:45.030 --> 00:44:48.720
And finally, we can divide
this thing by R very cheaply.

00:44:48.720 --> 00:44:54.762
Because we just discard
the low four zeros.
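
NOTE
Editor's sketch, not from the lecture board: the bit-by-bit trick above can be
written out in a few lines of Python. The function name reduce_bit_by_bit and
its parameters are made up for illustration, and the sketch assumes q is odd.
```python
def reduce_bit_by_bit(t: int, q: int, k: int) -> int:
    """Divide t by R = 2**k modulo q by first clearing the low k bits of t."""
    for i in range(k):
        # q is odd, so (q << i) has its lowest set bit exactly at position i.
        # Adding it clears bit i of t without disturbing the bits below it.
        if (t >> i) & 1:
            t += q << i
    # The low k bits are now all zero, so the shift is an exact division by R.
    return t >> k
print(reduce_bit_by_bit(0b11010, 7, 4))  # prints 6
```
With the board values (R = 2 to the 4th, q = 7, aR times bR = 1 1 0 1 0), the
loop adds 2q and then 8q, as in the lecture, and the result is 6.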

00:44:54.762 --> 00:44:56.205
Make sense?

00:44:56.205 --> 00:44:57.167
Question.

00:44:57.167 --> 00:45:01.150
AUDIENCE: Is aR bR always
going to end in, I guess,

00:45:01.150 --> 00:45:03.270
1,024 zeros?

00:45:03.270 --> 00:45:08.021
PROFESSOR: No, and the
reason is that-- OK,

00:45:08.021 --> 00:45:10.130
here is the thing
that's maybe confusing.

00:45:10.130 --> 00:45:12.710
A was, let's say, 512 bits.

00:45:12.710 --> 00:45:15.470
Then you multiply it by
R. So here, you're right.

00:45:15.470 --> 00:45:19.380
This value is that 1,000 bit
number where the high bit is

00:45:19.380 --> 00:45:20.980
a, the high 512 bits are a.

00:45:20.980 --> 00:45:22.794
And the low bits are all zeros.

00:45:22.794 --> 00:45:24.710
But then, you're going
[? to do it with ?] mod

00:45:24.710 --> 00:45:27.410
q to bring it down
to make it smaller.

00:45:27.410 --> 00:45:29.570
And in general, this is
going to be the case.

00:45:29.570 --> 00:45:32.745
Because [? it only ?] has
these low zeros the first time

00:45:32.745 --> 00:45:33.370
you convert it.

00:45:33.370 --> 00:45:35.119
But after you do a
couple multiplications,

00:45:35.119 --> 00:45:37.685
they're going to
be arbitrary bits.

00:45:37.685 --> 00:45:40.270
So these guys are--
so I really should

00:45:40.270 --> 00:45:43.260
have written mod q here--
and you compute this mod q

00:45:43.260 --> 00:45:49.356
as soon as you do the conversion
to keep the whole value small.

00:45:49.356 --> 00:45:50.802
AUDIENCE: [INAUDIBLE]

00:45:50.802 --> 00:45:53.460
PROFESSOR: Yeah, so the
initial conversion is expensive

00:45:53.460 --> 00:45:58.650
or at least it's as expensive
as doing a regular modulus

00:45:58.650 --> 00:46:01.010
during the multiplication.

00:46:01.010 --> 00:46:03.010
The cool thing is
that you pay this cost

00:46:03.010 --> 00:46:05.176
just once when you do the
conversion into Montgomery

00:46:05.176 --> 00:46:06.122
form.

00:46:06.122 --> 00:46:09.240
And then, instead of converting
it back at every step,

00:46:09.240 --> 00:46:11.235
you just keep it
in Montgomery form.

00:46:11.235 --> 00:46:13.700
But remember that in order
to do an exponentiation

00:46:13.700 --> 00:46:16.064
to an exponent
which has 512 bits,

00:46:16.064 --> 00:46:17.480
you're saying
you're going to have

00:46:17.480 --> 00:46:21.320
to do over 500 multiplications
because we have to do at least

00:46:21.320 --> 00:46:23.870
500 squarings plus then some.

00:46:23.870 --> 00:46:27.000
So you do these mod
q twice and then

00:46:27.000 --> 00:46:30.370
you get a lot of cheap divisions
if you stay in this form.

00:46:30.370 --> 00:46:34.500
And then you do a division by R
to get back to this form again.

00:46:34.500 --> 00:46:37.520
So instead of doing 500 mod qs
for every multiplication step,

00:46:37.520 --> 00:46:39.366
you do it twice mod q.

00:46:39.366 --> 00:46:41.510
And then you keep
doing these divisions

00:46:41.510 --> 00:46:45.080
by R cheaply using this trick.

00:46:45.080 --> 00:46:45.580
Question.

00:46:45.580 --> 00:46:49.460
AUDIENCE: So when you're
adding the multiples of q

00:46:49.460 --> 00:46:51.400
and then dividing
by R, [INAUDIBLE]

00:46:54.310 --> 00:46:56.780
PROFESSOR: Because it's
actually mod q means

00:46:56.780 --> 00:46:58.920
the remainder when
you divide by q.

00:46:58.920 --> 00:47:07.990
So x plus y times
q, mod q is just x.

00:47:07.990 --> 00:47:08.930
AUDIENCE: [INAUDIBLE]

00:47:12.230 --> 00:47:16.089
PROFESSOR: So in this case,
dividing by-- so another sort

00:47:16.089 --> 00:47:17.630
of nice property is
that because it's

00:47:17.630 --> 00:47:22.450
all modulo a prime
number-- it's also true

00:47:22.450 --> 00:47:28.080
that if you have x
plus yq divided by R,

00:47:28.080 --> 00:47:35.790
mod q is actually the same
as x divided by R mod q.

00:47:35.790 --> 00:47:39.180
The way to think of it is
that there's no real division

00:47:39.180 --> 00:47:40.650
in modular arithmetic.

00:47:40.650 --> 00:47:41.730
It's just an inverse.

00:47:41.730 --> 00:47:44.060
So what this really
says is this is actually

00:47:44.060 --> 00:47:49.465
x plus yq times some
number called R inverse.

00:47:49.465 --> 00:47:52.930
And then you compute
this whole thing mod q.

00:47:52.930 --> 00:47:57.210
And then you could think of
this as x times R inverse

00:47:57.210 --> 00:48:05.320
mod q plus y q
R inverse mod q.

00:48:05.320 --> 00:48:08.610
And this thing cancels out
because it's something times q.
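
NOTE
Editor's sketch: a quick numeric check of the identity just described, using
the small board values. The variable names are illustrative, and pow with a
negative exponent assumes Python 3.8 or later.
```python
q, R = 7, 16
r_inv = pow(R, -1, q)            # "division by R" mod q is multiplication by R inverse
x, y = 26, 10                    # y * q = 70 = 2q + 8q, the multiples added on the board
lhs = ((x + y * q) * r_inv) % q  # (x + yq) / R mod q
rhs = (x * r_inv) % q            # x / R mod q
print(lhs, rhs)                  # prints 6 6
```
The y q term contributes nothing mod q, so both sides agree.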

00:48:15.060 --> 00:48:17.856
And there's some closed
form for this thing.

00:48:17.856 --> 00:48:22.195
So here I did it by bit by
bit, 2q then 8q, et cetera.

00:48:22.195 --> 00:48:23.765
It's actually a
nice closed formula

00:48:23.765 --> 00:48:25.630
you can compute-- it's
in the lecture notes,

00:48:25.630 --> 00:48:27.880
but it's probably not worth
spending time on the board

00:48:27.880 --> 00:48:31.215
here-- for how do you figure
out what multiple of q

00:48:31.215 --> 00:48:35.331
should you add to get all
the low bits to turn to 0.

00:48:35.331 --> 00:48:38.200
So then it turns out that in
order to do this division by R,

00:48:38.200 --> 00:48:43.450
you just need to compute this
magic multiple of q, add it.

00:48:43.450 --> 00:48:46.290
And then discard the
low bits and that

00:48:46.290 --> 00:48:53.047
brings your number back to 512
bits, or whatever the size is.
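
NOTE
Editor's sketch of the closed form mentioned above, under the usual textbook
assumption that the magic multiple is m = t * (-q^-1) mod R. The name redc is
a common label for this step, not something taken from the lecture, and the
sketch assumes q is odd and R = 2**k.
```python
def redc(t: int, q: int, k: int) -> int:
    """Return t / R mod q (possibly plus one extra q), where R = 2**k."""
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r   # -q^-1 mod R; computed once per modulus in practice
    m = ((t % r) * q_neg_inv) % r      # the multiple of q that zeroes the low k bits
    return (t + m * q) >> k            # exact division: just drop the low k bits
print(redc(0b11010, 7, 4))  # prints 6
```
For the board example m comes out as 10, which is the 2q plus 8q that was
added by hand.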

00:48:53.047 --> 00:48:54.029
OK.

00:48:54.029 --> 00:48:55.790
And here's the subtlety.

00:48:55.790 --> 00:48:57.470
The only reason we're
talking about this

00:48:57.470 --> 00:49:00.470
is that there's something
funny going on here

00:49:00.470 --> 00:49:05.090
that is going to allow us
to learn timing information.

00:49:05.090 --> 00:49:09.780
And in particular, even
though we divided by R,

00:49:09.780 --> 00:49:12.770
we know the result is
going to be 512 bits.

00:49:12.770 --> 00:49:15.123
But it still might
be greater than q

00:49:15.123 --> 00:49:16.820
because q isn't exactly
[? up to 512 ?],

00:49:16.820 --> 00:49:18.340
it's not a 512 bit number.

00:49:18.340 --> 00:49:20.840
So it might be a
little bit less than R.

00:49:20.840 --> 00:49:24.730
So it might be that after we
do this cheap division by R,

00:49:24.730 --> 00:49:26.960
we have to
subtract out q one more

00:49:26.960 --> 00:49:29.690
time because we get something
that's small but not

00:49:29.690 --> 00:49:31.400
quite small enough.

00:49:31.400 --> 00:49:34.740
So there's a chance that
after doing this division,

00:49:34.740 --> 00:49:39.740
we maybe have to also
subtract q again.

00:49:39.740 --> 00:49:42.390
And this subtraction is
going to be part of what

00:49:42.390 --> 00:49:44.250
this attack is all about.

00:49:44.250 --> 00:49:48.060
It turns out that
subtracting this q adds time.

00:49:48.060 --> 00:49:51.660
And someone figured
out-- not these guys

00:49:51.660 --> 00:49:53.050
but some previous
work-- that you

00:49:53.050 --> 00:49:56.770
show that this probability
of doing this thing, this

00:49:56.770 --> 00:49:58.145
is called an
extra reduction.

00:50:03.500 --> 00:50:10.020
This probability sort of
depends on the particular value

00:50:10.020 --> 00:50:12.410
that you're exponentiating.

00:50:12.410 --> 00:50:19.790
So if you're computing
x to the d mod q,

00:50:19.790 --> 00:50:22.400
the probability of
an extra reduction,

00:50:22.400 --> 00:50:25.240
at some point while
computing x to the d mod q,

00:50:25.240 --> 00:50:31.860
is going to be equal to
x mod q divided by 2R.

00:50:36.890 --> 00:50:40.390
So if we're going to be
computing x to the d mod q,

00:50:40.390 --> 00:50:43.690
then depending on what
the value of x mod q

00:50:43.690 --> 00:50:45.410
is, whether it's
big or small, you're

00:50:45.410 --> 00:50:49.080
going to have even more or
less of these extra reductions.
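
NOTE
Editor's sketch for experimenting with the claim above. It does plain
square-and-multiply in Montgomery form and counts how often the final
subtraction runs. The names mont_mul and count_extra_reductions are
hypothetical, and this is not how OpenSSL structures its code.
```python
def mont_mul(a, b, q, k, q_neg_inv):
    """Multiply two Montgomery-form values; also report if the extra subtraction ran."""
    r_mask = (1 << k) - 1
    t = a * b
    m = ((t & r_mask) * q_neg_inv) & r_mask
    u = (t + m * q) >> k               # u < 2q whenever a and b are below q
    if u >= q:
        return u - q, True             # the extra reduction: one more subtraction
    return u, False
def count_extra_reductions(x, d, q, k):
    """Compute x**d mod q in Montgomery form; return (result, extra reduction count)."""
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r
    x_m = (x * r) % q                  # convert into Montgomery form: one real mod q
    acc = r % q                        # Montgomery form of 1
    extra = 0
    for bit in bin(d)[2:]:             # exponent bits, most significant first
        acc, e = mont_mul(acc, acc, q, k, q_neg_inv)
        extra += e
        if bit == "1":
            acc, e = mont_mul(acc, x_m, q, k, q_neg_inv)
            extra += e
    result, e = mont_mul(acc, 1, q, k, q_neg_inv)  # convert back out of Montgomery form
    return result, extra + e
```
Calling count_extra_reductions for many values of x just below and just above
a multiple of q is one way to see how the count varies with x mod q.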

00:50:49.080 --> 00:50:51.577
And just to show you where
this is going to fit in,

00:50:51.577 --> 00:50:53.785
this is actually going to
happen in the decrypt step,

00:50:53.785 --> 00:50:55.951
because during the decrypt
step, the server is going

00:50:55.951 --> 00:50:57.330
to be computing c to the d.

00:50:57.330 --> 00:51:00.650
And this says the
extra reductions

00:51:00.650 --> 00:51:05.160
are going to be proportional to
how close x, or c in this case,

00:51:05.160 --> 00:51:07.254
is to the value q.

00:51:07.254 --> 00:51:08.920
So this is going to
be worrisome, right,

00:51:08.920 --> 00:51:12.490
because the attacker gets
to choose the input c.

00:51:12.490 --> 00:51:14.640
And the number of
extra reductions

00:51:14.640 --> 00:51:16.940
is going to be proportional
to how close the c is

00:51:16.940 --> 00:51:18.981
to one of the factors, the q.

00:51:18.981 --> 00:51:21.260
And this is how you're going
to tell I'm getting close

00:51:21.260 --> 00:51:23.337
to the q, or I've overshot q.

00:51:23.337 --> 00:51:25.545
And all of a sudden, there's
no extra reductions,

00:51:25.545 --> 00:51:28.556
it's probably because x mod
q is very small: x is

00:51:28.556 --> 00:51:29.472
q plus a little epsilon.

00:51:29.472 --> 00:51:31.720
And it's very small.

00:51:31.720 --> 00:51:33.942
So that's one part
of the timing attack

00:51:33.942 --> 00:51:35.650
we're going to be
looking at in a second.

00:51:38.770 --> 00:51:42.740
I don't have any proof that
this is actually true [INAUDIBLE]

00:51:42.740 --> 00:51:44.905
these extra
reductions work like this.

00:51:44.905 --> 00:51:45.680
Yeah, question.

00:51:45.680 --> 00:51:48.700
AUDIENCE: What happens if you
don't do this extra reduction?

00:51:48.700 --> 00:51:51.210
PROFESSOR: Oh, what happens
if you don't do this extra

00:51:51.210 --> 00:51:51.710
reduction?

00:51:55.510 --> 00:51:57.850
You can avoid this
extra reduction.

00:51:57.850 --> 00:52:01.790
And then you just have
to do some extra probably

00:52:01.790 --> 00:52:03.410
modular reductions later.

00:52:03.410 --> 00:52:06.500
I think the math just
works out nicely this way

00:52:06.500 --> 00:52:07.834
for the Montgomery form.

00:52:07.834 --> 00:52:09.750
I think for many of these
things it's actually

00:52:09.750 --> 00:52:12.406
once you look at them as a
timing channel [INAUDIBLE]

00:52:12.406 --> 00:52:13.780
[? think ?] don't
do this at all,

00:52:13.780 --> 00:52:16.004
or maybe you should
do some other plan.

00:52:16.004 --> 00:52:16.670
So you're right,

00:52:16.670 --> 00:52:19.710
I think you could probably
avoid this extra reduction

00:52:19.710 --> 00:52:22.655
and probably just do the
mod q, perhaps at the end.

00:52:22.655 --> 00:52:24.840
I haven't actually
tried implementing this.

00:52:24.840 --> 00:52:27.380
But it seems like it could work.

00:52:27.380 --> 00:52:29.390
It might be that you just
have to do mod q once

00:52:29.390 --> 00:52:31.598
[? there ?], which you'll
probably have to do anyway.

00:52:31.598 --> 00:52:32.820
So it's not super clear.

00:52:32.820 --> 00:52:37.770
Maybe it's [INAUDIBLE]
probably not q.

00:52:37.770 --> 00:52:40.314
So in light of the
fact that [INAUDIBLE].

00:52:44.274 --> 00:52:46.440
Actually, I shouldn't speak
authoritatively to this.

00:52:46.440 --> 00:52:47.000
I haven't tried
implementing this.

00:52:47.000 --> 00:52:49.166
So maybe there's some deep
reason why this extra

00:52:49.166 --> 00:52:50.184
reduction has to happen.

00:52:50.184 --> 00:52:53.490
I couldn't think of one.

00:52:53.490 --> 00:52:54.450
All right, questions?

00:52:57.110 --> 00:53:00.995
So here's the last piece of
the puzzle for how OpenSSL,

00:53:00.995 --> 00:53:06.040
this library that this
paper attacks, implements

00:53:06.040 --> 00:53:07.870
multiplication.

00:53:07.870 --> 00:53:12.630
So this Montgomery trick is
great for avoiding the mod q

00:53:12.630 --> 00:53:15.630
part during modular
multiplication.

00:53:15.630 --> 00:53:17.770
But then there's a question
of how do you actually

00:53:17.770 --> 00:53:19.020
multiply two numbers together.

00:53:19.020 --> 00:53:21.235
So we're doing lower
and lower level.

00:53:21.235 --> 00:53:25.791
So suppose you have
[? the raw ?] multiplication.

00:53:28.579 --> 00:53:30.370
So this is not even
modular multiplication.

00:53:30.370 --> 00:53:33.475
You have two numbers, a and b.

00:53:33.475 --> 00:53:38.636
And both these guys
are 512 bit numbers.

00:53:38.636 --> 00:53:40.250
How do you multiply
them together

00:53:40.250 --> 00:53:42.400
when your machine is
only a 32 bit machine,

00:53:42.400 --> 00:53:46.226
like the guys in the paper, or
a 64 bit, but still, same thing?

00:53:46.226 --> 00:53:48.670
How would you implement
multiplication of these guys?

00:53:53.740 --> 00:53:56.242
Any suggestions?

00:53:56.242 --> 00:53:58.200
Well I guess it was a
straightforward question,

00:53:58.200 --> 00:54:01.860
you just represent a and
b as a sequence of machine

00:54:01.860 --> 00:54:05.290
[? words. ?] And then you
just do this quadratic product

00:54:05.290 --> 00:54:06.752
of these two guys.

00:54:06.752 --> 00:54:08.960
[INAUDIBLE] see a simple
example, instead of thinking

00:54:08.960 --> 00:54:13.574
of a 512 bit number, let's think
of these guys as 64 bit numbers

00:54:13.574 --> 00:54:15.671
and we're on a 32 bit machine.

00:54:15.671 --> 00:54:16.170
Right.

00:54:16.170 --> 00:54:17.900
So we're going to have values.

00:54:17.900 --> 00:54:20.794
The value of a is going
to be represented by two

00:54:20.794 --> 00:54:21.960
[? very ?] different things.

00:54:21.960 --> 00:54:27.550
It's going to be, let's
call it, a1 and a0.

00:54:27.550 --> 00:54:29.895
So a0 is the low word,
a1 is the high word.

00:54:29.895 --> 00:54:31.520
And similarly, we're
going to represent

00:54:31.520 --> 00:54:36.760
b as two things, b1 b0.

00:54:36.760 --> 00:54:39.640
So then a naive way
to represent a b

00:54:39.640 --> 00:54:44.310
is going to be to multiply
all these guys out.

00:54:44.310 --> 00:54:48.020
So it's going to be
a three cell number.

00:54:48.020 --> 00:54:52.140
The high word is
going to be a1 b1.

00:54:52.140 --> 00:54:55.560
The low word is
going to be a0 b0.

00:54:55.560 --> 00:55:01.845
And the middle word is going
to be a1 b0 plus a0 b1.

00:55:01.845 --> 00:55:06.330
So this is how you do the
multiplication, right.
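
NOTE
Editor's sketch of the naive method just described, for 64-bit values split
into 32-bit words. The name schoolbook_2x2 is made up for illustration, and
Python integers stand in for machine words.
```python
W = 32
MASK = (1 << W) - 1
def schoolbook_2x2(a: int, b: int) -> int:
    """Multiply two 64-bit values using four 32-bit by 32-bit word products."""
    a1, a0 = a >> W, a & MASK          # high and low words of a
    b1, b0 = b >> W, b & MASK          # high and low words of b
    high = a1 * b1                     # multiplication 1
    mid = a1 * b0 + a0 * b1            # multiplications 2 and 3
    low = a0 * b0                      # multiplication 4
    return (high << (2 * W)) + (mid << W) + low
```
The three terms are the high, middle, and low parts from the board, shifted
into place and added.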

00:55:06.330 --> 00:55:06.940
Question?

00:55:06.940 --> 00:55:08.822
AUDIENCE: So I was
going to say are

00:55:08.822 --> 00:55:10.785
you using [INAUDIBLE] method?

00:55:10.785 --> 00:55:13.060
PROFESSOR: Yeah, so this
is like a clever method

00:55:13.060 --> 00:55:15.490
alternative for doing
multiplication, which

00:55:15.490 --> 00:55:16.680
doesn't involve four steps.

00:55:16.680 --> 00:55:18.435
Here, you have to do
four multiplications.

00:55:18.435 --> 00:55:20.807
There's this clever
other method, Karatsuba.

00:55:20.807 --> 00:55:22.890
Do they teach this in 601
or something these days?

00:55:22.890 --> 00:55:23.290
AUDIENCE: 042.

00:55:23.290 --> 00:55:24.373
PROFESSOR: 042, excellent.

00:55:24.373 --> 00:55:25.980
Yeah, that's a very nice method.

00:55:25.980 --> 00:55:29.440
Almost every cryptographic
library implements this.

00:55:29.440 --> 00:55:32.230
And for those of
you that, I guess,

00:55:32.230 --> 00:55:34.980
weren't undergrads here, since
we have grad students maybe

00:55:34.980 --> 00:55:35.685
they haven't seen Karatsuba.

00:55:35.685 --> 00:55:37.184
I'll just write it
out on the board.

00:55:37.184 --> 00:55:40.850
It's a clever thing the
first time you see it.

00:55:40.850 --> 00:55:46.310
And what you can do is basically
compute out three values.

00:55:46.310 --> 00:55:49.040
You're going to
compute out a1 b1.

00:55:49.040 --> 00:55:59.190
You're going to also
compute a1 minus b0 times b1

00:55:59.190 --> 00:56:04.950
minus-- sorry-- a1
minus a0, b1 minus b0.

00:56:04.950 --> 00:56:08.690
And a0 b0.

00:56:08.690 --> 00:56:11.125
And this does three
multiplications

00:56:11.125 --> 00:56:12.225
instead of four.

00:56:12.225 --> 00:56:13.810
And it turns out
you can actually

00:56:13.810 --> 00:56:18.440
reconstruct this value from
these three multiplication

00:56:18.440 --> 00:56:20.200
results.

00:56:20.200 --> 00:56:22.810
And the particular
way to do it is this

00:56:22.810 --> 00:56:29.736
is going to be the--
let me write it out

00:56:29.736 --> 00:56:31.910
in a different form.

00:56:31.910 --> 00:56:41.010
So we're going to have 2 to the
64 times-- sorry-- 2 to the 64

00:56:41.010 --> 00:56:52.710
plus 2 to the 32
times a1 b1 plus 2

00:56:52.710 --> 00:57:00.230
to the 32 times minus that
little guy in the middle a1

00:57:00.230 --> 00:57:05.640
minus a0 b1 minus b0.

00:57:05.640 --> 00:57:15.020
And finally, we're going to do
2 to the 32 plus 1 times a0 b0.

00:57:15.020 --> 00:57:16.920
And it's a little
messy, but actually

00:57:16.920 --> 00:57:19.380
if you work through
the details, you'll

00:57:19.380 --> 00:57:20.880
end up convincing
yourself hopefully

00:57:20.880 --> 00:57:26.285
that this value is exactly
the same as this value.
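
NOTE
Editor's sketch that checks the reconstruction formula above for one level of
Karatsuba on 64-bit values with 32-bit words. The name karatsuba_2x2 is
hypothetical.
```python
def karatsuba_2x2(a: int, b: int) -> int:
    """Multiply two 64-bit values with three word multiplications instead of four."""
    a1, a0 = a >> 32, a & 0xFFFFFFFF
    b1, b0 = b >> 32, b & 0xFFFFFFFF
    z2 = a1 * b1                       # high product
    z1 = (a1 - a0) * (b1 - b0)         # the "little guy in the middle" (may be negative)
    z0 = a0 * b0                       # low product
    # (2^64 + 2^32) * a1 b1  -  2^32 * (a1 - a0)(b1 - b0)  +  (2^32 + 1) * a0 b0
    return ((1 << 64) + (1 << 32)) * z2 - (1 << 32) * z1 + ((1 << 32) + 1) * z0
```
Expanding z1 shows that z2 + z0 - z1 equals a1 b0 + a0 b1, which is the middle
word of the naive method.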

00:57:26.285 --> 00:57:27.930
So it's clever.

00:57:27.930 --> 00:57:31.470
But nonetheless, it saves
you one multiplication.

00:57:31.470 --> 00:57:34.670
And the way we
apply this to doing

00:57:34.670 --> 00:57:37.660
much larger multiplications
is that you recursively

00:57:37.660 --> 00:57:38.610
keep going down.

00:57:38.610 --> 00:57:41.750
So if you have 512
bit values, you

00:57:41.750 --> 00:57:44.790
could break it down to
256 bit multiplication.

00:57:44.790 --> 00:57:47.802
You do three 256
bit multiplications.

00:57:47.802 --> 00:57:49.260
And then each of
those you're going

00:57:49.260 --> 00:57:52.410
to do using the same
Karatsuba trick recursively.

00:57:52.410 --> 00:57:54.840
And eventually you'll get
down to machine size, which

00:57:54.840 --> 00:57:56.986
you can just do with
a single machine

00:57:56.986 --> 00:58:02.590
instruction. [INAUDIBLE]
This make sense?
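
NOTE
Editor's sketch of the recursion described above, assuming the operand width
is the word size times a power of two. The function name and parameters are
illustrative. A real library works on arrays of machine words instead of
Python integers.
```python
def karatsuba(a: int, b: int, bits: int, word: int = 32) -> int:
    """Multiply non-negative a, b below 2**bits by recursive Karatsuba."""
    if bits <= word:
        return a * b                   # stands in for a single machine multiply
    half = bits // 2
    mask = (1 << half) - 1
    a1, a0 = a >> half, a & mask
    b1, b0 = b >> half, b & mask
    z2 = karatsuba(a1, b1, half, word)
    z0 = karatsuba(a0, b0, half, word)
    # The differences can be negative: multiply magnitudes, then restore the sign.
    da, db = a1 - a0, b1 - b0
    z1 = karatsuba(abs(da), abs(db), half, word)
    if (da < 0) != (db < 0):
        z1 = -z1
    return (z2 << bits) + ((z2 + z0 - z1) << half) + z0
```
A 512-bit multiply becomes three 256-bit multiplies, then nine 128-bit ones,
and so on down to single words.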

00:58:02.590 --> 00:58:04.660
So what's the
timing attack here?

00:58:04.660 --> 00:58:07.430
How do these guys exploit
this Karatsuba multiplication?

00:58:07.430 --> 00:58:11.720
Well, it turns out
that OpenSSL worries

00:58:11.720 --> 00:58:13.920
about basically two
kinds of multiplications

00:58:13.920 --> 00:58:15.850
that you might need to do.

00:58:15.850 --> 00:58:18.757
One is a multiplication
between two large numbers

00:58:18.757 --> 00:58:19.965
that are about the same size.

00:58:19.965 --> 00:58:22.250
So this happens a
lot when we're doing

00:58:22.250 --> 00:58:25.327
this modular exponentiation
because all the values we're

00:58:25.327 --> 00:58:26.868
going to be multiplying
are all going

00:58:26.868 --> 00:58:29.445
to be roughly 512 bits in size.

00:58:29.445 --> 00:58:33.330
So when we're multiplying by c
to the y or doing a squaring,

00:58:33.330 --> 00:58:35.850
we're multiplying two things
that are about the same size.

00:58:35.850 --> 00:58:38.890
And then this Karatsuba
trick makes a lot of sense

00:58:38.890 --> 00:58:41.290
because, instead
of computing stuff

00:58:41.290 --> 00:58:43.790
in time n squared
in the input size,

00:58:43.790 --> 00:58:48.740
Karatsuba is roughly n to the
1.58, something like that.

00:58:48.740 --> 00:58:50.335
So it's much faster.

00:58:50.335 --> 00:58:52.490
But then there's
this other situation

00:58:52.490 --> 00:58:54.930
where OpenSSL might be
multiplying two numbers that

00:58:54.930 --> 00:58:57.410
are very different in
size: one that's very big,

00:58:57.410 --> 00:58:58.530
and one that's very small.

00:58:58.530 --> 00:59:00.900
And in that case you
could use Karatsuba,

00:59:00.900 --> 00:59:02.990
but then it's going
to get you slower

00:59:02.990 --> 00:59:04.610
than doing the naive thing.

00:59:04.610 --> 00:59:06.660
Suppose you're trying
to multiply a 512 bit

00:59:06.660 --> 00:59:08.997
number by a 64 bit
number, you'd rather just

00:59:08.997 --> 00:59:10.830
do the straightforward
thing, where you just

00:59:10.830 --> 00:59:13.050
multiply by each of the
things in the 64 bit

00:59:13.050 --> 00:59:18.290
number, so about 2n instead of
n to the 1.58 something.

00:59:18.290 --> 00:59:21.900
So as a result, the OpenSSL
guys tried to be clever,

00:59:21.900 --> 00:59:25.760
and that's where
often problems start.

00:59:25.760 --> 00:59:28.280
They decided that
they'll actually

00:59:28.280 --> 00:59:30.880
switch dynamically between
this Karatsuba efficient thing

00:59:30.880 --> 00:59:35.450
and this sort of grade school
method of multiplication here.

00:59:35.450 --> 00:59:37.400
And their heuristic
was basically

00:59:37.400 --> 00:59:39.050
if the two things
you're multiplying

00:59:39.050 --> 00:59:42.483
are exactly the same
number of machine words,

00:59:42.483 --> 00:59:44.024
so they at least
have the same number

00:59:44.024 --> 00:59:48.110
of bits up to 32-bit units,
then they'll go to Karatsuba.

00:59:48.110 --> 00:59:50.380
And if the two things
they're multiplying

00:59:50.380 --> 00:59:52.770
have a different
number or 32 bit units,

00:59:52.770 --> 00:59:57.660
then they'll do the quadratic
or straightforward or regular,

00:59:57.660 --> 00:59:59.882
normal multiplication.

00:59:59.882 --> 01:00:03.880
And there you can see if
your number all of a sudden

01:00:03.880 --> 01:00:06.290
switches to be a
little bit smaller,

01:00:06.290 --> 01:00:08.710
then you're going to switch
from the efficient thing

01:00:08.710 --> 01:00:11.240
to this other
multiplication method.

01:00:11.240 --> 01:00:14.030
And presumably, the
cutoff point isn't

01:00:14.030 --> 01:00:15.595
going to be exactly
smooth so you'll

01:00:15.595 --> 01:00:17.500
be able to tell all
of a sudden, it's

01:00:17.500 --> 01:00:19.190
now taking a lot
longer to multiply

01:00:19.190 --> 01:00:22.320
or a lot shorter to
multiply than before.

01:00:22.320 --> 01:00:26.000
And that's what these guys
exploit in their timing attack

01:00:26.000 --> 01:00:26.940
again.

01:00:26.940 --> 01:00:28.060
Does that make sense?

01:00:28.060 --> 01:00:32.070
What's going on with the
[INAUDIBLE] All right.

01:00:32.070 --> 01:00:34.680
So I think I'm now
done with telling you

01:00:34.680 --> 01:00:36.385
about all the weird
implementation

01:00:36.385 --> 01:00:39.590
tricks that people play when
implementing RSA in practice.

01:00:39.590 --> 01:00:41.630
So now let's try to
put them back together

01:00:41.630 --> 01:00:44.410
into an entire web
server and figure out

01:00:44.410 --> 01:00:48.230
how do you [? tickle ?]
all these interesting bits

01:00:48.230 --> 01:00:52.220
of the implementation from
the input network packet.

01:00:52.220 --> 01:00:54.910
So what happens
in a web server is

01:00:54.910 --> 01:00:59.330
that the web server, if
you remember from the HTTPS

01:00:59.330 --> 01:01:01.890
lecture, has a secret key.

01:01:01.890 --> 01:01:04.780
And it uses the
secret key to prove

01:01:04.780 --> 01:01:06.820
that it's the
correct owner of

01:01:06.820 --> 01:01:11.190
that certificate in the
HTTPS protocol or in TLS.

01:01:11.190 --> 01:01:15.940
And the way this works is that
the clients send some randomly

01:01:15.940 --> 01:01:19.470
chosen bits, and the
bits are encrypted

01:01:19.470 --> 01:01:21.210
using the server's public key.

01:01:21.210 --> 01:01:24.395
And the server in this TLS
protocol decrypts this message.

01:01:24.395 --> 01:01:26.730
And if the message
checks out, it

01:01:26.730 --> 01:01:29.249
uses those random bits to
establish a [? session ?].

01:01:29.249 --> 01:01:32.246
But in this case, the message
isn't going to check out.

01:01:32.246 --> 01:01:34.079
The message is going
to be carefully chosen,

01:01:34.079 --> 01:01:35.845
the padding bits
aren't going to match,

01:01:35.845 --> 01:01:37.470
and the server is
going to return error

01:01:37.470 --> 01:01:39.850
as soon as it finishes
decrypting our message.

01:01:39.850 --> 01:01:42.080
And that's what we're
going to time here.

01:01:42.080 --> 01:01:49.368
So the server-- you can think of
this as Apache with OpenSSL--

01:01:49.368 --> 01:01:52.500
you're going to get a
message from the client,

01:01:52.500 --> 01:01:55.940
and you can think of
this as a ciphertext

01:01:55.940 --> 01:01:59.400
c, or a hypothetical
ciphertext, that the client

01:01:59.400 --> 01:02:00.545
might have produced.

01:02:00.545 --> 01:02:03.340
And the first thing we're going
to do with a ciphertext c,

01:02:03.340 --> 01:02:06.910
we want to decrypt it
using roughly this formula.

01:02:06.910 --> 01:02:08.820
And if you remember
the first optimization

01:02:08.820 --> 01:02:12.806
we're going to apply is the
Chinese Remainder Theorem.

01:02:12.806 --> 01:02:14.306
So the first thing
we're going to do

01:02:14.306 --> 01:02:16.730
is basically split our
pipeline in two parts.

01:02:16.730 --> 01:02:20.430
We're going to do one thing
mod p another thing mod q

01:02:20.430 --> 01:02:22.719
and then recombine the
results at the end of the day.

01:02:22.719 --> 01:02:24.218
So the first thing
we're going to do

01:02:24.218 --> 01:02:26.070
is, we're actually
going to take c

01:02:26.070 --> 01:02:28.580
and we're going
to compute, let's

01:02:28.580 --> 01:02:35.480
call this c0, which is going
to be equal to c mod q.

01:02:35.480 --> 01:02:38.710
And we're also going to have
a different value, let's

01:02:38.710 --> 01:02:44.730
call it c1, which is
going to be c mod p.

01:02:44.730 --> 01:02:46.930
And then we're going to
do the same thing to each

01:02:46.930 --> 01:02:51.905
of these values to basically
compute c to the d mod p

01:02:51.905 --> 01:02:55.010
and c to the d mod q.

01:02:55.010 --> 01:02:58.070
And here we're going to
basically initially we're

01:02:58.070 --> 01:03:00.585
going to [? starch. ?]
After CRT, we're

01:03:00.585 --> 01:03:02.610
going to switch into
Montgomery representation

01:03:02.610 --> 01:03:06.040
because that's going to make
our multiplies very fast.

01:03:06.040 --> 01:03:08.150
So the next thing
SSL is going to do

01:03:08.150 --> 01:03:09.610
to your number,
it's actually going

01:03:09.610 --> 01:03:12.900
to compute all the
[INAUDIBLE] at c0 prime,

01:03:12.900 --> 01:03:18.740
which is going to
be c0 times R mod q.

01:03:18.740 --> 01:03:20.208
And the same thing
down here, I'm

01:03:20.208 --> 01:03:21.666
not going to write
out the pipeline

01:03:21.666 --> 01:03:23.200
because that'll look the same.

01:03:23.200 --> 01:03:27.520
And then, now that we've
switched into Montgomery form,

01:03:27.520 --> 01:03:31.840
we can finally do
our multiplications.

01:03:31.840 --> 01:03:34.190
And here's where we're going
to use the sliding window

01:03:34.190 --> 01:03:35.780
technique.

01:03:35.780 --> 01:03:38.290
So once we have c
prime, we can actually

01:03:38.290 --> 01:03:47.460
try to compute c0 prime
exponentiated to the d mod q.

01:03:47.460 --> 01:03:52.250
And here, as we're computing
this value to the d,

01:03:52.250 --> 01:03:53.990
we're going to be
using sliding windows.

01:03:53.990 --> 01:03:59.510
So here, we're going
to do sliding windows

01:03:59.510 --> 01:04:03.350
for the bits in this d exponent.

01:04:03.350 --> 01:04:08.450
And also we're going
to do Karatsuba

01:04:08.450 --> 01:04:12.820
or regular multiplication
depending on exactly what

01:04:12.820 --> 01:04:15.540
the size of our operands are.

01:04:15.540 --> 01:04:18.500
So if it turns out that the
thing we're multiplying,

01:04:18.500 --> 01:04:25.070
c0 prime and maybe that
previously squared result,

01:04:25.070 --> 01:04:27.310
are the same size, we're
going to do Karatsuba.

01:04:27.310 --> 01:04:31.230
If c0 prime is tiny
but some previous thing

01:04:31.230 --> 01:04:34.240
we're multiplying it to is
big , then we're going to do

01:04:34.240 --> 01:04:36.610
quadratic multiplication,
normal multiplication.

01:04:36.610 --> 01:04:38.520
There's sliding
windows coming in here,

01:04:38.520 --> 01:04:45.770
here we also have this Karatsuba
versus normal multiplying.

01:04:45.770 --> 01:04:49.630
And also in this step, the
extra reductions come in.

01:04:49.630 --> 01:04:54.420
Because at every multiply,
the extra reductions

01:04:54.420 --> 01:04:58.840
are going to be proportional
to the thing we're

01:04:58.840 --> 01:05:00.950
exponentiating mod q.

01:05:00.950 --> 01:05:04.452
[INAUDIBLE] just plug in
the formula over here,

01:05:04.452 --> 01:05:05.910
the probability
of extra reductions is

01:05:05.910 --> 01:05:11.170
going to be proportional to
this value of c0 prime mod

01:05:11.170 --> 01:05:14.990
q divided by 2R.

01:05:19.200 --> 01:05:21.672
So this is where the
really timing sensitive bit

01:05:21.672 --> 01:05:22.718
is going to come in.

01:05:22.718 --> 01:05:24.384
And there are actually
two effects here.

01:05:24.384 --> 01:05:27.425
There's this Karatsuba
versus normal choice.

01:05:27.425 --> 01:05:29.720
And then there's the
number of extra reductions

01:05:29.720 --> 01:05:32.605
you're going to be making.

01:05:32.605 --> 01:05:34.480
So we'll see how we
exploit this in a second,

01:05:34.480 --> 01:05:36.800
but now that you get
this result for mod q,

01:05:36.800 --> 01:05:39.560
you're going to get a
similar result mod p,

01:05:39.560 --> 01:05:43.780
you can finally recombine
these guys from the top

01:05:43.780 --> 01:05:46.660
and the bottom and use CRT.

01:05:46.660 --> 01:05:49.870
And what you get out
from CRT is actually--

01:05:49.870 --> 01:05:55.110
sorry, I guess we need to first
convert it back down into non

01:05:55.110 --> 01:05:56.760
Montgomery form.

01:05:56.760 --> 01:06:00.380
So we're going to
get first, we're

01:06:00.380 --> 01:06:09.620
going to get c0 prime to
the d divided by R mod q.

01:06:09.620 --> 01:06:15.160
And this thing, because c0
prime was c0 times R mod q,

01:06:15.160 --> 01:06:19.820
if we do this then we're going
to get back out our value of c

01:06:19.820 --> 01:06:23.110
to the d mod q.

01:06:23.110 --> 01:06:25.370
And we get c to
the d here, we're

01:06:25.370 --> 01:06:28.290
going to get c to the d
mod p on the bottom version

01:06:28.290 --> 01:06:29.700
of this pipeline.

01:06:29.700 --> 01:06:35.220
And we can use CRT to get the
value of c to the d mod n.

01:06:35.220 --> 01:06:38.060
Sorry for the small
type here, or font size.

01:06:38.060 --> 01:06:40.680
But roughly it's the same
thing we're expecting here.

01:06:40.680 --> 01:06:44.305
We can finally get our result.
And we get our message, m.
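
NOTE
Editor's sketch, not part of the lecture: a minimal Python version of the two-branch CRT decryption described here, leaving out the Montgomery and sliding-window steps. The function name and the toy primes in the usage line are hypothetical, and reducing the exponent mod p-1 and q-1 is a common convention assumed here rather than something stated in the lecture.
```python
def crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    # Exponentiate separately mod p and mod q (the two pipeline branches).
    m_p = pow(c % p, d % (p - 1), p)
    m_q = pow(c % q, d % (q - 1), q)
    # Recombine: find m with m = m_p (mod p) and m = m_q (mod q).
    q_inv = pow(q, -1, p)  # modular inverse, needs Python 3.8+
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q
```
With toy values p = 61, q = 53, d = 2753, the call crt_decrypt(c, 61, 53, 2753) should agree with pow(c, 2753, 61 * 53) for any ciphertext c.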

01:06:44.305 --> 01:06:46.420
So the server takes
an incoming packet

01:06:46.420 --> 01:06:51.000
that it gets, runs it
through this whole pipeline,

01:06:51.000 --> 01:06:53.578
does two parts of
this pipeline, ends up

01:06:53.578 --> 01:06:57.627
with a decrypted message m
that's equal to c to the d mod n.

01:06:57.627 --> 01:07:00.682
And then it's going to check
the padding of this message.

01:07:00.682 --> 01:07:02.940
And in this particular
attack, because we're

01:07:02.940 --> 01:07:05.320
going to carefully
construct this value c,

01:07:05.320 --> 01:07:07.810
the padding is going to
actually not match up.

01:07:07.810 --> 01:07:10.290
We're going to choose
the value c according

01:07:10.290 --> 01:07:12.629
to some other
heuristics that aren't

01:07:12.629 --> 01:07:14.754
encrypting a real message
with the correct padding.

01:07:14.754 --> 01:07:17.310
So the padding is going to be
a mismatch, and the server's

01:07:17.310 --> 01:07:19.601
going to need to report
an error back to the client.

01:07:19.601 --> 01:07:22.080
[? And it pulls ?]
the connection.

01:07:22.080 --> 01:07:23.680
And that's the time
that we're going

01:07:23.680 --> 01:07:28.230
to measure to figure out how
long this whole pipeline took.

01:07:28.230 --> 01:07:29.362
Makes sense?

01:07:29.362 --> 01:07:31.070
Questions about this
pipeline and putting

01:07:31.070 --> 01:07:34.396
all the optimizations together?

01:07:34.396 --> 01:07:35.354
AUDIENCE: [INAUDIBLE]

01:07:41.445 --> 01:07:43.070
PROFESSOR: Yeah,
you're probably right.

01:07:43.070 --> 01:07:45.600
Yes, c1 to the d, c0 to the d.

01:07:45.600 --> 01:07:46.620
Yeah, this is c0.

01:07:46.620 --> 01:07:49.287
Yeah, correct.

01:07:49.287 --> 01:07:51.722
AUDIENCE: When you
divide by r [INAUDIBLE],

01:07:51.722 --> 01:07:55.131
isn't there a
[INAUDIBLE] on how many

01:07:55.131 --> 01:08:00.812
q's you have to have to get
the [? little bit ?] to be

01:08:00.812 --> 01:08:03.035
0? [INAUDIBLE].

01:08:03.035 --> 01:08:05.160
PROFESSOR: Yeah, so there
might be extra reductions

01:08:05.160 --> 01:08:07.049
in this final phase as well.

01:08:07.049 --> 01:08:07.590
You're right.

01:08:07.590 --> 01:08:11.220
So potentially, we have to do
this divide by R correctly.

01:08:11.220 --> 01:08:13.300
So we probably have to
do exactly the same thing

01:08:13.300 --> 01:08:16.399
as we saw for the
Montgomery reductions here.

01:08:16.399 --> 01:08:19.649
When we do this divide
by R to convert it back.

01:08:19.649 --> 01:08:22.560
So it's not clear exactly
how many qs we should add.

01:08:22.560 --> 01:08:25.250
We should figure out how many
qs to add, add that many,

01:08:25.250 --> 01:08:28.329
kill the low zeros, and
then do mod q again,

01:08:28.329 --> 01:08:29.514
maybe an extra reduction.

01:08:29.514 --> 01:08:31.180
You're absolutely
right, this is exactly

01:08:31.180 --> 01:08:33.406
the same kind of
divide by R mod q

01:08:33.406 --> 01:08:38.229
as we do for every Montgomery
multiplication step.
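
NOTE
Editor's sketch, not part of the lecture: the divide-by-R step just described (add the right number of q's, drop the low zero bits, then possibly subtract q once more) in Python. The names are hypothetical, and the sketch assumes q is odd, R = 2**r_bits, and 0 <= t < q * R.
```python
def montgomery_reduce(t: int, q: int, r_bits: int) -> tuple[int, bool]:
    r = 1 << r_bits
    q_neg_inv = (-pow(q, -1, r)) % r  # -q^{-1} mod R, needs Python 3.8+
    m = (t * q_neg_inv) % r           # how many q's to add
    u = (t + m * q) >> r_bits         # low r_bits are zero now, drop them
    extra = u >= q
    if extra:
        u -= q                        # the extra reduction
    return u, extra
```
The returned flag reports whether the extra reduction fired, which is the data-dependent step the timing attack relies on.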

01:08:38.229 --> 01:08:40.689
Make sense?

01:08:40.689 --> 01:08:43.569
Any other questions?

01:08:43.569 --> 01:08:44.116
All right.

01:08:44.116 --> 01:08:45.240
So how do you exploit this?

01:08:45.240 --> 01:08:47.689
How does an attacker
actually figure out

01:08:47.689 --> 01:08:49.710
what the secret
key of the server

01:08:49.710 --> 01:08:54.300
is by measuring the time
of this entire pipeline?

01:08:54.300 --> 01:08:58.160
So these guys have a
plan that basically

01:08:58.160 --> 01:09:03.810
involves guessing one bit of
the private key at a time.

01:09:03.810 --> 01:09:07.060
And what they mean actually
by guessing the private key is

01:09:07.060 --> 01:09:10.960
that you might think the private
key is this decryption exponent

01:09:10.960 --> 01:09:13.528
d, because actually
you know e, you

01:09:13.528 --> 01:09:15.160
know n, that's the public key.

01:09:15.160 --> 01:09:16.849
The only thing you
don't know is d.

01:09:16.849 --> 01:09:19.785
But in fact, in this attack
they don't go for the exponent d

01:09:19.785 --> 01:09:21.810
directly, that's a little
bit harder to guess.

01:09:21.810 --> 01:09:23.185
Instead, what
they're going to go

01:09:23.185 --> 01:09:25.890
for is the value
q or the value p,

01:09:25.890 --> 01:09:27.649
doesn't really matter which one.

01:09:27.649 --> 01:09:31.229
Once you guess what the
value p or q is, then

01:09:31.229 --> 01:09:34.662
given n, you can
factor it into p times q.

01:09:34.662 --> 01:09:37.470
Then if you know p times
q, you can actually--

01:09:37.470 --> 01:09:39.219
sorry-- if you know
the values of p and q,

01:09:39.219 --> 01:09:41.729
you can compute that phi
function we saw before.

01:09:41.729 --> 01:09:45.979
That's going to allow you to get
the value d from the value e.
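
NOTE
Editor's sketch, not part of the lecture: once p and q are known, recovering d from e is a single modular inverse. The function name is hypothetical and the usage line uses toy numbers.
```python
def private_exponent(p: int, q: int, e: int) -> int:
    # phi(n) for n = p * q with p and q distinct primes.
    phi = (p - 1) * (q - 1)
    # d is the inverse of e modulo phi(n); pow(e, -1, phi) needs Python 3.8+.
    return pow(e, -1, phi)
```
For example, private_exponent(61, 53, 17) gives the d that undoes encryption with e = 17 modulo n = 61 * 53.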

01:09:45.979 --> 01:09:48.750
So this factorization of the
value n is hugely important,

01:09:48.750 --> 01:09:51.985
it should be secret for
RSA to remain secure.

01:09:51.985 --> 01:09:53.840
So these guys are
actually going to go

01:09:53.840 --> 01:09:55.830
and try to guess
what the value of q

01:09:55.830 --> 01:09:59.570
is by timing this pipeline.

01:09:59.570 --> 01:10:00.070
All right.

01:10:00.070 --> 01:10:02.410
So how do these
guys actually do it?

01:10:02.410 --> 01:10:10.280
Well, they construct
carefully chosen inputs, c,

01:10:10.280 --> 01:10:12.570
into this pipeline
and-- I guess I

01:10:12.570 --> 01:10:16.800
keep saying they keep measuring
the time for this guy.

01:10:16.800 --> 01:10:22.130
But the particular,
well, there's

01:10:22.130 --> 01:10:23.505
two parts of the
attack, you have

01:10:23.505 --> 01:10:26.390
to bootstrap it a little bit to
guess the first couple of bits.

01:10:26.390 --> 01:10:28.390
And then once you have
the first couple of bits,

01:10:28.390 --> 01:10:29.600
you can guess the next bit.

01:10:29.600 --> 01:10:31.810
So let me not say
exactly how they

01:10:31.810 --> 01:10:34.997
guess the first couple of bits
because it's actually much more

01:10:34.997 --> 01:10:36.955
interesting to see how
they guess the next bit.

01:10:36.955 --> 01:10:38.330
And then we'll come
back if we have

01:10:38.330 --> 01:10:40.621
time to look at how they
guess the first couple of bits

01:10:40.621 --> 01:10:41.970
[? at this ?] in the paper.

01:10:41.970 --> 01:10:45.820
But basically, suppose you
have a guess g about what

01:10:45.820 --> 01:10:48.216
the bits are of this value q.

01:10:48.216 --> 01:10:56.820
So you know that q has some
bits, g0, g1, g2, et cetera.

01:10:56.820 --> 01:11:01.720
And actually, I guess
these are not even gs,

01:11:01.720 --> 01:11:04.990
these are real q bits, so
let me write it as that.

01:11:04.990 --> 01:11:10.310
So you know that q bit
0, q bit 1, q bit 2,

01:11:10.310 --> 01:11:12.690
these are the highest bits of q.

01:11:12.690 --> 01:11:15.455
And then you're trying to
guess lower and lower bits.

01:11:15.455 --> 01:11:20.275
So suppose you know the
value of q up to bit j.

01:11:20.275 --> 01:11:22.750
And from that point on, your
guess is actually all 0.

01:11:22.750 --> 01:11:26.280
You have no idea what
the other bits are.

01:11:26.280 --> 01:11:31.900
So these guys are going
to try to get this guess

01:11:31.900 --> 01:11:35.760
g into this place
in the pipeline.

01:11:35.760 --> 01:11:38.280
Because this is where
there are two tiny effects:

01:11:38.280 --> 01:11:41.010
this choice of Karatsuba
versus normal multiplication.

01:11:41.010 --> 01:11:44.230
And this choice of, or
this different number

01:11:44.230 --> 01:11:48.436
of extra reductions depending
on the value c0 prime.

01:11:48.436 --> 01:11:51.020
So they're going to actually
try to get two different guess

01:11:51.020 --> 01:11:53.330
values into that
place in the pipeline.

01:11:53.330 --> 01:11:58.120
One that looks like this,
and one that they call

01:11:58.120 --> 01:12:05.110
g high, which is all the
same high bits, q2 qj.

01:12:05.110 --> 01:12:07.440
And for the next bit,
which they don't know,

01:12:07.440 --> 01:12:09.750
your guess g
is going to have 0,

01:12:09.750 --> 01:12:14.906
g high is going to have a bit
1 here and all zeros later on.

01:12:14.906 --> 01:12:19.040
So how does it help these guys
figure out what's going on?

01:12:19.040 --> 01:12:22.120
So there are really two
ways you can think of it.

01:12:22.120 --> 01:12:28.930
Suppose that we get this guess
g to be the value of c0 prime.

01:12:28.930 --> 01:12:34.350
We can think of g and g high
being the c0 prime value

01:12:34.350 --> 01:12:36.200
on that left board over there.

01:12:36.200 --> 01:12:37.700
It's actually fairly
straightforward

01:12:37.700 --> 01:12:42.460
to do this because c0 prime
is pretty deterministically

01:12:42.460 --> 01:12:44.480
computed from the
input ciphertext c0.

01:12:44.480 --> 01:12:47.030
You just multiply it
by R. So, in order

01:12:47.030 --> 01:12:49.240
for them to get
some value to here,

01:12:49.240 --> 01:12:53.370
as a guess, they just
need to take their guess

01:12:53.370 --> 01:12:57.340
and first divide it by R, so
divide it by 2 to the 512 mod

01:12:57.340 --> 01:12:58.340
something.

01:12:58.340 --> 01:13:01.610
And then, they're going
to inject it back.

01:13:01.610 --> 01:13:04.260
And the server's going
to multiply it by R,

01:13:04.260 --> 01:13:06.490
and then off you go.
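
NOTE
Editor's sketch, not part of the lecture: building the two guesses g and g high from the known top bits of q, and pre-dividing a guess by R so that the server's multiply by R puts it back in the c0 prime spot. All names are hypothetical. The real attack works with a 512-bit q and R = 2**512, while the usage line uses tiny numbers.
```python
def make_guesses(known_top_bits: str, total_bits: int) -> tuple[int, int]:
    # g: the known high bits followed by zeros.
    # g_hi: the same, but with the next (unknown) bit set to 1.
    unknown = total_bits - len(known_top_bits)
    g = int(known_top_bits, 2) << unknown
    g_hi = g | (1 << (unknown - 1))
    return g, g_hi
def to_ciphertext(guess: int, r: int, n: int) -> int:
    # Multiply by R^{-1} mod n; the server's later multiply by R cancels it.
    return (guess * pow(r, -1, n)) % n
```
For example, make_guesses("101", 8) returns (0b10100000, 0b10110000), and each guess would be passed through to_ciphertext before being sent.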

01:13:06.490 --> 01:13:07.910
Make sense?

01:13:07.910 --> 01:13:09.490
All right.

01:13:09.490 --> 01:13:13.730
So suppose that we manage to get
our particular chosen integer

01:13:13.730 --> 01:13:16.650
value into that c0
prime spot.

01:13:16.650 --> 01:13:19.930
So what's going to be
the time to compute

01:13:19.930 --> 01:13:22.522
c0 prime to the d mod q?

01:13:22.522 --> 01:13:26.780
So there are two possible
options here where

01:13:26.780 --> 01:13:28.180
q falls in this picture.

01:13:28.180 --> 01:13:33.920
So it might be that q is
between these two values.

01:13:33.920 --> 01:13:37.462
Because the next bit of q is 0.

01:13:37.462 --> 01:13:39.170
So this value is going
to be less than q,

01:13:39.170 --> 01:13:41.670
but this guy's going
to be greater than q.

01:13:41.670 --> 01:13:44.970
So this happens if the
next bit of q is 0. Or it

01:13:44.970 --> 01:13:48.340
might be that q lies
above both of these values

01:13:48.340 --> 01:13:51.880
if the next bit of q is 1.

01:13:51.880 --> 01:13:53.860
So now we can tell,
OK, what's going

01:13:53.860 --> 01:13:58.280
to be the timing of
decrypting these two values,

01:13:58.280 --> 01:14:04.225
if q lies in between them, or
if q lies above both of them.

01:14:04.225 --> 01:14:05.600
Let's look at the
situation where

01:14:05.600 --> 01:14:08.140
q lies above both of them.

01:14:08.140 --> 01:14:11.760
Well in that case,
actually everything

01:14:11.760 --> 01:14:13.160
is pretty much the same.

01:14:13.160 --> 01:14:13.660
Right?

01:14:13.660 --> 01:14:16.330
Because both of these
values are smaller than q,

01:14:16.330 --> 01:14:18.057
then the value of
these things mod q

01:14:18.057 --> 01:14:19.390
is going to be roughly the same.

01:14:19.390 --> 01:14:21.140
They're going to be a
little bit different

01:14:21.140 --> 01:14:24.540
because of this extra bit,
but more or less they're

01:14:24.540 --> 01:14:26.420
the same magnitude.

01:14:26.420 --> 01:14:28.797
And the number of
extra reductions

01:14:28.797 --> 01:14:31.380
is also probably not going to
be hugely different because it's

01:14:31.380 --> 01:14:34.780
proportional to the
value of this guy mod q.

01:14:34.780 --> 01:14:37.690
And for both these guys, they're
both a little bit smaller

01:14:37.690 --> 01:14:40.130
than q, so they're
all about the same.

01:14:40.130 --> 01:14:43.080
Neither of them is going to
exceed q and all of a sudden

01:14:43.080 --> 01:14:46.080
have [? many or fewer ?]
extra reductions.

01:14:46.080 --> 01:14:49.290
So if q is greater than
both of these guesses

01:14:49.290 --> 01:14:52.197
then Karatsuba versus normal
is going to stay the same.

01:14:52.197 --> 01:14:54.280
The server is going to do
the same thing basically

01:14:54.280 --> 01:14:56.825
for both g and g high in terms
of Karatsuba versus normal.

01:14:56.825 --> 01:14:59.145
And the server's going to
do about the same number

01:14:59.145 --> 01:15:01.497
of extra reductions for
both these guys as well.

01:15:01.497 --> 01:15:04.080
So if you see that the server's
taking the same amount of time

01:15:04.080 --> 01:15:06.050
to respond to these
guesses, then you

01:15:06.050 --> 01:15:10.580
should probably guess that, oh,
q probably has the bit 1 here.

01:15:10.580 --> 01:15:12.754
On the other hand, if
q lies in the middle,

01:15:12.754 --> 01:15:14.170
then there are two
possible things

01:15:14.170 --> 01:15:17.370
that could trigger a
change in the timing.

01:15:17.370 --> 01:15:19.680
One possibility is
that because g high

01:15:19.680 --> 01:15:22.712
is just a little
bit larger than q,

01:15:22.712 --> 01:15:24.170
then the number of
extra reductions

01:15:24.170 --> 01:15:26.336
is going to be proportional
to this guy mod q, which

01:15:26.336 --> 01:15:31.040
is very small because
c0 prime is q plus just

01:15:31.040 --> 01:15:33.915
a little bit in
these extra bits.

01:15:33.915 --> 01:15:35.290
So the number of
extra reductions

01:15:35.290 --> 01:15:36.650
is going to plummet.

01:15:36.650 --> 01:15:39.297
And all of a sudden,
it will be faster.

01:15:39.297 --> 01:15:40.880
Another possible
thing that can happen

01:15:40.880 --> 01:15:42.623
is that maybe the
server will decide, oh,

01:15:42.623 --> 01:15:44.664
now it's time to do normal
multiplication instead

01:15:44.664 --> 01:15:45.690
of Karatsuba.

01:15:45.690 --> 01:15:51.910
Maybe for this value,
all these, c0

01:15:51.910 --> 01:15:55.170
prime was the same
number of bits as q.

01:15:55.170 --> 01:15:58.890
If it turns out that
g high is above q,

01:15:58.890 --> 01:16:02.700
then g high mod q is potentially
going to have fewer bits.

01:16:02.700 --> 01:16:04.930
And if this crosses the
[INAUDIBLE] boundary,

01:16:04.930 --> 01:16:07.055
then the server's going to
do normal multiplication

01:16:07.055 --> 01:16:08.270
all of a sudden.

01:16:08.270 --> 01:16:10.590
So that's going to be
in the other direction.

01:16:10.590 --> 01:16:14.260
So if you cross over, then
normal multiplication kicks in,

01:16:14.260 --> 01:16:16.885
and things get a lot slower
because normal multiplication

01:16:16.885 --> 01:16:20.612
is quadratic instead of
nicer, faster Karatsuba.

01:16:20.612 --> 01:16:21.112
Question.

01:16:21.112 --> 01:16:22.066
AUDIENCE: [INAUDIBLE]

01:16:23.859 --> 01:16:26.150
PROFESSOR: Yeah, because the
number of extra reductions

01:16:26.150 --> 01:16:31.520
is proportional, from above
there, to c0 prime mod q.

01:16:31.520 --> 01:16:36.880
So if c0 prime, which is this
value, is just a little over q,

01:16:36.880 --> 01:16:40.350
then this is tiny, as opposed
to this guy who's basically

01:16:40.350 --> 01:16:43.495
the same as q, or all the
high bits are the same as q,

01:16:43.495 --> 01:16:44.820
and then it's big.

01:16:44.820 --> 01:16:47.980
So then it'll be the difference
that you can try to measure.

01:16:47.980 --> 01:16:49.730
So this is one interesting
thing, actually

01:16:49.730 --> 01:16:51.480
a couple interesting
things, these effects

01:16:51.480 --> 01:16:53.355
actually work in different
directions, right.

01:16:53.355 --> 01:16:55.870
So if you hit a 32 bit
boundary and Karatsuba

01:16:55.870 --> 01:16:58.170
versus normal switches,
then all of a sudden

01:16:58.170 --> 01:17:00.930
it takes much longer to
decrypt this message.

01:17:00.930 --> 01:17:04.460
On the other hand, if it's
not a 32 bit boundary,

01:17:04.460 --> 01:17:07.424
maybe this effect will
tell you what's going on.

01:17:07.424 --> 01:17:09.590
So you actually have to
watch for different effects.

01:17:09.590 --> 01:17:13.400
If you're not guessing a bit
that's a multiple of 32 bits,

01:17:13.400 --> 01:17:15.410
then you should
probably expect the time

01:17:15.410 --> 01:17:18.125
to drop because of
extra reductions.

01:17:18.125 --> 01:17:19.620
On the other hand,
if you're trying

01:17:19.620 --> 01:17:22.570
to guess a bit that's
a multiple of 32, then

01:17:22.570 --> 01:17:25.100
maybe you should be expecting
for it to jump a lot

01:17:25.100 --> 01:17:27.890
or maybe drop if it's
[INAUDIBLE] normal.

01:17:27.890 --> 01:17:29.890
So I guess what these
guys look at in the paper,

01:17:29.890 --> 01:17:31.450
it actually
doesn't really matter

01:17:31.450 --> 01:17:34.380
whether there's a jump up
or a jump down in time.

01:17:34.380 --> 01:17:38.570
You should just expect if q
is, if the next bit of q is 1,

01:17:38.570 --> 01:17:40.310
you should expect
these things to take

01:17:40.310 --> 01:17:41.740
almost the same amount of time.

01:17:41.740 --> 01:17:44.607
And if the next bit
of q is 0, then you

01:17:44.607 --> 01:17:46.940
should expect these guys to
have a noticeable difference

01:17:46.940 --> 01:17:51.740
even if it's big or small, even
if it's positive or negative.
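
NOTE
Editor's sketch, not part of the lecture: the decision rule just described, as a Python helper. The name and the threshold parameter are hypothetical. The lecture gives no cutoff value, so it would have to be calibrated against real measurements.
```python
def guess_next_bit(time_g: float, time_g_hi: float, threshold: float) -> int:
    # A noticeable gap in either direction suggests g < q < g_hi,
    # so the next bit of q is 0. Similar times suggest the bit is 1.
    return 0 if abs(time_g - time_g_hi) > threshold else 1
```
Using the absolute difference means the rule does not care whether the Karatsuba switch or the extra reductions dominated.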

01:17:51.740 --> 01:17:53.364
So actually, they measure this.

01:17:53.364 --> 01:17:55.280
And it turns out to
actually work pretty well.

01:17:55.280 --> 01:17:57.790
They have to do actually
two interesting tricks

01:17:57.790 --> 01:17:58.820
to make this work out.

01:17:58.820 --> 01:18:01.890
If you remember the timing
difference was tiny,

01:18:01.890 --> 01:18:05.110
it's on the order of 1
to 2 microseconds.

01:18:05.110 --> 01:18:07.690
So it's going to be hard to
measure this over a network,

01:18:07.690 --> 01:18:10.060
over an ethernet
switch for example.

01:18:10.060 --> 01:18:13.460
What they do is they actually
do two kinds of measurements,

01:18:13.460 --> 01:18:15.310
two kinds of averaging.

01:18:15.310 --> 01:18:17.370
So for each guess
that they send,

01:18:17.370 --> 01:18:18.870
they actually send
it several times.

01:18:18.870 --> 01:18:20.710
In the paper, they
said they send it

01:18:20.710 --> 01:18:22.380
like 7 times or something.

01:18:22.380 --> 01:18:24.430
So what kind of
noise do you think

01:18:24.430 --> 01:18:26.670
this helps them with
[? if they ?] just resend

01:18:26.670 --> 01:18:29.440
the same guess over and over?

01:18:29.440 --> 01:18:30.400
Yeah.

01:18:30.400 --> 01:18:33.114
AUDIENCE: What's up
with the [INAUDIBLE]?

01:18:33.114 --> 01:18:34.780
PROFESSOR: Yeah, so
if the network keeps

01:18:34.780 --> 01:18:36.154
adding different
things, you just

01:18:36.154 --> 01:18:37.686
try the same thing many times.

01:18:37.686 --> 01:18:39.060
The thing in the
server should be

01:18:39.060 --> 01:18:41.101
taking exactly the same
amount of time every time

01:18:41.101 --> 01:18:42.870
and just average out
the network noise.

01:18:42.870 --> 01:18:45.460
In the paper, they say they take
the median value-- I actually

01:18:45.460 --> 01:18:47.030
don't understand why
they take the median,

01:18:47.030 --> 01:18:48.510
I think they should be taking
the min of the real thing

01:18:48.510 --> 01:18:50.160
that's going on--
but anyway, this

01:18:50.160 --> 01:18:52.000
was the average of the network.

01:18:52.000 --> 01:18:54.630
But then they do this
other weird thing,

01:18:54.630 --> 01:18:57.850
which is that when
they're sending a guess,

01:18:57.850 --> 01:19:00.280
they don't just send
the same guess 7 times,

01:19:00.280 --> 01:19:02.730
they actually send a
neighborhood of guesses.

01:19:02.730 --> 01:19:04.920
And each value in
the neighborhood

01:19:04.920 --> 01:19:06.250
gets sent 7 times itself.

01:19:06.250 --> 01:19:09.960
So they actually send g 7 times.

01:19:09.960 --> 01:19:13.700
Then they send g
plus 1 also 7 times.

01:19:13.700 --> 01:19:17.980
Then they send g plus 2 also
7 times, et cetera, up to g

01:19:17.980 --> 01:19:20.660
plus 400 in the paper.

01:19:20.660 --> 01:19:23.640
Why do they do this
kind of averaging

01:19:23.640 --> 01:19:29.120
as well over different g values
instead of just sending g

01:19:29.120 --> 01:19:32.007
7 times 400 times?

01:19:32.007 --> 01:19:33.590
Because it seems
more straightforward.

01:19:33.590 --> 01:19:34.090
Yeah?

01:19:34.090 --> 01:19:35.000
AUDIENCE: [INAUDIBLE]

01:19:38.290 --> 01:19:40.380
PROFESSOR: Yeah, that's
actually what's going on.

01:19:40.380 --> 01:19:44.060
We're actually trying to measure
exactly how long this piece

01:19:44.060 --> 01:19:45.109
of computation will take.

01:19:45.109 --> 01:19:46.650
But then there's
lots of other stuff.

01:19:46.650 --> 01:19:48.858
For example, this other
pipeline that's at the bottom

01:19:48.858 --> 01:19:50.339
is doing all the stuff mod p.

01:19:50.339 --> 01:19:52.630
I mean it's also going to
take a different amount of time

01:19:52.630 --> 01:19:54.870
depending on what
exactly the input is.

01:19:54.870 --> 01:19:57.600
So the cool thing is
that if you perturb

01:19:57.600 --> 01:20:01.340
the value of your
guess g by adding 1, 2, 3,

01:20:01.340 --> 01:20:03.370
whatever, it's just
[INAUDIBLE] the little bits.

01:20:03.370 --> 01:20:05.690
So the timing attack we
just looked at just now,

01:20:05.690 --> 01:20:07.570
isn't going to change
because that depended

01:20:07.570 --> 01:20:10.400
on this middle bit flipping.

01:20:10.400 --> 01:20:13.115
But everything that's
happening on the bottom side

01:20:13.115 --> 01:20:15.550
of the pipeline mod p
is going to be totally

01:20:15.550 --> 01:20:17.160
randomized by this
because when they

01:20:17.160 --> 01:20:19.570
do it mod p then
adding an extra bit

01:20:19.570 --> 01:20:22.610
could shift things
around quite a bit mod p.

01:20:22.610 --> 01:20:25.920
Then you're going to,
it will average out

01:20:25.920 --> 01:20:28.000
other kinds of
computational noise

01:20:28.000 --> 01:20:30.140
that's deterministic
for a particular value

01:20:30.140 --> 01:20:33.730
but it's not related to this
part of the computation we're

01:20:33.730 --> 01:20:34.690
trying to go after.
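
NOTE
Editor's sketch, not part of the lecture: the two kinds of averaging in Python. The measure callback is hypothetical. It stands for sending the ciphertext built from one guess value and returning the observed response time. The defaults of 7 samples and 400 neighbors follow the numbers quoted in the lecture.
```python
from statistics import median
from typing import Callable
def neighborhood_time(g: int, measure: Callable[[int], float],
                      samples: int = 7, neighborhood: int = 400) -> float:
    total = 0.0
    for i in range(neighborhood):
        # Repeating the same value and taking the median filters network noise.
        total += median([measure(g + i) for _ in range(samples)])
    # Summing over g, g+1, g+2, ... averages out the input-dependent mod p work.
    return total
```
The attacker would compare neighborhood_time for g against the same quantity for g high and look for a noticeable gap.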

01:20:34.690 --> 01:20:35.668
Make sense?

01:20:35.668 --> 01:20:37.436
AUDIENCE: How do they
do that when they

01:20:37.436 --> 01:20:38.602
try to guess the lower bits?

01:20:38.602 --> 01:20:41.650
PROFESSOR: So actually they use
some other mathematical trick

01:20:41.650 --> 01:20:44.910
to only actually bother guessing
the top half of the bits of q.

01:20:44.910 --> 01:20:47.160
It turns out if you know the
top half of the bits of q

01:20:47.160 --> 01:20:50.480
there's some math you can
rely on to factor the numbers,

01:20:50.480 --> 01:20:51.730
and then you're in good shape.

01:20:51.730 --> 01:20:53.790
So you can always
[INAUDIBLE] little bit.

01:20:53.790 --> 01:20:55.689
Basically not worry about it.

01:20:55.689 --> 01:20:56.189
Make sense?

01:20:56.189 --> 01:20:57.155
Yeah, question.

01:20:57.155 --> 01:20:58.121
AUDIENCE: [INAUDIBLE]

01:21:01.510 --> 01:21:05.600
PROFESSOR: Well, you're going to
construct this value c0-- well

01:21:05.600 --> 01:21:08.250
you want the c0 prime-- you're
going to construct a value

01:21:08.250 --> 01:21:13.200
c by basically taking your c0
prime and multiplying it times

01:21:13.200 --> 01:21:14.990
R inverse mod n.

01:21:17.860 --> 01:21:20.430
And then when the
server takes this value,

01:21:20.430 --> 01:21:22.000
it's going to push
it through here.

01:21:22.000 --> 01:21:23.680
So it's going to compute c0.

01:21:23.680 --> 01:21:26.386
It's going to be c mod
q, so that value is going

01:21:26.386 --> 01:21:29.210
to be c0 prime R inverse mod q.

01:21:29.210 --> 01:21:32.550
Then you multiply it by R, so
you get rid of the R inverse.

01:21:32.550 --> 01:21:35.800
And then you end up with a
guess exactly in this position.

01:21:35.800 --> 01:21:37.820
So the cool thing is
basically all manipulations

01:21:37.820 --> 01:21:40.570
leading up to here are
just multiplying by this R.

01:21:40.570 --> 01:21:43.360
And you know what R is going to be,
it's going to be 2 to the 512.

01:21:43.360 --> 01:21:46.894
It's going to be really
straightforward.

01:21:46.894 --> 01:21:47.674
Make sense?

01:21:47.674 --> 01:21:48.382
Another question?

01:21:48.382 --> 01:21:51.180
AUDIENCE: Could we just
cancel out timing [INAUDIBLE]?

01:21:56.115 --> 01:21:59.930
PROFESSOR: Well, if you do
p, you'd be in business.

01:21:59.930 --> 01:22:01.220
Yeah, so that's the thing.

01:22:01.220 --> 01:22:03.341
Yeah, you don't know
what p is, but you just

01:22:03.341 --> 01:22:06.375
want to randomize it out.

01:22:06.375 --> 01:22:07.440
Any questions?

01:22:07.440 --> 01:22:10.549
All right. [INAUDIBLE] but
thanks for sticking around.

01:22:10.549 --> 01:22:13.300
So we'll start talking about
other kinds of problems

01:22:13.300 --> 01:22:15.150
next week.